Hi folks, I’m in a bit of a personal crisis and need to quickly find speech transcription software that works on Linux, doesn’t take much time to set up, and can help me transcribe a number of audio clips of less than 15 minutes each.

  • Can someone recommend a program that can transcribe some audio recordings for me and is relatively simple to set up and use?
  • Do such programs need a GPU to run effectively? I’m running a Dell XPS 9370 laptop, which only has integrated graphics.

My backup plan is to listen and transcribe by hand, so recommendations of a program that will allow me to self-transcribe by typing while listening at a reduced rate are also appreciated.

  • If any experienced transcribers are reading this, have you found that your pedals worked well with Linux?

Normally I would try out all the different programs and do more than the small number of searches I’ve done, but my timeline doesn’t allow time for me to build a cluster of custom-coded transcription bots running Gentoo on hand-soldered hardware.

My environment is EndeavourOS running on a Dell XPS 9370. Internet is over Wi-Fi, with no external dongles or anything else currently hooked up.

  • just_another_person@lemmy.world
    2 months ago

    Depends on what the audio is. What’s the crisis?

    Generally, you can use the CPU for anything based on PyTorch; it will just take substantially longer.

    • njordomir@lemmy.worldOP
      2 months ago

      Transcription of numerous voicemails and phone calls for a legal matter. I’d like to supply transcripts with the audio files so we don’t have to pay for as much of the paralegals’ time to review them and decide what is actually going to be useful.

      • just_another_person@lemmy.world
        2 months ago

        Start with Whisper as someone else mentioned. DeepSpeech by Mozilla is another simple one.

        Both are similar in performance and accuracy for normal spoken conversation with no extra auditory noise.
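
For anyone batching a folder of clips, here is a minimal sketch that builds one CPU-only command per file. It assumes the `whisper` command-line tool from the openai-whisper Python package is installed; the function names, the extension list, and the choice of the `base` model are illustrative, not something from this thread.

```python
import subprocess
from pathlib import Path

# Assumption: these are the audio types in the folder; adjust as needed.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def build_whisper_commands(audio_dir, output_dir, model="base"):
    """Return one `whisper` CLI command (as an argument list) per audio file."""
    commands = []
    for path in sorted(Path(audio_dir).iterdir()):
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip anything that is not an audio clip
        commands.append([
            "whisper", str(path),
            "--model", model,          # smaller models run faster on CPU
            "--device", "cpu",         # no GPU required
            "--fp16", "False",         # avoids the FP16-on-CPU warning
            "--output_format", "txt",  # plain-text transcript
            "--output_dir", str(output_dir),
        ])
    return commands


def run_commands(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_commands(build_whisper_commands("calls", "transcripts"))` would then write one `.txt` file per clip into `transcripts`, where both folder names are placeholders.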

        • njordomir@lemmy.worldOP
          2 months ago

          Whisper worked for me. I’ll have to go back through and tag speakers and fix a few spots, but you guys have saved me 80-90% of the work. Thank you.
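
Whisper does not label speakers by itself, so tagging them stays a manual pass. A small sketch of how that pass could be made easier: it assumes segments in the shape the openai-whisper Python package returns from `model.transcribe(...)["segments"]` (dicts with `start`, `end`, and `text`), and writes each one with a timestamp and a placeholder to fill in by hand. The helper names and the `SPEAKER ?` placeholder are made up for illustration.

```python
def format_timestamp(seconds):
    """Convert seconds to HH:MM:SS, dropping fractions of a second."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_transcript(segments, speaker_placeholder="SPEAKER ?"):
    """Render segments as timestamped lines with a speaker tag to edit later."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()  # segment text usually has a leading space
        lines.append(f"[{start} - {end}] {speaker_placeholder}: {text}")
    return "\n".join(lines)
```

The timestamps make it quicker to jump to the matching spot in the audio when fixing a misheard passage or replacing the placeholder with a name.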

  • Maxy@lemmy.blahaj.zone
    2 months ago

    I’ve had good experiences with whisper.cpp (should be in the AUR). I used the large model on my GPU (3060), and it filled 11.5 of the 12 GB of VRAM, so you might have to settle for a lower-tier model. The speed was pretty much real time on my GPU, so it might be quite a bit slower on your CPU, unless the lower-tier models are also a lot faster (I never tested them because I didn’t need to).

    The large model had pretty much perfect accuracy (only 5 or so mistakes in ~40 pages of transcriptions), and that was with Dutch audio recorded on a smartphone. If it can handle my pretty horrible conditions, your audio should (hopefully) be no problem to transcribe.
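
To get a rough feel for how long a batch might take, the "real time" figure above can be treated as a real-time factor (processing time divided by audio length) of about 1.0. The sketch below is just that arithmetic; the clip lengths and the CPU factor in the usage note are placeholders, not measurements from anyone in this thread.

```python
def estimate_minutes(clip_minutes, real_time_factor):
    """Estimated processing time: total audio length times the real-time factor."""
    return sum(clip_minutes) * real_time_factor
```

For example, `estimate_minutes([15, 10, 5], 1.0)` gives 30 minutes for half an hour of audio at real-time speed, and the same clips at a hypothetical factor of 2.0 would take 60. Timing one short clip on your own machine is the only reliable way to find your actual factor.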

    • njordomir@lemmy.worldOP
      2 months ago

      I used the base model and it ran at a very acceptable speed with CPU only. Decent accuracy considering the recording was mediocre quality at best. Thank you for the suggestion.

  • RmDebArc_5@sh.itjust.works
    2 months ago

    I use Speech Note for STT/TTS and it works great. You can choose between different models; I use Whisper (more accurate) or Vosk (faster). You don’t need a GPU, but it will speed things up greatly.

    • njordomir@lemmy.worldOP
      2 months ago

      I was able to quickly set up and use Whisper (base) in Speech Note without issue, and it saved me over 80% of what I would otherwise have had to do manually. Thank you for the recommendation.