This local voice-to-text app replaced every paid service for me

I was recently generating a bunch of mind maps using AI. The problem was that many of the insights I wanted had to be extracted from hour-long video lectures, which needed to be transcribed first for my mind map generation workflow to work. So I went ahead and looked for online services like Otter.ai to transcribe my videos. But after seeing their subscription prices, I wasn’t exactly excited to pay $18 a month, especially since I only needed basic transcriptions.

I already knew local transcription solutions were a thing, so I tried a few of them. After a bit of testing, I settled on Whisper. Setting it up took me only a few minutes, and I had unlimited transcription running locally on my computer. After seeing how effortless the setup was and how smoothly it ran, I knew right away that I didn’t need a paid transcription service.

Why I chose Whisper

My preferred setup for OpenAI’s Whisper model is to use the Whisper ASR Webservice running through Docker. This combination checks all the boxes I actually care about. The setup was genuinely fast, taking me less than ten minutes from downloading Docker to getting my first transcript. It runs completely free, with no hidden costs or usage limits. The setup supports over 50 languages with automatic detection and provides confidence ratings, so I can easily correct inaccurate transcripts. Though privacy wasn’t really a concern for me, having a setup that runs entirely offline and private is always a plus.

There are other ways to run Whisper locally on your computer. However, I find that using Docker makes the setup as smooth and fast as possible. I don’t have to wrestle with system dependencies or worry about configuration errors. I just pull the image, run it, and start transcribing immediately.

How I set up Whisper with Docker in 10 minutes

No technical knowledge required

Download the proper Docker Desktop app for your system

Setting up Whisper on my machine turned out to be much easier than I expected. I first downloaded and installed Docker Desktop for my PC. Docker packages software together with everything it needs to run, so you don’t have to install dependencies or configure your system by hand.

After the installation, I opened Docker Desktop, clicked the Terminal button at the bottom right of the window, and entered the following command:

        docker pull onerahmet/openai-whisper-asr-webservice:latest

This command downloads the Whisper ASR Webservice image to your computer so Docker can run it.

Docker downloading Whisper ASR Webservice

After the download finished, I started Whisper ASR Webservice by entering this command in the terminal:

        docker run -d -p 9000:9000 -e ASR_MODEL=base onerahmet/openai-whisper-asr-webservice:latest
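The ASR_MODEL value in that command selects which Whisper model the container loads. As a minimal sketch, the helper below prints the same docker run command with a different model or a different port on your computer, so you can review the line before running it. The function name whisper_run_cmd is made up for this example, and it assumes the usual Whisper model names (tiny, base, small, medium, large). Larger models are generally more accurate but slower and bigger to download.

```shell
# Hypothetical helper: print a docker run command for a given model and host port.
# It only prints the command; nothing starts until you run the printed line yourself.
whisper_run_cmd() {
    model="${1:-base}"   # Whisper model name, e.g. tiny, base, small, medium, large
    port="${2:-9000}"    # port on your computer; the container itself listens on 9000
    printf 'docker run -d -p %s:9000 -e ASR_MODEL=%s onerahmet/openai-whisper-asr-webservice:latest\n' "$port" "$model"
}

# Print the command for the "small" model, served on port 9001 of your computer.
whisper_run_cmd small 9001
```

If you change the port this way, the web interface moves with it, so the example above would be reachable on port 9001 instead of 9000.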

Once the container was up and running, I clicked the 9000:9000 port link next to it in Docker Desktop, which launched my default web browser and opened the Whisper ASR Webservice interface at http://localhost:9000.

Open Whisper ASR Webservice by clicking on the port provided (screenshot by Jayric Maning)

With the page open, I could start using Whisper ASR Webservice straight from my browser.

Using Whisper ASR Webservice

Running transcriptions through the web interface

Short transcription test using video file

To transcribe a video or audio file, expand the /asr menu and click Try it out. Scroll down to audio_file, click Choose file to select your file, then hit Execute.
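The same upload can also be done from the terminal, because the Try it out page is a front end for the service's /asr endpoint. Here is a rough sketch using curl. It assumes the container from the earlier command is listening on localhost:9000, that the upload field is named audio_file, and that the output query parameter accepts txt. The function names and lecture.mp4 are placeholders, not part of the webservice.

```shell
# Build the URL for the /asr endpoint (assumes the service runs on localhost).
asr_url() {
    port="${1:-9000}"
    output="${2:-txt}"   # assumed output formats include txt, srt, vtt, json
    printf 'http://localhost:%s/asr?task=transcribe&output=%s\n' "$port" "$output"
}

# Upload one file and print the transcript. Nothing runs until you call it.
transcribe_file() {
    curl -s -X POST -F "audio_file=@$1" "$(asr_url 9000 txt)"
}

# Example (placeholder file name):
# transcribe_file lecture.mp4 > lecture.txt
```

Keeping the URL in its own small function makes it easy to switch the output format, for example to srt for subtitles, without touching the upload command.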

Transcriptions are color coded based on AI confidence

As you can see in the transcription, the text is color-coded to indicate Whisper’s confidence levels. Green and white mean high confidence, yellow and orange indicate medium confidence, and red means low confidence, where the model struggled to process the audio. I’ve found that yellow and orange text is often still accurate, but I always double-check any lines highlighted in red.

Whisper also handles long audio files. Here, I transcribed an hour-long audio lecture, and it finished in about a minute and a half. Perfect for my local AI mind map generation workflow.
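If you have a whole folder of lectures, the upload step can be scripted instead of repeated by hand. The sketch below is one way to do it, assuming the service is on localhost:9000 and accepts uploads on /asr in a form field named audio_file. The names transcript_name and transcribe_folder are hypothetical, and the folder path is a placeholder.

```shell
# Map a media file path to the transcript path next to it: talk.mp4 -> talk.txt
transcript_name() {
    printf '%s.txt\n' "${1%.*}"
}

# Transcribe every .mp4 in a folder, skipping files that already have a transcript.
transcribe_folder() {
    for f in "$1"/*.mp4; do
        [ -e "$f" ] || continue            # folder has no .mp4 files
        out="$(transcript_name "$f")"
        [ -e "$out" ] && continue          # already transcribed
        # -f makes curl fail on HTTP errors; remove the empty file if the upload fails
        curl -sf -X POST -F "audio_file=@$f" \
            "http://localhost:9000/asr?output=txt" > "$out" || rm -f "$out"
    done
}

# Example (placeholder folder):
# transcribe_folder "$HOME/lectures"
```

Because finished files are skipped, the function can be re-run after an interruption and it will pick up where it left off.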

Hour-long AI lecture transcribed in less than two minutes

Whisper also supports more than 50 languages and automatically detects the language of your audio. I tried transcribing a Filipino-language video, and the results were decent enough, though they required a fair bit of editing to be accurate.
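If automatic detection picks the wrong language, one thing worth trying is telling the service which language to expect. This sketch assumes the /asr endpoint accepts a language query parameter holding a Whisper language code, and that tl is the code for Tagalog/Filipino. Both are assumptions to check against the Try it out page, and whether pinning the language improves accuracy for a given recording is something to test rather than take for granted. The function and file names are placeholders.

```shell
# Build an /asr URL that pins the spoken language instead of auto-detecting it.
asr_url_for_language() {
    code="$1"            # Whisper language code, e.g. tl (assumed: Tagalog/Filipino)
    port="${2:-9000}"
    printf 'http://localhost:%s/asr?task=transcribe&language=%s&output=txt\n' "$port" "$code"
}

# Example (placeholder file name):
# curl -s -X POST -F "audio_file=@interview.mp4" "$(asr_url_for_language tl)"
```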

I also tested the same file with Otter.ai to see if the output was any better. To my surprise, it was actually worse. It seems the AI glitched or hallucinated, producing no coherent output.

Transcribing the same video with Otter.ai

That said, this doesn’t mean that all non-English languages produce poor results. Some languages simply have more training data available online. So there’s a good chance that other major languages such as French, Spanish, Mandarin, and Arabic will yield better accuracy.

Though it wasn’t perfect, I was pretty happy with the results of local AI transcription. I also tested translation on both Otter (using its chatbot) and Whisper (with its built-in tool), but the outputs were unusable. I don’t suggest using either for translation. You’re better off using Google Translate or ChatGPT if you need one.

The trade-offs of running AI transcriptions locally

It’s not for everyone

Otter.ai offering advanced features like speaker diarization and summary

After using this setup, it became immediately clear what it can and can’t do compared to paid online services. These limitations are worth knowing upfront before you commit to this approach.

I think the biggest problem most people will have is the lack of mobile support. Running Whisper locally on a smartphone isn’t practical just yet, so there’s no way to use your phone for local AI transcription. Another limitation people will definitely notice is the lack of speaker diarization. While it’s technically possible, it requires quite a bit more setup and configuration, which may or may not work due to the fragmented nature of open-source projects like this. However, the option does exist if you absolutely need it. Paid services also integrate seamlessly with other platforms like Google Meet, Slack, and Zoom, automatically transcribing meetings as they happen. My local setup can’t do that either, since it requires manually uploading audio files.

Overall, running local AI transcription isn’t for everyone. It’s great if you’re comfortable tinkering with setups and don’t mind a bit of manual work, but if you prefer plug-and-play tools, mobile access, and integration with other online platforms, there’s really no substitute for using paid cloud-based services like Otter.ai or Fireflies.ai. Of course, there are good free online transcription services, but even these have their limits, which is why I’m still sticking with my local AI setup.

Get unlimited transcriptions today

The setup takes just a few minutes. If you’re tired of free-tier limits or thinking about a paid subscription, try this instead. Download Docker, run two commands, and you’ll have unlimited local transcription. No credit cards, no monthly fees, no surprises. It handles everything you throw at it. Whether you’re transcribing podcasts, lectures, interviews, or random voice memos, Whisper processes them all seamlessly. If you work with audio files regularly, this setup is absolutely worth trying.