Your own free transcripts: how to install whisper.cpp on macOS
Generating your own transcripts is a great way to take notes from meetings, transcribe a podcast or an interview, or help organise your recordings.
Whisper is a speech-recognition model from OpenAI, and whisper.cpp is a port of it optimised for speed. If you’re running an Apple-silicon Mac, it’s the best option - and here’s how to install it, so you can quickly run transcriptions from a terminal window.
You need Homebrew for this. Given that this is a terminal window application, my guess is that you already have it. But if not, you want to go to the Homebrew website and follow the instructions there to install it. It takes all the pain out of installing and maintaining software.
Install the program
First, install whisper.cpp and ffmpeg, a helper program you’ll need, by typing:
```shell
brew install whisper-cpp && brew install ffmpeg
```
There, that was easy.
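If you want to double-check that both programs actually landed, a quick sanity check from the same terminal window looks like this:

```shell
# Confirm both tools are on your PATH and runnable.
command -v whisper-cpp && whisper-cpp --help | head -n 1
command -v ffmpeg && ffmpeg -version | head -n 1
```

If either `command -v` line prints nothing, the Homebrew install didn’t finish - run the `brew install` line again and read its output.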
Install the speech data
Second, you’ll need to download a model from the whisper.cpp models repository on Hugging Face. A “model” is the actual data that does the transcribing.
Assuming you’re speaking in English, grab the one ending `medium.en.bin` - the `small.en.bin` one is also fine for most people, but in our experience we spend less time correcting the output from the medium model.
Put this model file somewhere. I’ve put it in `~/tools`, because that’s where all my random publishing tools are. I’ve also made a `~/tmp` folder, to put temporary files in.
(If you didn’t know, `~` means “in my home folder”, so you can see it in the Finder by opening Finder and typing SHIFT+CMD+H. Make a folder in there called `tools` and one called `tmp`, in other words, and then put that big `.bin` model file you’ve just downloaded in the `tools` folder.)
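If you’d rather stay in the terminal for this bit, you can make both folders and fetch the model with curl. The download URL here is the one the whisper.cpp project’s Hugging Face repository uses at the time of writing - check the repository if it has moved:

```shell
# Make the two folders (harmless if they already exist):
mkdir -p ~/tools ~/tmp

# Download the medium English model straight into ~/tools.
# -L follows redirects, which Hugging Face uses for model files.
curl -L -o ~/tools/ggml-medium.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-medium.en.bin
```

The medium model is a ~1.5GB download, so give it a minute.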
Run it
You’ll need three commands whenever you want to transcribe something.
```shell
export GGML_METAL_PATH_RESOURCES="$(brew --prefix whisper-cpp)/share/whisper-cpp"
```
This tells whisper.cpp where to find the Metal resources it needs to run on your Mac’s GPU. Without it, your transcript will run much slower than you’d like it to.
```shell
ffmpeg -y -i youraudiofile.wav -ar 16000 ~/tmp/tempinput.wav
```
This uses ffmpeg to make a temporary copy of your audio file with a sample rate of 16kHz, which is what Whisper needs to work.
Finally, you’ll want the command that actually does the transcribing.
```shell
whisper-cpp --language en --print-colors --model ~/tools/ggml-medium.en.bin \
  --output-vtt --file ~/tmp/tempinput.wav --output-file ~/tmp/tempinput
```
In this, we’re printing some pretty colours while we do it (green shows the model is certain, red shows it isn’t certain at all); we’ve told it the audio is in English so it doesn’t have to guess; and we’ve told it to output the VTT file to the `~/tmp` folder. (The VTT file is the best format for everyone, trust me on this.)
After you’ve run it, you’ll see a file in your `~/tmp` folder called `tempinput.vtt`, which is the generated VTT file. Super easy. It’s this file that you need to use for, say, podcast transcripts and other things.
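If you haven’t met WebVTT before, it’s just plain text with timestamps. The words below are invented, but the layout is what you’ll get:

```
WEBVTT

00:00:00.000 --> 00:00:04.280
Hello, and welcome to this week’s episode.

00:00:04.280 --> 00:00:09.120
We’ve got a lot to get through today.
```

Because it’s plain text, you can tidy it up in any text editor before publishing.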
I’m running a MacBook Air - the “medium” model transcribes at roughly 6x real-time speed, so a 4’30” podcast takes 43 seconds. The “small” model is a little less accurate, but runs more than twice as fast - in just 18 seconds.
Just typing `whisper-cpp` into the terminal will give you all the available options. If you just want plain text rather than VTT, this is also where you’ll find the option to output a text file instead.
We use this every day as part of a batch script that encodes and packages our podcast for everyone.