While the world is witnessing the arrival of revolutionary AI text-to-image models, OpenAI has released another AI model for speech-to-text.
In this article, you will learn what Whisper is, its variations and system requirements, and how to use it on your computer. Before getting into the article, check out the demo of Whisper on Hugging Face to get a glimpse.
While running Whisper on Hugging Face, it may take up to 9 seconds to process the input and show the output, since the demo runs on a CPU. If you run Whisper on a system with a GPU, it takes only about 500 milliseconds to 2 seconds to show the result.
What is Whisper AI?
Whisper by OpenAI is an automatic speech recognition (ASR) system that transcribes multilingual audio. Whisper is the result of training a neural network on 680,000 hours of multilingual and multitask supervised data collected from the web.
Variations and System Requirements of Whisper
There are 5 variants of Whisper available, from tiny to large. The larger the parameter count, the better the output, but the slower the inference and the higher the memory requirement.
| Model | Parameters | English-only Model Syntax | Multilingual Model Syntax | Required VRAM | Relative Speed |
| --- | --- | --- | --- | --- | --- |
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
| base | 74 M | base.en | base | ~1 GB | ~16x |
| small | 244 M | small.en | small | ~2 GB | ~6x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1.5 B | N/A | large | ~10 GB | ~1x |
Ensure that you have the required amount of GPU memory for the model you choose to run.
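The names in the two syntax columns are what you pass when loading a model, either with the --model flag on the command line or with whisper.load_model() in Python. A minimal sketch of the latter:

```python
import whisper

# "tiny.en" is the English-only tiny variant from the table above
model = whisper.load_model("tiny.en")
```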
How to Use Whisper for Free?
OpenAI has made Whisper open source on its GitHub account. Thus, anyone can use the Whisper code for free.
You can run the code from the command line or inside a Python IDE.
1. Install the Whisper Code
To download and install the Whisper code on your computer, copy and paste the pip install command from OpenAI’s GitHub page:
pip install git+https://github.com/openai/whisper.git
Then, you also need to install FFmpeg. To do so, run the command for your operating system:
- For Ubuntu or Debian – sudo apt update && sudo apt install ffmpeg
- For macOS using Homebrew (https://brew.sh/) – brew install ffmpeg
- For Windows using Chocolatey (https://chocolatey.org/) – choco install ffmpeg
2. Prepare an Audio File
Record or gather the audio file that you want to transcribe.
3. Run the Code in a Python Environment
To run the code in your Python environment, copy and paste the example code from OpenAI’s GitHub page. Then, change the audio file name passed to the transcribe() call that produces the “result” variable, as in the sketch below. After running the code, you will see the output in text form.
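Here is a minimal sketch based on the usage example in OpenAI’s repository; “audio.mp3” is a placeholder for your own file name:

```python
import whisper

# Load one of the variants from the table above ("tiny", "base", "small", ...)
model = whisper.load_model("base")

# Transcribe the audio; replace "audio.mp3" with your own file
result = model.transcribe("audio.mp3")

# The transcription is returned under the "text" key
print(result["text"])
```

The result dictionary also includes segment-level timestamps and the detected language.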
4. Command-line Execution
If you don’t have a Python IDE and want to run Whisper from the command line, you can use the whisper command that the pip installation adds, as shown below.
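For example, assuming your audio file is named audio.mp3 (a placeholder; use your own file name), a basic transcription command looks like this:

whisper audio.mp3 --model base

The --model flag accepts any of the names from the table above. You can also translate non-English speech into English with the --task flag, as in the example from OpenAI’s GitHub page:

whisper japanese.wav --language Japanese --task translate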
Conclusion
Whisper is a powerful speech-to-text and multilingual speech translation model that was developed and open-sourced by OpenAI. You can run Whisper in your Python environment as described in this article.
If you are not into coding and don’t want to try it in a Python environment, you can simply try the demo on Hugging Face. Since Whisper is in an early phase, you will need to wait for the release of a graphical user interface (GUI) to use Whisper without any coding.
FAQs
1. What Can Whisper Do?
Whisper by OpenAI is an automatic speech recognition (ASR) system that can perform multilingual speech recognition, speech translation, and language identification.
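For instance, language identification can be done with the lower-level API shown in OpenAI’s repository (a minimal sketch; “audio.mp3” is a placeholder file name):

```python
import whisper

model = whisper.load_model("base")

# Load the audio and fit it to the 30-second window the model expects
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram and detect the spoken language
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
```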
2. What Languages Does Whisper Support?
Whisper currently supports 75 languages, listed below from lowest to highest word error rate (WER):
- Spanish
- Italian
- English
- Portuguese
- German
- Japanese
- Russian
- Polish
- French
- Catalan
- Dutch
- Indonesian
- Turkish
- Malay
- Ukrainian
- Swedish
- Vietnamese
- Norwegian
- Finnish
- Thai
- Korean
- Romanian
- Slovak
- Tagalog
- Croatian
- Danish
- Czech
- Arabic
- Bulgarian
- Urdu
- Estonian
- Hindi
- Slovenian
- Latvian
- Azerbaijani
- Serbian
- Hebrew
- Lithuanian
- Persian
- Welsh
- Afrikaans
- Icelandic
- Marathi
- Kazakh
- Maori
- Swahili
- Nepali
- Armenian
- Belarusian
- Kannada
- Tajik
- Occitan
- Lingala
- Maltese
- Luxembourgish
- Hausa
- Javanese
- Pashto
- Uzbek
- Khmer
- Georgian
- Telugu
- Malayalam
- Lao
- Punjabi
- Somali
- Gujarati
- Bengali
- Assamese
- Mongolian
- Yoruba
- Myanmar
- Amharic
- Shona
- Sindhi
3. How to Fix “RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same” While Running Whisper?
To fix this error, pass fp16=False to the transcribe() call that produces the “result” variable, as shown below.
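For example (a minimal sketch, reusing the placeholder file name from earlier):

```python
import whisper

model = whisper.load_model("base")

# fp16=False disables half-precision decoding and runs inference in full
# precision (FP32), which avoids the HalfTensor/FloatTensor mismatch
result = model.transcribe("audio.mp3", fp16=False)
print(result["text"])
```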