While the world is witnessing the arrival of revolutionary AI text-to-image models, OpenAI has released another AI model for speech-to-text.
In this article, you will learn what Whisper is, its variations and system requirements, and how to use it on your computer. Before getting into the article, check out the demo of Whisper on Hugging Face to get a glimpse.
While running Whisper on Hugging Face, it may take up to 9 seconds to process the input and show the output, since the demo runs on a CPU. If you run Whisper on a system with a GPU, it takes only about 500 milliseconds to 2 seconds to show the result.
What is Whisper AI?
Whisper by OpenAI is an automatic speech recognition (ASR) system that transcribes multilingual audio. Whisper is the result of training a neural network on 680,000 hours of multilingual and multitask supervised data collected from the web.
Variations and System Requirements of Whisper
There are 5 variants of Whisper available, from tiny to large. The larger the parameter count, the better the output, but the slower the inference and the higher the memory requirement.
| Model | Parameters | English-only Model Syntax | Multilingual Model Syntax | Required VRAM | Relative Speed |
| --- | --- | --- | --- | --- | --- |
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
| base | 74 M | base.en | base | ~1 GB | ~16x |
| small | 244 M | small.en | small | ~2 GB | ~6x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1.5 B | N/A | large | ~10 GB | ~1x |
Ensure that you have the required amount of GPU memory for the model you choose to run.
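The names in the two syntax columns are what you pass when loading a model, either with the --model flag on the command line or with whisper.load_model() in Python. A minimal sketch of the latter:

```python
import whisper

# "tiny.en" is the English-only tiny variant from the table above
model = whisper.load_model("tiny.en")
```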
How to Use Whisper for Free?
OpenAI has made Whisper open source on its GitHub account. Thus, anyone can use the Whisper code for free.
You can run the code from the command line or inside a Python IDE.
1. Install the Whisper Code
To download and install the Whisper code on your computer, copy and paste the pip install command from OpenAI’s GitHub page:
pip install git+https://github.com/openai/whisper.git
Then, you also need to install FFmpeg. To do so, run the command for your operating system:
- For Ubuntu or Debian – sudo apt update && sudo apt install ffmpeg
- For macOS using Homebrew (https://brew.sh/) – brew install ffmpeg
- For Windows using Chocolatey (https://chocolatey.org/) – choco install ffmpeg
2. Prepare an Audio File
Record or gather the audio file that you want to transcribe.
3. Run the Code in a Python Environment
To run the code in your Python environment, copy and paste the example code from OpenAI’s GitHub page. Then, change the audio file name passed to the transcribe() call that produces the “result” variable, as in the sketch below. After running the code, you will see the output in text form.
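Here is a minimal sketch based on the usage example in OpenAI’s repository; “audio.mp3” is a placeholder for your own file name:

```python
import whisper

# Load one of the variants from the table above ("tiny", "base", "small", ...)
model = whisper.load_model("base")

# Transcribe the audio; replace "audio.mp3" with your own file
result = model.transcribe("audio.mp3")

# The transcription is returned under the "text" key
print(result["text"])
```

The result dictionary also includes segment-level timestamps and the detected language.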
4. Command-line Execution
If you don’t have a Python IDE and want to run Whisper from the command line, you can use the whisper command that the pip installation adds, as shown below.
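For example, assuming your audio file is named audio.mp3 (a placeholder; use your own file name), a basic transcription command looks like this:

whisper audio.mp3 --model base

The --model flag accepts any of the names from the table above. You can also translate non-English speech into English with the --task flag, as in the example from OpenAI’s GitHub page:

whisper japanese.wav --language Japanese --task translate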
Conclusion
Whisper is a powerful speech-to-text and multilingual speech translation model that was developed and open-sourced by OpenAI. You can run Whisper in your Python environment as described in this article.
If you are not into coding and don’t want to try it in a Python environment, you can simply try the demo on Hugging Face. Since Whisper is in an early phase, you will need to wait for the release of a graphical user interface (GUI) to use Whisper without any coding.
FAQs
1. What Can Whisper Do?
Whisper by OpenAI is an automatic speech recognition (ASR) system that can perform multilingual speech recognition, speech translation, and language identification.
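For instance, language identification can be done with the lower-level API shown in OpenAI’s repository (a minimal sketch; “audio.mp3” is a placeholder file name):

```python
import whisper

model = whisper.load_model("base")

# Load the audio and fit it to the 30-second window the model expects
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram and detect the spoken language
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
```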
2. What Languages Does Whisper Support?
Whisper currently supports 75 languages, listed below from lowest to highest word error rate (WER):
- Spanish
- Italian
- English
- Portuguese
- German
- Japanese
- Russian
- Polish
- French
- Catalan
- Dutch
- Indonesian
- Turkish
- Malay
- Ukrainian
- Swedish
- Vietnamese
- Norwegian
- Finnish
- Thai
- Korean
- Romanian
- Slovak
- Tagalog
- Croatian
- Danish
- Czech
- Arabic
- Bulgarian
- Urdu
- Estonian
- Hindi
- Slovenian
- Latvian
- Azerbaijani
- Serbian
- Hebrew
- Lithuanian
- Persian
- Welsh
- Afrikaans
- Icelandic
- Marathi
- Kazakh
- Maori
- Swahili
- Nepali
- Armenian
- Belarusian
- Kannada
- Tajik
- Occitan
- Lingala
- Maltese
- Luxembourgish
- Hausa
- Javanese
- Pashto
- Uzbek
- Khmer
- Georgian
- Telugu
- Malayalam
- Lao
- Punjabi
- Somali
- Gujarati
- Bengali
- Assamese
- Mongolian
- Yoruba
- Myanmar
- Amharic
- Shona
- Sindhi
3. How to Fix “RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same” While Running Whisper?
To fix this error, pass fp16=False to the transcribe() call that produces the “result” variable, as shown below.
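For example (a minimal sketch, reusing the placeholder file name from earlier):

```python
import whisper

model = whisper.load_model("base")

# fp16=False disables half-precision decoding and runs inference in full
# precision (FP32), which avoids the HalfTensor/FloatTensor mismatch
result = model.transcribe("audio.mp3", fp16=False)
print(result["text"])
```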