Speech-to-Text: Use OpenAI’s Whisper for Free

While the world is witnessing the arrival of revolutionary AI text-to-image models, OpenAI has released another AI model, this one for speech-to-text.

In this article, you will learn what Whisper is, its variants and their system requirements, and how to use it on your computer. Before getting into the article, check out the Whisper demo on Hugging Face to get a glimpse of what it can do.

When you run Whisper on Hugging Face, it may take up to 9 seconds to process the input and show the output, since the demo runs on a CPU. On a system with a GPU, Whisper takes only about 0.5 to 2 seconds to show the result.

What is Whisper AI?

Whisper by OpenAI is an automatic speech recognition (ASR) model that transcribes multilingual audio. Whisper is the result of training a neural network on 680,000 hours of multilingual and multitask supervised data collected from the web.

Variations and System Requirements of Whisper

There are 5 variants of Whisper, from tiny to large. The larger the parameter count, the better the output.

Model | Parameters | English-only model name | Multilingual model name | Required VRAM (GPU memory) | Relative speed
tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x
base | 74 M | base.en | base | ~1 GB | ~16x
small | 244 M | small.en | small | ~2 GB | ~6x
medium | 769 M | medium.en | medium | ~5 GB | ~2x
large | 1.5 B | Nil | large | ~10 GB | ~1x

Ensure that you have the required amount of GPU memory for the model you choose to run.
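As a quick sketch of how the model names in the table are used, the following Python snippet (the variant tiny.en is only an example) loads a model on the GPU when one is available and falls back to the CPU otherwise:

import torch
import whisper

# Use the GPU if one is available; otherwise fall back to the (slower) CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the English-only "tiny" variant, which needs only ~1 GB of VRAM.
model = whisper.load_model("tiny.en", device=device)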

How to Use Whisper for Free?

OpenAI has made Whisper open source on GitHub, so anyone can use the Whisper code for free.

You can run the code from the command line or inside a Python IDE.

  1. Install the Whisper Code
  2. Prepare an Audio File
  3. Run the Code in a Python Environment
  4. Command-line Execution

1. Install the Whisper Code

To download and install Whisper on your computer, just copy and paste the pip install command available on OpenAI’s GitHub page.

pip install git+https://github.com/openai/whisper.git

Then, you also need to install FFmpeg, which Whisper uses to read audio files.

To do so, run the below command:

  • For Ubuntu or Debian – sudo apt update && sudo apt install ffmpeg
  • For macOS using Homebrew (https://brew.sh/) – brew install ffmpeg
  • For Windows using Chocolatey (https://chocolatey.org/) – choco install ffmpeg

2. Prepare an Audio File

Create or gather the audio file that you want to transcribe.

3. Run the Code in a Python Environment

To run the code in your Python environment, just copy and paste the code from OpenAI’s GitHub page.

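For reference, here is a minimal sketch along the lines of that example, assuming an audio file named audio.mp3 in the working directory:

import whisper

# Load one of the model variants from the table above ("base" here).
model = whisper.load_model("base")

# Transcribe the audio file; the result is a dictionary with the text and segments.
result = model.transcribe("audio.mp3")

# Print the transcribed text.
print(result["text"])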

Then, change the audio file name passed to transcribe() in the “result” line to the name of your own audio file.

After running the code, you will see the transcription as text output.

4. Command-line Execution

If you don’t have a Python IDE and want to run Whisper from the command line, you can do so by following the instructions below.

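For example, a command along these lines transcribes an audio file (the file name audio.mp3 is only illustrative):

whisper audio.mp3 --model small

You can also pick a different variant from the table above, or translate non-English speech into English with the --task option:

whisper audio.mp3 --model medium --language Japanese --task translate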

Conclusion

Whisper is a powerful speech-to-text and multilingual speech translation model that was developed and open-sourced by OpenAI. You can run Whisper in your Python environment or from the command line, as described in this article.

If you are not into coding and don’t want to try it in a Python environment, you can simply try the demo on Hugging Face. Since Whisper is still in an early phase, you will need to wait for the release of a graphical user interface (GUI) to use Whisper without any coding.

FAQs

1. What Can Whisper Do?

Whisper by OpenAI is an automatic speech recognition (ASR) model that can perform multilingual speech recognition, speech translation, and language identification.
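As a quick sketch of the language identification part, Whisper’s lower-level Python API can be used along these lines (the file name audio.mp3 is only illustrative):

import whisper

# Load a multilingual model; the English-only variants cannot identify languages.
model = whisper.load_model("base")

# Load the audio and pad or trim it to the 30-second window Whisper expects.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram and move it to the model's device.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Detect the spoken language and print the most likely one.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")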

2. What Languages Does Whisper support?

Whisper currently supports 75 languages. The languages supported by Whisper are listed below in order of word error rate (WER), from lowest to highest:

  1. Spanish
  2. Italian
  3. English
  4. Portuguese
  5. German
  6. Japanese
  7. Russian
  8. Polish
  9. French
  10. Catalan
  11. Dutch
  12. Indonesian
  13. Turkish
  14. Malay
  15. Ukrainian
  16. Swedish
  17. Vietnamese
  18. Norwegian
  19. Finnish
  20. Thai
  21. Korean
  22. Romanian
  23. Slovak
  24. Tagalog
  25. Croatian
  26. Danish
  27. Czech
  28. Arabic
  29. Bulgarian
  30. Urdu
  31. Estonian
  32. Hindi
  33. Slovenian
  34. Latvian
  35. Azerbaijani
  36. Serbian
  37. Hebrew
  38. Lithuanian
  39. Persian
  40. Welsh
  41. Afrikaans
  42. Icelandic
  43. Marathi
  44. Kazakh
  45. Maori
  46. Swahili
  47. Nepali
  48. Armenian
  49. Belarusian
  50. Kannada
  51. Tajik
  52. Occitan
  53. Lingala
  54. Maltese
  55. Luxembourgish
  56. Hausa
  57. Javanese
  58. Pashto
  59. Uzbek
  60. Khmer
  61. Georgian
  62. Telugu
  63. Malayalam
  64. Lao
  65. Punjabi
  66. Somali
  67. Gujarati
  68. Bengali
  69. Assamese
  70. Mongolian
  71. Yoruba
  72. Myanmar
  73. Amharic
  74. Shona
  75. Sindhi

3. How to Fix the “RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.cuda.FloatTensor) should be the same” Error While Running Whisper?

To fix this error while running Whisper, pass fp16=False to the transcribe() call that produces the “result” variable.
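For instance, with the Python sketch from earlier in this article, the call would look like this (file name again illustrative):

result = model.transcribe("audio.mp3", fp16=False)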
