Add microphone-based voice input to a Python app using a cloud speech API like Google or Azure with just a few lines of code.
Transcribe audio files entirely offline using OpenAI Whisper or Vosk without sending data to any external service.
Build a background listening loop in Python that waits for speech, transcribes it, and passes the text to another function.
Experiment with multiple speech recognition providers without rewriting your app by just changing one config value.
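Switching providers through a single config value could look roughly like the sketch below. The `ENGINE_METHODS` table and the `transcribe` and `transcribe_file` helpers are hypothetical names made up for illustration, not part of the library. The `recognize_*` method names are the ones commonly exposed on `Recognizer`, but check them against the version you have installed.

```python
# Hypothetical mapping from a config value to a Recognizer method name.
# Verify these method names against your installed SpeechRecognition version.
ENGINE_METHODS = {
    "google": "recognize_google",
    "sphinx": "recognize_sphinx",
    "vosk": "recognize_vosk",
    "whisper": "recognize_whisper",
}


def transcribe(recognizer, audio, engine="google", **kwargs):
    """Dispatch to the recognizer method selected by the `engine` config value."""
    try:
        method_name = ENGINE_METHODS[engine]
    except KeyError:
        raise ValueError(f"Unknown engine: {engine!r}") from None
    # Extra keyword arguments (for example language=...) are passed through unchanged.
    return getattr(recognizer, method_name)(audio, **kwargs)


def transcribe_file(path, engine="google"):
    """Read a whole audio file and transcribe it with the chosen engine."""
    # Imported lazily so the dispatch helper above works without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # capture the entire file
    return transcribe(recognizer, audio, engine=engine)
```

With this layout, moving from a cloud service to an offline engine is a matter of changing the `engine` string, provided the extra package for that engine is installed.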
Core install is `pip install SpeechRecognition`. Microphone input also needs PyAudio. Offline engines like Whisper or Vosk require their own separate packages.
SpeechRecognition is a Python library that converts spoken audio into text. Its main strength is that it acts as a unified wrapper around many different speech recognition services and engines, so you can switch between them without rewriting your code. You install it with a single pip command and can have audio transcribed with just a few lines of Python.

The services it supports span both online APIs and options that work entirely offline. Online options include Google Speech Recognition, Google Cloud Speech, Microsoft Azure Speech, Wit.ai, IBM Speech to Text, Groq's Whisper API, and the Cohere Transcribe API. Offline options include CMU Sphinx, Vosk, Snowboy (for detecting specific trigger words), and OpenAI Whisper running locally on your own machine. OpenAI-compatible self-hosted servers such as Ollama are also supported through the OpenAI API path.

You can feed audio to the library from a microphone connected to your computer or from an audio file. The library includes tools to help manage microphone input, such as calibrating sensitivity to the ambient noise level in the room and listening in the background while other code keeps running. Examples in the repository demonstrate common tasks: recording from a microphone, transcribing a file, saving audio to disk, and adjusting recognition settings.

Not every dependency is required upfront. The core package installs cleanly and you only add the additional libraries for the specific engine you want to use. For instance, microphone input requires PyAudio, Sphinx requires PocketSphinx, Whisper requires the whisper package, and Vosk requires the vosk package. This keeps the installation lightweight if you only need one or two of the supported engines.

The library is designed for Python 3.9 and above and is distributed via PyPI. Source code and an issue tracker are on GitHub. The README includes a complete reference document for every public class and method.
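The ambient-noise calibration and background listening described above can be combined into a small listening loop. The following is a minimal sketch, not the repository's own example: `make_callback`, `start_listening`, and `handle_text` are hypothetical names, and the free Google recognizer is assumed as the engine. Microphone access also assumes PyAudio is installed.

```python
def make_callback(handle_text, on_error=print):
    """Build a callback with the (recognizer, audio) signature used by listen_in_background."""

    def callback(recognizer, audio):
        try:
            # Assumption: the Google Web Speech recognizer; swap in another recognize_* method as needed.
            text = recognizer.recognize_google(audio)
        except Exception as exc:
            # Broad catch for brevity; in practice this is typically
            # sr.UnknownValueError (unintelligible speech) or sr.RequestError (service failure).
            on_error(exc)
        else:
            handle_text(text)

    return callback


def start_listening(handle_text):
    """Calibrate for room noise, then listen on a background thread.

    Returns the stop function produced by listen_in_background.
    """
    # Imported lazily so make_callback can be used and tested without a microphone.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    with microphone as source:
        # Sample the ambient noise briefly to set the energy threshold.
        recognizer.adjust_for_ambient_noise(source)
    # The microphone is passed unopened; the background thread opens it itself.
    return recognizer.listen_in_background(microphone, make_callback(handle_text))
```

A caller might run `stop = start_listening(print)`, keep the main thread busy with other work, and later call `stop(wait_for_stop=False)` to end the background thread.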
Verify against the repo before relying on details.