uberi/speech_recognition

★ 8,963
Python
Audience · developer
Complexity · 2/5
Setup · easy

Mindmap

mindmap
  root((repo))
    What it does
      Speech to text
      Unified API wrapper
    Online Services
      Google Speech
      Azure Speech
      Groq Whisper API
      Wit.ai and IBM
    Offline Engines
      OpenAI Whisper local
      Vosk
      CMU Sphinx
    Input Sources
      Microphone
      Audio files
    Audience
      Python developers
      AI app builders

Things people build with this

USE CASE 1

Add microphone-based voice input to a Python app using a cloud speech API like Google or Azure with just a few lines of code.
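Here is a minimal sketch of this use case. It assumes SpeechRecognition and PyAudio are installed and a default microphone is available. The function name `transcribe_from_microphone` is hypothetical. `Recognizer`, `Microphone`, `adjust_for_ambient_noise`, `listen` and `recognize_google` are the library's own names. The import sits inside the function so the file still loads on machines where the optional packages are missing.

```python
from typing import Optional


def transcribe_from_microphone(language: str = "en-US") -> Optional[str]:
    """Record one phrase from the default microphone and transcribe it with Google."""
    # Lazy import: SpeechRecognition and PyAudio are only needed when this runs.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample about 1 s of room noise to set the energy threshold before listening.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Say something...")
        audio = recognizer.listen(source)  # blocks until a pause follows the speech

    try:
        # Sends the audio to Google's web speech service (network access required).
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the speech could not be understood
    except sr.RequestError as exc:
        raise RuntimeError(f"Google Speech Recognition request failed: {exc}") from exc
```

Calling `transcribe_from_microphone()` returns the recognized text. It returns `None` when the speech was unintelligible, and network or quota failures surface as a `RuntimeError`.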

USE CASE 2

Transcribe audio files entirely offline using OpenAI Whisper or Vosk without sending data to any external service.
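The following sketch covers the offline case. It assumes SpeechRecognition and the `openai-whisper` package are installed, and `transcribe_file_offline` is a hypothetical name. `AudioFile` reads the whole file into an `AudioData` object, and `recognize_whisper` runs the model on the local machine, so the audio itself never leaves it. The model weights are downloaded once on first use and then cached.

```python
def transcribe_file_offline(path: str, model: str = "base") -> str:
    """Transcribe an audio file locally with OpenAI Whisper through SpeechRecognition."""
    # Lazy import: SpeechRecognition and openai-whisper are only needed when this runs.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # AudioFile accepts WAV, AIFF/AIFF-C and FLAC input.
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file into AudioData

    # Runs the chosen Whisper model on this machine; no audio is sent to a service.
    return recognizer.recognize_whisper(audio, model=model)
```

Smaller models such as `"tiny"` trade accuracy for speed. `"base"` is used here only as a middle-of-the-road default.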

USE CASE 3

Build a background listening loop in Python that waits for speech, transcribes it, and passes the text to another function.
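This use case maps onto the library's `listen_in_background` method. The method records on a separate thread and calls a callback with `(recognizer, audio)` for each captured phrase. The sketch assumes SpeechRecognition, PyAudio and a working microphone. `run_background_listener` and `handle_text` are hypothetical names, and Google's recognizer is just one possible engine choice.

```python
import time
from typing import Callable


def run_background_listener(handle_text: Callable[[str], None], seconds: float = 30.0) -> None:
    """Transcribe phrases on a background thread for `seconds`, passing each result on."""
    # Lazy import: SpeechRecognition and PyAudio are only needed when this runs.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    microphone = sr.Microphone()
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate once before listening starts

    def on_phrase(rec, audio):
        # Called from the library's background thread once per captured phrase.
        try:
            handle_text(rec.recognize_google(audio))
        except sr.UnknownValueError:
            pass  # skip phrases that could not be understood
        except sr.RequestError as exc:
            print(f"Recognition request failed: {exc}")

    # listen_in_background opens the microphone itself, so pass the Microphone, not `source`.
    stop_listening = recognizer.listen_in_background(microphone, on_phrase)
    try:
        time.sleep(seconds)  # the main thread is free to do other work here instead
    finally:
        stop_listening(wait_for_stop=False)  # ask the background thread to stop
```

For example, `run_background_listener(print, seconds=10)` prints each phrase heard during the next ten seconds.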

USE CASE 4

Experiment with multiple speech recognition providers without rewriting your app, swapping engines by changing which recognizer method is called.
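The library exposes each engine as its own `recognize_*` method on `Recognizer`, so switching providers can come down to a single string that picks the method. The `ENGINES` table and `recognize_with` helper below are hypothetical and not part of the library. Engine-specific arguments such as API keys or model names still have to be passed separately for each engine.

```python
from typing import Any

# Hypothetical config table: engine name -> Recognizer method to call.
ENGINES = {
    "google": "recognize_google",    # online, Google's web speech service
    "whisper": "recognize_whisper",  # offline, needs openai-whisper
    "sphinx": "recognize_sphinx",    # offline, needs PocketSphinx
}


def recognize_with(recognizer: Any, audio: Any, engine: str, **options: Any) -> str:
    """Call recognizer.recognize_<engine>(audio, **options) for the configured engine."""
    try:
        method_name = ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown engine {engine!r}; choose from {sorted(ENGINES)}") from None
    # getattr looks up the bound method on the Recognizer instance at call time.
    return getattr(recognizer, method_name)(audio, **options)
```

The engine name could then come from an environment variable, for example `recognize_with(r, audio, os.environ.get("STT_ENGINE", "google"))`, where `STT_ENGINE` is a hypothetical variable name.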

Tech stack

Python · PyAudio · Whisper · Vosk · PocketSphinx

Getting it running

Difficulty · easy
Time to first run · 5 min

Core install is `pip install SpeechRecognition`. Microphone input also needs PyAudio. Offline engines like Whisper or Vosk require their own separate packages.
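The optional engines live in separate packages, so it can help to check which ones are present before picking an engine. This is a small sketch that uses only the standard library's `importlib.util.find_spec`. The module names are the import names these packages commonly install (`speech_recognition`, `pyaudio`, `pocketsphinx`, `whisper`, `vosk`), and `installed_features` is a hypothetical helper. A hit only means that some module with that name is importable, not that it is the expected package.

```python
import importlib.util
from typing import Dict

# Import names of the pieces mentioned above; PyPI names can differ
# (for example, the SpeechRecognition package installs `speech_recognition`).
OPTIONAL_MODULES = {
    "core (SpeechRecognition)": "speech_recognition",
    "microphone (PyAudio)": "pyaudio",
    "CMU Sphinx (PocketSphinx)": "pocketsphinx",
    "local Whisper": "whisper",
    "Vosk": "vosk",
}


def installed_features() -> Dict[str, bool]:
    """Report which modules are importable, without actually importing them."""
    return {
        feature: importlib.util.find_spec(module) is not None
        for feature, module in OPTIONAL_MODULES.items()
    }


for feature, ok in installed_features().items():
    print(f"{feature:28} {'yes' if ok else 'missing'}")
```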

No license information is mentioned in the explanation.

In plain English

SpeechRecognition is a Python library that converts spoken audio into text. Its main strength is that it acts as a unified wrapper around many different speech recognition services and engines, so you can switch between them without rewriting your code. You install it with a single pip command and can have audio transcribed with just a few lines of Python.

The supported services span both online APIs and engines that work entirely offline. Online options include Google Speech Recognition, Google Cloud Speech, Microsoft Azure Speech, Wit.ai, IBM Speech to Text, Groq's Whisper API, and the Cohere Transcribe API. Offline options include CMU Sphinx, Vosk, Snowboy (for detecting specific trigger words), and OpenAI Whisper running locally on your own machine. OpenAI-compatible self-hosted servers such as Ollama are also supported through the OpenAI API path.

You can feed audio to the library from a microphone connected to your computer or from an audio file. The library includes tools to help manage microphone input, such as calibrating sensitivity to the ambient noise level in the room and listening in the background while other code keeps running. Examples in the repository demonstrate common tasks: recording from a microphone, transcribing a file, saving audio to disk, and adjusting recognition settings.

Not every dependency is required upfront. The core package installs cleanly, and you add only the libraries needed for the engine you want to use: microphone input requires PyAudio, Sphinx requires PocketSphinx, Whisper requires the whisper package, and Vosk requires the vosk package. This keeps the installation lightweight if you need only one or two of the supported engines.

The library targets Python 3.9 and above and is distributed via PyPI. Source code and an issue tracker are on GitHub. The README includes a complete reference document for every public class and method.
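Two of the tasks the examples cover, saving audio to disk and adjusting recognition settings, fit together in one short sketch. It assumes SpeechRecognition and PyAudio are installed. `record_to_wav` is a hypothetical name, while `pause_threshold`, the `phrase_time_limit` argument of `listen`, and `AudioData.get_wav_data()` belong to the library.

```python
from pathlib import Path


def record_to_wav(path: str, pause_threshold: float = 0.8, phrase_time_limit: float = 10.0) -> Path:
    """Record one phrase from the default microphone and save it as a WAV file."""
    # Lazy import: SpeechRecognition and PyAudio are only needed when this runs.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    # Seconds of silence that mark the end of a phrase (0.8 is the library default).
    recognizer.pause_threshold = pause_threshold
    with sr.Microphone() as source:
        # Calibrate the energy threshold to the room's noise before recording.
        recognizer.adjust_for_ambient_noise(source, duration=1)
        # Stop after phrase_time_limit seconds even if the speaker keeps talking.
        audio = recognizer.listen(source, phrase_time_limit=phrase_time_limit)

    out = Path(path)
    out.write_bytes(audio.get_wav_data())  # AudioData -> WAV-encoded bytes
    return out
```

A longer `pause_threshold` tolerates hesitant speakers but makes each phrase end later. `phrase_time_limit` caps how long a single recording can run.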

Copy-paste prompts

Prompt 1

Using the SpeechRecognition Python library, write a script that records from a microphone, waits for the user to stop talking, and prints the transcribed text using Google Speech Recognition.

Prompt 2

Show me how to use the SpeechRecognition library with OpenAI Whisper running locally to transcribe an audio file without any internet connection.

Prompt 3

Write a Python script that continuously listens in the background using SpeechRecognition and triggers a function whenever it detects speech.

Prompt 4

How do I calibrate the SpeechRecognition library's microphone sensitivity to the ambient noise level in the room before recording?

Verify against the repo before relying on details.