Build a real-time voice chatbot that detects when the user stops speaking, sends the audio to an AI model, and streams the response back as speech.
Add live webcam object detection to a FastAPI app by wrapping your detection function with FastRTC.
Let users reach your Python AI voice assistant from a regular phone line through a temporary phone number provided via Hugging Face.
Mount FastRTC onto an existing FastAPI server to add real-time audio without rewriting your app.
The phone-number feature requires a Hugging Face account; production WebRTC deployments require HTTPS.
FastRTC is a Python library for adding real-time audio and video streaming to an application with very little code. You write a regular Python function that processes audio or video, and FastRTC handles the plumbing that turns that function into a live stream a browser or phone can connect to.
The library supports two transport protocols, WebRTC and WebSockets, both available through the same interface. WebRTC is the standard technology browsers use for video calls and is designed for low-latency, peer-to-peer media. WebSockets are a simpler alternative for cases where WebRTC is not needed.
For voice applications, FastRTC includes built-in voice activity detection. It detects when the user has stopped speaking and passes the audio captured up to that point to your function. You therefore don't need to write your own silence detection to decide when a speaker has finished a sentence before sending the audio to a speech recognition model or another AI model. A text-to-speech layer is also available as an optional install.
A stream can be deployed in several ways. You can launch a quick test interface built on Gradio with a single method call. You can mount the stream onto an existing FastAPI web server to integrate it into a larger application. You can also call a method that gives you a temporary phone number, so someone can call into your stream over a regular phone line. This last option requires a Hugging Face account.
The README includes several demo applications built with the library: real-time voice conversations with models like Gemini, ChatGPT, and Claude; live speech transcription using Whisper; webcam object detection; and a voice-controlled code editor. Each example links to a live demo and its source code on Hugging Face Spaces.
The library is published on PyPI and installs with pip.
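To make the pause-detection idea concrete, the sketch below shows the general technique (often called endpointing) using only the standard library and a simple energy threshold. It is not FastRTC's implementation. FastRTC ships its own voice activity detection, and in its README a handler is wrapped with `ReplyOnPause` and passed to `Stream`, so check the repo for the real API. Everything here is hypothetical and chosen for illustration: the names `reply_when_quiet`, `rms`, and `echo`, the frame format, and the `threshold` and `pause_frames` values. Audio is modelled as a `(sample_rate, samples)` tuple of plain floats, whereas FastRTC works with NumPy arrays.

```python
import math
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical audio type for this sketch: (sample_rate, samples in [-1.0, 1.0]).
# FastRTC passes NumPy arrays; plain lists keep the example dependency-free.
Audio = Tuple[int, List[float]]


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def reply_when_quiet(
    handler: Callable[[Audio], Iterator[Audio]],
    sample_rate: int,
    threshold: float = 0.02,  # hypothetical RMS level separating speech from silence
    pause_frames: int = 3,    # hypothetical number of quiet frames that ends a turn
) -> Callable[[Iterable[List[float]]], Iterator[Audio]]:
    """Toy energy-based endpointing; NOT FastRTC's voice activity detection."""

    def run(frames: Iterable[List[float]]) -> Iterator[Audio]:
        buffer: List[float] = []
        quiet = 0
        speaking = False
        for frame in frames:
            if rms(frame) >= threshold:
                # Speech: start or continue the current utterance.
                speaking = True
                quiet = 0
                buffer.extend(frame)
            elif speaking:
                # Silence after speech: keep it and count it toward the pause.
                quiet += 1
                buffer.extend(frame)
                if quiet >= pause_frames:
                    # The user has paused: hand the whole utterance to the handler.
                    yield from handler((sample_rate, buffer))
                    buffer, quiet, speaking = [], 0, False
            # Silence before any speech is simply dropped.
        # An utterance still in progress when the input ends is not flushed here.

    return run


def echo(audio: Audio) -> Iterator[Audio]:
    """Example handler: send the user's utterance straight back."""
    yield audio


if __name__ == "__main__":
    loud = [0.5, -0.5] * 80  # 160 samples, RMS 0.5
    silent = [0.0] * 160     # 160 samples, RMS 0.0
    frames = [silent, loud, loud, silent, silent, silent, silent, loud, silent, silent, silent]
    for rate, samples in reply_when_quiet(echo, 16000)(frames):
        print(rate, len(samples))  # prints "16000 800", then "16000 640"
```

The handler only runs after three quiet frames follow speech, so each utterance is delivered as a whole rather than frame by frame. A fixed RMS threshold is easily fooled by background noise, which is one reason production detectors typically rely on a trained model instead.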
Verify against the repo before relying on details.