explaingit

koljab/realtimevoicechat

Analysis updated 2026-07-03

3,721 stars · Python · Audience · developer · Complexity · 4/5 · Setup · hard

TLDR

Self-hosted Python backend that lets you have a real-time spoken conversation with an AI language model through your browser, with low-latency speech recognition, AI responses, and voice synthesis you can interrupt mid-sentence.

Mindmap

mindmap
  root((RealtimeVoiceChat))
    What it does
      Browser voice chat
      Real-time AI responses
      Interruptible speech
    Pipeline
      Browser microphone
      WebSocket stream
      Speech to text
      AI language model
      Text to speech
    Tech stack
      Python FastAPI
      Docker
      Ollama or OpenAI
      Kokoro or Coqui
    Requirements
      NVIDIA GPU
      Linux recommended
      Windows script available
    Status
      Community maintained

What do people build with it?

USE CASE 1

Run a real-time voice AI assistant locally on your own machine without sending audio to a cloud service

USE CASE 2

Build a browser-based voice interface for an Ollama model with interruptible, natural-sounding speech responses

USE CASE 3

Connect the voice pipeline to OpenAI's API instead of a local model for higher-quality responses

USE CASE 4

Experiment with different text-to-speech engines like Kokoro, Coqui, or Orpheus to find the best voice quality for your use case

What is it built with?

Python, FastAPI, WebSocket, Docker, Ollama, Kokoro, Coqui

How does it compare?

| | koljab/realtimevoicechat | insanum/gcalcli | allenai/open-instruct |
|---|---|---|---|
| Stars | 3,721 | 3,721 | 3,720 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | hard |
| Complexity | 4/5 | 2/5 | 5/5 |
| Audience | developer | developer | researcher |

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard · Time to first run · 1h+

Requires a powerful NVIDIA GPU; without one, the real-time feel breaks down.

In plain English

RealtimeVoiceChat lets you have a spoken conversation with an AI language model through your web browser. You speak, it listens and responds in near real-time with synthesized speech, and you can interrupt it mid-sentence just as you would in a normal conversation.

The flow works like this: your browser captures your microphone audio and streams it over a WebSocket connection to a Python backend. The backend transcribes your speech to text using a library called RealtimeSTT, sends the text to an AI language model for a response, then converts the response back to speech using RealtimeTTS and streams the audio back to your browser. The whole pipeline is designed to keep the delay between you finishing a sentence and the AI starting its reply as short as possible.

The AI language model backend is pluggable. By default it connects to Ollama, a tool for running open-source AI models locally on your own machine. You can also configure it to use OpenAI's API instead. For speech synthesis, you can choose between several text-to-speech engines: Kokoro, Coqui, or Orpheus. The turn-detection logic watches for pauses in your speech to decide when you have finished talking, and it adapts to the pace of the conversation.

Running this project requires a reasonably powerful NVIDIA graphics card (GPU). Without one, the speech recognition and synthesis models run much more slowly and the real-time feel breaks down. The recommended setup uses Docker on Linux, which bundles the application and its dependencies into containers. A Windows installation script is also provided. The backend is built with Python and FastAPI.

The original author is no longer actively adding features or providing support, and the project is now community-maintained. Pull requests from contributors are still reviewed and merged periodically.
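The turn-detection behavior described above (watching pauses and adapting to the pace of the conversation) can be illustrated with a small Python sketch. This is not RealtimeVoiceChat's actual implementation: the class name `AdaptiveTurnDetector`, the moving-average rule, and every numeric default are hypothetical choices made for illustration. The idea is that silence only ends your turn once it lasts noticeably longer than your own typical mid-sentence pause.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveTurnDetector:
    """Hypothetical end-of-turn detector (NOT the repo's real code).

    A turn ends when silence exceeds a threshold that tracks the
    speaker's recent mid-turn pauses, clamped to a sane range.
    """

    min_threshold: float = 0.4  # seconds; never end a turn faster than this
    max_threshold: float = 1.5  # seconds; never wait longer than this
    factor: float = 2.5         # turn-ending silence must exceed a typical pause by this factor
    alpha: float = 0.3          # weight of the newest pause in the moving average
    avg_pause: float = 0.3      # running estimate of a typical mid-turn pause (seconds)

    def threshold(self) -> float:
        """Silence (in seconds) after which the turn is considered over."""
        raw = self.avg_pause * self.factor
        return max(self.min_threshold, min(self.max_threshold, raw))

    def observe_pause(self, seconds: float) -> None:
        """Record a pause that did NOT end the turn (the user kept talking).

        Updates an exponential moving average so the threshold follows
        the speaker's pace: fast talkers get a shorter wait, slow talkers
        a longer one.
        """
        self.avg_pause = self.alpha * seconds + (1 - self.alpha) * self.avg_pause

    def is_turn_over(self, silence_so_far: float) -> bool:
        """True once the current silence has lasted at least the threshold."""
        return silence_so_far >= self.threshold()


if __name__ == "__main__":
    detector = AdaptiveTurnDetector()
    # A speaker who keeps resuming after short pauses lowers the threshold.
    for pause in (0.15, 0.2, 0.1):
        detector.observe_pause(pause)
    print(f"current end-of-turn threshold: {detector.threshold():.2f}s")
```

The clamping is the key design choice in this sketch: without a floor, a very fast speaker would be cut off by ordinary breaths, and without a ceiling, a slow speaker would make the assistant feel sluggish. Only pauses after which speech resumed are fed into the average, so the pause that actually ends a turn does not inflate the estimate.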

Copy-paste prompts

Prompt 1
Set up realtimevoicechat with Docker on Linux so I can talk to a local Llama 3 model running in Ollama from my browser
Prompt 2
Show me how to configure realtimevoicechat to use OpenAI's API instead of Ollama, and switch the TTS engine to Kokoro
Prompt 3
How does the turn-detection logic in realtimevoicechat decide when I've finished speaking, and how can I tune the pause threshold?
Prompt 4
Walk me through the WebSocket message flow in realtimevoicechat from browser microphone input to AI audio response

Frequently asked questions

What is realtimevoicechat?

Self-hosted Python backend that lets you have a real-time spoken conversation with an AI language model through your browser, with low-latency speech recognition, AI responses, and voice synthesis you can interrupt mid-sentence.

What language is realtimevoicechat written in?

Mainly Python. The stack also includes FastAPI and WebSocket.

How hard is realtimevoicechat to set up?

Setup difficulty is rated hard, with roughly an hour or more to a first successful run.

Who is realtimevoicechat for?

Mainly developers.

Verify against the repo before relying on details.