koljab/realtimevoicechat

Analysis updated 2026-07-03

★ 3,721 · Python · Audience · developer · Complexity · 4/5 · Setup · hard

Mindmap

mindmap
  root((RealtimeVoiceChat))
    What it does
      Browser voice chat
      Real-time AI responses
      Interruptible speech
    Pipeline
      Browser microphone
      WebSocket stream
      Speech to text
      AI language model
      Text to speech
    Tech stack
      Python FastAPI
      Docker
      Ollama or OpenAI
      Kokoro or Coqui
    Requirements
      NVIDIA GPU
      Linux recommended
      Windows script available
    Status
      Community maintained

What do people build with it?

USE CASE 1

Run a real-time voice AI assistant locally on your own machine without sending audio to a cloud service

USE CASE 2

Build a browser-based voice interface for an Ollama model with interruptible, natural-sounding speech responses

USE CASE 3

Connect the voice pipeline to OpenAI's API instead of a local model for higher-quality responses

USE CASE 4

Experiment with different text-to-speech engines like Kokoro, Coqui, or Orpheus to find the best voice quality for your use case
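Use cases 3 and 4 both come down to swapping one pipeline component for another. The sketch below illustrates that pluggable shape in plain Python; it is not the project's configuration code. `VoicePipelineConfig`, `LLM_BACKENDS`, `TTS_ENGINES` and the `fake_*` stubs are hypothetical names, the stubs return tagged text instead of calling Ollama, OpenAI, or a real TTS engine, and the `kokoro` default is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator

# Hypothetical type aliases: an LLM backend streams text chunks for a prompt,
# and a TTS engine turns text into audio bytes.
LLMBackend = Callable[[str], Iterator[str]]
TTSEngine = Callable[[str], bytes]


def fake_ollama(prompt: str) -> Iterator[str]:
    """Stub standing in for a local Ollama model; yields text chunks."""
    yield from ("(ollama) ", "echo: ", prompt)


def fake_openai(prompt: str) -> Iterator[str]:
    """Stub standing in for OpenAI's API; yields text chunks."""
    yield from ("(openai) ", "echo: ", prompt)


def fake_tts(engine: str) -> TTSEngine:
    """Stub TTS factory: tags the text with the engine name instead of synthesizing audio."""
    def synthesize(text: str) -> bytes:
        return f"[{engine}]{text}".encode("utf-8")
    return synthesize


# Registries keyed by the names a user would put in a config (hypothetical).
LLM_BACKENDS: Dict[str, LLMBackend] = {"ollama": fake_ollama, "openai": fake_openai}
TTS_ENGINES: Dict[str, TTSEngine] = {
    name: fake_tts(name) for name in ("kokoro", "coqui", "orpheus")
}


@dataclass
class VoicePipelineConfig:
    llm: str = "ollama"  # Ollama is the default backend per the description
    tts: str = "kokoro"  # assumption: the default TTS engine is not stated here

    def build(self) -> Callable[[str], bytes]:
        """Resolve the configured names and return a transcript -> audio function."""
        try:
            llm = LLM_BACKENDS[self.llm]
            tts = TTS_ENGINES[self.tts]
        except KeyError as exc:
            raise ValueError(f"unknown engine name: {exc.args[0]!r}") from None

        def respond(transcript: str) -> bytes:
            # Joining the whole reply keeps the sketch short; a real-time pipeline
            # would feed text chunks to TTS as they arrive.
            reply = "".join(llm(transcript))
            return tts(reply)

        return respond
```

For example, `VoicePipelineConfig(llm="openai", tts="coqui").build()("hello")` returns `b'[coqui](openai) echo: hello'`. Keeping every backend behind the same call signature is what lets a config change swap Ollama for OpenAI, or Kokoro for Coqui, without touching the rest of the pipeline.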

What is it built with?

Python · FastAPI · WebSocket · Docker · Ollama · Kokoro · Coqui

How does it compare?

	koljab/realtimevoicechat	insanum/gcalcli	allenai/open-instruct
Stars	3,721	3,721	3,720
Language	Python	Python	Python
Setup difficulty	hard	moderate	hard
Complexity	4/5	2/5	5/5
Audience	developer	developer	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard · Time to first run · 1h+

Requires a powerful NVIDIA GPU; without one, the real-time feel breaks down.

In plain English

RealtimeVoiceChat lets you have a spoken conversation with an AI language model through your web browser. You speak, it listens and responds in near real time with synthesized speech, and you can interrupt it mid-sentence just as you would in a normal conversation.

The flow works like this: your browser captures your microphone audio and streams it over a WebSocket connection to a Python backend. The backend transcribes your speech to text using a library called RealtimeSTT, sends the text to an AI language model for a response, then converts the response back to speech using RealtimeTTS and streams the audio back to your browser. The whole pipeline is designed to keep the delay between you finishing a sentence and the AI starting its reply as short as possible.

The AI language model backend is pluggable. By default it connects to Ollama, a tool for running open-source AI models locally on your own machine. You can also configure it to use OpenAI's API instead. For speech synthesis, you can choose among several text-to-speech engines: Kokoro, Coqui, or Orpheus. The turn-detection logic watches for pauses in your speech to decide when you have finished talking, and it adapts to the pace of the conversation.

Running this project requires a reasonably powerful NVIDIA graphics card (GPU). Without one, the speech recognition and synthesis models run much more slowly and the real-time feel breaks down. The recommended setup uses Docker on Linux, which bundles the application and its dependencies into containers. A Windows installation script is also provided.

The backend is built with Python and FastAPI. The original author is no longer actively adding features or providing support, and the project is now community-maintained. Pull requests from contributors are still reviewed and merged periodically.
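The description says the turn detection adapts to the pace of the conversation but does not spell out how. The sketch below shows one simple way such adaptation can work: an end-of-turn silence threshold that tracks the user's recent mid-sentence pauses. Everything here is an assumption for illustration, including the class name `AdaptiveTurnDetector`, the 0.7 s starting threshold, the 0.4 to 1.5 s clamp and the 1.5x margin; the project's own logic may weigh other signals entirely.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AdaptiveTurnDetector:
    """Hypothetical end-of-turn detector (illustration only, not the project's code).

    A silence ends the user's turn once it exceeds a threshold derived from
    the user's recent mid-turn pauses, so slower speakers get more time.
    """

    base_threshold_s: float = 0.7  # assumed silence needed before any history exists
    min_threshold_s: float = 0.4   # assumed lower bound: never cut in faster than this
    max_threshold_s: float = 1.5   # assumed upper bound: never wait longer than this
    margin: float = 1.5            # wait this many times longer than a typical pause
    history: deque = field(default_factory=lambda: deque(maxlen=10))

    def record_pause(self, pause_s: float) -> None:
        """Remember a pause after which the user kept talking (a mid-turn pause)."""
        self.history.append(pause_s)

    def threshold(self) -> float:
        """Return the current end-of-turn silence threshold in seconds."""
        if not self.history:
            return self.base_threshold_s
        typical = sum(self.history) / len(self.history)
        return min(self.max_threshold_s, max(self.min_threshold_s, typical * self.margin))

    def is_end_of_turn(self, silence_s: float) -> bool:
        """True once the current silence is long enough to hand the turn to the AI."""
        return silence_s >= self.threshold()


if __name__ == "__main__":
    detector = AdaptiveTurnDetector()
    print(detector.threshold())            # 0.7 (no history yet)
    for pause in (0.6, 0.8, 0.7):          # a slower speaker's mid-sentence pauses
        detector.record_pause(pause)
    print(round(detector.threshold(), 2))  # 1.05 (mean 0.7 s x 1.5 margin)
    print(detector.is_end_of_turn(0.9))    # False: keep listening
    print(detector.is_end_of_turn(1.2))    # True: start the reply
```

Clamping the threshold keeps a few unusually long pauses from making the assistant sluggish, while the rolling average lets the threshold follow a speaker who talks slowly or quickly.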

Copy-paste prompts

Prompt 1

Set up realtimevoicechat with Docker on Linux so I can talk to a local Llama 3 model running in Ollama from my browser

Prompt 2

Show me how to configure realtimevoicechat to use OpenAI's API instead of Ollama, and switch the TTS engine to Kokoro

Prompt 3

How does the turn-detection logic in realtimevoicechat decide when I've finished speaking, and how can I tune the pause threshold?

Prompt 4

Walk me through the WebSocket message flow in realtimevoicechat from browser microphone input to AI audio response

Frequently asked questions

What is realtimevoicechat?

A self-hosted Python backend that lets you hold a real-time spoken conversation with an AI language model through your browser, with low-latency speech recognition, AI responses, and voice synthesis that you can interrupt mid-sentence.

What language is realtimevoicechat written in?

Mainly Python. The stack also includes FastAPI and WebSocket.

How hard is realtimevoicechat to set up?

Setup difficulty is rated hard, with roughly an hour or more to a first successful run.

Who is realtimevoicechat for?

Mainly developers.

Verify against the repo before relying on details.