explaingit

livekit/agents

10,463 | Python | Audience · developer | Complexity · 3/5 | Setup · moderate

TLDR

A Python framework for building real-time voice AI agents that listen, think, and speak in live audio sessions, letting you mix and swap speech-to-text, language model, and text-to-speech providers independently.

Mindmap

mindmap
  root((LiveKit Agents))
    What it does
      Real-time voice AI
      Listen and respond
      Live audio sessions
    Components
      Agent instructions
      AgentSession pipeline
      AgentServer process
    Features
      Provider swapping
      Agent handoff
      Phone call support
      MCP tool calls
    Ecosystem
      WebRTC transport
      Multiple LLM providers
      JavaScript sibling

Things people build with this

USE CASE 1

Build a phone-based customer service agent that answers inbound calls, understands speech, and replies with a synthesized voice.

USE CASE 2

Create a real-time voice assistant that calls external APIs mid-conversation using MCP tool integration.

USE CASE 3

Route a caller from one AI agent to another mid-session, such as handing off from a greeter to a billing specialist.

USE CASE 4

Run a voice AI agent that dials outbound calls through a telephony integration.

Tech stack

Python | WebRTC | LiveKit | OpenAI | Deepgram

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires a LiveKit server or LiveKit Cloud account plus API keys for your chosen speech-to-text, language model, and text-to-speech providers.

In plain English

This is a Python framework for building voice AI agents that run on servers and talk to people in real time. An agent built with this library can listen to speech, understand what was said, generate a reply using a language model, and speak the reply back, all within a live audio or video session. The underlying connection technology is WebRTC, which is what browsers and apps use for real-time communication without noticeable delay.

The framework is designed around a few building blocks. An Agent holds the instructions that tell the language model how to behave. An AgentSession manages the ongoing conversation, wiring together the speech-to-text, language model, and text-to-speech components you choose. An AgentServer is the process that runs on your machine or cloud server, waiting for users to connect and then launching a session for each one. You write an entrypoint function that describes what should happen when a user joins, similar to how a web server handles an incoming request.

One of the key design choices is that each component (the part that converts speech to text, the language model, and the part that converts text back to speech) can be swapped independently. You can mix providers from OpenAI, Deepgram, Cartesia, and others, or route everything through LiveKit's own inference layer. The README shows a working example that sets up a simple weather assistant in around 30 lines of code.

Beyond simple back-and-forth conversations, the framework supports handing off a conversation from one agent to another mid-session, which is useful for routing callers or switching personas. It also supports phone calls: the agent can dial out or receive inbound calls via a telephony integration. There is a built-in mechanism for detecting when a user has finished speaking before the agent responds, which reduces the chance of the agent interrupting mid-sentence. MCP tool integration is also supported, letting the agent call external services as part of a conversation.

The library is fully open-source. A JavaScript version called AgentsJS exists in a separate repository for teams building on Node rather than Python.
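The swappable-component idea described above can be illustrated with a small, library-free sketch. It does not use the LiveKit Agents API: `VoicePipeline`, `FakeSTT`, `EchoLLM`, `PlainTTS`, and `ShoutingTTS` are hypothetical names invented for this example, and the "audio" is just UTF-8 bytes standing in for real sound. It only shows why defining each stage behind a small interface lets one stage change without touching the others.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical interfaces for the three stages; not LiveKit classes.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, instructions: str, user_text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


# Toy stand-ins for real providers, so the sketch runs offline.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        # Pretend the audio bytes are already the spoken words.
        return audio.decode("utf-8")


class EchoLLM:
    def reply(self, instructions: str, user_text: str) -> str:
        # A real model would use the instructions; this one only echoes.
        return f"You said: {user_text}"


class PlainTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class ShoutingTTS:
    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


@dataclass
class VoicePipeline:
    """Hypothetical stand-in for a session that wires the stages together."""

    instructions: str
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, audio: bytes) -> bytes:
        # Listen, think, speak: one conversational turn.
        user_text = self.stt.transcribe(audio)
        answer = self.llm.reply(self.instructions, user_text)
        return self.tts.synthesize(answer)


pipeline = VoicePipeline("Be brief.", FakeSTT(), EchoLLM(), PlainTTS())
first = pipeline.handle_turn(b"hello")

# Swap only the text-to-speech stage; the other stages are untouched.
pipeline.tts = ShoutingTTS()
second = pipeline.handle_turn(b"hello")
```

In the real framework the equivalent wiring is done by AgentSession with provider plugins; consult the repository's README for the actual class names and constructor arguments.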

Copy-paste prompts

Prompt 1
Using the LiveKit Agents framework, write a Python entrypoint that creates a customer support agent using OpenAI as the language model and Deepgram for speech-to-text.
Prompt 2
I want to add MCP tool integration to my LiveKit Agent so it can look up live data during a call. Show me how to wire that up.
Prompt 3
How do I implement agent handoff in LiveKit Agents, transferring a caller from a general intake agent to a specialist agent mid-conversation?
Prompt 4
Help me deploy a LiveKit Agent to a cloud server and connect it to a telephony provider to receive inbound phone calls.
Prompt 5
Show me how to swap the text-to-speech provider in my LiveKit Agent from one vendor to another without rewriting the rest of the code.

Verify against the repo before relying on details.