Build a phone-based customer service agent that answers inbound calls, understands speech, and replies with a synthesized voice.
Create a real-time voice assistant that calls external APIs mid-conversation using MCP tool integration.
Route a caller from one AI agent to another mid-session, such as handing off from a greeter to a billing specialist.
Run a voice AI agent that dials outbound calls through a telephony integration.
Requires a LiveKit server or LiveKit Cloud account plus API keys for your chosen speech-to-text, language model, and text-to-speech providers.
This is a Python framework for building voice AI agents that run on servers and talk to people in real time. An agent built with this library can listen to speech, understand what was said, generate a reply using a language model, and speak the reply back, all within a live audio or video session. The underlying connection technology is WebRTC, which is what browsers and apps use for real-time communication without noticeable delay.

The framework is designed around a few building blocks. An Agent holds the instructions that tell the language model how to behave. An AgentSession manages the ongoing conversation, wiring together the speech-to-text, language model, and text-to-speech components you choose. An AgentServer is the process that runs on your machine or cloud server, waiting for users to connect and then launching a session for each one. You write an entrypoint function that describes what should happen when a user joins, similar to how a web server handles an incoming request.

One of the key design choices is that each component (the part that converts speech to text, the language model, and the part that converts text back to speech) can be swapped independently. You can mix providers from OpenAI, Deepgram, Cartesia, and others, or route everything through LiveKit's own inference layer. The README shows a working example that sets up a simple weather assistant in around 30 lines of code.

Beyond simple back-and-forth conversations, the framework supports handing off a conversation from one agent to another mid-session, which is useful for routing callers or switching personas. It also supports phone calls: the agent can dial out or receive inbound calls via a telephony integration. There is a built-in mechanism for detecting when a user has finished speaking before the agent responds, which reduces the chance of the agent interrupting mid-sentence. MCP tool integration is also supported, letting the agent call external services as part of a conversation.
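The swappable-component design and the mid-session handoff described above can be illustrated with a small toy model in plain Python. This is not the LiveKit Agents API: every name here (`ToyAgent`, `ToySession`, `FakeSTT`, `EchoLLM`, `FakeTTS`, `handoff_to`) is hypothetical, and the "audio" is just UTF-8 bytes. It is a sketch of the idea that a session wires three independent parts together and can switch the active agent between turns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Protocol


# Each stage is an interface, so any provider that matches it can be swapped in.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, instructions: str, user_text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


def _no_handoff(text: str) -> Optional[str]:
    """Default rule: never hand the conversation to another agent."""
    return None


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: a name plus model instructions."""
    name: str
    instructions: str
    # Returns the name of the agent to switch to, or None to stay put.
    handoff_to: Callable[[str], Optional[str]] = _no_handoff


@dataclass
class ToySession:
    """Hypothetical stand-in for a session that wires the three stages together."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech
    agent: ToyAgent
    agents: Dict[str, ToyAgent] = field(default_factory=dict)

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        target = self.agent.handoff_to(text)
        if target is not None:
            # Mid-session handoff: later turns use the new agent's instructions.
            self.agent = self.agents[target]
        answer = self.llm.reply(self.agent.instructions, text)
        return self.tts.synthesize(answer)


# Fake providers so the sketch runs without any network access or API keys.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the audio is already text


class EchoLLM:
    def reply(self, instructions: str, user_text: str) -> str:
        return f"[{instructions}] You said: {user_text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


greeter = ToyAgent(
    "greeter",
    "Greet the caller",
    handoff_to=lambda t: "billing" if "invoice" in t.lower() else None,
)
billing = ToyAgent("billing", "Answer billing questions")

session = ToySession(
    FakeSTT(), EchoLLM(), FakeTTS(), agent=greeter, agents={"billing": billing}
)
first = session.handle_turn(b"Hello there")
second = session.handle_turn(b"I have a question about my invoice")
```

Replacing `FakeSTT` with any other object that has a matching `transcribe` method changes the speech-to-text stage without touching the other two, which is the property the description attributes to the framework. The real library's class names, constructor arguments, and handoff mechanism should be taken from its README rather than from this sketch.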
The library is fully open-source. A JavaScript version called AgentsJS exists in a separate repository for teams building on Node rather than Python.
Verify against the repo before relying on details.