Analysis updated 2026-05-18
Build a WhatsApp voice assistant that automatically responds to audio messages with synthesized speech.
Create a custom WhatsApp voice bot with any combination of OpenAI, Anthropic, Groq, ElevenLabs, or Orchard Run.
Test the full voice pipeline locally using Docker and ngrok before deploying to a cloud server.
| | orchard-run/orchard-meta-voice-agent | 0-bingwu-0/live-interpreter | 0xkaz/llm-governance-dashboard |
|---|---|---|---|
| Stars | 2 | 2 | 2 |
| Language | Python | Python | Python |
| Setup difficulty | hard | moderate | hard |
| Complexity | 3/5 | 2/5 | 4/5 |
| Audience | developer | general | ops devops |
Figures from each repo's GitHub metadata at analysis time.
Requires a Meta developer account with WhatsApp Cloud API access, plus API keys for your chosen STT, LLM, and TTS providers.
This is a server that turns WhatsApp voice messages into AI-powered voice conversations. When someone sends an audio message to a WhatsApp number connected to this server, the server downloads it, runs it through a pipeline of three AI services, and sends a synthesized voice response back. The whole thing starts with a single Docker command.

The pipeline works in three steps. First, a speech-to-text service converts the incoming audio into a text transcript. Second, a language model reads that transcript and generates a text response. Third, a text-to-speech service converts the response back into an audio file, which the server sends to the user through WhatsApp's cloud API.

Each of the three steps can use a different service provider, configured through environment variables. For speech-to-text you can use Orchard Run, OpenAI, or Deepgram. For the language model you can use OpenAI, Anthropic, or Groq. For text-to-speech you can use Orchard Run, OpenAI, or ElevenLabs. The defaults use Orchard Run for both audio steps and OpenAI for the language model.

The integration with WhatsApp uses Meta's WhatsApp Cloud API webhook system. When a user sends an audio message, Meta sends a notification to the server's webhook URL. The server then uses Meta's API to download the audio file, process it, and send the response. For local development, you can use ngrok, a tool that creates a public URL forwarding to your local machine, so Meta can reach your server while testing.

The code is organized as a FastAPI application: one file handles the webhook, another orchestrates the three-step pipeline, and separate provider folders contain integrations for each AI service. No license is stated in the README.
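To make the webhook step more concrete, here is a minimal sketch of how a handler might pick audio messages out of a WhatsApp Cloud API notification. The nested `entry` / `changes` / `value` / `messages` layout reflects Meta's documented notification format as commonly described; the function name `extract_audio_messages` and the sample values are hypothetical, and this is not the repo's actual code.

```python
from typing import Any


def extract_audio_messages(payload: dict[str, Any]) -> list[dict[str, str]]:
    """Return the sender and media ID of every audio message in a payload.

    Hypothetical helper: the repo's own webhook handler may look different.
    """
    found: list[dict[str, str]] = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            value = change.get("value", {})
            for message in value.get("messages", []):
                if message.get("type") != "audio":
                    continue  # skip text, images, and other message types
                audio = message.get("audio", {})
                if "id" in audio:
                    found.append(
                        {"from": message.get("from", ""), "media_id": audio["id"]}
                    )
    return found


# Illustrative payload with placeholder values (not real identifiers).
sample_payload = {
    "object": "whatsapp_business_account",
    "entry": [
        {
            "changes": [
                {
                    "field": "messages",
                    "value": {
                        "messages": [
                            {"from": "15550000001", "type": "text",
                             "text": {"body": "hi"}},
                            {"from": "15550000001", "type": "audio",
                             "audio": {"id": "MEDIA_ID_PLACEHOLDER"}},
                        ]
                    },
                }
            ]
        }
    ],
}

audio_messages = extract_audio_messages(sample_payload)
```

The media ID is what the server would then pass to Meta's API to download the audio file before handing it to the pipeline.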
A Docker-based server that receives WhatsApp audio messages, runs them through a configurable speech-to-text plus LLM plus text-to-speech pipeline, and replies with a synthesized voice.
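The configurable pipeline can be illustrated with a short sketch in which each stage is looked up by name from an environment-style mapping. Everything here is an assumption for illustration: the variable names `STT_PROVIDER`, `LLM_PROVIDER`, and `TTS_PROVIDER`, the registry keys, and the stub functions stand in for real provider clients and are not taken from the repo.

```python
import os
from typing import Callable, Mapping


# Stub stages standing in for real API clients (hypothetical, no network calls).
def stub_stt(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript


def stub_llm(transcript: str) -> str:
    return f"You said: {transcript}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is synthesized audio


# Registry keys mirror the providers named in the description.
STT_PROVIDERS: dict[str, Callable[[bytes], str]] = {
    "orchard": stub_stt, "openai": stub_stt, "deepgram": stub_stt,
}
LLM_PROVIDERS: dict[str, Callable[[str], str]] = {
    "openai": stub_llm, "anthropic": stub_llm, "groq": stub_llm,
}
TTS_PROVIDERS: dict[str, Callable[[str], bytes]] = {
    "orchard": stub_tts, "openai": stub_tts, "elevenlabs": stub_tts,
}


def pick(registry: Mapping[str, Callable], env: Mapping[str, str],
         var: str, default: str) -> Callable:
    """Look up the provider named in env[var], falling back to the default."""
    name = env.get(var, default).lower()
    if name not in registry:
        raise ValueError(f"Unknown provider {name!r} for {var}")
    return registry[name]


def run_pipeline(audio: bytes, env: Mapping[str, str] = os.environ) -> bytes:
    """Speech-to-text, then language model, then text-to-speech."""
    stt = pick(STT_PROVIDERS, env, "STT_PROVIDER", "orchard")
    llm = pick(LLM_PROVIDERS, env, "LLM_PROVIDER", "openai")
    tts = pick(TTS_PROVIDERS, env, "TTS_PROVIDER", "orchard")
    return tts(llm(stt(audio)))


reply = run_pipeline(b"hello", env={})
```

Passing the environment as a mapping keeps the sketch easy to test; a real deployment would presumably read the process environment set through Docker.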
Mainly Python. The stack also includes FastAPI and Docker.
Setup difficulty is rated hard, with roughly 1h+ to a first successful run.
Aimed mainly at developers.
Verify against the repo before relying on details.