explaingit

orchard-run/orchard-meta-voice-agent

Analysis updated 2026-05-18

Stars · 2 | Language · Python | Audience · developer | Complexity · 3/5 | Setup · hard

TLDR

A Docker-based server that receives WhatsApp audio messages, runs them through a configurable speech-to-text plus LLM plus text-to-speech pipeline, and replies with a synthesized voice.

Mindmap

mindmap
  root((WhatsApp Voice Agent))
    How it works
      Receive audio message
      Speech to text
      LLM generates reply
      Text to speech
      Send audio response
    Providers
      STT options
      LLM options
      TTS options
    Setup
      Docker compose
      Environment variables
      Meta webhook config
    Local testing
      ngrok tunnel
      FastAPI server

What do people build with it?

USE CASE 1

Build a WhatsApp voice assistant that automatically responds to audio messages with synthesized speech.

USE CASE 2

Create a custom WhatsApp voice bot with any combination of OpenAI, Anthropic, Groq, ElevenLabs, or Orchard Run.

USE CASE 3

Test the full voice pipeline locally using Docker and ngrok before deploying to a cloud server.

What is it built with?

Python, FastAPI, Docker, WhatsApp Cloud API

How does it compare?

Repo | Stars | Language | Setup difficulty | Complexity | Audience
orchard-run/orchard-meta-voice-agent | 2 | Python | hard | 3/5 | developer
0-bingwu-0/live-interpreter | 2 | Python | moderate | 2/5 | general
0xkaz/llm-governance-dashboard | 2 | Python | hard | 4/5 | ops devops

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1h+

Requires a Meta developer account with WhatsApp Cloud API access, plus API keys for your chosen STT, LLM, and TTS providers.
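Because each stage needs its own provider and key, it can help to validate the provider choices before starting the server. The following is a minimal sketch, not the repo's actual code: the provider options and defaults are the ones this page lists (Orchard Run, OpenAI, or Deepgram for STT; OpenAI, Anthropic, or Groq for the LLM; Orchard Run, OpenAI, or ElevenLabs for TTS), while the environment variable names `STT_PROVIDER`, `LLM_PROVIDER`, and `TTS_PROVIDER`, the lowercase provider identifiers, and the function `load_provider_config` are hypothetical. Check the repo's README for the real variable names.

```python
import os

# Provider options and defaults are taken from this page's description.
# The variable names and lowercase identifiers below are hypothetical.
SUPPORTED = {
    "STT_PROVIDER": {"orchard", "openai", "deepgram"},
    "LLM_PROVIDER": {"openai", "anthropic", "groq"},
    "TTS_PROVIDER": {"orchard", "openai", "elevenlabs"},
}
DEFAULTS = {
    "STT_PROVIDER": "orchard",
    "LLM_PROVIDER": "openai",
    "TTS_PROVIDER": "orchard",
}


def load_provider_config(env=None):
    """Return the chosen provider for each stage, falling back to defaults."""
    env = os.environ if env is None else env
    config = {}
    for var, allowed in SUPPORTED.items():
        choice = env.get(var, DEFAULTS[var]).strip().lower()
        if choice not in allowed:
            raise ValueError(f"{var}={choice!r} is not one of {sorted(allowed)}")
        config[var] = choice
    return config
```

Calling `load_provider_config({"LLM_PROVIDER": "anthropic"})` keeps the default audio providers and switches only the language model. An unsupported value raises `ValueError` at startup rather than failing on the first incoming message.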

In plain English

This is a server that turns WhatsApp voice messages into AI-powered voice conversations. When someone sends an audio message to a WhatsApp number connected to this server, the server downloads it, runs it through a pipeline of three AI services, and sends a synthesized voice response back. The whole thing starts with a single Docker command.

The pipeline works in three steps. First, a speech-to-text service converts the incoming audio into a text transcript. Second, a language model reads that transcript and generates a text response. Third, a text-to-speech service converts the response back into an audio file, which the server sends to the user through WhatsApp's cloud API.

Each of the three steps can use a different service provider, configured through environment variables. For speech-to-text you can use Orchard Run, OpenAI, or Deepgram. For the language model you can use OpenAI, Anthropic, or Groq. For text-to-speech you can use Orchard Run, OpenAI, or ElevenLabs. The defaults use Orchard Run for both audio steps and OpenAI for the language model.

The integration with WhatsApp uses Meta's WhatsApp Cloud API webhook system. When a user sends an audio message, Meta sends a notification to the server's webhook URL. The server then uses Meta's API to download the audio file, process it, and send the response. For local development, you can use ngrok, a tool that creates a public URL forwarding to your local machine, so Meta can reach your server while testing.

The code is organized as a FastAPI application: one file handles the webhook, another orchestrates the three-step pipeline, and separate provider folders contain integrations for each AI service. No license is stated in the README.
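The webhook-to-reply flow described above can be sketched in plain Python. This is an illustration under stated assumptions, not the repo's implementation. The payload shape follows the general layout of WhatsApp Cloud API message notifications (`entry` → `changes` → `value` → `messages`) and should be checked against Meta's documentation. The names `extract_audio_messages`, `run_pipeline`, and the `fake_*` providers are hypothetical stand-ins, so the example runs without network access or API keys.

```python
def extract_audio_messages(payload):
    """Return (sender, media_id) pairs for audio messages in a webhook payload."""
    found = []
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            # Status-only notifications carry no "messages" key, so default to [].
            for message in change.get("value", {}).get("messages", []):
                if message.get("type") == "audio":
                    found.append((message["from"], message["audio"]["id"]))
    return found


def run_pipeline(audio_bytes, stt, llm, tts):
    """Chain the three steps: audio -> transcript -> reply text -> reply audio."""
    transcript = stt(audio_bytes)
    reply_text = llm(transcript)
    return tts(reply_text)


# Hypothetical stand-in providers; real ones would call an STT, LLM, or TTS API.
def fake_stt(audio_bytes):
    return audio_bytes.decode("utf-8")


def fake_llm(transcript):
    return f"You said: {transcript}"


def fake_tts(text):
    return text.encode("utf-8")


# Made-up sample notification with one audio message and one text message.
sample_payload = {
    "entry": [{
        "changes": [{
            "value": {
                "messages": [
                    {"from": "15550001111", "type": "audio",
                     "audio": {"id": "media-123"}},
                    {"from": "15550002222", "type": "text",
                     "text": {"body": "hi"}},
                ]
            }
        }]
    }]
}
```

With these stand-ins, `extract_audio_messages(sample_payload)` returns `[("15550001111", "media-123")]` and `run_pipeline(b"hello", fake_stt, fake_llm, fake_tts)` returns `b"You said: hello"`. Passing the three steps in as callables mirrors the idea that each stage can be swapped for a different provider without touching the orchestration code.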

Copy-paste prompts

Prompt 1
Set up this WhatsApp voice agent with Anthropic as the LLM and ElevenLabs as the TTS provider. What .env variables do I need?
Prompt 2
Walk me through connecting Meta's WhatsApp Cloud API webhook to my locally running instance using ngrok.
Prompt 3
How do I add a new STT provider to this project? Show me the folder structure and interface I need to follow.
Prompt 4
I want the LLM to respond as a customer support agent. Where in pipeline.py do I set the system prompt?
Prompt 5
Deploy this WhatsApp voice agent to a cloud server with a stable public URL so I can remove the ngrok dependency.

Frequently asked questions

What is orchard-meta-voice-agent?

A Docker-based server that receives WhatsApp audio messages, runs them through a configurable speech-to-text plus LLM plus text-to-speech pipeline, and replies with a synthesized voice.

What language is orchard-meta-voice-agent written in?

Mainly Python. The stack also includes FastAPI, Docker, and the WhatsApp Cloud API.

How hard is orchard-meta-voice-agent to set up?

Setup difficulty is rated hard, with roughly 1h+ to a first successful run.

Who is orchard-meta-voice-agent for?

Mainly developers.

Verify against the repo before relying on details.