brendanhogan/pass-the-turing-test

Analysis updated 2026-06-24

★ 2 · Python · Audience: researcher · Complexity: 3/5 · Setup: moderate

Mindmap

mindmap
  root((pass-the-turing-test))
    Inputs
      Anthropic key
      Game config
      Playbook
    Outputs
      Live transcript
      Vote results
      Replay UI
    Use Cases
      Run a show
      Evolve tactics
      A B test prompts
    Tech Stack
      Python
      FastAPI
      Anthropic API


What do people build with it?

USE CASE 1

Stage a multi-agent social deduction game where Claude instances try to act human

USE CASE 2

Run evolve.py across many games to build a learned playbook of survival tactics

USE CASE 3

A/B test a single prompted contestant against five baselines and measure survival rounds

USE CASE 4

Watch live confessionals, DMs, and votes through the three-column web UI

What is it built with?

Python, FastAPI, Anthropic API, uvicorn

How does it compare?

	brendanhogan/pass-the-turing-test	0-bingwu-0/live-interpreter	0xkaz/llm-governance-dashboard
Stars	2	2	2
Language	Python	Python	Python
Setup difficulty	moderate	moderate	hard
Complexity	3/5	2/5	4/5
Audience	researcher	general	ops devops

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: moderate · Time to first run: 30 min

Needs an Anthropic API key and a budget of about $0.40 per show on Haiku or $2.50 per show on Opus.
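To budget a batch of shows, the per-show figures above can be multiplied out. This is a minimal sketch: the function name and the dictionary are illustrative and not part of the repo, and the per-show costs are the approximate figures quoted on this page, so real spend will vary with transcript length.

```python
# Approximate per-show costs quoted on this page, in USD.
# These are rough figures, not values read from the repo.
COST_PER_SHOW_USD = {"haiku": 0.40, "opus": 2.50}


def estimate_budget(model: str, games: int) -> float:
    """Return a rough USD budget for running `games` full shows on `model`."""
    if games < 0:
        raise ValueError("games must be non-negative")
    if model not in COST_PER_SHOW_USD:
        raise KeyError(f"no cost estimate for model {model!r}")
    return round(COST_PER_SHOW_USD[model] * games, 2)


if __name__ == "__main__":
    # Example: the 18-game batch size mentioned later on this page.
    for model in COST_PER_SHOW_USD:
        print(f"{model}: 18 shows, about ${estimate_budget(model, 18):.2f}")
```

At these rates an 18-show batch is cheap on Haiku and noticeably more expensive on Opus, which is worth checking before starting a long evolve.py or ab_test.py run on the default model.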

In plain English

Pass The Turing Test is a small Python project that stages a reality-TV style game show for language models. Six AI contestants, all running Claude, are dropped into a chat room. Each one is privately told that it is the only AI and that the other five are humans who will try to vote it off. None of them knows the trick: all six are AIs, all trying their hardest to act human.

Each round of the show has four stages. First, a group chat where everyone talks in turn. Second, a confessional booth where the host pulls one contestant aside, drops the lowercase chatty performance, and lets them speak as themselves while the audience watches. Third, private DMs between pairs of contestants, running in parallel. Fourth, a vote: everyone votes for who they think is the AI, and the most-voted contestant is eliminated and gives a short exit interview. The game runs until two are left, at which point the host breaks the fourth wall and explains the joke.

The interface is a three-column page. The left column lists the contestants, the center shows a live transcript of narration, chat, votes, confessionals, and DMs, and the right column shows the selected contestant's private thoughts, confessionals, DM threads, and votes. After a game ends, a Replay button appears with playback speeds up to 16x.

To run it you clone the repo, create a Python virtual environment, install the requirements, put an Anthropic API key in a .env file, and start a uvicorn server. A full show costs roughly 40 cents on claude-haiku-4-5 or about $2.50 on claude-opus-4-7, which is the default.

The second half of the project uses the show as a measurement tool. evolve.py runs many games back to back and asks a meta-agent to read the eliminated contestants' transcripts and propose new tactics, which are added to a learned playbook for the next generation. ab_test.py instead gives the playbook to only one contestant per game and measures whether that single treated agent survives longer. Across 18 games on Haiku, treated agents survived about one round longer on average and reached the final two more than twice as often. The author notes this is not a research result, just a small but clear effect on a small sample.
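The elimination structure described above (vote each round, remove the most-voted contestant, stop at two) can be sketched in a few lines. This is not the repo's code: the function names are hypothetical, votes are random stand-ins for model calls, and the no-self-vote and alphabetical tie-break rules are assumptions made so the sketch is deterministic.

```python
import random
from collections import Counter


def tally_votes(votes: dict[str, str]) -> str:
    """Return the most-voted contestant.

    Ties are broken alphabetically; this is an assumption of the sketch,
    not a documented rule of the show.
    """
    counts = Counter(votes.values())
    top = max(counts.values())
    return min(name for name, n in counts.items() if n == top)


def run_show(contestants: list[str], rng: random.Random) -> list[str]:
    """Play rounds until two contestants remain; return the elimination order."""
    remaining = list(contestants)
    eliminated = []
    while len(remaining) > 2:
        # Stages 1-3 (group chat, confessional, DMs) would call the model here.
        # Stage 4: each contestant votes for someone else (random placeholder).
        votes = {
            voter: rng.choice([c for c in remaining if c != voter])
            for voter in remaining
        }
        out = tally_votes(votes)
        remaining.remove(out)
        eliminated.append(out)
    return eliminated


if __name__ == "__main__":
    names = ["ava", "ben", "cam", "dee", "eli", "fay"]  # placeholder names
    print(run_show(names, random.Random(0)))
```

With six contestants the loop always runs four rounds, so "rounds survived" for any contestant falls in a narrow range, which is why the A/B comparison needs many games to show a difference.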

Copy-paste prompts

Prompt 1

Set up pass-the-turing-test with an Anthropic key and run one full show on claude-haiku-4-5

Prompt 2

Modify the host prompts so the confessional booth uses a different question style

Prompt 3

Add a fourth round type between group chat and voting in the show loop

Prompt 4

Help me read evolve.py and explain how the meta-agent updates the playbook

Prompt 5

Run ab_test.py for 30 games on Haiku and chart how often the treated agent reaches the final two
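For the A/B prompt above, the two headline numbers (mean rounds survived and final-two rate) could be computed from per-game records along these lines. The record format and the function name are hypothetical; the actual output of ab_test.py may look quite different, so treat this as a sketch of the arithmetic only.

```python
from statistics import mean


def summarize_ab(games: list[dict]) -> dict:
    """Summarize A/B results from per-game records.

    Each record is assumed (hypothetically) to look like:
      {"treated_rounds": int,          # rounds the treated agent survived
       "baseline_rounds": list[int],   # rounds each baseline agent survived
       "treated_in_final": bool}       # whether the treated agent reached the final two
    """
    treated = [g["treated_rounds"] for g in games]
    baseline = [r for g in games for r in g["baseline_rounds"]]
    return {
        "treated_mean_rounds": mean(treated),
        "baseline_mean_rounds": mean(baseline),
        # Fraction of games in which the treated agent reached the final two.
        "treated_final_two_rate": mean(
            1 if g["treated_in_final"] else 0 for g in games
        ),
    }
```

As a point of comparison, if survival were pure chance, each of six contestants would reach the final two in 2 of 6 games, so a treated final-two rate well above one third is the signal to look for.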

Frequently asked questions

What is pass-the-turing-test?

A reality-TV style game show in which six Claude agents each believe they are the only AI in the room and try to vote each other off without getting caught.

What language is pass-the-turing-test written in?

Mainly Python. The stack also includes FastAPI, the Anthropic API, and uvicorn.

How hard is pass-the-turing-test to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is pass-the-turing-test for?

Mainly researchers.


Verify against the repo before relying on details.