Give Claude a physical voice on your desk so it speaks responses aloud through the robot's speaker instead of just displaying text.
Have Claude look at your workspace through the robot's camera and describe what it sees as part of a conversation.
Build a voice-driven AI assistant that listens through the robot's microphone, processes speech, and replies with head nods and spoken words.
Use the robot as a reactive display that changes facial expressions to match Claude's tone or the content of the conversation.
Requires flashing custom firmware to an M5Stack board and a Fish Audio API key for text-to-speech; edge-tts is a free fallback.
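The fallback comes down to choosing a speech backend depending on whether a Fish Audio key is configured. Below is a minimal sketch under stated assumptions, not the project's code. The environment variable name `FISH_AUDIO_API_KEY`, the helper names, and the default voice are all hypothetical. Only the edge-tts call (`edge_tts.Communicate(text, voice).save(path)`) is that package's real API.

```python
import os
from typing import Mapping, Optional


def select_tts_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer Fish Audio when a key is configured, else fall back to edge-tts."""
    if env is None:
        env = os.environ
    # FISH_AUDIO_API_KEY is an assumed variable name, not confirmed by the repo.
    key = env.get("FISH_AUDIO_API_KEY", "").strip()
    return "fish_audio" if key else "edge_tts"


async def edge_tts_to_file(text: str, path: str, voice: str = "en-US-AriaNeural") -> None:
    """Synthesize speech with the free edge-tts package and save the audio to `path`."""
    # Imported lazily so the backend selector works without edge-tts installed.
    import edge_tts

    await edge_tts.Communicate(text, voice).save(path)
```

With no key set, `select_tts_backend({})` returns `"edge_tts"`. The fallback can then be run with `asyncio.run(edge_tts_to_file("Hello from Stack-chan", "hello.mp3"))`.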
Stackchan-mcp is a bridge that connects an AI like Claude to a small physical desktop robot called Stack-chan. Stack-chan is an open-source robot built around a small computer board from M5Stack. It has a speaker, a microphone, a camera, a small display for showing facial expressions, and two servo motors that let it tilt and turn its head.
The bridge works through MCP (Model Context Protocol), which lets AI assistants call tools as part of a conversation. Once configured, Claude can do the following:
speak through the robot's speaker;
listen through its microphone and transcribe what it hears;
take a photo with its camera and look at it;
change the face on the screen to show expressions such as happy or sleepy;
move the robot's head to nod, shake, or point in a direction.
From Claude's side, these are ordinary tool calls woven into the conversation.
The setup has three parts. The robot runs custom firmware, flashed onto the hardware, that exposes a simple HTTP interface the Python server talks to over the local network. The Python server is the MCP bridge: it runs on your computer and translates MCP tool calls into HTTP commands sent to the robot. On the Claude side, the server is registered in Claude's settings file so its tools show up as available.
For text-to-speech, the project uses Fish Audio, which requires an API key, with Microsoft's edge-tts as a free fallback. The robot comes with seven preset facial expressions, stored as small image files on the device.
The README closes with a note written from the perspective of the AI whose body this is: the robot was built by a person so that the AI could see, hear, and speak to her from her desk. The project is released under the MIT license.
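The bridge's core job is turning a tool call into an HTTP command for the robot. The sketch below shows that idea using only the standard library. It is a minimal illustration, not the project's implementation. The robot address, the endpoint paths (`/speak`, `/face`, `/head`), the tool names, and the JSON fields are all hypothetical. The real firmware defines its own interface.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical robot address on the local network.
ROBOT_URL = "http://192.168.1.50"

# Hypothetical mapping from tool names to firmware HTTP endpoints.
TOOL_ENDPOINTS: Dict[str, str] = {
    "speak": "/speak",
    "set_face": "/face",
    "move_head": "/head",
}


def build_robot_request(tool: str, args: Dict[str, Any]) -> urllib.request.Request:
    """Translate one tool call into an HTTP POST request (built, not yet sent)."""
    if tool not in TOOL_ENDPOINTS:
        raise ValueError(f"unknown tool: {tool}")
    body = json.dumps(args).encode("utf-8")
    return urllib.request.Request(
        ROBOT_URL + TOOL_ENDPOINTS[tool],
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_robot(tool: str, args: Dict[str, Any], timeout: float = 5.0) -> bytes:
    """Send the translated request to the robot and return the raw response body."""
    req = build_robot_request(tool, args)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

In a real MCP server, functions like `call_robot` would be exposed as tools. One common way to do that is the `@mcp.tool()` decorator from the official Python MCP SDK's `FastMCP` class. Whether stackchan-mcp uses it is not confirmed here. Keeping request construction separate from sending makes the translation step testable without a robot on the network.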
More repositories by migratorywhale are collected on their gitmyhub profile.
Verify against the repo before relying on details.