Play War Thunder tanks using webcam hands and spoken commands
Map MediaPipe hand landmarks to synthetic mouse and key events
Run mlx-whisper as a local voice command engine on Apple Silicon
Prototype a panic key that releases all held inputs
Tested only on macOS 26.5 with Apple Silicon, and the in-game War Thunder bindings must match what the script expects.
This project lets you play tanks in War Thunder using your hands and your voice instead of the usual mouse and keyboard. The author pitches it as a challenge for players who already feel too skilled with the standard controls and want a more physical, harder way to drive a tank, aim a turret, pick a shell type, and fire.
The setup needs macOS 26.5 (the author has not tested anywhere else), Python 3.11 or newer, a webcam, a microphone, and enough RAM to run the game and the program at the same time. You also have to grant the terminal access to the camera and microphone, and the in-game key bindings have to match what the program expects.
In use, the webcam tracks both hands: your left hand controls tank movement and your right hand controls the turret. Voice commands handle the rest: saying Fire clicks the mouse to shoot, Range Finder sends the Command key, Scope sends Shift, and the digits one through four pick shell types. A panic button bound to the equals key immediately stops all input, releases any held keys, and pauses the program.
Under the hood, microphone audio is split into utterances by voice activity detection and transcribed locally using mlx-whisper on Apple Silicon. The webcam feed runs through Google's MediaPipe Hand Landmarker. No cloud APIs are used. The stack is Python with MediaPipe, OpenCV, sounddevice, and pynput, plus Quartz on macOS so that mouse movement is sent as relative deltas the game can read. Configuration lives in config.toml.
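As a rough illustration of the voice commands and panic key described above, the sketch below maps a transcribed phrase to an action and tracks held inputs so that a panic call can release them all and pause. It is not taken from the repository: the COMMANDS table encoding, the InputState class, and the emit callback are hypothetical names chosen for this example, and a real build would presumably route emit to pynput or Quartz rather than to a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set, Tuple

# Spoken phrase -> (action kind, payload). The phrases and their targets follow
# the description above; the tuple encoding itself is an assumption.
COMMANDS = {
    "fire": ("click", "left"),
    "range finder": ("key", "cmd"),
    "scope": ("key", "shift"),
    "one": ("key", "1"),
    "two": ("key", "2"),
    "three": ("key", "3"),
    "four": ("key", "4"),
}


def normalize(transcript: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in transcript.lower()
    )
    return " ".join(kept.split())


def parse_command(transcript: str) -> Optional[Tuple[str, str]]:
    """Return the action for an exact phrase match, or None if unrecognized."""
    return COMMANDS.get(normalize(transcript))


@dataclass
class InputState:
    """Tracks held inputs so a panic call can release every one of them."""

    # Hypothetical sink: emit("press" | "release", key_name).
    emit: Callable[[str, str], None]
    held: Set[str] = field(default_factory=set)
    paused: bool = False

    def press(self, key: str) -> None:
        # Ignore new presses while paused, and avoid double-pressing a held key.
        if self.paused or key in self.held:
            return
        self.held.add(key)
        self.emit("press", key)

    def release(self, key: str) -> None:
        if key in self.held:
            self.held.discard(key)
            self.emit("release", key)

    def panic(self) -> None:
        # Release everything still held, then refuse further input until resumed.
        for key in sorted(self.held):
            self.emit("release", key)
        self.held.clear()
        self.paused = True


if __name__ == "__main__":
    log = []
    state = InputState(emit=lambda kind, key: log.append((kind, key)))
    state.press("w")
    print(parse_command("Fire!"))
    state.panic()
    print(log, state.paused)
```

Keeping the set of held inputs in one place is what makes the panic path simple: whatever the hand tracker or voice engine pressed, a single call can undo it. Exact phrase matching is the simplest possible policy; a transcriber may return digits or extra words, so a real matcher would likely need aliases or fuzzy matching.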
Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.