explaingit

ddavidbailey/tank-controls

0PythonAudience · developerComplexity · 3/5ActiveSetup · hard

TLDR

A macOS Python program that drives War Thunder tanks with webcam hand tracking and local voice commands so movement, turret aim, and shell selection happen without a mouse or keyboard.

Mindmap

mindmap
  root((tank-controls))
    Inputs
      Webcam frames
      Microphone audio
      config.toml
    Outputs
      Synthetic key events
      Mouse delta movement
      Voice command actions
    Use Cases
      Play War Thunder hands-free
      Test MediaPipe hand tracking
      Try mlx-whisper voice control
      Build custom gesture controls
    Tech Stack
      Python
      MediaPipe
      mlx-whisper
      OpenCV
      Quartz

Things people build with this

USE CASE 1

Play War Thunder tanks using webcam hands and spoken commands

USE CASE 2

Map MediaPipe hand landmarks to synthetic mouse and key events

USE CASE 3

Run mlx-whisper as a local voice command engine on Apple Silicon

USE CASE 4

Prototype a panic key that releases all held inputs

Tech stack

PythonMediaPipemlx-whisperOpenCVpynput

Getting it running

Difficulty · hard Time to first run · 1h+

Tested only on macOS 26.5 with Apple Silicon, and the in-game War Thunder bindings must match what the script expects.

In plain English

This project lets you play the tank vehicle game War Thunder using your hands and your voice instead of the usual mouse and keyboard. The author pitches it as a challenge for players who already feel too skilled with the standard controls and want a more physical, harder way to drive a tank, aim a turret, pick a shell type, and fire. The setup needs macOS 26.5 (the author has not tested anywhere else), Python 3.11 or newer, a webcam, a microphone, and enough RAM to run both the game and the program at the same time. You also have to grant the terminal access to the camera and microphone, and the in game key bindings have to match what the program expects. In use, your left hand controls tank movement and your right hand controls the turret, tracked by the webcam. Voice commands handle the rest: saying Fire clicks the mouse to shoot, Range Finder issues the Command key, Scope sends Shift, and the digits one through four pick shell types. There is a panic button bound to the equals key that immediately stops all input, releases any held keys, and pauses the program. Under the hood, microphone audio is split by voice activity detection and transcribed locally using mlx-whisper on Apple Silicon. The webcam feed runs through Google's MediaPipe Hand Landmarker. No cloud APIs are used. The stack is Python with MediaPipe, OpenCV, sounddevice, and pynput, plus Quartz on macOS so mouse movement is sent as deltas the game can read. Configuration lives in config.toml.

Copy-paste prompts

Prompt 1
Walk me through giving Terminal camera and microphone access on macOS so tank-controls can see my hands
Prompt 2
Help me edit config.toml in tank-controls to remap the Fire command to a different word
Prompt 3
Show me how the MediaPipe landmarks in tank-controls are turned into turret aim deltas through Quartz
Prompt 4
Explain how voice activity detection chunks the mic stream before mlx-whisper transcribes it
Prompt 5
Adapt tank-controls so it sends inputs to a different game with its own keybinds
Open on GitHub → Explain another repo

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.