dikshantrajput/localclicky

★ 13PythonAudience · generalComplexity · 3/5LicenseSetup · hard

Mindmap

mindmap
  root((localclicky))
    What it does
      Voice command control
      Click via vision model
      Fully local no cloud
    Commands
      Open or quit apps
      Control Spotify
      Create reminders
      Run shell commands
      Click screen elements
    Tech Stack
      Python
      Whisper.cpp
      Ollama
      PyAutoGUI
    Use Cases
      Hands-free Mac control
      Offline voice assistant
      Accessibility automation
    Setup
      Homebrew install
      macOS permissions needed

mindmap root((localclicky)) What it does Voice command control Click via vision model Fully local no cloud Commands Open or quit apps Control Spotify Create reminders Run shell commands Click screen elements Tech Stack Python Whisper.cpp Ollama PyAutoGUI Use Cases Hands-free Mac control Offline voice assistant Accessibility automation Setup Homebrew install macOS permissions needed

Click or tap to explore — scroll the page freely

Things people build with this

USE CASE 1

Control your Mac hands-free by voice to open apps, adjust volume, and run shell commands.

USE CASE 2

Click anything on your screen by describing it in plain English and letting the vision model locate and click it for you.

USE CASE 3

Control Spotify playback and create reminders using natural language, without touching the keyboard.

USE CASE 4

Build a fully offline voice assistant on your Mac that never sends audio or screenshots to any external server.

Tech stack

PythonWhisper.cppOllamaPyAutoGUImacOS

Getting it running

Difficulty · hard Time to first run · 1h+

Requires Whisper.cpp via Homebrew, Ollama with two AI models pulled locally, and three macOS permission grants (microphone, screen recording, accessibility).

MIT license, use, modify, and distribute freely for any purpose including commercial use.

In plain English

LocalClicky is a Python application for macOS that lets you control your computer with your voice, with everything running locally on your own hardware. No audio, screenshots, or commands are sent to any external server. There are no API keys, no cloud subscriptions, and no internet connection required once the models are downloaded. The application lives in the macOS menubar with no Dock icon. You activate it by saying "Hey Jarvis," which starts a session. From there you can give commands back-to-back without repeating the wake word. The session ends when you say goodbye or after 25 seconds of silence. A small icon in the menubar shows the current state: idle, listening, recording, thinking, or speaking. Under the hood, four tools work together. Whisper.cpp handles speech-to-text transcription and runs entirely on your machine. Ollama runs two local AI models: one for understanding commands and deciding what to do (a reasoning model called qwen3), and one for vision tasks (gemma4) that can look at a screenshot of your screen and identify where to click. PyAutoGUI moves the cursor and performs clicks. The macOS built-in text-to-speech command handles spoken responses. The range of things you can ask it to do is broad: open or quit applications, adjust system volume, control Spotify playback, create reminders using natural language dates, make folders, run shell commands, inject JavaScript into Chrome, and answer general questions. When you ask it to click something on screen, it automatically takes a screenshot, sends it to the vision model to locate the target element, and then clicks the center of whatever it found. You do not need to phrase these requests in any special way. Setup requires installing Whisper.cpp via Homebrew, pulling the AI models through Ollama, and installing Python dependencies in a virtual environment. You also need to grant macOS permissions for microphone access, screen recording, and accessibility controls. The project is MIT licensed and runs on macOS 12 and later.

Copy-paste prompts

Prompt 1

Walk me through setting up LocalClicky on macOS, including installing Whisper.cpp via Homebrew, pulling the Ollama models, and granting microphone and screen recording permissions.

Prompt 2

I want to add a custom voice command to LocalClicky that opens a specific URL in Chrome. Show me how to extend the command handling code.

Prompt 3

How does LocalClicky use its vision model to find and click an element on screen? Walk me through the screenshot-to-click flow in the code.

Prompt 4

Explain how the qwen3 model in LocalClicky decides what action to take when I give a voice command like 'open Spotify and play my liked songs'.

Prompt 5

What macOS accessibility and screen recording permissions does LocalClicky need, and what breaks if I deny them?

Open on GitHub → Explain another repo

← dikshantrajput on gitmyhub — every repo by this author, as a profile.

Verify against the repo before relying on details.