explaingit

farzaa/clicky

5,768 Swift Audience · vibe coder Complexity · 4/5 License Setup · hard

TLDR

A macOS menu bar app that puts a screen-aware AI voice assistant next to your cursor. Press a shortcut, speak a question, and it answers aloud while visually pointing at specific elements on your screen.

Mindmap

mindmap
  root((Clicky))
    What it does
      Screen-aware AI
      Voice input output
      Cursor-side assistant
    Tech Stack
      Swift macOS
      Claude API
      AssemblyAI
      ElevenLabs
      Cloudflare Worker
    Features
      Screenshot capture
      Real-time transcription
      UI element pointing
      Menu bar app
    Use Cases
      Personal AI tutor
      Accessibility tool
      Custom AI assistant
    Setup
      Xcode required
      Three API keys
      Cloudflare deploy

Things people build with this

USE CASE 1

Build a personal screen-aware AI tutor that answers questions about whatever is visible on your Mac screen using your voice.

USE CASE 2

Extend Clicky to use a different AI model or TTS provider by modifying the Cloudflare Worker and Swift API calls.

USE CASE 3

Use Clicky as a starting point for a macOS accessibility tool that highlights UI elements and explains them aloud.

USE CASE 4

Deploy the Cloudflare Worker to securely proxy your Anthropic, AssemblyAI, and ElevenLabs API keys without embedding them in the app.
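The proxy idea in Use Case 4 can be sketched as a small routing function. This is a minimal sketch under stated assumptions, not the repo's actual Worker: the route paths (`/chat`, `/tts`, `/transcribe-token`), the `Env` secret names, and the `resolveUpstream` helper are all hypothetical. The upstream hosts and auth header names follow each provider's public API, but the exact AssemblyAI endpoint depends on which API version the app targets.

```typescript
// Secrets the Worker would hold. These names are hypothetical, not taken from the repo.
interface Env {
  ANTHROPIC_API_KEY: string;
  ASSEMBLYAI_API_KEY: string;
  ELEVENLABS_API_KEY: string;
  ELEVENLABS_VOICE_ID: string;
}

interface Upstream {
  url: string;
  headers: Record<string, string>;
}

// Map a proxy path (hypothetical routes) to the upstream URL and the auth
// headers that provider expects. Returns null for unknown paths so the
// caller can answer with a 404 instead of forwarding anything.
function resolveUpstream(path: string, env: Env): Upstream | null {
  switch (path) {
    case "/chat":
      return {
        url: "https://api.anthropic.com/v1/messages",
        headers: {
          "x-api-key": env.ANTHROPIC_API_KEY,
          "anthropic-version": "2023-06-01",
          "content-type": "application/json",
        },
      };
    case "/tts":
      return {
        url: `https://api.elevenlabs.io/v1/text-to-speech/${env.ELEVENLABS_VOICE_ID}`,
        headers: {
          "xi-api-key": env.ELEVENLABS_API_KEY,
          "content-type": "application/json",
        },
      };
    case "/transcribe-token":
      // Assumed endpoint for short-lived streaming tokens; verify against
      // the AssemblyAI API version the app actually uses.
      return {
        url: "https://streaming.assemblyai.com/v3/token",
        headers: { authorization: env.ASSEMBLYAI_API_KEY },
      };
    default:
      return null;
  }
}
```

Inside a Worker's `fetch(request, env)` handler, one would call `resolveUpstream(new URL(request.url).pathname, env)` and forward the request body to the returned URL with the returned headers. The keys live only in the Worker's environment, so the macOS app never sees them.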

Tech stack

Swift, Cloudflare Workers, AssemblyAI, ElevenLabs, Claude API

Getting it running

Difficulty · hard Time to first run · 1h+

Requires macOS 14.2 or later, Xcode 15 or later, a Cloudflare account, and paid API keys for Anthropic, AssemblyAI, and ElevenLabs.

Use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

Clicky is a macOS application that puts an AI assistant next to your cursor on screen. It can see what is on your screen, listen to you through your microphone, talk back to you using a synthesized voice, and visually point at specific elements on your display. The idea is to feel like having a real tutor sitting beside you while you work.

The app runs as a menu bar item rather than appearing in the Dock. When you press a keyboard shortcut (Control + Option), it takes a screenshot of your screen and listens to what you say. Your speech is transcribed in real time by AssemblyAI, then sent along with the screenshot to Claude (an AI model by Anthropic), which generates a response. That response is read aloud using ElevenLabs, a text-to-speech service. Claude can also embed coordinates in its response that tell Clicky exactly where to move an on-screen pointer, so it can highlight a specific button or piece of UI while explaining it.

The codebase is open source and structured in two parts: a Swift macOS app that handles the UI, audio capture, and screen capture, and a small Cloudflare Worker that acts as a proxy and holds your API keys so they are not embedded in the app itself.

Setting it up yourself requires a Mac running macOS 14.2 or later, Xcode 15 or later, a free Cloudflare account, and API keys for Anthropic, AssemblyAI, and ElevenLabs. You deploy the worker to Cloudflare, update a few URLs in the Swift code to point to your deployed worker, then build and run the app in Xcode. The README suggests using Claude Code (a coding assistant tool) to walk through setup automatically by pasting a single prompt.

The creator, Farza, has noted in an update from April 2026 that active development has moved to a private version, but the existing codebase is released under an MIT license and anyone is welcome to use, modify, or build on it.
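The round trip described above (screenshot plus transcript in, spoken text plus pointer coordinates out) can be sketched as two small functions. This is an illustration under assumptions, not Clicky's code: the real app does this in Swift, and the source does not say how Claude encodes coordinates. The `[POINT:x,y]` tag, the function names, and the `max_tokens` value are hypothetical. The request shape follows Anthropic's public Messages API for a base64 image plus text.

```typescript
// Request body for one user turn carrying a screenshot and a spoken question.
interface ClaudeRequest {
  model: string;
  max_tokens: number;
  messages: Array<{
    role: "user";
    content: Array<
      | { type: "image"; source: { type: "base64"; media_type: string; data: string } }
      | { type: "text"; text: string }
    >;
  }>;
}

// Build the JSON body. The model id is passed in rather than hardcoded,
// because the source does not say which Claude model the app uses.
function buildClaudeRequest(screenshotBase64: string, transcript: string, model: string): ClaudeRequest {
  return {
    model,
    max_tokens: 1024, // arbitrary value chosen for this sketch
    messages: [
      {
        role: "user",
        content: [
          { type: "image", source: { type: "base64", media_type: "image/png", data: screenshotBase64 } },
          { type: "text", text: transcript },
        ],
      },
    ],
  };
}

interface PointerTarget {
  x: number;
  y: number;
}

// HYPOTHETICAL convention: the model embeds one tag like [POINT:120,48] in its reply.
// The tag is stripped from the text sent to speech and returned as a pointer target.
function extractPointer(reply: string): { spokenText: string; target: PointerTarget | null } {
  const tag = /\[POINT:(\d+),(\d+)\]/;
  const match = tag.exec(reply);
  if (!match) {
    return { spokenText: reply.trim(), target: null };
  }
  // Remove the tag, then collapse the double space it leaves behind.
  const spokenText = reply.replace(tag, "").replace(/\s+/g, " ").trim();
  return { spokenText, target: { x: Number(match[1]), y: Number(match[2]) } };
}
```

Separating the pointer tag from the spoken text keeps the text-to-speech input clean while the pointer moves independently. Whatever tag format the real app uses, the same split applies.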

Copy-paste prompts

Prompt 1
I want to set up Clicky on my Mac. Walk me through deploying the Cloudflare Worker, adding my API keys, and running the Swift app in Xcode from start to finish.
Prompt 2
Show me how Clicky encodes a screenshot and sends it along with transcribed speech to the Claude API in its Swift code.
Prompt 3
I want to swap ElevenLabs in Clicky for a different TTS service. Where in the Swift codebase is the TTS call made and what do I need to change?
Prompt 4
Explain how Clicky uses coordinates embedded in Claude's response to move an on-screen pointer to a specific UI element while speaking.

Verify against the repo before relying on details.