explaingit

sam-siavoshian/agent-notch

18 | Swift | Audience · developer | Complexity · 4/5 | Active | License | Setup · hard

TLDR

macOS notch app for M-series MacBooks: a long-press triggers a voice-driven AI agent that takes over the mouse and keyboard, using Claude Haiku, Gemini, OpenAI voice, and a Mercury 2 context summarizer.

Mindmap

mindmap
  root((agent-notch))
    Inputs
      Long press gesture
      Voice command
      Screen contents
      Clipboard and selection
    Outputs
      Mouse and keyboard actions
      Notch status display
      Spoken responses
      Developer timeline
    Use Cases
      Hands free desktop control
      In app automation
      Hackathon demo
    Tech Stack
      Swift
      SwiftUI
      Claude Haiku
      Gemini
      OpenAI
      OpenRouter Mercury 2

Things people build with this

USE CASE 1

Long-press the notch companion and dictate a task that Claude Haiku then performs by driving the mouse and keyboard in Slack, Discord, or Figma.

USE CASE 2

Use the background observer to build a per-app memory of UI element positions so future agent runs do not relearn the same interface.

USE CASE 3

Open the developer window with Command-Shift-I to inspect the live observation stream, every API request, and the action timeline.

USE CASE 4

Fork the project as a starting point for your own desktop AI agent that needs Accessibility, Screen Recording, and Microphone permissions.
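
The per-app memory from use case 2 could be modelled roughly as below. This is a minimal sketch, not the repo's actual data model: the `ElementPosition` and `AppMemory` types, their fields, and the idea of keying by bundle identifier and element label are all assumptions made for illustration.

```swift
import Foundation

/// Hypothetical record of one UI element the background observer has seen.
struct ElementPosition: Codable, Equatable {
    var label: String    // e.g. an accessibility title such as "Send"
    var x: Double        // screen coordinates of the element's centre
    var y: Double
    var lastSeen: Date
}

/// Hypothetical per-app memory: bundle identifier -> element label -> position.
struct AppMemory: Codable {
    private(set) var apps: [String: [String: ElementPosition]] = [:]

    /// Record (or refresh) where an element was last seen in a given app.
    mutating func remember(_ element: ElementPosition, inApp bundleID: String) {
        apps[bundleID, default: [:]][element.label] = element
    }

    /// Look up a previously seen element; nil if the app or label is unknown.
    func recall(label: String, inApp bundleID: String) -> ElementPosition? {
        apps[bundleID]?[label]
    }
}
```

Because the sketch is `Codable`, the memory could be persisted between runs with `JSONEncoder`, which is one plausible way to avoid relearning an interface on every launch.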

Tech stack

Swift, SwiftUI, Claude, Gemini, OpenAI, OpenRouter

Getting it running

Difficulty · hard | Time to first run · 1 day+

Needs an M-series MacBook with a notch, macOS 14+, an Xcode build, four API keys (Anthropic, Google, OpenAI, OpenRouter), and three sensitive system permissions.

MIT license, free to use, modify, and redistribute as long as the copyright notice is kept.
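
Since four separate API keys are required, a quick pre-flight check can save a failed first run. The sketch below assumes the keys are supplied as environment variables. The variable names are hypothetical, and the repo may store keys in a different place entirely, so check its README for the real mechanism.

```swift
import Foundation

/// Hypothetical environment variable names; the repo may use different ones.
let requiredKeys = [
    "ANTHROPIC_API_KEY",   // Claude Haiku
    "GOOGLE_API_KEY",      // Gemini
    "OPENAI_API_KEY",      // voice transcription and text-to-speech
    "OPENROUTER_API_KEY",  // Mercury 2
]

/// Returns the names of required keys that are missing or blank in `environment`.
func missingKeys(in environment: [String: String]) -> [String] {
    requiredKeys.filter { name in
        (environment[name] ?? "").trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }
}

// Report on the current process environment.
let missing = missingKeys(in: ProcessInfo.processInfo.environment)
if missing.isEmpty {
    print("All four API keys are set.")
} else {
    print("Missing API keys: \(missing.joined(separator: ", "))")
}
```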

In plain English

Agent Notch is a macOS app that lives in the notch at the top of a MacBook screen. The idea is that you long-press a small cursor companion, speak what you want done, and an AI model called Claude Haiku 4.5 takes over the mouse and keyboard to carry it out. The notch itself shows what the agent is doing while it works.

It only runs on M-series MacBooks that have a physical notch, on macOS 14 or newer. To install it you use a few command-line tools (Homebrew, XcodeGen, a signing script), then open the project in Xcode and build it. You need four API keys: one from Anthropic for the Claude model, one from Google for Gemini, one from OpenAI for voice transcription and text-to-speech, and one from OpenRouter for a context-selection model called Mercury 2. The app also asks for three macOS permissions on first run: Accessibility, so it can detect the long-press and send clicks, Screen Recording, so it can see what is on your screen, and Microphone, so it can hear you.

A central piece of the project is the context system. Two things run in parallel. A background observer watches your screen and builds up a memory of where buttons live in apps like Slack, Discord, and Figma, so the agent does not have to relearn an interface every time. A foreground path triggers when you long-press: it grabs a fast snapshot of the current app, your selection, your clipboard, and your cursor position, then asks Mercury 2 to summarize it all into a short brief. The brief is passed to Claude before any action is taken, and references like 'her' or 'that doc' are resolved to concrete things first.

Privacy is built into the architecture. Password managers, secure input fields, and credentials in URLs are never logged. A single kill switch pauses all collection. A developer window (Command-Shift-I) shows the live observation stream, the per-app memory, every request and response, and the full timeline of what the agent did after each long-press.

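
To make the "credentials in URLs are never logged" rule concrete, here is one way such a filter might look. It is a sketch under stated assumptions, not the repo's redactor: the function name `redactedForLogging`, the placeholder strings, and the list of sensitive parameter names are all invented for illustration. It relies only on Foundation's `URLComponents`.

```swift
import Foundation

/// Query parameter names treated as sensitive. This list is an assumption.
let sensitiveParams: Set<String> = ["token", "access_token", "api_key", "password", "secret"]

/// Returns a copy of `urlString` that is safer to log: the user and password
/// are dropped and sensitive query values are replaced with a placeholder.
func redactedForLogging(_ urlString: String) -> String {
    guard var components = URLComponents(string: urlString) else {
        // Refuse to log anything that cannot be parsed and inspected.
        return "[unparseable URL redacted]"
    }
    components.user = nil
    components.password = nil
    components.queryItems = components.queryItems?.map { item in
        sensitiveParams.contains(item.name.lowercased())
            ? URLQueryItem(name: item.name, value: "REDACTED")
            : item
    }
    return components.string ?? "[unparseable URL redacted]"
}
```

A name-based deny list like this only catches parameters it knows about, so a real redactor would probably need pattern-based checks as well.
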
The project is written in Swift and SwiftUI, was built at TritonHacks 2026, and is released under the MIT license.
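
The foreground path described above (snapshot, then summary, then brief) can be sketched as a small data type plus a prompt builder. Everything here is hypothetical: `ContextSnapshot`, `summaryPrompt`, the field names, and the prompt wording are illustrative assumptions, and the actual request to Mercury 2 is left out.

```swift
import Foundation

/// Hypothetical snapshot captured at the moment of the long-press.
struct ContextSnapshot {
    var frontmostApp: String
    var selection: String?
    var clipboard: String?
    var cursor: (x: Double, y: Double)
}

/// Builds the text that would be sent to the summarizing model.
/// Missing or empty fields are omitted, and long fields are truncated.
func summaryPrompt(for snapshot: ContextSnapshot, command: String, maxFieldLength: Int = 500) -> String {
    func clipped(_ text: String) -> String {
        text.count > maxFieldLength ? String(text.prefix(maxFieldLength)) + "…" : text
    }
    var lines = [
        "Summarize this desktop context into a short brief for an agent.",
        "Spoken command: \(clipped(command))",
        "Frontmost app: \(snapshot.frontmostApp)",
        "Cursor: (\(Int(snapshot.cursor.x)), \(Int(snapshot.cursor.y)))",
    ]
    if let selection = snapshot.selection, !selection.isEmpty {
        lines.append("Selection: \(clipped(selection))")
    }
    if let clipboard = snapshot.clipboard, !clipboard.isEmpty {
        lines.append("Clipboard: \(clipped(clipboard))")
    }
    lines.append("Resolve references such as 'her' or 'that doc' to concrete items.")
    return lines.joined(separator: "\n")
}
```

Truncating each field keeps the summarization request small, which matters if the brief has to be ready before the agent takes its first action.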

Copy-paste prompts

Prompt 1
Set up agent-notch on an M-series MacBook: install Homebrew and XcodeGen, run the signing script, paste in the four API keys, and grant the three macOS permissions.
Prompt 2
Walk me through agent-notch's context system and explain how Mercury 2 turns a screen snapshot plus selection and clipboard into the brief that Claude actually sees.
Prompt 3
Replace agent-notch's OpenAI voice transcription with a local Whisper.cpp model and keep the rest of the pipeline working.
Prompt 4
Add a new privacy rule to agent-notch's redactor that strips OAuth bearer tokens from any logged HTTP header before it reaches the developer window.
Prompt 5
Port agent-notch's read-only screen observer into a standalone Swift package so another macOS app can reuse the per-app UI memory.

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.