explaingit

ayushh0110/screenmind

14 · Python · Audience: general · Complexity: 3/5 · Setup: hard

TLDR

A Python app that records your screen and uses a local AI model to describe what you were doing, then lets you search or chat with your full screen history entirely on your own computer.

Mindmap

mindmap
  root((screenmind))
    What it does
      Records screen
      Transcribes audio
      Summarizes meetings
      Answers questions
    Privacy
      Local only
      Encrypted storage
      Sensitive data filtered
      PIN lock
    Search
      Semantic search
      Full text search
      Timeline view
      App analytics
    Integrations
      Claude Desktop
      Cursor
      Obsidian
      Notion

Things people build with this

USE CASE 1

Search your screen history to recall something you saw last week without remembering which app it was in.

USE CASE 2

Automatically transcribe and summarize your video calls from Zoom, Teams, or Google Meet.

USE CASE 3

Ask in plain English what you were working on at a specific time and get an answer grounded in your actual screen history.

USE CASE 4

Record voice memos with a hotkey and have them transcribed and stored alongside your screen activity.

Tech stack

Python · Gemma 4 · SQLite · Semantic Search

Getting it running

Difficulty: hard · Time to first run: 1h+

Requires a GPU with at least 4 GB VRAM to run the local Gemma 4 model at a usable speed.

In plain English

ScreenMind is a Python application that records your screen activity and turns it into a searchable, conversational memory, all running on your own computer without any cloud connection. Think of it as a private alternative to Microsoft Recall: every time your screen changes, ScreenMind captures a screenshot, runs it through a locally installed AI model called Gemma 4, and stores a structured description of what was on screen, what app you were using, and what you were doing.

The AI model handles three types of input. For screenshots, it reads the visual content and produces a detailed summary including the app name, your activity category, and a description of every visible element. For audio, it can transcribe voice memos you record with a hotkey, and it also auto-detects video calls (Zoom, Teams, Google Meet) to transcribe and summarize those meetings automatically. For reasoning tasks, it generates daily summaries and powers a chat interface where you can ask questions like "what did that person say on Discord yesterday" and get back answers grounded in your actual screen history.

Searching your history uses two methods at once: a semantic search that finds content by meaning rather than exact keywords, and a full-text keyword search. You can also browse a timeline view, replay a timelapse of your entire day, and view an analytics dashboard showing which apps you used most and how your time was distributed.

All data stays on your machine. ScreenMind filters out sensitive information such as credit card numbers, API keys, and passwords before saving anything. Screenshots are encrypted on disk. You can set a PIN to lock the dashboard and switch on an incognito mode to pause recording entirely.

The project also includes an agent platform where you can write short automation scripts in plain text or Python, a server that exposes your screen history to AI coding tools like Claude Desktop and Cursor, and optional integrations with Obsidian and Notion.

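To make the sensitive-data filtering described above more concrete, here is a minimal sketch of how text could be scrubbed before it is stored. It is not ScreenMind's actual filter: the function names, the regular expressions, the `sk`/`ghp` key prefixes, and the redaction labels are all assumptions chosen for illustration. Card-like digit runs are only redacted when they pass a Luhn checksum, which cuts down on false positives such as order numbers.

```python
import re

# Hypothetical patterns; the real project may detect different formats.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
API_KEY_RE = re.compile(r"\b(?:sk|ghp)[-_][A-Za-z0-9]{16,}\b")
PASSWORD_RE = re.compile(r"(?i)\b(password|passwd|pwd)(\s*[:=]\s*)\S+")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str) -> str:
    """Replace card numbers, API-key-like tokens, and password values."""

    def card_sub(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Leave digit runs that fail the checksum untouched.
        return "[REDACTED_CARD]" if luhn_ok(digits) else match.group()

    text = CARD_RE.sub(card_sub, text)
    text = API_KEY_RE.sub("[REDACTED_KEY]", text)
    text = PASSWORD_RE.sub(r"\1\2[REDACTED]", text)
    return text


if __name__ == "__main__":
    print(redact("Paid with 4111 1111 1111 1111, password: hunter2"))
```

A filter like this would run on the model's text description (and any OCR output) before the row is written, so the raw secret never reaches disk.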
Setup requires Python 3.10 or later, about 5 GB of disk space for the AI model, and a graphics card with at least 4 GB of memory to run analysis at a reasonable speed.
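The two-method search described above (semantic plus keyword) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `captures` table and its columns are hypothetical, the keyword side uses a plain SQLite `LIKE` match rather than a real full-text index, the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the two rankings are merged with reciprocal rank fusion, which the source does not confirm ScreenMind uses.

```python
import math
import sqlite3
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_rank(conn, query):
    """Rank capture ids by how many query terms occur in the description."""
    scores = {}
    for term in query.lower().split():
        rows = conn.execute(
            "SELECT id FROM captures WHERE lower(description) LIKE ?",
            (f"%{term}%",),
        )
        for (cid,) in rows:
            scores[cid] = scores.get(cid, 0) + 1
    return sorted(scores, key=lambda c: (-scores[c], c))


def semantic_rank(conn, query):
    """Rank capture ids by vector similarity to the query."""
    q = embed(query)
    rows = conn.execute("SELECT id, description FROM captures")
    scored = [(cosine(q, embed(desc)), cid) for cid, desc in rows]
    ordered = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [cid for score, cid in ordered if score > 0]


def hybrid_search(conn, query, k=60):
    """Merge both rankings with reciprocal rank fusion (1 / (k + rank))."""
    fused = {}
    for ranking in (keyword_rank(conn, query), semantic_rank(conn, query)):
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] = fused.get(cid, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda c: (-fused[c], c))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE captures (id INTEGER PRIMARY KEY, description TEXT)")
    conn.executemany(
        "INSERT INTO captures (description) VALUES (?)",
        [
            ("Editing main.py in VS Code",),
            ("Discord chat about the release schedule",),
            ("Reading a blog post about SQLite indexes",),
        ],
    )
    print(hybrid_search(conn, "discord release"))
```

Rank fusion is convenient here because it only needs the order of results from each method, so keyword hit counts and similarity scores never have to be put on a common scale.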

Copy-paste prompts

Prompt 1
Using screenmind as context, write a Python script that queries my local screen history to find all times I was working in VS Code between 9am and 11am this week and list the file names visible.
Prompt 2
I want to add an automation to screenmind that detects when I open a browser tab with a specific URL pattern and logs it as a separate category. How do I extend the agent platform?
Prompt 3
Help me configure screenmind to activate incognito mode automatically when I open my banking website, using the app's built-in filter rules.
Prompt 4
Write a screenmind MCP server configuration that exposes my screen history to Claude Desktop so I can ask questions about my past work sessions.

Verify against the repo before relying on details.