explaingit

cursortouch/windows-mcp

5,554 · Python · Audience · vibe coder · Complexity · 3/5 · License · MIT · Setup · moderate

TLDR

A Python server that lets AI assistants like Claude control a Windows PC, clicking, typing, navigating files, and automating browsers, using the Windows accessibility layer instead of screenshots.

Mindmap

mindmap
  root((windows-mcp))
    What it does
      Click and type
      Open applications
      File navigation
      Browser automation
    How it works
      MCP protocol
      Accessibility layer
      No screenshots
    Setup
      uv install
      Background task
      MCP config
    Compatibility
      Windows 7 to 11
      Python 3.13 plus
      MIT licensed

Things people build with this

USE CASE 1

Let an AI assistant automatically open applications, fill out forms, and complete multi-step desktop tasks on your Windows PC.

USE CASE 2

Automate browser interactions where the AI reads the underlying page structure directly for more reliable clicks than pixel-matching.

USE CASE 3

Connect Claude Desktop or another MCP-compatible AI client to your Windows computer for hands-free task automation.

Tech stack

Python · Windows API

Getting it running

Difficulty · moderate · Time to first run · 30 min

Requires Python 3.13+ and an MCP-compatible AI client such as Claude Desktop. Runs on Windows 7 through 11 only; macOS and Linux are not supported.
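As a rough illustration of the client side of the setup: MCP clients such as Claude Desktop typically read a JSON config file with one `mcpServers` entry per server. The sketch below builds such an entry in Python. The server name `windows-mcp` and the `uvx windows-mcp` launch command are assumptions that this page does not confirm, so check the repo's README for the exact values.

```python
import json


def build_mcp_config(server_name: str, command: str, args: list[str]) -> dict:
    """Return a config dict in the `mcpServers` shape read by MCP clients."""
    return {"mcpServers": {server_name: {"command": command, "args": args}}}


# Assumption: the server is launched with `uvx windows-mcp`.
# Verify the real command and arguments against the repo's README.
config = build_mcp_config("windows-mcp", "uvx", ["windows-mcp"])

# Print JSON that could be pasted into the client's config file.
print(json.dumps(config, indent=2))
```

The printed JSON would be merged into the client's existing config file rather than replacing it, so that other configured servers are kept.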

Use freely for any purpose including commercial projects, as long as you keep the copyright notice.

In plain English

Windows-MCP is a Python server that gives AI agents the ability to control a Windows computer. MCP stands for Model Context Protocol, a standard way for AI assistants to connect to external tools and capabilities. When you run Windows-MCP and connect it to an AI assistant like Claude, the assistant can perform actions on your desktop: open applications, click buttons, type text, navigate folders, read what is currently on screen, and interact with web pages through a browser. The server works with any AI language model, not just one specific product.

It does not rely on image recognition or screenshot analysis to find interface elements. Instead, it reads the Windows accessibility layer, which is the same infrastructure that screen readers use to help people with visual impairments navigate the operating system. This makes interactions faster and more consistent than tools that try to match pixels on screen. Typical latency between consecutive actions is between 0.2 and 0.5 seconds, depending on how many applications are running and how quickly the language model generates its next instruction.

A special mode for browser automation reads the underlying web page structure directly rather than treating the browser as a visual grid, which makes web interactions more precise.

Installation is a single command using the uv package manager. The server can optionally be registered as a Windows background task that starts automatically at login. Configuration instructions are included for Claude Desktop, Perplexity Desktop, and other MCP-compatible clients. The project supports Windows 7 through Windows 11, requires Python 3.13 or newer, and is licensed under MIT.
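To make the accessibility-layer idea described above more concrete, here is a toy sketch of how an element can be located by role and name in a tree of UI elements, with the click point then derived from its bounding box. The `UiNode` class, the helper functions, and the sample window are all hypothetical. This is not Windows-MCP's code and it does not call the real Windows accessibility APIs; it only shows why a structured lookup does not depend on what the pixels look like.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class UiNode:
    """One element in a toy accessibility tree (hypothetical model)."""
    role: str
    name: str
    rect: tuple[int, int, int, int]  # left, top, right, bottom in screen pixels
    children: list["UiNode"] = field(default_factory=list)


def walk(node: UiNode) -> Iterator[UiNode]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find(root: UiNode, role: str, name: str) -> Optional[UiNode]:
    """Return the first element matching role and name, or None."""
    for node in walk(root):
        if node.role == role and node.name == name:
            return node
    return None


def center(node: UiNode) -> tuple[int, int]:
    """Compute the click point at the middle of the element's bounding box."""
    left, top, right, bottom = node.rect
    return ((left + right) // 2, (top + bottom) // 2)


# A made-up window: a Save dialog with a text field and two buttons.
window = UiNode("window", "Save As", (0, 0, 800, 600), [
    UiNode("edit", "File name", (100, 500, 600, 530)),
    UiNode("pane", "Buttons", (600, 540, 800, 600), [
        UiNode("button", "Save", (620, 550, 700, 580)),
        UiNode("button", "Cancel", (710, 550, 790, 580)),
    ]),
])

save = find(window, "button", "Save")
print(center(save))  # (660, 565)
```

Because the lookup keys on role and name, it would still find the Save button if the theme, font, or scaling changed, which is where pixel matching tends to break.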

Copy-paste prompts

Prompt 1
I have Windows-MCP running and connected to Claude Desktop. Write a prompt I can give Claude to open Excel, create a new spreadsheet, and enter a table of monthly sales data I describe.
Prompt 2
How do I register Windows-MCP as a Windows background task so it starts automatically every time I log in?
Prompt 3
Write a step-by-step workflow for Claude using Windows-MCP to find all PDFs in my Downloads folder modified this week and move them to a specific subfolder.
Prompt 4
How do I connect Windows-MCP to Perplexity Desktop instead of Claude Desktop? Show me the config changes needed.
Prompt 5
What accessibility API does Windows-MCP use to read UI elements, and how does this make it more reliable than screenshot-based automation tools?

Verify against the repo before relying on details.