explaingit

microsoft/ufo

8,634 · Python · Audience · developer · Complexity · 4/5 · Setup · moderate

TLDR

UFO is a Microsoft AI research project that lets software agents control Windows apps visually, reading the screen the way a person would. The latest version, UFO3, adds a framework called Galaxy that coordinates agents across Windows, Linux, and Android to complete a task on multiple devices at once.
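The observe, decide, act cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not UFO's code: `Screen`, `FakeDesktop`, `toy_policy`, and `run_agent` are hypothetical names, and `toy_policy` stands in for the large language model call a real agent would make on a screenshot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Screen:
    """Hypothetical stand-in for a screenshot plus the UI controls detected on it."""
    controls: List[str]
    text: str = ""


@dataclass
class FakeDesktop:
    """Hypothetical toy 'desktop' whose state changes when a control is clicked."""
    log: List[str] = field(default_factory=list)
    saved: bool = False

    def observe(self) -> Screen:
        # A real agent would capture pixels (and possibly accessibility data) here.
        if self.saved:
            return Screen(controls=[], text="File saved")
        return Screen(controls=["File", "Save"], text="Untitled - Notepad")

    def click(self, control: str) -> None:
        # Record the action and update the toy state.
        self.log.append(f"click {control}")
        if control == "Save":
            self.saved = True


def toy_policy(task: str, screen: Screen) -> Optional[str]:
    """Hypothetical placeholder for the LLM: pick the next control, or None when done."""
    if "File saved" in screen.text:
        return None
    return "Save" if "Save" in screen.controls else screen.controls[0]


def run_agent(task: str, desktop: FakeDesktop, policy, max_steps: int = 10) -> bool:
    """Loop: read the screen, ask the policy for an action, perform it."""
    for _ in range(max_steps):
        screen = desktop.observe()       # 1. read the screen
        action = policy(task, screen)    # 2. decide what to do next
        if action is None:               # policy reports the task is complete
            return True
        desktop.click(action)            # 3. act on the UI
    return False                         # gave up after max_steps


desktop = FakeDesktop()
print(run_agent("save the open file", desktop, toy_policy), desktop.log)
```

The `max_steps` cap is a simple guard so a confused policy cannot loop forever; a real agent would also need error handling around each action.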

Mindmap

mindmap
  root((repo))
    What it does
      Controls Windows apps visually
      Reads screen and acts
      Describes task in plain English
    Versions
      UFO single device
      UFO2 Desktop AgentOS
      UFO3 Galaxy multi-device
    Tech
      Python
      Large language models
      Windows Linux Android
    Use cases
      Task automation
      Multi-device workflows
      UI testing

Things people build with this

USE CASE 1

Automate repetitive Windows desktop tasks by describing the task in plain English and letting the agent click through apps on your behalf

USE CASE 2

Run multi-device workflows where agents on Windows, Linux, and Android collaborate on a single task without human involvement

USE CASE 3

Build automated UI testing flows that interact with desktop applications through their real visual interface, not just APIs

Tech stack

Python · LLM · Windows · Android · Linux

Getting it running

Difficulty · moderate · Time to first run · 1h+

Requires a Windows machine and an LLM API key. Multi-device Galaxy mode additionally needs Linux or Android machines configured as device agents.

In plain English

UFO is a Microsoft research project that lets a computer operate its own software the way a person would. Instead of a human clicking through menus and typing into apps, UFO reads the screen, interprets what it sees, and takes actions to complete a task you describe in plain English. It is built on large language models and began as a Windows-only tool.

The project has gone through three major versions. The original UFO, released in early 2024, was a single-device agent for Windows. UFO2, called Desktop AgentOS, integrated more deeply with Windows so the agent could operate apps both through the visual interface and, when available, through their underlying APIs. UFO3, the current version, introduces a framework called Galaxy that lets agents on different devices (Windows, Linux, Android) work together on the same task at the same time.

Galaxy breaks a user request into a graph of smaller tasks: some can run in parallel, while others must wait for earlier ones to finish. A planning component called the ConstellationAgent decides which device is best suited for each task, assigns work to the right machines, and adjusts the plan if something goes wrong mid-run. The devices communicate over a secure connection so they can share results and coordinate without human involvement.

For automating a single Windows computer, UFO2 is described as stable and straightforward to set up. For workflows that span multiple machines or operating systems, Galaxy handles the coordination. The two modes are compatible: a UFO2 installation can act as one of the device agents inside a Galaxy setup, so existing users can move to the newer system gradually. The repository includes documentation, quick-start guides, and video demos.
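The task-graph idea behind Galaxy can be sketched with Python's standard `graphlib` module. This is not Galaxy's implementation or API: the task names, the one-platform-per-task mapping, the device registry, and the `pick_device` and `schedule` helpers are all hypothetical, and device choice here is a plain platform match rather than whatever the ConstellationAgent actually does.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: task -> (required platform, prerequisite tasks)
TASKS = {
    "fetch_report": ("windows", set()),
    "fetch_contacts": ("android", set()),
    "process_report": ("linux", {"fetch_report"}),
    "send_summary": ("android", {"process_report", "fetch_contacts"}),
}

# Hypothetical device registry: device name -> platform
DEVICES = {"win-laptop": "windows", "linux-box": "linux", "pixel": "android"}


def pick_device(platform, devices):
    """Return the first registered device running the required platform."""
    for name, plat in devices.items():
        if plat == platform:
            return name
    raise LookupError(f"no device for platform {platform!r}")


def schedule(tasks, devices):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter({t: deps for t, (_, deps) in tasks.items()})
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose prerequisites are done
        waves.append([(t, pick_device(tasks[t][0], devices)) for t in ready])
        sorter.done(*ready)  # pretend the whole wave finished successfully
    return waves


for i, wave in enumerate(schedule(TASKS, DEVICES), start=1):
    print(i, wave)
```

Tasks in the same wave could run on their devices in parallel, while later waves wait for their prerequisites. The sketch is static: it does not model the mid-run replanning the ConstellationAgent performs when a step fails.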

Copy-paste prompts

Prompt 1
Using UFO2 on Windows, write a Python script that opens Excel, pastes data from the clipboard into column A, saves the file, and emails it to me
Prompt 2
Set up UFO3 Galaxy so a Windows agent fetches a sales report while a Linux agent processes it and an Android agent sends the summary via SMS
Prompt 3
Show me how to configure UFO to record every screen state it observes while completing a task so I can debug where it went wrong
Prompt 4
Write a UFO task description that opens Chrome, logs into a website with my saved credentials, downloads a CSV report, and closes Chrome when done

Verify against the repo before relying on details.