explaingit

bytedance/ui-tars-desktop

29,622 · TypeScript
Audience · developer
Complexity · 4/5
Setup · hard

TLDR

UI-TARS Desktop provides two AI agent tools that let an AI model look at your screen and control browsers or desktop apps by clicking and typing, so you can automate multi-step computer tasks using plain-language instructions.

Mindmap

mindmap
  root((repo))
    What It Does
      Screen observation
      Click and type
      Task automation
    Two Projects
      Agent TARS
      UI-TARS Desktop
    Interfaces
      CLI tool
      Web interface
      Desktop app
    Tech Stack
      TypeScript
      Electron
      MCP

Things people build with this

USE CASE 1

Automate a multi-step browser workflow by giving Agent TARS a plain-language task description instead of writing code

USE CASE 2

Build a GUI automation agent that fills out web forms, navigates pages, and extracts information without needing a site API

USE CASE 3

Control a local or remote desktop programmatically through an AI model interface

USE CASE 4

Prototype AI agent workflows that interact with existing desktop software that has no automation API

Tech stack

TypeScript · Electron · Node.js · MCP

Getting it running

Difficulty · hard
Time to first run · 1h+

Requires AI model credentials or a local model. The desktop app also needs OS-level screen access permissions.

In plain English

UI-TARS Desktop is an open-source stack of two related AI agent projects. Both let an AI model observe and interact with graphical interfaces, web browsers, desktop applications, and terminals the same way a human user would: by looking at the screen and clicking or typing.

The first project, Agent TARS, is a general-purpose multimodal AI agent (multimodal meaning it can process both text and visual information). It can be controlled through a command-line tool or a web-based interface, and it connects to external tools via MCP (Model Context Protocol, a protocol for giving AI agents access to real-world capabilities). You can give it a natural-language instruction like "book the earliest flight from X to Y on this website" and it will navigate a browser to complete the task.

The second project, UI-TARS Desktop, is a desktop application built on a specific AI model called UI-TARS. It provides local and remote operators for computers and browsers, meaning it can control either the machine it runs on or a remote machine.

Both projects are written in TypeScript and target developers and researchers who build or experiment with GUI automation agents, that is, software that automates tasks by operating graphical interfaces rather than calling APIs. You would use this when you want an AI to perform multi-step computer tasks on your behalf, or when you are building agent-based automation tooling.
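The "look at the screen, then click or type" cycle described above can be pictured as a small loop. The sketch below is illustrative only: `Action`, `Observation`, `Operator`, `Model`, `runTask`, and the two helper factories are hypothetical names invented for this example and are not taken from the Agent TARS or UI-TARS Desktop codebase. The model is replaced by a scripted stand-in, and the loop is kept synchronous for brevity, whereas a real model call would be asynchronous.

```typescript
// Hypothetical types for illustration; not the repo's actual API.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done"; summary: string };

interface Observation {
  screenshot: string; // stand-in for real image data
  step: number;
}

// An operator captures the screen and carries out actions on it.
interface Operator {
  observe(step: number): Observation;
  execute(action: Action): void;
}

// A model maps the instruction plus the latest observation to the next action.
type Model = (instruction: string, obs: Observation, history: Action[]) => Action;

// Observe, ask the model, act; stop on "done" or after maxSteps iterations.
function runTask(
  instruction: string,
  model: Model,
  operator: Operator,
  maxSteps = 10,
): Action[] {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const obs = operator.observe(step);
    const action = model(instruction, obs, history);
    history.push(action);
    if (action.kind === "done") break;
    operator.execute(action);
  }
  return history;
}

// Fake operator that records actions instead of touching a real screen.
function createRecordingOperator(): Operator & { log: string[] } {
  const log: string[] = [];
  return {
    log,
    observe(step: number): Observation {
      return { screenshot: `frame-${step}`, step };
    },
    execute(action: Action): void {
      if (action.kind === "click") log.push(`click(${action.x},${action.y})`);
      else if (action.kind === "type") log.push(`type(${JSON.stringify(action.text)})`);
    },
  };
}

// Scripted stand-in for a vision model: replays a fixed list of actions.
function createScriptedModel(script: Action[]): Model {
  return (_instruction, obs) =>
    script[obs.step] ?? { kind: "done", summary: "script exhausted" };
}

// Usage: replay a three-step script against the recording operator.
const operator = createRecordingOperator();
const history = runTask(
  "search for a flight",
  createScriptedModel([
    { kind: "click", x: 120, y: 48 },
    { kind: "type", text: "LHR to JFK" },
    { kind: "done", summary: "search submitted" },
  ]),
  operator,
);
console.log(history.length, operator.log);
```

Separating the operator from the model mirrors the local-versus-remote distinction in the description: under this assumed design, the same loop could drive either machine by swapping the operator implementation.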

Copy-paste prompts

Prompt 1
Using Agent TARS, write a task instruction that opens a browser, searches for the cheapest flight from London to New York next Friday, and returns the price and airline.
Prompt 2
How do I set up UI-TARS Desktop on my machine so it can control my local browser, and how do I give it a task from the command line?
Prompt 3
Create an Agent TARS workflow that logs into a web app with my credentials, fills a form with given inputs, and takes a screenshot of the confirmation page.
Prompt 4
How do I connect Agent TARS to an MCP tool server so it can call external APIs while also navigating a browser to complete a task?

Verify against the repo before relying on details.