explaingit

tencentqqgylab/appagent

6,734 | Python | Audience · developer | Complexity · 4/5 | License | Setup · hard

TLDR

A Python framework that lets an AI model control any Android app by looking at the screen and tapping or swiping. No app source code is needed, just a USB-connected phone and an OpenAI API key.

Mindmap

mindmap
  root((appagent))
    What it does
      AI controls Android apps
      Screen-based tapping
    How it works
      Exploration phase
      Deployment phase
    Tech stack
      Python runtime
      GPT-4V or Qwen-VL
      ADB USB bridge
    Audience
      Automation developers
      AI agent researchers


Things people build with this

USE CASE 1

Automate repetitive tasks in any Android app, such as following users, filling in forms, or navigating menus, without writing app-specific code.

USE CASE 2

Build an AI agent that learns how an app works by watching a human demonstration and then replicates those tasks on its own.

USE CASE 3

Run automated Android app tests that work at the visual UI layer, requiring no access to app internals or backend APIs.

Tech stack

Python, GPT-4V, Android, ADB

Getting it running

Difficulty · hard | Time to first run · 1h+

Requires an Android device or emulator, an ADB connection over USB, Python dependencies, and an OpenAI API key (~$0.03 per action request).
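
To get a feel for what the per-request price means for a whole task, here is a rough budgeting sketch. It assumes a flat $0.03 per request, as quoted above. Real charges depend on the model's current pricing and on how many tokens each screenshot and prompt consume, and the function names are illustrative, not part of AppAgent.

```python
from decimal import Decimal

# Assumed flat price per action request, taken from the figure quoted above.
PRICE_PER_REQUEST = Decimal("0.03")


def estimate_cost(num_requests: int, price: Decimal = PRICE_PER_REQUEST) -> Decimal:
    """Approximate total cost in dollars for a number of action requests."""
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    return num_requests * price


def requests_within_budget(budget: Decimal, price: Decimal = PRICE_PER_REQUEST) -> int:
    """How many whole requests fit inside a dollar budget."""
    return int(budget // price)


if __name__ == "__main__":
    print(estimate_cost(50))                        # cost of a 50-step session
    print(requests_within_budget(Decimal("3.00")))  # steps affordable for $3
```

Under this assumption, a 50-step session comes to about $1.50, so long exploration runs are worth capping with a step limit.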

Use freely in any project, including commercial ones, as long as you keep the copyright notice (MIT license).

In plain English

AppAgent is a Python framework that lets an AI model control Android smartphone apps by looking at the screen and simulating taps and swipes. The agent sees the screen as an image, decides what to do next, and sends those actions to the phone over a USB connection using Android Debug Bridge (ADB). No access to an app's internal code or backend is needed, which means it can work with any app the phone can run.

The framework uses a two-phase approach. In the exploration phase, the agent either explores an app on its own or watches a human demonstration to learn how the app works. It builds a written record of what each on-screen element does and saves that for later. In the deployment phase, when given a task, the agent reads from that record and applies what it learned to complete the task step by step.

The AI model that makes the decisions is a multimodal model, meaning it processes both screenshots and text. The setup uses GPT-4V by default, which requires an OpenAI API key and costs around $0.03 per request. A free alternative using Alibaba Cloud's Qwen-VL model is also supported, though the README notes its performance is weaker.

To get started, you connect an Android device or Android Studio emulator to your computer, install Python dependencies, configure your API key, and then point the agent at a task. The README includes a demo showing the agent following a user on X (formerly Twitter) and another example of it passing a CAPTCHA challenge.

The project was published as a paper at CHI 2025 and is released under MIT. A follow-up project called AppAgentX, with an evolving mechanism, was released by the same team shortly after.
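
The screenshot, decide, act cycle described above can be sketched with standard ADB commands. This is a minimal illustration, not AppAgent's actual code. The helper names (`tap_command`, `swipe_command`, `step`) and the action dictionary format are assumptions made for this example, and the `decide` callback stands in for the multimodal model call. Only the `adb` subcommands themselves (`shell input tap`, `shell input swipe`, `exec-out screencap -p`) are real.

```python
import subprocess
from typing import Callable, Dict, List, Optional


def adb_prefix(serial: Optional[str] = None) -> List[str]:
    """Base adb command, optionally targeting one device by its serial."""
    return ["adb", "-s", serial] if serial else ["adb"]


def tap_command(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """adb command that taps the screen at pixel (x, y)."""
    return adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def swipe_command(x1: int, y1: int, x2: int, y2: int,
                  duration_ms: int = 300,
                  serial: Optional[str] = None) -> List[str]:
    """adb command that swipes from (x1, y1) to (x2, y2)."""
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb_prefix(serial) + ["shell", "input", "swipe"] + coords


def screenshot_command(serial: Optional[str] = None) -> List[str]:
    """adb command that writes a PNG screenshot to stdout."""
    return adb_prefix(serial) + ["exec-out", "screencap", "-p"]


def run(command: List[str]) -> bytes:
    """Run a command and return its stdout (needs adb on PATH and a device)."""
    return subprocess.run(command, check=True, capture_output=True).stdout


def step(decide: Callable[[bytes], Dict],
         serial: Optional[str] = None,
         runner: Callable[[List[str]], bytes] = run) -> List[str]:
    """One loop iteration: capture the screen, ask `decide`, send the action.

    `decide` is a hypothetical stand-in for the model call. It receives PNG
    bytes and returns a dict such as {"type": "tap", "x": 540, "y": 1200}.
    """
    screenshot = runner(screenshot_command(serial))
    action = decide(screenshot)
    if action["type"] == "tap":
        command = tap_command(action["x"], action["y"], serial)
    elif action["type"] == "swipe":
        command = swipe_command(action["x1"], action["y1"],
                                action["x2"], action["y2"], serial=serial)
    else:
        raise ValueError(f"unsupported action type: {action['type']}")
    runner(command)
    return command
```

Passing `runner` as a parameter lets the loop be exercised without a phone attached: a fake runner can return placeholder bytes and record the commands it was given. With a real device connected, calling `step(my_decide)` with the default runner would execute the commands through `adb`.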

Copy-paste prompts

Prompt 1
Using AppAgent, how do I set up an Android emulator in Android Studio and run the agent to automatically follow a user on X (Twitter)?
Prompt 2
Walk me through configuring AppAgent with a free Qwen-VL API key instead of GPT-4V to reduce costs for automating Android tasks.
Prompt 3
How does AppAgent's exploration phase work? Show me how to have it learn a new app's UI layout by watching a human demo and saving the results.
Prompt 4
I want AppAgent to complete a multi-step task in a shopping app. How do I write the task description and deploy the agent against a connected Android phone?

Verify against the repo before relying on details.