Automate repetitive tasks in any Android app, such as following users, filling forms, or navigating menus, without writing app-specific code.
Build an AI agent that learns how an app works by watching a human demonstration and then replicates those tasks on its own.
Run automated Android app tests that work at the visual UI layer, requiring no access to app internals or backend APIs.
Requires an Android device or emulator, USB ADB connection, Python dependencies, and an OpenAI API key (~$0.03 per action request).
AppAgent is a Python framework that lets an AI model control Android smartphone apps by looking at the screen and simulating taps and swipes. The agent sees the screen as an image, decides what to do next, and sends those actions to the phone over a USB connection using Android Debug Bridge (ADB). No access to an app's internal code or backend is needed, which means it can work with any app the phone can run.
The framework uses a two-phase approach. In the exploration phase, the agent either explores an app on its own or watches a human demonstration to learn how the app works. It builds a written record of what each on-screen element does and saves that for later. In the deployment phase, when given a task, the agent reads from that record and applies what it learned to complete the task step by step.
The AI model that makes the decisions is a multimodal model, meaning it processes both screenshots and text. The setup uses GPT-4V by default, which requires an OpenAI API key and costs around $0.03 per request. A free alternative using Alibaba Cloud's Qwen-VL model is also supported, though the README notes its performance is weaker.
To get started, you connect an Android device or Android Studio emulator to your computer, install Python dependencies, configure your API key, and then point the agent at a task. The README includes a demo showing the agent following a user on X (formerly Twitter) and another example of it passing a CAPTCHA challenge.
The project was published as a paper at CHI 2025 and is released under the MIT license. A follow-up project called AppAgentX, with an evolving mechanism, was released by the same team shortly after.
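The screenshot, tap, and swipe steps that the agent sends over ADB can be illustrated with a minimal sketch. This is not AppAgent's own code: the helper names (adb_command, tap_command, swipe_command, screenshot_command, run) are hypothetical and chosen only for illustration. The adb subcommands themselves (`shell input tap`, `shell input swipe`, `exec-out screencap -p`, and the `-s` device selector) are standard ADB usage.

```python
import subprocess
from typing import List, Optional, Sequence


def adb_command(args: Sequence[str], serial: Optional[str] = None) -> List[str]:
    """Build an adb argument list, optionally targeting one device by serial."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]  # -s selects a specific device or emulator
    return cmd + list(args)


def tap_command(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Argument list for a single tap at pixel coordinates (x, y)."""
    return adb_command(["shell", "input", "tap", str(x), str(y)], serial)


def swipe_command(x1: int, y1: int, x2: int, y2: int,
                  duration_ms: int = 300,
                  serial: Optional[str] = None) -> List[str]:
    """Argument list for a swipe from (x1, y1) to (x2, y2)."""
    return adb_command(
        ["shell", "input", "swipe",
         str(x1), str(y1), str(x2), str(y2), str(duration_ms)],
        serial,
    )


def screenshot_command(serial: Optional[str] = None) -> List[str]:
    """Argument list that writes a PNG screenshot to stdout."""
    return adb_command(["exec-out", "screencap", "-p"], serial)


def run(cmd: List[str]) -> bytes:
    """Execute a built command; requires adb on PATH and a connected device."""
    return subprocess.run(cmd, check=True, capture_output=True).stdout
```

With a device attached, run(screenshot_command()) would return PNG bytes to pass to the multimodal model, and run(tap_command(540, 1200)) would send the tap the model chose. Building the argument lists separately from executing them keeps the sketch testable without a phone connected.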
Verify against the repo before relying on details.