Automate repetitive smartphone tasks like opening apps and searching without manual taps.
Test mobile apps by having the AI interact with screens and verify expected behavior.
Research and develop AI phone agents by experimenting with vision-language models on real devices.
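Requires access to the AutoGLM-Phone-9B model, either self-hosted via vLLM or SGLang (which likely needs a CUDA-capable GPU) or reached through a third-party API service; a device connection over ADB (Android) or HDC (HarmonyOS); and WebDriverAgent for iOS. Expect several moving parts.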
Open-AutoGLM is an open-source AI phone agent framework built on the AutoGLM model. Most smartphone tasks still require you to tap through menus yourself; this project lets you describe a task in plain language, such as "open Meituan and search for nearby hotpot restaurants", and the AI works out the steps on your phone's screen and carries them out. It connects to an Android or HarmonyOS phone via ADB (Android Debug Bridge, a standard developer tool for communicating with Android devices) or HDC (the equivalent for Huawei HarmonyOS). It then takes a screenshot, uses a vision-language model to interpret what is shown, plans a sequence of actions, and executes taps, swipes, and text input on your behalf. Sensitive actions go through a confirmation step, and the device can also be controlled remotely over Wi-Fi. The model (AutoGLM-Phone-9B) can be reached through third-party API services or self-hosted with the vLLM or SGLang inference frameworks. iOS is also supported, though it needs additional setup via WebDriverAgent. Typical uses are automating repetitive phone tasks, app testing, and research into AI phone agents. The framework is written in Python and intended for research and educational use.
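The screenshot, plan, confirm, and execute cycle described above can be sketched roughly as follows. This is a minimal illustration, not Open-AutoGLM's actual code: `Action`, `adb_command`, `capture_screenshot`, and `run_task` are hypothetical names, and the planner (the vision-language model call) is left as a function you supply. Only the `adb` subcommands (`exec-out screencap -p`, `shell input tap|swipe|text`) are standard Android tooling.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One planned step (hypothetical structure, not the project's own)."""
    kind: str                 # "tap", "swipe", "type", or "done"
    args: tuple = ()
    sensitive: bool = False   # True if the step should be confirmed first


def adb_command(action: Action, serial: Optional[str] = None) -> List[str]:
    """Translate an Action into an adb argument list."""
    base = ["adb"] + (["-s", serial] if serial else [])
    if action.kind == "tap":
        x, y = action.args
        return base + ["shell", "input", "tap", str(x), str(y)]
    if action.kind == "swipe":
        x1, y1, x2, y2 = action.args
        return base + ["shell", "input", "swipe",
                       str(x1), str(y1), str(x2), str(y2)]
    if action.kind == "type":
        (text,) = action.args
        # `input text` reads %s as a space; non-ASCII text and shell
        # metacharacters need different handling and are ignored here.
        return base + ["shell", "input", "text", text.replace(" ", "%s")]
    raise ValueError(f"unsupported action kind: {action.kind}")


def capture_screenshot(serial: Optional[str] = None) -> bytes:
    """Return the current screen as PNG bytes via adb."""
    base = ["adb"] + (["-s", serial] if serial else [])
    result = subprocess.run(base + ["exec-out", "screencap", "-p"],
                            capture_output=True, check=True)
    return result.stdout


def run_task(
    task: str,
    capture: Callable[[], bytes],
    plan: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[List[str]], None],
    confirm: Callable[[Action], bool],
    max_steps: int = 20,
) -> List[Action]:
    """Loop: screenshot -> plan -> (confirm) -> execute, until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()
        action = plan(task, screenshot, history)  # the model call goes here
        if action.kind == "done":
            break
        if action.sensitive and not confirm(action):
            break  # user declined a sensitive step; stop the task
        execute(adb_command(action))
        history.append(action)
    return history
```

On a real device, `capture` could be `capture_screenshot` and `execute` could be `lambda cmd: subprocess.run(cmd, check=True)`. Passing these in as parameters keeps the loop testable without a phone attached, and `max_steps` bounds a run in case the planner never reports `done`.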
Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.