explaingit

zai-org/open-autoglm

Analysis updated 2026-05-18

25,201PythonAudience · developerComplexity · 4/5LicenseSetup · hard

TLDR

Open-source AI phone agent that automates Android and iOS tasks by understanding screenshots and executing taps, swipes, and text input based on plain-language instructions.

Mindmap

mindmap
  root((repo))
    What it does
      Automates phone tasks
      Understands screenshots
      Executes actions
    How it works
      Connects via ADB/HDC
      Vision-language model
      Plans and executes
    Supported platforms
      Android devices
      HarmonyOS phones
      iOS with setup
    Use cases
      Repetitive automation
      App testing
      Agent research
    Tech stack
      Python framework
      vLLM inference
      SGLang support
Click or tap to explore — scroll the page freely

Code map

Detail Auto

An interactive map of this repo's files and how they connect — its source is parsed live in your browser. Click Visualize to build it.

filefunction / class

What do people build with it?

USE CASE 1

Automate repetitive smartphone tasks like opening apps and searching without manual taps.

USE CASE 2

Test mobile apps by having the AI interact with screens and verify expected behavior.

USE CASE 3

Research and develop AI phone agents by experimenting with vision-language models on real devices.

What is it built with?

PythonAutoGLM-Phone-9BvLLMSGLangADBHDCWebDriverAgent

How does it compare?

zai-org/open-autoglmlucidrains/vit-pytorchzulip/zulip
Stars25,20125,14725,147
LanguagePythonPythonPython
Setup difficultyhardmoderatehard
Complexity4/53/54/5
Audiencedeveloperresearcherops devops

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard Time to first run · 1day+

Requires running a local LLM (AutoGLM-Phone-9B via vLLM), Android/iOS device setup (ADB/HDC), and WebDriverAgent infrastructure, multiple moving parts with GPU/CUDA likely needed.

Use freely for any purpose including commercial. Keep the notice and disclose changes to the patent grant.

In plain English

Open-AutoGLM is an open-source AI phone agent framework built on the AutoGLM model. The problem it solves is that most smartphone tasks still require you to tap through menus yourself. This project lets you describe a task in plain language, such as "open Meituan and search for nearby hotpot restaurants", and the AI automatically figures out what to do on your phone's screen and does it for you. The system works by connecting to your Android or HarmonyOS phone via ADB (Android Debug Bridge, a standard developer tool for communicating with Android devices) or HDC (the equivalent for Huawei HarmonyOS). It takes screenshots of your screen, uses a vision-language model to understand what is shown, plans a sequence of actions, and then executes taps, swipes, and text input on your behalf. It includes a confirmation step for sensitive actions and supports remote control over Wi-Fi. The AI model (AutoGLM-Phone-9B) can be run via third-party API services or self-hosted using vLLM or SGLang inference frameworks. You would use this for automating repetitive phone tasks, app testing, or research into AI phone agents. The framework supports iOS as well as Android, though iOS setup requires additional configuration via WebDriverAgent. It is written in Python and intended for research and educational use.

Copy-paste prompts

Prompt 1
How do I set up Open-AutoGLM to control my Android phone and automate a task like opening an app and searching for something?
Prompt 2
Show me how to use the AutoGLM-Phone-9B model with vLLM to process screenshots and generate phone actions.
Prompt 3
What's the difference between using ADB for Android and HDC for HarmonyOS in Open-AutoGLM, and how do I configure each?
Prompt 4
How can I add a confirmation step for sensitive actions in Open-AutoGLM before the AI executes them on my phone?

Frequently asked questions

What is open-autoglm?

Open-source AI phone agent that automates Android and iOS tasks by understanding screenshots and executing taps, swipes, and text input based on plain-language instructions.

What language is open-autoglm written in?

Mainly Python. The stack also includes Python, AutoGLM-Phone-9B, vLLM.

What license does open-autoglm use?

Use freely for any purpose including commercial. Keep the notice and disclose changes to the patent grant.

How hard is open-autoglm to set up?

Setup difficulty is rated hard, with roughly 1day+ to a first successful run.

Who is open-autoglm for?

Mainly developer.

Open on GitHub → Explain another repo

This repo across BitVibe Labs

Scan in gitsafehub Deploy in gitdeployhub zai-org on gitmyhub

Verify against the repo before relying on details.