Automate repetitive desktop or browser tasks by describing them in plain language and letting the AI figure out and execute the steps.
Build a personal automation library that grows smarter over time as the agent saves reusable skills from completed tasks.
Control an Android phone over USB with AI instructions, for automated mobile testing or workflow automation.
Requires an API key for a supported AI model such as Claude or Gemini, plus desktop access for computer-control features.
GenericAgent is a Python framework that lets a large language model (an AI system like Claude or Gemini) control a real computer on your behalf. It can open and interact with a browser, run terminal commands, manage files, move the mouse and keyboard, read the screen, and even control an Android phone via USB. You describe a task in plain language, and the agent figures out the steps, executes them, and reports back.

The framework's central design idea is that it learns from experience. When the agent successfully completes a task for the first time, it automatically saves the approach as a reusable skill. The next time you ask for something similar, it recalls that skill directly rather than working it out from scratch. Over time this builds a personal skill library unique to your setup, which the README describes as a growing skill tree.

The codebase itself is deliberately small, around 3,000 lines of core code. The agent loop that drives behavior is roughly 100 lines. The authors claim this minimal footprint lets the agent run within a context window far smaller than competing frameworks, which reduces cost and keeps the AI's attention focused on relevant information.

Several interface options are included: a desktop GUI, a terminal interface, a Streamlit web app, a Telegram bot, and a WeChat bot. You connect it to whichever AI model you already have API access to, configure your key, and launch.

The README notes the entire repository, including its git history and commit messages, was created autonomously by the agent itself with no manual terminal use by the author. The project has a published technical report on arXiv. It is released publicly with an open-source license. The full README is longer than what was shown.
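The save-then-recall behavior described above can be illustrated with a small sketch. This is not GenericAgent's actual code or API: the names `Skill`, `SkillLibrary`, `run_task`, and `fake_plan` are hypothetical, and the word-overlap matching is a stand-in chosen for simplicity, since the summary does not say how the real framework matches tasks to skills.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Skill:
    """A saved approach: the task it solved and the steps that worked."""
    task: str
    steps: List[str]


def _keywords(text: str) -> frozenset:
    """Normalize a task description into a set of lowercase words."""
    return frozenset(text.lower().split())


@dataclass
class SkillLibrary:
    """Hypothetical in-memory skill store (an assumption, not the real design)."""
    skills: List[Skill] = field(default_factory=list)

    def save(self, task: str, steps: List[str]) -> None:
        self.skills.append(Skill(task, list(steps)))

    def recall(self, task: str, threshold: float = 0.5) -> Optional[Skill]:
        """Return the best match by Jaccard word overlap, or None if too weak."""
        query = _keywords(task)
        best, best_score = None, 0.0
        for skill in self.skills:
            words = _keywords(skill.task)
            union = query | words
            score = len(query & words) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = skill, score
        return best if best_score >= threshold else None


def run_task(task: str, library: SkillLibrary,
             plan: Callable[[str], List[str]]) -> List[str]:
    """Reuse a saved skill when one matches; otherwise plan, then save."""
    skill = library.recall(task)
    if skill is not None:
        return skill.steps
    steps = plan(task)  # stands in for the model working the task out
    library.save(task, steps)
    return steps


# Usage: the planner runs only for the first, unfamiliar task.
calls: List[str] = []


def fake_plan(task: str) -> List[str]:
    calls.append(task)
    return ["open browser", "download report"]


library = SkillLibrary()
first = run_task("download the weekly sales report", library, fake_plan)
second = run_task("download the weekly sales report again", library, fake_plan)
```

In this sketch the second request reuses the stored steps without calling the planner again, which mirrors the cost-saving idea in the description. A real system would likely persist skills to disk and use a more robust matching method than word overlap.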
Verify against the repo before relying on details.