explaingit

alibaba/page-agent

Analysis updated 2026-06-24

17,783 stars · TypeScript · Audience: developer · Complexity: 3/5 · License: MIT · Setup: moderate

TLDR

JavaScript library that drives any web page from natural language instructions in the browser, using your own LLM API key and the page's DOM as text.

Mindmap

mindmap
  root((page-agent))
    Inputs
      Natural Language Instructions
      LLM API Key
      Page DOM
    Outputs
      Click and Input Actions
      Navigation Steps
    Use Cases
      In App AI Copilot
      Workflow Automation
      Accessibility Helper
    Tech Stack
      TypeScript
      Browser
      MCP


What do people build with it?

USE CASE 1

Add a natural-language copilot to your SaaS product so users can drive the UI by typing or speaking.

USE CASE 2

Automate repetitive multi-step workflows inside an internal ERP or CRM web app.

USE CASE 3

Improve accessibility by letting users control a complex web UI with plain English.

USE CASE 4

Expose your web app to external AI agents through the bundled MCP server.
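MCP messages are JSON-RPC 2.0. An external agent first discovers tools with `tools/list` and then invokes one with `tools/call`. The sketch that follows shows that request/response shape, with several assumptions. The tool name `execute_task`, its `task` argument and the `TaskRunner` callback are hypothetical and are not page-agent's real MCP tools. The sketch also omits the `initialize` handshake and the transport (stdio or HTTP) that a real server needs.

```typescript
// Hypothetical sketch: how an MCP server could answer two JSON-RPC 2.0 methods.
// The tool name "execute_task" is illustrative, not page-agent's real tool.

type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
};

type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: number; result: unknown }
  | { jsonrpc: "2.0"; id: number; error: { code: number; message: string } };

// Hypothetical stand-in for whatever actually carries out the instruction on the page.
type TaskRunner = (task: string) => string;

function handleMcpRequest(req: JsonRpcRequest, runTask: TaskRunner): JsonRpcResponse {
  if (req.method === "tools/list") {
    // Advertise one tool, with a JSON Schema describing its arguments.
    const tools = [
      {
        name: "execute_task",
        description: "Carry out a natural-language instruction on the current page",
        inputSchema: {
          type: "object",
          properties: { task: { type: "string" } },
          required: ["task"],
        },
      },
    ];
    return { jsonrpc: "2.0", id: req.id, result: { tools } };
  }
  if (req.method === "tools/call") {
    const name = req.params?.name;
    const task = req.params?.arguments?.task;
    if (name !== "execute_task") {
      // -32602 is the JSON-RPC "Invalid params" code.
      return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: `Unknown tool: ${name}` } };
    }
    // Tool-level failures are reported inside the result with isError: true.
    const result: ToolResult =
      typeof task === "string"
        ? { content: [{ type: "text", text: runTask(task) }] }
        : { content: [{ type: "text", text: "Missing string argument 'task'" }], isError: true };
    return { jsonrpc: "2.0", id: req.id, result };
  }
  // -32601 is the JSON-RPC "Method not found" code.
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: `Method not found: ${req.method}` } };
}
```

Keeping protocol errors (unknown method or tool) separate from tool failures (`isError` in the result) lets the calling agent tell "I asked wrongly" apart from "the page action did not work".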

What is it built with?

TypeScript · JavaScript · MCP · Chrome Extension

How does it compare?

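|                  | alibaba/page-agent | justadudewhohacks/face-api.js | oblador/react-native-vector-icons |
|------------------|--------------------|-------------------------------|-----------------------------------|
| Stars            | 17,783             | 17,848                        | 17,855                            |
| Language         | TypeScript         | TypeScript                    | TypeScript                        |
| Setup difficulty | moderate           | easy                          | moderate                          |
| Complexity       | 3/5                | 2/5                           | 2/5                               |
| Audience         | developer          | developer                     | developer                         |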

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty: moderate · Time to first run: about 30 minutes

Requires an API key from an LLM provider of your choice, plus a web page you control in which to embed the script.
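Because you supply your own key, each model call is an ordinary HTTP request to your provider. Many providers expose an OpenAI-compatible chat-completions endpoint. The sketch below assumes one and only assembles the request (URL, bearer-token header, JSON body) that such a call needs. `LlmConfig`, `buildChatRequest` and the prompt wording are hypothetical and are not page-agent's configuration options.

```typescript
// Hypothetical config shape; page-agent's real option names may differ.
interface LlmConfig {
  baseURL: string; // an OpenAI-compatible API root, e.g. ".../v1"
  apiKey: string; // your own provider key
  model: string; // whatever model name your provider accepts
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a chat-completions request that pairs a text
// snapshot of the page with the user's instruction.
function buildChatRequest(cfg: LlmConfig, pageText: string, instruction: string): ChatRequest {
  const root = cfg.baseURL.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${root}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body: JSON.stringify({
      model: cfg.model,
      messages: [
        { role: "system", content: "You operate a web page. Reply with exactly one action." },
        { role: "user", content: `Page elements:\n${pageText}\n\nTask: ${instruction}` },
      ],
    }),
  };
}
```

You would send the result with `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Any key shipped in client-side JavaScript is visible to users, so a production setup would more likely make this call from your own backend.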

MIT license: use freely for any purpose including commercial use, as long as you keep the copyright notice.

In plain English

Page Agent is a JavaScript library that lets you control any web page using natural-language instructions. Instead of clicking buttons and filling in forms manually, you describe what you want in plain text, such as "Click the login button" or "Fill in the shipping address", and the library works out which elements on the page it needs to interact with and performs the actions for you.

The key difference from other browser automation tools is that Page Agent runs directly inside the web page as ordinary JavaScript, not as a separate browser extension, a Python script, or a headless browser (a browser driven programmatically without a visible window). It reads the page's structure as text rather than taking screenshots, so it does not need a multimodal AI model that can interpret images. You bring your own AI model by providing an API key, and the library handles the interaction logic.

The README describes several use cases: adding an AI copilot to a software product so users can navigate it with voice or text commands, automating repetitive multi-step workflows in enterprise tools such as ERP or CRM systems, and improving accessibility by letting users control interfaces through natural language. An optional Chrome extension extends this capability across multiple browser tabs, and an MCP server (MCP, the Model Context Protocol, is a standard for connecting AI tools to external systems) lets external agents control the browser.

You install it via npm or include it as a script tag on your page. It is written in TypeScript and released under the MIT license. It is designed for client-side enhancement of web applications you own, not for automated scraping of third-party sites.
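Reading the page as text is what lets a text-only model pick targets. The following sketch illustrates that general idea and rests on stated assumptions. `PageNode` is a hypothetical, simplified stand-in for DOM elements so the code runs outside a browser. The `[index] <tag> label` line format and the `click(n)` / `type(n, "...")` reply grammar are invented for illustration. page-agent's real serialization and action format may differ.

```typescript
// Hypothetical, simplified stand-in for DOM nodes so the sketch runs without a browser.
interface PageNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: PageNode[];
}

const INTERACTIVE_TAGS = ["a", "button", "input", "select", "textarea"];

// Walk the tree depth-first and list interactive elements as numbered text lines.
function serialize(root: PageNode): { text: string; elements: PageNode[] } {
  const elements: PageNode[] = [];
  const lines: string[] = [];
  const visit = (node: PageNode): void => {
    if (INTERACTIVE_TAGS.indexOf(node.tag) !== -1) {
      const index = elements.length;
      elements.push(node);
      // Prefer visible text, then placeholder, then name, as the element's label.
      const label = node.text ?? node.attrs?.placeholder ?? node.attrs?.name ?? "";
      lines.push(label ? `[${index}] <${node.tag}> ${label}` : `[${index}] <${node.tag}>`);
    }
    (node.children ?? []).forEach(visit);
  };
  visit(root);
  return { text: lines.join("\n"), elements };
}

type Action = { kind: "click"; index: number } | { kind: "type"; index: number; value: string };

// Parse a model reply such as `click(1)` or `type(0, "Alice")` into an action.
function parseAction(reply: string): Action | null {
  const trimmed = reply.trim();
  const click = /^click\((\d+)\)$/.exec(trimmed);
  if (click) return { kind: "click", index: Number(click[1]) };
  const typed = /^type\((\d+),\s*"(.*)"\)$/.exec(trimmed);
  if (typed) return { kind: "type", index: Number(typed[1]), value: typed[2] };
  return null; // unrecognized reply
}
```

In a browser the same walk would start from `document.body`, and the chosen element would receive a real `click()` or value change. Numbering the elements keeps the model's reply short and easy to map back to a node.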

Copy-paste prompts

Prompt 1
Drop page-agent into my React SPA and wire it up to OpenAI so users can navigate by typing.
Prompt 2
Build a page-agent script that automates a 5-step checkout flow in my internal admin tool.
Prompt 3
Expose my web app to an external AI agent using page-agent's MCP server with auth.
Prompt 4
Compare page-agent to Playwright for in-browser automation in an enterprise CRM context.
Prompt 5
Help me restrict page-agent so it can only click elements inside a specific section of my app.

Frequently asked questions

What is page-agent?

JavaScript library that drives any web page from natural language instructions in the browser, using your own LLM API key and the page's DOM as text.

What language is page-agent written in?

Mainly TypeScript. The stack also includes JavaScript, MCP, and a Chrome extension.

What license does page-agent use?

MIT license: use freely for any purpose including commercial use, as long as you keep the copyright notice.

How hard is page-agent to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is page-agent for?

Mainly developers.



Verify against the repo before relying on details.