Page Agent is a JavaScript library that lets you control any web page using natural-language instructions. Instead of clicking buttons and filling in forms manually, you describe what you want in plain text, such as "Click the login button" or "Fill in the shipping address". The library works out which elements on the page it needs to interact with and performs the actions for you. The key difference from other browser automation tools is where it runs. Page Agent runs directly inside the web page as ordinary JavaScript. It is not a separate browser extension, a Python script, or a headless browser (a browser run programmatically without a visible window). It works by reading the page's structure as text rather than taking screenshots, so it does not need a multimodal AI model that can interpret images.

You bring your own AI model by providing an API key, and the library handles the interaction logic. The README describes several use cases:
- adding an AI copilot to a software product so users can navigate it with voice or text commands;
- automating repetitive multi-step workflows in enterprise tools such as ERP or CRM systems;
- improving accessibility by letting users control interfaces through natural language.

An optional Chrome extension extends this capability across multiple browser tabs. An MCP server lets external agents control the browser; MCP, the Model Context Protocol, is a protocol for connecting AI tools. You can install the library via npm or include it with a script tag on your page. It is written in TypeScript and released under the MIT license. It is intended for client-side enhancement of web applications you own, not for automated scraping of third-party sites.
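The idea of reading a page's structure as text, rather than from screenshots, can be illustrated with a small sketch. This is not Page Agent's API or implementation.

- `PageNode`, `serializePage`, `Action`, and `applyAction` are hypothetical names.
- The page is modelled as a plain object tree so the example runs without a browser.
- The sketch assumes one common pattern for text-only models. Interactive elements are listed as numbered lines. The model answers with an action that refers to an index, and that index is mapped back to an element.
- The call to the language model itself is omitted.

```typescript
// Hypothetical sketch of text-based page reading; NOT Page Agent's API.
// PageNode is a simplified stand-in for a DOM element so this runs without a browser.
interface PageNode {
  tag: string;
  text?: string;
  attrs?: Record<string, string>;
  children?: PageNode[];
}

// Tags treated as interactive in this sketch (an assumption; real detection is richer).
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

interface IndexedElement {
  index: number;
  node: PageNode;
}

// Depth-first walk that emits one numbered line per interactive element,
// producing a compact text view that a text-only model can read.
function serializePage(root: PageNode): { text: string; elements: IndexedElement[] } {
  const lines: string[] = [];
  const elements: IndexedElement[] = [];
  const visit = (node: PageNode): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const index = elements.length;
      elements.push({ index, node });
      // Label preference: visible text, then placeholder, then aria-label.
      const label = node.text ?? node.attrs?.["placeholder"] ?? node.attrs?.["aria-label"] ?? "";
      lines.push(`[${index}] <${node.tag}> ${label}`.trimEnd());
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return { text: lines.join("\n"), elements };
}

// Hypothetical action format a model might return after reading the text view.
type Action =
  | { kind: "click"; index: number }
  | { kind: "type"; index: number; value: string };

// Map the model's index back to an element and "perform" the action.
// Here the action is only recorded; a browser version would act on a real element.
function applyAction(action: Action, elements: IndexedElement[]): string {
  const target = elements[action.index];
  if (!target) throw new Error(`No interactive element with index ${action.index}`);
  if (action.kind === "type") {
    target.node.attrs = { ...target.node.attrs, value: action.value };
    return `typed "${action.value}" into [${action.index}]`;
  }
  return `clicked [${action.index}] <${target.node.tag}>`;
}

// Example: a login form. A model would read `text` and reply with an Action.
const page: PageNode = {
  tag: "form",
  children: [
    { tag: "label", text: "Email" },
    { tag: "input", attrs: { placeholder: "Email" } },
    { tag: "button", text: "Log in" },
  ],
};
const { text, elements } = serializePage(page);
console.log(text);
// [0] <input> Email
// [1] <button> Log in
console.log(applyAction({ kind: "click", index: 1 }, elements));
// clicked [1] <button>
```

In a real browser, the same walk would start from `document.body`, and the click step would call the element's `click()` method. How Page Agent itself selects, labels, and acts on elements may differ from this sketch.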
Generated 2026-05-21 · Model: sonnet-4-6 · Verify against the repo before relying on details.