explaingit

vercel-labs/agent-browser

📈 Trending | 33,359 | Rust | Audience · developer | Complexity · 3/5 | Active | License | Setup · moderate

TLDR

A command-line tool that lets AI agents control a web browser (clicking buttons, filling forms, taking screenshots) through simple text commands.

Mindmap

mindmap
  root((agent-browser))
    What it does
      Control browser via CLI
      Click buttons and links
      Fill forms automatically
      Take screenshots
    How it works
      Launches Chrome browser
      Accessibility tree snapshots
      Element reference IDs
      Natural language mode
    Use cases
      AI web automation
      Form filling workflows
      Content scraping
      Repetitive web tasks
    Tech stack
      Rust core
      Chrome for Testing
      npm distribution
    Audience
      AI engineers
      Automation builders
      Vibe coders

Things people build with this

USE CASE 1

Build an AI agent that autonomously fills out web forms and submits them without human intervention.

USE CASE 2

Automate repetitive web tasks like logging in, navigating pages, and extracting data from multiple websites.

USE CASE 3

Create a chatbot that can browse the web, read page content, and answer questions about what it finds.

USE CASE 4

Scrape dynamic web content by having an agent click through pages and capture screenshots or text.
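The use cases above share one pattern: an agent issues a short sequence of browser commands, one per step. The sketch below builds such a sequence in Python and prints it without running anything by default. It is an illustration under assumptions, not the project's documented interface: the binary name `agent-browser`, the subcommand names (`open`, `snapshot`, `fill`, `click`, `screenshot`), and the `@e1`-style references are assumed here and should be checked against the repo. The helper names `build_form_workflow` and `run_workflow` are hypothetical.

```python
import shlex
import shutil
import subprocess

# ASSUMPTION: binary and subcommand names are illustrative; verify them in the repo.
BINARY = "agent-browser"


def build_form_workflow(url, fields, submit_ref, screenshot_path):
    """Return a list of argv lists for a hypothetical fill-and-submit workflow."""
    commands = [
        [BINARY, "open", url],   # navigate to the page
        [BINARY, "snapshot"],    # capture the accessibility tree and its element refs
    ]
    # In a real agent loop the refs would be read from the snapshot output;
    # here they are passed in as hypothetical values.
    for ref, value in fields.items():
        commands.append([BINARY, "fill", ref, value])
    commands.append([BINARY, "click", submit_ref])
    commands.append([BINARY, "screenshot", screenshot_path])
    return commands


def run_workflow(commands, dry_run=True):
    """Print each command; execute it only if dry_run is False and the binary is on PATH."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run and shutil.which(BINARY):
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    workflow = build_form_workflow(
        "https://example.com/login",
        {"@e1": "user@example.com", "@e2": "example-password"},
        submit_ref="@e3",
        screenshot_path="after-submit.png",
    )
    run_workflow(workflow)  # dry run: prints the commands only
```

Keeping the command list separate from execution makes the sequence easy to log, review, or replay, which is useful when an AI agent rather than a person decides the next step.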

Tech stack

Rust, Chrome for Testing, Node.js, npm, Cargo

Getting it running

Difficulty · moderate
Time to first run · 30 min

Requires Chrome for Testing binary download and Rust/Cargo build compilation.

Use freely for any purpose, including commercial use. Keep the license notice and disclose your changes. The license includes a patent grant.

In plain English

Agent-browser is a command-line tool that lets AI agents control a web browser programmatically: opening pages, clicking buttons, filling in forms, taking screenshots, and extracting information, all from simple text commands in a terminal. It was built by Vercel Labs specifically to power automated browser tasks inside AI-driven workflows.

The problem it solves is that AI agents often need to interact with the web the way a human would: navigating to a URL, reading the page content, clicking a link, or submitting a form. Most existing browser automation tools are designed for software testing and can be heavy or slow. Agent-browser is designed to be fast and lightweight, which suits AI pipelines where the agent issues many browser commands in sequence.

It works by launching a Chrome browser in the background (using Google's official "Chrome for Testing" channel) and exposing a set of command-line instructions to control it, such as "click this element", "fill this input field", "take a screenshot", or "get the text of this element". The tool can identify elements by reference IDs from an accessibility tree snapshot, a structured representation of everything visible on the page. This is particularly useful for AI agents that reason about page structure rather than pixel positions. It also supports natural-language commands through a built-in AI chat mode.

You would use this tool when building an AI agent that needs to browse the web, fill out forms, scrape content, or automate repetitive web tasks. The core binary is written in Rust for performance, and it is distributed via npm, Homebrew, or Cargo (Rust's package manager).
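The idea of addressing elements by reference IDs taken from an accessibility tree snapshot can be illustrated with a toy model. The sketch below is not agent-browser's implementation or its output format: the `Node` class, the `e1`-style numbering, the `[ref=...]` notation, and the set of "interactive" roles are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# ASSUMPTION: which roles count as interactive is chosen here for illustration only.
INTERACTIVE_ROLES = {"button", "link", "textbox"}


@dataclass
class Node:
    """A toy accessibility-tree node: a role, an accessible name, and children."""
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def snapshot(root: Node) -> Tuple[str, Dict[str, Node]]:
    """Walk the tree depth-first; return a text outline and a ref -> node map."""
    lines: List[str] = []
    refs: Dict[str, Node] = {}

    def visit(node: Node, depth: int) -> None:
        label = f'{"  " * depth}- {node.role} "{node.name}"'
        if node.role in INTERACTIVE_ROLES:
            ref = f"e{len(refs) + 1}"  # refs are numbered in document order
            refs[ref] = node
            label += f" [ref={ref}]"
        lines.append(label)
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines), refs


def resolve(refs: Dict[str, Node], ref: str) -> Node:
    """Look up a node by reference, accepting either 'e3' or '@e3'."""
    return refs[ref.lstrip("@")]


if __name__ == "__main__":
    page = Node("document", "Sign in", [
        Node("heading", "Sign in"),
        Node("textbox", "Email"),
        Node("textbox", "Password"),
        Node("button", "Submit"),
    ])
    outline, refs = snapshot(page)
    print(outline)
    print(resolve(refs, "@e3").name)  # prints "Submit"
```

An agent working from such an outline can choose a target by role and name, then act on its short reference, without depending on pixel coordinates or fragile CSS selectors.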

Copy-paste prompts

Prompt 1
How do I set up agent-browser to let my AI agent click buttons and fill forms on a website?
Prompt 2
Show me how to use the accessibility tree snapshot feature to identify page elements for my automation script.
Prompt 3
How can I integrate agent-browser into my Node.js project to automate web scraping tasks?
Prompt 4
What's the best way to use agent-browser's natural language mode to let an AI agent understand and interact with web pages?
Prompt 5
How do I take screenshots and extract text from a webpage using agent-browser commands?

Generated 2026-05-18 · Model: sonnet-4-6 · Verify against the repo before relying on details.