ntegrals/openbrowser

★ 9,444 · TypeScript · Audience · developer · Complexity · 3/5 · License · MIT · Setup · moderate

Mindmap

mindmap
  root((repo))
    What it does
      AI browser control
      Plain English tasks
      Screenshot and act loop
    AI providers
      OpenAI
      Anthropic Claude
      Google AI
    Features
      Cost tracking
      Session replay
      Sandboxed execution
      Interactive prompt
    Setup
      Bun runtime
      Your own API keys
      npm or bun install


Things people build with this

USE CASE 1

Build an AI agent that fills out web forms, navigates multi-step flows, and extracts page data based on plain-English task descriptions instead of brittle selectors.

USE CASE 2

Automate repetitive browser research tasks by describing the goal to an AI rather than writing step-by-step Playwright scripts.

USE CASE 3

Run sandboxed browser agents in production pipelines with memory limits, timeouts, and domain restrictions to keep resource usage predictable.
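Two of the restrictions named in use case 3, domain restrictions and timeouts, can be sketched in a few lines of TypeScript. This is an illustration only: `isAllowedUrl` and `withTimeout` are hypothetical helper names and are not taken from Open Browser's sandbox API, whose actual options should be checked in the README. Memory limits and CPU monitoring are outside the scope of this sketch.

```typescript
// Hypothetical helpers for illustration; not Open Browser's actual sandbox API.

/** Returns true if the URL's host is an allowed domain or a subdomain of one. */
function isAllowedUrl(url: string, allowedDomains: string[]): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are rejected
  }
  return allowedDomains.some((domain) => {
    const d = domain.toLowerCase();
    // Match the domain itself or any subdomain, but not look-alikes
    // such as "evilexample.com" for "example.com".
    return host === d || host.endsWith("." + d);
  });
}

/**
 * Rejects if the task does not settle within timeoutMs.
 * Note: this only stops waiting; it does not cancel the underlying work.
 */
function withTimeout<T>(task: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
    task.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}
```

A guard like `isAllowedUrl` would typically be checked before each navigation, and an agent run would be wrapped in `withTimeout` so that a stuck task fails instead of hanging the pipeline.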

Tech stack

TypeScript · Playwright · Bun · OpenAI · Anthropic · Google AI

Getting it running

Difficulty · moderate · Time to first run · 30 min

Requires an API key for OpenAI, Anthropic, or Google AI, and uses the Bun JavaScript runtime rather than standard Node.js.

MIT licensed: use freely for any purpose including commercial projects, as long as you keep the copyright notice.

In plain English

Open Browser is a TypeScript framework that lets you give an AI language model control of a web browser. You describe what you want accomplished in plain text, and the AI agent figures out the steps: clicking buttons, filling in forms, navigating between pages, and pulling out information. The underlying browser control is handled by Playwright, a standard tool for automating web browsers, and the AI reasoning can use models from OpenAI, Anthropic, or Google.

The way it works is straightforward: you provide a task description, and on each step the agent takes a screenshot and reads the page structure, sends both to an AI model to decide what action to take next, then carries out that action in the browser. This cycle repeats until the task is finished or a step limit is reached. The README includes a diagram illustrating this loop.

The project comes with three pieces. The core library handles the agent logic, browser interaction, and AI model integration. A command-line tool lets you run agents or issue individual browser commands (open a URL, click an element, take a screenshot, extract content as markdown) directly from a terminal. A sandboxed execution environment lets you run agents with memory limits, timeouts, domain restrictions, and CPU monitoring, which is useful when running agents in production or in automated pipelines where you need predictable resource usage.

Additional features mentioned in the README include an interactive session where you can type commands into a live browser prompt for testing and debugging, cost tracking so you can see how much each agent run is spending on AI API calls, and session replay recording. Configuration options cover step limits, screenshot frequency, allowed and blocked URLs, proxy settings, and more. The project is MIT licensed, built with Bun (a JavaScript runtime), and requires your own API keys for whichever AI provider you want to use.
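The screenshot-and-act loop described above can be sketched in TypeScript roughly as follows. This is a minimal illustration under stated assumptions, not Open Browser's actual API: the `Browser` and `Model` interfaces, the `Action` and `Observation` types, and the `runAgent` function are all hypothetical names introduced here. In a real setup the browser side would be backed by Playwright and the model side by an OpenAI, Anthropic, or Google client.

```typescript
// Hypothetical types for illustration; these are not Open Browser's actual API.
type Observation = { screenshot: string; pageStructure: string };

type Action =
  | { kind: "click"; target: string }
  | { kind: "type"; target: string; text: string }
  | { kind: "navigate"; url: string }
  | { kind: "done"; result: string };

interface Browser {
  // Take a screenshot and read the page structure.
  observe(): Promise<Observation>;
  // Carry out a single action in the browser.
  perform(action: Action): Promise<void>;
}

interface Model {
  // Ask the AI model which action to take next.
  decide(task: string, observation: Observation): Promise<Action>;
}

type RunResult = { finished: boolean; steps: number; result?: string };

// Repeat observe -> decide -> act until the model reports "done"
// or the step limit is reached.
async function runAgent(
  task: string,
  browser: Browser,
  model: Model,
  maxSteps: number,
): Promise<RunResult> {
  for (let step = 1; step <= maxSteps; step++) {
    const observation = await browser.observe();
    const action = await model.decide(task, observation);
    if (action.kind === "done") {
      return { finished: true, steps: step, result: action.result };
    }
    await browser.perform(action);
  }
  // The step limit was hit before the model declared the task finished.
  return { finished: false, steps: maxSteps };
}
```

Keeping the browser and the model behind small interfaces like these makes the loop easy to test with scripted fakes, and the explicit `maxSteps` bound is what prevents a confused model from running (and billing) indefinitely.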

Copy-paste prompts

Prompt 1

Using Open Browser with the Anthropic Claude API, write a TypeScript script that logs into a website and extracts all text from the main dashboard.

Prompt 2

Show me how to set up Open Browser with GPT-4o to search Google for a keyword and return the titles and URLs of the first five results.

Prompt 3

How do I enable sandboxed execution mode in Open Browser with a 30-second timeout and restrict the agent to one specific domain?

Prompt 4

Set up cost tracking in Open Browser so I can log and review how much each agent run spends on AI API calls.

Prompt 5

Configure session replay recording in Open Browser so I can watch back exactly what the AI agent did after each task run.


Verify against the repo before relying on details.