// AI NATIVE STACK

CRASH COURSE · AI-NATIVE · beginner · 10 min read · v0.2

Browser Use — let your agent browse the web like a human.

TL;DR: Browser Use gives AI agents a real browser. The model sees the page (via vision or structured DOM extraction), decides what to click, type, or scroll, and the library executes those actions via Playwright. It handles multi-tab browsing, form filling, file uploads, cookie persistence, and captcha integration. It is the bridge between "the agent wants to check a website" and actually doing it.

What it is

Browser Use is a Python library that connects LLM agents to a Playwright-controlled browser. At each step, the library extracts the page state (DOM elements, screenshots, or both), sends it to the model, and executes the model's chosen action (click, type, scroll, navigate, extract). It can run the full agent loop itself or be plugged in as a tool in LangChain and other frameworks.
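
The observe, decide, act cycle described above can be sketched in plain Python. This is a toy model for intuition only, not Browser Use's internals: FakePage, scripted_model, and run_loop are hypothetical names, the "model" is a hard-coded function rather than an LLM, and no real browser is involved.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page: a URL plus typed field values."""
    url: str = "about:blank"
    typed: dict = field(default_factory=dict)

    def observe(self) -> dict:
        # A real library would extract DOM elements and/or a screenshot here.
        return {"url": self.url, "typed": dict(self.typed)}

    def execute(self, action: dict) -> None:
        # Apply the chosen action to the page.
        if action["name"] == "navigate":
            self.url = action["url"]
        elif action["name"] == "type":
            self.typed[action["selector"]] = action["text"]


def scripted_model(state: dict) -> dict:
    """Hypothetical stand-in for the LLM: picks the next action from the state."""
    if state["url"] == "about:blank":
        return {"name": "navigate", "url": "https://example.com/search"}
    if "q" not in state["typed"]:
        return {"name": "type", "selector": "q", "text": "vLLM PagedAttention"}
    return {"name": "done"}


def run_loop(page: FakePage, model, max_steps: int = 10) -> list:
    """Observe the page, ask the model for an action, execute it, repeat."""
    history = []
    for _ in range(max_steps):
        state = page.observe()
        action = model(state)
        history.append(action["name"])
        if action["name"] == "done":
            break
        page.execute(action)
    return history


page = FakePage()
print(run_loop(page, scripted_model))  # ['navigate', 'type', 'done']
```

The max_steps cap matters in practice: a model that never declares the task done would otherwise loop forever and keep spending tokens.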

Why it exists

Web search tools return text snippets, but many agent tasks require actually interacting with websites — filling forms, navigating SPAs, reading dynamic content, comparing prices across tabs. Browser Use makes the browser a first-class agent tool instead of a brittle scraper hack.

Install & setup

pip install browser-use
playwright install chromium
export OPENAI_API_KEY=sk-...

Basic usage

import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI


async def main():
    agent = Agent(
        task="Go to google.com and search for 'vLLM PagedAttention'",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    # run() drives the browser step by step until the task ends
    result = await agent.run()
    print(result)


asyncio.run(main())

Custom actions

import asyncio

from browser_use import Agent, Controller
from langchain_openai import ChatOpenAI

controller = Controller()

# The description string is what the model reads when deciding to call this action
@controller.action("Save page content to file")
async def save_content(content: str, filename: str):
    with open(filename, "w", encoding="utf-8") as f:
        f.write(content)
    return f"Saved to {filename}"

agent = Agent(
    task="Go to news.ycombinator.com, get the top 5 stories, save to hn.txt",
    llm=ChatOpenAI(model="gpt-4o"),
    controller=controller,
)

asyncio.run(agent.run())
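
To see why a decorator plus a description string is enough to expose a function to the model, here is a minimal sketch of a decorator-based action registry. ToyRegistry and count_words are hypothetical names and this is not how Browser Use's Controller is implemented; it only illustrates the general pattern of recording a function, its parameters, and a human-readable description so they can be listed for a model and called by name.

```python
import asyncio
import inspect


class ToyRegistry:
    """Hypothetical registry: maps action names to functions plus metadata."""

    def __init__(self):
        self.actions = {}

    def action(self, description: str):
        # Returns a decorator that records the function and leaves it unchanged.
        def decorator(func):
            self.actions[func.__name__] = {
                "description": description,
                "params": list(inspect.signature(func).parameters),
                "func": func,
            }
            return func
        return decorator

    def describe(self) -> list:
        # The kind of listing that could be placed in a model prompt.
        return [
            f"{name}({', '.join(meta['params'])}): {meta['description']}"
            for name, meta in self.actions.items()
        ]

    async def call(self, name: str, **kwargs):
        # Dispatch by name, as if the model had chosen this action.
        return await self.actions[name]["func"](**kwargs)


registry = ToyRegistry()


@registry.action("Count the words in a piece of text")
async def count_words(text: str):
    return f"{len(text.split())} words"


print(registry.describe())
print(asyncio.run(registry.call("count_words", text="browse the web")))
```

The takeaway for real custom actions: the description and parameter names are the only documentation the model gets, so they should be specific.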

As a LangChain tool

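# Import path reproduced as given; confirm that your installed browser-use
# version exposes BrowserTool before relying on it.
from browser_use.tools import BrowserTool

browser_tool = BrowserTool()
# Pass browser_tool in the tools list of any LangChain agent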
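
If a ready-made tool class is not available in your setup, a framework-agnostic alternative is to wrap "run one browsing task" in a plain synchronous function and register that function with whatever agent framework you use. The sketch below assumes nothing about Browser Use or LangChain: make_browse_tool and fake_run are hypothetical names, and the fake runner stands in for a coroutine that would create an Agent and await its run().

```python
import asyncio
from typing import Any, Awaitable, Callable


def make_browse_tool(run_task: Callable[[str], Awaitable[Any]]) -> Callable[[str], str]:
    """Wrap an async 'run this browsing task' function as a sync, text-in/text-out tool."""

    def browse(task: str) -> str:
        # Most tool interfaces expect a plain string back.
        # asyncio.run() must not be called from inside an already running event loop.
        return str(asyncio.run(run_task(task)))

    browse.__doc__ = "Carry out a task in a real browser and return the result as text."
    return browse


# With Browser Use, run_task could look like this (assumption, not executed here):
#   async def run_task(task):
#       return await Agent(task=task, llm=llm).run()

async def fake_run(task: str) -> str:
    """Hypothetical runner used so this sketch works without a browser."""
    return f"done: {task}"


browse = make_browse_tool(fake_run)
print(browse("open example.com"))  # done: open example.com
```

Injecting the runner keeps the wrapper testable without launching a browser, and the docstring doubles as the tool description in frameworks that read it.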

When to use, when to skip

Use it when your agent needs to interact with real web pages — form filling, navigation, data extraction from dynamic sites, multi-step web workflows.

Skip it for simple web scraping, where Firecrawl or Jina Reader are faster and cheaper, or when an API is available. Browser automation is slow and expensive: every step sends page state to the model (more tokens when screenshots are used) on top of Playwright overhead.
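
A back-of-the-envelope way to reason about that cost is to multiply steps by tokens per step. The function below is a rough sketch; estimate_run_cost is a hypothetical helper, and the numbers passed to it are placeholders for illustration, not measured values or real prices.

```python
def estimate_run_cost(steps: int, tokens_per_step: int, usd_per_million_tokens: float) -> float:
    """Rough input-token cost of one agent run, assuming each step re-sends page state."""
    total_tokens = steps * tokens_per_step
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Placeholder inputs: substitute your own step counts, token counts, and pricing.
cost = estimate_run_cost(steps=20, tokens_per_step=3_000, usd_per_million_tokens=2.50)
print(f"~${cost:.2f} per run")
```

Comparing this figure with the cost of a single scraping or API call makes the "skip it" cases easier to spot.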

vs the alternatives

Tool          Best for                                   Trade-off
Browser Use   Full web interaction, form filling, SPAs   Slow, token-heavy
Firecrawl     Fast web scraping, clean markdown          Read-only
Jina Reader   URL-to-text conversion                     Read-only, simpler
Selenium      Traditional browser automation             Not AI-native

Verified against Browser Use docs (browser-use.com), May 2026.

© cvam — written in plaintext, served warm