TL;DR: Browser Use gives AI agents a real browser. The model sees the page (via vision or structured DOM extraction), decides what to click, type, or scroll, and the library executes those actions via Playwright. It handles multi-tab browsing, form filling, file uploads, cookie persistence, and captcha integration. It is the bridge between "the agent wants to check a website" and actually doing it.
What it is
Browser Use is a Python library that connects LLM agents to a Playwright-controlled browser. At each step, the library extracts the page state (DOM elements, screenshots, or both), sends it to the model, and executes the model's chosen action (click, type, scroll, navigate, extract). It can run the full agent loop itself or plug into LangChain and other frameworks as a tool.
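The observe, decide, act cycle described above can be sketched without a browser or a model. This is a minimal illustration only, not Browser Use's implementation: `FakePage`, `scripted_model`, `Action`, and `run_loop` are hypothetical names invented here, with a dict standing in for the extracted page state and a hard-coded function standing in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    name: str          # e.g. "type", "click", "done"
    target: str = ""   # hypothetical element label
    text: str = ""     # text to type, if any


@dataclass
class FakePage:
    """Stand-in for a browser page; a real run would drive Playwright."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def state(self) -> dict:
        # "Extract the page state": here just a plain dict snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        if action.name == "type":
            self.fields[action.target] = action.text
        elif action.name == "click" and action.target == "submit":
            self.submitted = True


def scripted_model(state: dict) -> Action:
    """Stand-in for the LLM: picks the next action from the observed state."""
    if "query" not in state["fields"]:
        return Action("type", target="query", text="vLLM PagedAttention")
    if not state["submitted"]:
        return Action("click", target="submit")
    return Action("done")


def run_loop(page: FakePage, model: Callable[[dict], Action], max_steps: int = 10) -> list[str]:
    history = []
    for _ in range(max_steps):
        action = model(page.state())   # observe, then decide
        history.append(action.name)
        if action.name == "done":
            break
        page.execute(action)           # act
    return history


page = FakePage()
steps = run_loop(page, scripted_model)
print(steps)  # ['type', 'click', 'done']
```

The `max_steps` cap is a design choice of this sketch: any loop that hands control to a model needs some bound so a confused model cannot run forever.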
Why it exists
Web search tools return text snippets, but many agent tasks require actually interacting with websites — filling forms, navigating SPAs, reading dynamic content, comparing prices across tabs. Browser Use makes the browser a first-class agent tool instead of a brittle scraper hack.
Install & setup
pip install browser-use
playwright install chromium
export OPENAI_API_KEY=sk-...
Basic usage
import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

agent = Agent(
    task="Go to google.com and search for 'vLLM PagedAttention'",
    llm=ChatOpenAI(model="gpt-4o"),
)

# agent.run() is a coroutine, so drive it with asyncio.run()
result = asyncio.run(agent.run())
print(result)
Custom actions
import asyncio

from browser_use import Agent, Controller
from langchain_openai import ChatOpenAI

controller = Controller()

@controller.action("Save page content to file")
async def save_content(content: str, filename: str):
    # Write the extracted text to disk and report back to the agent
    with open(filename, "w") as f:
        f.write(content)
    return f"Saved to {filename}"

agent = Agent(
    task="Go to news.ycombinator.com, get the top 5 stories, save to hn.txt",
    llm=ChatOpenAI(model="gpt-4o"),
    controller=controller,
)
asyncio.run(agent.run())
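To make the decorator pattern above less opaque, here is a rough sketch of how a decorator-based action registry could work. It is an assumption-laden toy, not the library's `Controller`: `MiniRegistry`, its `call` method, and `format_stories` are hypothetical names introduced only for illustration.

```python
import asyncio
import inspect
from typing import Awaitable, Callable


class MiniRegistry:
    """Hypothetical stand-in for a controller; not Browser Use's implementation."""

    def __init__(self) -> None:
        self.actions: dict[str, dict] = {}

    def action(self, description: str):
        def decorator(func: Callable[..., Awaitable[str]]):
            # Record the description and parameter names so a model could be
            # told what the action does and which arguments it takes.
            self.actions[func.__name__] = {
                "description": description,
                "params": list(inspect.signature(func).parameters),
                "func": func,
            }
            return func
        return decorator

    async def call(self, name: str, **kwargs) -> str:
        # Look up the registered coroutine by name and await it.
        return await self.actions[name]["func"](**kwargs)


registry = MiniRegistry()


@registry.action("Join story titles into one numbered string")
async def format_stories(titles: list[str]) -> str:
    return "\n".join(f"{i}. {t}" for i, t in enumerate(titles, start=1))


output = asyncio.run(registry.call("format_stories", titles=["First", "Second"]))
print(registry.actions["format_stories"]["params"])  # ['titles']
print(output)
```

The point of the sketch is that the description string and the function signature together form the "menu" a model chooses from, which is why clear descriptions and typed parameters matter when writing custom actions.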
As a LangChain tool
from browser_use.tools import BrowserTool
browser_tool = BrowserTool()
# Use in any LangChain agent as a tool
When to use, when to skip
Use it when your agent needs to interact with real web pages — form filling, navigation, data extraction from dynamic sites, multi-step web workflows.
Skip it for simple web scraping (Firecrawl or Jina Reader are faster and cheaper) or when an API is available. Browser automation is slow and expensive: every step costs a model call, often with vision tokens, on top of Playwright overhead.
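The guidance above can be condensed into a tiny decision helper. This is only a sketch of the rule of thumb as stated here; `pick_tool` and its two boolean inputs are hypothetical, and real projects will weigh more factors (cost, latency, login requirements).

```python
def pick_tool(needs_interaction: bool, api_available: bool) -> str:
    """Return a coarse tool category following the rule of thumb above."""
    if api_available:
        # An API beats any browser or scraper when one exists.
        return "api"
    if needs_interaction:
        # Forms, navigation, multi-step workflows: drive a real browser.
        return "browser-use"
    # Read-only content: a scraper such as Firecrawl or Jina Reader.
    return "scraper"


print(pick_tool(needs_interaction=True, api_available=False))   # browser-use
print(pick_tool(needs_interaction=False, api_available=False))  # scraper
print(pick_tool(needs_interaction=True, api_available=True))    # api
```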
vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| Browser Use | Full web interaction, form filling, SPAs | Slow, token-heavy |
| Firecrawl | Fast web scraping, clean markdown | Read-only |
| Jina Reader | URL-to-text conversion | Read-only, simpler |
| Selenium | Traditional browser automation | Not AI-native |
Verified against Browser Use docs (browser-use.com), May 2026.