TL;DR: Selenium is the veteran browser automation framework, built on the WebDriver protocol, with support for every major browser and most mainstream languages. In the AI agent era it is the fallback when you need raw browser control without AI-native abstractions. It is stable, well documented, and battle-tested, but every interaction has to be scripted by hand.
What it is
Selenium is an open-source browser automation framework built on the W3C WebDriver protocol. It launches a real browser (Chrome, Firefox, Edge, or Safari on macOS), navigates to pages, finds elements, clicks, types, waits, and extracts data. Official bindings exist for Java, Python, C#, Ruby, and JavaScript, and Kotlin can use the Java bindings. The project dates back to 2004 and has long been the industry standard for browser automation and testing.
Why it still matters
In a world of AI-native browser tools (Browser Use, Playwright-based agents), Selenium still matters for four reasons: it supports every major browser, including Safari; it has the largest ecosystem of wrappers and CI integrations; most QA and DevOps engineers already know it; and it gives you deterministic, scriptable control without LLM costs. When your agent needs a reliable browser tool that does exactly what you tell it, no more and no less, Selenium delivers.
Install & setup
pip install selenium
# Selenium 4.6+ includes Selenium Manager,
# which auto-downloads the correct browser driver.
# No need for manual chromedriver installs.
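Because automatic driver management depends on the installed version, a quick version check can save debugging time. The following is a minimal sketch, assuming the package is installed under the distribution name `selenium`; the helper names `has_selenium_manager` and `installed_selenium_version` are hypothetical and not part of Selenium.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def has_selenium_manager(version_string: str) -> bool:
    """Return True if the version is 4.6 or newer (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        return False  # unrecognized format: assume no Selenium Manager
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (4, 6)


def installed_selenium_version():
    """Return the installed selenium version string, or None if missing."""
    try:
        return version("selenium")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_selenium_version()
    if found is None:
        print("selenium is not installed")
    elif has_selenium_manager(found):
        print(f"selenium {found}: driver download is automatic")
    else:
        print(f"selenium {found}: upgrade to 4.6+ or install a driver manually")
```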
Basic usage
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()  # Selenium Manager resolves the driver binary
driver.get("https://www.google.com")

# Type a query into the search box and submit it
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("vLLM PagedAttention")
search_box.send_keys(Keys.RETURN)

# Print the first five result headings
results = driver.find_elements(By.CSS_SELECTOR, "h3")
for r in results[:5]:
    print(r.text)

driver.quit()  # close the browser and end the session
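In practice the script above often needs two small refinements: running without a visible window, and making sure the browser is closed even when a step fails. The sketch below shows one way to do both, assuming a recent Chrome that accepts the `--headless=new` flag. The function names `visible_texts` and `search_headings` are hypothetical, and the page may still show a consent or bot-check screen that this sketch does not handle.

```python
def visible_texts(elements, limit=5):
    """Return up to `limit` non-empty, stripped .text values from elements."""
    texts = []
    for element in elements:
        if len(texts) >= limit:
            break
        text = (element.text or "").strip()
        if text:  # skip hidden or empty headings
            texts.append(text)
    return texts


def search_headings(query, limit=5):
    """Run the same search as the basic example, in headless Chrome."""
    # Imported here so visible_texts() works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://www.google.com")
        search_box = driver.find_element(By.NAME, "q")
        search_box.send_keys(query)
        search_box.send_keys(Keys.RETURN)
        headings = driver.find_elements(By.CSS_SELECTOR, "h3")
        return visible_texts(headings, limit)
    finally:
        driver.quit()  # always release the browser, even on errors
```

Keeping the filtering in a separate function means it can be checked against stand-in objects without launching a browser.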
Explicit waits
Avoid fixed time.sleep() calls. Explicit waits poll until a condition holds, so they return as soon as the page is ready and fail clearly when it is not:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the element with id "results" to appear in the DOM
wait = WebDriverWait(driver, 10)
element = wait.until(
    EC.presence_of_element_located((By.ID, "results"))
)
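Conceptually, an explicit wait is a polling loop: call a condition repeatedly until it returns something truthy or the timeout expires. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not Selenium's actual implementation, and `poll_until` is a hypothetical name. In real scripts `WebDriverWait(driver, 10).until(...)` does this for you, and it accepts any callable that takes the driver as its argument.

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back the value, much as a wait returns the element
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # short pause between polls, not one long blind sleep
```

The difference from a bare `time.sleep(10)` is that the loop exits as soon as the condition is met, and raises an error instead of silently continuing when it never is.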
As an agent tool
Wrap Selenium actions as tool functions for your agent framework:
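from langchain.tools import tool
from selenium import webdriver
from selenium.webdriver.common.by import By

@tool
def browse_url(url: str) -> str:
    """Navigate to a URL and return the page text."""
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        text = driver.find_element(By.TAG_NAME, "body").text
    finally:
        driver.quit()  # always close the browser, even if navigation fails
    return text[:5000]  # truncate to keep the tool output small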
This gives deterministic browser access without LLM-in-the-loop costs — the agent decides what to browse, Selenium handles how.
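One cost of the wrapper above is that every call starts and tears down a browser. A possible refinement is to keep one driver alive across calls. The following is a minimal sketch under that assumption; `BrowserSession` and `driver_factory` are hypothetical names, and the factory is injected so the class can be exercised without a real browser.

```python
class BrowserSession:
    """Lazily create one driver and reuse it across tool calls (hypothetical)."""

    def __init__(self, driver_factory, max_chars=5000):
        self._driver_factory = driver_factory  # e.g. webdriver.Chrome
        self._driver = None
        self._max_chars = max_chars

    def page_text(self, url):
        """Navigate to url and return the truncated text of <body>."""
        if self._driver is None:
            self._driver = self._driver_factory()  # start the browser on first use
        self._driver.get(url)
        # "tag name" is the string value behind Selenium's By.TAG_NAME constant.
        body = self._driver.find_element("tag name", "body")
        return body.text[: self._max_chars]

    def close(self):
        """Quit the browser if one was started."""
        if self._driver is not None:
            self._driver.quit()
            self._driver = None
```

With Selenium installed, `session = BrowserSession(webdriver.Chrome)` could back the tool function by calling `session.page_text(url)`, with `session.close()` run once at shutdown. The trade-off is shared state: cookies and the current page persist between calls, which may or may not be what the agent needs.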
When to use, when to skip
Use it when you need deterministic browser automation, Safari support, integration with existing test infrastructure, or when you want to avoid LLM costs for browser actions. Also good when your team already knows Selenium.
Skip it when you need the agent to autonomously navigate unknown pages (use Browser Use), when you want modern async APIs (use Playwright), or when you just need page content as text (use Jina Reader or Firecrawl).
vs the alternatives
| Tool | Best for | Trade-off |
|---|---|---|
| Selenium | Deterministic browser control, widest browser support | Verbose, no AI-native features |
| Playwright | Modern async browser automation | Drives its own WebKit build rather than real Safari |
| Browser Use | AI-driven autonomous browsing | LLM cost per action, less deterministic |
| Firecrawl | URL→markdown without browser | No interaction, API cost |
| Puppeteer | Chrome/Chromium-focused automation (Firefox also supported) | JavaScript/TypeScript only |
Verified against Selenium docs (selenium.dev/documentation), May 2026.