// AI NATIVE STACK

CRASH COURSE · AI-NATIVE · beginner · 10 min read · v4

Selenium — the original browser automation, now an agent tool.

TL;DR — Selenium is the veteran browser automation framework — WebDriver protocol, every browser, every language. In the AI agent era it's the fallback when you need raw browser control without AI-native abstractions. Stable, well-documented, battle-tested, but requires manual scripting of every interaction.

What it is

Selenium is an open-source browser automation framework built on the W3C WebDriver protocol. It launches a real browser (Chrome, Firefox, Edge, Safari), navigates to pages, finds elements, clicks, types, waits, and extracts data. Official bindings exist for Python, Java, JavaScript, C#, and Ruby; Kotlin works through the Java bindings. The project dates back to 2004 and has been a standard choice for browser automation and testing ever since.
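
To make "built on the W3C WebDriver protocol" a little more concrete, here is a rough sketch of the HTTP commands a simple navigate-and-read flow maps to. Nothing is sent over the network; the helper names (webdriver_request, read_heading_flow) and the session and element IDs are placeholders invented for illustration. In real use the driver assigns those IDs and Selenium's bindings build these requests for you.

```python
def webdriver_request(method, path, body=None):
    """Describe one WebDriver HTTP command as a plain dict (nothing is sent)."""
    return {"method": method, "path": path, "body": body}


def read_heading_flow(session_id, element_id):
    """List the commands behind: open a page, find the <h1>, read its text, quit.

    session_id and element_id are hypothetical placeholders; a real driver
    returns them in the responses to "New Session" and "Find Element".
    """
    base = f"/session/{session_id}"
    return [
        # New Session: ask the driver to start a browser.
        webdriver_request(
            "POST", "/session",
            {"capabilities": {"alwaysMatch": {"browserName": "chrome"}}},
        ),
        # Navigate To: what driver.get(url) does.
        webdriver_request("POST", f"{base}/url", {"url": "https://example.com"}),
        # Find Element: what driver.find_element(By.CSS_SELECTOR, "h1") does.
        webdriver_request(
            "POST", f"{base}/element", {"using": "css selector", "value": "h1"}
        ),
        # Get Element Text: what element.text does.
        webdriver_request("GET", f"{base}/element/{element_id}/text"),
        # Delete Session: what driver.quit() does.
        webdriver_request("DELETE", base),
    ]


for command in read_heading_flow("SESSION_ID", "ELEMENT_ID"):
    print(command["method"], command["path"])
```

Because every step is an ordinary HTTP request, any language with an HTTP client can drive a browser this way, which is part of why bindings exist for so many languages.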

Why it still matters

In a world of AI-native browser tools (Browser Use, Playwright-based agents), Selenium still matters because it drives every major browser including real Safari, has a large ecosystem of wrappers and CI integrations, is familiar to most QA and DevOps engineers, and gives you deterministic, scriptable control without LLM costs. When your agent needs a reliable browser tool that does exactly what you tell it, no more and no less, Selenium delivers.

Install & setup

pip install selenium

# Selenium 4.6+ includes Selenium Manager,
# which auto-downloads the correct browser driver.
# No need for manual chromedriver installs.
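
If you are unsure whether an environment is new enough for Selenium Manager, a small check like the one below may help. It is a sketch, not an official API: supports_selenium_manager and installed_selenium_version are hypothetical helper names, and the parser assumes a plain "major.minor[.patch]" version string.

```python
from importlib import metadata


def supports_selenium_manager(version: str) -> bool:
    """Return True if a version string such as '4.21.0' is at least 4.6.

    Assumes the first two dot-separated parts are plain integers.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 6)


def installed_selenium_version():
    """Return the installed selenium version string, or None if it is missing."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None


version = installed_selenium_version()
if version is None:
    print("selenium is not installed; run: pip install selenium")
elif supports_selenium_manager(version):
    print(f"selenium {version}: Selenium Manager is bundled")
else:
    print(f"selenium {version}: upgrade to 4.6+ or install the driver manually")
```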

Basic usage

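from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://www.google.com")

    # Type the query into the search box and submit it.
    search_box = driver.find_element(By.NAME, "q")
    search_box.send_keys("vLLM PagedAttention")
    search_box.send_keys(Keys.RETURN)

    # Wait for result headings to appear before reading them;
    # the results page loads after the form is submitted.
    results = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h3"))
    )
    for r in results[:5]:
        print(r.text)
finally:
    # Always close the browser, even if a step above fails.
    driver.quit()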

Explicit waits

Avoid fixed time.sleep() calls: they either waste time or fire before the page is ready. Use explicit waits, which poll until a condition holds or a timeout expires:

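from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Assumes `driver` is an already-created WebDriver instance.
# Poll for up to 10 seconds until an element with id="results" is in the DOM.
wait = WebDriverWait(driver, 10)
element = wait.until(
    EC.presence_of_element_located((By.ID, "results"))
)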
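
Conceptually, an explicit wait is a poll-until-timeout loop. The sketch below illustrates the idea in plain Python; it is not Selenium's actual implementation, and wait_until and WaitTimeout are names made up for this example. The short sleep between polls differs from a fixed sleep because the loop returns as soon as the condition holds.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy within the timeout."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        # Brief pause between polls so the loop does not spin the CPU.
        time.sleep(poll_interval)


# Usage: a stand-in condition that becomes ready on the third poll.
attempts = {"count": 0}

def page_ready():
    attempts["count"] += 1
    return "ready" if attempts["count"] >= 3 else None

print(wait_until(page_ready, timeout=2.0, poll_interval=0.01))
```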

As an agent tool

Wrap Selenium actions as tool functions for your agent framework:

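from langchain.tools import tool
from selenium import webdriver
from selenium.webdriver.common.by import By

@tool
def browse_url(url: str) -> str:
    """Navigate to a URL and return the page text."""
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        text = driver.find_element(By.TAG_NAME, "body").text
    finally:
        # Close the browser even if navigation or the lookup fails.
        driver.quit()
    # Truncate so the result stays small enough for the agent's context.
    return text[:5000]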

This gives deterministic browser access without LLM-in-the-loop costs — the agent decides what to browse, Selenium handles how.
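
The same idea can be sketched without any agent framework, which also makes the tool easy to test with a fake driver. Everything named here (make_browse_tool, chrome_factory, max_chars) is a hypothetical helper for illustration, and headless Chrome is an assumption about how an agent would typically run it.

```python
def make_browse_tool(driver_factory, max_chars=5000):
    """Build a browse_url(url) function around any WebDriver-like factory."""

    def browse_url(url: str) -> str:
        """Navigate to a URL and return the (truncated) visible page text."""
        driver = driver_factory()
        try:
            driver.get(url)
            # "tag name" is the string value behind Selenium's By.TAG_NAME.
            text = driver.find_element("tag name", "body").text
        finally:
            # Release the browser even when navigation fails.
            driver.quit()
        return text[:max_chars]

    return browse_url


def chrome_factory():
    """Create a headless Chrome driver (imported lazily; needs selenium installed)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


# Usage (launches a real browser, so it is left commented out):
# browse_url = make_browse_tool(chrome_factory)
# print(browse_url("https://example.com")[:200])
```

Injecting the factory keeps the browser choice out of the tool body, so a test can pass a stub object with get, find_element, and quit methods instead of launching Chrome.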

When to use, when to skip

Use it when you need deterministic browser automation, Safari support, integration with existing test infrastructure, or when you want to avoid LLM costs for browser actions. Also good when your team already knows Selenium.

Skip it when you need the agent to autonomously navigate unknown pages (use Browser Use), when you want modern async APIs (use Playwright), or when you just need page content as text (use Jina Reader or Firecrawl).

vs the alternatives

Tool         | Best for                                                 | Trade-off
Selenium     | Deterministic browser control, widest browser support    | Verbose, no AI-native features
Playwright   | Modern async browser automation                          | WebKit is Playwright's own build, not real Safari
Browser Use  | AI-driven autonomous browsing                            | LLM cost per action, less deterministic
Firecrawl    | URL→markdown without running a browser yourself          | No interaction, API cost
Puppeteer    | Chrome/Chromium-focused automation                       | JavaScript/TypeScript only, Chrome-first

Verified against Selenium docs (selenium.dev/documentation), May 2026.

© cvam — written in plaintext, served warm