Browser Automation

Launch a browser, interact with any web app, and verify UI state — all driven by your AI coding agent.

What Your Agent Can Do

  • Open a browser and navigate to your app
  • Inspect page state via screenshots and DOM structure
  • Interact with elements: click, type, select, scroll, upload files
  • Assert UI state with natural language (e.g., "the login form should be visible")
  • Capture console errors and network requests

Example prompt:

Add a "Forgot Password?" link below the login form. After implementing,
use Shiplight to verify your implementation in the browser.

Session Tools

| Tool | Description |
| --- | --- |
| new_session | Create a browser session with optional device emulation and auto-login |
| close_session | Close a browser session |
| close_all | Close all browser sessions |
| get_session_state | Get current URL and session info |
| save_storage_state | Save cookies/localStorage for fast session restore |
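
The Configuration section implies these tools are served over MCP (Model Context Protocol). If so, the agent's client would invoke them with JSON-RPC `tools/call` requests. The sketch below only builds the request envelope. The helper `tool_call` and the `url` argument are hypothetical; this page does not document the tools' parameter names.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this sketch.
_request_ids = count(1)


def tool_call(name, arguments=None):
    """Build an MCP `tools/call` JSON-RPC request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# "url" is an assumed argument name, used only for illustration.
open_req = tool_call("new_session", {"url": "http://localhost:3000"})
close_req = tool_call("close_all")

print(json.dumps(open_req, indent=2))
```

In practice the coding agent issues these calls itself; the envelope is shown only to make the tool names in the table concrete.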

Page Inspection

| Tool | Description |
| --- | --- |
| navigate | Navigate to a URL |
| get_page_info | Get current page URL and title |
| get_dom | DOM tree with interactive element indices |
| take_screenshot | Set-of-Mark screenshot matching element indices |
| get_locator | Extract a Playwright locator or XPath for an element |

Performing Actions

Shiplight can interact with any element on the page using natural language. Examples:

  • "Click the Sign In button"
  • "Type 'hello@example.com' in the email field"
  • "Select 'Monthly' from the billing dropdown"
  • "Upload the file at /tmp/report.pdf"
  • "Scroll down to the pricing section"
  • "Press Enter to submit the form"
  • "Go back to the previous page"

AI-Powered Assertions & Extraction

Shiplight uses a secondary AI model to reason about the page for verification and data extraction. Examples:

  • "Verify the error message is not visible"
  • "Check that the user's name appears in the top right corner"
  • "Assert the form submission was successful"
  • "Extract the order total into a variable"
  • "Wait until the loading spinner disappears"

TIP

Basic interactions (clicks, typing, scrolling) work without API keys. AI-powered assertions and extraction require GOOGLE_API_KEY or ANTHROPIC_API_KEY.

Debugging Tools

| Tool | Description |
| --- | --- |
| get_browser_console_logs | Get browser console output with filtering |
| get_browser_network_logs | Get network requests with status filtering |
| clear_logs | Clear console and network logs |

Configuration

All configuration is done through environment variables in your MCP server config.

Environment Variables

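| Variable | Required | Description | Default |
| --- | --- | --- | --- |
| GOOGLE_API_KEY | For AI-powered actions (one of these two) | Google AI API key | |
| ANTHROPIC_API_KEY | For AI-powered actions (one of these two) | Anthropic API key | |
| WEB_AGENT_MODEL | If using AI tools | AI model for the web agent | |
| PWDEBUG | No | Set to "console" to enable Playwright debug logging | |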
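
As a minimal sketch of how these variables fit together, the helper below assembles an `env` mapping and enforces the rule from the table that AI tools need one of the two API keys. `build_env` is a hypothetical helper, not part of Shiplight. Many MCP clients accept an `env` object on each server entry, but the exact config layout depends on your client and is not specified on this page.

```python
import json

AI_KEY_VARS = ("GOOGLE_API_KEY", "ANTHROPIC_API_KEY")


def build_env(model=None, google_key=None, anthropic_key=None, pw_debug=False):
    """Return the environment variables to place in the MCP server config."""
    env = {}
    if google_key:
        env["GOOGLE_API_KEY"] = google_key
    if anthropic_key:
        env["ANTHROPIC_API_KEY"] = anthropic_key
    if model:
        # AI tools need a model and at least one provider key.
        if not any(var in env for var in AI_KEY_VARS):
            raise ValueError("WEB_AGENT_MODEL needs GOOGLE_API_KEY or ANTHROPIC_API_KEY")
        env["WEB_AGENT_MODEL"] = model
    if pw_debug:
        env["PWDEBUG"] = "console"  # enables Playwright debug logging
    return env


# Placeholder key value; substitute your own secret.
env = build_env(model="claude-sonnet-4-6", anthropic_key="<your-key>", pw_debug=True)
print(json.dumps({"env": env}, indent=2))
```

Calling `build_env()` with no arguments returns an empty mapping, which matches the case where only basic interactions are needed.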

AI Model Options

| Provider | API Key | Supported Models |
| --- | --- | --- |
| Google | GOOGLE_API_KEY | gemini-2.5-pro, gemini-3-pro-preview |
| Anthropic | ANTHROPIC_API_KEY | claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-6 |
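
The pairing in this table can be checked before starting a session. The sketch below assumes each model needs the key of its own provider, which the table suggests but does not state outright. `required_key_for` and `missing_key` are hypothetical helpers.

```python
import os

# Model names and key variables copied from the table above.
SUPPORTED_MODELS = {
    "GOOGLE_API_KEY": ("gemini-2.5-pro", "gemini-3-pro-preview"),
    "ANTHROPIC_API_KEY": ("claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-6"),
}


def required_key_for(model):
    """Return the API key variable that the given model's provider uses."""
    for key_var, models in SUPPORTED_MODELS.items():
        if model in models:
            return key_var
    raise ValueError(f"unsupported model: {model}")


def missing_key(env):
    """Return the key variable WEB_AGENT_MODEL needs but env lacks, else None."""
    model = env.get("WEB_AGENT_MODEL")
    if not model:
        return None  # no AI model configured, nothing to check
    key_var = required_key_for(model)
    return None if env.get(key_var) else key_var


problem = missing_key(dict(os.environ))
if problem:
    print(f"Set {problem} to use the configured WEB_AGENT_MODEL")
```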

Released under the MIT License.