
Browser Automation

Launch a browser, interact with any web app, and verify UI state — all driven by your AI coding agent.

Quick Start

The basic workflow is: launch → inspect → interact → verify → close. Use /verify to kick it off:

/verify that the Sign Up flow works at http://localhost:3000

Your agent handles the rest — it launches a browser, takes screenshots, reads the DOM, performs actions, and reports back.

Slash commands

Shiplight provides slash commands that guide your AI agent through common workflows:

  • /verify — Open a browser and verify UI changes
  • /create-tests — Walk through your app and create reusable YAML E2E test cases
  • /triage — Reproduce failing tests, diagnose root causes, and fix YAML E2E tests

Type these directly in your AI coding agent (Claude Code, Cursor, etc.) to trigger the corresponding workflow.

Use Cases

Verify UI Changes

This is the most common use case: after implementing a feature or fixing a bug, ask your agent to check it in a real browser.

/verify I just added a dark mode toggle to the settings page.
Open http://localhost:3000/settings, click the toggle,
and verify the page switches to dark mode.

To get a video recording of the verification for review:

/verify the checkout flow works end-to-end.
Record the session so I can watch it.

Your agent records the session and gives you a video file and a Playwright trace when done.

Test with Authentication

For apps that require login, your agent logs in once and saves the session. Future sessions restore it automatically — no login flow needed.

First time, log in and save the session:

/verify http://localhost:3000/dashboard — log in with
test@example.com / password123

Subsequent sessions — the agent restores the saved session automatically:

/verify http://localhost:3000/dashboard loads correctly
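
This guide does not say how the saved session is stored. As a rough sketch, assuming it lives in a Playwright-style storage state JSON file (a `cookies` list whose entries carry an `expires` Unix timestamp, with `-1` for session cookies), a hypothetical helper such as `has_live_cookies` could decide whether a saved session is still worth restoring or whether a fresh login is needed:

```python
import json
import time
from pathlib import Path


def has_live_cookies(state_path, now=None):
    """Return True if the storage state file holds at least one unexpired cookie.

    Assumes a Playwright-style storage state layout. The file location and this
    helper are hypothetical and not part of Shiplight.
    """
    path = Path(state_path)
    if not path.is_file():
        return False  # nothing saved yet, so a fresh login is needed
    state = json.loads(path.read_text(encoding="utf-8"))
    now = time.time() if now is None else now
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        # -1 marks a session cookie with no fixed expiry time
        if expires == -1 or expires > now:
            return True
    return False
```

A cookie that has not expired locally can still be rejected by the server, so a check like this is only a cheap first filter before loading the page.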

Use a Persistent Chrome Profile

Instead of starting fresh every time, you can point your agent at a Chrome profile directory. Everything persists between sessions — login state, cookies, bookmarks, extensions, localStorage.

First time — create a profile and log in:

/verify https://app.example.com — use the Chrome profile
at ./my-chrome-profile. I'll log in manually.

Log in once in the browser window. The profile saves everything.

Subsequent sessions — your login is already there:

/verify https://app.example.com/dashboard —
use the Chrome profile at ./my-chrome-profile

No credentials, no storage state files, no setup — just point to the profile directory and go. This works great for:

  • Apps with Google OAuth / SSO — log in once manually, reuse forever
  • Complex authenticated state — multi-step onboarding, 2FA, org switching
  • Extension testing — combine with path_to_extension to test extensions with real login state

Without a Chrome profile, a fresh temp profile is created and cleaned up when the session closes.
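
The difference between the two modes can be sketched as follows. This is a minimal illustration, not Shiplight's implementation: `profile_dir` is a hypothetical helper that keeps a named profile directory on disk and deletes a temporary one when the session ends.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def profile_dir(persistent_path=None):
    """Yield a browser profile directory.

    With a path, the directory is created if missing and kept afterwards.
    Without one, a temporary directory is created and deleted on exit.
    Hypothetical helper for illustration only.
    """
    if persistent_path is not None:
        path = Path(persistent_path)
        path.mkdir(parents=True, exist_ok=True)
        yield path  # left on disk so the next session can reuse it
        return
    tmp = Path(tempfile.mkdtemp(prefix="chrome-profile-"))
    try:
        yield tmp
    finally:
        shutil.rmtree(tmp, ignore_errors=True)  # remove the throwaway profile
```

In a Playwright-based setup, a directory like this would typically be passed as `user_data_dir` to `launch_persistent_context`; whether Shiplight does exactly that is an assumption.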

Test Chrome Extensions

Test an unpacked Chrome extension by loading it into the browser at launch.

/verify my Chrome extension at ./my-extension works —
open https://example.com and check that the extension
injected its UI into the page.

Your agent launches Chromium in headed mode with the extension loaded. It can see and interact with any UI the extension injects into the page — content scripts, banners, sidebars, or modified DOM elements.
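
For orientation, loading an unpacked extension into Chromium usually comes down to two command-line flags. The sketch below builds them from `path_to_extension`. The helper name `extension_launch_args` is hypothetical, and whether Shiplight passes exactly these flags is an assumption based on how Playwright's documentation loads extensions.

```python
from pathlib import Path


def extension_launch_args(path_to_extension):
    """Build Chromium flags that load a single unpacked extension.

    Hypothetical helper: it checks that the directory contains a manifest.json
    and returns the flags as a list of strings.
    """
    ext = Path(path_to_extension).resolve()
    if not (ext / "manifest.json").is_file():
        raise FileNotFoundError(f"no manifest.json in {ext}")
    return [
        f"--disable-extensions-except={ext}",  # keep every other extension off
        f"--load-extension={ext}",             # load the unpacked extension
    ]
```

Checking for `manifest.json` up front gives a clear error when the path points at the wrong directory, instead of a browser that starts without the extension.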

Combine with a persistent Chrome profile to test extensions with authenticated state:

/verify my extension at ./my-extension works on the dashboard —
use the Chrome profile at ./my-chrome-profile and open
https://app.example.com/dashboard

TIP

Screenshots and video recordings capture the page content only — the browser toolbar (with extension icons) is not included. Verify your extension works by checking its effects: DOM changes, console logs, network requests, or storage state.

Test on an Existing Browser

If you need to control a browser that's already open — with specific tabs, complex state, or extensions you've configured manually — you can attach to it instead of launching a new one.

Simpler alternative

For most cases, a persistent Chrome profile is simpler and doesn't require any setup. Use the relay approach below only when you need to interact with a browser that's already running.

This uses the Shiplight Chrome Extension as a relay between your agent and your browser. See Attach to Existing Browser for setup — you'll need to install the extension and enable the relay server. A direct CDP connection (no extension) is also supported.

Attach to my browser and check if the payment form
on the current page has any console errors.

Your agent auto-discovers tabs via the extension relay — no URL or configuration needed.
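
The relay's own discovery protocol is not described here. For the direct CDP route, tab discovery can be sketched under the assumption that the browser was started with `--remote-debugging-port=9222`, in which case Chromium serves a JSON list of targets at `/json/list`. The helper names `fetch_targets` and `open_tabs` are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_targets(port=9222, host="127.0.0.1"):
    """Fetch the raw target list from a browser started with --remote-debugging-port."""
    with urlopen(f"http://{host}:{port}/json/list", timeout=5) as resp:
        return json.load(resp)


def open_tabs(targets):
    """Keep only regular tabs, dropping service workers and other non-tab targets."""
    return [
        {
            "title": t.get("title", ""),
            "url": t.get("url", ""),
            "ws": t["webSocketDebuggerUrl"],  # endpoint a CDP client connects to
        }
        for t in targets
        if t.get("type") == "page" and "webSocketDebuggerUrl" in t
    ]
```

Calling `open_tabs(fetch_targets())` against a running browser would return one entry per open tab, which is enough to pick the page to attach to.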

Test Responsive and Mobile Layouts

Test how your app looks on different screen sizes and devices.

/verify http://localhost:3000 on mobile (390x844 viewport, touch enabled) —
check that the hamburger menu appears instead of the desktop navigation bar.

Available emulation options:

| Option | Example | Description |
| --- | --- | --- |
| viewport | 390x844, 768x1024 | Screen dimensions |
| is_mobile | | Mobile CSS media queries, meta viewport |
| has_touch | | Touch events |
| user_agent | Custom UA string | Override the browser user agent |
| color_scheme | dark, light | Emulated color scheme |
| locale | ja-JP, fr-FR | Browser locale |
| timezone_id | America/New_York | Emulated timezone |
| geolocation | lat/lng coordinates | Emulated GPS location |
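
These option names match keyword arguments of Playwright's Python `browser.new_context`. Assuming they are forwarded more or less unchanged (an assumption, not something this guide states), the mobile prompt above might translate into options like the ones built below. Both `parse_viewport` and `mobile_context_options` are hypothetical helpers.

```python
import re


def parse_viewport(text):
    """Turn a WIDTHxHEIGHT string such as "390x844" into a width/height dict."""
    match = re.fullmatch(r"\s*(\d+)\s*[xX]\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {text!r}")
    return {"width": int(match.group(1)), "height": int(match.group(2))}


def mobile_context_options(viewport="390x844", color_scheme="light"):
    """Build emulation options for a touch-enabled mobile layout.

    The keys mirror the option names in the table; the defaults are illustrative.
    """
    return {
        "viewport": parse_viewport(viewport),
        "is_mobile": True,   # mobile CSS media queries and meta viewport
        "has_touch": True,   # touch events
        "color_scheme": color_scheme,
    }
```

Keeping the viewport as a parsed dict rather than a raw string makes it easy to validate a size before the browser is launched.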

Debug Failures

When something goes wrong, your agent can inspect console errors and network requests to diagnose issues.

/verify http://localhost:3000/dashboard — click "Load Data"
and check if there are any console errors or failed network requests.

/verify the submit button on the form page works.
Try submitting and show me any JavaScript errors and failed API calls.

Your agent checks console errors and failed network requests, and reports what it finds.
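
As an illustration of what such a report boils down to, the sketch below filters recorded console messages and network responses for failures. The record shapes (dicts with `type`, `text`, `status`, `method`, `url`) and the helper `summarize_failures` are assumptions made for this example, not Shiplight's actual data format.

```python
def summarize_failures(console_messages, responses):
    """Collect console errors and failed requests from recorded page events.

    A request counts as failed when it has no status (no response arrived)
    or an HTTP status of 400 or above.
    """
    errors = [m["text"] for m in console_messages if m.get("type") == "error"]
    failed = []
    for r in responses:
        status = r.get("status")
        if status is None or status >= 400:
            label = "FAILED" if status is None else str(status)
            failed.append(f'{label} {r["method"]} {r["url"]}')
    return {"console_errors": errors, "failed_requests": failed}
```

Treating a missing status as a failure covers requests that never received a response, such as refused connections or aborted calls.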

Performing Actions

Your agent can interact with any element on the page:

  • Click, type, select — "Click the Sign In button", "Type 'hello@example.com' in the email field", "Select 'Monthly' from the billing dropdown"
  • Scroll, navigate — "Scroll down to the pricing section", "Go back to the previous page"
  • Files — "Upload the file at /tmp/report.pdf"
  • Keyboard — "Press Enter to submit the form", "Press Escape to close the modal"
  • Tabs — "Switch to the second tab", "Close the current tab"

All browser actions are deterministic — no LLM API keys are needed for the MCP server. Your coding agent (Claude, Cursor, etc.) handles the reasoning; Shiplight just executes the actions.

For the full list of MCP tools, environment variables, and how the server is configured, see MCP Server.

Released under the MIT License.