AGENTCAST: LIVE BROWSER SESSIONS FOR AI AGENTS
Watch your agents browse. Tell them what to click. In natural language.
THE PROBLEM
Problem: AI agents can call APIs. But most of the web isn't APIs. It's forms, buttons, dropdowns, CAPTCHAs, and JavaScript SPAs. Your agent needs to actually browse.
Solution: AgentCast extends the Cloudflare Agents SDK with real browser sessions. Natural language control via Stagehand. Live screencasting via CDP. Watch your agent work in real time.
Browser Session Flow:
Client → Worker → BrowserAgent (DO)
↓
Container (Bun + Playwright)
↓
Chrome DevTools Protocol
↓
Stagehand (NL → Actions)
↓
Live Screencast Frames
↓
WebSocket → Client
THE FUN PART
Watching an AI agent navigate a website is genuinely entertaining. It's like watching a very fast intern who can't read social cues but somehow always finds the submit button. You'll catch yourself narrating: "No, not that dropdown... yes, that one... good robot."
ARCHITECTURE OVERVIEW
CORE COMPONENTS
- BrowserAgent - Extends Cloudflare Agent class
- Container - Bun + Playwright + Chrome
- Stagehand - Natural language → browser actions
- CDP - Chrome DevTools Protocol for screencasting
- WebSocket - Bidirectional control
TECH STACK
- Cloudflare Workers - Orchestration layer
- Cloudflare Containers - Chrome instances
- Durable Objects - Session state
- Playwright - Browser automation
- Zod - Typed data extraction
KEY FEATURES
- Live screencasting - Real-time video stream of browser actions
- Natural language control - "click the login button" just works
- Bidirectional control - Viewers can click and type in the browser
- Structured extraction - Use Zod schemas to extract typed data
- Session tracking - Monitor status, activity timestamps, URLs
- Agents SDK compatible - Extends the official Cloudflare Agent class
QUICK START
# Install the SDK
npm install agentcast
# or
bun add agentcast
BROWSER AGENT
Extend BrowserAgent. Use goto, act, extract. Natural language all the way down.
// agent.ts
import { BrowserAgent, type BrowserAgentEnv } from "agentcast";
import { z } from "zod";

export class MyAgent extends BrowserAgent<BrowserAgentEnv> {
  async onTask(task: string) {
    // Navigate to a page
    await this.goto("https://example.com");

    // Natural language actions via Stagehand
    await this.act("click the login button");
    await this.act("fill in username with 'demo@example.com'");
    await this.act("fill in password with 'secret123'");
    await this.act("click submit");

    // Extract structured data with Zod
    const data = await this.extract({
      instruction: "Get the user profile info",
      schema: z.object({
        name: z.string(),
        email: z.string(),
        plan: z.enum(["free", "pro", "enterprise"])
      })
    });
    return data;
  }
}
WORKER
Expose endpoints for starting sessions and streaming video.
// worker.ts
import { Hono } from "hono";
import { MyAgent } from "./agent";

// Binding shape assumed from the AGENT entry in wrangler.toml
type Env = { AGENT: DurableObjectNamespace<MyAgent> };

const app = new Hono<{ Bindings: Env }>();

// Start a new browser session
app.post("/session", async (c) => {
  const { task } = await c.req.json<{ task: string }>();
  const id = c.env.AGENT.newUniqueId();
  const stub = c.env.AGENT.get(id);
  // Agent spins up container, starts browser, begins task
  const result = await stub.startTask(task);
  return c.json({ sessionId: id.toString(), result });
});

// Stream live video of agent actions
app.get("/session/:id/stream", async (c) => {
  const id = c.env.AGENT.idFromString(c.req.param("id"));
  const stub = c.env.AGENT.get(id);
  // Returns WebSocket upgrade for live screencast
  return stub.fetch(c.req.raw);
});

export default app;
export { MyAgent };
STRUCTURED EXTRACTION
Define Zod schemas. Get typed data from any page. No HTML parsing.
// Structured data extraction with Zod schemas
import { z } from "zod";

const productSchema = z.object({
  title: z.string(),
  price: z.number(),
  rating: z.number().optional(),
  inStock: z.boolean()
});

// Agent extracts typed data from any page
const product = await agent.extract({
  instruction: "Get the main product details",
  schema: productSchema
});

// TypeScript knows the shape
console.log(product.title); // string
console.log(product.price); // number
console.log(product.inStock); // boolean
LIVE VIEWER
WebSocket for screencast frames. Canvas rendering. Bidirectional control.
<!-- SvelteKit viewer component -->
<script lang="ts">
  let canvas: HTMLCanvasElement;
  let ws: WebSocket;

  function connect(sessionId: string) {
    ws = new WebSocket(`wss://your-worker.workers.dev/session/${sessionId}/stream`);
    ws.onmessage = (e) => {
      const frame = JSON.parse(e.data);
      if (frame.type === 'screencast') {
        // Render CDP screencast frame to canvas
        const img = new Image();
        img.onload = () => {
          canvas.getContext('2d')?.drawImage(img, 0, 0);
        };
        img.src = `data:image/jpeg;base64,${frame.data}`;
      }
    };
  }

  function sendClick(x: number, y: number) {
    // Bidirectional control - viewer can interact
    ws.send(JSON.stringify({ type: 'click', x, y }));
  }

  function sendInstruction(text: string) {
    // Natural language control
    ws.send(JSON.stringify({ type: 'act', instruction: text }));
  }
</script>

<canvas bind:this={canvas} on:click={(e) => sendClick(e.offsetX, e.offsetY)} />
<!-- currentTarget is typed as the input element, so .value type-checks -->
<input type="text" on:keydown={(e) => e.key === 'Enter' && sendInstruction(e.currentTarget.value)} />
WRANGLER CONFIG
Enable containers. Bind the Durable Object.
# wrangler.toml
name = "agentcast-demo"
main = "src/worker.ts"
compatibility_date = "2024-12-01"

[containers]
enabled = true

[[durable_objects.bindings]]
name = "AGENT"
class_name = "MyAgent"

[[migrations]]
tag = "v1"
new_classes = ["MyAgent"]
USE CASES
// Use cases
// 1. E-COMMERCE MONITORING
await agent.goto("https://competitor.com/products");
const prices = await agent.extract({
instruction: "Get all product prices on this page",
schema: z.array(z.object({ name: z.string(), price: z.number() }))
});
// 2. FORM AUTOMATION
await agent.act("fill in the contact form with our support email");
await agent.act("select 'Enterprise' from the plan dropdown");
await agent.act("click submit and wait for confirmation");
// 3. SCRAPING WITH INTERACTION
await agent.act("click 'Load More' until all results are visible");
const results = await agent.extract({
instruction: "Extract all search results",
schema: searchResultsSchema
});
// 4. TESTING & QA
await agent.goto("https://staging.myapp.com");
await agent.act("log in as test user");
await agent.act("navigate to settings");
const settings = await agent.extract({ instruction: "Get current settings", schema: settingsSchema });
expect(settings.notifications).toBe(true);
ARCHITECTURE PATTERNS
SESSION LIFECYCLE
- Worker creates BrowserAgent DO
- DO spins up container with Chrome
- Stagehand translates NL to actions
- CDP captures screencast frames
- WebSocket streams to viewers
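One way to keep the lifecycle above honest is to model it as a small state machine. This is a minimal sketch, not AgentCast's implementation: the status names, the SessionInfo shape, and the advance and isIdle helpers are all hypothetical, chosen to mirror the "status, activity timestamps, URLs" that session tracking exposes.

```typescript
// Hypothetical session states; illustrative names, not part of the AgentCast API.
type SessionStatus = "created" | "starting" | "running" | "closed";

interface SessionInfo {
  status: SessionStatus;
  url: string | null;
  lastActivityAt: number; // epoch milliseconds
}

// Allowed transitions: DO created -> container starting -> task running -> closed.
// Any live state may also go straight to "closed" (error or timeout).
const transitions: Record<SessionStatus, SessionStatus[]> = {
  created: ["starting", "closed"],
  starting: ["running", "closed"],
  running: ["closed"],
  closed: [],
};

// Returns a new SessionInfo in the next state, or throws on an illegal jump.
function advance(session: SessionInfo, next: SessionStatus, now: number): SessionInfo {
  if (transitions[session.status].indexOf(next) === -1) {
    throw new Error(`Invalid transition: ${session.status} -> ${next}`);
  }
  return { ...session, status: next, lastActivityAt: now };
}

// True when a live session has been inactive for longer than maxIdleMs.
function isIdle(session: SessionInfo, now: number, maxIdleMs: number): boolean {
  return session.status !== "closed" && now - session.lastActivityAt > maxIdleMs;
}
```

A Durable Object could run a check like isIdle on a timer and close sessions that have gone quiet, which also covers the cleanup concern for runaway sessions.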
BIDIRECTIONAL CONTROL
- Viewer sends click coordinates
- Viewer sends NL instructions
- Agent processes and executes
- Results stream back to viewer
- Human-in-the-loop when needed
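On the receiving end, viewer messages should be validated before they touch the browser. The sketch below assumes the two message shapes the viewer component sends ({ type: 'click', x, y } and { type: 'act', instruction }). The parseViewerMessage and toViewportPoint helpers are hypothetical names, and the coordinate scaling assumes the canvas may be displayed at a different size than the browser viewport.

```typescript
// The two viewer-to-agent messages used by the viewer component.
type ViewerMessage =
  | { type: "click"; x: number; y: number }
  | { type: "act"; instruction: string };

// Parse and validate a raw WebSocket payload. Returns null for anything
// that is not a well-formed click or act message.
function parseViewerMessage(raw: string): ViewerMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (e) {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;

  const { type, x, y, instruction } = value as Record<string, unknown>;
  if (type === "click" && typeof x === "number" && typeof y === "number") {
    return { type: "click", x, y };
  }
  if (type === "act" && typeof instruction === "string" && instruction.trim() !== "") {
    return { type: "act", instruction };
  }
  return null;
}

interface Size {
  width: number;
  height: number;
}

// Map a click on the displayed canvas to browser viewport coordinates.
function toViewportPoint(
  offsetX: number,
  offsetY: number,
  displayed: Size,
  viewport: Size
): { x: number; y: number } {
  return {
    x: Math.round((offsetX / displayed.width) * viewport.width),
    y: Math.round((offsetY / displayed.height) * viewport.height),
  };
}
```

Scaling matters whenever the canvas is shrunk to fit a layout: a click at the center of a 640x360 canvas should land at the center of a 1280x720 page, not in its top-left quadrant.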
STAGEHAND INTEGRATION
- NL instruction → DOM analysis
- Element identification via AI
- Action execution via Playwright
- No XPath, no CSS selectors
- Works on any page layout
DATA EXTRACTION
- Define schema with Zod
- AI reads visible content
- Returns typed, validated data
- No hand-written scraping code, no HTML parsing
- Handles dynamic content
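To make "typed, validated data" concrete, here is a dependency-free stand-in for the kind of check a Zod schema performs on extracted output. It is a sketch only: parseProduct is a hypothetical helper, written by hand in place of Zod so the example runs without installing anything, and it mirrors the product shape from the extraction example (title, price, optional rating, inStock).

```typescript
interface Product {
  title: string;
  price: number;
  rating?: number;
  inStock: boolean;
}

// Hand-written validation standing in for a schema parse step.
// Throws with a field-specific message when the extracted value is malformed.
function parseProduct(raw: unknown): Product {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("expected an object");
  }
  const { title, price, rating, inStock } = raw as Record<string, unknown>;

  if (typeof title !== "string") throw new Error("title must be a string");
  if (typeof price !== "number" || Number.isNaN(price)) throw new Error("price must be a number");
  if (rating !== undefined && typeof rating !== "number") throw new Error("rating must be a number");
  if (typeof inStock !== "boolean") throw new Error("inStock must be a boolean");

  // Only include rating when it was actually present.
  return rating === undefined ? { title, price, inStock } : { title, price, rating, inStock };
}
```

The point of the validation step is that a model returning "9.99" as a string, or omitting inStock, fails loudly at the boundary instead of leaking into downstream code.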
PRODUCTION USE CASES
COMPETITOR MONITORING
Navigate to competitor sites, extract pricing, inventory, product details. Works even when they change their HTML structure.
FORM AUTOMATION
Fill out complex multi-step forms with natural language. Handle dropdowns, date pickers, file uploads.
END-TO-END TESTING
Write tests in natural language. "Log in, go to settings, change notification preferences." Watch them run in real time.
LEAD RESEARCH
Visit company websites and LinkedIn profiles. Extract contact info, company details, recent news. Structured data, not raw HTML.
PERFORMANCE CHARACTERISTICS
- Container startup: ~2-5s for Chrome initialization
- Screencast latency: ~100-200ms frame delay
- Action execution: ~1-3s per natural language instruction
- Extraction: ~2-5s depending on page complexity
- Session duration: Limited by container timeout (configurable)
- Concurrent sessions: Limited by account container limits
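The figures above add up quickly, so it helps to budget a task before running it. The sketch below is back-of-the-envelope arithmetic using only the ranges listed: estimateTaskSeconds is a hypothetical helper, it assumes instructions run one after another, and it ignores page navigation time, which the list does not quantify.

```typescript
interface SecondsRange {
  min: number;
  max: number;
}

// Approximate ranges taken from the performance list, in seconds.
const LATENCY_SECONDS = {
  containerStartup: { min: 2, max: 5 },
  action: { min: 1, max: 3 },
  extraction: { min: 2, max: 5 },
};

// Estimate the wall-clock range for a task made of sequential steps.
// coldStart adds container startup; pass false for an already-running session.
function estimateTaskSeconds(
  actions: number,
  extractions: number,
  coldStart: boolean = true
): SecondsRange {
  const startup: SecondsRange = coldStart
    ? LATENCY_SECONDS.containerStartup
    : { min: 0, max: 0 };
  return {
    min:
      startup.min +
      actions * LATENCY_SECONDS.action.min +
      extractions * LATENCY_SECONDS.extraction.min,
    max:
      startup.max +
      actions * LATENCY_SECONDS.action.max +
      extractions * LATENCY_SECONDS.extraction.max,
  };
}

// The login flow from the BrowserAgent example: 4 act calls and 1 extract.
const loginFlow = estimateTaskSeconds(4, 1);
console.log(`${loginFlow.min}-${loginFlow.max}s`); // prints "8-22s"
```

By this estimate a cold-start login flow lands somewhere between 8 and 22 seconds, which is worth knowing before putting a synchronous HTTP request in front of it.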
OPERATIONAL CONSIDERATIONS
- Container limits: Check your Cloudflare plan for container quotas
- Memory usage: Chrome is hungry; allocate accordingly
- Session cleanup: Implement timeouts to prevent runaway sessions
- Rate limiting: Don't hammer target sites; add delays between actions
- Authentication: Store credentials securely; consider a secrets manager
- Error handling: Pages fail; implement retry logic with backoff
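As a starting point for the retry advice above, here is a minimal sketch of retry with exponential backoff. The helper names (backoffDelayMs, withRetry) and the default values (500 ms base, 8 s cap, 3 attempts) are assumptions for illustration, not AgentCast settings. The sleep function is injectable so the helper can be tested without real delays.

```typescript
// Delay before the next attempt: base * 2^attempt, capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Default sleep based on setTimeout.
function defaultSleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Run an async operation, retrying on failure with exponential backoff.
// Rethrows the last error once maxAttempts is exhausted.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 3,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No point sleeping after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Inside an agent this might wrap a flaky step, for example withRetry(() => this.goto("https://example.com")). The same delays double as a politeness mechanism, since a failing target site gets progressively more breathing room between attempts.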