AGENTCAST: LIVE BROWSER SESSIONS FOR AI AGENTS

Watch your agents browse. Tell them what to click. In natural language.

BROWSER AUTOMATION 2025.12.08

THE PROBLEM

Problem: AI agents can call APIs. But most of the web isn't APIs. It's forms, buttons, dropdowns, CAPTCHAs, and JavaScript SPAs. Your agent needs to actually browse.

Solution: AgentCast extends the Cloudflare Agents SDK with real browser sessions. Natural language control via Stagehand. Live screencasting via CDP. Watch your agent work in real time.

Browser Session Flow:

Client → Worker → BrowserAgent (DO)
                       ↓
              Container (Bun + Playwright)
                       ↓
              Chrome DevTools Protocol
                       ↓
              Stagehand (NL → Actions)
                       ↓
              Live Screencast Frames
                       ↓
              WebSocket → Client

THE FUN PART

Watching an AI agent navigate a website is genuinely entertaining. It's like watching a very fast intern who can't read social cues but somehow always finds the submit button. You'll catch yourself narrating: "No, not that dropdown... yes, that one... good robot."

ARCHITECTURE OVERVIEW

CORE COMPONENTS

• BrowserAgent - Extends Cloudflare Agent class
• Container - Bun + Playwright + Chrome
• Stagehand - Natural language → browser actions
• CDP - Chrome DevTools Protocol for screencasting
• WebSocket - Bidirectional control

TECH STACK

• Cloudflare Workers - Orchestration layer
• Cloudflare Containers - Chrome instances
• Durable Objects - Session state
• Playwright - Browser automation
• Zod - Typed data extraction

KEY FEATURES

Live screencasting - Real-time video stream of browser actions
Natural language control - "click the login button" just works
Bidirectional control - Viewers can click and type in the browser
Structured extraction - Use Zod schemas to extract typed data
Session tracking - Monitor status, activity timestamps, URLs
Agents SDK compatible - Extends the official Cloudflare Agent class
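
To make "session tracking" concrete, here is a minimal sketch of the kind of record a session could keep. The names (`SessionInfo`, `createSession`, `touch`, `isIdle`) are hypothetical and chosen for illustration; they are not taken from the agentcast API.

```typescript
// Hypothetical shapes for illustration only; not the agentcast API.
type SessionStatus = "starting" | "running" | "closed";

interface SessionInfo {
  status: SessionStatus;
  currentUrl: string | null;
  startedAt: number;      // epoch milliseconds
  lastActivityAt: number; // epoch milliseconds
}

// Create the record when the session is first requested.
function createSession(now: number): SessionInfo {
  return { status: "starting", currentUrl: null, startedAt: now, lastActivityAt: now };
}

// Record an action: bump the activity timestamp and optionally update the URL.
function touch(session: SessionInfo, now: number, url?: string): SessionInfo {
  return {
    ...session,
    status: "running",
    currentUrl: url ?? session.currentUrl,
    lastActivityAt: now,
  };
}

// A session counts as idle once no activity has been recorded for idleTimeoutMs.
function isIdle(session: SessionInfo, now: number, idleTimeoutMs: number): boolean {
  return session.status !== "closed" && now - session.lastActivityAt >= idleTimeoutMs;
}
```

Keeping the record immutable (each call returns a new object) makes it easy to persist in Durable Object storage after every action.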

QUICK START

# Install the SDK
npm install agentcast

# or
bun add agentcast

BROWSER AGENT

Extend BrowserAgent. Use goto, act, extract. Natural language all the way down.

// agent.ts
import { BrowserAgent, type BrowserAgentEnv } from "agentcast";
import { z } from "zod";

export class MyAgent extends BrowserAgent<BrowserAgentEnv> {
  
  async onTask(task: string) {
    // Navigate to a page
    await this.goto("https://example.com");
    
    // Natural language actions via Stagehand
    await this.act("click the login button");
    await this.act("fill in username with 'demo@example.com'");
    await this.act("fill in password with 'secret123'");
    await this.act("click submit");
    
    // Extract structured data with Zod
    const data = await this.extract({
      instruction: "Get the user profile info",
      schema: z.object({
        name: z.string(),
        email: z.string(),
        plan: z.enum(["free", "pro", "enterprise"])
      })
    });
    
    return data;
  }
}

WORKER

Expose endpoints for starting sessions and streaming video.

// worker.ts
import { Hono } from "hono";
import { MyAgent } from "./agent";

// Bindings declared in wrangler.toml
interface Env {
  AGENT: DurableObjectNamespace<MyAgent>;
}

const app = new Hono<{ Bindings: Env }>();

// Start a new browser session
app.post("/session", async (c) => {
  const { task } = await c.req.json();
  
  const id = c.env.AGENT.newUniqueId();
  const stub = c.env.AGENT.get(id);
  
  // Agent spins up container, starts browser, begins task
  const result = await stub.startTask(task);
  
  return c.json({ sessionId: id.toString(), result });
});

// Stream live video of agent actions
app.get("/session/:id/stream", async (c) => {
  const id = c.env.AGENT.idFromString(c.req.param("id"));
  const stub = c.env.AGENT.get(id);
  
  // Returns WebSocket upgrade for live screencast
  return stub.fetch(c.req.raw);
});

export default app;
export { MyAgent };

STRUCTURED EXTRACTION

Define Zod schemas. Get typed data from any page. No HTML parsing.

// Structured data extraction with Zod schemas
import { z } from "zod";

const productSchema = z.object({
  title: z.string(),
  price: z.number(),
  rating: z.number().optional(),
  inStock: z.boolean()
});

// Agent extracts typed data from any page
const product = await agent.extract({
  instruction: "Get the main product details",
  schema: productSchema
});

// TypeScript knows the shape
console.log(product.title);  // string
console.log(product.price);  // number
console.log(product.inStock); // boolean

LIVE VIEWER

WebSocket for screencast frames. Canvas rendering. Bidirectional control.

<!-- SvelteKit viewer component -->
<script lang="ts">
  let canvas: HTMLCanvasElement;
  let ws: WebSocket;
  
  function connect(sessionId: string) {
    ws = new WebSocket(`wss://your-worker.workers.dev/session/${sessionId}/stream`);
    
    ws.onmessage = (e) => {
      const frame = JSON.parse(e.data);
      
      if (frame.type === 'screencast') {
        // Render CDP screencast frame to canvas
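        const img = new Image();
        img.onload = () => {
          // Match the canvas to the frame size so the image isn't cropped
          if (canvas.width !== img.width || canvas.height !== img.height) {
            canvas.width = img.width;
            canvas.height = img.height;
          }
          canvas.getContext('2d')?.drawImage(img, 0, 0);
        };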
        img.src = `data:image/jpeg;base64,${frame.data}`;
      }
    };
  }
  
  function sendClick(x: number, y: number) {
    // Bidirectional control - viewer can interact
    ws.send(JSON.stringify({ type: 'click', x, y }));
  }
  
  function sendInstruction(text: string) {
    // Natural language control
    ws.send(JSON.stringify({ type: 'act', instruction: text }));
  }
</script>

<canvas bind:this={canvas} on:click={(e) => sendClick(e.offsetX, e.offsetY)} />
<input type="text" on:keydown={(e) => e.key === 'Enter' && sendInstruction(e.currentTarget.value)} />
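
One detail the viewer has to get right: if CSS scales the canvas, the click offsets reported by the browser no longer match the pixel coordinates of the screencast frame. The helper below is a sketch of that mapping; `toFrameCoords` and `Size` are hypothetical names, not part of agentcast.

```typescript
interface Size {
  width: number;
  height: number;
}

// Map a click at (offsetX, offsetY) in the displayed canvas to frame pixel coordinates.
function toFrameCoords(
  offsetX: number,
  offsetY: number,
  displayed: Size, // on-screen size of the canvas, e.g. clientWidth / clientHeight
  frame: Size      // size of the screencast frame, e.g. canvas.width / canvas.height
): { x: number; y: number } {
  if (displayed.width <= 0 || displayed.height <= 0) {
    throw new RangeError("displayed size must be positive");
  }
  const x = Math.round((offsetX / displayed.width) * frame.width);
  const y = Math.round((offsetY / displayed.height) * frame.height);
  // Clamp so a click on the very edge still lands inside the frame.
  return {
    x: Math.min(Math.max(x, 0), frame.width - 1),
    y: Math.min(Math.max(y, 0), frame.height - 1),
  };
}
```

In the component above, `sendClick` could pass its offsets through this helper before sending, using the canvas's `clientWidth`/`clientHeight` as the displayed size.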

WRANGLER CONFIG

Enable containers. Bind the Durable Object.

# wrangler.toml
name = "agentcast-demo"
main = "src/worker.ts"
compatibility_date = "2024-12-01"

[containers]
enabled = true

[[durable_objects.bindings]]
name = "AGENT"
class_name = "MyAgent"

[[migrations]]
tag = "v1"
new_classes = ["MyAgent"]

USE CASES

// 1. E-COMMERCE MONITORING
await agent.goto("https://competitor.com/products");
const prices = await agent.extract({
  instruction: "Get all product prices on this page",
  schema: z.array(z.object({ name: z.string(), price: z.number() }))
});

// 2. FORM AUTOMATION
await agent.act("fill in the contact form with our support email");
await agent.act("select 'Enterprise' from the plan dropdown");
await agent.act("click submit and wait for confirmation");

// 3. SCRAPING WITH INTERACTION
await agent.act("click 'Load More' until all results are visible");
const results = await agent.extract({
  instruction: "Extract all search results",
  schema: searchResultsSchema
});

// 4. TESTING & QA
await agent.goto("https://staging.myapp.com");
await agent.act("log in as test user");
await agent.act("navigate to settings");
const settings = await agent.extract({ instruction: "Get current settings", schema: settingsSchema });
expect(settings.notifications).toBe(true);

ARCHITECTURE PATTERNS

SESSION LIFECYCLE

• Worker creates BrowserAgent DO
• DO spins up container with Chrome
• Stagehand translates NL to actions
• CDP captures screencast frames
• WebSocket streams to viewers
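
The last two steps can be sketched with the CDP `Page` domain, which provides `Page.startScreencast`, the `Page.screencastFrame` event, and `Page.screencastFrameAck`. This is not AgentCast's actual implementation: `CdpLike` is a hypothetical minimal interface (a Playwright CDP session exposes similar `send`/`on` methods), `relayScreencast` is an illustrative name, and the quality setting is an arbitrary choice.

```typescript
// Hypothetical minimal slice of a CDP session, used here to avoid a hard dependency.
interface CdpLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
  on(event: string, handler: (payload: any) => void): void;
}

// Fields of the CDP Page.screencastFrame event used below (metadata omitted).
interface ScreencastFrameEvent {
  data: string;      // base64-encoded image
  sessionId: number; // frame number to acknowledge
}

// Message shape the viewer component expects on the WebSocket.
interface ScreencastMessage {
  type: "screencast";
  data: string;
}

async function relayScreencast(
  cdp: CdpLike,
  emit: (msg: ScreencastMessage) => void
): Promise<void> {
  cdp.on("Page.screencastFrame", (frame: ScreencastFrameEvent) => {
    emit({ type: "screencast", data: frame.data });
    // Acknowledge the frame so Chrome keeps sending new ones.
    cdp.send("Page.screencastFrameAck", { sessionId: frame.sessionId }).catch(() => {});
  });
  // JPEG matches the data:image/jpeg URL used by the viewer.
  await cdp.send("Page.startScreencast", { format: "jpeg", quality: 60, everyNthFrame: 1 });
}
```

Here `emit` would typically wrap `ws.send(JSON.stringify(msg))` for each connected viewer.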

BIDIRECTIONAL CONTROL

• Viewer sends click coordinates
• Viewer sends NL instructions
• Agent processes and executes
• Results stream back to viewer
• Human-in-the-loop when needed
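
Because viewers send arbitrary JSON over the WebSocket, the receiving side should validate each message before acting on it. A minimal sketch, using the `click` and `act` message shapes from the viewer component; `parseViewerMessage` is a hypothetical helper, not part of agentcast:

```typescript
// Message shapes sent by the viewer component.
type ViewerMessage =
  | { type: "click"; x: number; y: number }
  | { type: "act"; instruction: string };

// Returns a validated message, or null if the payload is malformed or unknown.
function parseViewerMessage(raw: string): ViewerMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof value !== "object" || value === null) return null;

  const { type, x, y, instruction } = value as Record<string, unknown>;

  if (type === "click" && typeof x === "number" && typeof y === "number") {
    return { type: "click", x, y };
  }
  if (type === "act" && typeof instruction === "string" && instruction.trim() !== "") {
    return { type: "act", instruction: instruction.trim() };
  }
  return null;
}
```

Returning `null` instead of throwing lets the socket handler drop bad input and keep the session alive.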

STAGEHAND INTEGRATION

• NL instruction → DOM analysis
• Element identification via AI
• Action execution via Playwright
• No XPath, no CSS selectors
• Works on any page layout

DATA EXTRACTION

• Define schema with Zod
• AI reads visible content
• Returns typed, validated data
• No hand-written selectors, no HTML parsing
• Handles dynamic content

PRODUCTION USE CASES

COMPETITOR MONITORING

Navigate to competitor sites, extract pricing, inventory, product details. Works even when they change their HTML structure.

FORM AUTOMATION

Fill out complex multi-step forms with natural language. Handle dropdowns, date pickers, file uploads.

END-TO-END TESTING

Write tests in natural language. "Log in, go to settings, change notification preferences." Watch them run in real time.

LEAD RESEARCH

Visit company websites, LinkedIn profiles. Extract contact info, company details, recent news. Structured data, not raw HTML.

PERFORMANCE CHARACTERISTICS

Container startup: ~2-5s for Chrome initialization
Screencast latency: ~100-200ms frame delay
Action execution: ~1-3s per natural language instruction
Extraction: ~2-5s depending on page complexity
Session duration: Limited by container timeout (configurable)
Concurrent sessions: Limited by account container limits
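
These ranges add up quickly, so it helps to estimate a task's total before running it. The sketch below does the arithmetic using only the figures listed above; `estimateTaskSeconds` is a hypothetical helper, and the estimate excludes page navigation and network time.

```typescript
interface Range {
  min: number;
  max: number;
}

// Ranges in seconds, taken from the figures listed above.
const LATENCY = {
  containerStartup: { min: 2, max: 5 },
  act: { min: 1, max: 3 },
  extract: { min: 2, max: 5 },
} as const;

// Sum the per-step ranges for a task with the given number of act() and extract() calls.
function estimateTaskSeconds(acts: number, extracts: number, coldStart = true): Range {
  const startup: Range = coldStart ? LATENCY.containerStartup : { min: 0, max: 0 };
  return {
    min: startup.min + acts * LATENCY.act.min + extracts * LATENCY.extract.min,
    max: startup.max + acts * LATENCY.act.max + extracts * LATENCY.extract.max,
  };
}

// The login example: four act() calls and one extract() on a cold container.
console.log(estimateTaskSeconds(4, 1)); // { min: 8, max: 22 }
```

So even a short login-and-extract flow takes roughly 8 to 22 seconds end to end, which is worth keeping in mind when setting client timeouts.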

OPERATIONAL CONSIDERATIONS

Container limits: Check your Cloudflare plan for container quotas
Memory usage: Chrome is hungry; allocate accordingly
Session cleanup: Implement timeouts to prevent runaway sessions
Rate limiting: Don't hammer target sites; add delays between actions
Authentication: Store credentials securely; consider a secrets manager
Error handling: Pages fail; implement retry logic with backoff
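
As one way to implement retry with backoff, here is a small generic wrapper. It is a sketch under simple assumptions (exponential delays with a cap, no jitter, every error treated as retryable); `withRetry` and `backoffDelayMs` are hypothetical helpers, not part of agentcast.

```typescript
// Exponential backoff: baseMs, 2x, 4x, ... capped at maxMs. Attempt numbering starts at 0.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run fn up to maxAttempts times, sleeping between failures; rethrow the last error.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Don't sleep after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

A navigation could then be wrapped as `await withRetry(() => agent.goto(url))`. The same `sleep` helper also covers the rate-limiting advice: a short pause between actions keeps the agent from hammering the target site.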
