TRUST BROKER: HUMAN-MEDIATED AGENT COMMUNICATION
You're the referee. Then the coach. Then the guy in the stands eating popcorn.
THE PROBLEM
Two AI agents need to work together. But they hallucinate. They misunderstand. They go off script. You can't just wire them up and walk away.
Traditional approach: Hope for the best or hardcode every interaction.
Trust Broker pattern: Start as active mediator, reading every message and approving or editing it before delivery. As agents prove themselves, fade into monitoring. Eventually, just watch the logs.
THE PROGRESSION
Think of it like teaching two specialists to collaborate. First meeting? You're translating everything. After a few successful projects? You're just checking in. Once they've shipped a dozen things together? You're getting status updates.
The architecture supports four trust levels. You start high-touch, earn your way to hands-off.
TRUST LEVELS
// Trust levels define human involvement
type TrustLevel =
  | "full-mediation" // Human reads/approves every message
  | "spot-check"     // Human samples conversations
  | "alert-only"     // Human notified on anomalies
  | "autonomous";    // Agents run free

type ConversationState = {
  agentA: string;
  agentB: string;
  trustLevel: TrustLevel;
  messageCount: number;
  successRate: number;
  lastReview: Date;
};
ARCHITECTURE FLOW
TRUST PROGRESSION
┌─────────────────────────────────────────────────┐
│ Phase 1: FULL MEDIATION (Couples Therapy Mode)  │
├─────────────────────────────────────────────────┤
│                                                 │
│   Agent A ──> [Human Reviews] ──> Agent B       │
│           <── [Human Edits]   <──               │
│                                                 │
│   • Every message blocked for approval          │
│   • Human can edit before delivery              │
│   • Building trust baseline                     │
│                                                 │
└─────────────────────────────────────────────────┘
          │ (Good track record)
          ▼
┌─────────────────────────────────────────────────┐
│ Phase 2: SPOT CHECK (Supervision Mode)          │
├─────────────────────────────────────────────────┤
│                                                 │
│   Agent A ──────┬──> Agent B (90%)              │
│                 │                               │
│                 └──> [Human Samples] (10%)      │
│                                                 │
│   • Most messages flow directly                 │
│   • Random sampling for quality                 │
│   • Sample rate adapts to performance           │
│                                                 │
└─────────────────────────────────────────────────┘
          │ (High success rate)
          ▼
┌─────────────────────────────────────────────────┐
│ Phase 3: ALERT ONLY (Monitor Mode)              │
├─────────────────────────────────────────────────┤
│                                                 │
│   Agent A ───┬───> Agent B                      │
│              │                                  │
│              └──> [Anomaly Detection]           │
│                        │                        │
│                        └──> Human (if bad)      │
│                                                 │
│   • Real-time message flow                      │
│   • Human notified on anomalies only            │
│   • Async analysis doesn't block                │
│                                                 │
└─────────────────────────────────────────────────┘
          │ (Sustained excellence)
          ▼
┌─────────────────────────────────────────────────┐
│ Phase 4: AUTONOMOUS (Full Trust Mode)           │
├─────────────────────────────────────────────────┤
│                                                 │
│   Agent A ──────────> Agent B                   │
│                                                 │
│   • Direct communication                        │
│   • Human can observe logs                      │
│   • Automatic demotion on issues                │
│                                                 │
└─────────────────────────────────────────────────┘
PHASE 1: FULL MEDIATION
Every message stops at your desk. You read it, maybe edit it, then approve or reject.
This is the couples therapy phase. Agent A wants to tell Agent B something? You're reading that message first. If A is asking B to "delete all users," you're catching that before it goes through.
// Full mediation: human in the loop
async function mediatedExchange(
  from: Agent,
  to: Agent,
  message: Message,
  human: HumanReviewer
) {
  // Agent A wants to send to Agent B
  const proposal = await from.composeMessage(message);

  // Human reviews before it goes through
  const decision = await human.review({
    from: from.id,
    to: to.id,
    content: proposal,
    context: getConversationHistory(from.id, to.id)
  });

  if (decision.approved) {
    // Optionally edit the message
    const finalMessage = decision.edited || proposal;
    await to.receive(finalMessage, from.id);

    // Track success
    updateTrustMetrics(from.id, to.id, "success");
  } else {
    // Log rejection for learning
    await logRejection(from.id, to.id, proposal, decision.reason);
  }
}
PHASE 2: SPOT CHECK
Messages flow freely, but you're sampling conversations. Start at 10%, adjust based on quality.
Agents have earned some trust. They can talk without you blocking every message. But you're still sampling: maybe 1 in 10 messages gets reviewed. If quality drops, crank the rate back up. If they're crushing it, dial it down.
// Spot check: sample conversations
class SpotCheckMediator {
  private checkRate = 0.1; // 10% of messages
  private reviewQueue: ReviewQueue; // assumed queue feeding the human dashboard

  async handleMessage(from: Agent, to: Agent, msg: Message) {
    if (Math.random() < this.checkRate) {
      // Queue for human review...
      await this.reviewQueue.add({
        from: from.id,
        to: to.id,
        message: msg,
        timestamp: Date.now()
      });
    }

    // ...but deliver either way
    await to.receive(msg, from.id);

    // Adjust check rate based on metrics
    this.adaptCheckRate(from.id, to.id);
  }

  adaptCheckRate(fromId: string, toId: string) {
    const metrics = getMetrics(fromId, toId);

    // More checks if quality drops
    if (metrics.successRate < 0.85) {
      this.checkRate = Math.min(1.0, this.checkRate * 1.5);
    }

    // Fewer checks if quality is high
    if (metrics.successRate > 0.95) {
      this.checkRate = Math.max(0.01, this.checkRate * 0.8);
    }
  }
}
PHASE 3: ALERT ONLY
Real-time communication. You get pinged if something looks weird.
Messages go through immediately. You're analyzing async: looking for anomalies, sudden topic changes, confidence drops. Most of the time you see nothing. When something's off, you get a notification.
// Alert only: human is notified, doesn't block
async function alertBasedMediation(
  from: Agent,
  to: Agent,
  msg: Message
) {
  // Message goes through immediately
  const delivery = to.receive(msg, from.id);

  // But we analyze it async
  const analysis = analyzeMessage(msg, {
    conversationHistory: getHistory(from.id, to.id),
    agentProfiles: [from.profile, to.profile],
    recentMetrics: getRecentMetrics(from.id, to.id)
  });

  // Parallel: delivery and analysis
  const [result, score] = await Promise.all([delivery, analysis]);

  // Alert human only if suspicious
  if (score.anomalyScore > 0.7) {
    await notifyHuman({
      type: "anomaly-detected",
      from: from.id,
      to: to.id,
      message: msg,
      score,
      action: "review-conversation"
    });
  }

  return result;
}
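What counts as an anomaly is up to you. Here's a minimal heuristic scorer as a sketch; the signals, weights, and the topicSimilarity helper are illustrative assumptions, not part of the pattern. A real deployment might swap in an embedding model or a trained classifier.

// Sketch of analyzeMessage: combine a few cheap signals into a [0, 1] score.
// topicSimilarity, AnalysisContext, and AnomalyResult are hypothetical names;
// use whatever similarity measure and types you actually have.
function analyzeMessage(msg: Message, ctx: AnalysisContext): AnomalyResult {
  let score = 0;

  // Sudden topic change relative to recent conversation history
  if (topicSimilarity(msg, ctx.conversationHistory) < 0.3) {
    score += 0.4;
  }

  // Agent reports low confidence in its own output
  if (msg.confidence !== undefined && msg.confidence < 0.5) {
    score += 0.3;
  }

  // Message volume spike versus the pair's baseline rate
  if (ctx.recentMetrics.messagesPerMinute > 3 * ctx.recentMetrics.baselineRate) {
    score += 0.3;
  }

  return { anomalyScore: Math.min(1, score) };
}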
PHASE 4: AUTONOMOUS
Agents talk directly. You can watch the logs if you want.
Full trust mode. They've done this 500 times without screwing up. They're just talking now. You can review conversation history if you're curious, but you're not in the loop. One bad exchange? They get demoted back to alert-only.
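Even here, messages still route through the mediator so demotion can happen automatically. A minimal sketch, assuming hypothetical log, validateOutcome, and demote methods on the mediator:

// Autonomous mode: deliver immediately, log for the audit trail,
// and demote the pair the moment a validated failure shows up.
async function autonomousExchange(
  pair: AgentPairMediator,
  from: Agent,
  to: Agent,
  msg: Message
) {
  await to.receive(msg, from.id);
  await pair.log(msg); // audit trail only; nothing blocks

  // Hypothetical post-hoc validation (tests, schema checks, user reports)
  const failed = await pair.validateOutcome(msg);
  if (failed) {
    await pair.demote("alert-only"); // one bad exchange, back under watch
  }
}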
DURABLE OBJECT IMPLEMENTATION
One DO per agent pair. Tracks conversation, trust level, metrics.
The mediator is a Durable Object instance per agent pair. It holds conversation history, current trust level, success rates. When a message comes in, it routes based on trust level. When metrics update, it can auto-promote or demote the pair.
// Durable Object per agent-pair
export class AgentPairMediator extends DurableObject {
  private history: Message[] = [];
  private trustLevel: TrustLevel = "full-mediation";
  private metrics = {
    totalMessages: 0,
    humanInterventions: 0,
    successRate: 1.0
  };

  async handleMessage(request: Request) {
    const { from, to, message, type } = await request.json();

    switch (type) {
      case "send":
        return this.mediateMessage(from, to, message);
      case "human-review":
        return this.handleHumanReview(message);
      case "update-trust":
        return this.updateTrustLevel();
      case "get-pending":
        return this.getPendingReviews();
    }
  }

  async mediateMessage(from: string, to: string, msg: Message) {
    this.history.push({ from, to, msg, timestamp: Date.now() });
    this.metrics.totalMessages++;

    switch (this.trustLevel) {
      case "full-mediation":
        // Block and wait for human
        await this.queueForReview(msg);
        return { status: "pending", id: msg.id };

      case "spot-check":
        if (Math.random() < 0.1) {
          await this.queueForReview(msg);
        }
      // Fall through - deliver anyway

      case "alert-only": {
        const score = this.analyzeMessage(msg);
        if (score > 0.7) {
          await this.alertHuman(msg, score);
        }
      }
      // Fall through

      case "autonomous":
        // Just track it
        await this.deliverMessage(to, msg);
        return { status: "delivered" };
    }
  }

  async updateTrustLevel() {
    const { successRate, totalMessages } = this.metrics;

    // Escalate trust with good track record
    if (successRate > 0.95 && totalMessages > 50) {
      if (this.trustLevel === "full-mediation") {
        this.trustLevel = "spot-check";
      } else if (this.trustLevel === "spot-check") {
        this.trustLevel = "alert-only";
      } else if (this.trustLevel === "alert-only") {
        this.trustLevel = "autonomous";
      }
    }

    // De-escalate on problems
    if (successRate < 0.85) {
      if (this.trustLevel === "autonomous") {
        this.trustLevel = "alert-only";
      } else if (this.trustLevel === "alert-only") {
        this.trustLevel = "spot-check";
      }
    }

    await this.ctx.storage.put("trustLevel", this.trustLevel);
  }
}
HUMAN INTERFACE
SSE stream of pending reviews. Human approves/rejects/edits.
You need a UI for reviewing messages. Each agent-pair DO can push pending reviews to an SSE stream. Human dashboard shows all pending reviews across all pairs they supervise. Click approve, or edit the message inline, or reject with feedback.
// Human review interface
interface ReviewRequest {
  id: string;
  from: string;
  to: string;
  message: Message;
  context: {
    history: Message[];
    metrics: PairMetrics;
    trustLevel: TrustLevel;
  };
}

// SSE stream of pending reviews
async function streamPendingReviews(humanId: string) {
  const stream = new ReadableStream({
    async start(controller) {
      // Get all agent-pairs this human supervises
      const pairs = await getSupervisedPairs(humanId);

      for (const pair of pairs) {
        const mediator = env.MEDIATOR.get(
          env.MEDIATOR.idFromName(`${pair.a}:${pair.b}`)
        );

        // WebSocket to each mediator DO
        const res = await mediator.fetch("https://mediator/ws", {
          headers: { Upgrade: "websocket" }
        });
        const ws = res.webSocket!;
        ws.accept();

        ws.addEventListener("message", (event) => {
          const review: ReviewRequest = JSON.parse(event.data as string);
          // Push to human's review queue
          controller.enqueue(`data: ${JSON.stringify(review)}\n\n`);
        });
      }
    }
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache"
    }
  });
}

// Human makes decision
async function submitReview(reviewId: string, decision: {
  approved: boolean;
  edited?: Message;
  feedback?: string;
}) {
  const review = await getReview(reviewId);
  const mediator = getMediatorDO(review.from, review.to);

  await mediator.fetch("https://mediator/human-review", {
    method: "POST",
    body: JSON.stringify({ reviewId, decision })
  });
}
WHY THIS WORKS
GRADUAL TRUST
- β’ Don't need perfect agents day 1
- β’ Build confidence over time
- β’ Automatic promotion based on data
- β’ Instant demotion on problems
ADAPTIVE OVERSIGHT
- β’ Heavy supervision when needed
- β’ Scales down as quality improves
- β’ Human time spent on real issues
- β’ Autonomous when earned
SAFETY NET
- β’ Bad messages caught early
- β’ Feedback loop for improvement
- β’ Automatic rollback on failure
- β’ Audit trail of all decisions
SCALABILITY
- β’ One human supervises many pairs
- β’ Only review what needs reviewing
- β’ Successful pairs need zero attention
- β’ Focus on problem cases
USE CASES
// Use case: AI coding assistants
// Two agents working on a codebase
// Human reviews their changes before merge
const codeReviewMediator = new AgentPairMediator({
  agentA: "backend-agent",
  agentB: "frontend-agent",
  trustLevel: "spot-check",
  rules: {
    // Auto-approve safe changes
    autoApprove: (msg) => {
      return msg.type === "comment" || msg.linesChanged < 10;
    },
    // Always review breaking changes
    requireReview: (msg) => {
      return msg.breaking === true ||
        msg.linesChanged > 100 ||
        msg.files.includes("schema.ts");
    }
  }
});

// Use case: Customer service agents
// Human steps in when confidence is low
const supportMediator = new AgentPairMediator({
  agentA: "support-bot",
  agentB: "customer",
  trustLevel: "alert-only",
  rules: {
    alertHuman: (msg) => {
      return msg.sentiment === "angry" ||
        msg.confidence < 0.7 ||
        msg.topic === "refund";
    }
  }
});

// Use case: Multi-agent research
// Agents fact-check each other, human arbitrates
const researchMediator = new AgentPairMediator({
  agentA: "research-agent",
  agentB: "fact-checker",
  trustLevel: "full-mediation",
  rules: {
    requireHuman: (msg) => {
      const agreement = calculateAgreement(
        msg.researcherClaim,
        msg.factCheckerResult
      );
      return agreement < 0.8; // Conflict threshold
    }
  }
});
REAL-WORLD SCENARIOS
CODE REVIEW AGENTS
Backend agent and frontend agent working on the same feature. Human reviews their interface contract initially. After 20 successful integrations, they're on spot-check. After 100, they're autonomous.
One breaking change? Back to spot-check. Two in a week? Full mediation.
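If your mediator takes per-pair policies, that schedule might be expressed like this. The policy shape is an assumption for illustration, using the numbers from this scenario:

// Hypothetical trust schedule for the code-review pair
const codeReviewTrustPolicy = {
  promote: [
    { to: "spot-check", after: { successes: 20 } },   // 20 clean integrations
    { to: "autonomous", after: { successes: 100 } }   // 100 clean integrations
  ],
  demote: [
    { to: "spot-check", on: { failures: 1 } },                    // one breaking change
    { to: "full-mediation", on: { failures: 2, withinDays: 7 } }  // two in a week
  ]
};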
CUSTOMER SUPPORT HANDOFF
AI handles common questions. Human agent handles complex issues. When should AI escalate? Start with full mediation: human approves every handoff.
After 50 correct escalation decisions, move to spot-check. After 500, alert-only: human gets pinged if AI seems unsure.
RESEARCH COLLABORATION
Research agent finds sources. Fact-checker agent validates claims. They disagree? Human arbitrates.
Full mediation on conflicts. If they agree 95% of the time, human only reviews the disagreements. Perfect track record? Just monitor.
TRADING SYSTEM COORDINATION
Market analysis agent suggests trades. Risk management agent checks exposure. Human approves initially.
After 200 trades with no violations, spot-check mode. After 1000? Alert-only on unusual positions. Never fully autonomousβsome things need human override available.
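One way to encode "never fully autonomous" is a ceiling the promotion logic can't cross. A sketch, assuming the mediator accepts a maxTrustLevel option; typicalPositionSize is a hypothetical helper:

// Hypothetical: cap trust so promotion can never pass alert-only
const tradingMediator = new AgentPairMediator({
  agentA: "market-analysis",
  agentB: "risk-management",
  trustLevel: "full-mediation",
  maxTrustLevel: "alert-only", // human override must stay available
  rules: {
    // Alert on positions well outside the instrument's usual size
    alertHuman: (msg) => msg.positionSize > 3 * typicalPositionSize(msg.symbol)
  }
});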
IMPLEMENTATION NOTES
- Durable Object per pair: Each agent pair gets its own DO instance, keyed by agentA:agentB. Holds conversation history and trust state.
- Trust transitions: Don't auto-promote too fast. You need a statistically significant sample size (e.g., 50+ exchanges with 95%+ success).
- Demotion is instant: One failure at autonomous level? Straight to alert-only. Two failures? Full mediation. Trust is earned slowly, lost quickly (see the sketch after this list).
- Human bandwidth: One person can supervise 20-50 agent pairs at full mediation, hundreds at spot-check, thousands at alert-only.
- Metrics matter: Track success rate, human intervention rate, message volume, escalation accuracy. These drive trust transitions.
- Feedback loops: When human edits a message, that's training data. When human rejects, that's critical feedback. Use it.
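A sketch of the "earned slowly, lost quickly" rule from the notes above, using this section's illustrative thresholds; recentFailures and sampleSize are assumed fields on your metrics object:

// Promotion needs a large clean sample; demotion triggers on a single failure.
function nextTrustLevel(current: TrustLevel, m: PairMetrics): TrustLevel {
  const order: TrustLevel[] = [
    "full-mediation", "spot-check", "alert-only", "autonomous"
  ];
  const i = order.indexOf(current);

  if (m.recentFailures >= 2) return "full-mediation";           // repeated failures: all the way down
  if (m.recentFailures === 1) return order[Math.max(0, i - 1)]; // one failure: drop a level instantly
  if (m.successRate >= 0.95 && m.sampleSize >= 50) {
    return order[Math.min(order.length - 1, i + 1)];            // promote one level on a significant sample
  }
  return current;
}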
WHAT YOU'RE ACTUALLY BUILDING
This isn't just a safety mechanism. You're building a training system.
Every human review is a label. Every approval is "this message was good." Every edit is "this is how you should have said it." Every rejection is "don't do that."
After a few hundred reviews, you've got a dataset. Train a classifier: "Would human approve this message?" Now your spot-check sampling isn't random; it targets messages the classifier is uncertain about.
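A sketch of that targeted sampling, assuming a hypothetical approvalClassifier trained on your review labels:

// Replace uniform random sampling with uncertainty-targeted sampling.
// Confident predictions flow through; uncertain messages go to review.
async function shouldSpotCheck(msg: Message): Promise<boolean> {
  const p = await approvalClassifier.predict(msg); // P(human approves), hypothetical model
  const uncertainty = 1 - Math.abs(p - 0.5) * 2;   // 1.0 at p = 0.5, 0.0 at p = 0 or 1
  // Keep a small random floor so the classifier itself gets checked as traffic drifts
  return uncertainty > 0.6 || Math.random() < 0.02;
}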
After a few thousand reviews, you can train a message rewriter: "Given this message that human rejected, predict what human would edit it to." Now you're auto-correcting before messages even reach the human.
The endgame: agents learn your review patterns. They internalize your communication rules. They become the agents you wanted from day 1, except they got there through supervised evolution instead of prompt-engineering guesswork.
THE BIGGER PICTURE
Multi-agent systems fail because we expect agents to work together perfectly from the start. That's not how humans work. That's not how teams work.
Real teams start with high communication overhead. Lots of meetings, lots of check-ins. As they build rapport and shared context, they need less coordination. Eventually, they're just async shipping.
Same deal here. Start with the mediator doing heavy lifting. Fade as agents prove they can handle it. Jump back in when they screw up.
The architecture supports this. The trust levels are explicit. The transitions are data-driven. The human oversight scales down, not up.
This is how you actually deploy agent-to-agent communication in production. Not by hoping they behave. By teaching them, monitoring them, and earning trust over time.
WHEN NOT TO USE THIS
- Agents never need oversight: If your agents are purely deterministic or doing simple pass-through, you don't need mediation.
- No human available: Pattern requires human in loop initially. If no human capacity, use deterministic protocols instead.
- Real-time latency critical: Full mediation adds human latency. If sub-second response required, start at alert-only or don't use pattern.
- Single agent orchestrator: If you have one agent calling multiple subagents (primary/subagent pattern), you don't need peer mediation.