TRUST BROKER: HUMAN-MEDIATED AGENT COMMUNICATION
You're the referee. Then the coach. Then the guy in the stands eating popcorn.
THE PROBLEM
Two AI agents need to work together. But they hallucinate. They misunderstand. They go off script. You can't just wire them up and walk away.
Traditional approach: Hope for the best or hardcode every interaction.
Trust Broker pattern: Start as active mediator, reading every message and approving or editing it before delivery. As agents prove themselves, fade into monitoring. Eventually, just watch the logs.
THE PROGRESSION
Think of it like teaching two specialists to collaborate. First meeting? You're translating everything. After a few successful projects? You're just checking in. Once they've shipped a dozen things together? You're getting status updates.
The architecture supports four trust levels. You start high-touch, earn your way to hands-off.
TRUST LEVELS
// Trust levels define human involvement
type TrustLevel =
  | "full-mediation" // Human reads/approves every message
  | "spot-check"     // Human samples conversations
  | "alert-only"     // Human notified on anomalies
  | "autonomous";    // Agents run free

type ConversationState = {
  agentA: string;
  agentB: string;
  trustLevel: TrustLevel;
  messageCount: number;
  successRate: number;
  lastReview: Date;
};
ARCHITECTURE FLOW
TRUST PROGRESSION
┌─────────────────────────────────────────────────┐
│ Phase 1: FULL MEDIATION (Couples Therapy Mode)  │
├─────────────────────────────────────────────────┤
│                                                 │
│   Agent A ──> [Human Reviews] ──> Agent B       │
│           <── [Human Edits]   <──               │
│                                                 │
│   • Every message blocked for approval          │
│   • Human can edit before delivery              │
│   • Building trust baseline                     │
│                                                 │
└─────────────────────────────────────────────────┘
          │ (Good track record)
          ▼
┌─────────────────────────────────────────────────┐
│ Phase 2: SPOT CHECK (Supervision Mode)          │
├─────────────────────────────────────────────────┤
│                                                 │
│   Agent A ──────┬──> Agent B (90%)              │
│                 │                               │
│                 └──> [Human Samples] (10%)      │
│                                                 │
│   • Most messages flow directly                 │
│   • Random sampling for quality                 │
│   • Sample rate adapts to performance           │
│                                                 │
└─────────────────────────────────────────────────┘
          │ (High success rate)
          ▼
┌─────────────────────────────────────────────────┐
│ Phase 3: ALERT ONLY (Monitor Mode)              │
├─────────────────────────────────────────────────┤
│                                                 │
│   Agent A ───┬───> Agent B                      │
│              │                                  │
│              └──> [Anomaly Detection]           │
│                        │                        │
│                        └──> Human (if bad)      │
│                                                 │
│   • Real-time message flow                      │
│   • Human notified on anomalies only            │
│   • Async analysis doesn't block                │
│                                                 │
└─────────────────────────────────────────────────┘
          │ (Sustained excellence)
          ▼
┌─────────────────────────────────────────────────┐
│ Phase 4: AUTONOMOUS (Full Trust Mode)           │
├─────────────────────────────────────────────────┤
│                                                 │
│   Agent A ──────────> Agent B                   │
│                                                 │
│   • Direct communication                        │
│   • Human can observe logs                      │
│   • Automatic demotion on issues                │
│                                                 │
└─────────────────────────────────────────────────┘
PHASE 1: FULL MEDIATION
Every message stops at your desk. You read it, maybe edit it, then approve or reject.
This is the couples therapy phase. Agent A wants to tell Agent B something? You're reading that message first. If A is asking B to "delete all users," you're catching that before it goes through.
// Full mediation: human in the loop
async function mediatedExchange(
  from: Agent,
  to: Agent,
  message: Message,
  human: HumanReviewer
) {
  // Agent A wants to send to Agent B
  const proposal = await from.composeMessage(message);

  // Human reviews before it goes through
  const decision = await human.review({
    from: from.id,
    to: to.id,
    content: proposal,
    context: getConversationHistory(from.id, to.id)
  });

  if (decision.approved) {
    // Optionally edit the message
    const finalMessage = decision.edited || proposal;
    await to.receive(finalMessage, from.id);

    // Track success
    updateTrustMetrics(from.id, to.id, "success");
  } else {
    // Log rejection for learning
    await logRejection(from.id, to.id, proposal, decision.reason);
  }
}
PHASE 2: SPOT CHECK
Messages flow freely, but you're sampling conversations. Start at 10%, adjust based on quality.
Agents have earned some trust. They can talk without you blocking every message. But you're still sampling: maybe 1 in 10 messages gets reviewed. If quality drops, crank the rate back up. If they're crushing it, dial it down.
// Spot check: sample conversations
class SpotCheckMediator {
  private checkRate = 0.1; // 10% of messages
  private reviewQueue: ReviewQueue; // assumed queue feeding the human dashboard

  async handleMessage(from: Agent, to: Agent, msg: Message) {
    if (Math.random() < this.checkRate) {
      // Queue for human review...
      await this.reviewQueue.add({
        from: from.id,
        to: to.id,
        message: msg,
        timestamp: Date.now()
      });
    }

    // ...but deliver either way
    await to.receive(msg, from.id);

    // Adjust check rate based on metrics
    this.adaptCheckRate(from.id, to.id);
  }

  adaptCheckRate(fromId: string, toId: string) {
    const metrics = getMetrics(fromId, toId);

    // More checks if quality drops
    if (metrics.successRate < 0.85) {
      this.checkRate = Math.min(1.0, this.checkRate * 1.5);
    }

    // Fewer checks if quality is high
    if (metrics.successRate > 0.95) {
      this.checkRate = Math.max(0.01, this.checkRate * 0.8);
    }
  }
}
PHASE 3: ALERT ONLY
Real-time communication. You get pinged if something looks weird.
Messages go through immediately. You're analyzing async: looking for anomalies, sudden topic changes, confidence drops. Most of the time you see nothing. When something's off, you get a notification.
// Alert only: human is notified, doesn't block
async function alertBasedMediation(
  from: Agent,
  to: Agent,
  msg: Message
) {
  // Message goes through immediately
  const delivery = to.receive(msg, from.id);

  // But we analyze it async
  const analysis = analyzeMessage(msg, {
    conversationHistory: getHistory(from.id, to.id),
    agentProfiles: [from.profile, to.profile],
    recentMetrics: getRecentMetrics(from.id, to.id)
  });

  // Parallel: delivery and analysis
  const [result, score] = await Promise.all([delivery, analysis]);

  // Alert human only if suspicious
  if (score.anomalyScore > 0.7) {
    await notifyHuman({
      type: "anomaly-detected",
      from: from.id,
      to: to.id,
      message: msg,
      score,
      action: "review-conversation"
    });
  }

  return result;
}
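What counts as an anomaly is up to you. Here's a minimal heuristic scorer as a sketch; the signals, weights, and the topicSimilarity helper are illustrative assumptions, not part of the pattern. A real deployment might swap in an embedding model or a trained classifier.

// Sketch of analyzeMessage: combine a few cheap signals into a [0, 1] score.
// topicSimilarity, AnalysisContext, and AnomalyResult are hypothetical names;
// use whatever similarity measure and types you actually have.
function analyzeMessage(msg: Message, ctx: AnalysisContext): AnomalyResult {
  let score = 0;

  // Sudden topic change relative to recent conversation history
  if (topicSimilarity(msg, ctx.conversationHistory) < 0.3) {
    score += 0.4;
  }

  // Agent reports low confidence in its own output
  if (msg.confidence !== undefined && msg.confidence < 0.5) {
    score += 0.3;
  }

  // Message volume spike versus the pair's baseline rate
  if (ctx.recentMetrics.messagesPerMinute > 3 * ctx.recentMetrics.baselineRate) {
    score += 0.3;
  }

  return { anomalyScore: Math.min(1, score) };
}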
PHASE 4: AUTONOMOUS
Agents talk directly. You can watch the logs if you want.
Full trust mode. They've done this 500 times without screwing up. They're just talking now. You can review conversation history if you're curious, but you're not in the loop. One bad exchange? They get demoted back to alert-only.
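Even here, messages still route through the mediator so demotion can happen automatically. A minimal sketch, assuming hypothetical log, validateOutcome, and demote methods on the mediator:

// Autonomous mode: deliver immediately, log for the audit trail,
// and demote the pair the moment a validated failure shows up.
async function autonomousExchange(
  pair: AgentPairMediator,
  from: Agent,
  to: Agent,
  msg: Message
) {
  await to.receive(msg, from.id);
  await pair.log(msg); // audit trail only; nothing blocks

  // Hypothetical post-hoc validation (tests, schema checks, user reports)
  const failed = await pair.validateOutcome(msg);
  if (failed) {
    await pair.demote("alert-only"); // one bad exchange, back under watch
  }
}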
DURABLE OBJECT IMPLEMENTATION
One DO per agent pair. Tracks conversation, trust level, metrics.
The mediator is a Durable Object instance per agent pair. It holds conversation history, current trust level, success rates. When a message comes in, it routes based on trust level. When metrics update, it can auto-promote or demote the pair.
// Durable Object per agent-pair
export class AgentPairMediator extends DurableObject {
  private history: Message[] = [];
  private trustLevel: TrustLevel = "full-mediation";
  private metrics = {
    totalMessages: 0,
    humanInterventions: 0,
    successRate: 1.0
  };

  async handleMessage(request: Request) {
    const { from, to, message, type } = await request.json();

    switch (type) {
      case "send":
        return this.mediateMessage(from, to, message);
      case "human-review":
        return this.handleHumanReview(message);
      case "update-trust":
        return this.updateTrustLevel();
      case "get-pending":
        return this.getPendingReviews();
    }
  }

  async mediateMessage(from: string, to: string, msg: Message) {
    this.history.push({ from, to, msg, timestamp: Date.now() });
    this.metrics.totalMessages++;

    switch (this.trustLevel) {
      case "full-mediation":
        // Block and wait for human
        await this.queueForReview(msg);
        return { status: "pending", id: msg.id };

      case "spot-check":
        if (Math.random() < 0.1) {
          await this.queueForReview(msg);
        }
      // Fall through - deliver anyway

      case "alert-only": {
        const score = this.analyzeMessage(msg);
        if (score > 0.7) {
          await this.alertHuman(msg, score);
        }
      }
      // Fall through

      case "autonomous":
        // Just track it
        await this.deliverMessage(to, msg);
        return { status: "delivered" };
    }
  }

  async updateTrustLevel() {
    const { successRate, totalMessages } = this.metrics;

    // Escalate trust with good track record
    if (successRate > 0.95 && totalMessages > 50) {
      if (this.trustLevel === "full-mediation") {
        this.trustLevel = "spot-check";
      } else if (this.trustLevel === "spot-check") {
        this.trustLevel = "alert-only";
      } else if (this.trustLevel === "alert-only") {
        this.trustLevel = "autonomous";
      }
    }

    // De-escalate on problems
    if (successRate < 0.85) {
      if (this.trustLevel === "autonomous") {
        this.trustLevel = "alert-only";
      } else if (this.trustLevel === "alert-only") {
        this.trustLevel = "spot-check";
      }
    }

    await this.ctx.storage.put("trustLevel", this.trustLevel);
  }
}
HUMAN INTERFACE
SSE stream of pending reviews. Human approves/rejects/edits.
You need a UI for reviewing messages. Each agent-pair DO can push pending reviews to an SSE stream. Human dashboard shows all pending reviews across all pairs they supervise. Click approve, or edit the message inline, or reject with feedback.
// Human review interface
interface ReviewRequest {
  id: string;
  from: string;
  to: string;
  message: Message;
  context: {
    history: Message[];
    metrics: PairMetrics;
    trustLevel: TrustLevel;
  };
}

// SSE stream of pending reviews
async function streamPendingReviews(humanId: string) {
  const stream = new ReadableStream({
    async start(controller) {
      // Get all agent-pairs this human supervises
      const pairs = await getSupervisedPairs(humanId);

      for (const pair of pairs) {
        const mediator = env.MEDIATOR.get(
          env.MEDIATOR.idFromName(`${pair.a}:${pair.b}`)
        );

        // WebSocket to each mediator DO
        const res = await mediator.fetch("https://mediator/ws", {
          headers: { Upgrade: "websocket" }
        });
        const ws = res.webSocket!;
        ws.accept();

        ws.addEventListener("message", (event) => {
          const review: ReviewRequest = JSON.parse(event.data as string);
          // Push to human's review queue
          controller.enqueue(`data: ${JSON.stringify(review)}\n\n`);
        });
      }
    }
  });

  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache"
    }
  });
}

// Human makes decision
async function submitReview(reviewId: string, decision: {
  approved: boolean;
  edited?: Message;
  feedback?: string;
}) {
  const review = await getReview(reviewId);
  const mediator = getMediatorDO(review.from, review.to);

  await mediator.fetch("https://mediator/human-review", {
    method: "POST",
    body: JSON.stringify({ reviewId, decision })
  });
}
WHY THIS WORKS
GRADUAL TRUST
- β’ Don't need perfect agents day 1
- β’ Build confidence over time
- β’ Automatic promotion based on data
- β’ Instant demotion on problems
ADAPTIVE OVERSIGHT
- β’ Heavy supervision when needed
- β’ Scales down as quality improves
- β’ Human time spent on real issues
- β’ Autonomous when earned
SAFETY NET
- β’ Bad messages caught early
- β’ Feedback loop for improvement
- β’ Automatic rollback on failure
- β’ Audit trail of all decisions
SCALABILITY
- β’ One human supervises many pairs
- β’ Only review what needs reviewing
- β’ Successful pairs need zero attention
- β’ Focus on problem cases
USE CASES
// Use case: AI coding assistants
// Two agents working on a codebase
// Human reviews their changes before merge
const codeReviewMediator = new AgentPairMediator({
  agentA: "backend-agent",
  agentB: "frontend-agent",
  trustLevel: "spot-check",
  rules: {
    // Auto-approve safe changes
    autoApprove: (msg) => {
      return msg.type === "comment" || msg.linesChanged < 10;
    },
    // Always review breaking changes
    requireReview: (msg) => {
      return msg.breaking === true ||
        msg.linesChanged > 100 ||
        msg.files.includes("schema.ts");
    }
  }
});

// Use case: Customer service agents
// Human steps in when confidence is low
const supportMediator = new AgentPairMediator({
  agentA: "support-bot",
  agentB: "customer",
  trustLevel: "alert-only",
  rules: {
    alertHuman: (msg) => {
      return msg.sentiment === "angry" ||
        msg.confidence < 0.7 ||
        msg.topic === "refund";
    }
  }
});

// Use case: Multi-agent research
// Agents fact-check each other, human arbitrates
const researchMediator = new AgentPairMediator({
  agentA: "research-agent",
  agentB: "fact-checker",
  trustLevel: "full-mediation",
  rules: {
    requireHuman: (msg) => {
      const agreement = calculateAgreement(
        msg.researcherClaim,
        msg.factCheckerResult
      );
      return agreement < 0.8; // Conflict threshold
    }
  }
});
REAL-WORLD SCENARIOS
CODE REVIEW AGENTS
Backend agent and frontend agent working on the same feature. Human reviews their interface contract initially. After 20 successful integrations, they're on spot-check. After 100, they're autonomous.
One breaking change? Back to spot-check. Two in a week? Full mediation.
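If your mediator takes per-pair policies, that schedule might be expressed like this. The policy shape is an assumption for illustration, using the numbers from this scenario:

// Hypothetical trust schedule for the code-review pair
const codeReviewTrustPolicy = {
  promote: [
    { to: "spot-check", after: { successes: 20 } },   // 20 clean integrations
    { to: "autonomous", after: { successes: 100 } }   // 100 clean integrations
  ],
  demote: [
    { to: "spot-check", on: { failures: 1 } },                    // one breaking change
    { to: "full-mediation", on: { failures: 2, withinDays: 7 } }  // two in a week
  ]
};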
CUSTOMER SUPPORT HANDOFF
AI handles common questions. Human agent handles complex issues. When should AI escalate? Start with full mediation: human approves every handoff.
After 50 correct escalation decisions, move to spot-check. After 500, alert-only: human gets pinged if AI seems unsure.
RESEARCH COLLABORATION
Research agent finds sources. Fact-checker agent validates claims. They disagree? Human arbitrates.
Full mediation on conflicts. If they agree 95% of the time, human only reviews the disagreements. Perfect track record? Just monitor.
TRADING SYSTEM COORDINATION
Market analysis agent suggests trades. Risk management agent checks exposure. Human approves initially.
After 200 trades with no violations, spot-check mode. After 1000? Alert-only on unusual positions. Never fully autonomousβsome things need human override available.
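One way to encode "never fully autonomous" is a ceiling the promotion logic can't cross. A sketch, assuming the mediator accepts a maxTrustLevel option; typicalPositionSize is a hypothetical helper:

// Hypothetical: cap trust so promotion can never pass alert-only
const tradingMediator = new AgentPairMediator({
  agentA: "market-analysis",
  agentB: "risk-management",
  trustLevel: "full-mediation",
  maxTrustLevel: "alert-only", // human override must stay available
  rules: {
    // Alert on positions well outside the instrument's usual size
    alertHuman: (msg) => msg.positionSize > 3 * typicalPositionSize(msg.symbol)
  }
});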
IMPLEMENTATION NOTES
- Durable Object per pair: Each agent pair gets its own DO instance, keyed by agentA:agentB. Holds conversation history and trust state.
- Trust transitions: Don't auto-promote too fast. You need a statistically significant sample size (e.g., 50+ exchanges with 95%+ success).
- Demotion is instant: One failure at autonomous level? Straight to alert-only. Two failures? Full mediation. Trust is earned slowly, lost quickly (see the sketch after this list).
- Human bandwidth: One person can supervise 20-50 agent pairs at full mediation, hundreds at spot-check, thousands at alert-only.
- Metrics matter: Track success rate, human intervention rate, message volume, escalation accuracy. These drive trust transitions.
- Feedback loops: When human edits a message, that's training data. When human rejects, that's critical feedback. Use it.
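A sketch of the "earned slowly, lost quickly" rule from the notes above, using this section's illustrative thresholds; recentFailures and sampleSize are assumed fields on your metrics object:

// Promotion needs a large clean sample; demotion triggers on a single failure.
function nextTrustLevel(current: TrustLevel, m: PairMetrics): TrustLevel {
  const order: TrustLevel[] = [
    "full-mediation", "spot-check", "alert-only", "autonomous"
  ];
  const i = order.indexOf(current);

  if (m.recentFailures >= 2) return "full-mediation";           // repeated failures: all the way down
  if (m.recentFailures === 1) return order[Math.max(0, i - 1)]; // one failure: drop a level instantly
  if (m.successRate >= 0.95 && m.sampleSize >= 50) {
    return order[Math.min(order.length - 1, i + 1)];            // promote one level on a significant sample
  }
  return current;
}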
WHAT YOU'RE ACTUALLY BUILDING
This isn't just a safety mechanism. You're building a training system.
Every human review is a label. Every approval is "this message was good." Every edit is "this is how you should have said it." Every rejection is "don't do that."
After a few hundred reviews, you've got a dataset. Train a classifier: "Would human approve this message?" Now your spot-check sampling isn't random; it targets messages the classifier is uncertain about.
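A sketch of that targeted sampling, assuming a hypothetical approvalClassifier trained on your review labels:

// Replace uniform random sampling with uncertainty-targeted sampling.
// Confident predictions flow through; uncertain messages go to review.
async function shouldSpotCheck(msg: Message): Promise<boolean> {
  const p = await approvalClassifier.predict(msg); // P(human approves), hypothetical model
  const uncertainty = 1 - Math.abs(p - 0.5) * 2;   // 1.0 at p = 0.5, 0.0 at p = 0 or 1
  // Keep a small random floor so the classifier itself gets checked as traffic drifts
  return uncertainty > 0.6 || Math.random() < 0.02;
}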
After a few thousand reviews, you can train a message rewriter: "Given this message that human rejected, predict what human would edit it to." Now you're auto-correcting before messages even reach the human.
The endgame: agents learn your review patterns. They internalize your communication rules. They become the agents you wanted from day 1, except they got there through supervised evolution instead of prompt-engineering guesswork.
THE BIGGER PICTURE
Multi-agent systems fail because we expect agents to work together perfectly from the start. That's not how humans work. That's not how teams work.
Real teams start with high communication overhead. Lots of meetings, lots of check-ins. As they build rapport and shared context, they need less coordination. Eventually, they're just async shipping.
Same deal here. Start with the mediator doing heavy lifting. Fade as agents prove they can handle it. Jump back in when they screw up.
The architecture supports this. The trust levels are explicit. The transitions are data-driven. The human oversight scales down, not up.
This is how you actually deploy agent-to-agent communication in production. Not by hoping they behave. By teaching them, monitoring them, and earning trust over time.
WHEN NOT TO USE THIS
- Agents never need oversight: If your agents are purely deterministic or doing simple pass-through, you don't need mediation.
- No human available: Pattern requires human in loop initially. If no human capacity, use deterministic protocols instead.
- Real-time latency critical: Full mediation adds human latency. If sub-second response required, start at alert-only or don't use pattern.
- Single agent orchestrator: If you have one agent calling multiple subagents (primary/subagent pattern), you don't need peer mediation.