HOW TO RALPH FOREVER

Run an AI coding agent in a loop until it's done.

Stop trying to be the harness. Install the harness. Orchestrate the harness.

Ralphing turns a repo into a self-shipping worker: it takes one tiny story, makes one safe change, proves it with gates, commits, repeats.

The model can be non-deterministic. The funnel must be deterministic.

        \
         \    __
          \  /  \
           \/    \
            |    |
            |    |
           /|    |\
          / |    | \
         /  |____|  \
        /   /    \   \
       /   /      \   \
      /___/        \___\
           🦄 RALPH

THE TWO RULES

1) Determinism

Not "the model is deterministic." It isn't.

Determinism means:

  • the same gates run every time
  • the same scope limits apply every time
  • the same "green means ship" rule holds every time

If it's correct, it passes. If it's wrong, it can't graduate.

2) Short Memory

Agents don't get better by carrying a novel.

They get better by doing small iterations and leaving one tiny gem behind:

  • one invariant that prevents a repeat mistake
  • one line in progress.txt or AGENTS.md

No emotional backlog. No encyclopedias. Just strong notes.
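
The "one tiny gem" habit can be sketched as a helper that appends exactly one line per miss. This is a minimal sketch, not a prescribed implementation: the name record_gem, the 200-character cap, and the date prefix are assumptions made for illustration. Only the idea of a one-line note in progress.txt comes from the text above.

```python
from datetime import date
from pathlib import Path
from typing import Optional

MAX_GEM_CHARS = 200  # assumed cap; forces notes to stay short


def record_gem(progress_file: Path, gem: str, today: Optional[date] = None) -> str:
    """Append one single-line learning to progress.txt and return the line written."""
    text = " ".join(gem.split())  # collapse newlines and extra spaces into one line
    if not text:
        raise ValueError("a gem must say something")
    if len(text) > MAX_GEM_CHARS:
        raise ValueError(f"gem is {len(text)} chars; limit is {MAX_GEM_CHARS}")
    stamp = (today or date.today()).isoformat()
    line = f"{stamp} {text}"
    progress_file.parent.mkdir(parents=True, exist_ok=True)
    with progress_file.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```

Rejecting long notes outright, rather than truncating them, keeps the "no encyclopedias" rule enforceable by code instead of by willpower.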

THE PROBLEM (WHY PROMPTING DIES)

AI agents fail like real engineers: they pick tasks too big, get lost, forget what they were doing, and stop after one attempt.

One-shot prompting is a trap. Context limits are real. The vibe wears off. You stop.

So instead of trying harder to prompt… change the architecture.

THE INVERSION

Make the repo the brain. Repo = state. Commits = memory. Actions = disposable compute.

Every run starts fresh from a clean checkout. The agent rehydrates context by reading a few canonical files. It does one small thing. It proves it with gates. It commits. That commit triggers the next run.

Loop.
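
One pass of that loop can be sketched with the steps injected as plain callables, so the control flow stays fixed while the agent varies. This is a sketch under assumptions: the Steps container, the function name run_iteration, and the returned status strings are hypothetical, and rehydrating context from the canonical files is assumed to happen inside pick_story.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Steps:
    """Injected steps; only `implement` is non-deterministic."""
    pick_story: Callable[[], Optional[str]]  # story id, or None when the backlog is empty
    implement: Callable[[str], None]         # the agent makes one small change
    run_gates: Callable[[], bool]            # proof: typecheck, tests, build
    guard: Callable[[], bool]                # scope limits checked against the diff
    commit_and_push: Callable[[str], None]   # the push is what triggers the next run


def run_iteration(steps: Steps) -> str:
    """Run one pass of the loop and report what happened."""
    story = steps.pick_story()
    if story is None:
        return "idle"          # nothing left to do
    steps.implement(story)
    if not steps.run_gates():
        return "gates_failed"  # wrong work cannot graduate
    if not steps.guard():
        return "guard_failed"  # out-of-scope work cannot graduate
    steps.commit_and_push(story)
    return "shipped"
```

The order of checks never changes between runs. That fixed order is the deterministic funnel; the model only fills in the implement step.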

WHAT "RALPH" IS

A to-do list, a contract, and a runner. Everything else is optional frosting.

The Loop (Mental Model)

    ┌─────────────┐
    │   PUSH      │
    └──────┬──────┘
           │
           ▼
    ┌─────────────┐
    │ Action Runs │
    └──────┬──────┘
           │
           ▼
    ┌─────────────────────┐
    │ Read Canonical Files│
    └──────┬──────────────┘
           │
           ▼
    ┌─────────────────┐
    │ Pick Next Story │
    └──────┬──────────┘
           │
           ▼
    ┌─────────────┐
    │  Implement  │
    └──────┬──────┘
           │
           ▼
    ┌──────────────────────┐
    │ Run Gates (tests/etc)│
    └──────┬───────────────┘
           │
           ▼
    ┌─────────────────┐
    │  Guard Checks   │
    └──────┬──────────┘
           │
           ▼
    ┌─────────────┐
    │ Commit+Push │
    └──────┬──────┘
           │
           └──────────┐
                      │
                      ▼
              (loop repeats)

"Deterministically bad" means it can be clumsy, but it keeps trying.

THE MINIMUM INTERFACE

These files are your steering wheel.

  • AGENTS.md: contract + kill switch (PAUSED)
  • scripts/ralph/prd.json: backlog (tiny stories)
  • scripts/ralph/progress.txt: compressed learnings (one gem per miss)
  • scripts/ralph/constraints.json: scope limits (diff budgets, allowed paths, deps policy)
  • scripts/ralph/guard.sh: enforces constraints using git diff
  • .github/workflows/ralph.yml: the loop runner

(Your tool can add model config like .opencode/opencode.json.)
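
Reading two of these files might look like the sketch below. The file layouts are assumptions, since the text does not define them: AGENTS.md is assumed to hold a line such as "PAUSED: true", and prd.json is assumed to hold a "stories" list whose items carry hypothetical "id", "priority", and "done" fields.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Assumed format: a line of its own reading "PAUSED: true" or "PAUSED: false".
PAUSED_RE = re.compile(r"^[ \t]*PAUSED:[ \t]*(true|false)[ \t]*$", re.IGNORECASE | re.MULTILINE)


def is_paused(agents_md: Path) -> bool:
    """Return True unless AGENTS.md explicitly says PAUSED: false (default safe)."""
    if not agents_md.exists():
        return True
    match = PAUSED_RE.search(agents_md.read_text(encoding="utf-8"))
    return match is None or match.group(1).lower() == "true"


def pick_next_story(prd_json: Path) -> Optional[dict]:
    """Return the unfinished story with the lowest priority number, or None."""
    data = json.loads(prd_json.read_text(encoding="utf-8"))
    todo = [s for s in data.get("stories", []) if not s.get("done", False)]
    return min(todo, key=lambda s: s.get("priority", 0), default=None)
```

Treating a missing file or a missing PAUSED line as "paused" means the kill switch fails closed: the loop only runs when someone has explicitly turned it on.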

THE HARNESS (GUARDRAILS + GATES + PROMOTION LADDER)

Here's the missing piece most people don't name:

Guardrails contain. Gates prove. Promotion graduates.

Guardrails (containment)

Start with:

  • allowed paths (where it's allowed to edit)
  • diff budget (max files/lines changed)
  • dependency lock (no lockfile changes by default)
  • secrets policy (never print env, never touch auth files)

Gates (truth)

At minimum:

  • typecheck
  • tests (even a tiny set)
  • build
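
A gate runner can be as small as an ordered list of commands that stops at the first failure. The sketch below is illustrative: the commands in GATES assume a Node/TypeScript repo and are not taken from the text, so swap in whatever your project uses. The name run_gates is hypothetical.

```python
import subprocess
from typing import List, Sequence, Tuple

# Assumed commands for a Node/TypeScript repo; replace with your own.
GATES: List[Tuple[str, List[str]]] = [
    ("typecheck", ["npx", "tsc", "--noEmit"]),
    ("tests", ["npm", "test"]),
    ("build", ["npm", "run", "build"]),
]


def run_gates(gates: Sequence[Tuple[str, List[str]]]) -> Tuple[bool, str]:
    """Run gates in order; return (True, "") or (False, name of first failing gate)."""
    for name, cmd in gates:
        try:
            result = subprocess.run(cmd, check=False)
        except FileNotFoundError:
            return False, name  # a missing tool counts as a failed gate
        if result.returncode != 0:
            return False, name
    return True, ""
```

Calling run_gates(GATES) on every run, with the same list in the same order, is what "the same gates run every time" looks like in practice.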

Promotion Ladder (reality)

This is how you stop breaking staging/prod:

        ┌─────────────┐
        │    PROD     │ ← canary + rollback
        │  (strictest)│
        └──────┬──────┘
               │
        ┌──────▼──────┐
        │  STAGING    │ ← smoke + real flows
        └──────┬──────┘
               │
        ┌──────▼──────┐
        │  PREVIEW    │ ← deploy + smoke test
        └──────┬──────┘
               │
        ┌──────▼──────┐
        │   LOCAL     │ ← typecheck + tests + build
        └─────────────┘

Same paradigm. Higher rung = stricter proof.
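
The ladder can be modeled as an ordered list of rungs, where a change reaches a rung only if that rung and every rung below it have all their proofs. This is a sketch: the rung names and check names are copied from the diagram, while the LADDER structure and the function highest_rung are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

# Rungs in promotion order, lowest first, with the proofs named in the diagram.
LADDER: List[Tuple[str, Set[str]]] = [
    ("local", {"typecheck", "tests", "build"}),
    ("preview", {"deploy", "smoke test"}),
    ("staging", {"smoke", "real flows"}),
    ("prod", {"canary", "rollback"}),
]


def highest_rung(passed: Dict[str, Set[str]]) -> str:
    """Return the highest rung whose checks, and all checks below it, have passed."""
    reached = "none"
    for rung, required in LADDER:
        if not required <= passed.get(rung, set()):
            break  # a missing proof stops promotion here
        reached = rung
    return reached
```

Because the walk stops at the first gap, passing staging checks cannot compensate for a skipped preview rung.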

FAILURE THAT KEEPS TRYING

If a run fails and doesn't push, nothing triggers the next run.

So failure must become:

  1. State (failure.json)
  2. Event (either cron retry, or commit the failure record to retrigger)

Add a retry budget so you don't burn compute forever.
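
Failure-as-state plus a retry budget might be sketched like this. The schema of failure.json is not given in the text, so the "story", "attempts", and "reason" fields are assumptions, as are the function names and the budget of 3.

```python
import json
from pathlib import Path

MAX_ATTEMPTS = 3  # assumed retry budget


def record_failure(failure_file: Path, story_id: str, reason: str) -> dict:
    """Write failure state to disk and return it, bumping the attempt count for the story."""
    attempts = 0
    if failure_file.exists():
        previous = json.loads(failure_file.read_text(encoding="utf-8"))
        if previous.get("story") == story_id:
            attempts = int(previous.get("attempts", 0))  # same story: keep counting
    state = {"story": story_id, "attempts": attempts + 1, "reason": reason}
    failure_file.write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
    return state


def should_retry(state: dict, max_attempts: int = MAX_ATTEMPTS) -> bool:
    """True while the story still has retry budget left."""
    return state["attempts"] < max_attempts
```

Committing the updated failure.json is one way to produce the retrigger event. When should_retry returns False, the run stops and waits for a human.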

SPEED COMES FROM CONSTRAINTS

If you want lots of iterations, you need tight loops.

Small stories + strict diff budgets turn "agent chaos" into "agent throughput."

Add constraints that force small work:

{
  "iteration": { "maxFilesChanged": 12, "maxLinesChanged": 400 },
  "allowPaths": ["src/", "app/", "content/", "scripts/"],
  "denyPaths": [".github/workflows/"],
  "dependencies": { "allowDependencyChanges": false }
}

Then enforce with a guard script that reads git diff. If it violates constraints: hard fail before commit.
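
The guard logic can be sketched as a pure function over the output of git diff --numstat, which prints one "added, deleted, path" row per file separated by tabs. This is a Python sketch of the checks a guard.sh might perform, not the script itself. The constraint keys match the JSON above; the function names and the LOCKFILES set are assumptions, and renamed files (which numstat prints as "old => new") are not handled.

```python
import subprocess
from typing import List

LOCKFILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}  # assumed names


def check_diff(numstat: str, constraints: dict) -> List[str]:
    """Return violations for `git diff --numstat` output; an empty list means pass."""
    limits = constraints["iteration"]
    allow = tuple(constraints["allowPaths"])
    deny = tuple(constraints["denyPaths"])
    deps_ok = constraints["dependencies"]["allowDependencyChanges"]
    violations: List[str] = []
    files, lines = 0, 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, path = row.split("\t", 2)
        files += 1
        # Binary files report "-" for both counts; count them as zero lines.
        lines += sum(int(n) for n in (added, deleted) if n != "-")
        if path.startswith(deny):
            violations.append(f"denied path: {path}")
        elif not path.startswith(allow):
            violations.append(f"outside allowed paths: {path}")
        if not deps_ok and path.rsplit("/", 1)[-1] in LOCKFILES:
            violations.append(f"dependency change: {path}")
    if files > limits["maxFilesChanged"]:
        violations.append(f"too many files: {files}")
    if lines > limits["maxLinesChanged"]:
        violations.append(f"too many lines: {lines}")
    return violations


def staged_numstat() -> str:
    """Read the staged diff from git (must run inside a repository)."""
    return subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        check=True, capture_output=True, text=True,
    ).stdout
```

In a runner, a non-empty result from check_diff(staged_numstat(), constraints) would exit non-zero before the commit step, which is the "hard fail before commit" described above.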

HOW TO START RIGHT NOW

  1. Install the file interface
  2. Default safe: PAUSED: true
  3. First run proves plumbing, not value
  4. Add one tiny story
  5. Make gates real
  6. Turn it on: PAUSED: false

Ralph doesn't need motivation. It needs a backlog and a box.

Give it both.

SCALING MODES

By default, you merge the PRs. And that's fine.

There are two operating modes:

SINGLE-LANE (recommended): only one open Ralph PR at a time. Simple mental model. Less entropy. You merge when ready.

MULTI-LANE (PR factory): multiple PRs in parallel. Requires an arbiter (human or agent). Otherwise you get PR pileup + conflicts + duplicated work.

Don't do multi-lane until your gates are real.
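
The single-lane rule reduces to one check before opening a PR. This is a sketch under assumptions: Ralph PRs are assumed to live on branches that start with "ralph/", the list of open PR branches is assumed to come from your Git host, and the function name may_open_pr is hypothetical.

```python
from typing import Iterable

RALPH_PREFIX = "ralph/"  # assumed branch naming convention


def may_open_pr(open_pr_branches: Iterable[str], max_lanes: int = 1) -> bool:
    """Allow a new Ralph PR only while fewer than max_lanes Ralph PRs are open."""
    open_ralph = sum(1 for branch in open_pr_branches if branch.startswith(RALPH_PREFIX))
    return open_ralph < max_lanes
```

With max_lanes left at 1 this is single-lane. Raising it is the multi-lane switch, which only makes sense once an arbiter exists.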

If you're trying to make money right now, start single-lane. Ship. Learn. Iterate.

THE NEXT PRIMITIVE: FLEET RALPHING

Once every repo can Ralph, you can have a repo that Ralphs your repos.

That "Conductor" loop can:

  • install/upgrade the loop everywhere
  • flip a global ON/OFF
  • broadcast changes across your entire codebase fleet
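
The global ON/OFF flip could be sketched as a function that rewrites the kill switch in every local checkout. This is an illustration, not a described implementation: it assumes each repo keeps a "PAUSED: true" or "PAUSED: false" line in AGENTS.md, that the Conductor has the repos checked out locally, and the name set_paused is hypothetical. Committing and pushing the change is left out.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumed format: a line of its own reading "PAUSED: true" or "PAUSED: false".
PAUSED_LINE = re.compile(r"^PAUSED:[ \t]*(true|false)[ \t]*$", re.IGNORECASE | re.MULTILINE)


def set_paused(repo_dirs: Iterable[Path], paused: bool) -> List[Path]:
    """Set the PAUSED line in each repo's AGENTS.md; return the files that changed."""
    value = "true" if paused else "false"
    changed: List[Path] = []
    for repo in repo_dirs:
        agents = repo / "AGENTS.md"
        if not agents.exists():
            continue  # this repo has not installed the loop yet
        old = agents.read_text(encoding="utf-8")
        new, count = PAUSED_LINE.subn(f"PAUSED: {value}", old, count=1)
        if count == 0:
            new = old.rstrip("\n") + f"\nPAUSED: {value}\n"  # no switch yet: add one
        if new != old:
            agents.write_text(new, encoding="utf-8")
            changed.append(agents)
    return changed
```

Returning only the files that actually changed lets the Conductor commit to the repos that need it and leave the rest untouched.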

A Ralph ships code. A Conductor ships capability.

Autopilot repos → autopilot fleet.

Ralphing: AI agents that keep trying until they succeed.

You orchestrate harnesses now.
