
HOW TO RALPH FOREVER

Run an AI coding agent in a loop until it's done.

Stop trying to be the harness. Install the harness. Orchestrate the harness.

Ralphing turns a repo into a self-shipping worker: it takes one tiny story, makes one safe change, proves it with gates, commits, repeats.

The model can be non-deterministic. The funnel must be deterministic.

THE TWO RULES

1) Determinism

Not "the model is deterministic." It isn't.

Determinism means:

  • the same gates run every time
  • the same scope limits apply every time
  • the same "green means ship" rule holds every time

If it's correct, it passes. If it's wrong, it can't graduate.

2) Short Memory

Agents don't get better by carrying a novel.

They get better by doing small iterations and leaving one tiny gem behind:

  • one invariant that prevents a repeat mistake
  • one line in progress.txt or AGENTS.md

No emotional backlog. No encyclopedias. Just strong notes.
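A minimal sketch of the "one gem per miss" habit, assuming progress.txt is plain text with one note per line. The names `add_gem` and `record_gem` and the 50-note cap are illustrative assumptions, not part of any Ralph tooling.

```python
from pathlib import Path

MAX_NOTES = 50  # assumed cap so the file stays short enough to reread every run


def add_gem(notes, gem, max_notes=MAX_NOTES):
    """Return notes plus one single-line gem, skipping duplicates and dropping the oldest overflow."""
    note = " ".join(gem.split())  # collapse newlines and extra spaces into one line
    if not note:
        raise ValueError("a gem needs some text")
    updated = list(notes)
    if note not in updated:
        updated.append(note)
    return updated[-max_notes:]


def record_gem(gem, path=Path("scripts/ralph/progress.txt")):
    """Read the existing notes, add the gem, and write the file back."""
    notes = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    path.write_text("\n".join(add_gem(notes, gem)) + "\n", encoding="utf-8")
```

The cap is the point: when the file is bounded, every run can afford to read all of it.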

THE PROBLEM (WHY PROMPTING DIES)

AI agents fail like real engineers: they pick tasks too big, get lost, forget what they were doing, and stop after one attempt.

One-shot prompting is a trap. Context limits are real. The vibe wears off. You stop.

So instead of trying harder to prompt… change the architecture.

THE INVERSION

Make the repo the brain. Repo = state. Commits = memory. Actions = disposable compute.

Every run starts fresh from a clean checkout. The agent rehydrates context by reading a few canonical files. It does one small thing. It proves it with gates. It commits. That commit triggers the next run.

Loop.

WHAT "RALPH" IS

A to-do list, a contract, and a runner. Everything else is optional frosting.

The Loop (Mental Model)

Push → action runs → agent reads canonical files → picks next todo story → implements → runs gates (typecheck/tests/build) → guard checks scope → commit + push → push triggers next run.
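The "picks next todo story" step can be sketched like this. The prd.json schema is not specified here, so the shape below (a `stories` list whose items carry `id`, `status`, and `priority`) is an assumption, as are the function names.

```python
import json


def pick_next_story(prd):
    """Return the "todo" story with the lowest priority number, or None if the backlog is empty."""
    todo = [s for s in prd.get("stories", []) if s.get("status") == "todo"]
    if not todo:
        return None  # nothing left to do: the loop can stop
    # min() returns the first minimal item, so ties keep file order.
    return min(todo, key=lambda s: s.get("priority", 0))


def load_prd(path="scripts/ralph/prd.json"):
    """Load the backlog from disk (assumed schema, see above)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Same backlog in, same story out. The model never gets to choose what it feels like working on.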

"Deterministically bad" means it can be clumsy, but it keeps trying.

THE MINIMUM INTERFACE

These files are your steering wheel.

  • AGENTS.md: contract + kill switch (PAUSED)
  • scripts/ralph/prd.json: backlog (tiny stories)
  • scripts/ralph/progress.txt: compressed learnings (one gem per miss)
  • scripts/ralph/constraints.json: scope limits (diff budgets, allowed paths, deps policy)
  • scripts/ralph/guard.sh: enforces constraints using git diff
  • .github/workflows/ralph.yml: the loop runner

(Your tool can add model config like .opencode/opencode.json.)
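A hedged sketch of the kill switch, assuming AGENTS.md carries the flag on its own line as `PAUSED: true` or `PAUSED: false`. The parsing rule and the name `is_paused` are assumptions. The design choice is to fail safe: no readable flag means paused.

```python
import re

# Matches a line consisting only of "PAUSED: true" or "PAUSED: false" (case-insensitive).
PAUSED_LINE = re.compile(r"^\s*PAUSED:\s*(true|false)\s*$", re.IGNORECASE | re.MULTILINE)


def is_paused(agents_md):
    """Return True unless the contract explicitly says PAUSED: false."""
    match = PAUSED_LINE.search(agents_md)
    if match is None:
        return True  # missing or malformed flag: do not run
    return match.group(1).lower() == "true"
```

The runner would call this before anything else and exit cleanly when it returns True.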

THE HARNESS (GUARDRAILS + GATES + PROMOTION LADDER)

Here's the missing piece most people don't name:

Guardrails contain. Gates prove. Promotion graduates.

Guardrails (containment)

Start with:

  • allowed paths (where it's allowed to edit)
  • diff budget (max files/lines changed)
  • dependency lock (no lockfile changes by default)
  • secrets policy (never print env, never touch auth files)

Gates (truth)

At minimum:

  • typecheck
  • tests (even a tiny set)
  • build
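One way to keep the gates deterministic is a fixed list, run in a fixed order, stopping at the first red. This is a sketch: the npm commands are placeholders for whatever your project actually uses, and `run_gates` is a hypothetical helper.

```python
import subprocess

# Assumed commands; swap in your project's real typecheck, test, and build steps.
GATES = [
    ("typecheck", ["npm", "run", "typecheck"]),
    ("tests", ["npm", "test"]),
    ("build", ["npm", "run", "build"]),
]


def run_command(cmd):
    """Run one gate command; green means exit code 0."""
    return subprocess.run(list(cmd)).returncode == 0


def run_gates(gates=GATES, runner=run_command):
    """Run every gate in order; return (green, name of the first failing gate or None)."""
    for name, cmd in gates:
        if not runner(cmd):
            return False, name
    return True, None
```

The `runner` parameter exists so the gate logic can be exercised without shelling out.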

Promotion Ladder (reality)

This is how you stop breaking staging/prod:

  • Local: typecheck + tests + build
  • Preview: deploy an ephemeral URL + smoke test the URL
  • Staging: stricter smoke + a couple real flows
  • Prod: gradual rollout + rollback triggers

Same paradigm. Higher rung = stricter proof.
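The ladder can be modeled as an ordered list of rungs, each with its own checks. The sketch below is illustrative only: `climb` is a hypothetical name, and the checks are plain callables standing in for real deploys and smoke tests.

```python
def climb(ladder):
    """Return the rungs passed, in order, stopping at the first rung with a failing check."""
    passed = []
    for rung, checks in ladder:
        # all() short-circuits, so checks after the first failure are not run.
        if not all(check() for check in checks):
            break
        passed.append(rung)
    return passed
```

A change that fails its Preview smoke test never reaches the Staging checks, let alone Prod.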

FAILURE THAT KEEPS TRYING

If a run fails and doesn't push, nothing triggers the next run.

So failure must become:

  1. State (failure.json)
  2. Event (either cron retry, or commit the failure record to retrigger)

Add a retry budget so you don't burn compute forever.
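A minimal sketch of failure-as-state with a retry budget. The record fields (`story`, `failedGate`, `attempts`, `retry`), the budget of 3, and the file location are all assumptions. The text above only names failure.json.

```python
import json
from pathlib import Path

MAX_ATTEMPTS = 3  # assumed retry budget


def record_failure(previous, story_id, failed_gate, max_attempts=MAX_ATTEMPTS):
    """Build the next failure record; attempts reset when the story changes."""
    same_story = previous is not None and previous.get("story") == story_id
    attempts = (previous.get("attempts", 0) if same_story else 0) + 1
    return {
        "story": story_id,
        "failedGate": failed_gate,
        "attempts": attempts,
        "retry": attempts < max_attempts,  # False once the budget is spent
    }


def write_failure(record, path=Path("scripts/ralph/failure.json")):
    """Persist the record so the next run (cron or retrigger commit) can read it."""
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
```

When `retry` comes back False, the runner stops retriggering and leaves the story for a human.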

SPEED COMES FROM CONSTRAINTS

If you want lots of iterations, you need tight loops.

Small stories + strict diff budgets turn "agent chaos" into "agent throughput."

Add constraints that force small work:

{
  "iteration": { "maxFilesChanged": 12, "maxLinesChanged": 400 },
  "allowPaths": ["src/", "app/", "content/", "scripts/"],
  "denyPaths": [".github/workflows/"],
  "dependencies": { "allowDependencyChanges": false }
}

Then enforce with a guard script that reads git diff. If it violates constraints: hard fail before commit.
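Here is the guard logic sketched in Python rather than shell, so the checks are easy to read and test. A real guard.sh could apply the same rules. Assumptions: the runner has staged everything with `git add -A`, the constraints file has the shape of the JSON in this section, and the `LOCKFILES` set is an illustrative guess at what counts as a dependency change. Renamed files are not handled specially.

```python
import json
import subprocess

# Assumed list of files whose modification counts as a dependency change.
LOCKFILES = {"package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml"}


def parse_numstat(numstat):
    """Turn `git diff --numstat` output into (path, lines changed) pairs."""
    changes = []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files report "-" for both counts; count them as zero lines.
        lines = sum(int(n) for n in (added, deleted) if n != "-")
        changes.append((path, lines))
    return changes


def check_constraints(changes, constraints):
    """Return a list of violations; an empty list means the diff may be committed."""
    limits = constraints["iteration"]
    violations = []
    if len(changes) > limits["maxFilesChanged"]:
        violations.append(f"too many files: {len(changes)}")
    total = sum(lines for _, lines in changes)
    if total > limits["maxLinesChanged"]:
        violations.append(f"too many lines: {total}")
    allow_deps = constraints["dependencies"]["allowDependencyChanges"]
    for path, _ in changes:
        if any(path.startswith(p) for p in constraints["denyPaths"]):
            violations.append(f"denied path: {path}")
        elif not any(path.startswith(p) for p in constraints["allowPaths"]):
            violations.append(f"outside allowed paths: {path}")
        if not allow_deps and path.rsplit("/", 1)[-1] in LOCKFILES:
            violations.append(f"dependency change: {path}")
    return violations


def main():
    """Check the staged diff against constraints.json; nonzero exit blocks the commit."""
    with open("scripts/ralph/constraints.json", encoding="utf-8") as fh:
        constraints = json.load(fh)
    numstat = subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        capture_output=True, text=True, check=True,
    ).stdout
    violations = check_constraints(parse_numstat(numstat), constraints)
    for violation in violations:
        print(violation)
    return 1 if violations else 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Deny rules are checked before allow rules, so a path under `.github/workflows/` fails even if a broader allow entry would have matched it.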

HOW TO START RIGHT NOW

  1. Install the file interface
  2. Default safe: PAUSED: true
  3. First run proves plumbing, not value
  4. Add one tiny story
  5. Make gates real
  6. Turn it on: PAUSED: false

Ralph doesn't need motivation. It needs a backlog and a box.

Give it both.

SCALING MODES

Who merges Ralph's PRs? By default: you do. And that's fine.

There are two operating modes:

SINGLE-LANE (recommended): only one open Ralph PR at a time. Simple mental model. Less entropy. You merge when ready.

MULTI-LANE (PR factory): multiple PRs in parallel. Requires an arbiter (human or agent). Otherwise you get PR pileup + conflicts + duplicated work.

Don't do multi-lane until your gates are real.

If you're trying to make money right now, start single-lane. Ship. Learn. Iterate.

THE NEXT PRIMITIVE: FLEET RALPHING

Once every repo can Ralph, you can have a repo that Ralphs your repos.

That "Conductor" loop can:

  • install/upgrade the loop everywhere
  • flip a global ON/OFF
  • broadcast changes across your entire codebase fleet

A Ralph ships code. A Conductor ships capability.

Autopilot repos → autopilot fleet.

Ralphing: AI agents that keep trying until they succeed.

You orchestrate harnesses now.

Learn more about the Ralph Wiggum technique
