HOW TO RALPH FOREVER

Run an AI coding agent in a loop until it's done.

Stop trying to be the harness. Install the harness. Orchestrate the harness.

Ralphing turns a repo into a self-shipping worker: it takes one tiny story, makes one safe change, proves it with gates, commits, repeats.

The model can be non-deterministic. The funnel must be deterministic.

THE TWO RULES

1) Determinism

Not "the model is deterministic." It isn't.

Determinism means:

the same gates run every time
the same scope limits apply every time
the same "green means ship" rule holds every time

If it's correct, it passes. If it's wrong, it can't graduate.

2) Short Memory

Agents don't get better by carrying a novel.

They get better by doing small iterations and leaving one tiny gem behind:

one invariant that prevents a repeat mistake
one line in progress.txt or AGENTS.md

No emotional backlog. No encyclopedias. Just strong notes.
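
A minimal sketch of the "one gem" rule, assuming progress.txt is a plain list of one-line notes. The function name add_gem and the 160-character cap are hypothetical; the post only says the note should be tiny.

```python
def add_gem(progress, gem, max_len=160):
    """Return the progress text with one new single-line note appended.

    progress: current contents of progress.txt
    gem: the new learning; whitespace and newlines are collapsed to one line
    max_len: hypothetical cap that keeps notes from growing into essays
    """
    note = " ".join(gem.split())
    if not note:
        raise ValueError("gem is empty")
    if len(note) > max_len:
        raise ValueError(f"gem is {len(note)} chars; limit is {max_len}")

    lines = [line for line in progress.splitlines() if line.strip()]
    if note in lines:
        # Already recorded: do not repeat the same lesson.
        return progress
    return "\n".join(lines + [note]) + "\n"
```

Rejecting long notes instead of truncating them forces the agent to compress the lesson itself.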

THE PROBLEM (WHY PROMPTING DIES)

AI agents fail like real engineers: they pick tasks too big, get lost, forget what they were doing, and stop after one attempt.

One-shot prompting is a trap. Context limits are real. The vibe wears off. You stop.

So instead of trying harder to prompt… change the architecture.

THE INVERSION

Make the repo the brain. Repo = state. Commits = memory. Actions = disposable compute.

Every run starts fresh from a clean checkout. The agent rehydrates context by reading a few canonical files. It does one small thing. It proves it with gates. It commits. That commit triggers the next run.

Loop.

WHAT "RALPH" IS

A to-do list, a contract, and a runner. Everything else is optional frosting.

The Loop (Mental Model)

Push → action runs → agent reads canonical files → picks next todo story → implements → runs gates (typecheck/tests/build) → guard checks scope → commit + push → push triggers next run.
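
The order of that loop can be sketched as one function. This is an illustration, not a real runner: every step is passed in as a callable, and the names (run_iteration, pick_story, and so on) are hypothetical.

```python
from typing import Callable, Optional


def run_iteration(
    pick_story: Callable[[], Optional[str]],
    implement: Callable[[str], None],
    gates_pass: Callable[[], bool],
    guard_pass: Callable[[], bool],
    commit_and_push: Callable[[str], None],
) -> str:
    """Run one pass of the loop and return a status string."""
    story = pick_story()
    if story is None:
        return "idle"  # backlog is empty, nothing to do
    implement(story)
    if not gates_pass():
        return "gates_failed"  # typecheck/tests/build did not prove the change
    if not guard_pass():
        return "guard_failed"  # change is outside the scope limits
    commit_and_push(story)  # the push is what triggers the next run
    return "shipped"
```

The point of the shape: commit_and_push is only reachable after both the gates and the guard say yes.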

"Deterministically bad" means it can be clumsy, but it keeps trying.

THE MINIMUM INTERFACE

These files are your steering wheel.

AGENTS.md — contract + kill switch (PAUSED)
scripts/ralph/prd.json — backlog (tiny stories)
scripts/ralph/progress.txt — compressed learnings (one gem per miss)
scripts/ralph/constraints.json — scope limits (diff budgets, allowed paths, deps policy)
scripts/ralph/guard.sh — enforces constraints using git diff
.github/workflows/ralph.yml — the loop runner

(Your tool can add model config like .opencode/opencode.json.)
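
A rough sketch of how a run might rehydrate from two of these files. The prd.json shape (a "stories" list with "id" and "status" fields) is an assumption, since the post does not define a schema. So is the choice to treat a missing PAUSED flag as paused.

```python
import json
import re


def is_paused(agents_md):
    """Return True if AGENTS.md says 'PAUSED: true'.

    Assumption: a missing flag counts as paused, so the default is safe.
    """
    match = re.search(
        r"^\s*PAUSED:\s*(true|false)\s*$",
        agents_md,
        re.IGNORECASE | re.MULTILINE,
    )
    if match is None:
        return True
    return match.group(1).lower() == "true"


def next_story(prd_json):
    """Return the first story whose status is 'todo', or None if none is left."""
    prd = json.loads(prd_json)
    for story in prd.get("stories", []):
        if story.get("status") == "todo":
            return story
    return None
```

A runner would read both files from the fresh checkout, exit early when is_paused is true, and otherwise hand next_story to the agent.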

THE HARNESS (GUARDRAILS + GATES + PROMOTION LADDER)

Here's the missing piece most people don't name:

Guardrails contain. Gates prove. Promotion graduates.

Guardrails (containment)

Start with:

allowed paths (where it's allowed to edit)
diff budget (max files/lines changed)
dependency lock (no lockfile changes by default)
secrets policy (never print env, never touch auth files)

Gates (truth)

At minimum:

typecheck
tests (even a tiny set)
build
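
"The same gates run every time" can be as simple as a fixed list of commands. A minimal sketch follows; run_gates is a hypothetical helper, and the actual commands depend on your stack.

```python
import subprocess
import sys


def run_gates(gates):
    """Run each gate command in order and stop at the first failure.

    gates: list of (name, argv) pairs, e.g. ("tests", ["npm", "test"]).
    Returns (True, None) if every gate exits 0, else (False, failed_gate_name).
    """
    for name, argv in gates:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"gate failed: {name}", file=sys.stderr)
            return False, name
    return True, None
```

Stopping at the first failure keeps runs cheap and gives the next run exactly one thing to fix.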

Promotion Ladder (reality)

This is how you stop breaking staging/prod:

Local: typecheck + tests + build
Preview: deploy an ephemeral URL + smoke test the URL
Staging: stricter smoke + a couple real flows
Prod: gradual rollout + rollback triggers

Same paradigm. Higher rung = stricter proof.
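
One way to picture the ladder in code: an ordered list of rungs, each with its own checks. This is a sketch; promote and the rung names are illustrative, and the checks here stand in for real smoke tests.

```python
from typing import Callable, List, Tuple

Check = Callable[[], bool]


def promote(ladder: List[Tuple[str, List[Check]]]) -> str:
    """Climb the rungs in order.

    Returns the name of the highest rung whose checks all passed,
    or "" if the first rung already fails. Rungs above a failure
    are never attempted.
    """
    reached = ""
    for rung, checks in ladder:
        if not all(check() for check in checks):
            break
        reached = rung
    return reached
```

Because the climb stops at the first failing rung, a change that fails staging never reaches the prod checks at all.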

FAILURE THAT KEEPS TRYING

If a run fails and doesn't push, nothing triggers the next run.

So failure must become:

State (failure.json)
Event (either a cron retry, or a commit of the failure record that retriggers the run)

Add a retry budget so you don't burn compute forever.
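
A sketch of failure-as-state with a retry budget. The failure.json field names (storyId, attempts, lastFailedGate, retry) and the default budget of 3 are assumptions for illustration.

```python
import json


def record_failure(state_json, story_id, gate, max_attempts=3):
    """Update the failure record and decide whether another retry is allowed.

    state_json: current contents of failure.json ("" if the file is missing)
    Returns the new record as a dict; record["retry"] is False once the
    budget for this story is spent.
    """
    state = json.loads(state_json) if state_json.strip() else {}
    if state.get("storyId") != story_id:
        # A different story failed last time: start a fresh count.
        state = {"storyId": story_id, "attempts": 0}
    state["attempts"] = state.get("attempts", 0) + 1
    state["lastFailedGate"] = gate
    state["retry"] = state["attempts"] < max_attempts
    return state
```

The runner would write the returned record back to failure.json and only retrigger (by cron or by commit) while retry is true.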

SPEED COMES FROM CONSTRAINTS

If you want lots of iterations, you need tight loops.

Small stories + strict diff budgets turn "agent chaos" into "agent throughput."

Add constraints that force small work:

{
  "iteration": { "maxFilesChanged": 12, "maxLinesChanged": 400 },
  "allowPaths": ["src/", "app/", "content/", "scripts/"],
  "denyPaths": [".github/workflows/"],
  "dependencies": { "allowDependencyChanges": false }
}

Then enforce with a guard script that reads git diff. If the diff violates the constraints: hard fail before commit.
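
The post's guard is a shell script (scripts/ralph/guard.sh). The same check is sketched here in Python so the logic is easy to test. It takes the text output of git diff --numstat --no-renames plus the constraints above. The list of dependency files is an assumption, since the post does not say which files the deps policy covers.

```python
# Assumption: the post does not say which files count as dependency changes.
DEPENDENCY_FILES = {"package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml"}


def check_diff(numstat, constraints):
    """Return a list of violation messages; an empty list means the diff is in scope.

    numstat: one "<added> TAB <deleted> TAB <path>" row per changed file.
    """
    limits = constraints["iteration"]
    allow = tuple(constraints.get("allowPaths", []))
    deny = tuple(constraints.get("denyPaths", []))
    deps_ok = constraints.get("dependencies", {}).get("allowDependencyChanges", False)

    violations = []
    files = 0
    lines = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, path = row.split("\t", 2)
        files += 1
        # Binary files report "-" for both counts; they add no lines.
        lines += sum(int(n) for n in (added, deleted) if n != "-")

        if path.rsplit("/", 1)[-1] in DEPENDENCY_FILES:
            # Dependency files are governed by the deps policy, not the path lists.
            if not deps_ok:
                violations.append(f"dependency change not allowed: {path}")
            continue
        if path.startswith(deny):
            violations.append(f"denied path: {path}")
        elif not path.startswith(allow):
            violations.append(f"outside allowed paths: {path}")

    if files > limits["maxFilesChanged"]:
        violations.append(f"too many files changed: {files} > {limits['maxFilesChanged']}")
    if lines > limits["maxLinesChanged"]:
        violations.append(f"too many lines changed: {lines} > {limits['maxLinesChanged']}")
    return violations
```

A wrapper would run git diff --cached --numstat --no-renames, pass the output in, and exit non-zero if the list is not empty, so the commit never happens.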

HOW TO START RIGHT NOW

Install the file interface
Default safe: PAUSED: true
First run proves plumbing, not value
Add one tiny story
Make gates real
Turn it on: PAUSED: false

Ralph doesn't need motivation. It needs a backlog and a box.

Give it both.

SCALING MODES

Who merges the PRs? By default: you do. And that's fine.

There are two operating modes:

SINGLE-LANE (recommended): only one open Ralph PR at a time. Simple mental model. Less entropy. You merge when ready.

MULTI-LANE (PR factory): multiple PRs in parallel. Requires an arbiter (human or agent). Otherwise you get PR pileup + conflicts + duplicated work.

Don't do multi-lane until your gates are real.

If you're trying to make money right now, start single-lane. Ship. Learn. Iterate.

THE NEXT PRIMITIVE: FLEET RALPHING

Once every repo can Ralph, you can have a repo that Ralphs your repos.

That "Conductor" loop can:

install/upgrade the loop everywhere
flip a global ON/OFF
broadcast changes across your entire codebase fleet

A Ralph ships code. A Conductor ships capability.

Autopilot repos → autopilot fleet.

Ralphing: AI agents that keep trying until they succeed.

You orchestrate harnesses now.

Learn more about the Ralph Wiggum technique.
