HOW TO RALPH FOREVER
Run an AI coding agent in a loop until it's done.
Stop trying to be the harness. Install the harness. Orchestrate the harness.
Ralphing turns a repo into a self-shipping worker: it takes one tiny story, makes one safe change, proves it with gates, commits, repeats.
The model can be non-deterministic. The funnel must be deterministic.
THE TWO RULES
1) Determinism
Not "the model is deterministic." It isn't.
Determinism means:
- the same gates run every time
- the same scope limits apply every time
- the same "green means ship" rule holds every time
If it's correct, it passes. If it's wrong, it can't graduate.
2) Short Memory
Agents don't get better by carrying a novel.
They get better by doing small iterations and leaving one tiny gem behind:
- one invariant that prevents a repeat mistake
- one line in progress.txt or AGENTS.md
No emotional backlog. No encyclopedias. Just strong notes.
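Here is a minimal sketch of leaving one gem behind. It assumes progress.txt is a plain append-only file with one learning per line. The helper name append_gem and the 200-character cap are illustrative assumptions, not part of any Ralph tooling. The cap only exists to keep notes from growing into encyclopedias.

```python
from pathlib import Path

def append_gem(progress_file: Path, gem: str, max_len: int = 200) -> None:
    """Append exactly one short learning line to progress.txt (hypothetical helper)."""
    line = " ".join(gem.split())  # collapse newlines and extra spaces into a single line
    if not line:
        raise ValueError("empty gem")
    if len(line) > max_len:
        raise ValueError(f"gem too long ({len(line)} > {max_len} chars); compress it")
    with progress_file.open("a", encoding="utf-8") as f:  # creates the file if missing
        f.write(line + "\n")
```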
THE PROBLEM (WHY PROMPTING DIES)
AI agents fail like real engineers: they pick tasks too big, get lost, forget what they were doing, and stop after one attempt.
One-shot prompting is a trap. Context limits are real. The vibe wears off. You stop.
So instead of trying harder to prompt… change the architecture.
THE INVERSION
Make the repo the brain. Repo = state. Commits = memory. Actions = disposable compute.
Every run starts fresh from a clean checkout. The agent rehydrates context by reading a few canonical files. It does one small thing. It proves it with gates. It commits. That commit triggers the next run.
Loop.
WHAT "RALPH" IS
A to-do list, a contract, and a runner. Everything else is optional frosting.
The Loop (Mental Model)
Push → action runs → agent reads canonical files → picks next todo story → implements → runs gates (typecheck/tests/build) → guard checks scope → commit + push → push triggers next run.
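The loop is a fixed sequence, and any failing step stops the run before anything is pushed. Below is a minimal Python sketch. It assumes each step is a hypothetical callable that returns True on success. The names run_iteration and Step are illustrative and do not come from a real runner API.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]

def run_iteration(steps: List[Step]) -> Tuple[bool, str]:
    """Run steps in their fixed order and stop at the first failure."""
    for name, action in steps:
        if not action():
            # Later steps never run, so a failed gate or guard can't reach commit + push.
            return False, name
    return True, "done"
```

In a real runner the steps would run in this order: read canonical files, pick the next todo story, implement, run gates, run the guard, then commit + push. Each step would wrap a real command. The push is what triggers the next run, so only a fully green iteration keeps the loop going.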
"Deterministically bad" means it can be clumsy, but it keeps trying.
THE MINIMUM INTERFACE
These files are your steering wheel.
- AGENTS.md → contract + kill switch (PAUSED)
- scripts/ralph/prd.json → backlog (tiny stories)
- scripts/ralph/progress.txt → compressed learnings (one gem per miss)
- scripts/ralph/constraints.json → scope limits (diff budgets, allowed paths, deps policy)
- scripts/ralph/guard.sh → enforces constraints using git diff
- .github/workflows/ralph.yml → the loop runner
(Your tool can add model config like .opencode/opencode.json.)
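Picking the next todo story only needs a predictable backlog shape. The prd.json schema used here is an assumption for illustration: a stories array whose items carry an id and a status, where "todo" means not started. Adapt the sketch to whatever shape your prd.json actually has.

```python
import json
from pathlib import Path
from typing import Optional

def next_story(prd_path: Path) -> Optional[dict]:
    """Return the first story with status "todo", or None when the backlog is empty."""
    prd = json.loads(prd_path.read_text(encoding="utf-8"))
    for story in prd.get("stories", []):  # assumed schema: {"stories": [{"id", "status"}, ...]}
        if story.get("status") == "todo":
            return story
    return None
```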
THE HARNESS (GUARDRAILS + GATES + PROMOTION LADDER)
Here's the missing piece most people don't name:
Guardrails contain. Gates prove. Promotion graduates.
Guardrails (containment)
Start with:
- allowed paths (where it's allowed to edit)
- diff budget (max files/lines changed)
- dependency lock (no lockfile changes by default)
- secrets policy (never print env, never touch auth files)
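Containment can be checked mechanically against the list of changed paths. The sketch below assumes path-prefix allow and deny lists, shaped like allowPaths and denyPaths in constraints.json. It also uses a hypothetical set of lockfile names for the dependency lock, and treats .env* files as a stand-in for secrets and auth files. "Never print env" can't be checked from paths and is not covered here.

```python
from typing import Iterable, List

# Hypothetical lockfile names for the dependency lock; extend for your ecosystem.
LOCKFILES = {"package-lock.json", "yarn.lock", "pnpm-lock.yaml"}

def path_violations(changed: Iterable[str], allow: List[str], deny: List[str]) -> List[str]:
    """Return one reason per changed path that breaks containment."""
    problems: List[str] = []
    for path in changed:
        name = path.rsplit("/", 1)[-1]
        if any(path.startswith(d) for d in deny):
            problems.append(f"{path}: in denied path")
        elif name in LOCKFILES:
            problems.append(f"{path}: lockfile change (dependency lock)")
        elif name.startswith(".env"):  # assumed naming convention for secrets files
            problems.append(f"{path}: secrets file")
        elif not any(path.startswith(a) for a in allow):
            problems.append(f"{path}: outside allowed paths")
    return problems
```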
Gates (truth)
At minimum:
- typecheck
- tests (even a tiny set)
- build
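At the gate level, determinism means the same commands run in the same order on every run. Here is a minimal sketch using Python's subprocess module. The GATES commands (npx tsc --noEmit, npm test, npm run build) assume a Node/TypeScript project, so swap in your own.

```python
import subprocess
from typing import List, Sequence, Tuple

# Assumed commands for a Node/TypeScript repo; replace with your own gates.
GATES: List[Tuple[str, List[str]]] = [
    ("typecheck", ["npx", "tsc", "--noEmit"]),
    ("tests", ["npm", "test"]),
    ("build", ["npm", "run", "build"]),
]

def run_gates(gates: Sequence[Tuple[str, List[str]]] = GATES) -> Tuple[bool, str]:
    """Run each gate in order; the first non-zero exit code fails the run."""
    for name, cmd in gates:
        result = subprocess.run(cmd)  # inherits stdout/stderr so gate logs stay visible
        if result.returncode != 0:
            return False, name
    return True, "all gates green"
```

Green means ship: only a (True, ...) result should let the run move on to the guard and the commit.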
Promotion Ladder (reality)
This is how you stop breaking staging/prod:
- Local: typecheck + tests + build
- Preview: deploy an ephemeral URL + smoke test the URL
- Staging: stricter smoke + a couple real flows
- Prod: gradual rollout + rollback triggers
Same paradigm. Higher rung = stricter proof.
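One way to encode the ladder is as data, where each higher rung inherits every check from the rungs below it. The check names are hypothetical labels for the steps listed above. Cumulative inheritance is a design choice in this sketch, not something the ladder requires.

```python
from typing import List

# Hypothetical check names; each rung adds checks on top of the rungs below it.
RUNGS = [
    ("local", ["typecheck", "tests", "build"]),
    ("preview", ["deploy_ephemeral_url", "smoke_test_url"]),
    ("staging", ["strict_smoke", "real_flow_checks"]),
    ("prod", ["gradual_rollout", "rollback_triggers"]),
]

def required_checks(rung: str) -> List[str]:
    """Return every check a change must pass to graduate to `rung`."""
    checks: List[str] = []
    for name, added in RUNGS:
        checks.extend(added)
        if name == rung:
            return checks
    raise ValueError(f"unknown rung: {rung}")
```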
FAILURE THAT KEEPS TRYING
If a run fails and doesn't push, nothing triggers the next run.
So failure must become:
- State (failure.json)
- Event (either cron retry, or commit the failure record to retrigger)
Add a retry budget so you don't burn compute forever.
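This sketch turns a failure into state with a retry budget. It assumes a failure.json that records the story id, the attempt count, and the last reason. Those field names and the budget of 3 are assumptions. The caller would then commit this file to retrigger the loop, or let a cron retry read it.

```python
import json
from pathlib import Path

MAX_FAILED_ATTEMPTS = 3  # assumed retry budget per story

def record_failure(path: Path, story_id: str, reason: str,
                   budget: int = MAX_FAILED_ATTEMPTS) -> bool:
    """Persist the failure to failure.json; return True if another retry is allowed."""
    previous = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Count consecutive failures on the same story; a different story starts a fresh count.
    attempts = (previous.get("attempts", 0) + 1) if previous.get("story") == story_id else 1
    state = {"story": story_id, "attempts": attempts, "lastReason": reason}
    path.write_text(json.dumps(state, indent=2) + "\n", encoding="utf-8")
    return attempts < budget
```

When record_failure returns False, the run should stop retrying that story and leave the record for a human instead of burning compute.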
SPEED COMES FROM CONSTRAINTS
If you want lots of iterations, you need tight loops.
Small stories + strict diff budgets turn "agent chaos" into "agent throughput."
Add constraints that force small work:
{
  "iteration": { "maxFilesChanged": 12, "maxLinesChanged": 400 },
  "allowPaths": ["src/", "app/", "content/", "scripts/"],
  "denyPaths": [".github/workflows/"],
  "dependencies": { "allowDependencyChanges": false }
}
Then enforce them with a guard script that reads git diff. If the diff violates the constraints, fail hard before committing.
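The guard described here is a shell script (guard.sh). For readability, the same diff-budget check is sketched below in Python. The sketch assumes three things: git is on PATH, constraints.json has the iteration block above, and changes are staged (git add -A) before the check so that new files show up in git diff --numstat HEAD.

```python
import json
import subprocess
import sys
from typing import List, Tuple

Row = Tuple[int, int, str]  # (lines added, lines deleted, path)

def parse_numstat(text: str) -> List[Row]:
    """Parse git diff --numstat output: tab-separated added, deleted, path."""
    rows: List[Row] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files report "-" instead of counts; count 0 lines but still 1 file.
        rows.append((0 if added == "-" else int(added),
                     0 if deleted == "-" else int(deleted),
                     path))
    return rows

def check_budget(rows: List[Row], max_files: int, max_lines: int) -> List[str]:
    """Return one message per violated budget; an empty list means within budget."""
    problems: List[str] = []
    if len(rows) > max_files:
        problems.append(f"{len(rows)} files changed > maxFilesChanged {max_files}")
    lines = sum(added + deleted for added, deleted, _ in rows)
    if lines > max_lines:
        problems.append(f"{lines} lines changed > maxLinesChanged {max_lines}")
    return problems

def main(constraints_path: str = "scripts/ralph/constraints.json") -> int:
    """Return exit code 1 on any violation so the workflow fails before commit."""
    with open(constraints_path, encoding="utf-8") as f:
        limits = json.load(f)["iteration"]
    diff = subprocess.run(["git", "diff", "--numstat", "HEAD"],
                          capture_output=True, text=True, check=True).stdout
    problems = check_budget(parse_numstat(diff),
                            limits["maxFilesChanged"], limits["maxLinesChanged"])
    for problem in problems:
        print(f"guard: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

Call sys.exit(main()) from the workflow's guard step, after the gates and before the commit.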
HOW TO START RIGHT NOW
- Install the file interface
- Default safe: PAUSED: true
- First run proves plumbing, not value
- Add one tiny story
- Make gates real
- Turn it on: PAUSED: false
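The kill switch only works if every run checks it first. This sketch assumes AGENTS.md contains a line of the form PAUSED: true or PAUSED: false. If that line is missing, the run treats the repo as paused, which matches the default-safe rule. The function name is_paused is hypothetical.

```python
import re

# Assumed format: a line in AGENTS.md reading "PAUSED: true" or "PAUSED: false".
PAUSED_RE = re.compile(r"^[ \t]*PAUSED:[ \t]*(true|false)[ \t]*$", re.IGNORECASE | re.MULTILINE)

def is_paused(agents_md: str) -> bool:
    """Read the PAUSED kill switch from AGENTS.md text; fail safe to paused."""
    match = PAUSED_RE.search(agents_md)
    if match is None:
        return True  # no switch found: treat as paused (default safe)
    return match.group(1).lower() == "true"
```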
Ralph doesn't need motivation. It needs a backlog and a box.
Give it both.
SCALING MODES
Who merges what Ralph ships? By default, you do. And that's fine.
There are two operating modes:
SINGLE-LANE (recommended): only one open Ralph PR at a time. Simple mental model. Less entropy. You merge when ready.
MULTI-LANE (PR factory): multiple PRs in parallel. Requires an arbiter (human or agent). Otherwise you get PR pileup + conflicts + duplicated work.
Don't do multi-lane until your gates are real.
If you're trying to make money right now, start single-lane. Ship. Learn. Iterate.
THE NEXT PRIMITIVE: FLEET RALPHING
Once every repo can Ralph, you can have a repo that Ralphs your repos.
That "Conductor" loop can:
- install/upgrade the loop everywhere
- flip a global ON/OFF
- broadcast changes across your entire codebase fleet
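Flipping a global ON/OFF can be as simple as rewriting the same PAUSED line in every repo. This sketch assumes each repo is a local checkout whose AGENTS.md uses a PAUSED: true/false line. The helper set_paused_everywhere is hypothetical, and committing and pushing each change is left to the Conductor.

```python
import re
from pathlib import Path
from typing import Iterable, List

# [ \t]* (not \s*) so the pattern never swallows the newline after the switch line.
SWITCH_RE = re.compile(r"^PAUSED:[ \t]*(true|false)[ \t]*$", re.MULTILINE)

def set_paused_everywhere(repos: Iterable[Path], paused: bool) -> List[Path]:
    """Rewrite the PAUSED line in each repo's AGENTS.md; return the files that changed."""
    value = "true" if paused else "false"
    changed: List[Path] = []
    for repo in repos:
        agents = repo / "AGENTS.md"
        if not agents.exists():
            continue  # repo hasn't installed the loop yet
        text = agents.read_text(encoding="utf-8")
        new_text = SWITCH_RE.sub(f"PAUSED: {value}", text)
        if new_text != text:
            agents.write_text(new_text, encoding="utf-8")
            changed.append(agents)
    return changed
```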
A Ralph ships code. A Conductor ships capability.
Autopilot repos → autopilot fleet.
Ralphing: AI agents that keep trying until they succeed.
You orchestrate harnesses now.