THINK IN LOOPS

2-3 iterations. Stop. Review. Update context. Repeat.

AI WORKFLOWS 2026.01.17

THE WORKFLOW

Problem: AI models drift off-task. They make confident mistakes. Full auto is gambling.

Solution: Slow mode. 2-3 iterations, then stop. Review the code. Update context. Repeat. The human is the checkpoint in the loop.

Ralph runs (2-3x) → Stop
         ↓
    Review code
         ↓
    Update context
         ↓
    Repeat
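Outside CI, the loop above can be sketched as a tiny shell helper. This is a sketch, not a tool: `slow_mode` is a made-up name, and the agent command is whatever CLI you actually use.

```shell
# Run the agent a fixed number of times, then stop for human review.
# Everything after the count is the agent command (a placeholder here).
slow_mode() {
  n="$1"; shift
  i=1
  while [ "$i" -le "$n" ]; do
    echo "ralph iteration $i"
    "$@" || return 1          # one fresh Ralph per iteration
    i=$((i + 1))
  done
  echo "stop: review the diff, update context, run again"
}

# Usage sketch: slow_mode 3 claude --print "$(cat AGENTS.md)"
```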

WHAT'S A RALPH?

Named after Geoffrey Huntley's concept: AI models are like Ralph Wiggum from The Simpsons. No memory. No plan. No continuity. Every moment is complete and isolated.

Every message is a fresh instance that only knows what you show it right now. That's not a limitation to complain about—it's the lever you control.

Every message is a new Ralph.

WHY SLOW MODE WORKS

When you let Ralph run unsupervised for 50 iterations, you're hoping. Wishing. You come back to either a miracle or a mess.

When you stop every 2-3 iterations:

  • You see where Ralph is heading
  • You correct course before it compounds
  • You update the context with what you learned
  • The next iterations are better informed

Time isn't saved. Time is spent differently. Instead of typing the implementation, you're writing tests. Instead of debugging your code, you're examining what Ralph could see.

THE MEMENTO PROBLEM

There's a movie called Memento about a man who can't form new memories. Every few minutes, he forgets everything. So he tattoos the important things on his body—the facts that must persist.

Your AI is this man. The context window is his mind. Everything you don't tattoo (include explicitly) doesn't exist.

The inversion: you are not the one without memory. The AI is. You're the handler. You're the one who writes the tattoos.

In slow mode, the "code review" step is where you write new tattoos. You see what Ralph did, you update AGENTS.md or your task list, you give the next Ralph better context than the last one had.
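Writing tattoos doesn't need tooling. Appending the lesson to the context file is enough. A minimal sketch, assuming AGENTS.md is your context file (`write_tattoo` is an invented helper name):

```shell
# Append a dated lesson to the context file so every future Ralph sees it.
write_tattoo() {
  lesson="$1"
  {
    printf '\n## Learned %s\n' "$(date +%Y-%m-%d)"
    printf '%s\n' "$lesson"
  } >> AGENTS.md
}

# Example:
#   write_tattoo "The payments module has no tests. Add them before touching it."
```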

THE ONLY LEVER

The context window is the only thing you control.

That's how you steer. That's how you shape. That's how you influence.

Not the model. Not the framework. Not the tooling. The window.

Once you understand this, you stop caring which AI you're using. The context window is the game board. Everything else is furniture.

THE SIMPLEST IMPLEMENTATION

A GitHub Action that loops:

name: ralph

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  run:
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - name: Run Ralph
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          npm install -g @anthropic-ai/claude-code
          
          # The context: everything Ralph can see
          claude --dangerously-skip-permissions \
            --print \
            "$(cat AGENTS.md)"

      - name: Guardrail - Something Must Change
        run: |
          # Stage everything first: git diff --quiet alone ignores untracked files
          git add -A
          if git diff --cached --quiet; then
            echo "Nothing changed. Loop failed."
            exit 1
          fi

      - name: Continue Loop
        run: |
          git config user.name "ralph-bot"
          git config user.email "ralph-bot@users.noreply.github.com"
          git add -A
          git commit -m "ralph: iteration"
          # Note: pushes made with the default GITHUB_TOKEN do not trigger new
          # workflow runs. Use a PAT or deploy key so the loop actually continues.
          git push

Fresh context (checkout) → Ralph reads AGENTS.md → Ralph works → Guardrail checks for changes → Commit and push → Push triggers next iteration

THE GUARDRAIL

git add -A && git diff --cached --quiet

Did something change? Yes → continue. No → fail.

The loop enforces itself. You can't wish for a different outcome with the same inputs. If Ralph did nothing, that's not feedback—that's a broken loop.

ADDING BACKPRESSURE

Tests are backpressure:

      - name: Run Tests
        run: npm test

      - name: Guardrail
        run: |
          git add -A
          if git diff --cached --quiet; then
            echo "Nothing changed."
            exit 1
          fi

Now Ralph must change things in a way that passes tests. The tests define the shape. Ralph fills it.

ADDING A KILL SWITCH

      - name: Check Kill Switch
        id: kill
        run: |
          if [ -f .ralph/PAUSED ]; then
            echo "Ralph is paused."
            echo "paused=true" >> "$GITHUB_OUTPUT"
          fi

Later steps then opt out with if: steps.kill.outputs.paused != 'true'. (A bare exit 0 would only end this one step; the rest of the job still runs.)

Stop Ralph: touch .ralph/PAUSED && git add .ralph/PAUSED && git commit -m "pause ralph" && git push
Restart: git rm .ralph/PAUSED && git commit -m "resume ralph" && git push

ADDING FAILURE MEMORY

Ralphs forget. The filesystem remembers:

      - name: Record Outcome
        if: always()
        run: |
          mkdir -p .ralph
          
          if [ "${{ job.status }}" = "success" ]; then
            echo "0" > .ralph/failures
          else
            FAILURES=$(cat .ralph/failures 2>/dev/null || echo "0")
            FAILURES=$((FAILURES + 1))
            echo "$FAILURES" > .ralph/failures

            if [ "$FAILURES" -ge 5 ]; then
              touch .ralph/PAUSED
              echo "Too many failures. Pausing."
            fi
          fi
          
          git add .ralph/
          git commit -m "ralph: record state" || true
          git push || true

Five failures in a row? Ralph pauses itself.

THE CORE LOOP

Here's what every loop needs:

  1. Fresh context every cycle — Ralph is born again each time
  2. Clear instructions — what should Ralph try to do?
  3. The ability to act — Ralph must be able to change something
  4. Observation — did it work?
  5. A guardrail — something MUST change, or the loop failed

That last one is key: the loop must eat its own tail.

If Ralph runs and nothing changes, that's not neutral. That's failure. You can't observe without influencing. The concept validates itself.
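Stripped to a script, one cycle of that loop looks something like this sketch. The agent command is a stand-in for your real CLI, and `run_cycle` is an invented name:

```shell
# One cycle: fresh context, instructions, action, observation, guardrail.
run_cycle() {
  # 1. Fresh context: the agent only knows what we hand it right now
  context="$(cat AGENTS.md)"

  # 2. Clear instructions + 3. the ability to act:
  # everything after run_cycle is the agent command, given the context
  "$@" "$context" || return 1

  # 4. Observation: what actually changed in the working tree?
  git status --porcelain

  # 5. Guardrail: something MUST change, or the cycle failed
  if [ -z "$(git status --porcelain)" ]; then
    echo "guardrail: nothing changed, loop failed"
    return 1
  fi
  echo "guardrail: changes detected"
}

# Usage sketch: run_cycle claude --print
```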

SLOW MODE VS FULL AUTO

Full auto:
ralph runs → ralph runs → ralph runs → ... → you check → ???

Slow mode:
ralph runs (2-3x) → you review → update context → ralph runs (2-3x) → you review → ...

Full auto is gambling. Slow mode is steering.

The right tools for proper code review don't fully exist yet. So we slow down. We stay in the loop. We use our eyes as the checkpoint until better tooling catches up.

THE INVERSION

Old: I write code, tests validate it.
New: I write tests, Ralph fills in the code.

Old: I am the harness.
New: Tests are the harness. I observe.

Old: Debugging means tracing my code.
New: Debugging means examining what Ralph could see.

Same features. Built backwards.

YOU'RE ALREADY BUILDING HARNESSES

"I could just write the code."

Sure. Then write the tests to make sure it's not broken. Then the guardrails in staging. Then the cascade of other tests as you add more features. Then production monitoring. Then alerts. Then rollback procedures.

You're already building harnesses. You always were.

CI/CD is a loop. Testing is a guardrail. Code review is a checkpoint. Deployment is a gate. Monitoring is observation.

Ralph doesn't add ceremony. Ralph makes visible the ceremony that was always there.

WHERE THIS GOES

I can see a world where I'm just talking to repos.

Set up a Ralph. Make sure the loop is on. Extend the PRD when I want something new. Sit back and make sure the right checks are in place.

Any coding agent. Any model. Doesn't matter. They all modify the repo. The repo has the loop. The loop has the guardrails.

I don't build features. I build checks.

The harder part is writing tests that actually specify what you want. That's not trivial. But when you can define "done" clearly enough that a test can verify it, Ralph has a shot. When Ralph fails, you learn what was missing from the context or what the test didn't cover.
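A toy example of a test that specifies "done": a spec for a hypothetical slugify command. Ralph's only job is to make the spec pass; the spec itself doesn't change mid-run. All names here are invented for illustration.

```shell
# run_spec defines "done" for a hypothetical ./slugify command.
# Each assertion pins one behavior; together they are the shape Ralph fills.
run_spec() {
  test "$(./slugify 'Hello World')" = "hello-world" \
    || { echo "FAIL: lowercase and hyphenate"; return 1; }
  test "$(./slugify 'Already-Good')" = "already-good" \
    || { echo "FAIL: existing hyphens preserved"; return 1; }
  echo "spec passed"
}
```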

PATIENCE

The biggest shift isn't technical. It's patience.

Patience for the loop. Patience for the model. Patience for the process.

Observation requires giving up control. Giving up ego. Because you truly can't control anything—even when you thought you were in control.

Code breaks. Tests break. What did we do before Ralph? We tried to fix it. Tried to better ourselves. Did it again.

A new loop was born. After every failure, always another iteration.

That was always true. We just didn't call it a loop. We called it "debugging" or "iterating" or "learning." But it was always: try, fail, adjust, try again.

Ralph just makes the loop visible. And once you see it, you realize: there will always be another iteration. That's not a problem to solve. That's the process itself.

THE WALL YOU'LL HIT

Once you start doing this, you hit the same problem every team hits:

Observation.

You need to see what changed, why it changed, what failed, and what reality says. Without observation, loops turn into superstition.

I'm using gateproof.dev in my projects. It's just a collection of ideas for doing this end-to-end—observing what changed and validating against constraints. Not expecting many people to use it, but it works for me.

Automation without checkpoints is just a fast way to be wrong.

Slower loops. More deliberate steering. Code review as backpressure.

That's the workflow.