SYSTEM_LOG ENTRY
Since I started posting about autonomous agent loops, I've been getting questions. "How do I set this flow-next thing up?" "What's Ralph?" "How do you use it?"
This tweet started it all:
Also had to explain to my wife why I'm searching for Ralph Wiggum images. With modest results.
But before I explain my flow-next plugin, understand Ralph first. Not my implementation: the technique itself.
As Geoffrey Huntley says, start with first principles.
Essential reading:
My point: try building a dumb bash loop yourself first. See what happens when you externalize state to files. You'll learn more in an hour than from any thread.
Ralph isn't "an agent that remembers forever." It's an agent that controls what it remembers.

LLM context windows only grow: you add tokens, never delete them. Wrong turns, failed attempts, and hallucinations accumulate. That's context pollution.
Most tools try "compaction": summarizing when the context gets long. But summarization is lossy. Details vanish. The model invents things and forgets to implement the task as requested.
Ralph takes a different approach: treat context like managed memory.
Geoff calls it a "malloc orchestrator." Don't dump everything in and hope. Deliberately control what goes in. Pin a stable frame of reference. Let the session end. Start fresh.
while :; do cat task.md | agent; done

A simple loop. The deeper insight: state lives in files, not in the conversation.
Each iteration reconstructs reality from the filesystem, not from a polluted transcript of what went wrong.
Progress persists. Failures evaporate.
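Here's a minimal sketch of the technique. The `agent` function is a stub standing in for a real agent CLI, and `task.md` is an illustrative checkbox file; the loop and the externalized state are the actual point.

```shell
# Sketch of the Ralph technique: state lives in files, not in the
# conversation. "agent" is a stub standing in for a real agent CLI.
set -euo pipefail

workdir=$(mktemp -d)
printf '%s\n' '- [ ] step one' '- [ ] step two' > "$workdir/task.md"

agent() {  # stub: checks off the first open box, as a cooperative agent would
  awk 'hit==0 && /- \[ \]/ { sub(/- \[ \]/, "- [x]"); hit=1 } { print }' \
    "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Each iteration starts fresh: reality is whatever task.md says right now.
while grep -q '\[ \]' "$workdir/task.md"; do
  agent "$workdir/task.md"
done

cat "$workdir/task.md"  # all boxes checked; no transcript carried over
```

Swap the stub for your real agent invocation and the shape stays the same: the loop never passes conversation history forward, only files.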
Anthropic's own context engineering guidance emphasizes re-anchoring from sources of truth. Their long-context documentation warns about context rot. They recommend fresh context windows with structured progress files.
Their sample ralph-wiggum plugin takes a different approach: single session, accumulating context, a stop hook that re-feeds the prompt without clearing history.
Flow-next follows the re-anchoring philosophy. Fresh context each iteration.

Same model writing code AND reviewing it = same blind spots. Context steers generation. The system converges to local coherence, not global correctness.
Self-testing is self-consistency. Not falsification.
Flow-next gates on cross-model review. A different model checks the work. Different weights, different priors. Use RepoPrompt (macOS) or Codex CLI (any platform) with GPT 5.2 High, or whatever reviewer you trust.
Two models > one.
Most tools: "warning, proceed anyway?"
Flow-next: SHIP or fix it. No proceeding until the reviewer approves. Enforced re-reviews until SHIP verdict or max retry attempts (configurable, default 3).
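The gate reduces to a small loop. This sketch uses illustrative names, not flow-next internals: `reviewer` stands in for a second-model CLI call (here it's a stub that rejects twice, then approves, so the retry path is exercised), and `apply_fix` stands in for re-running the implementing agent on the feedback.

```shell
# Sketch of a SHIP-or-fix gate with a retry cap (names are illustrative).
max_retries=3
attempt=0
verdict=""

reviewer() {  # stub: a real setup would shell out to a second-model CLI here
  if [ "$1" -lt 3 ]; then echo "FIX: tighten error handling"; else echo "SHIP"; fi
}

apply_fix() { :; }  # stub: feed the reviewer's verdict back to the worker

while [ "$attempt" -lt "$max_retries" ]; do
  attempt=$((attempt + 1))
  verdict=$(reviewer "$attempt")
  if [ "$verdict" = "SHIP" ]; then break; fi
  apply_fix "$verdict"
done

if [ "$verdict" = "SHIP" ]; then
  echo "review passed on attempt $attempt"
else
  echo "blocked after $max_retries attempts"  # block the task, move on
fi
```

No "proceed anyway" branch exists: the only exits are an approval or the retry cap.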
Subagents gather context, analyze patterns, fetch docs, run gap analysis.
Plans get reviewed too-architecture, approach, scope checked by a different model before implementation starts.
Catches design issues when they're cheap to fix.
Full task graph in .flow/:
Everything version-controlled. Full audit trail.
After N failures (default 3), the system blocks the task and moves on. No infinite loops burning your API budget or Claude Max subscription.
Every iteration re-reads the epic spec, task spec, and git state. Reconstruct reality from sources of truth. Drift-proof.
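Re-anchoring is just prompt assembly from files. A sketch, with illustrative file names (the real `.flow/` layout may differ): every iteration rebuilds the prompt from the specs and current git state, so nothing depends on a previous session's transcript.

```shell
# Sketch of re-anchoring: the prompt is rebuilt from sources of truth
# at the top of every iteration (file names are illustrative).
state_dir=$(mktemp -d)
echo "Epic: ship the widget"   > "$state_dir/epic.md"
echo "Task: add the API route" > "$state_dir/task.md"

build_prompt() {
  {
    cat "$state_dir/epic.md" "$state_dir/task.md"
    echo "Recent commits:"
    git -C "$state_dir" log --oneline -n 5 2>/dev/null || echo "(no repo yet)"
  } > "$state_dir/prompt.txt"
}

build_prompt   # run fresh each iteration, never carried over
```

If the specs or the git history change between iterations, the next prompt reflects it automatically. That's the drift-proofing.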
--watch streams tool calls in real time so you can see what's happening without blocking autonomy. --watch verbose includes model responses too.
Full loop from idea to shipped code.
Start with what I want to build. Rough notes, bullet points, whatever works. Thinking time, not typing time.
/flow-next:interview
This is the underrated step. Claude asks probing questions:
10-15 minutes of back-and-forth refines a rough spec into something precise.
This is where most of the value is. A crisp spec prevents 10x the debugging later, and the interview often surfaces 2-3 things I didn't even consider.
/flow-next:plan <spec>
Claude breaks the spec into:
This is the cutoff between manual and automated.
After planning, I review. Does the breakdown make sense? Tasks small enough? Dependencies right?
If doing interactive work: run /flow-next:work <task> one at a time, staying in the loop.
If going fully autonomous: run the Ralph loop and let it execute the entire backlog.
For full autonomous mode, I prepare 5-10 plans before starting.
/flow-next:work fn-1.1 --review=rp
Work on an entire epic (fn-1) or single task (fn-1.1).
For each task:
I'm watching, can intervene. Good for complex tasks, learning a new codebase, anything where taste matters.
/flow-next:ralph-init # one-time setup
scripts/ralph/ralph.sh --watch

The loop runs unattended:
Kick it off before bed. Wake up to completed features or clear visibility into what's stuck.

Whether interactive or autonomous, I review the PR before merging. AI generates, human approves.
Full audit trail-every review receipt, every commit, every status change.
The heuristic: If you can write checkboxes, you can Ralph it. If you can't, you're not ready to loop; you're ready to think.
Install:
/plugin marketplace add https://github.com/gmickel/gmickel-claude-marketplace
/plugin install flow-next
/flow-next:setup

Initialize Ralph mode:
/flow-next:ralph-init

Resources:
Two models > one. Process failures, not model failures. Agents that actually finish what they start.