A pipeline of specialized LLM agents — each with one job, defined inputs, and defined outputs — that hands artifacts through fixed stages, logs every struggle, and improves its own rules over time.
A discovery interviewer talks to residents about what they need. A city planner reads that brief and designs the streets. An architect reads the plan and specifies the buildings. A construction crew builds from the spec. Then three independent inspectors — electrical, structural, fire — each sign off before occupancy. If any inspector flags a problem, the crew fixes it and inspectors re-check. No building gets occupied until all three sign off.
The planner and crew don't start over from scratch each time — they learn. They keep a record of every decision, every flagged problem, every repeating pattern. A learning analyst reads that record and says: "this keeps coming up — let's make it a standing rule." A rule writer takes the approved rules and writes them into the inspection checklist.
That is this system.
All artifacts for a feature live in docs/briefs/{NNN}-{feature-name}/. Sequence numbers are monotonically increasing — prior-round files are never deleted. The full history is always visible.
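The "monotonically increasing, never deleted" rule can be sketched as a small helper that looks at the files already present and picks the next sequence number. This is only an illustration: the method name, the regular expression, and the example feature "invoices" are assumptions, and the filename shape is taken from the `NNN.{SEQ}-eng-{feature}.md` pattern used for artifacts.

```ruby
# Hypothetical helper (not part of the system as described): derive the next
# artifact filename from the files that already exist for a feature.
# Assumed filename shape: NNN.SEQ-role-feature.md
ARTIFACT_PATTERN = /\A(\d{3})\.(\d{2,})-([a-z]+)-(.+)\.md\z/

def next_artifact_name(existing, feature_number:, role:, feature:)
  # Highest sequence number already used for this feature (0 if none yet).
  highest = existing
    .filter_map { |name| ARTIFACT_PATTERN.match(name) }
    .select { |m| m[1] == feature_number }
    .map { |m| m[2].to_i }
    .max || 0

  # Prior files are never renamed or removed; the new file takes the next number.
  format("%s.%02d-%s-%s.md", feature_number, highest + 1, role, feature)
end

existing = [
  "004.01-dis-invoices.md",
  "004.02-arc-invoices.md",
  "004.03-des-invoices.md",
]
puts next_artifact_name(existing, feature_number: "004", role: "eng", feature: "invoices")
```

Because the number is computed from what is on disk rather than stored separately, a second engineering round would simply land on the next free number and leave the earlier round in place.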
CLAUDE.md is read at session start. It is what makes the system project-aware rather than generic. It contains the project intro, coding requirements, abstraction decisions, and non-negotiables that anchor every agent's context. /init-project generates it from what actually exists in your codebase.
- NNN.01-dis-{feature}.md: requirement brief
- NNN.02-arc-{feature}.md: feature specification
- NNN.03-des-{feature}.md: UI/UX specification
- NNN.{SEQ}-eng-{feature}.md
Separation of concerns across agents, not within agents. An agent that does everything well is an agent that does nothing particularly well. Every agent also logs what it struggled with; those reflections accumulate across sessions and feed the learning loop.
query struggles and query skill-gaps. The same struggle appearing across agents and features is the signal that an agent definition needs updating.
Every session starts fresh. Every agent's context window resets. The SQLite database is the only thing that persists across that boundary. Without it, the system can build features but cannot learn from them.
An agent that flags [MISSING_TEST] on features 001, 003, and 007 has logged evidence three times, but across three separate sessions it never saw the pattern itself.
Proposals are written to docs/agent-analysis/YYYY-MM-DD.md with explicit confidence levels and exact text to add. The human reviews and approves selectively; each proposal is independent.
The most significant capability of this system is that it can improve itself without human enumeration of every rule. The evidence accumulates, the pattern is detected, proposals are approved, and the system updates itself. The loop is tight, but the human remains in it. Every change to agent definitions or skill files passes through a human decision.
Every session starts fresh. The database is the only thing that persists across that boundary. Schema auto-creates on first use — no migration or setup step needed. A pure Ruby CLI with no gem dependencies.
query struggles and query skill-gaps across all features. The same struggle appearing across multiple agents and features is a signal that the agent definitions need updating: not just one agent, the whole team.
The category column is the key to everything. By enforcing a standard vocabulary across all review agents, the log-analyst can run GROUP BY category HAVING COUNT(DISTINCT feature_id) >= 3 to find issues that keep recurring, the highest-confidence signal that a prevention skill is needed. Consistent vocabulary is what makes cross-feature pattern detection reliable.
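To make the recurrence check concrete, here is an in-memory sketch of what the GROUP BY category HAVING COUNT(DISTINCT feature_id) >= 3 query computes. It deliberately avoids the database so it runs on its own. The `Row` struct, the method name, and the `N_PLUS_ONE` category are assumptions for illustration; only `category`, `feature_id`, and `MISSING_TEST` come from the text.

```ruby
# Hypothetical row shape; the real table's other columns are not described here.
Row = Struct.new(:agent, :feature_id, :category)

# In-memory equivalent of:
#   GROUP BY category HAVING COUNT(DISTINCT feature_id) >= 3
def recurring_categories(rows, min_features: 3)
  rows
    .group_by(&:category)
    # Count each feature once per category, however many agents flagged it.
    .transform_values { |group| group.map(&:feature_id).uniq.size }
    .select { |_category, feature_count| feature_count >= min_features }
end

rows = [
  Row.new("code-review", "001", "MISSING_TEST"),
  Row.new("code-review", "003", "MISSING_TEST"),
  Row.new("security",    "007", "MISSING_TEST"),
  Row.new("code-review", "003", "MISSING_TEST"), # repeat on the same feature
  Row.new("performance", "003", "N_PLUS_ONE"),   # made-up category, one feature only
]

recurring_categories(rows).each do |category, feature_count|
  puts "#{category}: seen on #{feature_count} features"
end
```

The sketch also shows why the shared vocabulary matters: if one agent logged `MISSING_TEST` and another logged a differently spelled label for the same issue, the two would fall into separate groups and neither might reach the threshold.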
Three commands are the developer-facing interface. The orchestrator runs live in the current conversation — not as a subagent — because it needs to interact with the user across multiple turns.
Skills contain decisions already made — patterns, rules, format contracts — that agents would otherwise have to re-derive from training data or make up on the spot. Each skill becomes a separate system block with its own cache control.
| Skill | discovery | architect | design | engineer | code-review | security | performance | orchestrator | log-analyst | skill-builder |
|---|---|---|---|---|---|---|---|---|---|---|
| agent-log | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — | ✓ |
| rails-principles | — | ✓ | — | ✓ | ✓ | ✓ | ✓ | — | — | — |
| design-system | — | ✓ | ✓ | ✓ | ✓ | — | — | — | — | — |
| architect-spec-format | — | ✓ | — | — | ✓ | ✓ | ✓ | — | — | — |
| discovery-brief-format | ✓ | ✓ | — | — | — | — | — | — | — | — |
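As a rough illustration of "one system block per skill", the matrix can be expressed as a lookup from agent to skills, with each skill's text wrapped in its own block. The assignments for the three agents below are copied from the table. The block shape (`type`, `text`, `cache_control`) is an assumption loosely modeled on the content-block format of the Anthropic Messages API, and `skill_texts` is a hypothetical map from skill name to file contents.

```ruby
require "json"

# Skill assignments copied from the matrix (three agents shown for brevity).
SKILLS_BY_AGENT = {
  "discovery" => %w[agent-log discovery-brief-format],
  "design"    => %w[agent-log design-system],
  "engineer"  => %w[agent-log rails-principles design-system],
}.freeze

# Build one system block per skill, each carrying its own cache marker.
# The hash layout is an assumed shape, not something the text specifies.
def system_blocks(agent, skill_texts)
  SKILLS_BY_AGENT.fetch(agent).map do |skill|
    {
      type: "text",
      text: skill_texts.fetch(skill),
      cache_control: { type: "ephemeral" },
    }
  end
end

# Placeholder skill contents for the example.
skill_texts = {
  "agent-log"        => "How to log struggles...",
  "rails-principles" => "Rails conventions...",
  "design-system"    => "Design tokens and components...",
}

puts JSON.pretty_generate(system_blocks("engineer", skill_texts))
```

Keeping skills as separate blocks means a change to one skill file only alters that block, while the others stay byte-for-byte identical across agents that share them.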
Every artifact is a markdown file with a defined section contract. Nothing advances until the previous artifact exists and every required section is complete.
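The gate described above could be sketched as a check that every required heading is present and has a non-empty body. The section names here are invented for illustration (the actual contracts live in the format skills), and the sketch assumes sections are marked with level-2 markdown headings.

```ruby
# Hypothetical section contract; the real required headings are not listed in the text.
REQUIRED_SECTIONS = ["Problem", "Users", "Acceptance Criteria"].freeze

# Return the required sections that are absent or have an empty body.
# Assumes each section starts with a level-2 heading ("## Name").
def missing_sections(markdown, required = REQUIRED_SECTIONS)
  sections = {}
  current = nil
  markdown.each_line do |line|
    if (m = line.match(/\A##\s+(.+?)\s*\z/))
      current = m[1]
      sections[current] = +""
    elsif current
      sections[current] << line
    end
  end
  required.reject { |name| sections.key?(name) && !sections[name].strip.empty? }
end

# An artifact may advance only if it exists (non-nil) and nothing is missing.
def ready_to_advance?(markdown)
  !markdown.nil? && missing_sections(markdown).empty?
end

brief = <<~MD
  ## Problem
  Exports time out on large accounts.

  ## Users

  ## Acceptance Criteria
  Exports finish without timing out.
MD

puts "missing: #{missing_sections(brief).join(', ')}"
puts "ready: #{ready_to_advance?(brief)}"
```

A heading with no content counts as incomplete, which matches the rule that a section must be complete rather than merely present.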
Rails Agentic Engineering Team — a pattern by Robert Evans · Built and validated in production