A pattern by Robert Evans

The Rails agentic engineering team that learns from itself

A pipeline of specialized LLM agents — each with one job, defined inputs, and defined outputs — that hands artifacts through fixed stages, logs every struggle, and improves its own rules over time.

Rails-First · 10 Specialized Agents · Self-Improving Loop · Struggle Logging · SQLite Flight Recorder · Parallel Review Stage · Human Approval Gate
The Mental Model

Think of a city planning department

A discovery interviewer talks to residents about what they need. A city planner reads that brief and designs the streets. An architect reads the plan and specifies the buildings. A construction crew builds from the spec. Then three independent inspectors — electrical, structural, fire — each sign off before occupancy. If any inspector flags a problem, the crew fixes it and inspectors re-check. No building gets occupied until all three sign off.

The planner and crew don't start over from scratch each time — they learn. They keep a record of every decision, every flagged problem, every repeating pattern. A learning analyst reads that record and says: "this keeps coming up — let's make it a standing rule." A rule writer takes the approved rules and writes them into the inspection checklist.

That is this system.

City role → agent
Discovery Interviewer → discovery agent
City Planner → architect agent
Architect → design agent
Construction Crew → engineer agent
Electrical / Structural / Fire inspectors → code · security · performance review
Learning Analyst + Rule Writer → log-analyst · skill-builder
The Pipeline

Nothing advances until the previous artifact exists

All artifacts for a feature live in docs/briefs/{NNN}-{feature-name}/. Sequence numbers are monotonically increasing — prior-round files are never deleted. The full history is always visible.

The Project Config Layer
Every agent reads CLAUDE.md at session start — it's what makes the system project-aware rather than generic. It contains the project intro, coding requirements, abstraction decisions, and non-negotiables that anchor every agent's context. /init-project generates it from what actually exists in your codebase.
Pipeline Mode
Triggered by a new feature or feature number to resume. Runs the full workflow or re-enters from the first missing artifact. The orchestrator detects which stage to re-enter by checking which files exist.
Direct Mode
Triggered by naming a specific agent or task. Launches that agent with the artifact the user specifies. Does not feed back into the pipeline unless explicitly asked — useful for one-off tasks and debugging.
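The re-entry check described under Pipeline Mode can be sketched in a few lines of Ruby. This is a minimal illustration, not the orchestrator's actual logic: the `STAGES` table and the `first_missing_stage` helper are hypothetical names, and the sketch covers only the four sequential stages, matching files by the `dis`, `arc`, `des`, and `eng` markers used in the artifact names.

```ruby
# Hypothetical sketch: find the stage to re-enter from the files that exist.
# Stage order and filename markers follow the artifact naming in this document;
# the constant and method names are assumptions made for illustration.
STAGES = [
  ["discovery", "dis"],
  ["architect", "arc"],
  ["design",    "des"],
  ["engineer",  "eng"]
].freeze

# filenames: basenames found in docs/briefs/{NNN}-{feature-name}/
# Returns the name of the first stage with no artifact, or nil if all exist.
def first_missing_stage(filenames)
  missing = STAGES.find do |_agent, marker|
    filenames.none? { |name| name.match?(/\A\d{3}\.\d{2}-#{marker}-/) }
  end
  missing&.first
end

existing = [
  "003.01-dis-document-templates.md",
  "003.02-arc-document-templates.md"
]
puts first_missing_stage(existing) # design
```

In a real run the list of names would presumably come from the feature directory, for example via `Dir.children`; passing plain strings keeps the sketch testable without touching the filesystem.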
Three review agents launch in parallel — code, security, and performance inspection run simultaneously
01 discovery: produces NNN.01-dis-{feature}.md (requirement brief)
02 architect: produces NNN.02-arc-{feature}.md (feature specification)
03 design: produces NNN.03-des-{feature}.md (UI/UX specification)
04 engineer: implements with TDD · produces NNN.{SEQ}-eng-{feature}.md
Parallel, code: code-review · pattern guardian · Rails conventions
Parallel, security: security-review · adversarial · auth scope · Brakeman
Parallel, performance: performance-review · N+1 · indexes · blocking callbacks
All PASS
Outcome recorded. Pipeline complete. Log-analyst accumulates data for the learning loop.
Any NEEDS WORK
Engineer receives findings. Round N+1 begins. Reviews re-run. Same category in 3 rounds → escalate to user.
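The two outcomes above amount to a small decision rule. The sketch below shows one way to express it in Ruby. The method names and data shapes are hypothetical, and two points are assumptions rather than statements from the pattern: PASS WITH NOTES is treated as passing, and "3 rounds" is read as any three rounds rather than three consecutive ones.

```ruby
# Hypothetical sketch of the round outcome rules. The verdict strings match the
# review reports; method names and data shapes are assumptions.

# verdicts: one overall verdict per review agent for the current round.
# PASS WITH NOTES is treated as passing here, which is an assumption.
def combined_verdict(verdicts)
  verdicts.include?("NEEDS WORK") ? "NEEDS WORK" : "ALL PASS"
end

# rounds: one array of finding categories per completed review round.
# Returns the categories flagged in at least `limit` rounds (not necessarily
# consecutive, another assumption); these would be escalated to the user.
def categories_to_escalate(rounds, limit: 3)
  counts = Hash.new(0)
  rounds.each { |categories| categories.uniq.each { |category| counts[category] += 1 } }
  counts.select { |_category, seen| seen >= limit }.keys
end

rounds = [["N+1", "MISSING_TEST"], ["N+1"], ["N+1", "NAMING"]]
puts combined_verdict(["PASS", "PASS WITH NOTES", "NEEDS WORK"]) # NEEDS WORK
puts categories_to_escalate(rounds).inspect                      # ["N+1"]
```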
Two-Round Feature — Artifact Layout
docs/briefs/003-document-templates/
  003.01-dis-document-templates.md  ← discovery
  003.02-arc-document-templates.md  ← architect
  003.03-des-document-templates.md  ← design
  003.04-eng-document-templates.md  ← engineer round 1
  003.05-cr  003.06-sec 003.07-perf  ← reviews round 1
  003.08-eng-document-templates.md  ← engineer round 2
  003.09-cr  003.10-sec 003.11-perf  ← reviews round 2
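Because sequence numbers only ever increase, the name of the next artifact can be derived from the files already in the directory. The helper below is a hypothetical sketch of that rule; `next_artifact_name` and its keyword arguments are illustrative names, not part of the pattern.

```ruby
# Hypothetical helper: build the next artifact name in a feature directory.
# filenames: basenames already present, e.g. "003.07-perf-document-templates.md".
def next_artifact_name(filenames, feature_number:, marker:, feature_name:)
  # Pull the two-digit sequence out of each name; names that do not match count as 0.
  last_seq = filenames.map { |name| name[/\A\d{3}\.(\d{2})/, 1].to_i }.max || 0
  format("%s.%02d-%s-%s.md", feature_number, last_seq + 1, marker, feature_name)
end

round_one = %w[
  003.01-dis-document-templates.md
  003.02-arc-document-templates.md
  003.03-des-document-templates.md
  003.04-eng-document-templates.md
  003.05-cr-document-templates.md
  003.06-sec-document-templates.md
  003.07-perf-document-templates.md
]
puts next_artifact_name(round_one, feature_number: "003", marker: "eng",
                                   feature_name: "document-templates")
# 003.08-eng-document-templates.md
```

The result matches the round 2 engineer report in the layout above: nothing is renamed or overwritten, so earlier rounds stay readable.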
The Agents

Every agent has one job

Separation of concerns across agents, not within agents. An agent that does everything well is an agent that does nothing particularly well. Every agent also logs what it struggled with — those reflections accumulate across sessions and feed the learning loop.

[Diagram: pipeline and learning layer. Pipeline: user intent → 01 orchestrator → 02 discovery → 03 architect → 04 design → 05 engineer → 06 code-review, 07 security, 08 performance (parallel) → combined verdict (ALL PASS: complete · NEEDS WORK: round N+1). Learning layer: 09 log-analyst (runs after 10–15 features) → proposals → human ✓ → 10 skill-builder (executes approved proposals) → updates skills. Agents write to db/agent_log.sqlite3 (runs · decisions · events · findings · reflections); the log-analyst reads it.]
Agent 01 — Orchestrator
orchestrator
.claude/agents/orchestrator.md
agent-log
Traffic controller. Knows where every artifact lives, what stage comes next, and who produces what. Manages two modes: pipeline (full workflow or resume) and direct (named agent or task).
Cannot
  • Implement features or make design decisions
  • Review code or resolve content ambiguity
  • All of those surface to the user
Logs
Re-entry decisions, routing choices, persistent-finding escalations, and any orchestration ambiguity it had to resolve without user input.
Agent 02 — Discovery
discovery
.claude/agents/discovery.md
agent-log · discovery-brief-format
Interviews the user. Asks one or two questions at a time, not a form with eight items. Reads back a structured summary and waits for explicit confirmation before writing any file.
Cannot
  • Assign feature numbers (orchestrator does this)
  • Read application code
  • Write outside the feature directory
Logs
Brainstorming settlement decisions, scope ambiguities, open questions it couldn't resolve, and any gaps in user intent it had to assume rather than confirm.
Agent 03 — Architect
architect
.claude/agents/architect.md
agent-log · rails-principles · architect-spec-format · discovery-brief-format
Reads the discovery brief, audits the codebase, and produces the feature specification engineers execute. Runs coherence checks: URL grammar, naming conventions, no new patterns without justification.
Cannot
  • Modify application code
  • Write outside the feature directory
  • Hand off with non-empty Open Questions
Logs
URL structure decisions, model and concern naming decisions, pattern introductions, dependency decisions, and any gaps where the codebase audit revealed something the brief didn't cover.
Agent 04 — Design
design
.claude/agents/design.md
agent-log · writer-design-system
Sits between architect and engineer. Produces the complete UX specification: user flows, screen layouts, component inventory, AI generation surface patterns, and state design for every view.
Cannot
  • Make behavioral or data model decisions
  • Override Behavioral Constraints from the spec
  • Invent new design system classes without flagging
Logs
Layout decisions, component choices, AI generation state design, empty state calls-to-action, and any deviations from existing view patterns that required a judgment call.
Agent 05 — Engineer
engineer
.claude/agents/engineer.md
agent-log · rails-principles · writer-design-system
Executes the plan with strict TDD discipline: Red → Green → Refactor per task. Writes the engineer report the review agents read. Hard constraints enforced at implementation time.
Cannot
  • Add things the plan doesn't specify
  • Use Redis, Devise, RSpec, service objects
  • Use custom CSS classes not in the design system
Logs
Every non-trivial implementation choice, deviation from plan, ambiguity resolution, and skill gap. Queries gap decisions before writing the engineer report to populate Assumptions Made — the struggle log feeds directly into the spec quality assessment.
Agent 06 — Code Review
code-review
.claude/agents/code-review.md
agent-log · rails-principles · architect-spec-format
Pattern guardian first, compliance checker second. Applies the two-tier pattern test: does Rails provide this? If not, does it look like Rails built it? A competing pattern is always NEEDS WORK.
10 Review Categories
  • Test suite · Pattern consistency · Test coverage
  • Rails conventions · Naming quality · Data integrity
  • Fat controller · Dead artifacts · Design compliance · Comments
Logs
Every finding via standard category vocabulary, plus judgment calls — why something was PASS WITH NOTES vs. NEEDS WORK. These categories accumulate across features and become skill candidates when they repeat.
Agent 07 — Security Review
security-review
.claude/agents/security-review.md
agent-log · rails-principles · architect-spec-format
Thinks adversarially. Any lookup that bypasses user → project → document association traversal is a potential authorization hole. Runs Brakeman — every HIGH confidence warning is NEEDS WORK.
7 Review Categories
  • Brakeman · Authorization scope · Mass assignment
  • XSS · SQL injection · Sensitive data · CSRF
Logs
Each finding with standard category tag, plus false-positive assessment reasoning. When the same security category appears across 3+ features, the learning loop proposes a prevention skill so engineers stop making the same mistake.
Agent 08 — Performance Review
performance-review
.claude/agents/performance-review.md
agent-log · rails-principles · architect-spec-format
Thinks at scale — not what the code does with 10 records today, but with 100,000 records in six months. Reads view files even when not in the diff if the controller feeds them.
5 Review Categories
  • N+1 queries · Database index coverage
  • Blocking callbacks · Unscoped queries · Missing constraints
Logs
Findings with category tag, plus judgment calls — why a potential N+1 was PASS WITH NOTES (e.g., collection bounded by design). Recurring N+1 patterns across features become engineer skill content.
Agent 09 — Log Analyst
log-analyst
.claude/agents/log-analyst.md
sqlite3 direct
The learning analyst. Reads across all features via direct SQL — not session-scoped CLI. Finds patterns no individual session can see. Runs on demand after 10–15 feature cycles.
Cannot
  • Modify .claude/agents/ files
  • Write anywhere except docs/agent-analysis/
  • Run any bin/agent-log write commands
Reads
Aggregates struggles across all agents and features via query struggles and query skill-gaps. The same struggle appearing across agents and features is the signal that an agent definition needs updating.
Agent 10 — Skill Builder
skill-builder
.claude/agents/skill-builder.md
agent-log
Executes confirmed proposals. A surgeon following a pre-approved procedure. The diagnosis happened before it arrived; human approval happened before it runs. Clean execution and a complete operative note.
Cannot
  • Approve its own proposals
  • Act without confirmed human approval
  • Modify agent definitions beyond approved scope
Logs
Its own session via agent-log — every skill built, every file changed, every ambiguity resolved during execution. The build report is a complete operative note of what changed and why.
The Learning Loop

The system that improves itself

Every session starts fresh. Every agent's context window resets. The SQLite database is the only thing that persists across that boundary. Without it, the system can build features but cannot learn from them.

[Diagram: Self-Improvement Loop. Step 1 Accumulation → Step 2 Analysis → Step 3 Human ✓ → Step 4 Execution → Step 5 Effect]
SELECT category, COUNT(DISTINCT feature_id) FROM findings JOIN runs ... GROUP BY category HAVING COUNT(...) >= 3
Step 01 — Accumulation
Agents write to the database every session
Every decision, finding, and reflection accumulates across the entire lifetime of the project. A code-review agent that flags [MISSING_TEST] on features 001, 003, and 007 has logged evidence three times — but across three separate sessions, it never saw the pattern itself.
Step 02 — Analysis
Log-analyst runs cross-feature SQL
After 10–15 features, the log-analyst queries directly via sqlite3 — not the session-scoped CLI. It looks for six pattern types: promotion candidates, anti-patterns, spec template gaps, outcome deltas, quality correlators, and skill candidates from findings that appear across 3+ distinct features.
Step 03 — Human Review
Proposals surface; the human decides
The log-analyst produces proposals — it does not act on them. Proposals land in docs/agent-analysis/YYYY-MM-DD.md with explicit confidence levels and exact text to add. The human reviews and approves selectively — each proposal is independent.
Step 04 — Execution
Skill-builder executes approved proposals
Invoked with the approved report path. Classifies proposals by confidence, presents the build plan, gets final confirmation, then executes: creates skill files, adds them to agent frontmatter, sharpens detection sections. Writes a build report documenting every change.
Confidence Classification
High (≥5 features): default approved; build unless the human vetoes
Medium (3–4 features): requires per-item confirmation before build
Below threshold: deferred; noted in the build report, not built
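The confidence classification above reduces to two thresholds. The following Ruby sketch restates it; the method name and the returned symbols are hypothetical, and only the thresholds come from the classification itself.

```ruby
# Hypothetical sketch of the confidence classification.
# distinct_feature_count: number of distinct features in which the pattern appeared.
def classify_proposal(distinct_feature_count)
  if distinct_feature_count >= 5
    :high      # default approved: build unless the human vetoes
  elsif distinct_feature_count >= 3
    :medium    # requires per-item confirmation before build
  else
    :deferred  # below threshold: noted in the build report, not built
  end
end

[7, 4, 2].each { |count| puts "#{count} features: #{classify_proposal(count)}" }
```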
Step 05 — Effect
Every agent loads the new skill at session start
On the next feature cycle, every agent whose frontmatter lists the new skill loads it before making any decisions. A finding that appeared 7 times across 7 features should stop appearing, because the agent now has standing guidance that prevents it. No human had to enumerate the rule.

The most significant capability of this system is that it can improve itself without human enumeration of every rule. The evidence accumulates, the pattern is detected, proposals are approved, and the system updates itself. The loop is tight — but the human remains in it. Every change to agent definitions or skill files passes through a human decision.

The Flight Recorder

SQLite as organizational memory

Every session starts fresh. The database is the only thing that persists across that boundary. The schema is created automatically on first use, so no migration or setup step is needed. The CLI that writes to it is pure Ruby with no gem dependencies.

[Diagram: database schema]
runs: id (UUID · PK) · agent_name · feature_id (F-001) · quality_score (1–10) · status · input_summary · output_summary
decisions: run_id (FK → runs.id) · decision_type · rationale · alternatives · expected_outcome · observed_outcome (← learning signal)
events: run_id (FK → runs.id) · event_type · description (what it revealed) · duration_ms · started_at
findings: run_id (FK → runs.id) · category (controlled vocab) · severity (NEEDS_WORK) · file · description · idx_findings_category (GROUP BY → skill candidates)
reflections: run_id (FK → runs.id) · type (struggle/assumption/gap) · description · query struggles · query skill-gaps (cross-run aggregate queries → agent definition updates)
log-analyst reads all tables: SELECT ... JOIN runs ... GROUP BY feature_id (cross-session patterns invisible to any single agent)
runs
one row per agent session
Session lifecycle. Every agent that writes creates a run with UUID, feature ID, input mode, status, quality score, and output summary.
id
UUID primary key — captured as $RUN_ID for session
agent_name
Which agent ran this session
feature_id
F-001, F-002 — cross-feature join key
quality_score
Agent self-assessment 1–10; correlators detect hidden practices in high-scoring runs
status
running / completed / failed / abandoned
decisions
one row per non-trivial choice
The reasoning record. What was decided, why, what else was considered, and the bet on expected outcome — closed later by observed outcome.
decision_type
implementation / deviation / dependency / gap
alternatives_considered
Anti-pattern detection — what was rejected and why
expected_outcome
The bet — what should be true if this was right
observed_outcome
Filled in later — the learning signal; outcome delta detection
events
one row per external action
The audit trail of what agents actually did. The key design rule: events log what an action revealed, not just what the action was. "Read routes.rb" is noise. "Read routes.rb — found no existing document resource, confirmed clean namespace" is signal.
event_type
tool_call / file_read / file_write / bash / test_run
description
What happened and what it revealed — not just the action itself
duration_ms
Elapsed time when tracked — surfaces slow operations across the pipeline
findings
one row per review finding
The key to the learning loop. Standard category vocabulary enforced across all review agents enables cross-feature pattern detection via GROUP BY.
category
Controlled vocabulary tag — see full list below
severity
NEEDS_WORK or PASS_WITH_NOTES
file
File path relative to project root
description
Specific enough for engineer to act on without re-reading the review
Standard Category Vocabulary
Code Quality
MISSING_TEST  ·  RAILS_CONVENTION  ·  NAMING  ·  PATTERN_CONFLICT
DATA_INTEGRITY  ·  FAT_CONTROLLER  ·  DEAD_CODE  ·  CONCERN_EXTRACTION
COMMENT_MISSING  ·  JBUILDER_MISSING
Security
AUTH_SCOPE  ·  MASS_ASSIGNMENT  ·  XSS  ·  SQL_INJECTION  ·  SENSITIVE_DATA  ·  CSRF  ·  BRAKEMAN
Performance
N+1  ·  MISSING_INDEX  ·  BLOCKING_CALLBACK  ·  UNSCOPED_QUERY  ·  MISSING_CONSTRAINT
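Cross-feature grouping only works if every finding uses one of these exact tags, so a guard at write time is a natural fit. The Ruby sketch below copies the vocabulary from the list above; the constant names and the `validate_finding!` helper are hypothetical and are not claimed to be part of `bin/agent-log`.

```ruby
# Category vocabulary copied from the list above, grouped by review area.
CATEGORY_VOCABULARY = {
  code_quality: %w[MISSING_TEST RAILS_CONVENTION NAMING PATTERN_CONFLICT
                   DATA_INTEGRITY FAT_CONTROLLER DEAD_CODE CONCERN_EXTRACTION
                   COMMENT_MISSING JBUILDER_MISSING],
  security:     %w[AUTH_SCOPE MASS_ASSIGNMENT XSS SQL_INJECTION SENSITIVE_DATA
                   CSRF BRAKEMAN],
  performance:  %w[N+1 MISSING_INDEX BLOCKING_CALLBACK UNSCOPED_QUERY
                   MISSING_CONSTRAINT]
}.freeze

SEVERITIES = %w[NEEDS_WORK PASS_WITH_NOTES].freeze

# Hypothetical guard: reject a finding whose tags fall outside the vocabulary.
# Returns true when both tags are valid, raises ArgumentError otherwise.
def validate_finding!(category:, severity:)
  unless CATEGORY_VOCABULARY.values.flatten.include?(category)
    raise ArgumentError, "unknown category: #{category}"
  end
  raise ArgumentError, "unknown severity: #{severity}" unless SEVERITIES.include?(severity)

  true
end

puts validate_finding!(category: "N+1", severity: "NEEDS_WORK") # true
```

A near-miss tag such as `N_PLUS_ONE` would raise instead of silently creating a second category that the GROUP BY would count separately.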
reflections
every agent logs what it struggled with
The most human thing in the system. Every agent writes what it found hard, what it had to assume, and where it felt a skill gap. These accumulate silently across sessions — invisible to any individual agent, visible to the log-analyst running cross-feature SQL.
type
struggle / assumption / skill_gap
description
What the agent struggled with, assumed without confirmation, or identified as a gap in its own knowledge
How struggles become skills
The log-analyst runs query struggles and query skill-gaps across all features. The same struggle appearing across multiple agents and features is a signal that the agent definitions need updating — not just one agent, the whole team.

The category column is the key to everything. By enforcing a standard vocabulary across all review agents, the log-analyst can run GROUP BY category HAVING COUNT(DISTINCT feature_id) >= 3 to find issues that keep recurring — the highest-confidence signal that a prevention skill is needed. Consistent vocabulary is what makes cross-feature pattern detection reliable.
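The query in the paragraph above is abbreviated. The sketch below writes out one plausible full form as a Ruby heredoc, using the table and column names from the schema description (`findings.run_id`, `runs.id`, `runs.feature_id`). The aliases and the constant name are assumptions, and the SQL has not been run against the real database. An in-memory equivalent follows so the grouping logic can be checked without SQLite.

```ruby
# One plausible full form of the abbreviated query. Column names follow the
# schema described in this document; aliases are assumptions.
SKILL_CANDIDATES_SQL = <<~SQL
  SELECT f.category, COUNT(DISTINCT r.feature_id) AS feature_count
  FROM findings f
  JOIN runs r ON r.id = f.run_id
  GROUP BY f.category
  HAVING COUNT(DISTINCT r.feature_id) >= 3
  ORDER BY feature_count DESC;
SQL

# In-memory equivalent of the same grouping (hypothetical helper).
# rows: hashes with :category and :feature_id, as the JOIN would yield them.
# Returns a hash of category => number of distinct features at or above threshold.
def skill_candidates(rows, threshold: 3)
  rows.group_by { |row| row[:category] }
      .transform_values { |group| group.map { |row| row[:feature_id] }.uniq.size }
      .select { |_category, count| count >= threshold }
end

rows = [
  { category: "MISSING_TEST", feature_id: "F-001" },
  { category: "MISSING_TEST", feature_id: "F-003" },
  { category: "MISSING_TEST", feature_id: "F-007" },
  { category: "N+1",          feature_id: "F-003" },
  { category: "N+1",          feature_id: "F-003" }
]
skill_candidates(rows).each { |category, count| puts "#{category}: #{count} features" }
# MISSING_TEST: 3 features
```

Counting distinct features rather than raw rows is the point of the design: two N+1 findings in the same feature are one data point, not a recurring pattern.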

Commands

How you invoke the pipeline

Three commands are the developer-facing interface. The orchestrator runs live in the current conversation — not as a subagent — because it needs to interact with the user across multiple turns.

Primary Entry Point
/feature
The entry point for all feature work. Routes to the orchestrator at the right pipeline stage. Three modes parsed from the first word of the argument.
Full Pipeline
/feature
Empty or bare description — starts from Stage 1 (Discovery) and runs the full workflow
Brief-First Mode
/feature architect <path-or-content>
User has a brief ready — skip Discovery. Argument is either a file path or inline content treated as the brief body
Resume Mode
/feature resume F-00X
Continues an in-progress pipeline from the first missing artifact — no re-running completed stages
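The three modes can be told apart from the first word of the argument alone. The Ruby sketch below illustrates that dispatch; `parse_feature_command` and the keys of the returned hash are hypothetical, and the real command may parse its input differently.

```ruby
# Hypothetical sketch of how /feature could pick a mode from its argument.
# The three modes come from the command description; the parser is assumed.
def parse_feature_command(argument)
  text = argument.to_s.strip
  first_word, rest = text.split(/\s+/, 2)

  case first_word
  when "architect"
    { mode: :brief_first, brief: rest }       # file path or inline brief content
  when "resume"
    { mode: :resume, feature_id: rest }       # e.g. "F-003"
  else
    { mode: :full_pipeline, description: text } # empty or bare description
  end
end

puts parse_feature_command("resume F-003")[:mode]            # resume
puts parse_feature_command("architect docs/brief.md")[:mode] # brief_first
puts parse_feature_command("")[:mode]                        # full_pipeline
```

One consequence of first-word parsing is that a bare description starting with "resume" or "architect" would be routed to that mode, so descriptions are best phrased to avoid those leading words.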
Project Setup
/init-project
Audits the project and fills in missing CLAUDE.md sections. Generates from what actually exists — not generic boilerplate. Checks for 7 standard sections and only writes what's absent.
Audits
CLAUDE.md  ·  .claude/skills/  ·  .claude/agents/
Gemfile  ·  app/models/  ·  config/routes.rb
gems/  ·  docs/
7 Sections It Checks
Project intro  ·  Skills  ·  Coding requirements
Docs/ structure  ·  TODO.md convention
Abstraction decisions  ·  Non-negotiables
Documentation
/update-readme
Audits the codebase and regenerates domain documentation. Rewrites README.md with exactly three sections. Writes one domain document per domain found in the codebase.
Content Rules
Model class names must be exact — reads the class declaration
Method signatures must be verified — reads the method definition
Does not document what doesn't exist
Omits Rails boilerplate
technical.md explains significant choices and why they were made
Skills

Reference documents loaded at session start

Skills contain decisions already made — patterns, rules, format contracts — that agents would otherwise have to re-derive from training data or make up on the spot. Each skill becomes a separate system block with its own cache control.

[Skill matrix, skills by agent]
Agents: discovery · architect · design · engineer · code-review · security · performance · orchestrator · log-analyst · skill-builder
Skills: agent-log · rails-principles · design-system · architect-spec-format · discovery-brief-format
agent-log
.claude/skills/agent-log/SKILL.md · All agents
Complete CLI reference for bin/agent-log. Pure Ruby, no gems — works from any directory without Rails boot overhead.
CLI Commands
run start/end: session lifecycle; UUID captured for all subsequent logging
decision: non-trivial choice with rationale, alternatives, type, expected outcome
event: external action; logs what it revealed, not just what it was
outcome: closes the hypothesis/result loop (observed vs. expected outcome)
finding: review agents log issues with standard category vocabulary
reflection: struggles, assumptions, skill gaps; feeds cross-run aggregates
query: reads runs, decisions, events, findings, reflections, skill-gaps, struggles
Example — engineer logging a decision
$ bin/agent-log decision \
  --title "Chose concerns over service objects" \
  --type implementation \
  --rationale "Three methods serve one feature — 3-method rule triggered" \
  --alternatives "Service object — rejected, no service objects per rails-principles" \
  --expected "Cleaner model layer, concern reusable across models"
Example — review agent logging a finding
$ bin/agent-log finding \
  --category N+1 \
  --severity NEEDS_WORK \
  --file app/views/documents/index.html.erb \
  --description "document.project called in loop — add includes(:project)"
rails-principles
.claude/skills/rails-principles/SKILL.md
The engineering principles the system builds to. Architect designs to these, engineer builds to these, code-review checks against these. Convention as the interface. Rails-first. No service objects. Seven controller actions only.
architect-spec-format
.claude/skills/architect-spec-format/SKILL.md
The 12-section feature specification contract. A spec is not complete if any section is empty or vague, or if Open Questions is non-empty. Review agents load this to evaluate whether the engineer implemented what the spec actually specified.
discovery-brief-format
.claude/skills/discovery-brief-format/SKILL.md
The handoff contract between discovery and architect. Defines what a complete brief contains, what must come from the user vs. what the architect derives from the codebase. Documents common brief failures so discovery avoids them.
Artifact Templates

What each agent produces

Every artifact is a markdown file with a defined section contract. Nothing advances until the previous artifact exists and every required section is complete.

NNN.01-dis-{feature-name}.md
Written only after explicit user confirmation of the structured summary
## Problem Statement
What is broken, missing, or possible that prompted this feature. Written from the user's perspective, not the system's.
## Who Is This For
Which user type or role. Authorization implications named explicitly here before the architect touches the codebase.
## Data and Authorization Scope
What data this feature touches and who is allowed to touch it. The security-review agent reads this to validate scope traversal.
## AI Generation Operations
Explicitly Yes or No. If Yes: what is generated, what triggers it, streaming vs. batch, and how errors are handled. Required even if the answer is No.
## What They Can Do
The full capability list — every action the user can take. Written as capabilities, not implementation steps.
## Key Scenarios
The 3–5 most important user journeys through this feature, including the unhappy paths that matter most.
## Decisions Made
Every scope and direction decision settled during discovery. Kept separate from Open Questions: these are closed.
## Directions Rejected
What was considered and ruled out, with a one-line reason. Prevents the architect from re-exploring dead ends.
## Known Edge Cases
Edge cases the user already knows about. The architect will find more during the codebase audit.
## Explicit Out of Scope
What this feature deliberately does not do. Prevents scope creep in the architect and engineer stages.
## Open Questions for the Architect
Unresolved questions that require codebase knowledge to answer. The architect must resolve all of these before handing to engineering.
## Agent Notes
Populated from querying gap decisions in the log — what the discovery agent struggled with or assumed. Visible to the architect.
## For the Architect
Closing synthesis: confidence levels per section, what to verify first in the codebase, which assumptions need confirming before spec work begins.
NNN.02-arc-{feature-name}.md
Open Questions must be empty before this file is handed to engineering
## Goal
One sentence. What this feature achieves from the system's perspective.
## Background
Context the engineer needs to understand why this feature exists and what it connects to in the existing system.
## User Stories
As a [role], I can [action], so that [outcome]. Written for every distinct capability.
## Acceptance Criteria
Binary pass/fail conditions only — no judgment required to evaluate. Every criterion maps to at least one test.
## Scope — In / Out
Two lists. What is explicitly included. What is explicitly excluded. Prevents scope drift during implementation.
## Domain Model
Model names and relationships only — no column names, types, or index strategy. Those are engineering decisions. The architect specifies structure; the engineer specifies storage.
## Behavioral Constraints
Non-negotiable. The engineer chooses HOW; these constrain WHAT. Authorization rules, data integrity rules, and UX contracts live here.
## Routes / Resource Shape
RESTful resource structure. URL grammar must be consistent with existing routes — verified against config/routes.rb during the codebase audit.
## JSON API Surface
Every controller action that needs a jbuilder response. Listed as a named exception to YAGNI: AI consumers are real and likely.
## Refactoring Notes
Existing code the engineer should clean up or restructure as part of this feature. Not optional scope — planned work.
## Design Decisions
Every non-obvious architectural choice with rationale. No new patterns without justification here. Logged to agent-log with arch- IDs.
## Open Questions
Must be empty before handoff. If non-empty, the spec is not complete and cannot be handed to engineering.
## Agent Notes
What the architect struggled with or assumed — from querying gap decisions in the log.
## For the Engineer
Closing synthesis: what to read first, which existing code is most relevant, the highest-risk areas in the implementation plan.
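The handoff rule for this artifact (Open Questions must be empty) is mechanical enough to check in code. The Ruby sketch below is a minimal illustration under stated assumptions: `section_body` and `ready_for_engineering?` are hypothetical helpers, sections are assumed to start with a `## ` heading on its own line, and a missing Open Questions section is treated as not ready.

```ruby
# Hypothetical check: a spec may be handed off only when "## Open Questions"
# exists and has no content. Section parsing here is deliberately simple.

# Returns the text under the given "## " heading, or nil if the heading is absent.
def section_body(markdown, heading)
  lines = markdown.lines.map(&:chomp)
  start = lines.index("## #{heading}")
  return nil if start.nil?

  # Collect lines until the next "## " heading or the end of the file.
  body = lines[(start + 1)..].take_while { |line| !line.start_with?("## ") }
  body.join("\n").strip
end

def ready_for_engineering?(spec_markdown)
  body = section_body(spec_markdown, "Open Questions")
  !body.nil? && body.empty?
end

complete = "## Goal\nShip templates.\n## Open Questions\n\n## Agent Notes\nNone.\n"
pending  = "## Goal\nShip templates.\n## Open Questions\n- Which role owns templates?\n"

puts ready_for_engineering?(complete) # true
puts ready_for_engineering?(pending)  # false
```

Because the heading is matched exactly, the discovery brief's "Open Questions for the Architect" section does not satisfy this check; it applies to the architect spec only.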
NNN.03-des-{feature-name}.md
Every data field decision includes an explicit mono vs. sans class decision
## User Flow
Complete path map — every route through the feature including edge cases, error states, and empty states. Includes the unhappy paths.
## Screen Specifications
One subsection per screen. Layout regions named, data fields listed with explicit mono/sans class decisions, components identified, available actions enumerated.
## AI Generation Surfaces
Every surface that triggers or displays AI generation must specify: trigger state (visually distinct), loading state (incremental DOM updates — never batch-replace on completion), and three distinct error states (timeout, provider error, rate limit).
## State Design
Empty, loading, error, and success states for every view that can vary. Empty states include a call-to-action — never a dead end.
## Component Inventory
Every UI component used, with the exact class or helper name from the design system. New components flagged as additions needed.
## Design Decisions
Every layout or UX choice that required judgment, with rationale. Deviations from existing patterns explicitly justified.
## Agent Notes
Design struggles and assumptions from querying gap decisions — visible to the engineer before implementation begins.
NNN.{SEQ}-eng-{feature-name}.md
Primary artifact the review agents read; written before reviews run, not after
## What Was Built
Plain summary of what was implemented. Models, controllers, views, jobs, concerns — what exists now that didn't before.
## Key Decisions
Every non-trivial implementation choice with its agent-log decision ID (eng-NNN-NNN). The reasoning is in the database; this section surfaces the IDs so reviewers can cross-reference.
## Deviations from Spec
What the spec said vs. what was implemented and why. Required even if the list is empty — an empty section means the engineer confirmed full compliance.
## What Was Hard
Honest account of implementation difficulty. Surfaces where the spec was ambiguous, where Rails made something harder than expected, where judgment calls had to be made.
## Spec Quality Assessment
Direct feedback to the architect with a 1–10 score. What the spec got right. What was missing or ambiguous. Feeds the log-analyst's spec template gap detection.
## Assumptions Made
Populated from querying gap decisions in the log before writing this section. Every assumption that wasn't confirmed by the spec.
## For the Review Agents
The engineer's own assessment of deliberate tradeoffs and areas of concern. Where the engineer chose speed over purity. Where edge cases were deferred. The reviewers read this first.
NNN.{SEQ}-{cr|sec|perf}-{feature-name}.md
All three run in parallel; the orchestrator waits for all of them before evaluating the combined verdict
code-review report
01 Test Suite: runs bin/rails test; failure = NEEDS WORK
02 Pattern Consistency: pattern impact statement required
03 Test Coverage: every branch, every acceptance criterion
04 Rails Conventions: callbacks, scopes, concerns, helpers
05 Naming Quality: no abbreviations, no generic names
06 Data Integrity: validations backed by DB constraints
07 Fat Controller Check: conditional logic = violation
08 Dead Artifacts: pry, TODOs, commented-out code
09 Design System Compliance: mono/sans, jbuilder present
10 Comments: missing on hard code; unnecessary on obvious code
Overall Verdict
PASS  ·  PASS WITH NOTES  ·  NEEDS WORK
A competing pattern is always NEEDS WORK — no PASS WITH NOTES path
security-review report
01 Brakeman: every HIGH confidence warning is NEEDS WORK
02 Authorization Scope: user → project → document traversal
03 Mass Assignment: strong params, no permit!, no owner IDs
04 XSS: html_safe and raw() without sanitization = vuln
05 SQL Injection: string interpolation in SQL fragments
06 Sensitive Data: logs, responses, error messages checked
07 CSRF: protect_from_forgery active, no unexplained skips
Overall Verdict
PASS  ·  PASS WITH NOTES  ·  NEEDS WORK
False positives must be explained with full rationale
performance-review report
01 N+1 Queries: reads view files even when not in the diff
02 Index Coverage: FK, WHERE, ORDER, GROUP columns checked
03 Blocking Callbacks: anything >100ms must be a background job
04 Unscoped Queries: .all, no pagination, full objects vs. attrs
05 Missing Constraints: presence without NOT NULL, uniqueness without index
Overall Verdict
PASS  ·  PASS WITH NOTES  ·  NEEDS WORK
Bounded collections may be PASS WITH NOTES with explicit reasoning
Design Principles

Why it's built this way

Separation of concerns across agents, not within agents
Every agent has one job. The orchestrator does not design. The architect does not code. The engineer does not review. The reviewer does not fix. This is not bureaucratic overhead — it is what allows specialization. An agent that does everything well is an agent that does nothing particularly well.
Artifacts as interfaces
Agents communicate through files, not through shared state or direct calls. An artifact is the contract between pipeline stages. This makes every stage independently restartable (resume works because the artifact exists), auditable (the full history is readable without querying the database), and debuggable (if a stage produced the wrong output, the artifact is there to read).
Logging as organizational memory
Every session starts fresh. Every agent's context window resets. The SQLite database is the only thing that persists across that boundary. Without it, the system can build features but cannot learn from them. With it, patterns that no individual session can see — because they emerge across twenty sessions and seven features — become detectable.
Human approval at the learning boundary
The log-analyst proposes; it does not act. The skill-builder executes confirmed proposals; it does not approve. Every change to agent definitions or skill files passes through a human decision. The loop is tight — one run → proposals → approval → execution → next run includes new guidance — but the human remains in it.

Rails Agentic Engineering Team — a pattern by Robert Evans  ·  Built and validated in production