Rails Agentic Engineering Team — A Self-Improving Agent Pipeline

The Pipeline

Nothing advances until the previous artifact exists

All artifacts for a feature live in docs/briefs/{NNN}-{feature-name}/. Sequence numbers are monotonically increasing — prior-round files are never deleted. The full history is always visible.
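Assigning the feature number and directory is the orchestrator's job. The minimal sketch below shows that step under two assumptions: the feature name is slugified into lowercase, hyphen-separated words, and the next number is one more than the highest existing three-digit prefix. `next_feature_directory` is a hypothetical helper and is not part of the described tooling.

```ruby
# Hypothetical helper: pick the next docs/briefs/{NNN}-{feature-name}/ path.
# Assumes NNN is one more than the highest three-digit prefix already present.
def next_feature_directory(existing_dirs, feature_name)
  highest = existing_dirs.filter_map { |dir| File.basename(dir)[/\A(\d{3})-/, 1]&.to_i }.max || 0
  # Slugify: lowercase, collapse runs of non-alphanumerics to "-", trim edge hyphens.
  slug = feature_name.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
  File.join("docs/briefs", format("%03d-%s", highest + 1, slug))
end

next_feature_directory(["docs/briefs/001-user-auth", "docs/briefs/002-projects"], "Document Templates")
# => "docs/briefs/003-document-templates"
```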

The Project Config Layer

Every agent reads CLAUDE.md at session start — it's what makes the system project-aware rather than generic. It contains the project intro, coding requirements, abstraction decisions, and non-negotiables that anchor every agent's context. /init-project generates it from what actually exists in your codebase.

Pipeline Mode

Triggered by a new feature description, or by a feature number to resume. Runs the full workflow, or re-enters at the first missing artifact. The orchestrator detects which stage to re-enter by checking which files exist.
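The re-entry check can be sketched as a pure function over the filenames in the feature directory. This is an illustrative sketch, not the orchestrator's actual logic. The stage symbols and `reentry_stage` are hypothetical, and the sketch assumes filenames follow the `NNN.SEQ-tag-feature.md` pattern with the tags `dis`, `arc`, `des`, `eng`, `cr`, `sec`, and `perf`.

```ruby
# Hypothetical sketch of the orchestrator's re-entry check.
# Filenames look like "003.04-eng-document-templates.md".
ARTIFACT_NAME = /\A\d{3}\.(\d+)-([a-z]+)-/
REVIEW_TAGS = %w[cr sec perf].freeze

def reentry_stage(filenames)
  artifacts = filenames.filter_map do |name|
    match = ARTIFACT_NAME.match(name)
    match && { seq: match[1].to_i, tag: match[2] }
  end
  tags = artifacts.map { |artifact| artifact[:tag] }

  # The three planning stages run once, in order.
  return :discovery unless tags.include?("dis")
  return :architect unless tags.include?("arc")
  return :design unless tags.include?("des")

  # Engineering rounds: find the most recent engineer report.
  latest_eng = artifacts.select { |a| a[:tag] == "eng" }.max_by { |a| a[:seq] }
  return :engineer unless latest_eng

  # This round's reviews are the ones numbered after the latest engineer report.
  reviewed = artifacts.select { |a| a[:seq] > latest_eng[:seq] }.map { |a| a[:tag] }
  (REVIEW_TAGS - reviewed).empty? ? :evaluate_verdict : :reviews
end
```

With only the discovery, architect, and design files present it returns `:engineer`. Once a new engineer report exists without all three later reviews, it returns `:reviews`.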

Direct Mode

Triggered by naming a specific agent or task. Launches that agent with the artifact the user specifies. Does not feed back into the pipeline unless explicitly asked — useful for one-off tasks and debugging.

discovery

Produces NNN.01-dis-{feature}.md — requirement brief

architect

Produces NNN.02-arc-{feature}.md — feature specification

design

Produces NNN.03-des-{feature}.md — UI/UX specification

engineer

Implements with TDD · produces NNN.{SEQ}-eng-{feature}.md

Parallel — Code

code-review

Pattern guardian · Rails conventions

Parallel — Security

security-review

Adversarial · auth scope · brakeman

Parallel — Performance

performance-review

N+1 · indexes · blocking callbacks

All PASS

Outcome recorded. Pipeline complete. The session logs stay in the database, where the log-analyst later mines them for the learning loop.

Any NEEDS WORK

The engineer receives the findings and round N+1 begins; the reviews re-run against the new engineer report. If the same finding category persists for 3 rounds, the orchestrator escalates to the user.
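The sketch below shows the combined-verdict rule and the three-round escalation. It rests on three assumptions: PASS WITH NOTES counts as passing, "3 rounds" means the three most recent rounds, and each round's NEEDS WORK categories are available as an array. `next_action` and its inputs are hypothetical.

```ruby
# Hypothetical sketch of the orchestrator's decision after all three reviews finish.
ESCALATION_ROUNDS = 3

# verdicts: { cr: "PASS", sec: "NEEDS WORK", perf: "PASS WITH NOTES" } for this round.
# category_history: NEEDS WORK categories per round, oldest first, including this round.
def next_action(verdicts, category_history)
  # Assumption: only NEEDS WORK blocks completion; PASS WITH NOTES does not.
  return :complete unless verdicts.values.include?("NEEDS WORK")

  recent = category_history.last(ESCALATION_ROUNDS)
  if recent.size == ESCALATION_ROUNDS
    # Categories present in every one of the last three rounds.
    persistent = recent.map(&:uniq).reduce(:&)
    return [:escalate, persistent] unless persistent.empty?
  end
  :next_round
end
```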

Two-Round Feature — Artifact Layout

docs/briefs/003-document-templates/
  003.01-dis-document-templates.md ← discovery
  003.02-arc-document-templates.md ← architect
  003.03-des-document-templates.md ← design
  003.04-eng-document-templates.md ← engineer round 1
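  003.05-cr-document-templates.md ← code-review round 1
  003.06-sec-document-templates.md ← security-review round 1
  003.07-perf-document-templates.md ← performance-review round 1
  003.08-eng-document-templates.md ← engineer round 2
  003.09-cr-document-templates.md ← code-review round 2
  003.10-sec-document-templates.md ← security-review round 2
  003.11-perf-document-templates.md ← performance-review round 2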
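Sequence numbers only increase, so the next round's filenames follow from the highest sequence number already in the directory. The sketch below works under that assumption and reproduces the round-two names listed above. `next_round_files` is hypothetical.

```ruby
# Hypothetical sketch: derive the next engineer/review filenames for a feature.
REVIEWERS = %w[cr sec perf].freeze

def next_round_files(existing, feature_number, feature_name)
  last_seq = existing.filter_map { |name| name[/\A\d{3}\.(\d+)-/, 1]&.to_i }.max || 0
  # Zero-padded "NNN.SEQ" prefix, offset from the last sequence number used.
  prefix = ->(offset) { format("%03d.%02d", feature_number, last_seq + offset) }

  {
    engineer: "#{prefix.call(1)}-eng-#{feature_name}.md",
    # The three reviews run in parallel but still take consecutive numbers.
    reviews: REVIEWERS.each_with_index.map { |tag, i| "#{prefix.call(i + 2)}-#{tag}-#{feature_name}.md" }
  }
end
```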

The Agents

Every agent has one job

Separation of concerns across agents, not within agents. An agent that does everything well is an agent that does nothing particularly well. Every agent also logs what it struggled with — those reflections accumulate across sessions and feed the learning loop.

Agent 01 — Orchestrator

orchestrator

.claude/agents/orchestrator.md

agent-log

Traffic controller. Knows where every artifact lives, what stage comes next, and who produces what. Manages two modes: pipeline (full workflow or resume) and direct (named agent or task).

Cannot

Implement features or make design decisions
Review code or resolve content ambiguity
All of those surface to the user

Logs

Re-entry decisions, routing choices, persistent-finding escalations, and any orchestration ambiguity it had to resolve without user input.

Agent 02 — Discovery

discovery

.claude/agents/discovery.md

agent-log · discovery-brief-format

Interviews the user. Asks one or two questions at a time, not a form with eight items. Reads back a structured summary and waits for explicit confirmation before writing any file.

Cannot

Assign feature numbers (orchestrator does this)
Read application code
Write outside the feature directory

Logs

Decisions settled during brainstorming, scope ambiguities, open questions it couldn't resolve, and any gaps in user intent it had to assume rather than confirm.

Agent 03 — Architect

architect

.claude/agents/architect.md

agent-log · rails-principles · architect-spec-format · discovery-brief-format

Reads the discovery brief, audits the codebase, and produces the feature specification engineers execute. Runs coherence checks: URL grammar, naming conventions, no new patterns without justification.

Cannot

Modify application code
Write outside the feature directory
Hand off with non-empty Open Questions

Logs

URL structure decisions, model and concern naming decisions, pattern introductions, dependency decisions, and any gaps where the codebase audit revealed something the brief didn't cover.

Agent 04 — Design

design

.claude/agents/design.md

agent-log · writer-design-system

Sits between architect and engineer. Produces the complete UX specification: user flows, screen layouts, component inventory, AI generation surface patterns, and state design for every view.

Cannot

Make behavioral or data model decisions
Override Behavioral Constraints from the spec
Invent new design system classes without flagging

Logs

Layout decisions, component choices, AI generation state design, empty state calls-to-action, and any deviations from existing view patterns that required a judgment call.

Agent 05 — Engineer

engineer

.claude/agents/engineer.md

agent-log · rails-principles · writer-design-system

Executes the plan with strict TDD discipline: Red → Green → Refactor per task. Writes the engineer report the review agents read. Hard constraints enforced at implementation time.

Cannot

Add things the plan doesn't specify
Use Redis, Devise, RSpec, service objects
Use custom CSS classes not in the design system

Logs

Every non-trivial implementation choice, deviation from plan, ambiguity resolution, and skill gap. Queries gap decisions before writing the engineer report to populate Assumptions Made — the struggle log feeds directly into the spec quality assessment.

Agent 06 — Code Review

code-review

.claude/agents/code-review.md

agent-log · rails-principles · architect-spec-format

Pattern guardian first, compliance checker second. Applies the two-tier pattern test: does Rails provide this? If not, does it look like Rails built it? A competing pattern is always NEEDS WORK.

10 Review Categories

Test suite · Pattern consistency · Test coverage
Rails conventions · Naming quality · Data integrity
Fat controller · Dead artifacts · Design compliance
Comments

Logs

Every finding, tagged with the standard category vocabulary, plus judgment calls: why something was PASS WITH NOTES vs. NEEDS WORK. These categories accumulate across features and become skill candidates when they repeat.

Agent 07 — Security Review

security-review

.claude/agents/security-review.md

agent-log · rails-principles · architect-spec-format

Thinks adversarially. Any lookup that bypasses user → project → document association traversal is a potential authorization hole. Runs Brakeman — every HIGH confidence warning is NEEDS WORK.

7 Review Categories

Brakeman · Authorization scope · Mass assignment
XSS · SQL injection · Sensitive data · CSRF

Logs

Each finding with standard category tag, plus false-positive assessment reasoning. When the same security category appears across 3+ features, the learning loop proposes a prevention skill so engineers stop making the same mistake.

Agent 08 — Performance Review

performance-review

.claude/agents/performance-review.md

agent-log · rails-principles · architect-spec-format

Thinks at scale — not what the code does with 10 records today, but with 100,000 records in six months. Reads view files even when not in the diff if the controller feeds them.

5 Review Categories

N+1 queries · Database index coverage
Blocking callbacks · Unscoped queries · Missing constraints

Logs

Findings with category tag, plus judgment calls — why a potential N+1 was PASS WITH NOTES (e.g., collection bounded by design). Recurring N+1 patterns across features become engineer skill content.

Agent 09 — Log Analyst

log-analyst

.claude/agents/log-analyst.md

sqlite3 direct

The learning analyst. Reads across all features via direct SQL rather than the session-scoped CLI. Finds patterns no individual session can see. Runs on demand after 10–15 feature cycles.

Cannot

Modify .claude/agents/ files
Write anywhere except docs/agent-analysis/
Run any bin/agent-log write commands

Reads

Aggregates struggles across all agents and features via query struggles and query skill-gaps. The same struggle appearing across agents and features is the signal that an agent definition needs updating.

Agent 10 — Skill Builder

skill-builder

.claude/agents/skill-builder.md

agent-log

Executes confirmed proposals. A surgeon following a pre-approved procedure. The diagnosis happened before it arrived; human approval happened before it runs. Clean execution and a complete operative note.

Cannot

Approve its own proposals
Act without confirmed human approval
Modify agent definitions beyond approved scope

Logs

Its own session via agent-log — every skill built, every file changed, every ambiguity resolved during execution. The build report is a complete operative note of what changed and why.

The Learning Loop

The system that improves itself

Every session starts fresh. Every agent's context window resets. The SQLite database is the only thing that persists across that boundary. Without it, the system can build features but cannot learn from them.

Step 01 — Accumulation

Agents write to the database every session

Every decision, finding, and reflection accumulates across the entire lifetime of the project. A code-review agent that flags [MISSING_TEST] on features 001, 003, and 007 has logged evidence three times — but across three separate sessions, it never saw the pattern itself.

Step 02 — Analysis

Log-analyst runs cross-feature SQL

After 10–15 features, the log-analyst queries directly via sqlite3 — not the session-scoped CLI. It looks for six pattern types: promotion candidates, anti-patterns, spec template gaps, outcome deltas, quality correlators, and skill candidates from findings that appear across 3+ distinct features.
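The skill-candidate rule comes down to grouping findings by category and counting distinct features. The log-analyst does this in SQL. The in-memory sketch below only illustrates the counting logic, and the shape of the finding hash (`category`, `feature_id`) is an assumption.

```ruby
# Illustrative stand-in for the cross-feature query. Counting *distinct*
# features matters: five findings in one feature are not a cross-feature pattern.
SKILL_CANDIDATE_MIN_FEATURES = 3

def skill_candidates(findings)
  findings
    .group_by { |finding| finding[:category] }
    .transform_values { |rows| rows.map { |row| row[:feature_id] }.uniq.sort }
    .select { |_category, features| features.size >= SKILL_CANDIDATE_MIN_FEATURES }
end

findings = [
  { category: "MISSING_TEST", feature_id: "F-001" },
  { category: "MISSING_TEST", feature_id: "F-003" },
  { category: "MISSING_TEST", feature_id: "F-003" },
  { category: "MISSING_TEST", feature_id: "F-007" },
  { category: "N+1", feature_id: "F-002" }
]
skill_candidates(findings)
# => { "MISSING_TEST" => ["F-001", "F-003", "F-007"] }
```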

Step 03 — Human Review

Proposals surface; the human decides

The log-analyst produces proposals — it does not act on them. Proposals land in docs/agent-analysis/YYYY-MM-DD.md with explicit confidence levels and exact text to add. The human reviews and approves selectively — each proposal is independent.

Step 04 — Execution

Skill-builder executes approved proposals

Invoked with the approved report path. Classifies proposals by confidence, presents the build plan, gets final confirmation, then executes: creates skill files, adds them to agent frontmatter, sharpens detection sections. Writes a build report documenting every change.
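Adding a skill to an agent's frontmatter is a small text edit that can be made idempotent. The sketch below uses Ruby's standard `yaml` library. It assumes the agent file begins with a `---`-delimited YAML block that lists skills under a `skills` key. That key name and `add_skill` are assumptions, not confirmed by the source.

```ruby
require "yaml"

FRONTMATTER = /\A---\n(.*?)\n---\n/m

# Hypothetical sketch: append a skill to an agent file's YAML frontmatter.
# Assumes skills live under a "skills" key; returns the text unchanged if
# the skill is already listed, so re-running a build is harmless.
def add_skill(agent_markdown, skill)
  match = FRONTMATTER.match(agent_markdown)
  raise ArgumentError, "agent file has no frontmatter" unless match

  meta = YAML.safe_load(match[1]) || {}
  skills = Array(meta["skills"])
  return agent_markdown if skills.include?(skill)

  meta["skills"] = skills + [skill]
  # Hash#to_yaml emits a leading "---\n", so only the closing delimiter is added.
  "#{meta.to_yaml}---\n#{match.post_match}"
end
```

Re-serializing the whole frontmatter normalizes its YAML formatting. A real build might prefer a targeted line insert to keep diffs minimal.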

Confidence Classification

High ≥5 features

Default approved — build unless human vetoes

Medium 3–4 features

Requires per-item confirmation before build

Below threshold (fewer than 3 features)

Deferred — noted in build report, not built
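The thresholds above map directly onto a build plan. The sketch assumes two things: each proposal carries the number of distinct features behind it, and the human's vetoes and confirmations arrive as lists of proposal IDs. All names here are hypothetical.

```ruby
# Hypothetical sketch of the skill-builder's confidence tiers.
def confidence_tier(distinct_features)
  case distinct_features
  when 5.. then :high     # default approved: built unless the human vetoes
  when 3..4 then :medium  # needs per-item confirmation before build
  else :deferred          # noted in the build report, not built
  end
end

def build_plan(proposals, vetoed: [], confirmed: [])
  plan = { build: [], skipped: [], deferred: [] }
  proposals.each do |proposal|
    id = proposal[:id]
    case confidence_tier(proposal[:features])
    when :high then (vetoed.include?(id) ? plan[:skipped] : plan[:build]) << id
    when :medium then (confirmed.include?(id) ? plan[:build] : plan[:skipped]) << id
    else plan[:deferred] << id
    end
  end
  plan
end
```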

Step 05 — Effect

Every agent loads the new skill at session start

On the next feature cycle, every agent whose frontmatter lists the new skill loads it before making any decisions. A finding that appeared 7 times across 7 features stops recurring, because the agent now carries standing guidance that prevents it. No human had to enumerate the rule.

The most significant capability of this system is that it can improve itself without human enumeration of every rule. The evidence accumulates, the pattern is detected, proposals are approved, and the system updates itself. The loop is tight — but the human remains in it. Every change to agent definitions or skill files passes through a human decision.

The Flight Recorder

SQLite as organizational memory

Every session starts fresh. The database is the only thing that persists across that boundary. Schema auto-creates on first use — no migration or setup step needed. A pure Ruby CLI with no gem dependencies.

runs

one row per agent session

Session lifecycle. Every agent that writes creates a run with UUID, feature ID, input mode, status, quality score, and output summary.

UUID primary key — captured as $RUN_ID for session

agent_name

Which agent ran this session

feature_id

F-001, F-002 — cross-feature join key

quality_score

Agent self-assessment 1–10; correlators detect hidden practices in high-scoring runs

status

running / completed / failed / abandoned

decisions

one row per non-trivial choice

The reasoning record. What was decided, why, what else was considered, and the bet on expected outcome — closed later by observed outcome.

decision_type

implementation / deviation / dependency / gap

alternatives_considered

Anti-pattern detection — what was rejected and why

expected_outcome

The bet — what should be true if this was right

observed_outcome

Filled in later — the learning signal; outcome delta detection

events

one row per external action

The audit trail of what agents actually did. The key design rule: events log what an action revealed, not just what the action was. "Read routes.rb" is noise. "Read routes.rb — found no existing document resource, confirmed clean namespace" is signal.

event_type

tool_call / file_read / file_write / bash / test_run

description

What happened and what it revealed — not just the action itself

duration_ms

Elapsed time when tracked — surfaces slow operations across the pipeline

findings

one row per review finding

The key to the learning loop. Standard category vocabulary enforced across all review agents enables cross-feature pattern detection via GROUP BY.

How you invoke the pipeline

Three commands are the developer-facing interface. The orchestrator runs live in the current conversation — not as a subagent — because it needs to interact with the user across multiple turns.

Primary Entry Point

/feature

The entry point for all feature work. Routes to the orchestrator at the right pipeline stage. Three modes parsed from the first word of the argument.

Full Pipeline

/feature

Empty argument or a bare feature description: starts from Stage 1 (Discovery) and runs the full workflow

Brief-First Mode

/feature architect <path-or-content>

User has a brief ready — skip Discovery. Argument is either a file path or inline content treated as the brief body

Resume Mode

/feature resume F-00X

Continues an in-progress pipeline from the first missing artifact — no re-running completed stages
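The sketch below shows the first-word routing. It assumes the literal keywords `resume` and `architect` and treats anything else, including an empty argument, as a full-pipeline request. `parse_feature_command` and the returned hash are hypothetical. The brief-first branch checks whether the remainder names an existing file, which is how it tells a path from inline content.

```ruby
# Hypothetical sketch of /feature argument routing by first word.
def parse_feature_command(argument)
  text = argument.to_s.strip
  first_word, rest = text.split(/\s+/, 2)

  case first_word
  when "resume"
    { mode: :resume, feature_id: rest }
  when "architect"
    # Brief-first: the remainder is either a path to a brief or the brief body itself.
    brief = rest.to_s
    { mode: :brief_first, brief: brief, from_file: File.file?(brief) }
  else
    # Empty argument or a bare description: run the full pipeline from Discovery.
    { mode: :full, description: text }
  end
end
```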

Project Setup

/init-project

Audits the project and fills in missing CLAUDE.md sections. Generates from what actually exists — not generic boilerplate. Checks for 7 standard sections and only writes what's absent.

Audits

CLAUDE.md · .claude/skills/ · .claude/agents/
Gemfile · app/models/ · config/routes.rb
gems/ · docs/

7 Sections It Checks

Project intro · Skills · Coding requirements
Docs/ structure · TODO.md convention
Abstraction decisions · Non-negotiables
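Writing only what is absent implies a section check like the one below. The sketch assumes each of the 7 sections appears as a `## ` heading whose text matches the names above, case-insensitively. The real command may detect sections differently.

```ruby
# Hypothetical sketch: report which standard CLAUDE.md sections are absent.
REQUIRED_SECTIONS = [
  "Project intro", "Skills", "Coding requirements", "Docs/ structure",
  "TODO.md convention", "Abstraction decisions", "Non-negotiables"
].freeze

def missing_sections(claude_md)
  # Collect level-2 headings ("## Title"); deeper "###" headings are ignored.
  present = claude_md.scan(/^##[ \t]+(.+?)[ \t]*$/).flatten.map(&:downcase)
  REQUIRED_SECTIONS.reject { |section| present.include?(section.downcase) }
end
```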

Documentation

/update-readme

Audits the codebase and regenerates domain documentation. Rewrites README.md with exactly three sections. Writes one domain document per domain found in the codebase.

Content Rules

Model class names must be exact — reads the class declaration
Method signatures must be verified — reads the method definition
Does not document what doesn't exist
Omits Rails boilerplate
technical.md explains significant choices and why they were made

Skills

Reference documents loaded at session start

Skills contain decisions already made — patterns, rules, format contracts — that agents would otherwise have to re-derive from training data or make up on the spot. Each skill becomes a separate system block with its own cache control.

Skill	discovery	architect	design	engineer	code-review	security	performance	orchestrator	log-analyst	skill-builder
agent-log	✓	✓	✓	✓	✓	✓	✓	✓	—	✓
rails-principles	—	✓	—	✓	✓	✓	✓	—	—	—
design-system	—	✓	✓	✓	✓	—	—	—	—	—
architect-spec-format	—	✓	—	—	✓	✓	✓	—	—	—
discovery-brief-format	✓	✓	—	—	—	—	—	—	—	—

agent-log

.claude/skills/agent-log/SKILL.md · All agents except log-analyst

Complete CLI reference for bin/agent-log. Pure Ruby, no gems — works from any directory without Rails boot overhead.

CLI Commands

run start/end: Session lifecycle; the UUID is captured for all subsequent logging
decision: Non-trivial choice with rationale, alternatives, type, and expected outcome
event: External action; logs what it revealed, not just what it was
outcome: Closes the hypothesis/result loop by recording observed vs. expected outcome
finding: Review agents log issues with the standard category vocabulary
reflection: Struggles, assumptions, skill gaps; feeds cross-run aggregates
query: Reads runs, decisions, events, findings, reflections, skill-gaps, struggles

Example — engineer logging a decision

$ bin/agent-log decision \
  --title "Chose concerns over service objects" \
  --type implementation \
  --rationale "Three methods serve one feature — 3-method rule triggered" \
  --alternatives "Service object — rejected, no service objects per rails-principles" \
  --expected "Cleaner model layer, concern reusable across models"

Example — review agent logging a finding

$ bin/agent-log finding \
  --category N+1 \
  --severity NEEDS_WORK \
  --file app/views/documents/index.html.erb \
  --description "document.project called in loop — add includes(:project)"

rails-principles

.claude/skills/rails-principles/SKILL.md

The engineering principles the system builds to. Architect designs to these, engineer builds to these, code-review checks against these. Convention as the interface. Rails-first. No service objects. Seven controller actions only.

architect-spec-format

.claude/skills/architect-spec-format/SKILL.md

The 12-section feature specification contract. A spec is not complete if any section is empty or vague, or if Open Questions is non-empty. Review agents load this to evaluate whether the engineer implemented what the spec actually specified.

discovery-brief-format

.claude/skills/discovery-brief-format/SKILL.md

The handoff contract between discovery and architect. Defines what a complete brief contains, what must come from the user vs. what the architect derives from the codebase. Documents common brief failures so discovery avoids them.

Artifact Templates

What each agent produces

Every artifact is a markdown file with a defined section contract. Nothing advances until the previous artifact exists and every required section is complete.

NNN.01-dis-{feature-name}.md · Written only after explicit user confirmation of the structured summary

## Problem Statement

What is broken, missing, or possible that prompted this feature. Written from the user's perspective, not the system's.

## Who Is This For

Which user type or role. Authorization implications named explicitly here before the architect touches the codebase.

## Data and Authorization Scope

What data this feature touches and who is allowed to touch it. The security-review agent reads this to validate scope traversal.

## AI Generation Operations

Explicitly Yes or No. If Yes: what is generated, what triggers it, streaming vs. batch, and how errors are handled. Required even if the answer is No.

## What They Can Do

The full capability list — every action the user can take. Written as capabilities, not implementation steps.

## Key Scenarios

The 3–5 most important user journeys through this feature, including the unhappy paths that matter most.

## Decisions Made

Every scope and direction decision settled during discovery. Kept separate from Open Questions: these are closed.

## Directions Rejected

What was considered and ruled out, with a one-line reason. Prevents the architect from re-exploring dead ends.

## Known Edge Cases

Edge cases the user already knows about. The architect will find more during the codebase audit.

## Explicit Out of Scope

What this feature deliberately does not do. Prevents scope creep in the architect and engineer stages.

## Open Questions for the Architect

Unresolved questions that require codebase knowledge to answer. The architect must resolve all of these before handing to engineering.

## Agent Notes

Populated from querying gap decisions in the log — what the discovery agent struggled with or assumed. Visible to the architect.

## For the Architect

Closing synthesis: confidence levels per section, what to verify first in the codebase, which assumptions need confirming before spec work begins.

NNN.02-arc-{feature-name}.md · Open Questions must be empty before this file is handed to engineering

## Goal

One sentence. What this feature achieves from the system's perspective.

## Background

Context the engineer needs to understand why this feature exists and what it connects to in the existing system.

## User Stories

As a [role], I can [action], so that [outcome]. Written for every distinct capability.

## Acceptance Criteria

Binary pass/fail conditions only — no judgment required to evaluate. Every criterion maps to at least one test.

## Scope — In / Out

Two lists. What is explicitly included. What is explicitly excluded. Prevents scope drift during implementation.

## Domain Model

Model names and relationships only — no column names, types, or index strategy. Those are engineering decisions. The architect specifies structure; the engineer specifies storage.

## Behavioral Constraints

Non-negotiable. The engineer chooses HOW; these constrain WHAT. Authorization rules, data integrity rules, and UX contracts live here.

## Routes / Resource Shape

RESTful resource structure. URL grammar must be consistent with existing routes — verified against config/routes.rb during the codebase audit.

## JSON API Surface

Every controller action that needs a jbuilder response. Listed explicitly as an exception to YAGNI, because AI consumers are real and likely.

## Refactoring Notes

Existing code the engineer should clean up or restructure as part of this feature. Not optional scope — planned work.

## Design Decisions

Every non-obvious architectural choice with rationale. No new patterns without justification here. Logged to agent-log with arch- IDs.

## Open Questions

Must be empty before handoff. If non-empty, the spec is not complete and cannot be handed to engineering.

## Agent Notes

What the architect struggled with or assumed — from querying gap decisions in the log.

## For the Engineer

Closing synthesis: what to read first, which existing code is most relevant, the highest-risk areas in the implementation plan.

NNN.03-des-{feature-name}.md · Every data field decision includes an explicit mono vs. sans class decision

## User Flow

Complete path map — every route through the feature including edge cases, error states, and empty states. Includes the unhappy paths.

## Screen Specifications

One subsection per screen. Layout regions named, data fields listed with explicit mono/sans class decisions, components identified, available actions enumerated.

## AI Generation Surfaces

Every surface that triggers or displays AI generation must specify: trigger state (visually distinct), loading state (incremental DOM updates — never batch-replace on completion), and three distinct error states (timeout, provider error, rate limit).

## State Design

Empty, loading, error, and success states for every view that can vary. Empty states include a call-to-action — never a dead end.

## Component Inventory

Every UI component used, with the exact class or helper name from the design system. New components flagged as additions needed.

## Design Decisions

Every layout or UX choice that required judgment, with rationale. Deviations from existing patterns explicitly justified.

## Agent Notes

Design struggles and assumptions from querying gap decisions — visible to the engineer before implementation begins.

NNN.{SEQ}-eng-{feature-name}.md · Primary artifact the review agents read; written before reviews run, not after

## What Was Built

Plain summary of what was implemented. Models, controllers, views, jobs, concerns — what exists now that didn't before.

## Key Decisions

Every non-trivial implementation choice with its agent-log decision ID (eng-NNN-NNN). The reasoning is in the database; this section surfaces the IDs so reviewers can cross-reference.

## Deviations from Spec

What the spec said vs. what was implemented and why. Required even if the list is empty — an empty section means the engineer confirmed full compliance.

## What Was Hard

Honest account of implementation difficulty. Surfaces where the spec was ambiguous, where Rails made something harder than expected, where judgment calls had to be made.

## Spec Quality Assessment

Direct feedback to the architect with a 1–10 score. What the spec got right. What was missing or ambiguous. Feeds the log-analyst's spec template gap detection.

## Assumptions Made

Populated from querying gap decisions in the log before writing this section. Every assumption that wasn't confirmed by the spec.

## For the Review Agents

The engineer's own assessment of deliberate tradeoffs and areas of concern. Where the engineer chose speed over purity. Where edge cases were deferred. The reviewers read this first.

NNN.{SEQ}-{cr|sec|perf}-{feature-name}.md · All three run in parallel; the orchestrator waits for all three before evaluating the combined verdict

code-review report

01 Test Suite: runs bin/rails test; failure = NEEDS WORK

02 Pattern Consistency: pattern impact statement required

03 Test Coverage: every branch, every acceptance criterion

04 Rails Conventions: callbacks, scopes, concerns, helpers

05 Naming Quality: no abbreviations, no generic names

06 Data Integrity: validations backed by DB constraints

07 Fat Controller Check: conditional logic = violation

08 Dead Artifacts: pry, TODOs, commented-out code

09 Design System Compliance: mono/sans, jbuilder present

10 Comments: missing on hard code; unnecessary on obvious code

Overall Verdict

PASS · PASS WITH NOTES · NEEDS WORK
A competing pattern is always NEEDS WORK — no PASS WITH NOTES path

security-review report

01 Brakeman: every HIGH confidence warning is NEEDS WORK

02 Authorization Scope: user → project → document traversal

03 Mass Assignment: strong params, no permit!, no owner IDs

04 XSS: html_safe and raw() without sanitization = vuln

05 SQL Injection: string interpolation in SQL fragments

06 Sensitive Data: logs, responses, error messages checked

07 CSRF: protect_from_forgery active, no unexplained skips

Overall Verdict

PASS · PASS WITH NOTES · NEEDS WORK
False positives must be explained with full rationale

performance-review report

01 N+1 Queries: reads view files even when not in the diff

02 Index Coverage: FK, WHERE, ORDER, GROUP columns checked

03 Blocking Callbacks: anything >100ms must be a background job

04 Unscoped Queries: .all, no pagination, full objects vs. attrs

05 Missing Constraints: presence without NOT NULL, uniqueness without index

Overall Verdict

PASS · PASS WITH NOTES · NEEDS WORK
Bounded collections may be PASS WITH NOTES with explicit reasoning
