npm - @jamie-tam/forge - Versions diffs - 6.0.0 - Mend

@jamie-tam/forge 6.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (213) hide show

package/LICENSE ADDED Viewed

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 forge contributors
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

package/README.md ADDED Viewed

@@ -0,0 +1,389 @@
+# forge
+**A production-grade `.claude` harness for prototype-driven AI development**
+<!-- Badges -->
+<!-- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) -->
+<!-- [![Claude Code](https://img.shields.io/badge/Claude%20Code-compatible-blue.svg)](#) -->
+---
+## What is this?
+forge is a complete `.claude` folder that turns Claude Code into a structured development workflow engine. It covers the path from idea to production via a **prototype-driven SDLC** (concept slides → wireframe → working prototype → codify → production build → deliver), with quality gates that apply where they actually catch bugs — at the prototype-to-production transition. Drop it into any project and have an opinionated, battle-tested development process from day one.
+Release history: see `CHANGELOG.md`.
+## What's new in v6
+v6 is a substantial redesign driven by dogfood evidence. Two named pillars:
+1. **Prototype-driven SDLC** — replaces v4/v5's plan-driven pipeline. The artifact escalation (slides → wireframe → prototype → codify → production) puts iteration before code-perfection. Most phases are *softened*: trivial work skips them; the work manifest's `phase_plan` tracks deviations explicitly.
+2. **`aiwiki/` knowledge layer** — typed pages (decisions / gotchas / conventions / architecture / sessions) with real-time LINT on every write and async `dream` consolidation at phase-close. Dream proposals land in `aiwiki/proposed/` for user review (atomic swap on accept, discard on reject).
+The v6 schema (`templates/manifests/v6/SCHEMA.md`) adds `phase_plan:` at the top of every work manifest — six plan-status values (`active`, `active-light`, `active-commit-only`, `skipped`, `as-discovered`, `complete-inline`) cover the patterns dogfood surfaced.
+## Key Features
+- **30 skills** spanning the prototype-driven pipeline plus production-grade quality gates
+- **13 commands** for discovery, setup, workflows, evolution, and validation
+- **17 specialized agents** with explicit phase-context (prototype-builder for Phase 4, builder for Phase 6, plus architects, reviewers, hunters, dreamers)
+- **8 common rules** tiered by phase (always-on safety floor + phase-conditional from codify onward) + **7 common references** for skill authoring, agent coordination, and workflow protocol
+- **Work manifests** with `phase_plan` (preflight intent), `phases` (gate state), `slice_graph` (build decomposition), and `artifacts` (phase outputs + lock signals)
+- **`aiwiki/` typed pages** with LINT on every write + async dream consolidation
+- **Quality gates** at the prototype-to-production transition where they catch real bugs, not premature plan-phase gates
+- **Self-learning** via the gotcha system that captures lessons and promotes them to rules when patterns repeat
+---
+## Installation
+```bash
+npx @jamie-tam/forge init
+```
+This installs everything into your project's `.claude/` directory:
+- **Rules** — phase-tiered (always-on safety floor + phase-conditional standards from codify onward)
+- **Skills** — 30 atomic skills covering the prototype-driven pipeline + production gates
+- **Commands** — 8 slash commands for setup, workflows, evolution, and validation
+- **Agents** — 17 specialized subagents with MCP server access (context7, playwright)
+- **Hooks** — gate-enforcer, wiki-lint, telemetry, state preservation, session checkpoints
+- **CLAUDE.md** — project configuration template
+- **`aiwiki/` scaffold** — typed-page schemas, LINT config, usage rules (populated on first work item)
+### Update
+```bash
+npx @jamie-tam/forge update           # sync to latest version
+npx @jamie-tam/forge update --dry-run # preview what would change
+```
+Syncs forge assets to the latest version. Detects user-modified files and warns before overwriting. Cleans stale files from previous versions. CLAUDE.md is always preserved.
+### Uninstall
+```bash
+npx @jamie-tam/forge uninstall           # remove forge, keep user files + state
+npx @jamie-tam/forge uninstall --force   # also remove CLAUDE.md and AGENTS.md
+npx @jamie-tam/forge uninstall --purge   # also remove .forge/ state directory
+npx @jamie-tam/forge uninstall --dry-run # preview what would be removed
+```
+Only removes forge-managed files — your custom skills, agents, and rules are preserved. Uses the installed file manifest for precise removal.
+### After Installation
+```bash
+# In Claude Code, run /setup to detect your stack and fill .claude/CLAUDE.md
+/setup
+# Then start building
+/feature "add user authentication"
+```
+---
+## For AI Agents
+If you are an LLM agent (Claude Code, Cursor, Windsurf, etc.) helping a user set up, update, or remove forge, follow these instructions.
+### Install
+```bash
+npx @jamie-tam/forge init
+```
+- Run from the root of the user's project directory.
+- Requires Node.js 18+. Verify with `node --version` before running.
+- After install, verify `.claude/` directory exists and contains `rules/`, `skills/`, `commands/`, `agents/`.
+- Then run the `/setup` slash command to detect the project stack and fill `.claude/CLAUDE.md`.
+### Update
+```bash
+npx @jamie-tam/forge update --dry-run   # preview first
+npx @jamie-tam/forge update             # apply
+```
+- Always run `--dry-run` first and show the user what will change.
+- User-modified files are preserved automatically. If `--force` is passed, all files are overwritten.
+- CLAUDE.md is never overwritten.
+### Uninstall
+```bash
+npx @jamie-tam/forge uninstall --dry-run   # preview first
+npx @jamie-tam/forge uninstall             # apply
+```
+- Always run `--dry-run` first and confirm with the user before removing.
+- Default uninstall only removes forge-managed files. User-created skills, agents, and rules are preserved.
+- `--force` also removes CLAUDE.md and AGENTS.md. `--purge` also removes `.forge/` state. Only use these if the user explicitly asks.
+---
+## Prerequisites
+- **Claude Code** — Required. [Get started](https://claude.ai/claude-code)
+- **Node.js 18+** — Required for `npx @jamie-tam/forge init`
+## Recommended Tools (Optional)
+Forge detects these tools automatically and offers integration when available. Without them, forge works exactly the same — they add capabilities, not requirements.
+| Tool | What It Adds | Install |
+|------|-------------|---------|
+| **OpenAI Codex** | Second-opinion reviews across 10 skills (code review, security audit, architecture verification), alternative TDD builder | [Codex plugin for Claude Code](https://github.com/openai/codex-plugin-cc) |
+| **Graphify** | Knowledge graph for codebase navigation across 5 skills, up to 283x token reduction on large repos, feeds into Codex for maximum gap detection | `pip install graphifyy` (open source) |
+> **Codex:** When detected, forge offers a second opinion at quality gates and planning reviews. You choose per-feature whether to use it (`always`, `ask each time`, or `never`). Codex excels at gap detection — finding unused code, mismatched limits, and authorization inconsistencies. See `protocols/codex.md`.
+> **Graphify:** When `graphify-out/graph.json` exists in your project, forge automatically enriches codebase analysis with community structure, god nodes, and dependency edges. If the graphify CLI is also installed, skills use targeted graph queries during security audits, code reviews, and debugging. Graph context feeds into Codex for the highest-impact combination. See `protocols/graphify.md`.
+---
+## Architecture
+```
+Commands (workflows)          Skills (prefix-grouped)              Rules (phase-tiered)
+---------------------         -----------------------              --------------------
+/forge   (discovery)          concept-slides     [Phase 1]         Always-on (safety floor):
+/setup                        build-wireframe    [Phase 2]           common/security
+/feature                      build-prototype    [Phase 3]           common/guardrails
+/greenfield                   iterate-prototype  [Phase 4]           common/verification
+/bugfix                       harden             [Phase 5]           common/skill-selection
+/refactor                     build-scaffold                       Phase-conditional (from codify):
+/hotfix                       build-tdd          [Phase 6]            common/testing
+/evolve                       build-pr-workflow                       common/quality-gates
+/validate                     quality-code-review                     common/git-workflow
+                              quality-test-plan
+Work Manifests (v6)           quality-test-execution               References (load on demand)
+-------------------           quality-security-audit               ----------------------------
+.forge/work/                  quality-uiux                         common/
+  {type}/                     deliver-deploy     [Phase 7]           agent-coordination
+    {name}/                   deliver-db-migration                   io-protocol
+      manifest.yaml           deliver-onboarding                     skill-authoring
+        schema_version: "6"   discover-requirements                  skill-compliance
+        phase_plan: {…}       discover-codebase-analysis             phases (canonical vocab)
+        phases: {…}           plan-brainstorm                        coding-standards
+        slice_graph: {…}      plan-architecture                    typescript/  python/  react/
+        artifacts: {…}        plan-design-system                     [extensible]
+                              plan-task-decompose
+aiwiki/ (knowledge layer)     support-system-guide                 Protocols (read on demand)
+-------------------------     support-debug                        --------------------------
+  decisions/                  support-gotcha                       protocols/codex.md
+  gotchas/                    support-wiki-lint                    protocols/graphify.md
+  conventions/                support-runtime-reachability
+  architecture/               support-dream                        Hooks
+  sessions/                   support-skill-validator              ------
+  raw/                                                             gate-enforcer
+  proposed/  (dream queue)                                         wiki-lint
+  schemas/                                                         session-start / pre-compact
+  CLAUDE.md  (usage rules)                                         telemetry
+```
+---
+## Commands Overview
+| Command | Description | When to Use |
+|---------|-------------|-------------|
+| `/forge` | One-screen orientation -- installed capabilities, active work, suggested next | Run this first if you're new, or any time you want an overview |
+| `/setup` | Detect project stack, install matching language rules, and fill the project profile | Right after `npx @jamie-tam/forge init` |
+| `/feature` | Full feature workflow with quality gates at every transition | Adding a feature to an existing codebase |
+| `/greenfield` | New project from zero -- includes scaffolding and onboarding docs | Starting a brand-new project |
+| `/bugfix` | Systematic debug, TDD fix, code review, and PR | Something is broken and needs fixing |
+| `/refactor` | Analyze, plan approach, restructure with tests, PR | Code needs cleanup without new functionality |
+| `/hotfix` | Emergency compressed flow -- debug, minimal test, deploy | Production is down, need a fix now |
+| `/evolve` | Review gotchas, improve skills, validate changes | Improving the system itself after a project cycle |
+| `/validate` | Run 5 consistency checks across all skills, rules, and commands | After customizing skills or running `/evolve` |
+---
+## Skills Overview
+### Prototype-driven SDLC (Phases 1-5) — v6 new
+| Skill | Phase | Description |
+|-------|-------|-------------|
+| `concept-slides` | 1 | Low-fidelity marp slide deck capturing the product idea (hook, sub-concepts, journey, deferrals) |
+| `build-wireframe` | 2 | Single-HTML annotated wireframe with click-through demo and callouts; backend-heavy products get a verification-UI sketch |
+| `build-prototype` | 3 | Vite + React + TS + Tailwind v4 + Zustand v5 scaffold with parallel partition builders |
+| `iterate-prototype` | 4 | Polish loop with feedback file + 5-cycle drift checks; captures gotchas + conventions as side effects |
+| `harden` | 5 | Codifies the locked prototype into architecture files + ADRs (with `/second-opinion` review) + slice graph + tasks |
+### Discover & Plan (fallback for non-prototype work)
+| Skill | Description |
+|-------|-------------|
+| `discover-requirements` | Transform raw input (meetings, emails, screenshots) into structured requirements |
+| `discover-codebase-analysis` | Map existing code patterns, dependencies, and conventions before making changes |
+| `plan-brainstorm` | Explore 2-3 approaches with tradeoffs, reach user-approved decision (non-prototype fallback) |
+| `plan-architecture` | Generate API contracts, DB schemas, system diagrams (non-prototype fallback) |
+| `plan-task-decompose` | Break work into 2-5 minute implementable tasks with dependency mapping |
+| `plan-design-system` | Establish visual direction (theme, colors, spacing, typography, component specs) — independent of prototype path |
+### Build (Phase 6 — production)
+| Skill | Description |
+|-------|-------------|
+| `build-scaffold` | Set up new project structure, tooling, Docker, and initial tests |
+| `build-tdd` | Strict RED-GREEN-REFACTOR cycle — no production code without a failing test. Dispatches `builder` agent or runs inline |
+| `build-pr-workflow` | Git worktree management, atomic PR-per-function, manifest linking |
+### Quality (gates at the prototype-to-production transition)
+| Skill | Description |
+|-------|-------------|
+| `quality-code-review` | Multi-stage review chain: safety → craft → runtime reachability → gotcha-hunter (→ optional Codex adversarial) |
+| `quality-test-plan` | Generate comprehensive test strategy: unit, integration, E2E, smoke, load, contract |
+| `quality-test-execution` | Execute ALL tests from the plan — no skipping any type or scenario |
+| `quality-security-audit` | OWASP Top 10, secrets scanning, dependency vulnerabilities, Codex + graphify integration |
+| `quality-uiux` | Accessibility, responsive design, component reuse, library recommendations |
+### Deliver (Phase 7)
+| Skill | Description |
+|-------|-------------|
+| `deliver-deploy` | Pre-deploy verification, staging smoke tests, deploy, post-deploy verification |
+| `deliver-db-migration` | Safe migration with rollback scripts and forward+rollback+forward testing |
+| `deliver-onboarding` | Generate getting-started guide, architecture overview, testing guide, common tasks |
+### Support (cross-cutting)
+| Skill | Description |
+|-------|-------------|
+| `support-system-guide` | Meta-skill: guides which command/skill to use, manages `.forge/` + `aiwiki/` directories |
+| `support-debug` | Systematic 4-phase debugging with 3-fix threshold and fact-checking |
+| `support-gotcha` | Self-learning from mistakes — captures lessons to `aiwiki/gotchas/`, auto-promotes after N=3 occurrences |
+| `support-wiki-lint` | Validates `aiwiki/` page schemas + citations on every write (hook-driven); backfills `@<sha7>` citation hashes |
+| `support-dream` | Async wiki consolidation (merge/promote/prune) at phase-close / pre-compact / manual. Writes to `aiwiki/proposed/` for user review |
+| `support-runtime-reachability` | Detects orphaned exports (code with no production caller) — the gate that catches "tests pass but feature isn't wired up" |
+| `support-skill-validator` | Validate skill consistency: conflicts, I/O gaps, overlaps, gate coverage, drift |
+---
+## How It Works
+A standard-size `/feature "add auth"` on an existing prototype-driven codebase flows like this (preflight collapses phases based on scope):
+```
+User: /feature "add auth"
+  |
+  +-- Step 0: Repo-state detection
+  |     Prototype mode? → offer redirect to iterate-prototype
+  |     Production mode? → proceed
+  +-- Step 0a: Preflight checks (git clean, tests pass)
+  +-- Step 0b: aiwiki/ scaffold if missing
+  +-- Step 1: phase_plan — preflight planner proposes per-phase
+  |     active / active-light / skipped status; user confirms
+  +-- Create/resume manifest (.forge/work/feature/add-auth/manifest.yaml)
+  |
+  +-- Phase 2: [concept-slides] (if active)
+  |     Low-fidelity deck — what is this feature, where it sits
+  |   === GATE: concept locked → artifacts.concept.locked_at ===
+  |
+  +-- Phase 3: [build-wireframe] (if active)
+  |     Single-HTML mini-wireframe for the feature's screens
+  |   === GATE: wireframe locked → artifacts.wireframe.locked_at ===
+  |
+  +-- Phase 4: [build-prototype] + [iterate-prototype] (if active)
+  |     Working prototype skin into existing app; iterate to satisfaction
+  |     Side effects: capture gotchas + conventions to aiwiki/
+  |   === GATE: prototype locked → artifacts.prototype.locked_at ===
+  |
+  +-- Phase 5: [harden] (if active)
+  |     Codify prototype → aiwiki/architecture/, ADRs (with /second-opinion),
+  |     slice graph in manifest. Dream consolidates phase 4 captures.
+  |   === GATE: codify locked + user approves dream proposal ===
+  |
+  +-- Phase 6: [build-pr-workflow] worktree → per-slice:
+  |     [build-tdd] strict RED-GREEN-REFACTOR
+  |     [quality-code-review] safety → craft → runtime-reach → gotcha-hunter
+  |       (→ optional Codex adversarial cross-check)
+  |     [support-runtime-reachability] orphan-export gate (catches wiring gaps)
+  |     Per-slice dream consolidates slice-scoped wiki updates
+  |   === GATE: all slices terminal + code-review-final passes ===
+  |
+  +-- [quality-test-plan] + [quality-test-execution]
+  +-- [quality-uiux] (if frontend)
+  +-- [quality-code-review] final pass across full diff
+  |   === GATE: zero failures, full traceability ===
+  |
+  +-- Phase 7: [build-pr-workflow] create PR(s)
+  +-- [deliver-deploy] (optional)
+  +-- [deliver-onboarding] (for major features)
+  +-- [support-gotcha] retrospective; final dream consolidates
+```
+The manifest tracks plan, gate state, slice graph, and phase artifacts (with `locked_at` timestamps acting as lock signals). Wiki-lint fires on every `aiwiki/` write. Dream runs async at phase-close and pre-compact, writing proposals to `aiwiki/proposed/` for user review. If a session ends mid-feature, running `/feature "add auth"` again resumes from exactly where you left off — phases with `artifacts.{phase}.locked_at` set are skipped on resume.
+**Trivial features collapse phases.** A small UI tweak might run only Phase 6 + deliver, with concept/wireframe/prototype/codify marked `skipped` in `phase_plan`. The preflight step decides per-feature.
+---
+## Comparison to Alternatives
+forge synthesizes the best patterns from four leading Claude Code configuration repos. For a detailed feature-by-feature comparison, see **[docs/COMPARISON.md](docs/COMPARISON.md)**.
+---
+## Provenance
+Every skill in forge has a documented origin -- what it was inspired by, what was adapted, and what is entirely new. See **[docs/PROVENANCE.md](docs/PROVENANCE.md)** for the full provenance table.
+---
+## Philosophy
+**Process over tools.** Skills enforce workflows, not just provide advice. Every skill has a defined process with concrete steps, not vague guidance.
+**Test-first discipline.** No production code without a failing test. RED-GREEN-REFACTOR is non-negotiable. Tests prove behavior, not just coverage numbers.
+**Self-improving.** The gotcha system captures lessons from every feature, every bug, every debugging session. When patterns repeat three times, they promote to rules. The system gets better the more you use it.
+**Quality over speed.** Quality gates enforce real criteria at every phase transition. You cannot skip from brainstorm to code without approved architecture. You cannot deploy with failing tests. The gates exist because cutting corners costs more time in the long run.
+**Adaptive to project.** The project profile in CLAUDE.md controls how skills behave. A startup prototype gets lighter reviews and skips load testing. A mission-critical system gets full security audits and multi-pass reviews. One system, tuned to your context.
+---
+## Quick Reference
+For a compact cheat sheet covering all commands, workflows, and skills, see **[docs/CHEATSHEET.md](docs/CHEATSHEET.md)**.
+---
+## Contributing
+Contributions are welcome. To contribute:
+1. Fork the repository
+2. Create a feature branch (`feat/your-improvement`)
+3. Make your changes
+4. Run `/validate` to ensure skill consistency
+5. Submit a pull request with a clear description of what changed and why
+If you are adding a new skill, follow the I/O protocol (skills with artifact handoffs should declare Requires and Produces; pure methodology skills may omit the I/O Contract) and ensure your skill does not conflict with existing ones.
+---
+## License
+[MIT](LICENSE)
+---
+## Credits
+forge stands on the shoulders of four excellent Claude Code configuration repositories:
+- **[gstack](https://github.com/garrytan/gstack)** by Garry Tan -- The "Boil the Lake" philosophy. Role-based virtual team, security auditing (`/cso`), deployment pipeline (`/ship`), and the idea that completeness is cheap with AI. Inspired our guardrails, security audit, and deploy skills.
+- **[superpowers](https://github.com/obra/superpowers)** by Jesse Vincent (obra) -- Test-first discipline and systematic process skills. The most rigorous TDD and debugging methodology available. Directly influenced our `build-tdd`, `support-debug`, `plan-brainstorm`, and code review approach.
+- **[everything-claude-code](https://github.com/AffaanMustafa/everything-claude-code)** (ECC) by Affaan Mustafa -- Battle-tested over 10+ months with 125+ skills and 28 agents. Layered rules, continuous learning via instincts, and enterprise-grade coverage. Inspired our gotcha system, onboarding skill, and layered rule architecture.
+- **[oh-my-claude-code](https://github.com/jcosta33/oh-my-claude-code)** by jcosta33 -- Slash command ergonomics and composable prompt engineering. Influenced our command routing, skill invocation patterns, and the overall developer experience of the slash command system.
+forge combines superpowers' process discipline, gstack's safety guardrails, ECC's layered rules, and oh-my-claude-code's slash command ergonomics -- then adds feature manifests, quality gates, I/O protocol, self-learning, and skill validation to create a reusable foundation for all projects.

package/agents/architect.md ADDED Viewed

@@ -0,0 +1,92 @@
+---
+name: architect
+color: blue
+description: "Designs and evaluates system architecture — API contracts, DB schemas, component design, and trade-off analysis. Use for complex designs touching 3+ components or introducing new patterns."
+tools: [Read, Glob, Grep, WebSearch, WebFetch]
+mcpServers: [plugin:context7:context7]
+model: opus
+effort: max
+---
+# Architect Agent
+You are a senior software architect evaluating system design decisions. You think in terms of trade-offs, not absolutes. Every architecture decision has costs and benefits — your job is to make those trade-offs explicit.
+## Evaluation Areas
+### 1. System Design
+- Is the component decomposition appropriate? (not too monolithic, not too fragmented)
+- Are component boundaries well-defined? (clear responsibilities, minimal overlap)
+- Is the communication pattern between components appropriate? (sync vs async, direct vs message bus)
+- Are there single points of failure?
+- Is the design testable? (can components be tested in isolation?)
+### 2. API Design
+- Are endpoints RESTful and consistent? (naming, HTTP methods, status codes)
+- Are request/response shapes well-typed and validated?
+- Is versioning considered?
+- Are error responses structured and informative?
+- Is pagination, filtering, and sorting handled for list endpoints?
+- Are rate limiting and authentication considered?
+### 3. Data Modeling
+- Is the schema normalized appropriately? (not over-normalized, not under-normalized)
+- Are indexes defined for common query patterns?
+- Are foreign key relationships correct?
+- Is data integrity enforced at the database level? (constraints, not just application logic)
+- Are there migration concerns? (data migration, not just schema migration)
+### 4. Scalability
+- Will this design handle 10x the current load?
+- Are there bottlenecks? (database, network, computation)
+- Can components scale independently?
+- Is caching appropriate? (what, where, invalidation strategy)
+- Are there stateful components that would complicate horizontal scaling?
+### 5. Maintainability
+- Is the design documented somewhere a new developer can find it (architecture file, ADR, or README)?
+- Are there implicit contracts that should be explicit?
+- Is the design consistent with the existing codebase patterns?
+- Will this design be easy to modify when requirements change?
+### 6. Security Architecture
+- Are trust boundaries clearly defined?
+- Is data encrypted at rest and in transit where needed?
+- Are authentication and authorization patterns consistent?
+- Are there data exposure risks in the API design?
+## Process
+1. **Read** all architecture artifacts and relevant code
+2. **Research** when uncertain — use WebSearch and WebFetch to verify architectural patterns, library choices, and best practices
+3. **Analyze** each area above
+4. **Present** findings with trade-off analysis, not just recommendations
+## Output Format
+For each finding:
+```
+AREA: {System Design / API Design / Data Modeling / Scalability / Maintainability / Security}
+SEVERITY: {Critical / Important / Consideration}
+FINDING: {What was observed}
+TRADE-OFF: {What are the options and their costs/benefits}
+RECOMMENDATION: {What I suggest and why}
+```
+End with:
+```
+ARCHITECTURE ASSESSMENT
+=======================
+Overall: SOUND / NEEDS REVISION / FUNDAMENTAL CONCERNS
+Key strengths: {what is done well}
+Key risks: {what needs attention}
+Recommended changes: {prioritized list}
+```
+## Rules
+- Think in trade-offs, not absolutes. "It depends" is a valid answer if you explain what it depends on.
+- When recommending a pattern or library, verify it is current and well-maintained (use WebSearch).
+- Reference existing codebase patterns. Do not recommend patterns that contradict the established architecture without strong justification.
+- Every Critical finding must include a concrete alternative design.
+- Do not over-engineer. A simple design that works is better than a complex design that might scale.

package/agents/builder.md ADDED Viewed

@@ -0,0 +1,122 @@
+---
+name: builder
+color: green
+description: "Implements features, fixes, and refactors using strict TDD. Use when a task needs production code + tests written."
+tools: [Read, Edit, Write, Glob, Grep, Bash]
+mcpServers: [plugin:context7:context7]
+model: opus
+effort: xhigh
+---
+# Builder Agent
+You are an implementation-focused developer. You receive a task with exact file paths, code examples, and expected outcomes. You execute using strict Test-Driven Development. The dispatch prompt carries task scope and artifacts only; the TDD methodology lives here.
+Start immediately. No acknowledgments. Dense output over verbose.
+## Phase Context
+This agent runs at **Phase 6 (production-build)** — after `harden` has codified the architecture and slice graph. The Iron Law below (no production code without a failing test) is unconditional within this phase.
+**Phase 4 (prototype) is the exception** — `prototype-builder` is the prototype-phase implementation agent and intentionally does NOT enforce TDD. Prototype iteration uses manual click-through verification, not test coverage. If you find yourself dispatched against a prototype-phase task (`phase_plan.production-build: skipped` or work path under `pocs/`), surface the mismatch and stop — the caller should be invoking `prototype-builder`, not `builder`.
+## What You Receive
+Each task includes:
+- **Task description** with exact file paths and code
+- **Architecture artifacts** (API contracts, DB schema) — provided inline in your prompt since `.forge/` is not in version control
+- **Codebase conventions** from the codebase analysis
+- **Dependencies** — which tasks must complete before yours
+- **Test oracles** — assertions the slice must satisfy (file paths, error codes, state transitions), produced by `harden` Step 2.5 and stored at `aiwiki/oracles/{slug}.md` for the slice you are building
+## Oracles Are Load-Bearing
+Before writing a RED test, load `aiwiki/oracles/{slug}.md` for the slice being built. Production code must satisfy every oracle assertion listed there. If oracles are missing for a slice that crosses a wiring boundary (CLI ↔ filesystem, process exec, manifest I/O, external services), halt and surface to the user — do not invent oracles or proceed against an empty oracle file. The harden phase owes you that artifact; missing oracles means harden has not closed, and silently filling the gap re-creates the wiring-blindness that mock-heavy TDD has historically caused.
+## Investigation (before coding)
+Classify: **Trivial** (single file) | **Scoped** (2-5 files) | **Complex** (multi-system).
+For non-trivial tasks, investigate first:
+1. Glob to map relevant files, Grep to find existing patterns, Read to understand context
+2. Answer: What patterns does this codebase use? What tests exist? What could break?
+Skip investigation for trivial single-file tasks.
+## TDD Guardrails
+The Iron Law:
+```
+NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST
+```
+If code for this behavior exists before RED, delete it and re-implement from the test.
+Do not keep it as reference.
+Do not adapt it incrementally while writing the test.
+Verify RED before GREEN:
+- The target test fails.
+- It fails because the feature or fix is missing.
+- It does not fail because of syntax errors, import errors, or broken test setup.
+- If it fails for the wrong reason, fix the test/setup first and rerun RED.
+Use architecture artifacts as constraints:
+- API contracts define the shapes, status codes, and errors your unit tests should assert.
+- DB schema artifacts define constraints and invariants your tests must respect.
+- Integration tests are required at every module boundary where the slice crosses CLI, filesystem, process execution, manifest I/O, or external services. Mocks are permitted ONLY for nondeterministic, slow, paid, or unavailable dependencies; every mocked boundary requires at least one contract or integration test elsewhere that exercises real wiring.
+## Execution Protocol
+For each task:
+1. **Read** the task and referenced files to understand context
+2. **Verify libraries** via context7 (MCP) before writing tests — never assume API signatures
+3. **RED** — write the failing test exactly as specified in the task
+4. **Verify RED** — run the test, confirm it fails because the behavior is missing, not because of syntax/import/setup errors
+5. **GREEN** — write minimal code to pass
+6. **Verify GREEN** — run the test, confirm all tests pass
+7. **REFACTOR** — clean up if needed, keep tests green
+8. **Commit** — one conventional commit per TDD cycle
+## Commit Discipline
+One commit per completed TDD cycle. Format: `type(scope): description`
+```bash
+# Stage specific files, never git add .
+git add src/services/payment.ts tests/services/payment.test.ts
+git commit -m "feat(payments): add createPayment with validation"
+```
+The commit message describes the **behavior added**, not the test written.
+## Rules
+- Follow the TDD protocol in this file exactly
+- Do NOT write production code before a failing test
+- Do NOT add features beyond what the task specifies
+- Do NOT modify files outside the task's scope
+- Do NOT skip test verification steps (RED and GREEN must both be confirmed by running tests)
+- When uncertain about a library API, verify via context7 — never guess
+- If the task references an architecture contract, use it to set unit-test expectations and validate your implementation matches it exactly
+- If you encounter a blocker (missing dependency, unclear requirement), stop and report — do not guess
+- Write minimal code to pass each test — no abstractions, helpers, or configurability beyond what the test requires
+- Match the codebase's existing patterns (naming, error handling, import style) — discover them during investigation
+## Failure Modes to Avoid
+- **Premature completion** — saying "done" before running verification
+- **Test hacks** — modifying tests to pass instead of fixing production code
+- **Scope creep** — fixing "while I'm here" issues in adjacent code
+- **Silent failure** — looping on the same broken approach without progress
+After 3 failed attempts on the same issue, stop and report with full context — do not keep guessing.
+## Output
+When complete, report:
+- Files created/modified
+- Tests written and their status
+- Commits made (hashes and messages)
+- Any issues encountered