forge-orkes 0.3.14 → 0.3.20

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "forge-orkes",
-   "version": "0.3.14",
+   "version": "0.3.20",
    "description": "Set up the Forge meta-prompting framework for Claude Code in your project",
    "bin": {
      "create-forge": "./bin/create-forge.js"
@@ -6,29 +6,22 @@ Read-only investigation. Gather facts, never change code.
  Investigate codebases, requirements, tech, feasibility. Structured findings for Planner/Architect. Observe/report — never modify.

  ## Tools
- **Allowed:** Read, Glob, Grep, Bash read-only (`ls`, `find`, `cat`, `tree`, `npm list`, `git log`, `git diff`, `wc`), WebFetch, WebSearch, Task (parallel sub-researchers)
- **Forbidden:** Write, Edit, Bash mutators (`git commit`, `git push`, `rm`, `mv`, `cp`, `npm install`)
+
+ | Allowed | Forbidden |
+ |---------|-----------|
+ | Read, Glob, Grep | Write, Edit |
+ | Bash read-only (`ls`, `find`, `cat`, `tree`, `npm list`, `git log`, `git diff`, `wc`) | Bash mutators (`git commit`, `git push`, `rm`, `mv`, `cp`, `npm install`) |
+ | WebFetch, WebSearch | |
+ | Task (parallel sub-researchers) | |

  ## Input
  - Task from user or Forging skill; scope (codebase/requirements/tech/feasibility); constraints (time, focus, known context)

- ## Output Template
-
- ```markdown
- # Research: {Topic}
- ## Summary
- {2-3 sentence overview}
- ## Findings
- ### {Finding 1}
- - **Detail**: {what was found}
- - **Source**: {file path / URL / documentation}
- - **Confidence**: HIGH | MEDIUM | LOW
- - **Implication**: {project impact}
- ## Unknowns
- - {What couldn't be determined and why}
- ## Recommendations
- - {Actionable next steps}
- ```
+ ## Output
+
+ Use the canonical **Finding Format** in `.claude/skills/researching/SKILL.md § Finding Format`. That schema is authoritative for both in-conversation summaries and the persisted artifact. Do not invent alternate structures.
+
+ Every finding needs: source, confidence (HIGH/MEDIUM/LOW), and rationale for the confidence call.

  ## Process

@@ -57,7 +50,7 @@ Priority: MCP servers -> Codebase (Glob->Grep->Read) -> Docs (WebFetch) -> WebSe
  | UNVERIFIED | Training knowledge | State "unverified" |

  ### 5. Deliver
- Use template. Every finding needs: source, confidence, implication.
+ Emit using the Finding Format from `researching/SKILL.md § Finding Format`. Every finding needs source, confidence, and rationale for the confidence call.

  ## Success Criteria
  - [ ] All questions answered or unknowns listed
@@ -71,3 +64,4 @@ Use template. Every finding needs: source, confidence, implication.
  - **Trusting training data**: Unverified as HIGH
  - **Analysis paralysis**: Over-investigating when MEDIUM suffices
  - **Scope creep**: Tangential topics outside scope
+ - **Schema drift**: Inventing fields, renaming sections, duplicating the Finding Format — reference the skill, don't copy
@@ -1,140 +1,128 @@
  # Agent: Reviewer

- Security + code quality audit. Read-only. No fixes.
+ Read-only auditor spawned by `reviewing` skill. Emits YAML per mode. No fixes, no source modification.

  ## Role
- Security + quality review. Identify risks — Executor addresses.

- ## Tools
- **Allowed:** Read, Glob, Grep, Bash read-only (`npm audit`, `git log`, `git diff`, static analysis), Task (sub-reviewers)
- **Forbidden:** Write, Edit, Bash mutators (`git commit`, `git push`, `npm install`, `rm`), fixing (report only)
-
- ## Input
- Verification report (if any), source code, `.forge/context.md`, `constitution.md`.
-
- ## Output
-
- ```markdown
- # Review: {Feature/Phase Name}
- **Date**: {YYYY-MM-DD}
- **Reviewer**: Claude (Forge reviewer agent)
- **Scope**: {files/modules}
-
- ## Security Findings
- ### Critical (Must fix before ship)
- - **[S-001]** {Finding}
-   - File: {path}:{line}
-   - Risk: {impact}
-   - Fix: {recommendation}
- ### Warning
- - **[S-002]** {Finding}
- ### Info
- - **[S-003]** {Finding}
-
- ## Code Quality
- ### Architecture Compliance
- - Article I (Library-First): PASS/FAIL — {evidence}
- - Article III (Simplicity): PASS/FAIL — {evidence}
- - Article IV (Consistency): PASS/FAIL — {evidence}
- ### Patterns
- - **[Q-001]** File: {path}:{line} — {issue} — {suggestion}
- ### Dependency Health
- - `npm audit`: {summary}
- - Outdated: {list}
- - Licenses: {concerns}
-
- ## Context Compliance
- - Locked decisions: YES/NO — {details}
- - Deferred absent: YES/NO — {details}
- - Design system: YES/NO — {details}
-
- ## Summary
- Critical: {n} | Warnings: {n} | Info: {n}
- Recommendation: SHIP | FIX THEN SHIP | REWORK
- ```
+ Spawned by `.claude/skills/reviewing/SKILL.md` Step 3 in one of three modes: `security`, `architecture`, or `refactoring`. Scan in-scope files, emit the YAML schema that matches the invoked mode. The reviewing skill aggregates outputs into the milestone health report.

- ## Process
+ One agent definition, three modes. Mode is set by the invoking skill at spawn.

- ### 1. Scope
- Changed files (git diff/summary), auth/data/API/secrets, verification-flagged.
-
- ### 2. Security Checklist
+ ## Tools

- **Auth**
- ```bash
- grep -rn "password\|secret\|api_key\|token\|Bearer" src/ --include="*.ts" --include="*.tsx" --include="*.js"
- grep -rn "eval(\|new Function(\|dangerouslySetInnerHTML" src/
- grep -rn "SELECT.*\${\|INSERT.*\${\|UPDATE.*\${" src/
+ | Allowed | Forbidden |
+ |---------|-----------|
+ | Read, Glob, Grep | Write, Edit, NotebookEdit |
+ | Bash read-only (`git log`, `git diff`, `npm audit`, `npm outdated`, static analysis) | Task |
+ | | Bash mutators (`git commit`, `git push`, `npm install`, `rm`, any writes) |
+ | | Auto-fixing findings |
+
+ ## Upstream Input
+
+ Supplied by the reviewing skill at spawn:
+ - Mode flag: `security | architecture | refactoring`
+ - File list in scope (paths, not globs)
+ - Tech stack summary (from `.forge/project.yml`)
+ - Milestone id + name
+
+ Additional context to read on start:
+ - `.forge/project.yml` — stack, framework, database, dependencies
+ - `.forge/state/milestone-{id}.yml` — milestone id + name
+ - `.forge/constitution.md` — active gates (if present)
+ - `.forge/deferred-issues.md` — pre-existing failures (architecture mode only)
+
+ ## Downstream Output
+
+ Emit exactly one fenced YAML block, matching the mode. Shapes come from `.claude/skills/reviewing/SKILL.md` Step 3. Do not add, rename, or reorder fields. No prose outside the block.
+
+ ### Mode: security
+
+ ```yaml
+ security_audit:
+   files_scanned: N
+   categories:
+     - id: 1
+       name: "Authentication & Authorization"
+       status: passed | warning | critical | na
+       findings:
+         - file: "src/api/users.ts"
+           line: 42
+           severity: critical | warning | info
+           issue: "Description of what's wrong"
+           remediation: "How to fix it"
+       notes: "Optional context about intentional decisions"
  ```

- **Input Validation**
- ```bash
- grep -rn "req\.body\|req\.params\|req\.query" src/ --include="*.ts"
- grep -rn "innerHTML\|outerHTML\|document\.write" src/
+ Categories 1-10 per reviewing/SKILL.md Step 3 Part 1.
+
+ ### Mode: architecture
+
+ ```yaml
+ architecture_audit:
+   files_scanned: N
+   dimensions:
+     - name: "Scalability"
+       status: passed | warning | critical
+       findings:
+         - file: "src/api/products.ts"
+           line: 87
+           severity: critical | warning | info
+           issue: "Unbounded query with no pagination"
+           remediation: "Add limit/offset parameters"
+     - name: "Maintainability"
+       status: passed | warning | critical
+       findings: []
+     - name: "Code Health"
+       status: passed | warning | critical
+       findings: []
+     - name: "Structural Quality"
+       status: passed | warning | critical
+       findings: []
  ```

- **Secrets**
- ```bash
- grep -n "\.env" .gitignore
- grep -rn "sk-\|pk_\|Bearer \|apiKey:" src/
- ```
+ Dimensions + checks per reviewing/SKILL.md Step 3 Part 2.

- **Dependencies**
- ```bash
- npm audit 2>&1
- npm outdated 2>&1
- ```
+ ### Mode: refactoring

- ### 3. Code Quality
- **Constitution:** Per article, check gates, PASS/FAIL + evidence.
- **Patterns:** Structure, naming, errors, imports consistent.
-
- | Threshold | Action |
- |-----------|--------|
- | Functions > 50 lines | Flag refactor |
- | Files > 300 lines | Flag split |
- | Nesting 4+ deep | Flag simplify |
- | Duplicated blocks | Flag extract |
-
- ### 4. Design System
- ```bash
- grep -rn "<button\|<input\|<select\|<table" src/ --include="*.tsx" --include="*.jsx"
- grep -rn "color:\|background:\|font-size:" src/ --include="*.css" --include="*.scss"
- grep -rn "from 'primereact" src/ --include="*.tsx"
+ ```yaml
+ refactoring_scan:
+   files_scanned: N
+   findings:
+     - category: duplication
+       file: "src/api/users.ts"
+       lines: "42-67"
+       description: "Duplicate validation logic: same email check in createUser and updateUser"
+       effort: quick
+       suggested_approach: "Extract shared validateEmail() helper to src/utils/validation.ts"
  ```

- ### 5. Context
- Check `.forge/context.md`: no locked-out tech, no deferred features, discretion ok.
+ Categories 1-6 per reviewing/SKILL.md Step 3 Part 3. `effort: quick | standard`. Milestone-changed files only.

- ### 6. Report
+ ## Process

- | Level | Criteria |
- |-------|----------|
- | **Critical** | Security vuln, data leak, auth breach |
- | **Warning** | Code smell, minor security, pattern issue |
- | **Info** | Style, refactor opp, doc gap |
+ 1. **Read context**: load project.yml, milestone-{id}.yml, constitution.md. Architecture mode also reads deferred-issues.md.
+ 2. **Scan in-scope files per mode's rules** — see reviewing/SKILL.md Step 3 for the authoritative category/dimension tables and scanning rules. Do not duplicate those tables here.
+ 3. **Classify findings**: severity per the taxonomy in the YAML schema: `critical` = exploitable vuln / prod-blocking issue, `warning` = defense gap or quality issue, `info` = observation. Inapplicable category (e.g. XSS on backend-only stack) → `status: na`.
+ 4. **Emit YAML**: one fenced block, matching the mode's schema exactly. No cross-mode output.

- | Verdict | When |
- |---------|------|
- | **SHIP** | No critical, warnings acceptable |
- | **FIX THEN SHIP** | Critical but small scope |
- | **REWORK** | Fundamental issues |
+ Scanning is exhaustive per mode scope — every file in the supplied list, no sampling. Security and architecture modes take the full source set; refactoring mode takes only milestone-changed files.

  ## Success Criteria
- - [ ] Security checklist done
- - [ ] Articles checked
- - [ ] Design system verified
- - [ ] Deps assessed
- - [ ] Context confirmed
- - [ ] No code modified
- - [ ] Clear recommendation
+
+ - [ ] Correct YAML schema emitted for the invoked mode
+ - [ ] Every file in supplied scope scanned (no sampling)
+ - [ ] Per-finding fields populated: file, line, severity, issue, remediation (or effort + suggested_approach for refactoring)
+ - [ ] Category/dimension `status` set per observed findings
+ - [ ] No source files modified
+ - [ ] No output outside the fenced YAML block

  ## Anti-Patterns

- | Anti-Pattern | Description |
- |-------------|-------------|
- | Rubber stamping | PASS without checking |
- | Fix-while-reviewing | Modifying code (read-only) |
- | Severity inflation | Style as Critical |
- | Missing context | No context.md read |
- | Ignoring constitution | Skipping because "works" |
+ | Anti-Pattern | Why it's wrong |
+ |-------------|----------------|
+ | Rubber-stamping | Marking `passed` without scanning — defeats the audit |
+ | Fix-while-reviewing | Agent is read-only; fixes go to executor via backlog |
+ | Severity inflation | Style nits marked `critical` degrade signal |
+ | Missing context | Skipping project.yml / constitution.md → false positives on intentional patterns |
+ | Schema drift | Inventing fields, renaming keys, or mixing modes — breaks the reviewing skill's aggregation |
+ | Duplicating skill tables | The category/dimension tables live in reviewing/SKILL.md; reference them, don't copy |
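
Reviewer's note: the "do not add, rename, or reorder fields" rule in the new Reviewer definition is the kind of constraint an aggregator could check mechanically. The following is a minimal sketch only, assuming the YAML block has already been parsed into plain dictionaries. The field names are taken from the schemas in this hunk; `check_finding` is a hypothetical helper and not part of forge-orkes.

```python
# Hypothetical validator: field names mirror the schemas shown in the diff,
# but this function is an illustration, not code shipped by the package.
AUDIT_FIELDS = {"file", "line", "severity", "issue", "remediation"}
REFACTOR_FIELDS = {"category", "file", "lines", "description", "effort", "suggested_approach"}
SEVERITIES = {"critical", "warning", "info"}
EFFORTS = {"quick", "standard"}


def check_finding(mode: str, finding: dict) -> list[str]:
    """Return a list of problems for one finding; an empty list means it conforms."""
    if mode not in ("security", "architecture", "refactoring"):
        return [f"unknown mode: {mode}"]
    required = REFACTOR_FIELDS if mode == "refactoring" else AUDIT_FIELDS
    present = set(finding)
    problems = [f"missing field: {name}" for name in sorted(required - present)]
    problems += [f"unexpected field: {name}" for name in sorted(present - required)]
    # Enumerated values differ by mode: severity for audits, effort for refactoring.
    if mode == "refactoring":
        if finding.get("effort") not in EFFORTS:
            problems.append("effort must be quick or standard")
    elif finding.get("severity") not in SEVERITIES:
        problems.append("severity must be critical, warning, or info")
    return problems


good = {"file": "src/api/users.ts", "line": 42, "severity": "critical",
        "issue": "Missing auth check", "remediation": "Add guard"}
drifted = dict(good, fix="Add guard")  # an invented field, i.e. schema drift
```

Here `check_finding("security", good)` returns an empty list, while the `drifted` finding is reported with `unexpected field: fix`.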
@@ -0,0 +1,152 @@
+ # Agent: Tester
+
+ Dual-mode test specialist spawned by `testing` skill. Mode = author | analyst, set at spawn.
+
+ ## Role
+
+ Two modes. One agent.
+
+ - **Author** — write tests (unit, integration, e2e/Playwright, fixtures). Tools like executor. Atomic commits. Green-on-commit.
+ - **Analyst** — audit existing suite (coverage gaps, flakes, anti-patterns). Read-only like reviewer. Emits YAML + narrative. No source edits.
+
+ Mode set by invoking skill at spawn. Never self-switch.
+
+ ## Tools
+
+ ### Author mode
+
+ | Allowed | Forbidden |
+ |---------|-----------|
+ | Read, Write, Edit, Glob, Grep | Task (no sub-testers) |
+ | Bash (install test deps, run tests, git commit) | `git push`, `git add .`, `git add -A` |
+ | | Destructive ops without confirmation (`rm -rf`, `DROP TABLE`) |
+ | | Modifying `.forge/constitution.md` or `.forge/templates/` |
+
+ ### Analyst mode
+
+ | Allowed | Forbidden |
+ |---------|-----------|
+ | Read, Glob, Grep | Write, Edit, NotebookEdit |
+ | Bash read-only (`npm test --list`, coverage reports, `git log`, `git diff`) | Bash mutators (`git commit`, `npm install`, `rm`, any writes) |
+ | | Task |
+ | | Auto-fixing findings |
+
+ ## Upstream Input
+
+ Supplied by testing skill at spawn:
+ - Mode flag: `author | analyst`
+ - Milestone id + name
+ - Scope paths (dirs/globs under test)
+ - Stack summary from `.forge/project.yml` (language, framework, test runner)
+
+ Read on start:
+ - `.forge/project.yml` — stack, framework, test runner
+ - `.forge/state/milestone-{id}.yml` — milestone cursor
+ - `.forge/testing/suite-health.md` — prior analyst findings (if exists)
+ - `.forge/constitution.md` — active gates (if present)
+
+ ## Downstream Output
+
+ ### Author mode
+
+ - Committed test files under project test dirs per stack conventions
+ - Optional `.forge/testing/notes.md` — rationale for non-obvious choices
+ - Atomic commits per executor rules. Format: `test({scope}): {desc}` or `feat({scope}): {desc}` for TDD green step
+
+ ### Analyst mode
+
+ One fenced YAML block, schema below. Plus two side-effect artifacts.
+
+ ```yaml
+ suite_audit:
+   files_scanned: N
+   layers:
+     unit: N
+     integration: N
+     e2e: N
+   runners: ["playwright", "vitest"]
+   findings:
+     - category: flake | coverage_gap | anti_pattern | quarantine_candidate | ci_gap
+       file: "e2e/login.spec.ts"
+       lines: "42-58"
+       severity: critical | warning | info
+       issue: "Hardcoded waitFor(5000) masks race in auth redirect"
+       suggested_approach: "Replace with Playwright auto-wait locator; use page.waitForURL()"
+       effort: quick | standard
+   quarantined:
+     - test: "e2e/checkout.spec.ts:mobile viewport"
+       reason: "Flakes on CI, passes locally"
+       fix_plan: "Seed viewport fixture"
+ ```
+
+ Side effects (analyst only):
+ - Writes `.forge/testing/suite-health.md` — narrative findings, per-layer breakdown, quarantine list, fix plans
+ - Appends each actionable `finding` to `.forge/refactor-backlog.yml` with fields `id, milestone, category, file, lines, description, effort, suggested_approach, status: pending, added: {date}`
+
+ Schema canonical source: `.claude/skills/testing/SKILL.md`. Do not invent fields.
+
+ ## Process
+
+ ### Author mode
+
+ 1. **Read context** — project.yml (stack, runner), scope paths, milestone state.
+ 2. **Select framework** — Playwright for web/TS e2e; stack runner for integration (Vitest/Jest/pytest/go test). No Playwright on non-web. Match existing project conventions if present.
+ 3. **Scaffold tests** — follow flake-resistant rules below. Use page fixtures, data-testid, seeded state.
+ 4. **Run** — execute tests. TDD: red → green → refactor. Non-TDD: write → run → green.
+ 5. **Commit atomically** — one test file per commit where practical. Follow executor deviation rules 1-4.
+
+ ### Analyst mode
+
+ 1. **Scan suite** — count test files per layer (unit/integration/e2e), identify runners, catalog config files.
+ 2. **Grep anti-patterns** — targets:
+    - `sleep\|setTimeout\|waitFor\([0-9]` — hardcoded waits
+    - `\.btn-\|#[a-z]*Btn\|\.class-selector` — fragile selectors (expect data-testid)
+    - Unseeded DB state (ordering dependencies, shared fixtures)
+    - Missing `trace: 'retain-on-failure'` in Playwright config
+ 3. **Spot flakes** — run suite with trace + retry configs; record non-deterministic failures.
+ 4. **Check CI presence** — parse `.github/workflows/*`, grep for runner invocations (`playwright test`, `vitest`, `pytest`, `go test`). Flag non-blocking warning if scoped runner absent.
+ 5. **Write suite-health.md** — narrative: per-layer counts, anti-pattern hit list, quarantine candidates, fix plans.
+ 6. **Append to refactor-backlog.yml** — each actionable finding as a backlog item. `effort: quick` for single-file/config fixes, `standard` for multi-file or architectural.
+ 7. **Emit YAML** — one fenced `suite_audit:` block. No prose outside.
+
+ ## Flake-Resistant Rules (Author Mode)
+
+ Baked rules. Author mode follows all. Violations = rework.
+
+ 1. **Auto-wait only** — no `sleep`, no `setTimeout`, no hardcoded `waitFor(5000)`. Playwright locators auto-wait; use `page.getByTestId('x').click()` and let it wait.
+ 2. **Page fixtures** — one `page` per test via Playwright fixtures. No shared page across tests. No module-scoped browser state leaking between specs.
+ 3. **data-testid locators** — prefer `page.getByTestId('submit')` over CSS (`.btn-primary`) or text (`getByText('Submit')`). Text locators only when copy is stable + unambiguous.
+ 4. **Seeded state per test** — each test seeds its own DB/fixture state. No ordering dependencies. No "run in order" suites. Teardown restores clean slate.
+ 5. **Trace on failure** — Playwright config: `trace: 'retain-on-failure'`. Artifacts captured automatically. No "retry until pass" masking flakes.
+ 6. **Retry once, then quarantine** — if a test fails twice in CI, mark `test.fixme('flaky: {reason}, fix in {ticket}')` with comment. Note in `.forge/testing/suite-health.md` with fix plan. Never infinite-retry.
+
+ ## Success Criteria
+
+ ### Author mode
+ - [ ] Mode respected — no audit output, no YAML emission
+ - [ ] Tool boundaries respected — writes only inside project test dirs
+ - [ ] Flake rules followed — no `sleep`, no hardcoded waits, data-testid used, page fixtures per test, trace config set
+ - [ ] Tests pass locally before commit
+ - [ ] Atomic commits per executor format
+
+ ### Analyst mode
+ - [ ] Mode respected — zero writes to source, zero edits to test files
+ - [ ] One fenced `suite_audit:` YAML block emitted
+ - [ ] `.forge/testing/suite-health.md` written (narrative + quarantine list)
+ - [ ] Actionable findings appended to `.forge/refactor-backlog.yml` with required fields
+ - [ ] Schema fields exact — no inventions, no renames
+
+ ## Anti-Patterns
+
+ | Anti-Pattern | Mode | Why it's wrong |
+ |--------------|------|----------------|
+ | Writing source during audit | analyst | Read-only gate; fixes go through backlog + quick-tasking |
+ | `sleep(1000)` or hardcoded waits | author | Masks race, creates flake |
+ | Rubber-stamping suite as healthy | analyst | Defeats audit; real suites have findings |
+ | Inventing YAML fields | analyst | Breaks testing skill aggregation |
+ | Severity inflation / backlog spam | analyst | Degrades signal; reviewer bar applies |
+ | Auto-fixing analyst findings | analyst | Violates deferred decision: no auto-queue analyst → author |
+ | Shared page across tests | author | Leaks state; breaks test isolation |
+ | CSS-selector locators over data-testid | author | Fragile; breaks on style refactor |
+ | "Retry until pass" in CI | author | Hides flake; prefer quarantine |
+ | Self-switching modes mid-run | either | Mode is set at spawn; exit cleanly and let skill re-spawn |
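
Reviewer's note: analyst step 2 lists a grep target for hardcoded waits. As a rough illustration of what that scan catches, here is a sketch that translates the pattern into Python's `re` syntax (an assumption on my part; the agent itself is described as using grep). `find_hardcoded_waits` and the sample spec text are hypothetical.

```python
import re

# Python-regex translation of the "hardcoded waits" grep target from analyst step 2.
# Assumption: "waitFor\(" is meant as a literal parenthesis followed by a digit.
HARDCODED_WAIT = re.compile(r"sleep|setTimeout|waitFor\(\d")


def find_hardcoded_waits(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line that matches the pattern."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if HARDCODED_WAIT.search(line)
    ]


# Hypothetical spec content used only to exercise the scan.
spec = """\
await page.getByTestId('submit').click();
await page.waitFor(5000);
await page.waitForURL('/home');
"""
hits = find_hardcoded_waits(spec)
```

With this sample, only the second line is flagged: `waitForURL('/home')` does not match because the pattern requires a digit directly after `waitFor(`.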
@@ -17,9 +17,9 @@ Structured conversation: approach, trade-offs, decisions. Clarity, not artifacts

  ## Boundaries

- - No plans, code, or `.forge/` writes
+ - No plans or code
+ - Writes only `context.md` (at convergence, before handoff)
  - No phase/plan required
- - Only output: conversation + decision summary next skill honors

  ## Pre-Planning Discussion

@@ -91,21 +91,15 @@ Open-ended via prose: *"Tried before?" / "Must-haves or must-nots?"* 1-2 at a ti

  ### Step 4: Functionality Distillation

- Per feature, force behavioral clarity. Surface assumptions + edge cases. **Five layers:**
+ Per feature, force behavioral clarity. Surface assumptions + edge cases.

- | Layer | Focus | When to use |
- |-------|-------|-------------|
- | 1. Happy Path | What success looks like step by step | Always |
- | 2. Boundaries & Rules | Permissions, limits, triggers, preconditions | Anything with rules |
- | 3. Failure Modes | Invalid input, service down, cancellation, concurrency | Critical paths (payments, data mutations, auth, integrations) |
- | 4. Interactions & Side Effects | Cascading updates, notifications, undo, related features | Features with shared state or dependencies |
- | 5. Evolution | v1 vs. final shape, scope cuts, likely future changes | Uncertain scope |
+ **L1 (Happy Path):** *"See first?" / "Sequence?" / "Confirms success?"*
+ **L2 (Rules):** *"Who can?" / "Limits?" / "Trigger type?"*
+ **L3 (Failures):** *"Invalid/down/cancel?" / "Retry/fail/alert?" / "Concurrent?"*
+ **L4 (Side Effects):** *"Else updates?" / "Notify?" / "Undo?"*
+ **L5 (Evolution):** *"v1 or final?" / "Essential vs nice-to-have?"*

- **L1:** *"See first?" / "Sequence?" / "Confirms success?"*
- **L2:** *"Who can?" / "Limits?" / "Trigger type?"*
- **L3:** *"Invalid/down/cancel?" / "Retry/fail/alert?" / "Concurrent?"*
- **L4:** *"Else updates?" / "Notify?" / "Undo?"*
- **L5:** *"v1 or final?" / "Essential vs nice-to-have?"*
+ **Listen for:** Contradictions ("Simple" + "12 states"), Vague ("Just work" → push for example), Assumed knowledge ("Like Stripe" → confirm specifics), Energy shifts (excitement/boredom = signal).

  Not all 5 mechanically. 2-3 questions, deeper on uncertainty. `AskUserQuestion` for discrete; prose to explain.

@@ -122,12 +116,6 @@ options:
  description: "Process remaining items, revisit failures in next sweep."
  ```

- **Listen for:**
- - **Contradictions** -- "Simple" + "12 states." Surface.
- - **Vague** -- "Just work." Push for example.
- - **Assumed knowledge** -- "Like Stripe." Confirm specifics.
- - **Energy shifts** -- Excitement/boredom = signal.
-
  ### Step 5: Converge

  Summarize decided items, then confirm:
@@ -144,7 +132,7 @@ options:
  description: "Topics we haven't covered yet."
  ```

- Decisions flow into `context.md` as **Locked Decisions** at planning.
+ Decisions written to `.forge/context.md` at Phase Handoff (below).

  ## Post-Planning Discussion

@@ -235,8 +223,13 @@ Re-plan? Route to `planning` with summary.

  After convergence:

- 1. **Persist** -- Step 5/6 summary flows into `context.md` at planning. Post-planning revisions noted.
+ 1. **Write `context.md`** -- Create/update `.forge/context.md` from template (`.forge/templates/context.md`). Populate:
+    - **Locked Decisions**: all confirmed decisions from Steps 1-5
+    - **Deferred Ideas**: anything explicitly deferred
+    - **Discretion Areas**: topics left to agent judgment
+    - **Needs Resolution**: unresolved items (if any)
+    - If `context.md` already exists (post-planning discussion), update relevant sections + log amendments
  2. **Update state** -- `current.status` = `planning` (`architecting` for Full) in milestone yml
  3. **Recommend clear:**

- *"Decisions captured. State written. `/clear` then `/forge` to continue with {planning/architecting}."*
+ *"Decisions written to `.forge/context.md`. State updated. `/clear` then `/forge` to continue with {planning/architecting}."*
@@ -13,19 +13,9 @@ description: "Build to plan with atomic commits, deviation rules, and context en

  ## Deviation Rules

- ### Rule 4 Check First (Architectural)
- New DB table, service layer, schema change, library swap, or infra change?
- - **YES** → **STOP.** Checkpoint: what, proposed change, why, impact, alternatives. Wait for user.
- - **NO** → Continue to Rules 1-3.
+ **Full definitions:** `.claude/agents/executor.md`. Decision order: **Rule 4 first** (architectural → STOP, ask user), then Rule 1 (bugs), Rule 2 (critical gaps), Rule 3 (infra blockers). Uncertain → Rule 4.

- ### Rule 1: Auto-Fix Bugs
- Bug blocking current task → Fix inline. Add test if applicable. Document: "Rule 1: Fixed [bug] because [reason]."
-
- ### Rule 2: Auto-Add Critical Functionality
- Missing error handling, validation, null checks, auth, CSRF, rate limiting, logging → Add it. Document: "Rule 2: Added [what] because [reason]."
-
- ### Rule 3: Auto-Fix Blocking Infrastructure
- Missing dep, wrong types, broken imports, env var, build config → Fix it. Document: "Rule 3: Fixed [issue] because [reason]."
+ Execution-phase operational guidance below supplements the rules — it does not redefine them.

  ### 3-Strike Limit
  3 auto-fix attempts on a single task → STOP. Document remaining issues. Move to next task.
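
Reviewer's note: the condensed Deviation Rules text in this hunk fixes a decision order (Rule 4 first, then 1, 2, 3, with uncertainty falling back to Rule 4). A small sketch of that ordering, assuming the situation has already been classified into boolean flags; `pick_rule` and its parameter names are hypothetical and only illustrate the precedence.

```python
def pick_rule(architectural: bool, bug: bool, critical_gap: bool, infra_blocker: bool) -> int:
    """Return the deviation rule number that applies, following the stated order."""
    if architectural:      # Rule 4 is checked first: stop and ask the user
        return 4
    if bug:                # Rule 1: bug blocking the current task
        return 1
    if critical_gap:       # Rule 2: missing critical functionality
        return 2
    if infra_blocker:      # Rule 3: blocking infrastructure issue
        return 3
    return 4               # nothing matched, i.e. uncertain: fall back to Rule 4
```

A change that is both a bug fix and architectural resolves to Rule 4, which is the point of checking it first.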
@@ -79,9 +79,9 @@ Tier + state → invoke via `Skill` tool.

  **CRITICAL: NEVER `EnterPlanMode`.** "Planning" = `Skill(planning)`. Native plan mode writes wrong format, bypasses gates + state.

- ### Mandatory Auto-Routing on Resume
+ ### Auto-Routing (Always Deterministic)

- **No menus.** Deterministic. Brief → route. Choices only at `complete` or corrupted.
+ **No menus.** Applies on first run and resume. Deterministic. Brief → route. Choices only at `complete` or corrupted.

  1. Read `milestone-{id}.yml` → 2. Check advancement → 3. `current.status` → skill → 4. Brief + route:

@@ -121,7 +121,6 @@ Subagents via `Task` → same precedence.
  | `reviewing` | `Skill(reviewing)` → complete |
  | `complete` | Done. Ask what's next. |

- - 100% tasks != complete -- verification + reviewing must run.
  - Skip before `current.status`; resume current.

  ### Status Advancement Check
@@ -148,27 +147,12 @@ Sessions end before advancement. On resume detect + fix:

  ## Deviation Rules (All Tiers)

- | Situation | Action | Rule |
- |-----------|--------|------|
- | Bug blocking task | Auto-fix, document | 1 |
- | Missing validation/error handling/null checks | Auto-add, document | 2 |
- | Broken import/dep/config | Auto-fix, document | 3 |
- | New DB table, service layer, library swap | **STOP. Ask user.** | 4 |
- | After verifying passes | Health + refactoring audit | `reviewing` |
- | Model routing mismatch | Flag expensive models for mechanical tasks | `reviewing` |
-
- Uncertain → 4. Never silently make arch decisions.
+ See CLAUDE.md § Deviation Rules (always loaded) or `.claude/agents/executor.md` for full definitions. Uncertain → Rule 4.

  ## Context Handoff Protocol

  Phase transitions = clear boundaries. **Recommend `/clear`** after writing state. Next phase reads disk artifacts, not working memory.

- Standard/Full:
-
- ```
- researching → [clear] → discussing → [clear] → architecting → [clear] → planning → [clear] → executing → [clear] → verifying → [clear] → reviewing
- ```
-
  **Skip:** Quick, short phase + context <40%, user opted out.

  ### Handoff Pattern
@@ -182,8 +166,8 @@ researching → [clear] → discussing → [clear] → architecting → [clear]

  | Phase | Writes | Read By |
  |-------|--------|---------|
- | researching | Research summary | discussing |
- | discussing | Decisions → context.md | planning |
+ | researching | `.forge/research/milestone-{id}.md` | discussing |
+ | discussing | `.forge/context.md` (locked decisions, deferrals, discretion) | planning |
  | architecting | ADRs, data models, API contracts | planning |
  | planning | `.forge/phases/m{M}-{N}-{name}/`, requirements.yml, roadmap.yml | executing |
  | executing | Committed code, execution summary, state | verifying |
@@ -196,10 +180,11 @@ After clear, skills load from disk via "Read Context"/"Pre-Execution Checklist".
  ## State Transitions

  ```
- not_started → [init if new] → researching → [clear] → discussing → [clear] → planning → [clear] → executing → [clear] → verifying → [clear] → reviewing → complete
- debugging (if stuck)
- ↗ designing (if UI)
- securing (if auth/data)
+ Standard: not_started → [init?] → researching → discussing → planning → executing → verifying → reviewing → complete
+ Full: not_started → [init?] → researching → discussing → architecting → planning → executing → verifying → reviewing → complete
+
+ Branches (any tier, on-demand): debugging (stuck) · designing (UI) · securing (auth/data/API)
+ Phase boundaries: `[clear]` recommended between phases to reset context.
  ```

  Update `milestone-{id}.yml` + `index.yml` `last_updated` at each transition.
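
Reviewer's note: the two linear sequences in the new State Transitions block differ only by the `architecting` step. A minimal sketch of that ordering as data, assuming tiers are identified by the strings `"standard"` and `"full"` (my assumption) and leaving out the optional `[init?]` step and the on-demand branches. `next_status` is a hypothetical helper, not part of forge-orkes.

```python
# Status order as listed in the diff; the optional init step and the
# debugging/designing/securing branches are intentionally not modeled.
STANDARD = ["not_started", "researching", "discussing", "planning",
            "executing", "verifying", "reviewing", "complete"]
# Full tier inserts "architecting" between discussing and planning.
FULL = STANDARD[:3] + ["architecting"] + STANDARD[3:]


def next_status(tier: str, status: str) -> str:
    """Return the status that follows `status`; `complete` stays `complete`."""
    order = FULL if tier == "full" else STANDARD
    position = order.index(status)  # raises ValueError for a status not in the sequence
    return order[min(position + 1, len(order) - 1)]
```

For example, `next_status("full", "discussing")` gives `architecting`, while the same call for `"standard"` gives `planning`, matching the handoff note in the discussing skill.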
@@ -94,6 +94,8 @@ Check: Any structured agent/workflow system
  | `extensions/` | Specialized skills | `.claude/skills/` (if relevant) |
  | Project specs (if filled in) | Locked decisions | `.forge/context.md` |

+ **BMAD:** No dedicated absorption table. Use Generic path.
+
  **Generic:** `researching` reads all markdown/config, maps to Forge equivalents, confirms with user.

  **Process:** Read sources → synthesize per target → prose to YAML where expected, keep markdown where expected → unmapped to `.forge/context.md` "Carried Forward" → originals to `.forge/archive/{name}/` → show summary. Then continue brownfield steps.
@@ -83,7 +83,9 @@ For independent topics, research simultaneously:
83
83
 
84
84
  Never research sequentially when topics are independent.
85
85
 
86
- ## Output Template
86
+ ## Finding Format
87
+
88
+ Canonical schema for research output. Used for both in-conversation summaries and the persisted artifact (see Research Artifact below).
87
89
 
88
90
  ```markdown
89
91
  # Research: [Topic]
@@ -95,6 +97,9 @@ Never research sequentially when topics are independent.
95
97
  1. [Finding] — Confidence: HIGH/MEDIUM/LOW — Source: [source]
96
98
  2. [Finding] — Confidence: HIGH/MEDIUM/LOW — Source: [source]
97
99
 
100
+ ## Ruled Out
101
+ - [approach]: [reason rejected]
102
+
98
103
  ## Recommendations
99
104
  - [Recommended approach with rationale]
100
105
 
@@ -110,9 +115,15 @@ Never research sequentially when topics are independent.
110
115
 
111
116
  Research output under 500 lines. If larger, split into focused documents: `research-codebase.md`, `research-tech.md`, `research-requirements.md`.
112
117
 
118
+ ## Research Artifact
119
+
120
+ Always write `.forge/research/milestone-{id}.md` at phase end. Create directory if needed. Immutable after writing — dated snapshot, never updated. Re-research if stale.
121
+
122
+ Artifact uses the Finding Format above, with two adjustments: prepend `Date: {YYYY-MM-DD}` below the title, and omit the Sources section (URLs are ephemeral).
123
+
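The two adjustments described above (add a date line, drop Sources) could be applied mechanically. This is a rough sketch, not part of the package: `to_artifact` is a hypothetical helper, and it assumes the title is the first level-1 heading and that the Sources section is a level-2 heading named `Sources`.

```python
from datetime import date


def to_artifact(finding_md: str, today: date) -> str:
    """Convert Finding Format markdown into the persisted artifact form."""
    out = []
    skipping = False
    date_added = False
    for line in finding_md.splitlines():
        if line.startswith("## "):
            # Assumption: the Sources section heading is exactly "## Sources".
            skipping = line.strip().lower() == "## sources"
        if skipping:
            continue
        out.append(line)
        if not date_added and line.startswith("# "):
            # Put the date line directly below the title.
            out.append(f"Date: {today.isoformat()}")
            date_added = True
    return "\n".join(out) + "\n"
```

Because the artifact is described as immutable, a caller would write the result once and re-run research rather than patch the file later.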
113
124
  ## Phase Handoff
114
125
 
115
- 1. **Persist findings** Write research summary to `.forge/phases/m{M}-{N}-{name}/` or present inline. Standard tier: inline is fine. Full tier with multiple topics: write to files.
126
+ 1. **Write artifact** (see Research Artifact section above).
116
127
  2. **Update state** — Set `current.status` to `discussing` in `.forge/state/milestone-{id}.yml`
117
128
  3. **Recommend context clear:**
118
129
 
@@ -0,0 +1,122 @@
1
+ ---
2
+ name: testing
3
+ description: "Write e2e/integration tests OR audit existing suite health. Triggers: UI flows needing browser coverage, flaky suite cleanup, coverage-gap investigation, missing test infra scaffolding. Scope: e2e + integration only — unit tests stay with executing TDD flow. Playwright for web/TS e2e; stack-aware runner (Vitest/Jest/pytest/go test) for integration. Inherits verifying gate — does not run own verification."
4
+ ---
5
+
6
+ # Testing
7
+
8
+ On-demand branch skill. Peer of `designing`, `securing`, `debugging`. Not in core pipeline — invoked when testing work needed.
9
+
10
+ **Scope boundary:** e2e + integration layers only. Unit tests belong to `executing` TDD flow. No overlap.
11
+
12
+ ## Step 1: Determine Mode
13
+
14
+ Three modes:
15
+
16
+ - **analyst** — audit existing suite. Spawns tester (analyst). Output: `suite-health.md` + refactor-backlog appends. **Brownfield default.**
17
+ - **author** — write new tests (e2e or integration). Spawns tester (author). Output: committed test files.
18
+ - **ci-check** — grep CI workflows for test runner invocations. No agent spawn — skill runs directly.
19
+
20
+ ### Intent Inference
21
+
22
+ | User says | Mode |
23
+ |-----------|------|
24
+ | "audit tests" / "suite health" / "find flakes" / "test quality" | analyst |
25
+ | "write e2e" / "add tests for X" / "scaffold Playwright" / "integration tests" | author |
26
+ | "does CI run tests" / "CI integration" / "test pipeline" | ci-check |
27
+
28
+ Ambiguous → ask. Brownfield project + no mode specified → default analyst.
29
+
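As an illustration of the table and the fallback rule above, the sketch below matches the quoted phrases as case-insensitive substrings. The matching strategy and the names `MODE_PHRASES` and `infer_mode` are assumptions; the skill itself leaves inference to the model.

```python
from typing import Optional

# Phrases copied from the Intent Inference table ("add tests for X" is
# shortened to its fixed prefix).
MODE_PHRASES = {
    "analyst": ["audit tests", "suite health", "find flakes", "test quality"],
    "author": ["write e2e", "add tests for", "scaffold playwright", "integration tests"],
    "ci-check": ["does ci run tests", "ci integration", "test pipeline"],
}


def infer_mode(request: str, brownfield: bool) -> Optional[str]:
    """Return a mode name, or None when the user should be asked."""
    text = request.lower()
    matches = [
        mode
        for mode, phrases in MODE_PHRASES.items()
        if any(phrase in text for phrase in phrases)
    ]
    if len(matches) == 1:
        return matches[0]
    if not matches and brownfield:
        return "analyst"  # brownfield default when no mode is specified
    return None  # ambiguous: ask
```

Returning `None` rather than guessing keeps the "Ambiguous → ask" rule explicit at the call site.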
30
+ ## Step 2: Read Project Context
31
+
32
+ ```
33
+ Read: .forge/project.yml → stack, framework, test runner, verification commands
34
+ Read: .forge/state/milestone-{id}.yml → current milestone
35
+ Read: .forge/testing/suite-health.md → prior audit findings (if exists)
36
+ Read: .github/workflows/* → CI config (ci-check mode, analyst CI sub-check)
37
+ ```
38
+
39
+ ## Step 3: Mode Flows
40
+
41
+ ### Analyst Mode
42
+
43
+ 1. Spawn tester agent via Agent tool, mode = `analyst`:
44
+ - Pass: milestone id + name, scope paths, stack summary from project.yml
45
+ - Agent scans test files, runs suite with traces/retries, greps anti-patterns
46
+ 2. Agent writes `.forge/testing/suite-health.md`:
47
+ - **Coverage Gaps** — untested surfaces, missing layers
48
+ - **Flake Candidates** — tests that failed then passed on retry, or that use `sleep`/hardcoded waits
49
+ - **Anti-Patterns** — non-`data-testid` selectors, unseeded state, missing fixtures, no trace config
50
+ - **Quarantine Queue** — tests recommended for `test.fixme()` with reason + fix plan
51
+ 3. Agent appends actionable items to `.forge/refactor-backlog.yml`:
52
+ ```yaml
53
+ - id: T-{NNN}
54
+ source: testing-analyst
55
+ date: YYYY-MM-DD
56
+ category: flake | coverage-gap | anti-pattern | quarantine
57
+ file: "path/to/test.spec.ts"
58
+ lines: "NN-NN"
59
+ description: "What's wrong"
60
+ effort: quick | standard
61
+ suggested_approach: "How to fix"
62
+ status: pending
63
+ ```
64
+ ID numbering: read existing backlog, find max T-NNN, increment. No gaps.
65
+ 4. Receive agent YAML output (`suite_audit:` block). Review for completeness.
66
+
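The ID numbering rule in step 3 (find the highest existing `T-NNN`, then increment) might look like the sketch below. `next_backlog_id` is a hypothetical helper; it scans the backlog as plain text instead of parsing YAML, and the three-digit zero padding is an assumption drawn from the `NNN` placeholder.

```python
import re


def next_backlog_id(backlog_text: str) -> str:
    """Return the next T-NNN id given the current backlog file contents."""
    # Only ids of the form "id: T-<digits>" are counted; other prefixes are ignored.
    numbers = [int(n) for n in re.findall(r"\bid:\s*T-(\d+)", backlog_text)]
    return f"T-{max(numbers, default=0) + 1:03d}"
```

An empty or missing backlog starts at `T-001` under this sketch.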
67
+ ### Author Mode
68
+
69
+ 1. **Determine layer** — e2e vs integration. Ask if ambiguous.
70
+ 2. **Select runner:**
71
+ - e2e + web/TS → **Playwright** (only option v1 — non-web e2e deferred)
72
+ - integration → read `project.yml` stack:
73
+ - `js` / `ts` → Vitest or Jest (match existing project runner)
74
+ - `python` → pytest
75
+ - `go` → go test
76
+ - `java` → JUnit
77
+ - If project has existing test runner, match it. Don't introduce a second.
78
+ 3. Spawn tester agent via Agent tool, mode = `author`:
79
+ - Pass: milestone id + name, layer, runner, scope paths, stack summary
80
+ - Agent scaffolds tests following baked flake-resistant rules (see tester.md)
81
+ 4. Agent commits atomically per executor conventions.
82
+
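The runner selection in step 2 can be sketched as a small lookup. Everything here is illustrative: `select_runner` is a hypothetical name, the `js`/`ts` stacks are treated as "web/TS" by assumption, and `None` stands for "no automatic choice, ask the user".

```python
from typing import Optional

# Integration runners for stacks with a single option in the list above.
# js/ts are omitted on purpose: the text says to match the existing runner.
INTEGRATION_RUNNERS = {"python": "pytest", "go": "go test", "java": "JUnit"}


def select_runner(layer: str, stack: str, existing_runner: Optional[str] = None) -> Optional[str]:
    """Pick a test runner, or return None when no automatic choice applies."""
    if layer == "e2e":
        # Non-web e2e is deferred in v1, so only js/ts stacks get Playwright.
        return "Playwright" if stack in ("js", "ts") else None
    if existing_runner:
        # Match the project's runner; never introduce a second one.
        return existing_runner
    return INTEGRATION_RUNNERS.get(stack)
```

A `js` or `ts` project without an existing runner returns `None`, since the text names both Vitest and Jest without stating a preference.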
83
+ ### CI-Check Mode
84
+
85
+ No agent spawn. Skill runs directly.
86
+
87
+ 1. **Detect CI provider:**
88
+ - `.github/workflows/*.yml` → GH Actions (supported v1)
89
+ - `.circleci/`, `.gitlab-ci.yml`, `Jenkinsfile` → not supported v1. Report: *"Non-GH CI detected, verify test integration manually."*
90
+ 2. **For GH Actions** — grep workflow files for runner invocations:
91
+ ```
92
+ grep -nE 'playwright test|npx playwright|vitest|jest|pytest|go test' .github/workflows/*.yml
93
+ ```
94
+ 3. **Report:**
95
+ - Runner references found → OK, list which runners + which workflows
96
+ - E2e suite exists locally but no `playwright test` in workflows → **warn** (non-blocking): *"E2e suite not wired to CI."*
97
+ - No test runner referenced at all → **warn**: *"No test runner found in CI workflows."*
98
+ 4. Warnings are non-blocking. Surface to user, don't gate.
99
+
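A minimal sketch of the GH Actions check described above, assuming plain substring matching over `.github/workflows/*.yml`. The function name `ci_check`, the `has_local_e2e` flag, and the `OK:`/`WARN:` message prefixes are illustrative assumptions; the warning texts are the ones quoted in step 3.

```python
from pathlib import Path
from typing import Dict, List

# Runner invocations listed in step 2.
RUNNERS = ["playwright test", "npx playwright", "vitest", "jest", "pytest", "go test"]


def ci_check(repo_root: str, has_local_e2e: bool) -> List[str]:
    """Return non-blocking report lines for the GitHub Actions workflows."""
    workflows = Path(repo_root, ".github", "workflows")
    files = sorted(workflows.glob("*.yml")) if workflows.is_dir() else []
    found: Dict[str, List[str]] = {}
    for wf in files:
        text = wf.read_text(encoding="utf-8")
        hits = [runner for runner in RUNNERS if runner in text]
        if hits:
            found[wf.name] = hits
    messages = [f"OK: {name} references {', '.join(hits)}" for name, hits in found.items()]
    if not found:
        messages.append("WARN: No test runner found in CI workflows.")
    if has_local_e2e and not any("playwright test" in hits for hits in found.values()):
        messages.append("WARN: E2e suite not wired to CI.")
    return messages
```

The function only returns messages and never raises on missing workflows, which matches the rule that warnings are surfaced but do not gate.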
100
+ ## Step 4: Quarantine Convention
101
+
102
+ When analyst or author recommends quarantining a flaky test:
103
+
104
+ 1. Mark with `test.fixme('test title', async () => { ... })` (Playwright API; Vitest has no `test.fixme`, so use `test.skip` there)
105
+ 2. Comment above: `// Quarantined YYYY-MM-DD — see .forge/testing/suite-health.md`
106
+ 3. Add entry to suite-health.md Quarantine Queue: test path | reason | fix plan | owner
107
+ 4. **Fix plan required** — quarantine without fix plan = anti-pattern. Temporary only.
108
+
109
+ ## Step 5: Gate Inheritance
110
+
111
+ Testing skill does **NOT** run its own verification gate. Committed tests flow through `verifying` skill's existing `project.yml` verification commands (`npm test`, `npx playwright test`, etc.).
112
+
113
+ If verification commands need updating to include new test layers → surface as refactor item for user. Do not silently modify `project.yml`.
114
+
115
+ ## Step 6: Output Checklist
116
+
117
+ - [ ] Mode determined + user-confirmed if ambiguous
118
+ - [ ] Agent spawned with correct mode flag (analyst or author) OR skill ran ci-check directly
119
+ - [ ] **Analyst:** suite-health.md written + refactor-backlog appended
120
+ - [ ] **Author:** tests committed, flake-resistant rules followed
121
+ - [ ] **CI-check:** warnings surfaced, non-blocking
122
+ - [ ] No independent verification invoked — defer to `verifying` skill
@@ -125,7 +125,8 @@ State lives in `.forge/`:
125
125
  - `state/index.yml` — Global: active milestones, desire_paths, metrics
126
126
  - `state/milestone-{id}.yml` — Per-milestone cursor: position, progress, decisions, blockers
127
127
  - `context.md` — Locked decisions + deferred ideas (discuss phase)
128
- - `plan.md` — Task plans with must_haves frontmatter
128
+ - `research/milestone-{id}.md` — Research findings snapshot (dated, immutable)
129
+ - `phases/m{M}-{N}-{name}/plan-{NN}.md` — Task plans with must_haves frontmatter
129
130
  - `refactor-backlog.yml` — Refactoring catalog, worked via quick-tasking
130
131
 
131
132
  **Milestones** group phases into concurrent streams. Own state file — no conflicts across sessions.
@@ -134,10 +135,14 @@ State lives in `.forge/`:
134
135
 
135
136
  ## Deviation Rules
136
137
 
137
- 1. **Bug blocking task** → Auto-fix. Document "Rule 1."
138
- 2. **Missing critical functionality** (error handling, validation, auth, null checks) → Auto-add. "Rule 2."
139
- 3. **Blocking infrastructure** (missing dep, wrong types, broken imports) → Auto-fix. "Rule 3."
140
- 4. **Architectural change** (new DB table, service layer, library switch) → **STOP. Ask user.** "Rule 4."
138
+ **Full definitions:** `.claude/agents/executor.md`.
139
+
140
+ | Situation | Action | Rule |
141
+ |-----------|--------|------|
142
+ | Bug blocking task | Auto-fix, document | 1 |
143
+ | Missing validation/error handling/null checks | Auto-add, document | 2 |
144
+ | Broken import/dep/config | Auto-fix, document | 3 |
145
+ | New DB table, service layer, library swap | **STOP. Ask user.** | 4 |
141
146
 
142
147
  Priority: Rule 4 first. Then 1-3. Uncertain → Rule 4.
143
148