forge-orkes 0.3.7 → 0.3.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,437 @@
1
+ ---
2
+ name: reviewing
3
+ description: "Use after verifying passes to assess codebase health and catalog improvement opportunities. Combines security audit (10 categories), architecture audit (4 dimensions), and refactoring scan (6 categories) into a single review pass. This is the pre-completion gate — it answers 'is this healthy enough to ship, and what could be better?'"
4
+ ---
5
+
6
+ # Reviewing: Health Audit + Refactoring Review
7
+
8
+ You are the pre-completion gate. After `verifying` confirms the work delivers what was promised, you assess codebase health AND catalog improvement opportunities in a single review pass. Three parallel scans — security, architecture, and refactoring — produce a structured report that determines whether the milestone can complete.
9
+
10
+ ## When to Trigger
11
+
12
+ - **Automatically** after `verifying` returns a PASSED verdict (Standard and Full tiers)
13
+ - **On-demand** at any time via user request
14
+
15
+ ## Process Overview
16
+
17
+ 1. Read project context (`.forge/project.yml`) to determine tech stack
18
+ 2. Scope the review — glob all source files, determine milestone diff
19
+ 3. Spawn three parallel subagents: Security Audit + Architecture Audit + Refactoring Scan
20
+ 4. Collect results, score each category and dimension, determine overall status
21
+ 5. Write health report to `.forge/audits/milestone-{id}-health-report.md`
22
+ 6. Write accepted refactoring items to `.forge/refactor-backlog.yml`
23
+ 7. Route based on results: healthy → complete; warnings or critical issues → user decides whether to fix or accept
24
+
25
+ ## Step 1: Read Context
26
+
27
+ ```
28
+ Read: .forge/project.yml → tech stack, framework, database, dependencies
29
+ Read: .forge/state/milestone-{id}.yml → milestone ID and name
30
+ Read: .forge/constitution.md → active architectural gates (if exists)
31
+ Read: .forge/refactor-backlog.yml → existing backlog items (if any)
32
+ ```
33
+
34
+ Determine which security categories apply based on the tech stack. For example:
35
+ - No database → SQL/NoSQL Injection is N/A
36
+ - No frontend → XSS Prevention is N/A
37
+ - No CI/CD config → Pipeline Security is N/A
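The applicability check above can be sketched as a small lookup. This is a minimal TypeScript illustration, assuming a hypothetical `StackFacts` shape derived from `project.yml`; the field names and the `naReasons`/`applicableCategories` helpers are not part of Forge, and only the three example rules from the list are encoded.

```typescript
// Hypothetical stack facts derived from .forge/project.yml.
interface StackFacts {
  hasDatabase: boolean;
  hasFrontend: boolean;
  hasCiConfig: boolean;
}

const SECURITY_CATEGORIES = [
  "Authentication & Authorization",
  "Data Scoping / Tenant Isolation",
  "Input Validation",
  "Error Information Leakage",
  "XSS Prevention",
  "SQL/NoSQL Injection",
  "Secrets Management",
  "CORS Policy",
  "HTTP Security Headers",
  "CI/CD Pipeline Security",
] as const;

type SecurityCategory = (typeof SECURITY_CATEGORIES)[number];

// Categories that do not apply, each with the brief explanation the report needs.
function naReasons(stack: StackFacts): Map<SecurityCategory, string> {
  const na = new Map<SecurityCategory, string>();
  if (!stack.hasDatabase) na.set("SQL/NoSQL Injection", "no database in the stack");
  if (!stack.hasFrontend) na.set("XSS Prevention", "no frontend in the stack");
  if (!stack.hasCiConfig) na.set("CI/CD Pipeline Security", "no CI/CD configuration found");
  return na;
}

// Categories the security subagent should actually audit.
function applicableCategories(stack: StackFacts): SecurityCategory[] {
  const na = naReasons(stack);
  return SECURITY_CATEGORIES.filter((category) => !na.has(category));
}
```

The reason strings double as the one-line N/A explanations the report requires for skipped categories.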
38
+
39
+ Determine the milestone's starting point for the git diff (for refactoring scan):
40
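+ - Check git log for the commit tagged or noted as the milestone start. The diff base must be the last commit *before* the milestone's work: `git diff A..HEAD` compares two snapshots, so changes introduced by commit `A` itself are excluded
41
+ - If unavailable, use the last commit on or before the previous milestone's completion date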
42
+ - Fallback: ask the user for the starting commit or branch
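One way to encode this fallback chain, sketched in TypeScript under stated assumptions: `Commit`, `chooseDiffBase`, and the input shapes are hypothetical, commits are assumed to be listed oldest first with non-decreasing commit dates, and a tagged commit is assumed to mark the snapshot taken when the milestone began. Because `git diff A..HEAD` compares two snapshots, the date-based branch picks the last commit at or before the previous milestone's completion, so the milestone's own first commit stays inside the diff.

```typescript
interface Commit {
  sha: string;
  date: Date; // commit date
}

// Picks A for `git diff --name-only A..HEAD`; null means "ask the user".
function chooseDiffBase(
  commitsOldestFirst: Commit[],
  taggedStart?: string,
  previousMilestoneCompletedAt?: Date,
): string | null {
  // 1. A commit tagged or noted as the milestone start wins outright.
  if (taggedStart !== undefined) return taggedStart;

  // 2. Otherwise take the last commit at or before the previous milestone's
  //    completion, so the milestone's first commit is still part of the diff.
  if (previousMilestoneCompletedAt !== undefined) {
    const cutoff = previousMilestoneCompletedAt.getTime();
    let base: string | null = null;
    for (const commit of commitsOldestFirst) {
      if (commit.date.getTime() > cutoff) break;
      base = commit.sha;
    }
    return base; // null if every commit is newer than the cutoff
  }

  // 3. Fallback: no way to infer a base automatically.
  return null;
}
```

With git itself, `git rev-list -1 --before=<date> HEAD` performs a similar date-based lookup.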
43
+
44
+ ## Step 2: Scope the Review
45
+
46
+ ```
47
+ Glob: src/**/*.{ts,tsx,js,jsx,py,go,rs,java} (adapt to project language)
48
+ Glob: **/*.env*, **/docker-compose*, **/.github/workflows/*
49
+ Glob: **/next.config*, **/vite.config*, **/webpack.config*
50
+ ```
51
+
52
+ Also get the diff file list for the refactoring scan:
53
+ ```
54
+ git diff --name-only {milestone_start}..HEAD
55
+ ```
56
+
57
+ Present scope summary to user:
58
+ *"Review scope: {N} source files, {N} config files, {N} files changed in this milestone. Scanning security (10 categories), architecture (4 dimensions), and refactoring opportunities (6 categories). This will take a moment."*
59
+
60
+ Build explicit file lists for each subagent — pass file paths, not globs, so nothing is missed.
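A minimal sketch of building those explicit lists, assuming the glob and `git diff --name-only` results are already available as path arrays. The `buildScopes` name, giving config files to the security scope only, and restricting the refactoring scope to changed paths that still match the source glob (which drops deleted files and non-source changes) are illustrative assumptions, not documented Forge behavior.

```typescript
interface ReviewScopes {
  security: string[];
  architecture: string[];
  refactoring: string[];
}

// Deduplicate and sort so each subagent gets a stable, explicit path list.
function uniqueSorted(paths: string[]): string[] {
  return Array.from(new Set(paths)).sort();
}

function buildScopes(
  sourceFiles: string[], // from the source-file glob
  configFiles: string[], // from the .env / docker-compose / workflow / bundler-config globs
  changedFiles: string[], // from `git diff --name-only {milestone_start}..HEAD`
): ReviewScopes {
  const existingSource = new Set(sourceFiles);
  return {
    security: uniqueSorted([...sourceFiles, ...configFiles]),
    architecture: uniqueSorted(sourceFiles),
    // Changed paths that still exist as source files: deleted files and
    // non-source changes (docs, lockfiles) drop out here.
    refactoring: uniqueSorted(changedFiles.filter((path) => existingSource.has(path))),
  };
}
```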
61
+
62
+ ## Step 3: Spawn Parallel Scans
63
+
64
+ Spawn all three scans as fresh-context subagents. Each receives the explicit file list for its scope, the tech stack from `project.yml`, and its specific instructions below.
65
+
66
+ ### Part 1: Security Audit (subagent)
67
+
68
+ Spawn a security auditor agent with a fresh context window.
69
+
70
+ **10 Security Categories:**
71
+
72
+ | # | Category | What It Checks |
73
+ |---|----------|---------------|
74
+ | 1 | Authentication & Authorization | Every endpoint has auth middleware; role checks before data access |
75
+ | 2 | Data Scoping / Tenant Isolation | Queries scoped to correct user/tenant; no cross-tenant data leaks |
76
+ | 3 | Input Validation | Request bodies/params validated before use in queries or logic |
77
+ | 4 | Error Information Leakage | No stack traces, DB schemas, or internal details in API responses |
78
+ | 5 | XSS Prevention | No unsanitized user content injected into DOM |
79
+ | 6 | SQL/NoSQL Injection | All queries use parameterized placeholders, no string interpolation |
80
+ | 7 | Secrets Management | No hardcoded keys/tokens; `.env` in `.gitignore`; secrets read from the environment (e.g., `process.env`) |
81
+ | 8 | CORS Policy | No wildcard `*` origins in production; appropriate method restrictions |
82
+ | 9 | HTTP Security Headers | CSP, X-Frame-Options, HSTS, X-Content-Type-Options, Referrer-Policy |
83
+ | 10 | CI/CD Pipeline Security | Secrets via secrets context, not hardcoded in workflow files |
84
+
85
+ **Agent behavior rules:**
86
+ - Read every file in the provided list. No sampling or skipping.
87
+ - Every finding must have: file path, line number, what's wrong, severity, remediation.
88
+ - Understand context before flagging — read surrounding code, check for middleware, wrappers, and higher-order protections.
89
+ - Document intentionally public endpoints; don't flag them as vulnerabilities.
90
+ - Severity is firm: `critical` = exploitable vulnerability, `warning` = defense-in-depth gap, `info` = observation.
91
+ - Prefer false negatives over false positives — only flag what you're confident about.
92
+ - Categories that don't apply to this project's stack → mark as N/A with brief explanation.
93
+
94
+ **Project adaptation:** Adapt checks to the detected stack:
95
+ - Express vs Next.js vs Fastify endpoint patterns
96
+ - PostgreSQL vs MongoDB vs SQLite query patterns
97
+ - GitHub Actions vs GitLab CI vs other CI systems
98
+ - React vs Vue vs Svelte frontend patterns
99
+
100
+ **Output format** (return to orchestrator):
101
+
102
+ ```yaml
103
+ security_audit:
104
+   files_scanned: N
105
+   categories:
106
+     - id: 1
107
+       name: "Authentication & Authorization"
108
+       status: passed | warning | critical | na
109
+       findings:
110
+         - file: "src/api/users.ts"
111
+           line: 42
112
+           severity: critical | warning | info
113
+           issue: "Description of what's wrong"
114
+           remediation: "How to fix it"
115
+       notes: "Optional context about intentional decisions"
116
+ ```
117
+
118
+ ### Part 2: Architecture Audit (subagent)
119
+
120
+ Spawn an architecture auditor agent with a fresh context window.
121
+
122
+ **4 Architecture Dimensions:**
123
+
124
+ | Dimension | What It Checks |
125
+ |-----------|---------------|
126
+ | **Scalability** | Synchronous blocking calls, missing pagination, unbounded queries, N+1 query patterns, missing caching opportunities, single points of failure, hardcoded limits |
127
+ | **Maintainability** | Code complexity hotspots (files >300 lines, deeply nested logic >4 levels, god components/classes), circular dependencies, duplicated logic that warrants abstraction |
128
+ | **Code Health** | Dead code / unused exports, TODO/FIXME inventory with age, test coverage gaps (untested critical paths), stale/vulnerable dependencies |
129
+ | **Structural Quality** | Separation of concerns violations (business logic in UI layer), inconsistent patterns across similar features, missing error boundaries, API contract consistency |
130
+
131
+ **Agent behavior rules:**
132
+ - Check actual code, not theoretical concerns.
133
+ - Every finding references specific files with evidence.
134
+ - Severity: `critical` = architectural debt that will cause production issues or block future work, `warning` = quality concern worth addressing, `info` = improvement opportunity.
135
+ - Respect existing ADRs in `.forge/decisions/` — don't flag intentional architectural choices as issues.
136
+ - Respect constitutional articles in `.forge/constitution.md` — if the constitution permits a pattern, don't flag it.
137
+
138
+ **Output format** (return to orchestrator):
139
+
140
+ ```yaml
141
+ architecture_audit:
142
+   files_scanned: N
143
+   dimensions:
144
+     - name: "Scalability"
145
+       status: passed | warning | critical
146
+       findings:
147
+         - file: "src/api/products.ts"
148
+           line: 87
149
+           severity: critical | warning | info
150
+           issue: "Unbounded query with no pagination"
151
+           remediation: "Add limit/offset parameters"
152
+     - name: "Maintainability"
153
+       status: passed | warning | critical
154
+       findings: []
155
+     - name: "Code Health"
156
+       status: passed | warning | critical
157
+       findings: []
158
+     - name: "Structural Quality"
159
+       status: passed | warning | critical
160
+       findings: []
161
+ ```
162
+
163
+ ### Part 3: Refactoring Scan (subagent)
164
+
165
+ Spawn a refactoring scanner agent with a fresh context window. Pass it only the files changed during the milestone (from the git diff).
166
+
167
+ **6 Refactoring Categories:**
168
+
169
+ | # | Category | What to Look For |
170
+ |---|----------|-----------------|
171
+ | 1 | **Duplication** | Similar logic in 2+ places that could be extracted into a shared function, hook, or utility |
172
+ | 2 | **Complexity hotspots** | Functions >50 lines, nesting >3 levels deep, high cyclomatic complexity, overly long files |
173
+ | 3 | **Naming & clarity** | Unclear variable/function names, misleading abstractions, functions that do more than their name suggests |
174
+ | 4 | **Pattern inconsistency** | Same thing done differently across the milestone's files (e.g., error handling, data fetching, state management) |
175
+ | 5 | **Dead code** | Unused functions, unreachable branches, commented-out code left behind, unused imports |
176
+ | 6 | **Abstraction issues** | Over-engineered helpers used once, repeated inline code that warrants extraction, premature or missing abstractions |
177
+
178
+ **Agent behavior rules:**
179
+ - Read every file in the diff. No sampling.
180
+ - Every finding must reference a specific file and line range.
181
+ - Understand context — don't flag intentional patterns documented in the constitution.
182
+ - Don't duplicate findings from the security or architecture audits.
183
+ - Estimate effort for each item: `quick` (< 30 min, under 50 lines) or `standard` (needs planning).
184
+ - Suggest a concrete approach for each finding, not just "refactor this."
185
+ - Prefer fewer high-quality findings over many low-signal ones.
186
+
187
+ **Output format** (return to orchestrator):
188
+
189
+ ```yaml
190
+ refactoring_scan:
191
+   files_scanned: N
191
+   findings:
192
+     - category: duplication
193
+       file: "src/api/users.ts"
194
+       lines: "42-67"
195
+       description: "Duplicate validation logic — same email check in createUser and updateUser"
196
+       effort: quick
197
+       suggested_approach: "Extract shared validateEmail() helper to src/utils/validation.ts"
199
+ ```
200
+
201
+ ## Step 4: Score Results
202
+
203
+ After all three subagents return, compute scores.
204
+
205
+ **Per-category scoring (security + architecture):**
206
+
207
+ | Status | Meaning |
208
+ |--------|---------|
209
+ | `passed` | No issues found |
210
+ | `warning` | Non-critical issues (info-level also maps here) |
211
+ | `critical` | Real vulnerabilities or architectural blockers |
212
+ | `na` | Category doesn't apply to this project |
213
+
214
+ **Overall health status:**
215
+
216
+ | Overall | Condition |
217
+ |---------|-----------|
218
+ | `passed` | ALL categories and dimensions passed or N/A |
219
+ | `warnings_only` | One or more warnings, zero critical |
220
+ | `issues_found` | One or more critical findings |
221
+
222
+ **Refactoring findings** are separate from the health status — they never block completion.
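The two scoring tables translate into a pair of small functions. This TypeScript sketch is illustrative; `categoryStatus` and `overallStatus` are hypothetical names. Applying `overallStatus` to only the security categories, or only the architecture dimensions, would also yield the per-section `status` values in the report frontmatter, assuming they follow the same rule. Refactoring findings are deliberately never passed in.

```typescript
type Severity = "critical" | "warning" | "info";
type CategoryStatus = "passed" | "warning" | "critical" | "na";
type OverallStatus = "passed" | "warnings_only" | "issues_found";

// Status of one security category or architecture dimension from its findings.
function categoryStatus(severities: Severity[], applicable = true): CategoryStatus {
  if (!applicable) return "na";
  if (severities.includes("critical")) return "critical";
  // Per the scoring table, info-level findings also map to `warning`.
  if (severities.length > 0) return "warning";
  return "passed";
}

// Overall health from every category and dimension status.
function overallStatus(statuses: CategoryStatus[]): OverallStatus {
  if (statuses.includes("critical")) return "issues_found";
  if (statuses.includes("warning")) return "warnings_only";
  return "passed"; // everything passed or N/A
}
```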
223
+
224
+ ## Step 5: Write Health Report
225
+
226
+ Create `.forge/audits/` directory if needed. Write to `.forge/audits/milestone-{id}-health-report.md`.
227
+
228
+ **YAML frontmatter:**
229
+
230
+ ```yaml
231
+ ---
232
+ milestone_id: {id}
233
+ milestone_name: "{name}"
234
+ reviewed: "{ISO 8601 timestamp}"
235
+ status: passed | warnings_only | issues_found
236
+ security:
237
+   status: passed | warnings_only | issues_found
238
+   categories_passed: N
239
+   categories_warning: N
240
+   categories_critical: N
241
+   categories_na: N
242
+ architecture:
243
+   status: passed | warnings_only | issues_found
244
+   scalability: passed | warning | critical
245
+   maintainability: passed | warning | critical
246
+   code_health: passed | warning | critical
247
+   structural_quality: passed | warning | critical
248
+ refactoring:
249
+   findings_count: N
250
+   quick_count: N
251
+   standard_count: N
252
+ total_files_scanned: N
253
+ ---
254
+ ```
255
+
256
+ **Body structure:**
257
+
258
+ ```markdown
259
+ # Review Report: {milestone name}
260
+
261
+ ## Executive Summary
262
+ {1-3 sentences: overall health assessment, key findings, refactoring highlights, recommendation}
263
+
264
+ ## Security Findings
265
+
266
+ ### Category 1: Authentication & Authorization — {STATUS}
267
+ | File | Line | Severity | Issue | Remediation |
268
+ |------|------|----------|-------|-------------|
269
+ | ... | ... | ... | ... | ... |
270
+
271
+ {Repeat for each category. N/A categories get a single line: "N/A — {reason}"}
272
+
273
+ ## Architecture Findings
274
+
275
+ ### Scalability — {STATUS}
276
+ | File | Line | Severity | Issue | Remediation |
277
+ |------|------|----------|-------|-------------|
278
+ | ... | ... | ... | ... | ... |
279
+
280
+ {Repeat for each dimension}
281
+
282
+ ## Refactoring Opportunities
283
+
284
+ ### Duplication ({N} items)
285
+ | File | Lines | Description | Effort | Approach |
286
+ |------|-------|-------------|--------|----------|
287
+ | ... | ... | ... | quick/standard | ... |
288
+
289
+ {Repeat for each refactoring category with findings}
290
+
291
+ ## Public Endpoints
292
+ {List of intentionally public endpoints documented during security audit}
293
+
294
+ ## Files Scanned
295
+ {Count and list of all files scanned across all three scans}
296
+ ```
297
+
298
+ **Health trend tracking:** If a previous audit exists for an earlier milestone (check `.forge/audits/` for prior reports), compare results and note improvements or regressions in the executive summary.
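A sketch of the trend comparison, assuming the previous and current reports have already been reduced to maps from category or dimension name to status. The ranking (N/A treated like `passed`) and the `compareReports` helper are assumptions for illustration; entries without a baseline in the earlier report are skipped.

```typescript
type CategoryStatus = "passed" | "warning" | "critical" | "na";

// Lower rank = healthier; N/A is ranked like passed (assumption).
const RANK: Record<CategoryStatus, number> = { passed: 0, na: 0, warning: 1, critical: 2 };

interface Trend {
  improved: string[];
  regressed: string[];
}

function compareReports(
  previous: Partial<Record<string, CategoryStatus>>,
  current: Record<string, CategoryStatus>,
): Trend {
  const trend: Trend = { improved: [], regressed: [] };
  for (const [name, now] of Object.entries(current)) {
    const before = previous[name];
    if (before === undefined) continue; // no baseline in the earlier report
    if (RANK[now] < RANK[before]) trend.improved.push(`${name}: ${before} -> ${now}`);
    else if (RANK[now] > RANK[before]) trend.regressed.push(`${name}: ${before} -> ${now}`);
  }
  return trend;
}
```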
299
+
300
+ ## Step 6: Present Results + Triage Refactoring
301
+
302
+ ### Health Results
303
+
304
+ Present the health status first — this is the gate.
305
+
306
+ **If HEALTHY (`passed`, all categories passed or N/A):**
307
+ *"Health audit passed. No security vulnerabilities or architectural concerns found."*
308
+
309
+ **If NEEDS ATTENTION (`issues_found`, critical issues):**
310
+ *"Review found critical issues that should be addressed before shipping:"*
311
+ Inline the top 3 findings per critical category so the user sees them immediately.
312
+
313
+ **If WARNINGS ONLY (`warnings_only`):**
314
+ *"Review passed with warnings — no critical issues, but {N} items worth noting. See the full report at `.forge/audits/milestone-{id}-health-report.md`."*
315
+
316
+ ### Refactoring Triage
317
+
318
+ After presenting health results, show the refactoring findings for triage, grouped by category, with at most 10 items shown at first:
319
+
320
+ *"I also found {N} refactoring opportunities in the code built during this milestone:"*
321
+
322
+ For each category with findings:
323
+ *"**Duplication** ({N} items):*
324
+ *1. `src/api/users.ts:42-67` — Duplicate email validation in createUser and updateUser. Quick fix: extract shared helper. [Accept / Dismiss]*
325
+ *2. ...*"
326
+
327
+ The user can respond with:
328
+ - **Accept** (individual item) → add to backlog
329
+ - **Dismiss** (individual item) → skip, not a real issue or intentional
330
+ - **Accept all** → bulk add all remaining items to backlog
331
+ - **Dismiss all** → skip everything, no backlog items added
332
+
333
+ For dismissed items, optionally ask for a brief reason (helps calibrate future scans).
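The grouping, the initial cap, and the bulk responses could be modeled as follows. This TypeScript sketch assumes the cap of 10 applies across all categories (filled in first-seen category order) and that an explicit per-item answer always wins over **Accept all** or **Dismiss all**; `groupForTriage`, `acceptedItems`, and the `Finding` shape are hypothetical.

```typescript
interface Finding {
  category: string;
  file: string;
  lines: string;
  description: string;
  effort: "quick" | "standard";
}

type Decision = "accept" | "dismiss";

// Group by category (first-seen order), then show at most `limit` items in total.
function groupForTriage(findings: Finding[], limit = 10): Map<string, Finding[]> {
  const groups = new Map<string, Finding[]>();
  for (const finding of findings) {
    const list = groups.get(finding.category) ?? [];
    list.push(finding);
    groups.set(finding.category, list);
  }
  const page = new Map<string, Finding[]>();
  let shown = 0;
  for (const [category, items] of groups) {
    if (shown >= limit) break;
    const visible = items.slice(0, limit - shown);
    page.set(category, visible);
    shown += visible.length;
  }
  return page;
}

// Explicit per-item answers win; items without one take the bulk answer, if any.
function acceptedItems(
  findings: Finding[],
  answers: Map<number, Decision>, // index into `findings` -> Accept / Dismiss
  bulk: Decision | "none" = "none", // Accept all / Dismiss all / no bulk answer
): Finding[] {
  return findings.filter((_, index) => (answers.get(index) ?? bulk) === "accept");
}
```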
334
+
335
+ ## Step 7: Write Backlog + Route
336
+
337
+ ### Write Refactoring Backlog
338
+
339
+ Read existing `.forge/refactor-backlog.yml` (if any). Determine the next item ID by incrementing from the highest existing ID.
340
+
341
+ Append accepted items to `.forge/refactor-backlog.yml`:
342
+
343
+ ```yaml
344
+ items:
345
+   - id: R001
346
+     milestone: 1
347
+     category: duplication
348
+     file: "src/api/users.ts"
349
+     lines: "42-67"
350
+     description: "Duplicate validation logic — same email check in createUser and updateUser"
351
+     effort: quick
352
+     suggested_approach: "Extract shared validateEmail() helper"
353
+     status: pending
354
+     added: "2026-03-18"
355
+     completed: null
356
+ ```
357
+
358
+ If the file doesn't exist yet, create it from the template at `.forge/templates/refactor-backlog.yml`.
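Determining the next ID and appending accepted items might look like this TypeScript sketch. It assumes every ID follows the `R` + zero-padded number pattern from the example; `nextBacklogId`, `appendAccepted`, and `BacklogEntry` are illustrative names. Assuming `done` and `dismissed` items remain in the file, taking the highest existing number means IDs are never reused.

```typescript
interface BacklogEntry {
  id: string;
  status: "pending" | "in_progress" | "done" | "dismissed";
  added: string; // YYYY-MM-DD
  completed: string | null;
  [field: string]: unknown; // milestone, category, file, lines, ... copied from the finding
}

// Next ID = highest existing R-number + 1, zero-padded to three digits.
function nextBacklogId(existingIds: string[]): string {
  let highest = 0;
  for (const id of existingIds) {
    const match = /^R(\d+)$/.exec(id);
    if (match) highest = Math.max(highest, Number(match[1]));
  }
  return "R" + String(highest + 1).padStart(3, "0");
}

// Append accepted findings as new pending items with sequential IDs.
function appendAccepted(
  existing: BacklogEntry[],
  accepted: Record<string, unknown>[],
  today: string,
): BacklogEntry[] {
  const items = [...existing];
  for (const finding of accepted) {
    items.push({
      ...finding,
      id: nextBacklogId(items.map((item) => item.id)),
      status: "pending",
      added: today,
      completed: null,
    });
  }
  return items;
}
```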
359
+
360
+ ### Route Based on Health Status
361
+
362
+ #### HEALTHY or WARNINGS ONLY (user accepts)
363
+
364
+ Update `.forge/state/milestone-{id}.yml`:
365
+ - Set `current.status` to `complete`
366
+
367
+ Update `.forge/state/index.yml`:
368
+ - Set milestone status to `complete`
369
+ - Update `last_updated` timestamp
370
+
371
+ Present to user:
372
+ *"Milestone [{name}] is complete. {N} refactoring items are in the backlog for whenever you want to tackle them."*
373
+
374
+ If Beads is installed, run `bd complete` to update the dependency graph.
375
+
376
+ #### NEEDS ATTENTION (critical issues found)
377
+
378
+ Do NOT mark milestone complete. Present choices:
379
+
380
+ *"Options:"*
381
+ - **A. Fix critical issues** — return to `planning` in fix mode with findings as requirements
382
+ - **B. Accept risk and continue** — document accepted risks in report, complete the milestone
383
+
384
+ If user chooses A:
385
+ - Create fix requirements from critical findings
386
+ - Route to `planning` skill in fix mode
387
+ - After the fix tasks are executed and verified, re-run `reviewing` only (the full milestone verification does not need to be repeated)
388
+
389
+ If user chooses B:
390
+ - Append "Accepted Risks" section to the health report with user's acknowledgment
391
+ - Complete the milestone (same as HEALTHY path above)
392
+
393
+ #### WARNINGS ONLY (user wants to fix)
394
+
395
+ If user wants to fix warnings instead of accepting:
396
+ - Create fix requirements from warning findings
397
+ - Route to `planning` in fix mode
398
+ - After fix execution, re-run `reviewing`
399
+
400
+ ## Gate Type: Mixed
401
+
402
+ - **Security critical findings** → soft gate (user can accept the risk, but fixing is strongly recommended)
403
+ - **Architecture critical findings** → soft gate (same — user has final authority)
404
+ - **Warnings** → advisory (noted in report, user chooses)
405
+ - **Refactoring items** → never block (cataloged to backlog for future work)
406
+
407
+ The report documents the decision either way, creating an audit trail.
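The routing rules in Step 7 and the gate types above reduce to one decision function. A minimal TypeScript sketch with hypothetical `route` and `Route` names: it assumes that a missing answer on warnings counts as acceptance (warnings are advisory), while critical findings always require an explicit choice.

```typescript
type OverallStatus = "passed" | "warnings_only" | "issues_found";
type UserChoice = "accept" | "fix";

type Route =
  | { action: "complete"; recordAcceptedRisks: boolean }
  | { action: "plan_fix"; fixFrom: "critical" | "warning" };

function route(status: OverallStatus, choice?: UserChoice): Route {
  if (status === "passed") {
    return { action: "complete", recordAcceptedRisks: false };
  }
  if (status === "warnings_only") {
    // Advisory: completing is the default unless the user asks to fix.
    return choice === "fix"
      ? { action: "plan_fix", fixFrom: "warning" }
      : { action: "complete", recordAcceptedRisks: false };
  }
  // issues_found is a soft gate: the user has final authority, but must decide.
  if (choice === undefined) {
    throw new Error("critical findings need an explicit fix/accept decision");
  }
  return choice === "fix"
    ? { action: "plan_fix", fixFrom: "critical" }
    : { action: "complete", recordAcceptedRisks: true }; // append "Accepted Risks" to the report
}
```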
408
+
409
+ ## Backlog Lifecycle
410
+
411
+ Backlog items follow this lifecycle:
412
+
413
+ ```
414
+ pending → in_progress → done
415
+ pending → dismissed (during triage or later review)
416
+ ```
417
+
418
+ Items with `effort: quick` can be picked up directly via `quick-tasking`.
419
+ Items with `effort: standard` should go through the Standard tier flow.
420
+
421
+ When working a backlog item:
422
+ 1. `forge` surfaces it as an available task
423
+ 2. User selects it
424
+ 3. Route to `quick-tasking` or Standard tier based on effort
425
+ 4. On completion, update the item's `status` to `done` and set `completed` date
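The lifecycle above can be enforced with an explicit transition table. This is an illustrative TypeScript sketch: `transition`, `workflowFor`, and the trimmed `BacklogItem` shape are hypothetical, and dates are plain `YYYY-MM-DD` strings as in the backlog example.

```typescript
type BacklogStatus = "pending" | "in_progress" | "done" | "dismissed";

// Allowed moves: pending -> in_progress -> done, and pending -> dismissed.
const ALLOWED: Record<BacklogStatus, BacklogStatus[]> = {
  pending: ["in_progress", "dismissed"],
  in_progress: ["done"],
  done: [],
  dismissed: [],
};

interface BacklogItem {
  id: string;
  effort: "quick" | "standard";
  status: BacklogStatus;
  completed: string | null; // YYYY-MM-DD once done
}

// Returns an updated copy; sets `completed` when the item reaches `done`.
function transition(item: BacklogItem, to: BacklogStatus, today: string): BacklogItem {
  if (!ALLOWED[item.status].includes(to)) {
    throw new Error(`${item.id}: cannot move from ${item.status} to ${to}`);
  }
  return { ...item, status: to, completed: to === "done" ? today : item.completed };
}

// Quick items go straight to quick-tasking; standard items need the Standard tier flow.
function workflowFor(item: BacklogItem): "quick-tasking" | "standard-tier" {
  return item.effort === "quick" ? "quick-tasking" : "standard-tier";
}
```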
426
+
427
+ ## Phase Handoff
428
+
429
+ After reviewing completes (all paths: HEALTHY, accepted risk, accepted warnings):
430
+
431
+ 1. **Verify persistence** — Confirm health report is written to `.forge/audits/milestone-{id}-health-report.md` and refactoring backlog is updated
432
+ 2. **Update state** — Set `current.status` to `complete` in `.forge/state/milestone-{id}.yml`
433
+ 3. **Present completion:**
434
+
435
+ *"Milestone [{name}] complete. Review report at `.forge/audits/milestone-{id}-health-report.md`. {N} refactoring items in backlog.*
436
+
437
+ *Start new work with `/forge` or tackle backlog items anytime."*
@@ -128,7 +128,7 @@ Based on all verification levels:
128
128
 
129
129
  ### PASSED
130
130
  All truths verified, all artifacts substantive and wired, all key links connected, requirements covered.
131
- → Route to `auditing` skill for health audit before milestone completion.
131
+ → Route to `reviewing` skill for health audit + refactoring review before milestone completion.
132
132
 
133
133
  ### GAPS FOUND
134
134
  Some truths failed or artifacts are stubs.
@@ -219,10 +219,10 @@ Only suggest changes when there's clear evidence (3+ occurrences). One-off issue
219
219
  After verification completes with a PASSED verdict:
220
220
 
221
221
  1. **Verify persistence** — Confirm verification results are documented, desire paths retrospective is logged to `.forge/state/index.yml`
222
- 2. **Update state** — Set `current.status` to `auditing` in `.forge/state/milestone-{id}.yml`
222
+ 2. **Update state** — Set `current.status` to `reviewing` in `.forge/state/milestone-{id}.yml`
223
223
  3. **Recommend context clear:**
224
224
 
225
- *"Verification phase complete — all truths verified, artifacts substantive and wired. I recommend clearing context (`/clear`) before the health audit — the auditing skill spawns fresh subagents anyway, and a clean orchestrator context ensures nothing is missed.*
225
+ *"Verification phase complete — all truths verified, artifacts substantive and wired. I recommend clearing context (`/clear`) before the review — the reviewing skill spawns fresh subagents anyway, and a clean orchestrator context ensures nothing is missed.*
226
226
 
227
227
  *Ready to continue? Clear context and invoke `/forge` to resume."*
228
228
 
@@ -30,6 +30,17 @@ constraints:
30
30
  - "" # e.g., "No custom auth — use Clerk"
31
31
  - "" # e.g., "No server-side rendering"
32
32
 
33
+ verification:
34
+   commands: # Shell commands run after each task commit
35
+     - "" # e.g., "npm run lint"
36
+     - "" # e.g., "npm test"
37
+     - "" # e.g., "npx tsc --noEmit"
38
+   auto_fix: true # On failure, agent fixes and retries
39
+   max_retries: 2 # Max auto-fix attempts per command (0 = fail immediately)
40
+   # Commands are auto-detected during init from package.json scripts.
41
+   # Advisory mode: commands that were already failing before Forge started
42
+   # run but don't block — they log warnings only.
43
+
33
44
  success_criteria: # How do we know we're done?
34
45
  - "" # e.g., "User can create and edit posts"
35
46
  - "" # e.g., "All tests pass with >80% coverage"
@@ -29,11 +29,11 @@ Forge auto-detects complexity. Override with: "Use Quick/Standard/Full tier."
29
29
 
30
30
  ### Standard (hours)
31
31
  **Triggers:** new feature, component, significant refactor, multi-file change
32
- **Flow:** → `researching` → `discussing` → `planning` → `executing` → `verifying` → `auditing` → `refactoring` → done
32
+ **Flow:** → `researching` → `discussing` → `planning` → `executing` → `verifying` → `reviewing` → done
33
33
 
34
34
  ### Full (days)
35
35
  **Triggers:** new project, major milestone, complex multi-system feature, architectural decisions needed
36
- **Flow:** → `researching` → `discussing` → `architecting` → `planning` → `executing` → `verifying` → `auditing` → `refactoring` → done
36
+ **Flow:** → `researching` → `discussing` → `architecting` → `planning` → `executing` → `verifying` → `reviewing` → done
37
37
  **Optional additions:** `designing` (UI work), `securing` (auth/data/API), `debugging` (stuck on issue)
38
38
 
39
39
  ## Skill Routing
@@ -48,8 +48,7 @@ Forge auto-detects complexity. Override with: "Use Quick/Standard/Full tier."
48
48
  | Break work into executable tasks with gates | `planning` | Standard, Full |
49
49
  | Build code with deviation rules + atomic commits | `executing` | All |
50
50
  | Prove work actually delivers on goals | `verifying` | Standard, Full |
51
- | Audit application health before shipping | `auditing` | Standard, Full |
52
- | Review refactoring opportunities after milestone audit | `refactoring` | Standard, Full |
51
+ | Audit health + catalog refactoring opportunities | `reviewing` | Standard, Full |
53
52
  | Fix a small, scoped issue fast | `quick-tasking` | Quick |
54
53
  | Build UI with design system consistency | `designing` | When UI involved |
55
54
  | Review security before shipping | `securing` | When auth/data/API involved |
@@ -71,7 +70,7 @@ Forge auto-detects complexity. Override with: "Use Quick/Standard/Full tier."
71
70
  When a task touches 20+ files or a complex subsystem, spawn a fresh executor agent with isolated context. This prevents context rot — the #1 cause of quality degradation in long sessions.
72
71
 
73
72
  ### Context Handoff Between Phases
74
- Each phase writes its outputs to `.forge/` before completing. At every phase boundary (researching → discussing → planning → executing → verifying → auditing → refactoring), the completing skill recommends clearing context (`/clear`) before the next phase begins. The next phase loads what it needs from disk. This is advisory — skip for short phases where context is under 40%. See the `forge` skill's "Context Handoff Protocol" for full details.
73
+ Each phase writes its outputs to `.forge/` before completing. At every phase boundary (researching → discussing → planning → executing → verifying → reviewing), the completing skill recommends clearing context (`/clear`) before the next phase begins. The next phase loads what it needs from disk. This is advisory — skip for short phases where context is under 40%. See the `forge` skill's "Context Handoff Protocol" for full details.
75
74
 
76
75
  ### Lazy Loading
77
76
  Skills load only when invoked. CLAUDE.md stays in context; skill details load on demand. This keeps base context lean (~300 lines) while making full framework available.
@@ -84,9 +83,7 @@ Skills load only when invoked. CLAUDE.md stays in context; skill details load on
84
83
  | `planner` | Planning with constitutional gates | Read + Write (plan files only) | Planning phases |
85
84
  | `executor` | Building with deviation rules | All dev tools | Execution phases |
86
85
  | `verifier` | Goal-backward verification | Read + Bash (test execution) | Verification phases |
87
- | `security-auditor` | Security vulnerability scanner | Read, Bash, Grep, Glob | Auditing phase |
88
- | `architecture-auditor` | Structural health assessor | Read, Grep, Glob | Auditing phase |
89
- | `reviewer` | Security + code quality audit | Read-only + npm audit | Before shipping |
86
+ | `reviewer` | Security + architecture + refactoring audit | Read, Bash, Grep, Glob | Reviewing phase |
90
87
 
91
88
  ## Project Init (First Run)
92
89
 
@@ -115,7 +112,7 @@ For Quick tier tasks, init is skipped — just do the work.
115
112
  ## State Management
116
113
 
117
114
  Project state lives in `.forge/`:
118
- - `project.yml` — Vision, stack, design system, constraints (< 5 KB)
115
+ - `project.yml` — Vision, stack, design system, verification commands, constraints (< 5 KB)
119
116
  - `constitution.md` — Active architectural gates (selected during init)
120
117
  - `design-system.md` — Component mapping table (generated during init)
121
118
  - `requirements.yml` — Structured requirements with `[NEEDS CLARIFICATION]` markers
@@ -124,7 +121,7 @@ Project state lives in `.forge/`:
124
121
  - `state/milestone-{id}.yml` — Per-milestone cursor: current position, progress, decisions, blockers, deviations
125
122
  - `context.md` — Locked user decisions + deferred ideas (created during discuss phase)
126
123
  - `plan.md` — Per-phase task plans with must_haves frontmatter
127
- - `refactor-backlog.yml` — Refactoring opportunities cataloged after milestone audits, worked via quick-tasking
124
+ - `refactor-backlog.yml` — Refactoring opportunities cataloged during milestone reviews, worked via quick-tasking
128
125
 
129
126
  ### Milestones
130
127
  Milestones group phases into concurrent work streams. Each milestone has its own state file, so different sessions can work on different milestones without conflicts. On resume, Forge shows active milestones and asks which one to work on.
@@ -133,7 +130,7 @@ Milestones group phases into concurrent work streams. Each milestone has its own
133
130
  YAML for anything agents parse programmatically (project, requirements, roadmap, state). Markdown for human-facing content (constitution, context, verification reports). Never free-form prose for machine state.
134
131
 
135
132
  ### Milestone Completion: Status vs. Percentage
136
- **`current.status` is the authoritative workflow position.** A milestone is only complete when `current.status == complete`. The `progress.overall_percent` field measures task completion — not workflow completion. A milestone at 100% task completion still needs verifying, auditing, and refactoring before it is done. On resume, always check and display `current.status` to determine next steps.
133
+ **`current.status` is the authoritative workflow position.** A milestone is only complete when `current.status == complete`. The `progress.overall_percent` field measures task completion — not workflow completion. A milestone at 100% task completion still needs verifying and reviewing before it is done. On resume, always check and display `current.status` to determine next steps.
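A minimal TypeScript sketch of this rule, assuming `current.status` takes the phase names from the tier flows plus `complete`; `FLOWS`, `nextPhase`, and `isMilestoneComplete` are hypothetical helpers, not Forge APIs. Note that `overall_percent` is never consulted.

```typescript
type Tier = "standard" | "full";

// Phase order per tier, ending in the terminal `complete` status.
const FLOWS: Record<Tier, string[]> = {
  standard: ["researching", "discussing", "planning", "executing", "verifying", "reviewing", "complete"],
  full: ["researching", "discussing", "architecting", "planning", "executing", "verifying", "reviewing", "complete"],
};

interface MilestoneState {
  current: { status: string };
  progress: { overall_percent: number };
}

// Completion depends only on current.status; overall_percent is ignored on purpose.
function isMilestoneComplete(state: MilestoneState): boolean {
  return state.current.status === "complete";
}

// What comes after `status` in this tier's flow (null once complete).
function nextPhase(tier: Tier, status: string): string | null {
  const flow = FLOWS[tier];
  const index = flow.indexOf(status);
  if (index === -1) throw new Error(`unknown ${tier}-tier status: ${status}`);
  return flow[index + 1] ?? null;
}
```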
137
134
 
138
135
  ## Deviation Rules (Executor Decision Tree)
139
136
 
@@ -146,6 +143,27 @@ When the executor encounters issues during building:
146
143
 
147
144
  Priority: Rule 4 first (stop if architectural). Then Rules 1-3 (auto-fix). Uncertain? → Rule 4 (ask).
148
145
 
146
+ ## Verification Gates
147
+
148
+ After each task commit, the executor runs configured verification commands from `project.yml`:
149
+
150
+ ```yaml
151
+ verification:
152
+   commands:
153
+     - cmd: "npm run lint"
154
+     - cmd: "npm test"
155
+     - cmd: "npx tsc --noEmit"
156
+       advisory: true # pre-existing failures — warn only
157
+   auto_fix: true # agent fixes and retries on failure
158
+   max_retries: 2 # max auto-fix attempts per command
159
+ ```
160
+
161
+ - **Auto-detected during init** from `package.json` scripts (test, lint, typecheck)
162
+ - **Advisory mode**: commands that were already failing before Forge started run but don't block
163
+ - **Auto-fix loop**: on failure, agent reads output, fixes code, amends commit, re-runs (up to max_retries)
164
+ - **3-strike integration**: verification retries count toward the task's 3-strike limit
165
+ - Empty `commands` list = no verification gate (opt-out)
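The gate behavior described in these bullets can be sketched as a loop over the configured commands. This TypeScript version is illustrative only: it follows the object form (`cmd`, optional `advisory`) from the example above, and the injected `run` and `fixAndAmend` callbacks are assumptions standing in for shell execution and the agent's fix-and-amend step.

```typescript
interface VerificationCommand {
  cmd: string;
  advisory?: boolean; // already failing before Forge started: warn, never block
}

interface VerificationConfig {
  commands: VerificationCommand[];
  auto_fix: boolean;
  max_retries: number; // 0 = fail immediately
}

interface RunResult {
  ok: boolean;
  output: string;
}

interface GateOutcome {
  passed: boolean;
  warnings: string[];
  retriesUsed: number; // would count toward the task's 3-strike limit
  failedCommand?: string;
}

function runVerificationGate(
  config: VerificationConfig,
  run: (cmd: string) => RunResult, // hypothetical shell runner
  fixAndAmend: (cmd: string, output: string) => void, // hypothetical agent fix + `git commit --amend`
): GateOutcome {
  const outcome: GateOutcome = { passed: true, warnings: [], retriesUsed: 0 };
  // An empty command list never enters the loop: the gate is opted out.
  for (const { cmd, advisory } of config.commands) {
    let result = run(cmd);
    if (result.ok) continue;
    if (advisory) {
      outcome.warnings.push(`${cmd}: ${result.output}`);
      continue;
    }
    let attempts = 0;
    while (!result.ok && config.auto_fix && attempts < config.max_retries) {
      fixAndAmend(cmd, result.output);
      attempts += 1;
      outcome.retriesUsed += 1;
      result = run(cmd);
    }
    if (!result.ok) {
      return { ...outcome, passed: false, failedCommand: cmd };
    }
  }
  return outcome;
}
```

Returning `retriesUsed` rather than tracking strikes internally leaves the 3-strike bookkeeping with the executor that owns the task.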
166
+
149
167
  ## Beads Integration (Optional)
150
168
 
151
169
  When Beads is installed, Forge gains persistent cross-session memory: