@harness-engineering/cli 1.13.1 → 1.15.0

Files changed (147)
  1. package/dist/agents/skills/claude-code/harness-autopilot/SKILL.md +240 -39
  2. package/dist/agents/skills/claude-code/harness-autopilot/skill.yaml +6 -0
  3. package/dist/agents/skills/claude-code/harness-brainstorming/SKILL.md +39 -0
  4. package/dist/agents/skills/claude-code/harness-code-review/SKILL.md +44 -0
  5. package/dist/agents/skills/claude-code/harness-execution/SKILL.md +44 -0
  6. package/dist/agents/skills/claude-code/harness-planning/SKILL.md +39 -0
  7. package/dist/agents/skills/claude-code/harness-product-spec/SKILL.md +5 -5
  8. package/dist/agents/skills/claude-code/harness-release-readiness/SKILL.md +3 -3
  9. package/dist/agents/skills/claude-code/harness-verification/SKILL.md +35 -0
  10. package/dist/agents/skills/claude-code/initialize-harness-project/SKILL.md +11 -3
  11. package/dist/agents/skills/gemini-cli/harness-autopilot/SKILL.md +240 -39
  12. package/dist/agents/skills/gemini-cli/harness-autopilot/skill.yaml +6 -0
  13. package/dist/agents/skills/gemini-cli/harness-brainstorming/SKILL.md +39 -0
  14. package/dist/agents/skills/gemini-cli/harness-code-review/SKILL.md +44 -0
  15. package/dist/agents/skills/gemini-cli/harness-execution/SKILL.md +44 -0
  16. package/dist/agents/skills/gemini-cli/harness-planning/SKILL.md +39 -0
  17. package/dist/agents/skills/gemini-cli/harness-product-spec/SKILL.md +5 -5
  18. package/dist/agents/skills/gemini-cli/harness-release-readiness/SKILL.md +3 -3
  19. package/dist/agents/skills/gemini-cli/harness-verification/SKILL.md +35 -0
  20. package/dist/agents/skills/gemini-cli/initialize-harness-project/SKILL.md +11 -3
  21. package/dist/agents/skills/package.json +1 -0
  22. package/dist/agents/skills/vitest.config.mts +5 -0
  23. package/dist/agents-md-ZGNIDWAF.js +8 -0
  24. package/dist/{architecture-2R5Z4ZAF.js → architecture-ZLIH5533.js} +4 -4
  25. package/dist/bin/harness-mcp.js +14 -14
  26. package/dist/bin/harness.js +27 -25
  27. package/dist/{check-phase-gate-2OFZ7OWW.js → check-phase-gate-ZOXVBDCN.js} +4 -4
  28. package/dist/{chunk-ND6PNADU.js → chunk-2BKLWLY6.js} +9 -9
  29. package/dist/{chunk-65FRIL4D.js → chunk-3ZZKVN62.js} +1 -1
  30. package/dist/{chunk-C2ERUR3L.js → chunk-7MJAPE3Z.js} +165 -49
  31. package/dist/{chunk-Z77YQRQT.js → chunk-B2HKP423.js} +16 -5
  32. package/dist/{chunk-QPEH2QPG.js → chunk-DBSOCI3G.js} +53 -54
  33. package/dist/{chunk-TKJZKICB.js → chunk-EDXIVMAP.js} +7 -7
  34. package/dist/{chunk-MHBMTPW7.js → chunk-ERS5EVUZ.js} +9 -0
  35. package/dist/{chunk-JSTQ3AWB.js → chunk-FIAPHX37.js} +1 -1
  36. package/dist/{chunk-IMFVFNJE.js → chunk-FTMXDOR6.js} +1 -1
  37. package/dist/{chunk-72GHBOL2.js → chunk-GZKSBLQL.js} +1 -1
  38. package/dist/{chunk-K6XAPGML.js → chunk-H7Y5CKTM.js} +1 -1
  39. package/dist/{chunk-SSKDAOX5.js → chunk-J4RAX7YB.js} +1164 -516
  40. package/dist/{chunk-UAX4I5ZE.js → chunk-LGYBN7Y6.js} +2 -2
  41. package/dist/{chunk-QY4T6YAZ.js → chunk-N25INEIX.js} +4 -4
  42. package/dist/{chunk-4ZMOCPYO.js → chunk-ND2ENWDM.js} +1 -1
  43. package/dist/{chunk-NERR4TAO.js → chunk-NNHDDXYT.js} +1250 -765
  44. package/dist/{chunk-NKDM3FMH.js → chunk-OD3S2NHN.js} +1 -1
  45. package/dist/{chunk-NOPU4RZ4.js → chunk-OFXQSFOW.js} +3 -3
  46. package/dist/{chunk-TS3XWPW5.js → chunk-RCWZBSK5.js} +1 -1
  47. package/dist/{chunk-VUCPTQ6G.js → chunk-SD3SQOZ2.js} +1 -1
  48. package/dist/{chunk-DZS7CJKL.js → chunk-VEPAJXBW.js} +45 -47
  49. package/dist/{chunk-IM32EEDM.js → chunk-YLXFKVJE.js} +9 -9
  50. package/dist/{chunk-Q6AB7W5Z.js → chunk-YQ6KC6TE.js} +1 -1
  51. package/dist/{chunk-PQ5YK4AY.js → chunk-Z2OOPXJO.js} +2740 -1221
  52. package/dist/ci-workflow-765LSHRD.js +8 -0
  53. package/dist/{dist-2B363XUH.js → dist-ALQDD67R.js} +64 -2
  54. package/dist/{dist-HXHWB7SV.js → dist-B26DFXMP.js} +571 -478
  55. package/dist/{dist-L7LAAQAS.js → dist-DZ63LLUD.js} +1 -1
  56. package/dist/{dist-D4RYGUZE.js → dist-USY2C5JL.js} +3 -1
  57. package/dist/{docs-FZOPM4GK.js → docs-NRMQCOJ6.js} +4 -4
  58. package/dist/engine-3RB7MXPP.js +8 -0
  59. package/dist/{entropy-LVHJMFGH.js → entropy-6AGX2ZUN.js} +3 -3
  60. package/dist/{feedback-IHLVLMRD.js → feedback-MY4QZIFD.js} +1 -1
  61. package/dist/{generate-agent-definitions-64S3CLEZ.js → generate-agent-definitions-ZAE726AU.js} +4 -4
  62. package/dist/{graph-loader-GJZ4FN4Y.js → graph-loader-2M2HXDQI.js} +1 -1
  63. package/dist/index.d.ts +156 -17
  64. package/dist/index.js +24 -24
  65. package/dist/loader-UUTVMQCC.js +10 -0
  66. package/dist/{mcp-JQUI7BVZ.js → mcp-VU5FMO52.js} +14 -14
  67. package/dist/{performance-ZTVSUANN.js → performance-2D7G6NMJ.js} +3 -3
  68. package/dist/{review-pipeline-76JHKGSV.js → review-pipeline-RAQ55ISU.js} +1 -1
  69. package/dist/runtime-BCK5RRZQ.js +9 -0
  70. package/dist/{security-FWQZF2IZ.js → security-2RPQEN62.js} +1 -1
  71. package/dist/templates/axum/Cargo.toml.hbs +8 -0
  72. package/dist/templates/axum/src/main.rs +12 -0
  73. package/dist/templates/axum/template.json +16 -0
  74. package/dist/templates/django/manage.py.hbs +19 -0
  75. package/dist/templates/django/requirements.txt.hbs +1 -0
  76. package/dist/templates/django/src/settings.py.hbs +44 -0
  77. package/dist/templates/django/src/urls.py +6 -0
  78. package/dist/templates/django/src/wsgi.py.hbs +9 -0
  79. package/dist/templates/django/template.json +21 -0
  80. package/dist/templates/express/package.json.hbs +15 -0
  81. package/dist/templates/express/src/app.ts +12 -0
  82. package/dist/templates/express/src/lib/.gitkeep +0 -0
  83. package/dist/templates/express/template.json +16 -0
  84. package/dist/templates/fastapi/requirements.txt.hbs +2 -0
  85. package/dist/templates/fastapi/src/main.py +8 -0
  86. package/dist/templates/fastapi/template.json +20 -0
  87. package/dist/templates/gin/go.mod.hbs +5 -0
  88. package/dist/templates/gin/main.go +15 -0
  89. package/dist/templates/gin/template.json +19 -0
  90. package/dist/templates/go-base/.golangci.yml +16 -0
  91. package/dist/templates/go-base/AGENTS.md.hbs +35 -0
  92. package/dist/templates/go-base/go.mod.hbs +3 -0
  93. package/dist/templates/go-base/harness.config.json.hbs +17 -0
  94. package/dist/templates/go-base/main.go +7 -0
  95. package/dist/templates/go-base/template.json +14 -0
  96. package/dist/templates/java-base/AGENTS.md.hbs +35 -0
  97. package/dist/templates/java-base/checkstyle.xml +20 -0
  98. package/dist/templates/java-base/harness.config.json.hbs +16 -0
  99. package/dist/templates/java-base/pom.xml.hbs +39 -0
  100. package/dist/templates/java-base/src/main/java/App.java.hbs +5 -0
  101. package/dist/templates/java-base/template.json +13 -0
  102. package/dist/templates/nestjs/nest-cli.json +5 -0
  103. package/dist/templates/nestjs/package.json.hbs +18 -0
  104. package/dist/templates/nestjs/src/app.module.ts +8 -0
  105. package/dist/templates/nestjs/src/lib/.gitkeep +0 -0
  106. package/dist/templates/nestjs/src/main.ts +11 -0
  107. package/dist/templates/nestjs/template.json +16 -0
  108. package/dist/templates/nextjs/template.json +15 -1
  109. package/dist/templates/python-base/.python-version +1 -0
  110. package/dist/templates/python-base/AGENTS.md.hbs +32 -0
  111. package/dist/templates/python-base/harness.config.json.hbs +16 -0
  112. package/dist/templates/python-base/pyproject.toml.hbs +18 -0
  113. package/dist/templates/python-base/ruff.toml +5 -0
  114. package/dist/templates/python-base/src/__init__.py +0 -0
  115. package/dist/templates/python-base/template.json +13 -0
  116. package/dist/templates/react-vite/index.html +12 -0
  117. package/dist/templates/react-vite/package.json.hbs +18 -0
  118. package/dist/templates/react-vite/src/App.tsx +7 -0
  119. package/dist/templates/react-vite/src/lib/.gitkeep +0 -0
  120. package/dist/templates/react-vite/src/main.tsx +9 -0
  121. package/dist/templates/react-vite/template.json +19 -0
  122. package/dist/templates/react-vite/vite.config.ts +6 -0
  123. package/dist/templates/rust-base/AGENTS.md.hbs +35 -0
  124. package/dist/templates/rust-base/Cargo.toml.hbs +6 -0
  125. package/dist/templates/rust-base/clippy.toml +2 -0
  126. package/dist/templates/rust-base/harness.config.json.hbs +17 -0
  127. package/dist/templates/rust-base/src/main.rs +3 -0
  128. package/dist/templates/rust-base/template.json +14 -0
  129. package/dist/templates/spring-boot/pom.xml.hbs +50 -0
  130. package/dist/templates/spring-boot/src/main/java/Application.java.hbs +19 -0
  131. package/dist/templates/spring-boot/template.json +15 -0
  132. package/dist/templates/vue/index.html +12 -0
  133. package/dist/templates/vue/package.json.hbs +16 -0
  134. package/dist/templates/vue/src/App.vue +7 -0
  135. package/dist/templates/vue/src/lib/.gitkeep +0 -0
  136. package/dist/templates/vue/src/main.ts +4 -0
  137. package/dist/templates/vue/template.json +19 -0
  138. package/dist/templates/vue/vite.config.ts +6 -0
  139. package/dist/{validate-GCHZJIL7.js → validate-KBYQAEWE.js} +4 -4
  140. package/dist/validate-cross-check-OABMREW4.js +8 -0
  141. package/package.json +7 -5
  142. package/dist/agents-md-XU3BHE22.js +0 -8
  143. package/dist/ci-workflow-EHV65NQB.js +0 -8
  144. package/dist/engine-OL4T6NZS.js +0 -8
  145. package/dist/loader-DPYFB6R6.js +0 -10
  146. package/dist/runtime-X7U6SC7K.js +0 -9
  147. package/dist/validate-cross-check-STFHYMAZ.js +0 -8
@@ -31,7 +31,7 @@ Autopilot orchestrates these persona agents — it never reimplements their logi
31
31
  - **Claude Code:** Use the Agent tool with `subagent_type` set to the persona name.
32
32
  - **Gemini CLI:** Use the `run_agent` tool targeting the persona by name, or dispatch via `harness persona run <name>`.
33
33
 
34
- **Human always approves plans.** No plan executes without explicit human sign-off, regardless of complexity level. The difference is whether autopilot generates the plan automatically or asks the human to drive planning interactively.
34
+ **Plans are gated by concern signals.** When no concern signals fire (low complexity, no planner concerns, task count within threshold), plans are auto-approved with a structured report and execution proceeds immediately. When any signal fires, the plan pauses for human review with the standard yes/revise/skip/stop flow. The `--review-plans` session flag forces all plans to pause regardless of signals.
35
35
 
36
36
  ## Process
37
37
 
@@ -42,7 +42,7 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
42
42
 
43
43
  [next phase?]
44
44
  ↓ ↓
45
- ASSESS DONE
45
+ ASSESS FINAL_REVIEW → DONE
46
46
  ```
47
47
 
48
48
  ---
@@ -61,7 +61,9 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
61
61
  - Create the session directory if it does not exist
62
62
 
63
63
  3. **Check for existing state.** Read `{sessionDir}/autopilot-state.json`. If it exists and `currentState` is not `DONE`:
64
+ - **Schema migration:** If `schemaVersion < 3`, backfill missing fields: set `startingCommit` to the earliest commit in `history` (or current HEAD if no history), set `decisions` to `[]`, set `finalReview` to `{ "status": "pending", "findings": [], "retryCount": 0 }`. If `schemaVersion < 4`, set `reviewPlans` to `false`. Update `schemaVersion` to `4` and save.
64
65
  - Report: "Resuming autopilot from state `{currentState}`, phase {currentPhase}: {phaseName}."
66
+ - Skip steps 4 and 5 (initial state creation and flag parsing) — these only apply to fresh starts.
65
67
  - Skip to the recorded `currentState` and continue from there.
66
68
 
67
69
  4. **If no existing state (fresh start):**
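The schema-migration bullet added in this hunk can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the `AutopilotState` shape is trimmed to the fields the migration touches, and `migrateState`, the `HistoryEntry.commit` field, the oldest-first ordering of `history`, and the `currentHead` parameter are all assumptions introduced for the example.

```typescript
// Hypothetical, trimmed-down state shape: only the fields the migration touches.
interface HistoryEntry {
  commit?: string; // assumed field name; the diff does not show the history schema
}

interface FinalReview {
  status: string;
  findings: unknown[];
  retryCount: number;
}

interface AutopilotState {
  schemaVersion: number;
  history: HistoryEntry[];
  startingCommit?: string;
  decisions?: unknown[];
  finalReview?: FinalReview;
  reviewPlans?: boolean;
}

// Returns a migrated copy; the caller is responsible for saving it.
function migrateState(state: AutopilotState, currentHead: string): AutopilotState {
  const next: AutopilotState = { ...state };

  if (next.schemaVersion < 3) {
    // Earliest commit in history (assumed oldest-first), else the current HEAD.
    const earliest = next.history.find((entry) => entry.commit !== undefined)?.commit;
    next.startingCommit = next.startingCommit ?? earliest ?? currentHead;
    next.decisions = next.decisions ?? [];
    next.finalReview = next.finalReview ?? { status: "pending", findings: [], retryCount: 0 };
  }

  if (next.schemaVersion < 4) {
    next.reviewPlans = false;
    next.schemaVersion = 4;
  }

  return next;
}
```

In this sketch `currentHead` would be the output of `git rev-parse HEAD`, obtained by the caller, so the function itself stays free of I/O.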
@@ -70,12 +72,15 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
70
72
  - For each phase heading (`### Phase N: Name`), extract:
71
73
  - Phase name
72
74
  - Complexity annotation (`<!-- complexity: low|medium|high -->`, default: `medium`)
75
+ - Capture the starting commit: run `git rev-parse HEAD` and store the result as `startingCommit`.
73
76
  - Create `{sessionDir}/autopilot-state.json`:
74
77
  ```json
75
78
  {
76
- "schemaVersion": 2,
79
+ "schemaVersion": 4,
77
80
  "sessionDir": ".harness/sessions/<slug>",
78
81
  "specPath": "<path to spec>",
82
+ "startingCommit": "<git rev-parse HEAD output>",
83
+ "reviewPlans": false,
79
84
  "currentState": "ASSESS",
80
85
  "currentPhase": 0,
81
86
  "phases": [
@@ -91,11 +96,19 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
91
96
  "maxAttempts": 3,
92
97
  "currentTask": null
93
98
  },
94
- "history": []
99
+ "history": [],
100
+ "decisions": [],
101
+ "finalReview": {
102
+ "status": "pending",
103
+ "findings": [],
104
+ "retryCount": 0
105
+ }
95
106
  }
96
107
  ```
97
108
 
98
- 5. **Load context via gather_context.** Use the `gather_context` MCP tool to load all working context efficiently:
109
+ 5. **Parse session flags.** Check CLI arguments for `--review-plans`. If present, set `state.reviewPlans: true` in the state file. This flag persists for the entire session — resuming a session preserves the setting from when it was started (the flag is only read on fresh start, not on resume).
110
+
111
+ 6. **Load context via gather_context.** Use the `gather_context` MCP tool to load all working context efficiently:
99
112
 
100
113
  ```json
101
114
  gather_context({
@@ -109,19 +122,19 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
109
122
 
110
123
  This loads session-scoped learnings, handoff, state, and validation results in a single call. The `session` parameter ensures all reads come from the session directory (`.harness/sessions/<slug>/`), isolating this workstream from others. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
111
124
 
112
- 6. **Load session summary for cold start.** If resuming (existing `autopilot-state.json` found):
125
+ 7. **Load session summary for cold start.** If resuming (existing `autopilot-state.json` found):
113
126
  - Call `loadSessionSummary()` for the session slug to get quick orientation context (~200 tokens).
114
127
  - The summary provides the last skill, phase, status, and next step — enough to understand where the autopilot left off without re-reading the full state machine.
115
128
  - If no summary exists (first run), skip — the full INIT handles context loading.
116
129
 
117
- 7. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
130
+ 8. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
118
131
  - Current project priorities (which features are `in-progress`)
119
132
  - Blockers that may affect the upcoming phases
120
133
  - Overall project status and milestone progress
121
134
 
122
135
  This provides the autopilot with project-level context beyond the individual spec being executed. If the roadmap does not exist, skip this step — the autopilot operates normally without it.
123
136
 
124
- 8. **Transition to ASSESS.**
137
+ 9. **Transition to ASSESS.**
125
138
 
126
139
  ---
127
140
 
@@ -192,27 +205,114 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
192
205
 
193
206
  ---
194
207
 
195
- ### APPROVE_PLAN — Human Review Gate
196
-
197
- **This state always pauses for human input.**
208
+ ### APPROVE_PLAN — Conditional Review Gate
198
209
 
199
- 1. **Present the plan summary:**
210
+ 1. **Gather plan metadata:**
200
211
  - Phase name and number
201
- - Task count
212
+ - Task count (from the plan file)
202
213
  - Checkpoint count
203
- - Estimated time (task count × 3 minutes)
214
+ - Estimated time (task count x 3 minutes)
204
215
  - Effective complexity (original + any override)
205
- - Any concerns from the planning handoff
216
+ - Concerns array from the planning handoff (`{sessionDir}/handoff.json` field `concerns`, default: `[]` if field is absent)
217
+
218
+ 2. **Evaluate `shouldPauseForReview`.** Check the following signals in order. If **any** signal is true, pause for human review. If **all** are false, auto-approve.
219
+
220
+ | # | Signal | Condition | Description |
221
+ | --- | -------------------- | ------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
222
+ | 1 | `reviewPlans` | `state.reviewPlans === true` | Session-level flag set by `--review-plans` CLI arg |
223
+ | 2 | `highComplexity` | `phase.complexity === "high"` | Phase is marked as high complexity in the spec (reachable when resuming after interactive planning; confirms the plan is ready for automated execution even though the human drove planning) |
224
+ | 3 | `complexityOverride` | `phase.complexityOverride !== null` | Planner produced more tasks than expected for the spec complexity |
225
+ | 4 | `plannerConcerns` | Handoff `concerns` array is non-empty | Planner flagged specific risks or uncertainties |
226
+ | 5 | `taskCount` | Plan contains > 15 tasks (i.e., 16+) | Plan is large enough to warrant human review |
227
+
228
+ 3. **Build the signal evaluation result** for reporting and recording:
229
+
230
+ ```json
231
+ {
232
+ "reviewPlans": false,
233
+ "highComplexity": "low",
234
+ "complexityOverride": null,
235
+ "plannerConcerns": [],
236
+ "taskCount": 8,
237
+ "taskThreshold": 15
238
+ }
239
+ ```
240
+
241
+ 4. **If auto-approving (no signals fired):**
242
+
243
+ a. **Emit structured auto-approve report:**
244
+
245
+ ```
246
+ Auto-approved Phase 1: Setup Infrastructure
247
+ Review mode: auto
248
+ Complexity: low (no override)
249
+ Planner concerns: none
250
+ Tasks: 8 (threshold: 15)
251
+ ```
252
+
253
+ b. **Record the decision** in state `decisions` array:
254
+
255
+ ```json
256
+ {
257
+ "phase": 0,
258
+ "decision": "auto_approved_plan",
259
+ "timestamp": "ISO-8601",
260
+ "signals": {
261
+ "reviewPlans": false,
262
+ "highComplexity": "low",
263
+ "complexityOverride": null,
264
+ "plannerConcerns": [],
265
+ "taskCount": 8,
266
+ "taskThreshold": 15
267
+ }
268
+ }
269
+ ```
270
+
271
+ c. **Transition to EXECUTE** — no human interaction needed.
272
+
273
+ 5. **If pausing for review (one or more signals fired):**
206
274
 
207
- 2. **Ask:** "Approve this plan and begin execution? (yes / revise / skip phase / stop)"
275
+ a. **Emit structured pause report** showing which signal(s) triggered:
276
+
277
+ ```
278
+ Pausing for review -- Phase 2: Auth Middleware
279
+ Review mode: manual (--review-plans flag set)
280
+ Complexity override: low -> medium (triggered)
281
+ Planner concerns: 2 concern(s)
282
+ Tasks: 12 (threshold: 15)
283
+ ```
284
+
285
+ Mark triggered signals explicitly. Non-triggered signals display their normal value without "(triggered)".
286
+
287
+ b. **Present the plan summary:** task count, checkpoint count, estimated time, effective complexity, and any concerns from the planning handoff.
288
+
289
+ c. **Ask:** "Approve this plan and begin execution? (yes / revise / skip phase / stop)"
208
290
  - **yes** — Transition to EXECUTE.
209
- - **revise** — Tell user to edit the plan file directly, then re-present.
291
+ - **revise** — Tell user to edit the plan file directly, then re-present from step 1.
210
292
  - **skip phase** — Mark phase as `skipped` in state, transition to PHASE_COMPLETE.
211
293
  - **stop** — Save state and exit. User can resume later.
212
294
 
213
- 3. **Record the decision** in state: `decisions` array.
295
+ d. **Record the decision** in state `decisions` array:
296
+
297
+ ```json
298
+ {
299
+ "phase": 0,
300
+ "decision": "approved_plan",
301
+ "timestamp": "ISO-8601",
302
+ "signals": {
303
+ "reviewPlans": true,
304
+ "highComplexity": "low",
305
+ "complexityOverride": "medium",
306
+ "plannerConcerns": ["concern text"],
307
+ "taskCount": 12,
308
+ "taskThreshold": 15
309
+ }
310
+ }
311
+ ```
312
+
313
+ Use the actual decision value: `approved_plan`, `revised_plan`, `skipped_phase`, or `stopped`.
214
314
 
215
- 4. **Update state** with `currentState: "EXECUTE"` and save.
315
+ 6. **Update state** with `currentState: "EXECUTE"` (or appropriate state for skip/stop) and save.
216
316
 
217
317
  ---
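The signal table in the APPROVE_PLAN hunk above amounts to a short predicate. The following is a sketch under stated assumptions: `shouldPauseForReview` is named in the skill text, but its real signature is not shown in this diff, and `ReviewInputs`, `PhaseInfo`, `firedSignals`, and `TASK_THRESHOLD` are hypothetical names used only for illustration.

```typescript
type Complexity = "low" | "medium" | "high";

interface PhaseInfo {
  complexity: Complexity;
  complexityOverride: Complexity | null;
}

// Hypothetical input bundle gathered in step 1 of APPROVE_PLAN.
interface ReviewInputs {
  reviewPlans: boolean; // state.reviewPlans
  phase: PhaseInfo;
  concerns: string[]; // handoff.json `concerns`, [] when the field is absent
  taskCount: number; // from the plan file
}

const TASK_THRESHOLD = 15;

// Returns the names of the signals that fired, in the table's order.
function firedSignals(input: ReviewInputs): string[] {
  const checks: Array<[string, boolean]> = [
    ["reviewPlans", input.reviewPlans === true],
    ["highComplexity", input.phase.complexity === "high"],
    ["complexityOverride", input.phase.complexityOverride !== null],
    ["plannerConcerns", input.concerns.length > 0],
    ["taskCount", input.taskCount > TASK_THRESHOLD],
  ];
  return checks.filter(([, fired]) => fired).map(([name]) => name);
}

// Pause when any signal fires; auto-approve only when none do.
function shouldPauseForReview(input: ReviewInputs): boolean {
  return firedSignals(input).length > 0;
}
```

Returning the fired signal names, rather than only a boolean, would make it straightforward to mark entries as "(triggered)" in the pause report.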
218
318
 
@@ -289,9 +389,9 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
289
389
 
290
390
  2. **When the agent returns:**
291
391
  - **All checks pass:** Transition to REVIEW.
292
- - **Failures found:** Surface findings to the user. Ask: "Fix these issues before review? (yes / skip verification / stop)"
293
- - **yes** — Re-enter EXECUTE with targeted fixes (retry budget resets for verification fixes).
294
- - **skip** — Proceed to REVIEW with verification warnings noted.
392
+ - **Failures found:** Surface findings to the user. Ask: "Fix these issues before review? (fix / skip verification / stop)"
393
+ - **fix** — Re-enter EXECUTE with targeted fixes (retry budget resets for verification fixes).
394
+ - **skip** — Record skip decision in `decisions` array. Proceed to REVIEW with verification warnings noted.
295
395
  - **stop** — Save state and exit.
296
396
 
297
397
  3. **Update state** with `currentState: "REVIEW"` and save.
@@ -320,9 +420,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
320
420
  ```
321
421
 
322
422
  2. **When the agent returns:**
423
+ - **Persist review findings:** Write the review findings to `{sessionDir}/phase-{N}-review.json` (array of findings with severity, file, line, title). This file is consumed by FINAL_REVIEW step 3.
323
424
  - **No blocking findings:** Report summary, transition to PHASE_COMPLETE.
324
- - **Blocking findings:** Surface to user. Ask: "Address blocking findings before completing this phase? (yes / override / stop)"
325
- - **yes** — Re-enter EXECUTE with review fixes.
425
+ - **Blocking findings:** Surface to user. Ask: "Address blocking findings before completing this phase? (fix / override / stop)"
426
+ - **fix** — Re-enter EXECUTE with review fixes.
326
427
  - **override** — Record override decision, transition to PHASE_COMPLETE.
327
428
  - **stop** — Save state and exit.
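For orientation, the persisted findings file described in this hunk might be modeled as below. This is only a sketch: the diff names the fields (severity, file, line, title) and the path pattern `{sessionDir}/phase-{N}-review.json`, and the severity levels are taken from the FINAL_REVIEW prompt (blocking / warning / note). The type names and helper functions are hypothetical, and whether `N` is zero- or one-based is not stated in the diff.

```typescript
type Severity = "blocking" | "warning" | "note";

// One entry of the array written to phase-{N}-review.json.
interface ReviewFinding {
  severity: Severity;
  file: string;
  line: number;
  title: string;
}

// Builds the per-phase findings path; the caller decides how phases are numbered.
function phaseReviewPath(sessionDir: string, phaseNumber: number): string {
  return `${sessionDir}/phase-${phaseNumber}-review.json`;
}

// Findings that should trigger the fix / override / stop prompt.
function blockingFindings(findings: ReviewFinding[]): ReviewFinding[] {
  return findings.filter((finding) => finding.severity === "blocking");
}
```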
328
429
 
@@ -379,7 +480,76 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
379
480
  - If more phases remain: "Phase {N} complete. Next: Phase {N+1}: {name} (complexity: {level}). Continue? (yes / stop)"
380
481
  - **yes** — Increment `currentPhase`, reset `retryBudget`, transition to ASSESS.
381
482
  - **stop** — Save state and exit.
382
- - If no more phases: Transition to DONE.
483
+ - If no more phases: Transition to FINAL_REVIEW.
484
+
485
+ ---
486
+
487
+ ### FINAL_REVIEW — Project-Wide Code Review
488
+
489
+ > Runs automatically after the last phase completes. Reviews the cumulative diff (`startingCommit..HEAD`) across all phases to catch cross-phase issues before the PR offer.
490
+
491
+ 1. **Update state** with `currentState: "FINAL_REVIEW"` and save.
492
+
493
+ 2. **Update `finalReview` tracking** in `autopilot-state.json`: set `finalReview.status` to `"in_progress"`.
494
+
495
+ 3. **Gather per-phase review findings.** Read from `{sessionDir}/` — each phase's review output is stored alongside the phase handoff. Collect all review findings across phases into a single context block.
496
+
497
+ 4. **Dispatch review agent using the Agent tool:**
498
+
499
+ ```
500
+ Agent tool parameters:
501
+ subagent_type: "harness-code-reviewer"
502
+ description: "Final review: cross-phase coherence check"
503
+ prompt: |
504
+ You are running harness-code-review as a final project-wide review.
505
+
506
+ Diff scope: startingCommit..HEAD (use `git diff {startingCommit}..HEAD`)
507
+ Starting commit: {startingCommit}
508
+ Session directory: {sessionDir}
509
+ Session slug: {sessionSlug}
510
+
511
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
512
+ session-scoped learnings, state, and validation context.
513
+
514
+ ## Per-Phase Review Findings
515
+
516
+ {collected per-phase findings}
517
+
518
+ These were found and addressed during per-phase reviews. Don't assume
519
+ they're resolved — verify. Focus extra attention on cross-phase coherence:
520
+ naming consistency, duplicated utilities, architectural drift across phases.
521
+
522
+ Review the FULL diff (startingCommit..HEAD), not just the last phase.
523
+ Report findings with severity (blocking / warning / note).
524
+ ```
525
+
526
+ 5. **When the agent returns:**
527
+ - **No blocking findings:** Store all findings (blocking, warning, note) in `finalReview.findings`. Update `finalReview.status` to `"passed"`, report summary, transition to DONE.
528
+ - **Blocking findings:** Store all findings (blocking, warning, note) in `finalReview.findings`. Surface blocking findings to user. Ask: "Address blocking findings before completing? (fix / override / stop)"
529
+ - **fix** — Increment `finalReview.retryCount`. If `retryCount <= 3`: dispatch fixes using the Agent tool, then run `harness validate` to verify the fix, then re-run FINAL_REVIEW from step 2 (re-sets status to `in_progress`, re-gathers per-phase findings for fresh context). If `retryCount > 3`: stop — present all attempts to user, record in `.harness/failures.md`, ask: "How should we proceed? (fix manually and continue / stop)"
530
+
531
+ Fix dispatch:
532
+
533
+ ```
534
+ Agent tool parameters:
535
+ subagent_type: "harness-task-executor"
536
+ description: "Fix final review findings"
537
+ prompt: |
538
+ Fix the following blocking review findings. One task per finding.
539
+
540
+ {blocking findings with file, line, title, and rationale}
541
+
542
+ Session directory: {sessionDir}
543
+ Session slug: {sessionSlug}
544
+
545
+ Follow the harness-execution skill process. Commit each fix atomically.
546
+ Write {sessionDir}/handoff.json when done.
547
+ ```
548
+
549
+ - **override** — Record override decision (rationale from user) in state `decisions` array. Update `finalReview.status` to `"overridden"`. Transition to DONE.
550
+ - **stop** — Save state and exit. Resumable from FINAL_REVIEW.
551
+
552
+ 6. **Update state** and save after each step.
383
553
 
384
554
  ---
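The retry rule in the FINAL_REVIEW **fix** branch (increment first, then compare against 3) can be sketched as a small pure function. This assumes nothing about the package's internals: `onFixRequested`, `FixAction`, and `MAX_FIX_RETRIES` are hypothetical names, and the status values are the ones that appear in the skill text.

```typescript
interface FinalReviewState {
  status: "pending" | "in_progress" | "passed" | "overridden";
  findings: unknown[];
  retryCount: number;
}

const MAX_FIX_RETRIES = 3;

type FixAction = "dispatch_fix_and_rerun" | "escalate_to_user";

// Applies the "fix" answer: bump retryCount, then decide whether another
// automated fix pass is allowed or the user has to take over.
function onFixRequested(review: FinalReviewState): { review: FinalReviewState; action: FixAction } {
  const retryCount = review.retryCount + 1;
  const action: FixAction =
    retryCount <= MAX_FIX_RETRIES ? "dispatch_fix_and_rerun" : "escalate_to_user";
  return { review: { ...review, retryCount }, action };
}
```

Because the counter is incremented before the comparison, three automated fix passes are allowed and the fourth "fix" answer escalates to the user.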
385
555
 
@@ -390,7 +560,8 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
390
560
  - Total tasks across all phases
391
561
  - Total retries used
392
562
  - Total time (first phase start to last phase completion)
393
- - Any overridden review findings
563
+ - Final review result: `finalReview.status` (passed / overridden) and total findings count from `finalReview.findings`
564
+ - Any overridden review findings (per-phase and final)
394
565
 
395
566
  2. **Offer next steps:**
396
567
  - "Create a PR? (yes / no)"
@@ -407,7 +578,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
407
578
  "pending": [],
408
579
  "concerns": [],
409
580
  "decisions": ["<all decisions from all phases>"],
410
- "contextKeywords": ["<merged from spec>"]
581
+ "contextKeywords": ["<merged from spec>"],
582
+ "finalReview": {
583
+ "status": "<passed | overridden>",
584
+ "findingsCount": "<number of findings from final review>"
585
+ }
411
586
  }
412
587
  ```
413
588
 
@@ -419,9 +594,13 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
419
594
  - [skill:harness-autopilot] [outcome:observation] {any notable patterns from the run}
420
595
  ```
421
596
 
422
- 5. **Update roadmap to done.** If `docs/roadmap.md` exists and the current spec maps to a roadmap feature, call `manage_roadmap` with action `update` to set the feature status to `done`. Derive the feature name from the spec title (H1 heading) or the session's `handoff.json` `summary` field. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `updateFeature()` from core. Skip silently if no roadmap exists or if the feature is not found. Do not use `force_sync: true`.
597
+ 5. **Promote session learnings to global.** Call `promoteSessionLearnings(projectPath, sessionSlug)` to move generalizable session learnings (tagged `[outcome:gotcha]`, `[outcome:decision]`, `[outcome:observation]`) to the global `learnings.md`. Report: "Promoted {N} learnings to global, {M} session-specific entries kept in session."
598
+
599
+ 6. **Check if pruning is needed.** Call `countLearningEntries(projectPath)`. If the count exceeds 30, suggest: "Global learnings.md has {count} entries (threshold: 30). Run `harness learnings prune` to analyze patterns and archive old entries."
600
+
601
+ 7. **Update roadmap to done.** If `docs/roadmap.md` exists and the current spec maps to a roadmap feature, call `manage_roadmap` with action `update` to set the feature status to `done`. Derive the feature name from the spec title (H1 heading) or the session's `handoff.json` `summary` field. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `updateFeature()` from core. Skip silently if no roadmap exists or if the feature is not found. Do not use `force_sync: true`.
423
602
 
424
- 6. **Write final session summary.** Update the session summary to reflect completion:
603
+ 8. **Write final session summary.** Update the session summary to reflect completion:
425
604
 
426
605
  ```json
427
606
  writeSessionSummary(projectPath, sessionSlug, {
@@ -435,7 +614,7 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
435
614
  })
436
615
  ```
437
616
 
438
- 7. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
617
+ 9. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
439
618
 
440
619
  ## Harness Integration
441
620
 
@@ -444,7 +623,7 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
444
623
  - **`harness check-deps`** — Delegated to harness-execution (included in task steps).
445
624
  - **State file** — `.harness/sessions/<slug>/autopilot-state.json` tracks the orchestration state machine. `.harness/sessions/<slug>/state.json` tracks task-level execution state (managed by harness-execution). The slug is derived from the spec path during INIT.
446
625
  - **Handoff** — `.harness/sessions/<slug>/handoff.json` is written by each delegated skill and read by the next. Autopilot writes a final handoff on DONE.
447
- - **Learnings** — `.harness/learnings.md` (global) is appended by both delegated skills and autopilot itself.
626
+ - **Learnings** — `.harness/learnings.md` (global) is appended by both delegated skills and autopilot itself. On DONE, session learnings with generalizable outcomes are promoted to global via `promoteSessionLearnings`. If global count exceeds 30, autopilot suggests running `harness learnings prune`.
448
627
  - **Roadmap context** — During INIT, reads `docs/roadmap.md` (if present) for project-level priorities, blockers, and milestone status. Provides broader context for phase execution decisions.
449
628
  - **Roadmap sync** — During PHASE_COMPLETE, calls `manage_roadmap` with `sync` and `apply: true` to reflect phase progress. During DONE, calls `manage_roadmap` with `update` to set feature status to `done`. Both skip silently when no roadmap exists. Neither uses `force_sync: true`.
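The pruning suggestion mentioned in the Learnings bullet reduces to a threshold check. A minimal sketch, assuming the count has already been obtained (for example from `countLearningEntries`, whose signature is not shown in this diff); `pruneSuggestion` and `PRUNE_THRESHOLD` are hypothetical names.

```typescript
const PRUNE_THRESHOLD = 30;

// Returns the suggestion text when the global count exceeds the threshold, else null.
function pruneSuggestion(count: number): string | null {
  if (count <= PRUNE_THRESHOLD) {
    return null;
  }
  return (
    `Global learnings.md has ${count} entries (threshold: ${PRUNE_THRESHOLD}). ` +
    "Run `harness learnings prune` to analyze patterns and archive old entries."
  );
}
```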
450
629
 
@@ -456,7 +635,8 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
  - Planning override bumps complexity upward when task signals disagree
  - Retry budget (3 attempts) with escalating context before surfacing failures
  - Existing skills (planning, execution, verification, review) are unchanged
- - Human approves every plan before execution begins
+ - Plans auto-approve when no concern signals fire; plans pause for human review when any signal fires
+ - `--review-plans` flag forces human review for all plans in a session
  - Phase completion summary shown between every phase
 
  ## Examples
@@ -491,10 +671,11 @@ Plan generated: docs/plans/2026-03-19-core-scanner-plan.md (8 tasks, ~24 min)
  **Phase 1 — APPROVE_PLAN:**
 
  ```
- Phase 1: Core Scanner
- Tasks: 8 | Checkpoints: 1 | Est. time: 24 min | Complexity: low
- Approve this plan and begin execution? (yes / revise / skip / stop)
- User: "yes"
+ Auto-approved Phase 1: Core Scanner
+ Review mode: auto
+ Complexity: low (no override)
+ Planner concerns: none
+ Tasks: 8 (threshold: 15)
  ```
 
  **Phase 1 — EXECUTE → VERIFY → REVIEW:**
@@ -533,10 +714,22 @@ Resuming autopilot from state PLAN, phase 2: Rule Engine.
  Found plan: docs/plans/2026-03-19-rule-engine-plan.md
  ```
 
- **Phase 2 — APPROVE_PLAN → EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE**
+ **Phase 2 — APPROVE_PLAN:**
 
  ```
- [Same flow as Phase 1, with checkpoint pauses as needed]
+ Pausing for review -- Phase 2: Rule Engine
+ Review mode: auto
+ Complexity: high (triggered)
+ Planner concerns: none
+ Tasks: 14 (threshold: 15)
+ Approve this plan and begin execution? (yes / revise / skip / stop)
+ → User: "yes"
+ ```
+
+ **Phase 2 — EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE**
+
+ ```
+ [Execution with checkpoint pauses as needed]
  Phase 2: Rule Engine — COMPLETE
  Tasks: 14/14 | Retries: 1 | Verification: pass | Review: 0 blocking
  Next: Phase 3: CLI Integration (complexity: low). Continue? (yes / stop)
@@ -545,11 +738,19 @@ Next: Phase 3: CLI Integration (complexity: low). Continue? (yes / stop)
 
  **Phase 3 — [auto-plans, executes, completes]**
 
+ **FINAL_REVIEW:**
+
+ ```
+ [harness-code-reviewer runs cross-phase review on startingCommit..HEAD]
+ Final review: 0 blocking, 1 warning. Passed.
+ ```
+
  **DONE:**
 
  ```
  All phases complete.
  Total: 3 phases, 30 tasks, 1 retry
+ Final review: passed (0 blocking, 1 warning)
  Create a PR? (yes / no)
  → User: "yes"
  ```
@@ -592,7 +793,7 @@ How should we proceed? (fix manually and continue / revise plan / stop)
  ## Gates
 
  - **No reimplementing delegated skills.** Autopilot orchestrates. If you are writing planning logic, execution logic, verification logic, or review logic, STOP. Delegate to the appropriate persona agent via `subagent_type`.
- - **No executing without plan approval.** Every plan must be explicitly approved by the human before execution begins. No exceptions, regardless of complexity level.
+ - **No executing without plan approval.** Every plan passes through the APPROVE_PLAN gate. When no concern signals fire, the plan is auto-approved with a structured report. When any signal fires, the plan pauses for human review. The `--review-plans` flag forces all plans to pause. No plan reaches EXECUTE without passing this gate.
  - **No skipping VERIFY or REVIEW.** Every phase goes through verification and review. The human can override findings, but the steps cannot be skipped.
  - **No infinite retries.** The retry budget is 3 attempts. If exhausted, STOP and surface to the human. Do not extend the budget without explicit human instruction.
  - **No modifying session state files manually.** The session state files are managed by the skill. If the state appears corrupted, start fresh rather than patching it.
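
The APPROVE_PLAN gate described in the hunk above can be sketched as a small decision function. This is only an illustration, not the package's implementation: the `PlanSummary` and `GateDecision` shapes, the `evaluateApprovalGate` name, and the exact firing conditions (complexity `high`, any planner concern, task count strictly above the threshold of 15) are assumptions inferred from the example output in this diff.

```typescript
// Hypothetical shapes; the real autopilot state schema is not shown in this diff.
type Complexity = "low" | "medium" | "high";

interface PlanSummary {
  phase: string;
  complexity: Complexity;
  plannerConcerns: string[];
  taskCount: number;
}

interface GateDecision {
  autoApproved: boolean;
  firedSignals: string[];
}

// Taken from "threshold: 15" in the example output; whether the comparison
// is ">" or ">=" is an assumption.
const TASK_THRESHOLD = 15;

function evaluateApprovalGate(plan: PlanSummary, reviewPlans: boolean): GateDecision {
  const firedSignals: string[] = [];
  // --review-plans forces a pause regardless of the other signals.
  if (reviewPlans) firedSignals.push("review-plans flag");
  if (plan.complexity === "high") firedSignals.push("complexity");
  if (plan.plannerConcerns.length > 0) firedSignals.push("planner concerns");
  if (plan.taskCount > TASK_THRESHOLD) firedSignals.push("task count");
  // Auto-approve only when no signal fired.
  return { autoApproved: firedSignals.length === 0, firedSignals };
}

// The two phases from the Examples section of the skill.
const phase1 = evaluateApprovalGate(
  { phase: "Core Scanner", complexity: "low", plannerConcerns: [], taskCount: 8 },
  false,
);
const phase2 = evaluateApprovalGate(
  { phase: "Rule Engine", complexity: "high", plannerConcerns: [], taskCount: 14 },
  false,
);
console.log(phase1.autoApproved, phase2.autoApproved, phase2.firedSignals);
```

Under these assumptions Phase 1 auto-approves and Phase 2 pauses on the complexity signal alone, which matches the example transcript. Collecting the fired signals rather than returning a bare boolean keeps the structured report easy to produce.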
@@ -23,6 +23,9 @@ cli:
  - name: path
  description: Project root path
  required: false
+ - name: review-plans
+ description: Force human review of all plans (overrides auto-approve)
+ required: false
  mcp:
  tool: run_skill
  input:
@@ -37,6 +40,9 @@ phases:
  - name: loop
  description: Execute state machine — assess, plan, execute, verify, review per phase
  required: true
+ - name: final_review
+ description: Project-wide code review of cumulative changes before PR offer
+ required: true
  - name: complete
  description: Final summary and PR offering
  required: true
@@ -259,6 +259,45 @@ For each proposed approach, evaluate from each perspective:
 
  Converge on a recommendation that addresses all concerns before presenting the design.
 
+ ## Session State
+
+ This skill reads and writes to the following session sections via `manage_state`:
+
+ | Section | Read | Write | Purpose |
+ | ------------- | ---- | ----- | ----------------------------------------------------------------- |
+ | terminology | yes | yes | Captures domain terms discovered during brainstorming |
+ | decisions | no | yes | Records design decisions made during exploration |
+ | constraints | yes | no | Reads constraints to scope brainstorming |
+ | risks | no | yes | Captures risks identified during brainstorming |
+ | openQuestions | yes | yes | Adds new questions, resolves answered ones |
+ | evidence | no | yes | Cites sources for design recommendations and prior art references |
+
+ **When to write:** After each phase transition (EXPLORE -> EVALUATE -> PRIORITIZE -> VALIDATE), append relevant entries to the appropriate sections. This ensures downstream skills (planning, execution) inherit accumulated context without re-discovery.
+
+ **When to read:** At the start of Phase 1 (EXPLORE), read `terminology` and `constraints` from the session to inherit context from prior skills or previous brainstorming sessions on the same feature.
+
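
The read/write table above can be pictured as a typed access map. The following is a minimal sketch, assuming nothing about how `manage_state` enforces access; the `SessionSection` and `Access` types and the helper names are hypothetical and exist only for illustration.

```typescript
// Section names come from the table; the types and helpers are hypothetical.
type SessionSection =
  | "terminology"
  | "decisions"
  | "constraints"
  | "risks"
  | "openQuestions"
  | "evidence";

interface Access {
  read: boolean;
  write: boolean;
}

// Mirrors the harness-brainstorming Session State table row by row.
const brainstormingAccess: Record<SessionSection, Access> = {
  terminology: { read: true, write: true },
  decisions: { read: false, write: true },
  constraints: { read: true, write: false },
  risks: { read: false, write: true },
  openQuestions: { read: true, write: true },
  evidence: { read: false, write: true },
};

// Sections the skill may read from the session.
function readableSections(): SessionSection[] {
  return (Object.keys(brainstormingAccess) as SessionSection[]).filter(
    (section) => brainstormingAccess[section].read,
  );
}

// Whether the skill may append entries to a given section.
function canWrite(section: SessionSection): boolean {
  return brainstormingAccess[section].write;
}

console.log(readableSections());
console.log(canWrite("constraints"));
```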
+ ## Evidence Requirements
+
+ When this skill makes claims about existing code behavior, architecture patterns, or technical tradeoffs, it MUST cite evidence using one of:
+
+ 1. **File reference:** `file:line` format (e.g., `src/services/auth.ts:42` -- "existing JWT middleware handles token refresh")
+ 2. **Prior art reference:** `file` format with description (e.g., `src/utils/email.ts` -- "email utility already exists, can be reused for notifications")
+ 3. **Documentation reference:** `docs/path` format (e.g., `docs/changes/user-auth/proposal.md` -- "prior spec established OAuth2 as the auth standard")
+ 4. **Session evidence:** Write to the `evidence` session section:
+ ```json
+ manage_state({
+ action: "append_entry",
+ session: "<current-session>",
+ section: "evidence",
+ authorSkill: "harness-brainstorming",
+ content: "src/services/auth.ts:42 -- existing JWT middleware supports refresh tokens"
+ })
+ ```
+
+ **When to cite:** During Phase 1 (EXPLORE) when referencing existing code or patterns. During Phase 3 (PRIORITIZE) when justifying tradeoffs with concrete code references. During Phase 4 (VALIDATE) when spec references existing implementation details.
+
+ **Uncited claims:** Technical assertions without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The current auth middleware does not support refresh tokens`. Uncited claims are flagged during review (Wave 2.2).
+
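
The `file:line -- description` citation format and the `[UNVERIFIED]` rule above could be checked mechanically along these lines. This is a hedged sketch: `parseCitation`, `tagClaim`, and the `Citation` shape are hypothetical names, and the regular expression covers only the `file:line` form (with an optional line range), not the bare `file` or `docs/path` forms.

```typescript
// Hypothetical parsed form of a "file:line -- note" citation.
interface Citation {
  file: string;
  startLine: number;
  endLine: number;
  note: string;
}

// Matches "path:42 -- note" or "path:12-15 -- note".
const FILE_LINE = /^(\S+?):(\d+)(?:-(\d+))?\s+--\s+(.+)$/;

function parseCitation(entry: string): Citation | null {
  const match = FILE_LINE.exec(entry);
  if (match === null) return null;
  const startLine = Number(match[2]);
  // A single line number is treated as a one-line range.
  const endLine = match[3] !== undefined ? Number(match[3]) : startLine;
  return { file: match[1], startLine, endLine, note: match[4] };
}

// Prefixes a claim with [UNVERIFIED] unless it carries a file:line citation
// or is already tagged.
function tagClaim(claim: string): string {
  const alreadyTagged = claim.startsWith("[UNVERIFIED]");
  const cited = parseCitation(claim) !== null;
  return alreadyTagged || cited ? claim : `[UNVERIFIED] ${claim}`;
}

console.log(parseCitation("src/services/auth.ts:42 -- existing JWT middleware supports refresh tokens"));
console.log(tagClaim("The current auth middleware does not support refresh tokens"));
```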
  ## Harness Integration
 
  - **`harness validate`** — Run after writing the spec to `docs/`. Verifies project health and that the new spec file is properly placed.
@@ -138,6 +138,24 @@ Run mechanical checks to establish an exclusion boundary. Any issue caught mecha
 
  **Output:** A set of mechanical findings (file, line, tool, message). This set becomes the exclusion list for Phase 5.
 
+ #### Evidence Gate (session-aware)
+
+ When a `sessionSlug` is available (e.g., via autopilot dispatch or `--session` flag), the pipeline loads evidence entries from the session state and cross-references them with review findings:
+
+ 1. Load evidence entries: `readSessionSection(projectRoot, sessionSlug, 'evidence')`
+ 2. For each finding, check if any active evidence entry references the same file:line location
+ 3. Findings without matching evidence are tagged with `[UNVERIFIED]` prefix in their title
+ 4. An evidence coverage report is appended to the review output:
+ ```
+ Evidence Coverage:
+ Evidence entries: 12
+ Findings with evidence: 8/10
+ Uncited findings: 2 (flagged as [UNVERIFIED])
+ Coverage: 80%
+ ```
+
+ When no session is available, evidence checking is skipped silently. This is not an error -- evidence checking enhances reviews but does not gate them.
+
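
Steps 2 through 4 of the Evidence Gate above can be sketched as follows. This assumes the evidence entries have already been loaded and parsed into `file`/`line` pairs; the `Finding`, `EvidenceEntry`, and `applyEvidenceGate` names are hypothetical, as is the choice to count all entries (active or not) in the "Evidence entries" line and to report 100% coverage when there are no findings.

```typescript
// Hypothetical shapes: only the [UNVERIFIED] tag and the report layout come
// from the skill text.
interface Finding {
  title: string;
  file: string;
  line: number;
}

interface EvidenceEntry {
  file: string;
  line: number;
  active: boolean;
}

interface GateResult {
  findings: Finding[];
  report: string;
}

function applyEvidenceGate(findings: Finding[], evidence: EvidenceEntry[]): GateResult {
  // Index active evidence by "file:line" for constant-time lookups.
  const cited = new Set(
    evidence.filter((entry) => entry.active).map((entry) => `${entry.file}:${entry.line}`),
  );
  let withEvidence = 0;
  const tagged = findings.map((finding) => {
    if (cited.has(`${finding.file}:${finding.line}`)) {
      withEvidence += 1;
      return finding;
    }
    // No matching active evidence: prefix the title.
    return { ...finding, title: `[UNVERIFIED] ${finding.title}` };
  });
  const uncited = findings.length - withEvidence;
  // Assumption: an empty finding list counts as full coverage.
  const coverage =
    findings.length === 0 ? 100 : Math.round((withEvidence / findings.length) * 100);
  const report = [
    "Evidence Coverage:",
    `  Evidence entries: ${evidence.length}`,
    `  Findings with evidence: ${withEvidence}/${findings.length}`,
    `  Uncited findings: ${uncited} (flagged as [UNVERIFIED])`,
    `  Coverage: ${coverage}%`,
  ].join("\n");
  return { findings: tagged, report };
}

const gate = applyEvidenceGate(
  [
    { title: "Layer violation", file: "src/api/routes/users.ts", line: 12 },
    { title: "Missing null check", file: "src/services/auth.ts", line: 42 },
  ],
  [
    { file: "src/api/routes/users.ts", line: 12, active: true },
    { file: "src/services/auth.ts", line: 42, active: false },
  ],
);
console.log(gate.report);
```

In this run the second finding is tagged because its only matching entry is inactive, so coverage comes out at 50%. Because the gate only annotates and reports, it fits the statement that evidence checking does not stop the pipeline.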
  **Exit:** If any mechanical check fails (harness validate, typecheck, or tests), report the mechanical failures in Strengths/Issues/Assessment format and stop the pipeline. The code has fundamental issues that must be fixed before AI review adds value. Lint warnings and security scan findings do not stop the pipeline — they are recorded for exclusion only.
 
  ---
@@ -628,6 +646,32 @@ _This section is not part of the pipeline. It documents the process for respondi
 
  ---
 
+ ## Evidence Requirements
+
+ When this skill produces review findings, every finding MUST include evidence citations. The `ReviewFinding.evidence` array field already exists in the finding schema -- this section defines the citation standard for populating it.
+
+ Every review finding MUST cite evidence using one of:
+
+ 1. **File reference:** `file:line` format (e.g., `src/api/routes/users.ts:12-15` -- "direct import from db/queries.ts bypasses service layer")
+ 2. **Diff evidence:** Before/after code from the PR diff with file path and line numbers
+ 3. **Dependency chain:** Import path showing the violation (e.g., `routes/users.ts:3 imports db/queries.ts` -- "violates routes -> services -> db layer direction")
+ 4. **Test evidence:** Include test command and output when findings relate to missing or failing tests
+ 5. **Convention reference:** Cite the specific convention file and rule (e.g., `AGENTS.md:45` -- "convention requires services layer between routes and db")
+ 6. **Session evidence:** Write significant findings to the `evidence` session section:
+ ```json
+ manage_state({
+ action: "append_entry",
+ session: "<current-session>",
+ section: "evidence",
+ authorSkill: "harness-code-review",
+ content: "src/api/routes/users.ts:12-15 -- layer violation: direct import from db/queries.ts"
+ })
+ ```
+
+ **When to cite:** In Phase 4 (FAN-OUT), each subagent populates the `evidence` array in every `ReviewFinding`. In Phase 5 (VALIDATE), evidence is used to verify reachability claims. In Phase 7 (OUTPUT), every issue in the review includes its file:line location and rationale backed by evidence.
+
+ **Uncited claims:** Review findings without evidence in the `evidence` array are discarded during Phase 5 (VALIDATE). Observations that cannot be tied to specific file:line references MUST be prefixed with `[UNVERIFIED]` and downgraded to `severity: 'suggestion'`.
+
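
One possible reading of the uncited-claims rule above, sketched in code: findings with an empty `evidence` array are dropped, while findings whose evidence contains no `file:line` reference are kept but tagged and downgraded. Apart from `ReviewFinding.evidence` and the `'suggestion'` severity, everything here is an assumption, including the other severity values, the `title` field, and the `validateFindings` name.

```typescript
// Only 'suggestion' appears in the skill text; the other values are placeholders.
type Severity = "critical" | "important" | "suggestion";

// Hypothetical subset of the finding schema; only `evidence` is confirmed by the text.
interface ReviewFinding {
  title: string;
  severity: Severity;
  evidence: string[];
}

// Loose check for a "path:line" or "path:start-end" reference inside a citation.
const FILE_LINE_REF = /\S+:\d+(-\d+)?/;

function validateFindings(findings: ReviewFinding[]): ReviewFinding[] {
  return findings
    // Findings with no evidence at all are discarded.
    .filter((finding) => finding.evidence.length > 0)
    .map((finding): ReviewFinding => {
      const located = finding.evidence.some((citation) => FILE_LINE_REF.test(citation));
      if (located) return finding;
      // Evidence exists but points at no file:line location: tag and downgrade.
      return { ...finding, title: `[UNVERIFIED] ${finding.title}`, severity: "suggestion" };
    });
}

const validated = validateFindings([
  {
    title: "Layer violation",
    severity: "critical",
    evidence: ["src/api/routes/users.ts:12-15 -- direct import from db/queries.ts"],
  },
  {
    title: "Inconsistent naming",
    severity: "important",
    evidence: ["naming differs across modules"],
  },
  { title: "Possible race", severity: "important", evidence: [] },
]);
console.log(validated.map((finding) => `${finding.severity}: ${finding.title}`));
```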
  ## Harness Integration
 
  - **`assess_project`** — Used in Phase 2 (MECHANICAL) to run `validate`, `deps`, and `docs` checks in parallel. Must pass for the pipeline to continue to AI review. Failures are Critical issues that stop the pipeline.
@@ -345,6 +345,50 @@ These are non-negotiable. When any condition is met, stop immediately.
 
  - **Three consecutive failures on the same task.** After 3 attempts, the task design is likely wrong. Stop. Report: "Task N has failed 3 times. Root cause: [analysis]. The plan may need revision."
 
+ ## Session State
+
+ This skill reads and writes to the following session sections via `manage_state`:
+
+ | Section | Read | Write | Purpose |
+ | ------------- | ---- | ----- | ------------------------------------------------------------------------------------- |
+ | terminology | yes | yes | Reads domain terms for consistent naming; adds terms discovered during implementation |
+ | decisions | yes | yes | Reads planning decisions for context; records implementation decisions |
+ | constraints | yes | yes | Reads constraints to respect boundaries; adds constraints discovered during coding |
+ | risks | yes | yes | Reads risks for awareness; updates risk status as mitigated or realized |
+ | openQuestions | yes | yes | Reads questions for context; resolves questions answered by implementation |
+ | evidence | yes | yes | Reads prior evidence; writes file:line citations, test outputs, and diff references |
+
+ **When to write:** After each task completion, append relevant entries. Evidence entries should be written for every significant technical assertion (test result, file reference, performance measurement). Mark openQuestions as resolved when implementation answers them.
+
+ **When to read:** During Phase 1 (PREPARE), read all sections via `gather_context` with `include: ["sessions"]` to inherit full accumulated context from brainstorming and planning.
+
+ ## Evidence Requirements
+
+ When this skill makes claims about task completion, test results, or code behavior, it MUST cite evidence using one of:
+
+ 1. **File reference:** `file:line` format (e.g., `src/services/notification-service.ts:42` -- "create method implemented with validation")
+ 2. **Test output:** Include the actual test command and its output:
+ ```
+ $ npx vitest run src/services/notification-service.test.ts
+ PASS src/services/notification-service.test.ts (8 tests)
+ ```
+ 3. **Diff evidence:** Before/after with file path for modifications to existing files
+ 4. **Harness output:** Include `harness validate` output as evidence of project health
+ 5. **Session evidence:** Write to the `evidence` session section after each task:
+ ```json
+ manage_state({
+ action: "append_entry",
+ session: "<current-session>",
+ section: "evidence",
+ authorSkill: "harness-execution",
+ content: "src/services/notification-service.ts:42 -- create method returns Notification with all required fields"
+ })
+ ```
+
+ **When to cite:** After every task completion in Phase 2 (EXECUTE). Every commit message claim ("added X", "fixed Y") must be backed by test output or file reference. During Phase 4 (PERSIST) when writing learnings that reference specific code behavior.
+
+ **Uncited claims:** Technical assertions without citations MUST be prefixed with `[UNVERIFIED]`. Example: `[UNVERIFIED] The notification service handles duplicate entries`. Uncited claims are flagged during review (Wave 2.2).
+
  ## Harness Integration
 
  - **`harness validate`** — Run after every task completion. Mandatory. No task is complete without a passing validation.