@harness-engineering/cli 1.11.0 → 1.13.0

Files changed (49)
  1. package/dist/agents/skills/claude-code/harness-autopilot/SKILL.md +57 -9
  2. package/dist/agents/skills/claude-code/harness-brainstorming/SKILL.md +1 -1
  3. package/dist/agents/skills/claude-code/harness-code-review/SKILL.md +19 -2
  4. package/dist/agents/skills/claude-code/harness-execution/SKILL.md +39 -12
  5. package/dist/agents/skills/claude-code/harness-planning/SKILL.md +28 -11
  6. package/dist/agents/skills/claude-code/harness-roadmap/SKILL.md +34 -0
  7. package/dist/agents/skills/claude-code/harness-verification/SKILL.md +42 -0
  8. package/dist/agents/skills/gemini-cli/harness-autopilot/SKILL.md +57 -9
  9. package/dist/agents/skills/gemini-cli/harness-brainstorming/SKILL.md +1 -1
  10. package/dist/agents/skills/gemini-cli/harness-code-review/SKILL.md +19 -2
  11. package/dist/agents/skills/gemini-cli/harness-execution/SKILL.md +39 -12
  12. package/dist/agents/skills/gemini-cli/harness-planning/SKILL.md +28 -11
  13. package/dist/agents/skills/gemini-cli/harness-roadmap/SKILL.md +34 -0
  14. package/dist/agents/skills/gemini-cli/harness-verification/SKILL.md +42 -0
  15. package/dist/{agents-md-ZFV6RR5J.js → agents-md-P2RHSUV7.js} +1 -1
  16. package/dist/{architecture-EXNUMH5R.js → architecture-ESOOE26S.js} +2 -2
  17. package/dist/bin/harness-mcp.js +10 -10
  18. package/dist/bin/harness.js +12 -12
  19. package/dist/{check-phase-gate-VZFOY2PO.js → check-phase-gate-S2MZKLFQ.js} +2 -2
  20. package/dist/{chunk-GSIVNYVJ.js → chunk-2VU4MFM3.js} +4 -4
  21. package/dist/{chunk-2NCIKJES.js → chunk-3KOLLWWE.js} +1 -1
  22. package/dist/{chunk-X3MN5UQJ.js → chunk-5VY23YK3.js} +1 -1
  23. package/dist/{chunk-I6JZYEGT.js → chunk-7KQSUZVG.js} +96 -50
  24. package/dist/{chunk-PA2XHK75.js → chunk-7PZWR4LI.js} +3 -3
  25. package/dist/{chunk-2YSQOUHO.js → chunk-KELT6K6M.js} +662 -283
  26. package/dist/{chunk-WUJTCNOU.js → chunk-LD3DKUK5.js} +1 -1
  27. package/dist/{chunk-Z75JC6I2.js → chunk-MACVXDZK.js} +2 -2
  28. package/dist/{chunk-NC6PXVWT.js → chunk-MI5XJQDY.js} +3 -3
  29. package/dist/{chunk-WJZDO6OY.js → chunk-PSNN4LWX.js} +2 -2
  30. package/dist/{chunk-ZWC3MN5E.js → chunk-RZSUJBZZ.js} +765 -203
  31. package/dist/{chunk-TI4TGEX6.js → chunk-WPPDRIJL.js} +1 -1
  32. package/dist/{ci-workflow-K5RCRNYR.js → ci-workflow-4NYBUG6R.js} +1 -1
  33. package/dist/{dist-JVZ2MKBC.js → dist-WF4C7A4A.js} +27 -1
  34. package/dist/{docs-PWCUVYWU.js → docs-BPYCN2DR.js} +2 -2
  35. package/dist/{engine-6XUP6GAK.js → engine-LXLIWQQ3.js} +1 -1
  36. package/dist/{entropy-4I6JEYAC.js → entropy-4VDVV5CR.js} +2 -2
  37. package/dist/{feedback-TNIW534S.js → feedback-63QB5RCA.js} +1 -1
  38. package/dist/{generate-agent-definitions-MWKEA5NU.js → generate-agent-definitions-QABOJG56.js} +1 -1
  39. package/dist/index.d.ts +80 -43
  40. package/dist/index.js +17 -13
  41. package/dist/{loader-4FIPIFII.js → loader-Z2IT7QX3.js} +1 -1
  42. package/dist/{mcp-MOKLYNZL.js → mcp-KQHEL5IF.js} +10 -10
  43. package/dist/{performance-BTOJCPXU.js → performance-26BH47O4.js} +2 -2
  44. package/dist/{review-pipeline-3YTW3463.js → review-pipeline-GHR3WFBI.js} +1 -1
  45. package/dist/{runtime-GO7K2PJE.js → runtime-PDWD7UIK.js} +1 -1
  46. package/dist/{security-4P2GGFF6.js → security-UQFUZXEN.js} +1 -1
  47. package/dist/{validate-JN44D2Q7.js → validate-N7QJOKFZ.js} +2 -2
  48. package/dist/{validate-cross-check-DB7RIFFF.js → validate-cross-check-EDQ5QGTM.js} +1 -1
  49. package/package.json +4 -4
@@ -102,20 +102,26 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
102
102
  path: "<project-root>",
103
103
  intent: "Autopilot phase execution for <spec name>",
104
104
  skill: "harness-autopilot",
105
+ session: "<session-slug>",
105
106
  include: ["state", "learnings", "handoff", "validation"]
106
107
  })
107
108
  ```
108
109
 
109
- This loads learnings (including failure entries tagged `[outcome:failure]`), handoff context, state, and validation results in a single call. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
110
+ This loads session-scoped learnings, handoff, state, and validation results in a single call. The `session` parameter ensures all reads come from the session directory (`.harness/sessions/<slug>/`), isolating this workstream from others. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
110
111
 
111
- 6. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
112
+ 6. **Load session summary for cold start.** If resuming (existing `autopilot-state.json` found):
113
+ - Call `loadSessionSummary()` for the session slug to get quick orientation context (~200 tokens).
114
+ - The summary provides the last skill, phase, status, and next step — enough to understand where the autopilot left off without re-reading the full state machine.
115
+ - If no summary exists (first run), skip — the full INIT handles context loading.
116
+
117
+ 7. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
112
118
  - Current project priorities (which features are `in-progress`)
113
119
  - Blockers that may affect the upcoming phases
114
120
  - Overall project status and milestone progress
115
121
 
116
122
  This provides the autopilot with project-level context beyond the individual spec being executed. If the roadmap does not exist, skip this step — the autopilot operates normally without it.
117
123
 
118
- 7. **Transition to ASSESS.**
124
+ 8. **Transition to ASSESS.**
119
125
 
120
126
  ---
121
127
 
@@ -155,9 +161,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
155
161
 
156
162
  Spec: {specPath}
157
163
  Session directory: {sessionDir}
164
+ Session slug: {sessionSlug}
158
165
  Phase description: {phase description from spec}
159
- Previous phase learnings (global): {relevant learnings from .harness/learnings.md}
160
- Known failures to avoid (global): {relevant entries from .harness/failures.md}
166
+
167
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
168
+ session-scoped learnings, state, and validation context.
161
169
 
162
170
  Follow the harness-planning skill process exactly. Write the plan to
163
171
  docs/plans/{date}-{phase-name}-plan.md. Write {sessionDir}/handoff.json when done.
@@ -221,9 +229,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
221
229
 
222
230
  Plan: {planPath}
223
231
  Session directory: {sessionDir}
232
+ Session slug: {sessionSlug}
224
233
  State: {sessionDir}/state.json
225
- Learnings (global): .harness/learnings.md
226
- Failures (global): .harness/failures.md
234
+
235
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
236
+ session-scoped learnings, state, and validation context.
227
237
 
228
238
  Follow the harness-execution skill process exactly.
229
239
  Update {sessionDir}/state.json after each task.
@@ -268,6 +278,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
268
278
  You are running harness-verification for phase {N}: {name}.
269
279
 
270
280
  Session directory: {sessionDir}
281
+ Session slug: {sessionSlug}
282
+
283
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
284
+ session-scoped learnings, state, and validation context.
271
285
 
272
286
  Follow the harness-verification skill process exactly.
273
287
  Report pass/fail with findings.
@@ -296,6 +310,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
296
310
  You are running harness-code-review for phase {N}: {name}.
297
311
 
298
312
  Session directory: {sessionDir}
313
+ Session slug: {sessionSlug}
314
+
315
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
316
+ session-scoped learnings, state, and validation context.
299
317
 
300
318
  Follow the harness-code-review skill process exactly.
301
319
  Report findings with severity (blocking / warning / note).
@@ -341,7 +359,23 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
341
359
 
342
360
  4. **Sync roadmap.** If `docs/roadmap.md` exists, call `manage_roadmap` with action `sync` and `apply: true`. This reflects the just-completed phase in the roadmap (e.g., updating the feature from `planned` to `in-progress`). If `manage_roadmap` is unavailable, fall back to direct file manipulation using `syncRoadmap()` from core. Skip silently if no roadmap exists. Do not use `force_sync: true` — the human-always-wins rule applies.
343
361
 
344
- 5. **Check for next phase:**
362
+ 5. **Write session summary.** Update the session summary to reflect the completed phase:
363
+
364
+ ```json
365
+ writeSessionSummary(projectPath, sessionSlug, {
366
+ session: "<session-slug>",
367
+ lastActive: "<ISO timestamp>",
368
+ skill: "harness-autopilot",
369
+ phase: "<completed phase number> of <total phases>",
370
+ status: "Phase <N> complete. <tasks completed>/<total> tasks.",
371
+ spec: "<spec path>",
372
+ plan: "<current plan path>",
373
+ keyContext: "<1-2 sentences: what this phase accomplished, key decisions>",
374
+ nextStep: "<e.g., Continue to Phase N+1: <name>, or DONE>"
375
+ })
376
+ ```
377
+
378
+ 6. **Check for next phase:**
345
379
  - If more phases remain: "Phase {N} complete. Next: Phase {N+1}: {name} (complexity: {level}). Continue? (yes / stop)"
346
380
  - **yes** — Increment `currentPhase`, reset `retryBudget`, transition to ASSESS.
347
381
  - **stop** — Save state and exit.
@@ -387,7 +421,21 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
387
421
 
388
422
  5. **Update roadmap to done.** If `docs/roadmap.md` exists and the current spec maps to a roadmap feature, call `manage_roadmap` with action `update` to set the feature status to `done`. Derive the feature name from the spec title (H1 heading) or the session's `handoff.json` `summary` field. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `updateFeature()` from core. Skip silently if no roadmap exists or if the feature is not found. Do not use `force_sync: true`.
389
423
 
390
- 6. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file it serves as a record.
424
+ 6. **Write final session summary.** Update the session summary to reflect completion:
425
+
426
+ ```json
427
+ writeSessionSummary(projectPath, sessionSlug, {
428
+ session: "<session-slug>",
429
+ lastActive: "<ISO timestamp>",
430
+ skill: "harness-autopilot",
431
+ status: "DONE. <total phases> phases, <total tasks> tasks complete.",
432
+ spec: "<spec path>",
433
+ keyContext: "<1-2 sentences: overall summary of what was built>",
434
+ nextStep: "All phases complete. Create PR or close session."
435
+ })
436
+ ```
437
+
438
+ 7. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
391
439
 
392
440
  ## Harness Integration
393
441
 
@@ -161,7 +161,7 @@ These keywords flow into the `handoff.json` `contextKeywords` field when the spe
161
161
  - Call `manage_roadmap` with action `add`, `status: "planned"`, `milestone: "Current Work"`, and the spec path. Include a one-line summary from the spec overview.
162
162
  - If the feature already exists in the roadmap (duplicate name), skip silently — the feature was likely added manually or by a prior brainstorming session.
163
163
  - Log: `"Added '<feature-name>' to roadmap as planned"` (informational, not a prompt).
164
- - If `manage_roadmap` is unavailable, fall back to direct file manipulation using `addFeature()` from core.
164
+ - If `manage_roadmap` is unavailable, fall back to direct file manipulation using `parseRoadmap`/`serializeRoadmap` from core to read, modify, and write `docs/roadmap.md`.
165
165
  - If no roadmap exists, skip this step silently.
166
166
 
167
167
  7. **Write handoff and suggest transition.** After the human approves the spec:
@@ -212,12 +212,13 @@ gather_context({
212
212
  path: "<project-root>",
213
213
  intent: "Code review of <change description>",
214
214
  skill: "harness-code-review",
215
+ session: "<session-slug-if-provided>",
215
216
  tokenBudget: 8000,
216
217
  include: ["graph", "learnings", "validation"]
217
218
  })
218
219
  ```
219
220
 
220
- This replaces manual `query_graph` + `get_impact` + `find_context_for` calls with a single composite call that assembles review context in parallel, ranked by relevance. Falls back gracefully when no graph is available (`meta.graphAvailable: false`).
221
+ This replaces manual `query_graph` + `get_impact` + `find_context_for` calls with a single composite call that assembles review context in parallel, ranked by relevance. Falls back gracefully when no graph is available (`meta.graphAvailable: false`). When `session` is provided (e.g., via autopilot dispatch), learnings and state are scoped to the session directory. If no session is known, omit the parameter — `gather_context` falls back to global files.
221
222
 
222
223
  For domain-specific scoping (compliance, bug detection, security, architecture), supplement `gather_context` output with targeted `query_graph` calls as needed.
223
224
 
@@ -528,6 +529,22 @@ Write `.harness/handoff.json`:
528
529
  }
529
530
  ```
530
531
 
532
+ **Write session summary (if session is known).** If running within a session context, update the session summary:
533
+
534
+ ```json
535
+ writeSessionSummary(projectPath, sessionSlug, {
536
+ session: "<session-slug>",
537
+ lastActive: "<ISO timestamp>",
538
+ skill: "harness-code-review",
539
+ status: "Review complete. Assessment: <approve|request-changes|comment>. <N> findings.",
540
+ spec: "<spec path if known>",
541
+ keyContext: "<1-2 sentences: review outcome, key findings>",
542
+ nextStep: "<e.g., Address blocking findings / Ready to merge / Observations delivered>"
543
+ })
544
+ ```
545
+
546
+ If no session slug is known, skip this step.
547
+
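Reviewer note: the summary step above can be read as "build the payload only when a session slug exists". A minimal sketch under that reading follows. The field names mirror the `writeSessionSummary` call in this hunk, but the `SessionSummaryInput` type, the `buildReviewSummary` helper, and the mapping from assessment to `nextStep` are assumptions made for illustration and are not taken from the package.

```typescript
// Hypothetical shape: field names copied from the writeSessionSummary call
// in the diff; the type itself is not taken from the package's typings.
interface SessionSummaryInput {
  session: string;
  lastActive: string;
  skill: string;
  status: string;
  spec?: string;
  keyContext: string;
  nextStep: string;
}

type Assessment = "approve" | "request-changes" | "comment";

// Assumed mapping from assessment to the example nextStep strings in the skill.
const NEXT_STEP: Record<Assessment, string> = {
  approve: "Ready to merge",
  "request-changes": "Address blocking findings",
  comment: "Observations delivered",
};

// Hypothetical helper: returns null when no session slug is known,
// matching the "skip this step" instruction.
function buildReviewSummary(
  sessionSlug: string | undefined,
  assessment: Assessment,
  findingCount: number,
  keyContext: string,
  now: Date = new Date(),
): SessionSummaryInput | null {
  if (!sessionSlug) return null;
  return {
    session: sessionSlug,
    lastActive: now.toISOString(),
    skill: "harness-code-review",
    status: `Review complete. Assessment: ${assessment}. ${findingCount} findings.`,
    keyContext,
    nextStep: NEXT_STEP[assessment],
  };
}

const reviewSummary = buildReviewSummary(
  "auth-rework", // hypothetical session slug
  "request-changes",
  2,
  "Two blocking findings in the token refresh path.",
  new Date("2024-01-02T03:04:05.000Z"),
);
console.log(reviewSummary);
```

The returned object would then be passed as the third argument of `writeSessionSummary(projectPath, sessionSlug, ...)`; a `null` result means the call is skipped.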
531
548
  **If assessment is "approve":**
532
549
 
533
550
  Call `emit_interaction`:
@@ -614,7 +631,7 @@ _This section is not part of the pipeline. It documents the process for respondi
614
631
  ## Harness Integration
615
632
 
616
633
  - **`assess_project`** — Used in Phase 2 (MECHANICAL) to run `validate`, `deps`, and `docs` checks in parallel. Must pass for the pipeline to continue to AI review. Failures are Critical issues that stop the pipeline.
617
- - **`gather_context`** — Used in Phase 3 (CONTEXT) for efficient parallel context assembly. Replaces separate graph query calls.
634
+ - **`gather_context`** — Used in Phase 3 (CONTEXT) for efficient parallel context assembly. The `session` parameter scopes learnings and state to the session directory when provided by autopilot dispatch. Replaces separate graph query calls.
618
635
  - **`harness cleanup`** — Optional check during Phase 2 for entropy accumulation in changed files.
619
636
  - **Graph queries** — Used in Phase 3 (CONTEXT) for dependency-scoped context and in Phase 5 (VALIDATE) for reachability verification. Graceful fallback when no graph exists.
620
637
  - **`emit_interaction`** -- Call after review approval to suggest transitioning to merge/PR creation. Only emitted on APPROVE assessment. Uses confirmed transition (waits for user approval).
@@ -33,20 +33,29 @@ Deviating from the plan mid-execution introduces untested assumptions, breaks ta
33
33
  path: "<project-root>",
34
34
  intent: "Execute plan tasks starting from current position",
35
35
  skill: "harness-execution",
36
+ session: "<session-slug-if-known>",
36
37
  include: ["state", "learnings", "handoff", "validation"]
37
38
  })
38
39
  ```
39
40
 
41
+ **Session resolution:** If a session directory is known (passed via autopilot dispatch or available from a previous handoff), include the `session` parameter. This scopes all state reads/writes to `.harness/sessions/<slug>/`. If no session is known, omit it — `gather_context` falls back to global files at `.harness/`.
42
+
40
43
  This returns `state` (current position — if null, this is a fresh start at Task 1), `learnings` (hard-won insights from previous sessions — do not ignore them), `handoff` (structured context from the previous skill), and `validation` (current project health). If any constituent fails, its field is null and the error is reported in `meta.errors`.
41
44
 
42
- 3. **Check for known dead ends.** Review `learnings` entries tagged `[outcome:failure]`. If any match approaches in the current plan, surface warnings before proceeding.
45
+ 3. **Load session summary for cold start.** If resuming a session (session slug is known), read the session summary for quick orientation:
46
+ - Call `listActiveSessions()` to read the session index (~100 tokens).
47
+ - If the target session is known, call `loadSessionSummary()` for that session (~200 tokens).
48
+ - If ambiguous (multiple active sessions, no clear target), present the index to the user and ask which session to resume.
49
+ - The summary provides skill, phase, status, key context, and next step — enough to orient without re-reading full state + learnings + plan.
50
+
51
+ 4. **Check for known dead ends.** Review `learnings` entries tagged `[outcome:failure]`. If any match approaches in the current plan, surface warnings before proceeding.
43
52
 
44
- 4. **Verify prerequisites.** For the current task:
53
+ 5. **Verify prerequisites.** For the current task:
45
54
  - Are dependency tasks marked complete in state?
46
55
  - Do the files referenced in the task exist as expected?
47
56
  - Does the test suite pass? Run `harness validate` to confirm a clean baseline.
48
57
 
49
- 5. **If prerequisites fail,** do not proceed. Report what is missing and which task is blocked.
58
+ 6. **If prerequisites fail,** do not proceed. Report what is missing and which task is blocked.
50
59
 
51
60
  ### Graph-Enhanced Context (when available)
52
61
 
@@ -220,7 +229,7 @@ emit_interaction({
220
229
 
221
230
  Between tasks (especially between sessions):
222
231
 
223
- 1. **Update `.harness/state.json`** with current position, progress, and `lastSession` context:
232
+ 1. **Update state (session-scoped `{sessionDir}/state.json` if session is known, otherwise `.harness/state.json`)** with current position, progress, and `lastSession` context:
224
233
 
225
234
  ```json
226
235
  {
@@ -241,7 +250,7 @@ harness scan [path]
241
250
 
242
251
  Skipping this step means subsequent graph queries (impact analysis, dependency health, test advisor) may return stale results.
243
252
 
244
- 2. **Append tagged learnings to `.harness/learnings.md`.** Tag every entry with skill and outcome:
253
+ 2. **Append tagged learnings to the session-scoped learnings file (`{sessionDir}/learnings.md` if session is known, otherwise `.harness/learnings.md`).** Tag every entry with skill and outcome:
245
254
 
246
255
  ```markdown
247
256
  ## YYYY-MM-DD — Task N: <task name>
@@ -251,9 +260,9 @@ Skipping this step means subsequent graph queries (impact analysis, dependency h
251
260
  - [skill:harness-execution] [outcome:decision] What was decided and why
252
261
  ```
253
262
 
254
- 3. **Record failures in `.harness/failures.md`** if any task was escalated after retry exhaustion (from Phase 2 Step 5). Include the approach attempted and why it failed, so future sessions avoid the same dead end.
263
+ 3. **Record failures in the session-scoped failures file (`{sessionDir}/failures.md` if session is known, otherwise `.harness/failures.md`)** if any task was escalated after retry exhaustion (from Phase 2 Step 5). Include the approach attempted and why it failed, so future sessions avoid the same dead end.
255
264
 
256
- 4. **Write `.harness/handoff.json`** with structured context for the next skill or session:
265
+ 4. **Write the session-scoped handoff (`{sessionDir}/handoff.json` if session is known, otherwise `.harness/handoff.json`)** with structured context for the next skill or session:
257
266
 
258
267
  ```json
259
268
  {
@@ -266,11 +275,29 @@ Skipping this step means subsequent graph queries (impact analysis, dependency h
266
275
  }
267
276
  ```
268
277
 
269
- 5. **Sync roadmap (mandatory when present).** If `docs/roadmap.md` exists, call `manage_roadmap` with action `sync` and `apply: true` to update linked feature statuses from the just-completed execution state. Do not use `force_sync: true` — the human-always-wins rule applies. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `syncRoadmap()` from core. If no roadmap exists, skip silently.
278
+ 5. **Write session summary.** Write/update the session summary for cold-start context restoration:
279
+
280
+ ```json
281
+ writeSessionSummary(projectPath, sessionSlug, {
282
+ session: "<session-slug>",
283
+ lastActive: "<ISO timestamp>",
284
+ skill: "harness-execution",
285
+ phase: "<current phase of plan>",
286
+ status: "<e.g., Task 4/6 complete, paused at CHECKPOINT>",
287
+ spec: "<spec path if known>",
288
+ plan: "<plan path>",
289
+ keyContext: "<1-2 sentences: what was accomplished, key decisions made>",
290
+ nextStep: "<what to do next when resuming>"
291
+ })
292
+ ```
293
+
294
+ This overwrites any previous summary for this session. The index.md is updated automatically.
295
+
296
+ 6. **Sync roadmap (mandatory when present).** If `docs/roadmap.md` exists, call `manage_roadmap` with action `sync` and `apply: true` to update linked feature statuses from the just-completed execution state. Do not use `force_sync: true` — the human-always-wins rule applies. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `syncRoadmap()` from core. If no roadmap exists, skip silently.
270
297
 
271
- 6. **Learnings are append-only.** Never edit or delete previous learnings. They are a chronological record.
298
+ 7. **Learnings are append-only.** Never edit or delete previous learnings. They are a chronological record.
272
299
 
273
- 7. **Auto-transition to verification.** When ALL tasks in the plan are complete (not when stopping mid-plan):
300
+ 8. **Auto-transition to verification.** When ALL tasks in the plan are complete (not when stopping mid-plan):
274
301
 
275
302
  Call `emit_interaction`:
276
303
 
@@ -325,8 +352,8 @@ These are non-negotiable. When any condition is met, stop immediately.
325
352
  - **`harness check-deps`** — Run when tasks add new imports or modules. Catches boundary violations early.
326
353
  - **`harness state show`** — View current execution position and progress.
327
354
  - **`harness state learn "<message>"`** — Append a learning from the command line.
328
- - **`.harness/state.json`** Read at session start to resume position. Updated after every task.
329
- - **`.harness/learnings.md`** Append-only knowledge capture. Read at session start for prior context.
355
+ - **State file** — Session-scoped at `{sessionDir}/state.json` when session is known, otherwise `.harness/state.json`. Read at session start to resume position. Updated after every task.
356
+ - **Learnings file** — Session-scoped at `{sessionDir}/learnings.md` when session is known, otherwise `.harness/learnings.md`. Append-only knowledge capture. Read at session start for prior context.
330
357
  - **Roadmap sync** — After completing plan execution, call `manage_roadmap` with action `sync` and `apply: true` to update roadmap status. Mandatory when `docs/roadmap.md` exists. Do not use `force_sync: true`. Falls back to `syncRoadmap()` from core if MCP tool is unavailable.
331
358
  - **`emit_interaction`** -- Call at plan completion to auto-transition to harness-verification. Uses auto-transition (proceeds immediately without user confirmation).
332
359
 
@@ -193,22 +193,39 @@ When presenting the task breakdown, use progress markers:
193
193
  }
194
194
  ```
195
195
 
196
- 9. **Request plan sign-off:**
196
+ 9. **Write session summary (if session is known).** If running within a session (autopilot dispatch or standalone with session context), write the session summary:
197
197
 
198
198
  ```json
199
- emit_interaction({
200
- path: "<project-root>",
201
- type: "confirmation",
202
- confirmation: {
203
- text: "Approve plan at <plan-file-path>?",
204
- context: "<task count> tasks, <estimated time> minutes. <one-sentence summary>",
205
- impact: "Approving unlocks task-by-task execution. Plan defines exact file paths, code, and commands.",
206
- risk: "low"
207
- }
199
+ writeSessionSummary(projectPath, sessionSlug, {
200
+ session: "<session-slug>",
201
+ lastActive: "<ISO timestamp>",
202
+ skill: "harness-planning",
203
+ status: "Plan complete. <N> tasks defined.",
204
+ spec: "<spec path if known>",
205
+ plan: "<plan file path>",
206
+ keyContext: "<1-2 sentences: what was planned, key decisions>",
207
+ nextStep: "Approve plan and begin execution."
208
208
  })
209
209
  ```
210
210
 
211
- 10. **Suggest transition to execution.** After the human approves the plan:
211
+ If no session slug is known (standalone invocation without session context), skip this step.
212
+
213
+ 10. **Request plan sign-off:**
214
+
215
+ ```json
216
+ emit_interaction({
217
+ path: "<project-root>",
218
+ type: "confirmation",
219
+ confirmation: {
220
+ text: "Approve plan at <plan-file-path>?",
221
+ context: "<task count> tasks, <estimated time> minutes. <one-sentence summary>",
222
+ impact: "Approving unlocks task-by-task execution. Plan defines exact file paths, code, and commands.",
223
+ risk: "low"
224
+ }
225
+ })
226
+ ```
227
+
228
+ 11. **Suggest transition to execution.** After the human approves the plan:
212
229
 
213
230
  Call `emit_interaction`:
214
231
 
@@ -395,6 +395,38 @@ Choice?
395
395
  harness validate: passed
396
396
  ```
397
397
 
398
+ ---
399
+
400
+ ### Command: `--query <filter>` -- Query Features by Filter
401
+
402
+ #### Phase 1: SCAN -- Load Roadmap
403
+
404
+ 1. Check if `docs/roadmap.md` exists.
405
+ - If missing: error with clear message. "No roadmap found at docs/roadmap.md. Run `--create` first to bootstrap one."
406
+ 2. Parse the roadmap (via `manage_roadmap query` or direct read).
407
+
408
+ #### Phase 2: FILTER -- Apply Query
409
+
410
+ 1. Accept filter patterns:
411
+ - **Status filter:** `backlog`, `planned`, `in-progress`, `done`, `blocked` -- returns all features with that status
412
+ - **Milestone filter:** `milestone:<name>` -- returns all features in the named milestone (partial match)
413
+
414
+ 2. Display matching features with their milestone context:
415
+
416
+ ```
417
+ QUERY: <filter>
418
+
419
+ Results (N matches):
420
+ - Feature A (Current Work) .................. in-progress
421
+ - Feature B (Backlog) ....................... planned
422
+
423
+ Total: N matches
424
+ ```
425
+
426
+ 3. No file writes. This is a read-only operation.
427
+
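Reviewer note: the two filter patterns in Phase 2 can be sketched as a pure function over parsed features. The `RoadmapFeature` type and `queryFeatures` helper below are hypothetical, and "partial match" is assumed here to mean a case-insensitive substring match; the actual `manage_roadmap query` behavior may differ.

```typescript
// Illustrative model only; the real roadmap types may carry more fields.
type FeatureStatus = "backlog" | "planned" | "in-progress" | "done" | "blocked";

interface RoadmapFeature {
  name: string;
  milestone: string;
  status: FeatureStatus;
}

const KNOWN_STATUSES: readonly string[] = [
  "backlog",
  "planned",
  "in-progress",
  "done",
  "blocked",
];

// Hypothetical helper: read-only, returns a new array and never writes files.
function queryFeatures(
  features: RoadmapFeature[],
  filter: string,
): RoadmapFeature[] {
  const prefix = "milestone:";
  if (filter.indexOf(prefix) === 0) {
    // Assumption: partial match = case-insensitive substring of the milestone name.
    const needle = filter.slice(prefix.length).toLowerCase();
    return features.filter(
      (f) => f.milestone.toLowerCase().indexOf(needle) !== -1,
    );
  }
  if (KNOWN_STATUSES.indexOf(filter) !== -1) {
    return features.filter((f) => f.status === filter);
  }
  throw new Error(`Unsupported filter: ${filter}`);
}

const sampleFeatures: RoadmapFeature[] = [
  { name: "Feature A", milestone: "Current Work", status: "in-progress" },
  { name: "Feature B", milestone: "Backlog", status: "planned" },
];

const plannedFeatures = queryFeatures(sampleFeatures, "planned");
const currentWorkFeatures = queryFeatures(sampleFeatures, "milestone:current");
console.log(plannedFeatures.map((f) => f.name));
console.log(currentWorkFeatures.map((f) => f.name));
```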
428
+ ---
429
+
398
430
  ## Harness Integration
399
431
 
400
432
  - **`manage_roadmap` MCP tool** -- Primary read/write interface for roadmap operations. Supports `show`, `add`, `update`, `remove`, and `query` actions. Use this when MCP is available for structured CRUD.
@@ -422,6 +454,8 @@ Choice?
422
454
  16. `--edit` updates `last_manual_edit` timestamp (since changes are human-driven)
423
455
  17. Output matches the roadmap markdown format exactly (frontmatter, H2 milestones, H3 features, 5 fields each)
424
456
  18. `harness validate` passes after all operations
457
+ 19. `--query` filters features by status or milestone and displays results with milestone context
458
+ 20. `--query` errors gracefully when no roadmap exists, directing the user to `--create`
425
459
 
426
460
  ## Examples
427
461
 
@@ -35,6 +35,26 @@ The words "should", "probably", "seems to", and "I believe" are forbidden in ver
35
35
 
36
36
  ---
37
37
 
38
+ ### Context Loading
39
+
40
+ Before running verification levels, load session context if a session slug was provided (e.g., by autopilot dispatch):
41
+
42
+ ```json
43
+ gather_context({
44
+ path: "<project-root>",
45
+ intent: "Verify phase deliverables",
46
+ skill: "harness-verification",
47
+ session: "<session-slug-if-provided>",
48
+ include: ["state", "learnings", "validation"]
49
+ })
50
+ ```
51
+
52
+ **Session resolution:** If a session slug is known (passed via autopilot dispatch or available from a previous handoff), include the `session` parameter. This scopes all state reads to `.harness/sessions/<slug>/`. If no session is known, omit it — `gather_context` falls back to global files at `.harness/`.
53
+
54
+ Use the returned learnings to check for known failures and dead ends relevant to the artifacts being verified.
55
+
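Reviewer note: the session resolution rule above amounts to a path fallback. A minimal sketch, assuming forward-slash paths and a hypothetical `resolveHarnessFile` helper (this is not a function exported by the package):

```typescript
// Hypothetical helper: picks the session directory when a slug is known,
// otherwise the global .harness/ directory described in the skill text.
function resolveHarnessFile(
  projectRoot: string,
  fileName: string,
  sessionSlug?: string,
): string {
  // Trim trailing slashes so the joined path has no doubled separators.
  const root = projectRoot.replace(/\/+$/, "");
  const dir = sessionSlug
    ? `${root}/.harness/sessions/${sessionSlug}`
    : `${root}/.harness`;
  return `${dir}/${fileName}`;
}

// "auth-rework" is a made-up slug used only for this example.
const scopedPath = resolveHarnessFile("/repo", "learnings.md", "auth-rework");
const globalPath = resolveHarnessFile("/repo", "learnings.md");
console.log(scopedPath);
console.log(globalPath);
```

The same fallback shape applies to `state.json`, `failures.md`, and `handoff.json` in the execution skill changes elsewhere in this diff.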
56
+ ---
57
+
38
58
  ### Level 1: EXISTS — The Artifact Is Present
39
59
 
40
60
  For every artifact that was supposed to be created or modified:
@@ -201,6 +221,22 @@ Write `.harness/handoff.json`:
201
221
  }
202
222
  ```
203
223
 
224
+ **Write session summary (if session is known).** If running within a session context, update the session summary:
225
+
226
+ ```json
227
+ writeSessionSummary(projectPath, sessionSlug, {
228
+ session: "<session-slug>",
229
+ lastActive: "<ISO timestamp>",
230
+ skill: "harness-verification",
231
+ status: "Verification <PASS|FAIL|INCOMPLETE>. <N> artifacts checked, <N> gaps.",
232
+ spec: "<spec path if known>",
233
+ keyContext: "<1-2 sentences: verification outcome, any gaps found>",
234
+ nextStep: "<e.g., Proceed to code review / Resolve gaps>"
235
+ })
236
+ ```
237
+
238
+ If no session slug is known, skip this step.
239
+
204
240
  **If verdict is PASS (all levels passed, no gaps):**
205
241
 
206
242
  Call `emit_interaction`:
@@ -339,6 +375,12 @@ Task: "Create UserService with create, read, update, delete operations."
339
375
  - **When verification reveals the spec itself is incomplete:** Do not fill in the gaps yourself. Escalate to the human: "Verification found that the spec does not define behavior for [scenario]. How should this be handled?"
340
376
  - **When you cannot run harness checks:** If `harness validate` or `harness check-deps` cannot be run (missing configuration, broken tooling), this is a blocking issue. Do not skip verification — fix the tooling or escalate.
341
377
 
378
+ ## Harness Integration
379
+
380
+ - **`gather_context`** — Used in Context Loading phase (before Level 1) to load session-scoped state, learnings, and validation in a single call. The `session` parameter scopes reads to the session directory when provided by autopilot dispatch.
381
+ - **`harness validate`** — Run during Level 3 (WIRED) to verify artifact integration.
382
+ - **`harness check-deps`** — Run during Level 3 (WIRED) to verify dependency boundaries.
383
+
342
384
  After verification completes, append a tagged learning:
343
385
 
344
386
  - **YYYY-MM-DD [skill:harness-verification] [outcome:pass/fail]:** Verified [feature]. [Brief note on what was found or confirmed.]
@@ -102,20 +102,26 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
102
102
  path: "<project-root>",
103
103
  intent: "Autopilot phase execution for <spec name>",
104
104
  skill: "harness-autopilot",
105
+ session: "<session-slug>",
105
106
  include: ["state", "learnings", "handoff", "validation"]
106
107
  })
107
108
  ```
108
109
 
109
- This loads learnings (including failure entries tagged `[outcome:failure]`), handoff context, state, and validation results in a single call. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
110
+ This loads session-scoped learnings, handoff, state, and validation results in a single call. The `session` parameter ensures all reads come from the session directory (`.harness/sessions/<slug>/`), isolating this workstream from others. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
110
111
 
111
- 6. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
112
+ 6. **Load session summary for cold start.** If resuming (existing `autopilot-state.json` found):
113
+ - Call `loadSessionSummary()` for the session slug to get quick orientation context (~200 tokens).
114
+ - The summary provides the last skill, phase, status, and next step — enough to understand where the autopilot left off without re-reading the full state machine.
115
+ - If no summary exists (first run), skip — the full INIT handles context loading.
116
+
117
+ 7. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
112
118
  - Current project priorities (which features are `in-progress`)
113
119
  - Blockers that may affect the upcoming phases
114
120
  - Overall project status and milestone progress
115
121
 
116
122
  This provides the autopilot with project-level context beyond the individual spec being executed. If the roadmap does not exist, skip this step — the autopilot operates normally without it.
117
123
 
118
- 7. **Transition to ASSESS.**
124
+ 8. **Transition to ASSESS.**
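
The session scoping described above can be pictured as a path choice: reads come from `.harness/sessions/<slug>/` when a session slug is supplied, and from the global `.harness/` files otherwise. The sketch below is illustrative only. `resolveContextDir` is a hypothetical name, not a function exported by the package, and the resolution logic inside `gather_context` may differ.

```typescript
// Hypothetical sketch: choose the directory that context reads should come from.
function resolveContextDir(projectRoot: string, session?: string): string {
  const root = projectRoot.replace(/\/+$/, ""); // drop trailing slashes
  if (session === undefined || session === "") {
    // No session known: fall back to the global .harness directory.
    return `${root}/.harness`;
  }
  // Session known: isolate reads to this workstream's directory.
  return `${root}/.harness/sessions/${session}`;
}

const scopedDir = resolveContextDir("/repo", "auth-refactor");
const globalDir = resolveContextDir("/repo/");
console.log(scopedDir);
console.log(globalDir);
```

Keeping the choice in one function means every reader of learnings, state, and handoff files agrees on which workstream it belongs to.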
119
125
 
120
126
  ---
121
127
 
@@ -155,9 +161,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
155
161
 
156
162
  Spec: {specPath}
157
163
  Session directory: {sessionDir}
164
+ Session slug: {sessionSlug}
158
165
  Phase description: {phase description from spec}
159
- Previous phase learnings (global): {relevant learnings from .harness/learnings.md}
160
- Known failures to avoid (global): {relevant entries from .harness/failures.md}
166
+
167
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
168
+ session-scoped learnings, state, and validation context.
161
169
 
162
170
  Follow the harness-planning skill process exactly. Write the plan to
163
171
  docs/plans/{date}-{phase-name}-plan.md. Write {sessionDir}/handoff.json when done.
@@ -221,9 +229,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
221
229
 
222
230
  Plan: {planPath}
223
231
  Session directory: {sessionDir}
232
+ Session slug: {sessionSlug}
224
233
  State: {sessionDir}/state.json
225
- Learnings (global): .harness/learnings.md
226
- Failures (global): .harness/failures.md
234
+
235
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
236
+ session-scoped learnings, state, and validation context.
227
237
 
228
238
  Follow the harness-execution skill process exactly.
229
239
  Update {sessionDir}/state.json after each task.
@@ -268,6 +278,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
268
278
  You are running harness-verification for phase {N}: {name}.
269
279
 
270
280
  Session directory: {sessionDir}
281
+ Session slug: {sessionSlug}
282
+
283
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
284
+ session-scoped learnings, state, and validation context.
271
285
 
272
286
  Follow the harness-verification skill process exactly.
273
287
  Report pass/fail with findings.
@@ -296,6 +310,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
296
310
  You are running harness-code-review for phase {N}: {name}.
297
311
 
298
312
  Session directory: {sessionDir}
313
+ Session slug: {sessionSlug}
314
+
315
+ On startup, call gather_context({ session: "{sessionSlug}" }) to load
316
+ session-scoped learnings, state, and validation context.
299
317
 
300
318
  Follow the harness-code-review skill process exactly.
301
319
  Report findings with severity (blocking / warning / note).
@@ -341,7 +359,23 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
341
359
 
342
360
  4. **Sync roadmap.** If `docs/roadmap.md` exists, call `manage_roadmap` with action `sync` and `apply: true`. This reflects the just-completed phase in the roadmap (e.g., updating the feature from `planned` to `in-progress`). If `manage_roadmap` is unavailable, fall back to direct file manipulation using `syncRoadmap()` from core. Skip silently if no roadmap exists. Do not use `force_sync: true` — the human-always-wins rule applies.
343
361
 
344
- 5. **Check for next phase:**
362
+ 5. **Write session summary.** Update the session summary to reflect the completed phase:
363
+
364
+ ```json
365
+ writeSessionSummary(projectPath, sessionSlug, {
366
+ session: "<session-slug>",
367
+ lastActive: "<ISO timestamp>",
368
+ skill: "harness-autopilot",
369
+ phase: "<completed phase number> of <total phases>",
370
+ status: "Phase <N> complete. <tasks completed>/<total> tasks.",
371
+ spec: "<spec path>",
372
+ plan: "<current plan path>",
373
+ keyContext: "<1-2 sentences: what this phase accomplished, key decisions>",
374
+ nextStep: "<e.g., Continue to Phase N+1: <name>, or DONE>"
375
+ })
376
+ ```
377
+
378
+ 6. **Check for next phase:**
345
379
  - If more phases remain: "Phase {N} complete. Next: Phase {N+1}: {name} (complexity: {level}). Continue? (yes / stop)"
346
380
  - **yes** — Increment `currentPhase`, reset `retryBudget`, transition to ASSESS.
347
381
  - **stop** — Save state and exit.
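
The fields passed to `writeSessionSummary` in the "Write session summary" step can be described with a type. This is a sketch under assumptions: the `SessionSummary` interface and the `phaseCompleteSummary` helper are hypothetical names, the optional fields are a guess, and the real signature in core may differ.

```typescript
// Hypothetical shape of the summary object, inferred from the call shown above.
interface SessionSummary {
  session: string;
  lastActive: string; // ISO timestamp
  skill: string;
  phase?: string;
  status: string;
  spec?: string;
  plan?: string;
  keyContext: string;
  nextStep: string;
}

// Builds the summary written after a phase completes.
function phaseCompleteSummary(opts: {
  session: string;
  now: Date;
  phase: number;
  totalPhases: number;
  tasksDone: number;
  totalTasks: number;
  spec: string;
  plan: string;
  keyContext: string;
  nextPhaseName?: string; // undefined when this was the last phase
}): SessionSummary {
  const nextStep =
    opts.nextPhaseName !== undefined
      ? `Continue to Phase ${opts.phase + 1}: ${opts.nextPhaseName}`
      : "DONE";
  return {
    session: opts.session,
    lastActive: opts.now.toISOString(),
    skill: "harness-autopilot",
    phase: `${opts.phase} of ${opts.totalPhases}`,
    status: `Phase ${opts.phase} complete. ${opts.tasksDone}/${opts.totalTasks} tasks.`,
    spec: opts.spec,
    plan: opts.plan,
    keyContext: opts.keyContext,
    nextStep,
  };
}

const summary = phaseCompleteSummary({
  session: "auth-refactor",
  now: new Date("2024-01-15T12:00:00Z"),
  phase: 2,
  totalPhases: 3,
  tasksDone: 5,
  totalTasks: 5,
  spec: "docs/specs/auth-refactor.md",
  plan: "docs/plans/2024-01-15-phase-2-plan.md",
  keyContext: "Token storage moved behind an interface.",
  nextPhaseName: "Integration",
});
console.log(JSON.stringify(summary, null, 2));
```

The spec and plan paths in the usage example are placeholders chosen for illustration.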
@@ -387,7 +421,21 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
387
421
 
388
422
  5. **Update roadmap to done.** If `docs/roadmap.md` exists and the current spec maps to a roadmap feature, call `manage_roadmap` with action `update` to set the feature status to `done`. Derive the feature name from the spec title (H1 heading) or the session's `handoff.json` `summary` field. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `updateFeature()` from core. Skip silently if no roadmap exists or if the feature is not found. Do not use `force_sync: true`.
389
423
 
390
- 6. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file it serves as a record.
424
+ 6. **Write final session summary.** Update the session summary to reflect completion:
425
+
426
+ ```json
427
+ writeSessionSummary(projectPath, sessionSlug, {
428
+ session: "<session-slug>",
429
+ lastActive: "<ISO timestamp>",
430
+ skill: "harness-autopilot",
431
+ status: "DONE. <total phases> phases, <total tasks> tasks complete.",
432
+ spec: "<spec path>",
433
+ keyContext: "<1-2 sentences: overall summary of what was built>",
434
+ nextStep: "All phases complete. Create PR or close session."
435
+ })
436
+ ```
437
+
438
+ 7. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
391
439
 
392
440
  ## Harness Integration
393
441
 
@@ -161,7 +161,7 @@ These keywords flow into the `handoff.json` `contextKeywords` field when the spe
161
161
  - Call `manage_roadmap` with action `add`, `status: "planned"`, `milestone: "Current Work"`, and the spec path. Include a one-line summary from the spec overview.
162
162
  - If the feature already exists in the roadmap (duplicate name), skip silently — the feature was likely added manually or by a prior brainstorming session.
163
163
  - Log: `"Added '<feature-name>' to roadmap as planned"` (informational, not a prompt).
164
- - If `manage_roadmap` is unavailable, fall back to direct file manipulation using `addFeature()` from core.
164
+ - If `manage_roadmap` is unavailable, fall back to direct file manipulation using `parseRoadmap`/`serializeRoadmap` from core to read, modify, and write `docs/roadmap.md`.
165
165
  - If no roadmap exists, skip this step silently.
166
166
 
167
167
  7. **Write handoff and suggest transition.** After the human approves the spec:
@@ -212,12 +212,13 @@ gather_context({
212
212
  path: "<project-root>",
213
213
  intent: "Code review of <change description>",
214
214
  skill: "harness-code-review",
215
+ session: "<session-slug-if-provided>",
215
216
  tokenBudget: 8000,
216
217
  include: ["graph", "learnings", "validation"]
217
218
  })
218
219
  ```
219
220
 
220
- This replaces manual `query_graph` + `get_impact` + `find_context_for` calls with a single composite call that assembles review context in parallel, ranked by relevance. Falls back gracefully when no graph is available (`meta.graphAvailable: false`).
221
+ This replaces manual `query_graph` + `get_impact` + `find_context_for` calls with a single composite call that assembles review context in parallel, ranked by relevance. Falls back gracefully when no graph is available (`meta.graphAvailable: false`). When `session` is provided (e.g., via autopilot dispatch), learnings and state are scoped to the session directory. If no session is known, omit the parameter — `gather_context` falls back to global files.
221
222
 
222
223
  For domain-specific scoping (compliance, bug detection, security, architecture), supplement `gather_context` output with targeted `query_graph` calls as needed.
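
One way to honor the "omit the parameter" rule above is to build the argument object conditionally, so that `session` is absent (rather than `undefined` or an empty string) when no slug is known. `buildReviewContextArgs` and `GatherContextArgs` are hypothetical names used for illustration; the argument names mirror the `gather_context` call shown in this hunk.

```typescript
// Hypothetical argument shape, mirroring the gather_context call above.
interface GatherContextArgs {
  path: string;
  intent: string;
  skill: string;
  session?: string;
  tokenBudget: number;
  include: string[];
}

function buildReviewContextArgs(
  projectRoot: string,
  change: string,
  session?: string
): GatherContextArgs {
  return {
    path: projectRoot,
    intent: `Code review of ${change}`,
    skill: "harness-code-review",
    // The spread adds the key only when a non-empty slug was provided.
    ...(session ? { session } : {}),
    tokenBudget: 8000,
    include: ["graph", "learnings", "validation"],
  };
}

const withSession = buildReviewContextArgs("/repo", "login form fix", "auth-refactor");
const withoutSession = buildReviewContextArgs("/repo", "login form fix");
console.log("session" in withSession, "session" in withoutSession);
```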
223
224
 
@@ -528,6 +529,22 @@ Write `.harness/handoff.json`:
528
529
  }
529
530
  ```
530
531
 
532
+ **Write session summary (if session is known).** If running within a session context, update the session summary:
533
+
534
+ ```json
535
+ writeSessionSummary(projectPath, sessionSlug, {
536
+ session: "<session-slug>",
537
+ lastActive: "<ISO timestamp>",
538
+ skill: "harness-code-review",
539
+ status: "Review complete. Assessment: <approve|request-changes|comment>. <N> findings.",
540
+ spec: "<spec path if known>",
541
+ keyContext: "<1-2 sentences: review outcome, key findings>",
542
+ nextStep: "<e.g., Address blocking findings / Ready to merge / Observations delivered>"
543
+ })
544
+ ```
545
+
546
+ If no session slug is known, skip this step.
547
+
531
548
  **If assessment is "approve":**
532
549
 
533
550
  Call `emit_interaction`:
@@ -614,7 +631,7 @@ _This section is not part of the pipeline. It documents the process for respondi
614
631
  ## Harness Integration
615
632
 
616
633
  - **`assess_project`** — Used in Phase 2 (MECHANICAL) to run `validate`, `deps`, and `docs` checks in parallel. Must pass for the pipeline to continue to AI review. Failures are Critical issues that stop the pipeline.
617
- - **`gather_context`** — Used in Phase 3 (CONTEXT) for efficient parallel context assembly. Replaces separate graph query calls.
634
+ - **`gather_context`** — Used in Phase 3 (CONTEXT) for efficient parallel context assembly. The `session` parameter scopes learnings and state to the session directory when provided by autopilot dispatch. Replaces separate graph query calls.
618
635
  - **`harness cleanup`** — Optional check during Phase 2 for entropy accumulation in changed files.
619
636
  - **Graph queries** — Used in Phase 3 (CONTEXT) for dependency-scoped context and in Phase 5 (VALIDATE) for reachability verification. Graceful fallback when no graph exists.
620
637
  - **`emit_interaction`** -- Call after review approval to suggest transitioning to merge/PR creation. Only emitted on APPROVE assessment. Uses confirmed transition (waits for user approval).