npm - @harness-engineering/cli - Versions diffs - 1.11.0 → 1.13.0 - Mend

@harness-engineering/cli 1.11.0 → 1.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (49) hide show

package/dist/agents/skills/claude-code/harness-autopilot/SKILL.md CHANGED Viewed

@@ -102,20 +102,26 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
      path: "<project-root>",
      intent: "Autopilot phase execution for <spec name>",
      skill: "harness-autopilot",
+     session: "<session-slug>",
      include: ["state", "learnings", "handoff", "validation"]
    })
    ```
-   This loads learnings (including failure entries tagged `[outcome:failure]`), handoff context, state, and validation results in a single call. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
+   This loads session-scoped learnings, handoff, state, and validation results in a single call. The `session` parameter ensures all reads come from the session directory (`.harness/sessions/<slug>/`), isolating this workstream from others. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
-6. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
+6. **Load session summary for cold start.** If resuming (existing `autopilot-state.json` found):
+   - Call `loadSessionSummary()` for the session slug to get quick orientation context (~200 tokens).
+   - The summary provides the last skill, phase, status, and next step — enough to understand where the autopilot left off without re-reading the full state machine.
+   - If no summary exists (first run), skip — the full INIT handles context loading.
+7. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
    - Current project priorities (which features are `in-progress`)
    - Blockers that may affect the upcoming phases
    - Overall project status and milestone progress
    This provides the autopilot with project-level context beyond the individual spec being executed. If the roadmap does not exist, skip this step — the autopilot operates normally without it.
-7. **Transition to ASSESS.**
+8. **Transition to ASSESS.**
 ---
@@ -155,9 +161,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
        Spec: {specPath}
        Session directory: {sessionDir}
+       Session slug: {sessionSlug}
        Phase description: {phase description from spec}
-       Previous phase learnings (global): {relevant learnings from .harness/learnings.md}
-       Known failures to avoid (global): {relevant entries from .harness/failures.md}
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
        Follow the harness-planning skill process exactly. Write the plan to
        docs/plans/{date}-{phase-name}-plan.md. Write {sessionDir}/handoff.json when done.
@@ -221,9 +229,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
        Plan: {planPath}
        Session directory: {sessionDir}
+       Session slug: {sessionSlug}
        State: {sessionDir}/state.json
-       Learnings (global): .harness/learnings.md
-       Failures (global): .harness/failures.md
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
        Follow the harness-execution skill process exactly.
        Update {sessionDir}/state.json after each task.
@@ -268,6 +278,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
        You are running harness-verification for phase {N}: {name}.
        Session directory: {sessionDir}
+       Session slug: {sessionSlug}
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
        Follow the harness-verification skill process exactly.
        Report pass/fail with findings.
@@ -296,6 +310,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
        You are running harness-code-review for phase {N}: {name}.
        Session directory: {sessionDir}
+       Session slug: {sessionSlug}
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
        Follow the harness-code-review skill process exactly.
        Report findings with severity (blocking / warning / note).
@@ -341,7 +359,23 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 4. **Sync roadmap.** If `docs/roadmap.md` exists, call `manage_roadmap` with action `sync` and `apply: true`. This reflects the just-completed phase in the roadmap (e.g., updating the feature from `planned` to `in-progress`). If `manage_roadmap` is unavailable, fall back to direct file manipulation using `syncRoadmap()` from core. Skip silently if no roadmap exists. Do not use `force_sync: true` — the human-always-wins rule applies.
-5. **Check for next phase:**
+5. **Write session summary.** Update the session summary to reflect the completed phase:
+   ```json
+   writeSessionSummary(projectPath, sessionSlug, {
+     session: "<session-slug>",
+     lastActive: "<ISO timestamp>",
+     skill: "harness-autopilot",
+     phase: "<completed phase number> of <total phases>",
+     status: "Phase <N> complete. <tasks completed>/<total> tasks.",
+     spec: "<spec path>",
+     plan: "<current plan path>",
+     keyContext: "<1-2 sentences: what this phase accomplished, key decisions>",
+     nextStep: "<e.g., Continue to Phase N+1: <name>, or DONE>"
+   })
+   ```
+6. **Check for next phase:**
    - If more phases remain: "Phase {N} complete. Next: Phase {N+1}: {name} (complexity: {level}). Continue? (yes / stop)"
      - **yes** — Increment `currentPhase`, reset `retryBudget`, transition to ASSESS.
      - **stop** — Save state and exit.
@@ -387,7 +421,21 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 5. **Update roadmap to done.** If `docs/roadmap.md` exists and the current spec maps to a roadmap feature, call `manage_roadmap` with action `update` to set the feature status to `done`. Derive the feature name from the spec title (H1 heading) or the session's `handoff.json` `summary` field. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `updateFeature()` from core. Skip silently if no roadmap exists or if the feature is not found. Do not use `force_sync: true`.
-6. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
+6. **Write final session summary.** Update the session summary to reflect completion:
+   ```json
+   writeSessionSummary(projectPath, sessionSlug, {
+     session: "<session-slug>",
+     lastActive: "<ISO timestamp>",
+     skill: "harness-autopilot",
+     status: "DONE. <total phases> phases, <total tasks> tasks complete.",
+     spec: "<spec path>",
+     keyContext: "<1-2 sentences: overall summary of what was built>",
+     nextStep: "All phases complete. Create PR or close session."
+   })
+   ```
+7. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
 ## Harness Integration

package/dist/agents/skills/claude-code/harness-brainstorming/SKILL.md CHANGED Viewed

@@ -161,7 +161,7 @@ These keywords flow into the `handoff.json` `contextKeywords` field when the spe
    - Call `manage_roadmap` with action `add`, `status: "planned"`, `milestone: "Current Work"`, and the spec path. Include a one-line summary from the spec overview.
    - If the feature already exists in the roadmap (duplicate name), skip silently — the feature was likely added manually or by a prior brainstorming session.
    - Log: `"Added '<feature-name>' to roadmap as planned"` (informational, not a prompt).
-   - If `manage_roadmap` is unavailable, fall back to direct file manipulation using `addFeature()` from core.
+   - If `manage_roadmap` is unavailable, fall back to direct file manipulation using `parseRoadmap`/`serializeRoadmap` from core to read, modify, and write `docs/roadmap.md`.
    - If no roadmap exists, skip this step silently.
 7. **Write handoff and suggest transition.** After the human approves the spec:

package/dist/agents/skills/claude-code/harness-code-review/SKILL.md CHANGED Viewed

@@ -212,12 +212,13 @@ gather_context({
   path: "<project-root>",
   intent: "Code review of <change description>",
   skill: "harness-code-review",
+  session: "<session-slug-if-provided>",
   tokenBudget: 8000,
   include: ["graph", "learnings", "validation"]
 })
 ```
-This replaces manual `query_graph` + `get_impact` + `find_context_for` calls with a single composite call that assembles review context in parallel, ranked by relevance. Falls back gracefully when no graph is available (`meta.graphAvailable: false`).
+This replaces manual `query_graph` + `get_impact` + `find_context_for` calls with a single composite call that assembles review context in parallel, ranked by relevance. Falls back gracefully when no graph is available (`meta.graphAvailable: false`). When `session` is provided (e.g., via autopilot dispatch), learnings and state are scoped to the session directory. If no session is known, omit the parameter — `gather_context` falls back to global files.
 For domain-specific scoping (compliance, bug detection, security, architecture), supplement `gather_context` output with targeted `query_graph` calls as needed.
@@ -528,6 +529,22 @@ Write `.harness/handoff.json`:
 }
 ```
+**Write session summary (if session is known).** If running within a session context, update the session summary:
+```json
+writeSessionSummary(projectPath, sessionSlug, {
+  session: "<session-slug>",
+  lastActive: "<ISO timestamp>",
+  skill: "harness-code-review",
+  status: "Review complete. Assessment: <approve|request-changes|comment>. <N> findings.",
+  spec: "<spec path if known>",
+  keyContext: "<1-2 sentences: review outcome, key findings>",
+  nextStep: "<e.g., Address blocking findings / Ready to merge / Observations delivered>"
+})
+```
+If no session slug is known, skip this step.
 **If assessment is "approve":**
 Call `emit_interaction`:
@@ -614,7 +631,7 @@ _This section is not part of the pipeline. It documents the process for respondi
 ## Harness Integration
 - **`assess_project`** — Used in Phase 2 (MECHANICAL) to run `validate`, `deps`, and `docs` checks in parallel. Must pass for the pipeline to continue to AI review. Failures are Critical issues that stop the pipeline.
-- **`gather_context`** — Used in Phase 3 (CONTEXT) for efficient parallel context assembly. Replaces separate graph query calls.
+- **`gather_context`** — Used in Phase 3 (CONTEXT) for efficient parallel context assembly. The `session` parameter scopes learnings and state to the session directory when provided by autopilot dispatch. Replaces separate graph query calls.
 - **`harness cleanup`** — Optional check during Phase 2 for entropy accumulation in changed files.
 - **Graph queries** — Used in Phase 3 (CONTEXT) for dependency-scoped context and in Phase 5 (VALIDATE) for reachability verification. Graceful fallback when no graph exists.
 - **`emit_interaction`** -- Call after review approval to suggest transitioning to merge/PR creation. Only emitted on APPROVE assessment. Uses confirmed transition (waits for user approval).

package/dist/agents/skills/claude-code/harness-execution/SKILL.md CHANGED Viewed

@@ -33,20 +33,29 @@ Deviating from the plan mid-execution introduces untested assumptions, breaks ta
      path: "<project-root>",
      intent: "Execute plan tasks starting from current position",
      skill: "harness-execution",
+     session: "<session-slug-if-known>",
      include: ["state", "learnings", "handoff", "validation"]
    })
    ```
+   **Session resolution:** If a session directory is known (passed via autopilot dispatch or available from a previous handoff), include the `session` parameter. This scopes all state reads/writes to `.harness/sessions/<slug>/`. If no session is known, omit it — `gather_context` falls back to global files at `.harness/`.
    This returns `state` (current position — if null, this is a fresh start at Task 1), `learnings` (hard-won insights from previous sessions — do not ignore them), `handoff` (structured context from the previous skill), and `validation` (current project health). If any constituent fails, its field is null and the error is reported in `meta.errors`.
-3. **Check for known dead ends.** Review `learnings` entries tagged `[outcome:failure]`. If any match approaches in the current plan, surface warnings before proceeding.
+3. **Load session summary for cold start.** If resuming a session (session slug is known), read the session summary for quick orientation:
+   - Call `listActiveSessions()` to read the session index (~100 tokens).
+   - If the target session is known, call `loadSessionSummary()` for that session (~200 tokens).
+   - If ambiguous (multiple active sessions, no clear target), present the index to the user and ask which session to resume.
+   - The summary provides skill, phase, status, key context, and next step — enough to orient without re-reading full state + learnings + plan.
+4. **Check for known dead ends.** Review `learnings` entries tagged `[outcome:failure]`. If any match approaches in the current plan, surface warnings before proceeding.
-4. **Verify prerequisites.** For the current task:
+5. **Verify prerequisites.** For the current task:
    - Are dependency tasks marked complete in state?
    - Do the files referenced in the task exist as expected?
    - Does the test suite pass? Run `harness validate` to confirm a clean baseline.
-5. **If prerequisites fail,** do not proceed. Report what is missing and which task is blocked.
+6. **If prerequisites fail,** do not proceed. Report what is missing and which task is blocked.
 ### Graph-Enhanced Context (when available)
@@ -220,7 +229,7 @@ emit_interaction({
 Between tasks (especially between sessions):
-1. **Update `.harness/state.json`** with current position, progress, and `lastSession` context:
+1. **Update state (session-scoped `{sessionDir}/state.json` if session is known, otherwise `.harness/state.json`)** with current position, progress, and `lastSession` context:
    ```json
    {
@@ -241,7 +250,7 @@ harness scan [path]
 Skipping this step means subsequent graph queries (impact analysis, dependency health, test advisor) may return stale results.
-2. **Append tagged learnings to `.harness/learnings.md`.** Tag every entry with skill and outcome:
+2. **Append tagged learnings to the session-scoped learnings file (`{sessionDir}/learnings.md` if session is known, otherwise `.harness/learnings.md`).** Tag every entry with skill and outcome:
    ```markdown
    ## YYYY-MM-DD — Task N: <task name>
@@ -251,9 +260,9 @@ Skipping this step means subsequent graph queries (impact analysis, dependency h
    - [skill:harness-execution] [outcome:decision] What was decided and why
    ```
-3. **Record failures in `.harness/failures.md`** if any task was escalated after retry exhaustion (from Phase 2 Step 5). Include the approach attempted and why it failed, so future sessions avoid the same dead end.
+3. **Record failures in the session-scoped failures file (`{sessionDir}/failures.md` if session is known, otherwise `.harness/failures.md`)** if any task was escalated after retry exhaustion (from Phase 2 Step 5). Include the approach attempted and why it failed, so future sessions avoid the same dead end.
-4. **Write `.harness/handoff.json`** with structured context for the next skill or session:
+4. **Write the session-scoped handoff (`{sessionDir}/handoff.json` if session is known, otherwise `.harness/handoff.json`)** with structured context for the next skill or session:
    ```json
    {
@@ -266,11 +275,29 @@ Skipping this step means subsequent graph queries (impact analysis, dependency h
    }
    ```
-5. **Sync roadmap (mandatory when present).** If `docs/roadmap.md` exists, call `manage_roadmap` with action `sync` and `apply: true` to update linked feature statuses from the just-completed execution state. Do not use `force_sync: true` — the human-always-wins rule applies. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `syncRoadmap()` from core. If no roadmap exists, skip silently.
+5. **Write session summary.** Write/update the session summary for cold-start context restoration:
+   ```json
+   writeSessionSummary(projectPath, sessionSlug, {
+     session: "<session-slug>",
+     lastActive: "<ISO timestamp>",
+     skill: "harness-execution",
+     phase: "<current phase of plan>",
+     status: "<e.g., Task 4/6 complete, paused at CHECKPOINT>",
+     spec: "<spec path if known>",
+     plan: "<plan path>",
+     keyContext: "<1-2 sentences: what was accomplished, key decisions made>",
+     nextStep: "<what to do next when resuming>"
+   })
+   ```
+   This overwrites any previous summary for this session. The index.md is updated automatically.
+6. **Sync roadmap (mandatory when present).** If `docs/roadmap.md` exists, call `manage_roadmap` with action `sync` and `apply: true` to update linked feature statuses from the just-completed execution state. Do not use `force_sync: true` — the human-always-wins rule applies. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `syncRoadmap()` from core. If no roadmap exists, skip silently.
-6. **Learnings are append-only.** Never edit or delete previous learnings. They are a chronological record.
+7. **Learnings are append-only.** Never edit or delete previous learnings. They are a chronological record.
-7. **Auto-transition to verification.** When ALL tasks in the plan are complete (not when stopping mid-plan):
+8. **Auto-transition to verification.** When ALL tasks in the plan are complete (not when stopping mid-plan):
    Call `emit_interaction`:
@@ -325,8 +352,8 @@ These are non-negotiable. When any condition is met, stop immediately.
 - **`harness check-deps`** — Run when tasks add new imports or modules. Catches boundary violations early.
 - **`harness state show`** — View current execution position and progress.
 - **`harness state learn "<message>"`** — Append a learning from the command line.
-- **`.harness/state.json`** — Read at session start to resume position. Updated after every task.
-- **`.harness/learnings.md`** — Append-only knowledge capture. Read at session start for prior context.
+- **State file** — Session-scoped at `{sessionDir}/state.json` when session is known, otherwise `.harness/state.json`. Read at session start to resume position. Updated after every task.
+- **Learnings file** — Session-scoped at `{sessionDir}/learnings.md` when session is known, otherwise `.harness/learnings.md`. Append-only knowledge capture. Read at session start for prior context.
 - **Roadmap sync** — After completing plan execution, call `manage_roadmap` with action `sync` and `apply: true` to update roadmap status. Mandatory when `docs/roadmap.md` exists. Do not use `force_sync: true`. Falls back to `syncRoadmap()` from core if MCP tool is unavailable.
 - **`emit_interaction`** -- Call at plan completion to auto-transition to harness-verification. Uses auto-transition (proceeds immediately without user confirmation).

package/dist/agents/skills/claude-code/harness-planning/SKILL.md CHANGED Viewed

@@ -193,22 +193,39 @@ When presenting the task breakdown, use progress markers:
    }
    ```
-9. **Request plan sign-off:**
+9. **Write session summary (if session is known).** If running within a session (autopilot dispatch or standalone with session context), write the session summary:
    ```json
-   emit_interaction({
-     path: "<project-root>",
-     type: "confirmation",
-     confirmation: {
-       text: "Approve plan at <plan-file-path>?",
-       context: "<task count> tasks, <estimated time> minutes. <one-sentence summary>",
-       impact: "Approving unlocks task-by-task execution. Plan defines exact file paths, code, and commands.",
-       risk: "low"
-     }
+   writeSessionSummary(projectPath, sessionSlug, {
+     session: "<session-slug>",
+     lastActive: "<ISO timestamp>",
+     skill: "harness-planning",
+     status: "Plan complete. <N> tasks defined.",
+     spec: "<spec path if known>",
+     plan: "<plan file path>",
+     keyContext: "<1-2 sentences: what was planned, key decisions>",
+     nextStep: "Approve plan and begin execution."
    })
    ```
-10. **Suggest transition to execution.** After the human approves the plan:
+   If no session slug is known (standalone invocation without session context), skip this step.
+10. **Request plan sign-off:**
+```json
+emit_interaction({
+  path: "<project-root>",
+  type: "confirmation",
+  confirmation: {
+    text: "Approve plan at <plan-file-path>?",
+    context: "<task count> tasks, <estimated time> minutes. <one-sentence summary>",
+    impact: "Approving unlocks task-by-task execution. Plan defines exact file paths, code, and commands.",
+    risk: "low"
+  }
+})
+```
+11. **Suggest transition to execution.** After the human approves the plan:
     Call `emit_interaction`:

package/dist/agents/skills/claude-code/harness-roadmap/SKILL.md CHANGED Viewed

@@ -395,6 +395,38 @@ Choice?
    harness validate: passed
    ```
+---
+### Command: `--query <filter>` -- Query Features by Filter
+#### Phase 1: SCAN -- Load Roadmap
+1. Check if `docs/roadmap.md` exists.
+   - If missing: error with clear message. "No roadmap found at docs/roadmap.md. Run `--create` first to bootstrap one."
+2. Parse the roadmap (via `manage_roadmap query` or direct read).
+#### Phase 2: FILTER -- Apply Query
+1. Accept filter patterns:
+   - **Status filter:** `backlog`, `planned`, `in-progress`, `done`, `blocked` -- returns all features with that status
+   - **Milestone filter:** `milestone:<name>` -- returns all features in the named milestone (partial match)
+2. Display matching features with their milestone context:
+   ```
+   QUERY: <filter>
+   Results (N matches):
+     - Feature A (Current Work) .................. in-progress
+     - Feature B (Backlog) ....................... planned
+   Total: N matches
+   ```
+3. No file writes. This is a read-only operation.
+---
 ## Harness Integration
 - **`manage_roadmap` MCP tool** -- Primary read/write interface for roadmap operations. Supports `show`, `add`, `update`, `remove`, and `query` actions. Use this when MCP is available for structured CRUD.
@@ -422,6 +454,8 @@ Choice?
 16. `--edit` updates `last_manual_edit` timestamp (since changes are human-driven)
 17. Output matches the roadmap markdown format exactly (frontmatter, H2 milestones, H3 features, 5 fields each)
 18. `harness validate` passes after all operations
+19. `--query` filters features by status or milestone and displays results with milestone context
+20. `--query` errors gracefully when no roadmap exists, directing the user to `--create`
 ## Examples

package/dist/agents/skills/claude-code/harness-verification/SKILL.md CHANGED Viewed

@@ -35,6 +35,26 @@ The words "should", "probably", "seems to", and "I believe" are forbidden in ver
 ---
+### Context Loading
+Before running verification levels, load session context if a session slug was provided (e.g., by autopilot dispatch):
+```json
+gather_context({
+  path: "<project-root>",
+  intent: "Verify phase deliverables",
+  skill: "harness-verification",
+  session: "<session-slug-if-provided>",
+  include: ["state", "learnings", "validation"]
+})
+```
+**Session resolution:** If a session slug is known (passed via autopilot dispatch or available from a previous handoff), include the `session` parameter. This scopes all state reads to `.harness/sessions/<slug>/`. If no session is known, omit it — `gather_context` falls back to global files at `.harness/`.
+Use the returned learnings to check for known failures and dead ends relevant to the artifacts being verified.
+---
 ### Level 1: EXISTS — The Artifact Is Present
 For every artifact that was supposed to be created or modified:
@@ -201,6 +221,22 @@ Write `.harness/handoff.json`:
 }
 ```
+**Write session summary (if session is known).** If running within a session context, update the session summary:
+```json
+writeSessionSummary(projectPath, sessionSlug, {
+  session: "<session-slug>",
+  lastActive: "<ISO timestamp>",
+  skill: "harness-verification",
+  status: "Verification <PASS|FAIL|INCOMPLETE>. <N> artifacts checked, <N> gaps.",
+  spec: "<spec path if known>",
+  keyContext: "<1-2 sentences: verification outcome, any gaps found>",
+  nextStep: "<e.g., Proceed to code review / Resolve gaps>"
+})
+```
+If no session slug is known, skip this step.
 **If verdict is PASS (all levels passed, no gaps):**
 Call `emit_interaction`:
@@ -339,6 +375,12 @@ Task: "Create UserService with create, read, update, delete operations."
 - **When verification reveals the spec itself is incomplete:** Do not fill in the gaps yourself. Escalate to the human: "Verification found that the spec does not define behavior for [scenario]. How should this be handled?"
 - **When you cannot run harness checks:** If `harness validate` or `harness check-deps` cannot be run (missing configuration, broken tooling), this is a blocking issue. Do not skip verification — fix the tooling or escalate.
+## Harness Integration
+- **`gather_context`** — Used in Context Loading phase (before Level 1) to load session-scoped state, learnings, and validation in a single call. The `session` parameter scopes reads to the session directory when provided by autopilot dispatch.
+- **`harness validate`** — Run during Level 3 (WIRED) to verify artifact integration.
+- **`harness check-deps`** — Run during Level 3 (WIRED) to verify dependency boundaries.
 After verification completes, append a tagged learning:
 - **YYYY-MM-DD [skill:harness-verification] [outcome:pass/fail]:** Verified [feature]. [Brief note on what was found or confirmed.]

package/dist/agents/skills/gemini-cli/harness-autopilot/SKILL.md CHANGED Viewed

@@ -102,20 +102,26 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
      path: "<project-root>",
      intent: "Autopilot phase execution for <spec name>",
      skill: "harness-autopilot",
+     session: "<session-slug>",
      include: ["state", "learnings", "handoff", "validation"]
    })
    ```
-   This loads learnings (including failure entries tagged `[outcome:failure]`), handoff context, state, and validation results in a single call. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
+   This loads session-scoped learnings, handoff, state, and validation results in a single call. The `session` parameter ensures all reads come from the session directory (`.harness/sessions/<slug>/`), isolating this workstream from others. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
-6. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
+6. **Load session summary for cold start.** If resuming (existing `autopilot-state.json` found):
+   - Call `loadSessionSummary()` for the session slug to get quick orientation context (~200 tokens).
+   - The summary provides the last skill, phase, status, and next step — enough to understand where the autopilot left off without re-reading the full state machine.
+   - If no summary exists (first run), skip — the full INIT handles context loading.
+7. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
    - Current project priorities (which features are `in-progress`)
    - Blockers that may affect the upcoming phases
    - Overall project status and milestone progress
    This provides the autopilot with project-level context beyond the individual spec being executed. If the roadmap does not exist, skip this step — the autopilot operates normally without it.
-7. **Transition to ASSESS.**
+8. **Transition to ASSESS.**
 ---
@@ -155,9 +161,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
        Spec: {specPath}
        Session directory: {sessionDir}
+       Session slug: {sessionSlug}
        Phase description: {phase description from spec}
-       Previous phase learnings (global): {relevant learnings from .harness/learnings.md}
-       Known failures to avoid (global): {relevant entries from .harness/failures.md}
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
        Follow the harness-planning skill process exactly. Write the plan to
        docs/plans/{date}-{phase-name}-plan.md. Write {sessionDir}/handoff.json when done.
@@ -221,9 +229,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
        Plan: {planPath}
        Session directory: {sessionDir}
+       Session slug: {sessionSlug}
        State: {sessionDir}/state.json
-       Learnings (global): .harness/learnings.md
-       Failures (global): .harness/failures.md
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
        Follow the harness-execution skill process exactly.
        Update {sessionDir}/state.json after each task.
@@ -268,6 +278,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
        You are running harness-verification for phase {N}: {name}.
        Session directory: {sessionDir}
+       Session slug: {sessionSlug}
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
        Follow the harness-verification skill process exactly.
        Report pass/fail with findings.
@@ -296,6 +310,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
        You are running harness-code-review for phase {N}: {name}.
        Session directory: {sessionDir}
+       Session slug: {sessionSlug}
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
        Follow the harness-code-review skill process exactly.
        Report findings with severity (blocking / warning / note).
@@ -341,7 +359,23 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 4. **Sync roadmap.** If `docs/roadmap.md` exists, call `manage_roadmap` with action `sync` and `apply: true`. This reflects the just-completed phase in the roadmap (e.g., updating the feature from `planned` to `in-progress`). If `manage_roadmap` is unavailable, fall back to direct file manipulation using `syncRoadmap()` from core. Skip silently if no roadmap exists. Do not use `force_sync: true` — the human-always-wins rule applies.
-5. **Check for next phase:**
+5. **Write session summary.** Update the session summary to reflect the completed phase:
+   ```json
+   writeSessionSummary(projectPath, sessionSlug, {
+     session: "<session-slug>",
+     lastActive: "<ISO timestamp>",
+     skill: "harness-autopilot",
+     phase: "<completed phase number> of <total phases>",
+     status: "Phase <N> complete. <tasks completed>/<total> tasks.",
+     spec: "<spec path>",
+     plan: "<current plan path>",
+     keyContext: "<1-2 sentences: what this phase accomplished, key decisions>",
+     nextStep: "<e.g., Continue to Phase N+1: <name>, or DONE>"
+   })
+   ```
+6. **Check for next phase:**
    - If more phases remain: "Phase {N} complete. Next: Phase {N+1}: {name} (complexity: {level}). Continue? (yes / stop)"
      - **yes** — Increment `currentPhase`, reset `retryBudget`, transition to ASSESS.
      - **stop** — Save state and exit.
@@ -387,7 +421,21 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 5. **Update roadmap to done.** If `docs/roadmap.md` exists and the current spec maps to a roadmap feature, call `manage_roadmap` with action `update` to set the feature status to `done`. Derive the feature name from the spec title (H1 heading) or the session's `handoff.json` `summary` field. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `updateFeature()` from core. Skip silently if no roadmap exists or if the feature is not found. Do not use `force_sync: true`.
-6. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
+6. **Write final session summary.** Update the session summary to reflect completion:
+   ```json
+   writeSessionSummary(projectPath, sessionSlug, {
+     session: "<session-slug>",
+     lastActive: "<ISO timestamp>",
+     skill: "harness-autopilot",
+     status: "DONE. <total phases> phases, <total tasks> tasks complete.",
+     spec: "<spec path>",
+     keyContext: "<1-2 sentences: overall summary of what was built>",
+     nextStep: "All phases complete. Create PR or close session."
+   })
+   ```
+7. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
 ## Harness Integration

package/dist/agents/skills/gemini-cli/harness-brainstorming/SKILL.md CHANGED Viewed

@@ -161,7 +161,7 @@ These keywords flow into the `handoff.json` `contextKeywords` field when the spe
    - Call `manage_roadmap` with action `add`, `status: "planned"`, `milestone: "Current Work"`, and the spec path. Include a one-line summary from the spec overview.
    - If the feature already exists in the roadmap (duplicate name), skip silently — the feature was likely added manually or by a prior brainstorming session.
    - Log: `"Added '<feature-name>' to roadmap as planned"` (informational, not a prompt).
-   - If `manage_roadmap` is unavailable, fall back to direct file manipulation using `addFeature()` from core.
+   - If `manage_roadmap` is unavailable, fall back to direct file manipulation using `parseRoadmap`/`serializeRoadmap` from core to read, modify, and write `docs/roadmap.md`.
    - If no roadmap exists, skip this step silently.
 7. **Write handoff and suggest transition.** After the human approves the spec:

package/dist/agents/skills/gemini-cli/harness-code-review/SKILL.md CHANGED Viewed

@@ -212,12 +212,13 @@ gather_context({
   path: "<project-root>",
   intent: "Code review of <change description>",
   skill: "harness-code-review",
+  session: "<session-slug-if-provided>",
   tokenBudget: 8000,
   include: ["graph", "learnings", "validation"]
 })
 ```
-This replaces manual `query_graph` + `get_impact` + `find_context_for` calls with a single composite call that assembles review context in parallel, ranked by relevance. Falls back gracefully when no graph is available (`meta.graphAvailable: false`).
+This replaces manual `query_graph` + `get_impact` + `find_context_for` calls with a single composite call that assembles review context in parallel, ranked by relevance. Falls back gracefully when no graph is available (`meta.graphAvailable: false`). When `session` is provided (e.g., via autopilot dispatch), learnings and state are scoped to the session directory. If no session is known, omit the parameter — `gather_context` falls back to global files.
 For domain-specific scoping (compliance, bug detection, security, architecture), supplement `gather_context` output with targeted `query_graph` calls as needed.
@@ -528,6 +529,22 @@ Write `.harness/handoff.json`:
 }
 ```
+**Write session summary (if session is known).** If running within a session context, update the session summary:
+```json
+writeSessionSummary(projectPath, sessionSlug, {
+  session: "<session-slug>",
+  lastActive: "<ISO timestamp>",
+  skill: "harness-code-review",
+  status: "Review complete. Assessment: <approve|request-changes|comment>. <N> findings.",
+  spec: "<spec path if known>",
+  keyContext: "<1-2 sentences: review outcome, key findings>",
+  nextStep: "<e.g., Address blocking findings / Ready to merge / Observations delivered>"
+})
+```
+If no session slug is known, skip this step.
 **If assessment is "approve":**
 Call `emit_interaction`:
@@ -614,7 +631,7 @@ _This section is not part of the pipeline. It documents the process for respondi
 ## Harness Integration
 - **`assess_project`** — Used in Phase 2 (MECHANICAL) to run `validate`, `deps`, and `docs` checks in parallel. Must pass for the pipeline to continue to AI review. Failures are Critical issues that stop the pipeline.
-- **`gather_context`** — Used in Phase 3 (CONTEXT) for efficient parallel context assembly. Replaces separate graph query calls.
+- **`gather_context`** — Used in Phase 3 (CONTEXT) for efficient parallel context assembly. The `session` parameter scopes learnings and state to the session directory when provided by autopilot dispatch. Replaces separate graph query calls.
 - **`harness cleanup`** — Optional check during Phase 2 for entropy accumulation in changed files.
 - **Graph queries** — Used in Phase 3 (CONTEXT) for dependency-scoped context and in Phase 5 (VALIDATE) for reachability verification. Graceful fallback when no graph exists.
 - **`emit_interaction`** -- Call after review approval to suggest transitioning to merge/PR creation. Only emitted on APPROVE assessment. Uses confirmed transition (waits for user approval).