npm - @harness-engineering/cli - Versions diffs - 1.14.0 → 1.15.0 - Mend

@harness-engineering/cli 1.14.0 → 1.15.0

Files changed (44)

package/dist/agents/skills/claude-code/harness-autopilot/SKILL.md CHANGED

@@ -31,7 +31,7 @@ Autopilot orchestrates these persona agents — it never reimplements their logi
 - **Claude Code:** Use the Agent tool with `subagent_type` set to the persona name.
 - **Gemini CLI:** Use the `run_agent` tool targeting the persona by name, or dispatch via `harness persona run <name>`.
-**Human always approves plans.** No plan executes without explicit human sign-off, regardless of complexity level. The difference is whether autopilot generates the plan automatically or asks the human to drive planning interactively.
+**Plans are gated by concern signals.** When no concern signals fire (low complexity, no planner concerns, task count within threshold), plans are auto-approved with a structured report and execution proceeds immediately. When any signal fires, the plan pauses for human review with the standard yes/revise/skip/stop flow. The `--review-plans` session flag forces all plans to pause regardless of signals.
 ## Process
@@ -42,7 +42,7 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
                                                                          ↓
                                                                    [next phase?]
                                                                     ↓         ↓
-                                                                 ASSESS      DONE
+                                                                 ASSESS   FINAL_REVIEW → DONE
 ```
 ---
@@ -61,7 +61,9 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
    - Create the session directory if it does not exist
 3. **Check for existing state.** Read `{sessionDir}/autopilot-state.json`. If it exists and `currentState` is not `DONE`:
+   - **Schema migration:** If `schemaVersion < 3`, backfill missing fields: set `startingCommit` to the earliest commit in `history` (or current HEAD if no history), set `decisions` to `[]`, set `finalReview` to `{ "status": "pending", "findings": [], "retryCount": 0 }`. If `schemaVersion < 4`, set `reviewPlans` to `false`. Update `schemaVersion` to `4` and save.
    - Report: "Resuming autopilot from state `{currentState}`, phase {currentPhase}: {phaseName}."
+   - Skip steps 4 and 5 (initial state creation and flag parsing) — these only apply to fresh starts.
    - Skip to the recorded `currentState` and continue from there.
 4. **If no existing state (fresh start):**
@@ -70,12 +72,15 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
    - For each phase heading (`### Phase N: Name`), extract:
      - Phase name
      - Complexity annotation (`<!-- complexity: low|medium|high -->`, default: `medium`)
+   - Capture the starting commit: run `git rev-parse HEAD` and store the result as `startingCommit`.
    - Create `{sessionDir}/autopilot-state.json`:
      ```json
      {
-       "schemaVersion": 2,
+       "schemaVersion": 4,
        "sessionDir": ".harness/sessions/<slug>",
        "specPath": "<path to spec>",
+       "startingCommit": "<git rev-parse HEAD output>",
+       "reviewPlans": false,
        "currentState": "ASSESS",
        "currentPhase": 0,
        "phases": [
@@ -91,11 +96,19 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
          "maxAttempts": 3,
          "currentTask": null
        },
-       "history": []
+       "history": [],
+       "decisions": [],
+       "finalReview": {
+         "status": "pending",
+         "findings": [],
+         "retryCount": 0
+       }
      }
      ```
-5. **Load context via gather_context.** Use the `gather_context` MCP tool to load all working context efficiently:
+5. **Parse session flags.** Check CLI arguments for `--review-plans`. If present, set `state.reviewPlans: true` in the state file. This flag persists for the entire session — resuming a session preserves the setting from when it was started (the flag is only read on fresh start, not on resume).
+6. **Load context via gather_context.** Use the `gather_context` MCP tool to load all working context efficiently:
    ```json
    gather_context({
@@ -109,19 +122,19 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
    This loads session-scoped learnings, handoff, state, and validation results in a single call. The `session` parameter ensures all reads come from the session directory (`.harness/sessions/<slug>/`), isolating this workstream from others. Note any relevant learnings or known dead ends for the current phase from the returned `learnings` array.
-6. **Load session summary for cold start.** If resuming (existing `autopilot-state.json` found):
+7. **Load session summary for cold start.** If resuming (existing `autopilot-state.json` found):
    - Call `loadSessionSummary()` for the session slug to get quick orientation context (~200 tokens).
    - The summary provides the last skill, phase, status, and next step — enough to understand where the autopilot left off without re-reading the full state machine.
    - If no summary exists (first run), skip — the full INIT handles context loading.
-7. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
+8. **Load roadmap context.** If `docs/roadmap.md` exists, read it to understand:
    - Current project priorities (which features are `in-progress`)
    - Blockers that may affect the upcoming phases
    - Overall project status and milestone progress
    This provides the autopilot with project-level context beyond the individual spec being executed. If the roadmap does not exist, skip this step — the autopilot operates normally without it.
-8. **Transition to ASSESS.**
+9. **Transition to ASSESS.**
 ---
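The schema migration added to INIT step 3 can be read as a small pure function. The following is a minimal sketch, not the package's implementation: the `AutopilotState` shape, the `commit` field on history entries, the oldest-first ordering of `history`, and the `migrateState` name are all assumptions made for illustration.

```typescript
// Hypothetical model of autopilot-state.json; only fields touched by the migration appear.
interface FinalReview {
  status: string;
  findings: unknown[];
  retryCount: number;
}

interface AutopilotState {
  schemaVersion: number;
  history: Array<{ commit?: string }>; // assumption: entries are oldest-first and may carry a commit hash
  startingCommit?: string;
  decisions?: unknown[];
  finalReview?: FinalReview;
  reviewPlans?: boolean;
}

// Backfills the fields introduced in schema versions 3 and 4.
// `currentHead` stands in for the output of `git rev-parse HEAD`.
function migrateState(state: AutopilotState, currentHead: string): AutopilotState {
  const migrated: AutopilotState = { ...state };

  if (migrated.schemaVersion < 3) {
    if (migrated.startingCommit === undefined) {
      const earliest = migrated.history.find((entry) => entry.commit !== undefined);
      migrated.startingCommit = earliest?.commit ?? currentHead;
    }
    if (migrated.decisions === undefined) {
      migrated.decisions = [];
    }
    if (migrated.finalReview === undefined) {
      migrated.finalReview = { status: "pending", findings: [], retryCount: 0 };
    }
  }

  if (migrated.schemaVersion < 4) {
    migrated.reviewPlans = false;
    migrated.schemaVersion = 4;
  }

  return migrated;
}
```

Returning a copy rather than mutating the input keeps the "migrate, then save" step easy to test in isolation; the caller would still be responsible for writing the result back to `{sessionDir}/autopilot-state.json`.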
@@ -192,27 +205,114 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 ---
-### APPROVE_PLAN — Human Review Gate
-**This state always pauses for human input.**
+### APPROVE_PLAN — Conditional Review Gate
-1. **Present the plan summary:**
+1. **Gather plan metadata:**
    - Phase name and number
-   - Task count
+   - Task count (from the plan file)
    - Checkpoint count
-   - Estimated time (task count × 3 minutes)
+   - Estimated time (task count x 3 minutes)
    - Effective complexity (original + any override)
-   - Any concerns from the planning handoff
+   - Concerns array from the planning handoff (`{sessionDir}/handoff.json` field `concerns`, default: `[]` if field is absent)
+2. **Evaluate `shouldPauseForReview`.** Check the following signals in order. If **any** signal is true, pause for human review. If **all** are false, auto-approve.
+   | #   | Signal               | Condition                             | Description                                                                                                                                                                                  |
+   | --- | -------------------- | ------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+   | 1   | `reviewPlans`        | `state.reviewPlans === true`          | Session-level flag set by `--review-plans` CLI arg                                                                                                                                           |
+   | 2   | `highComplexity`     | `phase.complexity === "high"`         | Phase is marked as high complexity in the spec (reachable when resuming after interactive planning; confirms the plan is ready for automated execution even though the human drove planning) |
+   | 3   | `complexityOverride` | `phase.complexityOverride !== null`   | Planner produced more tasks than expected for the spec complexity                                                                                                                            |
+   | 4   | `plannerConcerns`    | Handoff `concerns` array is non-empty | Planner flagged specific risks or uncertainties                                                                                                                                              |
+   | 5   | `taskCount`          | Plan contains > 15 tasks (i.e., 16+)  | Plan is large enough to warrant human review                                                                                                                                                 |
+3. **Build the signal evaluation result** for reporting and recording:
+   ```json
+   {
+     "reviewPlans": false,
+     "highComplexity": "low",
+     "complexityOverride": null,
+     "plannerConcerns": [],
+     "taskCount": 8,
+     "taskThreshold": 15
+   }
+   ```
+4. **If auto-approving (no signals fired):**
+   a. **Emit structured auto-approve report:**
+   ```
+   Auto-approved Phase 1: Setup Infrastructure
+     Review mode: auto
+     Complexity: low (no override)
+     Planner concerns: none
+     Tasks: 8 (threshold: 15)
+   ```
+   b. **Record the decision** in state `decisions` array:
+   ```json
+   {
+     "phase": 0,
+     "decision": "auto_approved_plan",
+     "timestamp": "ISO-8601",
+     "signals": {
+       "reviewPlans": false,
+       "highComplexity": "low",
+       "complexityOverride": null,
+       "plannerConcerns": [],
+       "taskCount": 8,
+       "taskThreshold": 15
+     }
+   }
+   ```
+   c. **Transition to EXECUTE** — no human interaction needed.
+5. **If pausing for review (one or more signals fired):**
-2. **Ask:** "Approve this plan and begin execution? (yes / revise / skip phase / stop)"
+   a. **Emit structured pause report** showing which signal(s) triggered:
+   ```
+   Pausing for review -- Phase 2: Auth Middleware
+     Review mode: manual (--review-plans flag set)
+     Complexity override: low -> medium (triggered)
+     Planner concerns: 2 concern(s)
+     Tasks: 12 (threshold: 15)
+   ```
+   Mark triggered signals explicitly. Non-triggered signals display their normal value without "(triggered)".
+   b. **Present the plan summary:** task count, checkpoint count, estimated time, effective complexity, and any concerns from the planning handoff.
+   c. **Ask:** "Approve this plan and begin execution? (yes / revise / skip phase / stop)"
    - **yes** — Transition to EXECUTE.
-   - **revise** — Tell user to edit the plan file directly, then re-present.
+   - **revise** — Tell user to edit the plan file directly, then re-present from step 1.
    - **skip phase** — Mark phase as `skipped` in state, transition to PHASE_COMPLETE.
    - **stop** — Save state and exit. User can resume later.
-3. **Record the decision** in state: `decisions` array.
+   d. **Record the decision** in state `decisions` array:
+   ```json
+   {
+     "phase": 0,
+     "decision": "approved_plan",
+     "timestamp": "ISO-8601",
+     "signals": {
+       "reviewPlans": true,
+       "highComplexity": "low",
+       "complexityOverride": "medium",
+       "plannerConcerns": ["concern text"],
+       "taskCount": 12,
+       "taskThreshold": 15
+     }
+   }
+   ```
+   Use the actual decision value: `approved_plan`, `revised_plan`, `skipped_phase`, or `stopped`.
-4. **Update state** with `currentState: "EXECUTE"` and save.
+6. **Update state** with `currentState: "EXECUTE"` (or appropriate state for skip/stop) and save.
 ---
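The five-signal table in the new APPROVE_PLAN gate maps directly onto a predicate. A minimal sketch follows, assuming a hypothetical `PlanReviewInput` shape and a `firedSignals` helper that are not part of the package; only the signal names, conditions, and the threshold of 15 come from the diff.

```typescript
type Complexity = "low" | "medium" | "high";

// Hypothetical input bundle gathered in step 1 of APPROVE_PLAN.
interface PlanReviewInput {
  reviewPlans: boolean; // state.reviewPlans
  complexity: Complexity; // phase.complexity
  complexityOverride: Complexity | null; // phase.complexityOverride
  plannerConcerns: string[]; // handoff.json `concerns`, [] when the field is absent
  taskCount: number; // number of tasks in the plan file
}

const TASK_THRESHOLD = 15;

// Returns the name of every signal that fired, in table order.
// An empty result means the plan can be auto-approved.
function firedSignals(input: PlanReviewInput): string[] {
  const checks: Array<[string, boolean]> = [
    ["reviewPlans", input.reviewPlans === true],
    ["highComplexity", input.complexity === "high"],
    ["complexityOverride", input.complexityOverride !== null],
    ["plannerConcerns", input.plannerConcerns.length > 0],
    ["taskCount", input.taskCount > TASK_THRESHOLD],
  ];
  return checks.filter(([, fired]) => fired).map(([name]) => name);
}

// Any single fired signal is enough to pause for human review.
function shouldPauseForReview(input: PlanReviewInput): boolean {
  return firedSignals(input).length > 0;
}
```

Collecting the fired signal names, rather than returning only a boolean, gives the pause report what it needs to mark each triggered line with "(triggered)".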
@@ -289,9 +389,9 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 2. **When the agent returns:**
    - **All checks pass:** Transition to REVIEW.
-   - **Failures found:** Surface findings to the user. Ask: "Fix these issues before review? (yes / skip verification / stop)"
-     - **yes** — Re-enter EXECUTE with targeted fixes (retry budget resets for verification fixes).
-     - **skip** — Proceed to REVIEW with verification warnings noted.
+   - **Failures found:** Surface findings to the user. Ask: "Fix these issues before review? (fix / skip verification / stop)"
+     - **fix** — Re-enter EXECUTE with targeted fixes (retry budget resets for verification fixes).
+     - **skip** — Record skip decision in `decisions` array. Proceed to REVIEW with verification warnings noted.
      - **stop** — Save state and exit.
 3. **Update state** with `currentState: "REVIEW"` and save.
@@ -320,9 +420,10 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
    ```
 2. **When the agent returns:**
+   - **Persist review findings:** Write the review findings to `{sessionDir}/phase-{N}-review.json` (array of findings with severity, file, line, title). This file is consumed by FINAL_REVIEW step 3.
    - **No blocking findings:** Report summary, transition to PHASE_COMPLETE.
-   - **Blocking findings:** Surface to user. Ask: "Address blocking findings before completing this phase? (yes / override / stop)"
-     - **yes** — Re-enter EXECUTE with review fixes.
+   - **Blocking findings:** Surface to user. Ask: "Address blocking findings before completing this phase? (fix / override / stop)"
+     - **fix** — Re-enter EXECUTE with review fixes.
      - **override** — Record override decision, transition to PHASE_COMPLETE.
      - **stop** — Save state and exit.
@@ -379,7 +480,76 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
    - If more phases remain: "Phase {N} complete. Next: Phase {N+1}: {name} (complexity: {level}). Continue? (yes / stop)"
      - **yes** — Increment `currentPhase`, reset `retryBudget`, transition to ASSESS.
      - **stop** — Save state and exit.
-   - If no more phases: Transition to DONE.
+   - If no more phases: Transition to FINAL_REVIEW.
+---
+### FINAL_REVIEW — Project-Wide Code Review
+> Runs automatically after the last phase completes. Reviews the cumulative diff (`startingCommit..HEAD`) across all phases to catch cross-phase issues before the PR offer.
+1. **Update state** with `currentState: "FINAL_REVIEW"` and save.
+2. **Update `finalReview` tracking** in `autopilot-state.json`: set `finalReview.status` to `"in_progress"`.
+3. **Gather per-phase review findings.** Read from `{sessionDir}/` — each phase's review output is stored alongside the phase handoff. Collect all review findings across phases into a single context block.
+4. **Dispatch review agent using the Agent tool:**
+   ```
+   Agent tool parameters:
+     subagent_type: "harness-code-reviewer"
+     description: "Final review: cross-phase coherence check"
+     prompt: |
+       You are running harness-code-review as a final project-wide review.
+       Diff scope: startingCommit..HEAD (use `git diff {startingCommit}..HEAD`)
+       Starting commit: {startingCommit}
+       Session directory: {sessionDir}
+       Session slug: {sessionSlug}
+       On startup, call gather_context({ session: "{sessionSlug}" }) to load
+       session-scoped learnings, state, and validation context.
+       ## Per-Phase Review Findings
+       {collected per-phase findings}
+       These were found and addressed during per-phase reviews. Don't assume
+       they're resolved — verify. Focus extra attention on cross-phase coherence:
+       naming consistency, duplicated utilities, architectural drift across phases.
+       Review the FULL diff (startingCommit..HEAD), not just the last phase.
+       Report findings with severity (blocking / warning / note).
+   ```
+5. **When the agent returns:**
+   - **No blocking findings:** Store all findings (blocking, warning, note) in `finalReview.findings`. Update `finalReview.status` to `"passed"`, report summary, transition to DONE.
+   - **Blocking findings:** Store all findings (blocking, warning, note) in `finalReview.findings`. Surface blocking findings to user. Ask: "Address blocking findings before completing? (fix / override / stop)"
+     - **fix** — Increment `finalReview.retryCount`. If `retryCount <= 3`: dispatch fixes using the Agent tool, then run `harness validate` to verify the fix, then re-run FINAL_REVIEW from step 2 (re-sets status to `in_progress`, re-gathers per-phase findings for fresh context). If `retryCount > 3`: stop — present all attempts to user, record in `.harness/failures.md`, ask: "How should we proceed? (fix manually and continue / stop)"
+       Fix dispatch:
+       ```
+       Agent tool parameters:
+         subagent_type: "harness-task-executor"
+         description: "Fix final review findings"
+         prompt: |
+           Fix the following blocking review findings. One task per finding.
+           {blocking findings with file, line, title, and rationale}
+           Session directory: {sessionDir}
+           Session slug: {sessionSlug}
+           Follow the harness-execution skill process. Commit each fix atomically.
+           Write {sessionDir}/handoff.json when done.
+       ```
+     - **override** — Record override decision (rationale from user) in state `decisions` array. Update `finalReview.status` to `"overridden"`. Transition to DONE.
+     - **stop** — Save state and exit. Resumable from FINAL_REVIEW.
+6. **Update state** and save after each step.
 ---
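The retry rule under the **fix** option of FINAL_REVIEW (increment first, then compare against 3) is easy to get off by one. The sketch below illustrates that ordering under stated assumptions: `FinalReviewState`, `handleFixChoice`, and the two outcome labels are hypothetical names; only the increment-then-check logic and the limit of 3 are taken from the diff.

```typescript
interface FinalReviewState {
  status: "pending" | "in_progress" | "passed" | "overridden";
  findings: unknown[];
  retryCount: number;
}

const MAX_FINAL_REVIEW_RETRIES = 3;

type FixDecision = "dispatch_fix_and_rerun" | "escalate_to_user";

// Applies the "fix" choice: increment retryCount first, then decide whether
// another automated fix attempt is allowed or the user must take over.
function handleFixChoice(review: FinalReviewState): { review: FinalReviewState; next: FixDecision } {
  const updated: FinalReviewState = { ...review, retryCount: review.retryCount + 1 };
  const next: FixDecision =
    updated.retryCount <= MAX_FINAL_REVIEW_RETRIES ? "dispatch_fix_and_rerun" : "escalate_to_user";
  return { review: updated, next };
}
```

Because the counter is incremented before the comparison, three automated fix attempts are allowed and the fourth **fix** request escalates to the user.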
@@ -390,7 +560,8 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
    - Total tasks across all phases
    - Total retries used
    - Total time (first phase start to last phase completion)
-   - Any overridden review findings
+   - Final review result: `finalReview.status` (passed / overridden) and total findings count from `finalReview.findings`
+   - Any overridden review findings (per-phase and final)
 2. **Offer next steps:**
    - "Create a PR? (yes / no)"
@@ -407,7 +578,11 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
      "pending": [],
      "concerns": [],
      "decisions": ["<all decisions from all phases>"],
-     "contextKeywords": ["<merged from spec>"]
+     "contextKeywords": ["<merged from spec>"],
+     "finalReview": {
+       "status": "<passed | overridden>",
+       "findingsCount": "<number of findings from final review>"
+     }
    }
    ```
@@ -419,9 +594,13 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
    - [skill:harness-autopilot] [outcome:observation] {any notable patterns from the run}
    ```
-5. **Update roadmap to done.** If `docs/roadmap.md` exists and the current spec maps to a roadmap feature, call `manage_roadmap` with action `update` to set the feature status to `done`. Derive the feature name from the spec title (H1 heading) or the session's `handoff.json` `summary` field. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `updateFeature()` from core. Skip silently if no roadmap exists or if the feature is not found. Do not use `force_sync: true`.
+5. **Promote session learnings to global.** Call `promoteSessionLearnings(projectPath, sessionSlug)` to move generalizable session learnings (tagged `[outcome:gotcha]`, `[outcome:decision]`, `[outcome:observation]`) to the global `learnings.md`. Report: "Promoted {N} learnings to global, {M} session-specific entries kept in session."
+6. **Check if pruning is needed.** Call `countLearningEntries(projectPath)`. If the count exceeds 30, suggest: "Global learnings.md has {count} entries (threshold: 30). Run `harness learnings prune` to analyze patterns and archive old entries."
+7. **Update roadmap to done.** If `docs/roadmap.md` exists and the current spec maps to a roadmap feature, call `manage_roadmap` with action `update` to set the feature status to `done`. Derive the feature name from the spec title (H1 heading) or the session's `handoff.json` `summary` field. If `manage_roadmap` is unavailable, fall back to direct file manipulation using `updateFeature()` from core. Skip silently if no roadmap exists or if the feature is not found. Do not use `force_sync: true`.
-6. **Write final session summary.** Update the session summary to reflect completion:
+8. **Write final session summary.** Update the session summary to reflect completion:
    ```json
    writeSessionSummary(projectPath, sessionSlug, {
@@ -435,7 +614,7 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
    })
    ```
-7. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
+9. **Clean up state:** Set `currentState: "DONE"` in `{sessionDir}/autopilot-state.json`. Do not delete the file — it serves as a record.
 ## Harness Integration
@@ -444,7 +623,7 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 - **`harness check-deps`** — Delegated to harness-execution (included in task steps).
 - **State file** — `.harness/sessions/<slug>/autopilot-state.json` tracks the orchestration state machine. `.harness/sessions/<slug>/state.json` tracks task-level execution state (managed by harness-execution). The slug is derived from the spec path during INIT.
 - **Handoff** — `.harness/sessions/<slug>/handoff.json` is written by each delegated skill and read by the next. Autopilot writes a final handoff on DONE.
-- **Learnings** — `.harness/learnings.md` (global) is appended by both delegated skills and autopilot itself.
+- **Learnings** — `.harness/learnings.md` (global) is appended by both delegated skills and autopilot itself. On DONE, session learnings with generalizable outcomes are promoted to global via `promoteSessionLearnings`. If global count exceeds 30, autopilot suggests running `harness learnings prune`.
 - **Roadmap context** — During INIT, reads `docs/roadmap.md` (if present) for project-level priorities, blockers, and milestone status. Provides broader context for phase execution decisions.
 - **Roadmap sync** — During PHASE_COMPLETE, calls `manage_roadmap` with `sync` and `apply: true` to reflect phase progress. During DONE, calls `manage_roadmap` with `update` to set feature status to `done`. Both skip silently when no roadmap exists. Neither uses `force_sync: true`.
@@ -456,7 +635,8 @@ INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → REVIEW →
 - Planning override bumps complexity upward when task signals disagree
 - Retry budget (3 attempts) with escalating context before surfacing failures
 - Existing skills (planning, execution, verification, review) are unchanged
-- Human approves every plan before execution begins
+- Plans auto-approve when no concern signals fire; plans pause for human review when any signal fires
+- `--review-plans` flag forces human review for all plans in a session
 - Phase completion summary shown between every phase
 ## Examples
@@ -491,10 +671,11 @@ Plan generated: docs/plans/2026-03-19-core-scanner-plan.md (8 tasks, ~24 min)
 **Phase 1 — APPROVE_PLAN:**
 ```
-Phase 1: Core Scanner
-Tasks: 8 | Checkpoints: 1 | Est. time: 24 min | Complexity: low
-Approve this plan and begin execution? (yes / revise / skip / stop)
-→ User: "yes"
+Auto-approved Phase 1: Core Scanner
+  Review mode: auto
+  Complexity: low (no override)
+  Planner concerns: none
+  Tasks: 8 (threshold: 15)
 ```
 **Phase 1 — EXECUTE → VERIFY → REVIEW:**
@@ -533,10 +714,22 @@ Resuming autopilot from state PLAN, phase 2: Rule Engine.
 Found plan: docs/plans/2026-03-19-rule-engine-plan.md
 ```
-**Phase 2 — APPROVE_PLAN → EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE**
+**Phase 2 — APPROVE_PLAN:**
 ```
-[Same flow as Phase 1, with checkpoint pauses as needed]
+Pausing for review -- Phase 2: Rule Engine
+  Review mode: auto
+  Complexity: high (triggered)
+  Planner concerns: none
+  Tasks: 14 (threshold: 15)
+Approve this plan and begin execution? (yes / revise / skip / stop)
+→ User: "yes"
+```
+**Phase 2 — EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE**
+```
+[Execution with checkpoint pauses as needed]
 Phase 2: Rule Engine — COMPLETE
 Tasks: 14/14 | Retries: 1 | Verification: pass | Review: 0 blocking
 Next: Phase 3: CLI Integration (complexity: low). Continue? (yes / stop)
@@ -545,11 +738,19 @@ Next: Phase 3: CLI Integration (complexity: low). Continue? (yes / stop)
 **Phase 3 — [auto-plans, executes, completes]**
+**FINAL_REVIEW:**
+```
+[harness-code-reviewer runs cross-phase review on startingCommit..HEAD]
+Final review: 0 blocking, 1 warning. Passed.
+```
 **DONE:**
 ```
 All phases complete.
 Total: 3 phases, 30 tasks, 1 retry
+Final review: passed (0 blocking, 1 warning)
 Create a PR? (yes / no)
 → User: "yes"
 ```
@@ -592,7 +793,7 @@ How should we proceed? (fix manually and continue / revise plan / stop)
 ## Gates
 - **No reimplementing delegated skills.** Autopilot orchestrates. If you are writing planning logic, execution logic, verification logic, or review logic, STOP. Delegate to the appropriate persona agent via `subagent_type`.
-- **No executing without plan approval.** Every plan must be explicitly approved by the human before execution begins. No exceptions, regardless of complexity level.
+- **No executing without plan approval.** Every plan passes through the APPROVE_PLAN gate. When no concern signals fire, the plan is auto-approved with a structured report. When any signal fires, the plan pauses for human review. The `--review-plans` flag forces all plans to pause. No plan reaches EXECUTE without passing this gate.
 - **No skipping VERIFY or REVIEW.** Every phase goes through verification and review. The human can override findings, but the steps cannot be skipped.
 - **No infinite retries.** The retry budget is 3 attempts. If exhausted, STOP and surface to the human. Do not extend the budget without explicit human instruction.
 - **No modifying session state files manually.** The session state files are managed by the skill. If the state appears corrupted, start fresh rather than patching it.

package/dist/agents/skills/claude-code/harness-autopilot/skill.yaml CHANGED

@@ -23,6 +23,9 @@ cli:
     - name: path
       description: Project root path
       required: false
+    - name: review-plans
+      description: Force human review of all plans (overrides auto-approve)
+      required: false
 mcp:
   tool: run_skill
   input:
@@ -37,6 +40,9 @@ phases:
   - name: loop
     description: Execute state machine — assess, plan, execute, verify, review per phase
     required: true
+  - name: final_review
+    description: Project-wide code review of cumulative changes before PR offer
+    required: true
   - name: complete
     description: Final summary and PR offering
     required: true
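The `review-plans` argument declared in this file pairs with INIT step 5 in SKILL.md, which says the flag is read only on a fresh start and preserved on resume. A minimal sketch of that rule, assuming a hypothetical `resolveReviewPlans` helper and a plain string array of CLI arguments (the real CLI's argument parsing is not shown in the diff):

```typescript
// Hypothetical subset of the persisted autopilot state.
interface PersistedState {
  reviewPlans: boolean;
}

// Fresh start: derive the setting from the CLI arguments.
// Resume: ignore the arguments and keep whatever the session was started with.
function resolveReviewPlans(argv: string[], existing: PersistedState | null): boolean {
  if (existing !== null) {
    return existing.reviewPlans;
  }
  return argv.includes("--review-plans");
}
```

Under this reading, passing `--review-plans` when resuming a session that was started without it has no effect.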

package/dist/agents/skills/claude-code/harness-product-spec/SKILL.md CHANGED

@@ -43,7 +43,7 @@
    ```
 5. **Load project context.** Scan the project for existing specs, user stories, or PRDs to maintain consistency in format and terminology:
-   - Check `docs/specs/`, `docs/requirements/`, `docs/prd/` for existing documents
+   - Check `docs/changes/`, `docs/requirements/`, `docs/prd/` for existing documents
    - Check `.github/ISSUE_TEMPLATE/` for the project's preferred issue format
    - Identify domain terminology used in existing specs
@@ -133,7 +133,7 @@
    REQ-003 -> US-004, US-005 (could-have)
    ```
-5. **Write the PRD to file.** Save to the project's spec directory (detected in Phase 1 or defaulting to `docs/specs/`). Use a filename pattern: `YYYY-MM-DD-feature-name-prd.md`.
+5. **Write the PRD to file.** Save to the project's spec directory (detected in Phase 1 or defaulting to `docs/changes/`). Use a filename pattern: `YYYY-MM-DD-feature-name-prd.md`.
 ---
@@ -171,7 +171,7 @@
    Coverage: all actors covered, all constraints addressed
    Open questions: N remaining
-   Generated: docs/specs/2026-03-27-notifications-prd.md
+   Generated: docs/changes/2026-03-27-notifications-prd.md
    ```
 ---
@@ -227,7 +227,7 @@ Phase 2: CRAFT
               And their other preferences remain unchanged.
 Phase 3: GENERATE
-  Written: docs/specs/2026-03-27-team-notifications-prd.md
+  Written: docs/changes/2026-03-27-team-notifications-prd.md
   Sections: problem statement, 4 user stories, 12 acceptance criteria, 8 BDD scenarios
   Traceability: REQ-001 -> US-001, US-002 | REQ-002 -> US-003, US-004
@@ -260,7 +260,7 @@ Phase 2: CRAFT
             return 400 and log a security warning.
 Phase 3: GENERATE
-  Written: docs/specs/2026-03-27-stripe-webhooks-prd.md
+  Written: docs/changes/2026-03-27-stripe-webhooks-prd.md
   Technical constraints section includes: idempotency keys, signature verification,
     5-second response SLA, Stripe retry behavior documentation