npm - @lannguyensi/harness - Versions diffs - 0.17.4 → 0.19.0 - Mend

@lannguyensi/harness 0.17.4 → 0.19.0

Files changed (45)

package/CHANGELOG.md +65 -0
package/README.md +86 -201
package/dist/cli/approve/understanding.js +51 -35
package/dist/cli/approve/understanding.js.map +1 -1
package/dist/cli/doctor/format.js +20 -2
package/dist/cli/doctor/format.js.map +1 -1
package/dist/cli/doctor/index.d.ts +8 -0
package/dist/cli/doctor/index.js +27 -1
package/dist/cli/doctor/index.js.map +1 -1
package/dist/cli/doctor/npm-bin-path.d.ts +23 -0
package/dist/cli/doctor/npm-bin-path.js +82 -0
package/dist/cli/doctor/npm-bin-path.js.map +1 -0
package/dist/cli/doctor/types.d.ts +20 -4
package/dist/cli/doctor/types.js.map +1 -1
package/dist/cli/index.js +19 -2
package/dist/cli/index.js.map +1 -1
package/dist/cli/init/agent-tasks-auth.d.ts +32 -0
package/dist/cli/init/agent-tasks-auth.js +75 -0
package/dist/cli/init/agent-tasks-auth.js.map +1 -0
package/dist/cli/init/composer.js +11 -0
package/dist/cli/init/composer.js.map +1 -1
package/dist/cli/init/dependencies.js +7 -3
package/dist/cli/init/dependencies.js.map +1 -1
package/dist/cli/init/interactive.d.ts +5 -0
package/dist/cli/init/interactive.js +162 -4
package/dist/cli/init/interactive.js.map +1 -1
package/dist/cli/init/profiles.d.ts +2 -2
package/dist/cli/init/profiles.js +30 -0
package/dist/cli/init/profiles.js.map +1 -1
package/dist/cli/init/templates.d.ts +1 -1
package/dist/cli/init/templates.js +37 -1
package/dist/cli/init/templates.js.map +1 -1
package/dist/cli/pack/hook-post-tool-use.d.ts +19 -0
package/dist/cli/pack/hook-post-tool-use.js +168 -0
package/dist/cli/pack/hook-post-tool-use.js.map +1 -0
package/dist/cli/pack/hook-pre-tool-use.js +5 -2
package/dist/cli/pack/hook-pre-tool-use.js.map +1 -1
package/dist/cli/session-start/index.js +8 -1
package/dist/cli/session-start/index.js.map +1 -1
package/dist/policy-packs/builtin/understanding-before-execution-runtime.d.ts +47 -1
package/dist/policy-packs/builtin/understanding-before-execution-runtime.js +98 -1
package/dist/policy-packs/builtin/understanding-before-execution-runtime.js.map +1 -1
package/dist/policy-packs/builtin/understanding-before-execution.js +87 -2
package/dist/policy-packs/builtin/understanding-before-execution.js.map +1 -1
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -7,6 +7,71 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
 ## [Unreleased]
+## [0.19.0] - 2026-05-17
+**Headline: setup UX gap closed for non-agent-tasks operators.** Through v0.18.x several pieces of the harness experience silently degraded for operators picking Solo (or Team without an agent-tasks account): the new per-task understanding-gate marker expiry never fired because the configured boundary list was agent-tasks MCP names, `harness init --interactive` left the bridge wired but unauthenticated, `harness doctor` flagged the deliberately operator-driven `dogfood-before-release` policy as a missing-producer false positive, and an nvm-drift class of bug went undiagnosed. This release closes those four gaps: profile-aware reset defaults plus a new `expire_on_bash_match` regex list for gh-CLI workflows, a post-install auth probe with login / skip / abort dialog, doctor respect for the policy's own `producers:` array, and a doctor warning that catches when `npm prefix -g`'s bin dir is not on PATH. Doc cleanup made the external-account assumptions of each profile explicit up-front.
+### Added
+- **understanding-gate: `approval_lifecycle.expire_on_bash_match`** (harness/f54e0ecb). New optional schema field on the `understanding-before-execution` pack config: a string array of regex patterns matched against the `Bash` tool's `tool_input.command`. When a Bash command matches, the per-session approval marker is deleted on PostToolUse, same semantics as the existing `expire_on_tool_match` does for MCP tool names. Enables gh-CLI / pure-Bash workflows to declare task boundaries (e.g. `^gh pr (merge|close)\b`, `^git push origin (master|main)\b`) so the gate's per-task re-prompt works for them too. Profile defaults updated: Solo drops the agent-tasks tool list (dead weight there) and ships only the Bash list with `max_age: 1h`; Team and Full keep the tool list and add the Bash list for hybrid coverage. Patterns are pre-compiled at parse time, invalid regexes dropped with stderr warnings. Round-trip regression tests in `tests/cli/init-full-template-pins.test.ts` parse each template through `yaml.parse + new RegExp + .test()` to pin the escape-pipeline correctness, since the unit-level tests bypass that surface.
+- **`harness doctor`: warn when `npm prefix -g`'s bin dir is not on PATH** (harness/4ddd78ed). Surfaces the nvm-drift footgun where `harness init --interactive` runs `npm i -g` against the active Node's prefix but the operator's shell PATH points at a different one, so installed binaries are silently invisible to subsequent doctor probes. Doctor now resolves the bin dir via `npm prefix -g` (the modern replacement for the removed `npm bin -g`) and renders an `Environment` section with the actionable PATH-patch suggestion when the bin dir is not in `process.env.PATH`. The section stays absent on ok and on the unknown branch (npm missing); skipped under `--shallow` so the 100ms timing budget stays intact.
+- **`harness init --interactive`: post-install auth probe for the agent-tasks bridge** (harness/3f775180). After a successful `npm i -g @agent-tasks/mcp-bridge`, the wizard runs `agent-tasks-mcp-bridge status` to detect whether a token is configured. Three branches:
+  - **ok**: prints `✓ agent-tasks token validated against the backend.` and continues.
+  - **token present but validation fails** (backend unreachable, expired token, wrong base URL): prints an informational warning naming the bridge's reason and continues. The wizard does not block on this because the recovery is not actionable from inside it.
+  - **no token stored**: opens a three-option dialog: (a) run `agent-tasks-mcp-bridge login` interactively now via stdio pass-through, (b) skip with a reminder, (c) abort the wizard with a pointer to the signup URL and the re-run command. After a successful login the wizard re-probes to confirm.
+  Closes the silent footgun where a fresh operator could finish the wizard with `harness doctor` reporting all-green but every `mcp__agent-tasks__*` call returning an auth error.
+- **FULL_TEMPLATE `git-preflight` hook pin: `min_version: "0.1.1"` + `version_command: ["preflight", "--version"]`** (agent-preflight/cb5a1770). Same pattern as the existing pins for `agent-tasks-mcp-bridge`, `grounding-mcp`, `memory-router-user-prompt-submit`. Floor at agent-preflight 0.1.1, the release that distinguishes "tool not installed" (e.g. an npm script invoking eslint that is not in devDependencies) from real lint/test/typecheck failures. Stale 0.1.0 installs silently emit false-positive blockers that keep the `preflight-before-*` policies closed forever; with the floor wired, `harness doctor` now warns operators to upgrade.
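Reviewer sketch for the `expire_on_bash_match` entry above: a minimal illustration of "pre-compile at parse time, drop invalid regexes with a warning, then test each Bash command". The helper names `compileBashPatterns` and `matchesBashBoundary` are hypothetical and are not taken from the package; only the field name, the two example patterns, and the drop-with-warning behaviour come from the changelog. The sketch collects warnings in an array instead of writing to stderr so it stays side-effect free.

```typescript
// Hypothetical helpers, not the package's actual API.
interface CompiledPatterns {
  patterns: RegExp[];
  warnings: string[];
}

// Compile every configured pattern once. An invalid pattern is dropped
// and reported rather than failing the whole config.
function compileBashPatterns(sources: string[]): CompiledPatterns {
  const patterns: RegExp[] = [];
  const warnings: string[] = [];
  for (const source of sources) {
    try {
      patterns.push(new RegExp(source));
    } catch {
      warnings.push(
        "expire_on_bash_match: dropping invalid regex " + JSON.stringify(source),
      );
    }
  }
  return { patterns, warnings };
}

// True when the command marks a task boundary, i.e. the approval
// marker would be deleted on PostToolUse.
function matchesBashBoundary(command: string, compiled: CompiledPatterns): boolean {
  return compiled.patterns.some((re) => re.test(command));
}

// The first two patterns are the examples from the changelog entry;
// the third is deliberately malformed to exercise the warning path.
const compiled = compileBashPatterns([
  "^gh pr (merge|close)\\b",
  "^git push origin (master|main)\\b",
  "(unclosed",
]);
```

The trailing `\b` matters here: `git push origin mainline` does not count as a boundary because no word boundary follows `main`.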
+### Changed
+- **`harness doctor`: producer-gap warning now respects the policy's own `producers:` array** (harness/f97e152f). A `block` policy with a `within:` window used to be flagged with `⚠ ... no manifest hook produces it` whenever no automatic SessionStart hook wrote the required tag, even when the policy itself declared a `producers:` entry pointing the agent at the manual recovery (`mcp__agent-grounding__ledger_add`). For `dogfood-before-release` in the Full template that was a false positive: the gate is deliberately operator-driven (an automatic SessionStart producer would defeat its purpose), and the `producers:` array IS the schema-blessed manual recovery path the agent sees in the deny envelope. Doctor now treats a non-empty `producers:` array as a documented producer and suppresses the warning. The warning still fires when both kinds are absent. Visible effect on the Full template: one fewer false-positive warning (dogfood-before-release flips from `⚠` to `✓`); the two preflight policies were already satisfied by the `git-preflight` SessionStart hook and stay green.
+- **Profile dependency clarity in README + wizard** (harness/75de11c4). README, `docs/init-interactive.md`, `docs/for-humans.md`, the wizard's profile-choice descriptions, and the Team-profile confirm prompt now state the external-account assumptions of each profile up-front: Solo is standalone, Team requires an agent-tasks account, Full additionally requires `@lannguyensi/agent-preflight` and `gh` on PATH. The wizard also prints a post-init reminder for Team/Full operators naming `agent-tasks-mcp-bridge login` as the auth recovery path and `--template solo` as the fallback for non-agent-tasks workflows.
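Reviewer sketch for the producer-gap change above. It reduces the described rule to one predicate: a `block` policy with a `within:` window is only flagged when no hook produces its tag and its own `producers:` array is empty. The `PolicyLike` shape, the `hasProducerGap` name, the exact-match tag comparison, and the tag strings in the usage lines are all assumptions made for illustration; the real doctor code may resolve tags differently.

```typescript
// Hypothetical, simplified policy shape for illustration only.
interface PolicyLike {
  name: string;
  enforcement: string;
  requires: { ledger_tag: string; within?: string };
  producers?: string[];
}

// Returns true when doctor would warn that nothing produces the tag.
function hasProducerGap(policy: PolicyLike, tagsProducedByHooks: Set<string>): boolean {
  // Only block policies with a freshness window are candidates.
  if (policy.enforcement !== "block" || policy.requires.within === undefined) {
    return false;
  }
  // An automatic hook producer satisfies the policy.
  if (tagsProducedByHooks.has(policy.requires.ledger_tag)) {
    return false;
  }
  // A non-empty producers array counts as a documented manual producer.
  return (policy.producers ?? []).length === 0;
}

// Usage with made-up tags: one hook-produced tag, one manual producer, one gap.
const hookTags = new Set<string>(["preflight:example"]);
const preflight: PolicyLike = {
  name: "preflight-before-investigation",
  enforcement: "block",
  requires: { ledger_tag: "preflight:example", within: "1h" },
};
const dogfood: PolicyLike = {
  name: "dogfood-before-release",
  enforcement: "block",
  requires: { ledger_tag: "dogfood:example", within: "1h" },
  producers: ["mcp__agent-grounding__ledger_add"],
};
const orphan: PolicyLike = {
  name: "orphan-policy",
  enforcement: "block",
  requires: { ledger_tag: "orphan:example", within: "1h" },
};
```

Under these assumptions only `orphan` is flagged, which mirrors the "warning still fires when both kinds are absent" sentence.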
+## [0.18.0] - 2026-05-17
+**Headline: per-task understanding-gate marker expiry.** Through v0.17.x the approval marker had no lifetime: one `harness approve understanding` covered every subsequent Edit / Write / Bash for the whole session. That contract was correct when the gate was about "agent starts a session, picks ONE interpretation, runs", but no longer matches multi-task sessions, where a stale interpretation can silently drive the next task's edits. Live failure mode from the v0.17.4 dogfood: three sequential tasks in one session, marker stayed valid across all three, the third task started implementing the wrong fix surface before the operator caught the misdiagnosis. v0.18 expires the marker on configurable task-boundary tools and (optionally) on a TTL safety net, so a fresh task gets a fresh Understanding Report. Backing task: agent-tasks/d8ee60ca.
+**Operator action required (sort of):** the new behaviour is default-on for every install via `harness init --template solo / team / full` and via `init --interactive` Custom. Existing manifests that already use the pack will see the stricter behaviour on the next `harness apply` if they re-render from the template. Operators who prefer the legacy "one approval per session, no expiry" contract opt out by setting `policy_packs[].config.approval_lifecycle: { mode: "session" }`. Manifests that copy the pack config verbatim from the README / docs and pin it inline keep working unchanged until they explicitly add the new block.
+### Added
+- **`config.approval_lifecycle` on the understanding-before-execution pack** (agent-tasks/d8ee60ca). New schema-shape under the pack's `config:`:
+  ```yaml
+  policy_packs:
+    - name: understanding-before-execution
+      config:
+        approval_lifecycle:
+          expire_on_tool_match:
+            - mcp__agent-tasks__task_finish
+            - mcp__agent-tasks__task_abandon
+            - mcp__agent-tasks__pull_requests_merge
+          max_age: 4h
+  ```
+  `expire_on_tool_match` is a list of tool name patterns whose successful PostToolUse fires marker expiry. `max_age` is a duration (`24h` / `30m` / `PT1H` / ...) that the PreToolUse blocker enforces against the marker's `approvedAt` field. Both are optional. `{ mode: "session" }` opts out of both and restores the legacy behaviour. Coupling note: the default tool list names `mcp__agent-tasks__*` verbs because that is what every wizard-defaulted install uses, but the field is purely string-based, so operators on Linear / JIRA / GitHub Projects override with their own task-system verbs.
+- **PostToolUse marker-expiry hook** (`harness pack hook post-tool-use`, new subcommand). Reads the PostToolUse event JSON from stdin and, when the just-completed tool matches the pack's `expire_on_tool_match` list, deletes the per-session approval marker. Fails closed-to-noop: any error path is logged and the hook exits 0, so a bug in this code never escalates into a session-wide tool block. Worst case the marker persists past the intended boundary, which degrades to the legacy per-session contract.
+- **`checkApprovalMarker` honours `opts.maxAgeMs`** (extended). When set, a marker whose `approvedAt` is older than `now - maxAgeMs` is treated as expired and returns `matched:false` with an "expired" detail, so the agent sees the same "no approval" UX as a never-approved session and must re-approve. A marker with no readable `approvedAt` (body corrupted, missing field) skips the freshness check, so the existence-only DoS-resistance contract from v0.13.0 still wins.
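Reviewer sketch for the `max_age` entries above: how a duration string could become a millisecond budget, and how the freshness rule could be applied to a marker that is already known to exist. Both function names are hypothetical, the parser covers only a small subset (`<n>s`, `<n>m`, `<n>h`, and `PT…H…M…S`) chosen to match the examples in the changelog, and the assumption that `approvedAt` is an ISO-8601 timestamp string is mine, not the source's.

```typescript
// Parse a small, assumed subset of duration syntax into milliseconds.
function parseDurationMs(text: string): number | undefined {
  const short = /^(\d+)([smh])$/.exec(text);
  if (short) {
    const unit = short[2] === "h" ? 3600000 : short[2] === "m" ? 60000 : 1000;
    return Number(short[1]) * unit;
  }
  const iso = /^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$/.exec(text);
  if (iso && text !== "PT") {
    const hours = Number(iso[1] ?? 0);
    const minutes = Number(iso[2] ?? 0);
    const seconds = Number(iso[3] ?? 0);
    return hours * 3600000 + minutes * 60000 + seconds * 1000;
  }
  return undefined;
}

interface MarkerCheck {
  matched: boolean;
  detail: string;
}

// Freshness check for an existing marker.
function checkMarkerFreshness(
  approvedAt: string | undefined,
  nowMs: number,
  maxAgeMs?: number,
): MarkerCheck {
  if (maxAgeMs === undefined) {
    return { matched: true, detail: "no max_age configured" };
  }
  const approvedMs = approvedAt === undefined ? NaN : Date.parse(approvedAt);
  if (Number.isNaN(approvedMs)) {
    // Unreadable approvedAt: skip the freshness check so the
    // existence-only contract still applies.
    return { matched: true, detail: "approvedAt unreadable, freshness check skipped" };
  }
  if (approvedMs < nowMs - maxAgeMs) {
    return { matched: false, detail: "expired" };
  }
  return { matched: true, detail: "fresh" };
}

// Usage: a 4h budget evaluated at a fixed "now".
const nowMs = Date.parse("2026-05-17T12:00:00Z");
const budget = parseDurationMs("4h");
const stale = checkMarkerFreshness("2026-05-17T07:00:00Z", nowMs, budget);
const fresh = checkMarkerFreshness("2026-05-17T10:00:00Z", nowMs, budget);
```

Passing `nowMs` in as a parameter instead of calling `Date.now()` inside the check keeps the rule deterministic under test.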
+### Changed
+- **`init --template solo / team / full` + Custom-composer all ship `approval_lifecycle` defaults by default.** Re-running `harness init --force` on an existing install picks them up; an existing operator-edited manifest keeps the legacy behaviour unchanged until the operator manually adds the block or re-renders from a template.
+- **`policy_packs[].config.approval_lifecycle` flows into the pack-expand surface.** `expandPolicyPacks` now contributes 4 Claude hooks instead of 3 (UserPromptSubmit + Stop + PreToolUse + the new PostToolUse). Operators who pinned the v0.17 3-hook shape in custom infrastructure should expect the new hook in their generated `settings.json` after the next `harness apply`.
+### Verification
+- `npm test`: 1361/1361 (was 1344, +17 new tests across `tests/cli/pack-hook-post-tool-use.test.ts`, `tests/policy-packs/marker-max-age.test.ts`, and additions to `tests/policy-packs/expand.test.ts`).
+- `npm run typecheck`: clean.
+- Golden fixture: `docs/examples/full-manifest.expected.yaml` updated for the new pack config block.
 ## [0.17.4] - 2026-05-17
 **Headline: `harness init --interactive` wire-now actually wires settings.json now.** Closes a silent-no-op bug surfaced during the v0.17.2 dogfood (operator picked Full, picked claude-code in wire-now, but branch-protection's hooks never reached `~/.claude/settings.json`). Root cause: wireRuntime called `apply({ target, merge: true })` without `overwriteDrift`. A pre-existing stale or missing `~/.claude/harness.generated/.last-apply` snapshot made the freshly-rendered `harness.generated/settings.json` look like full-file drift, so apply returned `outcome: "drift-refuse"` without throwing. wireRuntime only checked `targetWritten` and printed nothing when it was false — leaving the operator with a "restart hint" line that implied success while settings.json was never updated. Fix: init's wire-now passes `overwriteDrift: true` with an auto-confirm prompt. Drift safeguards remain in place for ad-hoc `harness apply`; init's canonical "start from scratch" intent now always lands. Backing task: agent-tasks/df68b3e6.

package/README.md CHANGED

@@ -11,18 +11,13 @@ applies, audits, and *enforces*.
 > exact context, and why.
 A coding agent like Claude Code is configured across half a dozen
-files: `settings.json`, `CLAUDE.md`, memory notes, MCP registrations,
-hook scripts, per-project overrides. No single file answers *"what can
-this agent do right now, and why is it set up that way?"*, so
-configuration drifts between sessions, rules you wrote down in one
-place quietly stop firing, and a broken tool is discovered only by
-tripping over it.
-`harness` puts all of that in one YAML file you can read, validate,
-and diff. From that file it generates the config the agent actually
-loads, and at runtime it enforces the rules you declared: it blocks a
-tool call that violates one, and records every decision so you can
-see what fired and why.
+files (`settings.json`, `CLAUDE.md`, memory notes, MCP registrations,
+hook scripts, per-project overrides), and no single file answers
+*"what can this agent do right now, and why is it set up that way?"*.
+`harness` puts all of it in one YAML you read, validate, and diff;
+generates the config the agent loads from it; and at runtime blocks
+tool calls that violate the declared rules while recording every
+decision.
 ## See it work
@@ -31,13 +26,20 @@ until it has logged a review.*
 Claude Code goes to merge PR 42. Before the tool call runs, the
 runtime hands the event to `harness`, which checks it against the
-manifest:
+manifest. The hook protocol wire shape is the legacy engine-vocabulary
+envelope (operators see this on stderr; agents read it via
+`permissionDecisionReason` when the policy declares no `ux:` block):
 ```console
 $ harness policy intercept       # Claude Code runs this before each tool call
 {"decision":"block","reason":"review-before-merge: no matching ledger entry for tag `review:42`","hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"review-before-merge: no matching ledger entry for tag `review:42`"}}
 ```
+Built-in block-enforcement policies ship a `ux:` block since v0.17.0,
+so the agent sees a plain-language three-section form
+([`docs/for-agents.md`](docs/for-agents.md#agent-facing-block-messages-ux-block));
+the engine-vocabulary text above stays in the audit ledger.
 Blocked. `harness explain` says exactly why:
 ```console
@@ -78,52 +80,6 @@ timestamp            policy               outcome  reason
 Declare the rule once; every session is held to it, with a paper
 trail of every decision.
-## What the agent sees vs what the engine records
-A policy has two readers: the audit ledger (which wants every internal
-detail) and the agent (which only needs to know what is blocked, what
-condition is missing, and which command satisfies it). Declaring a
-policy's `ux:` block splits those readers cleanly.
-Engine-internal model (unchanged): session IDs, ledger entries,
-attestations, provenance chains, policy DAGs. All of it still feeds
-`audit`, `explain --trace`, and the evidence-ledger writes that
-`session-export` replays.
-Agent-facing model (new, opt-in per policy): `cannot` (what is
-blocked), `required` (the missing precondition, in plain words), and
-`run` (the exact command to satisfy it). When `ux:` is declared, the
-agent sees only this shape, with `${VAR}` references substituted
-against the same context the `ledger_tag` resolved against.
-```yaml
-policies:
-  - name: preflight-before-investigation
-    requires: { ledger_tag: "preflight:${REPO}", within: "1h" }
-    enforcement: block
-    ux:
-      cannot: "You cannot investigate this repository yet."
-      required: ["verified repository preflight"]
-      run: ["harness preflight"]
-```
-On block, the agent sees:
-```
-You cannot investigate this repository yet.
-Required:
-- verified repository preflight
-Run:
-  harness preflight
-```
-Not `no matching ledger entry for tag preflight:harness`. The
-internal failure (tag, hint, matched count) is still written to the
-ledger for `audit` and `explain --trace`. Policies without `ux:` keep
-the legacy deny envelope unchanged.
 ## Concepts in six lines
 | Term | What it is |
@@ -152,25 +108,16 @@ flowchart LR
     observe -. refine .-> declare
 ```
-One manifest declares grounding, tools, memory, hooks, policies, and
-workflows. `apply` materialises that into the files Claude Code
-actually reads. At runtime, hooks and policies enforce the contract
-and write decision rows to the evidence ledger. The read-side
-surfaces (`audit`, `explain --trace`, `session-export`) replay those
-rows so you can see what fired, why, and across which session.
-Whatever you learn from observing flows back into the manifest. That
-loop is the whole product.
+Observe → refine → declare is the whole loop. The read-side surfaces
+(`audit`, `explain --trace`, `session-export`) replay rows the runtime
+already recorded, so what flows back into the manifest is grounded in
+what actually happened.
 ## Pick your audience
-- **Operator?** Read [`docs/for-humans.md`](docs/for-humans.md). It
-  walks from `npm i -g @lannguyensi/harness` through your first
-  `apply`, your first real policy, and the diagnostics cheat sheet.
-- **Agent (or onboarding one)?** Read
-  [`docs/for-agents.md`](docs/for-agents.md). It defines the
-  workflow lifecycle, the policy / ledger sequence, the CLI cheat
-  sheet split by side-effect class, and the audit triumvirate
-  (`audit` vs `explain --trace` vs `session-export`).
+- **Operator?** [`docs/for-humans.md`](docs/for-humans.md): install through first `apply`, first real policy, diagnostics cheat sheet.
+- **Agent (or onboarding one)?** [`docs/for-agents.md`](docs/for-agents.md): workflow lifecycle, policy / ledger sequence, CLI cheat sheet by side-effect class, the audit triumvirate.
+- **Writing your own policy?** [`docs/writing-custom-policies.md`](docs/writing-custom-policies.md): three tripwires, four worked recipes (each validated in CI), author loop, field reference.
 ## Install
@@ -189,11 +136,21 @@ command path, install to wired-in, no prose.
 harness init --interactive
 ```
-Guided wizard that detects your environment (existing `~/.claude/` and
-`~/.codex/`, MCP servers already wired in `settings.json`, harness
-binary version), picks a profile (`solo` / `team` / `custom`), and
-writes a starting `harness.yaml`. Ctrl-C at any prompt aborts with no
-partial write. Walkthrough + limitations: `docs/init-interactive.md`.
+Guided wizard. Detects `~/.claude/` and `~/.codex/`, MCP servers
+already wired in `settings.json`, harness binary version. Picks a
+profile (`solo` / `team` / `custom`) and writes a starting
+`harness.yaml`. Ctrl-C aborts cleanly. Walkthrough +
+limitations: [`docs/init-interactive.md`](docs/init-interactive.md).
+### Profiles at a glance
+| Profile | External accounts / tools required | Best for |
+|---------|------------------------------------|----------|
+| `solo`  | None. `npm` + Claude Code is enough. | Single operators who want the Understanding Gate without committing to a tasking system. |
+| `team`  | An **agent-tasks** account ([hosted](https://agent-tasks.opentriologue.ai) or [self-hosted](https://github.com/LanNguyenSi/agent-tasks)). | Teams that already use `agent-tasks` for PR review tracking. The merge gate (`review:<pr-number>` ledger tag) wires against the agent-tasks MCP. |
+| `full`  | Same as `team` plus `@lannguyensi/agent-preflight` and `gh` on PATH. | Operators who want every reference policy enforced (dogfood gate, preflight gates, review-subagent gate, merge gate). |
+**Not using agent-tasks?** Pick `solo`. The `team` and `full` review gates currently match only the agent-tasks MCP tool names, so a `gh pr create` workflow stays unprotected by them today. Tool-agnostic gates that also match `gh pr` are tracked in the backlog.
 If you prefer non-interactive (CI, fresh-VM provisioning), pick a
 template directly:
@@ -204,17 +161,14 @@ harness init --template team   # solo + agent-tasks MCP + review-before-merge po
 harness init --template full   # everything from the Appendix A reference manifest
 ```
-Debug what the harness sees in your env without writing anything:
+Use `harness init --probe` for a JSON snapshot of detected runtimes
+and MCPs without writing anything.
-```bash
-harness init --probe   # JSON snapshot of detected runtimes + MCPs + manifest
-```
-## Try it yourself
+## Try it without installing
-The demo above shows the runtime path. To see policy matching without
-installing anything or touching the ledger, run `dry-run` against the
-reference manifest:
+`harness dry-run` reports which hooks fire and which policies match
+for a given tool call, against the reference manifest, before any
+ledger I/O:
 ```bash
 git clone https://github.com/LanNguyenSi/harness && cd harness
@@ -225,44 +179,23 @@ node dist/cli/main.js dry-run "merge PR 42" \
   --config docs/examples/full-manifest.yaml
 ```
-`dry-run` reads the reference manifest, runs the trigger matcher,
-substitutes `${PR_NUMBER}=42` through the JSONPath-restricted extract
-DSL, and tells you exactly which hooks would fire and which policies
-would match, before any ledger I/O.
-The reference manifest is a schema-coverage example, not a runnable
-config. `harness validate --config docs/examples/full-manifest.yaml`
-will report errors for install-specific hook script paths it
-references (and warnings for binaries like `git-batch` that only exist
-in a real install). That is expected; the file header spells out the
-contract. Use `harness init --template full` to get a manifest
-tailored to your machine.
-Convinced? Install globally and set up your own:
-`npm i -g @lannguyensi/harness && harness init --interactive`.
+`docs/examples/full-manifest.yaml` is a schema-coverage example, not a
+runnable config (the file header spells out the contract). For a
+manifest tailored to your machine, install globally and run
+`harness init --interactive`.
 ## Uninstall
 `harness uninstall` is the single-command teardown: dry-run by default,
-`--apply` to mutate. It inventories what harness planted under
-`~/.claude/` (manifest, lock, `harness.generated/`, harness-owned hook
-groups and `mcpServers` entries in `settings.json`, any leftover
-`settings.json.pre-harness-<TS>` backups), then removes them after
-writing a reversible backup + JSON snapshot next to `settings.json`.
-```bash
-harness uninstall                                      # list, exit 0
-harness uninstall --apply                              # tear down
-harness uninstall --restore-from <pre-harness-backup>  # atomic restore
-npm uninstall -g @lannguyensi/harness                  # drop the CLI itself
-```
+`--apply` to mutate, `--restore-from <backup>` to roll back. Full
+inventory + recommended order in [`docs/uninstall.md`](docs/uninstall.md).
 ## Status
 harness ships in phases. Phases 1 through 6 are released: read-only
 inventory → managed edits → declarative truth → policy layer → polish
 and dogfood lessons → the Understanding Gate Policy Pack. Phase 7, the
-Risk Gate, is next. The current release is `v0.16.0`.
+Risk Gate, is next. The current release is `v0.19.0`.
 The phase-by-phase plan with acceptance criteria lives in
 [`docs/ROADMAP.md`](docs/ROADMAP.md); what shipped in each version is
@@ -270,109 +203,61 @@ in [`CHANGELOG.md`](CHANGELOG.md).
 ## Policy Packs
-A *Policy Pack* is a reusable bundle of instruction template, hooks,
-policies, and permission profiles that ships under one name and is
-referenced from `harness.yaml` with a single key. The first pack,
-`understanding-before-execution` (shipped in `v0.9.0`), forces agents
-to expose and confirm their task interpretation before any
-write-capable tool fires.
+A *Policy Pack* is a reusable bundle of hooks, policies, instruction
+template, and permission profiles shipped under one name and enabled
+from `harness.yaml` with a single key:
 ```yaml
 policy_packs:
   - name: understanding-before-execution
     config:
-      mode: grill_me                       # fast_confirm | grill_me | strict
-      permission_profile: safe-start       # safe-start | implementation-after-approval | high-risk-grill-me
-```
-Manage packs with `harness pack add / remove / list`. Apply against
-either runtime:
-```sh
-harness apply --runtime claude-code        # default; writes harness.generated/settings.json
-harness apply --runtime codex              # writes harness.generated/codex/config.toml
+      mode: grill_me                  # fast_confirm | grill_me | strict
+      permission_profile: safe-start  # safe-start | implementation-after-approval | high-risk-grill-me
 ```
-Approve a session's Understanding Report via
-`harness approve understanding --session <id>` (round-trips both the
-evidence-ledger tag and the persisted JSON report). Verify the
-adapter wiring with `harness doctor --target codex` (`--json` for
-machine-readable). The full reference lives in
-[`docs/policy-packs/understanding-before-execution.md`](docs/policy-packs/understanding-before-execution.md);
-synthetic-stdin dogfood under
-[`dogfood/phase6-6/`](dogfood/phase6-6/run-smoke.sh) exercises the
-block / allow / capture / approve round-trip without a real Codex
-binary.
+Manage packs with `harness pack add / remove / list`. Two packs ship
+today: [`understanding-before-execution`](docs/policy-packs/understanding-before-execution.md)
+(forces an Understanding Report before any write-capable tool fires)
+and [`branch-protection`](docs/policy-packs/branch-protection.md)
+(blocks source mutations on protected branches without an explicit
+override). Custom packs from `path:`, `npm:`, or `git:` sources are
+out of scope for v1 (see the pack docs for the future-vocabulary
+contract).
 ## What's next
-**Phase 7, Risk Gate.** Today's policy model evaluates a rule per
-matching trigger and returns a binary block/allow. Phase 7 makes
-harness reason about *the action itself*: an Action Envelope (tool +
-raw input + session + runtime context) is enriched by a Context
-Resolver (production / staging / dev / unknown), classified by a Risk
-Classifier (severity + categories + reversibility), then matched
-against policies whose `when:` clauses can reference
-`risk.severity_at_least`, `environment.name`, and similar. The
-decision space extends to `allow / warn / require_approval / deny`.
-Motivating use case: prevent `DROP TABLE users`, `kubectl delete
-namespace prod`, `terraform destroy` against an unverified production
-target, even if the model would have happily run them.
-Phase 7 builds on Phase 4's `policy intercept` runtime backbone and
-Phase 6's Policy Pack distribution surface; neither is replaced.
+**Phase 7, Risk Gate.** Today's policy model returns a binary
+block/allow per matching trigger. Phase 7 lets harness reason about
+the action itself (Action Envelope → Context Resolver → Risk
+Classifier) and extends the decision space to `allow / warn /
+require_approval / deny`. Motivating use case: block `DROP TABLE
+users`, `kubectl delete namespace prod`, `terraform destroy` against
+unverified production targets. Full plan in
+[`docs/ROADMAP.md#phase-7--risk-gate`](docs/ROADMAP.md#phase-7--risk-gate).
 > Bring your favorite agent harness. Add governance.
 ## Why this exists
-A working agent harness today has six to eight configuration
-surfaces, each with its own schema and lifecycle: `~/.claude/settings.json`,
-`CLAUDE.md` (per repo + root), `~/.claude/projects/*/memory/*.md`
-with frontmatter, `~/.claude/keybindings.json`, MCP server
-registrations in `~/.claude.json`, skill directories, per-project
-overrides, and external CLIs that behave differently per project.
-There is no single place that answers *"what can this agent do right
-now, and why is that configured that way?"*. Drift between sessions
-is invisible until it breaks something. Humans editing one surface
-do not know which other surfaces they need to touch. A fresh agent
-instance has no way to audit its own setup.
-Our entry point into this problem: on 2026-04-23, an
-`agent-grounding` checkout that was 16 commits behind origin led two
-tasks to be incorrectly called "stale". The check that would have
-caught it already exists,
+On 2026-04-23, an `agent-grounding` checkout that was 16 commits
+behind origin led two tasks to be incorrectly called "stale". The
+check that would have caught it already existed:
 [`agent-preflight`](https://github.com/LanNguyenSi/agent-preflight)
-runs `git fetch` + `git status` (alongside lint, typecheck, test,
-audit) and emits a structured `ready` + confidence-score result. The
-missing piece was not the check itself, it was the deterministic
-*trigger*: a `SessionStart` hook that invokes `preflight run` and a
-policy that gates further work on the result. Building that wiring
-needs an agreed-upon place for harness config to live first. That
-conversation is the origin of this repo.
+runs `git fetch` + `git status` and emits a structured `ready` +
+confidence-score result. The missing piece was not the check, it was
+the deterministic *trigger*: a `SessionStart` hook that invokes
+`preflight run` and a policy that gates further work on the result.
+Building that wiring needs an agreed-upon place for harness config to
+live first. That conversation is the origin of this repo.
 ## Related
-- [`agent-grounding`](https://github.com/LanNguyenSi/agent-grounding):
-  grounding primitives (evidence-ledger, claim-gate,
-  review-claim-gate); `grounding-mcp` is the canonical client surface
-  harness queries through `queryLedgerByTag`.
-- [`agent-memory`](https://github.com/LanNguyenSi/agent-memory):
-  memory surfaces the control plane inventories.
-- [`agent-tasks`](https://github.com/LanNguyenSi/agent-tasks): the
-  MCP-registered task platform whose registration + health appear in
-  `harness describe`.
-- [`agent-preflight`](https://github.com/LanNguyenSi/agent-preflight):
-  local preflight validator; the canonical implementation of
-  preflight-hook content harness wires.
-- [`codebase-oracle`](https://github.com/LanNguyenSi/codebase-oracle):
-  an opt-in MCP surface for multi-repo RAG search. Not in the Full
-  default; operators wire it via `harness add mcp codebase-oracle
-  --command codebase-oracle,mcp`.
-- [`agent-dx`](https://github.com/LanNguyenSi/agent-dx): ships
-  `git-batch-cli`, a day-to-day tool whose inventory appears in
-  `harness describe`.
+- [`agent-grounding`](https://github.com/LanNguyenSi/agent-grounding): evidence-ledger, claim-gate, review-claim-gate; `grounding-mcp` is the canonical client surface harness queries through `queryLedgerByTag`.
+- [`agent-memory`](https://github.com/LanNguyenSi/agent-memory): the memory surfaces the control plane inventories.
+- [`agent-tasks`](https://github.com/LanNguyenSi/agent-tasks): MCP-registered task platform whose registration + health appear in `harness describe`.
+- [`agent-preflight`](https://github.com/LanNguyenSi/agent-preflight): local preflight validator; the canonical implementation of preflight-hook content harness wires.
+- [`codebase-oracle`](https://github.com/LanNguyenSi/codebase-oracle): opt-in MCP for multi-repo RAG search; not in Full, wire via `harness add mcp codebase-oracle --command codebase-oracle,mcp`.
+- [`agent-dx`](https://github.com/LanNguyenSi/agent-dx): ships `git-batch-cli`, a day-to-day tool whose inventory appears in `harness describe`.
 ## License

package/dist/cli/approve/understanding.js CHANGED

@@ -60,8 +60,18 @@ async function writeLedgerTag(manifest, sessionId, content, opts) {
  * from a silent dead end to a "hook fired but parse failed because X"
  * pointer. Best-effort: any I/O error is swallowed and we report no
  * parse-error, mirroring the listPersistedReports contract.
+ *
+ * `sessionId` filter (agent-tasks/b13205b2): each parse-error log's JSON
+ * header carries the `sessionId` of the session that produced it. The
+ * lookup used to return the directory-newest log regardless of whose
+ * session wrote it, so a stale parse-error from a previous session would
+ * surface in the current operator's approve output and read like a
+ * failure of THEIR session. Logs whose header sessionId does not match
+ * `sessionId` are now skipped entirely. Logs missing a `sessionId` field
+ * (or whose header is not JSON) are also skipped, since we cannot
+ * attribute them.
  */
-function findLatestParseError(dir) {
+function findLatestParseError(dir, sessionId) {
     let names;
     try {
         names = fs.readdirSync(dir);
@@ -69,7 +79,7 @@ function findLatestParseError(dir) {
     catch {
         return null;
     }
-    let newest = null;
+    const candidates = [];
     for (const name of names) {
         if (!name.endsWith(".log"))
             continue;
@@ -83,42 +93,48 @@ function findLatestParseError(dir) {
         }
         if (!stat.isFile())
             continue;
-        if (!newest || stat.mtimeMs > newest.mtimeMs) {
-            newest = { filePath: full, mtimeMs: stat.mtimeMs };
-        }
-    }
-    if (!newest)
-        return null;
-    let raw;
-    try {
-        raw = fs.readFileSync(newest.filePath, "utf8");
-    }
-    catch {
-        return { filePath: newest.filePath, summary: "<unreadable>" };
+        candidates.push({ filePath: full, mtimeMs: stat.mtimeMs });
     }
-    // The standalone package writes a JSON header followed by `--- raw ---`
-    // and the original assistant text. Read the header for a `message`,
-    // `reason`, or `missing` field; fall back to the first line if the
-    // schema is unfamiliar so a future format change still surfaces
-    // *something* rather than going silent.
-    const header = raw.split("\n--- raw ---")[0] ?? raw;
-    let summary = (header.split("\n")[0] ?? "").trim();
-    try {
-        const parsed = JSON.parse(header);
-        if (typeof parsed["message"] === "string" && parsed["message"].length > 0) {
-            summary = parsed["message"];
+    candidates.sort((a, b) => b.mtimeMs - a.mtimeMs);
+    for (const cand of candidates) {
+        let raw;
+        try {
+            raw = fs.readFileSync(cand.filePath, "utf8");
+        }
+        catch {
+            continue;
+        }
+        // The standalone package writes a JSON header followed by `--- raw ---`
+        // and the original assistant text. Read the header for a `message`,
+        // `reason`, or `missing` field; fall back to the first line if the
+        // schema is unfamiliar so a future format change still surfaces
+        // *something* rather than going silent.
+        const header = raw.split("\n--- raw ---")[0] ?? raw;
+        let summary = (header.split("\n")[0] ?? "").trim();
+        let headerSessionId = null;
+        try {
+            const parsed = JSON.parse(header);
+            if (typeof parsed["sessionId"] === "string") {
+                headerSessionId = parsed["sessionId"];
+            }
+            if (typeof parsed["message"] === "string" && parsed["message"].length > 0) {
+                summary = parsed["message"];
+            }
+            else if (typeof parsed["reason"] === "string") {
+                const missing = Array.isArray(parsed["missing"])
+                    ? ` (missing: ${parsed["missing"].filter((m) => typeof m === "string").join(", ")})`
+                    : "";
+                summary = `${parsed["reason"]}${missing}`;
+            }
         }
-        else if (typeof parsed["reason"] === "string") {
-            const missing = Array.isArray(parsed["missing"])
-                ? ` (missing: ${parsed["missing"].filter((m) => typeof m === "string").join(", ")})`
-                : "";
-            summary = `${parsed["reason"]}${missing}`;
+        catch {
+            /* keep the first-line fallback; headerSessionId stays null */
         }
+        if (headerSessionId !== sessionId)
+            continue;
+        return { filePath: cand.filePath, summary };
     }
-    catch {
-        /* keep the first-line fallback */
-    }
-    return { filePath: newest.filePath, summary };
+    return null;
 }
 function rewriteReportApproved(filePath, approvedAt, approvedBy) {
     const raw = fs.readFileSync(filePath, "utf8");
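
Note on the hunk above: the session-scoped lookup can be sketched without touching the filesystem. The snippet below is a minimal illustration, not the package's code. `pickParseErrorForSession` and the sample `logs` array are hypothetical names introduced here, and the log contents are invented. It mirrors the ordering and skip rules shown in the diff (newest first, skip on `sessionId` mismatch, skip logs that are not JSON or lack a `sessionId`), but leaves out the `reason`/`missing` branch for brevity.

```javascript
// Hypothetical in-memory variant of the session-scoped lookup. The real
// function reads `.log` files from a directory; here each log is passed in
// as { filePath, mtimeMs, raw } so the filtering rule can be run in isolation.
function pickParseErrorForSession(logs, sessionId) {
    // Newest first, so the first match is the most recent log for this session.
    const candidates = [...logs].sort((a, b) => b.mtimeMs - a.mtimeMs);
    for (const cand of candidates) {
        // JSON header, then `--- raw ---`, then the original assistant text.
        const header = cand.raw.split("\n--- raw ---")[0] ?? cand.raw;
        let summary = (header.split("\n")[0] ?? "").trim();
        let headerSessionId = null;
        try {
            const parsed = JSON.parse(header);
            if (typeof parsed["sessionId"] === "string") {
                headerSessionId = parsed["sessionId"];
            }
            if (typeof parsed["message"] === "string" && parsed["message"].length > 0) {
                summary = parsed["message"];
            }
        }
        catch {
            /* header is not JSON: headerSessionId stays null, so the log is skipped */
        }
        // Logs from other sessions, and logs we cannot attribute, are skipped.
        if (headerSessionId !== sessionId)
            continue;
        return { filePath: cand.filePath, summary };
    }
    return null;
}

// Invented sample data: the newest log belongs to a different session.
const logs = [
    {
        filePath: "a.log",
        mtimeMs: 300,
        raw: JSON.stringify({ sessionId: "old-session", message: "stale failure" }) + "\n--- raw ---\ntext",
    },
    {
        filePath: "b.log",
        mtimeMs: 200,
        raw: JSON.stringify({ sessionId: "current", message: "report truncated" }) + "\n--- raw ---\ntext",
    },
    { filePath: "c.log", mtimeMs: 100, raw: "not json\n--- raw ---\ntext" },
];

console.log(pickParseErrorForSession(logs, "current"));
console.log(pickParseErrorForSession(logs, "unknown"));
```

With this sample data, the lookup for `"current"` passes over the newer `a.log` (written by another session) and returns the `b.log` entry. The lookup for `"unknown"` returns `null`, because `c.log` has no JSON header and therefore cannot be attributed to any session.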
@@ -247,7 +263,7 @@ export async function approveUnderstanding(opts = {}) {
         // fired but the parser rejected the report — here is why", rather
         // than a silent dead end.
         const parseErrorsDir = path.join(path.dirname(reportsDir), "parse-errors");
-        const latestParseError = findLatestParseError(parseErrorsDir);
+        const latestParseError = findLatestParseError(parseErrorsDir, sessionId);
         let reason;
         if (reports.length === 0) {
             reason = `no reports found at ${reportsDir}`;