@curdx/flow 7.1.6 → 7.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,100 @@
2
2
 
3
3
  All notable changes to `@curdx/flow` are documented here. Format follows [Keep a Changelog](https://keepachangelog.com/) and the project follows [Semantic Versioning](https://semver.org/).
4
4
 
5
+ ## 7.1.8 — 2026-05-07
6
+
7
+ ### Fixed
8
+
9
+ - **stop-watcher: completed specs no longer trigger missing-verification-block error (PR #9).** A spec finalized per NFR-5a (v7.1.0 retain-on-completion) repeatedly emitted `Phase 'execution' has no verification block. Run: /curdx-flow:execution` whenever the user stopped a session and the transcript happened to contain `ALL_TASKS_COMPLETE`. Root cause: `handleCompletion()` in `src/hooks/stop-watcher.ts` ran the iron-law verification gate without checking `state.completed === true` first; the mainline gate at L826 had the check, but handleCompletion exited before reaching it. Fix adds the same short-circuit immediately after the stateMalformed check inside handleCompletion. Mirrors the L826 guard intent — the two paths can't share a single check because handleCompletion returns before the mainline gate runs. Bundle (`plugins/curdx-flow/hooks/scripts/stop-watcher.mjs`) regenerated.
10
+ - **tests: cross-platform fixture transcripts via runHook helper (PR #8).** Windows/Node22 CI was red since the OB-3 epic merged: 5/13 stop-watcher POC-gate tests failed because static fixtures (e.g. `all-complete.json`) hardcode `transcript_path: "/tmp/curdx-fixture-transcripts/complete.txt"`, which doesn't reliably resolve on Windows. The `runHook` helper (`tests/hooks/_helpers.ts`) was finished half-way during v7.0.0-beta.1's Bug 2 fix — it rewrote the fixture's `cwd` field to the runtime tmpdir but never gave `transcript_path` the same treatment. This release extends the cwd-rewrite block with a prefix substitution: any `transcript_path` starting with `/tmp/` is replaced with `os.tmpdir()` so the hook reads from the path the test's `beforeEach` actually provisions. No fixture JSON files needed editing. Companion change: `tests/hooks/stop-watcher.test.ts` `beforeEach` now writes to `os.tmpdir()/curdx-fixture-transcripts/...` via `FIXTURE_TRANSCRIPT_DIR`/`FIXTURE_TRANSCRIPT_FILE` constants (6 hardcoded `/tmp/` occurrences replaced).
11
+
12
+ ### Added — quality / CI hardening (no user-facing CLI changes)
13
+
14
+ - **`tests/hooks/lib/verify-blocks.test.ts` — 13 isolated unit tests for the iron-law gate (PR #11).** verify-blocks is the single source of truth referenced from 4 distinct callers (Stop hook, TaskCompleted hook, npm verify, `curdx-flow check` CLI). Until now its behavior was only locked down via the Stop hook e2e suite — coarse proxy that exercised happy + missing + stale via spawnSync but couldn't isolate `walkSrcTree`'s prune list, depth cap, or the typed-coercion in `getVerificationPhase`. New direct tests cover: VERIFICATION_PHASES order, getVerificationPhase typed coercion + null-on-legacy-"unknown", verifyPhaseBlock 4 design-mandated branches (missing / non-zero-with-failedReason / non-zero-default / stale-evidence / happy), and walkSrcTree's empty-dir-zero / max-mtime-aggregation / WALK_SKIP_DIRS pruning / FR-8 fail-open on non-existent dir. Imports the TS source directly (`../../../src/hooks/lib/verify-blocks.js`) rather than spawning the bundle — verify-blocks is a non-CLI lib per the source comment at L17-21; consistent with `tests/hooks/subagent-context-injector.test.ts:6`.
15
+ - **`tests/runner/manifest-integrity.test.ts` — frontmatter + reference-link guard for 31 plugin .md files (PR #12).** 15 commands + 10 agents + 6 skills had **zero automated check** before this release — a typo'd YAML key, renamed reference, or missing description could ship to users with no CI guard. New test validates: discovery sanity (lower bounds catch mass-deletion regressions), commands have non-empty `description`, agents/skills have both `name` + `description`, agent `name` matches filename (Task tool dispatch key), skill `name` matches parent directory (Skill tool dispatch key), every `references/<foo>.md` mention resolves — commands/agents look in global `plugins/curdx-flow/references/`, skills look in their LOCAL `skills/<name>/references/` subdir (the test differentiates the two scopes). Regex parsing only — no `gray-matter` dep, same convention as `claudeMd.test.ts` and `iron-law-doc.test.ts`.
16
+ - **`test:runner` npm script + wired into `verify` chain (PR #12).** `tests/runner/` (claudeMd, two-stage-review, iron-law-doc, e2e-verification-flow, etc. — 9 files / 73 tests including the new manifest-integrity) was orphaned from the npm script chain: it ran locally on demand but never as part of `npm run verify` and never in CI. New `test:runner` script runs `vitest run tests/runner`. `verify` chain now: `typecheck && check-versions && check:hooks-fresh && build && check:bundle && test:hooks && test:analyze && test:runner && check-verification-blocks`.
17
+ - **CI: bundle-size guard (`check:bundle`) runs on every PR (PR #10).** Validates `dist/index.mjs` against the 84 KB NFR-3 threshold (currently 72.94 KB ≤ 84 KB). Previously buried in `prepublishOnly` — too late to catch drift before merge. The test-matrix job also now runs `npm run build` (tsup) before `check:bundle` so `dist/index.mjs` exists when the guard fires.
18
+
19
+ ### Notes
20
+
21
+ - **Plugin distribution constraint (re-affirmed):** `marketplace.json` declares `source: "./plugins/curdx-flow"`, so end users get the plugin via git clone (npm tarball ships only `dist/` per `package.json` `files`). Build artifacts under `plugins/curdx-flow/hooks/scripts/**/*.mjs` and `*.mjs.map` therefore **must remain git-tracked** — gitignoring them would break end-user installs. PR #9 includes the rebuilt `stop-watcher.mjs` bundle alongside the TS source change for this reason.
22
+ - **Follow-ups deferred (not P0):**
23
+ - **`test:analyze` in CI**: vitest exits with code 2 in GitHub Actions ~300ms after all 106 analyze tests pass; reproducible only in CI, not local macOS. Suspected fork-pool teardown interaction with the preceding `test:hooks` vitest run. Filed for separate investigation; PR #10 added only `check:bundle` to the matrix.
24
+ - **`test:runner` step in `ci.yml`**: script ships in this release, but the matching `- run: npm run test:runner` step is a follow-up commit (avoiding scope creep on the v7.1.8 release wave).
25
+
26
+ ## 7.1.7 — 2026-05-07
27
+
28
+ ### Added — observability-v2 epic (OB-1 + OB-2 error-logger + OB-3, PR #7)
29
+
30
+ #### OB-3 — cost / time / token analytics (spec-cost-time-token-analytics)
31
+ - **`npx curdx-flow analyze --cost-summary` flag suite.** New citty flag turns on cost / time / token aggregation across the spec funnel. Companion flags `--by-spec`, `--by-phase`, `--by-task`, `--top <N>`, and the existing `--since <duration>` compose freely. Off by default — the v7.1.7 baseline `analyze` 7-section report is preserved byte-equal when `--cost-summary` is absent (NFR-6).
32
+ - **`src/analyze/pricing.ts` — hardcoded 3-model pricing table + `LAST_UPDATED` constant.** Zero npm runtime deps. Per-1M-token USD for Opus 4.7 / Sonnet 4.6 / Haiku 4.5 across 5 fields each (input / output / cache_read / cache_5m_write / cache_1h_write). `MODEL_ALIASES` maps `claude-haiku-4-5 → claude-haiku-4-5-20251001` defensively. README documents 3-step refresh workflow (WebFetch official → diff `PRICING` → bump `LAST_UPDATED` + CHANGELOG entry).
33
+ - **`src/analyze/cost.ts` — `computeCost` + `extractUsageRowsFromEvents` + `extractTrailerUsage` + `aggregateBy(level, ctx)`.** Cost formula `(input·base + 5m·1.25·base + 1h·2·base + read·0.1·base + output·out)/1e6` rounded to 4 decimals. Trailer regex `/<usage>[\s\S]*?total_tokens:\s*(\d+)[\s\S]*?tool_uses:\s*(\d+)[\s\S]*?duration_ms:\s*(\d+)[\s\S]*?<\/usage>/g` parses subagent text-embedded usage (verified empirically, 681 occurrences across 702 transcripts). Three-level aggregation (`spec` / `phase` / `task`) joins via OB-2 correlationId `<sid>:<task>:<iter>`; phase resolution reads `specs/<name>/.curdx-state.json`.
34
+ - **`src/analyze/recommend.ts` — 8-rule engine + MAD outlier detection.** Pure-function rules: cache hit-rate, output tokens per turn (split-task signal), hit-cap rate, Opus mix in non-critical phases, MAD-based cost spike, wall-clock p95, cache_creation/read ratio (Anthropic anti-pattern #1), retry/loop count. Modified z-score (Iglewicz & Hoaglin 1993) with `MIN_N=10`, threshold `|z| > 3.5`, `0.6745` scale constant. Fourth severity `insufficient_data` covers small-sample / MAD=0 cases. `REC_THRESHOLDS` const centralizes all 16 numeric thresholds for future tuning.
35
+ - **`tests/analyze/fixtures/sample-with-usage.jsonl` (NEW, 7 rows).** Synthetic transcript carrying realistic `assistant.message.usage` nested cache_creation blocks across 3 models, 1 sidechain row, 1 subagent `<usage>` trailer in a tool_result text field, and 1 legacy-schema row (no nested cache_creation) for FR-PARSER-3 backwards-compat.
36
+ - **Cost Breakdown report (R1-R7 seven tables).** New markdown section: R1 per-spec / R2 per-phase / R3 per-task (with trailerCount column) / R4 cache hit / R5 wall-clock / R6 model mix (with Opus 4.7 +35% tokenizer footnote) / R7 top-N hot tasks. Each table independently togglable via `--by-spec` / `--by-phase` / `--by-task` filters.
37
+ - **Recommendations section + JSON top-level array.** Markdown `## Recommendations` block with 4-color severity prefixes (`[SEV]` red / `[WARN]` yellow / `[INFO]` blue / `[N/A]` gray), `NO_COLOR` env honored. JSON output gains top-level `recommendations: Recommendation[]` array sibling to `costBreakdown`. Top-level `totalCost.usd` mirror preserved (jq Validation Hint compatibility).
38
+ - **77 new analyze tests (pricing 11 + cost 15 + recommend 39 incl. 11 MAD edge cases + integration 12 incl. trailer attribution + requestId dedup regression).** Total analyze suite went 36 → 106; integration count 3 → 7.
39
+
40
+ #### OB-2 — error-logger.ts upgrade (spec-decision-event-logging, partial)
41
+ - **`logHookEvent` + 4-field schema (`level` / `kind` / `payload` / `correlationId`).** Extends `src/hooks/_shared/error-logger.ts` with NEVER-throw event-logging API. Legacy `logHookError` preserved as a thin redirect (`logHookEvent({ ...ctx, level: 'error', kind: ctx.kind ?? 'unknown' })`) so existing call sites stay byte-equal. `EventKind` closed type union (10 kinds) + `coerceKind` coercer maps anything outside the closed set to `'unknown'`. `EventLevel` is `'error' | 'info' | 'metric' | 'decision'`.
42
+ - **Log rotation (size 10 MB OR age 30 d → `errors.<ISO-ts>-<pid>.jsonl`, retain 5).** New `shouldRotate(filePath)` dual-gate. `pruneRotatedFiles(dir, keep=5)` globs and unlinks oldest beyond retention. Throttled: `rotateIfNeeded` only does the `statSync` check every 10th `logHookEvent` call to keep p99 hot-path budget intact.
43
+ - **Cross-platform `safeRename(src, dst)` 4-step fallback chain.** POSIX atomic `renameSync` happy path, Windows `EBUSY` retry chain (50/200/500 ms), `EXDEV` cross-device copy+unlink, silent give-up persistent failure (NEVER-throw contract).
44
+ - **7 new tests in `tests/hooks/event-logger.test.ts`.** logHookEvent 4-field write, `coerceKind` unknown→'unknown', rotation triggers (size + age), `buildCorrelationId` 3-segment format, old single-field row round-trip with `??` defaults, NEVER-throws on disk-full mock.
45
+ - **Note**: hook-side `logHookEvent` call-site integration (across `stop-watcher` / `task-completed-verifier` / `subagent-context-injector` / `stop-failure-handler`) deferred to a follow-up spec — those hook files were independently rewritten by the v7.1.7 verification-iron-law epic and need re-integration on the new hook structure.
46
+
47
+ #### OB-1 — analyze reads real transcripts (spec-analyze-real-transcript)
48
+ - **🚨 CRITICAL FIX (B1) — `npx curdx-flow analyze` now reads `~/.claude/projects/<encoded-cwd>/*.jsonl` instead of the bundled fixture.** Prior versions (v7.1.6 and earlier) had `POC_FIXTURE_REL = "tests/analyze/fixtures/sample.jsonl"` hardcoded in `src/analyze/index.ts`; 5 separate code paths keyed off it, so the CLI silently emitted *test fixture data* dressed as user analytics for every invocation. Fix replaces all 5 sites with `resolveTranscriptSource({ cwd, fixtureOverride, sessionFilter })`. Major user-impact bug — every prior `analyze` output should be considered invalid.
49
+ - **`--session <uuid>` CLI flag.** New citty arg in `src/flows/analyze.ts` filters the per-session multi-glob to a single transcript under `~/.claude/projects/<encoded-cwd>/`.
50
+ - **`src/analyze/transcript-path.ts` resolver module.** Exports `resolveTranscriptSource(opts?)`, `TranscriptNotFoundError`, `TranscriptSource` union type. Encodes `realpath(cwd)` (cached) → `~/.claude/projects/<encoded>` layout. Honors `CURDX_TRANSCRIPT_FIXTURE` test-only env var for snapshot test isolation.
51
+ - **`cleanupOrphanState` helper in `src/analyze/index.ts`.** Two-pass GC over `state.files`: pass 1 drops entries with `lastModifiedMs > 30 days ago` OR `!existsSync(path)`; pass 2 caps total at 100 entries. Active session paths protected. Fail-open per FR-C3.
52
+ - **8 new tests in `tests/analyze/transcript-path.test.ts`.** cwd `/`→`-` encoding, multi-session glob, missing project dir → `TranscriptNotFoundError`, `fixtureOverride` short-circuit, `--session` filter, 30-day GC, 100-entry cap, file-gone GC.
53
+
54
+ ### Added — verification iron law + parallel dispatch + cost-runaway guards (PR #5)
55
+
56
+ - **Layer-2 opt-in `TaskCompleted` hook (`plugins/curdx-flow/hooks/scripts/task-completed.mjs`).** Fires when a task is marked complete and inspects the active spec's `verificationBlocks` state field; if any block is unresolved, the hook emits a non-blocking warning to remind the executor that the next phase requires real-environment verification proof. Wired into `plugins/curdx-flow/hooks/hooks.json` as opt-in (Layer-2) — disabled by default to keep the baseline path zero-friction; users opt in by enabling the hook entry. Pairs with the `verifyPhaseBlock` gate in `stop-watcher.mjs` (see *Changed*) so blocking enforcement and surfaced reminders share the same state field.
57
+ - **`verificationBlocks` state schema field on `.curdx-state.json` (`plugins/curdx-flow/schemas/spec.schema.json`).** Optional array of `{ phase: string, requiredArtifact: string, resolved: boolean }` records. Phase commands write a block when they finish a deliverable that the next phase must verify against the real environment (e.g. design phase produces an architectural claim that tasks-phase implementation must reproduce). The block stays unresolved until a `[VERIFY]` task or VE proof clears it. Legacy state files without the field continue to work — `verificationBlocks === undefined` is treated as "no outstanding blocks", preserving backwards compatibility with v7.0.x and earlier 7.1.x state files.
58
+ - **`plugins/curdx-flow/references/iron-law-verification.md` reference doc.** Canonical write-up of the iron-law verification protocol: when a verification block is required, what counts as a "real-environment proof" (API response capture, log line, DB row, screenshot — *not* `tests pass` or `code compiles`), how `[VERIFY]` tasks are inserted, and the failure mode this protects against (LLM-style "looks done" optimism). Cross-referenced from the new skill (below) and from `commands/design.md` / `commands/tasks.md` so phase agents pick up the rule without re-deriving it.
59
+ - **`plugins/curdx-flow/skills/verification-before-completion.md` plugin skill.** Triggers on prompts like "verify completion", "check verification blocks", "iron law", "before marking done". Loads the iron-law reference and walks the agent through inspecting `verificationBlocks` on the active spec, collecting evidence per block, and either resolving or escalating each unresolved record. Intended as the entry point for both `[VERIFY]` task agents and human-driven `/curdx-flow:status` audits.
60
+ - **`scripts/check-verification-blocks.mjs` release gate.** New verify-chain step (wired into `package.json` `npm run verify`) that walks every `.curdx/specs/*/.curdx-state.json` and exits non-zero if any spec has unresolved `verificationBlocks` while `completed === true`. This makes "claimed-done but not verified" a hard CI failure rather than a silent drift, mirroring the `check-versions.mjs` pattern. Skipped automatically when no specs exist.
61
+ - **`npx curdx-flow check` CLI subcommand.** New citty subcommand (`src/flows/check.ts`, registered in `src/index.ts`) that runs the same verification-block walk as the release gate, but locally and outside CI. Useful for pre-commit verification or for spec authors auditing their own state files. Output mirrors the existing `analyze` subcommand's tone: green PASS line per spec, red row + JSON diff for each unresolved block. Exits non-zero on any unresolved block to fit shell pipelines.
62
+ - **`plugins/curdx-flow/references/bounded-parallel-dispatch.md` reference doc.** Extends the previous `references/parallel-research.md` from a research-only playbook into a generalized "fan out by independent domain, coordinator is single source of truth" playbook covering the **research / review / debug** domains. New sections: `## Domain Coverage` (per-domain row table), `## Independence Criteria` (a 3-item pre-flight checklist that *all* must pass before fan-out — Independent Input / Independent Output / Independent Context), `## Per-Domain Anti-patterns` (≥10 numbered items: 3 research + 5 review + 5 debug, each shaped as "1-sentence statement; Coordinator: do this instead"), and `## Subagent vs Grep` (verbatim Anthropic best-practices citation: "predilection for subagents"). The pre-existing 5-step dispatch pattern, topic identification, merging, and complexity-scaling sections are preserved verbatim per the spec's drift-test invariant.
63
+ - **Drift detection test (`tests/runner/bounded-parallel-dispatch-doc.test.ts`).** Mirrors the `iron-law-doc.test.ts` shape (vitest + `node:fs`/`path`, no extra deps). Eight assertions: new doc exists, old stub is ≤3 lines and contains "Moved to ... bounded-parallel-dispatch.md", §4 has ≥10 numbered anti-patterns total, ≥3 per domain (research/review/debug — catches single-bucket starvation that would silently neuter spec B's review-domain contract), all 3 independence criterion strings present, zero `parallel-research.md` literal matches across the 6 inbound `commands/*.md` files (excluding the stub), the "predilection for subagents" string is preserved, and the 5-step dispatch pattern heading survives the rename verbatim. Test runs in <5 ms.
64
+ - **`code-quality-reviewer` agent (3-layer drift defense, 30 rubrics across 6 categories).** New review-domain subagent that audits implementation drift against design-phase claims along three layers (literal-string match, semantic equivalence, behavioral parity) and scores work against 30 rubrics organized into 6 categories (correctness, security, performance, maintainability, observability, test quality). Emits the SLSA verdict shape (`{ pass: bool, severity, rubric_id, evidence }`) so coordinators can merge multi-reviewer output deterministically. Pairs with `spec-reviewer` (narrowed — see *Changed*) so spec-compliance and code-quality concerns no longer overlap.
65
+ - **`plugins/curdx-flow/references/two-stage-review.md` reference doc — domain boundary + SLSA verdict shape.** Canonical write-up of the two-stage review protocol: which review domain owns which concern (spec-compliance = `spec-reviewer`, code-quality = `code-quality-reviewer`), the verdict object shape every reviewer must emit (SLSA-style), and the merge rule the coordinator applies when both stages report. Cross-referenced from both reviewer agents and from `commands/design.md` / `commands/tasks.md` so the boundary is locked at every entry point.
66
+ - **Parallel dispatch at design/tasks phase boundaries (consumes `bounded-parallel-dispatch.md`).** The design and tasks phase commands now fan out spec-compliance and code-quality reviews concurrently using the bounded-parallel-dispatch playbook (Independent Input / Independent Output / Independent Context). Coordinator merges the two SLSA verdicts into a single `REVIEW_PASS` / `REVIEW_FAIL` final-line protocol verdict — wall-clock for the review step is now bounded by max(stage1, stage2) instead of stage1 + stage2.
67
+ - **`SubagentStart` hook (`subagent-context-injector`) — preempts superpowers issue #237 by re-injecting spec context + iron-law summary into every subagent dispatch.** New plugin hook (`plugins/curdx-flow/hooks/scripts/subagent-context-injector.mjs`, bundled from `src/hooks/subagent-context-injector.ts`) fires on every Claude Code `SubagentStart` event and emits a compact `<curdx-spec-context>` block (~120 B; budget ceiling 2 KB) carrying `phase`, `spec`, and `iron-law: <IRON_LAW_SUMMARY>` into the dispatched subagent's `additionalContext`. This sidesteps the upstream superpowers gap (issue #237 was closed wontfix) where subagents lose the parent session's spec framing and iron-law guard the moment they spin up. Wired into `plugins/curdx-flow/hooks/hooks.json` as a baseline (always-on) hook — no opt-in env var required because `SubagentStart` is GA in current Claude Code (unlike the `TaskCompleted` event added in 7.1.7's Layer-2 hook, which still requires an opt-in flag upstream). Fail-open across 5 paths (state-absent / malformed JSON / completed-spec / payload-over-budget / unexpected throw): every failure path emits `{continue:true}` with no `hookSpecificOutput`, never blocks subagent dispatch.
68
+ - **Shared `src/hooks/lib/build-context-payload.ts` — single source of truth for SessionStart + SubagentStart context construction (DRY).** New pure-function library exporting `IRON_LAW_SUMMARY` constant (`"No completion claim without fresh verification."`), `BuildContextPayloadOpts` interface (`forSubagent?: boolean; maxBytes?: number`), `PayloadOverBudgetError` class, and `buildContextPayload(state, specDir, opts?)`. The function returns the existing SessionStart JSON shape (specName, phase, taskIndex, totalTasks, goal, awaitingApproval) when `forSubagent !== true`, and a key:value `<curdx-spec-context>…</curdx-spec-context>` text block when `forSubagent === true`. Throws `PayloadOverBudgetError` when the rendered output exceeds `opts.maxBytes ?? 2048` (NFR-1 budget enforcement). Both hooks now consume this lib instead of inlining their own payload construction logic.
69
+ - **7 unit tests + drift gate test + byte-equal SubagentStart baseline.** `tests/runner/subagent-context-injector.test.ts` covers cases (a)-(g) — happy path, state-absent, malformed-state, NFR-1 size budget (`additionalContext` ≤ 200 B; full output ≤ 2048 B), `IRON_LAW_SUMMARY` substring presence, `state.completed === true` short-circuit, and quickMode universal-injection (D2). `tests/runner/subagent-context-doc.test.ts` is a single-assertion drift gate that imports `IRON_LAW_SUMMARY` from the shared lib and asserts the constant appears verbatim in `plugins/curdx-flow/references/iron-law-verification.md` — failing CI if either side drifts. `tests/runner/byte-equal.test.ts` gains a frozen `SUBAGENT_START_BASELINE` constant capturing exact `{"hookSpecificOutput":{...},"continue":true}` stdout + `EXIT_CODE=0` for the SubagentStart hook (no v6.0.6 baseline existed since the hook is new — the v7.1.7 output is the new floor). Hook test count moved from 91 → 99 (+8: 7 unit + 1 byte-equal regression).
70
+ - **`StopFailure` hook (`stop-failure-handler`) — observability-only handler with 8-matcher map for autonomous-loop failure classification.** New plugin hook (`plugins/curdx-flow/hooks/scripts/stop-failure-handler.mjs`, bundled from `src/hooks/stop-failure-handler.ts`) fires on every Claude Code `StopFailure` event and writes a structured `~/.claude/curdx-flow/stop-failures.jsonl` line (`ts`, `matcher`, `session_id`, `cwd`, `spec`, `phase`) classifying the failure cause into one of 8 known matchers: `rate_limit`, `max_output_tokens`, `api_died`, `network_error`, `auth_error`, `quota_exceeded`, `tool_error`, plus `unknown` fallback for forward-compat with future Anthropic-side matcher additions. **Observability-only by upstream contract** — the hook's stdout/exit-code is fully ignored by Claude Code on `StopFailure` events (per Anthropic docs), so the hook never attempts to block, retry, or alter the loop; it exists purely so `npx curdx-flow analyze` and post-hoc audits can attribute autonomous-loop terminations. Wired into `plugins/curdx-flow/hooks/hooks.json` as a baseline (always-on) hook. Fail-open across every error path: malformed JSON / unwritable log dir / unknown matcher / unexpected throw all emit `{continue:true}` with no side effect on the parent loop.
71
+ - **`plugins/curdx-flow/references/cache-ttl-and-cost.md` reference doc (4 sections + GH#46829 cite).** Canonical write-up of the prompt-cache cost calculus that governs autonomous-loop sleep cadence and iteration ceilings: (1) **TTL Window** — Anthropic's 5-minute prompt-cache TTL, why sleeping ≥ 300 s evicts the cache and forces a full uncached read of conversation context on the next wake-up; (2) **Cost Multiplier** — empirical 5-10× cost penalty per cache miss documented in upstream Anthropic discussion `GH#46829` plus the 17.1% over-billed real-money figure measured by spec E's research phase; (3) **Cadence Heuristics** — pick `delaySeconds` under 270 s to stay in cache, or commit to ≥ 1200 s and amortize the single miss; explicitly avoid 300-600 s "worst-of-both" zone; (4) **Iteration Ceiling Math** — at 30-iter cap × ~2 min nominal turn = ~1 hr unattended ceiling × ~$0.15 nominal turn cost = ~$4.50 nominal blast radius. Cross-referenced from `commands/implement.md`, the loop skill, and the new `--max-global-iterations` CLI flag's help text. Drift test (`tests/runner/cache-ttl-doc.test.ts`) locks the four section headings + the `GH#46829` literal cite + the `5-10×` and `17.1%` figures so a silent doc-edit cannot quietly de-fang the cost-multiplier warning.
72
+ - **`--max-global-iterations` CLI flag (mirrors `--max-task-iterations`).** New citty flag on `npx curdx-flow implement` (`src/flows/implement.ts`) that overrides `state.maxGlobalIterations` for the current invocation. Mirrors the existing `--max-task-iterations` flag's contract: integer ≥ 1, written into `.curdx-state.json` on next state write, takes precedence over schema defaults but loses to a value already present in legacy state files (so users who set `maxGlobalIterations: 100` in v7.1.6 keep their value). Lets users explicitly opt back into the old 100-iter ceiling via `--max-global-iterations 100` after the default tightening (see *Changed*). i18n: 2 new `implement.flags.maxGlobalIterations.*` keys in `src/i18n/{en,zh}.ts`.
73
+ - **5 unit tests + 3 max-iter enforcement tests + drift test + CLI flag tests + byte-equal baseline.** `tests/runner/stop-failure-handler.test.ts` covers the 8-matcher map (one assertion per matcher → log-line shape match) and 5 fail-open cases (no-state, malformed-state, unwritable-log-dir, unknown-matcher, unexpected-throw). `tests/runner/max-iterations-enforcement.test.ts` covers (a) coordinator hard-blocks when `currentIteration >= maxGlobalIterations`, (b) execution-loop hard-blocks when `taskIteration >= maxTaskIterations`, (c) error message contains `current/cap/3-step remediation`. `tests/runner/cache-ttl-doc.test.ts` is the drift gate (5 assertions: 4 section headings + GH#46829 cite). `tests/runner/implement-cli-flags.test.ts` covers `--max-global-iterations` parse + state-write + legacy-state precedence. `tests/runner/byte-equal.test.ts` gains `STOP_FAILURE_BASELINE` frozen constant for the new hook's `{continue:true}` no-op output (new floor — no v6.0.6 baseline existed). Hook test count moved 99 → 113 (+14: 8 matcher + 5 fail-open + 3 max-iter + 1 drift + 1 CLI flag + 1 byte-equal regression… some fold into existing files, net +14 assertions).
74
+
75
+ ### Changed
76
+
77
+ - **`stop-watcher.mjs` extended with `verifyPhaseBlock` gate.** The stop hook now consults `verificationBlocks` before allowing a phase transition; if the current phase has an unresolved block, the transition is blocked with a structured error pointing the user at `npx curdx-flow check` and the iron-law reference doc. Existing transition rules (state-completion-marker semantics, schema validation) run unchanged ahead of the new gate, so legacy specs without `verificationBlocks` skip the new check entirely.
78
+ - **`reality-verification` skill renamed to `verification-before-completion`; legacy alias preserved.** The skill's previous name described the *technique* (reproduce failure before/after); the new name describes *when to invoke it* (before marking work complete), which aligns with how the spec-executor and `[VERIFY]` task agents actually trigger it. The original `reality-verification` skill name is retained as a redirecting alias so existing prompts and references continue to resolve.
79
+ - **`npm run verify` chain extended with `check-verification-blocks` step.** The release gate now runs after `test:analyze` and before publish, alongside the existing `check-versions` / `check:hooks-fresh` / `check:bundle` gates. `prepublishOnly` continues to invoke the same chain via `npm run verify`, so local `npm publish` and the CI release workflow both fail fast on unresolved verification blocks.
80
+ - **Renamed `plugins/curdx-flow/references/parallel-research.md` → `plugins/curdx-flow/references/bounded-parallel-dispatch.md`.** The old name described only one domain; the new name describes the underlying technique ("bounded parallel dispatch by independent domain"), which is what the doc actually formalizes once review and debug rules are added. The old path retains a 1-line stub redirect (`> Moved to [bounded-parallel-dispatch.md](./bounded-parallel-dispatch.md). ...`) so the 9 soft `specs/*.md` references continue to resolve without churn — only the 4 hard inbound refs were physically rewritten (see below). Backwards-compat preserved for any external link.
81
+ - **4 inbound references updated to the new path.** `plugins/curdx-flow/commands/research.md` (×2, lines 78 and 103), `plugins/curdx-flow/commands/start.md` (×1, line 193), and `plugins/curdx-flow/references/triage-flow.md` (×1, line 46) now point at `references/bounded-parallel-dispatch.md`. Soft consumers (`commands/requirements.md` / `design.md` / `tasks.md`) were grep-verified to contain zero literal `parallel-research.md` matches and were left untouched. The drift test's `path-consistency-commands` assertion locks this so a future regression cannot silently re-introduce the old path string.
82
+ - **`spec-reviewer` narrowed to spec-compliance only (E1 13-item cut/split/move audit).** The previous `spec-reviewer` covered both spec-conformance and general code-quality concerns, which produced verdict overlap and double-flagging when the same issue tripped both rubric sets. E1 audited the original 13-item rubric and cut/split/moved each item: 5 items stayed (true spec-compliance — schema, phase ordering, state-marker semantics, file scope, commit-message convention), 5 items moved to the new `code-quality-reviewer`, and 3 items were split into both reviewers with non-overlapping evidence requirements. Net: each concern has exactly one owning reviewer, and the merge rule is now deterministic.
83
+ - **`VerificationBlock.reviews` keyed-object field on `.curdx-state.json` (additive, backwards-compatible).** Schema gains an optional `reviews: Record<reviewerId, { verdict, severity, rubric_id, evidence }>` map on each `verificationBlock`. Coordinator writes one entry per dispatched reviewer (`spec-reviewer`, `code-quality-reviewer`); legacy state files without the field continue to parse and execute unchanged because `reviews === undefined` is treated as "no reviews recorded yet". Schema validation only enforces the field's *shape* when present.
84
+ - **REVIEW_PASS / REVIEW_FAIL final-line protocol byte-equal preserved (NFR-1).** The exact final-line verdict tokens emitted by phase commands and reviewer agents (`REVIEW_PASS` / `REVIEW_FAIL`) are byte-equal to v7.1.6 output. Downstream consumers (stop-watcher, state-completion-marker, external scripts grepping for these tokens) require zero changes. The two-stage review protocol changes *who* produces the verdict (now a coordinator merge of two SLSA verdicts) but not *what bytes appear on the final line*.
85
+ - **`SessionStart` hook (`load-spec-context.ts`) refactored to use shared `build-context-payload.ts` lib (surgical, byte-equal preserved).** The previously-inlined payload construction inside `load-spec-context.ts` is now a single `Object.assign(block, JSON.parse(buildContextPayload(state, specPath)))` call shared with the new `SubagentStart` hook (D4 surgical refactor). Source diff was 21 ins / 21 del (well under the 50-LoC ceiling). All 3 function signatures, the `readEnabledSetting` / `readGoalFromProgress` helpers, the stderr banner, and the handler control flow are preserved verbatim — the lib's default-branch fallbacks (`phase ?? "unknown"`, `taskIndex/totalTasks ?? 0`, `awaitingApproval === true`) match the historical inline assignments so byte-equal output holds for any state shape. All 16 v6.0.6 byte-equal regressions in `tests/runner/byte-equal.test.ts` pass unchanged — load-spec-context output is bit-for-bit identical to the pre-refactor binary. Insertion order of keys on `block` is preserved because `Object.assign` keeps existing keys' order and only appends new ones.
86
+ - **🚨 USER-VISIBLE DEFAULT CHANGE — `maxGlobalIterations` default tightened: `100` → `30`.** New autonomous-loop initializations now write `maxGlobalIterations: 30` into `.curdx-state.json` instead of the historical `100`. The new ceiling represents a **~1 hour unattended runtime cap** (30 iter × ~2 min nominal turn) and a **~$4.50 nominal blast radius** (30 iter × ~$0.15 nominal turn cost), both calibrated to the 5-min prompt-cache TTL economics documented in the new `references/cache-ttl-and-cost.md` reference (GH#46829: 5-10× cost multiplier per cache miss; 17.1% real-money over-bill measured upstream). **Backwards-compat preserved**: existing `.curdx-state.json` files that already carry `maxGlobalIterations: 100` keep their value untouched — the new default applies only to fresh `npx curdx-flow new` / `implement` initializations. **Opt-in to legacy ceiling**: pass `--max-global-iterations 100` (see *Added*) to restore the v7.1.6 behavior on a per-invocation basis. Rationale: the prior `100` value was both (a) effectively never enforced (stop-watcher only `stderr`-warned, coordinator did not read the field at all — see next bullet) and (b) a user-perceived cap that did not match the real cost-runaway risk profile. Pairs with the *hard-block* enforcement change below — together they make the cap a real ceiling instead of a polite suggestion.
87
+ - **🚨 BREAKING-ish — `stop-watcher.mjs` now HARD-BLOCKS (was soft-warn) when `maxGlobalIterations` is hit.** Previously the stop hook merely emitted a `stderr` warning when `state.currentIteration >= state.maxGlobalIterations` and let the autonomous loop continue forever. The hook now emits a structured `{continue: false}` decision with an actionable error message of shape:
88
+ ```
89
+ ❌ Iteration cap reached: <current>/<cap> global iterations.
90
+ Remediation:
91
+ 1. Inspect .curdx/specs/<spec>/.curdx-state.json — confirm progress is real, not stuck.
92
+ 2. If work is genuinely incomplete and the cap is too tight, raise it: `npx curdx-flow implement --max-global-iterations <N>`
93
+ 3. If the loop is stuck, run `npx curdx-flow cancel <spec>` and re-investigate root cause before resuming.
94
+ ```
95
+ Same hard-block treatment is now applied at the **coordinator level** (`src/runner/execution-loop.ts`) for `maxTaskIterations` so per-task cost runaway is also bounded. **Migration**: users who previously relied on the soft-warn behavior to "watch and decide later" must now either set a higher cap (`--max-global-iterations 200`, etc.) or accept the new ~1 hr / ~$4.50 ceiling. The combined default change (100 → 30) + hard-block enforcement is the load-bearing fix in this release for the cost-runaway-guards spec — schema field `maxGlobalIterations` was *declared* in v7.0.x but never *enforced* until now.
96
+
97
+ > **Note on event GA status:** the `SubagentStart` hook event is **GA** in current Claude Code — no env var opt-in is required, unlike the `TaskCompleted` event added in this same release for the Layer-2 verification-blocks reminder hook (which is still gated behind an opt-in flag upstream). Both hooks coexist cleanly: SubagentStart fires on every subagent dispatch baseline, TaskCompleted fires only when explicitly enabled.
98
+
5
99
  ## 7.1.6 — 2026-05-06
6
100
 
7
101
  ### Changed
package/README.md CHANGED
@@ -503,3 +503,17 @@ Verified on **macOS** and **Linux**. Windows is **declared supported but not tes
503
503
  ### Configuration
504
504
 
505
505
  Set `errorLogEnabled: false` in `~/.claude/settings.json` to disable hook error logging. The schema map at `plugins/curdx-flow/schemas/transcript-events.json` is auto-resolved post-bundle; if missing, the parser falls back to a builtin minimal whitelist with a stderr warning.
506
+
507
+ ---
508
+
509
+ ## 💰 Pricing Refresh Workflow
510
+
511
+ `npx @curdx/flow analyze --cost-summary` computes USD cost using a hardcoded pricing table at `src/analyze/pricing.ts` (zero npm runtime deps — no pricing-API call, no phone-home). Anthropic publishes occasional price changes, so the table needs a **quarterly self-check** (≤ 90 days since `LAST_UPDATED`).
512
+
513
+ When the self-check fires (or whenever Anthropic announces new pricing), follow the **3-step refresh** below:
514
+
515
+ 1. **WebFetch official pricing.** Fetch <https://www.anthropic.com/pricing> (or the model-specific docs page) via WebFetch. Capture the per-1M-token rates for every model row currently in `pricing.ts` — `input`, `output`, `cache_creation`, `cache_read`. Cross-check against `https://docs.anthropic.com/en/docs/about-claude/pricing` for any model that has a docs page.
516
+ 2. **Diff `PRICING` table.** Open `src/analyze/pricing.ts` and reconcile each row against the WebFetched figures. Add new model rows for any newly-released model the spec funnel might encounter (Opus, Sonnet, Haiku across versions). Remove rows only if Anthropic has formally retired a model — keep historical rows otherwise so retro `analyze` runs on old transcripts still produce correct cost.
517
+ 3. **Bump `LAST_UPDATED` + add CHANGELOG entry.** Update the `LAST_UPDATED` constant at the top of `pricing.ts` to today's `YYYY-MM-DD`. Add a `Changed` line to `CHANGELOG.md` referencing the diff (e.g. `pricing: refresh 2026-08-07 — Opus 4.7 input $15→$18, cache_read unchanged`). Commit as `chore(pricing): refresh table to YYYY-MM-DD`.
518
+
519
+ > **Quarterly self-check:** the `--cost-summary` command emits a stderr warning when `today - LAST_UPDATED > 90 days`. Treat that warning as a blocking todo — stale pricing makes every cost figure in the report directionally wrong.