npm - @lh8ppl/claude-memory-kit - Versions diffs - 0.3.3 → 0.3.4 - Mend

@lh8ppl/claude-memory-kit 0.3.3 → 0.3.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

package/README.md CHANGED

@@ -20,7 +20,7 @@
 - **Bounded by compression** — session → daily → weekly Haiku rollups (cron or lazy-on-read) keep the snapshot small as history grows. The session-buffer rollup self-heals at session start too, so memory stays bounded even if you never cleanly close the window.
 - **Don't start empty — import the rules you already own** — `cmk import-claude-md` parses an existing `CLAUDE.md` / `.cursorrules` / `AGENTS.md` into typed, searchable facts through the same safe write path (secret screening, sanitization, dedup), with provenance back to source file + line. `--dry-run` previews first.
 - **Per-project, in-repo** — `context/` lives inside your project and travels with `git clone`. Each project keeps its own memory.
-- **8 health checks** — `cmk doctor` validates hook wiring, distill freshness, transcript firing, INDEX consistency, cron registration, native-memory coexistence, stale locks, and native-binding health (npm 12 readiness) — each failure with a repair command.
+- **9 health checks** — `cmk doctor` validates hook wiring, distill freshness, transcript firing, INDEX consistency, cron registration, native-memory coexistence, stale locks, native-binding health (npm 12 readiness), and version drift (a project scaffold behind your installed `cmk` after an update) — each failure with a repair command.
 ## Install — pick ONE route
@@ -62,7 +62,7 @@ Most-used commands (full list via `cmk --help`):
 | Command | Purpose |
 | --- | --- |
 | `cmk install` | Scaffold `context/` + the `memory-write`/`memory-search` skills + `.gitignore` + CLAUDE.md block + wire hooks (`--no-hooks` for scaffold-only) |
-| `cmk doctor` | Run HC-1..HC-8 health checks, surface repair commands |
+| `cmk doctor` | Run HC-1..HC-9 health checks, surface repair commands |
 | `cmk repair --hooks` / `--locks` / `--index` / `--all` | Idempotent self-repair |
 | `cmk search "<query>" [--mode keyword\|semantic\|hybrid] [--scope facts\|transcripts\|decisions]` | Search memory — by meaning with the embedder (hybrid default after `--with-semantic`); `--scope transcripts` = the raw session record; `--scope decisions` = the decision journal (history / "what did we reject") |
 | `cmk get <id…>` / `cmk timeline <id>` / `cmk cite <id>` / `cmk recent-activity` | Read the index back — full fact bodies + provenance, sequential context around an observation, a canonical citation link, recent changes (the CLI side of the `mk_*` MCP read tools) |

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@lh8ppl/claude-memory-kit",
-  "version": "0.3.3",
+  "version": "0.3.4",
   "description": "cmk — the CLI for claude-memory-kit. Per-project, in-repo memory system for Claude Code.",
   "type": "module",
   "bin": {

package/src/claude-md.mjs CHANGED

@@ -64,7 +64,10 @@ function buildBlock(content, version) {
  *     gracefully from a corrupted block (e.g. the user accidentally
  *     deleted the end marker by hand).
  */
-function findManagedBlock(text) {
+// Exported (Task 162) for version-drift.mjs (HC-9) — reads the managed-block
+// version marker without re-implementing the parser. Public contract: returns
+// `{version, corrupted, ...}` or null.
+export function findManagedBlock(text) {
   const startMatch = text.match(MARKER_START_RE);
   if (!startMatch) return null;
@@ -107,7 +110,9 @@ function parseVersion(v) {
  *   compareVersions('1.0.0', '1.0.0') === 0
  *   compareVersions('2.0.0', '1.9.9') === 1
  */
-function compareVersions(a, b) {
+// Exported (Task 162) for version-drift.mjs (HC-9). Public contract: -1/0/1,
+// strips a `-prerelease` suffix before comparing.
+export function compareVersions(a, b) {
   const av = parseVersion(a);
   const bv = parseVersion(b);
   for (let i = 0; i < 3; i++) {

package/src/compress-retry.mjs ADDED

@@ -0,0 +1,155 @@
+// Bounded, transient-only retry for the Haiku compress call (Task 161 / D-175).
+//
+// WHY this exists: the v0.3.3 cut-gate surfaced `haiku_timeout` / `compress_failed`
+// failures on the compression path. Measurement (D-174) proved the failure is
+// ENVIRONMENTAL/TRANSIENT, not input-size-driven — the kit's own compress.log shows
+// the largest SUCCESS (470 KB) bigger than the largest timeout (334 KB), and a 9 KB
+// input timed out. So the fix is a RETRY (a re-call usually succeeds), not an input
+// cap. The kit inherited the no-retry shape from claude-remember (its precedent —
+// which doesn't retry either); the rest of the field does.
+//
+// The SHAPE is grounded in a 9-system code read
+// (docs/research/2026-06-19-llm-call-retry-patterns-cross-system.md), which converges
+// unanimously on: bounded attempts, exponential backoff, retry ONLY the transient
+// class keyed on the error TYPE, NEVER the deterministic class, reraise after
+// exhaustion. graphiti's `is_server_or_retry_error` predicate + Letta's
+// ValueError(retry)-vs-RuntimeError(don't) split are the model.
+//
+// COMPOSITION (design §8.5 / D-42): the SessionEnd-hook `compressSession` runs under
+// a 60s ceiling CONCURRENT with autoPersona — a 50s attempt + a 50s retry = 100s
+// blows the ceiling. So callers under the ceiling pass `maxAttempts: 1` (no retry —
+// they delegate the retry to the ceiling-free lazy path via the existing
+// restore-on-failure, D-79); only the ceiling-free paths (daily-distill /
+// weekly-curate / lazy compress) pass `maxAttempts: 2`.
+//
+// COOLDOWN INTERACTION (skill-review I-1): the 120s Haiku cooldown marker is touched
+// on SUCCESS only, and the callers gate `isCooldownActive` ONCE before the retry loop.
+// A retry WIDENS the existing "no marker until success" window (~50s → ~100s), so a
+// second hook firing mid-retry could pass the gate and start its own compress. This
+// is NOT a new bug class — the window pre-exists the retry — and the D-79 claim-rename
+// mutex (renameSync of now.md is the real lock) still prevents any corruption: only
+// one roll wins the rename; the other reads an empty buffer and skips. The retry only
+// makes a pre-existing benign window ~2× wider; no marker change is warranted.
+//
+// NO JITTER (skill-review M-2): the field (graphiti) jitters its backoff
+// (wait_random_exponential) to avoid thundering-herd across many concurrent clients.
+// The kit's compress is a single low-concurrency local process (one detached child at
+// a time, gated by the cooldown), so there is no herd to avoid — a plain exponential
+// backoff is sufficient and keeps the timing deterministic for tests.
+/**
+ * Classify a compress() rejection as transient (worth a retry) or deterministic
+ * (a re-call re-fails identically — don't waste the attempt or the budget).
+ *
+ * Transient (retry):
+ *   - HaikuTimeoutError (`category: 'haiku_timeout'`) — `claude --print` was slow;
+ *     the D-174 environmental case. A re-call usually succeeds.
+ *   - HaikuFailedError (`category: 'haiku_failed'`) whose stderr looks like a
+ *     transient server/overload/rate-limit blip (the field's 5xx/429/overloaded
+ *     class), classified from the `exit_code`/`stderr` that 161.6a now captures.
+ *
+ * Deterministic (do NOT retry):
+ *   - A spawn error (`code: 'ENOENT'` etc.) — the binary isn't there; re-spawning
+ *     re-fails.
+ *   - A HaikuFailedError whose stderr is a known-deterministic class (auth /
+ *     invalid-key / policy / bad-request) — retrying always re-fails (graphiti's
+ *     explicit "retrying policy-violating content will always fail").
+ *   - Anything unrecognized — default to NOT retryable (conservative: an unknown
+ *     failure is more likely a real bug than a blip, and a wasted retry costs the
+ *     hook budget).
+ *
+ * @param {unknown} err
+ * @returns {boolean}
+ */
+export function isRetryableCompressError(err) {
+  if (!err || typeof err !== 'object') return false;
+  // Timeout = the transient/environmental case (D-174). Always retryable.
+  if (err.category === 'haiku_timeout') return true;
+  // Spawn-level failure (ENOENT / EACCES / EINVAL) — the binary/permissions are
+  // wrong; re-spawning re-fails identically. Never retryable.
+  if (typeof err.code === 'string' && /^E[A-Z]+$/.test(err.code)) return false;
+  // Non-zero exit — conditional on WHY (the exit_code/stderr 161.6a captures).
+  if (err.category === 'haiku_failed') {
+    const stderr = String(err.stderr ?? '').toLowerCase();
+    // Known-DETERMINISTIC classes: a re-call re-fails. Never retry these.
+    // (Skill-review I-2: `not found` was DROPPED — it appears in transient
+    // contexts too, e.g. a transient "host not found" / "upstream not found,
+    // retrying"; a deterministic 404 from `claude --print` is unlikely, and the
+    // conservative default below already catches a genuine unknown deterministic
+    // failure. Keeping only HIGH-CONFIDENCE deterministic markers.)
+    if (
+      /auth|invalid[_ -]?(api[_ -]?)?key|unauthor|forbidden|permission|policy|invalid[_ -]?request|bad[_ -]?request/.test(
+        stderr,
+      )
+    ) {
+      return false;
+    }
+    // Known-TRANSIENT classes: server/overload/rate blips recover on a re-call.
+    if (/overload|rate[_ -]?limit|429|5\d\d|timeout|timed[_ -]?out|temporar|unavailable|connection|network|reset/.test(stderr)) {
+      return true;
+    }
+    // Unknown non-zero exit → conservative: do NOT retry (treat as deterministic).
+    return false;
+  }
+  return false;
+}
+/**
+ * Call `backend.compress(opts)` with a bounded, transient-only retry.
+ *
+ * @param {{compress: (opts: object) => Promise<any>}} backend
+ * @param {object} opts                  — passed verbatim to backend.compress on every attempt.
+ * @param {object} [config]
+ * @param {number} [config.maxAttempts=2] — TOTAL attempts (1 = no retry; the ceiling-bound contract). Field range is 2–4; the kit uses ≤2 (one retry) to fit the budget.
+ * @param {number} [config.baseBackoffMs=600] — exponential backoff base: wait `baseBackoffMs * 2**(attempt-1)` before attempt N+1. 0 disables the wait (tests).
+ * @param {(err: unknown) => boolean} [config.isRetryable=isRetryableCompressError]
+ * @param {(ms: number) => Promise<void>} [config.sleep] — injectable for tests.
+ * @param {(info: {attempt: number, error: unknown}) => void} [config.onRetry] — fired once
+ *   per retry (Task 161.12 observability), BEFORE the backoff, with the FAILED attempt
+ *   number + the (transient) error. Callers use it to record a `retries` count on their
+ *   compress.log entry so a frequent-retry rate (the degrading-environment signal D-174
+ *   is about) is visible. Not fired on a first-try success or a non-retryable failure.
+ * @returns {Promise<any>} the backend.compress result; reraises the last error after exhaustion.
+ */
+export async function compressWithRetry(
+  backend,
+  opts,
+  {
+    maxAttempts = 2,
+    baseBackoffMs = 600,
+    isRetryable = isRetryableCompressError,
+    sleep = (ms) => new Promise((r) => setTimeout(r, ms)),
+    onRetry,
+  } = {},
+) {
+  const attempts = Math.max(1, maxAttempts);
+  let lastErr;
+  for (let attempt = 1; attempt <= attempts; attempt++) {
+    try {
+      return await backend.compress(opts);
+    } catch (err) {
+      lastErr = err;
+      // Stop immediately if this is the last attempt OR the error isn't transient.
+      if (attempt >= attempts || !isRetryable(err)) {
+        throw err;
+      }
+      // We're going to retry — surface it for observability (161.12), before the wait.
+      if (typeof onRetry === 'function') {
+        try {
+          onRetry({ attempt, error: err });
+        } catch {
+          // onRetry is best-effort instrumentation — never let it break the retry.
+        }
+      }
+      // Exponential backoff before the next attempt (skip the wait when base is 0).
+      const delay = baseBackoffMs > 0 ? baseBackoffMs * 2 ** (attempt - 1) : 0;
+      if (delay > 0) await sleep(delay);
+    }
+  }
+  // Unreachable (the loop either returns or throws), but satisfies control-flow analysis.
+  throw lastErr;
+}
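The stderr classification in `isRetryableCompressError` depends on the order of its two patterns. A minimal sketch follows, with the two regular expressions copied from the hunk above; the helper name `classifyStderr` and the sample messages are hypothetical and not part of the package.

```javascript
// Regexes copied from the diff above; everything else here is illustrative.
const DETERMINISTIC =
  /auth|invalid[_ -]?(api[_ -]?)?key|unauthor|forbidden|permission|policy|invalid[_ -]?request|bad[_ -]?request/;
const TRANSIENT =
  /overload|rate[_ -]?limit|429|5\d\d|timeout|timed[_ -]?out|temporar|unavailable|connection|network|reset/;

// Hypothetical helper: mirrors the check order used for `haiku_failed` errors.
function classifyStderr(stderr) {
  const text = String(stderr ?? '').toLowerCase();
  if (DETERMINISTIC.test(text)) return 'deterministic'; // checked first, so it wins on mixed messages
  if (TRANSIENT.test(text)) return 'transient';
  return 'unknown'; // the diff treats unknown as not retryable
}

// Sample messages (made up for illustration).
const samples = [
  'Error: 529 Overloaded',
  'Invalid API key',
  'authentication service timed out',
  'segmentation fault',
  'killed after 1500 ms',
];
for (const s of samples) {
  console.log(`${classifyStderr(s).padEnd(13)} <- ${s}`);
}
```

Two properties are worth knowing when reading compress.log entries. A message that matches both patterns, such as the third sample, is classified as deterministic because that test runs first. The bare `5\d\d` alternative matches any digit run containing a 5 followed by two digits, so the last sample is classified as transient even though no HTTP status is involved.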

package/src/compress-session.mjs CHANGED

@@ -37,6 +37,7 @@ import { join, dirname } from 'node:path';
 import { nowIso } from './audit-log.mjs';
 import { ERROR_CATEGORIES } from './result-shapes.mjs';
 import { HaikuTimeoutError } from './compressor.mjs';
+import { compressWithRetry } from './compress-retry.mjs';
 import {
   DEFAULT_COOLDOWN_MS,
   isCooldownActive,
@@ -225,6 +226,12 @@ export async function compressSession({
   now,
   cooldownMs = DEFAULT_COOLDOWN_MS,
   maxOutputBytes = DEFAULT_MAX_OUTPUT_BYTES,
+  // Task 161 / D-175: retry policy. DEFAULT 1 = NO retry — the SessionEnd-hook
+  // contract: this fn runs under the 60s ceiling CONCURRENT with autoPersona, where
+  // a 50s attempt + a 50s retry = 100s blows the ceiling. The ceiling-free LAZY
+  // caller (runLazyCompress) passes maxAttempts:2 to opt into one retry; the hook
+  // keeps its restore-on-failure (D-79) and delegates the retry to that lazy path.
+  maxAttempts = 1,
 } = {}) {
   const ts = now ?? nowIso();
   const date = dateFromIso(ts);
@@ -325,14 +332,21 @@ export async function compressSession({
   // restoreRolling call, so the buffer is never stranded in the rolling file.
   // See design §8.5 for the composition rationale.
   let result;
+  let retries = 0; // Task 161.12: count retries (only the lazy maxAttempts:2 path can retry).
   try {
-    result = await backend.compress({
-      input: wrapBufferForPrompt(buffer),
-      instructions,
-      preserveCitationIds: true,
-      maxOutputBytes,
-      timeoutMs: 50_000,
-    });
+    // maxAttempts default 1 (hook contract: no retry); the lazy caller passes 2.
+    // compressWithRetry is a no-op wrapper at maxAttempts:1 (single attempt, reraise).
+    result = await compressWithRetry(
+      backend,
+      {
+        input: wrapBufferForPrompt(buffer),
+        instructions,
+        preserveCitationIds: true,
+        maxOutputBytes,
+        timeoutMs: 50_000,
+      },
+      { maxAttempts, onRetry: () => { retries += 1; } },
+    );
   } catch (err) {
     // Distinguish HAIKU_TIMEOUT (slow Anthropic) from COMPRESS_FAILED
     // (non-zero subprocess exit / spawn ENOENT / etc). Analytics
@@ -357,6 +371,13 @@ export async function compressSession({
       duration_ms,
       success: false,
       error_category: errorCategory,
+      // Task 161 (D-173 observability): capture the STRUCTURED failure reason
+      // (subprocess exit code + stderr) so a `compress_failed` is diagnosable.
+      // Pre-161 the log kept only error_category — the WHY was discarded, which
+      // is why the kit's own 329-byte compress_failed could not be explained.
+      ...(err?.exitCode != null ? { exit_code: err.exitCode } : {}),
+      ...(err?.stderr ? { error_detail: String(err.stderr).slice(0, 500) } : {}),
+      ...(retries > 0 ? { retries } : {}), // 161.12: failed AFTER retrying (lazy path)
     };
     writeCompressLogEntry({ projectRoot, date, entry });
     return {
@@ -397,6 +418,7 @@ export async function compressSession({
     cost_usd: result?.costUSD ?? 0,
     duration_ms,
     success: true,
+    ...(retries > 0 ? { retries } : {}), // 161.12: succeeded after a transient retry (lazy path)
   };
   writeCompressLogEntry({ projectRoot, date, entry });
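The `maxAttempts = 1` default above follows from simple budget arithmetic. The sketch below works through it under stated assumptions: every attempt runs to its full `timeoutMs`, the backoff follows the `baseBackoffMs * 2**(attempt-1)` rule from compress-retry.mjs, and spawn overhead and the kill-chain grace window are ignored. The function name `worstCaseMs` and the constant `HOOK_CEILING_MS` are hypothetical.

```javascript
// Worst-case wall time for a bounded retry: each attempt hits its timeout,
// and a backoff wait separates consecutive attempts.
function worstCaseMs({ maxAttempts, timeoutMs, baseBackoffMs = 600 }) {
  let total = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    total += timeoutMs;
    if (attempt < maxAttempts) {
      total += baseBackoffMs * 2 ** (attempt - 1); // wait before the next attempt
    }
  }
  return total;
}

const HOOK_CEILING_MS = 60_000; // the 60s SessionEnd ceiling named in the comment

const hookPath = worstCaseMs({ maxAttempts: 1, timeoutMs: 50_000 });
const lazyPath = worstCaseMs({ maxAttempts: 2, timeoutMs: 50_000 });

console.log(hookPath, hookPath <= HOOK_CEILING_MS); // 50000 true
console.log(lazyPath, lazyPath <= HOOK_CEILING_MS); // 100600 false
```

With one attempt the hook path stays inside the ceiling. With two attempts the worst case is roughly 100 seconds, which is why only the ceiling-free callers opt in.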

package/src/compressor.mjs CHANGED

@@ -94,6 +94,23 @@ export class HaikuTimeoutError extends Error {
   }
 }
+// Non-zero subprocess exit (the `compress_failed` category). Carries the
+// STRUCTURED exit code + captured stderr so callers can write the real
+// failure reason into compress.log — pre-161 this was a plain Error with
+// the detail buried in `.message`, and the log kept only `error_category`,
+// making a `compress_failed` undiagnosable (the 329-byte failure in the
+// kit's own log that the D-173 investigation could not explain). Mirrors
+// HaikuTimeoutError so the two failure modes carry parallel diagnostics.
+export class HaikuFailedError extends Error {
+  constructor(message, { exitCode, stderr }) {
+    super(message);
+    this.name = 'HaikuFailedError';
+    this.category = 'haiku_failed';
+    this.exitCode = exitCode ?? null;
+    this.stderr = stderr ?? '';
+  }
+}
 // SIGTERM → grace window → SIGKILL escalation. Exported so the kill
 // chain itself is independently testable against real OS processes
 // (see tests/spawn-smoke-kill-chain.test.js) — the production code
@@ -292,8 +309,9 @@ export class HaikuViaAnthropicApi extends CompressorBackend {
         if (settled) return; // timeout already fired
         if (code !== 0) {
           settleReject(
-            new Error(
+            new HaikuFailedError(
               `HaikuViaAnthropicApi: claude --print exit ${code}: ${stderr.trim() || '(no stderr)'}`,
+              { exitCode: code, stderr: stderr.trim() },
             ),
           );
           return;
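The structured `exitCode` and `stderr` fields on `HaikuFailedError` are what the callers in this diff spread into their compress.log entries. A small sketch of that conditional-spread pattern, using plain objects in place of the real error classes; the helper name `failureFields` is hypothetical, and the field names `exit_code`, `error_detail`, and `retries` are taken from the diff.

```javascript
// Builds only the optional log fields that actually apply to this failure.
function failureFields(err, retries = 0) {
  return {
    // `!= null` keeps an exit code of 0 but drops null and undefined.
    ...(err?.exitCode != null ? { exit_code: err.exitCode } : {}),
    // stderr is truncated to 500 characters, as in the diff.
    ...(err?.stderr ? { error_detail: String(err.stderr).slice(0, 500) } : {}),
    ...(retries > 0 ? { retries } : {}),
  };
}

// A failure shaped like HaikuFailedError (values made up for illustration).
const failed = { category: 'haiku_failed', exitCode: 1, stderr: 'x'.repeat(600) };
// A timeout-shaped error carries neither field.
const timedOut = { category: 'haiku_timeout' };

console.log(Object.keys(failureFields(failed, 1)));   // [ 'exit_code', 'error_detail', 'retries' ]
console.log(Object.keys(failureFields(timedOut, 0))); // []
```

Because absent fields are omitted rather than written as null, entries for first-try successes and plain timeouts keep the same shape they had before this change.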

package/src/daily-distill.mjs CHANGED

@@ -28,6 +28,7 @@ import { join } from 'node:path';
 import { nowIso } from './audit-log.mjs';
 import { ERROR_CATEGORIES } from './result-shapes.mjs';
 import { HaikuTimeoutError } from './compressor.mjs';
+import { compressWithRetry } from './compress-retry.mjs';
 import {
   DEFAULT_COOLDOWN_MS,
   isCooldownActive,
@@ -195,14 +196,23 @@ export async function dailyDistill({
   const instructions = buildDistillInstructions(maxOutputBytes);
   let result;
+  let retries = 0; // Task 161.12: count retries so the log shows the retry RATE.
   try {
-    result = await backend.compress({
-      input: buffer,
-      instructions,
-      preserveCitationIds: true,
-      maxOutputBytes,
-      timeoutMs: 50_000,
-    });
+    // Task 161 / D-175: ceiling-free path (cron/detached child, NO 60s hook ceiling)
+    // → bounded transient-only retry. A re-call recovers the D-174 environmental
+    // timeout / transient non-zero exit; a deterministic failure (ENOENT/auth) fails
+    // fast (isRetryableCompressError). maxAttempts:2 = one retry.
+    result = await compressWithRetry(
+      backend,
+      {
+        input: buffer,
+        instructions,
+        preserveCitationIds: true,
+        maxOutputBytes,
+        timeoutMs: 50_000,
+      },
+      { maxAttempts: 2, onRetry: () => { retries += 1; } },
+    );
     touchCooldownMarker({ projectRoot, now: ts });
   } catch (err) {
     touchCooldownMarker({ projectRoot, now: ts });
@@ -217,6 +227,10 @@ export async function dailyDistill({
         ts, scope: 'daily-distill', input_bytes, output_bytes: 0,
         model_id: typeof backend.modelId === 'function' ? backend.modelId() : null,
         cost_usd: 0, duration_ms, success: false, error_category: errorCategory,
+        // Task 161 (D-173 observability): structured failure reason — see compress-session.mjs.
+        ...(err?.exitCode != null ? { exit_code: err.exitCode } : {}),
+        ...(err?.stderr ? { error_detail: String(err.stderr).slice(0, 500) } : {}),
+        ...(retries > 0 ? { retries } : {}), // 161.12: failed AFTER retrying
       },
     });
     return {
@@ -246,6 +260,7 @@ export async function dailyDistill({
         (typeof backend.modelId === 'function' ? backend.modelId() : null),
       cost_usd: result?.costUSD ?? 0,
       duration_ms, success: true, source_days: files.length,
+      ...(retries > 0 ? { retries } : {}), // 161.12: succeeded after a transient retry
     },
   });
   return {
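The caller pattern in this hunk (one retry, a `retries` counter fed by `onRetry`, and a log field that appears only when a retry happened) can be exercised without a real backend. The sketch below is a condensed stand-in for `compressWithRetry`, not the package's code; `retrySketch`, `makeFlakyCall`, and `runOnce` are hypothetical names, and the predicate only recognizes the timeout category.

```javascript
// Condensed stand-in for the retry loop in compress-retry.mjs (illustration only).
async function retrySketch(call, { maxAttempts = 2, baseBackoffMs = 600, isRetryable, sleep, onRetry }) {
  const attempts = Math.max(1, maxAttempts);
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= attempts || !isRetryable(err)) throw err;
      if (typeof onRetry === 'function') onRetry({ attempt, error: err });
      const delay = baseBackoffMs * 2 ** (attempt - 1);
      if (delay > 0) await sleep(delay);
    }
  }
}

// Hypothetical backend call: times out `failures` times, then succeeds.
function makeFlakyCall(failures) {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failures) {
      throw Object.assign(new Error('simulated timeout'), { category: 'haiku_timeout' });
    }
    return { calls };
  };
}

// Runs one compress-like call and returns a log-entry-shaped summary.
async function runOnce(failures, maxAttempts) {
  const waits = [];
  let retries = 0;
  const config = {
    maxAttempts,
    isRetryable: (err) => err.category === 'haiku_timeout',
    sleep: async (ms) => { waits.push(ms); }, // records the delay instead of waiting
    onRetry: () => { retries += 1; },
  };
  try {
    const result = await retrySketch(makeFlakyCall(failures), config);
    return { success: true, calls: result.calls, waits, ...(retries > 0 ? { retries } : {}) };
  } catch (err) {
    return { success: false, error_category: err.category, waits, ...(retries > 0 ? { retries } : {}) };
  }
}

runOnce(1, 2).then((entry) => console.log(entry));
```

Injecting `sleep` keeps the run instant and makes the backoff observable, which matches the reason the diff gives for making it injectable.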

package/src/doctor.mjs CHANGED

@@ -1,4 +1,4 @@
-// `cmk doctor` — health checks HC-1..HC-8 (Task 37, T-031; memsearch HC-1/HC-7 removed in Task 120; HC-8 native bindings added in Task 141a).
+// `cmk doctor` — health checks HC-1..HC-9 (Task 37, T-031; memsearch HC-1/HC-7 removed in Task 120; HC-8 native bindings added in Task 141a; HC-9 version-drift/update-path added in Task 162 / D-176).
 //
 // Public boundary:
 //   async runDoctor({projectRoot, userDir, now, promptUser?, ...overrides})
@@ -46,6 +46,8 @@ import { cronSentinelPath } from './lazy-compress.mjs';
 import { getNativeAutoMemoryState } from './native-memory.mjs';
 import { checkKitBinding, checkEmbedderBinding } from './native-binding.mjs';
 import { resolveDefaultSearchMode } from './semantic-backend.mjs';
+import { checkVersionDrift } from './version-drift.mjs';
+import { getKitVersion } from './install.mjs';
 const TWO_DAYS_MS = 2 * 24 * 60 * 60 * 1000;
 const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;
@@ -541,10 +543,27 @@ async function hc8NativeBindings({ projectRoot, kitBindingProbe, embedderBinding
  * parameter lands at that PR alongside the actual consent flow — not
  * pre-empted in v0.1.0 to avoid the "forward-compat hooks rot" pattern.
  */
+// --- HC-9: project scaffold version matches the installed cmk (Task 162 / D-176) ---
+// After `npm i -g @latest`, a project's version-stamped scaffold stays at the OLD
+// version until `cmk install` re-runs there (the easily-forgotten per-project step).
+// HC-9 reads the project's CLAUDE.md managed-block version + the installed binary
+// version and tells the user to re-run `cmk install` when the project is behind.
+function hc9VersionDrift({ projectRoot, kitVersion }) {
+  const claudeMdPath = join(projectRoot, 'CLAUDE.md');
+  let claudeMdText = null;
+  try {
+    if (existsSync(claudeMdPath)) claudeMdText = readFileSync(claudeMdPath, 'utf8');
+  } catch {
+    claudeMdText = null; // unreadable → skip (treated as not-installed)
+  }
+  return checkVersionDrift({ claudeMdText, kitVersion });
+}
 export async function runDoctor({
   projectRoot,
   userDir,
   now,
+  kitVersion,
   kitBindingProbe,
   embedderBindingProbe,
 } = {}) {
@@ -569,10 +588,12 @@ export async function runDoctor({
   const c6 = hc6NativeAutoMemory({ projectRoot, now: ts });
   const c7 = hc7StaleLocks({ projectRoot, userDir: resolvedUserDir });
   const c8 = await hc8NativeBindings({ projectRoot, kitBindingProbe, embedderBindingProbe });
+  // HC-9: kitVersion injectable for tests; defaults to the installed binary's version.
+  const c9 = hc9VersionDrift({ projectRoot, kitVersion: kitVersion ?? getKitVersion() });
   return {
     action: 'completed',
-    checks: [c1, c2, c3, c4, c5, c6, c7, c8],
+    checks: [c1, c2, c3, c4, c5, c6, c7, c8, c9],
     duration_ms: Date.now() - t0,
   };
 }
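A consumer of `runDoctor` could turn the `checks` array into a list of repair commands. The sketch below assumes every check uses the `{id, name, status, message, recoveryCommand?}` shape documented for HC-9 in version-drift.mjs; the diff does not show the other checks, so that is an assumption, and the helper name `failingRepairs` and the sample data are hypothetical.

```javascript
// Collects failing checks together with their repair command, if one is present.
function failingRepairs(checks) {
  return checks
    .filter((check) => check.status === 'fail')
    .map((check) => ({
      id: check.id,
      message: check.message,
      command: check.recoveryCommand ?? null,
    }));
}

// Sample result (made up); only the HC-9 entry mirrors strings from the diff.
const sampleChecks = [
  { id: 'HC-1', name: 'example passing check', status: 'pass', message: 'ok' },
  {
    id: 'HC-9',
    name: 'Project scaffold version matches the installed cmk',
    status: 'fail',
    message: 'scaffold is v0.3.3 but installed cmk is v0.3.4',
    recoveryCommand: 'cmk install',
  },
];

for (const repair of failingRepairs(sampleChecks)) {
  console.log(`${repair.id}: ${repair.message} -> ${repair.command ?? '(no command)'}`);
}
```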

package/src/lazy-compress.mjs CHANGED

@@ -411,6 +411,11 @@ export async function runLazyCompress({
       backend,
       now: ts,
       cooldownMs: 0,
+      // Task 161 / D-175: the lazy path is a DETACHED SessionStart child with NO 60s
+      // hook ceiling, so it opts into the one retry the hook path can't afford. This
+      // is where the SessionEnd-hook's failed roll (which restored now.md, D-79) gets
+      // its real bounded retry.
+      maxAttempts: 2,
     });
   } else if (verdict.action === 'stale-weekly') {
     delegatedTo = 'weekly-curate';

package/src/version-drift.mjs ADDED

@@ -0,0 +1,72 @@
+// HC-9: version-drift detection (Task 162 / D-176).
+//
+// WHY: after a user updates the global `cmk` (npm i -g @latest), a project's
+// version-stamped scaffold — the CLAUDE.md managed block, the hooks, the skills —
+// stays at the OLD version until `cmk install` re-runs in that project. Updating the
+// npm package ALONE does not touch a project (the per-project re-install is the
+// easily-forgotten step). Pre-162 the kit was silent about it (D-172: no update path).
+// HC-9 makes `cmk doctor` TELL the user the project is behind + the exact command.
+//
+// The project's installed version lives in the CLAUDE.md managed-block start marker
+// (`<!-- claude-memory-kit:start v0.3.3 -->`); the installed binary version is
+// getKitVersion(). Drift = binary NEWER than the project marker → "run cmk install".
+// A project marker NEWER than the binary is a downgrade (older global cli opening a
+// newer-scaffolded project), NOT drift — flag pass, not a false alarm.
+import { findManagedBlock, compareVersions } from './claude-md.mjs';
+/**
+ * Pure HC-9 check. Injectable inputs (no disk read here) so the logic is unit-tested
+ * without a fixture tree; the doctor wiring reads CLAUDE.md + getKitVersion() and
+ * passes them in.
+ *
+ * @param {object} args
+ * @param {string|null} args.claudeMdText — the project's CLAUDE.md content, or null if absent.
+ * @param {string} args.kitVersion        — the installed binary version (getKitVersion()).
+ * @returns {{id:'HC-9', name:string, status:'pass'|'fail'|'skip', message:string, recoveryCommand?:string}}
+ */
+export function checkVersionDrift({ claudeMdText, kitVersion } = {}) {
+  const id = 'HC-9';
+  const name = 'Project scaffold version matches the installed cmk';
+  // No CLAUDE.md, or no managed block → the project isn't kit-installed (or the block
+  // was hand-removed). Not a drift signal; skip (HC-1/repair owns the missing-block case).
+  if (!claudeMdText) {
+    return { id, name, status: 'skip', message: 'no CLAUDE.md found — project not kit-installed' };
+  }
+  const block = findManagedBlock(claudeMdText);
+  if (!block) {
+    return { id, name, status: 'skip', message: 'no claude-memory-kit managed block in CLAUDE.md' };
+  }
+  // `block.version` is the `:start vX` marker value (findManagedBlock recovers it
+  // even from a corrupted/orphan-start block — a stale corrupted block still earns
+  // the `cmk install` advice, which fixes both). compareVersions strips any
+  // `-prerelease` tag, so a `v0.3.4-beta` scaffold reads as `0.3.4` (the kit ships
+  // no prereleases today; this is the intended "close enough" behavior).
+  const projectVersion = block.version;
+  const cmp = compareVersions(kitVersion, projectVersion);
+  if (cmp <= 0) {
+    // Binary == project (match) OR binary < project (a downgrade — older cli, newer
+    // scaffold). Neither is "re-run install to catch up." Pass.
+    return {
+      id,
+      name,
+      status: 'pass',
+      message:
+        cmp === 0
+          ? `project scaffold (v${projectVersion}) matches the installed cmk (v${kitVersion})`
+          : `project scaffold (v${projectVersion}) is newer than the installed cmk (v${kitVersion}) — likely an older global cli; not drift`,
+    };
+  }
+  // Binary NEWER than the project marker → the project is stale. THE drift case.
+  return {
+    id,
+    name,
+    status: 'fail',
+    message: `this project's scaffold is v${projectVersion} but your installed cmk is v${kitVersion} — re-run \`cmk install\` here to refresh the CLAUDE.md block, hooks, and skills (then restart Claude Code)`,
+    recoveryCommand: 'cmk install',
+  };
+}
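The pass/fail rule in `checkVersionDrift` reduces to a single comparison. Since the diff shows only part of `compareVersions` and none of `parseVersion`, the sketch below uses a hypothetical stand-in, `cmpSketch`, which assumes a leading `v` and a `-prerelease` suffix are stripped and the three numeric parts are compared in order. `driftStatus` is also a hypothetical name.

```javascript
// Hypothetical stand-in for compareVersions: returns -1, 0, or 1.
function cmpSketch(a, b) {
  const parse = (v) =>
    String(v)
      .replace(/^v/, '')      // assumed: tolerate a leading "v"
      .split('-')[0]          // drop any -prerelease suffix
      .split('.')
      .map((part) => Number(part) || 0);
  const av = parse(a);
  const bv = parse(b);
  for (let i = 0; i < 3; i++) {
    const x = av[i] ?? 0;
    const y = bv[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Drift means the installed binary is strictly newer than the project marker.
function driftStatus(kitVersion, projectVersion) {
  return cmpSketch(kitVersion, projectVersion) > 0 ? 'fail' : 'pass';
}

console.log(driftStatus('0.3.4', '0.3.3'));      // fail: project is behind
console.log(driftStatus('0.3.4', '0.3.4'));      // pass: match
console.log(driftStatus('0.3.3', '0.3.4'));      // pass: older cli, newer scaffold
console.log(driftStatus('0.3.4', '0.3.4-beta')); // pass: prerelease tag ignored
```

Comparing numeric parts rather than strings matters once a component reaches two digits: under this sketch `0.10.0` is newer than `0.9.9`, which a plain string comparison would get wrong.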

package/src/weekly-curate.mjs CHANGED

@@ -39,6 +39,7 @@ import { canonicalize } from '@lh8ppl/cmk-canonicalize';
 import { nowIso } from './audit-log.mjs';
 import { ERROR_CATEGORIES, errorResult } from './result-shapes.mjs';
 import { HaikuTimeoutError } from './compressor.mjs';
+import { compressWithRetry } from './compress-retry.mjs';
 import {
   DEFAULT_COOLDOWN_MS,
   isCooldownActive,
@@ -388,14 +389,21 @@ export async function weeklyCurate({
   const sourceDates = old.map((f) => f.date);
   let result;
+  let retries = 0; // Task 161.12: count retries so the log shows the retry RATE.
   try {
-    result = await backend.compress({
-      input: buffer,
-      instructions,
-      preserveCitationIds: true,
-      maxOutputBytes: archiveMaxBytes,
-      timeoutMs: 50_000,
-    });
+    // Task 161 / D-175: ceiling-free path (cron/detached child, NO 60s hook ceiling)
+    // → bounded transient-only retry (maxAttempts:2 = one retry). See compress-retry.mjs.
+    result = await compressWithRetry(
+      backend,
+      {
+        input: buffer,
+        instructions,
+        preserveCitationIds: true,
+        maxOutputBytes: archiveMaxBytes,
+        timeoutMs: 50_000,
+      },
+      { maxAttempts: 2, onRetry: () => { retries += 1; } },
+    );
     touchCooldownMarker({ projectRoot, now: ts });
   } catch (err) {
     touchCooldownMarker({ projectRoot, now: ts });
@@ -418,6 +426,10 @@ export async function weeklyCurate({
         duration_ms,
         success: false,
         error_category: errorCategory,
+        // Task 161 (D-173 observability): structured failure reason — see compress-session.mjs.
+        ...(err?.exitCode != null ? { exit_code: err.exitCode } : {}),
+        ...(err?.stderr ? { error_detail: String(err.stderr).slice(0, 500) } : {}),
+        ...(retries > 0 ? { retries } : {}), // 161.12: failed AFTER retrying
       },
     });
     return errorResult({
@@ -487,6 +499,7 @@ export async function weeklyCurate({
       archived_days: old.length,
       current_days: current.length,
       recent_rebuild_action: recentResult?.action ?? 'skipped',
+      ...(retries > 0 ? { retries } : {}), // 161.12: succeeded after a transient retry
       ...(deletionErrors.length > 0 ? { deletion_errors: deletionErrors } : {}),
     },
   });