npm - moflo - Versions diffs - 4.9.24 → 4.9.25 - Mend

moflo 4.9.24 → 4.9.25

Files changed (11)

package/.claude/skills/healer/SKILL.md +51 -0
package/.claude/skills/publish/SKILL.md +4 -8
package/bin/lib/file-sync.mjs +200 -0
package/bin/session-start-launcher.mjs +107 -74
package/dist/src/cli/commands/doctor-checks-runtime.js +78 -0
package/dist/src/cli/commands/doctor-fixes.js +39 -0
package/dist/src/cli/commands/doctor-registry.js +4 -1
package/dist/src/cli/init/executor.js +1 -0
package/dist/src/cli/version.js +1 -1
package/package.json +2 -2
package/scripts/post-install-bootstrap.mjs +38 -62

package/.claude/skills/healer/SKILL.md ADDED

@@ -0,0 +1,51 @@
+---
+name: healer
+description: Run moflo's Healer (`flo healer`, alias for `flo doctor`) from inside the Claude session. Audit-only by default; pass `--fix` to apply auto-repairs, `-c <component>` for a single check. Use when something feels off (missing moflo.yaml, daemon dead, statusline empty, hooks not firing) or as a periodic health check. Distinct from Claude Code's built-in `/doctor`, which diagnoses Claude Code itself, not moflo.
+arguments: "[--fix] [-c <component>]"
+---
+# /healer — moflo Installation Healer
+Thin wrapper around the `flo healer` CLI. All check + fix logic lives in the CLI; this skill just shells out, surfaces results in-thread, and gives one-line follow-up nudges.
+**Arguments:** $ARGUMENTS
+## Procedure
+1. **Memory first** (gate requirement):
+   ```
+   mcp__moflo__memory_search { query: "doctor healer fix moflo.yaml gate hook wiring", namespace: "guidance" }
+   ```
+2. **Run the CLI** with the user's arguments passed through:
+   ```bash
+   npx moflo healer --json $ARGUMENTS
+   ```
+   - No args → audit-only.
+   - `--fix` → CLI runs auto-repairs after the audit.
+   - `-c <component>` → restricts to one check.
+   - Always include `--json` so output is machine-parseable.
+3. **Surface the JSON in-thread**. Group by status:
+   - `✓ N passing` (count only)
+   - `⚠ warnings` — list `name: message`; flag with `[auto-fixable]` when the result has a `fix` field
+   - `✗ failures` — same
+   - If `--fix` mode, also list which fixes were applied vs which need manual action.
+4. **Nudge based on what changed.** Only mention next steps for state that *actually* changed:
+   - Daemon restarted → `Statusline should refresh within ~5s.`
+   - `moflo.yaml` created → `Review the new defaults at the project root before your next deep run.`
+   - Hook wiring repaired → `Restart Claude Code so the new SessionStart hook fires next launch.`
+   - In audit-only mode with auto-fixable issues → `Run /healer --fix to repair.`
+## Rules
+- **Don't** re-document checks or fixes here. The CLI's `--help` and `src/cli/commands/doctor-*` are the source of truth.
+- **Don't** call `flo doctor` directly — use the `healer` alias for thematic consistency. They're equivalent CLI-side.
+- **Don't** swallow non-zero exit codes silently — surface them in the summary.
+- **Note for users:** Claude Code has its own built-in `/doctor` command that diagnoses Claude Code itself. This skill (`/healer`) diagnoses **moflo**, not Claude Code. The two are complementary, not duplicates — and the healer also runs `claude doctor` internally as a delegated check (`Claude Code Doctor`) so Claude-side issues (auth, settings drift, IDE/extension state) surface in the same report. With `--fix`, the healer re-runs `claude doctor` interactively so you can see and act on its findings; Claude-side fixes typically need user gestures (re-auth, IDE reload) and aren't auto-applied.
+## See Also
+- `flo doctor --help` — full flag/component list
+- `/eldar` — broader project-setup audit; consults the Healer as one input
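
Step 3 of the skill's procedure (grouping results by status) could look roughly like the sketch below. It assumes the parsed `--json` output yields an array of `{ name, status, message, fix }` objects with status values `pass`, `warn`, and `fail`. The diff only shows `pass` and `warn` results, so the `fail` value, the array envelope, and the `summarizeChecks` name are assumptions made for illustration.

```javascript
// Hypothetical helper: groups healer check results the way step 3 describes.
// Assumed input shape: [{ name, status: 'pass' | 'warn' | 'fail', message, fix? }].
function summarizeChecks(results) {
  const lines = [];

  // Passing checks are reported as a count only.
  const passing = results.filter((r) => r.status === 'pass').length;
  lines.push(`✓ ${passing} passing`);

  // A result carrying a `fix` field is flagged as auto-fixable.
  const describe = (r) =>
    `  ${r.name}: ${r.message}${r.fix ? ' [auto-fixable]' : ''}`;

  const warnings = results.filter((r) => r.status === 'warn');
  if (warnings.length > 0) {
    lines.push('⚠ warnings');
    lines.push(...warnings.map(describe));
  }

  const failures = results.filter((r) => r.status === 'fail');
  if (failures.length > 0) {
    lines.push('✗ failures');
    lines.push(...failures.map(describe));
  }
  return lines;
}
```

Returning an array of lines keeps the helper free of I/O, so the caller decides how to render the summary in-thread.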

package/.claude/skills/publish/SKILL.md CHANGED

@@ -93,19 +93,15 @@ Skipped by default — `ci.yml` runs the full test suite on every PR. Run when `
 **Must have 0 test file failures.** If any test files fail, retest them individually to distinguish real failures from flaky ones (per broken window theory). Fix all real failures before proceeding.
-### Step 5: Doctor (always)
+### Step 5: Doctor (always — strict, no repair)
-Default mode:
-```bash
-npx moflo doctor --fix
-```
-Check mode (`CHECK_MODE=true`):
 ```bash
 npx moflo doctor --strict
 ```
-Doctor is the only check with no CI equivalent — it inspects local state (daemon lock, embeddings hygiene, sandbox tier, vector-stats freshness) that CI cannot validate for you. Always runs.
+Doctor is the only check with no CI equivalent — it inspects local state (daemon lock, embeddings hygiene, sandbox tier, vector-stats freshness) that CI cannot validate for you. Always runs in `--strict` mode regardless of `CHECK_MODE`.
+**Never `--fix` on the publish path.** A release pipeline must fail fast on broken local state, not silently repair it; a doctor that auto-repairs masks the very signal we want — "something is off, stop and investigate before shipping." If `doctor --strict` fails, stop and run `flo healer --fix` (or `npx moflo doctor --fix`) interactively, verify the repair, then retry the publish.
 ### Step 6: Smoke Tests (only if `CHECK_MODE=true`)

package/bin/lib/file-sync.mjs ADDED

@@ -0,0 +1,200 @@
+/**
+ * Shared file-sync helper for the launcher (#854 §3) and the postinstall
+ * bootstrap (#857 / #975).
+ *
+ * Both layers used to inline the same retry/breaker + copy logic. They drifted
+ * once already (the bootstrap added hash-skip + atomic tmp+rename for #975
+ * while the launcher kept the bare copyFileSync), so this module is the single
+ * source of truth.
+ *
+ * Responsibilities:
+ *   - hash-skip when src and dest are byte-identical (eliminates the dominant
+ *     failure class — overwriting an unchanged file open by Claude/indexer).
+ *   - atomic tmp + rename so concurrent readers never see a torn write.
+ *   - post-write size verify to catch torn writes from AV mid-stream and
+ *     partial DrvFs writes that returned success codes.
+ *   - retry the transient error class (EBUSY/EPERM/EACCES + EVERIFY) with
+ *     exponential backoff [50,200,800]ms.
+ *   - circuit-break after CIRCUIT_BREAK_THRESHOLD distinct exhausted-retry
+ *     failures so a sick host (AV mid-scan over node_modules) doesn't compound
+ *     wall-clock cost.
+ *
+ * Ships at `bin/lib/file-sync.mjs`. Bootstrap imports via relative path from
+ * `scripts/`; launcher imports via `./lib/file-sync.mjs` after sync to
+ * `<consumer>/.claude/scripts/lib/`.
+ */
+import {
+  copyFileSync,
+  existsSync,
+  mkdirSync,
+  readFileSync,
+  renameSync,
+  statSync,
+  unlinkSync,
+} from 'node:fs';
+import { createHash } from 'node:crypto';
+import { dirname } from 'node:path';
+export const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);
+export const RETRY_BACKOFF_MS = [50, 200, 800];
+export const CIRCUIT_BREAK_THRESHOLD = 5;
+// Code attached to the post-write size-verify failure. Treated as transient by
+// syncWithRetry so torn writes from AV mid-stream / partial DrvFs writes get a
+// retry instead of immediately surfacing as a hard failure.
+export const VERIFY_FAIL_CODE = 'EVERIFY';
+const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
+export function fileHash(path) {
+  try {
+    return createHash('sha1').update(readFileSync(path)).digest('hex');
+  } catch {
+    return null;
+  }
+}
+export function contentEqual(srcPath, destPath) {
+  if (!existsSync(destPath)) return false;
+  // Size check first — skips the SHA-1 pass on every mis-sized pair without
+  // any I/O on the file body. For the bootstrap's small file set the SHA-1
+  // is cheap, but this fires on every file on every session-start with
+  // version drift, and under load (AV lock + retries) the reads compound.
+  let srcSize, destSize;
+  try {
+    srcSize = statSync(srcPath).size;
+    destSize = statSync(destPath).size;
+  } catch {
+    return false;
+  }
+  if (srcSize !== destSize) return false;
+  const srcHash = fileHash(srcPath);
+  if (!srcHash) return false;
+  const destHash = fileHash(destPath);
+  return destHash !== null && srcHash === destHash;
+}
+/**
+ * Atomic copy via tmp + rename with post-write size verify.
+ *
+ * Steps:
+ *   1. copyFileSync(src, dest.tmp)
+ *   2. Verify dest.tmp size matches src size (catches torn writes from AV
+ *      mid-stream and partial DrvFs writes that returned success codes).
+ *      Mismatch unlinks the tmp and throws { code: 'EVERIFY' }, which the
+ *      retry loop treats as transient.
+ *   3. renameSync(dest.tmp, dest) — atomic on Win/macOS/Linux/WSL/DrvFs.
+ *
+ * If rename fails, the .tmp sidecar persists as a recovery breadcrumb — next
+ * session-start can complete the swap once the original lock has cleared.
+ *
+ * `deps` is dependency injection for tests (#976 fault injection of the
+ * truncated-tmp / partial-DrvFs scenario). Production callers omit it.
+ */
+export function atomicCopy(src, dest, deps = {}) {
+  const _copyFile = deps.copyFile || copyFileSync;
+  const _stat = deps.stat || statSync;
+  const _rename = deps.rename || renameSync;
+  const _unlink = deps.unlink || unlinkSync;
+  const tmp = `${dest}.tmp`;
+  _copyFile(src, tmp);
+  let srcSize, tmpSize;
+  try {
+    srcSize = _stat(src).size;
+    tmpSize = _stat(tmp).size;
+  } catch (statErr) {
+    try { _unlink(tmp); } catch { /* best-effort cleanup */ }
+    const err = new Error(`atomicCopy verify stat failed: ${statErr.message || statErr}`);
+    err.code = statErr.code || VERIFY_FAIL_CODE;
+    throw err;
+  }
+  if (srcSize !== tmpSize) {
+    try { _unlink(tmp); } catch { /* best-effort cleanup */ }
+    const err = new Error(
+      `atomicCopy size mismatch (src=${srcSize} tmp=${tmpSize}) for ${dest}`,
+    );
+    err.code = VERIFY_FAIL_CODE;
+    throw err;
+  }
+  _rename(tmp, dest);
+}
+export function errMessage(err) {
+  if (!err) return 'unknown error';
+  return err.code ? `${err.code} ${err.message || ''}`.trim() : (err.message || String(err));
+}
+/**
+ * Build a retry-aware syncer.
+ *
+ * @param {object} [options]
+ * @param {(key: string, dest: string) => void} [options.onSuccess]
+ *   Fires after every successful syncFile (including hash-skip identical
+ *   paths). Use it to record manifest entries from the launcher; bootstrap
+ *   ignores it.
+ *
+ * @returns {{
+ *   syncFile: (src: string, dest: string, key: string) => Promise<{ok?: boolean, skipped?: true | 'identical'}>,
+ *   failures: Array<{key: string, message: string, src?: string, dest?: string}>,
+ *   isCircuitOpen: () => boolean,
+ * }}
+ */
+export function makeSyncer({ onSuccess } = {}) {
+  let circuitOpen = false;
+  const failures = [];
+  async function syncWithRetry(operation) {
+    const maxAttempts = circuitOpen ? 1 : RETRY_BACKOFF_MS.length + 1;
+    let lastErr = null;
+    let lastCode = null;
+    for (let attempt = 0; attempt < maxAttempts; attempt++) {
+      if (attempt > 0) await sleep(RETRY_BACKOFF_MS[attempt - 1]);
+      try {
+        operation();
+        return { ok: true };
+      } catch (err) {
+        lastErr = err;
+        lastCode = err && err.code ? err.code : null;
+        const transient = TRANSIENT_CODES.has(lastCode) || lastCode === VERIFY_FAIL_CODE;
+        if (!transient) break;
+      }
+    }
+    if (!circuitOpen && failures.length + 1 >= CIRCUIT_BREAK_THRESHOLD) {
+      circuitOpen = true;
+    }
+    return { ok: false, err: lastErr, code: lastCode };
+  }
+  async function syncFile(src, dest, key) {
+    if (!existsSync(src)) return { skipped: true };
+    try {
+      mkdirSync(dirname(dest), { recursive: true });
+    } catch (err) {
+      failures.push({ key, message: errMessage(err), src, dest });
+      return { ok: false };
+    }
+    if (contentEqual(src, dest)) {
+      try { onSuccess?.(key, dest); } catch { /* non-fatal */ }
+      return { ok: true, skipped: 'identical' };
+    }
+    const result = await syncWithRetry(() => atomicCopy(src, dest));
+    if (result.ok) {
+      try { onSuccess?.(key, dest); } catch { /* non-fatal */ }
+      return { ok: true };
+    }
+    const transient = TRANSIENT_CODES.has(result.code) || result.code === VERIFY_FAIL_CODE;
+    const tail = transient
+      ? ` (retried ${RETRY_BACKOFF_MS.length}× after ${result.code}${circuitOpen ? '; circuit open' : ''})`
+      : '';
+    failures.push({ key, message: `${errMessage(result.err)}${tail}`, src, dest });
+    return { ok: false };
+  }
+  return {
+    syncFile,
+    failures,
+    isCircuitOpen: () => circuitOpen,
+  };
+}
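
To make the retry and circuit-breaker interaction in `makeSyncer` concrete, here is a small synchronous simulation. It reuses the constants from the file above, but `simulateSync` is a hypothetical helper that replaces real copies and sleeps with bookkeeping, so treat it as a sketch of the attempt accounting rather than the shipped logic.

```javascript
// Constants copied from the diff above.
const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);
const RETRY_BACKOFF_MS = [50, 200, 800];
const CIRCUIT_BREAK_THRESHOLD = 5;
const VERIFY_FAIL_CODE = 'EVERIFY';

// Hypothetical helper. `codes[i]` is the error code file i throws on every
// attempt, or null if the copy succeeds on the first try.
function simulateSync(codes) {
  let circuitOpen = false;
  let failureCount = 0;
  let waitedMs = 0;
  const attempts = [];

  for (const code of codes) {
    // Once the circuit is open, each remaining file gets one attempt only.
    const maxAttempts = circuitOpen ? 1 : RETRY_BACKOFF_MS.length + 1;
    let used = 0;
    let ok = false;
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      if (attempt > 0) waitedMs += RETRY_BACKOFF_MS[attempt - 1];
      used++;
      if (code === null) { ok = true; break; }
      const transient = TRANSIENT_CODES.has(code) || code === VERIFY_FAIL_CODE;
      if (!transient) break; // hard errors are not retried
    }
    attempts.push(used);
    if (!ok) {
      // Same threshold test as syncWithRetry: the failure being recorded counts.
      if (!circuitOpen && failureCount + 1 >= CIRCUIT_BREAK_THRESHOLD) circuitOpen = true;
      failureCount++;
    }
  }
  return { attempts, waitedMs, circuitOpen };
}
```

With seven files that all hit `EBUSY`, the first five each use four attempts (1050 ms of backoff apiece), the fifth failure opens the circuit, and the remaining two get a single attempt with no backoff.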

package/bin/session-start-launcher.mjs CHANGED

@@ -8,13 +8,14 @@
  */
 import { spawn, execFileSync } from 'child_process';
-import { existsSync, readFileSync, writeFileSync, copyFileSync, unlinkSync, readdirSync, mkdirSync, statSync } from 'fs';
+import { existsSync, readFileSync, writeFileSync, unlinkSync, readdirSync, mkdirSync, statSync } from 'fs';
 import { resolve, dirname, join } from 'path';
 import { fileURLToPath, pathToFileURL } from 'url';
 import { mofloDir } from './lib/moflo-paths.mjs';
 import { repairMemoryDbIfCorrupt } from './lib/db-repair.mjs';
 import { resolveMofloBin } from './lib/resolve-bin.mjs';
 import { applyRetiredPrune } from './lib/retired-files.mjs';
+import { makeSyncer, contentEqual } from './lib/file-sync.mjs';
 // Headless skip (#860). The daemon's headless workers spawn `claude --print`
 // with CLAUDE_CODE_HEADLESS=true (see src/cli/services/headless-worker-
@@ -166,8 +167,12 @@ const UPGRADE_NOTICE_INPROGRESS_TTL_MS = 5 * 60 * 1000;
 const UPGRADE_NOTICE_COMPLETED_TTL_MS = 2 * 60 * 1000;
 const UPGRADE_NOTICE_PATH = () => join(mofloDir(projectRoot), 'upgrade-notice.json');
-function writeUpgradeNotice(status) {
-  if (!upgradeNoticeContext) return;
+// Single-source-of-truth notice writer. Reused by writeUpgradeNotice (the
+// version-bump / drift-heal path) and the §0-bootstrap-sentinel + §3h paths
+// (#975 statusline-channel promotion). Keeps the JSON shape colocated with
+// the TTL constants instead of letting it drift across two inline copies.
+function buildAndWriteNotice(context, status) {
+  if (!context) return;
   const ttlMs = status === 'completed'
     ? UPGRADE_NOTICE_COMPLETED_TTL_MS
     : UPGRADE_NOTICE_INPROGRESS_TTL_MS;
@@ -176,9 +181,9 @@ function writeUpgradeNotice(status) {
     const now = Date.now();
     const notice = {
       status,
-      kind: upgradeNoticeContext.kind,
-      from: upgradeNoticeContext.from,
-      to: upgradeNoticeContext.to,
+      kind: context.kind,
+      from: context.from,
+      to: context.to,
       at: new Date(now).toISOString(),
       expiresAt: new Date(now + ttlMs).toISOString(),
       changes: 0,
@@ -187,6 +192,10 @@ function writeUpgradeNotice(status) {
   } catch { /* non-fatal — statusline just won't show the segment */ }
 }
+function writeUpgradeNotice(status) {
+  buildAndWriteNotice(upgradeNoticeContext, status);
+}
 // ── 0-pre. Drop any stale upgrade notice (#738, #743) ───────────────────────
 // `upgrade-notice.json` is a transient handshake between launcher and
 // statusline — it should never survive past the launcher run that wrote it.
@@ -201,6 +210,39 @@ try {
   unlinkSync(join(mofloDir(projectRoot), 'upgrade-notice.json'));
 } catch { /* non-fatal — file usually doesn't exist */ }
+// ── 0-bootstrap-sentinel. Surface partial-bootstrap failures (#975) ─────────
+// `scripts/post-install-bootstrap.mjs` writes `.moflo/bootstrap-failed.json`
+// when its file-sync left some helpers unwritten (WSL DrvFs lock, EBUSY race,
+// breaker open, …). Without this block the user has no in-session signal
+// that the upgrade was incomplete — the launcher itself ran fine, but it's
+// running from STALE files. Emit a high-visibility line pointing them at
+// the healer so the silent failure mode that produced #975 can't recur.
+// Section 3h below clears the sentinel after a clean re-sync.
+//
+// Also write a `kind: 'repair'` upgrade-notice so the statusline surfaces
+// the prompt persistently — emitWarning lands on stderr only and Claude Code
+// relays it once on session start; the statusline keeps the indicator in
+// front of the user until §3h flips it to `completed` (sync resolved) or
+// the 5-min in-progress TTL expires (visibility cap, statusline tests).
+let bootstrapSentinelData = null;
+const BOOTSTRAP_SENTINEL_PATH = resolve(mofloDir(projectRoot), 'bootstrap-failed.json');
+let bootstrapNoticeContext = null;
+try {
+  if (existsSync(BOOTSTRAP_SENTINEL_PATH)) {
+    bootstrapSentinelData = JSON.parse(readFileSync(BOOTSTRAP_SENTINEL_PATH, 'utf-8'));
+    const count = Array.isArray(bootstrapSentinelData?.failures) ? bootstrapSentinelData.failures.length : 0;
+    const sentinelVersion = bootstrapSentinelData?.mofloVersion || 'unknown';
+    emitWarning(
+      `Upgrade detected ${count} unfinished install step(s) from npm install (moflo@${sentinelVersion}). Run /healer --fix to repair.`,
+    );
+    bootstrapNoticeContext = { kind: 'repair', from: sentinelVersion, to: sentinelVersion };
+    buildAndWriteNotice(bootstrapNoticeContext, 'in-progress');
+  }
+} catch (err) {
+  // Unreadable sentinel — leave it; healer will catch the underlying issue.
+  emitWarning(`bootstrap sentinel read skipped (${errMessage(err)})`);
+}
 // ── 0. Legacy whole-DB / directory migrations have been retired (#851) ─────
 // LEGACY-V2: Pre-#851 the launcher renamed `.claude-flow/` → `.moflo/` and
 // byte-copied `.swarm/memory.db` → `.moflo/moflo.db` on every session start.
@@ -518,65 +560,21 @@ try {
       // pre-upgrade content forever because it was never recorded in the
       // manifest. Surface failures on stderr — Claude Code captures
       // session-start stderr as additionalContext so the user sees them too.
-      const syncFailures = [];
-      // Standard retry with exponential backoff + circuit breaker for the
-      // transient error class (EBUSY / EPERM / EACCES — Windows file lock,
-      // AV real-time scan, concurrent helper invocation). Hard errors
-      // (ENOENT, etc.) fall through immediately. Once 5 distinct files have
-      // exhausted retries the circuit opens and the tail of the sync runs
-      // with maxAttempts=1 so a sick host (AV mid-scan over node_modules)
-      // doesn't compound the wall-clock cost. Async setTimeout — never
-      // busy-wait in a session-start hook (CPU pinning during EBUSY backoff
-      // is the worst possible response when the OS is the bottleneck).
-      const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);
-      const RETRY_BACKOFF_MS = [50, 200, 800];
-      const CIRCUIT_BREAK_THRESHOLD = 5;
-      let circuitOpen = false;
-      const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
-      async function syncWithRetry(operation) {
-        const maxAttempts = circuitOpen ? 1 : RETRY_BACKOFF_MS.length + 1;
-        let lastErr = null;
-        let lastCode = null;
-        for (let attempt = 0; attempt < maxAttempts; attempt++) {
-          if (attempt > 0) await sleep(RETRY_BACKOFF_MS[attempt - 1]);
-          try {
-            operation();
-            return { ok: true };
-          } catch (err) {
-            lastErr = err;
-            lastCode = err && err.code ? err.code : null;
-            if (!TRANSIENT_CODES.has(lastCode)) break;
-          }
-        }
-        if (!circuitOpen && syncFailures.length + 1 >= CIRCUIT_BREAK_THRESHOLD) {
-          circuitOpen = true;
-        }
-        return { ok: false, err: lastErr, code: lastCode };
-      }
-      /** Copy src → dest if src exists, record `{path, size}` in manifest.
-       * Retries the transient error class with backoff (#854); failures land
-       * in syncFailures for the post-block stderr summary. The recorded size
-       * is read from the just-written destination so a subsequent launcher
-       * can detect content drift via size mismatch. */
+      //
+      // Retry/breaker semantics (#854) + hash-skip + atomic tmp+rename + post-
+      // write verify (#975) live in `./lib/file-sync.mjs`, shared with
+      // `scripts/post-install-bootstrap.mjs` so the npm-install path and the
+      // session-start path can't drift. The launcher records manifest entries
+      // on success via the onSuccess callback so currentManifest stays the
+      // single source of truth for next-session retired-file cleanup.
       function recordManifestEntry(manifestKey, dest) {
         let size = null;
         try { size = statSync(dest).size; } catch { /* size left null — drift check still works on file-existence */ }
         currentManifest.push({ path: manifestKey, size });
       }
-      async function syncFile(src, dest, manifestKey) {
-        if (!existsSync(src)) return;
-        const result = await syncWithRetry(() => copyFileSync(src, dest));
-        if (result.ok) {
-          recordManifestEntry(manifestKey, dest);
-          return;
-        }
-        const tail = TRANSIENT_CODES.has(result.code)
-          ? ` (retried ${RETRY_BACKOFF_MS.length}× after ${result.code}${circuitOpen ? '; circuit open' : ''})`
-          : '';
-        syncFailures.push({ key: manifestKey, message: `${errMessage(result.err)}${tail}` });
-      }
+      const { syncFile, failures: syncFailures } = makeSyncer({
+        onSuccess: (key, dest) => recordManifestEntry(key, dest),
+      });
       // Version changed — sync scripts from bin/
       if (autoUpdateConfig.scripts) {
@@ -663,20 +661,11 @@ try {
           for (const srcDir of helperSources) {
             const src = resolve(srcDir, file);
             if (existsSync(src)) {
-              const inlineResult = await syncWithRetry(() => copyFileSync(src, dest));
-              if (inlineResult.ok) {
-                recordManifestEntry(`.claude/helpers/${file}`, dest);
-              } else {
-                const code = inlineResult.code;
-                const tail = TRANSIENT_CODES.has(code)
-                  ? ` (retried ${RETRY_BACKOFF_MS.length}× after ${code}${circuitOpen ? '; circuit open' : ''})`
-                  : '';
-                syncFailures.push({
-                  key: `.claude/helpers/${file}`,
-                  message: `${errMessage(inlineResult.err)}${tail}`,
-                });
-              }
-              break; // first source wins
+              // First existing source wins — same semantics as before. The
+              // shared syncFile records manifest + collects failures the
+              // same way the rest of section 3 does.
+              await syncFile(src, dest, `.claude/helpers/${file}`);
+              break;
             }
           }
         }
@@ -1314,6 +1303,15 @@ try {
           'review defaults — model routing, sandbox, gates, hooks',
         );
       }
+    } else {
+      // Previously a silent skip — masked the actual reason consumers didn't
+      // get a yaml after upgrading from pre-#895 versions. If neither template
+      // path resolves the install is incomplete (partial extract, prune ate
+      // the file, dogfood without a built dist/). Surface a healer hint so
+      // the user can repair instead of staring at a missing yaml.
+      emitWarning(
+        `moflo.yaml create skipped — template not found at ${tplPaths.join(' or ')}; run 'flo doctor --fix' to repair`,
+      );
     }
   }
 } catch (err) {
@@ -1494,6 +1492,41 @@ if (pendingVersionStampWrite) {
   }
 }
+// ── 3h. Clear bootstrap sentinel if section-3 sync resolved it (#975) ───────
+// Section 3 above re-attempts the same file copies the bootstrap was supposed
+// to do, with the launcher's own retry logic. If after section 3 every file
+// the bootstrap reported as failed is now byte-identical to its source, the
+// previously-unfinished work is done — drop the sentinel so the warning at
+// section 0-bootstrap-sentinel doesn't fire on the next session. If anything
+// is still mismatched, leave the sentinel in place; healer / next session
+// will re-attempt.
+if (bootstrapSentinelData?.failures?.length > 0) {
+  try {
+    const allRepaired = bootstrapSentinelData.failures.every((f) => {
+      if (!f?.src || !f?.dest) return false;
+      // contentEqual already does the size short-circuit + SHA-1 hash and
+      // is the same predicate the §3 sync used to decide whether to skip
+      // the copy in the first place — reusing it here keeps "the sentinel
+      // is clearable when bytes match" consistent across both code paths.
+      return contentEqual(f.src, f.dest);
+    });
+    if (allRepaired) {
+      unlinkSync(BOOTSTRAP_SENTINEL_PATH);
+      emitMutation('cleared bootstrap-failed sentinel', 'previously-failed copies are now in sync');
+      // Flip the §0-bootstrap-sentinel "in-progress" repair notice to
+      // "completed" so the statusline shows the post-repair badge. Skip when
+      // §3 already wrote its own upgradeNoticeContext (version bump / drift)
+      // — that path runs the §3f writer with its own kind/version and we
+      // shouldn't clobber it from here.
+      if (bootstrapNoticeContext && !upgradeNoticeContext) {
+        buildAndWriteNotice(bootstrapNoticeContext, 'completed');
+      }
+    }
+  } catch (err) {
+    emitWarning(`bootstrap sentinel verify skipped (${errMessage(err)})`);
+  }
+}
 // Bypasses emitMutation — framing, not a mutation, so it must not inflate the count.
 if (mutationCount > 0) {
   try {
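
The notice written by `buildAndWriteNotice` pairs a status with a TTL-derived expiry. A pure sketch of that shape, with the clock injected for testing, might look like the following. The `buildNotice` name and the `now` parameter are assumptions; the diff elides the lines that write the file, so file I/O is left out and only the fields visible in the hunk are reproduced.

```javascript
// TTL constants as they appear in the launcher.
const UPGRADE_NOTICE_INPROGRESS_TTL_MS = 5 * 60 * 1000;
const UPGRADE_NOTICE_COMPLETED_TTL_MS = 2 * 60 * 1000;

// Hypothetical pure variant: returns the notice object instead of writing it.
// `now` is injectable so the expiry arithmetic can be checked deterministically.
function buildNotice(context, status, now = Date.now()) {
  if (!context) return null; // mirrors the early return when no context is set
  const ttlMs = status === 'completed'
    ? UPGRADE_NOTICE_COMPLETED_TTL_MS
    : UPGRADE_NOTICE_INPROGRESS_TTL_MS;
  return {
    status,
    kind: context.kind,
    from: context.from,
    to: context.to,
    at: new Date(now).toISOString(),
    expiresAt: new Date(now + ttlMs).toISOString(),
    changes: 0,
  };
}
```

For the bootstrap-sentinel path, `from` and `to` carry the same version, so a `repair` notice differs from an upgrade notice only in `kind`.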

package/dist/src/cli/commands/doctor-checks-runtime.js CHANGED

@@ -147,6 +147,84 @@ export async function checkClaudeCode() {
         };
     }
 }
+/**
+ * Delegate diagnostics to Claude Code's own `claude doctor` command and surface
+ * the result. Catches Claude-side issues (settings drift, MCP/auth, IDE/extension
+ * state, update channel) that moflo's own checks can't see — since `claude` is
+ * not a moflo-owned binary we don't try to parse its output structurally; we
+ * just report exit code + a short tail. Skip silently when `claude` isn't
+ * installed — `checkClaudeCode` already covers that condition.
+ */
+export async function checkClaudeCodeDoctor() {
+    try {
+        await runCommand('claude --version', 3000);
+    }
+    catch {
+        return {
+            name: 'Claude Code Doctor',
+            status: 'pass',
+            message: 'Skipped (claude CLI not installed — see Claude Code CLI check)',
+        };
+    }
+    // Capture both streams + exit code without throwing. `claude doctor` exits
+    // non-zero on findings, so a try/catch over execAsync would lose the body.
+    const result = await new Promise((resolve) => {
+        const child = exec('claude doctor', {
+            encoding: 'utf8',
+            timeout: 30000,
+            shell: process.platform === 'win32' ? 'cmd.exe' : '/bin/sh',
+            env: { ...process.env },
+            windowsHide: true,
+        }, (err, stdout, stderr) => {
+            resolve({
+                code: err && typeof err.code === 'number'
+                    ? (err.code)
+                    : (err ? 1 : 0),
+                stdout: (stdout || '').toString().trim(),
+                stderr: (stderr || '').toString().trim(),
+            });
+        });
+        child.on('error', () => resolve({ code: 1, stdout: '', stderr: '' }));
+    });
+    // claude doctor not recognised → some Claude versions don't ship the
+    // subcommand. Surface as a pass-skip rather than a failure so older Claude
+    // installs aren't penalised.
+    const combined = `${result.stdout}\n${result.stderr}`.toLowerCase();
+    if (/unknown command|command not found|usage:.*claude/.test(combined) &&
+        !combined.includes('check')) {
+        return {
+            name: 'Claude Code Doctor',
+            status: 'pass',
+            message: 'Skipped (this Claude version does not expose `claude doctor`)',
+        };
+    }
+    if (result.code === 0) {
+        const firstLine = result.stdout.split(/\r?\n/).find((l) => l.trim()) || 'No issues reported';
+        return { name: 'Claude Code Doctor', status: 'pass', message: firstLine.slice(0, 120) };
+    }
+    // Non-zero with zero output → `claude doctor` is interactive in current
+    // Claude Code releases (verified on 2.1.132): it opens a TUI and produces
+    // nothing on a non-TTY child stdout, then our exec timeout kills it. Treat
+    // as a skip — the check can't observe the TUI from here, and warning would
+    // fire on every machine running the same Claude version.
+    if (!result.stdout && !result.stderr) {
+        return {
+            name: 'Claude Code Doctor',
+            status: 'pass',
+            message: 'Skipped (claude doctor is interactive — run manually to see findings)',
+        };
+    }
+    // Non-zero — surface the tail so the user has a hint, and point to the
+    // interactive command for the full report. Don't try to fix from here:
+    // Claude-side fixes (re-auth, settings repair, IDE reload) need user gestures.
+    const tailLines = result.stdout.split(/\r?\n/).filter((l) => l.trim()).slice(-3).join(' | ');
+    return {
+        name: 'Claude Code Doctor',
+        status: 'warn',
+        message: `claude doctor reported issues: ${tailLines.slice(0, 200)}`,
+        fix: 'Run `claude doctor` interactively for full report and follow its instructions',
+    };
+}
 export async function installClaudeCode() {
     try {
         output.writeln();
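
The branch order in `checkClaudeCodeDoctor` matters: the unsupported-subcommand test runs before the exit-code test, and the empty-output test runs before the warning. The sketch below isolates that decision as a pure function. The `classifyClaudeDoctor` name and the returned labels are hypothetical; in the diff both skip cases are reported with `status: 'pass'`.

```javascript
// Hypothetical pure classifier mirroring the branch order in the diff.
// Input: the { code, stdout, stderr } object resolved by the exec wrapper.
function classifyClaudeDoctor({ code, stdout, stderr }) {
  const combined = `${stdout}\n${stderr}`.toLowerCase();

  // 1. Subcommand missing in this Claude version: skip, do not fail.
  if (/unknown command|command not found|usage:.*claude/.test(combined) &&
      !combined.includes('check')) {
    return 'skip-unsupported';
  }
  // 2. Clean exit: pass.
  if (code === 0) return 'pass';
  // 3. Non-zero with no output at all: the interactive TUI case, skip.
  if (!stdout && !stderr) return 'skip-interactive';
  // 4. Non-zero with output: surface as a warning.
  return 'warn';
}
```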

package/dist/src/cli/commands/doctor-fixes.js CHANGED

@@ -5,6 +5,7 @@
  * shell-out where possible). Falls back to running the check's `fix` string
  * if it looks like an `npx`/`npm`/`claude` command.
  */
+import { execSync } from 'child_process';
 import { existsSync, mkdirSync, readFileSync, unlinkSync, writeFileSync } from 'fs';
 import { join } from 'path';
 import { output } from '../output.js';
@@ -139,6 +140,23 @@ export async function autoFixCheck(check) {
                 return false;
             }
         },
+        // moflo.yaml auto-create. The session-start launcher already runs
+        // `ensureMofloYamlExists` (see bin/session-start-launcher.mjs § 3d-yaml-create,
+        // #895) but it can miss when the launcher itself was old at upgrade time —
+        // user reported moflo.yaml absent after npm-installing past 4.9.2. Mirror
+        // the same canonical create here so doctor --fix (and the /healer skill
+        // wrapping it) self-heal on the spot instead of waiting for the next
+        // SessionStart firing.
+        'moflo.yaml': async () => {
+            try {
+                const { ensureMofloYamlExists } = await import('../init/moflo-yaml-template.js');
+                const result = ensureMofloYamlExists(process.cwd());
+                return result.created || existsSync(join(process.cwd(), 'moflo.yaml'));
+            }
+            catch {
+                return false;
+            }
+        },
         'Daemon Status': async () => {
             const lockFile = join(process.cwd(), '.moflo', 'daemon.lock');
             const pidFile = join(process.cwd(), '.moflo', 'daemon.pid');
@@ -157,6 +175,27 @@ export async function autoFixCheck(check) {
         'Claude Code CLI': async () => {
             return installClaudeCode();
         },
+        // Pass-through to Claude Code's own diagnostic. We don't own its CLI surface
+        // and most Claude-side findings (auth, IDE reload, settings drift) need
+        // user gestures, so the "fix" here is just to re-run with inherited stdio
+        // and let the user act on what they see.
+        'Claude Code Doctor': async () => {
+            try {
+                execSync('claude doctor', {
+                    encoding: 'utf8',
+                    stdio: 'inherit',
+                    windowsHide: true,
+                    timeout: 60000,
+                });
+                return true;
+            }
+            catch {
+                // Non-zero exit is informational here — user has seen the output and
+                // can act on it. Don't claim success, but don't claim failure of OUR
+                // healer either; flag as "needs manual action".
+                return false;
+            }
+        },
         'Zombie Processes': async () => {
             const result = await findZombieProcesses(true);
             return result.killed > 0 || result.details.length === 0;
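
Note on the `'moflo.yaml'` fixer in the hunk above: it reports success when the file was just created or when it already exists, so running `doctor --fix` twice is harmless. The sketch below illustrates that idempotent pattern only. `makeMemoryFs`, `ensureFile`, and `fixMissingFile` are hypothetical names, and the in-memory file system is a stand-in so the example runs without touching disk; the real `ensureMofloYamlExists` is not shown in this diff and may behave differently.

```javascript
// Hypothetical in-memory stand-in for the two fs calls the pattern needs.
function makeMemoryFs() {
  const files = new Map();
  return {
    existsSync: (p) => files.has(p),
    writeFileSync: (p, data) => { files.set(p, data); },
  };
}

// Hypothetical "ensure" helper: create the file only when it is missing.
function ensureFile(fsLike, path, contents) {
  if (fsLike.existsSync(path)) return { created: false };
  fsLike.writeFileSync(path, contents);
  return { created: true };
}

// Fixer in the same shape as the diff: true if created OR already present,
// false if anything throws.
function fixMissingFile(fsLike, path, contents) {
  try {
    const result = ensureFile(fsLike, path, contents);
    return result.created || fsLike.existsSync(path);
  } catch {
    return false;
  }
}
```

Calling `fixMissingFile` a second time on the same in-memory file system still returns `true`, because the existence check covers the "nothing to do" case.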

package/dist/src/cli/commands/doctor-registry.js CHANGED

@@ -8,7 +8,7 @@ import { checkSubagentHealth, checkSpellExecution, checkMcpToolInvocation, check
 import { checkEmbeddingHygiene } from './doctor-embedding-hygiene.js';
 import { checkSwarmFunctional, checkHiveMindFunctional, } from './doctor-checks-swarm.js';
 import { checkMemoryAccessFunctional } from './doctor-checks-memory-access.js';
-import { checkBuildTools, checkClaudeCode, checkDiskSpace, checkGit, checkGitRepo, checkNodeVersion, checkNpmVersion, } from './doctor-checks-runtime.js';
+import { checkBuildTools, checkClaudeCode, checkClaudeCodeDoctor, checkDiskSpace, checkGit, checkGitRepo, checkNodeVersion, checkNpmVersion, } from './doctor-checks-runtime.js';
 import { checkConfigFile, checkDaemonStatus, checkMcpServers, checkMemoryDatabase, checkMofloYamlCompliance, checkStatusLine, checkTestDirs, } from './doctor-checks-config.js';
 import { checkSpellEngine, checkSandboxTier } from './doctor-checks-platform.js';
 import { checkEmbeddings, checkSemanticQuality, } from './doctor-checks-memory.js';
@@ -21,6 +21,7 @@ export const allChecks = [
     checkNodeVersion,
     checkNpmVersion,
     checkClaudeCode,
+    checkClaudeCodeDoctor,
     checkGit,
     checkGitRepo,
     checkConfigFile,
@@ -63,6 +64,8 @@ export const componentMap = {
     'node': checkNodeVersion,
     'npm': checkNpmVersion,
     'claude': checkClaudeCode,
+    'claude-doctor': checkClaudeCodeDoctor,
+    'claude-code-doctor': checkClaudeCodeDoctor,
     'config': checkConfigFile,
     'yaml': checkMofloYamlCompliance,
     'moflo-yaml': checkMofloYamlCompliance,
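
Note on the `componentMap` hunk above: two keys (`claude-doctor` and `claude-code-doctor`) point at the same check function, so either spelling selects the same check. The sketch below shows that alias pattern in isolation. The `resolveCheck` helper and the stub return value are assumptions for illustration; how `flo doctor -c <component>` actually looks up the map is not shown in this diff.

```javascript
// Stub check. The real checkClaudeCodeDoctor lives in doctor-checks-runtime.js
// and its return shape is not shown in this diff.
function checkClaudeCodeDoctor() {
  return { name: 'Claude Code Doctor', status: 'pass' };
}

// Two aliases, one function reference (mirrors the added lines in the hunk).
const componentMap = {
  'claude-doctor': checkClaudeCodeDoctor,
  'claude-code-doctor': checkClaudeCodeDoctor,
};

// Hypothetical resolver: own-property lookup so inherited keys such as
// "toString" are not mistaken for component names.
function resolveCheck(component) {
  return Object.prototype.hasOwnProperty.call(componentMap, component)
    ? componentMap[component]
    : null;
}
```

Because both aliases share one function reference, the map stays in sync without extra bookkeeping when the check itself changes.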

package/dist/src/cli/init/executor.js CHANGED

@@ -33,6 +33,7 @@ export const SKILLS_MAP = {
     core: [
         'eldar',
         'guidance',
+        'healer',
         'flo-simplify',
         'reasoningbank-intelligence',
     ],

package/dist/src/cli/version.js CHANGED

@@ -2,5 +2,5 @@
  * Auto-generated by build. Do not edit manually.
  * Source of truth: root package.json → scripts/sync-version.mjs
  */
-export const VERSION = '4.9.24';
+export const VERSION = '4.9.25';
 //# sourceMappingURL=version.js.map

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "moflo",
-  "version": "4.9.24",
+  "version": "4.9.25",
   "description": "MoFlo — AI agent orchestration for Claude Code. A standalone, opinionated toolkit with semantic memory, learned routing, gates, spells, and the /flo issue-execution skill.",
   "main": "dist/src/cli/index.js",
   "type": "module",
@@ -84,7 +84,7 @@
     "@typescript-eslint/eslint-plugin": "^7.18.0",
     "@typescript-eslint/parser": "^7.18.0",
     "eslint": "^8.0.0",
-    "moflo": "^4.9.23",
+    "moflo": "^4.9.24",
     "tsx": "^4.21.0",
     "typescript": "^5.9.3",
     "vitest": "^4.0.0"

package/scripts/post-install-bootstrap.mjs CHANGED

@@ -37,15 +37,16 @@
  */
 import {
-  copyFileSync,
   existsSync,
   mkdirSync,
   readdirSync,
   readFileSync,
   statSync,
+  writeFileSync,
 } from 'node:fs';
 import { dirname, join, resolve } from 'node:path';
 import { fileURLToPath, pathToFileURL } from 'node:url';
+import { errMessage, makeSyncer } from '../bin/lib/file-sync.mjs';
 const SCRIPT_PATH = fileURLToPath(import.meta.url);
 const MOFLO_ROOT = resolve(dirname(SCRIPT_PATH), '..');
@@ -87,68 +88,12 @@ export const SOURCE_HELPER_FILES = [
   'post-commit',
 ];
-// ── Retry + circuit breaker (#854 contract) ──────────────────────────────────
+// ── Retry + atomic copy + circuit breaker (#854 / #975) ─────────────────────
 //
-// Mirrors the launcher's syncWithRetry. Backoff [50,200,800]ms covers Windows
-// EBUSY windows from concurrent helper invocation + AV real-time scan. The
-// breaker opens after 5 distinct files exhaust retries so a sick host
-// (AV mid-scan over node_modules) doesn't compound wall-clock cost.
-const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);
-const RETRY_BACKOFF_MS = [50, 200, 800];
-const CIRCUIT_BREAK_THRESHOLD = 5;
-const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
-function makeSyncer() {
-  let circuitOpen = false;
-  const failures = [];
-  async function syncWithRetry(operation) {
-    const maxAttempts = circuitOpen ? 1 : RETRY_BACKOFF_MS.length + 1;
-    let lastErr = null;
-    let lastCode = null;
-    for (let attempt = 0; attempt < maxAttempts; attempt++) {
-      if (attempt > 0) await sleep(RETRY_BACKOFF_MS[attempt - 1]);
-      try {
-        operation();
-        return { ok: true };
-      } catch (err) {
-        lastErr = err;
-        lastCode = err && err.code ? err.code : null;
-        if (!TRANSIENT_CODES.has(lastCode)) break;
-      }
-    }
-    if (!circuitOpen && failures.length + 1 >= CIRCUIT_BREAK_THRESHOLD) {
-      circuitOpen = true;
-    }
-    return { ok: false, err: lastErr, code: lastCode };
-  }
-  async function syncFile(src, dest, manifestKey) {
-    if (!existsSync(src)) return { skipped: true };
-    try {
-      mkdirSync(dirname(dest), { recursive: true });
-    } catch (err) {
-      failures.push({ key: manifestKey, message: errMessage(err) });
-      return { ok: false };
-    }
-    const result = await syncWithRetry(() => copyFileSync(src, dest));
-    if (result.ok) return { ok: true };
-    const tail = TRANSIENT_CODES.has(result.code)
-      ? ` (retried ${RETRY_BACKOFF_MS.length}× after ${result.code}${circuitOpen ? '; circuit open' : ''})`
-      : '';
-    failures.push({ key: manifestKey, message: `${errMessage(result.err)}${tail}` });
-    return { ok: false };
-  }
-  return { syncFile, failures };
-}
-function errMessage(err) {
-  if (!err) return 'unknown error';
-  return err.code ? `${err.code} ${err.message || ''}`.trim() : (err.message || String(err));
-}
+// Implementation lives in `bin/lib/file-sync.mjs` so the launcher's section 3
+// shares the same hash-skip + atomic + verify path. Backoff [50,200,800]ms
+// covers Windows EBUSY windows from concurrent helper invocation + AV scan;
+// breaker opens after 5 distinct exhausted-retry failures.
 // ── Project root discovery ──────────────────────────────────────────────────
 //
@@ -294,6 +239,37 @@ export async function runBootstrap({
     log(
       `moflo: postinstall bootstrap left ${failures.length} file(s) unsynced — run 'flo doctor --fix' to repair:\n${sample}${more}`,
     );
+    // #975: write a sentinel that session-start picks up so the user gets a
+    // visible "upgrade left work undone" prompt instead of a silent stale
+    // launcher. The bootstrap's stderr alone is buried in `npm install`
+    // output noise. Best-effort write — we never block install on this.
+    try {
+      const mofloDir = resolve(projectRoot, '.moflo');
+      mkdirSync(mofloDir, { recursive: true });
+      let mofloVersion = 'unknown';
+      try {
+        const pkgPath = resolve(mofloRoot, 'package.json');
+        if (existsSync(pkgPath)) {
+          mofloVersion = JSON.parse(readFileSync(pkgPath, 'utf-8')).version || 'unknown';
+        }
+      } catch { /* version is informational only */ }
+      const sentinel = {
+        timestamp: new Date().toISOString(),
+        mofloVersion,
+        failures: failures.map((f) => ({
+          key: f.key,
+          message: f.message,
+          src: f.src,
+          dest: f.dest,
+        })),
+      };
+      writeFileSync(
+        resolve(mofloDir, 'bootstrap-failed.json'),
+        JSON.stringify(sentinel, null, 2),
+        'utf-8',
+      );
+    } catch { /* sentinel write must not block install */ }
   }
   return { ran: true, synced, failed: failures.length, failures };