moflo 4.9.24 → 4.9.25

@@ -0,0 +1,51 @@
1
+ ---
2
+ name: healer
3
+ description: Run moflo's Healer (`flo healer`, alias for `flo doctor`) from inside the Claude session. Audit-only by default; pass `--fix` to apply auto-repairs, `-c <component>` for a single check. Use when something feels off (missing moflo.yaml, daemon dead, statusline empty, hooks not firing) or as a periodic health check. Distinct from Claude Code's built-in `/doctor`, which diagnoses Claude Code itself, not moflo.
4
+ arguments: "[--fix] [-c <component>]"
5
+ ---
6
+
7
+ # /healer — moflo Installation Healer
8
+
9
+ Thin wrapper around the `flo healer` CLI. All check + fix logic lives in the CLI; this skill just shells out, surfaces results in-thread, and gives one-line follow-up nudges.
10
+
11
+ **Arguments:** $ARGUMENTS
12
+
13
+ ## Procedure
14
+
15
+ 1. **Memory first** (gate requirement):
16
+ ```
17
+ mcp__moflo__memory_search { query: "doctor healer fix moflo.yaml gate hook wiring", namespace: "guidance" }
18
+ ```
19
+
20
+ 2. **Run the CLI** with the user's arguments passed through:
21
+ ```bash
22
+ npx moflo healer --json $ARGUMENTS
23
+ ```
24
+ - No args → audit-only.
25
+ - `--fix` → CLI runs auto-repairs after the audit.
26
+ - `-c <component>` → restricts to one check.
27
+ - Always include `--json` so output is machine-parseable.
28
+
29
+ 3. **Surface the JSON in-thread**. Group by status:
30
+ - `✓ N passing` (count only)
31
+ - `⚠ warnings` — list `name: message`; flag with `[auto-fixable]` when the result has a `fix` field
32
+ - `✗ failures` — same
33
+ - If `--fix` mode, also list which fixes were applied vs which need manual action.
34
+
35
+ 4. **Nudge based on what changed.** Only mention next steps for state that *actually* changed:
36
+ - Daemon restarted → `Statusline should refresh within ~5s.`
37
+ - `moflo.yaml` created → `Review the new defaults at the project root before your next deep run.`
38
+ - Hook wiring repaired → `Restart Claude Code so the new SessionStart hook fires next launch.`
39
+ - In audit-only mode with auto-fixable issues → `Run /healer --fix to repair.`
40
+
41
+ ## Rules
42
+
43
+ - **Don't** re-document checks or fixes here. The CLI's `--help` and `src/cli/commands/doctor-*` are the source of truth.
44
+ - **Don't** call `flo doctor` directly — use the `healer` alias for thematic consistency. They're equivalent CLI-side.
45
+ - **Don't** swallow non-zero exit codes silently — surface them in the summary.
46
+ - **Note for users:** Claude Code has its own built-in `/doctor` command that diagnoses Claude Code itself. This skill (`/healer`) diagnoses **moflo**, not Claude Code. The two are complementary, not duplicates — and the healer also runs `claude doctor` internally as a delegated check (`Claude Code Doctor`) so Claude-side issues (auth, settings drift, IDE/extension state) surface in the same report. With `--fix`, the healer re-runs `claude doctor` interactively so you can see and act on its findings; Claude-side fixes typically need user gestures (re-auth, IDE reload) and aren't auto-applied.
47
+
48
+ ## See Also
49
+
50
+ - `flo doctor --help` — full flag/component list
51
+ - `/eldar` — broader project-setup audit; consults the Healer as one input
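Step 3 of the `/healer` procedure above (group by status, flag auto-fixable results) could be sketched roughly as follows. This is a minimal illustration, not the skill's implementation: `summarizeHealerResults` is a hypothetical helper, the input is assumed to be an array of `{ name, status, message, fix? }` objects like the ones the doctor checks in this diff return, and the `'fail'` status value is an assumption (only `'pass'` and `'warn'` appear in this diff). The real `--json` output shape is not shown here and may differ.

```javascript
// Hypothetical helper: groups doctor/healer check results by status.
// Assumes each result looks like { name, status, message, fix? }.
function summarizeHealerResults(results) {
  const describe = (r) =>
    `${r.name}: ${r.message}${r.fix ? ' [auto-fixable]' : ''}`;
  return {
    // Passing checks are reported as a count only.
    passing: results.filter((r) => r.status === 'pass').length,
    warnings: results.filter((r) => r.status === 'warn').map(describe),
    // 'fail' is an assumed status value, not confirmed by the diff.
    failures: results.filter((r) => r.status === 'fail').map(describe),
  };
}

// Sample input; names and messages are made up for the example.
const summary = summarizeHealerResults([
  { name: 'Sample Check', status: 'pass', message: 'ok' },
  { name: 'Claude Code Doctor', status: 'warn', message: 'reported issues', fix: 'Run `claude doctor`' },
  { name: 'moflo.yaml', status: 'fail', message: 'missing' },
]);
console.log(summary);
```

Keeping the grouping in one small function makes it easy to append the "applied vs. needs manual action" split for `--fix` runs without touching how individual checks are described.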
@@ -93,19 +93,15 @@ Skipped by default — `ci.yml` runs the full test suite on every PR. Run when `
93
93
 
94
94
  **Must have 0 test file failures.** If any test files fail, retest them individually to distinguish real failures from flaky ones (per broken windows theory). Fix all real failures before proceeding.
95
95
 
96
- ### Step 5: Doctor (always)
96
+ ### Step 5: Doctor (always — strict, no repair)
97
97
 
98
- Default mode:
99
- ```bash
100
- npx moflo doctor --fix
101
- ```
102
-
103
- Check mode (`CHECK_MODE=true`):
104
98
  ```bash
105
99
  npx moflo doctor --strict
106
100
  ```
107
101
 
108
- Doctor is the only check with no CI equivalent — it inspects local state (daemon lock, embeddings hygiene, sandbox tier, vector-stats freshness) that CI cannot validate for you. Always runs.
102
+ Doctor is the only check with no CI equivalent — it inspects local state (daemon lock, embeddings hygiene, sandbox tier, vector-stats freshness) that CI cannot validate for you. Always runs in `--strict` mode regardless of `CHECK_MODE`.
103
+
104
+ **Never `--fix` on the publish path.** A release pipeline must fail fast on broken local state, not silently repair it; a doctor that auto-repairs masks the very signal we want — "something is off, stop and investigate before shipping." If `doctor --strict` fails, stop and run `flo healer --fix` (or `npx moflo doctor --fix`) interactively, verify the repair, then retry the publish.
109
105
 
110
106
  ### Step 6: Smoke Tests (only if `CHECK_MODE=true`)
111
107
 
@@ -0,0 +1,200 @@
1
+ /**
2
+ * Shared file-sync helper for the launcher (#854 §3) and the postinstall
3
+ * bootstrap (#857 / #975).
4
+ *
5
+ * Both layers used to inline the same retry/breaker + copy logic. They drifted
6
+ * once already (the bootstrap added hash-skip + atomic tmp+rename for #975
7
+ * while the launcher kept the bare copyFileSync), so this module is the single
8
+ * source of truth.
9
+ *
10
+ * Responsibilities:
11
+ * - hash-skip when src and dest are byte-identical (eliminates the dominant
12
+ * failure class — overwriting an unchanged file open by Claude/indexer).
13
+ * - atomic tmp + rename so concurrent readers never see a torn write.
14
+ * - post-write size verify to catch torn writes from AV mid-stream and
15
+ * partial DrvFs writes that returned success codes.
16
+ * - retry the transient error class (EBUSY/EPERM/EACCES + EVERIFY) with
17
+ * exponential backoff [50,200,800]ms.
18
+ * - circuit-break after CIRCUIT_BREAK_THRESHOLD distinct exhausted-retry
19
+ * failures so a sick host (AV mid-scan over node_modules) doesn't compound
20
+ * wall-clock cost.
21
+ *
22
+ * Ships at `bin/lib/file-sync.mjs`. Bootstrap imports via relative path from
23
+ * `scripts/`; launcher imports via `./lib/file-sync.mjs` after sync to
24
+ * `<consumer>/.claude/scripts/lib/`.
25
+ */
26
+
27
+ import {
28
+ copyFileSync,
29
+ existsSync,
30
+ mkdirSync,
31
+ readFileSync,
32
+ renameSync,
33
+ statSync,
34
+ unlinkSync,
35
+ } from 'node:fs';
36
+ import { createHash } from 'node:crypto';
37
+ import { dirname } from 'node:path';
38
+
39
+ export const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);
40
+ export const RETRY_BACKOFF_MS = [50, 200, 800];
41
+ export const CIRCUIT_BREAK_THRESHOLD = 5;
42
+
43
+ // Code attached to the post-write size-verify failure. Treated as transient by
44
+ // syncWithRetry so torn writes from AV mid-stream / partial DrvFs writes get a
45
+ // retry instead of immediately surfacing as a hard failure.
46
+ export const VERIFY_FAIL_CODE = 'EVERIFY';
47
+
48
+ const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
49
+
50
+ export function fileHash(path) {
51
+ try {
52
+ return createHash('sha1').update(readFileSync(path)).digest('hex');
53
+ } catch {
54
+ return null;
55
+ }
56
+ }
57
+
58
+ export function contentEqual(srcPath, destPath) {
59
+ if (!existsSync(destPath)) return false;
60
+ // Size check first — skips the SHA-1 pass on every mis-sized pair without
61
+ // any I/O on the file body. For the bootstrap's small file set the SHA-1
62
+ // is cheap, but this fires on every file on every session-start with
63
+ // version drift, and under load (AV lock + retries) the reads compound.
64
+ let srcSize, destSize;
65
+ try {
66
+ srcSize = statSync(srcPath).size;
67
+ destSize = statSync(destPath).size;
68
+ } catch {
69
+ return false;
70
+ }
71
+ if (srcSize !== destSize) return false;
72
+ const srcHash = fileHash(srcPath);
73
+ if (!srcHash) return false;
74
+ const destHash = fileHash(destPath);
75
+ return destHash !== null && srcHash === destHash;
76
+ }
77
+
78
+ /**
79
+ * Atomic copy via tmp + rename with post-write size verify.
80
+ *
81
+ * Steps:
82
+ * 1. copyFileSync(src, dest.tmp)
83
+ * 2. Verify dest.tmp size matches src size (catches torn writes from AV
84
+ * mid-stream and partial DrvFs writes that returned success codes).
85
+ * Mismatch unlinks the tmp and throws { code: 'EVERIFY' }, which the
86
+ * retry loop treats as transient.
87
+ * 3. renameSync(dest.tmp, dest) — atomic on Win/macOS/Linux/WSL/DrvFs.
88
+ *
89
+ * If rename fails, the .tmp sidecar persists as a recovery breadcrumb — next
90
+ * session-start can complete the swap once the original lock has cleared.
91
+ *
92
+ * `deps` is dependency injection for tests (#976 fault injection of the
93
+ * truncated-tmp / partial-DrvFs scenario). Production callers omit it.
94
+ */
95
+ export function atomicCopy(src, dest, deps = {}) {
96
+ const _copyFile = deps.copyFile || copyFileSync;
97
+ const _stat = deps.stat || statSync;
98
+ const _rename = deps.rename || renameSync;
99
+ const _unlink = deps.unlink || unlinkSync;
100
+
101
+ const tmp = `${dest}.tmp`;
102
+ _copyFile(src, tmp);
103
+ let srcSize, tmpSize;
104
+ try {
105
+ srcSize = _stat(src).size;
106
+ tmpSize = _stat(tmp).size;
107
+ } catch (statErr) {
108
+ try { _unlink(tmp); } catch { /* best-effort cleanup */ }
109
+ const err = new Error(`atomicCopy verify stat failed: ${statErr.message || statErr}`);
110
+ err.code = statErr.code || VERIFY_FAIL_CODE;
111
+ throw err;
112
+ }
113
+ if (srcSize !== tmpSize) {
114
+ try { _unlink(tmp); } catch { /* best-effort cleanup */ }
115
+ const err = new Error(
116
+ `atomicCopy size mismatch (src=${srcSize} tmp=${tmpSize}) for ${dest}`,
117
+ );
118
+ err.code = VERIFY_FAIL_CODE;
119
+ throw err;
120
+ }
121
+ _rename(tmp, dest);
122
+ }
123
+
124
+ export function errMessage(err) {
125
+ if (!err) return 'unknown error';
126
+ return err.code ? `${err.code} ${err.message || ''}`.trim() : (err.message || String(err));
127
+ }
128
+
129
+ /**
130
+ * Build a retry-aware syncer.
131
+ *
132
+ * @param {object} [options]
133
+ * @param {(key: string, dest: string) => void} [options.onSuccess]
134
+ * Fires after every successful syncFile (including hash-skip identical
135
+ * paths). Use it to record manifest entries from the launcher; bootstrap
136
+ * ignores it.
137
+ *
138
+ * @returns {{
139
+ * syncFile: (src: string, dest: string, key: string) => Promise<{ok?: boolean, skipped?: true | 'identical'}>,
140
+ * failures: Array<{key: string, message: string, src?: string, dest?: string}>,
141
+ * isCircuitOpen: () => boolean,
142
+ * }}
143
+ */
144
+ export function makeSyncer({ onSuccess } = {}) {
145
+ let circuitOpen = false;
146
+ const failures = [];
147
+
148
+ async function syncWithRetry(operation) {
149
+ const maxAttempts = circuitOpen ? 1 : RETRY_BACKOFF_MS.length + 1;
150
+ let lastErr = null;
151
+ let lastCode = null;
152
+ for (let attempt = 0; attempt < maxAttempts; attempt++) {
153
+ if (attempt > 0) await sleep(RETRY_BACKOFF_MS[attempt - 1]);
154
+ try {
155
+ operation();
156
+ return { ok: true };
157
+ } catch (err) {
158
+ lastErr = err;
159
+ lastCode = err && err.code ? err.code : null;
160
+ const transient = TRANSIENT_CODES.has(lastCode) || lastCode === VERIFY_FAIL_CODE;
161
+ if (!transient) break;
162
+ }
163
+ }
164
+ if (!circuitOpen && failures.length + 1 >= CIRCUIT_BREAK_THRESHOLD) {
165
+ circuitOpen = true;
166
+ }
167
+ return { ok: false, err: lastErr, code: lastCode };
168
+ }
169
+
170
+ async function syncFile(src, dest, key) {
171
+ if (!existsSync(src)) return { skipped: true };
172
+ try {
173
+ mkdirSync(dirname(dest), { recursive: true });
174
+ } catch (err) {
175
+ failures.push({ key, message: errMessage(err), src, dest });
176
+ return { ok: false };
177
+ }
178
+ if (contentEqual(src, dest)) {
179
+ try { onSuccess?.(key, dest); } catch { /* non-fatal */ }
180
+ return { ok: true, skipped: 'identical' };
181
+ }
182
+ const result = await syncWithRetry(() => atomicCopy(src, dest));
183
+ if (result.ok) {
184
+ try { onSuccess?.(key, dest); } catch { /* non-fatal */ }
185
+ return { ok: true };
186
+ }
187
+ const transient = TRANSIENT_CODES.has(result.code) || result.code === VERIFY_FAIL_CODE;
188
+ const tail = transient
189
+ ? ` (retried ${RETRY_BACKOFF_MS.length}× after ${result.code}${circuitOpen ? '; circuit open' : ''})`
190
+ : '';
191
+ failures.push({ key, message: `${errMessage(result.err)}${tail}`, src, dest });
192
+ return { ok: false };
193
+ }
194
+
195
+ return {
196
+ syncFile,
197
+ failures,
198
+ isCircuitOpen: () => circuitOpen,
199
+ };
200
+ }
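The retry and circuit-breaker behaviour in `makeSyncer` implies a bounded sleep cost per file. A rough sketch of that arithmetic follows. The constants are copied from the module above, while `isTransient`, `backoffSchedule`, and `worstCaseSleepMs` are hypothetical names introduced only for illustration. The figures cover time spent in `sleep`, not the copy attempts themselves.

```javascript
const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);
const RETRY_BACKOFF_MS = [50, 200, 800];
const CIRCUIT_BREAK_THRESHOLD = 5;
const VERIFY_FAIL_CODE = 'EVERIFY';

// Same predicate syncWithRetry applies before deciding to retry.
function isTransient(code) {
  return TRANSIENT_CODES.has(code) || code === VERIFY_FAIL_CODE;
}

// Delays slept between attempts; an open circuit means a single attempt.
function backoffSchedule(circuitOpen) {
  return circuitOpen ? [] : [...RETRY_BACKOFF_MS];
}

// Upper bound on time spent sleeping for one file that never succeeds.
function worstCaseSleepMs(circuitOpen) {
  return backoffSchedule(circuitOpen).reduce((sum, ms) => sum + ms, 0);
}

const perFileMs = worstCaseSleepMs(false);
const beforeBreakerMs = perFileMs * CIRCUIT_BREAK_THRESHOLD;
console.log(perFileMs, beforeBreakerMs, worstCaseSleepMs(true));
```

Under these assumptions a persistently locked file costs at most 1050 ms of sleeping, and at most five such files (5250 ms in total) pay that cost before the breaker drops later files to a single attempt. Non-transient errors such as `ENOENT` break out of the loop immediately and never sleep.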
@@ -8,13 +8,14 @@
8
8
  */
9
9
 
10
10
  import { spawn, execFileSync } from 'child_process';
11
- import { existsSync, readFileSync, writeFileSync, copyFileSync, unlinkSync, readdirSync, mkdirSync, statSync } from 'fs';
11
+ import { existsSync, readFileSync, writeFileSync, unlinkSync, readdirSync, mkdirSync, statSync } from 'fs';
12
12
  import { resolve, dirname, join } from 'path';
13
13
  import { fileURLToPath, pathToFileURL } from 'url';
14
14
  import { mofloDir } from './lib/moflo-paths.mjs';
15
15
  import { repairMemoryDbIfCorrupt } from './lib/db-repair.mjs';
16
16
  import { resolveMofloBin } from './lib/resolve-bin.mjs';
17
17
  import { applyRetiredPrune } from './lib/retired-files.mjs';
18
+ import { makeSyncer, contentEqual } from './lib/file-sync.mjs';
18
19
 
19
20
  // Headless skip (#860). The daemon's headless workers spawn `claude --print`
20
21
  // with CLAUDE_CODE_HEADLESS=true (see src/cli/services/headless-worker-
@@ -166,8 +167,12 @@ const UPGRADE_NOTICE_INPROGRESS_TTL_MS = 5 * 60 * 1000;
166
167
  const UPGRADE_NOTICE_COMPLETED_TTL_MS = 2 * 60 * 1000;
167
168
  const UPGRADE_NOTICE_PATH = () => join(mofloDir(projectRoot), 'upgrade-notice.json');
168
169
 
169
- function writeUpgradeNotice(status) {
170
- if (!upgradeNoticeContext) return;
170
+ // Single-source-of-truth notice writer. Reused by writeUpgradeNotice (the
171
+ // version-bump / drift-heal path) and the §0-bootstrap-sentinel + §3h paths
172
+ // (#975 statusline-channel promotion). Keeps the JSON shape colocated with
173
+ // the TTL constants instead of letting it drift across two inline copies.
174
+ function buildAndWriteNotice(context, status) {
175
+ if (!context) return;
171
176
  const ttlMs = status === 'completed'
172
177
  ? UPGRADE_NOTICE_COMPLETED_TTL_MS
173
178
  : UPGRADE_NOTICE_INPROGRESS_TTL_MS;
@@ -176,9 +181,9 @@ function writeUpgradeNotice(status) {
176
181
  const now = Date.now();
177
182
  const notice = {
178
183
  status,
179
- kind: upgradeNoticeContext.kind,
180
- from: upgradeNoticeContext.from,
181
- to: upgradeNoticeContext.to,
184
+ kind: context.kind,
185
+ from: context.from,
186
+ to: context.to,
182
187
  at: new Date(now).toISOString(),
183
188
  expiresAt: new Date(now + ttlMs).toISOString(),
184
189
  changes: 0,
@@ -187,6 +192,10 @@ function writeUpgradeNotice(status) {
187
192
  } catch { /* non-fatal — statusline just won't show the segment */ }
188
193
  }
189
194
 
195
+ function writeUpgradeNotice(status) {
196
+ buildAndWriteNotice(upgradeNoticeContext, status);
197
+ }
198
+
190
199
  // ── 0-pre. Drop any stale upgrade notice (#738, #743) ───────────────────────
191
200
  // `upgrade-notice.json` is a transient handshake between launcher and
192
201
  // statusline — it should never survive past the launcher run that wrote it.
@@ -201,6 +210,39 @@ try {
201
210
  unlinkSync(join(mofloDir(projectRoot), 'upgrade-notice.json'));
202
211
  } catch { /* non-fatal — file usually doesn't exist */ }
203
212
 
213
+ // ── 0-bootstrap-sentinel. Surface partial-bootstrap failures (#975) ─────────
214
+ // `scripts/post-install-bootstrap.mjs` writes `.moflo/bootstrap-failed.json`
215
+ // when its file-sync left some helpers unwritten (WSL DrvFs lock, EBUSY race,
216
+ // breaker open, …). Without this block the user has no in-session signal
217
+ // that the upgrade was incomplete — the launcher itself ran fine, but it's
218
+ // running from STALE files. Emit a high-visibility line pointing them at
219
+ // the healer so the silent failure mode that produced #975 can't recur.
220
+ // Section 3h below clears the sentinel after a clean re-sync.
221
+ //
222
+ // Also write a `kind: 'repair'` upgrade-notice so the statusline surfaces
223
+ // the prompt persistently — emitWarning lands on stderr only and Claude Code
224
+ // relays it once on session start; the statusline keeps the indicator in
225
+ // front of the user until §3h flips it to `completed` (sync resolved) or
226
+ // the 5-min in-progress TTL expires (visibility cap, statusline tests).
227
+ let bootstrapSentinelData = null;
228
+ const BOOTSTRAP_SENTINEL_PATH = resolve(mofloDir(projectRoot), 'bootstrap-failed.json');
229
+ let bootstrapNoticeContext = null;
230
+ try {
231
+ if (existsSync(BOOTSTRAP_SENTINEL_PATH)) {
232
+ bootstrapSentinelData = JSON.parse(readFileSync(BOOTSTRAP_SENTINEL_PATH, 'utf-8'));
233
+ const count = Array.isArray(bootstrapSentinelData?.failures) ? bootstrapSentinelData.failures.length : 0;
234
+ const sentinelVersion = bootstrapSentinelData?.mofloVersion || 'unknown';
235
+ emitWarning(
236
+ `Upgrade detected ${count} unfinished install step(s) from npm install (moflo@${sentinelVersion}). Run /healer --fix to repair.`,
237
+ );
238
+ bootstrapNoticeContext = { kind: 'repair', from: sentinelVersion, to: sentinelVersion };
239
+ buildAndWriteNotice(bootstrapNoticeContext, 'in-progress');
240
+ }
241
+ } catch (err) {
242
+ // Unreadable sentinel — leave it; healer will catch the underlying issue.
243
+ emitWarning(`bootstrap sentinel read skipped (${errMessage(err)})`);
244
+ }
245
+
204
246
  // ── 0. Legacy whole-DB / directory migrations have been retired (#851) ─────
205
247
  // LEGACY-V2: Pre-#851 the launcher renamed `.claude-flow/` → `.moflo/` and
206
248
  // byte-copied `.swarm/memory.db` → `.moflo/moflo.db` on every session start.
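The notice that `buildAndWriteNotice` writes can be pictured as a pure function of context, status, and clock. The sketch below covers only the fields visible in this diff (the hunk elides a few lines after `changes: 0`, and those are left out here). `buildNotice` and its `now` parameter are hypothetical, introduced to make the TTL arithmetic easy to check. The version strings in the sample context are placeholders.

```javascript
const UPGRADE_NOTICE_INPROGRESS_TTL_MS = 5 * 60 * 1000;
const UPGRADE_NOTICE_COMPLETED_TTL_MS = 2 * 60 * 1000;

// Hypothetical pure variant: returns the notice instead of writing it to disk.
function buildNotice(context, status, now = Date.now()) {
  if (!context) return null;
  const ttlMs = status === 'completed'
    ? UPGRADE_NOTICE_COMPLETED_TTL_MS
    : UPGRADE_NOTICE_INPROGRESS_TTL_MS;
  return {
    status,
    kind: context.kind,
    from: context.from,
    to: context.to,
    at: new Date(now).toISOString(),
    expiresAt: new Date(now + ttlMs).toISOString(),
    changes: 0,
  };
}

// A 'repair' notice uses the same version for `from` and `to`, as the
// bootstrap-sentinel block does with sentinelVersion.
const fixedNow = Date.UTC(2025, 0, 1);
const repairContext = { kind: 'repair', from: '0.0.0', to: '0.0.0' };
const inProgress = buildNotice(repairContext, 'in-progress', fixedNow);
const completed = buildNotice(repairContext, 'completed', fixedNow);
console.log(inProgress, completed);
```

Passing the clock in explicitly is a choice made for the sketch only. The launcher code reads `Date.now()` directly.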
@@ -518,65 +560,21 @@ try {
518
560
  // pre-upgrade content forever because it was never recorded in the
519
561
  // manifest. Surface failures on stderr — Claude Code captures
520
562
  // session-start stderr as additionalContext so the user sees them too.
521
- const syncFailures = [];
522
-
523
- // Standard retry with exponential backoff + circuit breaker for the
524
- // transient error class (EBUSY / EPERM / EACCES Windows file lock,
525
- // AV real-time scan, concurrent helper invocation). Hard errors
526
- // (ENOENT, etc.) fall through immediately. Once 5 distinct files have
527
- // exhausted retries the circuit opens and the tail of the sync runs
528
- // with maxAttempts=1 so a sick host (AV mid-scan over node_modules)
529
- // doesn't compound the wall-clock cost. Async setTimeout — never
530
- // busy-wait in a session-start hook (CPU pinning during EBUSY backoff
531
- // is the worst possible response when the OS is the bottleneck).
532
- const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);
533
- const RETRY_BACKOFF_MS = [50, 200, 800];
534
- const CIRCUIT_BREAK_THRESHOLD = 5;
535
- let circuitOpen = false;
536
- const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
537
- async function syncWithRetry(operation) {
538
- const maxAttempts = circuitOpen ? 1 : RETRY_BACKOFF_MS.length + 1;
539
- let lastErr = null;
540
- let lastCode = null;
541
- for (let attempt = 0; attempt < maxAttempts; attempt++) {
542
- if (attempt > 0) await sleep(RETRY_BACKOFF_MS[attempt - 1]);
543
- try {
544
- operation();
545
- return { ok: true };
546
- } catch (err) {
547
- lastErr = err;
548
- lastCode = err && err.code ? err.code : null;
549
- if (!TRANSIENT_CODES.has(lastCode)) break;
550
- }
551
- }
552
- if (!circuitOpen && syncFailures.length + 1 >= CIRCUIT_BREAK_THRESHOLD) {
553
- circuitOpen = true;
554
- }
555
- return { ok: false, err: lastErr, code: lastCode };
556
- }
557
-
558
- /** Copy src → dest if src exists, record `{path, size}` in manifest.
559
- * Retries the transient error class with backoff (#854); failures land
560
- * in syncFailures for the post-block stderr summary. The recorded size
561
- * is read from the just-written destination so a subsequent launcher
562
- * can detect content drift via size mismatch. */
563
+ //
564
+ // Retry/breaker semantics (#854) + hash-skip + atomic tmp+rename + post-
565
+ // write verify (#975) live in `./lib/file-sync.mjs`, shared with
566
+ // `scripts/post-install-bootstrap.mjs` so the npm-install path and the
567
+ // session-start path can't drift. The launcher records manifest entries
568
+ // on success via the onSuccess callback so currentManifest stays the
569
+ // single source of truth for next-session retired-file cleanup.
563
570
  function recordManifestEntry(manifestKey, dest) {
564
571
  let size = null;
565
572
  try { size = statSync(dest).size; } catch { /* size left null — drift check still works on file-existence */ }
566
573
  currentManifest.push({ path: manifestKey, size });
567
574
  }
568
- async function syncFile(src, dest, manifestKey) {
569
- if (!existsSync(src)) return;
570
- const result = await syncWithRetry(() => copyFileSync(src, dest));
571
- if (result.ok) {
572
- recordManifestEntry(manifestKey, dest);
573
- return;
574
- }
575
- const tail = TRANSIENT_CODES.has(result.code)
576
- ? ` (retried ${RETRY_BACKOFF_MS.length}× after ${result.code}${circuitOpen ? '; circuit open' : ''})`
577
- : '';
578
- syncFailures.push({ key: manifestKey, message: `${errMessage(result.err)}${tail}` });
579
- }
575
+ const { syncFile, failures: syncFailures } = makeSyncer({
576
+ onSuccess: (key, dest) => recordManifestEntry(key, dest),
577
+ });
580
578
 
581
579
  // Version changed — sync scripts from bin/
582
580
  if (autoUpdateConfig.scripts) {
@@ -663,20 +661,11 @@ try {
663
661
  for (const srcDir of helperSources) {
664
662
  const src = resolve(srcDir, file);
665
663
  if (existsSync(src)) {
666
- const inlineResult = await syncWithRetry(() => copyFileSync(src, dest));
667
- if (inlineResult.ok) {
668
- recordManifestEntry(`.claude/helpers/${file}`, dest);
669
- } else {
670
- const code = inlineResult.code;
671
- const tail = TRANSIENT_CODES.has(code)
672
- ? ` (retried ${RETRY_BACKOFF_MS.length}× after ${code}${circuitOpen ? '; circuit open' : ''})`
673
- : '';
674
- syncFailures.push({
675
- key: `.claude/helpers/${file}`,
676
- message: `${errMessage(inlineResult.err)}${tail}`,
677
- });
678
- }
679
- break; // first source wins
664
+ // First existing source wins; same semantics as before. The
665
+ // shared syncFile records manifest + collects failures the
666
+ // same way the rest of section 3 does.
667
+ await syncFile(src, dest, `.claude/helpers/${file}`);
668
+ break;
680
669
  }
681
670
  }
682
671
  }
@@ -1314,6 +1303,15 @@ try {
1314
1303
  'review defaults — model routing, sandbox, gates, hooks',
1315
1304
  );
1316
1305
  }
1306
+ } else {
1307
+ // Previously a silent skip — masked the actual reason consumers didn't
1308
+ // get a yaml after upgrading from pre-#895 versions. If neither template
1309
+ // path resolves the install is incomplete (partial extract, prune ate
1310
+ // the file, dogfood without a built dist/). Surface a healer hint so
1311
+ // the user can repair instead of staring at a missing yaml.
1312
+ emitWarning(
1313
+ `moflo.yaml create skipped — template not found at ${tplPaths.join(' or ')}; run 'flo doctor --fix' to repair`,
1314
+ );
1317
1315
  }
1318
1316
  }
1319
1317
  } catch (err) {
@@ -1494,6 +1492,41 @@ if (pendingVersionStampWrite) {
1494
1492
  }
1495
1493
  }
1496
1494
 
1495
+ // ── 3h. Clear bootstrap sentinel if section-3 sync resolved it (#975) ───────
1496
+ // Section 3 above re-attempts the same file copies the bootstrap was supposed
1497
+ // to do, with the launcher's own retry logic. If after section 3 every file
1498
+ // the bootstrap reported as failed is now byte-identical to its source, the
1499
+ // previously-unfinished work is done — drop the sentinel so the warning at
1500
+ // section 0-bootstrap-sentinel doesn't fire on the next session. If anything
1501
+ // is still mismatched, leave the sentinel in place; healer / next session
1502
+ // will re-attempt.
1503
+ if (bootstrapSentinelData?.failures?.length > 0) {
1504
+ try {
1505
+ const allRepaired = bootstrapSentinelData.failures.every((f) => {
1506
+ if (!f?.src || !f?.dest) return false;
1507
+ // contentEqual already does the size short-circuit + SHA-1 hash and
1508
+ // is the same predicate the §3 sync used to decide whether to skip
1509
+ // the copy in the first place — reusing it here keeps "the sentinel
1510
+ // is clearable when bytes match" consistent across both code paths.
1511
+ return contentEqual(f.src, f.dest);
1512
+ });
1513
+ if (allRepaired) {
1514
+ unlinkSync(BOOTSTRAP_SENTINEL_PATH);
1515
+ emitMutation('cleared bootstrap-failed sentinel', 'previously-failed copies are now in sync');
1516
+ // Flip the §0-bootstrap-sentinel "in-progress" repair notice to
1517
+ // "completed" so the statusline shows the post-repair badge. Skip when
1518
+ // §3 already wrote its own upgradeNoticeContext (version bump / drift)
1519
+ // — that path runs the §3f writer with its own kind/version and we
1520
+ // shouldn't clobber it from here.
1521
+ if (bootstrapNoticeContext && !upgradeNoticeContext) {
1522
+ buildAndWriteNotice(bootstrapNoticeContext, 'completed');
1523
+ }
1524
+ }
1525
+ } catch (err) {
1526
+ emitWarning(`bootstrap sentinel verify skipped (${errMessage(err)})`);
1527
+ }
1528
+ }
1529
+
1497
1530
  // Bypasses emitMutation — framing, not a mutation, so it must not inflate the count.
1498
1531
  if (mutationCount > 0) {
1499
1532
  try {
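The section 3h decision (clear the sentinel only when every previously failed copy is now byte-identical) can be sketched as a pure predicate. `canClearSentinel` is a hypothetical name, and `isEqual` stands in for `contentEqual(src, dest)` so the logic runs without touching the disk. The `Array.isArray` guard is slightly stricter than the `?.length > 0` test in the launcher and is an assumption of this sketch.

```javascript
// Hypothetical pure form of the section 3h check.
function canClearSentinel(sentinel, isEqual) {
  const failures = sentinel?.failures;
  // No recorded failures means there is nothing to verify, so nothing is cleared.
  if (!Array.isArray(failures) || failures.length === 0) return false;
  // Entries missing src or dest can never be proven repaired.
  return failures.every(
    (f) => Boolean(f?.src && f?.dest) && isEqual(f.src, f.dest),
  );
}

// Fake equality check: only 'a.mjs' has been re-synced.
const synced = new Set(['a.mjs']);
const isEqual = (src) => synced.has(src);

const allRepaired = canClearSentinel(
  { failures: [{ src: 'a.mjs', dest: 'out/a.mjs' }] },
  isEqual,
);
const stillBroken = canClearSentinel(
  { failures: [{ src: 'a.mjs', dest: 'out/a.mjs' }, { src: 'b.mjs', dest: 'out/b.mjs' }] },
  isEqual,
);
const malformed = canClearSentinel({ failures: [{ src: 'a.mjs' }] }, isEqual);
console.log(allRepaired, stillBroken, malformed);
```

One consequence of this shape: a sentinel whose `failures` list is empty or missing is never cleared by this path, which matches the guard on the `if` in the launcher.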
@@ -147,6 +147,84 @@ export async function checkClaudeCode() {
147
147
  };
148
148
  }
149
149
  }
150
+ /**
151
+ * Delegate diagnostics to Claude Code's own `claude doctor` command and surface
152
+ * the result. Catches Claude-side issues (settings drift, MCP/auth, IDE/extension
153
+ * state, update channel) that moflo's own checks can't see — since `claude` is
154
+ * not a moflo-owned binary we don't try to parse its output structurally; we
155
+ * just report exit code + a short tail. Skip silently when `claude` isn't
156
+ * installed — `checkClaudeCode` already covers that condition.
157
+ */
158
+ export async function checkClaudeCodeDoctor() {
159
+ try {
160
+ await runCommand('claude --version', 3000);
161
+ }
162
+ catch {
163
+ return {
164
+ name: 'Claude Code Doctor',
165
+ status: 'pass',
166
+ message: 'Skipped (claude CLI not installed — see Claude Code CLI check)',
167
+ };
168
+ }
169
+ // Capture both streams + exit code without throwing. `claude doctor` exits
170
+ // non-zero on findings, so a try/catch over execAsync would lose the body.
171
+ const result = await new Promise((resolve) => {
172
+ const child = exec('claude doctor', {
173
+ encoding: 'utf8',
174
+ timeout: 30000,
175
+ shell: process.platform === 'win32' ? 'cmd.exe' : '/bin/sh',
176
+ env: { ...process.env },
177
+ windowsHide: true,
178
+ }, (err, stdout, stderr) => {
179
+ resolve({
180
+ code: err && typeof err.code === 'number'
181
+ ? (err.code)
182
+ : (err ? 1 : 0),
183
+ stdout: (stdout || '').toString().trim(),
184
+ stderr: (stderr || '').toString().trim(),
185
+ });
186
+ });
187
+ child.on('error', () => resolve({ code: 1, stdout: '', stderr: '' }));
188
+ });
189
+ // claude doctor not recognised → some Claude versions don't ship the
190
+ // subcommand. Surface as a pass-skip rather than a failure so older Claude
191
+ // installs aren't penalised.
192
+ const combined = `${result.stdout}\n${result.stderr}`.toLowerCase();
193
+ if (/unknown command|command not found|usage:.*claude/.test(combined) &&
194
+ !combined.includes('check')) {
195
+ return {
196
+ name: 'Claude Code Doctor',
197
+ status: 'pass',
198
+ message: 'Skipped (this Claude version does not expose `claude doctor`)',
199
+ };
200
+ }
201
+ if (result.code === 0) {
202
+ const firstLine = result.stdout.split(/\r?\n/).find((l) => l.trim()) || 'No issues reported';
203
+ return { name: 'Claude Code Doctor', status: 'pass', message: firstLine.slice(0, 120) };
204
+ }
205
+ // Non-zero with zero output → `claude doctor` is interactive in current
206
+ // Claude Code releases (verified on 2.1.132): it opens a TUI and produces
207
+ // nothing on a non-TTY child stdout, then our exec timeout kills it. Treat
208
+ // as a skip — the check can't observe the TUI from here, and warning would
209
+ // fire on every machine running the same Claude version.
210
+ if (!result.stdout && !result.stderr) {
211
+ return {
212
+ name: 'Claude Code Doctor',
213
+ status: 'pass',
214
+ message: 'Skipped (claude doctor is interactive — run manually to see findings)',
215
+ };
216
+ }
217
+ // Non-zero — surface the tail so the user has a hint, and point to the
218
+ // interactive command for the full report. Don't try to fix from here:
219
+ // Claude-side fixes (re-auth, settings repair, IDE reload) need user gestures.
220
+ const tailLines = result.stdout.split(/\r?\n/).filter((l) => l.trim()).slice(-3).join(' | ');
221
+ return {
222
+ name: 'Claude Code Doctor',
223
+ status: 'warn',
224
+ message: `claude doctor reported issues: ${tailLines.slice(0, 200)}`,
225
+ fix: 'Run `claude doctor` interactively for full report and follow its instructions',
226
+ };
227
+ }
150
228
  export async function installClaudeCode() {
151
229
  try {
152
230
  output.writeln();
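The branching in `checkClaudeCodeDoctor` is easier to reason about as a small decision function over the captured `{ code, stdout, stderr }` result. The sketch below mirrors the order of the checks in the diff. `classifyDoctorRun` and its return labels are hypothetical. In the actual check, the two skip outcomes and the clean run all report `status: 'pass'`, and only the last branch reports `'warn'`.

```javascript
// Hypothetical pure form of the branching in checkClaudeCodeDoctor.
function classifyDoctorRun({ code, stdout, stderr }) {
  const combined = `${stdout}\n${stderr}`.toLowerCase();
  // Older Claude versions may not ship the `doctor` subcommand at all.
  const unsupported =
    /unknown command|command not found|usage:.*claude/.test(combined) &&
    !combined.includes('check');
  if (unsupported) return 'skip-unsupported';
  if (code === 0) return 'pass';
  // Non-zero exit with no output: the interactive TUI case described above.
  if (!stdout && !stderr) return 'skip-interactive';
  return 'warn';
}

// Sample outputs are invented for the example.
const outcomes = [
  classifyDoctorRun({ code: 0, stdout: 'All good', stderr: '' }),
  classifyDoctorRun({ code: 1, stdout: '', stderr: '' }),
  classifyDoctorRun({ code: 1, stdout: '', stderr: "error: unknown command 'doctor'" }),
  classifyDoctorRun({ code: 1, stdout: 'Auth token expired', stderr: '' }),
];
console.log(outcomes);
```

The "unsupported" test runs before the exit-code test, so a usage banner printed with exit code 0 is still treated as a skip.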
@@ -5,6 +5,7 @@
5
5
  * shell-out where possible). Falls back to running the check's `fix` string
6
6
  * if it looks like an `npx`/`npm`/`claude` command.
7
7
  */
8
+ import { execSync } from 'child_process';
8
9
  import { existsSync, mkdirSync, readFileSync, unlinkSync, writeFileSync } from 'fs';
9
10
  import { join } from 'path';
10
11
  import { output } from '../output.js';
@@ -139,6 +140,23 @@ export async function autoFixCheck(check) {
139
140
  return false;
140
141
  }
141
142
  },
143
+ // moflo.yaml auto-create. The session-start launcher already runs
144
+ // `ensureMofloYamlExists` (see bin/session-start-launcher.mjs § 3d-yaml-create,
145
+ // #895) but it can miss when the launcher itself was old at upgrade time —
146
+ // user reported moflo.yaml absent after npm-installing past 4.9.2. Mirror
147
+ // the same canonical create here so doctor --fix (and the /healer skill
148
+ // wrapping it) self-heal on the spot instead of waiting for the next
149
+ // SessionStart firing.
150
+ 'moflo.yaml': async () => {
151
+ try {
152
+ const { ensureMofloYamlExists } = await import('../init/moflo-yaml-template.js');
153
+ const result = ensureMofloYamlExists(process.cwd());
154
+ return result.created || existsSync(join(process.cwd(), 'moflo.yaml'));
155
+ }
156
+ catch {
157
+ return false;
158
+ }
159
+ },
142
160
  'Daemon Status': async () => {
143
161
  const lockFile = join(process.cwd(), '.moflo', 'daemon.lock');
144
162
  const pidFile = join(process.cwd(), '.moflo', 'daemon.pid');
@@ -157,6 +175,27 @@ export async function autoFixCheck(check) {
157
175
  'Claude Code CLI': async () => {
158
176
  return installClaudeCode();
159
177
  },
178
+ // Pass-through to Claude Code's own diagnostic. We don't own its CLI surface
179
+ // and most Claude-side findings (auth, IDE reload, settings drift) need
180
+ // user gestures, so the "fix" here is just to re-run with inherited stdio
181
+ // and let the user act on what they see.
182
+ 'Claude Code Doctor': async () => {
183
+ try {
184
+ execSync('claude doctor', {
185
+ encoding: 'utf8',
186
+ stdio: 'inherit',
187
+ windowsHide: true,
188
+ timeout: 60000,
189
+ });
190
+ return true;
191
+ }
192
+ catch {
193
+ // Non-zero exit is informational here — user has seen the output and
194
+ // can act on it. Don't claim success, but don't claim failure of OUR
195
+ // healer either; flag as "needs manual action".
196
+ return false;
197
+ }
198
+ },
160
199
  'Zombie Processes': async () => {
161
200
  const result = await findZombieProcesses(true);
162
201
  return result.killed > 0 || result.details.length === 0;
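
Note on the two hunks above: both new entries (`'moflo.yaml'` and `'Claude Code Doctor'`) follow the same contract, a fixer keyed by check name that resolves to `true` when the repair landed and `false` when manual action is still needed. The body of `autoFixCheck` is not visible in this diff, so the following is only a minimal sketch of how such a table could be dispatched. `tryAutoFix` and the sample table keys are hypothetical names, not moflo's API.

```javascript
// Hypothetical fixer table: keys are check names, values resolve to a boolean.
const fixers = {
  'always-ok': async () => true,          // repair applied
  'needs-user': async () => false,        // user must act on what they saw
  'throws': async () => { throw new Error('boom'); },
};

// Assumed dispatch shape; the real autoFixCheck body is not shown in the diff.
async function tryAutoFix(check, table = fixers) {
  // Own-property lookup so names like "toString" do not hit Object.prototype.
  if (!Object.prototype.hasOwnProperty.call(table, check.name)) return false;
  const fixer = table[check.name];
  if (typeof fixer !== 'function') return false;
  try {
    return Boolean(await fixer());
  } catch {
    // A crashing fixer is reported as "not fixed" rather than propagated.
    return false;
  }
}
```

Under this reading, a `false` from the `'Claude Code Doctor'` entry would mean "needs manual action" and not a failure of the healer itself, which matches the comment in its `catch` branch.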
@@ -8,7 +8,7 @@ import { checkSubagentHealth, checkSpellExecution, checkMcpToolInvocation, check
8
8
  import { checkEmbeddingHygiene } from './doctor-embedding-hygiene.js';
9
9
  import { checkSwarmFunctional, checkHiveMindFunctional, } from './doctor-checks-swarm.js';
10
10
  import { checkMemoryAccessFunctional } from './doctor-checks-memory-access.js';
11
- import { checkBuildTools, checkClaudeCode, checkDiskSpace, checkGit, checkGitRepo, checkNodeVersion, checkNpmVersion, } from './doctor-checks-runtime.js';
11
+ import { checkBuildTools, checkClaudeCode, checkClaudeCodeDoctor, checkDiskSpace, checkGit, checkGitRepo, checkNodeVersion, checkNpmVersion, } from './doctor-checks-runtime.js';
12
12
  import { checkConfigFile, checkDaemonStatus, checkMcpServers, checkMemoryDatabase, checkMofloYamlCompliance, checkStatusLine, checkTestDirs, } from './doctor-checks-config.js';
13
13
  import { checkSpellEngine, checkSandboxTier } from './doctor-checks-platform.js';
14
14
  import { checkEmbeddings, checkSemanticQuality, } from './doctor-checks-memory.js';
@@ -21,6 +21,7 @@ export const allChecks = [
21
21
  checkNodeVersion,
22
22
  checkNpmVersion,
23
23
  checkClaudeCode,
24
+ checkClaudeCodeDoctor,
24
25
  checkGit,
25
26
  checkGitRepo,
26
27
  checkConfigFile,
@@ -63,6 +64,8 @@ export const componentMap = {
63
64
  'node': checkNodeVersion,
64
65
  'npm': checkNpmVersion,
65
66
  'claude': checkClaudeCode,
67
+ 'claude-doctor': checkClaudeCodeDoctor,
68
+ 'claude-code-doctor': checkClaudeCodeDoctor,
66
69
  'config': checkConfigFile,
67
70
  'yaml': checkMofloYamlCompliance,
68
71
  'moflo-yaml': checkMofloYamlCompliance,
@@ -33,6 +33,7 @@ export const SKILLS_MAP = {
33
33
  core: [
34
34
  'eldar',
35
35
  'guidance',
36
+ 'healer',
36
37
  'flo-simplify',
37
38
  'reasoningbank-intelligence',
38
39
  ],
@@ -2,5 +2,5 @@
2
2
  * Auto-generated by build. Do not edit manually.
3
3
  * Source of truth: root package.json → scripts/sync-version.mjs
4
4
  */
5
- export const VERSION = '4.9.24';
5
+ export const VERSION = '4.9.25';
6
6
  //# sourceMappingURL=version.js.map
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "moflo",
3
- "version": "4.9.24",
3
+ "version": "4.9.25",
4
4
  "description": "MoFlo — AI agent orchestration for Claude Code. A standalone, opinionated toolkit with semantic memory, learned routing, gates, spells, and the /flo issue-execution skill.",
5
5
  "main": "dist/src/cli/index.js",
6
6
  "type": "module",
@@ -84,7 +84,7 @@
84
84
  "@typescript-eslint/eslint-plugin": "^7.18.0",
85
85
  "@typescript-eslint/parser": "^7.18.0",
86
86
  "eslint": "^8.0.0",
87
- "moflo": "^4.9.23",
87
+ "moflo": "^4.9.24",
88
88
  "tsx": "^4.21.0",
89
89
  "typescript": "^5.9.3",
90
90
  "vitest": "^4.0.0"
@@ -37,15 +37,16 @@
37
37
  */
38
38
 
39
39
  import {
40
- copyFileSync,
41
40
  existsSync,
42
41
  mkdirSync,
43
42
  readdirSync,
44
43
  readFileSync,
45
44
  statSync,
45
+ writeFileSync,
46
46
  } from 'node:fs';
47
47
  import { dirname, join, resolve } from 'node:path';
48
48
  import { fileURLToPath, pathToFileURL } from 'node:url';
49
+ import { errMessage, makeSyncer } from '../bin/lib/file-sync.mjs';
49
50
 
50
51
  const SCRIPT_PATH = fileURLToPath(import.meta.url);
51
52
  const MOFLO_ROOT = resolve(dirname(SCRIPT_PATH), '..');
@@ -87,68 +88,12 @@ export const SOURCE_HELPER_FILES = [
87
88
  'post-commit',
88
89
  ];
89
90
 
90
- // ── Retry + circuit breaker (#854 contract) ──────────────────────────────────
91
+ // ── Retry + atomic copy + circuit breaker (#854 / #975) ─────────────────────
91
92
  //
92
- // Mirrors the launcher's syncWithRetry. Backoff [50,200,800]ms covers Windows
93
- // EBUSY windows from concurrent helper invocation + AV real-time scan. The
94
- // breaker opens after 5 distinct files exhaust retries so a sick host
95
- // (AV mid-scan over node_modules) doesn't compound wall-clock cost.
96
-
97
- const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);
98
- const RETRY_BACKOFF_MS = [50, 200, 800];
99
- const CIRCUIT_BREAK_THRESHOLD = 5;
100
-
101
- const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
102
-
103
- function makeSyncer() {
104
- let circuitOpen = false;
105
- const failures = [];
106
-
107
- async function syncWithRetry(operation) {
108
- const maxAttempts = circuitOpen ? 1 : RETRY_BACKOFF_MS.length + 1;
109
- let lastErr = null;
110
- let lastCode = null;
111
- for (let attempt = 0; attempt < maxAttempts; attempt++) {
112
- if (attempt > 0) await sleep(RETRY_BACKOFF_MS[attempt - 1]);
113
- try {
114
- operation();
115
- return { ok: true };
116
- } catch (err) {
117
- lastErr = err;
118
- lastCode = err && err.code ? err.code : null;
119
- if (!TRANSIENT_CODES.has(lastCode)) break;
120
- }
121
- }
122
- if (!circuitOpen && failures.length + 1 >= CIRCUIT_BREAK_THRESHOLD) {
123
- circuitOpen = true;
124
- }
125
- return { ok: false, err: lastErr, code: lastCode };
126
- }
127
-
128
- async function syncFile(src, dest, manifestKey) {
129
- if (!existsSync(src)) return { skipped: true };
130
- try {
131
- mkdirSync(dirname(dest), { recursive: true });
132
- } catch (err) {
133
- failures.push({ key: manifestKey, message: errMessage(err) });
134
- return { ok: false };
135
- }
136
- const result = await syncWithRetry(() => copyFileSync(src, dest));
137
- if (result.ok) return { ok: true };
138
- const tail = TRANSIENT_CODES.has(result.code)
139
- ? ` (retried ${RETRY_BACKOFF_MS.length}× after ${result.code}${circuitOpen ? '; circuit open' : ''})`
140
- : '';
141
- failures.push({ key: manifestKey, message: `${errMessage(result.err)}${tail}` });
142
- return { ok: false };
143
- }
144
-
145
- return { syncFile, failures };
146
- }
147
-
148
- function errMessage(err) {
149
- if (!err) return 'unknown error';
150
- return err.code ? `${err.code} ${err.message || ''}`.trim() : (err.message || String(err));
151
- }
93
+ // Implementation lives in `bin/lib/file-sync.mjs` so the launcher's section 3
94
+ // shares the same hash-skip + atomic + verify path. Backoff [50,200,800]ms
95
+ // covers Windows EBUSY windows from concurrent helper invocation + AV scan;
96
+ // breaker opens after 5 distinct exhausted-retry failures.
152
97
 
153
98
  // ── Project root discovery ──────────────────────────────────────────────────
154
99
  //
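
Note on the hunk above: the retry logic moves to `bin/lib/file-sync.mjs`, which this diff does not show. For orientation, here is a minimal sketch of the retry, backoff and circuit-breaker behaviour that the replacement comment describes, reconstructed from the removed inline code. `makeRetrier`, `runWithRetry` and `isOpen` are hypothetical names, and counting only exhausted transient failures toward the breaker is an assumption based on the new comment's wording ("5 distinct exhausted-retry failures"). The hash-skip, atomic copy and verify steps are not sketched because nothing in the diff shows how they work.

```javascript
// Sketch only: these names are illustrative, not exports of bin/lib/file-sync.mjs.
const TRANSIENT_CODES = new Set(['EBUSY', 'EPERM', 'EACCES']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function makeRetrier({ backoffMs = [50, 200, 800], breakThreshold = 5 } = {}) {
  let exhausted = 0;       // operations that still failed after their retries
  let circuitOpen = false; // once open, each operation gets a single attempt

  async function runWithRetry(operation) {
    const maxAttempts = circuitOpen ? 1 : backoffMs.length + 1;
    let lastErr = null;
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      if (attempt > 0) await sleep(backoffMs[attempt - 1]);
      try {
        operation();
        return { ok: true, attempts: attempt + 1 };
      } catch (err) {
        lastErr = err;
        // Non-transient errors (for example ENOENT) are not worth retrying.
        if (!TRANSIENT_CODES.has(err && err.code)) break;
      }
    }
    const code = lastErr && lastErr.code ? lastErr.code : null;
    if (TRANSIENT_CODES.has(code)) {
      exhausted += 1;
      if (exhausted >= breakThreshold) circuitOpen = true;
    }
    return { ok: false, err: lastErr, code };
  }

  return { runWithRetry, isOpen: () => circuitOpen };
}
```

Once the breaker is open, later operations get one attempt with no backoff, so a host that keeps returning `EBUSY` costs at most one failed call per file instead of roughly a second of sleeps each.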
@@ -294,6 +239,37 @@ export async function runBootstrap({
294
239
  log(
295
240
  `moflo: postinstall bootstrap left ${failures.length} file(s) unsynced — run 'flo doctor --fix' to repair:\n${sample}${more}`,
296
241
  );
242
+
243
+ // #975: write a sentinel that session-start picks up so the user gets a
244
+ // visible "upgrade left work undone" prompt instead of a silent stale
245
+ // launcher. The bootstrap's stderr alone is buried in `npm install`
246
+ // output noise. Best-effort write — we never block install on this.
247
+ try {
248
+ const mofloDir = resolve(projectRoot, '.moflo');
249
+ mkdirSync(mofloDir, { recursive: true });
250
+ let mofloVersion = 'unknown';
251
+ try {
252
+ const pkgPath = resolve(mofloRoot, 'package.json');
253
+ if (existsSync(pkgPath)) {
254
+ mofloVersion = JSON.parse(readFileSync(pkgPath, 'utf-8')).version || 'unknown';
255
+ }
256
+ } catch { /* version is informational only */ }
257
+ const sentinel = {
258
+ timestamp: new Date().toISOString(),
259
+ mofloVersion,
260
+ failures: failures.map((f) => ({
261
+ key: f.key,
262
+ message: f.message,
263
+ src: f.src,
264
+ dest: f.dest,
265
+ })),
266
+ };
267
+ writeFileSync(
268
+ resolve(mofloDir, 'bootstrap-failed.json'),
269
+ JSON.stringify(sentinel, null, 2),
270
+ 'utf-8',
271
+ );
272
+ } catch { /* sentinel write must not block install */ }
297
273
  }
298
274
 
299
275
  return { ran: true, synced, failed: failures.length, failures };
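
Note on the hunk above: the sentinel is written to `.moflo/bootstrap-failed.json` with `timestamp`, `mofloVersion` and a `failures` array. The session-start side that consumes it is not part of this diff, so the reader below is only a sketch of what such a consumer might look like. `readBootstrapSentinel` and the shape of its return value are hypothetical.

```javascript
// Hypothetical reader for the sentinel written by the bootstrap hunk above.
// Dynamic import() keeps the snippet usable from both ESM and CommonJS.
async function readBootstrapSentinel(projectRoot) {
  const { existsSync, readFileSync } = await import('node:fs');
  const { resolve } = await import('node:path');

  const sentinelPath = resolve(projectRoot, '.moflo', 'bootstrap-failed.json');
  if (!existsSync(sentinelPath)) return null; // last bootstrap left no sentinel

  try {
    const data = JSON.parse(readFileSync(sentinelPath, 'utf-8'));
    const failures = Array.isArray(data.failures) ? data.failures : [];
    return {
      mofloVersion: data.mofloVersion || 'unknown',
      timestamp: data.timestamp || null,
      failedKeys: failures.map((f) => f.key),
    };
  } catch {
    // A malformed or half-written sentinel is treated as absent, mirroring
    // the best-effort spirit of the writer.
    return null;
  }
}
```

A caller could use a non-null result to print a one-line prompt pointing at `flo doctor --fix`, the same repair command the bootstrap log message already recommends.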