moflo 4.9.31 → 4.9.33

@@ -0,0 +1,167 @@
1
+ # Root-Cause Discipline — Measure Twice, Cut Once
2
+
3
+ **Purpose:** The MoFlo standard for fixing bugs. We do not "shoot first and ask questions later" — we measure twice and cut once. Apply this whenever you are about to write a fix, especially when a previous fix on the same surface didn't fully work.
4
+
5
+ ---
6
+
7
+ ## The Headline Rule
8
+
9
+ **Measure twice, cut once. Step back, understand the problem holistically, then make the simplest fix that eliminates the cause.** Do not pile patch onto patch onto patch.
10
+
11
+ This is the single most important engineering posture in this project. Layered patches have produced the worst regressions, the longest debugging sessions, and the most expensive token bills. When you find yourself reaching for "another layer" — stop.
12
+
13
+ ---
14
+
15
+ ## Before You Write Fix N+1
16
+
17
+ Before adding a new fix on top of an existing one, you MUST answer all four:
18
+
19
+ | Question | If you can't answer | Action |
20
+ |----------|---------------------|--------|
21
+ | What exactly is the failure mode at the lowest level? (Not the symptom — the actual mechanism.) | You don't understand the bug yet | Investigate further; do not fix |
22
+ | Why didn't fix N work? Is it wrong, or just incomplete? | You're guessing at the gap | Read fix N's code + history; reproduce the failure |
23
+ | Would removing fix N + replacing with one cleaner fix simplify the surface? | You haven't considered consolidation | Try the consolidation first |
24
+ | What's the SIMPLEST change that makes the bug structurally impossible? | You're patching symptoms, not causes | Step back further |
25
+
26
+ If three answers are vague, you're in patch-on-patch territory. Stop and re-think.
27
+
28
+ ---
29
+
30
+ ## Patch-on-Patch Smoke Alarms
31
+
32
+ Stop and reconsider when you see yourself doing any of these:
33
+
34
+ | Smoke alarm | What it usually means | The right move |
35
+ |-------------|----------------------|----------------|
36
+ | Adding a "belt-and-suspenders" cleanup | The first cleanup is racing something — find what | Eliminate the race, not double-cleanup |
37
+ | Adding `try/catch` around code that already has `try/catch` | Outer catch is masking inner failure | Surface the inner error, don't double-wrap |
38
+ | Adding a `setTimeout` retry loop on top of an existing retry | Retry won't fix a logic bug | Fix the logic |
39
+ | Bumping a timeout because tests fail intermittently | The op is slower than expected — find why | Fix the slowness or remove the op |
40
+ | Adding a flag/env-var to "skip the broken path" | You're hiding the bug, not fixing it | Fix the path or delete it |
41
+ | Adding a workaround "until we can fix this properly" | You won't come back; "later" never happens | Fix it now or file with full context |
42
+ | Touching three files to fix one bug | Bug is misdiagnosed; one file usually suffices | Re-diagnose |
43
+
44
+ When **two or more** of these apply at once, the fix is almost certainly wrong. Throw it away and re-investigate.
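As an illustration of the nested `try/catch` row above, here is a minimal sketch of surfacing the inner error instead of wrapping it a second time. The names `parseConfig` and `loadConfig` are hypothetical and do not come from MoFlo; the sketch only assumes a runtime that supports the standard `cause` option on `Error`.

```javascript
// Hypothetical example: parseConfig and loadConfig are illustrative names only.

// Inner layer: does one job and lets its failure propagate with context.
function parseConfig(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    // Attach context and rethrow; `cause` keeps the original error reachable.
    throw new Error('config is not valid JSON', { cause: err });
  }
}

// Outer layer: no second try/catch, so the caller sees the real failure.
function loadConfig(text) {
  const config = parseConfig(text);
  if (typeof config.name !== 'string') {
    throw new Error('config.name must be a string');
  }
  return config;
}
```

With a single catch that rethrows, a malformed input fails once, loudly, and with the original `SyntaxError` still attached for diagnosis.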
45
+
46
+ ---
47
+
48
+ ## The Holistic Step-Back
49
+
50
+ When fix N didn't work, do these in order — not in parallel, not skipping steps:
51
+
52
+ 1. **Read every prior fix on this surface in full.** Not the commit message — the code. Note what each one was trying to prevent and what it actually does.
53
+ 2. **Reproduce the failure deterministically** before touching code. If you can't reproduce it, you don't understand it.
54
+ 3. **Trace the data flow.** Where does the bad state originate? What writes it? What reads it? What invariant got violated?
55
+ 4. **Question the test, not just the code.** What invariant does the failing test actually encode? Does that invariant match the runtime contract, or is the test stricter? A test stricter than the contract will produce flakes that look like bugs but aren't. (See #1017 case study.)
56
+ 5. **Identify the structural cause** — the place where the bug becomes possible, not the place where it becomes visible.
57
+ 6. **Now consider fixes.** The cheapest fix at the structural cause beats the cleverest fix at the symptom every time. If the cause is "test asserts X, runtime contract is Y, X is stricter," the fix is in the test.
58
+
59
+ If step 6 yields a fix smaller and simpler than the existing patches, **delete the existing patches** as part of the same change. Do not stack.
60
+
61
+ ---
62
+
63
+ ## Code Serves the Specification, Not the Test
64
+
65
+ **Periodically ask: "Am I solving an actual problem, or am I flailing to satisfy a flawed test?"** When several attempted fixes haven't moved the needle, the test framework is a likely suspect — but the response is never to degrade production code to make the test pass.
66
+
67
+ **Never introduce substandard code to satisfy shortcomings of the testing infrastructure.** Production code expresses the runtime contract. Tests verify the contract. When they disagree:
68
+
69
+ | Disagreement | Correct response | Wrong response |
70
+ |--------------|------------------|----------------|
71
+ | Test asserts behavior the runtime never promised | Fix the test to match the contract | Add code to satisfy the test's stricter assertion |
72
+ | Test uses an unrealistic environment (mocks the wrong layer, races a SIGKILL'd daemon, single-session asserts on a multi-session contract) | Fix the test environment | Add retry / sleep / workaround in production code |
73
+ | Test framework can't observe a legitimate runtime path | Add a test hook (`_resetForTest`, `getStateForTest`) that doesn't change runtime behavior | Restructure runtime to make the test framework's observation easier |
74
+ | Test is flaky on one platform but the runtime works | Identify why the test, not the runtime, is sensitive | Bump timeouts / retries / sleeps in production paths |
75
+
76
+ **Code purity check before any "make the test pass" change:** would you ship this change if the test didn't exist? If no, you're degrading the code to satisfy the test. Stop. Fix the test.
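The test-hook row in the table above can be sketched as follows. This is a hypothetical session registry, not MoFlo code; only the hook names `_resetForTest` and `getStateForTest` are taken from the table. The point it tries to show is that the hooks observe or reset state without altering any runtime path.

```javascript
// Hypothetical module: a tiny in-memory session registry.
const sessions = new Map();

// Runtime API: these are the only functions production code calls.
function openSession(id) {
  if (sessions.has(id)) throw new Error(`session ${id} already open`);
  sessions.set(id, { openedAt: Date.now() });
}

function closeSession(id) {
  return sessions.delete(id);
}

// Test-only hooks: a read-only view and a reset. Neither is called by the
// runtime functions above, so adding them leaves runtime behavior unchanged.
function getStateForTest() {
  return [...sessions.keys()];
}

function _resetForTest() {
  sessions.clear();
}
```

The runtime functions would look the same if the hooks were deleted, which is the property the code purity check asks for.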
77
+
78
+ **Signals you're flailing for the test, not solving the bug:**
79
+
80
+ | Signal | What it actually says |
81
+ |--------|----------------------|
82
+ | You've tried 3+ fixes and nothing has moved the needle | The diagnosis is wrong; investigate before patching again |
83
+ | Each fix gets narrower / more defensive without removing the prior layer | You're piling on, not solving |
84
+ | The runtime works fine in real-world usage but the test fails | The test's spec doesn't match the contract — that's the bug |
85
+ | You'd need to add a sleep, retry, lock, or platform-special-case to make the test happy | Production code is paying for a test-environment limitation |
86
+ | Removing the test makes the bug "go away" | The test was right but the fix is wrong, OR the test was the bug — diagnose which |
87
+
88
+ The user said it directly: **"we never want to introduce substandard code to satisfy shortcomings of our testing infrastructure."** Tests serve the code; the code does not serve the tests.
89
+
90
+ When you find that the test is the actual problem: change the test, document why in the commit message, and (if the change weakens an invariant) add a separate test that captures the invariant the original was *trying* to encode without the false strictness.
91
+
92
+ ---
93
+
94
+ ## Concrete Example: #1017 Hive-Mind Shutdown
95
+
96
+ This is the canonical case study for this guidance — and it has a second-order lesson that makes it even more useful.
97
+
98
+ | Attempt | Approach | Outcome |
99
+ |---------|----------|---------|
100
+ | #1017 first try | Loop list+delete in `clearNamespace` | Race window remained — broadcasts landed mid-loop |
101
+ | #1024 layer 1 | Detach adapter BEFORE `clearNamespace` (after `terminateAgent`) | Race narrowed but not eliminated |
102
+ | #1024 layer 2 | Add `purgeHiveNamespacesDirect` raw sql.js DELETE | Looked bulletproof; actually clobber-prone vs daemon's stale snapshot (#981 single-writer) |
103
+ | #1024 declared green | All 6 CI checks pass once | Same flake reappeared on next PR's CI |
104
+ | #1027 attempt 4 | Move `adapter.detach()` BEFORE `terminateAgent`; delete `purgeHiveNamespacesDirect` | Code simplified by -73 LOC. **Same flake on macos-latest CI.** |
105
+ | #1027 — actual fix | Run launcher a SECOND time after doctor in the populated harness | Test passes. Race is intrinsic to multi-process sql.js + daemon kill timing; the harness assertion was over-strict. |
106
+
107
+ The first three attempts kept asking "how do we delete this row harder?" The fourth attempt was a structural simplification that was correct on its own merits (-73 LOC, removed dead code, simpler shutdown ordering) but **did not fix the flake**.
108
+
109
+ The actual root cause was outside the surface every patch had touched: the populated harness was asserting "ephemerals purged after one launcher run" when the **runtime contract is "ephemerals purged at next session-start launcher"**. The doctor's hive-mind probe writes a row intentionally; that row is supposed to live until the NEXT session purges it. The test was conflating "purge mechanism works" with "purge happens within one session" — those are different invariants and only the first is the real product behavior.
110
+
111
+ **Two lessons stack here:**
112
+
113
+ 1. **Don't pile layers** (the original lesson): four shutdown patches, each narrower than the last, none structurally sufficient.
114
+ 2. **Question the test, not just the code** (the second-order lesson): if you've been fighting a race for four PRs and the simplest in-code fix doesn't move the needle, the spec encoded in the test may be wrong. A test that is stricter than the runtime contract WILL produce flakes that look like product bugs but aren't.
115
+
116
+ Together: when a fix isn't working, ask both "what writes the bad state?" AND "is this state actually bad in the runtime contract, or only in the test's expectation?"
117
+
118
+ ---
119
+
120
+ ## When You Genuinely Need a Belt-and-Suspenders
121
+
122
+ Belts-and-suspenders are not always wrong. They are right when:
123
+
124
+ | Condition | Example |
125
+ |-----------|---------|
126
+ | The two layers protect against **different** failure modes | atomic-write tmp+fsync+rename: tmp protects partial writes; fsync protects OS cache; rename protects readers — three concerns, three mechanisms |
127
+ | The first layer's failure is **silent**, the second surfaces it | A retry that logs the first failure before re-attempting |
128
+ | Each layer has a **stated, documented reason** for staying alongside the other | A fallback path with a comment explaining when the primary doesn't reach |
129
+
130
+ They are wrong when both layers protect against the **same** failure mode and you're hoping at least one wins. That's hope, not engineering.
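The atomic-write row above can be sketched roughly like this. `atomicWriteFile` is a hypothetical helper, not something the package is known to export, and the sketch assumes the temp file and the destination are on the same filesystem so that the rename replaces the file in one step. It does not fsync the parent directory, which a stricter durability guarantee might also require.

```javascript
// Sketch only: atomicWriteFile is a hypothetical helper, not MoFlo code.
async function atomicWriteFile(destPath, data) {
  // Dynamic import keeps the sketch runnable as either CommonJS or ESM.
  const { open, rename, rm } = await import('node:fs/promises');
  const tmpPath = `${destPath}.${process.pid}.tmp`;
  let handle;
  try {
    // 1. tmp file: a crash mid-write never leaves a partial file at destPath.
    handle = await open(tmpPath, 'w');
    await handle.writeFile(data);
    // 2. fsync: ask the OS to flush the bytes out of its cache.
    await handle.sync();
    await handle.close();
    handle = undefined;
    // 3. rename: readers see either the old file or the complete new one.
    await rename(tmpPath, destPath);
  } catch (err) {
    if (handle) await handle.close().catch(() => {});
    await rm(tmpPath, { force: true });
    throw err;
  }
}
```

Each of the three steps guards a different failure mode, so removing any one of them reopens a specific, nameable gap. That is what separates this from two layers that both hope to catch the same failure.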
131
+
132
+ ---
133
+
134
+ ## What This Means for PR Reviews
135
+
136
+ Reviewers should reject — not just question — PRs that show patch-on-patch signatures:
137
+
138
+ | Signal | Reviewer action |
139
+ |--------|----------------|
140
+ | Same file/function touched in 3+ recent commits, same bug | Ask: "is the prior fix wrong? remove it" |
141
+ | New fix adds a layer without removing one | Ask: "what was wrong with the prior layer? why does it stay?" |
142
+ | Comment in new code says "for safety" or "just in case" | Ask: "what specific failure is this preventing? cite the line that produces it" |
143
+ | The PR description says "this should fix the flake" without a deterministic repro | Ask: "what was the actual root cause? the writeup doesn't name it" |
144
+
145
+ These questions are not pedantic. They are the difference between fixing a bug and growing the surface area of bugs.
146
+
147
+ ---
148
+
149
+ ## How to Apply When You Are Stuck
150
+
151
+ If you genuinely cannot find the root cause after stepping back:
152
+
153
+ 1. **Stop fixing. Start measuring.** Add logging at every state transition. Reproduce. Read the log.
154
+ 2. **Ask the user before patching.** A two-line confirmation question costs less than a wrong fix.
155
+ 3. **File the issue with what you DO know.** Partial diagnosis with logs is more useful than a guessed fix.
156
+ 4. **Never ship "I think this might work."** That phrasing is a self-warning that the diagnosis isn't done.
157
+
158
+ It is always cheaper to admit uncertainty than to ship a layered patch that creates two new bugs.
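Step 1 above ("add logging at every state transition") could look something like the following sketch. `createTracedState` is a hypothetical helper introduced only for illustration; the log format and the idea of keeping an in-memory history are assumptions, not MoFlo conventions.

```javascript
// Hypothetical helper: wraps a state value so every transition is recorded.
function createTracedState(initial, log = console.error) {
  let current = initial;
  const history = [];
  return {
    get: () => current,
    // Record who moved the state, from what, to what, and why.
    transition(next, reason) {
      const entry = { from: current, to: next, reason, at: Date.now() };
      history.push(entry);
      log(`[state] ${entry.from} -> ${entry.to} (${reason})`);
      current = next;
    },
    // Return a copy so readers cannot mutate the recorded history.
    history: () => history.slice(),
  };
}
```

Reading the resulting log after a deterministic reproduction shows which writer produced the bad state, which is the measurement that has to come before any fix.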
159
+
160
+ ---
161
+
162
+ ## See Also
163
+
164
+ - `.claude/guidance/moflo-error-handling.md` — Silent failures are the prerequisite condition for most patch-on-patch sagas; fix those first
165
+ - `.claude/guidance/moflo-source-hygiene.md` — When you decide to delete redundant code, the canonical-location rules tell you what's safe to remove
166
+ - `feedback_no_layered_workarounds.md` (auto-memory) — The personal-feedback version of this rule, recorded from prior incidents
167
+ - `feedback_ci_flake_means_not_done.md` (auto-memory) — A flake that "passed on rerun" is not fixed; root-cause it under this discipline
@@ -15,6 +15,7 @@ import { callMCPTool } from '../mcp-client.js';
15
15
  import { TOOL_MEMORY_STORE, TOOL_MEMORY_LIST, TOOL_MEMORY_RETRIEVE } from '../mcp-tools/tool-names.js';
16
16
  import { handleMCPError } from '../services/cli-formatters.js';
17
17
  import { ensureDaemonForScheduling } from '../services/daemon-readiness.js';
18
+ import { checkScheduleAcceptance } from '../services/schedule-acceptance-check.js';
18
19
  import { reconcileDaemonAutostart } from '../services/daemon-autostart-lifecycle.js';
19
20
  import { isDaemonInstalled } from '../services/daemon-service.js';
20
21
  import { validateSchedule, computeNextRun } from '../spells/scheduler/cron-parser.js';
@@ -123,6 +124,16 @@ const createCommand = {
123
124
  for (const warning of readiness.warnings) {
124
125
  output.printWarning(warning);
125
126
  }
127
+ // Permission-acceptance check (#1037): scheduled fires run in the daemon's
128
+ // non-interactive context and have no way to prompt for permissions. If
129
+ // this spell hasn't been manually cast yet, the user needs to know NOW so
130
+ // they can run `flo spell cast -n <name>` once before relying on the
131
+ // schedule. This is a warning, never a block — the user may have a legit
132
+ // reason (about to cast, scripted setup, etc.).
133
+ const acceptance = await checkScheduleAcceptance(projectRoot, name);
134
+ if (acceptance.message) {
135
+ output.printWarning(acceptance.message);
136
+ }
126
137
  // Always create the schedule, regardless of daemon state
127
138
  const id = `sched-adhoc-${now}-${Math.random().toString(36).slice(2, 8)}`;
128
139
  const record = {
@@ -8,17 +8,30 @@
8
8
  * For `fast-all-MiniLM-L6-v2`, the URL slug is `sentence-transformers-all-MiniLM-L6-v2`
9
9
  * but the on-disk directory keeps the `fast-` prefix — verbatim from upstream.
10
10
  *
11
- * Concurrency: parallel callers downloading the same model atomic-rename the
12
- * tarball through a unique temp path, so Windows file locks during extraction
13
- * never collide. The final model dir is the synchronization point.
11
+ * Concurrency: a per-model file lock (`<cacheDir>/.<model>.download.lock`,
12
+ * created with `wx`) serializes the download/extract for any number of
13
+ * parallel processes: only one process performs the work; the rest poll for
14
+ * the completion sentinel. This was issue #1021's secondary failure mode:
15
+ * the smoke harness spawns ~12 parallel doctor + memory probes on a cold
16
+ * cache, and Windows file locking exposed the race when the in-tree
17
+ * "synchronization point" was just a shared directory write.
14
18
  */
15
19
  import { createWriteStream, existsSync, mkdirSync, renameSync, rmSync, writeFileSync, } from 'node:fs';
16
20
  import { homedir } from 'node:os';
17
21
  import { dirname, join } from 'node:path';
18
22
  import { pipeline } from 'node:stream/promises';
19
23
  import { Readable } from 'node:stream';
24
+ import { setTimeout as delay } from 'node:timers/promises';
20
25
  import { x as tarExtract } from 'tar';
21
26
  const GCS_BASE_URL = 'https://storage.googleapis.com/qdrant-fastembed';
27
+ // Lock-poll: how long a non-holder waits for the holder to finish before
28
+ // concluding the holder crashed. Cold-fetch is ~90 MB on slow CI runners, so
29
+ // a generous timeout avoids false takeovers under network back-pressure.
30
+ const LOCK_TIMEOUT_MS = 120_000;
31
+ const LOCK_POLL_INTERVAL_MS = 250;
32
+ // Standard transient-error retry per feedback_transient_retry_circuit_breaker.md:
33
+ // 50/200/800ms backoff, only on network errors and 5xx (4xx is deterministic).
34
+ const HTTP_BACKOFF_MS = [50, 200, 800];
22
35
  /**
23
36
  * Sentinel file written into the model directory only after the tarball has
24
37
  * been fully downloaded AND extracted. Cache hits without it are treated as
@@ -50,28 +63,121 @@ function gcsSlugFor(model) {
50
63
  export function resolveCacheDir(explicit, env = process.env) {
51
64
  return explicit ?? env.FASTEMBED_CACHE ?? join(homedir(), '.cache', 'fastembed');
52
65
  }
66
+ class TransientHttpError extends Error {
67
+ constructor(message) {
68
+ super(message);
69
+ this.name = 'TransientHttpError';
70
+ }
71
+ }
53
72
  /**
54
73
  * Stream the tarball to a unique temp path, then atomic-rename to the final
55
- * tarball path before extracting. The temp suffix prevents two concurrent
56
- * downloads from clobbering each other's write stream — extraction itself is
57
- * the slow step on Windows where file-lock contention shows up.
74
+ * tarball path before extracting. The temp suffix prevents the in-flight
75
+ * write stream from being observed at the final path — extraction always
76
+ * sees a complete file.
77
+ *
78
+ * Throws `TransientHttpError` on 5xx / network failure (caller retries) and
79
+ * a plain Error on 4xx (caller fails fast — retrying won't help).
58
80
  */
59
81
  async function downloadTarball(url, destPath, showProgress, deps) {
60
82
  const fetchFn = deps.fetchImpl ?? fetch;
61
83
  const tmpPath = `${destPath}.${process.pid}.tmp`;
62
84
  mkdirSync(dirname(destPath), { recursive: true });
63
- const res = await fetchFn(url);
85
+ let res;
86
+ try {
87
+ res = await fetchFn(url);
88
+ }
89
+ catch (err) {
90
+ throw new TransientHttpError(`Model download failed: GET ${url} → ${err.message}`);
91
+ }
64
92
  if (!res.ok || !res.body) {
65
- throw new Error(`Model download failed: GET ${url} → ${res.status} ${res.statusText}`);
93
+ const msg = `Model download failed: GET ${url} → ${res.status} ${res.statusText}`;
94
+ if (res.status >= 500)
95
+ throw new TransientHttpError(msg);
96
+ throw new Error(msg);
66
97
  }
67
98
  if (showProgress) {
68
99
  const total = Number(res.headers.get('content-length') ?? 0);
69
100
  const totalMb = (total / (1024 * 1024)).toFixed(1);
70
101
  process.stderr.write(`fastembed: downloading ${totalMb} MB from ${url}\n`);
71
102
  }
72
- await pipeline(Readable.fromWeb(res.body), createWriteStream(tmpPath));
103
+ try {
104
+ await pipeline(Readable.fromWeb(res.body), createWriteStream(tmpPath));
105
+ }
106
+ catch (err) {
107
+ rmSync(tmpPath, { force: true });
108
+ throw new TransientHttpError(`Model download stream failed mid-transfer (${url}): ${err.message}`);
109
+ }
73
110
  renameSync(tmpPath, destPath);
74
111
  }
112
+ async function downloadTarballWithRetry(url, destPath, showProgress, deps) {
113
+ let lastErr;
114
+ for (let attempt = 0; attempt <= HTTP_BACKOFF_MS.length; attempt++) {
115
+ try {
116
+ await downloadTarball(url, destPath, showProgress, deps);
117
+ return;
118
+ }
119
+ catch (err) {
120
+ lastErr = err;
121
+ if (!(err instanceof TransientHttpError) || attempt === HTTP_BACKOFF_MS.length)
122
+ break;
123
+ if (showProgress) {
124
+ process.stderr.write(`fastembed: download attempt ${attempt + 1} failed (${err.message}); retrying in ${HTTP_BACKOFF_MS[attempt]}ms.\n`);
125
+ }
126
+ await delay(HTTP_BACKOFF_MS[attempt]);
127
+ }
128
+ }
129
+ throw lastErr;
130
+ }
131
+ /**
132
+ * Cross-process serialization for the download/extract step. Lock holder runs
133
+ * `work`; non-holders poll for the completion sentinel and return as soon as
134
+ * it appears. If the lock holder crashes (lockfile remains but no sentinel
135
+ * after the timeout), the next caller cleans up and retries — preventing a
136
+ * permanently-stuck cache after a Ctrl+C mid-download.
137
+ */
138
+ async function withModelLock(lockPath, completionPath, work) {
139
+ try {
140
+ writeFileSync(lockPath, String(process.pid), { flag: 'wx' });
141
+ }
142
+ catch (err) {
143
+ if (err.code !== 'EEXIST')
144
+ throw err;
145
+ await waitForCompletionOrTakeover(lockPath, completionPath, work);
146
+ return;
147
+ }
148
+ try {
149
+ await work();
150
+ }
151
+ finally {
152
+ try {
153
+ rmSync(lockPath, { force: true });
154
+ }
155
+ catch { /* best effort */ }
156
+ }
157
+ }
158
+ async function waitForCompletionOrTakeover(lockPath, completionPath, work) {
159
+ const deadline = Date.now() + LOCK_TIMEOUT_MS;
160
+ while (Date.now() < deadline) {
161
+ if (existsSync(completionPath))
162
+ return;
163
+ if (!existsSync(lockPath)) {
164
+ // Holder finished without writing the sentinel (crashed). Try to take
165
+ // over the lock ourselves.
166
+ await withModelLock(lockPath, completionPath, work);
167
+ return;
168
+ }
169
+ await delay(LOCK_POLL_INTERVAL_MS);
170
+ }
171
+ // Stale lock — clear it and let the next caller (or our own retry above)
172
+ // pick up the work. Force unlinking is safer than leaving the cache
173
+ // permanently wedged.
174
+ try {
175
+ rmSync(lockPath, { force: true });
176
+ }
177
+ catch { /* best effort */ }
178
+ throw new Error(`fastembed: timed out after ${LOCK_TIMEOUT_MS}ms waiting for ${lockPath}. ` +
179
+ `Stale lock cleared — retry the operation.`);
180
+ }
75
181
  /**
76
182
  * Ensure the per-model directory exists in the cache. Returns the absolute
77
183
  * path. If already present AND the completion sentinel is in place, no
@@ -86,25 +192,34 @@ async function downloadTarball(url, destPath, showProgress, deps) {
86
192
  */
87
193
  export async function retrieveModel(model, cacheDir, showProgress, deps = {}) {
88
194
  const modelDir = join(cacheDir, model);
89
- if (existsSync(modelDir)) {
90
- if (existsSync(join(modelDir, COMPLETION_SENTINEL)))
91
- return modelDir;
92
- if (showProgress) {
93
- process.stderr.write(`fastembed: cached model at ${modelDir} is incomplete (no completion marker); redownloading.\n`);
94
- }
95
- rmSync(modelDir, { recursive: true, force: true });
96
- }
195
+ const completionPath = join(modelDir, COMPLETION_SENTINEL);
196
+ // Fast path: complete cache hit needs no lock, no fs writes.
197
+ if (existsSync(completionPath))
198
+ return modelDir;
97
199
  mkdirSync(cacheDir, { recursive: true });
200
+ const lockPath = join(cacheDir, `.${model}.download.lock`);
98
201
  const tarballPath = join(cacheDir, `${model}.tar.gz`);
99
202
  const url = `${GCS_BASE_URL}/${gcsSlugFor(model)}.tar.gz`;
100
- await downloadTarball(url, tarballPath, showProgress, deps);
101
- const extract = deps.extract ?? tarExtract;
102
- await extract({ file: tarballPath, cwd: cacheDir });
103
- rmSync(tarballPath, { force: true });
104
- if (!existsSync(modelDir)) {
105
- throw new Error(`Model archive extracted but ${modelDir} is missing — corrupt tarball?`);
106
- }
107
- writeFileSync(join(modelDir, COMPLETION_SENTINEL), '');
203
+ await withModelLock(lockPath, completionPath, async () => {
204
+ // Re-check inside the lock — another process may have completed the
205
+ // download between our fast-path check and our lock acquisition.
206
+ if (existsSync(completionPath))
207
+ return;
208
+ if (existsSync(modelDir)) {
209
+ if (showProgress) {
210
+ process.stderr.write(`fastembed: cached model at ${modelDir} is incomplete (no completion marker); redownloading.\n`);
211
+ }
212
+ rmSync(modelDir, { recursive: true, force: true });
213
+ }
214
+ await downloadTarballWithRetry(url, tarballPath, showProgress, deps);
215
+ const extract = deps.extract ?? tarExtract;
216
+ await extract({ file: tarballPath, cwd: cacheDir });
217
+ rmSync(tarballPath, { force: true });
218
+ if (!existsSync(modelDir)) {
219
+ throw new Error(`Model archive extracted but ${modelDir} is missing — corrupt tarball?`);
220
+ }
221
+ writeFileSync(completionPath, '');
222
+ });
108
223
  return modelDir;
109
224
  }
110
225
  //# sourceMappingURL=model-loader.js.map
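The loop bound in `downloadTarballWithRetry` (`attempt <= HTTP_BACKOFF_MS.length`) means a three-entry backoff table allows up to four attempts: one initial try plus one retry per delay. The generic sketch below mirrors that loop shape to make the arithmetic checkable. `retryTransient`, `isTransient`, and the injectable `sleep` are hypothetical names for illustration, not part of the package.

```javascript
// Generic sketch, not the package's code: all names here are hypothetical.
async function retryTransient(work, delaysMs, isTransient, sleep) {
  let lastErr;
  // `<=` gives one initial attempt plus one retry per delay entry.
  for (let attempt = 0; attempt <= delaysMs.length; attempt++) {
    try {
      return await work(attempt);
    } catch (err) {
      lastErr = err;
      // Deterministic failures (for example a 4xx) are not retried, and
      // there is no sleep after the final attempt.
      if (!isTransient(err) || attempt === delaysMs.length) break;
      await sleep(delaysMs[attempt]);
    }
  }
  throw lastErr;
}
```

Passing `sleep` in as a parameter is a choice made for this sketch so the retry count can be verified without real delays; the hunk above calls `delay` from `node:timers/promises` directly.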
@@ -9,7 +9,7 @@
9
9
  */
10
10
  import * as readline from 'node:readline';
11
11
  import { loadSpellEngine, } from '../services/engine-loader.js';
12
- import { createDashboardMemoryAccessor } from '../services/daemon-dashboard.js';
12
+ import { getSharedMemoryAccessor } from '../services/daemon-dashboard.js';
13
13
  /**
14
14
  * Wrap a MemoryAccessor with a write-failure counter so the [epic] summary
15
15
  * can warn when spell progress didn't reach disk (#982). Without this, a
@@ -56,17 +56,22 @@ async function promptAcceptPermissions() {
56
56
  */
57
57
  export async function runEpicSpell(yamlContent, options = {}) {
58
58
  const engine = await loadSpellEngine();
59
- // Lazily initialize a real memory accessor so execution records
60
- // are persisted and visible in the dashboard.
59
+ // Lazily wrap the process-wide shared accessor (#1020) so execution
60
+ // records are persisted and visible in the dashboard. The shared helper
61
+ // owns the warn-and-return-null degradation; we only attach the
62
+ // failed-write counter on top of a successful inner accessor.
61
63
  if (!memoryAccessor) {
62
- try {
63
- const inner = await createDashboardMemoryAccessor();
64
+ const inner = await getSharedMemoryAccessor();
65
+ if (inner) {
64
66
  memoryAccessor = trackPersistFailures(inner);
65
67
  console.log('[epic] Memory accessor ready — spell progress will be persisted');
66
68
  }
67
- catch (err) {
68
- console.warn(`[epic] Dashboard memory unavailable: ${err.message ?? err}`);
69
- console.warn('[epic] Spell executions will NOT appear in the dashboard');
69
+ else {
70
+ // The shared helper already emitted `[memory]`-prefixed warns. Add an
71
+ // `[epic]`-tagged note so a user running `flo epic` can correlate the
72
+ // missing dashboard history with this command without scanning for a
73
+ // `[memory]` line elsewhere in the output.
74
+ console.warn('[epic] ⚠ Memory unavailable — this run will not appear in the dashboard');
70
75
  }
71
76
  }
72
77
  // memoryAccessor is module-cached, so `failedWrites` is cumulative across
@@ -719,9 +719,22 @@ export const hiveMindTools = [
719
719
  workerCount,
720
720
  };
721
721
  }
722
- // Story #807: terminate coordinator-side worker records before we
723
- // wipe the hive state so swarm agent_list reflects the shutdown.
724
- // allSettled so one failed terminate doesn't strand the rest.
722
+ // #1017: detach the adapter FIRST, before any code that broadcasts
723
+ // hive-mind events. terminateAgent below sends agent_terminate
724
+ // broadcasts on the hive-mind namespace; with the adapter still
725
+ // listening, those broadcasts register fire-and-forget storeEntry
726
+ // calls that can land after clearNamespace runs. Detaching first means
727
+ // every subsequent broadcast hits a dead listener and never persists,
728
+ // so clearNamespace operates on a deterministic, unchanging set.
729
+ const adapter = _writeThroughAdapter;
730
+ if (adapter) {
731
+ adapter.detach();
732
+ _writeThroughAdapter = null;
733
+ }
734
+ // Story #807: terminate coordinator-side worker records so swarm
735
+ // agent_list reflects the shutdown. allSettled so one failed terminate
736
+ // doesn't strand the rest. Broadcasts emitted here are intentionally
737
+ // ignored by the (now-detached) adapter.
725
738
  try {
726
739
  const coordinator = await getSwarmCoordinator();
727
740
  const results = await Promise.allSettled(hiveState.workers.map(id => coordinator.terminateAgent(id, { reason: 'hive-mind_shutdown', force: true })));
@@ -734,23 +747,24 @@ export const hiveMindTools = [
734
747
  catch (err) {
735
748
  process.stderr.write(`[hive-mind_shutdown] coordinator cleanup failed: ${err.message}\n`);
736
749
  }
737
- // Clear write-through namespaces in Memory DB
738
- try {
739
- const adapter = await getWriteThroughAdapter();
740
- await adapter.clearNamespace(HIVE_NS);
741
- await adapter.clearNamespace(HIVE_MEMORY_NS);
742
- }
743
- catch {
744
- // Best-effort cleanup
750
+ // Drain whatever the adapter already had in flight at detach, then
751
+ // delete the persisted hive-mind rows. Routed through the chokepoint
752
+ // (deleteEntry daemon RPC when alive), so the daemon's in-memory
753
+ // snapshot stays consistent with disk and cannot clobber the cleanup
754
+ // on its next flush.
755
+ if (adapter) {
756
+ try {
757
+ await adapter.clearNamespace(HIVE_NS);
758
+ await adapter.clearNamespace(HIVE_MEMORY_NS);
759
+ }
760
+ catch {
761
+ // Best-effort cleanup
762
+ }
745
763
  }
746
764
  // Shutdown MessageBus for hive-mind
747
765
  try {
748
766
  const bus = await getMessageBus();
749
767
  bus.unsubscribe('hive-mind-system');
750
- if (_writeThroughAdapter) {
751
- _writeThroughAdapter.detach();
752
- _writeThroughAdapter = null;
753
- }
754
768
  }
755
769
  catch {
756
770
  // Bus may not be initialized
@@ -12,6 +12,7 @@ import { findProjectRoot } from '../services/project-root.js';
12
12
  import { buildGrimoire } from '../services/grimoire-builder.js';
13
13
  import { errorDetail } from '../shared/utils/error-detail.js';
14
14
  import { inferSpellTier } from '../spells/core/spell-tier.js';
15
+ import { getSharedMemoryAccessor } from '../services/daemon-dashboard.js';
15
16
  // ============================================================================
16
17
  // Constants
17
18
  // ============================================================================
@@ -53,16 +54,23 @@ function trackResult(tracked, result) {
53
54
  tracked.result = result;
54
55
  tracked.completedAt = new Date().toISOString();
55
56
  }
57
+ // Memory accessor wiring (#1016): without `getSharedMemoryAccessor()`,
58
+ // runner.storeProgress() writes go to noopMemory and The Luminarium's
59
+ // "Flo Runs" tab never sees flo run / spell_cast invocations. The shared
60
+ // accessor is the same singleton runner-adapter.ts uses for `flo epic`
61
+ // (one cold init per process — see #1020).
56
62
  /** Execute a definition via the engine with tracking and error handling. */
57
63
  async function executeAndTrack(engine, definition, args, options = {}) {
58
64
  const spellId = `sp-${Date.now()}`;
59
65
  const tracked = trackStart(spellId, definition.name, definition.description);
60
66
  try {
61
67
  const sandboxConfig = await engine.loadSandboxConfigFromProject(findProjectRoot());
68
+ const memory = await getSharedMemoryAccessor();
62
69
  const result = await engine.bridgeExecuteSpell(definition, args, {
63
70
  spellId,
64
71
  sandboxConfig,
65
72
  forceCredentialReprompt: options.forceCredentialReprompt,
73
+ ...(memory ? { memory } : {}),
66
74
  });
67
75
  trackResult(tracked, result);
68
76
  return withSpellSource(serializeResult(result), options.sourceFile, options.tier);
@@ -16,6 +16,46 @@ import { createServer } from 'node:http';
16
16
  import { errorDetail } from '../shared/utils/error-detail.js';
17
17
  import { handleMemoryStore, handleMemoryDelete, handleMemoryBatch, matchMemoryRpcRoute, } from './daemon-memory-rpc.js';
18
18
  export const DEFAULT_DASHBOARD_PORT = 3117;
19
+ /**
20
+ * Process-wide promise for the shared MemoryAccessor. Memoized as a *promise*
21
+ * (not the resolved value) so concurrent first-callers share a single init
22
+ * — without this, two near-simultaneous calls would each kick off their own
23
+ * `createDashboardMemoryAccessor()` chain and the loser's accessor would
24
+ * leak. The race fix originated in #1016 inside `mcp-tools/spell-tools.ts`;
25
+ * #1020 lifted it into this shared helper so `epic/runner-adapter.ts` (which
26
+ * had the same latent race) and any future caller benefit from one cold
27
+ * init per process.
28
+ */
29
+ let _sharedAccessorPromise = null;
30
+ /**
31
+ * Return the process-wide MemoryAccessor, lazy-initialized on first call and
32
+ * cached as a promise thereafter. Returns `null` (with a warn log) if init
33
+ * fails so callers can degrade gracefully — the spell still runs, the user
34
+ * just doesn't see the run in The Luminarium.
35
+ */
36
+ export function getSharedMemoryAccessor() {
37
+ if (_sharedAccessorPromise)
38
+ return _sharedAccessorPromise;
39
+ _sharedAccessorPromise = (async () => {
40
+ try {
41
+ return await createDashboardMemoryAccessor();
42
+ }
43
+ catch (err) {
44
+ console.warn(`[memory] dashboard accessor unavailable: ${err?.message ?? err}`);
45
+ console.warn('[memory] runs will NOT appear in The Luminarium');
46
+ return null;
47
+ }
48
+ })();
49
+ return _sharedAccessorPromise;
50
+ }
51
+ /**
52
+ * Test-only: reset the cached promise so a subsequent call re-runs init.
53
+ * Production code MUST NOT call this: doing so leaks the previous accessor's
54
+ * DB handle if the prior init succeeded.
55
+ */
56
+ export function _resetSharedMemoryAccessorForTest() {
57
+ _sharedAccessorPromise = null;
58
+ }
19
59
  /**
20
60
  * Create a MemoryAccessor backed by the sql.js/HNSW memory database.
21
61
  * Lazy-loads memory-initializer to avoid circular deps.
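The doc comment on `_sharedAccessorPromise` explains why the promise, not the resolved value, is cached. The sketch below illustrates that idea in isolation; `createAccessor`, `getShared`, and `demo` are hypothetical stand-ins, not the package's code.

```javascript
// Hypothetical stand-in for an expensive async initializer.
let initCount = 0;
async function createAccessor() {
  initCount += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { id: initCount };
}

let sharedPromise = null;

// Cache the promise itself. It is assigned synchronously on the first call,
// so a second caller arriving while init is still pending reuses it instead
// of starting another init.
function getShared() {
  if (sharedPromise) return sharedPromise;
  sharedPromise = (async () => {
    try {
      return await createAccessor();
    } catch (err) {
      return null; // degrade gracefully: callers must handle null
    }
  })();
  return sharedPromise;
}

// Two near-simultaneous first calls share one init and one instance.
async function demo() {
  const [a, b] = await Promise.all([getShared(), getShared()]);
  return { sameInstance: a === b, initCount };
}
```

Caching only the resolved value would leave a window between the first call and the end of init in which every caller sees an empty cache and starts its own init.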
@@ -24,9 +24,17 @@ export class DaemonSpellExecutor {
24
24
  this.explicitSandbox = opts.sandboxConfig;
25
25
  }
26
26
  exists(spellName) {
27
+ // Invalidate before resolve so newly-added yamls are visible to the
28
+ // poll loop. Without this, stale-false from exists() causes the
29
+ // scheduler to auto-disable schedules whose spell was added on disk
30
+ // after daemon boot (#1034).
31
+ this.registry.invalidate();
27
32
  return this.registry.resolve(spellName) !== undefined;
28
33
  }
29
34
  async execute(spellName, args, signal, mofloLevel) {
35
+ // Invalidate before resolve so yaml edits on disk reach the next fire
36
+ // without needing a daemon restart (#1034).
37
+ this.registry.invalidate();
30
38
  const loaded = this.registry.resolve(spellName);
31
39
  if (!loaded) {
32
40
  return failedResult(`scheduled-${spellName}-${Date.now()}`, 'STEP_EXECUTION_FAILED', `Spell not found in grimoire: ${spellName}`);
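The comments in this hunk describe a stale negative lookup: a registry that cached its view of the disk keeps answering "not found" for a spell added later. A minimal sketch of that failure and the invalidate-before-resolve remedy follows, using a hypothetical `CachingRegistry` with an injected directory listing rather than the real registry.

```javascript
// Hypothetical registry that caches a snapshot of spell names found "on disk".
class CachingRegistry {
  constructor(listOnDisk) {
    this.listOnDisk = listOnDisk; // injected so the sketch needs no filesystem
    this.cache = null;
  }

  invalidate() {
    this.cache = null;
  }

  resolve(name) {
    // The snapshot is taken once and reused until invalidate() is called.
    if (this.cache === null) this.cache = new Set(this.listOnDisk());
    return this.cache.has(name) ? { name } : undefined;
  }
}

const disk = ['alpha'];
const registry = new CachingRegistry(() => disk);

const before = registry.resolve('beta'); // undefined, and populates the cache
disk.push('beta');                       // spell added after "daemon boot"
const stale = registry.resolve('beta');  // still undefined: stale snapshot
registry.invalidate();
const fresh = registry.resolve('beta');  // found after invalidation
```

Invalidating on every call trades a re-scan per lookup for correctness; whether that cost matters depends on how often the real poll loop runs, which this sketch does not model.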
@@ -0,0 +1,68 @@
1
+ /**
2
+ * Schedule Acceptance Check
3
+ *
4
+ * Resolves a spell, computes its current permission hash, and checks whether
5
+ * `.moflo/accepted-permissions/<name>.json` records a valid prior acceptance.
6
+ *
7
+ * The schedule-create command consumes the result to warn — never block — when
8
+ * the spell is missing acceptance. Without it, scheduled fires running in the
9
+ * non-interactive daemon context fail with `Missing credentials` and the user
10
+ * has no signal at create time that a one-time manual cast was the missing
11
+ * step (#1037).
12
+ */
13
+ import { buildGrimoire } from './grimoire-builder.js';
14
+ import { checkAcceptance } from '../spells/core/permission-acceptance.js';
15
+ /**
16
+ * Resolve `spellName` via the Grimoire, hash its permissions, compare against
17
+ * any stored acceptance under `<projectRoot>/.moflo/accepted-permissions/`.
18
+ *
19
+ * Always returns — never throws. A check failure (e.g. Grimoire unavailable)
20
+ * resolves to `check-failed` with an empty message so callers don't surface
21
+ * noise; the schedule create proceeds either way.
22
+ */
23
+ export async function checkScheduleAcceptance(projectRoot, spellName) {
24
+ try {
25
+ const { registry } = await buildGrimoire(projectRoot);
26
+ const loaded = registry.resolve(spellName);
27
+ if (!loaded) {
28
+ return {
29
+ state: 'spell-not-found',
30
+ message: `Spell "${spellName}" was not found in the grimoire. The schedule will be created, but the daemon will auto-disable it on the first fire. Check the spell name (try \`flo spell list\`).`,
31
+ };
32
+ }
33
+ const [{ analyzeSpellPermissions }, { StepCommandRegistry }, { builtinCommands },] = await Promise.all([
34
+ import('../spells/core/permission-disclosure.js'),
35
+ import('../spells/core/step-command-registry.js'),
36
+ import('../spells/commands/index.js'),
37
+ ]);
38
+ const stepRegistry = new StepCommandRegistry();
39
+ for (const cmd of builtinCommands) {
40
+ stepRegistry.register(cmd, 'built-in');
41
+ }
42
+ const report = analyzeSpellPermissions(loaded.definition, stepRegistry);
43
+ const result = await checkAcceptance(projectRoot, loaded.definition.name, report.permissionHash);
44
+ if (result.accepted) {
45
+ return { state: 'accepted', message: '' };
46
+ }
47
+ if (result.reason === 'no-acceptance') {
48
+ return {
49
+ state: 'never-accepted',
50
+ message: `Spell "${loaded.definition.name}" has not been accepted yet. Scheduled fires run non-interactively, so the first run will fail with "missing credentials". Run \`flo spell cast -n ${loaded.definition.name}\` once manually to accept permissions, then this schedule will work on the next fire.`,
51
+ };
52
+ }
53
+ return {
54
+ state: 'hash-mismatch',
55
+ message: `Spell "${loaded.definition.name}" permissions have changed since you last accepted them. Re-run \`flo spell cast -n ${loaded.definition.name}\` once to review and re-accept the new permissions; otherwise scheduled fires will fail.`,
56
+ };
57
+ }
58
+ catch (err) {
59
+ // Soft-fail: a Grimoire load error or permission analysis failure must
60
+ // never block schedule creation. Return a quiet check-failed state and
61
+ // let the create proceed. Surface the cause via console.debug so a
62
+ // developer chasing a regression can see why the check degraded
63
+ // without polluting normal CLI output.
64
+ console.debug(`[schedule-acceptance-check] check failed for ${spellName}: ${err?.message ?? err}`);
65
+ return { state: 'check-failed', message: '' };
66
+ }
67
+ }
68
+ //# sourceMappingURL=schedule-acceptance-check.js.map
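The module above is documented as "warn, never block": the caller shows a message when one is present and creates the schedule regardless. The sketch below shows one way a consumer could honor that contract. `softCheck` and `createSchedule` are hypothetical names, and the real schedule-create command may be structured differently.

```javascript
// Hypothetical wrapper: run a check and turn any failure into a quiet state.
async function softCheck(check, spellName) {
  try {
    return await check(spellName);
  } catch (err) {
    return { state: 'check-failed', message: '' };
  }
}

// Hypothetical caller: warn when there is a message, but always create.
async function createSchedule(spellName, check, warn) {
  const { state, message } = await softCheck(check, spellName);
  if (message) warn(message);
  return { created: true, acceptance: state };
}

async function demo() {
  const warnings = [];
  const warn = (m) => warnings.push(m);
  const neverAccepted = async () => ({
    state: 'never-accepted',
    message: 'Cast the spell once manually first.',
  });
  const broken = async () => {
    throw new Error('grimoire unavailable');
  };
  const a = await createSchedule('nightly', neverAccepted, warn);
  const b = await createSchedule('nightly', broken, warn);
  return { a, b, warnings };
}
```

An empty `message` doubles as the "stay quiet" signal, so the caller needs no per-state branching to decide whether to print anything.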
@@ -4,11 +4,21 @@
4
4
  * processes write to the same target concurrently.
5
5
  *
6
6
  * Pattern: write to a process-unique temp path `<target>.tmp.<pid>.<rand>`,
7
- * then rename onto `target`.
8
- * - `fs.renameSync` is atomic on POSIX.
9
- * - On Windows, Node maps it to `MoveFileExW(..., MOVEFILE_REPLACE_EXISTING)`,
10
- * which replaces the destination near-atomically concurrent readers
11
- * always observe either the old file or the new, never a truncated one.
7
+ * **fsync the temp file**, then rename onto `target`.
8
+ * - `writeFileSync` does NOT fsync — the OS keeps data in the write cache.
9
+ * On Windows that cache isn't always coherent with what other processes
10
+ * see when they open the freshly-renamed target. Issue #1015 surfaced
11
+ * this as a flaky `memory-retrieve` race in consumer-smoke: process A
12
+ * stores via the daemon → daemon flushes via this helper → daemon
13
+ * returns → process B opens the DB and sees stale content.
14
+ * - The fix: fsync the temp fd before rename. After fsync, the data is
15
+ * durably on disk; the rename then makes that durable data visible
16
+ * atomically. Subsequent readers see the new bytes regardless of cache
17
+ * state.
18
+ * - `fs.renameSync` is atomic on POSIX. On Windows, Node maps it to
19
+ * `MoveFileExW(..., MOVEFILE_REPLACE_EXISTING)`, which replaces the
20
+ * destination near-atomically — concurrent readers always observe either
21
+ * the old file or the new, never a truncated one.
12
22
  * - The unique temp path means concurrent writers can't clobber each other's
13
23
  * in-flight bytes (#635). Last-writer-wins semantics: each rename is fully
14
24
  * atomic, so the destination always reflects exactly one writer's data.
@@ -18,16 +28,28 @@
18
28
  * On any failure, the temp file is best-effort removed and the original
19
29
  * `target` stays intact. The underlying error is always re-thrown.
20
30
  *
31
+ * Windows-only post-rename verify (#1015): on NTFS with antivirus / Defender
32
+ * scanning the freshly-renamed file, a sub-process opening the same path
33
+ * within ~1s can briefly see the file as locked. After a successful rename
34
+ * we poll-open the target until it's readable (or a 250 ms deadline passes)
35
+ * so the next reader doesn't race the AV lock window. The rename itself
36
+ * already succeeded and the data is fsynced, so the verify is best-effort:
37
+ * a timeout returns silently rather than throwing.
38
+ *
21
39
  * `fs` is injectable so the interrupt-mid-write paths can be exercised in
22
40
  * unit tests without depending on ESM-unfriendly module spies.
23
41
  *
24
42
  * @module moflo/cli/shared/utils/atomic-file-write
25
43
  */
26
44
  import * as realFs from 'node:fs';
45
+ const IS_WIN32 = process.platform === 'win32';
46
+ const VERIFY_DEADLINE_MS = 250;
47
+ const VERIFY_STEP_MS = 10;
27
48
  export function atomicWriteFileSync(targetPath, data, fs = realFs) {
28
49
  const tmpPath = `${targetPath}.tmp.${process.pid}.${Math.random().toString(36).slice(2, 8)}`;
29
50
  try {
30
51
  fs.writeFileSync(tmpPath, data);
52
+ fsyncFile(tmpPath, fs);
31
53
  fs.renameSync(tmpPath, targetPath);
32
54
  }
33
55
  catch (err) {
@@ -39,5 +61,61 @@ export function atomicWriteFileSync(targetPath, data, fs = realFs) {
39
61
  }
40
62
  throw err;
41
63
  }
64
+ if (IS_WIN32)
65
+ verifyReadableAfterRename(targetPath, fs);
66
+ }
67
+ /**
68
+ * Open the freshly-written temp file, fsync, close. Ensures the data is
69
+ * durably on disk before rename makes it visible (#1015). Best-effort: an
70
+ * fsync error is swallowed because a real filesystem failure will surface
71
+ * on the rename anyway, and we don't want to mask the more useful error.
72
+ */
73
+ function fsyncFile(tmpPath, fs) {
74
+ const openSync = fs.openSync ?? realFs.openSync;
75
+ const closeSync = fs.closeSync ?? realFs.closeSync;
76
+ const fsyncSync = fs.fsyncSync ?? realFs.fsyncSync;
77
+ let fd = null;
78
+ try {
79
+ fd = openSync(tmpPath, 'r+');
80
+ fsyncSync(fd);
81
+ }
82
+ catch {
83
+ /* fsync best-effort — see fn doc */
84
+ }
85
+ finally {
86
+ if (fd !== null) {
87
+ try {
88
+ closeSync(fd);
89
+ }
90
+ catch { /* close best-effort */ }
91
+ }
92
+ }
93
+ }
94
+ /**
95
+ * Poll-open the target until a reader can succeed, or the deadline passes.
96
+ * Closes the AV-scan settle window on NTFS (#1015). No-op everywhere else.
97
+ *
98
+ * Yields the thread between probes via `Atomics.wait` so we don't pin a CPU
99
+ * during the very contention we're waiting out (`feedback_async_by_default`).
100
+ */
101
+ function verifyReadableAfterRename(targetPath, fs) {
102
+ const openSync = fs.openSync ?? realFs.openSync;
103
+ const closeSync = fs.closeSync ?? realFs.closeSync;
104
+ const deadline = Date.now() + VERIFY_DEADLINE_MS;
105
+ while (true) {
106
+ try {
107
+ closeSync(openSync(targetPath, 'r'));
108
+ return;
109
+ }
110
+ catch {
111
+ if (Date.now() >= deadline)
112
+ return;
113
+ sleepSyncMs(VERIFY_STEP_MS);
114
+ }
115
+ }
116
+ }
117
+ const SLEEP_BUF = new Int32Array(new SharedArrayBuffer(4));
118
+ function sleepSyncMs(ms) {
119
+ Atomics.wait(SLEEP_BUF, 0, 0, ms);
42
120
  }
43
121
  //# sourceMappingURL=atomic-file-write.js.map
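The write, fsync, rename ordering and the deadline-bounded poll described in this file can be sketched with an injected `fs`, in the same spirit as the module's own injectable parameter. `atomicWriteSketch`, `pollUntil`, and `recordingFs` are hypothetical names. Unlike the shipped helper, this sketch lets an fsync error propagate and makes `pollUntil` return a boolean.

```javascript
// Sketch: write to a unique temp path, fsync it, then rename onto the target.
function atomicWriteSketch(targetPath, data, fs) {
  const tmpPath = `${targetPath}.tmp.${process.pid}.${Math.random().toString(36).slice(2, 8)}`;
  try {
    fs.writeFileSync(tmpPath, data);
    const fd = fs.openSync(tmpPath, 'r+');
    try {
      fs.fsyncSync(fd); // flush the temp file before it becomes visible
    } finally {
      fs.closeSync(fd);
    }
    fs.renameSync(tmpPath, targetPath);
  } catch (err) {
    try { fs.unlinkSync(tmpPath); } catch { /* best-effort cleanup */ }
    throw err;
  }
}

// Blocking sleep: the slot holds 0 and nobody notifies, so wait() times out after `ms`.
const SLEEP_BUF = new Int32Array(new SharedArrayBuffer(4));
function sleepSyncMs(ms) {
  Atomics.wait(SLEEP_BUF, 0, 0, ms);
}

// Retry `probe` until it stops throwing or the deadline passes.
function pollUntil(probe, deadlineMs, stepMs) {
  const deadline = Date.now() + deadlineMs;
  while (true) {
    try {
      probe();
      return true;
    } catch {
      if (Date.now() >= deadline) return false;
      sleepSyncMs(stepMs);
    }
  }
}

// A recording fake shows the call order without touching the disk.
const callOrder = [];
const recordingFs = {
  writeFileSync: () => { callOrder.push('write'); },
  openSync: () => { callOrder.push('open'); return 3; },
  fsyncSync: () => { callOrder.push('fsync'); },
  closeSync: () => { callOrder.push('close'); },
  renameSync: () => { callOrder.push('rename'); },
  unlinkSync: () => { callOrder.push('unlink'); },
};
atomicWriteSketch('/tmp/example.json', '{}', recordingFs);
```

Passing the real `node:fs` module as the third argument would perform the same sequence on disk. The recording fake only demonstrates that the rename happens after the fsync and close.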
@@ -5,8 +5,10 @@
5
5
  * lifecycle. This connector adds server-pool management, lazy spawning, tool
6
6
  * discovery caching, and the SpellConnector interface adapter.
7
7
  *
8
- * The SDK is an optionalDependency and is loaded lazily on first use so
9
- * consumers that don't use the MCP connector don't need it installed.
8
+ * The SDK is a hard `dependency` (MCP is a headline integration), but it is
9
+ * loaded lazily on first use so spells that don't use the MCP connector don't
10
+ * pay its startup cost. The lazy-load also yields an actionable install hint
11
+ * if a corrupted install lost the package.
10
12
  */
11
13
  import { loadOptional } from './shared/optional-import.js';
12
14
  const MCP_INSTALL_MSG = "MCP connector requires '@modelcontextprotocol/sdk' to be installed. Run: npm i @modelcontextprotocol/sdk";
@@ -1,11 +1,18 @@
1
1
  /**
2
2
  * Lazy loader for optional SDK dependencies.
3
3
  *
4
- * Connectors wrapping heavy SDKs (imapflow, mailparser, @modelcontextprotocol/sdk)
5
- * declare them as optionalDependencies so consumers that don't use the connector
6
- * don't need to install them. This helper centralizes the lazy-import +
7
- * MODULE_NOT_FOUND translation + module-scope memoization that each connector
8
- * would otherwise re-implement.
4
+ * Connectors wrapping truly optional SDKs (imapflow, mailparser) declare them
5
+ * as `peerDependenciesMeta.optional` so consumers that don't use the connector
6
+ * don't need to install them. The `@modelcontextprotocol/sdk` is a hard
7
+ * `dependency` because the MCP connector is a headline feature, but it is still
8
+ * routed through this helper so a corrupted install still yields an actionable
9
+ * message instead of a raw MODULE_NOT_FOUND.
10
+ *
11
+ * Every specifier passed to `loadOptional()` MUST be declared in package.json
12
+ * (dependencies, optionalDependencies, or peerDependenciesMeta). The drift
13
+ * guard at `src/cli/__tests__/spells/connectors/optional-import-declared.test.ts`
14
+ * enforces this — it walks shipped connectors, extracts every specifier, and
15
+ * fails the build if one is undeclared.
9
16
  */
10
17
  const moduleCache = new Map();
11
18
  function isModuleNotFound(err) {
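The header comment above describes three duties for the lazy loader: import on first use, memoize at module scope, and translate a missing-module error into an actionable message. The diff does not show the body of `loadOptional()` or `isModuleNotFound()`, so the following is only a sketch under assumptions. `loadOptionalSketch`, its signature, and the injected `importer` are hypothetical, and the error codes checked are the ones Node.js uses for failed `import()` and `require()` resolution.

```javascript
const moduleCache = new Map();

// Assumed check: Node reports a missing package with one of these codes.
function isModuleNotFound(err) {
  return err != null &&
    (err.code === 'ERR_MODULE_NOT_FOUND' || err.code === 'MODULE_NOT_FOUND');
}

// Hypothetical loader: `importer` is injectable so the sketch can be exercised
// without installing or removing real packages.
async function loadOptionalSketch(specifier, installMsg, importer = (s) => import(s)) {
  if (moduleCache.has(specifier)) return moduleCache.get(specifier);
  try {
    const mod = await importer(specifier);
    moduleCache.set(specifier, mod);
    return mod;
  } catch (err) {
    // Replace the raw resolution error with the install hint, keeping the cause.
    if (isModuleNotFound(err)) throw new Error(installMsg, { cause: err });
    throw err;
  }
}
```

Errors other than "module not found" are re-thrown unchanged, so a package that is installed but fails during its own initialization is not misreported as missing.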
@@ -8,7 +8,7 @@
8
8
  * Story #106: Encrypted Credential Storage
9
9
  */
10
10
  import { createCipheriv, createDecipheriv, randomBytes, pbkdf2Sync, } from 'node:crypto';
11
- import { readFileSync, writeFileSync, mkdirSync } from 'node:fs';
11
+ import { readFileSync, writeFileSync, mkdirSync, statSync } from 'node:fs';
12
12
  import { dirname } from 'node:path';
13
13
  // ============================================================================
14
14
  // Constants
@@ -55,6 +55,11 @@ export class CredentialStore {
55
55
  filePath;
56
56
  derivedKey = null;
57
57
  data = null;
58
+ // Tracks the file mtime that produced `this.data`. `null` means the file
59
+ // didn't exist when we last read. refreshIfStale() compares against the
60
+ // current mtime to detect external writes (e.g. CLI subprocesses calling
61
+ // `flo spell credentials set` while the daemon's instance is alive — #1035).
62
+ lastReadMtimeMs = null;
58
63
  constructor(options) {
59
64
  this.filePath = options.filePath;
60
65
  if (options.passphrase) {
@@ -70,6 +75,7 @@ export class CredentialStore {
70
75
  throw new CredentialStoreError(`Passphrase must be at least ${MIN_PASSPHRASE_LENGTH} characters`, 'WEAK_PASSPHRASE');
71
76
  }
72
77
  this.data = this.readFile();
78
+ this.lastReadMtimeMs = this.fileMtimeMs();
73
79
  const salt = Buffer.from(this.data.salt, 'hex');
74
80
  this.derivedKey = deriveKey(passphrase, salt);
75
81
  }
@@ -85,9 +91,17 @@ export class CredentialStore {
85
91
  }
86
92
  /**
87
93
  * Store an encrypted credential.
94
+ *
95
+ * The refreshIfStale() call rebases on the latest on-disk state so we don't
96
+ * write back a snapshot that's missing concurrent additions. It is NOT a
97
+ * mutual-exclusion primitive: two processes calling store() on the same key
98
+ * concurrently still race, and the last writer wins. Cross-process locking
99
+ * is out of scope; the file write is small and the typical layout (one
100
+ * daemon reader + occasional CLI writers) makes the race window vanishingly small.
88
101
  */
89
102
  async store(name, value, description) {
90
103
  this.ensureUnlocked();
104
+ this.refreshIfStale();
91
105
  const now = new Date().toISOString();
92
106
  const encrypted = encrypt(value, this.derivedKey);
93
107
  const existing = this.data.credentials[name];
@@ -105,6 +119,7 @@ export class CredentialStore {
105
119
  */
106
120
  async get(name) {
107
121
  this.ensureUnlocked();
122
+ this.refreshIfStale();
108
123
  const entry = this.data.credentials[name];
109
124
  if (!entry)
110
125
  return undefined;
@@ -121,6 +136,7 @@ export class CredentialStore {
121
136
  */
122
137
  async has(name) {
123
138
  this.ensureUnlocked();
139
+ this.refreshIfStale();
124
140
  return name in this.data.credentials;
125
141
  }
126
142
  /**
@@ -128,6 +144,7 @@ export class CredentialStore {
128
144
  */
129
145
  async delete(name) {
130
146
  this.ensureUnlocked();
147
+ this.refreshIfStale();
131
148
  if (!(name in this.data.credentials))
132
149
  return false;
133
150
  delete this.data.credentials[name];
@@ -141,6 +158,7 @@ export class CredentialStore {
141
158
  */
142
159
  async clear() {
143
160
  this.ensureUnlocked();
161
+ this.refreshIfStale();
144
162
  const count = Object.keys(this.data.credentials).length;
145
163
  if (count === 0)
146
164
  return 0;
@@ -153,6 +171,7 @@ export class CredentialStore {
153
171
  */
154
172
  async list() {
155
173
  this.ensureUnlocked();
174
+ this.refreshIfStale();
156
175
  return Object.entries(this.data.credentials).map(([name, entry]) => ({
157
176
  name,
158
177
  description: entry.description,
@@ -166,6 +185,7 @@ export class CredentialStore {
166
185
  */
167
186
  async allValues() {
168
187
  this.ensureUnlocked();
188
+ this.refreshIfStale();
169
189
  const values = [];
170
190
  for (const entry of Object.values(this.data.credentials)) {
171
191
  try {
@@ -265,6 +285,49 @@ export class CredentialStore {
265
285
  writeFile(data) {
266
286
  mkdirSync(dirname(this.filePath), { recursive: true });
267
287
  writeFileSync(this.filePath, JSON.stringify(data, null, 2), { encoding: 'utf-8', mode: 0o600 });
288
+ // Adopt the just-written mtime so refreshIfStale() doesn't trigger an
289
+ // unnecessary re-read on the next operation through this instance.
290
+ this.lastReadMtimeMs = this.fileMtimeMs();
291
+ }
292
+ /**
293
+ * Return the file's mtime in ms, or null when the file doesn't exist.
294
+ * Other errors (permissions, etc.) are surfaced — they signal a real problem
295
+ * worth raising rather than silently treating as "no file".
296
+ */
297
+ fileMtimeMs() {
298
+ try {
299
+ return statSync(this.filePath).mtimeMs;
300
+ }
301
+ catch (err) {
302
+ if (err.code === 'ENOENT')
303
+ return null;
304
+ throw err;
305
+ }
306
+ }
307
+ /**
308
+ * Reload `this.data` from disk when the file's mtime differs from what we
309
+ * last read. This is the per-call hook that keeps long-lived instances
310
+ * (the daemon's singleton CredentialStore — see #1035) consistent with
311
+ * writes made by CLI subprocesses.
312
+ *
313
+ * Limitations:
314
+ * - If another process rotated the passphrase, the salt in the reloaded
315
+ * data will mismatch our derivedKey. Subsequent decrypt() calls throw
316
+ * DECRYPTION_FAILED, which the resolver treats as missing — same UX as
317
+ * today's stale-daemon failure mode and only resolved by daemon restart.
318
+ * Rotation-aware reload would need the new passphrase, which we don't
319
+ * have post-construction; out of scope here.
320
+ * - Designed for local filesystems. Network mounts (NFS/SMB) can return
321
+ * coarse or stale mtimes via client caching, which would weaken the
322
+ * detection. The credentials file lives at `~/.moflo/credentials.json`
323
+ * and is expected to be local; network-mounted homedirs aren't supported.
324
+ */
325
+ refreshIfStale() {
326
+ const current = this.fileMtimeMs();
327
+ if (current === this.lastReadMtimeMs)
328
+ return;
329
+ this.data = this.readFile();
330
+ this.lastReadMtimeMs = current;
268
331
  }
269
332
  }
270
333
  export class CredentialStoreError extends Error {
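The `refreshIfStale()` hook compares the file's current mtime with the one recorded at the last read and reloads only on a difference. The sketch below isolates that mechanism; `CachedFile` is a hypothetical class with injected `statMtimeMs` and `read` functions standing in for `statSync` and the store's `readFile()`.

```javascript
// Hypothetical cache that reloads when the backing file's mtime changes.
class CachedFile {
  constructor(statMtimeMs, read) {
    this.statMtimeMs = statMtimeMs; // returns mtime in ms, or null if the file is missing
    this.read = read;               // returns the parsed file contents
    this.data = read();
    this.lastReadMtimeMs = statMtimeMs();
    this.reads = 1;
  }

  refreshIfStale() {
    const current = this.statMtimeMs();
    if (current === this.lastReadMtimeMs) return; // unchanged: keep cached data
    this.data = this.read();
    this.lastReadMtimeMs = current;
    this.reads += 1;
  }

  get(key) {
    this.refreshIfStale();
    return this.data[key];
  }
}

// Simulated file, so the sketch does not touch the disk.
const file = { mtimeMs: 1000, content: { token: 'old' } };
const store = new CachedFile(() => file.mtimeMs, () => ({ ...file.content }));

const first = store.get('token');  // served from the initial read
file.content = { token: 'new' };   // an external process rewrites the file
file.mtimeMs = 2000;
const second = store.get('token'); // mtime changed, so the data is reloaded
const third = store.get('token');  // mtime unchanged, so no further read
```

The check costs one stat per operation, which is the price of keeping a long-lived instance consistent without a file watcher. As the source comment notes, it detects external writes but does not provide mutual exclusion.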
@@ -2,5 +2,5 @@
2
2
  * Auto-generated by build. Do not edit manually.
3
3
  * Source of truth: root package.json → scripts/sync-version.mjs
4
4
  */
5
- export const VERSION = '4.9.31';
5
+ export const VERSION = '4.9.33';
6
6
  //# sourceMappingURL=version.js.map
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "moflo",
3
- "version": "4.9.31",
3
+ "version": "4.9.33",
4
4
  "description": "MoFlo — AI agent orchestration for Claude Code. A standalone, opinionated toolkit with semantic memory, learned routing, gates, spells, and the /flo issue-execution skill.",
5
5
  "main": "dist/src/cli/index.js",
6
6
  "type": "module",
@@ -64,6 +64,7 @@
64
64
  },
65
65
  "dependencies": {
66
66
  "@anush008/tokenizers": "^0.6.0",
67
+ "@modelcontextprotocol/sdk": "^1.0.0",
67
68
  "js-yaml": "^4.1.1",
68
69
  "lru-cache": "^11.3.5",
69
70
  "onnxruntime-node": "^1.24.3",
@@ -72,6 +73,18 @@
72
73
  "tar": "^7.5.11",
73
74
  "valibot": "^1.3.1"
74
75
  },
76
+ "peerDependencies": {
77
+ "imapflow": "^1.0.0",
78
+ "mailparser": "^3.0.0"
79
+ },
80
+ "peerDependenciesMeta": {
81
+ "imapflow": {
82
+ "optional": true
83
+ },
84
+ "mailparser": {
85
+ "optional": true
86
+ }
87
+ },
75
88
  "overrides": {
76
89
  "hono": ">=4.11.4",
77
90
  "picomatch": ">=2.3.2",
@@ -84,7 +97,7 @@
84
97
  "@typescript-eslint/eslint-plugin": "^7.18.0",
85
98
  "@typescript-eslint/parser": "^7.18.0",
86
99
  "eslint": "^8.0.0",
87
- "moflo": "^4.9.30",
100
+ "moflo": "^4.9.32",
88
101
  "tsx": "^4.21.0",
89
102
  "typescript": "^5.9.3",
90
103
  "vitest": "^4.0.0"