kushi-agents 5.7.5 → 5.7.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (4)

package/package.json +1 -1
package/plugin/learnings/cross-cutting.md +27 -0
package/plugin/runners/discover.mjs +6 -3
package/plugin/runners/test/unit/discover.test.mjs +1 -1

package/package.json

@@ -1,6 +1,6 @@
 {
   "name": "kushi-agents",
-  "version": "5.7.5",
+  "version": "5.7.6",
   "description": "Install Kushi — multi-source project evidence agent with Comprehensive Structured Capture (CSC) into weekly-only files across Email, Teams, OneNote, Loop, SharePoint, Meetings, CRM, ADO. Meetings retain a sibling verbatim/ audit folder. WorkIQ-only for M365 sources (Graph / m365_* FORBIDDEN as fallbacks; user-paste is first-class). Host-agnostic.",
   "type": "module",
   "bin": {

package/plugin/learnings/cross-cutting.md

@@ -4,6 +4,33 @@ Newest on top. Format defined in [`README.md`](./README.md). Use this file when
 ---
+### 2026-05-29 — WorkIQ stdout is fully-buffered through the Windows spawn chain
+**Symptom**: After v5.7.5 added intelligent heartbeats, user re-ran discover. Heartbeats showed `...waiting for first byte (10s/180s, no output yet)` for 170+ seconds, then `✗ TIMEOUT after 180045ms (received 0 bytes before kill)`. Yet running `workiq.cmd ask -q "..."` directly from PowerShell returned 4 emails in 29 seconds.
+**Root cause** (reproduced via standalone node-spawn test): The Windows WorkIQ CLI is a 6-process chain — `cmd.exe → workiq.cmd → node-runner.exe → Clawpilot.exe → workiq-wrapper.cjs → workiq.exe`. When invoked from a child process (not a TTY), stdout is **fully-buffered**: WorkIQ accumulates the entire response and only flushes at process exit. A single 4649-byte response that takes 43 seconds shows zero bytes for the first 42 seconds and all 4649 bytes in one chunk at second 43. Direct PowerShell invocation gets line-buffered output via TTY, which is why interactive testing hides this.
+**Implication**: There's no way for `discover.mjs` to stream WorkIQ partial output. We can only wait for completion. Therefore:
+- Timeouts must be generous enough for the slowest legitimate query (5+ minutes, not 90s).
+- "0 bytes received" during heartbeat is **NOT** a stuck signal on Windows — it's the steady state for the entire query. Heartbeat copy must reflect that.
+- Total worst-case discover walltime is `7 × timeout` (35 min at 5min budget) but typical ≈ 30s/source ≈ 4 min total.
+**Fix shipped (v5.7.6, 2026-05-29)**:
+1. **Default `--timeout-ms` bumped from 180s to 300s** (5 min per source).
+2. **Heartbeat message updated** when stdout is empty: `...still running (Xs/Ys, WorkIQ buffers stdout until exit, this is normal)` — explains the 0-byte state instead of misleadingly saying "no output yet".
+3. **Test mirror updated** to accept the new heartbeat copy.
+**Lesson — pipe-buffered stdio invalidates "no output = stuck" heuristics**:
+- On Windows especially, complex CLI chains (Electron/wrapper-based tools) often pipe-buffer their stdout, releasing all output at process exit.
+- Heartbeat / progress UI must distinguish "0 bytes" from "stuck" carefully. The fact that the runtime is *running* (process alive, parent still reading from pipe) is more meaningful than byte count.
+- For runners shelling out to such tools, default timeouts must scale to the slowest legitimate query, not the median. 5 min/source is a safer default than 90s.
+- This is a property of the *child tool*, not of our code. We can't make WorkIQ stream; we can only set realistic budgets and explain the wait honestly.
+**Files changed**: `plugin/runners/discover.mjs` (timeout default + heartbeat copy), `plugin/runners/test/unit/discover.test.mjs` (heartbeat assertion).
+---
 ### 2026-05-29 — Per-step stderr alone is not enough; long single steps need heartbeat ticks
 **Symptom**: After v5.7.4 added per-source `[discover] → email ...` stderr lines, a user re-ran discover from VS Code Copilot Chat. Each WorkIQ call still took 30–90s, and the host **still** killed the runner because between sources there was no output for 30+ seconds. The single `[discover] → email ...` line at the start of each source wasn't enough — the host watchdog measures *idle output* (no bytes on either stream for N seconds), not just *no progress at all*.
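
Reviewer note: the new learnings entry says the buffering was reproduced with a standalone node-spawn test, but that test is not part of the published diff. The sketch below is one way such a check could look, not the author's actual script. The helper name `recordChunkTimings` and its return shape are hypothetical. It spawns a command with stdout on a pipe and records when each chunk arrives, so a fully-buffered child would show a single chunk whose arrival time is close to the total runtime.

```javascript
// Hypothetical helper, not part of kushi-agents: spawn a command with stdout
// on a pipe and record the arrival time and size of every stdout chunk.
async function recordChunkTimings(command, args) {
  // Dynamic import keeps the sketch usable from both CommonJS and ES modules.
  const { spawn } = await import('node:child_process');
  return new Promise((resolve, reject) => {
    const startedAt = Date.now();
    const chunks = [];
    const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'inherit'] });
    child.stdout.on('data', (buf) => {
      chunks.push({ atMs: Date.now() - startedAt, bytes: buf.length });
    });
    child.on('error', reject);
    // 'close' fires after the process has exited and its stdio streams ended.
    child.on('close', (code) => {
      resolve({ code, totalMs: Date.now() - startedAt, chunks });
    });
  });
}
```

Pointing this at the WorkIQ entry point and comparing `chunks[0].atMs` with `totalMs` would show whether output streams or arrives in one flush at exit. On Windows, a `.cmd` target may additionally need `shell: true` in the spawn options, depending on the Node.js version.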

package/plugin/runners/discover.mjs

@@ -24,7 +24,7 @@ import { ask as workiqAsk, resolveWorkiqBin } from './lib/workiq.mjs';
 const ALL_SOURCES = ['email', 'teams', 'meetings', 'onenote', 'sharepoint', 'crm', 'ado'];
 function parseArgs(argv) {
-  const args = { force: false, dryRun: false, timeoutMs: 180_000, sources: null };
+  const args = { force: false, dryRun: false, timeoutMs: 300_000, sources: null };
   for (let i = 0; i < argv.length; i++) {
     const a = argv[i];
     if (a === '--project') args.project = argv[++i];
@@ -32,7 +32,7 @@ function parseArgs(argv) {
     else if (a === '--source') (args.sources ??= []).push(argv[++i]);
     else if (a === '--force') args.force = true;
     else if (a === '--dry-run') args.dryRun = true;
-    else if (a === '--timeout-ms') args.timeoutMs = Number(argv[++i]) || 180_000;
+    else if (a === '--timeout-ms') args.timeoutMs = Number(argv[++i]) || 300_000;
     else if (a === '--help' || a === '-h') args.help = true;
   }
   return args;
@@ -224,7 +224,10 @@ async function main() {
       const sec = Math.round(elapsedMs / 1000);
       const budget = Math.round(args.timeoutMs / 1000);
       if (stdoutBytes === 0) {
-        log(`  ${source}: ...waiting for first byte (${sec}s/${budget}s, no output yet)`);
+        // WorkIQ on Windows pipes stdout fully-buffered (cmd.exe → node-runner →
+        // Clawpilot → wrapper.cjs → workiq.exe chain). Output won't appear until
+        // workiq exits. 0 bytes ≠ stuck — it's normal until the final flush.
+        log(`  ${source}: ...still running (${sec}s/${budget}s, WorkIQ buffers stdout until exit, this is normal)`);
       } else {
         log(`  ${source}: ...still working (${sec}s/${budget}s, ${stdoutBytes} bytes received)`);
       }
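
Reviewer note: the `Number(argv[++i]) || 300_000` pattern in `parseArgs` falls back to the default for non-numeric input and for `0`, but it passes negative values through unchanged (`Number('-5') || 300_000` evaluates to `-5`). The sketch below shows a stricter alternative. `parseTimeoutMs` and `DEFAULT_TIMEOUT_MS` are hypothetical names for illustration and do not appear in the package.

```javascript
// Matches the 5.7.6 default shown in the diff above.
const DEFAULT_TIMEOUT_MS = 300_000;

// Hypothetical stricter parser, not part of kushi-agents: accept only
// finite, positive numbers and fall back to the default otherwise.
function parseTimeoutMs(raw, fallback = DEFAULT_TIMEOUT_MS) {
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : fallback;
}
```

With this variant, `parseTimeoutMs('600000')` yields `600000`, while `'-5'`, `'abc'`, and a missing value all yield the 300000 ms default.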

package/plugin/runners/test/unit/discover.test.mjs

@@ -195,6 +195,6 @@ test('discover: heartbeat ticks emit during slow workiq calls (v5.7.5 host-watch
     KUSHI_DISCOVER_HEARTBEAT_MS: '1000', // 1s heartbeat for the test
   });
   assert.equal(r.code, 0, r.stderr);
-  assert.match(r.stderr, /still working|waiting for first byte/,
+  assert.match(r.stderr, /still working|still running/,
     `expected heartbeat tick on stderr, got: ${r.stderr}`);
 });
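
Reviewer note: a quick way to see what the updated assertion accepts is to run its pattern against the heartbeat copy from both sides of the `discover.mjs` diff. The sample lines below reuse the message templates from the diff; the source name `email` and the numeric values are illustrative, taken from figures quoted in the learnings entry.

```javascript
// Heartbeat copy from the 5.7.5 (old) and 5.7.6 (new) sides of the diff.
const oldEmpty = '  email: ...waiting for first byte (10s/180s, no output yet)';
const newEmpty = '  email: ...still running (10s/300s, WorkIQ buffers stdout until exit, this is normal)';
const withBytes = '  email: ...still working (10s/300s, 4649 bytes received)';

// Pattern from the updated test assertion.
const heartbeatPattern = /still working|still running/;

const results = [oldEmpty, newEmpty, withBytes].map((line) => heartbeatPattern.test(line));
console.log(results); // [ false, true, true ]
```

The old 5.7.5 wording no longer matches, so the test would fail if the runner regressed to the previous heartbeat copy for the zero-byte case.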