@link-assistant/hive-mind 1.59.4 → 1.59.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +122 -0
- package/package.json +1 -1
- package/src/github-merge-repo-actions.lib.mjs +53 -15
- package/src/github-merge.lib.mjs +61 -27
package/CHANGELOG.md
CHANGED

@@ -1,5 +1,127 @@
 # @link-assistant/hive-mind
 
+## 1.59.5
+
+### Patch Changes
+
+- bb24175: Fix `/merge` to correctly detect active CI runs on the default branch — issue
+  #1722.
+
+  The `/merge` command merged PR #1719 even though a CI/CD workflow run was
+  still in progress on `main`. The merge triggered a new run, which cancelled
+  the previous one. Verbose log:
+
+  ```
+  [VERBOSE] /merge: Checking for active CI runs on link-assistant/hive-mind branch main...
+  [VERBOSE] /merge: Error checking active runs on main: stdout maxBuffer length exceeded
+  [VERBOSE] /merge: No active CI runs on main branch. Ready to proceed.
+  ```
+
+  Two compounding root causes in
+  [`src/github-merge.lib.mjs`](./src/github-merge.lib.mjs)
+  `getActiveBranchRuns()` (and the parallel
+  [`src/github-merge-repo-actions.lib.mjs`](./src/github-merge-repo-actions.lib.mjs)
+  `getAllActiveRepoRuns()` introduced by issue #1503):
+
+  1. **No `maxBuffer` override on `gh api --paginate --slurp`.** Node's default
+     `child_process.exec` buffer is 1 MB; the unfiltered `actions/runs` response
+     on this repo's `main` was 12.7 MB, so `exec` rejected with
+     `stdout maxBuffer length exceeded`.
+  2. **Fetch errors became "no active runs".** The `catch` block returned
+     `hasActiveRuns: false`, which the caller (`waitForBranchCI`) interpreted as
+     "branch CI is idle, ready to merge". A transient fetch/buffer/parse error
+     was indistinguishable from genuine idleness.
+
+  Fix:
+  - **Server-side `?status=` filter**, looped over the active set
+    (`in_progress`, `queued`, `waiting`, `requested`, `pending`) with run-id
+    dedup. Response size scales with active-run count, not with historical-run
+    count — typically a few KB instead of 12+ MB.
+  - **Raise `exec` `maxBuffer` to `githubLimits.bufferMaxSize`** (10 MB, env
+    `HIVE_MIND_GITHUB_BUFFER_MAX_SIZE`) for all `gh` calls in
+    `github-merge.lib.mjs` and `github-merge-repo-actions.lib.mjs`. The existing
+    `githubLimits` infrastructure was already used in `github.batch.lib.mjs`;
+    this just wires it into the `/merge` paths.
+  - **Stop swallowing fetch errors as "idle".** Errors now propagate. The
+    surrounding `waitForBranchCI` / `waitForAllRepoActions` poll loops already
+    retry on the next tick; the timeout-final check has its own try/catch that
+    returns an explicit failure (instead of a false-positive "ready to merge").
+
+  Tests:
+  [`tests/test-active-branch-runs-buffer-1722.mjs`](./tests/test-active-branch-runs-buffer-1722.mjs)
+  shadows `gh` on `PATH` with a Node script that scripts active-run responses,
+  and asserts: (a) every call uses `?status=`, (b) duplicate runs across
+  statuses are deduplicated, (c) >1 MB responses are handled cleanly, (d)
+  `gh` failures throw rather than report idle, (e) `waitForBranchCI` keeps
+  polling on errors, (f) idle branches still resolve as ready,
+  (g) `getAllActiveRepoRuns` parity.
+
+  Documentation:
+  [`docs/case-studies/issue-1722/`](./docs/case-studies/issue-1722/README.md)
+  contains the timeline (with downloaded bot log, cancelled-run logs, run
+  metadata), facts, per-symptom root-cause analysis, and solution plan.
+  [`experiments/issue-1722-buffer-overflow.mjs`](./experiments/issue-1722-buffer-overflow.mjs)
+  is a minimal reproduction. No upstream report required — the fix lives
+  entirely in this repo.
+
+- 1a92ca1: Fix flaky CI `test-suites` job caused by `use-m`'s no-retry global npm install
+  — issue #1724.
+
+  CI run [25109962685](https://github.com/link-assistant/hive-mind/actions/runs/25109962685/job/73581228475)
+  on `main` failed in the `test-suites` job at the third test file
+  (`tests/test-active-branch-runs-buffer-1722.mjs`) with:
+
+  ```
+  Error: Failed to install command-stream@latest globally.
+  [cause]: Error: Command failed: npm install -g command-stream-v-latest@npm:command-stream@latest
+  npm error code ENOTEMPTY
+  npm error path /opt/hostedtoolcache/node/24.14.1/x64/lib/node_modules/command-stream-v-latest/js/src/commands
+  ```
+
+  Root cause: `src/github.lib.mjs` and `src/playwright-mcp.lib.mjs` call
+  `await use('command-stream')` at module top level (via `use-m`). Every test
+  file that transitively imports either module re-runs
+  `npm install -g command-stream-v-latest@npm:command-stream@latest`. `use-m`'s
+  `ensurePackageInstalled` issues a single `npm install -g` with no retry, and
+  npm intermittently fails with `ENOTEMPTY: directory not empty, rmdir` on
+  GitHub-hosted Ubuntu runners (a long-standing npm rmdir race against itself
+  when the previous global install left files behind).
+
+  Fix:
+  - New
+    [`scripts/preinstall-use-m-packages.mjs`](./scripts/preinstall-use-m-packages.mjs)
+    pre-installs every package the codebase loads through `use-m @latest`
+    (`command-stream`, `getenv`, `links-notation`, `@dotenvx/dotenvx`,
+    `telegraf`, `zx`, `yargs`) using the same alias scheme `use-m` does
+    (`<pkg-without-@-or-/>-v-latest`), with exponential-backoff retry on the
+    flake symptoms (`ENOTEMPTY` / `EBUSY` / `EPERM` / `ECONNRESET` / `ETIMEDOUT`
+    / `EAI_AGAIN` / `429` / `503`). After this step, `use-m`'s
+    `installedVersion === latestVersion` early-return path skips the install at
+    test time, so test imports never touch `npm install -g` again.
+  - The script also satisfies the case-study "verbose mode for next iteration"
+    requirement via `PREINSTALL_USE_M_VERBOSE=1` (or `RUNNER_DEBUG=1`), which
+    logs each attempt's command, stdout, stderr, and backoff delay, and
+    recognizes "package present on disk after a flake" as recovered success.
+  - Wires `node scripts/preinstall-use-m-packages.mjs` into the `test-suites`
+    and `test-execution` jobs in
+    [`.github/workflows/release.yml`](./.github/workflows/release.yml) right
+    after `npm install`, before any step that runs test files or `solve.mjs`.
+
+  Tests:
+  [`tests/test-preinstall-use-m-packages-1724.mjs`](./tests/test-preinstall-use-m-packages-1724.mjs)
+  covers the alias scheme, retryable-error matcher, exponential backoff, and
+  the four `installWithRetry` paths (first-success, retry-then-succeed,
+  non-retryable-abort, recovered-from-disk) deterministically (no real npm
+  calls). Marked `@hive-mind-test-suite default` so it runs in the same job
+  that previously flaked.
+
+  Documentation:
+  [`docs/case-studies/issue-1724/`](./docs/case-studies/issue-1724/README.md)
+  contains the timeline, verbatim error, downloaded failed-run logs, the
+  no-retry snippet from the live `use-m` source
+  (`logs/use-m-source.js`), the comparison with both pipeline templates
+  (JS/Rust — neither template uses `use-m @latest` at module load yet, so the
+  flake is hive-mind-specific until they do), and the implementation plan.
+
 ## 1.59.4
 
 ### Patch Changes
package/package.json
CHANGED

-  "version": "1.59.4",
+  "version": "1.59.5",

package/src/github-merge-repo-actions.lib.mjs
CHANGED

@@ -11,7 +11,14 @@
 
 import { promisify } from 'util';
 import { exec as execCallback } from 'child_process';
-const exec = promisify(execCallback);
+import { githubLimits } from './config.lib.mjs';
+const execRaw = promisify(execCallback);
+// Issue #1722: raise exec maxBuffer above Node's 1 MB default for paginated gh
+// API responses (workflow runs can easily exceed that on busy repos).
+const exec = (cmd, opts = {}) => execRaw(cmd, { maxBuffer: githubLimits.bufferMaxSize, ...opts });
+
+// Statuses we treat as "not yet finished".
+const ACTIVE_RUN_STATUSES = ['in_progress', 'queued', 'waiting', 'requested', 'pending'];
 
 /**
  * Get ALL active workflow runs across the entire repository (no branch filter).

@@ -21,20 +28,34 @@ const exec = promisify(execCallback);
  * @returns {Promise<{runs: Array, hasActiveRuns: boolean, count: number}>}
  */
 export async function getAllActiveRepoRuns(owner, repo, verbose = false) {
-
-
-
-
-
-
-
-
-
+  // Issue #1722: filter on the server side per status to avoid pulling the full
+  // history of workflow runs (which can exceed exec maxBuffer). Also: do not
+  // swallow errors as "no active runs" — bubble them up so callers can retry
+  // instead of merging on top of a still-running CI run.
+  const seen = new Set();
+  const runs = [];
+  for (const status of ACTIVE_RUN_STATUSES) {
+    const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?status=${status}&per_page=100" --paginate --slurp`);
+    const pages = JSON.parse(stdout.trim() || '[]');
+    for (const page of pages) {
+      for (const run of page.workflow_runs || []) {
+        if (seen.has(run.id)) continue;
+        seen.add(run.id);
+        runs.push({
+          id: run.id,
+          name: run.name,
+          status: run.status,
+          head_branch: run.head_branch,
+          head_sha: run.head_sha?.slice(0, 7),
+        });
+      }
     }
-    return { runs, hasActiveRuns: runs.length > 0, count: runs.length };
-  } catch {
-    return { runs: [], hasActiveRuns: false, count: 0 };
   }
+  if (verbose && runs.length > 0) {
+    console.log(`[VERBOSE] repo-actions: ${runs.length} active run(s) in ${owner}/${repo}`);
+    for (const r of runs) console.log(`[VERBOSE] repo-actions: ${r.name} (${r.status}) on ${r.head_branch}`);
+  }
+  return { runs, hasActiveRuns: runs.length > 0, count: runs.length };
 }
 
 /**

@@ -52,7 +73,16 @@ export async function waitForAllRepoActions(owner, repo, options = {}, verbose =
   let peakRunCount = 0;
 
   while (Date.now() - startTime < timeout) {
-
+    let active;
+    try {
+      active = await getAllActiveRepoRuns(owner, repo, verbose);
+    } catch (error) {
+      // Issue #1722: do not silently treat fetch errors as "no active runs".
+      // Log and retry on the next poll instead.
+      console.error(`[ERROR] repo-actions: Error checking repo CI: ${error.message}`);
+      await new Promise(resolve => setTimeout(resolve, pollInterval));
+      continue;
+    }
     if (onStatusUpdate) {
       try {
         await onStatusUpdate({ ...active, elapsedMs: Date.now() - startTime });

@@ -66,7 +96,15 @@ export async function waitForAllRepoActions(owner, repo, options = {}, verbose =
     peakRunCount = Math.max(peakRunCount, active.count);
     await new Promise(resolve => setTimeout(resolve, pollInterval));
   }
-
+  // Issue #1722: if the timeout-final check throws, surface that as an error
+  // rather than reporting "no remaining runs".
+  let finalRuns;
+  try {
+    finalRuns = await getAllActiveRepoRuns(owner, repo, verbose);
+  } catch (error) {
+    console.error(`[ERROR] repo-actions: Final CI check failed after timeout: ${error.message}`);
+    return { success: false, waitedForRuns: true, timedOut: true, remainingRuns: [] };
+  }
   return { success: false, waitedForRuns: true, timedOut: true, remainingRuns: finalRuns.runs };
 }
 
package/src/github-merge.lib.mjs
CHANGED

@@ -14,9 +14,17 @@
 import { promisify } from 'util';
 import { exec as execCallback } from 'child_process';
 
-const exec = promisify(execCallback);
+const execRaw = promisify(execCallback);
 
 import { parseGitHubUrl } from './github.lib.mjs';
+import { githubLimits } from './config.lib.mjs';
+
+// Issue #1722: gh api `--paginate --slurp` responses for repos with many
+// historical workflow runs can easily exceed Node's default 1 MB exec buffer
+// (observed 12.7 MB on this repo's main branch). Default to the configured
+// githubLimits.bufferMaxSize (10 MB; HIVE_MIND_GITHUB_BUFFER_MAX_SIZE) for all
+// gh calls in this file.
+const exec = (cmd, opts = {}) => execRaw(cmd, { maxBuffer: githubLimits.bufferMaxSize, ...opts });
 
 // Issue #1413: Import ready tag sync, timeline, and label constant from separate module
 // to keep this file under the 1500 line limit

@@ -674,9 +682,20 @@ export function parseRepositoryUrl(url) {
   };
 }
 
+/**
+ * Statuses we treat as "still running" / "not yet finished".
+ * Issue #1722: be exhaustive — GitHub uses several non-completed statuses.
+ */
+const ACTIVE_RUN_STATUSES = ['in_progress', 'queued', 'waiting', 'requested', 'pending'];
+
 /**
  * Get active workflow runs on a specific branch
  * Issue #1307: Used to check if there are any in-progress or queued runs on the target branch
+ * Issue #1722: Filter on the server side per status, otherwise the unfiltered
+ * `--paginate --slurp` response can overflow exec maxBuffer on busy repos
+ * (observed 12.7 MB on link-assistant/hive-mind main). Also: errors are now
+ * surfaced rather than swallowed as `hasActiveRuns: false`, which previously
+ * caused /merge to merge on top of a still-running CI run.
  * @param {string} owner - Repository owner
  * @param {string} repo - Repository name
  * @param {string} branch - Branch name (default: main)

@@ -684,36 +703,38 @@ export function parseRepositoryUrl(url) {
  * @returns {Promise<{runs: Array<Object>, hasActiveRuns: boolean, count: number}>}
  */
 export async function getActiveBranchRuns(owner, repo, branch = 'main', verbose = false) {
-
-
-
-const
-
-
-
-
-
-
-
-
+  const seen = new Set();
+  const runs = [];
+  for (const status of ACTIVE_RUN_STATUSES) {
+    const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?branch=${branch}&status=${status}&per_page=100" --paginate --slurp`);
+    const pages = JSON.parse(stdout.trim() || '[]');
+    for (const page of pages) {
+      for (const run of page.workflow_runs || []) {
+        if (seen.has(run.id)) continue;
+        seen.add(run.id);
+        runs.push({
+          id: run.id,
+          name: run.name,
+          status: run.status,
+          created_at: run.created_at,
+          html_url: run.html_url,
+        });
       }
     }
+  }
 
-
-
-
-
-    };
-  } catch (error) {
-    if (verbose) {
-      console.log(`[VERBOSE] /merge: Error checking active runs on ${branch}: ${error.message}`);
+  if (verbose) {
+    console.log(`[VERBOSE] /merge: Found ${runs.length} active runs on ${owner}/${repo} branch ${branch}`);
+    for (const run of runs) {
+      console.log(`[VERBOSE] /merge: - Run #${run.id}: ${run.name} (${run.status})`);
     }
-    return {
-      runs: [],
-      hasActiveRuns: false,
-      count: 0,
-    };
   }
+
+  return {
+    runs,
+    hasActiveRuns: runs.length > 0,
+    count: runs.length,
+  };
 }
 
 /**

@@ -788,7 +809,20 @@ export async function waitForBranchCI(owner, repo, branch = 'main', options = {}
   }
 
   // Timeout reached
-
+  // Issue #1722: if the final check throws, do NOT silently report "ready".
+  // Treat it the same as still-active (force a timeout failure), so /merge
+  // waits/retries instead of merging on top of a still-running CI run.
+  let finalCheck;
+  try {
+    finalCheck = await getActiveBranchRuns(owner, repo, branch, verbose);
+  } catch (error) {
+    return {
+      success: false,
+      waitedForRuns: true,
+      completedRuns: totalWaitedRuns,
+      error: `Timeout reached and final CI check failed on ${branch}: ${error.message}`,
+    };
+  }
   if (finalCheck.hasActiveRuns) {
     return {
       success: false,