@link-assistant/hive-mind 1.59.4 → 1.59.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/CHANGELOG.md +242 -0
  2. package/package.json +1 -1
  3. package/src/bidirectional-interactive.lib.mjs +1 -0
  4. package/src/contributing-guidelines.lib.mjs +3 -2
  5. package/src/github-error-reporter.lib.mjs +3 -2
  6. package/src/github-merge-ci-signals.lib.mjs +8 -2
  7. package/src/github-merge-ci.lib.mjs +8 -2
  8. package/src/github-merge-ready-sync.lib.mjs +7 -1
  9. package/src/github-merge-repo-actions.lib.mjs +59 -15
  10. package/src/github-merge.lib.mjs +100 -58
  11. package/src/github-rate-limit.lib.mjs +276 -0
  12. package/src/github.batch.lib.mjs +1 -0
  13. package/src/hive.mjs +2 -2
  14. package/src/hive.recheck.lib.mjs +1 -0
  15. package/src/lib.mjs +30 -4
  16. package/src/limits.lib.mjs +1 -0
  17. package/src/protect-branch.mjs +3 -2
  18. package/src/queue-config.lib.mjs +7 -3
  19. package/src/review.mjs +3 -2
  20. package/src/reviewers-hive.mjs +3 -2
  21. package/src/solve.accept-invite.lib.mjs +7 -1
  22. package/src/solve.auto-continue.lib.mjs +3 -2
  23. package/src/solve.auto-ensure.lib.mjs +3 -2
  24. package/src/solve.auto-merge-helpers.lib.mjs +3 -2
  25. package/src/solve.auto-merge.lib.mjs +3 -2
  26. package/src/solve.auto-pr.lib.mjs +1 -0
  27. package/src/solve.branch-errors.lib.mjs +1 -0
  28. package/src/solve.error-handlers.lib.mjs +1 -0
  29. package/src/solve.execution.lib.mjs +3 -2
  30. package/src/solve.feedback.lib.mjs +1 -0
  31. package/src/solve.mjs +3 -1
  32. package/src/solve.preparation.lib.mjs +1 -0
  33. package/src/solve.progress-monitoring.lib.mjs +1 -0
  34. package/src/solve.repository.lib.mjs +3 -3
  35. package/src/solve.restart-shared.lib.mjs +3 -2
  36. package/src/solve.results.lib.mjs +3 -2
  37. package/src/solve.session.lib.mjs +1 -0
  38. package/src/solve.watch.lib.mjs +3 -2
  39. package/src/telegram-accept-invitations.lib.mjs +7 -1
  40. package/src/token-sanitization.lib.mjs +1 -0
  41. package/src/youtrack/youtrack-sync.mjs +1 -0
package/CHANGELOG.md CHANGED
@@ -1,5 +1,247 @@
  # @link-assistant/hive-mind
 
+ ## 1.59.6
+
+ ### Patch Changes
+
+ - d6d05a0: Fully safeguard from GitHub API rate-limit errors — issue #1726.
+
+   `/merge` merged a draft PR even though every `gh api` call had been failing
+   with `HTTP 403: API rate limit exceeded`. The merge subsystem caught those
+   errors silently in `getActiveRepoWorkflows()` and reported _"no CI checks
+   and repo has no active workflows — no CI/CD configured"_, which `/merge`
+   interpreted as _"all clear"_. Verbose log
+   ([`docs/case-studies/issue-1726/data/a4dccea2-a941-4a0c-a50e-60b1ed454e1e.log`](./docs/case-studies/issue-1726/data/a4dccea2-a941-4a0c-a50e-60b1ed454e1e.log),
+   lines 40251–40269):
+
+   ```
+   [VERBOSE] /merge: Error fetching workflows for link-foundation/relative-meta-logic:
+   Command failed: gh api "repos/link-foundation/relative-meta-logic/actions/workflows" --paginate --slurp
+   gh: API rate limit exceeded for user ID 1431904 ... (HTTP 403)
+
+   [VERBOSE] /merge: PR #100 has no CI checks and repo has no active workflows - no CI/CD configured
+   ```
+
+   Two compounding root causes:
+   1. **`getActiveRepoWorkflows()` swallowed exceptions** in
+      [`src/github-merge.lib.mjs`](./src/github-merge.lib.mjs) and returned
+      `[]`. Rate-limit responses became "this repo has no workflows", which the
+      merge gate treated as "no CI configured, safe to merge".
+   2. **No gh API call site had rate-limit retry**. The existing
+      `ghCmdRetry`/`ghRetry` helpers only recognised transient TCP/TLS faults,
+      so a 403 fell straight through. Roughly 135 raw `$gh ...` and
+      ``exec(`gh ...`)`` call sites were scattered across `src/solve.*`,
+      `src/github-merge.*`, scripts, and reviewers.
+
+   Fix:
+   - **New rate-limit module**
+     [`src/github-rate-limit.lib.mjs`](./src/github-rate-limit.lib.mjs) with
+     `isRateLimitError`, `parseRateLimitReset`, `fetchNextRateLimitReset`,
+     `computeRateLimitWait`, `ghWithRateLimitRetry`, `execGhWithRetry`,
+     `wrapDollarWithGhRetry`. Applies the issue's policy:
+     `wait = (resetTime − now) + bufferMs (10 min) + random(0..jitterMs) (0..5 min)`,
+     reusing `limitReset.bufferMs` / `limitReset.jitterMs` from
+     [`src/config.lib.mjs`](./src/config.lib.mjs) (introduced in #1236).
+   - **Propagate errors instead of swallowing**. `getActiveRepoWorkflows()`
+     no longer wraps the gh call in a try/catch that returns `[]`. Errors bubble
+     up; the merge gate sees the failure and stops.
+   - **Layered retry in legacy helpers**. `ghRetry` and `ghCmdRetry` in
+     [`src/lib.mjs`](./src/lib.mjs) check `isRateLimitError` first and delegate
+     to `ghWithRateLimitRetry` before applying transient-network retry.
+   - **Local `exec` shim** in 7 merge files rebound through
+     `ghWithRateLimitRetry` — converts every existing ``exec(`gh ...`)`` site
+     without per-call edits.
+   - **Wrapped `$` at every entry point** (15 files). `wrapDollarWithGhRetry`
+     routes every `$gh ...` through the retry helper while passing non-gh
+     commands unchanged.
+   - **Marker imports** in 17 callee files that receive `$` as a parameter,
+     declaring rate-limit awareness for the ESLint rule.
+   - **Queue threshold lowered** from 75% to 50% in
+     [`src/queue-config.lib.mjs`](./src/queue-config.lib.mjs).
+   - **Custom ESLint rule**
+     [`eslint-rules/no-direct-gh-exec.mjs`](./eslint-rules/no-direct-gh-exec.mjs)
+     flags any unsafe `gh` exec call site; files that import a known-safe
+     wrapper are exempted at file scope.
+
+   Tests:
+   - [`tests/github-rate-limit.test.mjs`](./tests/github-rate-limit.test.mjs)
+     — 22 unit tests covering `isRateLimitError` (primary, secondary,
+     abuse-detection, stderr, cause-chain), `parseRateLimitReset` (header
+     variants), `computeRateLimitWait` (future / null / past reset, jitter
+     bounds), `ghWithRateLimitRetry` (success, propagation, retry-then-succeed,
+     exhausted retries), `wrapDollarWithGhRetry` (passthrough, retry,
+     propagation).
+   - [`tests/test-no-direct-gh-exec-rule.mjs`](./tests/test-no-direct-gh-exec-rule.mjs)
+     — RuleTester valid/invalid cases.
+   - Updated `tests/queue-config.test.mjs` and `tests/limits-display.test.mjs`
+     for the 50% threshold.
+
+   Documentation:
+   [`docs/case-studies/issue-1726/`](./docs/case-studies/issue-1726/README.md)
+   contains the failing run logs, root-cause analysis, fix breakdown, and
+   verification commands.
+
+ - bb0af8c: Fix `check-file-line-limits` CI failure on `main` after issue #1726 merge.
+
+   After PR #1726 (rate-limit safeguards) merged into `main`, the
+   `check-file-line-limits` job failed because three `.mjs` files crossed the
+   1500-line hard limit:
+   - `src/hive.mjs` — 1500 → 1504 lines
+   - `src/limits.lib.mjs` — 1497 → 1501 lines
+   - `src/solve.repository.lib.mjs` — 1500 → 1501 lines
+
+   Two root causes combined: (1) the per-file marker block PR #1726 added was 4
+   lines (2 comment lines + import + `void`), with no headroom check; (2) ESLint's
+   `max-lines` rule was configured with `skipBlankLines: true, skipComments: true`
+   while the CI script counts raw `wc -l`, so `npm run lint` passed locally even
+   though the CI script would fail. Local lint and CI line-limit had silently
+   drifted apart. See
+   [`docs/case-studies/issue-1730`](./docs/case-studies/issue-1730/README.md)
+   for the timeline, log excerpts, and template comparison.
+
+   Fix:
+   - **Synchronize ESLint `max-lines` with the CI script** in
+     [`eslint.config.mjs`](./eslint.config.mjs) by setting `skipBlankLines: false,
+     skipComments: false`. Now `npm run lint` catches the failure locally before
+     push, restoring the invariant the rule's comment claimed.
+   - **Compact the rate-limit marker** introduced by #1726 from 4 lines to 1 line
+     in all 17 files. ESLint's existing `varsIgnorePattern: '^_'` means the
+     `void _wrapDollarWithGhRetry;` line was redundant; the trailing-comment form
+     preserves rate-limit awareness for `no-direct-gh-exec` while saving 3 lines
+     per file. Files: `src/hive.mjs`, `src/limits.lib.mjs`,
+     `src/{solve.session,solve.preparation,solve.progress-monitoring,solve.error-handlers,solve.feedback,solve.auto-pr,solve.branch-errors,hive.recheck,github.batch,bidirectional-interactive,token-sanitization}.lib.mjs`,
+     `src/youtrack/youtrack-sync.mjs`,
+     `scripts/{create-github-release,format-github-release,format-release-notes}.mjs`.
+   - **Compact `solve.repository.lib.mjs`** wrap pattern from 4 lines to 3 while
+     keeping the destructure form so `eslint-rules/no-direct-gh-exec.mjs` still
+     recognizes `wrapDollarWithGhRetry` in scope.
+
+   After the fix, all three previously-failing files are at or below 1500 raw
+   lines (1500 / 1498 / 1500), and `npm run lint` now rejects any
+   re-introduction of the regression.
+
+ ## 1.59.5
+
+ ### Patch Changes
+
+ - bb24175: Fix `/merge` to correctly detect active CI runs on the default branch — issue
+   #1722.
+
+   The `/merge` command merged PR #1719 even though a CI/CD workflow run was
+   still in progress on `main`. The merge triggered a new run, which cancelled
+   the previous one. Verbose log:
+
+   ```
+   [VERBOSE] /merge: Checking for active CI runs on link-assistant/hive-mind branch main...
+   [VERBOSE] /merge: Error checking active runs on main: stdout maxBuffer length exceeded
+   [VERBOSE] /merge: No active CI runs on main branch. Ready to proceed.
+   ```
+
+   Two compounding root causes in
+   [`src/github-merge.lib.mjs`](./src/github-merge.lib.mjs)
+   `getActiveBranchRuns()` (and the parallel
+   [`src/github-merge-repo-actions.lib.mjs`](./src/github-merge-repo-actions.lib.mjs)
+   `getAllActiveRepoRuns()` introduced by issue #1503):
+   1. **No `maxBuffer` override on `gh api --paginate --slurp`.** Node's default
+      `child_process.exec` buffer is 1 MB; the unfiltered `actions/runs` response
+      on this repo's `main` was 12.7 MB, so `exec` rejected with
+      `stdout maxBuffer length exceeded`.
+   2. **Fetch errors became "no active runs".** The `catch` block returned
+      `hasActiveRuns: false`, which the caller (`waitForBranchCI`) interpreted as
+      "branch CI is idle, ready to merge". A transient fetch/buffer/parse error
+      was indistinguishable from genuine idleness.
+
+   Fix:
+   - **Server-side `?status=` filter**, looped over the active set
+     (`in_progress`, `queued`, `waiting`, `requested`, `pending`) with run-id
+     dedup. Response size scales with active-run count, not with historical-run
+     count — typically a few KB instead of 12+ MB.
+   - **Raise `exec` `maxBuffer` to `githubLimits.bufferMaxSize`** (10 MB, env
+     `HIVE_MIND_GITHUB_BUFFER_MAX_SIZE`) for all `gh` calls in
+     `github-merge.lib.mjs` and `github-merge-repo-actions.lib.mjs`. The existing
+     `githubLimits` infrastructure was already used in `github.batch.lib.mjs`;
+     this just wires it into the `/merge` paths.
+   - **Stop swallowing fetch errors as "idle".** Errors now propagate. The
+     surrounding `waitForBranchCI` / `waitForAllRepoActions` poll loops already
+     retry on the next tick; the timeout-final check has its own try/catch that
+     returns an explicit failure (instead of a false-positive "ready to merge").
+
+   Tests:
+   [`tests/test-active-branch-runs-buffer-1722.mjs`](./tests/test-active-branch-runs-buffer-1722.mjs)
+   shadows `gh` on `PATH` with a Node script that serves scripted active-run
+   responses, and asserts: (a) every call uses `?status=`, (b) duplicate runs across
+   statuses are deduplicated, (c) >1 MB responses are handled cleanly, (d)
+   `gh` failures throw rather than report idle, (e) `waitForBranchCI` keeps
+   polling on errors, (f) idle branches still resolve as ready,
+   (g) `getAllActiveRepoRuns` parity.
+
+   Documentation:
+   [`docs/case-studies/issue-1722/`](./docs/case-studies/issue-1722/README.md)
+   contains the timeline (with downloaded bot log, cancelled-run logs, run
+   metadata), facts, per-symptom root-cause analysis, and solution plan.
+   [`experiments/issue-1722-buffer-overflow.mjs`](./experiments/issue-1722-buffer-overflow.mjs)
+   is a minimal reproduction. No upstream report required — the fix lives
+   entirely in this repo.
+
+ - 1a92ca1: Fix flaky CI `test-suites` job caused by `use-m`'s no-retry global npm install
+   — issue #1724.
+
+   CI run [25109962685](https://github.com/link-assistant/hive-mind/actions/runs/25109962685/job/73581228475)
+   on `main` failed in the `test-suites` job at the third test file
+   (`tests/test-active-branch-runs-buffer-1722.mjs`) with:
+
+   ```
+   Error: Failed to install command-stream@latest globally.
+   [cause]: Error: Command failed: npm install -g command-stream-v-latest@npm:command-stream@latest
+   npm error code ENOTEMPTY
+   npm error path /opt/hostedtoolcache/node/24.14.1/x64/lib/node_modules/command-stream-v-latest/js/src/commands
+   ```
+
+   Root cause: `src/github.lib.mjs` and `src/playwright-mcp.lib.mjs` call
+   `await use('command-stream')` at module top level (via `use-m`). Every test
+   file that transitively imports either module re-runs
+   `npm install -g command-stream-v-latest@npm:command-stream@latest`. `use-m`'s
+   `ensurePackageInstalled` issues a single `npm install -g` with no retry, and
+   npm intermittently fails with `ENOTEMPTY: directory not empty, rmdir` on
+   GitHub-hosted Ubuntu runners (a long-standing npm rmdir race against itself
+   when the previous global install left files behind).
+
+   Fix:
+   - New
+     [`scripts/preinstall-use-m-packages.mjs`](./scripts/preinstall-use-m-packages.mjs)
+     pre-installs every package the codebase loads through `use-m @latest`
+     (`command-stream`, `getenv`, `links-notation`, `@dotenvx/dotenvx`,
+     `telegraf`, `zx`, `yargs`) using the same alias scheme `use-m` does
+     (`<pkg-without-@-or-/>-v-latest`), with exponential-backoff retry on the
+     flake symptoms (`ENOTEMPTY` / `EBUSY` / `EPERM` / `ECONNRESET` / `ETIMEDOUT`
+     / `EAI_AGAIN` / `429` / `503`). After this step, `use-m`'s
+     `installedVersion === latestVersion` early-return path skips the install at
+     test time, so test imports never touch `npm install -g` again.
+   - The script also satisfies the case-study "verbose mode for next iteration"
+     requirement via `PREINSTALL_USE_M_VERBOSE=1` (or `RUNNER_DEBUG=1`), which
+     logs each attempt's command, stdout, stderr, and backoff delay, and
+     recognizes "package present on disk after a flake" as recovered success.
+   - Wires `node scripts/preinstall-use-m-packages.mjs` into the `test-suites`
+     and `test-execution` jobs in
+     [`.github/workflows/release.yml`](./.github/workflows/release.yml) right
+     after `npm install`, before any step that runs test files or `solve.mjs`.
+
+   Tests:
+   [`tests/test-preinstall-use-m-packages-1724.mjs`](./tests/test-preinstall-use-m-packages-1724.mjs)
+   covers the alias scheme, retryable-error matcher, exponential backoff, and
+   the four `installWithRetry` paths (first-success, retry-then-succeed,
+   non-retryable-abort, recovered-from-disk) deterministically (no real npm
+   calls). Marked `@hive-mind-test-suite default` so it runs in the same job
+   that previously flaked.
+
+   Documentation:
+   [`docs/case-studies/issue-1724/`](./docs/case-studies/issue-1724/README.md)
+   contains the timeline, verbatim error, downloaded failed-run logs, the
+   no-retry snippet from the live `use-m` source
+   (`logs/use-m-source.js`), the comparison with both pipeline templates
+   (JS/Rust — neither template uses `use-m @latest` at module load yet, so the
+   flake is hive-mind-specific until they do), and the implementation plan.
+
  ## 1.59.4
 
  ### Patch Changes
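The wait policy quoted in the 1.59.6 entry, `wait = (resetTime − now) + bufferMs + random(0..jitterMs)`, can be sketched as follows. This is an illustrative reconstruction, not the published `src/github-rate-limit.lib.mjs` source; the parameter names mirror the `limitReset.bufferMs` / `limitReset.jitterMs` values the changelog cites.

```javascript
// Sketch of the rate-limit wait policy from the 1.59.6 entry (assumed shape,
// not the published implementation).
// resetTimeMs: epoch milliseconds when the GitHub quota resets (null if unknown).
function computeRateLimitWait(resetTimeMs, {
  bufferMs = 10 * 60 * 1000, // fixed safety buffer (10 min per the changelog)
  jitterMs = 5 * 60 * 1000,  // upper bound of the random jitter (0..5 min)
  now = Date.now(),
} = {}) {
  // A null or already-past reset still waits buffer + jitter; never negative.
  const untilReset = resetTimeMs == null ? 0 : Math.max(0, resetTimeMs - now);
  const jitter = Math.floor(Math.random() * (jitterMs + 1)); // random(0..jitterMs)
  return untilReset + bufferMs + jitter;
}
```

With a reset one minute away, the computed wait is at least 11 minutes (reset plus buffer) and at most 16 minutes (plus full jitter), matching the "jitter bounds" cases the changelog's unit tests describe.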
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@link-assistant/hive-mind",
-   "version": "1.59.4",
+   "version": "1.59.6",
    "description": "AI-powered issue solver and hive mind for collaborative problem solving",
    "main": "src/hive.mjs",
    "type": "module",
@@ -22,6 +22,7 @@
   * @experimental
   */
 
+ import { wrapDollarWithGhRetry as _wrapDollarWithGhRetry } from './github-rate-limit.lib.mjs'; // rate-limit marker (#1726): gh API calls flow through $ wrapped by caller
  // Configuration constants
  const CONFIG = {
    // Minimum time between comment checks to avoid rate limiting (in ms)
@@ -9,8 +9,9 @@ if (typeof globalThis.use === 'undefined') {
    globalThis.use = (await eval(await (await fetch('https://unpkg.com/use-m/use.js')).text())).use;
  }
 
- const { $ } = await use('command-stream');
-
+ const { $: __rawDollar$ } = await use('command-stream');
+ const { wrapDollarWithGhRetry } = await import('./github-rate-limit.lib.mjs');
+ const $ = wrapDollarWithGhRetry(__rawDollar$);
  /**
   * Common paths where contributing guidelines might be found
   */
@@ -13,8 +13,9 @@ if (typeof globalThis.use === 'undefined') {
  }
 
  const fs = (await use('fs')).promises;
- const { $ } = await use('command-stream');
-
+ const { $: __rawDollar$ } = await use('command-stream');
+ const { wrapDollarWithGhRetry } = await import('./github-rate-limit.lib.mjs');
+ const $ = wrapDollarWithGhRetry(__rawDollar$);
  const GITHUB_ISSUE_BODY_MAX_SIZE = 60000;
  const GITHUB_FILE_MAX_SIZE = 10 * 1024 * 1024;
 
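The `const $ = wrapDollarWithGhRetry(__rawDollar$)` pattern in the hunks above can be illustrated with a minimal sketch: wrap the `command-stream` tagged-template `$` so that commands starting with `gh ` flow through a retry helper while everything else passes through unchanged. The `retry` parameter here is a hypothetical stand-in for the real rate-limit retry logic; this is not the published wrapper.

```javascript
// Minimal sketch of the $-wrapping idea (assumed shape; the real wrapper lives
// in src/github-rate-limit.lib.mjs and retries on GitHub rate-limit errors).
function wrapDollarWithGhRetry(rawDollar, { retry = (run) => run() } = {}) {
  return (strings, ...values) => {
    const run = () => rawDollar(strings, ...values);
    // Only `gh ...` commands get retry treatment; other commands pass through.
    return strings[0].trimStart().startsWith('gh ') ? retry(run) : run();
  };
}
```

Because the wrapper keeps the tagged-template call shape, existing `` $`gh ...` `` call sites need no edits, which is the point the 1.59.6 entry makes about wrapping `$` once per entry file.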
@@ -13,8 +13,14 @@
 
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
-
- const exec = promisify(execCallback);
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
+
+ const execRaw = promisify(execCallback);
+ // Issue #1726: rate-limit safe gh wrapper.
+ const exec = (cmd, opts) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, opts), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
 
  /**
   * Get the committed date of a specific commit from GitHub API
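The changelog's unit tests describe `isRateLimitError` as recognising primary, secondary, and abuse-detection limits, in `message`, `stderr`, and down the `cause` chain. A sketch under those assumptions follows; the patterns are inferred from the gh output quoted in the 1.59.6 entry, not taken from the published module.

```javascript
// Sketch of rate-limit detection (assumed patterns, based on gh CLI output
// like "gh: API rate limit exceeded ... (HTTP 403)" quoted in the changelog).
const RATE_LIMIT_PATTERNS = [
  /api rate limit exceeded/i, // primary quota
  /secondary rate limit/i,    // secondary quota
  /abuse detection/i,         // abuse-detection throttling
];

function isRateLimitError(error) {
  // Walk the cause chain so wrapped errors are still recognised.
  for (let e = error; e; e = e.cause) {
    const text = [e.message, e.stderr, e.stdout].filter(Boolean).join('\n');
    if (RATE_LIMIT_PATTERNS.some((re) => re.test(text))) return true;
  }
  return false;
}
```

Checking `stderr` matters because `child_process.exec` failures carry the gh error text there rather than in `message`.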
@@ -11,8 +11,14 @@
  import { getWorkflowRunsForSha } from './github-merge.lib.mjs';
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
-
- const exec = promisify(execCallback);
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
+
+ const execRaw = promisify(execCallback);
+ // Issue #1726: every gh call must be rate-limit safe.
+ const exec = (cmd, opts) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, opts), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
 
  /**
   * Wait for all workflow runs triggered by a specific commit to complete
@@ -11,8 +11,14 @@
 
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
 
- const exec = promisify(execCallback);
+ const execRaw = promisify(execCallback);
+ // Issue #1726: rate-limit safe gh wrapper.
+ const exec = (cmd, opts) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, opts), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
 
  import { extractLinkedIssueNumber } from './github-linking.lib.mjs';
 
@@ -11,7 +11,20 @@
 
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
- const exec = promisify(execCallback);
+ import { githubLimits } from './config.lib.mjs';
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
+ const execRaw = promisify(execCallback);
+ // Issue #1722: raise exec maxBuffer above Node's 1 MB default for paginated gh
+ // API responses (workflow runs can easily exceed that on busy repos).
+ // Issue #1726: wrap with rate-limit retry so a 5,000/hr quota hit waits for
+ // reset instead of bubbling up as a generic fetch failure.
+ const exec = (cmd, opts = {}) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, { maxBuffer: githubLimits.bufferMaxSize, ...opts }), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
+
+ // Statuses we treat as "not yet finished".
+ const ACTIVE_RUN_STATUSES = ['in_progress', 'queued', 'waiting', 'requested', 'pending'];
 
  /**
   * Get ALL active workflow runs across the entire repository (no branch filter).
@@ -21,20 +34,34 @@ const exec = promisify(execCallback);
   * @returns {Promise<{runs: Array, hasActiveRuns: boolean, count: number}>}
   */
  export async function getAllActiveRepoRuns(owner, repo, verbose = false) {
-   try {
-     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?per_page=100" --paginate --slurp`);
-     const runs = JSON.parse(stdout.trim() || '[]')
-       .flatMap(page => page.workflow_runs || [])
-       .filter(run => ['in_progress', 'queued', 'waiting', 'requested', 'pending'].includes(run.status))
-       .map(run => ({ id: run.id, name: run.name, status: run.status, head_branch: run.head_branch, head_sha: run.head_sha?.slice(0, 7) }));
-     if (verbose && runs.length > 0) {
-       console.log(`[VERBOSE] repo-actions: ${runs.length} active run(s) in ${owner}/${repo}`);
-       for (const r of runs) console.log(`[VERBOSE] repo-actions: ${r.name} (${r.status}) on ${r.head_branch}`);
+   // Issue #1722: filter on the server side per status to avoid pulling the full
+   // history of workflow runs (which can exceed exec maxBuffer). Also: do not
+   // swallow errors as "no active runs"; bubble them up so callers can retry
+   // instead of merging on top of a still-running CI run.
+   const seen = new Set();
+   const runs = [];
+   for (const status of ACTIVE_RUN_STATUSES) {
+     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?status=${status}&per_page=100" --paginate --slurp`);
+     const pages = JSON.parse(stdout.trim() || '[]');
+     for (const page of pages) {
+       for (const run of page.workflow_runs || []) {
+         if (seen.has(run.id)) continue;
+         seen.add(run.id);
+         runs.push({
+           id: run.id,
+           name: run.name,
+           status: run.status,
+           head_branch: run.head_branch,
+           head_sha: run.head_sha?.slice(0, 7),
+         });
+       }
      }
-     return { runs, hasActiveRuns: runs.length > 0, count: runs.length };
-   } catch {
-     return { runs: [], hasActiveRuns: false, count: 0 };
    }
+   if (verbose && runs.length > 0) {
+     console.log(`[VERBOSE] repo-actions: ${runs.length} active run(s) in ${owner}/${repo}`);
+     for (const r of runs) console.log(`[VERBOSE] repo-actions: ${r.name} (${r.status}) on ${r.head_branch}`);
+   }
+   return { runs, hasActiveRuns: runs.length > 0, count: runs.length };
  }
 
  /**
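The retry behaviour layered onto the local `exec` shims can be sketched as a generic wrapper: run the gh call, and on a rate-limit error wait for a computed delay before retrying; any other error, or an exhausted retry budget, propagates. `isRateLimit` and `waitMs` are injected here purely for illustration; the real helper derives the wait from GitHub's reset time plus buffer and jitter.

```javascript
// Sketch of ghWithRateLimitRetry (assumed shape). Non-rate-limit errors and
// exhausted retries always propagate; the result is never silently emptied.
async function ghWithRateLimitRetry(fn, {
  maxRetries = 3,
  isRateLimit = (e) => /rate limit/i.test(String(e.message)),
  waitMs = () => 0, // stand-in for reset + buffer + jitter
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!isRateLimit(error) || attempt >= maxRetries) throw error;
      await sleep(waitMs(error));
    }
  }
}
```

Throwing on non-rate-limit errors is the design point the 1.59.6 entry stresses: a 404 or auth failure must surface to the caller rather than be converted into an empty result.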
@@ -52,7 +79,16 @@ export async function waitForAllRepoActions(owner, repo, options = {}, verbose =
    let peakRunCount = 0;
 
    while (Date.now() - startTime < timeout) {
-     const active = await getAllActiveRepoRuns(owner, repo, verbose);
+     let active;
+     try {
+       active = await getAllActiveRepoRuns(owner, repo, verbose);
+     } catch (error) {
+       // Issue #1722: do not silently treat fetch errors as "no active runs".
+       // Log and retry on the next poll instead.
+       console.error(`[ERROR] repo-actions: Error checking repo CI: ${error.message}`);
+       await new Promise(resolve => setTimeout(resolve, pollInterval));
+       continue;
+     }
      if (onStatusUpdate) {
        try {
          await onStatusUpdate({ ...active, elapsedMs: Date.now() - startTime });
@@ -66,7 +102,15 @@
      peakRunCount = Math.max(peakRunCount, active.count);
      await new Promise(resolve => setTimeout(resolve, pollInterval));
    }
-   const finalRuns = await getAllActiveRepoRuns(owner, repo, verbose);
+   // Issue #1722: if the timeout-final check throws, surface that as an error
+   // rather than reporting "no remaining runs".
+   let finalRuns;
+   try {
+     finalRuns = await getAllActiveRepoRuns(owner, repo, verbose);
+   } catch (error) {
+     console.error(`[ERROR] repo-actions: Final CI check failed after timeout: ${error.message}`);
+     return { success: false, waitedForRuns: true, timedOut: true, remainingRuns: [] };
+   }
    return { success: false, waitedForRuns: true, timedOut: true, remainingRuns: finalRuns.runs };
  }
 
@@ -14,9 +14,28 @@
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
 
- const exec = promisify(execCallback);
+ const execRaw = promisify(execCallback);
 
  import { parseGitHubUrl } from './github.lib.mjs';
+ import { githubLimits } from './config.lib.mjs';
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
+
+ // Issue #1722: gh api `--paginate --slurp` responses for repos with many
+ // historical workflow runs can easily exceed Node's default 1 MB exec buffer
+ // (observed 12.7 MB on this repo's main branch). Default to the configured
+ // githubLimits.bufferMaxSize (10 MB; HIVE_MIND_GITHUB_BUFFER_MAX_SIZE) for all
+ // gh calls in this file.
+ //
+ // Issue #1726: every gh call in the merge subsystem must be rate-limit safe.
+ // Wrapping the local `exec` shim ensures all 25+ call sites pick up retry
+ // behaviour without per-call changes. Non-rate-limit errors continue to throw
+ // so genuine failures (404, auth, malformed JSON downstream) surface to the
+ // caller — they MUST NOT be swallowed as in the original /merge bug where a
+ // rate-limit error was silently treated as "no workflows".
+ const exec = (cmd, opts = {}) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, { maxBuffer: githubLimits.bufferMaxSize, ...opts }), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
 
  // Issue #1413: Import ready tag sync, timeline, and label constant from separate module
  // to keep this file under the 1500 line limit
@@ -674,9 +693,20 @@ export function parseRepositoryUrl(url) {
    };
  }
 
+ /**
+  * Statuses we treat as "still running" / "not yet finished".
+  * Issue #1722: be exhaustive — GitHub uses several non-completed statuses.
+  */
+ const ACTIVE_RUN_STATUSES = ['in_progress', 'queued', 'waiting', 'requested', 'pending'];
+
  /**
   * Get active workflow runs on a specific branch
   * Issue #1307: Used to check if there are any in-progress or queued runs on the target branch
+  * Issue #1722: Filter on the server side per status, otherwise the unfiltered
+  * `--paginate --slurp` response can overflow exec maxBuffer on busy repos
+  * (observed 12.7 MB on link-assistant/hive-mind main). Also: errors are now
+  * surfaced rather than swallowed as `hasActiveRuns: false`, which previously
+  * caused /merge to merge on top of a still-running CI run.
   * @param {string} owner - Repository owner
   * @param {string} repo - Repository name
   * @param {string} branch - Branch name (default: main)
@@ -684,36 +714,38 @@ export function parseRepositoryUrl(url) {
   * @returns {Promise<{runs: Array<Object>, hasActiveRuns: boolean, count: number}>}
   */
  export async function getActiveBranchRuns(owner, repo, branch = 'main', verbose = false) {
-   try {
-     // Query for in_progress and queued runs on the specified branch
-     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?branch=${branch}&per_page=100" --paginate --slurp`);
-     const runs = JSON.parse(stdout.trim() || '[]')
-       .flatMap(page => page.workflow_runs || [])
-       .filter(run => run.status === 'in_progress' || run.status === 'queued')
-       .map(run => ({ id: run.id, name: run.name, status: run.status, created_at: run.created_at, html_url: run.html_url }));
-
-     if (verbose) {
-       console.log(`[VERBOSE] /merge: Found ${runs.length} active runs on ${owner}/${repo} branch ${branch}`);
-       for (const run of runs) {
-         console.log(`[VERBOSE] /merge: - Run #${run.id}: ${run.name} (${run.status})`);
+   const seen = new Set();
+   const runs = [];
+   for (const status of ACTIVE_RUN_STATUSES) {
+     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?branch=${branch}&status=${status}&per_page=100" --paginate --slurp`);
+     const pages = JSON.parse(stdout.trim() || '[]');
+     for (const page of pages) {
+       for (const run of page.workflow_runs || []) {
+         if (seen.has(run.id)) continue;
+         seen.add(run.id);
+         runs.push({
+           id: run.id,
+           name: run.name,
+           status: run.status,
+           created_at: run.created_at,
+           html_url: run.html_url,
+         });
        }
      }
+   }
 
-     return {
-       runs,
-       hasActiveRuns: runs.length > 0,
-       count: runs.length,
-     };
-   } catch (error) {
-     if (verbose) {
-       console.log(`[VERBOSE] /merge: Error checking active runs on ${branch}: ${error.message}`);
+   if (verbose) {
+     console.log(`[VERBOSE] /merge: Found ${runs.length} active runs on ${owner}/${repo} branch ${branch}`);
+     for (const run of runs) {
+       console.log(`[VERBOSE] /merge: - Run #${run.id}: ${run.name} (${run.status})`);
      }
-     return {
-       runs: [],
-       hasActiveRuns: false,
-       count: 0,
-     };
    }
+
+   return {
+     runs,
+     hasActiveRuns: runs.length > 0,
+     count: runs.length,
+   };
  }
 
  /**
@@ -788,7 +820,20 @@ export async function waitForBranchCI(owner, repo, branch = 'main', options = {}
    }
 
    // Timeout reached
-   const finalCheck = await getActiveBranchRuns(owner, repo, branch, verbose);
+   // Issue #1722: if the final check throws, do NOT silently report "ready".
+   // Treat it the same as still-active (force a timeout failure), so /merge
+   // waits/retries instead of merging on top of a still-running CI run.
+   let finalCheck;
+   try {
+     finalCheck = await getActiveBranchRuns(owner, repo, branch, verbose);
+   } catch (error) {
+     return {
+       success: false,
+       waitedForRuns: true,
+       completedRuns: totalWaitedRuns,
+       error: `Timeout reached and final CI check failed on ${branch}: ${error.message}`,
+     };
+   }
    if (finalCheck.hasActiveRuns) {
      return {
        success: false,
@@ -1306,40 +1351,37 @@ export async function getWorkflowRunJobsCount(owner, repo, runId, verbose = fals
   * @returns {Promise<{count: number, hasWorkflows: boolean, workflows: Array<{id: number, name: string, state: string, path: string}>}>}
   */
  export async function getActiveRepoWorkflows(owner, repo, verbose = false) {
-   try {
-     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/workflows" --paginate --slurp`);
-     const allWorkflows = JSON.parse(stdout.trim() || '[]')
-       .flatMap(page => page.workflows || [])
-       .filter(workflow => workflow.state === 'active')
-       .map(workflow => ({ id: workflow.id, name: workflow.name, state: workflow.state, path: workflow.path }));
-
-     // GitHub Pages workflows only run after merge and never produce PR check-runs.
-     const workflows = allWorkflows.filter(wf => !wf.path.startsWith('dynamic/pages/'));
-
-     if (verbose) {
-       console.log(`[VERBOSE] /merge: Found ${allWorkflows.length} active workflows in ${owner}/${repo} (${workflows.length} PR-relevant after filtering out GitHub Pages deployment workflows)`);
-       for (const wf of allWorkflows) {
-         const filtered = wf.path.startsWith('dynamic/pages/');
-         console.log(`[VERBOSE] /merge: - ${wf.name} (${wf.id}): ${wf.state}, path=${wf.path}${filtered ? ' [excluded: GitHub Pages deployment]' : ''}`);
-       }
-     }
+   // Issue #1726: this function previously swallowed every error as "no workflows",
+   // including GitHub API rate-limit responses. The /merge command then thought CI
+   // was unconfigured and proceeded as if checks had passed — a hard failure mode
+   // visible in the original case-study log where errors were thrown but the
+   // process exited 0.
+   //
+   // Rate-limit errors are now retried inside the local exec() wrapper. After
+   // retries are exhausted, the error MUST propagate so callers can decide
+   // whether to abort or continue — never default to "no workflows".
+   const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/workflows" --paginate --slurp`);
+   const allWorkflows = JSON.parse(stdout.trim() || '[]')
+     .flatMap(page => page.workflows || [])
+     .filter(workflow => workflow.state === 'active')
+     .map(workflow => ({ id: workflow.id, name: workflow.name, state: workflow.state, path: workflow.path }));
+
+   // GitHub Pages workflows only run after merge and never produce PR check-runs.
+   const workflows = allWorkflows.filter(wf => !wf.path.startsWith('dynamic/pages/'));
 
-     return {
-       count: workflows.length,
-       hasWorkflows: workflows.length > 0,
-       workflows,
-     };
-   } catch (error) {
-     if (verbose) {
-       console.log(`[VERBOSE] /merge: Error fetching workflows for ${owner}/${repo}: ${error.message}`);
+   if (verbose) {
+     console.log(`[VERBOSE] /merge: Found ${allWorkflows.length} active workflows in ${owner}/${repo} (${workflows.length} PR-relevant after filtering out GitHub Pages deployment workflows)`);
+     for (const wf of allWorkflows) {
+       const filtered = wf.path.startsWith('dynamic/pages/');
+       console.log(`[VERBOSE] /merge: - ${wf.name} (${wf.id}): ${wf.state}, path=${wf.path}${filtered ? ' [excluded: GitHub Pages deployment]' : ''}`);
      }
-     // On error, assume no workflows (safer: avoids false positives in the no-CI case)
-     return {
-       count: 0,
-       hasWorkflows: false,
-       workflows: [],
-     };
    }
+
+   return {
+     count: workflows.length,
+     hasWorkflows: workflows.length > 0,
+     workflows,
+   };
  }
 
  // Issue #1690: Re-export CI signal helpers from separate module to keep this file under 1500 lines
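The preinstall retry described in the 1.59.5 `use-m` entry reduces to two small pieces: a matcher for the retryable flake symptoms and an exponential backoff schedule. A sketch under the changelog's own symptom list; the function names are illustrative, not the published `scripts/preinstall-use-m-packages.mjs` source.

```javascript
// Sketch of the retry policy described in the 1.59.5 entry (illustrative names).
const RETRYABLE_TOKENS = ['ENOTEMPTY', 'EBUSY', 'EPERM', 'ECONNRESET', 'ETIMEDOUT', 'EAI_AGAIN', '429', '503'];

// Retry only on known npm/network flake symptoms; anything else aborts.
function isRetryableInstallError(output) {
  return RETRYABLE_TOKENS.some((token) => output.includes(token));
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}
```

Keeping the matcher to a fixed token list means a genuine failure such as `E404` aborts on the first attempt, which matches the "non-retryable-abort" path the changelog's tests cover.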