@link-assistant/hive-mind 1.59.4 → 1.59.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/CHANGELOG.md +242 -0
  2. package/package.json +1 -1
  3. package/src/bidirectional-interactive.lib.mjs +1 -0
  4. package/src/contributing-guidelines.lib.mjs +3 -2
  5. package/src/github-error-reporter.lib.mjs +3 -2
  6. package/src/github-merge-ci-signals.lib.mjs +8 -2
  7. package/src/github-merge-ci.lib.mjs +8 -2
  8. package/src/github-merge-ready-sync.lib.mjs +7 -1
  9. package/src/github-merge-repo-actions.lib.mjs +59 -15
  10. package/src/github-merge.lib.mjs +100 -58
  11. package/src/github-rate-limit.lib.mjs +276 -0
  12. package/src/github.batch.lib.mjs +1 -0
  13. package/src/hive.mjs +2 -2
  14. package/src/hive.recheck.lib.mjs +1 -0
  15. package/src/lib.mjs +30 -4
  16. package/src/limits.lib.mjs +1 -0
  17. package/src/protect-branch.mjs +3 -2
  18. package/src/queue-config.lib.mjs +7 -3
  19. package/src/review.mjs +3 -2
  20. package/src/reviewers-hive.mjs +3 -2
  21. package/src/solve.accept-invite.lib.mjs +7 -1
  22. package/src/solve.auto-continue.lib.mjs +3 -2
  23. package/src/solve.auto-ensure.lib.mjs +3 -2
  24. package/src/solve.auto-merge-helpers.lib.mjs +3 -2
  25. package/src/solve.auto-merge.lib.mjs +3 -2
  26. package/src/solve.auto-pr.lib.mjs +1 -0
  27. package/src/solve.branch-errors.lib.mjs +1 -0
  28. package/src/solve.error-handlers.lib.mjs +1 -0
  29. package/src/solve.execution.lib.mjs +3 -2
  30. package/src/solve.feedback.lib.mjs +1 -0
  31. package/src/solve.mjs +3 -1
  32. package/src/solve.preparation.lib.mjs +1 -0
  33. package/src/solve.progress-monitoring.lib.mjs +1 -0
  34. package/src/solve.repository.lib.mjs +3 -3
  35. package/src/solve.restart-shared.lib.mjs +3 -2
  36. package/src/solve.results.lib.mjs +3 -2
  37. package/src/solve.session.lib.mjs +1 -0
  38. package/src/solve.watch.lib.mjs +3 -2
  39. package/src/telegram-accept-invitations.lib.mjs +7 -1
  40. package/src/token-sanitization.lib.mjs +1 -0
  41. package/src/youtrack/youtrack-sync.mjs +1 -0
package/CHANGELOG.md CHANGED
@@ -1,5 +1,247 @@
  # @link-assistant/hive-mind
 
+ ## 1.59.6
+
+ ### Patch Changes
+
+ - d6d05a0: Fully safeguard from GitHub API rate-limit errors — issue #1726.
+
+   `/merge` merged a draft PR even though every `gh api` call had been failing
+   with `HTTP 403: API rate limit exceeded`. The merge subsystem caught those
+   errors silently in `getActiveRepoWorkflows()` and reported _"no CI checks
+   and repo has no active workflows — no CI/CD configured"_, which `/merge`
+   interpreted as _"all clear"_. Verbose log
+   ([`docs/case-studies/issue-1726/data/a4dccea2-a941-4a0c-a50e-60b1ed454e1e.log`](./docs/case-studies/issue-1726/data/a4dccea2-a941-4a0c-a50e-60b1ed454e1e.log),
+   lines 40251–40269):
+
+   ```
+   [VERBOSE] /merge: Error fetching workflows for link-foundation/relative-meta-logic:
+   Command failed: gh api "repos/link-foundation/relative-meta-logic/actions/workflows" --paginate --slurp
+   gh: API rate limit exceeded for user ID 1431904 ... (HTTP 403)
+
+   [VERBOSE] /merge: PR #100 has no CI checks and repo has no active workflows - no CI/CD configured
+   ```
+
+   Two compounding root causes:
+   1. **`getActiveRepoWorkflows()` swallowed exceptions** in
+      [`src/github-merge.lib.mjs`](./src/github-merge.lib.mjs) and returned
+      `[]`. Rate-limit responses became "this repo has no workflows", which the
+      merge gate treated as "no CI configured, safe to merge".
+   2. **No gh API call site had rate-limit retry**. The existing
+      `ghCmdRetry`/`ghRetry` helpers only recognised transient TCP/TLS faults,
+      so a 403 fell straight through. Roughly 135 raw `$gh ...` and
+      ``exec(`gh ...`)`` call sites were scattered across `src/solve.*`,
+      `src/github-merge.*`, scripts, and reviewers.
+
+   Fix:
+   - **New rate-limit module**
+     [`src/github-rate-limit.lib.mjs`](./src/github-rate-limit.lib.mjs) with
+     `isRateLimitError`, `parseRateLimitReset`, `fetchNextRateLimitReset`,
+     `computeRateLimitWait`, `ghWithRateLimitRetry`, `execGhWithRetry`,
+     `wrapDollarWithGhRetry`. Applies the issue's policy:
+     `wait = (resetTime − now) + bufferMs (10 min) + random(0..jitterMs) (0..5 min)`,
+     reusing `limitReset.bufferMs` / `limitReset.jitterMs` from
+     [`src/config.lib.mjs`](./src/config.lib.mjs) (introduced in #1236).
+   - **Propagate errors instead of swallowing**. `getActiveRepoWorkflows()`
+     no longer wraps the gh call in a try/catch that returns `[]`. Errors bubble
+     up; the merge gate sees the failure and stops.
+   - **Layered retry in legacy helpers**. `ghRetry` and `ghCmdRetry` in
+     [`src/lib.mjs`](./src/lib.mjs) check `isRateLimitError` first and delegate
+     to `ghWithRateLimitRetry` before applying transient-network retry.
+   - **Local `exec` shim** in 7 merge files rebound through
+     `ghWithRateLimitRetry` — converts every existing ``exec(`gh ...`)`` site
+     without per-call edits.
+   - **Wrapped `$` at every entry point** (15 files). `wrapDollarWithGhRetry`
+     routes every `$gh ...` through the retry helper while passing non-gh
+     commands unchanged.
+   - **Marker imports** in 17 callee files that receive `$` as a parameter,
+     declaring rate-limit awareness for the ESLint rule.
+   - **Queue threshold lowered** from 75% to 50% in
+     [`src/queue-config.lib.mjs`](./src/queue-config.lib.mjs).
+   - **Custom ESLint rule**
+     [`eslint-rules/no-direct-gh-exec.mjs`](./eslint-rules/no-direct-gh-exec.mjs)
+     flags any unsafe `gh` exec call site; files that import a known-safe
+     wrapper are exempted at file scope.
+
+   Tests:
+   - [`tests/github-rate-limit.test.mjs`](./tests/github-rate-limit.test.mjs)
+     — 22 unit tests covering `isRateLimitError` (primary, secondary,
+     abuse-detection, stderr, cause-chain), `parseRateLimitReset` (header
+     variants), `computeRateLimitWait` (future / null / past reset, jitter
+     bounds), `ghWithRateLimitRetry` (success, propagation, retry-then-succeed,
+     exhausted retries), `wrapDollarWithGhRetry` (passthrough, retry,
+     propagation).
+   - [`tests/test-no-direct-gh-exec-rule.mjs`](./tests/test-no-direct-gh-exec-rule.mjs)
+     — RuleTester valid/invalid cases.
+   - Updated `tests/queue-config.test.mjs` and `tests/limits-display.test.mjs`
+     for the 50% threshold.
+
+   Documentation:
+   [`docs/case-studies/issue-1726/`](./docs/case-studies/issue-1726/README.md)
+   contains the failing run logs, root-cause analysis, fix breakdown, and
+   verification commands.
+
+ - bb0af8c: Fix `check-file-line-limits` CI failure on `main` after issue #1726 merge.
+
+   After PR #1726 (rate-limit safeguards) merged into `main`, the
+   `check-file-line-limits` job failed because three `.mjs` files crossed the
+   1500-line hard limit:
+   - `src/hive.mjs` — 1500 → 1504 lines
+   - `src/limits.lib.mjs` — 1497 → 1501 lines
+   - `src/solve.repository.lib.mjs` — 1500 → 1501 lines
+
+   Two root causes combined: (1) the per-file marker block PR #1726 added was 4
+   lines (2 comment lines + import + `void`), with no headroom check; (2) ESLint's
+   `max-lines` rule was configured with `skipBlankLines: true, skipComments: true`
+   while the CI script counts raw `wc -l`, so `npm run lint` passed locally even
+   though the CI script would fail. Local lint and CI line-limit had silently
+   drifted apart. See
+   [`docs/case-studies/issue-1730`](./docs/case-studies/issue-1730/README.md)
+   for the timeline, log excerpts, and template comparison.
+
+   Fix:
+   - **Synchronize ESLint `max-lines` with the CI script** in
+     [`eslint.config.mjs`](./eslint.config.mjs) by setting `skipBlankLines: false,
+     skipComments: false`. Now `npm run lint` catches the failure locally before
+     push, restoring the invariant the rule's comment claimed.
+   - **Compact the rate-limit marker** introduced by #1726 from 4 lines to 1 line
+     in all 17 files. ESLint's existing `varsIgnorePattern: '^_'` means the
+     `void _wrapDollarWithGhRetry;` line was redundant; the trailing-comment form
+     preserves rate-limit awareness for `no-direct-gh-exec` while saving 3 lines
+     per file. Files: `src/hive.mjs`, `src/limits.lib.mjs`,
+     `src/{solve.session,solve.preparation,solve.progress-monitoring,solve.error-handlers,solve.feedback,solve.auto-pr,solve.branch-errors,hive.recheck,github.batch,bidirectional-interactive,token-sanitization}.lib.mjs`,
+     `src/youtrack/youtrack-sync.mjs`,
+     `scripts/{create-github-release,format-github-release,format-release-notes}.mjs`.
+   - **Compact `solve.repository.lib.mjs`** wrap pattern from 4 lines to 3 while
+     keeping the destructure form so `eslint-rules/no-direct-gh-exec.mjs` still
+     recognizes `wrapDollarWithGhRetry` in scope.
+
+   After the fix, all three previously-failing files are at or below 1500 raw
+   lines (1500 / 1498 / 1500), and `npm run lint` now rejects any
+   re-introduction of the regression.
+
+ ## 1.59.5
+
+ ### Patch Changes
+
+ - bb24175: Fix `/merge` to correctly detect active CI runs on the default branch — issue
+   #1722.
+
+   The `/merge` command merged PR #1719 even though a CI/CD workflow run was
+   still in progress on `main`. The merge triggered a new run, which cancelled
+   the previous one. Verbose log:
+
+   ```
+   [VERBOSE] /merge: Checking for active CI runs on link-assistant/hive-mind branch main...
+   [VERBOSE] /merge: Error checking active runs on main: stdout maxBuffer length exceeded
+   [VERBOSE] /merge: No active CI runs on main branch. Ready to proceed.
+   ```
+
+   Two compounding root causes in
+   [`src/github-merge.lib.mjs`](./src/github-merge.lib.mjs)
+   `getActiveBranchRuns()` (and the parallel
+   [`src/github-merge-repo-actions.lib.mjs`](./src/github-merge-repo-actions.lib.mjs)
+   `getAllActiveRepoRuns()` introduced by issue #1503):
+   1. **No `maxBuffer` override on `gh api --paginate --slurp`.** Node's default
+      `child_process.exec` buffer is 1 MB; the unfiltered `actions/runs` response
+      on this repo's `main` was 12.7 MB, so `exec` rejected with
+      `stdout maxBuffer length exceeded`.
+   2. **Fetch errors became "no active runs".** The `catch` block returned
+      `hasActiveRuns: false`, which the caller (`waitForBranchCI`) interpreted as
+      "branch CI is idle, ready to merge". A transient fetch/buffer/parse error
+      was indistinguishable from genuine idleness.
+
+   Fix:
+   - **Server-side `?status=` filter**, looped over the active set
+     (`in_progress`, `queued`, `waiting`, `requested`, `pending`) with run-id
+     dedup. Response size scales with active-run count, not with historical-run
+     count — typically a few KB instead of 12+ MB.
+   - **Raise `exec` `maxBuffer` to `githubLimits.bufferMaxSize`** (10 MB, env
+     `HIVE_MIND_GITHUB_BUFFER_MAX_SIZE`) for all `gh` calls in
+     `github-merge.lib.mjs` and `github-merge-repo-actions.lib.mjs`. The existing
+     `githubLimits` infrastructure was already used in `github.batch.lib.mjs`;
+     this just wires it into the `/merge` paths.
+   - **Stop swallowing fetch errors as "idle".** Errors now propagate. The
+     surrounding `waitForBranchCI` / `waitForAllRepoActions` poll loops already
+     retry on the next tick; the timeout-final check has its own try/catch that
+     returns an explicit failure (instead of a false-positive "ready to merge").
+
+   Tests:
+   [`tests/test-active-branch-runs-buffer-1722.mjs`](./tests/test-active-branch-runs-buffer-1722.mjs)
+   shadows `gh` on `PATH` with a Node script that serves scripted active-run
+   responses, and asserts: (a) every call uses `?status=`, (b) duplicate runs across
+   statuses are deduplicated, (c) >1 MB responses are handled cleanly, (d)
+   `gh` failures throw rather than report idle, (e) `waitForBranchCI` keeps
+   polling on errors, (f) idle branches still resolve as ready,
+   (g) `getAllActiveRepoRuns` parity.
+
+   Documentation:
+   [`docs/case-studies/issue-1722/`](./docs/case-studies/issue-1722/README.md)
+   contains the timeline (with downloaded bot log, cancelled-run logs, run
+   metadata), facts, per-symptom root-cause analysis, and solution plan.
+   [`experiments/issue-1722-buffer-overflow.mjs`](./experiments/issue-1722-buffer-overflow.mjs)
+   is a minimal reproduction. No upstream report required — the fix lives
+   entirely in this repo.
+
+ - 1a92ca1: Fix flaky CI `test-suites` job caused by `use-m`'s no-retry global npm install
+   — issue #1724.
+
+   CI run [25109962685](https://github.com/link-assistant/hive-mind/actions/runs/25109962685/job/73581228475)
+   on `main` failed in the `test-suites` job at the third test file
+   (`tests/test-active-branch-runs-buffer-1722.mjs`) with:
+
+   ```
+   Error: Failed to install command-stream@latest globally.
+   [cause]: Error: Command failed: npm install -g command-stream-v-latest@npm:command-stream@latest
+   npm error code ENOTEMPTY
+   npm error path /opt/hostedtoolcache/node/24.14.1/x64/lib/node_modules/command-stream-v-latest/js/src/commands
+   ```
+
+   Root cause: `src/github.lib.mjs` and `src/playwright-mcp.lib.mjs` call
+   `await use('command-stream')` at module top level (via `use-m`). Every test
+   file that transitively imports either module re-runs
+   `npm install -g command-stream-v-latest@npm:command-stream@latest`. `use-m`'s
+   `ensurePackageInstalled` issues a single `npm install -g` with no retry, and
+   npm intermittently fails with `ENOTEMPTY: directory not empty, rmdir` on
+   GitHub-hosted Ubuntu runners (a long-standing npm rmdir race against itself
+   when the previous global install left files behind).
+
+   Fix:
+   - New
+     [`scripts/preinstall-use-m-packages.mjs`](./scripts/preinstall-use-m-packages.mjs)
+     pre-installs every package the codebase loads through `use-m @latest`
+     (`command-stream`, `getenv`, `links-notation`, `@dotenvx/dotenvx`,
+     `telegraf`, `zx`, `yargs`) using the same alias scheme `use-m` does
+     (`<pkg-without-@-or-/>-v-latest`), with exponential-backoff retry on the
+     flake symptoms (`ENOTEMPTY` / `EBUSY` / `EPERM` / `ECONNRESET` / `ETIMEDOUT`
+     / `EAI_AGAIN` / `429` / `503`). After this step, `use-m`'s
+     `installedVersion === latestVersion` early-return path skips the install at
+     test time, so test imports never touch `npm install -g` again.
+   - The script also satisfies the case-study "verbose mode for next iteration"
+     requirement via `PREINSTALL_USE_M_VERBOSE=1` (or `RUNNER_DEBUG=1`), which
+     logs each attempt's command, stdout, stderr, and backoff delay, and
+     recognizes "package present on disk after a flake" as recovered success.
+   - Wires `node scripts/preinstall-use-m-packages.mjs` into the `test-suites`
+     and `test-execution` jobs in
+     [`.github/workflows/release.yml`](./.github/workflows/release.yml) right
+     after `npm install`, before any step that runs test files or `solve.mjs`.
+
+   Tests:
+   [`tests/test-preinstall-use-m-packages-1724.mjs`](./tests/test-preinstall-use-m-packages-1724.mjs)
+   covers the alias scheme, retryable-error matcher, exponential backoff, and
+   the four `installWithRetry` paths (first-success, retry-then-succeed,
+   non-retryable-abort, recovered-from-disk) deterministically (no real npm
+   calls). Marked `@hive-mind-test-suite default` so it runs in the same job
+   that previously flaked.
+
+   Documentation:
+   [`docs/case-studies/issue-1724/`](./docs/case-studies/issue-1724/README.md)
+   contains the timeline, verbatim error, downloaded failed-run logs, the
+   no-retry snippet from the live `use-m` source
+   (`logs/use-m-source.js`), the comparison with both pipeline templates
+   (JS/Rust — neither template uses `use-m @latest` at module load yet, so the
+   flake is hive-mind-specific until they do), and the implementation plan.
+
  ## 1.59.4
 
  ### Patch Changes
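The wait policy quoted in the 1.59.6 entry, `wait = (resetTime − now) + bufferMs + random(0..jitterMs)`, can be sketched as follows. This is an illustrative reconstruction, not the published `src/github-rate-limit.lib.mjs` source; the parameter names mirror the `limitReset.bufferMs` / `limitReset.jitterMs` values the changelog cites.

```javascript
// Sketch of the rate-limit wait policy from the 1.59.6 entry (assumed shape,
// not the published implementation).
// resetTimeMs: epoch milliseconds when the GitHub quota resets (null if unknown).
function computeRateLimitWait(resetTimeMs, {
  bufferMs = 10 * 60 * 1000, // fixed safety buffer (10 min per the changelog)
  jitterMs = 5 * 60 * 1000,  // upper bound of the random jitter (0..5 min)
  now = Date.now(),
} = {}) {
  // A null or already-past reset still waits buffer + jitter; never negative.
  const untilReset = resetTimeMs == null ? 0 : Math.max(0, resetTimeMs - now);
  const jitter = Math.floor(Math.random() * (jitterMs + 1)); // random(0..jitterMs)
  return untilReset + bufferMs + jitter;
}
```

With a reset one minute away, the computed wait is at least 11 minutes (reset plus buffer) and at most 16 minutes (plus full jitter), matching the "jitter bounds" cases the changelog's unit tests describe.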
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@link-assistant/hive-mind",
-   "version": "1.59.4",
+   "version": "1.59.6",
    "description": "AI-powered issue solver and hive mind for collaborative problem solving",
    "main": "src/hive.mjs",
    "type": "module",
@@ -22,6 +22,7 @@
   * @experimental
   */
 
+ import { wrapDollarWithGhRetry as _wrapDollarWithGhRetry } from './github-rate-limit.lib.mjs'; // rate-limit marker (#1726): gh API calls flow through $ wrapped by caller
  // Configuration constants
  const CONFIG = {
    // Minimum time between comment checks to avoid rate limiting (in ms)
@@ -9,8 +9,9 @@ if (typeof globalThis.use === 'undefined') {
    globalThis.use = (await eval(await (await fetch('https://unpkg.com/use-m/use.js')).text())).use;
  }
 
- const { $ } = await use('command-stream');
-
+ const { $: __rawDollar$ } = await use('command-stream');
+ const { wrapDollarWithGhRetry } = await import('./github-rate-limit.lib.mjs');
+ const $ = wrapDollarWithGhRetry(__rawDollar$);
  /**
   * Common paths where contributing guidelines might be found
   */
@@ -13,8 +13,9 @@ if (typeof globalThis.use === 'undefined') {
  }
 
  const fs = (await use('fs')).promises;
- const { $ } = await use('command-stream');
-
+ const { $: __rawDollar$ } = await use('command-stream');
+ const { wrapDollarWithGhRetry } = await import('./github-rate-limit.lib.mjs');
+ const $ = wrapDollarWithGhRetry(__rawDollar$);
  const GITHUB_ISSUE_BODY_MAX_SIZE = 60000;
  const GITHUB_FILE_MAX_SIZE = 10 * 1024 * 1024;
 
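The `const $ = wrapDollarWithGhRetry(__rawDollar$)` pattern in the hunks above can be illustrated with a minimal sketch: wrap the `command-stream` tagged-template `$` so that commands starting with `gh ` flow through a retry helper while everything else passes through unchanged. The `retry` parameter here is a hypothetical stand-in for the real rate-limit retry logic; this is not the published wrapper.

```javascript
// Minimal sketch of the $-wrapping idea (assumed shape; the real wrapper lives
// in src/github-rate-limit.lib.mjs and retries on GitHub rate-limit errors).
function wrapDollarWithGhRetry(rawDollar, { retry = (run) => run() } = {}) {
  return (strings, ...values) => {
    const run = () => rawDollar(strings, ...values);
    // Only `gh ...` commands get retry treatment; other commands pass through.
    return strings[0].trimStart().startsWith('gh ') ? retry(run) : run();
  };
}
```

Because the wrapper keeps the tagged-template call shape, existing `` $`gh ...` `` call sites need no edits, which is the point the 1.59.6 entry makes about wrapping `$` once per entry file.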
@@ -13,8 +13,14 @@
 
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
-
- const exec = promisify(execCallback);
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
+
+ const execRaw = promisify(execCallback);
+ // Issue #1726: rate-limit safe gh wrapper.
+ const exec = (cmd, opts) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, opts), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
 
  /**
   * Get the committed date of a specific commit from GitHub API
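The changelog's unit tests describe `isRateLimitError` as recognising primary, secondary, and abuse-detection limits, in `message`, `stderr`, and down the `cause` chain. A sketch under those assumptions follows; the patterns are inferred from the gh output quoted in the 1.59.6 entry, not taken from the published module.

```javascript
// Sketch of rate-limit detection (assumed patterns, based on gh CLI output
// like "gh: API rate limit exceeded ... (HTTP 403)" quoted in the changelog).
const RATE_LIMIT_PATTERNS = [
  /api rate limit exceeded/i, // primary quota
  /secondary rate limit/i,    // secondary quota
  /abuse detection/i,         // abuse-detection throttling
];

function isRateLimitError(error) {
  // Walk the cause chain so wrapped errors are still recognised.
  for (let e = error; e; e = e.cause) {
    const text = [e.message, e.stderr, e.stdout].filter(Boolean).join('\n');
    if (RATE_LIMIT_PATTERNS.some((re) => re.test(text))) return true;
  }
  return false;
}
```

Checking `stderr` matters because `child_process.exec` failures carry the gh error text there rather than in `message`.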
@@ -11,8 +11,14 @@
  import { getWorkflowRunsForSha } from './github-merge.lib.mjs';
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
-
- const exec = promisify(execCallback);
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
+
+ const execRaw = promisify(execCallback);
+ // Issue #1726: every gh call must be rate-limit safe.
+ const exec = (cmd, opts) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, opts), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
 
  /**
   * Wait for all workflow runs triggered by a specific commit to complete
@@ -11,8 +11,14 @@
 
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
 
- const exec = promisify(execCallback);
+ const execRaw = promisify(execCallback);
+ // Issue #1726: rate-limit safe gh wrapper.
+ const exec = (cmd, opts) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, opts), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
 
  import { extractLinkedIssueNumber } from './github-linking.lib.mjs';
 
@@ -11,7 +11,20 @@
 
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
- const exec = promisify(execCallback);
+ import { githubLimits } from './config.lib.mjs';
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
+ const execRaw = promisify(execCallback);
+ // Issue #1722: raise exec maxBuffer above Node's 1 MB default for paginated gh
+ // API responses (workflow runs can easily exceed that on busy repos).
+ // Issue #1726: wrap with rate-limit retry so a 5,000/hr quota hit waits for
+ // reset instead of bubbling up as a generic fetch failure.
+ const exec = (cmd, opts = {}) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, { maxBuffer: githubLimits.bufferMaxSize, ...opts }), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
+
+ // Statuses we treat as "not yet finished".
+ const ACTIVE_RUN_STATUSES = ['in_progress', 'queued', 'waiting', 'requested', 'pending'];
 
  /**
   * Get ALL active workflow runs across the entire repository (no branch filter).
@@ -21,20 +34,34 @@ const exec = promisify(execCallback);
   * @returns {Promise<{runs: Array, hasActiveRuns: boolean, count: number}>}
   */
  export async function getAllActiveRepoRuns(owner, repo, verbose = false) {
-   try {
-     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?per_page=100" --paginate --slurp`);
-     const runs = JSON.parse(stdout.trim() || '[]')
-       .flatMap(page => page.workflow_runs || [])
-       .filter(run => ['in_progress', 'queued', 'waiting', 'requested', 'pending'].includes(run.status))
-       .map(run => ({ id: run.id, name: run.name, status: run.status, head_branch: run.head_branch, head_sha: run.head_sha?.slice(0, 7) }));
-     if (verbose && runs.length > 0) {
-       console.log(`[VERBOSE] repo-actions: ${runs.length} active run(s) in ${owner}/${repo}`);
-       for (const r of runs) console.log(`[VERBOSE] repo-actions: ${r.name} (${r.status}) on ${r.head_branch}`);
+   // Issue #1722: filter on the server side per status to avoid pulling the full
+   // history of workflow runs (which can exceed exec maxBuffer). Also: do not
+   // swallow errors as "no active runs"; bubble them up so callers can retry
+   // instead of merging on top of a still-running CI run.
+   const seen = new Set();
+   const runs = [];
+   for (const status of ACTIVE_RUN_STATUSES) {
+     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?status=${status}&per_page=100" --paginate --slurp`);
+     const pages = JSON.parse(stdout.trim() || '[]');
+     for (const page of pages) {
+       for (const run of page.workflow_runs || []) {
+         if (seen.has(run.id)) continue;
+         seen.add(run.id);
+         runs.push({
+           id: run.id,
+           name: run.name,
+           status: run.status,
+           head_branch: run.head_branch,
+           head_sha: run.head_sha?.slice(0, 7),
+         });
+       }
      }
-     return { runs, hasActiveRuns: runs.length > 0, count: runs.length };
-   } catch {
-     return { runs: [], hasActiveRuns: false, count: 0 };
    }
+   if (verbose && runs.length > 0) {
+     console.log(`[VERBOSE] repo-actions: ${runs.length} active run(s) in ${owner}/${repo}`);
+     for (const r of runs) console.log(`[VERBOSE] repo-actions: ${r.name} (${r.status}) on ${r.head_branch}`);
+   }
+   return { runs, hasActiveRuns: runs.length > 0, count: runs.length };
  }
 
  /**
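The retry behaviour layered onto the local `exec` shims can be sketched as a generic wrapper: run the gh call, and on a rate-limit error wait for a computed delay before retrying; any other error, or an exhausted retry budget, propagates. `isRateLimit` and `waitMs` are injected here purely for illustration; the real helper derives the wait from GitHub's reset time plus buffer and jitter.

```javascript
// Sketch of ghWithRateLimitRetry (assumed shape). Non-rate-limit errors and
// exhausted retries always propagate; the result is never silently emptied.
async function ghWithRateLimitRetry(fn, {
  maxRetries = 3,
  isRateLimit = (e) => /rate limit/i.test(String(e.message)),
  waitMs = () => 0, // stand-in for reset + buffer + jitter
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (!isRateLimit(error) || attempt >= maxRetries) throw error;
      await sleep(waitMs(error));
    }
  }
}
```

Throwing on non-rate-limit errors is the design point the 1.59.6 entry stresses: a 404 or auth failure must surface to the caller rather than be converted into an empty result.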
@@ -52,7 +79,16 @@ export async function waitForAllRepoActions(owner, repo, options = {}, verbose =
    let peakRunCount = 0;
 
    while (Date.now() - startTime < timeout) {
-     const active = await getAllActiveRepoRuns(owner, repo, verbose);
+     let active;
+     try {
+       active = await getAllActiveRepoRuns(owner, repo, verbose);
+     } catch (error) {
+       // Issue #1722: do not silently treat fetch errors as "no active runs".
+       // Log and retry on the next poll instead.
+       console.error(`[ERROR] repo-actions: Error checking repo CI: ${error.message}`);
+       await new Promise(resolve => setTimeout(resolve, pollInterval));
+       continue;
+     }
      if (onStatusUpdate) {
        try {
          await onStatusUpdate({ ...active, elapsedMs: Date.now() - startTime });
@@ -66,7 +102,15 @@
      peakRunCount = Math.max(peakRunCount, active.count);
      await new Promise(resolve => setTimeout(resolve, pollInterval));
    }
-   const finalRuns = await getAllActiveRepoRuns(owner, repo, verbose);
+   // Issue #1722: if the timeout-final check throws, surface that as an error
+   // rather than reporting "no remaining runs".
+   let finalRuns;
+   try {
+     finalRuns = await getAllActiveRepoRuns(owner, repo, verbose);
+   } catch (error) {
+     console.error(`[ERROR] repo-actions: Final CI check failed after timeout: ${error.message}`);
+     return { success: false, waitedForRuns: true, timedOut: true, remainingRuns: [] };
+   }
    return { success: false, waitedForRuns: true, timedOut: true, remainingRuns: finalRuns.runs };
  }
 
@@ -14,9 +14,28 @@
  import { promisify } from 'util';
  import { exec as execCallback } from 'child_process';
 
- const exec = promisify(execCallback);
+ const execRaw = promisify(execCallback);
 
  import { parseGitHubUrl } from './github.lib.mjs';
+ import { githubLimits } from './config.lib.mjs';
+ import { ghWithRateLimitRetry } from './github-rate-limit.lib.mjs';
+
+ // Issue #1722: gh api `--paginate --slurp` responses for repos with many
+ // historical workflow runs can easily exceed Node's default 1 MB exec buffer
+ // (observed 12.7 MB on this repo's main branch). Default to the configured
+ // githubLimits.bufferMaxSize (10 MB; HIVE_MIND_GITHUB_BUFFER_MAX_SIZE) for all
+ // gh calls in this file.
+ //
+ // Issue #1726: every gh call in the merge subsystem must be rate-limit safe.
+ // Wrapping the local `exec` shim ensures all 25+ call sites pick up retry
+ // behaviour without per-call changes. Non-rate-limit errors continue to throw
+ // so genuine failures (404, auth, malformed JSON downstream) surface to the
+ // caller — they MUST NOT be swallowed as in the original /merge bug where a
+ // rate-limit error was silently treated as "no workflows".
+ const exec = (cmd, opts = {}) =>
+   ghWithRateLimitRetry(() => execRaw(cmd, { maxBuffer: githubLimits.bufferMaxSize, ...opts }), {
+     label: `gh exec (${cmd.split(/\s+/).slice(0, 3).join(' ')})`,
+   });
 
  // Issue #1413: Import ready tag sync, timeline, and label constant from separate module
  // to keep this file under the 1500 line limit
@@ -674,9 +693,20 @@ export function parseRepositoryUrl(url) {
    };
  }
 
+ /**
+  * Statuses we treat as "still running" / "not yet finished".
+  * Issue #1722: be exhaustive — GitHub uses several non-completed statuses.
+  */
+ const ACTIVE_RUN_STATUSES = ['in_progress', 'queued', 'waiting', 'requested', 'pending'];
+
  /**
   * Get active workflow runs on a specific branch
   * Issue #1307: Used to check if there are any in-progress or queued runs on the target branch
+  * Issue #1722: Filter on the server side per status, otherwise the unfiltered
+  * `--paginate --slurp` response can overflow exec maxBuffer on busy repos
+  * (observed 12.7 MB on link-assistant/hive-mind main). Also: errors are now
+  * surfaced rather than swallowed as `hasActiveRuns: false`, which previously
+  * caused /merge to merge on top of a still-running CI run.
   * @param {string} owner - Repository owner
   * @param {string} repo - Repository name
   * @param {string} branch - Branch name (default: main)
@@ -684,36 +714,38 @@ export function parseRepositoryUrl(url) {
   * @returns {Promise<{runs: Array<Object>, hasActiveRuns: boolean, count: number}>}
   */
  export async function getActiveBranchRuns(owner, repo, branch = 'main', verbose = false) {
-   try {
-     // Query for in_progress and queued runs on the specified branch
-     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?branch=${branch}&per_page=100" --paginate --slurp`);
-     const runs = JSON.parse(stdout.trim() || '[]')
-       .flatMap(page => page.workflow_runs || [])
-       .filter(run => run.status === 'in_progress' || run.status === 'queued')
-       .map(run => ({ id: run.id, name: run.name, status: run.status, created_at: run.created_at, html_url: run.html_url }));
-
-     if (verbose) {
-       console.log(`[VERBOSE] /merge: Found ${runs.length} active runs on ${owner}/${repo} branch ${branch}`);
-       for (const run of runs) {
-         console.log(`[VERBOSE] /merge: - Run #${run.id}: ${run.name} (${run.status})`);
+   const seen = new Set();
+   const runs = [];
+   for (const status of ACTIVE_RUN_STATUSES) {
+     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/runs?branch=${branch}&status=${status}&per_page=100" --paginate --slurp`);
+     const pages = JSON.parse(stdout.trim() || '[]');
+     for (const page of pages) {
+       for (const run of page.workflow_runs || []) {
+         if (seen.has(run.id)) continue;
+         seen.add(run.id);
+         runs.push({
+           id: run.id,
+           name: run.name,
+           status: run.status,
+           created_at: run.created_at,
+           html_url: run.html_url,
+         });
        }
      }
+   }
 
-     return {
-       runs,
-       hasActiveRuns: runs.length > 0,
-       count: runs.length,
-     };
-   } catch (error) {
-     if (verbose) {
-       console.log(`[VERBOSE] /merge: Error checking active runs on ${branch}: ${error.message}`);
+   if (verbose) {
+     console.log(`[VERBOSE] /merge: Found ${runs.length} active runs on ${owner}/${repo} branch ${branch}`);
+     for (const run of runs) {
+       console.log(`[VERBOSE] /merge: - Run #${run.id}: ${run.name} (${run.status})`);
      }
-     return {
-       runs: [],
-       hasActiveRuns: false,
-       count: 0,
-     };
    }
+
+   return {
+     runs,
+     hasActiveRuns: runs.length > 0,
+     count: runs.length,
+   };
  }
 
  /**
@@ -788,7 +820,20 @@ export async function waitForBranchCI(owner, repo, branch = 'main', options = {}
    }
 
    // Timeout reached
-   const finalCheck = await getActiveBranchRuns(owner, repo, branch, verbose);
+   // Issue #1722: if the final check throws, do NOT silently report "ready".
+   // Treat it the same as still-active (force a timeout failure), so /merge
+   // waits/retries instead of merging on top of a still-running CI run.
+   let finalCheck;
+   try {
+     finalCheck = await getActiveBranchRuns(owner, repo, branch, verbose);
+   } catch (error) {
+     return {
+       success: false,
+       waitedForRuns: true,
+       completedRuns: totalWaitedRuns,
+       error: `Timeout reached and final CI check failed on ${branch}: ${error.message}`,
+     };
+   }
    if (finalCheck.hasActiveRuns) {
      return {
        success: false,
@@ -1306,40 +1351,37 @@ export async function getWorkflowRunJobsCount(owner, repo, runId, verbose = fals
   * @returns {Promise<{count: number, hasWorkflows: boolean, workflows: Array<{id: number, name: string, state: string, path: string}>}>}
   */
  export async function getActiveRepoWorkflows(owner, repo, verbose = false) {
-   try {
-     const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/workflows" --paginate --slurp`);
-     const allWorkflows = JSON.parse(stdout.trim() || '[]')
-       .flatMap(page => page.workflows || [])
-       .filter(workflow => workflow.state === 'active')
-       .map(workflow => ({ id: workflow.id, name: workflow.name, state: workflow.state, path: workflow.path }));
-
-     // GitHub Pages workflows only run after merge and never produce PR check-runs.
-     const workflows = allWorkflows.filter(wf => !wf.path.startsWith('dynamic/pages/'));
-
-     if (verbose) {
-       console.log(`[VERBOSE] /merge: Found ${allWorkflows.length} active workflows in ${owner}/${repo} (${workflows.length} PR-relevant after filtering out GitHub Pages deployment workflows)`);
-       for (const wf of allWorkflows) {
-         const filtered = wf.path.startsWith('dynamic/pages/');
-         console.log(`[VERBOSE] /merge: - ${wf.name} (${wf.id}): ${wf.state}, path=${wf.path}${filtered ? ' [excluded: GitHub Pages deployment]' : ''}`);
-       }
-     }
+   // Issue #1726: this function previously swallowed every error as "no workflows",
+   // including GitHub API rate-limit responses. The /merge command then thought CI
+   // was unconfigured and proceeded as if checks had passed — a hard failure mode
+   // visible in the original case-study log where errors were thrown but the
+   // process exited 0.
+   //
+   // Rate-limit errors are now retried inside the local exec() wrapper. After
+   // retries are exhausted, the error MUST propagate so callers can decide
+   // whether to abort or continue — never default to "no workflows".
+   const { stdout } = await exec(`gh api "repos/${owner}/${repo}/actions/workflows" --paginate --slurp`);
+   const allWorkflows = JSON.parse(stdout.trim() || '[]')
+     .flatMap(page => page.workflows || [])
+     .filter(workflow => workflow.state === 'active')
+     .map(workflow => ({ id: workflow.id, name: workflow.name, state: workflow.state, path: workflow.path }));
+
+   // GitHub Pages workflows only run after merge and never produce PR check-runs.
+   const workflows = allWorkflows.filter(wf => !wf.path.startsWith('dynamic/pages/'));
 
-     return {
-       count: workflows.length,
-       hasWorkflows: workflows.length > 0,
-       workflows,
-     };
-   } catch (error) {
-     if (verbose) {
-       console.log(`[VERBOSE] /merge: Error fetching workflows for ${owner}/${repo}: ${error.message}`);
+   if (verbose) {
+     console.log(`[VERBOSE] /merge: Found ${allWorkflows.length} active workflows in ${owner}/${repo} (${workflows.length} PR-relevant after filtering out GitHub Pages deployment workflows)`);
+     for (const wf of allWorkflows) {
+       const filtered = wf.path.startsWith('dynamic/pages/');
+       console.log(`[VERBOSE] /merge: - ${wf.name} (${wf.id}): ${wf.state}, path=${wf.path}${filtered ? ' [excluded: GitHub Pages deployment]' : ''}`);
      }
-     // On error, assume no workflows (safer: avoids false positives in the no-CI case)
-     return {
-       count: 0,
-       hasWorkflows: false,
-       workflows: [],
-     };
    }
+
+   return {
+     count: workflows.length,
+     hasWorkflows: workflows.length > 0,
+     workflows,
+   };
  }
 
  // Issue #1690: Re-export CI signal helpers from separate module to keep this file under 1500 lines
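The preinstall retry described in the 1.59.5 `use-m` entry reduces to two small pieces: a matcher for the retryable flake symptoms and an exponential backoff schedule. A sketch under the changelog's own symptom list; the function names are illustrative, not the published `scripts/preinstall-use-m-packages.mjs` source.

```javascript
// Sketch of the retry policy described in the 1.59.5 entry (illustrative names).
const RETRYABLE_TOKENS = ['ENOTEMPTY', 'EBUSY', 'EPERM', 'ECONNRESET', 'ETIMEDOUT', 'EAI_AGAIN', '429', '503'];

// Retry only on known npm/network flake symptoms; anything else aborts.
function isRetryableInstallError(output) {
  return RETRYABLE_TOKENS.some((token) => output.includes(token));
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
function backoffDelayMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}
```

Keeping the matcher to a fixed token list means a genuine failure such as `E404` aborts on the first attempt, which matches the "non-retryable-abort" path the changelog's tests cover.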