npm - @link-assistant/hive-mind - Versions diffs - 1.57.3 → 1.59.0 - Mend

@link-assistant/hive-mind 1.57.3 → 1.59.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15) hide show

package/CHANGELOG.md +186 -0
package/package.json +1 -1
package/src/anthropic-server-tool-pricing.lib.mjs +34 -0
package/src/bidirectional-interactive.lib.mjs +392 -21
package/src/claude.budget-stats.lib.mjs +154 -27
package/src/claude.cost.lib.mjs +88 -0
package/src/claude.lib.mjs +54 -58
package/src/codex.lib.mjs +31 -0
package/src/config.lib.mjs +39 -2
package/src/github-cost-info.lib.mjs +4 -1
package/src/lino.lib.mjs +3 -1
package/src/solve.auto-merge.lib.mjs +5 -0
package/src/solve.config.lib.mjs +39 -0
package/src/sub-session-size.lib.mjs +239 -0
package/src/use-with-retry.lib.mjs +91 -0

package/CHANGELOG.md CHANGED Viewed

@@ -1,5 +1,191 @@
 # @link-assistant/hive-mind
+## 1.59.0
+### Minor Changes
+- 903b10e: Add `--auto-input-until-mergeable` (issue #1708): a new experimental
+  mode that extends a single Claude session for as long as possible by
+  streaming PR/issue comments, CI/CD failures, uncommitted-changes
+  status, and PR/issue title/body updates as NDJSON `user` frames into
+  the live `claude --input-format stream-json` process — instead of
+  killing the process and restarting with the feedback prepended to a
+  fresh prompt.
+  What it ships:
+  - Three new flags in `src/solve.config.lib.mjs`, all defaulting to
+    `false` and marked `[EXPERIMENTAL]`:
+    - `--auto-input-until-mergeable` — top-level opt-in for the new
+      behavior. Implies `--accept-incomming-comments-as-input` and
+      defaults to `--queue-comments-to-input` so the AI can finish its
+      current step before being interrupted.
+    - `--stream-comments-to-input` — forward each comment immediately
+      as it arrives. Default for `--accept-incomming-comments-as-input`
+      on its own (preserves the existing #817 behavior).
+    - `--queue-comments-to-input` — buffer comments while the AI is
+      busy and flush them only on `result` events. Default delivery
+      mode for `--auto-input-until-mergeable`. Mutually exclusive with
+      `--stream-comments-to-input`; queue mode wins if both are set.
+  - Queue-vs-stream delivery wired into
+    `src/bidirectional-interactive.lib.mjs#createBidirectionalHandler`:
+    - New `deliveryMode` option (`'stream'` / `'queue'`) plus
+      `markAiBusy()` / `markAiIdle()` lifecycle methods exposed on the
+      handler.
+    - In queue mode, comment frames and status frames are buffered in
+      `pendingFrames` while busy and FIFO-flushed to stdin on the next
+      `result` event. In stream mode, frames go to stdin immediately as
+      today.
+  - Status streaming (only when `--auto-input-until-mergeable` is on)
+    in `src/bidirectional-interactive.lib.mjs#checkForStatusChanges`:
+    - New parallel poller emits one-shot NDJSON frames for: PR
+      title/body changes, issue title/body changes (Issue #1708 G1),
+      uncommitted local changes (`git status --porcelain`), and CI
+      blockers (via `getMergeBlockers`).
+    - Each change is keyed by a stable signature so the same failing
+      check doesn't re-emit on every poll; failures in any sub-check
+      are swallowed and logged so the poller never breaks the live
+      Claude session.
+  - Stream parser in `src/claude.lib.mjs#executeClaudeCommand` now
+    signals `markAiBusy()` on `assistant` / `tool_use` / `tool_result`
+    events and `markAiIdle()` on `result` events, so queue-mode
+    buffering tracks the actual AI lifecycle.
+  - `src/solve.auto-merge.lib.mjs#watchUntilMergeable` logs a
+    "streaming-first" banner when `--auto-input-until-mergeable` was
+    active, so it is clear the auto-restart loop is the fallback rather
+    than the primary handler.
+  - For non-Claude tools, the validator continues to warn and disable
+    all four flags — the existing #817 fallback path. The default
+    behavior of every existing flag
+    (`--auto-restart-until-mergeable`, `--auto-merge`, etc.) is
+    preserved (R4: "must not break any existing features").
+  - Tests:
+    `tests/test-auto-input-until-mergeable-1708.mjs` (59 assertions)
+    and 11 new assertions in
+    `tests/test-bidirectional-interactive.mjs` cover flag composition,
+    queue-vs-stream routing, FIFO flushing on idle, busy-flag
+    preservation across stream-mode writes, default-deliveryMode is
+    stream, status-frame stamping with the right header per kind
+    (`comment` / `ci` / `uncommitted` / `metadata`), and metadata
+    diff/snapshot helpers.
+  The case study at `docs/case-studies/issue-1708/` is updated to
+  reflect that R1, R2 (Claude path), R3 (PR/issue title+body, CI,
+  uncommitted, comments), R4, R5, R6, plus G1, G5, G7 are addressed
+  here. Codex/Agent/OpenCode still degrade gracefully (no mid-session
+  NDJSON channel upstream) and use the existing `watchUntilMergeable`
+  loop as documented in G4.
+- 6efcab4: Fix cost / token calculation correctness, unify Total / sub-session format,
+  add verbose budget trace, and case study for issue #1710
+  Resolves the four "strange things" the issue reported by changing both the
+  public-pricing math and the rendered output:
+  - **R1 — `$0.040000` residual eliminated.** `calculateModelCost`
+    ([`src/claude.lib.mjs`](./src/claude.lib.mjs)) now bills Anthropic
+    server-side tools. `web_search` is charged at the documented
+    $10 / 1 000 requests rate (= $0.01 / req) via the new constants module
+    [`src/anthropic-server-tool-pricing.lib.mjs`](./src/anthropic-server-tool-pricing.lib.mjs).
+    For the issue's PR #1707 run that comes out to exactly the previously-shown
+    $0.040000 / +0.16% delta, so the public-pricing total now reconciles with
+    Anthropic's reported `total_cost_usd`. `accumulateModelUsage`
+    ([`src/claude.budget-stats.lib.mjs`](./src/claude.budget-stats.lib.mjs))
+    also picks up `usage.server_tool_use.web_search_requests` from JSONL.
+  - **R2 — Haiku sub-session line includes input information.** Sub-agent
+    models never appear as the responding model in the parent JSONL, so
+    `peakContextUsage` stays at `0`. The fallback in `buildBudgetStatsString`
+    now emits the cumulative `(X new + Y cache writes [+ Z cache reads])`
+    phrase instead of dropping the input information entirely.
+  - **R3/R5 — Sub-session and Total reconcile.** The bullet line is now
+    labelled `peak request: …` so it cannot be confused with the cumulative
+    Total line. `requestContext` (the source of `peakContextByModel`) excludes
+    cache reads, so the bullet figure is `input + cache_creation` and is
+    reconcilable with the cumulative non-cached total. Cache reads remain
+    visible — and visible separately — on the Total line.
+  - **R4 — Total always splits cache reads / cache writes when present.**
+    The conditional that previously keyed on `cacheReadTokens` only is replaced
+    with a `buildCumulativeInputPhrase` helper that emits
+    `(X new + W cache writes + Y cache reads) input tokens` when both kinds of
+    cache activity exist, `(X new + W cache writes)` when only writes exist
+    (the Haiku case that triggered the issue), and the back-compat
+    `(X + Y cached)` form when only reads exist (so common Opus-only output
+    is unchanged). Cache writes are billed at 1.25× / 2× of input — fusing
+    them silently into the input figure was a real semantic bug, not a
+    cosmetic one.
+  Both `displayBudgetStats` (solver-log renderer) and `buildBudgetStatsString`
+  (PR-comment renderer) share the helper, so the two paths render identically.
+  Also adds **`dumpBudgetTrace`**
+  ([`src/claude.budget-stats.lib.mjs`](./src/claude.budget-stats.lib.mjs)),
+  a verbose-only structured per-model trace (peak request, cumulative
+  input/cache_write 5m+1h split/cache_read/output, server-tool counts with
+  implied dollar cost, public and Anthropic-reported costs, and the data
+  source) that fires from `displayBudgetStats` only when `{verbose: true}` is
+  set, so the default solver output is unchanged. The trace captures all the
+  inputs that drive the renderer in one place, so the next "calculation
+  correctness" report can be triaged from a saved log alone.
+  Tests:
+  - `tests/test-issue-1710-budget-trace.mjs` — 10 cases for the verbose trace.
+  - `tests/test-issue-1710-format-fixes.mjs` — 8 cases locking each requirement
+    to numbers from `docs/case-studies/issue-1710/facts.md` (the actual
+    PR #1707 result event the issue quotes).
+  Documentation: `docs/case-studies/issue-1710/` contains the root-cause
+  analysis (per symptom, with file:line citations), the captured facts, and
+  the (now-implemented) solution plans.
+  Also fixes the hosted-CI flake that surfaced while validating this PR:
+  `use-m` occasionally hands back a truncated/corrupt global package after
+  `npm install -g`, surfacing as either
+  `Failed to import module from '...': SyntaxError: Unexpected end of input`
+  or `Failed to resolve the path to '<pkg>'` when use-m loads `getenv` /
+  `links-notation` from `src/config.lib.mjs` and `src/lino.lib.mjs`. Adds
+  `src/use-with-retry.lib.mjs`, a small wrapper around `use(...)` that
+  recognises both flake modes, removes the broken alias directory, and
+  re-fetches once. Covered by `tests/test-use-with-retry.mjs` (13 cases).
+## 1.58.0
+### Minor Changes
+- 3616130: Add `--sub-session-size` and `--disable-1m-context` options for Claude and Codex (issue #1706)
+  `--sub-session-size` (default: `150k`) caps the size of each sub-session
+  between auto-compaction events. It accepts a token count (`150k`, `1m`,
+  `200000`), a percentage of the model context window (`50%`), or `default`
+  to keep the tool's built-in threshold.
+  `--disable-1m-context` (default: `true`) opts out of the 1M extended
+  context window so models stay on their standard 200K-400K window. This
+  preserves reasoning quality and avoids the long-context price tier.
+  Use `--no-disable-1m-context` to allow 1M.
+  Both options work for `--tool claude` and `--tool codex`. For Claude Code
+  the wrapper sets `CLAUDE_CODE_DISABLE_1M_CONTEXT`,
+  `CLAUDE_CODE_AUTO_COMPACT_WINDOW`, and `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE`
+  env vars (clamped per upstream's "lower-only" semantics). For Codex the
+  wrapper appends `-c model_context_window=200000` and
+  `-c model_auto_compact_token_limit=<tokens>` overrides.
+  Verbose mode logs the applied env vars and `-c` overrides so operators
+  can confirm they reached the spawned tool process.
+- b341775: Hide the cost-estimation breakdown when the public and Anthropic numbers agree to within display precision (issue #1703)
+  Both the live `displayCostComparison` console output and the
+  `buildCostInfoString` markdown rendered into PR/issue comments previously
+  collapsed to the short `💰 Cost: $X.XXXXXX` form only when the two values
+  matched **exactly** at six decimal places. Real-world calls regularly produce
+  underlying values that differ by ~`1e-7` and round to **adjacent** displays
+  (e.g. `$11.219694` vs `$11.219693`); the rendered difference (`$-0.000000
+(-0.00%)`) was therefore noise yet still printed three full lines. The guard
+  now triggers whenever `|public − anthropic|.toFixed(6) === '0.000000'`, which
+  preserves the existing behaviour at every meaningful (≥ `$0.000001`) delta and
+  adds short-form output for the boundary case from issue #1703. Regression
+  tests live in `tests/test-build-cost-info-string.mjs` and
+  `tests/test-display-cost-comparison.mjs`.
 ## 1.57.3
 ### Patch Changes

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@link-assistant/hive-mind",
-  "version": "1.57.3",
+  "version": "1.59.0",
   "description": "AI-powered issue solver and hive mind for collaborative problem solving",
   "main": "src/hive.mjs",
   "type": "module",

package/src/anthropic-server-tool-pricing.lib.mjs ADDED Viewed

@@ -0,0 +1,34 @@
+#!/usr/bin/env node
+/**
+ * Issue #1710: Anthropic server-side tool pricing.
+ *
+ * `calculateModelCost` historically only billed token-based usage (input,
+ * cache_creation, cache_read, output). When a sub-agent uses Anthropic's
+ * server-side web_search tool, the result event reports `webSearchRequests`,
+ * which Anthropic bills at $10 / 1 000 searches ($0.01 / request) per
+ * <https://platform.claude.com/docs/en/about-claude/pricing#web-search-tool>.
+ *
+ * Without billing it locally, the public-pricing estimate disagreed with
+ * Anthropic's reported `total_cost_usd` by exactly that amount — the
+ * "Difference: $0.040000 (+0.16%)" line that issue #1710 quotes.
+ *
+ * Centralising the constants in this module keeps the source-of-truth in one
+ * file: bumping a price is a one-line edit, and `calculateModelCost` /
+ * `dumpBudgetTrace` both read from the same map.
+ */
+export const SERVER_TOOL_PRICING_USD = Object.freeze({
+  // $10 per 1 000 searches = $0.01 per request.
+  // https://platform.claude.com/docs/en/about-claude/pricing#web-search-tool
+  web_search: { costPerRequest: 0.01, source: 'https://platform.claude.com/docs/en/about-claude/pricing#web-search-tool' },
+  // web_fetch is currently free for paying customers; kept here for
+  // completeness and so a future price change is a one-line edit.
+  web_fetch: { costPerRequest: 0, source: 'https://platform.claude.com/docs/en/about-claude/pricing#web-fetch-tool' },
+});
+/**
+ * Returns the per-request USD price for a server-side tool, or 0 if unknown.
+ * @param {string} tool - canonical tool name (e.g. "web_search")
+ * @returns {number} per-request price in USD
+ */
+export const getServerToolPrice = tool => SERVER_TOOL_PRICING_USD[tool]?.costPerRequest || 0;