npm - agenr - Versions diffs - 0.13.1 → 0.13.2 - Mend

agenr 0.13.1 → 0.13.2

Files changed (7)

package/CHANGELOG.md +10 -0
package/dist/cli-main.js +90 -4
package/dist/modules/surgeon/adapters/prompts/passes/auto.md +16 -4
package/dist/modules/surgeon/adapters/prompts/passes/contradictions.md +3 -1
package/dist/modules/surgeon/adapters/prompts/passes/dedup.md +8 -0
package/dist/modules/surgeon/adapters/prompts/passes/retirement.md +4 -0
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -1,5 +1,15 @@
 # Changelog
+## [0.13.2] - 2026-03-23
+### Surgeon
+- **Continuation loop prevents early exit.** If the surgeon model stops without calling `complete_pass` and has >10% budget remaining, a continuation prompt is injected to push it back to work. Up to 3 nudges before allowing exit. Eliminates the "surgeon quits at 1% budget" problem.
+- **Lowered default dedup similarity threshold from 0.82 to 0.60.** The threshold controls candidate surfacing, not merge execution — the surgeon agent still makes every merge decision. Lower threshold surfaces more candidates for review on large corpora.
+- **`reset` parameter for `query_dedup_clusters` and `query_contradiction_candidates`.** Query parameters are no longer permanently frozen after the first call. Pass `reset: true` to clear cached clusters and rebuild at a new threshold. Lets the surgeon start wide and narrow if noisy.
+- **Strengthened auto sweep prompts.** Contradictions phase always runs proactive scan (no more skipping when pending conflicts = 0). Budget discipline section added — surgeon must keep working while budget remains. Retirement throughput expectations: 500+ candidates on a 3K corpus, not 100.
+- **Dedup threshold guidance in prompts.** Surgeon is told the default is deliberately low, and can raise it via reset if too noisy.
 ## [0.13.1] - 2026-03-23
 ### MCP Server
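
The `reset` entry in the changelog above can be read as a small cache contract. The following is a minimal sketch, not the package's implementation: it assumes the first call freezes its query parameters and that later calls reuse the cached result until `reset: true` is passed. `createScanCache`, `queryWithCache`, and the `build` callback are hypothetical names used only for illustration.

```javascript
// Hypothetical cache holder; the real cache shape may differ.
function createScanCache() {
  return { frozenQuery: null, results: null };
}

// Returns cached results unless the caller asks for a reset.
// `build` is a stand-in for whatever computes the scan from the query.
function queryWithCache(cache, params, build) {
  if (params.reset === true) {
    cache.frozenQuery = null;
    cache.results = null;
  }
  if (cache.results === null) {
    // First call (or first call after a reset) freezes the query.
    const { reset, ...query } = params;
    cache.frozenQuery = query;
    cache.results = build(query);
  }
  return cache.results;
}

// Usage: the second call's new threshold is ignored until reset is passed.
const cache = createScanCache();
const build = (query) => `scan@${query.sim_threshold}`;
const first = queryWithCache(cache, { sim_threshold: 0.6 }, build);
const second = queryWithCache(cache, { sim_threshold: 0.75 }, build);
const third = queryWithCache(cache, { sim_threshold: 0.75, reset: true }, build);
console.log(first, second, third); // scan@0.6 scan@0.6 scan@0.75
```

Under these assumptions, a caller that changes `sim_threshold` without `reset: true` keeps seeing the scan built for the first threshold.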

package/dist/cli-main.js CHANGED

@@ -20772,7 +20772,7 @@ function validateClusterWithSupport(group, maxSize, diameterFloor, supportGraph)
 }
 // src/modules/surgeon/application/clustering/cluster.ts
-var DEFAULT_SIMILARITY_THRESHOLD2 = 0.82;
+var DEFAULT_SIMILARITY_THRESHOLD2 = 0.6;
 var CROSS_TYPE_SUBJECT_THRESHOLD = 0.89;
 var DEFAULT_MIN_CLUSTER = 2;
 var DEFAULT_MAX_CLUSTER_SIZE2 = 12;
@@ -23402,6 +23402,10 @@ var QUERY_CONTRADICTION_CANDIDATES_SCHEMA = Type4.Object({
     default: false,
     description: "If true, only return pairs sharing a subject_key. If false, also find semantically similar cross-subject pairs."
   })),
+  reset: Type4.Optional(Type4.Boolean({
+    default: false,
+    description: "If true, clear the cached contradiction scan for this run and rebuild it with the current scan settings."
+  })),
   limit: Type4.Optional(Type4.Integer({ minimum: 1, maximum: 50, default: 20 })),
   offset: Type4.Optional(Type4.Integer({ minimum: 0 }))
 });
@@ -23441,9 +23445,12 @@ function createQueryContradictionCandidatesTool(deps) {
   return {
     name: "query_contradiction_candidates",
     label: "Query contradiction candidates",
-    description: "Scan the active corpus for potential undiscovered contradictions. Finds pairs of entries that are semantically similar or share structured claim predicates but assert different things. Pairs already in conflict_log are marked but still returned. Use inspect_entry to evaluate promising pairs, then resolve_conflict to fix confirmed contradictions or flag_for_review for ambiguous cases.",
+    description: "Scan the active corpus for potential undiscovered contradictions. Finds pairs of entries that are semantically similar or share structured claim predicates but assert different things. Pairs already in conflict_log are marked but still returned. Use inspect_entry to evaluate promising pairs, then resolve_conflict to fix confirmed contradictions or flag_for_review for ambiguous cases. If you need to rebuild the scan with different thresholds, call with reset=true.",
     parameters: QUERY_CONTRADICTION_CANDIDATES_SCHEMA,
     async execute(_toolCallId, params) {
+      if (params.reset === true) {
+        cached = null;
+      }
       const query = buildQuery(params, deps);
       const offset = normalizeOffset2(params.offset);
       const limit = normalizeLimit2(params.limit);
@@ -23750,7 +23757,7 @@ function createResolveConflictTool(deps) {
 import { Type as Type7 } from "@sinclair/typebox";
 // src/modules/surgeon/application/dedup-clusters.ts
-var DEFAULT_SIM_THRESHOLD = 0.82;
+var DEFAULT_SIM_THRESHOLD = 0.6;
 var CLUSTER_PREVIEW_MAX_CHARS = 200;
 var UNSCOPED_PROJECT_LABEL = "(unscoped)";
 var DAY_MS3 = 24 * 60 * 60 * 1e3;
@@ -23868,6 +23875,11 @@ function getCachedEligibleDedupClusters(cache) {
     clusters: cache.eligibleClusters
   };
 }
+function resetDedupClusterCache(cache) {
+  cache.rawClusters = null;
+  cache.eligibleClusters = null;
+  cache.frozenQuery = null;
+}
 async function loadEligibleDedupClusters(db, input) {
   const cached = getCachedEligibleDedupClusters(input.cache);
   if (cached) {
@@ -24031,6 +24043,10 @@ var QUERY_DEDUP_CLUSTERS_SCHEMA = Type8.Object({
   project: Type8.Optional(Type8.String()),
   type: Type8.Optional(Type8.String()),
   sim_threshold: Type8.Optional(Type8.Number({ minimum: 0.5, maximum: 1 })),
+  reset: Type8.Optional(Type8.Boolean({
+    default: false,
+    description: "If true, clear the cached cluster scan for this run and rebuild it with the current filters so you can adjust thresholds mid-run."
+  })),
   limit: Type8.Optional(Type8.Integer({ minimum: 1, maximum: 20 })),
   offset: Type8.Optional(Type8.Integer({ minimum: 0 }))
 });
@@ -24050,12 +24066,15 @@ function createQueryDedupClustersTool(deps) {
   return {
     name: "query_dedup_clusters",
     label: "Query dedup clusters",
-    description: "Retrieve clusters of potentially duplicate entries. Each cluster groups entries with high embedding similarity or identical structured claims. Returns cluster summaries with entry previews. Use offset for pagination.",
+    description: "Retrieve clusters of potentially duplicate entries. Each cluster groups entries with high embedding similarity or identical structured claims. Returns cluster summaries with entry previews. Use offset for pagination. If you need to rebuild the candidate set with a different threshold or scope, call with reset=true.",
     parameters: QUERY_DEDUP_CLUSTERS_SCHEMA,
     async execute(_toolCallId, params) {
       if (!deps.clusterCache) {
         throw new Error("Dedup cluster cache is unavailable for this run.");
       }
+      if (params.reset === true) {
+        resetDedupClusterCache(deps.clusterCache);
+      }
       const query = normalizeDedupClusterQuery(
         {
           project: params.project,
@@ -24927,6 +24946,9 @@ async function captureBrainHealthSnapshot(db) {
 // src/modules/surgeon/application/workflow.ts
 var USER_ABORT_ERROR = "Run aborted by user (SIGINT).";
 var USER_ABORT_SUMMARY = "Run aborted by user.";
+var MAX_CONTINUATION_ATTEMPTS = 3;
+var LOW_BUDGET_FRACTION = 0.1;
+var SHALLOW_RUN_WARNING_BUDGET_USED_FRACTION = 0.5;
 function resolveRunBudget(options, config) {
   if (Number.isFinite(options.budget) && options.budget > 0) {
     return Math.floor(options.budget);
@@ -25112,6 +25134,7 @@ function buildInitialUserPrompt(options, stats, tokenBudget, dedupClusterCount,
       `Last surgeon run: ${stats.lastRun ? `${stats.lastRun.passType} ${stats.lastRun.status} at ${stats.lastRun.startedAt}` : "none"}.`,
       `Your budget is ${tokenBudget} tokens for the entire sweep.`,
       "Work through passes in priority order: contradictions -> dedup -> retirement.",
+      "Always run the proactive contradiction scan before dedup, even when pending conflicts start at 0.",
       "Call complete_pass with the pass_type for each phase transition, and complete_pass with pass_type='auto' when the full sweep is done."
     ].join(" ");
   }
@@ -25128,6 +25151,31 @@ function buildInitialUserPrompt(options, stats, tokenBudget, dedupClusterCount,
     "Work conservatively and use complete_pass when you are done."
   ].join(" ");
 }
+function buildContinuationPrompt(options, input) {
+  const lines = [
+    `You stopped without calling complete_pass and still have ${input.remainingTokens} tokens and about $${input.remainingCostUsd.toFixed(2)} of run budget remaining.`
+  ];
+  if (options.pass === "auto") {
+    lines.push(
+      "Continue the auto sweep. If contradictions are not fully scanned, resume there first. Otherwise continue with the next unfinished phase in order: contradictions, dedup, retirement."
+    );
+    lines.push(
+      "Do not call complete_pass with pass_type='auto' until the full sweep is genuinely done."
+    );
+  } else {
+    lines.push(`Continue the ${options.pass} pass.`);
+    lines.push(
+      "Do not call complete_pass until candidates are genuinely exhausted or budget is low."
+    );
+  }
+  lines.push("Keep paginating and evaluating candidates.");
+  lines.push("A healthy-looking batch or a few blocked candidates are not reasons to stop.");
+  lines.push(
+    "If contradiction or dedup scans feel too narrow or too noisy, call the query tool again with reset=true and adjusted thresholds."
+  );
+  lines.push(`This is continuation attempt ${input.attempt} of ${MAX_CONTINUATION_ATTEMPTS}.`);
+  return lines.join(" ");
+}
 function buildStoredSummary(passType, summary, phaseCompletions, snapshots) {
   if (!summary) {
     return null;
@@ -25299,6 +25347,7 @@ async function runSurgeon(options, deps) {
   };
   let terminalStatus = null;
   let terminalError = null;
+  let continuationAttempts = 0;
   async function finalizeRun(status, error, summaryOverride) {
     if (beforeSnapshot && !afterSnapshot) {
       afterSnapshot = await captureBrainHealthSnapshot(deps.db);
@@ -25479,6 +25528,30 @@ async function runSurgeon(options, deps) {
         convertToLlm,
         toolExecution: "sequential",
         getApiKey: deps.getApiKey,
+        getFollowUpMessages: async () => {
+          if (completionState.isComplete || signal?.aborted || budgetTracker.isExhausted() || budgetTracker.isCostCapExceeded() || continuationAttempts >= MAX_CONTINUATION_ATTEMPTS) {
+            return [];
+          }
+          const remaining = budgetTracker.remaining();
+          const tokenRemainingFraction = tokenBudget > 0 ? remaining.tokens / tokenBudget : 0;
+          const costRemainingFraction = runCostCap > 0 ? remaining.costUsd / runCostCap : 0;
+          if (tokenRemainingFraction < LOW_BUDGET_FRACTION || costRemainingFraction < LOW_BUDGET_FRACTION) {
+            return [];
+          }
+          continuationAttempts += 1;
+          log21.warn(
+            `Surgeon stopped without completing. Injecting continuation prompt (${continuationAttempts}/${MAX_CONTINUATION_ATTEMPTS}) with ${remaining.tokens} tokens and $${remaining.costUsd.toFixed(2)} remaining.`
+          );
+          return [{
+            role: "user",
+            content: buildContinuationPrompt(options, {
+              remainingTokens: remaining.tokens,
+              remainingCostUsd: remaining.costUsd,
+              attempt: continuationAttempts
+            }),
+            timestamp: Date.now()
+          }];
+        },
         beforeToolCall: async (context) => {
           registerUsage(context.assistantMessage);
           if (signal?.aborted) {
@@ -25580,6 +25653,19 @@ async function runSurgeon(options, deps) {
         summarizeCompletion(completionState.summary, completionState.passCompletions) ?? USER_ABORT_SUMMARY
       );
     }
+    const totals = budgetTracker.totals();
+    const budgetUsedPct = Math.min(
+      1,
+      Math.max(
+        runCostCap > 0 ? totals.costUsd / runCostCap : 1,
+        tokenBudget > 0 ? (totals.inputTokens + totals.outputTokens) / tokenBudget : 1
+      )
+    );
+    if (!completionState.isComplete && !budgetTracker.isExhausted() && !budgetTracker.isCostCapExceeded() && budgetUsedPct < SHALLOW_RUN_WARNING_BUDGET_USED_FRACTION) {
+      log21.warn(
+        `Surgeon ended without calling complete_pass and left ${((1 - budgetUsedPct) * 100).toFixed(0)}% of the run budget unused. The run may have quit early. Re-run with --verbose to inspect the trace.`
+      );
+    }
     const finalStatus = completionState.summary ? terminalStatus && terminalStatus !== "failed" ? terminalStatus : "completed" : terminalStatus ?? "completed";
     return finalizeRun(finalStatus, terminalError);
   } catch (error) {
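
The two guards added to `runSurgeon` in this file are easier to check in isolation. The sketch below re-expresses them as pure functions using the constants from this diff. `shouldInjectContinuation`, `budgetUsedFraction`, and the flat `run` object are hypothetical names for illustration and are not part of the package.

```javascript
// Constants as they appear in the diff.
const MAX_CONTINUATION_ATTEMPTS = 3;
const LOW_BUDGET_FRACTION = 0.1;
const SHALLOW_RUN_WARNING_BUDGET_USED_FRACTION = 0.5;

// Hypothetical helper mirroring the getFollowUpMessages guard.
function shouldInjectContinuation(run) {
  if (
    run.isComplete ||
    run.aborted ||
    run.budgetExhausted ||
    run.attempts >= MAX_CONTINUATION_ATTEMPTS
  ) {
    return false;
  }
  // As in the diff, a non-positive budget or cap yields a fraction of 0.
  const tokenFraction = run.tokenBudget > 0 ? run.remainingTokens / run.tokenBudget : 0;
  const costFraction = run.costCap > 0 ? run.remainingCostUsd / run.costCap : 0;
  // Nudge only while both budgets still have at least 10% left.
  return tokenFraction >= LOW_BUDGET_FRACTION && costFraction >= LOW_BUDGET_FRACTION;
}

// Hypothetical helper mirroring the shallow-run computation:
// the larger of the cost and token fractions, clamped to 1.
function budgetUsedFraction(totals, costCap, tokenBudget) {
  const costUsed = costCap > 0 ? totals.costUsd / costCap : 1;
  const tokensUsed =
    tokenBudget > 0 ? (totals.inputTokens + totals.outputTokens) / tokenBudget : 1;
  return Math.min(1, Math.max(costUsed, tokensUsed));
}

// Usage with made-up numbers: 70% of tokens and 80% of cost remain.
const run = {
  isComplete: false,
  aborted: false,
  budgetExhausted: false,
  attempts: 1,
  tokenBudget: 100000,
  remainingTokens: 70000,
  costCap: 10,
  remainingCostUsd: 8
};
console.log(shouldInjectContinuation(run)); // true

const used = budgetUsedFraction(
  { costUsd: 2, inputTokens: 25000, outputTokens: 5000 },
  10,
  100000
);
// 30% of the budget is used, which is below the 50% shallow-run threshold.
console.log(used < SHALLOW_RUN_WARNING_BUDGET_USED_FRACTION); // true
```

In this reading, the continuation guard works on remaining budget while the shallow-run warning works on used budget, so a run can exhaust its three nudges and still trigger the warning afterwards.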

package/dist/modules/surgeon/adapters/prompts/passes/auto.md CHANGED

@@ -6,15 +6,15 @@ You have access to ALL surgeon tools across ALL pass types. Your job is to work
 Work through passes in this priority order:
-1. **Contradictions** - Resolve pending conflicts first. Active inconsistencies degrade corpus trust. Use `query_conflicts` and `resolve_conflict`.
-   Then scan for undiscovered contradictions with `query_contradiction_candidates`, log confirmed pairs with `log_conflict`, and resolve them.
+1. **Contradictions** - Always start here. Resolve pending conflicts first. Active inconsistencies degrade corpus trust. Use `query_conflicts` and `resolve_conflict`.
+   Then run a proactive scan with `query_contradiction_candidates`, even if `query_conflicts` returned zero pending conflicts. Log confirmed pairs with `log_conflict` and resolve them.
 2. **Dedup** - Merge duplicate entries next. Duplicates waste recall bandwidth and confuse retrieval. Use `query_dedup_clusters` and `merge_cluster`.
 3. **Retirement** - Clean up stale entries last. Use `query_candidates` and `retire_entry`. After standard candidates, scan for supersession chains with `query_supersession_candidates`.
 ## Workflow
 1. Call `get_health_stats` to orient.
-2. Start with the highest-priority pass that has work to do. If there are pending conflicts, start with contradictions. If none, check for dedup clusters. If none, start retirement.
+2. **Always start with contradictions.** First resolve pending conflicts via `query_conflicts`. Then, regardless of whether there were pending conflicts, run a proactive scan using `query_contradiction_candidates` to find undiscovered contradictions. Only move to dedup after both pending conflicts are resolved and the proactive scan is complete, or the contradictions budget allocation is genuinely spent.
 3. Work through that pass's candidates using the same methodology described in its individual pass instructions.
 4. When a pass is complete (candidates exhausted or no more productive work), call `complete_pass` with `pass_type` set to the pass you just finished (for example, `"contradictions"`).
 5. After completing a pass, move to the next priority pass. You do not need to call `get_health_stats` again - check for work by calling the next pass's query tool directly.
@@ -40,6 +40,18 @@ Rough budget allocation guideline (not rigid):
 If a pass has no work, its budget share rolls into the next pass.
+## Budget Discipline
+**Do not stop early.** Your budget exists to be used. If you have remaining budget and there are still candidates to evaluate, keep working. Finishing a sweep at 1% of budget on a corpus of thousands of entries means you barely looked.
+Concrete rules:
+- After each `complete_pass` for a phase transition, check your remaining budget. If significant budget remains, the next phase should use it.
+- For retirement: page through all candidates until `query_candidates` returns empty or budget is genuinely low, meaning less than 10% remains. Seeing a batch of healthy entries is not a reason to stop. The next batch may contain stale entries.
+- For contradictions: the proactive scan is mandatory in auto mode. Zero pending conflicts from `query_conflicts` is not enough to move on.
+- For dedup: if the phase produces very few clusters, note that observation. A corpus of thousands of entries typically has many more than a handful of near-duplicate candidates.
+- If you reach `complete_pass` with `pass_type = "auto"` after using less than 50% of your budget, reconsider. Go back and do deeper evaluation: inspect more dedup clusters, reset the contradiction or dedup query with wider thresholds if needed, or run a broader retirement sweep.
 ## Complete Pass Calls
 Call `complete_pass` once per phase transition:
@@ -49,6 +61,6 @@ Call `complete_pass` once per phase transition:
 - After finishing retirement work: `complete_pass` with `pass_type = "retirement"`
 - After all passes are done: `complete_pass` with `pass_type = "auto"` (this is the final one)
-If a pass has zero work (for example, no pending conflicts), skip calling `complete_pass` for it - just move to the next pass.
+If a pass has zero actionable work after running its required discovery steps, you may move to the next pass without calling `complete_pass` for that phase.
 The final `complete_pass` with `pass_type = "auto"` should include observations and recommendations spanning all passes you worked through.

package/dist/modules/surgeon/adapters/prompts/passes/contradictions.md CHANGED

@@ -42,6 +42,8 @@ Keep working until conflicts are exhausted or budget is low. Call `complete_pass
 After resolving pending conflicts from `query_conflicts`, scan for undiscovered contradictions using `query_contradiction_candidates`. This finds pairs of active entries that the ingestion pipeline never compared - entries from different sessions, with different subject normalization, or from different project scopes.
+**Do not skip proactive scanning.** Even if `query_conflicts` returned zero pending conflicts, the proactive scan via `query_contradiction_candidates` is mandatory. Undiscovered contradictions are the most dangerous kind because they silently degrade corpus quality without appearing in the pending conflict queue.
 For each candidate pair:
 1. **Check if already known** - If `existingConflictLogId` is present, the conflict is already logged. Skip it or inspect the entries if you need more context.
@@ -57,7 +59,7 @@ For each candidate pair:
 **Prioritize claim divergence pairs** (strategy `"claim_divergence"`) over embedding similarity pairs. Claim divergence means two entries share the same subject and predicate but assert different objects - these are usually real conflicts. Embedding similarity pairs need more careful evaluation.
-Budget note: Proactive scanning is secondary to resolving known pending conflicts. If budget is tight after `query_conflicts`, skip the scan and note it in your `complete_pass` recommendations.
+Budget note: Known pending conflicts still take priority, but proactive scanning is part of completing this pass. Only stop before the scan is exhausted when budget is genuinely low. If that happens, note the incomplete scan in your `complete_pass` recommendations.
 ## Resolution Quality Rules

package/dist/modules/surgeon/adapters/prompts/passes/dedup.md CHANGED

@@ -31,6 +31,14 @@ For each cluster from `query_dedup_clusters`:
 - Preserve the strongest form. Prefer the most specific, complete, and well-supported version of the knowledge.
 - Treat recently consolidated entries with extra caution. If `merged_from > 0` and `consolidated_at` is recent, inspect before merging again.
+## Threshold Guidance
+The similarity threshold controls which entry pairs are surfaced as candidates for your review. It does not control whether entries actually get merged. Merging always requires your decision after reading the entries.
+The default threshold is deliberately low (`0.60`) so the candidate net is wide. That will surface some clear duplicates, some borderline cases, and some related-but-distinct entries. That is expected. Your job is to evaluate each cluster and decide: merge, flag, or skip.
+If the current threshold is too noisy or too narrow, call `query_dedup_clusters` with `reset = true` and a different `sim_threshold`. Reset clears the cached cluster set for this run and rebuilds it with your new parameters. Start wide, then tighten only if the candidate stream is mostly noise.
 ## Working Through Clusters
 You will receive clusters in batches.
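
The threshold guidance added in this hunk separates surfacing from merging. A minimal sketch of that distinction follows, assuming candidates are scored pairs with a `similarity` field. `surfaceCandidates` and the sample scores are hypothetical and do not reflect the package's clustering code.

```javascript
// Hypothetical scored pairs; real clusters come from query_dedup_clusters.
const pairs = [
  { ids: ["a", "b"], similarity: 0.91 },
  { ids: ["c", "d"], similarity: 0.74 },
  { ids: ["e", "f"], similarity: 0.63 },
  { ids: ["g", "h"], similarity: 0.41 }
];

// The threshold only decides which pairs are shown for review.
function surfaceCandidates(scoredPairs, simThreshold) {
  return scoredPairs.filter((pair) => pair.similarity >= simThreshold);
}

const narrow = surfaceCandidates(pairs, 0.82); // previous default
const wide = surfaceCandidates(pairs, 0.6); // new default
console.log(narrow.length, wide.length); // 1 3
```

Every surfaced pair still needs an explicit merge, flag, or skip decision. Lowering the threshold only widens what is reviewed.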

package/dist/modules/surgeon/adapters/prompts/passes/retirement.md CHANGED

@@ -37,6 +37,10 @@ An empty `query_candidates` result means there are no more candidates matching t
 A productive pass works through hundreds of candidates, not dozens.
+**Budget awareness:** If your budget allows examining more candidates, you must keep paginating. Do not call `complete_pass` while `query_candidates` is still returning candidates and budget is available. A batch where most entries are healthy is normal in a healthy corpus. That does not mean the pass is done. Keep going. The stale entries are mixed throughout the candidate pool.
+Expected throughput: On a corpus of around 3000 entries, a retirement pass with adequate budget should evaluate at least 500 candidates and often more. If you stop at 100, you probably have not done enough.
 ## Type-Specific Heuristics
 Different entry types have different retirement profiles:

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "agenr",
-  "version": "0.13.1",
+  "version": "0.13.2",
   "openclaw": {
     "extensions": [
       "dist/edge/openclaw/index.js"