@ouro.bot/cli 0.1.0-alpha.490 → 0.1.0-alpha.491

package/changelog.json CHANGED
@@ -1,6 +1,14 @@
1
1
  {
2
2
  "_note": "This changelog is maintained as part of the PR/version-bump workflow. Agent-curated, not auto-generated. Agents read this file directly via read_file to understand what changed between versions.",
3
3
  "versions": [
4
+ {
5
+ "version": "0.1.0-alpha.491",
6
+ "changes": [
7
+ "Two live-runtime bugs Slugger surfaced during MCP roundtrip: (1) MCP `send_message` returned blank or raw `<think>` content instead of an actual reply when minimax-style models emitted only reasoning. The shared-turn runner now strips closed AND unclosed `<think>...</think>` blocks before returning, and when nothing remains it returns a clear diagnostic (`agent produced reasoning but no final answer this turn — try again`) plus emits a `senses.shared_turn_only_reasoning` warn-level nerve event so operators can see how often it's happening. (2) The `rest` tool's fresh-pending-work gate fired on every rest call within a turn because `hasFreshPendingWork(options)` reads from the turn-start snapshot of `pendingMessages` and never updates — once pending was non-empty, the agent could be told 'fresh work arrived' indefinitely even after surfacing or processing the items. The gate is now once-per-turn: the first rest call hits it, gets the message, the agent does whatever it needs, the next rest call passes. Emits `engine.fresh_work_gate_fired` info-level event the one time it fires. Both bugs reproduced as regression tests that fail without the fix.",
8
+ "New `trip_update_leg` tool to round out the trip ledger. The original Step 4 followup on substrate#35 listed `trip_ensure / trip_get / trip_update_leg / trip_attach_evidence` — the previous PR (#609) shipped `trip_upsert` instead of `trip_update_leg`, which forced the agent to re-emit the entire trip record to change one leg field. `trip_update_leg` updates specific fields of an existing leg in place: pass `tripId`, `legId`, a JSON object of field updates, and `updatedAt`. Identity-changing updates (`legId`, `kind`) and empty updates objects are rejected with operational error messages. Existing evidence is preserved unless the agent explicitly overwrites it. Emits a `trips.leg_updated` nerve event with the field list. Tool registry now at 74 tools (up from 73).",
9
+ "New `docs/trip-ledger.md` covering what the ledger actually is, why Slugger said it needed to exist (gap between mail body and travel doc — no authoritative source for cross-checks), the discriminated TripLeg union (lodging / flight / train / ground-transport / rental-car / ferry / event), the non-optional TripEvidence shape with `discoveryMethod`, the trust shape (per-agent keys, private key returned exactly once, hosted side never sees plaintext), the seven harness tools, on-disk layout for both harness and hosted sides, and an explicit answer to 'is this travel-specific or generalizable infra?' (current shape is travel-specific by design; the *pattern* is general and would be lifted to a shared abstraction the next time we build a per-agent encrypted record service)."
10
+ ]
11
+ },
4
12
  {
5
13
  "version": "0.1.0-alpha.490",
6
14
  "changes": [
@@ -180,6 +180,29 @@ function getProviderDisplayLabel(facing = "human") {
180
180
  };
181
181
  return providerLabelBuilders[provider]();
182
182
  }
183
+ /**
184
+ * Strip <think>...</think> blocks for the violation-detection check at the
185
+ * end of a streaming turn. Used to tell legitimate text-only responses
186
+ * apart from the MiniMax-M2.7 "only thinking, no tool call" violation
187
+ * shape. Mirrors the more thorough stripThinkBlocks helper in
188
+ * senses/shared-turn.ts (which is for operator-facing output) — kept
189
+ * inline here to avoid pulling senses/ into the core module's import graph.
190
+ */
191
+ function stripThinkBlocksForViolationCheck(input) {
192
+ let out = input;
193
+ for (;;) {
194
+ const open = out.indexOf("<think>");
195
+ if (open === -1)
196
+ break;
197
+ const close = out.indexOf("</think>", open + "<think>".length);
198
+ if (close === -1) {
199
+ out = out.slice(0, open);
200
+ break;
201
+ }
202
+ out = out.slice(0, open) + out.slice(close + "</think>".length);
203
+ }
204
+ return out.trim();
205
+ }
183
206
  function hasFreshPendingWork(options) {
184
207
  const pendingMessages = options?.pendingMessages;
185
208
  if (!Array.isArray(pendingMessages))
@@ -545,6 +568,21 @@ async function runAgent(messages, callbacks, channel, signal, options) {
545
568
  let sawQuerySession = false;
546
569
  let sawBridgeManage = false;
547
570
  let sawExternalStateQuery = false;
571
+ // Once-per-turn flag for the fresh-work rest gate. Without this, an agent
572
+ // that called rest, was told "fresh work arrived", processed the items,
573
+ // and called rest again would get the same message forever — the gate
574
+ // condition is read from the turn-start snapshot of pendingMessages,
575
+ // which doesn't update mid-turn. The agent only needs to be told once;
576
+ // after that, repeated rest attempts mean they've acknowledged.
577
+ let freshWorkGateFired = false;
578
+ // Counter for "no tool call returned despite tool_choice=required" violations.
579
+ // MiniMax reasoning models occasionally emit only a <think>...</think>
580
+ // block and stop, without any tool call — even when tool_choice is set to
581
+ // "required". This is a provider-level violation; the harness retries with
582
+ // a corrective nudge up to a small cap rather than silently accepting an
583
+ // empty turn.
584
+ let noToolCallRetries = 0;
585
+ const NO_TOOL_CALL_MAX_RETRIES = 2;
548
586
  const toolLoopState = (0, tool_loop_1.createToolLoopState)();
549
587
  const toolFrictionLedger = (0, tool_friction_1.createToolFrictionLedger)();
550
588
  const finishTerminalProviderError = (error, classification) => {
@@ -751,14 +789,53 @@ async function runAgent(messages, callbacks, channel, signal, options) {
751
789
  if (hasPhaseAnnotation) {
752
790
  msg.phase = isSoleSettle ? "settle" : "commentary";
753
791
  }
792
+ // Detect the MiniMax "only-thinking, no tool call" violation: no tool
793
+ // calls returned, and the content is empty after stripping
794
+ // <think>...</think> blocks. This is a narrow check — legitimate
795
+ // content-only responses (text without think tags, or text outside
796
+ // think tags) still flow through the original "no tool calls →
797
+ // accept as-is" path so existing channels and tests are unaffected.
798
+ const onlyThinkContent = !result.toolCalls.length
799
+ && typeof result.content === "string"
800
+ && stripThinkBlocksForViolationCheck(result.content).length === 0
801
+ && result.content.length > 0;
754
802
  if (!result.toolCalls.length) {
755
- // No tool calls accept response as-is.
756
- // (Kick detection disabled; tool_choice: required + settle
757
- // is the primary loop control. See src/heart/kicks.ts to re-enable.)
803
+ if (onlyThinkContent && toolChoiceRequired && noToolCallRetries < NO_TOOL_CALL_MAX_RETRIES) {
804
+ // Provider-level violation: tool_choice was required, model emitted
805
+ // only a <think>...</think> block (or empty content) with no tool
806
+ // call. Retry with a corrective nudge up to NO_TOOL_CALL_MAX_RETRIES
807
+ // times. After cap, accept as-is (the readback path strips think
808
+ // tags and surfaces a clear diagnostic).
809
+ noToolCallRetries++;
810
+ (0, runtime_1.emitNervesEvent)({
811
+ level: "warn",
812
+ component: "engine",
813
+ event: "engine.no_tool_call_retry",
814
+ message: "model returned only <think> content with no tool call despite tool_choice=required; retrying with corrective nudge",
815
+ meta: {
816
+ attempt: noToolCallRetries,
817
+ cap: NO_TOOL_CALL_MAX_RETRIES,
818
+ provider: providerRuntime.id,
819
+ model: providerRuntime.model,
820
+ contentLength: result.content.length,
821
+ },
822
+ });
823
+ messages.push(msg);
824
+ messages.push({
825
+ role: "user",
826
+ content: isInnerDialog
827
+ ? "no tool was called this turn. you must end every turn by calling rest (or surface, ponder, observe). emit the tool call now."
828
+ : "no tool was called this turn. you must end every turn by calling settle with your answer (or ponder/observe). emit the tool call now.",
829
+ });
830
+ continue;
831
+ }
832
+ // Legitimate text-only response, or cap reached — accept as-is.
758
833
  messages.push(msg);
759
834
  done = true;
760
835
  }
761
836
  else {
837
+ // Reset the retry counter on any successful tool call.
838
+ noToolCallRetries = 0;
762
839
  // Check for settle sole call: intercept before tool execution
763
840
  if (isSoleSettle) {
764
841
  /* v8 ignore next -- defensive: JSON.parse catch for malformed settle args @preserve */
@@ -896,12 +973,20 @@ async function runAgent(messages, callbacks, channel, signal, options) {
896
973
  providerRuntime.appendToolOutput(result.toolCalls[0].id, gateMessage);
897
974
  continue;
898
975
  }
899
- if (hasFreshPendingWork(options)) {
976
+ if (hasFreshPendingWork(options) && !freshWorkGateFired) {
977
+ freshWorkGateFired = true;
900
978
  callbacks.onToolEnd("rest", (0, tools_1.summarizeArgs)("rest", restArgs), false);
901
979
  messages.push(msg);
902
980
  const gateMessage = "fresh work arrived for me this turn — inspect the pending messages above and take the next concrete action before you rest.";
903
981
  messages.push({ role: "tool", tool_call_id: result.toolCalls[0].id, content: gateMessage });
904
982
  providerRuntime.appendToolOutput(result.toolCalls[0].id, gateMessage);
983
+ (0, runtime_1.emitNervesEvent)({
984
+ level: "info",
985
+ component: "engine",
986
+ event: "engine.fresh_work_gate_fired",
987
+ message: "rest deferred once because pending work arrived this turn; agent has been notified",
988
+ meta: { pendingCount: options.pendingMessages.length },
989
+ });
905
990
  continue;
906
991
  }
907
992
  callbacks.onToolEnd("rest", (0, tools_1.summarizeArgs)("rest", restArgs), true);
@@ -2938,6 +2938,14 @@ async function executeConnectBlueBubbles(agent, deps) {
2938
2938
  const port = parseOptionalPort(await promptInput("Local webhook port [18790]: "), 18790, "BlueBubbles webhook port");
2939
2939
  const webhookPath = normalizeWebhookPath(await promptInput("Local webhook path [/bluebubbles-webhook]: "), "/bluebubbles-webhook");
2940
2940
  const requestTimeoutMs = parseOptionalPositiveInteger(await promptInput("Request timeout ms [30000]: "), 30000, "BlueBubbles request timeout");
2941
+ // Capture the operator's known iMessage handles so the BB ingest path can
2942
+ // filter group-chat echoes whose `isFromMe` flag was lost or never set.
2943
+ // Without this, the agent would ingest its own outbound message as inbound
2944
+ // and reply to itself ("Slugger talking to himself" in groups).
2945
+ const ownHandlesRaw = (await promptInput("Your iMessage handle(s) — phone(s) and/or email(s) BlueBubbles attributes to your sent messages (comma-separated; needed for the group self-talk filter; blank to skip): ")).trim();
2946
+ const ownHandles = ownHandlesRaw
2947
+ ? ownHandlesRaw.split(",").map((h) => h.trim()).filter((h) => h.length > 0)
2948
+ : [];
2941
2949
  const machineId = currentMachineId(deps);
2942
2950
  const progress = createHumanCommandProgress(deps, "connect bluebubbles");
2943
2951
  let stored;
@@ -2956,6 +2964,7 @@ async function executeConnectBlueBubbles(agent, deps) {
2956
2964
  serverUrl,
2957
2965
  password,
2958
2966
  accountId: "default",
2967
+ ownHandles,
2959
2968
  },
2960
2969
  bluebubblesChannel: {
2961
2970
  port,
@@ -2984,6 +2993,7 @@ async function executeConnectBlueBubbles(agent, deps) {
2984
2993
  `Stored: ${stored.itemPath}`,
2985
2994
  "agent.json: senses.bluebubbles.enabled = true",
2986
2995
  `Runtime: ${daemonApply}`,
2996
+ `ownHandles: ${ownHandles.length > 0 ? ownHandles.join(", ") : "(none — group self-talk filter inactive)"}`,
2987
2997
  "secret was not printed",
2988
2998
  ...(syncSummary ? [syncSummary] : []),
2989
2999
  ],
@@ -248,6 +248,82 @@ exports.tripToolDefinitions = [
248
248
  },
249
249
  summaryKeys: ["tripId", "legId"],
250
250
  },
251
+ {
252
+ tool: {
253
+ type: "function",
254
+ function: {
255
+ name: "trip_update_leg",
256
+ description: "Update specific fields of an existing leg in a trip. Pass tripId, legId, and a JSON object of field updates (e.g. {status:\"cancelled\", confirmationCode:\"PNR123\"}). Existing evidence is preserved unless explicitly overwritten. Use this instead of trip_upsert when you only need to change one leg without re-emitting the whole record. The leg's `kind` cannot be changed (changing kind means a new leg).",
257
+ parameters: {
258
+ type: "object",
259
+ properties: {
260
+ tripId: { type: "string", description: "Canonical trip id." },
261
+ legId: { type: "string", description: "Leg id within the trip." },
262
+ updates: { type: "string", description: "JSON object of leg fields to update. Cannot include `legId` or `kind`. Common fields: status, confirmationCode, vendor, amount, checkInDate, checkOutDate, departureTime, arrivalTime, etc." },
263
+ updatedAt: { type: "string", description: "ISO timestamp for the update. Used both for the leg's updatedAt and the trip's updatedAt." },
264
+ },
265
+ required: ["tripId", "legId", "updates", "updatedAt"],
266
+ },
267
+ },
268
+ },
269
+ handler: async (args, ctx) => {
270
+ if (!trustAllowsTripAccess(ctx))
271
+ return "trip ledger is private; this tool is only available in trusted contexts.";
272
+ const tripId = args.tripId;
273
+ const legId = args.legId;
274
+ const updatedAt = args.updatedAt;
275
+ if (typeof tripId !== "string" || tripId.length === 0)
276
+ return "tripId is required.";
277
+ if (typeof legId !== "string" || legId.length === 0)
278
+ return "legId is required.";
279
+ if (typeof updatedAt !== "string" || updatedAt.length === 0)
280
+ return "updatedAt is required.";
281
+ try {
282
+ const updates = parseJsonArg(args.updates, "updates");
283
+ if (!isRecord(updates))
284
+ return "updates must be a JSON object.";
285
+ // Reject identity-changing fields — those would silently break referential integrity.
286
+ if ("legId" in updates)
287
+ return "updates cannot change legId; create a new leg instead.";
288
+ if ("kind" in updates)
289
+ return "updates cannot change kind; create a new leg instead.";
290
+ if (Object.keys(updates).length === 0)
291
+ return "updates cannot be empty — pass at least one field.";
292
+ const trip = (0, store_1.readTripRecord)((0, identity_1.getAgentName)(), tripId);
293
+ const legIndex = trip.legs.findIndex((leg) => leg.legId === legId);
294
+ if (legIndex === -1)
295
+ return `leg ${legId} not found in trip ${tripId}.`;
296
+ const leg = trip.legs[legIndex];
297
+ const updatedLeg = {
298
+ ...leg,
299
+ ...updates,
300
+ legId: leg.legId,
301
+ kind: leg.kind,
302
+ updatedAt,
303
+ };
304
+ const updated = {
305
+ ...trip,
306
+ legs: [...trip.legs.slice(0, legIndex), updatedLeg, ...trip.legs.slice(legIndex + 1)],
307
+ updatedAt,
308
+ };
309
+ (0, store_1.upsertTripRecord)((0, identity_1.getAgentName)(), updated);
310
+ (0, runtime_1.emitNervesEvent)({
311
+ component: "trips",
312
+ event: "trips.leg_updated",
313
+ message: "trip leg fields updated",
314
+ meta: { agentId: (0, identity_1.getAgentName)(), tripId, legId, fields: Object.keys(updates) },
315
+ });
316
+ const fieldList = Object.keys(updates).join(", ");
317
+ return `leg ${legId} updated in ${tripId}: ${fieldList}.`;
318
+ }
319
+ catch (error) {
320
+ if (error instanceof store_1.TripNotFoundError)
321
+ return error.message;
322
+ return `update failed: ${error instanceof Error ? error.message : /* v8 ignore next -- non-Error throw is unreachable from parseJsonArg/store */ String(error)}`;
323
+ }
324
+ },
325
+ summaryKeys: ["tripId", "legId"],
326
+ },
251
327
  {
252
328
  tool: {
253
329
  type: "function",
@@ -39,6 +39,7 @@ var __importStar = (this && this.__importStar) || (function () {
39
39
  };
40
40
  })();
41
41
  Object.defineProperty(exports, "__esModule", { value: true });
42
+ exports.stripThinkBlocks = stripThinkBlocks;
42
43
  exports.runSenseTurn = runSenseTurn;
43
44
  const os = __importStar(require("os"));
44
45
  const path = __importStar(require("path"));
@@ -59,6 +60,29 @@ const pipeline_1 = require("./pipeline");
59
60
  const mcp_manager_1 = require("../repertoire/mcp-manager");
60
61
  const runtime_1 = require("../nerves/runtime");
61
62
  const RESPONSE_CAP = 50_000;
63
+ /**
64
+ * Strip MiniMax-style `<think>...</think>` reasoning blocks from a response
65
+ * string. Handles unclosed open tags (treats everything from `<think>` to
66
+ * end of string as reasoning) and multiple blocks in sequence. Returns the
67
+ * trimmed remainder.
68
+ */
69
+ function stripThinkBlocks(input) {
70
+ let out = input;
71
+ // Closed blocks first (greedy match removed by repeatedly slicing the leftmost pair).
72
+ for (;;) {
73
+ const open = out.indexOf("<think>");
74
+ if (open === -1)
75
+ break;
76
+ const close = out.indexOf("</think>", open + "<think>".length);
77
+ if (close === -1) {
78
+ // Unclosed — drop everything from <think> onward.
79
+ out = out.slice(0, open);
80
+ break;
81
+ }
82
+ out = out.slice(0, open) + out.slice(close + "</think>".length);
83
+ }
84
+ return out.trim();
85
+ }
62
86
  /**
63
87
  * Run a single agent turn through the inbound pipeline.
64
88
  * Caller provides channel, session key, friend, and message;
@@ -191,6 +215,25 @@ async function runSenseTurn(options) {
191
215
  else {
192
216
  finalResponse = responseText;
193
217
  }
218
+ // Strip MiniMax-style <think>...</think> blocks from the final response.
219
+ // When a reasoning-style model emits only a think block and no final answer
220
+ // (no settle tool call, no post-think text), the readback path above
221
+ // surfaces the raw saved assistant content — which includes the think tags
222
+ // and renders as empty (or as raw reasoning) on MCP/CLI clients. Strip
223
+ // here so the caller sees the actual delivered text. If only reasoning
224
+ // came through and nothing else, surface a clear diagnostic message
225
+ // instead of a blank response so the operator knows what happened.
226
+ finalResponse = stripThinkBlocks(finalResponse);
227
+ if (finalResponse.length === 0) {
228
+ (0, runtime_1.emitNervesEvent)({
229
+ level: "warn",
230
+ component: "senses",
231
+ event: "senses.shared_turn_only_reasoning",
232
+ message: "agent produced only <think> reasoning with no final answer — likely a model that closed the think tag without continuing",
233
+ meta: { agentName, channel, sessionKey, friendId },
234
+ });
235
+ finalResponse = "(agent produced reasoning but no final answer this turn — try again, or check the session transcript for the trace)";
236
+ }
194
237
  // Cap response length
195
238
  if (finalResponse.length > RESPONSE_CAP) {
196
239
  finalResponse = finalResponse.slice(0, RESPONSE_CAP) + "\n\n[truncated — response exceeded 50K characters]";
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@ouro.bot/cli",
3
- "version": "0.1.0-alpha.490",
3
+ "version": "0.1.0-alpha.491",
4
4
  "main": "dist/heart/daemon/ouro-entry.js",
5
5
  "bin": {
6
6
  "cli": "dist/heart/daemon/ouro-bot-entry.js",