@probelabs/probe 0.6.0-rc272 → 0.6.0-rc274

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/cjs/index.cjs CHANGED
@@ -35869,20 +35869,15 @@ var init_zod = __esm({
 
  // src/agent/tasks/taskTool.js
  function createTaskCompletionBlockedMessage(taskSummary) {
- return `<task_completion_blocked>
- You cannot complete yet. The following tasks are still unresolved:
+ return `You cannot complete yet. The following tasks are still unresolved:
 
  ${taskSummary}
 
- Required action:
- 1. For each "pending" or "in_progress" task, either:
- - Complete the work and mark it: <task><action>complete</action><id>task-X</id></task>
- - Or cancel if no longer needed: <task><action>update</action><id>task-X</id><status>cancelled</status></task>
+ For each pending/in_progress task, either:
+ - Complete it: call task tool with action="complete", id="task-X"
+ - Cancel it: call task tool with action="update", id="task-X", status="cancelled"
 
- 2. After ALL tasks are resolved (completed or cancelled), call attempt_completion again.
-
- Use <task><action>list</action></task> to review current status.
- </task_completion_blocked>`;
+ After all tasks are resolved, call attempt_completion again.`;
  }
  function createTaskTool(options = {}) {
  const { taskManager, tracer, debug = false } = options;
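The blocked message is what the model receives when it attempts completion while tasks are still open. The sketch below models that gate under stated assumptions: tasks are plain `{ id, title, status }` objects, and `findUnresolvedTasks`, `summarizeTasks`, and `checkCompletionAllowed` are hypothetical helpers, not probe's task manager API. Only the message wording is taken from the rc274 code above.

```javascript
// Hypothetical task shape: { id: "task-1", title: "...", status: "pending" }.
// Statuses are the ones named in the prompt: pending, in_progress,
// completed, cancelled.
const RESOLVED_STATUSES = new Set(["completed", "cancelled"]);

// Tasks that would still block attempt_completion.
function findUnresolvedTasks(tasks) {
  return tasks.filter((task) => !RESOLVED_STATUSES.has(task.status));
}

// One line per task; a hypothetical stand-in for the taskSummary argument.
function summarizeTasks(tasks) {
  return tasks.map((task) => `- ${task.id} [${task.status}]: ${task.title}`).join("\n");
}

// Returns null when completion may proceed, otherwise a message using
// the rc274 wording of createTaskCompletionBlockedMessage.
function checkCompletionAllowed(tasks) {
  const unresolved = findUnresolvedTasks(tasks);
  if (unresolved.length === 0) return null;
  return `You cannot complete yet. The following tasks are still unresolved:

${summarizeTasks(unresolved)}

For each pending/in_progress task, either:
- Complete it: call task tool with action="complete", id="task-X"
- Cancel it: call task tool with action="update", id="task-X", status="cancelled"

After all tasks are resolved, call attempt_completion again.`;
}

const exampleTasks = [
  { id: "task-1", title: "Fix bug A", status: "completed" },
  { id: "task-2", title: "Add feature B", status: "in_progress" },
];
console.log(checkCompletionAllowed(exampleTasks));
```

Cancelled tasks count as resolved here, matching the prompt's rule that every task must be "completed" or "cancelled" before attempt_completion.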
@@ -36105,145 +36100,46 @@ var init_taskTool = __esm({
  dependencies: external_exports.array(external_exports.string()).optional(),
  after: external_exports.string().optional()
  });
- taskSystemPrompt = `[Task Management System]
-
- You have access to a task tracking tool to organize your work on complex requests.
-
- ## When to Create Tasks
-
- CREATE TASKS when the request has **multiple distinct deliverables or goals**:
- - "Fix bug A AND add feature B" \u2192 Two separate tasks
- - "Investigate auth, payments, AND notifications" \u2192 Three independent areas
- - "Implement X, then add tests, then update docs" \u2192 Sequential phases with different outputs
- - User explicitly asks for a plan or task breakdown
-
- SKIP TASKS for single-goal requests, even if they require multiple searches:
- - "How does ranking work?" \u2192 Just investigate and answer (one goal)
- - "What does function X do?" \u2192 Just look it up (one goal)
- - "Explain the authentication flow" \u2192 Just trace and explain (one goal)
- - "Find where errors are logged" \u2192 Just search and report (one goal)
-
- **Key insight**: Multiple *internal steps* (search, read, analyze) are NOT the same as multiple *goals*.
- A single investigation with many steps is still ONE task, not many.
-
- ## Task Granularity
-
- Tasks represent LOGICAL UNITS OF WORK, not individual files or steps:
- - "Fix 8 similar test files" \u2192 ONE task (same type of fix across files)
- - "Update API + tests + docs" \u2192 THREE tasks (different types of work)
- - "Implement feature in 5 files" \u2192 ONE task (single feature)
-
- **Rule of thumb**: If you're creating more than 3-4 tasks, you're probably too granular.
-
- **Anti-patterns to avoid**:
- - One task per file \u274C
- - One task per function \u274C
- - One task per repository (when same type of work) \u274C
-
- **Good patterns**:
- - One task per distinct deliverable \u2713
- - One task per phase (implement, test, document) \u2713
- - One task per different type of work \u2713
-
- MODIFY TASKS when (during execution):
- - You discover the problem is more complex than expected \u2192 Add new tasks
- - A single task covers too much scope \u2192 Split into smaller tasks
- - You find related work that needs attention \u2192 Add dependent tasks
- - A task becomes irrelevant based on findings \u2192 Cancel it
- - Task priorities change based on discoveries \u2192 Update priority
- - You learn new context \u2192 Update task description
-
- ## Task Workflow
-
- **STEP 1 - Plan (at start):**
- Analyze the request and create tasks for each logical step:
-
- <task>
- <action>create</action>
- <tasks>[
- {"title": "Search for authentication module", "priority": "high"},
- {"title": "Analyze login flow implementation", "dependencies": ["task-1"]},
- {"title": "Find session management code", "dependencies": ["task-1"]},
- {"title": "Summarize authentication architecture", "dependencies": ["task-2", "task-3"]}
- ]</tasks>
- </task>
-
- **STEP 2 - Execute (during work):**
- Update task status as you work:
-
- <task>
- <action>update</action>
- <id>task-1</id>
- <status>in_progress</status>
- </task>
-
- ... do the work (search, extract, etc.) ...
-
- <task>
- <action>complete</action>
- <id>task-1</id>
- </task>
-
- **STEP 2b - Adapt (when you discover new work):**
- As you work, you may discover that:
- - A task is more complex than expected \u2192 Split it into subtasks
- - New areas need investigation \u2192 Add new tasks
- - Some tasks are no longer needed \u2192 Cancel them
- - Task order should change \u2192 Update dependencies
-
- *Adding a new task when you discover more work:*
- <task>
- <action>create</action>
- <title>Investigate caching layer</title>
- <description>Found references to Redis caching in auth module</description>
- </task>
-
- *Inserting a task after a specific task (to maintain logical order):*
- <task>
- <action>create</action>
- <title>Check rate limiting</title>
- <after>task-2</after>
- </task>
-
- *Cancelling and splitting a complex task:*
- <task>
- <action>update</action>
- <id>task-3</id>
- <status>cancelled</status>
- </task>
- <task>
- <action>create</action>
- <tasks>[
- {"title": "Review JWT token generation", "priority": "high"},
- {"title": "Review token refresh logic"}
- ]</tasks>
- </task>
-
- **STEP 3 - Finish (before completion):**
- Before calling attempt_completion, ensure ALL tasks are either:
- - \`completed\` - you finished the work
- - \`cancelled\` - no longer needed
-
- If you created tasks, you MUST resolve them all before completing.
-
- ## Key Rules
-
- 1. **Dependencies are enforced**: A task cannot start until its dependencies are completed
- 2. **Circular dependencies are rejected**: task-1 \u2192 task-2 \u2192 task-1 is invalid
- 3. **Completion is blocked**: attempt_completion will fail if tasks remain unresolved
- 4. **List to review**: Use <task><action>list</action></task> to see current task status
- 5. **Tasks are living documents**: Add, split, or cancel tasks as you learn more about the problem
+ taskSystemPrompt = `[Task Management]
+
+ Use the task tool to track progress on complex requests with multiple distinct goals.
+
+ ## When to Use Tasks
+
+ CREATE tasks when the request has **multiple separate deliverables**:
+ - "Fix bug A AND add feature B" \u2192 two tasks
+ - "Investigate auth, payments, AND notifications" \u2192 three tasks
+ - "Implement X, then add tests, then update docs" \u2192 three sequential tasks
+
+ SKIP tasks for single-goal requests, even complex ones:
+ - "How does ranking work?" \u2014 just investigate and answer
+ - "Explain the authentication flow" \u2014 just trace and explain
+ Multiple internal steps (search, read, analyze) for one goal \u2260 multiple tasks.
+
+ ## Granularity
+
+ Tasks = logical units of work, not files or steps.
+ - "Fix 8 similar test files" \u2192 ONE task (same fix repeated)
+ - "Update API + tests + docs" \u2192 THREE tasks (different work types)
+ - Max 3\u20134 tasks. More means you're too granular.
+
+ ## Workflow
+
+ 1. **Plan**: Call task tool with action="create" and a tasks array up front
+ 2. **Execute**: Update status to "in_progress" / "completed" as you work. Add, split, or cancel tasks as you learn more.
+ 3. **Finish**: All tasks must be "completed" or "cancelled" before calling attempt_completion.
+
+ ## Rules
+
+ - Dependencies are enforced: a task cannot start until its dependencies are completed
+ - Circular dependencies are rejected
+ - attempt_completion is blocked while tasks remain unresolved
  `;
- taskGuidancePrompt = `<task_guidance>
- Does this request have MULTIPLE DISTINCT GOALS?
+ taskGuidancePrompt = `Does this request have MULTIPLE DISTINCT GOALS?
  - "Do A AND B AND C" (multiple goals) \u2192 Create tasks for each goal
  - "Investigate/explain/find X" (single goal) \u2192 Skip tasks, just answer directly
-
- Multiple internal steps (search, read, analyze) for ONE goal = NO tasks needed.
- Only create tasks when there are separate deliverables the user is asking for.
-
- If creating tasks, use the task tool with action="create" first.
- </task_guidance>`;
+ Multiple internal steps for ONE goal = NO tasks needed.
+ If creating tasks, use the task tool with action="create" first.`;
  }
  });
 
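The rewritten prompt keeps two rules about the `dependencies` field declared in the schema above: a task cannot start until its dependencies are completed, and circular dependencies are rejected. A minimal sketch of both checks, assuming tasks are `{ id, status, dependencies }` objects; `findDependencyCycle` and `canStart` are hypothetical names, and this is not probe's implementation.

```javascript
// Hypothetical task shape, reusing the schema's field name:
// { id: "task-2", status: "pending", dependencies: ["task-1"] }

// Depth-first search for a dependency cycle. Returns the cycle as a list
// of ids (first id repeated at the end) or null if there is none.
function findDependencyCycle(tasks) {
  const deps = new Map(tasks.map((t) => [t.id, t.dependencies ?? []]));
  const state = new Map(); // id -> "visiting" while on the DFS path, then "done"
  const path = [];

  function visit(id) {
    if (state.get(id) === "done") return null;
    if (state.get(id) === "visiting") {
      // Back edge: the cycle is the path from id's first occurrence.
      return [...path.slice(path.indexOf(id)), id];
    }
    state.set(id, "visiting");
    path.push(id);
    for (const dep of deps.get(id) ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    state.set(id, "done");
    return null;
  }

  for (const task of tasks) {
    const cycle = visit(task.id);
    if (cycle) return cycle;
  }
  return null;
}

// A task may start only when every dependency exists and is completed.
// Treating a cancelled dependency as blocking is an assumption.
function canStart(task, tasks) {
  const byId = new Map(tasks.map((t) => [t.id, t]));
  return (task.dependencies ?? []).every((dep) => byId.get(dep)?.status === "completed");
}

const plan = [
  { id: "task-1", status: "completed" },
  { id: "task-2", status: "pending", dependencies: ["task-1"] },
  { id: "task-3", status: "pending", dependencies: ["task-2"] },
];
console.log(findDependencyCycle(plan), canStart(plan[1], plan), canStart(plan[2], plan)); // null true false
```

Reaching a node that is still marked "visiting" means the search has followed a back edge onto the current path, which is exactly a cycle such as task-1 → task-2 → task-1. The diff does not show how probe handles cancelled dependencies, so that policy in `canStart` is a guess.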
@@ -82742,9 +82638,10 @@ If the solution is clear, you can jump to implementation right away. If not, ask
  - After every significant change, verify the project still builds and passes linting. Do not wait until the end to discover breakage.
 
  # After Implementation
- - Always run the project's tests before considering the task complete. If tests fail, fix them.
- - Run lint and typecheck commands if known for the project.
- - If a build, lint, or test fails, fix the issue before finishing.
+ - Verify the project builds successfully. If it doesn't, fix the build before moving on.
+ - Run lint and typecheck commands if known for the project. Fix any new warnings or errors you introduced.
+ - Add tests for any new or changed functionality. Tests must cover the main path and important edge cases.
+ - Run the project's full test suite. If any tests fail (including pre-existing ones you may have broken), fix them before finishing.
  - When the task is done, respond to the user with a concise summary of what was implemented, what files were changed, and any relevant details. Include links (e.g. pull request URL) so the user has everything they need.
 
  # GitHub Integration
@@ -91957,6 +91854,19 @@ function isContextLimitError(error2) {
  }
  return false;
  }
+ function messageContainsCompletion(msg) {
+ if (Array.isArray(msg.toolInvocations)) {
+ if (msg.toolInvocations.some((t5) => t5.toolName === "attempt_completion")) return true;
+ }
+ if (Array.isArray(msg.tool_calls)) {
+ if (msg.tool_calls.some((t5) => t5.function?.name === "attempt_completion")) return true;
+ }
+ if (Array.isArray(msg.content)) {
+ if (msg.content.some((p5) => p5.type === "tool-call" && p5.toolName === "attempt_completion")) return true;
+ }
+ const text = typeof msg.content === "string" ? msg.content : "";
+ return text.includes("attempt_completion");
+ }
  function identifyMessageSegments(messages) {
  const segments = [];
  let currentSegment = null;
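The new `messageContainsCompletion` helper checks three structured message shapes before falling back to a substring test. The snippet below runs the same logic (mirroring the ESM source of the helper in this diff, with short parameter names) against one hand-written message per shape. The sample messages are assumptions about each format, not captured SDK traffic.

```javascript
// Same logic as the rc274 messageContainsCompletion in this diff.
function messageContainsCompletion(msg) {
  // Vercel AI SDK style: toolInvocations array
  if (Array.isArray(msg.toolInvocations)) {
    if (msg.toolInvocations.some((t) => t.toolName === "attempt_completion")) return true;
  }
  // OpenAI style: tool_calls array with function.name
  if (Array.isArray(msg.tool_calls)) {
    if (msg.tool_calls.some((t) => t.function?.name === "attempt_completion")) return true;
  }
  // Multipart content with tool-call parts
  if (Array.isArray(msg.content)) {
    if (msg.content.some((p) => p.type === "tool-call" && p.toolName === "attempt_completion")) return true;
  }
  // Plain-text fallback: any substring match counts
  const text = typeof msg.content === "string" ? msg.content : "";
  return text.includes("attempt_completion");
}

// Hand-written samples, one per shape (assumed formats).
const samples = {
  vercelToolInvocations: { role: "assistant", content: "", toolInvocations: [{ toolName: "attempt_completion" }] },
  openAiToolCalls: {
    role: "assistant",
    content: null,
    tool_calls: [{ type: "function", function: { name: "attempt_completion", arguments: "{}" } }],
  },
  multipartToolCall: { role: "assistant", content: [{ type: "tool-call", toolName: "attempt_completion", args: {} }] },
  legacyXml: { role: "assistant", content: "<attempt_completion><result>done</result></attempt_completion>" },
  otherTool: { role: "assistant", content: [{ type: "tool-call", toolName: "search", args: {} }] },
  mereMention: { role: "assistant", content: "I will call attempt_completion once the tests pass." },
};

for (const [name, msg] of Object.entries(samples)) {
  console.log(name, messageContainsCompletion(msg));
}
```

Every sample except `otherTool` returns `true`. The `mereMention` case shows the cost of the text fallback: because it is a plain substring match, assistant text that merely names attempt_completion also counts as a completion. The same fallback is what keeps the legacy XML form working.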
@@ -91965,27 +91875,23 @@ function identifyMessageSegments(messages) {
  if (msg.role === "system") {
  continue;
  }
+ if (msg.role === "tool" && currentSegment) {
+ currentSegment.monologueIndices.push(i5);
+ continue;
+ }
  if (msg.role === "user") {
- const content = typeof msg.content === "string" ? msg.content : "";
- const isToolResult = content.includes("<tool_result>");
- if (isToolResult && currentSegment) {
- currentSegment.finalIndex = i5;
+ if (currentSegment) {
  segments.push(currentSegment);
- currentSegment = null;
- } else {
- if (currentSegment) {
- segments.push(currentSegment);
- }
- currentSegment = {
- userIndex: i5,
- monologueIndices: [],
- finalIndex: null
- };
  }
+ currentSegment = {
+ userIndex: i5,
+ monologueIndices: [],
+ finalIndex: null
+ };
  }
  if (msg.role === "assistant" && currentSegment) {
- const content = typeof msg.content === "string" ? msg.content : "";
- if (content.includes("<attempt_completion>") || content.includes("attempt_completion")) {
+ const hasCompletion = messageContainsCompletion(msg);
+ if (hasCompletion) {
  currentSegment.monologueIndices.push(i5);
  currentSegment.finalIndex = i5;
  segments.push(currentSegment);
@@ -110092,8 +109998,7 @@ Instructions:
  - Format as a structured list if multiple items found
  - If nothing relevant is found in this chunk, respond with "No relevant items found in this chunk."
  - Do NOT summarize the code - extract the specific information requested
- - IMPORTANT: When completing, always use the FULL format: <attempt_completion><result>YOUR ANSWER HERE</result></attempt_completion>
- - Do NOT use the shorthand <attempt_complete></attempt_complete> format`;
+ - When done, use the attempt_completion tool with your answer as the result.`;
  try {
  const result = await delegate({
  task,
@@ -110158,7 +110063,7 @@ async function aggregateResults(chunkResults2, aggregation, extractionPrompt, op
  ${stripResultTags(r5.result)}`).join("\n\n");
  const completionNote = `
 
- IMPORTANT: When completing, always use the FULL format: <attempt_completion><result>YOUR ANSWER HERE</result></attempt_completion>`;
+ When done, use the attempt_completion tool with your answer as the result.`;
  const aggregationPrompts = {
  summarize: `Synthesize these analyses into a comprehensive summary. Combine related findings, remove redundancy, and present a coherent overview.
 
@@ -110316,7 +110221,7 @@ Your answer should:
 
  Format your response as a well-structured document that fully answers: "${question}"
 
- IMPORTANT: When completing, use the FULL format: <attempt_completion><result>YOUR ANSWER HERE</result></attempt_completion>`;
+ When done, use the attempt_completion tool with your answer as the result.`;
  try {
  const result = await delegate({
  task: synthesisTask,
@@ -111386,9 +111291,7 @@ Example: <edit><file_path>${file_path}</file_path><symbol>${allMatches[0].qualif
  if (fileTracker) {
  const check = fileTracker.checkSymbolContent(resolvedPath2, symbol15, symbolInfo.code);
  if (!check.ok && check.reason === "stale") {
- return `Error editing ${file_path}: Symbol "${symbol15}" has changed since you last read it. Use extract to re-read the current content, then retry.
-
- Example: <extract><targets>${file_path}#${symbol15}</targets></extract>`;
+ return `Error editing ${file_path}: Symbol "${symbol15}" has changed since you last read it. Use the extract tool with targets="${file_path}#${symbol15}" to re-read the current content, then retry.`;
  }
  }
  const content = await import_fs12.promises.readFile(resolvedPath2, "utf-8");
@@ -111626,9 +111529,7 @@ Parameters:
  }
  if (options.fileTracker && !options.fileTracker.isFileSeen(resolvedPath2)) {
  const displayPath = toRelativePath(resolvedPath2, workspaceRoot);
- return `Error editing ${displayPath}: This file has not been read yet in this session. Use 'extract' to read the file first, then retry your edit. This ensures you are working with the current file content.
-
- Example: <extract><targets>${displayPath}</targets></extract>`;
+ return `Error editing ${displayPath}: This file has not been read yet in this session. Use the extract tool with targets="${displayPath}" to read the file first, then retry your edit.`;
  }
  if (symbol15 !== void 0 && symbol15 !== null) {
  return await handleSymbolEdit({ resolvedPath: resolvedPath2, file_path, symbol: symbol15, new_string, position, debug, cwd, fileTracker: options.fileTracker });
@@ -111648,7 +111549,7 @@ Example: <extract><targets>${displayPath}</targets></extract>`;
  const displayPath = toRelativePath(resolvedPath2, workspaceRoot);
  return `Error editing ${displayPath}: ${staleCheck.message}
 
- Example: <extract><targets>${displayPath}</targets></extract>`;
+ Use the extract tool with targets="${displayPath}" to re-read the file, then retry.`;
  }
  }
  const content = await import_fs12.promises.readFile(resolvedPath2, "utf-8");
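The edit-tool errors above rely on a tracker that knows which files the model has read and whether a symbol changed since then. The sketch below is hypothetical. The method names `isFileSeen` and `checkSymbolContent`, and the `{ ok, reason: "stale" }` result, mirror calls visible in this diff. `markRead`, `guardEdit`, and the snapshot-comparison internals are assumptions. The error strings follow the rc274 wording, which names the extract tool and its `targets` argument instead of embedding an XML example.

```javascript
// Hypothetical read-before-edit tracker. isFileSeen and checkSymbolContent
// mirror calls in the diff; markRead and the snapshot logic are assumptions.
class FileTracker {
  constructor() {
    this.seenFiles = new Set();
    this.symbolSnapshots = new Map(); // "path#symbol" -> code as last read
  }

  // Assumed hook: called when the extract tool returns content.
  markRead(path, symbol, code) {
    this.seenFiles.add(path);
    if (symbol !== undefined) this.symbolSnapshots.set(`${path}#${symbol}`, code);
  }

  isFileSeen(path) {
    return this.seenFiles.has(path);
  }

  // { ok: false, reason: "stale" } when the symbol differs from the snapshot.
  // With no snapshot, this sketch allows the edit (an assumption).
  checkSymbolContent(path, symbol, currentCode) {
    const snapshot = this.symbolSnapshots.get(`${path}#${symbol}`);
    if (snapshot === undefined || snapshot === currentCode) return { ok: true };
    return { ok: false, reason: "stale" };
  }
}

// Hypothetical guard returning rc274-style error text, or null if the
// edit may proceed.
function guardEdit(tracker, filePath, symbol, currentCode) {
  if (!tracker.isFileSeen(filePath)) {
    return `Error editing ${filePath}: This file has not been read yet in this session. Use the extract tool with targets="${filePath}" to read the file first, then retry your edit.`;
  }
  if (symbol !== undefined) {
    const check = tracker.checkSymbolContent(filePath, symbol, currentCode);
    if (!check.ok && check.reason === "stale") {
      return `Error editing ${filePath}: Symbol "${symbol}" has changed since you last read it. Use the extract tool with targets="${filePath}#${symbol}" to re-read the current content, then retry.`;
    }
  }
  return null;
}

const tracker = new FileTracker();
console.log(guardEdit(tracker, "src/a.js", "foo", "function foo() {}")); // not read yet
tracker.markRead("src/a.js", "foo", "function foo() {}");
console.log(guardEdit(tracker, "src/a.js", "foo", "function foo() {}")); // null
console.log(guardEdit(tracker, "src/a.js", "foo", "function foo() { return 1; }")); // stale
```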
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@probelabs/probe",
- "version": "0.6.0-rc272",
+ "version": "0.6.0-rc274",
  "description": "Node.js wrapper for the probe code search tool",
  "main": "src/index.js",
  "module": "src/index.js",
@@ -58,6 +58,28 @@ export function isContextLimitError(error) {
  return false;
  }
 
+ /**
+ * Check if an assistant message contains an attempt_completion tool call.
+ * Supports both native tool calling (toolInvocations/tool_calls) and text content.
+ */
+ function messageContainsCompletion(msg) {
+ // Native tool calling: Vercel AI SDK uses toolInvocations
+ if (Array.isArray(msg.toolInvocations)) {
+ if (msg.toolInvocations.some(t => t.toolName === 'attempt_completion')) return true;
+ }
+ // Native tool calling: OpenAI format uses tool_calls
+ if (Array.isArray(msg.tool_calls)) {
+ if (msg.tool_calls.some(t => t.function?.name === 'attempt_completion')) return true;
+ }
+ // Multipart content (Vercel AI SDK v4+)
+ if (Array.isArray(msg.content)) {
+ if (msg.content.some(p => p.type === 'tool-call' && p.toolName === 'attempt_completion')) return true;
+ }
+ // Text content fallback
+ const text = typeof msg.content === 'string' ? msg.content : '';
+ return text.includes('attempt_completion');
+ }
+
  /**
  * Identify message boundaries in conversation history
  * Structure: <user> -> <internal agentic monologue> -> <final-agent-answer>
@@ -65,7 +87,7 @@ export function isContextLimitError(error) {
  * A "segment" is:
  * - user message (role: 'user')
  * - followed by 0+ assistant messages (internal monologue)
- * - ending with tool_result or attempt_completion (final answer)
+ * - ending with attempt_completion tool call (final answer)
  *
  * @param {Array} messages - Array of message objects with {role, content}
  * @returns {Array} - Array of segments, each containing {userIndex, monologueIndices, finalIndex}
@@ -82,38 +104,33 @@ export function identifyMessageSegments(messages) {
  continue;
  }
 
+ // Tool result message (native tool calling format)
+ if (msg.role === 'tool' && currentSegment) {
+ currentSegment.monologueIndices.push(i);
+ continue;
+ }
+
  // User message starts a new segment
  if (msg.role === 'user') {
- // Check if this is a tool_result (final answer from previous segment)
- const content = typeof msg.content === 'string' ? msg.content : '';
- const isToolResult = content.includes('<tool_result>');
-
- if (isToolResult && currentSegment) {
- // This is the final answer for the current segment
- currentSegment.finalIndex = i;
+ // Save previous segment if it exists
+ if (currentSegment) {
  segments.push(currentSegment);
- currentSegment = null;
- } else {
- // Save previous segment if it exists
- if (currentSegment) {
- segments.push(currentSegment);
- }
-
- // Start new segment
- currentSegment = {
- userIndex: i,
- monologueIndices: [],
- finalIndex: null
- };
  }
+
+ // Start new segment
+ currentSegment = {
+ userIndex: i,
+ monologueIndices: [],
+ finalIndex: null
+ };
  }
 
  // Assistant message is part of monologue
  if (msg.role === 'assistant' && currentSegment) {
- const content = typeof msg.content === 'string' ? msg.content : '';
+ // Check if this contains an attempt_completion tool call (native or XML format)
+ const hasCompletion = messageContainsCompletion(msg);
 
- // Check if this contains attempt_completion (marks end of segment)
- if (content.includes('<attempt_completion>') || content.includes('attempt_completion')) {
+ if (hasCompletion) {
  currentSegment.monologueIndices.push(i);
  currentSegment.finalIndex = i;
  segments.push(currentSegment);
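In rc274, tool results that arrive as `role: "tool"` messages are folded into the current segment's monologue, and every user message now starts a new segment. The rc272 code in this hunk had no branch for `role: "tool"` and instead treated a user message containing `<tool_result>` as the segment's final answer. The sketch below replays the rules visible in the hunk. The tail of the loop lies outside the diff, so three behaviors are assumed and marked in the comments; `identifySegmentsSketch` and `isCompletionCall` are hypothetical names.

```javascript
// From the hunk: system messages are skipped, role "tool" messages join the
// current monologue, every user message opens a new segment, and an
// assistant message with attempt_completion closes the segment.
// ASSUMED (outside the hunk): other assistant messages join the monologue,
// the open segment resets to null after completion, and a trailing open
// segment is pushed at the end.
function identifySegmentsSketch(messages, containsCompletion) {
  const segments = [];
  let current = null;
  messages.forEach((msg, i) => {
    if (msg.role === "system") return;
    if (msg.role === "tool" && current) {
      current.monologueIndices.push(i);
      return;
    }
    if (msg.role === "user") {
      if (current) segments.push(current);
      current = { userIndex: i, monologueIndices: [], finalIndex: null };
      return;
    }
    if (msg.role === "assistant" && current) {
      current.monologueIndices.push(i);
      if (containsCompletion(msg)) {
        current.finalIndex = i;
        segments.push(current);
        current = null;
      }
    }
  });
  if (current) segments.push(current);
  return segments;
}

// Hypothetical stand-in for messageContainsCompletion (multipart only).
const isCompletionCall = (msg) =>
  Array.isArray(msg.content) &&
  msg.content.some((p) => p.type === "tool-call" && p.toolName === "attempt_completion");

const conversation = [
  { role: "system", content: "(system prompt)" },
  { role: "user", content: "How does ranking work?" },
  { role: "assistant", content: [{ type: "tool-call", toolName: "search", args: { query: "ranking" } }] },
  { role: "tool", content: [{ type: "tool-result", toolName: "search", result: "(search output)" }] },
  { role: "assistant", content: [{ type: "tool-call", toolName: "attempt_completion", args: { result: "(answer)" } }] },
  { role: "user", content: "And caching?" },
  { role: "assistant", content: [{ type: "tool-call", toolName: "search", args: { query: "cache" } }] },
  { role: "tool", content: [{ type: "tool-result", toolName: "search", result: "(search output)" }] },
];
console.log(JSON.stringify(identifySegmentsSketch(conversation, isCompletionCall)));
```

The first segment ends at index 4 with the attempt_completion call; the second is still open, with the tool result at index 7 in its monologue.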
@@ -138,7 +155,7 @@ export function identifyMessageSegments(messages) {
  *
  * Strategy:
  * 1. Keep all user messages
- * 2. Keep all final answers (tool_results, attempt_completion)
+ * 2. Keep all final answers (attempt_completion)
  * 3. Remove intermediate monologue messages from completed segments
  * 4. Keep the most recent (active) segment intact
  *
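The four-step strategy in the doc comment can be sketched as a filter over message indices. This is a hypothetical `compactMessages`, not the function the comment documents. It assumes segments shaped like the output of `identifyMessageSegments`, treats a segment as completed when `finalIndex` is not null, and keeps every message that belongs to no segment, such as the system prompt.

```javascript
// Hypothetical compaction pass following the four-step strategy.
// Assumptions: segments look like identifyMessageSegments output, a segment
// is "completed" when finalIndex is not null, and messages outside every
// segment (such as the system prompt) are kept.
function compactMessages(messages, segments) {
  const drop = new Set();
  segments.forEach((seg, n) => {
    // Step 4: keep the most recent (active) segment intact.
    const isMostRecent = n === segments.length - 1;
    if (isMostRecent || seg.finalIndex === null) return;
    // Step 3: drop intermediate monologue from completed segments,
    // Step 2: but keep the final answer (the attempt_completion message).
    for (const i of seg.monologueIndices) {
      if (i !== seg.finalIndex) drop.add(i);
    }
  });
  // Step 1: user messages are never in monologueIndices, so all survive.
  return messages.filter((_, i) => !drop.has(i));
}

const history = [
  { role: "system", content: "(system prompt)" },
  { role: "user", content: "How does ranking work?" },
  { role: "assistant", content: "(search call)" },
  { role: "tool", content: "(search output)" },
  { role: "assistant", content: "(attempt_completion call)" },
  { role: "user", content: "And caching?" },
  { role: "assistant", content: "(search call)" },
  { role: "tool", content: "(search output)" },
];
const historySegments = [
  { userIndex: 1, monologueIndices: [2, 3, 4], finalIndex: 4 },
  { userIndex: 5, monologueIndices: [6, 7], finalIndex: null },
];
console.log(compactMessages(history, historySegments).map((m) => m.role));
```

Because rc274 places `role: "tool"` messages in `monologueIndices`, this sketch drops tool results together with the assistant tool calls that produced them, so no orphaned tool result survives compaction of a completed segment.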
@@ -94,9 +94,10 @@ If the solution is clear, you can jump to implementation right away. If not, ask
  - After every significant change, verify the project still builds and passes linting. Do not wait until the end to discover breakage.
 
  # After Implementation
- - Always run the project's tests before considering the task complete. If tests fail, fix them.
- - Run lint and typecheck commands if known for the project.
- - If a build, lint, or test fails, fix the issue before finishing.
+ - Verify the project builds successfully. If it doesn't, fix the build before moving on.
+ - Run lint and typecheck commands if known for the project. Fix any new warnings or errors you introduced.
+ - Add tests for any new or changed functionality. Tests must cover the main path and important edge cases.
+ - Run the project's full test suite. If any tests fail (including pre-existing ones you may have broken), fix them before finishing.
  - When the task is done, respond to the user with a concise summary of what was implemented, what files were changed, and any relevant details. Include links (e.g. pull request URL) so the user has everything they need.
 
  # GitHub Integration