@exulu/backend 1.53.1 → 1.54.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/index.cjs +3404 -2389
- package/dist/index.d.cts +66 -4
- package/dist/index.d.ts +66 -4
- package/dist/index.js +4926 -3918
- package/ee/agentic-retrieval/ANALYSIS.md +658 -0
- package/ee/agentic-retrieval/logs/README.md +198 -0
- package/ee/agentic-retrieval/v2.ts +1628 -0
- package/ee/agentic-retrieval/v3/agent-loop.ts +242 -0
- package/ee/agentic-retrieval/v3/classifier.ts +73 -0
- package/ee/agentic-retrieval/v3/context-sampler.ts +70 -0
- package/ee/agentic-retrieval/v3/dynamic-tools.ts +115 -0
- package/ee/agentic-retrieval/v3/index.ts +281 -0
- package/ee/agentic-retrieval/v3/strategies.ts +167 -0
- package/ee/agentic-retrieval/v3/tools.ts +435 -0
- package/ee/agentic-retrieval/v3/trajectory.ts +96 -0
- package/ee/agentic-retrieval/v3/types.ts +59 -0
- package/ee/agentic-retrieval/v4/agent-loop.ts +121 -0
- package/ee/agentic-retrieval/v4/embed-preprocessor.ts +76 -0
- package/ee/agentic-retrieval/v4/index.ts +181 -0
- package/ee/agentic-retrieval/v4/system-prompt.ts +248 -0
- package/ee/agentic-retrieval/v4/tools.ts +241 -0
- package/ee/agentic-retrieval/v4/types.ts +29 -0
- package/ee/chunking/markdown.ts +4 -2
- package/ee/workers.ts +1 -1
- package/package.json +6 -3
@@ -0,0 +1,658 @@

# Agentic Retrieval Agent - Functionality Analysis & Improvement Strategy

**Date:** 2026-04-09
**Current File:** `/Users/daniel.claessen/Desktop/Projects/exulu/backend/ee/agentic-retrieval/index.ts`

---

## Executive Summary

The current agentic retrieval agent is **good**, but it lacks the **flexibility and strategic depth** of the ExuluContext retrieval skill. The skill can execute raw SQL queries with full control over search strategy, aggregations, counts, and filtering. The agent is constrained to predefined tool patterns.

**Key Gap:** The agent cannot dynamically craft custom SQL queries or COUNT/aggregation queries. It is limited to the fixed `search_content` and `search_items_by_name` tools.

---

## Current Capabilities

### ✅ What the Agent Does Well

1. **Multi-step reasoning** - Plans and executes retrieval in multiple steps
2. **Two-phase pattern** - Can search items first, then search within specific items
3. **includeContent optimization** - Knows when to exclude content for efficiency
4. **Search method selection** - Supports hybrid, keyword, and semantic search via `ctx.search()`
5. **Dynamic tool generation** - Creates `get_more_content` and `get_content` tools on the fly
6. **Multi-context search** - Can search across multiple contexts simultaneously
7. **Filtering capabilities** - Can filter by item_ids, item_names, item_external_ids
8. **Reranking support** - Can rerank results with ExuluReranker

### ❌ What the Agent CANNOT Do

Compared with the capabilities of the ExuluContext retrieval skill:

1. **No COUNT/Aggregation queries**
   - Skill: "How many documents mention FST?" → `SELECT COUNT(DISTINCT source)...`
   - Agent: Can only return chunks; cannot provide counts or statistics

2. **No custom SQL flexibility**
   - Skill: Can craft any SQL query (GROUP BY, AVG, SUM, complex JOINs)
   - Agent: Limited to the `ctx.search()` method

3. **No direct table exploration**
   - Skill: Can query the items table directly, get column info, and sample data
   - Agent: Must use `search_items_by_name`, which is name-based only

4. **No chunk expansion with SQL control**
   - Skill: Can get surrounding chunks with precise BETWEEN queries
   - Agent: Has `get_more_content`, but it is less flexible

5. **No field-specific filtering**
   - Skill: Can filter by ANY custom field (tags, metadata, JSONB fields, dates)
   - Agent: Limited to name, id, and external_id filtering

6. **No keyword-only FTS queries**
   - Skill: Can use pure `ts_rank()` queries with complex tsquery syntax (AND, OR, NOT)
   - Agent: Uses `ctx.search()`, which always involves the full search stack

7. **No RRF score visibility**
   - Skill: Can see and control RRF weights (keyword_weight: 2.0, semantic_weight: 1.0)
   - Agent: Uses `ctx.search()` with a fixed RRF implementation

8. **No multi-language FTS control**
   - Skill: Can use `GREATEST(ts_rank(...'german'...), ts_rank(...'english'...))`
   - Agent: Relies on the context configuration

9. **No learning/strategy persistence**
   - Skill: Records what it learns from searches in STRATEGY.md
   - Agent: No memory of what works and what doesn't

10. **No exact field queries**
    - Skill: Can query `WHERE tags LIKE '%important%'` or `WHERE "createdAt" > NOW() - INTERVAL '7 days'`
    - Agent: Cannot filter by tags, dates, or custom fields

11. **No iterative temp file workflow**
    - Skill: Can save query results to a temp file, then grep iteratively without loading all content into the LLM context
    - Agent: All results are loaded into the tool output immediately, consuming tokens
    - **Impact**: The skill can retrieve 100 chunks, save them to `/tmp/results.txt`, then grep for specific patterns, loading only the relevant portions into context
    - **Agent limitation**: Must either load all content (expensive) or use `includeContent: false` and make additional tool calls
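
Item 7 mentions RRF weights. The sketch below shows what weighted reciprocal rank fusion computes. It assumes each retriever returns an ordered list of chunk IDs. The function name, the input shape and the default `k = 60` (a commonly used RRF constant) are illustrative assumptions. This analysis does not show how `ctx.search()` combines scores internally, and the sketch does not describe it.

```typescript
// One ranked result list from a single retriever (e.g. keyword or semantic),
// best match first, plus the weight that retriever's votes should carry.
interface RankedList {
  ids: string[];
  weight: number;
}

interface FusedResult {
  id: string;
  score: number;
}

// Weighted reciprocal rank fusion: score(d) = sum_i weight_i / (k + rank_i(d)).
// Rank is 1-based, and a document missing from a list contributes nothing for it.
// k = 60 is an assumed default, not an Exulu setting.
function weightedRrf(lists: RankedList[], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// keyword_weight: 2.0, semantic_weight: 1.0, as in the skill example
const fused = weightedRrf([
  { ids: ["a", "b", "c"], weight: 2.0 }, // keyword ranking
  { ids: ["c", "a", "d"], weight: 1.0 }, // semantic ranking
]);
console.log(fused.map((r) => r.id)); // [ 'a', 'c', 'b', 'd' ]
```

If both weights are set to 1.0, this reduces to standard, unweighted RRF.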

---

## Comparison Table

| Capability | Retrieval Skill | Current Agent | Gap Severity |
|------------|----------------|---------------|--------------|
| **Basic Search** | ✅ Full control | ✅ Via ctx.search() | Low |
| **Hybrid/Semantic/Keyword** | ✅ Full SQL control | ✅ Via method param | Low |
| **COUNT queries** | ✅ `COUNT(*)`, `COUNT(DISTINCT)` | ❌ Cannot count | **HIGH** |
| **Aggregations** | ✅ `SUM`, `AVG`, `GROUP BY` | ❌ Cannot aggregate | **HIGH** |
| **Field filtering** | ✅ ANY field (tags, dates, JSONB) | ❌ Only name/id/external_id | **MEDIUM** |
| **Direct table queries** | ✅ Can query items table directly | ❌ Must use search | **MEDIUM** |
| **Custom SQL** | ✅ Any SQL query | ❌ Fixed to ctx.search() | **HIGH** |
| **RRF control** | ✅ Can adjust weights | ❌ Fixed implementation | Low |
| **Learning** | ✅ STRATEGY.md | ❌ No persistence | **MEDIUM** |
| **Multi-step reasoning** | ✅ Manual steps | ✅ Automatic | Equal |
| **Context expansion** | ✅ Precise SQL BETWEEN | ✅ get_more_content tool | Equal |
| **Multi-context** | ✅ Manual UNION | ✅ Automatic | Equal |
| **includeContent optimization** | N/A | ✅ Smart optimization | Agent better |
| **Iterative filtering** | ✅ Temp file + grep workflow | ❌ All results in context | **HIGH** |

---

## Root Cause Analysis

### Why is the Agent Limited?

The agent is constrained by its **tool-based architecture**:

1. **Tools must be predefined** - Cannot dynamically create SQL-based tools
2. **ctx.search() abstraction** - Hides the underlying SQL flexibility
3. **No direct database access** - Tools don't expose `postgresClient` for raw queries
4. **Fixed schema** - Tool input schemas cannot be dynamically generated based on table structure

### What the Skill Does Differently

The skill operates at a **lower level**:

1. **Direct PostgreSQL access** - Uses `psql` commands with full SQL control
2. **Context discovery** - Queries `information_schema` to understand table structures
3. **Dynamic query building** - Crafts SQL based on query type (COUNT, search, aggregation)
4. **Strategy learning** - Stores successful patterns in STRATEGY.md
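
The skill's context-discovery step could also help close the "Fixed schema" gap. The schema portion of a tool description can be generated at runtime from `information_schema.columns` instead of being hand-written. The sketch below covers only the rendering step. The row shape uses standard `information_schema.columns` columns. The function name and output format are illustrative assumptions.

```typescript
// Row shape of a query such as:
//   SELECT table_name, column_name, data_type
//   FROM information_schema.columns
//   WHERE table_name IN ('<context>_items', '<context>_chunks')
//   ORDER BY table_name, ordinal_position;
interface ColumnInfo {
  table_name: string;
  column_name: string;
  data_type: string;
}

// Render one line per table, e.g. "docs_items(id uuid, name text)".
// The result is compact enough to embed in a tool description or system prompt.
function describeSchema(columns: ColumnInfo[]): string {
  const byTable = new Map<string, string[]>();
  for (const col of columns) {
    const cols = byTable.get(col.table_name) ?? [];
    cols.push(`${col.column_name} ${col.data_type}`);
    byTable.set(col.table_name, cols);
  }
  return [...byTable.entries()]
    .map(([table, cols]) => `${table}(${cols.join(", ")})`)
    .join("\n");
}
```

The rows come from the live database, so custom fields added to a context appear in the description automatically, with no column lists to maintain by hand.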

---

## Proposed Solution Strategy

### Option 1: Add SQL Query Tool (RECOMMENDED)

**Concept:** Give the agent a new tool that can execute safe, read-only SQL queries directly.

**Implementation:**

```typescript
import { tool } from "ai";
import { z } from "zod";

// `contexts` and `postgresClient` come from the surrounding module.
// z.enum() needs a non-empty tuple type, so cast the mapped ID array.
const contextIds = contexts.map((c) => c.id) as [string, ...string[]];

const sqlQueryTool = tool({
  description: `
Execute a read-only SQL query against ExuluContext tables for advanced retrieval needs.

Use this tool when you need to:
- COUNT or aggregate data (e.g., "How many documents mention X?")
- Filter by custom fields (tags, dates, JSONB fields)
- Get statistics (AVG, SUM, MIN, MAX)
- Query item metadata without searching content
- Execute complex JOINs or GROUP BY queries

IMPORTANT:
- Only SELECT queries allowed (no INSERT, UPDATE, DELETE)
- Must query ExuluContext tables (${contexts.map((c) => `${c.id}_items, ${c.id}_chunks`).join(", ")})
- Use parameterized queries for user input
`,
  inputSchema: z.object({
    context_id: z.enum(contextIds),
    query: z.string().describe("The SQL SELECT query to execute"),
    reasoning: z.string().describe("Explain why you need this SQL query vs using search_content"),
  }),
  execute: async ({ context_id, query, reasoning }) => {
    // Validate query is read-only. Use `\s`, not `\\s`: inside a regex
    // literal, `\\s` matches a literal backslash followed by "s".
    if (!/^\s*SELECT\b/i.test(query)) {
      throw new Error("Only SELECT queries allowed");
    }

    // Validate table names
    const validTables = [`${context_id}_items`, `${context_id}_chunks`];
    // ... more validation

    // Log for audit, including the agent's justification
    console.info("[sql_query_tool]", { context_id, reasoning, query });

    // Execute with safety limits. Wrapping the query (instead of appending
    // " LIMIT 1000") still works if it already has a LIMIT or ends in ";".
    const { db } = await postgresClient();
    const cappedQuery = `SELECT * FROM (${query.trim().replace(/;\s*$/, "")}) AS limited LIMIT 1000`;
    const results = await db.raw(cappedQuery);

    return JSON.stringify(results.rows, null, 2);
  },
});
```

**Pros:**
- ✅ Maximum flexibility - agent can do anything the skill can
- ✅ Leverages the LLM's SQL knowledge
- ✅ Minimal code changes

**Cons:**
- ⚠️ SQL injection risk (needs robust validation)
- ⚠️ Agent might generate inefficient queries
- ⚠️ Requires LLM to know table schemas

**Mitigation:**
- Provide table schema in tool description
- Use SQL parser to validate queries
- Set strict LIMIT caps
- Log all SQL queries for audit
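
A lightweight pre-check can support the parser and LIMIT mitigations by rejecting obvious cases before a query reaches the database. The sketch below is a regex heuristic, not a parser. It does four things:
- strips comments
- allows only a single SELECT/WITH statement
- checks every `FROM`/`JOIN` target against a table whitelist
- wraps the query in an outer `LIMIT`

It is deliberately conservative and rejects some valid queries, for example `EXTRACT(year FROM ...)`, or any literal containing a semicolon or a write keyword. It should sit in front of a database role that only has `SELECT` grants, not replace one. The function name and rules are assumptions.

```typescript
type GuardResult = { ok: true; sql: string } | { ok: false; reason: string };

// Keywords that should never appear in a read-only retrieval query.
const WRITE_KEYWORDS = /\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|copy)\b/i;

function guardReadOnlySql(query: string, allowedTables: string[], maxRows = 1000): GuardResult {
  // Strip comments so they cannot hide keywords or statements from the checks below.
  const sql = query
    .replace(/--[^\n]*/g, " ")
    .replace(/\/\*[\s\S]*?\*\//g, " ")
    .trim()
    .replace(/;\s*$/, "");

  if (sql.includes(";")) return { ok: false, reason: "multiple statements are not allowed" };
  if (!/^(select|with)\b/i.test(sql)) return { ok: false, reason: "query must start with SELECT or WITH" };
  if (WRITE_KEYWORDS.test(sql)) return { ok: false, reason: "write keywords are not allowed" };

  // CTE names defined via `name AS (` count as allowed relations.
  const cteNames = [...sql.matchAll(/\b([a-z_]\w*)\s+as\s*\(/gi)].map((m) => m[1].toLowerCase());
  const allowed = new Set([...allowedTables, ...cteNames].map((t) => t.toLowerCase()));

  // Every FROM/JOIN target (optionally double-quoted) must be whitelisted.
  for (const m of sql.matchAll(/\b(?:from|join)\s+("?)([a-z_]\w*)\1/gi)) {
    if (!allowed.has(m[2].toLowerCase())) return { ok: false, reason: `table not allowed: ${m[2]}` };
  }

  // Cap the row count even if the query has its own LIMIT.
  const limit = Math.max(1, Math.floor(maxRows));
  return { ok: true, sql: `SELECT * FROM (${sql}) AS capped LIMIT ${limit}` };
}
```

Inside the tool's `execute`, a failed check can be returned to the model as an error message so it can rewrite the query.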

---

### Option 1.5: Add Iterative Filtering with Virtual Bash Environment

**Concept:** Allow the agent to save large result sets to a virtual filesystem, then iteratively grep/filter them without loading all content into context. Uses `bash-tool` from AI SDK to provide grep/bash capabilities.

**Implementation:**

```typescript
import { createBashTool } from 'bash-tool';
import { tool } from 'ai';
import { z } from 'zod';

// Create agent with bash tool support
const createAgenticRetrievalWithBashSupport = async ({
  contexts,
  model,
  // ... other params
}) => {
  // Initialize virtual bash environment
  const { tools: bashTools, updateFiles } = await createBashTool({
    files: {}, // Start with empty virtual filesystem
  });

  // Tool to save search results to virtual filesystem
  const saveSearchResultsTool = tool({
    description: `
Execute a search and save results to the virtual filesystem instead of returning them directly.
This is useful when you expect many results and want to iteratively filter them
without consuming tokens by loading all content into context.

After saving, you can use bash tools (grep, awk, head, tail) to find specific patterns.
The file will be available in the virtual filesystem at /search_results.txt
`,
    inputSchema: z.object({
      context_ids: z.array(z.enum(contexts.map(c => c.id) as [string, ...string[]])),
      query: z.string(),
      searchMethod: z.enum(['keyword', 'semantic', 'hybrid']),
      limit: z.number().max(1000).describe("Can retrieve up to 1000 results"),
      includeContent: z.boolean().default(true),
    }),
    execute: async ({ query, context_ids, searchMethod, limit, includeContent }) => {
      // Execute search across contexts
      const results = await Promise.all(
        context_ids.map(async (contextId) => {
          const ctx = contexts.find(c => c.id === contextId);
          if (!ctx) throw new Error(`Unknown context: ${contextId}`);
          return await ctx.search({
            query,
            method: searchMethod === 'hybrid' ? 'hybridSearch' : ...,
            limit,
            // ... other params
          });
        })
      );

      const chunks = results.flat();

      // Format results in a greppable format with clear separators
      const formattedContent = chunks.map((chunk, idx) =>
        `### RESULT ${idx + 1} ###\n` +
        `ITEM_NAME: ${chunk.item_name}\n` +
        `ITEM_ID: ${chunk.item_id}\n` +
        `CHUNK_ID: ${chunk.chunk_id}\n` +
        `CHUNK_INDEX: ${chunk.chunk_index}\n` +
        `CONTEXT: ${chunk.context?.id}\n` +
        // `??` instead of `||` so that a legitimate score of 0 is not skipped
        `SCORE: ${chunk.chunk_hybrid_score ?? chunk.chunk_fts_rank ?? chunk.chunk_cosine_distance}\n` +
        `---CONTENT START---\n` +
        `${includeContent ? chunk.chunk_content : '[Content not included - use includeContent: true to load]'}\n` +
        `---CONTENT END---\n\n`
      ).join('');

      // Update virtual filesystem with search results
      await updateFiles({
        'search_results.txt': formattedContent,
        'search_metadata.json': JSON.stringify({
          query,
          timestamp: new Date().toISOString(),
          results_count: chunks.length,
          contexts: context_ids,
          method: searchMethod,
        }, null, 2)
      });

      return JSON.stringify({
        success: true,
        results_count: chunks.length,
        message: `Saved ${chunks.length} results to virtual filesystem at /search_results.txt`,
        available_commands: [
          'grep -i "pattern" search_results.txt',
          'grep "ITEM_NAME: specific_name" search_results.txt -A 10',
          'awk \'/RESULT/ {print $3}\' search_results.txt',
          'head -50 search_results.txt',
          'grep "CHUNK_ID:" search_results.txt | wc -l'
        ],
        next_steps: "Use bash tools to grep/filter the results. Example: grep -i 'safety' search_results.txt"
      }, null, 2);
    }
  });

  // Combine all tools
  const allTools = {
    ...searchTools,
    ...searchItemsByNameTool,
    save_search_results: saveSearchResultsTool,
    ...bashTools, // Provides: bash, grep, awk, sed, head, tail, cat, etc.
  };

  return createCustomAgenticRetrievalToolLoopAgent({
    model,
    tools: allTools,
    // ... other config
  });
};
```

**Workflow Example:**
```
1. User: "Find all documents about elevator safety"

2. Agent reasoning: "This is a broad query, I'll save the results to the virtual filesystem"
   Tool: save_search_results → Saves 100 chunks to virtual /search_results.txt
   Output: "Saved 100 results, use grep to filter"

3. Agent reasoning: "Now I'll grep for specific safety procedures"
   Tool: bash → grep -i "notfall\|emergency" search_results.txt | head -30
   Output: Shows up to 30 lines mentioning emergency procedures (minimal tokens)

4. Agent reasoning: "Found relevant section in ITEM_NAME: Safety_Manual_2024, let me extract that chunk ID"
   Tool: bash → awk '/^CHUNK_ID:/ {id = $2} /Notfallverfahren/ {print id}' search_results.txt
   Output: abc-123-def

5. Agent reasoning: "Now I'll get the full content for that specific chunk"
   Tool: get_content_abc-123-def → Returns full chunk content
```
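
The grep/awk steps in this workflow depend on the exact block format written by `formattedContent`. That format can be parsed back in TypeScript, which helps when unit-testing it or running the same filtering server-side. The sketch assumes exactly the `### RESULT n ###` / `ITEM_NAME:` / `CHUNK_ID:` / `---CONTENT START---` layout from the Option 1.5 code. The function names are hypothetical.

```typescript
interface SavedResult {
  itemName: string;
  chunkId: string;
  content: string;
}

// Split the file on "### RESULT n ###" headers and read back the fields of each block.
function parseSavedResults(text: string): SavedResult[] {
  return text
    .split(/^### RESULT \d+ ###$/m)
    .filter((block) => block.trim().length > 0)
    .map((block) => ({
      itemName: /^ITEM_NAME: (.*)$/m.exec(block)?.[1] ?? "",
      chunkId: /^CHUNK_ID: (.*)$/m.exec(block)?.[1] ?? "",
      content: /---CONTENT START---\n([\s\S]*?)\n---CONTENT END---/.exec(block)?.[1] ?? "",
    }));
}

// Equivalent of the grep step in the workflow: which chunks mention the pattern?
function findChunkIds(text: string, pattern: RegExp): string[] {
  // Drop the "g" flag so RegExp.test() does not carry lastIndex between blocks.
  const re = new RegExp(pattern.source, pattern.flags.replace("g", ""));
  return parseSavedResults(text)
    .filter((r) => re.test(r.content))
    .map((r) => r.chunkId);
}
```

For example, `findChunkIds(fileText, /notfall|emergency/i)` returns the IDs of the chunks whose content mentions either term. The agent can then pass those IDs to `get_content_*`.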

**Pros:**
- ✅ Token efficiency - can retrieve 1000+ results without loading them into context
- ✅ Iterative refinement - grep, awk, sed for complex filtering
- ✅ Cost savings - only load specific chunks after identifying them via grep
- ✅ Speed - bash operations run locally and return almost instantly, unlike additional search tool calls
- ✅ Safe - virtual filesystem, no actual file system access
- ✅ Familiar - agents already know how to use bash/grep

**Cons:**
- ⚠️ Requires bash-tool dependency
- ⚠️ More complex workflow - agent needs guidance on when to use this pattern
- ⚠️ Formatting matters - results must be in greppable format with clear separators

**Token Savings Example:**
- Without this: Load 100 chunks × 500 tokens each = 50,000 tokens
- With this: Save to file (100 tokens) + grep operations (500 tokens) + load 3 specific chunks (1,500 tokens) = **2,100 tokens (~96% savings)**
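
The same cost model can be written as a small function, which makes it easy to re-run the estimate under other assumptions. The inputs below are the illustrative figures from the example, not measurements, and the field names are hypothetical.

```typescript
interface CostModel {
  resultCount: number;    // chunks returned by the search
  tokensPerChunk: number; // average chunk size in tokens
  saveAckTokens: number;  // tool output confirming the file was saved
  grepTokens: number;     // total output of the grep/awk calls
  chunksLoaded: number;   // chunks finally loaded in full
}

function tokenSavings(m: CostModel): { baseline: number; withFile: number; savings: number } {
  const baseline = m.resultCount * m.tokensPerChunk;
  const withFile = m.saveAckTokens + m.grepTokens + m.chunksLoaded * m.tokensPerChunk;
  return { baseline, withFile, savings: 1 - withFile / baseline };
}

const example = tokenSavings({ resultCount: 100, tokensPerChunk: 500, saveAckTokens: 100, grepTokens: 500, chunksLoaded: 3 });
console.log(example.baseline, example.withFile, (example.savings * 100).toFixed(1)); // 50000 2100 95.8
```

The savings shrink as `chunksLoaded` or `grepTokens` grow. If the agent ends up loading most chunks anyway, the file-based path costs slightly more than loading everything directly.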

---

### Option 2: Add Specialized COUNT/Aggregation Tools

**Concept:** Create dedicated tools for common operations the skill can do.

**Implementation:**

```typescript
const countTool = tool({
  description: "Count items or chunks matching criteria",
  inputSchema: z.object({
    context_ids: z.array(z.enum(contexts.map(c => c.id) as [string, ...string[]])),
    count_what: z.enum(['items', 'chunks', 'distinct_items']),
    content_query: z.string().optional().describe("FTS query for chunks content"),
    item_name_contains: z.string().optional(),
    item_tags_contain: z.string().optional(),
    created_after: z.string().optional().describe("ISO date"),
  }),
  execute: async (params) => {
    const { db } = await postgresClient();
    let total = 0;
    for (const contextId of params.context_ids) {
      // Build a parameterized COUNT query per context (buildCountQuery is still to be written)
      const { sql, bindings } = buildCountQuery(contextId, params);
      const results = await db.raw(sql, bindings);
      // Postgres returns COUNT as bigint, which node-postgres delivers as a string
      total += Number(results.rows[0].count);
    }
    return total;
  }
});

const aggregateTool = tool({
  description: "Get statistics and aggregations",
  inputSchema: z.object({
    context_ids: z.array(z.enum(contexts.map(c => c.id) as [string, ...string[]])),
    group_by: z.enum(['tags', 'created_date', 'item_name']).optional(),
    aggregate: z.enum(['count', 'avg_chunks', 'sum_chunks']),
  }),
  // ... similar pattern
});

const filterItemsTool = tool({
  description: "Query items table directly with advanced filtering",
  inputSchema: z.object({
    context_id: z.enum(contexts.map(c => c.id) as [string, ...string[]]),
    filters: z.array(z.object({
      field: z.string(),
      operator: z.enum(['equals', 'contains', 'greater_than', 'less_than', 'in']),
      value: z.any(),
    })),
    limit: z.number().default(100),
  }),
  // ... builds WHERE clause from filters
});
```
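
For `filterItemsTool`, the key property is that the model picks only field names and operators, while values are always bound as parameters. A minimal sketch follows. It assumes knex-style `?` placeholders, since the code above already calls `db.raw`, and a per-context whitelist of field names. `buildWhere` and the whitelist are hypothetical.

```typescript
type Operator = "equals" | "contains" | "greater_than" | "less_than" | "in";

interface Filter {
  field: string;
  operator: Operator;
  value: unknown;
}

// Build a parameterized WHERE clause. Field names cannot be bound as parameters,
// so they are checked against a whitelist and then double-quoted.
function buildWhere(filters: Filter[], allowedFields: string[]): { clause: string; bindings: unknown[] } {
  const parts: string[] = [];
  const bindings: unknown[] = [];
  for (const { field, operator, value } of filters) {
    if (!allowedFields.includes(field)) throw new Error(`Unknown field: ${field}`);
    const col = `"${field}"`;
    switch (operator) {
      case "equals":
        parts.push(`${col} = ?`);
        bindings.push(value);
        break;
      case "contains":
        // Escape LIKE wildcards so user text matches literally (backslash is Postgres' default LIKE escape).
        parts.push(`${col}::text ILIKE ?`);
        bindings.push(`%${String(value).replace(/[\\%_]/g, "\\$&")}%`);
        break;
      case "greater_than":
        parts.push(`${col} > ?`);
        bindings.push(value);
        break;
      case "less_than":
        parts.push(`${col} < ?`);
        bindings.push(value);
        break;
      case "in": {
        const values = Array.isArray(value) ? value : [value];
        if (values.length === 0) {
          parts.push("FALSE"); // "IN ()" is invalid SQL; an empty set matches nothing
        } else {
          parts.push(`${col} IN (${values.map(() => "?").join(", ")})`);
          bindings.push(...values);
        }
        break;
      }
    }
  }
  return { clause: parts.length > 0 ? `WHERE ${parts.join(" AND ")}` : "", bindings };
}
```

The caller would append `clause` to a `SELECT` on the validated `<context>_items` table and pass `bindings` to `db.raw(sql, bindings)`.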

**Pros:**
- ✅ Safer than raw SQL
- ✅ More guided - agent knows exactly what each tool does
- ✅ Can optimize each tool independently

**Cons:**
- ❌ Less flexible - need to add new tools for new patterns
- ❌ More code to maintain
- ❌ Still limited to predefined operations

---

### Option 3: Hybrid Approach (BEST BALANCE)

**Concept:** Combine both approaches with safety layers.

1. **Add specialized tools** for common patterns (COUNT, basic aggregations)
2. **Add SQL tool** for advanced cases, but with strict validation
3. **Teach the agent** when to use which tool

**Implementation:**

```typescript
const tools = {
  ...existingTools,

  // Safe, common operations
  count_items_or_chunks: countTool,
  aggregate_statistics: aggregateTool,
  query_items_metadata: filterItemsTool,

  // Advanced fallback (requires reasoning)
  advanced_sql_query: {
    ...sqlQueryTool,
    description: `
${sqlQueryTool.description}

USE THIS ONLY WHEN:
- count_items_or_chunks cannot handle the COUNT query
- aggregate_statistics cannot handle the aggregation
- query_items_metadata cannot handle the filtering
- You need complex JOINs or subqueries

ALWAYS TRY the specialized tools FIRST!
`
  }
};
```

**Agent instruction updates:**

```typescript
const updatedInstructions = `
${baseInstructions}

QUERY STRATEGY DECISION TREE:

1. FOR COUNTING QUERIES ("how many...", "count...", "number of..."):
   - Use count_items_or_chunks tool
   - Specify what to count (items, chunks, distinct_items)
   - Apply filters as needed

2. FOR STATISTICS ("average...", "total...", "breakdown by..."):
   - Use aggregate_statistics tool
   - Choose aggregation type and grouping

3. FOR METADATA QUERIES ("list items created after...", "show items tagged..."):
   - Use query_items_metadata tool
   - Build filter conditions

4. FOR CONTENT SEARCH (default pattern):
   - Use existing search_content / search_items_by_name tools

5. FOR COMPLEX QUERIES (last resort):
   - Use advanced_sql_query with full justification
   - Must explain why specialized tools insufficient
`;
```
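
The decision tree is written for the model, but its rules can also be encoded deterministically. One use is logging which tool the tree would have suggested and comparing it with the tool the agent actually picked, which bears on the "Wrong tool selection" risk. The rough keyword-based sketch below takes its patterns from the example phrasings in the instructions; those patterns are an assumption. `advanced_sql_query` is left out because it is meant as a last resort the agent chooses itself.

```typescript
type ToolName =
  | "count_items_or_chunks"
  | "aggregate_statistics"
  | "query_items_metadata"
  | "search_content";

// First matching rule wins, mirroring the order of the decision tree.
const RULES: Array<[RegExp, ToolName]> = [
  [/\b(how many|count|number of)\b/i, "count_items_or_chunks"],
  [/\b(average|total|breakdown|statistics)\b/i, "aggregate_statistics"],
  [/\b(created (after|before)|tagged|list items)\b/i, "query_items_metadata"],
];

function suggestTool(question: string): ToolName {
  for (const [pattern, tool] of RULES) {
    if (pattern.test(question)) return tool;
  }
  return "search_content"; // default content-search path (rule 4)
}

console.log(suggestTool("How many documents mention FST?")); // count_items_or_chunks
```

Keyword rules like these will misfire on some phrasings. That makes them better suited to evaluation and logging than to overriding the agent's choice.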

**Pros:**
- ✅ Safe common path (an estimated 80% of cases)
- ✅ Flexible escape hatch (the remaining ~20%)
- ✅ Guided decision-making
- ✅ Easier to maintain safety

**Cons:**
- ⚠️ More tools = more complexity
- ⚠️ Agent must learn when to use which tool

---

## Recommended Implementation Plan

### Phase 1: Add Counting & Aggregation (Week 1)

1. **Implement `count_items_or_chunks` tool**
   - Support: count items, count chunks, count distinct items by content query
   - Add filtering: by name, tags, dates, custom fields
   - Test with queries like "How many documents mention X?"

2. **Implement `aggregate_statistics` tool**
   - Support: COUNT, AVG, SUM, MIN, MAX
   - GROUP BY support for: tags, dates, item names
   - Test with queries like "Show me a breakdown of documents by tag"

3. **Update agent instructions**
   - Add COUNT query pattern
   - Add STATISTICS query pattern
   - Provide examples
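
Below is a sketch of the `buildCountQuery` helper that Option 2 leaves undefined. It handles one context at a time, so the tool can sum the per-context counts. It makes these schema assumptions, all of which need checking against the real schema:
- chunks reference their item through a `source` column and carry a `tsvector` column named `fts`, as in the Appendix SQL
- the items table has `name`, `tags` and `"createdAt"` columns

It uses `plainto_tsquery` rather than `to_tsquery`, so arbitrary user text cannot cause a tsquery syntax error.

```typescript
type CountWhat = "items" | "chunks" | "distinct_items";

// Mirrors the countTool input schema from Option 2 (minus context_ids), plus an FTS language.
interface CountParams {
  count_what: CountWhat;
  content_query?: string;
  item_name_contains?: string;
  item_tags_contain?: string;
  created_after?: string; // ISO date
  language?: string; // text search config, e.g. "german"
}

function buildCountQuery(contextId: string, p: CountParams): { sql: string; bindings: unknown[] } {
  // Table names cannot be bound as parameters, so validate the context ID strictly.
  if (!/^[a-z_][a-z0-9_]*$/i.test(contextId)) throw new Error(`Invalid context id: ${contextId}`);
  const items = `${contextId}_items`;
  const chunks = `${contextId}_chunks`;
  const where: string[] = [];
  const bindings: unknown[] = [];

  if (p.item_name_contains) {
    where.push("i.name ILIKE ?");
    bindings.push(`%${p.item_name_contains}%`);
  }
  if (p.item_tags_contain) {
    where.push("i.tags ILIKE ?");
    bindings.push(`%${p.item_tags_contain}%`);
  }
  if (p.created_after) {
    where.push(`i."createdAt" > ?`);
    bindings.push(p.created_after);
  }

  const fts = "c.fts @@ plainto_tsquery(?::regconfig, ?)";
  const ftsBindings = p.content_query ? [p.language ?? "simple", p.content_query] : [];
  let sql: string;

  if (p.count_what === "items") {
    // Count items, optionally requiring at least one matching chunk.
    if (p.content_query) {
      where.push(`EXISTS (SELECT 1 FROM ${chunks} c WHERE c.source = i.id AND ${fts})`);
      bindings.push(...ftsBindings);
    }
    sql = `SELECT COUNT(*) AS count FROM ${items} i`;
  } else {
    if (p.content_query) {
      where.push(fts);
      bindings.push(...ftsBindings);
    }
    const target = p.count_what === "chunks" ? "COUNT(*)" : "COUNT(DISTINCT c.source)";
    sql = `SELECT ${target} AS count FROM ${chunks} c JOIN ${items} i ON i.id = c.source`;
  }

  if (where.length > 0) sql += ` WHERE ${where.join(" AND ")}`;
  return { sql, bindings };
}
```

With knex, run it as `db.raw(sql, bindings)` and convert with `Number(rows[0].count)`. The conversion is needed because node-postgres returns `COUNT` (a `bigint`) as a string.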

**Success Metrics:**
- Agent can handle "how many..." queries
- Agent can provide breakdowns and statistics
- No SQL injection vulnerabilities

### Phase 2: Add Advanced Filtering (Week 2)

1. **Implement `query_items_metadata` tool**
   - Support filtering by ANY field with operators
   - Return item metadata without content
   - Support pagination

2. **Implement `query_by_custom_fields` tool**
   - Allow filtering by tags, JSONB fields, dates
   - Support complex conditions (AND, OR)

3. **Implement iterative filtering tool**
   - Add `save_search_results_to_file` tool that saves results without loading them into context
   - Returns the file path where results were saved
   - Agent can then use grep/analysis tools on the file iteratively
   - Only loads relevant portions into context

4. **Update dynamic tools**
   - Make `get_more_content` accept chunk index ranges
   - Add `get_item_details` for specific item exploration
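
Letting `get_more_content` accept chunk index ranges could reuse the skill's `BETWEEN` query, built from a validated range. The sketch assumes the chunks table has `source`, `chunk_index` and `content` columns; these names are hypothetical, modelled on the fields used elsewhere in this document. It caps the span so a single call cannot pull an entire document.

```typescript
interface ChunkRangeQuery {
  sql: string;
  bindings: [string, number, number];
}

function buildChunkRangeQuery(
  contextId: string,
  itemId: string,
  start: number,
  end: number,
  maxSpan = 20,
): ChunkRangeQuery {
  if (!/^[a-z_][a-z0-9_]*$/i.test(contextId)) throw new Error(`Invalid context id: ${contextId}`);
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
    throw new Error(`Invalid chunk range: ${start}..${end}`);
  }
  // BETWEEN is inclusive on both ends, so at most maxSpan chunks are returned.
  const cappedEnd = Math.min(end, start + maxSpan - 1);
  return {
    sql: `SELECT id, chunk_index, content FROM ${contextId}_chunks WHERE source = ? AND chunk_index BETWEEN ? AND ? ORDER BY chunk_index`,
    bindings: [itemId, start, cappedEnd],
  };
}
```

For example, a request for chunk 12 plus two neighbours on each side becomes `buildChunkRangeQuery(contextId, itemId, 10, 14)`.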

**Success Metrics:**
- Agent can find items by date ranges
- Agent can filter by tags and custom fields
- Agent can explore specific items in detail
- Agent can retrieve 100+ chunks efficiently without consuming excessive tokens

### Phase 3: Add SQL Tool (Week 3) - OPTIONAL

1. **Implement `advanced_sql_query` tool with validation**
   - SQL parser to verify SELECT-only
   - Table name whitelist
   - LIMIT enforcement
   - Parameter binding

2. **Add guardrails**
   - Require reasoning field
   - Log all queries
   - Rate limiting

3. **Update instructions**
   - Teach agent when SQL tool is appropriate
   - Provide SQL query templates
   - Show table schemas
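
The three guardrails in step 2 fit in a small per-session object. The sketch below uses an in-memory sliding-window rate limit and an in-memory audit log. A production version would likely persist the log and share limits across instances. The class name is hypothetical, and so is the `now` parameter, which exists to make the class testable.

```typescript
interface AuditEntry {
  at: number; // epoch milliseconds
  contextId: string;
  query: string;
  reasoning: string;
  allowed: boolean;
  reason?: string;
}

class SqlGuardrails {
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private calls: number[] = [];
  readonly audit: AuditEntry[] = [];

  constructor(maxCalls: number, windowMs: number) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
  }

  // Decide whether a SQL call may run, and record the decision either way.
  check(contextId: string, query: string, reasoning: string, now: number = Date.now()): { allowed: boolean; reason?: string } {
    // Sliding window: only calls within the last windowMs count toward the limit.
    this.calls = this.calls.filter((t) => now - t < this.windowMs);

    let reason: string | undefined;
    if (reasoning.trim() === "") reason = "reasoning is required";
    else if (this.calls.length >= this.maxCalls) reason = "rate limit exceeded";

    const allowed = reason === undefined;
    if (allowed) this.calls.push(now);
    this.audit.push({ at: now, contextId, query, reasoning, allowed, reason });
    return { allowed, reason };
  }
}
```

The tool's `execute` would call `check(context_id, query, reasoning)` before running the query. If the call is refused, it returns `reason` to the model.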

**Success Metrics:**
- Agent uses SQL tool only when necessary
- Zero SQL injection incidents
- Complex queries execute correctly

### Phase 4: Add Learning (Week 4)

1. **Implement strategy persistence**
   - Store successful query patterns
   - Track which tools worked for which queries
   - Learn from failures

2. **Add feedback loop**
   - Agent asks if results were helpful
   - Updates strategy based on feedback
   - Shares learnings across sessions

3. **Create STRATEGY.json**
   - Store learned patterns
   - Context-specific strategies
   - Query type → tool mapping
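
One possible shape for `STRATEGY.json` is a map from query type to per-tool success counts. The sketch below updates those counts immutably. It then picks the tool with the best smoothed success rate using Laplace's rule of succession, (successes + 1) / (attempts + 2), so that a single lucky result does not dominate. The schema, the function names and the smoothing choice are assumptions, not part of the plan above.

```typescript
type ToolStats = { attempts: number; successes: number };

// Hypothetical STRATEGY.json shape: query type → tool name → outcome counts.
type Strategy = Record<string, Record<string, ToolStats>>;

// Return a new strategy with one more recorded outcome (the input is not mutated).
function recordOutcome(strategy: Strategy, queryType: string, tool: string, success: boolean): Strategy {
  const byTool: Record<string, ToolStats> = strategy[queryType] ?? {};
  const stats: ToolStats = byTool[tool] ?? { attempts: 0, successes: 0 };
  return {
    ...strategy,
    [queryType]: {
      ...byTool,
      [tool]: { attempts: stats.attempts + 1, successes: stats.successes + (success ? 1 : 0) },
    },
  };
}

// Pick the tool with the highest smoothed success rate, falling back when nothing is recorded.
function bestTool(strategy: Strategy, queryType: string, fallback: string): string {
  const byTool: Record<string, ToolStats> = strategy[queryType] ?? {};
  let best = fallback;
  let bestRate = -1;
  for (const [tool, { attempts, successes }] of Object.entries(byTool)) {
    const rate = (successes + 1) / (attempts + 2);
    if (rate > bestRate) {
      best = tool;
      bestRate = rate;
    }
  }
  return best;
}
```

To persist the strategy, write it with `writeFileSync("STRATEGY.json", JSON.stringify(strategy, null, 2))` and read it back with `JSON.parse(readFileSync("STRATEGY.json", "utf8"))`, both from `node:fs`.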

**Success Metrics:**
- Agent improves over time
- Fewer failed searches
- Better tool selection

---

## Risk Assessment

| Risk | Severity | Mitigation |
|------|----------|------------|
| SQL Injection | 🔴 HIGH | SQL parser, whitelist, parameterization |
| Performance impact | 🟡 MEDIUM | Query timeouts, LIMIT enforcement |
| Wrong tool selection | 🟡 MEDIUM | Clear instructions, examples, learning |
| Maintenance burden | 🟡 MEDIUM | Good documentation, tests |
| Cost increase | 🟢 LOW | Proper tool selection reduces retries |

---

## Appendix: Example Queries the Agent Cannot Handle Now

1. **COUNT queries:**
   ```sql
   -- Skill can do this
   SELECT COUNT(DISTINCT source) FROM vorschriften_chunks
   WHERE fts @@ to_tsquery('german', 'EN-8100');

   -- Agent cannot
   ```

2. **Aggregations:**
   ```sql
   -- Skill can do this
   SELECT
     unnest(string_to_array(tags, ',')) as tag,
     COUNT(*) as count
   FROM techDoc_items
   GROUP BY tag
   ORDER BY count DESC;

   -- Agent cannot
   ```

3. **Date filtering:**
   ```sql
   -- Skill can do this
   SELECT * FROM techDoc_items
   WHERE "createdAt" > NOW() - INTERVAL '7 days'
     AND archived = false;

   -- Agent cannot filter by dates
   ```

4. **Tag filtering:**
   ```sql
   -- Skill can do this
   SELECT * FROM vorschriften_items
   WHERE tags LIKE '%important%';

   -- Agent cannot filter by tags
   ```

5. **Statistics:**
   ```sql
   -- Skill can do this
   SELECT
     AVG(chunks_count) as avg_chunks,
     MAX(chunks_count) as max_chunks,
     COUNT(*) as total_items
   FROM techDoc_items;

   -- Agent cannot compute statistics
   ```

---

## Conclusion

The agentic retrieval agent is **functionally good** but **strategically limited** compared to the ExuluContext retrieval skill. The recommended path forward is:

1. **Immediate (Week 1):** Add COUNT and aggregation tools
2. **Short-term (Week 2):** Add advanced filtering tools
3. **Medium-term (Week 3):** Consider SQL tool with strict validation
4. **Long-term (Week 4):** Add learning/strategy persistence

This approach balances **safety**, **flexibility**, and **maintenance burden** while bringing the agent closer to the skill's capabilities.