@exulu/backend 1.53.0 → 1.54.0

# Agentic Retrieval Agent - Functionality Analysis & Improvement Strategy

**Date:** 2026-04-09
**Current File:** `/Users/daniel.claessen/Desktop/Projects/exulu/backend/ee/agentic-retrieval/index.ts`

---

## Executive Summary

The current agentic retrieval agent is **functional** but lacks the **flexibility and strategic depth** demonstrated by the ExuluContext retrieval skill. The skill can execute raw SQL queries with full control over search strategy, aggregations, counts, and filtering, while the current agent is constrained to predefined tool patterns.

**Key Gap:** The agent cannot dynamically craft custom SQL or COUNT/aggregation queries - it is limited to the fixed `search_content` and `search_items_by_name` tools.

---

## Current Capabilities

### ✅ What the Agent Does Well

1. **Multi-step reasoning** - Plans and executes retrieval in multiple steps
2. **Two-phase pattern** - Can search items first, then search within specific items
3. **includeContent optimization** - Knows when to exclude content for efficiency
4. **Search method selection** - Supports hybrid, keyword, and semantic search via `ctx.search()`
5. **Dynamic tool generation** - Creates `get_more_content` and `get_content` tools on the fly
6. **Multi-context search** - Can search across multiple contexts simultaneously
7. **Filtering capabilities** - Can filter by item_ids, item_names, item_external_ids
8. **Reranking support** - Can rerank results with ExuluReranker

### ❌ What the Agent CANNOT Do

Comparing to the ExuluContext retrieval skill capabilities:

1. **No COUNT/aggregation queries**
   - Skill: "How many documents mention FST?" → `SELECT COUNT(DISTINCT source)...`
   - Agent: Can only return chunks, cannot provide counts or statistics

2. **No custom SQL flexibility**
   - Skill: Can craft any SQL query (GROUP BY, AVG, SUM, complex JOINs)
   - Agent: Limited to the `ctx.search()` method only

3. **No direct table exploration**
   - Skill: Can query the item table directly, get column info, sample data
   - Agent: Must use `search_items_by_name`, which is name-based only

4. **No chunk expansion with SQL control**
   - Skill: Can get surrounding chunks with precise BETWEEN queries
   - Agent: Has `get_more_content` but less flexible

5. **No field-specific filtering**
   - Skill: Can filter by ANY custom field (tags, metadata, JSONB fields, dates)
   - Agent: Limited to name, id, external_id filtering

6. **No keyword-only FTS queries**
   - Skill: Can use pure `ts_rank()` queries with complex tsquery syntax (AND, OR, NOT)
   - Agent: Uses `ctx.search()`, which always involves the full search stack

7. **No RRF score visibility**
   - Skill: Can see and control RRF weights (keyword_weight: 2.0, semantic_weight: 1.0)
   - Agent: Uses `ctx.search()` with a fixed RRF implementation

8. **No multi-language FTS control**
   - Skill: Can use `GREATEST(ts_rank(...'german'...), ts_rank(...'english'...))`
   - Agent: Relies on context configuration

9. **No learning/strategy persistence**
   - Skill: Has a STRATEGY.md that learns from searches
   - Agent: No memory of what works/doesn't work

10. **No exact field queries**
    - Skill: Can query `WHERE tags LIKE '%important%'` or `WHERE "createdAt" > NOW() - INTERVAL '7 days'`
    - Agent: Cannot filter by tags, dates, or custom fields

11. **No iterative temp-file workflow**
    - Skill: Can save query results to a temp file, then grep iteratively without loading all content into the LLM context
    - Agent: All results are loaded into tool output immediately, consuming tokens
    - **Impact:** The skill can retrieve 100 chunks, save them to `/tmp/results.txt`, then grep for specific patterns, only loading relevant portions into context
    - **Agent limitation:** Must either load all content (expensive) or use `includeContent: false` and make additional tool calls

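Gap 7 above is worth making concrete. A minimal sketch of weighted Reciprocal Rank Fusion, using the skill's documented weights (keyword 2.0, semantic 1.0); the function name and the smoothing constant `k = 60` are illustrative, not the actual `ctx.search()` implementation:

```typescript
// Weighted RRF: each result list contributes weight / (k + rank) per chunk.
type Ranked = { chunkId: string; rank: number }; // rank is 1-based

function weightedRrf(
  keyword: Ranked[],
  semantic: Ranked[],
  weights = { keyword: 2.0, semantic: 1.0 }, // skill defaults from above
  k = 60, // common RRF smoothing constant (illustrative)
): Map<string, number> {
  const scores = new Map<string, number>();
  const add = (list: Ranked[], weight: number) =>
    list.forEach(({ chunkId, rank }) =>
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + weight / (k + rank)),
    );
  add(keyword, weights.keyword);
  add(semantic, weights.semantic);
  return scores;
}

// 'b' appears in both lists, so it outranks 'a' (keyword-only) and 'c' (semantic-only)
const fused = weightedRrf(
  [{ chunkId: 'a', rank: 1 }, { chunkId: 'b', rank: 2 }],
  [{ chunkId: 'b', rank: 1 }, { chunkId: 'c', rank: 2 }],
);
```

Doubling the keyword weight, as the skill does, biases ties toward exact-term matches - exactly the knob the agent currently cannot reach through `ctx.search()`.
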
---

## Comparison Table

| Capability | Retrieval Skill | Current Agent | Gap Severity |
|------------|----------------|---------------|--------------|
| **Basic Search** | ✅ Full control | ✅ Via `ctx.search()` | Low |
| **Hybrid/Semantic/Keyword** | ✅ Full SQL control | ✅ Via method param | Low |
| **COUNT queries** | ✅ `COUNT(*)`, `COUNT(DISTINCT)` | ❌ Cannot count | **HIGH** |
| **Aggregations** | ✅ `SUM`, `AVG`, `GROUP BY` | ❌ Cannot aggregate | **HIGH** |
| **Field filtering** | ✅ ANY field (tags, dates, JSONB) | ❌ Only name/id/external_id | **MEDIUM** |
| **Direct table queries** | ✅ Can query items table directly | ❌ Must use search | **MEDIUM** |
| **Custom SQL** | ✅ Any SQL query | ❌ Fixed to `ctx.search()` | **HIGH** |
| **RRF control** | ✅ Can adjust weights | ❌ Fixed implementation | Low |
| **Learning** | ✅ STRATEGY.md | ❌ No persistence | **MEDIUM** |
| **Multi-step reasoning** | ✅ Manual steps | ✅ Automatic | Equal |
| **Context expansion** | ✅ Precise SQL BETWEEN | ✅ `get_more_content` tool | Equal |
| **Multi-context** | ✅ Manual UNION | ✅ Automatic | Equal |
| **includeContent opt** | N/A | ✅ Smart optimization | Agent better |
| **Iterative filtering** | ✅ Temp file + grep workflow | ❌ All results in context | **HIGH** |

---

## Root Cause Analysis

### Why Is the Agent Limited?

The agent is constrained by its **tool-based architecture**:

1. **Tools must be predefined** - Cannot dynamically create SQL-based tools
2. **ctx.search() abstraction** - Hides the underlying SQL flexibility
3. **No direct database access** - Tools don't expose `postgresClient` for raw queries
4. **Fixed schema** - Tool input schemas cannot be dynamically generated based on table structure

### What the Skill Does Differently

The skill operates at a **lower level**:

1. **Direct PostgreSQL access** - Uses `psql` commands with full SQL control
2. **Context discovery** - Queries `information_schema` to understand table structures
3. **Dynamic query building** - Crafts SQL based on query type (COUNT, search, aggregation)
4. **Strategy learning** - Stores successful patterns in STRATEGY.md

---

## Proposed Solution Strategy

### Option 1: Add SQL Query Tool (RECOMMENDED)

**Concept:** Give the agent a new tool that can execute safe, read-only SQL queries directly.

**Implementation:**

```typescript
const sqlQueryTool = tool({
  description: `
    Execute a read-only SQL query against ExuluContext tables for advanced retrieval needs.

    Use this tool when you need to:
    - COUNT or aggregate data (e.g., "How many documents mention X?")
    - Filter by custom fields (tags, dates, JSONB fields)
    - Get statistics (AVG, SUM, MIN, MAX)
    - Query item metadata without searching content
    - Execute complex JOINs or GROUP BY queries

    IMPORTANT:
    - Only SELECT queries allowed (no INSERT, UPDATE, DELETE)
    - Must query ExuluContext tables (${contexts.map(c => `${c.id}_items, ${c.id}_chunks`).join(', ')})
    - Use parameterized queries for user input
  `,
  inputSchema: z.object({
    context_id: z.enum(contexts.map(c => c.id)),
    query: z.string().describe("The SQL SELECT query to execute"),
    reasoning: z.string().describe("Explain why you need this SQL query vs using search_content"),
  }),
  execute: async ({ context_id, query, reasoning }) => {
    // Validate that the query is read-only
    if (!/^\s*SELECT/i.test(query)) {
      throw new Error("Only SELECT queries allowed");
    }

    // Validate table names
    const validTables = [`${context_id}_items`, `${context_id}_chunks`];
    // ... more validation

    // Execute with safety limits
    const { db } = await postgresClient();
    const results = await db.raw(query + ' LIMIT 1000');

    return JSON.stringify(results.rows, null, 2);
  }
});
```

**Pros:**
- ✅ Maximum flexibility - agent can do anything the skill can
- ✅ Leverages the LLM's SQL knowledge
- ✅ Minimal code changes

**Cons:**
- ⚠️ SQL injection risk (needs robust validation)
- ⚠️ Agent might generate inefficient queries
- ⚠️ Requires the LLM to know table schemas

**Mitigation:**
- Provide table schema in the tool description
- Use a SQL parser to validate queries
- Set strict LIMIT caps
- Log all SQL queries for audit

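The mitigation list above can be collapsed into a small validation layer. This is a sketch using conservative regex checks rather than a real SQL parser; the function name and exact rules are illustrative, not existing code:

```typescript
// Sketch of the mitigation layer above: SELECT-only check, table whitelist,
// and LIMIT enforcement. Illustrative only - a production version would use
// a real SQL parser instead of regexes.
function validateReadOnlyQuery(
  query: string,
  validTables: string[],
  maxLimit = 1000,
): string {
  const trimmed = query.trim().replace(/;+\s*$/, ''); // drop trailing semicolons
  if (!/^SELECT\b/i.test(trimmed)) {
    throw new Error('Only SELECT queries allowed');
  }
  // Conservative: reject chained statements and write keywords anywhere,
  // even when they appear inside string literals
  if (/;|\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE)\b/i.test(trimmed)) {
    throw new Error('Forbidden keyword or multiple statements');
  }
  // Every table referenced after FROM/JOIN must be whitelisted
  for (const [, table] of trimmed.matchAll(/\b(?:FROM|JOIN)\s+"?(\w+)"?/gi)) {
    if (!validTables.includes(table)) {
      throw new Error(`Table not allowed: ${table}`);
    }
  }
  // Enforce a row cap even when the model omitted its own LIMIT
  return /\bLIMIT\s+\d+\s*$/i.test(trimmed)
    ? trimmed
    : `${trimmed} LIMIT ${maxLimit}`;
}
```

For example, `validateReadOnlyQuery("SELECT * FROM techDoc_items", ["techDoc_items"])` returns the query with ` LIMIT 1000` appended, while a `DELETE` or an unlisted table throws before anything reaches the database.
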
---

### Option 1.5: Add Iterative Filtering with Virtual Bash Environment

**Concept:** Allow the agent to save large result sets to a virtual filesystem, then iteratively grep/filter without loading all content into context. Uses `bash-tool` from the AI SDK to provide grep/bash capabilities.

**Implementation:**

```typescript
import { createBashTool } from 'bash-tool';
import { generateText } from 'ai';

// Create agent with bash tool support
const createAgenticRetrievalWithBashSupport = async ({
  contexts,
  model,
  // ... other params
}) => {
  // Initialize virtual bash environment
  const { tools: bashTools, updateFiles } = await createBashTool({
    files: {}, // Start with an empty virtual filesystem
  });

  // Tool to save search results to the virtual filesystem
  const saveSearchResultsTool = tool({
    description: `
      Execute a search and save results to the virtual filesystem instead of returning them directly.
      This is useful when you expect many results and want to iteratively filter them
      without consuming tokens by loading all content into context.

      After saving, you can use bash tools (grep, awk, head, tail) to find specific patterns.
      The file will be available in the virtual filesystem at /search_results.txt
    `,
    inputSchema: z.object({
      context_ids: z.array(z.enum(contexts.map(c => c.id))),
      query: z.string(),
      searchMethod: z.enum(['keyword', 'semantic', 'hybrid']),
      limit: z.number().max(1000).describe("Can retrieve up to 1000 results"),
      includeContent: z.boolean().default(true),
    }),
    execute: async ({ query, context_ids, searchMethod, limit, includeContent }) => {
      // Execute the search across contexts
      const results = await Promise.all(
        context_ids.map(async (contextId) => {
          const ctx = contexts.find(c => c.id === contextId);
          return await ctx.search({
            query,
            method: searchMethod === 'hybrid' ? 'hybridSearch' : ...,
            limit,
            // ... other params
          });
        })
      );

      const chunks = results.flat();

      // Format results in a greppable format with clear separators
      const formattedContent = chunks.map((chunk, idx) =>
        `### RESULT ${idx + 1} ###\n` +
        `ITEM_NAME: ${chunk.item_name}\n` +
        `ITEM_ID: ${chunk.item_id}\n` +
        `CHUNK_ID: ${chunk.chunk_id}\n` +
        `CHUNK_INDEX: ${chunk.chunk_index}\n` +
        `CONTEXT: ${chunk.context?.id}\n` +
        `SCORE: ${chunk.chunk_hybrid_score || chunk.chunk_fts_rank || chunk.chunk_cosine_distance}\n` +
        `---CONTENT START---\n` +
        `${includeContent ? chunk.chunk_content : '[Content not included - use includeContent: true to load]'}\n` +
        `---CONTENT END---\n\n`
      ).join('');

      // Update the virtual filesystem with the search results
      await updateFiles({
        'search_results.txt': formattedContent,
        'search_metadata.json': JSON.stringify({
          query,
          timestamp: new Date().toISOString(),
          results_count: chunks.length,
          contexts: context_ids,
          method: searchMethod,
        }, null, 2)
      });

      return JSON.stringify({
        success: true,
        results_count: chunks.length,
        message: `Saved ${chunks.length} results to virtual filesystem at /search_results.txt`,
        available_commands: [
          'grep -i "pattern" search_results.txt',
          'grep "ITEM_NAME: specific_name" search_results.txt -A 10',
          'awk \'/RESULT/ {print $3}\' search_results.txt',
          'head -50 search_results.txt',
          'grep "CHUNK_ID:" search_results.txt | wc -l'
        ],
        next_steps: "Use bash tools to grep/filter the results. Example: grep -i 'safety' search_results.txt"
      }, null, 2);
    }
  });

  // Combine all tools
  const allTools = {
    ...searchTools,
    ...searchItemsByNameTool,
    save_search_results: saveSearchResultsTool,
    ...bashTools, // Provides: bash, grep, awk, sed, head, tail, cat, etc.
  };

  return createCustomAgenticRetrievalToolLoopAgent({
    model,
    tools: allTools,
    // ... other config
  });
};
```

**Workflow Example:**
```
1. User: "Find all documents about elevator safety"

2. Agent reasoning: "This is a broad query, I'll save results to the virtual filesystem"
   Tool: save_search_results → Saves 100 chunks to virtual /search_results.txt
   Output: "Saved 100 results, use grep to filter"

3. Agent reasoning: "Now I'll grep for specific safety procedures"
   Tool: bash → grep -i "notfall\|emergency" search_results.txt | head -30
   Output: Shows 30 lines mentioning emergency procedures (minimal tokens)

4. Agent reasoning: "Found a relevant section in ITEM_NAME: Safety_Manual_2024, let me extract that chunk ID"
   Tool: bash → grep -B 3 "Notfallverfahren" search_results.txt | grep "CHUNK_ID:"
   Output: CHUNK_ID: abc-123-def

5. Agent reasoning: "Now I'll get the full content for that specific chunk"
   Tool: get_content_abc-123-def → Returns the full chunk content
```

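Because each saved result uses fixed `KEY: value` header lines, step 4's grep pipeline is a plain prefix match. The following self-contained sketch mimics it in TypeScript; the sample file contents are invented for illustration:

```typescript
// Mimics `grep -B 3 "pattern" | grep "CHUNK_ID:"` against the greppable
// result format defined above. The sample data is invented for illustration.
const resultsFile = [
  '### RESULT 1 ###',
  'ITEM_NAME: Safety_Manual_2024',
  'CHUNK_ID: abc-123-def',
  '---CONTENT START---',
  'Notfallverfahren: Aufzug anhalten und Leitstelle rufen.',
  '---CONTENT END---',
  '### RESULT 2 ###',
  'ITEM_NAME: Maintenance_Log',
  'CHUNK_ID: xyz-789',
  '---CONTENT START---',
  'Routine inspection notes.',
  '---CONTENT END---',
].join('\n');

// Collect CHUNK_ID lines found within `before` lines above each content match
function chunkIdsNear(file: string, pattern: RegExp, before = 3): string[] {
  const lines = file.split('\n');
  const ids: string[] = [];
  lines.forEach((line, i) => {
    if (!pattern.test(line)) return;
    for (let j = Math.max(0, i - before); j < i; j++) {
      const m = lines[j].match(/^CHUNK_ID: (.+)$/);
      if (m) ids.push(m[1]);
    }
  });
  return ids;
}

const ids = chunkIdsNear(resultsFile, /Notfallverfahren/);
// ids → ['abc-123-def']; the agent would then call get_content for that chunk
```

This is the whole point of the fixed separators: the filtering happens mechanically, and only the one identified chunk ever re-enters the LLM context.
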
**Pros:**
- ✅ Token efficiency - can retrieve 1000+ results without loading them into context
- ✅ Iterative refinement - grep, awk, sed for complex filtering
- ✅ Cost savings - only load specific chunks after identifying them via grep
- ✅ Speed - bash operations are instant vs. multiple LLM tool calls
- ✅ Safe - virtual filesystem, no actual file system access
- ✅ Familiar - agents already know how to use bash/grep

**Cons:**
- ⚠️ Requires the bash-tool dependency
- ⚠️ More complex workflow - the agent needs training on when to use this pattern
- ⚠️ Formatting matters - results must be in a greppable format with clear separators

**Token Savings Example:**
- Without this: load 100 chunks × 500 tokens each = 50,000 tokens
- With this: save to file (100 tokens) + grep operations (500 tokens) + load 3 specific chunks (1,500 tokens) = **2,100 tokens (~96% savings)**

---

### Option 2: Add Specialized COUNT/Aggregation Tools

**Concept:** Create dedicated tools for common operations the skill can do.

**Implementation:**

```typescript
const countTool = tool({
  description: "Count items or chunks matching criteria",
  inputSchema: z.object({
    context_ids: z.array(z.enum(contexts.map(c => c.id))),
    count_what: z.enum(['items', 'chunks', 'distinct_items']),
    content_query: z.string().optional().describe("FTS query for chunk content"),
    item_name_contains: z.string().optional(),
    item_tags_contain: z.string().optional(),
    created_after: z.string().optional().describe("ISO date"),
  }),
  execute: async (params) => {
    // Build a COUNT query based on the parameters
    const countQuery = buildCountQuery(params);
    const { db } = await postgresClient();
    const results = await db.raw(countQuery);
    return results.rows[0].count;
  }
});

const aggregateTool = tool({
  description: "Get statistics and aggregations",
  inputSchema: z.object({
    context_ids: z.array(z.enum(contexts.map(c => c.id))),
    group_by: z.enum(['tags', 'created_date', 'item_name']).optional(),
    aggregate: z.enum(['count', 'avg_chunks', 'sum_chunks']),
  }),
  // ... similar pattern
});

const filterItemsTool = tool({
  description: "Query the items table directly with advanced filtering",
  inputSchema: z.object({
    context_id: z.enum(contexts.map(c => c.id)),
    filters: z.array(z.object({
      field: z.string(),
      operator: z.enum(['equals', 'contains', 'greater_than', 'less_than', 'in']),
      value: z.any(),
    })),
    limit: z.number().default(100),
  }),
  // ... builds WHERE clause from filters
});
```

**Pros:**
- ✅ Safer than raw SQL
- ✅ More guided - the agent knows exactly what each tool does
- ✅ Each tool can be optimized independently

**Cons:**
- ❌ Less flexible - new tools are needed for new patterns
- ❌ More code to maintain
- ❌ Still limited to predefined operations

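`buildCountQuery` is referenced but not defined above. A minimal sketch under assumed column names (`name`, `tags`, `"createdAt"` on the items table, `item_id` on the chunks table); it returns SQL text plus bind values instead of interpolating user input, so parameterization stays intact:

```typescript
// Hedged sketch of the buildCountQuery helper referenced in countTool.
// Column names are assumptions about the ExuluContext table schema.
type CountParams = {
  context_ids: string[];
  count_what: 'items' | 'chunks' | 'distinct_items';
  item_name_contains?: string;
  item_tags_contain?: string;
  created_after?: string;
};

function buildCountQuery(p: CountParams): { sql: string; bindings: unknown[] } {
  // One context per query; a UNION across contexts is elided for brevity.
  const contextId = p.context_ids[0];
  // distinct_items counts source items on the chunks table, mirroring the
  // skill's COUNT(DISTINCT source) pattern from the appendix.
  const table =
    p.count_what === 'items' ? `${contextId}_items` : `${contextId}_chunks`;
  const select =
    p.count_what === 'distinct_items' ? 'COUNT(DISTINCT item_id)' : 'COUNT(*)';

  const where: string[] = [];
  const bindings: unknown[] = [];
  if (p.item_name_contains) {
    where.push('name ILIKE ?');
    bindings.push(`%${p.item_name_contains}%`);
  }
  if (p.item_tags_contain) {
    where.push('tags LIKE ?');
    bindings.push(`%${p.item_tags_contain}%`);
  }
  if (p.created_after) {
    where.push('"createdAt" > ?');
    bindings.push(p.created_after);
  }

  const sql =
    `SELECT ${select} AS count FROM ${table}` +
    (where.length ? ` WHERE ${where.join(' AND ')}` : '');
  return { sql, bindings };
}
```

For example, `buildCountQuery({ context_ids: ['techDoc'], count_what: 'items', item_tags_contain: 'important' })` yields `SELECT COUNT(*) AS count FROM techDoc_items WHERE tags LIKE ?` with the binding `%important%`.
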
---

### Option 3: Hybrid Approach (BEST BALANCE)

**Concept:** Combine both approaches with safety layers.

1. **Add specialized tools** for common patterns (COUNT, basic aggregations)
2. **Add a SQL tool** for advanced cases, but with strict validation
3. **Teach the agent** when to use which tool

**Implementation:**

```typescript
const tools = {
  ...existingTools,

  // Safe, common operations
  count_items_or_chunks: countTool,
  aggregate_statistics: aggregateTool,
  query_items_metadata: filterItemsTool,

  // Advanced fallback (requires reasoning)
  advanced_sql_query: {
    ...sqlQueryTool,
    description: `
      ${sqlQueryTool.description}

      USE THIS ONLY WHEN:
      - count_items_or_chunks cannot handle the COUNT query
      - aggregate_statistics cannot handle the aggregation
      - query_items_metadata cannot handle the filtering
      - You need complex JOINs or subqueries

      ALWAYS TRY the specialized tools FIRST!
    `
  }
};
```

**Agent instruction updates:**

```typescript
const updatedInstructions = `
${baseInstructions}

QUERY STRATEGY DECISION TREE:

1. FOR COUNTING QUERIES ("how many...", "count...", "number of..."):
   - Use the count_items_or_chunks tool
   - Specify what to count (items, chunks, distinct_items)
   - Apply filters as needed

2. FOR STATISTICS ("average...", "total...", "breakdown by..."):
   - Use the aggregate_statistics tool
   - Choose the aggregation type and grouping

3. FOR METADATA QUERIES ("list items created after...", "show items tagged..."):
   - Use the query_items_metadata tool
   - Build filter conditions

4. FOR CONTENT SEARCH (default pattern):
   - Use the existing search_content / search_items_by_name tools

5. FOR COMPLEX QUERIES (last resort):
   - Use advanced_sql_query with full justification
   - Must explain why the specialized tools are insufficient
`;
```

**Pros:**
- ✅ Safe common path (80% of cases)
- ✅ Flexible escape hatch (20% of cases)
- ✅ Guided decision-making
- ✅ Easier to maintain safety

**Cons:**
- ⚠️ More tools = more complexity
- ⚠️ Agent must learn when to use which tool

---

## Recommended Implementation Plan

### Phase 1: Add Counting & Aggregation (Week 1)

1. **Implement the `count_items_or_chunks` tool**
   - Support: count items, count chunks, count distinct items by content query
   - Add filtering: by name, tags, dates, custom fields
   - Test with queries like "How many documents mention X?"

2. **Implement the `aggregate_statistics` tool**
   - Support: COUNT, AVG, SUM, MIN, MAX
   - GROUP BY support for: tags, dates, item names
   - Test with queries like "Show me a breakdown of documents by tag"

3. **Update agent instructions**
   - Add the COUNT query pattern
   - Add the STATISTICS query pattern
   - Provide examples

**Success Metrics:**
- Agent can handle "how many..." queries
- Agent can provide breakdowns and statistics
- No SQL injection vulnerabilities

### Phase 2: Add Advanced Filtering (Week 2)

1. **Implement the `query_items_metadata` tool**
   - Support filtering by ANY field with operators
   - Return item metadata without content
   - Support pagination

2. **Implement the `query_by_custom_fields` tool**
   - Allow filtering by tags, JSONB fields, dates
   - Support complex conditions (AND, OR)

3. **Implement the iterative filtering tool**
   - Add a `save_search_results_to_file` tool that saves results without loading them into context
   - Returns the file path where results were saved
   - The agent can then use grep/analysis tools on the file iteratively
   - Only relevant portions are loaded into context

4. **Update dynamic tools**
   - Make `get_more_content` accept chunk index ranges
   - Add `get_item_details` for specific item exploration

**Success Metrics:**
- Agent can find items by date ranges
- Agent can filter by tags and custom fields
- Agent can explore specific items in detail
- Agent can retrieve 100+ chunks efficiently without consuming excessive tokens

### Phase 3: Add SQL Tool (Week 3) - OPTIONAL

1. **Implement the `advanced_sql_query` tool with validation**
   - SQL parser to verify SELECT-only queries
   - Table name whitelist
   - LIMIT enforcement
   - Parameter binding

2. **Add guardrails**
   - Require a reasoning field
   - Log all queries
   - Rate limiting

3. **Update instructions**
   - Teach the agent when the SQL tool is appropriate
   - Provide SQL query templates
   - Show table schemas

**Success Metrics:**
- Agent uses the SQL tool only when necessary
- Zero SQL injection incidents
- Complex queries execute correctly

### Phase 4: Add Learning (Week 4)

1. **Implement strategy persistence**
   - Store successful query patterns
   - Track which tools worked for which queries
   - Learn from failures

2. **Add a feedback loop**
   - Agent asks if results were helpful
   - Updates strategy based on feedback
   - Shares learnings across sessions

3. **Create STRATEGY.json**
   - Store learned patterns
   - Context-specific strategies
   - Query type → tool mapping

**Success Metrics:**
- Agent improves over time
- Fewer failed searches
- Better tool selection

---

## Risk Assessment

| Risk | Severity | Mitigation |
|------|----------|------------|
| SQL injection | 🔴 HIGH | SQL parser, whitelist, parameterization |
| Performance impact | 🟡 MEDIUM | Query timeouts, LIMIT enforcement |
| Wrong tool selection | 🟡 MEDIUM | Clear instructions, examples, learning |
| Maintenance burden | 🟡 MEDIUM | Good documentation, tests |
| Cost increase | 🟢 LOW | Proper tool selection reduces retries |

---

## Appendix: Example Queries the Agent Cannot Handle Now

1. **COUNT queries:**
   ```sql
   -- Skill can do this
   SELECT COUNT(DISTINCT source) FROM vorschriften_chunks
   WHERE fts @@ to_tsquery('german', 'EN-8100');

   -- Agent cannot
   ```

2. **Aggregations:**
   ```sql
   -- Skill can do this
   SELECT
     unnest(string_to_array(tags, ',')) AS tag,
     COUNT(*) AS count
   FROM techDoc_items
   GROUP BY tag
   ORDER BY count DESC;

   -- Agent cannot
   ```

3. **Date filtering:**
   ```sql
   -- Skill can do this
   SELECT * FROM techDoc_items
   WHERE "createdAt" > NOW() - INTERVAL '7 days'
   AND archived = false;

   -- Agent cannot filter by dates
   ```

4. **Tag filtering:**
   ```sql
   -- Skill can do this
   SELECT * FROM vorschriften_items
   WHERE tags LIKE '%important%';

   -- Agent cannot filter by tags
   ```

5. **Statistics:**
   ```sql
   -- Skill can do this
   SELECT
     AVG(chunks_count) AS avg_chunks,
     MAX(chunks_count) AS max_chunks,
     COUNT(*) AS total_items
   FROM techDoc_items;

   -- Agent cannot compute statistics
   ```

---

## Conclusion

The agentic retrieval agent is **functionally good** but **strategically limited** compared to the ExuluContext retrieval skill. The recommended path forward is:

1. **Immediate (Week 1):** Add COUNT and aggregation tools
2. **Short-term (Week 2):** Add advanced filtering tools
3. **Medium-term (Week 3):** Consider a SQL tool with strict validation
4. **Long-term (Week 4):** Add learning/strategy persistence

This approach balances **safety**, **flexibility**, and **maintenance burden** while bringing the agent closer to the skill's capabilities.