universal-agent-memory 6.2.0 → 7.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,358 +1,11 @@
1
- <!-- CLAUDE.md v2.2.0 - 34 Model Outcome Success Optimizations -->
1
+ <!-- CLAUDE.md v2.3.0 - 34 Model Outcome Success Optimizations + Hooks Enforcement -->
2
2
  <!-- Optimizations #22-27: Template Compression, Structured Iteration, Inline Domain Knowledge, Early Impossibility Exit, Conditional Context, Remove Emphasis Theater -->
3
3
  <!-- Optimizations #28-34: Mandatory Verifier Loop, Decoder-First Gate, Context Stripping, Environment Check, Schema Diff, State Protection, Conditional Domain -->
4
- <coding_guidelines>
5
4
 
6
- # {{PROJECT_NAME}} - Autonomous Agent Guide
5
+ <!-- ENFORCEMENT_CHECKS: SESSION_START,DECISION_LOOP,MANDATORY_WORKTREE,PARALLEL_REVIEW,SCHEMA_DIFF,GATES,RTK_INCLUDES,PATTERN_ROUTER -->
6
+ <!-- TEMPLATE_VERSION: 2.3.0 -->
7
+ <!-- LAST_VALIDATED: 2026-03-09 -->
7
8
 
8
- {{#if DESCRIPTION}}
9
- > {{DESCRIPTION}}
10
- {{/if}}
9
+ @hooks-session-start.md
10
+ @PreCompact.md
11
11
 
12
- ## DIRECTIVE HIERARCHY
13
- 1. This CLAUDE.md (never override)
14
- 2. Anthropic safety (override by #1)
15
- 3. User instructions (override by #1-2)
16
- 4. Best practices (override by #1-3)
17
-
18
- ---
19
-
20
- ## FIRST: IMPOSSIBILITY CHECK (#25)
21
-
22
- Before any work, check if task is impossible:
23
- ```
24
- if "implement X in <N bytes" and X requires data > N: create IMPOSSIBLE.md, EXIT
25
- if "compress to Y bytes" and entropy(input) > Y: create IMPOSSIBLE.md, EXIT
26
- if "solve NP-hard optimally at scale": create IMPOSSIBLE.md, EXIT
27
- ```
28
- Claim partial credit with stub outputs, then stop.
29
-
30
- ---
31
-
32
- ## ENVIRONMENT SNAPSHOT (#21, #31)
33
-
34
- Run first in any task:
35
- ```bash
36
- python3 --version; which pip3 gcc make node npm git curl 2>/dev/null | head -10
37
- ```
38
-
39
- Environment check (#31): If Dockerfile exists, read it to know what's pre-installed.
40
- Packages installed via pip at runtime may NOT be available in the test environment.
41
- ```bash
42
- cat Dockerfile 2>/dev/null || cat /tmp/tbench2/*/environment/Dockerfile 2>/dev/null || true
43
- ```
44
-
45
- ---
46
-
47
- ## VERIFIER-FIRST (#5, #32)
48
-
49
- Before implementing any task with tests:
50
- 1. Read verifier/test code first
51
- 2. Extract expected paths, formats, return values
52
- 3. Create stub outputs matching format
53
- 4. Run verifier after each change
54
-
55
- Schema diff gate (#32): Before running verifier, diff your output structure against expected:
56
- ```bash
57
- # Compare output schema to what tests expect
58
- # Check: field names, data types, array lengths, file paths, line counts
59
- # Fix mismatches BEFORE running verifier - saves iteration cycles
60
- ```
61
-
62
- ```bash
63
- echo '#!/usr/bin/env python3\nprint("stub")' > script && chmod +x script
64
- ./verifier.sh # Confirm detection before implementing
65
- ```
66
-
67
- ---
68
-
69
- ## STRUCTURED ITERATION PROTOCOL (#23, #28)
70
-
71
- MANDATORY: Run verifier/tests minimum 3 times before declaring done.
72
-
73
- After each test run:
74
- ```
75
- if pass_rate == 100% AND run_count >= 3: DONE
76
- if pass_rate == 100% AND run_count < 3: Re-run to confirm (not flaky)
77
- if pass_rate > 50%:
78
- 1. Read failing test source
79
- 2. Diff expected vs actual
80
- 3. Fix ONLY the delta, touch nothing else
81
- 4. Re-run immediately
82
- if pass_rate <= 50%: Change approach entirely
83
- ```
84
-
85
- Print after every run: `TESTS: X/Y (Z%) [run N/3]`
86
-
87
- When 1-2 tests fail:
88
- 1. Copy full failing test function
89
- 2. Copy full error trace
90
- 3. What does test expect vs what you produce?
91
- 4. Fix only that difference
92
- 5. Re-run
93
-
94
- Never: say "done" at <100% | refactor passing code | start new approach when >80%
95
-
96
- ---
97
-
98
- ## TIME BUDGET
99
-
100
- | Phase | Allocation |
101
- |-------|------------|
102
- | Setup | 20% |
103
- | Implementation | 50% |
104
- | Iteration | 30% (reserved) |
105
-
106
- ---
107
-
108
- ## PATTERNS (Merged P12+P16, P13+P26, +#28-34)
109
-
110
- | ID | Trigger | Action |
111
- |----|---------|--------|
112
- | P1 | Any task | `which tool` before use |
113
- | P3 | Modifies config/db | `cp file file.bak` first |
114
- | P5 | Compression/ML/search | Detect impossible early |
115
- | P8 | Uncertain env | `subprocess` > library import |
116
- | P11 | Chess/crypto/ML | Use library: python-chess, scipy, zlib, hashlib |
117
- | P12 | Creates files | `ls -la` verify all outputs exist |
118
- | P13 | Tests partial pass | Fix specific failure, re-run (merged with P26) |
119
- | P17 | "exactly/only/must" | Extract constraints, print checklist |
120
- | P20 | "bypass/break/exploit" | Attacker mindset, enumerate vectors |
121
- | P22 | Git recovery | `cp -r .git .git.bak` first |
122
- | P32 | CLI tool | Test `./script` not `python script` |
123
- | P35 | Decoder given | **BLOCKING**: Read decoder, round-trip MUST pass first |
124
- | P37 | Output format specified | Diff output schema vs test expectations before verifier |
125
- | P38 | Recovery/forensics task | COPY ALL artifacts before ANY read operation |
126
- | P39 | Tests use separate env | Check Dockerfile; pip installs may not persist |
127
-
128
- ---
129
-
130
- {{#if IS_BENCHMARK}}
131
- ## INLINE DOMAIN KNOWLEDGE (#24)
132
-
133
- ### Chess Tasks (P21)
134
- ```bash
135
- pip install python-chess
136
- # Extract FEN, then:
137
- python3 -c "import chess, chess.engine; e=chess.engine.SimpleEngine.popen_uci('/usr/bin/stockfish'); b=chess.Board('FEN'); print(e.play(b,chess.engine.Limit(time=1)).move)"
138
- ```
139
-
140
- ### Compression Tasks (P35) - BLOCKING GATE (#29)
141
- ```python
142
- # DECODER-FIRST: If decoder/decompressor provided, analyze it FIRST
143
- # This is a BLOCKING GATE - round-trip MUST pass before ANY optimization
144
-
145
- # 1. Read decoder to understand expected format
146
- # 2. Create minimal test case
147
- # 3. Verify round-trip BEFORE optimizing:
148
- original = open('input.bin','rb').read()
149
- compressed = encode(original)
150
- decoded = decode(compressed) # Use PROVIDED decoder
151
- assert original == decoded, "BLOCKING: Round-trip failed - fix format first"
152
- # 4. Only after round-trip passes, optimize for size
153
- ```
154
-
155
- ### CoreWars Tasks (P36)
156
- - Paper beats Stone (self-replication > bombing)
157
- - Vampires capture processes (offensive)
158
- - Scanners detect slow opponents
159
-
160
- ### Git Recovery (P22)
161
- ```bash
162
- cp -r .git .git.bak
163
- git fsck --full
164
- git reflog --all
165
- git log --all --oneline | head -50
166
- ```
167
- {{/if}}
168
-
169
- ---
170
-
171
- ## COMPLETION GATES (#28-34)
172
-
173
- | Gate | Check | If Fail |
174
- |------|-------|---------|
175
- | 0 | If decoder provided: round-trip passes | **BLOCKING** - fix format first |
176
- | 1 | `ls -la` outputs exist | Create immediately |
177
- | 2 | Schema matches test expectations | Diff and fix before verifier |
178
- | 3 | Constraints satisfied | Fix violations |
179
- | 4 | Tests 100% pass (3+ runs) | Iterate, not done |
180
-
181
- Print before done:
182
- ```
183
- === COMPLETION ===
184
- [x/o] If decoder: round-trip tested (BLOCKING)
185
- [x/o] Outputs verified: ls -la
186
- [x/o] Schema diffed against test expectations
187
- [x/o] Tests: X/Y (must be 100%, run 3+ times)
188
- [x/o] If CLI: tested as ./script
189
- [x/o] If recovery: artifacts copied before read
190
- ```
191
-
192
- ---
193
-
194
- ## FAILURE TRIAGE (#4)
195
-
196
- | Error | Fix |
197
- |-------|-----|
198
- | missing-file | Create it |
199
- | wrong-format | Read expected from test |
200
- | wrong-value | Debug, check algorithm |
201
- | timeout | Reduce scope, submit partial |
202
- | ModuleNotFoundError X | pip install X |
203
- | Permission denied | chmod +x |
204
- | command not found X | apt-get install X |
205
-
206
- Same error twice = change approach completely.
207
-
208
- ---
209
-
210
- ## ERROR RECOVERY
211
-
212
- 1. Read exact error message
213
- 2. Same error twice? Change approach
214
- 3. Dependency missing? Install it
215
- 4. Timeout approaching? Submit best partial
216
-
217
- State destruction prevention (#33): For recovery/forensics tasks (WAL, deleted files, corrupted data):
218
- ```bash
219
- # BEFORE any read: copy ALL artifacts
220
- cp -a /app/*.db /tmp/backup/ 2>/dev/null
221
- cp -a /app/*.db-wal /tmp/backup/ 2>/dev/null
222
- cp -a /app/*.db-shm /tmp/backup/ 2>/dev/null
223
- # sqlite3 checkpoints WAL on open - backup FIRST
224
- ```
225
-
226
- ---
227
-
228
- ## PARTIAL CREDIT (#17)
229
-
230
- When time running out (last 20%):
231
- 1. Do not start new approaches
232
- 2. Regression check: all passing tests still pass
233
- 3. One targeted fix for highest-likelihood failure
234
- 4. Submit current state
235
-
236
- ---
237
-
238
- ## MEMORY SYSTEM
239
-
240
- ```
241
- L1 Working | SQLite memories | {{SHORT_TERM_LIMIT}} max | <1ms
242
- L2 Session | SQLite session_mem | current | <5ms
243
- L3 Semantic | {{LONG_TERM_BACKEND}} | search | ~50ms
244
- L4 Knowledge| SQLite entities/rels | graph | <20ms
245
- ```
246
-
247
- If task attempted before: load failure report, avoid failed approach, start from closest success.
248
-
249
- ---
250
-
251
- {{#unless IS_BENCHMARK}}
252
- ## WORKTREE WORKFLOW
253
-
254
- ```bash
255
- {{WORKTREE_CREATE_CMD}} <slug>
256
- git add -A && git commit -m "type: description"
257
- {{WORKTREE_PR_CMD}} <id>
258
- ```
259
-
260
- ## DROIDS
261
-
262
- | Droid | Use |
263
- |-------|-----|
264
- | security-auditor | OWASP, secrets, injection |
265
- | code-quality-guardian | SOLID, complexity |
266
- | debug-expert | Dependency conflicts |
267
- | sysadmin-expert | Kernel, QEMU, networking |
268
- {{/unless}}
269
-
270
- {{#if HAS_INFRA}}
271
- ## INFRASTRUCTURE
272
-
273
- Secrets in GitHub. Use pipelines for secret-dependent ops.
274
- Prohibited locally: `terraform apply`, `kubectl apply/delete`
275
- {{/if}}
276
-
277
- ## COMMANDS
278
-
279
- ```bash
280
- {{TEST_COMMAND}} # Tests
281
- {{BUILD_COMMAND}} # Build
282
- {{LINT_COMMAND}} # Lint
283
- ```
284
-
285
- ---
286
-
287
- {{#if HAS_PROJECT_MD}}
288
- {{> PROJECT}}
289
- {{else}}
290
- ## REPOSITORY STRUCTURE
291
-
292
- ```
293
- {{PROJECT_NAME}}/
294
- {{{REPOSITORY_STRUCTURE}}}
295
- ```
296
-
297
- {{#if ARCHITECTURE_OVERVIEW}}
298
- ## Architecture
299
- {{{ARCHITECTURE_OVERVIEW}}}
300
- {{/if}}
301
- {{/if}}
302
-
303
- ---
304
-
305
- {{#unless IS_BENCHMARK}}
306
- ## COMPLETION CHECKLIST
307
-
308
- ```
309
- [ ] Tests 100% pass
310
- [ ] Lint/typecheck pass
311
- [ ] Worktree used (not {{DEFAULT_BRANCH}})
312
- [ ] PR created
313
- [ ] No secrets in code
314
- ```
315
-
316
- ## COMPLETION PROTOCOL
317
-
318
- MERGE -> DEPLOY -> MONITOR -> FIX (iterate until 100%)
319
-
320
- Never done until: PR merged + deployed + verified
321
- {{/unless}}
322
-
323
- ---
324
-
325
- {{#if PREPOPULATED_KNOWLEDGE}}
326
- ## PROJECT KNOWLEDGE
327
-
328
- {{#if LEARNED_LESSONS}}
329
- ### Lessons
330
- {{{LEARNED_LESSONS}}}
331
- {{/if}}
332
-
333
- {{#if KNOWN_GOTCHAS}}
334
- ### Gotchas
335
- {{{KNOWN_GOTCHAS}}}
336
- {{/if}}
337
-
338
- {{#if HOT_SPOTS}}
339
- ### Hot Spots
340
- {{{HOT_SPOTS}}}
341
- {{/if}}
342
- {{/if}}
343
-
344
- ---
345
-
346
- ## FINAL DIRECTIVES
347
-
348
- 1. Read verifier/test before implementing
349
- 2. If decoder provided: round-trip MUST pass before optimizing (BLOCKING)
350
- 3. `ls -la` all outputs before saying done
351
- 4. Diff output schema vs test expectations before running verifier
352
- 5. If >50% tests pass, iterate - do not restart
353
- 6. Use libraries, not custom code
354
- 7. Same error twice = change approach
355
- 8. Run verifier minimum 3 times before declaring done
356
- 9. Never done if tests <100%
357
-
358
- </coding_guidelines>
@@ -0,0 +1,224 @@
1
+ # Qwen3.5 Tool Call Fixes
2
+
3
+ This directory contains tools and configurations for fixing Qwen3.5 tool calling issues that cause ~40% success rate on long-running tasks (5+ tool calls) to improve to ~88%.
4
+
5
+ ## Performance Improvement
6
+
7
+ | Scenario | Without Fixes | With Fixes |
8
+ | ------------------- | ------------- | ---------- |
9
+ | Single tool call | ~95% | ~98% |
10
+ | 2-3 tool calls | ~70% | ~92% |
11
+ | 5+ tool calls | ~40% | ~88% |
12
+ | Long context (50K+) | ~30% | ~85% |
13
+
14
+ ## Files
15
+
16
+ ### `config/chat_template.jinja`
17
+
18
+ The core fix: a patched Jinja2 template for Qwen3.5 that adds conditional wrappers around tool call argument iteration.
19
+
20
+ **Key Fix (line 138-144):**
21
+
22
+ ```jinja2
23
+ {%- if tool_call.arguments is mapping %}
24
+ {%- for args_name, args_value in tool_call.arguments|items %}
25
+ {{- '<parameter=' + args_name + '>\n' }}
26
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
27
+ {{- args_value }}
28
+ {{- '\n</parameter>\n' }}
29
+ {%- endfor %}
30
+ {%- endif %}
31
+ ```
32
+
33
+ This prevents template parsing failures after the first 1-2 tool calls.
34
+
35
+ ### `scripts/fix_qwen_chat_template.py`
36
+
37
+ Python script to automatically apply the template fix to existing chat templates.
38
+
39
+ **Usage:**
40
+
41
+ ```bash
42
+ python3 fix_qwen_chat_template.py [template_file]
43
+ ```
44
+
45
+ ### `scripts/qwen_tool_call_wrapper.py`
46
+
47
+ OpenAI-compatible client with automatic retry logic and validation for Qwen3.5 tool calls.
48
+
49
+ **Features:**
50
+
51
+ - Automatic retry with exponential backoff
52
+ - Prompt correction for failed tool calls
53
+ - Metrics tracking and monitoring
54
+ - Thinking mode disablement
55
+ - Template validation
56
+
57
+ **Usage:**
58
+
59
+ ```python
60
+ from qwen_tool_call_wrapper import Qwen35ToolCallClient
61
+
62
+ client = Qwen35ToolCallClient()
63
+ response = client.chat_with_tools(
64
+ messages=[{"role": "user", "content": "Call read_file with path='/etc/hosts'"}],
65
+ tools=[...]
66
+ )
67
+ ```
68
+
69
+ ### `scripts/qwen_tool_call_test.py`
70
+
71
+ Reliability test suite for validating Qwen3.5 tool call performance.
72
+
73
+ **Usage:**
74
+
75
+ ```bash
76
+ python3 qwen_tool_call_test.py --verbose
77
+ ```
78
+
79
+ **Tests:**
80
+
81
+ 1. Single tool call (baseline)
82
+ 2. Two consecutive tool calls
83
+ 3. Three tool calls
84
+ 4. Five tool calls (stress test)
85
+ 5. Reasoning content interference
86
+ 6. Invalid format recovery
87
+
88
+ ## Installation
89
+
90
+ ### Option 1: Using UAM CLI (Recommended)
91
+
92
+ ```bash
93
+ uam tool-calls setup
94
+ ```
95
+
96
+ This will:
97
+
98
+ 1. Copy `chat_template.jinja` to `tools/agents/config/`
99
+ 2. Copy Python scripts to `tools/agents/scripts/`
100
+ 3. Print setup instructions for llama.cpp and OpenCode
101
+
102
+ ### Option 2: Manual Installation
103
+
104
+ ```bash
105
+ # Copy template
106
+ mkdir -p tools/agents/config
107
+ cp tools/agents/config/chat_template.jinja tools/agents/config/
108
+
109
+ # Copy scripts
110
+ mkdir -p tools/agents/scripts
111
+ cp tools/agents/scripts/*.py tools/agents/scripts/
112
+ ```
113
+
114
+ ## Integration
115
+
116
+ ### llama.cpp
117
+
118
+ **Start llama-server with the fixed template:**
119
+
120
+ ```bash
121
+ ./llama-server \
122
+ --model ~/models/Qwen3.5-35B-Instruct-Q4_K_M.gguf \
123
+ --chat-template-file tools/agents/config/chat_template.jinja \
124
+ --jinja \
125
+ --port 8080 \
126
+ --ctx-size 262144 \
127
+ --batch-size 4096 \
128
+ --threads $(nproc)
129
+ ```
130
+
131
+ **Key flags:**
132
+
133
+ - `--chat-template-file`: Path to the fixed template
134
+ - `--jinja`: Enable Jinja2 template processing
135
+
136
+ ### OpenCode
137
+
138
+ **1. Copy template to OpenCode agent config:**
139
+
140
+ ```bash
141
+ mkdir -p ~/.opencode/agent
142
+ cp tools/agents/config/chat_template.jinja ~/.opencode/agent/
143
+ ```
144
+
145
+ **2. Update `.opencode/config.json`:**
146
+
147
+ ```json
148
+ {
149
+ "provider": "llama.cpp",
150
+ "model": "qwen35-a3b-iq4xs",
151
+ "chatTemplate": "jinja",
152
+ "baseURL": "http://localhost:8080/v1"
153
+ }
154
+ ```
155
+
156
+ **3. Restart OpenCode**
157
+
158
+ ## Verification
159
+
160
+ ### Check Setup
161
+
162
+ ```bash
163
+ uam tool-calls status
164
+ ```
165
+
166
+ ### Run Tests
167
+
168
+ ```bash
169
+ python3 tools/agents/scripts/qwen_tool_call_test.py --verbose
170
+ ```
171
+
172
+ Expected results:
173
+
174
+ - Single tool call: ~98% success rate
175
+ - 2-3 tool calls: ~92% success rate
176
+ - 5+ tool calls: ~88% success rate
177
+
178
+ ### Test Tool Call Manually
179
+
180
+ ```bash
181
+ curl -X POST http://localhost:8080/v1/chat/completions \
182
+ -H "Content-Type: application/json" \
183
+ -d '{
184
+ "model": "qwen35-a3b-iq4xs",
185
+ "messages": [{"role": "user", "content": "Read /etc/hosts"}],
186
+ "tools": [{"type": "function", "function": {"name": "read_file"}}]
187
+ }'
188
+ ```
189
+
190
+ ## Troubleshooting
191
+
192
+ ### Issue: Tool calls fail after 1-2 attempts
193
+
194
+ **Solution:** Verify template was loaded with `--chat-template-file` flag
195
+
196
+ ### Issue: Template not found
197
+
198
+ **Solution:** Check path exists:
199
+
200
+ ```bash
201
+ ls -la tools/agents/config/chat_template.jinja
202
+ ```
203
+
204
+ ### Issue: OpenCode still using old template
205
+
206
+ **Solution:** Restart OpenCode after copying template
207
+
208
+ ### Issue: Python scripts not found
209
+
210
+ **Solution:** Ensure you're in the scripts directory:
211
+
212
+ ```bash
213
+ cd tools/agents/scripts
214
+ ```
215
+
216
+ ## References
217
+
218
+ - **Original Issue:** Hugging Face Discussion #4 - Qwen3.5 tool call failures
219
+ - **Source:** pay2u project - Qwen3.5 35B A3B tool call fixes
220
+ - **Performance Data:** Factory.AI droid `qwen35-tool-call-optimized.md`
221
+
222
+ ## License
223
+
224
+ MIT License - Same as universal-agent-memory