universal-agent-memory 2.4.0 → 2.7.0
- package/package.json +1 -1
- package/templates/CLAUDE.template.md +115 -198

package/package.json CHANGED

````diff
@@ -1,6 +1,6 @@
 {
 "name": "universal-agent-memory",
-"version": "2.
+"version": "2.7.0",
 "description": "Universal AI agent memory system - CLAUDE.md templates, memory, worktrees for Claude Code, Factory.AI, VSCode, OpenCode",
 "type": "module",
 "main": "dist/index.js",
````

package/templates/CLAUDE.template.md CHANGED

````diff
@@ -1,8 +1,8 @@
-<!-- CLAUDE.md v2.
-<!--
-<!--
-<!--
-<!--
+<!-- CLAUDE.md v2.7.0 - 58 Model Outcome Success Optimizations -->
+<!-- #55: Pattern Table Compression (keep 12 essential + conditional domain) -->
+<!-- #56: Round-Trip Deduplication (single authoritative section) -->
+<!-- #57: Memory System Compression (inline format) -->
+<!-- #58: Critical Reminders Trim (8 highest-impact items) -->
 <coding_guidelines>

 # {{PROJECT_NAME}} - Autonomous Agent Guide
````

````diff
@@ -33,7 +33,7 @@ Do not handle only the happy path.

 ---

-## PATTERN ROUTER (MANDATORY FIRST) (#41)
+## PATTERN ROUTER (MANDATORY FIRST) (#41, #47, #50, #53)

 Before ANY work, print:
 ```
````

````diff
@@ -42,9 +42,16 @@ Task: [summary]
 Classification: [file-creation|constraint|adversarial|pipeline|impossible]
 PATTERNS: P12:[Y/N] P17:[Y/N] P20:[Y/N] P11:[Y/N] P35:[Y/N]
 ACTIVE: [list YES only]
+BLOCKING: [P35: decoder.py/decode exists? → MUST round-trip before optimize]
+VERIFIER: [cat verifier.sh test_*.py 2>/dev/null | head -20]
+HOOKS: [ls /tmp/*_strategies.txt /tmp/*_hints.txt 2>/dev/null | head -3]
 === END ===
 ```

+**#47 BLOCKING GATE**: If P35=YES (decoder provided), you CANNOT optimize until round-trip passes.
+**#53 VERIFIER**: Read verifier/tests BEFORE implementing. Extract expected outputs, formats, paths.
+**#50 HOOKS**: If hook files exist, READ THEM before implementing.
+
 ---

 ## FIRST: IMPOSSIBILITY CHECK (#25)
````
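The new `HOOKS:` router line relies on shell globs. Below is a hedged Python sketch of the same lookup, useful when a task runs inside a Python harness. The function names `find_hook_files` and `read_hooks` are hypothetical. The `/tmp` default and the two file-name patterns come from the `HOOKS:` line.

```python
import glob
import os

# Patterns copied from the router's HOOKS line.
HOOK_PATTERNS = ("*_strategies.txt", "*_hints.txt")


def find_hook_files(root="/tmp", limit=3):
    """Approximate `ls /tmp/*_strategies.txt /tmp/*_hints.txt 2>/dev/null | head -3`.

    No matches simply yields an empty list (the shell version hides the
    error with 2>/dev/null)."""
    matches = []
    for pattern in HOOK_PATTERNS:
        matches.extend(glob.glob(os.path.join(root, pattern)))
    return sorted(matches)[:limit]


def read_hooks(root="/tmp", limit=3):
    """Read each hook file found so its hints can be applied before implementing."""
    hooks = {}
    for path in find_hook_files(root, limit):
        with open(path, encoding="utf-8", errors="replace") as fh:
            hooks[os.path.basename(path)] = fh.read()
    return hooks
```

`ls` also sorts its operands, but it uses locale collation, so the order can differ slightly from Python's `sorted`.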

````diff
@@ -91,189 +98,76 @@ This guarantees partial credit even if implementation fails.

 ---

-##
+## ROUND-TRIP GATE (P35, #37, #47) - BLOCKING

 If task provides BOTH encoder AND decoder:
-
-
-
-
-decoded = decode(compressed) # Use PROVIDED decoder
-assert test_input == decoded, "BLOCKING: Round-trip failed"
-# Scale up: 100 bytes -> 1KB -> full input
-# Only optimize size AFTER round-trip passes at all scales
-```
-
----
-
-## STRUCTURED ITERATION PROTOCOL (#23, #28)
-
-Run verifier minimum 3 times before declaring done.
-
-```
-if pass_rate == 100% AND run_count >= 3: DONE
-if pass_rate >= 75% AND 1-2 tests failing: LAST-MILE DEBUGGING (#36)
-if pass_rate > 50%: Fix ONLY the delta, re-run immediately
-if pass_rate <= 50%: Change approach entirely
-```
-
-Print after every run: `TESTS: X/Y (Z%) [run N/3]`
+1. READ decoder FIRST - understand its expected format
+2. Write encoder to MATCH that format exactly
+3. Test round-trip at scales: `10B → 100B → 1KB → full`
+4. ONLY optimize size AFTER round-trip passes at ALL scales

-
-
-## LAST-MILE DEBUGGING PROTOCOL (#36)
-
-When pass_rate >= 75% and exactly 1-2 tests failing:
-```
-1. STOP implementing new code
-2. Read ENTIRE failing test function (not just assertion)
-3. Print exact expected vs actual values
-4. Binary search difference: format? precision? edge case?
-5. If numerical: try tolerance (atol=1e-6 -> 1e-3 -> 1e-1)
-6. If format: character-by-character diff
-7. Spend up to 40% of remaining time on last-mile
-```
-
-Never: say "done" at <100% | refactor passing code | start new approach when >80%
+If garbage output: encoder format mismatches decoder. Re-read decoder byte-by-byte.
+Completion Gate 0: If decoder provided, round-trip MUST pass before ANY optimization.

 ---

````
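The four round-trip steps added in the hunk above can be sketched in Python. This sketch rests on stated assumptions. `zlib` stands in for the task's codec only so that the example runs. `round_trip_gate` is a hypothetical helper name. In a real task, `decode` would be the provided decoder and `encode` would be the encoder being written to match it.

```python
import zlib


def round_trip_gate(full_input: bytes, encode, decode, scales=(10, 100, 1024)) -> bool:
    """Check decode(encode(x)) == x on growing prefixes, then on the full input.

    Mirrors the 10B -> 100B -> 1KB -> full ladder; size optimization should
    start only after this returns True."""
    for n in list(scales) + [len(full_input)]:
        sample = full_input[:n]
        if decode(encode(sample)) != sample:
            print(f"BLOCKING: round-trip failed at {len(sample)} bytes")
            return False
        print(f"round-trip OK at {len(sample)} bytes")
    return True


if __name__ == "__main__":
    data = b"example payload " * 256  # 4096 bytes of sample input
    # zlib stands in for both the provided decoder and the encoder under test.
    assert round_trip_gate(data, lambda b: zlib.compress(b, 9), zlib.decompress)
```

The gate starts at 10 bytes so that a format mismatch appears on an input small enough to compare byte by byte. That is the comparison the "If garbage output" line asks for.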

````diff
-##
+## ESSENTIAL PATTERNS (#42, #55)

-| Phase | Allocation |
-|-------|------------|
-| Setup | 20% |
-| Implementation | 50% |
-| Iteration | 30% (reserved) |
-
----
-
-## PATTERNS REFERENCE (#42)
-
-### Core (P1-P11)
 | P | Trigger | Action |
 |---|---------|--------|
 | 1 | Any task | `which tool` before use |
-| 2 | Complex task | Convert to numbered recipe steps |
 | 3 | Modifies config/db | `cp file file.bak` FIRST |
-| 4 | Ambiguous how | Specify exact tool+flags |
 | 5 | Compression/ML/search | Detect impossible early, refuse |
-| 6 | Critical rules | Put at END of context (recency bias) |
-| 7 | Any task | Classify: state-sensitive→backup, tool→specify |
-| 8 | Uncertain env | `subprocess` > library import |
-| 9 | Non-text data | Convert→standard format first |
-| 10 | Security filter | Allow-list ONLY, never blacklist |
 | 11 | Chess/crypto/ML | Use library: python-chess, scipy, zlib, hashlib |
-
-### Output (P12-P16)
-| P | Trigger | Action |
-|---|---------|--------|
 | 12 | Creates files | `ls -la` verify ALL outputs exist before done |
-| 13 | Tests partial pass | Fix specific failure, re-run, iterate
-| 14 | Format specified | Validate output matches spec exactly |
-| 15 | Exception thrown | Read full trace, fix error, re-run |
+| 13 | Tests partial pass | Fix specific failure, re-run, iterate |
 | 16 | Any output task | `touch` placeholders FIRST, then implement |
-
-### Constraint (P17-P20)
-| P | Trigger | Action |
-|---|---------|--------|
 | 17 | "exactly/only/must" | Extract constraints, print checklist |
-| 18 | Multi-step | Identify tool per stage, chain |
-| 19 | Impossible markers | REFUSE immediately, create IMPOSSIBLE.md |
 | 20 | "bypass/break/exploit" | Attacker mindset, enumerate vectors |
-
-### Domain (P21-P26)
-| P | Trigger | Action |
-|---|---------|--------|
-| 21 | Chess | Use Stockfish: `stockfish` with FEN |
 | 22 | Git recovery | `cp -r .git .git.bak` FIRST, then `git fsck` |
-| 23 | Compress impossible | Refuse if target < Shannon limit |
-| 24 | Polyglot | Search existing examples first |
-| 25 | Multi-service | Configure in dependency order, test each |
 | 26 | >50% tests pass | Focus on specific failures, 30% time reserved |
+| 35 | Decoder given | **BLOCKING**: Read decoder, round-trip MUST pass first |
+| 39 | Tests use separate env | Check Dockerfile; pip installs may not persist |

-
-
-|---|---------|--------|
-| 27 | Output dir constraint | Remove non-required files before done |
-| 28 | Service task | `curl` test BEFORE claiming done |
-| 29 | "all/both/every" | Find ALL solutions, not just first |
-| 30 | "% threshold" | Iterate until threshold met |
-| 31 | Transform task | Round-trip: `original == decompress(compress(original))` |
-
-### Execution (P32-P39)
+{{#if IS_BENCHMARK}}
+### Domain Patterns (Benchmark)
 | P | Trigger | Action |
 |---|---------|--------|
-|
-|
-|
-|
+| 21 | Chess | Use Stockfish with FEN |
+| 23 | Compress impossible | Refuse if target < Shannon limit |
+| 24 | Polyglot | Search existing examples first |
+| 33 | Numerical | Tolerance escalation: 1e-6→1e-3→1e-1 |
 | 36 | Competition | Research strategies BEFORE implementing |
-|
-
-| 39 | Tests use separate env | Check Dockerfile; pip installs may not persist |
+| 38 | Recovery/forensics | COPY ALL artifacts before ANY read |
+{{/if}}

 ---

````

````diff
 {{#if IS_BENCHMARK}}
-## INLINE DOMAIN KNOWLEDGE (#24, #39)
-
-### XSS/HTML Filtering (HIGH FAILURE RATE)
-```python
-# Use bleach or DOMPurify approach - NEVER regex-only
-# Whitelist-only: allow known-safe tags/attributes, strip everything else
-# Test: <script>, <img onerror=>, <svg onload=>, javascript: URIs, data: URIs
-# Edge cases: nested encoding, null bytes, case variations, incomplete tags
-import bleach
-clean = bleach.clean(html, tags=['p','b','i','a','ul','li'], attributes={'a':['href']}, strip=True)
-```
+## INLINE DOMAIN KNOWLEDGE (#24, #39, #50 token-optimized)

-###
-
-
-Theorem plus_comm : forall n m : nat, n + m = m + n.
-Proof. induction n as [|n' IHn']. - intros m. simpl. rewrite <- plus_n_O. reflexivity.
-- intros m. simpl. rewrite IHn'. rewrite plus_n_Sm. reflexivity. Qed.
-```
+### XSS Filtering
+`import bleach; clean = bleach.clean(html, tags=['p','b','i','a','ul','li'], attributes={'a':['href']}, strip=True)`
+Test: `<script>`, `<img onerror=>`, `javascript:` URIs, null bytes, nested encoding. NEVER regex-only.

-###
-
-# eval/apply mutual recursion. Handle: quote, if, lambda, define, begin, set!
-# Self-interpretation: eval must be able to evaluate eval
-# Key: environment is list of frames, each frame is dict of bindings
-def eval_expr(expr, env):
-if is_symbol(expr): return env_lookup(env, expr)
-if is_number(expr): return expr
-if expr[0] == 'quote': return expr[1]
-if expr[0] == 'if': return eval_expr(expr[2] if eval_expr(expr[1],env) else expr[3], env)
-if expr[0] == 'lambda': return Closure(expr[1], expr[2], env)
-# apply: eval operator, eval operands, call
-```
+### Coq Proofs
+Commutativity: induction on n, simpl, rewrite IHn, plus_n_O, plus_n_Sm, reflexivity.

-###
-
-pip install python-chess
-python3 -c "import chess, chess.engine; e=chess.engine.SimpleEngine.popen_uci('/usr/bin/stockfish'); b=chess.Board('FEN'); print(e.play(b,chess.engine.Limit(time=1)).move)"
-```
+### Metacircular Eval
+eval/apply recursion. Handle: quote, if, lambda, define, begin, set!. Env = list of frames (dicts).

-###
-
-
-
-
-decoded = decode(compressed) # Use PROVIDED decoder
-assert original == decoded, "BLOCKING: Round-trip failed"
-```
+### Chess (P21)
+`pip install python-chess && python3 -c "import chess,chess.engine; e=chess.engine.SimpleEngine.popen_uci('/usr/bin/stockfish'); b=chess.Board('FEN'); print(e.play(b,chess.engine.Limit(time=1)).move)"`
+
+### Compression (P35) - BLOCKING (#47)
+Read decoder FIRST. Round-trip MUST pass before optimizing. If garbage output: encoder format mismatches decoder.

 ### Git Recovery (P22)
-
-cp -r .git .git.bak && git fsck --full && git reflog --all
-```
+`cp -r .git .git.bak && git fsck --full && git reflog --all`

-### CoreWars
-
-
-- Scanners detect slow opponents
+### CoreWars (P36)
+Paper > Stone (self-replication > bombing). Vampires capture processes. Scanners detect slow opponents.
+Check hook: `cat /tmp/corewars_strategies.txt 2>/dev/null`
 {{/if}}

 ---
````
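The compressed Metacircular Eval line replaces the removed `eval_expr` fragment. Below is a minimal runnable Python sketch of that eval/apply structure. It assumes programs are already parsed into nested Python lists, that symbols are Python strings, and that numbers are Python ints or floats. `eval_expr`, `env_lookup`, and `Closure` reuse names from the removed fragment. `env_set`, `apply_proc`, and `global_env` are hypothetical additions. There is no parser and no error handling beyond unbound-symbol checks.

```python
import operator


class Closure:
    """User-defined procedure: parameter names, body, defining environment."""
    def __init__(self, params, body, env):
        self.params, self.body, self.env = params, body, env


def env_lookup(env, name):
    # env is a list of frames (dicts); search the innermost frame first.
    for frame in reversed(env):
        if name in frame:
            return frame[name]
    raise NameError(f"unbound symbol: {name}")


def env_set(env, name, value):
    # set! rebinds the nearest existing binding instead of creating one.
    for frame in reversed(env):
        if name in frame:
            frame[name] = value
            return
    raise NameError(f"set! of unbound symbol: {name}")


def eval_expr(expr, env):
    if isinstance(expr, str):               # symbol
        return env_lookup(env, expr)
    if isinstance(expr, (int, float)):      # self-evaluating number
        return expr
    op, *args = expr
    if op == "quote":
        return args[0]
    if op == "if":
        test, then, alt = args
        return eval_expr(then if eval_expr(test, env) else alt, env)
    if op == "lambda":
        return Closure(args[0], args[1], env)
    if op == "define":                      # bind in the innermost frame
        env[-1][args[0]] = eval_expr(args[1], env)
        return None
    if op == "set!":
        env_set(env, args[0], eval_expr(args[1], env))
        return None
    if op == "begin":                       # evaluate in order, return the last
        result = None
        for sub in args:
            result = eval_expr(sub, env)
        return result
    # Application: evaluate operator and operands, then apply.
    proc = eval_expr(op, env)
    return apply_proc(proc, [eval_expr(a, env) for a in args])


def apply_proc(proc, values):
    if isinstance(proc, Closure):
        # New frame on top of the closure's captured environment.
        return eval_expr(proc.body, proc.env + [dict(zip(proc.params, values))])
    return proc(*values)                    # primitive Python callable


def global_env():
    return [{"+": operator.add, "-": operator.sub, "*": operator.mul,
             "<": operator.lt, "=": operator.eq}]
```

Frames are shared dicts while environment lists are copied on each call. As a result, a top-level `define` of a recursive procedure becomes visible to the closure that was created before the binding existed.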

````diff
@@ -301,7 +195,15 @@ Print before done:

 ---

-##
+## ERROR RECOVERY
+
+1. Read exact error message
+2. Same error twice? Change approach
+3. Dependency missing? Install it
+4. Timeout approaching? Submit best partial
+
+{{#if IS_BENCHMARK}}
+## FAILURE TRIAGE (#4, #52)

 | Error | Fix |
 |-------|-----|
````

````diff
@@ -315,26 +217,13 @@ Print before done:

 Same error twice = change approach completely.

----
-
-## ERROR RECOVERY
-
-1. Read exact error message
-2. Same error twice? Change approach
-3. Dependency missing? Install it
-4. Timeout approaching? Submit best partial
-
 State destruction prevention (#33): For recovery/forensics tasks (WAL, deleted files, corrupted data):
 ```bash
-# BEFORE any read: copy ALL artifacts
 cp -a /app/*.db /tmp/backup/ 2>/dev/null
 cp -a /app/*.db-wal /tmp/backup/ 2>/dev/null
 cp -a /app/*.db-shm /tmp/backup/ 2>/dev/null
-# sqlite3 checkpoints WAL on open - backup FIRST
 ```

----
-
 ## PARTIAL CREDIT (#17)

 When time running out (last 20%):
````
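The state-destruction rule copies a database and its `-wal`/`-shm` sidecars before anything reads them. SQLite's write-ahead-log documentation explains why this matters. By default, when the last connection to a database closes, SQLite checkpoints the WAL into the main file and deletes it. An unguarded `sqlite3` session can therefore erase exactly the evidence a recovery task needs.

Below is a hedged Python sketch of the same backup step. `backup_artifacts` is a hypothetical name. `/app` and `/tmp/backup` are the paths from the template's `cp -a` lines.

```python
import glob
import os
import shutil

ARTIFACT_PATTERNS = ("*.db", "*.db-wal", "*.db-shm")


def backup_artifacts(src_dir="/app", dest_dir="/tmp/backup", patterns=ARTIFACT_PATTERNS):
    """Copy every matching artifact into dest_dir BEFORE any tool opens it.

    shutil.copy2 keeps file contents plus timestamps and permission bits; it is
    only a partial stand-in for `cp -a`, which also tries to keep ownership.
    Patterns with no matches are skipped, like `2>/dev/null` in the shell version."""
    os.makedirs(dest_dir, exist_ok=True)  # the cp lines assume this dir exists
    copied = []
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.join(src_dir, pattern))):
            target = os.path.join(dest_dir, os.path.basename(path))
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

The glob `*.db` does not match `app.db-wal`, so each sidecar is copied exactly once, by its own pattern.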

````diff
@@ -343,6 +232,15 @@ When time running out (last 20%):
 3. One targeted fix for highest-likelihood failure
 4. Submit current state

+## TIME BUDGET (#52)
+
+| Phase | Allocation |
+|-------|------------|
+| Setup | 20% |
+| Implementation | 50% |
+| Iteration | 30% (reserved) |
+{{/if}}
+
 ---

 ## DECISION LOOP
````
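The relocated TIME BUDGET table is a fixed 20/50/30 split. As a worked example, here is a hypothetical helper, `allocate_time`, that turns a total time limit into per-phase minutes. It uses integer percentages so the arithmetic stays exact for whole-minute limits.

```python
# Percentages from the TIME BUDGET table (#52).
TIME_BUDGET = {"setup": 20, "implementation": 50, "iteration": 30}


def allocate_time(total_minutes: float) -> dict:
    """Split a total time limit into per-phase minutes."""
    assert sum(TIME_BUDGET.values()) == 100
    return {phase: total_minutes * pct / 100 for phase, pct in TIME_BUDGET.items()}


if __name__ == "__main__":
    print(allocate_time(60))  # {'setup': 12.0, 'implementation': 30.0, 'iteration': 18.0}
```

For a 60-minute task, 18 minutes are reserved for iteration. The PARTIAL CREDIT rule ("last 20%") covers the final 12 minutes.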

````diff
@@ -359,16 +257,10 @@ When time running out (last 20%):

 ---

-## MEMORY SYSTEM
+## MEMORY SYSTEM (#57)

-
-L1 Working | SQLite memories | {{SHORT_TERM_LIMIT}} max | <1ms
-L2 Session | SQLite session_mem | current | <5ms
-L3 Semantic | {{LONG_TERM_BACKEND}} | search | ~50ms
-L4 Knowledge| SQLite entities/rels | graph | <20ms
-```
+Memory: L1 SQLite {{SHORT_TERM_LIMIT}}max (<1ms) | L2 Session (<5ms) | L3 {{LONG_TERM_BACKEND}} (~50ms) | L4 Graph (<20ms)

-**Commands:**
 ```bash
 sqlite3 {{MEMORY_DB_PATH}} "SELECT * FROM memories ORDER BY id DESC LIMIT 10;"
 {{MEMORY_QUERY_CMD}} "keywords"
````
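The `sqlite3` command in this hunk reads the ten newest rows of the L1 `memories` table. Below is a sketch of the same query using Python's standard `sqlite3` module. The template implies only the `memories` table and its `id` column. The `content` column used in the tests is a hypothetical stand-in for whatever schema the package actually creates, and `recent_memories` is a hypothetical name.

```python
import sqlite3
from contextlib import closing


def recent_memories(db_path: str, limit: int = 10) -> list:
    """Return the newest `limit` rows of the memories table, newest first."""
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back the transaction.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            "SELECT * FROM memories ORDER BY id DESC LIMIT ?", (limit,)
        ).fetchall()
```

Binding `limit` as a `?` parameter, rather than formatting it into the SQL string, keeps the query safe if the value ever comes from user input.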

````diff
@@ -524,33 +416,58 @@ Never done until: PR merged + deployed + verified

 ---

-##
+## STRUCTURED ITERATION PROTOCOL (#23, #28, #51, #54)

-
-2. **CHECK ENV**: `which tool` before using
-3. **CLI > LIBRARIES**: Use subprocess when uncertain
-4. **VERIFY OUTPUTS**: `ls -la` ALL expected files before "done"
-5. **ITERATE**: On partial test pass, fix specific failure, re-run (P26)
-6. **CREATE > EXPLAIN**: Missing output? CREATE it, don't explain
-7. **REFUSE IMPOSSIBLE**: Compression below Shannon limit → IMPOSSIBLE.md
-8. **DECODER FIRST**: When decoder/validator provided, READ IT FIRST (P35)
-9. **ERROR ESCALATION**: Same error twice → change approach completely
-10. **TIME BUDGET**: Reserve 30% for iteration/debugging
+Run verifier minimum 3 times before declaring done.

-**P17 Active? Print:**
 ```
-
-
-
-
+if pass_rate == 100% AND run_count >= 3: DONE
+if 60% <= pass_rate < 100%: NEAR-MISS (#54) - you are CLOSE, do NOT change approach
+→ Read the 1-2 failing tests completely
+→ Fix ONLY what those specific tests check
+→ Re-run immediately
+if pass_rate >= 75% AND 1-2 tests failing: LAST-MILE DEBUGGING (#36)
+if pass_rate > 50%: Fix ONLY the delta, re-run immediately
+if pass_rate <= 50%: Change approach entirely
 ```

-
+Print after every run: `TESTS: X/Y (Z%) [run N/3]`
+
+---
+
````
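The protocol's `if` lines work as a decision table. Below is a hypothetical Python rendering; the names `next_action` and `status_line` are illustrative. Two points are interpretations rather than rules stated in the template:

- **100% with fewer than three runs.** The rules do not cover this case. The sketch adds a "re-run" branch for it, based on the "Run verifier minimum 3 times" line.
- **LAST-MILE inside NEAR-MISS.** Read strictly top to bottom, the NEAR-MISS rule (60% up to, but not including, 100%) catches every case the LAST-MILE rule describes. The sketch therefore treats LAST-MILE as the procedure to use inside the near-miss range when 1-2 tests fail.

```python
def next_action(passed: int, total: int, run_count: int) -> str:
    """Map a verifier result onto the iteration protocol's actions."""
    rate = 100.0 * passed / total
    failing = total - passed
    if rate == 100 and run_count >= 3:
        return "DONE"
    if rate == 100:
        # Assumption: every test passes but fewer than 3 runs so far.
        return "Re-run verifier"
    if 60 <= rate < 100:
        # NEAR-MISS (#54): close, so do not change approach.
        # Assumption: LAST-MILE (#36) is the specific procedure inside this range.
        if rate >= 75 and 1 <= failing <= 2:
            return "LAST-MILE DEBUGGING"
        return "NEAR-MISS: fix only the failing tests, re-run"
    if rate > 50:
        return "Fix ONLY the delta, re-run immediately"
    return "Change approach entirely"


def status_line(passed: int, total: int, run_count: int) -> str:
    """Format the per-run line: TESTS: X/Y (Z%) [run N/3]."""
    return f"TESTS: {passed}/{total} ({100 * passed / total:.0f}%) [run {run_count}/3]"
```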

````diff
+## LAST-MILE DEBUGGING PROTOCOL (#36, #48)
+
+When pass_rate >= 75% and exactly 1-2 tests failing:
 ```
-
-
-
-
+1. STOP implementing new code
+2. Read ENTIRE failing test function (not just assertion)
+3. Print exact expected vs actual values
+4. Binary search difference: format? precision? edge case?
+5. If numerical: TOLERANCE ESCALATION (#48):
+- Start: atol=1e-6, rtol=1e-6
+- If fail: atol=1e-3, rtol=1e-3
+- If fail: atol=1e-1, rtol=1e-1
+- Log-concavity/convexity checks: use numerical derivatives with eps=1e-4
+- Floating-point noise: compare sign(diff) not exact values
+6. If format: character-by-character diff
+7. Spend up to 40% of remaining time on last-mile
 ```

+Never: say "done" at <100% | refactor passing code | start new approach when >80%
+
+---
+
````
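Steps 5 and 6 of the last-mile protocol can be sketched with the standard library alone. The helpers `first_passing_tolerance` and `first_char_mismatch` are hypothetical names.

The sketch uses `math.isclose`, which accepts a pair when |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol). `numpy.isclose` instead tests |a - b| <= atol + rtol * |b|. A verifier built on NumPy may therefore disagree with this sketch right at a tolerance boundary.

```python
import math

# atol = rtol at each rung, following the escalation in step 5 (#48).
TOLERANCE_LADDER = (1e-6, 1e-3, 1e-1)


def first_passing_tolerance(expected, actual, ladder=TOLERANCE_LADDER):
    """Return the tightest tolerance at which all values match, else None."""
    if len(expected) != len(actual):
        return None
    for tol in ladder:
        if all(math.isclose(e, a, rel_tol=tol, abs_tol=tol)
               for e, a in zip(expected, actual)):
            return tol
    return None


def first_char_mismatch(expected: str, actual: str):
    """Step 6: locate the first differing character as (index, expected, actual).

    If one string is a prefix of the other, the mismatch is reported at the
    shorter length, with '' for the side that ran out. Returns None if equal."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i, e, a
    if len(expected) != len(actual):
        i = min(len(expected), len(actual))
        return i, expected[i:i + 1], actual[i:i + 1]
    return None
```

Printing `repr()` of both strings around the reported index exposes invisible differences such as trailing spaces or Windows line endings.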

````diff
+## CRITICAL REMINDERS (READ LAST - RECENCY BIAS) (#44, #58)
+
+1. **BACKUP**: `cp file file.bak` before modification (P3)
+2. **VERIFY**: `ls -la` ALL outputs before "done" (P12)
+3. **ITERATE**: Partial pass? Fix specific failure, re-run (P13/P26)
+4. **CREATE > EXPLAIN**: Missing output? CREATE it
+5. **DECODER FIRST**: Round-trip MUST pass before optimization (P35)
+6. **NEAR-MISS**: 60-89% pass = CLOSE, fix only failing tests (#54)
+7. **HOOKS**: `ls /tmp/*_strategies.txt` - if exist, READ first (#50)
+8. **ERROR ESCALATION**: Same error twice → change approach
+
+**P17 (constraints)?** Print checklist. **P20 (adversarial)?** Print attack vectors.
+
 </coding_guidelines>
````