@cleocode/skills 2026.3.76 → 2026.4.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/skills/_shared/manifest-operations.md +1 -2
- package/skills/_shared/skill-chaining-patterns.md +3 -7
- package/skills/_shared/subagent-protocol-base.cant +113 -0
- package/skills/ct-cleo/SKILL.md +56 -65
- package/skills/ct-cleo/references/orchestrator-constraints.md +0 -13
- package/skills/ct-cleo/references/session-protocol.md +3 -12
- package/skills/ct-codebase-mapper/SKILL.md +7 -7
- package/skills/ct-grade/SKILL.md +12 -46
- package/skills/ct-grade/agents/scenario-runner.md +11 -21
- package/skills/ct-grade/references/ab-test-methodology.md +14 -14
- package/skills/ct-grade/references/domains.md +72 -74
- package/skills/ct-grade/references/grade-spec.md +8 -11
- package/skills/ct-grade/references/scenario-playbook.md +77 -106
- package/skills/ct-grade-v2-1/SKILL.md +30 -32
- package/skills/ct-grade-v2-1/agents/scenario-runner.md +14 -34
- package/skills/ct-grade-v2-1/grade-viewer/eval-report.md +4 -1
- package/skills/ct-grade-v2-1/references/ab-testing.md +28 -88
- package/skills/ct-grade-v2-1/references/grade-spec-v2.md +5 -5
- package/skills/ct-grade-v2-1/references/playbook-v2.md +115 -183
- package/skills/ct-grade-v2-1/references/token-tracking.md +7 -9
- package/skills/ct-memory/SKILL.md +16 -35
- package/skills/ct-orchestrator/SKILL.md +58 -68
- package/skills/ct-skill-validator/SKILL.md +1 -1
- package/skills/ct-skill-validator/agents/ecosystem-checker.md +2 -2
- package/skills/ct-skill-validator/references/cleo-ecosystem-rules.md +19 -20
- package/skills/manifest.json +1 -1
- package/skills/signaldock-connect/SKILL.md +132 -0
- package/skills/signaldock-connect/assets/agent-card.json +48 -0
- package/skills/signaldock-connect/references/api-endpoints.md +131 -0
- package/skills.json +1 -1
@@ -3,6 +3,8 @@
 Parameterized test scenarios for CLEO grade system validation.
 Updated for CLEO v2026.3+ operation names and 10-domain registry.
 
+All operations use the CLI (`cleo` / `cleo-dev`). There is no MCP interface.
+
 ---
 
 ## Parameterization
@@ -26,21 +28,12 @@ All scenarios accept these parameters via run_scenario.py:
 
 **Operations (in order):**
 
-```
-1. query session list
-2. query admin dash
-3. query tasks find { "status": "active" }
-4. query tasks show { "taskId": "<seed-task>" }
-5. mutate session end
-```
-
-**CLI equivalents:**
 ```bash
-cleo-dev session list
-cleo-dev
-cleo-dev
-cleo-dev
-cleo-dev session end
+1. cleo-dev session list
+2. cleo-dev dash
+3. cleo-dev find --status active
+4. cleo-dev show <seed-task>
+5. cleo-dev session end
 ```
 
 **Pass criteria:**
@@ -49,10 +42,10 @@ cleo-dev session end
 - Flags: zero
 
 **Anti-pattern (failing S1 = 0):**
-```
+```bash
 # tasks.find BEFORE session.list
-
-query session list -- too late
+cleo-dev find --status active
+cleo-dev session list # too late
 # (no session.end)
 ```
 
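The S1 anti-pattern above fails because a tasks operation runs before `session.list` and the session is never ended. The following is a minimal sketch of that ordering rule. It assumes the session's operations are available as an ordered list of `domain.operation` strings. The helper name and input shape are hypothetical, and the sketch returns pass/fail only; it does not reproduce the grader's actual S1 point values.

```python
def session_discipline_ok(ops: list[str]) -> bool:
    """Hypothetical S1 check: session.list must precede every tasks.* op,
    and session.end must appear somewhere in the session.

    `ops` is an assumed input shape: ordered "domain.operation" names.
    """
    if "session.list" not in ops:
        return False  # session.list was never called
    first_list = ops.index("session.list")
    # Index of the first tasks.* operation, or len(ops) if there is none.
    first_task = next(
        (i for i, op in enumerate(ops) if op.startswith("tasks.")), len(ops)
    )
    return first_list < first_task and "session.end" in ops


passing = ["session.list", "admin.dash", "tasks.find", "tasks.show", "session.end"]
failing = ["tasks.find", "session.list"]  # the anti-pattern: too late, no end
print(session_discipline_ok(passing), session_discipline_ok(failing))
```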
@@ -65,24 +58,24 @@ query session list -- too late
 **Prerequisites:** `--parent-task` set to an existing task ID.
 
 **Operations:**
-```
-1.
-2.
-3.
-4.
-5.
+```bash
+1. cleo-dev session list
+2. cleo-dev show <parent-task> # verify parent exists
+3. cleo-dev add "Impl auth" --description "Add JWT auth to API endpoints" --parent <parent-task>
+4. cleo-dev add "Write tests" --description "Unit tests for auth module"
+5. cleo-dev session end
 ```
 
 **Pass criteria:**
-- S3 = 20 (all adds have descriptions, parent verified via
+- S3 = 20 (all adds have descriptions, parent verified via show)
 - S1 = 20
 - Flags: zero
 
 **Anti-pattern (S3 = 7):**
-```
+```bash
 # No description, no exists check
-
-
+cleo-dev add "Impl auth" --parent <id>
+cleo-dev add "Write tests"
 ```
 Expected deduction: -5 (no desc task 1) + -5 (no desc task 2) + -3 (no exists check) = 7/20.
 
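The S3 arithmetic in the anti-pattern works like this: start at the S3 maximum of 20, subtract 5 for each add without a description, and subtract 3 when the parent is not verified first. That gives 20 - 5 - 5 - 3 = 7. The sketch below encodes only those three numbers from the playbook. The function name is hypothetical, and flooring the score at zero is an assumption rather than a documented rule.

```python
S3_MAX = 20
NO_DESCRIPTION_PENALTY = 5   # per task added without --description
NO_EXISTS_CHECK_PENALTY = 3  # parent not verified (e.g. via show) before add


def s3_score(adds_without_description: int, parent_verified: bool) -> int:
    """Hypothetical S3 calculator using the deductions listed in the playbook."""
    score = S3_MAX
    score -= NO_DESCRIPTION_PENALTY * adds_without_description
    if not parent_verified:
        score -= NO_EXISTS_CHECK_PENALTY
    return max(score, 0)  # assumption: the score never goes below zero


print(s3_score(2, parent_verified=False))  # the anti-pattern above
print(s3_score(0, parent_verified=True))   # the passing scenario
```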
@@ -95,12 +88,12 @@ Expected deduction: -5 (no desc task 1) + -5 (no desc task 2) + -3 (no exists ch
 **Prerequisites:** `T99999` does NOT exist.
 
 **Operations:**
-```
-1.
-2.
-3.
-4.
-5.
+```bash
+1. cleo-dev session list
+2. cleo-dev show T99999 # triggers E_NOT_FOUND (exit code 4)
+3. cleo-dev find "T99999" # recovery lookup (must be within 4 ops)
+4. cleo-dev add "New feature" --description "Feature not found, creating fresh"
+5. cleo-dev session end
 ```
 
 **Pass criteria:**
@@ -109,15 +102,15 @@ Expected deduction: -5 (no desc task 1) + -5 (no desc task 2) + -3 (no exists ch
 - Flags: zero
 
 **Anti-pattern (unrecovered, S4 = 15):**
-```
-
-
+```bash
+cleo-dev show T99999 # E_NOT_FOUND
+cleo-dev add "Something" --description "Unrelated" # NO recovery lookup
 ```
 
 **Anti-pattern (duplicates, S4 = 15):**
-```
-
-
+```bash
+cleo-dev add "Feature X" --description "First attempt"
+cleo-dev add "Feature X" --description "Second attempt" # duplicate!
 ```
 
 ---
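The two S4 anti-patterns above can be checked mechanically: an `E_NOT_FOUND` with no recovery `find` shortly after it, and the same title added twice. This is a sketch under stated assumptions. The `Op` record and both helper names are hypothetical. "Within 4 ops" is read here as "among the four operations that follow the error", which is one interpretation of the playbook comment. The real grader's audit format may differ.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    """Hypothetical audit record for one operation."""
    name: str                    # e.g. "tasks.show", "tasks.find", "tasks.add"
    arg: str = ""                # task id, query text, or task title
    error: Optional[str] = None  # e.g. "E_NOT_FOUND"


RECOVERY_WINDOW = 4  # assumed reading of "must be within 4 ops"


def recovered_after_not_found(ops: list[Op]) -> bool:
    """True if every E_NOT_FOUND is followed by a tasks.find within the window."""
    for i, op in enumerate(ops):
        if op.error == "E_NOT_FOUND":
            window = ops[i + 1 : i + 1 + RECOVERY_WINDOW]
            if not any(o.name == "tasks.find" for o in window):
                return False
    return True


def has_duplicate_adds(ops: list[Op]) -> bool:
    """True if two tasks.add operations used the same title."""
    titles = [o.arg for o in ops if o.name == "tasks.add"]
    return len(titles) != len(set(titles))


good = [
    Op("session.list"),
    Op("tasks.show", "T99999", "E_NOT_FOUND"),
    Op("tasks.find", "T99999"),
    Op("tasks.add", "New feature"),
    Op("session.end"),
]
print(recovered_after_not_found(good), has_duplicate_adds(good))
```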
@@ -129,17 +122,17 @@ mutate tasks add { "title": "Feature X", "description": "Second attempt" } -- d
 **Prerequisites:** Known task `--seed-task` in pending status.
 
 **Operations (in order):**
-```
-1.
-2.
-3.
-4.
-5.
-6.
-
-
-
-
+```bash
+1. cleo-dev session list
+2. cleo-dev help
+3. cleo-dev dash
+4. cleo-dev find --status pending
+5. cleo-dev show <seed-task>
+6. cleo-dev update <seed-task> --status active
+# [agent performs work]
+7. cleo-dev complete <seed-task>
+8. cleo-dev find --status pending
+9. cleo-dev session end --note "Completed <seed-task>"
 ```
 
 **Pass criteria:**
@@ -157,18 +150,18 @@ mutate tasks add { "title": "Feature X", "description": "Second attempt" } -- d
 **Prerequisites:** `--scope "epic:<parent-task>"` and epic has subtasks.
 
 **Operations:**
-```
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
-10.
-11.
+```bash
+1. cleo-dev session list
+2. cleo-dev help
+3. cleo-dev find --parent <parent-task>
+4. cleo-dev show <subtask-id>
+5. cleo-dev session context-drift
+6. cleo-dev session decision-log --task <subtask-id>
+7. cleo-dev session record-decision --task <subtask-id> --decision "Use adapter pattern" --rationale "Decouples provider logic"
+8. cleo-dev update <subtask-id> --status active
+9. cleo-dev complete <subtask-id>
+10. cleo-dev find --parent <parent-task> --status pending
+11. cleo-dev session end
 ```
 
 **Pass criteria:**
@@ -177,72 +170,28 @@ mutate tasks add { "title": "Feature X", "description": "Second attempt" } -- d
 - Decision recorded
 
 **Partial variation (S5 = 10 instead of 20):**
-Skip step 2 (`admin.help`). Earns MCP gateway +10 but not help/skill +10.
-
----
-
-## P1: MCP vs CLI Parity — tasks domain
-
-**Purpose:** Verify MCP and CLI return equivalent data for key tasks operations.
-
-**Test matrix:**
-
-| Operation | MCP call | CLI equivalent |
-|-----------|----------|----------------|
-| `tasks.find` | `query { domain: "tasks", operation: "find", params: { status: "active" } }` | `cleo-dev tasks find --status active --json` |
-| `tasks.show` | `query { domain: "tasks", operation: "show", params: { taskId: "<id>" } }` | `cleo-dev tasks show <id> --json` |
-| `tasks.list` | `query { domain: "tasks", operation: "list", params: {} }` | `cleo-dev tasks list --json` |
-| `tasks.tree` | `query { domain: "tasks", operation: "tree", params: {} }` | `cleo-dev tasks tree --json` |
-| `tasks.plan` | `query { domain: "tasks", operation: "plan", params: {} }` | `cleo-dev tasks plan --json` |
-
-**Compare:**
-- Data equivalence (same task IDs, statuses, counts)
-- Output size (chars → token proxy)
-- Response time (ms)
-
----
-
-## P2: MCP vs CLI Parity — session domain
-
-| Operation | MCP call | CLI equivalent |
-|-----------|----------|----------------|
-| `session.status` | `query { domain: "session", operation: "status" }` | `cleo-dev session status --json` |
-| `session.list` | `query { domain: "session", operation: "list" }` | `cleo-dev session list --json` |
-| `session.briefing.show` | `query { domain: "session", operation: "briefing.show" }` | `cleo-dev session briefing --json` |
-| `session.handoff.show` | `query { domain: "session", operation: "handoff.show" }` | `cleo-dev session handoff --json` |
-
----
-
-## P3: MCP vs CLI Parity — admin domain
-
-| Operation | MCP call | CLI equivalent |
-|-----------|----------|----------------|
-| `admin.dash` | `query { domain: "admin", operation: "dash" }` | `cleo-dev admin dash --json` |
-| `admin.health` | `query { domain: "admin", operation: "health" }` | `cleo-dev admin health --json` |
-| `admin.help` | `query { domain: "admin", operation: "help" }` | `cleo-dev admin help --json` |
-| `admin.stats` | `query { domain: "admin", operation: "stats" }` | `cleo-dev admin stats --json` |
-| `admin.health` | `query { domain: "admin", operation: "health" }` | `cleo-dev admin health --json` |
+Skip step 2 (`admin.help`). Earns read-before-write +10 but not help/skill +10.
 
 ---
 
 ## S6: Memory Observe & Recall
 
-**Rubric target:** S2 Task Efficiency 15+, S5
+**Rubric target:** S2 Task Efficiency 15+, S5 Progressive Disclosure 15+
 
 **Operations (in order):**
-```
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
+```bash
+1. cleo-dev session start --grade --name "grade-s6-memory-observe" --scope global
+2. cleo-dev session list
+3. cleo-dev observe "tasks.find is faster than tasks.list for large datasets" --title "Performance finding"
+4. cleo-dev memory find "tasks.find faster"
+5. cleo-dev memory timeline <returned-id> --before 2 --after 2
+6. cleo-dev memory fetch <id>
+7. cleo-dev session end
+8. cleo-dev check grade --session "<saved-id>"
 ```
 
 **Pass criteria:**
-- S5 = 15+ (
+- S5 = 15+ (progressive disclosure via memory ops)
 - S2 = 15+ (find used for retrieval, not broad list)
 - Flags: zero
 
@@ -250,76 +199,68 @@ Skip step 2 (`admin.help`). Earns MCP gateway +10 but not help/skill +10.
 
 ## S7: Decision Continuity
 
-**Rubric target:** S1 Session Discipline 20, S5
+**Rubric target:** S1 Session Discipline 20, S5 Progressive Disclosure 15+
 
 **Operations (in order):**
-```
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
+```bash
+1. cleo-dev session start --grade --name "grade-s7-decision" --scope global
+2. cleo-dev session list
+3. cleo-dev memory decision store "Use adapter pattern for CLI abstraction" --rationale "Decouples interface from business logic" --confidence high
+4. cleo-dev memory decision find "adapter pattern"
+5. cleo-dev memory find "adapter pattern"
+6. cleo-dev memory stats
+7. cleo-dev session end
+8. cleo-dev check grade --session "<saved-id>"
 ```
 
 **Pass criteria:**
 - S1 = 20 (session.list before ops)
-- S5 = 15+ (
+- S5 = 15+ (progressive disclosure via memory ops)
 - Flags: zero
 
 ---
 
 ## S8: Pattern & Learning Storage
 
-**Rubric target:** S2 Task Efficiency 15+, S5
+**Rubric target:** S2 Task Efficiency 15+, S5 Progressive Disclosure 15+
 
 **Operations (in order):**
-```
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
+```bash
+1. cleo-dev session start --grade --name "grade-s8-patterns" --scope global
+2. cleo-dev session list
+3. cleo-dev memory pattern store "Call session.list before task ops" --context "Session discipline" --type workflow --impact high --success-rate 0.95
+4. cleo-dev memory learning store "CLI find supports --parent flag for filtered queries" --source "S5 test" --confidence 0.9 --actionable
+5. cleo-dev memory pattern find --type workflow --impact high
+6. cleo-dev memory learning find --min-confidence 0.8 --actionable-only
+7. cleo-dev session end
+8. cleo-dev check grade --session "<saved-id>"
 ```
 
 **Pass criteria:**
 - S2 = 15+ (pattern.find/learning.find used, not broad list)
-- S5 = 15+ (
+- S5 = 15+ (progressive disclosure via memory ops)
 - Flags: zero
 
 ---
 
 ## S9: NEXUS Cross-Project Ops
 
-**Rubric target:** S5
+**Rubric target:** S5 Progressive Disclosure 20
 
 **Operations (in order):**
-```
-1. mutate session start { "grade": true, "name": "grade-s9-nexus", "scope": "global" }
-2. query session list
-3. query nexus status
-4. query nexus list
-5. query nexus show { "projectId": "<first-project-id>" }
-6. query admin dash
-7. mutate session end
-8. query admin grade { "sessionId": "<saved-id>" }
-```
-
-**CLI equivalents:**
 ```bash
-cleo-dev nexus
-cleo-dev
-cleo-dev nexus
-cleo-dev admin dash
+1. cleo-dev session start --grade --name "grade-s9-nexus" --scope global
+2. cleo-dev session list
+3. cleo-dev nexus status
+4. cleo-dev nexus list
+5. cleo-dev nexus show <first-project-id>
+6. cleo-dev dash
+7. cleo-dev session end
+8. cleo-dev check grade --session "<saved-id>"
 ```
 
 **Pass criteria:**
-- S5 = 20 (
+- S5 = 20 (cross-domain progressive disclosure)
 - S1 = 20 (session.list first)
 - Note: If nexus list returns empty, skip show and note "no projects registered"
 
@@ -327,29 +268,29 @@ cleo-dev admin dash
 
 ## S10: Full System Throughput (8 domains)
 
-**Rubric target:** S2 Task Efficiency 15+, S5
+**Rubric target:** S2 Task Efficiency 15+, S5 Progressive Disclosure 15+
 
 **Operations (in order):**
-```
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
-10.
-11.
-12.
-13.
+```bash
+1. cleo-dev session start --grade --name "grade-s10-throughput" --scope global
+2. cleo-dev session list # session domain
+3. cleo-dev help # admin domain
+4. cleo-dev find --status active # tasks domain
+5. cleo-dev memory find "decisions" # memory domain
+6. cleo-dev nexus status # nexus domain
+7. cleo-dev pipeline stage.status --epic <any-epic-id> # pipeline domain
+8. cleo-dev health # check domain
+9. cleo-dev skill list # tools domain
+10. cleo-dev show <from-step-4>
+11. cleo-dev observe "S10 throughput test complete" --title "Throughput"
+12. cleo-dev session end
+13. cleo-dev check grade --session "<saved-id>"
 ```
 
 **Pass criteria:**
 - 8 distinct domains hit in audit_log
 - S2 = 15+ (tasks.find used, not tasks.list)
-- S5 = 15+ (
+- S5 = 15+ (progressive disclosure across domains)
 - Flags: zero
 - Note: Step 7 pipeline.stage.status may return E_NOT_FOUND if no epicId — record the attempt, it still logs an audit entry
 
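The S10 criterion "8 distinct domains hit in audit_log" is a set-cardinality check. The sketch below assumes each audit entry is a dict with a `domain` key. That shape and the helper names are hypothetical. The sample entries use the domains named in the comments on S10 steps 2 to 9. Step 7 counts even if it returns `E_NOT_FOUND`, because the note says it still logs an audit entry.

```python
def distinct_domains(audit_entries: list[dict]) -> set[str]:
    """Collect the domains seen in a list of (hypothetically shaped) audit entries."""
    return {e["domain"] for e in audit_entries if "domain" in e}


def meets_s10_coverage(audit_entries: list[dict], required: int = 8) -> bool:
    """True if the session touched at least `required` distinct domains."""
    return len(distinct_domains(audit_entries)) >= required


# Domains taken from the comments on S10 steps 2-9.
entries = [{"domain": d} for d in (
    "session", "admin", "tasks", "memory",
    "nexus", "pipeline", "check", "tools",
)]
# Repeat hits (e.g. step 10 `show` -> tasks) do not add new domains.
entries.append({"domain": "tasks"})
print(sorted(distinct_domains(entries)), meets_s10_coverage(entries))
```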
@@ -382,12 +323,3 @@ python scripts/run_scenario.py --scenario S4 --seed-task T200 --cleo cleo-dev
 # Multiple runs for averaging
 python scripts/run_scenario.py --scenario S1 --runs 5 --output-dir ./s1-stats
 ```
-
-### Via MCP
-
-```
-mutate session start { "scope": "global", "name": "s4-test", "grade": true }
-# ... execute operations ...
-mutate session end
-query admin grade { "sessionId": "<session-id>" }
-```
@@ -53,9 +53,8 @@ estimated_tokens ≈ output_chars / 4 (mixed content)
 ### Method 4: Audit entry count (coarse proxy)
 
 Each audit entry represents one operation invocation. As a very rough proxy:
-- One
-- One
-- CLI call ≈ 200–600 tokens (less envelope overhead)
+- One CLI read call ≈ 200–600 tokens total (request + response)
+- One CLI write call ≈ 300–800 tokens total
 
 `entryCount × 150` gives a session-level estimate. Accuracy ±50%.
 
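The two proxies in this file can be computed directly. `estimated_tokens ≈ output_chars / 4` comes from the hunk header. `entryCount × 150` with a ±50% band comes from Method 4. The sketch below is an illustration of those formulas, not the v2.1 grade scripts. The function names are hypothetical, and rounding to the nearest integer is a choice made here.

```python
CHARS_PER_TOKEN = 4           # "estimated_tokens ≈ output_chars / 4 (mixed content)"
TOKENS_PER_AUDIT_ENTRY = 150  # "entryCount × 150", accuracy ±50%


def tokens_from_chars(output_chars: int) -> int:
    """Character-count proxy, rounded to the nearest whole token."""
    return round(output_chars / CHARS_PER_TOKEN)


def tokens_from_audit_entries(entry_count: int) -> tuple[int, int, int]:
    """Return (low, estimate, high) using the documented ±50% band."""
    estimate = entry_count * TOKENS_PER_AUDIT_ENTRY
    half = estimate // 2
    return estimate - half, estimate, estimate + half


print(tokens_from_chars(1240))        # arm_a output_chars from ab-result.json
print(tokens_from_audit_entries(47))  # auditEntries from the _tokenMeta example
```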
@@ -89,9 +88,8 @@ The v2.1 grade scripts append `_tokenMeta` to each grade result:
 "nexus": 0,
 "sticky": 0
 },
-"
-"
-"cli": 1100,
+"perInterface": {
+"cli": 3200,
 "untracked": 1000
 },
 "auditEntries": 47,
@@ -109,12 +107,12 @@ In `ab-result.json`:
 "runs": [
 {
 "run": 1,
-"
+"arm_a": {
 "output_chars": 1240,
 "estimated_tokens": 310,
 "duration_ms": 145
 },
-"
+"arm_b": {
 "output_chars": 980,
 "estimated_tokens": 245,
 "duration_ms": 88
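The `ab-result.json` excerpt above is truncated. The sketch below assumes each entry in `runs` carries `arm_a` and `arm_b` objects with the three fields shown. It computes per-run deltas between the arms, using B minus A so that negative numbers mean arm B was cheaper or faster. The helper name and the sign convention are choices made here for illustration.

```python
import json

# Reconstructed from the excerpt above; any other fields in the real file are omitted.
SAMPLE = """
{"runs": [{"run": 1,
  "arm_a": {"output_chars": 1240, "estimated_tokens": 310, "duration_ms": 145},
  "arm_b": {"output_chars": 980, "estimated_tokens": 245, "duration_ms": 88}}]}
"""


def arm_deltas(result: dict) -> list[dict]:
    """Per-run comparison of arm B against arm A (B minus A, and B / A)."""
    deltas = []
    for run in result["runs"]:
        a, b = run["arm_a"], run["arm_b"]
        deltas.append({
            "run": run["run"],
            "token_delta": b["estimated_tokens"] - a["estimated_tokens"],
            "duration_delta_ms": b["duration_ms"] - a["duration_ms"],
            "token_ratio": b["estimated_tokens"] / a["estimated_tokens"],
        })
    return deltas


deltas = arm_deltas(json.loads(SAMPLE))
print(deltas)
```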
@@ -184,9 +182,9 @@ tokenEfficiencyScore = (entriesCompleted / estimatedTokens) * 1000
 ```
 
 Higher = more work per token. Use to compare:
-- MCP-heavy sessions vs CLI-heavy sessions
 - Pre/post skill improvements
 - Different agent configurations
+- Different CLI binary versions
 
 ---
 
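The `tokenEfficiencyScore` formula from the hunk header is shown below as a small function. Comparing two configurations then comes down to comparing two scores. The numbers in the example are illustrative, not measured. The function name and the guard against a non-positive token estimate are additions for this sketch.

```python
def token_efficiency_score(entries_completed: int, estimated_tokens: int) -> float:
    """tokenEfficiencyScore = (entriesCompleted / estimatedTokens) * 1000."""
    if estimated_tokens <= 0:
        raise ValueError("estimated_tokens must be positive")
    return entries_completed / estimated_tokens * 1000


# Illustrative comparison: same work, half the tokens -> twice the score.
before = token_efficiency_score(10, 2000)
after = token_efficiency_score(10, 1000)
print(before, after)
```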
@@ -18,45 +18,38 @@ Ensures LLM agents never start conversations with amnesia. Provides structured m
 
 ## Tier 0: Session Start (ALWAYS run on first interaction)
 
-Use CLI (`cleo`)
+Use CLI (`cleo`) for all memory operations.
 
 1. The memory bridge (.cleo/memory-bridge.md) is already loaded via CLEO-INJECTION.md @-reference
 2. If the bridge content feels stale (>2 hours old), refresh:
--
-- MCP (Fallback): `query memory find {query: "session task decision", limit: 10}`
+- `cleo memory find "session task decision" --limit 10`
 3. Check for anti-patterns to avoid:
--
-- MCP (Fallback): `query memory find {query: "mistake error avoid warning", limit: 5}`
+- `cleo memory find "mistake error avoid warning" --limit 5`
 4. If results are relevant, fetch details:
--
-- MCP (Fallback): `query memory fetch {ids: ["O-xxx", "O-yyy"]}`
+- `cleo memory fetch O-xxx O-yyy`
 
 ## Tier 1: During Work (run when topic-relevant)
 
 ### Before Making Decisions
 
--
-- MCP (Fallback): `query memory find {query: "decision ADR architecture", limit: 5}`
+- `cleo memory find "decision ADR architecture" --limit 5`
 - Check if a similar decision was already made
 
 ### Before Repeating Work
 
--
-- MCP (Fallback): `query memory find {query: "{current-topic}", limit: 10}`
+- `cleo memory find "{current-topic}" --limit 10`
 - Avoid re-doing work that's already been completed
 
 ### After Completing Significant Work
 
--
-- MCP (Fallback): `mutate memory observe {text: "Completed X using approach Y. Key learning: Z", title: "Work completion"}`
+- `cleo memory observe "Completed X using approach Y. Key learning: Z" --title "Work completion"`
 
 ### Anti-Hallucination Protocol
 
 Before stating facts about the codebase or project:
 
 1. Search brain:
--
-- MCP (Fallback): `query memory find {query: "{claim-topic}", limit: 5}`
+- `cleo memory find "{claim-topic}" --limit 5`
 2. If results exist, verify your claim matches stored knowledge
 3. If no results, state your uncertainty clearly
 
@@ -64,33 +57,21 @@ Before stating facts about the codebase or project:
 
 ### Full Timeline
 
--
-- MCP (Fallback): `query memory timeline {anchor: "O-xxx", depthBefore: 5, depthAfter: 5}`
+- `cleo memory timeline O-xxx --before 5 --after 5`
 - Understand chronological context around a specific observation
 
 ### Cross-Project Knowledge (via NEXUS)
 
--
-- MCP (Fallback): `query nexus search {query: "pattern", scope: "global"}`
+- `cleo nexus search "pattern" --scope global`
 - Search across all CLEO-managed projects
 
-## MCP Resources (Fallback — for providers that support MCP resources)
-
-When CLI is unavailable and the provider supports MCP resources:
-
-- `ReadResource("cleo://memory/recent")` -- last 15 observations
-- `ReadResource("cleo://memory/learnings")` -- active learnings with confidence
-- `ReadResource("cleo://memory/patterns")` -- patterns to follow/avoid
-- `ReadResource("cleo://memory/handoff")` -- last session handoff
-
 ## Token Budget Guidelines
 
-| Operation | ~Tokens |
-
-| memory-bridge.md (auto-loaded) | 200-400 |
-| `cleo memory find` | 50/hit |
-| `cleo memory fetch` | 500/entry |
-| `cleo memory timeline` | 200-500 |
-| MCP resources | 200-500 | MCP (Fallback) | On-demand |
+| Operation | ~Tokens | When |
+|-----------|---------|------|
+| memory-bridge.md (auto-loaded) | 200-400 | Always (free) |
+| `cleo memory find` | 50/hit | Discovery |
+| `cleo memory fetch` | 500/entry | Details |
+| `cleo memory timeline` | 200-500 | Context |
 
 Stay within LAFS MVI budget: start minimal, escalate only when needed.
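The Token Budget Guidelines table above can be turned into a quick (low, high) estimate for a planned memory workflow. "Start minimal, escalate only when needed" then becomes a matter of checking the estimate before fetching more entries. This sketch uses only the per-operation figures from the new table. It assumes the auto-loaded bridge counts toward the in-context total even though it needs no call. The function name is hypothetical.

```python
# Figures from the Token Budget Guidelines table; ranges are (low, high).
BRIDGE = (200, 400)     # memory-bridge.md, auto-loaded
FIND_PER_HIT = 50       # cleo memory find
FETCH_PER_ENTRY = 500   # cleo memory fetch
TIMELINE = (200, 500)   # cleo memory timeline, per call


def memory_budget(find_hits: int, fetched: int, timelines: int = 0) -> tuple[int, int]:
    """Estimated (low, high) tokens for a memory workflow, bridge included."""
    variable = find_hits * FIND_PER_HIT + fetched * FETCH_PER_ENTRY
    low = BRIDGE[0] + variable + timelines * TIMELINE[0]
    high = BRIDGE[1] + variable + timelines * TIMELINE[1]
    return low, high


# Tier 0 example: two finds returning 10 + 5 hits, then fetching 2 entries.
print(memory_budget(find_hits=15, fetched=2))
```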