pi-hermes-memory 0.1.0 → 0.2.0
- package/README.md +205 -78
- package/docs/0.2/PLAN.md +290 -0
- package/docs/0.2/TASKS.md +134 -0
- package/docs/0.2/TEST-PLAN.md +216 -0
- package/docs/ROADMAP.md +245 -135
- package/package.json +6 -3
- package/src/config.ts +7 -0
- package/src/constants.ts +73 -1
- package/src/handlers/auto-consolidate.ts +94 -0
- package/src/handlers/background-review.ts +27 -1
- package/src/handlers/correction-detector.ts +143 -0
- package/src/handlers/skill-auto-trigger.ts +108 -0
- package/src/handlers/skills-command.ts +38 -0
- package/src/index.ts +46 -8
- package/src/store/memory-store.ts +25 -2
- package/src/store/skill-store.ts +292 -0
- package/src/tools/memory-tool.ts +1 -1
- package/src/tools/skill-tool.ts +142 -0
- package/src/types.ts +40 -0
package/docs/0.2/PLAN.md
ADDED
@@ -0,0 +1,290 @@
# v0.2.0 Implementation Plan — Skills + Smart Curation

> **Goal**: Close the two biggest Hermes gaps — procedural memory (skills) and intelligent memory management (auto-consolidation, correction detection, tool-call-aware nudges).

## Implementation Order

```
Epic 2 (Auto-Consolidation) → standalone, modifies MemoryStore.add()
Epic 3 (Correction Detection) → standalone, new handler
Epic 4 (Tool-Call Nudge) → modifies background-review.ts
Epic 1 (Skill Tool) → largest: new store + tool + handlers
Epic 5 (Docs + Release) → depends on all above
```

Epics 2, 3, 4 are independent but done sequentially to avoid merge conflicts in shared files (`types.ts`, `config.ts`, `constants.ts`, `index.ts`).

---

## Epic 2: Auto-Consolidation

**Problem**: When `add()` would exceed the character limit, we return an error. Hermes auto-consolidates instead.

### New Files

**`src/handlers/auto-consolidate.ts`** (~120 lines)
```typescript
export async function triggerConsolidation(
  pi: ExtensionAPI,
  store: MemoryStore,
  target: "memory" | "user",
  signal?: AbortSignal,
): Promise<ConsolidationResult>
```
- Builds prompt from `CONSOLIDATION_PROMPT` + current entries for the target
- Calls `pi.exec("pi", ["-p", "--no-session", prompt], { signal, timeout: 60000 })`
- Returns `{ consolidated: true }` on success, `{ consolidated: false, error }` on failure

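
Below is a minimal sketch of how `triggerConsolidation()` could behave. It assumes `pi.exec()` resolves to an object with a numeric exit `code`, which is the shape the test plan mocks. `ExecLike`, `EntrySource`, `getEntries()` and the prompt layout are hypothetical stand-ins for the real `ExtensionAPI` and `MemoryStore` types. `CONSOLIDATION_PROMPT` here is a placeholder string, not the shipped prompt.

```typescript
// Hypothetical minimal shapes; the real ExtensionAPI and MemoryStore types are not shown in this plan.
type Target = "memory" | "user";

interface ExecResult {
  code: number;
  stdout: string;
  stderr: string;
}

interface ExecLike {
  exec(
    command: string,
    args: string[],
    options: { signal?: AbortSignal; timeout?: number },
  ): Promise<ExecResult>;
}

interface EntrySource {
  getEntries(target: Target): string[]; // hypothetical accessor
}

interface ConsolidationResult {
  consolidated: boolean;
  error?: string;
}

// Placeholder text; the real CONSOLIDATION_PROMPT lives in src/constants.ts.
const CONSOLIDATION_PROMPT = "Merge related entries and remove duplicates so the file fits its limit.";

export function buildConsolidationPrompt(target: Target, entries: string[]): string {
  const numbered = entries.map((entry, i) => `${i + 1}. ${entry}`).join("\n");
  return `${CONSOLIDATION_PROMPT}\n\nTarget: ${target}\n\nCurrent entries:\n${numbered}`;
}

export async function triggerConsolidation(
  pi: ExecLike,
  store: EntrySource,
  target: Target,
  signal?: AbortSignal,
): Promise<ConsolidationResult> {
  const prompt = buildConsolidationPrompt(target, store.getEntries(target));
  try {
    // One-shot, sessionless child `pi` run, as described in the bullets above.
    const result = await pi.exec("pi", ["-p", "--no-session", prompt], { signal, timeout: 60000 });
    if (result.code === 0) return { consolidated: true };
    return {
      consolidated: false,
      error: `Consolidation exited with code ${result.code}: ${result.stderr.trim()}`,
    };
  } catch (err) {
    // Timeouts, aborts and spawn failures all end up here.
    const message = err instanceof Error ? err.message : String(err);
    return { consolidated: false, error: `Consolidation failed: ${message}` };
  }
}
```

Catching the exception inside the function keeps the `{ consolidated: false, error }` contract uniform. Callers such as `MemoryStore.add()` then only need to check one field.
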
**`src/handlers/consolidate-command.ts`** (~30 lines)
- Registers `/memory-consolidate` via `pi.registerCommand()`
- Runs consolidation for both targets, reports via `ctx.ui.notify()`

**`tests/handlers/auto-consolidate.test.ts`** (~120 lines)

### Modified Files

**`src/constants.ts`** — Add `CONSOLIDATION_PROMPT`

**`src/types.ts`** — Add `autoConsolidate: boolean` to `MemoryConfig`; add `ConsolidationResult` interface

**`src/config.ts`** — Add `autoConsolidate: true` default + parsing

**`src/store/memory-store.ts`** — Key changes:
- `add()` becomes **async** (returns `Promise<MemoryResult>`)
- Add `setConsolidator()` method for dependency injection (avoids circular import)
- When over limit + consolidator set: call consolidator, **reload from disk** (`await this.loadFromDisk()`), then retry once
- **Critical**: The `pi.exec()` child process modifies files on disk. The parent's in-memory arrays become stale after consolidation. We MUST reload before retrying `add()` or the retry will overwrite consolidated entries with stale data.

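
The inject, consolidate, reload, retry sequence can be sketched with a toy store. Everything in this sketch is hypothetical. `FakeDisk` stands in for the files the child process rewrites. The character limit is a plain sum of entry lengths. The class is not the real `MemoryStore`. The point is the ordering: the store reloads from "disk" before the single retry, so stale arrays never get written back.

```typescript
// Hypothetical sketch; names and the limit check are simplified stand-ins for MemoryStore.
type Target = "memory" | "user";

interface MemoryResult {
  success: boolean;
  error?: string;
}

interface ConsolidationResult {
  consolidated: boolean;
  error?: string;
}

type Consolidator = (target: Target) => Promise<ConsolidationResult>;

// Stands in for the files on disk that the child `pi` process rewrites.
interface FakeDisk {
  memory: string[];
  user: string[];
}

export class SketchMemoryStore {
  private entries: Record<Target, string[]> = { memory: [], user: [] };
  private consolidator?: Consolidator;

  constructor(
    private disk: FakeDisk,
    private charLimit: number,
    private autoConsolidate = true,
  ) {}

  // Setter injection: index.ts supplies the consolidator, so the store never imports handlers.
  setConsolidator(fn: Consolidator): void {
    this.consolidator = fn;
  }

  async loadFromDisk(): Promise<void> {
    this.entries = { memory: [...this.disk.memory], user: [...this.disk.user] };
  }

  getEntries(target: Target): string[] {
    return [...this.entries[target]];
  }

  async add(target: Target, content: string): Promise<MemoryResult> {
    if (this.fits(target, content)) return this.commit(target, content);

    if (this.autoConsolidate && this.consolidator) {
      const result = await this.consolidator(target);
      if (result.consolidated) {
        // The child process rewrote the files, so the in-memory arrays are stale.
        await this.loadFromDisk();
        // Retry exactly once against the freshly loaded entries.
        if (this.fits(target, content)) return this.commit(target, content);
      }
    }
    return {
      success: false,
      error: `Adding this entry would exceed the ${target} limit of ${this.charLimit} chars.`,
    };
  }

  private fits(target: Target, content: string): boolean {
    const used = this.entries[target].reduce((sum, entry) => sum + entry.length, 0);
    return used + content.length <= this.charLimit;
  }

  private commit(target: Target, content: string): MemoryResult {
    this.entries[target].push(content);
    this.disk[target] = [...this.entries[target]];
    return { success: true };
  }
}
```
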
**`src/tools/memory-tool.ts`** — `await store.add(target, content)` (line ~58)

**`src/index.ts`** — Wire consolidator + register command

**Test migration**: Making `add()` async means all existing tests calling `store.add()` must use `await`. Without `await`, tests get a Promise object instead of `MemoryResult`, causing assertion failures. Update all `store.add()` calls in `tests/store/memory-store.test.ts` to `await store.add()`.

### Key Decision: Consolidator Injection via Setter
MemoryStore cannot import from handlers (circular). Instead, `index.ts` injects a consolidator function via `store.setConsolidator()` after both `store` and `pi` are available.

### Key Decision: No memoryDirPath Getter
SkillStore receives its directory path directly from config (`config.memoryDir + "/skills/"`) in `index.ts`. No need to expose MemoryStore internals.

---

## Epic 3: Correction Detection + Immediate Save

**Problem**: When the user says "no, don't do that", we only save it at the next nudge, up to 8 turns later. Hermes detects corrections immediately.

### New Files

**`src/handlers/correction-detector.ts`** (~100 lines)
```typescript
export function setupCorrectionDetector(
  pi: ExtensionAPI,
  store: MemoryStore,
  config: MemoryConfig,
): void
```

**Design**:
1. On `message_end` (role=user): check the text against `CORRECTION_PATTERNS`; on a match, set `pendingCorrection = true`
2. On `turn_end`: if `pendingCorrection` is set, trigger `pi.exec()` with `CORRECTION_SAVE_PROMPT` + recent messages + current memory
3. Rate limit: `turnsSinceLastCorrection >= 3` and `!correctionInProgress`

**Why turn_end, not message_end**: The save prompt needs the full context, meaning both the user's correction and what the agent got wrong.

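
Below is a sketch of the `message_end` / `turn_end` state machine with the rate limit above. Events are modeled as plain method calls, and the save is an injected async function. `CorrectionGate`, `onUserMessage()` and `onTurnEnd()` are hypothetical names. The sketch also assumes that a rate-limited correction is dropped rather than queued, which the plan does not settle.

```typescript
// Hypothetical sketch; real wiring would call these from pi's message_end and turn_end handlers.
type SaveFn = () => Promise<void>;

export class CorrectionGate {
  private pendingCorrection = false;
  private correctionInProgress = false;
  // Start "long ago" so the first correction in a session is never rate-limited.
  private turnsSinceLastCorrection = Number.POSITIVE_INFINITY;

  constructor(
    private save: SaveFn,
    private minTurnsBetween = 3,
  ) {}

  // message_end (role=user): only flag the correction; the turn's context is not complete yet.
  onUserMessage(text: string, isCorrection: (text: string) => boolean): void {
    if (isCorrection(text)) this.pendingCorrection = true;
  }

  // turn_end: returns the in-flight save promise, or null if nothing was triggered.
  onTurnEnd(): Promise<void> | null {
    this.turnsSinceLastCorrection++;
    const shouldSave =
      this.pendingCorrection &&
      this.turnsSinceLastCorrection >= this.minTurnsBetween &&
      !this.correctionInProgress;
    // A correction that is rate-limited here is dropped, not queued.
    this.pendingCorrection = false;
    if (!shouldSave) return null;

    this.turnsSinceLastCorrection = 0;
    this.correctionInProgress = true;
    return this.save()
      .catch(() => undefined) // a failed background save should not break the session
      .then(() => {
        this.correctionInProgress = false;
      });
  }
}
```
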
**`tests/handlers/correction-detector.test.ts`** (~150 lines)

### Modified Files

**`src/constants.ts`** — Add `CORRECTION_SAVE_PROMPT` and `CORRECTION_PATTERNS` (regex array)

**`src/types.ts`** — Add `correctionDetection: boolean` to `MemoryConfig`

**`src/config.ts`** — Add `correctionDetection: true` default + parsing

**`src/index.ts`** — Wire `setupCorrectionDetector()`

### Correction Patterns (Two-Pass Filter)

Patterns are split into **strong** (high confidence, trigger immediately) and **weak** (need a directive clause to confirm).

**Strong patterns** (always trigger):
```typescript
/don'?t do that/i, /not like that/i,
/^I said\b/i, /^I told you\b/i, /we already discussed/i,
/^please don'?t/i, /^that'?s not what I/i
```

**Weak patterns** (only trigger if followed by a directive — verb or "the/that/this"):
```typescript
/^no[,.\s!]/i, /^wrong[,.\s!]/i, /^actually[,.\s]/i, /^stop[,.\s!]/i
```

**Negative patterns** (suppress trigger even if a positive pattern matches):
```typescript
/^no worries/i, /^no problem/i, /^no thanks/i, /^no need/i,
/^actually.{0,10}(looks? great|perfect|good|correct|right)/i,
/^stop.{0,5}(there|here|for now)/i
```

This filters out false positives such as "no worries, I'll handle it" and "actually, that looks great" while still catching "no, don't use npm" and "actually, use yarn instead".

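
Below is a minimal sketch of the filter. It reuses the three pattern lists above verbatim and checks negatives first so they can veto any positive match. The `DIRECTIVE_START` word list is an assumption. The plan only says a weak opener must be followed by "a verb or the/that/this", so the verbs listed are illustrative, not the shipped set.

```typescript
// Pattern lists copied from the plan above.
const STRONG_PATTERNS: RegExp[] = [
  /don'?t do that/i, /not like that/i,
  /^I said\b/i, /^I told you\b/i, /we already discussed/i,
  /^please don'?t/i, /^that'?s not what I/i,
];

const WEAK_PATTERNS: RegExp[] = [/^no[,.\s!]/i, /^wrong[,.\s!]/i, /^actually[,.\s]/i, /^stop[,.\s!]/i];

const NEGATIVE_PATTERNS: RegExp[] = [
  /^no worries/i, /^no problem/i, /^no thanks/i, /^no need/i,
  /^actually.{0,10}(looks? great|perfect|good|correct|right)/i,
  /^stop.{0,5}(there|here|for now)/i,
];

// Assumption: an illustrative list of directive openers ("a verb or the/that/this").
const DIRECTIVE_START = /^(use|don'?t|do|fix|try|run|add|remove|change|put|keep|make|the|that|this)\b/i;

export function isCorrection(text: string): boolean {
  const trimmed = text.trim();
  if (trimmed.length === 0) return false;

  // Negative patterns veto everything, e.g. "no worries" or "actually, perfect".
  if (NEGATIVE_PATTERNS.some((p) => p.test(trimmed))) return false;

  // Pass 1: a strong pattern is enough on its own.
  if (STRONG_PATTERNS.some((p) => p.test(trimmed))) return true;

  // Pass 2: a weak opener counts only when a directive clause follows it.
  for (const p of WEAK_PATTERNS) {
    const match = trimmed.match(p);
    if (!match) continue;
    const rest = trimmed.slice(match[0].length).replace(/^[\s,.!]+/, "");
    if (DIRECTIVE_START.test(rest)) return true;
  }
  return false;
}
```

None of the patterns use the `g` flag, so repeated `.test()` calls carry no `lastIndex` state between messages.
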
---

## Epic 4: Tool-Call-Aware Nudge

**Problem**: Nudge is purely turn-count based. Complex tasks with many tool calls generate more valuable memories.

### Modified Files

**`src/types.ts`** — Add `nudgeToolCalls: number` to `MemoryConfig`

**`src/config.ts`** — Add `nudgeToolCalls: 15` default + parsing

**`src/handlers/background-review.ts`** — Key changes:
- Count tool-use entries from `ctx.sessionManager.getBranch()` at `turn_end` time (robust — no unknown event names)
- Change trigger to OR logic: `turnsSinceReview >= nudgeInterval || toolCallsSinceReview >= nudgeToolCalls`
- Reset both counters on review

**`tests/handlers/background-review.test.ts`** — Add tool-call trigger tests

### Key Decision: Count from Branch, Not Events
Rather than depending on unknown Pi event names like `tool_end`, count tool-use entries from `ctx.sessionManager.getBranch()` at `turn_end` time. More robust and testable.

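
The counting and OR-trigger logic can be sketched as below. The entry shape is an assumption modeled on the test plan's wording, not Pi's documented session format: `type: "message"` entries whose assistant `content` is an array of blocks with `type: "toolCall"`. `NudgeCounter` is a hypothetical name. Only entries appended since the previous `turn_end` are counted, so tool calls are not double-counted across turns.

```typescript
// Assumed entry shape; Pi's real session entry types may differ.
interface ContentBlock {
  type: string;
}

interface BranchEntry {
  type: string;
  message?: { role: string; content?: ContentBlock[] | string };
}

export function countToolCalls(entries: BranchEntry[]): number {
  let count = 0;
  for (const entry of entries) {
    const content = entry.message?.content;
    if (entry.type !== "message" || entry.message?.role !== "assistant" || !Array.isArray(content)) continue;
    count += content.filter((block) => block.type === "toolCall").length;
  }
  return count;
}

export class NudgeCounter {
  private turnsSinceReview = 0;
  private toolCallsSinceReview = 0;
  private lastSeenLength = 0;

  constructor(
    private nudgeInterval: number,
    private nudgeToolCalls: number,
  ) {}

  // Called at turn_end; returns true when a background review should start.
  onTurnEnd(getBranch: () => BranchEntry[]): boolean {
    this.turnsSinceReview++;
    try {
      const branch = getBranch();
      this.toolCallsSinceReview += countToolCalls(branch.slice(this.lastSeenLength));
      this.lastSeenLength = branch.length;
    } catch {
      // Branch unavailable: fall back to the turn-count trigger only.
    }
    const due =
      this.turnsSinceReview >= this.nudgeInterval ||
      this.toolCallsSinceReview >= this.nudgeToolCalls;
    if (due) {
      // Reset both counters on review.
      this.turnsSinceReview = 0;
      this.toolCallsSinceReview = 0;
    }
    return due;
  }
}
```
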
---

## Epic 1: Skill Tool + Procedural Memory

**Problem**: `COMBINED_REVIEW_PROMPT` asks about skills but there's no skill tool. This is the single highest-leverage change.

### New Files

**`src/store/skill-store.ts`** (~250 lines)
```typescript
export class SkillStore {
  constructor(private skillsDir: string) {}
  async loadIndex(): Promise<SkillIndex[]>
  async loadSkill(fileName: string): Promise<SkillDocument | null>
  async create(name: string, description: string, body: string): Promise<SkillResult>
  async patch(fileName: string, section: string, newContent: string): Promise<SkillResult>
  async edit(fileName: string, description: string, body: string): Promise<SkillResult>
  async delete(fileName: string): Promise<SkillResult>
  formatIndexForSystemPrompt(): string
}
```

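
Below is one way `patch()` could replace or append a section. It assumes "section" means a level-2 (`## `) heading like `## Procedure` in the SKILL.md format, and that a section runs until the next `## ` heading. `patchSection` is a hypothetical helper. Bumping `version` and `updated` in the frontmatter would happen separately.

```typescript
// Hypothetical helper: replace a "## <section>" block in a skill body, or append it if missing.
export function patchSection(body: string, section: string, newContent: string): string {
  const name = section.replace(/^#+\s*/, "").trim(); // accept "Procedure" or "## Procedure"
  const heading = `## ${name}`;
  const replacement = [heading, ...newContent.trimEnd().split("\n")];
  const lines = body.split("\n");
  const start = lines.findIndex((line) => line.trim() === heading);

  if (start === -1) {
    // Section not present: append it after the existing body.
    const base = body.trimEnd();
    return `${base ? `${base}\n\n` : ""}${replacement.join("\n")}\n`;
  }

  // The section ends at the next level-2 heading ("###" subsections stay inside it).
  let end = lines.findIndex((line, i) => i > start && /^##\s/.test(line));
  if (end === -1) end = lines.length;
  const tail = lines.slice(end);
  // Keep one blank line before the following section, or a trailing newline at the end.
  return [...lines.slice(0, start), ...replacement, ...(tail.length > 0 ? ["", ...tail] : [""])].join("\n");
}
```
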
**Storage**: `~/.pi/agent/memory/skills/` (isolated from user skills at `~/.pi/agent/skills/`)

**SKILL.md format**:
```markdown
---
name: debug-typescript-errors
description: Step-by-step approach to debugging TS errors in monorepos
version: 1
created: 2026-04-27
updated: 2026-04-27
---
## When to Use
## Procedure
## Pitfalls
## Verification
```

**Frontmatter parsing**: Simple regex (no yaml dependency). Split on `---`, parse key-value pairs.

**File naming**: `slugify(name) + ".md"` — lowercase, replace non-alphanum with `-`, collapse dashes.

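
Here is a sketch of the naming and frontmatter rules. It assumes frontmatter values are single-line `key: value` pairs, as in the example above. The sketch also trims leading and trailing dashes. The plan's rule does not mention this step, but the test plan's expected `debug-typescript-errors.md` (from `"Debug TypeScript Errors!"`) requires it. `parseFrontmatter` and `formatSkillFile` are hypothetical helper names.

```typescript
export function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of non-alphanumerics becomes a single dash
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
}

export interface ParsedSkillFile {
  meta: Record<string, string>;
  body: string;
}

// Returns null when the file has no leading `---` frontmatter block.
export function parseFrontmatter(raw: string): ParsedSkillFile | null {
  const match = raw.match(/^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/);
  if (!match) return null;
  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue; // skip blank or malformed lines
    // Split on the first colon only, so values may contain colons.
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: match[2] };
}

export function formatSkillFile(meta: Record<string, string>, body: string): string {
  const header = Object.entries(meta)
    .map(([key, value]) => `${key}: ${value}`)
    .join("\n");
  return `---\n${header}\n---\n${body}`;
}
```
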
**`src/tools/skill-tool.ts`** (~180 lines)
- Registered via `pi.registerTool()` with actions: `create`, `view`, `patch`, `edit`, `delete`
- Content scanning on all writes via `scanContent()`

**`src/handlers/skill-auto-trigger.ts`** (~80 lines)
- Track tool calls per turn
- When a turn completes with **8+ tool calls** (not 5 — a typical read→bash→edit→bash→read is already 5), trigger skill extraction via `pi.exec()`
- Additionally require at least **2 distinct tool types** in the turn (e.g., read + bash, not just 8 reads) to filter trivial multi-call turns
- Rate limit: max 1 auto-trigger per session

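
The trigger rule above reduces to a small per-turn check. This sketch assumes tool calls are reported one at a time by tool name. The hook name `onToolCall` is hypothetical, since the plan only says "track tool calls per turn". The sketch also assumes the once-per-session flag is cleared when a new session starts.

```typescript
// Hypothetical sketch of the auto-trigger rule; hook names are illustrative.
export class SkillAutoTrigger {
  private toolCallsThisTurn: string[] = [];
  private triggeredThisSession = false;

  constructor(
    private minToolCalls = 8, // DEFAULT_SKILL_TRIGGER_TOOL_CALLS in the plan
    private minDistinctTools = 2,
  ) {}

  onSessionStart(): void {
    this.toolCallsThisTurn = [];
    this.triggeredThisSession = false;
  }

  onToolCall(toolName: string): void {
    this.toolCallsThisTurn.push(toolName);
  }

  // Returns true when skill extraction should start for the turn that just ended.
  onTurnEnd(): boolean {
    const calls = this.toolCallsThisTurn;
    this.toolCallsThisTurn = [];
    if (this.triggeredThisSession) return false; // max 1 auto-trigger per session
    const distinctTools = new Set(calls).size;
    if (calls.length < this.minToolCalls || distinctTools < this.minDistinctTools) return false;
    this.triggeredThisSession = true;
    return true;
  }
}
```
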
**`src/handlers/skills-command.ts`** (~50 lines)
- `/memory-skills` command listing all skills

**Test files**: `tests/store/skill-store.test.ts`, `tests/tools/skill-tool.test.ts`, `tests/handlers/skill-auto-trigger.test.ts`

### Modified Files

**`src/constants.ts`** — Add `SKILL_TOOL_DESCRIPTION`, `DEFAULT_SKILL_TRIGGER_TOOL_CALLS` (= 8); update `COMBINED_REVIEW_PROMPT`:

```typescript
export const COMBINED_REVIEW_PROMPT = `Review the conversation above and consider two things:

**Memory**: Has the user revealed things about themselves — their persona, desires, preferences, or personal details? Has the user expressed expectations about how you should behave, their work style, or ways they want you to operate? If so, save using the memory tool.

**Skills**: Was a complex, non-trivial approach used to complete a task — one that required trial and error, multiple tool calls, or changing course? If so, save a reusable procedure using the skill tool with action 'create'. Include: when to use it, step-by-step procedure, pitfalls to avoid, and how to verify success.

Only act if there's something genuinely worth saving. If nothing stands out, just say 'Nothing to save.' and stop.`;
```

**Note on pi.exec() child tools**: The child `pi` process loads the same installed extension, so it has access to both the `memory` and `skill` tools. This is the same mechanism that makes the existing memory tool work in background review.

**`src/index.ts`** — Wire SkillStore, registerSkillTool, setupSkillAutoTrigger, registerSkillsCommand; inject skill index into system prompt at `before_agent_start`. Pass `config.memoryDir + "/skills/"` directly to SkillStore constructor (no memoryDirPath getter needed).

### Progressive Disclosure
- **System prompt**: Skill index only (name + description per skill, ~3K tokens max)
- **On demand**: Agent calls `skill` tool with action `view` to load full content
- **Frozen snapshot**: Index captured at `session_start`, same as memory snapshot

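
Below is a sketch of the index formatting with a size cap. The plan gives the budget in tokens (~3K). This sketch approximates that budget at about 4 characters per token, which is a rough heuristic and not Pi's tokenizer. The header wording, the `SkillIndexEntry` shape, and the standalone `formatSkillIndex` function (rather than the `formatIndexForSystemPrompt()` method) are assumptions.

```typescript
interface SkillIndexEntry {
  name: string;
  description: string;
}

const CHARS_PER_TOKEN = 4; // rough heuristic, not a real tokenizer

export function formatSkillIndex(skills: SkillIndexEntry[], maxTokens = 3000): string {
  if (skills.length === 0) return "";
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const lines = ["## Available skills (load one with the skill tool, action 'view')"];
  let used = lines[0].length;
  for (let i = 0; i < skills.length; i++) {
    const line = `- ${skills[i].name}: ${skills[i].description}`;
    if (used + line.length + 1 > maxChars) {
      // Budget exhausted: report how many were left out (this note may slightly exceed the cap).
      lines.push(`- (${skills.length - i} more skills not shown)`);
      break;
    }
    lines.push(line);
    used += line.length + 1; // +1 for the joining newline
  }
  return lines.join("\n");
}
```

Because the function builds only name and description lines, full skill bodies never reach the system prompt. Those bodies stay behind the `view` action.
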
### Key Decision: Frozen Snapshot for Skills
Skill index is captured at `session_start` and injected at `before_agent_start`. New skills created mid-session appear in the index in the next session. This preserves Pi's prompt cache.

---

## Epic 5: Documentation & Release

- Update `README.md` with new features, config options, commands
- Update `docs/ROADMAP.md` — mark v0.2 complete
- Bump `package.json` version to `0.2.0`
- `npm run check` passes, all tests pass
- Tag `v0.2.0`

---

## File Change Summary

### New Files (12)
| File | Lines | Epic |
|---|---|---|
| `src/handlers/auto-consolidate.ts` | ~120 | 2 |
| `src/handlers/consolidate-command.ts` | ~30 | 2 |
| `src/handlers/correction-detector.ts` | ~100 | 3 |
| `src/store/skill-store.ts` | ~250 | 1 |
| `src/tools/skill-tool.ts` | ~180 | 1 |
| `src/handlers/skill-auto-trigger.ts` | ~80 | 1 |
| `src/handlers/skills-command.ts` | ~50 | 1 |
| `tests/handlers/auto-consolidate.test.ts` | ~120 | 2 |
| `tests/handlers/correction-detector.test.ts` | ~150 | 3 |
| `tests/store/skill-store.test.ts` | ~200 | 1 |
| `tests/tools/skill-tool.test.ts` | ~100 | 1 |
| `tests/handlers/skill-auto-trigger.test.ts` | ~80 | 1 |

### Modified Files (8)
| File | Epic(s) |
|---|---|
| `src/types.ts` | 2, 3, 4 |
| `src/constants.ts` | 1, 2, 3, 4 |
| `src/config.ts` | 2, 3, 4 |
| `src/store/memory-store.ts` | 1, 2 |
| `src/tools/memory-tool.ts` | 2 |
| `src/handlers/background-review.ts` | 4 |
| `src/index.ts` | 1, 2, 3, 4 |
| `tests/handlers/background-review.test.ts` | 4 |

---

## Verification

After each epic:
1. `npm run check` — zero type errors
2. `npm test` — all tests pass
3. Manual test: `pi -e ./src/index.ts` — verify the feature works in a live session

Final:

4. Full regression: all 119 existing tests + new tests pass
5. Tag v0.2.0

package/docs/0.2/TASKS.md
ADDED
@@ -0,0 +1,134 @@
# Tasks — v0.2.0: Skills + Smart Curation

> **Workflow**: When you start a task, change `[ ]` to `[~]`. When done, change to `[x]` and note the commit hash.
>
> **Implementation order**: Epic 2 → Epic 3 → Epic 4 → Epic 1 → Epic 5 (quick wins first, then the largest piece)
>
> **Plan**: See `docs/0.2/PLAN.md` for full implementation details and architectural decisions.

---

## Epic 2: Auto-Consolidation

_Done when: memory full no longer returns an error — it triggers automatic consolidation and retries the add._

### Shared Config (Epics 2-4 touch these files — do once, extend per epic)
- [x] `src/types.ts` — add `autoConsolidate: boolean` to `MemoryConfig`; add `ConsolidationResult` interface (`c6317dd`)
- [x] `src/config.ts` — add `autoConsolidate: true` default + parsing (`c6317dd`)
- [x] `src/constants.ts` — add `CONSOLIDATION_PROMPT` (`c6317dd`)

### Implementation
- [x] `src/store/memory-store.ts` — make `add()` async, add `setConsolidator()` injection method; after consolidation: `await this.loadFromDisk()` before retry (`c6317dd`)
- [x] `src/tools/memory-tool.ts` — `await store.add(target, content)` (`c6317dd`)
- [x] `src/handlers/auto-consolidate.ts` — `triggerConsolidation()` using `pi.exec()` pattern (`c6317dd`)
- [x] `src/handlers/consolidate-command.ts` — `/memory-consolidate` command (`c6317dd` — combined into `auto-consolidate.ts`)
- [x] `src/index.ts` — wire consolidator via `store.setConsolidator()` + register command (`c6317dd`)

### Tests
- [x] `tests/handlers/auto-consolidate.test.ts` — consolidation trigger, pi.exec call, success/failure paths (`83e7c46`)
- [x] `tests/store/memory-store.test.ts` — migrate all `store.add()` calls to `await store.add()`; consolidator tests (`83e7c46`)

---

## Epic 3: Correction Detection + Immediate Save

_Done when: user corrections are detected in real-time and trigger an immediate memory save._

### Config
- [x] `src/types.ts` — add `correctionDetection: boolean` to `MemoryConfig` (`c6317dd`)
- [x] `src/config.ts` — add `correctionDetection: true` default + parsing (`c6317dd`)
- [x] `src/constants.ts` — add `CORRECTION_SAVE_PROMPT`, strong/weak/negative pattern arrays (`c6317dd`)

### Implementation
- [x] `src/handlers/correction-detector.ts` — two-pass filter: strong/weak/negative patterns (`c6317dd`)
- [x] Rate limiting — `turnsSinceLastCorrection >= 3` and `!correctionInProgress` guard (`c6317dd`)
- [x] `src/index.ts` — wire `setupCorrectionDetector()` (`c6317dd`)

### Tests
- [x] `tests/handlers/correction-detector.test.ts` — 35 tests: strong/weak/negative patterns, rate limiting, false positives (`83e7c46`)

---

## Epic 4: Tool-Call-Aware Nudge

_Done when: background review triggers based on EITHER turn count OR tool call count._

### Config
- [x] `src/types.ts` — add `nudgeToolCalls: number` to `MemoryConfig` (`c6317dd`)
- [x] `src/config.ts` — add `nudgeToolCalls: 15` default + parsing (`c6317dd`)

### Implementation
- [x] `src/handlers/background-review.ts` — count toolCall entries from branch; OR trigger logic; reset both counters (`c6317dd`)

### Tests
- [x] `tests/handlers/background-review.test.ts` — 6 new tests: tool-call trigger, combined trigger, counter reset, text-only, crash recovery (`83e7c46`)

---

## Epic 1: Skill Tool + Procedural Memory

_Done when: the agent can create/update/delete skill documents, skills appear in a progressive index in the system prompt, and skills are auto-created after complex tasks._

### Research & Design
- [x] Read Pi's skill discovery API — Pi uses `~/.pi/agent/skills/` with SKILL.md frontmatter format (`c6317dd`)
- [x] Decide: write to `~/.pi/agent/memory/skills/` — isolated from user skills (`c6317dd`)
- [x] Read Hermes `skill_manage` tool source for reference patterns (`c6317dd`)

### Store
- [x] `src/store/skill-store.ts` — `SkillStore` class with full CRUD + `formatIndexForSystemPrompt()` (`c6317dd`)
- [x] SKILL.md format — frontmatter (name, description, version, created, updated) + markdown body (`c6317dd`)
- [x] File naming — `slugify(name) + ".md"` (`c6317dd`)
- [x] Frontmatter parsing — regex-based, no yaml dependency (`c6317dd`)
- [x] Content scanning — all writes go through `scanContent()` (`c6317dd`)
- [x] Atomic writes — temp+rename pattern (`c6317dd`)

### Tool
- [x] `src/tools/skill-tool.ts` — `registerSkillTool()` with actions: `create`, `view`, `patch`, `edit`, `delete` (`c6317dd`)
- [x] `src/constants.ts` — add `SKILL_TOOL_DESCRIPTION` and `DEFAULT_SKILL_TRIGGER_TOOL_CALLS` (= 8) (`c6317dd`)
- [x] Rewrite `COMBINED_REVIEW_PROMPT` — references skill tool with create/patch actions (`c6317dd`)

### Progressive Disclosure
- [x] Skill index (name + description only) injected into system prompt at `before_agent_start` (`c6317dd`)
- [x] `view` action loads full skill content on demand (`c6317dd`)
- [x] Frozen snapshot — index captured at `session_start`, consistent throughout session (`c6317dd`)

### Auto-Trigger
- [x] `src/handlers/skill-auto-trigger.ts` — 8+ tool calls with 2+ distinct tool types (`c6317dd`)
- [x] Rate limit — max 1 auto-trigger per session (`c6317dd`)

### Command
- [x] `src/handlers/skills-command.ts` — `/memory-skills` command (`c6317dd`)

### Wiring
- [x] `src/index.ts` — wire SkillStore, registerSkillTool, setupSkillAutoTrigger, registerSkillsCommand (`c6317dd`)

### Tests
- [x] `tests/store/skill-store.test.ts` — 27 tests: CRUD, frontmatter, progressive disclosure, atomic writes (`83e7c46`)
- [x] `tests/tools/skill-tool.test.ts` — 10 tests: registration, action dispatch, validation (`83e7c46`)
- [x] `tests/handlers/skill-auto-trigger.test.ts` — 6 tests: threshold, distinct types, session limit (`83e7c46`)

---

## Epic 5: Documentation & Release

_Done when: v0.2.0 is tagged and released with updated docs._

- [x] Update `README.md` — skill tool, auto-consolidation, correction detection, new config, new commands (`4658529`)
- [x] Update `src/constants.ts` — verify all new prompts are finalized (`c6317dd`)
- [x] Update `docs/ROADMAP.md` — v0.2 roadmap documented (`d5b7518`)
- [x] `npm run check` passes with zero errors (`c6317dd`)
- [x] `npm test` — all 218 tests pass (`83e7c46`)
- [ ] Bump `package.json` version to `0.2.0`
- [ ] Tag v0.2.0 release

---

## Summary

| Epic | Priority | Est. Complexity | New Files | Modified Files |
|---|---|---|---|---|
| 2: Auto-Consolidation | HIGH | Low | 3 (2 src + 1 test) | 6 (types, config, constants, memory-store, memory-tool, index) |
| 3: Correction Detection | HIGH | Low | 2 (src + test) | 4 (types, config, constants, index) |
| 4: Tool-Call Nudge | MEDIUM | Low | 0 | 4 (types, config, background-review, test) |
| 1: Skill Tool | CRITICAL | High | 7 (4 src + 3 test) | 3 (constants, index, memory-store) |
| 5: Documentation | NORMAL | Low | 0 | 4 (README, constants, ROADMAP, package.json) |

package/docs/0.2/TEST-PLAN.md
ADDED
@@ -0,0 +1,216 @@
# Test Plan — v0.2.0: Skills + Smart Curation
|
|
2
|
+
|
|
3
|
+
> This document defines the test strategy for v0.2.0. Each section maps to an epic in the implementation plan.
|
|
4
|
+
|
|
5
|
+
## Current State
|
|
6
|
+
|
|
7
|
+
- **119 existing tests** — all passing after `add()` async migration
|
|
8
|
+
- **Zero new tests** yet for v0.2 features
|
|
9
|
+
- **Type check**: `npm run check` passes with zero errors
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Epic 2: Auto-Consolidation
|
|
14
|
+
|
|
15
|
+
### Unit Tests: `tests/handlers/auto-consolidate.test.ts`
|
|
16
|
+
|
|
17
|
+
| Test | What | Expected |
|
|
18
|
+
|---|---|---|
|
|
19
|
+
| `triggerConsolidation builds correct prompt` | Verify prompt includes current entries + CONSOLIDATION_PROMPT | Prompt contains entry text and consolidation instructions |
|
|
20
|
+
| `triggerConsolidation returns consolidated on success` | Mock `pi.exec()` to return code 0 | `{ consolidated: true }` |
|
|
21
|
+
| `triggerConsolidation returns error on failure` | Mock `pi.exec()` to return code 1 | `{ consolidated: false, error: "..." }` |
|
|
22
|
+
| `triggerConsolidation returns error on exception` | Mock `pi.exec()` to throw | `{ consolidated: false, error: "Consolidation failed..." }` |
|
|
23
|
+
| `/memory-consolidate consolidates both targets` | Mock handler, verify both "memory" and "user" get consolidated | UI notification contains both results |
|
|
24
|
+
| `/memory-consolidate skips empty targets` | Store has empty user entries | Report shows "(empty, nothing to consolidate)" for that target |
|
|
25
|
+
|
|
26
|
+
### Integration Tests: `tests/store/memory-store.test.ts` (extend existing)
|
|
27
|
+
|
|
28
|
+
| Test | What | Expected |
|
|
29
|
+
|---|---|---|
|
|
30
|
+
| `add() triggers consolidation when over limit` | Config `autoConsolidate: true`, mock consolidator returns success, entry exceeds limit | `add()` succeeds after consolidation + reload |
|
|
31
|
+
| `add() retries once after consolidation` | Verify consolidator called once, then `loadFromDisk()` called, then add succeeds | Entry appears in entries |
|
|
32
|
+
| `add() falls through to error if consolidation fails` | Mock consolidator returns `{ consolidated: false }` | Error about exceeding limit |
|
|
33
|
+
| `add() skips consolidation when disabled` | Config `autoConsolidate: false` | Error about exceeding limit (no consolidator call) |
|
|
34
|
+
| `add() skips consolidation when no consolidator set` | `setConsolidator()` not called | Error about exceeding limit |
|
|
35
|
+
| `add() consolidates for user target too` | Same test but target is "user" | Works identically |
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## Epic 3: Correction Detection
|
|
40
|
+
|
|
41
|
+
### Unit Tests: `tests/handlers/correction-detector.test.ts`
|
|
42
|
+
|
|
43
|
+
#### Pattern Matching: `isCorrection()`
|
|
44
|
+
|
|
45
|
+
**Strong patterns (always trigger):**
|
|
46
|
+
| Input | Expected |
|
|
47
|
+
|---|---|
|
|
48
|
+
| `"don't do that"` | `true` |
|
|
49
|
+
| `"not like that"` | `true` |
|
|
50
|
+
| `"I said use yarn"` | `true` |
|
|
51
|
+
| `"I told you already"` | `true` |
|
|
52
|
+
| `"we already discussed this"` | `true` |
|
|
53
|
+
| `"please don't commit yet"` | `true` |
|
|
54
|
+
| `"that's not what I asked for"` | `true` |
|
|
55
|
+
|
|
56
|
+
**Weak patterns (need directive clause):**
|
|
57
|
+
| Input | Expected | Why |
|
|
58
|
+
|---|---|---|
|
|
59
|
+
| `"no, use yarn instead"` | `true` | "use" is directive |
|
|
60
|
+
| `"wrong, the file is in src/"` | `true` | "the" is directive |
|
|
61
|
+
| `"actually, don't use that"` | `true` | "don't" is directive |
|
|
62
|
+
| `"stop, fix the test first"` | `true` | "fix" is directive |
|
|
63
|
+
| `"no worries, I'll handle it"` | `false` | Negative pattern suppresses |
|
|
64
|
+
| `"no problem"` | `false` | Negative pattern suppresses |
|
|
65
|
+
| `"no thanks"` | `false` | Negative pattern suppresses |
|
|
66
|
+
| `"no need to change that"` | `false` | Negative pattern suppresses |
|
|
67
|
+
| `"actually, that looks great"` | `false` | Negative pattern suppresses |
|
|
68
|
+
| `"actually, perfect"` | `false` | Negative pattern suppresses |
|
|
69
|
+
| `"stop there"` | `false` | Negative pattern suppresses |
|
|
70
|
+
|
|
71
|
+
**Non-corrections (should NOT trigger):**
|
|
72
|
+
| Input | Expected |
|
|
73
|
+
|---|---|
|
|
74
|
+
| `"yes, do that"` | `false` |
|
|
75
|
+
| `"looks good"` | `false` |
|
|
76
|
+
| `"can you also check the tests?"` | `false` |
|
|
77
|
+
| `""` | `false` |
|
|
78
|
+
| `"thanks"` | `false` |
|
|
79
|
+
|
|
80
|
+
#### Handler Behavior
|
|
81
|
+
|
|
82
|
+
| Test | What | Expected |
|
|
83
|
+
|---|---|---|
|
|
84
|
+
| `triggers pi.exec on correction` | Emit user message "no, don't use npm", then turn_end | `pi.exec()` called with CORRECTION_SAVE_PROMPT |
|
|
85
|
+
| `does not trigger on normal message` | Emit user message "looks good", then turn_end | `pi.exec()` NOT called |
|
|
86
|
+
| `rate limits: 1 per 3 turns` | Emit correction at turn 1, correction at turn 2 | Only first correction triggers save |
|
|
87
|
+
| `rate limit resets after 3 turns` | Emit correction at turn 1, then normal turns 2-4, then correction at turn 5 | Both corrections trigger save |
|
|
88
|
+
| `does not trigger when in progress` | Emit correction, then another correction before first completes | Only first triggers |
|
|
89
|
+
| `disabled via config` | Config `correctionDetection: false` | No handler registered |
|
|
90
|
+
| `includes recent context (last 6 exchanges)` | Verify prompt content | Prompt includes recent messages + current memory |
|
|
91
|
+
|
|
92
|
+
---
|
|
93
|
+
|
|
94
|
+
## Epic 4: Tool-Call-Aware Nudge
|
|
95
|
+
|
|
96
|
+
### Unit Tests: `tests/handlers/background-review.test.ts` (extend existing)
|
|
97
|
+
|
|
98
|
+
| Test | What | Expected |
|
|
99
|
+
|---|---|---|
|
|
100
|
+
| `triggers on turn count threshold` | `turnsSinceReview >= 10`, `toolCallsSinceReview < 15` | Review triggers |
|
|
101
|
+
| `triggers on tool call count threshold` | `turnsSinceReview < 10`, `toolCallsSinceReview >= 15` | Review triggers |
|
|
102
|
+
| `triggers when both thresholds met` | Both thresholds exceeded | Review triggers |
|
|
103
|
+
| `does not trigger when neither threshold met` | Both below threshold | No review |
|
|
104
|
+
| `resets both counters after review` | After review completes | Both counters = 0 |
|
|
105
|
+
| `counts toolCall blocks from branch` | Branch has 3 assistant messages with 2 toolCall blocks each | `toolCallsSinceReview = 6` |
|
|
106
|
+
| `ignores text blocks in content` | Branch has text blocks only | `toolCallsSinceReview = 0` |
|
|
107
|
+
| `falls back gracefully if branch access fails` | `sessionManager.getBranch()` throws | No crash, continues with turn-based only |
|
|
108
|
+

---

## Epic 1: Skill Tool + Procedural Memory

### Unit Tests: `tests/store/skill-store.test.ts`

#### CRUD Operations

| Test | What | Expected |
|---|---|---|
| `create() writes SKILL.md with correct frontmatter` | Create skill with name, description, body | File exists with `---\nname: ...\n---\n` format |
| `create() slugifies name correctly` | Name: `"Debug TypeScript Errors!"` | File: `debug-typescript-errors.md` |
| `create() returns error for duplicate name` | Create the same skill twice | Second call returns an error about the existing skill |
| `create() returns error for empty name` | `name: ""` | Error: "Skill name is required." |
| `create() returns error for empty description` | Valid name, empty description | Error: "Skill description is required." |
| `create() returns error for empty body` | Valid name/description, empty body | Error: "Skill body is required." |
| `create() scans content for security` | Body contains an injection pattern | Error: "Blocked: content matches threat pattern" |
| `loadIndex() returns all skills` | Create 3 skills, call `loadIndex()` | Returns 3 `SkillIndex` entries |
| `loadIndex() returns empty array when no skills` | Empty skills dir | Returns `[]` |
| `loadSkill() returns full document` | Load a specific `.md` file | Returns a `SkillDocument` with all fields |
| `loadSkill() returns null for missing file` | Nonexistent file | Returns `null` |
| `loadSkill() returns null for missing frontmatter` | File without `---` frontmatter | Returns `null` |
| `patch() replaces existing section` | Skill has `## Procedure`; patch it | New content replaces the old section |
| `patch() appends new section` | Skill has no `## Pitfalls`; patch it | Section appended |
| `patch() increments version` | Patch a skill | Version goes from 1 → 2 |
| `patch() scans content` | New content has an injection pattern | Blocked |
| `edit() replaces description and body` | Edit with new description + body | Both updated |
| `edit() replaces only description` | Edit with new description, empty body | Only description changes |
| `edit() increments version` | Edit a skill | Version incremented |
| `delete() removes file` | Delete a skill | File gone; `loadIndex()` returns one fewer |
| `delete() returns error for missing file` | Delete a nonexistent skill | Error: "not found" |
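
Two of the CRUD behaviors are easy to pin down in isolation: slug generation and section-level patching. A sketch with hypothetical `slugify` and `patchSection` helpers, assuming ASCII-only slugs, sections that are level-2 (`## `) headings, and a section that runs until the next level-2 heading:

```typescript
/** Lowercases, collapses non-alphanumeric runs to "-", and trims stray dashes. */
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

/**
 * Replaces the body of `## <section>` up to the next level-2 heading,
 * or appends the section at the end if it does not exist yet.
 */
function patchSection(body: string, section: string, content: string): string {
  const heading = `## ${section}`;
  const lines = body.split("\n");
  const start = lines.findIndex((line) => line.trim() === heading);
  const block = [heading, content.trim()];

  if (start === -1) {
    // Section missing: append it after a blank line.
    const existing = body.trimEnd();
    return (existing ? `${existing}\n\n` : "") + block.join("\n") + "\n";
  }

  // Section ends at the next "## " heading; "### " subsections stay inside it.
  let end = lines.findIndex((line, i) => i > start && line.startsWith("## "));
  if (end === -1) end = lines.length;
  const rest = lines.slice(end);
  return [...lines.slice(0, start), ...block, ...(rest.length > 0 ? ["", ...rest] : [])].join("\n");
}
```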

#### Frontmatter Parsing

| Test | What | Expected |
|---|---|---|
| `parses standard frontmatter` | `---\nname: foo\ndescription: bar\n---\nbody` | `{ meta: { name: "foo", description: "bar" }, body: "body" }` |
| `handles value with colons` | `description: uses this: that` | `meta.description = "uses this: that"` |
| `handles empty body` | Frontmatter only, no body after the closing `---` | `body = ""` |
| `handles no frontmatter` | Plain markdown without `---` | Returns `{ meta: {}, body: raw }` |
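
A frontmatter parser that satisfies these four rows can split each line on its first colon only. This sketch assumes flat `key: value` pairs (no YAML lists or quoting) and a hypothetical `parseFrontmatter` returning `{ meta, body }`; a real YAML parser would be needed for anything richer:

```typescript
interface ParsedFrontmatter {
  meta: Record<string, string>;
  body: string;
}

function parseFrontmatter(raw: string): ParsedFrontmatter {
  // Frontmatter must open on the first line; the lazy group stops at the first closing "---".
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)([\s\S]*)$/.exec(raw);
  if (!match) return { meta: {}, body: raw };

  const meta: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":"); // first colon only, so values may contain colons
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    if (key) meta[key] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2] ?? "" };
}
```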

#### Progressive Disclosure

| Test | What | Expected |
|---|---|---|
| `formatIndexForSystemPrompt() returns formatted index` | 2 skills | String with skill names + descriptions |
| `formatIndexForSystemPrompt() returns empty when no skills` | No skills | Returns `""` |
| `index does not include body content` | Skills have long bodies | Index only shows name + description |
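
The progressive-disclosure rows boil down to: surface names and descriptions, never bodies. A sketch of a hypothetical `formatSkillIndex` helper that mirrors what the `formatIndexForSystemPrompt()` tests check; the input shape, header text, and line format are assumptions:

```typescript
// Hypothetical shape; the package's SkillIndex / SkillDocument types may differ.
interface SkillSummarySource {
  name: string;
  description: string;
  fileName: string;
  body: string;
}

/** Builds the system-prompt index: one line per skill, bodies deliberately omitted. */
function formatSkillIndex(skills: SkillSummarySource[]): string {
  if (skills.length === 0) return ""; // no skills: inject nothing into the prompt
  const lines = skills.map((s) => `- ${s.name} (${s.fileName}): ${s.description}`);
  return ["Available skills (load one with the skill tool's `view` action):", ...lines].join("\n");
}
```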

#### Atomic Writes

| Test | What | Expected |
|---|---|---|
| `create() uses atomic write (file exists after create)` | Create skill, read file | File exists with correct content |
| `file content is correct after create + patch` | Create then patch | File on disk reflects patch |
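
A common way to get the atomic-write behavior these tests rely on is write-to-temp-then-rename in the same directory. A sketch using Node's `fs` module with a hypothetical `writeFileAtomic` helper. On POSIX filesystems the rename replaces the target in one step, so readers never see a half-written file; it does not `fsync`, so it is not durable across power loss:

```typescript
import { mkdtempSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, dirname, join } from "node:path";

/** Writes `data` to a temp file beside `target`, then renames it over `target`. */
function writeFileAtomic(target: string, data: string): void {
  // Same directory as the target so the rename never crosses filesystems.
  const tmp = join(dirname(target), `.${basename(target)}.${process.pid}.${Date.now()}.tmp`);
  writeFileSync(tmp, data, "utf8");
  try {
    renameSync(tmp, target);
  } catch (err) {
    rmSync(tmp, { force: true }); // don't leave the temp file behind
    throw err;
  }
}

// Usage: create a skill file, then overwrite it as a patch would.
const dir = mkdtempSync(join(tmpdir(), "skills-"));
const file = join(dir, "debug-typescript-errors.md");
writeFileAtomic(file, "---\nname: Debug TypeScript Errors\n---\nv1\n");
writeFileAtomic(file, "---\nname: Debug TypeScript Errors\n---\nv2\n");
console.log(readFileSync(file, "utf8"));
```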

### Unit Tests: `tests/tools/skill-tool.test.ts`

| Test | What | Expected |
|---|---|---|
| `registers tool with name 'skill'` | Check tool registration | Tool name = "skill" |
| `create requires name, description, content` | Missing each param | Error for each missing param |
| `view without file_name lists all skills` | No `file_name` param | Returns skill index |
| `view with file_name returns full document` | Valid `file_name` | Returns `SkillDocument` |
| `view with invalid file_name returns error` | Nonexistent file | Error: "not found" |
| `patch requires file_name, section, content` | Missing params | Error for each |
| `edit requires file_name` | No `file_name` | Error |
| `delete requires file_name` | No `file_name` | Error |
| `unknown action returns error` | `action: "foo"` | Error: "Unknown action" |
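
The parameter-validation rows can be driven from a per-action table of required fields. A sketch with a hypothetical `validateSkillParams` helper; the action names and parameters come from the rows above, while the error wording beyond "Unknown action" is an assumption:

```typescript
type SkillAction = "create" | "view" | "patch" | "edit" | "delete";

interface SkillParams {
  action: string;
  name?: string;
  description?: string;
  content?: string;
  file_name?: string;
  section?: string;
}

// Required parameters per action, taken from the test rows.
const REQUIRED: Record<SkillAction, (keyof SkillParams)[]> = {
  create: ["name", "description", "content"],
  view: [], // file_name is optional: omitted means "list all skills"
  patch: ["file_name", "section", "content"],
  edit: ["file_name"],
  delete: ["file_name"],
};

/** Returns an error message, or null when the call is well-formed. */
function validateSkillParams(params: SkillParams): string | null {
  // hasOwnProperty avoids treating inherited keys like "toString" as actions.
  if (!Object.prototype.hasOwnProperty.call(REQUIRED, params.action)) {
    return `Unknown action: ${params.action}`;
  }
  // Empty strings count as missing, matching the empty-name/description checks.
  const missing = REQUIRED[params.action as SkillAction].filter((key) => !params[key]);
  return missing.length > 0 ? `Missing required parameter(s): ${missing.join(", ")}` : null;
}
```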

### Unit Tests: `tests/handlers/skill-auto-trigger.test.ts`

| Test | What | Expected |
|---|---|---|
| `triggers at 8+ tool calls with 2+ types` | Branch has 8 toolCall blocks with 3 distinct tool names | `pi.exec()` called |
| `does not trigger below 8 tool calls` | Branch has 7 toolCall blocks | Not triggered |
| `does not trigger with only 1 tool type` | 10 toolCall blocks, all same tool | Not triggered |
| `only triggers once per session` | Two `turn_end` events both meeting threshold | Only first triggers |
| `handles branch access failure gracefully` | `getBranch()` throws | No crash |
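
The auto-trigger rows combine three conditions: at least 8 `toolCall` blocks, at least 2 distinct tool names, and at most one trigger per session. A sketch with a hypothetical `SkillSuggestionTrigger` class, assuming a simplified message shape rather than pi's real branch types; the caller would invoke `pi.exec()` when `check()` returns `true`:

```typescript
// Simplified message shape (assumed; the real pi branch entries may differ).
type Block = { type: "text"; text: string } | { type: "toolCall"; name: string };
interface Msg {
  role: "user" | "assistant";
  content: Block[];
}

const MIN_TOOL_CALLS = 8;
const MIN_DISTINCT_TOOLS = 2;

class SkillSuggestionTrigger {
  private fired = false;

  /** Call on each turn_end; returns true at most once per session. */
  check(getBranch: () => Msg[]): boolean {
    if (this.fired) return false;
    let branch: Msg[] = [];
    try {
      branch = getBranch();
    } catch {
      return false; // branch unavailable: skip quietly instead of crashing
    }
    const toolNames: string[] = [];
    for (const msg of branch) {
      if (msg.role !== "assistant") continue;
      for (const block of msg.content) {
        if (block.type === "toolCall") toolNames.push(block.name);
      }
    }
    if (toolNames.length < MIN_TOOL_CALLS) return false;
    if (new Set(toolNames).size < MIN_DISTINCT_TOOLS) return false;
    this.fired = true;
    return true;
  }
}
```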

---

## Epic 5: Documentation & Release

### Manual Verification

| Check | Command | Expected |
|---|---|---|
| Type check passes | `npm run check` | Zero errors |
| All tests pass | `npm test` | 119+ tests, 0 failures |
| README updated | Manual review | Mentions skill tool, auto-consolidation, correction detection |
| ROADMAP updated | Manual review | v0.2 marked complete |
| Version bumped | `cat package.json \| grep version` | `"version": "0.2.0"` |
| Git tagged | `git tag -l "v0.2*"` | `v0.2.0` exists |

---

## Summary

| Area | New Tests | Existing Tests Modified |
|---|---|---|
| Auto-Consolidation | 6 + 6 | `memory-store.test.ts` (6 tests for async + consolidator) |
| Correction Detection | ~20 (patterns + handler) | — |
| Tool-Call Nudge | 8 | `background-review.test.ts` (extend) |
| Skill Store | ~25 | — |
| Skill Tool | ~10 | — |
| Skill Auto-Trigger | 5 | — |
| **Total** | **~80 new tests** | **~14 modified** |