pi-hermes-memory 0.1.0 → 0.2.1

@@ -0,0 +1,290 @@
1
+ # v0.2.0 Implementation Plan — Skills + Smart Curation
2
+
3
+ > **Goal**: Close the two biggest Hermes gaps — procedural memory (skills) and intelligent memory management (auto-consolidation, correction detection, tool-call-aware nudges).
4
+
5
+ ## Implementation Order
6
+
7
+ ```
8
+ Epic 2 (Auto-Consolidation) → standalone, modifies MemoryStore.add()
9
+ Epic 3 (Correction Detection) → standalone, new handler
10
+ Epic 4 (Tool-Call Nudge) → modifies background-review.ts
11
+ Epic 1 (Skill Tool) → largest: new store + tool + handlers
12
+ Epic 5 (Docs + Release) → depends on all above
13
+ ```
14
+
15
+ Epics 2, 3, 4 are independent but done sequentially to avoid merge conflicts in shared files (`types.ts`, `config.ts`, `constants.ts`, `index.ts`).
16
+
17
+ ---
18
+
19
+ ## Epic 2: Auto-Consolidation
20
+
21
+ **Problem**: When an `add()` would push a target over its character limit, we return an error. Hermes auto-consolidates instead.
22
+
23
+ ### New Files
24
+
25
+ **`src/handlers/auto-consolidate.ts`** (~120 lines)
26
+ ```typescript
27
+ export async function triggerConsolidation(
28
+ pi: ExtensionAPI,
29
+ store: MemoryStore,
30
+ target: "memory" | "user",
31
+ signal?: AbortSignal,
32
+ ): Promise<ConsolidationResult>
33
+ ```
34
+ - Builds prompt from `CONSOLIDATION_PROMPT` + current entries for the target
35
+ - Calls `pi.exec("pi", ["-p", "--no-session", prompt], { signal, timeout: 60000 })`
36
+ - Returns `{ consolidated: true }` on success, `{ consolidated: false, error }` on failure
37
+
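The three bullets above can be sketched roughly as follows. This is only an illustration under assumptions: `ExecApi`, `ExecResult`, and `EntrySource` are hypothetical minimal stand-ins for `ExtensionAPI` and `MemoryStore`, the `code` field on the exec result is inferred from the test plan ("return code 0"), and the prompt text is a placeholder for the real `CONSOLIDATION_PROMPT`.

```typescript
// Hypothetical minimal shapes; the real ExtensionAPI and MemoryStore are richer.
interface ExecResult { code: number }
interface ExecApi {
  exec(
    cmd: string,
    args: string[],
    opts: { signal?: AbortSignal; timeout: number },
  ): Promise<ExecResult>;
}
interface EntrySource { getEntries(target: "memory" | "user"): string[] }
interface ConsolidationResult { consolidated: boolean; error?: string }

// Placeholder text; the real CONSOLIDATION_PROMPT lives in src/constants.ts.
const CONSOLIDATION_PROMPT = "Merge overlapping entries and drop stale ones.";

async function triggerConsolidation(
  pi: ExecApi,
  store: EntrySource,
  target: "memory" | "user",
  signal?: AbortSignal,
): Promise<ConsolidationResult> {
  // Prompt = instructions + the current entries for this target.
  const entries = store.getEntries(target);
  const prompt = `${CONSOLIDATION_PROMPT}\n\nTarget: ${target}\n${entries.join("\n")}`;
  try {
    const result = await pi.exec("pi", ["-p", "--no-session", prompt], {
      signal,
      timeout: 60000,
    });
    if (result.code !== 0) {
      return { consolidated: false, error: `pi exited with code ${result.code}` };
    }
    return { consolidated: true };
  } catch (err) {
    // Never throw to the caller; report the failure in the result instead.
    const message = err instanceof Error ? err.message : String(err);
    return { consolidated: false, error: `Consolidation failed: ${message}` };
  }
}
```

Returning a result object on every path keeps `add()` free to fall through to its normal "over limit" error when consolidation does not succeed.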
38
+ **`src/handlers/consolidate-command.ts`** (~30 lines)
39
+ - Registers `/memory-consolidate` via `pi.registerCommand()`
40
+ - Runs consolidation for both targets, reports via `ctx.ui.notify()`
41
+
42
+ **`tests/handlers/auto-consolidate.test.ts`** (~120 lines)
43
+
44
+ ### Modified Files
45
+
46
+ **`src/constants.ts`** — Add `CONSOLIDATION_PROMPT`
47
+
48
+ **`src/types.ts`** — Add `autoConsolidate: boolean` to `MemoryConfig`; add `ConsolidationResult` interface
49
+
50
+ **`src/config.ts`** — Add `autoConsolidate: true` default + parsing
51
+
52
+ **`src/store/memory-store.ts`** — Key changes:
53
+ - `add()` becomes **async** (returns `Promise<MemoryResult>`)
54
+ - Add `setConsolidator()` method for dependency injection (avoids circular import)
55
+ - When over limit + consolidator set: call consolidator, **reload from disk** (`await this.loadFromDisk()`), then retry once
56
+ - **Critical**: The `pi.exec()` child process modifies files on disk. The parent's in-memory arrays become stale after consolidation. We MUST reload before retrying `add()` or the retry will overwrite consolidated entries with stale data.
57
+
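To make the reload-then-retry order concrete, here is a simplified sketch. It is not the real `MemoryStore`: `SketchMemoryStore`, its `disk` object (a stand-in for the files the child process rewrites), and the `list()` helper are hypothetical, and the `autoConsolidate` config flag is left out for brevity.

```typescript
type Target = "memory" | "user";
interface MemoryResult { success: boolean; error?: string }
type Consolidator = (target: Target) => Promise<{ consolidated: boolean }>;

// Simplified stand-in for MemoryStore. `disk` is a plain object standing in for
// the files on disk, so the sketch runs without touching the filesystem.
class SketchMemoryStore {
  private entries: Record<Target, string[]>;
  private consolidator?: Consolidator;
  private disk: Record<Target, string[]>;
  private charLimit: number;

  constructor(disk: Record<Target, string[]>, charLimit: number) {
    this.disk = disk;
    this.charLimit = charLimit;
    this.entries = { memory: [...disk.memory], user: [...disk.user] };
  }

  // Injected from index.ts so the store never imports from handlers.
  setConsolidator(fn: Consolidator): void {
    this.consolidator = fn;
  }

  async loadFromDisk(): Promise<void> {
    this.entries = { memory: [...this.disk.memory], user: [...this.disk.user] };
  }

  private fits(target: Target, content: string): boolean {
    const used = this.entries[target].reduce((n, e) => n + e.length, 0);
    return used + content.length <= this.charLimit;
  }

  async add(target: Target, content: string): Promise<MemoryResult> {
    if (!this.fits(target, content) && this.consolidator) {
      const result = await this.consolidator(target);
      // The consolidator rewrote the persisted entries; drop the stale arrays.
      if (result.consolidated) await this.loadFromDisk();
    }
    // Single re-check after the optional consolidation: this is the one retry.
    if (!this.fits(target, content)) {
      return { success: false, error: "Entry would exceed the character limit." };
    }
    this.entries[target].push(content);
    this.disk[target] = [...this.entries[target]];
    return { success: true };
  }

  list(target: Target): string[] {
    return [...this.entries[target]];
  }
}
```

If the reload were skipped, the final write would persist the pre-consolidation array plus the new entry, which is exactly the overwrite the **Critical** note warns about.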
58
+ **`src/tools/memory-tool.ts`** — `await store.add(target, content)` (line ~58)
59
+
60
+ **`src/index.ts`** — Wire consolidator + register command
61
+
62
+ **Test migration**: Making `add()` async means all existing tests calling `store.add()` must use `await`. Without `await`, tests get a Promise object instead of `MemoryResult`, causing assertion failures. Update all `store.add()` calls in `tests/store/memory-store.test.ts` to `await store.add()`.
63
+
64
+ ### Key Decision: Consolidator Injection via Setter
65
+ `MemoryStore` cannot import from handlers without creating a circular import. Instead, `index.ts` injects a consolidator function via `store.setConsolidator()` once both `store` and `pi` are available.
66
+
67
+ ### Key Decision: No memoryDirPath Getter
68
+ SkillStore receives its directory path directly from config (`config.memoryDir + "/skills/"`) in `index.ts`. No need to expose MemoryStore internals.
69
+
70
+ ---
71
+
72
+ ## Epic 3: Correction Detection + Immediate Save
73
+
74
+ **Problem**: When the user says "no, don't do that", we only save the correction at the next nudge, as much as 8 turns later. Hermes detects it immediately.
75
+
76
+ ### New Files
77
+
78
+ **`src/handlers/correction-detector.ts`** (~100 lines)
79
+ ```typescript
80
+ export function setupCorrectionDetector(
81
+ pi: ExtensionAPI,
82
+ store: MemoryStore,
83
+ config: MemoryConfig,
84
+ ): void
85
+ ```
86
+
87
+ **Design**:
88
+ 1. On `message_end` (role=user): check text against `CORRECTION_PATTERNS`, set `pendingCorrection = true`
89
+ 2. On `turn_end`: if `pendingCorrection`, trigger `pi.exec()` with `CORRECTION_SAVE_PROMPT` + recent messages + current memory
90
+ 3. Rate limit: only trigger when `turnsSinceLastCorrection >= 3` and `!correctionInProgress`
91
+
92
+ **Why turn_end, not message_end**: We need the full context (the user's correction plus what the agent got wrong) for the save prompt.
93
+
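The flag-then-fire flow and the rate limit can be sketched as a small state machine. `CorrectionGate`, `onSaveFinished()`, and the single placeholder pattern are hypothetical; the sketch also assumes a rate-limited correction is dropped rather than queued, which the plan does not specify.

```typescript
// Placeholder matcher; the real check uses CORRECTION_PATTERNS from src/constants.ts.
const looksLikeCorrection = (text: string): boolean => /don'?t do that/i.test(text);

class CorrectionGate {
  private pendingCorrection = false;
  private correctionInProgress = false;
  // Start at the threshold so the very first correction is allowed to fire.
  private turnsSinceLastCorrection = 3;

  // message_end with role=user: only record that a correction was seen.
  onUserMessage(text: string): void {
    if (looksLikeCorrection(text)) this.pendingCorrection = true;
  }

  // turn_end: returns true when the caller should start the pi.exec() save.
  onTurnEnd(): boolean {
    this.turnsSinceLastCorrection++;
    if (!this.pendingCorrection) return false;
    this.pendingCorrection = false;
    if (this.correctionInProgress || this.turnsSinceLastCorrection < 3) return false;
    this.correctionInProgress = true;
    this.turnsSinceLastCorrection = 0;
    return true;
  }

  // Call when the save settles, whether it succeeded or failed.
  onSaveFinished(): void {
    this.correctionInProgress = false;
  }
}
```

Keeping the decision in a pure object like this makes the rate-limit cases testable without mocking Pi events.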
94
+ **`tests/handlers/correction-detector.test.ts`** (~150 lines)
95
+
96
+ ### Modified Files
97
+
98
+ **`src/constants.ts`** — Add `CORRECTION_SAVE_PROMPT` and `CORRECTION_PATTERNS` (regex array)
99
+
100
+ **`src/types.ts`** — Add `correctionDetection: boolean` to `MemoryConfig`
101
+
102
+ **`src/config.ts`** — Add `correctionDetection: true` default + parsing
103
+
104
+ **`src/index.ts`** — Wire `setupCorrectionDetector()`
105
+
106
+ ### Correction Patterns (Two-Pass Filter)
107
+
108
+ Patterns are split into **strong** (high confidence, trigger immediately) and **weak** (need a directive clause to confirm).
109
+
110
+ **Strong patterns** (always trigger):
111
+ ```typescript
112
+ /don'?t do that/i, /not like that/i,
113
+ /^I said\b/i, /^I told you\b/i, /we already discussed/i,
114
+ /^please don'?t/i, /^that'?s not what I/i
115
+ ```
116
+
117
+ **Weak patterns** (trigger only if followed by a directive clause: a verb, or "the", "that", or "this"):
118
+ ```typescript
119
+ /^no[,.\s!]/i, /^wrong[,.\s!]/i, /^actually[,.\s]/i, /^stop[,.\s!]/i
120
+ ```
121
+
122
+ **Negative patterns** (suppress trigger even if a positive pattern matches):
123
+ ```typescript
124
+ /^no worries/i, /^no problem/i, /^no thanks/i, /^no need/i,
125
+ /^actually.{0,10}(looks? great|perfect|good|correct|right)/i,
126
+ /^stop.{0,5}(there|here|for now)/i
127
+ ```
128
+
129
+ This filters out false positives such as "no worries, I'll handle it" and "actually, that looks great" while still catching "no, don't use npm" and "actually, use yarn instead".
130
+
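A minimal sketch of how the three pattern groups might combine into one check. The pattern arrays are copied from the lists above; the `DIRECTIVE` regex is an assumption, with a short hypothetical verb list standing in for whatever directive check the handler actually uses.

```typescript
const STRONG: RegExp[] = [
  /don'?t do that/i, /not like that/i,
  /^I said\b/i, /^I told you\b/i, /we already discussed/i,
  /^please don'?t/i, /^that'?s not what I/i,
];
const WEAK: RegExp[] = [/^no[,.\s!]/i, /^wrong[,.\s!]/i, /^actually[,.\s]/i, /^stop[,.\s!]/i];
const NEGATIVE: RegExp[] = [
  /^no worries/i, /^no problem/i, /^no thanks/i, /^no need/i,
  /^actually.{0,10}(looks? great|perfect|good|correct|right)/i,
  /^stop.{0,5}(there|here|for now)/i,
];
// Assumption: a tiny verb list approximates "followed by a directive".
const DIRECTIVE = /^(the|that|this|use|don'?t|do|fix|run|put|keep|make|try|add|remove)\b/i;

function isCorrection(text: string): boolean {
  const input = text.trim();
  if (input === "") return false;
  // Negative patterns win even when a positive pattern also matches.
  if (NEGATIVE.some((re) => re.test(input))) return false;
  if (STRONG.some((re) => re.test(input))) return true;
  for (const re of WEAK) {
    const match = re.exec(input);
    if (!match) continue;
    // Inspect what follows the weak opener, ignoring punctuation and spaces.
    const rest = input.slice(match[0].length).replace(/^[\s,.!]+/, "");
    if (DIRECTIVE.test(rest)) return true;
  }
  return false;
}
```

Checking the negative list first is what lets "no worries" short-circuit before the weak `^no` pattern is even considered.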
131
+ ---
132
+
133
+ ## Epic 4: Tool-Call-Aware Nudge
134
+
135
+ **Problem**: The nudge is purely turn-count based, even though complex tasks with many tool calls tend to generate more valuable memories.
136
+
137
+ ### Modified Files
138
+
139
+ **`src/types.ts`** — Add `nudgeToolCalls: number` to `MemoryConfig`
140
+
141
+ **`src/config.ts`** — Add `nudgeToolCalls: 15` default + parsing
142
+
143
+ **`src/handlers/background-review.ts`** — Key changes:
144
+ - Count tool-use entries from `ctx.sessionManager.getBranch()` at `turn_end` time (robust — no unknown event names)
145
+ - Change trigger to OR logic: `turnsSinceReview >= nudgeInterval || toolCallsSinceReview >= nudgeToolCalls`
146
+ - Reset both counters on review
147
+
148
+ **`tests/handlers/background-review.test.ts`** — Add tool-call trigger tests
149
+
150
+ ### Key Decision: Count from Branch, Not Events
151
+ Rather than depending on Pi event names we have not confirmed, such as `tool_end`, count tool-use entries from `ctx.sessionManager.getBranch()` at `turn_end` time. This is more robust and easier to test.
152
+
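A sketch of the counting and the OR trigger, under an assumed branch shape. `BranchEntry` and `ContentBlock` are hypothetical, inferred from the test plan's wording ("assistant messages with toolCall blocks"); the real entries returned by `getBranch()` may look different. The sketch counts all tool calls on the branch; subtracting a baseline to get the "since review" figure is left out.

```typescript
// Assumed branch shape; not taken from Pi's actual types.
interface ContentBlock { type: string; name?: string }
interface BranchEntry { role: string; content: ContentBlock[] }

function countToolCalls(getBranch: () => BranchEntry[]): number {
  try {
    let count = 0;
    for (const entry of getBranch()) {
      if (entry.role !== "assistant") continue;
      count += entry.content.filter((block) => block.type === "toolCall").length;
    }
    return count;
  } catch {
    // Branch unreadable: report zero so only the turn-based trigger applies.
    return 0;
  }
}

function shouldReview(
  turnsSinceReview: number,
  toolCallsSinceReview: number,
  nudgeInterval: number,
  nudgeToolCalls: number,
): boolean {
  // Either threshold alone is enough to start a review.
  return turnsSinceReview >= nudgeInterval || toolCallsSinceReview >= nudgeToolCalls;
}
```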
153
+ ---
154
+
155
+ ## Epic 1: Skill Tool + Procedural Memory
156
+
157
+ **Problem**: `COMBINED_REVIEW_PROMPT` asks about skills but there's no skill tool. This is the single highest-leverage change.
158
+
159
+ ### New Files
160
+
161
+ **`src/store/skill-store.ts`** (~250 lines)
162
+ ```typescript
163
+ export class SkillStore {
164
+ constructor(private skillsDir: string) {}
165
+ async loadIndex(): Promise<SkillIndex[]>
166
+ async loadSkill(fileName: string): Promise<SkillDocument | null>
167
+ async create(name: string, description: string, body: string): Promise<SkillResult>
168
+ async patch(fileName: string, section: string, newContent: string): Promise<SkillResult>
169
+ async edit(fileName: string, description: string, body: string): Promise<SkillResult>
170
+ async delete(fileName: string): Promise<SkillResult>
171
+ formatIndexForSystemPrompt(): string
172
+ }
173
+ ```
174
+
175
+ **Storage**: `~/.pi/agent/memory/skills/` (isolated from user skills at `~/.pi/agent/skills/`)
176
+
177
+ **SKILL.md format**:
178
+ ```markdown
179
+ ---
180
+ name: debug-typescript-errors
181
+ description: Step-by-step approach to debugging TS errors in monorepos
182
+ version: 1
183
+ created: 2026-04-27
184
+ updated: 2026-04-27
185
+ ---
186
+ ## When to Use
187
+ ## Procedure
188
+ ## Pitfalls
189
+ ## Verification
190
+ ```
191
+
192
+ **Frontmatter parsing**: Simple regex (no yaml dependency). Split on `---`, parse key-value pairs.
193
+
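One way the regex-based parse could look, as a sketch only. The function name `parseFrontmatter` and the `{ meta, body }` return shape are assumptions (the latter borrowed from the test plan's "no frontmatter" case); values are kept as raw strings and no YAML features beyond `key: value` are handled.

```typescript
interface ParsedSkill { meta: Record<string, string>; body: string }

function parseFrontmatter(raw: string): ParsedSkill {
  // Frontmatter must open on the first line and close with its own `---` line.
  const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)([\s\S]*)$/.exec(raw);
  if (!match) return { meta: {}, body: raw };

  const meta: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    // Split on the first colon only, so values may themselves contain colons.
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2] };
}
```

Splitting on the first colon is the detail that keeps a description like `uses this: that` intact.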
194
+ **File naming**: `slugify(name) + ".md"` — lowercase, replace non-alphanum with `-`, collapse dashes.
195
+
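The naming rule is small enough to sketch in full. Trimming leading and trailing dashes is an assumption here; it is what makes a name ending in punctuation produce a clean file name, as the test plan's `"Debug TypeScript Errors!"` example expects.

```typescript
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each non-alphanumeric run becomes one dash
    .replace(/^-+|-+$/g, "");    // assumption: strip dashes at either end
}

// "Debug TypeScript Errors!" becomes "debug-typescript-errors.md"
const exampleFileName = slugify("Debug TypeScript Errors!") + ".md";
```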
196
+ **`src/tools/skill-tool.ts`** (~180 lines)
197
+ - Registered via `pi.registerTool()` with actions: `create`, `view`, `patch`, `edit`, `delete`
198
+ - Content scanning on all writes via `scanContent()`
199
+
200
+ **`src/handlers/skill-auto-trigger.ts`** (~80 lines)
201
+ - Track tool calls per turn
202
+ - When turn completes with **8+ tool calls** (not 5 — a typical read→bash→edit→bash→read is already 5), trigger skill extraction via `pi.exec()`
203
+ - Additionally require at least **2 distinct tool types** in the turn (e.g., read + bash, not just 8 reads) to filter trivial multi-call turns
204
+ - Rate limit: max 1 auto-trigger per session
205
+
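The three conditions above reduce to a short predicate plus one flag. In this sketch `SkillAutoTrigger`, `shouldTrigger()`, and `MIN_DISTINCT_TOOL_TYPES` are hypothetical names; only `DEFAULT_SKILL_TRIGGER_TOOL_CALLS` (= 8) comes from the plan.

```typescript
const DEFAULT_SKILL_TRIGGER_TOOL_CALLS = 8;
const MIN_DISTINCT_TOOL_TYPES = 2; // hypothetical constant name

class SkillAutoTrigger {
  private firedThisSession = false;

  // `toolNames` holds the tool name of every call made during the turn.
  shouldTrigger(toolNames: string[]): boolean {
    if (this.firedThisSession) return false; // max 1 auto-trigger per session
    if (toolNames.length < DEFAULT_SKILL_TRIGGER_TOOL_CALLS) return false;
    if (new Set(toolNames).size < MIN_DISTINCT_TOOL_TYPES) return false;
    this.firedThisSession = true;
    return true;
  }
}
```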
206
+ **`src/handlers/skills-command.ts`** (~50 lines)
207
+ - `/memory-skills` command listing all skills
208
+
209
+ **Test files**: `tests/store/skill-store.test.ts`, `tests/tools/skill-tool.test.ts`, `tests/handlers/skill-auto-trigger.test.ts`
210
+
211
+ ### Modified Files
212
+
213
+ **`src/constants.ts`** — Add `SKILL_TOOL_DESCRIPTION`, `DEFAULT_SKILL_TRIGGER_TOOL_CALLS` (= 8); update `COMBINED_REVIEW_PROMPT`:
214
+
215
+ ```typescript
216
+ export const COMBINED_REVIEW_PROMPT = `Review the conversation above and consider two things:
217
+
218
+ **Memory**: Has the user revealed things about themselves — their persona, desires, preferences, or personal details? Has the user expressed expectations about how you should behave, their work style, or ways they want you to operate? If so, save using the memory tool.
219
+
220
+ **Skills**: Was a complex, non-trivial approach used to complete a task — one that required trial and error, multiple tool calls, or changing course? If so, save a reusable procedure using the skill tool with action 'create'. Include: when to use it, step-by-step procedure, pitfalls to avoid, and how to verify success.
221
+
222
+ Only act if there's something genuinely worth saving. If nothing stands out, just say 'Nothing to save.' and stop.`;
223
+ ```
224
+
225
+ **Note on pi.exec() child tools**: The child `pi` process loads the same installed extension, so it has access to both the `memory` and `skill` tools. This is the same mechanism that makes the existing memory tool work in background review.
226
+
227
+ **`src/index.ts`** — Wire SkillStore, registerSkillTool, setupSkillAutoTrigger, registerSkillsCommand; inject skill index into system prompt at `before_agent_start`. Pass `config.memoryDir + "/skills/"` directly to SkillStore constructor (no memoryDirPath getter needed).
228
+
229
+ ### Progressive Disclosure
230
+ - **System prompt**: Skill index only (name + description per skill, ~3K tokens max)
231
+ - **On demand**: Agent calls `skill` tool with action `view` to load full content
232
+ - **Frozen snapshot**: Index captured at `session_start`, same as memory snapshot
233
+
234
+ ### Key Decision: Frozen Snapshot for Skills
235
+ The skill index is captured at `session_start` and injected at `before_agent_start`. Skills created mid-session appear in the index only in the next session. This preserves Pi's prompt cache.
236
+
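A sketch of the frozen-snapshot idea, assuming the index is formatted once and the resulting string is reused for every injection. `SkillIndexSnapshot`, `formatIndex()`, and the "Available skills:" heading are hypothetical; the real `formatIndexForSystemPrompt()` output format is not specified in this plan.

```typescript
interface SkillIndex { name: string; description: string }

// Name + description only; skill bodies are never included in the index.
function formatIndex(skills: SkillIndex[]): string {
  if (skills.length === 0) return "";
  return ["Available skills:", ...skills.map((s) => `- ${s.name}: ${s.description}`)].join("\n");
}

class SkillIndexSnapshot {
  private frozen = "";

  // session_start: read the index once and keep the formatted text.
  capture(skills: SkillIndex[]): void {
    this.frozen = formatIndex(skills);
  }

  // before_agent_start: return the captured text without re-reading the store.
  forSystemPrompt(): string {
    return this.frozen;
  }
}
```

Because the injected text is byte-identical across turns, the system prompt prefix stays stable for the rest of the session.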
237
+ ---
238
+
239
+ ## Epic 5: Documentation & Release
240
+
241
+ - Update `README.md` with new features, config options, commands
242
+ - Update `docs/ROADMAP.md` — mark v0.2 complete
243
+ - Bump `package.json` version to `0.2.0`
244
+ - `npm run check` passes, all tests pass
245
+ - Tag `v0.2.0`
246
+
247
+ ---
248
+
249
+ ## File Change Summary
250
+
251
+ ### New Files (12)
252
+ | File | Lines | Epic |
253
+ |---|---|---|
254
+ | `src/handlers/auto-consolidate.ts` | ~120 | 2 |
255
+ | `src/handlers/consolidate-command.ts` | ~30 | 2 |
256
+ | `src/handlers/correction-detector.ts` | ~100 | 3 |
257
+ | `src/store/skill-store.ts` | ~250 | 1 |
258
+ | `src/tools/skill-tool.ts` | ~180 | 1 |
259
+ | `src/handlers/skill-auto-trigger.ts` | ~80 | 1 |
260
+ | `src/handlers/skills-command.ts` | ~50 | 1 |
261
+ | `tests/handlers/auto-consolidate.test.ts` | ~120 | 2 |
262
+ | `tests/handlers/correction-detector.test.ts` | ~150 | 3 |
263
+ | `tests/store/skill-store.test.ts` | ~200 | 1 |
264
+ | `tests/tools/skill-tool.test.ts` | ~100 | 1 |
265
+ | `tests/handlers/skill-auto-trigger.test.ts` | ~80 | 1 |
266
+
267
+ ### Modified Files (8)
268
+ | File | Epic(s) |
269
+ |---|---|
270
+ | `src/types.ts` | 2, 3, 4 |
271
+ | `src/constants.ts` | 1, 2, 3, 4 |
272
+ | `src/config.ts` | 2, 3, 4 |
273
+ | `src/store/memory-store.ts` | 1, 2 |
274
+ | `src/tools/memory-tool.ts` | 2 |
275
+ | `src/handlers/background-review.ts` | 4 |
276
+ | `src/index.ts` | 1, 2, 3, 4 |
277
+ | `tests/handlers/background-review.test.ts` | 4 |
278
+
279
+ ---
280
+
281
+ ## Verification
282
+
283
+ After each epic:
284
+ 1. `npm run check` — zero type errors
285
+ 2. `npm test` — all tests pass
286
+ 3. Manual test: `pi -e ./src/index.ts` — verify the feature works in a live session
287
+
288
+ Final:
289
+ 4. Full regression: all 119 existing tests + new tests pass
290
+ 5. Tag v0.2.0
@@ -0,0 +1,134 @@
1
+ # Tasks — v0.2.0: Skills + Smart Curation
2
+
3
+ > **Workflow**: When you start a task, change `[ ]` to `[~]`. When done, change to `[x]` and note the commit hash.
4
+ >
5
+ > **Implementation order**: Epic 2 → Epic 3 → Epic 4 → Epic 1 → Epic 5 (quick wins first, then the largest piece)
6
+ >
7
+ > **Plan**: See `docs/0.2/PLAN.md` for full implementation details and architectural decisions.
8
+
9
+ ---
10
+
11
+ ## Epic 2: Auto-Consolidation
12
+
13
+ _Done when: memory full no longer returns an error — it triggers automatic consolidation and retries the add._
14
+
15
+ ### Shared Config (Epics 2-4 touch these files — do once, extend per epic)
16
+ - [x] `src/types.ts` — add `autoConsolidate: boolean` to `MemoryConfig`; add `ConsolidationResult` interface (`c6317dd`)
17
+ - [x] `src/config.ts` — add `autoConsolidate: true` default + parsing (`c6317dd`)
18
+ - [x] `src/constants.ts` — add `CONSOLIDATION_PROMPT` (`c6317dd`)
19
+
20
+ ### Implementation
21
+ - [x] `src/store/memory-store.ts` — make `add()` async, add `setConsolidator()` injection method; after consolidation: `await this.loadFromDisk()` before retry (`c6317dd`)
22
+ - [x] `src/tools/memory-tool.ts` — `await store.add(target, content)` (`c6317dd`)
23
+ - [x] `src/handlers/auto-consolidate.ts` — `triggerConsolidation()` using `pi.exec()` pattern (`c6317dd`)
24
+ - [x] `src/handlers/consolidate-command.ts` — `/memory-consolidate` command (`c6317dd` — combined into `auto-consolidate.ts`)
25
+ - [x] `src/index.ts` — wire consolidator via `store.setConsolidator()` + register command (`c6317dd`)
26
+
27
+ ### Tests
28
+ - [x] `tests/handlers/auto-consolidate.test.ts` — consolidation trigger, pi.exec call, success/failure paths (`83e7c46`)
29
+ - [x] `tests/store/memory-store.test.ts` — migrate all `store.add()` calls to `await store.add()`; consolidator tests (`83e7c46`)
30
+
31
+ ---
32
+
33
+ ## Epic 3: Correction Detection + Immediate Save
34
+
35
+ _Done when: user corrections are detected in real-time and trigger an immediate memory save._
36
+
37
+ ### Config
38
+ - [x] `src/types.ts` — add `correctionDetection: boolean` to `MemoryConfig` (`c6317dd`)
39
+ - [x] `src/config.ts` — add `correctionDetection: true` default + parsing (`c6317dd`)
40
+ - [x] `src/constants.ts` — add `CORRECTION_SAVE_PROMPT`, strong/weak/negative pattern arrays (`c6317dd`)
41
+
42
+ ### Implementation
43
+ - [x] `src/handlers/correction-detector.ts` — two-pass filter: strong/weak/negative patterns (`c6317dd`)
44
+ - [x] Rate limiting — `turnsSinceLastCorrection >= 3` and `!correctionInProgress` guard (`c6317dd`)
45
+ - [x] `src/index.ts` — wire `setupCorrectionDetector()` (`c6317dd`)
46
+
47
+ ### Tests
48
+ - [x] `tests/handlers/correction-detector.test.ts` — 35 tests: strong/weak/negative patterns, rate limiting, false positives (`83e7c46`)
49
+
50
+ ---
51
+
52
+ ## Epic 4: Tool-Call-Aware Nudge
53
+
54
+ _Done when: background review triggers based on EITHER turn count OR tool call count._
55
+
56
+ ### Config
57
+ - [x] `src/types.ts` — add `nudgeToolCalls: number` to `MemoryConfig` (`c6317dd`)
58
+ - [x] `src/config.ts` — add `nudgeToolCalls: 15` default + parsing (`c6317dd`)
59
+
60
+ ### Implementation
61
+ - [x] `src/handlers/background-review.ts` — count toolCall entries from branch; OR trigger logic; reset both counters (`c6317dd`)
62
+
63
+ ### Tests
64
+ - [x] `tests/handlers/background-review.test.ts` — 6 new tests: tool-call trigger, combined trigger, counter reset, text-only, crash recovery (`83e7c46`)
65
+
66
+ ---
67
+
68
+ ## Epic 1: Skill Tool + Procedural Memory
69
+
70
+ _Done when: the agent can create/update/delete skill documents, skills appear in a progressive index in the system prompt, and skills are auto-created after complex tasks._
71
+
72
+ ### Research & Design
73
+ - [x] Read Pi's skill discovery API — Pi uses `~/.pi/agent/skills/` with SKILL.md frontmatter format (`c6317dd`)
74
+ - [x] Decide: write to `~/.pi/agent/memory/skills/` — isolated from user skills (`c6317dd`)
75
+ - [x] Read Hermes `skill_manage` tool source for reference patterns (`c6317dd`)
76
+
77
+ ### Store
78
+ - [x] `src/store/skill-store.ts` — `SkillStore` class with full CRUD + `formatIndexForSystemPrompt()` (`c6317dd`)
79
+ - [x] SKILL.md format — frontmatter (name, description, version, created, updated) + markdown body (`c6317dd`)
80
+ - [x] File naming — `slugify(name) + ".md"` (`c6317dd`)
81
+ - [x] Frontmatter parsing — regex-based, no yaml dependency (`c6317dd`)
82
+ - [x] Content scanning — all writes go through `scanContent()` (`c6317dd`)
83
+ - [x] Atomic writes — temp+rename pattern (`c6317dd`)
84
+
85
+ ### Tool
86
+ - [x] `src/tools/skill-tool.ts` — `registerSkillTool()` with actions: `create`, `view`, `patch`, `edit`, `delete` (`c6317dd`)
87
+ - [x] `src/constants.ts` — add `SKILL_TOOL_DESCRIPTION` and `DEFAULT_SKILL_TRIGGER_TOOL_CALLS` (= 8) (`c6317dd`)
88
+ - [x] Rewrite `COMBINED_REVIEW_PROMPT` — references skill tool with create/patch actions (`c6317dd`)
89
+
90
+ ### Progressive Disclosure
91
+ - [x] Skill index (name + description only) injected into system prompt at `before_agent_start` (`c6317dd`)
92
+ - [x] `view` action loads full skill content on demand (`c6317dd`)
93
+ - [x] Frozen snapshot — index captured at `session_start`, consistent throughout session (`c6317dd`)
94
+
95
+ ### Auto-Trigger
96
+ - [x] `src/handlers/skill-auto-trigger.ts` — 8+ tool calls with 2+ distinct tool types (`c6317dd`)
97
+ - [x] Rate limit — max 1 auto-trigger per session (`c6317dd`)
98
+
99
+ ### Command
100
+ - [x] `src/handlers/skills-command.ts` — `/memory-skills` command (`c6317dd`)
101
+
102
+ ### Wiring
103
+ - [x] `src/index.ts` — wire SkillStore, registerSkillTool, setupSkillAutoTrigger, registerSkillsCommand (`c6317dd`)
104
+
105
+ ### Tests
106
+ - [x] `tests/store/skill-store.test.ts` — 27 tests: CRUD, frontmatter, progressive disclosure, atomic writes (`83e7c46`)
107
+ - [x] `tests/tools/skill-tool.test.ts` — 10 tests: registration, action dispatch, validation (`83e7c46`)
108
+ - [x] `tests/handlers/skill-auto-trigger.test.ts` — 6 tests: threshold, distinct types, session limit (`83e7c46`)
109
+
110
+ ---
111
+
112
+ ## Epic 5: Documentation & Release
113
+
114
+ _Done when: v0.2.0 is tagged and released with updated docs._
115
+
116
+ - [x] Update `README.md` — skill tool, auto-consolidation, correction detection, new config, new commands (`4658529`)
117
+ - [x] Update `src/constants.ts` — verify all new prompts are finalized (`c6317dd`)
118
+ - [x] Update `docs/ROADMAP.md` — v0.2 roadmap documented (`d5b7518`)
119
+ - [x] `npm run check` passes with zero errors (`c6317dd`)
120
+ - [x] `npm test` — all 218 tests pass (`83e7c46`)
121
+ - [x] Bump `package.json` version to `0.2.0`
122
+ - [x] Tag v0.2.0 release
123
+
124
+ ---
125
+
126
+ ## Summary
127
+
128
+ | Epic | Priority | Est. Complexity | New Files | Modified Files |
129
+ |---|---|---|---|---|
130
+ | 2: Auto-Consolidation | HIGH | Low | 3 (src + test) | 6 (types, config, constants, memory-store, memory-tool, index) |
131
+ | 3: Correction Detection | HIGH | Low | 2 (src + test) | 4 (types, config, constants, index) |
132
+ | 4: Tool-Call Nudge | MEDIUM | Low | 0 | 4 (types, config, background-review, test) |
133
+ | 1: Skill Tool | CRITICAL | High | 7 (4 src + 3 test) | 3 (constants, index, memory-store) |
134
+ | 5: Documentation | NORMAL | Low | 0 | 4 (README, constants, ROADMAP, package.json) |
@@ -0,0 +1,216 @@
1
+ # Test Plan — v0.2.0: Skills + Smart Curation
2
+
3
+ > This document defines the test strategy for v0.2.0. Each section maps to an epic in the implementation plan.
4
+
5
+ ## Current State
6
+
7
+ - **119 existing tests** — all passing after `add()` async migration
8
+ - **Zero new tests** yet for v0.2 features
9
+ - **Type check**: `npm run check` passes with zero errors
10
+
11
+ ---
12
+
13
+ ## Epic 2: Auto-Consolidation
14
+
15
+ ### Unit Tests: `tests/handlers/auto-consolidate.test.ts`
16
+
17
+ | Test | What | Expected |
18
+ |---|---|---|
19
+ | `triggerConsolidation builds correct prompt` | Verify prompt includes current entries + CONSOLIDATION_PROMPT | Prompt contains entry text and consolidation instructions |
20
+ | `triggerConsolidation returns consolidated on success` | Mock `pi.exec()` to return code 0 | `{ consolidated: true }` |
21
+ | `triggerConsolidation returns error on failure` | Mock `pi.exec()` to return code 1 | `{ consolidated: false, error: "..." }` |
22
+ | `triggerConsolidation returns error on exception` | Mock `pi.exec()` to throw | `{ consolidated: false, error: "Consolidation failed..." }` |
23
+ | `/memory-consolidate consolidates both targets` | Mock handler, verify both "memory" and "user" get consolidated | UI notification contains both results |
24
+ | `/memory-consolidate skips empty targets` | Store has empty user entries | Report shows "(empty, nothing to consolidate)" for that target |
25
+
26
+ ### Integration Tests: `tests/store/memory-store.test.ts` (extend existing)
27
+
28
+ | Test | What | Expected |
29
+ |---|---|---|
30
+ | `add() triggers consolidation when over limit` | Config `autoConsolidate: true`, mock consolidator returns success, entry exceeds limit | `add()` succeeds after consolidation + reload |
31
+ | `add() retries once after consolidation` | Verify consolidator called once, then `loadFromDisk()` called, then add succeeds | Entry appears in entries |
32
+ | `add() falls through to error if consolidation fails` | Mock consolidator returns `{ consolidated: false }` | Error about exceeding limit |
33
+ | `add() skips consolidation when disabled` | Config `autoConsolidate: false` | Error about exceeding limit (no consolidator call) |
34
+ | `add() skips consolidation when no consolidator set` | `setConsolidator()` not called | Error about exceeding limit |
35
+ | `add() consolidates for user target too` | Same test but target is "user" | Works identically |
36
+
37
+ ---
38
+
39
+ ## Epic 3: Correction Detection
40
+
41
+ ### Unit Tests: `tests/handlers/correction-detector.test.ts`
42
+
43
+ #### Pattern Matching: `isCorrection()`
44
+
45
+ **Strong patterns (always trigger):**
46
+ | Input | Expected |
47
+ |---|---|
48
+ | `"don't do that"` | `true` |
49
+ | `"not like that"` | `true` |
50
+ | `"I said use yarn"` | `true` |
51
+ | `"I told you already"` | `true` |
52
+ | `"we already discussed this"` | `true` |
53
+ | `"please don't commit yet"` | `true` |
54
+ | `"that's not what I asked for"` | `true` |
55
+
56
+ **Weak patterns (need directive clause):**
57
+ | Input | Expected | Why |
58
+ |---|---|---|
59
+ | `"no, use yarn instead"` | `true` | "use" is directive |
60
+ | `"wrong, the file is in src/"` | `true` | "the" is directive |
61
+ | `"actually, don't use that"` | `true` | "don't" is directive |
62
+ | `"stop, fix the test first"` | `true` | "fix" is directive |
63
+ | `"no worries, I'll handle it"` | `false` | Negative pattern suppresses |
64
+ | `"no problem"` | `false` | Negative pattern suppresses |
65
+ | `"no thanks"` | `false` | Negative pattern suppresses |
66
+ | `"no need to change that"` | `false` | Negative pattern suppresses |
67
+ | `"actually, that looks great"` | `false` | Negative pattern suppresses |
68
+ | `"actually, perfect"` | `false` | Negative pattern suppresses |
69
+ | `"stop there"` | `false` | Negative pattern suppresses |
70
+
71
+ **Non-corrections (should NOT trigger):**
72
+ | Input | Expected |
73
+ |---|---|
74
+ | `"yes, do that"` | `false` |
75
+ | `"looks good"` | `false` |
76
+ | `"can you also check the tests?"` | `false` |
77
+ | `""` | `false` |
78
+ | `"thanks"` | `false` |
79
+
80
+ #### Handler Behavior
81
+
82
+ | Test | What | Expected |
83
+ |---|---|---|
84
+ | `triggers pi.exec on correction` | Emit user message "no, don't use npm", then turn_end | `pi.exec()` called with CORRECTION_SAVE_PROMPT |
85
+ | `does not trigger on normal message` | Emit user message "looks good", then turn_end | `pi.exec()` NOT called |
86
+ | `rate limits: 1 per 3 turns` | Emit correction at turn 1, correction at turn 2 | Only first correction triggers save |
87
+ | `rate limit resets after 3 turns` | Emit correction at turn 1, then normal turns 2-4, then correction at turn 5 | Both corrections trigger save |
88
+ | `does not trigger when in progress` | Emit correction, then another correction before first completes | Only first triggers |
89
+ | `disabled via config` | Config `correctionDetection: false` | No handler registered |
90
+ | `includes recent context (last 6 exchanges)` | Verify prompt content | Prompt includes recent messages + current memory |
91
+
92
+ ---
93
+
94
+ ## Epic 4: Tool-Call-Aware Nudge
95
+
96
+ ### Unit Tests: `tests/handlers/background-review.test.ts` (extend existing)
97
+
98
+ | Test | What | Expected |
99
+ |---|---|---|
100
+ | `triggers on turn count threshold` | `turnsSinceReview >= 10`, `toolCallsSinceReview < 15` | Review triggers |
101
+ | `triggers on tool call count threshold` | `turnsSinceReview < 10`, `toolCallsSinceReview >= 15` | Review triggers |
102
+ | `triggers when both thresholds met` | Both thresholds exceeded | Review triggers |
103
+ | `does not trigger when neither threshold met` | Both below threshold | No review |
104
+ | `resets both counters after review` | After review completes | Both counters = 0 |
105
+ | `counts toolCall blocks from branch` | Branch has 3 assistant messages with 2 toolCall blocks each | `toolCallsSinceReview = 6` |
106
+ | `ignores text blocks in content` | Branch has text blocks only | `toolCallsSinceReview = 0` |
107
+ | `falls back gracefully if branch access fails` | `sessionManager.getBranch()` throws | No crash, continues with turn-based only |
108
+
109
+ ---
110
+
111
+ ## Epic 1: Skill Tool + Procedural Memory
112
+
113
+ ### Unit Tests: `tests/store/skill-store.test.ts`
114
+
115
+ #### CRUD Operations
116
+
117
+ | Test | What | Expected |
118
+ |---|---|---|
119
+ | `create() writes SKILL.md with correct frontmatter` | Create skill with name, description, body | File exists with `---\nname: ...\n---\n` format |
120
+ | `create() slugifies name correctly` | Name: `"Debug TypeScript Errors!"` | File: `debug-typescript-errors.md` |
121
+ | `create() returns error for duplicate name` | Create same skill twice | Second returns error about existing skill |
122
+ | `create() returns error for empty name` | `name: ""` | Error: "Skill name is required." |
123
+ | `create() returns error for empty description` | Valid name, empty description | Error: "Skill description is required." |
124
+ | `create() returns error for empty body` | Valid name/desc, empty body | Error: "Skill body is required." |
125
+ | `create() scans content for security` | Body contains injection pattern | Error: "Blocked: content matches threat pattern" |
126
+ | `loadIndex() returns all skills` | Create 3 skills, call loadIndex | Returns 3 SkillIndex entries |
127
+ | `loadIndex() returns empty array when no skills` | Empty skills dir | Returns `[]` |
128
+ | `loadSkill() returns full document` | Load specific .md file | Returns SkillDocument with all fields |
129
+ | `loadSkill() returns null for missing file` | Nonexistent file | Returns `null` |
130
+ | `loadSkill() returns null for missing frontmatter` | File without `---` frontmatter | Returns `null` |
131
+ | `patch() replaces existing section` | Skill has `## Procedure`, patch it | New content replaces old section |
132
+ | `patch() appends new section` | Skill has no `## Pitfalls`, patch it | Section appended |
133
+ | `patch() increments version` | Patch a skill | Version goes from 1 → 2 |
134
+ | `patch() scans content` | New content has injection pattern | Blocked |
135
+ | `edit() replaces description and body` | Edit with new desc + body | Both updated |
136
+ | `edit() replaces only description` | Edit with new desc, empty body | Only description changes |
137
+ | `edit() increments version` | Edit a skill | Version incremented |
138
+ | `delete() removes file` | Delete a skill | File gone, loadIndex returns one fewer |
139
+ | `delete() returns error for missing file` | Delete nonexistent | Error: "not found" |
140
+
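The replace-versus-append behaviour that the `patch()` rows test can be sketched as a pure string function. This assumes sections are delimited by level-2 (`## `) headings, as in the SKILL.md format; `patchSection` is a hypothetical helper name, and version bumping and content scanning are left out.

```typescript
function patchSection(body: string, section: string, newContent: string): string {
  const heading = `## ${section}`;
  const lines = body.split("\n");
  const start = lines.findIndex((line) => line.trim() === heading);
  const block = [heading, newContent.trim()];

  if (start === -1) {
    // Section absent: append it after the existing body.
    const base = body.trimEnd();
    return (base === "" ? "" : base + "\n\n") + block.join("\n") + "\n";
  }

  // Section present: replace everything up to the next level-2 heading.
  let end = lines.findIndex((line, i) => i > start && line.startsWith("## "));
  if (end === -1) end = lines.length;
  const tail = lines.slice(end);
  return [...lines.slice(0, start), ...block, ...(tail.length ? ["", ...tail] : [""])].join("\n");
}
```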
141
+ #### Frontmatter Parsing
142
+
143
+ | Test | What | Expected |
144
+ |---|---|---|
145
+ | `parses standard frontmatter` | `---\nname: foo\ndescription: bar\n---\nbody` | `{ name: "foo", description: "bar", body: "body" }` |
146
+ | `handles value with colons` | `description: uses this: that` | Description = `"uses this: that"` |
147
+ | `handles empty body` | Frontmatter only, no body after `---` | body = `""` |
148
+ | `handles no frontmatter` | Plain markdown without `---` | Returns `{ meta: {}, body: raw }` |
149
+
150
+ #### Progressive Disclosure
151
+
152
+ | Test | What | Expected |
153
+ |---|---|---|
154
+ | `formatIndexForSystemPrompt() returns formatted index` | 2 skills | String with skill names + descriptions |
155
+ | `formatIndexForSystemPrompt() returns empty when no skills` | No skills | Returns `""` |
156
+ | `index does not include body content` | Skills have long bodies | Index only shows name + description |
157
+
158
+ #### Atomic Writes
159
+
160
+ | Test | What | Expected |
161
+ |---|---|---|
162
+ | `create() uses atomic write (file exists after create)` | Create skill, read file | File exists with correct content |
163
+ | `file content is correct after create + patch` | Create then patch | File on disk reflects patch |
164
+
165
+ ### Unit Tests: `tests/tools/skill-tool.test.ts`
166
+
167
+ | Test | What | Expected |
168
+ |---|---|---|
169
+ | `registers tool with name 'skill'` | Check tool registration | Tool name = "skill" |
170
+ | `create requires name, description, content` | Missing each param | Error for each missing param |
171
+ | `view without file_name lists all skills` | No file_name param | Returns skill index |
172
+ | `view with file_name returns full document` | Valid file_name | Returns SkillDocument |
173
+ | `view with invalid file_name returns error` | Nonexistent file | Error: "not found" |
174
+ | `patch requires file_name, section, content` | Missing params | Error for each |
175
+ | `edit requires file_name` | No file_name | Error |
176
+ | `delete requires file_name` | No file_name | Error |
177
+ | `unknown action returns error` | `action: "foo"` | Error: "Unknown action" |
178
+
179
+ ### Unit Tests: `tests/handlers/skill-auto-trigger.test.ts`
180
+
181
+ | Test | What | Expected |
182
+ |---|---|---|
183
+ | `triggers at 8+ tool calls with 2+ types` | Branch has 8 toolCall blocks with 3 distinct tool names | `pi.exec()` called |
184
+ | `does not trigger below 8 tool calls` | Branch has 7 toolCall blocks | Not triggered |
185
+ | `does not trigger with only 1 tool type` | 10 toolCall blocks, all same tool | Not triggered |
186
+ | `only triggers once per session` | Two turn_end events both meeting threshold | Only first triggers |
187
+ | `handles branch access failure gracefully` | `getBranch()` throws | No crash |
188
+
189
+ ---
190
+
191
+ ## Epic 5: Documentation & Release
192
+
193
+ ### Manual Verification
194
+
195
+ | Check | Command | Expected |
196
+ |---|---|---|
197
+ | Type check passes | `npm run check` | Zero errors |
198
+ | All tests pass | `npm test` | 119+ tests, 0 failures |
199
+ | README updated | Manual review | Mentions skill tool, auto-consolidation, correction detection |
200
+ | ROADMAP updated | Manual review | v0.2 marked complete |
201
+ | Version bumped | `cat package.json \| grep version` | `"version": "0.2.0"` |
202
+ | Git tagged | `git tag -l "v0.2*"` | `v0.2.0` exists |
203
+
204
+ ---
205
+
206
+ ## Summary
207
+
208
+ | Area | New Tests | Existing Tests Modified |
209
+ |---|---|---|
210
+ | Auto-Consolidation | 6 + 6 | `memory-store.test.ts` (6 tests for async+consolidator) |
211
+ | Correction Detection | ~20 (patterns + handler) | — |
212
+ | Tool-Call Nudge | 8 | `background-review.test.ts` (extend) |
213
+ | Skill Store | ~25 | — |
214
+ | Skill Tool | ~10 | — |
215
+ | Skill Auto-Trigger | 5 | — |
216
+ | **Total** | **~80 new tests** | **~14 modified** |