opencodekit 0.20.6 → 0.20.7
- package/dist/index.js +1 -1
- package/dist/template/.opencode/command/compound.md +102 -28
- package/dist/template/.opencode/command/curate.md +299 -0
- package/dist/template/.opencode/command/lfg.md +1 -0
- package/dist/template/.opencode/command/ship.md +1 -0
- package/dist/template/.opencode/memory.db +0 -0
- package/dist/template/.opencode/memory.db-shm +0 -0
- package/dist/template/.opencode/memory.db-wal +0 -0
- package/dist/template/.opencode/skill/reflection-checkpoints/SKILL.md +183 -0
- package/package.json +1 -1
package/dist/index.js
CHANGED
|
package/dist/template/.opencode/command/compound.md
@@ -8,10 +8,17 @@ agent: build

 Capture what was learned. This is the flywheel step — each cycle makes the next cycle faster.

-> **Workflow:** `/plan` → `/ship` →
+> **Workflow:** `/plan` → `/ship` → **`/compound`** → `/pr`
 >
 > Run after every completed task, review, or PR merge. The value compounds over time.

+## Load Skills
+
+```typescript
+skill({ name: "memory-system" });
+skill({ name: "verification-before-completion" });
+```
+
 ## What This Does

 Extracts learnings from the just-completed work and stores them as structured observations in memory,
@@ -47,7 +54,7 @@ For each finding, assign a type:
 | `pattern` | A reusable approach confirmed to work in this codebase | "Always use X pattern for Y type of component" |
 | `bugfix` | A non-obvious bug and its root cause | "Bun doesn't support X, use Y instead" |
 | `decision` | An architectural or design choice with rationale | "Chose JWT over sessions because..." |
-| `
+| `warning` | A footgun, constraint, or thing that looks wrong but isn't | "Don't modify dist/ directly, build overwrites" |
 | `discovery` | A non-obvious fact about the codebase or its dependencies | "Build copies .opencode/ to dist/template/" |
 | `warning` | Something that will break if not followed | "Always run lint:fix before commit" |

@@ -60,19 +67,60 @@ For each learning worth keeping, create an observation:

 ```typescript
 observation({
-  type: "pattern", // or bugfix, decision,
+  type: "pattern", // or bugfix, decision, discovery, warning, learning
   title: "[Concise, searchable title — what someone would search for]",
   narrative: "[What happened, why it matters, how to apply it]",
   facts: "[comma, separated, key, facts]",
   concepts: "[searchable, keywords, for, future, retrieval]",
   files_modified: "[relevant/file.ts if applicable]",
   confidence: "high", // high=verified, medium=likely, low=speculative
+  // ByteRover-inspired quality fields:
+  subtitle: "[One-line semantic summary — WHY this matters for future work]",
 });
 ```

 **Minimum viable:** title + narrative. Everything else is bonus.

-
+**Quality enrichment:** Add `subtitle` (WHY it matters) for high-impact observations. Skip for routine findings.
+
+## Phase 4: Structural Loss Prevention
+
+When superseding an older observation, prevent accidental knowledge loss.
+
+**Trigger:** Only runs when `supersedes: "ID"` is set on a new observation.
+
+### Step 1: Read the old observation
+
+```typescript
+const old = memory_get({ ids: "<superseded-id>" });
+```
+
+### Step 2: Detect structural loss
+
+Compare the new observation against the old one:
+
+| Field | Loss Detection |
+| ---------------- | --------------------------------------------------------------- |
+| `facts` | Old facts not present in new facts (comma-separated comparison) |
+| `concepts` | Old concepts not present in new concepts |
+| `narrative` | New narrative significantly shorter than old (< 50% length) |
+| `files_modified` | Old file paths not present in new list |
+
+### Step 3: Auto-merge if loss detected
+
+- **Array fields** (facts, concepts): Union merge — keep all old items, add new items, deduplicate
+- **Scalar fields** (narrative): If new is shorter, append `\n\n[Preserved from superseded observation #ID:]\n` + old narrative section
+- **File paths**: Union merge all paths
+
+### Step 4: Flag for review if high-impact
+
+If the old observation had `confidence: "high"` and the new one has `confidence: "medium"` or `confidence: "low"`, flag with a warning:
+
+> ⚠️ Confidence downgrade detected: superseding a high-confidence observation (#ID) with lower confidence. Verify this is intentional.
+
+**Principle:** Knowledge should accumulate, not be replaced. Merging is safer than overwriting.
+
+## Phase 5: Check AGENTS.md / Skill Updates

 Ask: does this learning belong as a permanent rule?

|
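The Phase 4 steps in the hunk above (detect loss, preserve the narrative, flag a confidence downgrade) can be sketched in TypeScript. This is a minimal illustration and not the package's implementation: `SupersedeFields`, `splitList`, `missingItems`, `detectLoss`, `preserveNarrative`, and `downgradeWarning` are hypothetical names, list items are compared exactly after trimming, and the union merge of list fields is left out.

```typescript
// Hypothetical shape: only the fields that Phase 4 compares.
interface SupersedeFields {
  facts: string; // comma-separated
  concepts: string; // comma-separated
  files_modified: string; // comma-separated
  narrative: string;
  confidence: "high" | "medium" | "low";
}

// Split a comma-separated field into trimmed, non-empty items.
function splitList(value: string): string[] {
  return value
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Items present in the old field but absent from the new one (exact match).
function missingItems(oldValue: string, newValue: string): string[] {
  const kept = splitList(newValue);
  return splitList(oldValue).filter((item) => kept.indexOf(item) === -1);
}

// Step 2: report what the new observation would drop.
function detectLoss(oldObs: SupersedeFields, newObs: SupersedeFields) {
  return {
    facts: missingItems(oldObs.facts, newObs.facts),
    concepts: missingItems(oldObs.concepts, newObs.concepts),
    files_modified: missingItems(oldObs.files_modified, newObs.files_modified),
    // "Significantly shorter" is read as under 50% of the old length.
    narrativeShrunk: newObs.narrative.length < oldObs.narrative.length * 0.5,
  };
}

// Step 3 (scalar field): keep the old narrative when the new one is shorter.
function preserveNarrative(oldNarrative: string, newNarrative: string, oldId: string): string {
  if (newNarrative.length >= oldNarrative.length) {
    return newNarrative;
  }
  return newNarrative + "\n\n[Preserved from superseded observation #" + oldId + ":]\n" + oldNarrative;
}

// Step 4: warn when a high-confidence observation is replaced by a weaker one.
function downgradeWarning(oldObs: SupersedeFields, newObs: SupersedeFields, oldId: string): string | null {
  if (oldObs.confidence === "high" && newObs.confidence !== "high") {
    return (
      "⚠️ Confidence downgrade detected: superseding a high-confidence observation (#" +
      oldId +
      ") with lower confidence. Verify this is intentional."
    );
  }
  return null;
}
```

Keeping detection separate from merging lets a caller report the loss to the user before anything is rewritten.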
@@ -88,18 +136,18 @@ If MAYBE (it's a pattern, not a rule):

 **Rule:** AGENTS.md changes require user confirmation. Observations are automatic.

-## Phase
+## Phase 6: Update Living Documentation

 Check if the shipped work changed architecture, APIs, conventions, or tech stack. If so, update the relevant project docs.

 **Check each:**

-| Doc
-|
-| `tech-stack.md`
-| `project.md`
-| `gotchas.md`
-| `AGENTS.md` (project) | New convention established, boundary rule needed
+| Doc | Update When | What to Update |
+| --------------------- | --------------------------------------------------------- | --------------------------------------------------- |
+| `tech-stack.md` | New dependency added, build tool changed, runtime updated | Dependencies list, build tools, constraints |
+| `project.md` | Architecture changed, new key files, success criteria met | Architecture section, key files table, phase status |
+| `gotchas.md` | New footgun discovered, constraint found | Add the gotcha with context |
+| `AGENTS.md` (project) | New convention established, boundary rule needed | Boundaries, gotchas, code example sections |

 ```typescript
 // Check what changed
@@ -110,7 +158,8 @@ memory_update({ file: "project/gotchas", content: "...", mode: "append" });
 ```

 **Rule:** Only update docs when the change is structural (new pattern, new dep, new constraint). Don't update for routine bug fixes or small features. Ask user before modifying `AGENTS.md`.
-
+
+## Phase 7: Search for Related Past Observations

 ```typescript
 // Check if this updates or supersedes an older observation
@@ -128,24 +177,48 @@ observation({
 });
 ```

-## Phase
+## Phase 8: Output Summary

-
+Present extracted learnings for user review before finalizing:

 ```
-## Compound
+## Compound Review

 **Work reviewed:** [brief description]
-**Learnings
+**Learnings extracted:** [N] observations
+
+| # | Type | Title | Impact | Action |
+|---|------|-------|--------|--------|
+| 1 | pattern | ... | high | ✅ Store |
+| 2 | warning | ... | medium | ✅ Store |
+| 3 | bugfix | ... | low | ⏭️ Skip (routine) |
+```
+
+```typescript
+question({
+  questions: [
+    {
+      header: "Approve Learnings",
+      question: "Review extracted learnings. Store all approved observations?",
+      options: [
+        { label: "Store all (Recommended)", description: "Persist all marked ✅" },
+        { label: "Let me adjust", description: "I'll modify before storing" },
+        { label: "Skip compound", description: "Nothing worth persisting" },
+      ],
+    },
+  ],
+});
+```
+
+**After approval:** Store observations and report final summary:

-
-
-| 1 | pattern | ... | auth, jwt |
-| 2 | gotcha | ... | node, build |
-| 3 | bugfix | ... | typecheck, strict-mode |
+```
+## Compound Summary

+**Observations stored:** [N]
+**Superseded:** [N] older observations updated
 **AGENTS.md updates suggested:** [yes/no - describe if yes]
-**Next recommended:** /pr
+**Next recommended:** /pr (or /plan <next-bead-id>)
 ```

 ## When Nothing to Compound
@@ -158,9 +231,10 @@ Don't force observations. Quality over quantity.

 ## Related Commands

-| Need
-|
-| Full chain
-| Review
-| Ship the work
-|
+| Need | Command |
+| --------------- | ------------------ |
+| Full chain | `/lfg` |
+| Review codebase | `/review-codebase` |
+| Ship the work | `/ship` |
+| Curate memory | `/curate` |
+| Create PR | `/pr` |
package/dist/template/.opencode/command/curate.md
@@ -0,0 +1,299 @@
+---
+description: Organize, deduplicate, and curate knowledge in project memory
+argument-hint: "[--scope recent|all] [--auto-merge]"
+agent: build
+---
+
+# Curate: $ARGUMENTS
+
+Organize accumulated knowledge. Surface conflicts, merge duplicates, archive stale observations.
+
+> **Workflow:** `/ship` → `/compound` → **`/curate`** → `/pr`
+>
+> Run periodically (weekly or after major work) to keep memory sharp. Inspired by ByteRover's structured curation pipeline.
+
+## Load Skills
+
+```typescript
+skill({ name: "memory-system" });
+skill({ name: "verification-before-completion" });
+```
+
+## Parse Arguments
+
+| Argument | Default | Description |
+| -------------- | -------- | ------------------------------------------------ |
+| `--scope` | `recent` | `recent` = last 30 days, `all` = entire memory |
+| `--auto-merge` | false | Auto-merge exact duplicates without confirmation |
+
|
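The two flags in the table above could be parsed along these lines. This is a sketch under assumptions: `$ARGUMENTS` is taken to arrive as one whitespace-separated string, and `CurateOptions` and `parseCurateArgs` are hypothetical names, since the command file leaves argument handling to the agent.

```typescript
// Hypothetical option shape mirroring the table: defaults are `recent` and false.
interface CurateOptions {
  scope: "recent" | "all";
  autoMerge: boolean;
}

function parseCurateArgs(raw: string): CurateOptions {
  const tokens = raw.split(/\s+/).filter((token) => token.length > 0);
  const options: CurateOptions = { scope: "recent", autoMerge: false };
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    if (token === "--auto-merge") {
      options.autoMerge = true;
    } else if (token === "--scope") {
      const value = tokens[i + 1];
      if (value !== "recent" && value !== "all") {
        throw new Error("--scope expects 'recent' or 'all'");
      }
      options.scope = value;
      i++; // skip the value that was just consumed
    } else {
      throw new Error("Unknown argument: " + token);
    }
  }
  return options;
}
```

Rejecting unknown tokens rather than ignoring them keeps a mistyped flag such as `--automerge` from silently falling back to the defaults.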
+## Phase 1: Inventory
+
+Take stock of current memory state:
+
+```typescript
+memory_admin({ operation: "status" });
+memory_admin({ operation: "capture-stats" });
+```
+
+Report:
+
+```
+## Memory Inventory
+
+| Metric | Count |
+|--------|-------|
+| Total observations | [N] |
+| Recent (30 days) | [N] |
+| By type | pattern: N, decision: N, bugfix: N, ... |
+| Confidence distribution | high: N, medium: N, low: N |
+```
+
+## Phase 2: Domain Detection
+
+Analyze observations to extract semantic domains — groups of related knowledge.
+
+```typescript
+// Get memory status for inventory
+memory_admin({ operation: "status" });
+
+// Search by common concept categories to build domain map
+const domains = [];
+for (const concept of ["build", "test", "memory", "git", "agent", "auth", "ui", "config"]) {
+  const results = memory_search({ query: concept, limit: 20 });
+  // Group results by concept affinity
+}
+```
+
+Categorize observations into domains based on their `concepts` and `title` fields:
+
+| Domain | Example Concepts | Observation Count |
+| -------------- | ---------------------------------- | ----------------- |
+| `build_system` | build, tsdown, rsync, dist | [N] |
+| `testing` | vitest, test, TDD, coverage | [N] |
+| `memory` | observation, FTS5, sqlite, handoff | [N] |
+| `git_workflow` | commit, branch, push, PR | [N] |
+| `agent_system` | subagent, delegation, skills | [N] |
+
+**Domain naming rules:**
+
+- snake_case, 1-3 words
+- Semantically meaningful (not just "misc")
+- Maximum 10 domains (merge small groups)
+
|
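One way to picture the grouping and naming rules above is a small overlap score. This is an illustrative sketch only: `DOMAIN_CONCEPTS`, `isValidDomainName`, and `assignDomain` are hypothetical names, the concept lists are copied from the example table, and the 10-domain cap is not modeled.

```typescript
// Example concepts per domain, copied (lowercased) from the table above.
const DOMAIN_CONCEPTS: { [domain: string]: string[] } = {
  build_system: ["build", "tsdown", "rsync", "dist"],
  testing: ["vitest", "test", "tdd", "coverage"],
  memory: ["observation", "fts5", "sqlite", "handoff"],
  git_workflow: ["commit", "branch", "push", "pr"],
  agent_system: ["subagent", "delegation", "skills"],
};

// Naming rules: snake_case, 1-3 words, and not the catch-all "misc".
function isValidDomainName(name: string): boolean {
  return /^[a-z0-9]+(_[a-z0-9]+){0,2}$/.test(name) && name !== "misc";
}

// Pick the domain whose example concepts overlap most with an observation's
// comma-separated `concepts` field; returns null when nothing overlaps.
function assignDomain(concepts: string): string | null {
  const items = concepts.split(",").map((c) => c.trim().toLowerCase());
  let best: string | null = null;
  let bestScore = 0;
  for (const domain of Object.keys(DOMAIN_CONCEPTS)) {
    const known = DOMAIN_CONCEPTS[domain] || [];
    const score = items.filter((item) => known.indexOf(item) !== -1).length;
    if (score > bestScore) {
      best = domain;
      bestScore = score;
    }
  }
  return best;
}
```

Observations that return `null` are the candidates for a new domain, which would then need a name that passes `isValidDomainName`.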
+## Phase 3: Conflict & Duplicate Detection
+
+### 3a. Exact Duplicates
+
+```typescript
+memory_admin({ operation: "lint" });
+```
+
+Flag observations with identical or near-identical titles and narratives. Present for merge:
+
+```
+### Duplicates Found
+
+| Obs A | Obs B | Similarity | Recommended Action |
+|-------|-------|------------|-------------------|
+| #12 "Use JWT for auth" | #45 "JWT chosen for auth" | 95% title match | MERGE → keep #45 (newer) |
+| #8 "Build copies .opencode/" | #33 "Build copies .opencode/" | 100% title match | MERGE → keep #33 (newer) |
+```
+
|
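The section above does not say how title similarity is scored, so the sketch below swaps in a simple token-based Jaccard measure purely for illustration. Its scores will not match the percentages in the example table, and `titleTokens`, `titleSimilarity`, `TitledObservation`, and `findDuplicatePairs` are hypothetical names.

```typescript
// Lowercased word tokens of a title, without duplicates.
function titleTokens(title: string): string[] {
  const words = title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
  return words.filter((word, index) => words.indexOf(word) === index);
}

// Jaccard similarity: shared tokens divided by all distinct tokens (0 to 1).
function titleSimilarity(a: string, b: string): number {
  const tokensA = titleTokens(a);
  const tokensB = titleTokens(b);
  const shared = tokensA.filter((word) => tokensB.indexOf(word) !== -1).length;
  const total = tokensA.length + tokensB.length - shared;
  return total === 0 ? 0 : shared / total;
}

// Hypothetical minimal record: an id and a title.
interface TitledObservation {
  id: number;
  title: string;
}

// All pairs scoring at or above the threshold, lower id first in each pair.
function findDuplicatePairs(observations: TitledObservation[], threshold: number): [number, number][] {
  const pairs: [number, number][] = [];
  observations.forEach((a, index) => {
    observations.slice(index + 1).forEach((b) => {
      if (titleSimilarity(a.title, b.title) >= threshold) {
        pairs.push([Math.min(a.id, b.id), Math.max(a.id, b.id)]);
      }
    });
  });
  return pairs;
}
```

The threshold is a tuning choice: a value near 1 only catches exact title matches, while a lower value also surfaces rewordings that should still be confirmed by a person before merging.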
+### 3b. Contradictions
+
+Search for observations where:
+
+- Same concepts but different decisions
+- Same file paths but conflicting patterns
+- Confidence downgrade without supersedes link
+
+```
+### Contradictions Found
+
+| Obs A | Obs B | Conflict | Recommended Action |
+|-------|-------|----------|-------------------|
+| #5 "Always use X pattern" | #29 "Avoid X pattern" | Opposite recommendations | RESOLVE — ask user which is current |
+```
+
+### 3c. Stale Observations
+
+Flag observations where:
+
+- Referenced files no longer exist
+- Referenced patterns no longer appear in codebase
+- Over 90 days old with no related recent activity
+
+```
+### Stale Observations
+
+| Obs | Age | Reason | Recommended Action |
+|-----|-----|--------|-------------------|
+| #3 "src/old-file.ts pattern" | 120 days | File deleted | ARCHIVE |
+| #7 "Use moment.js for dates" | 95 days | Dependency removed | ARCHIVE |
+```
+
|
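Two of the staleness criteria above (missing files, and more than 90 days without related activity) lend themselves to a short check. The sketch below rests on assumptions: `AgedObservation` and its fields are a hypothetical record shape, `fileExists` is injected by the caller, and the "referenced patterns no longer appear" criterion is not modeled because it needs a codebase search.

```typescript
// Hypothetical record shape; the stored observation schema may differ.
interface AgedObservation {
  id: number;
  files: string[]; // paths the observation references
  createdAt: Date;
  lastRelatedActivity?: Date; // most recent related work, if known
}

const DAY_MS = 24 * 60 * 60 * 1000;
const STALE_AFTER_DAYS = 90;

// Returns a reason string when the observation looks stale, otherwise null.
// `fileExists` is injected so the check can run against any file system.
function staleReason(
  obs: AgedObservation,
  now: Date,
  fileExists: (path: string) => boolean,
): string | null {
  const missing = obs.files.filter((path) => !fileExists(path));
  if (missing.length > 0) {
    return "File deleted: " + missing.join(", ");
  }
  // Fall back to the creation date when no related activity is recorded.
  const lastTouched = obs.lastRelatedActivity || obs.createdAt;
  const idleDays = Math.floor((now.getTime() - lastTouched.getTime()) / DAY_MS);
  if (idleDays > STALE_AFTER_DAYS) {
    return "No related activity for " + idleDays + " days";
  }
  return null;
}
```

A non-null result is only a candidate for ARCHIVE; the curation plan still asks the user before anything is superseded.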
+## Phase 4: Present Curation Plan
+
+Compile all findings into a review table:
+
+```
+## Curation Plan
+
+### Actions Required
+
+| # | Observation | Action | Reason |
+|---|------------|--------|--------|
+| 1 | #12 + #45 | MERGE | Duplicate — keep newer |
+| 2 | #5 vs #29 | RESOLVE | Contradicting patterns |
+| 3 | #3 | ARCHIVE | Referenced file deleted |
+| 4 | #7 | ARCHIVE | Dependency removed |
+| 5 | #18 | UPDATE | Low confidence → verify |
+```
+
+```typescript
+question({
+  questions: [
+    {
+      header: "Curation Plan",
+      question: "Review the curation plan. Proceed with all actions?",
+      options: [
+        { label: "Execute all (Recommended)", description: "Apply all actions above" },
+        { label: "Let me cherry-pick", description: "I'll approve individually" },
+        { label: "Skip curation", description: "No changes to memory" },
+      ],
+    },
+  ],
+});
+```
+
+## Phase 5: Execute Curation
+
+For each approved action:
+
+### MERGE (duplicates)
+
+```typescript
+// Read both observations
+const older = memory_get({ ids: "<older-id>" });
+const newer = memory_get({ ids: "<newer-id>" });
+
+// Union-merge: combine comma-separated lists, deduplicate (case-insensitive), existing items first
+// Example: older.facts="auth, jwt" + newer.facts="jwt, session" → "auth, jwt, session"
+
+// Create merged observation (newer as base, merge fields from older)
+observation({
+  type: newer.type,
+  title: newer.title,
+  narrative: newer.narrative,
+  // Manually combine comma-separated fields: keep all unique items from both
+  facts: "[combined unique facts from older + newer]",
+  concepts: "[combined unique concepts from older + newer]",
+  files_modified: "[combined unique file paths from older + newer]",
+  confidence: newer.confidence, // Newer confidence wins
+  supersedes: "<older-id>",
+  subtitle: "Merged from #<older-id> + #<newer-id>",
+});
+```
+
+**Union merge rule:** Combine comma-separated lists, deduplicate (case-insensitive), existing items first.
+
|
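The union merge rule can be written as a small helper. `unionMerge` is a hypothetical name; the behavior follows the rule as stated (case-insensitive de-duplication, existing items first), with one added assumption that the first spelling seen is the one kept.

```typescript
// Combine two comma-separated lists: existing items first, then incoming ones,
// dropping case-insensitive duplicates and empty entries.
function unionMerge(existing: string, incoming: string): string {
  const merged: string[] = [];
  const seen: string[] = []; // lowercased keys of items already kept
  for (const item of existing.split(",").concat(incoming.split(","))) {
    const trimmed = item.trim();
    const key = trimmed.toLowerCase();
    if (trimmed.length === 0 || seen.indexOf(key) !== -1) {
      continue;
    }
    seen.push(key);
    merged.push(trimmed);
  }
  return merged.join(", ");
}

// Mirrors the example in the MERGE block: "auth, jwt" + "jwt, session".
const mergedFacts = unionMerge("auth, jwt", "jwt, session");
```

Here `mergedFacts` is `"auth, jwt, session"`, and the same helper applies unchanged to `concepts` and `files_modified`.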
+### RESOLVE (contradictions)
+
+Present the conflicting observations side-by-side:
+
+```
+### Contradiction: #5 vs #29
+
+**#5 (older, high confidence):**
+> Always use X pattern for Y components
+
+**#29 (newer, medium confidence):**
+> Avoid X pattern — causes Z issues
+
+Which is the current truth?
+```
+
+```typescript
+question({
+  questions: [
+    {
+      header: "Resolve Conflict",
+      question: "Which observation reflects the current codebase reality?",
+      options: [
+        { label: "#5 (older) is correct", description: "Archive #29, keep #5" },
+        { label: "#29 (newer) is correct", description: "Supersede #5 with #29" },
+        { label: "Both partially correct", description: "I'll write a reconciled version" },
+      ],
+    },
+  ],
+});
+```
+
+### ARCHIVE (stale)
+
+```typescript
+// Verify staleness by checking codebase
+// If file doesn't exist or pattern not found:
+observation({
+  type: "warning",
+  title: "Archived: [original title]",
+  narrative: "Archived during curation — [reason]. Original observation #<id>.",
+  supersedes: "<stale-id>",
+  confidence: "low",
+});
+```
+
+### UPDATE (low confidence → verify)
+
+```typescript
+// Search codebase for evidence
+// If evidence found → upgrade confidence
+// If evidence not found → archive
+```
+
+## Phase 6: Compile Knowledge Index
+
+After curation, regenerate the knowledge index:
+
+```typescript
+memory_admin({ operation: "compile" });
+memory_admin({ operation: "index" });
+```
+
+## Phase 7: Report
+
+```
+## Curation Summary
+
+**Scope:** [recent / all]
+**Observations reviewed:** [N]
+**Domains identified:** [N]
+
+| Action | Count | Details |
+|--------|-------|---------|
+| Merged | [N] | [list merged pairs] |
+| Resolved | [N] | [list resolved conflicts] |
+| Archived | [N] | [list archived observations] |
+| Updated | [N] | [list confidence changes] |
+| No change | [N] | |
+
+**Memory health:** [Healthy / Needs attention: describe]
+**Next recommended:** /pr or continue work
+```
+
+## When Nothing to Curate
+
+If all observations are clean, well-organized, and non-conflicting:
+
+> "Memory is clean. No duplicates, contradictions, or stale observations found. [N] observations across [M] domains."
+
+Don't force curation. Quality memory means less curation needed.
+
+## Related Commands
+
+| Need | Command |
+| ----------------------- | ------------------------------ |
+| Extract learnings first | `/compound` |
+| Full chain | `/lfg` |
+| Check memory health | `/health` |
+| Search memory | Use `memory-search()` directly |
package/dist/template/.opencode/command/lfg.md
@@ -61,6 +61,7 @@ Execute the plan:

 ```typescript
 skill({ name: "executing-plans" });
+skill({ name: "reflection-checkpoints" }); // Phase transition + mid-point checks
 // Load plan.md, execute wave-by-wave
 // Per-task commits after each task passes verification
 ```
package/dist/template/.opencode/command/ship.md
@@ -19,6 +19,7 @@ skill({ name: "beads" });
 skill({ name: "memory-grounding" });
 skill({ name: "workspace-setup" });
 skill({ name: "verification-before-completion" });
+skill({ name: "reflection-checkpoints" }); // Mid-point + completion checks during execution
 ```

 ## Determine Input Type
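package/dist/template/.opencode/memory.db
Binary file
package/dist/template/.opencode/memory.db-shm
Binary file
package/dist/template/.opencode/memory.db-wal
Binary file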
package/dist/template/.opencode/skill/reflection-checkpoints/SKILL.md
@@ -0,0 +1,183 @@
+---
+name: reflection-checkpoints
+description: >
+  Use when executing long-running commands (/ship, /lfg) to add self-assessment
+  checkpoints that detect scope drift, stalled progress, and premature completion claims.
+  Inspired by ByteRover's reflection prompt architecture.
+version: 1.0.0
+tags: [workflow, quality, autonomous]
+dependencies: [verification-before-completion]
+---
+
+# Reflection Checkpoints
+
+## When to Use
+
+- During `/ship` execution after completing 50%+ of tasks
+- During `/lfg` at each phase transition (Plan→Work→Review→Compound)
+- When a task takes significantly longer than estimated
+- When context usage exceeds 60% of budget
+
+## When NOT to Use
+
+- Simple, single-task work (< 3 tasks)
+- Pure research or exploration commands
+- When user explicitly requests fast execution without checkpoints
+
+## Overview
+
+Long-running autonomous execution drifts silently. By the time you notice, you've burned context on the wrong thing. Reflection checkpoints force self-assessment at critical moments — catching drift before it compounds.
+
+**Core principle:** Pause to assess, don't just assess to pause.
+
+## The Four Reflection Types
+
+### 1. Mid-Point Check
+
+**Trigger:** After completing ~50% of planned tasks (e.g., 3 of 6 tasks done)
+
+```
+## 🔍 Mid-Point Reflection
+
+**Progress:** [N/M] tasks complete
+**Context used:** ~[X]% estimated
+
+### Scope Check
+- [ ] Am I still solving the original problem?
+- [ ] Have I introduced any unplanned work?
+- [ ] Are remaining tasks still correctly scoped?
+
+### Quality Check
+- [ ] Do completed tasks actually work (not just "done")?
+- [ ] Any verification steps I deferred?
+- [ ] Any TODO/FIXME I left that needs addressing?
+
+### Efficiency Check
+- [ ] Am I spending context on the right things?
+- [ ] Should remaining tasks be parallelized?
+- [ ] Any tasks that should be deferred to a follow-up bead?
+
+**Assessment:** [On track / Drifting / Blocked]
+**Adjustment:** [None needed / Describe change]
+```
+
+### 2. Completion Check
+
+**Trigger:** Before claiming any task or phase is complete
+
+```
+## ✅ Completion Check
+
+**Claiming complete:** [task/phase name]
+
+### Evidence Audit
+- [ ] Verification command was run (not assumed)
+- [ ] Output confirms the claim (not inferred)
+- [ ] No stub patterns in modified files
+- [ ] Imports/exports are wired (not just declared)
+
+### Goal-Backward Check
+- [ ] Does this task achieve its stated end-state?
+- [ ] Would a user see the expected behavior?
+- [ ] If tested manually, would it work?
+
+**Verdict:** [Complete / Needs work: describe what]
+```
+
+### 3. Near-Limit Warning
+
+**Trigger:** When context usage exceeds ~70% or step count approaches limit
+
+```
+## ⚠️ Near-Limit Warning
+
+**Context pressure:** [High / Critical]
+**Remaining tasks:** [N]
+
+### Triage
+1. What MUST be done before stopping? [list critical tasks]
+2. What CAN be deferred? [list deferrable tasks]
+3. What should be handed off? [list with context needed]
+
+### Action
+- [ ] Compress completed work
+- [ ] Prioritize remaining tasks ruthlessly
+- [ ] Prepare handoff if needed
+
+**Decision:** [Continue (enough budget) / Compress and continue / Handoff now]
+```
+
+### 4. Phase Transition Check
+
+**Trigger:** At `/lfg` phase boundaries (Plan→Work, Work→Review, Review→Compound)
+
+```
+## 🔄 Phase Transition: [Previous] → [Next]
+
+### Previous Phase Assessment
+- **Objective met?** [Yes / Partially / No]
+- **Artifacts produced:** [list]
+- **Open issues carried forward:** [list or "none"]
+
+### Next Phase Readiness
+- [ ] Prerequisites satisfied
+- [ ] Context is clean (no stale noise)
+- [ ] Correct skills loaded for next phase
+
+**Proceed:** [Yes / Need to resolve: describe]
+```
+
+## Integration Points
+
+### In `/ship` (Phase 3 task loop)
+
+After every ceil(totalTasks / 2) tasks, run **Mid-Point Check**:
+
+```typescript
+const midpoint = Math.ceil(totalTasks / 2);
+if (completedTasks === midpoint) {
+  // Run mid-point reflection
+  // Log assessment to .beads/artifacts/$BEAD_ID/reflections.md
+}
+```
+
+Before each task completion claim, run **Completion Check** (lightweight — just the evidence audit).
+
+### In `/lfg` (phase transitions)
+
+At each step boundary (Plan→Work, Work→Review, Review→Compound), run **Phase Transition Check**.
+
+### Context pressure monitoring
+
+When context usage estimate exceeds 70%, run **Near-Limit Warning** regardless of task position.
+
|
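The `/ship` triggers described above (mid-point at `ceil(totalTasks / 2)` and near-limit above 70% context) can be combined into one decision function. This is a sketch under assumptions: `ShipProgress` and `dueReflections` are hypothetical names, the context figure is whatever estimate the agent already tracks, and the Completion and Phase Transition checks are not modeled because they fire on events rather than thresholds.

```typescript
type Reflection = "near-limit" | "mid-point";

// Hypothetical progress snapshot taken right after a task finishes.
interface ShipProgress {
  totalTasks: number;
  completedTasks: number;
  contextUsedPercent: number; // estimated, 0-100
}

// Which reflections are due for the given snapshot.
function dueReflections(progress: ShipProgress): Reflection[] {
  const due: Reflection[] = [];
  // Context pressure applies regardless of task position.
  if (progress.contextUsedPercent > 70) {
    due.push("near-limit");
  }
  // Mid-point fires once, when ceil(total / 2) tasks are done.
  // Runs with fewer than 4 tasks skip it, following the Gotchas note.
  const midpoint = Math.ceil(progress.totalTasks / 2);
  if (progress.totalTasks >= 4 && progress.completedTasks === midpoint) {
    due.push("mid-point");
  }
  return due;
}
```

Returning a list rather than a single value lets both checks run when the mid-point happens to coincide with high context pressure.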
+## Reflection Log
+
+Append all reflections to `.beads/artifacts/$BEAD_ID/reflections.md` (or session-level if no bead):
+
+```markdown
+## Reflection Log
+
+### [timestamp] Mid-Point Check
+
+Assessment: On track
+Context: ~45% used
+Adjustment: None
+
+### [timestamp] Completion Check — Task 3
+
+Verdict: Complete
+Evidence: typecheck pass, test pass (12/12)
+
+### [timestamp] Near-Limit Warning
+
+Decision: Compress and continue
+Deferred: Task 6 (cosmetic cleanup) → follow-up bead
+```
+
+## Gotchas
+
+- **Don't over-reflect** — these are quick self-checks, not long analyses. Each should take < 30 seconds of reasoning.
+- **Don't block on minor drift** — if drift is cosmetic (variable naming, style), note it and continue. Only pause for scope drift.
+- **Context cost** — each reflection adds ~200-400 tokens. Budget accordingly. Skip mid-point check for < 4 tasks.
+- **Not a replacement for verification** — reflections assess trajectory, not correctness. Always run actual verification commands.