opencodekit 0.20.5 → 0.20.7
- package/dist/index.js +1 -1
- package/dist/template/.opencode/AGENTS.md +57 -0
- package/dist/template/.opencode/agent/build.md +82 -0
- package/dist/template/.opencode/agent/plan.md +22 -0
- package/dist/template/.opencode/agent/review.md +18 -0
- package/dist/template/.opencode/agent/scout.md +17 -0
- package/dist/template/.opencode/command/compound.md +117 -21
- package/dist/template/.opencode/command/create.md +54 -8
- package/dist/template/.opencode/command/curate.md +299 -0
- package/dist/template/.opencode/command/explore.md +170 -0
- package/dist/template/.opencode/command/health.md +124 -2
- package/dist/template/.opencode/command/iterate.md +200 -0
- package/dist/template/.opencode/command/lfg.md +1 -0
- package/dist/template/.opencode/command/plan.md +63 -2
- package/dist/template/.opencode/command/ship.md +1 -0
- package/dist/template/.opencode/memory/_templates/prd.md +16 -5
- package/dist/template/.opencode/memory.db +0 -0
- package/dist/template/.opencode/memory.db-shm +0 -0
- package/dist/template/.opencode/memory.db-wal +0 -0
- package/dist/template/.opencode/skill/reconcile/SKILL.md +183 -0
- package/dist/template/.opencode/skill/reflection-checkpoints/SKILL.md +183 -0
- package/dist/template/.opencode/skill/verification-before-completion/SKILL.md +75 -0
- package/package.json +1 -1
@@ -0,0 +1,183 @@
+---
+name: reflection-checkpoints
+description: >
+  Use when executing long-running commands (/ship, /lfg) to add self-assessment
+  checkpoints that detect scope drift, stalled progress, and premature completion claims.
+  Inspired by ByteRover's reflection prompt architecture.
+version: 1.0.0
+tags: [workflow, quality, autonomous]
+dependencies: [verification-before-completion]
+---
+
+# Reflection Checkpoints
+
+## When to Use
+
+- During `/ship` execution after completing 50%+ of tasks
+- During `/lfg` at each phase transition (Plan→Work→Review→Compound)
+- When a task takes significantly longer than estimated
+- When context usage exceeds 60% of budget
+
+## When NOT to Use
+
+- Simple, single-task work (< 3 tasks)
+- Pure research or exploration commands
+- When user explicitly requests fast execution without checkpoints
+
+## Overview
+
+Long-running autonomous execution drifts silently. By the time you notice, you've burned context on the wrong thing. Reflection checkpoints force self-assessment at critical moments — catching drift before it compounds.
+
+**Core principle:** Pause to assess, don't just assess to pause.
+
+## The Four Reflection Types
+
+### 1. Mid-Point Check
+
+**Trigger:** After completing ~50% of planned tasks (e.g., 3 of 6 tasks done)
+
+```
+## 🔍 Mid-Point Reflection
+
+**Progress:** [N/M] tasks complete
+**Context used:** ~[X]% estimated
+
+### Scope Check
+- [ ] Am I still solving the original problem?
+- [ ] Have I introduced any unplanned work?
+- [ ] Are remaining tasks still correctly scoped?
+
+### Quality Check
+- [ ] Do completed tasks actually work (not just "done")?
+- [ ] Any verification steps I deferred?
+- [ ] Any TODO/FIXME I left that needs addressing?
+
+### Efficiency Check
+- [ ] Am I spending context on the right things?
+- [ ] Should remaining tasks be parallelized?
+- [ ] Any tasks that should be deferred to a follow-up bead?
+
+**Assessment:** [On track / Drifting / Blocked]
+**Adjustment:** [None needed / Describe change]
+```
+
+### 2. Completion Check
+
+**Trigger:** Before claiming any task or phase is complete
+
+```
+## ✅ Completion Check
+
+**Claiming complete:** [task/phase name]
+
+### Evidence Audit
+- [ ] Verification command was run (not assumed)
+- [ ] Output confirms the claim (not inferred)
+- [ ] No stub patterns in modified files
+- [ ] Imports/exports are wired (not just declared)
+
+### Goal-Backward Check
+- [ ] Does this task achieve its stated end-state?
+- [ ] Would a user see the expected behavior?
+- [ ] If tested manually, would it work?
+
+**Verdict:** [Complete / Needs work: describe what]
+```
+
+### 3. Near-Limit Warning
+
+**Trigger:** When context usage exceeds ~70% or step count approaches limit
+
+```
+## ⚠️ Near-Limit Warning
+
+**Context pressure:** [High / Critical]
+**Remaining tasks:** [N]
+
+### Triage
+1. What MUST be done before stopping? [list critical tasks]
+2. What CAN be deferred? [list deferrable tasks]
+3. What should be handed off? [list with context needed]
+
+### Action
+- [ ] Compress completed work
+- [ ] Prioritize remaining tasks ruthlessly
+- [ ] Prepare handoff if needed
+
+**Decision:** [Continue (enough budget) / Compress and continue / Handoff now]
+```
+
+### 4. Phase Transition Check
+
+**Trigger:** At `/lfg` phase boundaries (Plan→Work, Work→Review, Review→Compound)
+
+```
+## 🔄 Phase Transition: [Previous] → [Next]
+
+### Previous Phase Assessment
+- **Objective met?** [Yes / Partially / No]
+- **Artifacts produced:** [list]
+- **Open issues carried forward:** [list or "none"]
+
+### Next Phase Readiness
+- [ ] Prerequisites satisfied
+- [ ] Context is clean (no stale noise)
+- [ ] Correct skills loaded for next phase
+
+**Proceed:** [Yes / Need to resolve: describe]
+```
+
+## Integration Points
+
+### In `/ship` (Phase 3 task loop)
+
+After every ceil(totalTasks / 2) tasks, run **Mid-Point Check**:
+
+```typescript
+const midpoint = Math.ceil(totalTasks / 2);
+if (completedTasks === midpoint) {
+  // Run mid-point reflection
+  // Log assessment to .beads/artifacts/$BEAD_ID/reflections.md
+}
+```
+
+Before each task completion claim, run **Completion Check** (lightweight — just the evidence audit).
+
+### In `/lfg` (phase transitions)
+
+At each step boundary (Plan→Work, Work→Review, Review→Compound), run **Phase Transition Check**.
+
+### Context pressure monitoring
+
+When context usage estimate exceeds 70%, run **Near-Limit Warning** regardless of task position.
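
As a rough illustration, the integration rules above could be folded into a single dispatcher. This is a minimal sketch and not part of the skill: `RunState`, `nextCheckpoint`, and every field name are hypothetical, and it assumes the caller can supply a context-usage estimate as a fraction between 0 and 1. The 70% threshold and the `ceil(totalTasks / 2)` midpoint come from the text, the "fewer than 4 tasks" exemption comes from the Gotchas section, and giving the near-limit rule priority is one reading of "regardless of task position".

```typescript
// Hypothetical state snapshot; field names are illustrative, not defined by the skill.
interface RunState {
  command: "ship" | "lfg";
  totalTasks: number;
  completedTasks: number;
  contextUsed: number; // estimated fraction of the context budget, 0..1
  aboutToClaimComplete: boolean;
  atPhaseBoundary: boolean;
}

type Checkpoint = "near-limit" | "mid-point" | "completion" | "phase-transition" | null;

function nextCheckpoint(s: RunState): Checkpoint {
  // Near-Limit Warning fires regardless of task position.
  if (s.contextUsed > 0.7) return "near-limit";
  // /lfg: run a Phase Transition Check at every phase boundary.
  if (s.command === "lfg" && s.atPhaseBoundary) return "phase-transition";
  if (s.command === "ship") {
    // Completion Check runs before each completion claim.
    if (s.aboutToClaimComplete) return "completion";
    // Mid-Point Check runs when the midpoint count is reached; skipped for < 4 tasks.
    if (s.totalTasks >= 4 && s.completedTasks === Math.ceil(s.totalTasks / 2)) {
      return "mid-point";
    }
  }
  return null;
}
```

A caller would evaluate this after each task and before each completion claim, then render the matching template from the sections above.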
+
+## Reflection Log
+
+Append all reflections to `.beads/artifacts/$BEAD_ID/reflections.md` (or session-level if no bead):
+
+```markdown
+## Reflection Log
+
+### [timestamp] Mid-Point Check
+
+Assessment: On track
+Context: ~45% used
+Adjustment: None
+
+### [timestamp] Completion Check — Task 3
+
+Verdict: Complete
+Evidence: typecheck pass, test pass (12/12)
+
+### [timestamp] Near-Limit Warning
+
+Decision: Compress and continue
+Deferred: Task 6 (cosmetic cleanup) → follow-up bead
+```
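
One way to produce entries in the shape of the log example above is a small pure formatter. This is a sketch under stated assumptions: `ReflectionEntry`, `reflectionLogPath`, and `formatReflection` are hypothetical names, the square brackets around the timestamp are kept literally as in the example, and the session-level fallback location is left out because the skill does not say where it lives. Appending the returned string to the file is left to the caller.

```typescript
// Hypothetical entry shape; the skill only shows the rendered markdown.
interface ReflectionEntry {
  timestamp: string; // caller-supplied, e.g. an ISO 8601 string
  title: string; // e.g. "Mid-Point Check"
  fields: Record<string, string>; // e.g. { Assessment: "On track" }
}

// Path pattern taken from the skill text; beadId stands in for $BEAD_ID.
function reflectionLogPath(beadId: string): string {
  return `.beads/artifacts/${beadId}/reflections.md`;
}

// Render one entry: heading, blank line, then one "Key: value" line per field.
function formatReflection(entry: ReflectionEntry): string {
  const lines = Object.entries(entry.fields).map(([key, value]) => `${key}: ${value}`);
  return [`### [${entry.timestamp}] ${entry.title}`, "", ...lines, ""].join("\n");
}
```

Keeping the formatter free of file I/O makes it easy to check the output shape before anything is written to the log.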
+
+## Gotchas
+
+- **Don't over-reflect** — these are quick self-checks, not long analyses. Each should take < 30 seconds of reasoning.
+- **Don't block on minor drift** — if drift is cosmetic (variable naming, style), note it and continue. Only pause for scope drift.
+- **Context cost** — each reflection adds ~200-400 tokens. Budget accordingly. Skip mid-point check for < 4 tasks.
+- **Not a replacement for verification** — reflections assess trajectory, not correctness. Always run actual verification commands.
@@ -245,6 +245,81 @@ After ANY `task()` subagent returns with "success", follow the **Worker Distrust
 > check a file, verify a condition, reject if unmet. Don't rely on the agent
 > "remembering" to follow the rule.
 
+## Phantom Completion Detection
+
+Tasks can "pass" verification while containing stub implementations. This gate catches completions that are technically correct but substantively empty.
+
+### When to Run
+
+- After all PRD tasks are marked complete (during `/ship` Phase 4-5)
+- Before closing any bead
+- When `--full` verification is requested
+
+### Stub Patterns to Detect
+
+Scan all files modified in the current task/bead for these phantom indicators:
+
+```bash
+# Run against modified code files only (exclude .md, .json, .yml to avoid false positives)
+git diff --name-only origin/main | grep -E '\.(ts|tsx|js|jsx|py|rs|go|swift|kt|java)$' | xargs grep -nE \
+  'return null|return undefined|return \{\}|return \[\]|onClick=\{?\(\) => \{\}\}?|TODO|FIXME|HACK|placeholder|stub|not.?implemented|throw new Error\(.Not implemented' \
+  2>/dev/null
+```
+
+| Pattern | What It Indicates | Severity |
+| -------------------------------------------------------- | ------------------------- | -------- |
+| `return null` / `return undefined` | Empty implementation | HIGH |
+| `return {}` / `return []` | Hollow data | HIGH |
+| `onClick={() => {}}` | No-op handler | HIGH |
+| `<div>Component</div>` / `<div>{/* TODO */}</div>` | Placeholder UI | HIGH |
+| `TODO` / `FIXME` / `HACK` | Acknowledged incomplete | MEDIUM |
+| `placeholder` / `stub` / `not implemented` | Self-documenting stubs | HIGH |
+| `throw new Error("Not implemented")` | Explicit stub | HIGH |
+| `fetch('/api/...')` without `await` or error handling | Disconnected call | MEDIUM |
+| `Response.json({ok: true})` or static hardcoded response | Fake API response | HIGH |
+| `console.log` as only function body | Debug-only implementation | MEDIUM |
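
To attach the severities from the table to individual matches, the scan could also be expressed as a small classifier. This is a sketch, not the skill's implementation: `STUB_RULES`, `StubHit`, and `findStubs` are hypothetical, the regexes only approximate the single-line rows of the table, and the rows that need surrounding context (un-awaited `fetch`, hardcoded responses, `console.log`-only bodies, placeholder JSX) are deliberately left out.

```typescript
type Severity = "HIGH" | "MEDIUM";

interface StubHit {
  line: number; // 1-based line number within the scanned text
  label: string;
  severity: Severity;
}

// Illustrative approximations of the single-line rows in the table above.
const STUB_RULES: { label: string; pattern: RegExp; severity: Severity }[] = [
  { label: "empty return", pattern: /\breturn (null|undefined)\b/, severity: "HIGH" },
  { label: "hollow data", pattern: /\breturn (\{\}|\[\])/, severity: "HIGH" },
  { label: "no-op handler", pattern: /onClick=\{\(\) => \{\}\}/, severity: "HIGH" },
  { label: "explicit stub", pattern: /throw new Error\(["'`]Not implemented/, severity: "HIGH" },
  { label: "self-documenting stub", pattern: /placeholder|stub|not.?implemented/, severity: "HIGH" },
  { label: "acknowledged incomplete", pattern: /\b(TODO|FIXME|HACK)\b/, severity: "MEDIUM" },
];

function findStubs(source: string): StubHit[] {
  const hits: StubHit[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of STUB_RULES) {
      if (rule.pattern.test(text)) {
        hits.push({ line: index + 1, label: rule.label, severity: rule.severity });
        break; // report only the first matching rule per line
      }
    }
  });
  return hits;
}
```

Counting the `HIGH` and `MEDIUM` entries in the result gives the two stub totals that the phantom score report asks for.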
+
+### Three-Level Artifact Verification
+
+For each file listed in PRD `Affected Files`:
+
+| Level | Check | How |
+| ------------------ | ---------------------- | -------------------------------------------------------------------------------------------- |
+| **1: Exists** | File is present | `ls path/to/file.ts` |
+| **2: Substantive** | Not a stub/placeholder | `grep -v "TODO\|FIXME\|return null\|placeholder" path/to/file.ts` — verify real logic exists |
+| **3: Wired** | Connected and used | `grep -r "import.*ExportName" src/` — verify other files import/use it |
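
The three levels can be read as a ladder where each level presupposes the one before it. The sketch below mirrors the `ls` and `grep` commands in the table against an in-memory map of path to file contents. It is illustrative only: `artifactLevel` and its parameters are hypothetical, `exportName` is assumed to be a plain identifier, and the Level 2 test is a crude stand-in, since the table asks a reviewer to verify that real logic exists, which a line filter cannot fully establish.

```typescript
// Level 0 = missing, 1 = exists, 2 = substantive, 3 = wired.
type ArtifactLevel = 0 | 1 | 2 | 3;

// `files` maps path -> contents; a stand-in for the filesystem the table's commands inspect.
function artifactLevel(
  files: Record<string, string>,
  path: string,
  exportName: string,
): ArtifactLevel {
  const source = files[path];
  if (source === undefined) return 0;

  // Mirror of the grep -v check: keep non-blank lines that carry no stub marker.
  const stubMarker = /TODO|FIXME|return null|placeholder/;
  const realLines = source
    .split("\n")
    .filter((line) => line.trim() !== "" && !stubMarker.test(line));
  if (realLines.length === 0) return 1;

  // Mirror of grep -r "import.*ExportName": some other file imports the name.
  const importer = new RegExp(`import.*${exportName}`);
  const wired = Object.keys(files).some(
    (other) => other !== path && importer.test(files[other] ?? ""),
  );
  return wired ? 3 : 2;
}
```

Tallying how many files reach each level yields the "exist / substantive / wired" counts used in the phantom score report.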
+
+### Key Link Verification
+
+Check that components are actually connected (not just existing side-by-side):
+
+| Connection Type | Check Command |
+| --------------- | -------------------------------------------------------------- |
+| Component → API | `grep -E "fetch.*api/\|axios\|useSWR\|useQuery" Component.tsx` |
+| API → Database | `grep -E "prisma\.\|db\.\|sql\|query" route.ts` |
+| Form → Handler | `grep "onSubmit\|handleSubmit" Component.tsx` |
+| State → Render | `grep "{stateVar}" Component.tsx` |
+| Route → Page | Check router config references the page component |
+
+### Phantom Score
+
+After running all checks, report a phantom score:
+
+```
+Phantom Completion Check:
+- Files scanned: [N]
+- Stubs found: [N] (HIGH: [n], MEDIUM: [n])
+- Artifact levels: [N] exist, [M] substantive, [K] wired
+- Key links verified: [N]/[M]
+- Score: [CLEAN | SUSPECT | PHANTOM]
+```
+
+| Score | Criteria | Action |
+| ----------- | ---------------------------------------------- | --------------------------------- |
+| **CLEAN** | 0 HIGH stubs, all artifacts Level 3 | Proceed |
+| **SUSPECT** | 1-2 MEDIUM stubs OR 1 artifact not Level 3 | Report, ask user |
+| **PHANTOM** | Any HIGH stubs OR >2 artifacts not substantive | **BLOCK** — fix before completion |
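
The criteria in the table above leave a few combinations unassigned, for example three or more MEDIUM stubs, or exactly two artifacts below Level 3. The sketch below shows one possible way to turn the table into a total function. `PhantomInputs` and `phantomScore` are hypothetical names, and the two tie-breaking choices marked as assumptions in the comments are not stated in the source.

```typescript
type PhantomScore = "CLEAN" | "SUSPECT" | "PHANTOM";

interface PhantomInputs {
  highStubs: number;
  mediumStubs: number;
  artifactsTotal: number;
  artifactsSubstantive: number; // artifacts that reached Level 2 or higher
  artifactsWired: number; // artifacts that reached Level 3
}

function phantomScore(i: PhantomInputs): PhantomScore {
  const notSubstantive = i.artifactsTotal - i.artifactsSubstantive;
  const notWired = i.artifactsTotal - i.artifactsWired;

  // PHANTOM: any HIGH stub, or more than two artifacts that are not substantive.
  if (i.highStubs > 0 || notSubstantive > 2) return "PHANTOM";
  // CLEAN: no HIGH stubs (checked above) and every artifact at Level 3.
  // Assumption: MEDIUM stubs must also be zero, since MEDIUM stubs are listed under SUSPECT.
  if (i.mediumStubs === 0 && notWired === 0) return "CLEAN";
  // Assumption: every remaining case, including 3+ MEDIUM stubs, is treated as SUSPECT.
  return "SUSPECT";
}
```

Treating the leftover cases as SUSPECT errs toward asking the user instead of silently proceeding, which seems closest to the intent of the gate.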
+
 ## Why This Matters
 
 From 24 failure memories: