opencodekit 0.20.5 → 0.20.7

@@ -0,0 +1,183 @@
1
+ ---
2
+ name: reflection-checkpoints
3
+ description: >
4
+ Use when executing long-running commands (/ship, /lfg) to add self-assessment
5
+ checkpoints that detect scope drift, stalled progress, and premature completion claims.
6
+ Inspired by ByteRover's reflection prompt architecture.
7
+ version: 1.0.0
8
+ tags: [workflow, quality, autonomous]
9
+ dependencies: [verification-before-completion]
10
+ ---
11
+
12
+ # Reflection Checkpoints
13
+
14
+ ## When to Use
15
+
16
+ - During `/ship` execution after completing 50%+ of tasks
17
+ - During `/lfg` at each phase transition (Plan→Work→Review→Compound)
18
+ - When a task takes significantly longer than estimated
19
+ - When context usage exceeds 60% of budget
20
+
21
+ ## When NOT to Use
22
+
23
+ - Simple work with fewer than 3 tasks
24
+ - Pure research or exploration commands
25
+ - When user explicitly requests fast execution without checkpoints
26
+
27
+ ## Overview
28
+
29
+ Long-running autonomous execution drifts silently. By the time you notice, you've burned context on the wrong thing. Reflection checkpoints force self-assessment at critical moments — catching drift before it compounds.
30
+
31
+ **Core principle:** Pause to assess, don't just assess to pause.
32
+
33
+ ## The Four Reflection Types
34
+
35
+ ### 1. Mid-Point Check
36
+
37
+ **Trigger:** After completing ~50% of planned tasks (e.g., 3 of 6 tasks done)
38
+
39
+ ```
40
+ ## 🔍 Mid-Point Reflection
41
+
42
+ **Progress:** [N/M] tasks complete
43
+ **Context used:** ~[X]% estimated
44
+
45
+ ### Scope Check
46
+ - [ ] Am I still solving the original problem?
47
+ - [ ] Have I introduced any unplanned work?
48
+ - [ ] Are remaining tasks still correctly scoped?
49
+
50
+ ### Quality Check
51
+ - [ ] Do completed tasks actually work (not just "done")?
52
+ - [ ] Any verification steps I deferred?
53
+ - [ ] Any TODO/FIXME I left that needs addressing?
54
+
55
+ ### Efficiency Check
56
+ - [ ] Am I spending context on the right things?
57
+ - [ ] Should remaining tasks be parallelized?
58
+ - [ ] Any tasks that should be deferred to a follow-up bead?
59
+
60
+ **Assessment:** [On track / Drifting / Blocked]
61
+ **Adjustment:** [None needed / Describe change]
62
+ ```
63
+
64
+ ### 2. Completion Check
65
+
66
+ **Trigger:** Before claiming any task or phase is complete
67
+
68
+ ```
69
+ ## ✅ Completion Check
70
+
71
+ **Claiming complete:** [task/phase name]
72
+
73
+ ### Evidence Audit
74
+ - [ ] Verification command was run (not assumed)
75
+ - [ ] Output confirms the claim (not inferred)
76
+ - [ ] No stub patterns in modified files
77
+ - [ ] Imports/exports are wired (not just declared)
78
+
79
+ ### Goal-Backward Check
80
+ - [ ] Does this task achieve its stated end-state?
81
+ - [ ] Would a user see the expected behavior?
82
+ - [ ] If tested manually, would it work?
83
+
84
+ **Verdict:** [Complete / Needs work: describe what]
85
+ ```
86
+
87
+ ### 3. Near-Limit Warning
88
+
89
+ **Trigger:** When context usage exceeds ~70% or the step count approaches its limit
90
+
91
+ ```
92
+ ## ⚠️ Near-Limit Warning
93
+
94
+ **Context pressure:** [High / Critical]
95
+ **Remaining tasks:** [N]
96
+
97
+ ### Triage
98
+ 1. What MUST be done before stopping? [list critical tasks]
99
+ 2. What CAN be deferred? [list deferrable tasks]
100
+ 3. What should be handed off? [list with context needed]
101
+
102
+ ### Action
103
+ - [ ] Compress completed work
104
+ - [ ] Prioritize remaining tasks ruthlessly
105
+ - [ ] Prepare handoff if needed
106
+
107
+ **Decision:** [Continue (enough budget) / Compress and continue / Handoff now]
108
+ ```
109
+
110
+ ### 4. Phase Transition Check
111
+
112
+ **Trigger:** At `/lfg` phase boundaries (Plan→Work, Work→Review, Review→Compound)
113
+
114
+ ```
115
+ ## 🔄 Phase Transition: [Previous] → [Next]
116
+
117
+ ### Previous Phase Assessment
118
+ - **Objective met?** [Yes / Partially / No]
119
+ - **Artifacts produced:** [list]
120
+ - **Open issues carried forward:** [list or "none"]
121
+
122
+ ### Next Phase Readiness
123
+ - [ ] Prerequisites satisfied
124
+ - [ ] Context is clean (no stale noise)
125
+ - [ ] Correct skills loaded for next phase
126
+
127
+ **Proceed:** [Yes / Need to resolve: describe]
128
+ ```
129
+
130
+ ## Integration Points
131
+
132
+ ### In `/ship` (Phase 3 task loop)
133
+
134
+ Run the **Mid-Point Check** once, when the number of completed tasks reaches ceil(totalTasks / 2):
135
+
136
+ ```typescript
137
+ const midpoint = Math.ceil(totalTasks / 2);
138
+ if (totalTasks >= 4 && completedTasks === midpoint) {
139
+   // Run mid-point reflection (skipped for fewer than 4 tasks; see Gotchas)
140
+   // Log assessment to .beads/artifacts/$BEAD_ID/reflections.md
141
+ }
142
+ ```
143
+
144
+ Before each task completion claim, run **Completion Check** (lightweight — just the evidence audit).
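A minimal TypeScript sketch of the `/ship` triggers, assuming the loop knows how many tasks are complete and how many are planned. `reflectionsDue` and the `Reflection` type are hypothetical names, not part of opencodekit; the rule that skips the mid-point check below 4 tasks comes from the Gotchas section.

```typescript
// Hypothetical helper for the /ship task loop: call it each time a task
// finishes, passing the updated completed count. Names are illustrative.
type Reflection = "completion" | "mid-point";

function reflectionsDue(completedTasks: number, totalTasks: number): Reflection[] {
  // The lightweight Completion Check (evidence audit) precedes every claim.
  const due: Reflection[] = ["completion"];

  // The Mid-Point Check fires once, at ceil(totalTasks / 2), and is skipped
  // entirely when there are fewer than 4 tasks.
  const midpoint = Math.ceil(totalTasks / 2);
  if (totalTasks >= 4 && completedTasks === midpoint) {
    due.push("mid-point");
  }
  return due;
}

// With 6 tasks, the mid-point check is due only when task 3 completes.
for (let done = 1; done <= 6; done++) {
  console.log(`${done}/6: ${reflectionsDue(done, 6).join(", ")}`);
}
```

Using an equality check rather than `>=` keeps the mid-point reflection from repeating on every later task.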
145
+
146
+ ### In `/lfg` (phase transitions)
147
+
148
+ At each phase boundary (Plan→Work, Work→Review, Review→Compound), run the **Phase Transition Check**.
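The three boundaries follow directly from the phase order. This sketch derives them from the phase list and renders the template's header line; `phaseTransitions`, `transitionHeader`, and the types are hypothetical names used for illustration.

```typescript
// Phase names come from the skill text; everything else is illustrative.
const PHASES = ["Plan", "Work", "Review", "Compound"] as const;
type Phase = (typeof PHASES)[number];

interface Transition {
  from: Phase;
  to: Phase;
}

// Pair each phase with its successor: Plan→Work, Work→Review, Review→Compound.
function phaseTransitions(): Transition[] {
  return PHASES.slice(1).map((to, i) => ({ from: PHASES[i], to }));
}

// Header line matching the template "Phase Transition: [Previous] → [Next]".
function transitionHeader(t: Transition): string {
  return `## 🔄 Phase Transition: ${t.from} → ${t.to}`;
}

for (const t of phaseTransitions()) {
  console.log(transitionHeader(t));
}
```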
149
+
150
+ ### Context pressure monitoring
151
+
152
+ When estimated context usage exceeds 70%, run the **Near-Limit Warning** regardless of task position.
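A sketch of the 70% trigger, assuming context usage is estimated in tokens against a fixed budget. How the estimate is obtained is not specified by the skill, and `needsNearLimitWarning` is an illustrative name.

```typescript
// Minimal sketch, assuming usage is estimated in tokens against a fixed budget.
// The 70% threshold is from the skill; the names are illustrative.
const NEAR_LIMIT_THRESHOLD = 0.7;

function needsNearLimitWarning(tokensUsed: number, tokenBudget: number): boolean {
  if (tokenBudget <= 0) {
    throw new RangeError("tokenBudget must be positive");
  }
  // "Exceeds 70%" is read as strictly greater than the threshold.
  return tokensUsed / tokenBudget > NEAR_LIMIT_THRESHOLD;
}

// Checked after every step, independent of how many tasks are done.
console.log(needsNearLimitWarning(150_000, 200_000)); // true (75% used)
console.log(needsNearLimitWarning(100_000, 200_000)); // false (50% used)
```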
153
+
154
+ ## Reflection Log
155
+
156
+ Append all reflections to `.beads/artifacts/$BEAD_ID/reflections.md` (or to a session-level log when there is no bead):
157
+
158
+ ```markdown
159
+ ## Reflection Log
160
+
161
+ ### [timestamp] Mid-Point Check
162
+
163
+ Assessment: On track
164
+ Context: ~45% used
165
+ Adjustment: None
166
+
167
+ ### [timestamp] Completion Check — Task 3
168
+
169
+ Verdict: Complete
170
+ Evidence: typecheck pass, test pass (12/12)
171
+
172
+ ### [timestamp] Near-Limit Warning
173
+
174
+ Decision: Compress and continue
175
+ Deferred: Task 6 (cosmetic cleanup) → follow-up bead
176
+ ```
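One way to append entries in the format shown, sketched with Node's `fs` module. The bead path comes from the text above; the session-level fallback file `.beads/session-reflections.md` and all function names are hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Sketch only: function names and the session-level fallback path are
// hypothetical; the bead path matches the skill text.
interface ReflectionEntry {
  kind: string; // e.g. "Mid-Point Check"
  fields: Record<string, string>; // e.g. { Assessment: "On track" }
  timestamp?: Date; // defaults to now
}

function formatEntry(entry: ReflectionEntry): string {
  const ts = (entry.timestamp ?? new Date()).toISOString();
  const body = Object.entries(entry.fields)
    .map(([key, value]) => `${key}: ${value}`)
    .join("\n");
  return `### [${ts}] ${entry.kind}\n\n${body}\n\n`;
}

function reflectionLogPath(root: string, beadId?: string): string {
  return beadId
    ? path.join(root, ".beads", "artifacts", beadId, "reflections.md")
    : path.join(root, ".beads", "session-reflections.md"); // hypothetical fallback
}

function appendReflection(root: string, entry: ReflectionEntry, beadId?: string): string {
  const file = reflectionLogPath(root, beadId);
  fs.mkdirSync(path.dirname(file), { recursive: true });
  // Write the "## Reflection Log" heading only when the file is first created.
  if (!fs.existsSync(file)) {
    fs.writeFileSync(file, "## Reflection Log\n\n");
  }
  fs.appendFileSync(file, formatEntry(entry));
  return file;
}
```

Appending (rather than rewriting) keeps earlier reflections intact if a session is compressed or handed off mid-run.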
177
+
178
+ ## Gotchas
179
+
180
+ - **Don't over-reflect** — these are quick self-checks, not long analyses. Each should take < 30 seconds of reasoning.
181
+ - **Don't block on minor drift** — if drift is cosmetic (variable naming, style), note it and continue. Only pause for scope drift.
182
+ - **Context cost** — each reflection adds ~200-400 tokens. Budget accordingly. Skip mid-point check for < 4 tasks.
183
+ - **Not a replacement for verification** — reflections assess trajectory, not correctness. Always run actual verification commands.
@@ -245,6 +245,81 @@ After ANY `task()` subagent returns with "success", follow the **Worker Distrust
245
245
  > check a file, verify a condition, reject if unmet. Don't rely on the agent
246
246
  > "remembering" to follow the rule.
247
247
 
248
+ ## Phantom Completion Detection
249
+
250
+ Tasks can "pass" verification while containing stub implementations. This gate catches completions that are technically correct but substantively empty.
251
+
252
+ ### When to Run
253
+
254
+ - After all PRD tasks are marked complete (during `/ship` Phases 4-5)
255
+ - Before closing any bead
256
+ - When `--full` verification is requested
257
+
258
+ ### Stub Patterns to Detect
259
+
260
+ Scan all files modified in the current task/bead for these phantom indicators:
261
+
262
+ ```bash
263
+ # Run against modified code files only (exclude .md, .json, .yml to avoid false positives)
264
+ git diff --name-only --diff-filter=d origin/main | grep -E '\.(ts|tsx|js|jsx|py|rs|go|swift|kt|java)$' | xargs -r grep -nE \
264
+ 'return null|return undefined|return \{\}|return \[\]|onClick=\{?\(\) => \{\}\}?|TODO|FIXME|HACK|placeholder|stub|not.?implemented|throw new Error\(.Not implemented' \
266
+ 2>/dev/null
267
+ ```
268
+
269
+ | Pattern | What It Indicates | Severity |
270
+ | -------------------------------------------------------- | ------------------------- | -------- |
271
+ | `return null` / `return undefined` | Empty implementation | HIGH |
272
+ | `return {}` / `return []` | Hollow data | HIGH |
273
+ | `onClick={() => {}}` | No-op handler | HIGH |
274
+ | `<div>Component</div>` / `<div>{/* TODO */}</div>` | Placeholder UI | HIGH |
275
+ | `TODO` / `FIXME` / `HACK` | Acknowledged incomplete | MEDIUM |
276
+ | `placeholder` / `stub` / `not implemented` | Self-documenting stubs | HIGH |
277
+ | `throw new Error("Not implemented")` | Explicit stub | HIGH |
278
+ | `fetch('/api/...')` without `await` or error handling | Disconnected call | MEDIUM |
279
+ | `Response.json({ok: true})` or static hardcoded response | Fake API response | HIGH |
280
+ | `console.log` as only function body | Debug-only implementation | MEDIUM |
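The same scan can be run over file contents in TypeScript, which makes the severities explicit. This is a sketch covering a subset of the table; the regexes are approximations written for this example and will produce false positives (a legitimate `return null`, an HTML `placeholder` attribute) that still need human review. `scanForStubs` is a hypothetical name.

```typescript
// Subset of the phantom indicators, tagged with the table's severities.
type Severity = "HIGH" | "MEDIUM";

interface StubPattern {
  label: string;
  regex: RegExp;
  severity: Severity;
}

const STUB_PATTERNS: StubPattern[] = [
  { label: "empty return", regex: /\breturn (null|undefined)\s*;?\s*$/, severity: "HIGH" },
  { label: "hollow data", regex: /\breturn (\{\}|\[\])\s*;?\s*$/, severity: "HIGH" },
  { label: "no-op handler", regex: /onClick=\{\s*\(\)\s*=>\s*\{\s*\}\s*\}/, severity: "HIGH" },
  { label: "self-documenting stub", regex: /placeholder|\bstub\b|not.?implemented/i, severity: "HIGH" },
  { label: "acknowledged incomplete", regex: /\b(TODO|FIXME|HACK)\b/, severity: "MEDIUM" },
];

interface Finding {
  line: number; // 1-based line number
  label: string;
  severity: Severity;
}

// Report every pattern that matches each line (one line may yield several findings).
function scanForStubs(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, i) => {
    for (const p of STUB_PATTERNS) {
      if (p.regex.test(text)) {
        findings.push({ line: i + 1, label: p.label, severity: p.severity });
      }
    }
  });
  return findings;
}
```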
281
+
282
+ ### Three-Level Artifact Verification
283
+
284
+ For each file listed in the PRD's `Affected Files` section:
285
+
286
+ | Level | Check | How |
287
+ | ------------------ | ---------------------- | -------------------------------------------------------------------------------------------- |
288
+ | **1: Exists** | File is present | `ls path/to/file.ts` |
289
+ | **2: Substantive** | Not a stub/placeholder | `grep -v "TODO\|FIXME\|return null\|placeholder" path/to/file.ts` — verify real logic exists |
290
+ | **3: Wired** | Connected and used | `grep -r "import.*ExportName" src/` — verify other files import/use it |
291
+
292
+ ### Key Link Verification
293
+
294
+ Check that components are actually connected (not just existing side-by-side):
295
+
296
+ | Connection Type | Check Command |
297
+ | --------------- | -------------------------------------------------------------- |
298
+ | Component → API | `grep -E "fetch.*api/\|axios\|useSWR\|useQuery" Component.tsx` |
299
+ | API → Database | `grep -E "prisma\.\|db\.\|sql\|query" route.ts` |
300
+ | Form → Handler | `grep "onSubmit\|handleSubmit" Component.tsx` |
301
+ | State → Render | `grep "{stateVar}" Component.tsx` |
302
+ | Route → Page | Check router config references the page component |
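The grep commands in this table can be mirrored as regexes for use in a script. This sketch only shows that a connection is plausibly present, not that it works; Route → Page is omitted because it depends on the router configuration. `hasLink` and `rendersState` are illustrative names.

```typescript
// Regexes mirroring the table's grep commands; a match means the connection
// is plausibly present, not that it works.
const KEY_LINK_PATTERNS = {
  "Component → API": /fetch.*api\/|axios|useSWR|useQuery/,
  "API → Database": /prisma\.|db\.|sql|query/,
  "Form → Handler": /onSubmit|handleSubmit/,
};

type LinkType = keyof typeof KEY_LINK_PATTERNS;

function hasLink(source: string, type: LinkType): boolean {
  return KEY_LINK_PATTERNS[type].test(source);
}

// State → Render: the table greps for the literal text "{stateVar}", i.e. the
// state variable interpolated inside JSX braces.
function rendersState(source: string, stateVar: string): boolean {
  return source.includes(`{${stateVar}}`);
}
```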
303
+
304
+ ### Phantom Score
305
+
306
+ After running all checks, report a phantom score:
307
+
308
+ ```
309
+ Phantom Completion Check:
310
+ - Files scanned: [N]
311
+ - Stubs found: [N] (HIGH: [n], MEDIUM: [n])
312
+ - Artifact levels: [N] exist, [M] substantive, [K] wired
313
+ - Key links verified: [N]/[M]
314
+ - Score: [CLEAN | SUSPECT | PHANTOM]
315
+ ```
316
+
317
+ | Score | Criteria | Action |
318
+ | ----------- | ---------------------------------------------- | --------------------------------- |
319
+ | **CLEAN** | 0 HIGH stubs, all artifacts Level 3 | Proceed |
320
+ | **SUSPECT** | 1-2 MEDIUM stubs OR 1 artifact not Level 3 | Report, ask user |
321
+ | **PHANTOM** | Any HIGH stubs OR >2 artifacts not substantive | **BLOCK** — fix before completion |
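The score table can be encoded as a function, but it leaves some cases open: one or two MEDIUM stubs with every artifact wired match both CLEAN and SUSPECT, and two artifacts below Level 3 that are still substantive match no row. This sketch assumes PHANTOM is checked first, CLEAN requires zero MEDIUM stubs, and everything else is SUSPECT; it also renders the report template. All names are hypothetical.

```typescript
type PhantomScore = "CLEAN" | "SUSPECT" | "PHANTOM";
type Level = 0 | 1 | 2 | 3; // highest level each artifact passes (0 = missing)

interface PhantomInputs {
  filesScanned: number;
  highStubs: number;
  mediumStubs: number;
  artifactLevels: Level[]; // one entry per PRD Affected File
  linksVerified: number;
  linksTotal: number;
}

function phantomScore(x: PhantomInputs): PhantomScore {
  const notSubstantive = x.artifactLevels.filter((l) => l < 2).length;
  const notWired = x.artifactLevels.filter((l) => l < 3).length;
  // PHANTOM wins over everything else.
  if (x.highStubs > 0 || notSubstantive > 2) return "PHANTOM";
  // Assumption: any MEDIUM stub or unwired artifact rules out CLEAN.
  if (x.mediumStubs === 0 && notWired === 0) return "CLEAN";
  return "SUSPECT";
}

function formatReport(x: PhantomInputs): string {
  const atLeast = (min: Level) => x.artifactLevels.filter((l) => l >= min).length;
  return [
    "Phantom Completion Check:",
    `- Files scanned: ${x.filesScanned}`,
    `- Stubs found: ${x.highStubs + x.mediumStubs} (HIGH: ${x.highStubs}, MEDIUM: ${x.mediumStubs})`,
    `- Artifact levels: ${atLeast(1)} exist, ${atLeast(2)} substantive, ${atLeast(3)} wired`,
    `- Key links verified: ${x.linksVerified}/${x.linksTotal}`,
    `- Score: ${phantomScore(x)}`,
  ].join("\n");
}
```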
322
+
248
323
  ## Why This Matters
249
324
 
250
325
  From 24 failure memories:
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "opencodekit",
3
- "version": "0.20.5",
3
+ "version": "0.20.7",
4
4
  "description": "CLI tool for bootstrapping and managing OpenCodeKit projects",
5
5
  "keywords": [
6
6
  "agents",