opencodekit 0.18.4 → 0.18.6

Files changed (43)
  1. package/dist/index.js +491 -47
  2. package/dist/template/.opencode/AGENTS.md +13 -1
  3. package/dist/template/.opencode/agent/build.md +4 -1
  4. package/dist/template/.opencode/agent/explore.md +25 -58
  5. package/dist/template/.opencode/command/ship.md +7 -5
  6. package/dist/template/.opencode/command/verify.md +63 -12
  7. package/dist/template/.opencode/memory/research/benchmark-framework.md +162 -0
  8. package/dist/template/.opencode/memory/research/effectiveness-audit.md +213 -0
  9. package/dist/template/.opencode/memory.db +0 -0
  10. package/dist/template/.opencode/memory.db-shm +0 -0
  11. package/dist/template/.opencode/memory.db-wal +0 -0
  12. package/dist/template/.opencode/opencode.json +1429 -1678
  13. package/dist/template/.opencode/package.json +1 -1
  14. package/dist/template/.opencode/plugin/lib/memory-helpers.ts +3 -129
  15. package/dist/template/.opencode/plugin/lib/memory-hooks.ts +4 -60
  16. package/dist/template/.opencode/plugin/memory.ts +0 -3
  17. package/dist/template/.opencode/skill/agent-teams/SKILL.md +16 -1
  18. package/dist/template/.opencode/skill/beads/SKILL.md +22 -0
  19. package/dist/template/.opencode/skill/brainstorming/SKILL.md +28 -0
  20. package/dist/template/.opencode/skill/code-navigation/SKILL.md +130 -0
  21. package/dist/template/.opencode/skill/condition-based-waiting/SKILL.md +12 -0
  22. package/dist/template/.opencode/skill/context-management/SKILL.md +122 -113
  23. package/dist/template/.opencode/skill/defense-in-depth/SKILL.md +20 -0
  24. package/dist/template/.opencode/skill/design-system-audit/SKILL.md +113 -112
  25. package/dist/template/.opencode/skill/dispatching-parallel-agents/SKILL.md +8 -0
  26. package/dist/template/.opencode/skill/executing-plans/SKILL.md +156 -132
  27. package/dist/template/.opencode/skill/memory-system/SKILL.md +50 -266
  28. package/dist/template/.opencode/skill/mockup-to-code/SKILL.md +21 -6
  29. package/dist/template/.opencode/skill/receiving-code-review/SKILL.md +8 -0
  30. package/dist/template/.opencode/skill/root-cause-tracing/SKILL.md +15 -0
  31. package/dist/template/.opencode/skill/session-management/SKILL.md +4 -103
  32. package/dist/template/.opencode/skill/subagent-driven-development/SKILL.md +23 -2
  33. package/dist/template/.opencode/skill/swarm-coordination/SKILL.md +17 -1
  34. package/dist/template/.opencode/skill/systematic-debugging/SKILL.md +21 -0
  35. package/dist/template/.opencode/skill/tool-priority/SKILL.md +34 -16
  36. package/dist/template/.opencode/skill/ui-ux-research/SKILL.md +5 -127
  37. package/dist/template/.opencode/skill/verification-before-completion/SKILL.md +36 -0
  38. package/dist/template/.opencode/skill/verification-before-completion/references/VERIFICATION_PROTOCOL.md +133 -29
  39. package/dist/template/.opencode/skill/visual-analysis/SKILL.md +20 -7
  40. package/dist/template/.opencode/skill/writing-plans/SKILL.md +7 -0
  41. package/dist/template/.opencode/tool/context7.ts +9 -1
  42. package/dist/template/.opencode/tool/grepsearch.ts +9 -1
  43. package/package.json +1 -1
@@ -0,0 +1,213 @@
1
+ ---
2
+ purpose: Systematic effectiveness audit of all template skills, tools, and commands
3
+ updated: 2026-03-08
4
+ framework: benchmark-framework.md (7 dimensions, 0-2 each, max 14)
5
+ ---
6
+
7
+ # Effectiveness Audit — OpenCodeKit Template
8
+
9
+ ## Methodology
10
+
11
+ Scored 25 skills and 2 tools using the benchmark framework. The 18 commands are rated separately as High/Medium/Low (see Commands Assessment).
12
+ Dimensions: **T**rigger clarity, **R**eplaces X, **E**xamples, **A**nti-patterns, **V**erification, **Tok**en efficiency, **X**-references.
13
+ Scale: 0=missing, 1=partial, 2=strong. Max: 14.
14
+
15
+ ## Summary
16
+
17
+ | Metric | Value |
18
+ | ------------------ | -------- |
19
+ | Total skills | 73 |
20
+ | Reviewed in detail | 25 |
21
+ | Exemplary (12-14) | 5 (20%) |
22
+ | Adequate (8-11) | 10 (40%) |
23
+ | Needs Work (4-7) | 8 (32%) |
24
+ | Poor (0-3) | 2 (8%) |
25
+ | Custom tools | 2 |
26
+ | Commands | 18 |
27
+
28
+ ## Tier 1: Exemplary (12-14)
29
+
30
+ Skills ready to ship — high adoption, measurable value.
31
+
32
+ | Skill | T | R | E | A | V | Tok | X | Total | Tokens | Notes |
33
+ | ------------------------------ | --- | --- | --- | --- | --- | --- | --- | ------ | ------ | ----------------------------------------------------------------------- |
34
+ | structured-edit | 2 | 1 | 2 | 2 | 2 | 2 | 2 | **13** | ~1.3k | Gold standard. 5-step protocol, Red Flags, BAD/GOOD examples, quick ref |
35
+ | code-navigation | 2 | 2 | 2 | 2 | 0 | 2 | 2 | **12** | ~1.2k | 7 patterns, tilth comparison, cost awareness, right/wrong examples |
36
+ | verification-before-completion | 2 | 0 | 2 | 2 | 2 | 2 | 1 | **11** | ~1.6k | Iron Law, rationalization prevention, smart verification |
37
+ | tool-priority | 2 | 2 | 2 | 2 | 0 | 1 | 2 | **11** | ~3.3k | "Replaces X" on all tools, tilth section, LSP 9-op table |
38
+ | requesting-code-review | 2 | 0 | 2 | 2 | 2 | 1 | 2 | **11** | ~2.5k | 3 review depths, 5 reviewer prompts, synthesis checklist |
39
+
40
+ ### What makes these work
41
+
42
+ 1. **Right/wrong examples** — Every exemplary skill shows incorrect then correct approach
43
+ 2. **Tables over prose** — Decision tables, comparison tables, common mistakes tables
44
+ 3. **Integrated verification** — structured-edit Step 5 (CONFIRM), verification-before-completion Iron Law
45
+ 4. **Quick reference blocks** — structured-edit and tool-priority both end with copy-pasteable references
46
+ 5. **"Replaces X" framing** — code-navigation and tool-priority explicitly state what they supersede
47
+
48
+ ## Tier 2: Adequate (8-11)
49
+
50
+ Functional but missing patterns that would improve adoption.
51
+
52
+ | Skill | T | R | E | A | V | Tok | X | Total | Tokens | Gap |
53
+ | --------------------------- | --- | --- | --- | --- | --- | --- | --- | ------ | ------ | ---------------------------------------------------- |
54
+ | dispatching-parallel-agents | 2 | 0 | 2 | 2 | 2 | 2 | 0 | **10** | ~1.4k | No "Replaces X", no cross-refs |
55
+ | executing-plans | 2 | 0 | 2 | 1 | 2 | 1 | 2 | **10** | ~1.5k | No "Replaces X" |
56
+ | agent-teams | 2 | 0 | 2 | 2 | 1 | 1 | 1 | **9** | ~2.1k | No "Replaces X", could be more token-efficient |
57
+ | condition-based-waiting | 2 | 1 | 2 | 2 | 0 | 2 | 0 | **9** | ~868 | No verification step, no cross-refs |
58
+ | root-cause-tracing | 2 | 0 | 2 | 0 | 1 | 2 | 1 | **8** | ~1.2k | No anti-patterns, no "Replaces X" |
59
+ | writing-plans | 2 | 0 | 2 | 1 | 1 | 1 | 1 | **8** | ~2.0k | No "Replaces X", could trim |
60
+ | beads | 2 | 0 | 2 | 0 | 0 | 2 | 2 | **8** | ~1.2k | No anti-patterns, no verification |
61
+ | receiving-code-review | 2 | 0 | 2 | 2 | 1 | 1 | 0 | **8** | ~1.7k | No "Replaces X", no cross-refs |
62
+ | defense-in-depth | 2 | 0 | 2 | 0 | 0 | 2 | 1 | **7** | ~1.0k | No anti-patterns, no verification |
63
+ | systematic-debugging | 2 | 0 | 2 | 1 | 0 | 1 | 0 | **6** | ~1.6k | Border case — no verification, limited anti-patterns |
64
+
65
+ ### Common gaps in this tier
66
+
67
+ 1. **No "Replaces X"** — 9/10 adequate skills lack replacement framing
68
+ 2. **Missing verification** — 4/10 score 0 on verification (condition-based-waiting, beads, defense-in-depth, systematic-debugging)
69
+ 3. **No anti-patterns** — 3/10 lack anti-pattern sections (root-cause-tracing, beads, defense-in-depth)
70
+ 4. **No cross-references** — 4/10 are isolated (no links to related skills)
71
+
72
+ ## Tier 3: Needs Work (4-7)
73
+
74
+ Significant gaps — may load but produce suboptimal results.
75
+
76
+ | Skill | T | R | E | A | V | Tok | X | Total | Tokens | Issue |
77
+ | --------------------------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ---------------------------------------------- |
78
+ | context-management | 2 | 0 | 2 | 1 | 0 | 1 | 0 | **6** | ~1.7k | Overlaps with DCP system prompts |
79
+ | session-management | 2 | 0 | 1 | 1 | 0 | 2 | 0 | **6** | ~848 | Generic, no tool examples |
80
+ | swarm-coordination | 2 | 0 | 1 | 1 | 0 | 1 | 1 | **6** | ~1.8k | Partially complete, missing examples |
81
+ | memory-system | 2 | 0 | 2 | 0 | 0 | 1 | 0 | **5** | ~2.4k | Token-heavy, no anti-patterns, no verification |
82
+ | brainstorming | 2 | 0 | 0 | 0 | 0 | 2 | 1 | **5** | ~832 | No examples, no anti-patterns |
83
+ | mockup-to-code | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~794 | Prompt templates only |
84
+ | subagent-driven-development | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~1.2k | No anti-patterns, no verification |
85
+ | visual-analysis | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~705 | Prompt templates only |
86
+
87
+ ### Common issues
88
+
89
+ 1. **Prompt-template-only pattern** — mockup-to-code, visual-analysis give templates without tool integration
90
+ 2. **No anti-patterns** — 5/8 lack anti-pattern sections entirely
91
+ 3. **No verification** — 8/8 don't integrate verification
92
+ 4. **No examples** — brainstorming has zero code examples
93
+
94
+ ## Tier 4: Poor (0-3)
95
+
96
+ Should be rewritten or merged.
97
+
98
+ | Skill | T | R | E | A | V | Tok | X | Total | Tokens | Action |
99
+ | ------------------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ------------------------------------------------------------ |
100
+ | ui-ux-research | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~609 | Merge into design-system-audit or rewrite with tool examples |
101
+ | design-system-audit | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~527 | Merge with ui-ux-research or add substance |
102
+
103
+ _Note: Both skills total 5, which falls in the Needs Work band (4-7) of the rubric. They are placed in Tier 4 because they consist entirely of prompt templates with no actionable tool integration, anti-patterns, or verification, which makes them the least effective in practice._
104
+
105
+ ## Not Reviewed (Estimated by Category)
106
+
107
+ These 48 skills were not read in detail. The estimates below are based on each skill's YAML description, size, and category patterns.
108
+
109
+ ### Platform-Specific (likely Adequate if domain is relevant)
110
+
111
+ - swiftui-expert-skill (~4.2k tokens) — Largest skill, likely good depth
112
+ - swift-concurrency, core-data-expert — Domain-specific
113
+ - react-best-practices, supabase-postgres-best-practices — Framework-specific
114
+
115
+ ### External Integrations (varies)
116
+
117
+ - resend, cloudflare, supabase, polar, jira, figma, stitch, v0, v1-run, mqdh
118
+ - These are MCP connector skills — effectiveness depends on API coverage
119
+
120
+ ### Meta Skills
121
+
122
+ - skill-creator, writing-skills, testing-skills-with-subagents, sharing-skills, using-skills
123
+ - Self-referential — should follow their own rules
124
+
125
+ ### Browser/Automation
126
+
127
+ - playwright, playwriter, agent-browser, chrome-devtools
128
+
129
+ ### Context/Lifecycle
130
+
131
+ - compaction, context-engineering, context-initialization, gemini-large-context
132
+ - development-lifecycle, prd, prd-task
133
+ - finishing-a-development-branch, using-git-worktrees
134
+ - deep-research, source-code-research, opensrc, augment-context-engine
135
+ - beads-bridge, ralph, index-knowledge, obsidian, pdf-extract
136
+ - accessibility-audit, web-design-guidelines, frontend-design
137
+
138
+ ## Tools Audit
139
+
140
+ | Tool | T | R | E | A | V | Tok | X | Total | Tokens | Notes |
141
+ | ---------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ------------------------------------------------------- |
142
+ | context7 | 2 | 2 | 2 | 0 | 0 | 1 | 0 | **7** | ~1.4k | Has "Replaces X" + WHEN/SKIP. Missing anti-patterns |
143
+ | grepsearch | 1 | 2 | 2 | 0 | 0 | 2 | 0 | **7** | ~946 | Has "Replaces X". Missing full SKIP gate, anti-patterns |
144
+
145
+ ### Tool recommendations
146
+
147
+ - Add anti-patterns to both tool descriptions (common misuse patterns)
148
+ - context7: Add "SKIP: Internal code (use tilth/grep)" explicitly
149
+ - grepsearch: Add full WHEN/SKIP binary gate
150
+
151
+ ## Commands Assessment
152
+
153
+ 18 commands total. Commands evaluated on: clear trigger, actionable steps, verification integration, error guidance.
154
+
155
+ | Command | Category | Quality | Notes |
156
+ | --------------------------- | -------- | ------- | ------------------------------------------------ |
157
+ | lfg | Workflow | High | Full chain orchestration |
158
+ | ship | Workflow | High | Clear gates and verification |
159
+ | plan | Planning | High | Structured output |
160
+ | verify | Quality | High | Recently improved (incremental, parallel, cache) |
161
+ | compound | Learning | High | Extracts learnings |
162
+ | start/resume/handoff | Session | Medium | Functional but could cross-ref more |
163
+ | status | Info | Medium | |
164
+ | pr | Git | Medium | |
165
+ | review-codebase | Quality | Medium | |
166
+ | research | Research | Medium | |
167
+ | design/ui-review | Design | Low | Prompt-template style |
168
+ | init/init-user/init-context | Setup | High | Well-tested |
169
+ | create | Meta | Medium | |
170
+
171
+ ## Overlap Analysis
172
+
173
+ | Pair | Overlap | Recommendation |
174
+ | -------------------------------------------------------------- | ------------------------------------ | ----------------------------------------------------- |
175
+ | context-management ↔ compaction | Both manage context size | Merge or clearly differentiate |
176
+ | agent-teams ↔ swarm-coordination ↔ dispatching-parallel-agents | All handle parallel agents | Create decision tree in agent-teams, reference others |
177
+ | session-management ↔ context-management | Both track context thresholds | Merge session into context-management |
178
+ | ui-ux-research ↔ design-system-audit ↔ visual-analysis | All design-focused prompt templates | Consolidate into one design-audit skill |
179
+ | beads ↔ beads-bridge | Bridge extends beads for multi-agent | Clear but should be documented in beads |
180
+ | structured-edit ↔ code-navigation | Both about code manipulation | Cross-reference each other |
181
+
182
+ ## Top 10 Improvement Priorities
183
+
184
+ Ranked by impact (core skills first, high-frequency usage).
185
+
186
+ | # | Action | Target | Impact |
187
+ | --- | -------------------------------------------------------------------------- | ----------------- | --------------------------- |
188
+ | 1 | Add "Replaces X" to top 10 skills | All tier 2 skills | +adoption (tilth: +36pp) |
189
+ | 2 | Add anti-patterns to beads, defense-in-depth, root-cause-tracing | Core debugging | +failure prevention |
190
+ | 3 | Add verification steps to condition-based-waiting, defense-in-depth, beads | Core workflow | +correctness |
191
+ | 4 | Consolidate context-management + session-management | Context skills | -redundancy, -token cost |
192
+ | 5 | Consolidate ui-ux-research + design-system-audit + visual-analysis | Design skills | -3 weak skills → 1 adequate |
193
+ | 6 | Rewrite brainstorming with concrete examples | Planning | +actionability |
194
+ | 7 | Add cross-references to isolated skills (6 skills) | Various | +routing |
195
+ | 8 | Trim memory-system from 2.4k to ~1.5k tokens | Core | +token efficiency |
196
+ | 9 | Add WHEN/SKIP gates to tools (context7 explicit SKIP, grepsearch full WHEN/SKIP gate) | Tools | +routing |
197
+ | 10 | Audit remaining 48 un-reviewed skills | All | Full coverage |
198
+
199
+ ## Template-Level Metrics
200
+
201
+ | Metric | Target | Current | Status |
202
+ | ----------------------------- | ------ | ---------------- | ---------- |
203
+ | Core skills at Exemplary tier | 100% | 50% (5/10 core) | Needs work |
204
+ | No skills at Poor tier | 0 | 2 | Needs work |
205
+ | Average token cost per skill | <1500 | ~1.5k (reviewed) | Borderline |
206
+ | Skills with WHEN/SKIP gates | 100% | 100% (reviewed) | PASS |
207
+ | Skills with anti-patterns | >75% | 44% (11/25) | Needs work |
208
+ | Overlap/redundancy pairs | 0 | 6 pairs | Needs work |
209
+
210
+ ---
211
+
212
+ _Next: Apply improvement priorities starting with #1 (add "Replaces X" to tier 2 skills)._
213
+ _Re-audit after changes to measure improvement._
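The scoring scheme described in the Methodology and Summary sections of the audit above (seven dimensions, each scored 0-2, with tiers assigned by total) can be sketched in a few lines of Python. This is an illustrative sketch only: `SkillScore`, `TIER_BANDS`, and `count_missing` are hypothetical names that do not appear in the package, and the three sample rows are copied from the Tier 1 table.

```python
from dataclasses import dataclass

# Dimension order follows the audit tables: T, R, E, A, V, Tok, X.
DIMENSIONS = ("T", "R", "E", "A", "V", "Tok", "X")

# Tier bands as listed in the Summary table (inclusive bounds).
TIER_BANDS = (
    ("Exemplary", 12, 14),
    ("Adequate", 8, 11),
    ("Needs Work", 4, 7),
    ("Poor", 0, 3),
)


@dataclass(frozen=True)
class SkillScore:
    """One audit row (hypothetical structure, not part of the package)."""

    name: str
    scores: tuple  # seven integers, each 0 (missing), 1 (partial), or 2 (strong)

    def total(self) -> int:
        # Validate the row before summing so typos in a table are caught early.
        if len(self.scores) != len(DIMENSIONS):
            raise ValueError(f"{self.name}: expected {len(DIMENSIONS)} scores")
        if any(score not in (0, 1, 2) for score in self.scores):
            raise ValueError(f"{self.name}: each score must be 0, 1, or 2")
        return sum(self.scores)

    def tier(self) -> str:
        # Map the total onto the band that contains it.
        total = self.total()
        for label, low, high in TIER_BANDS:
            if low <= total <= high:
                return label
        raise ValueError(f"{self.name}: total {total} is outside every band")


def count_missing(rows, dimension: str) -> int:
    """Count rows that score 0 on the given dimension."""
    index = DIMENSIONS.index(dimension)
    return sum(1 for row in rows if row.scores[index] == 0)


# Sample rows copied from the Tier 1 table in the audit.
tier1 = [
    SkillScore("structured-edit", (2, 1, 2, 2, 2, 2, 2)),
    SkillScore("code-navigation", (2, 2, 2, 2, 0, 2, 2)),
    SkillScore("verification-before-completion", (2, 0, 2, 2, 2, 2, 1)),
]

for row in tier1:
    print(f"{row.name}: total={row.total()} tier={row.tier()}")

print("rows with no verification:", count_missing(tier1, "V"))
```

Run as written, the third row totals 11, which the Summary bands classify as Adequate even though the audit lists that skill under Tier 1. A check along these lines may help spot such placements, and recompute the per-tier gap counts, when re-auditing.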