opencodekit 0.18.4 → 0.18.6
- package/dist/index.js +491 -47
- package/dist/template/.opencode/AGENTS.md +13 -1
- package/dist/template/.opencode/agent/build.md +4 -1
- package/dist/template/.opencode/agent/explore.md +25 -58
- package/dist/template/.opencode/command/ship.md +7 -5
- package/dist/template/.opencode/command/verify.md +63 -12
- package/dist/template/.opencode/memory/research/benchmark-framework.md +162 -0
- package/dist/template/.opencode/memory/research/effectiveness-audit.md +213 -0
- package/dist/template/.opencode/memory.db +0 -0
- package/dist/template/.opencode/memory.db-shm +0 -0
- package/dist/template/.opencode/memory.db-wal +0 -0
- package/dist/template/.opencode/opencode.json +1429 -1678
- package/dist/template/.opencode/package.json +1 -1
- package/dist/template/.opencode/plugin/lib/memory-helpers.ts +3 -129
- package/dist/template/.opencode/plugin/lib/memory-hooks.ts +4 -60
- package/dist/template/.opencode/plugin/memory.ts +0 -3
- package/dist/template/.opencode/skill/agent-teams/SKILL.md +16 -1
- package/dist/template/.opencode/skill/beads/SKILL.md +22 -0
- package/dist/template/.opencode/skill/brainstorming/SKILL.md +28 -0
- package/dist/template/.opencode/skill/code-navigation/SKILL.md +130 -0
- package/dist/template/.opencode/skill/condition-based-waiting/SKILL.md +12 -0
- package/dist/template/.opencode/skill/context-management/SKILL.md +122 -113
- package/dist/template/.opencode/skill/defense-in-depth/SKILL.md +20 -0
- package/dist/template/.opencode/skill/design-system-audit/SKILL.md +113 -112
- package/dist/template/.opencode/skill/dispatching-parallel-agents/SKILL.md +8 -0
- package/dist/template/.opencode/skill/executing-plans/SKILL.md +156 -132
- package/dist/template/.opencode/skill/memory-system/SKILL.md +50 -266
- package/dist/template/.opencode/skill/mockup-to-code/SKILL.md +21 -6
- package/dist/template/.opencode/skill/receiving-code-review/SKILL.md +8 -0
- package/dist/template/.opencode/skill/root-cause-tracing/SKILL.md +15 -0
- package/dist/template/.opencode/skill/session-management/SKILL.md +4 -103
- package/dist/template/.opencode/skill/subagent-driven-development/SKILL.md +23 -2
- package/dist/template/.opencode/skill/swarm-coordination/SKILL.md +17 -1
- package/dist/template/.opencode/skill/systematic-debugging/SKILL.md +21 -0
- package/dist/template/.opencode/skill/tool-priority/SKILL.md +34 -16
- package/dist/template/.opencode/skill/ui-ux-research/SKILL.md +5 -127
- package/dist/template/.opencode/skill/verification-before-completion/SKILL.md +36 -0
- package/dist/template/.opencode/skill/verification-before-completion/references/VERIFICATION_PROTOCOL.md +133 -29
- package/dist/template/.opencode/skill/visual-analysis/SKILL.md +20 -7
- package/dist/template/.opencode/skill/writing-plans/SKILL.md +7 -0
- package/dist/template/.opencode/tool/context7.ts +9 -1
- package/dist/template/.opencode/tool/grepsearch.ts +9 -1
- package/package.json +1 -1
@@ -0,0 +1,213 @@
+---
+purpose: Systematic effectiveness audit of all template skills, tools, and commands
+updated: 2026-03-08
+framework: benchmark-framework.md (7 dimensions, 0-2 each, max 14)
+---
+
+# Effectiveness Audit — OpenCodeKit Template
+
+## Methodology
+
+Scored 25+ skills, 2 tools, 18 commands using the benchmark framework.
+Dimensions: **T**rigger clarity, **R**eplaces X, **E**xamples, **A**nti-patterns, **V**erification, **Tok**en efficiency, **X**-references.
+Scale: 0=missing, 1=partial, 2=strong. Max: 14.
+
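The rubric described under Methodology reduces to simple arithmetic. As a minimal sketch (the names `DIMENSIONS`, `total_score`, and `tier_for` are illustrative assumptions, not part of the package; the tier bands are the ones listed in the audit's Summary table), a total and its nominal band could be computed like this:

```python
# Illustrative sketch only: these names are assumptions, not opencodekit APIs.
DIMENSIONS = ("T", "R", "E", "A", "V", "Tok", "X")  # the seven audit dimensions
MAX_PER_DIMENSION = 2  # 0 = missing, 1 = partial, 2 = strong

# Tier bands as listed in the audit's Summary table (inclusive bounds).
TIER_BANDS = (
    (12, 14, "Exemplary"),
    (8, 11, "Adequate"),
    (4, 7, "Needs Work"),
    (0, 3, "Poor"),
)


def total_score(scores: dict[str, int]) -> int:
    """Sum the seven dimension scores after validating each one."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError(f"expected exactly these dimensions: {DIMENSIONS}")
    for name, value in scores.items():
        if not 0 <= value <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_DIMENSION}")
    return sum(scores.values())


def tier_for(total: int) -> str:
    """Map a total (0-14) to its nominal tier band."""
    for low, high, label in TIER_BANDS:
        if low <= total <= high:
            return label
    raise ValueError(f"total out of range: {total}")


# Row for structured-edit as it appears in the Tier 1 table.
structured_edit = {"T": 2, "R": 1, "E": 2, "A": 2, "V": 2, "Tok": 2, "X": 2}
print(total_score(structured_edit), tier_for(total_score(structured_edit)))
```

The audit sometimes places a skill in a tier by judgment rather than by band (see the note under Tier 4), so `tier_for` gives only the nominal band.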
+## Summary
+
+| Metric | Value |
+| ------------------ | -------- |
+| Total skills | 73 |
+| Reviewed in detail | 25 |
+| Exemplary (12-14) | 5 (20%) |
+| Adequate (8-11) | 10 (40%) |
+| Needs Work (4-7) | 8 (32%) |
+| Poor (0-3) | 2 (8%) |
+| Custom tools | 2 |
+| Commands | 18 |
+
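The percentages in the Summary table follow directly from the tier counts. A short sketch of that arithmetic, with variable names chosen here for illustration only:

```python
# Counts copied from the Summary table; variable names are illustrative.
TOTAL_SKILLS = 73
tier_counts = {"Exemplary": 5, "Adequate": 10, "Needs Work": 8, "Poor": 2}

reviewed = sum(tier_counts.values())
not_reviewed = TOTAL_SKILLS - reviewed

# Share of reviewed skills in each tier, as whole percentages.
shares = {tier: round(100 * n / reviewed) for tier, n in tier_counts.items()}

print(reviewed, not_reviewed)
print(shares)
```

The difference between the total and the reviewed count is the number of skills the audit later lists as not reviewed.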
+## Tier 1: Exemplary (12-14)
+
+Skills ready to ship — high adoption, measurable value.
+
+| Skill | T | R | E | A | V | Tok | X | Total | Tokens | Notes |
+| ------------------------------ | --- | --- | --- | --- | --- | --- | --- | ------ | ------ | ----------------------------------------------------------------------- |
+| structured-edit | 2 | 1 | 2 | 2 | 2 | 2 | 2 | **13** | ~1.3k | Gold standard. 5-step protocol, Red Flags, BAD/GOOD examples, quick ref |
+| code-navigation | 2 | 2 | 2 | 2 | 0 | 2 | 2 | **12** | ~1.2k | 7 patterns, tilth comparison, cost awareness, right/wrong examples |
+| verification-before-completion | 2 | 0 | 2 | 2 | 2 | 2 | 1 | **11** | ~1.6k | Iron Law, rationalization prevention, smart verification |
+| tool-priority | 2 | 2 | 2 | 2 | 0 | 1 | 2 | **11** | ~3.3k | "Replaces X" on all tools, tilth section, LSP 9-op table |
+| requesting-code-review | 2 | 0 | 2 | 2 | 2 | 1 | 2 | **11** | ~2.5k | 3 review depths, 5 reviewer prompts, synthesis checklist |
+
+### What makes these work
+
+1. **Right/wrong examples** — Every exemplary skill shows incorrect then correct approach
+2. **Tables over prose** — Decision tables, comparison tables, common mistakes tables
+3. **Integrated verification** — structured-edit Step 5 (CONFIRM), verification-before-completion Iron Law
+4. **Quick reference blocks** — structured-edit and tool-priority both end with copy-pasteable references
+5. **"Replaces X" framing** — code-navigation and tool-priority explicitly state what they supersede
+
+## Tier 2: Adequate (8-11)
+
+Functional but missing patterns that would improve adoption.
+
+| Skill | T | R | E | A | V | Tok | X | Total | Tokens | Gap |
+| --------------------------- | --- | --- | --- | --- | --- | --- | --- | ------ | ------ | ---------------------------------------------------- |
+| dispatching-parallel-agents | 2 | 0 | 2 | 2 | 2 | 2 | 0 | **10** | ~1.4k | No "Replaces X", no cross-refs |
+| executing-plans | 2 | 0 | 2 | 1 | 2 | 1 | 2 | **10** | ~1.5k | No "Replaces X" |
+| agent-teams | 2 | 0 | 2 | 2 | 1 | 1 | 1 | **9** | ~2.1k | No "Replaces X", could be more token-efficient |
+| condition-based-waiting | 2 | 1 | 2 | 2 | 0 | 2 | 0 | **9** | ~868 | No verification step, no cross-refs |
+| root-cause-tracing | 2 | 0 | 2 | 0 | 1 | 2 | 1 | **8** | ~1.2k | No anti-patterns, no "Replaces X" |
+| writing-plans | 2 | 0 | 2 | 1 | 1 | 1 | 1 | **8** | ~2.0k | No "Replaces X", could trim |
+| beads | 2 | 0 | 2 | 0 | 0 | 2 | 2 | **8** | ~1.2k | No anti-patterns, no verification |
+| receiving-code-review | 2 | 0 | 2 | 2 | 1 | 1 | 0 | **8** | ~1.7k | No "Replaces X", no cross-refs |
+| defense-in-depth | 2 | 0 | 2 | 0 | 0 | 2 | 1 | **7** | ~1.0k | No anti-patterns, no verification |
+| systematic-debugging | 2 | 0 | 2 | 1 | 0 | 1 | 0 | **6** | ~1.6k | Border case — no verification, limited anti-patterns |
+
+### Common gaps in this tier
+
+1. **No "Replaces X"** — 9/10 adequate skills lack replacement framing
+2. **Missing verification** — 6/10 don't integrate verification steps
+3. **No anti-patterns** — 5/10 lack anti-pattern sections
+4. **No cross-references** — 4/10 are isolated (no links to related skills)
+
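Gap counts of this kind can be derived from the Tier 2 table rows. The sketch below assumes that a skill "lacks" a dimension when it scores 0 on it; the audit does not state its counting rule, so other thresholds may have been applied to some of its figures. Under that assumption the sketch reproduces the "Replaces X" and cross-reference counts. The function name and data layout are illustrative.

```python
# Rows copied from the Tier 2 table: (T, R, E, A, V, Tok, X) per skill.
DIMENSIONS = ("T", "R", "E", "A", "V", "Tok", "X")
TIER_2_ROWS = {
    "dispatching-parallel-agents": (2, 0, 2, 2, 2, 2, 0),
    "executing-plans": (2, 0, 2, 1, 2, 1, 2),
    "agent-teams": (2, 0, 2, 2, 1, 1, 1),
    "condition-based-waiting": (2, 1, 2, 2, 0, 2, 0),
    "root-cause-tracing": (2, 0, 2, 0, 1, 2, 1),
    "writing-plans": (2, 0, 2, 1, 1, 1, 1),
    "beads": (2, 0, 2, 0, 0, 2, 2),
    "receiving-code-review": (2, 0, 2, 2, 1, 1, 0),
    "defense-in-depth": (2, 0, 2, 0, 0, 2, 1),
    "systematic-debugging": (2, 0, 2, 1, 0, 1, 0),
}


def skills_scoring_zero(rows, dimension):
    """Return the skills whose score on `dimension` is 0 (assumed meaning of 'lacks')."""
    index = DIMENSIONS.index(dimension)
    return [name for name, scores in rows.items() if scores[index] == 0]


no_replaces = skills_scoring_zero(TIER_2_ROWS, "R")
no_cross_refs = skills_scoring_zero(TIER_2_ROWS, "X")
print(f"{len(no_replaces)}/{len(TIER_2_ROWS)} lack 'Replaces X'")
print(f"{len(no_cross_refs)}/{len(TIER_2_ROWS)} lack cross-references")
```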
+## Tier 3: Needs Work (4-7)
+
+Significant gaps — may load but produce suboptimal results.
+
+| Skill | T | R | E | A | V | Tok | X | Total | Tokens | Issue |
+| --------------------------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ---------------------------------------------- |
+| context-management | 2 | 0 | 2 | 1 | 0 | 1 | 0 | **6** | ~1.7k | Overlaps with DCP system prompts |
+| session-management | 2 | 0 | 1 | 1 | 0 | 2 | 0 | **6** | ~848 | Generic, no tool examples |
+| swarm-coordination | 2 | 0 | 1 | 1 | 0 | 1 | 1 | **6** | ~1.8k | Partially complete, missing examples |
+| memory-system | 2 | 0 | 2 | 0 | 0 | 1 | 0 | **5** | ~2.4k | Token-heavy, no anti-patterns, no verification |
+| brainstorming | 2 | 0 | 0 | 0 | 0 | 2 | 1 | **5** | ~832 | No examples, no anti-patterns |
+| mockup-to-code | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~794 | Prompt templates only |
+| subagent-driven-development | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~1.2k | No anti-patterns, no verification |
+| visual-analysis | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~705 | Prompt templates only |
+
+### Common issues
+
+1. **Prompt-template-only pattern** — mockup-to-code, visual-analysis give templates without tool integration
+2. **No anti-patterns** — 7/8 lack anti-pattern sections entirely
+3. **No verification** — 8/8 don't integrate verification
+4. **No examples** — brainstorming has zero code examples
+
+## Tier 4: Poor (0-3)
+
+Should be rewritten or merged.
+
+| Skill | T | R | E | A | V | Tok | X | Total | Tokens | Action |
+| ------------------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ------------------------------------------------------------ |
+| ui-ux-research | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~609 | Merge into design-system-audit or rewrite with tool examples |
+| design-system-audit | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~527 | Merge with ui-ux-research or add substance |
+
+_Note: These scored 5 (Needs Work) on the rubric but are categorized as effective tier 4 because they consist entirely of prompt templates with no actionable tool integration, anti-patterns, or verification — making them the least effective in practice._
+
+## Not Reviewed (Estimated by Category)
+
+These 48 skills were not read in detail. Estimates based on YAML description, size, and category patterns.
+
+### Platform-Specific (likely Adequate if domain is relevant)
+
+- swiftui-expert-skill (~4.2k tokens) — Largest skill, likely good depth
+- swift-concurrency, core-data-expert — Domain-specific
+- react-best-practices, supabase-postgres-best-practices — Framework-specific
+
+### External Integrations (varies)
+
+- resend, cloudflare, supabase, polar, jira, figma, stitch, v0, v1-run, mqdh
+- These are MCP connector skills — effectiveness depends on API coverage
+
+### Meta Skills
+
+- skill-creator, writing-skills, testing-skills-with-subagents, sharing-skills, using-skills
+- Self-referential — should follow their own rules
+
+### Browser/Automation
+
+- playwright, playwriter, agent-browser, chrome-devtools
+
+### Context/Lifecycle
+
+- compaction, context-engineering, context-initialization, gemini-large-context
+- development-lifecycle, prd, prd-task
+- finishing-a-development-branch, using-git-worktrees
+- deep-research, source-code-research, opensrc, augment-context-engine
+- beads-bridge, ralph, index-knowledge, obsidian, pdf-extract
+- accessibility-audit, web-design-guidelines, frontend-design
+
+## Tools Audit
+
+| Tool | T | R | E | A | V | Tok | X | Total | Tokens | Notes |
+| ---------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ------------------------------------------------------- |
+| context7 | 2 | 2 | 2 | 0 | 0 | 1 | 0 | **7** | ~1.4k | Has "Replaces X" + WHEN/SKIP. Missing anti-patterns |
+| grepsearch | 1 | 2 | 2 | 0 | 0 | 2 | 0 | **7** | ~946 | Has "Replaces X". Missing full SKIP gate, anti-patterns |
+
+### Tool recommendations
+
+- Add anti-patterns to both tool descriptions (common misuse patterns)
+- context7: Add "SKIP: Internal code (use tilth/grep)" explicitly
+- grepsearch: Add full WHEN/SKIP binary gate
+
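One way to picture the tool recommendations is as a description with explicit "Replaces X", WHEN, SKIP, and anti-pattern parts, plus a check for which parts are still empty. The following is a hypothetical sketch: the `ToolDescription` class, its field names, and the `AVOID:` label are assumptions made for illustration, and the angle-bracket strings are placeholders. Only the SKIP text is taken from the audit. The package's real tool definitions are TypeScript files and may be structured quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToolDescription:
    """Hypothetical container for the parts the audit asks a tool description to have."""

    name: str
    replaces: str
    when: list[str] = field(default_factory=list)
    skip: list[str] = field(default_factory=list)
    anti_patterns: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Names of the gate/anti-pattern sections that are still empty."""
        sections = {
            "when": self.when,
            "skip": self.skip,
            "anti_patterns": self.anti_patterns,
        }
        return [label for label, items in sections.items() if not items]

    def render(self) -> str:
        """Flatten the description into one line per entry."""
        lines = [f"{self.name}: replaces {self.replaces}"]
        lines += [f"WHEN: {item}" for item in self.when]
        lines += [f"SKIP: {item}" for item in self.skip]
        lines += [f"AVOID: {item}" for item in self.anti_patterns]
        return "\n".join(lines)


# Placeholder values except for the SKIP entry, which is quoted from the audit.
context7 = ToolDescription(
    name="context7",
    replaces="<placeholder: what the tool replaces>",
    when=["<placeholder: when to use the tool>"],
    skip=["Internal code (use tilth/grep)"],
)
print(context7.render())
print("still missing:", context7.missing_sections())
```

With these inputs the check reports only the anti-pattern section as missing, which matches the audit's note on context7.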
+## Commands Assessment
+
+18 commands total. Commands evaluated on: clear trigger, actionable steps, verification integration, error guidance.
+
+| Command | Category | Quality | Notes |
+| --------------------------- | -------- | ------- | ------------------------------------------------ |
+| lfg | Workflow | High | Full chain orchestration |
+| ship | Workflow | High | Clear gates and verification |
+| plan | Planning | High | Structured output |
+| verify | Quality | High | Recently improved (incremental, parallel, cache) |
+| compound | Learning | High | Extracts learnings |
+| start/resume/handoff | Session | Medium | Functional but could cross-ref more |
+| status | Info | Medium | |
+| pr | Git | Medium | |
+| review-codebase | Quality | Medium | |
+| research | Research | Medium | |
+| design/ui-review | Design | Low | Prompt-template style |
+| init/init-user/init-context | Setup | High | Well-tested |
+| create | Meta | Medium | |
+
+## Overlap Analysis
+
+| Pair | Overlap | Recommendation |
+| -------------------------------------------------------------- | ------------------------------------ | ----------------------------------------------------- |
+| context-management ↔ compaction | Both manage context size | Merge or clearly differentiate |
+| agent-teams ↔ swarm-coordination ↔ dispatching-parallel-agents | All handle parallel agents | Create decision tree in agent-teams, reference others |
+| session-management ↔ context-management | Both track context thresholds | Merge session into context-management |
+| ui-ux-research ↔ design-system-audit ↔ visual-analysis | All design-focused prompt templates | Consolidate into one design-audit skill |
+| beads ↔ beads-bridge | Bridge extends beads for multi-agent | Clear but should be documented in beads |
+| structured-edit ↔ code-navigation | Both about code manipulation | Cross-reference each other |
+
+## Top 10 Improvement Priorities
+
+Ranked by impact (core skills first, high-frequency usage).
+
+| # | Action | Target | Impact |
+| --- | -------------------------------------------------------------------------- | ----------------- | --------------------------- |
+| 1 | Add "Replaces X" to top 10 skills | All tier 2 skills | +adoption (tilth: +36pp) |
+| 2 | Add anti-patterns to beads, defense-in-depth, root-cause-tracing | Core debugging | +failure prevention |
+| 3 | Add verification steps to condition-based-waiting, defense-in-depth, beads | Core workflow | +correctness |
+| 4 | Consolidate context-management + session-management | Context skills | -redundancy, -token cost |
+| 5 | Consolidate ui-ux-research + design-system-audit + visual-analysis | Design skills | -3 weak skills → 1 adequate |
+| 6 | Rewrite brainstorming with concrete examples | Planning | +actionability |
+| 7 | Add cross-references to isolated skills (6 skills) | Various | +routing |
+| 8 | Trim memory-system from 2.4k to ~1.5k tokens | Core | +token efficiency |
+| 9 | Add "Replaces X" to tools (context7 SKIP gate, grepsearch WHEN gate) | Tools | +routing |
+| 10 | Audit remaining 48 un-reviewed skills | All | Full coverage |
+
+## Template-Level Metrics
+
+| Metric | Target | Current | Status |
+| ----------------------------- | ------ | ---------------- | ---------- |
+| Core skills at Exemplary tier | 100% | 50% (5/10 core) | Needs work |
+| No skills at Poor tier | 0 | 2 | Needs work |
+| Average token cost per skill | <1500 | ~1.5k (reviewed) | Borderline |
+| Skills with WHEN/SKIP gates | 100% | 100% (reviewed) | PASS |
+| Skills with anti-patterns | >75% | 44% (11/25) | Needs work |
+| Overlap/redundancy pairs | 0 | 6 pairs | Needs work |
+
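The target-versus-current comparison in the Template-Level Metrics table can be sketched as a list of checks. This is an illustration under assumptions: the comparators are inferred from how each target is written (for example `>75%` as strictly greater than), and the token-cost row is left out because the audit gives only an approximate current value and does not define the threshold for "Borderline".

```python
import operator

# (metric, comparator applied as compare(current, target), target, current)
# Values are copied from the Template-Level Metrics table.
checks = [
    ("Core skills at Exemplary tier (%)", operator.ge, 100, round(100 * 5 / 10)),
    ("Skills at Poor tier", operator.le, 0, 2),
    ("Skills with WHEN/SKIP gates (%)", operator.ge, 100, 100),
    ("Skills with anti-patterns (%)", operator.gt, 75, round(100 * 11 / 25)),
    ("Overlap/redundancy pairs", operator.le, 0, 6),
]

# True means the current value meets its target.
results = {name: compare(current, target) for name, compare, target, current in checks}

for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'Needs work'}")
```

Under these assumptions only the WHEN/SKIP gate metric meets its target, in line with the Status column.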
+---
+
+_Next: Apply improvement priorities starting with #1 (add "Replaces X" to tier 2 skills)._
+_Re-audit after changes to measure improvement._
Binary file
Binary file
Binary file