opencodekit 0.18.3 → 0.18.5

Files changed (44)
  1. package/dist/index.js +407 -17
  2. package/dist/template/.opencode/.version +1 -1
  3. package/dist/template/.opencode/AGENTS.md +13 -1
  4. package/dist/template/.opencode/agent/build.md +4 -1
  5. package/dist/template/.opencode/agent/explore.md +5 -35
  6. package/dist/template/.opencode/command/verify.md +63 -12
  7. package/dist/template/.opencode/memory/research/benchmark-framework.md +162 -0
  8. package/dist/template/.opencode/memory/research/effectiveness-audit.md +213 -0
  9. package/dist/template/.opencode/memory.db +0 -0
  10. package/dist/template/.opencode/memory.db-shm +0 -0
  11. package/dist/template/.opencode/memory.db-wal +0 -0
  12. package/dist/template/.opencode/opencode.json +1429 -1678
  13. package/dist/template/.opencode/package.json +1 -1
  14. package/dist/template/.opencode/plugin/lib/memory-helpers.ts +3 -129
  15. package/dist/template/.opencode/plugin/lib/memory-hooks.ts +4 -60
  16. package/dist/template/.opencode/plugin/memory.ts +0 -3
  17. package/dist/template/.opencode/skill/agent-teams/SKILL.md +16 -1
  18. package/dist/template/.opencode/skill/beads/SKILL.md +22 -0
  19. package/dist/template/.opencode/skill/brainstorming/SKILL.md +28 -0
  20. package/dist/template/.opencode/skill/code-navigation/SKILL.md +130 -0
  21. package/dist/template/.opencode/skill/condition-based-waiting/SKILL.md +12 -0
  22. package/dist/template/.opencode/skill/context-management/SKILL.md +122 -113
  23. package/dist/template/.opencode/skill/defense-in-depth/SKILL.md +20 -0
  24. package/dist/template/.opencode/skill/design-system-audit/SKILL.md +113 -112
  25. package/dist/template/.opencode/skill/dispatching-parallel-agents/SKILL.md +8 -0
  26. package/dist/template/.opencode/skill/executing-plans/SKILL.md +7 -0
  27. package/dist/template/.opencode/skill/memory-system/SKILL.md +50 -266
  28. package/dist/template/.opencode/skill/mockup-to-code/SKILL.md +21 -6
  29. package/dist/template/.opencode/skill/receiving-code-review/SKILL.md +8 -0
  30. package/dist/template/.opencode/skill/requesting-code-review/SKILL.md +242 -105
  31. package/dist/template/.opencode/skill/root-cause-tracing/SKILL.md +15 -0
  32. package/dist/template/.opencode/skill/session-management/SKILL.md +4 -103
  33. package/dist/template/.opencode/skill/subagent-driven-development/SKILL.md +23 -2
  34. package/dist/template/.opencode/skill/swarm-coordination/SKILL.md +17 -1
  35. package/dist/template/.opencode/skill/systematic-debugging/SKILL.md +21 -0
  36. package/dist/template/.opencode/skill/tool-priority/SKILL.md +34 -16
  37. package/dist/template/.opencode/skill/ui-ux-research/SKILL.md +5 -127
  38. package/dist/template/.opencode/skill/verification-before-completion/SKILL.md +36 -0
  39. package/dist/template/.opencode/skill/verification-before-completion/references/VERIFICATION_PROTOCOL.md +133 -29
  40. package/dist/template/.opencode/skill/visual-analysis/SKILL.md +20 -7
  41. package/dist/template/.opencode/skill/writing-plans/SKILL.md +7 -0
  42. package/dist/template/.opencode/tool/context7.ts +9 -1
  43. package/dist/template/.opencode/tool/grepsearch.ts +9 -1
  44. package/package.json +1 -1
@@ -1,6 +1,6 @@
  ---
  description: Verify implementation completeness, correctness, and coherence
- argument-hint: "<bead-id> [--quick] [--full] [--fix]"
+ argument-hint: "<bead-id> [--quick] [--full] [--fix] [--no-cache]"
  agent: review
  ---
 
@@ -17,12 +17,13 @@ skill({ name: "verification-before-completion" });
 
  ## Parse Arguments
 
- | Argument    | Default  | Description                                    |
- | ----------- | -------- | ---------------------------------------------- |
- | `<bead-id>` | required | The bead to verify                             |
- | `--quick`   | false    | Gates only, skip coherence check               |
- | `--full`    | false    | Force full verification mode (non-incremental) |
- | `--fix`     | false    | Auto-fix lint/format issues                    |
+ | Argument     | Default  | Description                                    |
+ | ------------ | -------- | ---------------------------------------------- |
+ | `<bead-id>`  | required | The bead to verify                             |
+ | `--quick`    | false    | Gates only, skip coherence check               |
+ | `--full`     | false    | Force full verification mode (non-incremental) |
+ | `--fix`      | false    | Auto-fix lint/format issues                    |
+ | `--no-cache` | false    | Bypass verification cache, force fresh run     |
 
  ## Determine Input Type
 
@@ -39,6 +40,32 @@ skill({ name: "verification-before-completion" });
  - **Run the gates**: Build, test, lint, typecheck are non-negotiable
  - **Use project conventions**: Check `package.json` scripts first
 
+ ## Phase 0: Check Verification Cache
+
+ Before running any gates, check if a recent verification is still valid:
+
+ ```bash
+ # Compute current state fingerprint (commit hash + full diff + untracked files)
+ CURRENT_STAMP=$(printf '%s\n%s\n%s' \
+   "$(git rev-parse HEAD)" \
+   "$(git diff HEAD -- '*.ts' '*.tsx' '*.js' '*.jsx')" \
+   "$(git ls-files --others --exclude-standard -- '*.ts' '*.tsx' '*.js' '*.jsx' | xargs cat 2>/dev/null)" \
+   | shasum -a 256 | cut -d' ' -f1)
+ LAST_STAMP=$(tail -1 .beads/verify.log 2>/dev/null | awk '{print $1}')
+ ```
+
+ | Condition | Action |
+ | ----------------------------------------- | ------------------------------------------------------ |
+ | `--no-cache` or `--full` | Skip cache check, run fresh |
+ | `CURRENT_STAMP == LAST_STAMP` | Report **cached PASS**, skip to Phase 2 (completeness) |
+ | `CURRENT_STAMP != LAST_STAMP` or no cache | Run gates normally |
+
+ When cache hits, report:
+
+ ```text
+ Verification: cached PASS (no changes since <timestamp from verify.log>)
+ ```
+
  ## Phase 1: Gather Context
 
  ```bash
@@ -66,10 +93,34 @@ Extract all requirements/tasks from the PRD and verify each is implemented:
 
  Follow the [Verification Protocol](../skill/verification-before-completion/references/VERIFICATION_PROTOCOL.md):
 
- - Use **incremental mode** for `verify` (pre-commit checks)
- - Use **full mode** if `--full` flag is passed
- - Run parallel group first, then sequential group
- - Report results in gate results table format
+ **Default: incremental mode** (changed files only, parallel gates).
+
+ | Mode | When | Behavior |
+ | ----------- | ----------------------------------------- | -------------------------------- |
+ | Incremental | Default, <20 changed files | Lint changed files, test changed |
+ | Full | `--full` flag, >20 changed files, or ship | Lint all, test all |
+
+ **Execution order:**
+
+ 1. **Parallel**: typecheck + lint (simultaneously)
+ 2. **Sequential** (after parallel passes): test, then build (ship only)
+
+ Report results with mode column:
+
+ ```text
+ | Gate | Status | Mode | Time |
+ |-----------|--------|-------------|--------|
+ | Typecheck | PASS | full | 2.1s |
+ | Lint | PASS | incremental | 0.3s |
+ | Test | PASS | incremental | 1.2s |
+ | Build | SKIP | — | — |
+ ```
+
+ **After all gates pass**, record to verification cache:
+
+ ```bash
+ echo "$CURRENT_STAMP $(date -u +%Y-%m-%dT%H:%M:%SZ) PASS" >> .beads/verify.log
+ ```
 
  If `--fix` flag provided, run the project's auto-fix command (e.g., `npm run lint:fix`, `ruff check --fix`, `cargo clippy --fix`).
 
@@ -93,7 +144,7 @@ Output:
 
  1. **Result**: READY TO SHIP / NEEDS WORK / BLOCKED
  2. **Completeness**: score and status
- 3. **Correctness**: gate results
+ 3. **Correctness**: gate results (with mode column)
  4. **Coherence**: contradictions found (if not --quick)
  5. **Blocking issues** to fix before shipping
  6. **Next step**: `/ship $ARGUMENTS` if ready, or list fixes needed
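
The Phase 0 condition table and the incremental/full mode table in the hunks above can be read as two small decision functions. The following is a minimal sketch under stated assumptions, not the package's implementation: the function names `cache_action` and `verify_mode` are hypothetical, the "ship" case from the mode table is left out, and a count of exactly 20 changed files (which the table does not specify) is treated as incremental.

```bash
# Hypothetical helpers for illustration; these names do not appear in opencodekit.

# cache_action FLAG CURRENT_STAMP LOG_FILE
# Prints "fresh"  when the cache is bypassed (--no-cache or --full),
#        "cached" when the stamp equals the first field of the last log line,
#        "run"    when the stamp differs or no usable log entry exists.
cache_action() {
  ca_flag="$1"
  ca_current="$2"
  ca_log="$3"
  ca_last=""
  case "$ca_flag" in
    --no-cache|--full) echo "fresh"; return 0 ;;
  esac
  if [ -f "$ca_log" ]; then
    ca_last=$(tail -n 1 "$ca_log" | awk '{print $1}')
  fi
  if [ -n "$ca_last" ] && [ "$ca_current" = "$ca_last" ]; then
    echo "cached"
  else
    echo "run"
  fi
}

# verify_mode FLAG CHANGED_FILE_COUNT
# Prints "full" for --full or more than 20 changed files, else "incremental".
# Assumption: exactly 20 files counts as incremental.
verify_mode() {
  if [ "$1" = "--full" ] || [ "$2" -gt 20 ]; then
    echo "full"
  else
    echo "incremental"
  fi
}
```

With a log whose last line starts with the current stamp, `cache_action "" "$CURRENT_STAMP" .beads/verify.log` would print `cached`; passing `--no-cache` as the first argument prints `fresh` regardless of the log contents.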
@@ -0,0 +1,162 @@
+ ---
+ purpose: Scoring rubric for evaluating template agent effectiveness
+ updated: 2026-03-08
+ based-on: tilth research (measurable pattern adoption improvements)
+ ---
+
+ # Agent Effectiveness Benchmark Framework
+
+ ## Purpose
+
+ Evaluate whether skills, tools, and commands in the OpenCodeKit template actually help AI agents perform better. Based on tilth's methodology: they measured accuracy, cost/correct answer, and tool adoption rates to prove what works.
+
+ ## Scoring Dimensions
+
+ 7 dimensions, each scored 0–2. Max score: **14**.
+
+ ### 1. Trigger Clarity (WHEN/SKIP)
+
+ Does the description clearly specify when to load AND when NOT to?
+
+ | Score | Criteria |
+ | ----- | ------------------------------------------- |
+ | 0 | Vague or missing trigger conditions |
+ | 1 | Has WHEN but not WHEN NOT (or vice versa) |
+ | 2 | Clear WHEN and WHEN NOT (SKIP) binary gates |
+
+ **Why it matters:** tilth found explicit WHEN/SKIP gates are the single most effective pattern for correct tool routing. Without them, agents either over-load (waste tokens) or under-load (miss relevant skills).
+
+ ### 2. "Replaces X" Framing
+
+ Does it explicitly state what behavior, tool, or workflow it replaces?
+
+ | Score | Criteria |
+ | ----- | ---------------------------------------------- |
+ | 0 | No replacement framing |
+ | 1 | Implied replacement or "better than X" |
+ | 2 | Explicit "Replaces X" statement in description |
+
+ **Why it matters:** tilth measured +36 percentage points adoption on Haiku when tool descriptions included "Replaces X" framing. Models route better when they know what's superseded.
+
+ ### 3. Concrete Examples
+
+ Does it provide working code with actual tool calls, not just prose?
+
+ | Score | Criteria |
+ | ----- | ----------------------------------------------------------------------- |
+ | 0 | No examples |
+ | 1 | Prose descriptions or generic prompt templates |
+ | 2 | Working code examples with actual tool calls / before-after comparisons |
+
+ **Why it matters:** Models follow examples more reliably than instructions. Prompt templates ("Analyze this image: [attach]") score 1, not 2 — they lack tool integration.
+
+ ### 4. Anti-Patterns
+
+ Does it show what NOT to do?
+
+ | Score | Criteria |
+ | ----- | -------------------------------------------------------------- |
+ | 0 | No anti-patterns section |
+ | 1 | Brief "don't do X" mentions |
+ | 2 | Wrong/right comparison table or detailed anti-patterns section |
+
+ **Why it matters:** Failure prevention is as valuable as success instruction. tilth's evidence-based feature removal (disabling `--map` because 62% of losing tasks used it) proves tracking what fails matters.
+
+ ### 5. Verification Integration
+
+ Does it reference or require verification steps?
+
+ | Score | Criteria |
+ | ----- | -------------------------------------------------------------- |
+ | 0 | No mention of verification |
+ | 1 | Mentions verification in passing |
+ | 2 | Integrates verification steps into workflow (run X, confirm Y) |
+
+ **Why it matters:** Skills that don't include verification produce unverified outputs. The build loop is perceive → create → **verify** → ship.
+
+ ### 6. Token Efficiency
+
+ Is the token cost proportional to value delivered?
+
+ | Score | Criteria |
+ | ----- | ------------------------------------------------------------------------------ |
+ | 0 | >2500 tokens with low value density (filler, repetition, obvious instructions) |
+ | 1 | Reasonable size OR moderate value density |
+ | 2 | <1500 tokens with high value density, OR larger with proportional density |
+
+ **Why it matters:** Every loaded skill consumes context budget. A 4000-token skill that could be 1500 tokens is actively harmful — it displaces working memory.
+
+ ### 7. Cross-References
+
+ Does it link to related skills for next steps?
+
+ | Score | Criteria |
+ | ----- | ---------------------------------------------------------------------------- |
+ | 0 | No references to other skills |
+ | 1 | Mentions related skills in text |
+ | 2 | Clear "Related Skills" table or "Next Phase" with skill loading instructions |
+
+ **Why it matters:** Skills that exist in isolation force agents to discover connections. Explicit connections reduce routing failures.
+
+ ## Score Interpretation
+
+ | Range | Tier | Meaning |
+ | ----- | ---------- | ----------------------------------------------------------- |
+ | 12–14 | Exemplary | Ready to ship — high adoption, measurable value |
+ | 8–11 | Adequate | Functional but missing patterns that would improve adoption |
+ | 4–7 | Needs Work | Significant gaps — may load but produce suboptimal results |
+ | 0–3 | Poor | Should be rewritten or merged into another skill |
+
+ ## Category Assessment
+
+ Beyond individual scoring, evaluate each skill's **category fit**:
+
+ | Category | Expected Traits |
+ | -------------------- | -------------------------------------------------------------------------------- |
+ | Core Workflow | Loaded frequently, high token ROI, tight integration with other core skills |
+ | Planning & Lifecycle | Clear phase transitions, handoff points between skills |
+ | Debugging & Quality | Real examples from actual debugging sessions, measurable impact |
+ | Code Review | Severity levels, actionable findings format |
+ | Design & UI | Visual reference integration, component breakdown |
+ | Agent Orchestration | Parallelism rules, coordination protocols |
+ | External Integration | API examples, auth handling, error patterns |
+ | Platform Specific | Version-pinned APIs, migration guidance |
+ | Meta Skills | Self-referential consistency (does the skill-about-skills follow its own rules?) |
+
+ ## Audit Process
+
+ 1. **Inventory** — List all skills with token size
+ 2. **Sample** — Read representative skills from each category
+ 3. **Score** — Apply 7 dimensions to each sampled skill
+ 4. **Classify** — Assign tier and category
+ 5. **Identify** — Flag overlaps, dead weight, and upgrade candidates
+ 6. **Prioritize** — Rank improvements by impact (core skills first)
+
+ ## Effectiveness Signals (Observable)
+
+ Beyond the rubric, track these runtime signals when possible:
+
+ | Signal | Indicates |
+ | ------------------------------------------ | ------------------------------------------------------------ |
+ | Skill loaded but instructions not followed | Trigger too broad OR instructions too vague |
+ | Skill never loaded despite relevant tasks | Trigger too narrow OR description doesn't match task framing |
+ | Agent re-reads files after skill search | Skill examples insufficient — agent needs more context |
+ | Verification skipped after skill workflow | Skill doesn't integrate verification |
+ | Agent loads 5+ skills simultaneously | Skills too granular — should be merged |
+
+ ## Template-Level Metrics
+
+ For the overall template (all skills + tools + commands):
+
+ | Metric | Target | Current |
+ | ----------------------------- | ------ | ------- |
+ | Core skills at Exemplary tier | 100% | (audit) |
+ | No skills at Poor tier | 0 | (audit) |
+ | Average token cost per skill | <1500 | (audit) |
+ | Skills with WHEN/SKIP gates | 100% | (audit) |
+ | Skills with anti-patterns | >75% | (audit) |
+ | Overlap/redundancy pairs | 0 | (audit) |
+
+ ---
+
+ _Apply this framework during effectiveness audits. Update scoring criteria as new evidence emerges._
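
The rubric in the file above reduces to simple arithmetic: sum seven scores of 0 to 2, then map the total onto the Score Interpretation ranges. A minimal sketch follows, assuming hypothetical helper names `score_total` and `score_tier` that are not part of the package, and doing no validation that each input score lies in the 0 to 2 range.

```bash
# Hypothetical helpers for illustration; not part of opencodekit.

# score_total T R E A V TOK X
# Prints the sum of the seven dimension scores (maximum 14).
score_total() {
  st_sum=0
  for st_score in "$@"; do
    st_sum=$((st_sum + st_score))
  done
  echo "$st_sum"
}

# score_tier TOTAL
# Prints the tier name using the ranges 12-14, 8-11, 4-7, 0-3.
score_tier() {
  if [ "$1" -ge 12 ]; then
    echo "Exemplary"
  elif [ "$1" -ge 8 ]; then
    echo "Adequate"
  elif [ "$1" -ge 4 ]; then
    echo "Needs Work"
  else
    echo "Poor"
  fi
}
```

For example, `score_tier "$(score_total 2 1 2 2 2 2 2)"` sums to 13 and prints `Exemplary`.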
@@ -0,0 +1,213 @@
+ ---
+ purpose: Systematic effectiveness audit of all template skills, tools, and commands
+ updated: 2026-03-08
+ framework: benchmark-framework.md (7 dimensions, 0-2 each, max 14)
+ ---
+
+ # Effectiveness Audit — OpenCodeKit Template
+
+ ## Methodology
+
+ Scored 25+ skills, 2 tools, 18 commands using the benchmark framework.
+ Dimensions: **T**rigger clarity, **R**eplaces X, **E**xamples, **A**nti-patterns, **V**erification, **Tok**en efficiency, **X**-references.
+ Scale: 0=missing, 1=partial, 2=strong. Max: 14.
+
+ ## Summary
+
+ | Metric | Value |
+ | ------------------ | -------- |
+ | Total skills | 73 |
+ | Reviewed in detail | 25 |
+ | Exemplary (12-14) | 5 (20%) |
+ | Adequate (8-11) | 10 (40%) |
+ | Needs Work (4-7) | 8 (32%) |
+ | Poor (0-3) | 2 (8%) |
+ | Custom tools | 2 |
+ | Commands | 18 |
+
+ ## Tier 1: Exemplary (12-14)
+
+ Skills ready to ship — high adoption, measurable value.
+
+ | Skill | T | R | E | A | V | Tok | X | Total | Tokens | Notes |
+ | ------------------------------ | --- | --- | --- | --- | --- | --- | --- | ------ | ------ | ----------------------------------------------------------------------- |
+ | structured-edit | 2 | 1 | 2 | 2 | 2 | 2 | 2 | **13** | ~1.3k | Gold standard. 5-step protocol, Red Flags, BAD/GOOD examples, quick ref |
+ | code-navigation | 2 | 2 | 2 | 2 | 0 | 2 | 2 | **12** | ~1.2k | 7 patterns, tilth comparison, cost awareness, right/wrong examples |
+ | verification-before-completion | 2 | 0 | 2 | 2 | 2 | 2 | 1 | **11** | ~1.6k | Iron Law, rationalization prevention, smart verification |
+ | tool-priority | 2 | 2 | 2 | 2 | 0 | 1 | 2 | **11** | ~3.3k | "Replaces X" on all tools, tilth section, LSP 9-op table |
+ | requesting-code-review | 2 | 0 | 2 | 2 | 2 | 1 | 2 | **11** | ~2.5k | 3 review depths, 5 reviewer prompts, synthesis checklist |
+
+ ### What makes these work
+
+ 1. **Right/wrong examples** — Every exemplary skill shows incorrect then correct approach
+ 2. **Tables over prose** — Decision tables, comparison tables, common mistakes tables
+ 3. **Integrated verification** — structured-edit Step 5 (CONFIRM), verification-before-completion Iron Law
+ 4. **Quick reference blocks** — structured-edit and tool-priority both end with copy-pasteable references
+ 5. **"Replaces X" framing** — code-navigation and tool-priority explicitly state what they supersede
+
+ ## Tier 2: Adequate (8-11)
+
+ Functional but missing patterns that would improve adoption.
+
+ | Skill | T | R | E | A | V | Tok | X | Total | Tokens | Gap |
+ | --------------------------- | --- | --- | --- | --- | --- | --- | --- | ------ | ------ | ---------------------------------------------------- |
+ | dispatching-parallel-agents | 2 | 0 | 2 | 2 | 2 | 2 | 0 | **10** | ~1.4k | No "Replaces X", no cross-refs |
+ | executing-plans | 2 | 0 | 2 | 1 | 2 | 1 | 2 | **10** | ~1.5k | No "Replaces X" |
+ | agent-teams | 2 | 0 | 2 | 2 | 1 | 1 | 1 | **9** | ~2.1k | No "Replaces X", could be more token-efficient |
+ | condition-based-waiting | 2 | 1 | 2 | 2 | 0 | 2 | 0 | **9** | ~868 | No verification step, no cross-refs |
+ | root-cause-tracing | 2 | 0 | 2 | 0 | 1 | 2 | 1 | **8** | ~1.2k | No anti-patterns, no "Replaces X" |
+ | writing-plans | 2 | 0 | 2 | 1 | 1 | 1 | 1 | **8** | ~2.0k | No "Replaces X", could trim |
+ | beads | 2 | 0 | 2 | 0 | 0 | 2 | 2 | **8** | ~1.2k | No anti-patterns, no verification |
+ | receiving-code-review | 2 | 0 | 2 | 2 | 1 | 1 | 0 | **8** | ~1.7k | No "Replaces X", no cross-refs |
+ | defense-in-depth | 2 | 0 | 2 | 0 | 0 | 2 | 1 | **7** | ~1.0k | No anti-patterns, no verification |
+ | systematic-debugging | 2 | 0 | 2 | 1 | 0 | 1 | 0 | **6** | ~1.6k | Border case — no verification, limited anti-patterns |
+
+ ### Common gaps in this tier
+
+ 1. **No "Replaces X"** — 9/10 adequate skills lack replacement framing
+ 2. **Missing verification** — 6/10 don't integrate verification steps
+ 3. **No anti-patterns** — 5/10 lack anti-pattern sections
+ 4. **No cross-references** — 4/10 are isolated (no links to related skills)
+
+ ## Tier 3: Needs Work (4-7)
+
+ Significant gaps — may load but produce suboptimal results.
+
+ | Skill | T | R | E | A | V | Tok | X | Total | Tokens | Issue |
+ | --------------------------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ---------------------------------------------- |
+ | context-management | 2 | 0 | 2 | 1 | 0 | 1 | 0 | **6** | ~1.7k | Overlaps with DCP system prompts |
+ | session-management | 2 | 0 | 1 | 1 | 0 | 2 | 0 | **6** | ~848 | Generic, no tool examples |
+ | swarm-coordination | 2 | 0 | 1 | 1 | 0 | 1 | 1 | **6** | ~1.8k | Partially complete, missing examples |
+ | memory-system | 2 | 0 | 2 | 0 | 0 | 1 | 0 | **5** | ~2.4k | Token-heavy, no anti-patterns, no verification |
+ | brainstorming | 2 | 0 | 0 | 0 | 0 | 2 | 1 | **5** | ~832 | No examples, no anti-patterns |
+ | mockup-to-code | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~794 | Prompt templates only |
+ | subagent-driven-development | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~1.2k | No anti-patterns, no verification |
+ | visual-analysis | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~705 | Prompt templates only |
+
+ ### Common issues
+
+ 1. **Prompt-template-only pattern** — mockup-to-code, visual-analysis give templates without tool integration
+ 2. **No anti-patterns** — 7/8 lack anti-pattern sections entirely
+ 3. **No verification** — 8/8 don't integrate verification
+ 4. **No examples** — brainstorming has zero code examples
+
+ ## Tier 4: Poor (0-3)
+
+ Should be rewritten or merged.
+
+ | Skill | T | R | E | A | V | Tok | X | Total | Tokens | Action |
+ | ------------------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ------------------------------------------------------------ |
+ | ui-ux-research | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~609 | Merge into design-system-audit or rewrite with tool examples |
+ | design-system-audit | 2 | 0 | 1 | 0 | 0 | 2 | 0 | **5** | ~527 | Merge with ui-ux-research or add substance |
+
+ _Note: These scored 5 (Needs Work) on the rubric but are categorized as effective tier 4 because they consist entirely of prompt templates with no actionable tool integration, anti-patterns, or verification — making them the least effective in practice._
+
+ ## Not Reviewed (Estimated by Category)
+
+ These 48 skills were not read in detail. Estimates based on YAML description, size, and category patterns.
+
+ ### Platform-Specific (likely Adequate if domain is relevant)
+
+ - swiftui-expert-skill (~4.2k tokens) — Largest skill, likely good depth
+ - swift-concurrency, core-data-expert — Domain-specific
+ - react-best-practices, supabase-postgres-best-practices — Framework-specific
+
+ ### External Integrations (varies)
+
+ - resend, cloudflare, supabase, polar, jira, figma, stitch, v0, v1-run, mqdh
+ - These are MCP connector skills — effectiveness depends on API coverage
+
+ ### Meta Skills
+
+ - skill-creator, writing-skills, testing-skills-with-subagents, sharing-skills, using-skills
+ - Self-referential — should follow their own rules
+
+ ### Browser/Automation
+
+ - playwright, playwriter, agent-browser, chrome-devtools
+
+ ### Context/Lifecycle
+
+ - compaction, context-engineering, context-initialization, gemini-large-context
+ - development-lifecycle, prd, prd-task
+ - finishing-a-development-branch, using-git-worktrees
+ - deep-research, source-code-research, opensrc, augment-context-engine
+ - beads-bridge, ralph, index-knowledge, obsidian, pdf-extract
+ - accessibility-audit, web-design-guidelines, frontend-design
+
+ ## Tools Audit
+
+ | Tool | T | R | E | A | V | Tok | X | Total | Tokens | Notes |
+ | ---------- | --- | --- | --- | --- | --- | --- | --- | ----- | ------ | ------------------------------------------------------- |
+ | context7 | 2 | 2 | 2 | 0 | 0 | 1 | 0 | **7** | ~1.4k | Has "Replaces X" + WHEN/SKIP. Missing anti-patterns |
+ | grepsearch | 1 | 2 | 2 | 0 | 0 | 2 | 0 | **7** | ~946 | Has "Replaces X". Missing full SKIP gate, anti-patterns |
+
+ ### Tool recommendations
+
+ - Add anti-patterns to both tool descriptions (common misuse patterns)
+ - context7: Add "SKIP: Internal code (use tilth/grep)" explicitly
+ - grepsearch: Add full WHEN/SKIP binary gate
+
+ ## Commands Assessment
+
+ 18 commands total. Commands evaluated on: clear trigger, actionable steps, verification integration, error guidance.
+
+ | Command | Category | Quality | Notes |
+ | --------------------------- | -------- | ------- | ------------------------------------------------ |
+ | lfg | Workflow | High | Full chain orchestration |
+ | ship | Workflow | High | Clear gates and verification |
+ | plan | Planning | High | Structured output |
+ | verify | Quality | High | Recently improved (incremental, parallel, cache) |
+ | compound | Learning | High | Extracts learnings |
+ | start/resume/handoff | Session | Medium | Functional but could cross-ref more |
+ | status | Info | Medium | |
+ | pr | Git | Medium | |
+ | review-codebase | Quality | Medium | |
+ | research | Research | Medium | |
+ | design/ui-review | Design | Low | Prompt-template style |
+ | init/init-user/init-context | Setup | High | Well-tested |
+ | create | Meta | Medium | |
+
+ ## Overlap Analysis
+
+ | Pair | Overlap | Recommendation |
+ | -------------------------------------------------------------- | ------------------------------------ | ----------------------------------------------------- |
+ | context-management ↔ compaction | Both manage context size | Merge or clearly differentiate |
+ | agent-teams ↔ swarm-coordination ↔ dispatching-parallel-agents | All handle parallel agents | Create decision tree in agent-teams, reference others |
+ | session-management ↔ context-management | Both track context thresholds | Merge session into context-management |
+ | ui-ux-research ↔ design-system-audit ↔ visual-analysis | All design-focused prompt templates | Consolidate into one design-audit skill |
+ | beads ↔ beads-bridge | Bridge extends beads for multi-agent | Clear but should be documented in beads |
+ | structured-edit ↔ code-navigation | Both about code manipulation | Cross-reference each other |
+
+ ## Top 10 Improvement Priorities
+
+ Ranked by impact (core skills first, high-frequency usage).
+
+ | # | Action | Target | Impact |
+ | --- | -------------------------------------------------------------------------- | ----------------- | --------------------------- |
+ | 1 | Add "Replaces X" to top 10 skills | All tier 2 skills | +adoption (tilth: +36pp) |
+ | 2 | Add anti-patterns to beads, defense-in-depth, root-cause-tracing | Core debugging | +failure prevention |
+ | 3 | Add verification steps to condition-based-waiting, defense-in-depth, beads | Core workflow | +correctness |
+ | 4 | Consolidate context-management + session-management | Context skills | -redundancy, -token cost |
+ | 5 | Consolidate ui-ux-research + design-system-audit + visual-analysis | Design skills | -3 weak skills → 1 adequate |
+ | 6 | Rewrite brainstorming with concrete examples | Planning | +actionability |
+ | 7 | Add cross-references to isolated skills (6 skills) | Various | +routing |
+ | 8 | Trim memory-system from 2.4k to ~1.5k tokens | Core | +token efficiency |
+ | 9 | Add "Replaces X" to tools (context7 SKIP gate, grepsearch WHEN gate) | Tools | +routing |
+ | 10 | Audit remaining 48 un-reviewed skills | All | Full coverage |
+
+ ## Template-Level Metrics
+
+ | Metric | Target | Current | Status |
+ | ----------------------------- | ------ | ---------------- | ---------- |
+ | Core skills at Exemplary tier | 100% | 50% (5/10 core) | Needs work |
+ | No skills at Poor tier | 0 | 2 | Needs work |
+ | Average token cost per skill | <1500 | ~1.5k (reviewed) | Borderline |
+ | Skills with WHEN/SKIP gates | 100% | 100% (reviewed) | PASS |
+ | Skills with anti-patterns | >75% | 44% (11/25) | Needs work |
+ | Overlap/redundancy pairs | 0 | 6 pairs | Needs work |
+
+ ---
+
+ _Next: Apply improvement priorities starting with #1 (add "Replaces X" to tier 2 skills)._
+ _Re-audit after changes to measure improvement._
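
When re-auditing, the score rows in the tier tables above can be checked mechanically: the seven dimension columns should add up to the bolded Total. The sketch below assumes the column layout used in this audit (skill name, seven scores, bold total, then tokens and notes) and uses a hypothetical function name `check_row`; it also assumes the notes column contains no `|` characters.

```bash
# Hypothetical helper for illustration; not part of opencodekit.

# check_row ROW
# ROW is one markdown table line such as
#   | skill | T | R | E | A | V | Tok | X | **Total** | Tokens | Notes |
# Prints "ok <skill> <sum>" when the seven scores add up to the listed total,
# otherwise "mismatch <skill> <sum> <listed>".
check_row() {
  printf '%s\n' "$1" | awk -F'|' '
    {
      sum = 0
      for (i = 3; i <= 9; i++) sum += $i   # fields 3..9 hold the seven scores
      listed = $10
      gsub(/[^0-9]/, "", listed)            # drop the ** markers and padding
      skill = $2
      gsub(/^ +| +$/, "", skill)            # trim the skill name
      if (sum == listed + 0) print "ok", skill, sum
      else print "mismatch", skill, sum, listed
    }'
}
```

Feeding it the structured-edit row from the Tier 1 table prints `ok structured-edit 13`. A row whose listed total disagrees with its scores would be reported as a mismatch and could be re-scored before the tier is assigned.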
Binary file