openhermes 4.0.1 → 4.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. package/ETHOS.md +6 -3
  2. package/LICENSE +21 -21
  3. package/README.md +111 -81
  4. package/bootstrap.ts +405 -0
  5. package/harness/agents/openhermes.md +45 -55
  6. package/harness/codex/AUTOPILOT.md +126 -0
  7. package/harness/codex/CONSTITUTION.md +14 -11
  8. package/harness/codex/ROUTING.md +35 -69
  9. package/harness/commands/oh-log.md +18 -0
  10. package/harness/instructions/RUNTIME.md +27 -51
  11. package/harness/skills/oh-builder/SKILL.md +27 -16
  12. package/harness/skills/oh-caveman/SKILL.md +9 -0
  13. package/harness/skills/oh-expert/SKILL.md +6 -0
  14. package/harness/skills/oh-facade/SKILL.md +298 -0
  15. package/harness/skills/oh-freeze/SKILL.md +9 -0
  16. package/harness/skills/oh-full-output/SKILL.md +81 -0
  17. package/harness/skills/oh-fusion/SKILL.md +314 -0
  18. package/harness/skills/oh-gauntlet/SKILL.md +10 -6
  19. package/harness/skills/oh-grill/SKILL.md +9 -5
  20. package/harness/skills/oh-guard/SKILL.md +9 -0
  21. package/harness/skills/oh-handoff/SKILL.md +9 -0
  22. package/harness/skills/oh-health/SKILL.md +8 -4
  23. package/harness/skills/oh-init/SKILL.md +80 -13
  24. package/harness/skills/oh-investigate/SKILL.md +57 -8
  25. package/harness/skills/oh-issue/SKILL.md +9 -0
  26. package/harness/skills/oh-learn/SKILL.md +81 -8
  27. package/harness/skills/oh-manifest/SKILL.md +55 -11
  28. package/harness/skills/oh-plan-review/SKILL.md +15 -8
  29. package/harness/skills/oh-planner/SKILL.md +18 -8
  30. package/harness/skills/oh-prd/SKILL.md +9 -0
  31. package/harness/skills/oh-refactor/SKILL.md +426 -0
  32. package/harness/skills/oh-retro/SKILL.md +9 -0
  33. package/harness/skills/oh-review/SKILL.md +12 -5
  34. package/harness/skills/oh-security/SKILL.md +4 -0
  35. package/harness/skills/oh-ship/SKILL.md +10 -0
  36. package/harness/skills/oh-skill-craft/SKILL.md +88 -0
  37. package/harness/skills/oh-skills-link/SKILL.md +9 -0
  38. package/harness/skills/oh-skills-list/SKILL.md +9 -0
  39. package/harness/skills/oh-triage/SKILL.md +11 -0
  40. package/index.ts +3 -0
  41. package/lib/{harness-resolver.mjs → harness-resolver.ts} +16 -12
  42. package/lib/logger.ts +75 -0
  43. package/package.json +16 -10
  44. package/tsconfig.json +16 -0
  45. package/bootstrap.mjs +0 -174
  46. package/harness/instructions/CONVENTIONS.md +0 -206
  47. package/index.mjs +0 -3
  48. package/lib/logger.mjs +0 -62
  49. package/test/plugins-behavioral.test.mjs +0 -64
  50. package/test/plugins.test.mjs +0 -62
package/harness/skills/oh-refactor/SKILL.md ADDED
@@ -0,0 +1,426 @@
+ ---
+ name: oh-refactor
+ description: "Surgical, behavior-preserving code refactoring. Extract functions, eliminate duplication, improve type safety, remove dead code, simplify conditionals. Use when code is hard to maintain, functions are too long, code smells accumulate, or user asks to clean up/improve/refactor code."
+ tier: 3
+ benefits-from: [oh-investigate, oh-review]
+ triggers:
+   - "refactor"
+   - "clean up"
+   - "improve this code"
+   - "code smell"
+   - "make this better"
+   - "extract method"
+   - "reduce duplication"
+   - "fix this mess"
+   - "technical debt"
+   - "god function"
+   - "long method"
+   - "nested conditionals"
+ route:
+   pass: oh-gauntlet
+   fail: oh-planner
+   blocker: surface
+ ---
+
+ # oh-refactor
+
+ Improve code structure and readability **without changing external behavior**. Refactoring is gradual evolution, not revolution.
+
+ ## When to Use
+
+ - Code is hard to understand or maintain
+ - Functions/classes are too large
+ - Code smells need addressing
+ - Adding features is difficult due to code structure
+ - User asks "clean up this code", "refactor this", "improve this"
+ - Technical debt has accumulated to the point it slows development
+
+ ## Refactoring Principles
+
+ ### Golden Rules
+
+ 1. **Behavior is preserved** — Refactoring doesn't change what the code does, only how. If the goal changes behavior, that's a feature, not a refactor.
+ 2. **Small steps** — Make one change, verify, commit, repeat. Never batch refactoring changes.
+ 3. **Tests are essential** — Without fast, reliable tests, you're not refactoring; you're editing blind. Write tests first if they don't exist.
+ 4. **One thing at a time** — Never mix refactoring with feature changes in the same commit.
+ 5. **Commit between safe states** — Commit before starting, commit after each green test run.
+
+ ### When NOT to Refactor
+
+ - Code that works and will never change again
+ - Critical production code without tests (add tests first)
+ - Under a tight deadline with no test safety net
+ - "Just because" — every refactor needs a clear purpose
+
+ ## Workflow
+
+ ### Phase 1: Prepare
+
+ 1. **Check test coverage** — run existing tests. If coverage is thin, write characterization tests that lock down current behavior before touching anything.
+ 2. **Commit current state** — `git commit` so you can diff and revert cleanly.
+ 3. **Create a feature branch** — isolate refactoring work from other changes.
+
+ ### Phase 2: Identify
+
+ 1. Find the code smell to address (see [Code Smells](#common-code-smells--fixes) below).
+ 2. Understand what the code actually does — trace all code paths.
+ 3. Plan the smallest refactoring that makes the problem better.
+ 4. If behavior is unclear, delegate to `oh-investigate` before refactoring.
+
+ ### Phase 3: Refactor (small steps)
+
+ 1. Make one small change.
+ 2. Run tests.
+ 3. Commit if tests pass.
+ 4. Repeat until the smell is gone.
+
+ ### Phase 4: Verify
+
+ 1. All tests pass.
+ 2. Manual smoke test if full coverage is missing.
+ 3. Performance unchanged or improved.
+ 4. Diff shows only structural changes — no logic changes.
+
+ ### Phase 5: Clean Up
+
+ 1. Remove commented-out code, stale imports, dead paths.
+ 2. Update inline docs that renamed or moved code has made stale.
+ 3. Final commit.
+ 4. Optionally route to `oh-review` for post-refactor quality gate.
+
+ ## Common Code Smells & Fixes
+
+ ### 1. Long Method/Function
+
+ ```diff
+ # BAD: 200-line function that does everything
+ - async function processOrder(orderId) {
+ -   // 50 lines: fetch order
+ -   // 30 lines: validate order
+ -   // 40 lines: calculate pricing
+ -   // 30 lines: update inventory
+ -   // 20 lines: create shipment
+ -   // 30 lines: send notifications
+ - }
+
+ # GOOD: Broken into focused functions
+ + async function processOrder(orderId) {
+ +   const order = await fetchOrder(orderId);
+ +   validateOrder(order);
+ +   const pricing = calculatePricing(order);
+ +   await updateInventory(order);
+ +   const shipment = await createShipment(order);
+ +   await sendNotifications(order, pricing, shipment);
+ +   return { order, pricing, shipment };
+ + }
+ ```
+
+ ### 2. Duplicated Code
+
+ ```diff
+ # BAD: Same logic in multiple places
+ - function calculateUserDiscount(user) {
+ -   if (user.membership === 'gold') return user.total * 0.2;
+ -   if (user.membership === 'silver') return user.total * 0.1;
+ -   return 0;
+ - }
+ - function calculateOrderDiscount(order) {
+ -   if (order.user.membership === 'gold') return order.total * 0.2;
+ -   if (order.user.membership === 'silver') return order.total * 0.1;
+ -   return 0;
+ - }
+
+ # GOOD: Extract common logic
+ + function getMembershipDiscountRate(membership) {
+ +   const rates = { gold: 0.2, silver: 0.1 };
+ +   return rates[membership] || 0;
+ + }
+ + function calculateUserDiscount(user) {
+ +   return user.total * getMembershipDiscountRate(user.membership);
+ + }
+ + function calculateOrderDiscount(order) {
+ +   return order.total * getMembershipDiscountRate(order.user.membership);
+ + }
+ ```
+
+ ### 3. Large Class/Module (God Object)
+
+ ```diff
+ # BAD: God object that knows too much
+ - class UserManager {
+ -   createUser() { /* ... */ }
+ -   updateUser() { /* ... */ }
+ -   deleteUser() { /* ... */ }
+ -   sendEmail() { /* ... */ }
+ -   generateReport() { /* ... */ }
+ -   handlePayment() { /* ... */ }
+ -   validateAddress() { /* ... */ }
+ - }
+
+ # GOOD: Single responsibility per class
+ + class UserService {
+ +   create(data) { /* ... */ }
+ +   update(id, data) { /* ... */ }
+ +   delete(id) { /* ... */ }
+ + }
+ + class EmailService {
+ +   send(to, subject, body) { /* ... */ }
+ + }
+ + class ReportService {
+ +   generate(type, params) { /* ... */ }
+ + }
+ + class PaymentService {
+ +   process(amount, method) { /* ... */ }
+ + }
+ ```
+
+ ### 4. Long Parameter List
+
+ ```diff
+ # BAD: Too many parameters
+ - function createUser(email, password, name, age, address, city, country, phone) { }
+
+ # GOOD: Group related parameters
+ + interface UserData {
+ +   email: string;
+ +   password: string;
+ +   name: string;
+ +   age?: number;
+ +   address?: Address; // groups address, city, country
+ +   phone?: string;
+ + }
+ + function createUser(data: UserData) { }
+ ```
+
+ ### 5. Feature Envy
+
+ ```diff
+ # BAD: Method uses another object's data more than its own
+ - class Order {
+ -   calculateDiscount(user) {
+ -     if (user.membershipLevel === 'gold') return this.total * 0.2;
+ -     if (user.accountAge > 365) return this.total * 0.1;
+ -     return 0;
+ -   }
+ - }
+
+ # GOOD: Move logic to the object that owns the data
+ + class User {
+ +   getDiscountRate() {
+ +     if (this.membershipLevel === 'gold') return 0.2;
+ +     if (this.accountAge > 365) return 0.1;
+ +     return 0;
+ +   }
+ + }
+ + class Order {
+ +   calculateDiscount(user) {
+ +     return this.total * user.getDiscountRate();
+ +   }
+ + }
+ ```
+
+ ### 6. Primitive Obsession
+
+ ```diff
+ # BAD: Using primitives for domain concepts
+ - function sendEmail(to, subject, body) { }
+ - sendEmail('user@example.com', 'Hello', '...');
+
+ # GOOD: Use a domain type (sendEmail then takes `to: Email`)
+ + class Email {
+ +   private constructor(public readonly value: string) {
+ +     if (!Email.isValid(value)) throw new Error('Invalid email');
+ +   }
+ +   static create(value: string) { return new Email(value); }
+ +   static isValid(email: string) { return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email); }
+ + }
+ ```
+
+ ### 7. Magic Numbers/Strings
+
+ ```diff
+ # BAD: Unexplained values
+ - if (user.status === 2) { }
+ - const discount = total * 0.15;
+
+ # GOOD: Named constants
+ + const UserStatus = { ACTIVE: 1, INACTIVE: 2, SUSPENDED: 3 } as const;
+ + const DISCOUNT_RATES = { STANDARD: 0.1, PREMIUM: 0.15, VIP: 0.2 } as const;
+ + if (user.status === UserStatus.INACTIVE) { }
+ + const discount = total * DISCOUNT_RATES.PREMIUM;
+ ```
+
+ ### 8. Nested Conditionals (Arrow Code)
+
+ ```diff
+ # BAD: Arrow code
+ - function process(order) {
+ -   if (order) {
+ -     if (order.user) {
+ -       if (order.user.isActive) {
+ -         if (order.total > 0) {
+ -           return processOrder(order);
+ -         } else { return { error: 'Invalid total' }; }
+ -       } else { return { error: 'User inactive' }; }
+ -     } else { return { error: 'No user' }; }
+ -   } else { return { error: 'No order' }; }
+ - }
+
+ # GOOD: Guard clauses / early returns
+ + function process(order) {
+ +   if (!order) return { error: 'No order' };
+ +   if (!order.user) return { error: 'No user' };
+ +   if (!order.user.isActive) return { error: 'User inactive' };
+ +   if (order.total <= 0) return { error: 'Invalid total' };
+ +   return processOrder(order);
+ + }
+ ```
+
+ ### 9. Dead Code
+
+ ```diff
+ # BAD: Unused code lingers
+ - function oldImplementation() { }
+ - const DEPRECATED_VALUE = 5;
+ - import { unusedThing } from './somewhere';
+
+ # GOOD: Remove it
+ + // Delete unused functions, imports, commented-out code
+ + // Git history has everything you need
+ ```
+
+ ### 10. Inappropriate Intimacy
+
+ ```diff
+ # BAD: One class reaches deep into another
+ - class OrderProcessor {
+ -   process(order) {
+ -     order.user.profile.address.street; // Too intimate
+ -   }
+ - }
+
+ # GOOD: Law of Demeter (talk to the order, not its internals)
+ + class OrderProcessor {
+ +   process(order) {
+ +     order.getShippingAddress(); // Order knows how to get it
+ +     order.save();
+ +   }
+ + }
+ ```
+
+ ## Extract Method Refactoring
+
+ ```diff
+ # Before: One long function
+ - function printReport(users) {
+ -   console.log('USER REPORT');
+ -   console.log('============');
+ -   console.log(`Total users: ${users.length}`);
+ -   console.log('ACTIVE USERS');
+ -   const active = users.filter(u => u.isActive);
+ -   active.forEach(u => console.log(`- ${u.name} (${u.email})`));
+ -   console.log(`Active: ${active.length}`);
+ -   console.log('INACTIVE USERS');
+ -   const inactive = users.filter(u => !u.isActive);
+ -   inactive.forEach(u => console.log(`- ${u.name} (${u.email})`));
+ -   console.log(`Inactive: ${inactive.length}`);
+ - }
+
+ # After: Extracted methods (same output, line for line)
+ + function printReport(users) {
+ +   printHeader(users.length);
+ +   printUserSection('ACTIVE USERS', 'Active', users.filter(u => u.isActive));
+ +   printUserSection('INACTIVE USERS', 'Inactive', users.filter(u => !u.isActive));
+ + }
+ + function printHeader(totalUsers) {
+ +   console.log('USER REPORT');
+ +   console.log('============');
+ +   console.log(`Total users: ${totalUsers}`);
+ + }
+ + function printUserSection(title, label, users) {
+ +   // `label` keeps the summary line's original casing ("Active", "Inactive")
+ +   console.log(title);
+ +   users.forEach(u => console.log(`- ${u.name} (${u.email})`));
+ +   console.log(`${label}: ${users.length}`);
+ + }
+ ```
+
+ ## Design Patterns for Refactoring
+
+ ### Strategy Pattern
+
+ Replace conditional branching with interchangeable strategy objects, one class per shipping method (an `OvernightShipping` class for the third branch follows the same shape):
+
+ ```diff
+ - function calculateShipping(order, method) {
+ -   if (method === 'standard') return order.total > 50 ? 0 : 5.99;
+ -   else if (method === 'express') return order.total > 100 ? 9.99 : 14.99;
+ -   else if (method === 'overnight') return 29.99;
+ - }
+
+ + interface ShippingStrategy { calculate(order: Order): number; }
+ + class StandardShipping implements ShippingStrategy {
+ +   calculate(order: Order) { return order.total > 50 ? 0 : 5.99; }
+ + }
+ + class ExpressShipping implements ShippingStrategy {
+ +   calculate(order: Order) { return order.total > 100 ? 9.99 : 14.99; }
+ + }
+ + function calculateShipping(order: Order, strategy: ShippingStrategy) {
+ +   return strategy.calculate(order);
+ + }
+ ```
+
+ ### Guard Clauses
+
+ Replace nested conditions with early returns (see Nested Conditionals above). It is often among the highest-ROI refactorings, because it flattens deeply nested code immediately.
+
+ ## Common Refactoring Operations
+
+ | Operation | Description |
+ |---|---|
+ | Extract Method | Turn code fragment into named function |
+ | Extract Class | Move related behavior to new class |
+ | Inline Method | Move method body back to single caller |
+ | Rename Method/Variable | Improve clarity |
+ | Introduce Parameter Object | Group related parameters |
+ | Replace Conditional with Polymorphism | Dispatch by type instead of if/switch |
+ | Replace Magic Number with Constant | Named constants for literals |
+ | Decompose Conditional | Break complex conditions into named predicates |
+ | Consolidate Conditional | Combine duplicate conditions |
+ | Replace Nested Conditional with Guard Clauses | Early returns |
+ | Replace Inheritance with Delegation | Composition over inheritance |
+
+ ## Refactoring Checklist
+
+ ### Code Quality
+ - [ ] Functions are small (< 50 lines)
+ - [ ] Functions do one thing
+ - [ ] No duplicated code
+ - [ ] Descriptive names (variables, functions, classes)
+ - [ ] No magic numbers/strings
+ - [ ] Dead code removed
+
+ ### Structure
+ - [ ] Related code is together
+ - [ ] Clear module boundaries
+ - [ ] Dependencies flow in one direction
+ - [ ] No circular dependencies
+
+ ### Type Safety
+ - [ ] Types defined for all public APIs
+ - [ ] No `any` types without justification
+ - [ ] Nullable types explicitly marked
+
+ ### Testing
+ - [ ] Refactored code is tested
+ - [ ] Tests cover edge cases
+ - [ ] All tests pass
+
+ ## Routing
+
+ | Outcome | Route |
+ |---|---|
+ | pass | -> oh-review (post-refactor quality gate) |
+ | behavior unclear | -> oh-investigate (diagnose before refactoring) |
+ | test gap found | -> oh-builder TDD mode (add characterization tests first) |
+ | blocker | -> surface to user |
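Phase 1 asks for characterization tests that lock down current behavior, and an Extract Method change like the one in oh-refactor is exactly what such a test should guard. The sketch below is minimal and assumes no test framework. It records every line a function passes to `console.log`, runs the original report and a refactored version on the same fixture, and throws if the two outputs differ in any line. `legacyPrintReport`, `refactoredPrintReport`, `captureLogs` and the fixture are hypothetical names introduced for illustration.

```typescript
interface User {
  name: string;
  email: string;
  isActive: boolean;
}

// Original long function, kept verbatim as the reference behavior.
function legacyPrintReport(users: User[]): void {
  console.log("USER REPORT");
  console.log("============");
  console.log(`Total users: ${users.length}`);
  console.log("ACTIVE USERS");
  const active = users.filter((u) => u.isActive);
  active.forEach((u) => console.log(`- ${u.name} (${u.email})`));
  console.log(`Active: ${active.length}`);
  console.log("INACTIVE USERS");
  const inactive = users.filter((u) => !u.isActive);
  inactive.forEach((u) => console.log(`- ${u.name} (${u.email})`));
  console.log(`Inactive: ${inactive.length}`);
}

// Refactored version (hypothetical): same lines, split into a helper.
function printUserSection(title: string, label: string, users: User[]): void {
  console.log(title);
  users.forEach((u) => console.log(`- ${u.name} (${u.email})`));
  console.log(`${label}: ${users.length}`);
}

function refactoredPrintReport(users: User[]): void {
  console.log("USER REPORT");
  console.log("============");
  console.log(`Total users: ${users.length}`);
  printUserSection("ACTIVE USERS", "Active", users.filter((u) => u.isActive));
  printUserSection("INACTIVE USERS", "Inactive", users.filter((u) => !u.isActive));
}

// Temporarily replace console.log to record each printed line, then restore it.
function captureLogs(fn: () => void): string[] {
  const lines: string[] = [];
  const original = console.log;
  console.log = (...args: unknown[]) => {
    lines.push(args.map(String).join(" "));
  };
  try {
    fn();
  } finally {
    console.log = original;
  }
  return lines;
}

const fixture: User[] = [
  { name: "Ada", email: "ada@example.com", isActive: true },
  { name: "Bob", email: "bob@example.com", isActive: false },
];

// Characterization check: the refactor must not change observable output.
const before = captureLogs(() => legacyPrintReport(fixture));
const after = captureLogs(() => refactoredPrintReport(fixture));
if (JSON.stringify(after) !== JSON.stringify(before)) {
  throw new Error("Refactoring changed observable output");
}
```

A characterization test like this checks what the code currently does, not what it ideally should do. That is why it catches accidental output drift, such as an extra blank line or a re-cased label, which a refactor is not supposed to introduce.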
package/harness/skills/oh-retro/SKILL.md
@@ -1,6 +1,15 @@
  ---
  name: oh-retro
  description: "Weekly engineering retrospective — analyze commit history and work patterns"
+ tier: 3
+ triggers:
+   - "retrospective"
+   - "retro for"
+   - "post-ship review"
+ route:
+   pass: oh-planner
+   fail: oh-handoff
+   blocker: surface
  ---
 
  # oh-retro
package/harness/skills/oh-review/SKILL.md
@@ -4,12 +4,19 @@ description: "Two-axis code and design review: Standards (conformance) + Spec (f
  tier: 3
  benefits-from: [oh-expert]
  triggers:
-   - "review"
-   - "code review"
-   - "review since"
-   - "review changes"
+   - "code review please"
+   - "review the code"
+   - "review the PR"
+   - "review changes since"
    - "pr review"
    - "design review"
+   - "review this code"
+ route:
+   pass:
+     - oh-gauntlet
+     - oh-ship
+   fail: oh-builder
+   blocker: surface
  ---
 
  # oh-review
@@ -45,7 +52,7 @@ Collect all files documenting how code should be written:
  - AGENTS.md, CLAUDE.md, CONTRIBUTING.md
  - CONTEXT.md, ADRs
  - eslint/biome/prettier config (note tool-enforced ones — don't re-check)
- - Any STYLE.md, STANDARDS.md, STYLEGUIDE.md
+
 
  ### 4. Spawn Both Sub-Agents (parallel)
 
package/harness/skills/oh-security/SKILL.md
@@ -11,6 +11,10 @@ triggers:
    - "pentest"
    - "security review"
    - "cso"
+ route:
+   pass: surface
+   fail: oh-investigate
+   blocker: surface
  ---
 
  # oh-security
package/harness/skills/oh-ship/SKILL.md
@@ -1,6 +1,16 @@
  ---
  name: oh-ship
  description: "Deploy and PR pipeline — test, bump, changelog, PR, deploy, verify"
+ tier: 4
+ triggers:
+   - "ship this"
+   - "create a PR"
+   - "version bump"
+   - "publish"
+ route:
+   pass: oh-retro
+   fail: oh-expert
+   blocker: surface
  ---
 
  # oh-ship
package/harness/skills/oh-skill-craft/SKILL.md
@@ -10,6 +10,10 @@ triggers:
    - "skill-craft"
    - "meta-skill"
    - "add a capability"
+ route:
+   pass: oh-skills-link
+   fail: oh-expert
+   blocker: surface
  ---
 
  # oh-skill-craft
@@ -83,6 +87,10 @@ The description is the only thing the agent sees when deciding which skill to lo
 
  Scripts save tokens and improve reliability vs generated code.
 
+ ## Output Location
+
+ Skills created with oh-skill-craft should be written to `~/.config/opencode/skills/` (or `~/.agents/skills/` if the user prefers). Built-in skills live in the package `harness/skills/` and get replaced on npm update. User-written skills in `~/.config/opencode/skills/` survive updates and are auto-discovered on every session. On name conflict with a built-in skill, the user version wins.
+
  ## When to Split Files
  - SKILL.md exceeds 100 lines
  - Content has distinct domains
@@ -98,10 +106,90 @@ Scripts save tokens and improve reliability vs generated code.
  - [ ] Anti-patterns documented
  - [ ] Tests still pass after adding (`npm test`)
 
+ ## Eval-Driven Iteration
+
+ After writing the initial skill draft, iterate using test cases and evidence rather than guessing.
+
+ ### 1. Create Test Cases
+
+ Come up with 2-3 realistic test prompts — the kind of thing a real user would actually say. Save to `evals/evals.json`:
+
+ ```json
+ {
+   "skill_name": "oh-<name>",
+   "evals": [
+     {
+       "id": 1,
+       "prompt": "User's realistic task prompt",
+       "expected_output": "Description of expected result",
+       "files": []
+     }
+   ]
+ }
+ ```
+
+ Good test prompts are substantive multi-step tasks — not simple queries like "read this file." The model can handle simple tasks without a skill. Complex, multi-step, or specialized queries reveal whether the skill is pulling its weight.
+
+ ### 2. Spawn Runs
+
+ For each test case, spawn two subagents in parallel:
+ - **With-skill run** — load the skill, execute the task
+ - **Baseline run** — same prompt without the skill (for new skills) or with the previous version (for improvements)
+
+ Save outputs to `iteration-<N>/eval-<ID>/with_skill/outputs/` and `iteration-<N>/eval-<ID>/without_skill/outputs/`.
+
+ ### 3. Draft Assertions
+
+ While runs execute, draft objectively verifiable assertions for each test case. Good assertions have descriptive names and can be checked programmatically where possible. Update `evals/evals.json` with the assertions.
+
+ ### 4. Grade and Compare
+
+ Grade runs against assertions. Aggregate results into pass rates, timing, and token usage. Look for:
+ - Assertions that always pass regardless of skill (non-discriminating — remove them)
+ - High-variance evals (possibly flaky tests)
+ - Time/token tradeoffs between skill and baseline
+
+ ### 5. Improve
+
+ Based on results, revise the skill. Generalize from specific failures rather than overfitting to the test cases. The goal is a skill that works across the wide range of prompts real users send, not just 2-3 examples. Keep instructions lean — remove anything not pulling its weight.
+
+ ### 6. Loop
+
+ Rerun all test cases into a new iteration directory. Repeat until any of these holds:
+ - User says they're happy
+ - All feedback is positive
+ - No meaningful progress between iterations
+
+ ## Description Optimization
+
+ The description field in frontmatter is the primary mechanism for skill triggering. After the skill is solid, optimize the description for accuracy.
+
+ ### Trigger Eval Queries
+
+ Create 20 eval queries — a mix of should-trigger and should-not-trigger cases:
+
+ ```json
+ [
+   {"query": "realistic user prompt that should trigger", "should_trigger": true},
+   {"query": "near-miss prompt that should NOT trigger", "should_trigger": false}
+ ]
+ ```
+
+ Key principles:
+ - **Should-trigger** (8-10): different phrasings of the same intent — formal, casual. Include edge cases and contexts where this skill competes with another but should win.
+ - **Should-not-trigger** (8-10): near-misses that share keywords but need a different skill. Avoid obviously irrelevant queries — the hard cases are the adjacent ones.
+
+ Queries must be realistic — what a user would actually type, with concrete details, not abstract descriptions.
+
+ ### Run Optimization
+
+ Iterate the description: test the current version, propose improvements based on failures, re-test. Select the description that scores best on held-out queries (ones not used while proposing revisions). Apply the winner to the skill's frontmatter.
+
  ## Routing
 
  | Outcome | Route |
  |---------|-------|
  | pass | → oh-skills-link (verify skill discovery) |
+ | iteration data available | → oh-learn (extract patterns from eval results) |
  | fail | → oh-expert (diagnose skill creation issues) |
  | blocker | → surface to user |
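Step 4 (Grade and Compare) and Run Optimization both come down to simple arithmetic over graded results. The sketch below is minimal and makes two assumptions: each run has already been graded into a map from assertion name to pass/fail, and each trigger result carries an observed `triggered` flag. The `GradedRun` and `TriggerResult` shapes and all function names are hypothetical; they are not a format the skill prescribes. The code computes per-configuration pass rates and lists assertions that passed in every run of both configurations, which are the non-discriminating ones step 4 says to remove. It also scores a description's trigger accuracy.

```typescript
// Hypothetical graded result for one run of one eval case.
interface GradedRun {
  evalId: number;
  config: "with_skill" | "without_skill";
  assertions: Record<string, boolean>; // assertion name -> passed?
}

// Share of assertion checks that passed across all runs of one configuration.
function passRate(runs: GradedRun[], config: GradedRun["config"]): number {
  let passed = 0;
  let total = 0;
  for (const run of runs) {
    if (run.config !== config) continue;
    for (const ok of Object.values(run.assertions)) {
      total += 1;
      if (ok) passed += 1;
    }
  }
  return total === 0 ? 0 : passed / total;
}

// Assertions (keyed per eval case) that passed in every run, with or without
// the skill. They cannot tell the two configurations apart.
function nonDiscriminating(runs: GradedRun[]): string[] {
  const alwaysPassed = new Map<string, boolean>();
  for (const run of runs) {
    for (const [name, ok] of Object.entries(run.assertions)) {
      const key = `${run.evalId}/${name}`;
      alwaysPassed.set(key, (alwaysPassed.get(key) ?? true) && ok);
    }
  }
  return Array.from(alwaysPassed)
    .filter(([, ok]) => ok)
    .map(([key]) => key);
}

// Hypothetical trigger-eval record: did the skill fire, and should it have?
interface TriggerResult {
  query: string;
  should_trigger: boolean;
  triggered: boolean; // observed outcome, recorded by whatever runs the query
}

// Share of queries where the skill fired exactly when it should.
function triggerAccuracy(results: TriggerResult[]): number {
  if (results.length === 0) return 0;
  const correct = results.filter((r) => r.triggered === r.should_trigger).length;
  return correct / results.length;
}

// Illustrative data only.
const runs: GradedRun[] = [
  { evalId: 1, config: "with_skill", assertions: { "writes-tests-first": true, "output-is-markdown": true } },
  { evalId: 1, config: "without_skill", assertions: { "writes-tests-first": false, "output-is-markdown": true } },
];

const withRate = passRate(runs, "with_skill"); // 1
const baseRate = passRate(runs, "without_skill"); // 0.5
const candidates = nonDiscriminating(runs); // ["1/output-is-markdown"]
const triggerRate = triggerAccuracy([
  { query: "this 300-line handler is a mess, can you break it up?", should_trigger: true, triggered: true },
  { query: "review the PR before I merge it", should_trigger: false, triggered: true },
]); // 0.5
```

Keying assertions by eval case avoids merging unrelated assertions that happen to share a name across different test prompts.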
package/harness/skills/oh-skills-link/SKILL.md
@@ -1,6 +1,15 @@
  ---
  name: oh-skills-link
  description: "Verify that OpenCode can discover the package-local skills directory"
+ tier: 2
+ triggers:
+   - "verify skills"
+   - "check skill discovery"
+   - "link skills"
+ route:
+   pass: surface
+   fail: oh-skill-craft
+   blocker: surface
  ---
 
  # oh-skills-link
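The Output Location section added to oh-skill-craft says that user-written skills survive updates and that, on a name conflict with a built-in, the user version wins. That precedence rule behaves like a keyed merge in which later sources override earlier ones. The sketch below is minimal and assumes each skills directory has already been scanned into `{ name, path }` records. `SkillEntry` and `mergeSkills` are hypothetical names, not the package's discovery code, and the paths are illustrative.

```typescript
// Hypothetical record produced by scanning one skills directory.
interface SkillEntry {
  name: string;
  path: string;
}

// Merge sources in increasing priority order: a later source replaces an
// earlier entry with the same name, so user skills override built-ins.
function mergeSkills(...sources: SkillEntry[][]): Map<string, SkillEntry> {
  const merged = new Map<string, SkillEntry>();
  for (const source of sources) {
    for (const skill of source) {
      merged.set(skill.name, skill);
    }
  }
  return merged;
}

const builtIn: SkillEntry[] = [
  { name: "oh-review", path: "harness/skills/oh-review/SKILL.md" },
  { name: "oh-ship", path: "harness/skills/oh-ship/SKILL.md" },
];
const user: SkillEntry[] = [
  { name: "oh-review", path: "~/.config/opencode/skills/oh-review/SKILL.md" },
];

// oh-review resolves to the user copy; oh-ship still comes from the package.
const skills = mergeSkills(builtIn, user);
```

When sources are passed in increasing priority order, the precedence rule lives in one place: whichever source is listed last wins a conflict. The diff does not say where `~/.agents/skills/` sits in that order, so the sketch leaves it out.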
package/harness/skills/oh-skills-list/SKILL.md
@@ -1,6 +1,15 @@
  ---
  name: oh-skills-list
  description: "List all available oh-* skills with descriptions"
+ tier: 2
+ triggers:
+   - "list skills"
+   - "show skills"
+   - "what skills"
+ route:
+   pass: done
+   fail: surface
+   blocker: surface
  ---
 
  # oh-skills-list
package/harness/skills/oh-triage/SKILL.md
@@ -1,6 +1,17 @@
  ---
  name: oh-triage
  description: "Issue triage state machine — classify, prioritise, assign"
+ tier: 2
+ triggers:
+   - "triage this issue"
+   - "classify this issue"
+   - "triage the backlog"
+ route:
+   pass:
+     - oh-issue
+     - oh-handoff
+   fail: oh-expert
+   blocker: surface
  ---
 
  # oh-triage
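In the files in this diff, `route.pass` is sometimes a single name (oh-retro, oh-ship) and sometimes a list (oh-review, oh-triage). The sketch below is minimal and shows one way to normalize both shapes. It assumes the frontmatter has already been parsed into a plain object. `Route`, `RouteTarget` and `nextSkills` are hypothetical and do not describe the harness's actual routing code.

```typescript
// Hypothetical shape of a parsed `route` frontmatter block. Each outcome
// names one next step or a list of them.
type RouteTarget = string | string[];

interface Route {
  pass: RouteTarget;
  fail: RouteTarget;
  blocker: RouteTarget;
}

type Outcome = keyof Route;

// Normalize either form into an array so callers only handle one shape.
function nextSkills(route: Route, outcome: Outcome): string[] {
  const target = route[outcome];
  return Array.isArray(target) ? [...target] : [target];
}

// Values copied from the oh-triage and oh-retro frontmatter in this diff.
const triageRoute: Route = {
  pass: ["oh-issue", "oh-handoff"],
  fail: "oh-expert",
  blocker: "surface",
};
const retroRoute: Route = { pass: "oh-planner", fail: "oh-handoff", blocker: "surface" };

const triagePass = nextSkills(triageRoute, "pass"); // ["oh-issue", "oh-handoff"]
const retroPass = nextSkills(retroRoute, "pass"); // ["oh-planner"]
```

Values such as `surface` and `done` look like sentinels rather than skill names. A real consumer would have to special-case them, and this sketch does not attempt that.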
package/index.ts ADDED
@@ -0,0 +1,3 @@
+ import { BootstrapPlugin } from "./bootstrap.ts"
+
+ export default BootstrapPlugin