openhermes 4.3.0 → 4.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (143)
  1. package/CONTEXT.md +10 -1
  2. package/README.md +54 -42
  3. package/bootstrap.ts +396 -142
  4. package/harness/agents/oh-browser.md +97 -0
  5. package/harness/agents/oh-builder.md +78 -0
  6. package/harness/agents/oh-facade.md +75 -0
  7. package/harness/agents/oh-fusion.md +45 -0
  8. package/harness/agents/oh-gauntlet.md +71 -0
  9. package/harness/agents/oh-grill.md +71 -0
  10. package/harness/agents/oh-investigate.md +60 -0
  11. package/harness/agents/oh-manifest.md +95 -0
  12. package/harness/agents/oh-plan-review.md +40 -0
  13. package/harness/agents/oh-planner.md +50 -0
  14. package/harness/agents/oh-refactor.md +37 -0
  15. package/harness/agents/oh-retro.md +46 -0
  16. package/harness/agents/oh-review.md +85 -0
  17. package/harness/agents/oh-security.md +83 -0
  18. package/harness/agents/oh-ship.md +76 -0
  19. package/harness/agents/oh-skill-craft.md +38 -0
  20. package/harness/agents/openhermes.md +28 -73
  21. package/harness/codex/AUTOPILOT.md +235 -87
  22. package/harness/codex/CHARTER.md +80 -0
  23. package/harness/instructions/SHELL.md +76 -0
  24. package/harness/lib/background/background.test.ts +197 -0
  25. package/harness/lib/background/index.ts +7 -0
  26. package/harness/lib/background/interfaces.ts +31 -0
  27. package/harness/lib/background/manager.ts +320 -0
  28. package/harness/lib/composer/compose.test.ts +168 -0
  29. package/harness/lib/composer/compose.ts +65 -0
  30. package/harness/lib/composer/fragments/01-identity.md +1 -0
  31. package/harness/lib/composer/fragments/02-delegation.md +6 -0
  32. package/harness/lib/composer/fragments/03-permissions.md +13 -0
  33. package/harness/lib/composer/fragments/04-task-flow.md +15 -0
  34. package/harness/lib/composer/fragments/05-confidence.md +5 -0
  35. package/harness/lib/composer/fragments/06-parallelization.md +17 -0
  36. package/harness/lib/composer/fragments/07-shell.md +41 -0
  37. package/harness/lib/composer/fragments/08-routing.md +8 -0
  38. package/harness/lib/composer/fragments/09-guardrails.md +12 -0
  39. package/harness/lib/composer/index.ts +1 -0
  40. package/harness/lib/hooks/builtins/confidence-gate-hook.ts +70 -0
  41. package/harness/lib/hooks/builtins/delegation-depth-hook.ts +59 -0
  42. package/harness/lib/hooks/builtins/error-recovery-hook.ts +107 -0
  43. package/harness/lib/hooks/builtins/memory-sync-hook.ts +73 -0
  44. package/harness/lib/hooks/builtins/plan-check-hook.ts +43 -0
  45. package/harness/lib/hooks/builtins/route-tracking-hook.ts +147 -0
  46. package/harness/lib/hooks/builtins/sanity-check-hook.ts +52 -0
  47. package/harness/lib/hooks/builtins/shell-detect-hook.ts +96 -0
  48. package/harness/lib/hooks/hooks.test.ts +1016 -0
  49. package/harness/lib/hooks/index.ts +30 -0
  50. package/harness/lib/hooks/registry.ts +416 -0
  51. package/harness/lib/hooks/types.ts +71 -0
  52. package/harness/lib/memory/index.ts +18 -0
  53. package/harness/lib/memory/interfaces.ts +53 -0
  54. package/harness/lib/memory/memory-manager.ts +205 -0
  55. package/harness/lib/memory/memory.test.ts +491 -0
  56. package/harness/lib/memory/plan-store.ts +366 -0
  57. package/harness/lib/recovery/handler.ts +243 -0
  58. package/harness/lib/recovery/index.ts +14 -0
  59. package/harness/lib/recovery/interfaces.ts +48 -0
  60. package/harness/lib/recovery/patterns.ts +149 -0
  61. package/harness/lib/recovery/recovery.test.ts +312 -0
  62. package/harness/lib/sanity/anomaly-tracker.ts +127 -0
  63. package/harness/lib/sanity/checker.ts +178 -0
  64. package/harness/lib/sanity/index.ts +13 -0
  65. package/harness/lib/sanity/interfaces.ts +24 -0
  66. package/harness/lib/sanity/sanity.test.ts +472 -0
  67. package/harness/lib/sync/file-watcher.ts +174 -0
  68. package/harness/lib/sync/index.ts +11 -0
  69. package/harness/lib/sync/interfaces.ts +27 -0
  70. package/harness/lib/sync/plan-sync.ts +536 -0
  71. package/harness/lib/sync/sync.test.ts +832 -0
  72. package/harness/skills/oh-ascii/DEEP.md +292 -0
  73. package/harness/skills/oh-ascii/SKILL.md +31 -0
  74. package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
  75. package/harness/skills/oh-browser/DEEP.md +54 -0
  76. package/harness/skills/oh-browser/SKILL.md +30 -0
  77. package/harness/skills/oh-builder/DEEP.md +63 -0
  78. package/harness/skills/oh-builder/SKILL.md +12 -90
  79. package/harness/skills/oh-expert/DEEP.md +85 -0
  80. package/harness/skills/oh-expert/SKILL.md +13 -106
  81. package/harness/skills/oh-facade/DEEP.md +182 -0
  82. package/harness/skills/oh-facade/SKILL.md +15 -279
  83. package/harness/skills/oh-freeze/DEEP.md +18 -0
  84. package/harness/skills/oh-freeze/SKILL.md +10 -19
  85. package/harness/skills/oh-full-output/DEEP.md +25 -0
  86. package/harness/skills/oh-full-output/SKILL.md +12 -65
  87. package/harness/skills/oh-fusion/DEEP.md +120 -0
  88. package/harness/skills/oh-fusion/SKILL.md +17 -295
  89. package/harness/skills/oh-gauntlet/DEEP.md +77 -0
  90. package/harness/skills/oh-gauntlet/SKILL.md +13 -105
  91. package/harness/skills/oh-grill/DEEP.md +51 -0
  92. package/harness/skills/oh-grill/SKILL.md +12 -63
  93. package/harness/skills/oh-guard/DEEP.md +19 -0
  94. package/harness/skills/oh-guard/SKILL.md +10 -24
  95. package/harness/skills/oh-handoff/DEEP.md +48 -0
  96. package/harness/skills/oh-handoff/SKILL.md +13 -23
  97. package/harness/skills/oh-health/DEEP.md +74 -0
  98. package/harness/skills/oh-health/SKILL.md +13 -76
  99. package/harness/skills/oh-init/DEEP.md +85 -0
  100. package/harness/skills/oh-init/SKILL.md +13 -127
  101. package/harness/skills/oh-investigate/DEEP.md +171 -0
  102. package/harness/skills/oh-investigate/SKILL.md +13 -66
  103. package/harness/skills/oh-issue/DEEP.md +21 -0
  104. package/harness/skills/oh-issue/SKILL.md +11 -27
  105. package/harness/skills/oh-manifest/DEEP.md +92 -0
  106. package/harness/skills/oh-manifest/SKILL.md +12 -109
  107. package/harness/skills/oh-plan-review/DEEP.md +90 -0
  108. package/harness/skills/oh-plan-review/SKILL.md +13 -115
  109. package/harness/skills/oh-planner/DEEP.md +172 -0
  110. package/harness/skills/oh-planner/SKILL.md +12 -149
  111. package/harness/skills/oh-prd/DEEP.md +45 -0
  112. package/harness/skills/oh-prd/SKILL.md +10 -26
  113. package/harness/skills/oh-refactor/DEEP.md +122 -0
  114. package/harness/skills/oh-refactor/SKILL.md +17 -410
  115. package/harness/skills/oh-retro/DEEP.md +26 -0
  116. package/harness/skills/oh-retro/SKILL.md +12 -24
  117. package/harness/skills/oh-review/DEEP.md +87 -0
  118. package/harness/skills/oh-review/SKILL.md +11 -97
  119. package/harness/skills/oh-security/DEEP.md +83 -0
  120. package/harness/skills/oh-security/SKILL.md +14 -96
  121. package/harness/skills/oh-ship/DEEP.md +141 -0
  122. package/harness/skills/oh-ship/SKILL.md +14 -32
  123. package/harness/skills/oh-skill-craft/DEEP.md +369 -0
  124. package/harness/skills/oh-skill-craft/SKILL.md +13 -177
  125. package/harness/skills/oh-skills-link/DEEP.md +16 -0
  126. package/harness/skills/oh-skills-link/SKILL.md +10 -20
  127. package/harness/skills/oh-skills-list/DEEP.md +20 -0
  128. package/harness/skills/oh-skills-list/SKILL.md +9 -22
  129. package/harness/skills/oh-triage/DEEP.md +23 -0
  130. package/harness/skills/oh-triage/SKILL.md +8 -24
  131. package/harness/skills/oh-worktree/DEEP.md +169 -0
  132. package/harness/skills/oh-worktree/SKILL.md +32 -0
  133. package/lib/harness-resolver.ts +8 -10
  134. package/package.json +7 -5
  135. package/tsconfig.json +1 -1
  136. package/harness/codex/CONSTITUTION.md +0 -73
  137. package/harness/codex/ROUTING.md +0 -92
  138. package/harness/commands/oh-doctor.md +0 -26
  139. package/harness/commands/oh-log.md +0 -18
  140. package/harness/instructions/RUNTIME.md +0 -30
  141. package/harness/skills/oh-caveman/SKILL.md +0 -42
  142. package/harness/skills/oh-learn/SKILL.md +0 -101
  143. package/lib/logger.ts +0 -75
@@ -1,24 +1,7 @@
  ---
  name: oh-fusion
- description: "Skill ingestion pipeline: discover, analyze, filter, adapt, fuse, and integrate external skills into the OH harness. Use when the user has an existing skill, finds a skill in their .agents/skills, or wants to bring an external capability into OH."
+ description: "Use when the user has an existing skill, finds a skill in their .agents/skills, or wants to bring an external capability into OH as a skill."
  tier: 3
- benefits-from: [oh-skill-craft, oh-skills-link, oh-expert]
- triggers:
- - "import skill"
- - "ingest skill"
- - "fuse skill"
- - "merge skills"
- - "port skill"
- - "add skill from"
- - "make this OH-native"
- - "skill fusion"
- - "oh-fusion"
- - "integrate skill"
- - "convert skill"
- - "bring in a skill"
- - "transfer skill"
- - "copy skill"
- - "adopt skill"
  route:
  pass:
  - oh-skills-link
@@ -29,286 +12,25 @@ route:
 
  # oh-fusion
 
- The skill ingestion pipeline: discover external skills, evaluate signal quality, filter out noise, adapt to OH conventions, fuse multiple into one, and integrate into the harness.
+ Skill ingestion pipeline: Discover → Analyze → Decide → Adapt → Fuse → Integrate.
 
- Every skill you run through `oh-fusion` becomes part of the closed loop — wired into AUTOPILOT, ROUTING.md, AGENTS.md, and the self-driving engine.
+ ## Steps
 
- ## When to Use
-
- - The user points at a skill in `.agents/skills` and says "make this OH-native"
- - The user has a skill from `npx skills` ecosystem they want integrated
- - The user provides raw skill content and asks "is this worth keeping?"
- - Multiple skills need fusing into one (like the `oh-facade` fusion in this session)
- - Any external capability needs to become an `oh-*` skill with full wiring
-
- ## Pipeline
-
- 6-phase closed loop:
-
- ```
- Discovery → Analysis → Decision → Adaptation → Fusion (opt) → Integration
-
- oh-skills-link (verify)
- ```
-
- ---
-
- ## Phase 1: Discovery
-
- Input: user's skill source
- Output: raw skill content loaded for analysis
-
- ### Sources
-
- | Source | How to access |
- |---|---|
- | `.agents/skills/<name>/SKILL.md` | Read the file directly |
- | `npx skills` package | Run `npx skills find <query>` or check `skills.sh` |
- | URL to a skill | Fetch the content via web fetch |
- | User-provided path | Resolve and read |
- | User-provided content inline | Capture the raw text |
- | Multiple skills (for fusion) | Load all, enter Phase 2 on each |
-
- ### Discovery Checklist
-
- Before proceeding, confirm:
- - [ ] Skill content is loaded and readable
- - [ ] Frontmatter is present (name, description)
- - [ ] There are no access restrictions or permissions needed
- - [ ] For multiple skills: all are loaded and ready for comparison
-
- ---
-
- ## Phase 2: Analysis
-
- Input: raw skill content
- Output: structured analysis report with signal score
-
- ### 2a. Depth Scoring
-
- Measure the skill's substantive content:
-
- | Metric | How to assess |
- |---|---|
- | Total lines | SKILL.md length |
- | Concrete rules count | Number of "must", "never", "always", "banned" directives |
- | Example count | Number of code blocks showing before/after or usage |
- | Anti-patterns listed | Explicit "don't do this" sections |
- | Workflow steps | Number of sequential, actionable steps |
- | Routing table | Does it define pass/fail/blocker routing? |
-
- **Scoring:**
- - **High signal** (70-100): Multiple concrete rules, examples, anti-patterns, workflow steps, routing
- - **Medium signal** (30-69): Some structure but thin on specifics, few examples
- - **Low signal** (0-29): Vague descriptions, no concrete rules, no anti-patterns, "be creative" level
-
- ### 2b. Overlap Detection
-
- Compare against all existing OH skills (`harness/skills/oh-*/SKILL.md`):
-
- - Does any existing OH skill cover the same domain?
- - Is the overlap partial (complementary) or complete (redundant)?
- - Does the external skill have unique content OH lacks?
-
- ### 2c. Convention Check
-
- Does the skill follow good practices?
-
- - [ ] Has clear description for triggering
- - [ ] Has concrete, actionable instructions (not just philosophy)
- - [ ] Has anti-patterns or failure modes documented
- - [ ] Has examples or code blocks
- - [ ] Has measurable outcomes (not subjective "make it good")
- - [ ] Avoids time-sensitive references (dates, version numbers)
- - [ ] Avoids platform-specific assumptions that don't apply
-
- ### 2d. Report
-
- Output a structured report:
-
- ```markdown
- ## Analysis: <skill-name>
-
- **Source:** <path or origin>
- **Depth score:** <0-100> — <High/Medium/Low>
- **Total lines:** <N> | Concrete rules: <N> | Examples: <N> | Anti-patterns: <N>
- **Overlap:** <existing OH skill> — <none/partial/complete>
- **Verdict:** <keep / fuse / discard / ask>
-
- **Strengths:**
- - <what this skill does well>
-
- **Weaknesses:**
- - <what is missing or weak>
-
- **Recommended action:** <port directly / fuse with X > / discard>
- ```
-
- ---
-
- ## Phase 3: Decision
-
- Based on the analysis, decide what to do:
-
- | Verdict | Action |
- |---|---|
- | **Keep** | High signal, no overlap, OH conventions missing. Port directly to `oh-<name>`. |
- | **Fuse** | Medium-high signal, partial overlap with existing OH skill(s). Merge complementary DNA. |
- | **Discard** | Low signal, complete overlap, too niche, or no actionable content. Surface reasoning. |
- | **Ask** | Ambiguous quality, unclear domain fit, or user needs to choose between approaches. Surface findings. |
-
- **Decision principles:**
- - When in doubt between keep and fuse, prefer fuse — conserves routing slots and reduces surface area
- - When in doubt between keep and discard, prefer keep if there is ANY unique signal — the autopilot won't load it unless triggered
- - Never fuse incompatible domains (e.g., UI design into a security skill) — the result is confusing
-
- ---
-
- ## Phase 4: Adaptation
-
- Input: raw skill content to keep/fuse
- Output: OH-native SKILL.md
-
- ### 4a. Rewrite Frontmatter
-
- ```markdown
- ---
- name: oh-<new-name>
- description: "Adapted from <source>. <Core function>. Use when <triggers>."
- tier: <2|3|4>
- benefits-from: [<relevant oh- skills this depends on>]
- triggers:
- - "<trigger phrase from original, adapted>"
- - "<new trigger phrases for OH context>"
- ---
- ```
-
- ### 4b. Structure the Body
-
- OH skill structure:
- 1. **Summary** — one paragraph of what the skill does
- 2. **When to Use** — clear triggering context
- 3. **Workflow** — numbered steps (the core of the skill)
- 4. **Anti-patterns** — what NOT to do
- 5. **Routing** — pass/fail/blocker table
-
- Adaptation rules:
- - Remove all emojis from content
- - Replace ecosystem-specific terminology with OH equivalents
- - Convert relative paths to OH harness conventions
- - Add routing table based on skill's purpose
- - Keep all concrete rules, examples, and anti-patterns from the original
- - Discard fluff, philosophy, and motivational language
- - Preserve the original's unique signal — that's why you're importing it
-
- ### 4c. Naming
-
- - Name must match `^[a-z0-9]+(-[a-z0-9]+)*$`
- - Prefix with `oh-`
- - Use the original name if it maps well, adapt if not
- - For fusions: invent a new name that captures the combined purpose
-
- ---
-
- ## Phase 5: Fusion (optional — skip for single-skill imports)
-
- Input: 2+ analyzed skill contents with "fuse" verdict
- Output: one unified skill that merges complementary DNA
-
- ### 5a. Identify Complementary DNA
-
- For each skill being fused, identify:
- - **Unique rules/concepts** — content that only this skill has
- - **Overlapping content** — same idea expressed differently (keep the better version)
- - **Conflicting directives** — skills that say opposite things (surface to user)
-
- ### 5b. Merge Architecture
-
- Structure the fused skill so each source contributes its strength:
-
- ```markdown
- ## <Combined Workflow>
-
- ### Phase A: <from skill 1>
- <what skill 1 contributes>
-
- ### Phase B: <from skill 2>
- <what skill 2 contributes>
-
- ### Phase C: <from skill 3>
- <what skill 3 contributes>
- ```
-
- Do NOT just concatenate. The fused skill must read as a single coherent workflow, not three documents glued together.
-
- ### 5c. Name the Fusion
-
- The name should signal the combined purpose, not the individual sources.
- - `oh-facade` (from redesign + design-taste + high-end-visual) — not `oh-redesign-plus-taste`
- - Apply the same principle here
-
- ---
-
- ## Phase 6: Integration
-
- Input: OH-native SKILL.md
- Output: skill fully wired into the harness
-
- ### 6a. Create the Skill File
-
- Write to `~/.config/opencode/skills/oh-<name>/SKILL.md` (user dir, survives npm updates).
- If the user has an alternative preference (`~/.agents/skills/`), use that instead.
- The file structure follows the standard OH skill template.
-
- ### 6b. Wire into AUTOPILOT
-
- Add an entry to the auto-classify matrix in `harness/codex/AUTOPILOT.md`:
- - Signal keywords that should trigger this skill
- - Classification label
- - Action: "Load **oh-<name>**. Do not ask."
-
- ### 6c. Wire routing into frontmatter
-
- Add `route:` frontmatter to the skill — no ROUTING.md edit needed. The dynamic routing system reads `route.pass`, `route.fail`, and `route.blocker` directly from the skill's own `SKILL.md`. The skill becomes routable automatically:
-
- ```yaml
- route:
- pass: <next skill or done>
- fail: <fallback skill or surface>
- blocker: surface
- ```
-
- ### 6d. Wire into AGENTS.md
-
- Add to the skills table in `AGENTS.md`:
- - Skill, tier, purpose
- - Increment the total count
-
- ### 6e. Wire into openhermes.md
-
- Add to the orchestrator's skill list in `harness/agents/openhermes.md`.
-
- ### 6f. Verify Discovery
-
- Route to `oh-skills-link` to confirm the skill is discoverable by OpenCode.
-
- ---
+ 1. Load skill content — read from `.agents/skills/`, `npx skills`, URL, user path, or inline text.
+ 2. Analyze depth — score by lines, concrete rules, examples, anti-patterns, workflow steps, and routing.
+ 3. Detect overlap: compare against existing `oh-*` skills. Report none/partial/complete.
+ 4. Decide verdict: Keep (high signal, no overlap), Fuse (partial overlap, merge DNA), Discard (low/no signal), or Ask (ambiguous).
+ 5. Adapt to OH-native format: remove emojis, convert paths, add routing, preserve unique signal.
+ 6. Fuse if merging: identify unique concepts from each source, resolve conflicts, write one coherent workflow.
+ 7. Integrate: create skill file, wire AUTOPILOT, routing, AGENTS.md, openhermes.md.
+ 8. Verify — route to oh-skills-link to confirm discovery.
 
  ## Routing
 
  | Outcome | Route |
- |---|---|
- | integration complete | -> oh-skills-link (verify discovery) |
- | fusion with iteration needed | -> oh-skill-craft (optimize via eval loop) |
- | analysis: discard | -> surface findings to user |
- | analysis: ask | -> surface findings + recommendations to user |
- | blocker | -> surface to user |
-
- ## Anti-patterns
-
- - Importing a skill without analyzing it first — always run Phase 2
- - Keeping everything from the source — 50% of most external skills is fluff. Be ruthless.
- - Fusing incompatible domains — the result confuses both the model and the user
- - Naming after the source ("oh-tailwind-v2") instead of the capability ("oh-styles")
- - Skipping route frontmatter — a skill without `route.pass`/`route.fail`/`route.blocker` won't auto-route
- - Overwriting existing routing entries without checking for collisions
+ |---------|-------|
+ | Integration complete | oh-skills-link (verify discovery) |
+ | Fusion needs iteration | oh-skill-craft |
+ | Analysis: discard | surface |
+ | Analysis: ask | surface with recs |
+ | Blocker | surface |
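The Analyze and Decide steps in oh-fusion are given as heuristics, not as an algorithm. The TypeScript below is a minimal sketch of one way to mechanize them, assuming SKILL.md arrives as a plain string. The directive keywords, the 70/30 score bands, the verdict names, and the naming regex come from the removed Phase 2 to 4 text. The weights, the caps, and the names `scoreDepth`, `decideVerdict`, and `isValidSkillName` are hypothetical and do not describe how OH actually scores skills.

```typescript
// Hypothetical sketch of oh-fusion's Analyze -> Decide steps, not OH code.
// Directive keywords and 70/30 bands come from the removed Phase 2 text;
// the per-metric weights and caps are illustrative assumptions.
type Band = "high" | "medium" | "low";
type Overlap = "none" | "partial" | "complete";
type Verdict = "keep" | "fuse" | "discard" | "ask";

interface DepthReport {
  lines: number;
  rules: number;       // "must" / "never" / "always" / "banned" directives
  examples: number;    // fenced code blocks (opening + closing fence = 1)
  hasRouting: boolean; // mentions route plus pass/fail/blocker
  score: number;       // 0-100
  band: Band;
}

function scoreDepth(skillMd: string): DepthReport {
  const lines = skillMd.split("\n").length;
  const rules = (skillMd.match(/\b(must|never|always|banned)\b/gi) ?? []).length;
  const fenceLines = (skillMd.match(/^`{3}/gm) ?? []).length;
  const examples = Math.floor(fenceLines / 2);
  const hasRouting = /route/i.test(skillMd) && /\b(pass|fail|blocker)\b/i.test(skillMd);

  // Assumed weights: each metric is capped so no single one dominates.
  const score = Math.min(
    100,
    Math.min(rules * 5, 40) +
      Math.min(examples * 10, 30) +
      Math.min(Math.floor(lines / 10), 15) +
      (hasRouting ? 15 : 0)
  );
  const band: Band = score >= 70 ? "high" : score >= 30 ? "medium" : "low";
  return { lines, rules, examples, hasRouting, score, band };
}

// Phase 3 verdict table reduced to two inputs (band + overlap).
function decideVerdict(band: Band, overlap: Overlap): Verdict {
  if (overlap === "complete" || band === "low") return "discard";
  if (overlap === "partial") return "fuse";
  if (band === "high") return "keep";
  return "ask"; // assumption: medium signal with no overlap = "ambiguous quality"
}

// Phase 4c naming rule: the documented kebab-case regex plus the `oh-` prefix.
function isValidSkillName(name: string): boolean {
  return /^[a-z0-9]+(-[a-z0-9]+)*$/.test(name) && name.startsWith("oh-");
}

const report = scoreDepth("# oh-demo\nYou must cite the spec.\nNever skip tests.");
console.log(report.band, decideVerdict(report.band, "partial"), isValidSkillName("oh-demo"));
// low discard true
```

Collapsing the Phase 3 table to band plus overlap drops the "too niche" and "no actionable content" cases, which still need a judgment call rather than a score.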
@@ -0,0 +1,77 @@
+ # oh-gauntlet — Deep Reference
+
+ ## Stage 1: Test Suite
+
+ Run all tests. Check they test behavior (not implementation). Flag gaps in edge case coverage. Do NOT add tests — surface as findings.
+
+ **TDD Iron Law:** `NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST`. If code was written before its test — flag as severe quality gap.
+
+ **RED-GREEN-REFACTOR verification:** For new code in diff, verify each function has a test, the test was seen to fail before implementation (commit history), minimal code was written to pass each test, and tests use real code (not mocks unless unavoidable).
+
+ **Rationalization Table:**
+
+ | Excuse | Reality |
+ |--------|---------|
+ | "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+ | "I'll test after" | Tests passing immediately prove nothing. |
+ | "Already manually tested" | Ad-hoc ≠ systematic. Can't re-run. |
+ | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
+ | "TDD will slow me down" | TDD faster than debugging. |
+ | "Existing code has no tests" | You're improving it. Add tests for existing code. |
+
+ **Red Flags** (any = quality gap): Code before test · Test passes immediately · Can't explain why test failed · Rationalizing "just this once" · "Keep as reference" or "adapt existing code" · "Already spent X hours, deleting is wasteful" · "TDD is dogmatic, I'm being pragmatic" · "Tests after achieve the same purpose"
+
+ **TDD Verification Checklist:**
+ - [ ] Every new function has a test
+ - [ ] Watched each test fail before implementing (evidence)
+ - [ ] Wrote minimal code to pass each test
+ - [ ] All tests pass
+ - [ ] Output pristine (no errors/warnings)
+ - [ ] Tests use real code (mocks only if unavoidable)
+ - [ ] Edge cases and errors covered
+
+ ## Stage 2: Dual-Axis Review (parallel sub-agents)
+
+ - **Standards** — read documented standards (CONTEXT.md, AGENTS.md, eslint, ADRs). Report every violation. Cite source. Distinguish hard violations from judgment calls.
+ - **Spec** — read spec source (plan/issue/PRD). Report missing/partial requirements, scope creep, wrong implementations. Quote the spec.
+
+ Report independently. Do not merge or rank.
+
+ ## Stage 3: Edge Case Sweep
+
+ - Error states — invalid inputs, missing files, network failure
+ - Concurrency — races, deadlocks, stale state
+ - Security — injection, auth bypass, data leakage
+ - Performance — N+1, unbounded loops, leaks
+ - State transitions — invalid transitions, partial updates
+
+ Per finding: severity (critical/major/minor), location, reproduction.
+
+ ## Stage 4: QA Sweep (tiered)
+
+ Quick (critical only) / Standard (+ medium) / Exhaustive (+ cosmetic). Execute flows, log findings, fix highest severity first, re-verify after each fix.
+
+ ## Stage 5: Canary (post-deploy)
+
+ Capture pre-deploy baselines. Deploy. Navigate key flows. Diff against baselines. Surface anomalies. Suggest rollback if critical.
+
+ ## Stage 6: Manual Verification
+
+ - Happy path, error path, no regression, logging covers failures, docs match behavior.
+
+ ## Loop Protocol
+
+ 1. Run all stages (skip 5 if not deploying)
+ 2. 0 critical + 0 major → DONE
+ 3. Criticals/majors exist → fix highest severity, re-run affected stages
+ 4. Fix impossible → BLOCKER: `<what> | Options: A, B, C`
+
+ ## Anti-patterns
+
+ - Sequential when parallel possible
+ - Mixing Standards and Spec findings (keep axes separate)
+ - Skipping edge case sweep because tests pass
+ - Ignoring minors (accumulated design debt)
+ - Pushing critical failures without surfacing
+ - Skipping TDD — writing code without a failing test first
+ - Tests that pass immediately without having failed first (might test wrong thing)
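The gauntlet Loop Protocol is effectively a small state machine over finding severities. The TypeScript below is a minimal sketch, assuming findings are plain objects. The three severities, the DONE condition, highest-severity-first ordering, and the `BLOCKER: <what> | Options: A, B, C` shape come from DEEP.md. `Finding`, `nextAction`, the `fixable` flag, and the rule that "affected stages" means every stage that reported a critical or major finding are hypothetical.

```typescript
// Hypothetical model of the oh-gauntlet Loop Protocol, not the harness's code.
type Severity = "critical" | "major" | "minor";

interface Finding {
  stage: number;     // gauntlet stage (1-6) that reported it
  severity: Severity;
  location: string;
  fixable: boolean;  // assumption: false when the fix is out of scope
}

type Action =
  | { kind: "done"; minors: number }
  | { kind: "fix"; finding: Finding; rerunStages: number[] }
  | { kind: "blocker"; message: string };

const RANK: Record<Severity, number> = { critical: 0, major: 1, minor: 2 };

function nextAction(findings: Finding[], options: string[]): Action {
  // Criticals and majors block; sort so the highest severity comes first.
  const blocking = findings
    .filter((f) => f.severity !== "minor")
    .sort((a, b) => RANK[a.severity] - RANK[b.severity]);

  // Step 2: 0 critical + 0 major -> DONE (minors are still reported).
  if (blocking.length === 0) return { kind: "done", minors: findings.length };

  const top = blocking[0];
  if (top.fixable) {
    // Step 3: fix the top finding, then re-run affected stages.
    // Assumption: "affected" = every stage that reported a blocking finding.
    const rerunStages = Array.from(new Set(blocking.map((f) => f.stage))).sort((a, b) => a - b);
    return { kind: "fix", finding: top, rerunStages };
  }

  // Step 4: fix impossible -> BLOCKER: <what> | Options: A, B, C
  return {
    kind: "blocker",
    message: `BLOCKER: ${top.severity} at ${top.location} | Options: ${options.join(", ")}`,
  };
}

const sampleFindings: Finding[] = [
  { stage: 3, severity: "minor", location: "src/a.ts:10", fixable: true },
  { stage: 2, severity: "major", location: "src/b.ts:4", fixable: true },
  { stage: 3, severity: "critical", location: "src/c.ts:7", fixable: false },
];
console.log(nextAction(sampleFindings, ["A: reduce scope", "B: alternative approach"]));
```

In the sample, the critical finding outranks the major one and is not fixable, so the sketch surfaces a blocker instead of attempting the major fix.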
@@ -1,17 +1,7 @@
  ---
  name: oh-gauntlet
- description: "Rigorous multi-axis testing gauntlet: unit, integration, edge cases, dual-axis review. Loops until done or blocker."
+ description: "Use when code is ready for thorough testing: unit tests, integration, edge cases, dual-axis review, and QA. Loops until done or blocker."
  tier: 4
- benefits-from: [oh-expert, oh-builder]
- triggers:
- - "run the gauntlet on"
- - "test everything"
- - "rigorous testing"
- - "review all angles"
- - "qa the feature"
- - "full review of the code"
- - "validate this feature"
- - "thorough testing"
  route:
  pass: oh-ship
  fail: oh-builder
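The context lines in this hunk keep the `route:` frontmatter (`pass: oh-ship`, `fail: oh-builder`), and the removed oh-fusion text says the dynamic routing system reads `route.pass`, `route.fail`, and `route.blocker` straight from each skill's SKILL.md. The TypeScript below is a minimal sketch of that lookup. It assumes the frontmatter is already parsed into an object (no YAML parsing is shown) and that a missing key falls back to surfacing to the user. `RouteTable` and `resolveRoute` are hypothetical names, not the harness's API.

```typescript
// Hypothetical sketch: resolve a skill outcome to its next hop using the
// route.pass / route.fail / route.blocker keys from SKILL.md frontmatter.
type Outcome = "pass" | "fail" | "blocker";

interface RouteTable {
  pass?: string | string[]; // oh-fusion's frontmatter appears to use a YAML list
  fail?: string | string[];
  blocker?: string | string[];
}

const SURFACE = "surface"; // hand control back to the user

function resolveRoute(route: RouteTable | undefined, outcome: Outcome): string[] {
  const target = route?.[outcome];
  // Assumption: an absent key falls back to surfacing to the user;
  // this is not confirmed harness behavior.
  if (target === undefined) return [SURFACE];
  return Array.isArray(target) ? target : [target];
}

const gauntletRoute: RouteTable = { pass: "oh-ship", fail: "oh-builder" };
const fusionRoute: RouteTable = { pass: ["oh-skills-link"] };

console.log(resolveRoute(gauntletRoute, "pass"));    // [ 'oh-ship' ]
console.log(resolveRoute(gauntletRoute, "blocker")); // [ 'surface' ]
console.log(resolveRoute(fusionRoute, "pass"));      // [ 'oh-skills-link' ]
```

Normalizing every target to an array lets single-skill and list-style `route:` entries share one code path.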
@@ -20,104 +10,22 @@ route:
 
  # oh-gauntlet
 
- Runs the current build through a multi-axis gauntlet: tests, edge cases, standards review, spec review. Spawns parallel sub-agents for independent axes. Loops until everything passes or a blocker is surfaced.
+ Multi-axis testing: test suite, dual-axis review, edge case sweep, QA, canary.
 
- ## Gauntlet Stages
+ ## Steps
 
- Each stage runs independently (parallel where possible). A stage that fails loops: fix → re-run → verify → pass or blocker.
-
- ### Stage 1: Test Suite
- Run all existing tests. Check both that they pass and that they actually test the right things:
- - **Unit tests**: do they pass? Are they testing behavior or implementation?
- - **Integration tests**: do the real code paths work end-to-end?
- - **Edge case coverage**: empty states, error states, boundary conditions, concurrency
-
- If tests are missing or weak, flag what should be added. Do not add them here — surface as finding.
-
- ### Stage 2: Dual-Axis Review (parallel sub-agents)
-
- Spawn two sub-agents simultaneously:
-
- **Standards sub-agent:** Read the repo's documented standards (CONTEXT.md, AGENTS.md, eslint config, ADRs). Then read the diff. Report every place the diff violates a documented standard. Cite the standard source. Distinguish hard violations from judgement calls.
-
- **Spec sub-agent:** Read the spec source (plan.md, issue, PRD, or user's description). Then read the diff. Report: (a) requirements that are missing or partial, (b) scope creep (behavior not asked for), (c) requirements that look implemented but wrong. Quote the spec.
-
- Report both axes independently — do not merge or rank. A change can pass one and fail the other.
-
- ### Stage 3: Edge Case Sweep
- Systematic edge case analysis for the changed code:
- - Error states — what happens when inputs are invalid, files are missing, network fails?
- - Concurrency — race conditions, deadlocks, stale state
- - Security — injection, auth bypass, data leakage, permission escalation
- - Performance — N+1 queries, unbounded loops, memory leaks, unnecessary allocations
- - State transitions — invalid state transitions, partial updates, rollback gaps
-
- For each finding: severity (critical/major/minor), location, reproduction path.
-
- ### Stage 4: QA Sweep (tiered)
- Systematic testing with iterative fix-verify cycles. Choose tier based on risk:
-
- - **Quick** — critical/high severity flows only
- - **Standard** — critical + medium severity, full edge case sweep
- - **Exhaustive** — all of the above + cosmetic, edge cases, cross-browser
-
- 1. Execute tests against each user flow, edge case, error state
- 2. Log findings with severity, reproduction steps, evidence
- 3. Fix highest-severity first, commit each fix atomically
- 4. Re-verify after each fix — confirm fix, check for regressions
- 5. Produce health scores (before/after)
-
- ### Stage 5: Canary (post-deploy)
- If deploying to production:
-
- 1. **Set baseline** — capture pre-deploy screenshots and metrics
- 2. **Deploy check** — verify deploy completed successfully
- 3. **Canary run** — navigate key user flows, capture screenshots, log console errors
- 4. **Compare** — diff against pre-deploy baselines
- 5. **Alert** — surface anomalies, performance regressions, new errors
- 6. **Recovery** — if critical issues found, suggest rollback
-
- Output: health status, screenshots (before/after), error log, performance diff, ship/no-go verdict.
-
- ### Stage 6: Manual Verification Checklist
- Based on the plan's verification criteria or spec:
- - [ ] Happy path works end-to-end
- - [ ] Error path degrades gracefully
- - [ ] No regression in adjacent areas
- - [ ] Logging/monitoring covers failure modes
- - [ ] Documentation matches behavior (if applicable)
-
- ## Loop Protocol
-
- 1. Run all 6 stages (skip Stage 5 if not deploying)
- 2. Collect findings by severity
- 3. If 0 criticals and 0 majors → DONE
- 4. If criticals or majors exist → fix highest severity first
- 5. After fix → re-run affected stages only
- 6. If fix is impossible within scope → surface BLOCKER
-
- ## Blocker Protocol
-
- ```
- BLOCKER: <what failed>
- Context: <what was attempted, why it cannot proceed>
- Options:
- A: <scope reduction>
- B: <alternative approach>
- C: <dependency change>
- ```
-
- ## Anti-patterns
- - Running stages sequentially when they can be parallel (Standards and Spec reviews are independent)
- - Mixing Standards and Spec findings (keep axes separate — one can pass while the other fails)
- - Skipping edge case sweep because tests pass (tests confirm behavior, not absence of edge cases)
- - Ignoring minors because no criticals exist (accumulated minors signal design debt)
- - Pushing through critical failures without surfacing blocker
+ 1. Run all tests: verify TDD Iron Law (no production code without failing test first). Flag gaps in edge case coverage.
+ 2. Run dual-axis review — spawn parallel Standards and Spec sub-agents. Report independently. Do not merge or rank.
+ 3. Sweep edge cases — error states, concurrency, security, performance, state transitions. Assign severity (critical/major/minor).
+ 4. Run QA sweep: tiered (quick/standard/exhaustive). Fix highest severity first. Re-verify after each fix.
+ 5. Deploy canary if applicable: capture pre-deploy baselines, navigate flows, diff anomalies, suggest rollback if critical.
+ 6. Run manual verification: happy path, error path, no regression, logging covers failures, docs match behavior.
+ 7. Apply loop protocol: 0 critical + 0 major → done. Fix highest severity, re-run affected stages. Surface blocker with options.
 
  ## Routing
 
  | Outcome | Route |
  |---------|-------|
- | pass | → oh-ship (all checks pass) |
- | fail | → oh-builder (fix issues found) |
- | blocker | → surface to user |
+ | pass | → oh-ship |
+ | fail | → oh-builder (fix issues) |
+ | blocker | → surface |
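Step 4 names three QA tiers, and oh-gauntlet's DEEP.md defines them cumulatively: Quick (critical only), Standard (+ medium), Exhaustive (+ cosmetic). The TypeScript below is a minimal sketch of tier selection plus highest-severity-first ordering, assuming each QA issue carries one of those three labels. The removed SKILL.md text also counted "high" severity under Quick; this sketch follows the newer DEEP.md wording. `QaIssue` and `fixQueue` are hypothetical names.

```typescript
// Hypothetical sketch of oh-gauntlet's tiered QA sweep (Stage 4).
// Tier contents follow DEEP.md: Quick (critical only) / Standard (+ medium)
// / Exhaustive (+ cosmetic). Type and function names are illustrative.
type QaSeverity = "critical" | "medium" | "cosmetic";
type QaTier = "quick" | "standard" | "exhaustive";

interface QaIssue {
  flow: string;
  severity: QaSeverity;
}

const TIER_SEVERITIES: Record<QaTier, QaSeverity[]> = {
  quick: ["critical"],
  standard: ["critical", "medium"],
  exhaustive: ["critical", "medium", "cosmetic"],
};

const ORDER: Record<QaSeverity, number> = { critical: 0, medium: 1, cosmetic: 2 };

// Returns the issues in scope for the tier, highest severity first,
// which is the order in which the sweep fixes and re-verifies them.
function fixQueue(issues: QaIssue[], tier: QaTier): QaIssue[] {
  const allowed = new Set<QaSeverity>(TIER_SEVERITIES[tier]);
  return issues
    .filter((i) => allowed.has(i.severity))
    .sort((a, b) => ORDER[a.severity] - ORDER[b.severity]);
}

const issues: QaIssue[] = [
  { flow: "checkout", severity: "cosmetic" },
  { flow: "login", severity: "medium" },
  { flow: "signup", severity: "critical" },
];

console.log(fixQueue(issues, "quick").map((i) => i.flow));      // [ 'signup' ]
console.log(fixQueue(issues, "standard").map((i) => i.flow));   // [ 'signup', 'login' ]
console.log(fixQueue(issues, "exhaustive").map((i) => i.flow)); // [ 'signup', 'login', 'checkout' ]
```

Because the tiers are cumulative, a higher tier never drops an issue that a lower tier would have fixed; it only appends lower-severity work to the end of the queue.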
@@ -0,0 +1,51 @@
+ # oh-grill — Deep Reference
+
+ ## When to Use
+
+ Before committing to a plan. "Writing exactly what I asked for and it's still wrong" = design concept not shared. Cheaper in conversation than in code.
+
+ **Example:** User shares a plan. You respond with: "Have you considered the failure mode where X happens?" — then walk through the grill modes.
+
+ ## When NOT to Use
+
+ - Clear vetted plan needing execution
+ - User needs builder, not critic
+ - Trivial decisions
+
+ ## Modes / Workflow
+
+ ### Mode A: Grill (quick)
+
+ 1. Read plan/design doc
+ 2. Interview one decision at a time — each answer reveals new branches
+ 3. Resolve each branch before moving on
+ 4. Surface: contradictions, blind spots, unstated assumptions, ambiguous terms
+ 5. Propose recommended answer per decision
+ 6. Output: verified plan with flagged ambiguities
+
+ ### Mode B: Grill with Docs (thorough)
+
+ Same + persists to CONTEXT.md, ADRs, and DDD ubiquitous-language glossary.
+
+ 1. Load CONTEXT.md + ADRs
+ 2. Grill decision tree — each resolution may: update CONTEXT.md terms, create ADR, flag glossary ambiguity
+ 3. **Ubiquitous Language extraction** — scan for domain nouns/verbs/concepts. Identify: same word different concepts, different words same concept, vague terms. Propose canonical glossary with grouped tables. Write example dialogue (3-5 exchanges). Flag ambiguities.
+ 4. Persist CONTEXT.md changes immediately as language firms
+ 5. Output: updated CONTEXT.md + ADRs + UBIQUITOUS_LANGUAGE.md (if significant) + verified plan
+
+ ## Technique
+
+ - One question at a time
+ - Propose recommended answer per decision
+ - Walk full decision tree before accepting
+ - Reference CONTEXT.md glossary for ambiguous terms
+ - Cross-reference ADRs for architecture decisions
+
+ ## Anti-patterns
+
+ - Grilling for sake of grilling
+ - Questions you could answer by reading plan/codebase
+ - ADRs for trivial decisions
+ - Polishing CONTEXT.md before concepts settled
+ - Updating terms mid-discussion (let conversation resolve)
+ - Not distinguishing "must resolve now" vs "figure out later"
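oh-grill's Mode B, step 3, asks the grill to spot two kinds of terminology drift: the same word used for different concepts, and different words used for the same concept. The TypeScript below is a minimal sketch of that check. It assumes the term-to-concept pairs have already been collected by hand from the plan or conversation, so it does no natural-language extraction and does not detect vague terms. `Usage` and `checkLanguage` are hypothetical names.

```typescript
// Hypothetical sketch of the "Ubiquitous Language extraction" check in
// oh-grill Mode B: flag one term used for several concepts, and several
// terms used for one concept. Inputs are hand-tagged usages, not parsed text.
interface Usage {
  term: string;    // word as it appears in the plan or conversation
  concept: string; // the domain concept the speaker meant
}

interface LanguageReport {
  sameWordDifferentConcepts: Map<string, string[]>;
  differentWordsSameConcept: Map<string, string[]>;
}

// Group values by key, keeping each value once, in first-seen order.
function groupUnique(pairs: [string, string][]): Map<string, string[]> {
  const out = new Map<string, string[]>();
  for (const [key, value] of pairs) {
    const list = out.get(key) ?? [];
    if (!list.includes(value)) list.push(value);
    out.set(key, list);
  }
  return out;
}

// Keep only keys that map to more than one value (the ambiguities).
function onlyAmbiguous(m: Map<string, string[]>): Map<string, string[]> {
  return new Map(Array.from(m).filter(([, values]) => values.length > 1));
}

function checkLanguage(usages: Usage[]): LanguageReport {
  const norm = (s: string) => s.trim().toLowerCase();
  const byTerm = groupUnique(usages.map((u): [string, string] => [norm(u.term), u.concept]));
  const byConcept = groupUnique(usages.map((u): [string, string] => [u.concept, norm(u.term)]));
  return {
    sameWordDifferentConcepts: onlyAmbiguous(byTerm),
    differentWordsSameConcept: onlyAmbiguous(byConcept),
  };
}

const usages: Usage[] = [
  { term: "Account", concept: "billing account" },
  { term: "account", concept: "user login" },
  { term: "Customer", concept: "paying org" },
  { term: "Client", concept: "paying org" },
];
const languageReport = checkLanguage(usages);
console.log(Array.from(languageReport.sameWordDifferentConcepts.keys())); // [ 'account' ]
console.log(Array.from(languageReport.differentWordsSameConcept.keys())); // [ 'paying org' ]
```

Each flagged entry is a candidate row for the canonical glossary: pick one term per concept and record the rejected synonyms as aliases.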