openhermes 4.1.0 → 4.9.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (99) hide show
  1. package/CONTEXT.md +9 -0
  2. package/ETHOS.md +6 -3
  3. package/LICENSE +21 -21
  4. package/README.md +120 -79
  5. package/bootstrap.ts +284 -41
  6. package/harness/agents/oh-browser.md +97 -0
  7. package/harness/agents/oh-builder.md +78 -0
  8. package/harness/agents/oh-facade.md +75 -0
  9. package/harness/agents/oh-fusion.md +45 -0
  10. package/harness/agents/oh-gauntlet.md +71 -0
  11. package/harness/agents/oh-grill.md +71 -0
  12. package/harness/agents/oh-investigate.md +60 -0
  13. package/harness/agents/oh-manifest.md +95 -0
  14. package/harness/agents/oh-plan-review.md +40 -0
  15. package/harness/agents/oh-planner.md +50 -0
  16. package/harness/agents/oh-refactor.md +37 -0
  17. package/harness/agents/oh-retro.md +46 -0
  18. package/harness/agents/oh-review.md +85 -0
  19. package/harness/agents/oh-security.md +83 -0
  20. package/harness/agents/oh-ship.md +76 -0
  21. package/harness/agents/oh-skill-craft.md +38 -0
  22. package/harness/agents/openhermes.md +106 -62
  23. package/harness/codex/AUTOPILOT.md +178 -0
  24. package/harness/codex/CHARTER.md +81 -0
  25. package/harness/commands/oh-doctor.md +193 -14
  26. package/harness/commands/oh-log.md +18 -0
  27. package/harness/instructions/SHELL.md +76 -0
  28. package/harness/skills/oh-ascii/DEEP.md +292 -0
  29. package/harness/skills/oh-ascii/SKILL.md +31 -0
  30. package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
  31. package/harness/skills/oh-browser/DEEP.md +54 -0
  32. package/harness/skills/oh-browser/SKILL.md +30 -0
  33. package/harness/skills/oh-builder/DEEP.md +63 -0
  34. package/harness/skills/oh-builder/SKILL.md +16 -89
  35. package/harness/skills/oh-expert/DEEP.md +85 -0
  36. package/harness/skills/oh-expert/SKILL.md +19 -106
  37. package/harness/skills/oh-facade/DEEP.md +182 -0
  38. package/harness/skills/oh-facade/SKILL.md +34 -0
  39. package/harness/skills/oh-freeze/DEEP.md +18 -0
  40. package/harness/skills/oh-freeze/SKILL.md +15 -15
  41. package/harness/skills/oh-full-output/DEEP.md +25 -0
  42. package/harness/skills/oh-full-output/SKILL.md +28 -0
  43. package/harness/skills/oh-fusion/DEEP.md +120 -0
  44. package/harness/skills/oh-fusion/SKILL.md +36 -0
  45. package/harness/skills/oh-gauntlet/DEEP.md +77 -0
  46. package/harness/skills/oh-gauntlet/SKILL.md +17 -105
  47. package/harness/skills/oh-grill/DEEP.md +51 -0
  48. package/harness/skills/oh-grill/SKILL.md +16 -63
  49. package/harness/skills/oh-guard/DEEP.md +19 -0
  50. package/harness/skills/oh-guard/SKILL.md +15 -20
  51. package/harness/skills/oh-handoff/DEEP.md +48 -0
  52. package/harness/skills/oh-handoff/SKILL.md +18 -19
  53. package/harness/skills/oh-health/DEEP.md +74 -0
  54. package/harness/skills/oh-health/SKILL.md +17 -76
  55. package/harness/skills/oh-init/DEEP.md +85 -0
  56. package/harness/skills/oh-init/SKILL.md +17 -197
  57. package/harness/skills/oh-investigate/DEEP.md +171 -0
  58. package/harness/skills/oh-investigate/SKILL.md +18 -61
  59. package/harness/skills/oh-issue/DEEP.md +21 -0
  60. package/harness/skills/oh-issue/SKILL.md +16 -23
  61. package/harness/skills/oh-learn/DEEP.md +44 -0
  62. package/harness/skills/oh-learn/SKILL.md +17 -79
  63. package/harness/skills/oh-manifest/DEEP.md +92 -0
  64. package/harness/skills/oh-manifest/SKILL.md +15 -107
  65. package/harness/skills/oh-plan-review/DEEP.md +90 -0
  66. package/harness/skills/oh-plan-review/SKILL.md +19 -114
  67. package/harness/skills/oh-planner/DEEP.md +172 -0
  68. package/harness/skills/oh-planner/SKILL.md +16 -143
  69. package/harness/skills/oh-prd/DEEP.md +45 -0
  70. package/harness/skills/oh-prd/SKILL.md +15 -22
  71. package/harness/skills/oh-refactor/DEEP.md +122 -0
  72. package/harness/skills/oh-refactor/SKILL.md +33 -0
  73. package/harness/skills/oh-retro/DEEP.md +26 -0
  74. package/harness/skills/oh-retro/SKILL.md +17 -20
  75. package/harness/skills/oh-review/DEEP.md +87 -0
  76. package/harness/skills/oh-review/SKILL.md +17 -96
  77. package/harness/skills/oh-security/DEEP.md +83 -0
  78. package/harness/skills/oh-security/SKILL.md +18 -96
  79. package/harness/skills/oh-ship/DEEP.md +141 -0
  80. package/harness/skills/oh-ship/SKILL.md +18 -26
  81. package/harness/skills/oh-skill-craft/DEEP.md +369 -0
  82. package/harness/skills/oh-skill-craft/SKILL.md +20 -93
  83. package/harness/skills/oh-skills-link/DEEP.md +16 -0
  84. package/harness/skills/oh-skills-link/SKILL.md +15 -16
  85. package/harness/skills/oh-skills-list/DEEP.md +20 -0
  86. package/harness/skills/oh-skills-list/SKILL.md +14 -18
  87. package/harness/skills/oh-triage/DEEP.md +23 -0
  88. package/harness/skills/oh-triage/SKILL.md +15 -20
  89. package/harness/skills/oh-worktree/DEEP.md +169 -0
  90. package/harness/skills/oh-worktree/SKILL.md +32 -0
  91. package/lib/harness-resolver.ts +10 -12
  92. package/package.json +9 -4
  93. package/scripts/count-tokens.mjs +158 -0
  94. package/scripts/oh-doctor.ps1 +342 -0
  95. package/harness/codex/CONSTITUTION.md +0 -70
  96. package/harness/codex/ROUTING.md +0 -127
  97. package/harness/instructions/RUNTIME.md +0 -55
  98. package/harness/skills/oh-caveman/SKILL.md +0 -33
  99. package/lib/logger.ts +0 -69
@@ -0,0 +1,18 @@
1
+ # oh-freeze — Deep Reference
2
+
3
+ ## When to Use
4
+
5
+ When the user says "don't touch anything outside [path]" or "only edit files in [dir]." Prevents accidental edits to configuration, infrastructure, or unrelated modules.
6
+
7
+ ## Mode
8
+
9
+ - Allowed paths: only the specified directory and its children
10
+ - If you need to edit outside: state the reason and ask
11
+ - Read operations unrestricted anywhere
12
+ - File creation restricted to allowed paths
13
+
14
+ ## Anti-patterns
15
+
16
+ - Editing outside allowed path without asking
17
+ - Reading unrelated files for context and using that to justify edits
18
+ - "Just this one file outside" — ask first
@@ -1,28 +1,28 @@
1
1
  ---
2
2
  name: oh-freeze
3
- description: "Restrict file edits to a specific directory for the session"
3
+ description: "Restricts file edits to a specific directory for the session."
4
+ tier: 2
5
+ route:
6
+ pass: mode
7
+ fail: mode
8
+ blocker: surface
4
9
  ---
5
10
 
6
11
  # oh-freeze
7
12
 
8
- ## When to Use
9
- When debugging a specific module and you want to prevent accidentally "fixing" unrelated code. Scopes all Edit/Write operations to one directory.
13
+ Restrict write operations to one allowed directory.
10
14
 
11
- ## Workflow
12
- 1. Specify target directory to freeze
13
- 2. All Edit/Write operations outside that directory are blocked
14
- 3. User can explicitly approve cross-boundary edits
15
- 4. Unfreeze to release the boundary
15
+ ## Steps
16
16
 
17
- ## Anti-patterns
18
- - Freezing too broadly (defeats the purpose)
19
- - Forgetting to unfreeze when task scope expands
20
- - Using freeze as a substitute for git discipline
17
+ 1. Identify the allowed directory from user instructions
18
+ 2. Restrict all write/edit/create operations to the allowed path
19
+ 3. Allow read operations unrestricted anywhere
20
+ 4. If a write outside the allowed path is required, state the reason and ask before proceeding
21
21
 
22
22
  ## Routing
23
23
 
24
24
  | Outcome | Route |
25
25
  |---------|-------|
26
- | pass | → [return to prior skill — scope lock active] |
27
- | fail | → [surface issue — freeze not applied] |
28
- | blocker | → surface to user |
26
+ | pass | → mode |
27
+ | fail | → mode |
28
+ | blocker | → surface |
@@ -0,0 +1,25 @@
1
+ # oh-full-output — Deep Reference
2
+
3
+ ## When to Use
4
+
5
+ When generating long files, multi-component output, or any task where partial output would be a broken deliverable.
6
+
7
+ ## Banned Patterns (Hard Failures)
8
+
9
+ **Code:** `// ...`, `// rest of code`, `// implement here`, `// TODO`, `/* ... */`, `// similar`, `// continue`, `// same as`, bare `...`.
10
+
11
+ **Prose:** "let me know if you want me to continue", "for brevity", "the rest follows the same pattern", "and so on", "I'll leave that as an exercise".
12
+
13
+ **Structural:** Skeleton instead of full impl. First/last sections skipping middle. One example + description for repeated logic. Describing instead of writing.
14
+
15
+ ## Long Outputs
16
+
17
+ When approaching token limit:
18
+ - Write at full quality to a clean breakpoint (end of function/file/section)
19
+ - Do NOT compress remaining sections
20
+ - End with: `[PAUSED — X of Y complete. Send "continue" to resume from: next section name]`
21
+ - On "continue": pick up exactly where stopped. No recap.
22
+
23
+ ## Quick Check
24
+
25
+ Before finalizing: no banned patterns, every item present and finished, code blocks contain runnable code, nothing shortened.
@@ -0,0 +1,28 @@
1
+ ---
2
+ name: oh-full-output
3
+ description: "Override LLM truncation — enforce complete code generation, ban placeholders."
4
+ tier: 2
5
+ route:
6
+ pass: done
7
+ fail: [surface, oh-expert]
8
+ blocker: surface
9
+ ---
10
+
11
+ # oh-full-output
12
+
13
+ Override truncation. Enforce complete generation. Ban placeholders.
14
+
15
+ ## Steps
16
+
17
+ 1. Scope — Count distinct deliverables. Lock the number.
18
+ 2. Build — Generate every deliverable completely. No partial drafts.
19
+ 3. Cross-check — Re-read request. Compare deliverable count. Add anything missing.
20
+ 4. Verify — No banned patterns, every item present, code blocks are runnable, nothing shortened.
21
+
22
+ ## Routing
23
+
24
+ | Outcome | Route |
25
+ |---------|-------|
26
+ | pass | → done |
27
+ | fail | → surface, oh-expert |
28
+ | blocker | → surface |
@@ -0,0 +1,120 @@
1
+ # oh-fusion — Deep Reference
2
+
3
+ ## When to Use
4
+
5
+ Use when a user wants to bring an external skill or capability into the OpenHermes harness as an OH-native skill.
6
+
7
+ ## Phase 1: Discovery
8
+
9
+ | Source | Access |
10
+ |--------|--------|
11
+ | `.agents/skills/<name>/SKILL.md` | Read file |
12
+ | `npx skills` package | `npx skills find <query>` |
13
+ | URL | Fetch content |
14
+ | User path | Resolve and read |
15
+ | Inline text | Capture raw |
16
+
17
+ Confirm: content loaded, frontmatter present, no access restrictions. For multiple: all loaded.
18
+
19
+ ## Phase 2: Analysis
20
+
21
+ ### Depth Scoring
22
+ | Metric | Assessment |
23
+ |--------|-----------|
24
+ | Lines | SKILL.md length |
25
+ | Concrete rules | "must/never/always/banned" count |
26
+ | Examples | Before/after or usage code blocks |
27
+ | Anti-patterns | Explicit "don't" sections |
28
+ | Workflow steps | Sequential, actionable steps |
29
+ | Routing table | pass/fail/blocker defined |
30
+
31
+ **Score:** High (70-100) = concrete + examples + routing. Medium (30-69) = some structure. Low (0-29) = vague, no rules.
32
+
33
+ ### Overlap Detection
34
+ Compare against existing `harness/skills/oh-*` skills. Overlap: none / partial (complementary) / complete (redundant).
35
+
36
+ ### Convention Check
37
+ - Clear description for triggering? Actionable instructions? Anti-patterns? Examples? Measurable outcomes? No time-sensitive refs? No platform assumptions?
38
+
39
+ ### Report Template
40
+ ```markdown
41
+ **Depth:** <0-100> — <High/Medium/Low>
42
+ **Overlap:** <existing skill> — <none/partial/complete>
43
+ **Verdict:** <keep / fuse / discard / ask>
44
+ **Action:** <port directly / fuse with X / discard>
45
+ ```
46
+
47
+ ## Phase 3: Decision
48
+
49
+ | Verdict | Action |
50
+ |---------|--------|
51
+ | Keep | High signal, no overlap. Port to `oh-<name>`. |
52
+ | Fuse | Partial overlap with existing. Merge complementary DNA. |
53
+ | Discard | Low signal or complete overlap. Surface reasoning. |
54
+ | Ask | Ambiguous quality. Surface findings. |
55
+
56
+ When in doubt: prefer fuse over keep, keep over discard if ANY unique signal.
57
+
58
+ ## Phase 4: Adaptation
59
+
60
+ ### Frontmatter
61
+ ```yaml
62
+ name: oh-<name>
63
+ description: "Adapted from <source>. Core function. Use when ..."
64
+ tier: <2|3|4>
65
+ ```
66
+
67
+ ### Body Structure
68
+ 1. **Summary** — one-paragraph of what the skill does
69
+ 2. **When to Use** — clear triggering context
70
+ 3. **Workflow** — numbered steps (the core)
71
+ 4. **Anti-patterns** — what NOT to do
72
+ 5. **Routing** — pass/fail/blocker table
73
+
74
+ ### Adaptation Rules
75
+ - Remove emojis. Replace ecosystem terms with OH equivalents.
76
+ - Convert relative paths to OH harness conventions.
77
+ - Add routing table based on the skill's purpose.
78
+ - Keep all concrete rules, examples, and anti-patterns from the original.
79
+ - Discard fluff, philosophy, and motivational language.
80
+ - Preserve the original's unique signal — that's why you're importing it.
81
+
82
+ ### Naming
83
+ Match `^[a-z0-9]+(-[a-z0-9]+)*$`, prefix `oh-`. Original name if good fit, adapt if not. Fusion names signal combined purpose.
84
+
85
+ ## Phase 5: Fusion (opt — skip for single imports)
86
+
87
+ For each skill being fused, identify:
88
+ - **Unique concepts** — content only this skill has
89
+ - **Overlapping content** — same idea expressed differently (keep the better version)
90
+ - **Conflicting directives** — skills that say opposite things (surface to user)
91
+
92
+ ### Merge Architecture
93
+ Structure so each source contributes its strength. Do NOT concatenate — must read as one coherent workflow, not three documents glued together.
94
+
95
+ ```markdown
96
+ ## Combined Workflow
97
+ ### Phase A: <from skill 1>
98
+ ### Phase B: <from skill 2>
99
+ ```
100
+
101
+ ### Naming
102
+ Name signals combined purpose, not individual sources. E.g., `oh-facade` from redesign + design-taste + high-end-visual, not `oh-redesign-plus-taste`.
103
+
104
+ ## Phase 6: Integration
105
+
106
+ 1. **Create file** → user dir (`~/.config/opencode/skills/oh-<name>/SKILL.md`)
107
+ 2. **Wire AUTOPILOT** → add to auto-classify matrix in `harness/codex/AUTOPILOT.md`: signal keywords → classification → "Load **oh-<name>**. Do not ask."
108
+ 3. **Wire routing** → add `route:` frontmatter in the skill. Dynamic loading reads `route.pass`, `route.fail`, `route.blocker` directly from `SKILL.md` — no ROUTING.md edit needed. Skill becomes routable automatically.
109
+ 4. **Wire AGENTS.md** → add to skills table with tier and purpose. Increment total count.
110
+ 5. **Wire openhermes.md** → add to orchestrator's skill list in `harness/agents/openhermes.md`.
111
+ 6. **Verify** → route to `oh-skills-link` to confirm OpenCode discovers it.
112
+
113
+ ## Anti-patterns
114
+
115
+ - Importing without analysis (always run Phase 2)
116
+ - Keeping everything — ~50% of external skills is fluff
117
+ - Fusing incompatible domains (confusing to model and user)
118
+ - Naming after source ("oh-tailwind-v2") instead of capability ("oh-styles")
119
+ - Skipping route frontmatter — without it, autopilot can't route
120
+ - Overwriting existing routing without checking for collisions
@@ -0,0 +1,36 @@
1
+ ---
2
+ name: oh-fusion
3
+ description: "Use when the user has an existing skill, finds a skill in their .agents/skills, or wants to bring an external capability into OH as a skill."
4
+ tier: 3
5
+ route:
6
+ pass:
7
+ - oh-skills-link
8
+ - oh-skill-craft
9
+ fail: oh-skill-craft
10
+ blocker: surface
11
+ ---
12
+
13
+ # oh-fusion
14
+
15
+ Skill ingestion pipeline: Discover → Analyze → Decide → Adapt → Fuse → Integrate.
16
+
17
+ ## Steps
18
+
19
+ 1. Load skill content — read from `.agents/skills/`, `npx skills`, URL, user path, or inline text.
20
+ 2. Analyze depth — score by lines, concrete rules, examples, anti-patterns, workflow steps, and routing.
21
+ 3. Detect overlap — compare against existing `oh-*` skills. Report none/partial/complete.
22
+ 4. Decide verdict — Keep (high signal, no overlap), Fuse (partial overlap, merge DNA), Discard (low/no signal), or Ask (ambiguous).
23
+ 5. Adapt to OH-native format — remove emojis, convert paths, add routing, preserve unique signal.
24
+ 6. Fuse if merging — identify unique concepts from each source, resolve conflicts, write one coherent workflow.
25
+ 7. Integrate — create skill file, wire AUTOPILOT, routing, AGENTS.md, openhermes.md.
26
+ 8. Verify — route to oh-skills-link to confirm discovery.
27
+
28
+ ## Routing
29
+
30
+ | Outcome | Route |
31
+ |---------|-------|
32
+ | Integration complete | → oh-skills-link (verify discovery) |
33
+ | Fusion needs iteration | → oh-skill-craft |
34
+ | Analysis: discard | → surface |
35
+ | Analysis: ask | → surface with recs |
36
+ | Blocker | → surface |
@@ -0,0 +1,77 @@
1
+ # oh-gauntlet — Deep Reference
2
+
3
+ ## Stage 1: Test Suite
4
+
5
+ Run all tests. Check they test behavior (not implementation). Flag gaps in edge case coverage. Do NOT add tests — surface as findings.
6
+
7
+ **TDD Iron Law:** `NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST`. If code was written before its test — flag as severe quality gap.
8
+
9
+ **RED-GREEN-REFACTOR verification:** For new code in diff, verify each function has a test, the test was seen to fail before implementation (commit history), minimal code was written to pass each test, and tests use real code (not mocks unless unavoidable).
10
+
11
+ **Rationalization Table:**
12
+
13
+ | Excuse | Reality |
14
+ |--------|---------|
15
+ | "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
16
+ | "I'll test after" | Tests passing immediately prove nothing. |
17
+ | "Already manually tested" | Ad-hoc ≠ systematic. Can't re-run. |
18
+ | "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
19
+ | "TDD will slow me down" | TDD faster than debugging. |
20
+ | "Existing code has no tests" | You're improving it. Add tests for existing code. |
21
+
22
+ **Red Flags** (any = quality gap): Code before test · Test passes immediately · Can't explain why test failed · Rationalizing "just this once" · "Keep as reference" or "adapt existing code" · "Already spent X hours, deleting is wasteful" · "TDD is dogmatic, I'm being pragmatic" · "Tests after achieve the same purpose"
23
+
24
+ **TDD Verification Checklist:**
25
+ - [ ] Every new function has a test
26
+ - [ ] Watched each test fail before implementing (evidence)
27
+ - [ ] Wrote minimal code to pass each test
28
+ - [ ] All tests pass
29
+ - [ ] Output pristine (no errors/warnings)
30
+ - [ ] Tests use real code (mocks only if unavoidable)
31
+ - [ ] Edge cases and errors covered
32
+
33
+ ## Stage 2: Dual-Axis Review (parallel sub-agents)
34
+
35
+ - **Standards** — read documented standards (CONTEXT.md, AGENTS.md, eslint, ADRs). Report every violation. Cite source. Distinguish hard violations from judgment calls.
36
+ - **Spec** — read spec source (plan/issue/PRD). Report missing/partial requirements, scope creep, wrong implementations. Quote the spec.
37
+
38
+ Report independently. Do not merge or rank.
39
+
40
+ ## Stage 3: Edge Case Sweep
41
+
42
+ - Error states — invalid inputs, missing files, network failure
43
+ - Concurrency — races, deadlocks, stale state
44
+ - Security — injection, auth bypass, data leakage
45
+ - Performance — N+1, unbounded loops, leaks
46
+ - State transitions — invalid transitions, partial updates
47
+
48
+ Per finding: severity (critical/major/minor), location, reproduction.
49
+
50
+ ## Stage 4: QA Sweep (tiered)
51
+
52
+ Quick (critical only) / Standard (+ medium) / Exhaustive (+ cosmetic). Execute flows, log findings, fix highest severity first, re-verify after each fix.
53
+
54
+ ## Stage 5: Canary (post-deploy)
55
+
56
+ Capture pre-deploy baselines. Deploy. Navigate key flows. Diff against baselines. Surface anomalies. Suggest rollback if critical.
57
+
58
+ ## Stage 6: Manual Verification
59
+
60
+ - Happy path, error path, no regression, logging covers failures, docs match behavior.
61
+
62
+ ## Loop Protocol
63
+
64
+ 1. Run all stages (skip 5 if not deploying)
65
+ 2. 0 critical + 0 major → DONE
66
+ 3. Criticals/majors exist → fix highest severity, re-run affected stages
67
+ 4. Fix impossible → BLOCKER: `<what> | Options: A, B, C`
68
+
69
+ ## Anti-patterns
70
+
71
+ - Sequential when parallel possible
72
+ - Mixing Standards and Spec findings (keep axes separate)
73
+ - Skipping edge case sweep because tests pass
74
+ - Ignoring minors (accumulated design debt)
75
+ - Pushing critical failures without surfacing
76
+ - Skipping TDD — writing code without a failing test first
77
+ - Tests that pass immediately without having failed first (might test wrong thing)
@@ -1,119 +1,31 @@
1
1
  ---
2
2
  name: oh-gauntlet
3
- description: "Rigorous multi-axis testing gauntlet: unit, integration, edge cases, dual-axis review. Loops until done or blocker."
3
+ description: "Use when code is ready for thorough testing unit tests, integration, edge cases, dual-axis review, and QA. Loops until done or blocker."
4
4
  tier: 4
5
- benefits-from: [oh-expert, oh-builder]
6
- triggers:
7
- - "gauntlet"
8
- - "test everything"
9
- - "rigorous testing"
10
- - "review all angles"
11
- - "qa"
12
- - "full review"
13
- - "run the gauntlet"
14
- - "validate"
5
+ route:
6
+ pass: oh-ship
7
+ fail: oh-builder
8
+ blocker: surface
15
9
  ---
16
10
 
17
11
  # oh-gauntlet
18
12
 
19
- Runs the current build through a multi-axis gauntlet: tests, edge cases, standards review, spec review. Spawns parallel sub-agents for independent axes. Loops until everything passes or a blocker is surfaced.
13
+ Multi-axis testing: test suite, dual-axis review, edge case sweep, QA, canary.
20
14
 
21
- ## Gauntlet Stages
15
+ ## Steps
22
16
 
23
- Each stage runs independently (parallel where possible). A stage that fails loops: fix re-run verify pass or blocker.
24
-
25
- ### Stage 1: Test Suite
26
- Run all existing tests. Check both that they pass and that they actually test the right things:
27
- - **Unit tests**do they pass? Are they testing behavior or implementation?
28
- - **Integration tests**do the real code paths work end-to-end?
29
- - **Edge case coverage**empty states, error states, boundary conditions, concurrency
30
-
31
- If tests are missing or weak, flag what should be added. Do not add them here — surface as finding.
32
-
33
- ### Stage 2: Dual-Axis Review (parallel sub-agents)
34
-
35
- Spawn two sub-agents simultaneously:
36
-
37
- **Standards sub-agent:** Read the repo's documented standards (CONTEXT.md, AGENTS.md, eslint config, ADRs). Then read the diff. Report every place the diff violates a documented standard. Cite the standard source. Distinguish hard violations from judgement calls.
38
-
39
- **Spec sub-agent:** Read the spec source (plan.md, issue, PRD, or user's description). Then read the diff. Report: (a) requirements that are missing or partial, (b) scope creep (behavior not asked for), (c) requirements that look implemented but wrong. Quote the spec.
40
-
41
- Report both axes independently — do not merge or rank. A change can pass one and fail the other.
42
-
43
- ### Stage 3: Edge Case Sweep
44
- Systematic edge case analysis for the changed code:
45
- - Error states — what happens when inputs are invalid, files are missing, network fails?
46
- - Concurrency — race conditions, deadlocks, stale state
47
- - Security — injection, auth bypass, data leakage, permission escalation
48
- - Performance — N+1 queries, unbounded loops, memory leaks, unnecessary allocations
49
- - State transitions — invalid state transitions, partial updates, rollback gaps
50
-
51
- For each finding: severity (critical/major/minor), location, reproduction path.
52
-
53
- ### Stage 4: QA Sweep (tiered)
54
- Systematic testing with iterative fix-verify cycles. Choose tier based on risk:
55
-
56
- - **Quick** — critical/high severity flows only
57
- - **Standard** — critical + medium severity, full edge case sweep
58
- - **Exhaustive** — all of the above + cosmetic, edge cases, cross-browser
59
-
60
- 1. Execute tests against each user flow, edge case, error state
61
- 2. Log findings with severity, reproduction steps, evidence
62
- 3. Fix highest-severity first, commit each fix atomically
63
- 4. Re-verify after each fix — confirm fix, check for regressions
64
- 5. Produce health scores (before/after)
65
-
66
- ### Stage 5: Canary (post-deploy)
67
- If deploying to production:
68
-
69
- 1. **Set baseline** — capture pre-deploy screenshots and metrics
70
- 2. **Deploy check** — verify deploy completed successfully
71
- 3. **Canary run** — navigate key user flows, capture screenshots, log console errors
72
- 4. **Compare** — diff against pre-deploy baselines
73
- 5. **Alert** — surface anomalies, performance regressions, new errors
74
- 6. **Recovery** — if critical issues found, suggest rollback
75
-
76
- Output: health status, screenshots (before/after), error log, performance diff, ship/no-go verdict.
77
-
78
- ### Stage 6: Manual Verification Checklist
79
- Based on the plan's verification criteria or spec:
80
- - [ ] Happy path works end-to-end
81
- - [ ] Error path degrades gracefully
82
- - [ ] No regression in adjacent areas
83
- - [ ] Logging/monitoring covers failure modes
84
- - [ ] Documentation matches behavior (if applicable)
85
-
86
- ## Loop Protocol
87
-
88
- 1. Run all 6 stages (skip Stage 5 if not deploying)
89
- 2. Collect findings by severity
90
- 3. If 0 criticals and 0 majors → DONE
91
- 4. If criticals or majors exist → fix highest severity first
92
- 5. After fix → re-run affected stages only
93
- 6. If fix is impossible within scope → surface BLOCKER
94
-
95
- ## Blocker Protocol
96
-
97
- ```
98
- BLOCKER: <what failed>
99
- Context: <what was attempted, why it cannot proceed>
100
- Options:
101
- A: <scope reduction>
102
- B: <alternative approach>
103
- C: <dependency change>
104
- ```
105
-
106
- ## Anti-patterns
107
- - Running stages sequentially when they can be parallel (Standards and Spec reviews are independent)
108
- - Mixing Standards and Spec findings (keep axes separate — one can pass while the other fails)
109
- - Skipping edge case sweep because tests pass (tests confirm behavior, not absence of edge cases)
110
- - Ignoring minors because no criticals exist (accumulated minors signal design debt)
111
- - Pushing through critical failures without surfacing blocker
17
+ 1. Run all tests verify TDD Iron Law (no production code without failing test first). Flag gaps in edge case coverage.
18
+ 2. Run dual-axis review — spawn parallel Standards and Spec sub-agents. Report independently. Do not merge or rank.
19
+ 3. Sweep edge cases — error states, concurrency, security, performance, state transitions. Assign severity (critical/major/minor).
20
+ 4. Run QA sweep tiered (quick/standard/exhaustive). Fix highest severity first. Re-verify after each fix.
21
+ 5. Deploy canary if applicable capture pre-deploy baselines, navigate flows, diff anomalies, suggest rollback if critical.
22
+ 6. Run manual verification happy path, error path, no regression, logging covers failures, docs match behavior.
23
+ 7. Apply loop protocol0 critical + 0 major → done. Fix highest severity, re-run affected stages. Surface blocker with options.
112
24
 
113
25
  ## Routing
114
26
 
115
27
  | Outcome | Route |
116
28
  |---------|-------|
117
- | pass | → oh-ship (all checks pass) |
118
- | fail | → oh-builder (fix issues found) |
119
- | blocker | → surface to user |
29
+ | pass | → oh-ship |
30
+ | fail | → oh-builder (fix issues) |
31
+ | blocker | → surface |
@@ -0,0 +1,51 @@
1
+ # oh-grill — Deep Reference
2
+
3
+ ## When to Use
4
+
5
+ Before committing to a plan. "Writing exactly what I asked for and it's still wrong" = design concept not shared. Cheaper in conversation than in code.
6
+
7
+ **Example:** User shares a plan. You respond with: "Have you considered the failure mode where X happens?" — then walk through the grill modes.
8
+
9
+ ## When NOT to Use
10
+
11
+ - Clear vetted plan needing execution
12
+ - User needs builder, not critic
13
+ - Trivial decisions
14
+
15
+ ## Modes / Workflow
16
+
17
+ ### Mode A: Grill (quick)
18
+
19
+ 1. Read plan/design doc
20
+ 2. Interview one decision at a time — each answer reveals new branches
21
+ 3. Resolve each branch before moving on
22
+ 4. Surface: contradictions, blind spots, unstated assumptions, ambiguous terms
23
+ 5. Propose recommended answer per decision
24
+ 6. Output: verified plan with flagged ambiguities
25
+
26
+ ### Mode B: Grill with Docs (thorough)
27
+
28
+ Same + persists to CONTEXT.md, ADRs, and DDD ubiquitous-language glossary.
29
+
30
+ 1. Load CONTEXT.md + ADRs
31
+ 2. Grill decision tree — each resolution may: update CONTEXT.md terms, create ADR, flag glossary ambiguity
32
+ 3. **Ubiquitous Language extraction** — scan for domain nouns/verbs/concepts. Identify: same word different concepts, different words same concept, vague terms. Propose canonical glossary with grouped tables. Write example dialogue (3-5 exchanges). Flag ambiguities.
33
+ 4. Persist CONTEXT.md changes immediately as language firms
34
+ 5. Output: updated CONTEXT.md + ADRs + UBIQUITOUS_LANGUAGE.md (if significant) + verified plan
35
+
36
+ ## Technique
37
+
38
+ - One question at a time
39
+ - Propose recommended answer per decision
40
+ - Walk full decision tree before accepting
41
+ - Reference CONTEXT.md glossary for ambiguous terms
42
+ - Cross-reference ADRs for architecture decisions
43
+
44
+ ## Anti-patterns
45
+
46
+ - Grilling for sake of grilling
47
+ - Questions you could answer by reading plan/codebase
48
+ - ADRs for trivial decisions
49
+ - Polishing CONTEXT.md before concepts settled
50
+ - Updating terms mid-discussion (let conversation resolve)
51
+ - Not distinguishing "must resolve now" vs "figure out later"
@@ -1,77 +1,30 @@
1
1
  ---
2
2
  name: oh-grill
3
- description: "Stress-test plans and designs through relentless Socratic questioning. Sharpens assumptions, flags blind spots, updates domain docs."
3
+ description: "Stress-tests plans through Socratic questioning to surface assumptions and blind spots"
4
4
  tier: 3
5
- benefits-from: [oh-expert, oh-planner]
6
- triggers:
7
- - "grill"
8
- - "stress test this plan"
9
- - "challenge this"
10
- - "grill me"
11
- - "poke holes"
12
- - "interrogate"
5
+ route:
6
+ pass: oh-planner
7
+ fail: oh-expert
8
+ blocker: surface
13
9
  ---
14
10
 
15
11
  # oh-grill
16
12
 
17
- Stress-tests plans and designs through relentless Socratic questioning. Two modes: plain interrogate (quick) or interrogate + update domain docs (thorough).
13
+ Stress-tests plans through relentless Socratic questioning. Two modes.
18
14
 
19
- ## When to Use
20
- Before committing to a plan or design. When the user says "it is writing exactly what I asked for and it is still wrong" — the design concept is not shared yet. Cheaper to resolve in conversation than in code.
15
+ ## Steps
21
16
 
22
- ## Modes
23
-
24
- ### Mode A: Grill (quick)
25
- Challenge the plan without touching files.
26
-
27
- 1. Read the plan or design doc
28
- 2. Interview the user one decision at a time — each answer reveals new branches
29
- 3. Resolve each branch before moving on
30
- 4. Surface: contradictions, blind spots, unstated assumptions, ambiguous terms
31
- 5. Propose a recommended answer for each decision
32
- 6. Output: verified, stress-tested plan with flagged ambiguities
33
-
34
- ### Mode B: Grill with Docs (thorough)
35
- Same as Mode A, but persists decisions to CONTEXT.md and ADRs, and extracts a DDD ubiquitous-language glossary.
36
-
37
- 1. Load existing CONTEXT.md and ADRs
38
- 2. Grill through the decision tree — each resolved decision may:
39
- - Update CONTEXT.md domain terms (sharpen fuzzy language)
40
- - Create a new ADR for architectural decisions
41
- - Flag an ambiguity in the domain glossary
42
- 3. **Ubiquitous Language extraction** — after the decision tree resolves, scan the conversation for domain-relevant nouns, verbs, and concepts:
43
- - Identify problems: same word for different concepts (ambiguity), different words for same concept (synonyms), vague or overloaded terms
44
- - Propose a canonical glossary with grouped tables (by subdomain, lifecycle, or actor)
45
- - Write an example dialogue (3-5 exchanges) between dev and domain expert showing natural term usage
46
- - Write flagged ambiguities section
47
- 4. Persist changes to CONTEXT.md immediately as language firms up
48
- 5. Output: updated CONTEXT.md + new ADRs + UBIQUITOUS_LANGUAGE.md (if significant terms emerged) + verified plan with resolution trail
49
-
50
- ## Technique
51
-
52
- - Ask one question at a time
53
- - Propose a recommended answer for each decision
54
- - Walk the full decision tree before accepting the design
55
- - Reference domain glossary from CONTEXT.md when terms are ambiguous
56
- - Cross-reference with existing ADRs when architecture is at stake
57
-
58
- ## When NOT to Use
59
- - When you already have a clear, vetted plan and need execution
60
- - When the user needs a builder, not a critic
61
- - For trivial decisions that don't change the design's shape
62
-
63
- ## Anti-patterns
64
- - Grilling for the sake of grilling (redundant with existing reviews)
65
- - Asking questions you could answer by reading the plan or codebase
66
- - Creating ADRs for trivial decisions (not every choice is architecture)
67
- - Polishing CONTEXT.md prose before concepts are settled
68
- - Updating domain terms mid-discussion — let the conversation resolve first
69
- - Not distinguishing between "must resolve now" vs "figure out later"
17
+ 1. Read the plan or design document fully
18
+ 2. Interview decisions one at a time — each answer reveals new branches
19
+ 3. Resolve each branch before moving to the next decision
20
+ 4. Surface contradictions, blind spots, unstated assumptions, ambiguous terms
21
+ 5. Propose recommended answer per decision
22
+ 6. Output verified plan with flagged ambiguities
70
23
 
71
24
  ## Routing
72
25
 
73
26
  | Outcome | Route |
74
27
  |---------|-------|
75
- | pass | → oh-planner (revise plan based on feedback) |
76
- | fail | → oh-expert (resolve confusion or blind spot) |
77
- | blocker | → surface to user |
28
+ | pass | → oh-planner |
29
+ | fail | → oh-expert |
30
+ | blocker | → surface |
@@ -0,0 +1,19 @@
1
+ # oh-guard — Deep Reference
2
+
3
+ ## When to Use
4
+
5
+ When user says "be careful," "don't break anything," or context involves production, destructive changes, or irreversible actions.
6
+
7
+ ## What Triggers a Guard Check
8
+
9
+ - File deletion or rename
10
+ - Git destructive operations (reset, rebase, force push)
11
+ - Dependency changes (add/remove/upgrade)
12
+ - Configuration changes affecting auth, network, data
13
+ - Any command with `--force`, `-f`, `--hard`, or destructive flags
14
+
15
+ ## Anti-patterns
16
+
17
+ - Guarding trivial operations (wastes time)
18
+ - Forgetting to ask on genuinely destructive operations
19
+ - Asking for confirmation on reads or analysis steps