npm - openhermes - Versions diffs - 4.1.0 → 4.9.2 - Mend

openhermes 4.1.0 → 4.9.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (99) hide show

package/CONTEXT.md +9 -0
package/ETHOS.md +6 -3
package/LICENSE +21 -21
package/README.md +120 -79
package/bootstrap.ts +284 -41
package/harness/agents/oh-browser.md +97 -0
package/harness/agents/oh-builder.md +78 -0
package/harness/agents/oh-facade.md +75 -0
package/harness/agents/oh-fusion.md +45 -0
package/harness/agents/oh-gauntlet.md +71 -0
package/harness/agents/oh-grill.md +71 -0
package/harness/agents/oh-investigate.md +60 -0
package/harness/agents/oh-manifest.md +95 -0
package/harness/agents/oh-plan-review.md +40 -0
package/harness/agents/oh-planner.md +50 -0
package/harness/agents/oh-refactor.md +37 -0
package/harness/agents/oh-retro.md +46 -0
package/harness/agents/oh-review.md +85 -0
package/harness/agents/oh-security.md +83 -0
package/harness/agents/oh-ship.md +76 -0
package/harness/agents/oh-skill-craft.md +38 -0
package/harness/agents/openhermes.md +106 -62
package/harness/codex/AUTOPILOT.md +178 -0
package/harness/codex/CHARTER.md +81 -0
package/harness/commands/oh-doctor.md +193 -14
package/harness/commands/oh-log.md +18 -0
package/harness/instructions/SHELL.md +76 -0
package/harness/skills/oh-ascii/DEEP.md +292 -0
package/harness/skills/oh-ascii/SKILL.md +31 -0
package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
package/harness/skills/oh-browser/DEEP.md +54 -0
package/harness/skills/oh-browser/SKILL.md +30 -0
package/harness/skills/oh-builder/DEEP.md +63 -0
package/harness/skills/oh-builder/SKILL.md +16 -89
package/harness/skills/oh-expert/DEEP.md +85 -0
package/harness/skills/oh-expert/SKILL.md +19 -106
package/harness/skills/oh-facade/DEEP.md +182 -0
package/harness/skills/oh-facade/SKILL.md +34 -0
package/harness/skills/oh-freeze/DEEP.md +18 -0
package/harness/skills/oh-freeze/SKILL.md +15 -15
package/harness/skills/oh-full-output/DEEP.md +25 -0
package/harness/skills/oh-full-output/SKILL.md +28 -0
package/harness/skills/oh-fusion/DEEP.md +120 -0
package/harness/skills/oh-fusion/SKILL.md +36 -0
package/harness/skills/oh-gauntlet/DEEP.md +77 -0
package/harness/skills/oh-gauntlet/SKILL.md +17 -105
package/harness/skills/oh-grill/DEEP.md +51 -0
package/harness/skills/oh-grill/SKILL.md +16 -63
package/harness/skills/oh-guard/DEEP.md +19 -0
package/harness/skills/oh-guard/SKILL.md +15 -20
package/harness/skills/oh-handoff/DEEP.md +48 -0
package/harness/skills/oh-handoff/SKILL.md +18 -19
package/harness/skills/oh-health/DEEP.md +74 -0
package/harness/skills/oh-health/SKILL.md +17 -76
package/harness/skills/oh-init/DEEP.md +85 -0
package/harness/skills/oh-init/SKILL.md +17 -197
package/harness/skills/oh-investigate/DEEP.md +171 -0
package/harness/skills/oh-investigate/SKILL.md +18 -61
package/harness/skills/oh-issue/DEEP.md +21 -0
package/harness/skills/oh-issue/SKILL.md +16 -23
package/harness/skills/oh-learn/DEEP.md +44 -0
package/harness/skills/oh-learn/SKILL.md +17 -79
package/harness/skills/oh-manifest/DEEP.md +92 -0
package/harness/skills/oh-manifest/SKILL.md +15 -107
package/harness/skills/oh-plan-review/DEEP.md +90 -0
package/harness/skills/oh-plan-review/SKILL.md +19 -114
package/harness/skills/oh-planner/DEEP.md +172 -0
package/harness/skills/oh-planner/SKILL.md +16 -143
package/harness/skills/oh-prd/DEEP.md +45 -0
package/harness/skills/oh-prd/SKILL.md +15 -22
package/harness/skills/oh-refactor/DEEP.md +122 -0
package/harness/skills/oh-refactor/SKILL.md +33 -0
package/harness/skills/oh-retro/DEEP.md +26 -0
package/harness/skills/oh-retro/SKILL.md +17 -20
package/harness/skills/oh-review/DEEP.md +87 -0
package/harness/skills/oh-review/SKILL.md +17 -96
package/harness/skills/oh-security/DEEP.md +83 -0
package/harness/skills/oh-security/SKILL.md +18 -96
package/harness/skills/oh-ship/DEEP.md +141 -0
package/harness/skills/oh-ship/SKILL.md +18 -26
package/harness/skills/oh-skill-craft/DEEP.md +369 -0
package/harness/skills/oh-skill-craft/SKILL.md +20 -93
package/harness/skills/oh-skills-link/DEEP.md +16 -0
package/harness/skills/oh-skills-link/SKILL.md +15 -16
package/harness/skills/oh-skills-list/DEEP.md +20 -0
package/harness/skills/oh-skills-list/SKILL.md +14 -18
package/harness/skills/oh-triage/DEEP.md +23 -0
package/harness/skills/oh-triage/SKILL.md +15 -20
package/harness/skills/oh-worktree/DEEP.md +169 -0
package/harness/skills/oh-worktree/SKILL.md +32 -0
package/lib/harness-resolver.ts +10 -12
package/package.json +9 -4
package/scripts/count-tokens.mjs +158 -0
package/scripts/oh-doctor.ps1 +342 -0
package/harness/codex/CONSTITUTION.md +0 -70
package/harness/codex/ROUTING.md +0 -127
package/harness/instructions/RUNTIME.md +0 -55
package/harness/skills/oh-caveman/SKILL.md +0 -33
package/lib/logger.ts +0 -69

package/harness/skills/oh-freeze/DEEP.md ADDED Viewed

@@ -0,0 +1,18 @@
+# oh-freeze — Deep Reference
+## When to Use
+When the user says "don't touch anything outside [path]" or "only edit files in [dir]." Prevents accidental edits to configuration, infrastructure, or unrelated modules.
+## Mode
+- Allowed paths: only the specified directory and its children
+- If you need to edit outside: state the reason and ask
+- Read operations unrestricted anywhere
+- File creation restricted to allowed paths
+## Anti-patterns
+- Editing outside allowed path without asking
+- Reading unrelated files for context and using that to justify edits
+- "Just this one file outside" — ask first

package/harness/skills/oh-freeze/SKILL.md CHANGED Viewed

@@ -1,28 +1,28 @@
 ---
 name: oh-freeze
-description: "Restrict file edits to a specific directory for the session"
+description: "Restricts file edits to a specific directory for the session."
+tier: 2
+route:
+  pass: mode
+  fail: mode
+  blocker: surface
 ---
 # oh-freeze
-## When to Use
-When debugging a specific module and you want to prevent accidentally "fixing" unrelated code. Scopes all Edit/Write operations to one directory.
+Restrict write operations to one allowed directory.
-## Workflow
-1. Specify target directory to freeze
-2. All Edit/Write operations outside that directory are blocked
-3. User can explicitly approve cross-boundary edits
-4. Unfreeze to release the boundary
+## Steps
-## Anti-patterns
-- Freezing too broadly (defeats the purpose)
-- Forgetting to unfreeze when task scope expands
-- Using freeze as a substitute for git discipline
+1. Identify the allowed directory from user instructions
+2. Restrict all write/edit/create operations to the allowed path
+3. Allow read operations unrestricted anywhere
+4. If a write outside the allowed path is required, state the reason and ask before proceeding
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → [return to prior skill — scope lock active] |
-| fail | → [surface issue — freeze not applied] |
-| blocker | → surface to user |
+| pass | → mode |
+| fail | → mode |
+| blocker | → surface |

package/harness/skills/oh-full-output/DEEP.md ADDED Viewed

@@ -0,0 +1,25 @@
+# oh-full-output — Deep Reference
+## When to Use
+When generating long files, multi-component output, or any task where partial output would be a broken deliverable.
+## Banned Patterns (Hard Failures)
+**Code:** `// ...`, `// rest of code`, `// implement here`, `// TODO`, `/* ... */`, `// similar`, `// continue`, `// same as`, bare `...`.
+**Prose:** "let me know if you want me to continue", "for brevity", "the rest follows the same pattern", "and so on", "I'll leave that as an exercise".
+**Structural:** Skeleton instead of full impl. First/last sections skipping middle. One example + description for repeated logic. Describing instead of writing.
+## Long Outputs
+When approaching token limit:
+- Write at full quality to a clean breakpoint (end of function/file/section)
+- Do NOT compress remaining sections
+- End with: `[PAUSED — X of Y complete. Send "continue" to resume from: next section name]`
+- On "continue": pick up exactly where stopped. No recap.
+## Quick Check
+Before finalizing: no banned patterns, every item present and finished, code blocks contain runnable code, nothing shortened.

package/harness/skills/oh-full-output/SKILL.md ADDED Viewed

@@ -0,0 +1,28 @@
+---
+name: oh-full-output
+description: "Override LLM truncation — enforce complete code generation, ban placeholders."
+tier: 2
+route:
+  pass: done
+  fail: [surface, oh-expert]
+  blocker: surface
+---
+# oh-full-output
+Override truncation. Enforce complete generation. Ban placeholders.
+## Steps
+1. Scope — Count distinct deliverables. Lock the number.
+2. Build — Generate every deliverable completely. No partial drafts.
+3. Cross-check — Re-read request. Compare deliverable count. Add anything missing.
+4. Verify — No banned patterns, every item present, code blocks are runnable, nothing shortened.
+## Routing
+| Outcome | Route |
+|---------|-------|
+| pass | → done |
+| fail | → surface, oh-expert |
+| blocker | → surface |

package/harness/skills/oh-fusion/DEEP.md ADDED Viewed

@@ -0,0 +1,120 @@
+# oh-fusion — Deep Reference
+## When to Use
+Use when a user wants to bring an external skill or capability into the OpenHermes harness as an OH-native skill.
+## Phase 1: Discovery
+| Source | Access |
+|--------|--------|
+| `.agents/skills/<name>/SKILL.md` | Read file |
+| `npx skills` package | `npx skills find <query>` |
+| URL | Fetch content |
+| User path | Resolve and read |
+| Inline text | Capture raw |
+Confirm: content loaded, frontmatter present, no access restrictions. For multiple: all loaded.
+## Phase 2: Analysis
+### Depth Scoring
+| Metric | Assessment |
+|--------|-----------|
+| Lines | SKILL.md length |
+| Concrete rules | "must/never/always/banned" count |
+| Examples | Before/after or usage code blocks |
+| Anti-patterns | Explicit "don't" sections |
+| Workflow steps | Sequential, actionable steps |
+| Routing table | pass/fail/blocker defined |
+**Score:** High (70-100) = concrete + examples + routing. Medium (30-69) = some structure. Low (0-29) = vague, no rules.
+### Overlap Detection
+Compare against existing `harness/skills/oh-*` skills. Overlap: none / partial (complementary) / complete (redundant).
+### Convention Check
+- Clear description for triggering? Actionable instructions? Anti-patterns? Examples? Measurable outcomes? No time-sensitive refs? No platform assumptions?
+### Report Template
+```markdown
+**Depth:** <0-100> — <High/Medium/Low>
+**Overlap:** <existing skill> — <none/partial/complete>
+**Verdict:** <keep / fuse / discard / ask>
+**Action:** <port directly / fuse with X / discard>
+```
+## Phase 3: Decision
+| Verdict | Action |
+|---------|--------|
+| Keep | High signal, no overlap. Port to `oh-<name>`. |
+| Fuse | Partial overlap with existing. Merge complementary DNA. |
+| Discard | Low signal or complete overlap. Surface reasoning. |
+| Ask | Ambiguous quality. Surface findings. |
+When in doubt: prefer fuse over keep, keep over discard if ANY unique signal.
+## Phase 4: Adaptation
+### Frontmatter
+```yaml
+name: oh-<name>
+description: "Adapted from <source>. Core function. Use when ..."
+tier: <2|3|4>
+```
+### Body Structure
+1. **Summary** — one-paragraph of what the skill does
+2. **When to Use** — clear triggering context
+3. **Workflow** — numbered steps (the core)
+4. **Anti-patterns** — what NOT to do
+5. **Routing** — pass/fail/blocker table
+### Adaptation Rules
+- Remove emojis. Replace ecosystem terms with OH equivalents.
+- Convert relative paths to OH harness conventions.
+- Add routing table based on the skill's purpose.
+- Keep all concrete rules, examples, and anti-patterns from the original.
+- Discard fluff, philosophy, and motivational language.
+- Preserve the original's unique signal — that's why you're importing it.
+### Naming
+Match `^[a-z0-9]+(-[a-z0-9]+)*$`, prefix `oh-`. Original name if good fit, adapt if not. Fusion names signal combined purpose.
+## Phase 5: Fusion (opt — skip for single imports)
+For each skill being fused, identify:
+- **Unique concepts** — content only this skill has
+- **Overlapping content** — same idea expressed differently (keep the better version)
+- **Conflicting directives** — skills that say opposite things (surface to user)
+### Merge Architecture
+Structure so each source contributes its strength. Do NOT concatenate — must read as one coherent workflow, not three documents glued together.
+```markdown
+## Combined Workflow
+### Phase A: <from skill 1>
+### Phase B: <from skill 2>
+```
+### Naming
+Name signals combined purpose, not individual sources. E.g., `oh-facade` from redesign + design-taste + high-end-visual, not `oh-redesign-plus-taste`.
+## Phase 6: Integration
+1. **Create file** → user dir (`~/.config/opencode/skills/oh-<name>/SKILL.md`)
+2. **Wire AUTOPILOT** → add to auto-classify matrix in `harness/codex/AUTOPILOT.md`: signal keywords → classification → "Load **oh-<name>**. Do not ask."
+3. **Wire routing** → add `route:` frontmatter in the skill. Dynamic loading reads `route.pass`, `route.fail`, `route.blocker` directly from `SKILL.md` — no ROUTING.md edit needed. Skill becomes routable automatically.
+4. **Wire AGENTS.md** → add to skills table with tier and purpose. Increment total count.
+5. **Wire openhermes.md** → add to orchestrator's skill list in `harness/agents/openhermes.md`.
+6. **Verify** → route to `oh-skills-link` to confirm OpenCode discovers it.
+## Anti-patterns
+- Importing without analysis (always run Phase 2)
+- Keeping everything — ~50% of external skills is fluff
+- Fusing incompatible domains (confusing to model and user)
+- Naming after source ("oh-tailwind-v2") instead of capability ("oh-styles")
+- Skipping route frontmatter — without it, autopilot can't route
+- Overwriting existing routing without checking for collisions

package/harness/skills/oh-fusion/SKILL.md ADDED Viewed

@@ -0,0 +1,36 @@
+---
+name: oh-fusion
+description: "Use when the user has an existing skill, finds a skill in their .agents/skills, or wants to bring an external capability into OH as a skill."
+tier: 3
+route:
+  pass:
+    - oh-skills-link
+    - oh-skill-craft
+  fail: oh-skill-craft
+  blocker: surface
+---
+# oh-fusion
+Skill ingestion pipeline: Discover → Analyze → Decide → Adapt → Fuse → Integrate.
+## Steps
+1. Load skill content — read from `.agents/skills/`, `npx skills`, URL, user path, or inline text.
+2. Analyze depth — score by lines, concrete rules, examples, anti-patterns, workflow steps, and routing.
+3. Detect overlap — compare against existing `oh-*` skills. Report none/partial/complete.
+4. Decide verdict — Keep (high signal, no overlap), Fuse (partial overlap, merge DNA), Discard (low/no signal), or Ask (ambiguous).
+5. Adapt to OH-native format — remove emojis, convert paths, add routing, preserve unique signal.
+6. Fuse if merging — identify unique concepts from each source, resolve conflicts, write one coherent workflow.
+7. Integrate — create skill file, wire AUTOPILOT, routing, AGENTS.md, openhermes.md.
+8. Verify — route to oh-skills-link to confirm discovery.
+## Routing
+| Outcome | Route |
+|---------|-------|
+| Integration complete | → oh-skills-link (verify discovery) |
+| Fusion needs iteration | → oh-skill-craft |
+| Analysis: discard | → surface |
+| Analysis: ask | → surface with recs |
+| Blocker | → surface |

package/harness/skills/oh-gauntlet/DEEP.md ADDED Viewed

@@ -0,0 +1,77 @@
+# oh-gauntlet — Deep Reference
+## Stage 1: Test Suite
+Run all tests. Check they test behavior (not implementation). Flag gaps in edge case coverage. Do NOT add tests — surface as findings.
+**TDD Iron Law:** `NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST`. If code was written before its test — flag as severe quality gap.
+**RED-GREEN-REFACTOR verification:** For new code in diff, verify each function has a test, the test was seen to fail before implementation (commit history), minimal code was written to pass each test, and tests use real code (not mocks unless unavoidable).
+**Rationalization Table:**
+| Excuse | Reality |
+|--------|---------|
+| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
+| "I'll test after" | Tests passing immediately prove nothing. |
+| "Already manually tested" | Ad-hoc ≠ systematic. Can't re-run. |
+| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
+| "TDD will slow me down" | TDD faster than debugging. |
+| "Existing code has no tests" | You're improving it. Add tests for existing code. |
+**Red Flags** (any = quality gap): Code before test · Test passes immediately · Can't explain why test failed · Rationalizing "just this once" · "Keep as reference" or "adapt existing code" · "Already spent X hours, deleting is wasteful" · "TDD is dogmatic, I'm being pragmatic" · "Tests after achieve the same purpose"
+**TDD Verification Checklist:**
+- [ ] Every new function has a test
+- [ ] Watched each test fail before implementing (evidence)
+- [ ] Wrote minimal code to pass each test
+- [ ] All tests pass
+- [ ] Output pristine (no errors/warnings)
+- [ ] Tests use real code (mocks only if unavoidable)
+- [ ] Edge cases and errors covered
+## Stage 2: Dual-Axis Review (parallel sub-agents)
+- **Standards** — read documented standards (CONTEXT.md, AGENTS.md, eslint, ADRs). Report every violation. Cite source. Distinguish hard violations from judgment calls.
+- **Spec** — read spec source (plan/issue/PRD). Report missing/partial requirements, scope creep, wrong implementations. Quote the spec.
+Report independently. Do not merge or rank.
+## Stage 3: Edge Case Sweep
+- Error states — invalid inputs, missing files, network failure
+- Concurrency — races, deadlocks, stale state
+- Security — injection, auth bypass, data leakage
+- Performance — N+1, unbounded loops, leaks
+- State transitions — invalid transitions, partial updates
+Per finding: severity (critical/major/minor), location, reproduction.
+## Stage 4: QA Sweep (tiered)
+Quick (critical only) / Standard (+ medium) / Exhaustive (+ cosmetic). Execute flows, log findings, fix highest severity first, re-verify after each fix.
+## Stage 5: Canary (post-deploy)
+Capture pre-deploy baselines. Deploy. Navigate key flows. Diff against baselines. Surface anomalies. Suggest rollback if critical.
+## Stage 6: Manual Verification
+- Happy path, error path, no regression, logging covers failures, docs match behavior.
+## Loop Protocol
+1. Run all stages (skip 5 if not deploying)
+2. 0 critical + 0 major → DONE
+3. Criticals/majors exist → fix highest severity, re-run affected stages
+4. Fix impossible → BLOCKER: `<what> | Options: A, B, C`
+## Anti-patterns
+- Sequential when parallel possible
+- Mixing Standards and Spec findings (keep axes separate)
+- Skipping edge case sweep because tests pass
+- Ignoring minors (accumulated design debt)
+- Pushing critical failures without surfacing
+- Skipping TDD — writing code without a failing test first
+- Tests that pass immediately without having failed first (might test wrong thing)

package/harness/skills/oh-gauntlet/SKILL.md CHANGED Viewed

@@ -1,119 +1,31 @@
 ---
 name: oh-gauntlet
-description: "Rigorous multi-axis testing gauntlet: unit, integration, edge cases, dual-axis review. Loops until done or blocker."
+description: "Use when code is ready for thorough testing — unit tests, integration, edge cases, dual-axis review, and QA. Loops until done or blocker."
 tier: 4
-benefits-from: [oh-expert, oh-builder]
-triggers:
-  - "gauntlet"
-  - "test everything"
-  - "rigorous testing"
-  - "review all angles"
-  - "qa"
-  - "full review"
-  - "run the gauntlet"
-  - "validate"
+route:
+  pass: oh-ship
+  fail: oh-builder
+  blocker: surface
 ---
 # oh-gauntlet
-Runs the current build through a multi-axis gauntlet: tests, edge cases, standards review, spec review. Spawns parallel sub-agents for independent axes. Loops until everything passes or a blocker is surfaced.
+Multi-axis testing: test suite, dual-axis review, edge case sweep, QA, canary.
-## Gauntlet Stages
+## Steps
-Each stage runs independently (parallel where possible). A stage that fails loops: fix → re-run → verify → pass or blocker.
-### Stage 1: Test Suite
-Run all existing tests. Check both that they pass and that they actually test the right things:
-- **Unit tests** — do they pass? Are they testing behavior or implementation?
-- **Integration tests** — do the real code paths work end-to-end?
-- **Edge case coverage** — empty states, error states, boundary conditions, concurrency
-If tests are missing or weak, flag what should be added. Do not add them here — surface as finding.
-### Stage 2: Dual-Axis Review (parallel sub-agents)
-Spawn two sub-agents simultaneously:
-**Standards sub-agent:** Read the repo's documented standards (CONTEXT.md, AGENTS.md, eslint config, ADRs). Then read the diff. Report every place the diff violates a documented standard. Cite the standard source. Distinguish hard violations from judgement calls.
-**Spec sub-agent:** Read the spec source (plan.md, issue, PRD, or user's description). Then read the diff. Report: (a) requirements that are missing or partial, (b) scope creep (behavior not asked for), (c) requirements that look implemented but wrong. Quote the spec.
-Report both axes independently — do not merge or rank. A change can pass one and fail the other.
-### Stage 3: Edge Case Sweep
-Systematic edge case analysis for the changed code:
-- Error states — what happens when inputs are invalid, files are missing, network fails?
-- Concurrency — race conditions, deadlocks, stale state
-- Security — injection, auth bypass, data leakage, permission escalation
-- Performance — N+1 queries, unbounded loops, memory leaks, unnecessary allocations
-- State transitions — invalid state transitions, partial updates, rollback gaps
-For each finding: severity (critical/major/minor), location, reproduction path.
-### Stage 4: QA Sweep (tiered)
-Systematic testing with iterative fix-verify cycles. Choose tier based on risk:
-- **Quick** — critical/high severity flows only
-- **Standard** — critical + medium severity, full edge case sweep
-- **Exhaustive** — all of the above + cosmetic, edge cases, cross-browser
-1. Execute tests against each user flow, edge case, error state
-2. Log findings with severity, reproduction steps, evidence
-3. Fix highest-severity first, commit each fix atomically
-4. Re-verify after each fix — confirm fix, check for regressions
-5. Produce health scores (before/after)
-### Stage 5: Canary (post-deploy)
-If deploying to production:
-1. **Set baseline** — capture pre-deploy screenshots and metrics
-2. **Deploy check** — verify deploy completed successfully
-3. **Canary run** — navigate key user flows, capture screenshots, log console errors
-4. **Compare** — diff against pre-deploy baselines
-5. **Alert** — surface anomalies, performance regressions, new errors
-6. **Recovery** — if critical issues found, suggest rollback
-Output: health status, screenshots (before/after), error log, performance diff, ship/no-go verdict.
-### Stage 6: Manual Verification Checklist
-Based on the plan's verification criteria or spec:
-- [ ] Happy path works end-to-end
-- [ ] Error path degrades gracefully
-- [ ] No regression in adjacent areas
-- [ ] Logging/monitoring covers failure modes
-- [ ] Documentation matches behavior (if applicable)
-## Loop Protocol
-1. Run all 6 stages (skip Stage 5 if not deploying)
-2. Collect findings by severity
-3. If 0 criticals and 0 majors → DONE
-4. If criticals or majors exist → fix highest severity first
-5. After fix → re-run affected stages only
-6. If fix is impossible within scope → surface BLOCKER
-## Blocker Protocol
-```
-BLOCKER: <what failed>
-Context: <what was attempted, why it cannot proceed>
-Options:
-  A: <scope reduction>
-  B: <alternative approach>
-  C: <dependency change>
-```
-## Anti-patterns
-- Running stages sequentially when they can be parallel (Standards and Spec reviews are independent)
-- Mixing Standards and Spec findings (keep axes separate — one can pass while the other fails)
-- Skipping edge case sweep because tests pass (tests confirm behavior, not absence of edge cases)
-- Ignoring minors because no criticals exist (accumulated minors signal design debt)
-- Pushing through critical failures without surfacing blocker
+1. Run all tests — verify TDD Iron Law (no production code without failing test first). Flag gaps in edge case coverage.
+2. Run dual-axis review — spawn parallel Standards and Spec sub-agents. Report independently. Do not merge or rank.
+3. Sweep edge cases — error states, concurrency, security, performance, state transitions. Assign severity (critical/major/minor).
+4. Run QA sweep — tiered (quick/standard/exhaustive). Fix highest severity first. Re-verify after each fix.
+5. Deploy canary if applicable — capture pre-deploy baselines, navigate flows, diff anomalies, suggest rollback if critical.
+6. Run manual verification — happy path, error path, no regression, logging covers failures, docs match behavior.
+7. Apply loop protocol — 0 critical + 0 major → done. Fix highest severity, re-run affected stages. Surface blocker with options.
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → oh-ship (all checks pass) |
-| fail | → oh-builder (fix issues found) |
-| blocker | → surface to user |
+| pass | → oh-ship |
+| fail | → oh-builder (fix issues) |
+| blocker | → surface |

package/harness/skills/oh-grill/DEEP.md ADDED Viewed

@@ -0,0 +1,51 @@
+# oh-grill — Deep Reference
+## When to Use
+Before committing to a plan. "Writing exactly what I asked for and it's still wrong" = design concept not shared. Cheaper in conversation than in code.
+**Example:** User shares a plan. You respond with: "Have you considered the failure mode where X happens?" — then walk through the grill modes.
+## When NOT to Use
+- Clear vetted plan needing execution
+- User needs builder, not critic
+- Trivial decisions
+## Modes / Workflow
+### Mode A: Grill (quick)
+1. Read plan/design doc
+2. Interview one decision at a time — each answer reveals new branches
+3. Resolve each branch before moving on
+4. Surface: contradictions, blind spots, unstated assumptions, ambiguous terms
+5. Propose recommended answer per decision
+6. Output: verified plan with flagged ambiguities
+### Mode B: Grill with Docs (thorough)
+Same + persists to CONTEXT.md, ADRs, and DDD ubiquitous-language glossary.
+1. Load CONTEXT.md + ADRs
+2. Grill decision tree — each resolution may: update CONTEXT.md terms, create ADR, flag glossary ambiguity
+3. **Ubiquitous Language extraction** — scan for domain nouns/verbs/concepts. Identify: same word different concepts, different words same concept, vague terms. Propose canonical glossary with grouped tables. Write example dialogue (3-5 exchanges). Flag ambiguities.
+4. Persist CONTEXT.md changes immediately as language firms
+5. Output: updated CONTEXT.md + ADRs + UBIQUITOUS_LANGUAGE.md (if significant) + verified plan
+## Technique
+- One question at a time
+- Propose recommended answer per decision
+- Walk full decision tree before accepting
+- Reference CONTEXT.md glossary for ambiguous terms
+- Cross-reference ADRs for architecture decisions
+## Anti-patterns
+- Grilling for sake of grilling
+- Questions you could answer by reading plan/codebase
+- ADRs for trivial decisions
+- Polishing CONTEXT.md before concepts settled
+- Updating terms mid-discussion (let conversation resolve)
+- Not distinguishing "must resolve now" vs "figure out later"

package/harness/skills/oh-grill/SKILL.md CHANGED Viewed

@@ -1,77 +1,30 @@
 ---
 name: oh-grill
-description: "Stress-test plans and designs through relentless Socratic questioning. Sharpens assumptions, flags blind spots, updates domain docs."
+description: "Stress-tests plans through Socratic questioning to surface assumptions and blind spots"
 tier: 3
-benefits-from: [oh-expert, oh-planner]
-triggers:
-  - "grill"
-  - "stress test this plan"
-  - "challenge this"
-  - "grill me"
-  - "poke holes"
-  - "interrogate"
+route:
+  pass: oh-planner
+  fail: oh-expert
+  blocker: surface
 ---
 # oh-grill
-Stress-tests plans and designs through relentless Socratic questioning. Two modes: plain interrogate (quick) or interrogate + update domain docs (thorough).
+Stress-tests plans through relentless Socratic questioning. Two modes.
-## When to Use
-Before committing to a plan or design. When the user says "it is writing exactly what I asked for and it is still wrong" — the design concept is not shared yet. Cheaper to resolve in conversation than in code.
+## Steps
-## Modes
-### Mode A: Grill (quick)
-Challenge the plan without touching files.
-1. Read the plan or design doc
-2. Interview the user one decision at a time — each answer reveals new branches
-3. Resolve each branch before moving on
-4. Surface: contradictions, blind spots, unstated assumptions, ambiguous terms
-5. Propose a recommended answer for each decision
-6. Output: verified, stress-tested plan with flagged ambiguities
-### Mode B: Grill with Docs (thorough)
-Same as Mode A, but persists decisions to CONTEXT.md and ADRs, and extracts a DDD ubiquitous-language glossary.
-1. Load existing CONTEXT.md and ADRs
-2. Grill through the decision tree — each resolved decision may:
-   - Update CONTEXT.md domain terms (sharpen fuzzy language)
-   - Create a new ADR for architectural decisions
-   - Flag an ambiguity in the domain glossary
-3. **Ubiquitous Language extraction** — after the decision tree resolves, scan the conversation for domain-relevant nouns, verbs, and concepts:
-   - Identify problems: same word for different concepts (ambiguity), different words for same concept (synonyms), vague or overloaded terms
-   - Propose a canonical glossary with grouped tables (by subdomain, lifecycle, or actor)
-   - Write an example dialogue (3-5 exchanges) between dev and domain expert showing natural term usage
-   - Write flagged ambiguities section
-4. Persist changes to CONTEXT.md immediately as language firms up
-5. Output: updated CONTEXT.md + new ADRs + UBIQUITOUS_LANGUAGE.md (if significant terms emerged) + verified plan with resolution trail
-## Technique
-- Ask one question at a time
-- Propose a recommended answer for each decision
-- Walk the full decision tree before accepting the design
-- Reference domain glossary from CONTEXT.md when terms are ambiguous
-- Cross-reference with existing ADRs when architecture is at stake
-## When NOT to Use
-- When you already have a clear, vetted plan and need execution
-- When the user needs a builder, not a critic
-- For trivial decisions that don't change the design's shape
-## Anti-patterns
-- Grilling for the sake of grilling (redundant with existing reviews)
-- Asking questions you could answer by reading the plan or codebase
-- Creating ADRs for trivial decisions (not every choice is architecture)
-- Polishing CONTEXT.md prose before concepts are settled
-- Updating domain terms mid-discussion — let the conversation resolve first
-- Not distinguishing between "must resolve now" vs "figure out later"
+1. Read the plan or design document fully
+2. Interview decisions one at a time — each answer reveals new branches
+3. Resolve each branch before moving to the next decision
+4. Surface contradictions, blind spots, unstated assumptions, ambiguous terms
+5. Propose recommended answer per decision
+6. Output verified plan with flagged ambiguities
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → oh-planner (revise plan based on feedback) |
-| fail | → oh-expert (resolve confusion or blind spot) |
-| blocker | → surface to user |
+| pass | → oh-planner |
+| fail | → oh-expert |
+| blocker | → surface |

package/harness/skills/oh-guard/DEEP.md ADDED Viewed

@@ -0,0 +1,19 @@
+# oh-guard — Deep Reference
+## When to Use
+When user says "be careful," "don't break anything," or context involves production, destructive changes, or irreversible actions.
+## What Triggers a Guard Check
+- File deletion or rename
+- Git destructive operations (reset, rebase, force push)
+- Dependency changes (add/remove/upgrade)
+- Configuration changes affecting auth, network, data
+- Any command with `--force`, `-f`, `--hard`, or destructive flags
+## Anti-patterns
+- Guarding trivial operations (wastes time)
+- Forgetting to ask on genuinely destructive operations
+- Asking for confirmation on reads or analysis steps