npm - openhermes - Versions diffs - 4.3.0 → 4.11.2 - Mend

openhermes 4.3.0 → 4.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (143)

package/CONTEXT.md +10 -1
package/README.md +54 -42
package/bootstrap.ts +396 -142
package/harness/agents/oh-browser.md +97 -0
package/harness/agents/oh-builder.md +78 -0
package/harness/agents/oh-facade.md +75 -0
package/harness/agents/oh-fusion.md +45 -0
package/harness/agents/oh-gauntlet.md +71 -0
package/harness/agents/oh-grill.md +71 -0
package/harness/agents/oh-investigate.md +60 -0
package/harness/agents/oh-manifest.md +95 -0
package/harness/agents/oh-plan-review.md +40 -0
package/harness/agents/oh-planner.md +50 -0
package/harness/agents/oh-refactor.md +37 -0
package/harness/agents/oh-retro.md +46 -0
package/harness/agents/oh-review.md +85 -0
package/harness/agents/oh-security.md +83 -0
package/harness/agents/oh-ship.md +76 -0
package/harness/agents/oh-skill-craft.md +38 -0
package/harness/agents/openhermes.md +28 -73
package/harness/codex/AUTOPILOT.md +235 -87
package/harness/codex/CHARTER.md +80 -0
package/harness/instructions/SHELL.md +76 -0
package/harness/lib/background/background.test.ts +197 -0
package/harness/lib/background/index.ts +7 -0
package/harness/lib/background/interfaces.ts +31 -0
package/harness/lib/background/manager.ts +320 -0
package/harness/lib/composer/compose.test.ts +168 -0
package/harness/lib/composer/compose.ts +65 -0
package/harness/lib/composer/fragments/01-identity.md +1 -0
package/harness/lib/composer/fragments/02-delegation.md +6 -0
package/harness/lib/composer/fragments/03-permissions.md +13 -0
package/harness/lib/composer/fragments/04-task-flow.md +15 -0
package/harness/lib/composer/fragments/05-confidence.md +5 -0
package/harness/lib/composer/fragments/06-parallelization.md +17 -0
package/harness/lib/composer/fragments/07-shell.md +41 -0
package/harness/lib/composer/fragments/08-routing.md +8 -0
package/harness/lib/composer/fragments/09-guardrails.md +12 -0
package/harness/lib/composer/index.ts +1 -0
package/harness/lib/hooks/builtins/confidence-gate-hook.ts +70 -0
package/harness/lib/hooks/builtins/delegation-depth-hook.ts +59 -0
package/harness/lib/hooks/builtins/error-recovery-hook.ts +107 -0
package/harness/lib/hooks/builtins/memory-sync-hook.ts +73 -0
package/harness/lib/hooks/builtins/plan-check-hook.ts +43 -0
package/harness/lib/hooks/builtins/route-tracking-hook.ts +147 -0
package/harness/lib/hooks/builtins/sanity-check-hook.ts +52 -0
package/harness/lib/hooks/builtins/shell-detect-hook.ts +96 -0
package/harness/lib/hooks/hooks.test.ts +1016 -0
package/harness/lib/hooks/index.ts +30 -0
package/harness/lib/hooks/registry.ts +416 -0
package/harness/lib/hooks/types.ts +71 -0
package/harness/lib/memory/index.ts +18 -0
package/harness/lib/memory/interfaces.ts +53 -0
package/harness/lib/memory/memory-manager.ts +205 -0
package/harness/lib/memory/memory.test.ts +491 -0
package/harness/lib/memory/plan-store.ts +366 -0
package/harness/lib/recovery/handler.ts +243 -0
package/harness/lib/recovery/index.ts +14 -0
package/harness/lib/recovery/interfaces.ts +48 -0
package/harness/lib/recovery/patterns.ts +149 -0
package/harness/lib/recovery/recovery.test.ts +312 -0
package/harness/lib/sanity/anomaly-tracker.ts +127 -0
package/harness/lib/sanity/checker.ts +178 -0
package/harness/lib/sanity/index.ts +13 -0
package/harness/lib/sanity/interfaces.ts +24 -0
package/harness/lib/sanity/sanity.test.ts +472 -0
package/harness/lib/sync/file-watcher.ts +174 -0
package/harness/lib/sync/index.ts +11 -0
package/harness/lib/sync/interfaces.ts +27 -0
package/harness/lib/sync/plan-sync.ts +536 -0
package/harness/lib/sync/sync.test.ts +832 -0
package/harness/skills/oh-ascii/DEEP.md +292 -0
package/harness/skills/oh-ascii/SKILL.md +31 -0
package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
package/harness/skills/oh-browser/DEEP.md +54 -0
package/harness/skills/oh-browser/SKILL.md +30 -0
package/harness/skills/oh-builder/DEEP.md +63 -0
package/harness/skills/oh-builder/SKILL.md +12 -90
package/harness/skills/oh-expert/DEEP.md +85 -0
package/harness/skills/oh-expert/SKILL.md +13 -106
package/harness/skills/oh-facade/DEEP.md +182 -0
package/harness/skills/oh-facade/SKILL.md +15 -279
package/harness/skills/oh-freeze/DEEP.md +18 -0
package/harness/skills/oh-freeze/SKILL.md +10 -19
package/harness/skills/oh-full-output/DEEP.md +25 -0
package/harness/skills/oh-full-output/SKILL.md +12 -65
package/harness/skills/oh-fusion/DEEP.md +120 -0
package/harness/skills/oh-fusion/SKILL.md +17 -295
package/harness/skills/oh-gauntlet/DEEP.md +77 -0
package/harness/skills/oh-gauntlet/SKILL.md +13 -105
package/harness/skills/oh-grill/DEEP.md +51 -0
package/harness/skills/oh-grill/SKILL.md +12 -63
package/harness/skills/oh-guard/DEEP.md +19 -0
package/harness/skills/oh-guard/SKILL.md +10 -24
package/harness/skills/oh-handoff/DEEP.md +48 -0
package/harness/skills/oh-handoff/SKILL.md +13 -23
package/harness/skills/oh-health/DEEP.md +74 -0
package/harness/skills/oh-health/SKILL.md +13 -76
package/harness/skills/oh-init/DEEP.md +85 -0
package/harness/skills/oh-init/SKILL.md +13 -127
package/harness/skills/oh-investigate/DEEP.md +171 -0
package/harness/skills/oh-investigate/SKILL.md +13 -66
package/harness/skills/oh-issue/DEEP.md +21 -0
package/harness/skills/oh-issue/SKILL.md +11 -27
package/harness/skills/oh-manifest/DEEP.md +92 -0
package/harness/skills/oh-manifest/SKILL.md +12 -109
package/harness/skills/oh-plan-review/DEEP.md +90 -0
package/harness/skills/oh-plan-review/SKILL.md +13 -115
package/harness/skills/oh-planner/DEEP.md +172 -0
package/harness/skills/oh-planner/SKILL.md +12 -149
package/harness/skills/oh-prd/DEEP.md +45 -0
package/harness/skills/oh-prd/SKILL.md +10 -26
package/harness/skills/oh-refactor/DEEP.md +122 -0
package/harness/skills/oh-refactor/SKILL.md +17 -410
package/harness/skills/oh-retro/DEEP.md +26 -0
package/harness/skills/oh-retro/SKILL.md +12 -24
package/harness/skills/oh-review/DEEP.md +87 -0
package/harness/skills/oh-review/SKILL.md +11 -97
package/harness/skills/oh-security/DEEP.md +83 -0
package/harness/skills/oh-security/SKILL.md +14 -96
package/harness/skills/oh-ship/DEEP.md +141 -0
package/harness/skills/oh-ship/SKILL.md +14 -32
package/harness/skills/oh-skill-craft/DEEP.md +369 -0
package/harness/skills/oh-skill-craft/SKILL.md +13 -177
package/harness/skills/oh-skills-link/DEEP.md +16 -0
package/harness/skills/oh-skills-link/SKILL.md +10 -20
package/harness/skills/oh-skills-list/DEEP.md +20 -0
package/harness/skills/oh-skills-list/SKILL.md +9 -22
package/harness/skills/oh-triage/DEEP.md +23 -0
package/harness/skills/oh-triage/SKILL.md +8 -24
package/harness/skills/oh-worktree/DEEP.md +169 -0
package/harness/skills/oh-worktree/SKILL.md +32 -0
package/lib/harness-resolver.ts +8 -10
package/package.json +7 -5
package/tsconfig.json +1 -1
package/harness/codex/CONSTITUTION.md +0 -73
package/harness/codex/ROUTING.md +0 -92
package/harness/commands/oh-doctor.md +0 -26
package/harness/commands/oh-log.md +0 -18
package/harness/instructions/RUNTIME.md +0 -30
package/harness/skills/oh-caveman/SKILL.md +0 -42
package/harness/skills/oh-learn/SKILL.md +0 -101
package/lib/logger.ts +0 -75

package/harness/skills/oh-init/SKILL.md CHANGED

@@ -1,14 +1,7 @@
 ---
 name: oh-init
-description: "Initialize project for OpenHermes: wire AGENTS.md, configure domain docs, issue tracker, and triage labels. Does NOT create .opencode/ directory."
+description: "Sets up AGENTS.md, domain docs, issue tracker, and triage labels"
 tier: 2
-triggers:
-  - "init this project for oh"
-  - "setup project for openhermes"
-  - "initialize openhermes setup"
-  - "onboard this project"
-  - "scaffold project setup"
-  - "oh takeover this project"
 route:
   pass: done
   fail: oh-init
@@ -17,129 +10,22 @@ route:
 # oh-init
-Per-repo setup for OpenHermes-assisted development. Run once per repo. Wires AGENTS.md, configures domain docs, issue tracker, and triage labels. Does NOT create a `.opencode/` directory — plan files go to `~/.local/share/opencode/openhermes/plans/`.
+Wire AGENTS.md, domain docs, issue tracker, and triage labels for a new OpenHermes project.
-Complements OpenCode's built-in `/init` command (which creates `AGENTS.md` with project build/test/architecture notes). Run oh-init after or instead — they serve different layers.
+## Steps
-## Process
-### Phase 0: Check Existing State
-Before writing anything, detect what already exists:
-- ☐ `AGENTS.md` exists? (If yes, was it created by OpenCode `/init` or manually?)
-- ☐ `opencode.json` / `opencode.jsonc` present?
-- ☐ Canonical plan files (`~/.local/share/opencode/openhermes/plans/<project-name>-plan-*.md`)?
-- ☐ `CONTEXT.md` exists?
-- ☐ `docs/agents/` directory exists?
-Report findings. If everything exists, offer to skip or verify and exit.
-### Phase 1: AGENTS.md Wiring
-Check if AGENTS.md exists:
-**If AGENTS.md does not exist:**
-Create it with OpenHermes orchestrator header + prompts for project info:
-```markdown
-# <project-name>
-OpenHermes is the primary orchestrator. All routing, planning, and delegation flows through oh-* skills.
-## Project Context
-- **Language**: <fill in>
-- **Package manager**: <fill in>
-- **Build command**: <fill in>
-- **Test command**: <fill in>
-- **Lint/type check**: <fill in>
-## Key Directives
-- Plan first. Write to `~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md` before multi-file changes.
-- **OpenHermes never executes tasks directly. It talks/reports to the user and delegates everything to sub-agents.**
-- Verify before claiming success. Read files, run commands, confirm output.
-- Never write code, run tests, or edit files in the main context — always delegate.
-- Use oh-* skills on demand. Load via OpenCode's skill tool when relevant.
-- Plan file is self-contained (Tasks, Completed, Work Log sections).
-```
-Then ask the user to fill in the Project Context fields. Offer to auto-detect from package manifests.
-**If AGENTS.md exists** (e.g., created by OpenCode `/init`):
-Append an `## OpenHermes Orchestrator` section to the end:
-```markdown
-## OpenHermes Orchestrator
-OpenHermes is the primary orchestrator for this session.
-- **Orchestrator**: OpenHermes — hub-and-spoke routing through oh-* skills
-- **Plan**: `~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md` — always check before starting work
-- **Never execute**: OpenHermes talks/reports to the user and delegates everything to sub-agents
-- **Verify before claim**: read files, run commands, confirm output
-```
-### Phase 2: Issue Tracker
-Detect the git hosting platform:
-- **GitHub** — `gh` CLI
-- **GitLab** — `glab` CLI
-- **Local markdown** — files under `.scratch/<feature>/`
-- **Other** — freeform workflow description
-Confirm with the user. Write the result to `docs/agents/issue-tracker.md`.
-### Phase 3: Triage Labels
-The `triage` skill uses these label strings to move issues through a state machine:
-- `needs-triage` — maintainer needs to evaluate
-- `needs-info` — waiting on reporter
-- `ready-for-agent` — fully specified, AFK-ready
-- `ready-for-human` — needs human implementation
-- `wontfix` — will not be actioned
-If the repo already has different label names, map them. Write to `docs/agents/triage-labels.md`.
-### Phase 4: Domain Docs
-Configure how the project organizes domain language:
-- **Single-context** — one `CONTEXT.md` + `docs/adr/` at repo root
-- **Multi-context** — `CONTEXT-MAP.md` pointing to per-context files
-Scaffold `CONTEXT.md` with project name, domain description, and placeholder glossary terms. Create `docs/adr/` directory with ADR template.
-Write to `docs/agents/domain.md`.
-### Phase 5: Agent Skills Block
-Add a `## Agent skills` section to `AGENTS.md` (or `CLAUDE.md` if it exists):
-```markdown
-## Agent skills
-### Issue tracker
-<summary>. See docs/agents/issue-tracker.md.
-### Triage labels
-<summary>. See docs/agents/triage-labels.md.
-### Domain docs
-<summary>. See docs/agents/domain.md.
-```
-### Phase 6: Decision Record
-Record: "oh-init completed for project <name> on <date>."
-## Anti-patterns
-- Running init without understanding the project domain
-- Scaffolding CONTEXT.md without populating any terms
-- Creating ADR directory but never writing ADRs
-- Creating both AGENTS.md and CLAUDE.md — edit the one that exists
-- Overwriting an existing AGENTS.md created by OpenCode `/init` (append instead)
-- Creating `.opencode/` directory — plan files go to OpenCode's canonical storage, not a hidden project dir
-- Empty instinct file never getting populated (run oh-learn extract periodically)
+1. Check existing state (AGENTS.md, opencode.json, plan files, CONTEXT.md, docs/agents/)
+2. Create AGENTS.md with OH orchestrator header or append OH section to existing
+3. Detect issue tracker platform, confirm with user, write to docs/agents/issue-tracker.md
+4. Define triage labels (needs-triage, needs-info, ready-for-agent, ready-for-human, wontfix), write to docs/agents/triage-labels.md
+5. Scaffold CONTEXT.md with project name, domain, glossary placeholders; create docs/adr/ with ADR template; write to docs/agents/domain.md
+6. Append Agent skills block referencing tracker, triage, and domain docs
+7. Record decision artifact
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → [done — one-time project setup] |
-| fail | → [retry with user corrections] |
-| blocker | → surface to user |
+| pass | → done |
+| fail | → oh-init |
+| blocker | → surface |
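
Step 1 of the new oh-init steps (check existing state) can be sketched as a pure function. This is a minimal illustration, not code from the package: `checkExistingState`, `InitState`, and `agentsMdAction` are hypothetical names, and the file-existence check is injected so the sketch makes no assumption about how openhermes reads the filesystem. Plan files are left out because they live outside the repository.

```typescript
// Hypothetical sketch: none of these names come from the openhermes package.
type Exists = (path: string) => boolean;

interface InitState {
  agentsMd: boolean;
  opencodeConfig: boolean;
  contextMd: boolean;
  agentsDocs: boolean;
}

// Probe the repo-relative paths named in step 1.
function checkExistingState(exists: Exists): InitState {
  return {
    agentsMd: exists("AGENTS.md"),
    // The removed Phase 0 text also accepted opencode.jsonc.
    opencodeConfig: exists("opencode.json") || exists("opencode.jsonc"),
    contextMd: exists("CONTEXT.md"),
    agentsDocs: exists("docs/agents"),
  };
}

// Step 2 branches on whether AGENTS.md already exists:
// append the OH section to an existing file, otherwise create it.
function agentsMdAction(state: InitState): "create" | "append" {
  return state.agentsMd ? "append" : "create";
}
```

In a Node.js script the predicate could be `existsSync` from `node:fs`; a test can pass a lookup over a fixed set of paths instead.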

package/harness/skills/oh-investigate/DEEP.md ADDED

@@ -0,0 +1,171 @@
+# oh-investigate — Deep Reference
+## The Iron Law
+> **NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. Surface fixes compound into technical debt.**
+If you haven't completed root cause investigation, you cannot propose fixes. Violating this process is violating the spirit of debugging.
+## Phase 0 — Build a feedback loop
+**This is the actual skill. Everything else is mechanical.**
+A fast, deterministic, agent-runnable pass/fail signal = you find the cause. Without one, staring at code won't save you.
+### Ways to construct a loop (try in order)
+1. Failing test at the bug's seam
+2. Curl/HTTP script against dev server
+3. CLI invocation + fixture, diff stdout
+4. Headless browser — assert on DOM/console/network
+5. Replay captured trace in isolation
+6. Throwaway harness — minimal subset exercising the bug path
+7. Property/fuzz loop — 1000 random inputs
+8. Bisection harness — `git bisect run`-able
+9. Differential loop — old vs new version output diff
+10. HITL script — drive human with structured loop
+**Sharpen the loop:** Faster? Sharper signal (specific symptom, not "didn't crash")? More deterministic (pin time, seed RNG, isolate FS)? A 2s deterministic loop is a superpower.
+**Non-deterministic:** Goal = higher reproduction rate. Loop 100×, parallelize, add stress. 50% flake is debuggable; 1% is not.
+**Cannot build a loop?** Stop. Say so. List what you tried. Do NOT hypothesise without a loop.
+## Workflow
+Complete each phase before proceeding. Each phase consumes the feedback loop built in Phase 0.
+### Phase 1 — Root Cause Investigation
+**Before attempting ANY fix:**
+1. **Reproduce** — Loop confirms the described failure. Exact steps? Every time?
+2. **Read Error Messages** — Read stack traces completely. Note line numbers and error codes.
+3. **Check Recent Changes** — Git diff, recent commits, new dependencies, env differences.
+4. **Minimise** — Strip unrelated code. Remove noise until only the failure path remains.
+5. **Gather Evidence** — One probe per hypothesis. Change one variable. Use unique debug prefixes.
+6. **Trace Data Flow** — If error is deep in call stack, trace backward.
+### Phase 2 — Pattern Analysis
+**Find the pattern before fixing:**
+1. **Find Working Examples** — Locate similar working code. What works that's analogous?
+2. **Compare Against References** — Read reference implementation completely. Don't skim.
+3. **Identify Differences** — List every difference between working and broken. Don't dismiss anything.
+4. **Understand Dependencies** — What components, config, or environment does this depend on?
+### Phase 3 — Hypothesis & Testing
+**Scientific method:**
+1. **Form Single Hypothesis** — "I think X is root cause because Y." Be specific, not vague.
+2. **Test Minimally** — Smallest change to test hypothesis. One variable. Don't fix multiple things.
+3. **Verify** — Prediction held? → Phase 4. No → new hypothesis. DON'T stack more fixes.
+### Phase 4 — Implementation
+**Fix root cause, not symptom:**
+1. **Create Failing Test** — Simplest reproduction. Automated if possible. Must fail before fix.
+2. **Implement Single Fix** — Address root cause. ONE change. No "while I'm here" improvements.
+3. **Verify Fix** — Failing test passes? No other tests broken? Phase 0 loop confirms resolution?
+4. **Regression Test** — Verify existing behavior. No regression seam = architecture gap (flag it).
+5. **Document** — Log root cause + fix. State which hypothesis was correct. What was the trigger?
+## Root Cause Tracing
+Bugs manifest deep in call stacks. Fixing at the symptom treats the wrong layer. **Trace backward through the call chain to find the original trigger.**
+1. **Observe symptom** — Error at point of failure.
+2. **Find immediate cause** — What code directly produces this error?
+3. **What called this?** — Step one level up the call chain.
+4. **Keep tracing up** — What value was passed? Where from?
+5. **Find original trigger** — Root source of bad state. Fix here, not at symptom.
+**Stack trace instrumentation:**
+```
+const stack = new Error().stack;
+console.error('DEBUG <component>:', { directory, cwd, stack });
+```
+Use `console.error()` (logger may be suppressed in tests). Grep output. **Never fix just where the error appears** — trace back and add validation at each layer.
+## Multi-Component Diagnostics
+**In multi-component systems (CI → build → signing, API → service → database), add instrumentation at each boundary BEFORE proposing fixes:**
+- Log data entering and exiting each component
+- Verify environment/config propagation across layers
+- Check state at each layer
+Run once to gather evidence, identify the failing component, THEN investigate it.
+**Example (build pipeline):** Layer 1 (workflow → secrets?), Layer 2 (build → env vars?), Layer 3 (signing → keychain?), Layer 4 (actual signing). Reveals which layer fails in one pass.
+## Red Flags
+**If you catch yourself thinking any of these, STOP. Return to Phase 1.**
+- "Quick fix for now, investigate later"
+- "Just try changing X and see if it works"
+- "Add multiple changes, run tests"
+- "Skip the test, I'll manually verify"
+- "It's probably X, let me fix that"
+- "I don't fully understand but this might work"
+- "Pattern says X but I'll adapt it differently"
+- Proposing solutions before tracing data flow
+- "One more fix attempt" (when already tried 2+)
+- Each fix reveals new problem in different place
+- "Here are the main problems: [lists fixes without investigation]"
+- "Issue is simple, don't need the full process"
+**All of these mean: STOP. Return to investigation.**
+## 3+ Fix Failure Rule
+**After 3 failed fix attempts, STOP and question the architecture.**
+Three failed fixes signals an architectural problem:
+- Each fix reveals new shared state/coupling in a different place
+- Fixes require "massive refactoring" to implement
+- Each fix creates new symptoms elsewhere
+**Do not attempt Fix #4 without architectural discussion:**
+- Is this pattern fundamentally sound?
+- Are we sticking with it through inertia?
+- Should we refactor architecture vs. continue fixing symptoms?
+This is NOT a failed hypothesis — this is a wrong architecture.
+## Partner Signal Monitoring
+**When your human partner says these, they mean you're guessing, not debugging:**
+| Phrase | Meaning |
+|--------|---------|
+| "Is that happening?" | You assumed without verifying |
+| "Will it show us...?" | You skipped evidence gathering |
+| "Stop guessing" | You're proposing fixes without understanding |
+| "We're stuck?" | Your approach isn't working |
+**When you see any of these: STOP. Return to Phase 1.**
+## Common Rationalizations
+| Excuse | Reality |
+|--------|---------|
+| "Issue is simple, don't need process" | Simple bugs have root causes too. Process is fast. |
+| "Emergency, no time for process" | Systematic is FASTER than guess-and-check thrashing. |
+| "Just try this first, then investigate" | First fix sets the pattern. Do it right from start. |
+| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves bug. |
+| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
+| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read fully. |
+| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
+| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Stop fixing. |
+## Anti-patterns
+- Fixing symptoms (same bug reappears)
+- Changing code without reproducing
+- Shotgun debugging (multiple changes hoping one sticks)
+- Not documenting root cause
+- Hypothesizing without a feedback loop
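
The Phase 0 advice for non-deterministic bugs (loop the trigger many times and aim for a higher reproduction rate) can be sketched as below. This is an illustrative sketch under stated assumptions: `measureReproRate` and `flakyTrigger` are hypothetical names that do not appear in the package, and the "bug" is simulated by a counter rather than a real race.

```typescript
// Hypothetical helper; names are illustrative, not from openhermes.
interface ReproResult {
  runs: number;
  failures: number;
  rate: number; // failures / runs, in [0, 1]
}

// Run `trigger` `runs` times; each thrown error counts as one reproduction.
function measureReproRate(trigger: () => void, runs: number): ReproResult {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      trigger();
    } catch {
      failures++;
    }
  }
  return { runs, failures, rate: runs === 0 ? 0 : failures / runs };
}

// Stand-in for a flaky code path: fails on every 4th call.
let calls = 0;
function flakyTrigger(): void {
  calls++;
  if (calls % 4 === 0) throw new Error("simulated race");
}

const result = measureReproRate(flakyTrigger, 100);
console.log(`reproduced ${result.failures}/${result.runs}`); // reproduced 25/100
```

Re-running the same measurement after adding stress or narrowing a timing window shows whether the change raised the rate, which is the signal the loop is meant to provide.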

package/harness/skills/oh-investigate/SKILL.md CHANGED

@@ -1,12 +1,7 @@
 ---
 name: oh-investigate
-description: "Systematic bug diagnosis with root cause investigation"
+description: "Use when debugging any bug, test failure, or unexpected behavior. Finds root cause systematically before attempting fixes."
 tier: 2
-triggers:
-  - "investigate this bug"
-  - "debug this"
-  - "why is this broken"
-  - "root cause"
 route:
   pass: oh-builder
   fail: oh-expert
@@ -15,70 +10,22 @@ route:
 # oh-investigate
-## When to Use
-When a bug is reported, a test fails, or unexpected behavior occurs. Use this before attempting any fix.
+Systematic bug diagnosis: build a feedback loop, trace root cause, fix with evidence.
-## Phase 0 — Build a feedback loop
+## Steps
-**This is the actual skill. Everything else is mechanical.**
-If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation are just consuming that signal. If you don't have one, no amount of staring at code will save you.
-Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**
-### Ways to construct a feedback loop (try in this order)
-1. **Failing test** at whatever seam reaches the bug.
-2. **Curl / HTTP script** against a running dev server.
-3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
-4. **Headless browser script** — drive the UI, assert on DOM/console/network.
-5. **Replay a captured trace** — save a real payload/event log, replay it in isolation.
-6. **Throwaway harness** — minimal subset of the system exercising the bug code path with a single call.
-7. **Property / fuzz loop** — run 1000 random inputs, look for the failure mode.
-8. **Bisection harness** — automate "boot at state X, check, repeat" so you can `git bisect run` it.
-9. **Differential loop** — run same input through old-version vs new-version, diff outputs.
-10. **HITL script** — last resort. Drive a human with a structured loop.
-### Iterate on the loop itself
-- Can I make it faster? (Cache setup, skip unrelated init, narrow the scope.)
-- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
-- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem.)
-A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
-### Non-deterministic bugs
-The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows. A 50%-flake bug is debuggable; 1% is not.
-### When you genuinely cannot build a loop
-Stop and say so explicitly. List what you tried. Do **not** proceed to hypothesise without a loop.
-## Workflow (consumes the loop)
-1. **Reproduce** — run the loop, confirm the bug appears. The loop must match the user's described failure, not a different nearby failure.
-2. **Minimise** — strip away unrelated code until the minimal reproduction remains.
-3. **Hypothesise** — generate 3–5 ranked falsifiable hypotheses before testing any. Each must state a prediction: "If X is the cause, then changing Y will make the bug disappear".
-4. **Instrument** — one probe per hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup.
-5. **Fix** — write the regression test at a correct seam first. Watch it fail. Apply the smallest correct change. Watch it pass. Re-run the Phase 0 loop against the original scenario.
-6. **Regression test** — verify fix doesn't break existing behavior. If no correct seam exists for a regression test, that itself is a finding — flag the architecture gap.
-7. **Document** — log the root cause and fix in the handoff, issue, or relevant docs. State which hypothesis was correct so the next debugger learns.
-## Iron Law
-No fixes without root cause. Surface-level fixes compound into technical debt.
-## Anti-patterns
-- Fixing symptoms instead of causes (the same bug reappears next week)
-- Changing code without reproducing the bug first
-- "Shotgun" debugging — changing multiple things hoping one sticks
-- Not documenting root cause for future reference
-- Proceeding to hypothesise without a feedback loop
+1. Build a feedback loop — failing test, curl, CLI, headless browser, or throwaway harness. Must be fast, deterministic, and agent-runnable.
+2. Apply the Iron Law — NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. Surface fixes compound into technical debt.
+3. Reproduce and trace — confirm failure, read stack traces, check recent changes, minimise to failure path, gather evidence one probe per hypothesis.
+4. Trace backward — from symptom through call chain to original trigger. Instrument boundaries with debug logging.
+5. Find working pattern — locate similar working code, compare against references, list every difference.
+6. Form single hypothesis — "I think X is root cause because Y." Test with minimal change. One variable.
+7. Implement fix — create failing test first, apply one fix, verify resolution, regression test, document root cause.
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → oh-builder (implement the fix) |
-| fail | → oh-expert (deepen diagnosis) |
-| blocker | → surface to user |
+| pass | → oh-builder (fix) |
+| fail | → oh-expert (deepen) |
+| blocker | → surface |
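
Step 4 (instrument boundaries with debug logging) might look like the wrapper below. It is a sketch only: `instrument` and the `[DEBUG-…]` prefix format are assumptions for illustration (the removed text suggested a unique prefix such as `[DEBUG-a4f2]`), and the default logger uses `console.error` as the DEEP reference recommends.

```typescript
// Hypothetical sketch: `instrument` is not part of openhermes.
type Log = (message: string) => void;

// Wrap a function so each call logs its input, output, and any thrown
// error under a unique prefix that is easy to grep for and remove later.
function instrument<A, R>(
  boundary: string,
  fn: (arg: A) => R,
  log: Log = (m) => console.error(m),
): (arg: A) => R {
  const prefix = `[DEBUG-${boundary}]`;
  return (arg: A): R => {
    log(`${prefix} in ${JSON.stringify(arg)}`);
    try {
      const out = fn(arg);
      log(`${prefix} out ${JSON.stringify(out)}`);
      return out;
    } catch (err) {
      // Record which boundary the failure crossed, then rethrow unchanged.
      log(`${prefix} threw ${String(err)}`);
      throw err;
    }
  };
}
```

Wrapping each layer of a call chain this way gives one log line per boundary crossing, so a single run shows where a bad value first appears.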

package/harness/skills/oh-issue/DEEP.md ADDED

@@ -0,0 +1,21 @@
+# oh-issue — Deep Reference
+## When to Use
+Plan/PRD needs breaking into actionable issues. Vertical tracer-bullet slices.
+Triggers: break into issues, create issues from plan, issue breakdown.
+## Issue Structure
+- **Title**: action-oriented ("Add user auth API")
+- **AC**: concrete, testable ("User signs up with email + password")
+- **Notes**: pointers for implementer
+- **Deps**: what must come first
+- **Labels**: type, priority, area
+## Anti-patterns
+- Horizontal slicing (no one ships "DB layer" alone)
+- Issues too large (3+ days) or too small (<1 hour)
+- Missing acceptance criteria
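
The issue structure and two of the anti-patterns above could be encoded as a type plus a lint function. This is a sketch under assumptions: `IssueDraft`, `lintIssue`, and the `estimateHours` field are hypothetical (the source lists no estimate field), and "3+ days" is converted to hours assuming an 8-hour working day.

```typescript
// Hypothetical sketch: field and function names are illustrative.
interface IssueDraft {
  title: string;
  acceptanceCriteria: string[];
  notes: string;
  deps: string[];
  labels: string[];
  estimateHours: number; // assumed field, not in the source structure
}

// Return a list of anti-pattern findings; an empty list means no findings.
function lintIssue(issue: IssueDraft): string[] {
  const problems: string[] = [];
  if (issue.acceptanceCriteria.length === 0) {
    problems.push("missing acceptance criteria");
  }
  if (issue.estimateHours < 1) {
    problems.push("too small (<1 hour)");
  }
  // Assumes an 8-hour working day: 3 days = 24 hours.
  if (issue.estimateHours >= 24) {
    problems.push("too large (3+ days)");
  }
  return problems;
}
```

Horizontal slicing is not checked here because it depends on judging the content of the issue, not its shape.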

package/harness/skills/oh-issue/SKILL.md CHANGED

@@ -1,11 +1,7 @@
 ---
 name: oh-issue
-description: "Break a plan, spec, or PRD into independently-grabbable GitHub issues"
+description: "Break plans/PRDs into independently-grabbable GitHub issues"
 tier: 2
-triggers:
-  - "break into issues"
-  - "create issues from plan"
-  - "issue breakdown"
 route:
   pass: done
   fail: oh-planner
@@ -14,32 +10,20 @@ route:
 # oh-issue
-## When to Use
-When a plan exists and needs to be broken into actionable issues. Uses tracer-bullet vertical slices for independent work items.
+Break plans/PRDs into vertical-slice issues with acceptance criteria and dependencies.
-## Workflow
-1. Read the plan or PRD
-2. Identify vertical slices — self-contained features that ship independently
-3. Write each issue with: clear title, acceptance criteria, implementation notes, dependencies
-4. Use `gh issue create` to publish each issue
-5. Label and milestone each issue appropriately
+## Steps
-## Issue Structure
-- **Title**: action-oriented ("Add user authentication API")
-- **Acceptance criteria**: concrete, testable ("User can sign up with email + password")
-- **Implementation notes**: pointers for the implementer
-- **Dependencies**: what must be done first
-- **Labels**: type, priority, area
-## Anti-patterns
-- Horizontal slicing (DB layer / API layer / UI layer — no one ships a layer)
-- Issues too large (3+ days) or too small (< 1 hour)
-- Writing issues without acceptance criteria
+1. Read plan or PRD
+2. Identify vertical slices — self-contained, independently shippable
+3. Write each issue with title, acceptance criteria, implementation notes, and dependencies
+4. Publish issues via `gh issue create`
+5. Apply labels and milestone
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → [done — issues published to tracker] |
-| fail | → oh-planner (re-spec unclear slices) |
-| blocker | → surface to user |
+| pass | → done |
+| fail | → oh-planner |
+| blocker | → surface |
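
Steps 4 and 5 (publish via `gh issue create`, then apply labels and milestone) can be sketched as a function that builds the argument list. `PublishableIssue` and `ghIssueCreateArgs` are hypothetical names; the flags used (`--title`, `--body`, `--label`, `--milestone`) are options of the GitHub CLI's `gh issue create` command. The sketch only builds arguments and does not run anything.

```typescript
// Hypothetical sketch: builds arguments only, does not invoke gh.
interface PublishableIssue {
  title: string;
  body: string;
  labels: string[];
  milestone?: string;
}

// Build the argv for `gh issue create`. Returning an array (instead of a
// single command string) avoids shell-quoting problems in titles and bodies.
function ghIssueCreateArgs(issue: PublishableIssue): string[] {
  const args = ["issue", "create", "--title", issue.title, "--body", issue.body];
  for (const label of issue.labels) {
    args.push("--label", label);
  }
  if (issue.milestone !== undefined) {
    args.push("--milestone", issue.milestone);
  }
  return args;
}
```

In Node.js the resulting array could be passed to `execFile("gh", args)` from `node:child_process`, which runs the command without a shell.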

package/harness/skills/oh-manifest/DEEP.md ADDED

@@ -0,0 +1,92 @@
+# oh-manifest — Deep Reference
+## Phase 0: Pre-Flight
+ALL must pass before any work:
+- ☐ **Quality baseline** — existing tests pass. Capture before/after.
+- ☐ **Rollback path** — clean `git stash` or committed state to return to.
+- ☐ **Branch isolation** — working branch, not main/master.
+- ☐ **Scope documented** — plan exists and unambiguous.
+Any check fails → STOP. Report which. Do not proceed until resolved.
+**Continuous execution:** Execute all tasks without pausing for progress check-ins between them. Only stop for BLOCKED, genuine ambiguity, or all tasks complete.
+## Pipeline
+### Step 1: Plan
+If plan exists, load. If not, run oh-planner. Auto-decide minor scope via decision principles. Surface only: premises needing human judgment, or plan/alternative conflicts.
+### Step 2: Build
+Run oh-builder for each plan phase in dependency order. Parallelizable phases → sub-agents. Auto-decide implementation choices.
+**Two-stage review (in order — never reverse):**
+1. **Spec compliance first** — Does the output match the plan/spec requirements? Quote the spec. No scope creep, no missing requirements.
+2. **Code quality second** — Only after spec compliance is ✅. Architecture, readability, test quality, edge cases.
+**Implementer status protocol** — Implementers report one of:
+| Status | Action |
+|--------|--------|
+| **DONE** | Proceed to spec review |
+| **DONE_WITH_CONCERNS** | Read concerns before proceeding |
+| **NEEDS_CONTEXT** | Provide context, re-dispatch |
+| **BLOCKED** | Assess: context problem? capability gap? task too large? plan wrong? |
+Never ignore BLOCKED or retry same approach without changes.
+### Step 3: Verify
+Check each phase against verification criteria. Tests pass → mark complete. Fail → diagnose (oh-expert), fix, re-verify.
+### Step 4: Loop
+All done → DONE. Phase fails → BLOCKER (surface). New work discovered → add to plan, continue.
+## Loop Patterns
+| Pattern | Use | Behavior |
+|---------|-----|----------|
+| sequential | Normal features | One phase at a time, verify each |
+| continuous-pr | Multi-step refactors | Per-phase PRs |
+| infinite | Watch mode, CI repair | Continue until stop signal |
+| rfc-dag | Complex deps | DAG resolution, parallelize independent branches |
+Default: sequential.
+## Escalation Triggers
+| Trigger | Condition | Action |
+|---------|-----------|--------|
+| Stall | 2 consecutive zero-progress checkpoints | Pause, report attempts |
+| Retry storm | Same error 5+ times | Stop, surface with fixes tried |
+| Cost drift | Cumulative changes exceed scope | Pause, show diff |
+| Quality regression | Verify scores lower than baseline | Pause, report |
+These are not optional. When triggered, loop **must** pause.
+## Decision Principles
+Auto-resolve: completeness > cleverness, boil the lake, pragmatic > perfect, DRY at 3rd instance, explicit > implicit, bias toward action.
+Surface only: premises, dead ends, cross-model disagreement.
+**Model selection guidance:**
+- Mechanical tasks (isolated, 1-2 files, clear spec) → fast cheap model
+- Integration tasks (multi-file, coordination) → standard model
+- Architecture/design/review tasks → most capable model
+## Blocker Protocol
+`BLOCKER: <what> | Options: A, B, C` → wait for decision.
+## Anti-patterns
+- Skipping pre-flight
+- Auto-deciding premises
+- Pushing through blockers without surfacing
+- Skipping verification
+- Parallelizing dependent phases
+- Not updating plan file
+- Ignoring escalation triggers
+- Starting code quality review before spec compliance is ✅
+- Ignoring implementer BLOCKED status and retrying with same approach
+- Pausing between tasks for progress updates (breaks flow)
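
Two of the escalation triggers in the table above (stall and retry storm) are mechanical enough to sketch in code. This is an illustration, not the package's implementation: `Checkpoint`, `checkEscalation`, and the `progress` field are hypothetical, and "same error 5+ times" is read here as five total occurrences in the history, not five in a row. Cost drift and quality regression are omitted because they need a scope definition and a baseline score.

```typescript
// Hypothetical sketch: names and data shape are assumptions.
type Escalation = "stall" | "retry-storm" | null;

interface Checkpoint {
  progress: number; // assumed: units of work completed since the last checkpoint
  error?: string;   // error message seen at this checkpoint, if any
}

function checkEscalation(history: Checkpoint[]): Escalation {
  // Stall: the two most recent checkpoints both made zero progress.
  const lastTwo = history.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((c) => c.progress === 0)) {
    return "stall";
  }

  // Retry storm: the same error message has appeared five or more times.
  const counts = new Map<string, number>();
  for (const c of history) {
    if (c.error === undefined) continue;
    const n = (counts.get(c.error) ?? 0) + 1;
    if (n >= 5) return "retry-storm";
    counts.set(c.error, n);
  }
  return null;
}
```

A loop driver would call this after every checkpoint and pause as soon as the result is not `null`, matching the rule that a triggered loop must pause.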