npm - openhermes - Versions diffs - 4.1.0 → 4.9.2 - Mend

openhermes 4.1.0 → 4.9.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (99)

package/CONTEXT.md +9 -0
package/ETHOS.md +6 -3
package/LICENSE +21 -21
package/README.md +120 -79
package/bootstrap.ts +284 -41
package/harness/agents/oh-browser.md +97 -0
package/harness/agents/oh-builder.md +78 -0
package/harness/agents/oh-facade.md +75 -0
package/harness/agents/oh-fusion.md +45 -0
package/harness/agents/oh-gauntlet.md +71 -0
package/harness/agents/oh-grill.md +71 -0
package/harness/agents/oh-investigate.md +60 -0
package/harness/agents/oh-manifest.md +95 -0
package/harness/agents/oh-plan-review.md +40 -0
package/harness/agents/oh-planner.md +50 -0
package/harness/agents/oh-refactor.md +37 -0
package/harness/agents/oh-retro.md +46 -0
package/harness/agents/oh-review.md +85 -0
package/harness/agents/oh-security.md +83 -0
package/harness/agents/oh-ship.md +76 -0
package/harness/agents/oh-skill-craft.md +38 -0
package/harness/agents/openhermes.md +106 -62
package/harness/codex/AUTOPILOT.md +178 -0
package/harness/codex/CHARTER.md +81 -0
package/harness/commands/oh-doctor.md +193 -14
package/harness/commands/oh-log.md +18 -0
package/harness/instructions/SHELL.md +76 -0
package/harness/skills/oh-ascii/DEEP.md +292 -0
package/harness/skills/oh-ascii/SKILL.md +31 -0
package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
package/harness/skills/oh-browser/DEEP.md +54 -0
package/harness/skills/oh-browser/SKILL.md +30 -0
package/harness/skills/oh-builder/DEEP.md +63 -0
package/harness/skills/oh-builder/SKILL.md +16 -89
package/harness/skills/oh-expert/DEEP.md +85 -0
package/harness/skills/oh-expert/SKILL.md +19 -106
package/harness/skills/oh-facade/DEEP.md +182 -0
package/harness/skills/oh-facade/SKILL.md +34 -0
package/harness/skills/oh-freeze/DEEP.md +18 -0
package/harness/skills/oh-freeze/SKILL.md +15 -15
package/harness/skills/oh-full-output/DEEP.md +25 -0
package/harness/skills/oh-full-output/SKILL.md +28 -0
package/harness/skills/oh-fusion/DEEP.md +120 -0
package/harness/skills/oh-fusion/SKILL.md +36 -0
package/harness/skills/oh-gauntlet/DEEP.md +77 -0
package/harness/skills/oh-gauntlet/SKILL.md +17 -105
package/harness/skills/oh-grill/DEEP.md +51 -0
package/harness/skills/oh-grill/SKILL.md +16 -63
package/harness/skills/oh-guard/DEEP.md +19 -0
package/harness/skills/oh-guard/SKILL.md +15 -20
package/harness/skills/oh-handoff/DEEP.md +48 -0
package/harness/skills/oh-handoff/SKILL.md +18 -19
package/harness/skills/oh-health/DEEP.md +74 -0
package/harness/skills/oh-health/SKILL.md +17 -76
package/harness/skills/oh-init/DEEP.md +85 -0
package/harness/skills/oh-init/SKILL.md +17 -197
package/harness/skills/oh-investigate/DEEP.md +171 -0
package/harness/skills/oh-investigate/SKILL.md +18 -61
package/harness/skills/oh-issue/DEEP.md +21 -0
package/harness/skills/oh-issue/SKILL.md +16 -23
package/harness/skills/oh-learn/DEEP.md +44 -0
package/harness/skills/oh-learn/SKILL.md +17 -79
package/harness/skills/oh-manifest/DEEP.md +92 -0
package/harness/skills/oh-manifest/SKILL.md +15 -107
package/harness/skills/oh-plan-review/DEEP.md +90 -0
package/harness/skills/oh-plan-review/SKILL.md +19 -114
package/harness/skills/oh-planner/DEEP.md +172 -0
package/harness/skills/oh-planner/SKILL.md +16 -143
package/harness/skills/oh-prd/DEEP.md +45 -0
package/harness/skills/oh-prd/SKILL.md +15 -22
package/harness/skills/oh-refactor/DEEP.md +122 -0
package/harness/skills/oh-refactor/SKILL.md +33 -0
package/harness/skills/oh-retro/DEEP.md +26 -0
package/harness/skills/oh-retro/SKILL.md +17 -20
package/harness/skills/oh-review/DEEP.md +87 -0
package/harness/skills/oh-review/SKILL.md +17 -96
package/harness/skills/oh-security/DEEP.md +83 -0
package/harness/skills/oh-security/SKILL.md +18 -96
package/harness/skills/oh-ship/DEEP.md +141 -0
package/harness/skills/oh-ship/SKILL.md +18 -26
package/harness/skills/oh-skill-craft/DEEP.md +369 -0
package/harness/skills/oh-skill-craft/SKILL.md +20 -93
package/harness/skills/oh-skills-link/DEEP.md +16 -0
package/harness/skills/oh-skills-link/SKILL.md +15 -16
package/harness/skills/oh-skills-list/DEEP.md +20 -0
package/harness/skills/oh-skills-list/SKILL.md +14 -18
package/harness/skills/oh-triage/DEEP.md +23 -0
package/harness/skills/oh-triage/SKILL.md +15 -20
package/harness/skills/oh-worktree/DEEP.md +169 -0
package/harness/skills/oh-worktree/SKILL.md +32 -0
package/lib/harness-resolver.ts +10 -12
package/package.json +9 -4
package/scripts/count-tokens.mjs +158 -0
package/scripts/oh-doctor.ps1 +342 -0
package/harness/codex/CONSTITUTION.md +0 -70
package/harness/codex/ROUTING.md +0 -127
package/harness/instructions/RUNTIME.md +0 -55
package/harness/skills/oh-caveman/SKILL.md +0 -33
package/lib/logger.ts +0 -69

package/harness/skills/oh-investigate/DEEP.md ADDED

@@ -0,0 +1,171 @@
+# oh-investigate — Deep Reference
+## The Iron Law
+> **NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. Surface fixes compound into technical debt.**
+If you haven't completed root cause investigation, you cannot propose fixes. Violating this process is violating the spirit of debugging.
+## Phase 0 — Build a feedback loop
+**This is the actual skill. Everything else is mechanical.**
+A fast, deterministic, agent-runnable pass/fail signal = you find the cause. Without one, staring at code won't save you.
+### Ways to construct a loop (try in order)
+1. Failing test at the bug's seam
+2. Curl/HTTP script against dev server
+3. CLI invocation + fixture, diff stdout
+4. Headless browser — assert on DOM/console/network
+5. Replay captured trace in isolation
+6. Throwaway harness — minimal subset exercising the bug path
+7. Property/fuzz loop — 1000 random inputs
+8. Bisection harness — `git bisect run`-able
+9. Differential loop — old vs new version output diff
+10. HITL script — drive human with structured loop
+**Sharpen the loop:** Faster? Sharper signal (specific symptom, not "didn't crash")? More deterministic (pin time, seed RNG, isolate FS)? A 2s deterministic loop is a superpower.
+**Non-deterministic:** Goal = higher reproduction rate. Loop 100×, parallelize, add stress. 50% flake is debuggable; 1% is not.
+**Cannot build a loop?** Stop. Say so. List what you tried. Do NOT hypothesise without a loop.
+## Workflow
+Complete each phase before proceeding. Each phase consumes the feedback loop built in Phase 0.
+### Phase 1 — Root Cause Investigation
+**Before attempting ANY fix:**
+1. **Reproduce** — Loop confirms the described failure. Exact steps? Every time?
+2. **Read Error Messages** — Read stack traces completely. Note line numbers and error codes.
+3. **Check Recent Changes** — Git diff, recent commits, new dependencies, env differences.
+4. **Minimise** — Strip unrelated code. Remove noise until only the failure path remains.
+5. **Gather Evidence** — One probe per hypothesis. Change one variable. Use unique debug prefixes.
+6. **Trace Data Flow** — If error is deep in call stack, trace backward.
+### Phase 2 — Pattern Analysis
+**Find the pattern before fixing:**
+1. **Find Working Examples** — Locate similar working code. What works that's analogous?
+2. **Compare Against References** — Read reference implementation completely. Don't skim.
+3. **Identify Differences** — List every difference between working and broken. Don't dismiss anything.
+4. **Understand Dependencies** — What components, config, or environment does this depend on?
+### Phase 3 — Hypothesis & Testing
+**Scientific method:**
+1. **Form Single Hypothesis** — "I think X is root cause because Y." Be specific, not vague.
+2. **Test Minimally** — Smallest change to test hypothesis. One variable. Don't fix multiple things.
+3. **Verify** — Prediction held? → Phase 4. No → new hypothesis. DON'T stack more fixes.
+### Phase 4 — Implementation
+**Fix root cause, not symptom:**
+1. **Create Failing Test** — Simplest reproduction. Automated if possible. Must fail before fix.
+2. **Implement Single Fix** — Address root cause. ONE change. No "while I'm here" improvements.
+3. **Verify Fix** — Failing test passes? No other tests broken? Phase 0 loop confirms resolution?
+4. **Regression Test** — Verify existing behavior. No regression seam = architecture gap (flag it).
+5. **Document** — Log root cause + fix. State which hypothesis was correct. What was the trigger?
+## Root Cause Tracing
+Bugs manifest deep in call stacks. Fixing at the symptom treats the wrong layer. **Trace backward through the call chain to find the original trigger.**
+1. **Observe symptom** — Error at point of failure.
+2. **Find immediate cause** — What code directly produces this error?
+3. **What called this?** — Step one level up the call chain.
+4. **Keep tracing up** — What value was passed? Where from?
+5. **Find original trigger** — Root source of bad state. Fix here, not at symptom.
+**Stack trace instrumentation:**
+```
+const stack = new Error().stack;
+console.error('DEBUG <component>:', { directory, cwd, stack });
+```
+Use `console.error()` (logger may be suppressed in tests). Grep output. **Never fix just where the error appears** — trace back and add validation at each layer.
+## Multi-Component Diagnostics
+**In multi-component systems (CI → build → signing, API → service → database), add instrumentation at each boundary BEFORE proposing fixes:**
+- Log data entering and exiting each component
+- Verify environment/config propagation across layers
+- Check state at each layer
+Run once to gather evidence, identify the failing component, THEN investigate it.
+**Example (build pipeline):** Layer 1 (workflow → secrets?), Layer 2 (build → env vars?), Layer 3 (signing → keychain?), Layer 4 (actual signing). Reveals which layer fails in one pass.
+## Red Flags
+**If you catch yourself thinking any of these, STOP. Return to Phase 1.**
+- "Quick fix for now, investigate later"
+- "Just try changing X and see if it works"
+- "Add multiple changes, run tests"
+- "Skip the test, I'll manually verify"
+- "It's probably X, let me fix that"
+- "I don't fully understand but this might work"
+- "Pattern says X but I'll adapt it differently"
+- Proposing solutions before tracing data flow
+- "One more fix attempt" (when already tried 2+)
+- Each fix reveals new problem in different place
+- "Here are the main problems: [lists fixes without investigation]"
+- "Issue is simple, don't need the full process"
+**All of these mean: STOP. Return to investigation.**
+## 3+ Fix Failure Rule
+**After 3 failed fix attempts, STOP and question the architecture.**
+Three failed fixes signal an architectural problem:
+- Each fix reveals new shared state/coupling in a different place
+- Fixes require "massive refactoring" to implement
+- Each fix creates new symptoms elsewhere
+**Do not attempt Fix #4 without architectural discussion:**
+- Is this pattern fundamentally sound?
+- Are we sticking with it through inertia?
+- Should we refactor architecture vs. continue fixing symptoms?
+This is NOT a failed hypothesis — this is a wrong architecture.
+## Partner Signal Monitoring
+**When your human partner says these, they mean you're guessing, not debugging:**
+| Phrase | Meaning |
+|--------|---------|
+| "Is that happening?" | You assumed without verifying |
+| "Will it show us...?" | You skipped evidence gathering |
+| "Stop guessing" | You're proposing fixes without understanding |
+| "We're stuck?" | Your approach isn't working |
+**When you see any of these: STOP. Return to Phase 1.**
+## Common Rationalizations
+| Excuse | Reality |
+|--------|---------|
+| "Issue is simple, don't need process" | Simple bugs have root causes too. Process is fast. |
+| "Emergency, no time for process" | Systematic is FASTER than guess-and-check thrashing. |
+| "Just try this first, then investigate" | First fix sets the pattern. Do it right from start. |
+| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves bug. |
+| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
+| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read fully. |
+| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
+| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Stop fixing. |
+## Anti-patterns
+- Fixing symptoms (same bug reappears)
+- Changing code without reproducing
+- Shotgun debugging (multiple changes hoping one sticks)
+- Not documenting root cause
+- Hypothesizing without a feedback loop
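
Note on the "Non-deterministic" guidance in Phase 0 of the file above: one way to turn "loop 100×" into a pass/fail signal is to measure a reproduction rate. The following is a minimal TypeScript sketch under stated assumptions; `measureReproRate`, `Trigger`, and `ReproStats` are hypothetical names chosen for illustration and do not appear in the openhermes package.

```typescript
// Hypothetical helper: not part of the openhermes package.
// A trigger returns true when the bug reproduced on that run.
type Trigger = () => boolean;

interface ReproStats {
  runs: number;
  failures: number;
  rate: number; // failures / runs, in [0, 1]
}

function measureReproRate(trigger: Trigger, runs: number): ReproStats {
  if (!Number.isInteger(runs) || runs <= 0) {
    throw new RangeError("runs must be a positive integer");
  }
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    try {
      if (trigger()) failures++;
    } catch {
      // Assumption: a thrown error counts as a reproduction in this sketch.
      failures++;
    }
  }
  return { runs, failures, rate: failures / runs };
}
```

A rate near 0.5 suggests the loop is usable as-is; a rate near 0.01 suggests adding stress or parallel runs before forming hypotheses, in line with the "50% flake is debuggable; 1% is not" rule.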

package/harness/skills/oh-investigate/SKILL.md CHANGED

@@ -1,74 +1,31 @@
 ---
 name: oh-investigate
-description: "Systematic bug diagnosis with root cause investigation"
+description: "Use when debugging any bug, test failure, or unexpected behavior. Finds root cause systematically before attempting fixes."
+tier: 2
+route:
+  pass: oh-builder
+  fail: oh-expert
+  blocker: surface
 ---
 # oh-investigate
-## When to Use
-When a bug is reported, a test fails, or unexpected behavior occurs. Use this before attempting any fix.
+Systematic bug diagnosis: build a feedback loop, trace root cause, fix with evidence.
-## Phase 0 — Build a feedback loop
+## Steps
-**This is the actual skill. Everything else is mechanical.**
-If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation are just consuming that signal. If you don't have one, no amount of staring at code will save you.
-Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.**
-### Ways to construct a feedback loop (try in this order)
-1. **Failing test** at whatever seam reaches the bug.
-2. **Curl / HTTP script** against a running dev server.
-3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
-4. **Headless browser script** — drive the UI, assert on DOM/console/network.
-5. **Replay a captured trace** — save a real payload/event log, replay it in isolation.
-6. **Throwaway harness** — minimal subset of the system exercising the bug code path with a single call.
-7. **Property / fuzz loop** — run 1000 random inputs, look for the failure mode.
-8. **Bisection harness** — automate "boot at state X, check, repeat" so you can `git bisect run` it.
-9. **Differential loop** — run same input through old-version vs new-version, diff outputs.
-10. **HITL script** — last resort. Drive a human with a structured loop.
-### Iterate on the loop itself
-- Can I make it faster? (Cache setup, skip unrelated init, narrow the scope.)
-- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
-- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem.)
-A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
-### Non-deterministic bugs
-The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows. A 50%-flake bug is debuggable; 1% is not.
-### When you genuinely cannot build a loop
-Stop and say so explicitly. List what you tried. Do **not** proceed to hypothesise without a loop.
-## Workflow (consumes the loop)
-1. **Reproduce** — run the loop, confirm the bug appears. The loop must match the user's described failure, not a different nearby failure.
-2. **Minimise** — strip away unrelated code until the minimal reproduction remains.
-3. **Hypothesise** — generate 3–5 ranked falsifiable hypotheses before testing any. Each must state a prediction: "If X is the cause, then changing Y will make the bug disappear".
-4. **Instrument** — one probe per hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup.
-5. **Fix** — write the regression test at a correct seam first. Watch it fail. Apply the smallest correct change. Watch it pass. Re-run the Phase 0 loop against the original scenario.
-6. **Regression test** — verify fix doesn't break existing behavior. If no correct seam exists for a regression test, that itself is a finding — flag the architecture gap.
-7. **Document** — log the root cause and fix in the handoff, issue, or relevant docs. State which hypothesis was correct so the next debugger learns.
-## Iron Law
-No fixes without root cause. Surface-level fixes compound into technical debt.
-## Anti-patterns
-- Fixing symptoms instead of causes (the same bug reappears next week)
-- Changing code without reproducing the bug first
-- "Shotgun" debugging — changing multiple things hoping one sticks
-- Not documenting root cause for future reference
-- Proceeding to hypothesise without a feedback loop
+1. Build a feedback loop — failing test, curl, CLI, headless browser, or throwaway harness. Must be fast, deterministic, and agent-runnable.
+2. Apply the Iron Law — NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. Surface fixes compound into technical debt.
+3. Reproduce and trace — confirm failure, read stack traces, check recent changes, minimise to failure path, gather evidence one probe per hypothesis.
+4. Trace backward — from symptom through call chain to original trigger. Instrument boundaries with debug logging.
+5. Find working pattern — locate similar working code, compare against references, list every difference.
+6. Form single hypothesis — "I think X is root cause because Y." Test with minimal change. One variable.
+7. Implement fix — create failing test first, apply one fix, verify resolution, regression test, document root cause.
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → oh-builder (implement the fix) |
-| fail | → oh-expert (deepen diagnosis) |
-| blocker | → surface to user |
+| pass | → oh-builder (fix) |
+| fail | → oh-expert (deepen) |
+| blocker | → surface |

package/harness/skills/oh-issue/DEEP.md ADDED

@@ -0,0 +1,21 @@
+# oh-issue — Deep Reference
+## When to Use
+Plan/PRD needs breaking into actionable issues. Vertical tracer-bullet slices.
+Triggers: break into issues, create issues from plan, issue breakdown.
+## Issue Structure
+- **Title**: action-oriented ("Add user auth API")
+- **AC**: concrete, testable ("User signs up with email + password")
+- **Notes**: pointers for implementer
+- **Deps**: what must come first
+- **Labels**: type, priority, area
+## Anti-patterns
+- Horizontal slicing (no one ships "DB layer" alone)
+- Issues too large (3+ days) or too small (<1 hour)
+- Missing acceptance criteria
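
Note on the issue structure and anti-patterns in the file above: the size and acceptance-criteria rules can be checked mechanically before publishing. This is a rough TypeScript sketch, not openhermes code. The `IssueDraft` shape, the `lintIssue` function, the `estimateHours` field, and the 8-hour working day used to convert "3+ days" into hours are all assumptions made for illustration.

```typescript
// Hypothetical sketch: the IssueDraft shape, lintIssue, and the 8-hour
// working day are assumptions for illustration, not openhermes code.
interface IssueDraft {
  title: string;
  acceptanceCriteria: string[];
  notes: string;
  deps: string[];
  labels: string[];
  estimateHours: number;
}

const HOURS_PER_DAY = 8; // assumption: one working day

function lintIssue(issue: IssueDraft): string[] {
  const problems: string[] = [];
  if (issue.acceptanceCriteria.length === 0) {
    problems.push("missing acceptance criteria");
  }
  if (issue.estimateHours >= 3 * HOURS_PER_DAY) {
    problems.push("too large (3+ days)");
  }
  if (issue.estimateHours < 1) {
    problems.push("too small (<1 hour)");
  }
  return problems;
}
```

An empty result means the draft passes these three checks only; horizontal slicing still needs a human or agent judgment call.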

package/harness/skills/oh-issue/SKILL.md CHANGED

@@ -1,36 +1,29 @@
 ---
 name: oh-issue
-description: "Break a plan, spec, or PRD into independently-grabbable GitHub issues"
+description: "Break plans/PRDs into independently-grabbable GitHub issues"
+tier: 2
+route:
+  pass: done
+  fail: oh-planner
+  blocker: surface
 ---
 # oh-issue
-## When to Use
-When a plan exists and needs to be broken into actionable issues. Uses tracer-bullet vertical slices for independent work items.
+Break plans/PRDs into vertical-slice issues with acceptance criteria and dependencies.
-## Workflow
-1. Read the plan or PRD
-2. Identify vertical slices — self-contained features that ship independently
-3. Write each issue with: clear title, acceptance criteria, implementation notes, dependencies
-4. Use `gh issue create` to publish each issue
-5. Label and milestone each issue appropriately
+## Steps
-## Issue Structure
-- **Title**: action-oriented ("Add user authentication API")
-- **Acceptance criteria**: concrete, testable ("User can sign up with email + password")
-- **Implementation notes**: pointers for the implementer
-- **Dependencies**: what must be done first
-- **Labels**: type, priority, area
-## Anti-patterns
-- Horizontal slicing (DB layer / API layer / UI layer — no one ships a layer)
-- Issues too large (3+ days) or too small (< 1 hour)
-- Writing issues without acceptance criteria
+1. Read plan or PRD
+2. Identify vertical slices — self-contained, independently shippable
+3. Write each issue with title, acceptance criteria, implementation notes, and dependencies
+4. Publish issues via `gh issue create`
+5. Apply labels and milestone
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → [done — issues published to tracker] |
-| fail | → oh-planner (re-spec unclear slices) |
-| blocker | → surface to user |
+| pass | → done |
+| fail | → oh-planner |
+| blocker | → surface |

package/harness/skills/oh-learn/DEEP.md ADDED

@@ -0,0 +1,44 @@
+# oh-learn — Deep Reference
+## When to Use
+Session learnings should be captured, reviewed, or promoted as reusable instincts for future work.
+Triggers: learn from session, extract patterns, run oh-learn.
+## Instinct Data Model
+JSONL at `~/.local/share/opencode/openhermes/plans/<project>-instincts.jsonl`:
+```json
+{"trigger": "specific situation", "action": "recommended response", "confidence": 0.5, "applications": 1, "successes": 1, "category": "coding", "source": "oh-learn:extract", "ts": "2026-05-15T12:00:00Z"}
+```
+**Trigger:** specific, matchable (not general advice). **Action:** executable (not belief). **Confidence:** starts 0.5, +0.05 per success, -0.02/day decay. **Category:** coding, testing, security, git, planning, orchestration, debugging, ux.
+## Workflows
+### Extract
+Scan session for repeated decisions. For each: write instinct. Check existing file for near-duplicates. Merge (max confidence, increment applications) or append.
+### Evolve
+Read all instincts. Group by category then topic. ≥5 instincts with avg confidence ≥ 0.7 → oh-skill-craft spec. 3-4 with confidence ≥ 0.8 → suggest update to existing skill.
+### Promote
+Instincts with confidence ≥ 0.85 AND applications ≥ 10 → filter project-specific → append to global `%USERPROFILE%\.config\opencode\instincts.jsonl`. Tag promoted.
+### Review / Search / Prune / Export
+Review: totals + distributions. Search: by topic, trigger, category, confidence. Prune: stale >30d with confidence < 0.3. Export: portable JSON.
+## Anti-patterns
+- Hoarding every observation (most aren't learnings)
+- Never pruning
+- Storing what not why
+- Over-promoting to global
+- Extracting without applying
+- Ignoring confidence
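
Note on the instinct data model in the file above: the confidence rules (+0.05 per success, -0.02 per day of decay, promote at ≥ 0.85 with ≥ 10 applications, prune when stale > 30 days with confidence < 0.3) can be written down as small pure functions. The TypeScript below is a sketch under stated assumptions. The field names follow the JSONL example in the diff, but the function names, the clamp to [0, 1], and the choice to count a success as one application are assumptions, not the package's implementation.

```typescript
// Illustrative only: field names follow the JSONL example in the diff;
// the functions and the [0, 1] clamp are assumptions, not openhermes code.
interface Instinct {
  trigger: string;
  action: string;
  confidence: number;
  applications: number;
  successes: number;
  category: string;
  source: string;
  ts: string; // ISO 8601 timestamp
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// +0.05 per success; assumption: a success also counts as one application.
function recordSuccess(i: Instinct): Instinct {
  return {
    ...i,
    confidence: clamp01(i.confidence + 0.05),
    applications: i.applications + 1,
    successes: i.successes + 1,
  };
}

// -0.02 per idle day.
function decay(i: Instinct, idleDays: number): Instinct {
  return { ...i, confidence: clamp01(i.confidence - 0.02 * idleDays) };
}

// Promote: confidence >= 0.85 AND applications >= 10.
function shouldPromote(i: Instinct): boolean {
  return i.confidence >= 0.85 && i.applications >= 10;
}

// Prune: stale for more than 30 days with confidence < 0.3.
function shouldPrune(i: Instinct, idleDays: number): boolean {
  return idleDays > 30 && i.confidence < 0.3;
}
```

Keeping these as pure functions over a single record makes the thresholds easy to test independently of how the JSONL file is read or written.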

package/harness/skills/oh-learn/SKILL.md CHANGED

@@ -1,92 +1,30 @@
 ---
 name: oh-learn
-description: "Extract, evolve, and promote session learnings as instincts. Review, search, prune, export."
+description: "Capture, review, and promote session learnings as reusable instincts"
+tier: 2
+route:
+  pass: done
+  fail: surface
+  blocker: surface
 ---
 # oh-learn
-Learning engine for the harness. Distills patterns from sessions into **instincts** (trigger-action pairs with confidence), clusters them into skill candidates, and graduates high-signal patterns from project to global scope.
+Distill session patterns into instincts, cluster into skill candidates, and promote high-signal patterns.
-## Instinct Data Model
+## Steps
-Every learning stored as one JSONL line in `.opencode/instincts.jsonl`:
-```json
-{ "trigger": "situation pattern", "action": "recommended response", "confidence": 0.5, "applications": 1, "successes": 1, "category": "coding", "source": "oh-learn:extract", "ts": "2026-05-15T12:00:00Z" }
-```
-**Rules:**
-- **Trigger** — specific, matchable situation. *Not* general advice.
-- **Action** — executable response. *Not* a belief.
-- **Confidence** — starts at 0.5, increments +0.05 per successful application, decays -0.02 per day without use.
-- **Category** — one of: `coding`, `testing`, `security`, `git`, `planning`, `orchestration`, `debugging`, `ux`.
-## When to Use
-After completing a significant piece of work, at session handoff, or when you notice the same pattern repeat 2+ times in one session. Also on explicit user request.
-## Workflows
-### Extract
-Mine the current session for reusable patterns.
-1. Scan recent conversation + code changes for repeated decision patterns
-2. For each distinct pattern write an instinct: trigger, action, confidence=0.5, category
-3. Read existing `.opencode/instincts.jsonl`, check for near-duplicate triggers
-4. If duplicate found: merge — `confidence = max(existing, 0.8 × new)`, increment applications
-5. If new: append line to file
-**Good instinct:** trigger=`"tsc --noEmit shows 10+ errors after batch edit"`, action=`"Fix errors one at a time, re-running tsc after each, rather than batch-fixing"`, category=`"debugging"`
-**Bad instinct:** `"Write clean code"` — too vague to trigger on.
-### Evolve
-Cluster related instincts into skill/command/agent candidates.
-1. Read all instincts from `.opencode/instincts.jsonl`
-2. Group by `category`, then by trigger topic similarity
-3. **If cluster ≥ 5 instincts AND avg confidence ≥ 0.7** → generate `oh-skill-craft` spec for a new skill
-4. **If cluster 3-4 instincts with confidence ≥ 0.8** → suggest update to existing skill
-5. Output candidate summary with trigger list and extracted core pattern
-### Promote
-Graduate high-confidence instincts from project to global scope.
-1. Scan `.opencode/instincts.jsonl` for instincts with `confidence >= 0.85 AND applications >= 10`
-2. Filter out project-specific patterns (reference paths, local APIs, domain terms)
-3. Append filtered candidates to `%USERPROFILE%\.config\opencode\instincts.jsonl` (global)
-4. Tag promoted instincts with `"promoted": true` in project file
-5. Report: "Promoted N instincts to global scope"
-### Review
-Show instinct summary: total count, confidence distribution, category breakdown, recently promoted.
-### Search
-Find instincts by topic, trigger fragment, category, or confidence range.
-### Prune
-Remove instincts stale for 30+ days with confidence < 0.3, or superseded by a higher-confidence instinct covering the same trigger.
-### Export
-Serialize instincts to portable JSON for sharing across projects or teams:
-```json
-{ "version": 1, "exported": "2026-05-15T12:00:00Z", "instincts": [...] }
-```
-## Anti-patterns
-- Hoarding every observation (most things aren't learnings)
-- Never pruning (stale knowledge is worse than no knowledge)
-- Storing what, not why (context-less facts are forgettable)
-- Over-promoting: not every pattern is globally useful
-- Extracting without applying: instincts that never trigger are noise
-- Ignoring confidence: treating all instincts as equally reliable
+1. Scan session for repeated decisions
+2. Write instinct for each pattern — trigger, action, confidence
+3. Check existing file for near-duplicates; merge or append
+4. Group instincts by category and topic
+5. Promote high-confidence instincts (≥0.85, ≥10 applications) to global scope
+6. Prune stale low-confidence instincts (<0.3, >30 days)
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → [done — report summary] |
-| fail | → [surface gaps to user] |
-| blocker | → surface to user |
+| pass | → done |
+| fail | → surface |
+| blocker | → surface |

package/harness/skills/oh-manifest/DEEP.md ADDED

@@ -0,0 +1,92 @@
+# oh-manifest — Deep Reference
+## Phase 0: Pre-Flight
+ALL must pass before any work:
+- ☐ **Quality baseline** — existing tests pass. Capture before/after.
+- ☐ **Rollback path** — clean `git stash` or committed state to return to.
+- ☐ **Branch isolation** — working branch, not main/master.
+- ☐ **Scope documented** — plan exists and is unambiguous.
+Any check fails → STOP. Report which. Do not proceed until resolved.
+**Continuous execution:** Execute all tasks without pausing for progress check-ins between them. Only stop for BLOCKED, genuine ambiguity, or all tasks complete.
+## Pipeline
+### Step 1: Plan
+If plan exists, load. If not, run oh-planner. Auto-decide minor scope via decision principles. Surface only: premises needing human judgment, or plan/alternative conflicts.
+### Step 2: Build
+Run oh-builder for each plan phase in dependency order. Parallelizable phases → sub-agents. Auto-decide implementation choices.
+**Two-stage review (in order — never reverse):**
+1. **Spec compliance first** — Does the output match the plan/spec requirements? Quote the spec. No scope creep, no missing requirements.
+2. **Code quality second** — Only after spec compliance is ✅. Architecture, readability, test quality, edge cases.
+**Implementer status protocol** — Implementers report one of:
+| Status | Action |
+|--------|--------|
+| **DONE** | Proceed to spec review |
+| **DONE_WITH_CONCERNS** | Read concerns before proceeding |
+| **NEEDS_CONTEXT** | Provide context, re-dispatch |
+| **BLOCKED** | Assess: context problem? capability gap? task too large? plan wrong? |
+Never ignore BLOCKED or retry same approach without changes.
+### Step 3: Verify
+Check each phase against verification criteria. Tests pass → mark complete. Fail → diagnose (oh-expert), fix, re-verify.
+### Step 4: Loop
+All done → DONE. Phase fails → BLOCKER (surface). New work discovered → add to plan, continue.
+## Loop Patterns
+| Pattern | Use | Behavior |
+|---------|-----|----------|
+| sequential | Normal features | One phase at a time, verify each |
+| continuous-pr | Multi-step refactors | Per-phase PRs |
+| infinite | Watch mode, CI repair | Continue until stop signal |
+| rfc-dag | Complex deps | DAG resolution, parallelize independent branches |
+Default: sequential.
+## Escalation Triggers
+| Trigger | Condition | Action |
+|---------|-----------|--------|
+| Stall | 2 consecutive zero-progress checkpoints | Pause, report attempts |
+| Retry storm | Same error 5+ times | Stop, surface with fixes tried |
+| Cost drift | Cumulative changes exceed scope | Pause, show diff |
+| Quality regression | Verify scores lower than baseline | Pause, report |
+These are not optional. When triggered, loop **must** pause.
+## Decision Principles
+Auto-resolve: completeness > cleverness, boil the lake, pragmatic > perfect, DRY at 3rd instance, explicit > implicit, bias toward action.
+Surface only: premises, dead ends, cross-model disagreement.
+**Model selection guidance:**
+- Mechanical tasks (isolated, 1-2 files, clear spec) → fast cheap model
+- Integration tasks (multi-file, coordination) → standard model
+- Architecture/design/review tasks → most capable model
+## Blocker Protocol
+`BLOCKER: <what> | Options: A, B, C` → wait for decision.
+## Anti-patterns
+- Skipping pre-flight
+- Auto-deciding premises
+- Pushing through blockers without surfacing
+- Skipping verification
+- Parallelizing dependent phases
+- Not updating plan file
+- Ignoring escalation triggers
+- Starting code quality review before spec compliance is ✅
+- Ignoring implementer BLOCKED status and retrying with same approach
+- Pausing between tasks for progress updates (breaks flow)
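
Note on the escalation triggers in the file above: the "Stall" and "Retry storm" conditions are concrete enough to express as a check over a checkpoint history. The TypeScript below is a minimal sketch, assuming a hypothetical `Checkpoint` record and `checkEscalation` function; neither name comes from the openhermes package, and "Cost drift" and "Quality regression" are left out because the diff does not define how they are measured.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not openhermes code.
type Escalation = "stall" | "retry-storm" | null;

interface Checkpoint {
  progress: number; // units of work completed since the previous checkpoint
  error?: string;   // error message seen at this checkpoint, if any
}

function checkEscalation(history: Checkpoint[]): Escalation {
  // Stall: the last 2 checkpoints both made zero progress.
  const lastTwo = history.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((c) => c.progress === 0)) {
    return "stall";
  }
  // Retry storm: the same error appears 5 or more times.
  const counts = new Map<string, number>();
  for (const c of history) {
    if (c.error === undefined) continue;
    const n = (counts.get(c.error) ?? 0) + 1;
    if (n >= 5) return "retry-storm";
    counts.set(c.error, n);
  }
  return null;
}
```

A loop runner could call this after each checkpoint and pause whenever the result is non-null, matching the rule that a triggered escalation must pause the loop.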