npm - openhermes - Versions diffs - 4.3.0 → 4.9.2 - Mend

openhermes 4.3.0 → 4.9.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (96) hide show

package/CONTEXT.md +9 -0
package/README.md +26 -15
package/bootstrap.ts +161 -124
package/harness/agents/oh-browser.md +97 -0
package/harness/agents/oh-builder.md +78 -0
package/harness/agents/oh-facade.md +75 -0
package/harness/agents/oh-fusion.md +45 -0
package/harness/agents/oh-gauntlet.md +71 -0
package/harness/agents/oh-grill.md +71 -0
package/harness/agents/oh-investigate.md +60 -0
package/harness/agents/oh-manifest.md +95 -0
package/harness/agents/oh-plan-review.md +40 -0
package/harness/agents/oh-planner.md +50 -0
package/harness/agents/oh-refactor.md +37 -0
package/harness/agents/oh-retro.md +46 -0
package/harness/agents/oh-review.md +85 -0
package/harness/agents/oh-security.md +83 -0
package/harness/agents/oh-ship.md +76 -0
package/harness/agents/oh-skill-craft.md +38 -0
package/harness/agents/openhermes.md +107 -53
package/harness/codex/AUTOPILOT.md +143 -91
package/harness/codex/CHARTER.md +81 -0
package/harness/commands/oh-doctor.md +193 -14
package/harness/instructions/SHELL.md +76 -0
package/harness/skills/oh-ascii/DEEP.md +292 -0
package/harness/skills/oh-ascii/SKILL.md +31 -0
package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
package/harness/skills/oh-browser/DEEP.md +54 -0
package/harness/skills/oh-browser/SKILL.md +30 -0
package/harness/skills/oh-builder/DEEP.md +63 -0
package/harness/skills/oh-builder/SKILL.md +12 -90
package/harness/skills/oh-expert/DEEP.md +85 -0
package/harness/skills/oh-expert/SKILL.md +13 -106
package/harness/skills/oh-facade/DEEP.md +182 -0
package/harness/skills/oh-facade/SKILL.md +15 -279
package/harness/skills/oh-freeze/DEEP.md +18 -0
package/harness/skills/oh-freeze/SKILL.md +10 -19
package/harness/skills/oh-full-output/DEEP.md +25 -0
package/harness/skills/oh-full-output/SKILL.md +12 -65
package/harness/skills/oh-fusion/DEEP.md +120 -0
package/harness/skills/oh-fusion/SKILL.md +17 -295
package/harness/skills/oh-gauntlet/DEEP.md +77 -0
package/harness/skills/oh-gauntlet/SKILL.md +13 -105
package/harness/skills/oh-grill/DEEP.md +51 -0
package/harness/skills/oh-grill/SKILL.md +12 -63
package/harness/skills/oh-guard/DEEP.md +19 -0
package/harness/skills/oh-guard/SKILL.md +10 -24
package/harness/skills/oh-handoff/DEEP.md +48 -0
package/harness/skills/oh-handoff/SKILL.md +13 -23
package/harness/skills/oh-health/DEEP.md +74 -0
package/harness/skills/oh-health/SKILL.md +13 -76
package/harness/skills/oh-init/DEEP.md +85 -0
package/harness/skills/oh-init/SKILL.md +13 -127
package/harness/skills/oh-investigate/DEEP.md +171 -0
package/harness/skills/oh-investigate/SKILL.md +13 -66
package/harness/skills/oh-issue/DEEP.md +21 -0
package/harness/skills/oh-issue/SKILL.md +11 -27
package/harness/skills/oh-learn/DEEP.md +44 -0
package/harness/skills/oh-learn/SKILL.md +12 -83
package/harness/skills/oh-manifest/DEEP.md +92 -0
package/harness/skills/oh-manifest/SKILL.md +11 -108
package/harness/skills/oh-plan-review/DEEP.md +90 -0
package/harness/skills/oh-plan-review/SKILL.md +13 -115
package/harness/skills/oh-planner/DEEP.md +172 -0
package/harness/skills/oh-planner/SKILL.md +12 -149
package/harness/skills/oh-prd/DEEP.md +45 -0
package/harness/skills/oh-prd/SKILL.md +10 -26
package/harness/skills/oh-refactor/DEEP.md +122 -0
package/harness/skills/oh-refactor/SKILL.md +17 -410
package/harness/skills/oh-retro/DEEP.md +26 -0
package/harness/skills/oh-retro/SKILL.md +12 -24
package/harness/skills/oh-review/DEEP.md +87 -0
package/harness/skills/oh-review/SKILL.md +11 -97
package/harness/skills/oh-security/DEEP.md +83 -0
package/harness/skills/oh-security/SKILL.md +14 -96
package/harness/skills/oh-ship/DEEP.md +141 -0
package/harness/skills/oh-ship/SKILL.md +13 -31
package/harness/skills/oh-skill-craft/DEEP.md +369 -0
package/harness/skills/oh-skill-craft/SKILL.md +17 -178
package/harness/skills/oh-skills-link/DEEP.md +16 -0
package/harness/skills/oh-skills-link/SKILL.md +10 -20
package/harness/skills/oh-skills-list/DEEP.md +20 -0
package/harness/skills/oh-skills-list/SKILL.md +9 -22
package/harness/skills/oh-triage/DEEP.md +23 -0
package/harness/skills/oh-triage/SKILL.md +8 -24
package/harness/skills/oh-worktree/DEEP.md +169 -0
package/harness/skills/oh-worktree/SKILL.md +32 -0
package/lib/harness-resolver.ts +8 -10
package/package.json +5 -3
package/scripts/count-tokens.mjs +158 -0
package/scripts/oh-doctor.ps1 +342 -0
package/harness/codex/CONSTITUTION.md +0 -73
package/harness/codex/ROUTING.md +0 -92
package/harness/instructions/RUNTIME.md +0 -30
package/harness/skills/oh-caveman/SKILL.md +0 -42
package/lib/logger.ts +0 -75

package/harness/skills/oh-manifest/DEEP.md ADDED Viewed

@@ -0,0 +1,92 @@
+# oh-manifest — Deep Reference
+## Phase 0: Pre-Flight
+ALL must pass before any work:
+- ☐ **Quality baseline** — existing tests pass. Capture before/after.
+- ☐ **Rollback path** — clean `git stash` or committed state to return to.
+- ☐ **Branch isolation** — working branch, not main/master.
+- ☐ **Scope documented** — plan exists and unambiguous.
+Any check fails → STOP. Report which. Do not proceed until resolved.
+**Continuous execution:** Execute all tasks without pausing for progress check-ins between them. Only stop for BLOCKED, genuine ambiguity, or all tasks complete.
+## Pipeline
+### Step 1: Plan
+If plan exists, load. If not, run oh-planner. Auto-decide minor scope via decision principles. Surface only: premises needing human judgment, or plan/alternative conflicts.
+### Step 2: Build
+Run oh-builder for each plan phase in dependency order. Parallelizable phases → sub-agents. Auto-decide implementation choices.
+**Two-stage review (in order — never reverse):**
+1. **Spec compliance first** — Does the output match the plan/spec requirements? Quote the spec. No scope creep, no missing requirements.
+2. **Code quality second** — Only after spec compliance is ✅. Architecture, readability, test quality, edge cases.
+**Implementer status protocol** — Implementers report one of:
+| Status | Action |
+|--------|--------|
+| **DONE** | Proceed to spec review |
+| **DONE_WITH_CONCERNS** | Read concerns before proceeding |
+| **NEEDS_CONTEXT** | Provide context, re-dispatch |
+| **BLOCKED** | Assess: context problem? capability gap? task too large? plan wrong? |
+Never ignore BLOCKED or retry same approach without changes.
+### Step 3: Verify
+Check each phase against verification criteria. Tests pass → mark complete. Fail → diagnose (oh-expert), fix, re-verify.
+### Step 4: Loop
+All done → DONE. Phase fails → BLOCKER (surface). New work discovered → add to plan, continue.
+## Loop Patterns
+| Pattern | Use | Behavior |
+|---------|-----|----------|
+| sequential | Normal features | One phase at a time, verify each |
+| continuous-pr | Multi-step refactors | Per-phase PRs |
+| infinite | Watch mode, CI repair | Continue until stop signal |
+| rfc-dag | Complex deps | DAG resolution, parallelize independent branches |
+Default: sequential.
+## Escalation Triggers
+| Trigger | Condition | Action |
+|---------|-----------|--------|
+| Stall | 2 consecutive zero-progress checkpoints | Pause, report attempts |
+| Retry storm | Same error 5+ times | Stop, surface with fixes tried |
+| Cost drift | Cumulative changes exceed scope | Pause, show diff |
+| Quality regression | Verify scores lower than baseline | Pause, report |
+These are not optional. When triggered, loop **must** pause.
+## Decision Principles
+Auto-resolve: completeness > cleverness, boil the lake, pragmatic > perfect, DRY at 3rd instance, explicit > implicit, bias toward action.
+Surface only: premises, dead ends, cross-model disagreement.
+**Model selection guidance:**
+- Mechanical tasks (isolated, 1-2 files, clear spec) → fast cheap model
+- Integration tasks (multi-file, coordination) → standard model
+- Architecture/design/review tasks → most capable model
+## Blocker Protocol
+`BLOCKER: <what> | Options: A, B, C` → wait for decision.
+## Anti-patterns
+- Skipping pre-flight
+- Auto-deciding premises
+- Pushing through blockers without surfacing
+- Skipping verification
+- Parallelizing dependent phases
+- Not updating plan file
+- Ignoring escalation triggers
+- Starting code quality review before spec compliance is ✅
+- Ignoring implementer BLOCKED status and retrying with same approach
+- Pausing between tasks for progress updates (breaks flow)

package/harness/skills/oh-manifest/SKILL.md CHANGED Viewed

@@ -1,17 +1,7 @@
 ---
 name: oh-manifest
-description: "Full build loop: plan → build → verify → loop until done or blocker. Orchestrates oh-planner + oh-builder with auto-decisions."
+description: "Use when running a complete implementation pipeline from plan through verification. Orchestrates oh-planner + oh-builder with auto-decisions."
 tier: 4
-benefits-from: [oh-planner, oh-builder, oh-expert]
-triggers:
-  - "run the full build"
-  - "full build pipeline"
-  - "build loop"
-  - "build until done"
-  - "orchestrate this build"
-  - "pipeline from plan"
-  - "run the plan"
-  - "manifest this"
 route:
   pass: oh-planner
   fail: oh-expert
@@ -20,104 +10,17 @@ route:
 # oh-manifest
-Full build orchestration loop. Runs pre-flight checks → planner → builder → verify → repeat until done or a blocker is surfaced. Uses decision principles to auto-resolve intermediate questions. Only interrupts the user for genuine blockers.
+Full build orchestration loop: pre-flight → plan → build → verify → loop.
-## Pipeline
+## Steps
-### Phase 0: Pre-Flight
-Before any work begins, ALL of these MUST pass:
-- ☐ **Quality baseline** — existing tests pass (if any). Capture output for before/after comparison.
-- ☐ **Rollback path** — clean `git stash` or a committed state you can return to.
-- ☐ **Branch isolation** — confirm you are on a working branch, not main/master.
-- ☐ **Scope documented** — plan or task description exists and is unambiguous.
-If any check fails → **STOP**. Report which check failed and why. Do not proceed to Phase 1 until the blocker is resolved.
-### Step 1: Plan
-- If a plan file (`~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md`) exists, load and verify it is current
-- If not, run `oh-planner` (Mode A, B, or C depending on context)
-- Auto-decide minor scope decisions using decision principles
-- Surface only: premises that need human judgment, or plan/alternative conflicts
-### Step 2: Build
-- For each phase in the plan file, run `oh-builder` (Mode D: From Plan)
-- Implements phases in dependency order
-- Parallelizable phases may be delegated to sub-agents
-- Auto-decide implementation choices using decision principles
-### Step 3: Verify
-- Check each phase against its verification criteria in the plan file
-- Run tests if they exist
-- If phase passes: mark complete in plan file, proceed to next
-- If phase fails: diagnose (use oh-expert self-diagnosis), fix, re-verify
-- If fix is impossible within scope: surface blocker
-### Step 4: Loop or Done
-- All phases complete and verified → DONE
-- Phase failed and cannot be fixed → BLOCKER (surface to user with context)
-- Phase passed but new work discovered → add to plan, continue loop
-## Loop Patterns
-Select a pattern based on the nature of the work:
-| Pattern | Use When | Behavior |
-|---------|----------|----------|
-| **sequential** | Normal feature work | One phase at a time, verify each before next |
-| **continuous-pr** | Multi-step refactors | Each phase is its own PR — commit, push, PR per phase |
-| **infinite** | Watch mode, CI repair | Continue until external stop signal or budget exhausted |
-| **rfc-dag** | Complex dependency chains | Resolve phase ordering by DAG; parallelize independent branches |
-Default is **sequential**. Switch patterns only when the work structure demands it.
-## Escalation Triggers
-These conditions cause the loop to **pause** and surface to the user:
-| Trigger | Condition | Action |
-|---------|-----------|--------|
-| **Stall** | 2 consecutive checkpoints with zero measurable progress | Pause. Report what was attempted, what blocked. |
-| **Retry storm** | Same error message 3+ times in the loop | Stop retrying. Surface error with attempted fixes. |
-| **Cost drift** | Cumulative changes exceed scope documented in pre-flight | Pause. Show diff between planned and actual scope. |
-| **Quality regression** | Verify phase scores lower than pre-flight baseline | Pause. Report degraded metrics. Do not push through. |
-These are not optional suggestions. When a trigger fires, the loop **must** pause and report.
-## Decision Principles
-Auto-resolve these without asking the user:
-1. **Completeness over cleverness** — cover more cases
-2. **Boil the lake** — fix blast radius, not symptom
-3. **Pragmatic over perfect** — cleaner option that ships today
-4. **DRY but not premature** — third instance is the time to abstract
-5. **Explicit over implicit** — clear code over magic
-6. **Bias toward action** — when in doubt, make progress
-Surface to user only:
-- **Premises** — fundamental assumptions that change the nature of the build
-- **Dead end** — all viable paths have significant trade-offs
-- **Cross-model disagreement** — two approaches both have strong arguments
-## Blocker Protocol
-When a blocker is encountered:
-1. **Describe the blocker** — what was attempted, what failed, why it cannot proceed
-2. **Propose alternatives** — scope reduction, dependency change, architectural shift
-3. **Surface to user** with: `BLOCKER: <description> | Options: <A, B, C>`
-4. **Wait for user decision** before continuing
-## Anti-patterns
-- Skipping pre-flight (every loop needs a baseline and a rollback plan)
-- Auto-deciding premises (fundamental assumptions need user input)
-- Pushing through blockers (surface immediately, don't try 5 workarounds silently)
-- Skipping verification (verify every phase, not just the final result)
-- Parallelizing dependent phases (respect the dependency order in the plan file)
-- Forgetting to update the plan file with completion status
-- Ignoring escalation triggers (stall means pause, not try harder)
+1. Run pre-flight — verify quality baseline (tests pass), rollback path (clean stash or committed state), branch isolation, scope documented.
+2. Load or create plan — load existing plan or dispatch oh-planner. Auto-decide minor scope via decision principles.
+3. Dispatch build — run oh-builder for each phase in dependency order. Parallelize independent phases via sub-agents.
+4. Review output — spec compliance first, then code quality. Never reverse the order.
+5. Verify — check each phase against verification criteria. Tests pass → mark complete. Fail → diagnose, fix, re-verify.
+6. Handle implementer status — DONE (proceed), DONE_WITH_CONCERNS (read before proceeding), NEEDS_CONTEXT (provide and re-dispatch), BLOCKED (assess type).
+7. Loop — all done → DONE. Phase fails → BLOCKER (surface with options). New work → add to plan, continue.
 ## Routing
@@ -125,4 +28,4 @@ When a blocker is encountered:
 |---------|-------|
 | pass | → pipeline continues (planner→builder→gauntlet→ship) |
 | fail | → oh-expert (diagnose loop failure) |
-| blocker | → surface to user with context and options |
+| blocker | → surface with context and options |

package/harness/skills/oh-plan-review/DEEP.md ADDED Viewed

@@ -0,0 +1,90 @@
+# oh-plan-review — Deep Reference
+## Lens Selection
+| Keywords | Lens |
+|----------|------|
+| architecture, data model, API, types, modules | Engineering |
+| UI, layout, colors, components, screens | Design |
+| CLI, SDK, dev tool, API, npm package, docs | DX |
+| product, strategy, scope, roadmap, business | Strategy |
+## Engineering Lens
+### Scope Challenge (before reviewing)
+1. Does existing code already solve any sub-problem?
+2. Minimum changes to achieve goal?
+3. 8+ files or 2+ new classes/services → smell. Challenge.
+4. Does framework have built-in for each pattern?
+5. AI completeness is cheap — recommend full over shortcuts.
+6. New artifact types need build/publish pipelines.
+### Architecture Review
+One section at a time: Architecture → Code Quality → Tests → Performance. Max 8 issues per section. Discuss each via AskUserQuestion. Anti-skip: evaluate every section; say "No issues found" if clean.
+### Cognitive patterns (internalize)
+- State diagnosis (Larson) — falling behind, treading, repaying debt, innovating?
+- Blast radius — worst case = how many systems?
+- Boring by default (McKinley) — proven tech unless you have innovation tokens
+- Reversibility — make wrong answers cheap. Feature flags, incremental rollouts.
+- Essential vs accidental complexity (Brooks) — real problem or self-inflicted?
+## Design Lens
+- Empty states — warmth, action, context when no data
+- Visual hierarchy — what's seen first, second, third?
+- Edge cases — long names, zero results, error, first-time vs power
+- AI slop — generic card grids, 3-column features, hero sections? Flag.
+- Responsive — every viewport intentional, not just stack-on-mobile
+- A11y — keyboard, screen readers, contrast, touch targets
+**Rule:** Specificity over vibes. "Clean, modern" is not a decision. Name the font, spacing, interaction, motion.
+## DX Lens
+**Evaluate:** Time to Hello World (< 2 min target). Error quality (problem + cause + fix). First 5 min friction. Progressive disclosure — simple case is prod-ready. Pit of Success — right thing is easy, wrong thing is hard.
+**Modes:**
+- **Expansion** — competitive advantage. Benchmark competitors.
+- **Polish** — bulletproof every touchpoint.
+- **Triage** — critical gaps only. Minimum viable DX.
+## Strategy Lens
+### Scope Modes
+- **Expansion** — "10x better for 2x effort?" Present as AskUserQuestion.
+- **Selective** — Surface cherry-pickable expansions. Neutral posture.
+- **Hold** — Bulletproof. Catch every failure. No silent reduction.
+- **Reduction** — Ruthless cut to MVP.
+### Patterns (internalize)
+- One-way vs two-way doors (Bezos) — most are two-way; move fast
+- Inversion (Munger) — "how do we win?" + "what makes us fail?"
+- Focus as subtraction (Jobs) — fewer things, better
+- Proxy skepticism (Bezos) — metrics serving users or self-referential?
+- Temporal depth — 5-10 year arcs. Regret minimization.
+### Prime Directives
+- Zero silent failures. Every failure mode visible.
+- Every error named — exception class, trigger, catch, message.
+- Data flows have shadow paths: nil, empty, upstream error. Trace all four.
+- Observability is first-class — new dashboards/alerts are deliverables.
+- Everything deferred written down or it doesn't exist.
+- You have permission to say "scrap it and do this instead."
+## Rules
+- **Interactive only.** One section at a time via AskUserQuestion.
+- **Anti-skip.** Every section evaluated. Zero findings → say so.
+- **Commit to lens.** Once scope agreed, don't re-argue earlier decisions.
+## Output
+Plan file (`~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md`) updated with findings and decisions.
+## Anti-patterns
+- Using the wrong lens for the question
+- Reviewing without reading the full plan first
+- Merging concerns across lenses
+- Skipping the interactive walkthrough

package/harness/skills/oh-plan-review/SKILL.md CHANGED Viewed

@@ -1,19 +1,7 @@
 ---
 name: oh-plan-review
-description: "Multi-lens plan review: 4 perspectives in one skill. Choose Engineering (architecture/scope), Design (UX/interaction), DX (API/CLI ergonomics), or Strategy (product/CEO). Interactive — walks through findings one section at a time."
+description: "Use when a plan needs multi-perspective review before execution. Choose Engineering, Design, DX, or Strategy lens — walks through findings one section at a time."
 tier: 3
-benefits-from: [oh-planner, oh-expert]
-triggers:
-  - "review this plan"
-  - "review the plan file"
-  - "architecture review of"
-  - "design review the plan"
-  - "ux review this plan"
-  - "dx review the plan"
-  - "strategy review"
-  - "engineering review"
-  - "ceo review"
-  - "review plan from"
 route:
   pass:
     - oh-grill
@@ -24,112 +12,22 @@ route:
 # oh-plan-review
-Four review lenses in one skill. Pick the lens that fits the plan's scope — or run multiple lenses in sequence for thorough coverage.
+Four-lens plan review. Interactive — walk findings one section at a time.
-**Interactive.** Walk findings one section at a time with opinionated recommendations and AskUserQuestion gates. Never dump all findings at once.
+## Steps
-**Read-only.** No code changes. The output is a better plan, not a document about the plan.
-## Lens Selection
-Ask the user which lens fits, or auto-detect from plan content:
-| Trigger keywords | Recommended lens |
-|---|---|
-| architecture, data model, API design, file structure, types, modules | Engineering |
-| UI, layout, colors, components, screens, mockups, user interface | Design |
-| CLI, SDK, developer tool, API, npm package, documentation, onboarding | DX |
-| product, strategy, scope, roadmap, competition, business model | Strategy |
-### Engineering Lens
-Scope challenge, architecture review, cognitive patterns for eng managers.
-**Scope Challenge** — Before reviewing anything:
-1. What existing code already partially solves each sub-problem?
-2. What is the minimum set of changes that achieves the stated goal?
-3. Complexity check: 8+ files or 2+ new classes/services → smell. Challenge it.
-4. Search check: does the runtime/framework have built-in support for each pattern the plan introduces?
-5. Completeness check: with AI-assisted coding, the cost of completeness is 10-100x cheaper. Recommend complete lakes over shortcuts.
-6. Distribution check: new artifact types need build/publish pipelines.
-**Architecture Review** — Walk through one section at a time: Architecture → Code Quality → Tests → Performance. Max 8 top issues per section. Use AskUserQuestion to discuss each finding.
-**Anti-skip rule:** Never condense or skip a section. If a section has zero findings, say so — but evaluate it.
-**Cognitive patterns** (internalize, don't enumerate):
-- State diagnosis (Larson) — Is your team falling behind, treading water, repaying debt, or innovating?
-- Blast radius instinct — What's the worst case and how many systems does it affect?
-- Boring by default (McKinley) — Proven technology unless you have innovation tokens to spend.
-- Reversibility preference — Feature flags, incremental rollouts. Make wrong answers cheap.
-- Essential vs accidental complexity (Brooks) — Is this solving a real problem or one we created?
-### Design Lens
-UX review, interaction state coverage, AI slop detection.
-**Evaluate:**
-- Empty states — every screen without data needs warmth, action, context
-- Visual hierarchy — what does the user see first, second, third?
-- Edge cases — 47-char names, zero results, error states, first-time vs power user
-- AI slop — generic card grids, hero sections, 3-column features? Flag them.
-- Responsive — every viewport gets intentional design, not just stack-on-mobile
-- Accessibility — keyboard nav, screen readers, contrast, touch targets
-**Principle:** Specificity over vibes. "Clean, modern UI" is not a design decision. Name the font, spacing scale, interaction pattern, and motion.
-### DX Lens
-Developer experience audit for APIs, CLIs, SDKs, libraries, platforms.
-**Evaluate:**
-- Time to Hello World — target < 2 minutes. Every extra minute drops adoption 20-30%.
-- Error quality — every error = problem + cause + fix. No "something went wrong."
-- First five minutes — one click to start. No credit card. No demo call.
-- Progressive disclosure — simple case is production-ready. Complex case uses the same API.
-- Pit of Success — make the right thing easy, the wrong thing hard.
-**Three modes:**
-- **DX Expansion** — competitive advantage. Design magical moments. Benchmark competitors.
-- **DX Polish** — bulletproof every touchpoint. No friction, no uncertainty.
-- **DX Triage** — critical gaps only. Minimum viable DX investment.
-### Strategy Lens
-Product/CEO review with 4 scope modes.
-**Select mode:**
-- **Scope Expansion** — "What would make this 10x better for 2x the effort?" Push scope up. Present each expansion as an AskUserQuestion. The user opts in or out.
-- **Selective Expansion** — Hold the baseline. Surface expansion opportunities for cherry-picking. Neutral recommendation posture.
-- **Hold Scope** — Make it bulletproof. Catch every failure mode. No silent reduction or expansion.
-- **Scope Reduction** — Find the minimum viable version. Be ruthless. Cut everything non-essential.
-**Cognitive patterns** (internalize):
-- Classification instinct (Bezos) — One-way vs two-way doors. Most things are two-way; move fast.
-- Inversion reflex (Munger) — For every "how do we win?" also ask "what would make us fail?"
-- Focus as subtraction (Jobs) — Default: do fewer things, better. 350 products → 10.
-- Proxy skepticism (Bezos) — Are our metrics still serving users or self-referential?
-- Temporal depth — Think in 5-10 year arcs. Apply regret minimization for major bets.
-**Prime directives:**
-- Zero silent failures. Every failure mode must be visible.
-- Every error has a name. Don't say "handle errors." Name the exception class, trigger, catch, user-facing message.
-- Data flows have shadow paths: nil, empty, upstream error. Trace all four.
-- Observability is scope, not afterthought. New dashboards and alerts are first-class deliverables.
-- Everything deferred must be written down. TODOS.md or it doesn't exist.
-- You have permission to say "scrap it and do this instead."
-## Output
-After each lens, the plan file (`~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md`) is updated with findings and decisions. The user reviews and accepts changes interactively.
-## Rules
-- **Interactive only.** One section at a time. Use AskUserQuestion to discuss findings before writing.
-- **Anti-skip:** Every section must be evaluated. If zero findings, say "No issues found" and move on.
-- **Anti-shortcut:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Findings go through AskUserQuestion before writing.
-- **Commit to the chosen lens.** Once scope is agreed, don't re-argue earlier decisions in later sections.
+1. Select lens — match keywords to lens using routing table (architecture → Engineering, UI → Design, CLI → DX, product → Strategy).
+2. Read the full plan before reviewing. Understand scope before evaluating.
+3. Walk through sections one at a time — interactive via AskUserQuestion.
+4. Apply lens-specific criteria — scope challenge, architecture review, cognitive patterns for Engineering; empty states, hierarchy, a11y for Design; Hello World time, error quality for DX; scope modes, prime directives for Strategy.
+5. Surface findings per section — max 8 issues per section. Zero findings → say so. Anti-skip: evaluate every section.
+6. Update plan file — record findings and decisions in canonical plan storage.
+7. Route result — pass to execution or stress-testing, fail back to revision.
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → oh-grill (if concerns remain) or oh-manifest (execute plan) |
-| fail | → oh-planner (revise plan based on findings) |
-| blocker | → surface to user |
+| pass | → oh-grill (if concerns remain) or oh-manifest (execute) |
+| fail | → oh-planner (revise) |
+| blocker | → surface |

package/harness/skills/oh-planner/DEEP.md ADDED Viewed

@@ -0,0 +1,172 @@
+# oh-planner — Deep Reference
+## Mode A: Brainstorm (fuzzy idea)
+Use when the concept is vague ("what if", "I have an idea") and needs shaping into something concrete.
+### Process
+Ask these 6 clarifying questions in order:
+1. **Who specifically needs this?** — Identify the exact user or stakeholder. Not "developers" but "frontend devs doing state management in React 19".
+2. **What do they do today?** — Current workflow, tooling, and pain points. What's the manual/partial solution?
+3. **What's the one concrete thing they can't do?** — The single capability gap. If they had one new thing, what would it be?
+4. **What's the smallest useful version?** — Minimum scope that delivers real value. Strip everything non-essential.
+5. **What signals success?** — Observable, measurable outcomes. Not "better DX" but "setup drops from 15min to 2min".
+6. **Does this compound or plateau?** — Will this unlock further improvements (compound) or is it a one-time fix (plateau)? Compound features get deeper investment.
+### Output
+Structured design doc covering: user definition, current workflow, capability gap, minimum viable scope, success metrics, growth trajectory.
+## Mode B: Architecture Analysis (existing codebase)
+Use when the codebase feels messy or you need to understand the surface before planning.
+### Process
+1. **Read domain** — Load `CONTEXT.md` (or equivalent domain doc). Understand the language, concepts, and shared terms before touching code. Domain-blind analysis produces wrong recommendations.
+2. **Map the surface** — Identify module boundaries and responsibilities, dependency direction, public API surfaces vs internal implementation, configuration and extension points.
+3. **Find deepening opportunities** — Look for duplication, over-coupling, grown-beyond-purpose files, dead code or unused abstractions, inconsistent patterns.
+4. **Rank by impact** — For each finding, assess effort, value, dependencies, risk.
+### Output
+Ranked list of refactoring candidates with effort/value/risk assessment. Each candidate includes: location, problem description, recommended change, and estimated effort.
+## Mode C: Structured Plan (non-trivial feature)
+Use when requirements exist and need a formal plan document to execute from.
+### 1. Scope Challenge
+Before writing anything, challenge the scope:
+- **What existing code partially solves it?** — Don't build from scratch if 60% exists.
+- **Minimum changes?** — What's the smallest diff that ships the feature?
+- **Complexity check:** 8+ files changed is a smell. Flag it. Reconsider the approach.
+- **Search check:** For each architecture pattern in your approach, search `{framework} {pattern} built-in`. Flag custom solutions where framework built-ins exist.
+- **Completeness check:** AI-assisted completeness is 10-100x cheaper than human teams. Default to full coverage, not minimal.
+- **Distribution check:** New artifact types may need pipelines (build, test, deploy, publish). Include them.
+### 2. Strategy Review
+Challenge premises: Is this the right problem to solve? Identify scope decisions explicitly (what's in, what's out, why). Consider 10x alternatives. Who owns the outcome? Who reviews?
+### 3. Architecture Review
+Analyze: data flow, component boundaries, API surface, state model.
+### 4. Edge Case Analysis
+Cover: error states, concurrency, failure modes, security.
+### 5. Dependency Mapping
+Map what blocks what: identify parallelizable work streams, note external dependencies, order phases so nothing blocks on unfinished upstream work.
+### 6. Write Plan
+Produce a structured artifact with: phases, dependencies, verification steps per phase, and exit criteria.
+### 7. Self-Review Checklist
+1. **Spec coverage** — Skim each requirement from the original request. Can you point to a task that implements it? List any gaps and add missing tasks.
+2. **Placeholder scan** — Search the plan for banned patterns: "TBD", "TODO", "implement later", "handle edge cases", "fill in details". Replace every instance with concrete content.
+3. **Type consistency** — Do types, method signatures, and property names match across tasks? A function called `clearLayers()` in Task 3 but `clearFullLayers()` in Task 7 is a bug. Fix cross-references.
+Fix any issues inline — no need to re-review, just fix and move on. If a spec requirement has no task, add the task.
+## Mode D: Autoplan (existing plan needs full review)
+Use when a plan exists and needs comprehensive automated review. Auto-decides 90% of intermediate questions.
+### Phase Order
+Runs sequentially: **Strategy → Architecture → Design → Engineering → DX**. Each phase must complete before the next begins. No jumping ahead.
+### Auto-Resolution Principles
+| # | Principle | Meaning |
+|---|-----------|---------|
+| 1 | **Completeness over cleverness** | Cover more cases. Clever shortcuts miss edge cases. |
+| 2 | **Boil the lake** | Fix blast radius, not symptom. If a module is misdesigned, refactor it — don't patch around it. |
+| 3 | **Pragmatic over perfect** | Ships today wins. Perfect designs that never ship are worthless. |
+| 4 | **DRY but not premature** | Reuse what exists. But don't abstract until the 3rd concrete instance appears. |
+| 5 | **Explicit over implicit** | Clear code over magic. Magic is fun to write, terrible to debug. |
+| 6 | **Bias toward action** | When in doubt, make progress. Analysis paralysis is a decision too. |
+### Never Auto-Decide
+- **Premises** — Core assumptions about what to build. These need human judgment.
+- **Close calls** — Decisions where both options have strong, valid arguments. Surface for discussion.
+## Plan Artifact Format
+Every plan written by oh-planner uses this canonical format.
+### Storage
+Canonical path: `~/.local/share/opencode/openhermes/plans/<project>-plan-<nnn>.md`
+### Template
+```markdown
+# PLAN: <project>
+Plan ID: <project>-plan-<nnn>
+Project: <project>
+Status: active | in-progress | blocked | complete | abandoned
+Created: <ts> | Updated: <ts>
+Project Path: <absolute-path>
+Plan Path: <canonical-path>/<project>-plan-<nnn>.md
+Objective: <short>
+## Current State
+— What exists now, what phase we're in.
+## Assumptions
+— Decisions we're making without full information.
+## Tasks
+- [ ] Task 1
+  - [ ] Subtask 1.1
+## Active Task
+— What's being worked on right now.
+## Subagents
+| Agent | Purpose | Status | Findings |
+## Completed
+— Finished tasks with dates.
+## Work Log
+— Running log of decisions and progress.
+## Blockers
+— What's stopping progress.
+## Validation
+- [ ] Static checks
+- [ ] Unit tests
+- [ ] Manual verification
+## Decisions
+— Key decisions and their rationale.
+## Notes
+— Miscellaneous context.
+```
+### Task Rules
+- **Bite-Sized Granularity** — Each step is one action, 2-5 minutes.
+- **No Placeholder Rule** — Banned: TBD, TODO, "implement later", "fill in details", "add appropriate error handling", "add validation", "handle edge cases", "write tests for the above" (without actual test code), "Similar to Task N".
+- **Complete Code in Every Step** — If a step changes code, show the complete code inline. Use exact file paths always.
+- **Expected Output** — Every test step must include the exact command to run and the expected output.
+### Execution Handoff
+After saving a plan, offer the user an execution choice:
+> **Plan saved. Two execution options:**
+> **1. Subagent-Driven (recommended)** — I dispatch a fresh subagent per task with two-stage review between tasks for fast iteration.
+> **2. Inline Execution** — Execute tasks in this session with batch execution and checkpoints.
+### Rules
+- **Self-contained** — Tasks, Completed, Subagents, and Work Log live in this one file. No separate `todo.md` or `work-log.md`.
+- **Status tracks lifecycle** — Only use: `active`, `in-progress`, `blocked`, `complete`, `abandoned`.
+- **Validation lives with the plan** — Each plan defines its own verification criteria.
+## Anti-patterns
+- Skipping strategy review for complex features (architecture mistakes compound)
+- Wrong granularity — too vague to execute or too detailed to read
+- Re-opening decided debates ("what if we rewrite in Rust?")
+- Perfect > shipped (progress > polish)
+- Not flagging taste decisions to user
+- Big bang rewrites — plan increments, not overhauls
+- Skipping the user-approval gate — implementing before the user has reviewed and approved the design document
+- Placeholders in plan tasks (TBD, TODO, "implement later" — makes plan unexecutable)
+- Missing expected output in test steps