npm - openhermes - Versions diffs - 2.6.1 → 4.0.0 - Mend

openhermes 2.6.1 → 4.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (158)

package/CONTEXT.md +18 -0
package/ETHOS.md +15 -0
package/README.md +135 -292
package/bootstrap.mjs +174 -499
package/harness/agents/openhermes.md +87 -0
package/harness/codex/CONSTITUTION.md +70 -148
package/harness/codex/ROUTING.md +126 -0
package/harness/commands/oh-doctor.md +26 -0
package/harness/instructions/CONVENTIONS.md +206 -206
package/harness/instructions/RUNTIME.md +54 -31
package/harness/skills/oh-builder/SKILL.md +98 -0
package/harness/skills/oh-caveman/SKILL.md +33 -0
package/harness/skills/oh-expert/SKILL.md +121 -0
package/harness/skills/oh-freeze/SKILL.md +28 -0
package/harness/skills/oh-gauntlet/SKILL.md +119 -0
package/harness/skills/oh-grill/SKILL.md +77 -0
package/harness/skills/oh-guard/SKILL.md +33 -0
package/harness/skills/oh-handoff/SKILL.md +33 -0
package/harness/skills/oh-health/SKILL.md +90 -0
package/harness/skills/oh-init/SKILL.md +78 -0
package/harness/skills/oh-investigate/SKILL.md +35 -0
package/harness/skills/oh-issue/SKILL.md +36 -0
package/harness/skills/oh-learn/SKILL.md +28 -0
package/harness/skills/oh-manifest/SKILL.md +84 -0
package/harness/skills/oh-plan-review/SKILL.md +128 -0
package/harness/skills/oh-planner/SKILL.md +157 -0
package/harness/skills/oh-prd/SKILL.md +35 -0
package/harness/skills/oh-retro/SKILL.md +33 -0
package/harness/skills/oh-review/SKILL.md +110 -0
package/harness/skills/oh-security/SKILL.md +110 -0
package/harness/skills/oh-ship/SKILL.md +39 -0
package/harness/skills/oh-skill-craft/SKILL.md +107 -0
package/harness/skills/oh-skills-link/SKILL.md +29 -0
package/harness/skills/oh-skills-list/SKILL.md +31 -0
package/harness/skills/oh-triage/SKILL.md +36 -0
package/index.mjs +3 -58
package/lib/harness-resolver.mjs +77 -0
package/lib/logger.mjs +62 -0
package/package.json +49 -53
package/test/plugins-behavioral.test.mjs +64 -0
package/test/plugins.test.mjs +62 -0
package/autorecall.mjs +0 -237
package/curator.mjs +0 -455
package/harness/commands/build-fix.md +0 -60
package/harness/commands/checkpoint.md +0 -68
package/harness/commands/code-review.md +0 -71
package/harness/commands/doctor.md +0 -42
package/harness/commands/eval.md +0 -89
package/harness/commands/go-build.md +0 -87
package/harness/commands/go-review.md +0 -71
package/harness/commands/harness-audit.md +0 -90
package/harness/commands/learn.md +0 -37
package/harness/commands/loop-start.md +0 -38
package/harness/commands/loop-status.md +0 -30
package/harness/commands/memory-search.md +0 -37
package/harness/commands/model-route.md +0 -32
package/harness/commands/ohc.md +0 -13
package/harness/commands/orchestrate.md +0 -88
package/harness/commands/plan.md +0 -53
package/harness/commands/quality-gate.md +0 -35
package/harness/commands/refactor-clean.md +0 -102
package/harness/commands/rust-build.md +0 -78
package/harness/commands/rust-review.md +0 -65
package/harness/commands/security.md +0 -93
package/harness/commands/setup-pm.md +0 -65
package/harness/commands/skill-create.md +0 -99
package/harness/commands/test-coverage.md +0 -80
package/harness/commands/update-codemaps.md +0 -81
package/harness/commands/update-docs.md +0 -67
package/harness/commands/verify.md +0 -68
package/harness/prompts/architect.txt +0 -189
package/harness/prompts/build-cpp.md +0 -98
package/harness/prompts/build-error-resolver.md +0 -44
package/harness/prompts/build-go.md +0 -340
package/harness/prompts/build-java.md +0 -140
package/harness/prompts/build-kotlin.md +0 -137
package/harness/prompts/build-rust.md +0 -108
package/harness/prompts/code-reviewer.md +0 -40
package/harness/prompts/doc-updater.md +0 -206
package/harness/prompts/docs-lookup.md +0 -71
package/harness/prompts/e2e-runner.txt +0 -317
package/harness/prompts/explore.md +0 -42
package/harness/prompts/harness-optimizer.md +0 -42
package/harness/prompts/loop-operator.md +0 -53
package/harness/prompts/planner.md +0 -37
package/harness/prompts/refactor-cleaner.md +0 -256
package/harness/prompts/review-cpp.md +0 -81
package/harness/prompts/review-database.md +0 -261
package/harness/prompts/review-go.md +0 -257
package/harness/prompts/review-java.md +0 -113
package/harness/prompts/review-kotlin.md +0 -143
package/harness/prompts/review-python.md +0 -101
package/harness/prompts/review-rust.md +0 -77
package/harness/prompts/security-reviewer.md +0 -42
package/harness/prompts/tdd-guide.md +0 -228
package/harness/rules/audit.md +0 -84
package/harness/rules/checkpointing.md +0 -75
package/harness/rules/context-loading.md +0 -33
package/harness/rules/credential-exposure.md +0 -0
package/harness/rules/delegation.md +0 -80
package/harness/rules/handoff.md +0 -267
package/harness/rules/memory-management.md +0 -28
package/harness/rules/precedence.md +0 -52
package/harness/rules/promotion.md +0 -46
package/harness/rules/ranking.md +0 -64
package/harness/rules/retrieval.md +0 -94
package/harness/rules/runtime-guards.md +0 -196
package/harness/rules/self-heal.md +0 -79
package/harness/rules/session-start.md +0 -34
package/harness/rules/skills-management.md +0 -165
package/harness/rules/state-drift.md +0 -192
package/harness/rules/verification.md +0 -88
package/harness/scripts/sync-commands.mjs +0 -259
package/harness/skills/.bundled_manifest +0 -17
package/harness/skills/.usage.json +0 -6
package/harness/skills/api-design/SKILL.md +0 -523
package/harness/skills/backend-patterns/SKILL.md +0 -598
package/harness/skills/coding-standards/SKILL.md +0 -549
package/harness/skills/e2e-testing/SKILL.md +0 -326
package/harness/skills/frontend-patterns/SKILL.md +0 -642
package/harness/skills/frontend-slides/SKILL.md +0 -184
package/harness/skills/security-review/SKILL.md +0 -495
package/harness/skills/strategic-compact/SKILL.md +0 -131
package/harness/skills/tdd-workflow/SKILL.md +0 -463
package/harness/skills/verification-loop/SKILL.md +0 -126
package/lib/ambient-memory.mjs +0 -167
package/lib/handoff.mjs +0 -176
package/lib/hardening.mjs +0 -128
package/lib/memory-tools-plugin.mjs +0 -365
package/lib/ohc/block-sync.mjs +0 -69
package/lib/ohc/compress/search.mjs +0 -152
package/lib/ohc/compress/state.mjs +0 -76
package/lib/ohc/config.mjs +0 -186
package/lib/ohc/message-ids.mjs +0 -168
package/lib/ohc/notify.mjs +0 -154
package/lib/ohc/protected-patterns.mjs +0 -54
package/lib/ohc/prune-apply.mjs +0 -134
package/lib/ohc/pruner.mjs +0 -610
package/lib/ohc/reaper.mjs +0 -70
package/lib/ohc/state.mjs +0 -266
package/lib/ohc/strategies/deduplication.mjs +0 -72
package/lib/ohc/strategies/index.mjs +0 -2
package/lib/ohc/strategies/purge-errors.mjs +0 -43
package/lib/ohc/token-utils.mjs +0 -26
package/lib/ohc/updater.mjs +0 -133
package/lib/paths.mjs +0 -50
package/lib/schema-validator.mjs +0 -77
package/lib/search.mjs +0 -48
package/schemas/audit.schema.json +0 -82
package/schemas/backlog.schema.json +0 -63
package/schemas/checkpoint.schema.json +0 -65
package/schemas/constraint.schema.json +0 -62
package/schemas/decision.schema.json +0 -63
package/schemas/instinct.schema.json +0 -63
package/schemas/loop-state.schema.json +0 -33
package/schemas/mistake.schema.json +0 -64
package/schemas/verification_receipt.schema.json +0 -88
package/skill-builder.mjs +0 -88

package/harness/skills/oh-learn/SKILL.md ADDED

@@ -0,0 +1,28 @@
+---
+name: oh-learn
+description: "Review, search, prune, and export session learnings"
+---
+# oh-learn
+## When to Use
+To review what the agent has learned across sessions, search for specific patterns, prune stale knowledge, or export learnings for documentation.
+## Workflow
+1. **Review** — show recent learnings with context
+2. **Search** — find learnings matching specific topics or patterns
+3. **Prune** — remove stale, redundant, or superseded learnings
+4. **Export** — format learnings for documentation or sharing
+## Anti-patterns
+- Hoarding every observation (most things aren't learnings)
+- Never pruning (stale knowledge is worse than no knowledge)
+- Storing what, not why (context-less facts are forgettable)
+## Routing
+| Outcome | Route |
+|---------|-------|
+| pass | → [done — read-only report] |
+| fail | → [surface gaps to user] |
+| blocker | → surface to user |
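The **Prune** step can be illustrated with a minimal Python sketch. The skill does not define a storage format, so the `Learning` record, its fields, the `prune` helper, and the 90-day staleness window are all hypothetical. The sketch drops a learning if it:

- lacks a recorded *why* (the "storing what, not why" anti-pattern),
- is marked superseded,
- is older than the window, or
- duplicates a newer entry on the same topic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Learning:
    # Hypothetical record shape; oh-learn does not specify a schema.
    id: str
    topic: str
    what: str
    why: str                          # context that makes the learning reusable
    last_seen: datetime
    superseded_by: Optional[str] = None


def prune(learnings, now, max_age=timedelta(days=90)):
    """Return the learnings worth keeping, newest first (hypothetical helper)."""
    kept, seen = [], set()
    # Newest first, so when two entries duplicate each other the newer one survives.
    for item in sorted(learnings, key=lambda l: l.last_seen, reverse=True):
        key = (item.topic.lower(), item.what.strip().lower())
        if (not item.why.strip()              # a fact without context
                or item.superseded_by is not None
                or now - item.last_seen > max_age
                or key in seen):              # redundant with a newer entry
            continue
        seen.add(key)
        kept.append(item)
    return kept
```

The age threshold and the duplicate key are judgment calls. A real store might compare learnings semantically rather than by exact lower-cased text.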

package/harness/skills/oh-manifest/SKILL.md ADDED

@@ -0,0 +1,84 @@
+---
+name: oh-manifest
+description: "Full build loop: plan → build → verify → loop until done or blocker. Orchestrates oh-planner + oh-builder with auto-decisions."
+tier: 4
+benefits-from: [oh-planner, oh-builder, oh-expert]
+triggers:
+  - "manifest"
+  - "full build"
+  - "build loop"
+  - "build until done"
+  - "orchestrate"
+  - "pipeline"
+  - "run the plan"
+---
+# oh-manifest
+Full build orchestration loop. Runs planner → builder → verify → repeat until done or a blocker is surfaced. Uses gstack decision principles to auto-resolve intermediate questions. Only interrupts the user for genuine blockers.
+## Pipeline
+### Step 1: Plan
+- If `.opencode/plan.md` exists, load and verify it is current
+- If not, run `oh-planner` (Mode A, B, or C depending on context)
+- Auto-decide minor scope decisions using decision principles
+- Surface only: premises that need human judgment, or plan/alternative conflicts
+### Step 2: Build
+- For each phase in plan.md, run `oh-builder` (Mode D: From Plan)
+- Implements phases in dependency order
+- Parallelizable phases may be delegated to sub-agents
+- Auto-decide implementation choices using decision principles
+### Step 3: Verify
+- Check each phase against its verification criteria in plan.md
+- Run tests if they exist
+- If phase passes: mark complete in plan.md, proceed to next
+- If phase fails: diagnose (use oh-expert self-diagnosis), fix, re-verify
+- If fix is impossible within scope: surface blocker
+### Step 4: Loop or Done
+- All phases complete and verified → DONE
+- Phase failed and cannot be fixed → BLOCKER (surface to user with context)
+- Phase passed but new work discovered → add to plan, continue loop
+## Decision Principles
+Auto-resolve these without asking the user:
+1. **Completeness over cleverness** — cover more cases
+2. **Boil the lake** — fix blast radius, not symptom
+3. **Pragmatic over perfect** — cleaner option that ships today
+4. **DRY but not premature** — third instance is the time to abstract
+5. **Explicit over implicit** — clear code over magic
+6. **Bias toward action** — when in doubt, make progress
+Surface to user only:
+- **Premises** — fundamental assumptions that change the nature of the build
+- **Dead end** — all viable paths have significant trade-offs
+- **Cross-model disagreement** — two approaches both have strong arguments
+## Blocker Protocol
+When a blocker is encountered:
+1. **Describe the blocker** — what was attempted, what failed, why it cannot proceed
+2. **Propose alternatives** — scope reduction, dependency change, architectural shift
+3. **Surface to user** with: `BLOCKER: <description> | Options: <A, B, C>`
+4. **Wait for user decision** before continuing
+## Anti-patterns
+- Auto-deciding premises (fundamental assumptions need user input)
+- Pushing through blockers (surface immediately, don't try 5 workarounds silently)
+- Skipping verification (verify every phase, not just the final result)
+- Parallelizing dependent phases (respect the dependency order in plan.md)
+- Forgetting to update plan.md with completion status
+## Routing
+| Outcome | Route |
+|---------|-------|
+| pass | → pipeline continues (planner→builder→gauntlet→ship) |
+| fail | → oh-expert (diagnose loop failure) |
+| blocker | → surface to user with context and options |

package/harness/skills/oh-plan-review/SKILL.md ADDED

@@ -0,0 +1,128 @@
+---
+name: oh-plan-review
+description: "Multi-lens plan review: 4 perspectives in one skill. Choose Engineering (architecture/scope), Design (UX/interaction), DX (API/CLI ergonomics), or Strategy (product/CEO). Interactive — walks through findings one section at a time."
+tier: 3
+benefits-from: [oh-planner, oh-expert]
+triggers:
+  - "plan review"
+  - "review the plan"
+  - "architecture review"
+  - "design review"
+  - "ux review"
+  - "dx review"
+  - "strategy review"
+  - "eng review"
+  - "ceo review"
+---
+# oh-plan-review
+Four review lenses in one skill. Pick the lens that fits the plan's scope — or run multiple lenses in sequence for thorough coverage.
+**Interactive.** Walk findings one section at a time with opinionated recommendations and AskUserQuestion gates. Never dump all findings at once.
+**Read-only.** No code changes. The output is a better plan, not a document about the plan.
+## Lens Selection
+Ask the user which lens fits, or auto-detect from plan content:
+| Trigger keywords | Recommended lens |
+|---|---|
+| architecture, data model, API design, file structure, types, modules | Engineering |
+| UI, layout, colors, components, screens, mockups, user interface | Design |
+| CLI, SDK, developer tool, API, npm package, documentation, onboarding | DX |
+| product, strategy, scope, roadmap, competition, business model | Strategy |
+### Engineering Lens
+Scope challenge, architecture review, cognitive patterns for eng managers.
+**Scope Challenge** — Before reviewing anything:
+1. What existing code already partially solves each sub-problem?
+2. What is the minimum set of changes that achieves the stated goal?
+3. Complexity check: 8+ files or 2+ new classes/services → smell. Challenge it.
+4. Search check: does the runtime/framework have built-in support for each pattern the plan introduces?
+5. Completeness check: with AI-assisted coding, completeness is 10-100x cheaper. Recommend complete lakes over shortcuts.
+6. Distribution check: new artifact types need build/publish pipelines.
+**Architecture Review** — Walk through one section at a time: Architecture → Code Quality → Tests → Performance. Max 8 top issues per section. Use AskUserQuestion to discuss each finding.
+**Anti-skip rule:** Never condense or skip a section. If a section has zero findings, say so — but evaluate it.
+**Cognitive patterns** (internalize, don't enumerate):
+- State diagnosis (Larson) — Is your team falling behind, treading water, repaying debt, or innovating?
+- Blast radius instinct — What's the worst case and how many systems does it affect?
+- Boring by default (McKinley) — Proven technology unless you have innovation tokens to spend.
+- Reversibility preference — Feature flags, incremental rollouts. Make wrong answers cheap.
+- Essential vs accidental complexity (Brooks) — Is this solving a real problem or one we created?
+### Design Lens
+UX review, interaction state coverage, AI slop detection.
+**Evaluate:**
+- Empty states — every screen without data needs warmth, action, context
+- Visual hierarchy — what does the user see first, second, third?
+- Edge cases — 47-char names, zero results, error states, first-time vs power user
+- AI slop — generic card grids, hero sections, 3-column features? Flag them.
+- Responsive — every viewport gets intentional design, not just stack-on-mobile
+- Accessibility — keyboard nav, screen readers, contrast, touch targets
+**Principle:** Specificity over vibes. "Clean, modern UI" is not a design decision. Name the font, spacing scale, interaction pattern, and motion.
+### DX Lens
+Developer experience audit for APIs, CLIs, SDKs, libraries, platforms.
+**Evaluate:**
+- Time to Hello World — target < 2 minutes. Every extra minute drops adoption 20-30%.
+- Error quality — every error = problem + cause + fix. No "something went wrong."
+- First five minutes — one click to start. No credit card. No demo call.
+- Progressive disclosure — simple case is production-ready. Complex case uses the same API.
+- Pit of Success — make the right thing easy, the wrong thing hard.
+**Three modes:**
+- **DX Expansion** — competitive advantage. Design magical moments. Benchmark competitors.
+- **DX Polish** — bulletproof every touchpoint. No friction, no uncertainty.
+- **DX Triage** — critical gaps only. Minimum viable DX investment.
+### Strategy Lens
+Product/CEO review with 4 scope modes.
+**Select mode:**
+- **Scope Expansion** — "What would make this 10x better for 2x the effort?" Push scope up. Present each expansion as an AskUserQuestion. The user opts in or out.
+- **Selective Expansion** — Hold the baseline. Surface expansion opportunities for cherry-picking. Neutral recommendation posture.
+- **Hold Scope** — Make it bulletproof. Catch every failure mode. No silent reduction or expansion.
+- **Scope Reduction** — Find the minimum viable version. Be ruthless. Cut everything non-essential.
+**Cognitive patterns** (internalize):
+- Classification instinct (Bezos) — One-way vs two-way doors. Most things are two-way; move fast.
+- Inversion reflex (Munger) — For every "how do we win?" also ask "what would make us fail?"
+- Focus as subtraction (Jobs) — Default: do fewer things, better. 350 products → 10.
+- Proxy skepticism (Bezos) — Are our metrics still serving users or self-referential?
+- Temporal depth — Think in 5-10 year arcs. Apply regret minimization for major bets.
+**Prime directives:**
+- Zero silent failures. Every failure mode must be visible.
+- Every error has a name. Don't say "handle errors." Name the exception class, trigger, catch, user-facing message.
+- Data flows have a happy path and three shadow paths: nil, empty, upstream error. Trace all four.
+- Observability is scope, not afterthought. New dashboards and alerts are first-class deliverables.
+- Everything deferred must be written down. TODOS.md or it doesn't exist.
+- You have permission to say "scrap it and do this instead."
+## Output
+After each lens, the plan file (`.opencode/plan.md`) is updated with findings and decisions. The user reviews and accepts changes interactively.
+## Rules
+- **Interactive only.** One section at a time. Use AskUserQuestion to discuss findings before writing.
+- **Anti-skip:** Every section must be evaluated. If zero findings, say "No issues found" and move on.
+- **Anti-shortcut:** The plan file is the OUTPUT of the interactive review, not a substitute for it. Findings go through AskUserQuestion before writing.
+- **Commit to the chosen lens.** Once scope is agreed, don't re-argue earlier decisions in later sections.
+## Routing
+| Outcome | Route |
+|---------|-------|
+| pass | → oh-grill (if concerns remain) or oh-manifest (execute plan) |
+| fail | → oh-planner (revise plan based on findings) |
+| blocker | → surface to user |
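Auto-detecting a lens from the Lens Selection table amounts to keyword counting. Below is a rough Python sketch. It assumes a simple whole-word match over the plan text is good enough. `recommend_lenses` is a hypothetical helper, and the keyword lists are copied from the table. "API" appears under both Engineering and DX, so one plan can score for both; that is where running several lenses in sequence helps.

```python
import re
from collections import Counter

# Keywords copied (lower-cased) from the Lens Selection table.
LENS_KEYWORDS = {
    "Engineering": ["architecture", "data model", "api design", "file structure", "types", "modules"],
    "Design": ["ui", "layout", "colors", "components", "screens", "mockups", "user interface"],
    "DX": ["cli", "sdk", "developer tool", "api", "npm package", "documentation", "onboarding"],
    "Strategy": ["product", "strategy", "scope", "roadmap", "competition", "business model"],
}


def recommend_lenses(plan_text: str) -> list[str]:
    """Lenses with at least one keyword hit, highest score first (hypothetical helper)."""
    text = plan_text.lower()
    scores = Counter()
    for lens, keywords in LENS_KEYWORDS.items():
        for keyword in keywords:
            # Whole-word match, so "ui" does not fire inside "build".
            scores[lens] += len(re.findall(rf"\b{re.escape(keyword)}\b", text))
    return [lens for lens, score in scores.most_common() if score > 0]
```

An empty result means no keyword matched. In that case the skill's fallback applies: ask the user which lens fits.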

package/harness/skills/oh-planner/SKILL.md ADDED

@@ -0,0 +1,157 @@
+---
+name: oh-planner
+description: "ALL-arounder planner — brainstorm, architect, autoplan, decision pipeline. Produces a consumable plan artifact."
+tier: 3
+benefits-from: [oh-expert, oh-grill]
+triggers:
+  - "plan this"
+  - "how should I build"
+  - "architecture"
+  - "design this feature"
+  - "brainstorm"
+  - "autoplan"
+  - "strategy"
+  - "scope this"
+---
+# oh-planner
+The ALL-arounder planner. Merges brainstorm, architecture analysis, strategy review, and automatic plan review into one skill. Produces `.opencode/plan.md` that oh-builder consumes.
+## Entry Modes
+Use the mode that matches the user's starting point:
+### Mode A: Brainstorm (exploratory)
+When the idea is fuzzy and needs shaping.
+1. **Demand reality** — who specifically needs this?
+2. **Status quo** — what do they do today?
+3. **Desperate specificity** — what's the one concrete thing they can't do?
+4. **Narrowest wedge** — what's the smallest useful version?
+5. **Observation** — what will you see/hear when it works?
+6. **Future-fit** — does this compound or plateau?
+Output: structured design doc.
+### Mode B: Architecture Analysis (existing codebase)
+When the codebase feels messy or you need to understand the surface.
+1. **Read the domain** — load CONTEXT.md, understand the language
+2. **Map the surface** — identify modules, boundaries, dependencies
+3. **Find deepening opportunities** — duplication, over-coupling, grown-beyond-purpose functions, missing abstractions
+4. **Rank by impact** — effort vs value, dependencies, risk
+Output: ranked refactoring candidates.
+### Mode C: Structured Plan (non-trivial feature)
+When requirements exist but need a plan document.
+1. **Scope challenge** — before reviewing anything, answer:
+   - What existing code already partially solves each sub-problem?
+   - What is the minimum set of changes that achieves the stated goal?
+   - **Complexity check:** 8+ files or 2+ new classes/services in a single phase → smell. Propose splitting or simplifying.
+   - **Search check:** for each architectural pattern or infrastructure component the plan introduces, check whether the runtime/framework has a built-in. Search for: `{framework} {pattern} built-in`. Flag custom solutions where built-ins exist.
+   - **Completeness check:** with AI-assisted coding, completeness is 10-100x cheaper than with human teams. If the plan shortcuts something to save human hours that only saves minutes with AI, recommend the complete version.
+2. **Strategy review** — challenge premises, identify scope decisions, consider 10x alternatives
+3. **Architecture review** — data flow, component boundaries, API surface, state model
+4. **Edge case analysis** — error states, concurrency, failure modes, security implications
+5. **Dependency mapping** — what blocks what, parallelizable work
+6. **Write plan.md** — structured artifact with phases, deps, verification steps
+### Mode D: Autoplan (plan exists, needs full review)
+When a plan file exists and needs the full gauntlet. Auto-decides 90% of questions using decision principles. Surfaces only taste decisions at a final approval gate.
+Runs in order: **Strategy → Architecture → Design → Engineering → DX**
+Each phase must complete before the next begins.
+## Decision Principles
+Use these to auto-resolve intermediate questions. Only surface to the user when options are genuinely close (taste decisions):
+1. **Completeness over cleverness** — Choose the option that covers more cases
+2. **Boil the lake** — Fix the blast radius, not the symptom
+3. **Pragmatic over perfect** — Cleaner option that ships today wins
+4. **DRY but not premature** — Reuse over rebuild, but don't abstract before the third instance
+5. **Explicit over implicit** — Clear code over magic
+6. **Bias toward action** — When in doubt, make progress
+Never auto-decide: premises (need human judgment) or cases where both the plan and the alternative have strong arguments.
+## Plan Artifact
+Output goes in `.opencode/plan.md` with this structure (matching the global AGENTS.md schema):
+```markdown
+# PLAN: <project-name>
+Plan ID: <project-name>-plan-<nnn>
+Project: <project-name>
+Status: active
+Created: <local-date-time>
+Updated: <local-date-time>
+Project Path: <absolute-project-path>
+Plan Path: .opencode/plan.md
+Objective: <short objective>
+## Current State
+<what exists now, what's missing>
+## Assumptions
+- <assumption 1>
+- <assumption 2>
+## Tasks
+- [ ] Task 1
+  - [ ] Subtask 1.1
+## Active Task
+<what's being worked on now>
+## Subagents
+| Agent | Purpose | Status | Findings |
+|---|---|---|---|
+## Completed
+- <what's done>
+## Blockers
+- None
+## Validation
+- [ ] Static checks
+- [ ] Unit tests
+- [ ] Manual verification
+## Decisions
+- <decision> — <rationale>
+## Notes
+<anything else>
+```
+## Anti-patterns
+- Skipping strategy review for complex features (architecture mistakes compound)
+- Plans at wrong granularity — too vague to execute or too detailed to read
+- Re-opening already-decided debates ("what if we rewrite in Rust?")
+- Perfect being the enemy of shipped (progress > polish)
+- Failing to flag taste decisions to the user
+- Big bang rewrites — plan increments, not overhauls
+## Routing
+| Outcome | Route |
+|---------|-------|
+| pass | → oh-grill (stress-test plan) |
+| fail | → oh-planner (revise gaps) |
+| blocker | → surface to user |

package/harness/skills/oh-prd/SKILL.md ADDED

@@ -0,0 +1,35 @@
+---
+name: oh-prd
+description: "Turn conversation context into a PRD and publish as GitHub issue"
+---
+# oh-prd
+## When to Use
+When a feature discussion has produced enough context to write a product requirements document. Captures the decision tree and outputs a structured issue.
+## Workflow
+1. Extract requirements from conversation history
+2. Structure into PRD format: problem statement, target users, requirements (must/should/could), out of scope
+3. Create as GitHub issue with `gh issue create`
+4. Add triage label for prioritisation
+## PRD Structure
+- **Problem** — what problem does this solve?
+- **Target users** — who benefits?
+- **Requirements** — must have / should have / could have
+- **Out of scope** — explicitly what's NOT included
+- **Success metrics** — how will we know it works?
+## Anti-patterns
+- Writing PRD before understanding the problem
+- Requirements that aren't testable ("fast" vs "loads in <200ms")
+- Gold-plating — every feature is "must have"
+## Routing
+| Outcome | Route |
+|---------|-------|
+| pass | → oh-issue (break PRD into actionable issues) |
+| fail | → oh-grill (stress-test unclear requirements) |
+| blocker | → surface to user |
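The workflow publishes with `gh issue create`. Here is a hedged Python sketch that renders the PRD structure into an issue body and builds the command line.

- **Real:** the `--title`, `--body`, and `--label` flags of `gh issue create`.
- **Assumed:** the `PRD` record, the helper names, and the default label name `triage`.

`gold_plated` flags the "every feature is must have" anti-pattern.

```python
from dataclasses import dataclass, field


@dataclass
class PRD:
    # Hypothetical record mirroring the PRD Structure section.
    title: str
    problem: str
    target_users: str
    must: list[str]
    should: list[str] = field(default_factory=list)
    could: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    success_metrics: list[str] = field(default_factory=list)


def _bullets(items: list[str]) -> str:
    return "\n".join(f"- {item}" for item in items) or "- (none)"


def render_body(prd: PRD) -> str:
    return "\n\n".join([
        f"## Problem\n{prd.problem}",
        f"## Target users\n{prd.target_users}",
        "## Requirements\n"
        f"### Must have\n{_bullets(prd.must)}\n"
        f"### Should have\n{_bullets(prd.should)}\n"
        f"### Could have\n{_bullets(prd.could)}",
        f"## Out of scope\n{_bullets(prd.out_of_scope)}",
        f"## Success metrics\n{_bullets(prd.success_metrics)}",
    ])


def gold_plated(prd: PRD) -> bool:
    # Everything is "must have": nothing was left to prioritise.
    return bool(prd.must) and not prd.should and not prd.could


def gh_issue_argv(prd: PRD, label: str = "triage") -> list[str]:
    # "triage" is a hypothetical label name; the skill only says "triage label".
    # Pass the result to subprocess.run(argv, check=True) to create the issue.
    return ["gh", "issue", "create", "--title", prd.title,
            "--body", render_body(prd), "--label", label]
```

Building an argument list rather than a shell string avoids quoting problems when the PRD body contains quotes or newlines.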

package/harness/skills/oh-retro/SKILL.md ADDED

@@ -0,0 +1,33 @@
+---
+name: oh-retro
+description: "Weekly engineering retrospective — analyze commit history and work patterns"
+---
+# oh-retro
+## When to Use
+At the end of a sprint or work week. Analyzes what was shipped, how it went, and what to improve.
+## Workflow
+1. **Analyze commits** — read git log since last retro
+2. **Categorize work** — features, fixes, refactors, docs, chores
+3. **Pattern analysis** — recurring themes, bottlenecks, types of bugs
+4. **Praise** — call out good work, good patterns, good decisions
+5. **Growth areas** — what could be better, with specific suggestions
+6. **Trend tracking** — compare against previous retros
+## Output
+Structured retro report with: shipped items, metrics, praise, growth areas, action items.
+## Anti-patterns
+- Blame-focused retro (it's about process, not people)
+- Action items without owners (no follow-through)
+- Same retro every week (if nothing changed, why?)
+## Routing
+| Outcome | Route |
+|---------|-------|
+| pass | → oh-planner (start next cycle with retro insights) |
+| fail | → oh-handoff (document blockers for next session) |
+| blocker | → surface to user |
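The **Categorize work** step can be sketched over `git log --oneline` output, for example from `git log --oneline --since="1 week ago"`. This Python sketch assumes commit subjects follow the Conventional Commits `type(scope): subject` convention; anything else is bucketed as `other`. `categorize` and the category names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical mapping from Conventional Commits types to retro categories.
CATEGORIES = {"feat": "features", "fix": "fixes", "refactor": "refactors",
              "docs": "docs", "chore": "chores"}
# "<sha> <type>(<scope>)!: <subject>", with scope and "!" optional.
PATTERN = re.compile(
    r"^(?P<sha>[0-9a-f]{7,40})\s+(?P<type>\w+)(?:\([^)]*\))?!?:\s*(?P<subject>.+)$")


def categorize(oneline_log: str) -> dict[str, list[str]]:
    """Group commit subjects by category (hypothetical helper)."""
    buckets = defaultdict(list)
    for line in oneline_log.strip().splitlines():
        line = line.strip()
        m = PATTERN.match(line)
        if m:
            category = CATEGORIES.get(m.group("type").lower(), "other")
            buckets[category].append(m.group("subject"))
        elif line:
            # Not conventional: keep the subject without the SHA.
            buckets["other"].append(line.split(maxsplit=1)[-1])
    return dict(buckets)
```

Counting the entries in each bucket gives the "metrics" part of the retro report. Comparing those counts with the previous retro covers **Trend tracking**.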

package/harness/skills/oh-review/SKILL.md ADDED

@@ -0,0 +1,110 @@
+---
+name: oh-review
+description: "Two-axis code and design review: Standards (conformance) + Spec (fidelity) in parallel sub-agents. Includes architecture deepening analysis."
+tier: 3
+benefits-from: [oh-expert]
+triggers:
+  - "review"
+  - "code review"
+  - "review since"
+  - "review changes"
+  - "pr review"
+  - "design review"
+---
+# oh-review
+Two-axis review of the diff between HEAD and a fixed point. Both axes run as parallel sub-agents, then findings are aggregated. Three modes: **Diff Review**, **Architecture Deepening**, or both in sequence.
+## When to Use
+Before merging any PR or landing changes. When you need a quality gate that catches both code-quality violations and spec deviations.
+## Mode Selection
+- **Diff Review** (default) — Standards + Spec review of a changeset
+- **Architecture Deepening** — Surface refactoring opportunities in the codebase
+- **Full Review** — Both: diff review first, then architecture deepening pass
+---
+## Mode A: Diff Review
+### 1. Pin the Fixed Point
+The user provides a branch, commit SHA, or tag. Capture `git diff <fixed-point>...HEAD` and `git log <fixed-point>..HEAD --oneline`.
+### 2. Identify the Spec Source
+Look for the originating spec in this order:
+1. Issue references in commit messages (`#123`, `Closes #45`) — fetch via `docs/agents/issue-tracker.md`
+2. A path the user passed as an argument
+3. A PRD/spec file under `docs/`, `specs/`, or `.scratch/`
+4. Ask the user
+If no spec exists, the Spec sub-agent skips and reports "no spec available."
+### 3. Identify the Standards Sources
+Collect all files documenting how code should be written:
+- AGENTS.md, CLAUDE.md, CONTRIBUTING.md
+- CONTEXT.md, ADRs
+- eslint/biome/prettier config (note tool-enforced ones — don't re-check)
+- Any STYLE.md, STANDARDS.md, STYLEGUIDE.md
+### 4. Spawn Both Sub-Agents (parallel)
+**Standards sub-agent:** Read the standards docs and the diff. Report per-file/hunk every place the diff violates a documented standard. Cite the standard source + rule. Distinguish hard violations from judgement calls. Skip anything tooling enforces.
+**Spec sub-agent:** Read the spec and the diff. Report: (a) requirements missing or partial, (b) scope creep, (c) requirements implemented but wrong. Quote the spec line for each finding.
+### 5. Aggregate
+Present findings under `## Standards` and `## Spec` headings. Do NOT merge or rerank — the two axes are deliberately separate. End with one-line summary: total findings per axis and the worst single issue.
+### Safety Check (always run inline before spawning sub-agents)
+- SQL injection vectors
+- LLM trust boundary violations
+- Conditional side effects (test vs prod)
+- Hardcoded secrets
+Block immediately if critical safety issue found — do not spawn sub-agents.
+---
+## Mode B: Architecture Deepening
+Surface deepening opportunities (refactors that turn shallow modules into deep ones). Uses the **deletion test**: imagine deleting a module. If its complexity would reappear, spread across its callers, the module is earning its keep. If the complexity simply vanishes, the module was a pass-through.
+### Vocabulary
+Use these terms exactly:
+- **Module** — anything with an interface and an implementation
+- **Depth** — leverage at the interface: lots of behavior behind a small interface
+- **Seam** — where an interface lives; a place behavior can be altered without editing in place
+- **Leverage** — what callers get from depth
+- **Locality** — what maintainers get from depth: change concentrated in one place
+### Process
+1. **Explore** — Read CONTEXT.md and ADRs. Walk the codebase noting friction:
+   - Where does understanding one concept require bouncing between many small modules?
+   - Where are modules shallow (interface as complex as implementation)?
+   - Where are pure functions extracted for testability but real bugs hide in how they're called?
+   - Apply the deletion test to suspected shallow modules
+2. **Present candidates** — Numbered list. For each: files, problem, solution, benefits in terms of locality/leverage. Flag ADR conflicts.
+3. **Grilling loop** — Walk the design tree with the user. Side effects: update CONTEXT.md for new terms, offer ADRs for rejected candidates.
+4. **Output** — Ranked refactoring candidates with collision warnings.
+## Scoring
+- Critical safety issue → block immediately (before sub-agents)
+- Structural concern → changes requested
+- Spec deviation → changes requested (with reference)
+- Style/nit → note for follow-up
+## Anti-patterns
+- Reviewing style before safety (wrong priority order)
+- Rubber-stamping without reading the diff
+- Requesting changes for subjective preferences
+- Merging Standards and Spec findings (one axis masks the other)
+- Proposing interfaces in deepening mode before the user picks a candidate
+## Routing
+| Outcome | Route |
+|---------|-------|
+| pass | → oh-gauntlet (if code changes needed) or oh-ship |
+| fail | → oh-builder (fix violations found) |
+| blocker | → surface to user |
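Steps 1 and 2 of Diff Review can be sketched in Python. The `git diff`/`git log` ranges are exactly those given in step 1. `review_commands`, `spec_references`, and the regular expressions are hypothetical helpers. The closing-keyword pattern covers the keywords GitHub documents for linking to issues: close, fix, and resolve, plus their -s/-d forms.

```python
import re

# Bare "#123" references; the lookbehind skips things like "abc#12" or "/#12".
ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)\b")
# GitHub closing keywords followed by an issue number, e.g. "Closes #45".
CLOSING = re.compile(r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)\b",
                     re.IGNORECASE)


def review_commands(fixed_point: str) -> list[list[str]]:
    # Three-dot diff: HEAD's changes since its merge base with the fixed point.
    # Two-dot log: commits reachable from HEAD but not from the fixed point.
    return [["git", "diff", f"{fixed_point}...HEAD"],
            ["git", "log", f"{fixed_point}..HEAD", "--oneline"]]


def spec_references(oneline_log: str) -> dict[str, list[int]]:
    """Referenced and explicitly closed issue numbers, in first-seen order."""
    refs, closes = [], []
    for line in oneline_log.splitlines():
        for n in map(int, ISSUE_REF.findall(line)):
            if n not in refs:
                refs.append(n)
        for n in map(int, CLOSING.findall(line)):
            if n not in closes:
                closes.append(n)
    return {"referenced": refs, "closes": closes}
```

The three-dot range diffs HEAD against its merge base with the fixed point, so unrelated commits that landed on the fixed-point branch stay out of the review. Issues listed under `closes` are the strongest candidates for the spec source.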