npm - openhermes - Versions diffs - 4.3.0 → 4.9.2 - Mend

openhermes 4.3.0 → 4.9.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (96) hide show

package/CONTEXT.md +9 -0
package/README.md +26 -15
package/bootstrap.ts +161 -124
package/harness/agents/oh-browser.md +97 -0
package/harness/agents/oh-builder.md +78 -0
package/harness/agents/oh-facade.md +75 -0
package/harness/agents/oh-fusion.md +45 -0
package/harness/agents/oh-gauntlet.md +71 -0
package/harness/agents/oh-grill.md +71 -0
package/harness/agents/oh-investigate.md +60 -0
package/harness/agents/oh-manifest.md +95 -0
package/harness/agents/oh-plan-review.md +40 -0
package/harness/agents/oh-planner.md +50 -0
package/harness/agents/oh-refactor.md +37 -0
package/harness/agents/oh-retro.md +46 -0
package/harness/agents/oh-review.md +85 -0
package/harness/agents/oh-security.md +83 -0
package/harness/agents/oh-ship.md +76 -0
package/harness/agents/oh-skill-craft.md +38 -0
package/harness/agents/openhermes.md +107 -53
package/harness/codex/AUTOPILOT.md +143 -91
package/harness/codex/CHARTER.md +81 -0
package/harness/commands/oh-doctor.md +193 -14
package/harness/instructions/SHELL.md +76 -0
package/harness/skills/oh-ascii/DEEP.md +292 -0
package/harness/skills/oh-ascii/SKILL.md +31 -0
package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
package/harness/skills/oh-browser/DEEP.md +54 -0
package/harness/skills/oh-browser/SKILL.md +30 -0
package/harness/skills/oh-builder/DEEP.md +63 -0
package/harness/skills/oh-builder/SKILL.md +12 -90
package/harness/skills/oh-expert/DEEP.md +85 -0
package/harness/skills/oh-expert/SKILL.md +13 -106
package/harness/skills/oh-facade/DEEP.md +182 -0
package/harness/skills/oh-facade/SKILL.md +15 -279
package/harness/skills/oh-freeze/DEEP.md +18 -0
package/harness/skills/oh-freeze/SKILL.md +10 -19
package/harness/skills/oh-full-output/DEEP.md +25 -0
package/harness/skills/oh-full-output/SKILL.md +12 -65
package/harness/skills/oh-fusion/DEEP.md +120 -0
package/harness/skills/oh-fusion/SKILL.md +17 -295
package/harness/skills/oh-gauntlet/DEEP.md +77 -0
package/harness/skills/oh-gauntlet/SKILL.md +13 -105
package/harness/skills/oh-grill/DEEP.md +51 -0
package/harness/skills/oh-grill/SKILL.md +12 -63
package/harness/skills/oh-guard/DEEP.md +19 -0
package/harness/skills/oh-guard/SKILL.md +10 -24
package/harness/skills/oh-handoff/DEEP.md +48 -0
package/harness/skills/oh-handoff/SKILL.md +13 -23
package/harness/skills/oh-health/DEEP.md +74 -0
package/harness/skills/oh-health/SKILL.md +13 -76
package/harness/skills/oh-init/DEEP.md +85 -0
package/harness/skills/oh-init/SKILL.md +13 -127
package/harness/skills/oh-investigate/DEEP.md +171 -0
package/harness/skills/oh-investigate/SKILL.md +13 -66
package/harness/skills/oh-issue/DEEP.md +21 -0
package/harness/skills/oh-issue/SKILL.md +11 -27
package/harness/skills/oh-learn/DEEP.md +44 -0
package/harness/skills/oh-learn/SKILL.md +12 -83
package/harness/skills/oh-manifest/DEEP.md +92 -0
package/harness/skills/oh-manifest/SKILL.md +11 -108
package/harness/skills/oh-plan-review/DEEP.md +90 -0
package/harness/skills/oh-plan-review/SKILL.md +13 -115
package/harness/skills/oh-planner/DEEP.md +172 -0
package/harness/skills/oh-planner/SKILL.md +12 -149
package/harness/skills/oh-prd/DEEP.md +45 -0
package/harness/skills/oh-prd/SKILL.md +10 -26
package/harness/skills/oh-refactor/DEEP.md +122 -0
package/harness/skills/oh-refactor/SKILL.md +17 -410
package/harness/skills/oh-retro/DEEP.md +26 -0
package/harness/skills/oh-retro/SKILL.md +12 -24
package/harness/skills/oh-review/DEEP.md +87 -0
package/harness/skills/oh-review/SKILL.md +11 -97
package/harness/skills/oh-security/DEEP.md +83 -0
package/harness/skills/oh-security/SKILL.md +14 -96
package/harness/skills/oh-ship/DEEP.md +141 -0
package/harness/skills/oh-ship/SKILL.md +13 -31
package/harness/skills/oh-skill-craft/DEEP.md +369 -0
package/harness/skills/oh-skill-craft/SKILL.md +17 -178
package/harness/skills/oh-skills-link/DEEP.md +16 -0
package/harness/skills/oh-skills-link/SKILL.md +10 -20
package/harness/skills/oh-skills-list/DEEP.md +20 -0
package/harness/skills/oh-skills-list/SKILL.md +9 -22
package/harness/skills/oh-triage/DEEP.md +23 -0
package/harness/skills/oh-triage/SKILL.md +8 -24
package/harness/skills/oh-worktree/DEEP.md +169 -0
package/harness/skills/oh-worktree/SKILL.md +32 -0
package/lib/harness-resolver.ts +8 -10
package/package.json +5 -3
package/scripts/count-tokens.mjs +158 -0
package/scripts/oh-doctor.ps1 +342 -0
package/harness/codex/CONSTITUTION.md +0 -73
package/harness/codex/ROUTING.md +0 -92
package/harness/instructions/RUNTIME.md +0 -30
package/harness/skills/oh-caveman/SKILL.md +0 -42
package/lib/logger.ts +0 -75

package/harness/skills/oh-review/SKILL.md CHANGED Viewed

@@ -1,16 +1,7 @@
 ---
 name: oh-review
-description: "Two-axis code and design review: Standards (conformance) + Spec (fidelity) in parallel sub-agents. Includes architecture deepening analysis."
+description: "Use when code, design, or PR changes need review before merging. Runs Standards + Spec review in parallel sub-agents. Includes architecture deepening and receiving-review feedback handling."
 tier: 3
-benefits-from: [oh-expert]
-triggers:
-  - "code review please"
-  - "review the code"
-  - "review the PR"
-  - "review changes since"
-  - "pr review"
-  - "design review"
-  - "review this code"
 route:
   pass:
     - oh-gauntlet
@@ -21,97 +12,20 @@ route:
 # oh-review
-Two-axis review of the diff between HEAD and a fixed point. Both axes run as parallel sub-agents, then findings are aggregated. Three modes: **Diff Review**, **Architecture Deepening**, or both in sequence.
+Two-axis review: Standards + Spec, parallel sub-agents. Three modes: Diff Review, Architecture Deepening, Receiving Feedback.
-## When to Use
-Before merging any PR or landing changes. When you need a quality gate that catches both code-quality violations and spec deviations.
+## Steps
-## Mode Selection
-- **Diff Review** (default) — Standards + Spec review of a changeset
-- **Architecture Deepening** — Surface refactoring opportunities in the codebase
-- **Full Review** — Both: diff review first, then architecture deepening pass
----
-## Mode A: Diff Review
-### 1. Pin the Fixed Point
-The user provides a branch, commit SHA, or tag. Capture `git diff <fixed-point>...HEAD` and `git log <fixed-point>..HEAD --oneline`.
-### 2. Identify the Spec Source
-Look for the originating spec in this order:
-1. Issue references in commit messages (`#123`, `Closes #45`) — fetch via `docs/agents/issue-tracker.md`
-2. A path the user passed as an argument
-3. A PRD/spec file under `docs/`, `specs/`, or `.scratch/`
-4. Ask the user
-If no spec exists, the Spec sub-agent skips and reports "no spec available."
-### 3. Identify the Standards Sources
-Collect all files documenting how code should be written:
-- AGENTS.md, CLAUDE.md, CONTRIBUTING.md
-- CONTEXT.md, ADRs
-- eslint/biome/prettier config (note tool-enforced ones — don't re-check)
-### 4. Spawn Both Sub-Agents (parallel)
-**Standards sub-agent:** Read the standards docs and the diff. Report per-file/hunk every place the diff violates a documented standard. Cite the standard source + rule. Distinguish hard violations from judgement calls. Skip anything tooling enforces.
-**Spec sub-agent:** Read the spec and the diff. Report: (a) requirements missing or partial, (b) scope creep, (c) requirements implemented but wrong. Quote the spec line for each finding.
-### 5. Aggregate
-Present findings under `## Standards` and `## Spec` headings. Do NOT merge or rerank — the two axes are deliberately separate. End with one-line summary: total findings per axis and the worst single issue.
-### Safety Check (always run inline before spawning sub-agents)
-- SQL injection vectors
-- LLM trust boundary violations
-- Conditional side effects (test vs prod)
-- Hardcoded secrets
-Block immediately if critical safety issue found — do not spawn sub-agents.
----
-## Mode B: Architecture Deepening
-Surface deepening opportunities — refactors that turn shallow modules into deep ones. Uses the **deletion test**: if deleting a module would concentrate complexity (not just move it), the module is earning its keep. If complexity vanishes, the module was a pass-through.
-### Vocabulary
-Use these terms exactly:
-- **Module** — anything with an interface and an implementation
-- **Depth** — leverage at the interface: lots of behavior behind a small interface
-- **Seam** — where an interface lives; a place behavior can be altered without editing in place
-- **Leverage** — what callers get from depth
-- **Locality** — what maintainers get from depth: change concentrated in one place
-### Process
-1. **Explore** — Read CONTEXT.md and ADRs. Walk the codebase noting friction:
-   - Where does understanding one concept require bouncing between many small modules?
-   - Where are modules shallow (interface as complex as implementation)?
-   - Where are pure functions extracted for testability but real bugs hide in how they're called?
-   - Apply the deletion test to suspected shallow modules
-2. **Present candidates** — Numbered list. For each: files, problem, solution, benefits in terms of locality/leverage. Flag ADR conflicts.
-3. **Grilling loop** — Walk the design tree with the user. Side effects: update CONTEXT.md for new terms, offer ADRs for rejected candidates.
-4. **Output** — Ranked refactoring candidates with collision warnings.
-## Scoring
-- Critical safety issue → block immediately (before sub-agents)
-- Structural concern → changes requested
-- Spec deviation → changes requested (with reference)
-- Style/nit → note for follow-up
-## Anti-patterns
-- Reviewing style before safety (wrong priority order)
-- Rubber-stamping without reading the diff
-- Requesting changes for subjective preferences
-- Merging Standards and Spec findings (one axis masks the other)
-- Proposing interfaces in deepening mode before the user picks a candidate
+1. Pin fixed point — capture `git diff <fixed>...HEAD` and `git log <fixed>..HEAD --oneline`.
+2. Find spec and standards sources — issues, user paths, docs, AGENTS.md, ADRs, lint config.
+3. Run safety check — SQL injection, trust boundaries, hardcoded secrets. Block immediately if critical.
+4. Spawn parallel sub-agents — Standards (cite violations per standard) and Spec (quote requirements). Report independently.
+5. Aggregate findings — present under Standards/Spec sections. Do not merge. End with total + worst issue.
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → oh-gauntlet (if code changes needed) or oh-ship |
-| fail | → oh-builder (fix violations found) |
-| blocker | → surface to user |
+| pass | → oh-gauntlet or oh-ship |
+| fail | → oh-builder |
+| blocker | → surface |

package/harness/skills/oh-security/DEEP.md ADDED Viewed

@@ -0,0 +1,83 @@
+# oh-security — Deep Reference
+## When to Use
+Use when the codebase needs a security audit — secrets scanning, dependency checks, CI/CD review, and threat modeling. Two modes: daily (fast) and comprehensive (deep).
+Benefits from: oh-expert
+Triggers: "security audit", "threat model", "check for vulnerabilities", "owasp review", "pentest", "security review", "cso"
+## Modes
+- **Daily** (default) — only flag findings with strong evidence. Skips speculative checks.
+- **Comprehensive** (`--comprehensive`) — surface everything plausible. User decides.
+## Phases
+### Phase 0: Stack + Architecture Mental Model
+Detect language, framework, components, trust boundaries, data flows, attack surface.
+### Phase 1: Attack Surface Census
+Public vs authed vs admin endpoints. File uploads, external integrations, WebSocket, webhooks. CI/CD workflows, containers, IaC, deploy targets.
+### Phase 2: Secrets Archaeology
+Git history for leaked credentials (AWS, OpenAI, GitHub, Slack, generic). .env tracking status. CI inline secrets.
+### Phase 3: Dependency Supply Chain
+CVEs in direct deps, install scripts in production deps, lockfile integrity, abandoned packages. Diff-mode limits to changed deps.
+### Phase 4: CI/CD Security
+Unpinned third-party actions, `pull_request_target` misuse, script injection via `${{ github.event.* }}`, secrets as env vars, CODEOWNERS on workflows.
+### Phase 5: Infrastructure Shadow
+Dockerfiles (root, secrets in ARG, missing USER), configs with prod DB URLs, IaC (overly permissive IAM, privileged K8s). Staging → prod refs.
+### Phase 6: Webhooks
+Endpoints without signature verification, TLS verification disabled, overly broad OAuth scopes.
+### Phase 7: LLM Security
+Prompt injection (user input → system prompts), unsanitized LLM output in UI, tool calls without validation, hardcoded AI keys.
+### Phase 8: OWASP + STRIDE
+Map findings to OWASP Top 10 and STRIDE. Coverage gaps identified.
+## Anti-patterns
+- Running daily mode for comprehensive needs (misses deep issues)
+- Skipping secrets archaeology in git history
+- Relying only on automated scanners without manual review
+- Not updating after dependencies change
+## Reference
+### Output Format
+```
+Security Posture Report
+Critical (n): finding — file:line — remediation
+High (n):
+Medium (n):
+Low (n):
+OWASP Coverage: A01-A10
+STRIDE: Spoofing..Elevation of Privilege
+```
+### Rules
+- Read-only (diagnosis only). Auto-fix low severity only if explicitly asked.
+- Daily: 8/10 gate. Would you stake reputation on it?
+- Comprehensive: 2/10 gate. Surface everything.
+- No false positives on git history. Placeholder values excluded. Rotated secrets still flagged.
+- Prioritize by blast radius: RCE > credential exposure > info leak > best-practice.
+- Distinguish direct vs transitive dependency findings.
+- Use Grep/Glob tools, not bash grep.

package/harness/skills/oh-security/SKILL.md CHANGED Viewed

@@ -1,16 +1,7 @@
 ---
 name: oh-security
-description: "Security audit: secrets archaeology, dependency supply chain, CI/CD security, OWASP Top 10, STRIDE threat modeling, LLM security. Two modes: daily (8/10 confidence gate) and comprehensive (2/10 bar)."
+description: "Security audit — secrets, dependencies, CI/CD, threat modeling"
 tier: 3
-benefits-from: [oh-expert]
-triggers:
-  - "security audit"
-  - "threat model"
-  - "check for vulnerabilities"
-  - "owasp review"
-  - "pentest"
-  - "security review"
-  - "cso"
 route:
   pass: surface
   fail: oh-investigate
@@ -19,96 +10,23 @@ route:
 # oh-security
-Security audit that finds the doors that are actually unlocked. Two modes: **daily** (8/10 confidence gate — low noise, high signal) and **comprehensive** (2/10 bar — casts a wider net, more findings). Output is a Security Posture Report with severity ratings and remediation plans. Does NOT make code changes — diagnosis only.
+Security audit: secrets scanning, dependency checks, CI/CD review, threat modeling. Read-only — diagnosis only.
-## Mode Selection
+## Steps
-- **Daily** (default) — 8/10 confidence gate. Only flag findings with strong evidence. Skips speculative or trace-only checks. Runs all phases but reports only clear findings.
-- **Comprehensive** (`--comprehensive`) — 2/10 bar. Surfaces more. Includes trace-only flags, speculative dependency issues, and historical pattern matching.
-## Phases
-### Phase 0: Stack Detection + Architecture Mental Model
-Detect the project's language stack and framework. Build an explicit mental model: what are the components, trust boundaries, data flows, and attack surface.
-### Phase 1: Attack Surface Census
-Map what an attacker sees:
-- Public vs authenticated vs admin endpoints
-- File upload points, external integrations, background jobs
-- WebSocket channels, webhook receivers
-- CI/CD workflows, container configs, IaC, deploy targets
-### Phase 2: Secrets Archaeology
-Scan git history for leaked credentials (AWS keys, OpenAI keys, GitHub tokens, Slack tokens, generic secrets). Check `.env` tracking status. Scan CI configs for inline secrets.
-### Phase 3: Dependency Supply Chain
-Check beyond `npm audit`: known CVEs in direct deps, install scripts in production deps, lockfile integrity, abandoned packages. Diff-mode limits to changed deps.
-### Phase 4: CI/CD Pipeline Security
-Check for unpinned third-party actions, `pull_request_target` misuse, script injection via `${{ github.event.* }}`, secrets exposed as env vars, CODEOWNERS protection on workflow files.
-### Phase 5: Infrastructure Shadow Surface
-Dockerfiles (root user, secrets in ARG, missing USER), config files with prod DB URLs, IaC (overly permissive IAM, privileged K8s). Staging configs referencing prod.
-### Phase 6: Webhook & Integration Audit
-Webhook endpoints without signature verification, TLS verification disabled in prod, overly broad OAuth scopes.
-### Phase 7: LLM & AI Security
-Prompt injection vectors (user input flowing into system prompts), unsanitized LLM output rendered in UI, tool/function calling without validation, hardcoded AI API keys.
-### Phase 8: OWASP Top 10 + STRIDE
-Map findings to OWASP Top 10 categories and STRIDE threat model. Identify gaps in coverage across categories.
-## Output Format
-```
-Security Posture Report
-══════════════════════
-Project: <name>
-Branch:  <branch>
-Mode:    daily | comprehensive
-Date:    <date>
-Critical (n):
-  - <finding> — <file:line> — <remediation>
-High (n):
-  - <finding> — <file:line> — <remediation>
-Medium (n):
-  - <finding> — <file:line> — <remediation>
-Low (n):
-  - <finding> — <file:line> — <remediation>
-OWASP Coverage:
-  A01:Broken Access Control — n findings
-  A02:Cryptographic Failures — n findings
-  ...
-STRIDE:
-  Spoofing — n
-  Tampering — n
-  Repudiation — n
-  Info Disclosure — n
-  Denial of Service — n
-  Elevation of Privilege — n
-```
-## Rules
-- **Read-only.** No fixes. Diagnosis only, except for auto-fixable low-severity findings when explicitly asked.
-- **Daily mode** gates findings at 8/10 confidence. If you would not stake your reputation on it, skip it in daily mode.
-- **Comprehensive mode** gates at 2/10. Surface everything plausible. The user decides.
-- **No false positives on git history.** Placeholder values ("your_", "changeme") excluded. Rotated secrets still flagged (they were exposed).
-- **Prioritize by blast radius.** Remote code execution > credential exposure > info leak > best-practice gap.
-- **Always distinguish direct vs transitive** dependencies in supply chain findings.
-- **Use Grep/Glob tools** for searches, not bash grep. The bash blocks below show WHAT to search, not HOW.
+1. Detect mode: daily (default, 8/10 confidence gate) or comprehensive (`--comprehensive`, 2/10 gate)
+2. Map stack, architecture, trust boundaries, data flows, and attack surface
+3. Scan git history, .env files, and CI inline configs for leaked secrets
+4. Audit dependencies for CVEs, install scripts, lockfile integrity, and abandoned packages
+5. Review CI/CD security — pinned actions, pull_request_target misuse, script injection, secrets exposure
+6. Check infrastructure — Dockerfiles, IaC, prod DB URLs, staging-to-prod references
+7. Assess OWASP Top 10 and STRIDE coverage gaps
+8. Produce Security Posture Report with criticality-ranked findings and route
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → [report findings to user] |
-| fail | → oh-investigate (deepen on findings) |
-| blocker | → surface to user |
+| pass | surface (report findings) |
+| fail | → oh-investigate (deepen) |
+| blocker | → surface |

package/harness/skills/oh-ship/DEEP.md ADDED Viewed

@@ -0,0 +1,141 @@
+# oh-ship — Deep Reference
+## When to Use
+Code ready to ship. Ships to the **current branch**. PRs are only created when explicitly stated or requested by the user — never automatically.
+## Workflow (Steps 1–4)
+### 1. Pre-flight
+Run tests, lint, typecheck. If any fail, stop and surface.
+### 2. Version bump (conditional)
+Check if a version bump is applicable:
+- If `package.json` or `VERSION` exists and user mentioned a release/bump → semver bump
+- If no version file exists or user didn't request a bump → skip
+- If unsure whether to bump → ask the user
+### 3. Changelog
+Generate from commits since last tag. Polish: consistent tense, group by type (features, fixes, breaking). Skip if no tag history.
+### 4. Commit
+Stage all changes. Commit message uses conventional commit format with **vague, professional descriptions** — do not leak implementation details. Use the git-commit skill conventions: `<type>[scope]: <short description>`.
+## Environment & Options (Steps 5–6)
+### 5. Detect Environment
+Before proceeding, determine workspace state:
+```bash
+GIT_DIR=$(cd "$(git rev-parse --git-dir)" 2>/dev/null && pwd -P)
+GIT_COMMON=$(cd "$(git rev-parse --git-common-dir)" 2>/dev/null && pwd -P)
+```
+| State | Type | Options |
+|-------|------|---------|
+| `GIT_DIR == GIT_COMMON` | Normal repo | Standard 4 |
+| `GIT_DIR != GIT_COMMON`, named branch | Worktree | Standard 4 |
+| `GIT_DIR != GIT_COMMON`, detached HEAD | Externally managed | Reduced 3 |
+### 6. Option Presentation
+Core principle: Verify → Detect → Present → Execute → Clean up.
+Determine base branch (`git merge-base HEAD main` or `git merge-base HEAD master`). Present structured options based on detected environment.
+**Normal repo or named-branch worktree (4 options):**
+```
+Implementation complete. What would you like to do?
+1. Merge into <base> locally
+2. Push + create a Pull Request
+3. Keep branch as-is
+4. Discard this work
+Which option?
+```
+**Detached HEAD — externally managed (3 options):**
+```
+Implementation complete. Detached HEAD — options:
+1. Push as new branch + create a Pull Request
+2. Keep as-is
+3. Discard this work
+Which option?
+```
+- **Option 1 (Merge):** Checkout base, pull, merge feature branch, verify tests on merged result. Run Provenance-Based Cleanup, then `git branch -d <branch>`. Done. Steps 7–11 are skipped.
+- **Option 2 (Push + PR):** Continue to Steps 7–11 (Push, PR, Deploy, Verify, Docs Sync).
+- **Option 3 (Keep):** Report "Keeping branch `<name>`." No cleanup. Stop.
+- **Option 4 (Discard):** Require typed "discard" confirmation. On confirm, run Provenance-Based Cleanup, then `git branch -D <branch>`. Done.
+## Push & Deploy (Steps 7–11)
+### 7. Push to current branch
+`git push origin <current-branch>`. Always the current branch. Never assume a different target.
+### 8. PR (only if requested)
+If the user explicitly said "create a PR", "open a pull request", or similar → create PR with summary and test evidence. If the change is very large, you may **suggest** a PR, but do not create one without explicit user confirmation.
+### 9. Deploy
+Trigger deploy (platform-specific). If no deploy target is configured, skip.
+### 10. Verify
+Smoke test or health check if applicable.
+### 11. Post-ship docs sync
+Cross-reference diff against README, CHANGELOG, ARCHITECTURE.md, CONTRIBUTING.md. Update to match what shipped.
+## Cleanup
+### Provenance-Based Cleanup
+Only runs for Option 1 (Merge) and Option 4 (Discard). Options 2 (Push + PR) and 3 (Keep) always preserve the worktree.
+1. **Detect provenance:**
+   - `GIT_DIR == GIT_COMMON` → normal repo, no worktree to clean. Done.
+   - Worktree path is under `.worktrees/`, `worktrees/`, or similar known paths → we own cleanup.
+   - Otherwise → harness-owned workspace. Do NOT remove.
+2. **Cleanup (only for owned worktrees):**
+   ```bash
+   MAIN_ROOT=$(git -C "$(git rev-parse --git-common-dir)/.." rev-parse --show-toplevel)
+   cd "$MAIN_ROOT"
+   git worktree remove "$WORKTREE_PATH"
+   git worktree prune
+   ```
+3. **Never clean up** harness-owned workspaces. If the platform provides a workspace-exit tool, use it. Otherwise leave in place.
+### Correct Ordering
+Merge → verify → cleanup → delete branch. Never delete before cleanup — `git branch -d` fails when worktree still references the branch. Always `cd` to main repo root before `git worktree remove`.
+## Quick Reference
+| Option | Merge | Push | Keep Worktree | Cleanup | Delete Branch |
+|--------|-------|------|---------------|---------|---------------|
+| 1. Merge locally | yes | — | — | yes | yes (`-d`) |
+| 2. Push + PR | — | yes | yes | — | — |
+| 3. Keep as-is | — | — | yes | — | — |
+| 4. Discard | — | — | — | yes | yes (`-D`) |
+## Branch Protocol
+- **Always push to the current branch.** Detect it with `git branch --show-current`.
+- **Always confirm before any branch-sensitive operation.** If the current branch is `main` or `master`, ask: *"Current branch is main. Are you sure? Do you mean a feature/dev branch?"*
+- **Never auto-create a PR.** The user must explicitly say "create a PR" or you may suggest one for massive changes, but never execute without confirmation.
+- **Never merge.** Merging is the user's decision.
+## Branch Confirmation Rules
+Before these operations, ALWAYS confirm the branch with the user:
+- Pushing to `main` / `master` / `production` — ask "Are you sure? Do you mean a dev branch?"
+- Creating a PR — confirm source and target branches
+- Deploying — confirm which environment
+- Version bump — confirm the bump type (major/minor/patch)
+## Anti-patterns
+- Skipping pre-flight ("just a quick fix")
+- Auto-creating a PR without the user asking
+- Pushing to main without confirmation
+- Merging without user instruction
+- Deploy without post-deploy verification
+- Not tagging releases
+- Deleting branch before removing worktree
+- Running `git worktree remove` from inside the worktree
+- Cleaning up harness-owned worktrees (provenance check required)
+- Discarding work without typed confirmation

package/harness/skills/oh-ship/SKILL.md CHANGED Viewed

@@ -1,12 +1,7 @@
 ---
 name: oh-ship
-description: "Deploy and PR pipeline — test, bump, changelog, PR, deploy, verify"
+description: "Use when code is ready to ship. Tests, version bump, commit, push to current branch, deploy, and verify. PRs only on request."
 tier: 4
-triggers:
-  - "ship this"
-  - "create a PR"
-  - "version bump"
-  - "publish"
 route:
   pass: oh-retro
   fail: oh-expert
@@ -15,35 +10,22 @@ route:
 # oh-ship
-## When to Use
-When code is ready to ship. Runs the full release pipeline from test to PR to deploy verification.
+Complete ship pipeline: pre-flight → version → changelog → commit → detect → present → execute.
-## Workflow
-1. **Pre-flight** — run test suite, lint, typecheck
-2. **Version bump** — read VERSION file or package.json, bump according to semver
-3. **Changelog** — generate from commit history since last tag. Polish voice: consistent tense, group by type (features, fixes, breaking)
-4. **Commit + push** — create release commit with changelog
-5. **PR** — create GitHub PR with summary, test evidence, deploy plan
-6. **Merge** — merge PR after CI passes
-7. **Deploy** — trigger deploy (platform-specific)
-8. **Verify** — canary check, health endpoints, smoke tests
-9. **Post-ship docs sync** — read all project docs (README, ARCHITECTURE.md, CONTRIBUTING.md), cross-reference the diff, update to match what shipped:
-   - README: new features, changed APIs, updated examples
-   - CHANGELOG: verify polish against what actually merged
-   - ARCHITECTURE.md / CONTRIBUTING.md: reflect any structural or workflow changes
-   - TODOS.md: remove completed items, add any new deferred items discovered during ship
-   - VERSION file: bump if not already done
+## Steps
-## Anti-patterns
-- Skipping pre-flight checks ("just a quick fix")
-- Bumping version without changelog
-- Deploying without post-deploy verification
-- Not tagging releases
+1. Run pre-flight — tests, lint, typecheck. Stop and surface if any fail.
+2. Version bump — conditional. If `package.json` or `VERSION` exists and user mentioned release, semver bump. Skip or ask if unsure.
+3. Generate changelog — from commits since last tag. Group by type (features, fixes, breaking). Skip if no tag history.
+4. Commit — stage all changes. Use conventional commit format with vague professional descriptions.
+5. Detect environment — normal repo, worktree, or detached HEAD. Determine base branch.
+6. Present structured options — Merge locally, Push + PR, Keep branch, or Discard.
+7. Execute chosen option — merge (verify + cleanup + delete), push (push + PR + deploy + verify + docs sync), keep, or discard (require typed confirmation).
 ## Routing
 | Outcome | Route |
 |---------|-------|
-| pass | → oh-retro (post-ship review) |
-| fail | → oh-expert (diagnose deployment failure) |
-| blocker | → surface to user |
+| pass | → surface (report success) |
+| fail | → oh-expert (diagnose) |
+| blocker | → surface |