openhermes 4.3.0 → 4.9.2
- package/CONTEXT.md +9 -0
- package/README.md +26 -15
- package/bootstrap.ts +161 -124
- package/harness/agents/oh-browser.md +97 -0
- package/harness/agents/oh-builder.md +78 -0
- package/harness/agents/oh-facade.md +75 -0
- package/harness/agents/oh-fusion.md +45 -0
- package/harness/agents/oh-gauntlet.md +71 -0
- package/harness/agents/oh-grill.md +71 -0
- package/harness/agents/oh-investigate.md +60 -0
- package/harness/agents/oh-manifest.md +95 -0
- package/harness/agents/oh-plan-review.md +40 -0
- package/harness/agents/oh-planner.md +50 -0
- package/harness/agents/oh-refactor.md +37 -0
- package/harness/agents/oh-retro.md +46 -0
- package/harness/agents/oh-review.md +85 -0
- package/harness/agents/oh-security.md +83 -0
- package/harness/agents/oh-ship.md +76 -0
- package/harness/agents/oh-skill-craft.md +38 -0
- package/harness/agents/openhermes.md +107 -53
- package/harness/codex/AUTOPILOT.md +143 -91
- package/harness/codex/CHARTER.md +81 -0
- package/harness/commands/oh-doctor.md +193 -14
- package/harness/instructions/SHELL.md +76 -0
- package/harness/skills/oh-ascii/DEEP.md +292 -0
- package/harness/skills/oh-ascii/SKILL.md +31 -0
- package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
- package/harness/skills/oh-browser/DEEP.md +54 -0
- package/harness/skills/oh-browser/SKILL.md +30 -0
- package/harness/skills/oh-builder/DEEP.md +63 -0
- package/harness/skills/oh-builder/SKILL.md +12 -90
- package/harness/skills/oh-expert/DEEP.md +85 -0
- package/harness/skills/oh-expert/SKILL.md +13 -106
- package/harness/skills/oh-facade/DEEP.md +182 -0
- package/harness/skills/oh-facade/SKILL.md +15 -279
- package/harness/skills/oh-freeze/DEEP.md +18 -0
- package/harness/skills/oh-freeze/SKILL.md +10 -19
- package/harness/skills/oh-full-output/DEEP.md +25 -0
- package/harness/skills/oh-full-output/SKILL.md +12 -65
- package/harness/skills/oh-fusion/DEEP.md +120 -0
- package/harness/skills/oh-fusion/SKILL.md +17 -295
- package/harness/skills/oh-gauntlet/DEEP.md +77 -0
- package/harness/skills/oh-gauntlet/SKILL.md +13 -105
- package/harness/skills/oh-grill/DEEP.md +51 -0
- package/harness/skills/oh-grill/SKILL.md +12 -63
- package/harness/skills/oh-guard/DEEP.md +19 -0
- package/harness/skills/oh-guard/SKILL.md +10 -24
- package/harness/skills/oh-handoff/DEEP.md +48 -0
- package/harness/skills/oh-handoff/SKILL.md +13 -23
- package/harness/skills/oh-health/DEEP.md +74 -0
- package/harness/skills/oh-health/SKILL.md +13 -76
- package/harness/skills/oh-init/DEEP.md +85 -0
- package/harness/skills/oh-init/SKILL.md +13 -127
- package/harness/skills/oh-investigate/DEEP.md +171 -0
- package/harness/skills/oh-investigate/SKILL.md +13 -66
- package/harness/skills/oh-issue/DEEP.md +21 -0
- package/harness/skills/oh-issue/SKILL.md +11 -27
- package/harness/skills/oh-learn/DEEP.md +44 -0
- package/harness/skills/oh-learn/SKILL.md +12 -83
- package/harness/skills/oh-manifest/DEEP.md +92 -0
- package/harness/skills/oh-manifest/SKILL.md +11 -108
- package/harness/skills/oh-plan-review/DEEP.md +90 -0
- package/harness/skills/oh-plan-review/SKILL.md +13 -115
- package/harness/skills/oh-planner/DEEP.md +172 -0
- package/harness/skills/oh-planner/SKILL.md +12 -149
- package/harness/skills/oh-prd/DEEP.md +45 -0
- package/harness/skills/oh-prd/SKILL.md +10 -26
- package/harness/skills/oh-refactor/DEEP.md +122 -0
- package/harness/skills/oh-refactor/SKILL.md +17 -410
- package/harness/skills/oh-retro/DEEP.md +26 -0
- package/harness/skills/oh-retro/SKILL.md +12 -24
- package/harness/skills/oh-review/DEEP.md +87 -0
- package/harness/skills/oh-review/SKILL.md +11 -97
- package/harness/skills/oh-security/DEEP.md +83 -0
- package/harness/skills/oh-security/SKILL.md +14 -96
- package/harness/skills/oh-ship/DEEP.md +141 -0
- package/harness/skills/oh-ship/SKILL.md +13 -31
- package/harness/skills/oh-skill-craft/DEEP.md +369 -0
- package/harness/skills/oh-skill-craft/SKILL.md +17 -178
- package/harness/skills/oh-skills-link/DEEP.md +16 -0
- package/harness/skills/oh-skills-link/SKILL.md +10 -20
- package/harness/skills/oh-skills-list/DEEP.md +20 -0
- package/harness/skills/oh-skills-list/SKILL.md +9 -22
- package/harness/skills/oh-triage/DEEP.md +23 -0
- package/harness/skills/oh-triage/SKILL.md +8 -24
- package/harness/skills/oh-worktree/DEEP.md +169 -0
- package/harness/skills/oh-worktree/SKILL.md +32 -0
- package/lib/harness-resolver.ts +8 -10
- package/package.json +5 -3
- package/scripts/count-tokens.mjs +158 -0
- package/scripts/oh-doctor.ps1 +342 -0
- package/harness/codex/CONSTITUTION.md +0 -73
- package/harness/codex/ROUTING.md +0 -92
- package/harness/instructions/RUNTIME.md +0 -30
- package/harness/skills/oh-caveman/SKILL.md +0 -42
- package/lib/logger.ts +0 -75
@@ -1,14 +1,7 @@
 ---
 name: oh-init
-description: "
+description: "Sets up AGENTS.md, domain docs, issue tracker, and triage labels"
 tier: 2
-triggers:
-- "init this project for oh"
-- "setup project for openhermes"
-- "initialize openhermes setup"
-- "onboard this project"
-- "scaffold project setup"
-- "oh takeover this project"
 route:
 pass: done
 fail: oh-init
@@ -17,129 +10,22 @@ route:
 
 # oh-init
 
-
+Wire AGENTS.md, domain docs, issue tracker, and triage labels for a new OpenHermes project.
 
-
+## Steps
 
-
-
-
-
-
-
-
-- ☐ Canonical plan files (`~/.local/share/opencode/openhermes/plans/<project-name>-plan-*.md`)?
-- ☐ `CONTEXT.md` exists?
-- ☐ `docs/agents/` directory exists?
-
-Report findings. If everything exists, offer to skip or verify and exit.
-
-### Phase 1: AGENTS.md Wiring
-
-Check if AGENTS.md exists:
-
-**If AGENTS.md does not exist:**
-Create it with OpenHermes orchestrator header + prompts for project info:
-
-```markdown
-# <project-name>
-
-OpenHermes is the primary orchestrator. All routing, planning, and delegation flows through oh-* skills.
-
-## Project Context
-
-- **Language**: <fill in>
-- **Package manager**: <fill in>
-- **Build command**: <fill in>
-- **Test command**: <fill in>
-- **Lint/type check**: <fill in>
-
-## Key Directives
-
-- Plan first. Write to `~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md` before multi-file changes.
-- **OpenHermes never executes tasks directly. It talks/reports to the user and delegates everything to sub-agents.**
-- Verify before claiming success. Read files, run commands, confirm output.
-- Never write code, run tests, or edit files in the main context — always delegate.
-- Use oh-* skills on demand. Load via OpenCode's skill tool when relevant.
-- Plan file is self-contained (Tasks, Completed, Work Log sections).
-```
-
-Then ask the user to fill in the Project Context fields. Offer to auto-detect from package manifests.
-
-**If AGENTS.md exists** (e.g., created by OpenCode `/init`):
-Append an `## OpenHermes Orchestrator` section to the end:
-
-```markdown
-## OpenHermes Orchestrator
-
-OpenHermes is the primary orchestrator for this session.
-
-- **Orchestrator**: OpenHermes — hub-and-spoke routing through oh-* skills
-- **Plan**: `~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md` — always check before starting work
-- **Never execute**: OpenHermes talks/reports to the user and delegates everything to sub-agents
-- **Verify before claim**: read files, run commands, confirm output
-```
-
-### Phase 2: Issue Tracker
-Detect the git hosting platform:
-- **GitHub** — `gh` CLI
-- **GitLab** — `glab` CLI
-- **Local markdown** — files under `.scratch/<feature>/`
-- **Other** — freeform workflow description
-
-Confirm with the user. Write the result to `docs/agents/issue-tracker.md`.
-
-### Phase 3: Triage Labels
-The `triage` skill uses these label strings to move issues through a state machine:
-- `needs-triage` — maintainer needs to evaluate
-- `needs-info` — waiting on reporter
-- `ready-for-agent` — fully specified, AFK-ready
-- `ready-for-human` — needs human implementation
-- `wontfix` — will not be actioned
-
-If the repo already has different label names, map them. Write to `docs/agents/triage-labels.md`.
-
-### Phase 4: Domain Docs
-Configure how the project organizes domain language:
-- **Single-context** — one `CONTEXT.md` + `docs/adr/` at repo root
-- **Multi-context** — `CONTEXT-MAP.md` pointing to per-context files
-
-Scaffold `CONTEXT.md` with project name, domain description, and placeholder glossary terms. Create `docs/adr/` directory with ADR template.
-
-Write to `docs/agents/domain.md`.
-
-### Phase 5: Agent Skills Block
-Add a `## Agent skills` section to `AGENTS.md` (or `CLAUDE.md` if it exists):
-
-```markdown
-## Agent skills
-
-### Issue tracker
-<summary>. See docs/agents/issue-tracker.md.
-
-### Triage labels
-<summary>. See docs/agents/triage-labels.md.
-
-### Domain docs
-<summary>. See docs/agents/domain.md.
-```
-
-### Phase 6: Decision Record
-Record: "oh-init completed for project <name> on <date>."
-
-## Anti-patterns
-- Running init without understanding the project domain
-- Scaffolding CONTEXT.md without populating any terms
-- Creating ADR directory but never writing ADRs
-- Creating both AGENTS.md and CLAUDE.md — edit the one that exists
-- Overwriting an existing AGENTS.md created by OpenCode `/init` (append instead)
-- Creating `.opencode/` directory — plan files go to OpenCode's canonical storage, not a hidden project dir
-- Empty instinct file never getting populated (run oh-learn extract periodically)
+1. Check existing state (AGENTS.md, opencode.json, plan files, CONTEXT.md, docs/agents/)
+2. Create AGENTS.md with OH orchestrator header or append OH section to existing
+3. Detect issue tracker platform, confirm with user, write to docs/agents/issue-tracker.md
+4. Define triage labels (needs-triage, needs-info, ready-for-agent, ready-for-human, wontfix), write to docs/agents/triage-labels.md
+5. Scaffold CONTEXT.md with project name, domain, glossary placeholders; create docs/adr/ with ADR template; write to docs/agents/domain.md
+6. Append Agent skills block referencing tracker, triage, and domain docs
+7. Record decision artifact
 
 ## Routing
 
 | Outcome | Route |
 |---------|-------|
-| pass | →
-| fail | →
-| blocker | → surface
+| pass | → done |
+| fail | → oh-init |
+| blocker | → surface |
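Reviewer sketch for step 1 of the new oh-init steps ("Check existing state"): a minimal Python illustration, not code from the package. The names `check_existing_state`, `needs_init`, and `EXPECTED_ARTIFACTS` are hypothetical, and only the project-local paths named in the step are checked; plan files live outside the project tree, so they are left out here.

```python
from pathlib import Path
from typing import Dict

# Project-local artifacts named in step 1 of the oh-init steps.
# The label -> relative path mapping is an assumption for illustration.
EXPECTED_ARTIFACTS = {
    "AGENTS.md": "AGENTS.md",
    "opencode.json": "opencode.json",
    "CONTEXT.md": "CONTEXT.md",
    "docs/agents/": "docs/agents",
}


def check_existing_state(project_root: str) -> Dict[str, bool]:
    """Report which oh-init artifacts already exist under project_root."""
    root = Path(project_root)
    return {label: (root / rel).exists() for label, rel in EXPECTED_ARTIFACTS.items()}


def needs_init(state: Dict[str, bool]) -> bool:
    """True when at least one artifact is missing, so init still has work to do."""
    return not all(state.values())
```

When `needs_init` returns False, the removed 4.3.0 text suggested offering to skip or verify and exit; the new steps do not say, so that behavior is left to the caller.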
@@ -0,0 +1,171 @@
+# oh-investigate — Deep Reference
+
+## The Iron Law
+
+> **NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. Surface fixes compound into technical debt.**
+
+If you haven't completed root cause investigation, you cannot propose fixes. Violating this process is violating the spirit of debugging.
+
+## Phase 0 — Build a feedback loop
+
+**This is the actual skill. Everything else is mechanical.**
+
+A fast, deterministic, agent-runnable pass/fail signal = you find the cause. Without one, staring at code won't save you.
+
+### Ways to construct a loop (try in order)
+1. Failing test at the bug's seam
+2. Curl/HTTP script against dev server
+3. CLI invocation + fixture, diff stdout
+4. Headless browser — assert on DOM/console/network
+5. Replay captured trace in isolation
+6. Throwaway harness — minimal subset exercising the bug path
+7. Property/fuzz loop — 1000 random inputs
+8. Bisection harness — `git bisect run`-able
+9. Differential loop — old vs new version output diff
+10. HITL script — drive human with structured loop
+
+**Sharpen the loop:** Faster? Sharper signal (specific symptom, not "didn't crash")? More deterministic (pin time, seed RNG, isolate FS)? A 2s deterministic loop is a superpower.
+
+**Non-deterministic:** Goal = higher reproduction rate. Loop 100×, parallelize, add stress. 50% flake is debuggable; 1% is not.
+
+**Cannot build a loop?** Stop. Say so. List what you tried. Do NOT hypothesise without a loop.
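Reviewer sketch of the "loop 100×" advice for non-deterministic bugs: a small Python helper that measures how often a trigger reproduces the failure. Everything here is an assumption for illustration. `reproduction_rate` and `make_flaky_trigger` are hypothetical names, and the seeded random trigger merely stands in for a real repro command.

```python
import random
from typing import Callable


def reproduction_rate(trigger: Callable[[], bool], runs: int = 100) -> float:
    """Run `trigger` `runs` times and return the fraction of runs that failed.

    `trigger` returns True when the bug reproduced on that run.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = sum(1 for _ in range(runs) if trigger())
    return failures / runs


def make_flaky_trigger(failure_probability: float, seed: int) -> Callable[[], bool]:
    """Stand-in for a real repro command: fails with the given probability.

    Seeding the RNG keeps the demo itself deterministic.
    """
    rng = random.Random(seed)
    return lambda: rng.random() < failure_probability


if __name__ == "__main__":
    rate = reproduction_rate(make_flaky_trigger(0.5, seed=1234), runs=100)
    print(f"reproduced in {rate:.0%} of runs")
```

Tracking the rate before and after adding stress or narrowing timing windows shows whether a change actually made the bug easier to reproduce.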
+
+## Workflow
+
+Complete each phase before proceeding. Each phase consumes the feedback loop built in Phase 0.
+
+### Phase 1 — Root Cause Investigation
+
+**Before attempting ANY fix:**
+
+1. **Reproduce** — Loop confirms the described failure. Exact steps? Every time?
+2. **Read Error Messages** — Read stack traces completely. Note line numbers and error codes.
+3. **Check Recent Changes** — Git diff, recent commits, new dependencies, env differences.
+4. **Minimise** — Strip unrelated code. Remove noise until only the failure path remains.
+5. **Gather Evidence** — One probe per hypothesis. Change one variable. Use unique debug prefixes.
+6. **Trace Data Flow** — If error is deep in call stack, trace backward.
+
+### Phase 2 — Pattern Analysis
+
+**Find the pattern before fixing:**
+
+1. **Find Working Examples** — Locate similar working code. What works that's analogous?
+2. **Compare Against References** — Read reference implementation completely. Don't skim.
+3. **Identify Differences** — List every difference between working and broken. Don't dismiss anything.
+4. **Understand Dependencies** — What components, config, or environment does this depend on?
+
+### Phase 3 — Hypothesis & Testing
+
+**Scientific method:**
+
+1. **Form Single Hypothesis** — "I think X is root cause because Y." Be specific, not vague.
+2. **Test Minimally** — Smallest change to test hypothesis. One variable. Don't fix multiple things.
+3. **Verify** — Prediction held? → Phase 4. No → new hypothesis. DON'T stack more fixes.
+
+### Phase 4 — Implementation
+
+**Fix root cause, not symptom:**
+
+1. **Create Failing Test** — Simplest reproduction. Automated if possible. Must fail before fix.
+2. **Implement Single Fix** — Address root cause. ONE change. No "while I'm here" improvements.
+3. **Verify Fix** — Failing test passes? No other tests broken? Phase 0 loop confirms resolution?
+4. **Regression Test** — Verify existing behavior. No regression seam = architecture gap (flag it).
+5. **Document** — Log root cause + fix. State which hypothesis was correct. What was the trigger?
+
+## Root Cause Tracing
+
+Bugs manifest deep in call stacks. Fixing at the symptom treats the wrong layer. **Trace backward through the call chain to find the original trigger.**
+
+1. **Observe symptom** — Error at point of failure.
+2. **Find immediate cause** — What code directly produces this error?
+3. **What called this?** — Step one level up the call chain.
+4. **Keep tracing up** — What value was passed? Where from?
+5. **Find original trigger** — Root source of bad state. Fix here, not at symptom.
+
+**Stack trace instrumentation:**
+```
+const stack = new Error().stack;
+console.error('DEBUG <component>:', { directory, cwd, stack });
+```
+Use `console.error()` (logger may be suppressed in tests). Grep output. **Never fix just where the error appears** — trace back and add validation at each layer.
+
+## Multi-Component Diagnostics
+
+**In multi-component systems (CI → build → signing, API → service → database), add instrumentation at each boundary BEFORE proposing fixes:**
+
+- Log data entering and exiting each component
+- Verify environment/config propagation across layers
+- Check state at each layer
+
+Run once to gather evidence, identify the failing component, THEN investigate it.
+
+**Example (build pipeline):** Layer 1 (workflow → secrets?), Layer 2 (build → env vars?), Layer 3 (signing → keychain?), Layer 4 (actual signing). Reveals which layer fails in one pass.
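Reviewer sketch of the "run once, then identify the failing component" idea: a Python illustration under stated assumptions, not part of the package. `run_probes` and `first_failing_layer` are hypothetical names, each probe is assumed to be a zero-argument check that returns True when its layer looks healthy, and a probe that raises is treated as failed.

```python
from typing import Callable, List, Optional, Tuple

# A probe is (layer name, check); the check returns True when the layer looks healthy.
Probe = Tuple[str, Callable[[], bool]]


def run_probes(probes: List[Probe]) -> List[Tuple[str, bool]]:
    """Run every probe once, in order, and record each result.

    All layers are probed even after a failure, so one pass yields full evidence.
    A probe that raises is recorded as failed.
    """
    results = []
    for name, check in probes:
        try:
            ok = bool(check())
        except Exception:
            ok = False
        results.append((name, ok))
    return results


def first_failing_layer(results: List[Tuple[str, bool]]) -> Optional[str]:
    """Return the name of the earliest failing layer, or None if all passed."""
    for name, ok in results:
        if not ok:
            return name
    return None
```

Probing every layer rather than stopping at the first failure is a deliberate choice in this sketch: downstream failures are often consequences of the first one, but seeing them all in one pass avoids a second run.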
+
+## Red Flags
+
+**If you catch yourself thinking any of these, STOP. Return to Phase 1.**
+
+- "Quick fix for now, investigate later"
+- "Just try changing X and see if it works"
+- "Add multiple changes, run tests"
+- "Skip the test, I'll manually verify"
+- "It's probably X, let me fix that"
+- "I don't fully understand but this might work"
+- "Pattern says X but I'll adapt it differently"
+- Proposing solutions before tracing data flow
+- "One more fix attempt" (when already tried 2+)
+- Each fix reveals new problem in different place
+- "Here are the main problems: [lists fixes without investigation]"
+- "Issue is simple, don't need the full process"
+
+**All of these mean: STOP. Return to investigation.**
+
+## 3+ Fix Failure Rule
+
+**After 3 failed fix attempts, STOP and question the architecture.**
+
+Three failed fixes signals an architectural problem:
+- Each fix reveals new shared state/coupling in a different place
+- Fixes require "massive refactoring" to implement
+- Each fix creates new symptoms elsewhere
+
+**Do not attempt Fix #4 without architectural discussion:**
+- Is this pattern fundamentally sound?
+- Are we sticking with it through inertia?
+- Should we refactor architecture vs. continue fixing symptoms?
+
+This is NOT a failed hypothesis — this is a wrong architecture.
+
+## Partner Signal Monitoring
+
+**When your human partner says these, they mean you're guessing, not debugging:**
+
+| Phrase | Meaning |
+|--------|---------|
+| "Is that happening?" | You assumed without verifying |
+| "Will it show us...?" | You skipped evidence gathering |
+| "Stop guessing" | You're proposing fixes without understanding |
+| "We're stuck?" | Your approach isn't working |
+
+**When you see any of these: STOP. Return to Phase 1.**
+
+## Common Rationalizations
+
+| Excuse | Reality |
+|--------|---------|
+| "Issue is simple, don't need process" | Simple bugs have root causes too. Process is fast. |
+| "Emergency, no time for process" | Systematic is FASTER than guess-and-check thrashing. |
+| "Just try this first, then investigate" | First fix sets the pattern. Do it right from start. |
+| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves bug. |
+| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
+| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read fully. |
+| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
+| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Stop fixing. |
+
+## Anti-patterns
+
+- Fixing symptoms (same bug reappears)
+- Changing code without reproducing
+- Shotgun debugging (multiple changes hoping one sticks)
+- Not documenting root cause
+- Hypothesizing without a feedback loop
@@ -1,12 +1,7 @@
 ---
 name: oh-investigate
-description: "
+description: "Use when debugging any bug, test failure, or unexpected behavior. Finds root cause systematically before attempting fixes."
 tier: 2
-triggers:
-- "investigate this bug"
-- "debug this"
-- "why is this broken"
-- "root cause"
 route:
 pass: oh-builder
 fail: oh-expert
@@ -15,70 +10,22 @@ route:
 
 # oh-investigate
 
-
-When a bug is reported, a test fails, or unexpected behavior occurs. Use this before attempting any fix.
+Systematic bug diagnosis: build a feedback loop, trace root cause, fix with evidence.
 
-##
+## Steps
 
-
-
-
-
-
-
-
-
-1. **Failing test** at whatever seam reaches the bug.
-2. **Curl / HTTP script** against a running dev server.
-3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot.
-4. **Headless browser script** — drive the UI, assert on DOM/console/network.
-5. **Replay a captured trace** — save a real payload/event log, replay it in isolation.
-6. **Throwaway harness** — minimal subset of the system exercising the bug code path with a single call.
-7. **Property / fuzz loop** — run 1000 random inputs, look for the failure mode.
-8. **Bisection harness** — automate "boot at state X, check, repeat" so you can `git bisect run` it.
-9. **Differential loop** — run same input through old-version vs new-version, diff outputs.
-10. **HITL script** — last resort. Drive a human with a structured loop.
-
-### Iterate on the loop itself
-
-- Can I make it faster? (Cache setup, skip unrelated init, narrow the scope.)
-- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
-- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem.)
-
-A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
-
-### Non-deterministic bugs
-
-The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows. A 50%-flake bug is debuggable; 1% is not.
-
-### When you genuinely cannot build a loop
-
-Stop and say so explicitly. List what you tried. Do **not** proceed to hypothesise without a loop.
-
-## Workflow (consumes the loop)
-
-1. **Reproduce** — run the loop, confirm the bug appears. The loop must match the user's described failure, not a different nearby failure.
-2. **Minimise** — strip away unrelated code until the minimal reproduction remains.
-3. **Hypothesise** — generate 3–5 ranked falsifiable hypotheses before testing any. Each must state a prediction: "If X is the cause, then changing Y will make the bug disappear".
-4. **Instrument** — one probe per hypothesis. Change one variable at a time. Tag every debug log with a unique prefix (e.g. `[DEBUG-a4f2]`) for easy cleanup.
-5. **Fix** — write the regression test at a correct seam first. Watch it fail. Apply the smallest correct change. Watch it pass. Re-run the Phase 0 loop against the original scenario.
-6. **Regression test** — verify fix doesn't break existing behavior. If no correct seam exists for a regression test, that itself is a finding — flag the architecture gap.
-7. **Document** — log the root cause and fix in the handoff, issue, or relevant docs. State which hypothesis was correct so the next debugger learns.
-
-## Iron Law
-No fixes without root cause. Surface-level fixes compound into technical debt.
-
-## Anti-patterns
-- Fixing symptoms instead of causes (the same bug reappears next week)
-- Changing code without reproducing the bug first
-- "Shotgun" debugging — changing multiple things hoping one sticks
-- Not documenting root cause for future reference
-- Proceeding to hypothesise without a feedback loop
+1. Build a feedback loop — failing test, curl, CLI, headless browser, or throwaway harness. Must be fast, deterministic, and agent-runnable.
+2. Apply the Iron Law — NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. Surface fixes compound into technical debt.
+3. Reproduce and trace — confirm failure, read stack traces, check recent changes, minimise to failure path, gather evidence one probe per hypothesis.
+4. Trace backward — from symptom through call chain to original trigger. Instrument boundaries with debug logging.
+5. Find working pattern — locate similar working code, compare against references, list every difference.
+6. Form single hypothesis — "I think X is root cause because Y." Test with minimal change. One variable.
+7. Implement fix — create failing test first, apply one fix, verify resolution, regression test, document root cause.
 
 ## Routing
 
 | Outcome | Route |
 |---------|-------|
-| pass | → oh-builder (
-| fail | → oh-expert (deepen
-| blocker | → surface
+| pass | → oh-builder (fix) |
+| fail | → oh-expert (deepen) |
+| blocker | → surface |
@@ -0,0 +1,21 @@
+# oh-issue — Deep Reference
+
+## When to Use
+
+Plan/PRD needs breaking into actionable issues. Vertical tracer-bullet slices.
+
+Triggers: break into issues, create issues from plan, issue breakdown.
+
+## Issue Structure
+
+- **Title**: action-oriented ("Add user auth API")
+- **AC**: concrete, testable ("User signs up with email + password")
+- **Notes**: pointers for implementer
+- **Deps**: what must come first
+- **Labels**: type, priority, area
+
+## Anti-patterns
+
+- Horizontal slicing (no one ships "DB layer" alone)
+- Issues too large (3+ days) or too small (<1 hour)
+- Missing acceptance criteria
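Reviewer sketch connecting the Issue Structure fields above to the `gh issue create` step used by the skill: a Python illustration, not the package's implementation. The `Issue` dataclass, the function names, and the Markdown section headings in the body are assumptions; the `--title`, `--body`, `--label`, and `--milestone` flags are the standard GitHub CLI options. The sketch only builds the argument list and does not run `gh`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Issue:
    """One vertical slice, mirroring the Title/AC/Notes/Deps/Labels fields."""
    title: str
    acceptance_criteria: List[str]
    notes: str = ""
    deps: List[str] = field(default_factory=list)
    labels: List[str] = field(default_factory=list)
    milestone: Optional[str] = None


def render_body(issue: Issue) -> str:
    """Render the issue body; the section headings are an illustrative choice."""
    lines = ["## Acceptance criteria"]
    lines += [f"- [ ] {ac}" for ac in issue.acceptance_criteria]
    if issue.notes:
        lines += ["", "## Notes", issue.notes]
    if issue.deps:
        lines += ["", "## Dependencies"]
        lines += [f"- {dep}" for dep in issue.deps]
    return "\n".join(lines)


def gh_create_argv(issue: Issue) -> List[str]:
    """Build the argv for `gh issue create`; rejects issues without acceptance criteria."""
    if not issue.acceptance_criteria:
        raise ValueError("issue has no acceptance criteria")
    argv = ["gh", "issue", "create", "--title", issue.title, "--body", render_body(issue)]
    for label in issue.labels:
        argv += ["--label", label]
    if issue.milestone:
        argv += ["--milestone", issue.milestone]
    return argv
```

Passing the list to `subprocess.run(argv, check=True)` would publish the issue. Keeping construction separate from execution makes the "missing acceptance criteria" anti-pattern checkable without touching GitHub.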
@@ -1,11 +1,7 @@
 ---
 name: oh-issue
-description: "Break
+description: "Break plans/PRDs into independently-grabbable GitHub issues"
 tier: 2
-triggers:
-- "break into issues"
-- "create issues from plan"
-- "issue breakdown"
 route:
 pass: done
 fail: oh-planner
@@ -14,32 +10,20 @@ route:
 
 # oh-issue
 
-
-When a plan exists and needs to be broken into actionable issues. Uses tracer-bullet vertical slices for independent work items.
+Break plans/PRDs into vertical-slice issues with acceptance criteria and dependencies.
 
-##
-1. Read the plan or PRD
-2. Identify vertical slices — self-contained features that ship independently
-3. Write each issue with: clear title, acceptance criteria, implementation notes, dependencies
-4. Use `gh issue create` to publish each issue
-5. Label and milestone each issue appropriately
+## Steps
 
-
-
-
-
-
-- **Labels**: type, priority, area
-
-## Anti-patterns
-- Horizontal slicing (DB layer / API layer / UI layer — no one ships a layer)
-- Issues too large (3+ days) or too small (< 1 hour)
-- Writing issues without acceptance criteria
+1. Read plan or PRD
+2. Identify vertical slices — self-contained, independently shippable
+3. Write each issue with title, acceptance criteria, implementation notes, and dependencies
+4. Publish issues via `gh issue create`
+5. Apply labels and milestone
 
 ## Routing
 
 | Outcome | Route |
 |---------|-------|
-| pass | →
-| fail | → oh-planner
-| blocker | → surface
+| pass | → done |
+| fail | → oh-planner |
+| blocker | → surface |
@@ -0,0 +1,44 @@
+# oh-learn — Deep Reference
+
+## When to Use
+
+Session learnings should be captured, reviewed, or promoted as reusable instincts for future work.
+
+Triggers: learn from session, extract patterns, run oh-learn.
+
+## Instinct Data Model
+
+JSONL at `~/.local/share/opencode/openhermes/plans/<project>-instincts.jsonl`:
+
+```json
+{"trigger": "specific situation", "action": "recommended response", "confidence": 0.5, "applications": 1, "successes": 1, "category": "coding", "source": "oh-learn:extract", "ts": "2026-05-15T12:00:00Z"}
+```
+
+**Trigger:** specific, matchable (not general advice). **Action:** executable (not belief). **Confidence:** starts 0.5, +0.05 per success, -0.02/day decay. **Category:** coding, testing, security, git, planning, orchestration, debugging, ux.
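Reviewer sketch of the confidence rule stated above (start at 0.5, +0.05 per success, -0.02 per day of decay): a Python illustration, not the package's code. The function names are hypothetical, and clamping the result to the range 0.0 to 1.0 is an assumption, since the diff does not say what happens at the bounds. The removed 4.3.0 text describes the decay as applying per day without use, which is how `after_idle_days` treats it.

```python
# Constants taken from the rule text; names are illustrative.
START_CONFIDENCE = 0.5
SUCCESS_STEP = 0.05
DECAY_PER_DAY = 0.02


def clamp(value: float) -> float:
    """Keep confidence within [0.0, 1.0] (an assumption, not stated in the source)."""
    return max(0.0, min(1.0, value))


def after_success(confidence: float) -> float:
    """Confidence after one more successful application."""
    return clamp(confidence + SUCCESS_STEP)


def after_idle_days(confidence: float, days: int) -> float:
    """Confidence after `days` days without use."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return clamp(confidence - DECAY_PER_DAY * days)
```

Under these numbers, one success offsets two and a half idle days, so an instinct that is applied less than about once every two or three days drifts downward.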
+
+## Workflows
+
+### Extract
+
+Scan session for repeated decisions. For each: write instinct. Check existing file for near-duplicates. Merge (max confidence, increment applications) or append.
+
+### Evolve
+
+Read all instincts. Group by category then topic. ≥5 instincts with avg confidence ≥ 0.7 → oh-skill-craft spec. 3-4 with confidence ≥ 0.8 → suggest update to existing skill.
+
+### Promote
+
+Instincts with confidence ≥ 0.85 AND applications ≥ 10 → filter project-specific → append to global `%USERPROFILE%\.config\opencode\instincts.jsonl`. Tag promoted.
+
+### Review / Search / Prune / Export
+
+Review: totals + distributions. Search: by topic, trigger, category, confidence. Prune: stale >30d with confidence < 0.3. Export: portable JSON.
|
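Reviewer note: a minimal sketch of the Prune rule on the added line above (stale for more than 30 days with confidence < 0.3). The diff does not say which timestamp defines staleness, so this assumes the `ts` field from the instinct record is the last-use time. `parse_ts` and `prune` are hypothetical names.

```python
# Hypothetical sketch of the Prune rule.
# Assumption: the record's `ts` field marks when the instinct was last used.
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' on older Python versions."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def prune(instincts: list, now: datetime) -> list:
    """Keep everything except instincts that are both stale (>30 days) and low-confidence (<0.3)."""
    cutoff = now - timedelta(days=30)
    return [
        inst
        for inst in instincts
        if not (parse_ts(inst["ts"]) < cutoff and inst["confidence"] < 0.3)
    ]
```

Both conditions must hold for removal, so an old instinct with healthy confidence and a recent one with low confidence are each kept.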
|
36
|
+
|
|
37
|
+
## Anti-patterns
|
|
38
|
+
|
|
39
|
+
- Hoarding every observation (most aren't learnings)
|
|
40
|
+
- Never pruning
|
|
41
|
+
- Storing what not why
|
|
42
|
+
- Over-promoting to global
|
|
43
|
+
- Extracting without applying
|
|
44
|
+
- Ignoring confidence
|
|
@@ -1,11 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: oh-learn
|
|
3
|
-
description: "
|
|
3
|
+
description: "Capture, review, and promote session learnings as reusable instincts"
|
|
4
4
|
tier: 2
|
|
5
|
-
triggers:
|
|
6
|
-
- "learn from session"
|
|
7
|
-
- "extract patterns"
|
|
8
|
-
- "run oh-learn"
|
|
9
5
|
route:
|
|
10
6
|
pass: done
|
|
11
7
|
fail: surface
|
|
@@ -14,88 +10,21 @@ route:
|
|
|
14
10
|
|
|
15
11
|
# oh-learn
|
|
16
12
|
|
|
17
|
-
|
|
13
|
+
Distill session patterns into instincts, cluster into skill candidates, and promote high-signal patterns.
|
|
18
14
|
|
|
19
|
-
##
|
|
15
|
+
## Steps
|
|
20
16
|
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
**Rules:**
|
|
28
|
-
- **Trigger** — specific, matchable situation. *Not* general advice.
|
|
29
|
-
- **Action** — executable response. *Not* a belief.
|
|
30
|
-
- **Confidence** — starts at 0.5, increments +0.05 per successful application, decays -0.02 per day without use.
|
|
31
|
-
- **Category** — one of: `coding`, `testing`, `security`, `git`, `planning`, `orchestration`, `debugging`, `ux`.
|
|
32
|
-
|
|
33
|
-
## When to Use
|
|
34
|
-
|
|
35
|
-
After completing a significant piece of work, at session handoff, or when you notice the same pattern repeat 2+ times in one session. Also on explicit user request.
|
|
36
|
-
|
|
37
|
-
## Workflows
|
|
38
|
-
|
|
39
|
-
### Extract
|
|
40
|
-
Mine the current session for reusable patterns.
|
|
41
|
-
|
|
42
|
-
1. Scan recent conversation + code changes for repeated decision patterns
|
|
43
|
-
2. For each distinct pattern write an instinct: trigger, action, confidence=0.5, category
|
|
44
|
-
3. Read existing `~/.local/share/opencode/openhermes/plans/<project-name>-instincts.jsonl`, check for near-duplicate triggers
|
|
45
|
-
4. If duplicate found: merge — `confidence = max(existing, 0.8 × new)`, increment applications
|
|
46
|
-
5. If new: append line to file
|
|
47
|
-
|
|
48
|
-
**Good instinct:** trigger=`"tsc --noEmit shows 10+ errors after batch edit"`, action=`"Fix errors one at a time, re-running tsc after each, rather than batch-fixing"`, category=`"debugging"`
|
|
49
|
-
|
|
50
|
-
**Bad instinct:** `"Write clean code"` — too vague to trigger on.
|
|
51
|
-
|
|
52
|
-
### Evolve
|
|
53
|
-
Cluster related instincts into skill/command/agent candidates.
|
|
54
|
-
|
|
55
|
-
1. Read all instincts from `~/.local/share/opencode/openhermes/plans/<project-name>-instincts.jsonl`
|
|
56
|
-
2. Group by `category`, then by trigger topic similarity
|
|
57
|
-
3. **If cluster ≥ 5 instincts AND avg confidence ≥ 0.7** → generate `oh-skill-craft` spec for a new skill
|
|
58
|
-
4. **If cluster 3-4 instincts with confidence ≥ 0.8** → suggest update to existing skill
|
|
59
|
-
5. Output candidate summary with trigger list and extracted core pattern
|
|
60
|
-
|
|
61
|
-
### Promote
|
|
62
|
-
Graduate high-confidence instincts from project to global scope.
|
|
63
|
-
|
|
64
|
-
1. Scan `~/.local/share/opencode/openhermes/plans/<project-name>-instincts.jsonl` for instincts with `confidence >= 0.85 AND applications >= 10`
|
|
65
|
-
2. Filter out project-specific patterns (reference paths, local APIs, domain terms)
|
|
66
|
-
3. Append filtered candidates to `%USERPROFILE%\.config\opencode\instincts.jsonl` (global)
|
|
67
|
-
4. Tag promoted instincts with `"promoted": true` in project file
|
|
68
|
-
5. Report: "Promoted N instincts to global scope"
|
|
69
|
-
|
|
70
|
-
### Review
|
|
71
|
-
Show instinct summary: total count, confidence distribution, category breakdown, recently promoted.
|
|
72
|
-
|
|
73
|
-
### Search
|
|
74
|
-
Find instincts by topic, trigger fragment, category, or confidence range.
|
|
75
|
-
|
|
76
|
-
### Prune
|
|
77
|
-
Remove instincts stale for 30+ days with confidence < 0.3, or superseded by a higher-confidence instinct covering the same trigger.
|
|
78
|
-
|
|
79
|
-
### Export
|
|
80
|
-
Serialize instincts to portable JSON for sharing across projects or teams:
|
|
81
|
-
|
|
82
|
-
```json
|
|
83
|
-
{ "version": 1, "exported": "2026-05-15T12:00:00Z", "instincts": [...] }
|
|
84
|
-
```
|
|
85
|
-
|
|
86
|
-
## Anti-patterns
|
|
87
|
-
|
|
88
|
-
- Hoarding every observation (most things aren't learnings)
|
|
89
|
-
- Never pruning (stale knowledge is worse than no knowledge)
|
|
90
|
-
- Storing what, not why (context-less facts are forgettable)
|
|
91
|
-
- Over-promoting: not every pattern is globally useful
|
|
92
|
-
- Extracting without applying: instincts that never trigger are noise
|
|
93
|
-
- Ignoring confidence: treating all instincts as equally reliable
|
|
17
|
+
1. Scan session for repeated decisions
|
|
18
|
+
2. Write instinct for each pattern — trigger, action, confidence
|
|
19
|
+
3. Check existing file for near-duplicates; merge or append
|
|
20
|
+
4. Group instincts by category and topic
|
|
21
|
+
5. Promote high-confidence instincts (≥0.85, ≥10 applications) to global scope
|
|
22
|
+
6. Prune stale low-confidence instincts (<0.3, >30 days)
|
|
94
23
|
|
|
95
24
|
## Routing
|
|
96
25
|
|
|
97
26
|
| Outcome | Route |
|
|
98
27
|
|---------|-------|
|
|
99
|
-
| pass | →
|
|
100
|
-
| fail | →
|
|
101
|
-
| blocker | → surface
|
|
28
|
+
| pass | → done |
|
|
29
|
+
| fail | → surface |
|
|
30
|
+
| blocker | → surface |
|