openhermes 4.3.0 → 4.9.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CONTEXT.md +9 -0
- package/README.md +26 -15
- package/bootstrap.ts +161 -124
- package/harness/agents/oh-browser.md +97 -0
- package/harness/agents/oh-builder.md +78 -0
- package/harness/agents/oh-facade.md +75 -0
- package/harness/agents/oh-fusion.md +45 -0
- package/harness/agents/oh-gauntlet.md +71 -0
- package/harness/agents/oh-grill.md +71 -0
- package/harness/agents/oh-investigate.md +60 -0
- package/harness/agents/oh-manifest.md +95 -0
- package/harness/agents/oh-plan-review.md +40 -0
- package/harness/agents/oh-planner.md +50 -0
- package/harness/agents/oh-refactor.md +37 -0
- package/harness/agents/oh-retro.md +46 -0
- package/harness/agents/oh-review.md +85 -0
- package/harness/agents/oh-security.md +83 -0
- package/harness/agents/oh-ship.md +76 -0
- package/harness/agents/oh-skill-craft.md +38 -0
- package/harness/agents/openhermes.md +107 -53
- package/harness/codex/AUTOPILOT.md +143 -91
- package/harness/codex/CHARTER.md +81 -0
- package/harness/commands/oh-doctor.md +193 -14
- package/harness/instructions/SHELL.md +76 -0
- package/harness/skills/oh-ascii/DEEP.md +292 -0
- package/harness/skills/oh-ascii/SKILL.md +31 -0
- package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
- package/harness/skills/oh-browser/DEEP.md +54 -0
- package/harness/skills/oh-browser/SKILL.md +30 -0
- package/harness/skills/oh-builder/DEEP.md +63 -0
- package/harness/skills/oh-builder/SKILL.md +12 -90
- package/harness/skills/oh-expert/DEEP.md +85 -0
- package/harness/skills/oh-expert/SKILL.md +13 -106
- package/harness/skills/oh-facade/DEEP.md +182 -0
- package/harness/skills/oh-facade/SKILL.md +15 -279
- package/harness/skills/oh-freeze/DEEP.md +18 -0
- package/harness/skills/oh-freeze/SKILL.md +10 -19
- package/harness/skills/oh-full-output/DEEP.md +25 -0
- package/harness/skills/oh-full-output/SKILL.md +12 -65
- package/harness/skills/oh-fusion/DEEP.md +120 -0
- package/harness/skills/oh-fusion/SKILL.md +17 -295
- package/harness/skills/oh-gauntlet/DEEP.md +77 -0
- package/harness/skills/oh-gauntlet/SKILL.md +13 -105
- package/harness/skills/oh-grill/DEEP.md +51 -0
- package/harness/skills/oh-grill/SKILL.md +12 -63
- package/harness/skills/oh-guard/DEEP.md +19 -0
- package/harness/skills/oh-guard/SKILL.md +10 -24
- package/harness/skills/oh-handoff/DEEP.md +48 -0
- package/harness/skills/oh-handoff/SKILL.md +13 -23
- package/harness/skills/oh-health/DEEP.md +74 -0
- package/harness/skills/oh-health/SKILL.md +13 -76
- package/harness/skills/oh-init/DEEP.md +85 -0
- package/harness/skills/oh-init/SKILL.md +13 -127
- package/harness/skills/oh-investigate/DEEP.md +171 -0
- package/harness/skills/oh-investigate/SKILL.md +13 -66
- package/harness/skills/oh-issue/DEEP.md +21 -0
- package/harness/skills/oh-issue/SKILL.md +11 -27
- package/harness/skills/oh-learn/DEEP.md +44 -0
- package/harness/skills/oh-learn/SKILL.md +12 -83
- package/harness/skills/oh-manifest/DEEP.md +92 -0
- package/harness/skills/oh-manifest/SKILL.md +11 -108
- package/harness/skills/oh-plan-review/DEEP.md +90 -0
- package/harness/skills/oh-plan-review/SKILL.md +13 -115
- package/harness/skills/oh-planner/DEEP.md +172 -0
- package/harness/skills/oh-planner/SKILL.md +12 -149
- package/harness/skills/oh-prd/DEEP.md +45 -0
- package/harness/skills/oh-prd/SKILL.md +10 -26
- package/harness/skills/oh-refactor/DEEP.md +122 -0
- package/harness/skills/oh-refactor/SKILL.md +17 -410
- package/harness/skills/oh-retro/DEEP.md +26 -0
- package/harness/skills/oh-retro/SKILL.md +12 -24
- package/harness/skills/oh-review/DEEP.md +87 -0
- package/harness/skills/oh-review/SKILL.md +11 -97
- package/harness/skills/oh-security/DEEP.md +83 -0
- package/harness/skills/oh-security/SKILL.md +14 -96
- package/harness/skills/oh-ship/DEEP.md +141 -0
- package/harness/skills/oh-ship/SKILL.md +13 -31
- package/harness/skills/oh-skill-craft/DEEP.md +369 -0
- package/harness/skills/oh-skill-craft/SKILL.md +17 -178
- package/harness/skills/oh-skills-link/DEEP.md +16 -0
- package/harness/skills/oh-skills-link/SKILL.md +10 -20
- package/harness/skills/oh-skills-list/DEEP.md +20 -0
- package/harness/skills/oh-skills-list/SKILL.md +9 -22
- package/harness/skills/oh-triage/DEEP.md +23 -0
- package/harness/skills/oh-triage/SKILL.md +8 -24
- package/harness/skills/oh-worktree/DEEP.md +169 -0
- package/harness/skills/oh-worktree/SKILL.md +32 -0
- package/lib/harness-resolver.ts +8 -10
- package/package.json +5 -3
- package/scripts/count-tokens.mjs +158 -0
- package/scripts/oh-doctor.ps1 +342 -0
- package/harness/codex/CONSTITUTION.md +0 -73
- package/harness/codex/ROUTING.md +0 -92
- package/harness/instructions/RUNTIME.md +0 -30
- package/harness/skills/oh-caveman/SKILL.md +0 -42
- package/lib/logger.ts +0 -75
|
@@ -0,0 +1,63 @@
|
|
|
1
|
+
# oh-builder — Deep Reference
|
|
2
|
+
|
|
3
|
+
## When to Use
|
|
4
|
+
|
|
5
|
+
Use when you need to build something from a plan, prototype an idea, implement code via TDD, or design an interface. Consumes plan artifacts and produces working code.
|
|
6
|
+
|
|
7
|
+
Benefits from: oh-planner, oh-expert
|
|
8
|
+
|
|
9
|
+
Triggers: "build this", "implement this phase", "write the code for", "prototype", "tdd", "red-green", "design an interface", "implement the feature", "build the component"
|
|
10
|
+
|
|
11
|
+
## Entry Modes
|
|
12
|
+
|
|
13
|
+
### Mode A: Prototype (exploratory)
|
|
14
|
+
|
|
15
|
+
Pick branch by question:
|
|
16
|
+
- **"Does this logic feel right?"** → Terminal branch. Tiny interactive app pushing state machine.
|
|
17
|
+
- **"What should this look like?"** → UI branch. Multiple visual variations switchable via param/control bar.
|
|
18
|
+
|
|
19
|
+
**Rules:** Throwaway from day one (clear name). One command to run. No persistence (memory state). Skip polish (no tests, minimal error handling). Surface state after every action. Delete/absorb answer when done.
|
|
20
|
+
|
|
21
|
+
### Mode B: TDD (test-first)
|
|
22
|
+
|
|
23
|
+
Red-green-refactor with vertical tracer bullets.
|
|
24
|
+
|
|
25
|
+
**Plan:** Confirm interface changes with user. Prioritize behaviors. Design for testability (public interface only). List behaviors, not implementation steps.
|
|
26
|
+
|
|
27
|
+
**Loop** per behavior:
|
|
28
|
+
```
|
|
29
|
+
RED: One test → fails
|
|
30
|
+
GREEN: Minimal code → passes
|
|
31
|
+
```
|
|
32
|
+
|
|
33
|
+
**Rules:** One test at a time. Only enough code to pass. Don't anticipate future tests. Tests through public interfaces. Never refactor while RED.
|
|
34
|
+
|
|
35
|
+
**Refactor** (all GREEN): Extract duplication, deepen modules, re-run tests after each step.
|
|
36
|
+
|
|
37
|
+
### Mode C: Design an Interface
|
|
38
|
+
|
|
39
|
+
"Design it twice" — generate multiple radically different designs, compare.
|
|
40
|
+
|
|
41
|
+
1. Gather requirements (problem, callers, operations, constraints)
|
|
42
|
+
2. Spawn 3+ parallel sub-agents with different constraints (min methods, max flexibility, optimize common case, specific paradigm)
|
|
43
|
+
3. Present designs (signature, examples, what it hides)
|
|
44
|
+
4. Compare (simplicity, generality, efficiency, depth)
|
|
45
|
+
5. Synthesize insights
|
|
46
|
+
|
|
47
|
+
### Mode D: From Plan
|
|
48
|
+
|
|
49
|
+
Plan exists → execute phases in order.
|
|
50
|
+
|
|
51
|
+
1. Read plan file
|
|
52
|
+
2. Each phase: implement per spec using TDD (Mode B)
|
|
53
|
+
3. Verify against criteria before moving on
|
|
54
|
+
4. Update plan with completed status
|
|
55
|
+
|
|
56
|
+
## Anti-patterns
|
|
57
|
+
|
|
58
|
+
- Polishing a prototype
|
|
59
|
+
- Writing all tests first (brittle, imaginary)
|
|
60
|
+
- Anticipating future tests
|
|
61
|
+
- Refactoring while RED
|
|
62
|
+
- Sub-agents producing similar designs (enforce radical difference)
|
|
63
|
+
- Implementing without verifying against plan criteria
|
|
@@ -1,18 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: oh-builder
|
|
3
|
-
description: "
|
|
3
|
+
description: "Build from plans, prototypes, TDD, or interface design"
|
|
4
4
|
tier: 4
|
|
5
|
-
benefits-from: [oh-planner, oh-expert]
|
|
6
|
-
triggers:
|
|
7
|
-
- "build this"
|
|
8
|
-
- "implement this phase"
|
|
9
|
-
- "write the code for"
|
|
10
|
-
- "prototype"
|
|
11
|
-
- "tdd"
|
|
12
|
-
- "red-green"
|
|
13
|
-
- "design an interface"
|
|
14
|
-
- "implement the feature"
|
|
15
|
-
- "build the component"
|
|
16
5
|
route:
|
|
17
6
|
pass: oh-gauntlet
|
|
18
7
|
fail: oh-builder
|
|
@@ -21,89 +10,22 @@ route:
|
|
|
21
10
|
|
|
22
11
|
# oh-builder
|
|
23
12
|
|
|
24
|
-
|
|
13
|
+
ALL-arounder builder: prototyping, TDD, plan implementation, interface design.
|
|
25
14
|
|
|
26
|
-
##
|
|
15
|
+
## Steps
|
|
27
16
|
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
If the question is genuinely ambiguous, default to whichever branch better matches the surrounding code (backend module → terminal, page/component → UI) and state the assumption.
|
|
37
|
-
|
|
38
|
-
**Rules that apply to both branches:**
|
|
39
|
-
|
|
40
|
-
1. **Throwaway from day one, clearly marked.** Name it so a casual reader sees it's a prototype.
|
|
41
|
-
2. **One command to run.** Whatever the project's task runner supports — `pnpm <name>`, `bun <path>`, etc.
|
|
42
|
-
3. **No persistence by default.** State lives in memory. If the question involves a database, hit a scratch DB with a clear "PROTOTYPE — wipe me" name.
|
|
43
|
-
4. **Skip the polish.** No tests, no error handling beyond what makes it runnable. The point is to learn and then delete.
|
|
44
|
-
5. **Surface the state.** After every action (terminal) or on every variant switch (UI), show the full relevant state so the user sees what changed.
|
|
45
|
-
6. **Delete or absorb when done.** The answer is the only thing worth keeping. Capture it in a commit, ADR, or note — then delete the prototype code.
|
|
46
|
-
|
|
47
|
-
### Mode B: TDD (test-first implementation)
|
|
48
|
-
When building production code from a plan or spec. Red-green-refactor with vertical tracer bullets.
|
|
49
|
-
|
|
50
|
-
**Planning** (one-time):
|
|
51
|
-
- [ ] Confirm interface changes with user
|
|
52
|
-
- [ ] Prioritize behaviors to test
|
|
53
|
-
- [ ] Design for testability (public interface only)
|
|
54
|
-
- [ ] List behaviors, not implementation steps
|
|
55
|
-
|
|
56
|
-
**Loop** (repeat per behavior):
|
|
57
|
-
```
|
|
58
|
-
RED: Write one test → fails
|
|
59
|
-
GREEN: Minimal code to pass → passes
|
|
60
|
-
```
|
|
61
|
-
|
|
62
|
-
**Rules:**
|
|
63
|
-
- One test at a time
|
|
64
|
-
- Only enough code to pass current test
|
|
65
|
-
- Do not anticipate future tests
|
|
66
|
-
- Tests describe behavior through public interfaces, not implementation details
|
|
67
|
-
- Never refactor while RED
|
|
68
|
-
|
|
69
|
-
**Refactor** (after all GREEN):
|
|
70
|
-
- Extract duplication
|
|
71
|
-
- Deepen modules (complexity behind simple interfaces)
|
|
72
|
-
- Run tests after each refactor step
|
|
73
|
-
|
|
74
|
-
### Mode C: Design an Interface (exploration)
|
|
75
|
-
When the interface shape is uncertain. "Design it twice" — generate multiple radically different designs, then compare.
|
|
76
|
-
|
|
77
|
-
1. **Gather requirements** — problem, callers, key operations, constraints
|
|
78
|
-
2. **Spawn 3+ parallel sub-agents** — each with a different constraint:
|
|
79
|
-
- Agent 1: "Minimize method count — aim for 1-3 methods max"
|
|
80
|
-
- Agent 2: "Maximize flexibility — support many use cases"
|
|
81
|
-
- Agent 3: "Optimize for the most common case"
|
|
82
|
-
- Agent 4: "Take inspiration from [specific paradigm]"
|
|
83
|
-
3. **Present designs** — interface signature, usage examples, what it hides
|
|
84
|
-
4. **Compare** — simplicity, generality, implementation efficiency, depth
|
|
85
|
-
5. **Synthesize** — combine insights from multiple options
|
|
86
|
-
|
|
87
|
-
### Mode D: From Plan (plan file exists)
|
|
88
|
-
When oh-planner produced a plan artifact. Execute phases in order.
|
|
89
|
-
|
|
90
|
-
1. Read the plan file ( `~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md` )
|
|
91
|
-
2. For each phase: implement per plan spec using TDD discipline (Mode B)
|
|
92
|
-
3. Verify each phase against its verification criteria before moving on
|
|
93
|
-
4. Update plan file with completed phase status
|
|
94
|
-
|
|
95
|
-
## Anti-patterns
|
|
96
|
-
- Polishing a prototype ("it's just a prototype!" — it never is)
|
|
97
|
-
- Writing all tests first (horizontal slicing) — produces brittle, imaginary tests
|
|
98
|
-
- Anticipating future tests — write for what exists now
|
|
99
|
-
- Refactoring while RED — get to GREEN first
|
|
100
|
-
- Letting sub-agents produce similar designs — enforce radical difference
|
|
101
|
-
- Implementing without verifying against plan criteria
|
|
17
|
+
1. Detect builder mode from request — prototype, TDD, interface design, or plan-driven
|
|
18
|
+
2. If plan file exists, read it and execute phases in order
|
|
19
|
+
3. For prototypes: build minimal throwaway app answering the specific question, no polish, surface state after every action
|
|
20
|
+
4. For TDD: red-green-refactor per behavior — one test at a time, minimal code to pass, never refactor while red
|
|
21
|
+
5. For interface design: spawn 3+ parallel sub-agents with radically different constraints, compare, synthesize insights
|
|
22
|
+
6. Verify output against success criteria before routing
|
|
23
|
+
7. Route based on outcome
|
|
102
24
|
|
|
103
25
|
## Routing
|
|
104
26
|
|
|
105
27
|
| Outcome | Route |
|
|
106
28
|
|---------|-------|
|
|
107
|
-
| pass | → oh-gauntlet
|
|
29
|
+
| pass | → oh-gauntlet |
|
|
108
30
|
| fail | → oh-builder (fix issues) |
|
|
109
|
-
| blocker | → surface
|
|
31
|
+
| blocker | → surface |
|
|
@@ -0,0 +1,85 @@
|
|
|
1
|
+
# oh-expert — Deep Reference
|
|
2
|
+
|
|
3
|
+
## When to Use
|
|
4
|
+
|
|
5
|
+
Agent is getting things wrong, reversing answers under pushback, hallucinating, drifting from instructions, or showing signs of attention degradation. "Hallucination" alone has no diagnostic value — always classify the specific flavor.
|
|
6
|
+
|
|
7
|
+
## Anti-patterns
|
|
8
|
+
|
|
9
|
+
- Using "hallucination" without classifying factuality vs faithfulness
|
|
10
|
+
- Adding more docs when attention is degraded (problem is buried signal, not missing info)
|
|
11
|
+
- Calling any agreeable wrong answer "sycophancy" without running the diagnostic test
|
|
12
|
+
- Pushing through smart zone drift instead of compacting
|
|
13
|
+
|
|
14
|
+
## Failure Modes
|
|
15
|
+
|
|
16
|
+
### Sycophancy
|
|
17
|
+
|
|
18
|
+
Model favors agreeable answers — agreement rewarded even when wrong.
|
|
19
|
+
|
|
20
|
+
**Signals:** Caves under pushback ("are you sure?" → reverses correct answer). Praises bad input. Framing skews positive when you signal authorship. Mimics your mistakes as confirmation.
|
|
21
|
+
|
|
22
|
+
**Diagnostic test:** "Would I say this without user steer?" If tone/framing shaped the answer → sycophancy.
|
|
23
|
+
|
|
24
|
+
**Fix:** Hide preferences. Re-ask neutrally.
|
|
25
|
+
|
|
26
|
+
### Hallucination (two flavors)
|
|
27
|
+
|
|
28
|
+
- **Factuality** — invented facts (fake functions, wrong API, fake citations). Cause: parametric knowledge gap. Fix: read current docs/files.
|
|
29
|
+
- **Faithfulness** — drifts from loaded context/user instructions. Cause: attention degradation. Fix: clear or compact.
|
|
30
|
+
|
|
31
|
+
### Attention Degradation
|
|
32
|
+
|
|
33
|
+
As session grows, token attention spreads across more competitors. Signal shrinks, noise crowds in.
|
|
34
|
+
|
|
35
|
+
**Signals:** Smart → dumb zone drift. Invents generics not in types. Ignores schema pasted at top.
|
|
36
|
+
|
|
37
|
+
**Fix:** Clear and reload. Do NOT add more docs — problem is buried signal, not missing info.
|
|
38
|
+
|
|
39
|
+
### Smart Zone / Dumb Zone
|
|
40
|
+
|
|
41
|
+
Sharp (early session) → sloppy (~100k tokens, frontier models). Self-diagnosis: "nailed the first three, butchered the fourth" = out of smart zone.
|
|
42
|
+
|
|
43
|
+
**Fix:** Clear or compact. Do not push through.
|
|
44
|
+
|
|
45
|
+
### Non-determinism
|
|
46
|
+
|
|
47
|
+
Same input, different output. Normal. No setting to disable.
|
|
48
|
+
|
|
49
|
+
**Signal:** "Model has been awful today" — probably distribution, not a worse version.
|
|
50
|
+
|
|
51
|
+
### Knowledge Cutoff
|
|
52
|
+
|
|
53
|
+
No parametric knowledge past cutoff date. Post-cutoff APIs → fabrication traps.
|
|
54
|
+
|
|
55
|
+
**Signal:** "Keeps writing v3 SDK — we're on v5." Fix: load current docs.
|
|
56
|
+
|
|
57
|
+
## Working Patterns
|
|
58
|
+
|
|
59
|
+
- **Progressive Disclosure** — AGENTS.md costs tokens every turn. Infrequent instructions behind skills.
|
|
60
|
+
- **Handoff** — structured artifact for session transfer (no return path).
|
|
61
|
+
- **Compaction** — in-memory handoff. Lossy but frees headroom.
|
|
62
|
+
- **Subagent** — spawned agent in own session. One level deep. Reports one result.
|
|
63
|
+
- **Skill vs Tool** — Skill = instructions (loaded on demand). Tool = function (always available).
|
|
64
|
+
|
|
65
|
+
## Diagnostic Map
|
|
66
|
+
|
|
67
|
+
| Symptom | Cause | First Move |
|
|
68
|
+
|---------|-------|------------|
|
|
69
|
+
| Reverses answer under pushback | Sycophancy | Re-ask neutrally |
|
|
70
|
+
| Invents things in loaded doc | Faithfulness hallucination | Clear or compact |
|
|
71
|
+
| Invents things not in any doc | Factuality hallucination | Load relevant docs |
|
|
72
|
+
| Sharp early, sloppy late | Smart zone drift | Compact |
|
|
73
|
+
| Different results same input | Non-determinism | Try again |
|
|
74
|
+
| Writes old API syntax | Knowledge cutoff | Load current docs |
|
|
75
|
+
| Ignores top-of-window context | Attention exhausted | Clear or move context closer |
|
|
76
|
+
|
|
77
|
+
## Avoid These Terms
|
|
78
|
+
|
|
79
|
+
| Instead of | Use |
|
|
80
|
+
|------------|-----|
|
|
81
|
+
| "Hallucination" alone | Factuality or faithfulness hallucination |
|
|
82
|
+
| "Sycophancy" for any pleasing wrong | Only when diagnostic test confirms |
|
|
83
|
+
| "Tool" for a skill | Skill = instructions; Tool = function |
|
|
84
|
+
| "Memory" (context window) | Context window |
|
|
85
|
+
| "Background agent" | AFK |
|
|
@@ -1,16 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: oh-expert
|
|
3
|
-
description: "
|
|
3
|
+
description: "Self-diagnose agent failure modes: sycophancy, hallucination, attention degradation"
|
|
4
4
|
tier: 2
|
|
5
|
-
triggers:
|
|
6
|
-
- "why did you get that wrong"
|
|
7
|
-
- "diagnose yourself"
|
|
8
|
-
- "are you sure"
|
|
9
|
-
- "stop agreeing with me"
|
|
10
|
-
- "sycophancy"
|
|
11
|
-
- "hallucination"
|
|
12
|
-
- "attention"
|
|
13
|
-
- "smart zone"
|
|
14
5
|
route:
|
|
15
6
|
pass:
|
|
16
7
|
- oh-builder
|
|
@@ -21,107 +12,23 @@ route:
|
|
|
21
12
|
|
|
22
13
|
# oh-expert
|
|
23
14
|
|
|
24
|
-
|
|
15
|
+
Self-diagnose and fix agent failure modes (sycophancy, hallucination, attention degradation).
|
|
25
16
|
|
|
26
|
-
##
|
|
17
|
+
## Steps
|
|
27
18
|
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
**Diagnostic test:** Would I have said this without the user's steer? If only tone/framing changed, it is sycophancy.
|
|
38
|
-
|
|
39
|
-
**Fix:** Hide your preferences. Re-ask neutrally — "review this code" not "is this code good?"
|
|
40
|
-
|
|
41
|
-
### Hallucination (two flavors)
|
|
42
|
-
Confidently-wrong output. Two flavors with different causes and fixes:
|
|
43
|
-
|
|
44
|
-
- **Factuality hallucination** — invented/wrong facts (fake function, wrong API, fake citation). Caused by parametric knowledge gaps. Fix: load contextual knowledge (read docs, read the file).
|
|
45
|
-
|
|
46
|
-
- **Faithfulness hallucination** — output drifts from loaded context, user instructions, or own prior reasoning. Symptom of attention degradation. Fix: clear or compact.
|
|
47
|
-
|
|
48
|
-
**Avoid:** "Hallucination" as bare synonym for "wrong" — without naming the flavor the term has no diagnostic value.
|
|
49
|
-
|
|
50
|
-
### Attention Degradation
|
|
51
|
-
As a session grows, each token's attention budget spreads across more competitors. Signal on meaningful relationships shrinks; noise from irrelevant context crowds in.
|
|
52
|
-
|
|
53
|
-
**Surfaces as:** The smart zone → dumb zone drift. Inventing generics not in the type file. Ignoring schema pasted at the top.
|
|
54
|
-
|
|
55
|
-
**Fix:** Clear and reload. Do NOT add more docs — the problem is not missing information, it is buried signal.
|
|
56
|
-
|
|
57
|
-
### Smart Zone / Dumb Zone
|
|
58
|
-
Early in a session the agent is sharp and focused (smart zone). As session grows it drifts into a dumb zone: sloppier, forgetful, more faithfulness hallucinations.
|
|
59
|
-
|
|
60
|
-
**Threshold:** On frontier models, the dumb zone commonly begins around 100k tokens.
|
|
61
|
-
|
|
62
|
-
**Self-diagnosis cue:** "It nailed the first three components and butchered the fourth" = out of smart zone.
|
|
63
|
-
|
|
64
|
-
**Fix:** Clear or compact. Do not push through.
|
|
65
|
-
|
|
66
|
-
### Non-determinism
|
|
67
|
-
Same input can produce different output. A property of how models generate text. No setting to disable it.
|
|
68
|
-
|
|
69
|
-
**Self-diagnosis cue:** "The model has been awful today" → probably not a worse version, just the distribution. Try again tomorrow.
|
|
70
|
-
|
|
71
|
-
**Avoid:** Over-narrativizing. A string of bad runs is not proof something changed.
|
|
72
|
-
|
|
73
|
-
### Knowledge Cutoff
|
|
74
|
-
The date past which a model has no parametric knowledge. Post-cutoff libraries/APIs are fabrication traps.
|
|
75
|
-
|
|
76
|
-
**Self-diagnosis cue:** "It keeps writing v3 SDK syntax — we are on v5." → v5 shipped after cutoff. Load current docs.
|
|
77
|
-
|
|
78
|
-
## Working Patterns
|
|
79
|
-
|
|
80
|
-
### Progressive Disclosure
|
|
81
|
-
AGENTS.md pays token cost every turn. Put infrequently used instructions behind context pointers (skills).
|
|
82
|
-
|
|
83
|
-
### Handoff
|
|
84
|
-
Transferring context from one session to another with no return path. Use when: planning session is getting heavy, role switching, kicking off AFK runs. Always write a structured artifact.
|
|
85
|
-
|
|
86
|
-
### Compaction
|
|
87
|
-
A handoff done in-memory: previous session is summarized and seeds a fresh session. Lossy — detail traded for headroom. Compact manually to control what is kept.
|
|
88
|
-
|
|
89
|
-
### Subagent
|
|
90
|
-
An agent spawned by another agent via tool call. Runs in own session with own context window. Reports a single result back. Cannot spawn further subagents (one level deep). Use to isolate context.
|
|
91
|
-
|
|
92
|
-
### Skill vs Tool
|
|
93
|
-
- **Skill**: instructions the agent reads (loaded on demand)
|
|
94
|
-
- **Tool**: function the agent calls (always available)
|
|
95
|
-
Do not confuse them.
|
|
96
|
-
|
|
97
|
-
## Diagnostic Map
|
|
98
|
-
|
|
99
|
-
| Symptom | Likely Cause | First Move |
|
|
100
|
-
|---|---|---|
|
|
101
|
-
| Reverses answer under pushback | Sycophancy | Re-ask neutrally, hide preference |
|
|
102
|
-
| Invents things in the loaded doc | Faithfulness hallucination / attention degradation | Clear or compact |
|
|
103
|
-
| Invents things not in any doc | Factuality hallucination / parametric gap | Load relevant docs |
|
|
104
|
-
| Sharp early, sloppy late | Smart zone → dumb zone drift | Compact, do not push through |
|
|
105
|
-
| Different results same input | Non-determinism (normal) | Try again |
|
|
106
|
-
| Writes old API syntax | Knowledge cutoff | Load current docs |
|
|
107
|
-
| Agrees with bad ideas | Sycophancy | Phrase prompts neutrally |
|
|
108
|
-
| Ignores context at top of window | Attention budget exhausted | Clear or move critical context closer |
|
|
109
|
-
|
|
110
|
-
## Avoid These Terms (imprecise or wrong)
|
|
111
|
-
|
|
112
|
-
| Instead of | Use |
|
|
113
|
-
|---|---|
|
|
114
|
-
| "Hallucination" alone | Factuality hallucination or faithfulness hallucination |
|
|
115
|
-
| "Sycophancy" for any pleasing wrong answer | Only when diagnostic test confirms |
|
|
116
|
-
| "Tool" for a skill | Skill = instructions read; Tool = function called |
|
|
117
|
-
| "Memory" (for context window) | Context window |
|
|
118
|
-
| "Working memory" | Contextual knowledge |
|
|
119
|
-
| "Background agent" | AFK |
|
|
19
|
+
1. Classify the symptom using the diagnostic map
|
|
20
|
+
2. Identify the specific failure mode
|
|
21
|
+
3. Apply the appropriate fix from the map
|
|
22
|
+
4. If sycophancy suspected, re-ask neutrally without user steer
|
|
23
|
+
5. If hallucination suspected, load current docs or compact context
|
|
24
|
+
6. If attention degraded, clear and reload
|
|
25
|
+
7. If non-determinism, try again
|
|
26
|
+
8. If smart zone drift detected, compact — do not push through
|
|
120
27
|
|
|
121
28
|
## Routing
|
|
122
29
|
|
|
123
30
|
| Outcome | Route |
|
|
124
31
|
|---------|-------|
|
|
125
32
|
| pass | → oh-builder (implement fix) or oh-gauntlet (re-test) |
|
|
126
|
-
| fail | → oh-expert (re-diagnose
|
|
127
|
-
| blocker | → surface
|
|
33
|
+
| fail | → oh-expert (re-diagnose) |
|
|
34
|
+
| blocker | → surface |
|
|
@@ -0,0 +1,182 @@
|
|
|
1
|
+
# oh-facade — Deep Reference
|
|
2
|
+
|
|
3
|
+
## Phase 0: Redesign Entry (existing projects)
|
|
4
|
+
|
|
5
|
+
Skip this phase for new builds. For existing projects, run this scan first:
|
|
6
|
+
|
|
7
|
+
### 0a. Scan
|
|
8
|
+
Read the codebase. Identify framework, styling method (Tailwind, vanilla CSS, styled-components, etc.), and current design patterns.
|
|
9
|
+
|
|
10
|
+
### 0b. Diagnose
|
|
11
|
+
Run the audit from Phase 4. List every generic pattern, weak point, and missing state found.
|
|
12
|
+
|
|
13
|
+
### 0c. Fix in-place
|
|
14
|
+
Apply targeted upgrades working with the existing stack. Do not rewrite from scratch. Improve what's there.
|
|
15
|
+
|
|
16
|
+
## Phase 1: Concept
|
|
17
|
+
|
|
18
|
+
### 1a. Context
|
|
19
|
+
What product/feature? Who uses it? What problem does it solve? Technical constraints (framework, viewport, a11y)? New build or redesign?
|
|
20
|
+
|
|
21
|
+
### 1b. Direction Archetype
|
|
22
|
+
|
|
23
|
+
Commit to one. Do not hedge.
|
|
24
|
+
|
|
25
|
+
| Archetype | Best for | Substrate | Density | Key Traits |
|
|
26
|
+
|---|---|---|---|---|
|
|
27
|
+
| Warm Minimalist | Content sites, portfolios | Light (warm off-white #F7F6F3) | Low (1-3) | Ultra-flat bento grids, muted pastels, editorial serif headers, no shadows |
|
|
28
|
+
| Premium SaaS | Web apps, dashboards | Light/Dark (zinc) | Medium (4-7) | Standard app patterns, glass nav, structured dashboards |
|
|
29
|
+
| Industrial Brutalist | Data dashboards, monitoring, tools | Dark (charcoal #1a1a1a) | High (7-10) | Swiss typographic print, CRT terminals, rigid grids, extreme contrast |
|
|
30
|
+
| Creative/Expressive | Marketing, showcases | Dark or Light | Variable | GSAP ScrollTriggers, pinning, stacking, scrubbing, Python-randomized layout |
|
|
31
|
+
|
|
32
|
+
### 1c. Metric Dials
|
|
33
|
+
|
|
34
|
+
| Dial | 1-3 | 4-7 | 8-10 |
|
|
35
|
+
|------|-----|-----|------|
|
|
36
|
+
| VARIANCE | Symmetric | Offset, asymmetric | Chaotic |
|
|
37
|
+
| MOTION | CSS only | CSS + spring | Cinematic |
|
|
38
|
+
| DENSITY | Airy | Standard app | Cockpit |
|
|
39
|
+
|
|
40
|
+
**Default:** VARIANCE=6, MOTION=5, DENSITY=4
|
|
41
|
+
|
|
42
|
+
### 1d. Output Brief
|
|
43
|
+
Single paragraph: archetype, substrate, dials, key differentiator. Self-contradictory → surface. Otherwise → Phase 2.
|
|
44
|
+
|
|
45
|
+
## Phase 2: Design System
|
|
46
|
+
|
|
47
|
+
### 2a. Color
|
|
48
|
+
- **Neutral**: Zinc or Slate. One. Never mix.
|
|
49
|
+
- **Accent**: Exactly one. Saturation < 80%. No AI purple/blue.
|
|
50
|
+
- **Surface**: Background, card, border, elevated. Specific hex.
|
|
51
|
+
- **Text**: Primary, secondary, muted, inverse. Specific hex.
|
|
52
|
+
- **Semantic**: Success, warning, error, info. Specific hex.
|
|
53
|
+
- **Dark**: Never `#000`. Use `#0a0a0a` or `#121212`.
|
|
54
|
+
|
|
55
|
+
### 2b. Typography
|
|
56
|
+
- **Display**: Premium sans (Geist, Satoshi, Cabinet Grotesk, Outfit, Switzer). Tight tracking `-0.03em` to `-0.05em`. Fluid `clamp()`.
|
|
57
|
+
- **Body**: Leading 1.6-1.8. Max 65ch. Off-black, not `#000`.
|
|
58
|
+
- **Mono**: JetBrains Mono, Geist Mono, SF Mono. Required when DENSITY > 5.
|
|
59
|
+
- **Serif**: Editorial/creative only (Fraunces, Instrument Serif). Never in dashboards.
|
|
60
|
+
- **BANNED**: Inter, Roboto, Arial, Open Sans, Helvetica, Georgia, Times New Roman.
|
|
61
|
+
|
|
62
|
+
### 2c. Components
|
|
63
|
+
**Buttons:** Primary (solid, off-black/accent), Secondary (outline/ghost), Icon (square). Hover lift/darken. Active `scale(0.98)`. Focus ring.
|
|
64
|
+
|
|
65
|
+
**Cards:** Double-Bezel (outer shell + inner core) or border-only (`border-t`/`divide-y` for high density). Cards only when elevation communicates hierarchy.
|
|
66
|
+
|
|
67
|
+
**Forms:** Label above, error below. Focus ring. Touch targets ≥ 44px.
|
|
68
|
+
|
|
69
|
+
**Loading:** Skeletons matching layout dimensions. No spinners.
|
|
70
|
+
|
|
71
|
+
**Empty:** Composed illustration + message + action button.
|
|
72
|
+
|
|
73
|
+
**Nav:** Glass floating pill (low density), sidebar (medium/high), or top bar. Active state. Mobile collapse (not hamburger-only).
|
|
74
|
+
|
|
75
|
+
### 2d. Layout
|
|
76
|
+
- **Grid**: CSS Grid. No flexbox percentage math for multi-column.
|
|
77
|
+
- **Container**: 1200-1440px max-width with auto margins.
|
|
78
|
+
- **Section spacing**: `py-32 md:py-48` (low), `py-24` (medium), `py-16` (high).
|
|
79
|
+
- **Responsive**: All multi-column → single below 768px. No exceptions.
|
|
80
|
+
- **Full-height**: `min-h-[100dvh]`. Never `h-screen` (iOS Safari bug).
|
|
81
|
+
- **BANNED**: Centered hero when VARIANCE > 4, 3-equal-card rows, edge-to-edge on wide screens.
|
|
82
|
+
|
|
83
|
+
### 2e. Motion
|
|
84
|
+
- **Spring**: `cubic-bezier(0.32, 0.72, 0, 1)` or Framer `spring(stiffness:100, damping:20)`. No linear.
|
|
85
|
+
- **Scroll entry**: fade up `translateY(12-24px)` + opacity 0→1, 600-800ms. IntersectionObserver or CSS scroll-driven. No `window.addEventListener('scroll')`.
|
|
86
|
+
- **Stagger**: Lists cascade via `animation-delay` or `staggerChildren`.
|
|
87
|
+
- **Hover**: scale, translate, shadow, or color. 200-300ms.
|
|
88
|
+
- **Active**: Every clickable gets `scale(0.98)` or `translateY(1px)`.
|
|
89
|
+
- **Performance**: Only `transform` and `opacity`. No `top/left/width/height`. `will-change: transform` sparingly.
|
|
90
|
+
- **Backdrop-blur**: Fixed/sticky elements only. Never scrolling containers.
|
|
91
|
+
|
|
92
|
+
### 2e(ii). GSAP Motion (Creative/Expressive archetype only)
|
|
93
|
+
|
|
94
|
+
When MOTION dial >= 7 or archetype is Creative/Expressive, use GSAP ScrollTrigger:
|
|
95
|
+
- **Scroll Pinning** — pin a section title on the left while gallery scrolls on the right: `pin: true, anticipatePin: 1`
|
|
96
|
+
- **Image Scale & Fade** — images start `scale(0.8)`, grow to `scale(1.0)` on enter, darken/fade to `opacity(0.2)` on exit
|
|
97
|
+
- **Horizontal Scroll** — horizontal scroll section with pinned container, scrub trigger
|
|
98
|
+
- **Staggered Reveal** — `stagger: 0.15` on child elements via `ScrollTrigger.batch()`
|
|
99
|
+
- **Text Splitting** — split text into lines/chars, reveal each with stagger
|
|
100
|
+
- **Spring Physics** — `cubic-bezier(0.32, 0.72, 0, 1)` or Framer `spring(stiffness:100, damping:20)`
|
|
101
|
+
- **Python Randomization** — use deterministic seed (character count of prompt `%` math) to select hero architecture, typography stack, GSAP paradigm. Never default to same layout twice.
|
|
102
|
+
- **Gapless Bento Grids** — use `grid-flow-dense` / `grid-auto-flow: dense`. Verify mathematically that `col-span`/`row-span` values interlock with no empty cells.
|
|
103
|
+
|
|
104
|
+
## Phase 3: Build
|
|
105
|
+
|
|
106
|
+
### 3a. Foundations
|
|
107
|
+
CSS custom properties for colors, spacing, typography, shadows, radii. Tailwind config extensions mapping tokens to utilities. Theme provider. Component directory structure.
|
|
108
|
+
|
|
109
|
+
### 3b. Component Library
|
|
110
|
+
Implement ALL defined components with every state: default, hover, active, focus-visible, disabled, loading (skeleton), empty (illustration + action), error (inline). Performance: transform/opacity only, systemic z-index scale.
|
|
111
|
+
|
|
112
|
+
### 3c. Pages
|
|
113
|
+
Full interface from components. Responsive collapse at 768px. All viewport states (loading → populated → empty → error). Nav with active states and mobile collapse.
|
|
114
|
+
|
|
115
|
+
### 3d. Requirements
|
|
116
|
+
- Check `package.json` before importing — never assume a library exists
|
|
117
|
+
- Framework-appropriate patterns (Server Components, island architecture)
|
|
118
|
+
- Semantic HTML: `<nav>`, `<main>`, `<section>`, `<article>`, `<aside>`, `<header>`, `<footer>`
|
|
119
|
+
- A11y: focus rings, skip-to-content, alt text, aria labels
|
|
120
|
+
- Meta: `<title>`, description, `og:image`, viewport
|
|
121
|
+
|
|
122
|
+
## Phase 4: Audit
|
|
123
|
+
|
|
124
|
+
### Priority 1 (do first)
|
|
125
|
+
- **Typography**: font matches spec? scale correct? tracking? no orphans? max-width?
|
|
126
|
+
- **Color**: single accent? saturation < 80%? no AI purple? consistent? dark not pure black?
|
|
127
|
+
- **Layout**: grid not flexbox math? `min-h-[100dvh]`? responsive at 768px?
|
|
128
|
+
|
|
129
|
+
### Priority 2 (feel)
|
|
130
|
+
- **Interactivity**: hover on all clickables? active feedback? focus rings? 200-300ms transitions?
|
|
131
|
+
- **States**: every component has loading/empty/error? skeletons (not spinners)?
|
|
132
|
+
- **Motion**: scroll entries? staggered? spring physics?
|
|
133
|
+
|
|
134
|
+
### Priority 3 (content)
|
|
135
|
+
- No lorem ipsum, cliches (Elevate, etc), generic names, emojis, bad icons?
|
|
136
|
+
|
|
137
|
+
### Priority 4 (hardening)
|
|
138
|
+
- Double-Bezel or appropriate card? button-in-button? nav active states?
|
|
139
|
+
- Consistent icon stroke? semantic HTML? no inline styles?
|
|
140
|
+
- 404 page? skip-to-content? meta tags? cookie consent?
|
|
141
|
+
|
|
142
|
+
### Priority 5 (existing project redesign scan)
|
|
143
|
+
- **Typography audit**: browser default fonts or Inter everywhere? Only Regular/Bold weights? Missing letter-spacing? All-caps subheaders everywhere? Orphaned words?
|
|
144
|
+
- **Color audit**: pure `#000` background? Oversaturated accents? Mixing warm + cool grays? AI purple/blue gradient? Generic `box-shadow` (pure black tint)? No texture (pure flat)?
|
|
145
|
+
- **Layout audit**: 3-equal-card rows? `height: 100vh` instead of `min-h-[100dvh]`? Complex flexbox percentage math? Everything centered and symmetrical?
|
|
146
|
+
- **Surface audit**: flat sections with no visual depth? No background imagery? Sudden dark section in light page?
|
|
147
|
+
- **Icon audit**: generic thin-line icon library? Rocket ship / shield cliches?
|
|
148
|
+
|
|
149
|
+
### Phase 5: Iterate
|
|
150
|
+
1. Fix in Priority order. Re-audit after each level.
|
|
151
|
+
2. All P1-P3 pass → done. P4 surface as recommendations.
|
|
152
|
+
3. Blocked on a check → narrow scope or surface.
|
|
153
|
+
|
|
154
|
+
## Hard Bans
|
|
155
|
+
|
|
156
|
+
- No emojis in code/content/alt/markup
|
|
157
|
+
- No Lorem Ipsum — write real draft copy
|
|
158
|
+
- No "Elevate", "Seamless", "Next-Gen", "Unleash", "Game-changer"
|
|
159
|
+
- No generic names (John Doe, Acme Corp)
|
|
160
|
+
- No rocket ship / shield cliche icons
|
|
161
|
+
- No purple/blue neon gradients
|
|
162
|
+
- No dark section in white page without committed dark mode
|
|
163
|
+
- No animated `top/left/width/height`
|
|
164
|
+
- No `window.addEventListener('scroll')`
|
|
165
|
+
- No `h-screen`
|
|
166
|
+
- No pill shapes on large containers, cards, or primary buttons
|
|
167
|
+
- No generic icon libraries (Lucide, Feather, Heroicons)
|
|
168
|
+
- No 3-equal-card feature rows — replace with 2-column zig-zag, asymmetric grid, or masonry
|
|
169
|
+
- No `#000000` backgrounds — use off-black `#0a0a0a` or `#121212`
|
|
170
|
+
- No even 45-degree linear gradients — break with radial, noise overlay, or mesh
|
|
171
|
+
- No mixing warm and cool grays — stick to one gray family throughout
|
|
172
|
+
- No random dark section in light page (or vice versa) — commit to one substrate
|
|
173
|
+
- No orphaned words — use `text-wrap: balance` or `text-wrap: pretty`
|
|
174
|
+
|
|
175
|
+
## Design Principles
|
|
176
|
+
|
|
177
|
+
1. **Intentionality** — Commit to direction. Maximalism and minimalism both work if committed.
|
|
178
|
+
2. **Engineering** — Buttons have structure, physics, a11y. Not just "style."
|
|
179
|
+
3. **Consistency** — One palette, one font system, one architecture.
|
|
180
|
+
4. **Performance** — Beautiful + laggy = not beautiful. Transform-only, guarded backdrop-blur.
|
|
181
|
+
5. **Ship every state** — Default-only is not production.
|
|
182
|
+
6. **Redesign first** — For existing projects, diagnose before prescribing. Improve in-place. A targeted upgrade beats a rewrite.
|