openhermes 4.3.0 → 4.11.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (143) hide show
  1. package/CONTEXT.md +10 -1
  2. package/README.md +54 -42
  3. package/bootstrap.ts +396 -142
  4. package/harness/agents/oh-browser.md +97 -0
  5. package/harness/agents/oh-builder.md +78 -0
  6. package/harness/agents/oh-facade.md +75 -0
  7. package/harness/agents/oh-fusion.md +45 -0
  8. package/harness/agents/oh-gauntlet.md +71 -0
  9. package/harness/agents/oh-grill.md +71 -0
  10. package/harness/agents/oh-investigate.md +60 -0
  11. package/harness/agents/oh-manifest.md +95 -0
  12. package/harness/agents/oh-plan-review.md +40 -0
  13. package/harness/agents/oh-planner.md +50 -0
  14. package/harness/agents/oh-refactor.md +37 -0
  15. package/harness/agents/oh-retro.md +46 -0
  16. package/harness/agents/oh-review.md +85 -0
  17. package/harness/agents/oh-security.md +83 -0
  18. package/harness/agents/oh-ship.md +76 -0
  19. package/harness/agents/oh-skill-craft.md +38 -0
  20. package/harness/agents/openhermes.md +28 -73
  21. package/harness/codex/AUTOPILOT.md +235 -87
  22. package/harness/codex/CHARTER.md +80 -0
  23. package/harness/instructions/SHELL.md +76 -0
  24. package/harness/lib/background/background.test.ts +197 -0
  25. package/harness/lib/background/index.ts +7 -0
  26. package/harness/lib/background/interfaces.ts +31 -0
  27. package/harness/lib/background/manager.ts +320 -0
  28. package/harness/lib/composer/compose.test.ts +168 -0
  29. package/harness/lib/composer/compose.ts +65 -0
  30. package/harness/lib/composer/fragments/01-identity.md +1 -0
  31. package/harness/lib/composer/fragments/02-delegation.md +6 -0
  32. package/harness/lib/composer/fragments/03-permissions.md +13 -0
  33. package/harness/lib/composer/fragments/04-task-flow.md +15 -0
  34. package/harness/lib/composer/fragments/05-confidence.md +5 -0
  35. package/harness/lib/composer/fragments/06-parallelization.md +17 -0
  36. package/harness/lib/composer/fragments/07-shell.md +41 -0
  37. package/harness/lib/composer/fragments/08-routing.md +8 -0
  38. package/harness/lib/composer/fragments/09-guardrails.md +12 -0
  39. package/harness/lib/composer/index.ts +1 -0
  40. package/harness/lib/hooks/builtins/confidence-gate-hook.ts +70 -0
  41. package/harness/lib/hooks/builtins/delegation-depth-hook.ts +59 -0
  42. package/harness/lib/hooks/builtins/error-recovery-hook.ts +107 -0
  43. package/harness/lib/hooks/builtins/memory-sync-hook.ts +73 -0
  44. package/harness/lib/hooks/builtins/plan-check-hook.ts +43 -0
  45. package/harness/lib/hooks/builtins/route-tracking-hook.ts +147 -0
  46. package/harness/lib/hooks/builtins/sanity-check-hook.ts +52 -0
  47. package/harness/lib/hooks/builtins/shell-detect-hook.ts +96 -0
  48. package/harness/lib/hooks/hooks.test.ts +1016 -0
  49. package/harness/lib/hooks/index.ts +30 -0
  50. package/harness/lib/hooks/registry.ts +416 -0
  51. package/harness/lib/hooks/types.ts +71 -0
  52. package/harness/lib/memory/index.ts +18 -0
  53. package/harness/lib/memory/interfaces.ts +53 -0
  54. package/harness/lib/memory/memory-manager.ts +205 -0
  55. package/harness/lib/memory/memory.test.ts +491 -0
  56. package/harness/lib/memory/plan-store.ts +366 -0
  57. package/harness/lib/recovery/handler.ts +243 -0
  58. package/harness/lib/recovery/index.ts +14 -0
  59. package/harness/lib/recovery/interfaces.ts +48 -0
  60. package/harness/lib/recovery/patterns.ts +149 -0
  61. package/harness/lib/recovery/recovery.test.ts +312 -0
  62. package/harness/lib/sanity/anomaly-tracker.ts +127 -0
  63. package/harness/lib/sanity/checker.ts +178 -0
  64. package/harness/lib/sanity/index.ts +13 -0
  65. package/harness/lib/sanity/interfaces.ts +24 -0
  66. package/harness/lib/sanity/sanity.test.ts +472 -0
  67. package/harness/lib/sync/file-watcher.ts +174 -0
  68. package/harness/lib/sync/index.ts +11 -0
  69. package/harness/lib/sync/interfaces.ts +27 -0
  70. package/harness/lib/sync/plan-sync.ts +536 -0
  71. package/harness/lib/sync/sync.test.ts +832 -0
  72. package/harness/skills/oh-ascii/DEEP.md +292 -0
  73. package/harness/skills/oh-ascii/SKILL.md +31 -0
  74. package/harness/skills/oh-ascii/scripts/check_ascii_alignment.py +596 -0
  75. package/harness/skills/oh-browser/DEEP.md +54 -0
  76. package/harness/skills/oh-browser/SKILL.md +30 -0
  77. package/harness/skills/oh-builder/DEEP.md +63 -0
  78. package/harness/skills/oh-builder/SKILL.md +12 -90
  79. package/harness/skills/oh-expert/DEEP.md +85 -0
  80. package/harness/skills/oh-expert/SKILL.md +13 -106
  81. package/harness/skills/oh-facade/DEEP.md +182 -0
  82. package/harness/skills/oh-facade/SKILL.md +15 -279
  83. package/harness/skills/oh-freeze/DEEP.md +18 -0
  84. package/harness/skills/oh-freeze/SKILL.md +10 -19
  85. package/harness/skills/oh-full-output/DEEP.md +25 -0
  86. package/harness/skills/oh-full-output/SKILL.md +12 -65
  87. package/harness/skills/oh-fusion/DEEP.md +120 -0
  88. package/harness/skills/oh-fusion/SKILL.md +17 -295
  89. package/harness/skills/oh-gauntlet/DEEP.md +77 -0
  90. package/harness/skills/oh-gauntlet/SKILL.md +13 -105
  91. package/harness/skills/oh-grill/DEEP.md +51 -0
  92. package/harness/skills/oh-grill/SKILL.md +12 -63
  93. package/harness/skills/oh-guard/DEEP.md +19 -0
  94. package/harness/skills/oh-guard/SKILL.md +10 -24
  95. package/harness/skills/oh-handoff/DEEP.md +48 -0
  96. package/harness/skills/oh-handoff/SKILL.md +13 -23
  97. package/harness/skills/oh-health/DEEP.md +74 -0
  98. package/harness/skills/oh-health/SKILL.md +13 -76
  99. package/harness/skills/oh-init/DEEP.md +85 -0
  100. package/harness/skills/oh-init/SKILL.md +13 -127
  101. package/harness/skills/oh-investigate/DEEP.md +171 -0
  102. package/harness/skills/oh-investigate/SKILL.md +13 -66
  103. package/harness/skills/oh-issue/DEEP.md +21 -0
  104. package/harness/skills/oh-issue/SKILL.md +11 -27
  105. package/harness/skills/oh-manifest/DEEP.md +92 -0
  106. package/harness/skills/oh-manifest/SKILL.md +12 -109
  107. package/harness/skills/oh-plan-review/DEEP.md +90 -0
  108. package/harness/skills/oh-plan-review/SKILL.md +13 -115
  109. package/harness/skills/oh-planner/DEEP.md +172 -0
  110. package/harness/skills/oh-planner/SKILL.md +12 -149
  111. package/harness/skills/oh-prd/DEEP.md +45 -0
  112. package/harness/skills/oh-prd/SKILL.md +10 -26
  113. package/harness/skills/oh-refactor/DEEP.md +122 -0
  114. package/harness/skills/oh-refactor/SKILL.md +17 -410
  115. package/harness/skills/oh-retro/DEEP.md +26 -0
  116. package/harness/skills/oh-retro/SKILL.md +12 -24
  117. package/harness/skills/oh-review/DEEP.md +87 -0
  118. package/harness/skills/oh-review/SKILL.md +11 -97
  119. package/harness/skills/oh-security/DEEP.md +83 -0
  120. package/harness/skills/oh-security/SKILL.md +14 -96
  121. package/harness/skills/oh-ship/DEEP.md +141 -0
  122. package/harness/skills/oh-ship/SKILL.md +14 -32
  123. package/harness/skills/oh-skill-craft/DEEP.md +369 -0
  124. package/harness/skills/oh-skill-craft/SKILL.md +13 -177
  125. package/harness/skills/oh-skills-link/DEEP.md +16 -0
  126. package/harness/skills/oh-skills-link/SKILL.md +10 -20
  127. package/harness/skills/oh-skills-list/DEEP.md +20 -0
  128. package/harness/skills/oh-skills-list/SKILL.md +9 -22
  129. package/harness/skills/oh-triage/DEEP.md +23 -0
  130. package/harness/skills/oh-triage/SKILL.md +8 -24
  131. package/harness/skills/oh-worktree/DEEP.md +169 -0
  132. package/harness/skills/oh-worktree/SKILL.md +32 -0
  133. package/lib/harness-resolver.ts +8 -10
  134. package/package.json +7 -5
  135. package/tsconfig.json +1 -1
  136. package/harness/codex/CONSTITUTION.md +0 -73
  137. package/harness/codex/ROUTING.md +0 -92
  138. package/harness/commands/oh-doctor.md +0 -26
  139. package/harness/commands/oh-log.md +0 -18
  140. package/harness/instructions/RUNTIME.md +0 -30
  141. package/harness/skills/oh-caveman/SKILL.md +0 -42
  142. package/harness/skills/oh-learn/SKILL.md +0 -101
  143. package/lib/logger.ts +0 -75
@@ -0,0 +1,63 @@
1
+ # oh-builder — Deep Reference
2
+
3
+ ## When to Use
4
+
5
+ Use when you need to build something from a plan, prototype an idea, implement code via TDD, or design an interface. Consumes plan artifacts and produces working code.
6
+
7
+ Benefits from: oh-planner, oh-expert
8
+
9
+ Triggers: "build this", "implement this phase", "write the code for", "prototype", "tdd", "red-green", "design an interface", "implement the feature", "build the component"
10
+
11
+ ## Entry Modes
12
+
13
+ ### Mode A: Prototype (exploratory)
14
+
15
+ Pick branch by question:
16
+ - **"Does this logic feel right?"** → Terminal branch. Tiny interactive app pushing state machine.
17
+ - **"What should this look like?"** → UI branch. Multiple visual variations switchable via param/control bar.
18
+
19
+ **Rules:** Throwaway from day one (clear name). One command to run. No persistence (memory state). Skip polish (no tests, minimal error handling). Surface state after every action. Delete/absorb answer when done.
20
+
21
+ ### Mode B: TDD (test-first)
22
+
23
+ Red-green-refactor with vertical tracer bullets.
24
+
25
+ **Plan:** Confirm interface changes with user. Prioritize behaviors. Design for testability (public interface only). List behaviors, not implementation steps.
26
+
27
+ **Loop** per behavior:
28
+ ```
29
+ RED: One test → fails
30
+ GREEN: Minimal code → passes
31
+ ```
32
+
33
+ **Rules:** One test at a time. Only enough code to pass. Don't anticipate future tests. Tests through public interfaces. Never refactor while RED.
34
+
35
+ **Refactor** (all GREEN): Extract duplication, deepen modules, re-run tests after each step.
36
+
37
+ ### Mode C: Design an Interface
38
+
39
+ "Design it twice" — generate multiple radically different designs, compare.
40
+
41
+ 1. Gather requirements (problem, callers, operations, constraints)
42
+ 2. Spawn 3+ parallel sub-agents with different constraints (min methods, max flexibility, optimize common case, specific paradigm)
43
+ 3. Present designs (signature, examples, what it hides)
44
+ 4. Compare (simplicity, generality, efficiency, depth)
45
+ 5. Synthesize insights
46
+
47
+ ### Mode D: From Plan
48
+
49
+ Plan exists → execute phases in order.
50
+
51
+ 1. Read plan file
52
+ 2. Each phase: implement per spec using TDD (Mode B)
53
+ 3. Verify against criteria before moving on
54
+ 4. Update plan with completed status
55
+
56
+ ## Anti-patterns
57
+
58
+ - Polishing a prototype
59
+ - Writing all tests first (brittle, imaginary)
60
+ - Anticipating future tests
61
+ - Refactoring while RED
62
+ - Sub-agents producing similar designs (enforce radical difference)
63
+ - Implementing without verifying against plan criteria
@@ -1,18 +1,7 @@
1
1
  ---
2
2
  name: oh-builder
3
- description: "ALL-arounder builder prototype, TDD, implement from plan, design interfaces. Consumes the plan file, produces working code."
3
+ description: "Build from plans, prototypes, TDD, or interface design"
4
4
  tier: 4
5
- benefits-from: [oh-planner, oh-expert]
6
- triggers:
7
- - "build this"
8
- - "implement this phase"
9
- - "write the code for"
10
- - "prototype"
11
- - "tdd"
12
- - "red-green"
13
- - "design an interface"
14
- - "implement the feature"
15
- - "build the component"
16
5
  route:
17
6
  pass: oh-gauntlet
18
7
  fail: oh-builder
@@ -21,89 +10,22 @@ route:
21
10
 
22
11
  # oh-builder
23
12
 
24
- The ALL-arounder builder. Merges prototyping, TDD, implementation from plan, and interface design exploration. Consumes the plan file from oh-planner or works standalone.
13
+ ALL-arounder builder: prototyping, TDD, plan implementation, interface design.
25
14
 
26
- ## Entry Modes
15
+ ## Steps
27
16
 
28
- ### Mode A: Prototype (exploratory)
29
- When you need to answer a question before committing.
30
-
31
- **Pick a branch based on the question being asked:**
32
-
33
- - **"Does this logic / state model feel right?"** → **Terminal branch.** Build a tiny interactive terminal app that pushes the state machine through cases that are hard to reason about on paper.
34
- - **"What should this look like?"** → **UI branch.** Generate several radically different visual variations, switchable via a URL param or floating control bar.
35
-
36
- If the question is genuinely ambiguous, default to whichever branch better matches the surrounding code (backend module → terminal, page/component → UI) and state the assumption.
37
-
38
- **Rules that apply to both branches:**
39
-
40
- 1. **Throwaway from day one, clearly marked.** Name it so a casual reader sees it's a prototype.
41
- 2. **One command to run.** Whatever the project's task runner supports — `pnpm <name>`, `bun <path>`, etc.
42
- 3. **No persistence by default.** State lives in memory. If the question involves a database, hit a scratch DB with a clear "PROTOTYPE — wipe me" name.
43
- 4. **Skip the polish.** No tests, no error handling beyond what makes it runnable. The point is to learn and then delete.
44
- 5. **Surface the state.** After every action (terminal) or on every variant switch (UI), show the full relevant state so the user sees what changed.
45
- 6. **Delete or absorb when done.** The answer is the only thing worth keeping. Capture it in a commit, ADR, or note — then delete the prototype code.
46
-
47
- ### Mode B: TDD (test-first implementation)
48
- When building production code from a plan or spec. Red-green-refactor with vertical tracer bullets.
49
-
50
- **Planning** (one-time):
51
- - [ ] Confirm interface changes with user
52
- - [ ] Prioritize behaviors to test
53
- - [ ] Design for testability (public interface only)
54
- - [ ] List behaviors, not implementation steps
55
-
56
- **Loop** (repeat per behavior):
57
- ```
58
- RED: Write one test → fails
59
- GREEN: Minimal code to pass → passes
60
- ```
61
-
62
- **Rules:**
63
- - One test at a time
64
- - Only enough code to pass current test
65
- - Do not anticipate future tests
66
- - Tests describe behavior through public interfaces, not implementation details
67
- - Never refactor while RED
68
-
69
- **Refactor** (after all GREEN):
70
- - Extract duplication
71
- - Deepen modules (complexity behind simple interfaces)
72
- - Run tests after each refactor step
73
-
74
- ### Mode C: Design an Interface (exploration)
75
- When the interface shape is uncertain. "Design it twice" — generate multiple radically different designs, then compare.
76
-
77
- 1. **Gather requirements** — problem, callers, key operations, constraints
78
- 2. **Spawn 3+ parallel sub-agents** — each with a different constraint:
79
- - Agent 1: "Minimize method count — aim for 1-3 methods max"
80
- - Agent 2: "Maximize flexibility — support many use cases"
81
- - Agent 3: "Optimize for the most common case"
82
- - Agent 4: "Take inspiration from [specific paradigm]"
83
- 3. **Present designs** — interface signature, usage examples, what it hides
84
- 4. **Compare** — simplicity, generality, implementation efficiency, depth
85
- 5. **Synthesize** — combine insights from multiple options
86
-
87
- ### Mode D: From Plan (plan file exists)
88
- When oh-planner produced a plan artifact. Execute phases in order.
89
-
90
- 1. Read the plan file ( `~/.local/share/opencode/openhermes/plans/<project-name>-plan-<nnn>.md` )
91
- 2. For each phase: implement per plan spec using TDD discipline (Mode B)
92
- 3. Verify each phase against its verification criteria before moving on
93
- 4. Update plan file with completed phase status
94
-
95
- ## Anti-patterns
96
- - Polishing a prototype ("it's just a prototype!" — it never is)
97
- - Writing all tests first (horizontal slicing) — produces brittle, imaginary tests
98
- - Anticipating future tests — write for what exists now
99
- - Refactoring while RED — get to GREEN first
100
- - Letting sub-agents produce similar designs — enforce radical difference
101
- - Implementing without verifying against plan criteria
17
+ 1. Detect builder mode from request — prototype, TDD, interface design, or plan-driven
18
+ 2. If plan file exists, read it and execute phases in order
19
+ 3. For prototypes: build minimal throwaway app answering the specific question, no polish, surface state after every action
20
+ 4. For TDD: red-green-refactor per behavior — one test at a time, minimal code to pass, never refactor while red
21
+ 5. For interface design: spawn 3+ parallel sub-agents with radically different constraints, compare, synthesize insights
22
+ 6. Verify output against success criteria before routing
23
+ 7. Route based on outcome
102
24
 
103
25
  ## Routing
104
26
 
105
27
  | Outcome | Route |
106
28
  |---------|-------|
107
- | pass | → oh-gauntlet (test built code) |
29
+ | pass | → oh-gauntlet |
108
30
  | fail | → oh-builder (fix issues) |
109
- | blocker | → surface to user |
31
+ | blocker | → surface |
@@ -0,0 +1,85 @@
1
+ # oh-expert — Deep Reference
2
+
3
+ ## When to Use
4
+
5
+ Agent is getting things wrong, reversing answers under pushback, hallucinating, drifting from instructions, or showing signs of attention degradation. "Hallucination" alone has no diagnostic value — always classify the specific flavor.
6
+
7
+ ## Anti-patterns
8
+
9
+ - Using "hallucination" without classifying factuality vs faithfulness
10
+ - Adding more docs when attention is degraded (problem is buried signal, not missing info)
11
+ - Calling any agreeable wrong answer "sycophancy" without running the diagnostic test
12
+ - Pushing through smart zone drift instead of compacting
13
+
14
+ ## Failure Modes
15
+
16
+ ### Sycophancy
17
+
18
+ Model favors agreeable answers — agreement rewarded even when wrong.
19
+
20
+ **Signals:** Caves under pushback ("are you sure?" → reverses correct answer). Praises bad input. Framing skews positive when you signal authorship. Mimics your mistakes as confirmation.
21
+
22
+ **Diagnostic test:** "Would I say this without user steer?" If tone/framing shaped the answer → sycophancy.
23
+
24
+ **Fix:** Hide preferences. Re-ask neutrally.
25
+
26
+ ### Hallucination (two flavors)
27
+
28
+ - **Factuality** — invented facts (fake functions, wrong API, fake citations). Cause: parametric knowledge gap. Fix: read current docs/files.
29
+ - **Faithfulness** — drifts from loaded context/user instructions. Cause: attention degradation. Fix: clear or compact.
30
+
31
+ ### Attention Degradation
32
+
33
+ As session grows, token attention spreads across more competitors. Signal shrinks, noise crowds in.
34
+
35
+ **Signals:** Smart → dumb zone drift. Invents generics not in types. Ignores schema pasted at top.
36
+
37
+ **Fix:** Clear and reload. Do NOT add more docs — problem is buried signal, not missing info.
38
+
39
+ ### Smart Zone / Dumb Zone
40
+
41
+ Sharp (early session) → sloppy (~100k tokens, frontier models). Self-diagnosis: "nailed the first three, butchered the fourth" = out of smart zone.
42
+
43
+ **Fix:** Clear or compact. Do not push through.
44
+
45
+ ### Non-determinism
46
+
47
+ Same input, different output. Normal. No setting to disable.
48
+
49
+ **Signal:** "Model has been awful today" — probably distribution, not a worse version.
50
+
51
+ ### Knowledge Cutoff
52
+
53
+ No parametric knowledge past cutoff date. Post-cutoff APIs → fabrication traps.
54
+
55
+ **Signal:** "Keeps writing v3 SDK — we're on v5." Fix: load current docs.
56
+
57
+ ## Working Patterns
58
+
59
+ - **Progressive Disclosure** — AGENTS.md costs tokens every turn. Infrequent instructions behind skills.
60
+ - **Handoff** — structured artifact for session transfer (no return path).
61
+ - **Compaction** — in-memory handoff. Lossy but frees headroom.
62
+ - **Subagent** — spawned agent in own session. One level deep. Reports one result.
63
+ - **Skill vs Tool** — Skill = instructions (loaded on demand). Tool = function (always available).
64
+
65
+ ## Diagnostic Map
66
+
67
+ | Symptom | Cause | First Move |
68
+ |---------|-------|------------|
69
+ | Reverses answer under pushback | Sycophancy | Re-ask neutrally |
70
+ | Invents things in loaded doc | Faithfulness hallucination | Clear or compact |
71
+ | Invents things not in any doc | Factuality hallucination | Load relevant docs |
72
+ | Sharp early, sloppy late | Smart zone drift | Compact |
73
+ | Different results same input | Non-determinism | Try again |
74
+ | Writes old API syntax | Knowledge cutoff | Load current docs |
75
+ | Ignores top-of-window context | Attention exhausted | Clear or move context closer |
76
+
77
+ ## Avoid These Terms
78
+
79
+ | Instead of | Use |
80
+ |------------|-----|
81
+ | "Hallucination" alone | Factuality or faithfulness hallucination |
82
+ | "Sycophancy" for any pleasing wrong | Only when diagnostic test confirms |
83
+ | "Tool" for a skill | Skill = instructions; Tool = function |
84
+ | "Memory" (context window) | Context window |
85
+ | "Background agent" | AFK |
@@ -1,16 +1,7 @@
1
1
  ---
2
2
  name: oh-expert
3
- description: "AI-expert built-in: shared vocabulary for self-diagnosis, failure modes, attention dynamics, and working patterns"
3
+ description: "Self-diagnose agent failure modes: sycophancy, hallucination, attention degradation"
4
4
  tier: 2
5
- triggers:
6
- - "why did you get that wrong"
7
- - "diagnose yourself"
8
- - "are you sure"
9
- - "stop agreeing with me"
10
- - "sycophancy"
11
- - "hallucination"
12
- - "attention"
13
- - "smart zone"
14
5
  route:
15
6
  pass:
16
7
  - oh-builder
@@ -21,107 +12,23 @@ route:
21
12
 
22
13
  # oh-expert
23
14
 
24
- Shared AI-coding vocabulary for agent self-diagnosis. Every failure mode maps to a specific cause and fix. Use this vocabulary precisely — vague terms ("hallucination" alone, "wrong") have no diagnostic value.
15
+ Self-diagnose and fix agent failure modes (sycophancy, hallucination, attention degradation).
25
16
 
26
- ## Failure Modes
17
+ ## Steps
27
18
 
28
- ### Sycophancy
29
- Confidently agreeable output. The model was trained to favor answers humans liked — agreement is rewarded even when wrong.
30
-
31
- **Surfaces as:**
32
- - Caving under pushback reverses a correct answer when you say "are you sure?"
33
- - Praising bad input agrees a broken plan is brilliant before analyzing it
34
- - Biased framing review skews positive when you signal authorship
35
- - Mimicry repeats your mistakes back as confirmation
36
-
37
- **Diagnostic test:** Would I have said this without the user's steer? If only tone/framing changed, it is sycophancy.
38
-
39
- **Fix:** Hide your preferences. Re-ask neutrally — "review this code" not "is this code good?"
40
-
41
- ### Hallucination (two flavors)
42
- Confidently-wrong output. Two flavors with different causes and fixes:
43
-
44
- - **Factuality hallucination** — invented/wrong facts (fake function, wrong API, fake citation). Caused by parametric knowledge gaps. Fix: load contextual knowledge (read docs, read the file).
45
-
46
- - **Faithfulness hallucination** — output drifts from loaded context, user instructions, or own prior reasoning. Symptom of attention degradation. Fix: clear or compact.
47
-
48
- **Avoid:** "Hallucination" as bare synonym for "wrong" — without naming the flavor the term has no diagnostic value.
49
-
50
- ### Attention Degradation
51
- As a session grows, each token's attention budget spreads across more competitors. Signal on meaningful relationships shrinks; noise from irrelevant context crowds in.
52
-
53
- **Surfaces as:** The smart zone → dumb zone drift. Inventing generics not in the type file. Ignoring schema pasted at the top.
54
-
55
- **Fix:** Clear and reload. Do NOT add more docs — the problem is not missing information, it is buried signal.
56
-
57
- ### Smart Zone / Dumb Zone
58
- Early in a session the agent is sharp and focused (smart zone). As session grows it drifts into a dumb zone: sloppier, forgetful, more faithfulness hallucinations.
59
-
60
- **Threshold:** On frontier models, the dumb zone commonly begins around 100k tokens.
61
-
62
- **Self-diagnosis cue:** "It nailed the first three components and butchered the fourth" = out of smart zone.
63
-
64
- **Fix:** Clear or compact. Do not push through.
65
-
66
- ### Non-determinism
67
- Same input can produce different output. A property of how models generate text. No setting to disable it.
68
-
69
- **Self-diagnosis cue:** "The model has been awful today" → probably not a worse version, just the distribution. Try again tomorrow.
70
-
71
- **Avoid:** Over-narrativizing. A string of bad runs is not proof something changed.
72
-
73
- ### Knowledge Cutoff
74
- The date past which a model has no parametric knowledge. Post-cutoff libraries/APIs are fabrication traps.
75
-
76
- **Self-diagnosis cue:** "It keeps writing v3 SDK syntax — we are on v5." → v5 shipped after cutoff. Load current docs.
77
-
78
- ## Working Patterns
79
-
80
- ### Progressive Disclosure
81
- AGENTS.md pays token cost every turn. Put infrequently used instructions behind context pointers (skills).
82
-
83
- ### Handoff
84
- Transferring context from one session to another with no return path. Use when: planning session is getting heavy, role switching, kicking off AFK runs. Always write a structured artifact.
85
-
86
- ### Compaction
87
- A handoff done in-memory: previous session is summarized and seeds a fresh session. Lossy — detail traded for headroom. Compact manually to control what is kept.
88
-
89
- ### Subagent
90
- An agent spawned by another agent via tool call. Runs in own session with own context window. Reports a single result back. Cannot spawn further subagents (one level deep). Use to isolate context.
91
-
92
- ### Skill vs Tool
93
- - **Skill**: instructions the agent reads (loaded on demand)
94
- - **Tool**: function the agent calls (always available)
95
- Do not confuse them.
96
-
97
- ## Diagnostic Map
98
-
99
- | Symptom | Likely Cause | First Move |
100
- |---|---|---|
101
- | Reverses answer under pushback | Sycophancy | Re-ask neutrally, hide preference |
102
- | Invents things in the loaded doc | Faithfulness hallucination / attention degradation | Clear or compact |
103
- | Invents things not in any doc | Factuality hallucination / parametric gap | Load relevant docs |
104
- | Sharp early, sloppy late | Smart zone → dumb zone drift | Compact, do not push through |
105
- | Different results same input | Non-determinism (normal) | Try again |
106
- | Writes old API syntax | Knowledge cutoff | Load current docs |
107
- | Agrees with bad ideas | Sycophancy | Phrase prompts neutrally |
108
- | Ignores context at top of window | Attention budget exhausted | Clear or move critical context closer |
109
-
110
- ## Avoid These Terms (imprecise or wrong)
111
-
112
- | Instead of | Use |
113
- |---|---|
114
- | "Hallucination" alone | Factuality hallucination or faithfulness hallucination |
115
- | "Sycophancy" for any pleasing wrong answer | Only when diagnostic test confirms |
116
- | "Tool" for a skill | Skill = instructions read; Tool = function called |
117
- | "Memory" (for context window) | Context window |
118
- | "Working memory" | Contextual knowledge |
119
- | "Background agent" | AFK |
19
+ 1. Classify the symptom using the diagnostic map
20
+ 2. Identify the specific failure mode
21
+ 3. Apply the appropriate fix from the map
22
+ 4. If sycophancy suspected, re-ask neutrally without user steer
23
+ 5. If hallucination suspected, load current docs or compact context
24
+ 6. If attention degraded, clear and reload
25
+ 7. If non-determinism, try again
26
+ 8. If smart zone drift detected, compact do not push through
120
27
 
121
28
  ## Routing
122
29
 
123
30
  | Outcome | Route |
124
31
  |---------|-------|
125
32
  | pass | → oh-builder (implement fix) or oh-gauntlet (re-test) |
126
- | fail | → oh-expert (re-diagnose — load fresh context) |
127
- | blocker | → surface to user |
33
+ | fail | → oh-expert (re-diagnose) |
34
+ | blocker | → surface |
@@ -0,0 +1,182 @@
1
+ # oh-facade — Deep Reference
2
+
3
+ ## Phase 0: Redesign Entry (existing projects)
4
+
5
+ Skip this phase for new builds. For existing projects, run this scan first:
6
+
7
+ ### 0a. Scan
8
+ Read the codebase. Identify framework, styling method (Tailwind, vanilla CSS, styled-components, etc.), and current design patterns.
9
+
10
+ ### 0b. Diagnose
11
+ Run the audit from Phase 4. List every generic pattern, weak point, and missing state found.
12
+
13
+ ### 0c. Fix in-place
14
+ Apply targeted upgrades working with the existing stack. Do not rewrite from scratch. Improve what's there.
15
+
16
+ ## Phase 1: Concept
17
+
18
+ ### 1a. Context
19
+ What product/feature? Who uses it? What problem does it solve? Technical constraints (framework, viewport, a11y)? New build or redesign?
20
+
21
+ ### 1b. Direction Archetype
22
+
23
+ Commit to one. Do not hedge.
24
+
25
+ | Archetype | Best for | Substrate | Density | Key Traits |
26
+ |---|---|---|---|---|
27
+ | Warm Minimalist | Content sites, portfolios | Light (warm off-white #F7F6F3) | Low (1-3) | Ultra-flat bento grids, muted pastels, editorial serif headers, no shadows |
28
+ | Premium SaaS | Web apps, dashboards | Light/Dark (zinc) | Medium (4-7) | Standard app patterns, glass nav, structured dashboards |
29
+ | Industrial Brutalist | Data dashboards, monitoring, tools | Dark (charcoal #1a1a1a) | High (7-10) | Swiss typographic print, CRT terminals, rigid grids, extreme contrast |
30
+ | Creative/Expressive | Marketing, showcases | Dark or Light | Variable | GSAP ScrollTriggers, pinning, stacking, scrubbing, Python-randomized layout |
31
+
32
+ ### 1c. Metric Dials
33
+
34
+ | Dial | 1-3 | 4-7 | 8-10 |
35
+ |------|-----|-----|------|
36
+ | VARIANCE | Symmetric | Offset, asymmetric | Chaotic |
37
+ | MOTION | CSS only | CSS + spring | Cinematic |
38
+ | DENSITY | Airy | Standard app | Cockpit |
39
+
40
+ **Default:** VARIANCE=6, MOTION=5, DENSITY=4
41
+
42
+ ### 1d. Output Brief
43
+ Single paragraph: archetype, substrate, dials, key differentiator. Self-contradictory → surface. Otherwise → Phase 2.
44
+
45
+ ## Phase 2: Design System
46
+
47
+ ### 2a. Color
48
+ - **Neutral**: Zinc or Slate. One. Never mix.
49
+ - **Accent**: Exactly one. Saturation < 80%. No AI purple/blue.
50
+ - **Surface**: Background, card, border, elevated. Specific hex.
51
+ - **Text**: Primary, secondary, muted, inverse. Specific hex.
52
+ - **Semantic**: Success, warning, error, info. Specific hex.
53
+ - **Dark**: Never `#000`. Use `#0a0a0a` or `#121212`.
54
+
55
+ ### 2b. Typography
56
+ - **Display**: Premium sans (Geist, Satoshi, Cabinet Grotesk, Outfit, Switzer). Tight tracking `-0.03em` to `-0.05em`. Fluid `clamp()`.
57
+ - **Body**: Leading 1.6-1.8. Max 65ch. Off-black, not `#000`.
58
+ - **Mono**: JetBrains Mono, Geist Mono, SF Mono. Required when DENSITY > 5.
59
+ - **Serif**: Editorial/creative only (Fraunces, Instrument Serif). Never in dashboards.
60
+ - **BANNED**: Inter, Roboto, Arial, Open Sans, Helvetica, Georgia, Times New Roman.
61
+
62
+ ### 2c. Components
63
+ **Buttons:** Primary (solid, off-black/accent), Secondary (outline/ghost), Icon (square). Hover lift/darken. Active `scale(0.98)`. Focus ring.
64
+
65
+ **Cards:** Double-Bezel (outer shell + inner core) or border-only (`border-t`/`divide-y` for high density). Cards only when elevation communicates hierarchy.
66
+
67
+ **Forms:** Label above, error below. Focus ring. Touch targets ≥ 44px.
68
+
69
+ **Loading:** Skeletons matching layout dimensions. No spinners.
70
+
71
+ **Empty:** Composed illustration + message + action button.
72
+
73
+ **Nav:** Glass floating pill (low density), sidebar (medium/high), or top bar. Active state. Mobile collapse (not hamburger-only).
74
+
75
+ ### 2d. Layout
76
+ - **Grid**: CSS Grid. No flexbox percentage math for multi-column.
77
+ - **Container**: 1200-1440px max-width with auto margins.
78
+ - **Section spacing**: `py-32 md:py-48` (low), `py-24` (medium), `py-16` (high).
79
+ - **Responsive**: All multi-column → single below 768px. No exceptions.
80
+ - **Full-height**: `min-h-[100dvh]`. Never `h-screen` (iOS Safari bug).
81
+ - **BANNED**: Centered hero when VARIANCE > 4, 3-equal-card rows, edge-to-edge on wide screens.
82
+
83
+ ### 2e. Motion
84
+ - **Spring**: `cubic-bezier(0.32, 0.72, 0, 1)` or Framer `spring(stiffness:100, damping:20)`. No linear.
85
+ - **Scroll entry**: fade up `translateY(12-24px)` + opacity 0→1, 600-800ms. IntersectionObserver or CSS scroll-driven. No `window.addEventListener('scroll')`.
86
+ - **Stagger**: Lists cascade via `animation-delay` or `staggerChildren`.
87
+ - **Hover**: scale, translate, shadow, or color. 200-300ms.
88
+ - **Active**: Every clickable gets `scale(0.98)` or `translateY(1px)`.
89
+ - **Performance**: Only `transform` and `opacity`. No `top/left/width/height`. `will-change: transform` sparingly.
90
+ - **Backdrop-blur**: Fixed/sticky elements only. Never scrolling containers.
91
+
92
+ ### 2e(ii). GSAP Motion (Creative/Expressive archetype only)
93
+
94
+ When MOTION dial >= 7 or archetype is Creative/Expressive, use GSAP ScrollTrigger:
95
+ - **Scroll Pinning** — pin a section title on the left while gallery scrolls on the right: `pin: true, anticipatePin: 1`
96
+ - **Image Scale & Fade** — images start `scale(0.8)`, grow to `scale(1.0)` on enter, darken/fade to `opacity(0.2)` on exit
97
+ - **Horizontal Scroll** — horizontal scroll section with pinned container, scrub trigger
98
+ - **Staggered Reveal** — `stagger: 0.15` on child elements via `ScrollTrigger.batch()`
99
+ - **Text Splitting** — split text into lines/chars, reveal each with stagger
100
+ - **Spring Physics** — `cubic-bezier(0.32, 0.72, 0, 1)` or Framer `spring(stiffness:100, damping:20)`
101
+ - **Python Randomization** — use deterministic seed (character count of prompt `%` math) to select hero architecture, typography stack, GSAP paradigm. Never default to same layout twice.
102
+ - **Gapless Bento Grids** — use `grid-flow-dense` / `grid-auto-flow: dense`. Verify mathematically that `col-span`/`row-span` values interlock with no empty cells.
103
+
104
+ ## Phase 3: Build
105
+
106
+ ### 3a. Foundations
107
+ CSS custom properties for colors, spacing, typography, shadows, radii. Tailwind config extensions mapping tokens to utilities. Theme provider. Component directory structure.
108
+
109
+ ### 3b. Component Library
110
+ Implement ALL defined components with every state: default, hover, active, focus-visible, disabled, loading (skeleton), empty (illustration + action), error (inline). Performance: transform/opacity only, systemic z-index scale.
111
+
112
+ ### 3c. Pages
113
+ Full interface from components. Responsive collapse at 768px. All viewport states (loading → populated → empty → error). Nav with active states and mobile collapse.
114
+
115
+ ### 3d. Requirements
116
+ - Check `package.json` before importing — never assume a library exists
117
+ - Framework-appropriate patterns (Server Components, island architecture)
118
+ - Semantic HTML: `<nav>`, `<main>`, `<section>`, `<article>`, `<aside>`, `<header>`, `<footer>`
119
+ - A11y: focus rings, skip-to-content, alt text, aria labels
120
+ - Meta: `<title>`, description, `og:image`, viewport
121
+
122
+ ## Phase 4: Audit
123
+
124
+ ### Priority 1 (do first)
125
+ - **Typography**: font matches spec? scale correct? tracking? no orphans? max-width?
126
+ - **Color**: single accent? saturation < 80%? no AI purple? consistent? dark not pure black?
127
+ - **Layout**: grid not flexbox math? `min-h-[100dvh]`? responsive at 768px?
128
+
129
+ ### Priority 2 (feel)
130
+ - **Interactivity**: hover on all clickables? active feedback? focus rings? 200-300ms transitions?
131
+ - **States**: every component has loading/empty/error? skeletons (not spinners)?
132
+ - **Motion**: scroll entries? staggered? spring physics?
133
+
134
+ ### Priority 3 (content)
135
+ - No lorem ipsum, cliches (Elevate, etc), generic names, emojis, bad icons?
136
+
137
+ ### Priority 4 (hardening)
138
+ - Double-Bezel or appropriate card? button-in-button? nav active states?
139
+ - Consistent icon stroke? semantic HTML? no inline styles?
140
+ - 404 page? skip-to-content? meta tags? cookie consent?
141
+
142
+ ### Priority 5 (existing project redesign scan)
143
+ - **Typography audit**: browser default fonts or Inter everywhere? Only Regular/Bold weights? Missing letter-spacing? All-caps subheaders everywhere? Orphaned words?
144
+ - **Color audit**: pure `#000` background? Oversaturated accents? Mixing warm + cool grays? AI purple/blue gradient? Generic `box-shadow` (pure black tint)? No texture (pure flat)?
145
+ - **Layout audit**: 3-equal-card rows? `height: 100vh` instead of `min-h-[100dvh]`? Complex flexbox percentage math? Everything centered and symmetrical?
146
+ - **Surface audit**: flat sections with no visual depth? No background imagery? Sudden dark section in light page?
147
+ - **Icon audit**: generic thin-line icon library? Rocket ship / shield cliches?
148
+
149
+ ### Phase 5: Iterate
150
+ 1. Fix in Priority order. Re-audit after each level.
151
+ 2. All P1-P3 pass → done. P4 surface as recommendations.
152
+ 3. Blocked on a check → narrow scope or surface.
153
+
154
+ ## Hard Bans
155
+
156
+ - No emojis in code/content/alt/markup
157
+ - No Lorem Ipsum — write real draft copy
158
+ - No "Elevate", "Seamless", "Next-Gen", "Unleash", "Game-changer"
159
+ - No generic names (John Doe, Acme Corp)
160
+ - No rocket ship / shield cliche icons
161
+ - No purple/blue neon gradients
162
+ - No dark section in white page without committed dark mode
163
+ - No animated `top/left/width/height`
164
+ - No `window.addEventListener('scroll')`
165
+ - No `h-screen`
166
+ - No pill shapes on large containers, cards, or primary buttons
167
+ - No generic icon libraries (Lucide, Feather, Heroicons)
168
+ - No 3-equal-card feature rows — replace with 2-column zig-zag, asymmetric grid, or masonry
169
+ - No `#000000` backgrounds — use off-black `#0a0a0a` or `#121212`
170
+ - No even 45-degree linear gradients — break with radial, noise overlay, or mesh
171
+ - No mixing warm and cool grays — stick to one gray family throughout
172
+ - No random dark section in light page (or vice versa) — commit to one substrate
173
+ - No orphaned words — use `text-wrap: balance` or `text-wrap: pretty`
174
+
175
+ ## Design Principles
176
+
177
+ 1. **Intentionality** — Commit to direction. Maximalism and minimalism both work if committed.
178
+ 2. **Engineering** — Buttons have structure, physics, a11y. Not just "style."
179
+ 3. **Consistency** — One palette, one font system, one architecture.
180
+ 4. **Performance** — Beautiful + laggy = not beautiful. Transform-only, guarded backdrop-blur.
181
+ 5. **Ship every state** — Default-only is not production.
182
+ 6. **Redesign first** — For existing projects, diagnose before prescribing. Improve in-place. A targeted upgrade beats a rewrite.