create-ccc-tutor 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (106)
  1. package/README.md +41 -0
  2. package/bin/cli.js +76 -0
  3. package/package.json +28 -0
  4. package/template/.claude/commands/abandon.md +7 -0
  5. package/template/.claude/commands/add-anti-flag.md +7 -0
  6. package/template/.claude/commands/add-constitution-clause.md +7 -0
  7. package/template/.claude/commands/audit-spec.md +7 -0
  8. package/template/.claude/commands/commit.md +7 -0
  9. package/template/.claude/commands/constitution-edit.md +7 -0
  10. package/template/.claude/commands/db-schema.md +7 -0
  11. package/template/.claude/commands/exam.md +66 -0
  12. package/template/.claude/commands/execution-plan.md +7 -0
  13. package/template/.claude/commands/feature-draft.md +7 -0
  14. package/template/.claude/commands/handoff.md +7 -0
  15. package/template/.claude/commands/implement.md +7 -0
  16. package/template/.claude/commands/init.md +7 -0
  17. package/template/.claude/commands/next.md +7 -0
  18. package/template/.claude/commands/offload.md +7 -0
  19. package/template/.claude/commands/pickup.md +7 -0
  20. package/template/.claude/commands/recall.md +7 -0
  21. package/template/.claude/commands/remember.md +7 -0
  22. package/template/.claude/commands/slide.md +87 -0
  23. package/template/.claude/commands/spec-finalize.md +7 -0
  24. package/template/.claude/commands/test-fix.md +7 -0
  25. package/template/.claude/commands/uninstall.md +7 -0
  26. package/template/.claude/settings.json +161 -0
  27. package/template/.claude-plugin/plugin.json +41 -0
  28. package/template/.codex/config.toml +24 -0
  29. package/template/.codex/hooks.json +4 -0
  30. package/template/.codex/install-skills.sh +18 -0
  31. package/template/.codex/skills/exam/SKILL.md +61 -0
  32. package/template/.codex/skills/slide/SKILL.md +69 -0
  33. package/template/.harness/agents/README.md +70 -0
  34. package/template/.harness/agents/_template/junior-agent-template.md +116 -0
  35. package/template/.harness/agents/backend-reviewer.md +153 -0
  36. package/template/.harness/agents/frontend-reviewer.md +158 -0
  37. package/template/.harness/agents/security-reviewer.md +148 -0
  38. package/template/.harness/agents/test-fixer.md +147 -0
  39. package/template/.harness/docs/doc-sync.md +29 -0
  40. package/template/.harness/docs/git-hygiene.md +56 -0
  41. package/template/.harness/docs/spec-model.md +47 -0
  42. package/template/.harness/docs/tool-map.md +120 -0
  43. package/template/.harness/docs/workflow.md +59 -0
  44. package/template/.harness/scripts/README.md +70 -0
  45. package/template/.harness/scripts/auditor-gate.sh +388 -0
  46. package/template/.harness/scripts/bootstrap-check.sh +103 -0
  47. package/template/.harness/scripts/budget-monitor.sh +223 -0
  48. package/template/.harness/scripts/check-prereqs.sh +165 -0
  49. package/template/.harness/scripts/checkpoint-recall.sh +136 -0
  50. package/template/.harness/scripts/checkpoint-write.sh +281 -0
  51. package/template/.harness/scripts/decision-log-append.sh +90 -0
  52. package/template/.harness/scripts/env-check.sh +286 -0
  53. package/template/.harness/scripts/format-edit.sh +80 -0
  54. package/template/.harness/scripts/lint-bans.sh +110 -0
  55. package/template/.harness/scripts/memory-archive.sh +129 -0
  56. package/template/.harness/scripts/memory-recall.sh +197 -0
  57. package/template/.harness/scripts/memory-snapshot.sh +124 -0
  58. package/template/.harness/scripts/post-migration.sh +58 -0
  59. package/template/.harness/scripts/precommit-cycles.sh +74 -0
  60. package/template/.harness/scripts/precommit-typecheck.sh +69 -0
  61. package/template/.harness/scripts/scratchpad-recall.sh +83 -0
  62. package/template/.harness/scripts/scratchpad-update.sh +39 -0
  63. package/template/.harness/scripts/standalone-bootstrap.md +443 -0
  64. package/template/.harness/skills/abandon/SKILL.md +157 -0
  65. package/template/.harness/skills/add-anti-flag/SKILL.md +205 -0
  66. package/template/.harness/skills/add-constitution-clause/SKILL.md +244 -0
  67. package/template/.harness/skills/audit-spec/SKILL.md +395 -0
  68. package/template/.harness/skills/commit/SKILL.md +270 -0
  69. package/template/.harness/skills/constitution-edit/SKILL.md +292 -0
  70. package/template/.harness/skills/db-schema/SKILL.md +145 -0
  71. package/template/.harness/skills/db-schema/references/methodology.md +202 -0
  72. package/template/.harness/skills/execution-plan/SKILL.md +346 -0
  73. package/template/.harness/skills/feature-draft/SKILL.md +426 -0
  74. package/template/.harness/skills/handoff/SKILL.md +211 -0
  75. package/template/.harness/skills/implement/SKILL.md +355 -0
  76. package/template/.harness/skills/init/SKILL.md +805 -0
  77. package/template/.harness/skills/next/SKILL.md +245 -0
  78. package/template/.harness/skills/offload/SKILL.md +134 -0
  79. package/template/.harness/skills/pickup/SKILL.md +213 -0
  80. package/template/.harness/skills/recall/SKILL.md +159 -0
  81. package/template/.harness/skills/remember/SKILL.md +205 -0
  82. package/template/.harness/skills/spec-finalize/SKILL.md +196 -0
  83. package/template/.harness/skills/test-fix/SKILL.md +363 -0
  84. package/template/.harness/skills/uninstall/SKILL.md +370 -0
  85. package/template/.harness/state/install.json +83 -0
  86. package/template/AGENTS.md +262 -0
  87. package/template/CCC_MAGI_LICENSE +201 -0
  88. package/template/CCC_MAGI_README.md +986 -0
  89. package/template/CLAUDE.md +658 -0
  90. package/template/codex.md +39 -0
  91. package/template/constitution.md +164 -0
  92. package/template/course/README.md +15 -0
  93. package/template/course/course_code(example)/exam/README.md +2 -0
  94. package/template/course/course_code(example)/slide/slide_example-1.pdf +40 -0
  95. package/template/course/course_code(example)/slide/slide_example-2.pdf +40 -0
  96. package/template/docs/features/slide-query-implementation.md +79 -0
  97. package/template/docs/features/slide-query.md +211 -0
  98. package/template/docs-harness/README.md +42 -0
  99. package/template/docs-harness/adoption-playbook.md +373 -0
  100. package/template/docs-harness/ccc-step1-driver-template.md +288 -0
  101. package/template/docs-harness/cli-configs-README.md +78 -0
  102. package/template/docs-harness/context-architecture-v2.md +249 -0
  103. package/template/docs-harness/design-spec.md +437 -0
  104. package/template/docs-harness/memory-layer.md +135 -0
  105. package/template/docs-harness/retrospective-notes.md +204 -0
  106. package/template/gitignore +106 -0
@@ -0,0 +1,437 @@
# Harness Design Spec

This document is the **architectural rationale** for the harness. It explains *why* the harness is shaped the way it is — the operating model, the load-bearing invariants, and the blind spots that motivated the design.

Operational details (how each stage works step-by-step) live in `outcome/skills/<skill>/SKILL.md`. Rules and identity live in `outcome/constitution.md`. Workflow conventions live in `outcome/CLAUDE.md`. This file is the meta-layer that explains why all of that is structured the way it is.

---

## 0. What this document is for

The harness encodes a small number of opinionated choices about how an AI-assisted workflow should be run:

- A human owns intent; AI never overrides it.
- Every code change passes a different-model audit before commit.
- The thing a non-technical person reads (the spec) is separate from the thing engineers read (the implementation notes).
- Real-human verification is mandatory, not optional.
- Specs and code stay in sync, mechanically enforced.

This document explains the rationale for each of those choices, plus the secondary structures (lanes, stages, scenario IDs, agent roles) that follow from them.

A contributor — human or AI — who understands the rationale can extend the harness without breaking it. A contributor who only reads the operational docs will eventually rationalize past an invariant that exists for a non-obvious reason. This file is the defense against that.

---

## 1. The operating model

The harness organizes work like a small company.

### 1.1 Roles

```
CEO (the human user)
  - Defines intent (happy paths, edge-case behavior)
  - Makes business / user-impact decisions
  - Defines project identity (constitution.md Section 2 — who we serve, what we don't do, compliance, performance floors)
  - Runs smoke tests on the actual product
  - Watches production telemetry
  - Has final say when models disagree

Tech Lead (Main Claude + auditor model)
  - Discovers edge cases with the CEO
  - Decides implementation approaches
  - Writes code (Main Claude)
  - Cross-checks via a different model (auditor)

Junior Reviewers (subagents, mechanical rule enforcement only)
  - frontend-reviewer / backend-reviewer / security-reviewer / ...
  - Read the diff, cite the rule, report violations
  - Never exercise judgment

Junior Programmer (subagent)
  - test-fixer — writes tests, fixes failing ones
  - Fresh context, doesn't know what the implementer thought
```

### 1.2 Who decides what

| Decision class | Owner | Examples |
|----------------|-------|----------|
| Intent / scope / business trade-offs | **CEO only** | "What should this feature do?" |
| Implementation approach | **Tech Lead only** | "Which library? Which pattern?" |
| Rule conformance | **Junior reviewer (mechanical)** | "Does the diff break a documented rule?" |
| Test correctness | **Auditor (judgment)** | "Did the fix actually fix it?" |

### 1.3 What the CEO does NOT do

- Read code (the harness exists so the CEO can ship without reading code)
- Make technical choices (library X vs library Y is Tech Lead territory)
- Justify intent (the CEO's word *is* intent)

### 1.4 What the Tech Lead does NOT do

- Question CEO intent at later stages (paraphrase to confirm — never to challenge)
- Reverse CEO decisions
- Ask the CEO technical questions ("hook or context API?" — translate to user impact instead)

### 1.5 Why this model

LLMs default to "helpful assistant" behavior — they want to please. That collapses the role boundary: the model proposes intent, evaluates its own intent, then judges its own output. Bias compounds across the loop. Explicit role separation prevents this:

- The CEO holds intent → the LLM cannot drift the intent.
- The auditor (different model) judges output → the implementer cannot grade its own work.
- Junior reviewers enforce rules mechanically → no judgment-shaped escape hatch.

The cost is some upfront ceremony. The benefit is a workflow that doesn't accumulate hidden drift across iterations.

---

## 2. The load-bearing invariant: cross-model audit

**Every code change passes a different-model audit before commit.** No exceptions by lane or surface. (See `constitution.md § 1`.)

### 2.1 Why "different model" specifically

Same-model self-audit reliably misses what same-model implementation rationalized. The same priors that generated the code generate the audit. Switching models (Claude → Codex, or vice versa) gets you a reviewer with different priors who catches different mistakes.

### 2.2 Why "every change," not just "important ones"

"This change is too small to audit" is the precise failure mode the invariant prevents. Small changes accumulate. Model-bias drift is per-change, not per-line. The cost of running the audit on a trivial change is small (Quick mode runs in seconds); the cost of skipping is unbounded (silent drift surfaces months later).

### 2.3 Audit intensity scales with change size

| Lane | Audit shape |
|------|-------------|
| Full workflow (new feature / audit-mode) | Full review across stages 1–6 |
| Stability-fix | Full review at stages 5 & 6 |
| Trivial-change | Quick mode (BLOCKING-only — security holes, data loss, defects only) |

Audit-exempt changes: pure docs, pure comments, pure formatting, pure tooling. Everything else passes audit.
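The table and the exemption rule above amount to a small lookup. The sketch below is hypothetical: `auditFor`, `ChangeKind`, and the `AuditShape` variants are illustrative names, not the harness's API, and it assumes the change has already been categorized by what it touches. The one point it tries to capture is that "pure" means *every* part of the change is exempt; a single code edit puts the whole change back under audit.

```typescript
// Hypothetical sketch of Section 2.3: audit intensity by lane, plus the exemption rule.
type Lane = "full" | "stability-fix" | "trivial-change";

// What a change touches; only "code" makes a change audit-bearing.
type ChangeKind = "code" | "docs" | "comments" | "formatting" | "tooling";

type AuditShape =
  | { kind: "exempt" }
  | { kind: "full"; stages: number[] }
  | { kind: "quick"; blockingOnly: true };

const EXEMPT_KINDS: ChangeKind[] = ["docs", "comments", "formatting", "tooling"];

function auditFor(lane: Lane, kinds: ChangeKind[]): AuditShape {
  // "Pure" docs/comments/formatting/tooling: every part of the change is exempt.
  const pure = kinds.length > 0 && kinds.every((k) => EXEMPT_KINDS.indexOf(k) !== -1);
  if (pure) return { kind: "exempt" };
  switch (lane) {
    case "full":
      return { kind: "full", stages: [1, 2, 3, 4, 5, 6] };
    case "stability-fix":
      return { kind: "full", stages: [5, 6] };
    case "trivial-change":
      // Quick mode: only BLOCKING findings (security holes, data loss, defects).
      return { kind: "quick", blockingOnly: true };
  }
}
```

For example, a trivial change that edits both code and a README still gets a Quick audit, because the code part alone is enough to make it audit-bearing.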

### 2.4 Single-engine fallback

If `auditor_model = None` (the user can't run a second model), the audit step still runs — but uses a fresh-context invocation of the same model. The bias-cancellation guarantee weakens; the discipline of a second look survives. Strictly worse than two-model; strictly better than no audit.

---

## 3. The two-file feature spec model

Every feature has up to two docs:

1. **`{{spec_dir}}<name>.md`** — the **CEO spec**. Plain language. The CEO signs off on it; the CEO reads it end-to-end at smoke time. Tech terms are categorically banned.
2. **`{{implementation_dir}}<name>-implementation.md`** — the **manager's notes**. Tech detail. Routing tables, state keys, library versions, scenario→test maps, audit-delta ledgers. The Tech Lead and junior reviewers read it; the CEO doesn't have to.

### 3.1 Why two files

When CEO content and manager content live in one file:

- The CEO can't read their own spec (it's full of tech jargon they don't understand).
- The manager can't write the file freely (every paragraph has to be defensible in plain language).
- Drift compounds: the manager updates tech detail; the CEO doesn't know the user-facing behavior changed.

Separating them:

- The CEO spec stays easy for a non-engineer to scan and sign off on.
- The implementation notes can carry every routing table, RLS policy, and test-binding the harness needs.
- Drift between the two surfaces is detectable mechanically (every spec change must touch the right file in the same commit — `CLAUDE.md § Doc-in-sync responsibility`).

### 3.2 The plain-language rule

The CEO spec must not contain any of the following content types (translate them to behavior instead):

- Library / framework names
- Code identifiers (hook / function / store names)
- File paths
- RPC / function / table / column names
- Payload shapes (JSON field lists)
- Migration timestamps
- SDK error type names
- HTTP status codes as primary verbs
- Test file paths and test descriptions
- Query key constants

**The shape test:** if a non-engineer reading the sentence aloud would stumble, the sentence belongs in the implementation file. Translate to outcome ("nothing about the user reaches the device before the gate is passed"), not mechanism ("the RPC returns only `{state, reason, dormancy_required}`").
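Some of the banned categories are regular enough to flag mechanically before a human applies the shape test. The sketch below is a hypothetical heuristic, not a check the harness is documented to ship: `flagJargon` and every pattern in it are assumptions, and it only looks for backticked identifiers, file paths, status codes used as verbs, and timestamp-style migration names. A sentence with no hits can still fail the shape test.

```typescript
// Hypothetical heuristic for Section 3.2. The patterns are illustrative assumptions
// and cover only a few banned categories; the shape test itself stays a human judgment.
interface JargonHit {
  category: string;
  match: string;
}

const RULES: { category: string; pattern: RegExp }[] = [
  // Backticked text is almost always an identifier, path, or payload shape.
  { category: "code identifier", pattern: /`[^`]+`/g },
  // Path-like tokens such as src/chat/store.ts.
  { category: "file path", pattern: /\b[\w.-]+\/[\w.\/-]+\.\w+\b/g },
  // Status codes used as verbs, e.g. "returns 403".
  { category: "HTTP status code", pattern: /\b(?:returns?|responds? with)\s+[1-5]\d\d\b/gi },
  // 14-digit migration timestamps such as 20240105123000.
  { category: "migration timestamp", pattern: /\b\d{14}\b/g },
];

function flagJargon(sentence: string): JargonHit[] {
  const hits: JargonHit[] = [];
  for (const rule of RULES) {
    // With the g flag, String.prototype.match returns every match (or null).
    const found = sentence.match(rule.pattern);
    if (found) {
      for (const m of found) hits.push({ category: rule.category, match: m });
    }
  }
  return hits;
}
```

Run against the two example sentences in the shape test, the outcome sentence produces no hits, while the mechanism sentence is flagged for its backticked payload.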

### 3.3 Audit-delta ledgers belong in the implementation file

When `/audit-spec` produces deltas (code-vs-spec reconciliation), the ledger goes in `{{implementation_dir}}<name>-implementation.md`, never in the CEO spec. By definition the ledger tracks how code matches spec — that is manager-domain content.

### 3.4 EARS notation for manager-domain functional requirements

**Manager-domain functional requirements use EARS notation** — see `CLAUDE.md § Two-file feature spec model > EARS notation` for the full guide. CEO-domain files stay plain prose; EARS is manager-only.

---

## 4. The 9-stage workflow

Every full-lane change passes through nine stages:

1. **Draft / as-built spec** — paraphrase CEO intent, run edge-case sweep, or reverse-engineer existing code.
2. **Finalize spec** — mark FINALIZED, final auditor cross-check.
3. **Design schema** — only if `backend_db_type` is configured.
4. **Write execution plan** — per-file checklist, library-version verification.
5. **Implement per plan** — mechanical reviewer chain + auditor judgment.
6. **Auto tests** — test-fixer subagent + four-axis auditor audit.
7. **CEO smoke test** — real human runs the application manually.
8. **Commit & push** — Conventional Commits, doc-in-sync check; push only if both Stage 6 and Stage 7 passed.
9. **Watch after release** — observe telemetry for 24h.

(Stage details live in `outcome/skills/<skill>/SKILL.md`.)

### 4.1 Why nine stages

Fewer stages save time but lose enforcement points. Each stage is a gate that catches a different class of failure:

- Stage 2 catches sloppy hand-offs from Stage 1.
- Stage 4 catches false library-API assumptions (high-trap-libraries) before any code is written — a previously un-audited blind spot in early versions of the harness.
- Stage 6 catches assertion-loosening / test-skipping / scenario-coverage gaps.
- Stage 7 catches "AI thinks it works, actually doesn't" — a category that no LLM audit reliably catches.
- Stage 9 catches drift between testing and production usage.

### 4.2 Why this order, no reordering

Stages don't compose freely. Spec drives schema; schema drives plan; plan drives implementation; implementation drives tests. Reordering breaks the chain. A stage may be **skipped** via an explicit lane (stability-fix skips 1–3; trivial-change condenses 4–5), but never **reordered**.
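Because the rule is "skip via a lane, never reorder", a completed run can be checked mechanically. A minimal sketch, assuming a run is recorded as stage numbers in completion order; `checkStageOrder` and `allowedSkips` are hypothetical names, the Stage 1–3 skip for both short lanes comes from the lane table in Section 5, and trivial-change's condensed Stages 4–5 are modeled as still present rather than merged.

```typescript
// Hypothetical sketch of Section 4.2: stages may be skipped by lane, never reordered.
type Lane = "full" | "stability-fix" | "trivial-change";

const ALL_STAGES = [1, 2, 3, 4, 5, 6, 7, 8, 9];

function allowedSkips(lane: Lane, dbConfigured: boolean): number[] {
  // Both short lanes skip Stages 1-3; trivial-change keeps 4-5, only condensed.
  const skips: number[] = lane === "full" ? [] : [1, 2, 3];
  // Stage 3 (schema design) only applies when backend_db_type is configured.
  if (!dbConfigured && skips.indexOf(3) === -1) skips.push(3);
  return skips;
}

function checkStageOrder(lane: Lane, dbConfigured: boolean, run: number[]): string[] {
  const problems: string[] = [];
  // Stage numbers must strictly increase: any step backwards is a reorder.
  let previous = 0;
  for (const stage of run) {
    if (stage <= previous) problems.push(`stage ${stage} ran after stage ${previous}: reordered`);
    previous = stage;
  }
  // Every stage that did not run must be one the lane is allowed to skip.
  const skips = allowedSkips(lane, dbConfigured);
  for (const stage of ALL_STAGES) {
    if (run.indexOf(stage) === -1 && skips.indexOf(stage) === -1) {
      problems.push(`stage ${stage} was skipped but the ${lane} lane does not allow it`);
    }
  }
  return problems;
}
```

An empty result means the run respected the chain; each string names one broken link.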

### 4.3 Stage 6's fourth axis — spec-vs-reality match

Stage 6 historically had three axes (test legitimacy, scenario coverage, fix correctness). A fourth axis was added after a real incident: the implementer had Stage 5 approved by the auditor, then quietly edited a spec sentence during deploy, and the now-incorrect sentence sent the CEO's smoke test in the wrong direction. The Stage 5 auditor never saw the spec edit (it audited the implementation diff); Stage 7 ran against a spec that no longer matched the code. The fourth axis at Stage 6 catches this drift before the CEO reads the spec to drive the smoke test.

The axis is narrow: it flags sentences asserting user-observable behavior the code provably doesn't deliver, or guarantees the code doesn't enforce. It does NOT police plain-language imprecision — the spec is supposed to omit mechanism; that's the two-file model's whole point.

---

## 5. The three lanes

Three lanes for three change shapes:

| Lane | When | Stages |
|------|------|--------|
| **Full workflow** | New feature / intent change / schema change / new dependency | All 9 |
| **Stability-fix** | Bug fix; intent unchanged, no new surface, no schema change | Skip 1–3; failing test mandatory before fix |
| **Trivial-change** | <20 LOC, no new surface, no schema change, no intent change | Skip 1–3; condensed 4–5; auditor Quick mode |
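The entry criteria in the table can be written down as a first-pass classifier. This is a sketch under assumptions: the `ChangeFacts` fields are hypothetical inputs the Tech Lead would fill in from the diff, and the table does not say which lane wins when a small bug fix also meets the trivial-change criteria, so the sketch assumes stability-fix takes precedence (it keeps the failing-test-first rule) and that anything matching no row falls back to the full workflow. The result is only a proposal; the CEO still confirms the lane.

```typescript
// Hypothetical first-pass lane classifier for Section 5. The CEO confirms the result.
type Lane = "full" | "stability-fix" | "trivial-change";

interface ChangeFacts {
  isBugFix: boolean;
  changesIntent: boolean;
  addsSurface: boolean; // new screen, endpoint, or other user-facing surface
  changesSchema: boolean;
  addsDependency: boolean;
  linesChanged: number;
}

function proposeLane(c: ChangeFacts): Lane {
  // Any of these forces the full workflow.
  if (c.changesIntent || c.changesSchema || c.addsDependency) return "full";
  // Assumption: a surface-free bug fix is a stability-fix even when it is also small.
  if (c.isBugFix && !c.addsSurface) return "stability-fix";
  // Small edits with no new surface, schema, or intent change.
  if (c.linesChanged < 20 && !c.addsSurface) return "trivial-change";
  // Assumption: anything else defaults to the strictest lane.
  return "full";
}
```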

### 5.1 Why three, not "as needed"

"As needed" means the implementer decides what level of scrutiny their own change deserves. That's self-grading. Three named lanes with explicit entry criteria force the conversation: which lane is this, and why?

The Tech Lead infers the lane from the change shape; the CEO confirms. The lane never switches automatically mid-flow; if the lane was wrong, the CEO re-classifies.

### 5.2 The stability-fix invariant — failing test FIRST

A stability-fix change MUST author a failing test before the fix. The test is confirmed to fail on the broken code; then the fix lands; then the test passes. The reason: a "fix" without a test that fails because of the bug is not a fix, it's a guess. The harness mechanically enforces this at `/implement` Step 0 and `/test-fix` Step 0.
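The test-first invariant reduces to comparing two runs of the same new test. A minimal sketch, assuming the harness records the new test's outcome against the unfixed code and again after the fix; the `TestRun` shape and the `checkTestFirst` name are hypothetical and do not describe what `/implement` Step 0 actually executes.

```typescript
// Hypothetical sketch of the stability-fix rule in Section 5.2.
type Outcome = "pass" | "fail" | "skipped";

interface TestRun {
  testName: string;
  outcome: Outcome;
}

type TestFirstResult = { ok: true } | { ok: false; reason: string };

function checkTestFirst(beforeFix: TestRun, afterFix: TestRun): TestFirstResult {
  if (beforeFix.testName !== afterFix.testName) {
    return { ok: false, reason: "before/after runs are for different tests" };
  }
  // The test must reproduce the bug: it has to fail on the broken code.
  if (beforeFix.outcome !== "fail") {
    return { ok: false, reason: `test did not fail before the fix (${beforeFix.outcome}); the fix is a guess` };
  }
  // The fix must turn it green; a skipped test never counts as a pass.
  if (afterFix.outcome !== "pass") {
    return { ok: false, reason: `test did not pass after the fix (${afterFix.outcome})` };
  }
  return { ok: true };
}
```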

### 5.3 Trivial-change's escape hatch

Trivial-change uses auditor Quick mode (BLOCKING-only). If the auditor's Quick audit surfaces non-trivial concerns, that's a signal the lane is wrong — the change is misclassified. The harness surfaces this to the CEO for re-classification rather than silently treating the concerns as advisory and shipping.

---

## 6. Auditor opinion classification

Auditor findings are classified by intensity:

| Class | Handling | Examples |
|-------|----------|----------|
| **BLOCKING** | Must resolve. Pushback escalates to CEO. | Security holes; data loss; spec violations; race conditions; outright defects. |
| **STRONG** | Accept, or push back with explicit reasoning. | Better patterns exist; maintenance concerns; convention violations. |
| **ADVISORY** | Free choice. Usually accepted but skippable. | Style; naming; small improvements. |

The verdict comes back as a structured object (see `outcome/AGENTS.md § Verdict output`) with these fields:

- `verdict`: one of `PASS | CONCERNS | FAIL | WAIVED`
- `risk_score`: 0–10
- `waiver_reason`: optional
- `blocking_items[]` and `advisory_items[]`: each item is shaped `{ category, rule_source, finding }`; advisory items omit `category`

The gate acts on the verdict as follows:

- `FAIL` halts the flow until resolved.
- `CONCERNS` advances with a warning logged to `.harness/audits/concerns-*.json` for CEO review at commit time.
- `PASS` advances silently.
- `WAIVED` is a CEO override that advances with a logged `waiver_reason`; the gate rejects it if any blocking item has `category: "universal-core"`.
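The verdict fields above map onto a small typed model with one gate function. This is a sketch, not the harness's parser: the field names follow this section, but the authoritative format is whatever `outcome/AGENTS.md § Verdict output` defines, the `"project"` category and the `GateDecision` type are placeholders, and rejecting a `WAIVED` verdict that lacks a `waiver_reason` is an assumption.

```typescript
// Hypothetical model of the auditor verdict in Section 6. The authoritative shape is
// whatever AGENTS.md § Verdict output specifies; "project" is a placeholder category.
type Category = "universal-core" | "project";

interface BlockingItem {
  category: Category;
  rule_source: string;
  finding: string;
}

// Advisory items omit `category`.
interface AdvisoryItem {
  rule_source: string;
  finding: string;
}

interface AuditVerdict {
  verdict: "PASS" | "CONCERNS" | "FAIL" | "WAIVED";
  risk_score: number; // 0-10
  waiver_reason?: string;
  blocking_items: BlockingItem[];
  advisory_items: AdvisoryItem[];
}

type GateDecision = { advance: true; warning?: string } | { advance: false; reason: string };

function gate(v: AuditVerdict): GateDecision {
  switch (v.verdict) {
    case "PASS":
      return { advance: true };
    case "CONCERNS":
      return { advance: true, warning: "logged to .harness/audits/concerns-*.json for CEO review" };
    case "FAIL":
      return { advance: false, reason: "FAIL halts the flow until resolved" };
    case "WAIVED":
      // Universal Core items cannot be waived, even by the CEO.
      if (v.blocking_items.some((item) => item.category === "universal-core")) {
        return { advance: false, reason: "universal-core blocking items cannot be waived" };
      }
      // Assumption: a waiver without a recorded reason is treated as invalid.
      if (!v.waiver_reason) return { advance: false, reason: "WAIVED needs a waiver_reason" };
      return { advance: true, warning: `waived: ${v.waiver_reason}` };
  }
}
```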

### 6.1 CEO escalation pattern

When the auditor disagrees with the CEO on a BLOCKING item:

```
Manager: "Auditor disagreement on [item]:
  - Auditor view: A (reasoning: ...)
  - CEO view: B (reasoning: ...)

Impact:
  - User-result impact: yes/no
  - Cost / maintenance impact: yes/no
  - Security / risk trade-off: yes/no

CEO decision needed."
```

The CEO decides. Reasoning lands in the spec's `## Decision history` regardless of which side wins.

Exception: Universal Core items (`constitution.md § 1`) are not overridable, even by direct CEO instruction. If the auditor's finding maps to a Universal Core item, the instruction must be reformulated; it cannot be carried out as stated.

---

## 7. CEO vs Tech Lead decision criteria

A decision goes to the CEO if any of these is true:

1. **User-result impact** — the user sees something different, or behavior changes.
2. **Cost / maintenance impact** — new library, new external service, new infra.
3. **Security / risk trade-off** — "X makes it faster but weakens Y."

If none of those, the Tech Lead decides autonomously and records the reasoning in code comments or the implementation file.
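The three criteria work as an any-of test. A minimal sketch: the `DecisionImpact` fields simply restate the numbered criteria, and `routeDecision` is a hypothetical name rather than part of the harness.

```typescript
// Hypothetical restatement of the Section 7 routing rule.
interface DecisionImpact {
  userResultImpact: boolean; // the user sees something different, or behavior changes
  costOrMaintenanceImpact: boolean; // new library, external service, or infra
  securityRiskTradeoff: boolean; // "X makes it faster but weakens Y"
}

type Owner = "CEO" | "Tech Lead";

function routeDecision(d: DecisionImpact): Owner {
  // Any single criterion is enough to send the decision to the CEO.
  return d.userResultImpact || d.costOrMaintenanceImpact || d.securityRiskTradeoff ? "CEO" : "Tech Lead";
}
```

For instance, "useState or useReducer?" sets all three flags to false and stays with the Tech Lead, while the chat-caching question in Section 7.2 changes what the user sees and goes to the CEO.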

### 7.1 The self-check before asking

Before asking the CEO anything:

- Would the user notice the difference?
- Can the CEO meaningfully compare the options (without becoming an engineer)?
- Would the CEO be expected to have an answer, rather than defensibly saying "I don't know"?

Three "no"s = don't ask. Decide internally, document the reasoning.

### 7.2 Good question vs bad question

| ❌ Bad (CEO can't answer) | ✅ Good (translated to user result) |
|---------------------------|-------------------------------------|
| "useState or useReducer?" | Don't ask. Tech Lead's call. |
| "OR or AND in this access predicate?" | Don't ask. Tech Lead's call. |
| "Should we cache chat messages aggressively?" | "Option A: chat opens slow but works offline. Option B: chat opens fast but needs network. Which UX?" |

---

## 8. Junior subagent roles — mechanical only

Junior reviewers (`frontend-reviewer`, `backend-reviewer`, `security-reviewer`) and the junior programmer (`test-fixer`) do **mechanical work only**:

- **Reviewers**: read the diff, look up the rule, report violations. Never propose new rules, evaluate business logic, or exercise judgment.
- **Programmer (test-fixer)**: writes/edits test code from fresh context. Doesn't know what the implementer believed; can't be biased by their reasoning. Writes within hard rules (never `.skip`, never loosen an assertion, etc.).

### 8.1 Why mechanical-only

Judgment in junior agents creates a second judge that competes with the auditor. That defeats the bias-cancellation invariant. The auditor is the judgment layer; junior agents are the rule-enforcement layer. Each layer has a clean job. Mixing them muddies the audit chain.

### 8.2 Subagent invocation is mechanical too

Path-based triggers: `client_code_paths` matches → `frontend-reviewer` fires. `backend_code_paths` matches → `backend-reviewer` fires. Specific predicates → `security-reviewer` fires. The implementer doesn't pick which reviewer runs; the diff picks.

If the implementer could pick, "this diff doesn't need a security review" becomes an escape hatch. Mechanical selection closes it.
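Path-based selection can be sketched as a pure function from the changed files to the reviewers that must run. Assumptions: `client_code_paths` and `backend_code_paths` are treated as plain directory prefixes (the real slots may hold globs), `selectReviewers` is a hypothetical name, and the security check is a placeholder regex because this section does not list the actual predicates.

```typescript
// Hypothetical sketch of Section 8.2: the diff, not the implementer, picks reviewers.
type Reviewer = "frontend-reviewer" | "backend-reviewer" | "security-reviewer";

interface ReviewerSlots {
  client_code_paths: string[]; // treated as directory prefixes here
  backend_code_paths: string[];
}

// Placeholder predicate: the real security triggers are defined elsewhere in the harness.
function touchesSecuritySurface(file: string): boolean {
  return /auth|permission|polic(y|ies)|secret|token/i.test(file);
}

function underAny(file: string, prefixes: string[]): boolean {
  return prefixes.some((p) => file.indexOf(p) === 0);
}

function selectReviewers(changedFiles: string[], slots: ReviewerSlots): Reviewer[] {
  const picked: Reviewer[] = [];
  const add = (r: Reviewer) => {
    if (picked.indexOf(r) === -1) picked.push(r);
  };
  for (const file of changedFiles) {
    if (underAny(file, slots.client_code_paths)) add("frontend-reviewer");
    if (underAny(file, slots.backend_code_paths)) add("backend-reviewer");
    if (touchesSecuritySurface(file)) add("security-reviewer");
  }
  // There is deliberately no parameter that lets the caller drop a reviewer.
  return picked;
}
```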

---

## 9. Scenario classification — every edge case gets a category

Every edge-case scenario in the CEO spec carries one of two classifications:

- **[Required automated test]** — data correctness, security/permission, business logic, error handling, user data protection.
- **[Smoke test only]** — UI / animation / device-integration / timing-sensitive; cost of automating exceeds value.

### 9.1 Why classify

Without classification, every scenario implicitly becomes "automate everything" (overhead) or "skip the automated test if it feels hard" (drift). Explicit classification per scenario forces the conversation: can this be automated with reasonable effort, and is it the kind of failure that needs an automated catcher? The CEO confirms each classification; the LLM never assigns one silently.

### 9.2 Scenario-ID-to-test mapping

Every automated test carries a `// Verifies scenario X.Y` comment at the top:

```
// Verifies scenario 3.4 — <scenario name from spec>
test('<test name>', async () => { ... })
```

The comment is the source-of-truth binding. The implementation file's `## Scenario → automated test map` section is the human-facing lookup convenience. The CEO spec carries the classification only, never the test ID — file paths in the CEO spec violate the two-file model.
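Because the comment is the binding of record, the human-facing map can be regenerated from the test files instead of maintained by hand. A minimal sketch under assumptions: the comment sits on the line directly above a `test(` or `it(` call, scenario IDs are dotted numbers, and `buildScenarioMap` is a hypothetical helper rather than something the harness ships. Tests returned in `unbound` are the ones missing the mandatory comment.

```typescript
// Hypothetical sketch of Section 9.2: rebuild the scenario -> test map from comments.
interface Binding {
  scenario: string;
  test: string;
}

const SCENARIO_COMMENT = /^\s*\/\/\s*Verifies scenario (\d+(?:\.\d+)*)/;
const TEST_CALL = /^\s*(?:test|it)\(\s*(['"`])(.*?)\1/;

function buildScenarioMap(source: string): { bindings: Binding[]; unbound: string[] } {
  const bindings: Binding[] = [];
  const unbound: string[] = []; // tests with no scenario comment directly above them
  let pending: string | null = null;
  for (const line of source.split("\n")) {
    const comment = SCENARIO_COMMENT.exec(line);
    if (comment) {
      pending = comment[1] || null;
      continue;
    }
    const call = TEST_CALL.exec(line);
    if (call) {
      const name = call[2] || "";
      if (pending !== null) bindings.push({ scenario: pending, test: name });
      else unbound.push(name);
    }
    // Assumption: the comment must sit on the line immediately above the test.
    pending = null;
  }
  return { bindings, unbound };
}
```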

### 9.3 Why this matters months later

When the error tracker pings a regression six months from now, the failing test maps to a scenario ID, the scenario ID maps to a CEO-spec section that defines the expected behavior, and running `git bisect` with that test pinpoints the commit that introduced the regression. The chain is fast and unambiguous; without scenario IDs it's an archaeological dig.

---

## 10. Anti-flag rules — telling the auditor what to ignore

Projects accumulate deliberate conventions that look like issues to an outside reviewer. `Pressable` is the project standard; `TouchableOpacity` is banned. `(SELECT auth.uid())` is the project pattern; bare `auth.uid()` is wrong. Etc.

Without an anti-flag list, the auditor wastes signal on every false positive. The harness ships an empty anti-flag list in `AGENTS.md`; `/init` seeds a few examples based on the detected tech stack; `/add-anti-flag` grows the list as the project develops conventions.

### 10.1 Why this is a separate mechanism

Anti-flag rules are NOT the same as the audit's checklist. The checklist tells the auditor what to look for. The anti-flag list tells the auditor what NOT to flag. Both are necessary — the auditor must know what's a violation AND what looks like a violation but isn't.

---

## 11. Constitution as the immovable layer

The three-layer governance structure:

```
Constitution (Section 1: Universal Core)
  ↓ cannot be overridden by anyone, including CEO
CLAUDE.md (STRONG principles + operational guide)
  ↓ overridable by CEO with reasoning recorded
spec / rule sources (per-feature, per-area)
  ↓ refined per scenario
code
```
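The override rules in the diagram can be written as a small permission check. A sketch under assumptions: the `Layer` and `Actor` names and the `requestOverride` helper are hypothetical, and treating spec and rule sources like CLAUDE.md (overridable by the CEO with recorded reasoning) is an assumption, since the diagram only says those layers are refined per scenario.

```typescript
// Hypothetical sketch of the Section 11 layering: who may override which layer.
type Layer = "constitution-core" | "claude-md" | "spec";
type Actor = "CEO" | "Tech Lead" | "auditor";

type OverrideResult = { allowed: true; record: string } | { allowed: false; reason: string };

function requestOverride(layer: Layer, actor: Actor, reasoning: string): OverrideResult {
  if (layer === "constitution-core") {
    // Universal Core: not overridable by anyone, including the CEO.
    return { allowed: false, reason: "Universal Core cannot be overridden; reformulate the instruction" };
  }
  if (actor !== "CEO") {
    return { allowed: false, reason: `${actor} cannot override ${layer}; escalate to the CEO` };
  }
  // The override only counts if its reasoning is recorded.
  if (reasoning.trim() === "") {
    return { allowed: false, reason: "override requires recorded reasoning" };
  }
  return { allowed: true, record: `CEO override of ${layer}: ${reasoning.trim()}` };
}
```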

### 11.1 Why a constitution at all

CLAUDE.md is operational and grows over time. As CLAUDE.md grows, the load-bearing invariants get diluted in the operational detail. A separate constitution keeps the invariants small (~5 items), prominent (loaded first by every agent), and immune to silent revision via CLAUDE.md edits.

### 11.2 What goes in the constitution

Five items ship with the harness and cannot be removed:

1. Cross-model audit is mandatory.
2. Data ownership red line — user data doesn't leak into logs/errors/cache/cross-user responses.
3. CEO has final authority, EXCEPT on Universal Core.
4. Real-human smoke test is mandatory.
5. Spec and reality stay in sync.

Section 2 (project identity — filled at `/init`) and Section 3 (project-specific red lines — grows over time) are user-owned. Section 1 is harness-owned and immovable.

---

## 12. Blind spots this design closed

Earlier versions of an AI-workflow harness left these gaps:

| Blind spot | How this design closes it |
|------------|---------------------------|
| Stage 4 (planning) was un-audited | Auditor judgment audit at Stage 4 |
| UI / feature stability-fixes shipped without cross-model review | Stability-fix lane runs full audit at Stages 5 & 6 |
| Smoke-test failures patched without test-first verification | Stability-fix Stage 0 mechanically checks for a failing test before the fix |
| Trivial changes shipped without audit at all | Trivial-change lane runs Quick auditor audit |
| Tests passed but didn't tie to spec | Mandatory `// Verifies scenario X.Y` comment |
| CEO spec mixed with implementation detail | Two-file model |
| Manager would "just make the test pass" by loosening assertions | Fresh-context `test-fixer` with hard rules + four-axis auditor audit |
| Spec edits during deploy went unaudited | Stage 6's fourth axis (spec-vs-reality match) |
| Implementer self-grading "is this reviewer needed?" | Mechanical path-based reviewer triggers |
| Auditor opinions treated as suggestions | Verdict parsed from exit code, not prose |
| CEO told "this is too risky" with no decision frame | CEO escalation pattern (Section 6.1 here) |

Each row was once a real incident that surfaced as a bug after the fact. The design changes are the response to those incidents, not speculative future-proofing.

---

## 13. What this design is not

- **Not a build system.** The harness orchestrates conversation, audit, and commit gates. It doesn't replace your build tool.
- **Not a project boilerplate.** Tech stack, file structure, and code conventions are project-specific. The harness wraps them; it doesn't impose them.
- **Not an enterprise governance suite.** No RBAC, no audit log signing, no compliance attestations. (Those are reasonable extensions on top, not part of the core.)
- **Not a substitute for engineering judgment.** Every rule in the harness can be overridden by the CEO (except Universal Core items). The harness's job is to make sure each override is conscious.

---

## 14. Where everything else lives

| You need | Look here |
|----------|-----------|
| Project identity, what we are/aren't | `outcome/constitution.md § Section 2` |
| Universal Core invariants (immovable) | `outcome/constitution.md § Section 1` |
| Stage-by-stage operating procedure | `outcome/skills/<skill>/SKILL.md` |
| Workflow overview, lanes, doc-in-sync | `outcome/CLAUDE.md` |
| Auditor verdict format, anti-flag rules | `outcome/AGENTS.md` |
| Junior agent definitions + plugin template | `outcome/agents/` |
| CLI integration (hooks, MCP config) | `outcome/cli-configs/` |
| How to install the harness in a project | `adoption-playbook.md` (this directory) |
| Lessons from real-world use | `retrospective-notes.md` (this directory) |
| Slot definitions (all 30+) | `outcome/constitution.md § Slot registry` |

---

## 15. Change history

```
2026-05: v1 — initial generic-harness spec, extracted and abstracted from the original project-specific spec.
```
@@ -0,0 +1,135 @@
+ # Memory Layer (`.harness/memory/`)
+
+ Cross-session persistence for CCC-MAGI. File-based and hook-driven, with no dependencies beyond `bash` and `jq`, and no network calls.
+
+ ## What it does
+
+ Think of it as the project's **sticky-note wall**: a small notebook where Claude writes down decisions ("we use RLS, not middleware"), failures ("Google Vision was too expensive"), and observations ("FlashList beats FlatList here"). When you open a new Claude Code session on the same project, the harness reads the relevant notes and pins them to the conversation's context, so Claude starts informed instead of blank.
+
+ Without this layer, every new session starts with amnesia: Claude re-derives the same decisions, retries the same failed approaches, and asks the same questions. With it, multi-session work on a feature accumulates instead of restarting.
+
+ ## How it works
+
+ ### Files
+
+ ```
+ .harness/memory/
+ ├── observations.jsonl   # append-only JSONL; one entry per line
+ └── conventions.md       # markdown long-form for project conventions
+ ```
+
+ ### Entry schema (`observations.jsonl`)
+
+ One JSON object per line (pretty-printed below for readability; on disk, each entry is a single line):
+
+ ```json
+ {
+   "ts": "2026-05-24T14:00:00Z",
+   "kind": "decision",
+   "summary": "Use Supabase RLS instead of middleware",
+   "details": "middleware p99 = 800ms; RLS keeps the check at the DB layer",
+   "feature": "auth",
+   "files": ["src/auth.ts", "supabase/migrations/0042_rls.sql"],
+   "tags": ["auth", "rls", "supabase"],
+   "source": "manual"
+ }
+ ```
+
+ - **`kind`** — `decision` (a choice made), `failure` (an approach that didn't work), or `observation` (a general note).
+ - **`feature`** — name of the feature the entry relates to, or `null` for cross-cutting notes.
+ - **`source`** — `manual` (via `/remember`) or `session` (auto-captured at compaction time).
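A small validator makes the schema concrete. This is a Python sketch, not something the harness ships. It assumes every field shown in the example entry is required, and it checks `kind`, `source`, `feature`, `files`, and `tags` against the descriptions above.

```python
import json
from typing import Any, Dict

# Allowed values, taken from the field descriptions above.
KINDS = {"decision", "failure", "observation"}
SOURCES = {"manual", "session"}
# Assumption: every field shown in the example entry is required.
REQUIRED = ("ts", "kind", "summary", "details", "feature", "files", "tags", "source")


def validate_entry(line: str) -> Dict[str, Any]:
    """Parse one observations.jsonl line and check it against the documented schema.

    Raises ValueError on the first problem found (json.JSONDecodeError is a subclass).
    """
    entry = json.loads(line)
    if not isinstance(entry, dict):
        raise ValueError("entry must be a JSON object")
    missing = [key for key in REQUIRED if key not in entry]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(entry["kind"], str) or entry["kind"] not in KINDS:
        raise ValueError(f"unknown kind: {entry['kind']!r}")
    if not isinstance(entry["source"], str) or entry["source"] not in SOURCES:
        raise ValueError(f"unknown source: {entry['source']!r}")
    # `feature` is a feature name, or null (None) for cross-cutting notes.
    if entry["feature"] is not None and not isinstance(entry["feature"], str):
        raise ValueError("feature must be a string or null")
    for key in ("files", "tags"):
        value = entry[key]
        if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
            raise ValueError(f"{key} must be a list of strings")
    return entry
```

Because JSONL has exactly one object per line, a check like this can run line by line without ever loading the whole file.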
+
+ ### Hooks
+
+ | Hook | Script | Purpose | Frequency |
+ |------|--------|---------|-----------|
+ | `SessionStart` | `.harness/scripts/memory-recall.sh` | Read `observations.jsonl`, score entries by relevance to the current git branch's feature, inject top entries into `additionalContext`. | Every session start |
+ | `PreCompaction` | `.harness/scripts/memory-snapshot.sh` | Inject an instruction telling Claude to summarize the session's key decisions to `observations.jsonl` before compaction proceeds. | Only when context approaches limit |
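Here is a Python sketch of the final step of the `SessionStart` row: handing the recalled notes to Claude. It assumes Claude Code's JSON hook output format, in which a `SessionStart` hook prints an object carrying `hookSpecificOutput.additionalContext` to stdout. Check the hooks reference for your CLI version. The shipped `memory-recall.sh` is a bash/`jq` script, and the function name and header text below are hypothetical.

```python
import json
from typing import List


def session_start_output(notes: List[str]) -> str:
    """Return what a SessionStart hook would print to stdout to inject recalled notes.

    An empty string means "print nothing", matching the empty-memory case.
    """
    if not notes:
        return ""
    # Hypothetical header; the notes are rendered as a simple markdown list.
    context = "Recalled project memory (.harness/memory/observations.jsonl):\n" + "\n".join(
        f"- {note}" for note in notes
    )
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": context,
        }
    }
    return json.dumps(payload)
```

Returning an empty string for an empty note list keeps the "empty memory file costs zero tokens" property: the hook prints nothing at all.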
+
+ We deliberately do **not** use `PostToolUse` (too noisy — fires on every tool call) or `Stop`/`SessionEnd` (too frequent / unreliable across CLI versions).
+
+ ### Relevance scoring (`memory-recall.sh`)
+
+ For each entry in `observations.jsonl`:
+
+ - **+5** if the entry's `feature` matches the feature derived from the current git branch (`feat/<name>-*` → `<name>`).
+ - **+1** if the entry's timestamp is within the last 7 days.
+
+ Sort by score descending, then by timestamp descending. Take up to 10 entries, stopping early once the accumulated text exceeds ~8000 characters. If no entry scores above 0, fall back to the 3 most recent entries regardless of score. If the file is empty, emit nothing.
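The scoring and selection rules above can be sketched in Python. This is an illustration, not a port of `memory-recall.sh`. It makes three assumptions. First, `<name>` ends at the first hyphen after `feat/`. Second, all timestamps are UTC with a trailing `Z`. Third, the ~8000-character budget counts each serialized entry and stops *before* an entry that would exceed it.

```python
import json
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

MAX_ENTRIES = 10
MAX_CHARS = 8000  # the "~8000 chars" budget from the text
FALLBACK_COUNT = 3


def feature_from_branch(branch: str) -> Optional[str]:
    """Map `feat/<name>-*` to `<name>`; assumes <name> ends at the first hyphen."""
    match = re.match(r"feat/([^-/]+)", branch)
    return match.group(1) if match else None


def parse_ts(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def select_entries(lines: List[str], branch: str, now: datetime) -> List[Dict]:
    entries = [json.loads(line) for line in lines if line.strip()]
    feature = feature_from_branch(branch)

    def score(entry: Dict) -> int:
        points = 0
        if feature is not None and entry.get("feature") == feature:
            points += 5  # entry belongs to the feature on the current branch
        if now - parse_ts(entry["ts"]) <= timedelta(days=7):
            points += 1  # entry is from the last 7 days
        return points

    # Score DESC, then timestamp DESC; only entries scoring above 0 pass the filter.
    ranked = sorted(entries, key=lambda e: (score(e), parse_ts(e["ts"])), reverse=True)
    candidates = [e for e in ranked if score(e) > 0]
    if not candidates:
        # Fallback: most recent entries regardless of score (empty file -> empty list).
        return sorted(entries, key=lambda e: parse_ts(e["ts"]), reverse=True)[:FALLBACK_COUNT]

    picked: List[Dict] = []
    used = 0
    for entry in candidates[:MAX_ENTRIES]:
        size = len(json.dumps(entry))  # assumption: budget counts the serialized entry
        if picked and used + size > MAX_CHARS:
            break  # the next entry would push past the character budget
        picked.append(entry)
        used += size
    return picked
```

The budget and the count cap interact. Entries with ~3000-character `details` hit the 8000-character limit after two picks, well before the cap of 10.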
+
+ ## The `/remember` skill
+
+ User-invocable manual entry. Captures decisions, failures, and observations curated by the human rather than the LLM.
+
+ ```
+ /remember Use Supabase RLS, not middleware — middleware p99 was 800ms
+ ```
+
+ The skill:
+
+ 1. Parses `$ARGUMENTS` as the summary.
+ 2. Auto-extracts `feature` from the current git branch.
+ 3. Proposes `kind` based on phrasing (e.g., "use X" → `decision`; "didn't work" → `failure`).
+ 4. Proposes `files` from `git status --short`.
+ 5. Proposes `tags` from the summary's key nouns.
+ 6. Shows the full entry for confirmation.
+ 7. On approval, appends the JSON line to `.harness/memory/observations.jsonl`.
+
+ The entry's `source` is always `"manual"`, which distinguishes user-curated notes from auto-captured ones.
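Steps 2, 3, 4 and 7 can be sketched in Python. Several pieces are assumptions: the phrase lists in `propose_kind`, the empty `details` field, and the caller-supplied `tags`. The skill itself proposes these interactively and shows the entry for confirmation before writing. The helper names are hypothetical. Serializing with `json.dumps` also avoids the shell-quoting failure that an `echo '<json>'` append hits when a summary contains an apostrophe.

```python
import json
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List

# Hypothetical phrase lists standing in for the skill's phrasing-based proposal.
FAILURE_HINTS = ("didn't work", "did not work", "failed", "too expensive")
DECISION_PREFIXES = ("use ", "switch to ", "chose ")


def propose_kind(summary: str) -> str:
    text = summary.lower()
    if any(hint in text for hint in FAILURE_HINTS):
        return "failure"
    if text.startswith(DECISION_PREFIXES):
        return "decision"
    return "observation"


def files_from_status(status: str) -> List[str]:
    """Extract paths from `git status --short` output ("XY path" or "XY old -> new")."""
    paths = []
    for line in status.splitlines():
        if len(line) > 3:
            paths.append(line[3:].split(" -> ")[-1])  # keep the new name for renames
    return paths


def build_entry(summary: str, branch: str, status: str, tags: List[str], now: datetime) -> Dict:
    match = re.match(r"feat/([^-/]+)", branch)  # feat/<name>-* -> <name>
    return {
        "ts": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "kind": propose_kind(summary),
        "summary": summary,
        "details": "",  # assumption: /remember captures only a one-line summary
        "feature": match.group(1) if match else None,
        "files": files_from_status(status),
        "tags": tags,
        "source": "manual",  # always "manual" for /remember entries
    }


def append_entry(path: Path, entry: Dict) -> None:
    # json.dumps handles all quoting, so apostrophes in the summary cannot break the line.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
```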
+
+ ## Auto-capture via `PreCompaction`
+
+ Claude Code fires the `PreCompaction` hook when the conversation's context window is about to be compacted. We use this as a natural "save point": the hook injects an instruction telling Claude to pick the 3 most important items from the session and append them to `observations.jsonl` before compaction discards them.
+
+ The hook itself does no summarization. That requires an LLM, and one is already in the loop (Claude). The hook only orchestrates the prompt; Claude performs the actual `echo '<json>' >> observations.jsonl` appends.
+
+ The auto-captured entries are tagged `"source": "session"` to distinguish them from `/remember` entries.
+
+ ## Token economics
+
+ Memory recall adds **0 to ~3K tokens** to session startup (the `additionalContext` block), depending on how much memory has accumulated:
+
+ - **Empty memory file** → zero tokens; the hook emits nothing.
+ - **Small memory file** (1-5 entries) → a few hundred tokens.
+ - **Active project** (10+ entries spanning multiple features) → up to ~2-3K tokens for the top-10 filtered recall.
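As a rough cross-check of the upper bound, this calculation assumes the common rule of thumb of about 3.5 to 4 characters per token for English prose. That ratio is an approximation; real counts depend on the tokenizer. Under it, the ~8000-character recall budget works out to:

```math
\frac{8000\ \text{chars}}{4\ \text{chars/token}} = 2000\ \text{tokens}
\qquad
\frac{8000\ \text{chars}}{3.5\ \text{chars/token}} \approx 2286\ \text{tokens}
```

Before any formatting overhead, that is roughly 2.0K to 2.3K tokens, consistent with the ~2-3K estimate for an active project.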
+
+ Net savings materialize only on **multi-session work on the same feature**. A one-shot project with no prior sessions has nothing to recall, so it pays the capture cost without getting anything back. That's an honest tradeoff: the layer is designed for the case where you will open Claude Code on the same project 10+ times, not for one-off chats.
+
+ ## Privacy
+
+ By default, `.harness/memory/` is **NOT** in `.gitignore`. Reasoning:
+
+ - For teams, the memory wall is a shared artifact: everyone benefits from prior decisions and failures.
+ - The contents are summaries (not raw transcripts), so raw prompt content is not stored verbatim.
+ - The schema has no fields meant for secrets; entries are short, prose-level descriptions.
+
+ Solo developers who prefer local-only memory can uncomment the `.harness/memory/` line in the harness's shipped `.gitignore`.
+
+ If you want per-file granularity (commit `conventions.md` but ignore `observations.jsonl`, or vice versa), edit `.gitignore` accordingly.
+
+ ## What it is NOT
+
+ Explicit non-features, so you can calibrate expectations:
+
+ - **Not a cloud service.** No backend, no API, no account. Just files on disk.
+ - **Not a vector database.** Scoring is feature + recency. No embeddings, no semantic search.
+ - **Not auto-learning.** The memory grows only when (a) the user invokes `/remember`, or (b) Claude runs the `PreCompaction`-prompted append. There is no background process scanning the conversation.
+ - **Not a transcript log.** Entries are summaries, not raw messages. A single decision is one line; a two-hour design discussion that produced one decision is still one line.
+ - **Not a replacement for `constitution.md` / `CLAUDE.md`.** Those hold stable rules and project identity. The memory layer is volatile context: what's been tried, what's been chosen, what's been ruled out.
+
+ ## Comparison to mem0 / claude-mem
+
+ | Property | mem0 | claude-mem | CCC-MAGI memory |
+ |----------|------|------------|---------------------|
+ | Storage | Cloud (managed) or self-hosted vector DB | Local SQLite + embeddings | Plain files (`.jsonl` + `.md`) |
+ | Retrieval | Vector similarity | Hybrid (keyword + vector) | Feature + recency scoring |
+ | Auto-capture | Background heuristics | Per-message inference | PreCompaction-prompted (LLM does the summarization) |
+ | Network | Yes (managed) / Yes (self-host) | No | No |
+ | Dependencies | Python + cloud / DB | Node.js + sqlite + embedding model | `jq` + `bash` |
+ | Setup cost | Account / DB provisioning | npm install + model download | Zero (hooks already shipped) |
+ | Schema control | Library-defined | Library-defined | User-readable JSONL — grep-able, hand-editable |
+ | Lock-in | Yes | Partial | None (just text files) |
+
+ See `harness-2026-deep-comparison.md` for the longer-form comparison of CCC-MAGI vs. mem0 / claude-mem / Cursor Rules / etc.
+
+ The design tradeoff: we accept weaker retrieval (no semantic search) to keep the layer dependency-free, security-trivial, and version-controllable.