create-ccc-tutor 0.1.0
- package/README.md +41 -0
- package/bin/cli.js +76 -0
- package/package.json +28 -0
- package/template/.claude/commands/abandon.md +7 -0
- package/template/.claude/commands/add-anti-flag.md +7 -0
- package/template/.claude/commands/add-constitution-clause.md +7 -0
- package/template/.claude/commands/audit-spec.md +7 -0
- package/template/.claude/commands/commit.md +7 -0
- package/template/.claude/commands/constitution-edit.md +7 -0
- package/template/.claude/commands/db-schema.md +7 -0
- package/template/.claude/commands/exam.md +66 -0
- package/template/.claude/commands/execution-plan.md +7 -0
- package/template/.claude/commands/feature-draft.md +7 -0
- package/template/.claude/commands/handoff.md +7 -0
- package/template/.claude/commands/implement.md +7 -0
- package/template/.claude/commands/init.md +7 -0
- package/template/.claude/commands/next.md +7 -0
- package/template/.claude/commands/offload.md +7 -0
- package/template/.claude/commands/pickup.md +7 -0
- package/template/.claude/commands/recall.md +7 -0
- package/template/.claude/commands/remember.md +7 -0
- package/template/.claude/commands/slide.md +87 -0
- package/template/.claude/commands/spec-finalize.md +7 -0
- package/template/.claude/commands/test-fix.md +7 -0
- package/template/.claude/commands/uninstall.md +7 -0
- package/template/.claude/settings.json +161 -0
- package/template/.claude-plugin/plugin.json +41 -0
- package/template/.codex/config.toml +24 -0
- package/template/.codex/hooks.json +4 -0
- package/template/.codex/install-skills.sh +18 -0
- package/template/.codex/skills/exam/SKILL.md +61 -0
- package/template/.codex/skills/slide/SKILL.md +69 -0
- package/template/.harness/agents/README.md +70 -0
- package/template/.harness/agents/_template/junior-agent-template.md +116 -0
- package/template/.harness/agents/backend-reviewer.md +153 -0
- package/template/.harness/agents/frontend-reviewer.md +158 -0
- package/template/.harness/agents/security-reviewer.md +148 -0
- package/template/.harness/agents/test-fixer.md +147 -0
- package/template/.harness/docs/doc-sync.md +29 -0
- package/template/.harness/docs/git-hygiene.md +56 -0
- package/template/.harness/docs/spec-model.md +47 -0
- package/template/.harness/docs/tool-map.md +120 -0
- package/template/.harness/docs/workflow.md +59 -0
- package/template/.harness/scripts/README.md +70 -0
- package/template/.harness/scripts/auditor-gate.sh +388 -0
- package/template/.harness/scripts/bootstrap-check.sh +103 -0
- package/template/.harness/scripts/budget-monitor.sh +223 -0
- package/template/.harness/scripts/check-prereqs.sh +165 -0
- package/template/.harness/scripts/checkpoint-recall.sh +136 -0
- package/template/.harness/scripts/checkpoint-write.sh +281 -0
- package/template/.harness/scripts/decision-log-append.sh +90 -0
- package/template/.harness/scripts/env-check.sh +286 -0
- package/template/.harness/scripts/format-edit.sh +80 -0
- package/template/.harness/scripts/lint-bans.sh +110 -0
- package/template/.harness/scripts/memory-archive.sh +129 -0
- package/template/.harness/scripts/memory-recall.sh +197 -0
- package/template/.harness/scripts/memory-snapshot.sh +124 -0
- package/template/.harness/scripts/post-migration.sh +58 -0
- package/template/.harness/scripts/precommit-cycles.sh +74 -0
- package/template/.harness/scripts/precommit-typecheck.sh +69 -0
- package/template/.harness/scripts/scratchpad-recall.sh +83 -0
- package/template/.harness/scripts/scratchpad-update.sh +39 -0
- package/template/.harness/scripts/standalone-bootstrap.md +443 -0
- package/template/.harness/skills/abandon/SKILL.md +157 -0
- package/template/.harness/skills/add-anti-flag/SKILL.md +205 -0
- package/template/.harness/skills/add-constitution-clause/SKILL.md +244 -0
- package/template/.harness/skills/audit-spec/SKILL.md +395 -0
- package/template/.harness/skills/commit/SKILL.md +270 -0
- package/template/.harness/skills/constitution-edit/SKILL.md +292 -0
- package/template/.harness/skills/db-schema/SKILL.md +145 -0
- package/template/.harness/skills/db-schema/references/methodology.md +202 -0
- package/template/.harness/skills/execution-plan/SKILL.md +346 -0
- package/template/.harness/skills/feature-draft/SKILL.md +426 -0
- package/template/.harness/skills/handoff/SKILL.md +211 -0
- package/template/.harness/skills/implement/SKILL.md +355 -0
- package/template/.harness/skills/init/SKILL.md +805 -0
- package/template/.harness/skills/next/SKILL.md +245 -0
- package/template/.harness/skills/offload/SKILL.md +134 -0
- package/template/.harness/skills/pickup/SKILL.md +213 -0
- package/template/.harness/skills/recall/SKILL.md +159 -0
- package/template/.harness/skills/remember/SKILL.md +205 -0
- package/template/.harness/skills/spec-finalize/SKILL.md +196 -0
- package/template/.harness/skills/test-fix/SKILL.md +363 -0
- package/template/.harness/skills/uninstall/SKILL.md +370 -0
- package/template/.harness/state/install.json +83 -0
- package/template/AGENTS.md +262 -0
- package/template/CCC_MAGI_LICENSE +201 -0
- package/template/CCC_MAGI_README.md +986 -0
- package/template/CLAUDE.md +658 -0
- package/template/codex.md +39 -0
- package/template/constitution.md +164 -0
- package/template/course/README.md +15 -0
- package/template/course/course_code(example)/exam/README.md +2 -0
- package/template/course/course_code(example)/slide/slide_example-1.pdf +40 -0
- package/template/course/course_code(example)/slide/slide_example-2.pdf +40 -0
- package/template/docs/features/slide-query-implementation.md +79 -0
- package/template/docs/features/slide-query.md +211 -0
- package/template/docs-harness/README.md +42 -0
- package/template/docs-harness/adoption-playbook.md +373 -0
- package/template/docs-harness/ccc-step1-driver-template.md +288 -0
- package/template/docs-harness/cli-configs-README.md +78 -0
- package/template/docs-harness/context-architecture-v2.md +249 -0
- package/template/docs-harness/design-spec.md +437 -0
- package/template/docs-harness/memory-layer.md +135 -0
- package/template/docs-harness/retrospective-notes.md +204 -0
- package/template/gitignore +106 -0
@@ -0,0 +1,363 @@

---
name: test-fix
description: This skill should be used at stage 6 of the feature workflow, after implementation (stage 5) is complete. It runs the project's test suite ({{test_framework}}), and if any tests fail, spawns a fresh-context `test-fixer` subagent to diagnose and fix (up to 3 iterations), then audits fix legitimacy with a different-model auditor pass. On the stability-fix lane (no prior stage-5 audit) it runs an additional auditor backstop on the entire fix diff, closing the cross-model gap on hotfixes. Use this always after implementation — do not declare stage 5 done without running tests. Skipped entirely if `test_required = false`. Trigger when the user invokes /test-fix, says "write tests for X", "fix the failing test", "run tests", or when moving from implementation to verification.
argument-hint: [feature-name]
---

# /test-fix

Drive Stage 6 of the feature workflow. Three layers of independence remove the implementer-grades-own-work bias:

1. **Context-level independence** — fresh `test-fixer` subagent (junior programmer; writes test code only, no judgment).
2. **Model-level independence (post-fix audit)** — auditor ({{auditor_model}}) audits the test-fix diff on four axes:
   - **Test legitimacy** — assertions / skips / mocks / deletions
   - **Scenario coverage** — every CEO-spec scenario classified `[Required automated test]` has at least one test that references it via the `// Verifies scenario X.Y` comment
   - **Fix correctness** — when source files were modified during the fix loop, do those source changes actually address the failure without introducing new ones
   - **Spec-vs-reality match** — final gate before the CEO smoke test
3. **Model-level independence (stability-fix backstop)** — when the stability-fix lane is detected (no prior Stage 5 auditor audit on this change), an additional auditor audit covers fix correctness on the entire fix diff. Closes the gap where Stages 1–4 are skipped.

> *Constitutional basis: Constitution § 1 (cross-model audit) + § 4 (real-human smoke test follows — this is the final gate before CEO takes over).*

**Skipped entirely if `test_required = false`** — projects can opt out of automated tests at /init; in that case, jump from Stage 5 directly to Stage 7.

## Scenario-ID rule (load-bearing)

Every automated test produced or modified at this stage carries a comment:

```ts
// Verifies scenario 3.4 — <scenario name from spec>
test('<test name>', async () => { ... })
```

The comment ties the test to a scenario ID in `{{spec_dir}}<feature>.md`. Tests without the comment have no audit trail; the auditor's scenario-coverage check catches the gap.

When tests pass and the auditor advances (PASS, CONCERNS, or WAIVED), the resolved test bindings (file path + test name) are recorded in the implementation file's "Scenario → automated test map" section, NOT in the CEO spec.

**EARS-formatted requirements**: When the manager file has EARS-formatted requirements (`WHEN X THE SYSTEM SHALL Y`), each SHALL clause maps directly to a test assertion. The mapping is:

- The `WHEN` clause → the test's `Arrange` step (set up the trigger condition)
- The `THE SYSTEM SHALL` clause → the test's `Assert` step (verify the response)
- Verify the timing constraint if present (e.g., "within 500ms" → assert response time < 500ms)

For prose-style requirements (pre-EARS or simple features), interpret naturally — no special handling.
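
The scenario-ID comment can be checked mechanically. The following is a minimal sketch, assuming the comment always takes the form `// Verifies scenario <major>.<minor>`; the helper names `list_scenario_ids` and `covers_scenario` are hypothetical and are not part of the harness.

```bash
# Hypothetical helper: print the unique scenario IDs that a test file
# references through "// Verifies scenario X.Y" comments, one per line.
list_scenario_ids() {
  grep -oE '// Verifies scenario [0-9]+\.[0-9]+' "$1" \
    | sed -E 's/.* scenario //' \
    | sort -u
}

# Hypothetical helper: succeed only if the file references the given ID.
# -F treats the ID as a fixed string, -x requires a whole-line match.
covers_scenario() {
  list_scenario_ids "$1" | grep -qxF "$2"
}
```

A call such as `covers_scenario path/to/some.test.ts 3.4` would then report, through its exit status, whether that file names scenario 3.4.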

## Authoritative sources

1. The project's testing convention (`{{test_framework}}`, assertion rules)
2. `{{spec_dir}}<feature>.md` — the **CEO spec** (canonical scenario list + classification)
3. `{{implementation_dir}}<feature>-implementation.md` — manager-domain notes (when present)
4. `{{spec_dir}}<feature>-plan.md` — execution plan (when present; absent on stability-fix lane)
5. `.harness/agents/test-fixer.md` — the subagent definition (junior programmer, mechanical)
6. `.harness/scripts/auditor-gate.sh` — the auditor post-fix audit + diagnostic gate
7. `AGENTS.md` (root) — auditor standing context including `{{anti_flag_rules}}`
8. Root `CLAUDE.md` § Lanes (stability-fix flow) and § Workflow (Stage 6)
9. Scoped rule files governing the touched layers (from `{{rule_sources}}`)

## Identify the active feature

The skill needs `<feature>` for state file paths and to pass the spec/plan to test-fixer.

- If `$ARGUMENTS` is provided, use it as `<feature>`.
- Otherwise, identify the active feature from context (most recently created/edited spec under `{{spec_dir}}`, current implementation focus). If you cannot confidently identify it, ask the user explicitly: "which feature is this test pass for?"

Do not proceed without a confirmed `<feature>` name.

## Step 0 — Stability-fix lane: failing-test-first verification (conditional)

**Run this step only when the lane is stability-fix.** Skip for full workflow and trivial-change.

This step backstops the failing-test-first enforcement that `/implement` runs at its Step 0. It exists in case the user ran `/test-fix` directly (without going through `/implement`) on a stability-fix change.

The procedural sequence required by root `CLAUDE.md` § Lanes (stability-fix) is: bug analyzed → failing test authored → test confirmed failing on broken code → fix applied → test confirmed passing on fixed code. This step verifies a failing test was authored as part of the change, not bolted on after the fix passed.

```bash
ORIGINAL_BASELINE=<the last commit before the stability-fix work began — ask the user>
git diff "$ORIGINAL_BASELINE" --name-only > /tmp/test-fix-stab-files.txt
# Adjust pattern to match {{test_framework}}'s file convention:
grep -E '\.test\.(ts|tsx|js|jsx|py|go|rs)$|_test\.(go|py)$|test_.*\.(py)$' /tmp/test-fix-stab-files.txt > /tmp/test-fix-stab-tests.txt || true
grep -vE '\.test\.(ts|tsx|js|jsx|py|go|rs)$|_test\.(go|py)$|test_.*\.(py)$' /tmp/test-fix-stab-files.txt > /tmp/test-fix-stab-source.txt || true
```

**Mechanical check 1 — diff must include a test file.** If `/tmp/test-fix-stab-tests.txt` is empty but `/tmp/test-fix-stab-source.txt` is non-empty, halt:

> "Stability-fix lane requires a failing test written before the fix (per CLAUDE.md § Lanes). The diff contains source changes but no test changes — the regression has no automated catcher. Halt — write the failing test now (revert the fix to a temp branch first to confirm the test catches the bug), then re-stage and re-invoke `/test-fix`."

**Mechanical check 2 — at least one test in the diff carries `// Verifies scenario X.Y`.** Inspect `git diff "$ORIGINAL_BASELINE" -- $(cat /tmp/test-fix-stab-tests.txt)` for the comment pattern. If no test in the diff carries it, halt:

> "Stability-fix lane requires the failing test to reference a CEO-spec scenario via `// Verifies scenario X.Y`. The diff has test changes but none carry the comment. Halt — add the comment to the test that exercises the regression, then re-invoke `/test-fix`."

**User confirmation — pre-fix failure was observed.** After both mechanical checks pass, ask the user once:

> "Stability-fix procedural confirmation:
>
> - The failing test in this diff (`<test path>::<test name>`) was run against the broken code BEFORE the fix was applied — and it failed?
> - After the fix was applied, that same test passes now?
>
> Both `yes` → proceed to Step 1 (run tests + auditor audit).
> Either `no` → halt; the audit chain assumes test-first; re-do this in the correct order before re-invoking."

If the user answers `yes` to both, proceed. If either `no`, halt. The user may override with explicit reasoning recorded for the commit body — the override does not silently advance.
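
The two mechanical checks can be expressed as one small function. This is only a sketch: `check_stability_fix` is a hypothetical name, the three file arguments stand in for the `/tmp` files produced above plus a saved copy of the test diff, and counting a comment on a context line (not only on an added line) as "carried by the diff" is an assumption.

```bash
# Hypothetical helper mirroring the two mechanical checks.
# $1 = file listing changed test paths
# $2 = file listing changed source paths
# $3 = file holding the diff of the changed tests
# Prints "ok" or a "halt: ..." reason; returns non-zero on halt.
check_stability_fix() {
  tests_list=$1
  source_list=$2
  tests_diff=$3

  # Check 1: source changed but no test file changed.
  if [ ! -s "$tests_list" ] && [ -s "$source_list" ]; then
    echo "halt: no test in diff"
    return 1
  fi

  # Check 2: the comment must appear on an added or context line
  # (assumption: removed lines, which start with "-", do not count).
  if ! grep -qE '^[+ ].*// Verifies scenario [0-9]+\.[0-9]+' "$tests_diff"; then
    echo "halt: no scenario comment"
    return 1
  fi

  echo "ok"
}
```

The user-confirmation question is deliberately left out of the sketch, since it cannot be answered mechanically.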

## Step 1 — Pre-flight

1. Capture baseline: `BASELINE=$(git rev-parse HEAD)`.
2. Run `{{test_runner_command}}` once with no fixing. Capture stderr and pass/fail status.
3. If all tests pass on the first run, skip to **Step 5 — Auditor post-fix audit** with an empty `FIXES_APPLIED`. We still run the audit so any test coverage that was weakened during Stage 5 doesn't slip through.
4. If tests fail, proceed to Step 2.
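
The pre-flight run in item 2 might be wrapped as follows. This is a sketch under stated assumptions: `run_preflight` is a hypothetical name, and the test command is passed as ordinary arguments in place of `{{test_runner_command}}`.

```bash
# Hypothetical wrapper: run the test command once, keep its stderr in a
# file, and report the result without attempting any fix.
# $1 = file that receives stderr; remaining arguments = the test command.
run_preflight() {
  stderr_file=$1
  shift
  status=0
  "$@" 2> "$stderr_file" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "PASS"
  else
    echo "FAIL ($status)"
  fi
  return "$status"
}
```

The captured file is what Step 2 would later pass along verbatim; the wrapper itself makes no attempt to interpret it.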

## Step 2 — Spawn test-fixer subagent

Invoke the `test-fixer` subagent via the Task tool (`subagent_type: "test-fixer"`).

Construct the subagent prompt to include exactly:

- The failing test output verbatim (full stderr from Step 1)
- An instruction to read: `{{spec_dir}}<feature>.md`, `{{spec_dir}}<feature>-plan.md`, the project's testing convention, the relevant scoped rule files
- If `.harness/state/test-fix/<feature>-attempts.json` exists (prior escalation retry), include its contents pretty-printed under the heading: `REJECTED_APPROACHES (from prior test-fixer iterations — these did NOT resolve the failures; do not repeat them, find a different angle):`

**Do NOT include in the prompt:**

- Any text from the main session about why the code "should be" correct
- The implementer's reasoning, framing, or interpretation
- "I believe…" / "this looks right" / any pre-supposition of the answer

Pass artifacts and criteria. Not interpretation.
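
As an illustration of "artifacts, not interpretation", the prompt could be assembled from files alone. The sketch below makes several assumptions: `build_test_fixer_prompt` is a hypothetical name, the spec directory is passed in with its trailing slash, the `REJECTED_APPROACHES` heading is abbreviated, and the attempts file is emitted as-is instead of pretty-printed.

```bash
# Hypothetical prompt assembler. Only artifacts go in: the failing output,
# the reading instructions, and prior attempts when the state file exists.
# $1 = feature, $2 = file with captured stderr, $3 = spec dir (trailing slash).
build_test_fixer_prompt() {
  feature=$1
  stderr_file=$2
  spec_dir=$3
  attempts=".harness/state/test-fix/${feature}-attempts.json"

  echo "FAILING TEST OUTPUT (verbatim):"
  cat "$stderr_file"
  echo
  echo "Read: ${spec_dir}${feature}.md, ${spec_dir}${feature}-plan.md, the project's testing convention, the relevant scoped rule files."

  # Prior attempts are included only on an escalation retry.
  if [ -f "$attempts" ]; then
    echo
    echo "REJECTED_APPROACHES:"   # abbreviated; use the full heading text in practice
    cat "$attempts"
  fi
}
```

Nothing in the function accepts free-form commentary from the main session, which is the point of the exclusion list above.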

## Step 3 — Receive and surface the report

Read the test-fixer's structured report **verbatim**. Do not summarize, filter, or rephrase.

Surface to the user:

- The full `FIXES_APPLIED` list, with every entry's `suspicious` flag visible
- `ITERATIONS_USED`
- If `STATUS: ESCALATE` — the `REMAINING_FAILURES` and `HYPOTHESIS` blocks

The user must see what the subagent actually did before any verdict computes.

**Wait for user response before continuing.**

## Step 4 — Branch on STATUS

- `STATUS: PASS` → Step 5 (auditor post-fix audit)
- `STATUS: ESCALATE` → Step 6 (escalation routing)
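
The branch can be kept free of prose interpretation by reading the status line directly. A minimal sketch, assuming the report contains a line that starts with `STATUS: `; the function name `next_step_for_report` and the printed step labels are hypothetical.

```bash
# Hypothetical router: read the first "STATUS: <value>" line from a report
# file and print the step to run next. Anything else is an error.
next_step_for_report() {
  status=$(sed -n 's/^STATUS: *//p' "$1" | head -n 1)
  case "$status" in
    PASS)     echo "step-5" ;;
    ESCALATE) echo "step-6" ;;
    *)        echo "unknown status: ${status:-<missing>}" >&2
              return 1 ;;
  esac
}
```

Treating an unrecognized or missing status as an error, instead of defaulting to either branch, is a design choice of the sketch.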

## Step 5 — Auditor post-fix audit

Compute the diff and write to a temp file:

```bash
DIFF_FILE=$(mktemp /tmp/test-fix-diff.XXXXXX)
git diff "$BASELINE" > "$DIFF_FILE"
```

`git diff "$BASELINE"` (no `..HEAD`) is working-tree-aware. The `test-fixer` subagent uses Edit, not commit — its changes sit uncommitted. Using `..HEAD` here would diff committed-only and yield an empty file, silently auditing nothing.
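
The difference between the two forms can be checked in a throwaway repository. The sketch below assumes `git` is installed; the function name `demo_working_tree_diff` and the file names are made up for illustration.

```bash
# Demonstration in a temporary repository; nothing outside it is touched.
demo_working_tree_diff() {
  repo=$(mktemp -d) || return 1
  (
    cd "$repo" || exit 1
    git init -q .
    echo "v1" > a.txt
    git add a.txt
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false commit -q -m "baseline"
    BASELINE=$(git rev-parse HEAD)

    echo "v2" > a.txt          # uncommitted edit to a tracked file
    echo "x" > new.test.txt    # brand-new file, never staged

    [ -n "$(git diff "$BASELINE")" ] && echo "working-tree diff: non-empty"
    [ -z "$(git diff "$BASELINE"..HEAD)" ] && echo "committed-only diff: empty"
    git diff "$BASELINE" --name-only | grep -q new.test.txt \
      || echo "untracked file: not in diff"
  )
  rm -rf "$repo"
}
```

Calling `demo_working_tree_diff` prints all three lines. The third is a caveat worth keeping in mind: a file that is still untracked does not appear in `git diff "$BASELINE"`, so a newly created test file would need to be staged (for example with `git add`) before the diff is written.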

Invoke the gate. The prompt covers four axes — test legitimacy, scenario coverage, fix correctness, and spec-vs-reality match:

```bash
bash .harness/scripts/auditor-gate.sh review <feature> 6 \
"Review the diff produced by Stage 6 test fixing. Tests now pass; you verify four axes.

Axis 1 — Test legitimacy: weakened assertions (exact values replaced with looser matchers, removed assertions, removed test cases, .skip / .only / equivalent left in, new mocks of components / hooks / modules the test was supposed to exercise). If a test now passes only because it asserts less, that is CRITICAL.

Axis 2 — Scenario coverage: read {{spec_dir}}<feature>.md and enumerate every scenario classified [Required automated test]. For each, confirm at least one test in the diff (or already in the repo on a path the diff modifies) carries a '// Verifies scenario X.Y' comment matching it. A [Required automated test] scenario with no covering test is CRITICAL. The resolved test bindings should land in {{implementation_dir}}<feature>-implementation.md's 'Scenario → automated test map' section (NOT in the CEO spec); a [Required automated test] scenario missing a subsection there after Stage 6 is CRITICAL. Also flag CRITICAL if the CEO spec carries any '#### Automated test ID' block or 'scenario-X-Y — <path>/*.test.*' line — that's a workflow violation per CLAUDE.md's two-file model.

Axis 3 — Fix correctness (when source files were modified during the test-fix loop): does the source change actually fix the failure without introducing a new one? Does it contradict the CEO spec? Does it bypass an access-control gate the prior code relied on? If the source change is itself buggy, that is CRITICAL.

Axis 4 — Spec-vs-reality match (mandatory; this is the final gate before smoke). The CEO spec at {{spec_dir}}<feature>.md is a behavioral document written for the final decision maker — the CEO reads it end-to-end at smoke time to drive the manual smoke test. It describes what the feature does from a user-facing perspective: what the user sees, what they can do, what happens to their data, what guarantees the product makes. It does NOT describe implementation mechanism, and is BANNED from doing so by CLAUDE.md's two-file model. Your audit must respect that boundary. Read the spec end-to-end. Flag a sentence ONLY when: (a) it asserts a user-observable behavior the code provably doesn't deliver (timing, atomicity boundary, recovery path the user can actually take, what an attempted action returns, what gets scrubbed or cascaded or revoked from the user's perspective, what the user sees after the action), OR (b) it asserts a guarantee that the code doesn't enforce. Do NOT flag plain-language vocabulary that doesn't map 1:1 to a code identifier. Do NOT flag sentences that omit implementation mechanism. Do NOT flag wording that could be tightened toward technical precision. This axis exists because /implement auditor sees only the implementation diff (which may not include all spec edits) and /test-fix is the last gate before the CEO smoke test. It does NOT exist to police plain-language imprecision.

Do NOT flag: project-convention choices per {{anti_flag_rules}}, formatting, naming, refactor opinions, or suggestions for additional test coverage beyond what the CEO spec lists as [Required automated test]." \
"$DIFF_FILE"
```

Read the gate's exit code:

- **Exit 0 (PASS / CONCERNS / WAIVED)** — proceed to Step 5b (stability-fix backstop, if applicable). Mention any advisory items. For CONCERNS, also surface the logged warning path (`.harness/audits/concerns-*.json`) and remind the CEO to review before commit. For WAIVED, surface the `waiver_reason`.
- **Exit 2 (FAIL)** — surface every blocking item verbatim. Halt. Stage 6 incomplete. Ask the user how to proceed (re-run test-fixer with the blocking findings as additional guidance, manual fix, or accept the risk and override later at Stage 8).
- **Exit 1 (script error / Universal Core WAIVED rejected / missing waiver_reason / legacy verdict)** — surface stderr, halt. Stage 6 cannot complete without a successful auditor call.
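
The three exit codes map onto a plain `case` statement. This is a sketch, not the harness's own code: `handle_gate_exit` is a hypothetical name, and halting on any other exit code is an assumption, since the list above only defines 0, 1, and 2.

```bash
# Hypothetical dispatcher for the gate's exit code.
# $1 = exit code from auditor-gate.sh; prints the action the skill takes.
handle_gate_exit() {
  case "$1" in
    0) echo "advance" ;;                  # PASS / CONCERNS / WAIVED
    2) echo "halt: blocking findings" ;;  # FAIL
    1) echo "halt: gate error" ;;         # script error, rejected waiver, ...
    *) echo "halt: unexpected exit code $1" ;;  # assumption: fail closed
  esac
}

# Intended use, right after the gate invocation:
#   bash .harness/scripts/auditor-gate.sh review ... "$DIFF_FILE"
#   handle_gate_exit "$?"
```

Only exit 0 advances, so an unknown code can never be mistaken for approval.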

## Step 5b — Stability-fix lane backstop (conditional)

This step closes the cross-model gap on the stability-fix lane. When `/implement` was invoked and produced a Stage 5 auditor approval, fix correctness was already audited there. This backstop fires only when that didn't happen — i.e., the user jumped straight to `/test-fix` without `/implement`.

It fires when:

1. **No prior Stage 5 audit exists** — `.harness/state/auditor-approvals/<feature>-stage5.json` is absent. Presence means `/implement` ran and the auditor already audited the fix on its full diff; running again here would duplicate.

If the prior-audit file exists, skip Step 5b and report `✓ Stage 6 complete: tests pass; auditor post-fix audit (legitimacy + coverage + correctness + spec-vs-reality) advanced.` exactly as the Step 5 PASS path does.
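
The firing condition reduces to a file-existence test. A minimal sketch, assuming the approval file path given above; `needs_stability_backstop` is a hypothetical name, and the optional second argument (repository root) exists only to make the function easy to try outside a real checkout.

```bash
# Hypothetical check: the backstop is needed only when no Stage 5 approval
# file exists for the feature.
# $1 = feature, $2 = repository root (defaults to the current directory).
needs_stability_backstop() {
  [ ! -f "${2:-.}/.harness/state/auditor-approvals/$1-stage5.json" ]
}

# Intended use:
#   if needs_stability_backstop "$FEATURE"; then
#     ... ask for ORIGINAL_BASELINE and run the Step 5b audit ...
#   fi
```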

### Determine the original-fix baseline

`$BASELINE` from Step 1 was captured at `/test-fix` start, so it includes only the test-fix loop. The stability fix itself sits before that. Ask the user once:

> "Stability-fix lane detected (no prior `/implement` auditor audit on this change). What's the last commit before you started this fix? (e.g., `HEAD~1`, or a sha)"

Capture as `ORIGINAL_BASELINE`. Do not guess — an incorrect baseline either bloats the diff with unrelated commits (noise) or misses part of the fix (false negatives). If the user is unsure, prompt them to check `git log --oneline` rather than defaulting.

### Run the audit

```bash
STAB_DIFF=$(mktemp /tmp/test-fix-stab-diff.XXXXXX)
git diff "$ORIGINAL_BASELINE" > "$STAB_DIFF"

bash .harness/scripts/auditor-gate.sh review <feature> 6-fix \
"Stability-fix lane backstop. Stages 1–3 were skipped per CLAUDE.md and /implement was not invoked, so this is the FIRST cross-model audit on this change end-to-end. The diff includes the original fix plus any test-fixer adjustments. Audit fix CORRECTNESS (Step 5 already covered test legitimacy + scenario coverage). Look for: access-control / RLS correctness when {{backend_code_paths}} is touched; auth-context assumptions; anonymous-vs-authenticated path handling; race conditions between async operations; runtime edge cases (null, empty, concurrent, expired-session, signed-out timing, app-backgrounded mid-mutation); source-fix contradictions (does the fix actually address the bug? does it introduce a new failure mode? does it bypass an access-control gate the prior code relied on?); CEO-spec contradictions (read {{spec_dir}}<feature>.md and confirm the fix matches the spec's behavior for the touched scenarios — not just the test). Compare against scoped rule files and the feature spec. Do NOT flag: project conventions in {{anti_flag_rules}}, formatting, naming preferences, refactor opinions, or suggestions for additional test coverage beyond what the CEO spec lists as [Required automated test]." \
"$STAB_DIFF"
```

The gate writes `.harness/state/auditor-approvals/<feature>-stage6-fix.json`.

Read the gate's exit code:

- **Exit 0 (PASS / CONCERNS / WAIVED)** — surface `✓ Stage 6 complete: tests pass; auditor post-fix audit advanced; stability-fix end-to-end audit advanced.` Mention any advisory items. For CONCERNS, surface the logged warning path (`.harness/audits/concerns-*.json`) and remind the CEO to review before commit. For WAIVED, surface the `waiver_reason`. Stage 6 complete.
- **Exit 2 (FAIL)** — surface every blocking item verbatim. Halt. Stage 6 incomplete. The user addresses (typically by editing the fix, not the tests), then re-invokes `/test-fix`.
- **Exit 1 (script error / Universal Core WAIVED rejected / missing waiver_reason / legacy verdict)** — surface stderr, halt.

## Step 6 — Escalation routing

The test-fixer exhausted 3 iterations without PASS.

### 6a. Persist this attempt

Append to `.harness/state/test-fix/<feature>-attempts.json`. Schema:

```json
{
  "feature": "<feature>",
  "routing_rounds": <integer>,
  "attempts": [
    {
      "spawned_at": "<ISO-8601 timestamp>",
      "fixes_applied": [...],
      "remaining_failures": [...],
      "hypothesis": "..."
    }
  ]
}
```

If the file exists: parse, increment `routing_rounds`, append the new attempt to `attempts`, write back atomically. If not: create with `routing_rounds: 1` and the array containing one entry. Use `mkdir -p .harness/state/test-fix/` if the directory is missing.
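
One way to get the parse, increment, append, and atomic write-back is a temp file in the same directory followed by a rename. The sketch below assumes `jq` is available, which the skill does not state; `persist_attempt` and its optional third argument are hypothetical.

```bash
# Hypothetical helper, assuming jq is installed.
# $1 = feature, $2 = JSON object describing the new attempt,
# $3 = state directory (defaults to .harness/state/test-fix).
persist_attempt() {
  feature=$1
  attempt=$2
  dir=${3:-.harness/state/test-fix}
  file="$dir/${feature}-attempts.json"

  mkdir -p "$dir"
  tmp=$(mktemp "$dir/.attempts.XXXXXX") || return 1

  if [ -f "$file" ]; then
    # Existing file: bump the round counter and append the attempt.
    jq --argjson attempt "$attempt" \
      '.routing_rounds += 1 | .attempts += [$attempt]' "$file" > "$tmp"
  else
    # First escalation: create the document with one attempt.
    jq -n --arg feature "$feature" --argjson attempt "$attempt" \
      '{feature: $feature, routing_rounds: 1, attempts: [$attempt]}' > "$tmp"
  fi || { rm -f "$tmp"; return 1; }

  # Renaming within one directory swaps the file in a single step, so a
  # reader never sees a half-written document.
  mv "$tmp" "$file"
}
```

If `jq` fails (for example on malformed input), the temp file is removed and the existing attempts file is left untouched.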

### 6b. Auto-fire auditor diagnostic

Invoke the gate in diagnostic mode:

```bash
bash .harness/scripts/auditor-gate.sh diagnostic <feature> \
"Three test-fixer iterations exhausted; tests still fail. Read the failing test output, the test files, the source files under test, the spec at {{spec_dir}}<feature>.md, and the prior attempts (provided as the artifact). Produce a different-model read: is the spec consistent with what the test asserts? Is there a hidden assumption in the implementation? Is there a fix angle the prior attempts haven't tried? Return ranked hypotheses with concrete next steps." \
".harness/state/test-fix/<feature>-attempts.json"
```

The gate writes `.harness/state/auditor-approvals/<feature>-stage6-diagnostic.json`. Surface the hypotheses to the user side-by-side with the test-fixer's `HYPOTHESIS`.

### 6c. Present the routing menu

```
[1] Retry test-fixer — fresh context, with prior attempts included as REJECTED_APPROACHES
[2] Re-examine spec — open {{spec_dir}}<feature>.md (test-fixer hypothesis suggests ambiguity)
[3] Re-examine plan — open {{spec_dir}}<feature>-plan.md (test design may be wrong)
[4] Manual takeover — drop back to interactive Claude with full context dumped
```

Wait for the user's choice. Do not auto-select.

### 6d. Hard retry budget

If `routing_rounds` in the attempts file has reached **2**, only `[4] Manual takeover` is offered. The harness stops offering retry/re-examine after two rounds — past that, the failure mode is human, not algorithmic.
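
The retry budget amounts to a threshold on `routing_rounds`. A minimal sketch, with the hypothetical function `routing_menu` taking the counter as an argument (how it is read from the attempts file is left open) and printing shortened menu labels:

```bash
# Hypothetical menu builder.
# $1 = routing_rounds as recorded in the attempts file.
routing_menu() {
  if [ "$1" -ge 2 ]; then
    # Budget exhausted: only manual takeover remains.
    echo "[4] Manual takeover"
    return 0
  fi
  echo "[1] Retry test-fixer"
  echo "[2] Re-examine spec"
  echo "[3] Re-examine plan"
  echo "[4] Manual takeover"
}
```

Because 6a increments the counter before the menu is shown, the full menu appears on the first escalation and the reduced one from the second escalation onward.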

## Trust contract

- **Surface the test-fixer report verbatim.** No "test-fixer says it's fine, proceeding." The user sees `FIXES_APPLIED` in full, including every `suspicious` flag, before any verdict is computed.
- **Auditor post-fix audit is unconditional on PASS.** No skipping the audit because "the diff looks fine." The skill has no judgment authority to bypass the gate.
- **Verdict is parsed deterministically from the gate's exit code.** No prose interpretation. The four verdicts are `PASS` (advance silently), `CONCERNS` (advance with logged warning at `.harness/audits/concerns-*.json` for CEO commit-time review), `FAIL` (halt), and `WAIVED` (CEO override only; rejected by the gate if any blocking item cites Universal Core). Note: in this skill, "PASS" appears in two distinct senses — the **test-fixer's** `STATUS: PASS` (tests passed, as opposed to `STATUS: ESCALATE`) and the **auditor's** PASS verdict (one of the four). They are independent; both must be checked.
- **On disagreement** (test-fixer says tests-PASS, auditor says FAIL): auditor wins by default. Surface both views. The user overrides explicitly if they disagree.

## Step 7 — Update implementation file "Scenario → automated test map"

After every applicable auditor pass advances (PASS, CONCERNS, or WAIVED), update `{{implementation_dir}}<feature>-implementation.md`. The map lives in the implementation file, NOT the CEO spec — file paths and test descriptions are manager-domain. For each scenario classified `[Required automated test]` in the CEO spec, ensure the implementation file's "Scenario → automated test map" section has a subsection with the resolved test bindings:

```markdown
### Scenario 3.4 — <scenario name from CEO spec>

- scenario-3-4 — <path>/<test-file> > describe > test name
- scenario-3-4 — <path>/<other-test-file> > ... (if multiple tests cover the scenario)
```

If the implementation file does not yet exist (simple feature), create it with at minimum the header pointer to the CEO spec plus this map section. If the implementation file exists but lacks the map section, append it.

If a scenario is reclassified to `[Smoke test only]` after the fact, remove its subsection from the map.

This closes the spec ↔ test mapping loop and makes regression triage tractable months later. The source-of-truth scenario↔test binding remains the `// Verifies scenario X.Y` comment attached to each test; the map is the human-facing lookup convenience.

## Completion criteria

Stage 6 is complete when one of the following holds:

- All of these are true:
  - All tests pass.
  - The auditor post-fix audit advances (`PASS`, `CONCERNS`, or `WAIVED`; Step 5: legitimacy + coverage + correctness + spec-vs-reality).
  - If Step 5b's condition holds, the stability-fix end-to-end audit advances (`PASS`, `CONCERNS`, or `WAIVED`).
  - The implementation file's "Scenario → automated test map" has a subsection populated for every `[Required automated test]` scenario in the CEO spec (Step 7).

  CONCERNS warnings must be surfaced to the CEO before commit.
- The user has explicitly accepted ESCALATE state and chosen `[4] Manual takeover` (Step 6d).

Either outcome must be reported explicitly. No silent advancement to Stage 7 (smoke).
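The automated branch of these criteria is a conjunction, which a short sketch makes concrete. It assumes map subsections start with `### Scenario X.Y ` as in the Step 7 template. The function names, the `n/a` marker for "Step 5b does not apply", and the argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Print every required scenario id that has no "### Scenario X.Y" subsection
# in the implementation file. Usage: missing_map_sections <impl-file> <id>...
missing_map_sections() {
  local impl="$1" id
  shift
  for id in "$@"; do
    grep -qF -- "### Scenario $id " "$impl" || echo "$id"
  done
}

# True when a verdict lets the pipeline advance.
advances() {
  case "$1" in
    PASS|CONCERNS|WAIVED) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only when every automated completion criterion holds.
# $1 = tests (PASS|FAIL), $2 = Step 5 verdict,
# $3 = Step 5b verdict or "n/a", $4 = count of missing map subsections.
stage6_complete() {
  local tests="$1" step5="$2" step5b="$3" missing="$4"
  [ "$tests" = "PASS" ] || return 1
  advances "$step5" || return 1
  if [ "$step5b" != "n/a" ]; then
    advances "$step5b" || return 1
  fi
  [ "$missing" -eq 0 ]
}
```

The manual-takeover branch is deliberately left out: it depends on an explicit user choice, not on anything a script can infer.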

## Anti-patterns the skill blocks

- Implementer rationalizing why a failing test is "wrong" → fresh subagent context
- Implementer mocking internals to make a test pass → test-fixer's hard rules + auditor audit
- Implementer `.skip`-ing a test (or framework equivalent) → suspicious flag + auditor audit
- Loosening an assertion to match buggy code → suspicious flag + auditor audit
- Unbounded iterations on a stuck test → 3-iteration cap → escalation
- Looping forever on retry → 2-routing-round hard limit → manual takeover only
- Skipping the post-fix audit because the diff looks fine → mandatory by skill design
- Stability-fix lane shipping changes with zero cross-model review → Step 5b fires whenever no prior Stage 5 auditor audit exists
- Tests that cover-but-don't-name a scenario → auditor's scenario-coverage axis flags missing `// Verifies scenario X.Y` comments
- Implementation file's "Scenario → automated test map" missing a subsection for any `[Required automated test]` scenario after auditor advances (PASS / CONCERNS / WAIVED) → completion criteria force the update
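The two bounds in the list above (3 iterations, 2 routing rounds) can be pictured as a small decision function. This is a sketch under assumptions: `next_action`, the counter semantics (iterations counted within a routing round, rounds counted as already completed), and the returned labels are hypothetical. The skill's escalation steps define the real behavior.

```bash
#!/usr/bin/env bash
MAX_ITERATIONS=3      # cap on fix attempts for a stuck test
MAX_ROUTING_ROUNDS=2  # hard limit on retry routing before manual takeover

# Decide what happens after a failed fix attempt.
# $1 = iterations used in the current round, $2 = routing rounds already completed.
next_action() {
  local iterations="$1" rounds="$2"
  if [ "$iterations" -lt "$MAX_ITERATIONS" ]; then
    echo "retry"
  elif [ "$rounds" -lt "$MAX_ROUTING_ROUNDS" ]; then
    echo "escalate"
  else
    echo "manual-takeover"
  fi
}
```

Keeping both limits as named constants makes the termination guarantee easy to audit: under these assumptions no path loops indefinitely.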

---

## Checkpoint + decision-log integration (MAGI Archivist)

After all required tests are green and the post-fix auditor gate advances:

```bash
# Write the checkpoint; the audit summary is built with jq from the stage-6 approval file.
.harness/scripts/checkpoint-write.sh \
  --feature <feature-slug> \
  --stage 7 \
  --stage-complete 6 \
  --append-audit "$(jq -c '{stage:6, verdict, risk:.risk_score, at:now|todate}' .harness/state/auditor-approvals/<feature>-stage6.json)"

# If MAGI Tester needed escalation (3 iterations exhausted) and CEO took over:
.harness/scripts/decision-log-append.sh \
  --feature <feature-slug> --stage 6 --by "CEO" \
  --decision "manual takeover after test-fixer exhausted 3 iterations" \
  --evidence ".harness/state/test-fix/<feature>-attempts.json"
```

---

## Final message to CEO (natural-language, Stage 6 → Stage 7 → Stage 8)

After Stage 6 completes (tests green and an advancing post-fix auditor verdict), display the following message, rendered in the CEO's OS locale:

```
✅ Stage 6 complete: all automated tests for <feature> passed
Passed: N tests
MAGI Tester iterations: X / 3
MAGI Verdict: <PASS/CONCERNS/WAIVED>, risk = M

⚠️ Stage 7: CEO manual smoke test (only a human can do this step)

Please run through the smoke test procedure in docs/features/<feature>.md by hand:
1. Open the app / start the dev server
2. Walk through the happy path you care about most
3. Try 1-2 edge cases (pick the ones from the spec you trust least)
4. Come back and tell me the result

What you can say next:
👉 "smoke passed" / "ok" / "ship": I run Stage 8 (commit + push)
👉 "smoke found bug X": I go back to Stage 5 to fix it
👉 "need to re-review the spec": I go back to Stage 1 to revise the spec (major change)
👉 "abandon": this feature is not shipped
```

**Critical**: do NOT skip the CEO smoke test. Per `constitution.md § 1.4`, AI self-report of "done" is constitutionally forbidden; the human MUST manually verify before commit. This is non-negotiable.

On "smoke passed" → invoke `/commit <feature>` silently.
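The reply routing can be sketched as a lookup from the CEO's answer to the next stage. This is only an illustration: real replies are free-form natural language in the CEO's locale, so the English phrases, the function name `route_reply`, and the `ask-again` fallback are assumptions, not harness behavior.

```bash
#!/usr/bin/env bash
# Map a CEO reply to the next step. Exact-match phrases are illustrative only.
route_reply() {
  case "$1" in
    "smoke passed"|"ok"|"ship")   echo "stage-8" ;;   # commit + push
    "smoke found bug"*)           echo "stage-5" ;;   # back to fixing
    "need to re-review the spec") echo "stage-1" ;;   # major spec revision
    "abandon")                    echo "abandon" ;;   # feature is not shipped
    *)                            echo "ask-again" ;; # never default to commit
  esac
}
```

The fallback matters more than the table: an unrecognized reply must not be treated as smoke-test confirmation, since the human check before commit is mandatory.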