cclaw-cli 8.3.0 → 8.4.0
- package/README.md +24 -4
- package/dist/constants.d.ts +1 -1
- package/dist/constants.js +1 -1
- package/dist/content/skills.js +451 -29
- package/dist/content/specialist-prompts/architect.d.ts +1 -1
- package/dist/content/specialist-prompts/architect.js +8 -1
- package/dist/content/specialist-prompts/brainstormer.d.ts +1 -1
- package/dist/content/specialist-prompts/brainstormer.js +3 -0
- package/dist/content/specialist-prompts/planner.d.ts +1 -1
- package/dist/content/specialist-prompts/planner.js +48 -2
- package/dist/content/specialist-prompts/reviewer.d.ts +1 -1
- package/dist/content/specialist-prompts/reviewer.js +185 -42
- package/dist/content/specialist-prompts/security-reviewer.d.ts +1 -1
- package/dist/content/specialist-prompts/security-reviewer.js +3 -0
- package/dist/content/specialist-prompts/slice-builder.d.ts +1 -1
- package/dist/content/specialist-prompts/slice-builder.js +5 -2
- package/dist/content/start-command.js +128 -17
- package/dist/flow-state.d.ts +11 -0
- package/dist/flow-state.js +26 -0
- package/dist/types.d.ts +17 -0
- package/package.json +1 -1
package/dist/content/skills.js
CHANGED
|
@@ -201,6 +201,174 @@ The user is expected to clarify in (4) Custom or accept (1) Proceed; either way
|
|
|
201
201
|
- Forgetting to ask Question 2 (run mode) after Question 1 (path). \`triage.runMode\` controls Hop 4 (pause); a missing value defaults to \`step\` — safe but wastes a click for users who wanted autopilot.
|
|
202
202
|
- Forgetting to write \`triage\` into \`flow-state.json\`. The hook check \`commit-helper.mjs\` and the resume detector both read it; an absent triage breaks both.
|
|
203
203
|
- Re-running the gate on resume. Resume reads the saved triage (path + runMode) and continues from \`currentStage\`; it never re-prompts.
|
|
204
|
+
|
|
205
|
+
## Next step
|
|
206
|
+
|
|
207
|
+
After both triage questions are answered AND the path is **not** \`inline\`, the orchestrator runs the \`pre-flight-assumptions\` skill (Hop 2.5) before dispatching the first specialist. On the inline path, the orchestrator goes straight to the build dispatch — pre-flight is skipped (a one-line edit has no assumptions worth surfacing).
|
|
208
|
+
`;
|
|
209
|
+
const PRE_FLIGHT_ASSUMPTIONS = `---
|
|
210
|
+
name: pre-flight-assumptions
|
|
211
|
+
trigger: after triage-gate, before the first specialist dispatch — only when triage.path is NOT inline
|
|
212
|
+
---
|
|
213
|
+
|
|
214
|
+
# Skill: pre-flight-assumptions
|
|
215
|
+
|
|
216
|
+
Triage answers "**how much** work is this?" and "**how should we run it?**". Pre-flight answers "**on what assumptions** are we doing it?". They are different questions; v8.4 adds this hop because silently-defaulted assumptions are the most common reason a small/medium build ships the wrong feature.
|
|
217
|
+
|
|
218
|
+
The pre-flight skill runs **once** per flow, between the triage gate (Hop 2) and the first specialist dispatch (Hop 3). It does not run on the inline / trivial path — a single-file edit has no architectural assumptions worth surfacing.
|
|
219
|
+
|
|
220
|
+
## What the orchestrator does
|
|
221
|
+
|
|
222
|
+
1. Read \`triage.path\` from \`flow-state.json\`.
|
|
223
|
+
2. If \`path == ["build"]\` (inline), skip this skill entirely. Go to dispatch.
|
|
224
|
+
3. Otherwise:
|
|
225
|
+
1. Inspect the repo for stack inference. Read at most:
|
|
226
|
+
- \`package.json\` / \`pnpm-lock.yaml\` (Node, framework + version, test runner);
|
|
227
|
+
- \`pyproject.toml\` / \`requirements.txt\` (Python, framework + version);
|
|
228
|
+
- \`go.mod\` (Go);
|
|
229
|
+
- \`Cargo.toml\` (Rust);
|
|
230
|
+
- \`composer.json\` (PHP);
|
|
231
|
+
- \`Gemfile\` (Ruby);
|
|
232
|
+
- the top-level README or AGENTS.md if it exists.
|
|
233
|
+
2. Inspect the most recent shipped slug under \`.cclaw/flows/shipped/\` (if any) — its \`assumptions:\` block is your seed for what defaults the project already established.
|
|
234
|
+
3. Compose 3–7 short, numbered assumptions covering:
|
|
235
|
+
- **Stack** — language version, framework, runtime target, test runner.
|
|
236
|
+
- **Conventions** — where tests live (\`tests/\`, \`__tests__/\`, alongside source), filename pattern (\`*.test.ts\`, \`*.spec.ts\`, \`*_test.go\`).
|
|
237
|
+
- **Architecture defaults** that apply to this slug — CSS strategy, state strategy, auth strategy, persistence pattern. Skip items that are not relevant.
|
|
238
|
+
- **Out-of-scope defaults** — what we will NOT do unless asked (mobile breakpoints, i18n, telemetry hooks).
|
|
239
|
+
4. Surface them through the harness's structured ask tool. If the harness has none, fall back to a fenced block; same rule as the triage gate.
|
|
240
|
+
5. Persist the user's confirmed list to \`flow-state.json\`'s \`triage.assumptions\`.
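The skip rule in steps 1 and 2 can be sketched as a small predicate. This is a minimal illustration, not the package's implementation: the name `shouldRunPreFlight` is hypothetical, and the only fact taken from the text is that the inline path is exactly `["build"]`.

```typescript
// Hypothetical helper: decides whether the pre-flight hop runs.
// Assumes triage.path is an array of stage names, as in flow-state.json.
function shouldRunPreFlight(path: readonly string[]): boolean {
  // The inline / trivial path is the single-stage path ["build"].
  const isInline = path.length === 1 && path[0] === "build";
  return !isInline;
}
```

Called with `["build"]` the sketch returns `false` (skip, go to dispatch); any longer path returns `true`.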
|
|
241
|
+
|
|
242
|
+
## Output shape — STRUCTURED ask
|
|
243
|
+
|
|
244
|
+
Render the numbered list as the question prompt, plus four options:
|
|
245
|
+
|
|
246
|
+
- prompt:
|
|
247
|
+
|
|
248
|
+
Pre-flight — I'm about to run with these assumptions:
|
|
249
|
+
|
|
250
|
+
1. Node 20.11; Next.js 14.1; React 19.0; Tailwind 3.4 (read from package.json).
|
|
251
|
+
2. Tests live in \`tests/\` mirroring the production module path (\`*.test.tsx\`).
|
|
252
|
+
3. CSS strategy: Tailwind utility classes + 1 \`tokens.css\` for color/space tokens (matches existing components).
|
|
253
|
+
4. Auth strategy: session-based cookies via \`next-auth\` (current pattern).
|
|
254
|
+
5. Out-of-scope: mobile breakpoints, i18n strings, telemetry events.
|
|
255
|
+
|
|
256
|
+
Correct me now or I proceed with these.
|
|
257
|
+
|
|
258
|
+
- options:
|
|
259
|
+
- \`Proceed with these\`
|
|
260
|
+
- \`Edit one assumption\`
|
|
261
|
+
- \`Edit several assumptions\`
|
|
262
|
+
- \`Cancel — re-think the request\`
|
|
263
|
+
|
|
264
|
+
If the user picks "Edit one" or "Edit several", follow up with a free-text ask (textarea / prompt-input) for the corrected list. Re-confirm once with the structured tool, then persist.
|
|
265
|
+
|
|
266
|
+
If the user dismisses the question (timeout, harness limitation), default to "Proceed with these" — the user has at least seen them once, and the next message can amend if needed.
|
|
267
|
+
|
|
268
|
+
## Output shape — FALLBACK (no structured ask)
|
|
269
|
+
|
|
270
|
+
\`\`\`
|
|
271
|
+
Pre-flight assumptions
|
|
272
|
+
1. Node 20.11; Next.js 14.1; React 19.0; Tailwind 3.4 (from package.json).
|
|
273
|
+
2. Tests live in tests/ mirroring production module path.
|
|
274
|
+
3. CSS: Tailwind + tokens.css.
|
|
275
|
+
4. Auth: session cookies via next-auth.
|
|
276
|
+
5. Out of scope: mobile, i18n, telemetry.
|
|
277
|
+
|
|
278
|
+
Correct me now or I proceed.
|
|
279
|
+
[1] Proceed
|
|
280
|
+
[2] Edit one assumption — say which number and the replacement
|
|
281
|
+
[3] Edit several — paste the corrected list
|
|
282
|
+
[4] Cancel
|
|
283
|
+
\`\`\`
|
|
284
|
+
|
|
285
|
+
## Persistence shape
|
|
286
|
+
|
|
287
|
+
After the user accepts (with or without edits), patch \`flow-state.json\`:
|
|
288
|
+
|
|
289
|
+
\`\`\`json
|
|
290
|
+
{
|
|
291
|
+
"triage": {
|
|
292
|
+
"complexity": "small-medium",
|
|
293
|
+
"acMode": "soft",
|
|
294
|
+
"path": ["plan", "build", "review", "ship"],
|
|
295
|
+
"rationale": "...",
|
|
296
|
+
"decidedAt": "...",
|
|
297
|
+
"userOverrode": false,
|
|
298
|
+
"runMode": "step",
|
|
299
|
+
"assumptions": [
|
|
300
|
+
"Node 20.11, Next.js 14.1, React 19.0, Tailwind 3.4",
|
|
301
|
+
"Tests in tests/ mirroring module path",
|
|
302
|
+
"CSS: Tailwind + tokens.css",
|
|
303
|
+
"Auth: session cookies via next-auth",
|
|
304
|
+
"Out of scope: mobile, i18n, telemetry"
|
|
305
|
+
]
|
|
306
|
+
}
|
|
307
|
+
}
|
|
308
|
+
\`\`\`
|
|
309
|
+
|
|
310
|
+
The list is **immutable** for the lifetime of the flow. If during build a sub-agent finds an assumption was wrong, it stops and surfaces — the orchestrator either runs \`/cc-cancel\` and starts fresh, or accepts the violation as an explicit user decision and records it in the build log.
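One way to picture the immutability rule is a write-once helper. The sketch below is an assumption-laden illustration: `TriageSketch` and `persistAssumptions` are hypothetical names, the field list simply mirrors the JSON example, and treating the 3 to 7 range as a hard error is this sketch's choice, not something the diff confirms.

```typescript
// Hypothetical shape mirroring the JSON example; the real declarations in
// types.d.ts are not shown in this diff and may differ.
interface TriageSketch {
  complexity: string;
  acMode: string;
  path: string[];
  rationale: string;
  decidedAt: string;
  userOverrode: boolean;
  runMode: string;
  assumptions?: readonly string[];
}

// Returns a copy of `triage` carrying a frozen assumptions list.
// Throws if a list was already persisted (write-once per flow) or if the
// list size is outside 3..7 (assumed here to be enforced, not just advised).
function persistAssumptions(triage: TriageSketch, confirmed: string[]): TriageSketch {
  if (triage.assumptions !== undefined) {
    throw new Error("assumptions already persisted for this flow");
  }
  if (confirmed.length < 3 || confirmed.length > 7) {
    throw new Error("expected 3-7 assumptions, got " + confirmed.length);
  }
  return { ...triage, assumptions: Object.freeze([...confirmed]) };
}
```

Returning a new object rather than mutating the input keeps the earlier triage value untouched, which makes the write-once check easy to reason about.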
|
|
311
|
+
|
|
312
|
+
## How sub-agents read assumptions
|
|
313
|
+
|
|
314
|
+
Every dispatch envelope from Hop 3 onward includes a one-line note:
|
|
315
|
+
|
|
316
|
+
\`\`\`
|
|
317
|
+
Pre-flight assumptions: see triage.assumptions in flow-state.json
|
|
318
|
+
\`\`\`
|
|
319
|
+
|
|
320
|
+
Sub-agents (planner, slice-builder, reviewer, etc.) read \`flow-state.json > triage.assumptions\` before authoring their artifact. The list is appended verbatim (under \`## Assumptions\`) to:
|
|
321
|
+
|
|
322
|
+
- \`flows/<slug>/plan.md\` — copy the list once after the Frame, so the plan stays self-contained for review.
|
|
323
|
+
- \`flows/<slug>/decisions.md\` — when architect runs, the assumptions are the first input the architect must respect.
|
|
324
|
+
|
|
325
|
+
A sub-agent that would need to break an assumption raises it as a finding (in slice-builder: stop and surface; in reviewer: \`block\`-severity finding) instead of silently overriding.
|
|
326
|
+
|
|
327
|
+
## Sizing rules
|
|
328
|
+
|
|
329
|
+
- **3–7 assumptions, no more.** A long list is noise; the user stops reading after item 5.
|
|
330
|
+
- **One sentence per assumption, no half-paragraphs.** Pick the most relevant version / pattern / boundary.
|
|
331
|
+
- **Cite where you got each one** when non-obvious — \`(from package.json)\`, \`(from latest shipped slug 'auth-cleanup')\`, \`(my default — confirm)\`. The citation tells the user what is observed vs. defaulted.
|
|
332
|
+
- **Never invent specifics you have not read.** If you cannot cite a concrete version from a real file, say so: "(default; the project did not pin a Tailwind version)".
|
|
333
|
+
|
|
334
|
+
## What the pre-flight is NOT
|
|
335
|
+
|
|
336
|
+
- It is not a planning step. It does not write AC or design.
|
|
337
|
+
- It is not the brainstormer's Frame. The Frame answers "what are we building?"; pre-flight answers "on what stack and conventions?".
|
|
338
|
+
- It is not a security review. Sensitive surfaces still get \`security-reviewer\` later in the flow.
|
|
339
|
+
- It is not optional in soft / strict modes. The whole point is to surface defaults; skipping it defeats the skill.
|
|
340
|
+
|
|
341
|
+
## Worked example — Russian conversation
|
|
342
|
+
|
|
343
|
+
User: "Сделай страницу настроек профиля."
|
|
344
|
+
|
|
345
|
+
Pre-flight prompt (structured ask):
|
|
346
|
+
|
|
347
|
+
\`\`\`
|
|
348
|
+
Pre-flight — допущения, с которыми буду работать:
|
|
349
|
+
|
|
350
|
+
1. Next.js 14.1, React 19.0, TypeScript 5.4 (из package.json).
|
|
351
|
+
2. Тесты в tests/ рядом с компонентом, файлы *.test.tsx (как в существующих компонентах).
|
|
352
|
+
3. CSS: Tailwind + общий tokens.css (как в components/ui/).
|
|
353
|
+
4. Тёмная тема через CSS variables, не next-themes (текущая стратегия).
|
|
354
|
+
5. Сохранение в БД через существующий route /api/profile (Prisma + Postgres). localStorage не используем.
|
|
355
|
+
6. Server component для каркаса страницы, client component только для формы.
|
|
356
|
+
|
|
357
|
+
Поправь сейчас или продолжаю с этим.
|
|
358
|
+
\`\`\`
|
|
359
|
+
|
|
360
|
+
Options: \`Иду с этим\` / \`Поправить одно\` / \`Поправить несколько\` / \`Отмена\`.
|
|
361
|
+
|
|
362
|
+
Note: prompt is in Russian (matches user's language), but \`tokens.css\`, \`tests/\`, \`*.test.tsx\`, \`/api/profile\`, \`Prisma\`, \`Tailwind\` stay in their original form — they are mechanical tokens (see \`conversation-language.md\`).
|
|
363
|
+
|
|
364
|
+
## Common pitfalls
|
|
365
|
+
|
|
366
|
+
- **Listing 12+ assumptions.** That is a checklist, not an assumptions block. Keep it 3–7.
|
|
367
|
+
- **Mixing assumptions with the plan.** The plan goes into \`plan.md\`. The assumptions are pre-plan context.
|
|
368
|
+
- **Skipping pre-flight on \`small-medium\` because "the user knows the stack".** The user *does* know; pre-flight makes sure the orchestrator knows the same things.
|
|
369
|
+
- **Re-running pre-flight on resume.** It runs once per flow. Resume reads the saved \`assumptions\` from \`flow-state.json\` and proceeds.
|
|
370
|
+
- **Defaulting an assumption from training data instead of the repo.** If you cannot cite a file or shipped slug, mark the assumption with "(my default — confirm)" so the user knows it is a guess.
|
|
371
|
+
- **Pre-flight on the inline path.** Skip. Trivial change, no assumptions to surface.
|
|
204
372
|
`;
|
|
205
373
|
const FLOW_RESUME = `---
|
|
206
374
|
name: flow-resume
|
|
@@ -515,71 +683,106 @@ Every \`reviews/<slug>.md\` carries an append-only ledger. Each row is a single
|
|
|
515
683
|
\`\`\`markdown
|
|
516
684
|
## Concern Ledger
|
|
517
685
|
|
|
518
|
-
| ID | Opened in | Mode | Severity | Status | Closed in | Citation |
|
|
519
|
-
| --- | --- | --- | --- | --- | --- | --- |
|
|
520
|
-
| F-1 | 1 | code |
|
|
521
|
-
| F-2 | 2 | code |
|
|
686
|
+
| ID | Opened in | Mode | Axis | Severity | Status | Closed in | Citation |
|
|
687
|
+
| --- | --- | --- | --- | --- | --- | --- | --- |
|
|
688
|
+
| F-1 | 1 | code | correctness | required | closed | 2 | \`src/api/list.ts:14\` |
|
|
689
|
+
| F-2 | 2 | code | readability | consider | open | – | \`tests/integration/list.test.ts:31\` |
|
|
690
|
+
| F-3 | 1 | code | perf | nit | open | – | \`src/api/list.ts:88\` |
|
|
522
691
|
\`\`\`
|
|
523
692
|
|
|
524
693
|
Rules:
|
|
525
694
|
|
|
526
695
|
- **F-N** ids are stable and global per slug — never renumber. If a finding is superseded, append \`F-K supersedes F-J\` instead.
|
|
527
|
-
- **
|
|
696
|
+
- **Axis** is one of \`correctness\` | \`readability\` | \`architecture\` | \`security\` | \`perf\`. Pick the dimension the finding speaks to; never blank.
|
|
697
|
+
- **Severity** is one of \`critical\` | \`required\` | \`consider\` | \`nit\` | \`fyi\`. Ship gate threshold depends on \`acMode\` (see below).
|
|
528
698
|
- **Status** is \`open\` | \`closed\`. A closed row records the iteration that closed it.
|
|
529
699
|
- **Citation** is a real \`file:line\` (or test id, or commit SHA). No prose-only findings — if you cannot cite, you do not have a finding yet.
|
|
530
700
|
|
|
531
701
|
When iteration N+1 runs, the reviewer reads the ledger first, re-validates each open row (still open? closed by a fix? superseded?), then appends new findings as F-(max+1). Closing a row requires a citation to the fix evidence (commit SHA, test name, or new file:line).
|
|
532
702
|
|
|
533
|
-
|
|
703
|
+
> Severity legacy note: cclaw 8.0–8.3 used \`block\` / \`warn\` / \`info\`. v8.4 maps these to the new five-tier scale on read: \`block → critical | required\` (use \`critical\` only when ship-breaking, otherwise \`required\`), \`warn → consider\`, \`info → fyi\`. Mark migrated rows with \`(migrated from <old-severity>)\` in citation the first time you re-read them.
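The legacy mapping in the note can be written down as a lookup. This is a sketch under stated assumptions: `migrateSeverity`, `migratedCitation` and the `shipBreaking` flag are hypothetical; the note leaves the `critical` versus `required` call to the reviewer, so the flag stands in for that judgement.

```typescript
type LegacySeverity = "block" | "warn" | "info";
type Severity = "critical" | "required" | "consider" | "nit" | "fyi";

// Maps an 8.0-8.3 severity to the five-tier scale.
// `shipBreaking` (hypothetical) represents the reviewer's judgement that a
// legacy `block` row is ship-breaking; it is ignored for `warn` and `info`.
function migrateSeverity(legacy: LegacySeverity, shipBreaking: boolean = false): Severity {
  if (legacy === "block") {
    return shipBreaking ? "critical" : "required";
  }
  return legacy === "warn" ? "consider" : "fyi";
}

// Builds the citation cell for a migrated row, using the marker from the note.
function migratedCitation(citation: string, legacy: LegacySeverity): string {
  return `${citation} (migrated from ${legacy})`;
}
```

Note that nothing maps to `nit`: the legacy scale had no equivalent tier, so `nit` only appears on rows opened under v8.4.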
|
|
704
|
+
|
|
705
|
+
## Five axes (mandatory walk per iteration)
|
|
706
|
+
|
|
707
|
+
Walk every diff with the five axes in mind. Per-axis checklist:
|
|
708
|
+
|
|
709
|
+
| axis | what to check | typical findings |
|
|
710
|
+
| --- | --- | --- |
|
|
711
|
+
| \`correctness\` | does the code match the AC verification line? edge cases? tests assert state, not interactions? | wrong branch, missing edge case, test passes for wrong reason, mocks-of-things-we-own |
|
|
712
|
+
| \`readability\` | clear names, control flow, no dead code, no unnecessary cleverness | unclear name, long fn, hidden side effect |
|
|
713
|
+
| \`architecture\` | pattern fit, coupling, abstraction level, diff size | new dep when stdlib works; cross-layer reach; \`>300 LOC\` for one logical change → split |
|
|
714
|
+
| \`security\` | pre-screen for surfaces handled deeper by \`security-reviewer\` | unsanitised input, secrets in logs, missing authn/authz, encoding mismatch |
|
|
715
|
+
| \`perf\` | hot-path quality | N+1, unbounded loop, sync-where-async, missing pagination |
|
|
716
|
+
|
|
717
|
+
A reviewer that records zero findings on every axis must explicitly say so in the iteration block ("Five-axis pass: no findings on any axis"); silence is not the same as a clean review.
|
|
718
|
+
|
|
719
|
+
## Severity ↔ acMode → ship gate
|
|
720
|
+
|
|
721
|
+
| acMode | open severity → blocks ship |
|
|
722
|
+
| --- | --- |
|
|
723
|
+
| \`strict\` | \`critical\` OR \`required\` |
|
|
724
|
+
| \`soft\` | \`critical\` only (\`required\` carries over) |
|
|
725
|
+
| \`inline\` | reviewer not invoked |
|
|
726
|
+
|
|
727
|
+
\`consider\` / \`nit\` / \`fyi\` never block ship. They carry over to \`ships/<slug>.md\` (and \`learnings/<slug>.md\` for \`consider\`) but do not delay ship.
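The ship-gate table reduces to a two-argument predicate. A minimal sketch, assuming the mode and severity names exactly as listed above; the function name `blocksShip` is hypothetical.

```typescript
type AcMode = "strict" | "soft" | "inline";
type Severity = "critical" | "required" | "consider" | "nit" | "fyi";

// True when an OPEN row of this severity blocks ship under the given acMode.
function blocksShip(acMode: AcMode, severity: Severity): boolean {
  if (acMode === "strict") {
    return severity === "critical" || severity === "required";
  }
  if (acMode === "soft") {
    return severity === "critical";
  }
  // inline: the reviewer is not invoked, so no row can gate ship here.
  return false;
}
```

For example, an open `required` row blocks under `strict` but carries over under `soft`.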
|
|
728
|
+
|
|
729
|
+
## Convergence detector (acMode-aware)
|
|
534
730
|
|
|
535
731
|
The loop ends when ANY of these fires:
|
|
536
732
|
|
|
537
|
-
1. **All ledger rows closed.** Decision: \`clear\`.
|
|
538
|
-
2. **Two consecutive iterations append zero new
|
|
539
|
-
3. **Hard cap reached** (5 iterations) with at least one open
|
|
733
|
+
1. **All ledger rows closed.** Decision: \`clear\`.
|
|
734
|
+
2. **Two consecutive iterations append zero new blocking findings AND every open row is non-blocking.** Decision: \`clear\` with non-blocking carry-over to \`ships/<slug>.md\` and \`learnings/<slug>.md\`. "Blocking" depends on acMode (see table above).
|
|
735
|
+
3. **Hard cap reached** (5 iterations) with at least one open blocking row remaining. Decision: \`cap-reached\`. Stop; surface to user.
|
|
540
736
|
|
|
541
|
-
Tie-breaker: if iteration 5 closes the last
|
|
737
|
+
Tie-breaker: if iteration 5 closes the last blocking row, return \`clear\` (signal #1) even though the cap was hit. The cap exists to bound runaway loops, not to punish a slug that converges on the last attempt.
|
|
542
738
|
|
|
543
739
|
## Hard cap
|
|
544
740
|
|
|
545
741
|
- 5 review iterations per slug. After the 5th, the reviewer writes \`status: cap-reached\` and stops.
|
|
546
|
-
- The orchestrator surfaces every remaining open ledger row and recommends \`/cc-cancel\` (split into a fresh slug) or \`accept
|
|
742
|
+
- The orchestrator surfaces every remaining open ledger row and recommends \`/cc-cancel\` (split into a fresh slug) or \`accept-and-ship\` (only valid if every remaining open row is non-blocking under the active acMode).
|
|
547
743
|
|
|
548
744
|
## Decision values
|
|
549
745
|
|
|
550
|
-
- \`block\` — at least one ledger row is
|
|
551
|
-
- \`warn\` — open rows exist, all
|
|
552
|
-
- \`clear\` —
|
|
553
|
-
- \`cap-reached\` — signal #3 fired with at least one open
|
|
746
|
+
- \`block\` — at least one ledger row is blocking under the active acMode + open. \`slice-builder\` (mode=fix-only) must run next; then re-review.
|
|
747
|
+
- \`warn\` — open rows exist, all non-blocking, convergence detector signal #2 has fired. Ship may proceed; carry-over.
|
|
748
|
+
- \`clear\` — signal #1 (all closed) OR signal #2 (non-blocking convergence). Ready for ship.
|
|
749
|
+
- \`cap-reached\` — signal #3 fired with at least one open blocking row remaining.
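The convergence signals and decision values can be combined into one function. The sketch below is an interpretation, not the package's code: `decide`, `LedgerRow` and the `zeroBlockingStreak` counter are hypothetical, `warn` is returned while signal #2 is still pending (the usage in the worked example), and the tie-breaker is read as "no open blocking row at the cap means `clear`".

```typescript
type AcMode = "strict" | "soft";
type Severity = "critical" | "required" | "consider" | "nit" | "fyi";
type Decision = "block" | "warn" | "clear" | "cap-reached";

interface LedgerRow {
  id: string;
  severity: Severity;
  status: "open" | "closed";
}

const HARD_CAP = 5; // review iterations per slug

function isBlocking(acMode: AcMode, severity: Severity): boolean {
  return severity === "critical" || (acMode === "strict" && severity === "required");
}

// `zeroBlockingStreak` (hypothetical): consecutive iterations, including
// the current one, that appended no new blocking finding.
function decide(
  acMode: AcMode,
  rows: LedgerRow[],
  iteration: number,
  zeroBlockingStreak: number
): Decision {
  const open = rows.filter((r) => r.status === "open");
  const openBlocking = open.filter((r) => isBlocking(acMode, r.severity));

  if (open.length === 0) return "clear"; // signal #1
  if (openBlocking.length === 0) {
    if (zeroBlockingStreak >= 2) return "clear"; // signal #2
    if (iteration >= HARD_CAP) return "clear"; // tie-breaker, as read here
    return "warn"; // non-blocking rows open, signal #2 still pending
  }
  if (iteration >= HARD_CAP) return "cap-reached"; // signal #3
  return "block";
}
```

Fed the ledger states from the strict-mode worked example, this sketch yields `block`, then `warn`, then `clear` across iterations 1 to 3.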
|
|
554
750
|
|
|
555
|
-
## Worked example — three-iteration convergence
|
|
751
|
+
## Worked example — three-iteration convergence (strict mode)
|
|
556
752
|
|
|
557
753
|
\`\`\`markdown
|
|
558
754
|
## Iteration 1 — code — 2026-04-18T10:14Z
|
|
559
755
|
|
|
756
|
+
Five-axis pass:
|
|
757
|
+
- correctness: F-1 (missing pagination cursor).
|
|
758
|
+
- readability: no findings.
|
|
759
|
+
- architecture: no findings.
|
|
760
|
+
- security: no findings.
|
|
761
|
+
- perf: F-2 (no negative test for empty page; potential N+1 if cursor regressed).
|
|
762
|
+
|
|
560
763
|
Findings:
|
|
561
|
-
- F-1
|
|
562
|
-
- F-2
|
|
764
|
+
- F-1 correctness/required — \`src/api/list.ts:14\` — missing pagination cursor.
|
|
765
|
+
- F-2 perf/consider — \`tests/integration/list.test.ts:31\` — no negative test for empty page.
|
|
563
766
|
|
|
564
|
-
Decision: block. slice-builder (mode=fix-only) invoked next.
|
|
767
|
+
Decision: block (F-1 is required-severity in strict). slice-builder (mode=fix-only) invoked next.
|
|
565
768
|
|
|
566
769
|
## Iteration 2 — code — 2026-04-18T10:39Z
|
|
567
770
|
|
|
568
771
|
Ledger reread:
|
|
569
772
|
- F-1: closed — fix at \`src/api/list.ts:18\` (commit 7a91ab2). Citation matches.
|
|
570
|
-
- F-2: open — no fix attempted (
|
|
773
|
+
- F-2: open — no fix attempted (consider carry-over).
|
|
571
774
|
|
|
572
|
-
|
|
775
|
+
Five-axis pass: no new findings on any axis.
|
|
573
776
|
|
|
574
|
-
Decision: warn. Convergence signal #2 needs another zero-
|
|
777
|
+
Decision: warn. Convergence signal #2 needs another zero-blocking iteration.
|
|
575
778
|
|
|
576
779
|
## Iteration 3 — code — 2026-04-18T11:02Z
|
|
577
780
|
|
|
578
781
|
Ledger reread:
|
|
579
782
|
- F-1: closed (sticky).
|
|
580
|
-
- F-2: open (
|
|
783
|
+
- F-2: open (consider carry-over).
|
|
581
784
|
|
|
582
|
-
|
|
785
|
+
Five-axis pass: no findings. Two consecutive zero-blocking iterations recorded.
|
|
583
786
|
|
|
584
787
|
Decision: clear (signal #2). F-2 carries to ships/<slug>.md and learnings/<slug>.md.
|
|
585
788
|
\`\`\`
|
|
@@ -589,9 +792,12 @@ Decision: clear (signal #2). F-2 carries to ships/<slug>.md and learnings/<slug>
|
|
|
589
792
|
- Adding "implicit" findings without citations because "the reviewer can see it". The reviewer cannot. Cite \`file:line\` or do not record the finding.
|
|
590
793
|
- Renumbering F-N ids when an old finding is superseded. Append a new row \`F-K supersedes F-J\`; never rewrite history.
|
|
591
794
|
- Closing a row without a fix citation. Closing is itself a claim — record the SHA / test name / file:line that proves the fix.
|
|
592
|
-
- Treating "no new findings" as instant clear. The convergence detector requires *two* consecutive zero-
|
|
795
|
+
- Treating "no new findings" as instant clear. The convergence detector requires *two* consecutive zero-blocking iterations; one is not enough.
|
|
593
796
|
- Skipping the convergence check and looping until cap. The detector exists so easy slugs ship fast; do not waste budget.
|
|
594
797
|
- Mixing \`code\` and \`text-review\` modes within one iteration. Each iteration declares one mode in its header.
|
|
798
|
+
- Recording a finding without an axis. Every row carries an axis (one of \`correctness\` / \`readability\` / \`architecture\` / \`security\` / \`perf\`). Pick the dimension the finding speaks to; never blank.
|
|
799
|
+
- Marking everything as \`required\` because "it might matter". Severity is graduated: \`critical\` for ship-breaking, \`required\` for must-fix-before-ship, \`consider\` for suggestion, \`nit\` for minor, \`fyi\` for context only. Padding severity makes it useless.
|
|
800
|
+
- Walking only one or two axes when the diff touches all five. The Five-axis pass is mandatory every iteration; record "no findings" for axes you walked but found clean. Silence is a smell — say what you walked.
|
|
595
801
|
`;
|
|
596
802
|
const COMMIT_MESSAGE_QUALITY = `---
|
|
597
803
|
name: commit-message-quality
|
|
@@ -722,14 +928,22 @@ The Iron Law holds in every mode; only the *bookkeeping* differs. Skipping tests
|
|
|
722
928
|
### GREEN — minimal production change
|
|
723
929
|
|
|
724
930
|
- Smallest possible production diff that turns RED into PASS.
|
|
725
|
-
- Run the **
|
|
726
|
-
-
|
|
931
|
+
- Run the **affected-test suite first** (test impact analysis), not the full suite — fast feedback. The affected tests are: tests in the test directory mirroring the modified production module path PLUS tests that import the modified module directly. Tools: \`vitest related <file>\`, \`jest --findRelatedTests <file>\`, \`pytest --testmon\` if available, or a manual \`grep\` for imports + the mirrored test file.
|
|
932
|
+
- After affected tests pass, run the **full relevant suite** as the safety net before commit. A passing single test with the suite broken elsewhere is a regression, not GREEN.
|
|
933
|
+
- Capture both: the affected-tests command + PASS summary, AND the full-suite command + PASS summary. The two together are the **GREEN evidence** in \`build.md\`.
|
|
727
934
|
- Touch only files declared in the plan. If a file outside the plan is required, **stop** and surface the conflict.
|
|
728
935
|
- Commit: \`commit-helper.mjs --ac=AC-N --phase=green --message="green(AC-N): …"\`.
|
|
729
936
|
|
|
937
|
+
Why two-stage: affected tests close the loop in seconds → fast iteration; full suite catches regressions impact analysis missed (test discovery is heuristic, not guaranteed). In tiny repos (<100 tests, <2s suite) the two stages collapse to one command — that is fine. In larger repos the difference is real wall-clock; affected-first matters.
|
|
938
|
+
|
|
730
939
|
### REFACTOR — mandatory pass
|
|
731
940
|
|
|
732
|
-
REFACTOR is **not optional**. Even when the GREEN diff feels minimal, you must consider rename / extract / inline / type-narrow / dedup / dead-code-removal.
|
|
941
|
+
REFACTOR is **not optional**. Even when the GREEN diff feels minimal, you must consider rename / extract / inline / type-narrow / dedup / dead-code-removal.
|
|
942
|
+
|
|
943
|
+
After the refactor edits:
|
|
944
|
+
|
|
945
|
+
1. Run the **full relevant suite** (always, not just affected). REFACTOR is the safety net for "did my rename break a place I didn't expect"; affected-test analysis is by definition incomplete here because a renamed symbol may have changed which tests are affected.
|
|
946
|
+
2. The suite must pass with **identical expected output** (no behaviour change). Snapshot diffs are a refactor leak; if a snapshot moved, your "refactor" is a behaviour change in disguise.
|
|
733
947
|
|
|
734
948
|
If a refactor is warranted, apply it and commit:
|
|
735
949
|
|
|
@@ -745,7 +959,7 @@ Silence fails the gate.
|
|
|
745
959
|
|
|
746
960
|
\`commit-helper\` enforces (a) ↔ (e) mechanically. The reviewer checks (b), (d), (f), (g) on iteration 1.
|
|
747
961
|
|
|
748
|
-
(a) **discovery_complete** — relevant tests / fixtures / helpers / commands cited.\n(b) **impact_check_complete** — affected callbacks / state / interfaces / contracts named.\n(c) **red_test_recorded** — failing test exists, watched-RED proof attached.\n(d) **red_fails_for_right_reason** — RED captured a real assertion failure.\n(e) **
|
|
962
|
+
(a) **discovery_complete** — relevant tests / fixtures / helpers / commands cited.\n(b) **impact_check_complete** — affected callbacks / state / interfaces / contracts named.\n(c) **red_test_recorded** — failing test exists, watched-RED proof attached.\n(d) **red_fails_for_right_reason** — RED captured a real assertion failure.\n(e) **green_two_stage_suite** — affected-tests pass AND full relevant suite passes after GREEN. Both commands captured in build.md.\n(f) **refactor_run_or_skipped_with_reason** — REFACTOR ran (with FULL suite green afterward), or explicitly skipped with reason.\n(g) **traceable_to_plan** — commits reference plan AC ids and the plan's file set.\n(h) **commit_chain_intact** — RED + GREEN + REFACTOR SHAs (or skipped sentinel) recorded in flow-state.
|
|
749
963
|
|
|
750
964
|
## Vertical slicing — tracer bullets, never horizontal waves
|
|
751
965
|
|
|
@@ -1051,6 +1265,200 @@ In all four cases: stop, return the summary JSON, do **not** push code that "wor
|
|
|
1051
1265
|
- Documented \`// eslint-disable\` lines with a one-line justification AND a follow-up issue id. The justification is what makes it not slop.
|
|
1052
1266
|
- Running \`tsc --noEmit\` after \`npm test\` — that is a different tool, not a re-run of the same one.
|
|
1053
1267
|
`;
|
|
1268
|
+
const SOURCE_DRIVEN = `---
|
|
1269
|
+
name: source-driven
|
|
1270
|
+
trigger: when architect or planner is dispatched in strict mode AND the task is framework-specific
|
|
1271
|
+
---
|
|
1272
|
+
|
|
1273
|
+
# Skill: source-driven
|
|
1274
|
+
|
|
1275
|
+
Framework-specific code (React hooks, Django views, Next.js routing, Prisma migrations, Tailwind utilities, etc.) must be **grounded in official documentation, not memory**. Training data goes stale: APIs deprecate, signatures change, recommended patterns evolve. Source-driven means: detect the stack, fetch the relevant doc page, implement against it, cite the URL.
|
|
1276
|
+
|
|
1277
|
+
## When this skill applies
|
|
1278
|
+
|
|
1279
|
+
| Triage | Stack signal | Apply? |
|
|
1280
|
+
| --- | --- | --- |
|
|
1281
|
+
| \`strict\` (large-risky / security-flagged) | framework-specific code in scope | **always** — required for architect / planner |
|
|
1282
|
+
| \`soft\` (small-medium) | framework-specific code in scope | **opt-in** — enable when the user asks for "source-driven" or "verified" implementation |
|
|
1283
|
+
| \`inline\` (trivial) | any | **never** — single-line edits don't need citations |
|
|
1284
|
+
| any | pure logic (loops, data structures, internal helpers) | skip — correctness is version-independent |
|
|
1285
|
+
|
|
1286
|
+
The orchestrator passes \`source_driven: true\` in the dispatch envelope when it applies. Specialists honour the flag.
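The applicability table can be sketched as the check the orchestrator might run before setting `source_driven: true`. The function and field names below (`sourceDrivenApplies`, `frameworkSpecific`, `userOptedIn`) are hypothetical; only the four table rows come from the text.

```typescript
type AcMode = "strict" | "soft" | "inline";

interface SourceDrivenInput {
  acMode: AcMode;
  frameworkSpecific: boolean; // false for pure logic (loops, helpers, data structures)
  userOptedIn: boolean; // user asked for a "source-driven" or "verified" implementation
}

// Mirrors the table: pure logic is skipped in every mode, strict always
// applies, soft applies only on opt-in, inline never applies.
function sourceDrivenApplies(input: SourceDrivenInput): boolean {
  if (!input.frameworkSpecific) return false;
  if (input.acMode === "strict") return true;
  if (input.acMode === "soft") return input.userOptedIn;
  return false;
}
```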
|
|
1287
|
+
|
|
1288
|
+
## The four-step process
|
|
1289
|
+
|
|
1290
|
+
\`\`\`
|
|
1291
|
+
DETECT ──→ FETCH ──→ IMPLEMENT ──→ CITE
|
|
1292
|
+
│ │ │ │
|
|
1293
|
+
▼ ▼ ▼ ▼
|
|
1294
|
+
What Get the Follow the Show the
|
|
1295
|
+
stack + relevant documented URL inline
|
|
1296
|
+
versions? page, not patterns in code +
|
|
1297
|
+
homepage in artifact
|
|
1298
|
+
\`\`\`
|
|
1299
|
+
|
|
1300
|
+
### Step 1 — Detect stack and versions
|
|
1301
|
+
|
|
1302
|
+
Read the project's dependency file. Cite the file you read.
|
|
1303
|
+
|
|
1304
|
+
| Manifest | Versions to extract |
|
|
1305
|
+
| --- | --- |
|
|
1306
|
+
| \`package.json\` + lockfile | Node engines, framework dep version (React, Vue, Next.js, Express, etc.), test runner, linter |
|
|
1307
|
+
| \`composer.json\` | PHP version, framework version (Symfony, Laravel) |
|
|
1308
|
+
| \`pyproject.toml\` / \`requirements.txt\` | Python version, framework version (Django, Flask, FastAPI) |
|
|
1309
|
+
| \`go.mod\` | Go version, framework version (gin, echo, chi) |
|
|
1310
|
+
| \`Cargo.toml\` | Rust edition, crate version |
|
|
1311
|
+
| \`Gemfile\` | Ruby version, framework version (Rails, Sinatra) |
|
|
1312
|
+
|
|
1313
|
+
Surface the result explicitly in the artifact:
|
|
1314
|
+
|
|
1315
|
+
\`\`\`text
|
|
1316
|
+
STACK DETECTED:
|
|
1317
|
+
- React 19.1.0 (from package.json)
|
|
1318
|
+
- Vite 6.2.0 (from package.json)
|
|
1319
|
+
- Tailwind CSS 4.0.3 (from package.json)
|
|
1320
|
+
→ Fetching official docs for the patterns this slug needs.
|
|
1321
|
+
\`\`\`
|
|
1322
|
+
|
|
1323
|
+
If a version is missing or ambiguous (e.g. \`"react": "^19.0.0"\`, lockfile pinned to a release-candidate), **ask the user once** before proceeding. Don't guess.
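A simple way to flag the ambiguous cases is to accept only an exact release version. This is a deliberately strict sketch: the name `isAmbiguousVersion` is hypothetical, and a real check would likely also consult the lockfile's resolved version, which this example does not do.

```typescript
// Treats only an exact release version such as "19.1.0" as unambiguous.
// Ranges ("^19.0.0", "~6.2.0", "*"), empty specs, and pre-releases
// ("19.0.0-rc.1") are reported as ambiguous so the user can be asked once.
function isAmbiguousVersion(spec: string): boolean {
  return !/^\d+\.\d+\.\d+$/.test(spec.trim());
}
```

With this rule, `"19.1.0"` passes, while `"^19.0.0"` and `"19.0.0-rc.1"` both trigger the one-time question.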
|
|
1324
|
+
|
|
1325
|
+
### Step 2 — Fetch official documentation
|
|
1326
|
+
|
|
1327
|
+
Fetch the **deep link** for the specific feature in scope. Not the homepage. Not the search result. Not "the React docs".
|
|
1328
|
+
|
|
1329
|
+
| Bad | Good |
|
|
1330
|
+
| --- | --- |
|
|
1331
|
+
| \`react.dev\` | \`react.dev/reference/react/useActionState#usage\` |
|
|
1332
|
+
| "search Django auth" | \`docs.djangoproject.com/en/6.0/topics/auth/\` |
|
|
1333
|
+
| StackOverflow answer | \`react.dev/blog/2024/12/05/react-19#actions\` |
|
|
1334
|
+
|
|
1335
|
+
**Source hierarchy** (in order of authority):
|
|
1336
|
+
|
|
1337
|
+
1. Official documentation for the detected version (\`react.dev\`, \`docs.djangoproject.com\`, \`symfony.com/doc\`).
|
|
1338
|
+
2. Official blog / changelog (\`react.dev/blog\`, \`nextjs.org/blog\`).
|
|
1339
|
+
3. Web standards (\`MDN\`, \`web.dev\`, \`html.spec.whatwg.org\`).
|
|
1340
|
+
4. Browser/runtime compatibility (\`caniuse.com\`, \`node.green\`).
|
|
1341
|
+
|
|
1342
|
+
**Not authoritative** — do not cite as primary:
|
|
1343
|
+
|
|
1344
|
+
- Stack Overflow answers (community Q&A, not a spec).
|
|
1345
|
+
- Blog posts or tutorials, even popular ones.
|
|
1346
|
+
- AI-generated documentation summaries.
|
|
1347
|
+
- Your own training data — that is the whole point.
|
|
1348
|
+
|
|
1349
|
+
If the detected version's docs disagree with an older blog post, the docs win. If two official sources conflict (e.g. migration guide vs. API reference), surface the conflict to the user; do not silently pick one.
|
|
1350
|
+
|
|
1351
|
+
### Step 3 — Implement following documented patterns
|
|
1352
|
+
|
|
1353
|
+
Match the API signatures and patterns in the doc page. If the docs deprecate a pattern, do not use the deprecated version.
|
|
1354
|
+
|
|
1355
|
+
When existing project code conflicts with current docs:
|
|
1356
|
+
|
|
1357
|
+
\`\`\`text
|
|
1358
|
+
CONFLICT DETECTED:
|
|
1359
|
+
The existing codebase uses \`useState\` for form loading state,
|
|
1360
|
+
but React 19 docs recommend \`useActionState\` for this pattern.
|
|
1361
|
+
(Source: https://react.dev/reference/react/useActionState)
|
|
1362
|
+
|
|
1363
|
+
Options:
|
|
1364
|
+
A) Adopt the modern pattern (useActionState) — matches current docs.
|
|
1365
|
+
B) Match existing code (useState) — keeps codebase consistent.
|
|
1366
|
+
→ Which approach do you prefer?
|
|
1367
|
+
\`\`\`
|
|
1368
|
+
|
|
1369
|
+
Do not silently adopt one. The user picks; the decision goes in \`decisions.md\` (architect mode) or in the plan body (planner mode).
|
|
1370
|
+
|
|
1371
|
+
### Step 4 — Cite sources inline
|
|
1372
|
+
|
|
1373
|
+
Every framework-specific decision gets a citation. The user must be able to verify every choice without trusting the agent's memory.
|
|
1374
|
+
|
|
1375
|
+
In **plan.md** / **decisions.md**, include a \`sources:\` block under the relevant AC or decision:
|
|
1376
|
+
|
|
1377
|
+
\`\`\`yaml
|
|
1378
|
+
sources:
|
|
1379
|
+
- url: https://react.dev/reference/react/useActionState#usage
|
|
1380
|
+
  used_for: AC-1 (form submission state pattern)
|
|
1381
|
+
  fetched_at: 2026-05-08T22:45Z
|
|
1382
|
+
  version: react@19.1.0
|
|
1383
|
+
- url: https://react.dev/blog/2024/12/05/react-19#actions
|
|
1384
|
+
  used_for: D-1 (rationale for picking useActionState over manual useState)
|
|
1385
|
+
  fetched_at: 2026-05-08T22:46Z
|
|
1386
|
+
  version: react@19.x
|
|
1387
|
+
\`\`\`
|
|
1388
|
+
|
|
1389
|
+
In **code comments**, cite the doc URL near the pattern:
|
|
1390
|
+
|
|
1391
|
+
\`\`\`typescript
|
|
1392
|
+
// React 19 form handling with useActionState.
|
|
1393
|
+
// Source: https://react.dev/reference/react/useActionState#usage
|
|
1394
|
+
const [state, formAction, isPending] = useActionState(submitOrder, initialState);
|
|
1395
|
+
\`\`\`
|
|
1396
|
+
|
|
1397
|
+
Citation rules:
|
|
1398
|
+
|
|
1399
|
+
- Full URLs, not shortened.
|
|
1400
|
+
- Prefer deep links with anchors (\`/useActionState#usage\` over \`/useActionState\`).
|
|
1401
|
+
- Quote the specific passage when it supports a non-obvious decision (e.g. "useTransition now supports async functions [...] to handle pending states automatically").
|
|
1402
|
+
- Include browser/runtime support data when recommending platform features.
|
|
1403
|
+
|
|
1404
|
+
## UNVERIFIED marker (when docs are missing)
|
|
1405
|
+
|
|
1406
|
+
If you cannot find official documentation for a pattern (cclaw's \`user-context7\` MCP returns nothing, the framework has no public docs for the feature, etc.):
|
|
1407
|
+
|
|
1408
|
+
- Mark the AC / decision with \`unverified: true\` in frontmatter.
|
|
1409
|
+
- Add an inline marker in the artifact body:
|
|
1410
|
+
|
|
1411
|
+
\`\`\`text
|
|
1412
|
+
UNVERIFIED: I could not find official documentation for this pattern.
|
|
1413
|
+
This is based on training data and may be outdated.
|
|
1414
|
+
Verify before using in production.
|
|
1415
|
+
\`\`\`
|
|
1416
|
+
|
|
1417
|
+
- The reviewer treats \`unverified: true\` as a finding (axis: correctness, severity: required) on iteration 1. Ship blocks until the user either confirms the pattern is intentional or surfaces a doc URL the agent can cite.
|
|
1418
|
+
|
|
1419
|
+
Honesty about what you couldn't verify is more valuable than confident guessing.
|
|
1420
|
+
|
|
1421
|
+
## Specialist contracts
|
|
1422
|
+
|
|
1423
|
+
- **planner** in \`source_driven\` envelope: every framework-specific AC carries a \`sources\` block (URL + which AC it supports + fetched timestamp + version). A framework-specific AC without a citation becomes a reviewer finding (F-N, axis=correctness, severity=required).
|
|
1424
|
+
- **architect** in \`source_driven\` envelope: every \`D-N\` whose decision rests on framework behaviour (rendering model, state management strategy, persistence pattern, security posture) carries a \`sources\` block. An architect who cannot cite a source says so explicitly rather than staying silent: "I could not find current documentation; this decision is based on training data".
|
|
1425
|
+
- **slice-builder** in \`source_driven\` envelope: pulls the URL from \`plan.md\` / \`decisions.md\` into the code comment when implementing the pattern. Does not independently re-fetch (architect/planner already did the work).
|
|
1426
|
+
- **reviewer** runs the citation check as part of the \`correctness\` axis pass. Open finding when:
|
|
1427
|
+
  - a framework-specific AC has no \`sources\` block;
|
|
1428
|
+
  - a citation URL points to a non-authoritative source (Stack Overflow, blog, training data);
|
|
1429
|
+
  - a citation points to a doc page for a framework version other than the one the project uses.
|
|
1430
|
+
|
|
1431
|
+
## MCP integration (when the harness has \`user-context7\`)
|
|
1432
|
+
|
|
1433
|
+
cclaw recognises \`user-context7\` as the source-of-truth fetcher. When \`source_driven: true\` is in the envelope, the planner / architect SHOULD prefer:
|
|
1434
|
+
|
|
1435
|
+
1. \`mcp_user-context7_resolve-library-id\` to map a package name to a Context7 library id.
|
|
1436
|
+
2. \`mcp_user-context7_get-library-docs\` to fetch the relevant docs at the detected version.
|
|
1437
|
+
|
|
1438
|
+
If the harness does not have \`user-context7\` (or the user disabled it), the specialist falls back to the harness's web-fetch tool (browser tool, fetch, curl) against the official docs URL — same source-hierarchy rules apply.
|
|
1439
|
+
|
|
1440
|
+
## Common pitfalls
|
|
1441
|
+
|
|
1442
|
+
- "I'm confident about this API" — confidence is not evidence. Verify.
|
|
1443
|
+
- "Fetching docs wastes tokens" — hallucinating wastes more. One fetch prevents an hour of debugging the deprecated signature.
|
|
1444
|
+
- "The docs won't have what I need" — if they don't, that is itself information; the pattern may not be officially recommended.
|
|
1445
|
+
- "I'll just disclaim 'might be outdated'" — disclaimers don't help. Either verify and cite, or mark UNVERIFIED.
|
|
1446
|
+
- "This task is simple, no need to check" — simple tasks become templates. The user copies your useState pattern into ten components before realising useActionState exists.
|
|
1447
|
+
- Fetching the homepage instead of the deep link. Token waste with no signal.
|
|
1448
|
+
- Citing the docs once but using the pattern from memory. The point of source-driven is that you write what the doc says, not what you remember it saying.
|
|
1449
|
+
|
|
1450
|
+
## Verification checklist (reviewer uses this)
|
|
1451
|
+
|
|
1452
|
+
After implementing under \`source_driven\`:
|
|
1453
|
+
|
|
1454
|
+
- [ ] Stack and versions identified from a real manifest file (cited \`file:line\`).
|
|
1455
|
+
- [ ] Official docs fetched for each framework-specific pattern (deep link, not homepage).
|
|
1456
|
+
- [ ] No Stack Overflow / blog / training-data citations as primary sources.
|
|
1457
|
+
- [ ] Code follows current-version patterns (no deprecated APIs).
|
|
1458
|
+
- [ ] Non-trivial decisions include a \`sources\` block with full URL.
|
|
1459
|
+
- [ ] Conflicts between docs and existing project code surfaced to the user.
|
|
1460
|
+
- [ ] Anything unverifiable marked \`UNVERIFIED:\` explicitly.
|
|
1461
|
+
`;
|
|
1054
1462
|
export const AUTO_TRIGGER_SKILLS = [
|
|
1055
1463
|
{
|
|
1056
1464
|
id: "triage-gate",
|
|
@@ -1066,6 +1474,13 @@ export const AUTO_TRIGGER_SKILLS = [
|
|
|
1066
1474
|
triggers: ["start:/cc", "active-flow-detected"],
|
|
1067
1475
|
body: FLOW_RESUME
|
|
1068
1476
|
},
|
|
1477
|
+
{
|
|
1478
|
+
id: "pre-flight-assumptions",
|
|
1479
|
+
fileName: "pre-flight-assumptions.md",
|
|
1480
|
+
description: "Surface 3-7 default assumptions (stack, conventions, architecture defaults, out-of-scope) for the user to confirm before any specialist runs. Skipped on the inline path.",
|
|
1481
|
+
triggers: ["after:triage-gate", "before:first-dispatch"],
|
|
1482
|
+
body: PRE_FLIGHT_ASSUMPTIONS
|
|
1483
|
+
},
|
|
1069
1484
|
{
|
|
1070
1485
|
id: "plan-authoring",
|
|
1071
1486
|
fileName: "plan-authoring.md",
|
|
@@ -1156,5 +1571,12 @@ export const AUTO_TRIGGER_SKILLS = [
|
|
|
1156
1571
|
description: "Always-on guard against redundant verification, env-specific shims, and silent skip-and-pass fixes.",
|
|
1157
1572
|
triggers: ["always-on", "task:build", "task:fix-only", "task:recovery"],
|
|
1158
1573
|
body: ANTI_SLOP
|
|
1574
|
+
},
|
|
1575
|
+
{
|
|
1576
|
+
id: "source-driven",
|
|
1577
|
+
fileName: "source-driven.md",
|
|
1578
|
+
description: "Detect stack + versions from manifest, fetch official documentation deep-links, implement against documented patterns, cite URLs in plan/decisions/code. Default in strict mode for framework-specific work.",
|
|
1579
|
+
triggers: ["ac_mode:strict", "specialist:planner", "specialist:architect", "framework-specific-code-detected"],
|
|
1580
|
+
body: SOURCE_DRIVEN
|
|
1159
1581
|
}
|
|
1160
1582
|
];
|