groundwork-method 0.10.0 → 0.11.0

Files changed (70)
  1. package/CHANGELOG.md +42 -0
  2. package/bin/groundwork.js +86 -17
  3. package/dist/src/generators/system-test-runner/generator.js +52 -4
  4. package/dist/src/generators/system-test-runner/generator.js.map +1 -1
  5. package/package.json +1 -1
  6. package/src/docs/principles/design/usability-and-ux.md +11 -0
  7. package/src/docs/principles/foundations/testing.md +32 -6
  8. package/src/docs/principles/index.md +2 -1
  9. package/src/docs/principles/quality/observability.md +2 -2
  10. package/src/engineer-skills/groundwork-electron-engineer/SKILL.md +6 -1
  11. package/src/engineer-skills/groundwork-electron-engineer/references/documentation.md +126 -0
  12. package/src/engineer-skills/groundwork-electron-engineer/references/observability.md +37 -0
  13. package/src/engineer-skills/groundwork-electron-engineer/references/performance-and-reliability.md +80 -0
  14. package/src/engineer-skills/groundwork-electron-engineer/references/testing-and-smoke.md +22 -0
  15. package/src/engineer-skills/groundwork-electron-engineer/sync-anchor.md +12 -4
  16. package/src/engineer-skills/groundwork-flutter-engineer/SKILL.md +7 -1
  17. package/src/engineer-skills/groundwork-flutter-engineer/references/documentation.md +122 -0
  18. package/src/engineer-skills/groundwork-flutter-engineer/references/observability.md +37 -0
  19. package/src/engineer-skills/groundwork-flutter-engineer/references/performance-and-reliability.md +100 -0
  20. package/src/engineer-skills/groundwork-flutter-engineer/references/security.md +96 -0
  21. package/src/engineer-skills/groundwork-flutter-engineer/references/testing.md +25 -0
  22. package/src/engineer-skills/groundwork-flutter-engineer/sync-anchor.md +13 -4
  23. package/src/engineer-skills/groundwork-go-engineer/SKILL.md +5 -2
  24. package/src/engineer-skills/groundwork-go-engineer/references/documentation.md +130 -0
  25. package/src/engineer-skills/groundwork-go-engineer/references/testing.md +63 -1
  26. package/src/engineer-skills/groundwork-go-engineer/sync-anchor.md +13 -4
  27. package/src/engineer-skills/groundwork-nextjs-engineer/SKILL.md +6 -1
  28. package/src/engineer-skills/groundwork-nextjs-engineer/references/accessibility.md +111 -0
  29. package/src/engineer-skills/groundwork-nextjs-engineer/references/observability.md +48 -0
  30. package/src/engineer-skills/groundwork-nextjs-engineer/references/security.md +131 -0
  31. package/src/engineer-skills/groundwork-nextjs-engineer/references/testing.md +59 -1
  32. package/src/engineer-skills/groundwork-nextjs-engineer/references/ux-principles.md +1 -49
  33. package/src/engineer-skills/groundwork-nextjs-engineer/sync-anchor.md +10 -3
  34. package/src/engineer-skills/groundwork-python-engineer/SKILL.md +5 -2
  35. package/src/engineer-skills/groundwork-python-engineer/references/security.md +148 -0
  36. package/src/engineer-skills/groundwork-python-engineer/references/testing.md +40 -1
  37. package/src/engineer-skills/groundwork-python-engineer/sync-anchor.md +11 -4
  38. package/src/generators/electron-app/docs/principles/stack/electron/index.md +2 -0
  39. package/src/generators/electron-app/files/tests/smoke/app.spec.ts.template +73 -8
  40. package/src/generators/flutter-app/docs/principles/stack/flutter/testing.md +14 -2
  41. package/src/generators/flutter-app/files/integration_test/app_test.dart.template +46 -12
  42. package/src/generators/go-microservice/docs/principles/stack/go/testing.md +17 -1
  43. package/src/generators/python-microservice/docs/principles/stack/python/testing.md +41 -0
  44. package/src/generators/system-test-runner/NATIVE-CHECK-CONTRACT.md +20 -0
  45. package/src/generators/system-test-runner/files/tests/system/test_render_smoke.py.template +30 -0
  46. package/src/generators/system-test-runner/generator.ts +58 -4
  47. package/src/generators/workspace-dev-cli/cli-src/dist/dev-bundle.js +1 -1
  48. package/src/hidden-skills/code-intelligence.md +6 -0
  49. package/src/hidden-skills/groundwork-architect/SKILL.md +1 -1
  50. package/src/hidden-skills/groundwork-architect/sync-anchor.md +2 -2
  51. package/src/hidden-skills/groundwork-bet/briefs/acceptance-auditor.md +68 -0
  52. package/src/hidden-skills/groundwork-bet/briefs/blind-reviewer.md +56 -0
  53. package/src/hidden-skills/groundwork-bet/briefs/coverage-auditor.md +95 -0
  54. package/src/hidden-skills/groundwork-bet/briefs/edge-case-tracer.md +64 -0
  55. package/src/hidden-skills/groundwork-bet/briefs/experience-auditor.md +83 -0
  56. package/src/hidden-skills/groundwork-bet/briefs/slice-worker.md +92 -26
  57. package/src/hidden-skills/groundwork-bet/instructions.md +4 -4
  58. package/src/hidden-skills/groundwork-bet/templates/bet-progress-test.md +16 -27
  59. package/src/hidden-skills/groundwork-bet/templates/change-proposal.md +1 -1
  60. package/src/hidden-skills/groundwork-bet/templates/decomposition/milestone-index.md +12 -16
  61. package/src/hidden-skills/groundwork-bet/templates/decomposition/slice.md +4 -8
  62. package/src/hidden-skills/groundwork-bet/templates/technical-design/03-api-design.md +1 -1
  63. package/src/hidden-skills/groundwork-bet/workflows/01-discovery.md +3 -1
  64. package/src/hidden-skills/groundwork-bet/workflows/02-design.md +11 -1
  65. package/src/hidden-skills/groundwork-bet/workflows/03-decomposition.md +60 -64
  66. package/src/hidden-skills/groundwork-bet/workflows/04-delivery.md +75 -42
  67. package/src/hidden-skills/groundwork-bet/workflows/05-validation.md +18 -7
  68. package/src/hidden-skills/groundwork-designer/sync-anchor.md +1 -1
  69. package/src/hidden-skills/groundwork-persona/instructions.md +11 -0
  70. package/src/hidden-skills/groundwork-review/checklists/implementation-readiness.md +1 -0
@@ -6,11 +6,13 @@
6
6
 
7
7
  Delivery is an orchestration, not a single linear loop you run in one context. You are the **driver**: you hold the thin spine — the board, the milestone order, the delivery granularity the user chose, and the triage and course-correction judgement — and you keep that context small so you can reason about the bet as a whole.
8
8
 
9
- You do not implement slices in your own context. Each slice is delivered by a **fresh slice-worker subagent** (`briefs/slice-worker.md`) you dispatch with a tight context capsule; it implements to green and returns a short report, and its implementation reasoning dies with its context. You review every worker's diff through independent lenses, triage the findings, commit the slice, and at each milestone boundary you run the postmortem that decides whether the plan needs to change. This division is what keeps the heavy implementation context disposable and your own context clear enough to course-correct.
9
+ You do not implement slices in your own context. Each slice is delivered by a **fresh slice-worker subagent** (`briefs/slice-worker.md`) you dispatch with a tight context capsule; it implements to green, rolls out the slice's permanent best-practice tests, and returns a short report, and its implementation reasoning dies with its context. You review every worker's diff through independent lenses, triage the findings, commit the slice, and at each milestone boundary you run the postmortem that decides whether the plan needs to change. This division is what keeps the heavy implementation context disposable and your own context clear enough to course-correct.
10
+
11
+ **A note on voice.** This phase is dense with reading — the board, the git log, the approved prose, the previous slice's commit — and it is tempting to narrate each read as a discovery ("now I understand the protocol," "key finding: the slices are approved"). Don't: you are guiding the user through *their* delivery, not reporting your way through this file. Tell them where the bet stands and what is next, and state routine checks as plain fact (groundwork-persona, *Speak as the Guide, Not the Tourist*).
10
12
 
11
13
  ## Restrictions
12
14
 
13
- ⚠️ **CRITICAL CONSTRAINT — sealed prose is the fixed definition of done; the ladder advances by ratchet.** The decomposition (`docs/bets/<bet-slug>/decomposition/`) and the technical design were reviewed proof by proof, at assertion-grade scrutiny, approved by the user, and sealed at the `bet/<bet-slug>/approved` git tag, which at delivery start covers the full milestone ladder (every rung's headline proof) plus the first milestone's slices. That sealed **prose** is the fixed definition of done — Delivery builds *against* it, never edits it. Two things are *not* edits to it and are expected: (1) the tests and the implementation are *built* this phase — Step 0.5 materializes the red board from the approved Proof-of-work prose, and the milestone loop turns it green; and (2) each later milestone's slices are **authored on arrival** (and a missing rung may be **added**), with the tag *ratcheting forward* to add them: additive, recorded (`bet(<bet-slug>): author milestone <N>` / `add milestone <N>`), and gated by the same decomposition review. What is forbidden is *changing an already-sealed proof*: a sealed Proof-of-work proof that looks wrong is a stop-and-escalate through the Amendment Protocol below, not a quiet prose edit. The seal holds because the prose is under git from the tag forward: `git diff bet/<bet-slug>/approved.. -- docs/bets/<bet-slug>/` shows additive next-rung authoring or rung-addition (legitimate) versus a modification to sealed prose (a defect unless it carries an approved amendment trail), and the prose-integrity reconciliation in the slice review tells them apart.
15
+ ⚠️ **CRITICAL CONSTRAINT — the approved prose is the definition of done; changing what it proves is a recorded amendment.** The decomposition (`docs/bets/<bet-slug>/decomposition/`) and the technical design were reviewed proof by proof, at assertion-grade scrutiny, and approved by the user, as recorded at the `bet(<bet-slug>): approve decomposition` commit, which at delivery start covers the full milestone ladder (every rung's headline proof) plus the first milestone's slices. That approved **prose** is the definition of done — Delivery builds *against* it. Three things are expected and need no special ceremony: (1) the tests and the implementation are *built* this phase — Step 0.5 materializes the red board from the approved Proof-of-work prose, and the milestone loop turns it green; (2) each later milestone's slices are **authored on arrival** (and a missing rung may be **added**), recorded by committing the new slice files and gated by the same decomposition review; (3) the slice **breakdown is steered freely** as delivery teaches you — adjusting the path to a milestone needs no record. What is *not* free is **changing what a milestone proves** — editing or dropping an agreed front-door case, weakening a Proof-of-work proof, loosening an API shape: that is an owner-approved **Amendment** (below), recorded as a commit beside the prose with a reason, so a later context can see it. The prose lives under git, so `git log -- docs/bets/<bet-slug>/decomposition/ docs/bets/<bet-slug>/technical-design/` shows the trail, and the prose-integrity reconciliation in the slice review confirms each built test still proves what its current approved prose describes.
14
16
 
15
17
  ⚠️ **CRITICAL CONSTRAINT — scope.** Each slice writes only the code required to make its bet-progress tests green and satisfy the API and data design in `docs/bets/<bet-slug>/technical-design/`. Stay within the milestones and slices in the decomposition tree. No large refactors, no touching unrelated subsystems. If reality contradicts the locked design, follow Change Navigation below.
16
18
 
@@ -20,11 +22,37 @@ This workflow operates under the protocols defined in `.groundwork/skills/operat
20
22
 
21
23
  Subagent dispatch follows Protocol 9's mechanics throughout this phase — the slice-worker and every review lens run as isolated subagents (the `Task` tool in Claude Code), and only their reports flow back. A host with no subagent mechanism cannot run this phase as designed; surface that to the user before starting rather than collapsing the workers into your own context.
22
24
 
25
+ ## Git workflow: a branch per bet, a commit per slice
26
+
27
+ Delivery's unit of git isolation is the **bet**, not the slice. The bet rides one short-lived branch — `bet/<bet-slug>`, the line of history the approved decomposition commit already sits on — worked inside one **worktree** isolated from `main` and from any other bet in flight. Every slice-worker for this bet operates in that one worktree, in order; you commit each slice onto the branch as you close it. The branch merges to trunk once, at bet close (Validation, `05-validation.md`).
28
+
29
+ Slices are sequential by construction — a slice reads the previous slice's delivery commit, and a slice wires onto a prior slice's contract already proven green — so there is no parallelism to win by giving each slice its own worktree, and real hazard in trying: worktrees share one object store, and a second writer racing `.git/index.lock` or silently falling back to the main checkout is how agent runs lose work undetected. Keep parallelism where it belongs — across bets (each its own worktree and branch) and across the review lenses (read-only; they read the diff and need no worktree). The worker-leaves-it-unstaged, driver-reviews-then-commits handoff (Slice Loop below) works *because* worker and driver share this one serial worktree: the worker hands you a working-tree diff, not a branch to merge.
30
+
31
+ **Open the isolation before the red board.** The worktree and branch must exist before Step 0.5 commits the red board into them. If the bet is not already on its own branch and worktree from its earlier phases, open them now: branch `bet/<bet-slug>` from a clean `origin/<trunk>`, in a worktree under a gitignored path (the host's convention — e.g. `.worktrees/<bet-slug>/`). One branch lives in one worktree — never check the same branch out twice.
32
+
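The opening step above can be sketched as a short command plan. This is a minimal sketch, not part of the package: the helper name `open_bet_worktree_commands`, the default trunk name `main`, and the `.worktrees/` root are illustrative assumptions (the text leaves the path to the host's convention). The git subcommands themselves are standard.

```python
from typing import List


def open_bet_worktree_commands(
    bet_slug: str, trunk: str = "main", worktrees_root: str = ".worktrees"
) -> List[List[str]]:
    """Build the git commands that open a bet's branch and worktree.

    Hypothetical helper: it only returns argv lists and runs nothing.
    """
    branch = f"bet/{bet_slug}"
    path = f"{worktrees_root}/{bet_slug}"
    return [
        # Refresh the remote-tracking ref so the branch starts from current trunk.
        ["git", "fetch", "origin", trunk],
        # Create the branch and check it out in its own worktree in one step.
        ["git", "worktree", "add", "-b", branch, path, f"origin/{trunk}"],
    ]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` from the main checkout. Git itself refuses to check out a branch that is already checked out in another worktree, which backs up the one-branch-one-worktree rule.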
33
+ **Bootstrap the worktree once, before the first worker.** A fresh worktree shares the object store but not the working tree — dependencies, gitignored env and secrets, and submodule contents are all absent until you put them there. Before dispatching slice one: install dependencies, copy in the gitignored env/secret files the services need, run `git submodule update --init --recursive` if the project uses submodules, assign isolated ports / a scratch database if the bet boots services, and **build the code map for the worktree** (`npx groundwork-method repo-map` — deterministic, incremental, no network) so each worker's Step-1 orientation has a current map to read. This is the working-directory contract every worker capsule then points at (Slice Loop §1). One caveat on code intelligence in a worktree: Serena is registered in the tracked `.mcp.json` with `--project .`, which resolves to the *session* root, not the worktree path — so treat Serena as best-effort here and rely on the freshly-built repo map plus the graceful-degradation contract (`.groundwork/skills/code-intelligence.md`) rather than assuming the symbol tools are live in the worktree.
34
+
35
+ **Slice = one commit on the branch; milestone = a checkpoint, not a merge.** You commit each slice as you close it (Slice Loop §4) — one Conventional Commit per slice, history preserved, never squashed: the per-slice commits are the record the next slice's capsule and Validation's retrospective both read. A milestone closing is a green, reviewed, postmortem'd checkpoint *on the branch* — nothing merges to trunk yet. Slices integrating onto the bet branch continuously is the integration that catches slice-to-slice breakage; merging to trunk is a separate, later event.
36
+
37
+ **Push the bet branch as you go — off-machine backup.** The bet's worktree lives on one machine; until something leaves it, a single disk failure loses the whole delivery. So as delivery proceeds, push the branch to its own remote — `git push -u origin bet/<bet-slug>` on the first push, plain `git push` after — so every closed slice is backed up off the machine. This is **backup, not integration**: an isolated `bet/<bet-slug>` branch publishes nothing into trunk or anyone's path, so it does not carry the user gate the bet-close trunk merge does — it is the safety net the per-slice commit discipline assumes, run routinely without asking. Push right after each slice's delivery commit (Slice Loop §4), or at a minimum at every milestone close. During delivery the branch only grows — the single rebase happens at bet close — so these are fast-forward pushes, never forced; the backup branch is torn down with the local branch once the bet merges to trunk (Validation Step 8.5). On a project with no remote this is a no-op — and keep the project's CI scoped to pull requests and trunk so a routine backup push does not fire the full pipeline on every slice.
38
+
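The push rule above reduces to a three-way choice, sketched here under stated assumptions: the function name and its `first_push` and `has_remote` flags are hypothetical, and the caller is assumed to know whether the upstream has already been set.

```python
from typing import List, Optional


def backup_push_command(
    bet_slug: str, first_push: bool, has_remote: bool = True
) -> Optional[List[str]]:
    """Pick the backup push for a just-closed slice (hypothetical helper)."""
    if not has_remote:
        # No remote configured: the backup push is a no-op.
        return None
    if first_push:
        # The first push sets the upstream for the bet branch.
        return ["git", "push", "-u", "origin", f"bet/{bet_slug}"]
    # Later pushes are plain fast-forwards; a force flag is never added.
    return ["git", "push"]
```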
39
+ **Bet close = the single merge to trunk.** When Validation is green and the bet is done, integrate it to trunk in one user-gated step (merging to a shared branch is a push-class action — the user's call, never the driver's alone): rebase `bet/<bet-slug>` onto current `origin/<trunk>` to absorb drift, fast-forward merge, then remove the worktree (`git worktree remove`, then `git worktree prune` — never `rm -rf`) and delete the branch. Trunk only ever receives a complete, validated bet, so there is nothing half-built on it and no feature flag is required to keep it releasable. (A team may still flag a risky rollout in its own code; the delivery contract does not mandate it.) An appetite-bounded bet is a *short-lived* branch — the days-long feature branch trunk-based development permits, not the weeks-long drift it warns against. The mechanical merge step runs at the end of Validation (`05-validation.md`, Step 8.5).
40
+
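The bet-close sequence above might be laid out as an ordered plan like the one below. It is a sketch under assumptions: the helper name, the `main` trunk default, and the "where to run it" labels are illustrative, and the plan assumes the local trunk checkout is already level with `origin/<trunk>` so the fast-forward can succeed. Publishing trunk afterwards is left out because the text makes that the user's call.

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (where to run it, argv)


def close_bet_steps(bet_slug: str, worktree_path: str, trunk: str = "main") -> List[Step]:
    """Ordered git steps for the single merge to trunk (hypothetical helper)."""
    branch = f"bet/{bet_slug}"
    return [
        # Absorb trunk drift on the bet branch first.
        ("bet worktree", ["git", "fetch", "origin", trunk]),
        ("bet worktree", ["git", "rebase", f"origin/{trunk}"]),
        # Fast-forward only: fails loudly if a real merge would be needed.
        ("trunk checkout", ["git", "merge", "--ff-only", branch]),
        # Tear down with git's own commands, never by deleting the directory.
        ("trunk checkout", ["git", "worktree", "remove", worktree_path]),
        ("trunk checkout", ["git", "worktree", "prune"]),
        ("trunk checkout", ["git", "branch", "-d", branch]),
    ]
```

Keeping the plan as data lets the driver show the whole sequence to the user for approval before anything runs.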
41
+ ### Recording a cross-service slice, by repository topology
42
+
43
+ A slice can span more than one service; how it commits depends on how the project's repositories are laid out. The scaffold produces a **monorepo**, and that is the path the rest of this workflow assumes; the other two topologies are supported with the deltas below.
44
+
45
+ - **Monorepo** *(scaffold default — friendliest)*. All services and the bet's artifacts (`docs/bets/`, `tests/bets/`) live at one workspace root under one branch. A cross-service slice is **one atomic commit** spanning the service directories — the slice *is* the unit of record, exactly as Slice Loop §4 writes it. The "derive the client from the producer's canonical contract" rule is satisfied in-tree: the consumer's generated client and the producer's contract land in the same commit. A backwards-incompatible cross-service change is not one mega-commit — slice it expand → migrate → contract (add the new shape beside the old, migrate each consumer, then remove the old) so each step is independently green and revertible.
46
+
47
+ - **Submodules** *(if the project nests each service as a submodule of a superrepo)*. The worktree is the superrepo's; each affected service is a submodule. For each service the slice touches: the worker edits inside the submodule **on a real branch — never a detached HEAD**, or the commit is silently lost on the next submodule update; you commit there and **push the submodule first**, then record a **gitlink-bump commit in the superrepo** — that superrepo commit, referencing the child SHAs, is the slice of record. The worktree bootstrap must have run `git submodule update --init --recursive`. This topology costs a two-step, two-repo commit per service and is hostile to automated workers; prefer the monorepo unless the project is already committed to submodules.
48
+
49
+ - **Polyrepo** *(if each service is its own repository)*. There is no single commit that spans the slice. Carry a **change-set id** across the N repositories' branches, record the slice as a **manifest binding the N commits** in the bet's home repo (where `docs/bets/<bet-slug>/` lives), and gate integration on producer-before-consumer ordering — or backward-compatibility plus a contract check (e.g. Pact `can-i-deploy`, `buf breaking`) — rather than a shared green build. This is the most expensive topology for the per-slice-commit model; reach for it only when the services genuinely must ship from separate repos.
50
+
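For the polyrepo case, the "manifest binding the N commits" could take a shape like the following. The text does not define a manifest format, so every field name here (`change_set_id`, `slice`, `repo`, `sha`, `role`) is an assumption chosen for illustration. Listing producers first is one way to make the producer-before-consumer ordering visible in the record.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class RepoCommit:
    repo: str  # repository name (hypothetical field)
    sha: str   # commit recorded for this slice in that repository
    role: str  # "producer" or "consumer" of the changed contract


def build_manifest(change_set_id: str, slice_name: str, commits: List[RepoCommit]) -> str:
    """Serialize one slice's cross-repo record, producers listed first."""
    for c in commits:
        if c.role not in ("producer", "consumer"):
            raise ValueError(f"unknown role for {c.repo}: {c.role}")
    # sorted() is stable, so repos keep their given order within each role.
    ordered = sorted(commits, key=lambda c: c.role != "producer")
    return json.dumps(
        {
            "change_set_id": change_set_id,
            "slice": slice_name,
            "commits": [asdict(c) for c in ordered],
        },
        indent=2,
    )
```

The resulting JSON would be committed in the bet's home repo beside `docs/bets/<bet-slug>/`, so that one commit there stands in for the slice.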
23
51
  ## Step 0: Implementation Readiness Gate
24
52
 
25
- Before any slice work, verify the bet is actually executable. Load `.groundwork/skills/groundwork-review/checklists/implementation-readiness.md` and check every item against the bet's artifacts — the document chain, the API and data design, the approval tag, and currency. If the checklist file is absent, stop and report it — the install is broken and `npx groundwork-method update` restores it; do not improvise the gate from memory. These are mechanical existence and consistency checks; run them inline (no review subagent — the artifacts were already authorship-gated when their phases committed them, and there is nothing here to be biased about).
53
+ Before any slice work, verify the bet is actually executable. Load `.groundwork/skills/groundwork-review/checklists/implementation-readiness.md` and check every item against the bet's artifacts — the document chain, the API and data design, the approved decomposition commit, and currency. If the checklist file is absent, stop and report it — the install is broken and `npx groundwork-method update` restores it; do not improvise the gate from memory. These are mechanical existence and consistency checks; run them inline (no review subagent — the artifacts were already authorship-gated when their phases committed them, and there is nothing here to be biased about).
26
54
 
27
- The gate is fail-closed: any 🔴 item blocks delivery. Report each failed item by name with what is missing, route back to the owning phase (a missing interface design → Design Foundations; an absent approval tag or incomplete decomposition tree → Decomposition; an unreconciled discovery note → resolve it with the user now), and do not begin implementation until the item passes. 🟡 items are surfaced to the user with your read on whether they touch this bet; the user decides whether to proceed.
55
+ The gate is fail-closed: any 🔴 item blocks delivery. Report each failed item by name with what is missing, route back to the owning phase (a missing interface design → Design Foundations; an unapproved or incomplete decomposition tree → Decomposition; an unreconciled discovery note → resolve it with the user now), and do not begin implementation until the item passes. 🟡 items are surfaced to the user with your read on whether they touch this bet; the user decides whether to proceed.
28
56
 
29
57
  When every 🔴 item passes, state so in one line, update `docs/bets/<bet-slug>/pitch.md` frontmatter to `status: delivery`, and inform the user you are entering Developer Mode.
30
58
 
@@ -39,7 +67,7 @@ tests/bets/<bet-slug>/test_milestone_<N>_<milestone-slug>.<ext>
39
67
  tests/bets/<bet-slug>/test_slice_<N>_<service>_<slice-slug>.<ext>
40
68
  ```
41
69
 
42
- Consult `.groundwork/skills/groundwork-bet/templates/bet-progress-test.md` for the placeholder pattern and quality criteria. Run the suite once and confirm **every stub is red** — red because the implementation does not exist, not because of an import or fixture error. That red board is the bet's live progress display: `./dev bet status` reads it, and red→green is "see how far we've come." Commit the red board (e.g. `bet(<bet-slug>): materialize red board`) before opening the first slice — it is the build artifact the slice loop fills in, generated *from* the sealed prose and free to change, never the tamper target.
70
+ Consult `.groundwork/skills/groundwork-bet/templates/bet-progress-test.md` for the placeholder pattern and quality criteria. Run the suite once and confirm **every stub is red** — red because the implementation does not exist, not because of an import or fixture error. That red board is the bet's live progress display: `./dev bet status` reads it, and red→green is "see how far we've come." Commit the red board (e.g. `bet(<bet-slug>): materialize red board`) before opening the first slice — it is the build artifact the slice loop fills in, generated *from* the approved prose and free to change.
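The "red for the right reason" check can be sketched as a filter over test outcomes. This is an illustration only: the function name is hypothetical, and the outcome labels (`failed`, `passed`, `error`) are assumed to mirror the categories a runner such as pytest reports, where an assertion failure is `failed` and a broken import or fixture surfaces as `error`.

```python
from typing import Dict, List


def red_board_problems(outcomes: Dict[str, str]) -> List[str]:
    """Return the stubs that are not honestly red (hypothetical helper).

    `outcomes` maps a test id to its reported outcome label.
    """
    problems = []
    for test_id, outcome in sorted(outcomes.items()):
        if outcome == "failed":
            continue  # red because the behaviour is missing: what we want
        if outcome == "passed":
            problems.append(f"{test_id}: green before any implementation exists")
        else:
            # Import and fixture breakage land here, not in "failed".
            problems.append(f"{test_id}: {outcome}, not an assertion failure")
    return problems
```

An empty list means the board is safe to commit. Anything else is fixed in the stubs before the first slice opens.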
43
71
 
44
72
  The scaffold and the `./dev` CLI are a starting point you keep shaping as the product grows. When a repeated delivery task earns it, or shipped tooling does not fit the work, adapt the tooling rather than scripting around it — add a project command under `.dev/commands/`, register a runner, or extend the relevant scaffold. Never leave a shipped command inert and never build a parallel tool beside it (the *no empty capabilities* rule, `docs/principles/delivery/day-2-operational-baseline.md`).
45
73
 
@@ -61,19 +89,19 @@ State the chosen mode back in one line, then begin the milestone loop. The choic
61
89
 
62
90
  ## The Milestone Loop
63
91
 
64
- Work through the milestone ladder in order. For each milestone: if its slices are not yet authored (every milestone after the first), **open it** — author its slices and ratchet the seal (see *Opening a milestone* below) — then drive its slices to green (the Slice Loop), close the milestone, and run the milestone postmortem before moving to the next. The first milestone's slices were authored and sealed at decomposition, so it opens straight into the Slice Loop. A fresh context resumes by reading the board (`./dev bet status` renders red/green per milestone and slice from the suite) and the git log of delivery commits, not a manifest — the first red slice is where to pick up; a milestone whose headline stub is red but which has no slice files yet is the next one to open.
92
+ Work through the milestone ladder in order. For each milestone: if its slices are not yet authored (every milestone after the first), **open it** — author and record its slices (see *Opening a milestone* below) — then drive its slices to green (the Slice Loop), close the milestone, and run the milestone postmortem before moving to the next. The first milestone's slices were authored and approved at decomposition, so it opens straight into the Slice Loop. A fresh context resumes by reading the board (`./dev bet status` renders red/green per milestone and slice from the suite) and the git log of delivery commits, not a manifest — the first red slice is where to pick up; a milestone whose headline stub is red but which has no slice files yet is the next one to open.
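The resume rule in the paragraph above can be sketched as a walk over the board in ladder order. The data classes and the `resume_point` name are hypothetical, since the text does not describe how `./dev bet status` represents the board. The `close` branch is an assumption added to cover a case the paragraph does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SliceStatus:
    name: str
    green: bool


@dataclass
class MilestoneStatus:
    number: int
    headline_green: bool
    slices: List[SliceStatus] = field(default_factory=list)


def resume_point(board: List[MilestoneStatus]) -> Tuple[str, Optional[int], Optional[str]]:
    """Return (action, milestone number, slice name) for a fresh context."""
    for m in board:  # milestones in ladder order
        for s in m.slices:
            if not s.green:
                # The first red slice is where to pick up.
                return ("deliver", m.number, s.name)
        if not m.headline_green:
            if not m.slices:
                # Red headline stub and no slice files: the next one to open.
                return ("open", m.number, None)
            # Assumption beyond the text: slices all green but headline
            # still red means the milestone is waiting to be closed.
            return ("close", m.number, None)
    return ("done", None, None)
```

For example, a board whose first milestone is fully green and whose second has a red headline and no slices resolves to `("open", 2, None)`.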
65
93
 
66
- When the decomposition types its slices (`surface: core` or a surface slug registry projects), core slices merge before the surface slices that consume them. A surface slice wires a contract that must already be proven green, not one being built beneath it in parallel — the milestone order already encodes this (the capability milestone opens the bet); hold it at slice granularity too. The slice-worker capsule for a surface slice includes the capability milestone's green test file — the contract proof the slice builds on.
94
+ Slices run in sequence, each built on the proven state of the one before it. A slice that wires onto a contract another slice established consumes one already proven green, not one being built beneath it in parallel — the slice order encodes this. When a slice builds on a prior slice's proven contract, the slice-worker capsule includes that prior slice's green test file — the proof it builds on.
67
95
 
68
96
  ### Opening a milestone — authoring the next rung
69
97
 
70
- Every milestone after the first is *unsliced* until Delivery reaches it: decomposition sealed its headline proof in the ladder, not its slices. Opening it is where those slices are authored — and the reason they were deferred is that they are now written from what the milestones before them *actually taught*, not from an up-front guess. This is *plan just enough* in motion: the rung you are about to climb gets detailed using ground truth.
98
+ Every milestone after the first is *unsliced* until Delivery reaches it: decomposition approved its headline proof in the ladder, not its slices. Opening it is where those slices are authored — and the reason they were deferred is that they are now written from what the milestones before them *actually taught*, not from an up-front guess. This is *plan just enough* in motion: the rung you are about to climb gets detailed using ground truth.
71
99
 
72
- For milestone 1 there is nothing to open — its slices were authored and sealed at decomposition; roll straight into the Slice Loop. For every later milestone, open it at the end of the previous milestone's postmortem (the postmortem is the look-up; this is the act it produces):
100
+ For milestone 1 there is nothing to open — its slices were authored and approved at decomposition; roll straight into the Slice Loop. For every later milestone, open it at the end of the previous milestone's postmortem (the postmortem is the look-up; this is the act it produces):
73
101
 
74
- 1. **Author the milestone's slices** following Decomposition Steps 4–5 (`workflows/03-decomposition.md`) — vertical slices, falsifiable Required Capabilities tracing to the design, a headline Proof of work per slice, all consistent with the milestone's sealed headline proof. Apply what the delivered milestones taught: a slice the design foresaw may now be redundant, a boundary may now need a slice the design missed.
102
+ 1. **Author the milestone's slices** following Decomposition Steps 4–5 (`workflows/03-decomposition.md`) — vertical slices, falsifiable Required Capabilities tracing to the design, a headline Proof of work per slice, all consistent with the milestone's approved headline proof. Apply what the delivered milestones taught: a slice the design foresaw may now be redundant, a boundary may now need a slice the design missed.
75
103
  2. **Review them** — run the Decomposition Gate scoped to this milestone, then the Protocol 9 decomposition review on the new slice files (fail-closed, exactly as Decomposition Step 6). Revise to a clean verdict.
76
- 3. **Ratchet the seal** — on the user's approval (the postmortem already pauses for them in slice and milestone modes), commit the new slice files and advance the tag: `git tag -f bet/<bet-slug>/approved`, message `bet(<bet-slug>): author milestone <N>`. The ratchet is additive — it adds this rung's slices and never reopens a sealed proof.
104
+ 3. **Record the authored slices** — on the user's approval (the postmortem already pauses for them in slice and milestone modes), commit the new slice files (`bet(<bet-slug>): author milestone <N>`). This is additive authoring, recorded in history — it adds this rung's slices and changes no existing proof.
77
105
  4. **Materialize this milestone's slice stubs** (Step 0.5's procedure, scoped to the new slices) and commit the extended red board before the Slice Loop opens its first slice.
78
106
 
79
107
  If opening the milestone reveals the *headline proof itself* is now wrong — not just its slices — that is not authoring: route it through the Amendment Protocol or Change Navigation below.
@@ -83,10 +111,10 @@ If opening the milestone reveals the *headline proof itself* is now wrong — no
83
111
  The ladder is fluid: a postmortem can reveal that a milestone is *missing* — a demonstrable state the bet needs that the up-front ladder did not foresee. Introducing a new rung is a supported, first-class move, not a process failure. Because downstream milestones are unsliced, inserting or re-ordering a rung is cheap — there are no authored slices to unwind.
84
112
 
85
113
  1. **Appetite check first.** Confirm the new rung fits the bet's **appetite** and is derivable from the locked design. If it would *exceed* the appetite, or needs capability the technical design never covered, stop — that is Change Navigation (re-scope the appetite, or carve the work to a future bet / no-go), not a ladder amendment. Never grow the ladder silently to absorb scope the bet did not bet on. The "2–5 milestones; more is a roadmap" rule (`workflows/03-decomposition.md`) still bounds the grown ladder.
86
- 2. **Author the new milestone's `index.md`** with an **un-mockable headline proof**, placed and numbered at the right rung (re-number unopened downstream milestone folders as needed — cheap, they are unsliced).
114
+ 2. **Author the new milestone's `index.md`** with a **front-door headline proof** (a real outcome a named consumer observes, driven through the real product), placed and numbered at the right rung (re-number unopened downstream milestone folders as needed — cheap, they are unsliced).
87
115
  3. **Review it** — the Decomposition Gate scoped to the new milestone, then the Protocol 9 decomposition review (fail-closed). Revise to a clean verdict.
88
116
  4. **User approval is a hard stop** — adding a success-signal rung changes the definition of done, so it is the user's call, surfaced at the postmortem.
89
- 5. **Ratchet the seal** additively (`git tag -f bet/<bet-slug>/approved`, message `bet(<bet-slug>): add milestone <N>`) and **materialize the new milestone's headline stub**. Its slices are authored when Delivery reaches it, via *Opening a milestone* above. The new rung never reopens a sealed proof.
117
+ 5. **Record the new rung** additively (commit the new `index.md`, `bet(<bet-slug>): add milestone <N>`) and **materialize the new milestone's headline stub**. Its slices are authored when Delivery reaches it, via *Opening a milestone* above. Adding a rung changes no existing proof.
90
118
 
91
119
  ### The Slice Loop — the driver's per-slice sequence
92
120
 
@@ -94,32 +122,38 @@ For each slice in the milestone, in order:
94
122
 
95
123
  #### 1. Dispatch the slice-worker
96
124
 
97
- Assemble the context capsule and dispatch a fresh slice-worker subagent (Protocol 9 mechanics — an isolated subagent loading `.groundwork/skills/groundwork-bet/briefs/slice-worker.md`). Pass it:
125
+ Assemble the context capsule and dispatch a fresh slice-worker subagent (Protocol 9 mechanics — an isolated subagent loading `.groundwork/skills/groundwork-bet/briefs/slice-worker.md`). The capsule is **pointers and slice-specifics, not a paraphrase of the brief** — the worker reads the brief for the process (build the real thing, self-reconcile, do not commit, the report format); restating that here only bloats the capsule and drifts from the brief when it changes. Pass:
98
126
 
99
127
  - `bet_slug` and the slice's `slice_file` path under `docs/bets/<bet-slug>/decomposition/`.
128
+ - The **working directory & isolation contract** — the bet's worktree path (the bootstrapped one from the Git workflow above), the instruction to run every command from it, leave all changes **unstaged**, and **not** re-isolate (no new worktree, branch, or `EnterWorktree`). The worker builds in the worktree you prepared; it does not manage its own.
100
129
  - The **previous slice's delivery commit** — hash, message, and the instruction to read the diff. Its established patterns, eaten review findings, and `Notes:` line are how this slice repeats lessons instead of mistakes.
101
130
  - The **exact existing files this slice modifies**, named, to read in full.
102
- - For a **surface** slice, the **capability milestone's green test file** — the contract it wires onto.
131
+ - When the slice **builds on a prior slice's proven contract**, that slice's **green test file** — the proof it wires onto.
103
132
  - The slice's materialized red `Test file:` path(s).
133
+ - The **stack's testing strategy** — the promoted engineer skill for the slice's stack (`.agents/skills/groundwork-<stack>-engineer/references/testing.md`), whose **Bet Slice Rollout** section defines the permanent best-practice tests this slice owes. The worker rolls out against it and the coverage-auditor lens reviews against it, so name the right stack when a slice spans more than one.
134
+ - Any **slice-specific constraints** the brief cannot know — a frozen signature it must not change, a subsystem it must not touch, a safety or content guardrail, the fixtures it must prove on. State them as hard constraints, not suggestions.
135
+ - Any **prior spike or proven recipe** that de-risks this slice — a working invocation, a validated config, a dependency already on disk. If it lives in an ephemeral location (a job-temp or scratch path), instruct the worker to relocate what it needs into a durable path in the repo or the project cache **and never depend on the ephemeral path at runtime**.
136
+ - If the slice should leave specific facts for the next slice, name them so the worker's `NOTES:` captures them (a transport chosen, a model id, a durable location, a command to rebuild an artifact).
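The capsule items listed above can be pictured as a small data structure. This is a minimal sketch for illustration, not the package's implementation: the `ContextCapsule` class, its field names, and the `missing_required` helper are hypothetical, and the choice of which fields count as always required is an assumption (a first slice, for example, has no previous delivery commit).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextCapsule:
    """Hypothetical capsule shape: pointers and slice-specifics, not a paraphrase of the brief."""
    bet_slug: str
    slice_file: str                # path under docs/bets/<bet-slug>/decomposition/
    worktree_path: str             # the prepared worktree; the worker must not re-isolate
    red_test_files: List[str]      # the slice's materialized red `Test file:` path(s)
    testing_strategy: str          # the stack's references/testing.md
    files_to_read: List[str] = field(default_factory=list)
    previous_commit: Optional[str] = None    # previous slice's delivery commit hash
    prior_green_test: Optional[str] = None   # only when building on a proven contract
    hard_constraints: List[str] = field(default_factory=list)
    facts_for_next_slice: List[str] = field(default_factory=list)


def missing_required(capsule: ContextCapsule) -> List[str]:
    """Return the names of the (assumed) always-required fields that are empty."""
    required = ["bet_slug", "slice_file", "worktree_path",
                "red_test_files", "testing_strategy"]
    return [name for name in required if not getattr(capsule, name)]
```

A driver could call `missing_required` before dispatch and refuse to send a capsule that lacks a worktree path or a red test file.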
104
137
 
105
- The worker implements to green inside the locked design, runs its mechanical self-reconcile, and returns a short report (files touched, `NOTES:`, self-reconcile result, and any `BLOCKING CONCERN`). It does not commit. Keeping the capsule tight is what keeps the worker bounded: it reads what it needs to change and the contract it builds on, not the whole bet.
138
+ The worker implements to green inside the locked design, rolls out the slice's permanent best-practice tests per its stack's testing strategy, runs its mechanical self-reconcile, and returns a short report (files touched, `COVERAGE:`, `NOTES:`, self-reconcile result, and any `BLOCKING CONCERN`). It does not commit. Keeping the capsule tight is what keeps the worker bounded: it reads what it needs to change and the contract it builds on, not the whole bet.
106
139
 
107
140
  **Act on a `BLOCKING CONCERN` before reviewing.** A worker that reports an approved proof looks wrong, that reality contradicts the locked design, or that a real dependency the proof names cannot be reached has hit a hard stop — route it through the Amendment Protocol or Change Navigation below (and pause for the user) before any further slice work. Do not let a worker that could not honestly reach green be reviewed as if it did.
108
141
 
109
142
  #### 2. Review the slice
110
143
 
111
- A worker's green report is the author's account of its own work; it is not the gate. Review the slice's uncommitted diff before closing it — and note that the test files are *built* this phase and are *supposed* to change; what is sealed is the prose.
144
+ A worker's green report is the author's account of its own work; it is not the gate. Review the slice's uncommitted diff before closing it — and note that the test files are *built* this phase and are *supposed* to change; what is fixed is the approved prose.
112
145
 
113
146
  **First, reconcile against the approved prose (mechanical — run it yourself, no subagent).** The worker's self-reconcile is a first pass; confirm it.
114
147
 
115
- - **Prose integrity.** The approved contract is the decomposition tree and the technical design, sealed at the `bet/<bet-slug>/approved` tag. Confirm it has not silently moved: `git diff bet/<bet-slug>/approved.. -- docs/bets/<bet-slug>/decomposition/ docs/bets/<bet-slug>/technical-design/` shows no change except an approved amendment (the Amendment Protocol's commit trail). A Proof-of-work proof, an acceptance criterion, or an API shape changed without that trail — a weakened proof, a dropped case, a loosened shape — is a `decision-needed` finding. (Most slices change no prose at all; then this is a one-line no-op.) The built test must also still honestly prove what its slice's Proof-of-work prose describes — a stub filled in to assert less than the prose promised is the same finding.
116
- - **Honest green.** The implementation must satisfy the proof for the right reason. A return value hardcoded to the test's expected output, an input special-cased to the fixture, an `if TEST_MODE`-style branch, or a mocked-out unit of real work is a `decision-needed` finding even though the suite is green — *a weak suite that generated code passes is worse than no suite* (`docs/principles/foundations/testing.md`). A worker's `SELF-RECONCILE` flag here is a lead to confirm, not a verdict to trust.
148
+ - **Prose integrity.** The approved contract is the decomposition tree and the technical design. Confirm it has not silently moved: `git log --oneline -- docs/bets/<bet-slug>/decomposition/ docs/bets/<bet-slug>/technical-design/` since the approval commit shows only recorded amendments (each an Amendment commit carrying a reason). A Proof-of-work proof, an acceptance criterion, or an API shape changed without that recorded amendment — a weakened proof, a dropped case, a loosened shape — is a `decision-needed` finding. (Most slices change no prose at all; then this is a one-line no-op.) The built test must also still honestly prove what its slice's Proof-of-work prose describes — a stub filled in to assert less than the prose promised is the same finding.
149
+ - **Honest green.** The implementation must satisfy the proof for the right reason, against the real product. A return value hardcoded to the test's expected output, an input special-cased to the fixture, an `if TEST_MODE`-style branch, a mocked-out unit of real work, or a fixture standing in for a real pipeline stage that nothing else produces is a `decision-needed` finding even though the suite is green — *a weak suite that generated code passes is worse than no suite* (`docs/principles/foundations/testing.md`). Two extensions make this bite at delivery: **a fake needs a real test behind it** — if this slice leans on a fixture or stub for work a real stage should do, confirm some test exercises the real producer, or the fixture is a green light wired to nothing; and **the proof runs against the shipping build** — the artifact a user actually launches (the packaged app, the embedded worker), not a test target that runs code the shipping build never includes. A worker's `SELF-RECONCILE` flag here is a lead to confirm, not a verdict to trust.
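The prose-integrity check above can be sketched as a filter over commit subjects. This is an illustration under stated assumptions: the subjects are taken to come from something like `git log --format=%s` limited to the decomposition and technical-design paths since the approval commit, the `RECORDED` pattern is a guess built from the commit messages this workflow names, and treating `author` and `add` commits as recorded alongside `amend` is an assumption, not a documented rule.

```python
import re
from typing import List

# Assumed subject shapes: amendment, milestone authoring, and new-rung commits.
RECORDED = re.compile(r"^bet\([^)]+\): (amend|author|add) milestone \S+")


def unrecorded_prose_changes(subjects: List[str]) -> List[str]:
    """Return commit subjects that touched approved prose without a recorded trail."""
    return [s for s in subjects if not RECORDED.match(s)]
```

Anything this returns would be a candidate `decision-needed` finding for the driver to inspect by hand. An empty result corresponds to the no-op case where the slice changed no prose.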
117
150
 
118
- **Then dispatch the slice diff for review** through three parallel, independent lenses (Protocol 9 mechanics — isolated subagents; the three lenses catch different failure classes and none substitutes for another; none is the slice-worker, which authored the diff and cannot judge it):
151
+ **Then dispatch the slice diff for review** through four parallel, independent lenses (Protocol 9 mechanics — isolated subagents, each loading its brief under `.groundwork/skills/groundwork-bet/briefs/`; the four lenses catch different failure classes and none substitutes for another; none is the slice-worker, which authored the diff and cannot judge it):
119
152
 
120
- - **Blind reviewer** — receives only the diff, no bet context. Familiarity hides bugs; this lens has none.
121
- - **Edge-case tracer** — receives the diff plus repo read access. Walks every branch and boundary the diff introduces and reports only unhandled paths: null/empty inputs, failure timing, races, off-by-ones.
122
- - **Acceptance auditor** — receives the diff, the slice's Required Capabilities, and the prose API/data design (`technical-design/03-api-design.md`, `04-data-design.md`). Verifies the implementation does what the design says and nothing more — and that it does so honestly: the service's generated contract matches the prose shapes; undeclared endpoints, fields beyond the design, silently skipped error cases, and implementation gamed to the test (hardcoded returns, special-cased inputs, test-only branches) are findings even when tests pass.
153
+ - **Blind reviewer** (`briefs/blind-reviewer.md`) — receives only the diff, no bet context. Familiarity hides bugs; this lens has none.
154
+ - **Edge-case tracer** (`briefs/edge-case-tracer.md`) — receives the diff plus repo read access. Walks every branch and boundary the diff introduces and reports only unhandled paths: null/empty inputs, failure timing, races, off-by-ones.
155
+ - **Acceptance auditor** (`briefs/acceptance-auditor.md`) — receives the diff, the slice's Required Capabilities, and the prose API/data design (`technical-design/03-api-design.md`, `04-data-design.md`). Verifies the implementation does what the design says and nothing more — and that it does so honestly: the service's generated contract matches the prose shapes; undeclared endpoints, fields beyond the design, silently skipped error cases, and implementation gamed to the test (hardcoded returns, special-cased inputs, test-only branches) are findings even when tests pass.
156
+ - **Coverage auditor** (`briefs/coverage-auditor.md`) — receives the diff, the slice's Required Capabilities, and the stack's testing strategy (`.agents/skills/groundwork-<stack>-engineer/references/testing.md`). Judges the permanent best-practice tests the worker rolled out — they are in this diff now, so they are reviewable here — against that strategy: are the error and boundary cases covered to the rigour of the happy path, does the complex logic the slice introduced carry a unit test, did an observable path get its critical-path trace assertion, and for a `graphical-ui` slice are the named component states present. A sociable test that executes a branch without asserting on it is a gap even on a green board, and a surviving mutant on changed high-risk code is the evidence to cite. This is the lens that makes comprehensive coverage a reviewed obligation rather than a hope — the seam the honest-green check and the acceptance auditor both leave open.
123
157
 
124
158
  **Triage every finding** into exactly one bucket, deduplicating across lenses and the reconciliation:
125
159
 
@@ -132,50 +166,49 @@ A worker's green report is the author's account of its own work; it is not the g
132
166
 
133
167
  Apply `patch` fixes yourself when small and bounded, or re-dispatch a worker for a larger one. A slice closes only with zero open decision-needed and patch findings.
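The closing rule can be stated as a one-line predicate. A minimal sketch, assuming findings are tracked as `(bucket, resolved)` pairs; that representation and the `note` bucket used in the usage line are hypothetical, and only `decision-needed` and `patch` are named by the rule itself.

```python
from typing import Iterable, Tuple

# The two buckets the closing rule names as blocking.
BLOCKING_BUCKETS = {"decision-needed", "patch"}


def slice_can_close(findings: Iterable[Tuple[str, bool]]) -> bool:
    """True only when no decision-needed or patch finding is still open."""
    return not any(
        bucket in BLOCKING_BUCKETS and not resolved
        for bucket, resolved in findings
    )
```

For example, `slice_can_close([("patch", True), ("note", False)])` is true, while a single open `decision-needed` finding keeps the slice open.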
134
168
 
135
- #### 3. Roll out permanent tests
136
-
137
- Once the slice's bet-progress tests are green and the review is clear, roll out that slice's **permanent best-practice tests** — interface tests, HTTP API system tests, honeycomb service-perimeter tests, and unit tests for complex logic, per the project's testing strategy. For a `graphical-ui` slice, this includes **component render tests** across the states the design system names — default, loading, empty, error, long-content — so a component that throws on a prop or state combination is caught in isolation before any page integrates it (the scaffold ships the pattern at `components/render-smoke.test.tsx`); and adding any new route to `tests/system/routes.json` so the permanent render-smoke, geometry, and a11y gates sweep it. These live in the service repos and `tests/system/`, not in `tests/bets/`, and stay in the codebase after the bet is archived.
169
+ #### 3. Record and close the slice
138
170
 
139
- #### 4. Record and close the slice
140
-
141
- Commit the slice — that commit **is** the record, and the driver writes it (the worker left the changes unstaged). Use a structured message: a `bet(<bet-slug>): slice <N.M> <slice-slug>` subject, then a body listing every file added, modified, or deleted, and a `Notes:` line — one or two sentences on what the next slice should know: a pattern established, a deviation taken and why, a struggle worth not repeating (carry the worker's `NOTES:` forward here). The commit is what makes the next slice's capsule and Validation's retrospective possible; an empty `Notes:` on a slice that fought us is a record that lies. The slice flips green on the board the moment its tests pass — no status field to maintain.
171
+ Commit the slice — that commit **is** the record, and the driver writes it (the worker left the changes unstaged). Use a structured message: a `bet(<bet-slug>): slice <N.M> <slice-slug>` subject, then a body listing every file added, modified, or deleted, and a `Notes:` line — one or two sentences on what the next slice should know: a pattern established, a deviation taken and why, a struggle worth not repeating (carry the worker's `NOTES:` forward here). The commit is what makes the next slice's capsule and Validation's retrospective possible; an empty `Notes:` on a slice that fought us is a record that lies. The slice flips green on the board the moment its tests pass — no status field to maintain. Then push the branch (`git push`) so the closed slice is backed up off the machine — backup, not integration (Git workflow above); skip it only on a project with no remote.
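The structured message described above might be assembled like this. It is a sketch, not the tool's own formatter: the function name, the `files` mapping, and the `added:` / `modified:` / `deleted:` body layout are assumptions; only the subject shape and the `Notes:` line come from the text.

```python
from typing import Dict, List


def slice_commit_message(bet_slug: str, slice_id: str, slice_slug: str,
                         files: Dict[str, List[str]], notes: str) -> str:
    """Build a slice commit message: subject, file list, then a Notes: line."""
    lines = [f"bet({bet_slug}): slice {slice_id} {slice_slug}", ""]
    # List every touched file, grouped by the kind of change.
    for change in ("added", "modified", "deleted"):
        for path in files.get(change, []):
            lines.append(f"{change}: {path}")
    lines += ["", f"Notes: {notes.strip()}"]
    return "\n".join(lines)
```

The result can be passed to `git commit -F -` on standard input, which keeps the multi-line body intact.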
142
172
 
143
173
  **In slice-by-slice mode, pause here** — show the user the closed slice (what it proved, what the review found, the commit) and confirm before dispatching the next worker. In milestone and whole-bet modes, continue to the next slice without pausing.
144
174
 
145
- ### Milestone close
175
+ ### Milestone close — prove it at the front door
176
+
177
+ A milestone is done when its **agreed front-door test cases pass against the real product** — the shipping build, on real data, in conditions as close to real as the environment allows — not when its slices are each individually green. Run the milestone's bet-progress tests (`test_milestone_<n>_*`); the milestone shows green on the board (`./dev bet status`) once its proof passes. But green at the suite is the floor, not the finish: closing the milestone is confirming the consumer's outcome actually holds at their surface.
146
178
 
147
- After all of a milestone's slices are delivered, run the milestone's bet-progress tests (`test_milestone_<n>_*`) to confirm the milestone's full demonstrable outcome. The milestone shows green on the board (`./dev bet status`) once its proof passes — the board is derived from the suite, so there is nothing to mark.
179
+ **Prove it in the consumer's medium.** A behavioural test asserting a selector exists passes while the rendered page is blank, throwing, unstyled, or showing an error-boundary fallback — the bug class assertion tests cannot see. For a milestone whose consumer is at a screen (`graphical-ui`), drive the *running* app and verify what they see; a `cli` or `agentic-protocol` milestone proves at its own front door (the command's real output, the response structure) and pays nothing for the pixel tiers below.
148
180
 
149
- **Visual verification — graphical surface milestones only.** A behavioural test asserting a selector exists passes while the rendered page is blank, throwing, unstyled, or showing an error-boundary fallback — the bug class assertion tests cannot see. Before a milestone that closed a `graphical-ui` surface is marked `delivered`, run the ladder against the *running* app; skip this entirely for `core`/`cli`/`agentic-protocol` milestones, which pay nothing.
181
+ 1. **Tier 1 — the deterministic floor is green.** The permanent `tests/system/test_render_smoke.py`, `test_a11y_smoke.py`, and `test_token_conformance.py` run as part of the suite: navigation returns 2xx/3xx, zero `error`-level console output, zero uncaught exceptions, no failed same-origin requests, no error overlay, a non-blank render across the viewport × theme matrix, the axe gate at the design system's accessibility baseline, and the specified atmosphere actually landed (surface treatments render with backdrop blur and multi-layer elevation, the projected tokens resolve, no degradation to a flat default). A red layer blocks the milestone — it is a real defect, not a flaky test. When the surface's platform has no check that can run, that is a fail-closed block, not a skip (the test tooling emits a failing placeholder naming the gap, never silently passes — Test tooling, area G of the change plan).
182
+ 2. **Tier 2 — confirm the build matches the micro-polish spec.** Read the screenshots Tier 1 captured (`.groundwork/cache/visual/_smoke/<surface>/<route>__<viewport>__<theme>.png`, plus any per-state captures written by interface tests to `.groundwork/cache/visual/<bet-slug>/<surface>/<state>.png`). Adopt the designer persona (`.groundwork/skills/groundwork-designer/SKILL.md`, reference `design-review.md`) and judge each screen against the **per-surface micro-polish spec** in `technical-design/01-ui-design.md` and the design system. The question is conformance to the written spec: did the specified surface treatment, motion, elevation, and type tokens land; do empty/loading/error states read as designed rather than as a failure; and — the dimensions Tier 1 cannot compute — is alignment optically correct, is the atmosphere restrained, does the composition read as considered? Surface what diverges from the spec; do not recite a fixed checklist. Record a one-line spec-conformance verdict per screen in the closing slice's commit message (a `Visual:` line) — a graphical milestone cannot close without it.
183
+ 3. **Tier 3 — the polish pass.** Now that there is a running milestone to look at, run a deliberate pass over what was actually delivered, against the design and the agreed cases. Ask two questions: what does the consumer still need that is missing — a screen that works but shows no progress, a view with no empty state, a flow with no way back — and what considered touches would make this genuinely good to use. Build those in. The boundary is concrete: elevate what *this milestone* delivers to match its own design and complete its own flows; a net-new capability is its own milestone (route it as a ladder amendment), and the no-gos hold. AI-assisted coding makes this cheap, so the bar is high — "it renders" is not the finish line; "it is a pleasure to use" is.
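The Tier 1 screenshot naming that Tier 2 reads from can be sketched as a pair of helpers. This is illustrative only: the function names are hypothetical, and the sketch assumes each of `route`, `viewport`, and `theme` is a flat token containing no `__` or path separator. How nested routes are actually encoded in the file name is not stated here.

```python
from pathlib import PurePosixPath

SMOKE_ROOT = PurePosixPath(".groundwork/cache/visual/_smoke")


def smoke_screenshot_path(surface: str, route: str, viewport: str, theme: str) -> PurePosixPath:
    """Build <root>/<surface>/<route>__<viewport>__<theme>.png."""
    return SMOKE_ROOT / surface / f"{route}__{viewport}__{theme}.png"


def parse_smoke_screenshot(path: str) -> dict:
    """Recover surface, route, viewport, and theme from a smoke screenshot path."""
    p = PurePosixPath(path)
    route, viewport, theme = p.stem.split("__")
    return {"surface": p.parent.name, "route": route,
            "viewport": viewport, "theme": theme}
```

Parsing the names back makes it straightforward to check that every cell of the viewport × theme matrix has a capture before the Tier 2 read begins.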
150
184
 
151
- 1. **Tier 1 — the deterministic floor is green.** The permanent `tests/system/test_render_smoke.py`, `test_a11y_smoke.py`, and `test_token_conformance.py` run as part of the suite: navigation returns 2xx/3xx, zero `error`-level console output, zero uncaught exceptions, no failed same-origin requests, no error overlay, a non-blank render across the viewport × theme matrix, the axe gate at the design system's accessibility baseline, and — the new layer — the specified atmosphere actually landed (surface treatments render with backdrop blur and multi-layer elevation, the projected tokens resolve, no degradation to a flat default). A red layer blocks the milestone — it is a real defect, not a flaky test.
152
- 2. **Tier 2 — confirm the build matches the micro-polish spec.** Read the screenshots Tier 1 captured (`.groundwork/cache/visual/_smoke/<surface>/<route>__<viewport>__<theme>.png`, plus any per-state captures written by interface tests to `.groundwork/cache/visual/<bet-slug>/<surface>/<state>.png`). Adopt the designer persona (`.groundwork/skills/groundwork-designer/SKILL.md`, reference `design-review.md`) and judge each screen against the **per-surface micro-polish spec** in `technical-design/01-ui-design.md` and the design system. The question is conformance to the written spec, not "is it as good as a leader": did the specified surface treatment, motion, elevation, and type tokens land; do empty/loading/error states read as designed rather than as a failure; and — the dimensions Tier 1 cannot compute — is alignment optically correct, is the atmosphere restrained, does the composition read as considered? Surface what diverges from the spec; do not recite a fixed checklist. Record a one-line spec-conformance verdict per screen in the closing slice's commit message (a `Visual:` line) — a graphical milestone cannot close without it.
185
+ **Then the experience-auditor reviews the milestone.** Dispatch the experience-auditor lens (`briefs/experience-auditor.md`, the designer persona) over the assembled, running milestone — distinct from the per-slice coverage review, because design fidelity and flow completeness need the whole surface, not one slice. It judges, against `01-ui-design.md`, the design system, and the design-phase reference apps: best-in-class patterns implemented in full, no dead-end flows, the named states present, design-system match, and the joy-to-use bar. Its findings triage like any other review — a dead-end flow or a design-system miss is `decision-needed` and blocks the milestone.
153
186
 
154
- A coherence defect the inspection spots is fixed in this same delivery phase, where it is cheapest. A finding genuinely deferred is logged as a discovery note or a `docs/maturity.md` row, never silently dropped. Tier 1 asserts the tokens landed; this Tier-2 judgement covers what computation cannot — optical alignment, restraint, and whether the whole reads as considered against the spec.
187
+ A coherence or experience defect the review spots is fixed in this same delivery phase, where it is cheapest. A finding genuinely deferred is logged as a discovery note or a `docs/maturity.md` row, never silently dropped. There is no "done for function now, polish later" split — a milestone tested only through fakes, or shipped as a bare shell, has not closed.
155
188
 
156
189
  ### Milestone postmortem & course-correction
157
190
 
158
191
  A green milestone is not a finished milestone. The board going green proves the suite passes; it does not prove the milestone proved *what it set out to prove*, and it does not ask whether what the milestone taught us should change the rest of the plan. The retrospective in Validation (Phase 5) is too late for that — by then the whole bet is built against assumptions a mid-bet milestone may already have disproved. This checkpoint is the proactive one: at every milestone boundary, before the next milestone opens, run a focused pass over four questions, then open the next rung. It is a facilitated conversation, not a ceremony — it is where course-correction happens, and where the next milestone is sliced from what this one taught, while it is still cheap.
159
192
 
160
- 1. **Did this milestone honestly prove its intent?** Read the milestone's Proof-of-work prose against what was actually built. The board is green — but is it green for the right reason? The failure this catches is the *quietly hollowed proof*: a milestone whose intent was to exercise a real dependency — a live model call, a real external service, an actual queue — delivered instead against a light mock or a stub, so the suite passes while proving nothing the milestone existed to prove. The honest-green check runs per slice, but a milestone-level intent can erode across slices in a way no single slice review sees. When you find it, that is a course-correction: the milestone did not prove its intent, and the fix is to work the real thing in and re-prove it now — not to roll forward and discover at validation that the bet never proved its core premise. Treat a deferred-to-mock-where-the-real-thing-was-meant as a finding, every time, even on a fully green board.
193
+ 1. **Did this milestone honestly prove its intent, at the front door?** Read the milestone's Proof-of-work prose against what was actually built. The board is green — but is it green for the right reason, driven through the real product? The failure this catches is the *quietly hollowed proof*: a milestone whose intent was a consumer observing a real outcome — a live model call surfacing on screen, a real external service, an actual queue — delivered instead against a light mock, a stub, a scripted driver, or a fixture nothing real produces, so the suite passes while the consumer's outcome was never proven. The honest-green check runs per slice, but a milestone-level intent can erode across slices in a way no single slice review sees. When you find it, that is a course-correction: the milestone did not prove its intent, and the fix is to work the real thing in and re-prove it through the shipping build now — not to roll forward and discover at validation that the bet never proved its core premise. Treat a deferred-to-mock-where-the-real-thing-was-meant as a finding, every time, even on a fully green board.
161
194
 
162
- 2. **What did building this milestone teach that the remaining plan does not yet know?** Implementation reveals what design could only assume. Re-read the remaining ladder in light of what is now built: an assumption that broke, a downstream slice now redundant because this milestone subsumed it, a slice now missing because a real boundary turned out to need wiring the design did not foresee, a proof downstream that reads wrong now that its premise is concrete, or a whole milestone the ladder is missing. The question is not "is the plan still perfect" — it is "does what we learned change what we should build next." Route the answer by weight: (a) it changes only *how the next rung should be sliced* → carry it straight into *Opening a milestone* (that is exactly the slicing-from-ground-truth this deferral exists to enable); (b) the ladder is *missing a rung* → *Introducing a milestone* (a ladder amendment, within appetite); (c) a sealed *rung, design, or the appetite* is wrong → an Amendment or Change Navigation (Q3).
195
+ 2. **What did building this milestone teach that the remaining plan does not yet know?** Implementation reveals what design could only assume. Re-read the remaining ladder in light of what is now built: an assumption that broke, a downstream slice now redundant because this milestone subsumed it, a slice now missing because a real boundary turned out to need wiring the design did not foresee, a proof downstream that reads wrong now that its premise is concrete, or a whole milestone the ladder is missing. The question is not "is the plan still perfect" — it is "does what we learned change what we should build next." Route the answer by weight: (a) it changes only *how the next rung should be sliced* → carry it straight into *Opening a milestone* (that is exactly the slicing-from-ground-truth this deferral exists to enable); (b) the ladder is *missing a rung* → *Introducing a milestone* (a ladder amendment, within appetite); (c) an approved *rung, design, or the appetite* is wrong → an Amendment or Change Navigation (Q3).
163
196
 
164
- 3. **Route any needed change through the integrity machinery — never a silent edit.** A change to the *plan prose* (a milestone's or slice's Proof of work, an acceptance criterion) is an **Amendment** (below): on the user's approval, edit the prose, move the `bet/<bet-slug>/approved` tag to a commit that includes the edit, then adjust the affected board and code. A change to the *design itself* (an API/data shape, a milestone's existence, the appetite) is **Change Navigation** (below): write the change proposal and route by severity — though a *missing rung that fits the appetite and the locked design* is the lighter **ladder amendment** handled in-delivery (*Introducing a milestone* above), with no revert. Either way the trail — edited prose + re-tag, or a change-proposal file — is what lets the next slice's prose-integrity reconciliation tell an approved change from a silent one. "Adjust as we go" is a feature of this process precisely because it leaves that trail.
197
+ 3. **Route any needed change through the integrity machinery — never a silent edit.** A change to *what the plan proves* (a milestone's or slice's Proof of work, an acceptance criterion) is an **Amendment** (below): on the user's approval, edit the prose and commit it beside the decomposition with a reason (`bet(<bet-slug>): amend milestone <N> proof <reason>`), then adjust the affected board and code. A change to the *design itself* (an API/data shape, a milestone's existence, the appetite) is **Change Navigation** (below): write the change proposal and route by severity — though a *missing rung that fits the appetite and the locked design* is the lighter **ladder amendment** handled in-delivery (*Introducing a milestone* above), with no revert. Either way the trail — the recorded amendment commit, or a change-proposal file — is what lets the next slice's prose-integrity reconciliation tell an approved change from a silent one. "Adjust as we go" is a feature of this process precisely because it leaves that trail.
165
198
 
166
199
  4. **Where does the delivered work actually stand?** Note anything the milestone surfaced that the next milestone or the final validation needs — a readiness caveat, a discovery-note signal for a future bet (`.groundwork/cache/discovery-notes.md`), a `docs/maturity.md` row. Capture it now while it is fresh; do not bank on remembering it at validation.
167
200
 
168
201
  **Pause per the chosen mode.** In slice-by-slice and milestone-by-milestone modes, always pause here: present the postmortem — what the milestone proved, anything that did not hold, and any course-correction you recommend — and get the user's decision before opening the next milestone. In whole-bet mode, surface the postmortem summary and proceed automatically *unless* it found a course-correction (a hollowed proof, a remaining-plan change, an amendment, a new milestone / ladder amendment, or a Change Navigation): a course-correction is the user's call and pauses even in whole-bet mode. Routinely authoring the next rung's slices is *not* a course-correction — in whole-bet mode a clean postmortem rolls straight on, the scoped Protocol 9 review gating the new slices. A clean postmortem in whole-bet mode carries its summary onto the record.
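
The pause rule above reduces to a small decision, sketched here in Python. This is an illustration only: the `Mode` enum, the string labels in `COURSE_CORRECTIONS`, and `should_pause` are hypothetical names, not part of the package.

```python
from enum import Enum


class Mode(Enum):
    SLICE_BY_SLICE = "slice-by-slice"
    MILESTONE_BY_MILESTONE = "milestone-by-milestone"
    WHOLE_BET = "whole-bet"


# The course-corrections the paragraph lists; the label strings are assumed.
COURSE_CORRECTIONS = frozenset({
    "hollowed-proof",
    "remaining-plan-change",
    "amendment",
    "ladder-amendment",
    "change-navigation",
})


def should_pause(mode: Mode, findings: set[str]) -> bool:
    """Return True when the postmortem must wait for the user's decision."""
    if mode is not Mode.WHOLE_BET:
        # Slice-by-slice and milestone-by-milestone always pause here.
        return True
    # Whole-bet mode pauses only when a course-correction was found.
    return bool(findings & COURSE_CORRECTIONS)
```

Routine authoring of the next rung's slices carries no label from `COURSE_CORRECTIONS`, so in whole-bet mode it does not trigger a pause.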
169
202
 
170
- **Then open the next milestone.** With this milestone's lessons in hand, the user's go-ahead (or whole-bet autonomy), and any ladder amendment or Change Navigation already routed, run *Opening a milestone* above and ratchet the seal. (The final milestone has no next rung; its postmortem closes into Validation.)
203
+ **Then open the next milestone.** With this milestone's lessons in hand, the user's go-ahead (or whole-bet autonomy), and any ladder amendment or Change Navigation already routed, run *Opening a milestone* above and record the authored slices. (The final milestone has no next rung; its postmortem closes into Validation.)
171
204
 
172
205
  ## Amendment Protocol — when an approved proof is wrong
173
206
 
174
- An approved proof can still be wrong: its Proof-of-work prose can describe a shape the design never defined, encode a misread capability, or demand an outcome no implementation can reach. Approval does not make the prose right — it makes changing it a decision the user takes, not a convenience the implementing worker or driver reaches for. The amendment leaves a trail (the edited prose + a re-tag) precisely so the prose-integrity reconciliation can tell an approved change from a silent one. This protocol fires from three places: a slice-worker's `BLOCKING CONCERN`, a `decision-needed` review finding, or the milestone postmortem.
207
+ An approved proof can still be wrong: its Proof-of-work prose can describe a shape the design never defined, encode a misread capability, or demand an outcome no implementation can reach. Approval does not make the prose right — it makes changing it a decision the user takes, not a convenience the implementing worker or driver reaches for. The amendment leaves a trail (the edited prose committed with a reason) precisely so the prose-integrity reconciliation can tell an approved change from a silent one. This protocol fires from three places: a slice-worker's `BLOCKING CONCERN`, a `decision-needed` review finding, or the milestone postmortem.
175
208
 
176
209
  1. **Stop work on the affected slice or milestone.** Do not edit the prose, and do not implement toward a proof you believe is wrong.
177
210
  2. **State the case:** what the Proof-of-work proof (or the API shape behind it) says, what you believe it should say, and which artifact is the source of the error — the proof alone, or the technical design behind it.
178
- 3. **Route by depth.** A wrong proof against a correct design is a proof amendment: on the user's explicit approval, edit the slice's (or milestone's) Proof-of-work prose, **move the `bet/<bet-slug>/approved` tag to a commit that includes the edit** (`git tag -f`, or record the amended commit in the bet record when re-tagging is undesirable), then change the built test and code to match. That edited-prose commit is the amendment trail the reconciliation reads. Editing an *unopened* milestone's headline proof is the cheapest amendment of all — correct the ladder rung and re-tag; because its slices were never authored, nothing downstream unwinds. A wrong API/data design is deeper — follow Change Navigation below.
211
+ 3. **Route by depth.** A wrong proof against a correct design is a proof amendment: on the user's explicit approval, edit the slice's (or milestone's) Proof-of-work prose and **commit it beside the decomposition with a reason** (`bet(<bet-slug>): amend milestone <N> proof <reason>`), then change the built test and code to match. That recorded amendment commit is the trail the reconciliation reads. Editing an *unopened* milestone's headline proof is the cheapest amendment of all — correct the ladder rung and commit; because its slices were never authored, nothing downstream unwinds. A wrong API/data design is deeper — follow Change Navigation below.
179
212
  4. **Record the amendment** in the slice's delivery commit `Notes:` (and in the postmortem record when it surfaced there) so Validation's retrospective sees how the contract moved after approval.
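
A reconciliation pass needs some way to recognise an amendment commit by its subject. The sketch below parses the milestone form quoted in step 3. It assumes a single space separates `proof` from the reason, and that a slice amendment (whose subject form the text does not show) would need its own pattern. `Amendment`, `AMENDMENT_SUBJECT`, and `parse_amendment` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Amendment(NamedTuple):
    bet_slug: str
    milestone: int
    reason: str


# Mirrors the subject shape quoted in step 3:
#   bet(<bet-slug>): amend milestone <N> proof <reason>
# The exact separator before the reason is an assumption.
AMENDMENT_SUBJECT = re.compile(
    r"^bet\((?P<slug>[^)]+)\): amend milestone (?P<n>\d+) proof (?P<reason>\S.*)$"
)


def parse_amendment(subject: str) -> Optional[Amendment]:
    """Return the parsed amendment, or None when the subject is not one."""
    m = AMENDMENT_SUBJECT.match(subject.strip())
    if m is None:
        return None
    return Amendment(m["slug"], int(m["n"]), m["reason"])
```

A subject with no reason does not match, which keeps the "commit it with a reason" requirement checkable rather than conventional.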
180
213
 
181
214
  ## Change Navigation — when reality contradicts the locked design
@@ -183,7 +216,7 @@ An approved proof can still be wrong: its Proof-of-work prose can describe a sha
183
216
  Mid-delivery discoveries that invalidate the design are not failures of the process; pushing through them silently is. When implementation reveals the design committed to something wrong — surfaced by a slice-worker, a review, or the milestone postmortem:
184
217
 
185
218
  1. **Pause the slice** and write a change proposal at `docs/bets/<bet-slug>/change-proposal-<n>.md` using the template at `.groundwork/skills/groundwork-bet/templates/change-proposal.md`: the discovery and its evidence, the impact across pitch / technical design / decomposition / built artifacts (name each affected section), the before/after of every proposed edit, and the severity.
186
- 2. **Route by severity.** *Minor* — the API/data design and milestones survive; specific proofs and design sections need correction: on user approval, apply the edits, re-review mutated docs (Protocol 9), amend affected proofs through the Amendment Protocol (edit the prose, re-tag, change the built test and code), resume the slice. *Ladder amendment* — the design holds and the ladder is simply missing a rung that fits the appetite and is derivable from the locked design: this is not a design contradiction at all — handle it in-delivery via *Introducing a milestone* above (author the new rung's headline, review, ratchet the seal), no revert. *Structural* — an API/data design, a milestone, or the appetite itself is wrong, or a needed new rung requires capability the design never covered or would exceed the appetite: on user approval, revert to Design Foundations (`status: design`), rework the design with the proposal as input, and re-run Decomposition for the affected scope; unaffected delivered slices stand.
219
+ 2. **Route by severity.** *Minor* — the API/data design and milestones survive; specific proofs and design sections need correction: on user approval, apply the edits, re-review mutated docs (Protocol 9), amend affected proofs through the Amendment Protocol (edit the prose, commit it with a reason, change the built test and code), resume the slice. *Ladder amendment* — the design holds and the ladder is simply missing a rung that fits the appetite and is derivable from the locked design: this is not a design contradiction at all — handle it in-delivery via *Introducing a milestone* above (author the new rung's headline, review, record the new rung), no revert. *Structural* — an API/data design, a milestone, or the appetite itself is wrong, or a needed new rung requires capability the design never covered or would exceed the appetite: on user approval, revert to Design Foundations (`status: design`), rework the design with the proposal as input, and re-run Decomposition for the affected scope; unaffected delivered slices stand.
187
220
  3. The proposal stays in the bet directory either way — it is the audit trail Validation and the retrospective read.
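
The three routes in step 2 can be read as a decision over a few yes/no questions. The sketch below encodes that reading. The `Discovery` fields and the `route` function are hypothetical, and in practice the severity is a judgment recorded in the change proposal and approved by the user, not something computed.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    MINOR = "minor"
    LADDER_AMENDMENT = "ladder-amendment"
    STRUCTURAL = "structural"


@dataclass(frozen=True)
class Discovery:
    # True when an API/data design, a milestone, or the appetite itself is wrong.
    design_wrong: bool
    # True when the ladder is missing a rung.
    needs_new_rung: bool = False
    # The two conditions a missing rung must meet to stay in-delivery.
    rung_fits_appetite: bool = True
    rung_derivable_from_design: bool = True


def route(d: Discovery) -> Route:
    """Pick the route described in step 2 for a mid-delivery discovery."""
    if d.design_wrong:
        return Route.STRUCTURAL
    if d.needs_new_rung:
        if d.rung_fits_appetite and d.rung_derivable_from_design:
            return Route.LADDER_AMENDMENT
        # A rung the design never covered, or one that exceeds the appetite.
        return Route.STRUCTURAL
    # Design and milestones survive; only proofs or design sections need correction.
    return Route.MINOR
```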
188
221
 
189
222
  ## Transition
@@ -1,6 +1,6 @@
1
1
  # Phase 5: Validation (Testing & Handoff)
2
2
 
3
- **Goal:** Verify the implementation, capture each touched service's served contract into the canonical `docs/architecture/api/` record, archive the whole bet, fold what the bet learned back into the upstream documents, and seed the next bet with any signals that surfaced during delivery.
3
+ **Goal:** Verify the implementation, capture each touched service's served contract into the canonical `docs/architecture/api/` record, archive the whole bet, fold what the bet learned back into the upstream documents, integrate the validated bet to trunk, and seed the next bet with any signals that surfaced during delivery.
4
4
 
5
5
  A bet that ships without updating upstream docs leaves the next bet operating against a stale map. The Validation phase exists to close the loop — the test suite proves the implementation works, the Living Documents scan proves the rest of the system still describes reality.
6
6
 
@@ -16,7 +16,7 @@ Update `docs/bets/<bet-slug>/pitch.md` frontmatter to `status: validation`.
16
16
 
17
17
  ### Step 2: Run the test suite
18
18
 
19
- Execute the full bet-progress test suite: `./dev test bet <bet-slug>` (or `pytest tests/bets/<bet-slug>/` directly). Every test must pass before advancing, and the **prose-integrity reconciliation runs once over the whole bet**. By now the `bet/<bet-slug>/approved` tag has **ratcheted forward** through delivery, advancing additively as each milestone was opened-and-sliced or added, so it sits at the final rung's authoring commit. `git diff bet/<bet-slug>/approved.. -- docs/bets/<bet-slug>/decomposition/ docs/bets/<bet-slug>/technical-design/` shows no change except approved amendments (the Amendment Protocol's commit trail), and every built test still proves what its slice's Proof-of-work prose describes. (The per-slice reconciliation already guarded each ratchet step during delivery; this is the whole-bet confirmation.) The sealed contract is the prose; the tests and implementation were built this phase and are supposed to have changed. A suite that drifted from the sealed prose without a recorded amendment is not what the user approved: flag it and revert.
19
+ Execute the full bet-progress test suite: `./dev test bet <bet-slug>` (or `pytest tests/bets/<bet-slug>/` directly). Every test must pass before advancing — and run the **prose-integrity reconciliation once over the whole bet**. The approved decomposition commit is the baseline; through delivery the only changes to the decomposition prose are recorded amendments (each an Amendment commit carrying a reason) and additive authoring of later rungs. `git log --oneline -- docs/bets/<bet-slug>/decomposition/ docs/bets/<bet-slug>/technical-design/` since that baseline shows that trail and nothing else, and every built test still proves what its slice's Proof-of-work prose describes. (The per-slice reconciliation already guarded each step during delivery; this is the whole-bet confirmation.) The contract is the approved prose; the tests and implementation were built this phase and are supposed to have changed. A suite that drifted from the approved prose without a recorded amendment is not what the user approved — flag it and revert.
20
20
 
21
21
  **Contract verification:** Confirm that no manual schema definitions or rogue HTTP calls were introduced during Delivery — cross-service calls use clients derived from the canonical `docs/architecture/api/<service>/` contract, and no endpoint, field, or table exists that the prose design and the captured contract do not define. A bet that delivered against side-channel contracts has compromised the architecture's integrity; flag it and revert.
22
22
 
@@ -28,10 +28,11 @@ The bet's API and data design were prose; the real machine-readable contract is
28
28
 
29
29
  **Conditional — graphical surfaces only.** Skip this step entirely, and say so in one line, when the bet touched no `graphical-ui` surface: a backend, CLI, or agentic bet pays nothing. For a bet that delivered a graphical surface, confirm the visual ladder before the bet can reach `delivered`:
30
30
 
31
- 1. **Tier 1 — the deterministic floor is green.** Step 2 already ran the suite; confirm `tests/system/test_render_smoke.py`, `test_a11y_smoke.py`, and `test_token_conformance.py` passed across the viewport × theme matrix. Render-smoke catches a blank/throwing/unstyled page; token-conformance asserts the specified atmosphere actually *landed* — the surface treatments render with their backdrop blur and multi-layer elevation rather than degrading to a flat default. A red layer is a real defect and blocks the bet — it is not a flaky test to wave through.
31
+ 1. **Tier 1 — the deterministic floor is green.** Step 2 already ran the suite; confirm `tests/system/test_render_smoke.py`, `test_a11y_smoke.py`, and `test_token_conformance.py` passed across the viewport × theme matrix. Render-smoke catches a blank/throwing/unstyled page; token-conformance asserts the specified atmosphere actually *landed* — the surface treatments render with their backdrop blur and multi-layer elevation rather than degrading to a flat default. A red layer is a real defect and blocks the bet — it is not a flaky test to wave through. A surface whose platform has no check that can run is a fail-closed block, not a skip — the placeholder check fails with the named gap (Test tooling), and the bet cannot reach `delivered` on an unverified surface.
32
32
  2. **Tier 2 — spec-conformance inspection happened.** Confirm each graphical milestone recorded its per-screen spec-conformance verdict during delivery (the `Visual:` line in the closing slice's delivery commit, from `04-delivery.md` Milestone close). That verdict is the designer-judged check: does the rendered surface match the written micro-polish spec, including the dimensions computation cannot assert — optical alignment, restraint, and whether the composition reads as considered. A milestone that closed with no verdict did not run the inspection — route it back.
33
+ 3. **The whole-bet experience judgment.** Each milestone proved its own piece; now judge the assembled product end to end. Dispatch the experience-auditor lens (`briefs/experience-auditor.md`, the designer persona) over the finished bet's surfaces, driven the way the consumer drives them: is it **fully formed and a joy to use** — no dead-end flows across milestone seams, every async state present, the design system consistent, best-in-class patterns implemented in full? This is the same designer-eye judgment the per-milestone review applies, now at bet scope, where cross-milestone gaps (a flow that works within each milestone but breaks at the join) first become visible. A no here — a dead end, a half-built pattern, a screen that works but is not usable — **fails the bet**; route it back to delivery to fix, do not wave it through to `delivered`.
33
34
 
34
- Report which tiers ran in one line, so the run/skip is auditable. There is no separate multimodal "is it as good as the references" grading pass: the craft bar is set as the concrete micro-polish spec at design time (a screenshot cannot see motion or state, and a vision grade of static frames is a weak signal), enforced deterministically by Tier 1 and by the designer's spec-conformance judgement in Tier 2.
35
+ Report which tiers ran in one line, so the run/skip is auditable. There is no separate multimodal "is it as good as the references" grading pass for static frames: the craft bar is the concrete micro-polish spec enforced by Tiers 1–2, and the experience judgment above is the designer driving the running product, not grading screenshots.
35
36
 
36
37
  ### Step 2.7: Record the bet in the capability ledger
37
38
 
@@ -40,7 +41,7 @@ Skip this step entirely when the project has no `docs/surfaces.md` — a project
40
41
  The capability ledger (in `docs/surfaces.md`) is where surface divergence becomes a recorded decision instead of silent drift, and validation is the one writer that appends capability rows. For each capability this bet delivered — user-meaningful, typically 1–3 per bet, coarse enough to stay readable, never per-endpoint — write its ledger row:
41
42
 
42
43
  - **Row key:** `<bet-slug>/<capability-slug>` — stable, greppable, collision-free.
43
- - **Every surface column filled** with exactly one state and its payload: `delivered` (this bet's slug), `planned` (a bet ref or discovery-notes pointer), `omitted` (one-line rationale), or `n/a` (no payload). The pitch's `surfaces:` scope and surface no-gos are the source: in-scope surfaces whose surface milestones went green are `delivered`; deferred no-gos and deferred surface milestones are `planned`; omitted no-gos are `omitted`, carrying the pitch's rationale; structurally meaningless columns are `n/a`. A retired surface's column fills `n/a` automatically.
44
+ - **Every surface column filled** with exactly one state and its payload: `delivered` (this bet's slug), `planned` (a bet ref or discovery-notes pointer), `omitted` (one-line rationale), or `n/a` (no payload). The pitch's `surfaces:` scope and surface no-gos are the source: in-scope surfaces this bet delivered the capability to are `delivered`; deferred no-gos and surfaces left for a later bet are `planned`; omitted no-gos are `omitted`, carrying the pitch's rationale; structurally meaningless columns are `n/a`. A retired surface's column fills `n/a` automatically.
44
45
  - **Cross-post every `planned` cell** as a bullet under `## Bets` in `.groundwork/cache/discovery-notes.md`, naming the capability key and the target surface — the next bet's Discovery reads that section, so the deferral becomes backlog instead of memory.
45
46
  - **Update `.groundwork/surfaces.json` in the same change:** append the capability entries with the same keys, states, and payloads. The prose ledger and its machine twin are projections of the same decisions; they never drift.
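
The "every surface column filled" rule lends itself to a mechanical check before `delivered` is written. The following is a sketch only: the row shape (a mapping from surface name to a `(state, payload)` pair) is an assumption for illustration, since the real `.groundwork/surfaces.json` schema is not shown here, and `ledger_gaps` is a hypothetical helper.

```python
# States taken from the bullet above; n/a is the only one with no payload.
STATES_REQUIRING_PAYLOAD = {"delivered", "planned", "omitted"}
STATES_WITHOUT_PAYLOAD = {"n/a"}


def ledger_gaps(row: dict[str, tuple[str, str]], surfaces: list[str]) -> list[str]:
    """Return one message per problem in a ledger row; an empty list means complete."""
    gaps = []
    for surface in surfaces:
        cell = row.get(surface)
        if cell is None:
            gaps.append(f"{surface}: empty cell")
            continue
        state, payload = cell
        if state in STATES_REQUIRING_PAYLOAD:
            if not payload.strip():
                gaps.append(f"{surface}: {state} needs a payload")
        elif state in STATES_WITHOUT_PAYLOAD:
            if payload.strip():
                gaps.append(f"{surface}: n/a carries no payload")
        else:
            gaps.append(f"{surface}: unknown state {state!r}")
    return gaps
```

Running the same check against both the prose ledger and its machine twin is one way to notice if the two projections have drifted.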
46
47
 
@@ -70,9 +71,9 @@ Move the whole bet out of the active tree: `docs/bets/<bet-slug>/` → `docs/bet
70
71
 
71
72
  The permanent best-practice tests rolled out during Delivery (in service repos and `tests/system/`) remain in place — they are the ongoing coverage for this feature going forward. The bet's prose and its bet-progress suite served their purpose as the definition of done and the proof-of-work scaffolding; they are now archived as the bet's record.
72
73
 
73
- ### Step 4: Review with the user
74
+ ### Step 4: Review with the user — they drive the real product
74
75
 
75
- Summarise what was delivered. Walk through the user-facing changes, the new contracts, and any constraints the implementation revealed. Capture the user's reactions — corrections, requests for follow-up bets, or observations about what surprised them all belong in the next step's scan.
76
+ The bet's success signal is the owner using the real shipping product the way its consumer will — running the agreed front-door cases against the build that actually ships, on real data. A green suite and a clean experience judgment are the evidence; the owner driving it is the confirmation. Walk them to the shipping build (not a test target), have them carry out the milestones' headline cases, and watch what happens on the real surface. Then summarise what was delivered — the user-facing changes, the new contracts, and any constraints the implementation revealed. Capture the user's reactions — corrections, requests for follow-up bets, anything that surprised them, or anything that did not feel right in their hands — they all belong in the next step's scan, and a "this isn't usable the way I expected" here is a finding, not a closing pleasantry.
76
77
 
77
78
  ### Step 5: Apply the Living Documents protocol
78
79
 
@@ -124,6 +125,16 @@ Write `docs/bets/<bet-slug>/retrospective.md`: the patterns found, the follow-th
124
125
 
125
126
  Update `docs/bets/<bet-slug>/pitch.md` frontmatter to `status: delivered`. On a registry project, Step 2.7's gate applies: do not write `delivered` while any ledger cell for this bet's capabilities is empty — fill the column or the bet does not close.
126
127
 
128
+ ### Step 8.5: Integrate the bet to trunk
129
+
130
+ The bet has ridden its own branch (`bet/<bet-slug>`) in an isolated worktree since Delivery (`04-delivery.md`, "Git workflow: a branch per bet"). With the suite green, the canonical contracts captured, the bet archived, and the upstream docs reconciled, the branch now holds a complete, validated bet — and trunk is ready to receive it. Integrating is a single **user-gated** step: merging to a shared branch is the user's call, never the driver's alone (the same standard that gates any push to a remote).
131
+
132
+ 1. **Rebase onto current trunk.** `git fetch origin`, then rebase `bet/<bet-slug>` onto `origin/<trunk>` to absorb whatever landed during the bet. Re-run the project's full test suite (`./dev test` or `./dev ci`) once after the rebase — the permanent best-practice tests rolled out during Delivery, not the now-archived bet-progress suite — so trunk receives a green merge, not an assumed one.
133
+ 2. **Fast-forward merge to trunk, then push `origin/<trunk>`, on the user's explicit go-ahead.** The push to `origin/<trunk>` is *the* gated action of the whole delivery: ask first, every time. (Backup pushes of the `bet/<bet-slug>` branch run freely during delivery because they publish nothing into trunk; the trunk push never runs without the go-ahead.) Trunk only ever receives a complete, validated bet, so nothing half-built lands and no feature flag is required to keep it releasable.
134
+ 3. **Tear down the isolation.** Remove the worktree (`git worktree remove <path>`, then `git worktree prune` — never `rm -rf`, which strands `.git/worktrees/` metadata) and delete the merged branch — locally, and, if it was pushed for backup during delivery (`04-delivery.md`, "Push the bet branch as you go"), its remote too (`git push origin --delete bet/<bet-slug>`). The bet's decomposition prose and its recorded amendment trail live on in the merged history — that is the permanent record.
135
+
136
+ **By topology** (`04-delivery.md`, "Recording a cross-service slice"): a **monorepo** merges one branch. For **submodules**, fast-forward each affected submodule's branch and push it, then land the final superrepo gitlink-bump merge last. For a **polyrepo**, fast-forward each repository's branch in producer-before-consumer order (or gated on the contract check), the change-set manifest recording the set; there is no single merge.
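
For the polyrepo case, producer-before-consumer order is a topological sort of the contract dependencies. Here is a minimal sketch using the standard library's `graphlib` (Python 3.9+). The repository names and the `merge_order` helper are hypothetical, and this only computes an order: it does not replace the contract check or the change-set manifest.

```python
from graphlib import CycleError, TopologicalSorter


def merge_order(consumes: dict[str, set[str]]) -> list[str]:
    """Order repositories so every producer is merged before its consumers.

    `consumes` maps each repository to the repositories whose contracts it
    consumes. Raises CycleError when the dependencies are circular, in which
    case no producer-first order exists.
    """
    return list(TopologicalSorter(consumes).static_order())


# Hypothetical example: web-app consumes orders-api, which consumes billing-api.
order = merge_order({
    "web-app": {"orders-api"},
    "orders-api": {"billing-api"},
    "billing-api": set(),
})
```

A `CycleError` here is a signal to gate the merges on the contract check instead, as the text allows.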
137
+
127
138
  ### Step 9: Hand off
128
139
 
129
140
  Confirm the bet is complete. Summarise what was delivered, what was updated upstream, and what was parked for the next bet. Recommend a fresh context for the next bet — the rich delivery context has been compressed into doc updates and discovery notes, so the next bet does not need it.
@@ -14,7 +14,7 @@ a review of the matching reference so the distillation never drifts.
14
14
  | src/docs/principles/design/visual-design.md | 40c4a59f2658f6075f60c745ac1d320afa1a2728542a1c0145153dc1752e20d2 | 2026-06-21 |
15
15
  | src/docs/principles/design/layout-and-space.md | 757c407126cf3cbc60be071bbdf6d17721c8d77105c7e6a9a6237d039fa1d09b | 2026-06-20 |
16
16
  | src/docs/principles/design/interaction-and-motion.md | 99c47d80bd0960b5bd325842cb55199697e10917034511a82af89c873fc76e39 | 2026-06-20 |
17
- | src/docs/principles/design/usability-and-ux.md | 5c08dfcaeddd79ae110dfd336a04a1e4ef151fb942b0c49c53f247fcb9bf133a | 2026-06-20 |
17
+ | src/docs/principles/design/usability-and-ux.md | 912999d2e125b393dbe46b7cf7a4172f5e5f2a48c3bc8459d8166afe34eb527c | 2026-06-27 |
18
18
  | src/docs/principles/design/design-systems-and-tokens.md | 3a7b416e122e4d79451a6ac2de56c7cb9142999902d60a20801572c24e201bcd | 2026-06-20 |
19
19
  | src/docs/principles/design/ai-native-design.md | b70c6906aad413e3cf40e7493cd247a8b47b5bfcd010841f22793e23348836ff | 2026-06-20 |
20
20
  | src/docs/principles/quality/accessibility.md | f921e7bf6256bc105b127b841d0a30af8a70ad1ddd7632d492589f052e6501b2 | 2026-06-20 |
@@ -49,6 +49,17 @@ The user follows the product you are building, not the bookkeeping you build it
49
49
  - **Speak at the level of behaviour, not the symbol that implements it.** "A corrupt file fails for good; a worker crash leaves the file untouched so we can retry it later" tells the user what they need; ".failed(deep) versus .coarse on the keyframe disposition" does not. Reach for code-level detail only when the user is reading the code alongside you.
50
50
  - **Frame a decision as a choice about the product.** When you surface a contradiction or need a ruling, lead with what each option means for the user and what you recommend. The documents or symbols that disagree are the footnote, not the headline.
51
51
 
52
+ ## Speak as the Guide, Not the Tourist
53
+
54
+ You have internalized this process; you are walking the user through theirs. Speak from that footing. The failure mode is narrating your own reading of the workflow as a run of discoveries — announcing that you now understand a protocol, flagging a routine state-check as a finding, reacting to a note you yourself wrote — which makes you sound like someone meeting the process for the first time rather than the expert running it. That you understand the workflow is assumed; act on it instead of reporting it.
55
+
56
+ - **Don't report your own comprehension.** "Now I understand the protocol" / "I now have a clear picture" narrate your reading, not the user's project. Drop them and state what is true: where the work stands and what comes next.
57
+ - **Routine checks are the job, not discoveries.** Reading the board, the git log, or the spec is *how you work* — not a revelation to announce. "Key finding: the slices are already authored" dramatizes a lookup. Say it flat: "Milestones 5 and 6 are sliced but not yet built."
58
+ - **Don't narrate which instruction you are following.** The user is steering their product, not your file reads. "Per the orchestrator I route to…" gives them nothing to act on. Speak from the project's vantage — "Next we build Milestone 5's red board" — not the manual's.
59
+ - **Reconcile silently; report the current truth.** When something you knew turns out stale, correct your understanding without performing the surprise ("the note is stale"). Just tell the user what is actually true now.
60
+
61
+ Keep the brief why. Guiding is not terseness: a single clause of reasoning where it helps the user follow or decide — "5 and 6 were sliced up front, so there's no planning to redo" — is the guide's value, and distinct from narrating your own process.
62
+
52
63
  ## When You Need Input
53
64
 
54
65
  When you lack the context to make a good proposal, ask a bounded, specific question rather than an open one — instead of asking generally how to handle errors, ask whether a specific validation failure should map to a 400 Bad Request or a domain exception. Bounded questions cost a busy developer seconds; open ones hand back the planning work the proposal was supposed to do.
@@ -38,6 +38,7 @@ Tests do not exist at this gate — Delivery materializes the red board from thi
38
38
  - [ ] 🔴 **Ladder or first milestone incomplete**: the `decomposition/` tree is missing a piece it must carry at delivery start — `meta.json`, any milestone `index.md` (the full ladder of headline proofs), or a slice file the **first milestone** links — so delivery would execute against a partial plan. A *later* milestone with no slice files is expected (it is sliced on arrival, not up front), not incomplete; but a slice link that resolves to nothing is a dangling reference and fails this check.
39
39
  - [ ] 🟡 **Capability ↔ proof drift**: a Required Capability with no Proof of work covering it, or a Proof of work that rests on no Required Capability — the proof and the plan disagree about what is being proven.
40
40
  - [ ] 🟡 **Proof-of-work prose stale**: a milestone `index.md` or slice file's Proof of work describes a shape, an interface, or an outcome the current `technical-design/` no longer carries — the sealed prose drifted from the design, so the user approved something other than what delivery will build against.
41
+ - [ ] 🟡 **Testing strategy unresolved**: a slice's stack does not resolve to a promoted engineer-skill testing strategy (`.agents/skills/groundwork-<stack>-engineer/references/testing.md`) — the slice-worker has no authority to roll the permanent best-practice tests out against, and the coverage-auditor lens no baseline to hold them to. Scaffolded services always carry one; a hand-added or non-standard service may not. Name the gap so delivery rolls out coverage deliberately rather than improvising it.
41
42
 
42
43
  ## Currency
43
44