@kontourai/flow-agents 1.3.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (214)
  1. package/.github/CODEOWNERS +29 -0
  2. package/.github/actions/trust-verify/action.yml +145 -0
  3. package/.github/workflows/ci.yml +11 -4
  4. package/.github/workflows/kit-gates-demo.yml +2 -2
  5. package/.github/workflows/publish-npm.yml +10 -2
  6. package/.github/workflows/release-please.yml +1 -1
  7. package/.github/workflows/trust-reconcile.yml +113 -0
  8. package/AGENTS.md +13 -0
  9. package/CHANGELOG.md +103 -0
  10. package/CONTRIBUTING.md +4 -4
  11. package/README.md +1 -0
  12. package/agents/tool-planner.json +1 -1
  13. package/build/src/cli/console-learning-projection.d.ts +1 -0
  14. package/build/src/cli/effective-backlog-settings.d.ts +1 -0
  15. package/build/src/cli/fixture-retirement-audit.d.ts +2 -0
  16. package/build/src/cli/init.d.ts +17 -0
  17. package/build/src/cli/init.js +242 -20
  18. package/build/src/cli/kit.d.ts +1 -0
  19. package/build/src/cli/promote-workflow-artifact.d.ts +1 -0
  20. package/build/src/cli/publish-change-helper.d.ts +1 -0
  21. package/build/src/cli/pull-work-provider.d.ts +1 -0
  22. package/build/src/cli/runtime-adapter.d.ts +1 -0
  23. package/build/src/cli/telemetry-doctor.d.ts +1 -0
  24. package/build/src/cli/usage-feedback.d.ts +1 -0
  25. package/build/src/cli/utterance-check.d.ts +1 -0
  26. package/build/src/cli/validate-hook-influence.d.ts +1 -0
  27. package/build/src/cli/validate-source-tree.d.ts +1 -0
  28. package/build/src/cli/validate-workflow-artifacts.d.ts +2 -0
  29. package/build/src/cli/validate-workflow-artifacts.js +19 -2
  30. package/build/src/cli/verify.d.ts +1 -0
  31. package/build/src/cli/verify.js +90 -0
  32. package/build/src/cli/veritas-governance.d.ts +1 -0
  33. package/build/src/cli/workflow-artifact-cleanup-audit.d.ts +1 -0
  34. package/build/src/cli/workflow-sidecar.d.ts +324 -0
  35. package/build/src/cli/workflow-sidecar.js +1973 -90
  36. package/build/src/cli.d.ts +2 -0
  37. package/build/src/cli.js +2 -3
  38. package/build/src/flow-kit/validate.d.ts +81 -0
  39. package/build/src/index.d.ts +5 -0
  40. package/build/src/index.js +36 -0
  41. package/build/src/lib/args.d.ts +8 -0
  42. package/build/src/lib/flow-resolver.d.ts +82 -0
  43. package/build/src/lib/flow-resolver.js +237 -0
  44. package/build/src/lib/fs.d.ts +7 -0
  45. package/build/src/lib/workflow-learning-projection.d.ts +132 -0
  46. package/build/src/runtime-adapters.d.ts +18 -0
  47. package/build/src/tools/build-universal-bundles.d.ts +2 -0
  48. package/build/src/tools/build-universal-bundles.js +34 -22
  49. package/build/src/tools/common.d.ts +9 -0
  50. package/build/src/tools/generate-context-map.d.ts +2 -0
  51. package/build/src/tools/generate-context-map.js +3 -16
  52. package/build/src/tools/validate-package.d.ts +2 -0
  53. package/build/src/tools/validate-source-tree.d.ts +2 -0
  54. package/build/src/tools/validate-source-tree.js +42 -162
  55. package/context/contracts/artifact-contract.md +10 -0
  56. package/context/contracts/delivery-contract.md +1 -0
  57. package/context/contracts/review-contract.md +1 -0
  58. package/context/contracts/verification-contract.md +2 -0
  59. package/context/gate-awareness.md +39 -0
  60. package/context/scripts/hooks/stop-goal-fit.js +632 -70
  61. package/docs/adr/0001-flow-agents-consumes-flow.md +1 -1
  62. package/docs/adr/0002-flow-kits-as-extension-unit.md +1 -1
  63. package/docs/adr/0004-gates-expect-surface-claims.md +2 -0
  64. package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +2 -0
  65. package/docs/adr/0007-skill-audit.md +1 -1
  66. package/docs/adr/0009-canonical-hook-core-kit-boundary.md +95 -0
  67. package/docs/adr/0010-workflow-trust-state-as-hachure-bundle.md +139 -0
  68. package/docs/adr/0011-mcp-posture.md +100 -0
  69. package/docs/adr/0012-agent-coordination-as-liveness-claims.md +119 -0
  70. package/docs/adr/0013-context-lifecycle.md +151 -0
  71. package/docs/adr/0014-core-vs-domain-kit-boundary.md +143 -0
  72. package/docs/adr/0015-flow-flow-agents-boundary-reconciliation.md +120 -0
  73. package/docs/adr/0016-three-hard-boundary-model.md +71 -0
  74. package/docs/adr/0017-anti-gaming-trust-security-model.md +155 -0
  75. package/docs/agent-system-guidebook.md +5 -12
  76. package/docs/context-map.md +4 -10
  77. package/docs/developer-architecture.md +14 -0
  78. package/docs/index.md +3 -2
  79. package/docs/integrations/framework-adapter.md +19 -6
  80. package/docs/integrations/index.md +2 -2
  81. package/docs/north-star.md +4 -4
  82. package/docs/operating-layers.md +3 -3
  83. package/docs/plans/adr-0010-phase2-gate-recompute.md +55 -0
  84. package/docs/repository-structure.md +2 -2
  85. package/docs/skills-map.md +1 -0
  86. package/docs/spec/runtime-hook-surface.md +78 -10
  87. package/docs/standards-register.md +3 -3
  88. package/docs/survey-utterance-check.md +1 -1
  89. package/docs/trust-anchor-adoption.md +197 -0
  90. package/docs/verifiable-trust.md +95 -0
  91. package/docs/veritas-integration.md +2 -2
  92. package/docs/workflow-usage-guide.md +69 -0
  93. package/evals/acceptance/DEMO-false-completion.md +144 -0
  94. package/evals/acceptance/demo-cast.sh +92 -0
  95. package/evals/acceptance/demo-false-completion.sh +72 -0
  96. package/evals/acceptance/demo-real-evidence.sh +104 -0
  97. package/evals/acceptance/demo.tape +29 -0
  98. package/evals/acceptance/prove-capture-teeth-declared.sh +335 -0
  99. package/evals/acceptance/prove-capture-teeth.sh +114 -0
  100. package/evals/acceptance/prove-teeth.sh +105 -0
  101. package/evals/ci/antigaming-suite.sh +54 -0
  102. package/evals/ci/run-baseline.sh +2 -0
  103. package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/flows/review.flow.json +26 -0
  104. package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/kit.json +20 -0
  105. package/evals/fixtures/flow-kit-repository/valid-unknown-extension/flows/review.flow.json +26 -0
  106. package/evals/fixtures/flow-kit-repository/valid-unknown-extension/kit.json +18 -0
  107. package/evals/integration/test_builder_step_producers.sh +379 -0
  108. package/evals/integration/test_bundle_install.sh +35 -71
  109. package/evals/integration/test_bundle_lifecycle.sh +39 -2
  110. package/evals/integration/test_captured_fail_reconciliation.sh +820 -0
  111. package/evals/integration/test_checkpoint_signing.sh +489 -0
  112. package/evals/integration/test_claim_lookup.sh +352 -0
  113. package/evals/integration/test_command_log_integrity.sh +275 -0
  114. package/evals/integration/test_context_map.sh +0 -2
  115. package/evals/integration/test_dual_emit_flow_step.sh +278 -0
  116. package/evals/integration/test_enforcer_expects_driven.sh +281 -0
  117. package/evals/integration/test_evidence_capture_hook.sh +185 -0
  118. package/evals/integration/test_flow_kit_repository.sh +2 -0
  119. package/evals/integration/test_flowdef_session_activation.sh +273 -0
  120. package/evals/integration/test_flowdef_session_history_preservation.sh +250 -0
  121. package/evals/integration/test_gate_bypass_chain.sh +448 -0
  122. package/evals/integration/test_gate_lockdown.sh +1137 -0
  123. package/evals/integration/test_gate_review_inquiry_records.sh +399 -0
  124. package/evals/integration/test_goal_fit_escape_hatch.sh +73 -0
  125. package/evals/integration/test_goal_fit_hook.sh +69 -4
  126. package/evals/integration/test_goal_fit_rederive.sh +263 -0
  127. package/evals/integration/test_hook_category_behaviors.sh +14 -0
  128. package/evals/integration/test_install_merge.sh +1176 -0
  129. package/evals/integration/test_mint_attestation.sh +373 -0
  130. package/evals/integration/test_phase_map_and_gate_claim.sh +365 -0
  131. package/evals/integration/test_publish_delivery.sh +269 -0
  132. package/evals/integration/test_reconcile_soundness.sh +528 -0
  133. package/evals/integration/test_resolvefirststep_security.sh +208 -0
  134. package/evals/integration/test_session_resume_roundtrip.sh +286 -0
  135. package/evals/integration/test_trust_checkpoint.sh +325 -0
  136. package/evals/integration/test_trust_reconcile.sh +293 -0
  137. package/evals/integration/test_verify_cli.sh +208 -0
  138. package/evals/integration/test_workflow_sidecar_writer.sh +549 -34
  139. package/evals/lib/node.sh +0 -6
  140. package/evals/run.sh +47 -0
  141. package/evals/static/test_library_exports.sh +85 -0
  142. package/evals/static/test_universal_bundles.sh +15 -0
  143. package/evals/static/test_workflow_skills.sh +6 -13
  144. package/install.sh +0 -7
  145. package/integrations/strands-ts/README.md +25 -15
  146. package/integrations/veritas/flow-agents.adapter.json +1 -2
  147. package/kits/builder/flows/build.flow.json +59 -12
  148. package/kits/builder/kit.json +85 -15
  149. package/kits/builder/skills/continue-work/SKILL.md +116 -0
  150. package/kits/builder/skills/deliver/SKILL.md +36 -6
  151. package/kits/builder/skills/design-probe/SKILL.md +28 -0
  152. package/kits/builder/skills/execute-plan/SKILL.md +9 -1
  153. package/kits/builder/skills/gate-review/SKILL.md +234 -0
  154. package/kits/builder/skills/learning-review/SKILL.md +30 -0
  155. package/kits/builder/skills/pickup-probe/SKILL.md +29 -0
  156. package/kits/builder/skills/plan-work/SKILL.md +13 -1
  157. package/kits/builder/skills/pull-work/SKILL.md +19 -0
  158. package/kits/knowledge/adapters/default-store/index.js +38 -0
  159. package/kits/knowledge/adapters/flow-runner/index.js +1620 -0
  160. package/kits/knowledge/adapters/obsidian-store/index.js +36 -6
  161. package/kits/knowledge/docs/store-contract.md +314 -0
  162. package/kits/knowledge/evals/audit-freshness/suite.test.js +368 -0
  163. package/kits/knowledge/evals/canonicalize-category/suite.test.js +383 -0
  164. package/kits/knowledge/evals/contract-suite/suite.test.js +111 -0
  165. package/kits/knowledge/evals/detect-contradictions/suite.test.js +324 -0
  166. package/kits/knowledge/evals/entities/suite.test.js +40 -0
  167. package/kits/knowledge/evals/glossary-sync/suite.test.js +416 -0
  168. package/kits/knowledge/evals/hygiene-review/suite.test.js +396 -0
  169. package/kits/knowledge/evals/retirement/suite.test.js +145 -0
  170. package/kits/knowledge/flows/audit-freshness.flow.json +44 -0
  171. package/kits/knowledge/flows/canonicalize-category.flow.json +44 -0
  172. package/kits/knowledge/flows/detect-contradictions.flow.json +44 -0
  173. package/kits/knowledge/flows/glossary-sync.flow.json +61 -0
  174. package/kits/knowledge/flows/hygiene-review.flow.json +43 -0
  175. package/kits/knowledge/kit.json +51 -1
  176. package/package.json +13 -4
  177. package/packaging/conformance/README.md +10 -2
  178. package/packaging/conformance/fixtures/evidence-capture--allow-records-command.json +29 -0
  179. package/packaging/conformance/fixtures/stop-goal-fit--block-bundle-disputed-claim.json +29 -0
  180. package/packaging/conformance/fixtures/stop-goal-fit--block-capture-contradicts-claimed-pass.json +30 -0
  181. package/packaging/conformance/fixtures/stop-goal-fit--block-mode.json +23 -0
  182. package/packaging/conformance/fixtures/stop-goal-fit--off-mode.json +24 -0
  183. package/packaging/conformance/fixtures/stop-goal-fit--warn-active-delivery.json +5 -2
  184. package/packaging/conformance/fixtures/stop-goal-fit--warn-no-bundle.json +23 -0
  185. package/packaging/conformance/fixtures/workflow-steering--reground-active-prompt.json +30 -0
  186. package/packaging/conformance/fixtures/workflow-steering--reground-session-start.json +30 -0
  187. package/packaging/conformance/run-conformance.js +1 -1
  188. package/scripts/README.md +2 -1
  189. package/scripts/build-universal-bundles.js +0 -1
  190. package/scripts/ci/mint-attestation.js +221 -0
  191. package/scripts/ci/trust-reconcile.js +545 -0
  192. package/scripts/hooks/config-protection.js +423 -1
  193. package/scripts/hooks/evidence-capture.js +348 -0
  194. package/scripts/hooks/lib/liveness-read.js +113 -0
  195. package/scripts/hooks/run-hook.js +6 -1
  196. package/scripts/hooks/stop-goal-fit.js +1471 -79
  197. package/scripts/hooks/workflow-steering.js +135 -5
  198. package/scripts/install-codex-home.sh +39 -0
  199. package/scripts/install-merge.js +330 -0
  200. package/src/cli/init.ts +218 -20
  201. package/src/cli/validate-workflow-artifacts.ts +18 -2
  202. package/src/cli/verify.ts +100 -0
  203. package/src/cli/workflow-sidecar.ts +2093 -84
  204. package/src/cli.ts +2 -3
  205. package/src/index.ts +53 -0
  206. package/src/lib/flow-resolver.ts +284 -0
  207. package/src/tools/build-universal-bundles.ts +34 -21
  208. package/src/tools/generate-context-map.ts +3 -17
  209. package/src/tools/validate-source-tree.ts +44 -104
  210. package/tsconfig.json +1 -0
  211. package/build/src/tools/filter-installed-packs.js +0 -135
  212. package/packaging/packs.json +0 -49
  213. package/scripts/filter-installed-packs.js +0 -2
  214. package/src/tools/filter-installed-packs.ts +0 -132
@@ -17,20 +17,90 @@
17
17
  }
18
18
  ],
19
19
  "skills": [
20
- { "id": "builder.builder-shape", "path": "skills/builder-shape/SKILL.md", "description": "Invoke Builder Kit shape from a raw idea or the current conversation context." },
21
- { "id": "builder.deliver", "path": "skills/deliver/SKILL.md", "description": "Delivery workflow — selected work to delivered code." },
22
- { "id": "builder.design-probe", "path": "skills/design-probe/SKILL.md", "description": "One-question-at-a-time design probing interview." },
23
- { "id": "builder.evidence-gate", "path": "skills/evidence-gate/SKILL.md", "description": "Evaluate whether completed work is trustworthy enough for human review, merge, or release." },
24
- { "id": "builder.execute-plan", "path": "skills/execute-plan/SKILL.md", "description": "Parallel execution primitive — plan artifact path to implemented code." },
25
- { "id": "builder.fix-bug", "path": "skills/fix-bug/SKILL.md", "description": "Bug fix orchestrator — diagnose, plan, execute, review, verify, loop." },
26
- { "id": "builder.idea-to-backlog", "path": "skills/idea-to-backlog/SKILL.md", "description": "Turn raw ideas into shaped, prioritized, executable GitHub issue backlog." },
27
- { "id": "builder.learning-review", "path": "skills/learning-review/SKILL.md", "description": "Capture post-merge learnings and feed them back into backlog, skills, tests, or knowledge." },
28
- { "id": "builder.pickup-probe", "path": "skills/pickup-probe/SKILL.md", "description": "Builder Kit work-item/docs/provider-grounded Probe specialization before plan-work." },
29
- { "id": "builder.plan-work", "path": "skills/plan-work/SKILL.md", "description": "Code planning primitive — goal + directory to structured execution plan." },
30
- { "id": "builder.pull-work", "path": "skills/pull-work/SKILL.md", "description": "Select ready GitHub issues from the executable backlog for implementation." },
31
- { "id": "builder.release-readiness", "path": "skills/release-readiness/SKILL.md", "description": "Decide whether evidence-backed work is ready to merge, release, deploy, or hold." },
32
- { "id": "builder.review-work", "path": "skills/review-work/SKILL.md", "description": "Review primitive — code, security, dependency, architecture critique before verification." },
33
- { "id": "builder.tdd-workflow", "path": "skills/tdd-workflow/SKILL.md", "description": "Test-driven development RED, GREEN, REFACTOR with git checkpoints." },
34
- { "id": "builder.verify-work", "path": "skills/verify-work/SKILL.md", "description": "Verification primitive — session file path to structured evidence verdict." }
20
+ {
21
+ "id": "builder.builder-shape",
22
+ "path": "skills/builder-shape/SKILL.md",
23
+ "description": "Invoke Builder Kit shape from a raw idea or the current conversation context."
24
+ },
25
+ {
26
+ "id": "builder.continue-work",
27
+ "path": "skills/continue-work/SKILL.md",
28
+ "description": "Advance a multi-slice work item to its next increment via a fresh-context handoff, routing the next slice through pull-work + pickup-probe."
29
+ },
30
+ {
31
+ "id": "builder.deliver",
32
+ "path": "skills/deliver/SKILL.md",
33
+ "description": "Delivery workflow — selected work to delivered code."
34
+ },
35
+ {
36
+ "id": "builder.design-probe",
37
+ "path": "skills/design-probe/SKILL.md",
38
+ "description": "One-question-at-a-time design probing interview."
39
+ },
40
+ {
41
+ "id": "builder.evidence-gate",
42
+ "path": "skills/evidence-gate/SKILL.md",
43
+ "description": "Evaluate whether completed work is trustworthy enough for human review, merge, or release."
44
+ },
45
+ {
46
+ "id": "builder.gate-review",
47
+ "path": "skills/gate-review/SKILL.md",
48
+ "description": "Enumerate gate fires and suspected misses from the session trust.bundle; classify as correct/false_block/missed_block; route findings to learning-review; propose advisory fixes."
49
+ },
50
+ {
51
+ "id": "builder.execute-plan",
52
+ "path": "skills/execute-plan/SKILL.md",
53
+ "description": "Parallel execution primitive — plan artifact path to implemented code."
54
+ },
55
+ {
56
+ "id": "builder.fix-bug",
57
+ "path": "skills/fix-bug/SKILL.md",
58
+ "description": "Bug fix orchestrator — diagnose, plan, execute, review, verify, loop."
59
+ },
60
+ {
61
+ "id": "builder.idea-to-backlog",
62
+ "path": "skills/idea-to-backlog/SKILL.md",
63
+ "description": "Turn raw ideas into shaped, prioritized, executable GitHub issue backlog."
64
+ },
65
+ {
66
+ "id": "builder.learning-review",
67
+ "path": "skills/learning-review/SKILL.md",
68
+ "description": "Capture post-merge learnings and feed them back into backlog, skills, tests, or knowledge."
69
+ },
70
+ {
71
+ "id": "builder.pickup-probe",
72
+ "path": "skills/pickup-probe/SKILL.md",
73
+ "description": "Builder Kit work-item/docs/provider-grounded Probe specialization before plan-work."
74
+ },
75
+ {
76
+ "id": "builder.plan-work",
77
+ "path": "skills/plan-work/SKILL.md",
78
+ "description": "Code planning primitive — goal + directory to structured execution plan."
79
+ },
80
+ {
81
+ "id": "builder.pull-work",
82
+ "path": "skills/pull-work/SKILL.md",
83
+ "description": "Select ready GitHub issues from the executable backlog for implementation."
84
+ },
85
+ {
86
+ "id": "builder.release-readiness",
87
+ "path": "skills/release-readiness/SKILL.md",
88
+ "description": "Decide whether evidence-backed work is ready to merge, release, deploy, or hold."
89
+ },
90
+ {
91
+ "id": "builder.review-work",
92
+ "path": "skills/review-work/SKILL.md",
93
+ "description": "Review primitive — code, security, dependency, architecture critique before verification."
94
+ },
95
+ {
96
+ "id": "builder.tdd-workflow",
97
+ "path": "skills/tdd-workflow/SKILL.md",
98
+ "description": "Test-driven development — RED, GREEN, REFACTOR with git checkpoints."
99
+ },
100
+ {
101
+ "id": "builder.verify-work",
102
+ "path": "skills/verify-work/SKILL.md",
103
+ "description": "Verification primitive — session file path to structured evidence verdict."
104
+ }
35
105
  ]
36
106
  }
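The reformatted `skills` array above follows a visible convention: each `id` is `builder.<name>` and each `path` is `skills/<name>/SKILL.md`. The sketch below checks that convention. It is not the package's own validator; `SkillEntry`, `checkSkills`, and the shortened sample descriptions are hypothetical names and data introduced for illustration.

```typescript
// Hypothetical shape of one entry in the kit.json "skills" array.
interface SkillEntry {
  id: string;
  path: string;
  description: string;
}

// Returns human-readable problems; an empty list means the entries satisfy
// the conventions assumed here (id prefix, path layout, unique ids).
function checkSkills(kitName: string, skills: SkillEntry[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  const prefix = `${kitName}.`;
  for (const skill of skills) {
    if (!skill.id.startsWith(prefix)) {
      problems.push(`${skill.id}: id should start with "${prefix}"`);
      continue;
    }
    const name = skill.id.slice(prefix.length);
    if (skill.path !== `skills/${name}/SKILL.md`) {
      problems.push(`${skill.id}: unexpected path ${skill.path}`);
    }
    if (skill.description.trim() === "") {
      problems.push(`${skill.id}: empty description`);
    }
    if (seen.has(skill.id)) {
      problems.push(`${skill.id}: duplicate id`);
    }
    seen.add(skill.id);
  }
  return problems;
}

// The two skills added in this release, with descriptions shortened for the sample.
const sampleSkills: SkillEntry[] = [
  { id: "builder.continue-work", path: "skills/continue-work/SKILL.md", description: "Advance a multi-slice work item." },
  { id: "builder.gate-review", path: "skills/gate-review/SKILL.md", description: "Enumerate gate fires." },
];
const skillProblems = checkSkills("builder", sampleSkills);
console.log(skillProblems.length === 0 ? "skills look consistent" : skillProblems.join("\n"));
```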
@@ -0,0 +1,116 @@
1
+ ---
2
+ name: "continue-work"
3
+ description: "Advance a multi-slice work item to its next increment via a fresh-context handoff. Use when one or more slices of a multi-slice issue have landed and the next undone slice should be built. Routes the next slice through pull-work + pickup-probe (never around the gate), restores prior slices from the durable record as precedent, and hands off in a fresh context per ADR 0013."
4
+ ---
5
+
6
+ # Continue Work
7
+
8
+ Advance a multi-slice work item to its **next increment**, in a fresh context, with the already-landed slices as precedent.
9
+
10
+ This skill is **orchestration, not new machinery.** It composes existing pieces and must never reimplement them:
11
+
12
+ - `pull-work` already owns *selection* plus the `pickup-probe` gate, and already handles "continue / keep going / pick up the next" intents. continue-work routes the chosen slice **through** `pull-work` / `pickup-probe`, never around it.
13
+ - The **resume surface** (#153) owns restoring an item's durable record (`state.json` lifecycle + `handoff.json` next-steps/blockers + plan artifact + `trust.bundle` trust summary) into context. continue-work consumes that surface; it does not re-derive the record.
14
+ - **ADR 0013** establishes `pull-work` as the clean *context-reset* seam: a fresh window per increment, selective compaction, status-gated reuse. continue-work spawns the next increment as a fresh-context workflow seamed at `pull-work`.
15
+ - The **fresh-handoff** pattern (spawn a new context for the next increment) is the delivery mechanism.
16
+
17
+ continue-work ties these together for **one job**: take a multi-slice work item that has at least one slice landed and more remaining, determine the next undone slice, route it through the gate, and hand it off fresh with the prior slices as the model.
18
+
19
+ ## When To Use / When Not
20
+
21
+ **Use** when:
22
+
23
+ - A multi-slice issue has **at least one slice landed and more remaining** (for example #106), and the request is "continue", "pick up the next slice", "keep going on this issue", or "do the next increment".
24
+
25
+ **Do not use** when:
26
+
27
+ - The request is **brand-new work** with nothing landed yet — that is selection from the backlog. Route to `pull-work`.
28
+ - The request is to **resume the *same* interrupted slice** after a restart (same in-flight slice, mid-execution, picking up new hooks/logic) — that is the resume surface (#153), which reconstructs `state.json` + `handoff.json` + plan + `trust.bundle` for the *same* increment. continue-work advances to the *next* increment; it does not re-enter an unfinished one.
29
+
30
+ If the boundary is ambiguous (is this the next slice or the same one?), stop and ask one question before routing. Do not silently assume.
31
+
32
+ ## Boundary (ADR 0014)
33
+
34
+ Home is the **Builder Kit** — developer orchestration over issues, slices, and PRs — alongside `pull-work` and `deliver`. The underlying *fresh-handoff primitive* is generic. If a non-developer kit later needs continuation, **graduate the primitive** per ADR 0014; do not fork continue-work into each kit.
35
+
36
+ ## Inputs
37
+
38
+ - The multi-slice work item (an issue ref) with at least one slice landed and more remaining.
39
+ - Repository or working directory and the owning kit/kit-dir.
40
+ - The durable record for the item when one exists, restored via the resume surface (#153): `state.json`, `handoff.json` next-steps/blockers, the plan artifact, and a one-line `trust.bundle` summary of what is already verified.
41
+ - The merged PRs and commits that reference the issue (the landed slices), available as `git show <sha>` precedent.
42
+ - `AGENTS.md` "Operating discipline (working agreements)" — the operating agreements that travel with every increment.
43
+
44
+ ## Workflow
45
+
46
+ ### 1. Restore the durable record (resume surface, #153)
47
+
48
+ Before doing anything else, restore the item's durable record into context through the resume surface (#153) rather than re-deriving it from chat memory: `state.json` lifecycle, `handoff.json` next-steps and blockers, the plan artifact, and a one-line `trust.bundle` summary of what is already verified. This is *restore for context*, not *resume the slice* — continue-work is advancing to the next increment, so already-verified prior slices stay verified and are not re-proven.
49
+
50
+ If no durable record exists for the item, record that gap and rely on the issue body plus merged PRs/commits as the authoritative history.
51
+
52
+ ### 2. Determine the next undone slice
53
+
54
+ From the issue body plus the merged PRs and commits referencing the issue, determine which slices have **landed** and which is the **next undone slice**.
55
+
56
+ - Read the issue body for the slice list / acceptance breakdown.
57
+ - List merged PRs and commits referencing the issue (`gh pr list --search <issue>`, `git log --grep`) to see which slices are done.
58
+ - The next undone slice is the thinnest remaining meaningful increment. If the remaining work is ambiguous or no longer matches the issue, route back to `idea-to-backlog` instead of inventing scope.
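The routing in step 2, together with the "When To Use / When Not" rules, can be sketched as a small decision function. This is a minimal illustration under stated assumptions: `Slice`, `Routing`, and `routeContinuation` are hypothetical names, and "next" is simplified to the lowest-numbered slice without a merged PR or commit, whereas the skill asks for the thinnest remaining meaningful increment, which may need human judgment.

```typescript
// Hypothetical model of one slice listed in the issue body.
interface Slice {
  index: number; // position in the issue's slice list
  title: string;
}

type Routing =
  | { route: "continue-work"; next: Slice }
  | { route: "pull-work"; reason: string }
  | { route: "none"; reason: string };

// landed holds the indexes of slices that already have a merged PR or commit.
function routeContinuation(slices: Slice[], landed: ReadonlySet<number>): Routing {
  const remaining = slices
    .filter((s) => !landed.has(s.index))
    .sort((a, b) => a.index - b.index);
  const landedCount = slices.length - remaining.length;
  if (landedCount === 0) {
    // Nothing landed yet: this is brand-new work, which belongs to pull-work.
    return { route: "pull-work", reason: "nothing landed yet" };
  }
  const next = remaining[0];
  if (next === undefined) {
    return { route: "none", reason: "every listed slice has landed" };
  }
  return { route: "continue-work", next };
}

// Hypothetical three-slice issue where only slice 1 has landed.
const routing = routeContinuation(
  [
    { index: 1, title: "first slice" },
    { index: 2, title: "second slice" },
    { index: 3, title: "third slice" },
  ],
  new Set([1]),
);
console.log(routing.route);
```

Even when the function returns `continue-work`, the chosen slice still has to pass through `pull-work` and `pickup-probe` before any planning starts.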
59
+
60
+ ### 3. Route the slice THROUGH pull-work + pickup-probe (the gate, never around it)
61
+
62
+ Hand the chosen next slice to `pull-work`, then `pickup-probe`. **Never bypass this gate.** A continuation instruction ("continue", "pick up the next") may justify inspecting the queue, but it must not skip per-item pickup Probe evidence — see pull-work's Pickup Gate ("A stale broad continuation instruction … may allow queue inspection but must not bypass per-item pickup Probe evidence") and its post-merge rule ("automatic continuation … cannot enter planning or execution for the next work item until a fresh pickup Probe record exists for that newly selected item").
63
+
64
+ - `pull-work` enforces board selection, WIP/shepherding, dependency, grouping, freshness (planned-base drift), and worktree logic for the selected slice, and writes the pull-work artifact.
65
+ - `pickup-probe` then challenges the slice against the repository — scope, acceptance quality, provider state, drift, conflict risks — and records the pickup Probe outcome, planning readiness, decisions, unresolved questions, and accepted gaps.
66
+ - continue-work does **not** reimplement either step's logic. It supplies them the next slice and the precedent (prior slices) and consumes their artifacts. The evidence that the gate ran lives in the pull-work / pickup-probe artifact referenced by the handoff (`probe_status`, `probe_artifact_ref`).
67
+
68
+ Do not enter planning or execution until a fresh pickup Probe record exists for this slice.
69
+
70
+ ### 4. Assemble the minimal handoff
71
+
72
+ Once the slice passes the gate, assemble the **minimal handoff** — the smallest durable context a fresh agent needs:
73
+
74
+ - the **slice's spec**: `gh issue view <issue>` (the issue is the spec);
75
+ - the **operating agreements**: `AGENTS.md` "Operating discipline";
76
+ - the **precedent**: the prior slices' merged PRs as the model (`git show` them);
77
+ - the **gate evidence**: the pull-work / pickup-probe artifact ref proving the slice passed the gate.
78
+
79
+ The minimal template it encodes:
80
+
81
+ ```
82
+ Implement [slice N of #ISSUE] in <repo> — the <kit>.
83
+ Read first: AGENTS.md 'Operating discipline'; gh issue view ISSUE (your slice's spec);
84
+ the prior slices as your model (PRs … — git show them).
85
+ Then: scope → minimal impl reusing existing ops (consume-never-fork) → tests stay green +
86
+ cover new code → PR referencing #ISSUE, <kit-dir> only. Don't merge; get CI green and report.
87
+ ```
88
+
89
+ ### 5. Execute in a fresh context (ADR 0013)
90
+
91
+ Hand the minimal template off into a **fresh context** — either spawn a sub-agent for the next increment, or hand the prompt to the operator for a fresh session. Per ADR 0013, the new increment rebuilds its context from durable artifacts (the issue, AGENTS.md, prior PRs, the gate artifact), not from this conversation's history. The fresh-handoff is the delivery seam: a sharp window for the new slice, continuity carried by the durable system.
92
+
93
+ The fresh-context agent runs the standard Builder Kit build for its slice (`plan-work` → `execute-plan` → `review-work` → `verify-work`), which it may reach via `deliver`. continue-work does not re-run those primitives in-line; it sets up the handoff and lets the fresh context execute.
94
+
95
+ ### 6. Verify and report — do not merge
96
+
97
+ After the slice is built:
98
+
99
+ - Confirm the **boundary held** (only `<kit-dir>` changed) and the **suites are green** (the slice's tests cover the new code and nothing regressed).
100
+ - Report: which slice advanced, the gate evidence (pull-work / pickup-probe artifact), the precedent PRs used, the verification result, and the PR.
101
+ - **Do not merge without authorization.** Get CI green and report back.
102
+
103
+ ## Composition Gate
104
+
105
+ continue-work has correctly composed the pieces only when:
106
+
107
+ - the durable record was restored via the resume surface (#153), or a missing-record gap is recorded;
108
+ - the next undone slice was derived from the issue body plus merged PRs/commits, not invented;
109
+ - the slice was routed **through** `pull-work` + `pickup-probe`, with a fresh pickup Probe record (`probe_status`, `probe_artifact_ref`) referenced by the handoff — the gate was not bypassed;
110
+ - the minimal handoff carries the issue spec, `AGENTS.md` operating agreements, and the precedent PRs;
111
+ - the next increment runs in a **fresh context** per ADR 0013;
112
+ - the boundary held, suites are green, and the change is reported without merging.
113
+
114
+ If any item fails, stop and surface the gap rather than proceeding.
115
+
116
+ Refs: #106 (proving ground), #153 (resume surface), #168 / ADR 0013 (context lifecycle), #164 (operating agreements), ADR 0014 (core vs domain-kit boundary).
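The Composition Gate in this skill is a six-item checklist, so it can be pictured as a function that returns the gaps to surface. The sketch below is illustrative only: `CompositionEvidence`, its field names, `compositionGaps`, and the sample values are assumptions, and the real skill records this evidence in workflow artifacts rather than in a single object.

```typescript
// Hypothetical summary of the evidence the Composition Gate asks for.
interface CompositionEvidence {
  recordRestored: boolean;
  missingRecordGapNoted: boolean;
  sliceDerivedFromHistory: boolean;
  probeStatus?: string; // stands in for probe_status; its values are not specified here
  probeArtifactRef?: string; // stands in for probe_artifact_ref
  handoffComplete: boolean; // issue spec + AGENTS.md agreements + precedent PRs
  freshContext: boolean;
  boundaryHeld: boolean;
  suitesGreen: boolean;
  merged: boolean;
}

// Returns one message per failed checklist item; empty means the gate holds.
function compositionGaps(e: CompositionEvidence): string[] {
  const checks: Array<[boolean, string]> = [
    [e.recordRestored || e.missingRecordGapNoted, "durable record neither restored nor recorded as missing"],
    [e.sliceDerivedFromHistory, "next slice not derived from the issue body and merged PRs/commits"],
    [Boolean(e.probeStatus) && Boolean(e.probeArtifactRef), "no fresh pickup Probe record referenced by the handoff"],
    [e.handoffComplete, "handoff lacks the issue spec, AGENTS.md agreements, or precedent PRs"],
    [e.freshContext, "next increment did not run in a fresh context"],
    [e.boundaryHeld && e.suitesGreen && !e.merged, "boundary, suites, or the no-merge rule failed"],
  ];
  return checks.filter(([ok]) => !ok).map(([, gap]) => gap);
}

// Hypothetical run in which the pickup Probe record was never referenced.
const gaps = compositionGaps({
  recordRestored: true,
  missingRecordGapNoted: false,
  sliceDerivedFromHistory: true,
  handoffComplete: true,
  freshContext: true,
  boundaryHeld: true,
  suitesGreen: true,
  merged: false,
});
console.log(gaps);
```

Returning the list of gaps, rather than a single boolean, matches the instruction to stop and surface the gap instead of proceeding.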
@@ -145,9 +145,12 @@ Create the session file with `status: planning`, `iteration: 0`. Use the sidecar
145
145
  npm run workflow:sidecar -- ensure-session \
146
146
  --source-request "<original request>" \
147
147
  --summary "<current delivery goal>" \
148
- --criterion "<acceptance criterion>"
148
+ --criterion "<acceptance criterion>" \
149
+ --flow-id builder.build
149
150
  ```
150
151
 
152
+ `--flow-id builder.build` activates the FlowDefinition-driven path for this session. Producers fire, gates enforce on builder.* claims, and `advance-state` sets `active_step_id` automatically via the `builder.build` phase_map. Keep this flag on all `deliver`-initiated sessions; omit it only for direct ad-hoc requests that are not builder-flow pickup.

153
+
151
154
  ### 2. Plan (plan-work)
152
155
 
153
156
  Invoke plan-work with the goal, directory, session file path, and any pull-work / pickup-probe artifact refs. The plan must include `## Definition Of Done`. Present the plan to the user when a user decision is actually needed; otherwise record the plan artifact and continue automatically to execution.
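The `ensure-session` call changed in this hunk can be assembled programmatically when a wrapper script starts sessions. The sketch below only builds the argument list for `npm`, using the flags that appear in the hunk; `EnsureSessionOptions`, `ensureSessionArgs`, and the sample strings are hypothetical, and spawning the process is left out.

```typescript
// Hypothetical options object; each field maps to one flag shown in the hunk.
interface EnsureSessionOptions {
  sourceRequest: string;
  summary: string;
  criterion: string;
  flowId?: string; // "builder.build" turns on the FlowDefinition-driven path
}

// Builds the argv for: npm run workflow:sidecar -- ensure-session ...
function ensureSessionArgs(o: EnsureSessionOptions): string[] {
  const args = [
    "run", "workflow:sidecar", "--", "ensure-session",
    "--source-request", o.sourceRequest,
    "--summary", o.summary,
    "--criterion", o.criterion,
  ];
  if (o.flowId !== undefined) {
    args.push("--flow-id", o.flowId);
  }
  return args;
}

// Placeholder values; a deliver-initiated session keeps the flow id.
const sessionArgs = ensureSessionArgs({
  sourceRequest: "<original request>",
  summary: "<current delivery goal>",
  criterion: "<acceptance criterion>",
  flowId: "builder.build",
});
console.log(["npm", ...sessionArgs].join(" "));
```

Passing the values as separate array elements, rather than as one shell string, avoids quoting problems when the request text contains spaces or quotes.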
@@ -212,10 +215,36 @@ Record the final local state with `advance-state`. Use `status: verified` only w
212
215
  After review, verification, evidence, and Goal Fit are clean for the same diff:
213
216
 
214
217
  1. Confirm the working tree contains only verified scope.
215
- 2. Commit the verified diff.
216
- 3. Push the branch.
217
- 4. Open or update the provider change record with issue links, closing refs, evidence links, and verification summary, or record an explicit no-provider-change reason.
218
- 5. Wait for provider checks/CI or record missing checks as `NOT_VERIFIED`.
218
+ 2. Publish the session trust bundle to `delivery/` so the CI trust-reconcile job can verify what the agent claimed. `record-release` (via the sidecar writer) does this automatically (best-effort). To publish or re-publish explicitly:
219
+
220
+ ```bash
221
+ npm run workflow:sidecar -- publish-delivery .flow-agents/<slug>
222
+ ```
223
+
224
+ Then force-stage the trust artifacts for the delivery commit. They are gitignored
225
+ by default (they are runtime artifacts written on every local delivery) — `-f`
226
+ commits them deliberately into THIS delivery PR so CI's trust-reconcile job can
227
+ reconcile the session's claims against fresh CI results:
228
+
229
+ ```bash
230
+ git add -f delivery/trust.bundle delivery/trust.checkpoint.json
231
+ ```
232
+
233
+ 3. Commit the verified diff, including the force-added `delivery/trust.bundle` and `delivery/trust.checkpoint.json`.
234
+ 4. Push the branch.
235
+ 5. Open or update the provider change record with issue links, closing refs, evidence links, and verification summary, or record an explicit no-provider-change reason.
236
+ 6. Wait for provider checks/CI or record missing checks as `NOT_VERIFIED`.
237
+ 7. Record the gate claim for the Builder Kit `pr-open` step immediately after the PR is opened or updated:
238
+
239
+ ```bash
240
+ npm run workflow:sidecar -- record-gate-claim .flow-agents/<slug> \
241
+ --expectation pull-request-opened \
242
+ --status pass \
243
+ --summary "PR opened: <pr-url>. Linked to <work-item-ref>, implementation summary and verification evidence attached." \
244
+ --evidence-ref-json '{"kind":"provider","url":"<pr-url>"}'
245
+ ```
246
+
247
+ Use `--status fail` when the PR cannot be opened or when no provider change record is created and the reason is not an accepted no-provider-change path. Use `--status not_verified` when provider access is unavailable and the PR creation cannot be confirmed.
219
248
 
220
249
  Do not invoke `release-readiness` before this gate unless the user explicitly accepts a no-provider-change/no-push path and the reason is recorded in the session artifact. For GitHub, the first `ChangeProvider` adapter example is a PR with PR checks.
221
250
 
@@ -229,7 +258,8 @@ After CI passes and the work is merged or otherwise accepted:
229
258
  4. Promote the relevant plan, decision, evidence, and usage notes into long-lived docs such as `docs/`, `README.md`, or a project decision record.
230
259
  5. Link the long-lived doc back to the provider record, archived plan artifact, or accepted evidence when useful so future readers can see why and how the feature was built.
231
260
  6. Confirm `.flow-agents/` runtime artifacts remain untracked before merge to `main`.
232
- 7. Hand off to `learning-review` when the delivery exposed workflow, testing, documentation, or product follow-up.
261
+ 7. **Clean up the workspace once the merge is confirmed.** First verify the merge actually happened from the provider's own record (a merge commit / `mergedAt`) — not a green check or a watcher's exit code. Then honor the `worktree_lifecycle` recorded by `pull-work` (`retain_until: pr_merged`): remove the isolated worktree (`git worktree remove <path>`) and delete the now-merged branch locally and on the remote. Never delete a branch or worktree before the merge is confirmed — a closed-but-unmerged PR or a prematurely deleted branch loses work. The task is not done while it leaves a stale worktree or merged branch behind.
262
+ 8. Hand off to `learning-review` when the delivery exposed workflow, testing, documentation, or product follow-up.
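A minimal sketch, not taken from the package, of the merge-confirmed cleanup guard described in step 7. The `ProviderChangeRecord` shape and the `cleanupCommands` helper are hypothetical; only `mergedAt`, `git worktree remove <path>`, and the rule "never clean up before the merge is confirmed" come from the text. The default remote name `origin` is an assumption.

```typescript
// Hypothetical provider record: only `mergedAt` is named in the skill text.
interface ProviderChangeRecord {
  mergedAt: string | null; // provider's own merge timestamp; null when unmerged
  state: string;           // e.g. "closed"; deliberately NOT used as proof of merge
}

// Returns the cleanup commands to run, or an empty list when the merge
// is not confirmed by the provider's own record.
function cleanupCommands(
  record: ProviderChangeRecord,
  worktreePath: string,
  branch: string,
  remote: string = "origin", // assumed default remote name
): string[] {
  if (!record.mergedAt) {
    return []; // closed-but-unmerged or still open: keep worktree and branch
  }
  return [
    "git worktree remove " + worktreePath,
    "git branch -d " + branch, // -d refuses branches Git does not consider merged
    "git push " + remote + " --delete " + branch,
  ];
}
```

Returning the commands instead of executing them keeps the decision reviewable; a caller can log them in the session file before running anything.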
233
263
 
234
264
  ### 11. Deliver
235
265
 
@@ -98,6 +98,34 @@ Before stopping, summarize:
98
98
  - Planning readiness.
99
99
  - Recommended next action.
100
100
 
101
+ ## Gate Claims: Builder Kit Design-Probe Step
102
+
103
+ When `design-probe` runs at the Builder Kit `design-probe` flow step and the probe reaches a stop condition with shared understanding or accepted gaps, record the gate claims before handing off to `plan-work`.
104
+
105
+ This applies whether the probe is run directly (generic) or as part of a Builder Kit productized flow. The `pickup-probe` specialization owns the same two claims when it runs instead.
106
+
107
+ **Claim 1 — Pickup readiness** (probe passed, goal fit and scope confirmed):
108
+
109
+ ```bash
110
+ npm run workflow:sidecar -- record-gate-claim .flow-agents/<slug> \
111
+ --expectation pickup-probe-readiness \
112
+ --status pass \
113
+ --summary "Design probe passed: goal fit confirmed, scope aligned, planning readiness verified." \
114
+ --evidence-ref-json '{"kind":"artifact","file":".flow-agents/<slug>/<slug>--<artifact>.md","summary":"Design-probe artifact with decisions, accepted gaps, and planning readiness."}'
115
+ ```
116
+
117
+ **Claim 2 — Probe decisions captured**:
118
+
119
+ ```bash
120
+ npm run workflow:sidecar -- record-gate-claim .flow-agents/<slug> \
121
+ --expectation probe-decisions-or-accepted-gaps \
122
+ --status pass \
123
+ --summary "Probe decisions recorded: decisions made, unresolved questions explicit, planning readiness confirmed." \
124
+ --evidence-ref-json '{"kind":"artifact","file":".flow-agents/<slug>/<slug>--<artifact>.md","summary":"Design-probe artifact with decisions and accepted gaps."}'
125
+ ```
126
+
127
+ Record both claims when shared understanding exists and the next action is `plan-work` or equivalent. Use `--status fail` when stopping due to an unresolved blocker. Skip these claims entirely when `design-probe` is used outside a Builder Kit flow (no active `builder.build` flow step in `current.json`).
128
+
101
129
  ## Boundaries
102
130
 
103
131
  - Do not ask multiple questions in one turn.
@@ -45,7 +45,7 @@ This skill owns orchestration between waves. The contracts own artifact continui
45
45
  - if traceability is missing, update the session file and/or send the plan back for refinement before delegation
46
46
  5. Set session file `status: executing` and use `npm run workflow:sidecar -- advance-state <artifact-dir> --status in_progress --phase execution --summary ... --next-action ...` when the repository provides it
47
47
  6. **Frontend design check:** If any tasks involve UI, CSS, layouts, components, or visual design, read the `frontend-design` skill and include its aesthetics guidelines in the tool-worker prompts for those tasks
48
- 7. Fan out each wave to tool-worker subagents (up to 4 parallel):
48
+ 7. **Before fan-out, run the [Pre-Fan-Out Freshness Re-Check](#pre-fan-out-freshness-re-check) and re-ground if the plan is stale.** Then fan out each wave to tool-worker subagents (up to 4 parallel):
49
49
  - Delegate to the exact `tool-worker` role for every implementation worker. Do not spawn unnamed/default implementation agents.
50
50
  ```
51
51
  Each tool-worker gets:
@@ -69,6 +69,14 @@ This skill owns orchestration between waves. The contracts own artifact continui
69
69
 
70
70
  The orchestrator owns root `state.json` updates. Workers should receive the workflow artifact root explicitly and append agent events under that root instead of inferring the slug or rewriting shared sidecars.
71
71
 
72
+ ## Pre-Fan-Out Freshness Re-Check
73
+
74
+ A plan can go stale between planning and execution — upstream may have advanced, or the plan may simply be old. `plan-work` and `pull-work` stamp and check `planned_base_sha` / `revision_freshness` at planning and pickup; this is the same check at the **execution boundary**, where stale plans actually cause wasted work (parallel workers building what already landed upstream). Run it before any worker starts.
75
+
76
+ - **Always — cheap SHA tripwire.** Re-fetch the target ref and compare the current target SHA to the plan's `planned_base_sha` (per `context/contracts/planning-contract.md`). If the base moved **and** the newer commits/files intersect `planning_scope_refs`, the plan is stale: do not fan out. Route back to `plan-work` (or `pickup-probe` for provider-backed work) to re-ground against the current base — the same `revision_freshness: stale` rule plan-work and pull-work already enforce. Missing `planned_base_sha` is not fresh; record a `NOT_VERIFIED` gap and confirm the base before fan-out.
77
+ - **On plan age — deeper re-survey.** If the plan is older than the staleness window (default ~1h; shorter for fast-moving scope), do the costlier relook the SHA diff cannot: re-survey what now exists in the target area (recently merged PRs, new modules, sibling work) for anything that already does what this plan proposes. If it already shipped upstream, stop and route back to `plan-work` rather than building a duplicate. The SHA tripwire is the precise signal; plan age is the backstop for landscape drift the diff can't see.
78
+ - Record the re-check result (`fresh`, or re-grounded with the compared SHAs and route-back) in the session file before continuing. Worktree/isolation needs stay owned by `pull-work`'s file-overlap decision — don't re-derive them here.
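The re-check above can be sketched as a pure decision function. This is an illustration under stated assumptions, not code from the package: the `FreshnessInput` shape, the `checkFreshness` name, and the `resurvey` result label are hypothetical, and `planning_scope_refs` is treated here as a list of path prefixes, which the text does not confirm.

```typescript
type FreshnessResult = "fresh" | "stale" | "resurvey" | "not_verified";

// Hypothetical input shape; only planned_base_sha, planning_scope_refs,
// and the ~1h default window are named in the skill text.
interface FreshnessInput {
  plannedBaseSha?: string;     // the plan's `planned_base_sha`, if stamped
  currentTargetSha: string;    // target ref SHA after a re-fetch
  changedFiles: string[];      // files touched between the two SHAs
  planningScopeRefs: string[]; // ASSUMED to be path prefixes
  planAgeMs: number;           // time elapsed since the plan was written
  stalenessWindowMs?: number;  // defaults to one hour
}

function checkFreshness(input: FreshnessInput): FreshnessResult {
  const windowMs = input.stalenessWindowMs ?? 60 * 60 * 1000;

  // A missing base SHA is never treated as fresh.
  if (!input.plannedBaseSha) return "not_verified";

  // SHA tripwire: stale only when the base moved AND the change touches scope.
  const baseMoved = input.plannedBaseSha !== input.currentTargetSha;
  const intersectsScope = input.changedFiles.some((file) =>
    input.planningScopeRefs.some((ref) => file.indexOf(ref) === 0),
  );
  if (baseMoved && intersectsScope) return "stale";

  // Age backstop: an old plan needs the deeper landscape re-survey.
  if (input.planAgeMs > windowMs) return "resurvey";

  return "fresh";
}
```

A `stale` result corresponds to routing back to `plan-work` or `pickup-probe`; `resurvey` corresponds to the deeper relook, whose outcome a human or agent still has to judge.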
79
+
72
80
  ## Session File Updates
73
81
 
74
82
  Between each wave, append to the session file:
@@ -0,0 +1,234 @@
1
+ ---
2
+ name: "gate-review"
3
+ description: "Enumerate gate fires and suspected misses from the session's Hachure trust.bundle, classify each as correct/false_block/missed_block using Surface's resolveInquiry to produce canonical InquiryRecords, route findings to learning-review, and propose advisory-only gate/flow fixes. Use mid-session after a goal-fit block or at closeout. Requires ADR 0010 Phase 1 (trust.bundle dual-write) to be present."
4
+ ---
5
+
6
+ # Gate Review
7
+
8
+ Classify gate fires and suspected misses from the session's `trust.bundle` by calling Surface's `resolveInquiry` to produce canonical `InquiryRecord` outputs. Every finding is advisory — proposes a fix, never applies one.
9
+
10
+ ## Contract
11
+
12
+ - **Advisory-only**: proposes fixes, never applies them. No finding may instruct auto-application of any fix.
13
+ - Never writes to `scripts/hooks/` or any flow file.
14
+ - Reads the local `trust.bundle` file only. Does NOT fall back to `command-log.jsonl`, `.goal-fit-block-streak.json`, or `evidence.json` direct reads as primary inputs.
15
+ - If no `trust.bundle` is present at `.flow-agents/<slug>/trust.bundle`, reports `NOT_VERIFIED` and stops. Does not silently degrade to bespoke sidecar reads.
16
+ - Routes all telemetry, `learning.json` writes, and correction routing through `learning-review`. Gate-review never calls `record-learning` directly.
17
+ - Reads `state.json` for lifecycle context only (phase, status). `state.json` is NOT a trust claim per ADR 0010.
18
+ - Reads `context/gate-awareness.md` for vocabulary alignment when available.
19
+ - Classification vocabulary (`correct`, `false_block`, `missed_block`) aligns with `context/gate-awareness.md` sections "Judge Gate Correctness" and "Missed-Block Diagnostic".
20
+ - Uses `@kontourai/surface`'s `resolveInquiry(bundle, inquiry)` to produce canonical `InquiryRecord` outputs per ADR 0003.
21
+ - If `@kontourai/surface` is unavailable, logs a warning and skips output. No bespoke fork fallback.
22
+ - **Builder Kit build flow**: gate-review operates on sessions created by `deliver` or `plan-work` with `--flow-id builder.build`. The session's trust.bundle contains both declared builder.* claims (e.g. `builder.verify.tests`) and legacy workflow.* shadow claims. Gate-review classifies all claims present in the bundle regardless of claimType prefix.
23
+
24
+ ## Inputs
25
+
26
+ - `trust.bundle` at `.flow-agents/<slug>/trust.bundle` (produced by ADR 0010 Phase 1 dual-write in `workflow-sidecar`).
27
+
28
+ **Dependency**: this file is NOT present at `origin/main @ a9b8fd6`; it requires ADR 0010 Phase 1 to be built and merged (owned by `arch/goal-fit-gate-trust-bundle`). Do not begin execution until Phase 1 has landed or a fixture is agreed with that owner.
29
+
30
+ The bundle shape produced by `workflow-sidecar` (schemaVersion 3, source `"flow-agents/workflow-sidecar;statusFunctionVersion=1"`):
31
+ ```json
32
+ {
33
+ "schemaVersion": 3,
34
+ "source": "flow-agents/workflow-sidecar;statusFunctionVersion=1",
35
+ "claims": [
36
+ {
37
+ "id": "<slug>-<checkId>.<surface>.<fieldOrBehavior>",
38
+ "subjectType": "workflow-check",
39
+ "subjectId": "<slug>/<checkId>",
40
+ "surface": "flow-agents.workflow",
41
+ "claimType": "workflow.check.test",
42
+ "fieldOrBehavior": "<check summary>",
43
+ "value": "pass|fail|skip",
44
+ "createdAt": "<ISO-8601>",
45
+ "updatedAt": "<ISO-8601>",
46
+ "status": "verified|disputed|assumed|proposed|rejected|stale|unknown"
47
+ }
48
+ ],
49
+ "evidence": [...],
50
+ "events": [...],
51
+ "policies": []
52
+ }
53
+ ```
54
+
55
+ The claim `status` field is the canonically derived status (computed by `@kontourai/surface.deriveClaimStatus`). Status values and their meaning for gate-review:
56
+
57
+ | `status` | Meaning |
58
+ | --- | --- |
59
+ | `verified` | Claim confirmed by matching evidence; a pass. |
60
+ | `disputed` | Claim contradicted by evidence; a genuine failure. |
61
+ | `assumed` | Claim accepted without direct evidence (e.g. `accepted_gap` criterion, `skip` check). |
62
+ | `proposed` | Claim written but not yet evaluated. |
63
+ | `rejected` | Claim explicitly rejected. |
64
+ | `stale` | Claim data is outdated; gate had stale input. |
65
+ | `unknown` | No event found; claim was never evaluated. |
66
+
67
+ - `state.json` at `.flow-agents/<slug>/state.json` (lifecycle context; not a trust input).
68
+ - Optional: seeded fixture `trust.bundle` path for testing before Phase 1 produces real bundles.
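For readers consuming the bundle from code, the claim shape and status table above can be mirrored as types. This is a sketch derived only from the JSON example shown here, not the published `@kontourai/surface` typings; `BundleClaim`, `isTrustStatus`, and `countByStatus` are hypothetical names.

```typescript
// The seven status values listed in the table above.
const TRUST_STATUSES = [
  "verified", "disputed", "assumed", "proposed", "rejected", "stale", "unknown",
] as const;
type TrustStatus = (typeof TRUST_STATUSES)[number];

// Hypothetical mirror of one entry in the bundle's `claims` array.
interface BundleClaim {
  id: string;
  subjectType: string;
  subjectId: string;
  surface: string;
  claimType: string;
  fieldOrBehavior: string;
  value: "pass" | "fail" | "skip";
  createdAt: string; // ISO-8601
  updatedAt: string; // ISO-8601
  status: TrustStatus;
}

// Runtime guard for status strings read from an untyped JSON file.
function isTrustStatus(value: unknown): value is TrustStatus {
  return typeof value === "string" &&
    (TRUST_STATUSES as readonly string[]).indexOf(value) !== -1;
}

// Tallies claims per status, e.g. for a session summary.
function countByStatus(claims: BundleClaim[]): Record<TrustStatus, number> {
  const counts: Record<TrustStatus, number> = {
    verified: 0, disputed: 0, assumed: 0, proposed: 0,
    rejected: 0, stale: 0, unknown: 0,
  };
  for (const claim of claims) counts[claim.status] += 1;
  return counts;
}
```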
69
+
70
+ ## Artifact Contract
71
+
72
+ Write the following artifacts under `.flow-agents/<slug>/`:
73
+
74
+ ### `<slug>--gate-review.md`
75
+
76
+ Human-readable summary. Sections:
77
+
78
+ - `## Session` — slug, state.json phase/status at review time, trust.bundle schemaVersion
79
+ - `## Gate Fires` — one entry per classified InquiryRecord
80
+ - `## Suspected Misses` — missed_block InquiryRecords; expected criteria absent from the bundle
81
+ - `## Advisory Fixes` — proposed (NOT applied) fixes per InquiryRecord (from `answer.value.advisoryFix`)
82
+ - `## NOT_VERIFIED Gaps` — any classification that could not be completed (e.g. trust.bundle absent, Surface unavailable)
83
+ - `## Routed To` — `learning-review` invocation record
84
+
85
+ ### `gate-review.inquiries.json`
86
+
87
+ Machine-readable array of canonical `InquiryRecord` objects validated against the hachure schema at `node_modules/hachure/schemas/inquiry-record.schema.json` (canonical `$id`: `https://kontourai.io/schemas/surface/inquiry-record.schema.json`).
88
+
89
+ Required fields per schema: `id`, `inquiry`, `outcome`, `resolutionPath`, `inputSnapshot`, `statusFunctionVersion`, `resolvedAt`.
90
+
91
+ The `outcome` field is the canonical Surface value: `"matched"` (claim found and resolved), `"derived"` (rule-based resolution), or `"unsupported"` (no matching claim — absent criterion).
92
+
93
+ The `answer` field carries gate-review's value-add:
94
+ - `answer.status` — canonical `TrustStatus` of the resolved claim (`"unknown"` when absent).
95
+ - `answer.value` — gate-review advisory object:
96
+ ```json
97
+ {
98
+ "calibration": "correct | false_block | missed_block",
99
+ "advisoryFix": "<non-empty advisory string>",
100
+ "gateFired": true,
101
+ "sessionSlug": "<slug>"
102
+ }
103
+ ```
104
+
105
+ The `calibration` field in `answer.value` is derived from `(outcome, answer.status, blockSignal.blocked)`:
106
+ - `"matched"` + `"disputed"|"rejected"` + `blocked=true` → `"correct"`
107
+ - `"matched"` + `"verified"|"assumed"` + `blocked=true` → `"false_block"`
108
+ - `"matched"` + `"stale"|"unknown"|"proposed"` + `blocked=false` → `"missed_block"`
109
+ - `"unsupported"` (absent claim) → `"missed_block"`
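The four derivation rules above can be written as a small function. This is a sketch only: the text names a `deriveGateCalibration` helper but does not show its signature, so `sketchCalibration` below is a hypothetical stand-in. Returning `null` for combinations the rules do not list (including the `"derived"` outcome) is an assumption.

```typescript
type Outcome = "matched" | "derived" | "unsupported";
type TrustStatus =
  | "verified" | "disputed" | "assumed" | "proposed"
  | "rejected" | "stale" | "unknown";
type Calibration = "correct" | "false_block" | "missed_block";

// Hypothetical stand-in for the package's calibration derivation.
function sketchCalibration(
  outcome: Outcome,
  status: TrustStatus,
  blocked: boolean,
): Calibration | null {
  // Absent claim: the gate had nothing to evaluate.
  if (outcome === "unsupported") return "missed_block";
  // "derived" is not covered by the listed rules (assumption: no calibration).
  if (outcome !== "matched") return null;

  if (blocked && (status === "disputed" || status === "rejected")) {
    return "correct"; // genuine failure, block warranted
  }
  if (blocked && (status === "verified" || status === "assumed")) {
    return "false_block"; // blocked despite a passing claim
  }
  if (!blocked && (status === "stale" || status === "unknown" || status === "proposed")) {
    return "missed_block"; // unresolved claim, gate did not fire
  }
  return null; // combination not covered by the listed rules
}
```

For instance, a `matched` outcome with status `verified` and `blocked=true` yields `false_block`, which matches the example record in this section.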
110
+
111
+ The `advisoryFix` in `answer.value` must be non-empty for every record. No record may have `auto_applied: true` or instruct automatic changes.
112
+
113
+ Example record:
114
+ ```json
115
+ {
116
+ "id": "my-session-gr-1",
117
+ "inquiry": {
118
+ "id": "my-session-gr-1",
119
+ "question": "Was gate action on claim my-session/unit-tests... (status: verified) justified?",
120
+ "askedBy": "gate-review",
121
+ "askedAt": "2026-06-24T00:00:00Z",
122
+ "target": { "subjectType": "workflow-check", "subjectId": "my-session/unit-tests", "fieldOrBehavior": "unit tests pass" }
123
+ },
124
+ "outcome": "matched",
125
+ "resolutionPath": { "claimIds": ["my-session/unit-tests.flow-agents.workflow.unit tests pass"] },
126
+ "answer": {
127
+ "status": "verified",
128
+ "value": {
129
+ "calibration": "false_block",
130
+ "advisoryFix": "Investigate why the gate blocked when claim ... has status verified ...",
131
+ "gateFired": true,
132
+ "sessionSlug": "my-session"
133
+ }
134
+ },
135
+ "inputSnapshot": [{ "claimId": "my-session/unit-tests.flow-agents.workflow.unit tests pass", "status": "verified" }],
136
+ "statusFunctionVersion": "1",
137
+ "resolvedAt": "2026-06-24T00:00:00Z"
138
+ }
139
+ ```
140
+
141
+ Invariants:
142
+ - Every record must have a non-empty `answer.value.advisoryFix`.
143
+ - No record may have `auto_applied: true`.
144
+ - `answer.value.calibration` must be one of `"correct"`, `"false_block"`, or `"missed_block"`.
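The three invariants can be checked mechanically before the file is handed to `learning-review`. The sketch below is an illustration, not the package's validator: `GateReviewRecord` and `checkInvariants` are hypothetical, and because the text does not say where an `auto_applied` flag would live, the check looks at both the record and `answer.value` as an assumption.

```typescript
// Hypothetical shapes: only the fields the invariants mention are modeled.
interface AdvisoryValue {
  calibration?: string;
  advisoryFix?: string;
  auto_applied?: boolean; // location of this flag is an assumption
}

interface GateReviewRecord {
  id: string;
  auto_applied?: boolean; // also checked at the top level, as an assumption
  answer?: { value?: AdvisoryValue };
}

const CALIBRATIONS = ["correct", "false_block", "missed_block"];

// Returns one message per violated invariant; an empty array means all hold.
function checkInvariants(records: GateReviewRecord[]): string[] {
  const violations: string[] = [];
  for (const record of records) {
    const value: AdvisoryValue = (record.answer && record.answer.value) || {};
    if (!value.advisoryFix || value.advisoryFix.trim() === "") {
      violations.push(record.id + ": empty advisoryFix");
    }
    if (record.auto_applied === true || value.auto_applied === true) {
      violations.push(record.id + ": auto_applied is true");
    }
    if (CALIBRATIONS.indexOf(value.calibration || "") === -1) {
      violations.push(record.id + ": invalid calibration");
    }
  }
  return violations;
}
```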
145
+
146
+ After writing `gate-review.inquiries.json`, invoke `learning-review` passing the inquiries artifact path as an additional reviewer-notes input. Learning-review writes `learning.json` via `npm run workflow:sidecar -- record-learning`. Do NOT call `record-learning` from gate-review directly.
147
+
148
+ ## Bundle-Claim to InquiryRecord Mapping
149
+
150
+ | Bundle claim condition | outcome | calibration | Rationale |
151
+ | --- | --- | --- | --- |
152
+ | Gate blocked AND claim has `status: "disputed"` or `"rejected"` | `matched` | `correct` | Gate saw a genuine failure; block was warranted. |
153
+ | Gate blocked AND claim has `status: "verified"` or `"assumed"` | `matched` | `false_block` | Gate blocked despite passing claims — acted on stale or incorrect data. |
154
+ | An expected claim is absent from the bundle entirely | `unsupported` | `missed_block` | Gate had no claim to evaluate. |
155
+ | A claim has `status: "stale"` and the gate did NOT block | `matched` | `missed_block` | Stale claim was present but gate did not fire on it. |
156
+ | A claim has `status: "unknown"` with no evidence trace | `matched` | `missed_block` | Claim was never evaluated; gate had no resolved evidence. |
157
+
158
+ Cross-reference with `state.json` phase at the time of the block to confirm the block was in an active workflow phase (not planning or archived).
159
+
160
+ ## Workflow
161
+
162
+ ### Step 1 — Locate trust.bundle
163
+
164
+ Resolve `.flow-agents/<slug>/trust.bundle`. The slug is that of the most recent active session, determined from `current.json` or from the `state.json` with the newest mtime. If the bundle is absent, surface the blocker:
165
+
166
+ ```
167
+ [gate-review] trust.bundle absent — NOT_VERIFIED. Build ADR 0010 Phase 1 first.
168
+ ```
169
+
170
+ Stop and surface the blocker to the user.
171
+
172
+ ### Step 2 — Load Surface and resolve inquiries
173
+
174
+ Run `npm run workflow:sidecar -- gate-review <dir>`.
175
+
176
+ The sidecar writer:
177
+ 1. Loads `@kontourai/surface` (ESM, fail-open dynamic import).
178
+ 2. For each claim in the bundle: builds a `SurfaceInquiry` with a canonical `target` and calls `resolveInquiry(bundle, inquiry)`.
179
+ 3. For each absent expected criterion (from `acceptance.json`): builds a `SurfaceInquiry` targeting the missing claim; `resolveInquiry` returns `"unsupported"`.
180
+ 4. Derives `calibration` from `(outcome, answer.status, blockSignal.blocked)` using `deriveGateCalibration`.
181
+ 5. Composes the `advisoryFix` string using `gateAdvisoryFix`.
182
+ 6. Sets `answer.value = { calibration, advisoryFix, gateFired, sessionSlug }`.
183
+ 7. Strips Surface-internal fields (`identityLinkIds`, `transitiveRuleIds`) to conform to the hachure schema.
184
+ 8. Validates each record against `inquiry-record.schema.json` (fail-open).
185
+ 9. Writes `gate-review.inquiries.json`.
186
+
187
+ ### Step 3 — Classify each InquiryRecord
188
+
189
+ Apply the InquiryRecord calibration mapping:
190
+
191
+ **`correct`** — `outcome: "matched"`, claim `status: "disputed"` or `"rejected"`, `blocked=true`:
192
+ > Gate saw a genuine failure. Block was warranted. Advisory fix: close the gap and re-run.
193
+
194
+ **`false_block`** — `outcome: "matched"`, claim `status: "verified"` or `"assumed"`, `blocked=true`:
195
+ > Gate blocked despite passing claims. Advisory fix: investigate the block trigger; add freshness check.
196
+
197
+ **`missed_block`** — `outcome: "unsupported"` (absent) OR `status: "stale"|"unknown"|"proposed"`, `blocked=false`:
198
+ > Gate had no claim to evaluate or claim was unresolved. Advisory fix: ensure record-evidence writes the claim.
199
+
200
+ ### Step 4 — Write human-readable summary
201
+
202
+ Write `<slug>--gate-review.md` with sections for Session, Gate Fires, Suspected Misses, Advisory Fixes, NOT_VERIFIED Gaps, and Routed To.
203
+
204
+ Optionally use `buildTrustReport(bundle)` + `formatTrustReportSummary(report)` from `@kontourai/surface` for the trust-state summary section.
205
+
206
+ ### Step 5 — Invoke learning-review
207
+
208
+ Pass the `gate-review.inquiries.json` path as additional reviewer notes to `learning-review`. Do not call `record-learning` directly. Learning-review owns the `learning.json` write and correction routing.
209
+
210
+ Example invocation note:
211
+ ```
212
+ gate-review InquiryRecords at .flow-agents/<slug>/gate-review.inquiries.json:
213
+ - <N> record(s): calibration counts
214
+ - gate fired: <true/false>
215
+ - calibration: correct=<n>, false_block=<n>, missed_block=<n>
216
+ Use these as reviewer notes for the learning-review correction record.
217
+ ```
218
+
219
+ ## Gates
220
+
221
+ - **Advisory gate**: every InquiryRecord must have a non-empty `answer.value.advisoryFix`. Gate-review must not complete without one per record.
222
+ - **No-auto-apply gate**: no record's advisory fix may instruct auto-application of any fix. Any proposed fix that starts with "Apply" or "Change" must be rephrased as "Propose" or "Investigate".
223
+ - **Phase-1 dependency gate**: if `trust.bundle` is absent, surface the blocker to the user rather than degrading silently to bespoke sidecars.
224
+ - **Surface gate**: if `@kontourai/surface` is unavailable, log and skip (no fork fallback).
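The no-auto-apply gate's wording rule is simple enough to lint. The helper below is a hypothetical sketch of that check (`needsRephrase` is not a package function); it applies the rule literally, flagging any advisory text whose first word begins with "Apply" or "Change". Ignoring leading whitespace is an assumption.

```typescript
// Flags advisory text that reads as an instruction to apply a change.
// Literal prefix match, per the gate wording; leading whitespace is ignored.
function needsRephrase(advisoryFix: string): boolean {
  const text = advisoryFix.trim();
  return text.indexOf("Apply") === 0 || text.indexOf("Change") === 0;
}
```

A flagged string would then be reworded by the author to open with "Propose" or "Investigate"; the sketch only detects the problem and does not rewrite anything.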
225
+
226
+ ## NOT_VERIFIED Gaps
227
+
228
+ | Gap | Description | Resolution trigger |
229
+ | --- | --- | --- |
230
+ | NV1 | trust.bundle absent at `origin/main @ a9b8fd6` — ADR 0010 Phase 1 not yet built | Phase 1 merged to main by `arch/goal-fit-gate-trust-bundle` owner |
231
+ | NV2 | AC1 seeded-session test fixture cannot be validated against real bundle shape | Phase 1 lands; coordinate with Phase 1 owner on exact bundle file path and claim array shape |
232
+ | NV3 | AC2 false_block / missed_block fixture depends on exact Phase 1 bundle structure | Same as NV2 |
233
+
234
+ AC1 and AC2 are `not_verified` pending ADR 0010 Phase 1. The classification logic is spec-complete against the real bundle shape (confirmed by `workflow-sidecar ensure-session` + `record-evidence` probe). Re-run seeded-session tests after Phase 1 lands.
@@ -105,6 +105,36 @@ Check whether accepted delivery artifacts were promoted into long-lived document
105
105
 
106
106
  Record which follow-ups were created, which were intentionally deferred, and what trigger should revisit deferred work.
107
107
 
108
+ ## Gate Claims: Record Learning Outcomes
109
+
110
+ After `learning.json` is written and the learning verdict is `LEARNED` or `FOLLOWUP_REQUIRED`, record the two gate claims for the Builder Kit `learn` step. These satisfy the `builder.learn.decisions` and `builder.learn.evidence` gate expectations.
111
+
112
+ **Claim 1 — Decision evidence** (durable decisions from the build are recorded):
113
+
114
+ ```bash
115
+ npm run workflow:sidecar -- record-gate-claim .flow-agents/<slug> \
116
+ --expectation decision-evidence \
117
+ --status pass \
118
+ --summary "Build decisions recorded: <decision-count> decisions captured, correction.<needed> recorded." \
119
+ --evidence-ref-json '{"kind":"artifact","file":".flow-agents/<slug>/learning.json","summary":"learning.json with decisions and correction state."}'
120
+ ```
121
+
122
+ **Claim 2 — Learning evidence** (learnings from delivery are recorded for future work):
123
+
124
+ ```bash
125
+ npm run workflow:sidecar -- record-gate-claim .flow-agents/<slug> \
126
+ --expectation learning-evidence \
127
+ --status pass \
128
+ --summary "Learning evidence captured: <outcome> outcome, facts recorded, routing complete." \
129
+ --evidence-ref-json '{"kind":"artifact","file":".flow-agents/<slug>/learning.json","summary":"learning.json with outcomes, facts, and routing."}'
130
+ ```
131
+
132
+ Record both claims immediately after `record-learning` succeeds and artifact validation passes. Use `--status fail` when `record-learning` fails or when learning cannot be captured (verdict `BLOCKED`). Use `--status not_verified` only when the session has no active Builder Kit flow step.
133
+
134
+ When the learning verdict is `FOLLOWUP_REQUIRED`, record both claims with `--status pass` and name the open routing in the summary; the follow-up route is separate from gate satisfaction.
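The status rules for the two `learn` claims can be summarized as a lookup. This is a hedged sketch: `learnGateStatus` and its parameters are hypothetical, and checking the "no active flow step" case before the failure cases is an assumption about precedence that the text does not state.

```typescript
type LearningVerdict = "LEARNED" | "FOLLOWUP_REQUIRED" | "BLOCKED";
type GateStatus = "pass" | "fail" | "not_verified";

// Picks the --status value for both learn-step gate claims.
function learnGateStatus(
  verdict: LearningVerdict,
  recordLearningSucceeded: boolean,
  hasActiveFlowStep: boolean,
): GateStatus {
  // ASSUMED precedence: with no active Builder Kit flow step, nothing can be verified.
  if (!hasActiveFlowStep) return "not_verified";
  // A failed record-learning call or a BLOCKED verdict fails the gate.
  if (!recordLearningSucceeded || verdict === "BLOCKED") return "fail";
  // LEARNED and FOLLOWUP_REQUIRED both pass; follow-up routing is tracked separately.
  return "pass";
}
```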
135
+
136
+
137
+
108
138
  ## Gates
109
139
 
110
140
  - Learning Gate: observed outcome is recorded with evidence.