@kontourai/flow-agents 1.4.0 → 2.0.1

Files changed (184)
  1. package/.github/CODEOWNERS +29 -0
  2. package/.github/actions/trust-verify/action.yml +145 -0
  3. package/.github/workflows/ci.yml +11 -4
  4. package/.github/workflows/kit-gates-demo.yml +2 -2
  5. package/.github/workflows/publish-npm.yml +10 -2
  6. package/.github/workflows/release-please.yml +1 -1
  7. package/.github/workflows/runtime-compat.yml +1 -1
  8. package/.github/workflows/trust-reconcile.yml +113 -0
  9. package/AGENTS.md +13 -0
  10. package/CHANGELOG.md +103 -0
  11. package/CONTRIBUTING.md +4 -4
  12. package/README.md +1 -0
  13. package/agents/tool-planner.json +1 -1
  14. package/build/src/cli/init.js +242 -20
  15. package/build/src/cli/validate-workflow-artifacts.js +19 -2
  16. package/build/src/cli/verify.d.ts +1 -0
  17. package/build/src/cli/verify.js +90 -0
  18. package/build/src/cli/workflow-sidecar.d.ts +316 -8
  19. package/build/src/cli/workflow-sidecar.js +1996 -91
  20. package/build/src/cli.js +2 -3
  21. package/build/src/lib/flow-resolver.d.ts +111 -0
  22. package/build/src/lib/flow-resolver.js +308 -0
  23. package/build/src/tools/build-universal-bundles.js +34 -22
  24. package/build/src/tools/generate-context-map.js +3 -16
  25. package/build/src/tools/validate-source-tree.d.ts +1 -1
  26. package/build/src/tools/validate-source-tree.js +42 -162
  27. package/context/contracts/artifact-contract.md +10 -0
  28. package/context/contracts/delivery-contract.md +1 -0
  29. package/context/contracts/review-contract.md +1 -0
  30. package/context/contracts/verification-contract.md +2 -0
  31. package/context/gate-awareness.md +39 -0
  32. package/context/scripts/hooks/stop-goal-fit.js +632 -70
  33. package/docs/adr/0001-flow-agents-consumes-flow.md +1 -1
  34. package/docs/adr/0002-flow-kits-as-extension-unit.md +1 -1
  35. package/docs/adr/0004-gates-expect-surface-claims.md +2 -0
  36. package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +2 -0
  37. package/docs/adr/0007-skill-audit.md +1 -1
  38. package/docs/adr/0009-canonical-hook-core-kit-boundary.md +95 -0
  39. package/docs/adr/0010-workflow-trust-state-as-hachure-bundle.md +139 -0
  40. package/docs/adr/0011-mcp-posture.md +100 -0
  41. package/docs/adr/0012-agent-coordination-as-liveness-claims.md +119 -0
  42. package/docs/adr/0013-context-lifecycle.md +151 -0
  43. package/docs/adr/0014-core-vs-domain-kit-boundary.md +143 -0
  44. package/docs/adr/0015-flow-flow-agents-boundary-reconciliation.md +120 -0
  45. package/docs/adr/0016-three-hard-boundary-model.md +71 -0
  46. package/docs/adr/0017-anti-gaming-trust-security-model.md +155 -0
  47. package/docs/agent-system-guidebook.md +5 -12
  48. package/docs/context-map.md +4 -10
  49. package/docs/index.md +3 -2
  50. package/docs/integrations/framework-adapter.md +19 -6
  51. package/docs/integrations/index.md +2 -2
  52. package/docs/north-star.md +4 -4
  53. package/docs/operating-layers.md +3 -3
  54. package/docs/plans/adr-0010-phase2-gate-recompute.md +55 -0
  55. package/docs/repository-structure.md +2 -2
  56. package/docs/skills-map.md +1 -0
  57. package/docs/spec/runtime-hook-surface.md +62 -9
  58. package/docs/standards-register.md +3 -3
  59. package/docs/survey-utterance-check.md +1 -1
  60. package/docs/trust-anchor-adoption.md +197 -0
  61. package/docs/verifiable-trust.md +95 -0
  62. package/docs/veritas-integration.md +2 -2
  63. package/docs/workflow-usage-guide.md +69 -0
  64. package/evals/acceptance/DEMO-false-completion.md +144 -0
  65. package/evals/acceptance/demo-cast.sh +92 -0
  66. package/evals/acceptance/demo-false-completion.sh +72 -0
  67. package/evals/acceptance/demo-real-evidence.sh +104 -0
  68. package/evals/acceptance/demo.tape +29 -0
  69. package/evals/acceptance/prove-capture-teeth-declared.sh +335 -0
  70. package/evals/acceptance/prove-capture-teeth.sh +114 -0
  71. package/evals/acceptance/prove-teeth.sh +105 -0
  72. package/evals/ci/antigaming-suite.sh +55 -0
  73. package/evals/ci/run-baseline.sh +2 -0
  74. package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/flows/review.flow.json +26 -0
  75. package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/kit.json +20 -0
  76. package/evals/fixtures/flow-kit-repository/valid-unknown-extension/flows/review.flow.json +26 -0
  77. package/evals/fixtures/flow-kit-repository/valid-unknown-extension/kit.json +18 -0
  78. package/evals/integration/test_builder_step_producers.sh +379 -0
  79. package/evals/integration/test_bundle_install.sh +35 -71
  80. package/evals/integration/test_bundle_lifecycle.sh +39 -2
  81. package/evals/integration/test_captured_fail_reconciliation.sh +820 -0
  82. package/evals/integration/test_checkpoint_signing.sh +489 -0
  83. package/evals/integration/test_claim_lookup.sh +352 -0
  84. package/evals/integration/test_command_log_fork_classification.sh +134 -0
  85. package/evals/integration/test_command_log_integrity.sh +275 -0
  86. package/evals/integration/test_context_map.sh +0 -2
  87. package/evals/integration/test_dual_emit_flow_step.sh +278 -0
  88. package/evals/integration/test_enforcer_expects_driven.sh +281 -0
  89. package/evals/integration/test_evidence_capture_hook.sh +185 -0
  90. package/evals/integration/test_flow_kit_repository.sh +2 -0
  91. package/evals/integration/test_flowdef_session_activation.sh +273 -0
  92. package/evals/integration/test_flowdef_session_history_preservation.sh +250 -0
  93. package/evals/integration/test_gate_bypass_chain.sh +448 -0
  94. package/evals/integration/test_gate_lockdown.sh +1137 -0
  95. package/evals/integration/test_gate_review_inquiry_records.sh +399 -0
  96. package/evals/integration/test_goal_fit_escape_hatch.sh +73 -0
  97. package/evals/integration/test_goal_fit_hook.sh +69 -4
  98. package/evals/integration/test_goal_fit_rederive.sh +263 -0
  99. package/evals/integration/test_install_merge.sh +1176 -0
  100. package/evals/integration/test_kit_identity_trust.sh +393 -0
  101. package/evals/integration/test_mint_attestation.sh +373 -0
  102. package/evals/integration/test_phase_map_and_gate_claim.sh +365 -0
  103. package/evals/integration/test_publish_delivery.sh +269 -0
  104. package/evals/integration/test_reconcile_soundness.sh +528 -0
  105. package/evals/integration/test_resolvefirststep_security.sh +208 -0
  106. package/evals/integration/test_session_resume_roundtrip.sh +286 -0
  107. package/evals/integration/test_trust_checkpoint.sh +325 -0
  108. package/evals/integration/test_trust_reconcile.sh +293 -0
  109. package/evals/integration/test_verify_cli.sh +208 -0
  110. package/evals/integration/test_workflow_sidecar_writer.sh +549 -34
  111. package/evals/lib/node.sh +0 -6
  112. package/evals/run.sh +47 -0
  113. package/evals/static/test_workflow_skills.sh +6 -13
  114. package/install.sh +0 -7
  115. package/integrations/strands-ts/README.md +25 -15
  116. package/integrations/veritas/flow-agents.adapter.json +1 -2
  117. package/kits/builder/flows/build.flow.json +59 -12
  118. package/kits/builder/kit.json +85 -15
  119. package/kits/builder/skills/continue-work/SKILL.md +116 -0
  120. package/kits/builder/skills/deliver/SKILL.md +36 -6
  121. package/kits/builder/skills/design-probe/SKILL.md +28 -0
  122. package/kits/builder/skills/execute-plan/SKILL.md +9 -1
  123. package/kits/builder/skills/gate-review/SKILL.md +234 -0
  124. package/kits/builder/skills/learning-review/SKILL.md +30 -0
  125. package/kits/builder/skills/pickup-probe/SKILL.md +29 -0
  126. package/kits/builder/skills/plan-work/SKILL.md +13 -1
  127. package/kits/builder/skills/pull-work/SKILL.md +19 -0
  128. package/kits/knowledge/adapters/default-store/index.js +38 -0
  129. package/kits/knowledge/adapters/flow-runner/index.js +1620 -0
  130. package/kits/knowledge/adapters/obsidian-store/index.js +36 -6
  131. package/kits/knowledge/docs/store-contract.md +314 -0
  132. package/kits/knowledge/evals/audit-freshness/suite.test.js +368 -0
  133. package/kits/knowledge/evals/canonicalize-category/suite.test.js +383 -0
  134. package/kits/knowledge/evals/contract-suite/suite.test.js +111 -0
  135. package/kits/knowledge/evals/detect-contradictions/suite.test.js +324 -0
  136. package/kits/knowledge/evals/entities/suite.test.js +40 -0
  137. package/kits/knowledge/evals/glossary-sync/suite.test.js +416 -0
  138. package/kits/knowledge/evals/hygiene-review/suite.test.js +396 -0
  139. package/kits/knowledge/evals/retirement/suite.test.js +145 -0
  140. package/kits/knowledge/flows/audit-freshness.flow.json +44 -0
  141. package/kits/knowledge/flows/canonicalize-category.flow.json +44 -0
  142. package/kits/knowledge/flows/detect-contradictions.flow.json +44 -0
  143. package/kits/knowledge/flows/glossary-sync.flow.json +61 -0
  144. package/kits/knowledge/flows/hygiene-review.flow.json +43 -0
  145. package/kits/knowledge/kit.json +51 -1
  146. package/package.json +6 -6
  147. package/packaging/conformance/README.md +10 -2
  148. package/packaging/conformance/fixtures/evidence-capture--allow-records-command.json +29 -0
  149. package/packaging/conformance/fixtures/stop-goal-fit--block-bundle-disputed-claim.json +29 -0
  150. package/packaging/conformance/fixtures/stop-goal-fit--block-capture-contradicts-claimed-pass.json +30 -0
  151. package/packaging/conformance/fixtures/stop-goal-fit--block-mode.json +23 -0
  152. package/packaging/conformance/fixtures/stop-goal-fit--off-mode.json +24 -0
  153. package/packaging/conformance/fixtures/stop-goal-fit--warn-active-delivery.json +5 -2
  154. package/packaging/conformance/fixtures/stop-goal-fit--warn-no-bundle.json +23 -0
  155. package/packaging/conformance/fixtures/workflow-steering--reground-active-prompt.json +30 -0
  156. package/packaging/conformance/fixtures/workflow-steering--reground-session-start.json +30 -0
  157. package/packaging/conformance/run-conformance.js +1 -1
  158. package/scripts/README.md +2 -1
  159. package/scripts/build-universal-bundles.js +0 -1
  160. package/scripts/ci/mint-attestation.js +221 -0
  161. package/scripts/ci/trust-reconcile.js +545 -0
  162. package/scripts/hooks/config-protection.js +423 -1
  163. package/scripts/hooks/evidence-capture.js +348 -0
  164. package/scripts/hooks/lib/liveness-read.js +113 -0
  165. package/scripts/hooks/run-hook.js +6 -1
  166. package/scripts/hooks/stop-goal-fit.js +1524 -79
  167. package/scripts/hooks/workflow-steering.js +135 -5
  168. package/scripts/install-codex-home.sh +39 -0
  169. package/scripts/install-merge.js +330 -0
  170. package/scripts/repair-command-log.js +115 -0
  171. package/src/cli/init.ts +218 -20
  172. package/src/cli/validate-workflow-artifacts.ts +18 -2
  173. package/src/cli/verify.ts +100 -0
  174. package/src/cli/workflow-sidecar.ts +2127 -84
  175. package/src/cli.ts +2 -3
  176. package/src/lib/flow-resolver.ts +369 -0
  177. package/src/tools/build-universal-bundles.ts +34 -21
  178. package/src/tools/generate-context-map.ts +3 -17
  179. package/src/tools/validate-source-tree.ts +44 -104
  180. package/build/src/tools/filter-installed-packs.d.ts +0 -2
  181. package/build/src/tools/filter-installed-packs.js +0 -135
  182. package/packaging/packs.json +0 -49
  183. package/scripts/filter-installed-packs.js +0 -2
  184. package/src/tools/filter-installed-packs.ts +0 -132
@@ -0,0 +1,151 @@
1
+ ---
2
+ title: "ADR 0013: Context Lifecycle — Workflow-Boundary Compaction, Freshness-Gated Reuse, and the Learning Split"
3
+ ---
4
+
5
+ # ADR 0013: Context Lifecycle
6
+
7
+ **Date:** 2026-06-25
8
+ **Status:** Accepted as direction (decided with Brian Anderson, 2026-06-25). Implementation phased.
9
+
10
+ ---
11
+
12
+ ## Context
13
+
14
+ A long agent session produces good results partly because the *conversation* holds the
15
+ thread — the corrections, the discoveries, the accumulated repo model. But that thread is
16
+ ephemeral: it does not survive a fresh session, and within a session it degrades the model
17
+ (attention and reasoning fall off as context grows; cost rises). The goal is the *feeling* of
18
+ an infinite session — let it run, or start fresh, and get very similar results — driven by the
19
+ **durable system** (ADRs, issues, trust bundles, `state.json`, the gates, the context-map,
20
+ skills), **not** by the chat history.
21
+
22
+ We already have most of the substrate: durable per-task **trust bundles** keyed by
23
+ `subjectId`; **freshness** (the liveness/duration + commit-window machinery from ADR 0012);
24
+ the **liveness stream** indexing in-progress work; `context-map --check` for repo-structure
25
+ freshness; and the `learning-review` loop. What is missing is the *wiring* that turns these
26
+ into a context lifecycle — and a decision about **what may evolve where**, so the kit stays
27
+ deterministic and reproducible.
28
+
29
+ ## Decision
30
+
31
+ ### 1. The workflow boundary is the context boundary (selective compaction)
32
+
33
+ A workflow is a bounded unit of work; **`pull-work` (the start of a workflow) is the clean
34
+ seam to reset context.** A new workflow rebuilds its context from durable artifacts (the work
35
+ item, ADRs, context-map, prior bundles) — not the prior conversation. The *feel* of an
36
+ infinite session is therefore **a seamless sequence of fresh-context workflows seamed at
37
+ `pull-work`**, with the system carrying continuity: the model stays sharp (fresh window per
38
+ item); the experience stays continuous (durable state).
39
+
40
+ Compaction is **selective, not automatic.** Follow-on work that *shares a lane* with the
41
+ current work benefits from the warm context; unrelated work does not. `pull-work` already
42
+ reads the signal — the liveness lane / `subjectId` / dependency graph — so the rule is:
43
+ **suggest a fresh/compacted context when the new workflow's lane is disjoint from the
44
+ current; keep the context warm when it is a continuation.** `pull-work` *suggests*; the
45
+ operator (or a policy) decides.
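
A minimal sketch of the selective-compaction rule above, assuming a work item can be summarized by its `subjectId`, its liveness lanes, and its dependencies. The `WorkflowRef` shape and the `suggestContext` function are hypothetical names for illustration, not part of `pull-work` or the flow-agents codebase. The function only returns a suggestion; the operator or a policy still decides.

```typescript
// Hypothetical shape: field names are assumptions, not the real liveness schema.
interface WorkflowRef {
  subjectId: string;
  lanes: string[];     // liveness lanes the work item touches
  dependsOn: string[]; // subjectIds this work item depends on
}

type ContextSuggestion = "keep-warm" | "suggest-fresh";

// Suggest a fresh/compacted context only when the next workflow is disjoint
// from the current one; a shared lane or a direct dependency is a continuation.
function suggestContext(current: WorkflowRef, next: WorkflowRef): ContextSuggestion {
  const currentLanes = new Set(current.lanes);
  const sharesLane = next.lanes.some((lane) => currentLanes.has(lane));
  const isContinuation =
    next.subjectId === current.subjectId ||
    next.dependsOn.includes(current.subjectId);
  return sharesLane || isContinuation ? "keep-warm" : "suggest-fresh";
}
```

Treating a dependency edge as a continuation is a design choice in this sketch; a stricter policy could require lane overlap alone.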
46
+
47
+ ### 2. Freshness-gated context reuse (don't rebuild what's fresh; don't trust what's stale)
48
+
49
+ To limit rebuilding context without relying on stale information, **reuse context gated by
50
+ freshness** — the trust primitive applied to context:
51
+
52
+ - the **trust bundle** (per work-item, by `subjectId`) **is** the durable context of that work
53
+ — read it instead of re-deriving;
54
+ - **freshness** is the **stale-guard** — reuse claims that are still fresh; re-derive only what
55
+ has gone stale (e.g., the commit window moved);
56
+ - the **liveness stream** is the index of in-progress work — the entry point to *glean*;
57
+ - `context-map --check` is the same pattern one level up (re-survey only changed structure).
58
+
59
+ **Gleaning in-progress work** (a prior session's, or another developer's once the stream is
60
+ shared per ADR 0012's cross-machine sink) **must respect claim *status*.** In-progress work is
61
+ mid-flight — its claims are `proposed`/`assumed`, not `verified`. Glean the *intent* and the
62
+ *verified* facts; treat unverified claims as provisional. This is the line between "I see what
63
+ they're attempting" (safe) and "I'll build on their unproven conclusions" (the stale-info
64
+ trap). The status field is the guard.
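
The reuse rule above can be sketched as a per-claim decision. This is an illustration under stated assumptions: the `Claim` shape, the `observedAtCommit` field, and modelling the commit window as a set of commit ids are all hypothetical. Only the status values `proposed`, `assumed`, and `verified` come from the text.

```typescript
type ClaimStatus = "proposed" | "assumed" | "verified";

// Hypothetical claim shape; the real trust-bundle schema is owned elsewhere.
interface Claim {
  id: string;
  status: ClaimStatus;
  observedAtCommit: string; // commit the claim was derived against (assumed field)
}

type ReuseDecision = "reuse" | "provisional" | "re-derive";

// Freshness is the stale-guard; status is the guard against building on
// unproven conclusions.
function decideReuse(claim: Claim, commitWindow: ReadonlySet<string>): ReuseDecision {
  if (!commitWindow.has(claim.observedAtCommit)) {
    return "re-derive"; // the commit window moved, so the claim is stale
  }
  return claim.status === "verified" ? "reuse" : "provisional";
}
```

A fresh but unverified claim is returned as `provisional` so a caller can read the intent without treating it as established fact.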
65
+
66
+ ### 3. The learning split — three buckets; the kit does not self-evolve per machine
67
+
68
+ Self-improvement is **not one mechanism**. Conflating these would let the kit mutate its own
69
+ behavior on each user's machine — a thousand divergent kits, the opposite of the open,
70
+ deterministic, reproducible format the product *is*. The split:
71
+
72
+ - **User-project knowledge** (docs, context, the project's `AGENTS.md` about *their* codebase)
73
+ is **data.** The learning loop that captures it **ships in the kit** as a feature, operates
74
+ **per-project**, and is *expected* to diverge.
75
+ - **Kit discipline** (consume-never-fork, the vocabulary rules, "flakiness ⇒ real bug") is
76
+ guidance for *any* agent using the kit, so it ships in the kit — but it is **code/versioned,
77
+ encoded by us, uniform across users.** It does **not** self-modify on user machines; it
78
+ improves when *we* ship a new version.
79
+ - **Model/agent tendencies** (e.g. sycophancy) belong to the **agent**, not the kit or the
80
+ project — a different home (the agent's own disciplines). Encoding a model quirk into a
81
+ shipped kit is a category error.
82
+
83
+ **The invariant:** the kit (including the learning loop) is *uniform and versioned*; what it
84
+ *learns per project* is *data* and diverges. Same app, different user data. The thing that must
85
+ **never** diverge per machine is the kit's *behavior*.
86
+
87
+ ### 4. The self-encode mechanism — the learning loop + a confirmation gate
88
+
89
+ Operating lessons self-encode through the existing loop, **extended to an
90
+ `operating-agreement` claim type, with a propose→confirm gate** (zero-touch self-encoding is
91
+ rejected — it produces a brittle, over-fit, self-contradicting rulebook):
92
+
93
+ 1. **Detect candidates** at workflow close (where `learning-review` runs): scan for *behavior*
94
+ signals — human pushback, redos, reverts, "you already built that," self-critique flags.
95
+ 2. **Propose** (never auto-apply): surface the candidate agreement with its motivating
96
+ evidence.
97
+ 3. **Confirm + distill** — the gate. A human (or a very-high-bar reviewer) accepts / rejects /
98
+ edits and generalizes the instance into a principle. **Without this gate, do not build it.**
99
+ 4. **Encode as a structured claim** (`subject: operating-discipline`, `evidence: [corrections]`,
100
+ `status`, `freshness`) — the human-readable `AGENTS.md` is a *projection* of these claims,
101
+ so it is queryable, versioned, and decayable, not a doc that rots.
102
+ 5. **Apply *relevant*, decay by freshness** — future sessions load agreements **filtered by
103
+ workflow type / lane** (not the whole rulebook), at session/workflow start; unused or
104
+ repeatedly-overridden agreements go stale and flag for review.
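
A rough sketch of steps 2 through 5, assuming an agreement is stored as a structured record. The `subject`, `evidence`, `status`, and freshness fields mirror the claim described above; every other name here (`OperatingAgreement`, `propose`, `confirm`, `selectForSession`, `freshUntil`, the status values) is hypothetical and not the kit's API.

```typescript
// Status values are an assumption for this sketch.
type AgreementStatus = "proposed" | "confirmed" | "rejected";

interface OperatingAgreement {
  subject: "operating-discipline";
  principle: string;
  evidence: string[];  // the corrections that motivated the agreement
  status: AgreementStatus;
  lanes: string[];     // workflow types / lanes it applies to
  freshUntil: number;  // epoch milliseconds; a stand-in for freshness
}

// Step 2: propose, never auto-apply.
function propose(
  principle: string, evidence: string[], lanes: string[], now: number, ttlMs: number,
): OperatingAgreement {
  return {
    subject: "operating-discipline", principle, evidence,
    status: "proposed", lanes, freshUntil: now + ttlMs,
  };
}

// Step 3: the confirmation gate. The reviewer may edit the principle.
function confirm(
  candidate: OperatingAgreement, accepted: boolean, editedPrinciple?: string,
): OperatingAgreement {
  return {
    ...candidate,
    status: accepted ? "confirmed" : "rejected",
    principle: editedPrinciple ?? candidate.principle,
  };
}

// Step 5: load only confirmed agreements for this lane; stale ones are
// flagged for review rather than applied.
function selectForSession(all: OperatingAgreement[], lane: string, now: number) {
  const relevant = all.filter((a) => a.status === "confirmed" && a.lanes.includes(lane));
  return {
    active: relevant.filter((a) => a.freshUntil > now),
    needsReview: relevant.filter((a) => a.freshUntil <= now),
  };
}
```

Because a proposed agreement never passes the `confirmed` filter, nothing reaches a session without going through the gate.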
105
+
106
+ **The escalation ladder** — `correction → advisory context → gate check → merge-readiness
107
+ criterion` — exists in two contexts with *different drivers*, and **neither is the kit
108
+ self-evolving**:
109
+
110
+ - For **kit discipline**, the ladder is **our internal dogfooding dev process** (we run the kit
111
+ on the kit, catch what it does wrong, and the *output is a PR / ADR / version bump* — never
112
+ runtime self-modification).
113
+ - For **user-project agreements**, the kit *offers* users the ability to promote *their*
114
+ agreements to *their* gates — **user-driven configuration of their data**, not the kit
115
+ autonomously adding a gate.
116
+
117
+ ## Consequences
118
+
119
+ - **Stopping stops mattering.** Once a session's operating lessons live in the loop (or shipped
120
+ kit guidance) rather than the chat, a fresh session reproduces the quality — so restart is
121
+ free, and "infinite" is a UX property, not a context-length goal.
122
+ - **Determinism is protected** — the kit's behavior is uniform and versioned; only per-project
123
+ *data* diverges.
124
+ - **Context cost drops** — reuse-gated-by-freshness avoids rebuilding what's still valid and
125
+ avoids trusting what's stale; selective compaction keeps windows small without losing warm
126
+ context for continuations.
127
+ - **Costs (eyes open):** (1) selective-compaction and gleaning need the lane/overlap +
128
+ status signals to be reliable (depends on ADR 0012 maturing); (2) the meta-learning loop
129
+ only automates the *second* occurrence of a lesson — a human still catches each *novel*
130
+ class of drift once (this is permanent, and fine); (3) the propose→confirm gate is human
131
+ effort (kept cheap, but not zero).
132
+
133
+ ## Alternatives Considered
134
+
135
+ - **One long (infinite) session.** Rejected: fights model degradation and cost; the right
136
+ target is *session-independence* (cheap restart, very similar results), not session-immortality.
137
+ - **A hand-written `AGENTS.md` as the end-state.** Rejected as the *end-state*: it is the loop's
138
+ *seed* (initial state), not an alternative to it; left alone it rots.
139
+ - **Zero-touch self-encoding of lessons.** Rejected: without a confirmation gate it over-fits
140
+ and contradicts itself; judgment must gate the rulebook.
141
+ - **A self-evolving kit (per-machine).** Rejected hard: violates the determinism/reproducibility
142
+ thesis — a thousand divergent kits. Kit behavior is versioned; only project *data* diverges.
143
+ - **Always rebuild context fresh / always reuse.** Rejected: rebuild-always is wasteful;
144
+ reuse-always trusts stale data. Freshness gates the choice per claim.
145
+
146
+ ## References
147
+
148
+ - [ADR 0010](./0010-workflow-trust-state-as-hachure-bundle.md) — trust bundle as durable context.
149
+ - [ADR 0012](./0012-agent-coordination-as-liveness-claims.md) — liveness/freshness, `subjectId`
150
+ correlation, the shared stream and cross-machine sink, resumption-via-durable-evidence.
151
+ - `learning-review` skill; `gate-review` / self-critique; `context-map --check`; `pull-work`.
@@ -0,0 +1,143 @@
1
+ ---
2
+ title: "ADR 0014: Flow Agents core vs domain kits — the generic/kit boundary"
3
+ ---
4
+
5
+ # ADR 0014: Flow Agents core vs domain kits
6
+
7
+ **Date:** 2026-06-25
8
+ **Status:** Proposed (decision owner: Brian Anderson). Defines the boundary; code moves are sequenced, not immediate.
9
+ Revised after boundary review: `workflow-sidecar` confirmed **core** (the lifecycle engine, with only a few developer-leaning *defaults* to make kit-extensible); Builder Kit reframed as a **first-class pulled-out kit**; `knowledge`/`release-evidence` cited as already validating the substrate.
10
+
11
+ > **Superseded in part by ADR 0016 (2026-06-26).** The finding that the *bespoke `workflow-sidecar` FSM* is the legitimate core engine is superseded. The core does own a lifecycle **engine**, but per ADR 0016 + 0015 + #183 that engine must be the **FlowDefinition / Resource-Contract-driven** one — the bespoke FSM is a parallel reimplementation of ADR 0005 that retires via ADR 0015's migration. This ADR's boundary *principle* (the agent-blind "dividing test", core vs domain-kit) stands; only the "keep the FSM as core / tweak defaults" conclusion is replaced.
12
+
13
+ ---
14
+
15
+ ## Context
16
+
17
+ Flow Agents is defined (CONTEXT.md) as *"an operating layer that helps agents route natural
18
+ user requests into the right procedures, tools, state, evidence, knowledge, and follow-ups"* —
19
+ a **generic, domain-agnostic** layer. The README states the intended composition model:
20
+ *"domain kits that compose this substrate — a Sales Kit…, a Research Kit…"*. `kits/` already
21
+ holds three: `builder` (developer workflows), `knowledge` (knowledge capture/recall), and
22
+ `release-evidence`.
23
+
24
+ But the structure does **not** reflect this. The entire `agents/` directory is *developer*
25
+ tooling (`tool-code-reviewer`, `tool-verifier`, `tool-worker`, `tool-planner`,
26
+ `tool-explore-*`, `tool-playwright`), and `context/contracts/{review,verification,execution}`
27
+ are *developer* contracts (code-review lanes; build/types/lint/test phases). These live in the
28
+ "core" locations (`agents/`, `context/`) yet are consumed only by the **Builder Kit**
29
+ (`kits/builder/` has no contracts or agents of its own — it consumes core). The non-developer
30
+ `knowledge` kit does not use them.
31
+
32
+ This surfaced while placing two universal disciplines (fail-loud/no-silent-data-loss from #160;
33
+ "a flake is a real defect"): the only "homes" available were developer contracts, which forced
34
+ a generic principle into a developer-specific file (#170). The boundary is **implicit**, and
35
+ the implicit version conflates the generic operating layer with one domain kit.
36
+
37
+ ## Decision
38
+
39
+ ### 1. The boundary principle
40
+
41
+ - **Flow Agents (core) owns generic *mechanisms*** that any domain reuses: the workflow
42
+ lifecycle (work-items, states, phases, transitions), the trust substrate (bundle, claims,
43
+ evidence, policies, freshness, status derivation via Surface), the enforcement gates
44
+ (goal-fit, evidence-capture, reground), liveness/coordination (ADR 0012), routing, durable
45
+ persistence, kit installation/runtime adapters, and the **agent operating disciplines**
46
+ (consume-never-fork, fail-loud, evidence-bearing, freshness-gating).
47
+ - **A domain kit owns the domain *specifics***: its vocabulary, the concrete workflow shape,
48
+ the domain verification/review *criteria*, the domain *tools*, the domain *schema*, and the
49
+ side-effect *adapters*. Builder Kit is the developer domain kit; Sales/Research/Knowledge are
50
+ siblings.
51
+
52
+ ### 2. The clean test
53
+
54
+ > **"Would a non-developer domain kit (e.g. `knowledge`, a Sales Kit) need this?"**
55
+ > Yes → it is generic → **Flow Agents core.** No → it is developer-specific → **Builder Kit.**
56
+
57
+ By this test today: the `tool-*` agents (worker/code-reviewer/verifier/planner), the Builder
58
+ *skills* (plan-work/execute-plan/review-work/verify-work), the code-review lanes, and the
59
+ build/lint/test phases are **Builder Kit**. The lifecycle **engine** (`workflow-sidecar` — it
60
+ writes the trust bundle, advances state, records evidence/claims, emits liveness), the trust
61
+ substrate, the gates, liveness, and the persistence/data-integrity invariants are **core**.
62
+
63
+ The only developer lean *inside* the core engine is in its **defaults**, not its mechanism: the
64
+ code-specific `checkKinds` (`build`/`types`/`lint`/`test`/`browser`) and some vocabulary
65
+ (`init-plan` reads developer-ish; it just means *open a tracked work-item from its defining
66
+ artifact* — create its state/acceptance/handoff/trust.bundle and claim it via liveness). Those
67
+ defaults should become **kit-extensible** (and the vocab can be neutralized, e.g. `init-plan` →
68
+ `init-work`) — a small core cleanup, not a relocation. The engine is core; it is simply not yet
69
+ *exercised* by a non-developer kit (`knowledge`/`release-evidence` took the lighter `flows`
70
+ path), which is a validation gap, not a sign it is Builder-shaped.
71
+
72
+ ### 3. Kits extend, never reimplement
73
+
74
+ Domain kits **consume** the generic mechanisms; they must not fork the lifecycle, the trust
75
+ substrate, or the gates (consume-never-fork, ADR 0008, applied to kit authors). A kit that
76
+ needs different behavior configures or extends the generic mechanism — it does not ship a
77
+ parallel one. This is what keeps "an open format that means the same thing everywhere" true
78
+ across kits.
79
+
80
+ ### 4. Mixed contracts split: generic invariant (core) + domain extension (kit)
81
+
82
+ `review-contract` and `verification-contract` are **mixed**. The split:
83
+
84
+ - **Generic (core):** verify work meets acceptance criteria with evidence; mark `NOT_VERIFIED`
85
+ honestly; **fail-loud, never fail-open** (persistence that silently drops a record is data
86
+ loss, not a degraded mode); **nondeterminism is a defect** (an operation that can pass
87
+ without doing its job is a failure); review against standards with evidence-bearing,
88
+ severity-tagged findings; don't silently pass.
89
+ - **Domain (Builder Kit):** the concrete phases (`build`/`types`/`lint`/`test`/`browser`) and
90
+ review lanes (code quality, security scanning, architecture fit).
91
+
92
+ The two disciplines from #170 are **generic** and belong in the core invariant layer — so
93
+ *every* kit (knowledge persisting a note, a Sales Kit logging to a CRM) inherits them — not in
94
+ the developer `review-contract`. **#170 is re-homed here**, not merged as-placed.
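
To make the fail-loud invariant concrete, here is a small sketch of a persistence wrapper that refuses to report success unless the record can be read back. The `RecordStore` interface, `PersistenceError`, and `persistFailLoud` are hypothetical names for illustration; they are not the knowledge kit's store contract or any flow-agents API.

```typescript
// Hypothetical store interface, assumed for this sketch only.
interface RecordStore {
  put(id: string, body: string): void;
  get(id: string): string | undefined;
}

class PersistenceError extends Error {}

// Fail loud, never fail open: a write that cannot be confirmed is an error,
// not a degraded mode. The read-back check also guards against an operation
// that "passes" without doing its job.
function persistFailLoud(store: RecordStore, id: string, body: string): void {
  store.put(id, body);
  const readBack = store.get(id);
  if (readBack !== body) {
    throw new PersistenceError(`record ${id} was not durably written`);
  }
}
```

The same wrapper would apply whether the caller is a developer workflow or a kit persisting a note, which is the sense in which the invariant is generic.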
95
+
96
+ ## Consequences
97
+
98
+ - **Protects the core value proposition.** "Generic, domain-agnostic operating layer" is the
99
+ moat; a core that secretly assumes code-review/build/test undermines it for the next domain
100
+ kit author. The clean test (§2) becomes a **standing design gate**, not a one-time cleanup.
101
+ - **Re-homes #170** and tells us where future cross-cutting disciplines go.
102
+ - **Builder Kit becomes a first-class, pulled-out kit** — the same shape `knowledge` already
103
+ has (its own `kit.json`, flows, skills, and now agents + contracts) — and an *independently
104
+ valuable* product, not a demo. The developer tools (`agents/`) and the developer halves of the
105
+ mixed contracts move into it; the generic invariants consolidate in core. This touches code the
106
+ Phase-4 agents are active in — **define now, move later, coordinate** (no premature big-bang).
107
+
108
+ ## Alternatives Considered
109
+
110
+ - **Leave the boundary implicit.** Rejected: it leaks developer assumptions into the core and
111
+ blocks clean domain-kit authoring.
112
+ - **Refactor the folders first, define later.** Rejected: moving `agents/` and contracts into
113
+ `kits/builder/` is large and conflicts with active Phase-4 work; the *definition* is cheap and
114
+ must lead.
115
+ - **Fully purify the core now (extract a grand generic verification/review framework).**
116
+ Rejected as **speculative generality** (the consume-never-fork sibling). The *substrate*
117
+ (gates, Surface claims, flows) is **already proven domain-neutral** by shipping kits —
118
+ `knowledge` and `release-evidence` use it with no developer machinery. What is *not* yet
119
+ exercised by a non-developer kit is the **lifecycle engine** (`workflow-sidecar`); a
120
+ Sales/Research kit that actually uses `init-plan → advance-state → record-evidence → liveness`
121
+ is the real validator, and would surface which engine *defaults* (above) are developer-shaped.
122
+ Build it for a real use case, not as architecture theater. Extract only the invariants that are
123
+ already obviously cross-domain (data integrity, evidence honesty, nondeterminism, freshness).
124
+
125
+ ## Product weigh-in (requested)
126
+
127
+ 1. **The boundary is the moat — treat the clean test as a gate.** Every time something lands in
128
+ `agents/` or `context/`, ask "would the knowledge/Sales kit need it?" If no, it's a Builder
129
+ Kit feature wearing a core costume.
130
+ 2. **Build out a non-developer kit that uses the lifecycle engine — it is the cheapest way to
131
+ find the true core.** The substrate is already validated; what isn't is `workflow-sidecar`
132
+ under a non-developer domain. A Sales/Research kit that *reuses* `init-plan → advance-state →
133
+ record-evidence → liveness` will expose which engine defaults are secretly developer-shaped
134
+ and de-risk the refactor — *if* there is a real use case (not architecture theater).
135
+ 3. **Ship the generic disciplines to the core invariant layer regardless** — data-integrity,
136
+ evidence honesty, nondeterminism, freshness are cross-domain today; they should not wait on
137
+ the full refactor.
138
+
139
+ ## References
140
+
141
+ - CONTEXT.md (Flow Agents definition); README (domain-kit direction).
142
+ - ADR 0008 (consume-never-fork), 0010 (trust bundle), 0012 (liveness), 0013 (context lifecycle).
143
+ - #170 (the two disciplines, parked pending this boundary); #160 (the fail-open data-loss).
@@ -0,0 +1,120 @@
1
+ ---
2
+ title: "ADR 0015: Flow / Flow Agents Boundary Reconciliation"
3
+ ---
4
+
5
+ # ADR 0015: Flow / Flow Agents Boundary Reconciliation
6
+
7
+ **Date:** 2026-06-25
8
+ **Status:** Accepted. Tier 0 (#175) shipped; Tier 1 (#176) closed-by-evaluation (+ a found anti-gaming fix, #196); Tier 2 (#177) **reopened and scoped** as the Resource Contract migration (the sidecar FSM IS a parallel reimplementation per ADR 0005 / #183 — see corrected Reassessment); #178/#179 are deferred cross-package work.
9
+ **Parent issue:** #174 (umbrella)
10
+
11
+ ---
12
+
13
+ ## Context
14
+
15
+ ADR 0001 established that Flow Agents *consumes* Flow for generic workflow enforcement
16
+ rather than owning the enforcement kernel. The boundary is owned by ADR 0001. During
17
+ Phase 4 of ADR 0010 (trust.bundle as sole verification artifact), a drift was found:
18
+ `src/cli/workflow-sidecar.ts` contained a bespoke trust-bundle schema validator
19
+ (`tryLoadHachureValidator` / `getHachureValidator` / local `validateTrustBundle`) that
20
+ duplicated logic already owned canonically by `@kontourai/surface`.
21
+
22
+ Surface's `validateTrustBundle` is the canonical owner at the lowest code layer:
23
+ hachure owns the schemas, surface owns the trust computation (including validation),
24
+ flow owns the workflow engine, flow-agents owns product adapters. The bespoke validator
25
+ was a THREE-WAY duplication of that ownership:
26
+
27
+ 1. Hachure's `trust-bundle.schema.json` (the schema source of truth)
28
+ 2. Surface's `validateTrustBundle` (the canonical validator, using those schemas)
29
+ 3. Flow Agents' bespoke AJV + hachure-schema-loading validator (the drift)
30
+
31
+ A survey of flow-agents found that it uses approximately 1 of ~95 flow exports
32
+ (the workflow engine / gate-expectation / run-state kernel). A parallel run-state / gate
33
+ model is being reconciled through a tiered program (see below).
34
+
35
+ ## Decision
36
+
37
+ ### Layered ownership
38
+
39
+ ```
40
+ hachure — schemas (trust-bundle.schema.json, claim, evidence, policy, event)
41
+ surface — trust computation: validateTrustBundle, deriveClaimStatus, resolveInquiry
42
+ flow — workflow engine: Flow Definitions, Runs, steps, gates, transitions
43
+ flow-agents — product/adapters: skills, hooks, sidecar writers, runtime adapters
44
+ ```
45
+
46
+ Flow-agents does not own trust-bundle schema validation. Surface owns it.
47
+
48
+ ### Tier 0 (this PR): consume surface's validateTrustBundle
49
+
50
+ Replaced the bespoke `tryLoadHachureValidator` / `getHachureValidator` / local
51
+ `validateTrustBundle` in `src/cli/workflow-sidecar.ts` with consumption of
52
+ `@kontourai/surface`'s canonical `validateTrustBundle`.
53
+
54
+ **Equivalence verified before swap:** surface's validator is equivalent-or-stronger than
55
+ the bespoke one — it validates the same structural constraints (required fields,
56
+ enum values, schema shape) plus cross-reference integrity (evidence → claim, event →
57
+ claim, event → evidence) that the hachure JSON schema did not enforce. The two validators
57
+ agreed on all nine test cases; surface also rejected two further invalid bundles (dangling
58
+ references) that the bespoke validator accepted.
60
+
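The cross-reference integrity check described above can be illustrated with a minimal sketch. The bundle shape here (`claims`, `evidence`, `events`, `claimId`, `evidenceId`) and the function name `findDanglingRefs` are assumptions made for illustration; they are not the hachure schema or surface's implementation.

```typescript
// Hypothetical, simplified bundle shape; the real hachure schema has more fields.
interface ClaimRef { id: string }
interface EvidenceRef { id: string; claimId: string }
interface EventRef { id: string; claimId: string; evidenceId?: string }
interface BundleSketch {
  claims: ClaimRef[];
  evidence: EvidenceRef[];
  events: EventRef[];
}

// Collect every reference that points at an id absent from the bundle.
function findDanglingRefs(bundle: BundleSketch): string[] {
  const claimIds = new Set(bundle.claims.map((c) => c.id));
  const evidenceIds = new Set(bundle.evidence.map((e) => e.id));
  const errors: string[] = [];

  // evidence -> claim
  for (const ev of bundle.evidence) {
    if (!claimIds.has(ev.claimId)) {
      errors.push(`evidence ${ev.id} -> missing claim ${ev.claimId}`);
    }
  }
  // event -> claim, event -> evidence
  for (const event of bundle.events) {
    if (!claimIds.has(event.claimId)) {
      errors.push(`event ${event.id} -> missing claim ${event.claimId}`);
    }
    if (event.evidenceId !== undefined && !evidenceIds.has(event.evidenceId)) {
      errors.push(`event ${event.id} -> missing evidence ${event.evidenceId}`);
    }
  }
  return errors;
}
```

A bundle can pass a purely structural JSON Schema check (all required fields present, valid enums) and still fail this kind of check, which is consistent with surface rejecting bundles that the bespoke validator accepted.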
61
+ **Return shape preserved:** the public export `validateTrustBundle(bundle) →
62
+ { valid, errors, available }` is preserved. `available` reflects surface presence (surface
63
+ is required per ADR 0010 Phase 4c; fail-open is maintained for diagnostic use). The
64
+ function became `async` because surface is ESM-only and loaded via `import()`; the call
65
+ site in `writeTrustBundle` is already async; the test inline script uses top-level await
66
+ in ES module mode.
67
+
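A hedged sketch of the preserved return shape follows. It assumes surface's validator returns `{ valid, errors }` synchronously, and that fail-open means reporting `valid: true` with `available: false`; neither detail is confirmed by this ADR. The loader is injected so the sketch is self-contained; `validateTrustBundleSketch`, `SurfaceLike`, and `getSurface` are hypothetical names.

```typescript
interface ValidationResult { valid: boolean; errors: string[]; available: boolean }

// Assumed minimal view of the surface module; the real API may differ.
interface SurfaceLike {
  validateTrustBundle(bundle: unknown): { valid: boolean; errors: string[] };
}
type SurfaceLoader = () => Promise<SurfaceLike>;

// Module-level cache, analogous in spirit to the `_surfaceModule` cache named above.
let cachedSurface: SurfaceLike | undefined;

async function getSurface(load: SurfaceLoader): Promise<SurfaceLike | undefined> {
  if (cachedSurface) return cachedSurface;
  try {
    cachedSurface = await load();
    return cachedSurface;
  } catch {
    // Load failure is reported to the caller as "unavailable".
    return undefined;
  }
}

async function validateTrustBundleSketch(
  bundle: unknown,
  load: SurfaceLoader,
): Promise<ValidationResult> {
  const surface = await getSurface(load);
  if (!surface) {
    // Fail-open (assumption): do not block, but flag that nothing was checked.
    return { valid: true, errors: [], available: false };
  }
  const result = surface.validateTrustBundle(bundle);
  return { valid: result.valid, errors: result.errors, available: true };
}
```

In real use the loader would be a dynamic `import()` of `@kontourai/surface`. Callers that previously read `.valid` synchronously now have to `await` the call first, and should check `available` before treating `valid: true` as a real pass.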
68
+ **AJV decision:** AJV and hachure schema loading are retained for `validateInquiryRecord`
69
+ (which validates inquiry-record.schema.json — a separate schema not covered by surface's
70
+ `validateTrustBundle`). Only the trust-bundle AJV duplication was removed.
71
+
72
+ **normalizeSurfaceRefs advisory validation:** the inline advisory validation in
73
+ `normalizeSurfaceRefs` (which validates referenced trust.bundle files) was updated to use
74
+ the cached `_surfaceModule` instead of the bespoke `getHachureValidator`. Fail-open
75
+ behavior is preserved: if surface is not yet loaded when `normalizeSurfaceRefs` runs,
76
+ validation is skipped.
77
+
78
+ ### Tiered reconciliation program (post Tier 0)
79
+
80
+ The broader boundary reconciliation (issue #174) is phased:
81
+
82
+ | Tier | Issue | Scope | Outcome |
83
+ | --- | --- | --- | --- |
84
+ | Tier 0 | #175 | consume surface's `validateTrustBundle`; delete bespoke validator | **DONE** — the one genuine fork removed |
85
+ | Tier 1 | #176 | gate-expectation engine: consume flow's gate evaluation kernel | **CLOSED by evaluation** — the gate already consumes Surface (`deriveClaimStatus`) for re-derivation; residual logic is product-specific gate policy. Scoping it found+fixed a real anti-gaming regression (PR #196). |
86
+ | Tier 2 | #177 | run-state kernel → Resource Contract migration | **REOPENED AND SCOPED** — the cheap `FlowRunState` swap is still churn (original eval correct on that narrow point), but the sidecar FSM IS a parallel reimplementation of ADR 0005's Resource Contract (`state.json→WorkflowRun.status`, `acceptance.json→RunPlan.spec`, `evidence→conditions[].evidenceRefs`). The real convergence (Resource Contract + Flow Definitions, retiring the FSM, #183) is the accepted direction. Scoped as a phased migration (projection → FlowDefinition-backed advance-state → hooks → resume/evals → retire sidecars). `kits/builder/flows/build.flow.json` already exists; the FSM just doesn't consult it. |
87
+ | promotes | #178 | promote liveness / InquiryRecord / run-hook upstream | **Deferred — cross-package** (requires `flow`/`surface` source changes; not doable from this repo). |
88
+ | contracts | #179 | extract generic vocabulary to flow contracts | **Deferred — cross-package.** |
89
+
90
+ ### Reassessment (post Tiers 1–2) — corrected
91
+
92
+ **An earlier version of this Reassessment was too narrow and is corrected here.** It was right that Tier 1's gate computation already consumes Surface, and that a *cheap mechanical `FlowRunState` swap* (Tier 2's original framing) would be pure churn. But it wrongly concluded the sidecar FSM is a *legitimate product-specific layer* and that the program is "essentially resolved." That misses the larger issue documented in #183:
93
+
94
+ `workflow-sidecar.ts`'s state model — the 11 phases, 13 statuses, bespoke `advanceState` guard, and per-session `state.json`/`handoff.json`/`acceptance.json`/`current.json` — **IS a parallel reimplementation of the Kontour Resource Contract (ADR 0005)** at the product level. ADR 0005 defines `WorkflowRun`/`RunPlan`/`SelectedScope`/`Gate` as the durable record shape for exactly this information; the sidecar FSM predates ADR 0005's acceptance and was never migrated. `docs/kontour-resource-contract.md`'s Compatibility Guidance already documents the mapping (`state.json→WorkflowRun.status`, `acceptance.json→RunPlan.spec`, `evidence→conditions[].evidenceRefs`, `handoff.json→WorkflowRun.status`) — i.e. it is a pre-ADR-0005 parallel implementation, not a deliberate product layer. Notably `kits/builder/flows/build.flow.json` (a Builder FlowDefinition, 10 steps / 9 gates) **already exists** — `advance-state` simply doesn't consult it.
95
+
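The mapping above suggests what a Phase 1 projection layer might look like. This is only a sketch: the sidecar and Resource field names (`phase`, `status`, `criteria`, `condition`, `acceptanceCriteria`) are assumptions, not the shapes defined by ADR 0005 or `docs/kontour-resource-contract.md`.

```typescript
// Hypothetical sidecar file contents.
interface SidecarState { phase: string; status: string }
interface SidecarAcceptance { criteria: string[] }
interface SidecarEvidence { claimId: string; condition: string }

// Hypothetical, heavily reduced Resource shapes.
interface WorkflowRunSketch {
  kind: "WorkflowRun";
  status: {
    phase: string;
    state: string;
    conditions: { type: string; evidenceRefs: string[] }[];
  };
}
interface RunPlanSketch { kind: "RunPlan"; spec: { acceptanceCriteria: string[] } }

// Read-only projection: sidecar files in, Resource-shaped records out.
function projectSidecars(
  state: SidecarState,
  acceptance: SidecarAcceptance,
  evidence: SidecarEvidence[],
): { run: WorkflowRunSketch; plan: RunPlanSketch } {
  // evidence -> conditions[].evidenceRefs, grouped by condition type
  const refsByCondition = new Map<string, string[]>();
  for (const item of evidence) {
    const refs = refsByCondition.get(item.condition) ?? [];
    refs.push(item.claimId);
    refsByCondition.set(item.condition, refs);
  }
  const conditions = Array.from(refsByCondition.entries()).map(
    ([type, evidenceRefs]) => ({ type, evidenceRefs }),
  );
  return {
    // state.json -> WorkflowRun.status
    run: { kind: "WorkflowRun", status: { phase: state.phase, state: state.status, conditions } },
    // acceptance.json -> RunPlan.spec
    plan: { kind: "RunPlan", spec: { acceptanceCriteria: acceptance.criteria.slice() } },
  };
}
```

Because the projection only reads the sidecars, it can run alongside the existing FSM without changing its behavior, which fits a projection-first phasing.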
96
+ **Corrected outcome:** Tier 0 done (the one Surface-layer fork removed); Tier 1 closed-by-evaluation (+ the #196 anti-gaming fix); **Tier 2 reopened and scoped** as a phased Builder→Resource-Contract/Flow-Definition migration (see #177): Phase 1 projection layer → Phase 2 FlowDefinition-backed `advance-state` → Phase 3 hooks → Phase 4 resume/evals → Phase 5 retire sidecars → Phase 6 Flow kernel (deferred to #178). Per #183, Builder and Knowledge are the **same** abstraction (Resource Contract over `WorkflowRun`/`Gate`), so this is a prerequisite for new kit authors to have a stable target, not optional cleanup — and the Builder migration must coordinate with the parallel Knowledge work (which already ships Flow Definitions).
97
+
98
+ **Invariant that must survive migration (#183 Finding 2):** `WorkflowRun.status.conditions` are writable summaries; the gate re-derives from Hachure claims via Surface; `conditions[].evidenceRefs` cite claim IDs. **Do not fuse Resource and claim** — the separation (a friendly mutable surface over an un-gameable derived core) is the architecture.
99
+
100
+ ## Consequences
101
+
102
+ - **No bespoke trust-bundle schema validator in flow-agents.** Surface is the canonical
103
+ owner; flow-agents delegates.
104
+ - **Stronger validation.** Surface also validates cross-reference integrity (dangling
105
+ evidence/event references) that the hachure JSON schema did not. Bundles produced by
106
+ `buildTrustBundle` are already reference-consistent, so no regression is expected in
107
+ normal operation; only malformed external inputs are now additionally rejected.
108
+ - **async API.** `validateTrustBundle` is now async (returns `Promise<{valid,errors,available}>`).
109
+ All existing call sites are in async contexts. External consumers of the library export
110
+ must `await` the result.
111
+ - **Surface availability:** surface was already REQUIRED for bundle writes per ADR 0010 4c.
112
+ `available: false` (fail-open) is only reachable in degraded diagnostic environments
113
+ (e.g. `FLOW_AGENTS_SURFACE_UNAVAILABLE=1` test seam) or if surface fails to load.
114
+
115
+ ## References
116
+
117
+ - [ADR 0001](./0001-flow-agents-consumes-flow.md) — Flow Agents consumes Flow; boundary ownership.
118
+ - [ADR 0010](./0010-workflow-trust-state-as-hachure-bundle.md) — trust bundle as workflow trust state; Phase 4c.
119
+ - GitHub issue #174 (umbrella: flow/flow-agents boundary reconciliation)
120
+ - GitHub issue #175 (Tier 0: this PR)
@@ -0,0 +1,71 @@
1
+ ---
2
+ title: "ADR 0016: The Three-Hard-Boundary Model — a FlowDefinition-Driven, Kit-Agnostic Core"
3
+ ---
4
+
5
+ # ADR 0016: The Three-Hard-Boundary Model — a FlowDefinition-Driven, Kit-Agnostic Core
6
+
7
+ **Date:** 2026-06-26
8
+ **Status:** Accepted
9
+ **Supersedes (in part):** ADR 0014's finding that the bespoke `workflow-sidecar` FSM is the legitimate core engine.
10
+ **Builds on:** ADR 0001 (Flow Agents consumes Flow), 0004 (gates expect trust claims), 0005 (Resource Contract), 0007 (Flow/Skill/Kit/Tool), 0009 (canonical hook core/kit boundary), 0015 (Flow↔Flow-Agents reconciliation); synthesis input #183.
11
+
12
+ ---
13
+
14
+ ## Context
15
+
16
+ The boundary between **flow**, **flow-agents**, and **flow-agent-kits** is defined across many ADRs (0001, 0007, 0009, 0014, 0015) but never as one model, and the pieces have drifted:
17
+
18
+ 1. **A real contradiction.** ADR 0014 (Proposed) calls the bespoke `workflow-sidecar` FSM "confirmed core — the legitimate lifecycle engine" needing "a small cleanup, not a relocation." ADR 0015 (Accepted) calls the same FSM "a parallel reimplementation of ADR 0005's Resource Contract" to be retired via a phased migration. Both were written 2026-06-25; they cannot both stand as written.
19
+
20
+ 2. **A load-bearing gap.** No ADR states that the core's gate enforcer and lifecycle driver must be **driven by the active kit's FlowDefinition** rather than hardcode a claim taxonomy. The consequence is live in the code: `scripts/hooks/stop-goal-fit.js` enforces a hardcoded generic taxonomy (`workflow.check.*`, `workflow.critique.review`, `workflow.acceptance.criterion`) with **zero** references to any FlowDefinition, while the kits' FlowDefinitions declare a different, per-step vocabulary (`builder.verify.tests`, `knowledge.ingest.capture`). ADR 0009 narrowed the de-coupling rule to skill/template names and explicitly *blessed* the hardcoded `workflow.*` taxonomy as "core" — sanctioning the very coupling this ADR forbids.
21
+
22
+ 3. **Unresolved ownership.** Claim-taxonomy ownership (generic kinds vs kit-namespaced types) and the cardinality/lifetime parameterization (#183) are decided nowhere binding.
23
+
24
+ We own the full stack (hachure → surface → flow → flow-agents → kits). The boundaries should be **hard**, and the abstractions should let the *demoed* use cases — the Builder delivery workflow and the Knowledge hygiene workflows — run on the same machinery from their FlowDefinitions.
25
+
26
+ ## Decision
27
+
28
+ ### 1. Three hard boundaries (one named model)
29
+
30
+ - **flow** — the **domain-agnostic workflow engine.** Owns the FlowDefinition schema (steps, gates, `expects[]`, `route_back_policy`), gate *evaluation* (`evaluateGate` over expectations, re-derived from the trust layer), transition validation, run-state (`FlowRunState`/Resource Contract run model), route-back, and Flow Reports. It knows nothing about any kit or claim vocabulary; it operates on *whatever a FlowDefinition declares*.
31
+
32
+ - **flow-agents (core)** — the **kit-agnostic execution of a flow inside an agent harness.** Owns the lifecycle *driver* (`advance-state`), the gate *enforcer* (the Stop hook), evidence capture, the trust.bundle producer, the Resource/sidecar projection, the runtime adapters (claude/codex/…), and session machinery. It executes **any** kit's FlowDefinition. "**Core**" in this repo means exactly this layer; align other ADRs' usage to it.
33
+
34
+ - **flow-agent-kits** — the **domains** (builder, knowledge, and future Sales/Research). Each kit **declares**: its FlowDefinition(s) (steps + gates + `expects[]`), its skills/agents (the claim *producers*), and its domain store/adapters. The kit **declares**; the core **executes**. A kit never re-implements the engine or the enforcer.
35
+
36
+ ### 2. Abstraction A (load-bearing) — the core is FlowDefinition-driven
37
+
38
+ The core gate enforcer and lifecycle driver **MUST be driven by the active kit's FlowDefinition.** The enforcer evaluates the claim expectations the kit's FlowDefinition `expects[]` declares for the current gate (re-deriving status via the trust layer); the lifecycle driver validates transitions and reads `route_back_policy` from the FlowDefinition. **The core MUST NOT hardcode a claim taxonomy, step graph, or route-back limit.**
39
+
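As an illustration of Abstraction A, a FlowDefinition-driven enforcer might look roughly like the sketch below. The `FlowDefinitionShape` fields other than `expects`, the `acceptedStatuses` name, the status strings, and `unmetExpectations` itself are assumptions; the `derivedStatus` callback stands in for re-derivation through the trust layer.

```typescript
// Hypothetical, reduced FlowDefinition shape.
interface ExpectationShape { claimType: string; acceptedStatuses: string[] }
interface GateShape { id: string; expects: ExpectationShape[] }
interface FlowDefinitionShape { id: string; gates: GateShape[] }

// Returns the expectations of the given gate that are not currently satisfied.
// No claim namespace is hardcoded: everything comes from the FlowDefinition.
function unmetExpectations(
  flow: FlowDefinitionShape,
  gateId: string,
  derivedStatus: (claimType: string) => string | undefined,
): ExpectationShape[] {
  const gate = flow.gates.find((g) => g.id === gateId);
  if (!gate) {
    throw new Error(`gate ${gateId} is not declared in flow ${flow.id}`);
  }
  return gate.expects.filter((exp) => {
    const status = derivedStatus(exp.claimType);
    return status === undefined || !exp.acceptedStatuses.includes(status);
  });
}
```

With this structure the same function serves a Builder gate expecting `builder.verify.tests` and a Knowledge gate expecting `knowledge.ingest.capture`; only the FlowDefinition passed in differs.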
40
+ The hardcoded `workflow.*` taxonomy in the current `stop-goal-fit.js` and the hardcoded `>= 3` / `phase==="learning"` rules in `advance-state` are **violations to remediate**, not "core contract." (This corrects ADR 0009 §3, which reclassified skill/template names but left the hardcoded taxonomy in place.)
41
+
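The lifecycle-driver half of the same rule can be sketched as follows. The inner field `max_route_backs`, the `next` list on steps, and both function names are hypothetical; the ADR only states that transitions are validated against the FlowDefinition and that the limit comes from `route_back_policy`.

```typescript
// Hypothetical, reduced step graph and policy shapes.
interface StepSketch { id: string; next: string[] }
interface RouteBackPolicySketch { max_route_backs: number }
interface FlowDefinitionSketch {
  steps: StepSketch[];
  route_back_policy: RouteBackPolicySketch;
}

// A transition is valid only if the FlowDefinition declares it.
function canAdvance(flow: FlowDefinitionSketch, from: string, to: string): boolean {
  const step = flow.steps.find((s) => s.id === from);
  return step !== undefined && step.next.includes(to);
}

// The limit is read from the kit's policy instead of a literal such as `>= 3`.
function canRouteBack(flow: FlowDefinitionSketch, routeBacksSoFar: number): boolean {
  return routeBacksSoFar < flow.route_back_policy.max_route_backs;
}
```

A kit that wants a stricter or looser route-back limit then changes its FlowDefinition, not the core.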
42
+ ### 3. Abstraction B — claim-taxonomy ownership
43
+
44
+ The **kit's FlowDefinition is authoritative** for which claims each gate expects. The core derives the generic *kind* of an expectation (a check, a critique, an acceptance, etc.) from the FlowDefinition's expectation metadata; it does not pattern-match a hardcoded namespace. Generic claim **kinds** are flow/core vocabulary; the **binding** of kinds to steps + accepted statuses is the kit's FlowDefinition. (Reconciles ADR 0004's kit-namespaced examples with the core enforcer.)
45
+
46
+ ### 4. Abstraction C — cardinality & lifetime are kit parameters, not new engines
47
+
48
+ Builder and Knowledge are the **same** model at two settings of two parameters (per #183): **cardinality** (Builder = one work-item subject; Knowledge = many records — `SelectedScope` already says "one or many") and **lifetime** (run-scoped vs durable Resources via `ownerReferences`). New kits set these parameters over the same core + FlowDefinition machinery; they do not author new lifecycle engines.
49
+
50
+ ### 5. Abstraction D — Resource/claim separation is the architecture (invariant)
51
+
52
+ Per ADR 0005 + #183 Finding 2: the **Resource Contract** (`WorkflowRun`/`RunPlan`/`status.conditions`) is the run/state model; the bespoke `workflow-sidecar` FSM is a parallel reimplementation that **retires** via ADR 0015's phased migration. Conditions are **writable summaries**; the gate **re-derives** truth from Hachure claims via Surface; `conditions[].evidenceRefs` cite claim IDs. **Resource and claim must not be fused** — the friendly mutable surface over the un-gameable derived core is the whole point.
53
+
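A minimal sketch of the separation invariant, under stated assumptions: `DerivedClaim.status` stands in for a status already re-derived through Surface, the status strings are invented, and `gateSatisfied` is a hypothetical name. The point shown is that the writable `condition.status` is never consulted.

```typescript
// Status here is assumed to have been re-derived by the trust layer.
interface DerivedClaim { id: string; status: string }

// A writable summary on the Resource; `status` may be edited by anyone.
interface ConditionSketch { type: string; status: string; evidenceRefs: string[] }

// The gate decision uses only the claims cited by evidenceRefs.
function gateSatisfied(
  condition: ConditionSketch,
  claimsById: Map<string, DerivedClaim>,
  acceptedStatuses: string[],
): boolean {
  if (condition.evidenceRefs.length === 0) return false; // nothing cited, nothing proven
  return condition.evidenceRefs.every((claimId) => {
    const claim = claimsById.get(claimId);
    return claim !== undefined && acceptedStatuses.includes(claim.status);
  });
}
```

Setting `status: "True"` on a condition therefore changes nothing about the gate outcome; only the cited claims do.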
54
+ ### 6. Resolution of the 0014↔0015 contradiction
55
+
56
+ Both were partly right. The core **owns a lifecycle engine** (0014) — but that engine is the **FlowDefinition / Resource-Contract-driven** one defined here, **not** the bespoke FSM, which retires (0015). ADR 0014's "keep the FSM as core, tweak defaults" finding is superseded by this ADR; its boundary *principle* (the agent-blind "dividing test") stands.
57
+
58
+ ## Consequences
59
+
60
+ - A clear target for the ADR-0015 migration: each phase moves a core mechanism from hardcoded behavior to FlowDefinition-driven behavior, within these boundaries.
61
+ - The first remediation of Abstraction A is the gate enforcer: `stop-goal-fit` should evaluate the active kit's FlowDefinition `expects[]` (still re-deriving via Surface — the anti-gaming property is unchanged, only the *source of expectations* moves).
62
+ - New kits become "write a FlowDefinition + skills + (optionally) a store adapter" — no engine work.
63
+ - **P-d (dual-emit shadow retired):** FlowDefinition-driven sessions now emit ONLY the declared `builder.*` (or kit-namespaced) claim per gate — the `-legacy` workflow.* shadow is removed. The no-flow `workflow.*` primary path in `buildTrustBundle` is the LEGITIMATE home for standalone primitive use (not scaffolding); it is preserved unchanged. Full removal of the no-flow path would require forcing primitives through a default/minimal FlowDefinition — a separate future decision outside this cleanup.
64
+ - Terminology: "core" is fixed to the flow-agents kit-agnostic execution layer; 0001/0009/0014/0015 usage aligns to it.
65
+
66
+ ## References
67
+
68
+ - ADR 0001, 0004, 0005, 0007, 0008, 0009, 0014, 0015
69
+ - `docs/kontour-resource-contract.md` (Compatibility Guidance)
70
+ - Issue #183 (synthesis input — "not a new decision"); #174 (boundary umbrella); #177 (the migration)
71
+ - `scripts/hooks/stop-goal-fit.js` (the Abstraction-A violation to remediate); `kits/builder/flows/build.flow.json`, `kits/knowledge/flows/*.flow.json` (kit FlowDefinitions)