@kontourai/flow-agents 1.4.0 → 2.0.1

Files changed (184)
  1. package/.github/CODEOWNERS +29 -0
  2. package/.github/actions/trust-verify/action.yml +145 -0
  3. package/.github/workflows/ci.yml +11 -4
  4. package/.github/workflows/kit-gates-demo.yml +2 -2
  5. package/.github/workflows/publish-npm.yml +10 -2
  6. package/.github/workflows/release-please.yml +1 -1
  7. package/.github/workflows/runtime-compat.yml +1 -1
  8. package/.github/workflows/trust-reconcile.yml +113 -0
  9. package/AGENTS.md +13 -0
  10. package/CHANGELOG.md +103 -0
  11. package/CONTRIBUTING.md +4 -4
  12. package/README.md +1 -0
  13. package/agents/tool-planner.json +1 -1
  14. package/build/src/cli/init.js +242 -20
  15. package/build/src/cli/validate-workflow-artifacts.js +19 -2
  16. package/build/src/cli/verify.d.ts +1 -0
  17. package/build/src/cli/verify.js +90 -0
  18. package/build/src/cli/workflow-sidecar.d.ts +316 -8
  19. package/build/src/cli/workflow-sidecar.js +1996 -91
  20. package/build/src/cli.js +2 -3
  21. package/build/src/lib/flow-resolver.d.ts +111 -0
  22. package/build/src/lib/flow-resolver.js +308 -0
  23. package/build/src/tools/build-universal-bundles.js +34 -22
  24. package/build/src/tools/generate-context-map.js +3 -16
  25. package/build/src/tools/validate-source-tree.d.ts +1 -1
  26. package/build/src/tools/validate-source-tree.js +42 -162
  27. package/context/contracts/artifact-contract.md +10 -0
  28. package/context/contracts/delivery-contract.md +1 -0
  29. package/context/contracts/review-contract.md +1 -0
  30. package/context/contracts/verification-contract.md +2 -0
  31. package/context/gate-awareness.md +39 -0
  32. package/context/scripts/hooks/stop-goal-fit.js +632 -70
  33. package/docs/adr/0001-flow-agents-consumes-flow.md +1 -1
  34. package/docs/adr/0002-flow-kits-as-extension-unit.md +1 -1
  35. package/docs/adr/0004-gates-expect-surface-claims.md +2 -0
  36. package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +2 -0
  37. package/docs/adr/0007-skill-audit.md +1 -1
  38. package/docs/adr/0009-canonical-hook-core-kit-boundary.md +95 -0
  39. package/docs/adr/0010-workflow-trust-state-as-hachure-bundle.md +139 -0
  40. package/docs/adr/0011-mcp-posture.md +100 -0
  41. package/docs/adr/0012-agent-coordination-as-liveness-claims.md +119 -0
  42. package/docs/adr/0013-context-lifecycle.md +151 -0
  43. package/docs/adr/0014-core-vs-domain-kit-boundary.md +143 -0
  44. package/docs/adr/0015-flow-flow-agents-boundary-reconciliation.md +120 -0
  45. package/docs/adr/0016-three-hard-boundary-model.md +71 -0
  46. package/docs/adr/0017-anti-gaming-trust-security-model.md +155 -0
  47. package/docs/agent-system-guidebook.md +5 -12
  48. package/docs/context-map.md +4 -10
  49. package/docs/index.md +3 -2
  50. package/docs/integrations/framework-adapter.md +19 -6
  51. package/docs/integrations/index.md +2 -2
  52. package/docs/north-star.md +4 -4
  53. package/docs/operating-layers.md +3 -3
  54. package/docs/plans/adr-0010-phase2-gate-recompute.md +55 -0
  55. package/docs/repository-structure.md +2 -2
  56. package/docs/skills-map.md +1 -0
  57. package/docs/spec/runtime-hook-surface.md +62 -9
  58. package/docs/standards-register.md +3 -3
  59. package/docs/survey-utterance-check.md +1 -1
  60. package/docs/trust-anchor-adoption.md +197 -0
  61. package/docs/verifiable-trust.md +95 -0
  62. package/docs/veritas-integration.md +2 -2
  63. package/docs/workflow-usage-guide.md +69 -0
  64. package/evals/acceptance/DEMO-false-completion.md +144 -0
  65. package/evals/acceptance/demo-cast.sh +92 -0
  66. package/evals/acceptance/demo-false-completion.sh +72 -0
  67. package/evals/acceptance/demo-real-evidence.sh +104 -0
  68. package/evals/acceptance/demo.tape +29 -0
  69. package/evals/acceptance/prove-capture-teeth-declared.sh +335 -0
  70. package/evals/acceptance/prove-capture-teeth.sh +114 -0
  71. package/evals/acceptance/prove-teeth.sh +105 -0
  72. package/evals/ci/antigaming-suite.sh +55 -0
  73. package/evals/ci/run-baseline.sh +2 -0
  74. package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/flows/review.flow.json +26 -0
  75. package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/kit.json +20 -0
  76. package/evals/fixtures/flow-kit-repository/valid-unknown-extension/flows/review.flow.json +26 -0
  77. package/evals/fixtures/flow-kit-repository/valid-unknown-extension/kit.json +18 -0
  78. package/evals/integration/test_builder_step_producers.sh +379 -0
  79. package/evals/integration/test_bundle_install.sh +35 -71
  80. package/evals/integration/test_bundle_lifecycle.sh +39 -2
  81. package/evals/integration/test_captured_fail_reconciliation.sh +820 -0
  82. package/evals/integration/test_checkpoint_signing.sh +489 -0
  83. package/evals/integration/test_claim_lookup.sh +352 -0
  84. package/evals/integration/test_command_log_fork_classification.sh +134 -0
  85. package/evals/integration/test_command_log_integrity.sh +275 -0
  86. package/evals/integration/test_context_map.sh +0 -2
  87. package/evals/integration/test_dual_emit_flow_step.sh +278 -0
  88. package/evals/integration/test_enforcer_expects_driven.sh +281 -0
  89. package/evals/integration/test_evidence_capture_hook.sh +185 -0
  90. package/evals/integration/test_flow_kit_repository.sh +2 -0
  91. package/evals/integration/test_flowdef_session_activation.sh +273 -0
  92. package/evals/integration/test_flowdef_session_history_preservation.sh +250 -0
  93. package/evals/integration/test_gate_bypass_chain.sh +448 -0
  94. package/evals/integration/test_gate_lockdown.sh +1137 -0
  95. package/evals/integration/test_gate_review_inquiry_records.sh +399 -0
  96. package/evals/integration/test_goal_fit_escape_hatch.sh +73 -0
  97. package/evals/integration/test_goal_fit_hook.sh +69 -4
  98. package/evals/integration/test_goal_fit_rederive.sh +263 -0
  99. package/evals/integration/test_install_merge.sh +1176 -0
  100. package/evals/integration/test_kit_identity_trust.sh +393 -0
  101. package/evals/integration/test_mint_attestation.sh +373 -0
  102. package/evals/integration/test_phase_map_and_gate_claim.sh +365 -0
  103. package/evals/integration/test_publish_delivery.sh +269 -0
  104. package/evals/integration/test_reconcile_soundness.sh +528 -0
  105. package/evals/integration/test_resolvefirststep_security.sh +208 -0
  106. package/evals/integration/test_session_resume_roundtrip.sh +286 -0
  107. package/evals/integration/test_trust_checkpoint.sh +325 -0
  108. package/evals/integration/test_trust_reconcile.sh +293 -0
  109. package/evals/integration/test_verify_cli.sh +208 -0
  110. package/evals/integration/test_workflow_sidecar_writer.sh +549 -34
  111. package/evals/lib/node.sh +0 -6
  112. package/evals/run.sh +47 -0
  113. package/evals/static/test_workflow_skills.sh +6 -13
  114. package/install.sh +0 -7
  115. package/integrations/strands-ts/README.md +25 -15
  116. package/integrations/veritas/flow-agents.adapter.json +1 -2
  117. package/kits/builder/flows/build.flow.json +59 -12
  118. package/kits/builder/kit.json +85 -15
  119. package/kits/builder/skills/continue-work/SKILL.md +116 -0
  120. package/kits/builder/skills/deliver/SKILL.md +36 -6
  121. package/kits/builder/skills/design-probe/SKILL.md +28 -0
  122. package/kits/builder/skills/execute-plan/SKILL.md +9 -1
  123. package/kits/builder/skills/gate-review/SKILL.md +234 -0
  124. package/kits/builder/skills/learning-review/SKILL.md +30 -0
  125. package/kits/builder/skills/pickup-probe/SKILL.md +29 -0
  126. package/kits/builder/skills/plan-work/SKILL.md +13 -1
  127. package/kits/builder/skills/pull-work/SKILL.md +19 -0
  128. package/kits/knowledge/adapters/default-store/index.js +38 -0
  129. package/kits/knowledge/adapters/flow-runner/index.js +1620 -0
  130. package/kits/knowledge/adapters/obsidian-store/index.js +36 -6
  131. package/kits/knowledge/docs/store-contract.md +314 -0
  132. package/kits/knowledge/evals/audit-freshness/suite.test.js +368 -0
  133. package/kits/knowledge/evals/canonicalize-category/suite.test.js +383 -0
  134. package/kits/knowledge/evals/contract-suite/suite.test.js +111 -0
  135. package/kits/knowledge/evals/detect-contradictions/suite.test.js +324 -0
  136. package/kits/knowledge/evals/entities/suite.test.js +40 -0
  137. package/kits/knowledge/evals/glossary-sync/suite.test.js +416 -0
  138. package/kits/knowledge/evals/hygiene-review/suite.test.js +396 -0
  139. package/kits/knowledge/evals/retirement/suite.test.js +145 -0
  140. package/kits/knowledge/flows/audit-freshness.flow.json +44 -0
  141. package/kits/knowledge/flows/canonicalize-category.flow.json +44 -0
  142. package/kits/knowledge/flows/detect-contradictions.flow.json +44 -0
  143. package/kits/knowledge/flows/glossary-sync.flow.json +61 -0
  144. package/kits/knowledge/flows/hygiene-review.flow.json +43 -0
  145. package/kits/knowledge/kit.json +51 -1
  146. package/package.json +6 -6
  147. package/packaging/conformance/README.md +10 -2
  148. package/packaging/conformance/fixtures/evidence-capture--allow-records-command.json +29 -0
  149. package/packaging/conformance/fixtures/stop-goal-fit--block-bundle-disputed-claim.json +29 -0
  150. package/packaging/conformance/fixtures/stop-goal-fit--block-capture-contradicts-claimed-pass.json +30 -0
  151. package/packaging/conformance/fixtures/stop-goal-fit--block-mode.json +23 -0
  152. package/packaging/conformance/fixtures/stop-goal-fit--off-mode.json +24 -0
  153. package/packaging/conformance/fixtures/stop-goal-fit--warn-active-delivery.json +5 -2
  154. package/packaging/conformance/fixtures/stop-goal-fit--warn-no-bundle.json +23 -0
  155. package/packaging/conformance/fixtures/workflow-steering--reground-active-prompt.json +30 -0
  156. package/packaging/conformance/fixtures/workflow-steering--reground-session-start.json +30 -0
  157. package/packaging/conformance/run-conformance.js +1 -1
  158. package/scripts/README.md +2 -1
  159. package/scripts/build-universal-bundles.js +0 -1
  160. package/scripts/ci/mint-attestation.js +221 -0
  161. package/scripts/ci/trust-reconcile.js +545 -0
  162. package/scripts/hooks/config-protection.js +423 -1
  163. package/scripts/hooks/evidence-capture.js +348 -0
  164. package/scripts/hooks/lib/liveness-read.js +113 -0
  165. package/scripts/hooks/run-hook.js +6 -1
  166. package/scripts/hooks/stop-goal-fit.js +1524 -79
  167. package/scripts/hooks/workflow-steering.js +135 -5
  168. package/scripts/install-codex-home.sh +39 -0
  169. package/scripts/install-merge.js +330 -0
  170. package/scripts/repair-command-log.js +115 -0
  171. package/src/cli/init.ts +218 -20
  172. package/src/cli/validate-workflow-artifacts.ts +18 -2
  173. package/src/cli/verify.ts +100 -0
  174. package/src/cli/workflow-sidecar.ts +2127 -84
  175. package/src/cli.ts +2 -3
  176. package/src/lib/flow-resolver.ts +369 -0
  177. package/src/tools/build-universal-bundles.ts +34 -21
  178. package/src/tools/generate-context-map.ts +3 -17
  179. package/src/tools/validate-source-tree.ts +44 -104
  180. package/build/src/tools/filter-installed-packs.d.ts +0 -2
  181. package/build/src/tools/filter-installed-packs.js +0 -135
  182. package/packaging/packs.json +0 -49
  183. package/scripts/filter-installed-packs.js +0 -2
  184. package/src/tools/filter-installed-packs.ts +0 -132
@@ -41,7 +41,7 @@ Flow Agents owns:
41
41
  - provider settings
42
42
  - runtime adapters
43
43
  - native harness hooks
44
- - useful Flow-backed workflow packs
44
+ - useful Flow-backed workflow Kits
45
45
  - Console views for agent users
46
46
  - local-first setup and runtime exports
47
47
 
@@ -8,6 +8,6 @@ Flow Agents will use **Flow Kit** as the product and implementation term for ins
8
8
 
9
9
  **Status**: Accepted
10
10
 
11
- **Considered Options**: Keeping "pack" was familiar from the current `packaging/packs.json`, but it was too generic and carried plugin-marketplace baggage. Keeping "pack" internally while using "kit" publicly was rejected because the repository is unpublished and a split vocabulary would make the migration harder to understand.
11
+ **Considered Options**: Keeping "pack" was familiar from the then-current `packaging/` pack-composition layer, but it was too generic and carried plugin-marketplace baggage. Keeping "pack" internally while using "kit" publicly was rejected because the repository is unpublished and a split vocabulary would make the migration harder to understand. (That legacy composition layer was subsequently removed outright; the standalone base always installs and Kits carry depth through the Kit Catalog.)
12
12
 
13
13
  **Consequences**: Flow Agents will use `kits/catalog.json` as the Kit Catalog. The first real kit will live under `kits/builder/` with its own `kit.json` and Flow Definitions under `kits/builder/flows/`. The Builder Kit must be installable through the same compliance path as future external Flow Kit repositories.
@@ -8,6 +8,8 @@ Flow-backed kits will model rich gate evidence as claim expectations using the H
8
8
 
9
9
  **Status**: Accepted (updated: vocabulary aligned to Hachure trust.bundle in hachure-align)
10
10
 
11
+ > **Clarified by ADR 0016 (2026-06-26) on claim-taxonomy ownership.** The kit's FlowDefinition `expects[]` is **authoritative** for which claims each gate requires (it declares the kit-namespaced types, e.g. `builder.verify.tests`). The core derives the generic *kind* of an expectation from that metadata; it does not hardcode a claim namespace. Generic kinds are flow/core vocabulary; the binding of kinds → steps + accepted statuses is the kit's FlowDefinition.
12
+
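As a rough illustration of the clarification above, the following sketch shows a gate check driven entirely by a FlowDefinition's `expects[]` rather than by a hardcoded claim namespace. The `Expectation` and `Claim` shapes, the `acceptedStatuses` field, the `verified` status name, and the function `unmetExpectations` are assumptions made for illustration; the real FlowDefinition and trust.bundle schemas are not shown in this diff.

```typescript
// Hypothetical shapes; the real FlowDefinition / trust.bundle schemas may differ.
interface Expectation {
  claimType: string;          // kit-namespaced, e.g. "builder.verify.tests"
  acceptedStatuses: string[]; // statuses the gate accepts for this claim type
}

interface Claim {
  claimType: string;
  status: string;
}

// Return every expectation that no claim satisfies. The enforcer never names a
// claim namespace itself: everything it checks comes from `expects`.
function unmetExpectations(expects: Expectation[], claims: Claim[]): Expectation[] {
  return expects.filter(
    (e) =>
      !claims.some(
        (c) => c.claimType === e.claimType && e.acceptedStatuses.indexOf(c.status) !== -1,
      ),
  );
}

// "verified" is a placeholder status name; "disputed" appears in ADR 0010.
const expects: Expectation[] = [
  { claimType: "builder.verify.tests", acceptedStatuses: ["verified"] },
];

// A disputed claim does not satisfy the expectation, so it is reported as unmet.
const unmet = unmetExpectations(expects, [
  { claimType: "builder.verify.tests", status: "disputed" },
]);
```

Under these assumptions, swapping in a different kit only changes the `expects` data; the checking function stays the same.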
11
13
  **Considered Options**: Provider-aware gate rules were rejected because they would make Flow Definitions know too much about individual tools. Plain evidence strings such as `tests` or `veritas` were rejected because they cannot represent claim type, accepted status, producer authority, transparency gaps, or project-level enforcement overrides cleanly. An earlier version used `kind: "surface.claim"` and `artifact_kind: "TrustReport"/"Trust Snapshot"` — those have been renamed to `kind: "trust.bundle"` and `artifact_kind: "trust.bundle"` to align with the Hachure schema standard that Flow now ships.
12
14
 
13
15
  **Consequences**: Trusted producer mappings belong upstream in Flow project configuration, not Flow Agents runtime configuration. Flow Agents can help author, install, and adapt that configuration for agent runtimes, but CI, framework agents, local CLIs, and humans should all evaluate gates against the same Flow-owned authority model. When hachure is installed as an optional dependency, referenced trust artifacts are validated against hachure's trust-bundle.schema.json at evidence-recording time.
@@ -10,6 +10,8 @@ Date: 2026-05-27
10
10
 
11
11
  Accepted
12
12
 
13
+ > **Migration directive (ADR 0015 / 0016, 2026-06-26):** this contract's "existing sidecars need migration plans" guidance has since been made a concrete, accepted directive — the bespoke `workflow-sidecar` FSM is a parallel reimplementation of this Resource Contract and **retires** via ADR 0015's phased migration (`state.json → WorkflowRun.status`, etc.). The Resource Contract is the run/state model; the FSM is not a legitimate parallel layer. See ADR 0016 for the unified three-boundary model.
14
+
13
15
  ## Context
14
16
 
15
17
  Kontour products need durable records that humans, agents, runtime adapters, provider integrations, CLI tools, CI systems, evals, and future control planes can all inspect without relying on hidden chat context. Kubernetes, Tekton, and Argo have converged on versioned resource shapes with metadata, desired state, observed status, and status conditions; that shape is familiar, toolable, and a good fit for workflow state. OpenTelemetry GenAI conventions are also emerging for agent observability, so Kontour products should not invent isolated telemetry concepts where compatible conventions already exist.
@@ -107,6 +107,6 @@ The dispositions in this audit table were implemented in PR #62:
107
107
  - `src/tools/build-universal-bundles.ts`: `collectAllSkills()` function added; bundle builders now collect skills from both `skills/` (tool-skills) and kit-declared `skills` arrays. Runtime bundles (`.claude/skills/`, `.codex/skills/`, etc.) include all kit-owned skills unchanged.
108
108
  - `src/tools/generate-context-map.ts`: `allSkillPaths()` function added; context map generation now includes kit-owned skills.
109
109
  - `src/tools/validate-source-tree.ts`: `validateLegacyRefs()` updated to skip legacy-ref matches that resolve as declared kit-owned asset subpaths.
110
- - `packaging/packs.json`: Skill entries limited to the 6 remaining tool-skills in `skills/`. Kit-owned skills are no longer listed in packs (they're always included in the bundle as kit assets).
110
+ - The legacy `packaging/` pack-composition manifest (since removed): at the time of this audit, its skill entries were limited to the 6 remaining tool-skills in `skills/`, and kit-owned skills were already excluded (they ship as kit assets). That whole composition layer was later removed outright; the standalone `skills/`/`agents/`/`powers/` base always installs and Kits carry depth through the Kit Catalog.
111
111
  - `flow-agents kit inspect kits/builder` now reports `k1: true` (skills present).
112
112
  - `flow-agents kit inspect kits/knowledge` now reports `k1: true` (skills present).
@@ -0,0 +1,95 @@
1
+ ---
2
+ title: "ADR 0009: Canonical Hook Core/Kit Boundary"
3
+ ---
4
+
5
+ # ADR 0009: Canonical Hook Core/Kit Boundary
6
+
7
+ **Date:** 2026-06-23
8
+ **Status:** Accepted (decided with Brian Anderson, 2026-06-23)
9
+
10
+ > **Extended/corrected by ADR 0016 (2026-06-26).** This ADR de-coupled the canonical hook from Builder skill/template *names* but left the hardcoded `workflow.*` claim taxonomy in the hook as "core contract." ADR 0016 (Abstraction A) corrects that: the gate enforcer must be **driven by the active kit's FlowDefinition `expects[]`**, not a hardcoded taxonomy. The hook's hardcoded `workflow.check.*`/`workflow.critique.review`/`workflow.acceptance.criterion` filtering is reclassified as a **violation to remediate** (same treatment this ADR gave `DELIVERY_TYPES`). The anti-gaming re-derivation via Surface is unchanged — only the *source* of the expectations moves to the FlowDefinition.
11
+
12
+ ---
13
+
14
+ ## Context
15
+
16
+ Flow Agents ships **canonical hooks** — `stop-goal-fit`, `workflow-steering`,
17
+ `evidence-capture`, `config-protection`, `quality-gate` — wired into *every* runtime
18
+ bundle as kit-neutral policy. ADRs 0007 (skill/tool/kit boundary) and 0008 (kit
19
+ operation boundary) classify skills, tools, kits, and kit *operations*, but they
20
+ never explicitly placed **hooks**. Hooks are a fourth thing: the always-on policy
21
+ layer the runtime fires, independent of which kit is active.
22
+
23
+ Dogfooding block-by-default surfaced a boundary violation: `stop-goal-fit` (a
24
+ canonical hook shipped to all runtimes) hard-codes **Builder-Kit extension
25
+ semantics** — `DELIVERY_TYPES` (the Builder skill names deliver/fix-bug/
26
+ execute-plan/verify-work) plus `--deliver`/etc. artifact detection, and the Builder
27
+ deliver-template section names (`Definition Of Done` / `Goal Fit Gate` /
28
+ `Final Acceptance`). A hook that runs under *any* kit should not know *one* kit's
29
+ skills or markdown template.
30
+
31
+ An open sub-question rode along: is the workflow-state **phase set**
32
+ (`idea → … → done`, enumerated in `schemas/workflow-state.schema.json`) a canonical
33
+ core lifecycle, or Builder-Kit's own?
34
+
35
+ ## Decision
36
+
37
+ ### 1. Apply ADR 0008's dividing test to canonical hooks
38
+
39
+ A canonical hook **may read the kit-neutral core workflow contract**: the
40
+ `schemas/workflow-*.schema.json` shapes (`state` status/phase/next_action,
41
+ `evidence` verdict/checks, `acceptance` criteria) and `command-log.jsonl`. These are
42
+ container/contract-level (the *WHAT*), owned in core `schemas/` + `context/contracts/`
43
+ and agentless-capable (ADR 0001).
44
+
45
+ A canonical hook **may NOT interpret kit-extension semantics**: specific skill names,
46
+ a kit's artifact-template section conventions, or any per-kit vocabulary. This is
47
+ ADR 0008's *agent-blind guardrail*, extended from kit operations to hooks: *does the
48
+ hook interpret the extension (kit-specific) or only the contract (core)?*
49
+
50
+ ### 2. The workflow-state phase set is canonical/core
51
+
52
+ Flow owns a **single canonical task lifecycle**; kit Flow Definitions map their steps
53
+ onto it. Kits do not invent phases. This is what lets one canonical gate work across
54
+ every kit — the cross-kit gate property. (Trade-off accepted: a kit with a radically
55
+ different lifecycle must map onto canonical phases or motivate adding one.)
56
+
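One way to picture "kits map their steps onto the canonical lifecycle" is a check that flags any kit step pointing at a phase outside the canonical set. This is only a sketch: `StepMapping` and `unmappedSteps` are hypothetical names, and the canonical phase list is passed in as a parameter because the ADR elides the full set (only `idea` and `done` are named).

```typescript
// Hypothetical shape for one step-to-phase binding in a kit's Flow Definition.
interface StepMapping {
  step: string;
  phase: string;
}

// Return the names of steps whose phase is not in the canonical set.
function unmappedSteps(canonicalPhases: string[], mappings: StepMapping[]): string[] {
  return mappings
    .filter((m) => canonicalPhases.indexOf(m.phase) === -1)
    .map((m) => m.step);
}

// Only the two phases the ADR names; the real set contains more phases, elided here.
const canonical = ["idea", "done"];

const offending = unmappedSteps(canonical, [
  { step: "draft", phase: "idea" },
  { step: "ship", phase: "launch" }, // "launch" is an invented, non-canonical phase
]);
// offending contains only "ship"
```

A kit that fails such a check would, per the trade-off above, either remap the step or make the case for adding a canonical phase.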
57
+ ### 3. Consequence for `stop-goal-fit` (the de-coupling)
58
+
59
+ - Detect the active task by the **kit-neutral signal** — presence of `state.json` /
60
+ the workflow sidecars, scoped via `.flow-agents/current.json` — **not** by the
61
+ Builder skill-name patterns. Remove `DELIVERY_TYPES` / `--deliver` coupling.
62
+ - The deliver-template **finish-line section checks** (`Definition Of Done` /
63
+ `Goal Fit Gate` / `Final Acceptance`) are Builder-Kit vocabulary → supplied by the
64
+ **kit** (config the canonical hook reads) or a Builder-Kit-owned check, not
65
+ hard-coded in the canonical hook.
66
+ - The deterministic enforcement that matters (evidence verdict/checks + the capture
67
+ cross-reference + acceptance criteria) is all **schema-based** and stays in the
68
+ canonical hook unchanged.
69
+
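The kit-neutral detection described in the first bullet could look roughly like the sketch below. It assumes `.flow-agents/current.json` carries a `slug` field and that the task directory is `.flow-agents/<slug>/` (the directory layout is taken from ADR 0010; the `slug` field name and the function `activeTaskDir` are assumptions). File access is injected so the logic stays testable without touching disk.

```typescript
// Assumed shape of .flow-agents/current.json; the real format is not shown in this diff.
interface CurrentPointer {
  slug?: string;
}

// Decide whether a task is active using only contract-level signals:
// the current.json pointer plus the presence of state.json.
// No skill names or template headings are consulted.
function activeTaskDir(
  current: CurrentPointer | null,
  fileExists: (path: string) => boolean,
): string | null {
  if (!current || !current.slug) return null;
  const dir = ".flow-agents/" + current.slug;
  return fileExists(dir + "/state.json") ? dir : null;
}

// Usage with a fake file system containing one task.
const files = [".flow-agents/add-login/state.json"];
const exists = (p: string): boolean => files.indexOf(p) !== -1;

const active = activeTaskDir({ slug: "add-login" }, exists);
const none = activeTaskDir({ slug: "other-task" }, exists);
```

Here `active` resolves to the task directory and `none` is `null`, regardless of which kit produced the task.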
70
+ ## Consequences
71
+
72
+ - The canonical gate becomes genuinely kit-neutral; the proven false-completion catch
73
+ (conformance L2; `prove-capture-teeth`) is unaffected because it is schema-based.
74
+ - Builder-Kit finish-line conventions move to kit ownership — a small, mechanical
75
+ refactor analogous to the skill-placement move ADR 0007 made mechanical.
76
+ - **Resolves the A/B fork as A** ("core knows the core contract"), with the precise
77
+ boundary: **schema + canonical phases = core; skill-names + template-sections = kit.**
78
+
79
+ ## Alternatives Considered
80
+
81
+ - **B — canonical hooks fully kit-agnostic; the stop gate is Builder-Kit-owned.**
82
+ Rejected: the workflow contract is core (schemas/, context/contracts/, agentless per
83
+ ADR 0001), and relocating the gate per-kit duplicates a well-tested enforcement,
84
+ violating consume-never-fork. The dividing test already gives a kit-neutral core gate.
85
+ - **Leave the skill-name / template-section coupling in place.** Rejected: it violates
86
+ ADR 0008's agent-blind guardrail — a hook shipped to every runtime must not interpret
87
+ one kit's extension.
88
+
89
+ ## References
90
+
91
+ - [ADR 0001: Flow Agents Consumes Flow](./0001-flow-agents-consumes-flow.md)
92
+ - [ADR 0007: Flow / Skill / Kit / Tool Boundary](./0007-flow-skill-kit-tool-boundary.md)
93
+ - [ADR 0008: Kit Operation Boundary](./0008-kit-operation-boundary.md) — the dividing test this ADR extends to hooks.
94
+ - [ADR 0004: Gates Expect Hachure Trust Bundles](./0004-gates-expect-surface-claims.md)
95
+ - `schemas/workflow-*.schema.json`; `scripts/hooks/{stop-goal-fit,evidence-capture,workflow-steering}.js`.
@@ -0,0 +1,139 @@
1
+ ---
2
+ title: "ADR 0010: Workflow Trust State as a Hachure Trust Bundle"
3
+ ---
4
+
5
+ # ADR 0010: Workflow Trust State as a Hachure Trust Bundle
6
+
7
+ **Date:** 2026-06-23
8
+ **Status:** Accepted. Phase 1 (emit) shipped (pre-existing); **maximal enrichment** (verification-policies + capture-authoritative evidence), **Phase 2 core** (the gate enforces on the bundle's Surface-derived claim statuses), **Phase 3 local projection** (`render-trust-panel` emits a standalone Surface Trust Panel HTML), and **Phase 2 hardening complete** (`DELIVERY_TYPES`/markdown removal done in sub-step 2c: `stop-goal-fit` verdict is now fully bundle-driven; Builder template heading checks removed; sidecar-driven Final Acceptance hygiene added). Phase 4 shipped: bespoke evidence.json/critique.json writes retired (ADR 0010 4c); trust.bundle is the sole verification artifact. Remaining: Phase 3 Console sink.
9
+ **Supersedes:** the interim markdown-de-coupling of ADR 0009 (see "Relationship to ADR 0009").
10
+
11
+ ---
12
+
13
+ ## Context
14
+
15
+ Flow Agents stores a task's workflow state as several bespoke JSON sidecars under
16
+ `.flow-agents/<slug>/`: `evidence.json` (verdict + checks), `acceptance.json`
17
+ (criteria), `critique.json` (review findings), `command-log.jsonl` (captured command
18
+ results), plus `state.json` (lifecycle: status/phase/next_action). The canonical hooks
19
+ (`stop-goal-fit`, `evidence-capture`) read these directly, and `stop-goal-fit` also
20
+ parses a Builder **markdown template** (`## Definition Of Done`, `## Goal Fit Gate`,
21
+ `### Verdict: PASS`).
22
+
23
+ Meanwhile **ADR 0004 already established that Flow gates expect Hachure `trust.bundle`
24
+ claims** (`builder.verify.tests`, etc.). So the platform is *half Hachure* at the gate
25
+ layer and *bespoke* at the local-sidecar layer — a duplicated representation of the same
26
+ thing: **is this work provably done?** That question is, definitionally, Hachure's: *a
27
+ claim's status recomputed from its evidence by a pure, versioned function.*
28
+
29
+ ## Decision
30
+
31
+ **The workflow's *trust* state is represented as a Hachure trust bundle; the gate
32
+ evaluates that bundle; Surface projects it to the Trust Panel / Console.**
33
+
34
+ ### What maps to the bundle (trust state)
35
+
36
+ - `evidence.json` checks → **claims + evidence** (status recomputed from the result).
37
+ - `acceptance.json` criteria → **claims** (each AC; status from its evidence_refs).
38
+ - `command-log.jsonl` → **evidence/traces** the claims recompute *from* (the event
39
+ stream behind the bundle).
40
+ - `critique.json` → **claims/findings**.
41
+
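To make the mapping above more concrete, here is a minimal sketch of turning one acceptance criterion into a claim whose status is derived from the checks it references. All shapes are simplified assumptions, not the Hachure schema: `Check`, `Criterion`, `Claim`, and `criterionToClaim` are illustrative names, and only the `disputed` status and the `evidence_refs` field come from the ADR text.

```typescript
// "disputed" is named in the ADR; "verified" and "unknown" are placeholder names.
type Status = "verified" | "disputed" | "unknown";

interface Check {
  id: string;
  passed: boolean;
}

interface Criterion {
  id: string;
  evidence_refs: string[]; // ids of the checks that back this criterion
}

interface Claim {
  subject: string;
  status: Status;
  evidence: string[];
}

// Derive the claim status from the referenced checks only:
// any failing reference disputes the claim; all references present and passing verify it.
function criterionToClaim(criterion: Criterion, checks: Check[]): Claim {
  const referenced = checks.filter((c) => criterion.evidence_refs.indexOf(c.id) !== -1);
  let status: Status = "unknown";
  if (referenced.some((c) => !c.passed)) {
    status = "disputed";
  } else if (referenced.length > 0 && referenced.length === criterion.evidence_refs.length) {
    status = "verified";
  }
  return { subject: criterion.id, status, evidence: referenced.map((c) => c.id) };
}

const checks: Check[] = [
  { id: "unit-tests", passed: true },
  { id: "lint", passed: false },
];

const ac1 = criterionToClaim({ id: "AC1", evidence_refs: ["unit-tests"] }, checks);
const ac2 = criterionToClaim({ id: "AC2", evidence_refs: ["unit-tests", "lint"] }, checks);
const ac3 = criterionToClaim({ id: "AC3", evidence_refs: ["missing-check"] }, checks);
```

In this sketch the status is a pure function of the inputs, which mirrors the "status recomputed from its evidence" idea without claiming to reproduce Surface's actual evaluation.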
42
+ ### What stays out of the bundle (lifecycle ≠ trust)
43
+
44
+ `state.json` — `status` / `phase` / `next_action` — is workflow **control** state (the
45
+ *WHAT-step*, owned by Flow per ADR 0007), **not** a trust claim. It stays a Flow
46
+ workflow-state record, *referenced by* the bundle, not modeled *as* claims.
47
+
48
+ ### The gate
49
+
50
+ `stop-goal-fit` stops parsing the Builder markdown template and stops reading bespoke
51
+ sidecars; it **recomputes the Hachure bundle** (Surface evaluation) and gates on claim
52
+ statuses + the capture cross-reference. This is the full realization of ADR 0009 (gate
53
+ on the canonical contract) — the gate becomes a deterministic Hachure recompute,
54
+ inherently portable and third-party-verifiable.
55
+
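A gate that decides purely on claim statuses might be sketched as follows. The blocking rule used here (block on any high-impact `disputed` claim) is the one the Phase 2 entry later in this ADR describes; the `Claim` shape, the `impact` field, and the names `evaluateGate` and `GateDecision` are assumptions for illustration, and the capture cross-reference is left out for brevity.

```typescript
// Hypothetical, simplified claim shape; the real trust.bundle claim has more fields.
interface Claim {
  claimType: string;
  status: string;
  impact: "high" | "low";
}

interface GateDecision {
  block: boolean;
  reasons: string[];
}

// Pure decision: same bundle in, same verdict out. No markdown parsing,
// no bespoke sidecar reads, no network access.
function evaluateGate(claims: Claim[]): GateDecision {
  const reasons = claims
    .filter((c) => c.impact === "high" && c.status === "disputed")
    .map((c) => "disputed: " + c.claimType);
  return { block: reasons.length > 0, reasons };
}

const decision = evaluateGate([
  { claimType: "builder.verify.tests", status: "disputed", impact: "high" },
  { claimType: "builder.verify.lint", status: "disputed", impact: "low" }, // invented claim type
]);
// decision.block is true, with a single reason naming builder.verify.tests
```

Because the function has no side effects, a third party holding the same bundle could recompute the same verdict, which is the portability property the paragraph above is after.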
56
+ ### Distribution: local-first, Console as optional projection
57
+
58
+ - **Local file is the source of truth.** Enforcement reads the local bundle and works
59
+ fully offline. **The gate never depends on Console or any network sink.**
60
+ - **Console / Trust Panel is an optional Surface *projection*** over the bundle —
61
+ Surface already defines `TrustBundle → Trust Panel | Console | API | MCP`. Flow Agents
62
+ *produces* the local bundle; **Surface owns projection**; Console is the plane. Flow
63
+ Agents must not grow a bespoke "push to console" path (consume-never-fork).
64
+ - This is the **existing telemetry sink config** generalized: *local always; Console
65
+ optional*. It is also the monetization boundary made literal — **open data plane (local
66
+ bundle, free) / paid control plane (Console)**.
67
+
68
+ ## Consequences
69
+
70
+ - **One representation end-to-end** — finishes what ADR 0004 started; kills the
71
+ bespoke/Hachure duplication.
72
+ - **Surface Trust Panel "for free"** — workflow outcomes (which checks passed/failed,
73
+ why a stop blocked) become viewable + recomputable in the company's own panel. This is
74
+ also a dogfood/demo: Flow Agents' own workflow trust state, in the company's trust
75
+ format, in the company's panel.
76
+ - **Deterministic gate** — the verdict is a pure recompute, not bespoke parsing.
77
+ - **Costs (eyes open):** (1) hook weight — the Stop hook gains a Hachure recompute +
78
+ schema validation vs. dependency-free `JSON.parse` (mitigable; `hachure` is already an
79
+ optional dep per ADR 0004; the recompute is pure). (2) Migration touches every producer
80
+ (`workflow-sidecar` writer), consumer (hooks, `validate-workflow-artifacts`), schema,
81
+ and test. (3) Regression risk concentrates on the exact hook that has already broken the
82
+ capture loop twice when changed in haste, so the migration is phased with proofs as guardrails.
83
+
84
+ ## Phased Migration (proof-gated)
85
+
86
+ Each phase must keep `prove-capture-teeth` (8/8) and conformance (L2) green before the
87
+ next begins.
88
+
89
+ - **Phase 0 (today):** gate enforces on bespoke sidecars + the hygiene fixes (`08319f4`).
90
+ - **Phase 1 — dual-write the bundle: ✅ SHIPPED.** `workflow-sidecar`'s
91
+ `buildTrustBundle`/`writeTrustBundle` emit a validated Hachure `trust.bundle` for
92
+ evidence/acceptance/critique, wired into `record-evidence`/`record-critique`/
93
+ `advance-state`. **Maximal enrichment (this PR):** a `VerificationPolicy` per claimType
94
+ and the `command-log` capture folded in as `execution` evidence — and capture is
95
+ *authoritative* (a claimed-pass whose captured command FAILED is `disputed` in the
96
+ bundle). *Proof: `test_workflow_sidecar_writer` AC3 + capture/conformance green.*
97
+ - **Phase 2 — gate enforces on the bundle: ◑ CORE SHIPPED (this PR).** `stop-goal-fit`
98
+ reads the `trust.bundle` and blocks on any high-impact `disputed` claim (Surface-derived
99
+ status; a canonical false-completion signal — HARD_BLOCK at any phase, incl. terminal).
100
+ Additive: the capture cross-reference is preserved. *Proof: new conformance fixture
101
+ `stop-goal-fit--block-bundle-disputed-claim` (L2 21/21); prove-capture-teeth 8/8.*
102
+ **Remaining hardening:** (a) *re-derive at the gate* via Surface `buildTrustReport`
103
+ (needs an async hook restructure — today the gate reads the write-time Surface-derived
104
+ status); (b) *remove* the `DELIVERY_TYPES`/markdown parsing (subsumes ADR 0009) — the
105
+ capture proof seeds raw evidence+log and relies on markdown detection, so this is
106
+ reworked carefully, not ripped out.
107
+ - **Phase 3 — Surface projection: ◑ LOCAL SHIPPED (this PR).** `render-trust-panel <dir>`
108
+ derives the report (Surface `buildTrustReport`) and emits a **standalone HTML** embedding
109
+ Surface's dependency-free `<surface-trust-panel>` element — projection fully delegated to
110
+ Surface (consume-never-fork). Enforcement stays local/offline. *Proof:
111
+ `test_workflow_sidecar_writer` AC4.* **Remaining:** the **Console sink** (behind the
112
+ existing opt-in telemetry sink — local always, Console optional).
113
+ - **Phase 4 — retire the bespoke sidecars** (keep `state.json` lifecycle only) once all
114
+ producers/consumers/tests are on the bundle.
115
+
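The "capture is authoritative" rule from Phase 1 can be sketched as a small reconciliation step: a check that claims to pass is compared against the captured command log, and a captured failure overrides the claim. The `CapturedCommand` shape, the `exitCode` field, the status names other than `disputed`, and the function `statusForClaimedPass` are assumptions; the real `command-log.jsonl` format is not shown in this diff.

```typescript
// Assumed shape of one command-log entry.
interface CapturedCommand {
  command: string;
  exitCode: number;
}

// "disputed" comes from the ADR; "verified" and "unverified" are placeholder names.
type Status = "verified" | "disputed" | "unverified";

// Status for a check that CLAIMS to have passed by running `command`.
// The captured log wins over the claim: a captured failure makes it disputed.
function statusForClaimedPass(command: string, log: CapturedCommand[]): Status {
  let last: CapturedCommand | null = null;
  for (const entry of log) {
    if (entry.command === command) last = entry; // keep the most recent run
  }
  if (last === null) return "unverified"; // nothing captured to back the claim
  return last.exitCode === 0 ? "verified" : "disputed";
}

const log: CapturedCommand[] = [
  { command: "npm test", exitCode: 0 },
  { command: "npm test", exitCode: 1 }, // a later run failed
  { command: "npm run lint", exitCode: 0 },
];

const tests = statusForClaimedPass("npm test", log);
const lint = statusForClaimedPass("npm run lint", log);
const build = statusForClaimedPass("npm run build", log);
```

Treating the latest captured run as decisive is a design choice of this sketch, not something the ADR specifies; the point is only that the claimed result never outranks what was actually captured.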
116
+ ## Relationship to ADR 0009
117
+
118
+ ADR 0009 (apply 0008's dividing test to canonical hooks; phases canonical-core) stands.
119
+ Its *mechanical interim* — de-coupling the Builder markdown/skill-name parsing in place —
120
+ is **superseded**: do not polish the bespoke hook; Phase 2 replaces that parsing with
121
+ bundle recompute, achieving the same kit-neutrality via the canonical Hachure format.
122
+
123
+ ## Alternatives Considered
124
+
125
+ - **Keep bespoke sidecars.** Rejected: duplicates ADR 0004's gate representation, no
126
+ Trust Panel, gate stays bespoke. (Defensible only if the Trust Panel view is not wanted
127
+ soon — in which case defer Phases 2–4, but Phase 1 dual-write still de-risks.)
128
+ - **A new bespoke "stream trust state to Console" mechanism.** Rejected: Surface already
129
+ projects bundles to Console and telemetry already has a local/Console sink config —
130
+ compose them, don't fork.
131
+
132
+ ## References
133
+
134
+ - [ADR 0004: Gates Expect Hachure Trust Bundles](./0004-gates-expect-surface-claims.md)
135
+ - [ADR 0008: Kit Operation Boundary](./0008-kit-operation-boundary.md) — the dividing test.
136
+ - [ADR 0009: Canonical Hook Core/Kit Boundary](./0009-canonical-hook-core-kit-boundary.md) — superseded interim.
137
+ - `surface/CONTEXT.md` — `TrustBundle → Trust Panel | Console | API | MCP` projections.
138
+ - hachure `trust-bundle.schema.json`; `install.sh --telemetry-sink` (local/Console sink precedent).
139
+ - `scripts/hooks/{stop-goal-fit,evidence-capture}.js`; `src/cli/workflow-sidecar.ts`.
@@ -0,0 +1,100 @@
1
+ ---
2
+ title: "ADR 0011: MCP Posture — Enforcement Stays Hooks; Surface Owns MCP Projection; No Auto-Injected Config"
3
+ ---
4
+
5
+ # ADR 0011: MCP Posture
6
+
7
+ **Date:** 2026-06-24
8
+ **Status:** Accepted (decided with Brian Anderson, 2026-06-24).
9
+
10
+ ---
11
+
12
+ ## Context
13
+
14
+ Flow Agents integrates with agent runtimes through two mechanisms: **hooks** (PostToolUse
15
+ capture, Stop gate, SessionStart — deterministic, automatic, can block) and a **CLI**
16
+ (`workflow-sidecar` — operations the agent invokes: `init-plan`, `record-evidence`,
17
+ `advance-state`, `current`, `render-trust-panel`). The question arose whether Flow Agents
18
+ should also expose an **MCP** surface — and specifically how to surface a workflow's
19
+ **trust report inline in the conversation** by leveraging Surface's existing MCP-UI trust
20
+ panel (`buildTrustPanelUiResource`, `ui://surface/trust-panel/…`).
21
+
22
+ Two facts shaped the decision:
23
+
24
+ - MCP tools are **agent-invoked** — the agent *chooses* to call them; they cannot be forced
25
+ and cannot block.
26
+ - Surface **already** exposes an MCP server (`surface mcp`) whose tools re-derive a trust
27
+ report from a trust input and return the MCP-UI panel.
28
+
29
+ ## Decision
30
+
31
+ ### 1. Enforcement stays hooks. MCP never carries the gate.
32
+
33
+ The gate (`stop-goal-fit` blocking) and capture (PostToolUse) are deterministic runtime
34
+ **interception** — they fire automatically and exit non-zero to block. MCP tools are
35
+ agent-invoked and cannot intercept or block. **MCP therefore cannot carry Flow Agents'
36
+ differentiator (deterministic teeth); enforcement remains hooks.** Any MCP surface is
37
+ *additive* to — never a replacement for — hooks.
38
+
39
+ ### 2. Trust-surfacing in-conversation = consume Surface's MCP; do not build our own.
40
+
41
+ Flow Agents **produces** `.flow-agents/<slug>/trust.bundle`; **Surface's MCP projects** it
42
+ to the MCP-UI trust panel. Flow Agents writes **zero MCP code** (consume-never-fork; Surface
43
+ owns projection). Ingestion uses Surface's **per-call `path`** argument (the skill passes the
44
+ active task's bundle, resolved from `current.json`), **not** a static `--input` set at server
45
+ launch — a single static input cannot follow a session's many per-task bundles or its moving
46
+ "current." A directory / `current`-aware Surface ingestion is the cleaner long-term shape,
47
+ but that is **Surface's** design to evaluate (kontourai/surface#95) — not something Flow
48
+ Agents hacks around.
49
+
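As a rough illustration of the per-call `path` approach in section 2, the sketch below resolves the active task's bundle path on every call instead of fixing it at server launch. The `CurrentPointer` shape (a single `slug` field), the helper names, and the slug validation rule are assumptions for the example; only the `.flow-agents/<slug>/trust.bundle` layout comes from the text.

```typescript
// Hypothetical shape of current.json; only a slug field is assumed here.
interface CurrentPointer {
  slug: string;
}

// Build the bundle path for the active task. The slug check is an assumed
// safety rule so that a malformed pointer cannot escape the root directory.
function bundlePathFor(current: CurrentPointer, root: string = ".flow-agents"): string {
  if (!/^[A-Za-z0-9_-][A-Za-z0-9._-]*$/.test(current.slug)) {
    throw new Error(`unsafe slug: ${current.slug}`);
  }
  return `${root}/${current.slug}/trust.bundle`;
}

// Arguments a skill might pass on each call. Because the pointer is re-read
// per call, the path follows the session's moving "current".
function trustPanelCallArgs(current: CurrentPointer): { path: string } {
  return { path: bundlePathFor(current) };
}
```

Recomputing the path per call is what lets one long-lived server serve many per-task bundles; a static input would pin the first one.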
50
+ ### 3. Never auto-inject MCP config into files we do not own.
51
+
52
+ Registering an MCP server edits runtime config (Claude Code `.mcp.json`, Codex equivalent)
53
+ that belongs to the **user**, not Flow Agents. **Hooks are core** to Flow Agents' function
54
+ (justified to write); **MCP-surfacing is optional sugar** (must not be force-injected).
55
+ Therefore:
56
+
57
+ - **Default — zero writes:** the installer *documents/prints* the exact `surface mcp`
58
+ registration snippet; the user adds it if they want it.
59
+ - **Convenience — explicit, reversible opt-in:** a `flow-agents trust:mcp enable|disable`
60
+ command writes a **fenced managed block** (`# BEGIN flow-agents (managed) … # END`) — easy
61
+ to enable, trivial to remove. Never on plain install. Tracked in flow-agents#137.
62
+
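The enable/disable behaviour of a fenced managed block can be sketched as two text transforms. This is a minimal sketch under stated assumptions: the exact end marker is elided in the text above, so `END_MARKER` here is a placeholder, the function names are invented for the example, and the input is treated as a plain-text file that tolerates `#` comment lines.

```typescript
const BEGIN_MARKER = "# BEGIN flow-agents (managed)";
const END_MARKER = "# END flow-agents (managed)"; // assumed; the source elides the full marker

// Remove the managed block (markers included), leaving user-owned lines untouched.
function disableManagedBlock(text: string): string {
  const kept: string[] = [];
  let inside = false;
  for (const line of text.split("\n")) {
    if (!inside && line.trim() === BEGIN_MARKER) { inside = true; continue; }
    if (inside && line.trim() === END_MARKER) { inside = false; continue; }
    if (!inside) kept.push(line);
  }
  // Refuse to guess where an unterminated block ends rather than drop user content.
  if (inside) throw new Error("unterminated managed block");
  return kept.join("\n");
}

// Insert (or replace) the managed block at the end of the file.
function enableManagedBlock(text: string, body: string): string {
  const base = disableManagedBlock(text);
  const sep = base.length === 0 || base.slice(-1) === "\n" ? "" : "\n";
  return `${base}${sep}${BEGIN_MARKER}\n${body}\n${END_MARKER}\n`;
}
```

Routing `enable` through `disable` first keeps the operation idempotent, and `disable` after `enable` returns the file to its prior content, which is the reversibility property this section asks for.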
63
+ ### 4. A Flow Agents *invocation* MCP is a separate, deferred decision.
64
+
65
+ Flow Agents *could* expose its workflow operations as MCP **tools** — valuable for **reach**
66
+ (MCP hosts without shell access, e.g. claude.ai web) and first-class, discoverable
67
+ operations. But an **MCP-only host gets invocation without enforcement** (no hooks) — a
68
+ **capped conformance tier** (invoke-without-enforce ranks below hook-enforce in the L0/L1/L2
69
+ model). This is a larger architecture decision, **not adopted here**; recorded as a future
70
+ possibility gated on a real need for non-shell reach.
71
+
72
+ ## Consequences
73
+
74
+ - Trust-in-conversation is **opt-in and boundary-pure** — Surface owns the MCP + the UI;
75
+ Flow Agents produces the bundle and (optionally, with consent) points Surface at it.
76
+ - The **local HTML projection (`render-trust-panel`, #135) stays the no-config default**;
77
+ MCP-surfacing is the opt-in upgrade.
78
+ - Flow Agents' config posture is explicit: **core (hooks) writes; optional (MCP) is
79
+ documented / opt-in / reversible.**
80
+ - The `surface mcp --input` ingestion is referred to Surface for evaluation (kontourai/surface#95).
81
+
82
+ ## Alternatives Considered
83
+
84
+ - **Auto-inject Surface's MCP on install.** Rejected: edits config Flow Agents does not own,
85
+ for optional sugar; surprising and hard to cleanly remove.
86
+ - **Build an MCP-UI trust panel in Flow Agents.** Rejected: forks Surface's
87
+ `buildTrustPanelUiResource` and requires standing up an MCP server (consume-never-fork
88
+ violation).
89
+ - **Replace hooks with MCP tools.** Rejected: MCP is agent-invoked and cannot block — it
90
+ cannot carry deterministic enforcement.
91
+ - **Static `--input` for trust ingestion.** Rejected for Flow Agents' use: cannot follow
92
+ per-task bundles or a moving current; use per-call `path`. The exposure itself is referred
93
+ to Surface (kontourai/surface#95).
94
+
95
+ ## References
96
+
97
+ - [ADR 0010: Workflow Trust State as a Hachure Trust Bundle](./0010-workflow-trust-state-as-hachure-bundle.md) — the bundle this surfaces; #135 (`render-trust-panel` local projection).
98
+ - `@kontourai/surface` `src/commands/mcp.ts` (`surface mcp --input`, per-call `path`, `buildTrustPanelUiResource`).
99
+ - kontourai/surface#95 — evaluate `surface mcp --input` ingestion exposure.
100
+ - flow-agents#137 — opt-in `trust:mcp` wiring command.
@@ -0,0 +1,119 @@
1
+ ---
2
+ title: "ADR 0012: Agent Coordination as Hachure Liveness Claims"
3
+ ---
4
+
5
+ # ADR 0012: Agent Coordination as Hachure Liveness Claims
6
+
7
+ **Date:** 2026-06-24
8
+ **Status:** Accepted (decided with Brian Anderson, 2026-06-24). Grounded by a round-trip proof against the current Surface kernel; **gated on a Surface dependency bump** (see Consequences).
9
+
10
+ ---
11
+
12
+ ## Context
13
+
14
+ Multiple agents (and human + agent teams) work the same repo concurrently — this is the
15
+ normal case, not the exception. This session *was* the experiment. What actually prevented
16
+ collisions was cheap **tolerance** (isolated git worktrees + small PRs + PR/CI
17
+ serialization); what *hurt* was (1) **discovery** — agents repeatedly almost-rebuilt
18
+ in-flight work because there was no shared "what's claimed" signal — and (2) the **merge
19
+ race** (a strict up-to-date branch requirement vs. a fast-moving main).
20
+
21
+ `pull-work` already speaks `in_progress` and parses "coordinate with" blockers, but it does
22
+ **not** *write* a claim or exclude claimed items — the backlog→pull-work loop the system was
23
+ designed for is unfinished. The recurring word was **claim**, which is exactly a Hachure
24
+ concept: a claim with **evidence** and **freshness**, whose status is **recomputed**.
25
+
26
+ ## Decision
27
+
28
+ ### 1. A work-claim is a Hachure claim under a *liveness policy* — not a new subsystem.
29
+
30
+ An agent claiming work emits a claim (`claimType: workflow.coordination.hold`) governed by a
31
+ **liveness policy** (a `ttlSeconds` window) and kept alive by **heartbeat** (verified)
32
+ events. The coordination lifecycle derives from the **existing** Surface status function —
33
+ **proven** (5/5) against the current kernel:
34
+
35
+ | Coordination state | Mechanism | Derived `TrustStatus` |
36
+ |---|---|---|
37
+ | **held** | claim + heartbeat within `ttlSeconds` | `verified` |
38
+ | **reclaimable** | heartbeat lapsed past ttl | `stale` |
39
+ | **released** | `revoked` invalidation event | `stale` |
40
+ | **taken over** | `superseded` event | `superseded` |
41
+ | **reclaimed** | new holder's fresh heartbeat | `verified` |
42
+
43
+ No new statuses, no new machinery — a claim's nature is defined by its **policy**, and the
44
+ liveness policy is a reuse of the duration/ttl freshness logic (`claimIntrinsicExpiry`).
45
+ **`stale` is the reaper**: abandoned claims expire by construction.
46
+
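The mapping in the table above can be illustrated with a small status fold. This is a simplified stand-in, not Surface's `deriveTrustStatus`: the `CoordinationEvent` shape, the function name `deriveHoldStatus`, and the use of plain epoch seconds are assumptions for the example.

```typescript
type TrustStatus = "verified" | "stale" | "superseded";

// Hypothetical event shape; timestamps are epoch seconds.
type CoordinationEvent =
  | { kind: "heartbeat"; at: number }
  | { kind: "revoked"; at: number }
  | { kind: "superseded"; at: number };

// Fold one claim's events into a status, following the table:
// superseded -> superseded, revoked -> stale, fresh heartbeat -> verified,
// lapsed or missing heartbeat -> stale.
function deriveHoldStatus(
  events: CoordinationEvent[],
  ttlSeconds: number,
  now: number
): TrustStatus {
  if (events.some((e) => e.kind === "superseded")) return "superseded";
  if (events.some((e) => e.kind === "revoked")) return "stale";
  const beats = events.filter((e) => e.kind === "heartbeat").map((e) => e.at);
  if (beats.length === 0) return "stale";
  const lastBeat = Math.max(...beats);
  return now - lastBeat <= ttlSeconds ? "verified" : "stale";
}
```

In this sketch the "reclaimed" row is simply a second claim with its own event list: the new holder's fresh heartbeat yields `verified` for that claim while the old one stays `stale` or `superseded`.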
47
+ ### 2. Coordination and verification are *siblings under one subject*, not nested.
48
+
49
+ The delivery workflow's progress is **not evidence for the reservation** (a heartbeat is). So
50
+ the coordination claim and the verification bundle are **co-equal claims about the same
51
+ `subjectId`** (the work-item / backlog identity), derived from **one event stream**, linked
52
+ by `identityLinks` + an optional `derivationEdges` reference (so "who holds it" can drill into
53
+ "and here's their progress"). The work-item identity is the join key; the provider adapter
54
+ maps issue → `subjectId` and optionally projects the claim back (label/assignee).
55
+
56
+ ### 3. Resumption via durable evidence — strictly better than a lock.
57
+
58
+ The coordination claim is **ephemeral** (liveness); the verification evidence is **durable**.
59
+ When a holder goes dark, its claim goes `stale` but its evidence **survives** — so the next
60
+ agent **resumes from recorded state, not restart**. The *same freshness* that reaps the claim
61
+ also tells the resumer which inherited evidence is still valid (fresh) vs. must be re-run
62
+ (stale). This generalizes the bespoke `handoff.json`.
63
+
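The resumption idea in section 3 (reuse fresh evidence, re-run stale evidence) can be sketched as a partition over inherited records. The `EvidenceRecord` shape, its per-record `ttlSeconds`, and the name `planResumption` are assumptions for illustration; the real bundle expresses freshness through its own policy fields.

```typescript
// Hypothetical evidence record; timestamps are epoch seconds.
interface EvidenceRecord {
  id: string;
  recordedAt: number;
  ttlSeconds: number; // assumed per-record freshness window
}

interface ResumptionPlan {
  reuse: string[]; // still fresh: the resumer can rely on it
  rerun: string[]; // stale: must be produced again
}

// Split inherited evidence by the same freshness test that reaps the claim.
function planResumption(evidence: EvidenceRecord[], now: number): ResumptionPlan {
  const plan: ResumptionPlan = { reuse: [], rerun: [] };
  for (const e of evidence) {
    const fresh = now - e.recordedAt <= e.ttlSeconds;
    (fresh ? plan.reuse : plan.rerun).push(e.id);
  }
  return plan;
}
```

The point of the sketch is that the next agent gets a work list rather than a blank slate: only the `rerun` items need to be repeated.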
64
+ ### 4. Advisory, not a lock.
65
+
66
+ A bundle is *additive*; a lock is a *mutex*. The recompute is **awareness**, not a seizure.
67
+ Actual serialization happens at the **integration layer** (branch/PR/merge-queue). A
68
+ false-stale double-hold (two fresh claims on one subject) is **detected** via Hachure
69
+ `conflictRules`/`conflictedClaims`, not prevented. This is why **ttl/heartbeat tuning is the
70
+ operational risk**: a too-tight window manufactures false reclaims; keep it advisory and
71
+ double-checked against real branch/PR state.
72
+
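Detecting, rather than preventing, a double-hold can be sketched as a scan for subjects with more than one fresh holder. This does not use Hachure's `conflictRules`/`conflictedClaims`; the `Hold` shape and the name `findDoubleHolds` are assumptions made for the example.

```typescript
// Hypothetical view of one coordination claim; timestamps are epoch seconds.
interface Hold {
  subjectId: string;
  holder: string;
  lastHeartbeatAt: number;
  ttlSeconds: number;
}

// Return subjects that currently have two or more distinct fresh holders.
// Nothing is blocked here: the result is advisory input for a surfacing hook.
function findDoubleHolds(holds: Hold[], now: number): Record<string, string[]> {
  const freshBySubject: Record<string, string[]> = {};
  for (const h of holds) {
    if (now - h.lastHeartbeatAt > h.ttlSeconds) continue; // stale holds never conflict
    const holders = freshBySubject[h.subjectId] ?? [];
    if (holders.indexOf(h.holder) === -1) holders.push(h.holder);
    freshBySubject[h.subjectId] = holders;
  }
  const conflicts: Record<string, string[]> = {};
  for (const subject of Object.keys(freshBySubject)) {
    if (freshBySubject[subject].length > 1) {
      conflicts[subject] = freshBySubject[subject].slice().sort();
    }
  }
  return conflicts;
}
```

Because the check depends on `ttlSeconds`, a window that is too tight would hide a real double-hold by marking one side stale, which is the tuning risk named in this section.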
73
+ ### 5. Flow-owned; Veritas optional; local-first.
74
+
75
+ - **Hachure:** the schema + a **liveness policy archetype** (a policy shape, *not* a new type).
76
+ - **Flow:** owns the shared coordination **stream + recompute** (it is the workflow/event engine).
77
+ - **Flow Agents:** `pull-work` emits/heartbeats/releases; a hook **surfaces** "lane X held by A (fresh, PR #n) / lane Y stale — reclaimable."
78
+ - **Sink:** local file or **git ref** first (solo + the real model, not a throwaway) → optional hosted relay/provider → optional **Surface/Console** projection (a live activity panel). **Never required-Console.**
79
+ - **Veritas:** an *optional* policy layer on top (e.g. "don't merge into a contested lane"); not in the path.
80
+
81
+ ### 6. Policy archetypes — a tight, universal set only.
82
+
83
+ A small set of status-derivation **shapes** — **evidence-backed**, **liveness**,
84
+ **attestation**, **corroboration** — is general enough to live in **Hachure** as a reference
85
+ profile (interop + de-dupes our hand-rolled `VerificationPolicy` instances from ADR 0010).
86
+ Tuned **instances** stay in the products. Domain policies must **not** go in the format.
87
+
88
+ ## Consequences
89
+
90
+ - **Completes the backlog→pull-work loop** and serves solo (local file/git-ref) *and* team
91
+ (shared provider/relay) with one model — separation of concerns for context that can't
92
+ hold every task at once.
93
+ - **Resumption beats locking**; the same primitive proves verification *and* coordination —
94
+ strong evidence an open trust format is general, not single-purpose (the dogfood *is* the
95
+ justification).
96
+ - **Prerequisite (proven):** flow-agents depends on `@kontourai/surface@^1.0.1` and installs
97
+ **1.0.1**, which *predates* the `ttlSeconds`/`claimIntrinsicExpiry` liveness logic. The
98
+ round-trip **fails on 1.0.1 and passes on 1.2.1**. So this is gated on **bumping Surface to
99
+ ≥1.2.x** (which also benefits the existing trust bundles' freshness).
100
+ - TTL/heartbeat defaults must be configurable; the layer stays advisory.
101
+
102
+ ## Alternatives Considered
103
+
104
+ - **Hard lock (lease server / branch CAS).** Rejected as the primary: stale leases orphan
105
+ work, and you can't predict an agent's file footprint at claim time, so prevention would rest on a footprint you will mispredict.
106
+ - **Issue-marking only (label/assignee in-progress).** Good *thin* layer, but solves
107
+ work-*item* collision, not work-*area* (file) collision — which is what actually bit us — and
108
+ needs TTL/atomicity anyway. Necessary, not sufficient.
109
+ - **A bespoke coordination subsystem / new statuses.** Rejected: square-peg; express it as a
110
+ policy archetype and reuse the status function.
111
+ - **Veritas owns lane-conflict.** Rejected: that's awareness/shared-state (Flow), not
112
+ policy-compliance (Veritas). Veritas is optional on top.
113
+
114
+ ## References
115
+
116
+ - [ADR 0010: Workflow Trust State as a Hachure Trust Bundle](./0010-workflow-trust-state-as-hachure-bundle.md) — the verification sibling.
117
+ - `@kontourai/surface` `src/status.ts` — `deriveTrustStatus`, `claimIntrinsicExpiry`, terminal-event fold (the proven kernel).
118
+ - Round-trip proof: `held→stale→released→superseded→reclaimed` (5/5 on Surface 1.2.1; fails on 1.0.1).
119
+ - `handoff.json` (bespoke resumption precedent); flow-agents#137 (`pull-work` claim wiring); kontourai/surface#95 (`mcp --input` ingestion).