npm - @kontourai/flow-agents - Versions diffs - 1.4.0 → 2.0.1 - Mend

@kontourai/flow-agents 1.4.0 → 2.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (184)

package/.github/CODEOWNERS +29 -0
package/.github/actions/trust-verify/action.yml +145 -0
package/.github/workflows/ci.yml +11 -4
package/.github/workflows/kit-gates-demo.yml +2 -2
package/.github/workflows/publish-npm.yml +10 -2
package/.github/workflows/release-please.yml +1 -1
package/.github/workflows/runtime-compat.yml +1 -1
package/.github/workflows/trust-reconcile.yml +113 -0
package/AGENTS.md +13 -0
package/CHANGELOG.md +103 -0
package/CONTRIBUTING.md +4 -4
package/README.md +1 -0
package/agents/tool-planner.json +1 -1
package/build/src/cli/init.js +242 -20
package/build/src/cli/validate-workflow-artifacts.js +19 -2
package/build/src/cli/verify.d.ts +1 -0
package/build/src/cli/verify.js +90 -0
package/build/src/cli/workflow-sidecar.d.ts +316 -8
package/build/src/cli/workflow-sidecar.js +1996 -91
package/build/src/cli.js +2 -3
package/build/src/lib/flow-resolver.d.ts +111 -0
package/build/src/lib/flow-resolver.js +308 -0
package/build/src/tools/build-universal-bundles.js +34 -22
package/build/src/tools/generate-context-map.js +3 -16
package/build/src/tools/validate-source-tree.d.ts +1 -1
package/build/src/tools/validate-source-tree.js +42 -162
package/context/contracts/artifact-contract.md +10 -0
package/context/contracts/delivery-contract.md +1 -0
package/context/contracts/review-contract.md +1 -0
package/context/contracts/verification-contract.md +2 -0
package/context/gate-awareness.md +39 -0
package/context/scripts/hooks/stop-goal-fit.js +632 -70
package/docs/adr/0001-flow-agents-consumes-flow.md +1 -1
package/docs/adr/0002-flow-kits-as-extension-unit.md +1 -1
package/docs/adr/0004-gates-expect-surface-claims.md +2 -0
package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +2 -0
package/docs/adr/0007-skill-audit.md +1 -1
package/docs/adr/0009-canonical-hook-core-kit-boundary.md +95 -0
package/docs/adr/0010-workflow-trust-state-as-hachure-bundle.md +139 -0
package/docs/adr/0011-mcp-posture.md +100 -0
package/docs/adr/0012-agent-coordination-as-liveness-claims.md +119 -0
package/docs/adr/0013-context-lifecycle.md +151 -0
package/docs/adr/0014-core-vs-domain-kit-boundary.md +143 -0
package/docs/adr/0015-flow-flow-agents-boundary-reconciliation.md +120 -0
package/docs/adr/0016-three-hard-boundary-model.md +71 -0
package/docs/adr/0017-anti-gaming-trust-security-model.md +155 -0
package/docs/agent-system-guidebook.md +5 -12
package/docs/context-map.md +4 -10
package/docs/index.md +3 -2
package/docs/integrations/framework-adapter.md +19 -6
package/docs/integrations/index.md +2 -2
package/docs/north-star.md +4 -4
package/docs/operating-layers.md +3 -3
package/docs/plans/adr-0010-phase2-gate-recompute.md +55 -0
package/docs/repository-structure.md +2 -2
package/docs/skills-map.md +1 -0
package/docs/spec/runtime-hook-surface.md +62 -9
package/docs/standards-register.md +3 -3
package/docs/survey-utterance-check.md +1 -1
package/docs/trust-anchor-adoption.md +197 -0
package/docs/verifiable-trust.md +95 -0
package/docs/veritas-integration.md +2 -2
package/docs/workflow-usage-guide.md +69 -0
package/evals/acceptance/DEMO-false-completion.md +144 -0
package/evals/acceptance/demo-cast.sh +92 -0
package/evals/acceptance/demo-false-completion.sh +72 -0
package/evals/acceptance/demo-real-evidence.sh +104 -0
package/evals/acceptance/demo.tape +29 -0
package/evals/acceptance/prove-capture-teeth-declared.sh +335 -0
package/evals/acceptance/prove-capture-teeth.sh +114 -0
package/evals/acceptance/prove-teeth.sh +105 -0
package/evals/ci/antigaming-suite.sh +55 -0
package/evals/ci/run-baseline.sh +2 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/flows/review.flow.json +26 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/kit.json +20 -0
package/evals/fixtures/flow-kit-repository/valid-unknown-extension/flows/review.flow.json +26 -0
package/evals/fixtures/flow-kit-repository/valid-unknown-extension/kit.json +18 -0
package/evals/integration/test_builder_step_producers.sh +379 -0
package/evals/integration/test_bundle_install.sh +35 -71
package/evals/integration/test_bundle_lifecycle.sh +39 -2
package/evals/integration/test_captured_fail_reconciliation.sh +820 -0
package/evals/integration/test_checkpoint_signing.sh +489 -0
package/evals/integration/test_claim_lookup.sh +352 -0
package/evals/integration/test_command_log_fork_classification.sh +134 -0
package/evals/integration/test_command_log_integrity.sh +275 -0
package/evals/integration/test_context_map.sh +0 -2
package/evals/integration/test_dual_emit_flow_step.sh +278 -0
package/evals/integration/test_enforcer_expects_driven.sh +281 -0
package/evals/integration/test_evidence_capture_hook.sh +185 -0
package/evals/integration/test_flow_kit_repository.sh +2 -0
package/evals/integration/test_flowdef_session_activation.sh +273 -0
package/evals/integration/test_flowdef_session_history_preservation.sh +250 -0
package/evals/integration/test_gate_bypass_chain.sh +448 -0
package/evals/integration/test_gate_lockdown.sh +1137 -0
package/evals/integration/test_gate_review_inquiry_records.sh +399 -0
package/evals/integration/test_goal_fit_escape_hatch.sh +73 -0
package/evals/integration/test_goal_fit_hook.sh +69 -4
package/evals/integration/test_goal_fit_rederive.sh +263 -0
package/evals/integration/test_install_merge.sh +1176 -0
package/evals/integration/test_kit_identity_trust.sh +393 -0
package/evals/integration/test_mint_attestation.sh +373 -0
package/evals/integration/test_phase_map_and_gate_claim.sh +365 -0
package/evals/integration/test_publish_delivery.sh +269 -0
package/evals/integration/test_reconcile_soundness.sh +528 -0
package/evals/integration/test_resolvefirststep_security.sh +208 -0
package/evals/integration/test_session_resume_roundtrip.sh +286 -0
package/evals/integration/test_trust_checkpoint.sh +325 -0
package/evals/integration/test_trust_reconcile.sh +293 -0
package/evals/integration/test_verify_cli.sh +208 -0
package/evals/integration/test_workflow_sidecar_writer.sh +549 -34
package/evals/lib/node.sh +0 -6
package/evals/run.sh +47 -0
package/evals/static/test_workflow_skills.sh +6 -13
package/install.sh +0 -7
package/integrations/strands-ts/README.md +25 -15
package/integrations/veritas/flow-agents.adapter.json +1 -2
package/kits/builder/flows/build.flow.json +59 -12
package/kits/builder/kit.json +85 -15
package/kits/builder/skills/continue-work/SKILL.md +116 -0
package/kits/builder/skills/deliver/SKILL.md +36 -6
package/kits/builder/skills/design-probe/SKILL.md +28 -0
package/kits/builder/skills/execute-plan/SKILL.md +9 -1
package/kits/builder/skills/gate-review/SKILL.md +234 -0
package/kits/builder/skills/learning-review/SKILL.md +30 -0
package/kits/builder/skills/pickup-probe/SKILL.md +29 -0
package/kits/builder/skills/plan-work/SKILL.md +13 -1
package/kits/builder/skills/pull-work/SKILL.md +19 -0
package/kits/knowledge/adapters/default-store/index.js +38 -0
package/kits/knowledge/adapters/flow-runner/index.js +1620 -0
package/kits/knowledge/adapters/obsidian-store/index.js +36 -6
package/kits/knowledge/docs/store-contract.md +314 -0
package/kits/knowledge/evals/audit-freshness/suite.test.js +368 -0
package/kits/knowledge/evals/canonicalize-category/suite.test.js +383 -0
package/kits/knowledge/evals/contract-suite/suite.test.js +111 -0
package/kits/knowledge/evals/detect-contradictions/suite.test.js +324 -0
package/kits/knowledge/evals/entities/suite.test.js +40 -0
package/kits/knowledge/evals/glossary-sync/suite.test.js +416 -0
package/kits/knowledge/evals/hygiene-review/suite.test.js +396 -0
package/kits/knowledge/evals/retirement/suite.test.js +145 -0
package/kits/knowledge/flows/audit-freshness.flow.json +44 -0
package/kits/knowledge/flows/canonicalize-category.flow.json +44 -0
package/kits/knowledge/flows/detect-contradictions.flow.json +44 -0
package/kits/knowledge/flows/glossary-sync.flow.json +61 -0
package/kits/knowledge/flows/hygiene-review.flow.json +43 -0
package/kits/knowledge/kit.json +51 -1
package/package.json +6 -6
package/packaging/conformance/README.md +10 -2
package/packaging/conformance/fixtures/evidence-capture--allow-records-command.json +29 -0
package/packaging/conformance/fixtures/stop-goal-fit--block-bundle-disputed-claim.json +29 -0
package/packaging/conformance/fixtures/stop-goal-fit--block-capture-contradicts-claimed-pass.json +30 -0
package/packaging/conformance/fixtures/stop-goal-fit--block-mode.json +23 -0
package/packaging/conformance/fixtures/stop-goal-fit--off-mode.json +24 -0
package/packaging/conformance/fixtures/stop-goal-fit--warn-active-delivery.json +5 -2
package/packaging/conformance/fixtures/stop-goal-fit--warn-no-bundle.json +23 -0
package/packaging/conformance/fixtures/workflow-steering--reground-active-prompt.json +30 -0
package/packaging/conformance/fixtures/workflow-steering--reground-session-start.json +30 -0
package/packaging/conformance/run-conformance.js +1 -1
package/scripts/README.md +2 -1
package/scripts/build-universal-bundles.js +0 -1
package/scripts/ci/mint-attestation.js +221 -0
package/scripts/ci/trust-reconcile.js +545 -0
package/scripts/hooks/config-protection.js +423 -1
package/scripts/hooks/evidence-capture.js +348 -0
package/scripts/hooks/lib/liveness-read.js +113 -0
package/scripts/hooks/run-hook.js +6 -1
package/scripts/hooks/stop-goal-fit.js +1524 -79
package/scripts/hooks/workflow-steering.js +135 -5
package/scripts/install-codex-home.sh +39 -0
package/scripts/install-merge.js +330 -0
package/scripts/repair-command-log.js +115 -0
package/src/cli/init.ts +218 -20
package/src/cli/validate-workflow-artifacts.ts +18 -2
package/src/cli/verify.ts +100 -0
package/src/cli/workflow-sidecar.ts +2127 -84
package/src/cli.ts +2 -3
package/src/lib/flow-resolver.ts +369 -0
package/src/tools/build-universal-bundles.ts +34 -21
package/src/tools/generate-context-map.ts +3 -17
package/src/tools/validate-source-tree.ts +44 -104
package/build/src/tools/filter-installed-packs.d.ts +0 -2
package/build/src/tools/filter-installed-packs.js +0 -135
package/packaging/packs.json +0 -49
package/scripts/filter-installed-packs.js +0 -2
package/src/tools/filter-installed-packs.ts +0 -132

package/docs/adr/0001-flow-agents-consumes-flow.md CHANGED

@@ -41,7 +41,7 @@ Flow Agents owns:
 - provider settings
 - runtime adapters
 - native harness hooks
-- useful Flow-backed workflow packs
+- useful Flow-backed workflow Kits
 - Console views for agent users
 - local-first setup and runtime exports

package/docs/adr/0002-flow-kits-as-extension-unit.md CHANGED

@@ -8,6 +8,6 @@ Flow Agents will use **Flow Kit** as the product and implementation term for ins
 **Status**: Accepted
-**Considered Options**: Keeping "pack" was familiar from the current `packaging/packs.json`, but it was too generic and carried plugin-marketplace baggage. Keeping "pack" internally while using "kit" publicly was rejected because the repository is unpublished and a split vocabulary would make the migration harder to understand.
+**Considered Options**: Keeping "pack" was familiar from the then-current `packaging/` pack-composition layer, but it was too generic and carried plugin-marketplace baggage. Keeping "pack" internally while using "kit" publicly was rejected because the repository is unpublished and a split vocabulary would make the migration harder to understand. (That legacy composition layer was subsequently removed outright; the standalone base always installs and Kits carry depth through the Kit Catalog.)
 **Consequences**: Flow Agents will use `kits/catalog.json` as the Kit Catalog. The first real kit will live under `kits/builder/` with its own `kit.json` and Flow Definitions under `kits/builder/flows/`. The Builder Kit must be installable through the same compliance path as future external Flow Kit repositories.

package/docs/adr/0004-gates-expect-surface-claims.md CHANGED

@@ -8,6 +8,8 @@ Flow-backed kits will model rich gate evidence as claim expectations using the H
 **Status**: Accepted (updated: vocabulary aligned to Hachure trust.bundle in hachure-align)
+> **Clarified by ADR 0016 (2026-06-26) on claim-taxonomy ownership.** The kit's FlowDefinition `expects[]` is **authoritative** for which claims each gate requires (it declares the kit-namespaced types, e.g. `builder.verify.tests`). The core derives the generic *kind* of an expectation from that metadata; it does not hardcode a claim namespace. Generic kinds are flow/core vocabulary; the binding of kinds → steps + accepted statuses is the kit's FlowDefinition.
 **Considered Options**: Provider-aware gate rules were rejected because they would make Flow Definitions know too much about individual tools. Plain evidence strings such as `tests` or `veritas` were rejected because they cannot represent claim type, accepted status, producer authority, transparency gaps, or project-level enforcement overrides cleanly. An earlier version used `kind: "surface.claim"` and `artifact_kind: "TrustReport"/"Trust Snapshot"` — those have been renamed to `kind: "trust.bundle"` and `artifact_kind: "trust.bundle"` to align with the Hachure schema standard that Flow now ships.
 **Consequences**: Trusted producer mappings belong upstream in Flow project configuration, not Flow Agents runtime configuration. Flow Agents can help author, install, and adapt that configuration for agent runtimes, but CI, framework agents, local CLIs, and humans should all evaluate gates against the same Flow-owned authority model. When hachure is installed as an optional dependency, referenced trust artifacts are validated against hachure's trust-bundle.schema.json at evidence-recording time.
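
Reviewer note on the ADR 0016 clarification in this hunk: the sketch below is not code from the package. It only illustrates how a gate could be driven by a FlowDefinition's `expects[]` without hardcoding any claim namespace. The field names `claimType`, `acceptedStatuses`, `type`, and `status`, the status value `verified`, and the helper `unmetExpectations` are all assumptions made for illustration; only `builder.verify.tests` and `disputed` appear in the diff.

```javascript
// Hypothetical shapes: the expectation fields (claimType, acceptedStatuses) and
// the claim fields (type, status) are assumptions for illustration only.
function unmetExpectations(expects, claims) {
  // Index the recorded claims by their kit-namespaced type.
  const statusByType = new Map(claims.map((c) => [c.type, c.status]));
  // An expectation is unmet when no claim exists or its status is not accepted.
  return expects
    .filter((e) => !e.acceptedStatuses.includes(statusByType.get(e.claimType)))
    .map((e) => e.claimType);
}

// The kit declares what the gate requires; the enforcer never names a namespace.
const expects = [
  { claimType: "builder.verify.tests", acceptedStatuses: ["verified"] },
];
const claims = [{ type: "builder.verify.tests", status: "disputed" }];
console.log(unmetExpectations(expects, claims)); // the disputed claim is reported as unmet
```

The point of the design is that swapping in another kit changes only the `expects` data, not the enforcer function.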

package/docs/adr/0005-kubernetes-inspired-resource-contracts.md CHANGED

@@ -10,6 +10,8 @@ Date: 2026-05-27
 Accepted
+> **Migration directive (ADR 0015 / 0016, 2026-06-26):** this contract's "existing sidecars need migration plans" guidance has since been made a concrete, accepted directive — the bespoke `workflow-sidecar` FSM is a parallel reimplementation of this Resource Contract and **retires** via ADR 0015's phased migration (`state.json → WorkflowRun.status`, etc.). The Resource Contract is the run/state model; the FSM is not a legitimate parallel layer. See ADR 0016 for the unified three-boundary model.
 ## Context
 Kontour products need durable records that humans, agents, runtime adapters, provider integrations, CLI tools, CI systems, evals, and future control planes can all inspect without relying on hidden chat context. Kubernetes, Tekton, and Argo have converged on versioned resource shapes with metadata, desired state, observed status, and status conditions; that shape is familiar, toolable, and a good fit for workflow state. OpenTelemetry GenAI conventions are also emerging for agent observability, so Kontour products should not invent isolated telemetry concepts where compatible conventions already exist.

package/docs/adr/0007-skill-audit.md CHANGED

@@ -107,6 +107,6 @@ The dispositions in this audit table were implemented in PR #62:
 - `src/tools/build-universal-bundles.ts`: `collectAllSkills()` function added; bundle builders now collect skills from both `skills/` (tool-skills) and kit-declared `skills` arrays. Runtime bundles (`.claude/skills/`, `.codex/skills/`, etc.) include all kit-owned skills unchanged.
 - `src/tools/generate-context-map.ts`: `allSkillPaths()` function added; context map generation now includes kit-owned skills.
 - `src/tools/validate-source-tree.ts`: `validateLegacyRefs()` updated to skip legacy-ref matches that resolve as declared kit-owned asset subpaths.
-- `packaging/packs.json`: Skill entries limited to the 6 remaining tool-skills in `skills/`. Kit-owned skills are no longer listed in packs (they're always included in the bundle as kit assets).
+- The legacy `packaging/` pack-composition manifest (since removed): at the time of this audit, its skill entries were limited to the 6 remaining tool-skills in `skills/`, and kit-owned skills were already excluded (they ship as kit assets). That whole composition layer was later removed outright; the standalone `skills/`/`agents/`/`powers/` base always installs and Kits carry depth through the Kit Catalog.
 - `flow-agents kit inspect kits/builder` now reports `k1: true` (skills present).
 - `flow-agents kit inspect kits/knowledge` now reports `k1: true` (skills present).

package/docs/adr/0009-canonical-hook-core-kit-boundary.md ADDED

@@ -0,0 +1,95 @@
+---
+title: "ADR 0009: Canonical Hook Core/Kit Boundary"
+---
+# ADR 0009: Canonical Hook Core/Kit Boundary
+**Date:** 2026-06-23
+**Status:** Accepted (decided with Brian Anderson, 2026-06-23)
+> **Extended/corrected by ADR 0016 (2026-06-26).** This ADR de-coupled the canonical hook from Builder skill/template *names* but left the hardcoded `workflow.*` claim taxonomy in the hook as "core contract." ADR 0016 (Abstraction A) corrects that: the gate enforcer must be **driven by the active kit's FlowDefinition `expects[]`**, not a hardcoded taxonomy. The hook's hardcoded `workflow.check.*`/`workflow.critique.review`/`workflow.acceptance.criterion` filtering is reclassified as a **violation to remediate** (same treatment this ADR gave `DELIVERY_TYPES`). The anti-gaming re-derivation via Surface is unchanged — only the *source* of the expectations moves to the FlowDefinition.
+---
+## Context
+Flow Agents ships **canonical hooks** — `stop-goal-fit`, `workflow-steering`,
+`evidence-capture`, `config-protection`, `quality-gate` — wired into *every* runtime
+bundle as kit-neutral policy. ADRs 0007 (skill/tool/kit boundary) and 0008 (kit
+operation boundary) classify skills, tools, kits, and kit *operations*, but they
+never explicitly placed **hooks**. Hooks are a fourth thing: the always-on policy
+layer the runtime fires, independent of which kit is active.
+Dogfooding block-by-default surfaced a boundary violation: `stop-goal-fit` (a
+canonical hook shipped to all runtimes) hard-codes **Builder-Kit extension
+semantics** — `DELIVERY_TYPES` (the Builder skill names deliver/fix-bug/
+execute-plan/verify-work) plus `--deliver`/etc. artifact detection, and the Builder
+deliver-template section names (`Definition Of Done` / `Goal Fit Gate` /
+`Final Acceptance`). A hook that runs under *any* kit should not know *one* kit's
+skills or markdown template.
+An open sub-question rode along: is the workflow-state **phase set**
+(`idea → … → done`, enumerated in `schemas/workflow-state.schema.json`) a canonical
+core lifecycle, or Builder-Kit's own?
+## Decision
+### 1. Apply ADR 0008's dividing test to canonical hooks
+A canonical hook **may read the kit-neutral core workflow contract**: the
+`schemas/workflow-*.schema.json` shapes (`state` status/phase/next_action,
+`evidence` verdict/checks, `acceptance` criteria) and `command-log.jsonl`. These are
+container/contract-level (the *WHAT*), owned in core `schemas/` + `context/contracts/`
+and agentless-capable (ADR 0001).
+A canonical hook **may NOT interpret kit-extension semantics**: specific skill names,
+a kit's artifact-template section conventions, or any per-kit vocabulary. This is
+ADR 0008's *agent-blind guardrail*, extended from kit operations to hooks: *does the
+hook interpret the extension (kit-specific) or only the contract (core)?*
+### 2. The workflow-state phase set is canonical/core
+Flow owns a **single canonical task lifecycle**; kit Flow Definitions map their steps
+onto it. Kits do not invent phases. This is what lets one canonical gate work across
+every kit — the cross-kit gate property. (Trade-off accepted: a kit with a radically
+different lifecycle must map onto canonical phases or motivate adding one.)
+### 3. Consequence for `stop-goal-fit` (the de-coupling)
+- Detect the active task by the **kit-neutral signal** — presence of `state.json` /
+  the workflow sidecars, scoped via `.flow-agents/current.json` — **not** by the
+  Builder skill-name patterns. Remove `DELIVERY_TYPES` / `--deliver` coupling.
+- The deliver-template **finish-line section checks** (`Definition Of Done` /
+  `Goal Fit Gate` / `Final Acceptance`) are Builder-Kit vocabulary → supplied by the
+  **kit** (config the canonical hook reads) or a Builder-Kit-owned check, not
+  hard-coded in the canonical hook.
+- The deterministic enforcement that matters (evidence verdict/checks + the capture
+  cross-reference + acceptance criteria) is all **schema-based** and stays in the
+  canonical hook unchanged.
+## Consequences
+- The canonical gate becomes genuinely kit-neutral; the proven false-completion catch
+  (conformance L2; `prove-capture-teeth`) is unaffected because it is schema-based.
+- Builder-Kit finish-line conventions move to kit ownership — a small, mechanical
+  refactor analogous to the skill-placement move ADR 0007 made mechanical.
+- **Resolves the A/B fork as A** ("core knows the core contract"), with the precise
+  boundary: **schema + canonical phases = core; skill-names + template-sections = kit.**
+## Alternatives Considered
+- **B — canonical hooks fully kit-agnostic; the stop gate is Builder-Kit-owned.**
+  Rejected: the workflow contract is core (schemas/, context/contracts/, agentless per
+  ADR 0001), and relocating the gate per-kit duplicates a well-tested enforcement,
+  violating consume-never-fork. The dividing test already gives a kit-neutral core gate.
+- **Leave the skill-name / template-section coupling in place.** Rejected: it violates
+  ADR 0008's agent-blind guardrail — a hook shipped to every runtime must not interpret
+  one kit's extension.
+## References
+- [ADR 0001: Flow Agents Consumes Flow](./0001-flow-agents-consumes-flow.md)
+- [ADR 0007: Flow / Skill / Kit / Tool Boundary](./0007-flow-skill-kit-tool-boundary.md)
+- [ADR 0008: Kit Operation Boundary](./0008-kit-operation-boundary.md) — the dividing test this ADR extends to hooks.
+- [ADR 0004: Gates Expect Hachure Trust Bundles](./0004-gates-expect-surface-claims.md)
+- `schemas/workflow-*.schema.json`; `scripts/hooks/{stop-goal-fit,evidence-capture,workflow-steering}.js`.
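
Reviewer note on ADR 0009, section 3: a minimal sketch, not the package's implementation, of detecting the active task by the kit-neutral signal the ADR describes (presence of `state.json`, scoped via `.flow-agents/current.json`) rather than by Builder skill names. The `slug` field on the parsed `current.json` and the helper name `detectActiveTask` are assumptions; the diff does not show the real shape of `current.json`.

```javascript
// Hypothetical helper: decides whether a task is active using only kit-neutral
// signals. `current` stands in for the parsed .flow-agents/current.json and
// `files` for the file names found in the task directory. The `slug` field is
// an assumed shape, not the package's documented schema.
function detectActiveTask(current, files) {
  if (!current || typeof current.slug !== "string") return null;
  // Presence of state.json is the signal; skill names are never inspected.
  if (!files.includes("state.json")) return null;
  return { slug: current.slug, dir: `.flow-agents/${current.slug}` };
}

console.log(detectActiveTask({ slug: "my-task" }, ["state.json", "command-log.jsonl"]));
console.log(detectActiveTask({ slug: "my-task" }, ["notes.md"])); // no state.json, so no active task
```

Passing the directory listing in as data keeps the sketch free of filesystem calls; a real hook would read these from disk.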

package/docs/adr/0010-workflow-trust-state-as-hachure-bundle.md ADDED

@@ -0,0 +1,139 @@
+---
+title: "ADR 0010: Workflow Trust State as a Hachure Trust Bundle"
+---
+# ADR 0010: Workflow Trust State as a Hachure Trust Bundle
+**Date:** 2026-06-23
+**Status:** Accepted. Phase 1 (emit) shipped (pre-existing); **maximal enrichment** (verification-policies + capture-authoritative evidence), **Phase 2 core** (the gate enforces on the bundle's Surface-derived claim statuses), **Phase 3 local projection** (`render-trust-panel` emits a standalone Surface Trust Panel HTML), and **Phase 2 hardening complete** (`DELIVERY_TYPES`/markdown removal done in sub-step 2c: `stop-goal-fit` verdict is now fully bundle-driven; Builder template heading checks removed; sidecar-driven Final Acceptance hygiene added). Phase 4 shipped: bespoke evidence.json/critique.json writes retired (ADR 0010 4c); trust.bundle is the sole verification artifact. Remaining: Phase 3 Console sink.
+**Supersedes:** the interim markdown-de-coupling of ADR 0009 (see "Relationship to ADR 0009").
+---
+## Context
+Flow Agents stores a task's workflow state as several bespoke JSON sidecars under
+`.flow-agents/<slug>/`: `evidence.json` (verdict + checks), `acceptance.json`
+(criteria), `critique.json` (review findings), `command-log.jsonl` (captured command
+results), plus `state.json` (lifecycle: status/phase/next_action). The canonical hooks
+(`stop-goal-fit`, `evidence-capture`) read these directly, and `stop-goal-fit` also
+parses a Builder **markdown template** (`## Definition Of Done`, `## Goal Fit Gate`,
+`### Verdict: PASS`).
+Meanwhile **ADR 0004 already established that Flow gates expect Hachure `trust.bundle`
+claims** (`builder.verify.tests`, etc.). So the platform is *half Hachure* at the gate
+layer and *bespoke* at the local-sidecar layer — a duplicated representation of the same
+thing: **is this work provably done?** That question is, definitionally, Hachure's: *a
+claim's status recomputed from its evidence by a pure, versioned function.*
+## Decision
+**The workflow's *trust* state is represented as a Hachure trust bundle; the gate
+evaluates that bundle; Surface projects it to the Trust Panel / Console.**
+### What maps to the bundle (trust state)
+- `evidence.json` checks → **claims + evidence** (status recomputed from the result).
+- `acceptance.json` criteria → **claims** (each AC; status from its evidence_refs).
+- `command-log.jsonl` → **evidence/traces** the claims recompute *from* (the event
+  stream behind the bundle).
+- `critique.json` → **claims/findings**.
+### What stays out of the bundle (lifecycle ≠ trust)
+`state.json` — `status` / `phase` / `next_action` — is workflow **control** state (the
+*WHAT-step*, owned by Flow per ADR 0007), **not** a trust claim. It stays a Flow
+workflow-state record, *referenced by* the bundle, not modeled *as* claims.
+### The gate
+`stop-goal-fit` stops parsing the Builder markdown template and stops reading bespoke
+sidecars; it **recomputes the Hachure bundle** (Surface evaluation) and gates on claim
+statuses + the capture cross-reference. This is the full realization of ADR 0009 (gate
+on the canonical contract) — the gate becomes a deterministic Hachure recompute,
+inherently portable and third-party-verifiable.
+### Distribution: local-first, Console as optional projection
+- **Local file is the source of truth.** Enforcement reads the local bundle and works
+  fully offline. **The gate never depends on Console or any network sink.**
+- **Console / Trust Panel is an optional Surface *projection*** over the bundle —
+  Surface already defines `TrustBundle → Trust Panel | Console | API | MCP`. Flow Agents
+  *produces* the local bundle; **Surface owns projection**; Console is the plane. Flow
+  Agents must not grow a bespoke "push to console" path (consume-never-fork).
+- This is the **existing telemetry sink config** generalized: *local always; Console
+  optional*. It is also the monetization boundary made literal — **open data plane (local
+  bundle, free) / paid control plane (Console)**.
+## Consequences
+- **One representation end-to-end** — finishes what ADR 0004 started; kills the
+  bespoke/Hachure duplication.
+- **Surface Trust Panel "for free"** — workflow outcomes (which checks passed/failed,
+  why a stop blocked) become viewable + recomputable in the company's own panel. This is
+  also a dogfood/demo: Flow Agents' own workflow trust state, in the company's trust
+  format, in the company's panel.
+- **Deterministic gate** — the verdict is a pure recompute, not bespoke parsing.
+- **Costs (eyes open):** (1) hook weight — the Stop hook gains a Hachure recompute +
+  schema validation vs. dependency-free `JSON.parse` (mitigable; `hachure` is already an
+  optional dep per ADR 0004; the recompute is pure). (2) Migration touches every producer
+  (`workflow-sidecar` writer), consumer (hooks, `validate-workflow-artifacts`), schema,
+  and test. (3) Regression risk concentrates on the exact hook that twice broke the
+  capture loop from haste — so it is phased with proofs as guardrails.
+## Phased Migration (proof-gated)
+Each phase must keep `prove-capture-teeth` (8/8) and conformance (L2) green before the
+next begins.
+- **Phase 0 (today):** gate enforces on bespoke sidecars + the hygiene fixes (`08319f4`).
+- **Phase 1 — dual-write the bundle: ✅ SHIPPED.** `workflow-sidecar`'s
+  `buildTrustBundle`/`writeTrustBundle` emit a validated Hachure `trust.bundle` for
+  evidence/acceptance/critique, wired into `record-evidence`/`record-critique`/
+  `advance-state`. **Maximal enrichment (this PR):** a `VerificationPolicy` per claimType
+  and the `command-log` capture folded in as `execution` evidence — and capture is
+  *authoritative* (a claimed-pass whose captured command FAILED is `disputed` in the
+  bundle). *Proof: `test_workflow_sidecar_writer` AC3 + capture/conformance green.*
+- **Phase 2 — gate enforces on the bundle: ◑ CORE SHIPPED (this PR).** `stop-goal-fit`
+  reads the `trust.bundle` and blocks on any high-impact `disputed` claim (Surface-derived
+  status; a canonical false-completion signal — HARD_BLOCK at any phase, incl. terminal).
+  Additive: the capture cross-reference is preserved. *Proof: new conformance fixture
+  `stop-goal-fit--block-bundle-disputed-claim` (L2 21/21); prove-capture-teeth 8/8.*
+  **Remaining hardening:** (a) *re-derive at the gate* via Surface `buildTrustReport`
+  (needs an async hook restructure — today the gate reads the write-time Surface-derived
+  status); (b) *remove* the `DELIVERY_TYPES`/markdown parsing (subsumes ADR 0009) — the
+  capture proof seeds raw evidence+log and relies on markdown detection, so this is
+  reworked carefully, not ripped out.
+- **Phase 3 — Surface projection: ◑ LOCAL SHIPPED (this PR).** `render-trust-panel <dir>`
+  derives the report (Surface `buildTrustReport`) and emits a **standalone HTML** embedding
+  Surface's dependency-free `<surface-trust-panel>` element — projection fully delegated to
+  Surface (consume-never-fork). Enforcement stays local/offline. *Proof:
+  `test_workflow_sidecar_writer` AC4.* **Remaining:** the **Console sink** (behind the
+  existing opt-in telemetry sink — local always, Console optional).
+- **Phase 4 — retire the bespoke sidecars** (keep `state.json` lifecycle only) once all
+  producers/consumers/tests are on the bundle.
+## Relationship to ADR 0009
+ADR 0009 (apply 0008's dividing test to canonical hooks; phases canonical-core) stands.
+Its *mechanical interim* — de-coupling the Builder markdown/skill-name parsing in place —
+is **superseded**: do not polish the bespoke hook; Phase 2 replaces that parsing with
+bundle recompute, achieving the same kit-neutrality via the canonical Hachure format.
+## Alternatives Considered
+- **Keep bespoke sidecars.** Rejected: duplicates ADR 0004's gate representation, no
+  Trust Panel, gate stays bespoke. (Defensible only if the Trust Panel view is not wanted
+  soon — in which case defer Phases 2–4, but Phase 1 dual-write still de-risks.)
+- **A new bespoke "stream trust state to Console" mechanism.** Rejected: Surface already
+  projects bundles to Console and telemetry already has a local/Console sink config —
+  compose them, don't fork.
+## References
+- [ADR 0004: Gates Expect Hachure Trust Bundles](./0004-gates-expect-surface-claims.md)
+- [ADR 0008: Kit Operation Boundary](./0008-kit-operation-boundary.md) — the dividing test.
+- [ADR 0009: Canonical Hook Core/Kit Boundary](./0009-canonical-hook-core-kit-boundary.md) — superseded interim.
+- `surface/CONTEXT.md` — `TrustBundle → Trust Panel | Console | API | MCP` projections.
+- hachure `trust-bundle.schema.json`; `install.sh --telemetry-sink` (local/Console sink precedent).
+- `scripts/hooks/{stop-goal-fit,evidence-capture}.js`; `src/cli/workflow-sidecar.ts`.
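
Reviewer note on ADR 0010, Phases 1 and 2: the ADR states that capture is authoritative (a claimed pass whose captured command failed is `disputed`) and that the gate blocks on any high-impact `disputed` claim. The sketch below shows that rule with plain functions standing in for the Surface-derived recompute. It is not the Hachure or Surface API; `deriveStatus`, `gateDecision`, and the fields `impact`, `status`, and `type` are hypothetical names chosen for illustration.

```javascript
// Hypothetical shapes throughout: the claimed status, `impact`, and the captured
// exit codes are illustrative fields, not the Hachure or Surface API.
function deriveStatus(claimedStatus, capturedExitCodes) {
  // Capture is authoritative: a claimed pass contradicted by a captured
  // failing command becomes "disputed".
  const captureFailed = capturedExitCodes.some((code) => code !== 0);
  if (claimedStatus === "pass" && captureFailed) return "disputed";
  return claimedStatus;
}

function gateDecision(claims) {
  // Block on any high-impact disputed claim, regardless of workflow phase.
  const blocking = claims.filter(
    (c) => c.impact === "high" && c.status === "disputed"
  );
  return { block: blocking.length > 0, reasons: blocking.map((c) => c.type) };
}

const claim = {
  type: "builder.verify.tests",
  impact: "high",
  status: deriveStatus("pass", [0, 1]),
};
console.log(gateDecision([claim])); // block is true: the captured failure disputes the claimed pass
```

Because both functions are pure, the same inputs always give the same verdict, which is the property the ADR relies on when it calls the gate a deterministic recompute.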

package/docs/adr/0011-mcp-posture.md ADDED

@@ -0,0 +1,100 @@
+---
+title: "ADR 0011: MCP Posture — Enforcement Stays Hooks; Surface Owns MCP Projection; No Auto-Injected Config"
+---
+# ADR 0011: MCP Posture
+**Date:** 2026-06-24
+**Status:** Accepted (decided with Brian Anderson, 2026-06-24).
+---
+## Context
+Flow Agents integrates with agent runtimes through two mechanisms: **hooks** (PostToolUse
+capture, Stop gate, SessionStart — deterministic, automatic, can block) and a **CLI**
+(`workflow-sidecar` — operations the agent invokes: `init-plan`, `record-evidence`,
+`advance-state`, `current`, `render-trust-panel`). The question arose whether Flow Agents
+should also expose an **MCP** surface — and specifically how to surface a workflow's
+**trust report inline in the conversation** by leveraging Surface's existing MCP-UI trust
+panel (`buildTrustPanelUiResource`, `ui://surface/trust-panel/…`).
+Two facts shaped the decision:
+- MCP tools are **agent-invoked** — the agent *chooses* to call them; they cannot be forced
+  and cannot block.
+- Surface **already** exposes an MCP server (`surface mcp`) whose tools re-derive a trust
+  report from a trust input and return the MCP-UI panel.
+## Decision
+### 1. Enforcement stays hooks. MCP never carries the gate.
+The gate (`stop-goal-fit` blocking) and capture (PostToolUse) are deterministic runtime
+**interception** — they fire automatically and exit non-zero to block. MCP tools are
+agent-invoked and cannot intercept or block. **MCP therefore cannot carry Flow Agents'
+differentiator (deterministic teeth); enforcement remains hooks.** Any MCP surface is
+*additive* to — never a replacement for — hooks.
+### 2. Trust-surfacing in-conversation = consume Surface's MCP; do not build our own.
+Flow Agents **produces** `.flow-agents/<slug>/trust.bundle`; **Surface's MCP projects** it
+to the MCP-UI trust panel. Flow Agents writes **zero MCP code** (consume-never-fork; Surface
+owns projection). Ingestion uses Surface's **per-call `path`** argument (the skill passes the
+active task's bundle, resolved from `current.json`), **not** a static `--input` set at server
+launch — a single static input cannot follow a session's many per-task bundles or its moving
+"current." A directory / `current`-aware Surface ingestion is the cleaner long-term shape,
+but that is **Surface's** design to evaluate (kontourai/surface#95) — not something Flow
+Agents hacks around.
+### 3. Never auto-inject MCP config into files we do not own.
+Registering an MCP server edits runtime config (Claude Code `.mcp.json`, Codex equivalent)
+that belongs to the **user**, not Flow Agents. **Hooks are core** to Flow Agents' function
+(justified to write); **MCP-surfacing is optional sugar** (must not be force-injected).
+Therefore:
+- **Default — zero writes:** the installer *documents/prints* the exact `surface mcp`
+  registration snippet; the user adds it if they want it.
+- **Convenience — explicit, reversible opt-in:** a `flow-agents trust:mcp enable|disable`
+  command writes a **fenced managed block** (`# BEGIN flow-agents (managed) … # END`) — easy
+  to enable, trivial to remove. Never on plain install. Tracked in flow-agents#137.
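The opt-in, reversible write described above can be sketched as two pure string functions. This is a minimal illustration, not the `trust:mcp` implementation: the function names are hypothetical, the closing marker text `# END flow-agents (managed)` is an assumption (the ADR only shows `# END`), and the sketch assumes a line-oriented config format that tolerates `#` comments.

```typescript
// Hypothetical sketch of a fenced managed block; not the actual trust:mcp code.
const BEGIN = "# BEGIN flow-agents (managed)";
// Assumption: the ADR shows only "# END"; a fuller marker is used here for clarity.
const END = "# END flow-agents (managed)";

// Remove the managed block, leaving everything the user wrote untouched.
function disableManagedBlock(config: string): string {
  const start = config.indexOf(BEGIN);
  if (start === -1) return config; // nothing to remove
  const endIdx = config.indexOf(END, start);
  if (endIdx === -1) return config; // malformed block: refuse to guess
  let after = endIdx + END.length;
  if (config[after] === "\n") after += 1; // also drop the block's trailing newline
  return config.slice(0, start) + config.slice(after);
}

// Append the managed block, replacing any earlier copy so the call is idempotent.
function enableManagedBlock(config: string, snippet: string): string {
  const cleaned = disableManagedBlock(config);
  const sep = cleaned.length === 0 || cleaned.endsWith("\n") ? "" : "\n";
  return `${cleaned}${sep}${BEGIN}\n${snippet}\n${END}\n`;
}
```

For a config that ends with a newline, `disableManagedBlock(enableManagedBlock(config, snippet))` returns the original text, which is the "trivial to remove" property the ADR asks for.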
+### 4. A Flow Agents *invocation* MCP is a separate, deferred decision.
+Flow Agents *could* expose its workflow operations as MCP **tools** — valuable for **reach**
+(MCP hosts without shell access, e.g. claude.ai web) and first-class, discoverable
+operations. But an **MCP-only host gets invocation without enforcement** (no hooks) — a
+**capped conformance tier** (invoke-without-enforce ranks below hook-enforce in the L0/L1/L2
+model). This is a larger architecture decision, **not adopted here**; recorded as a future
+possibility gated on a real need for non-shell reach.
+## Consequences
+- Trust-in-conversation is **opt-in and boundary-pure** — Surface owns the MCP + the UI;
+  Flow Agents produces the bundle and (optionally, with consent) points Surface at it.
+- The **local HTML projection (`render-trust-panel`, #135) stays the no-config default**;
+  MCP-surfacing is the opt-in upgrade.
+- Flow Agents' config posture is explicit: **core (hooks) writes; optional (MCP) is
+  documented / opt-in / reversible.**
+- The `surface mcp --input` ingestion is referred to Surface for evaluation (kontourai/surface#95).
+## Alternatives Considered
+- **Auto-inject Surface's MCP on install.** Rejected: edits config Flow Agents does not own,
+  for optional sugar; surprising and hard to cleanly remove.
+- **Build an MCP-UI trust panel in Flow Agents.** Rejected: forks Surface's
+  `buildTrustPanelUiResource` and requires standing up an MCP server (consume-never-fork
+  violation).
+- **Replace hooks with MCP tools.** Rejected: MCP is agent-invoked and cannot block — it
+  cannot carry deterministic enforcement.
+- **Static `--input` for trust ingestion.** Rejected for Flow Agents' use: cannot follow
+  per-task bundles or a moving current; use per-call `path`. The exposure itself is referred
+  to Surface (kontourai/surface#95).
+## References
+- [ADR 0010: Workflow Trust State as a Hachure Trust Bundle](./0010-workflow-trust-state-as-hachure-bundle.md) — the bundle this surfaces; #135 (`render-trust-panel` local projection).
+- `@kontourai/surface` `src/commands/mcp.ts` (`surface mcp --input`, per-call `path`, `buildTrustPanelUiResource`).
+- kontourai/surface#95 — evaluate `surface mcp --input` ingestion exposure.
+- flow-agents#137 — opt-in `trust:mcp` wiring command.

package/docs/adr/0012-agent-coordination-as-liveness-claims.md ADDED

@@ -0,0 +1,119 @@
+---
+title: "ADR 0012: Agent Coordination as Hachure Liveness Claims"
+---
+# ADR 0012: Agent Coordination as Hachure Liveness Claims
+**Date:** 2026-06-24
+**Status:** Accepted (decided with Brian Anderson, 2026-06-24). Grounded by a round-trip proof against the current Surface kernel; **gated on a Surface dependency bump** (see Consequences).
+---
+## Context
+Multiple agents (and human + agent teams) work the same repo concurrently — this is the
+normal case, not the exception. This session *was* the experiment. What actually prevented
+collisions was cheap **tolerance** (isolated git worktrees + small PRs + PR/CI
+serialization); what *hurt* was (1) **discovery** — agents repeatedly almost-rebuilt
+in-flight work because there was no shared "what's claimed" signal — and (2) the **merge
+race** (strict up-to-date branch requirements vs. a fast-moving main).
+`pull-work` already speaks `in_progress` and parses "coordinate with" blockers, but it does
+**not** *write* a claim or exclude claimed items — the backlog→pull-work loop the system was
+designed for is unfinished. The recurring word was **claim**, which is exactly a Hachure
+concept: a claim with **evidence** and **freshness**, whose status is **recomputed**.
+## Decision
+### 1. A work-claim is a Hachure claim under a *liveness policy* — not a new subsystem.
+An agent claiming work emits a claim (`claimType: workflow.coordination.hold`) governed by a
+**liveness policy** (a `ttlSeconds` window) and kept alive by **heartbeat** (verified)
+events. The coordination lifecycle derives from the **existing** Surface status function —
+**proven** (5/5) against the current kernel:
+| Coordination state | Mechanism | Derived `TrustStatus` |
+|---|---|---|
+| **held** | claim + heartbeat within `ttlSeconds` | `verified` |
+| **reclaimable** | heartbeat lapsed past ttl | `stale` |
+| **released** | `revoked` invalidation event | `stale` |
+| **taken over** | `superseded` event | `superseded` |
+| **reclaimed** | new holder's fresh heartbeat | `verified` |
+No new statuses, no new machinery — a claim's nature is defined by its **policy**, and the
+liveness policy is a reuse of the duration/ttl freshness logic (`claimIntrinsicExpiry`).
+**`stale` is the reaper**: abandoned claims expire by construction, so no separate cleanup job is needed.
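The state table above can be read as a small status function. The sketch below is illustrative only: the types and `deriveHoldStatus` are hypothetical and are not Surface's `deriveTrustStatus` or `claimIntrinsicExpiry`. It assumes that a terminal event (`superseded` or `revoked`) wins over any heartbeat, and that a claim with no heartbeat counts as `stale`.

```typescript
// Hypothetical shapes for illustration; these are not Surface's actual types.
type TrustStatus = "verified" | "stale" | "superseded";

interface CoordinationEvent {
  kind: "heartbeat" | "revoked" | "superseded";
  at: number; // epoch seconds
}

interface HoldClaim {
  claimType: "workflow.coordination.hold";
  ttlSeconds: number;
  events: CoordinationEvent[];
}

function deriveHoldStatus(claim: HoldClaim, now: number): TrustStatus {
  // Assumption: terminal events take precedence over heartbeats.
  if (claim.events.some((e) => e.kind === "superseded")) return "superseded"; // taken over
  if (claim.events.some((e) => e.kind === "revoked")) return "stale"; // released
  const heartbeats = claim.events
    .filter((e) => e.kind === "heartbeat")
    .map((e) => e.at);
  if (heartbeats.length === 0) return "stale"; // assumption: never heartbeated
  const lastHeartbeat = Math.max(...heartbeats);
  // held while the last heartbeat is inside the ttl window, reclaimable after
  return now - lastHeartbeat <= claim.ttlSeconds ? "verified" : "stale";
}
```

With `ttlSeconds: 60` and a single heartbeat at `t = 100`, the claim derives `verified` at `t = 130` and `stale` at `t = 161`. A reclaim is simply a new holder's claim with its own fresh heartbeat.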
+### 2. Coordination and verification are *siblings under one subject*, not nested.
+The delivery workflow's progress is **not evidence for the reservation** (a heartbeat is). So
+the coordination claim and the verification bundle are **co-equal claims about the same
+`subjectId`** (the work-item / backlog identity), derived from **one event stream**, linked
+by `identityLinks` + an optional `derivationEdges` reference (so "who holds it" can drill into
+"and here's their progress"). The work-item identity is the join key; the provider adapter
+maps issue → `subjectId` and optionally projects the claim back (label/assignee).
+### 3. Resumption via durable evidence — strictly better than a lock.
+The coordination claim is **ephemeral** (liveness); the verification evidence is **durable**.
+When a holder goes dark, its claim goes `stale` but its evidence **survives** — so the next
+agent **resumes from recorded state, not restart**. The *same freshness* that reaps the claim
+also tells the resumer which inherited evidence is still valid (fresh) vs. must be re-run
+(stale). This generalizes the bespoke `handoff.json`.
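The resumption step can be sketched as a partition of inherited evidence by freshness. Everything here is an assumption made for illustration: `EvidenceRecord`, its fields, and `partitionInheritedEvidence` are hypothetical names, and a per-record `ttlSeconds` window stands in for whatever freshness policy the bundle actually carries.

```typescript
// Hypothetical evidence shape; the real bundle schema is not shown in the ADR.
interface EvidenceRecord {
  id: string;
  recordedAt: number; // epoch seconds
  ttlSeconds: number; // assumed per-record freshness window
}

interface ResumePlan {
  reusable: EvidenceRecord[]; // still fresh: resume from this recorded state
  rerun: EvidenceRecord[]; // stale: must be produced again
}

function partitionInheritedEvidence(
  evidence: EvidenceRecord[],
  now: number
): ResumePlan {
  const plan: ResumePlan = { reusable: [], rerun: [] };
  for (const record of evidence) {
    const age = now - record.recordedAt;
    // Same freshness rule that reaps the hold: inside the window is valid.
    if (age <= record.ttlSeconds) {
      plan.reusable.push(record);
    } else {
      plan.rerun.push(record);
    }
  }
  return plan;
}
```

A resuming agent would keep the `reusable` records as-is and schedule only the `rerun` ones, which is what distinguishes resumption from a restart.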
+### 4. Advisory, not a lock.
+A bundle is *additive*; a lock is a *mutex*. The recompute is **awareness**, not a seizure.
+Actual serialization happens at the **integration layer** (branch/PR/merge-queue). A
+false-stale double-hold (two fresh claims on one subject) is **detected** via Hachure
+`conflictRules`/`conflictedClaims`, not prevented. This is why **ttl/heartbeat tuning is the
+operational risk**: a too-tight ttl manufactures false reclaims; keep it advisory and
+double-checked against real branch/PR state.
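Detection rather than prevention can be sketched as a scan for subjects with more than one fresh hold. This is not Hachure's `conflictRules`/`conflictedClaims` mechanism; `HoldView` and `findContestedSubjects` are hypothetical names, and the sketch only illustrates the advisory idea that a double-hold is reported, never blocked.

```typescript
// Hypothetical flattened view of a hold claim, for illustration only.
interface HoldView {
  subjectId: string;
  holder: string;
  lastHeartbeatAt: number; // epoch seconds
  ttlSeconds: number;
}

// Returns subjectId -> holders for every subject with two or more fresh holds.
function findContestedSubjects(
  claims: HoldView[],
  now: number
): Map<string, string[]> {
  const freshHolders = new Map<string, string[]>();
  for (const claim of claims) {
    const isFresh = now - claim.lastHeartbeatAt <= claim.ttlSeconds;
    if (!isFresh) continue; // stale holds are reclaimable, not contested
    const holders = freshHolders.get(claim.subjectId) || [];
    holders.push(claim.holder);
    freshHolders.set(claim.subjectId, holders);
  }
  const contested = new Map<string, string[]>();
  freshHolders.forEach((holders, subjectId) => {
    if (holders.length > 1) contested.set(subjectId, holders);
  });
  return contested;
}
```

The result is a report to surface to agents; per the ADR, the actual serialization still happens at the branch/PR/merge-queue layer.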
+### 5. Flow-owned; Veritas optional; local-first.
+- **Hachure:** the schema + a **liveness policy archetype** (a policy shape, *not* a new type).
+- **Flow:** owns the shared coordination **stream + recompute** (it is the workflow/event engine).
+- **Flow Agents:** `pull-work` emits/heartbeats/releases; a hook **surfaces** "lane X held by A (fresh, PR #n) / lane Y stale — reclaimable."
+- **Sink:** local file or **git ref** first (solo + the real model, not a throwaway) → optional hosted relay/provider → optional **Surface/Console** projection (a live activity panel). **Never required-Console.**
+- **Veritas:** an *optional* policy layer on top (e.g. "don't merge into a contested lane"); not in the path.
+### 6. Policy archetypes — a tight, universal set only.
+A small set of status-derivation **shapes** — **evidence-backed**, **liveness**,
+**attestation**, **corroboration** — is general enough to live in **Hachure** as a reference
+profile (interop + de-dupes our hand-rolled `VerificationPolicy` instances from ADR 0010).
+Tuned **instances** stay in the products. Domain policies must **not** go in the format.
+## Consequences
+- **Completes the backlog→pull-work loop** and serves solo (local file/git-ref) *and* team
+  (shared provider/relay) with one model — separation of concerns for context that can't
+  hold every task at once.
+- **Resumption beats locking**; the same primitive proves verification *and* coordination —
+  strong evidence an open trust format is general, not single-purpose (the dogfood *is* the
+  justification).
+- **Prerequisite (proven):** flow-agents depends on `@kontourai/surface@^1.0.1` and installs
+  **1.0.1**, which *predates* the `ttlSeconds`/`claimIntrinsicExpiry` liveness logic. The
+  round-trip **fails on 1.0.1 and passes on 1.2.1**. So this is gated on **bumping Surface to
+  ≥1.2.x** (which also benefits the existing trust bundles' freshness).
+- TTL/heartbeat defaults must be configurable; the layer stays advisory.
+## Alternatives Considered
+- **Hard lock (lease server / branch CAS).** Rejected as the primary: stale leases orphan
+  work; you can't predict an agent's file footprint at claim time, so prevention rests on a prediction you'll get wrong.
+- **Issue-marking only (label/assignee in-progress).** Good *thin* layer, but solves
+  work-*item* collision, not work-*area* (file) collision — which is what actually bit us — and
+  needs TTL/atomicity anyway. Necessary, not sufficient.
+- **A bespoke coordination subsystem / new statuses.** Rejected: square-peg; express it as a
+  policy archetype and reuse the status function.
+- **Veritas owns lane-conflict.** Rejected: that's awareness/shared-state (Flow), not
+  policy-compliance (Veritas). Veritas is optional on top.
+## References
+- [ADR 0010: Workflow Trust State as a Hachure Trust Bundle](./0010-workflow-trust-state-as-hachure-bundle.md) — the verification sibling.
+- `@kontourai/surface` `src/status.ts` — `deriveTrustStatus`, `claimIntrinsicExpiry`, terminal-event fold (the proven kernel).
+- Round-trip proof: `held→stale→released→superseded→reclaimed` (5/5 on Surface 1.2.1; fails on 1.0.1).
+- `handoff.json` (bespoke resumption precedent); flow-agents#137 (`pull-work` claim wiring); kontourai/surface#95 (`mcp --input` ingestion).