npm - @kontourai/flow-agents - Versions diffs - 1.4.0 → 2.0.0 - Mend

@kontourai/flow-agents 1.4.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (180)

package/.github/CODEOWNERS +29 -0
package/.github/actions/trust-verify/action.yml +145 -0
package/.github/workflows/ci.yml +11 -4
package/.github/workflows/kit-gates-demo.yml +2 -2
package/.github/workflows/publish-npm.yml +10 -2
package/.github/workflows/release-please.yml +1 -1
package/.github/workflows/trust-reconcile.yml +113 -0
package/AGENTS.md +13 -0
package/CHANGELOG.md +95 -0
package/CONTRIBUTING.md +4 -4
package/README.md +1 -0
package/agents/tool-planner.json +1 -1
package/build/src/cli/init.js +242 -20
package/build/src/cli/validate-workflow-artifacts.js +19 -2
package/build/src/cli/verify.d.ts +1 -0
package/build/src/cli/verify.js +90 -0
package/build/src/cli/workflow-sidecar.d.ts +300 -8
package/build/src/cli/workflow-sidecar.js +1934 -83
package/build/src/cli.js +2 -3
package/build/src/lib/flow-resolver.d.ts +82 -0
package/build/src/lib/flow-resolver.js +237 -0
package/build/src/tools/build-universal-bundles.js +34 -22
package/build/src/tools/generate-context-map.js +3 -16
package/build/src/tools/validate-source-tree.d.ts +1 -1
package/build/src/tools/validate-source-tree.js +42 -162
package/context/contracts/artifact-contract.md +10 -0
package/context/contracts/delivery-contract.md +1 -0
package/context/contracts/review-contract.md +1 -0
package/context/contracts/verification-contract.md +2 -0
package/context/gate-awareness.md +39 -0
package/context/scripts/hooks/stop-goal-fit.js +632 -70
package/docs/adr/0001-flow-agents-consumes-flow.md +1 -1
package/docs/adr/0002-flow-kits-as-extension-unit.md +1 -1
package/docs/adr/0004-gates-expect-surface-claims.md +2 -0
package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +2 -0
package/docs/adr/0007-skill-audit.md +1 -1
package/docs/adr/0009-canonical-hook-core-kit-boundary.md +95 -0
package/docs/adr/0010-workflow-trust-state-as-hachure-bundle.md +139 -0
package/docs/adr/0011-mcp-posture.md +100 -0
package/docs/adr/0012-agent-coordination-as-liveness-claims.md +119 -0
package/docs/adr/0013-context-lifecycle.md +151 -0
package/docs/adr/0014-core-vs-domain-kit-boundary.md +143 -0
package/docs/adr/0015-flow-flow-agents-boundary-reconciliation.md +120 -0
package/docs/adr/0016-three-hard-boundary-model.md +71 -0
package/docs/adr/0017-anti-gaming-trust-security-model.md +155 -0
package/docs/agent-system-guidebook.md +5 -12
package/docs/context-map.md +4 -10
package/docs/index.md +3 -2
package/docs/integrations/framework-adapter.md +19 -6
package/docs/integrations/index.md +2 -2
package/docs/north-star.md +4 -4
package/docs/operating-layers.md +3 -3
package/docs/plans/adr-0010-phase2-gate-recompute.md +55 -0
package/docs/repository-structure.md +2 -2
package/docs/skills-map.md +1 -0
package/docs/spec/runtime-hook-surface.md +62 -9
package/docs/standards-register.md +3 -3
package/docs/survey-utterance-check.md +1 -1
package/docs/trust-anchor-adoption.md +197 -0
package/docs/verifiable-trust.md +95 -0
package/docs/veritas-integration.md +2 -2
package/docs/workflow-usage-guide.md +69 -0
package/evals/acceptance/DEMO-false-completion.md +144 -0
package/evals/acceptance/demo-cast.sh +92 -0
package/evals/acceptance/demo-false-completion.sh +72 -0
package/evals/acceptance/demo-real-evidence.sh +104 -0
package/evals/acceptance/demo.tape +29 -0
package/evals/acceptance/prove-capture-teeth-declared.sh +335 -0
package/evals/acceptance/prove-capture-teeth.sh +114 -0
package/evals/acceptance/prove-teeth.sh +105 -0
package/evals/ci/antigaming-suite.sh +54 -0
package/evals/ci/run-baseline.sh +2 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/flows/review.flow.json +26 -0
package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/kit.json +20 -0
package/evals/fixtures/flow-kit-repository/valid-unknown-extension/flows/review.flow.json +26 -0
package/evals/fixtures/flow-kit-repository/valid-unknown-extension/kit.json +18 -0
package/evals/integration/test_builder_step_producers.sh +379 -0
package/evals/integration/test_bundle_install.sh +35 -71
package/evals/integration/test_bundle_lifecycle.sh +39 -2
package/evals/integration/test_captured_fail_reconciliation.sh +820 -0
package/evals/integration/test_checkpoint_signing.sh +489 -0
package/evals/integration/test_claim_lookup.sh +352 -0
package/evals/integration/test_command_log_integrity.sh +275 -0
package/evals/integration/test_context_map.sh +0 -2
package/evals/integration/test_dual_emit_flow_step.sh +278 -0
package/evals/integration/test_enforcer_expects_driven.sh +281 -0
package/evals/integration/test_evidence_capture_hook.sh +185 -0
package/evals/integration/test_flow_kit_repository.sh +2 -0
package/evals/integration/test_flowdef_session_activation.sh +273 -0
package/evals/integration/test_flowdef_session_history_preservation.sh +250 -0
package/evals/integration/test_gate_bypass_chain.sh +448 -0
package/evals/integration/test_gate_lockdown.sh +1137 -0
package/evals/integration/test_gate_review_inquiry_records.sh +399 -0
package/evals/integration/test_goal_fit_escape_hatch.sh +73 -0
package/evals/integration/test_goal_fit_hook.sh +69 -4
package/evals/integration/test_goal_fit_rederive.sh +263 -0
package/evals/integration/test_install_merge.sh +1176 -0
package/evals/integration/test_mint_attestation.sh +373 -0
package/evals/integration/test_phase_map_and_gate_claim.sh +365 -0
package/evals/integration/test_publish_delivery.sh +269 -0
package/evals/integration/test_reconcile_soundness.sh +528 -0
package/evals/integration/test_resolvefirststep_security.sh +208 -0
package/evals/integration/test_session_resume_roundtrip.sh +286 -0
package/evals/integration/test_trust_checkpoint.sh +325 -0
package/evals/integration/test_trust_reconcile.sh +293 -0
package/evals/integration/test_verify_cli.sh +208 -0
package/evals/integration/test_workflow_sidecar_writer.sh +549 -34
package/evals/lib/node.sh +0 -6
package/evals/run.sh +45 -0
package/evals/static/test_workflow_skills.sh +6 -13
package/install.sh +0 -7
package/integrations/strands-ts/README.md +25 -15
package/integrations/veritas/flow-agents.adapter.json +1 -2
package/kits/builder/flows/build.flow.json +59 -12
package/kits/builder/kit.json +85 -15
package/kits/builder/skills/continue-work/SKILL.md +116 -0
package/kits/builder/skills/deliver/SKILL.md +36 -6
package/kits/builder/skills/design-probe/SKILL.md +28 -0
package/kits/builder/skills/execute-plan/SKILL.md +9 -1
package/kits/builder/skills/gate-review/SKILL.md +234 -0
package/kits/builder/skills/learning-review/SKILL.md +30 -0
package/kits/builder/skills/pickup-probe/SKILL.md +29 -0
package/kits/builder/skills/plan-work/SKILL.md +13 -1
package/kits/builder/skills/pull-work/SKILL.md +19 -0
package/kits/knowledge/adapters/default-store/index.js +38 -0
package/kits/knowledge/adapters/flow-runner/index.js +1620 -0
package/kits/knowledge/adapters/obsidian-store/index.js +36 -6
package/kits/knowledge/docs/store-contract.md +314 -0
package/kits/knowledge/evals/audit-freshness/suite.test.js +368 -0
package/kits/knowledge/evals/canonicalize-category/suite.test.js +383 -0
package/kits/knowledge/evals/contract-suite/suite.test.js +111 -0
package/kits/knowledge/evals/detect-contradictions/suite.test.js +324 -0
package/kits/knowledge/evals/entities/suite.test.js +40 -0
package/kits/knowledge/evals/glossary-sync/suite.test.js +416 -0
package/kits/knowledge/evals/hygiene-review/suite.test.js +396 -0
package/kits/knowledge/evals/retirement/suite.test.js +145 -0
package/kits/knowledge/flows/audit-freshness.flow.json +44 -0
package/kits/knowledge/flows/canonicalize-category.flow.json +44 -0
package/kits/knowledge/flows/detect-contradictions.flow.json +44 -0
package/kits/knowledge/flows/glossary-sync.flow.json +61 -0
package/kits/knowledge/flows/hygiene-review.flow.json +43 -0
package/kits/knowledge/kit.json +51 -1
package/package.json +4 -4
package/packaging/conformance/README.md +10 -2
package/packaging/conformance/fixtures/evidence-capture--allow-records-command.json +29 -0
package/packaging/conformance/fixtures/stop-goal-fit--block-bundle-disputed-claim.json +29 -0
package/packaging/conformance/fixtures/stop-goal-fit--block-capture-contradicts-claimed-pass.json +30 -0
package/packaging/conformance/fixtures/stop-goal-fit--block-mode.json +23 -0
package/packaging/conformance/fixtures/stop-goal-fit--off-mode.json +24 -0
package/packaging/conformance/fixtures/stop-goal-fit--warn-active-delivery.json +5 -2
package/packaging/conformance/fixtures/stop-goal-fit--warn-no-bundle.json +23 -0
package/packaging/conformance/fixtures/workflow-steering--reground-active-prompt.json +30 -0
package/packaging/conformance/fixtures/workflow-steering--reground-session-start.json +30 -0
package/packaging/conformance/run-conformance.js +1 -1
package/scripts/README.md +2 -1
package/scripts/build-universal-bundles.js +0 -1
package/scripts/ci/mint-attestation.js +221 -0
package/scripts/ci/trust-reconcile.js +545 -0
package/scripts/hooks/config-protection.js +423 -1
package/scripts/hooks/evidence-capture.js +348 -0
package/scripts/hooks/lib/liveness-read.js +113 -0
package/scripts/hooks/run-hook.js +6 -1
package/scripts/hooks/stop-goal-fit.js +1471 -79
package/scripts/hooks/workflow-steering.js +135 -5
package/scripts/install-codex-home.sh +39 -0
package/scripts/install-merge.js +330 -0
package/src/cli/init.ts +218 -20
package/src/cli/validate-workflow-artifacts.ts +18 -2
package/src/cli/verify.ts +100 -0
package/src/cli/workflow-sidecar.ts +2064 -77
package/src/cli.ts +2 -3
package/src/lib/flow-resolver.ts +284 -0
package/src/tools/build-universal-bundles.ts +34 -21
package/src/tools/generate-context-map.ts +3 -17
package/src/tools/validate-source-tree.ts +44 -104
package/build/src/tools/filter-installed-packs.d.ts +0 -2
package/build/src/tools/filter-installed-packs.js +0 -135
package/packaging/packs.json +0 -49
package/scripts/filter-installed-packs.js +0 -2
package/src/tools/filter-installed-packs.ts +0 -132

package/docs/spec/runtime-hook-surface.md CHANGED

@@ -75,7 +75,7 @@ The reason text is the canonical steering message: it should tell the agent what
 ## 2. Policy Classes
-Flow Agents currently ships four canonical policy classes. Each policy class has a canonical hook script under `scripts/hooks/` and may be wired to one or more canonical trigger events.
+Flow Agents currently ships five canonical policy classes. Each policy class has a canonical hook script under `scripts/hooks/` and may be wired to one or more canonical trigger events.
 ### 2.1 Workflow Steering
@@ -83,14 +83,14 @@ Flow Agents currently ships four canonical policy classes. Each policy class has
 **Canonical script**: `scripts/hooks/workflow-steering.js`
-**Canonical trigger event**: `userPromptSubmit` (ambient state guidance), `postToolUse` (after `InvokeSubagents` tool calls)
+**Canonical trigger event**: `userPromptSubmit` and `agentSpawn`/`SessionStart` (active-goal re-grounding), `postToolUse` (after `InvokeSubagents` tool calls)
 **Inputs consumed**:
 - `.flow-agents/<slug>/state.json` — current workflow phase and status
 - `.flow-agents/<slug>/critique.json` — open critique findings
 - `docs/context-map.md` — structure hint for repo navigation
-**Decision contract**: Non-blocking. Always exits 0. Appends steering text to the agent's context via `additionalContext` in the hook response. Does not block any action.
+**Decision contract**: Non-blocking. Always exits 0. Appends steering text to the agent's context via `additionalContext` in the hook response. Does not block any action. It re-grounds the active workflow goal (status, phase, recorded next step) at the start of every user turn — not only for flagged/blocked states — and on `SessionStart`, which fires after context compaction and on resume. This is the mechanism that keeps an in-flight goal alive across context loss instead of relying on the model voluntarily re-reading the sidecar.
 **Degradation when host lacks trigger**: If the host has no `userPromptSubmit`-equivalent hook, workflow steering is silent. The agent receives no ambient phase reminders at turn start. This is a capability loss, not a blocking failure. Log the gap in the adapter's conformance declaration as `userPromptSubmit: no native equivalent — steering context injection unavailable`.
@@ -129,11 +129,32 @@ Flow Agents currently ships four canonical policy classes. Each policy class has
 - `.flow-agents/<slug>/state.json` — workflow phase and next action
 - `.flow-agents/<slug>/evidence.json` — verification verdict and NOT_VERIFIED gaps
 - `.flow-agents/<slug>/critique.json` — critique status and open findings
-- `FLOW_AGENTS_GOAL_FIT_STRICT` env var — `true` to make blocking (exit 2) instead of warning-only (exit 0)
+- `.flow-agents/<slug>/command-log.jsonl` — the deterministic capture log written by the Evidence Capture policy (see §2.5); cross-referenced against `evidence.json` claimed-pass command checks
+- `.flow-agents/<slug>/acceptance.json` — acceptance criteria; a criterion's `command`-kind `evidence_ref` (`excerpt`) is the most-trusted backstop command
+- `FLOW_AGENTS_GOAL_FIT_MODE` env var — `block` | `warn` | `off` (the legacy `FLOW_AGENTS_GOAL_FIT_STRICT=true` is an alias for `block`)
+- `FLOW_AGENTS_GOAL_FIT_MAX_BLOCKS` env var — consecutive-identical-block cap before the escape hatch releases (default 3)
+- `FLOW_AGENTS_GOAL_FIT_BACKSTOP` env var — `block` (default) | `off`/`warn` | `skip`; controls the capture backstop re-run (see Capture cross-reference below)
+- `FLOW_AGENTS_GOAL_FIT_BACKSTOP_TIMEOUT_MS` env var — per-backstop-command timeout in ms (default 120000; runaway commands are SIGKILL'd)
+- `FLOW_AGENTS_GOAL_FIT_RECHECK` env var — `true` opts into re-running the model's free-form `evidence.checks[].command` (the RCE-risky path; off by default)
 **Decision contract**:
-- Default mode: warning-only (exits 0). Writes guidance to stderr.
-- Strict mode (`FLOW_AGENTS_GOAL_FIT_STRICT=true`): blocking (exits 2) when the active workflow artifact has state, Definition Of Done, Goal Fit, or sidecar issues that classify as blocking.
+- `warn` (canonical engine default): exits 0, writes guidance to stderr. Non-blocking.
+- `block`: exits 2 when the active workflow artifact has state, Definition Of Done, Goal Fit, evidence, sidecar, or capture cross-reference issues that classify as blocking. Shipped L2 runtime configs (Claude Code, Codex) set `block` by default, overridable per-operator via the env var.
+- `off`: silent (exits 0, no stderr).
+- Escape hatch: in `block` mode the same goal-fit gap is refused up to `FLOW_AGENTS_GOAL_FIT_MAX_BLOCKS` (default 3) consecutive times, then released (exit 0 with a loud notice) so a genuinely-unsatisfiable goal cannot trap the agent. A changing gap resets the streak.
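Annotation (not part of the package diff): the escape hatch described above can be read as a small streak counter. The sketch below is a hypothetical illustration only. The function name `nextDecision`, the `{ signature, streak }` state shape, and the choice to reset the streak after a release are assumptions, not the contents of `stop-goal-fit.js`. It interprets "refused up to `MAX_BLOCKS` consecutive times, then released" as three blocks followed by a release on the fourth identical gap.

```javascript
// Hypothetical sketch of the consecutive-identical-block cap.
// `state` is assumed to persist between Stop events: { signature, streak }.
function nextDecision(state, gapSignature, maxBlocks = 3) {
  // A changing gap resets the streak; an identical gap extends it.
  const streak = state.signature === gapSignature ? state.streak + 1 : 1;

  if (streak > maxBlocks) {
    // Cap exceeded: release (exit 0) so an unsatisfiable goal cannot trap the agent.
    // Resetting the state here is an assumption of this sketch.
    return { exitCode: 0, released: true, state: { signature: null, streak: 0 } };
  }
  // Still within the cap: refuse (exit 2) and remember the gap.
  return { exitCode: 2, released: false, state: { signature: gapSignature, streak } };
}

// Usage: the same gap seen four times in a row.
let state = { signature: null, streak: 0 };
const exitCodes = [];
for (let i = 0; i < 4; i++) {
  const decision = nextDecision(state, "missing-test-evidence");
  exitCodes.push(decision.exitCode);
  state = decision.state;
}
console.log(exitCodes.join(",")); // prints "2,2,2,0"
```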
+**Capture cross-reference (capture-first determinism)**: For each `evidence.checks[]` of `kind:"command"` claiming `status:"pass"` that carries a `command`, the gate cross-references the deterministic capture log (`command-log.jsonl`, §2.5) *before* trusting the model's claim:
+1. **Log shows the command ran and FAILED** → this is a caught false-completion → a blocking goal-fit gap (feeds the existing block/`MAX_BLOCKS` machinery).
+2. **Log shows the command ran and PASSED** → satisfied deterministically, with no re-run.
+3. **Log has NO execution for that claimed-pass command** (it was never actually run) → resolve a TRUSTED command to re-run as a thin backstop, in priority order:
+   - **(a) acceptance criterion** — the `command`-kind evidence ref of the matching `acceptance.json` criterion (authored upfront, most trusted).
+   - **(b) declared manifest target** — the project's own declared `package.json` `scripts.{test,build,lint}` (or `typecheck`), `Makefile` target, `cargo test`/`build`, `tox`/`pytest`, or `just`/`task` target. The NAMED declared target is run — never an arbitrary allowlisted string. (`veritas readiness` is just one such declared command — no special-casing.)
+   - **(c) model free-form command** — `evidence.checks[].command`, ONLY when `FLOW_AGENTS_GOAL_FIT_RECHECK=true` (opt-in; the RCE-risky path).
+   If the resolved backstop re-run fails, it is a caught false-completion. If NO trusted command resolves, the gate records `NOT_VERIFIED` — never a guess, never a silent pass, never auto-running an unlisted string.
+**Backstop guardrails**: each backstop command runs under a per-command timeout (`FLOW_AGENTS_GOAL_FIT_BACKSTOP_TIMEOUT_MS`, default 120s; runaway commands are killed). The trusted-source backstop (a/b) rides `block` mode by default but is operator-disablable for latency: `FLOW_AGENTS_GOAL_FIT_BACKSTOP=off` (re-run becomes warn-only, never blocks) or `=skip` (no re-run at all → record `NOT_VERIFIED`). The arbitrary-model-command backstop (c) is opt-in only via `FLOW_AGENTS_GOAL_FIT_RECHECK`.
 **Degradation when host lacks trigger**: If the host has no stop hook, stop-goal-fit cannot fire. The agent may complete without the check. Log the gap as `stop: no native equivalent — stop-goal-fit policy unavailable`.
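Annotation (not part of the package diff): the three-way capture cross-reference and the backstop priority order in this hunk can be sketched as a pure function. Everything named here (`classifyClaim`, `resolveBackstop`, the `ctx` fields, the verdict strings other than `NOT_VERIFIED`) is hypothetical. The sketch also assumes that commands are matched by exact string, that the most recent log entry wins, and that a failed run is logged with `observedResult: "fail"`; the diff does not state any of these.

```javascript
// Hypothetical sketch of the capture cross-reference. Not the package's API.

// Most recent log entry for this exact command string, or undefined.
function lastObserved(log, command) {
  for (let i = log.length - 1; i >= 0; i--) {
    if (log[i].command === command) return log[i];
  }
  return undefined;
}

// Priority order (a) acceptance criterion, (b) declared manifest target,
// (c) model free-form command (only when recheck is opted in).
function resolveBackstop(check, ctx) {
  if (ctx.acceptanceCommand) return { source: "acceptance", command: ctx.acceptanceCommand };
  if (ctx.declaredTarget) return { source: "manifest", command: ctx.declaredTarget };
  if (ctx.recheck === true) return { source: "model", command: check.command };
  return null; // nothing trusted resolves
}

function classifyClaim(check, log, ctx = {}) {
  // Only claimed-pass command checks that carry a command are cross-referenced.
  if (check.kind !== "command" || check.status !== "pass" || !check.command) {
    return { verdict: "not-applicable" };
  }
  const entry = lastObserved(log, check.command);
  if (entry && entry.observedResult === "fail") return { verdict: "false-completion" };
  if (entry && entry.observedResult === "pass") return { verdict: "satisfied" };

  // No execution in the log: fall back to a trusted backstop, or record NOT_VERIFIED.
  const backstop = resolveBackstop(check, ctx);
  if (!backstop) return { verdict: "NOT_VERIFIED" };
  return { verdict: "needs-backstop", backstop };
}

// Usage: the model claims `npm test` passed, but the capture log saw it fail.
const claim = { kind: "command", status: "pass", command: "npm test" };
const log = [{ command: "npm test", observedResult: "fail", exitCode: 1 }];
console.log(classifyClaim(claim, log).verdict); // prints "false-completion"
```

The sketch returns a `needs-backstop` verdict instead of running anything, which keeps the decision logic separate from command execution and its timeout handling.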
@@ -154,6 +175,31 @@ Flow Agents currently ships four canonical policy classes. Each policy class has
 **Degradation when host lacks trigger**: If the host has no `preToolUse`-equivalent blocking hook, config protection cannot veto tool calls. The agent may modify linter configs without interception. Log the gap as `preToolUse: no native blocking equivalent — config-protection policy unavailable`.
+### 2.5 Evidence Capture (capture-first determinism)
+**Intent**: Make evidence about what actually ran *machine-recorded at the source* rather than transcribed later by the model. `evidence.json` is the model's narration and can claim a test passed when it did not. The capture policy deterministically records every command/shell tool execution and its observed result to an append-only log, which the Stop-Goal-Fit gate (§2.3) cross-references against the model's claims. This makes re-running at the gate a thin backstop, not the primary check.
+**Canonical script**: `scripts/hooks/evidence-capture.js`
+**Canonical trigger event**: `postToolUse` (after command/shell tool calls)
+**Inputs consumed**:
+- `tool_name` + `tool_input.command` — identifies a command/shell execution (a command string present, with a command-shaped tool name; when no tool name is present but a command string is, it is still captured).
+- `tool_response` / `tool_output` / `error` — the host tool result (per §1, `postToolUse`); the source of the deterministically-observed outcome.
+- `.flow-agents/current.json` (`active_slug` / `artifact_dir`) then newest-mtime `state.json` — resolves the active artifact dir, the same way Workflow Steering and Stop-Goal-Fit do.
+**Output**: appends one JSON object per line to `.flow-agents/<slug>/command-log.jsonl`:
+```json
+{ "command": "npm test", "observedResult": "pass", "exitCode": 0, "capturedAt": "2026-06-23T00:00:00Z", "source": "postToolUse-capture" }
+```
+**Exit-code handling (deterministic observation only)**: a clean integer exit code is host-dependent. The policy extracts the real exit code where the host surfaces one (`tool_response`/`tool_output` `.exitCode`/`.exit_code`/`.status`/`.code`/`.returnCode`, or top-level equivalents) and sets `observedResult` to `pass` iff that code is `0`. When no clean integer exit code is present, `exitCode` is recorded as `null` and `observedResult` is inferred *only* from deterministic failure signals — a non-empty `error`, a `success:false`/`failed:true`/`is_error:true` flag, or a non-empty stderr with no stdout. Plain stdout text is never scanned for the words "error"/"fail"; the model's narration is never consulted.
+**Decision contract**: Non-blocking. Always exits 0 and echoes stdin. Idempotent/append-only. Fail-open on any error — a capture failure must never block the agent or corrupt the log. Only records when an active workflow artifact dir resolves (otherwise there is nothing to anchor the log to).
+**Degradation when host lacks trigger**: If the host has no `postToolUse` hook, command results are not captured. The Stop gate then has no capture log to cross-reference and falls back to its trusted backstop re-run (§2.3) for claimed-pass command checks. Log the gap as `postToolUse: no native equivalent — evidence capture unavailable; Stop gate relies on backstop re-run only`.
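Annotation (not part of the package diff): a minimal sketch of the exit-code handling described in §2.5 above. The function names (`findExitCode`, `hasFailureSignal`, `observe`) are hypothetical, and the sketch makes three labeled assumptions: failure flags and `stdout`/`stderr` live on the `tool_response`/`tool_output` object, `error` is a string, and a payload with neither an exit code nor a failure signal is recorded as `pass`. The diff does not say what is recorded in that last case.

```javascript
// Hypothetical sketch of deterministic result observation. Not evidence-capture.js itself.
const EXIT_KEYS = ["exitCode", "exit_code", "status", "code", "returnCode"];

// Look for a clean integer exit code on the tool result, then at top level.
function findExitCode(payload) {
  const containers = [payload.tool_response, payload.tool_output, payload];
  for (const container of containers) {
    if (container === null || typeof container !== "object") continue;
    for (const key of EXIT_KEYS) {
      if (Number.isInteger(container[key])) return container[key];
    }
  }
  return null;
}

// Deterministic failure signals only; stdout text is never scanned for keywords.
function hasFailureSignal(payload) {
  const raw = payload.tool_response ?? payload.tool_output ?? {};
  const body = raw !== null && typeof raw === "object" ? raw : {};
  const nonEmpty = (value) => typeof value === "string" && value.trim().length > 0;

  if (nonEmpty(payload.error) || nonEmpty(body.error)) return true;
  if (body.success === false || body.failed === true || body.is_error === true) return true;
  return nonEmpty(body.stderr) && !nonEmpty(body.stdout);
}

function observe(payload) {
  const exitCode = findExitCode(payload);
  if (exitCode !== null) {
    return { exitCode, observedResult: exitCode === 0 ? "pass" : "fail" };
  }
  // Assumption: with no exit code and no failure signal, record "pass".
  return { exitCode: null, observedResult: hasFailureSignal(payload) ? "fail" : "pass" };
}

// Usage: stdout that merely mentions "error" is not treated as a failure.
console.log(observe({ tool_response: { stdout: "0 errors, 0 failures" } }).observedResult); // prints "pass"
console.log(observe({ tool_response: { exit_code: 2, stdout: "ok" } }).observedResult); // prints "fail"
```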
 ---
 ## 3. Hook Profiles
@@ -206,7 +252,8 @@ The adapter implements L1 plus all blocking policy classes.
 - Config protection fires on `preToolUse` and can block (exit 2 translates to a deny response).
 - Every block surfaces its reason to the model through the host's deny-reason channel (see [Block Reason Channel](#block-reason-channel)), not only to a log.
 - Quality gate fires on `postToolUse`.
-- Stop-goal-fit fires on `stop` with `FLOW_AGENTS_GOAL_FIT_STRICT` configurable (default may be warning mode; strict mode must be possible to enable).
+- Stop-goal-fit fires on `stop` with `FLOW_AGENTS_GOAL_FIT_MODE` configurable. Shipped L2 configs default to `block`; the canonical engine default remains `warn`, and any mode must be operator-overridable.
+- Workflow steering additionally re-grounds the active goal on `agentSpawn`/`SessionStart` so an in-flight goal survives context compaction and resume.
 **Permitted gaps**: None. All four policy classes are wired. Any missing host trigger must be documented as a named gap in the adapter's conformance declaration.
@@ -454,8 +501,9 @@ For structured `run()` responses (native import form), the return value is:
 |-------------|-------------|--------------------|--------------------|
 | config-protection | Fail-closed (exit 2 on protected file) | Yes — hook runtime errors exit 0 | Yes (preToolUse) |
 | quality-gate | Fail-open (exit 0 always) | Yes | No |
-| stop-goal-fit | Fail-open by default; fail-closed with `FLOW_AGENTS_GOAL_FIT_STRICT=true` | Yes — hook runtime errors exit 0 | Yes (stop, strict mode only) |
+| stop-goal-fit | Engine default warn (fail-open); blocks in `FLOW_AGENTS_GOAL_FIT_MODE=block` (shipped L2 default) | Yes — hook runtime errors exit 0 | Yes (stop, block mode) |
 | workflow-steering | Fail-open (exit 0 always) | Yes | No |
+| evidence-capture | Fail-open (exit 0 always) | Yes — capture errors never block or corrupt the log | No |
 **Telemetry**: Always fail-open. Hook runtime errors in telemetry scripts must never block agent work.
@@ -471,7 +519,12 @@ For structured `run()` responses (native import form), the return value is:
 | `SA_HOOK_INPUT_MAX_BYTES` | Integer string | `config-protection.js` |
 | `SA_QUALITY_GATE_FIX` | `true` / `false` | `quality-gate.js` |
 | `SA_QUALITY_GATE_STRICT` | `true` / `false` | `quality-gate.js` |
-| `FLOW_AGENTS_GOAL_FIT_STRICT` | `true` / `false` | `stop-goal-fit.js` |
+| `FLOW_AGENTS_GOAL_FIT_MODE` | `block` / `warn` / `off` | `stop-goal-fit.js` |
+| `FLOW_AGENTS_GOAL_FIT_MAX_BLOCKS` | Integer string (default 3) | `stop-goal-fit.js` |
+| `FLOW_AGENTS_GOAL_FIT_STRICT` | `true` / `false` (legacy alias for mode=block) | `stop-goal-fit.js` |
+| `FLOW_AGENTS_GOAL_FIT_BACKSTOP` | `block` (default) / `off` (=`warn`) / `skip` | `stop-goal-fit.js` |
+| `FLOW_AGENTS_GOAL_FIT_BACKSTOP_TIMEOUT_MS` | Integer string (default 120000) | `stop-goal-fit.js` |
+| `FLOW_AGENTS_GOAL_FIT_RECHECK` | `true` / `false` (opt-in re-run of model free-form command) | `stop-goal-fit.js` |
 | `FLOW_AGENTS_REQUIRE_SIDECARS` | `true` / `false` | `stop-goal-fit.js` |
 | `FLOW_AGENTS_REQUIRE_CRITIQUE` | `true` / `false` | `stop-goal-fit.js` |
 | `FLOW_AGENTS_HOOK_RUNTIME` | `claude-code`, `codex`, etc. | Hook adapters (forwarded to scripts) |
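
Annotation (not part of the package diff): the `stop-goal-fit.js` variables in the table above could be normalized roughly as follows. This is a hedged sketch; `readGoalFitConfig` and the returned field names are hypothetical. The defaults (`warn`, 3, `block`, 120000) come from the diff, while the precedence of `FLOW_AGENTS_GOAL_FIT_MODE` over the legacy `FLOW_AGENTS_GOAL_FIT_STRICT` alias and the fallback to defaults on unparsable integers are assumptions.

```javascript
// Hypothetical sketch: normalize the stop-goal-fit environment variables.
function readGoalFitConfig(env) {
  // Parse a positive integer string, falling back to the documented default.
  const intOr = (raw, fallback) => {
    const parsed = Number.parseInt(raw, 10);
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
  };

  // Canonical engine default is `warn`; STRICT=true is a legacy alias for `block`.
  // Assumption: an explicit, valid MODE wins over the legacy alias.
  let mode = "warn";
  if (["block", "warn", "off"].includes(env.FLOW_AGENTS_GOAL_FIT_MODE)) {
    mode = env.FLOW_AGENTS_GOAL_FIT_MODE;
  } else if (env.FLOW_AGENTS_GOAL_FIT_STRICT === "true") {
    mode = "block";
  }

  // `off` and `warn` both mean a warn-only re-run; `skip` means no re-run at all.
  const rawBackstop = env.FLOW_AGENTS_GOAL_FIT_BACKSTOP;
  let backstop = "block";
  if (rawBackstop === "skip") backstop = "skip";
  else if (rawBackstop === "off" || rawBackstop === "warn") backstop = "warn";

  return {
    mode,
    maxBlocks: intOr(env.FLOW_AGENTS_GOAL_FIT_MAX_BLOCKS, 3),
    backstop,
    backstopTimeoutMs: intOr(env.FLOW_AGENTS_GOAL_FIT_BACKSTOP_TIMEOUT_MS, 120000),
    recheck: env.FLOW_AGENTS_GOAL_FIT_RECHECK === "true", // opt-in only
  };
}

// Usage: a legacy strict configuration with a shorter backstop timeout.
const config = readGoalFitConfig({
  FLOW_AGENTS_GOAL_FIT_STRICT: "true",
  FLOW_AGENTS_GOAL_FIT_BACKSTOP_TIMEOUT_MS: "30000",
});
console.log(config.mode, config.backstopTimeoutMs); // prints "block 30000"
```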

package/docs/standards-register.md CHANGED

@@ -57,8 +57,8 @@ Flow Agents may need local schemas for reliability glue that existing standards
 | Critique record | Reviewer passes, findings, severity, and resolution state for critique loops | `.flow-agents/<slug>/critique.json` | Draft schema: `schemas/workflow-critique.schema.json` |
 | Release readiness | Merge, release, deploy, hold, rollback, docs, and operational readiness decisions | `.flow-agents/<slug>/release.json` | Draft schema: `schemas/workflow-release.schema.json` |
 | Learning record | Repeated failure, correction, pattern, and recommended system update | `.flow-agents/<slug>/learning.json` or `.telemetry/outcomes.jsonl` | Draft schema: `schemas/workflow-learning.schema.json` |
-| Context map | Compact project map: structure, commands, conventions, test strategy, packs, and recent state | Generated under `.flow-agents/` or configurable cache | Planned |
-| Pack manifest | Core and optional pack composition for a target install | `packaging/packs.json` plus generated export catalog metadata | Draft manifest: `packaging/packs.json` |
+| Context map | Compact project map: structure, commands, conventions, test strategy, Kits, and recent state | Generated under `.flow-agents/` or configurable cache | Planned |
+| Kit Catalog | Product-facing catalog of Flow Kits and their activation, layered over the always-installed standalone base | `kits/catalog.json` plus generated export catalog metadata | Catalog: `kits/catalog.json` |
 | Governance adapter | Optional bridge from Flow Agents evidence gates to tools such as Veritas | `context/contracts/governance-adapter-contract.md` | Draft contract |
 These formats should be treated as contracts once introduced. Breaking changes require schema version bumps and migration notes.
@@ -93,4 +93,4 @@ Before merging a new schema, file format, or artifact:
 - Is the new format schema-described?
 - Is there a human-readable representation?
 - Can another tool consume or export it?
-- Does this belong in core or an optional pack?
+- Does this belong in the standalone base or an opinionated Flow Kit?

package/docs/survey-utterance-check.md CHANGED

@@ -289,7 +289,7 @@ Flow Agents does not own trust claim models, inquiry semantics, or extractor imp
 - Do not make `@kontourai/survey` a mandatory dependency of flow-agents.
 - Do not copy Survey's extraction or inquiry schemas into flow-agents.
-- Do not auto-register the hook in the default pack; it is opt-in only.
+- Do not auto-register the hook in the standalone base; it is opt-in only.
 - Do not make the hook blocking without explicit `mode: "strict"` or the env override.
 - Do not silently decide anything. The hook injects guidance; the agent decides next steps.

package/docs/trust-anchor-adoption.md ADDED

@@ -0,0 +1,197 @@
+---
+title: "Trust Anchor Adoption — Add the CI Trust Anchor to Your Repo"
+---
+# Trust Anchor Adoption
+This guide explains how to add the Flow Agents CI trust anchor to any repository that
+uses Flow Agents. The anchor is a required CI job that re-runs your canonical
+verification fresh in a clean environment and reconciles the agent's claimed passes
+against the real CI results. It is the external, un-disablable check that closes the
+loop on agent self-reporting.
+See [ADR 0017](adr/0017-anti-gaming-trust-security-model.md) for the full security
+model and threat analysis.
+## What the Trust Anchor Does
+1. **Re-runs verification fresh.** In a clean CI environment the agent does not
+   control, it runs your declared verify command (build + tests + lint). Real exit
+   codes. No agent influence.
+2. **Reconciles the delivered bundle.** If the agent published a `delivery/trust.bundle`
+   with the PR, the anchor cross-checks every claimed-pass command against CI's own
+   fresh results. Divergences (claimed pass + CI fail, laundered command, claim with
+   no evidence, checkpoint-only bundle) fail the job with a clear diagnostic.
+3. **Fails closed on compile-only.** If no comprehensive verify command is configured,
+   the anchor refuses to pass — preventing a "build only" attestation that misses tests.
+## Step 1 — The Agent Publishes a Bundle
+Flow Agents' deliver skill calls `publishDelivery`, which writes `delivery/trust.bundle`
+to the repository with `git add -f` during the `record-release` step. This file carries
+the session's evidence and claims to CI so the anchor can reconcile them.
+You do not need to configure this — it is part of the deliver skill workflow. The bundle
+is gitignored by default (the deliver skill force-adds it for the PR commit only).
+## Step 2 — Add the Composite Action
+In your repo, create or update a CI workflow file (e.g.
+`.github/workflows/trust-verify.yml`):
+```yaml
+name: Trust Verify
+on:
+  pull_request:
+  push:
+    branches: ["main"]
+  workflow_dispatch:
+permissions:
+  contents: read
+concurrency:
+  group: trust-verify-${{ github.ref }}
+  cancel-in-progress: true
+jobs:
+  trust-verify:
+    name: Trust Verify
+    runs-on: ubuntu-latest
+    timeout-minutes: 15
+    # Add id-token: write here if you enable sign: true (Sigstore attestation).
+    permissions:
+      contents: read
+    steps:
+      - name: Checkout
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+      - uses: kontourai/flow-agents/.github/actions/trust-verify@<SHA>
+        with:
+          # Declare your comprehensive verify command: build + tests + lint.
+          # The agent must run this same command locally (via trust-reconcile-verify).
+          verify-command: "npm run build && npm test && npm run lint"
+          # bundle: defaults to delivery/trust.bundle (auto-discovered if present)
+          # sign: false (set to true + add id-token: write for Sigstore attestation)
+```
+Replace `<SHA>` with the pinned commit SHA of the `kontourai/flow-agents` release you
+are adopting. Pin to a SHA (not a tag) for supply-chain security.
+**To find the SHA**: look at the
+[flow-agents releases](https://github.com/kontourai/flow-agents/releases) or pin to
+`main` HEAD after reviewing the CHANGELOG.
+## Step 3 — Arm It as a Required Status Check
+The action reports results but is advisory until you arm it server-side:
+1. Go to **Settings → Branches** in your GitHub repository.
+2. Edit (or create) the branch protection rule for `main`.
+3. Under **Require status checks to pass before merging**, add **`Trust Verify`**.
+4. Check **Require branches to be up to date before merging**.
+5. Enable **Do not allow bypassing the above settings** (the "enforce admins" option).
+Once armed, no PR can merge past a `Trust Verify` failure — including ones pushed by
+the agent.
+## Step 4 — Protect the Verify Config
+CODEOWNERS prevents the agent from quietly weakening the verify command. Add entries
+for the files that declare what CI runs:
+```
+# Trust anchor config — requires owner review.
+# An agent cannot weaken verify-command without a human approving the change.
+.github/workflows/trust-verify.yml  @your-org/owners
+package.json                         @your-org/owners
+```
+Adjust paths and team names for your repo structure.
+## Configuring the Verify Command
+The anchor fails closed if it cannot find a comprehensive verify command. Provide it
+one of three ways (in priority order):
+1. **Action input** `verify-command` (recommended for the composite action).
+2. **`TRUST_RECONCILE_COMMANDS` environment variable** (comma- or newline-separated).
+3. **`package.json` `scripts["trust-reconcile-verify"]`** — the anchor auto-discovers
+   this key. Add it to your `package.json`:
+   ```json
+   {
+     "scripts": {
+       "trust-reconcile-verify": "npm run build && npm test && npm run lint"
+     }
+   }
+   ```
+   Then you can also run it locally:
+   ```
+   npx @kontourai/flow-agents verify
+   ```
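Annotation (not part of the package diff): the three-source priority order and the fail-closed behavior described above can be sketched as below. `resolveVerifyCommands` and its argument names are hypothetical and are not taken from `scripts/ci/trust-reconcile.js`. Returning the `package.json` script body as a single command is an assumption; the real anchor may invoke it differently. The error text mirrors the diagnostic quoted in the guide's Troubleshooting section.

```javascript
// Hypothetical sketch: resolve the verify command(s) in priority order, failing closed.
function resolveVerifyCommands({ actionInput, env = {}, pkg = {} }) {
  // 1. Action input `verify-command`.
  if (typeof actionInput === "string" && actionInput.trim()) {
    return [actionInput.trim()];
  }

  // 2. TRUST_RECONCILE_COMMANDS, comma- or newline-separated.
  const raw = env.TRUST_RECONCILE_COMMANDS;
  if (typeof raw === "string" && raw.trim()) {
    const commands = raw.split(/[,\n]/).map((part) => part.trim()).filter(Boolean);
    if (commands.length > 0) return commands;
  }

  // 3. package.json scripts["trust-reconcile-verify"].
  const script = pkg.scripts && pkg.scripts["trust-reconcile-verify"];
  if (typeof script === "string" && script.trim()) {
    return [script.trim()];
  }

  // Fail closed: refuse to attest when nothing comprehensive is configured.
  throw new Error("no comprehensive trust-reconcile-verify configured");
}

// Usage: the environment variable is used when no action input is given.
const commands = resolveVerifyCommands({
  env: { TRUST_RECONCILE_COMMANDS: "npm run build,npm test" },
  pkg: { scripts: { "trust-reconcile-verify": "npm run build && npm test && npm run lint" } },
});
console.log(commands.join(" | ")); // prints "npm run build | npm test"
```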
+## Local Use
+The `flow-agents verify` CLI subcommand runs the same trust-reconcile logic locally:
+```bash
+# Install (or npx):
+npm install -D @kontourai/flow-agents
+# Re-run verify + reconcile against a delivered bundle:
+npx @kontourai/flow-agents verify \
+  --commands "npm run build,npm test" \
+  --bundle delivery/trust.bundle
+# Auto-discover bundle + verify command from package.json:
+npx @kontourai/flow-agents verify
+# Help:
+npx @kontourai/flow-agents verify --help
+```
+Exit codes: 0 = clean (fresh verify passed, no divergence); 1 = failed/divergence.
+## Mirror: Flow Agents' Own Setup
+Flow Agents uses the same pattern in its own repository:
+- **`scripts/ci/trust-reconcile.js`** — the anchor script (runs in
+  `.github/workflows/trust-reconcile.yml`).
+- **`package.json` `trust-reconcile-verify`** — `npm run build && npm run eval:static`.
+- **`evals/ci/antigaming-suite.sh`** — the regression suite that proves the gate and
+  anchor work; runs in the required `ci.yml` lane.
+- **Branch protection** on `main` — `Trust Reconcile` required, `enforce_admins` on.
+## Adoption Checklist
+- [ ] Deliver skill is configured and publishes `delivery/trust.bundle`.
+- [ ] `.github/workflows/trust-verify.yml` added and the composite action is pinned.
+- [ ] `verify-command` declares a comprehensive verify (build + tests + lint).
+- [ ] `Trust Verify` added as a required, no-bypass status check on `main`.
+- [ ] CODEOWNERS entry protects `trust-verify.yml` and `package.json`.
+- [ ] (Optional) `scripts["trust-reconcile-verify"]` in `package.json` for local use.
+- [ ] (Optional) `sign: true` + `id-token: write` for Sigstore attestation.
+## Troubleshooting
+**"no comprehensive trust-reconcile-verify configured"**: Provide `verify-command` in
+the action input, set `TRUST_RECONCILE_COMMANDS`, or add `scripts["trust-reconcile-verify"]`
+to `package.json`. The anchor refuses to attest a compile-only check.
+**"trust divergence: agent claimed X passed; CI fresh run = FAIL"**: The agent reported a
+pass that CI's fresh run did not reproduce, typically because the agent's local environment
+or shell profile masked the failure. The anchor flagged the mismatch as designed. Fix the
+underlying test failure.
+**"trust divergence: command contains exit-code-laundering operator"**: A claimed
+command used `||`, `; true`, or `; exit 0`. These mask real exit codes. Remove them.
+**"checkpoint-only bundle cannot be reconciled per-command"**: A `delivery/trust.bundle`
+was expected but only `delivery/trust.checkpoint.json` was found. The deliver skill
+publishes the full bundle; ensure it ran correctly.
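To illustrate the exit-code-laundering check mentioned in the troubleshooting entries above, here is a minimal sketch of a detector for the three operators the guide names (`||`, `; true`, `; exit 0`). The function name `findLaundering` and the regular expressions are assumptions made for illustration; the anchor's real detection logic is not shown in this diff and may differ.

```javascript
// Hypothetical detector for the operators named in the troubleshooting note.
// The patterns are illustrative assumptions, not the package's own rules.
const LAUNDERING_PATTERNS = [
  { name: "||", re: /\|\|/ },
  { name: "; true", re: /;\s*true\s*$/ },
  { name: "; exit 0", re: /;\s*exit\s+0\s*$/ },
];

// Return the names of every laundering operator found in a claimed command.
function findLaundering(command) {
  return LAUNDERING_PATTERNS
    .filter((pattern) => pattern.re.test(command))
    .map((pattern) => pattern.name);
}

// Example: flag a claimed command before reconciling it.
const flagged = findLaundering("npm test || true");
if (flagged.length > 0) {
  console.log(`exit-code-laundering operator(s): ${flagged.join(", ")}`);
}
```

A plain regex scan like this is deliberately conservative: it would also flag `||` inside a quoted argument, which is the safer direction to err in for a check that fails closed.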

package/docs/verifiable-trust.md ADDED

@@ -0,0 +1,95 @@
+# Verifiable Trust — why "done" actually means done
+> **The problem with autonomous coding agents: they grade their own homework.**
+> An agent writes the code, runs the tests, and reports "all green, shipped." If it's
+> wrong — or if it learns it can just *say* the tests passed — you find out in production.
+> Flow Agents is built so an agent **can't** mark work complete that isn't.
+## The one-line pitch
+Flow Agents treats "the work is done" as a **claim that must be proven**, not a status the
+agent gets to assert. Completion is gated by **evidence the system re-derives itself**, and
+the authoritative check runs in **CI — an environment the agent can't disable or fake** —
+with **cryptographically signed provenance** of exactly what was verified.
+Most agent frameworks trust the model's self-report. Flow Agents doesn't trust the agent,
+its claims, *or* its environment.
+## What that buys you
+- **"Done" you can rely on.** A finished task ends with real evidence — tests, build, lint,
+  review findings, captured command results — and the gate *re-derives* the verdict from
+  that evidence. A claimed pass that contradicts a captured failure is **blocked**, not shipped.
+- **Anti-gaming by design.** The gate independently captures real command results and
+  reconciles them against the agent's claims — namespace-agnostic, and independent of any
+  status the agent self-declares. Tricks like `npm test || true` (laundering the exit code)
+  are rejected.
+- **An external anchor that can't be switched off.** On every pull request, CI re-runs the
+  verification *fresh* in a clean environment and **fails the merge on any divergence**
+  between what the agent claimed and what CI actually observed. The agent can tamper with its
+  own machine all it likes; it can't reach into CI.
+- **Signed provenance.** CI mints a Sigstore-signed attestation over its *own* results. The
+  agent has no signing identity, so a fabricated "green" can't be signed — you get a
+  tamper-evident, externally-verifiable record of what shipped.
+- **The gate can't be silently weakened.** The anti-gaming test suite runs as a **required
+  CI check**, and the gate/CI/verify config require **code-owner review** — so a change that
+  guts the protections can't merge.
+## Who it's for
+- **Solo builders and teams shipping agent-written code.** Run the agent, and trust that
+  what it marks "done" is verified — not just asserted. Less re-checking, fewer "it said it
+  passed but it didn't" surprises.
+- **Unattended / AFK and overnight agents.** When you're not watching, the gate is. An agent
+  running autonomously can't quietly ship broken work past a green self-report.
+- **High-assurance, regulated, and audited environments.** Every delivery carries a signed,
+  reproducible record of *what was verified and how* — provenance you can hand to an auditor,
+  not a screenshot of a passing run.
+- **Multi-agent and at-scale delivery.** Every agent's output is held to the same external,
+  un-gameable bar — so you can fan out work without fanning out the risk that one agent
+  learns to game its gate.
+- **Platform teams adopting agents.** Add the trust anchor as a required check in your repos
+  and get a consistent, enforced "agents must prove it" policy across every team.
+## How it's different
+| | Typical agent setup | Flow Agents |
+|---|---|---|
+| "Is it done?" | The agent says so | Re-derived from independent evidence |
+| Failure hiding | Easy (claim pass, launder exit codes) | Caught — captured results reconcile against claims |
+| Where trust lives | In the agent's environment | **External** — CI re-runs fresh, agent can't disable it |
+| Provenance | A log line | **Sigstore-signed** attestation of CI's own results |
+| Tampering with the gate | Possible | Required tests + code-owner review block it |
+## The honest part
+This is a **defense-in-depth bar-raiser, not a magic wall** — and the docs say so plainly.
+The local gate raises the cost of casual or direct self-tampering; the *real* tamper-proof
+boundary is **external**: CI's fresh re-run, the CI-minted signatures, and human (owner)
+review — none of which the agent can reach. Known residuals (and their mitigations) are
+documented openly rather than hidden, because overstating security is its own risk. We'd
+rather you trust this *because* you can see where the lines are.
+> This posture wasn't designed on paper and declared safe — it was **earned by an adversarial
+> loop**: independent reviewers repeatedly tried to defeat the gate (and found real holes we
+> closed) until a round came back clean. That loop is now part of the policy, and the
+> regression suite that proves it runs on every change.
+## Add it to your repo
+The same external anchor works in **any** repo that uses Flow Agents — add the
+[`trust-verify` composite action](trust-anchor-adoption.md) as a required check, or run it
+locally / in any CI with `npx @kontourai/flow-agents verify`. See the
+[Trust Anchor Adoption guide](trust-anchor-adoption.md) for the full wiring (publish the
+bundle → add the action → make it a required, no-bypass check + CODEOWNERS).
+## Learn more
+- **Add the anchor to your repo:** [Trust Anchor Adoption guide](trust-anchor-adoption.md)
+- **Architecture, threat model, and residuals:** [ADR 0017 — The Anti-Gaming Trust Security
+  Model](adr/0017-anti-gaming-trust-security-model.md)
+- **The trust state model it builds on:** [ADR 0010 — Workflow Trust State as a Hachure
+  Bundle](adr/0010-workflow-trust-state-as-hachure-bundle.md), [ADR 0004 — Gates Expect
+  Surface Claims](adr/0004-gates-expect-surface-claims.md)
+- **Turning on the external teeth** (admin, one-time): the CI anchor + code-owner review are
+  armed by two server-side branch-protection settings — see the activation note in ADR 0017.

package/docs/veritas-integration.md CHANGED

@@ -30,7 +30,7 @@ The user sees a clear result: pass, fail, hold, or not verified. The implementat
 | Area | Flow Agents Owns | Veritas Owns |
 | --- | --- | --- |
-| Workflow | Agent-facing workflow packs, harness hooks, sidecars, release decisions, learning loops | None |
+| Workflow | Agent-facing workflow skills, harness hooks, sidecars, release decisions, learning loops | None |
 | Flow | Process steps, gates, transitions, Flow Runs, exceptions, and Flow Reports | None |
 | Governance | When to ask for governance evidence | Repo standards, authority settings, evidence checks |
 | Evidence | `evidence.json`, `standard_refs`, `external_evidence`, acceptance mapping | Native Veritas reports and rule results |
@@ -157,7 +157,7 @@ Current local configuration in this repo is limited to:
 ## Non-Goals
 - Do not vendor Veritas source into Flow Agents.
-- Do not make Veritas mandatory for the core pack.
+- Do not make Veritas mandatory for the standalone base.
 - Do not duplicate Veritas policy schemas inside Flow Agents.
 - Do not make knowledge, meeting, or sales workflows depend on development governance tooling.
 - Do not bootstrap `.veritas/repo-map.json` from Flow Agents in this slice. Native Veritas repository setup remains future Veritas-owned or adapter-owned work.

package/docs/workflow-usage-guide.md CHANGED

@@ -278,6 +278,34 @@ npm run workflow:sidecar -- init-plan .flow-agents/<slug>/<slug>--deliver.md \
   --next-action "<next step>"
 ```
+#### Deterministic slug from a work-item ref
+For issue-backed sessions, pass `--work-item <owner/repo#id>` instead of `--task-slug`. The
+derived slug has the format `<owner>-<repo>-<id>` — for example:
+```bash
+npm run workflow:sidecar -- ensure-session \
+  --work-item "kontourai/flow-agents#161" \
+  --source-request "Implement #161" \
+  --summary "Deterministic slug demo."
+# Creates .flow-agents/kontourai-flow-agents-161/
+```
+The slug is deterministic and idempotent: any agent or worktree that runs `ensure-session
+--work-item kontourai/flow-agents#161` will land in the same directory. This makes liveness
+collision-detection work correctly — the `subjectId` written to `liveness/events.jsonl` equals
+`workItemSlug(ref)` (i.e. `kontourai-flow-agents-161`), so a double-hold on the same issue is
+detectable via `liveness status --subject kontourai-flow-agents-161` (see
+[ADR 0012](adr/0012-agent-coordination-as-liveness-claims.md)).
+Rules:
+- `--task-slug` always wins when both flags are supplied (back-compat).
+- Omitting both flags still fails with the error `--task-slug is required`.
+- The `id` part after `#` must be a plain integer (GitHub issue number). Non-integer ids are
+  rejected.
+- Issue-backed sessions should prefer `--work-item` over hand-supplied `--task-slug` so that
+  liveness subjectId alignment is automatic.
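The slug rules above can be sketched in a few lines. `workItemSlug` is the name the guide itself uses; the body below is an assumed reconstruction from the documented `<owner>-<repo>-<id>` format, and `resolveSlug` is a hypothetical helper that illustrates the flag precedence. The real implementation may normalize characters or validate differently.

```javascript
// Assumed reconstruction of the documented behavior; not the package's source.

// "owner/repo#id" -> "owner-repo-id"; the id must be a plain integer.
function workItemSlug(ref) {
  const match = /^([^/#\s]+)\/([^/#\s]+)#(\d+)$/.exec(String(ref));
  if (!match) {
    throw new Error(`invalid work-item ref: ${ref}`);
  }
  const [, owner, repo, id] = match;
  return `${owner}-${repo}-${id}`;
}

// Hypothetical precedence helper: --task-slug wins, then --work-item.
function resolveSlug({ taskSlug, workItem } = {}) {
  if (taskSlug) return taskSlug;
  if (workItem) return workItemSlug(workItem);
  throw new Error("--task-slug is required");
}

console.log(workItemSlug("kontourai/flow-agents#161")); // kontourai-flow-agents-161
```

Because the function depends only on its input, two agents resolving the same ref always compute the same directory name, which is what makes the liveness `subjectId` comparison meaningful.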
 Reviewer Markdown artifacts can be imported into `critique.json`:
 ```bash
@@ -441,3 +469,44 @@ Retrospective:
 ```text
 Use learning-review. Capture facts, decisions, gaps, follow-ups, and durable knowledge updates from this completed or failed workflow.
 ```
+## Resumable sessions
+When a session resumes (after context compaction, an agent restart, or a cross-session
+handoff), the workflow-steering hook emits a `RESUME:` block on `SessionStart` that
+gives the resuming agent immediate situational awareness without blocking or auto-deciding.
+The `RESUME:` block supplements the existing `STATE:` line and contains:
+- **Header** — `RESUME: <slug> status:<status> phase:<phase>` — quick orientation.
+- **Next action** — the full `next_action.summary` at 240 characters (not truncated to 80), so the agent can re-ground to the exact recorded next step.
+- **Plan** — path to the plan artifact (`<slug>--plan-work.md` from `state.json artifact_paths` or conventional fallback).
+- **Next step** — the first `handoff.json next_steps` entry.
+- **Blockers** — any recorded blockers from `handoff.json`, or "none".
+- **Trust** — `Trust: N verified / M disputed / T total` from reading `trust.bundle`. Each disputed or unknown claim is listed with its id and a copy-pasteable remedy command: `npm run workflow:sidecar -- claim <id> <dir>`.
+- **Liveness advisory** (when applicable) — `[LIVENESS WARNING: another agent appears live on this work: actor <X>, last seen <T>]` when the shared liveness stream (`.flow-agents/liveness/events.jsonl`, ADR 0012) contains a fresh claim or heartbeat from a different actor for the same slug. This is advisory only — the hook exits 0 regardless.
+- **Route hint** — `To continue: resume this work. Or run pull-work to assess WIP and start new/parallel work.` — always routes the resume-vs-parallel decision through `pull-work` rather than auto-taking it.
+The `RESUME:` block appears on `SessionStart` only. `UserPromptSubmit` and `PostToolUse`
+behavior is unchanged.
+All reads are fail-open: a missing `handoff.json`, `trust.bundle`, or liveness stream
+degrades gracefully — the section is omitted or shows "no data", and the hook never throws.
+The liveness freshness check is read-only (ADR 0012). Writing or excluding liveness claims
+is scoped to issue #151 (a later slice). The session-level event log (Layer 2) is also
+deferred.
+### Shared liveness helper
+The freshness logic is centralised in `scripts/hooks/lib/liveness-read.js` (pure CJS,
+zero dependencies). It exports:
+- `readLivenessEvents(streamPath)` — reads a `.flow-agents/liveness/events.jsonl` file
+  line-by-line, JSON-parses each, and tolerates malformed lines.
+- `freshHolders(events, slug, selfActor, nowMs)` — returns actors (excluding `selfActor`)
+  who hold a within-TTL claim or heartbeat on `subjectId === slug`.
+Both the hook (`scripts/hooks/workflow-steering.js`) and the compiled CLI
+(`build/src/cli/workflow-sidecar.js`) consume this helper so the TTL/freshness logic lives
+in one place.
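As a rough picture of what such a helper might look like, the sketch below implements the two exported functions under stated assumptions. The function names and the first four parameters of `freshHolders` come from the guide; everything else is hypothetical: the event fields (`actor`, `subjectId`, `type`, `ts` as an ISO timestamp), the extra `ttlMs` parameter, and the 15-minute default TTL. The real `liveness-read.js` may use a different schema and return shape.

```javascript
// Assumption-laden sketch; field names and TTL are illustrative only.
const fs = require("node:fs");

const ASSUMED_TTL_MS = 15 * 60 * 1000; // hypothetical default

// Read a JSONL stream, skipping malformed lines; a missing file yields [].
function readLivenessEvents(streamPath) {
  let text;
  try {
    text = fs.readFileSync(streamPath, "utf8");
  } catch {
    return []; // fail open, as the guide describes for all reads
  }
  const events = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // tolerate a malformed line and keep going
    }
  }
  return events;
}

// Other actors with a within-TTL claim or heartbeat on subjectId === slug.
function freshHolders(events, slug, selfActor, nowMs, ttlMs = ASSUMED_TTL_MS) {
  const lastSeen = new Map();
  for (const ev of events) {
    if (!ev || ev.subjectId !== slug) continue;
    if (ev.type !== "claim" && ev.type !== "heartbeat") continue;
    if (!ev.actor || ev.actor === selfActor) continue;
    const at = Date.parse(ev.ts);
    if (Number.isNaN(at) || nowMs - at > ttlMs) continue;
    if (!lastSeen.has(ev.actor) || lastSeen.get(ev.actor) < at) {
      lastSeen.set(ev.actor, at);
    }
  }
  return [...lastSeen].map(([actor, lastSeenMs]) => ({ actor, lastSeenMs }));
}
```

Passing `nowMs` in rather than calling `Date.now()` inside keeps the freshness check pure and easy to test, which fits the helper's read-only role.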