@kontourai/flow-agents 1.4.0 → 2.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.github/CODEOWNERS +29 -0
- package/.github/actions/trust-verify/action.yml +145 -0
- package/.github/workflows/ci.yml +11 -4
- package/.github/workflows/kit-gates-demo.yml +2 -2
- package/.github/workflows/publish-npm.yml +10 -2
- package/.github/workflows/release-please.yml +1 -1
- package/.github/workflows/runtime-compat.yml +1 -1
- package/.github/workflows/trust-reconcile.yml +113 -0
- package/AGENTS.md +13 -0
- package/CHANGELOG.md +103 -0
- package/CONTRIBUTING.md +4 -4
- package/README.md +1 -0
- package/agents/tool-planner.json +1 -1
- package/build/src/cli/init.js +242 -20
- package/build/src/cli/validate-workflow-artifacts.js +19 -2
- package/build/src/cli/verify.d.ts +1 -0
- package/build/src/cli/verify.js +90 -0
- package/build/src/cli/workflow-sidecar.d.ts +316 -8
- package/build/src/cli/workflow-sidecar.js +1996 -91
- package/build/src/cli.js +2 -3
- package/build/src/lib/flow-resolver.d.ts +111 -0
- package/build/src/lib/flow-resolver.js +308 -0
- package/build/src/tools/build-universal-bundles.js +34 -22
- package/build/src/tools/generate-context-map.js +3 -16
- package/build/src/tools/validate-source-tree.d.ts +1 -1
- package/build/src/tools/validate-source-tree.js +42 -162
- package/context/contracts/artifact-contract.md +10 -0
- package/context/contracts/delivery-contract.md +1 -0
- package/context/contracts/review-contract.md +1 -0
- package/context/contracts/verification-contract.md +2 -0
- package/context/gate-awareness.md +39 -0
- package/context/scripts/hooks/stop-goal-fit.js +632 -70
- package/docs/adr/0001-flow-agents-consumes-flow.md +1 -1
- package/docs/adr/0002-flow-kits-as-extension-unit.md +1 -1
- package/docs/adr/0004-gates-expect-surface-claims.md +2 -0
- package/docs/adr/0005-kubernetes-inspired-resource-contracts.md +2 -0
- package/docs/adr/0007-skill-audit.md +1 -1
- package/docs/adr/0009-canonical-hook-core-kit-boundary.md +95 -0
- package/docs/adr/0010-workflow-trust-state-as-hachure-bundle.md +139 -0
- package/docs/adr/0011-mcp-posture.md +100 -0
- package/docs/adr/0012-agent-coordination-as-liveness-claims.md +119 -0
- package/docs/adr/0013-context-lifecycle.md +151 -0
- package/docs/adr/0014-core-vs-domain-kit-boundary.md +143 -0
- package/docs/adr/0015-flow-flow-agents-boundary-reconciliation.md +120 -0
- package/docs/adr/0016-three-hard-boundary-model.md +71 -0
- package/docs/adr/0017-anti-gaming-trust-security-model.md +155 -0
- package/docs/agent-system-guidebook.md +5 -12
- package/docs/context-map.md +4 -10
- package/docs/index.md +3 -2
- package/docs/integrations/framework-adapter.md +19 -6
- package/docs/integrations/index.md +2 -2
- package/docs/north-star.md +4 -4
- package/docs/operating-layers.md +3 -3
- package/docs/plans/adr-0010-phase2-gate-recompute.md +55 -0
- package/docs/repository-structure.md +2 -2
- package/docs/skills-map.md +1 -0
- package/docs/spec/runtime-hook-surface.md +62 -9
- package/docs/standards-register.md +3 -3
- package/docs/survey-utterance-check.md +1 -1
- package/docs/trust-anchor-adoption.md +197 -0
- package/docs/verifiable-trust.md +95 -0
- package/docs/veritas-integration.md +2 -2
- package/docs/workflow-usage-guide.md +69 -0
- package/evals/acceptance/DEMO-false-completion.md +144 -0
- package/evals/acceptance/demo-cast.sh +92 -0
- package/evals/acceptance/demo-false-completion.sh +72 -0
- package/evals/acceptance/demo-real-evidence.sh +104 -0
- package/evals/acceptance/demo.tape +29 -0
- package/evals/acceptance/prove-capture-teeth-declared.sh +335 -0
- package/evals/acceptance/prove-capture-teeth.sh +114 -0
- package/evals/acceptance/prove-teeth.sh +105 -0
- package/evals/ci/antigaming-suite.sh +55 -0
- package/evals/ci/run-baseline.sh +2 -0
- package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/flows/review.flow.json +26 -0
- package/evals/fixtures/flow-kit-repository/invalid-missing-extension-asset/kit.json +20 -0
- package/evals/fixtures/flow-kit-repository/valid-unknown-extension/flows/review.flow.json +26 -0
- package/evals/fixtures/flow-kit-repository/valid-unknown-extension/kit.json +18 -0
- package/evals/integration/test_builder_step_producers.sh +379 -0
- package/evals/integration/test_bundle_install.sh +35 -71
- package/evals/integration/test_bundle_lifecycle.sh +39 -2
- package/evals/integration/test_captured_fail_reconciliation.sh +820 -0
- package/evals/integration/test_checkpoint_signing.sh +489 -0
- package/evals/integration/test_claim_lookup.sh +352 -0
- package/evals/integration/test_command_log_fork_classification.sh +134 -0
- package/evals/integration/test_command_log_integrity.sh +275 -0
- package/evals/integration/test_context_map.sh +0 -2
- package/evals/integration/test_dual_emit_flow_step.sh +278 -0
- package/evals/integration/test_enforcer_expects_driven.sh +281 -0
- package/evals/integration/test_evidence_capture_hook.sh +185 -0
- package/evals/integration/test_flow_kit_repository.sh +2 -0
- package/evals/integration/test_flowdef_session_activation.sh +273 -0
- package/evals/integration/test_flowdef_session_history_preservation.sh +250 -0
- package/evals/integration/test_gate_bypass_chain.sh +448 -0
- package/evals/integration/test_gate_lockdown.sh +1137 -0
- package/evals/integration/test_gate_review_inquiry_records.sh +399 -0
- package/evals/integration/test_goal_fit_escape_hatch.sh +73 -0
- package/evals/integration/test_goal_fit_hook.sh +69 -4
- package/evals/integration/test_goal_fit_rederive.sh +263 -0
- package/evals/integration/test_install_merge.sh +1176 -0
- package/evals/integration/test_kit_identity_trust.sh +393 -0
- package/evals/integration/test_mint_attestation.sh +373 -0
- package/evals/integration/test_phase_map_and_gate_claim.sh +365 -0
- package/evals/integration/test_publish_delivery.sh +269 -0
- package/evals/integration/test_reconcile_soundness.sh +528 -0
- package/evals/integration/test_resolvefirststep_security.sh +208 -0
- package/evals/integration/test_session_resume_roundtrip.sh +286 -0
- package/evals/integration/test_trust_checkpoint.sh +325 -0
- package/evals/integration/test_trust_reconcile.sh +293 -0
- package/evals/integration/test_verify_cli.sh +208 -0
- package/evals/integration/test_workflow_sidecar_writer.sh +549 -34
- package/evals/lib/node.sh +0 -6
- package/evals/run.sh +47 -0
- package/evals/static/test_workflow_skills.sh +6 -13
- package/install.sh +0 -7
- package/integrations/strands-ts/README.md +25 -15
- package/integrations/veritas/flow-agents.adapter.json +1 -2
- package/kits/builder/flows/build.flow.json +59 -12
- package/kits/builder/kit.json +85 -15
- package/kits/builder/skills/continue-work/SKILL.md +116 -0
- package/kits/builder/skills/deliver/SKILL.md +36 -6
- package/kits/builder/skills/design-probe/SKILL.md +28 -0
- package/kits/builder/skills/execute-plan/SKILL.md +9 -1
- package/kits/builder/skills/gate-review/SKILL.md +234 -0
- package/kits/builder/skills/learning-review/SKILL.md +30 -0
- package/kits/builder/skills/pickup-probe/SKILL.md +29 -0
- package/kits/builder/skills/plan-work/SKILL.md +13 -1
- package/kits/builder/skills/pull-work/SKILL.md +19 -0
- package/kits/knowledge/adapters/default-store/index.js +38 -0
- package/kits/knowledge/adapters/flow-runner/index.js +1620 -0
- package/kits/knowledge/adapters/obsidian-store/index.js +36 -6
- package/kits/knowledge/docs/store-contract.md +314 -0
- package/kits/knowledge/evals/audit-freshness/suite.test.js +368 -0
- package/kits/knowledge/evals/canonicalize-category/suite.test.js +383 -0
- package/kits/knowledge/evals/contract-suite/suite.test.js +111 -0
- package/kits/knowledge/evals/detect-contradictions/suite.test.js +324 -0
- package/kits/knowledge/evals/entities/suite.test.js +40 -0
- package/kits/knowledge/evals/glossary-sync/suite.test.js +416 -0
- package/kits/knowledge/evals/hygiene-review/suite.test.js +396 -0
- package/kits/knowledge/evals/retirement/suite.test.js +145 -0
- package/kits/knowledge/flows/audit-freshness.flow.json +44 -0
- package/kits/knowledge/flows/canonicalize-category.flow.json +44 -0
- package/kits/knowledge/flows/detect-contradictions.flow.json +44 -0
- package/kits/knowledge/flows/glossary-sync.flow.json +61 -0
- package/kits/knowledge/flows/hygiene-review.flow.json +43 -0
- package/kits/knowledge/kit.json +51 -1
- package/package.json +6 -6
- package/packaging/conformance/README.md +10 -2
- package/packaging/conformance/fixtures/evidence-capture--allow-records-command.json +29 -0
- package/packaging/conformance/fixtures/stop-goal-fit--block-bundle-disputed-claim.json +29 -0
- package/packaging/conformance/fixtures/stop-goal-fit--block-capture-contradicts-claimed-pass.json +30 -0
- package/packaging/conformance/fixtures/stop-goal-fit--block-mode.json +23 -0
- package/packaging/conformance/fixtures/stop-goal-fit--off-mode.json +24 -0
- package/packaging/conformance/fixtures/stop-goal-fit--warn-active-delivery.json +5 -2
- package/packaging/conformance/fixtures/stop-goal-fit--warn-no-bundle.json +23 -0
- package/packaging/conformance/fixtures/workflow-steering--reground-active-prompt.json +30 -0
- package/packaging/conformance/fixtures/workflow-steering--reground-session-start.json +30 -0
- package/packaging/conformance/run-conformance.js +1 -1
- package/scripts/README.md +2 -1
- package/scripts/build-universal-bundles.js +0 -1
- package/scripts/ci/mint-attestation.js +221 -0
- package/scripts/ci/trust-reconcile.js +545 -0
- package/scripts/hooks/config-protection.js +423 -1
- package/scripts/hooks/evidence-capture.js +348 -0
- package/scripts/hooks/lib/liveness-read.js +113 -0
- package/scripts/hooks/run-hook.js +6 -1
- package/scripts/hooks/stop-goal-fit.js +1524 -79
- package/scripts/hooks/workflow-steering.js +135 -5
- package/scripts/install-codex-home.sh +39 -0
- package/scripts/install-merge.js +330 -0
- package/scripts/repair-command-log.js +115 -0
- package/src/cli/init.ts +218 -20
- package/src/cli/validate-workflow-artifacts.ts +18 -2
- package/src/cli/verify.ts +100 -0
- package/src/cli/workflow-sidecar.ts +2127 -84
- package/src/cli.ts +2 -3
- package/src/lib/flow-resolver.ts +369 -0
- package/src/tools/build-universal-bundles.ts +34 -21
- package/src/tools/generate-context-map.ts +3 -17
- package/src/tools/validate-source-tree.ts +44 -104
- package/build/src/tools/filter-installed-packs.d.ts +0 -2
- package/build/src/tools/filter-installed-packs.js +0 -135
- package/packaging/packs.json +0 -49
- package/scripts/filter-installed-packs.js +0 -2
- package/src/tools/filter-installed-packs.ts +0 -132
@@ -0,0 +1,155 @@
---
title: "ADR 0017: The Anti-Gaming Trust Security Model — Layered Defense + External CI Anchor"
---

# ADR 0017: The Anti-Gaming Trust Security Model — Layered Defense + External CI Anchor

Status: Accepted

## Context

Flow Agents runs an autonomous coding agent that produces work and *reports its own
completion* through a trust gate (the Stop hook `stop-goal-fit.js`, ADR 0010 / ADR 0004):
the agent records evidence/claims into a `trust.bundle`, and the gate decides whether the
work may be marked complete.

This is a self-grading exam. The motivating question (from the owner): *"How do I know the
agent isn't leaving a security hole it will use to tamper with the process later?"* The
honest answer is that you cannot trust the agent's self-report, and — critically — you
cannot trust the **author of the gate** either: an agent that builds the gate can build a
hole into it while truthfully reporting "all tests green."

This ADR records the threat model, the layered defense that resulted from a 7-round
adversarial-hardening loop (PRs #213–#228), what each layer does and does **not** defend,
the residuals, and how the whole thing is activated.

## Threat model

The adversary is the running agent itself (or a future agent), which has shell access and
can write any file in the workspace. The defining property the loop kept rediscovering:

> **The gate's strongest checks key off agent-controlled inputs.**
> `state.json.status` (completion gating + block regime), command strings (exit-code
> laundering like `npm test || true`), the sourced shell profile (`FLOW_AGENTS_GOAL_FIT_MODE=off`),
> and the gate/CI source itself are all writable by the agent.

A gate that runs *inside* the agent's environment and judges from state the agent writes
can always be dodged by self-declaration. Therefore no purely-local control is airtight; the
authoritative anchor must live somewhere the agent cannot pre-disable or post-forge — **CI**.

## Decision: a layered defense, anchored externally

### Layer 1 — Local gate (fast feedback, bar-raiser)
`scripts/hooks/stop-goal-fit.js` + `evidence-capture.js` + `config-protection.js`.
- **Re-derive, don't trust:** the gate re-derives claim status from evidence via
  `@kontourai/surface` (`deriveClaimStatus`) rather than trusting stored status (ADR 0010).
- **Independent capture:** `evidence-capture.js` records real command results to a
  hash-chained `command-log.jsonl`; the gate cross-references claimed-passes against it
  ("caught false-completion" blocks). `captureCrossReference` is FlowDefinition-aware so it
  sees kit-typed (`builder.*`) claims, not only `workflow.*` (#214).
- **Captured-FAIL reconciliation, namespace-agnostic + status-independent (#218, #219):**
  any claim asserting pass for a command whose latest capture is FAIL blocks, on every stop,
  regardless of self-declared status.
- **Exit-code laundering:** a claimed verify command containing *any* `||` (or trailing
  `; true`/`exit 0`) is rejected — it masks the real exit code (#219, #227).
- **Anti-forgery:** `config-protection.js` blocks agent Write/Edit/redirect/`cp` to the
  kill-switch + truth-source files (`.claude/settings.json`, shell profiles,
  `.flow-agents/<slug>/current.json|state.json|trust.bundle`, `delivery/trust.bundle`)
  (#216, #220, #227). Path traversal in `active_flow_id` is rejected (#215, #217).
  `MAX_BLOCKS` never auto-releases a hard block (#216).
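
A hash-chained capture log and the captured-FAIL check fit in a few functions. This is a minimal sketch. It assumes a hypothetical entry shape (`command`, `exitCode`, `prevHash`, `hash`) and SHA-256 computed over the previous hash plus the entry's JSON. The real `command-log.jsonl` format written by `evidence-capture.js` is not shown in this diff and may differ.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical entry shape for this sketch: { command, exitCode, prevHash, hash }.
// The real command-log.jsonl format used by evidence-capture.js may differ.
const GENESIS = "0".repeat(64);

// Each hash covers the previous entry's hash plus this entry's payload, so
// rewriting or deleting any earlier line breaks every hash after it.
function hashEntry(prevHash, command, exitCode) {
  return createHash("sha256")
    .update(prevHash + JSON.stringify({ command, exitCode }))
    .digest("hex");
}

function appendEntry(log, command, exitCode) {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const hash = hashEntry(prevHash, command, exitCode);
  return [...log, { command, exitCode, prevHash, hash }];
}

// True only if every entry links to its predecessor and its hash recomputes.
function verifyChain(log) {
  let prevHash = GENESIS;
  for (const entry of log) {
    if (entry.prevHash !== prevHash) return false;
    if (entry.hash !== hashEntry(prevHash, entry.command, entry.exitCode)) return false;
    prevHash = entry.hash;
  }
  return true;
}

// Captured-FAIL reconciliation: a claimed pass blocks when the latest capture
// of the same command failed, regardless of any self-declared status.
function blocksClaimedPass(log, claim) {
  if (claim.result !== "pass") return false;
  for (let i = log.length - 1; i >= 0; i--) {
    if (log[i].command === claim.command) return log[i].exitCode !== 0;
  }
  return false; // never captured: left to other checks in this sketch
}
```

Flipping the first entry's `exitCode` to `0` makes `verifyChain` return `false`, because that entry's stored hash no longer recomputes. The chain alone does not detect truncation of the newest entries.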

### Layer 2 — External CI anchor (authoritative)
`scripts/ci/trust-reconcile.js` + `.github/workflows/trust-reconcile.yml`.
- On a PR, in a clean CI environment the agent cannot pre-disable, **re-run the real
  verification fresh** (`trust-reconcile-verify` = `build && eval:static`; integration +
  the anti-gaming suite run in the required `ci.yml` lanes — see Layer 4), and **reconcile**
  the delivered bundle's claims against CI's own results: a claimed-pass that CI shows FAIL,
  a command CI never ran, or a laundered command → **divergence, job fails** (#222, #226).
- CI **never trusts** the agent's bundle as a truth source — it only uses it to detect
  divergence. Bundle command labels are **never executed** (no CI-RCE); they are
  string-compared against repo-declared canonical commands only.
- Fail-open on bundle absence (the fresh verify still gates); fail-closed on divergence and
  on a compile-only / laundered verify config (#226).
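
The reconcile rule can be sketched as a pure function. The sketch assumes hypothetical shapes: bundle claims are `{ command, result }` objects, and CI's fresh results are a `Map` from canonical command string to `"pass"` or `"fail"`. It mirrors the three divergence cases above and fails open when there is no bundle. It also compares labels only as strings and never executes them. `looksLaundered` is a simplified stand-in for the full laundering screen, not the package's function.

```javascript
// Hypothetical shapes for this sketch:
//   claims:    [{ command: "npm run build", result: "pass" }, ...] from delivery/trust.bundle
//   ciResults: Map of canonical command string -> "pass" | "fail" from CI's own re-run
// Bundle labels are only compared as strings; nothing here executes them.

function looksLaundered(command) {
  // Simplified screen: any "||" masks the real exit code.
  return command.includes("||");
}

function reconcile(claims, ciResults) {
  // Fail-open on bundle absence: the fresh verify run still gates on its own.
  if (!claims) return { ok: true, divergences: [] };
  const divergences = [];
  for (const claim of claims) {
    if (claim.result !== "pass") continue; // only claimed passes can diverge here
    if (looksLaundered(claim.command)) {
      divergences.push({ command: claim.command, reason: "laundered" });
    } else if (!ciResults.has(claim.command)) {
      divergences.push({ command: claim.command, reason: "not-run-by-ci" });
    } else if (ciResults.get(claim.command) !== "pass") {
      divergences.push({ command: claim.command, reason: "ci-failed" });
    }
  }
  // Fail-closed: any divergence fails the job.
  return { ok: divergences.length === 0, divergences };
}
```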

### Layer 3 — Provenance (un-forgeable)
`scripts/ci/mint-attestation.js`. CI mints an in-toto/DSSE attestation over its *own* fresh
results, signed via keyless Sigstore. The local agent has no ambient OIDC identity, so it
**cannot mint a trusted signature** — a fabricated "green" cannot be signed (#224).
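
The bytes CI signs can be shown without Sigstore itself. In keyless mode, Sigstore's Fulcio issues a short-lived signing certificate bound to an OIDC identity. In CI that identity is the workflow's token, which is exactly what the local agent lacks. The sketch below builds an in-toto Statement v1 and the DSSE v1 pre-authentication encoding (PAE) that a DSSE signature covers. It assumes placeholder `subject` and `predicateType` values rather than the package's real schema, and it does not sign anything.

```javascript
const PAYLOAD_TYPE = "application/vnd.in-toto+json";

// in-toto Statement v1 over CI's own fresh results. The subject name,
// predicateType URI, and predicate shape are placeholders for this sketch.
function buildStatement(commitSha, ciResults) {
  return {
    _type: "https://in-toto.io/Statement/v1",
    subject: [{ name: "repo-under-test", digest: { gitCommit: commitSha } }],
    predicateType: "https://example.invalid/trust-reconcile/v1",
    predicate: { results: ciResults },
  };
}

// DSSE v1 pre-authentication encoding: the exact bytes that get signed.
// "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body, with LEN as decimal byte counts,
// so neither the payload type nor the body can be swapped without breaking the signature.
function pae(payloadType, payload) {
  const typeBytes = Buffer.from(payloadType, "utf8");
  const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload, "utf8");
  return Buffer.concat([
    Buffer.from(`DSSEv1 ${typeBytes.length} `, "utf8"),
    typeBytes,
    Buffer.from(` ${body.length} `, "utf8"),
    body,
  ]);
}
```

Lengths are byte counts, not character counts: `pae("t", "é")` encodes the body length as `2`.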

### Layer 4 — Enforcement guards (so the gate itself can't be silently weakened)
- The **anti-gaming regression suite** (`evals/ci/antigaming-suite.sh`, 17 tests) runs in
  the **required** `ci.yml` "Runtime and Kit" lane (#228). Weakening any
  gate/anchor/protection — or deleting a regression test — turns the lane red and cannot merge.
- **CODEOWNERS** (#221, #228) requires owner review for the gate hooks, the CI definition,
  and the verify config (`package.json`, `scripts/ci/`, `evals/ci/`, `evals/run.sh`).
- Branch protection on `main`: required status checks + `enforce_admins`.

### Delivery transport
`delivery/trust.bundle` (gitignored by default; the deliver skill `git add -f`s it at an
actual delivery) carries the session bundle to CI; CI reconciles it against fresh truth
(#223, #1b). `publishDelivery` writes it via `fs` at `record-release`.

## How the layers map to the threat model

| Agent capability | Defended by |
|---|---|
| Lie in the bundle (claim pass) | L1 re-derive + capture cross-ref; L2 fresh re-run + reconcile |
| Launder the exit code (`\|\| true`) | L1 + L2 laundering screen |
| Forge `state.json`/`trust.bundle`/`current.json` | L1 config-protection; L2 doesn't trust them |
| Disable the local gate (env/profile) | L2 (CI ignores the agent's env) |
| Weaken the gate/anchor source | L4 required anti-gaming suite + CODEOWNERS |
| Fabricate a signed "green" | L3 (no local OIDC) |
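
The laundering screen that L1 and L2 share is purely syntactic. The sketch below is minimal and assumes only the rules stated under Layer 1: any `||` anywhere, or a trailing `; true` / `; exit 0`. The function name is hypothetical, and the real screen may cover more forms.

```javascript
// Hypothetical, simplified laundering screen over a claimed verify command.
function isLaunderedVerifyCommand(command) {
  const cmd = command.trim();
  // Any "||" can mask a failure: `npm test || true`, `|| :`, `|| exit 0`.
  if (cmd.includes("||")) return true;
  // A trailing `; true` or `; exit 0` replaces the real exit status.
  if (/;\s*(true|exit\s+0)$/.test(cmd)) return true;
  return false;
}
```

`npm run build && npm run eval:static` passes the screen. That is correct, because `&&` stops at the first failure and the chain exits with that command's status. A wrapper script such as `./verify.sh` that always exits 0 also passes, which is the wrapper-script residual listed below.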

## The adversarial-hardening method (and why it is part of the decision)

The architecture above was **not** designed up front; it was *earned* by an adversarial
loop, and that loop is itself the policy: **author self-certification is insufficient.**
Across 7 rounds, fresh-context adversaries (red-team + security audit, told to distrust the
author's "verified" claims) found 7+ real holes, which we closed; several had been shipped
by the gate's own author while that author reported tests green. The loop is run to convergence
(a round that finds nothing newly exploitable beyond known residuals). The required
anti-gaming suite (L4) is how convergence is kept enforced over time.

## Residuals (honest — this is a bar-raiser, not a wall)

- **Local env kill-switch:** an agent can write a shell profile via interpreter one-liners
  (`node -e`, `py`-style `-c` invocations) or runtime-constructed paths that
  `config-protection` cannot fully enumerate, setting
  `FLOW_AGENTS_GOAL_FIT_MODE=off` for the next local hook. *Mitigated by:* L2 (CI is
  unaffected by the agent's env) + L4.
- **Wrapper-script verify:** an agent could point `trust-reconcile-verify` at a script that
  exits 0 without `||`. *Mitigated by:* CODEOWNERS on the verify config (L4) + the required
  anti-gaming suite running independently of the verify command.
- **Runtime-path / exotic-tool forge** (`rsync`, `dd`, `path.join(...)`) of protected files:
  not caught at the hook layer. *Mitigated by:* L2 not trusting those files + human review.
- **The irreducible boundary:** an admin/owner can bypass branch protection or approve a
  malicious change. That is the trust-the-humans boundary; it is documented, not closed.

The consistent principle: **the real tamper-proof boundary is external — the CI anchor's
fresh re-run, CI-minted signatures, and human (owner) review. The local gate raises the cost
of casual/direct self-tampering; it does not claim to stop a determined agent on its own.**
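
The kill-switch and exotic-tool residuals come from the same limit: a hook that screens the raw command string only sees paths spelled out literally. The screen below is deliberately naive, hypothetical, and far simpler than `config-protection.js`. It catches a direct redirect into a shell profile. It misses a `node -e` one-liner that assembles the path at runtime, and it misses an `rsync` copy. Those are the cases that fall back on L2 and human review.

```javascript
// Hypothetical, deliberately naive protected-write screen (illustration only).
const PROTECTED = [".bashrc", ".zshrc", ".profile", ".claude/settings.json"];
const WRITE_VERB = /(>>?|\bcp\b|\btee\b|\bmv\b)/;

// Flags a command only if it uses a known write verb AND names a protected
// file literally. Paths built at runtime or written by other tools slip through.
function looksLikeProtectedWrite(command) {
  return WRITE_VERB.test(command) && PROTECTED.some((p) => command.includes(p));
}
```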

## Activation (required, server-side — issue #225)

The CI anchor and CODEOWNERS run and report today, but two settings (admin-only, not in the
repo) **arm** them as no-bypass gates:
1. Add **`Trust Reconcile`** as a required, no-bypass status check on `main`.
2. Enable **"Require review from Code Owners."**

Until set, `Trust Reconcile` + CODEOWNERS are advisory. The anti-gaming suite (L4) is
*already* blocking because it rides the existing required "Runtime and Kit" check.
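
Both settings live in GitHub, not in the repo. The sketch below shows one way to build the request, assuming the classic branch-protection REST endpoint (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`) rather than repository rulesets. Because this `PUT` replaces the branch's whole protection config, the sketch merges existing required checks instead of overwriting them. Any existing review settings would need the same treatment. The helper name is hypothetical. The check names must match the contexts GitHub actually reports, and `"Runtime and Kit"` in the usage below is an assumption.

```javascript
// Hypothetical helper: builds, but does not send, a branch-protection request.
function buildProtectionRequest(owner, repo, existingChecks = [], branch = "main") {
  // PUT replaces the whole config: keep existing required checks, then add ours.
  const contexts = [...new Set([...existingChecks, "Trust Reconcile"])];
  return {
    method: "PUT",
    url: `https://api.github.com/repos/${owner}/${repo}/branches/${branch}/protection`,
    headers: {
      Accept: "application/vnd.github+json",
      "X-GitHub-Api-Version": "2022-11-28",
    },
    body: {
      // strict: true (branch must be up to date) is a choice made in this sketch.
      required_status_checks: { strict: true, contexts },
      enforce_admins: true, // rules apply to admins too (no admin bypass)
      required_pull_request_reviews: { require_code_owner_reviews: true },
      restrictions: null,
    },
  };
}

// Sending it (not run here) would look roughly like:
// const req = buildProtectionRequest("OWNER", "REPO", ["Runtime and Kit"]);
// await fetch(req.url, {
//   method: req.method,
//   headers: { ...req.headers, Authorization: `Bearer ${token}` },
//   body: JSON.stringify(req.body),
// });
```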

## Consequences

- The trust gate's correctness no longer depends on trusting the agent, its claims, or its
  environment — only on the external CI anchor + human review, which the agent cannot reach.
- Any change that weakens the gate/anchor/protections is caught by the required anti-gaming
  suite and cannot merge.
- The model is reusable for the product: a downstream repo adds the `trust-reconcile` job +
  the anti-gaming suite as required checks to get the same external anchor (Phase 3, future).

## Related

ADR 0004 (gates expect Surface claims), ADR 0010 (workflow trust state as a hachure bundle),
ADR 0012 (liveness claims), ADR 0016 (three-hard-boundary model — the gate is core,
FlowDefinition-driven). Implementation: PRs #213–#228. Activation: issue #225.

@@ -126,7 +126,7 @@ The important source areas are:
 | `context/contracts/` | Shared workflow contracts for planning, execution, verification, delivery, sandboxing, and governance adapters. |
 | `scripts/` | Build, validation, hook, telemetry, and sidecar tooling. |
 | `evals/` | Static, behavioral, integration, and bundle-install tests. |
-| `packaging/` |
+| `packaging/` | Cross-runtime export manifest and packaging rules. |
 | `docs/` | Durable explanation of the operating model and roadmap. |
 
 Flow Agents currently carries local workflow sidecars and hooks while Flow is being separated into its own Kontour product layer. The intended boundary is that Flow owns generic steps, gates, transitions, Flow Runs, exceptions, and Flow Reports; Flow Agents owns the agent-facing modes, skills, provider settings, runtime adapters, and Console experience that make those flows useful.
@@ -248,18 +248,11 @@ The repo has tests for:
 
 The intended pattern is that every important workflow rule gets a test at the lowest useful layer: static checks for text contracts, integration checks for scripts and hooks, and behavioral evals for runtime agent behavior when practical.
 
-###
+### Neutral base and Kits
 
-
+Every install ships the full standalone base — the `skills/`, `agents/`, and `powers/` directories are the neutral multi-framework toolbox, always present and never filtered at install time.
 
-
-
-- `core`
-- `development`
-
-Future packs (knowledge, AWS, experimental) are deferred until another producer proof shows repeated friction.
-
-All-pack installs remain the default today. `FLOW_AGENTS_PACKS` lets users opt into a smaller installed surface, and domain depth belongs in packs so a global setup can be narrowed without changing the source bundle.
+Opinion and depth live in Flow Kits (builder, knowledge, release-evidence), surfaced through the Kit Catalog and activated when a workflow needs them. The Kit Catalog is the product-facing vocabulary; the standalone base is the doer's toolbox.
 
 ## How A Request Flows
 
@@ -383,7 +376,7 @@ The next useful improvements are:
 
 - stronger live behavioral evals that prove hook output changes agent behavior across every runtime, not only that hooks emit guidance
 - richer guide examples for non-code knowledge workflows
-- clearer
+- clearer Kit Catalog activation guidance for global installs
 - a Veritas advisory-readiness spike through the optional governance adapter boundary
 - a self-validation loop that automatically proposes docs, eval, or skill updates after repeated workflow friction
 
package/docs/context-map.md
CHANGED
@@ -23,7 +23,7 @@ Generated by `npm run context-map`. Regenerate after changing agents, skills, sc
 | evals | canonical copy | Static, integration, install, and behavioral eval fixtures. |
 | kits | canonical copy | Project directory. |
 | packaging | canonical copy | Project directory. |
-| powers | canonical copy | Optional MCP/tool
+| powers | canonical copy | Optional MCP/tool capability bundles. |
 | prompts | canonical copy | Reusable prompt entry points. |
 | schemas | canonical copy | JSON Schema contracts for machine-readable workflow artifacts. |
 | scripts | canonical copy | Build, validation, hook, telemetry, workflow, and import/export utilities. |
@@ -40,6 +40,7 @@ Generated by `npm run context-map`. Regenerate after changing agents, skills, sc
 | Integration suite | bash evals/run.sh integration |
 | Workflow artifacts | npm run workflow:validate-artifacts -- --require-sidecars --require-critique .flow-agents/<slug> |
 | Workflow sidecars | npm run workflow:sidecar -- --help |
+| Claim lookup | npm run workflow:sidecar -- claim <id> <dir> |
 | Context map drift | npm run context-map:check |
 | Bundle build | npm run build:bundles |
 
@@ -65,10 +66,12 @@ Primary tools: `npm run workflow:sidecar`, `npm run workflow:validate-artifacts`
 
 | Skill | Source | When To Load |
 | --- | --- | --- |
+| continue-work | kits/builder/skills/continue-work/SKILL.md | Advance a multi-slice work item to its next increment via a fresh-context handoff. Use when one or more slices of a multi-slice issue have landed and the next undone slice should be built. Routes the next slice through pull-work + pickup... |
 | deliver | kits/builder/skills/deliver/SKILL.md | Delivery workflow — selected work to delivered code. Ensures pull-work + pickup-probe preflight, then chains plan-work → execute-plan → review-work → verify-work → loop on failure without requiring user interaction between cleanly determ... |
 | evidence-gate | kits/builder/skills/evidence-gate/SKILL.md | Evaluate whether completed work is trustworthy enough for human review, merge, or release. Use after implementation, verify-work, provider checks, CI, or remediation to map acceptance criteria to evidence, inspect scope integrity, classi... |
 | execute-plan | kits/builder/skills/execute-plan/SKILL.md | Parallel execution primitive — plan artifact path to implemented code via tool-worker (x4). Reads plan directly. Updates session file between waves. |
 | fix-bug | kits/builder/skills/fix-bug/SKILL.md | Bug fix orchestrator — diagnose → plan-work → execute-plan → review-work → verify-work → loop. Diagnosis phase is unique to bugs, then chains the same primitives. |
+| gate-review | kits/builder/skills/gate-review/SKILL.md | Enumerate gate fires and suspected misses from the session's Hachure trust.bundle, classify each as correct/false_block/missed_block using Surface's resolveInquiry to produce canonical InquiryRecords, route findings to learning-review, a... |
 | idea-to-backlog | kits/builder/skills/idea-to-backlog/SKILL.md | Turn raw product or technical ideas into shaped, prioritized, executable GitHub issue backlog. Use for idea intake, ideation, product shaping, spike/prototype decisions, PRD-like feature briefs, prioritization, and backlog creation befor... |
 | learning-review | kits/builder/skills/learning-review/SKILL.md | Capture post-merge, post-deploy, or post-incident learnings and feed them back into backlog, workflow skills, tests, docs, or knowledge. Use after release readiness, post-deploy checks, retrospectives, failed gates, or repeated workflow... |
 | plan-work | kits/builder/skills/plan-work/SKILL.md | Code planning primitive — goal + directory to structured execution plan. Delegates to tool-planner. No resume, no ideation. |
@@ -119,15 +122,6 @@ Primary tools: `npm run workflow:sidecar`, `npm run workflow:validate-artifacts`
 | dependency-checker | powers/dependency-checker/POWER.md |
 | playwright | powers/playwright/POWER.md |
 
-## Packs
-
-Pack composition is defined in `packaging/packs.json`. The current builder exports pack metadata in bundle catalogs, and generated install scripts support opt-in `FLOW_AGENTS_PACKS` filtering while leaving all packs installed by default.
-
-| Pack | Default | Skills | Agents | Powers | Purpose |
-| --- | --- | --- | --- | --- | --- |
-| core | yes | 2 | 5 | 1 | Small default surface for reliable coding and workflow execution. |
-| development | no | 4 | 9 | 1 | Development workflow depth for backlog, release, dependency, GitHub, TDD, and frontend work. |
-
 ## Current Workflow State
 
 Runtime workflow state is excluded from the committed map.
package/docs/index.md
CHANGED
@@ -61,13 +61,14 @@ The four canonical policy classes are defined in the <a href="spec/runtime-hook-
 | Core harness | opencode | agents, skills, plugin, opencode.json | L1 — no prompt-submit hook |
 | Core harness | pi | extension, skills, AGENTS.md | L1 — no stop hook |
 | Official framework adapter | AWS Strands (Python) | `integrations/strands/` spike/preview | L0 + config protection via cancellation |
+| Official framework adapter | AWS Strands (TypeScript) | `integrations/strands-ts/` native-import preview | shipped telemetry + native config protection; L2-targeted policies run through the conformance shim |
 | Conformance-certified | Community / third-party | Self-certify | Conformance kit in development |
 
-Documented gaps: opencode has no native `prompt.submit`-equivalent event; pi has no stop hook; Codex live hook influence on model context is limited. The <a href="spec/runtime-hook-surface.html">Runtime Hook Surface spec</a> names every gap explicitly using the canonical event taxonomy.
+Documented gaps: opencode has no native `prompt.submit`-equivalent event; pi has no stop hook; Codex live hook influence on model context is limited; Strands TS workflow steering, quality-gate, and stop-goal-fit coverage is conformance-shim-only. The <a href="spec/runtime-hook-surface.html">Runtime Hook Surface spec</a> names every gap explicitly using the canonical event taxonomy.
 
 ## Framework adapters
 
-The same canonical policies wire into agent frameworks as in-process language-native packages. `integrations/strands/` contains `flow-agents-strands`, a Python `HookProvider` that emits the canonical telemetry taxonomy and enforces config protection via `BeforeToolCallEvent` cancellation — 50 unit tests, no Strands SDK required.
+The same canonical policies wire into agent frameworks as in-process language-native packages. `integrations/strands/` contains `flow-agents-strands`, a Python `HookProvider` that emits the canonical telemetry taxonomy and enforces config protection via `BeforeToolCallEvent` cancellation — 50 unit tests, no Strands SDK required. `integrations/strands-ts/` adds a native-import TypeScript preview with shipped telemetry callbacks and native config-protection blocking; workflow-steering, quality-gate, and stop-goal-fit policy coverage is exercised by the conformance shim only. See <a href="spec/runtime-hook-surface.html">the spec</a> for the full framework adapter mapping and minimum viable adapter pseudocode.
 
 ## Quick Start
 
@@ -180,12 +180,12 @@ The following limitations are from `integrations/strands/README.md` and reflect
|
|
|
180
180
|
|
|
181
181
|
8. **Quality-gate policy omitted**: `quality-gate.js` invokes ruff/biome after edits. There is no clear Strands analogue yet.
|
|
182
182
|
|
|
183
|
-
## Conformance
|
|
183
|
+
## Conformance status
|
|
184
184
|
|
|
185
|
-
The Strands adapter is L0 plus config protection via `BeforeToolCallEvent` cancellation.
|
|
185
|
+
The Strands adapter is L0 plus config protection via `BeforeToolCallEvent` cancellation. Its current status is:
|
|
186
186
|
|
|
187
187
|
```
|
|
188
|
-
|
|
188
|
+
conformance_status: L0 (+ config-protection via BeforeToolCallEvent)
|
|
189
189
|
host: AWS Strands Agents
|
|
190
190
|
event_coverage:
|
|
191
191
|
agentSpawn: AgentInitializedEvent (full fidelity)
|
|
@@ -228,6 +228,8 @@ python3 -m unittest discover
|
|
|
228
228
|
|
|
229
229
|
`@kontourai/flow-agents-strands` is the first **native-import** consumer of the policy engine contract. Where the Python adapter spawns a subprocess for each `BeforeToolCallEvent` policy check, the TS adapter calls `config-protection.js`'s exported `run()` function directly — zero subprocess overhead on the hot path.
|
|
230
230
|
|
|
231
|
+
The shipped native callback surface is telemetry plus config-protection blocking. The adapter's workflow steering, quality-gate, and stop-goal-fit checks run only through the conformance shim (`bin/conformance-shim.mjs`); that coverage should not be read as a claim that production `FlowAgentsHooks` runs all four policies directly in Strands TS callbacks.
|
|
232
|
+
|
|
231
233
|
### Key differences from the Python adapter
|
|
232
234
|
|
|
233
235
|
| | Python adapter | TypeScript adapter |
|
|
@@ -235,9 +237,20 @@ python3 -m unittest discover
|
|
|
235
237
|
| Engine binding | subprocess (`node run-hook.js …`) | `require("config-protection.js").run()` — in-process |
|
|
236
238
|
| Strands SDK | `register_hooks(registry)` → `registry.add_callback` | `registerHooks(registry)` → `registry.addCallback` |
|
|
237
239
|
| Cancel signal | `event.cancel_tool = reason` | `event.cancel = reason` (TS variant) |
|
|
238
|
-
| Conformance | L0 + config-protection | L2
|
|
240
|
+
| Conformance | L0 + config-protection | L2-targeted policy coverage via conformance shim |
|
|
239
241
|
| Test framework | stdlib unittest (Python) | node:test (no extra deps) |
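The registration and cancel rows of this table can be sketched in TypeScript against a small local stand-in registry rather than the real Strands SDK. The `HookRegistry` class, the event fields, `protectConfig`, and its protected-file list are hypothetical; only the `registerHooks(registry)` → `registry.addCallback` shape, the in-process policy call, and the `event.cancel = reason` signal come from the table.

```typescript
// Hypothetical stand-ins for Strands TS types; the real SDK types differ.
interface BeforeToolCallEvent {
  toolName: string;
  input: { path?: string };
  cancel?: string; // setting a reason cancels the tool call (TS variant per the table)
}

type Callback<E> = (event: E) => void;

// Minimal registry exposing addCallback(eventType, callback).
class HookRegistry {
  private callbacks = new Map<string, Callback<any>[]>();

  addCallback<E>(eventType: string, cb: Callback<E>): void {
    const list = this.callbacks.get(eventType) ?? [];
    list.push(cb);
    this.callbacks.set(eventType, list);
  }

  // Runs every callback registered for the event type and returns the (mutated) event.
  dispatch<E>(eventType: string, event: E): E {
    for (const cb of this.callbacks.get(eventType) ?? []) cb(event);
    return event;
  }
}

// Hypothetical in-process policy standing in for config-protection.js's run():
// returns a block reason, or null to allow.
function protectConfig(event: BeforeToolCallEvent): string | null {
  const path = event.input.path ?? "";
  const protectedFiles = ["tsconfig.json", ".eslintrc.json"]; // assumed list
  return protectedFiles.some((f) => path.endsWith(f))
    ? `config-protection: edits to ${path} are blocked`
    : null;
}

// Mirrors registerHooks(registry) -> registry.addCallback from the table.
function registerHooks(registry: HookRegistry): void {
  registry.addCallback<BeforeToolCallEvent>("BeforeToolCallEvent", (event) => {
    const reason = protectConfig(event); // plain function call, no subprocess
    if (reason !== null) event.cancel = reason;
  });
}

const registry = new HookRegistry();
registerHooks(registry);
const blocked = registry.dispatch<BeforeToolCallEvent>("BeforeToolCallEvent", {
  toolName: "write_file",
  input: { path: "project/tsconfig.json" },
});
const allowed = registry.dispatch<BeforeToolCallEvent>("BeforeToolCallEvent", {
  toolName: "write_file",
  input: { path: "src/index.ts" },
});
console.log(blocked.cancel); // config-protection: edits to project/tsconfig.json are blocked
console.log(allowed.cancel); // undefined
```

Because the policy is an ordinary function call, a blocked edit costs no process spawn, which is the hot-path difference between the two adapters.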
|
|
240
242
|
|
|
243
|
+
### TypeScript capability states
|
|
244
|
+
|
|
245
|
+
| Capability | State | Public behavior |
|
|
246
|
+
| --- | --- | --- |
|
|
247
|
+
| Telemetry callbacks | shipped | Production callbacks emit canonical JSONL events. |
|
|
248
|
+
| Config-protection hot path | shipped | Native `run()` import blocks via `event.cancel`. |
|
|
249
|
+
| Workflow steering | structural-only | L2-targeted checks run through the conformance shim; production callbacks emit telemetry only. |
|
|
250
|
+
| Quality-gate | structural-only | L2-targeted checks run through the conformance shim. |
|
|
251
|
+
| Stop-goal-fit | structural-only | L2-targeted checks run through the conformance shim. |
|
|
252
|
+
| Analytics channel, Console/HTTP sink, subagent events, permission requests, token usage | unavailable | Not wired in the TypeScript adapter. |
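One way to carry these capability states in code is a typed record keyed by capability. The key names and helper functions below are hypothetical; the three state values and the row assignments mirror the table.

```typescript
// State values from the table; capability keys are illustrative names.
type CapabilityState = "shipped" | "structural-only" | "unavailable";

const TS_CAPABILITIES: Record<string, CapabilityState> = {
  "telemetry-callbacks": "shipped",
  "config-protection": "shipped",
  "workflow-steering": "structural-only",
  "quality-gate": "structural-only",
  "stop-goal-fit": "structural-only",
  "analytics-channel": "unavailable",
  "console-http-sink": "unavailable",
  "subagent-events": "unavailable",
  "permission-requests": "unavailable",
  "token-usage": "unavailable",
};

// Only "shipped" capabilities run in production Strands TS callbacks.
function runsInProductionCallbacks(capability: string): boolean {
  return TS_CAPABILITIES[capability] === "shipped";
}

// "structural-only" capabilities are exercised through the conformance shim.
function shimOnly(): string[] {
  return Object.entries(TS_CAPABILITIES)
    .filter(([, state]) => state === "structural-only")
    .map(([capability]) => capability);
}

console.log(shimOnly().join(", ")); // workflow-steering, quality-gate, stop-goal-fit
```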
|
|
253
|
+
|
|
241
254
|
### Constructing FlowAgentsHooks (TypeScript)
|
|
242
255
|
|
|
243
256
|
```typescript
|
|
@@ -264,7 +277,7 @@ The TS adapter exports `STRANDS_TO_CANONICAL` matching the Python adapter's dict
|
|
|
264
277
|
|
|
265
278
|
### Conformance
|
|
266
279
|
|
|
267
|
-
The TS adapter
|
|
280
|
+
The TS adapter runs its L2-targeted policy coverage through `bin/conformance-shim.mjs`:
|
|
268
281
|
|
|
269
282
|
```bash
|
|
270
283
|
node packaging/conformance/run-conformance.js \
|
|
@@ -272,4 +285,4 @@ node packaging/conformance/run-conformance.js \
|
|
|
272
285
|
--level L2
|
|
273
286
|
```
|
|
274
287
|
|
|
275
|
-
|
|
288
|
+
The shim is the validation path for the canonical fixture contract; the shipped native adapter still provides only telemetry plus native config protection. Current status: the L2 target is not passing. The runner reports 18/20 fixtures passing, with L0 as the highest achieved level; `stop-goal-fit--warn-active-delivery.json` and `workflow-steering--reground-session-start.json` still fail. Treat the runner output as the current status of that target, not as evidence of a production callback capability. See `integrations/strands-ts/README.md` for the full conformance declaration and limitations.
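A sketch of summarizing runner output into a pass count and failing fixtures grouped by policy. It assumes a hypothetical `{ fixture, passed }` result shape and the `<policy>--<case>.json` naming visible in the two failing fixture names; the passing fixture in the example is a placeholder, not a real fixture.

```typescript
// Hypothetical result shape; the real runner's output format may differ.
interface FixtureResult {
  fixture: string; // e.g. "stop-goal-fit--warn-active-delivery.json"
  passed: boolean;
}

interface Summary {
  passing: number;
  total: number;
  failingByPolicy: Map<string, string[]>;
}

// Assumes fixture names follow the "<policy>--<case>.json" pattern.
function policyOf(fixture: string): string {
  const idx = fixture.indexOf("--");
  return idx === -1 ? "unknown" : fixture.slice(0, idx);
}

function summarize(results: FixtureResult[]): Summary {
  const failingByPolicy = new Map<string, string[]>();
  let passing = 0;
  for (const r of results) {
    if (r.passed) {
      passing++;
      continue;
    }
    const policy = policyOf(r.fixture);
    failingByPolicy.set(policy, [...(failingByPolicy.get(policy) ?? []), r.fixture]);
  }
  return { passing, total: results.length, failingByPolicy };
}

// Synthetic input: the two failing names come from the text; the first entry is a placeholder.
const summary = summarize([
  { fixture: "config-protection--example-placeholder.json", passed: true },
  { fixture: "stop-goal-fit--warn-active-delivery.json", passed: false },
  { fixture: "workflow-steering--reground-session-start.json", passed: false },
]);
console.log(`${summary.passing}/${summary.total} fixtures passing`); // 1/3 fixtures passing
console.log(Array.from(summary.failingByPolicy.keys()).join(", ")); // stop-goal-fit, workflow-steering
```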
|
|
@@ -22,7 +22,7 @@ Flow Agents reaches host runtimes and agent frameworks through two distinct dist
|
|
|
22
22
|
| L1 | L0 plus workflow steering and stop-goal-fit in warning mode |
|
|
23
23
|
| L2 | L1 plus config protection (blocking) and quality gate — the reference level |
|
|
24
24
|
|
|
25
|
-
Claude Code and Codex are L2 reference implementations. opencode is L1 (no prompt-submit hook). pi is L1 (no stop hook). The Strands adapter is L0 plus config protection via `BeforeToolCallEvent` cancellation.
|
|
25
|
+
Claude Code and Codex are L2 reference implementations. opencode is L1 (no prompt-submit hook). pi is L1 (no stop hook). The Strands Python adapter is L0 plus config protection via `BeforeToolCallEvent` cancellation. The Strands TypeScript adapter ships telemetry callbacks plus native config protection; workflow steering, quality-gate, and stop-goal-fit policy coverage is available through the conformance shim only.
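Because the levels are cumulative, an adapter's level is the highest rung whose requirements, together with every lower rung's, it meets. The sketch below assumes L0 means telemetry (the L0 row is outside this excerpt) and uses hypothetical capability names for the L1 and L2 rows; it shows why native config protection alone leaves the Strands TS callbacks at L0.

```typescript
// Capability names are illustrative; "telemetry" as the L0 requirement is an assumption.
type Capability =
  | "telemetry"
  | "workflow-steering"
  | "stop-goal-fit-warn"
  | "config-protection-blocking"
  | "quality-gate";

type Level = "none" | "L0" | "L1" | "L2";

// Each level adds requirements on top of the previous one.
const LEVELS: Array<[Level, Capability[]]> = [
  ["L0", ["telemetry"]],
  ["L1", ["workflow-steering", "stop-goal-fit-warn"]],
  ["L2", ["config-protection-blocking", "quality-gate"]],
];

function conformanceLevel(caps: Set<Capability>): Level {
  let achieved: Level = "none";
  for (const [level, required] of LEVELS) {
    // Stop at the first rung that is not fully met: levels are cumulative.
    if (!required.every((c) => caps.has(c))) break;
    achieved = level;
  }
  return achieved;
}

// Strands TS production callbacks: telemetry plus native config protection.
const strandsTs = new Set<Capability>(["telemetry", "config-protection-blocking"]);
console.log(conformanceLevel(strandsTs)); // L0
```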
|
|
26
26
|
|
|
27
27
|
The <a href="../spec/runtime-hook-surface.html">Runtime Hook Surface spec</a> defines the canonical event taxonomy, policy classes, conformance levels, and engine contract in full.
|
|
28
28
|
|
|
@@ -55,4 +55,4 @@ The <a href="../spec/runtime-hook-surface.html">Runtime Hook Surface spec</a> de
|
|
|
55
55
|
|
|
56
56
|
## TypeScript native-import adapter
|
|
57
57
|
|
|
58
|
-
`integrations/strands-ts/` (`@kontourai/flow-agents-strands`) is the first native-import consumer of the policy engine contract. It binds the `config-protection.js` `run()` function directly — no subprocess on the hot path.
|
|
58
|
+
`integrations/strands-ts/` (`@kontourai/flow-agents-strands`) is the first native-import consumer of the policy engine contract. It binds the `config-protection.js` `run()` function directly — no subprocess on the hot path. Shipped native behavior is telemetry callbacks plus config-protection blocking; workflow steering, quality-gate, and stop-goal-fit run through the conformance shim rather than production Strands TS callbacks. See `integrations/strands-ts/README.md` and the [Framework Adapter](framework-adapter.html) page for the full comparison with the Python adapter.
|
package/docs/north-star.md
CHANGED
|
@@ -46,7 +46,7 @@ Flow Agents should only invent a format when no durable standard or Kontour foun
|
|
|
46
46
|
|
|
47
47
|
Do not load the whole operating manual into every session.
|
|
48
48
|
|
|
49
|
-
Flow Agents should expose small discovery metadata first, then load guidance only when it is useful. Skills, powers, workflow contracts, context
|
|
49
|
+
Flow Agents should expose small discovery metadata first, then load guidance only when it is useful. Skills, powers, workflow contracts, context bundles, and references should be activated just in time.
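A minimal sketch of this progressive-disclosure pattern, assuming a hypothetical catalog in which each entry carries a short summary plus a lazy body loader. The class and method names are illustrative, not a Flow Agents API.

```typescript
// Hypothetical entry: the summary is always cheap; the body loads on demand.
interface CatalogEntry {
  name: string;
  summary: string;
  loadBody: () => string;
}

class JustInTimeCatalog {
  private readonly entries: CatalogEntry[];
  private readonly cache = new Map<string, string>();
  loads = 0; // counts body loads, to show they happen lazily

  constructor(entries: CatalogEntry[]) {
    this.entries = entries;
  }

  // Discovery view: names and summaries only, no bodies.
  discover(): string[] {
    return this.entries.map((e) => `${e.name}: ${e.summary}`);
  }

  // Load (and cache) the full body only when a workflow activates the entry.
  activate(name: string): string {
    const cached = this.cache.get(name);
    if (cached !== undefined) return cached;
    const entry = this.entries.find((e) => e.name === name);
    if (!entry) throw new Error(`unknown entry: ${name}`);
    const body = entry.loadBody();
    this.loads++;
    this.cache.set(name, body);
    return body;
  }
}

const catalog = new JustInTimeCatalog([
  { name: "plan-work", summary: "Definition Of Done and planning", loadBody: () => "full plan-work guidance" },
  { name: "verify-work", summary: "behavior evidence for acceptance criteria", loadBody: () => "full verify-work guidance" },
]);
console.log(catalog.discover().length, catalog.loads); // 2 0
catalog.activate("verify-work");
catalog.activate("verify-work");
console.log(catalog.loads); // 1
```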
|
|
50
50
|
|
|
51
51
|
### Reliability Over Ceremony
|
|
52
52
|
|
|
@@ -148,7 +148,7 @@ The goal is not to add ceremony. The goal is to make agents more reliable while
|
|
|
148
148
|
| --- | --- | --- |
|
|
149
149
|
| [x] | North star | Durable direction documented in `docs/north-star.md`. |
|
|
150
150
|
| [x] | Layer taxonomy | Repo vocabulary clearly separates rules, skills, powers, agents, workflows, knowledge, and evidence. |
|
|
151
|
-
| [x] |
|
|
151
|
+
| [x] | Neutral base vs Kit depth | The standalone `skills/`/`agents/`/`powers/` base always installs; opinion and depth live in Flow Kits surfaced through the Kit Catalog and activated on demand. |
|
|
152
152
|
| [x] | Standards register | Supported standards and Flow Agents-owned formats are documented with adoption rules. |
|
|
153
153
|
| [ ] | Structured workflow state | Draft schemas, contracts, validation, explicit current-session identity, delegation-safe agent event logs, sidecar writer commands, and direct workflow-skill writer instructions exist for state, acceptance, evidence, handoff, critique, release, and learning; automatic enforcement remains partial. |
|
|
154
154
|
| [ ] | Context map | Generated repo/context map exists; workflow steering and core planner/worker/verifier agents now use it, but broader agent coverage remains. |
|
|
@@ -180,7 +180,7 @@ Tasks:
|
|
|
180
180
|
|
|
181
181
|
- Document the public layers: rules, skills, powers, agents, workflows, knowledge, and evidence. **Done:** see https://github.com/kontourai/flow-agents/blob/main/docs/operating-layers.md.
|
|
182
182
|
- Mark which directories are canonical source, generated exports, runtime state, and optional integrations.
|
|
183
|
-
-
|
|
183
|
+
- Separate the neutral standalone base (always installed) from opinionated depth in Flow Kits. **Done:** the `skills/`/`agents/`/`powers/` base always ships; Kits carry depth through the Kit Catalog.
|
|
184
184
|
- Add a standards register that lists each external standard, how Flow Agents uses it, and what Flow Agents-owned schemas still exist. **Done:** see https://github.com/kontourai/flow-agents/blob/main/docs/standards-register.md.
|
|
185
185
|
- Add a "do not invent without checking standards" rule to contributor docs.
|
|
186
186
|
|
|
@@ -214,7 +214,7 @@ Exit criteria:
|
|
|
214
214
|
|
|
215
215
|
Tasks:
|
|
216
216
|
|
|
217
|
-
- Generate a compact context map for each repo: structure, commands, test strategy, key conventions, recent workflow state, and available
|
|
217
|
+
- Generate a compact context map for each repo: structure, commands, test strategy, key conventions, recent workflow state, and available Kits. **Started:** `npm run context-map --` writes `docs/context-map.md` and supports drift checks.
|
|
218
218
|
- Extend hooks so they can surface file-specific, workflow-specific, or evidence-specific guidance without loading whole docs. **Started:** workflow steering now emits ambient reminders after non-subagent tools when sidecars show `not_verified`, `needs_decision`, `blocked`, `failed`, or `needs_user`.
|
|
219
219
|
- Add skill discovery metadata that lets agents choose a skill from a short summary, then progressively load the body.
|
|
220
220
|
- Add missing-evidence prompts: when a workflow is about to stop without proof, show the specific gate that failed. **Started:** the Goal Fit stop hook now reads `state.json`, `evidence.json`, and `critique.json` to report unfinished phase, next action, failed checks, `NOT_VERIFIED` gaps, and open critique findings.
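The workflow-steering trigger in these tasks can be sketched as a filter over sidecar statuses. The status list comes from the text; the `SidecarStatus` shape, the `in_progress` value in the example, and the reminder wording are hypothetical.

```typescript
// Statuses named in the text that should produce an ambient reminder.
const REMINDER_STATUSES = new Set(["not_verified", "needs_decision", "blocked", "failed", "needs_user"]);

// Hypothetical flattened view of a sidecar; real sidecars are richer JSON documents.
interface SidecarStatus {
  file: string;
  status: string;
}

function ambientReminder(isSubagentTool: boolean, sidecars: SidecarStatus[]): string | null {
  if (isSubagentTool) return null; // reminders fire only after non-subagent tools
  const hits = sidecars.filter((s) => REMINDER_STATUSES.has(s.status));
  if (hits.length === 0) return null;
  return "Workflow reminder: " + hits.map((s) => `${s.file} is ${s.status}`).join("; ");
}

console.log(
  ambientReminder(false, [
    { file: "evidence.json", status: "not_verified" },
    { file: "state.json", status: "in_progress" },
  ]),
); // Workflow reminder: evidence.json is not_verified
```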
|
package/docs/operating-layers.md
CHANGED
|
@@ -89,14 +89,14 @@ If a proposed artifact seems to belong to multiple layers, split it. For example
|
|
|
89
89
|
- workflow state for a specific update task
|
|
90
90
|
- evidence for the scan result
|
|
91
91
|
|
|
92
|
-
##
|
|
92
|
+
## Neutral Base And Kit Depth
|
|
93
93
|
|
|
94
|
-
|
|
94
|
+
Every install ships the full standalone base: the `skills/`, `agents/`, and `powers/` directories are the neutral multi-framework toolbox, always present. Flow Kits add workflow depth and opinion on top of that base, surfaced through the Kit Catalog — the product-facing vocabulary — and activated when a workflow needs them. `packaging/` holds the cross-harness export manifest and build mechanics only; it no longer carries any install-time composition layer.
|
|
95
95
|
|
|
96
96
|
Do not duplicate full membership lists in prose. Update the canonical kit and packaging metadata, then regenerate the Context Map for the current skill, agent, power, and Flow Kit counts:
|
|
97
97
|
https://github.com/kontourai/flow-agents/blob/main/docs/context-map.md
|
|
98
98
|
|
|
99
|
-
Kit boundaries should be validated by usage data, context budget impact, and whether users can predict what
|
|
99
|
+
Kit boundaries should be validated by usage data, context budget impact, and whether users can predict what the Kit Catalog will activate.
|
|
100
100
|
|
|
101
101
|
## Design Checks
|
|
102
102
|
|
|
@@ -0,0 +1,55 @@
|
|
|
1
|
+
# Plan — ADR 0010 Phase 2: gate recomputes the trust bundle (+ maximal enrichment)
|
|
2
|
+
|
|
3
|
+
**Status:** Workstream A (maximal enrichment) + B-core (gate enforces on the bundle) **shipped** in this PR; see [ADR 0010](../adr/0010-workflow-trust-state-as-hachure-bundle.md) for the authoritative phase status. **Remaining (the careful follow-on):** B's hardening — re-derive-at-gate via Surface `buildTrustReport` (async hook restructure) and removal of the `DELIVERY_TYPES`/markdown parsing (completes [ADR 0009](../adr/0009-canonical-hook-core-kit-boundary.md)) — plus Phases 3–4. The constraint that blocks a naive markdown rip-out is in Workstream B below: `prove-capture-teeth` seeds raw evidence+log and relies on markdown detection.
|
|
4
|
+
|
|
5
|
+
## Baseline (already shipped — do NOT rebuild)
|
|
6
|
+
|
|
7
|
+
- **Phase 1 emit is done + wired.** `src/cli/workflow-sidecar.ts` → `buildTrustBundle()` maps `evidence.checks` / `acceptance.criteria` / `critique` → claims+evidence+events, **recomputes status via `@kontourai/surface`'s `deriveClaimStatus`**, and `writeTrustBundle()` validates + writes `.flow-agents/<slug>/trust.bundle`. Wired into `record-evidence` / `record-critique` / `advance-state` (lines 688/743/832/897). Fail-open; `@kontourai/surface` is an **optional** dep, loaded via dynamic `import()` (`tryLoadSurface`).
|
|
8
|
+
- Gates already expect `trust.bundle` claims (ADR 0004 / #97). Surface exports `deriveClaimStatus`, `buildTrustReport(bundle) → TrustReport` (the recompute), `validateTrustBundle`. ESM-only.
|
|
9
|
+
- `stop-goal-fit.js` currently enforces off **bespoke `evidence.json` + Builder markdown** (`## Definition Of Done` / `## Goal Fit Gate`, `DELIVERY_TYPES` skill-names) + the capture cross-reference (`command-log.jsonl`). It already has: `current.json` active-task scoping, pre-execution/terminal gating, escape hatch, `FLOW_AGENTS_GOAL_FIT_MODE`.
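The fail-open optional-dependency loading mentioned in the baseline can be sketched as a dynamic `import()` wrapped in `try`/`catch`. This is a generic illustration, not the actual `tryLoadSurface` in `workflow-sidecar.ts`; `tryLoadOptional` and the package name in the example are hypothetical.

```typescript
// Returns the module, or null when the optional package is missing or fails to load,
// so callers can skip the enhanced path instead of crashing.
async function tryLoadOptional<T = unknown>(specifier: string): Promise<T | null> {
  try {
    return (await import(specifier)) as T;
  } catch {
    return null; // fail-open: absence of the optional dep is not an error
  }
}

tryLoadOptional("@hypothetical/not-installed-package").then((mod) => {
  console.log(mod === null ? "optional dependency absent; continuing" : "loaded");
}); // optional dependency absent; continuing
```

A caller such as the bundle writer would then branch on `null` and skip emission rather than fail the workflow command.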
|
|
10
|
+
|
|
11
|
+
## Goal
|
|
12
|
+
|
|
13
|
+
1. **Gate recomputes the bundle** (`buildTrustReport`) and enforces on the *report's* claim statuses — replacing bespoke `evidence.json` + Builder-markdown parsing.
|
|
14
|
+
2. **Maximal enrichment** of the emit: add verification-**policies** (currently `policies: []`) and fold **`command-log` capture** into the bundle's evidence.
|
|
15
|
+
3. Correct ADR 0010's "implementation not started" line.
|
|
16
|
+
|
|
17
|
+
## Workstream A — Maximal enrichment of the emit (do FIRST)
|
|
18
|
+
|
|
19
|
+
File: `src/cli/workflow-sidecar.ts` → `buildTrustBundle()`.
|
|
20
|
+
|
|
21
|
+
- **A1 — policies.** Emit a `VerificationPolicy` per claimType (`workflow.check.*`, `workflow.acceptance.criterion`, `workflow.critique.review`) and pass them into `deriveClaimStatus` + the bundle's `policies[]` (today `[]` → status derives without policy). Required fields (from `surface/src/types.ts` `VerificationPolicy`): `id, claimType, requiredEvidence, acceptanceCriteria, reviewAuthority, validityRule, stalenessTriggers, conflictRules, impactLevel`.
|
|
22
|
+
- **A2 — capture as evidence.** Read `.flow-agents/<slug>/command-log.jsonl`; for checks whose command was captured, add `Evidence` with `execution { runner:"bash", exitCode, isError, label }` + `passing`/`blocking` from the real captured result (the deterministic capture, now first-class bundle evidence).
|
|
23
|
+
- **Proof:** `validateTrustBundle` stays valid; extend `evals/integration/test_workflow_sidecar_writer.sh` to assert policies + capture evidence are present and statuses derive correctly.
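A sketch of A2, assuming each `command-log.jsonl` line is a JSON object with `command`, `exitCode`, and an optional `label`, and that checks match captures by exact command string. The `execution` fields come from A2; treating `passing`/`blocking` as booleans derived from the exit code, and letting the last capture of a command win, are assumptions.

```typescript
// Assumed shape of one command-log.jsonl line.
interface CapturedCommand {
  command: string;
  exitCode: number;
  label?: string;
}

// Execution fields named in A2; checkId and the boolean flags are illustrative.
interface CaptureEvidence {
  checkId: string;
  execution: { runner: "bash"; exitCode: number; isError: boolean; label: string };
  passing: boolean;
  blocking: boolean;
}

// Parse JSONL, skipping blank or malformed lines (fail-open, like the emitter).
function parseCommandLog(jsonl: string): CapturedCommand[] {
  const out: CapturedCommand[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const rec = JSON.parse(line);
      if (typeof rec.command === "string" && typeof rec.exitCode === "number") out.push(rec);
    } catch {
      // ignore malformed lines
    }
  }
  return out;
}

// Attach evidence from the real captured exit code to each check whose command was captured.
function captureEvidence(checks: { id: string; command: string }[], log: CapturedCommand[]): CaptureEvidence[] {
  const latest = new Map<string, CapturedCommand>();
  for (const c of log) latest.set(c.command, c); // later captures supersede earlier ones
  const evidence: CaptureEvidence[] = [];
  for (const check of checks) {
    const cap = latest.get(check.command);
    if (!cap) continue; // not captured: no capture evidence for this check
    const failed = cap.exitCode !== 0;
    evidence.push({
      checkId: check.id,
      execution: { runner: "bash", exitCode: cap.exitCode, isError: failed, label: cap.label ?? check.command },
      passing: !failed,
      blocking: failed,
    });
  }
  return evidence;
}

const log = parseCommandLog(
  ['{"command":"npm test","exitCode":1}', '{"command":"npm run lint","exitCode":0,"label":"lint"}', "not json"].join("\n"),
);
const ev = captureEvidence(
  [
    { id: "check.test", command: "npm test" },
    { id: "check.lint", command: "npm run lint" },
    { id: "check.build", command: "npm run build" },
  ],
  log,
);
console.log(ev.map((e) => `${e.checkId}:${e.passing ? "pass" : "fail"}`).join(" ")); // check.test:fail check.lint:pass
```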
|
|
24
|
+
|
|
25
|
+
## Workstream B — Phase 2: gate recomputes the bundle
|
|
26
|
+
|
|
27
|
+
File: `scripts/hooks/stop-goal-fit.js`.
|
|
28
|
+
|
|
29
|
+
- **B1** Resolve the active task dir (already done via `current.json` / state scoping) and read its `trust.bundle`.
|
|
30
|
+
- **B2** Recompute via Surface `buildTrustReport(bundle)` — dynamic `import()`, **fail-open** (mirror `workflow-sidecar`'s `tryLoadSurface`).
|
|
31
|
+
- **B3** Block on the **report's** statuses: any blocking-impact claim with `fail`/`disputed` (and per-policy `unknown`) → block. This **replaces** the bespoke `evidence.verdict`/`checks` parsing and the markdown DOD/Goal-Fit parsing. **Preserve:** the false-completion catch (a failed capture must surface as a `disputed` claim → block), pre-exec/terminal gating, escape hatch, `MODE`.
|
|
32
|
+
- **B4** Remove the Builder-markdown coupling (`## Definition Of Done` / `## Goal Fit Gate` parsing, `DELIVERY_TYPES`/`--deliver` skill-name detection) → realizes ADR 0009's de-coupling. Detect the artifact by the kit-neutral signal (`state.json` presence), not skill names.
|
|
33
|
+
- **Fallback:** when Surface/bundle is unavailable → fall back to bespoke `state.json`/`evidence.json` status checks (NEVER markdown). Bundle-recompute when available; schema-status fallback otherwise.
|
|
34
|
+
- **Proof:** `prove-capture-teeth` **8/8** (the catch now via report), conformance **L2** (add bundle-based fixtures), goal-fit + escape-hatch + steering integration green.
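B3 and the fallback rule reduce to a small decision function. The claim shape below is a hypothetical simplification, not Surface's `TrustReport`: `blockingImpact` stands in for the policy's impact level and `unknownBlocks` for the per-policy `unknown` rule.

```typescript
// Simplified, hypothetical view of a recomputed report's claims.
type ClaimStatus = "pass" | "fail" | "disputed" | "unknown";

interface ReportClaim {
  id: string;
  status: ClaimStatus;
  blockingImpact: boolean; // assumed: derived from the policy's impactLevel
  unknownBlocks: boolean; // assumed: per-policy choice whether "unknown" blocks
}

type GateDecision = { action: "allow" } | { action: "block"; reasons: string[] };

// B3: block on any blocking-impact claim that is fail/disputed, or unknown where the policy says so.
function decideFromReport(claims: ReportClaim[]): GateDecision {
  const reasons = claims
    .filter((c) => c.blockingImpact)
    .filter((c) => c.status === "fail" || c.status === "disputed" || (c.status === "unknown" && c.unknownBlocks))
    .map((c) => `${c.id}: ${c.status}`);
  return reasons.length > 0 ? { action: "block", reasons } : { action: "allow" };
}

// Fallback: with no report (Surface or bundle unavailable), use the schema-status check, never markdown.
function decide(claims: ReportClaim[] | null, schemaStatusFallback: () => GateDecision): GateDecision {
  return claims === null ? schemaStatusFallback() : decideFromReport(claims);
}

// False-completion case: a failed capture surfaced as a disputed claim must block.
const falseCompletion: ReportClaim[] = [
  { id: "check.test", status: "disputed", blockingImpact: true, unknownBlocks: false },
  { id: "critique.review", status: "unknown", blockingImpact: false, unknownBlocks: true },
];
console.log(JSON.stringify(decide(falseCompletion, () => ({ action: "allow" }))));
// {"action":"block","reasons":["check.test: disputed"]}
```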
|
|
35
|
+
|
|
36
|
+
## Workstream C — Docs
|
|
37
|
+
|
|
38
|
+
- Correct `docs/adr/0010`: Phase 1 shipped (`buildTrustBundle`/`writeTrustBundle`); the remainder is Phase 2 (this plan), maximal enrichment, and Phase 3.
|
|
39
|
+
- **Phase 3 (separate, later):** Surface **Trust Panel** projection (`@kontourai/surface/trust-panel/element`) over the local bundle; optional Console sink per ADR 0010's local-first distribution model.
|
|
40
|
+
|
|
41
|
+
## Sequencing, proof gates, hygiene
|
|
42
|
+
|
|
43
|
+
- Order: **A → B → C.** After *every* edit to `stop-goal-fit.js`, re-run `prove-capture-teeth` + the goal-fit suite (this hook broke twice from haste — change incrementally).
|
|
44
|
+
- Per-gate green required: `tsc` build, static+integration evals, conformance L2, `prove-capture-teeth` 8/8.
|
|
45
|
+
- Work in an **isolated git worktree off latest `origin/main`** (this is a busy multi-agent repo with 5+ active worktrees). **Before starting, confirm that `feat/gate-review` and `chore/gate-vocabulary-migration` don't collide with this work.** Surgical commits, one PR, shepherd CI (the `pre-push` source-validation needs `node_modules/.bin` on `PATH`; never `--no-verify`).
|
|
46
|
+
|
|
47
|
+
## Guardrails (don't violate)
|
|
48
|
+
|
|
49
|
+
- **Consume, never fork:** use Surface's `buildTrustReport`/`deriveClaimStatus`; do not reimplement status logic in flow-agents.
|
|
50
|
+
- **Boundary (ADR 0009):** the gate reads the canonical bundle (core), not Builder markdown (kit).
|
|
51
|
+
- **Determinism preserved:** the proven "claimed-pass but capture shows fail → block" must still hold via the report (failed capture event → `disputed` claim).
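For the proof suite, the determinism guardrail can be written as a test oracle that states the expected gate outcome without deriving claim statuses itself (status derivation stays in Surface). The function name is hypothetical, and it assumes a blocking-impact check claim.

```typescript
// Expected-outcome oracle only; production status logic stays in Surface (consume, never fork).
type Claimed = "pass" | "fail";

function expectGateBlocks(claimed: Claimed, capturedExitCode: number | null): boolean {
  if (claimed === "fail") return true; // fail (or disputed) both block under B3
  // Claimed pass: only a captured non-zero exit contradicts it -> disputed -> block.
  return capturedExitCode !== null && capturedExitCode !== 0;
}

// [claimed, captured exit code (null = not captured), expected block]
const cases: Array<[Claimed, number | null, boolean]> = [
  ["pass", 1, true], // the false-completion catch
  ["pass", 0, false],
  ["pass", null, false],
  ["fail", 0, true],
];
for (const [claimed, exit, block] of cases) {
  if (expectGateBlocks(claimed, exit) !== block) throw new Error(`oracle mismatch for ${claimed}/${exit}`);
}
console.log("oracle cases consistent");
```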
|
|
52
|
+
|
|
53
|
+
## Key files
|
|
54
|
+
|
|
55
|
+
`src/cli/workflow-sidecar.ts` (`buildTrustBundle`), `scripts/hooks/stop-goal-fit.js`, `surface/src/types.ts` (Policy/Event/Claim shapes), `schemas/workflow-evidence.schema.json`, `evals/integration/test_workflow_sidecar_writer.sh`, `evals/acceptance/prove-capture-teeth.sh`, `packaging/conformance/fixtures/`, `docs/adr/0009`+`0010`.
|
|
@@ -27,7 +27,7 @@ This is the canonical developer-facing map for the Flow Agents repository. Use i
|
|
|
27
27
|
# canonical workflow bundle content
|
|
28
28
|
kits/ # Flow Kit catalog and bundled kit assets
|
|
29
29
|
schemas/ # JSON sidecar and provider schemas
|
|
30
|
-
packaging/ # bundle/export manifests and
|
|
30
|
+
packaging/ # bundle/export manifests and packaging rules
|
|
31
31
|
evals/ # eval harness, fixtures, static checks, integration checks
|
|
32
32
|
docs/ # durable docs and GitHub Pages source
|
|
33
33
|
integrations/ # optional external integration config
|
|
@@ -56,7 +56,7 @@ This is the canonical developer-facing map for the Flow Agents repository. Use i
|
|
|
56
56
|
| `evals/` | canonical eval source plus ignored results | Harness, cases, fixtures, static checks, integration checks. | `evals/results/*.json`, reports, and CI logs are generated output unless intentionally tracked fixtures. | Do not remove fixtures without reference proof; generated results can be local cleanup candidates. |
|
|
57
57
|
| `integrations/` | optional integration source | Integration config shipped with the repo. | Source; local run state belongs under ignored runtime roots. | Keep optional and adapter-driven. |
|
|
58
58
|
| `kits/` | canonical Flow Kit source | Kit Catalog and bundled Builder Kit assets. | Exported and validated by Flow Kit commands. | Preserve catalog paths and validation coverage. |
|
|
59
|
-
| `packaging/` | canonical packaging source | Manifest,
|
|
59
|
+
| `packaging/` | canonical packaging source | Manifest, export rules, and packaging docs. | Drives generated bundles under `dist/`. | Update before changing export shape. |
|
|
60
60
|
| `powers/` | canonical source | Optional MCP/tool capability bundles. | Exported where supported. | Keep activation guidance separate from credentials. |
|
|
61
61
|
| `prompts/` | canonical source | Saved prompt entry points. | Exported where supported. | Promote stable procedures into skills when needed. |
|
|
62
62
|
| `schemas/` | canonical source | JSON schemas for sidecars and provider/resource records. | Used by validators and workflow tooling. | Schema changes require artifact validation. |
|
package/docs/skills-map.md
CHANGED
|
@@ -14,6 +14,7 @@ For practical operator instructions and copy/paste prompts, see https://github.c
|
|
|
14
14
|
- `design-probe`: generic one-question-at-a-time probing interview; Builder Kit uses this step before planning when the build flow needs shared understanding or a pickup decision.
|
|
15
15
|
- `pickup-probe`: Builder Kit specialization of `design-probe` for selected work items; records scope, provider state, WIP/conflict scans, risks, decisions, unresolved questions, accepted gaps, and planning readiness.
|
|
16
16
|
- `plan-work` / `execute-plan` / `deliver`: Definition Of Done, execution orchestration, and local delivery closure.
|
|
17
|
+
- `continue-work`: advance a multi-slice work item to its next increment via a fresh-context handoff; restores the durable record (resume surface), derives the next undone slice from the issue plus merged PRs, and routes that slice **through** `pull-work` + `pickup-probe` (never around the gate) before handing off fresh per ADR 0013.
|
|
17
18
|
- `review-work`: report-only critique for quality, security triggers, architecture fit, and standards findings.
|
|
18
19
|
- `verify-work`: behavior evidence mapped to acceptance criteria and Goal Fit.
|
|
19
20
|
- `evidence-gate`: trust assessment for completed work: acceptance evidence, integrity checks, CI confidence, and next step.
|