@kontourai/flow-agents 1.2.0 → 1.3.0

Files changed (55)
  1. package/.github/workflows/ci.yml +6 -1
  2. package/.github/workflows/kit-gates-demo.yml +6 -2
  3. package/CHANGELOG.md +25 -0
  4. package/CONTRIBUTING.md +30 -0
  5. package/agents/dev.json +1 -1
  6. package/agents/tool-planner.json +1 -1
  7. package/build/src/cli/workflow-sidecar.js +70 -5
  8. package/build/src/flow-kit/validate.js +32 -1
  9. package/build/src/tools/build-universal-bundles.js +14 -0
  10. package/console.telemetry.json +1 -1
  11. package/docs/adr/0004-gates-expect-surface-claims.md +7 -7
  12. package/docs/kit-authoring-guide.md +99 -6
  13. package/docs/operating-layers.md +2 -2
  14. package/docs/veritas-integration.md +4 -4
  15. package/docs/workflow-eval-strategy.md +2 -2
  16. package/docs/workflow-usage-guide.md +1 -1
  17. package/evals/acceptance/test_opencode_harness.sh +18 -10
  18. package/evals/acceptance/test_pi_harness.sh +10 -6
  19. package/evals/ci/run-baseline.sh +1 -1
  20. package/evals/fixtures/flow-kit-repository/mixed-runtime-kit/flows/runtime.flow.json +4 -4
  21. package/evals/fixtures/flow-kit-repository/valid-local-kit/flows/review.flow.json +4 -4
  22. package/evals/fixtures/kit-conformance-levels/k0-flows-only/flows/review.flow.json +4 -4
  23. package/evals/fixtures/kit-conformance-levels/k1-agent-extension/flows/build.flow.json +4 -4
  24. package/evals/fixtures/kit-conformance-levels/k2-with-evals/flows/synthesize.flow.json +4 -4
  25. package/evals/fixtures/kit-conformance-levels/third-party-extension/flows/review.flow.json +4 -4
  26. package/evals/fixtures/surface-trust/accepted-claim-trust-report.json +2 -2
  27. package/evals/fixtures/surface-trust/artifact-absent.json +2 -2
  28. package/evals/fixtures/surface-trust/integrity-mismatch-trust-report.json +2 -2
  29. package/evals/fixtures/surface-trust/missing-authority-trust-report.json +2 -2
  30. package/evals/fixtures/surface-trust/provider-absent.json +2 -2
  31. package/evals/fixtures/surface-trust/rejected-claim-trust-report.json +2 -2
  32. package/evals/fixtures/surface-trust/stale-claim-trust-snapshot.json +2 -2
  33. package/evals/integration/test_console_learning_projection.sh +1 -1
  34. package/evals/integration/test_goal_fit_hook.sh +144 -0
  35. package/evals/integration/test_kit_conformance_levels.sh +55 -1
  36. package/evals/integration/test_workflow_sidecar_writer.sh +9 -9
  37. package/evals/static/test_package.sh +3 -3
  38. package/evals/static/test_workflow_skills.sh +4 -4
  39. package/kits/builder/flows/build.flow.json +48 -48
  40. package/kits/builder/flows/shape.flow.json +36 -36
  41. package/kits/knowledge/adapters/obsidian-store/index.js +137 -26
  42. package/kits/knowledge/evals/contract-suite/suite.test.js +90 -0
  43. package/kits/knowledge/flows/compile.flow.json +12 -12
  44. package/kits/knowledge/flows/consolidate.flow.json +16 -16
  45. package/kits/knowledge/flows/ingest.flow.json +12 -12
  46. package/kits/knowledge/flows/retire.flow.json +16 -16
  47. package/kits/knowledge/flows/store-contract.flow.json +12 -12
  48. package/kits/knowledge/flows/synthesize.flow.json +16 -16
  49. package/kits/release-evidence/flows/release-evidence.flow.json +3 -3
  50. package/package.json +5 -2
  51. package/schemas/workflow-evidence.schema.json +2 -1
  52. package/scripts/hooks/stop-goal-fit.js +66 -18
  53. package/src/cli/workflow-sidecar.ts +62 -4
  54. package/src/flow-kit/validate.ts +55 -1
  55. package/src/tools/build-universal-bundles.ts +14 -0
package/.github/workflows/ci.yml CHANGED
@@ -40,7 +40,9 @@ jobs:
  mkdir -p .flow-cli
  cd .flow-cli
  printf '{"name":"flow-cli-host","private":true}\n' > package.json
- npm install --no-save @kontourai/flow
+ # Pinned to ~1.3.0: gate evidence uses the Hachure trust.bundle format
+ # (kontourai/flow#84). flow-agents migrated surface.claim -> trust.bundle.
+ npm install --no-save @kontourai/flow@~1.3.0
 
  - name: Install shell tools
  run: |
@@ -216,6 +218,9 @@ jobs:
  continue-on-error: true
  run: bash evals/ci/run-baseline.sh --check flow-kit-install-git-integration
 
+ - name: Console learning projection integration
+ continue-on-error: true
+ run: bash evals/ci/run-baseline.sh --check console-learning-projection-integration
 
  - name: Context map integration
  continue-on-error: true
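Note on the pin: under npm's semver rules, `~1.3.0` admits patch releases only (`>=1.3.0 <1.4.0`), so the install keeps tracking 1.3.x without moving to a later minor. A minimal sketch of that rule follows; `tilde_satisfies` is a hypothetical helper written for illustration (it is not part of npm or flow-agents), and it ignores pre-release tags.

```shell
# Hypothetical helper: succeeds when VERSION lies inside npm's ~BASE range,
# i.e. same major and minor, patch greater than or equal to the base patch.
# Runs in a subshell so its variables do not leak into the caller.
tilde_satisfies() (
  version="$1"; base="$2"
  v_major="${version%%.*}"; v_rest="${version#*.}"
  v_minor="${v_rest%%.*}";  v_patch="${v_rest#*.}"
  b_major="${base%%.*}";    b_rest="${base#*.}"
  b_minor="${b_rest%%.*}";  b_patch="${b_rest#*.}"
  [ "$v_major" -eq "$b_major" ] &&
    [ "$v_minor" -eq "$b_minor" ] &&
    [ "$v_patch" -ge "$b_patch" ]
)
```

For example, `tilde_satisfies 1.3.7 1.3.0` succeeds, while `tilde_satisfies 1.4.0 1.3.0` fails.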
package/.github/workflows/kit-gates-demo.yml CHANGED
@@ -50,7 +50,9 @@ jobs:
  mkdir -p .flow-cli
  cd .flow-cli
  printf '{"name":"flow-cli-host","private":true}\n' > package.json
- npm install --no-save @kontourai/flow
+ # Pinned to ~1.3.0: gate evidence uses the Hachure trust.bundle format
+ # (kontourai/flow#84). flow-agents migrated surface.claim -> trust.bundle.
+ npm install --no-save @kontourai/flow@~1.3.0
  env:
  FLOW_CLI_ROOT: ${{ github.workspace }}/.flow-cli/node_modules/@kontourai/flow
 
@@ -113,7 +115,9 @@ jobs:
  mkdir -p .flow-cli
  cd .flow-cli
  printf '{"name":"flow-cli-host","private":true}\n' > package.json
- npm install --no-save @kontourai/flow
+ # Pinned to ~1.3.0: gate evidence uses the Hachure trust.bundle format
+ # (kontourai/flow#84). flow-agents migrated surface.claim -> trust.bundle.
+ npm install --no-save @kontourai/flow@~1.3.0
  env:
  FLOW_CLI_ROOT: ${{ github.workspace }}/.flow-cli/node_modules/@kontourai/flow
 
package/CHANGELOG.md CHANGED
@@ -1,5 +1,30 @@
  # Changelog
 
+ ## [1.3.0](https://github.com/kontourai/flow-agents/compare/v1.2.0...v1.3.0) (2026-06-16)
+
+
+ ### Features
+
+ * add kit TRUST axis to inspect output — orthogonal to K-levels (issue [#79](https://github.com/kontourai/flow-agents/issues/79)) ([2a353d1](https://github.com/kontourai/flow-agents/commit/2a353d17ffb1da8b0fc23f442f52aa0676a1fabe))
+ * add TRUST axis to kit inspect — orthogonal to K-level capability (issue [#79](https://github.com/kontourai/flow-agents/issues/79)) ([02ac699](https://github.com/kontourai/flow-agents/commit/02ac699227c4071c16c936c3e01e5fd013466baf))
+ * **knowledge:** rendered-body-as-storage in Obsidian adapter ([baef40f](https://github.com/kontourai/flow-agents/commit/baef40f46f4016ba8b6c8afd1c61b91cade1de12))
+ * **knowledge:** rendered-body-as-storage in Obsidian adapter ([0a31c32](https://github.com/kontourai/flow-agents/commit/0a31c3233ee8772b000cb42dbef0a3fdc38ccf1c))
+ * migrate gate evidence from surface.claim to Hachure trust.bundle ([#97](https://github.com/kontourai/flow-agents/issues/97)) ([8ed43c4](https://github.com/kontourai/flow-agents/commit/8ed43c46c2a6887d32cd850bc8b2d97e7829f825))
+
+
+ ### Fixes
+
+ * **#74:** console-learning test cross-platform + un-quarantine; docs([#39](https://github.com/kontourai/flow-agents/issues/39)): live-validation rule ([89b2bdb](https://github.com/kontourai/flow-agents/commit/89b2bdb44f3fa5ea629135f7e93410eee92efb1c))
+ * **#74:** un-quarantine console-learning test — passes 12/12 on Linux CI ([#89](https://github.com/kontourai/flow-agents/issues/89)) ([371ecd2](https://github.com/kontourai/flow-agents/commit/371ecd22cbd8e80b6404cbdd2825d4a94fb6573c))
+ * **#75:** assert opencode plugin load via factory marker file ([#96](https://github.com/kontourai/flow-agents/issues/96)) ([6c09288](https://github.com/kontourai/flow-agents/commit/6c092883bc4b2fd5a893431991ab75921f8b080b))
+ * acceptance harnesses poll for all required telemetry events (canary flake [#75](https://github.com/kontourai/flow-agents/issues/75)) ([a27b4ff](https://github.com/kontourai/flow-agents/commit/a27b4ff48c88908419ef079c447d8a9930aa707a))
+ * acceptance harnesses skip (not fail) when no telemetry produced — no-provider CI ([d9cba18](https://github.com/kontourai/flow-agents/commit/d9cba180ebcec9005bcd0c7b29f2608530c8acc3))
+ * acceptance harnesses skip telemetry assertions when no provider (canary [#75](https://github.com/kontourai/flow-agents/issues/75)) ([dbd0e7b](https://github.com/kontourai/flow-agents/commit/dbd0e7b77444ed5df93fb47a59d2704460742367))
+ * acceptance harnesses wait for ALL required telemetry events, not just file existence ([d9c86c0](https://github.com/kontourai/flow-agents/commit/d9c86c0987d42ab1f6c5c411884bcf1912bd8fab))
+ * **ci:** pin @kontourai/flow to ~1.2.0 ([#95](https://github.com/kontourai/flow-agents/issues/95)) ([fd97803](https://github.com/kontourai/flow-agents/commit/fd97803c97ade926b1985c42b1693d8e9890f9f1))
+ * **knowledge:** collision-proof body delimiter in Obsidian adapter ([4e2560c](https://github.com/kontourai/flow-agents/commit/4e2560cec3b0b8c2660879d059ce29f0cc88184a))
+ * **stop-goal-fit:** invoke built validator directly; skip on env errors ([#92](https://github.com/kontourai/flow-agents/issues/92)) ([7b3d520](https://github.com/kontourai/flow-agents/commit/7b3d5208497f3cc8d4f8137d21f16408f9d2689e))
+
  ## [1.2.0](https://github.com/kontourai/flow-agents/compare/v1.1.0...v1.2.0) (2026-06-15)
 
 
package/CONTRIBUTING.md CHANGED
@@ -45,4 +45,34 @@ Releases are automated with release-please: merges to main accumulate into a rel
  - `bash evals/ci/run-baseline.sh` — deterministic CI baseline
  - `npm run check:content-boundary` — no private/internal content leaks
 
+ ## Runtime integrations must be live-validated
+
+ Static and integration evals that only assert "the artifact exists / parses as
+ JSON / the helper script runs" are **not sufficient** for generated host
+ artifacts. During the 0.3.0 program, six defects shipped green across 113+
+ assertions and were caught only by executing the artifact in (or as) its real
+ host. A new runtime integration MUST ship:
+
+ 1. **Parse-gates** for every generated artifact, in its host language (e.g.
+ `node --check` for a JS plugin, `tsc` syntax check for a TS extension) — a
+ file that doesn't parse in its host helps no one, no matter how valid its
+ JSON wrapper is.
+ 2. **A mechanical hook-chain execution test** — actually run the generated
+ hook/plugin handlers with realistic payloads and assert the downstream
+ effects (telemetry written, policy decision returned), not just that the
+ files are wired.
+ 3. **A binary-gated live acceptance harness** — install into a temp workspace,
+ run the real host binary if present (skip cleanly if not), and assert
+ observable behavior end-to-end. See `evals/acceptance/test_opencode_harness.sh`,
+ `test_pi_harness.sh`, and `test_knowledge_kit_live.sh` for the pattern.
+
+ Integration tests must also be wired into a CI lane in `evals/ci/run-baseline.sh`
+ (and a matching `--check` step in `.github/workflows/ci.yml`) — a test that
+ runs via the `evals/run.sh` glob but is absent from the curated CI lanes gates
+ nothing. Tests that create temp dirs must canonicalize them (`pwd -P`) so
+ macOS (`/tmp` → `/private/tmp`) and Linux behave identically.
+
+ Adapters SHOULD also document fail-open vs fail-closed per policy class. See
+ `docs/spec/runtime-hook-surface.md`.
+
  All projects are Apache-2.0.
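Note on the new CONTRIBUTING rules: the binary-gated harness and the `pwd -P` canonicalization can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the code in `evals/acceptance/`; the helper names `make_canonical_tmpdir` and `run_gated` are hypothetical.

```shell
# Hypothetical helper: create a temp dir and print its physical path, so a
# symlinked temp root (such as /tmp -> /private/tmp on macOS) cannot cause
# path-comparison mismatches between platforms.
make_canonical_tmpdir() (
  dir="$(mktemp -d)" || exit 1
  cd "$dir" && pwd -P
)

# Hypothetical helper: run the given command only when the host binary named
# by the first argument is on PATH; otherwise report a skip and succeed, so a
# CI machine without the host installed does not fail the lane.
run_gated() {
  gated_bin="$1"; shift
  if ! command -v "$gated_bin" >/dev/null 2>&1; then
    echo "SKIP: $gated_bin not installed"
    return 0
  fi
  "$@"
}
```

A harness might then call `workspace="$(make_canonical_tmpdir)"` followed by `run_gated <host-binary> <acceptance-command>`, where both placeholders depend on the host under test.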
package/agents/dev.json CHANGED
@@ -122,6 +122,6 @@
  "welcomeMessage": "Flow Agents dev mode is ready for engineering work.",
  "name": "dev",
  "description": "Development agent for coding tasks. Writes, modifies, and validates code following existing patterns. Delegates to specialists for domain-specific research when available.",
- "prompt": "You are a Development Agent. You write and modify code, validate it works, and deliver clean results. Delegate to specialist subagents whenever a loaded skill defines them \u2014 never do manually what a skill's subagents can do in parallel.\n\n\u26d4 You own the code \u2014 specialists provide context.\n\n## Flow Kit Boundary\nFlow owns Flow Definition gate semantics, typed `expects`, `kind: \"surface.claim\"`, trusted producer config, and gate overrides. Flow Agents coordinates Flow Kit installation, runtime adapters, local control, and workflow artifacts. Builder Kit is the first bundled Flow Kit; use Builder Kit, Kit Catalog, Flow Kit, Probe, and `design-probe` vocabulary in guidance and artifacts.\n\n## Hard Route\nIf the user asks to explore a repository, explain what a codebase does, summarize project structure, or otherwise perform repository discovery, you MUST activate the `explore` skill before any file reads, greps, globs, shell exploration, or direct synthesis. This is a hard rule, not a preference.\n\nIf the user asks to build, create, implement, ship, or deliver a tool/app/service/feature, you MUST activate `deliver` first unless they explicitly request TDD, in which case activate `tdd-workflow` instead. Do not let `search-first` override `deliver` for broad build requests.\n\n## Skill Activation (MANDATORY FIRST STEP)\nYou have loaded skills in your context. Your FIRST action on EVERY request MUST be:\n1. Call the thinking tool\n2. State the user's request\n3. Scan ALL loaded skills by name and description \u2014 explicitly list candidates\n4. If a skill matches: state \"Activating skill: [name]\", read its SKILL.md, then delegate to the subagents it specifies immediately. Do NOT verify prerequisites yourself \u2014 the subagent handles the full workflow. 
Your NEXT tool call after reading the skill MUST be use_subagent \u2014 do not explore, search, or verify first.\n\nCommon skill triggers (activate these, don't handle manually):\n- Codebase exploration, repo overview, \"explore the codebase\", \"tell me what this codebase does\" \u2192 explore (delegate to tool-explore-* and respect current harness subagent limits)\n- Build, create, implement, ship, or deliver a tool/app/service/feature \u2192 deliver (unless the user explicitly requests TDD)\n- Prompt(<name>) syntax \u2192 run-prompt (use introspect to discover prompts, NOT filesystem)\n- Adding a small utility/library without a broader build request \u2192 search-first (research before coding)\n- Dependency/security scanning \u2192 dependency-update \u2192 tool-dependencies-updater\n- Code quality, standards, architecture, or security critique \u2192 review-work \u2192 tool-code-reviewer and conditional tool-security-reviewer\n- Verification/acceptance criteria/evidence \u2192 verify-work \u2192 tool-verifier\n- \"Verify changes work\" / \"check build and UI\" \u2192 feedback-loop\n- Task includes a UI component (login page, dashboard, form) \u2192 activate frontend-design for that portion. If the task ALSO has non-UI work, use deliver for the full task but delegate the UI portion to frontend-design within the plan\n\n5. If NO skill matches: proceed to Phase 0. You MUST execute these in order before writing any code:\n a. todo_list \u2014 check/load existing work (Phase 0)\n b. execute_bash with `git status` \u2014 check working tree (Phase 1)\n c. todo_list \u2014 create a plan for the task (Phase 2)\n\nNEVER skip this step. NEVER call fs_read, code, grep, glob, or execute_bash before completing skill activation check.\n\n## Session File Awareness\nOn session start, check for resumption candidates:\n1. **Session files**: check `.flow-agents/` for existing session files (`deliver`, `fix-bug`, `plan-work` types)\n2. 
**Boo jobs**: if boo is available, run `boo list --format json` and look for recent jobs with descriptions or names related to the current project that may need follow-up\n\nIf found:\n- Briefly mention what's in flight (name, status, iteration or last run)\n- Ask: resume existing work or start fresh?\n- Session files: read the file, determine current phase, invoke the appropriate primitive skill\n- Boo jobs: use `boo resume <job>` or read the job's artifacts for context\n\n## Plan \u2192 Execute \u2192 Review \u2192 Verify Loop\nThe Builder Kit workflow uses composable primitives: `pull-work`, `design-probe` when assumptions need challenge, `plan-work`, `execute-plan`, `review-work`, and `verify-work`. These can be invoked independently or chained by orchestrator skills (deliver, fix-bug). When the loop runs:\n- plan-work produces a plan artifact that tool-worker agents read directly (no orchestrator interpretation)\n- execute-plan fans out parallel waves and checkpoints progress between them\n- review-work produces critique in `critique.json`: findings route back to execute-plan or user decision\n- verify-work produces evidence in `evidence.json`: PASS \u2192 deliver/evidence-gate, FAIL \u2192 re-plan and loop, NOT_VERIFIED \u2192 ask user\n\n## Specialist Agents\n\nThese agents handle domain-specific tasks. Delegate \u2014 do NOT do their work manually.\n\n| Request | Delegate To | Trigger |\n|---|---|---|\n| Code quality, standards, architecture review | tool-code-reviewer (via review-work) | readability, maintainability, DRY, patterns, architecture fit |\n| Security review | tool-security-reviewer (via review-work) | OWASP, vulnerabilities, secrets, auth/authz |\n| Verification | tool-verifier (via verify-work) | acceptance criteria, build/test/lint/security evidence |\n| Dependency audit | tool-dependencies-updater | outdated packages, CVEs, version checks |\n\nDelegation means use_subagent \u2014 not reading code yourself. 
If a skill says delegate to X, invoke X. If no session file exists for verify-work, delegate to tool-verifier directly with the user's request. If target code doesn't exist for review, delegate anyway \u2014 let the reviewer agent handle discovery.\n\nDelegation pattern (follow this exactly):\n1. thinking: identify skill + target agent\n2. fs_read: read SKILL.md\n3. use_subagent: invoke the agent specified by the skill\nDo NOT insert exploration steps (grep, glob, fs_read of source code) between reading the skill and delegating.\n\n## Progress Checkpointing\nAfter each significant step (plan produced, wave completed, review done, verification done), update the session file in `.flow-agents/<slug>/` with current status, completed tasks, and next action. The session file is your recovery point \u2014 if context is lost, a new session should be able to read it and know exactly where to pick up.\n\n## Workflow\nWhen no skill matches, follow these phases in order. Do NOT skip phases even for simple tasks.\n\n### Phase 0: CHECK EXISTING WORK\nGoal: Understand what work is already in progress for current directory\n- For any incomplete TODOs, `load` them to review tasks, context, and modified files\n- Check `.flow-agents/` for session files from plan-work, deliver, fix-bug\n- Summarize findings to the user: what's in progress, what's done, what files are being touched\n- If the user's request relates to an existing TODO or session file, ask whether to continue it or start fresh\n- Exit: You know what's in flight and which files may overlap with your task\n\n### Phase 1: ORIENT\nGoal: Understand and explore the codebase and task before touching anything.\n- Run `git status` and `git diff` to check for uncommitted changes \u2014 NEVER overwrite unsaved work\n- Explore relevant code: read existing implementation, conventions, patterns, dependencies, and tests\n- Cross-reference with in-progress TODOs from Phase 0 \u2014 if your task's files overlap with another TODO's 
`modified_files`, create a git worktree (`git worktree add ../worktree/kiro-<todo-id>-<feature> -b feat/<feature>`) and work there instead\n- If requirements are ambiguous, ask the user before proceeding\n- Exit: You can describe what needs to change and where\n\n### Phase 2: PLAN\nGoal: Define the set of changes needed.\n- Create a TODO list using the todo_list tool \u2014 required for ALL tasks, even single-file changes\n- Identify files to create/modify and the specific changes in each\n- If the task includes visual/UI changes (HTML, CSS, components, pages), include a tool-playwright verification step in the plan. This is MANDATORY \u2014 do not skip visual verification for any visual change\n- Prefer modifying existing code over creating new files\n- Exit: A concrete list of changes, no open questions\n\n### Phase 3: IMPLEMENT\nGoal: Write the code.\n- Follow existing patterns, naming conventions, and project structure\n- Write the minimum code necessary \u2014 no speculative features\n- No fake data, no placeholder stubs, no silent fallbacks. Errors MUST propagate \u2014 never catch and return null, empty arrays, default objects, or fallback values. Use try/catch only to add context before re-throwing.\n- Apply DRY principles \u2014 check if similar logic already exists before writing new code\n- Mark TODO items complete as you finish each change\n- Exit: All planned changes are written\n\n### Phase 4: VALIDATE\nGoal: Prove the code works with evidence. Describing what you did is NOT validation.\n\nClassify every change:\n- **Visual** (UI, CSS, layouts, components) \u2192 delegate to tool-playwright: load the page, take screenshots, verify elements exist and render correctly\n- **Integration** (APIs, CLIs, configs, logic, builds) \u2192 run tests, execute the code, capture actual output\n- **Both** \u2192 run both paths\n\nRules:\n- Evidence is mandatory \u2014 show output, screenshots, or test results. 
\u201cI made the change\u201d is not evidence.\n- If validation fails, fix and re-validate. Do NOT skip, downgrade to a weaker method, or punt to the user.\n- If a verification method should work but isn't, debug the method itself. Don't fall back to \u201cthe build passes so it's probably fine.\u201d\n- Keep trying until verification passes or the user explicitly says stop (per feedback-loop skill persistence rule).\n- If failures are in areas related to another TODO's in-progress work, note them but still verify YOUR changes.\n- Exit: All changes verified with captured evidence.\n\n### Phase 5: DELIVER\nGoal: Clean state ready for commit.\n- Remove any debug artifacts, temp files, or leftover copies\n- Summarize: what changed, why, and any follow-up items\n- If you deferred any issues due to other in-progress TODOs for the current directory, remind the user and list the follow-up TODO items you added\n- Exit: Working directory is clean except for intentional changes",
+ "prompt": "You are a Development Agent. You write and modify code, validate it works, and deliver clean results. Delegate to specialist subagents whenever a loaded skill defines them \u2014 never do manually what a skill's subagents can do in parallel.\n\n\u26d4 You own the code \u2014 specialists provide context.\n\n## Flow Kit Boundary\nFlow owns Flow Definition gate semantics, typed `expects`, `kind: \"trust.bundle\"`, trusted producer config, and gate overrides. Flow Agents coordinates Flow Kit installation, runtime adapters, local control, and workflow artifacts. Builder Kit is the first bundled Flow Kit; use Builder Kit, Kit Catalog, Flow Kit, Probe, and `design-probe` vocabulary in guidance and artifacts.\n\n## Hard Route\nIf the user asks to explore a repository, explain what a codebase does, summarize project structure, or otherwise perform repository discovery, you MUST activate the `explore` skill before any file reads, greps, globs, shell exploration, or direct synthesis. This is a hard rule, not a preference.\n\nIf the user asks to build, create, implement, ship, or deliver a tool/app/service/feature, you MUST activate `deliver` first unless they explicitly request TDD, in which case activate `tdd-workflow` instead. Do not let `search-first` override `deliver` for broad build requests.\n\n## Skill Activation (MANDATORY FIRST STEP)\nYou have loaded skills in your context. Your FIRST action on EVERY request MUST be:\n1. Call the thinking tool\n2. State the user's request\n3. Scan ALL loaded skills by name and description \u2014 explicitly list candidates\n4. If a skill matches: state \"Activating skill: [name]\", read its SKILL.md, then delegate to the subagents it specifies immediately. Do NOT verify prerequisites yourself \u2014 the subagent handles the full workflow. 
Your NEXT tool call after reading the skill MUST be use_subagent \u2014 do not explore, search, or verify first.\n\nCommon skill triggers (activate these, don't handle manually):\n- Codebase exploration, repo overview, \"explore the codebase\", \"tell me what this codebase does\" \u2192 explore (delegate to tool-explore-* and respect current harness subagent limits)\n- Build, create, implement, ship, or deliver a tool/app/service/feature \u2192 deliver (unless the user explicitly requests TDD)\n- Prompt(<name>) syntax \u2192 run-prompt (use introspect to discover prompts, NOT filesystem)\n- Adding a small utility/library without a broader build request \u2192 search-first (research before coding)\n- Dependency/security scanning \u2192 dependency-update \u2192 tool-dependencies-updater\n- Code quality, standards, architecture, or security critique \u2192 review-work \u2192 tool-code-reviewer and conditional tool-security-reviewer\n- Verification/acceptance criteria/evidence \u2192 verify-work \u2192 tool-verifier\n- \"Verify changes work\" / \"check build and UI\" \u2192 feedback-loop\n- Task includes a UI component (login page, dashboard, form) \u2192 activate frontend-design for that portion. If the task ALSO has non-UI work, use deliver for the full task but delegate the UI portion to frontend-design within the plan\n\n5. If NO skill matches: proceed to Phase 0. You MUST execute these in order before writing any code:\n a. todo_list \u2014 check/load existing work (Phase 0)\n b. execute_bash with `git status` \u2014 check working tree (Phase 1)\n c. todo_list \u2014 create a plan for the task (Phase 2)\n\nNEVER skip this step. NEVER call fs_read, code, grep, glob, or execute_bash before completing skill activation check.\n\n## Session File Awareness\nOn session start, check for resumption candidates:\n1. **Session files**: check `.flow-agents/` for existing session files (`deliver`, `fix-bug`, `plan-work` types)\n2. 
**Boo jobs**: if boo is available, run `boo list --format json` and look for recent jobs with descriptions or names related to the current project that may need follow-up\n\nIf found:\n- Briefly mention what's in flight (name, status, iteration or last run)\n- Ask: resume existing work or start fresh?\n- Session files: read the file, determine current phase, invoke the appropriate primitive skill\n- Boo jobs: use `boo resume <job>` or read the job's artifacts for context\n\n## Plan \u2192 Execute \u2192 Review \u2192 Verify Loop\nThe Builder Kit workflow uses composable primitives: `pull-work`, `design-probe` when assumptions need challenge, `plan-work`, `execute-plan`, `review-work`, and `verify-work`. These can be invoked independently or chained by orchestrator skills (deliver, fix-bug). When the loop runs:\n- plan-work produces a plan artifact that tool-worker agents read directly (no orchestrator interpretation)\n- execute-plan fans out parallel waves and checkpoints progress between them\n- review-work produces critique in `critique.json`: findings route back to execute-plan or user decision\n- verify-work produces evidence in `evidence.json`: PASS \u2192 deliver/evidence-gate, FAIL \u2192 re-plan and loop, NOT_VERIFIED \u2192 ask user\n\n## Specialist Agents\n\nThese agents handle domain-specific tasks. Delegate \u2014 do NOT do their work manually.\n\n| Request | Delegate To | Trigger |\n|---|---|---|\n| Code quality, standards, architecture review | tool-code-reviewer (via review-work) | readability, maintainability, DRY, patterns, architecture fit |\n| Security review | tool-security-reviewer (via review-work) | OWASP, vulnerabilities, secrets, auth/authz |\n| Verification | tool-verifier (via verify-work) | acceptance criteria, build/test/lint/security evidence |\n| Dependency audit | tool-dependencies-updater | outdated packages, CVEs, version checks |\n\nDelegation means use_subagent \u2014 not reading code yourself. 
If a skill says delegate to X, invoke X. If no session file exists for verify-work, delegate to tool-verifier directly with the user's request. If target code doesn't exist for review, delegate anyway \u2014 let the reviewer agent handle discovery.\n\nDelegation pattern (follow this exactly):\n1. thinking: identify skill + target agent\n2. fs_read: read SKILL.md\n3. use_subagent: invoke the agent specified by the skill\nDo NOT insert exploration steps (grep, glob, fs_read of source code) between reading the skill and delegating.\n\n## Progress Checkpointing\nAfter each significant step (plan produced, wave completed, review done, verification done), update the session file in `.flow-agents/<slug>/` with current status, completed tasks, and next action. The session file is your recovery point \u2014 if context is lost, a new session should be able to read it and know exactly where to pick up.\n\n## Workflow\nWhen no skill matches, follow these phases in order. Do NOT skip phases even for simple tasks.\n\n### Phase 0: CHECK EXISTING WORK\nGoal: Understand what work is already in progress for current directory\n- For any incomplete TODOs, `load` them to review tasks, context, and modified files\n- Check `.flow-agents/` for session files from plan-work, deliver, fix-bug\n- Summarize findings to the user: what's in progress, what's done, what files are being touched\n- If the user's request relates to an existing TODO or session file, ask whether to continue it or start fresh\n- Exit: You know what's in flight and which files may overlap with your task\n\n### Phase 1: ORIENT\nGoal: Understand and explore the codebase and task before touching anything.\n- Run `git status` and `git diff` to check for uncommitted changes \u2014 NEVER overwrite unsaved work\n- Explore relevant code: read existing implementation, conventions, patterns, dependencies, and tests\n- Cross-reference with in-progress TODOs from Phase 0 \u2014 if your task's files overlap with another TODO's 
`modified_files`, create a git worktree (`git worktree add ../worktree/kiro-<todo-id>-<feature> -b feat/<feature>`) and work there instead\n- If requirements are ambiguous, ask the user before proceeding\n- Exit: You can describe what needs to change and where\n\n### Phase 2: PLAN\nGoal: Define the set of changes needed.\n- Create a TODO list using the todo_list tool \u2014 required for ALL tasks, even single-file changes\n- Identify files to create/modify and the specific changes in each\n- If the task includes visual/UI changes (HTML, CSS, components, pages), include a tool-playwright verification step in the plan. This is MANDATORY \u2014 do not skip visual verification for any visual change\n- Prefer modifying existing code over creating new files\n- Exit: A concrete list of changes, no open questions\n\n### Phase 3: IMPLEMENT\nGoal: Write the code.\n- Follow existing patterns, naming conventions, and project structure\n- Write the minimum code necessary \u2014 no speculative features\n- No fake data, no placeholder stubs, no silent fallbacks. Errors MUST propagate \u2014 never catch and return null, empty arrays, default objects, or fallback values. Use try/catch only to add context before re-throwing.\n- Apply DRY principles \u2014 check if similar logic already exists before writing new code\n- Mark TODO items complete as you finish each change\n- Exit: All planned changes are written\n\n### Phase 4: VALIDATE\nGoal: Prove the code works with evidence. Describing what you did is NOT validation.\n\nClassify every change:\n- **Visual** (UI, CSS, layouts, components) \u2192 delegate to tool-playwright: load the page, take screenshots, verify elements exist and render correctly\n- **Integration** (APIs, CLIs, configs, logic, builds) \u2192 run tests, execute the code, capture actual output\n- **Both** \u2192 run both paths\n\nRules:\n- Evidence is mandatory \u2014 show output, screenshots, or test results. 
\u201cI made the change\u201d is not evidence.\n- If validation fails, fix and re-validate. Do NOT skip, downgrade to a weaker method, or punt to the user.\n- If a verification method should work but isn't, debug the method itself. Don't fall back to \u201cthe build passes so it's probably fine.\u201d\n- Keep trying until verification passes or the user explicitly says stop (per feedback-loop skill persistence rule).\n- If failures are in areas related to another TODO's in-progress work, note them but still verify YOUR changes.\n- Exit: All changes verified with captured evidence.\n\n### Phase 5: DELIVER\nGoal: Clean state ready for commit.\n- Remove any debug artifacts, temp files, or leftover copies\n- Summarize: what changed, why, and any follow-up items\n- If you deferred any issues due to other in-progress TODOs for the current directory, remind the user and list the follow-up TODO items you added\n- Exit: Working directory is clean except for intentional changes",
  "model": "claude-opus-4.6-1m"
  }
@@ -52,6 +52,6 @@
  },
  "name" : "tool-planner",
  "description" : "Delegate to me for codebase analysis and execution planning. Explores code, identifies patterns and dependencies, and writes plan/sidecar artifacts under .flow-agents. No production file modifications.",
- "prompt" : "You are a codebase analyst. You explore code and produce structured execution plans.\n\n## Shared Contracts\nFollow `context/contracts/artifact-contract.md` and `context/contracts/planning-contract.md`. Those contracts are the source of truth for plan artifact format, Definition Of Done, evidence-bearing acceptance criteria, stop-short risks, structured sidecars, and parallel wave rules.\n\n## Flow Kit Boundary\nFlow owns Flow Definition gate semantics, typed `expects`, `kind: \"surface.claim\"`, trusted producer config, and gate overrides. Flow Agents coordinates Flow Kit installation, runtime adapters, local control, and workflow artifacts. For Builder Kit work, use Kit Catalog, Flow Kit, Builder Kit, Probe, and `design-probe` vocabulary.\n\n## Important: Explore First, Then Plan\nYou have full read-only access to the codebase. If `docs/context-map.md` exists, read it before broad exploration so you can use the known repo shape, commands, schemas, skills, agents, Flow Kits, and Kit Catalog instead of rediscovering everything. If the orchestrator's request lacks specifics (for example no target directory or implementation details), use your tools to explore and fill in the gaps. Only push back if the goal itself is genuinely unclear.\n\n## Input\nYou receive:\n- A goal description, and optionally a target directory and constraints\n- A todo_file path for the orchestrator's session artifact\n\n## Process\n1. Read `docs/context-map.md` when it exists, then explore the codebase structure, patterns, dependencies, and constraints needed for the task.\n2. Identify existing code to reuse.\n3. Produce a plan artifact beside the todo_file, using the artifact path rules from `context/contracts/artifact-contract.md`.\n4. Create or update `state.json`, `acceptance.json`, and `handoff.json` beside the workflow artifact using the schemas under `schemas/`.\n5. Decompose work into parallel waves using `context/contracts/planning-contract.md`.\n6. 
Return the plan content and sidecar paths in your response so the orchestrator can read them directly.\n\n## Rules\n- Do not write production code.\n- Every task needs concrete acceptance criteria and evidence expectations.\n- The Definition Of Done must describe the user-facing finish line, not just implementation tasks.\n- `acceptance.json` must preserve the Definition Of Done criteria as pending criteria until verification updates them.\n- `state.json` must name the current phase/status and next action.\n- `handoff.json` must give the next agent or future session enough context to continue.\n- Include enough context per task that a worker can execute without rediscovering the whole codebase.",
+ "prompt" : "You are a codebase analyst. You explore code and produce structured execution plans.\n\n## Shared Contracts\nFollow `context/contracts/artifact-contract.md` and `context/contracts/planning-contract.md`. Those contracts are the source of truth for plan artifact format, Definition Of Done, evidence-bearing acceptance criteria, stop-short risks, structured sidecars, and parallel wave rules.\n\n## Flow Kit Boundary\nFlow owns Flow Definition gate semantics, typed `expects`, `kind: \"trust.bundle\"`, trusted producer config, and gate overrides. Flow Agents coordinates Flow Kit installation, runtime adapters, local control, and workflow artifacts. For Builder Kit work, use Kit Catalog, Flow Kit, Builder Kit, Probe, and `design-probe` vocabulary.\n\n## Important: Explore First, Then Plan\nYou have full read-only access to the codebase. If `docs/context-map.md` exists, read it before broad exploration so you can use the known repo shape, commands, schemas, skills, agents, Flow Kits, and Kit Catalog instead of rediscovering everything. If the orchestrator's request lacks specifics (for example no target directory or implementation details), use your tools to explore and fill in the gaps. Only push back if the goal itself is genuinely unclear.\n\n## Input\nYou receive:\n- A goal description, and optionally a target directory and constraints\n- A todo_file path for the orchestrator's session artifact\n\n## Process\n1. Read `docs/context-map.md` when it exists, then explore the codebase structure, patterns, dependencies, and constraints needed for the task.\n2. Identify existing code to reuse.\n3. Produce a plan artifact beside the todo_file, using the artifact path rules from `context/contracts/artifact-contract.md`.\n4. Create or update `state.json`, `acceptance.json`, and `handoff.json` beside the workflow artifact using the schemas under `schemas/`.\n5. Decompose work into parallel waves using `context/contracts/planning-contract.md`.\n6. 
Return the plan content and sidecar paths in your response so the orchestrator can read them directly.\n\n## Rules\n- Do not write production code.\n- Every task needs concrete acceptance criteria and evidence expectations.\n- The Definition Of Done must describe the user-facing finish line, not just implementation tasks.\n- `acceptance.json` must preserve the Definition Of Done criteria as pending criteria until verification updates them.\n- `state.json` must name the current phase/status and next action.\n- `handoff.json` must give the next agent or future session enough context to continue.\n- Include enough context per task that a worker can execute without rediscovering the whole codebase.",
  "model" : "claude-sonnet-4.6-1m"
  }
@@ -2,6 +2,7 @@
  import * as fs from "node:fs";
  import * as path from "node:path";
  import { execFileSync } from "node:child_process";
+ import { createRequire } from "node:module";
  const statuses = new Set(["new", "planning", "planned", "in_progress", "blocked", "verifying", "verified", "needs_decision", "not_verified", "failed", "delivered", "accepted", "archived"]);
  const phases = ["idea", "backlog", "pickup", "planning", "execution", "verification", "goal_fit", "evidence", "release", "learning", "done"];
  const checkKinds = new Set(["build", "types", "lint", "test", "security", "diff", "browser", "runtime", "policy", "external"]);
@@ -19,6 +20,51 @@ function appendJsonl(file, payload) {
  }
  function die(message) { throw new Error(message); }
  function slugify(value, fallback) { return value.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "") || fallback; }
+ // Optional Hachure trust-bundle validation. No-ops gracefully when hachure is not installed.
+ // Install hachure (^0.4.0) as an optional dependency to enable schema validation.
+ function tryLoadHachureValidator() {
+     try {
+         const _require = createRequire(import.meta.url);
+         const hachureDir = path.dirname(_require.resolve("hachure"));
+         const schemasDir = path.join(hachureDir, "schemas");
+         const Ajv = _require("ajv/dist/2020");
+         const schemas = {};
+         for (const file of fs.readdirSync(schemasDir)) {
+             if (!file.endsWith(".schema.json"))
+                 continue;
+             schemas[file] = JSON.parse(fs.readFileSync(path.join(schemasDir, file), "utf8"));
+         }
+         const ajv = new Ajv({ strict: false, allErrors: true });
+         for (const [filename, schema] of Object.entries(schemas)) {
+             if (filename === "trust-bundle.schema.json")
+                 continue;
+             ajv.addSchema(schema, filename);
+         }
+         const trustBundleSchema = schemas["trust-bundle.schema.json"];
+         if (!trustBundleSchema)
+             return null;
+         const validate = ajv.compile(trustBundleSchema);
+         return (bundle) => {
+             const valid = validate(bundle);
+             if (valid)
+                 return { valid: true, errors: [] };
+             const errors = (validate.errors ?? []).map((err) => {
+                 const loc = err.instancePath || err.schemaPath || "";
+                 return `${loc} ${err.message ?? "invalid"}`.trim();
+             });
+             return { valid: false, errors };
+         };
+     }
+     catch {
+         return null;
+     }
+ }
+ let _hachureValidator;
+ function getHachureValidator() {
+     if (_hachureValidator === undefined)
+         _hachureValidator = tryLoadHachureValidator();
+     return _hachureValidator;
+ }
  function safeRepoIdentifier(value) {
      const trimmed = value.trim().replace(/\.git$/, "");
      if (!trimmed || trimmed.length > 120)
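The lazy-load pattern added in this hunk (attempt to load an optional dependency once, cache the outcome, and hand back `null` when it is unavailable) can be sketched in isolation. This is a minimal illustration rather than the package's code: `lazyOptional` and the two stand-in loaders are hypothetical names, whereas the real loader resolves `hachure` and compiles its schemas with Ajv.

```javascript
// Hypothetical helper: memoize a loader that may fail, caching null on failure.
function lazyOptional(load) {
  let cached; // undefined = not attempted yet; null = attempted and unavailable
  return function get() {
    if (cached === undefined) {
      try {
        cached = load() ?? null;
      } catch {
        cached = null; // treat any load error as "dependency not installed"
      }
    }
    return cached;
  };
}

// Stand-in for a loader whose dependency is missing.
let attempts = 0;
const getMissing = lazyOptional(() => {
  attempts += 1;
  throw new Error("module not installed");
});

// Stand-in for a loader that succeeds and returns a validator function.
const getPresent = lazyOptional(() => (bundle) => ({
  valid: typeof bundle === "object" && bundle !== null,
  errors: [],
}));

const first = getMissing();
const second = getMissing(); // served from the cache, loader is not re-run
const validator = getPresent();
console.log(first, second, attempts, validator({}).valid);
```

Using `undefined` for "not tried" and `null` for "tried and absent" is what lets a failed load be cached, so callers pay the resolution cost at most once per process.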
@@ -444,14 +490,32 @@ function normalizeCheck(raw) {
  function normalizeSurfaceRefs(refs) {
      if (!Array.isArray(refs))
          die("surface_trust_refs must be an array");
+     const hachureValidate = getHachureValidator();
      return refs.map((ref) => {
          const keys = JSON.stringify(ref).match(/"([^"]+)":/g) ?? [];
          for (const key of keys.map((k) => k.slice(1, -2)))
              if (key.toLowerCase().includes("veritas"))
                  die(`unsupported field in Surface trust ref: ${key}`);
          const out = { ...ref };
-         if (!["TrustReport", "Trust Snapshot"].includes(out.artifact_kind))
-             die("artifact_kind must be one of");
+         // trust.bundle is the canonical Hachure-aligned artifact kind; TrustReport/Trust Snapshot are legacy aliases
+         if (!["trust.bundle", "TrustReport", "Trust Snapshot"].includes(out.artifact_kind))
+             die("artifact_kind must be one of: trust.bundle, TrustReport, Trust Snapshot");
+         // When hachure is installed, validate the referenced trust artifact if it is a local file
+         if (hachureValidate && out.artifact_ref && typeof out.artifact_ref === "string" && fs.existsSync(out.artifact_ref)) {
+             try {
+                 const bundle = JSON.parse(fs.readFileSync(out.artifact_ref, "utf8"));
+                 const result = hachureValidate(bundle);
+                 if (!result.valid) {
+                     const errorSummary = result.errors.slice(0, 3).join("; ");
+                     die(`trust.bundle artifact at ${out.artifact_ref} failed Hachure schema validation: ${errorSummary}`);
+                 }
+             }
+             catch (err) {
+                 if (err instanceof Error && err.message.includes("failed Hachure schema validation"))
+                     throw err;
+                 // File read or parse errors are not re-thrown: the artifact_ref validation path is advisory
+             }
+         }
          const status = deriveSurfaceStatus(out);
          if (out.status === "pass" && status !== "pass")
              die("surface_trust_refs contradicts Surface trust facts");
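The added block treats artifact validation as advisory: read or parse failures are swallowed, while a schema failure must still abort. The diff tells the two apart by matching on the error message. One possible alternative, shown only as a sketch under assumptions, separates the file read from the validation step so that no message matching is needed. `checkRef`, `readBundle`, `SchemaValidationError`, and the stand-in collaborators are hypothetical names and are not part of the package.

```javascript
// Hypothetical error type for schema failures (not in the package).
class SchemaValidationError extends Error {}

const ALLOWED_KINDS = ["trust.bundle", "TrustReport", "Trust Snapshot"];

// readBundle: (path) => parsed JSON, may throw.
// validate: (bundle) => { valid, errors }, or null when no validator is installed.
function checkRef(ref, readBundle, validate) {
  if (!ALLOWED_KINDS.includes(ref.artifact_kind)) {
    throw new Error(`artifact_kind must be one of: ${ALLOWED_KINDS.join(", ")}`);
  }
  if (!validate || typeof ref.artifact_ref !== "string") return ref;
  let bundle;
  try {
    bundle = readBundle(ref.artifact_ref);
  } catch {
    return ref; // advisory path: unreadable or unparsable artifacts are skipped
  }
  const result = validate(bundle);
  if (!result.valid) {
    const summary = result.errors.slice(0, 3).join("; ");
    throw new SchemaValidationError(
      `trust.bundle artifact at ${ref.artifact_ref} failed schema validation: ${summary}`
    );
  }
  return ref;
}

// Stand-in collaborators for the demonstration.
const files = { "good.json": '{"ok":true}', "bad.json": '{"ok":false}', "broken.json": "{" };
const fakeRead = (p) => JSON.parse(files[p]);
const fakeValidate = (bundle) =>
  bundle.ok ? { valid: true, errors: [] } : { valid: false, errors: ["/claims must be array"] };

const goodRef = checkRef({ artifact_kind: "trust.bundle", artifact_ref: "good.json" }, fakeRead, fakeValidate);
const brokenRef = checkRef({ artifact_kind: "TrustReport", artifact_ref: "broken.json" }, fakeRead, fakeValidate);
console.log(goodRef.artifact_ref, brokenRef.artifact_ref);
```

Keeping the read inside its own `try` narrows what is silently ignored to I/O and parse errors, which is the behavior the comment in the hunk describes.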
@@ -474,17 +538,18 @@ function surfaceCheckFromArtifact(file, index) {
      const lower = JSON.stringify(raw).toLowerCase();
      let ref;
      if (lower.includes("provider") && lower.includes("absent")) {
-         ref = { artifact_kind: "TrustReport", artifact_ref: file, gate_id: "provider.unavailable", claim_type: "surface.claim", claim_status: "unknown", subject: "builder-kit", freshness: { status: "unknown", summary: "No trust provider is configured" }, authority: { producer: "unknown", summary: "No trust provider is configured" }, integrity: { status: "unknown", summary: "Unknown" }, status: "not_verified", summary: "No trust provider is configured" };
+         ref = { artifact_kind: "trust.bundle", artifact_ref: file, gate_id: "provider.unavailable", claim_type: "builder.trust.bundle", claim_status: "unknown", subject: "builder-kit", freshness: { status: "unknown", summary: "No trust provider is configured" }, authority: { producer: "unknown", summary: "No trust provider is configured" }, integrity: { status: "unknown", summary: "Unknown" }, status: "not_verified", summary: "No trust provider is configured" };
      }
      else if (lower.includes("artifact") && lower.includes("absent")) {
-         ref = { artifact_kind: "TrustReport", artifact_ref: file, gate_id: "artifact.unavailable", claim_type: "surface.claim", claim_status: "unknown", subject: "builder-kit", freshness: { status: "unknown", summary: "Artifact not readable" }, authority: { producer: "unknown", summary: "Artifact not readable" }, integrity: { status: "unknown", summary: "Artifact not readable" }, status: "not_verified", summary: "artifact not readable" };
+         ref = { artifact_kind: "trust.bundle", artifact_ref: file, gate_id: "artifact.unavailable", claim_type: "builder.trust.bundle", claim_status: "unknown", subject: "builder-kit", freshness: { status: "unknown", summary: "Artifact not readable" }, authority: { producer: "unknown", summary: "Artifact not readable" }, integrity: { status: "unknown", summary: "Artifact not readable" }, status: "not_verified", summary: "artifact not readable" };
      }
      else {
          const claimStatus = lower.includes("rejected") ? "rejected" : "accepted";
          const freshness = lower.includes("stale") ? "stale" : "fresh";
          const producer = lower.includes("missing-authority") ? "unknown" : "surface-local";
          const integrity = lower.includes("mismatch") ? "mismatch" : "matched";
-         ref = { artifact_kind: file.includes("snapshot") ? "Trust Snapshot" : "TrustReport", artifact_ref: file, gate_id: "builder.surface.claim", claim_type: "surface.claim", claim_status: claimStatus, subject: "builder-kit", freshness: { status: freshness, summary: freshness === "fresh" ? "fresh" : "not currently verifiable" }, authority: { producer, summary: producer === "unknown" ? "missing authority" : "Local Surface trust producer." }, integrity: { status: integrity, summary: integrity === "matched" ? "matched" : "integrity mismatch" } };
+         // Use trust.bundle as the canonical Hachure-aligned artifact_kind for all trust-backed evidence refs
+         ref = { artifact_kind: "trust.bundle", artifact_ref: file, gate_id: "builder.trust.bundle", claim_type: "builder.trust.bundle", claim_status: claimStatus, subject: "builder-kit", freshness: { status: freshness, summary: freshness === "fresh" ? "fresh" : "not currently verifiable" }, authority: { producer, summary: producer === "unknown" ? "missing authority" : "Local Surface trust producer." }, integrity: { status: integrity, summary: integrity === "matched" ? "matched" : "integrity mismatch" } };
          ref.status = deriveSurfaceStatus(ref);
          ref.summary = ref.status === "pass" ? "accepted" : ref.status === "not_verified" ? "not currently verifiable" : (claimStatus === "rejected" ? "rejected" : producer === "unknown" ? "missing authority" : "integrity mismatch");
      }
@@ -7,6 +7,30 @@ const EXTENSION_ASSET_CLASSES = ["skills", "docs", "adapters", "evals", "assets"
  // agent-extension fields are skills, docs, adapters, evals, assets.
  const CORE_CONTAINER_FIELDS = new Set(["schema_version", "id", "name", "description", "product_name", "flows"]);
  const AGENT_EXTENSION_CLASSES = new Set(["skills", "docs", "adapters", "evals", "assets"]);
+ /**
+  * Allowlist of kit IDs that Kontour authors, tests, and ships with the flow-agents package.
+  *
+  * Criteria for inclusion:
+  * 1. The kit directory lives under kits/ in the kontourai/flow-agents repository.
+  * 2. The kit is published by @kontourai (npm package @kontourai/flow-agents).
+  * 3. Kontour owns and maintains the kit's content and release lifecycle.
+  *
+  * To add a new first-party kit: add its id here AND ensure it lives under kits/ in this repo.
+  * Third-party forks or community kits published elsewhere are NOT first-party, even if they
+  * share a similar id — first-party is tied to provenance in this specific repository.
+  */
+ export const FIRST_PARTY_KIT_IDS = new Set(["builder", "knowledge"]);
+ /**
+  * Derive the trust level for a kit id.
+  *
+  * v1 determination: allowlist check against FIRST_PARTY_KIT_IDS.
+  * "verified" is reserved for future third-party verification (not yet granted to any kit).
+  */
+ export function deriveKitTrust(kitId) {
+     if (FIRST_PARTY_KIT_IDS.has(kitId))
+         return "first-party";
+     return "unverified";
+ }
  let _validateKitContainerCache = null;
  async function loadValidateKitContainer() {
      if (_validateKitContainerCache)
@@ -49,7 +73,7 @@ async function delegateCoreContainerValidation(kitDir, manifest) {
          .map((d) => `${d.path}: ${d.message}`);
  }
  /**
-  * Derives the consumer-target level (K0/K1/K2) and target audience list from
+  * Derives the consumer-target level (K0/K1/K2), target audience list, and trust level from
   * observable asset classes in the kit manifest. Does not require file I/O.
   *
   * Derivation rules (from kontourai/flow-agents#52 and Brian's layering review):
@@ -60,6 +84,10 @@ async function delegateCoreContainerValidation(kitDir, manifest) {
   * - targets.flow-agents: present when K1 (agent extension assets activate in >=1 harness).
   * - third-party: any top-level keys that are not core fields and not Flow Agents extension classes.
   *
+  * Trust derivation (from kontourai/flow-agents#79):
+  * - "first-party": kit id is in FIRST_PARTY_KIT_IDS (Kontour-authored kits in this repo).
+  * - "unverified": all other kits (default; "verified" is reserved for a future process).
+  *
   * @param manifest The kit.json manifest object.
   * @param kitDir Kit directory for flow file-existence checks. Defaults to "" (structural-only).
   * Pass the real kit directory from `inspect` to get authoritative K0 validation.
@@ -87,12 +115,15 @@ export async function deriveKitTargets(manifest, kitDir = "") {
          targets.push("flow-agents");
      for (const ns of thirdPartyExtensions)
          targets.push(ns);
+     // Derive trust level orthogonally to the K-level capability axis.
+     const trust = deriveKitTrust(kitId);
      return {
          kit_id: kitId,
          kit_name: kitName,
          conformance: { k0, k1, k2 },
          targets,
          third_party_extensions: thirdPartyExtensions,
+         trust,
      };
  }
  export async function validateKitRepository(kitDir) {
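To make the v1 trust derivation concrete, the snippet below re-declares the allowlist and function from this diff so it runs on its own, then applies them to a few ids. The sample ids other than `builder`, `knowledge`, and `my-custom-kit` are made up for illustration.

```javascript
// Re-declared here so the snippet is self-contained; values mirror the diff above.
const FIRST_PARTY_KIT_IDS = new Set(["builder", "knowledge"]);

function deriveKitTrust(kitId) {
  return FIRST_PARTY_KIT_IDS.has(kitId) ? "first-party" : "unverified";
}

// "builder-fork" and "Builder" are hypothetical ids used to show exact matching.
const samples = ["builder", "knowledge", "builder-fork", "Builder", "my-custom-kit"];
const results = Object.fromEntries(samples.map((id) => [id, deriveKitTrust(id)]));
console.log(results);
```

`Set.prototype.has` compares strings exactly and case-sensitively, so ids that merely resemble a first-party id fall through to `unverified`. As written in this hunk the function receives only the id, so the provenance criteria listed in the comment are not something it can check by itself.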
@@ -470,6 +470,7 @@ function exportOpencodePlugin() {

  import { spawnSync } from 'node:child_process';
  import { join, basename } from 'node:path';
+ import { mkdirSync, writeFileSync } from 'node:fs';

  // opencode runs plugins inside its own compiled (Bun-based) binary, so
  // process.execPath points at opencode itself — spawning it with a script
@@ -480,6 +481,19 @@ const NODE_BIN = basename(process.execPath).startsWith('node') ? process.execPat
  export const FlowAgentsPlugin = async ({ project, client, $, directory, worktree }) => {
    const root = directory || process.cwd();

+   // Deterministic load marker. opencode invokes this factory at startup but
+   // does not reliably surface plugin console output to its log file, and its
+   // internal "loading plugin" message was dropped in opencode 1.17.x. Write a
+   // marker into the workspace telemetry dir so acceptance tests can confirm the
+   // plugin loaded without depending on opencode internals. Best-effort only.
+   try {
+     const telemetryDir = join(root, '.telemetry');
+     mkdirSync(telemetryDir, { recursive: true });
+     writeFileSync(join(telemetryDir, 'opencode-plugin.loaded'), 'flow-agents');
+   } catch (_err) {
+     // Marker is diagnostic only; never block plugin load on a write failure.
+   }
+
    // The hook scripts read the event payload from stdin; an empty stdin makes
    // the telemetry pipeline silently skip the emit (fail-open), so every spawn
    // must pass a payload (caught by live acceptance smoke 2026-06-11).
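The load marker added above gives tests something deterministic to look for. The package's acceptance checks are shell scripts whose contents are not shown in this section, so the following is only a sketch of the same idea in Node: `markerPath` and `pluginLoaded` are hypothetical helpers, and only the `.telemetry/opencode-plugin.loaded` path and the `flow-agents` content come from the diff.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Mirrors the marker location used in the hunk above.
function markerPath(root) {
  return path.join(root, ".telemetry", "opencode-plugin.loaded");
}

// Hypothetical acceptance-side check: true when the marker exists with the expected content.
function pluginLoaded(root) {
  try {
    return fs.readFileSync(markerPath(root), "utf8") === "flow-agents";
  } catch {
    return false; // no readable marker: the factory never ran, or its write failed
  }
}

// Demonstration in a throwaway workspace; the write stands in for the plugin factory.
const workspace = fs.mkdtempSync(path.join(os.tmpdir(), "flow-agents-marker-"));
const before = pluginLoaded(workspace);
fs.mkdirSync(path.join(workspace, ".telemetry"), { recursive: true });
fs.writeFileSync(markerPath(workspace), "flow-agents");
const after = pluginLoaded(workspace);
fs.rmSync(workspace, { recursive: true, force: true });
console.log({ before, after });
```

Because the plugin writes the marker on a best-effort basis, a missing marker is a hint rather than proof that the plugin failed to load.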
@@ -49,7 +49,7 @@
      },
      {
        "id": "flow-agents-evidence",
-       "label": "Verification evidence",
+       "label": "Verification evidence (trust-backed refs use Hachure trust.bundle format when present)",
        "root": "product:flow-agents:.flow-agents",
        "files": [
          "evidence.json"
@@ -1,15 +1,15 @@
1
1
  ---
2
- title: ADR 0004: Gates Expect Surface Claims
2
+ title: ADR 0004: Gates Expect Hachure Trust Bundles
3
3
  ---
4
4
 
5
- # ADR 0004: Gates Expect Surface Claims
5
+ # ADR 0004: Gates Expect Hachure Trust Bundles
6
6
 
7
- Flow-backed kits will model rich gate evidence as claim expectations rather than provider-specific requirements. A gate expectation can require `kind: "surface.claim"`, a Surface claim type such as `repo.policy_compliance`, accepted trust statuses such as `verified`, and whether the expectation blocks the transition; project or runtime config maps claim types to trusted Surface producers and authority traces. This lets the Builder Kit use repo governance, command checks, CI, human decisions, or future producers without naming a specific provider in the Flow Definition.
7
+ Flow-backed kits will model rich gate evidence as claim expectations using the Hachure trust.bundle format rather than provider-specific requirements. A gate expectation can require `kind: "trust.bundle"`, a domain claim type such as `builder.verify.tests`, accepted trust statuses such as `verified`, and whether the expectation blocks the transition; project or runtime config maps claim types to trusted Surface producers and authority traces. This lets the Builder Kit use repo governance, command checks, CI, human decisions, or future producers without naming a specific provider in the Flow Definition.
8
8
 
9
- **Status**: Accepted
9
+ **Status**: Accepted (updated: vocabulary aligned to Hachure trust.bundle in hachure-align)
10
10
 
11
- **Considered Options**: Provider-aware gate rules were rejected because they would make Flow Definitions know too much about individual tools. Plain evidence strings such as `tests` or `veritas` were rejected because they cannot represent claim type, accepted status, producer authority, transparency gaps, or project-level enforcement overrides cleanly.
11
+ **Considered Options**: Provider-aware gate rules were rejected because they would make Flow Definitions know too much about individual tools. Plain evidence strings such as `tests` or `veritas` were rejected because they cannot represent claim type, accepted status, producer authority, transparency gaps, or project-level enforcement overrides cleanly. An earlier version used `kind: "surface.claim"` and `artifact_kind: "TrustReport"/"Trust Snapshot"` — those have been renamed to `kind: "trust.bundle"` and `artifact_kind: "trust.bundle"` to align with the Hachure schema standard that Flow now ships.
12
12
 
13
- **Consequences**: Trusted producer mappings belong upstream in Flow project configuration, not Flow Agents runtime configuration. Flow Agents can help author, install, and adapt that configuration for agent runtimes, but CI, framework agents, local CLIs, and humans should all evaluate gates against the same Flow-owned authority model.
13
+ **Consequences**: Trusted producer mappings belong upstream in Flow project configuration, not Flow Agents runtime configuration. Flow Agents can help author, install, and adapt that configuration for agent runtimes, but CI, framework agents, local CLIs, and humans should all evaluate gates against the same Flow-owned authority model. When hachure is installed as an optional dependency, referenced trust artifacts are validated against hachure's trust-bundle.schema.json at evidence-recording time.
14
14
 
15
- **Initial Shape**: Gate expectations should use `expects` entries with `id`, `kind: "surface.claim"`, `required`, `claim.type`, optional `claim.subject`, `claim.accepted_statuses`, `description`, and optional `explore_hint`. The Builder Kit should use intuitive subject strings such as `flow-run`, `flow-step`, `work-item`, `change`, `pull-request`, `release`, `decision`, and `artifact`, while the schema remains open to other subject values.
15
+ **Initial Shape**: Gate expectations should use `expects` entries with `id`, `kind: "trust.bundle"`, `required`, `claim.type`, optional `claim.subject`, `claim.accepted_statuses`, `description`, and optional `explore_hint`. The Builder Kit should use intuitive subject strings such as `flow-run`, `flow-step`, `work-item`, `change`, `pull-request`, `release`, `decision`, and `artifact`, while the schema remains open to other subject values.
@@ -74,12 +74,12 @@ A Flow Definition at minimum needs `id`, `version`, `steps`, and `gates`. Steps
  "expects": [
    {
      "id": "review-finding",
-     "kind": "surface.claim",
+     "kind": "trust.bundle",
      "required": true,
      "description": "The change was reviewed and findings were recorded.",
-     "claim": {
-       "type": "my-kit.review.finding",
-       "subject": "artifact",
+     "bundle_claim": {
+       "claimType": "my-kit.review.finding",
+       "subjectType": "artifact",
        "accepted_statuses": ["trusted", "accepted"]
      }
    }
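The rename in this hunk is mechanical: `kind` changes from `surface.claim` to `trust.bundle`, the `claim` selector becomes `bundle_claim`, and its `type` and `subject` keys become `claimType` and `subjectType`. For kit authors with existing Flow Definitions, a migration could look roughly like the sketch below. `migrateExpectation` is a hypothetical helper, not a tool shipped by the package, and it assumes that any other selector fields (such as `accepted_statuses`) carry over unchanged, as they do in the example above.

```javascript
// Hypothetical migration helper: rewrites a legacy `surface.claim` expectation
// into the `trust.bundle` shape shown in the hunk above.
function migrateExpectation(expectation) {
  if (expectation.kind !== "surface.claim") return expectation; // already migrated or unrelated
  const { claim = {}, ...rest } = expectation;
  const { type, subject, ...otherClaimFields } = claim;
  const bundle_claim = {
    ...(type !== undefined ? { claimType: type } : {}),
    ...(subject !== undefined ? { subjectType: subject } : {}),
    ...otherClaimFields, // e.g. accepted_statuses, assumed to carry over as-is
  };
  return { ...rest, kind: "trust.bundle", bundle_claim };
}

const legacy = {
  id: "review-finding",
  kind: "surface.claim",
  required: true,
  description: "The change was reviewed and findings were recorded.",
  claim: {
    type: "my-kit.review.finding",
    subject: "artifact",
    accepted_statuses: ["trusted", "accepted"],
  },
};

const migrated = migrateExpectation(legacy);
console.log(JSON.stringify(migrated, null, 2));
```

The helper returns a new object and leaves its input untouched, so it can be mapped over an `expects` array without side effects.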
@@ -270,7 +270,7 @@ Version constraints (e.g. minimum `flow-agents` version) are the only case where
270
270
 
271
271
  ### Evidence layering: Surface and Veritas
272
272
 
273
- Kit gates reference evidence using `"kind": "surface.claim"`. This is **Flow-native vocabulary**: Flow is built on Surface, so Surface claims are the expected evidence substrate at the Flow level. Surface claims are not a Flow Agents coupling.
273
+ Kit gates reference evidence using `"kind": "trust.bundle"` with a `bundle_claim` selector (`claimType`, optional `subjectType`, `accepted_statuses`). This is **Flow-native vocabulary** in the Hachure open trust-bundle format: Flow is built on Surface, so trust bundles are the expected evidence substrate at the Flow level, validated against Hachure's `trust-bundle.schema.json`. They are not a Flow Agents coupling. (Earlier Flow releases used `kind: "surface.claim"` with a `claim` selector; Flow 1.3.0 replaced that with `trust.bundle`, kontourai/flow#84.)
274
274
 
275
275
  Veritas is an **optional claim family** — a developer-repo specialization for evidence that has been through a trust pipeline. Kits may be opinionated about requiring Veritas-class evidence. Builder Kit requiring Veritas-class evidence is the kit's own policy choice, defined by Kontour as the kit author, not a platform requirement. Other kits may not require Veritas at all.
276
276
 
@@ -299,7 +299,8 @@ Output is stable JSON:
      "k2": false
    },
    "targets": ["flow", "flow-agents"],
-   "third_party_extensions": []
+   "third_party_extensions": [],
+   "trust": "unverified"
  }
  ```

@@ -307,6 +308,98 @@ Exit code 0 when the kit is at least K0 (valid core container); exit code 1 when

  The `inspect` command is read-only and safe to run before install.

+ ## Trust axis: who vouches for a kit
+
+ The **trust axis** is a separate, orthogonal classification from the K-level capability axis. It answers the question "who vouches for this kit?" rather than "what does this kit contain?".
+
+ ### Two orthogonal axes
+
+ Every kit carries two independent badges:
+
+ | Axis | Values | Question answered |
+ |---|---|---|
+ | **Capability** (K-level) | K0 / K1 / K2 | What does the kit CONTAIN? (derived from assets) |
+ | **Trust** | first-party / verified / unverified | WHO vouches for it? (derived from provenance) |
+
+ A K2 kit can be `unverified`. A K0 kit can be `first-party`. The levels are independent.
+
+ **Marketplace listing format**: `Works with: Flow (gates-only) | K1 | ✓ First-party`
+
+ ### Trust levels
+
+ | Level | Meaning | How it is assigned (v1) |
+ |---|---|---|
+ | `first-party` | Kontour authored, tested, and ships this kit in the `@kontourai/flow-agents` package. | Kit id is in the internal FIRST_PARTY_KIT_IDS allowlist in `src/flow-kit/validate.ts`. |
+ | `verified` | Reserved for a future third-party verification process. | Not yet implemented; the value is reserved but not granted to any kit today. |
+ | `unverified` | Default for all kits not explicitly vouched for. | All other kits, including third-party community kits. |
+
+ `unverified` says nothing about the quality of a kit — it only means Kontour has not vouched for it through one of the above channels.
+
+ ### First-party kits (v1)
+
+ The first-party allowlist in v1 contains the kits authored by Kontour and distributed with the flow-agents package:
+
+ - `builder` — Builder Kit (shape, build, and deliver work)
+ - `knowledge` — Knowledge Kit (durable gated knowledge store)
+
+ Criteria for a kit to be first-party:
+ 1. Its directory lives under `kits/` in the `kontourai/flow-agents` repository.
+ 2. It is published as part of the `@kontourai/flow-agents` npm package.
+ 3. Kontour owns and maintains the kit's content and release lifecycle.
+
+ Third-party forks, community kits, or kits published under a different npm package are NOT first-party even if they share a similar id. First-party is tied to provenance in this specific repository and package.
+
+ ### Deferred: verified trust and cryptographic attestation (v2)
+
+ The `verified` value is reserved for a future verification process. The intended v2 path:
+
+ - Third-party kit authors can apply for `verified` status.
+ - Verification evidence: the kit passes the conformance kit self-certification + a cryptographic signature or Veritas attestation.
+ - The [conformance kit](https://github.com/kontourai/flow) and [Veritas claims](veritas-integration.md) are the natural substrate for this attestation layer.
+ - The signature or attestation would be checked by `flow-agents kit inspect` at derivation time.
+
+ v1 deliberately omits the signing/attestation mechanism and the verification process. The `verified` value is reserved so consuming tools can handle it when it arrives without a breaking schema change.
+
+ ### Inspecting trust
+
+ The `trust` field appears in `flow-agents kit inspect` output alongside `conformance`:
+
+ ```bash
+ npm run kit -- inspect kits/builder
+ ```
+
+ ```json
+ {
+   "kit_id": "builder",
+   "kit_name": "Builder Kit",
+   "conformance": {
+     "k0": true,
+     "k1": true,
+     "k2": false
+   },
+   "targets": ["flow", "flow-agents"],
+   "third_party_extensions": [],
+   "trust": "first-party"
+ }
+ ```
+
+ A third-party kit inspected before verification:
+
+ ```json
+ {
+   "kit_id": "my-custom-kit",
+   "kit_name": "My Custom Kit",
+   "conformance": {
+     "k0": true,
+     "k1": true,
+     "k2": false
+   },
+   "targets": ["flow", "flow-agents"],
+   "third_party_extensions": [],
+   "trust": "unverified"
+ }
+ ```
+
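Since the guide reserves `verified` so that consumers can handle it before any kit receives it, a tool reading `inspect` output may want to recognize all three values up front. The sketch below is one way to do that and is not part of the package: `summarize`, `highestLevel`, and the label wording other than `✓ First-party` are assumptions, as is the choice to display unknown trust values as unverified.

```javascript
// Hypothetical display labels; only the "✓ First-party" wording appears in the guide.
const TRUST_LABELS = new Map([
  ["first-party", "✓ First-party"],
  ["verified", "Verified"], // reserved value, not granted to any kit in v1
  ["unverified", "Unverified"],
]);

// Highest K-level whose flag is true, or null when the kit is not even K0.
function highestLevel(conformance) {
  if (conformance.k2) return "K2";
  if (conformance.k1) return "K1";
  if (conformance.k0) return "K0";
  return null;
}

// Unknown trust values are shown as unverified rather than rejected (an assumption).
function summarize(inspectOutput) {
  const level = highestLevel(inspectOutput.conformance) ?? "not K0";
  const trust = TRUST_LABELS.get(inspectOutput.trust) ?? TRUST_LABELS.get("unverified");
  return `${inspectOutput.kit_id}: ${level} | ${trust}`;
}

const builder = {
  kit_id: "builder",
  conformance: { k0: true, k1: true, k2: false },
  trust: "first-party",
};
const custom = {
  kit_id: "my-custom-kit",
  conformance: { k0: true, k1: true, k2: false },
  trust: "unverified",
};

console.log(summarize(builder));
console.log(summarize(custom));
```

The two fields are read independently, which matches the guide's point that capability and trust are orthogonal.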
  ## Direction

  Flow Kits are designed to be shareable workflow units — authored once, carried across teams and workspaces. The intended growth path is distribution from git remotes and a curated Kontour kit catalog of Kontour-authored kits covering work modes beyond software delivery. Today install is local-path only; remote fetch is explicitly a non-goal in this version.
@@ -49,7 +49,7 @@ Governance tools such as Veritas belong at the Evidence boundary. Flow Agents sh
 
  ## Flow Kit Coordination
 
- Flow owns Flow Definition semantics: gates use typed `expects` entries, Surface requirements use `kind: "surface.claim"`, and project configuration owns trusted producer mappings plus gate overrides. Flow Agents should author, install, adapt, and control those assets for local runtimes; it should not become the authority source for claim trust or override semantics.
+ Flow owns Flow Definition semantics: gates use typed `expects` entries, Surface requirements use `kind: "trust.bundle"` (the Hachure-aligned gate kind), and project configuration owns trusted producer mappings plus gate overrides. Flow Agents should author, install, adapt, and control those assets for local runtimes; it should not become the authority source for claim trust or override semantics.
 
  The Kit Catalog is the Flow Agents index of installable Flow Kits. A Flow Kit can contain Flow Definitions, skills, docs, adapters, and evals, but the catalog points at those assets instead of defining gate behavior itself. Builder Kit is the first Kontour-authored kit and proves the path from shaping through build, verification, merge readiness, and learning.
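The change of `kind` from `"surface.claim"` to `"trust.bundle"` in the coordination paragraph above could be applied to an existing Flow Definition roughly as follows. This is a sketch under stated assumptions: it treats the change as a pure string swap on any `kind` field, which this diff suggests but does not confirm, and the gate object in the example (`id`, `expects`, `claim`) is a hypothetical shape, not the real Flow Definition schema.

```typescript
// Assumption: the migration is a plain rename of the `kind` value.
const OLD_KIND = "surface.claim";
const NEW_KIND = "trust.bundle";

// Any parsed JSON value.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively copy a JSON value, rewriting `kind: "surface.claim"` wherever it appears.
function renameSurfaceClaimKind(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => renameSurfaceClaimKind(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] =
        key === "kind" && child === OLD_KIND
          ? NEW_KIND
          : renameSurfaceClaimKind(child);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

// Hypothetical gate shape, for illustration only.
const gate: Json = {
  id: "review-gate",
  expects: [{ kind: "surface.claim", claim: "tests-pass" }],
};

console.log(JSON.stringify(renameSurfaceClaimKind(gate)));
```

Walking the whole document avoids depending on where `expects` entries sit in the file, at the cost of also rewriting any unrelated field that happens to use the same `kind` value.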
 
@@ -65,7 +65,7 @@ Builder Kit vocabulary should be used in public and internal guidance:
  - Builder Kit: the coding/building kit shipped by this repo.
  - Probe: question-driven design and context challenge step, surfaced as `design-probe`.
 
- Builder Kit evidence gates can reference Surface trust state without naming a provider. A trust-backed gate may attach a TrustReport or Trust Snapshot ref for the relevant Surface claim, while Flow keeps authority over gate evaluation, trusted producer mapping, and route-back behavior. Surface remains the portable trust-state layer, and Veritas remains an optional producer rather than a required Builder Kit dependency.
+ Builder Kit evidence gates can reference Surface trust state without naming a provider. A trust-backed gate may attach a Hachure trust.bundle ref for the relevant Surface claim, while Flow keeps authority over gate evaluation, trusted producer mapping, and route-back behavior. Surface remains the portable trust-state layer, and Veritas remains an optional producer rather than a required Builder Kit dependency.
 
  ## Placement Rules