@kontourai/flow-agents 0.1.2 → 0.3.0

Files changed (117)
  1. package/.github/dependabot.yml +23 -0
  2. package/.github/workflows/release-please.yml +31 -0
  3. package/.github/workflows/runtime-compat.yml +118 -0
  4. package/CHANGELOG.md +46 -0
  5. package/CONTRIBUTING.md +4 -0
  6. package/README.md +80 -18
  7. package/build/src/cli/flow-kit.js +9 -4
  8. package/build/src/cli/init.js +215 -5
  9. package/build/src/cli/runtime-adapter.js +9 -5
  10. package/build/src/cli/telemetry-doctor.js +4 -1
  11. package/build/src/cli/utterance-check.js +65 -1
  12. package/build/src/runtime-adapters.js +34 -0
  13. package/build/src/tools/build-universal-bundles.js +285 -0
  14. package/build/src/tools/filter-installed-packs.js +3 -0
  15. package/build/src/tools/validate-source-tree.js +5 -1
  16. package/console.telemetry.json +115 -20
  17. package/context/scripts/telemetry/lib/config.sh +5 -1
  18. package/context/settings/flow-agents-settings.json +7 -0
  19. package/docs/_layouts/default.html +2 -0
  20. package/docs/context-map.md +1 -0
  21. package/docs/index.md +53 -4
  22. package/docs/integrations/conformance.md +246 -0
  23. package/docs/integrations/framework-adapter.md +275 -0
  24. package/docs/integrations/harness-install.md +213 -0
  25. package/docs/integrations/index.md +58 -0
  26. package/docs/integrations/knowledge-kit-live.md +211 -0
  27. package/docs/kit-authoring-guide.md +169 -0
  28. package/docs/north-star.md +2 -2
  29. package/docs/spec/runtime-hook-surface.md +525 -0
  30. package/docs/survey-utterance-check.md +211 -94
  31. package/docs/vision.md +45 -0
  32. package/evals/acceptance/run.sh +13 -2
  33. package/evals/acceptance/test_knowledge_kit_live.sh +221 -0
  34. package/evals/acceptance/test_opencode_harness.sh +121 -0
  35. package/evals/acceptance/test_pi_harness.sh +113 -0
  36. package/evals/integration/test_bundle_install.sh +226 -1
  37. package/evals/integration/test_bundle_lifecycle.sh +641 -0
  38. package/evals/integration/test_runtime_adapter_activation.sh +113 -1
  39. package/evals/integration/test_utterance_check.sh +291 -44
  40. package/evals/run.sh +2 -0
  41. package/evals/static/test_universal_bundles.sh +137 -2
  42. package/integrations/strands/README.md +256 -0
  43. package/integrations/strands/example.py +74 -0
  44. package/integrations/strands/examples/knowledge_kit_live.py +461 -0
  45. package/integrations/strands/flow_agents_strands/__init__.py +27 -0
  46. package/integrations/strands/flow_agents_strands/hooks.py +194 -0
  47. package/integrations/strands/flow_agents_strands/policy.py +348 -0
  48. package/integrations/strands/flow_agents_strands/steering.py +225 -0
  49. package/integrations/strands/flow_agents_strands/telemetry.py +238 -0
  50. package/integrations/strands/pyproject.toml +38 -0
  51. package/integrations/strands/tests/__init__.py +0 -0
  52. package/integrations/strands/tests/test_hooks.py +392 -0
  53. package/integrations/strands/tests/test_policy.py +315 -0
  54. package/integrations/strands/tests/test_telemetry.py +184 -0
  55. package/integrations/strands-ts/README.md +224 -0
  56. package/integrations/strands-ts/bin/conformance-shim.mjs +257 -0
  57. package/integrations/strands-ts/package.json +53 -0
  58. package/integrations/strands-ts/src/hooks.ts +312 -0
  59. package/integrations/strands-ts/src/index.ts +22 -0
  60. package/integrations/strands-ts/src/policy.ts +345 -0
  61. package/integrations/strands-ts/src/telemetry.ts +251 -0
  62. package/integrations/strands-ts/test/test-policy.ts +322 -0
  63. package/integrations/strands-ts/test/test-steering.ts +159 -0
  64. package/integrations/strands-ts/test/test-telemetry.ts +226 -0
  65. package/integrations/strands-ts/tsconfig.json +20 -0
  66. package/kits/catalog.json +6 -0
  67. package/kits/knowledge/adapters/default-store/index.js +821 -0
  68. package/kits/knowledge/adapters/flow-runner/index.js +1179 -0
  69. package/kits/knowledge/adapters/flow-runner/telemetry.js +174 -0
  70. package/kits/knowledge/docs/README.md +135 -0
  71. package/kits/knowledge/docs/store-contract.md +526 -0
  72. package/kits/knowledge/evals/consolidation/suite.test.js +1234 -0
  73. package/kits/knowledge/evals/contract-suite/suite.test.js +670 -0
  74. package/kits/knowledge/evals/ingest-compile/suite.test.js +574 -0
  75. package/kits/knowledge/evals/synthesis/suite.test.js +909 -0
  76. package/kits/knowledge/flows/compile.flow.json +60 -0
  77. package/kits/knowledge/flows/consolidate.flow.json +77 -0
  78. package/kits/knowledge/flows/ingest.flow.json +60 -0
  79. package/kits/knowledge/flows/store-contract.flow.json +48 -0
  80. package/kits/knowledge/flows/synthesize.flow.json +77 -0
  81. package/kits/knowledge/kit.json +78 -0
  82. package/package.json +7 -2
  83. package/packaging/conformance/README.md +142 -0
  84. package/packaging/conformance/fixtures/config-protection--allow-no-path.json +18 -0
  85. package/packaging/conformance/fixtures/config-protection--allow-safe-file.json +20 -0
  86. package/packaging/conformance/fixtures/config-protection--block-biome.json +20 -0
  87. package/packaging/conformance/fixtures/config-protection--block-eslintrc.json +20 -0
  88. package/packaging/conformance/fixtures/quality-gate--allow-no-path.json +17 -0
  89. package/packaging/conformance/fixtures/quality-gate--allow-nonexistent-file.json +19 -0
  90. package/packaging/conformance/fixtures/stop-goal-fit--allow-clean-cwd.json +17 -0
  91. package/packaging/conformance/fixtures/stop-goal-fit--block-strict-mode.json +23 -0
  92. package/packaging/conformance/fixtures/stop-goal-fit--warn-active-delivery.json +21 -0
  93. package/packaging/conformance/fixtures/workflow-steering--allow-no-state.json +16 -0
  94. package/packaging/conformance/fixtures/workflow-steering--inject-active-state.json +29 -0
  95. package/packaging/conformance/fixtures/workflow-steering--inject-subagent-steering.json +25 -0
  96. package/packaging/conformance/package.json +4 -0
  97. package/packaging/conformance/run-conformance.js +322 -0
  98. package/packaging/manifest.json +59 -0
  99. package/schemas/flow-agents-settings.schema.json +48 -0
  100. package/scripts/README.md +4 -0
  101. package/scripts/dogfood.js +16 -0
  102. package/scripts/hooks/opencode-hook-adapter.js +123 -0
  103. package/scripts/hooks/opencode-telemetry-hook.js +101 -0
  104. package/scripts/hooks/pi-hook-adapter.js +123 -0
  105. package/scripts/hooks/pi-telemetry-hook.js +105 -0
  106. package/scripts/hooks/run-hook.js +8 -0
  107. package/scripts/hooks/utterance-check.js +124 -22
  108. package/scripts/telemetry/lib/config.sh +5 -1
  109. package/src/cli/flow-kit.ts +10 -4
  110. package/src/cli/init.ts +219 -6
  111. package/src/cli/runtime-adapter.ts +10 -5
  112. package/src/cli/telemetry-doctor.ts +4 -1
  113. package/src/cli/utterance-check.ts +71 -1
  114. package/src/runtime-adapters.ts +35 -0
  115. package/src/tools/build-universal-bundles.ts +283 -0
  116. package/src/tools/filter-installed-packs.ts +3 -0
  117. package/src/tools/validate-source-tree.ts +5 -1
@@ -0,0 +1,213 @@
1
+ ---
2
+ title: Harness Install
3
+ ---
4
+
5
+ # Harness Install
6
+
7
+ This page walks through three harness installs: Claude Code (the L2 reference runtime), opencode, and pi. All three follow the same model — `npm run build:bundles` generates the bundle, `flow-agents init` places it — but each runtime expects different files at different paths.
8
+
9
+ ## How harness bundles work
10
+
11
+ `npm run build:bundles` generates one bundle per runtime under `dist/<runtime>/`. Each bundle contains:
12
+
13
+ - A host-specific configuration file that maps lifecycle events to shell commands invoking the canonical hook adapter wrapper.
14
+ - A host-specific adapter wrapper (`<runtime>-hook-adapter.js`) that reads stdin JSON from the host, invokes `run-hook.js` with the canonical script path and profile, translates the exit code to the host-native response format, and fails open on errors.
15
+ - A host-specific telemetry wrapper (`<runtime>-telemetry-hook.js`) that maps host event names to canonical telemetry event names and invokes `scripts/telemetry/telemetry.sh`.
16
+ - An `install.sh` that places the generated files at the host-expected paths.
17
+
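The adapter behaviour listed above (read the host's JSON, run the policy script, translate the exit code, fail open) can be sketched roughly as follows. This is an illustrative Python sketch, not the shipped `<runtime>-hook-adapter.js`. The function name `run_policy`, the exit-code convention (0 allows, 2 blocks), and the shape of the returned decision are all assumptions made for the example.

```python
import json
import subprocess
import sys

# Assumed convention for this sketch only: exit code 2 means "block".
BLOCK_EXIT_CODE = 2


def run_policy(command, payload, timeout_s=30.0):
    """Run a policy command with the host payload on stdin and return a decision.

    Any failure to run the hook (missing binary, timeout, unexpected exit
    code) yields an "allow" decision, i.e. the adapter fails open.
    """
    try:
        proc = subprocess.run(
            command,
            input=json.dumps(payload),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"decision": "allow", "reason": f"hook error, failing open: {exc}"}

    if proc.returncode == BLOCK_EXIT_CODE:
        return {"decision": "block", "reason": proc.stderr.strip()}
    if proc.returncode == 0:
        return {"decision": "allow", "reason": ""}
    # Unknown exit codes are treated as hook runtime errors, not as blocks.
    return {"decision": "allow", "reason": f"unexpected exit code {proc.returncode}"}
```

The point of the sketch is the asymmetry: only one explicit signal blocks, while every other outcome, including a crash of the hook itself, lets the host continue.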
18
+ `flow-agents init` (from `npx @kontourai/flow-agents`) calls `install.sh` for the selected runtime.
19
+
20
+ ## Claude Code
21
+
22
+ Claude Code is the L2 reference implementation. All four policy classes are wired: workflow steering, quality gate, stop-goal-fit, and config protection.
23
+
24
+ ### Install
25
+
26
+ ```bash
27
+ npx @kontourai/flow-agents init --runtime claude-code --dest /path/to/workspace --yes
28
+ ```
29
+
30
+ The install script writes hook wiring into `.claude/settings.json` inside the destination workspace. The `hooks` object in `settings.json` maps Claude Code lifecycle events (`UserPromptSubmit`, `PreToolUse`, `PostToolUse`, `Stop`) to shell commands that invoke the telemetry hook and the policy adapter:
31
+
32
+ ```bash
33
+ bash -lc 'root="${FLOW_AGENTS_CLAUDE_CODE_ROOT:-$(pwd)}"; \
34
+ node "$root/scripts/hooks/claude-telemetry-hook.js" UserPromptSubmit dev'
35
+ bash -lc 'root="${FLOW_AGENTS_CLAUDE_CODE_ROOT:-$(pwd)}"; \
36
+ node "$root/scripts/hooks/claude-hook-adapter.js" UserPromptSubmit \
37
+ workflow-steering workflow-steering.js default'
38
+ ```
39
+
40
+ Telemetry always fires first and is always non-blocking (timeout: 10 s). Policy hooks fire second and may block on `PreToolUse` (timeout: 30 s). Both fail open on hook runtime errors.
41
+
42
+ ### Dogfood variant (repo-local)
43
+
44
+ Inside the `flow-agents` source repo itself, the dogfood script writes hook wiring that points at the local `scripts/hooks/` directory rather than a published package:
45
+
46
+ ```bash
47
+ npm run dogfood -- --runtime claude-code
48
+ ```
49
+
50
+ The destination defaults to the repo root. Pass `--dest` to override.
51
+
52
+ ### Scope-collision warning
53
+
54
+ When `init` detects that an existing `.claude/settings.json` already has hooks entries for the same lifecycle events, it emits a scope-collision warning to stderr:
55
+
56
+ ```
57
+ [flow-agents] WARNING: .claude/settings.json already has hooks for UserPromptSubmit.
58
+ Existing entries will be preserved; Flow Agents hooks will be appended.
59
+ Review .claude/settings.json to confirm hook ordering is correct.
60
+ ```
61
+
62
+ The install appends rather than replaces, so existing hooks are not removed. Review the settings file after install to confirm the ordering is what you want.
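The append-rather-than-replace behaviour can be pictured with a small sketch. It assumes the settings file holds a `hooks` object keyed by lifecycle event, with a list of entries per event. The helper name `merge_hooks` and the `{"command": ...}` entry shape are hypothetical and are not taken from the `init` source.

```python
import copy
import sys


def merge_hooks(existing, new_hooks):
    """Append new hook entries per event; return (merged_settings, collided_events)."""
    merged = copy.deepcopy(existing)
    hooks = merged.setdefault("hooks", {})
    collisions = []
    for event, entries in new_hooks.items():
        current = hooks.setdefault(event, [])
        if current:
            # The event already has hooks: keep them and record a collision.
            collisions.append(event)
        current.extend(copy.deepcopy(entries))
    return merged, collisions


existing = {"hooks": {"UserPromptSubmit": [{"command": "echo mine"}]}}
incoming = {"UserPromptSubmit": [{"command": "node adapter.js"}]}
merged, collisions = merge_hooks(existing, incoming)
for event in collisions:
    print(f"WARNING: existing hooks found for {event}; appending.", file=sys.stderr)
```

Because existing entries stay ahead of the appended ones in this sketch, ordering is exactly the thing worth reviewing after install.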
63
+
64
+ ### Resulting file layout
65
+
66
+ ```
67
+ <workspace>/
68
+ .claude/
69
+ settings.json ← hook wiring (appended by install)
70
+ scripts/
71
+ hooks/
72
+ claude-hook-adapter.js
73
+ claude-telemetry-hook.js
74
+ run-hook.js
75
+ config-protection.js
76
+ quality-gate.js
77
+ stop-goal-fit.js
78
+ workflow-steering.js
79
+
80
+ skills/
81
+
82
+ .flow-agents/ ← runtime workflow artifacts (not committed)
83
+ ```
84
+
85
+ ## opencode
86
+
87
+ opencode is an L1 adapter. It has no native `prompt.submit`-equivalent event, so workflow steering is approximated at `session.created` rather than at each user turn. This is a documented gap: see <a href="../spec/runtime-hook-surface.html">the spec, section 2.1</a>.
88
+
89
+ ### Install
90
+
91
+ ```bash
92
+ npx @kontourai/flow-agents init --runtime opencode --dest /path/to/workspace --yes
93
+ ```
94
+
95
+ ### Dogfood variant
96
+
97
+ ```bash
98
+ npm run dogfood -- --runtime opencode
99
+ ```
100
+
101
+ ### Resulting file layout
102
+
103
+ ```
104
+ <workspace>/
105
+ .opencode/
106
+ plugins/
107
+ flow-agents.js ← auto-loaded at opencode startup
108
+ agents/
109
+ dev.md ← agent prompts (opencode markdown format)
110
+ tool-planner.md
111
+ tool-worker.md
112
+
113
+ skills/
114
+ deliver.md
115
+ fix-bug.md
116
+
117
+ opencode.json ← workspace instructions pointer
118
+ scripts/
119
+ hooks/
120
+ opencode-hook-adapter.js
121
+ opencode-telemetry-hook.js
122
+ run-hook.js
123
+
124
+ skills/
125
+
126
+ ```
127
+
128
+ `opencode.json` at the workspace root is a minimal config file:
129
+
130
+ ```json
131
+ {
132
+   "instructions": "This workspace uses Flow Agents. See AGENTS.md for conventions, skills, and workflow guidance."
133
+ }
134
+ ```
135
+
136
+ The plugin at `.opencode/plugins/flow-agents.js` is auto-loaded at opencode startup. It exports `FlowAgentsPlugin` and registers handlers for:
137
+
138
+ | opencode event | What fires |
139
+ | --- | --- |
140
+ | `session.created` | Telemetry + workflow steering (session-start context injection) |
141
+ | `tool.execute.before` | Telemetry + config-protection (blocking via thrown Error) |
142
+ | `tool.execute.after` | Telemetry + quality gate |
143
+ | `session.idle` | Telemetry + stop-goal-fit (warning only — not a true stop event) |
144
+ | `session.error`, `session.compacted`, `permission.asked`, `file.edited` | Telemetry only |
145
+
146
+ **Accepted gaps**: opencode has no `prompt.submit` hook, so workflow steering fires only on `session.created` — not at each user turn. `session.idle` is the closest event to a stop hook but does not reliably fire on session completion. These gaps are declared in the conformance level (L1) and in the plugin source comments.
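The opencode event table above amounts to a dispatch table. The sketch below restates it as data; the event names and policy classes come from the table, while the dict layout and the `handlers_for` helper are illustrative only. Telemetry-first ordering is stated earlier for Claude Code, and applying it to opencode here is an assumption.

```python
# Policy classes wired per opencode event, as listed in the table above.
OPENCODE_EVENT_POLICIES = {
    "session.created": ["workflow-steering"],
    "tool.execute.before": ["config-protection"],
    "tool.execute.after": ["quality-gate"],
    "session.idle": ["stop-goal-fit"],
    "session.error": [],
    "session.compacted": [],
    "permission.asked": [],
    "file.edited": [],
}


def handlers_for(event):
    """Return handler names for an event: telemetry first, then any policy hooks."""
    if event not in OPENCODE_EVENT_POLICIES:
        # Events outside the table (for example a per-turn prompt event) get nothing.
        return []
    return ["telemetry", *OPENCODE_EVENT_POLICIES[event]]
```

Looking up a per-turn prompt event returns an empty list, which is the gap the paragraph above describes: there is nothing to attach per-turn steering to.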
147
+
148
+ **Agents**: opencode receives agent prompts as markdown files in `.opencode/agents/`. The main orchestrator is `dev.md`; specialist tools (planner, worker, reviewer, etc.) are additional markdown files in the same directory.
149
+
150
+ ## pi
151
+
152
+ pi is an L1 adapter. It has no stop hook, so stop-goal-fit cannot fire at session end. This is a documented gap: see <a href="../spec/runtime-hook-surface.html">the spec, section 2.3</a>.
153
+
154
+ ### Install
155
+
156
+ ```bash
157
+ npx @kontourai/flow-agents init --runtime pi --dest /path/to/workspace --yes
158
+ ```
159
+
160
+ ### Dogfood variant
161
+
162
+ ```bash
163
+ npm run dogfood -- --runtime pi
164
+ ```
165
+
166
+ ### Resulting file layout
167
+
168
+ ```
169
+ <workspace>/
170
+ .pi/
171
+ extensions/
172
+ flow-agents.ts ← auto-discovered at startup (needs project trust)
173
+ skills/
174
+ deliver.md
175
+ fix-bug.md
176
+
177
+ AGENTS.md ← agent instructions (pi uses AGENTS.md, not a registry)
178
+ scripts/
179
+ hooks/
180
+ pi-hook-adapter.js
181
+ pi-telemetry-hook.js
182
+ run-hook.js
183
+
184
+ skills/
185
+
186
+ ```
187
+
188
+ The extension at `.pi/extensions/flow-agents.ts` is auto-discovered at startup. It registers handlers for:
189
+
190
+ | pi event | What fires |
191
+ | --- | --- |
192
+ | `session_start` | Telemetry |
193
+ | `before_agent_start` | Telemetry + workflow steering (injects context into system prompt) |
194
+ | `tool_call` | Telemetry + config-protection (blocking via `{ block: true }` return) |
195
+ | `tool_result` | Telemetry + quality gate |
196
+ | `session_shutdown` | Telemetry + stop-goal-fit (warning only — not a true stop event) |
197
+
198
+ **Accepted gaps**: pi has no stop hook. `session_shutdown` is used as the closest equivalent but does not carry the same semantics as a stop event. This gap is declared in the conformance level (L1) and in the extension source comments.
199
+
200
+ **Agents**: pi has no named-subagent registry. Agent guidance is delivered through `AGENTS.md` at the workspace root, plus the skills in `.pi/skills/` and the extension. The `flow-agents.ts` extension comment says explicitly: "pi has no named-subagent registry. Agents are not exported for pi."
201
+
202
+ ### Scope-collision warning
203
+
204
+ The behavior matches Claude Code: if an existing `.pi/extensions/` directory contains a file with conflicting event registrations, `init` warns and appends rather than replaces. Review the extension file after install.
205
+
206
+ ## Related references
207
+
208
+ - `dist/opencode/` — generated opencode bundle (do not edit by hand)
209
+ - `dist/pi/` — generated pi bundle (do not edit by hand)
210
+ - `dist/claude-code/` — generated Claude Code bundle
211
+ - `scripts/hooks/run-hook.js` — canonical hook runner
212
+ - <a href="../spec/runtime-hook-surface.html">Runtime Hook Surface spec</a> — event taxonomy, policy classes, conformance levels
213
+ - <a href="conformance.html">Conformance</a> — how to self-certify a new adapter
@@ -0,0 +1,58 @@
1
+ ---
2
+ title: Integration Examples
3
+ ---
4
+
5
+ # Integration Examples
6
+
7
+ Flow Agents reaches host runtimes and agent frameworks through two distinct distribution models. This section provides worked examples for each model and a guide to the conformance kit for third-party adapter authors.
8
+
9
+ ## Distribution models at a glance
10
+
11
+ **Harness runtimes** ship as self-contained bundles under `dist/<runtime>/`. The `npm run build:bundles` command generates each bundle from the canonical manifest and policy scripts. `flow-agents init` (or the dogfood variant) places the generated files at the host-expected paths inside a target workspace. Claude Code, Codex, Kiro, opencode, and pi are harness adapters.
12
+
13
+ **Framework adapters** live in `integrations/<name>/` as language-native packages. They register Flow Agents callbacks with the framework's lifecycle system using the framework's native registration API. `integrations/strands/` is the reference implementation: `flow-agents-strands` is a Python `HookProvider` that wires into AWS Strands Agents without requiring the Strands SDK at import time.
14
+
15
+ **Third-party adapters** self-certify by running the conformance kit in `packaging/conformance/`. The kit provides golden fixtures and a runner that pipes each fixture through the adapter command and reports a per-level verdict.
16
+
17
+ ## Conformance levels
18
+
19
+ | Level | What is required |
20
+ | --- | --- |
21
+ | L0 | Telemetry only — at least `agentSpawn` fires on session start |
22
+ | L1 | L0 plus workflow steering and stop-goal-fit in warning mode |
23
+ | L2 | L1 plus config protection (blocking) and quality gate — the reference level |
24
+
25
+ Claude Code and Codex are L2 reference implementations. opencode is L1 (no prompt-submit hook). pi is L1 (no stop hook). The Strands adapter is L0 plus config protection via `BeforeToolCallEvent` cancellation.
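The cumulative structure of the level table can be expressed as a short check. This is a sketch of the table only: the capability names and the `conformance_level` helper are made up for illustration, and the real conformance runner works from fixtures rather than a capability list.

```python
# Each level adds requirements on top of the previous one (see the table above).
LEVEL_REQUIREMENTS = [
    ("L0", {"telemetry"}),
    ("L1", {"workflow-steering", "stop-goal-fit"}),
    ("L2", {"config-protection", "quality-gate"}),
]


def conformance_level(capabilities):
    """Return the highest level whose cumulative requirements are met, or None."""
    available = set(capabilities)
    achieved = None
    for level, required in LEVEL_REQUIREMENTS:
        if not required <= available:
            break
        achieved = level
    return achieved
```

Under this reading, an adapter with telemetry and config protection but no steering stays at L0, which matches how the Strands adapter is described. The sketch ignores declared gaps: opencode and pi wire all four policy classes yet are listed as L1, so the actual verdict evidently depends on how faithfully each hook fires, not only on which ones exist.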
26
+
27
+ The <a href="../spec/runtime-hook-surface.html">Runtime Hook Surface spec</a> defines the canonical event taxonomy, policy classes, conformance levels, and engine contract in full.
28
+
29
+ ## Pages in this section
30
+
31
+ <div class="doc-grid">
32
+ <a class="doc-card" href="harness-install.html">
33
+ <strong>Harness Install</strong>
34
+ <span>Worked example installing into a Claude Code project, and the two newest runtimes: opencode and pi. Includes the dogfood variant and scope-collision warning behavior.</span>
35
+ </a>
36
+ <a class="doc-card" href="framework-adapter.html">
37
+ <strong>Framework Adapter</strong>
38
+ <span>Worked example based on <code>integrations/strands/</code>: constructing FlowAgentsHooks, telemetry emitted, the engine-contract binding for policy, and documented limitations.</span>
39
+ </a>
40
+ <a class="doc-card" href="conformance.html">
41
+ <strong>Conformance</strong>
42
+ <span>How a third-party adapter self-certifies: the engine contract 1.0, running the conformance runner, what each level requires, and how to declare gaps.</span>
43
+ </a>
44
+ <a class="doc-card" href="../spec/runtime-hook-surface.html">
45
+ <strong>Runtime Hook Surface Spec</strong>
46
+ <span>Canonical event taxonomy, four policy classes, conformance levels L0/L1/L2, mapping tables, and the engine contract for adapter authors.</span>
47
+ </a>
48
+ <a class="doc-card" href="knowledge-kit-live.html">
49
+ <strong>Knowledge Kit Live Example</strong>
50
+ <span>End-to-end proof of the Knowledge Kit ingest + compile flows against a real Strands agent (OllamaModel / qwen3:1.7b). No API key required. Includes acceptance test with telemetry and provenance assertions.</span>
51
+ </a>
52
+ </div>
53
+
54
+ ---
55
+
56
+ ## TypeScript native-import adapter
57
+
58
+ `integrations/strands-ts/` (`@kontourai/flow-agents-strands`) is the first native-import consumer of the policy engine contract. It binds the `config-protection.js` `run()` function directly, with no subprocess on the hot path, and achieves **L2** conformance. See `integrations/strands-ts/README.md` and the [Framework Adapter](framework-adapter.html) page for the full comparison with the Python adapter.
@@ -0,0 +1,211 @@
1
+ ---
2
+ title: Knowledge Kit Live Example
3
+ ---
4
+
5
+ # Knowledge Kit Live Example
6
+
7
+ This page documents `integrations/strands/examples/knowledge_kit_live.py`: a keyless, ollama-backed end-to-end proof of the Knowledge Kit's ingest and compile flows running against a real Strands agent.
8
+
9
+ Everything on this page is grounded in the source files and in the acceptance test that was run to validate the commands. Limitations are documented honestly.
10
+
11
+ ## What it proves
12
+
13
+ The example exercises the full `knowledge.ingest` → `knowledge.compile` pipeline in a temporary workspace:
14
+
15
+ - Two raw records are created programmatically via direct Node.js subprocess calls to the kit's flow-runner (`kits/knowledge/adapters/flow-runner/index.js`).
16
+ - One raw record is created by the Strands agent calling the `capture_knowledge` tool.
17
+ - The Strands agent calls `compile_knowledge` with all three raw record IDs, producing a compiled record with verified provenance links.
18
+
19
+ Two telemetry streams are asserted:
20
+
21
+ | Stream | Path | Contents |
22
+ | --- | --- | --- |
23
+ | Kit gate telemetry | `<workspace>/.telemetry/full.jsonl` | `tool.invoke` + `tool.result` per ingest/compile gate point |
24
+ | Session telemetry | `<workspace>/.flow-agents/.telemetry/full.jsonl` | `session.start`, `turn.user`, `tool.invoke`, `tool.result`, `session.end` from FlowAgentsHooks |
25
+
26
+ ## Prerequisites
27
+
28
+ - ollama installed and `qwen3:1.7b` pulled:
29
+
30
+ ```bash
31
+ ollama pull qwen3:1.7b
32
+ ```
33
+
34
+ - Python venv with `strands-agents[ollama]` at `/tmp/strands-py-live/venv`:
35
+
36
+ ```bash
37
+ python3 -m venv /tmp/strands-py-live/venv
38
+ /tmp/strands-py-live/venv/bin/pip install 'strands-agents[ollama]'
39
+ ```
40
+
41
+ - Node.js on PATH (for the kit's ESM flow-runner and bridge script).
42
+
43
+ ## Running the example
44
+
45
+ ```bash
46
+ # From the repo root:
47
+ ollama serve &
48
+ FLOW_AGENTS_ROOT=$(pwd) \
49
+ /tmp/strands-py-live/venv/bin/python3 \
50
+ integrations/strands/examples/knowledge_kit_live.py
51
+ ```
52
+
53
+ Expected output (session IDs and UUIDs vary):
54
+
55
+ ```
56
+ === Knowledge Kit S5: Keyless Live Example ===
57
+ Repo root: /path/to/flow-agents
58
+
59
+ Node.js: v24.16.0
60
+ Workspace: /tmp/knowledge-kit-live-xxxxxxxx
61
+ Corpus: 3 doc snippets
62
+ docs/integrations/framework-adapter.md (engineering.docs)
63
+ docs/integrations/index.md (engineering.docs)
64
+ kits/knowledge/docs/README.md (research.notes)
65
+
66
+ --- Step 1: Programmatic captures (2 records) ---
67
+ docs/integrations/framework-adapter.md → <raw-id-1>
68
+ docs/integrations/index.md → <raw-id-2>
69
+
70
+ --- Step 2: Agent-driven capture ---
71
+ Agent turn: 2.9s
72
+ Reply snippet: 'The captured knowledge record has been successfully stored with ID: ...'
73
+ Raw records in store: 3
74
+
75
+ --- Step 3: Agent-driven compile ---
76
+ Agent turn: 4.3s
77
+ Reply snippet: 'The compiled knowledge record has been successfully created with ID: ...'
78
+ Compiled records in store: 1
79
+
80
+ --- Provenance verification ---
81
+ Compiled record: <compiled-id>
82
+ Source IDs present in provenance: True
83
+ Source links in graph index: 3
84
+
85
+ Kit gate telemetry (.telemetry/full.jsonl): 18 events
86
+ [tool.invoke] knowledge.ingest.classify-gate
87
+ [tool.result] knowledge.ingest.classify-gate
88
+ ...
89
+ [tool.invoke] knowledge.compile.link-gate
90
+ [tool.result] knowledge.compile.link-gate
91
+
92
+ Session telemetry (.flow-agents/.telemetry/full.jsonl): 9 events
93
+ [session.start]
94
+ [turn.user]
95
+ [tool.invoke] (capture_knowledge)
96
+ [tool.result] (capture_knowledge)
97
+ [session.end]
98
+ [turn.user]
99
+ [tool.invoke] (compile_knowledge)
100
+ [tool.result] (compile_knowledge)
101
+ [session.end]
102
+
103
+ --- Summary ---
104
+ Kit event types: ['tool.invoke', 'tool.result']
105
+ Session event types: ['session.end', 'session.start', 'tool.invoke', 'tool.result', 'turn.user']
106
+ Raw records: 3
107
+ Compiled records: 1
108
+ Provenance ok: True
109
+
110
+ Overall: PASS
111
+ ```
112
+
113
+ ## Running the acceptance test
114
+
115
+ The acceptance harness gates on three prerequisites: the ollama binary, the `qwen3:1.7b` model, and the Python venv. If any of them is absent, the test skips cleanly instead of failing.
116
+
117
+ ```bash
118
+ # Run the knowledge-kit-live acceptance test directly:
119
+ bash evals/acceptance/test_knowledge_kit_live.sh
120
+
121
+ # Or through the acceptance runner:
122
+ bash evals/acceptance/run.sh knowledge-kit-live
123
+ ```
124
+
125
+ The harness asserts:
126
+
127
+ | Assertion | What is checked |
128
+ | --- | --- |
129
+ | A1 | Example script exits 0 |
130
+ | A2 | `<workspace>/.telemetry/full.jsonl` contains `tool.invoke` + `tool.result` |
131
+ | A3 | `<workspace>/.flow-agents/.telemetry/full.jsonl` contains `session.start`, `tool.invoke`, `tool.result` |
132
+ | A4 | No `.telemetry` directory leaked to the workspace parent |
133
+ | A5 | At least 1 compiled record in the knowledge store |
134
+ | A6 | Compiled record has `source_ids` provenance referencing raw records |
135
+
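Assertions such as A2 and A3 come down to checking which event types appear in a JSONL file. A minimal sketch of that check follows. It assumes each line is a JSON object whose event type sits under an `event` key; that key name is a guess, exposed as the `type_key` parameter, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def event_types(jsonl_path, type_key="event"):
    """Collect the distinct event types found in a JSONL telemetry file."""
    types = set()
    for line in Path(jsonl_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if type_key in record:
            types.add(record[type_key])
    return types


def missing_events(jsonl_path, required, type_key="event"):
    """Return the required event types that never appear in the file, sorted."""
    return sorted(set(required) - event_types(jsonl_path, type_key))
```

An empty result from `missing_events` corresponds to a passing assertion. The check looks only at filesystem evidence, in line with the harness not judging model output.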
136
+ ## How the kit tools work
137
+
138
+ The example defines two Strands `@tool` functions that call the kit's flow-runner via Node.js subprocess:
139
+
140
+ ```python
141
+ @tool
142
+ def capture_knowledge(text: str, category: str) -> str:
143
+     """Capture raw knowledge text. Returns JSON: {"id": "<uuid>"}."""
144
+     meta_json = json.dumps({"category": category})
145
+     data = _call_node_bridge(bridge, "capture", text, meta_json, workspace=workspace)
146
+     return json.dumps(data)
147
+
148
+ @tool
149
+ def compile_knowledge(id1: str, id2: str, id3: str) -> str:
150
+     """Compile three raw records into a compiled record. Returns JSON: {"id": ...}."""
151
+     raw_ids = [i for i in [id1, id2, id3] if i and i.strip()]
152
+     data = _call_node_bridge(bridge, "compile", json.dumps(raw_ids), workspace=workspace)
153
+     return json.dumps(data)
154
+ ```
155
+
156
+ The bridge script (`_kit_bridge.mjs`) is written into the workspace at runtime. It imports the kit's ESM modules using absolute paths resolved from `FLOW_AGENTS_ROOT`:
157
+
158
+ ```javascript
159
+ import { DefaultKnowledgeStore } from "<FLOW_AGENTS_ROOT>/kits/knowledge/adapters/default-store/index.js";
160
+ import { capture, compile } from "<FLOW_AGENTS_ROOT>/kits/knowledge/adapters/flow-runner/index.js";
161
+ ```
162
+
163
+ Kit gate telemetry is written by the Node flow-runner to `<workspace>/.telemetry/full.jsonl` (via the `FLOW_AGENTS_WORKSPACE` env var). This path is separate from the FlowAgentsHooks telemetry path (`<workspace>/.flow-agents/.telemetry/full.jsonl`) — both files are asserted in the acceptance test.
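The body of `_call_node_bridge` is not shown on this page. The sketch below is one plausible shape for such a helper, not the example's actual code. It assumes the bridge takes the operation and its arguments on the command line, prints a single JSON document to stdout, and reads the workspace from `FLOW_AGENTS_WORKSPACE`. The `node_bin` and `timeout_s` parameters are additions made for this illustration.

```python
import json
import os
import subprocess


def call_node_bridge(bridge_path, operation, *args, workspace, node_bin="node", timeout_s=60):
    """Run `<node_bin> <bridge_path> <operation> <args...>` and parse stdout as JSON."""
    # Pass the workspace through the environment so the callee can locate its
    # telemetry directory (variable name taken from the paragraph above).
    env = dict(os.environ, FLOW_AGENTS_WORKSPACE=str(workspace))
    proc = subprocess.run(
        [node_bin, str(bridge_path), operation, *args],
        capture_output=True,
        text=True,
        env=env,
        cwd=str(workspace),
        timeout=timeout_s,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return json.loads(proc.stdout)
```

Arguments are passed as a list instead of a shell string, so captured text reaches the bridge unmodified, without shell quoting.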
164
+
165
+ ## Why two programmatic + one agent-driven capture
166
+
167
+ `qwen3:1.7b` (1.7B parameters) handles single-tool prompts reliably, but complex multi-capture prompts cause it to loop or produce unexpected output. The example therefore uses programmatic captures for the first two records to keep runtime bounded (~30 seconds total), and agent-driven calls for the third capture and the compile step. This gives evidence that:
168
+
169
+ - The `capture_knowledge` and `compile_knowledge` tools are callable from a real Strands agent.
170
+ - FlowAgentsHooks records session events for those calls.
171
+ - The kit's gate telemetry is written correctly for all operations regardless of call path.
172
+
173
+ The acceptance harness asserts on filesystem evidence, not on model output quality.
174
+
175
+ ## console.telemetry.json mapping
176
+
177
+ A `knowledge` flow entry is registered in `console.telemetry.json` to make knowledge flow events visible in the Flow Agents Console:
178
+
179
+ ```json
180
+ {
180
+   "id": "knowledge",
181
+   "label": "Knowledge flows",
182
+   "match": { "attribute": "flow", "includes": "knowledge." },
183
+   "titleAttribute": "title",
184
+   "detailAttributes": { ... }
186
+ }
187
+ ```
188
+
189
+ This matches telemetry events where the `flow` attribute includes `"knowledge."` — for example, the kit gate events emitted by the flow-runner use `knowledge.ingest` and `knowledge.compile` as the flow identifiers.
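Read literally, the `match` rule is a substring test on one attribute. A minimal sketch under that assumption follows; the helper name and the flat attribute dictionary are hypothetical, and the Console's real matching logic may differ.

```python
def matches_flow_entry(entry, event_attributes):
    """True when the attribute named by the entry's rule contains its `includes` text."""
    rule = entry["match"]
    value = event_attributes.get(rule["attribute"])
    return isinstance(value, str) and rule["includes"] in value


knowledge_entry = {
    "id": "knowledge",
    "match": {"attribute": "flow", "includes": "knowledge."},
}
```

With this entry, an event whose `flow` is `knowledge.ingest` matches, while an event with no `flow` attribute does not.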
190
+
191
+ ## Documented limitations
192
+
193
+ 1. **Model quality**: `qwen3:1.7b` is a 1.7B parameter model. It works for single-tool prompts but has limited reliability for complex multi-step instructions. Larger models are likely to be more reliable, but they require API keys or more memory.
194
+
195
+ 2. **Single-turn scope**: Each agent invocation covers one operation. Multi-turn chaining with full context tracking across many captures is out of scope for this sprint.
196
+
197
+ 3. **Steering seam**: The `FlowAgentsHooks` spike injects workflow steering context once at `Agent` construction time. Per-turn steering re-evaluation is not implemented. See `docs/integrations/framework-adapter.md` § Limitations for details.
198
+
199
+ 4. **Kit telemetry path**: The kit's flow-runner writes telemetry to `<workspace>/.telemetry/full.jsonl` (not the `.flow-agents/.telemetry/` subdirectory used by `FlowAgentsHooks`). Both paths are separate by design: kit telemetry captures gate-point evidence, session telemetry captures agent lifecycle events.
200
+
201
+ 5. **compile_knowledge tool signature**: The tool takes three separate `id1`, `id2`, `id3` parameters instead of a JSON array. This is because `qwen3:1.7b` does not reliably produce valid JSON array syntax when prompted. This signature change is limited to this example and does not affect the kit's flow-runner API.
202
+
203
+ ## Related references
204
+
205
+ - `integrations/strands/examples/knowledge_kit_live.py` — the example script
206
+ - `evals/acceptance/test_knowledge_kit_live.sh` — the acceptance test
207
+ - `kits/knowledge/adapters/flow-runner/index.js` — the kit flow-runner (capture + compile)
208
+ - `kits/knowledge/adapters/default-store/index.js` — the store adapter
209
+ - `kits/knowledge/kit.json` — kit manifest
210
+ - <a href="framework-adapter.html">Framework Adapter</a> — `FlowAgentsHooks` documentation and limitations
211
+ - <a href="../spec/runtime-hook-surface.html">Runtime Hook Surface spec</a> — canonical event taxonomy
@@ -0,0 +1,169 @@
1
+ ---
2
+ title: Flow Kit Authoring Guide
3
+ ---
4
+
5
+ # Flow Kit Authoring Guide
6
+
7
+ A Flow Kit is a portable workflow bundle you author once and install into any Flow Agents workspace. It lets you package one or more Flow Definitions — plus optional skills, docs, adapters, evals, and assets — under a single validated manifest. The same install, validation, and activation path that ships the built-in Builder Kit is available to your own kits.
8
+
9
+ This guide walks you from an empty directory to a validated, locally installed kit.
10
+
11
+ ## Concepts
12
+
13
+ - **Kit** — a directory with a root `kit.json` manifest and the assets it declares. The manifest is the contract; Flow Agents validates it before anything is copied.
14
+ - **Flow Definition** — a `.flow.json` file that declares steps, gates, and expected evidence. Validation of the Flow Definition semantics belongs to [Kontour Flow](https://kontourai.github.io/flow/); the kit contract delegates to it.
15
+ - **Activation** — the step that reads the installed kit and writes runtime-local files into your workspace. Today the `codex-local` adapter is the only adapter, and it activates only Flow Definition assets.
16
+
17
+ ## Directory layout
18
+
19
+ ```text
20
+ my-kit/
21
+ kit.json ← required manifest
22
+ flows/
23
+ review.flow.json ← at least one Flow Definition
24
+ docs/ ← optional
25
+ README.md
26
+ ```
27
+
28
+ All paths declared in `kit.json` must be relative to the kit directory and must not contain `..`. The kit must be fully self-contained so it can be installed from any machine or worktree.
29
+
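The path rule above can be approximated with a short check. This is a minimal sketch, not the validator Flow Agents ships: the function name `is_portable_path` is hypothetical, and it only understands POSIX-style paths, so a Windows drive path such as `C:\kit` would need extra handling.

```python
from pathlib import PurePosixPath


def is_portable_path(path: str) -> bool:
    """Return True when a declared path is relative and has no '..' segment."""
    if not path:
        return False
    parsed = PurePosixPath(path)
    # Absolute paths and parent-directory segments both break portability.
    return not parsed.is_absolute() and ".." not in parsed.parts
```

For example, `is_portable_path("flows/review.flow.json")` is true, while `"../shared/review.flow.json"` and `"/opt/kits/review.flow.json"` are both rejected.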
30
+ ## Minimal kit.json
31
+
32
+ ```json
33
+ {
34
+ "schema_version": "1.0",
35
+ "id": "my-kit",
36
+ "name": "My Kit",
37
+ "description": "A minimal kit that adds a review flow.",
38
+ "flows": [
39
+ {
40
+ "id": "my-kit.review",
41
+ "path": "flows/review.flow.json",
42
+ "description": "Review a change against agreed criteria."
43
+ }
44
+ ]
45
+ }
46
+ ```
47
+
48
+ Required fields:
49
+
50
+ | Field | Rule |
51
+ |---|---|
52
+ | `schema_version` | Must be `"1.0"` |
53
+ | `id` | Stable kebab-case string, e.g. `review-kit` |
54
+ | `name` | Non-empty display name |
55
+ | `flows` | Non-empty list; each entry must have `id` and `path` |
56
+
57
+ Optional fields: `product_name`, `description`, `skills`, `docs`, `adapters`, `evals`, `assets`. The asset fields (`skills`, `docs`, `adapters`, `evals`, `assets`) list relative asset paths or objects with `id`, `path`, and an optional `description`. They are declared for provenance, but only Flow Definition assets are activated today; the others appear in diagnostics as `skipped_assets`.
58
+
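To make the required-field rules concrete, here is a minimal pre-check sketch you could run on a parsed `kit.json` before calling the real validator. It is not the Flow Agents implementation: the function name `check_manifest`, the kebab-case regular expression, and the exact wording of the messages are assumptions for illustration.

```python
import re

# Assumed pattern for "stable kebab-case"; the real validator may be stricter or looser.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of problems found in the required kit.json fields."""
    errors = []
    if manifest.get("schema_version") != "1.0":
        errors.append('.schema_version must be "1.0"')
    kit_id = manifest.get("id")
    if not isinstance(kit_id, str) or not KEBAB_CASE.match(kit_id):
        errors.append(".id must be a stable kebab-case string")
    name = manifest.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append(".name must be a non-empty display name")
    flows = manifest.get("flows")
    if not isinstance(flows, list) or not flows:
        errors.append(".flows must be a non-empty list")
    else:
        # Every flow entry needs both an id and a path.
        for index, flow in enumerate(flows):
            if not isinstance(flow, dict) or not flow.get("id") or not flow.get("path"):
                errors.append(f"flows[{index}] must have id and path")
    return errors
```

An empty result means the required fields look plausible; it does not replace `npm run validate:source`, which also checks that declared files exist.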
59
+ ## Minimal flow file
60
+
61
+ At minimum, a Flow Definition needs `id`, `version`, `steps`, and `gates`. Steps form a linked list through their `next` fields; each gate names the step it guards and the evidence it expects.
62
+
63
+ ```json
64
+ {
65
+ "id": "my-kit.review",
66
+ "version": "1.0",
67
+ "steps": [
68
+ { "id": "review", "next": "done" },
69
+ { "id": "done", "next": null }
70
+ ],
71
+ "gates": {
72
+ "review-gate": {
73
+ "step": "review",
74
+ "expects": [
75
+ {
76
+ "id": "review-finding",
77
+ "kind": "surface.claim",
78
+ "required": true,
79
+ "description": "The change was reviewed and findings were recorded.",
80
+ "claim": {
81
+ "type": "my-kit.review.finding",
82
+ "subject": "artifact",
83
+ "accepted_statuses": ["trusted", "accepted"]
84
+ }
85
+ }
86
+ ]
87
+ }
88
+ }
89
+ }
90
+ ```
91
+
92
+ The `id` in the flow file should match the `id` declared in `kit.json`'s `flows` list. Look at `kits/builder/flows/shape.flow.json` and `kits/builder/flows/build.flow.json` in this repository for fuller examples of multi-step flows with required and optional gate evidence.
93
+
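The linked-list and gate relationships described above can be sanity-checked before validation. The sketch below is only a structural pre-check under the field names shown in the example flow file; the function name `check_flow` and its messages are hypothetical, and authoritative Flow Definition semantics remain with Kontour Flow.

```python
def check_flow(flow: dict) -> list[str]:
    """Report steps whose `next` is unknown and gates that guard unknown steps."""
    errors = []
    # Map each step id to the id it links to (None marks the end of the chain).
    steps = {step["id"]: step.get("next") for step in flow.get("steps", [])}
    for step_id, next_id in steps.items():
        if next_id is not None and next_id not in steps:
            errors.append(f"step {step_id!r} points at unknown step {next_id!r}")
    for gate_id, gate in flow.get("gates", {}).items():
        if gate.get("step") not in steps:
            errors.append(f"gate {gate_id!r} guards unknown step {gate.get('step')!r}")
    return errors
```

Run against the minimal flow file shown earlier, this returns an empty list; renaming the `done` step without updating `review`'s `next` would surface one error.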
94
+ ## Validate
95
+
96
+ Before installing or sharing a kit, run validation from the flow-agents checkout:
97
+
98
+ ```bash
99
+ npm run validate:source -- --kit path/to/my-kit
100
+ ```
101
+
102
+ This runs the same repository contract validation used by `install-local`. A validation failure exits nonzero with a diagnostic. Fix errors and re-run until it passes cleanly.
103
+
104
+ The full source-tree validation (no `--kit` flag) additionally validates the built-in catalog and Builder Kit:
105
+
106
+ ```bash
107
+ npm run validate:source --
108
+ ```
109
+
110
+ ## Install locally
111
+
112
+ Once validation passes, install the kit into a target workspace:
113
+
114
+ ```bash
115
+ npx @kontourai/flow-agents flow-kit install-local path/to/my-kit --dest /path/to/workspace
116
+ ```
117
+
118
+ `--dest` is the installed Flow Agents bundle root. When it is omitted, the command uses the current directory. From a contributor checkout of this repository, the equivalent form is `npm run flow-kit -- <command>`.
119
+
120
+ Confirm the install:
121
+
122
+ ```bash
123
+ npx @kontourai/flow-agents flow-kit list --dest /path/to/workspace
124
+ npx @kontourai/flow-agents flow-kit status my-kit --dest /path/to/workspace
125
+ ```
126
+
127
+ `list` prints one summary line per installed kit. `status` prints JSON provenance including the SHA256 content hash and `installed` or `missing` state.
128
+
129
+ To replace an existing install after you update the kit source:
130
+
131
+ ```bash
132
+ npx @kontourai/flow-agents flow-kit install-local path/to/my-kit --dest /path/to/workspace --update
133
+ ```
134
+
135
+ ## Activate
136
+
137
+ After installing, run activate to write runtime-local files into the workspace:
138
+
139
+ ```bash
140
+ npx @kontourai/flow-agents flow-kit activate --dest /path/to/workspace --format json
141
+ ```
142
+
143
+ The `codex-local` adapter is selected automatically. It writes Flow Definition copies under `.flow-agents/runtime/codex/flows/<kit-id>/` and an `activation.json` manifest. Declared `skills`, `docs`, `adapters`, `evals`, and `assets` are recorded as `skipped_assets`; this is not an error, they are simply not activated yet.
144
+
145
+ When installing through `npx @kontourai/flow-agents init` with the Codex runtime, pass `--activate-kits` to run activation as part of init:
146
+
147
+ ```bash
148
+ npx @kontourai/flow-agents init --runtime codex --dest /path/to/workspace --activate-kits --yes
149
+ ```
150
+
151
+ ## Troubleshooting
152
+
153
+ Common validation errors and fixes are documented in the [Flow Kit Repository Contract](flow-kit-repository-contract.md#common-failures). The most frequent:
154
+
155
+ - `kit.json: .schema_version must be "1.0"` — update the manifest.
156
+ - `kit.json: .id must be a stable kebab-case string` — use a lowercase id like `review-kit`.
157
+ - `kit.json: .flows must be a non-empty list` — declare at least one Flow Definition.
158
+ - `kit.json: flows[0].path points at missing Flow Definition` — add the file or fix the path.
159
+ - `kit.json: docs[0].path points at missing asset` — add the asset or remove the entry.
160
+
161
+ For path errors: all declared paths must be relative, must not contain `..`, and must point at existing files. Absolute paths are rejected because a kit must be portable between machines.
162
+
163
+ For conflicts on re-install: if you install a different source with an existing kit id, the command fails unless you pass `--update`. Use `--force` to re-copy an existing same-source install after validation.
164
+
165
+ See the [Flow Kit Repository Contract](flow-kit-repository-contract.md) for the full validation rules, registry schema, activation diagnostics, and the install/update/force semantics.
166
+
167
+ ## Direction
168
+
169
+ Flow Kits are designed to be shareable workflow units: authored once, carried across teams and workspaces. The intended growth path is distribution from git remotes and a curated catalog of Kontour-authored kits covering work modes beyond software delivery. Today, install is local-path only; remote fetch is explicitly a non-goal in this version.
@@ -152,9 +152,9 @@ The goal is not to add ceremony. The goal is to make agents more reliable while
152
152
  | [x] | Standards register | Supported standards and Flow Agents-owned formats are documented with adoption rules. |
153
153
  | [ ] | Structured workflow state | Draft schemas, contracts, validation, explicit current-session identity, delegation-safe agent event logs, sidecar writer commands, and direct workflow-skill writer instructions exist for state, acceptance, evidence, handoff, critique, release, and learning; automatic enforcement remains partial. |
154
154
  | [ ] | Context map | Generated repo/context map exists; workflow steering and core planner/worker/verifier agents now use it, but broader agent coverage remains. |
155
- | [ ] | JIT guidance | Stop hook checks sidecars; workflow steering reads `state.json`, `critique.json`, context-map availability, and high-risk state after non-subagent tools; broader file/task-aware guidance remains. |
155
+ | [ ] | JIT guidance | Stop hook checks sidecars; workflow steering reads `state.json`, `critique.json`, context-map availability, and high-risk state after non-subagent tools; the opt-in utterance evidence-check hook (ADR 0003 §9) badges unsupported agent statements via Survey; broader file/task-aware guidance remains. |
156
156
  | [x] | Sandbox policy | `context/contracts/sandbox-policy.md` and https://github.com/kontourai/flow-agents/blob/main/docs/sandbox-policy.md classify local read-only, local edit, worktree, container, cloud sandbox, and privileged integration modes. |
157
- | [ ] | Evidence integration | Evidence sidecars now carry `standard_refs` for SARIF, OpenTelemetry, JUnit/TAP, Veritas, and custom proof; a local Veritas readiness wrapper can record native Veritas reports as optional Flow Agents evidence. |
157
+ | [ ] | Evidence integration | Evidence sidecars now carry `standard_refs` for SARIF, OpenTelemetry, JUnit/TAP, Veritas, and custom proof; a local Veritas readiness wrapper records native Veritas reports as optional evidence; utterance trust reports from `@kontourai/survey` cover agent statements. |
158
158
  | [ ] | Feedback loop | Runtime telemetry, outcomes, evals, and recurring corrections feed back into docs, skills, rules, or backlog. |
159
159
  | [ ] | Export validation | Codex, Claude Code, and Kiro exports preserve the same operating layers and now install telemetry, Goal Fit, and workflow steering hook wiring; adapter output, installed-command coverage, Claude live hook influence, and Kiro live strict-stop coverage exist. |
160
160