@kontourai/flow-agents 0.1.2 → 0.2.0

Files changed (85)
  1. package/.github/dependabot.yml +23 -0
  2. package/.github/workflows/release-please.yml +31 -0
  3. package/.github/workflows/runtime-compat.yml +118 -0
  4. package/CHANGELOG.md +23 -0
  5. package/CONTRIBUTING.md +4 -0
  6. package/README.md +53 -10
  7. package/build/src/cli/init.js +215 -5
  8. package/build/src/cli/utterance-check.js +65 -1
  9. package/build/src/tools/build-universal-bundles.js +268 -0
  10. package/build/src/tools/filter-installed-packs.js +3 -0
  11. package/build/src/tools/validate-source-tree.js +5 -1
  12. package/context/scripts/telemetry/lib/config.sh +5 -1
  13. package/context/settings/flow-agents-settings.json +7 -0
  14. package/docs/context-map.md +1 -0
  15. package/docs/index.md +45 -4
  16. package/docs/integrations/conformance.md +246 -0
  17. package/docs/integrations/framework-adapter.md +275 -0
  18. package/docs/integrations/harness-install.md +213 -0
  19. package/docs/integrations/index.md +54 -0
  20. package/docs/north-star.md +2 -2
  21. package/docs/spec/runtime-hook-surface.md +472 -0
  22. package/docs/survey-utterance-check.md +211 -94
  23. package/docs/vision.md +45 -0
  24. package/evals/acceptance/run.sh +4 -2
  25. package/evals/acceptance/test_opencode_harness.sh +121 -0
  26. package/evals/acceptance/test_pi_harness.sh +98 -0
  27. package/evals/integration/test_bundle_install.sh +226 -1
  28. package/evals/integration/test_bundle_lifecycle.sh +641 -0
  29. package/evals/integration/test_utterance_check.sh +291 -44
  30. package/evals/run.sh +2 -0
  31. package/evals/static/test_universal_bundles.sh +137 -2
  32. package/integrations/strands/README.md +256 -0
  33. package/integrations/strands/example.py +74 -0
  34. package/integrations/strands/flow_agents_strands/__init__.py +27 -0
  35. package/integrations/strands/flow_agents_strands/hooks.py +194 -0
  36. package/integrations/strands/flow_agents_strands/policy.py +348 -0
  37. package/integrations/strands/flow_agents_strands/steering.py +172 -0
  38. package/integrations/strands/flow_agents_strands/telemetry.py +238 -0
  39. package/integrations/strands/pyproject.toml +38 -0
  40. package/integrations/strands/tests/__init__.py +0 -0
  41. package/integrations/strands/tests/test_hooks.py +304 -0
  42. package/integrations/strands/tests/test_policy.py +315 -0
  43. package/integrations/strands/tests/test_telemetry.py +184 -0
  44. package/integrations/strands-ts/README.md +224 -0
  45. package/integrations/strands-ts/bin/conformance-shim.mjs +257 -0
  46. package/integrations/strands-ts/package.json +53 -0
  47. package/integrations/strands-ts/src/hooks.ts +208 -0
  48. package/integrations/strands-ts/src/index.ts +22 -0
  49. package/integrations/strands-ts/src/policy.ts +345 -0
  50. package/integrations/strands-ts/src/telemetry.ts +251 -0
  51. package/integrations/strands-ts/test/test-policy.ts +322 -0
  52. package/integrations/strands-ts/test/test-telemetry.ts +226 -0
  53. package/integrations/strands-ts/tsconfig.json +20 -0
  54. package/package.json +7 -2
  55. package/packaging/conformance/README.md +142 -0
  56. package/packaging/conformance/fixtures/config-protection--allow-no-path.json +18 -0
  57. package/packaging/conformance/fixtures/config-protection--allow-safe-file.json +20 -0
  58. package/packaging/conformance/fixtures/config-protection--block-biome.json +20 -0
  59. package/packaging/conformance/fixtures/config-protection--block-eslintrc.json +20 -0
  60. package/packaging/conformance/fixtures/quality-gate--allow-no-path.json +17 -0
  61. package/packaging/conformance/fixtures/quality-gate--allow-nonexistent-file.json +19 -0
  62. package/packaging/conformance/fixtures/stop-goal-fit--allow-clean-cwd.json +17 -0
  63. package/packaging/conformance/fixtures/stop-goal-fit--block-strict-mode.json +23 -0
  64. package/packaging/conformance/fixtures/stop-goal-fit--warn-active-delivery.json +21 -0
  65. package/packaging/conformance/fixtures/workflow-steering--allow-no-state.json +16 -0
  66. package/packaging/conformance/fixtures/workflow-steering--inject-active-state.json +29 -0
  67. package/packaging/conformance/fixtures/workflow-steering--inject-subagent-steering.json +25 -0
  68. package/packaging/conformance/package.json +4 -0
  69. package/packaging/conformance/run-conformance.js +322 -0
  70. package/packaging/manifest.json +59 -0
  71. package/schemas/flow-agents-settings.schema.json +48 -0
  72. package/scripts/README.md +4 -0
  73. package/scripts/dogfood.js +16 -0
  74. package/scripts/hooks/opencode-hook-adapter.js +123 -0
  75. package/scripts/hooks/opencode-telemetry-hook.js +101 -0
  76. package/scripts/hooks/pi-hook-adapter.js +123 -0
  77. package/scripts/hooks/pi-telemetry-hook.js +105 -0
  78. package/scripts/hooks/run-hook.js +8 -0
  79. package/scripts/hooks/utterance-check.js +124 -22
  80. package/scripts/telemetry/lib/config.sh +5 -1
  81. package/src/cli/init.ts +219 -6
  82. package/src/cli/utterance-check.ts +71 -1
  83. package/src/tools/build-universal-bundles.ts +266 -0
  84. package/src/tools/filter-installed-packs.ts +3 -0
  85. package/src/tools/validate-source-tree.ts +5 -1
@@ -0,0 +1,213 @@
+ ---
+ title: Harness Install
+ ---
+
+ # Harness Install
+
+ This page walks through three harness installs: Claude Code (the L2 reference runtime), opencode, and pi. All three follow the same model (`npm run build:bundles` generates the bundle and `flow-agents init` places it), but each runtime expects different files at different paths.
+
+ ## How harness bundles work
+
+ `npm run build:bundles` generates one bundle per runtime under `dist/<runtime>/`. Each bundle contains:
+
+ - A host-specific configuration file, plugin, or extension that maps lifecycle events to invocations of the adapter and telemetry wrappers.
+ - A host-specific adapter wrapper (`<runtime>-hook-adapter.js`) that reads stdin JSON from the host, invokes `run-hook.js` with the canonical script path and profile, translates the exit code to the host-native response format, and fails open on errors.
+ - A host-specific telemetry wrapper (`<runtime>-telemetry-hook.js`) that maps host event names to canonical telemetry event names and invokes `scripts/telemetry/telemetry.sh`.
+ - An `install.sh` that places the generated files at the host-expected paths.
+
+ `flow-agents init` (from `npx @kontourai/flow-agents`) calls `install.sh` for the selected runtime.
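The adapter wrapper in the list above does three things with a hook result: it translates the exit code, passes a block reason through, and fails open on anything unexpected. That decision step can be sketched as a pure TypeScript function. This is a minimal sketch, not the shipped `<runtime>-hook-adapter.js`; it assumes `run-hook.js` follows Claude Code's hook exit-code convention (0 allows, 2 blocks with the reason on stderr), and `HookOutcome`, `HostDecision`, and `translateOutcome` are hypothetical names.

```typescript
// Sketch only: the shipped <runtime>-hook-adapter.js is not reproduced here.
// Assumption: run-hook.js uses Claude Code's exit-code convention
// (0 = allow, 2 = block with the reason on stderr).

// What the wrapper observed when it ran run-hook.js (hypothetical shape).
type HookOutcome =
  | { kind: "exited"; code: number; stderr: string }
  | { kind: "crashed"; error: string }; // spawn failure, timeout, bad stdin JSON, ...

// Host-neutral decision that a runtime-specific layer would serialize.
type HostDecision =
  | { action: "allow"; note?: string }
  | { action: "block"; reason: string };

const BLOCK_EXIT_CODE = 2; // assumed block signal

function translateOutcome(outcome: HookOutcome): HostDecision {
  if (outcome.kind === "crashed") {
    // Fail open: a broken hook must never stall the host session.
    return { action: "allow", note: `hook error ignored: ${outcome.error}` };
  }
  if (outcome.code === 0) {
    return { action: "allow" };
  }
  if (outcome.code === BLOCK_EXIT_CODE) {
    const reason = outcome.stderr.trim() || "blocked by Flow Agents policy";
    return { action: "block", reason };
  }
  // Any other non-zero exit is a hook runtime error, so it also fails open.
  return { action: "allow", note: `hook exited with ${outcome.code}; failing open` };
}

// Usage: a config-protection block versus a hook that timed out.
console.log(translateOutcome({ kind: "exited", code: 2, stderr: "refusing to edit .eslintrc\n" }));
console.log(translateOutcome({ kind: "crashed", error: "ETIMEDOUT" }));
```

Keeping the translation pure makes the fail-open rule easy to test in isolation. A real wrapper would add the stdin read, the `run-hook.js` spawn with a timeout, and serialization into the host's own response format.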
+
+ ## Claude Code
+
+ Claude Code is the L2 reference implementation. All four policy classes are wired: workflow steering, quality gate, stop-goal-fit, and config protection.
+
+ ### Install
+
+ ```bash
+ npx @kontourai/flow-agents init --runtime claude-code --dest /path/to/workspace --yes
+ ```
+
+ The install script writes hook wiring into `.claude/settings.json` inside the destination workspace. The hooks object in `settings.json` maps Claude Code lifecycle events (`UserPromptSubmit`, `PreToolUse`, `PostToolUse`, `Stop`) to shell commands that invoke the telemetry hook and the policy adapter. For `UserPromptSubmit`, the two commands are:
+
+ ```bash
+ # Telemetry hook (non-blocking)
+ bash -lc 'root="${FLOW_AGENTS_CLAUDE_CODE_ROOT:-$(pwd)}"; \
+   node "$root/scripts/hooks/claude-telemetry-hook.js" UserPromptSubmit dev'
+ # Policy hook: workflow steering, run through the adapter wrapper
+ bash -lc 'root="${FLOW_AGENTS_CLAUDE_CODE_ROOT:-$(pwd)}"; \
+   node "$root/scripts/hooks/claude-hook-adapter.js" UserPromptSubmit \
+   workflow-steering workflow-steering.js default'
+ ```
+
+ Telemetry always fires first and is always non-blocking (timeout: 10 s). Policy hooks fire second and may block on `PreToolUse` (timeout: 30 s). Both fail open on hook runtime errors.
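The ordering and timeouts described above can be sketched as the entry an installer might write for one event. This sketch assumes Claude Code's documented `settings.json` hook layout (per-event lists of matcher groups, each holding `{ type: "command", command, timeout }` entries with timeouts in seconds); the exact strings and grouping that `flow-agents init` writes may differ, and applying the 30 s policy timeout to every event is an assumption. `wrap` and `flowAgentsGroup` are hypothetical helpers.

```typescript
// Sketch of one Claude Code settings.json hooks entry (a "matcher group").
// Assumptions: Claude Code's documented hook layout with timeouts in seconds;
// the exact strings and grouping that flow-agents init writes may differ.
type CommandHook = { type: "command"; command: string; timeout: number };
type MatcherGroup = { matcher?: string; hooks: CommandHook[] };

// Builds the same bash -lc wrapper shown above. `\${` keeps the shell's
// parameter expansion literal inside the template string.
function wrap(script: string, args: string[]): string {
  return `bash -lc 'root="\${FLOW_AGENTS_CLAUDE_CODE_ROOT:-$(pwd)}"; node "$root/scripts/hooks/${script}" ${args.join(" ")}'`;
}

function flowAgentsGroup(event: string, policy?: { id: string; script: string }): MatcherGroup {
  // Telemetry is listed first with the short, non-blocking timeout.
  const hooks: CommandHook[] = [
    { type: "command", command: wrap("claude-telemetry-hook.js", [event, "dev"]), timeout: 10 },
  ];
  if (policy) {
    // Policy hook second, routed through the adapter wrapper.
    hooks.push({
      type: "command",
      command: wrap("claude-hook-adapter.js", [event, policy.id, policy.script, "default"]),
      timeout: 30,
    });
  }
  return { hooks };
}

// Usage: the UserPromptSubmit pair from the example above.
const steeringGroup = flowAgentsGroup("UserPromptSubmit", { id: "workflow-steering", script: "workflow-steering.js" });
console.log(JSON.stringify({ hooks: { UserPromptSubmit: [steeringGroup] } }, null, 2));
```

List order here records intent (telemetry first, policy second); check the Claude Code hooks documentation for how the host actually schedules several commands registered on one event.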
+
+ ### Dogfood variant (repo-local)
+
+ Inside the `flow-agents` source repo itself, the dogfood script writes hook wiring that points at the local `scripts/hooks/` directory rather than a published package:
+
+ ```bash
+ npm run dogfood -- --runtime claude-code
+ ```
+
+ The destination defaults to the repo root. Pass `--dest` to override.
+
+ ### Scope-collision warning
+
+ When `init` detects that an existing `.claude/settings.json` already has hook entries for the same lifecycle events, it emits a scope-collision warning to stderr:
+
+ ```
+ [flow-agents] WARNING: .claude/settings.json already has hooks for UserPromptSubmit.
+ Existing entries will be preserved; Flow Agents hooks will be appended.
+ Review .claude/settings.json to confirm hook ordering is correct.
+ ```
+
+ The install appends rather than replaces, so existing hooks are not removed. Review the settings file after install to confirm the ordering is what you want.
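The append-rather-than-replace behavior and the collision check can be sketched as a merge over per-event hook lists. Assumptions: hooks are keyed by event name with an array of entries per event (as in Claude Code's `settings.json`), and `mergeHooks` and `collisionWarning` are hypothetical helpers, not the installer's actual functions. The warning text is copied from the example above.

```typescript
// Sketch of the append-don't-replace merge behind the scope-collision warning.
// Assumptions: hooks are keyed by event name with an array of entries per
// event (as in Claude Code's settings.json). mergeHooks and collisionWarning
// are hypothetical helpers, not the installer's real functions.
type HookEntry = { hooks: { type: "command"; command: string }[] };
type HooksConfig = { [event: string]: HookEntry[] };

function mergeHooks(existing: HooksConfig, incoming: HooksConfig): { merged: HooksConfig; collisions: string[] } {
  const merged: HooksConfig = {};
  for (const key of Object.keys(existing)) {
    merged[key] = existing[key].slice(); // copy so the caller's object is untouched
  }
  const collisions: string[] = [];
  for (const key of Object.keys(incoming)) {
    const current = merged[key] || [];
    if (current.length > 0) collisions.push(key);
    // Existing entries keep their positions; Flow Agents entries go after them.
    merged[key] = current.concat(incoming[key]);
  }
  return { merged, collisions };
}

function collisionWarning(event: string): string {
  return [
    `[flow-agents] WARNING: .claude/settings.json already has hooks for ${event}.`,
    "Existing entries will be preserved; Flow Agents hooks will be appended.",
    "Review .claude/settings.json to confirm hook ordering is correct.",
  ].join("\n");
}

// Usage: a workspace that already has its own UserPromptSubmit hook.
const mergeResult = mergeHooks(
  { UserPromptSubmit: [{ hooks: [{ type: "command", command: "./my-prompt-hook.sh" }] }] },
  { UserPromptSubmit: [{ hooks: [{ type: "command", command: "<flow-agents workflow-steering command>" }] }] },
);
for (const ev of mergeResult.collisions) console.error(collisionWarning(ev));
```

Because existing entries keep their positions, a user hook on the same event stays ahead of the Flow Agents entry in the list, which is the ordering the warning asks you to review.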
+
+ ### Resulting file layout
+
+ ```
+ <workspace>/
+   .claude/
+     settings.json          ← hook wiring (appended by install)
+   scripts/
+     hooks/
+       claude-hook-adapter.js
+       claude-telemetry-hook.js
+       run-hook.js
+       config-protection.js
+       quality-gate.js
+       stop-goal-fit.js
+       workflow-steering.js
+
+   skills/
+
+   .flow-agents/            ← runtime workflow artifacts (not committed)
+ ```
+
+ ## opencode
+
+ opencode is an L1 adapter. It has no native `prompt.submit`-equivalent event, so workflow steering is approximated at `session.created` rather than at each user turn. This is a documented gap: see <a href="../spec/runtime-hook-surface.html">the spec, section 2.1</a>.
+
+ ### Install
+
+ ```bash
+ npx @kontourai/flow-agents init --runtime opencode --dest /path/to/workspace --yes
+ ```
+
+ ### Dogfood variant
+
+ ```bash
+ npm run dogfood -- --runtime opencode
+ ```
+
+ ### Resulting file layout
+
+ ```
+ <workspace>/
+   .opencode/
+     plugins/
+       flow-agents.js       ← auto-loaded at opencode startup
+     agents/
+       dev.md               ← agent prompts (opencode markdown format)
+       tool-planner.md
+       tool-worker.md
+
+     skills/
+       deliver.md
+       fix-bug.md
+
+   opencode.json            ← workspace instructions pointer
+   scripts/
+     hooks/
+       opencode-hook-adapter.js
+       opencode-telemetry-hook.js
+       run-hook.js
+
+   skills/
+
+ ```
+
+ `opencode.json` at the workspace root is a minimal config file:
+
+ ```json
+ {
+   "instructions": "This workspace uses Flow Agents. See AGENTS.md for conventions, skills, and workflow guidance."
+ }
+ ```
+
+ The plugin at `.opencode/plugins/flow-agents.js` is auto-loaded at opencode startup. It exports `FlowAgentsPlugin` and registers handlers for:
+
+ | opencode event | What fires |
+ | --- | --- |
+ | `session.created` | Telemetry + workflow steering (session-start context injection) |
+ | `tool.execute.before` | Telemetry + config-protection (blocking via thrown Error) |
+ | `tool.execute.after` | Telemetry + quality gate |
+ | `session.idle` | Telemetry + stop-goal-fit (warning only; not a true stop event) |
+ | `session.error`, `session.compacted`, `permission.asked`, `file.edited` | Telemetry only |
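The `tool.execute.before` row relies on opencode treating an exception thrown from that hook as a block. The sketch below shows the translation from a canonical decision to that signal, with fail-open on policy errors. It assumes simplified `(input, output)` shapes instead of importing opencode's plugin types, and `checkConfigProtection`, `safeCheck`, and the protected-file list are hypothetical stand-ins for the adapter call, chosen to echo the conformance fixture names.

```typescript
// Sketch of a tool.execute.before-style handler that blocks by throwing.
// Assumptions: the simplified (input, output) shapes stand in for opencode's
// plugin types; checkConfigProtection is a hypothetical stand-in for the
// adapter call.
type PolicyDecision = { action: "allow" } | { action: "block"; reason: string };

// Hypothetical protected-file list, echoing the conformance fixture names.
const PROTECTED_CONFIGS = [".eslintrc", ".eslintrc.json", "biome.json"];

function checkConfigProtection(filePath: string | undefined): PolicyDecision {
  if (!filePath) return { action: "allow" }; // nothing to protect without a path
  const base = filePath.split("/").pop() || filePath;
  return PROTECTED_CONFIGS.indexOf(base) !== -1
    ? { action: "block", reason: `config-protection: ${base} is a protected config file` }
    : { action: "allow" };
}

function safeCheck(filePath: string | undefined): PolicyDecision {
  try {
    return checkConfigProtection(filePath);
  } catch {
    return { action: "allow" }; // fail open on policy runtime errors
  }
}

function beforeToolExecute(input: { tool: string }, output: { args: { filePath?: string } }): void {
  const decision = safeCheck(output.args.filePath);
  if (decision.action === "block") {
    // Throwing is the blocking signal; the message tells the agent why.
    throw new Error(`${decision.reason} (tool: ${input.tool})`);
  }
}

// Usage: editing a protected file is rejected, an ordinary file is not.
try {
  beforeToolExecute({ tool: "edit" }, { args: { filePath: "web/.eslintrc.json" } });
} catch (err) {
  console.log((err as Error).message);
}
beforeToolExecute({ tool: "edit" }, { args: { filePath: "src/app.ts" } });
```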
+
+ **Accepted gaps**: opencode has no `prompt.submit` hook, so workflow steering fires only on `session.created`, not at each user turn. `session.idle` is the closest event to a stop hook, but it does not reliably fire on session completion. These gaps are declared in the adapter's L1 conformance level and in the plugin source comments.
+
+ **Agents**: opencode receives agent prompts as markdown files in `.opencode/agents/`. The main orchestrator is `dev.md`; specialist tools (planner, worker, reviewer, etc.) are additional markdown files in the same directory.
+
+ ## pi
+
+ pi is an L1 adapter. It has no stop hook, so stop-goal-fit cannot gate the end of a session; it runs only as a warning at `session_shutdown`. This is a documented gap: see <a href="../spec/runtime-hook-surface.html">the spec, section 2.3</a>.
+
+ ### Install
+
+ ```bash
+ npx @kontourai/flow-agents init --runtime pi --dest /path/to/workspace --yes
+ ```
+
+ ### Dogfood variant
+
+ ```bash
+ npm run dogfood -- --runtime pi
+ ```
+
+ ### Resulting file layout
+
+ ```
+ <workspace>/
+   .pi/
+     extensions/
+       flow-agents.ts       ← auto-discovered at startup (needs project trust)
+     skills/
+       deliver.md
+       fix-bug.md
+
+   AGENTS.md                ← agent instructions (pi uses AGENTS.md, not a registry)
+   scripts/
+     hooks/
+       pi-hook-adapter.js
+       pi-telemetry-hook.js
+       run-hook.js
+
+   skills/
+
+ ```
+
+ The extension at `.pi/extensions/flow-agents.ts` is auto-discovered at startup. It registers handlers for:
+
+ | pi event | What fires |
+ | --- | --- |
+ | `session_start` | Telemetry |
+ | `before_agent_start` | Telemetry + workflow steering (injects context into system prompt) |
+ | `tool_call` | Telemetry + config-protection (blocking via `{ block: true }` return) |
+ | `tool_result` | Telemetry + quality gate |
+ | `session_shutdown` | Telemetry + stop-goal-fit (warning only; not a true stop event) |
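pi's `tool_call` row blocks by returning a value rather than throwing, and the `session_shutdown` row can only warn. A minimal sketch of both translations, assuming a handler may return `{ block: true, reason }` (the `reason` field is an assumption) and returns nothing to let the call proceed; `PolicyResult`, `safeRun`, `toPiToolCallResult`, and `onSessionShutdown` are hypothetical names, not pi's extension API.

```typescript
// Sketch of pi-side translation for two rows of the table above.
// Assumptions: a tool_call handler can return { block: true, reason } to stop
// the call (reason is assumed) and returns nothing to let it run.
type PolicyResult = { verdict: "allow" | "warn" | "block"; message?: string };
type PiToolCallResult = { block: true; reason: string } | undefined;

function safeRun(runPolicy: () => PolicyResult): PolicyResult {
  try {
    return runPolicy();
  } catch {
    return { verdict: "allow" }; // fail open on policy runtime errors
  }
}

function toPiToolCallResult(runPolicy: () => PolicyResult): PiToolCallResult {
  const result = safeRun(runPolicy);
  return result.verdict === "block"
    ? { block: true, reason: result.message || "blocked by Flow Agents policy" }
    : undefined;
}

// session_shutdown is not a true stop event, so stop-goal-fit can only warn:
// even a "block" verdict is reported, never enforced.
function onSessionShutdown(runPolicy: () => PolicyResult, warn: (message: string) => void): void {
  const result = safeRun(runPolicy);
  if (result.verdict !== "allow") {
    warn(result.message || "stop-goal-fit: goal not confirmed before shutdown");
  }
}

// Usage: a blocked tool call, then a shutdown-time warning.
console.log(toPiToolCallResult(() => ({ verdict: "block", message: "config-protection: biome.json is protected" })));
onSessionShutdown(() => ({ verdict: "block", message: "acceptance criteria still open" }), (m) => console.warn(m));
```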
+
+ **Accepted gaps**: pi has no stop hook. `session_shutdown` is used as the closest equivalent, but it does not carry the same semantics as a stop event. This gap is declared in the adapter's L1 conformance level and in the extension source comments.
+
+ **Agents**: pi has no named-subagent registry. Agent guidance is delivered through `AGENTS.md` at the workspace root, plus the skills in `.pi/skills/` and the extension. A comment in the `flow-agents.ts` extension states this explicitly: "pi has no named-subagent registry. Agents are not exported for pi."
+
+ ### Scope-collision warning
+
+ This works the same way as for Claude Code: if an existing `.pi/extensions/` directory contains a file with conflicting event registrations, `init` warns and appends rather than replacing. Review the extension file after install.
+
+ ## Related references
+
+ - `dist/opencode/`: generated opencode bundle (do not edit by hand)
+ - `dist/pi/`: generated pi bundle (do not edit by hand)
+ - `dist/claude-code/`: generated Claude Code bundle (do not edit by hand)
+ - `scripts/hooks/run-hook.js`: canonical hook runner
+ - <a href="../spec/runtime-hook-surface.html">Runtime Hook Surface spec</a>: event taxonomy, policy classes, conformance levels
+ - <a href="conformance.html">Conformance</a>: how to self-certify a new adapter
@@ -0,0 +1,54 @@
+ ---
+ title: Integration Examples
+ ---
+
+ # Integration Examples
+
+ Flow Agents reaches host runtimes and agent frameworks through two distinct distribution models. This section provides worked examples for each model and a guide to the conformance kit for third-party adapter authors.
+
+ ## Distribution models at a glance
+
+ **Harness runtimes** ship as self-contained bundles under `dist/<runtime>/`. The `npm run build:bundles` command generates each bundle from the canonical manifest and policy scripts. `flow-agents init` (or the dogfood variant) places the generated files at the host-expected paths inside a target workspace. Claude Code, Codex, Kiro, opencode, and pi are harness adapters.
+
+ **Framework adapters** live in `integrations/<name>/` as language-native packages. They register Flow Agents callbacks with the framework's lifecycle system using the framework's native registration API. `integrations/strands/` is the reference implementation: `flow-agents-strands` is a Python `HookProvider` that wires into AWS Strands Agents without requiring the Strands SDK at import time.
+
+ **Third-party adapters** self-certify by running the conformance kit in `packaging/conformance/`. The kit provides golden fixtures and a runner that pipes each fixture through the adapter command and reports a verdict for each conformance level.
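The compare step of such a runner can be sketched in a few lines. Assumptions: the expected verdict is read from the fixture file name (`<policy>--<verdict>-<case>.json`, the pattern the shipped fixtures in `packaging/conformance/fixtures/` follow), whereas the real `run-conformance.js` may read it from the fixture body; `actual` stands in for whatever the adapter under test returned, and all function names are hypothetical.

```typescript
// Sketch of the compare-and-tally step of a conformance run.
// Assumption: expected verdicts come from fixture file names such as
// "config-protection--block-eslintrc.json"; the real runner may differ.
type Verdict = "allow" | "block" | "warn" | "inject";
type Check = { fixture: string; policy: string; expected: Verdict; actual: Verdict; pass: boolean };
type Tally = { passed: number; failed: number };

const VERDICTS: Verdict[] = ["allow", "block", "warn", "inject"];

function parseFixtureName(file: string): { policy: string; expected: Verdict } {
  // <policy>--<verdict>-<case>.json; the policy part may itself contain hyphens.
  const match = /^([a-z-]+?)--([a-z]+)-.+\.json$/.exec(file);
  if (!match || VERDICTS.indexOf(match[2] as Verdict) === -1) {
    throw new Error(`unrecognised fixture name: ${file}`);
  }
  return { policy: match[1], expected: match[2] as Verdict };
}

function checkFixture(fixture: string, actual: Verdict): Check {
  const parsed = parseFixtureName(fixture);
  return { fixture, policy: parsed.policy, expected: parsed.expected, actual, pass: parsed.expected === actual };
}

function tallyByPolicy(checks: Check[]): { [policy: string]: Tally } {
  const tallies: { [policy: string]: Tally } = {};
  for (const c of checks) {
    if (!tallies[c.policy]) tallies[c.policy] = { passed: 0, failed: 0 };
    if (c.pass) tallies[c.policy].passed++;
    else tallies[c.policy].failed++;
  }
  return tallies;
}

// Usage: two config-protection fixtures pass; a strict-mode stop fixture fails
// because the adapter only warned.
console.log(tallyByPolicy([
  checkFixture("config-protection--block-eslintrc.json", "block"),
  checkFixture("config-protection--allow-safe-file.json", "allow"),
  checkFixture("stop-goal-fit--block-strict-mode.json", "warn"),
]));
```

One plausible way to turn these tallies into a per-level verdict is to require zero failures for every policy class that the level needs.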
+
+ ## Conformance levels
+
+ | Level | What is required |
+ | --- | --- |
+ | L0 | Telemetry only; at minimum, `agentSpawn` fires on session start |
+ | L1 | L0 plus workflow steering and stop-goal-fit in warning mode |
+ | L2 | L1 plus config protection (blocking) and quality gate; this is the reference level |
+
+ Claude Code and Codex are L2 reference implementations. opencode is L1 (no prompt-submit hook). pi is L1 (no stop hook). The Python Strands adapter is L0 plus config protection via `BeforeToolCallEvent` cancellation.
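The table reads as cumulative requirements, and the runtime ratings above follow from which policy classes each host can wire at full strength. A sketch of that reading, under assumptions: `Support`, `Capabilities`, and `levelFor` are hypothetical, `"approximated"` models a declared gap such as opencode's steering at `session.created`, and the Runtime Hook Surface spec may grade edge cases differently.

```typescript
// Sketch of reading a conformance level off a capability declaration.
// All names are hypothetical; "approximated" models a declared gap.
type Support = "full" | "approximated" | "missing";
type Capabilities = {
  telemetry: Support;
  workflowSteering: Support;
  stopGoalFit: Support;
  configProtection: Support;
  qualityGate: Support;
};

function levelFor(c: Capabilities): "none" | "L0" | "L1" | "L2" {
  if (c.telemetry === "missing") return "none";
  // L1 needs steering and stop-goal-fit wired, even if only approximately.
  if (c.workflowSteering === "missing" || c.stopGoalFit === "missing") return "L0";
  // L2 needs every class wired at full strength, with no declared gaps.
  const all: Support[] = [c.telemetry, c.workflowSteering, c.stopGoalFit, c.configProtection, c.qualityGate];
  return all.every((s) => s === "full") ? "L2" : "L1";
}

// Usage: opencode's declared gaps.
const opencodeCaps: Capabilities = {
  telemetry: "full",
  workflowSteering: "approximated", // session.created, not each user turn
  stopGoalFit: "approximated", // session.idle, warning only
  configProtection: "full",
  qualityGate: "full",
};
console.log(levelFor(opencodeCaps)); // → L1
```

Under the same rules, a declaration with only telemetry and config protection (the Python Strands adapter's shape) comes out at L0, matching the rating above.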
+
+ The <a href="../spec/runtime-hook-surface.html">Runtime Hook Surface spec</a> defines the canonical event taxonomy, policy classes, conformance levels, and engine contract in full.
+
+ ## Pages in this section
+
+ <div class="doc-grid">
+   <a class="doc-card" href="harness-install.html">
+     <strong>Harness Install</strong>
+     <span>Worked examples for installing into a Claude Code project and into the two newest runtimes, opencode and pi, including the dogfood variant and scope-collision warning behavior.</span>
+   </a>
+   <a class="doc-card" href="framework-adapter.html">
+     <strong>Framework Adapter</strong>
+     <span>Worked example based on <code>integrations/strands/</code>: constructing <code>FlowAgentsHooks</code>, the telemetry it emits, the engine-contract binding for policy, and documented limitations.</span>
+   </a>
+   <a class="doc-card" href="conformance.html">
+     <strong>Conformance</strong>
+     <span>How a third-party adapter self-certifies: the engine contract 1.0, running the conformance runner, what each level requires, and how to declare gaps.</span>
+   </a>
+   <a class="doc-card" href="../spec/runtime-hook-surface.html">
+     <strong>Runtime Hook Surface Spec</strong>
+     <span>Canonical event taxonomy, four policy classes, conformance levels L0/L1/L2, mapping tables, and the engine contract for adapter authors.</span>
+   </a>
+ </div>
+
+ ---
+
+ ## TypeScript native-import adapter
+
+ `integrations/strands-ts/` (`@kontourai/flow-agents-strands`) is the first native-import consumer of the policy engine contract. It binds the `run()` function from `config-protection.js` directly, so there is no subprocess on the hot path, and it achieves **L2** conformance. See `integrations/strands-ts/README.md` and the [Framework Adapter](framework-adapter.html) page for the full comparison with the Python adapter.
@@ -152,9 +152,9 @@ The goal is not to add ceremony. The goal is to make agents more reliable while
  | [x] | Standards register | Supported standards and Flow Agents-owned formats are documented with adoption rules. |
  | [ ] | Structured workflow state | Draft schemas, contracts, validation, explicit current-session identity, delegation-safe agent event logs, sidecar writer commands, and direct workflow-skill writer instructions exist for state, acceptance, evidence, handoff, critique, release, and learning; automatic enforcement remains partial. |
  | [ ] | Context map | Generated repo/context map exists; workflow steering and core planner/worker/verifier agents now use it, but broader agent coverage remains. |
- | [ ] | JIT guidance | Stop hook checks sidecars; workflow steering reads `state.json`, `critique.json`, context-map availability, and high-risk state after non-subagent tools; broader file/task-aware guidance remains. |
+ | [ ] | JIT guidance | Stop hook checks sidecars; workflow steering reads `state.json`, `critique.json`, context-map availability, and high-risk state after non-subagent tools; the opt-in utterance evidence-check hook (ADR 0003 §9) badges unsupported agent statements via Survey; broader file/task-aware guidance remains. |
  | [x] | Sandbox policy | `context/contracts/sandbox-policy.md` and https://github.com/kontourai/flow-agents/blob/main/docs/sandbox-policy.md classify local read-only, local edit, worktree, container, cloud sandbox, and privileged integration modes. |
- | [ ] | Evidence integration | Evidence sidecars now carry `standard_refs` for SARIF, OpenTelemetry, JUnit/TAP, Veritas, and custom proof; a local Veritas readiness wrapper can record native Veritas reports as optional Flow Agents evidence. |
+ | [ ] | Evidence integration | Evidence sidecars now carry `standard_refs` for SARIF, OpenTelemetry, JUnit/TAP, Veritas, and custom proof; a local Veritas readiness wrapper records native Veritas reports as optional evidence; utterance trust reports from `@kontourai/survey` cover agent statements. |
  | [ ] | Feedback loop | Runtime telemetry, outcomes, evals, and recurring corrections feed back into docs, skills, rules, or backlog. |
  | [ ] | Export validation | Codex, Claude Code, and Kiro exports preserve the same operating layers and now install telemetry, Goal Fit, and workflow steering hook wiring; adapter output, installed-command coverage, Claude live hook influence, and Kiro live strict-stop coverage exist. |