@kontourai/flow-agents 0.2.0 → 0.4.0

Files changed (53)
  1. package/.github/workflows/release-please.yml +13 -1
  2. package/.github/workflows/runtime-compat.yml +1 -1
  3. package/AGENTS.md +8 -1
  4. package/CHANGELOG.md +41 -0
  5. package/README.md +38 -19
  6. package/build/src/cli/flow-kit.js +9 -4
  7. package/build/src/cli/runtime-adapter.js +9 -5
  8. package/build/src/cli/telemetry-doctor.js +4 -1
  9. package/build/src/runtime-adapters.js +34 -0
  10. package/build/src/tools/build-universal-bundles.js +18 -1
  11. package/console.telemetry.json +115 -20
  12. package/docs/_layouts/default.html +2 -0
  13. package/docs/index.md +8 -0
  14. package/docs/integrations/index.md +4 -0
  15. package/docs/integrations/knowledge-kit-live.md +211 -0
  16. package/docs/kit-authoring-guide.md +169 -0
  17. package/docs/spec/runtime-hook-surface.md +56 -3
  18. package/evals/acceptance/run.sh +10 -1
  19. package/evals/acceptance/test_knowledge_kit_live.sh +221 -0
  20. package/evals/acceptance/test_pi_harness.sh +15 -0
  21. package/evals/integration/test_runtime_adapter_activation.sh +113 -1
  22. package/evals/static/test_universal_bundles.sh +10 -0
  23. package/integrations/strands/examples/knowledge_kit_live.py +461 -0
  24. package/integrations/strands/flow_agents_strands/steering.py +54 -1
  25. package/integrations/strands/tests/test_hooks.py +88 -0
  26. package/integrations/strands-ts/src/hooks.ts +104 -0
  27. package/integrations/strands-ts/test/test-steering.ts +159 -0
  28. package/kits/catalog.json +6 -0
  29. package/kits/knowledge/adapters/default-store/index.js +902 -0
  30. package/kits/knowledge/adapters/flow-runner/index.js +1469 -0
  31. package/kits/knowledge/adapters/flow-runner/telemetry.js +174 -0
  32. package/kits/knowledge/adapters/similarity-vector/index.js +284 -0
  33. package/kits/knowledge/docs/README.md +328 -0
  34. package/kits/knowledge/docs/store-contract.md +650 -0
  35. package/kits/knowledge/evals/consolidation/suite.test.js +1234 -0
  36. package/kits/knowledge/evals/contract-suite/suite.test.js +675 -0
  37. package/kits/knowledge/evals/ingest-compile/suite.test.js +574 -0
  38. package/kits/knowledge/evals/retirement/suite.test.js +1173 -0
  39. package/kits/knowledge/evals/similarity-vector/suite.test.js +685 -0
  40. package/kits/knowledge/evals/synthesis/suite.test.js +916 -0
  41. package/kits/knowledge/flows/compile.flow.json +60 -0
  42. package/kits/knowledge/flows/consolidate.flow.json +77 -0
  43. package/kits/knowledge/flows/ingest.flow.json +60 -0
  44. package/kits/knowledge/flows/retire.flow.json +77 -0
  45. package/kits/knowledge/flows/store-contract.flow.json +48 -0
  46. package/kits/knowledge/flows/synthesize.flow.json +77 -0
  47. package/kits/knowledge/kit.json +98 -0
  48. package/package.json +1 -1
  49. package/src/cli/flow-kit.ts +10 -4
  50. package/src/cli/runtime-adapter.ts +10 -5
  51. package/src/cli/telemetry-doctor.ts +4 -1
  52. package/src/runtime-adapters.ts +35 -0
  53. package/src/tools/build-universal-bundles.ts +18 -1
@@ -0,0 +1,328 @@
1
+ ---
2
+ title: Knowledge Kit
3
+ ---
4
+
5
+ # Knowledge Kit
6
+
7
+ A Flow Kit for durable, gated knowledge storage. The kit defines a store contract with precise
8
+ record types, mutation operations, and provenance requirements — then ships a default adapter
9
+ backed by markdown files, YAML frontmatter, `[[wikilink]]` inline links, and a JSON graph
10
+ index.
11
+
12
+ Any storage backend can adopt this kit by implementing the contract; no fork of the kit's flows is required.
13
+
14
+ ---
15
+
16
+ ## Contract Summary
17
+
18
+ See [`store-contract.md`](store-contract.md) for the full specification. Quick reference:
19
+
20
+ **Record types**
21
+
22
+ | Type | Purpose |
23
+ |---|---|
24
+ | `raw` | Unprocessed source material — excerpts, transcripts, URLs with notes. |
25
+ | `compiled` | Normalized, editor-reviewed distillations of raw records. |
26
+ | `concept` | Named ideas or principles that other records reference. |
27
+ | `snapshot` | Bounded decision summary for a topic (Addendum A). |
28
+
29
+ **Record status lifecycle**
30
+
31
+ | Status | Meaning | Default |
32
+ |---|---|---|
33
+ | `active` | Live, part of the working set. | Yes (records without a `status` field are treated as active). |
34
+ | `implemented` | Decision was shipped; transitional state before archival. | No |
35
+ | `retired` | Excluded from default working-set queries; history preserved. | No |
36
+
37
+ **Mutation operations**
38
+
39
+ | Op | Required evidence fields |
40
+ |---|---|
41
+ | `create` | `type`, `title`, `body`, `category`, `provenance.agent` |
42
+ | `update` | `agent` (in evidence) + at least one mutable field |
43
+ | `link` | `agent`, non-empty `links` array |
44
+ | `propose` | `agent`, `proposal` (non-empty) |
45
+ | `apply` | `agent`, `new_body` (non-empty), `rationale` (non-empty) |
46
+ | `reject` | `agent`, `reason` (non-empty) |
47
+ | `supersede` | `agent`, `rationale` (non-empty), non-empty `supersededIds` array |
48
+ | `retire` | `agent`, `rationale` (non-empty), `implementedByRef` (when `targetStatus="implemented"`) |
49
+
50
+ Every mutation throws with `error.code === "MISSING_EVIDENCE"` when required evidence is absent.
51
+
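The evidence gate described above can be sketched as a small validator. This is an illustration only, not the adapter's actual code: `requireEvidence` and `tryReject` are hypothetical names, and the only details taken from this README are the required fields for `reject` and the `MISSING_EVIDENCE` error code.

```javascript
// Hypothetical helper (not part of the kit): throws when a required
// evidence field is absent, an empty string, or an empty array.
function requireEvidence(op, evidence, requiredFields) {
  const missing = requiredFields.filter((field) => {
    const value = evidence?.[field];
    if (Array.isArray(value)) return value.length === 0;
    if (typeof value === 'string') return value.trim() === '';
    return value === undefined || value === null;
  });
  if (missing.length > 0) {
    const error = new Error(`${op}: missing evidence fields: ${missing.join(', ')}`);
    error.code = 'MISSING_EVIDENCE'; // error code documented in this README
    throw error;
  }
}

// Caller side: branch on error.code rather than on the message text.
function tryReject(evidence) {
  try {
    requireEvidence('reject', evidence, ['agent', 'reason']);
    return 'ok';
  } catch (error) {
    if (error.code === 'MISSING_EVIDENCE') return 'missing-evidence';
    throw error; // anything else is unexpected
  }
}
```

Branching on `error.code` keeps callers independent of the wording of the error message.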
52
+ ---
53
+
54
+ ## Running the Contract Suite
55
+
56
+ The contract suite is a `node:test` suite parameterized by adapter.
57
+
58
+ ### Default adapter (runs automatically)
59
+
60
+ ```bash
61
+ node --test kits/knowledge/evals/contract-suite/suite.test.js
62
+ ```
63
+
64
+ ### Alternative adapter
65
+
66
+ ```bash
67
+ KNOWLEDGE_ADAPTER=/path/to/my-adapter.js \
68
+ node --test kits/knowledge/evals/contract-suite/suite.test.js
69
+ ```
70
+
71
+ Or via CLI flag:
72
+
73
+ ```bash
74
+ node --test kits/knowledge/evals/contract-suite/suite.test.js \
75
+ -- --adapter=/path/to/my-adapter.js
76
+ ```
77
+
78
+ ### Expected output (default adapter)
79
+
80
+ All tests pass and the process exits with code 0. Any failure indicates a contract regression or an adapter gap.
81
+
82
+ ---
83
+
84
+ ## Acceptance Criteria Mapping
85
+
86
+ | AC | Requirement | Evidence |
87
+ |---|---|---|
88
+ | AC1 | Contract doc exists; mutation ops enumerate evidence fields. | [`docs/store-contract.md`](store-contract.md): §6 covers the mutation ops with required-field tables. |
89
+ | AC2 | Default adapter passes contract suite (command evidence). | Run `node --test kits/knowledge/evals/contract-suite/suite.test.js` — exit 0, all tests pass. |
90
+ | AC3 | Record round-trips raw → stored → queried with category + links intact. | Suite §2 "create: round-trip raw → stored → queried" tests this directly. |
91
+
92
+ ---
93
+
94
+ ## Default Adapter Details
95
+
96
+ Located at `adapters/default-store/index.js`. Zero runtime dependencies; uses Node.js
97
+ built-ins only.
98
+
99
+ **Storage layout**
100
+
101
+ ```
102
+ <store_root>/
103
+ records/
104
+ <id>.md ← one markdown file per record, YAML frontmatter + body
105
+ graph-index.json ← forward + reverse link index, schema_version 1.0
106
+ ```
107
+
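To make the forward and reverse link index concrete, here is a minimal sketch of how such an index could be derived from `[[wikilink]]` references in record bodies. The index shape (`forward` and `reverse` maps keyed by record id), the string form of `schema_version`, and the function names are assumptions for illustration; the real layout of `graph-index.json` is defined by the adapter and the store contract.

```javascript
// Collect the targets of [[wikilink]] references in a markdown body.
// Assumption: the text inside the brackets is the target record id.
function extractWikilinks(body) {
  const targets = [];
  for (const match of body.matchAll(/\[\[([^\]]+)\]\]/g)) {
    targets.push(match[1].trim());
  }
  return targets;
}

// Build a forward index (source -> targets) and a reverse index
// (target -> sources) in one pass over the records.
function buildGraphIndex(records) {
  const index = { schema_version: '1.0', forward: {}, reverse: {} };
  for (const { id, body } of records) {
    const targets = [...new Set(extractWikilinks(body))]; // de-duplicate repeats
    index.forward[id] = targets;
    for (const target of targets) {
      (index.reverse[target] ??= []).push(id);
    }
  }
  return index;
}
```

Keeping both directions in the index means a lookup like `getLinks(id)` would not need to scan every record to find incoming links.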
108
+ **Constructor**
109
+
110
+ ```js
111
+ import DefaultKnowledgeStore from './adapters/default-store/index.js';
112
+ const store = new DefaultKnowledgeStore({ storeRoot: '/path/to/store' });
113
+ ```
114
+
115
+ **Interface** (all methods return Promises):
116
+
117
+ - `create(input)` → `Promise<string>` (new id)
118
+ - `update(id, fields, evidence)` → `Promise<void>`
119
+ - `link(sourceId, links, evidence)` → `Promise<void>`
120
+ - `propose(conceptId, proposerId, evidence)` → `Promise<void>`
121
+ - `apply(conceptId, proposerId, evidence)` → `Promise<void>`
122
+ - `reject(conceptId, proposerId, evidence)` → `Promise<void>`
123
+ - `get(id)` → `Promise<Record | null>`
124
+ - `getLinks(id)` → `Promise<{ forward: Link[], reverse: Link[] }>`
125
+ - `listByCategory(category, options?)` → `Promise<Record[]>`
126
+ - `listByType(type, options?)` → `Promise<Record[]>`
127
+
128
+ ---
129
+
130
+ ## Flow
131
+
132
+ The kit's foundation flow is:
133
+
134
+ **`knowledge.store-contract`** gates on three evidence claims before a store implementation
135
+ is accepted: contract-suite pass, provenance-enforcement pass, and round-trip integrity pass.
136
+ The other flows shipped under `flows/` (`ingest`, `compile`, `synthesize`, `consolidate`, and
137
+ `retire`) build on this flow and the adapter infrastructure.
138
+
139
+ ---
140
+
141
+ ## Decision Lifecycle — Retiring Records (S7)
142
+
143
+ Implemented or obsolete records can be retired from the working set via the `knowledge.retire`
144
+ flow. Retirement is **non-destructive**: the record body, links, and creation provenance remain
145
+ intact; the record is simply excluded from the default working set.
146
+
147
+ ### Status transitions
148
+
149
+ | From | To | Evidence required |
150
+ |---|---|---|
151
+ | `active` | `implemented` | `rationale` (non-empty) + `implementedByRef` (non-empty ref to implementing artifact) |
152
+ | `active` | `retired` | `rationale` (non-empty) |
153
+ | `implemented` | `retired` | `rationale` (non-empty) |
154
+ | `retired` | *(any)* | Invalid — `retired` is terminal |
155
+
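The transition table can be read as a small state machine. The sketch below encodes it as a lookup plus the evidence checks from the table; `checkRetireTransition` is a hypothetical name, and returning a result object (rather than throwing) is a choice made here to avoid guessing at the kit's error codes for invalid transitions.

```javascript
// Allowed status transitions, taken from the table above.
const ALLOWED_TRANSITIONS = new Map([
  ['active', ['implemented', 'retired']],
  ['implemented', ['retired']],
  ['retired', []], // terminal
]);

const isBlank = (value) => typeof value !== 'string' || value.trim() === '';

// Hypothetical pre-check for a retire request.
function checkRetireTransition(record, { targetStatus, rationale, implementedByRef } = {}) {
  const from = record.status ?? 'active'; // no status field means active
  const allowed = ALLOWED_TRANSITIONS.get(from) ?? [];
  if (!allowed.includes(targetStatus)) {
    return { ok: false, reason: `invalid transition: ${from} -> ${targetStatus}` };
  }
  if (isBlank(rationale)) {
    return { ok: false, reason: 'rationale is required' };
  }
  if (targetStatus === 'implemented' && isBlank(implementedByRef)) {
    return { ok: false, reason: 'implementedByRef is required' };
  }
  return { ok: true, from, to: targetStatus };
}
```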
156
+ ### Working-set exclusion
157
+
158
+ Retired records are excluded from:
159
+
160
+ - `listByType(type)` — default query
161
+ - `listByCategory(category, options)` — default query
162
+ - `defaultSimilarityDetector` — default cluster candidates
163
+ - `createVectorSimilarityDetector` — vector cluster candidates
164
+
165
+ Add `{ includeRetired: true }` to any query's options to include retired records in the results.
166
+
167
+ `get(id)` **always** returns the full record regardless of status.
168
+
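As a rough illustration of the exclusion rule, the filter below drops retired records unless the caller opts in. `filterWorkingSet` is a hypothetical helper, not an adapter method; it assumes records expose an optional `status` field as described in the lifecycle table.

```javascript
// Hypothetical working-set filter: retired records are hidden by default.
function filterWorkingSet(records, { includeRetired = false } = {}) {
  if (includeRetired) return records.slice();
  // Records without a status field count as active.
  return records.filter((record) => (record.status ?? 'active') !== 'retired');
}
```

Note that `implemented` records stay in the default result here, since only `retired` is listed as excluded.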
169
+ ### Using the retire flow
170
+
171
+ ```js
172
+ import { KnowledgeFlowRunner } from './adapters/flow-runner/index.js';
173
+
174
+ const runner = new KnowledgeFlowRunner({ store, workspace });
175
+
176
+ // Retire a compiled decision record that was implemented
177
+ const result = await runner.retire(compiledId, {
178
+   targetStatus: 'implemented',
179
+   rationale: 'REST API shipped in v1.0 (PR #42).',
180
+   implementedByRef: 'https://github.com/org/repo/pull/42',
181
+   decision: 'apply',
182
+ });
183
+
184
+ // Retire an obsolete concept record
185
+ await runner.retire(conceptId, {
186
+   targetStatus: 'retired',
187
+   rationale: 'Superseded by new architecture decision in ADR-007.',
188
+   decision: 'apply',
189
+ });
190
+
191
+ // Reject a retirement proposal (status unchanged)
192
+ await runner.retire(recordId, {
193
+   targetStatus: 'retired',
194
+   rationale: 'Proposing retirement.',
195
+   decision: 'reject',
196
+   rejectReason: 'Still needed for reference.',
197
+ });
198
+ ```
199
+
200
+ ### Accessing retired records with provenance
201
+
202
+ ```js
203
+ // Always works — returns full record including retirement evidence
204
+ const record = await store.get(retiredId);
205
+ console.log(record.status); // "retired"
206
+ console.log(record.mutation_log); // includes retire entry with rationale
207
+
208
+ // Query all records of a type, including retired ones
209
+ const allCompiled = await store.listByType('compiled', { includeRetired: true });
210
+
211
+ // Get history from snapshot provenance
212
+ const snapshot = await store.get(snapshotId);
213
+ for (const srcId of snapshot.provenance.source_ids) {
214
+   const src = await store.get(srcId); // works even if src is retired
215
+   console.log(src.id, src.status);
216
+ }
217
+ ```
218
+
219
+ ---
220
+
221
+ ## Non-Goals (this iteration)
222
+
223
+ - Vector/semantic retrieval (parked as I10)
224
+ - Multi-user concurrency
225
+ - Store migrations
226
+ - Personal-KB import (parked as I11)
227
+
228
+ ---
229
+
230
+ ## Similarity Detectors
231
+
232
+ The `synthesize` and `consolidate` flows accept a pluggable `similarityDetector` option. A
233
+ detector has the signature:
234
+
235
+ ```js
236
+ async (concept: Record, candidates: Record[], store: KnowledgeStoreAdapter) => Promise<string[]>
237
+ ```
238
+
239
+ It receives the target concept, all compiled candidates, and the store; it returns the IDs of
240
+ candidates that are similar enough to form a cluster. The `KnowledgeFlowRunner` uses the cluster
241
+ as its evidence base — an empty cluster throws `MISSING_EVIDENCE` at the detect-cluster gate.
242
+
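A custom detector only has to match that signature. The following sketch clusters candidates that share the concept's exact category; `sameCategoryDetector` is a hypothetical example, and it assumes records expose `id` and `category` fields as used elsewhere in this README. It ignores the `store` argument, which a richer detector might use for link lookups.

```javascript
// Hypothetical detector: a candidate joins the cluster when its category
// equals the concept's category. Returns candidate ids, as the signature requires.
const sameCategoryDetector = async (concept, candidates, _store) =>
  candidates
    .filter((candidate) => candidate.id !== concept.id)
    .filter((candidate) => candidate.category === concept.category)
    .map((candidate) => candidate.id);
```

Returning an empty array is a legitimate result for a detector like this one; per the paragraph above, the runner then stops at the detect-cluster gate with `MISSING_EVIDENCE`.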
243
+ ### Choosing a detector
244
+
245
+ | Detector | Best for | Tradeoff |
246
+ |---|---|---|
247
+ | `defaultSimilarityDetector` (built-in) | Fast, zero-config. Works well when records share a structured category taxonomy and inter-record wikilinks. | Relies on category prefixes and link-overlap (Jaccard ≥ 0.10). Misses semantic similarity across category boundaries. |
248
+ | `createVectorSimilarityDetector` | Semantic clustering. Finds similar records regardless of how they were categorised. | Requires an embedding backend (ollama by default). Adds latency proportional to cluster size. |
249
+
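For readers unfamiliar with the link-overlap measure named in the table, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to two lists of linked record ids; the function names are hypothetical, and how the built-in detector combines this score with category prefixes is not shown here. Only the `0.10` cutoff comes from the table.

```javascript
// Jaccard similarity of two collections, treated as sets.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0; // avoid 0 / 0
  let intersection = 0;
  for (const item of setA) {
    if (setB.has(item)) intersection += 1;
  }
  return intersection / (setA.size + setB.size - intersection);
}

// Hypothetical check using the cutoff quoted in the table above.
function linkOverlapPasses(linksA, linksB, threshold = 0.10) {
  return jaccard(linksA, linksB) >= threshold;
}
```

With a cutoff this low, two records that each link to ten others pass when they share two targets (2 / 18 ≈ 0.11) but not when they share one (1 / 19 ≈ 0.05).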
250
+ ### Vector detector — ollama embedding
251
+
252
+ The vector adapter lives at `adapters/similarity-vector/index.js`. It is zero-dependency and
253
+ calls ollama's `/api/embed` endpoint via the built-in `fetch`.
254
+
255
+ ```js
256
+ import { createVectorSimilarityDetector } from './adapters/similarity-vector/index.js';
257
+
258
+ // Default: uses ollama at localhost:11434 with nomic-embed-text
259
+ const detector = createVectorSimilarityDetector();
260
+
261
+ // Or customise host, model, and threshold:
262
+ const tunedDetector = createVectorSimilarityDetector({
263
+   host: 'http://localhost:11434',
264
+   model: 'nomic-embed-text',
265
+   threshold: 0.60, // cosine similarity cutoff
266
+ });
267
+
268
+ // Pass to synthesize:
269
+ await runner.synthesize(conceptId, {
270
+   proposedBody: '...',
271
+   rationale: '...',
272
+   similarityDetector: detector,
273
+ });
274
+ ```
275
+
276
+ **Starting ollama:**
277
+
278
+ ```bash
279
+ ollama serve &
280
+ ollama pull nomic-embed-text # 274 MB, one-time pull
281
+ ```
282
+
283
+ **Threshold guidance:**
284
+
285
+ The default threshold of `0.60` is validated against `nomic-embed-text` (768-dim). Empirical
286
+ scores observed in the eval suite:
287
+
288
+ | Pair | Score |
289
+ |---|---|
290
+ | Semantically similar API design texts | ~0.77 |
291
+ | Semantically unrelated (API vs. bread baking) | ~0.41 |
292
+
293
+ A threshold of `0.60` cleanly separates these two classes. If your domain records are more
294
+ homogeneous (narrow vocabulary, very similar boilerplate) you may need to raise the threshold
295
+ to `0.70–0.80` to avoid over-clustering.
296
+
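To show what the threshold is compared against, here is a minimal cosine similarity sketch with a cutoff applied to candidate vectors. The helper names and the `{ id, vector }` candidate shape are assumptions for illustration, and whether the adapter uses `>=` or `>` at the boundary is not stated in this README.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical selection step: keep ids whose score reaches the threshold.
function selectSimilar(conceptVector, candidates, threshold = 0.60) {
  return candidates
    .filter(({ vector }) => cosineSimilarity(conceptVector, vector) >= threshold)
    .map(({ id }) => id);
}
```

Against the scores in the table, a pair at roughly 0.77 clears a `0.60` cutoff and a pair at roughly 0.41 does not; raising the cutoff to `0.80` would reject both.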
297
+ ### Fail-closed rationale
298
+
299
+ The vector detector throws an `Error` with `code="EMBED_FAILURE"` rather than returning `[]`
300
+ when the embedding call fails (network error, HTTP error, malformed response, wrong vector
301
+ count). This is intentional.
302
+
303
+ A detector that silently returns `[]` on infrastructure failure is indistinguishable from one
304
+ that found no similar records. The result is a misleading `MISSING_EVIDENCE` at the detect-cluster
305
+ gate, which looks like "this concept has no sources" rather than "the embedding service is down".
306
+
307
+ Failing closed makes the infrastructure problem visible immediately, at the right level, with a
308
+ clear error code. Operators can catch `EMBED_FAILURE` separately from `MISSING_EVIDENCE` and
309
+ route them to different alerting channels.
310
+
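One way to act on the two error codes is a small router around a flow call. This is a sketch under assumptions: `routeFlowError`, `runWithRouting`, the channel names, and the `alert` callback are all hypothetical; only the codes `EMBED_FAILURE` and `MISSING_EVIDENCE` come from this README.

```javascript
// Map an error to an alerting channel based on its code.
function routeFlowError(error) {
  switch (error?.code) {
    case 'EMBED_FAILURE':
      return 'infrastructure'; // embedding service problem
    case 'MISSING_EVIDENCE':
      return 'evidence'; // the gate found nothing to work with
    default:
      return 'unknown';
  }
}

// Run an async flow step, report failures to the matching channel,
// and rethrow so the caller still sees the original error.
async function runWithRouting(run, alert) {
  try {
    return await run();
  } catch (error) {
    alert(routeFlowError(error), error);
    throw error;
  }
}
```

A call site might wrap a flow invocation such as `runner.synthesize(...)` in `runWithRouting`, passing whatever alerting function the deployment uses.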
311
+ ### Injecting a custom embed function
312
+
313
+ For tests or alternative providers (OpenAI, Cohere, etc.), pass `embed` directly:
314
+
315
+ ```js
316
+ const detector = createVectorSimilarityDetector({
317
+   embed: async (texts) => {
318
+     // texts: string[], one per record (title + "\n" + body by default)
319
+     // must return: number[][], one vector per input text
320
+     const response = await myEmbeddingAPI.embed(texts);
321
+     return response.vectors;
322
+   },
323
+   threshold: 0.70,
324
+ });
325
+ ```
326
+
327
+ The `embed` function is called once per `synthesize`/`consolidate` call with all texts in a
328
+ single batch (concept first, then candidates).
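
For tests that should not depend on a running embedding service, a deterministic stand-in can satisfy the same contract (an array of texts in, one vector per text out). The function below is a hypothetical example, not something the kit ships: it counts the letters a to z in each text, which is enough to make identical texts score as identical without any network call.

```javascript
// Hypothetical test double for the `embed` option: returns one
// 26-dimensional letter-frequency vector per input text.
async function fakeEmbed(texts) {
  return texts.map((text) => {
    const vector = new Array(26).fill(0);
    for (const char of text.toLowerCase()) {
      const index = char.charCodeAt(0) - 97; // 'a' is char code 97
      if (index >= 0 && index < 26) vector[index] += 1;
    }
    return vector;
  });
}
```

It would be passed the same way as the custom function above, for example as `embed: fakeEmbed`. Because the vectors carry no semantic meaning, thresholds tuned against it say nothing about behaviour with a real model.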