@semiont/make-meaning 0.5.6 → 0.5.7
- package/README.md +51 -32
- package/dist/index.d.ts +198 -23
- package/dist/index.js +751 -223
- package/dist/index.js.map +1 -1
- package/dist/smelter-main.js +547 -243
- package/dist/smelter-main.js.map +1 -1
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -8,18 +8,26 @@
|
|
|
8
8
|
|
|
9
9
|
**Making meaning from resources through actors, context assembly, and relationship reasoning.**
|
|
10
10
|
|
|
11
|
-
This package implements the actor model from [ACTOR-MODEL.md](../../docs/system/ACTOR-MODEL.md). It owns the **Knowledge Base** and the actors that
|
|
11
|
+
This package implements the actor model from [ACTOR-MODEL.md](../../docs/system/ACTOR-MODEL.md). It owns the **Knowledge Base** and the seven actors that serve it.
|
|
12
|
+
|
|
13
|
+
Five **access actors** mediate every read and write — the bus-facing interface of the Knowledge Base:
|
|
12
14
|
|
|
13
15
|
- **Stower** (write) — the single write gateway to the Knowledge Base; handles all resource and annotation mutations and job lifecycle events
|
|
14
|
-
- **Browser** (read) — handles all KB read queries: resources, annotations, events, annotation history, referenced-by lookups, entity type listing, and directory browse (merging filesystem listings with KB metadata)
|
|
16
|
+
- **Browser** (read) — handles all KB read queries: resources, annotations, events, annotation history, referenced-by lookups, entity type and tag-schema listing, and directory browse (merging filesystem listings with KB metadata)
|
|
15
17
|
- **Gatherer** (context assembly) — assembles gathered context for annotations (`gather:requested`) and resources (`gather:resource-requested`); searches vectors for semantically similar passages (adds `semanticContext` to `GatheredContext`)
|
|
16
18
|
- **Matcher** (search/link) — context-driven candidate search with multi-source retrieval, composite structural scoring, and optional LLM semantic scoring
|
|
17
|
-
- **Smelter** (embed) — subscribes to resource/annotation events, chunks text, embeds via `@semiont/vectors`, emits `embedding:compute` commands (persisted by Stower as `embedding:computed` events), and indexes into vector store (Qdrant)
|
|
18
19
|
- **CloneTokenManager** (yield) — manages clone token lifecycle for resource cloning
|
|
19
20
|
|
|
20
|
-
|
|
21
|
+
Two **projection pipelines** follow the event log to keep the eventually-consistent read models in sync — addressed by no one, replying to nothing:
|
|
22
|
+
|
|
23
|
+
- **Graph Consumer** (project) — subscribes to graph-relevant domain events and projects them into the graph database; carried on the KB record (`kb.graphConsumer`) and rebuilt from the event log at startup (`rebuildAll()`)
|
|
24
|
+
- **Smelter** (embed) — standalone embedding pipeline run via `@semiont/make-meaning/smelter-main` (not started by `startMakeMeaning`); subscribes to domain events, reads content from the KB working tree via `WorkerContentTransport`, chunks text, embeds via `@semiont/vectors`, and indexes into the vector store (Qdrant). On startup it reconciles Qdrant against the KS catalog — re-embedding what's missing or stale (every upsert is stamped with the embedded bytes' checksum, so changed content is detected) and deleting orphans — so a wiped Qdrant volume, or events missed while the worker was down, recover by restarting the smelter
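The startup reconciliation described for the Smelter can be pictured as a pure planning step that compares what is indexed with what the catalog says should be indexed. The following is a minimal sketch under assumptions, not the package's implementation: `CatalogEntry`, `IndexedEntry`, `PlannedItem`, and `planReconcile` are hypothetical names, and only the idea of comparing membership plus checksum, and the `smelt:embed` / `smelt:purge` labels, come from this diff.

```typescript
// Hypothetical shapes for illustration; the real catalog and vector-store types are not shown here.
interface CatalogEntry { resourceId: string; checksum: string }
interface IndexedEntry { resourceId: string; checksum: string }
interface PlannedItem { type: 'smelt:embed' | 'smelt:purge'; resourceId: string }

// Diff what IS indexed against what SHOULD be indexed.
function planReconcile(catalog: CatalogEntry[], indexed: IndexedEntry[]): PlannedItem[] {
  const wanted = new Map<string, string>();
  for (const c of catalog) wanted.set(c.resourceId, c.checksum);
  const have = new Map<string, string>();
  for (const i of indexed) have.set(i.resourceId, i.checksum);

  const plan: PlannedItem[] = [];
  // Missing from the index, or indexed with a different checksum (stale): re-embed.
  wanted.forEach((checksum, resourceId) => {
    if (have.get(resourceId) !== checksum) plan.push({ type: 'smelt:embed', resourceId });
  });
  // Indexed but absent from the catalog: an orphan, so purge it.
  have.forEach((_checksum, resourceId) => {
    if (!wanted.has(resourceId)) plan.push({ type: 'smelt:purge', resourceId });
  });
  return plan;
}

const plan = planReconcile(
  [
    { resourceId: 'a', checksum: '1' },
    { resourceId: 'b', checksum: '2' },
    { resourceId: 'c', checksum: '3' },
  ],
  [
    { resourceId: 'a', checksum: '1' },   // up to date: no work
    { resourceId: 'b', checksum: 'old' }, // stale: re-embed
    { resourceId: 'z', checksum: '9' },   // orphan: purge
  ],
);
```

In this sketch `a` needs no work, `b` and `c` are planned for embedding, and `z` is planned for purging. Keeping the planner free of I/O is a design choice of the sketch; it makes the diff logic easy to test separately from the embedding calls.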
|
|
25
|
+
|
|
26
|
+
(The third derived read model — the materialized views — is not pipeline-maintained: the EventStore's `ViewManager` materializes views synchronously inside `appendEvent()` for a read-your-writes guarantee.)
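The difference between a synchronously materialized view and a pipeline-maintained projection can be illustrated with a toy store. This is only a sketch under assumptions: `TinyEventStore`, `DomainEvent`, `readView`, and `drainInto` are hypothetical names invented here, and the code does not reflect the actual `EventStore` or `ViewManager` API.

```typescript
// Hypothetical miniature store; not the EventStore or ViewManager API.
interface DomainEvent { type: 'created' | 'renamed'; resourceId: string; name: string }

class TinyEventStore {
  private readonly log: DomainEvent[] = [];
  private readonly view = new Map<string, string>(); // resourceId -> current name
  private readonly pending: DomainEvent[] = [];       // events an async projection has not seen yet

  // The view is updated before appendEvent() returns, which gives read-your-writes.
  appendEvent(event: DomainEvent): void {
    this.log.push(event);
    this.view.set(event.resourceId, event.name);
    this.pending.push(event); // a projection pipeline would pick this up later
  }

  readView(resourceId: string): string | undefined {
    return this.view.get(resourceId);
  }

  eventCount(): number {
    return this.log.length;
  }

  // Stand-in for a projection pipeline catching up with the log.
  drainInto(projection: Map<string, string>): void {
    for (const e of this.pending.splice(0)) projection.set(e.resourceId, e.name);
  }
}

const store = new TinyEventStore();
const graphProjection = new Map<string, string>();
store.appendEvent({ type: 'created', resourceId: 'r1', name: 'Draft' });

const viewNow = store.readView('r1');          // visible immediately after the append
const graphBefore = graphProjection.get('r1'); // the projection has not caught up yet
store.drainInto(graphProjection);
const graphAfter = graphProjection.get('r1');  // consistent once the pipeline has run
```

The view is readable as soon as the append returns, while the separate projection only agrees with it after the drain step, which is the eventually-consistent behaviour attributed to the two pipelines.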
|
|
27
|
+
|
|
28
|
+
All seven actors subscribe to the EventBus via RxJS pipelines and expose no public business methods — only `initialize()` and `stop()`, plus a startup recovery entry point on the pipelines (`rebuildAll()` / `reconcile()`). Callers communicate with the access actors by putting events on the bus.
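The "no public business methods" shape can be sketched with a toy actor that exposes only `initialize()` and `stop()` and does all of its work in response to bus events. This is an illustration under assumptions, not code from the package: `TinyBus`, `EchoActor`, and the `echo:requested` / `echo:result` channels are hypothetical, and a plain callback bus stands in for the RxJS-based EventBus.

```typescript
// Hypothetical in-memory bus standing in for the EventBus; all names are illustrative only.
type Handler = (payload: Record<string, unknown>) => void;

class TinyBus {
  private readonly handlers = new Map<string, Set<Handler>>();

  // Returns an unsubscribe function.
  subscribe(channel: string, handler: Handler): () => void {
    let set = this.handlers.get(channel);
    if (!set) {
      set = new Set<Handler>();
      this.handlers.set(channel, set);
    }
    set.add(handler);
    const owned = set;
    return () => { owned.delete(handler); };
  }

  emit(channel: string, payload: Record<string, unknown>): void {
    const set = this.handlers.get(channel);
    if (set) set.forEach((h) => h(payload));
  }
}

// The actor's only public surface is initialize() and stop().
class EchoActor {
  private readonly bus: TinyBus;
  private unsubscribe: (() => void) | null = null;

  constructor(bus: TinyBus) {
    this.bus = bus;
  }

  initialize(): void {
    this.unsubscribe = this.bus.subscribe('echo:requested', (req) => {
      // Reply on a result channel, echoing the caller's correlationId.
      this.bus.emit('echo:result', { correlationId: req['correlationId'], text: req['text'] });
    });
  }

  stop(): void {
    if (this.unsubscribe) this.unsubscribe();
    this.unsubscribe = null;
  }
}

const bus = new TinyBus();
const actor = new EchoActor(bus);
const replies: Array<Record<string, unknown>> = [];
// The caller keeps only the replies that carry its own correlationId.
bus.subscribe('echo:result', (r) => { if (r['correlationId'] === 'c-1') replies.push(r); });

actor.initialize();
bus.emit('echo:requested', { correlationId: 'c-1', text: 'hello' });
bus.emit('echo:requested', { correlationId: 'c-2', text: 'someone else' });
actor.stop();
bus.emit('echo:requested', { correlationId: 'c-1', text: 'after stop' }); // no reply: actor stopped
```

The caller never holds a reference to a business method on the actor; it emits a request, filters the result channel by `correlationId`, and stops receiving replies once the actor has been stopped.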
|
|
21
29
|
|
|
22
|
-
The EventBus is a **complete interface** for all knowledge-domain operations. HTTP routes in the backend are thin wrappers that delegate to EventBus actors. The `@semiont/
|
|
30
|
+
The EventBus is a **complete interface** for all knowledge-domain operations. HTTP routes in the backend are thin wrappers that delegate to EventBus actors. The `@semiont/http-transport` exposes the same operations via verb-oriented namespaces (`semiont.browse`, `semiont.mark`, `semiont.gather`, etc.).
|
|
23
31
|
|
|
24
32
|
## Quick Start
|
|
25
33
|
|
|
@@ -44,7 +52,7 @@ const makeMeaning = await startMakeMeaning(project, config, eventBus, logger);
|
|
|
44
52
|
|
|
45
53
|
// Access components
|
|
46
54
|
const { knowledgeSystem, jobQueue } = makeMeaning;
|
|
47
|
-
const { kb, stower, browser, gatherer, matcher,
|
|
55
|
+
const { kb, stower, browser, gatherer, matcher, cloneTokenManager } = knowledgeSystem;
|
|
48
56
|
|
|
49
57
|
// Graceful shutdown
|
|
50
58
|
await makeMeaning.stop();
|
|
@@ -52,33 +60,37 @@ await makeMeaning.stop();
|
|
|
52
60
|
|
|
53
61
|
This single call initializes:
|
|
54
62
|
- **KnowledgeSystem** — groups the Knowledge Base and its actors
|
|
55
|
-
- **KnowledgeBase** — groups EventStore, ViewStorage, WorkingTreeStore, GraphDatabase, GraphDBConsumer, and optionally VectorStore
|
|
63
|
+
- **KnowledgeBase** — groups EventStore, ViewStorage, WorkingTreeStore, GraphDatabase, GraphDBConsumer, and optionally VectorStore
|
|
56
64
|
- **Stower** — subscribes to write commands on EventBus
|
|
57
65
|
- **Browser** — subscribes to all KB read queries and directory browse requests on EventBus
|
|
58
66
|
- **Gatherer** — subscribes to annotation and resource gather requests on EventBus; searches vectors for semantically similar passages
|
|
59
67
|
- **Matcher** — subscribes to candidate search requests on EventBus
|
|
60
|
-
- **Smelter** — subscribes to resource/annotation events, chunks text, embeds, indexes into Qdrant
|
|
61
68
|
- **CloneTokenManager** — subscribes to clone token operations on EventBus
|
|
62
69
|
- **JobQueue** — background job processing queue + job status subscription
|
|
63
|
-
- **
|
|
70
|
+
- **Bus command handlers** — request-channel translators registered via `registerBusHandlers`
|
|
71
|
+
|
|
72
|
+
It does **not** start the Smelter (a standalone process — `@semiont/make-meaning/smelter-main`) or the job workers (the worker process in [@semiont/jobs](../jobs/) — see [Job Workers](./docs/job-workers.md)).
|
|
64
73
|
|
|
65
74
|
### Gather Context (via EventBus)
|
|
66
75
|
|
|
67
76
|
```typescript
|
|
68
77
|
import { firstValueFrom, race, filter, timeout } from 'rxjs';
|
|
69
78
|
|
|
79
|
+
const correlationId = crypto.randomUUID();
|
|
80
|
+
|
|
70
81
|
// Emit gather request for an annotation
|
|
71
82
|
eventBus.get('gather:requested').next({
|
|
72
|
-
|
|
83
|
+
correlationId,
|
|
84
|
+
annotationId,
|
|
73
85
|
resourceId,
|
|
74
|
-
options: {
|
|
86
|
+
options: { contextWindow: 1000 },
|
|
75
87
|
});
|
|
76
88
|
|
|
77
89
|
// Await result
|
|
78
90
|
const result = await firstValueFrom(
|
|
79
91
|
race(
|
|
80
|
-
eventBus.get('gather:complete').pipe(filter(e => e.
|
|
81
|
-
eventBus.get('gather:failed').pipe(filter(e => e.
|
|
92
|
+
eventBus.get('gather:complete').pipe(filter(e => e.correlationId === correlationId)),
|
|
93
|
+
eventBus.get('gather:failed').pipe(filter(e => e.correlationId === correlationId)),
|
|
82
94
|
).pipe(timeout(30_000)),
|
|
83
95
|
);
|
|
84
96
|
```
|
|
@@ -100,7 +112,8 @@ graph TB
|
|
|
100
112
|
BROWSER["Browser<br/>(read)"]
|
|
101
113
|
GATHERER["Gatherer<br/>(context assembly)"]
|
|
102
114
|
MATCHER["Matcher<br/>(search/link)"]
|
|
103
|
-
SMELTER["Smelter<br/>(embed)"]
|
|
115
|
+
SMELTER["Smelter<br/>(embed pipeline, standalone process)"]
|
|
116
|
+
GC["Graph Consumer<br/>(graph pipeline)"]
|
|
104
117
|
CTM["CloneTokenManager<br/>(clone)"]
|
|
105
118
|
KB["Knowledge Base"]
|
|
106
119
|
VECTORS["Vector Store<br/>(Qdrant)"]
|
|
@@ -112,21 +125,22 @@ graph TB
|
|
|
112
125
|
MATCHER -->|search| VECTORS
|
|
113
126
|
SMELTER -->|embed & index| VECTORS
|
|
114
127
|
SMELTER -->|read| KB
|
|
128
|
+
GC -->|project| KB
|
|
115
129
|
CTM -->|query| KB
|
|
116
130
|
end
|
|
117
131
|
|
|
118
|
-
BUS -->|"yield:create, yield:update, yield:mv<br/>mark:create, mark:delete, mark:update-body<br/>frame:add-entity-type, mark:archive, mark:unarchive
|
|
119
|
-
BUS -->|"browse:resource-requested, browse:resources-requested<br/>browse:annotations-requested, browse:annotation-requested<br/>browse:events-requested, browse:annotation-history-requested<br/>browse:referenced-by-requested, browse:entity-types-requested<br/>browse:directory-requested"| BROWSER
|
|
132
|
+
BUS -->|"yield:create, yield:update, yield:mv<br/>mark:create, mark:delete, mark:update-body<br/>frame:add-entity-type, frame:add-tag-schema<br/>mark:archive, mark:unarchive, mark:update-entity-types<br/>job:start, job:complete, job:fail"| STOWER
|
|
133
|
+
BUS -->|"browse:resource-requested, browse:resources-requested<br/>browse:annotations-requested, browse:annotation-requested<br/>browse:events-requested, browse:annotation-history-requested<br/>browse:referenced-by-requested, browse:entity-types-requested<br/>browse:tag-schemas-requested, browse:directory-requested"| BROWSER
|
|
120
134
|
BUS -->|"gather:requested<br/>gather:resource-requested"| GATHERER
|
|
121
135
|
BUS -->|"match:search-requested"| MATCHER
|
|
122
|
-
BUS -->|"yield:created,
|
|
136
|
+
BUS -->|"domain events:<br/>yield:created, yield:updated<br/>yield:representation-added<br/>mark:added, mark:removed, mark:archived"| SMELTER
|
|
137
|
+
BUS -->|"graph-relevant<br/>domain events"| GC
|
|
123
138
|
BUS -->|"yield:clone-token-requested<br/>yield:clone-resource-requested<br/>yield:clone-create"| CTM
|
|
124
139
|
|
|
125
|
-
STOWER -->|"yield:
|
|
126
|
-
BROWSER -->|"browse:resource-result, browse:resources-result<br/>browse:annotations-result, browse:annotation-result<br/>browse:events-result, browse:annotation-history-result<br/>browse:referenced-by-result, browse:entity-types-result<br/>browse:directory-result"| BUS
|
|
140
|
+
STOWER -->|"yield:create-ok, yield:update-ok, yield:move-ok<br/>mark:delete-ok, *-failed replies<br/>(domain events are republished onto the bus<br/>by the EventStore: yield:created, mark:added, ...)"| BUS
|
|
141
|
+
BROWSER -->|"browse:resource-result, browse:resources-result<br/>browse:annotations-result, browse:annotation-result<br/>browse:events-result, browse:annotation-history-result<br/>browse:referenced-by-result, browse:entity-types-result<br/>browse:tag-schemas-result, browse:directory-result"| BUS
|
|
127
142
|
GATHERER -->|"gather:complete, gather:failed<br/>gather:resource-complete, gather:resource-failed"| BUS
|
|
128
143
|
MATCHER -->|"match:search-results, match:search-failed"| BUS
|
|
129
|
-
SMELTER -->|"embedding:compute,<br/>embedding:delete"| BUS
|
|
130
144
|
CTM -->|"yield:clone-token-generated<br/>yield:clone-resource-result<br/>yield:clone-created"| BUS
|
|
131
145
|
|
|
132
146
|
classDef bus fill:#e8a838,stroke:#b07818,stroke-width:3px,color:#000,font-weight:bold
|
|
@@ -136,7 +150,7 @@ graph TB
|
|
|
136
150
|
|
|
137
151
|
class BUS bus
|
|
138
152
|
classDef vectorstore fill:#6b8e9d,stroke:#4a6a7a,stroke-width:2px,color:#fff
|
|
139
|
-
class STOWER,BROWSER,GATHERER,MATCHER,SMELTER,CTM actor
|
|
153
|
+
class STOWER,BROWSER,GATHERER,MATCHER,SMELTER,GC,CTM actor
|
|
140
154
|
class KB kb
|
|
141
155
|
class VECTORS vectorstore
|
|
142
156
|
class Routes,Workers,EBC caller
|
|
@@ -146,24 +160,25 @@ graph TB
|
|
|
146
160
|
|
|
147
161
|
The **Knowledge System** binds the Knowledge Base to its actors. Nothing outside the Knowledge System reads or writes the Knowledge Base directly.
|
|
148
162
|
|
|
149
|
-
The **Knowledge Base** is an inert store — it has no intelligence, no goals, no decisions. It groups five core subsystems and
|
|
163
|
+
The **Knowledge Base** is an inert store — it has no intelligence, no goals, no decisions. It groups five core subsystems and one optional one:
|
|
150
164
|
|
|
151
165
|
| Store | Implementation | Purpose |
|
|
152
166
|
|-------|---------------|---------|
|
|
153
167
|
| **Event Log** | `EventStore` | Immutable append-only log of all domain events |
|
|
154
|
-
| **Materialized Views** | `ViewStorage` | Denormalized projections for fast reads |
|
|
168
|
+
| **Materialized Views** | `ViewStorage` | Denormalized projections for fast reads (materialized synchronously on append) |
|
|
155
169
|
| **Content Store** | `WorkingTreeStore` | Working-tree files addressed by URI |
|
|
156
170
|
| **Graph** | `GraphDatabase` | Eventually consistent relationship projection |
|
|
157
|
-
| **Graph Consumer** | `GraphDBConsumer` | Event-to-graph
|
|
171
|
+
| **Graph Consumer** | `GraphDBConsumer` | Event-to-graph projection pipeline (one of the two pipeline actors; carried on the KB record because `createKnowledgeBase()` constructs and starts it) |
|
|
158
172
|
| **Vectors** *(optional)* | `VectorStore` | Semantic vector index (Qdrant + memory) via `@semiont/vectors` |
|
|
159
|
-
|
|
173
|
+
|
|
174
|
+
Its sibling pipeline, the Smelter (event-to-vector projection), is **not** a KB member — it runs as a standalone process via `@semiont/make-meaning/smelter-main`.
|
|
160
175
|
|
|
161
176
|
```typescript
|
|
162
177
|
import { createKnowledgeBase } from '@semiont/make-meaning';
|
|
163
178
|
|
|
164
|
-
const kb = await createKnowledgeBase(eventStore, project, graphDb, logger);
|
|
179
|
+
const kb = await createKnowledgeBase(eventStore, project, graphDb, eventBus, logger, options);
|
|
165
180
|
// kb.eventStore, kb.views, kb.content, kb.graph, kb.graphConsumer
|
|
166
|
-
// kb.vectors (optional), kb.
|
|
181
|
+
// kb.vectors (optional), kb.projectionsDir
|
|
167
182
|
```
|
|
168
183
|
|
|
169
184
|
### EventBus Ownership
|
|
@@ -196,7 +211,7 @@ This pattern (functional core, imperative shell) is shared with `@semiont/event-
|
|
|
196
211
|
### Service (Primary)
|
|
197
212
|
|
|
198
213
|
- `startMakeMeaning(project, config, eventBus, logger)` — Initialize all infrastructure
|
|
199
|
-
- `MakeMeaningService` — Type for service return value (`knowledgeSystem`, `jobQueue`, `
|
|
214
|
+
- `MakeMeaningService` — Type for service return value (`knowledgeSystem`, `jobQueue`, `stop`)
|
|
200
215
|
|
|
201
216
|
### Knowledge System
|
|
202
217
|
|
|
@@ -205,8 +220,8 @@ This pattern (functional core, imperative shell) is shared with `@semiont/event-
|
|
|
205
220
|
|
|
206
221
|
### Knowledge Base
|
|
207
222
|
|
|
208
|
-
- `createKnowledgeBase(eventStore, project, graphDb, logger)` — Async factory function
|
|
209
|
-
- `KnowledgeBase` — Interface grouping the
|
|
223
|
+
- `createKnowledgeBase(eventStore, project, graphDb, eventBus, logger, options?)` — Async factory function
|
|
224
|
+
- `KnowledgeBase` — Interface grouping the KB stores (`eventStore`, `views`, `content`, `graph`, optional `vectors`) plus the `graphConsumer` pipeline
|
|
210
225
|
|
|
211
226
|
### Actors
|
|
212
227
|
|
|
@@ -214,8 +229,10 @@ This pattern (functional core, imperative shell) is shared with `@semiont/event-
|
|
|
214
229
|
- `Browser` — Read actor (all KB queries, directory listings merged with KB metadata)
|
|
215
230
|
- `Gatherer` — Context assembly actor (annotation and resource gather flows; vector semantic search)
|
|
216
231
|
- `Matcher` — Search/link actor (context-driven candidate search with structural + semantic scoring)
|
|
217
|
-
- `Smelter` — Embedding pipeline actor (chunk, embed, persist, index into vector store)
|
|
218
232
|
- `CloneTokenManager` — Clone token lifecycle actor (yield domain)
|
|
233
|
+
- `Smelter` / `createSmelterActorStateUnit` / `WorkerContentTransport` — the embedding pipeline, its domain-event fan-in, and the worker-side content transport; wired together by the standalone `@semiont/make-meaning/smelter-main` entry point, and exported for callers that run the pipeline on their own `WorkerBus`
|
|
234
|
+
|
|
235
|
+
The Graph Consumer (`GraphDBConsumer`) is not exported — `createKnowledgeBase()` constructs it internally and exposes it as `kb.graphConsumer`.
|
|
219
236
|
|
|
220
237
|
### Operations
|
|
221
238
|
|
|
@@ -237,7 +254,7 @@ This pattern (functional core, imperative shell) is shared with `@semiont/event-
|
|
|
237
254
|
## Dependencies
|
|
238
255
|
|
|
239
256
|
- **[@semiont/core](../core/)** — Core types, EventBus, utilities
|
|
240
|
-
- **[@semiont/
|
|
257
|
+
- **[@semiont/http-transport](../http-transport/)** — OpenAPI-generated types
|
|
241
258
|
- **[@semiont/event-sourcing](../event-sourcing/)** — Event store and view storage
|
|
242
259
|
- **[@semiont/content](../content/)** — Content-addressed storage
|
|
243
260
|
- **[@semiont/graph](../graph/)** — Graph database abstraction
|
|
@@ -245,6 +262,8 @@ This pattern (functional core, imperative shell) is shared with `@semiont/event-
|
|
|
245
262
|
- **[@semiont/inference](../inference/)** — AI primitives (generateText)
|
|
246
263
|
- **[@semiont/vectors](../vectors/)** — Vector store abstraction (Qdrant + memory) and embedding providers (Voyage, Ollama)
|
|
247
264
|
- **[@semiont/jobs](../jobs/)** — Job queue and annotation workers
|
|
265
|
+
- **[@semiont/observability](../observability/)** — Actor spans and metrics providers
|
|
266
|
+
- **[@semiont/sdk](../sdk/)** — `StateUnit` / `WorkerBus` types (used by the Smelter actor state unit)
|
|
248
267
|
|
|
249
268
|
## Testing
|
|
250
269
|
|
package/dist/index.d.ts
CHANGED
|
@@ -1,14 +1,13 @@
|
|
|
1
1
|
import { JobQueue } from '@semiont/jobs';
|
|
2
2
|
import { SemiontProject } from '@semiont/core/node';
|
|
3
|
-
import { GraphServiceConfig, VectorsServiceConfig, EmbeddingServiceConfig, EventBus, Logger, StoredEvent, ResourceId, ResourceDescriptor, AnnotationId, components, ITransport, BaseUrl, ConnectionState, SemiontError, UserDID, EventMap, IContentTransport, PutBinaryRequest, PutBinaryOptions,
|
|
4
|
-
export { AssembledAnnotation, applyBodyOperations, assembleAnnotation } from '@semiont/core';
|
|
3
|
+
import { GraphServiceConfig, VectorsServiceConfig, EmbeddingServiceConfig, EventBus, Logger, StoredEvent, ResourceId, ResourceDescriptor, AnnotationId, components, ITransport, BaseUrl, ConnectionState, SemiontError, UserDID, EventMap, IContentTransport, PutBinaryRequest, PutBinaryOptions, AccessToken, Annotation, UserId, ResourceAnnotations, AnnotationCategory, GraphPath, GraphConnection } from '@semiont/core';
|
|
5
4
|
import { EventStore, ViewStorage } from '@semiont/event-sourcing';
|
|
6
5
|
import { WorkingTreeStore } from '@semiont/content';
|
|
7
6
|
import { GraphDatabase } from '@semiont/graph';
|
|
8
|
-
import { VectorStore, EmbeddingProvider } from '@semiont/vectors';
|
|
7
|
+
import { VectorStore, EmbeddingProvider, ChunkingConfig } from '@semiont/vectors';
|
|
9
8
|
import { InferenceClient } from '@semiont/inference';
|
|
10
9
|
import { BehaviorSubject, Observable } from 'rxjs';
|
|
11
|
-
import { StateUnit, WorkerBus } from '@semiont/sdk';
|
|
10
|
+
import { StateUnit, WorkerBus, BusRequestPrimitive } from '@semiont/sdk';
|
|
12
11
|
import { Writable, Readable } from 'node:stream';
|
|
13
12
|
|
|
14
13
|
/**
|
|
@@ -195,7 +194,7 @@ declare function createKnowledgeBase(eventStore: EventStore, project: SemiontPro
|
|
|
195
194
|
*
|
|
196
195
|
* The single write gateway to the Knowledge Base. Subscribes to command
|
|
197
196
|
* events on the EventBus and translates them into domain events on the
|
|
198
|
-
* EventStore + content
|
|
197
|
+
* EventStore + content operations on the WorkingTreeStore.
|
|
199
198
|
*
|
|
200
199
|
* From ARCHITECTURE.md:
|
|
201
200
|
* The Knowledge Base has exactly three actor interfaces:
|
|
@@ -203,7 +202,8 @@ declare function createKnowledgeBase(eventStore: EventStore, project: SemiontPro
|
|
|
203
202
|
* - Gatherer (read context)
|
|
204
203
|
* - Matcher (read search)
|
|
205
204
|
*
|
|
206
|
-
* No other code should call eventStore.appendEvent() or
|
|
205
|
+
* No other code should call eventStore.appendEvent() or mutate the working tree
|
|
206
|
+
* through kb.content.
|
|
207
207
|
*
|
|
208
208
|
* Subscriptions:
|
|
209
209
|
* - yield:create → resource.created (+ content store) → yield:created / yield:create-failed
|
|
@@ -435,12 +435,16 @@ declare class CloneTokenManager {
|
|
|
435
435
|
* Nothing outside the KnowledgeSystem reads or writes the KnowledgeBase directly.
|
|
436
436
|
*
|
|
437
437
|
* - kb: the durable store (event log, views, content, graph)
|
|
438
|
-
* - stower: write actor —
|
|
439
|
-
* -
|
|
440
|
-
* -
|
|
441
|
-
* -
|
|
438
|
+
* - stower: write actor — the single write gateway
|
|
439
|
+
* - browser: read actor — all KB queries plus directory listings
|
|
440
|
+
* - gatherer: context-assembly actor — builds GatheredContext from passage, graph, and vectors
|
|
441
|
+
* - matcher: search actor — context-driven candidate search and scoring
|
|
442
442
|
* - cloneTokenManager: token actor — manages resource clone tokens
|
|
443
443
|
*
|
|
444
|
+
* These are the five access actors. Two projection-pipeline actors complete
|
|
445
|
+
* the seven: the Graph Consumer (kb.graphConsumer, started by
|
|
446
|
+
* createKnowledgeBase) and the Smelter (standalone process via smelter-main).
|
|
447
|
+
*
|
|
444
448
|
* EventBus, JobQueue, and workers are peers to KnowledgeSystem, not members.
|
|
445
449
|
*/
|
|
446
450
|
|
|
@@ -561,27 +565,34 @@ declare class LocalTransport implements ITransport {
|
|
|
561
565
|
* resource-creation pipeline the HTTP `/resources` handler uses.
|
|
562
566
|
*/
|
|
563
567
|
|
|
568
|
+
type GetResourceResponse = components['schemas']['GetResourceResponse'];
|
|
564
569
|
declare class LocalContentTransport implements IContentTransport {
|
|
565
570
|
private readonly ks;
|
|
566
571
|
constructor(ks: KnowledgeSystem);
|
|
567
572
|
putBinary(_request: PutBinaryRequest, _options?: PutBinaryOptions): Promise<{
|
|
568
573
|
resourceId: ResourceId;
|
|
569
574
|
}>;
|
|
570
|
-
getBinary(resourceId: ResourceId,
|
|
571
|
-
accept?: ContentFormat$1 | string;
|
|
575
|
+
getBinary(resourceId: ResourceId, _options?: {
|
|
572
576
|
auth?: AccessToken;
|
|
573
577
|
}): Promise<{
|
|
574
578
|
data: ArrayBuffer;
|
|
575
579
|
contentType: string;
|
|
576
580
|
}>;
|
|
577
|
-
getBinaryStream(resourceId: ResourceId,
|
|
578
|
-
accept?: ContentFormat$1 | string;
|
|
581
|
+
getBinaryStream(resourceId: ResourceId, _options?: {
|
|
579
582
|
auth?: AccessToken;
|
|
580
583
|
}): Promise<{
|
|
581
584
|
stream: ReadableStream<Uint8Array>;
|
|
582
585
|
contentType: string;
|
|
583
586
|
}>;
|
|
584
587
|
private loadBinary;
|
|
588
|
+
/**
|
|
589
|
+
* Assemble the resource's JSON-LD graph in-process from the KB — the local
|
|
590
|
+
* realization of `IContentTransport.getResourceGraph` (symmetric with
|
|
591
|
+
* getBinary; SIMPLER-JSON-LD.md decision 7). Local mode has no auth.
|
|
592
|
+
*/
|
|
593
|
+
getResourceGraph(resourceId: ResourceId, _options?: {
|
|
594
|
+
auth?: AccessToken;
|
|
595
|
+
}): Promise<GetResourceResponse>;
|
|
585
596
|
dispose(): void;
|
|
586
597
|
}
|
|
587
598
|
|
|
@@ -709,11 +720,178 @@ interface SmelterActorStateUnitOptions {
|
|
|
709
720
|
}
|
|
710
721
|
interface SmelterActorStateUnit extends StateUnit {
|
|
711
722
|
events$: Observable<SmelterEvent>;
|
|
712
|
-
emit(channel: string, payload: Record<string, unknown>): Promise<void>;
|
|
713
723
|
start(): void;
|
|
714
724
|
}
|
|
715
725
|
declare function createSmelterActorStateUnit(options: SmelterActorStateUnitOptions): SmelterActorStateUnit;
|
|
716
726
|
|
|
727
|
+
/**
|
|
728
|
+
* Smelter — event-to-vector pipeline for the standalone smelter worker.
|
|
729
|
+
*
|
|
730
|
+
* Consumes the smelter-relevant domain events surfaced by
|
|
731
|
+
* `SmelterActorStateUnit.events$`, reads resource content via the injected
|
|
732
|
+
* `IContentTransport` (HTTP verbatim mode in worker deployments — the
|
|
733
|
+
* stored bytes, untouched), chunks and embeds it via the configured
|
|
734
|
+
* EmbeddingProvider, and indexes vectors into the VectorStore (Qdrant).
|
|
735
|
+
* `smelter-main` is the container entry point that wires this up.
|
|
736
|
+
*
|
|
737
|
+
* ## Per-resource serialization
|
|
738
|
+
*
|
|
739
|
+
* Smelter processes events strictly in order per resourceId via
|
|
740
|
+
* `groupBy(resourceId) + concatMap(...)`. This is the stream-consumer
|
|
741
|
+
* flavor of per-resource serialization — the same invariant enforced by
|
|
742
|
+
* `GraphDBConsumer`, `Gatherer`, and (in a different shape) `ViewManager`.
|
|
743
|
+
* See `packages/core/src/serialize-per-key.ts` for the shared primitive
|
|
744
|
+
* used by RPC-style services.
|
|
745
|
+
*
|
|
746
|
+
* ## Batching
|
|
747
|
+
*
|
|
748
|
+
* `burstBuffer` collects event bursts per resource; consecutive same-type
|
|
749
|
+
* runs within a burst share a single `embedBatch()` call.
|
|
750
|
+
*
|
|
751
|
+
* ## Reconciliation
|
|
752
|
+
*
|
|
753
|
+
* Qdrant is an ephemeral projection of the event log. `reconcile()` brings
|
|
754
|
+
* it back in sync at startup — after a wiped volume, or after events missed
|
|
755
|
+
* while the worker was down. It is a planner: it diffs the store against the
|
|
756
|
+
* catalog (over the `browse:*` RPC channels) — both membership AND content
|
|
757
|
+
* freshness, via the checksum stamped onto every resource upsert — and
|
|
758
|
+
* enqueues `smelt:*` work items through the same mailbox as live events, so
|
|
759
|
+
* per-resource ordering holds across the two paths (axioms S1/S2/S11/S12 in
|
|
760
|
+
* `.plans/SMELTER-AXIOMS.md`).
|
|
761
|
+
*/
|
|
762
|
+
|
|
763
|
+
interface ReconcileSummary {
|
|
764
|
+
resourcesEmbedded: number;
|
|
765
|
+
resourceVectorsDeleted: number;
|
|
766
|
+
annotationsEmbedded: number;
|
|
767
|
+
annotationVectorsDeleted: number;
|
|
768
|
+
}
|
|
769
|
+
type ReconcileState = {
|
|
770
|
+
phase: 'pending';
|
|
771
|
+
} | {
|
|
772
|
+
phase: 'running';
|
|
773
|
+
} | {
|
|
774
|
+
phase: 'done';
|
|
775
|
+
summary: ReconcileSummary;
|
|
776
|
+
} | {
|
|
777
|
+
phase: 'failed';
|
|
778
|
+
error: string;
|
|
779
|
+
};
|
|
780
|
+
/**
|
|
781
|
+
* Burst-buffer timings for the event pipeline. Required — `smelter-main`
|
|
782
|
+
* passes production values (50/100/200); test harnesses pass ~1ms values so
|
|
783
|
+
* property suites run at generator speed. See `.plans/SMELTER-AXIOMS.md` (D4).
|
|
784
|
+
*/
|
|
785
|
+
interface SmelterTiming {
|
|
786
|
+
burstWindowMs: number;
|
|
787
|
+
maxBatchSize: number;
|
|
788
|
+
idleTimeoutMs: number;
|
|
789
|
+
}
|
|
790
|
+
/**
|
|
791
|
+
* Reconcile-planner work items — enqueued through the same mailbox as wire
|
|
792
|
+
* events. Distinct `smelt:*` types make forged domain events unrepresentable
|
|
793
|
+
* (`.plans/SMELTER-AXIOMS.md`, D1); the shared shape lets the per-resource
|
|
794
|
+
* lanes and batch paths serve both kinds of input.
|
|
795
|
+
*/
|
|
796
|
+
interface SmelterWorkItem {
|
|
797
|
+
type: 'smelt:embed' | 'smelt:purge' | 'smelt:embed-annotation' | 'smelt:purge-annotation';
|
|
798
|
+
resourceId: string;
|
|
799
|
+
payload: Record<string, unknown>;
|
|
800
|
+
}
|
|
801
|
+
type SmelterInput = SmelterEvent | SmelterWorkItem;
|
|
802
|
+
declare class Smelter {
|
|
803
|
+
private events$;
|
|
804
|
+
private vectorStore;
|
|
805
|
+
private embeddingProvider;
|
|
806
|
+
private content;
|
|
807
|
+
private bus;
|
|
808
|
+
private chunkingConfig;
|
|
809
|
+
private timing;
|
|
810
|
+
private logger;
|
|
811
|
+
private static readonly RECONCILE_PAGE_SIZE;
|
|
812
|
+
/** Bound on concurrently in-flight reconcile work — a cold rebuild must not fan out unbounded embedding calls. */
|
|
813
|
+
private static readonly RECONCILE_WAVE;
|
|
814
|
+
private eventSubject;
|
|
815
|
+
private sourceSubscription;
|
|
816
|
+
private pipelineSubscription;
|
|
817
|
+
private _eventsProcessed;
|
|
818
|
+
private _reconcileState;
|
|
819
|
+
private workDone;
|
|
820
|
+
private workWaiter;
|
|
821
|
+
constructor(events$: Observable<SmelterEvent>, vectorStore: VectorStore, embeddingProvider: EmbeddingProvider, content: IContentTransport, bus: BusRequestPrimitive, chunkingConfig: ChunkingConfig, timing: SmelterTiming, logger: Logger);
|
|
822
|
+
get eventsProcessed(): number;
|
|
823
|
+
get reconcileState(): ReconcileState;
|
|
824
|
+
initialize(): void;
|
|
825
|
+
stop(): void;
|
|
826
|
+
private noteWorkDone;
|
|
827
|
+
/**
|
|
828
|
+
* Returns the number of WIRE events processed without error (the S9b
|
|
829
|
+
* oracle) — `smelt:*` work-item runs tick the drain counter instead.
|
|
830
|
+
*/
|
|
831
|
+
private processBatch;
|
|
832
|
+
/**
|
|
833
|
+
* Batch-optimized processing for consecutive events of the same type.
|
|
834
|
+
* Returns the number of events processed without error.
|
|
835
|
+
*/
|
|
836
|
+
private applyBatchByType;
|
|
837
|
+
/** Returns true if the input was processed without error. */
|
|
838
|
+
private safeProcessEvent;
|
|
839
|
+
private processEvent;
|
|
840
|
+
private handleResourcePurge;
|
|
841
|
+
/**
|
|
842
|
+
* Resolve a resource's embeddable text: bytes via the content transport,
|
|
843
|
+
* gated to media types that decode as text, decoded charset-aware. The
|
|
844
|
+
* checksum is over the raw bytes actually read — stamped onto the vectors
|
|
845
|
+
* so reconciliation can compare against the catalog's claim (S12). Returns
|
|
846
|
+
* null (logged) when the resource doesn't decode as text, is unavailable,
|
|
847
|
+
* or is empty — callers skip it.
|
|
848
|
+
*/
|
|
849
|
+
private fetchEmbeddableText;
|
|
850
|
+
private embedResource;
|
|
851
|
+
private handleResourceArchived;
|
|
852
|
+
private handleAnnotationAdded;
|
|
853
|
+
private handleAnnotationRemoved;
|
|
854
|
+
/**
|
|
855
|
+
* Batch-embed chunks from multiple yield:created events in a single
|
|
856
|
+
* embedBatch() call, then index per resource.
|
|
857
|
+
*/
|
|
858
|
+
private batchResourceCreated;
|
|
859
|
+
/**
|
|
860
|
+
* Batch-embed exact texts from multiple mark:added events in a single
|
|
861
|
+
* embedBatch() call, then index per annotation.
|
|
862
|
+
*/
|
|
863
|
+
private batchAnnotationAdded;
|
|
864
|
+
/**
|
|
865
|
+
* Reconcile the vector store against the KS catalog.
|
|
866
|
+
*
|
|
867
|
+
* Lists what IS indexed (via the store's id enumeration) and what SHOULD
|
|
868
|
+
* be (non-archived resources with embeddable media types, plus their
|
|
869
|
+
* exact-text annotations, via the `browse:*` RPC channels), then plans the
|
|
870
|
+
* diff as `smelt:*` work items — embeds for what's missing, purges for
|
|
871
|
+
* what shouldn't be there — and drains them through the pipeline mailbox.
|
|
872
|
+
* Work items share the per-resource lanes with live events, so a reconcile
|
|
873
|
+
* re-embed can never interleave with (or stale-overwrite) live processing
|
|
874
|
+
* of the same resource (axioms S1/S2). Waves of RECONCILE_WAVE bound how
|
|
875
|
+
* many embedding calls a cold rebuild has in flight.
|
|
876
|
+
*
|
|
877
|
+
* Call after the live subscription is attached so nothing falls in the
|
|
878
|
+
* gap. The index snapshot is taken BEFORE the catalog listing so a
|
|
879
|
+
* resource indexed by a live event mid-reconcile is never mistaken for an
|
|
880
|
+
* orphan; convergence holds because every upsert replaces a resource's
|
|
881
|
+
* full vector set from current content.
|
|
882
|
+
*/
|
|
883
|
+
reconcile(): Promise<ReconcileSummary>;
|
|
884
|
+
/**
|
|
885
|
+
* Enqueue planner work through the mailbox in bounded waves and await
|
|
886
|
+
* completion. The pipeline ticks `noteWorkDone` for every consumed work
|
|
887
|
+
* item (success or failure — failures are logged like any live event), so
|
|
888
|
+
* each wave's waiter resolves exactly when its items have been processed.
|
|
889
|
+
*/
|
|
890
|
+
private drain;
|
|
891
|
+
/** Page through `browse:resources-requested` until the catalog is exhausted. */
|
|
892
|
+
private listAllResources;
|
|
893
|
+
}
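The `reconcile()` and `drain` comments above describe a set difference followed by bounded waves. A minimal sketch of that shape follows, under stated assumptions: `WorkItem`, `planReconcile`, `chunkIntoWaves`, and `drainInWaves` are hypothetical names for illustration, not the package's API, and the real `SmelterWorkItem` type and `RECONCILE_WAVE` value are not shown in this diff.

```typescript
// Hypothetical work-item shape; the package's SmelterWorkItem is not shown in this diff.
type WorkItem =
  | { kind: "embed"; id: string }
  | { kind: "purge"; id: string };

// Diff what IS indexed against what SHOULD be indexed.
function planReconcile(indexedIds: readonly string[], expectedIds: readonly string[]): WorkItem[] {
  const indexed = new Set(indexedIds);
  const expected = new Set(expectedIds);
  const plan: WorkItem[] = [];
  expected.forEach((id) => {
    if (!indexed.has(id)) plan.push({ kind: "embed", id }); // missing from the index
  });
  indexed.forEach((id) => {
    if (!expected.has(id)) plan.push({ kind: "purge", id }); // orphaned in the index
  });
  return plan;
}

// Split planned work into waves of at most `waveSize` items.
function chunkIntoWaves<T>(items: readonly T[], waveSize: number): T[][] {
  if (!Number.isInteger(waveSize) || waveSize < 1) {
    throw new RangeError("waveSize must be a positive integer");
  }
  const waves: T[][] = [];
  for (let i = 0; i < items.length; i += waveSize) {
    waves.push(items.slice(i, i + waveSize));
  }
  return waves;
}

// Process one wave at a time. A failed item is counted rather than rethrown,
// so every wave completes once all of its items have been handled.
async function drainInWaves<T>(
  items: readonly T[],
  waveSize: number,
  handle: (item: T) => Promise<void>,
): Promise<{ succeeded: number; failed: number }> {
  let succeeded = 0;
  let failed = 0;
  for (const wave of chunkIntoWaves(items, waveSize)) {
    const outcomes = await Promise.all(
      wave.map(async (item) => {
        try {
          await handle(item);
          return true;
        } catch {
          return false;
        }
      }),
    );
    for (const ok of outcomes) {
      if (ok) succeeded++;
      else failed++;
    }
  }
  return { succeeded, failed };
}
```

In this sketch at most `waveSize` handler calls are in flight at once, which is the bounding effect the comment attributes to `RECONCILE_WAVE`. The per-resource lanes and the mailbox that the real pipeline uses are not modeled here.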
|
|
894
|
+
|
|
717
895
|
/**
|
|
718
896
|
* Exchange Format Manifest Types
|
|
719
897
|
*
|
|
@@ -760,9 +938,9 @@ declare function validateManifestVersion(version: number): void;
|
|
|
760
938
|
*
|
|
761
939
|
* Produces a lossless tar.gz archive of the system of record:
|
|
762
940
|
* - Event log (all streams, JSONL format)
|
|
763
|
-
* -
|
|
941
|
+
* - Working-tree content (archived as checksum-named blobs)
|
|
764
942
|
*
|
|
765
|
-
* Reads events via EventStore and content via
|
|
943
|
+
* Reads events via EventStore and content via WorkingTreeStore.
|
|
766
944
|
* The archive can restore a complete knowledge base.
|
|
767
945
|
*/
|
|
768
946
|
|
|
@@ -940,7 +1118,7 @@ declare function importLinkedData(archive: Readable, options: LinkedDataImporter
|
|
|
940
1118
|
* Business logic for resource operations. All writes go through the EventBus
|
|
941
1119
|
* — the Stower actor subscribes and handles persistence.
|
|
942
1120
|
*
|
|
943
|
-
* For create: emits yield:create, awaits yield:
|
|
1121
|
+
* For create: emits yield:create, awaits yield:create-ok / yield:create-failed.
|
|
944
1122
|
*/
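The create flow in the comment above (emit `yield:create`, then await `yield:create-ok` or `yield:create-failed`) is a request/reply exchange over a bus. A minimal sketch of that pattern follows, assuming a tiny in-memory bus: `TinyBus` and `requestCreate` are hypothetical and stand in for the package's EventBus and `ResourceOperations`, whose signatures and payload shapes are not shown in this diff. Only the three channel names come from the source.

```typescript
type Handler = (payload: unknown) => void;

// Minimal in-memory bus for illustration only; not the package's EventBus.
class TinyBus {
  private handlers = new Map<string, Set<Handler>>();

  // Subscribe and return an unsubscribe function.
  on(channel: string, handler: Handler): () => void {
    const set = this.handlers.get(channel) ?? new Set<Handler>();
    this.handlers.set(channel, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  emit(channel: string, payload: unknown): void {
    const set = this.handlers.get(channel);
    if (!set) return;
    // Copy first so handlers may unsubscribe while being called.
    Array.from(set).forEach((handler) => handler(payload));
  }

  listenerCount(channel: string): number {
    return this.handlers.get(channel)?.size ?? 0;
  }
}

// Emit the command, then settle on whichever reply arrives first.
// Reply listeners are attached BEFORE emitting so a synchronous reply is not missed.
function requestCreate(bus: TinyBus, input: { name: string }): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      offOk();
      offFailed();
    };
    const offOk = bus.on("yield:create-ok", (result) => {
      cleanup();
      resolve(result);
    });
    const offFailed = bus.on("yield:create-failed", (error) => {
      cleanup();
      reject(error);
    });
    bus.emit("yield:create", input);
  });
}
```

This sketch omits reply correlation, so two concurrent requests would both settle on the first reply. A real exchange would need some way to match a reply to its request, and how the package does that is not visible here.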
|
|
945
1123
|
|
|
946
1124
|
type ContentFormat = components['schemas']['ContentFormat'];
|
|
@@ -1242,8 +1420,5 @@ declare function generateResourceSummary(resourceName: string, content: string,
|
|
|
1242
1420
|
*/
|
|
1243
1421
|
declare function generateReferenceSuggestions(referenceTitle: string, client: InferenceClient, entityType?: string, currentContent?: string): Promise<string[] | null>;
|
|
1244
1422
|
|
|
1245
|
-
|
|
1246
|
-
|
|
1247
|
-
|
|
1248
|
-
export { AnnotationContext, AnnotationOperations, BACKUP_FORMAT, Browser, CloneTokenManager, FORMAT_VERSION, Gatherer$1 as Gatherer, GraphContext, LLMContext, LocalContentTransport, LocalTransport, Matcher, PACKAGE_NAME, ResourceContext, ResourceOperations, Stower, VERSION, bootstrapEntityTypes, createKnowledgeBase, createSmelterActorStateUnit, exportBackup, exportLinkedData, generateReferenceSuggestions, generateResourceSummary, importBackup, importLinkedData, isBackupManifest, readEntityTypesProjection, registerAnnotationAssemblyHandler, registerAnnotationLookupHandlers, registerBindUpdateBodyHandler, registerBusHandlers, registerJobCommandHandlers, startMakeMeaning, stopKnowledgeSystem, validateManifestVersion };
|
|
1249
|
-
export type { BackupContentReader, BackupEventStoreReader, BackupExporterOptions, BackupImportResult, BackupImporterOptions, BackupManifestHeader, BackupStreamSummary, BuildContextOptions, ContentBlobResolver, CreateAnnotationResult, CreateResourceInput, CreateResourceResult, GraphEdge, GraphNode, GraphRepresentation, KnowledgeBase, KnowledgeSystem, LLMContextOptions, LinkedDataContentReader, LinkedDataExporterOptions, LinkedDataImportResult, LinkedDataImporterOptions, LinkedDataViewReader, ListResourcesFilters, LocalTransportConfig, MakeMeaningConfig, MakeMeaningService, ReplayStats, SmelterActorStateUnit, SmelterActorStateUnitOptions, SmelterEvent, UpdateAnnotationBodyResult };
|
|
1423
|
+
export { AnnotationContext, AnnotationOperations, BACKUP_FORMAT, Browser, CloneTokenManager, FORMAT_VERSION, Gatherer$1 as Gatherer, GraphContext, LLMContext, LocalContentTransport, LocalTransport, Matcher, ResourceContext, ResourceOperations, Smelter, Stower, bootstrapEntityTypes, createKnowledgeBase, createSmelterActorStateUnit, exportBackup, exportLinkedData, generateReferenceSuggestions, generateResourceSummary, importBackup, importLinkedData, isBackupManifest, readEntityTypesProjection, registerAnnotationAssemblyHandler, registerAnnotationLookupHandlers, registerBindUpdateBodyHandler, registerBusHandlers, registerJobCommandHandlers, startMakeMeaning, stopKnowledgeSystem, validateManifestVersion };
|
|
1424
|
+
export type { BackupContentReader, BackupEventStoreReader, BackupExporterOptions, BackupImportResult, BackupImporterOptions, BackupManifestHeader, BackupStreamSummary, BuildContextOptions, ContentBlobResolver, CreateAnnotationResult, CreateResourceInput, CreateResourceResult, GraphEdge, GraphNode, GraphRepresentation, KnowledgeBase, KnowledgeSystem, LLMContextOptions, LinkedDataContentReader, LinkedDataExporterOptions, LinkedDataImportResult, LinkedDataImporterOptions, LinkedDataViewReader, ListResourcesFilters, LocalTransportConfig, MakeMeaningConfig, MakeMeaningService, ReconcileState, ReconcileSummary, ReplayStats, SmelterActorStateUnit, SmelterActorStateUnitOptions, SmelterEvent, SmelterInput, SmelterTiming, SmelterWorkItem, UpdateAnnotationBodyResult };
|