npm - knolo-core - Versions diffs - 0.2.3 → 3.1.0 - Mend

knolo-core 0.2.3 → 3.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

package/DOCS.md CHANGED

@@ -1,7 +1,9 @@
 # DOCS.md — KnoLo Core
-> Deterministic, embedding-free retrieval and portable knowledge packs.
+> Deterministic, embedding-first optional hybrid retrieval and portable knowledge packs.
+Determinism note: lexical retrieval is deterministic, and semantic rerank is deterministic given the same `.knolo` pack bytes, query embedding model, and embedding provider outputs.
 ## Table of Contents
@@ -51,8 +53,8 @@ npm install knolo-core
 import { buildPack, mountPack, query, makeContextPatch } from "knolo-core";
 const docs = [
-  { id: "guide", heading: "React Native Bridge", text: "The bridge sends messages between JS and native. You can throttle events..." },
-  { id: "throttle", heading: "Throttling", text: "Throttling reduces frequency of events..." }
+  { id: "guide", namespace: "mobile", heading: "React Native Bridge", text: "The bridge sends messages between JS and native. You can throttle events..." },
+  { id: "throttle", namespace: "mobile", heading: "Throttling", text: "Throttling reduces frequency of events..." }
 ];
 const bytes = await buildPack(docs);
@@ -64,10 +66,21 @@ const patch = makeContextPatch(hits, { budget: "small" });
 ### CLI build
 ```bash
-# input docs.json -> output knowledge.knolo
+# lexical-only
 npx knolo docs.json knowledge.knolo
+# semantic-enabled build (embeddings JSON + model id)
+npx knolo docs.json knowledge.knolo --embeddings embeddings.json --model-id text-embedding-3-small
+# embed agents from a local directory (.json/.yml/.yaml)
+npx knolo docs.json knowledge.knolo --agents ./examples/agents
 ```
+### Agents and namespace binding
+When agent definitions are embedded into `meta.agents`, `resolveAgent(pack, { agentId, query, patch })` enforces **strict namespace binding**: `retrievalDefaults.namespace` always wins over caller `query.namespace`. This keeps retrieval deterministic and on-policy for each agent.
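The strict binding rule added above can be illustrated with a small stand-alone sketch. The types and the `bindNamespace` helper here are hypothetical and are not part of `knolo-core`; the only behavior taken from the docs is that the agent's `retrievalDefaults.namespace` overrides any caller-supplied `namespace`. Falling back to the caller's namespace when the agent defines none is an assumption of this sketch.

```typescript
// Hypothetical shapes for illustration only; not the library's actual types.
type NamespaceFilter = string | string[];

interface AgentDefinition {
  id: string;
  retrievalDefaults: { namespace?: NamespaceFilter; topK?: number };
}

interface CallerQueryOptions {
  namespace?: NamespaceFilter;
  topK?: number;
}

// Merge caller options over the agent defaults, then re-apply the agent's
// namespace so a caller cannot widen or redirect the retrieval scope.
function bindNamespace(
  agent: AgentDefinition,
  caller: CallerQueryOptions = {}
): CallerQueryOptions {
  const merged: CallerQueryOptions = { ...agent.retrievalDefaults, ...caller };
  if (agent.retrievalDefaults.namespace !== undefined) {
    merged.namespace = agent.retrievalDefaults.namespace; // agent always wins
  }
  return merged;
}

const mobileAgent: AgentDefinition = {
  id: "mobile-helper", // hypothetical agent id
  retrievalDefaults: { namespace: "mobile", topK: 5 },
};

// The caller asks for another namespace; only topK is honored.
const boundOptions = bindNamespace(mobileAgent, { namespace: "billing", topK: 3 });
console.log(boundOptions);
```

Other caller options (here `topK`) still override the agent defaults in this sketch; whether the library treats them the same way is not stated in the diff.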
 ---
 ## Concepts
@@ -90,6 +103,7 @@ npx knolo docs.json knowledge.knolo
 type BuildInputDoc = {
   id?: string;          // exposed later as hit.source
   heading?: string;     // boosts relevance when overlapping query terms
+  namespace?: string;   // optional namespace for scoped retrieval
   text: string;         // raw markdown accepted (lightly stripped)
 };
 ```
@@ -97,7 +111,17 @@ type BuildInputDoc = {
 ### API
 ```ts
-const bytes: Uint8Array = await buildPack(docs: BuildInputDoc[]);
+const bytes: Uint8Array = await buildPack(docs: BuildInputDoc[], {
+  semantic?: {
+    enabled: boolean;
+    modelId: string;
+    embeddings: Float32Array[]; // same length/order as blocks
+    quantization?: {
+      type: 'int8_l2norm';
+      perVectorScale?: true;
+    };
+  };
+});
 ```
 **Tips**
@@ -115,7 +139,30 @@ const bytes: Uint8Array = await buildPack(docs: BuildInputDoc[]);
 ```ts
 type QueryOptions = {
   topK?: number;               // default 10
+  minScore?: number;           // optional absolute score floor
   requirePhrases?: string[];   // phrases that must appear verbatim
+  namespace?: string | string[]; // optional namespace filter(s)
+  source?: string | string[];    // optional source/docId filter(s)
+  queryExpansion?: {
+    enabled?: boolean;         // default true
+    docs?: number;             // top seed docs, default 3
+    terms?: number;            // expanded lexical terms, default 4
+    weight?: number;           // tf scaling for expansion terms, default 0.35
+    minTermLength?: number;    // default 3
+  };
+  semantic?: {
+    enabled?: boolean;         // default false
+    mode?: "rerank";           // default "rerank"
+    topN?: number;             // default 50
+    minLexConfidence?: number; // default 0.35
+    blend?: {
+      enabled?: boolean;       // default true
+      wLex?: number;           // default 0.75
+      wSem?: number;           // default 0.25
+    };
+    queryEmbedding?: Float32Array; // required if enabled=true
+    force?: boolean;           // rerank even when lexical confidence is high
+  };
 };
 type Hit = {
@@ -123,22 +170,50 @@ type Hit = {
   score: number;
   text: string;
   source?: string;             // docId if provided
+  namespace?: string;          // namespace if provided
 };
 const hits: Hit[] = query(pack, '“react native bridge” throttling', {
   topK: 5,
-  requirePhrases: ["maximum rate"] // hard constraint
+  requirePhrases: ["maximum rate"], // hard constraint
+  namespace: "mobile",
+  source: ["guide", "faq"]
 });
 ```
+### Semantic helper ergonomics
+```ts
+import { hasSemantic, validateQueryOptions, validateSemanticQueryOptions } from "knolo-core";
+if (hasSemantic(pack)) {
+  validateQueryOptions({
+    topK: 10,
+    namespace: "mobile",
+    queryExpansion: { enabled: true, docs: 3, terms: 4 },
+  });
+  validateSemanticQueryOptions({
+    enabled: true,
+    topN: 40,
+    minLexConfidence: 0.35,
+    queryEmbedding,
+  });
+}
+```
+`validateQueryOptions(...)` and `validateSemanticQueryOptions(...)` throw useful errors for invalid option types/ranges (for example `topK`, `queryExpansion.docs`, `topN`, `minLexConfidence`, blend weights, and missing `Float32Array` embedding types).
 **What the ranker does**
 1. Enforces quoted/required phrases (hard filter)
-2. BM25L with precomputed avg block length
+2. Corpus-aware BM25L with true IDF, query-time DF collection, and per-block length normalization
 3. **Proximity bonus** (minimal span cover)
 4. **Heading overlap** boost
-5. **KNS** tie-breaker (small, deterministic)
-6. **De-dupe + MMR** diversity for final top-K
+5. Deterministic **pseudo-relevance query expansion** from top lexical seeds
+6. **KNS** tie-breaker (small, deterministic)
+7. Optional semantic rerank over lexical top-N when confidence is low
+8. **De-dupe + MMR** diversity for final top-K
 ---
@@ -178,6 +253,75 @@ const patch = makeContextPatch(hits, { budget: "mini" | "small" | "full" });
 query(pack, "throttling", { requirePhrases: ["react native bridge"] });
 ```
+### Namespace-scoped retrieval
+```ts
+query(pack, "bridge events", { namespace: ["mobile", "sdk"] });
+```
+### Source/docId-scoped retrieval
+```ts
+query(pack, "throttling", { source: ["guide", "faq"] });
+```
+### Minimum score threshold
+```ts
+query(pack, "throttle bridge", { minScore: 2.5 });
+```
+Use this when you prefer precision over recall and only want confident lexical matches.
+### Query expansion controls
+```ts
+query(pack, "throttle bridge", {
+  queryExpansion: { enabled: true, docs: 4, terms: 6, weight: 0.3 }
+});
+```
+This keeps retrieval lexical/deterministic while increasing recall for related vocabulary found in top-ranked seed blocks.
+### Optional semantic rerank (hybrid MVP)
+```ts
+query(pack, "throttle bridge", {
+  topK: 5,
+  semantic: {
+    enabled: true,
+    queryEmbedding, // Float32Array from your embedding model (required)
+    topN: 50,
+    minLexConfidence: 0.35,
+    force: false,
+    blend: { enabled: true, wLex: 0.75, wSem: 0.25 },
+  },
+});
+```
+Lexical retrieval still runs first. Semantic rerank only touches top-N lexical candidates, and runs before de-dupe/MMR. If `pack.semantic` is missing, rerank is skipped silently; if `queryEmbedding` is omitted while enabled, `query(...)` throws.
+Example with explicit validation:
+```ts
+validateSemanticQueryOptions({
+  enabled: true,
+  topN: 64,
+  minLexConfidence: 0.25,
+  blend: { enabled: true, wLex: 0.7, wSem: 0.3 },
+  queryEmbedding,
+});
+const hits = query(pack, userQuery, {
+  semantic: {
+    enabled: true,
+    queryEmbedding,
+    topN: 64,
+    minLexConfidence: 0.25,
+  },
+});
+```
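As a rough illustration of "rerank only the top-N lexical candidates, then blend", the sketch below reorders the head of a lexically ranked list using the default weights from the docs (`wLex = 0.75`, `wSem = 0.25`). Everything else is an assumption: the `Candidate` shape, normalizing lexical scores by the head maximum, and the tie-break on `id` are illustrative choices, not the library's documented scoring.

```typescript
// Hypothetical candidate shape; semScore stands in for a query/block similarity.
interface Candidate {
  id: number;
  lexScore: number; // lexical score, higher is better
  semScore: number; // semantic similarity, assumed to lie in [-1, 1]
}

// Rerank the first `topN` candidates by a weighted blend; the tail keeps its
// lexical order. ASSUMPTION: lexical scores are normalized by the head maximum.
function blendRerank(
  candidates: Candidate[],
  topN: number,
  wLex = 0.75,
  wSem = 0.25
): Candidate[] {
  const head = candidates.slice(0, topN);
  const tail = candidates.slice(topN);
  const maxLex = Math.max(0, ...head.map((c) => c.lexScore));
  const scored = head.map((c) => ({
    c,
    blended: wLex * (maxLex > 0 ? c.lexScore / maxLex : 0) + wSem * c.semScore,
  }));
  // Sort by blended score, breaking ties on id so the order stays deterministic.
  scored.sort((a, b) => b.blended - a.blended || a.c.id - b.c.id);
  return [...scored.map((s) => s.c), ...tail];
}

const lexicalOrder: Candidate[] = [
  { id: 0, lexScore: 10, semScore: 0.1 },
  { id: 1, lexScore: 9, semScore: 0.9 },
  { id: 2, lexScore: 2, semScore: 1.0 },
];
const rerankedIds = blendRerank(lexicalOrder, 2).map((c) => c.id);
console.log(rerankedIds);
```

Candidate `2` has the best semantic score but sits outside the top-N window, so it is never promoted; this mirrors the statement that rerank only touches lexical candidates already retrieved.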
 ### Tight vs. scattered matches
 Proximity bonus favors blocks where all query terms co-occur in a small span.
@@ -203,13 +347,14 @@ Top-K results apply near-duplicate suppression (5-gram Jaccard) and MMR (λ≈0.
 [lexLen:u32][lexicon JSON]
 [postCount:u32][postings u32[]]
 [blocksLen:u32][blocks JSON]
+[semLen:u32][semantic JSON][semBlobLen:u32][semantic blob] // optional tail at EOF
 ```
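A reader for the optional tail might look like the sketch below. It assumes the `u32` length fields are little-endian and the JSON is UTF-8; the diff does not state either, so treat both as assumptions. `readSemanticTail` and `writeSemanticTail` are hypothetical helper names, not exports of `knolo-core`.

```typescript
interface SemanticTail {
  meta: unknown;    // parsed [semantic JSON]
  blob: Uint8Array; // raw [semantic blob]
}

// Reads [semLen:u32][semantic JSON][semBlobLen:u32][semantic blob] starting at
// `offset`, the first byte after the blocks JSON. Returns undefined when the
// file ends there (no semantic data).
// ASSUMPTION: little-endian u32 fields and UTF-8 JSON.
function readSemanticTail(bytes: Uint8Array, offset: number): SemanticTail | undefined {
  if (offset >= bytes.byteLength) return undefined;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const semLen = view.getUint32(offset, true);
  const jsonStart = offset + 4;
  const jsonEnd = jsonStart + semLen;
  const meta = JSON.parse(new TextDecoder().decode(bytes.subarray(jsonStart, jsonEnd)));
  const blobLen = view.getUint32(jsonEnd, true);
  const blobStart = jsonEnd + 4;
  return { meta, blob: bytes.subarray(blobStart, blobStart + blobLen) };
}

// Test helper: writes a tail in the same assumed layout.
function writeSemanticTail(meta: object, blob: Uint8Array): Uint8Array {
  const json = new TextEncoder().encode(JSON.stringify(meta));
  const out = new Uint8Array(4 + json.length + 4 + blob.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, json.length, true);
  out.set(json, 4);
  view.setUint32(4 + json.length, blob.length, true);
  out.set(blob, 8 + json.length);
  return out;
}

const tailBytes = writeSemanticTail({ version: 1, dims: 2 }, new Uint8Array([1, 2, 3]));
const parsedTail = readSemanticTail(tailBytes, 0);
console.log(parsedTail?.meta, parsedTail?.blob.length);
```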
 **Meta JSON**
 ```json
 {
-  "version": 2,
+  "version": 3,
   "stats": {
     "docs": <number>,
     "blocks": <number>,
@@ -219,6 +364,77 @@ Top-K results apply near-duplicate suppression (5-gram Jaccard) and MMR (λ≈0.
 }
 ```
+**Optional semantic tail**
+* Fully backward compatible: if EOF is reached immediately after `blocks JSON`, no semantic data is present.
+* Semantic tail schema version is `1` (`semantic.version = 1`).
+* `buildPack(..., { semantic })` can now generate this section from provided `Float32Array` embeddings (no model inference at build time).
+* Quantization is deterministic `int8_l2norm` per vector:
+  1. L2-normalize the input embedding.
+  2. Compute `scale = max(abs(e_i)) / 127`.
+  3. Quantize `q_i = clamp(round(e_i / scale), -127..127)`.
+  4. Store scale in `Uint16Array` using float16 encoding.
+* Blob layout is **vectors first, scales second**:
+  * `blocks.vectors.byteOffset = 0`
+  * `blocks.vectors.length = blockCount * dims` (Int8 elements)
+  * `blocks.scales.byteOffset = vectors.byteLength`
+  * `blocks.scales.length = blockCount` (Uint16 elements)
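The quantization steps listed above can be sketched as follows. This is a minimal illustration, not the library's implementation: `quantizeInt8L2Norm` is a hypothetical name, the scale is returned as a plain number (step 4, the float16 encoding, is omitted), and the handling of all-zero vectors is an assumption.

```typescript
interface QuantizedVector {
  q: Int8Array;  // quantized components in [-127, 127]
  scale: number; // per-vector scale; float16 packing is omitted in this sketch
}

function quantizeInt8L2Norm(embedding: Float32Array): QuantizedVector {
  // Step 1: L2-normalize the input embedding.
  let sumSq = 0;
  for (let i = 0; i < embedding.length; i++) sumSq += embedding[i] * embedding[i];
  const norm = Math.sqrt(sumSq);
  const unit = new Float64Array(embedding.length);
  for (let i = 0; i < embedding.length; i++) {
    unit[i] = norm > 0 ? embedding[i] / norm : 0; // ASSUMPTION: zero vector stays zero
  }

  // Step 2: scale = max(abs(e_i)) / 127.
  let maxAbs = 0;
  for (let i = 0; i < unit.length; i++) maxAbs = Math.max(maxAbs, Math.abs(unit[i]));
  const scale = maxAbs / 127;

  // Step 3: q_i = clamp(round(e_i / scale), -127..127).
  const q = new Int8Array(unit.length);
  for (let i = 0; i < unit.length; i++) {
    const rounded = scale > 0 ? Math.round(unit[i] / scale) : 0;
    q[i] = Math.max(-127, Math.min(127, rounded));
  }
  return { q, scale };
}

// [3, 4] normalizes to [0.6, 0.8]; the largest component maps to 127.
const quantized = quantizeInt8L2Norm(new Float32Array([3, 4]));
console.log(Array.from(quantized.q), quantized.scale);
```

An approximate component can be recovered as `q[i] * scale`, which is why the scale is stored alongside each vector.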
+Semantic JSON schema (stored verbatim in `[semantic JSON]`):
+```json
+{
+  "version": 1,
+  "modelId": "string",
+  "dims": 384,
+  "encoding": "int8_l2norm",
+  "perVectorScale": true,
+  "blocks": {
+    "vectors": { "byteOffset": 0, "length": 1152 },
+    "scales": { "byteOffset": 1152, "length": 3, "encoding": "float16" }
+  }
+}
+```
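The numbers in the schema example are consistent with 3 blocks of 384 dimensions: 3 × 384 = 1152 Int8 elements, followed by 3 Uint16 scales starting at byte 1152. The sketch below derives that layout and slices a blob accordingly. `computeLayout` and `sliceBlob` are hypothetical helpers; reading the scales through a `Uint16Array` uses the platform's byte order, which is an assumption since the diff does not specify one.

```typescript
interface SemanticLayout {
  vectors: { byteOffset: number; length: number };
  scales: { byteOffset: number; length: number; encoding: "float16" };
}

// Vectors first, scales second, as described in the docs.
function computeLayout(blockCount: number, dims: number): SemanticLayout {
  const vectorElements = blockCount * dims; // Int8 elements, one byte each
  return {
    vectors: { byteOffset: 0, length: vectorElements },
    scales: { byteOffset: vectorElements, length: blockCount, encoding: "float16" },
  };
}

function sliceBlob(blob: Uint8Array, layout: SemanticLayout) {
  const vecs = new Int8Array(
    blob.buffer,
    blob.byteOffset + layout.vectors.byteOffset,
    layout.vectors.length
  );
  // Uint16Array views require 2-byte alignment, so copy the scale bytes into
  // a fresh buffer instead of viewing them in place.
  const scaleBytes = blob.slice(
    layout.scales.byteOffset,
    layout.scales.byteOffset + layout.scales.length * 2
  );
  const scales = new Uint16Array(scaleBytes.buffer, 0, layout.scales.length);
  return { vecs, scales };
}

const exampleLayout = computeLayout(3, 384);
const exampleBlob = new Uint8Array(3 * 384 + 3 * 2); // vectors + float16 scales
const views = sliceBlob(exampleBlob, exampleLayout);
console.log(exampleLayout, views.vecs.length, views.scales.length);
```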
+### Building packs with embeddings (library usage)
+```ts
+const embeddings: Float32Array[] = await Promise.all(
+  docs.map(async (doc) => embedText(doc.text))
+);
+const bytes = await buildPack(docs, {
+  semantic: {
+    enabled: true,
+    modelId: "text-embedding-3-small",
+    embeddings,
+    quantization: { type: "int8_l2norm", perVectorScale: true },
+  },
+});
+```
+Embedding validation rules:
+* `embeddings.length` must match block count exactly.
+* every embedding must be `Float32Array`.
+* every vector must have identical `dims`.
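A pre-flight check mirroring these three rules could be written as below. `checkEmbeddings` is a hypothetical helper for use before calling `buildPack`; the error messages and the choice to return the shared dimension are illustrative, and the library's own validation may differ.

```typescript
// Validates embeddings against the documented rules and returns the shared
// dimension (0 for an empty pack).
function checkEmbeddings(embeddings: unknown[], blockCount: number): number {
  // Rule 1: one embedding per block, in block order.
  if (embeddings.length !== blockCount) {
    throw new Error(`expected ${blockCount} embeddings, got ${embeddings.length}`);
  }
  let dims = 0;
  for (let i = 0; i < embeddings.length; i++) {
    const e = embeddings[i];
    // Rule 2: every embedding must be a Float32Array.
    if (!(e instanceof Float32Array)) {
      throw new TypeError(`embedding ${i} is not a Float32Array`);
    }
    // Rule 3: every vector must have identical dims.
    if (i === 0) {
      dims = e.length;
    } else if (e.length !== dims) {
      throw new Error(`embedding ${i} has ${e.length} dims, expected ${dims}`);
    }
  }
  return dims;
}

const sharedDims = checkEmbeddings(
  [new Float32Array([0.1, 0.2]), new Float32Array([0.3, 0.4])],
  2
);
console.log(sharedDims);
```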
+### Querying with semantic rerank
+```ts
+const queryEmbedding = await embedText(userQuestion);
+const hits = query(pack, userQuestion, {
+  topK: 8,
+  semantic: {
+    enabled: true,
+    queryEmbedding,
+    topN: 64,
+    minLexConfidence: 0.35,
+    blend: { enabled: true, wLex: 0.75, wSem: 0.25 },
+  },
+});
+```
 **Lexicon JSON**
 * Array of `[term, termId]` pairs.
@@ -228,15 +444,16 @@ Top-K results apply near-duplicate suppression (5-gram Jaccard) and MMR (λ≈0.
 * Flattened `Uint32Array`:
   ```
-  termId, blockId, pos, pos, …, 0, blockId, …, 0, 0, termId, ...
+  termId, blockId+1, pos, pos, …, 0, blockId+1, …, 0, 0, termId, ...
   ```
-  Each block section ends with `0`, each term section ends with `0`.
+  Block IDs are encoded as `bid + 1` so `0` is reserved as the delimiter. Each block section ends with `0`, each term section ends with `0`.
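To make the `bid + 1` encoding concrete, here is a small decoder sketch for the flattened postings array. `decodePostings` is a hypothetical helper, and it assumes positions are never `0` (otherwise they would collide with the block delimiter); the diff does not say how positions are numbered.

```typescript
// termId -> (blockId -> positions)
type Postings = Map<number, Map<number, number[]>>;

// Layout: termId, blockId+1, pos, pos, ..., 0, blockId+1, ..., 0, 0, termId, ...
// ASSUMPTION: positions are non-zero, so 0 is unambiguous as a delimiter.
function decodePostings(flat: Uint32Array): Postings {
  const out: Postings = new Map();
  let i = 0;
  while (i < flat.length) {
    const termId = flat[i++];
    const blocks = new Map<number, number[]>();
    // Read block sections until the 0 that terminates this term.
    while (i < flat.length && flat[i] !== 0) {
      const blockId = flat[i++] - 1; // undo the +1 offset
      const positions: number[] = [];
      while (i < flat.length && flat[i] !== 0) positions.push(flat[i++]);
      i++; // skip the block delimiter
      blocks.set(blockId, positions);
    }
    i++; // skip the term delimiter
    out.set(termId, blocks);
  }
  return out;
}

// term 1: block 0 at positions 3 and 7, block 2 at position 1
// term 2: block 1 at position 5
const flatPostings = new Uint32Array([1, 1, 3, 7, 0, 3, 1, 0, 0, 2, 2, 5, 0, 0]);
const decoded = decodePostings(flatPostings);
console.log(decoded);
```

Without the offset, a posting for block `0` would be indistinguishable from the end of a term section, which is the ambiguity the `bid + 1` change removes.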
 **Blocks JSON (v1 / v2)**
 * **v1**: `string[]` (text only)
 * **v2**: `{ text, heading?, docId? }[]`
+* **v3**: `{ text, heading?, docId?, namespace?, len }[]` (`len` is block token length for stable ranking)
 Runtime auto-detects and exposes:
@@ -244,7 +461,18 @@ Runtime auto-detects and exposes:
 type Pack = {
   meta, lexicon, postings, blocks: string[],
   headings?: (string|null)[],
-  docIds?: (string|null)[]
+  docIds?: (string|null)[],
+  namespaces?: (string|null)[],
+  blockTokenLens?: number[],
+  semantic?: {
+    version: 1,
+    modelId: string,
+    dims: number,
+    encoding: "int8_l2norm",
+    perVectorScale: boolean,
+    vecs: Int8Array,
+    scales?: Uint16Array
+  }
 }
 ```
@@ -319,7 +547,7 @@ npm run smoke
 ## FAQ
 **Q: Does this use embeddings or a vector DB?**
-A: No—pure lexical retrieval with positions and structural cues.
+A: Default retrieval is lexical. Optional semantic hybrid rerank is supported when packs are built with embeddings; no external vector DB is required.
 **Q: Why am I still seeing similar results?**
 A: De-dup suppresses near-duplicates but allows related passages. Increase Jaccard threshold or tune λ (if forking).