latticesql 4.0.1 → 4.1.0

@@ -2,6 +2,13 @@
 
  Complete reference for all public classes, methods, and types exported by `latticesql`.
 
+ > **v4.1 retrieval, query & data primitives** (eval/doctor/benchmark, chunked +
+ > indexed vector, hybrid + ranking + rerank, graph, bounded reads / projection /
+ > jsonPath / aggregate / keyset pagination / distinctOn / include, provenance +
+ > trust, retry + resumable migrations, computed columns + rollups, keyless cloud
+ > file presigning) are documented with examples in **[retrieval.md](retrieval.md)**.
+ > All are additive and opt-in.
+
  ---
 
  ## Table of Contents
@@ -0,0 +1,171 @@
+ # Retrieval, query & data primitives (v4.1)
+
+ latticesql 4.1 turns the library into a measurable, production-grade retrieval and
+ data substrate. Everything here is **additive and opt-in**: without the opt-in,
+ `query()`, `count()`, and `search()` behave byte-identically to 4.0. Every primitive
+ ships with unit tests (`:memory:` SQLite), integration tests (real Postgres), and
+ dialect-parity tests.
+
+ ## Measurable retrieval
+
+ ### `evaluateRetrieval(queries, retriever, opts?)`
+
+ Computes standard IR metrics for **any** ranked retriever, i.e. any function
+ `(query) => rankedRowIds`, so it can grade semantic, full-text, hybrid, or graph
+ search, or an external service.
+
+ ```ts
+ const summary = await db.evaluateRetrieval(
+   [{ query: 'budget', relevant: ['doc1', 'doc7'] }],
+   async (q) => (await db.search('docs', q, { topK: 10 })).map((r) => String(r.row.id)),
+   { k: 10, ks: [1, 5, 10] },
+ );
+ // summary.precisionAtK / recallAtK / mrr / ndcgAtK / map (+ perQuery, byK)
+ ```
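The metrics `evaluateRetrieval` reports (precision@k, recall@k, MRR, nDCG@k, MAP) have standard IR definitions. Below is a minimal sketch of them for a single query with binary relevance, written as standalone functions. The function names are hypothetical and this is not latticesql's implementation, which may differ in details such as how precision@k treats result lists shorter than k. MRR and MAP are the means of `reciprocalRank` and `averagePrecision` over all queries.

```ts
// Hypothetical helpers illustrating the metric definitions; not latticesql's code.
// `ranked` is a retriever's output (best first); `relevant` is the judged-relevant set.

function hitsAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return ranked.slice(0, k).filter((id) => relevant.has(id)).length;
}

// Fraction of the top k slots that hold a relevant item.
function precisionAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return k > 0 ? hitsAtK(ranked, relevant, k) / k : 0;
}

// Fraction of all relevant items that appear in the top k.
function recallAtK(ranked: string[], relevant: Set<string>, k: number): number {
  return relevant.size > 0 ? hitsAtK(ranked, relevant, k) / relevant.size : 0;
}

// 1 / (1-based rank of the first relevant item), or 0 if none was retrieved.
function reciprocalRank(ranked: string[], relevant: Set<string>): number {
  const i = ranked.findIndex((id) => relevant.has(id));
  return i === -1 ? 0 : 1 / (i + 1);
}

// Binary-relevance nDCG: discounted gain of this ranking divided by that of the ideal ranking.
function ndcgAtK(ranked: string[], relevant: Set<string>, k: number): number {
  let dcg = 0;
  ranked.slice(0, k).forEach((id, i) => {
    if (relevant.has(id)) dcg += 1 / Math.log2(i + 2);
  });
  let idcg = 0;
  for (let i = 0; i < Math.min(k, relevant.size); i++) idcg += 1 / Math.log2(i + 2);
  return idcg > 0 ? dcg / idcg : 0;
}

// Mean of precision@i over every rank i that holds a relevant item, divided by |relevant|.
function averagePrecision(ranked: string[], relevant: Set<string>): number {
  let hits = 0;
  let sum = 0;
  ranked.forEach((id, i) => {
    if (relevant.has(id)) {
      hits++;
      sum += hits / (i + 1);
    }
  });
  return relevant.size > 0 ? sum / relevant.size : 0;
}
```

For a retriever that returns `['doc3', 'doc1', 'doc7']` when `doc1` and `doc7` are relevant, precision@3 is 2/3, recall@3 is 1, and the reciprocal rank is 1/2.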
+
+ `detectRetrievalRegressions(baseline, candidate, tolerance)` turns this into a CI
+ gate: a retrieval change that lowers any metric by more than the tolerance fails
+ the build.
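A regression gate of this kind reduces to comparing two metric summaries. The sketch below is a hypothetical stand-in (`findRegressions` and `assertNoRegressions` are illustrative names, not latticesql exports) and assumes an absolute tolerance; the library's `detectRetrievalRegressions` may use a relative one or a different result shape.

```ts
// Hypothetical regression check; latticesql's detectRetrievalRegressions may differ
// (for example, in whether tolerance is absolute or relative).
type MetricSummary = Record<string, number>;

interface Regression {
  metric: string;
  baseline: number;
  candidate: number;
  drop: number;
}

// Flags every metric whose value fell by more than `tolerance` (absolute units).
function findRegressions(baseline: MetricSummary, candidate: MetricSummary, tolerance: number): Regression[] {
  const regressions: Regression[] = [];
  for (const [metric, base] of Object.entries(baseline)) {
    const cand = candidate[metric] ?? 0; // a metric missing from the candidate counts as 0
    const drop = base - cand;
    if (drop > tolerance) regressions.push({ metric, baseline: base, candidate: cand, drop });
  }
  return regressions;
}

// CI usage: throwing makes the test runner, and therefore the build, fail.
function assertNoRegressions(baseline: MetricSummary, candidate: MetricSummary, tolerance = 0.02): void {
  const found = findRegressions(baseline, candidate, tolerance);
  if (found.length > 0) {
    const lines = found.map((r) => `${r.metric}: ${r.baseline.toFixed(3)} -> ${r.candidate.toFixed(3)}`);
    throw new Error(`retrieval regressed:\n${lines.join('\n')}`);
  }
}
```

Improvements never trip the gate, since only drops larger than the tolerance are reported.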
+
+ ### `lattice doctor` / `diagnoseRetrieval(opts?)`
+
+ A read-only health check: per-table FTS and embedding coverage (soft-deleted rows
+ excluded), extension availability (FTS5, sqlite-vec, pgvector, pg_trgm), and a
+ severity-ranked list of issues. `lattice doctor [--json]` exits non-zero when it
+ finds any error, so it can serve as a deploy gate.
+
+ ### `benchmarkRetrieval(opts?)` / `checkSlos(report, slos)`
+
+ Measures reproducible p50/p95/p99 latency for filtered queries, FTS, vector search,
+ and aggregates, plus ingest throughput and peak memory, on both dialects at a
+ configurable scale (`LATTICE_BENCH_ROWS/QUERIES/DIM`). The benchmark ships in the
+ package so users can reproduce the numbers themselves; wire `checkSlos` into CI as
+ an SLO gate.
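Percentile summaries and an SLO check are simple to express directly. This is a minimal sketch that assumes the nearest-rank percentile method (other tools interpolate between samples) and a per-operation map of millisecond limits; `summarize` and `sloViolations` are hypothetical names, and the shape `checkSlos` actually accepts is not documented here.

```ts
// Hypothetical sketch; latticesql's percentile method and checkSlos input shape may differ.
interface LatencySummary {
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('percentile of an empty sample set');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[Math.max(rank, 1) - 1]!;
}

function summarize(latenciesMs: number[]): LatencySummary {
  return { p50: percentile(latenciesMs, 50), p95: percentile(latenciesMs, 95), p99: percentile(latenciesMs, 99) };
}

// Returns one message per breached limit; an empty array means every SLO holds.
function sloViolations(
  report: Record<string, LatencySummary>,
  slos: Record<string, Partial<LatencySummary>>,
): string[] {
  const violations: string[] = [];
  for (const [op, limits] of Object.entries(slos)) {
    const measured = report[op];
    if (!measured) {
      violations.push(`${op}: no measurements`);
      continue;
    }
    for (const key of ['p50', 'p95', 'p99'] as const) {
      const limit = limits[key];
      if (limit !== undefined && measured[key] > limit) {
        violations.push(`${op}.${key}: ${measured[key]}ms > ${limit}ms`);
      }
    }
  }
  return violations;
}
```

Treating a missing measurement as a violation keeps a silently skipped benchmark from passing the gate.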
+
+ ## Better search
+
+ ### Chunked + contextual embeddings
+
+ ```ts
+ db.define('docs', {
+   columns: { id: 'TEXT PRIMARY KEY', title: 'TEXT', body: 'TEXT' },
+   embeddings: {
+     fields: ['title', 'body'],
+     embed: myEmbedder,
+     chunker: semanticChunker({ maxChars: 1000, overlap: 100 }),
+     contextPrefix: (row) => String(row.title), // prepended to every chunk
+     modelId: 'text-embedding-3-small',
+   },
+ });
+ ```
+
+ Each row is split into several boundary-aware chunks that are embedded separately,
+ which raises precision@k and reduces the tokens needed to reach a correct answer.
+ `search()` returns the best-matching chunk (`chunkIndex` + `matchedContent`),
+ excludes soft-deleted rows, and throws `EmbeddingDimensionMismatchError` if the
+ model's dimension changed without a re-embed. `refreshEmbeddings(table, opts)`
+ backfills missing embeddings, re-embeds stale ones, and sweeps orphaned ones.
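To make chunking with overlap and a context prefix concrete, here is a minimal sketch. It swaps in a simple fixed-window chunker that prefers to cut at spaces; latticesql's `semanticChunker` boundary rules are not shown in this document, so `chunkText` and `contextualChunks` are hypothetical stand-ins that only mirror the `maxChars`, `overlap`, and `contextPrefix` options.

```ts
// Hypothetical fixed-window chunker with overlap; a simpler stand-in for
// latticesql's semanticChunker, whose boundary rules are not shown here.
function chunkText(text: string, maxChars: number, overlap: number): string[] {
  if (overlap < 0 || overlap >= maxChars) throw new Error('need 0 <= overlap < maxChars');
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChars, text.length);
    if (end < text.length) {
      // Prefer to end the chunk at the last space inside the window,
      // as long as that still moves the next window forward.
      const cut = text.lastIndexOf(' ', end);
      if (cut > start + overlap) end = cut;
    }
    chunks.push(text.slice(start, end).trim());
    if (end >= text.length) break;
    start = end - overlap; // the next chunk repeats the last `overlap` characters
  }
  return chunks;
}

// Contextual embedding input: prefix every chunk with row-level context (e.g. the title)
// so a chunk that never names its subject can still match queries about it.
function contextualChunks(
  row: Record<string, unknown>,
  fields: string[],
  contextPrefix: (row: Record<string, unknown>) => string,
  maxChars: number,
  overlap: number,
): string[] {
  const body = fields.map((f) => String(row[f] ?? '')).join('\n\n');
  const prefix = contextPrefix(row);
  return chunkText(body, maxChars, overlap).map((chunk) => `${prefix}\n\n${chunk}`);
}
```

Because the overlap is counted in characters, a chunk in this sketch can begin mid-word; a boundary-aware chunker would also snap chunk starts to word or sentence boundaries.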
+
+ ### Indexed vector search
+
+ ```ts
+ await db.buildVectorIndex('docs'); // pgvector HNSW (PG) / sqlite-vec (SQLite)
+ ```
+
+ An opt-in, per-table approximate-nearest-neighbor (ANN) index built from the stored
+ vectors. `search()` uses it automatically when present and otherwise falls back to
+ the in-process scan (which `doctor` reports). Requires the extension installed
+ server-side (pgvector) or loaded into SQLite (sqlite-vec).
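The fallback path is an exact scan: score every stored vector against the query and keep the best k, which costs O(rows × dimensions) per query. An ANN index such as HNSW approximates the same ranking while visiting far fewer vectors. The sketch below is a hypothetical illustration of that exact scan using cosine similarity (the similarity measure is an assumption; `scanTopK` is not a latticesql API).

```ts
// Hypothetical exact (brute-force) scan, the kind of in-process fallback used when no
// ANN index exists. Cosine similarity is assumed as the scoring function.
interface StoredVector {
  id: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i]!;
    const y = b[i]!;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Every stored vector is scored, then the k highest-scoring rows are kept.
function scanTopK(query: number[], rows: StoredVector[], k: number): { id: string; score: number }[] {
  return rows
    .map((row) => ({ id: row.id, score: cosineSimilarity(query, row.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The dimension check plays the same role as the `EmbeddingDimensionMismatchError` described earlier: comparing vectors from models with different dimensions is meaningless, so it fails loudly instead of returning scores.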
+
+ ### Hybrid search + ranking + reranker
+
+ ```ts
+ const results = await db.hybridSearch('docs', 'q4 budget', {
+   topK: 10,
+   ranking: {
+     recency: { column: 'created_at', halfLifeDays: 30, weight: 1 },
+     reward: { weight: 0.5 },
+   },
+   reranker: myCrossEncoder, // optional; graceful fallback on failure
+ });
+ // each result carries .explain { rrf, vectorRank/Score, ftsRank/Score, rankingBoost, rerankerScore }
+ ```
+
+ The vector and full-text arms are fused with Reciprocal Rank Fusion (RRF, k = 60).
+ `lattice search "<q>" --table <t> --explain` shows the score breakdown. Full-text
+ results are now relevance-ranked (`ts_rank` on Postgres, `bm25` on SQLite).
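RRF scores each row as the sum, over the arms that returned it, of 1 / (k + rank), so rows ranked well by both arms rise to the top and the raw vector and BM25 scores never need to be put on a common scale. The sketch below implements standard RRF with k = 60 plus a half-life recency decay. How latticesql combines the fused score with its `ranking` boosts is not documented here, so the multiplicative blend `rrf * (1 + weight * decay)` and the names `hybridRank` and `recencyDecay` are assumptions.

```ts
// Hypothetical sketch. RRF itself is standard; how latticesql combines it with the
// recency boost is not documented here, so the multiplicative blend is an assumption.

// Reciprocal Rank Fusion: score(d) = sum over arms of 1 / (k + rank_arm(d)), rank 1-based.
function reciprocalRankFusion(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

// Exponential decay: a row that is halfLifeDays old keeps half of its boost.
function recencyDecay(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

function hybridRank(
  vectorIds: string[],
  ftsIds: string[],
  ageDaysById: Map<string, number>,
  recency: { halfLifeDays: number; weight: number },
  topK: number,
): { id: string; score: number }[] {
  const fused = reciprocalRankFusion([vectorIds, ftsIds]);
  return Array.from(fused)
    .map(([id, rrf]) => {
      const age = ageDaysById.get(id);
      const boost = age === undefined ? 0 : recency.weight * recencyDecay(age, recency.halfLifeDays);
      return { id, score: rrf * (1 + boost) };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

With vector results `['a', 'b', 'c']` and full-text results `['b', 'd']`, `b` ranks first even though neither arm put it first alone, because it collects a contribution from both arms.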
+
+ ### Graph-augmented retrieval
+
+ ```ts
+ await db.addEdge({ srcTable: 'docs', srcId: 'a', dstTable: 'docs', dstId: 'b', type: 'cites' });
+ await db.extractEdges({ srcTable: 'docs', fkColumn: 'parent_id', dstTable: 'docs' }); // zero-LLM
+ const results = await db.graphSearch('docs', 'q', { anchors: [{ table: 'docs', id: 'a' }] });
+ ```
+
+ Edges live in a typed-edge graph (`__lattice_edges`) that is explored with bounded
+ breadth-first search (`traverseGraph`, depth ≤ 5, capped node count). Adjacency
+ boosting then lifts rows connected to your current-context entities, giving
+ relationship-aware retrieval.
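Bounded BFS expands outward from the anchor nodes one hop at a time and stops at the depth limit or the node cap, whichever comes first, so a densely connected graph cannot blow up a query. The sketch below is hypothetical: the depth cap of 5 mirrors the text, while following edges in both directions, the string node keys, and the `1 / (1 + depth)` boost are assumptions rather than latticesql's documented behavior.

```ts
// Hypothetical sketch of bounded BFS and an adjacency boost. The depth cap of 5 mirrors
// the text; undirected traversal and the 1 / (1 + depth) boost are assumptions.
interface Edge {
  src: string; // node key such as "docs:a"
  dst: string;
  type: string; // e.g. "cites"
}

function boundedBfs(edges: Edge[], anchors: string[], maxDepth = 2, maxNodes = 100): Map<string, number> {
  const depthLimit = Math.min(maxDepth, 5);
  const neighbors = new Map<string, string[]>();
  const link = (from: string, to: string) => {
    const list = neighbors.get(from);
    if (list) list.push(to);
    else neighbors.set(from, [to]);
  };
  for (const e of edges) {
    link(e.src, e.dst);
    link(e.dst, e.src); // follow edges in both directions
  }

  const depth = new Map<string, number>(); // node -> hops from the nearest anchor
  let frontier: string[] = [];
  for (const a of anchors) {
    if (!depth.has(a)) {
      depth.set(a, 0);
      frontier.push(a);
    }
  }
  for (let d = 1; d <= depthLimit && frontier.length > 0; d++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const nb of neighbors.get(node) ?? []) {
        if (depth.has(nb)) continue;
        if (depth.size >= maxNodes) return depth; // node cap reached: stop expanding
        depth.set(nb, d);
        next.push(nb);
      }
    }
    frontier = next;
  }
  return depth;
}

// Closer neighbors get a larger boost; rows outside the explored neighborhood get none.
function adjacencyBoost(depth: Map<string, number>, node: string): number {
  const d = depth.get(node);
  return d === undefined ? 0 : 1 / (1 + d);
}
```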
+
+ ## Query primitives
+
+ ```ts
+ // Bounded reads: guard against unbounded full-table loads
+ await db.query('t', { maxRows: 1000 }); // throws BoundedReadError if more rows match
+ new Lattice(path, { defaultMaxRows: 1000 }); // global default
+
+ // Projection: return only the columns you need
+ await db.query('t', { projection: ['id', 'name'] });
+
+ // OR/AND groups + jsonPath
+ await db.query('t', {
+   filters: [
+     { col: 'status', op: 'eq', val: 'open' },
+     {
+       or: [
+         { col: 'priority', op: 'gte', val: 3 },
+         { col: 'pinned', op: 'eq', val: true },
+       ],
+     },
+     { col: 'meta', jsonPath: 'tier', op: 'eq', val: 'gold' },
+   ],
+ });
+
+ // SQL-side aggregation
+ await db.aggregate('orders', {
+   groupBy: ['status'],
+   aggregates: [
+     { fn: 'count', as: 'n' },
+     { fn: 'sum', col: 'total', as: 'revenue' },
+   ],
+   having: [{ aggregate: 'n', op: 'gt', val: 10 }],
+ });
+
+ // Keyset pagination: stays fast at arbitrary depth
+ const page = await db.queryPage('t', { orderBy: 'created_at', limit: 50, cursor });
+
+ // distinctOn: one row per group; include: batched relation expansion
+ await db.query('events', { distinctOn: 'user_id', orderBy: 'ts', orderDir: 'desc' });
+ await db.query('posts', { include: ['author'] }); // belongsTo → row; hasMany → array
+ ```
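Keyset pagination stays fast at depth because each page resumes from the last row's sort key instead of skipping `OFFSET` rows. The sketch below simulates the semantics in memory. latticesql's cursor format is not documented here, so the JSON-encoded `[created_at, id]` cursor, the `id` tie-breaker, and the `keysetPage` name are assumptions; the SQL in the comment shows the equivalent row-value query that an index on `(created_at, id)` can answer with a seek.

```ts
// Hypothetical sketch of keyset pagination semantics. The SQL equivalent is
//   WHERE (created_at, id) > (?, ?) ORDER BY created_at, id LIMIT 51
// which an index on (created_at, id) answers by seeking, so deep pages cost about
// as much as the first one.
interface Row {
  id: string;
  created_at: string;
}

interface Page {
  rows: Row[];
  nextCursor: string | null;
}

type Key = [string, string];

function compareKeys(a: Key, b: Key): number {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// `rows` must already be sorted by (created_at, id). The id breaks ties so that no row
// is skipped or repeated when several rows share a created_at value.
function keysetPage(rows: Row[], limit: number, cursor: string | null): Page {
  const after = cursor === null ? null : (JSON.parse(cursor) as Key);
  const candidates = rows
    .filter((r) => after === null || compareKeys([r.created_at, r.id], after) > 0)
    .slice(0, limit + 1); // fetch one extra row to learn whether another page exists
  const pageRows = candidates.slice(0, limit);
  const last = pageRows[pageRows.length - 1];
  const hasMore = candidates.length > limit;
  return { rows: pageRows, nextCursor: hasMore && last ? JSON.stringify([last.created_at, last.id]) : null };
}
```

Callers pass `null` for the first page and then feed each `nextCursor` back in until it comes back `null`.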
+
+ ## Governance, reliability, computed columns, cloud files
+
+ ```ts
+ // Immutable provenance + trust gate
+ db.define('docs', { columns: {...}, provenance: true, trust: true });
+ await db.verifyRow('docs', id, 'alice'); // markRowForReview / rowsNeedingReview / verifiedRows
+
+ // Durable retry + online resumable migrations
+ await withRetry(() => db.insert(...)); // idempotent ops only
+ await applyChunkedMigration(db.adapter, { id, table, apply, batchSize: 1000 });
+
+ // Computed columns + materialized rollups
+ db.define('people', { columns: {...}, computed: {
+   full_name: { deps: ['first', 'last'], compute: (r) => `${r.first} ${r.last}` },
+ }});
+ db.define('posts', { columns: {...}, materializedRollups: {
+   comment_count: { sourceTable: 'comments', foreignKey: 'post_id', fn: 'count' },
+ }});
+
+ // Keyless cloud file-byte access (Postgres cloud)
+ await db.enableCloudFilePresigning({ bucket, region, accessKey, secretKey });
+ // members fetch bytes with zero config; the owner key never leaves the database.
+ ```
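The "idempotent ops only" note matters because a retry cannot tell a write that failed from one that committed just before the connection dropped; retrying a non-idempotent insert could apply it twice. Below is a minimal sketch of a retry helper using exponential backoff with full jitter. latticesql's `withRetry` policy (attempt count, delays, which errors it retries) is not documented here, so `retryWithBackoff` and its options are hypothetical.

```ts
// Hypothetical retry helper; latticesql's withRetry policy is not documented here.
// Exponential backoff with "full jitter" is assumed.
interface RetryOptions {
  attempts: number; // total tries, including the first
  baseDelayMs: number;
  maxDelayMs: number;
  isRetryable: (err: unknown) => boolean; // e.g. connection resets, serialization failures
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoff<T>(op: () => Promise<T>, opts: RetryOptions): Promise<T> {
  if (opts.attempts < 1) throw new Error('attempts must be >= 1');
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // Permanent errors (e.g. constraint violations) and the final attempt are not retried.
      if (!opts.isRetryable(err) || attempt === opts.attempts - 1) break;
      // The delay ceiling grows as base * 2^attempt up to maxDelayMs; the actual wait is
      // uniformly random below it so concurrent clients don't retry in lockstep.
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
  throw lastError;
}
```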
+
+ See [CHANGELOG.md](../CHANGELOG.md) for the full 4.1 list.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "latticesql",
- "version": "4.0.1",
+ "version": "4.1.0",
  "description": "Persistent structured memory for AI agent systems — pluggable SQLite or Postgres backend, LLM context bridge",
  "type": "module",
  "main": "./dist/index.js",
@@ -66,8 +66,10 @@
  "optionalDependencies": {
  "@aws-sdk/client-s3": "^3.1067.0",
  "pg": "^8.11.0",
+ "pgvector": "^0.2.0",
  "playwright": "^1.48.0",
- "sharp": "^0.33.5"
+ "sharp": "^0.33.5",
+ "sqlite-vec": "^0.1.6"
  },
  "devDependencies": {
  "@eslint/js": "^9.0.0",