latticesql 4.1.0 → 4.2.0

@@ -501,13 +501,23 @@ const results = await db.search('docs', 'deploy to production', { topK: 5, minSc
501
501
 
502
502
  **`SearchOptions`**:
503
503
 
504
- | Field | Type | Default | Description |
505
- | ---------- | -------- | ------- | ----------------------------------- |
506
- | `topK` | `number` | `10` | Max results to return |
507
- | `minScore` | `number` | `0` | Minimum cosine similarity threshold |
504
+ | Field | Type | Default | Description |
505
+ | ---------- | -------- | ------- | --------------------------------------------------------------------------------- |
506
+ | `topK` | `number` | `10` | Max results to return (clamped to `[1, 1000]` before the candidate fan-out; v4.2) |
507
+ | `minScore` | `number` | `0` | Minimum cosine similarity threshold |
508
508
 
509
509
  **`SearchResult`**: `{ row: Row, score: number }`
510
510
 
511
+ > **v4.2 — bounded retrieval.** `topK` is clamped (`clampTopK`,
512
+ > `SEARCH_TOPK_MAX = 1000`) before the indexed arm over-fetches `topK * N`
513
+ > candidates, so a single large `topK` can't turn one query into a whole-table
514
+ > read. When a table has **no** native vector index, the in-process cosine scan
515
+ > can be capped per-table via `EmbeddingsConfig.maxScanChunks`: if the scan would
516
+ > read more than that many stored chunk vectors it throws
517
+ > `EmbeddingScanTooLargeError` rather than load them all into memory. The cap is off by
518
+ > default (unbounded scan, historical behavior), and a scan is never silently truncated. See
519
+ > [retrieval.md](retrieval.md).
520
+
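The clamp described in the note above can be sketched as follows. This is a minimal illustration, not the library's implementation: only the names `clampTopK` and `SEARCH_TOPK_MAX`, the `[1, 1000]` range, and the default of `10` come from the docs. The fallback for missing input, the truncation of fractional values, and the `candidateCount` helper are assumptions.

```ts
// Hypothetical sketch: the real clampTopK may differ in signature and edge cases.
const SEARCH_TOPK_MAX = 1000; // value quoted in the docs above
const DEFAULT_TOPK = 10; // SearchOptions default from the table above

function clampTopK(topK: number | undefined): number {
  // Assumption: missing or non-finite input falls back to the default.
  if (topK === undefined || !Number.isFinite(topK)) return DEFAULT_TOPK;
  // Assumption: fractional values are truncated before clamping to [1, 1000].
  return Math.min(SEARCH_TOPK_MAX, Math.max(1, Math.trunc(topK)));
}

// Hypothetical helper: the over-fetch multiplies the clamped value,
// so the candidate count is bounded by SEARCH_TOPK_MAX * overFetch.
function candidateCount(topK: number | undefined, overFetch: number): number {
  return clampTopK(topK) * overFetch;
}
```

Because the multiplication happens after the clamp, a caller passing `topK: 1e9` with an over-fetch factor of 4 asks the index for at most 4000 candidates.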
511
521
  ---
512
522
 
513
523
  ### Sync Methods
@@ -1403,6 +1413,52 @@ function cleanupEntityContexts(
1403
1413
  ): CleanupResult;
1404
1414
  ```
1405
1415
 
1416
+ ### Structured import (4.2)
1417
+
1418
+ Turn a JSON object or an Excel `.xlsx` workbook into a Lattice schema and
1419
+ materialize it (entities / dimensions / junctions), with point-in-time snapshots
1420
+ and re-import recognition. All are exported from `latticesql`. In `lattice gui` these
1421
+ run automatically when you **drop a structured file into the assistant rail**;
1422
+ the same functions are available as a GUI-independent library API. See
1423
+ [importing.md](importing.md).
1424
+
1425
+ ```ts
1426
+ import {
1427
+ inferSchema, // (data, opts?) => ProposedSchema — entities/dimensions/junctions
1428
+ inferFieldType, // (values) => InferredType
1429
+ normalizeName, // (key) => string — source key → table/column name
1430
+ sourceRecords, // (data, entity) => Record<string, unknown>[]
1431
+ excelToRecords, // (absPath) => Promise<Record<string, unknown[]>> — sheets → records
1432
+ dedupeAndDetectViews, // (data, plan) => { ..., views: DetectedView[] } — read-only per-slice views
1433
+ detectAsOf, // (fileName) => string | null — ISO YYYY-MM-DD
1434
+ detectAsOfCandidates, // (inputs: AsOfInputs) => AsOfCandidate[]
1435
+ detectAsOfColumns, // (data, plan) => AsOfColumnCandidate[] — per-row date columns
1436
+ parseCellDate, // (value) => string | null — ISO YYYY-MM-DD
1437
+ matchSchemaToExisting, // (existing, plan) => SchemaMatch — fingerprint re-imports
1438
+ renameEntities, // (plan, rename) => ProposedSchema
1439
+ materializeImport, // (ctx, data, plan, views?, opts?) => Promise<MaterializeResult>
1440
+ EmbeddingScanTooLargeError,
1441
+ } from 'latticesql';
1442
+ ```
1443
+
1444
+ `materializeImport(ctx, data, plan, views?, opts?)`:
1445
+
1446
+ - `ctx`: `{ db: Lattice, configPath?: string | null }` — when `configPath` is
1447
+ set, the inferred schema is persisted to the workspace config (canonical).
1448
+ - `opts.mode`: `'schema' | 'contents' | 'both'` (default `'both'`).
1449
+ - `opts.asOf`: file-level ISO date — stamps every row's `as_of` and folds it into
1450
+ the row identity, so re-importing at a new date appends a snapshot.
1451
+ - `opts.asOfColumn`: a per-row date column name — dates each row individually.
1452
+ - `opts.onProgress`: streams `ImportProgress` steps for a live pipeline view.
1453
+ - Returns `MaterializeResult`:
1454
+ `{ mode, asOf, asOfColumn, tablesCreated, rowsByTable, links, views }`.
1455
+
1456
+ Types: `ProposedSchema`, `InferredEntity`, `InferredColumn`, `InferredDimension`,
1457
+ `InferredLinkage`, `InferredType`, `DetectedView`, `AsOfCandidate`, `AsOfInputs`,
1458
+ `AsOfColumnCandidate`, `SchemaMatch`, `EntityMatch`, `ExistingTable`,
1459
+ `MaterializeCtx`, `MaterializeResult`, `MaterializeOptions`, `ImportMode`,
1460
+ `ImportProgress`.
1461
+
1406
1462
  ### Full-text search (1.16)
1407
1463
 
1408
1464
  ```ts
@@ -187,6 +187,21 @@ Two modules:
187
187
 
188
188
  Standalone entry point compiled to `dist/cli.js` with a `#!/usr/bin/env node` shebang. Uses no external CLI framework — just manual `process.argv` parsing. Calls `generateAll()` and logs results.
189
189
 
190
+ ### Structured import (`src/import/`) _(v4.2)_
191
+
192
+ Turns a structured source — a JSON object or an Excel `.xlsx` workbook — into a
193
+ Lattice schema and materializes it. It is a self-contained module with no
194
+ dependency on the GUI or any dashboard:
195
+
196
+ - `infer.ts` — `inferSchema` / `inferFieldType` / `normalizeName` / `sourceRecords`: source → proposed entities, dimensions, junctions.
197
+ - `excel.ts` — `excelToRecords`: sheets → records (header + data-region detection).
198
+ - `dedupe-views.ts` — `dedupeAndDetectViews`: per-slice tabs that mirror a master become read-only views, not duplicate tables.
199
+ - `asof.ts` / `asof-columns.ts` — `detectAsOf*` / `parseCellDate`: detect a file-level or per-row as-of date for point-in-time snapshots.
200
+ - `match.ts` — `matchSchemaToExisting` / `renameEntities`: fingerprint a re-upload against existing tables so it lands as a new snapshot, not a duplicate set.
201
+ - `materialize.ts` — `materializeImport`: create tables (idempotent), insert rows + links, persist the schema to config, build the detected views.
202
+
203
+ In `lattice gui` the import is reachable only by dropping a structured file into the assistant rail; the confirmed proposal is applied via `POST /api/import/apply`. The functions are also exported from `latticesql` for library use.
204
+
190
205
  ---
191
206
 
192
207
  ## Data flow
@@ -314,6 +329,15 @@ src/
314
329
  │ └── loop.ts # SyncLoop (+ cleanup integration, v0.5)
315
330
  ├── writeback/
316
331
  │ └── pipeline.ts # WritebackPipeline
332
+ ├── import/ # v4.2 — structured-source import
333
+ │ ├── infer.ts # inferSchema / inferFieldType / normalizeName / sourceRecords
334
+ │ ├── excel.ts # excelToRecords
335
+ │ ├── dedupe-views.ts # dedupeAndDetectViews
336
+ │ ├── asof.ts # detectAsOf* / parseCellDate
337
+ │ ├── asof-columns.ts # detectAsOfColumns
338
+ │ ├── match.ts # matchSchemaToExisting / renameEntities
339
+ │ ├── materialize.ts # materializeImport
340
+ │ └── types.ts # ProposedSchema, InferredEntity, DetectedView, …
317
341
  └── security/
318
342
  └── sanitize.ts # Sanitizer
319
343
 
package/docs/assistant.md CHANGED
@@ -166,6 +166,29 @@ client): `organizeSource`, `describeImage`, `crawlUrl`, `enrichKnowledge`, and t
166
166
  A transient **"Analyzing…"** row shows while ingest runs; the add/enrich/link
167
167
  events stream into the feed as the server materializes them.
168
168
 
169
+ ### Structured-source import (drop a JSON / `.xlsx`) (4.2)
170
+
171
+ The Context Constructor above turns _unstructured_ sources (documents, images,
172
+ web pages) into a summarized, linked `files` row. **Dropping a structured source
173
+ — a JSON object or an Excel `.xlsx` workbook — takes a different path:** Lattice
174
+ infers a schema from it (entities, dimensions, junctions) and materializes it into
175
+ real tables. Excel sheets become records (header + data-region detection);
176
+ per-slice tabs that mirror a master become read-only **views** (no duplicated
177
+ rows). An **as-of date** is detected (file contents → name → Excel preamble → a
178
+ Claude fallback, or per-row from a date column), so re-importing a newer period
179
+ keeps a **dated snapshot** beside the prior one; a re-upload is fingerprinted and
180
+ matched to the tables already in the workspace, so it lands as a new snapshot
181
+ rather than duplicate tables.
182
+
183
+ A **recognized dataset with a confident date imports silently** as a dated
184
+ snapshot (reported in the activity feed); a brand-new dataset, or a recognized one
185
+ with no confident date, surfaces an **inline confirm card** that proposes the
186
+ schema, the as-of date (and any per-row date column), and the mode before anything
187
+ is written — applied via `POST /api/import/apply`. The same inference +
188
+ materialization functions (`inferSchema`, `materializeImport`, `detectAsOf*`,
189
+ `excelToRecords`, `dedupeAndDetectViews`, …) are exported from `latticesql` for
190
+ library use. See [importing.md](importing.md) for the full walkthrough.
191
+
169
192
  ## Artifacts
170
193
 
171
194
  Ask the assistant to "write a doc / note / summary / write-up" and it calls the
@@ -0,0 +1,284 @@
1
+ <!doctype html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="utf-8" />
5
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
6
+ <title>My Dashboard</title>
7
+ <style>
8
+ /* A plain, mobile-friendly starting point. Restyle freely — only the
9
+ fetch() calls below matter. One system font, differentiated by
10
+ weight/size only. */
11
+ :root {
12
+ font-family:
13
+ ui-sans-serif,
14
+ system-ui,
15
+ -apple-system,
16
+ Segoe UI,
17
+ Roboto,
18
+ sans-serif;
19
+ color-scheme: light dark;
20
+ }
21
+ * {
22
+ box-sizing: border-box;
23
+ }
24
+ body {
25
+ margin: 0;
26
+ padding: 1.5rem 1rem;
27
+ max-width: 760px;
28
+ margin-inline: auto;
29
+ line-height: 1.5;
30
+ overflow-x: hidden;
31
+ }
32
+ h1 {
33
+ font-size: 1.4rem;
34
+ margin: 0 0 0.25rem;
35
+ }
36
+ p.sub {
37
+ margin: 0 0 1.5rem;
38
+ opacity: 0.7;
39
+ }
40
+ section {
41
+ border: 1px solid color-mix(in srgb, currentColor 18%, transparent);
42
+ border-radius: 12px;
43
+ padding: 1rem;
44
+ margin-bottom: 1.25rem;
45
+ }
46
+ label {
47
+ font-weight: 600;
48
+ display: block;
49
+ margin-bottom: 0.5rem;
50
+ }
51
+ textarea,
52
+ input[type='text'] {
53
+ width: 100%;
54
+ padding: 0.6rem;
55
+ border-radius: 8px;
56
+ border: 1px solid color-mix(in srgb, currentColor 25%, transparent);
57
+ background: transparent;
58
+ color: inherit;
59
+ font: inherit;
60
+ }
61
+ textarea {
62
+ min-height: 70px;
63
+ resize: vertical;
64
+ }
65
+ .row {
66
+ display: flex;
67
+ gap: 0.5rem;
68
+ flex-wrap: wrap;
69
+ align-items: center;
70
+ margin-top: 0.6rem;
71
+ }
72
+ button {
73
+ padding: 0.55rem 1rem;
74
+ border-radius: 8px;
75
+ border: 0;
76
+ background: #2d6cdf;
77
+ color: #fff;
78
+ font: inherit;
79
+ font-weight: 600;
80
+ cursor: pointer;
81
+ }
82
+ button:disabled {
83
+ opacity: 0.5;
84
+ cursor: default;
85
+ }
86
+ #drop {
87
+ border: 2px dashed color-mix(in srgb, currentColor 30%, transparent);
88
+ border-radius: 10px;
89
+ padding: 1.25rem;
90
+ text-align: center;
91
+ opacity: 0.85;
92
+ }
93
+ #drop.over {
94
+ border-color: #2d6cdf;
95
+ opacity: 1;
96
+ }
97
+ ul {
98
+ list-style: none;
99
+ padding: 0;
100
+ margin: 0;
101
+ }
102
+ li {
103
+ padding: 0.7rem 0;
104
+ border-top: 1px solid color-mix(in srgb, currentColor 12%, transparent);
105
+ }
106
+ li:first-child {
107
+ border-top: 0;
108
+ }
109
+ .name {
110
+ font-weight: 600;
111
+ word-break: break-word;
112
+ }
113
+ .meta {
114
+ font-size: 0.85rem;
115
+ opacity: 0.7;
116
+ }
117
+ .tag {
118
+ display: inline-block;
119
+ font-size: 0.75rem;
120
+ padding: 0.1rem 0.5rem;
121
+ border-radius: 999px;
122
+ background: color-mix(in srgb, currentColor 12%, transparent);
123
+ margin: 0.15rem 0.15rem 0 0;
124
+ }
125
+ #status {
126
+ min-height: 1.2rem;
127
+ font-size: 0.85rem;
128
+ opacity: 0.8;
129
+ }
130
+ </style>
131
+ </head>
132
+ <body>
133
+ <h1>My Dashboard</h1>
134
+ <p class="sub">
135
+ Upload a file or jot a note — Lattice reads it and files it against your data.
136
+ </p>
137
+
138
+ <section>
139
+ <label for="file">Upload files</label>
140
+ <div id="drop">
141
+ Drag files here, or
142
+ <button type="button" id="pick">choose files</button>
143
+ </div>
144
+ <input id="file" type="file" multiple hidden />
145
+ </section>
146
+
147
+ <section>
148
+ <label for="note">Add a note</label>
149
+ <textarea id="note" placeholder="Type a note, or paste a link to capture it…"></textarea>
150
+ <div class="row">
151
+ <button type="button" id="addNote">Add note</button>
152
+ </div>
153
+ </section>
154
+
155
+ <section>
156
+ <label>Recently captured</label>
157
+ <div id="status"></div>
158
+ <ul id="list"></ul>
159
+ </section>
160
+
161
+ <script>
162
+ // ---- Lattice client -------------------------------------------------
163
+ // These three calls are the whole integration. The dashboard is served by
164
+ // Lattice on the same origin, so plain relative fetch() works — no API key
165
+ // in the page, no CORS. Copy these into your own page to wire your own UI.
166
+
167
+ // Upload one file. Returns { id, extraction_status, suggestedLinks, ... }.
168
+ async function latticeUpload(file) {
169
+ const res = await fetch('/api/ingest/upload', {
170
+ method: 'POST',
171
+ headers: {
172
+ 'content-type': file.type || 'application/octet-stream',
173
+ 'x-filename': encodeURIComponent(file.name || 'file'),
174
+ },
175
+ body: file,
176
+ });
177
+ if (!res.ok) throw new Error('Upload failed: HTTP ' + res.status);
178
+ return res.json();
179
+ }
180
+
181
+ // Capture a note (or a pasted URL). Returns { id, extraction_status, suggestedLinks }.
182
+ async function latticeAddNote(text, title) {
183
+ const res = await fetch('/api/ingest/text', {
184
+ method: 'POST',
185
+ headers: { 'content-type': 'application/json' },
186
+ body: JSON.stringify(title ? { text, title } : { text }),
187
+ });
188
+ if (!res.ok) throw new Error('Add note failed: HTTP ' + res.status);
189
+ return res.json();
190
+ }
191
+
192
+ // List captured items (newest first). Returns an array of file rows.
193
+ async function latticeListFiles(limit = 25) {
194
+ const res = await fetch('/api/tables/files/rows?limit=' + limit);
195
+ if (!res.ok) throw new Error('List failed: HTTP ' + res.status);
196
+ const data = await res.json();
197
+ return Array.isArray(data.rows) ? data.rows : [];
198
+ }
199
+
200
+ // ---- Wiring (replace with your own UI) ------------------------------
201
+ const statusEl = document.getElementById('status');
202
+ const listEl = document.getElementById('list');
203
+ const fileInput = document.getElementById('file');
204
+ const drop = document.getElementById('drop');
205
+
206
+ function setStatus(msg) {
207
+ statusEl.textContent = msg || '';
208
+ }
209
+
210
+ function renderList(rows) {
211
+ listEl.innerHTML = '';
212
+ for (const r of rows) {
213
+ const li = document.createElement('li');
214
+ const name = document.createElement('div');
215
+ name.className = 'name';
216
+ name.textContent = r.original_name || r.name || '(untitled)';
217
+ const meta = document.createElement('div');
218
+ meta.className = 'meta';
219
+ meta.textContent =
220
+ (r.description || '').slice(0, 200) +
221
+ (r.extraction_status ? ' · ' + r.extraction_status : '');
222
+ li.append(name, meta);
223
+ listEl.append(li);
224
+ }
225
+ if (rows.length === 0) listEl.innerHTML = '<li class="meta">Nothing captured yet.</li>';
226
+ }
227
+
228
+ async function refresh() {
229
+ try {
230
+ renderList(await latticeListFiles());
231
+ } catch (e) {
232
+ setStatus(e.message);
233
+ }
234
+ }
235
+
236
+ async function handleFiles(files) {
237
+ for (const file of files) {
238
+ setStatus('Uploading ' + file.name + '…');
239
+ try {
240
+ const out = await latticeUpload(file);
241
+ const n = (out.suggestedLinks || []).length;
242
+ setStatus('Captured ' + file.name + (n ? ' · linked to ' + n + ' record(s)' : ''));
243
+ } catch (e) {
244
+ setStatus(e.message);
245
+ }
246
+ }
247
+ await refresh();
248
+ }
249
+
250
+ document.getElementById('pick').addEventListener('click', () => fileInput.click());
251
+ fileInput.addEventListener('change', () => {
252
+ if (fileInput.files.length) handleFiles(Array.from(fileInput.files)); // copy first: clearing value below can empty the FileList
253
+ fileInput.value = '';
254
+ });
255
+ drop.addEventListener('dragover', (e) => {
256
+ e.preventDefault();
257
+ drop.classList.add('over');
258
+ });
259
+ drop.addEventListener('dragleave', () => drop.classList.remove('over'));
260
+ drop.addEventListener('drop', (e) => {
261
+ e.preventDefault();
262
+ drop.classList.remove('over');
263
+ if (e.dataTransfer.files.length) handleFiles(e.dataTransfer.files);
264
+ });
265
+
266
+ document.getElementById('addNote').addEventListener('click', async () => {
267
+ const ta = document.getElementById('note');
268
+ const text = ta.value.trim();
269
+ if (!text) return;
270
+ setStatus('Saving note…');
271
+ try {
272
+ await latticeAddNote(text);
273
+ ta.value = '';
274
+ setStatus('Note captured.');
275
+ } catch (e) {
276
+ setStatus(e.message);
277
+ }
278
+ await refresh();
279
+ });
280
+
281
+ refresh();
282
+ </script>
283
+ </body>
284
+ </html>
@@ -0,0 +1,118 @@
1
+ # Structured-source import (v4.2)
2
+
3
+ latticesql 4.2 can turn a **structured file** — a JSON object or an Excel
4
+ `.xlsx` workbook — into a Lattice schema (entities, dimensions, junctions) and
5
+ materialize it into a workspace. Everything here is **additive and opt-in**:
6
+ absent a file drop, behavior is byte-identical to 4.1.
7
+
8
+ The feature is reachable **only by dropping a file into the assistant rail** in
9
+ `lattice gui`. There is no CLI verb and no separate endpoint to call by hand —
10
+ the upload pipeline builds a proposal, and a confirmed proposal is applied via
11
+ `POST /api/import/apply`. The same inference and materialization functions are
12
+ also exported from `latticesql` for library use (see [Library API](#library-api)).
13
+
14
+ ## What it does
15
+
16
+ When you drop a recognized JSON / `.xlsx` source into the chat:
17
+
18
+ 1. **Infer a schema.** `inferSchema` reads the source and proposes **entities**
19
+ (record collections that become tables), **dimensions** (small repeated value
20
+ sets that become a shared taxonomy / dictionary), and **junctions** (the
21
+ many-to-many links between them). Field types are inferred per column
22
+ (`inferFieldType`), and source keys are normalized to table/column names
23
+ (`normalizeName`).
24
+ 2. **Read Excel natively.** `excelToRecords` turns each sheet into records by
25
+ detecting the header row and the data region. A per-slice tab that is just a
26
+ filtered view of a master sheet is recognized as a **read-only view** (no
27
+ duplicated rows) rather than a second table — see `dedupeAndDetectViews`.
28
+ 3. **Detect an as-of date for point-in-time snapshots.** `detectAsOf*` looks at
29
+ the file's contents, then its name, then an Excel preamble, then a Claude
30
+ fallback — or a per-row date **column** (`detectAsOfColumns`, `parseCellDate`).
31
+ When a date is found, every materialized row is stamped `as_of` and the row
32
+ identity folds it in, so **re-importing a newer period APPENDS a dated
33
+ snapshot beside the prior one** instead of overwriting it. Dimensions (the
34
+ shared taxonomy) are not dated.
35
+ 4. **Recognize a re-import.** `matchSchemaToExisting` fingerprints the inferred
36
+ schema and matches it against the tables already in the workspace, so a
37
+ re-upload lands as a **new snapshot of the existing tables**, not a duplicate
38
+ set. `renameEntities` applies any entity → table-name overrides.
39
+ 5. **Materialize.** `materializeImport` creates the tables (idempotently),
40
+ inserts the rows + links, persists the schema to the workspace config, and
41
+ builds the detected read-only views.
42
+
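To make the file-name arm of step 3 concrete, here is a minimal sketch of pulling an ISO date out of a file name. It is not `detectAsOf` itself. The function name `asOfFromFileName`, the accepted separators, and the validation rule are all assumptions, and the real detector may recognize more formats.

```ts
// Hypothetical stand-in for the file-name arm of as-of detection.
// Assumption: dates appear as YYYY-MM-DD, YYYY_MM_DD, YYYY.MM.DD or YYYYMMDD.
function asOfFromFileName(fileName: string): string | null {
  const m = /(\d{4})[-_.]?(\d{2})[-_.]?(\d{2})/.exec(fileName);
  if (!m) return null;
  const y = Number(m[1]);
  const mo = Number(m[2]);
  const d = Number(m[3]);
  // Round-trip through Date.UTC to reject impossible dates such as month 13.
  const date = new Date(Date.UTC(y, mo - 1, d));
  const valid =
    date.getUTCFullYear() === y &&
    date.getUTCMonth() === mo - 1 &&
    date.getUTCDate() === d;
  return valid ? `${m[1]}-${m[2]}-${m[3]}` : null;
}
```

Returning `null` rather than a guess matches the documented contract (`string | null`), so the caller can fall through to the next detection source.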
43
+ ## Silent import vs. the inline confirm card
44
+
45
+ The chat drop chooses one of three paths automatically:
46
+
47
+ - **Recognized dataset + a confident date → silent import.** The file matches
48
+ tables already in the workspace and a date was confidently detected, so it is
49
+ imported straight away as a dated snapshot and reported in the activity feed.
50
+ - **Recognized dataset but no / ambiguous date → confirm card.** Importing
51
+ undated would overwrite the prior snapshot, so an **inline confirm card**
52
+ proposes the date (and any per-row date column) before anything is written.
53
+ - **Brand-new structured data → confirm card.** Tables are never created
54
+ silently from a chat drop. The card proposes the full schema, the date, and
55
+ the mode for you to review and apply.
56
+
57
+ In every case, nothing is written until a confident match resolves silently or you
58
+ confirm the card; the confirmed proposal is applied via `POST /api/import/apply`,
59
+ which streams the materialization progress back as NDJSON.
60
+
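The three paths above reduce to a small decision over two facts: whether the source matches existing tables, and whether a date was confidently detected. The sketch below only illustrates that logic. The type and function names (`DropAnalysis`, `ImportPath`, `chooseImportPath`) are hypothetical and are not part of the `latticesql` API.

```ts
// Hypothetical types: the real pipeline carries far more state than this.
type ImportPath = 'silent-import' | 'confirm-date' | 'confirm-new';

interface DropAnalysis {
  matchesExisting: boolean; // did the fingerprint match tables in the workspace?
  confidentAsOf: string | null; // ISO date if one was confidently detected
}

function chooseImportPath(a: DropAnalysis): ImportPath {
  // Brand-new data always goes through the confirm card.
  if (!a.matchesExisting) return 'confirm-new';
  // A recognized dataset imports silently only when the date is confident.
  return a.confidentAsOf !== null ? 'silent-import' : 'confirm-date';
}
```

Checking the match first means a confidently dated but unrecognized file still gets a card, which is the "tables are never created silently" rule.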
61
+ ## File-size cap
62
+
63
+ A source file is capped at **50 MB**, and the cap is enforced **on both paths**:
64
+ the streaming upload rejects an oversized file, and the apply route re-`statSync`s
65
+ the retained bytes before reading them — so an oversized or swapped-on-disk
66
+ source (including one reached via a `local_ref` that never went through the
67
+ upload) cannot be streamed whole into memory.
68
+
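A size check before the read, as described above, might look like the following sketch. Only the 50 MB figure comes from the text. The names `MAX_SOURCE_BYTES`, `SourceTooLargeError`, and `readCapped` are hypothetical, and treating "MB" as 1024 × 1024 bytes is an assumption. The stat and read functions are injected so the example stays self-contained.

```ts
// Hypothetical names throughout; only the 50 MB figure comes from the docs.
const MAX_SOURCE_BYTES = 50 * 1024 * 1024; // assumption: "MB" means MiB here

class SourceTooLargeError extends Error {
  constructor(path: string, size: number) {
    super(`${path} is ${size} bytes; the cap is ${MAX_SOURCE_BYTES}`);
    this.name = 'SourceTooLargeError';
  }
}

type StatFn = (path: string) => { size: number };
type ReadFn = (path: string) => Uint8Array;

// Stat first, read second: an oversized file is rejected before any of its
// bytes are loaded, whichever route put it on disk.
function readCapped(path: string, stat: StatFn, read: ReadFn): Uint8Array {
  const { size } = stat(path);
  if (size > MAX_SOURCE_BYTES) throw new SourceTooLargeError(path, size);
  return read(path);
}
```

In Node, `fs.statSync` and `fs.readFileSync` fit these two function shapes, so `readCapped(path, statSync, readFileSync)` would be one way to wire it up.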
69
+ ## Library API
70
+
71
+ The inference + materialization functions are exported from `latticesql` and run
72
+ GUI-independently:
73
+
74
+ ```ts
75
+ import {
76
+ inferSchema,
77
+ inferFieldType,
78
+ normalizeName,
79
+ sourceRecords,
80
+ excelToRecords,
81
+ dedupeAndDetectViews,
82
+ detectAsOf,
83
+ detectAsOfCandidates,
84
+ detectAsOfColumns,
85
+ parseCellDate,
86
+ matchSchemaToExisting,
87
+ renameEntities,
88
+ materializeImport,
89
+ } from 'latticesql';
90
+
91
+ // JSON object → proposed schema
92
+ const plan = inferSchema(data); // { entities, dimensions, junctions, skipped }
93
+
94
+ // Detect the as-of date and any per-row date column
95
+ const asOf = detectAsOf(fileName); // ISO YYYY-MM-DD | null
96
+ const asOfColumns = detectAsOfColumns(data, plan);
97
+
98
+ // Detect read-only views (per-slice tabs that mirror a master)
99
+ const { views } = dedupeAndDetectViews(data, plan);
100
+
101
+ // Materialize into a workspace
102
+ const result = await materializeImport({ db, configPath }, data, plan, views, {
103
+ mode: 'both',
104
+ asOf,
105
+ asOfColumn: null,
106
+ });
107
+ // result: { mode, asOf, asOfColumn, tablesCreated, rowsByTable, links, views }
108
+ ```
109
+
110
+ `materializeImport` takes a `mode` of `'schema'` (table structures + dimension
111
+ values + views), `'contents'` (entity rows + links into existing tables), or
112
+ `'both'` (the default). When `asOf` (a file-level ISO date) or `asOfColumn` (a
113
+ per-row date column) is set, rows are stamped and the row identity folds the date
114
+ in, so the same model imported at a new date is a distinct snapshot rather than an
115
+ overwrite. `onProgress` streams the per-phase pipeline steps for a live view.
116
+
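The phrase "the row identity folds the date in" can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not how `materializeImport` stores rows: the `rowIdentity` and `upsert` helpers and the `key@date` format are hypothetical. Only the `as_of` column name comes from the docs.

```ts
type Row = Record<string, unknown>;

// Hypothetical identity scheme: natural key plus the as-of date, if any.
function rowIdentity(naturalKey: string, asOf: string | null): string {
  return asOf === null ? naturalKey : `${naturalKey}@${asOf}`;
}

// Hypothetical upsert: rows with the same identity overwrite each other,
// and rows with a different as-of date land side by side.
function upsert(store: Map<string, Row>, naturalKey: string, row: Row, asOf: string | null): void {
  const stamped = asOf === null ? row : { ...row, as_of: asOf };
  store.set(rowIdentity(naturalKey, asOf), stamped);
}

const store = new Map<string, Row>();
upsert(store, 'acme', { revenue: 100 }, '2024-03-31');
upsert(store, 'acme', { revenue: 120 }, '2024-06-30'); // new snapshot
upsert(store, 'acme', { revenue: 125 }, '2024-06-30'); // same date: overwrite
```

After these three calls the store holds two entries for `acme`, one per date, and the June entry carries the later revenue figure.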
117
+ See [CHANGELOG.md](../CHANGELOG.md) for the full 4.2 list and
118
+ [assistant.md](assistant.md) for the chat-drop experience.
package/docs/retrieval.md CHANGED
@@ -25,6 +25,15 @@ const summary = await db.evaluateRetrieval(
25
25
  `detectRetrievalRegressions(baseline, candidate, tolerance)` turns it into a CI
26
26
  gate — a retrieval change that lowers any metric past tolerance fails the build.
27
27
 
28
+ > **v4.2 — the gate can actually fail.** The golden corpus is now ~20 docs with
29
+ > deliberate cross-topic lexical overlap, so the real `search()` scores
30
+ > good-but-imperfect; the committed baseline is **generated** by running the real
31
+ > search (`npm run eval:baseline`) and is sub-perfect (`mrr ≈ 0.92`,
32
+ > `ndcg@3 ≈ 0.94`), never hand-authored. `npm run eval:gate` evaluates the current
33
+ > `search()` against that baseline and exits non-zero on any metric dropping past
34
+ > tolerance; it runs as a required CI step, and a suite test asserts the baseline
35
+ > still has headroom (`mrr < 1`) so the gate can't silently go blind.
36
+
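The gate's core comparison can be sketched in a few lines. This is not `detectRetrievalRegressions`: the names `findRegressions` and `gateExitCode` are hypothetical, as is the choice to treat a metric missing from the candidate as a regression. The baseline figures reuse the approximate values quoted above.

```ts
type Metrics = Record<string, number>;

// Hypothetical comparison: a metric regresses when it drops by more than
// `tolerance` below the baseline, or disappears from the candidate.
function findRegressions(baseline: Metrics, candidate: Metrics, tolerance: number): string[] {
  return Object.keys(baseline).filter((name) => {
    const before = baseline[name] ?? 0;
    const after = candidate[name];
    return after === undefined || before - after > tolerance;
  });
}

// A CI script would exit non-zero when any metric regressed.
function gateExitCode(regressions: string[]): number {
  return regressions.length > 0 ? 1 : 0;
}

const baseline: Metrics = { mrr: 0.92, 'ndcg@3': 0.94 };
const candidate: Metrics = { mrr: 0.85, 'ndcg@3': 0.93 };
const regressions = findRegressions(baseline, candidate, 0.02);
```

Here `mrr` falls by 0.07, which is past the tolerance, while `ndcg@3` falls by about 0.01 and stays within it, so only `mrr` is reported. A perfect baseline would leave room to detect drops but none to detect that the corpus has become trivially easy, which is why the docs insist on headroom.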
28
37
  ### `lattice doctor` / `diagnoseRetrieval(opts?)`
29
38
 
30
39
  Read-only health: per-table FTS + embedding coverage (soft-deleted rows excluded),
@@ -38,6 +47,17 @@ plus ingest throughput + peak memory — on both dialects, at a configurable sca
38
47
  (`LATTICE_BENCH_ROWS/QUERIES/DIM`). Ships in the package so buyers reproduce the
39
48
  numbers; wire `checkSlos` as a CI SLO gate.
40
49
 
50
+ > **v4.2 — honest vector timing + an advisory SLO gate.** A Postgres integration
51
+ > test runs the benchmark against a real pgvector cluster and asserts the harness
52
+ > built the **native index before** the vector timing loop
53
+ > (`report.vectorIndexed === true`), so `vector.p95` reflects the indexed path,
54
+ > not the O(n) in-process scan; where pgvector is unavailable the test skips with a
55
+ > clear message rather than passing green-by-construction. `npm run slo:gate` runs
56
+ > the real benchmark at a committed scale and checks observed p95 latencies against
57
+ > committed thresholds — it is **advisory, never build-blocking** (shared CI
58
+ > runners are too latency-noisy to gate a merge on), and the output marks whether
59
+ > `vector.p95` reflects a native index or the in-process scan.
60
+
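An advisory p95 check of the kind described above could be sketched like this. It is not `checkSlos` or the `slo:gate` script. The nearest-rank percentile method, the `advisorySloReport` name, and the report shape are assumptions made for illustration.

```ts
// Nearest-rank percentile; returns NaN when there are no samples.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return Number.NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[Math.min(rank, sorted.length) - 1] as number;
}

interface SloCheck {
  name: string;
  p95: number;
  threshold: number;
  ok: boolean;
}

// Hypothetical advisory report: it describes breaches but never throws,
// so a noisy runner cannot block a merge.
function advisorySloReport(
  latenciesMs: Record<string, number[]>,
  thresholdsMs: Record<string, number>,
): SloCheck[] {
  return Object.entries(thresholdsMs).map(([name, threshold]) => {
    const p95 = percentile(latenciesMs[name] ?? [], 95);
    return { name, p95, threshold, ok: p95 <= threshold }; // NaN compares as not ok
  });
}
```

A caller would print the report and exit zero regardless of the `ok` flags, which is what makes the gate advisory rather than blocking.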
41
61
  ## Better search
42
62
 
43
63
  ### Chunked + contextual embeddings
@@ -72,6 +92,17 @@ Opt-in per-table approximate-nearest-neighbor index built from the stored vector
72
92
  `doctor` reports). Requires the extension server-side (pgvector) or loaded
73
93
  (sqlite-vec).
74
94
 
95
+ > **v4.2 — bounded retrieval reads.** `search()` / `hybridSearch()` clamp the
96
+ > caller's `topK` (`clampTopK`, `SEARCH_TOPK_MAX = 1000`) **before** the indexed
97
+ > arm over-fetches `topK * N` candidates, so a single large `topK` can't fan out
98
+ > into a whole-table read. For a table with **no** native index, the in-process
99
+ > cosine scan can be capped per-table with `embeddings.maxScanChunks`: when the
100
+ > scan would read more than that many stored chunk vectors it throws
101
+ > `EmbeddingScanTooLargeError` (telling you to add a pgvector index or raise the
102
+ > cap) rather than load them all into memory. It is **off by default** (unbounded
103
+ > scan — the historical behavior) and is **never silently truncated**, because a
104
+ > partial cosine scan would return incomplete, wrong results.
105
+
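The "throw rather than truncate" rule for the in-process scan can be sketched as below. The error class here is a local stand-in named `ScanTooLargeError`, not the library's `EmbeddingScanTooLargeError`. The `scanTopK` function and its signature are hypothetical. Only the `maxScanChunks` option name and the off-by-default behavior come from the docs.

```ts
// Local stand-in for the library's error; the name and message are assumptions.
class ScanTooLargeError extends Error {
  constructor(count: number, cap: number) {
    super(`scan would read ${count} chunk vectors; cap is ${cap}`);
    this.name = 'ScanTooLargeError';
  }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical O(n) scan: refuse up front when over the cap, never scan a subset.
function scanTopK(
  query: number[],
  chunks: number[][],
  topK: number,
  maxScanChunks?: number, // undefined means unbounded, the default
): { index: number; score: number }[] {
  if (maxScanChunks !== undefined && chunks.length > maxScanChunks) {
    throw new ScanTooLargeError(chunks.length, maxScanChunks);
  }
  return chunks
    .map((v, index) => ({ index, score: cosine(query, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Scanning only the first `maxScanChunks` vectors would still return a ranked list, but the best match might sit in the unread remainder, which is the incomplete result the docs warn against.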
75
106
  ### Hybrid search + ranking + reranker
76
107
 
77
108
  ```ts
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "latticesql",
3
- "version": "4.1.0",
3
+ "version": "4.2.0",
4
4
  "description": "Persistent structured memory for AI agent systems — pluggable SQLite or Postgres backend, LLM context bridge",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",
@@ -31,13 +31,16 @@
31
31
  "scripts": {
32
32
  "build": "tsup",
33
33
  "typecheck": "tsc --noEmit",
34
- "lint": "eslint src tests",
35
- "lint:fix": "eslint src tests --fix",
34
+ "lint": "eslint src tests scripts",
35
+ "lint:fix": "eslint src tests scripts --fix",
36
36
  "format": "prettier --write .",
37
37
  "format:check": "prettier --check .",
38
38
  "check:generic": "bash scripts/check-generic.sh",
39
39
  "test": "vitest run",
40
40
  "test:watch": "vitest",
41
+ "eval:baseline": "vite-node scripts/eval-baseline.ts",
42
+ "eval:gate": "vite-node scripts/eval-gate.ts",
43
+ "slo:gate": "vite-node scripts/slo-gate.ts",
41
44
  "test:coverage": "vitest run --coverage",
42
45
  "test:e2e": "playwright test",
43
46
  "docs": "typedoc --out docs-generated src/index.ts",
@@ -65,6 +68,7 @@
65
68
  },
66
69
  "optionalDependencies": {
67
70
  "@aws-sdk/client-s3": "^3.1067.0",
71
+ "exceljs": "^4.4.0",
68
72
  "pg": "^8.11.0",
69
73
  "pgvector": "^0.2.0",
70
74
  "playwright": "^1.48.0",