hyperdb-mcp 0.1.1 → 0.1.3

Files changed (2)
  1. package/README.md +694 -0
  2. package/package.json +6 -5
package/README.md ADDED
@@ -0,0 +1,694 @@
1
+ # hyperdb-mcp
2
+
3
+ An MCP (Model Context Protocol) server that turns the Hyper columnar database into an instant SQL analytics engine. Data flows in from other MCP plugins or files, lands in Hyper automatically, and becomes queryable with SQL — no setup, no schema files, no database management.
4
+
5
+ Built on the pure-Rust [`hyperdb-api`](../hyperdb-api/) crate for maximum performance: 22M+ rows/sec inserts, 18M+ rows/sec queries, constant memory for billion-row results.
6
+
7
+ ---
8
+
9
+ ## Why
10
+
11
+ LLMs are powerful at reasoning but cannot natively crunch millions of rows. This plugin bridges that gap: another MCP tool produces data, the LLM passes it to `hyperdb-mcp`, Hyper ingests it and makes it SQL-queryable, the LLM runs analytical SQL, and results come back as JSON. Optionally export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (opens directly in **Tableau Desktop**).
12
+
13
+ ---
14
+
15
+ ## Features
16
+
17
+ - **Zero setup** — `HyperProcess` auto-starts the Hyper server
18
+ - **Any data in** — JSON, CSV, Parquet, Arrow IPC, Apache Iceberg; schema inferred or exact
19
+ - **SQL at scale** — thousands to billions of rows
20
+ - **Data out** — export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (Tableau Desktop-ready)
21
+ - **One-shot queries** — `query_file("/tmp/sales.csv", "SELECT ...")` — single call, zero management
22
+ - **Persistent workspace** — load multiple tables, JOIN across them, persist across sessions
23
+ - **Read-only safe mode** — `--read-only` flag for safe deployment
24
+ - **Schema resources** — auto-discover table schemas via `resources/list`
25
+ - **Guided prompts** — `analyze-table`, `compare-tables`, `data-quality`, `suggest-queries`
26
+ - **Inline charts** — bar/line/scatter/histogram as PNG or SVG
27
+ - **Incremental ingest** — `watch_directory` monitors for `.ready` sentinel files
28
+ - **Performance telemetry** — every response includes throughput stats
29
+ - **Smart schema inference** — exact (Arrow/Parquet), structural (JSON), heuristic (CSV) with full-file numeric widening
30
+ - **Pre-ingest file inspection** — `inspect_file` dry-runs the same inference without touching Hyper so LLMs can build safe schema overrides in one shot
31
+ - **Partial schema overrides** — supply just the columns you want to correct (e.g. `{"population":"BIGINT"}`) — the rest keep their inferred type
32
+ - **Rich resource surface** — workspace readme, per-table JSON and CSV samples, and one JSON + one CSV resource per table so LLMs can orient themselves via `resources/list` without any tool calls
33
+ - **Saved queries** — register named read-only SQL with `save_query`; each query becomes `hyper://queries/{name}/definition` (metadata) + `hyper://queries/{name}/result` (live re-run). Persisted in `--workspace` mode, session-only otherwise
34
+ - **Live resource-update notifications** — MCP clients can `resources/subscribe` to any `hyper://...` URI; the server fires `notifications/resources/updated` after every ingest, DDL, watcher event, or saved-query mutation
35
+
36
+ ---
37
+
38
+ ## Installation
39
+
40
+ ### From npm
41
+
42
+ > **Requirement:** Node.js **v21 or later**. Earlier versions ship an
43
+ > older `npx` whose argument parsing is incompatible with the
44
+ > `npx -y hyperdb-mcp` invocation in the MCP config below. If you're
45
+ > on an older Node, see [Upgrading Node.js with nvm](#upgrading-nodejs-with-nvm)
46
+ > below.
47
+
48
+ ```bash
49
+ npm install -g hyperdb-mcp
50
+ ```
51
+
52
+ The npm package bundles both the `hyperdb-mcp` binary and the `hyperd` database server — no additional setup required.
53
+
54
+ ### Upgrading Node.js with nvm
55
+
56
+ `nvm` (Node Version Manager) makes it easy to install and switch between Node.js versions.
57
+
58
+ **macOS / Linux** ([nvm-sh/nvm](https://github.com/nvm-sh/nvm)):
59
+ ```bash
60
+ # install nvm if you don't have it
61
+ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
62
+
63
+ # install and use the latest LTS (>= 21)
64
+ nvm install --lts
65
+ nvm use --lts
66
+ node --version # should report v22.x.x or newer
67
+ ```
68
+
69
+ **Windows** ([coreybutler/nvm-windows](https://github.com/coreybutler/nvm-windows)): download the installer, then in a new shell:
70
+ ```powershell
71
+ nvm install lts
72
+ nvm use lts
73
+ node --version
74
+ ```
75
+
76
+ After upgrading, restart your MCP client so it picks up the new Node binary on `PATH`.
77
+
78
+ ### Building from Source
79
+
80
+ ```bash
81
+ cd hyper-api-rust
82
+ cargo build --release -p hyperdb-mcp
83
+ ```
84
+
85
+ The binary is at `target/release/hyperdb-mcp`.
86
+
87
+ When building from source the `hyperd` executable is **not** bundled, so
88
+ you'll need to provide one. The easiest path is the companion
89
+ [`hyperdb-bootstrap`](../hyperdb-bootstrap/) CLI, which downloads a
90
+ matching pinned `hyperd` for your platform:
91
+
92
+ ```bash
93
+ cargo install hyperdb-bootstrap
94
+ hyperdb-bootstrap download # installs into ./.hyperd/current/
95
+ export HYPERD_PATH="$PWD/.hyperd/current" # or pass via your MCP config
96
+ ```
97
+
98
+ `hyperdb-bootstrap` also has a library API if you'd rather wire the
99
+ download into your own build script — see its
100
+ [README](../hyperdb-bootstrap/README.md). If you already have `hyperd`
101
+ elsewhere (Tableau Hyper API for C++/Python/Java ships one), point
102
+ `HYPERD_PATH` at it or add it to your `PATH`.
103
+
104
+ ### MCP Client Configuration
105
+
106
+ Each AI tool reads MCP server config from a different file but uses the same JSON shape. The base config block using npx (recommended):
107
+ ```json
108
+ {
109
+ "mcpServers": {
110
+ "HyperDB": {
111
+ "type": "stdio",
112
+ "command": "npx",
113
+ "args": ["-y", "hyperdb-mcp"]
114
+ }
115
+ }
116
+ }
117
+ ```
118
+
119
+ Or if you built from source:
120
+ ```json
121
+ {
122
+ "mcpServers": {
123
+ "HyperDB": {
124
+ "type": "stdio",
125
+ "command": "/path/to/hyperdb-mcp",
126
+ "env": {
127
+ "HYPERD_PATH": "/path/to/hyperd"
128
+ }
129
+ }
130
+ }
131
+ }
132
+ ```
133
+
134
+ For a **persistent workspace** (tables survive across sessions), add `"args"`:
135
+ ```json
136
+ "args": ["--workspace", "/path/to/my-project.hyper"]
137
+ ```
138
+ This mode is still **experimental** and supports only one session at a time, because Hyper locks the database file. Each session is isolated and runs its own Hyper instance. Letting multiple sessions share one database is planned, but it requires spinning up a shared Hyper instance first.
139
+
140
+ #### Claude Code / AI Suite
141
+
142
+ Create or edit `~/.claude/.mcp.json` (global) or `.mcp.json` in the project root (project-scoped). Use the base config block above.
143
+
144
+ After adding the config:
145
+ 1. Start a new Claude Code session. You'll be prompted to approve the server on first use.
146
+ 2. **Auto-approve tools (optional):** Add `"mcp__HyperDB__*"` to the `permissions.allow` array in `~/.claude/settings.json`.
147
+
148
+ #### Claude Desktop
149
+
150
+ Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows). Use the base config block above.
151
+
152
+ #### Cursor
153
+
154
+ Edit `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (project root). Use the base config block above.
155
+
156
+ #### Other MCP Clients
157
+
158
+ Any tool that supports the MCP stdio transport can use this server. Point it at the `hyperdb-mcp` binary and set `HYPERD_PATH` in the environment.
159
+
160
+ ---
161
+
162
+ ## MCP Tools
163
+
164
+ ### One-Shot Tools
165
+
166
+ #### `query_data`
167
+
168
+ Ingest inline data and run a SQL query in a single call.
169
+
170
+ ```
171
+ query_data(data: '[{"region":"West","revenue":1200},...]', sql: 'SELECT region, SUM(revenue) FROM data GROUP BY region')
172
+ ```
173
+
174
+ | Parameter | Type | Required | Description |
175
+ |-----------|------|----------|-------------|
176
+ | `data` | string | yes | JSON array of objects, or CSV text |
177
+ | `sql` | string | yes | SQL query to run against the data |
178
+ | `format` | string | no | `"json"` or `"csv"` — auto-detected if omitted |
179
+ | `table_name` | string | no | Table name for use in SQL — defaults to `"data"` |
180
+ | `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
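
As a rough illustration of how a client might invoke `query_data` over the stdio transport, the sketch below builds a JSON-RPC `tools/call` request. The `tools/call` method and the `name` / `arguments` envelope follow the MCP specification; the helper name `build_query_data_call` and the request `id` are illustrative assumptions, not part of hyperdb-mcp.

```python
import json


def build_query_data_call(rows, sql, request_id=1, table_name=None, schema=None):
    """Build a JSON-RPC 2.0 `tools/call` request for the `query_data` tool.

    `rows` is a list of dicts; it is serialized to a JSON string because the
    `data` parameter is documented as a string.
    """
    arguments = {"data": json.dumps(rows), "sql": sql, "format": "json"}
    if table_name is not None:
        arguments["table_name"] = table_name
    if schema is not None:
        arguments["schema"] = schema  # partial column-name -> type map
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "query_data", "arguments": arguments},
    }


request = build_query_data_call(
    [{"region": "West", "revenue": 1200}, {"region": "East", "revenue": 900}],
    "SELECT region, SUM(revenue) FROM data GROUP BY region",
)
# The stdio transport frames messages as one JSON object per line.
line = json.dumps(request)
```

Most MCP clients build this envelope for you; the sketch is only meant to show which README parameters end up under `arguments`.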
181
+
182
+ #### `query_file`
183
+
184
+ Ingest a file and run a SQL query in a single call. Streams from disk — handles files of any size.
185
+
186
+ ```
187
+ query_file(path: '/tmp/sales.parquet', sql: 'SELECT * FROM sales ORDER BY amount DESC LIMIT 10')
188
+ ```
189
+
190
+ | Parameter | Type | Required | Description |
191
+ |-----------|------|----------|-------------|
192
+ | `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
193
+ | `sql` | string | yes | SQL query to run |
194
+ | `table_name` | string | no | Table name — defaults to filename stem |
195
+ | `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
196
+
197
+ ### Workspace Tools
198
+
199
+ #### `load_data`
200
+
201
+ Load inline data into a named workspace table.
202
+
203
+ ```
204
+ load_data(table: 'customers', data: '[{"id":1,"name":"Alice"},...]')
205
+ ```
206
+
207
+ | Parameter | Type | Required | Description |
208
+ |-----------|------|----------|-------------|
209
+ | `table` | string | yes | Table name |
210
+ | `data` | string | yes | JSON array of objects, or CSV text |
211
+ | `format` | string | no | `"json"` or `"csv"` — auto-detected |
212
+ | `mode` | string | no | `"replace"` (default) or `"append"` |
213
+ | `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
214
+
215
+ #### `load_file`
216
+
217
+ Load a file into a named workspace table.
218
+
219
+ ```
220
+ load_file(table: 'orders', path: '/tmp/orders.csv')
221
+ ```
222
+
223
+ | Parameter | Type | Required | Description |
224
+ |-----------|------|----------|-------------|
225
+ | `table` | string | yes | Table name |
226
+ | `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
227
+ | `mode` | string | no | `"replace"` (default) or `"append"` |
228
+ | `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
229
+
230
+ When you're unsure of the right types — or recovering from a previous
231
+ `SCHEMA_MISMATCH` — call [`inspect_file`](#inspect_file) first. It reports the
232
+ exact schema `load_file` would use plus per-column `min` / `max` / `null_count`
233
+ so you can build a minimal, correct override in one shot.
234
+
235
+ #### `load_iceberg`
236
+
237
+ Load an [Apache Iceberg](https://iceberg.apache.org/) table into a named
238
+ workspace table. Pass the absolute path to the Iceberg table root (the
239
+ directory containing `metadata/` and `data/`); hyperd's native Iceberg
240
+ reader derives the schema and resolves the snapshot.
241
+
242
+ ```
243
+ load_iceberg(table: 'sales', path: '/lake/warehouse/db/sales')
244
+ ```
245
+
246
+ | Parameter | Type | Required | Description |
247
+ |-----------|------|----------|-------------|
248
+ | `table` | string | yes | Target Hyper table name |
249
+ | `path` | string | yes | Absolute path to the Iceberg table root directory |
250
+ | `mode` | string | no | `"replace"` (default) or `"append"` |
251
+ | `metadata_filename` | string | no | Pin a specific snapshot, e.g. `"v2.metadata.json"`. Omit for latest. |
252
+ | `version_as_of` | integer | no | Pin a snapshot by version number |
253
+
254
+ Schema overrides are not accepted — hyperd derives the schema from the
255
+ Iceberg table metadata.
256
+
257
+ #### `query`
258
+
259
+ Run a **read-only** SQL query against the workspace. Accepts `SELECT`, `WITH`, `EXPLAIN`, `SHOW`, `VALUES`. For DDL/DML use `execute`.
260
+
261
+ ```
262
+ query(sql: 'SELECT c.name, SUM(o.amount) FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY c.name')
263
+ ```
264
+
265
+ #### `execute`
266
+
267
+ Execute a **mutating** SQL statement: `CREATE TABLE`, `INSERT`, `UPDATE`, `DELETE`, `DROP TABLE`, `ALTER`, `COPY`, etc. Returns the affected row count. Disabled in read-only mode.
268
+
269
+ ```
270
+ execute(sql: 'CREATE TABLE archived_orders AS SELECT * FROM orders WHERE year < 2024')
271
+ ```
272
+
273
+ #### `describe`
274
+
275
+ List all workspace tables with their schemas, column types, and row counts.
276
+
277
+ #### `sample`
278
+
279
+ Return the schema, total row count, and first N rows of a table in a single call.
280
+
281
+ ```
282
+ sample(table: 'orders', n: 10)
283
+ ```
284
+
285
+ | Parameter | Type | Required | Description |
286
+ |-----------|------|----------|-------------|
287
+ | `table` | string | yes | Table name |
288
+ | `n` | int | no | Rows to return (default: 5, clamped to 1..=100) |
289
+
290
+ ### Diagnostics
291
+
292
+ #### `inspect_file`
293
+
294
+ Dry-run schema inference on a CSV, JSON / JSONL, Parquet, or Arrow IPC file **without ingesting
295
+ it**. Returns the exact schema `load_file` / `query_file` would use (including
296
+ the full-file numeric widening pass) plus per-column `min`, `max`, `null_count`,
297
+ and `sample_values`. Nothing is written to Hyper and `hyperd` is not even
298
+ started.
299
+
300
+ Use it **before** `load_file` whenever you are unsure about types, or **after** a
301
+ `SCHEMA_MISMATCH` failure to pick the right widening. The LLM can feed the
302
+ reported `type` + `min` / `max` directly into a partial `schema` override on the
303
+ subsequent `load_file` call.
304
+
305
+ ```
306
+ inspect_file(path: '/tmp/owid-population.csv')
307
+ ```
308
+
309
+ | Parameter | Type | Required | Description |
310
+ |-----------|------|----------|-------------|
311
+ | `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
312
+ | `sample_rows` | int | no | Sample values / rows per column (default 5, clamped 1..=50) |
313
+
314
+ Response shape:
315
+
316
+ ```json
317
+ {
318
+ "file_format": "csv",
319
+ "row_count": 63000,
320
+ "file_size_bytes": 4831204,
321
+ "columns": [
322
+ { "name": "Entity", "type": "TEXT", "nullable": true, "null_count": 0, "sample_values": ["Afghanistan", ...] },
323
+ { "name": "Year", "type": "INT", "nullable": true, "null_count": 0, "min": 1800, "max": 2023, "sample_values": ["1800", ...] },
324
+ { "name": "Population", "type": "BIGINT", "nullable": true, "null_count": 12, "min": 500, "max": 8002572256, "sample_values": ["4000000", ...] }
325
+ ],
326
+ "sample_rows": [ { "Entity": "Afghanistan", "Year": "1800", "Population": "2805829" } ]
327
+ }
328
+ ```
329
+
330
+ `sample_values` and `sample_rows` are **always strings**, regardless of the inferred column `type` — they report what the file contains on disk, before any type coercion, so the LLM can compare the raw text against `min` / `max` when building a `schema` override. Use `type` (and `min` / `max`) for the typed view; use `sample_values` for the raw view.
331
+
332
+ ### Saved Queries
333
+
334
+ Register a named read-only SQL query once; read its live result as many
335
+ times as you like via a resource URI. Useful for dashboard-style recurring
336
+ views and for giving LLMs a stable "bookmark" set of key queries that
337
+ `resources/list` advertises up front.
338
+
339
+ Each saved query produces **two** resources:
340
+
341
+ - `hyper://queries/{name}/definition` — the stored SQL plus metadata
342
+ (description, `created_at`) as JSON.
343
+ - `hyper://queries/{name}/result` — re-runs the SQL on every read and
344
+ returns `{ name, result: [...], stats: {...} }`.
345
+
346
+ **Persistence:** queries saved while `--workspace <path>` is set are
347
+ stored in the `_hyperdb_saved_queries` meta-table inside the `.hyper`
348
+ file and survive server restarts. In ephemeral workspaces they live only
349
+ for the lifetime of the server process.
350
+
351
+ #### `save_query`
352
+
353
+ ```
354
+ save_query(name: 'top_5_customers', sql: 'SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY total DESC LIMIT 5', description: 'Biggest spenders this year')
355
+ ```
356
+
357
+ | Parameter | Type | Required | Description |
358
+ |---|---|---|---|
359
+ | `name` | string | yes | Unique identifier used as the URI path component |
360
+ | `sql` | string | yes | Read-only SQL (SELECT / WITH / EXPLAIN / SHOW / VALUES) |
361
+ | `description` | string | no | Human-friendly summary |
362
+
363
+ Duplicate names are rejected with `INVALID_ARGUMENT` — use `delete_query`
364
+ first if you intend to overwrite. Non-read-only SQL is rejected with
365
+ `SQL_ERROR`. Disabled in read-only mode.
366
+
367
+ #### `delete_query`
368
+
369
+ ```
370
+ delete_query(name: 'top_5_customers')
371
+ ```
372
+
373
+ | Parameter | Type | Required | Description |
374
+ |---|---|---|---|
375
+ | `name` | string | yes | Name of the saved query to remove |
376
+
377
+ Returns `{ "deleted": true }` when the query existed, `{ "deleted": false }`
378
+ when it did not (no error on unknown names). Disabled in read-only mode.
379
+
380
+ ### Export Tools
381
+
382
+ #### `export`
383
+
384
+ Write query results or a table to a file.
385
+
386
+ ```
387
+ export(table: 'orders', path: '~/Desktop/orders.parquet', format: 'parquet')
388
+ export(sql: 'SELECT ...', path: '~/Desktop/analysis.hyper', format: 'hyper')
389
+ ```
390
+
391
+ | Parameter | Type | Required | Description |
392
+ |-----------|------|----------|-------------|
393
+ | `sql` | string | no | Query to export (if omitted, exports whole table) |
394
+ | `table` | string | no | Table name (used if `sql` omitted) |
395
+ | `path` | string | yes | Output file path |
396
+ | `format` | string | yes | `"csv"`, `"parquet"`, `"iceberg"`, `"arrow_ipc"`, or `"hyper"` |
397
+
398
+ The `"hyper"` format produces a `.hyper` file that opens directly in **Tableau Desktop**.
399
+
400
+ ### Visualization
401
+
402
+ #### `chart`
403
+
404
+ Render a chart from a SQL query and return it inline as an image.
405
+
406
+ ```
407
+ chart(sql: 'SELECT product, SUM(revenue) as total FROM sales GROUP BY product', chart_type: 'bar', x: 'product', y: 'total', title: 'Revenue by Product')
408
+ ```
409
+
410
+ | Parameter | Type | Required | Description |
411
+ |-----------|------|----------|-------------|
412
+ | `sql` | string | yes | Read-only SQL query returning the data to plot |
413
+ | `chart_type` | string | yes | `bar`, `line`, `scatter`, or `histogram` |
414
+ | `x` | string | yes* | X-axis column (for histogram, the value column) |
415
+ | `y` | string | yes* | Y-axis column (not required for histogram) |
416
+ | `series` | string | no | Grouping column for multi-series plots |
417
+ | `title` | string | no | Chart title |
418
+ | `format` | string | no | `png` (default) or `svg` |
419
+ | `width` | int | no | Pixels (default 800, clamped 200..4096) |
420
+ | `height` | int | no | Pixels (default 480, clamped 150..4096) |
421
+ | `bins` | int | no | Histogram bins (default 20, clamped 1..500) |
422
+
423
+ Returns an `ImageContent` (base64 PNG or SVG) plus a stats JSON block.
424
+
425
+ ### Incremental Ingest
426
+
427
+ #### `watch_directory` / `unwatch_directory`
428
+
429
+ Monitor a directory for data files and auto-append them to a target table.
430
+
431
+ ```
432
+ watch_directory(path: '/tmp/inbox', table: 'events')
433
+ unwatch_directory(path: '/tmp/inbox')
434
+ ```
435
+
436
+ **Producer protocol (`.ready` sentinel):**
437
+
438
+ 1. Write data file (e.g. `foo.csv`) and close it.
439
+ 2. Create a zero-byte companion `foo.csv.ready` — this is the atomic signal.
440
+ 3. Poll for the absence of `foo.csv.ready` to confirm the watcher is done.
441
+
442
+ On success, both files are deleted. On failure, both are moved to `failed/` with a `.error` JSON file.
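
The producer side of the `.ready` protocol above could look roughly like the following sketch. The function name `publish_file`, the timeout, and the polling interval are assumptions made for illustration; only the three steps (write and close, create the zero-byte sentinel, poll for its absence) come from the text.

```python
import os
import time
from pathlib import Path


def publish_file(inbox, name, payload, timeout_s=30.0, poll_s=0.1):
    """Follow the `.ready` sentinel protocol for one data file.

    Returns True once the sentinel disappears (the watcher has handled the
    pair), False if it is still present after `timeout_s` seconds.
    """
    inbox = Path(inbox)
    data_path = inbox / name
    ready_path = inbox / (name + ".ready")

    # Step 1: write the data file and close it before signalling.
    with open(data_path, "w", encoding="utf-8", newline="") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())

    # Step 2: create the zero-byte sentinel; fail if one already exists.
    ready_path.touch(exist_ok=False)

    # Step 3: poll until the watcher deletes or moves the sentinel.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not ready_path.exists():
            return True
        time.sleep(poll_s)
    return False
```

A `True` result only says the watcher has finished with the pair. Because failed files are moved rather than left in place, a producer that needs to distinguish success from failure would also check `failed/` for a matching `.error` file.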
443
+
444
+ Key properties:
445
+ - **One directory, one table, append mode** — files must match the target schema.
446
+ - **Initial sweep** — pre-existing `.ready` files are processed immediately.
447
+ - **Read-only mode** — `watch_directory` is blocked; `unwatch_directory` is always allowed.
448
+ - **Cleanup** — dropping the server or calling `unwatch_directory` terminates the background thread.
449
+
450
+ ### Utility Tools
451
+
452
+ #### `status`
453
+
454
+ Returns plugin health, workspace mode, table count, total rows, disk usage, read-only flag, and active directory watchers with per-watcher stats.
455
+
456
+ ---
457
+
458
+ ## MCP Resources
459
+
460
+ The server exposes workspace state as MCP **Resources**, discoverable via
461
+ `resources/list`. Each resource advertises its own MIME type so clients
462
+ can route it appropriately (LLM context vs. file download vs. chart).
463
+
464
+ | URI | MIME | Content |
465
+ |-----|------|---------|
466
+ | `hyper://workspace` | `application/json` | Workspace mode, table count, total rows, disk usage |
467
+ | `hyper://tables` | `application/json` | Full list of tables with schemas and row counts |
468
+ | `hyper://readme` | `text/markdown` | Workspace overview as markdown: table catalog, related resources per table, and tool hints for a cold-started LLM |
469
+ | `hyper://tables/{name}/schema` | `application/json` | Columns, types, nullability, and row count for one table |
470
+ | `hyper://tables/{name}/sample` | `application/json` | First 5 rows of a table as JSON, with schema |
471
+ | `hyper://tables/{name}/csv-sample` | `text/csv` | First 20 rows of a table as CSV, header-first |
472
+ | `hyper://queries/{name}/definition` | `application/json` | Stored SQL + metadata for a saved query |
473
+ | `hyper://queries/{name}/result` | `application/json` | Live result of a saved query — re-runs on every read |
474
+
475
+ Resource templates (discoverable via `resources/templates/list`):
476
+
477
+ - `hyper://tables/{name}/schema`
478
+ - `hyper://tables/{name}/sample`
479
+ - `hyper://tables/{name}/csv-sample`
480
+ - `hyper://queries/{name}/definition`
481
+ - `hyper://queries/{name}/result`
482
+
483
+ The internal `_hyperdb_saved_queries` meta-table used to persist saved
484
+ queries is deliberately hidden from `resources/list` and
485
+ `hyper://tables` — callers see only user-visible data tables.
486
+
487
+ ### Resource-update notifications
488
+
489
+ HyperDB advertises both the `resources.subscribe` and
490
+ `resources.listChanged` capabilities in its `initialize` response. Clients
491
+ can subscribe to any `hyper://...` URI via `resources/subscribe` and will
492
+ then receive `notifications/resources/updated` messages whenever the
493
+ server detects a change, without polling.
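
A minimal client-side sketch of the subscribe flow described above, assuming the client already has a way to send and receive JSON-RPC messages. The method names `resources/subscribe` and `notifications/resources/updated` come from the text; the helper names and the `stale` set are hypothetical.

```python
import json


def subscribe_request(uri, request_id):
    """JSON-RPC request asking the server to push updates for one URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/subscribe",
        "params": {"uri": uri},
    }


def updated_uri(message):
    """Return the URI from a `notifications/resources/updated` message, else None."""
    if message.get("method") != "notifications/resources/updated":
        return None
    return message.get("params", {}).get("uri")


# Track which cached resources need to be re-read.
stale = set()
incoming = json.loads(
    '{"jsonrpc":"2.0","method":"notifications/resources/updated",'
    '"params":{"uri":"hyper://tables"}}'
)
uri = updated_uri(incoming)
if uri is not None:
    stale.add(uri)  # re-read lazily with `resources/read` when next needed
```

Marking a URI stale instead of re-reading it immediately keeps a burst of notifications (for example during a large watcher sweep) from triggering a read per message.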
494
+
495
+ The server fires **targeted** updates for the URIs affected by each kind
496
+ of mutation:
497
+
498
+ | Trigger | Updated URIs | `resources/list_changed`? |
499
+ |---|---|---|
500
+ | `load_data` / `load_file` (replace mode) | `hyper://workspace`, `hyper://tables`, `hyper://readme`, per-table schema + sample + csv-sample | Yes |
501
+ | `load_data` / `load_file` (append mode) | Same per-table + summary URIs | No &sup1; |
502
+ | `watch_directory` ingest of a `.ready` pair | Same per-table + summary URIs | No &sup1; |
503
+ | `execute` (INSERT / UPDATE / DELETE) | Workspace summary URIs | No |
504
+ | `execute` (CREATE / DROP / ALTER / TRUNCATE / RENAME) | Workspace summary URIs | Yes |
505
+ | `save_query` | (none per-URI) | Yes — two new `hyper://queries/{name}/...` resources |
506
+ | `delete_query` | `hyper://queries/{name}/definition`, `hyper://queries/{name}/result` | Yes — two resources disappeared |
507
+
508
+ &sup1; Append-mode ingest (both `load_*` and the watcher) auto-creates the target table when it doesn't exist, but **does not** fire `list_changed` for that creation. Clients that need to discover watcher-created tables should re-read `hyper://tables` after subscribing, or use the per-table `updated` notification as a trigger to refresh their list. Tracked in `DEVELOPMENT.md` as tech debt.
509
+
510
+ Notifications are fire-and-forget — send failures (typically due to a
511
+ client disconnect) are logged at the `debug` level and the registry
512
+ prunes dead peers lazily. This keeps mutation paths fast and free of
513
+ back-pressure concerns.
514
+
515
+ All JSON-typed resources return a pretty-printed object; Markdown and
516
+ CSV resources are returned verbatim.
517
+
518
+ ---
519
+
520
+ ## MCP Prompts
521
+
522
+ Four guided analytical workflows registered as MCP **Prompts**.
523
+
524
+ | Prompt | Arguments | What it does |
525
+ |--------|-----------|--------------|
526
+ | `analyze-table` | `table` | Schema walkthrough, column statistics, data quality flags |
527
+ | `compare-tables` | `table_a`, `table_b` | Schema alignment, JOIN key suggestions, analytical opportunities |
528
+ | `data-quality` | `table` | Systematic NULL / duplicate / cardinality / outlier checks |
529
+ | `suggest-queries` | `table`, `goal?` | 5 analytical SQL queries with explanations, optionally goal-guided |
530
+
531
+ ---
532
+
533
+ ## Read-Only Mode
534
+
535
+ ```bash
536
+ hyperdb-mcp --workspace ~/analytics.hyper --read-only
537
+ ```
538
+
539
+ - **Allowed:** `query`, `query_data`, `query_file`, `describe`, `sample`, `inspect_file`, `status`, `export`
540
+ - **Blocked:** `execute`, `load_data`, `load_file`, `watch_directory`, `save_query`, `delete_query` — return `READ_ONLY_VIOLATION`
541
+ - **Resources, prompts, and resource subscriptions** work normally — read-only clients can still subscribe to `hyper://...` URIs and receive notifications when other (non-read-only) connections mutate state
542
+
543
+ The `query` tool also enforces read-only at the SQL level — only `SELECT`/`WITH`/`EXPLAIN`/`SHOW`/`VALUES` are accepted.
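
A client that wants to route statements to `query` or `execute` before sending them could use a first-keyword pre-check like the sketch below. This is a hypothetical client-side heuristic, not the server's actual implementation; only the list of accepted keywords is taken from the text.

```python
import re

READ_ONLY_KEYWORDS = {"SELECT", "WITH", "EXPLAIN", "SHOW", "VALUES"}

# Leading whitespace, `-- line` comments, and `/* block */` comments.
_LEADING_NOISE = re.compile(r"\A(?:\s+|--[^\n]*(?:\n|\Z)|/\*.*?\*/)*", re.DOTALL)


def looks_read_only(sql):
    """Heuristic: does the statement start with an accepted keyword?"""
    stripped = _LEADING_NOISE.sub("", sql)
    match = re.match(r"[A-Za-z]+", stripped)
    return bool(match) and match.group(0).upper() in READ_ONLY_KEYWORDS
```

A first-keyword test is deliberately coarse: it cannot see what follows a `WITH` clause, so the server's own check remains the authority and the client should still be prepared for a rejection.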
544
+
545
+ ---
546
+
547
+ ## Data Flow Patterns
548
+
549
+ - **Small data (LLM relay):** For <10K rows. The LLM gets data from another plugin and passes it inline via `query_data`.
550
+ - **Large data (file intermediary):** For thousands to billions of rows. Source plugin exports to a file, the LLM calls `query_file`. Data never enters the LLM context — constant memory regardless of file size.
551
+
552
+ ---
553
+
554
+ ## Schema Inference
555
+
556
+ Three tiers, chosen automatically based on the data source:
557
+
558
+ | Tier | Source | How |
559
+ |------|--------|-----|
560
+ | **Exact** | Arrow IPC, Parquet | Schema read from file metadata. Types preserved exactly. |
561
+ | **Structural** | JSON | All objects scanned. Per-column type widening: Int → BigInt → Double. Mixed types → TEXT. |
562
+ | **Heuristic** | CSV | Header row for names, first 1,000 rows sampled for types. A second full-file streaming pass then **widens** numeric columns if needed (INT → BIGINT → NUMERIC(38,0); INT/BIGINT → DOUBLE PRECISION if any later row contains a decimal). |
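
The CSV widening chain in the table can be illustrated with a small sketch. It assumes `INT` is a 32-bit and `BIGINT` a 64-bit signed integer, returns `TEXT` for any non-numeric cell, and picks `DOUBLE PRECISION` whenever any cell has a decimal point or fractional part; those choices, and the function name `widen_numeric`, are assumptions for illustration rather than the rules in `src/schema.rs`.

```python
from decimal import Decimal, InvalidOperation

INTEGER_CHAIN = ["INT", "BIGINT", "NUMERIC(38,0)"]
INT_MIN, INT_MAX = -(2**31), 2**31 - 1        # assumed 32-bit INT
BIGINT_MIN, BIGINT_MAX = -(2**63), 2**63 - 1  # assumed 64-bit BIGINT


def widen_numeric(cells):
    """Return the narrowest type from the chain that fits every cell."""
    rank = 0  # index into INTEGER_CHAIN; only ever moves up
    saw_decimal = False
    for cell in cells:
        text = cell.strip()
        if text == "":
            continue  # empty cell loads as NULL and does not affect the type
        try:
            value = Decimal(text)
        except InvalidOperation:
            return "TEXT"
        if not value.is_finite():
            return "TEXT"
        if "." in text or value != value.to_integral_value():
            saw_decimal = True
            continue
        number = int(value)
        if not BIGINT_MIN <= number <= BIGINT_MAX:
            rank = max(rank, 2)
        elif not INT_MIN <= number <= INT_MAX:
            rank = max(rank, 1)
    return "DOUBLE PRECISION" if saw_decimal else INTEGER_CHAIN[rank]
```

Scanning every cell rather than a prefix is what lets a single large value late in the file move the column from `INT` to `BIGINT`.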
563
+
564
+ **JSON file shapes.** `load_file` and `query_file` accept two JSON
565
+ representations and auto-detect between them from the first non-whitespace
566
+ byte: a top-level JSON array of objects (e.g. `[{...}, {...}]`) or
567
+ newline-delimited JSON (JSONL / NDJSON — one JSON object per line, the
568
+ format hyperd's own logs use). Blank lines are tolerated. Malformed
569
+ JSONL surfaces a `SCHEMA_MISMATCH` error naming the offending line
570
+ number.
571
+
572
+ **Content sniffing for unknown extensions.** Files with extensions the
573
+ dispatcher doesn't recognize (`.log`, `.txt`, no extension at all) are
574
+ classified by peeking at the first non-whitespace byte: `[` or `{`
575
+ routes to JSON, anything else to CSV. This means hyperd's raw `.log`
576
+ files load through `load_file` directly, no rename or preprocessing
577
+ required. Binary formats (`.parquet`, `.arrow`, `.ipc`, `.feather`,
578
+ `.pq`) always win by extension since they're not text-sniffable.
579
+ `inspect_file` uses the exact same dispatcher so its report can never
580
+ disagree with what `load_file` would do.
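
The dispatch order described above (binary extensions first, then the first non-whitespace byte) might be sketched as follows. The mapping from each binary extension to a format label, the choice to trust `.csv` by extension, and the return labels are assumptions for illustration; the real dispatcher may differ.

```python
from pathlib import Path

# Assumed mapping; the text only says these extensions always win.
BINARY_EXTENSIONS = {
    ".parquet": "parquet",
    ".pq": "parquet",
    ".arrow": "arrow_ipc",
    ".ipc": "arrow_ipc",
    ".feather": "arrow_ipc",
}


def first_non_whitespace_byte(path, chunk_size=4096):
    """Return the first non-whitespace byte of a file, or None if there is none."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                return None
            stripped = chunk.lstrip()
            if stripped:
                return stripped[:1]


def classify(path):
    """Classify a file as parquet / arrow_ipc / json_array / jsonl / csv."""
    suffix = Path(path).suffix.lower()
    if suffix in BINARY_EXTENSIONS:
        return BINARY_EXTENSIONS[suffix]  # never sniffed
    if suffix == ".csv":
        return "csv"
    lead = first_non_whitespace_byte(path)
    if lead == b"[":
        return "json_array"
    if lead == b"{":
        return "jsonl"
    return "csv"
```

Reading in chunks means a file that begins with a long run of blank lines is still classified without loading it into memory.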
581
+
582
+ **CSV NULL handling.** Unquoted empty cells (`,,`) load as SQL NULL —
583
+ matching PostgreSQL's CSV convention and `inspect_file`'s `null_count`
584
+ diagnostics. Quoted empty strings (`,"",`) load as the literal empty
585
+ string. This means downstream `WHERE col IS NULL` works directly without
586
+ a defensive `OR col = ''` clause.
587
+
588
+ The full-file CSV widening pass specifically protects against the "big value
589
+ hidden at the end of the file" failure mode — e.g. an aggregate row whose
590
+ `population` is ~8 billion tucked in after 60 000 country-sized rows. Without
591
+ it, the first-pass sample would pick `INT` and the COPY would fail with
592
+ `SCHEMA_MISMATCH` / SQLSTATE 22003 mid-ingest.
593
+
594
+ For implementation details (widening rules, type mapping tables), see the
595
+ module docs in `src/schema.rs` and `src/ingest_arrow.rs`.
596
+
597
+ ### Schema Overrides
598
+
599
+ Every data-in tool (`query_data`, `query_file`, `load_data`, `load_file`)
600
+ accepts an optional `schema` parameter: a **partial** map from column name to
601
+ Hyper SQL type.
602
+
603
+ ```json
604
+ { "schema": { "population": "BIGINT", "order_date": "DATE" } }
605
+ ```
606
+
607
+ Semantics:
608
+
609
+ - Keys are matched to columns **by name** (case-sensitive). Column order in
610
+ the JSON object does not need to match the file — the inferred order from
611
+ the file is preserved.
612
+ - Columns **not** listed in the override keep their inferred type. You only
613
+ specify the columns you want to correct.
614
+ - Unknown column names and unknown type strings are rejected up front with a
615
+ `SCHEMA_MISMATCH` error that lists the real column names, so the LLM can
616
+ self-correct without another round-trip.
617
+ - Supported type strings: `INT`, `BIGINT`, `NUMERIC(p,s)` (e.g.
618
+ `NUMERIC(38,0)` or `NUMERIC(12,2)`), `DOUBLE PRECISION`, `TEXT`, `BOOL`,
619
+ `DATE`, `TIMESTAMP`.
620
+
621
+ **Recommended workflow for unfamiliar data:**
622
+
623
+ 1. Call `inspect_file` → read the reported `type` + `min` / `max` per column.
624
+ 2. For any column whose `max` exceeds its inferred type's range, or where
625
+ you want stricter parsing than CSV heuristics give, build a partial
626
+ override.
627
+ 3. Pass it to `load_file` / `query_file`.
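
Step 2 of the workflow above can be sketched as a function that turns an `inspect_file` response into a partial override. It relies on the documented response fields (`columns`, `name`, `type`, `min`, `max`) and assumes 32-bit `INT` and 64-bit `BIGINT` ranges; the function names are hypothetical.

```python
WIDTH_ORDER = ["INT", "BIGINT", "NUMERIC(38,0)"]
INTEGER_RANGES = [
    ("INT", -(2**31), 2**31 - 1),      # assumed 32-bit range
    ("BIGINT", -(2**63), 2**63 - 1),   # assumed 64-bit range
]


def narrowest_fit(lo, hi):
    """Smallest integer type in the chain that holds both lo and hi."""
    for type_name, type_min, type_max in INTEGER_RANGES:
        if type_min <= lo and hi <= type_max:
            return type_name
    return "NUMERIC(38,0)"


def build_override(report):
    """Suggest a partial `schema` override; never narrows a column."""
    override = {}
    for col in report["columns"]:
        if col["type"] not in ("INT", "BIGINT"):
            continue
        lo, hi = col.get("min"), col.get("max")
        if lo is None or hi is None:
            continue
        needed = narrowest_fit(lo, hi)
        if WIDTH_ORDER.index(needed) > WIDTH_ORDER.index(col["type"]):
            override[col["name"]] = needed
    return override
```

The returned dictionary can be passed as the `schema` argument as-is, since columns that are left out keep their inferred type. An empty result means no integer column needs widening.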
628
+
629
+ ---
630
+
631
+ ## SQL Dialect
632
+
633
+ Hyper uses the Salesforce Data Cloud SQL dialect (PostgreSQL-compatible with extensions). Supports `SELECT`, JOINs, subqueries, CTEs, window functions, aggregations, DDL, DML, and `COPY FROM`.
634
+
635
+ Full reference: [Data Cloud SQL Reference](https://developer.salesforce.com/docs/data/data-cloud-query-guide/references/dc-sql-reference/data-cloud-sql-context.html)
636
+
637
+ ---
638
+
639
+ ## CLI Reference
640
+
641
+ ```
642
+ hyperdb-mcp [OPTIONS]
643
+
644
+ Options:
645
+ --workspace <PATH> Path to the `.hyper` workspace file for persistent mode (omit for ephemeral)
646
+ --read-only Disable mutating tools (execute, load_data, load_file, save_query, delete_query, watch_directory)
647
+ --bare Skip MCP-managed auxiliary tables (`_table_catalog`) and force saved queries into in-memory storage, even with --workspace
648
+
649
+ Environment:
650
+ HYPERD_PATH Path to hyperd binary (auto-detected if on PATH)
651
+ ```
652
+
653
+ ---
654
+
655
+ ## Error Handling
656
+
657
+ Errors include a machine-readable code and a suggestion:
658
+
659
+ | Code | When | Recovery |
660
+ |---|---|---|
661
+ | `HYPERD_NOT_FOUND` | `hyperd` not found | Set `HYPERD_PATH` or install Hyper |
662
+ | `FILE_NOT_FOUND` | File path doesn't exist | Verify the path |
663
+ | `UNSUPPORTED_FORMAT` | Unrecognized file type | Specify `format` explicitly |
664
+ | `SCHEMA_MISMATCH` | Data doesn't match inferred types, numeric overflow (SQLSTATE 22003), or invalid text for target type (SQLSTATE 22P02) | Call `inspect_file` then retry with a partial `schema` override (e.g. `{"population":"BIGINT"}` or `{"id":"TEXT"}`) |
665
+ | `SQL_ERROR` | Invalid SQL | Fix the query |
666
+ | `TABLE_NOT_FOUND` | Table doesn't exist | Use `describe` to list tables |
667
+ | `READ_ONLY_VIOLATION` | Mutating op in read-only mode | Use `query_*` / `inspect_file`, or restart without `--read-only` |
668
+ | `CONNECTION_LOST` | `hyperd` crashed or wire protocol desynchronized | Retry — the server tears down the engine and reconnects on the next call |
669
+
670
+ Server-returned errors include a machine-readable `code`, a `message`, and a
671
+ `suggestion` with concrete retry guidance. The `SCHEMA_MISMATCH` suggestion for
672
+ an overflow names the workflow directly: "call `inspect_file`, then retry with
673
+ a partial schema override", so the LLM does not need to infer the recovery
674
+ steps from the SQLSTATE alone.
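
As a sketch of how an agent might act on these errors, the function below maps an error's `code` to a next action. It assumes the error has already been parsed into a dictionary with `code`, `message`, and `suggestion` keys; the action labels are hypothetical names chosen for this example, and the recoveries mirror the table above.

```python
def next_step(error):
    """Map a server error to an (action, hint) pair for the calling agent."""
    code = error.get("code")
    # Prefer the server's concrete suggestion; fall back to the message.
    hint = error.get("suggestion") or error.get("message")
    if code == "CONNECTION_LOST":
        return ("retry_same_call", hint)
    if code == "SCHEMA_MISMATCH":
        return ("inspect_file_then_retry_with_schema", hint)
    if code == "TABLE_NOT_FOUND":
        return ("describe", hint)
    if code == "HYPERD_NOT_FOUND":
        return ("set_HYPERD_PATH", hint)
    return ("report_to_user", hint)


action, hint = next_step({
    "code": "SCHEMA_MISMATCH",
    "message": "numeric overflow",
    "suggestion": "call inspect_file, then retry with a partial schema override",
})
```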
675
+
676
+ ---
677
+
678
+ ## Troubleshooting
679
+
680
+ **Tools not discovered by the client** — Verify the `initialize` response advertises `"capabilities": {"tools": {}}`. Pipe a raw `initialize` JSON-RPC request to the binary to check.
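
One way to produce the raw `initialize` request mentioned above is the sketch below. The message shape follows the MCP specification; the `protocolVersion` value, the `clientInfo` fields, and the helper names are assumptions, and the version the server actually negotiates may differ.

```python
import json


def initialize_request(request_id=1):
    """Build a JSON-RPC 2.0 `initialize` request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed; use your client's version
            "capabilities": {},
            "clientInfo": {"name": "capability-probe", "version": "0.0.0"},
        },
    }


def advertises_tools(response):
    """True if an `initialize` response lists a `tools` capability object."""
    capabilities = response.get("result", {}).get("capabilities", {})
    return isinstance(capabilities.get("tools"), dict)


# Emit the request as a single line, ready to pipe to the server's stdin.
print(json.dumps(initialize_request()))
```

Saving this as a script (for example a hypothetical `probe.py`) and piping its output into the `hyperdb-mcp` binary should yield one response line on stdout, which can be parsed with `json.loads` and passed to `advertises_tools`.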
681
+
682
+ **Server registered but tools not callable (Claude Code)** — Add `"mcp__HyperDB__*"` to the `permissions.allow` array in `~/.claude/settings.json`.
683
+
684
+ **hyperd not found** — Set `HYPERD_PATH` in the MCP server's `env` config, or place `hyperd` on your `PATH`.
685
+
686
+ ---
687
+
688
+ ## Related Documentation
689
+
690
+ - **[Main README](../README.md)** — Getting started with the Hyper API
691
+ - **[hyperdb-api](../hyperdb-api/)** — Core Rust API (sync/async connections, inserter, query)
692
+ - **[DEVELOPMENT.md](DEVELOPMENT.md)** — Internal architecture, design decisions, contributor guide
693
+ - **[ROADMAP.md](ROADMAP.md)** — Forward-looking design sketches for features that aren't built yet
694
+ - **[Design Spec](../docs/specs/hyperdb-mcp-design.md)** — Full design document
package/package.json CHANGED
@@ -1,17 +1,18 @@
1
1
  {
2
2
  "name": "hyperdb-mcp",
3
- "version": "0.1.1",
3
+ "version": "0.1.3",
4
4
  "description": "HyperDB MCP server — instant SQL analytics for LLM workflows",
5
5
  "bin": {
6
6
  "hyperdb-mcp": "bin.js"
7
7
  },
8
8
  "optionalDependencies": {
9
- "hyperdb-mcp-darwin-arm64": "0.1.1",
10
- "hyperdb-mcp-linux-x64-gnu": "0.1.1",
11
- "hyperdb-mcp-win32-x64-msvc": "0.1.1"
9
+ "hyperdb-mcp-darwin-arm64": "0.1.3",
10
+ "hyperdb-mcp-linux-x64-gnu": "0.1.3",
11
+ "hyperdb-mcp-win32-x64-msvc": "0.1.3"
12
12
  },
13
13
  "files": [
14
- "bin.js"
14
+ "bin.js",
15
+ "README.md"
15
16
  ],
16
17
  "keywords": [
17
18
  "hyper",