hyperdb-mcp 0.1.1 → 0.2.1

Files changed (2)
  1. package/README.md +828 -0
  2. package/package.json +6 -5
package/README.md ADDED
@@ -0,0 +1,828 @@
1
+ # hyperdb-mcp
2
+
3
+ > **Note:** This crate was vibe-engineered with heavy use of AI coding assistants. The 0.1.x line may still undergo large breaking changes; the public API won't settle until the 1.0.0 release.
4
+
5
+ An MCP (Model Context Protocol) server that turns the Hyper columnar database into an instant SQL analytics engine. Data flows in from other MCP plugins or files, lands in Hyper automatically, and becomes queryable with SQL — no setup, no schema files, no database management.
6
+
7
+ Built on the pure-Rust [`hyperdb-api`](../hyperdb-api/) crate for maximum performance: 22M+ rows/sec inserts, 18M+ rows/sec queries, constant memory for billion-row results.
8
+
9
+ ---
10
+
11
+ ## Why
12
+
13
+ LLMs are powerful at reasoning but cannot natively crunch millions of rows. This plugin bridges that gap: another MCP tool produces data, the LLM passes it to `hyperdb-mcp`, Hyper ingests it and makes it SQL-queryable, the LLM runs analytical SQL, and results come back as JSON. Optionally export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (opens directly in **Tableau Desktop**).
14
+
15
+ ### Queryable Memory for AI
16
+
17
+ Unlike flat-text memory systems that store blobs and retrieve by similarity search, HyperDB gives LLMs **structured, queryable long-term memory**. The persistent database survives across sessions — anything the LLM stores there can be JOINed, filtered, aggregated, and reasoned over with full SQL in any future conversation.
18
+
19
+ This means an LLM can:
20
+ - **Accumulate knowledge over time** — store reference tables, project decisions, user preferences, learned facts
21
+ - **Cross-reference across sessions** — JOIN today's analysis against historical data from last week
22
+ - **Answer complex recall questions** — "Which projects had budget overruns in Q1?" is a SQL query, not a fuzzy text search
23
+ - **Build on prior work** — load yesterday's cleaned dataset and extend it without re-processing from scratch
24
+ - **Maintain structured context** — store relationship graphs, timelines, or decision logs as proper tables with typed columns
25
+
26
+ The ephemeral database is scratch space (think: a whiteboard). The persistent database is long-term memory (think: a filing cabinet you can query). Multiple AI clients sharing the same daemon see the same persistent data — so Claude Code, Cursor, and VS Code Copilot can all read from and contribute to the same knowledge base.
27
+
28
+ ---
29
+
30
+ ## Features
31
+
32
+ - **Zero setup** — `HyperProcess` auto-starts the Hyper server
33
+ - **Shared `hyperd` daemon** — one Hyper process per user, shared across all MCP clients (Claude Code, Cursor, VS Code, etc.) for reduced memory overhead and concurrent access to the same persistent databases
34
+ - **Queryable long-term memory** — persistent database survives across sessions; LLMs can store, recall, JOIN, and aggregate structured knowledge over time — not just retrieve text blobs, but reason over them with SQL
35
+ - **Any data in** — JSON, CSV, Parquet, Arrow IPC, Apache Iceberg; schema inferred or exact
36
+ - **SQL at scale** — thousands to billions of rows
37
+ - **Data out** — export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (Tableau Desktop-ready)
38
+ - **One-shot queries** — `query_file("/tmp/sales.csv", "SELECT ...")` — single call, zero management
39
+ - **Cross-session continuity** — load multiple tables, JOIN across them, persist across sessions; pick up exactly where you left off
40
+ - **Read-only safe mode** — `--read-only` flag for safe deployment
41
+ - **Schema resources** — auto-discover table schemas via `resources/list`
42
+ - **Guided prompts** — `analyze-table`, `compare-tables`, `data-quality`, `suggest-queries`
43
+ - **Inline charts** — bar/line/scatter/histogram as PNG or SVG
44
+ - **Incremental ingest** — `watch_directory` monitors for `.ready` sentinel files
45
+ - **Performance telemetry** — every response includes throughput stats
46
+ - **Smart schema inference** — exact (Arrow/Parquet), structural (JSON), heuristic (CSV) with full-file numeric widening
47
+ - **Pre-ingest file inspection** — `inspect_file` dry-runs the same inference without touching Hyper so LLMs can build safe schema overrides in one shot
48
+ - **Partial schema overrides** — supply just the columns you want to correct (e.g. `{"population":"BIGINT"}`) — the rest keep their inferred type
49
+ - **Rich resource surface** — workspace readme, per-table JSON and CSV samples, and one JSON + one CSV resource per table so LLMs can orient themselves via `resources/list` without any tool calls
50
+ - **Saved queries** — register named read-only SQL with `save_query`; each query becomes `hyper://queries/{name}/definition` (metadata) + `hyper://queries/{name}/result` (live re-run). Persisted in the persistent attachment, session-only when `--ephemeral-only`
51
+ - **Live resource-update notifications** — MCP clients can `resources/subscribe` to any `hyper://...` URI; the server fires `notifications/resources/updated` after every ingest, DDL, watcher event, or saved-query mutation
52
+
53
+ ---
54
+
55
+ ## Installation
56
+
57
+ ### From npm
58
+
59
+ > **Requirement:** Node.js **v21 or later**. Earlier versions ship an
60
+ > older `npx` whose argument parsing is incompatible with the
61
+ > `npx -y hyperdb-mcp` invocation in the MCP config below. If you're
62
+ > on an older Node, see [Upgrading Node.js with nvm](#upgrading-nodejs-with-nvm)
63
+ > below.
64
+
65
+ ```bash
66
+ npm install -g hyperdb-mcp
67
+ ```
68
+
69
+ The npm package bundles both the `hyperdb-mcp` binary and the `hyperd` database server — no additional setup required.
70
+
71
+ ### Upgrading Node.js with nvm
72
+
73
+ `nvm` (Node Version Manager) makes it easy to install and switch between Node.js versions.
74
+
75
+ **macOS / Linux** ([nvm-sh/nvm](https://github.com/nvm-sh/nvm)):
76
+ ```bash
77
+ # install nvm if you don't have it
78
+ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
79
+
80
+ # install and use the latest LTS (>= 21)
81
+ nvm install --lts
82
+ nvm use --lts
83
+ node --version # should report v22.x.x or newer
84
+ ```
85
+
86
+ **Windows** ([coreybutler/nvm-windows](https://github.com/coreybutler/nvm-windows)): download the installer, then in a new shell:
87
+ ```powershell
88
+ nvm install lts
89
+ nvm use lts
90
+ node --version
91
+ ```
92
+
93
+ After upgrading, restart your MCP client so it picks up the new Node binary on `PATH`.
94
+
95
+ ### Building from Source
96
+
97
+ ```bash
98
+ cd hyper-api-rust
99
+ cargo build --release -p hyperdb-mcp
100
+ ```
101
+
102
+ The binary is at `target/release/hyperdb-mcp`.
103
+
104
+ When building from source the `hyperd` executable is **not** bundled, so
105
+ you'll need to provide one. The easiest path is the companion
106
+ [`hyperdb-bootstrap`](../hyperdb-bootstrap/) CLI, which downloads a
107
+ matching pinned `hyperd` for your platform:
108
+
109
+ ```bash
110
+ cargo install hyperdb-bootstrap
111
+ hyperdb-bootstrap download # installs into ./.hyperd/current/
112
+ export HYPERD_PATH="$PWD/.hyperd/current" # or pass via your MCP config
113
+ ```
114
+
115
+ `hyperdb-bootstrap` also has a library API if you'd rather wire the
116
+ download into your own build script — see its
117
+ [README](../hyperdb-bootstrap/README.md). If you already have `hyperd`
118
+ elsewhere (Tableau Hyper API for C++/Python/Java ships one), point
119
+ `HYPERD_PATH` at it or add it to your `PATH`.
120
+
121
+ ### MCP Client Configuration
122
+
123
+ Each AI tool reads MCP server config from a different file but uses the same JSON shape. The base config block, using `npx` (recommended), is:
124
+ ```json
125
+ {
126
+ "mcpServers": {
127
+ "HyperDB": {
128
+ "type": "stdio",
129
+ "command": "npx",
130
+ "args": ["-y", "hyperdb-mcp"]
131
+ }
132
+ }
133
+ }
134
+ ```
135
+
136
+ Or if you built from source:
137
+ ```json
138
+ {
139
+ "mcpServers": {
140
+ "HyperDB": {
141
+ "type": "stdio",
142
+ "command": "/path/to/hyperdb-mcp",
143
+ "env": {
144
+ "HYPERD_PATH": "/path/to/hyperd"
145
+ }
146
+ }
147
+ }
148
+ }
149
+ ```
150
+
151
+ By default, persistent storage lives at the platform data dir (`~/Library/Application Support/hyperdb/workspace.hyper` on macOS, `~/.local/share/hyperdb/workspace.hyper` on Linux, `%APPDATA%\hyperdb\workspace.hyper` on Windows). To use a custom path:
152
+ ```json
153
+ "args": ["--persistent-db", "/path/to/my-project.hyper"]
154
+ ```
155
+
156
+ Multiple MCP clients can point at the **same** persistent file simultaneously — they all connect through the shared `hyperd` daemon and use Hyper's MVCC transaction isolation. See [Operating Modes](#operating-modes) below.
157
+
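Adding the base block to a config file that already lists other servers can be scripted. The following is a minimal sketch, not part of `hyperdb-mcp`: the function name `merge_hyperdb_server` is hypothetical, and it assumes that flags placed after the package name in `args` are forwarded by `npx` to the server.

```python
import copy
import json


def merge_hyperdb_server(config, persistent_db=None):
    """Return a copy of an MCP client config with the HyperDB entry added."""
    # Copy first so the caller's config (and its other servers) is untouched.
    merged = copy.deepcopy(config)
    args = ["-y", "hyperdb-mcp"]
    if persistent_db is not None:
        # Assumption: npx forwards arguments after the package name.
        args += ["--persistent-db", str(persistent_db)]
    merged.setdefault("mcpServers", {})["HyperDB"] = {
        "type": "stdio",
        "command": "npx",
        "args": args,
    }
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"type": "stdio", "command": "other"}}}
print(json.dumps(merge_hyperdb_server(existing), indent=2))
```

Write the returned dictionary back to whichever file your client reads (see the per-client sections that follow).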
158
+ #### Claude Code / AI Suite
159
+
160
+ Create or edit `~/.claude/.mcp.json` (global) or `.mcp.json` in the project root (project-scoped). Use the base config block above.
161
+
162
+ After adding the config:
163
+ 1. Start a new Claude Code session. You'll be prompted to approve the server on first use.
164
+ 2. **Auto-approve tools (optional):** Add `"mcp__HyperDB__*"` to the `permissions.allow` array in `~/.claude/settings.json`.
165
+
166
+ #### Claude Desktop
167
+
168
+ Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows). Use the base config block above.
169
+
170
+ #### Cursor
171
+
172
+ Edit `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (project root). Use the base config block above.
173
+
174
+ #### Other MCP Clients
175
+
176
+ Any tool that supports the MCP stdio transport can use this server. Point it at the `hyperdb-mcp` binary and set `HYPERD_PATH` in the environment.
177
+
178
+ ---
179
+
180
+ ## Operating Modes
181
+
182
+ Each session has **two databases**: an ephemeral primary (scratch space — always created fresh per session, deleted on exit) and a persistent database (queryable long-term memory — stored at the platform-default location or a path you supply, survives indefinitely). Unqualified SQL targets the ephemeral primary; the persistent database is reachable as the `"persistent"` alias.
183
+
184
+ ### Hyper engine
185
+
186
+ | Mode | Flag | Behavior |
187
+ |---|---|---|
188
+ | **Shared daemon** *(default)* | *(none)* | One `hyperd` process per user, shared across all MCP clients. The first client auto-spawns the daemon; subsequent clients discover and reuse it. Idle for 30 minutes → daemon shuts itself down; the next client spawns a fresh one. |
189
+ | **Private hyperd** | `--no-daemon` | Each MCP client spawns its own `hyperd` (legacy behavior, one per session). |
190
+
191
+ The shared daemon is the bigger win for users running multiple AI clients (Claude Code + Cursor + VS Code) — they all share one Hyper engine instead of spawning three.
192
+
193
+ ### Database storage
194
+
195
+ | Mode | Flag | Behavior |
196
+ |---|---|---|
197
+ | **Default** | *(none)* | Ephemeral primary in `$TMPDIR/hyperdb-mcp-<pid>-<n>/scratch.hyper` + persistent attachment at the platform data dir (e.g. `~/Library/Application Support/hyperdb/workspace.hyper` on macOS). |
198
+ | **Custom persistent path** | `--persistent-db <PATH>` | Same as default but the persistent file lives at `<PATH>`. The deprecated `--workspace <PATH>` is accepted as an alias with a stderr warning. |
199
+ | **Ephemeral-only** | `--ephemeral-only` | No persistent attachment; the session has only the ephemeral primary plus any user-attached databases via `attach_database`. Saved queries fall back to in-memory storage and disappear when the session ends. |
200
+
201
+ `HYPERDB_PERSISTENT_DB` overrides the default persistent path the same way `--persistent-db` does.
202
+
203
+ ### Working with both databases
204
+
205
+ Tool calls default to the ephemeral primary, the LLM's scratch space for exploratory work that doesn't need to outlive the session. There are two ways to store data in long-term memory (the persistent database) instead:
206
+
207
+ **1. Per-tool `database` parameter** (preferred for ergonomic LLM workflows):
208
+
209
+ ```jsonc
210
+ // Save a useful table to the persistent database
211
+ load_data({ table: "customers", data: "[...]", persist: true })
212
+ // ↑ shorthand for `database: "persistent"`
213
+
214
+ // Query from persistent
215
+ query({ sql: "SELECT * FROM customers", database: "persistent" })
216
+
217
+ // Inspect persistent tables
218
+ describe({ database: "persistent" })
219
+ sample({ table: "customers", database: "persistent" })
220
+ ```
221
+
222
+ The `database` parameter is available on `query`, `execute`, `load_data`, `load_file`, `load_files`, `watch_directory`, `describe`, `sample`, `chart`, `export`, and `set_table_metadata`. The shorthand `persist: true` (sugar for `database: "persistent"`) is available on `load_data`, `load_file`, `load_files`, and `watch_directory`. Pass any user-attached writable alias (created via `attach_database`) to target a custom database.
223
+
224
+ (`query_data` and `query_file` are one-shot tools that materialize their input into a temporary table of their own and query it. They do not accept a `database` parameter because that data never lives in a persisted database.)
225
+
226
+ **2. Fully-qualified SQL** (for power users or complex multi-DB joins):
227
+
228
+ ```sql
229
+ -- Read from persistent
230
+ SELECT * FROM "persistent"."public"."customers";
231
+
232
+ -- Write to persistent
233
+ CREATE TABLE "persistent"."public"."revenue_2026" AS
234
+ SELECT region, SUM(amount) FROM scratch_orders GROUP BY region;
235
+ ```
236
+
237
+ **Per-database `_table_catalog`:** every writable database — persistent and any user-attached writable file — gets its own `_table_catalog` lazily seeded on first ingest. MCP-managed metadata (load tool, params, timestamps, prose fields set via `set_table_metadata`) lives alongside the data file, so opening a `.hyper` file later as a primary workspace finds the catalog ready. If you want a pristine `.hyper` file for export with no MCP bookkeeping, run `DROP TABLE "<alias>"."public"."_table_catalog"` once and subsequent sessions opening that file will leave it dropped.
238
+
239
+ **Detach safety:** `detach_database` rejects with `InvalidArgument` if any active watcher targets the alias — call `unwatch_directory` first. This prevents the watcher's pool from silently writing into a now-detached file (or worse, the wrong file if the alias is later re-attached to a different path).
240
+
241
+ ### Daemon management
242
+
243
+ The daemon is normally invisible — it auto-spawns and idle-times-out on its own. For diagnostics:
244
+
245
+ ```bash
246
+ hyperdb-mcp daemon status # Show running daemon (PID, endpoint, started_at, version)
247
+ hyperdb-mcp daemon stop # Gracefully shut down the daemon
248
+ hyperdb-mcp daemon # Run as a daemon explicitly (rarely needed)
249
+ ```
250
+
251
+ State files live at `~/.hyperdb/` by default (override with `HYPERDB_STATE_DIR`).
252
+
253
+ ### Recovery from hyperd crashes
254
+
255
+ The daemon polls `hyperd` every 5 seconds. If the process has exited (crashed, OOM, killed), the daemon spawns a replacement, atomically updates `~/.hyperdb/daemon.json` with the new endpoint, and continues serving clients. Clients see one failed tool call (the request that was in flight when hyperd died); the next tool call transparently reconnects to the new hyperd via the same recovery path used for normal connection drops.
256
+
257
+ If a client itself notices hyperd is unreachable before the next polling tick, it sends a fast-path `REPORT_HYPERD_ERROR` signal to the daemon so the restart kicks off without waiting for the timer.
258
+
259
+ If hyperd repeatedly fails to start (3 attempts within 60 seconds — e.g., misconfigured `HYPERD_PATH`, port exhaustion, broken binary), the daemon shuts itself down and removes the discovery file. The next MCP client to start up will then spawn a fresh daemon, surfacing any persistent failure clearly to the user rather than spinning silently.
260
+
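The give-up rule above (3 failed starts within 60 seconds) can be pictured as a sliding-window counter. This is an illustrative sketch only, not the daemon's actual code: the class name `RestartBudget` is hypothetical, and whether the real daemon uses a sliding or a fixed window is an assumption.

```python
from collections import deque


class RestartBudget:
    """Signal give-up once `max_failures` failed starts fall within `window_s` seconds."""

    def __init__(self, max_failures=3, window_s=60.0):
        self.max_failures = max_failures
        self.window_s = window_s
        self._failures = deque()

    def record_failure(self, now):
        """Record a failed start at time `now` (seconds); return True to give up."""
        self._failures.append(now)
        # Drop failures that have aged out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures


budget = RestartBudget()
for t in (0.0, 10.0, 20.0):
    give_up = budget.record_failure(t)
print(give_up)
```

Three failures spread over more than 60 seconds never trip the limit in this sketch, because the oldest one is discarded before the count is checked.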
261
+ **Known limitation:** if hyperd hangs (alive at the OS level but unresponsive to queries), the daemon's polling can't detect it and your tool call may stall indefinitely. The recovery path is `hyperdb-mcp daemon stop` followed by reconnecting from your MCP client.
262
+
263
+ ### Other behavioral flags
264
+
265
+ | Flag | Behavior |
266
+ |---|---|
267
+ | `--read-only` | Disables `execute`, `load_data`, `load_file`, `watch_directory`, `save_query`, `delete_query`, and Hyper-format export. See [Read-Only Mode](#read-only-mode). |
268
+
269
+ ---
270
+
271
+ ## MCP Tools
272
+
273
+ ### One-Shot Tools
274
+
275
+ #### `query_data`
276
+
277
+ Ingest inline data and run a SQL query in a single call.
278
+
279
+ ```
280
+ query_data(data: '[{"region":"West","revenue":1200},...]', sql: 'SELECT region, SUM(revenue) FROM data GROUP BY region')
281
+ ```
282
+
283
+ | Parameter | Type | Required | Description |
284
+ |-----------|------|----------|-------------|
285
+ | `data` | string | yes | JSON array of objects, or CSV text |
286
+ | `sql` | string | yes | SQL query to run against the data |
287
+ | `format` | string | no | `"json"` or `"csv"` — auto-detected if omitted |
288
+ | `table_name` | string | no | Table name for use in SQL — defaults to `"data"` |
289
+ | `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
290
+
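The call notation used in this section is shorthand. On the wire, an MCP client sends a JSON-RPC `tools/call` request. The sketch below shows roughly what that looks like for `query_data`; the helper name `tool_call` and the second sample row are illustrative assumptions, not part of the server.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator for this sketch


def tool_call(name, arguments):
    """Build an MCP `tools/call` JSON-RPC request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


rows = [{"region": "West", "revenue": 1200}, {"region": "East", "revenue": 900}]
request = tool_call(
    "query_data",
    {
        # `data` is a string parameter, so the rows are JSON-encoded first.
        "data": json.dumps(rows),
        "sql": "SELECT region, SUM(revenue) FROM data GROUP BY region",
    },
)
print(json.dumps(request, indent=2))
```

Most MCP client libraries build this envelope for you; the point is only that `data` travels as a string, not as a nested JSON array.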
291
+ #### `query_file`
292
+
293
+ Ingest a file and run a SQL query in a single call. Streams from disk — handles files of any size.
294
+
295
+ ```
296
+ query_file(path: '/tmp/sales.parquet', sql: 'SELECT * FROM sales ORDER BY amount DESC LIMIT 10')
297
+ ```
298
+
299
+ | Parameter | Type | Required | Description |
300
+ |-----------|------|----------|-------------|
301
+ | `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
302
+ | `sql` | string | yes | SQL query to run |
303
+ | `table_name` | string | no | Table name — defaults to filename stem |
304
+ | `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
305
+
306
+ ### Workspace Tools
307
+
308
+ #### `load_data`
309
+
310
+ Load inline data into a named workspace table.
311
+
312
+ ```
313
+ load_data(table: 'customers', data: '[{"id":1,"name":"Alice"},...]')
314
+ ```
315
+
316
+ | Parameter | Type | Required | Description |
317
+ |-----------|------|----------|-------------|
318
+ | `table` | string | yes | Table name |
319
+ | `data` | string | yes | JSON array of objects, or CSV text |
320
+ | `format` | string | no | `"json"` or `"csv"` — auto-detected |
321
+ | `mode` | string | no | `"replace"` (default) or `"append"` |
322
+ | `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
323
+
324
+ #### `load_file`
325
+
326
+ Load a file into a named workspace table.
327
+
328
+ ```
329
+ load_file(table: 'orders', path: '/tmp/orders.csv')
330
+ ```
331
+
332
+ | Parameter | Type | Required | Description |
333
+ |-----------|------|----------|-------------|
334
+ | `table` | string | yes | Table name |
335
+ | `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
336
+ | `mode` | string | no | `"replace"` (default) or `"append"` |
337
+ | `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
338
+
339
+ When you're unsure of the right types — or recovering from a previous
340
+ `SCHEMA_MISMATCH` — call [`inspect_file`](#inspect_file) first. It reports the
341
+ exact schema `load_file` would use plus per-column `min` / `max` / `null_count`
342
+ so you can build a minimal, correct override in one shot.
343
+
344
+ #### `load_iceberg`
345
+
346
+ Load an [Apache Iceberg](https://iceberg.apache.org/) table into a named
347
+ workspace table. Pass the absolute path to the Iceberg table root (the
348
+ directory containing `metadata/` and `data/`); hyperd's native Iceberg
349
+ reader derives the schema and resolves the snapshot.
350
+
351
+ ```
352
+ load_iceberg(table: 'sales', path: '/lake/warehouse/db/sales')
353
+ ```
354
+
355
+ | Parameter | Type | Required | Description |
356
+ |-----------|------|----------|-------------|
357
+ | `table` | string | yes | Target Hyper table name |
358
+ | `path` | string | yes | Absolute path to the Iceberg table root directory |
359
+ | `mode` | string | no | `"replace"` (default) or `"append"` |
360
+ | `metadata_filename` | string | no | Pin a specific snapshot, e.g. `"v2.metadata.json"`. Omit for latest. |
361
+ | `version_as_of` | integer | no | Pin a snapshot by version number |
362
+
363
+ Schema overrides are not accepted — hyperd derives the schema from the
364
+ Iceberg table metadata.
365
+
366
+ #### `query`
367
+
368
+ Run a **read-only** SQL query against the workspace. Accepts `SELECT`, `WITH`, `EXPLAIN`, `SHOW`, `VALUES`. For DDL/DML use `execute`.
369
+
370
+ ```
371
+ query(sql: 'SELECT c.name, SUM(o.amount) FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY c.name')
372
+ ```
373
+
374
+ #### `execute`
375
+
376
+ Execute a **mutating** SQL statement: `CREATE TABLE`, `INSERT`, `UPDATE`, `DELETE`, `DROP TABLE`, `ALTER`, `COPY`, etc. Returns the affected row count. Disabled in read-only mode.
377
+
378
+ ```
379
+ execute(sql: 'CREATE TABLE archived_orders AS SELECT * FROM orders WHERE year < 2024')
380
+ ```
381
+
382
+ #### `describe`
383
+
384
+ List all workspace tables with their schemas, column types, and row counts.
385
+
386
+ #### `sample`
387
+
388
+ Return the schema, total row count, and first N rows of a table in a single call.
389
+
390
+ ```
391
+ sample(table: 'orders', n: 10)
392
+ ```
393
+
394
+ | Parameter | Type | Required | Description |
395
+ |-----------|------|----------|-------------|
396
+ | `table` | string | yes | Table name |
397
+ | `n` | int | no | Rows to return (default: 5, clamped to 1..=100) |
398
+
399
+ ### Diagnostics
400
+
401
+ #### `inspect_file`
402
+
403
+ Dry-run schema inference on a CSV, JSON, JSONL, Parquet, or Arrow IPC file **without ingesting
404
+ it**. Returns the exact schema `load_file` / `query_file` would use (including
405
+ the full-file numeric widening pass) plus per-column `min`, `max`, `null_count`,
406
+ and `sample_values`. Nothing is written to Hyper and `hyperd` is not even
407
+ started.
408
+
409
+ Use it **before** `load_file` whenever you are unsure about types, or **after** a
410
+ `SCHEMA_MISMATCH` failure to pick the right widening. The LLM can feed the
411
+ reported `type` + `min` / `max` directly into a partial `schema` override on the
412
+ subsequent `load_file` call.
413
+
414
+ ```
415
+ inspect_file(path: '/tmp/owid-population.csv')
416
+ ```
417
+
418
+ | Parameter | Type | Required | Description |
419
+ |-----------|------|----------|-------------|
420
+ | `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
421
+ | `sample_rows` | int | no | Sample values / rows per column (default 5, clamped 1..=50) |
422
+
423
+ Response shape:
424
+
425
+ ```json
426
+ {
427
+ "file_format": "csv",
428
+ "row_count": 63000,
429
+ "file_size_bytes": 4831204,
430
+ "columns": [
431
+ { "name": "Entity", "type": "TEXT", "nullable": true, "null_count": 0, "sample_values": ["Afghanistan", ...] },
432
+ { "name": "Year", "type": "INT", "nullable": true, "null_count": 0, "min": 1800, "max": 2023, "sample_values": ["1800", ...] },
433
+ { "name": "Population", "type": "BIGINT", "nullable": true, "null_count": 12, "min": 500, "max": 8002572256, "sample_values": ["4000000", ...] }
434
+ ],
435
+ "sample_rows": [ { "Entity": "Afghanistan", "Year": "1800", "Population": "2805829" } ]
436
+ }
437
+ ```
438
+
439
+ `sample_values` and `sample_rows` are **always strings**, regardless of the inferred column `type` — they report what the file contains on disk, before any type coercion, so the LLM can compare the raw text against `min` / `max` when building a `schema` override. Use `type` (and `min` / `max`) for the typed view; use `sample_values` for the raw view.
440
+
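Turning an `inspect_file` report into a partial `schema` override could look like the sketch below. It relies only on the `name`, `type`, `min`, and `max` fields shown in the response shape above; the helper name `build_schema_override` is hypothetical, and the 32-bit range for `INT` is an assumption about the column type.

```python
INT_MIN, INT_MAX = -(2**31), 2**31 - 1  # assumed range of a 32-bit INT column


def build_schema_override(report, columns):
    """Build a partial column-name -> type map from an `inspect_file` report."""
    by_name = {col["name"]: col for col in report["columns"]}
    override = {}
    for name in columns:
        col = by_name[name]
        col_type = col["type"]
        lo, hi = col.get("min"), col.get("max")
        # Widen INT to BIGINT if the observed range would not fit in 32 bits.
        if col_type == "INT" and lo is not None and hi is not None:
            if lo < INT_MIN or hi > INT_MAX:
                col_type = "BIGINT"
        override[name] = col_type
    return override


# Trimmed-down copy of the example response above.
report = {
    "columns": [
        {"name": "Year", "type": "INT", "min": 1800, "max": 2023},
        {"name": "Population", "type": "BIGINT", "min": 500, "max": 8002572256},
    ]
}
print(build_schema_override(report, ["Population"]))  # {'Population': 'BIGINT'}
```

The resulting map can be passed as the `schema` argument of the following `load_file` call; columns left out keep their inferred type.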
441
+ ### Saved Queries
442
+
443
+ Register a named read-only SQL query once; read its live result as many
444
+ times as you like via a resource URI. Useful for dashboard-style recurring
445
+ views and for giving LLMs a stable "bookmark" set of key queries that
446
+ `resources/list` advertises up front.
447
+
448
+ Each saved query produces **two** resources:
449
+
450
+ - `hyper://queries/{name}/definition` — the stored SQL plus metadata
451
+ (description, `created_at`) as JSON.
452
+ - `hyper://queries/{name}/result` — re-runs the SQL on every read and
453
+ returns `{ name, result: [...], stats: {...} }`.
454
+
455
+ **Persistence:** saved queries land in the persistent attachment's
456
+ `_hyperdb_saved_queries` meta-table (`"persistent"."public"."_hyperdb_saved_queries"`)
457
+ and survive server restarts. In `--ephemeral-only` sessions they live
458
+ only for the lifetime of the server process.
459
+
460
+ #### `save_query`
461
+
462
+ ```
463
+ save_query(name: 'top_5_customers', sql: 'SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY total DESC LIMIT 5', description: 'Biggest spenders this year')
464
+ ```
465
+
466
+ | Parameter | Type | Required | Description |
467
+ |---|---|---|---|
468
+ | `name` | string | yes | Unique identifier used as the URI path component |
469
+ | `sql` | string | yes | Read-only SQL (SELECT / WITH / EXPLAIN / SHOW / VALUES) |
470
+ | `description` | string | no | Human-friendly summary |
471
+
472
+ Duplicate names are rejected with `INVALID_ARGUMENT` — use `delete_query`
473
+ first if you intend to overwrite. Non-read-only SQL is rejected with
474
+ `SQL_ERROR`. Disabled in read-only mode.
475
+
476
+ #### `delete_query`
477
+
478
+ ```
479
+ delete_query(name: 'top_5_customers')
480
+ ```
481
+
482
+ | Parameter | Type | Required | Description |
483
+ |---|---|---|---|
484
+ | `name` | string | yes | Name of the saved query to remove |
485
+
486
+ Returns `{ "deleted": true }` when the query existed, `{ "deleted": false }`
487
+ when it did not (no error on unknown names). Disabled in read-only mode.
488
+
489
+ ### Export Tools
490
+
491
+ #### `export`
492
+
493
+ Write query results or a table to a file.
494
+
495
+ ```
496
+ export(table: 'orders', path: '~/Desktop/orders.parquet', format: 'parquet')
497
+ export(sql: 'SELECT ...', path: '~/Desktop/analysis.hyper', format: 'hyper')
498
+ ```
499
+
500
+ | Parameter | Type | Required | Description |
501
+ |-----------|------|----------|-------------|
502
+ | `sql` | string | no | Query to export (if omitted, exports whole table) |
503
+ | `table` | string | no | Table name (used if `sql` omitted) |
504
+ | `path` | string | yes | Output file path |
505
+ | `format` | string | yes | `"csv"`, `"parquet"`, `"iceberg"`, `"arrow_ipc"`, or `"hyper"` |
506
+
507
+ The `"hyper"` format produces a `.hyper` file that opens directly in **Tableau Desktop**.
508
+
509
+ ### Visualization
510
+
511
+ #### `chart`
512
+
513
+ Render a chart from a SQL query and return it inline as an image.
514
+
515
+ ```
516
+ chart(sql: 'SELECT product, SUM(revenue) as total FROM sales GROUP BY product', chart_type: 'bar', x: 'product', y: 'total', title: 'Revenue by Product')
517
+ ```
518
+
519
+ | Parameter | Type | Required | Description |
520
+ |-----------|------|----------|-------------|
521
+ | `sql` | string | yes | Read-only SQL query returning the data to plot |
522
+ | `chart_type` | string | yes | `bar`, `line`, `scatter`, or `histogram` |
523
+ | `x` | string | yes | X-axis column (for histogram, the value column) |
523
+ | `y` | string | yes* | Y-axis column (*not required for histogram) |
525
+ | `series` | string | no | Grouping column for multi-series plots |
526
+ | `title` | string | no | Chart title |
527
+ | `format` | string | no | `png` (default) or `svg` |
528
+ | `width` | int | no | Pixels (default 800, clamped 200..4096) |
529
+ | `height` | int | no | Pixels (default 480, clamped 150..4096) |
530
+ | `bins` | int | no | Histogram bins (default 20, clamped 1..500) |
531
+
532
+ Returns an `ImageContent` (base64 PNG or SVG) plus a stats JSON block.
533
+
534
+ ### Incremental Ingest
535
+
536
+ #### `watch_directory` / `unwatch_directory`
537
+
538
+ Monitor a directory for data files and auto-append them to a target table.
539
+
540
+ ```
541
+ watch_directory(path: '/tmp/inbox', table: 'events')
542
+ unwatch_directory(path: '/tmp/inbox')
543
+ ```
544
+
545
+ **Producer protocol (`.ready` sentinel):**
546
+
547
+ 1. Write data file (e.g. `foo.csv`) and close it.
548
+ 2. Create a zero-byte companion `foo.csv.ready` — this is the atomic signal.
549
+ 3. Poll for the absence of `foo.csv.ready` to confirm the watcher is done.
550
+
551
+ On success, both files are deleted. On failure, both are moved to `failed/` with a `.error` JSON file.
552
+
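The producer side of this protocol can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names `publish` and `wait_until_consumed` are hypothetical, and the timeout and poll interval are arbitrary defaults.

```python
import time
from pathlib import Path


def publish(inbox, name, payload):
    """Write `name` into `inbox`, then signal readiness with a zero-byte sentinel."""
    data_path = Path(inbox) / name
    # Step 1: write the data file; write_bytes closes it before returning.
    data_path.write_bytes(payload)
    # Step 2: create the zero-byte `<name>.ready` companion.
    ready_path = data_path.with_name(name + ".ready")
    ready_path.touch()
    return ready_path


def wait_until_consumed(ready_path, timeout_s=30.0, poll_s=0.2):
    """Step 3: poll until the sentinel disappears; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while ready_path.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True
```

The sentinel disappears on both success and failure, so a producer that needs to tell the two apart should also check `failed/` for the moved file after `wait_until_consumed` returns.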
553
+ Key properties:
554
+ - **One directory, one table, append mode** — files must match the target schema.
555
+ - **Initial sweep** — pre-existing `.ready` files are processed immediately.
556
+ - **Read-only mode** — `watch_directory` is blocked; `unwatch_directory` is always allowed.
557
+ - **Cleanup** — dropping the server or calling `unwatch_directory` terminates the background thread.
558
+
559
+ ### Utility Tools
560
+
561
+ #### `status`
562
+
563
+ Returns plugin health, workspace mode, table count, total rows, disk usage, read-only flag, and active directory watchers with per-watcher stats.
564
+
565
+ ---
566
+
567
+ ## MCP Resources
568
+
569
+ The server exposes workspace state as MCP **Resources**, discoverable via
570
+ `resources/list`. Each resource advertises its own MIME type so clients
571
+ can route it appropriately (LLM context vs. file download vs. chart).
572
+
573
+ | URI | MIME | Content |
574
+ |-----|------|---------|
575
+ | `hyper://workspace` | `application/json` | Workspace mode, table count, total rows, disk usage |
576
+ | `hyper://tables` | `application/json` | Full list of tables with schemas and row counts |
577
+ | `hyper://readme` | `text/markdown` | Workspace overview as markdown: table catalog, related resources per table, and tool hints for a cold-started LLM |
578
+ | `hyper://tables/{name}/schema` | `application/json` | Columns, types, nullability, and row count for one table |
579
+ | `hyper://tables/{name}/sample` | `application/json` | First 5 rows of a table as JSON, with schema |
580
+ | `hyper://tables/{name}/csv-sample` | `text/csv` | First 20 rows of a table as CSV, header-first |
581
+ | `hyper://queries/{name}/definition` | `application/json` | Stored SQL + metadata for a saved query |
582
+ | `hyper://queries/{name}/result` | `application/json` | Live result of a saved query — re-runs on every read |
583
+
584
+ Resource templates (discoverable via `resources/templates/list`):
585
+
586
+ - `hyper://tables/{name}/schema`
587
+ - `hyper://tables/{name}/sample`
588
+ - `hyper://tables/{name}/csv-sample`
589
+ - `hyper://queries/{name}/definition`
590
+ - `hyper://queries/{name}/result`
591
+
592
+ The internal `_hyperdb_saved_queries` meta-table used to persist saved
593
+ queries is deliberately hidden from `resources/list` and
594
+ `hyper://tables` — callers see only user-visible data tables.
595
+
596
+ ### Resource-update notifications
597
+
598
+ HyperDB advertises both the `resources.subscribe` and
599
+ `resources.listChanged` capabilities in its `initialize` response. Clients
600
+ can subscribe to any `hyper://...` URI via `resources/subscribe` and will
601
+ then receive `notifications/resources/updated` messages whenever the
602
+ server detects a change, without polling.
603
+
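As a rough illustration of the subscription flow described above, the sketch below builds a `resources/subscribe` request and extracts the URI from a `notifications/resources/updated` message. The method names come from the MCP specification; the helper names `subscribe_request` and `updated_uri` are hypothetical, and transport handling (stdio framing, the `initialize` handshake) is left out.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids; any unique value per request works


def subscribe_request(uri: str) -> str:
    """Build a `resources/subscribe` JSON-RPC request for one resource URI."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "resources/subscribe",
        "params": {"uri": uri},
    })


def updated_uri(message: str) -> Optional[str]:
    """Return the URI carried by a `notifications/resources/updated` message.

    Any other message (responses, other notifications) yields None.
    """
    msg = json.loads(message)
    if msg.get("method") == "notifications/resources/updated":
        return msg.get("params", {}).get("uri")
    return None
```

A client would typically send `subscribe_request("hyper://tables")` once after `initialize`, then re-read the resource whenever `updated_uri` returns that URI.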
604
+ The server fires **targeted** updates for the URIs affected by each kind
605
+ of mutation:
606
+
607
+ | Trigger | Updated URIs | `notifications/resources/list_changed`? |
608
+ |---|---|---|
609
+ | `load_data` / `load_file` (replace mode) | `hyper://workspace`, `hyper://tables`, `hyper://readme`, per-table schema + sample + csv-sample | Yes |
610
+ | `load_data` / `load_file` (append mode) | Same per-table + summary URIs | No &sup1; |
611
+ | `watch_directory` ingest of a `.ready` pair | Same per-table + summary URIs | No &sup1; |
612
+ | `execute` (INSERT / UPDATE / DELETE) | Workspace summary URIs | No |
613
+ | `execute` (CREATE / DROP / ALTER / TRUNCATE / RENAME) | Workspace summary URIs | Yes |
614
+ | `save_query` | (none per-URI) | Yes — two new `hyper://queries/{name}/...` resources |
615
+ | `delete_query` | `hyper://queries/{name}/definition`, `hyper://queries/{name}/result` | Yes — two resources disappeared |
616
+
617
+ &sup1; Append-mode ingest (both `load_*` and the watcher) auto-creates the target table when it doesn't exist, but **does not** fire `list_changed` for that creation. Clients that need to discover watcher-created tables should re-read `hyper://tables` after subscribing, or use the per-table `updated` notification as a trigger to refresh their list. Tracked in `DEVELOPMENT.md` as tech debt.
618
+
619
+ Notifications are fire-and-forget — send failures (typically due to a
620
+ client disconnect) are logged at the `debug` level and the registry
621
+ prunes dead peers lazily. This keeps mutation paths fast and free of
622
+ back-pressure concerns.
623
+
624
+ All JSON-typed resources return a pretty-printed object; Markdown and
625
+ CSV resources are returned verbatim.
626
+
627
+ ---
628
+
629
+ ## MCP Prompts
630
+
631
+ The server registers four guided analytical workflows as MCP **Prompts**.
632
+
633
+ | Prompt | Arguments | What it does |
634
+ |--------|-----------|--------------|
635
+ | `analyze-table` | `table` | Schema walkthrough, column statistics, data quality flags |
636
+ | `compare-tables` | `table_a`, `table_b` | Schema alignment, JOIN key suggestions, analytical opportunities |
637
+ | `data-quality` | `table` | Systematic NULL / duplicate / cardinality / outlier checks |
638
+ | `suggest-queries` | `table`, `goal?` | 5 analytical SQL queries with explanations, optionally goal-guided |
639
+
640
+ ---
641
+
642
+ ## Read-Only Mode
643
+
644
+ ```bash
645
+ hyperdb-mcp --persistent-db ~/analytics.hyper --read-only
646
+ ```
647
+
648
+ - **Allowed:** `query`, `query_data`, `query_file`, `describe`, `sample`, `inspect_file`, `status`, `export`
649
+ - **Blocked:** `execute`, `load_data`, `load_file`, `watch_directory`, `save_query`, `delete_query` — return `READ_ONLY_VIOLATION`
650
+ - **Resources, prompts, and resource subscriptions** work normally — read-only clients can still subscribe to `hyper://...` URIs and receive notifications when other (non-read-only) connections mutate state
651
+
652
+ The `query` tool also enforces read-only at the SQL level — only `SELECT`/`WITH`/`EXPLAIN`/`SHOW`/`VALUES` are accepted.
653
+
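The SQL-level check on `query` can be pictured as a first-keyword allow-list. The sketch below is only an approximation under that assumption: the function name is hypothetical, it does not strip leading SQL comments, and the real server may parse statements more thoroughly (for example, to reject a `WITH` clause that wraps a mutation).

```python
import re

# Statement-leading keywords the README lists as accepted by `query`.
READ_ONLY_KEYWORDS = {"SELECT", "WITH", "EXPLAIN", "SHOW", "VALUES"}


def is_read_only_statement(sql: str) -> bool:
    """True if the statement's first keyword is in the read-only allow-list."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return bool(match) and match.group(1).upper() in READ_ONLY_KEYWORDS
```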
654
+ ---
655
+
656
+ ## Data Flow Patterns
657
+
658
+ - **Small data (LLM relay):** For <10K rows. The LLM gets data from another plugin and passes it inline via `query_data`.
659
+ - **Large data (file intermediary):** For thousands to billions of rows. The source plugin exports to a file, and the LLM calls `query_file`. Data never enters the LLM context, so memory use stays constant regardless of file size.
660
+
661
+ ---
662
+
663
+ ## Schema Inference
664
+
665
+ Three tiers, chosen automatically based on the data source:
666
+
667
+ | Tier | Source | How |
668
+ |------|--------|-----|
669
+ | **Exact** | Arrow IPC, Parquet | Schema read from file metadata. Types preserved exactly. |
670
+ | **Structural** | JSON | All objects scanned. Per-column type widening: Int → BigInt → Double. Mixed types → TEXT. |
671
+ | **Heuristic** | CSV | Header row for names, first 1,000 rows sampled for types. A second full-file streaming pass then **widens** numeric columns if needed (INT → BIGINT → NUMERIC(38,0); INT/BIGINT → DOUBLE PRECISION if any later row contains a decimal). |
672
+
673
+ **JSON file shapes.** `load_file` and `query_file` accept two JSON
674
+ representations and auto-detect between them from the first non-whitespace
675
+ byte: a top-level JSON array of objects (e.g. `[{...}, {...}]`) or
676
+ newline-delimited JSON (JSONL / NDJSON — one JSON object per line, the
677
+ format hyperd's own logs use). Blank lines are tolerated. Malformed
678
+ JSONL surfaces a `SCHEMA_MISMATCH` error naming the offending line
679
+ number.
680
+
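A minimal sketch of the two-shape detection described above, assuming the whole input fits in memory (the server itself streams). The function name is hypothetical, and `ValueError` stands in for the server's `SCHEMA_MISMATCH` error.

```python
import json


def load_json_rows(text: str) -> list:
    """Parse either a top-level JSON array or JSONL into a list of rows."""
    stripped = text.lstrip()
    if stripped.startswith("["):
        return json.loads(stripped)  # top-level array of objects

    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are tolerated
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Name the offending line, as the README says the server does.
            raise ValueError(
                f"SCHEMA_MISMATCH: malformed JSON on line {lineno}: {exc.msg}"
            ) from exc
    return rows
```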
681
+ **Content sniffing for unknown extensions.** Files with extensions the
682
+ dispatcher doesn't recognize (`.log`, `.txt`, no extension at all) are
683
+ classified by peeking at the first non-whitespace byte: `[` or `{`
684
+ routes to JSON, anything else to CSV. This means hyperd's raw `.log`
685
+ files load through `load_file` directly, no rename or preprocessing
686
+ required. Binary formats (`.parquet`, `.arrow`, `.ipc`, `.feather`,
687
+ `.pq`) always win by extension since they're not text-sniffable.
688
+ `inspect_file` uses the exact same dispatcher so its report can never
689
+ disagree with what `load_file` would do.
690
+
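The dispatch rule above can be sketched as follows. This is an illustration, not the crate's dispatcher: the function names are hypothetical, the `"binary"` label is a placeholder for whichever binary reader applies, and recognized text extensions (which presumably route by extension) are omitted.

```python
from pathlib import PurePath

# Binary formats listed in the README; these always win by extension.
BINARY_EXTENSIONS = {".parquet", ".arrow", ".ipc", ".feather", ".pq"}


def sniff_text_format(head: bytes) -> str:
    """Classify text content by its first non-whitespace byte."""
    first = head.lstrip()[:1]
    return "json" if first in (b"[", b"{") else "csv"


def classify(filename: str, head: bytes) -> str:
    """Pick a loader from the extension, falling back to content sniffing."""
    if PurePath(filename).suffix.lower() in BINARY_EXTENSIONS:
        return "binary"
    return sniff_text_format(head)
```

Here `head` is the first chunk of the file; only enough bytes to reach the first non-whitespace character are needed.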
691
+ **CSV NULL handling.** Unquoted empty cells (`,,`) load as SQL NULL —
692
+ matching PostgreSQL's CSV convention and `inspect_file`'s `null_count`
693
+ diagnostics. Quoted empty strings (`,"",`) load as the literal empty
694
+ string. This means downstream `WHERE col IS NULL` works directly without
695
+ a defensive `OR col = ''` clause.
696
+
697
+ The full-file CSV widening pass specifically protects against the "big value
698
+ hidden at the end of the file" failure mode — e.g. an aggregate row whose
699
+ `population` is ~8 billion tucked in after 60 000 country-sized rows. Without
700
+ it, the first-pass sample would pick `INT` and the COPY would fail with
701
+ `SCHEMA_MISMATCH` / SQLSTATE 22003 mid-ingest.
702
+
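To make the widening pass concrete, here is a small sketch of the numeric ladder for one column. It assumes 32-bit `INT` and 64-bit `BIGINT` ranges, skips empty cells (which load as NULL), and treats any decimal value as forcing `DOUBLE PRECISION` even from `NUMERIC(38,0)`; that last rule is an assumption, since the README only states it for `INT`/`BIGINT`. All names are hypothetical.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1

# Narrowest to widest; a column's type only ever moves rightwards.
ORDER = ["INT", "BIGINT", "NUMERIC(38,0)", "DOUBLE PRECISION"]


def type_for_value(cell: str) -> str:
    """Smallest numeric type in ORDER that can hold this cell."""
    try:
        n = int(cell)
    except ValueError:
        float(cell)  # raises ValueError if the cell is not numeric at all
        return "DOUBLE PRECISION"
    if INT_MIN <= n <= INT_MAX:
        return "INT"
    if BIGINT_MIN <= n <= BIGINT_MAX:
        return "BIGINT"
    return "NUMERIC(38,0)"


def widen_column(sampled_type: str, cells) -> str:
    """Stream every cell and widen the sampled type when a value needs it."""
    current = sampled_type
    for cell in cells:
        if cell == "":
            continue  # unquoted empty cell: NULL, fits any type
        current = max(current, type_for_value(cell), key=ORDER.index)
    return current
```

With a sampled type of `INT`, a late value of `8000000000` moves the column to `BIGINT` before the load starts, which is the failure mode the paragraph above describes.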
703
+ For implementation details (widening rules, type mapping tables), see the
704
+ module docs in `src/schema.rs` and `src/ingest_arrow.rs`.
705
+
706
+ ### Schema Overrides
707
+
708
+ Every data-in tool (`query_data`, `query_file`, `load_data`, `load_file`)
709
+ accepts an optional `schema` parameter: a **partial** map from column name to
710
+ Hyper SQL type.
711
+
712
+ ```json
713
+ { "schema": { "population": "BIGINT", "order_date": "DATE" } }
714
+ ```
715
+
716
+ Semantics:
717
+
718
+ - Keys are matched to columns **by name** (case-sensitive). Column order in
719
+ the JSON object does not need to match the file — the inferred order from
720
+ the file is preserved.
721
+ - Columns **not** listed in the override keep their inferred type. You only
722
+ specify the columns you want to correct.
723
+ - Unknown column names and unknown type strings are rejected up front with a
724
+ `SCHEMA_MISMATCH` error that lists the real column names, so the LLM can
725
+ self-correct without another round-trip.
726
+ - Supported type strings: `INT`, `BIGINT`, `NUMERIC(p,s)` (e.g.
727
+ `NUMERIC(38,0)` or `NUMERIC(12,2)`), `DOUBLE PRECISION`, `TEXT`, `BOOL`,
728
+ `DATE`, `TIMESTAMP`.
729
+
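The override semantics listed above can be sketched as a merge over the inferred schema. This is a simplified model, not the crate's code: `apply_override` is a hypothetical name, `ValueError` stands in for `SCHEMA_MISMATCH`, and type strings are matched exactly as written (whether the server accepts other casings is not stated).

```python
import re

SIMPLE_TYPES = {"INT", "BIGINT", "DOUBLE PRECISION", "TEXT", "BOOL", "DATE", "TIMESTAMP"}
NUMERIC_RE = re.compile(r"NUMERIC\(\d+,\d+\)")  # e.g. NUMERIC(38,0)


def apply_override(inferred: dict, override: dict) -> dict:
    """Merge a partial override into an inferred {column: type} mapping."""
    unknown = [col for col in override if col not in inferred]  # case-sensitive
    if unknown:
        raise ValueError(
            f"SCHEMA_MISMATCH: unknown column(s) {unknown}; "
            f"real columns are {list(inferred)}"
        )
    bad = [t for t in override.values()
           if t not in SIMPLE_TYPES and not NUMERIC_RE.fullmatch(t)]
    if bad:
        raise ValueError(f"SCHEMA_MISMATCH: unsupported type(s) {bad}")
    # Iterating over `inferred` keeps the file's column order.
    return {col: override.get(col, typ) for col, typ in inferred.items()}
```

Because the merge iterates over the inferred columns, the key order of the override object has no effect on the resulting table layout.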
730
+ **Recommended workflow for unfamiliar data:**
731
+
732
+ 1. Call `inspect_file` → read the reported `type` + `min` / `max` per column.
733
+ 2. For any column whose `max` exceeds its inferred type's range, or where
734
+ you want stricter parsing than CSV heuristics give, build a partial
735
+ override.
736
+ 3. Pass it to `load_file` / `query_file`.
737
+
738
+ ---
739
+
740
+ ## SQL Dialect
741
+
742
+ Hyper uses the Salesforce Data Cloud SQL dialect (PostgreSQL-compatible with extensions). It supports `SELECT`, JOINs, subqueries, CTEs, window functions, aggregations, DDL, DML, and `COPY FROM`.
743
+
744
+ Full reference: [Data Cloud SQL Reference](https://developer.salesforce.com/docs/data/data-cloud-query-guide/references/dc-sql-reference/data-cloud-sql-context.html)
745
+
746
+ ---
747
+
748
+ ## CLI Reference
749
+
750
+ ```
751
+ hyperdb-mcp [OPTIONS] [COMMAND]
752
+
753
+ Commands:
754
+ daemon Run as a background daemon managing a shared hyperd process
755
+
756
+ Options:
757
+ --persistent-db <PATH> Path to the persistent .hyper file. Defaults to the platform
758
+ data dir (~/Library/Application Support/hyperdb/workspace.hyper
759
+ on macOS, ~/.local/share/hyperdb/workspace.hyper on Linux,
760
+ %APPDATA%\hyperdb\workspace.hyper on Windows). Override via
761
+ the HYPERDB_PERSISTENT_DB env var.
762
+ --ephemeral-only Skip the persistent attachment entirely. Disables save_query
763
+ persistence (queries fall back to session storage).
764
+ --read-only Disable mutating tools (execute, load_data, load_file,
765
+ save_query, delete_query, watch_directory)
766
+ --no-daemon Disable the shared daemon and spawn a private hyperd
767
+
768
+ Deprecated:
769
+ --workspace <PATH> Old name for --persistent-db. Still accepted, emits a
770
+ stderr warning, and will be removed in a future release.
771
+
772
+ Daemon subcommand:
773
+ hyperdb-mcp daemon Start the daemon (usually auto-spawned)
774
+ hyperdb-mcp daemon stop Gracefully stop the running daemon
775
+ hyperdb-mcp daemon status Show running daemon info
776
+ hyperdb-mcp daemon --port <PORT> Override the health/lock port (default 7484)
777
+ hyperdb-mcp daemon --idle-timeout <SECS> Override idle timeout (default 1800 = 30 min)
778
+
779
+ Environment:
780
+ HYPERD_PATH Path to hyperd binary (auto-detected if on PATH)
781
+ HYPERDB_PERSISTENT_DB Override the default persistent-db path
782
+ HYPERDB_STATE_DIR Override daemon state directory (default ~/.hyperdb/)
783
+ HYPERDB_DAEMON_PORT Override daemon health/lock port (default 7484)
784
+ HYPERDB_DAEMON_IDLE_TIMEOUT Override daemon idle timeout in seconds (default 1800)
785
+ ```
786
+
787
+ ---
788
+
789
+ ## Error Handling
790
+
791
+ Every error includes a machine-readable code and a recovery suggestion:
792
+
793
+ | Code | When | Recovery |
794
+ |---|---|---|
795
+ | `HYPERD_NOT_FOUND` | `hyperd` not found | Set `HYPERD_PATH` or install Hyper |
796
+ | `FILE_NOT_FOUND` | File path doesn't exist | Verify the path |
797
+ | `UNSUPPORTED_FORMAT` | Unrecognized file type | Specify `format` explicitly |
798
+ | `SCHEMA_MISMATCH` | Data doesn't match inferred types, numeric overflow (SQLSTATE 22003), or invalid text for target type (SQLSTATE 22P02) | Call `inspect_file` then retry with a partial `schema` override (e.g. `{"population":"BIGINT"}` or `{"id":"TEXT"}`) |
799
+ | `SQL_ERROR` | Invalid SQL | Fix the query |
800
+ | `TABLE_NOT_FOUND` | Table doesn't exist | Use `describe` to list tables |
801
+ | `READ_ONLY_VIOLATION` | Mutating op in read-only mode | Use `query_*` / `inspect_file`, or restart without `--read-only` |
802
+ | `CONNECTION_LOST` | `hyperd` crashed or wire protocol desynchronized | Retry — the server tears down the engine and reconnects on the next call |
803
+
804
+ Server-returned errors include a machine-readable `code`, a `message`, and a
805
+ `suggestion` with concrete retry guidance. The `SCHEMA_MISMATCH` suggestion for
806
+ an overflow names the workflow directly: "call `inspect_file`, then retry with
807
+ a partial schema override", so the LLM does not need to infer the recovery
808
+ steps from the SQLSTATE alone.
809
+
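One way a client might act on these codes is sketched below. It assumes the error has already been decoded into a dict with the `code`, `message`, and `suggestion` fields named above; how that object is wrapped on the wire is not shown in this README, and the action labels are hypothetical.

```python
def next_action(error: dict) -> str:
    """Map a decoded server error to a coarse client-side recovery step."""
    code = error.get("code")
    if code == "CONNECTION_LOST":
        # The server reconnects on the next call, so a plain retry suffices.
        return "retry"
    if code == "SCHEMA_MISMATCH":
        # Recovery per the table: inspect_file, then a partial schema override.
        return "inspect_then_override"
    # Everything else: hand the suggestion text back to the LLM or user.
    return "surface_suggestion"
```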
810
+ ---
811
+
812
+ ## Troubleshooting
813
+
814
+ **Tools not discovered by the client** — Verify the `initialize` response advertises `"capabilities": {"tools": {}}`. Pipe a raw `initialize` JSON-RPC request to the binary to check.
815
+
816
+ **Server registered but tools not callable (Claude Code)** — Add `"mcp__HyperDB__*"` to the `permissions.allow` array in `~/.claude/settings.json`.
817
+
818
+ **hyperd not found** — Set `HYPERD_PATH` in the MCP server's `env` config, or place `hyperd` on your `PATH`.
819
+
820
+ ---
821
+
822
+ ## Related Documentation
823
+
824
+ - **[Main README](../README.md)** — Getting started with the Hyper API
825
+ - **[hyperdb-api](../hyperdb-api/)** — Core Rust API (sync/async connections, inserter, query)
826
+ - **[DEVELOPMENT.md](DEVELOPMENT.md)** — Internal architecture, design decisions, contributor guide
827
+ - **[ROADMAP.md](ROADMAP.md)** — Forward-looking design sketches for features that aren't built yet
828
+ - **[Design Spec](../docs/specs/hyperdb-mcp-design.md)** — Full design document
package/package.json CHANGED
@@ -1,17 +1,18 @@
1
1
  {
2
2
  "name": "hyperdb-mcp",
3
- "version": "0.1.1",
3
+ "version": "0.2.1",
4
4
  "description": "HyperDB MCP server — instant SQL analytics for LLM workflows",
5
5
  "bin": {
6
6
  "hyperdb-mcp": "bin.js"
7
7
  },
8
8
  "optionalDependencies": {
9
- "hyperdb-mcp-darwin-arm64": "0.1.1",
10
- "hyperdb-mcp-linux-x64-gnu": "0.1.1",
11
- "hyperdb-mcp-win32-x64-msvc": "0.1.1"
9
+ "hyperdb-mcp-darwin-arm64": "0.2.1",
10
+ "hyperdb-mcp-linux-x64-gnu": "0.2.1",
11
+ "hyperdb-mcp-win32-x64-msvc": "0.2.1"
12
12
  },
13
13
  "files": [
14
- "bin.js"
14
+ "bin.js",
15
+ "README.md"
15
16
  ],
16
17
  "keywords": [
17
18
  "hyper",