hyperdb-mcp 0.1.3 → 0.2.3

Files changed (2)
  1. package/README.md +191 -17
  2. package/package.json +7 -7
package/README.md CHANGED
@@ -1,5 +1,7 @@
1
1
  # hyperdb-mcp
2
2
 
3
+ > **Note:** This crate was vibe-engineered with heavy use of AI coding assistants. The 0.2.x line may still undergo large breaking changes; the public API won't settle until the 1.0.0 release.
4
+
3
5
  An MCP (Model Context Protocol) server that turns the Hyper columnar database into an instant SQL analytics engine. Data flows in from other MCP plugins or files, lands in Hyper automatically, and becomes queryable with SQL — no setup, no schema files, no database management.
4
6
 
5
7
  Built on the pure-Rust [`hyperdb-api`](../hyperdb-api/) crate for maximum performance: 22M+ rows/sec inserts, 18M+ rows/sec queries, constant memory for billion-row results.
@@ -10,16 +12,31 @@ Built on the pure-Rust [`hyperdb-api`](../hyperdb-api/) crate for maximum perfor
10
12
 
11
13
  LLMs are powerful at reasoning but cannot natively crunch millions of rows. This plugin bridges that gap: another MCP tool produces data, the LLM passes it to `hyperdb-mcp`, Hyper ingests it and makes it SQL-queryable, the LLM runs analytical SQL, and results come back as JSON. Optionally export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (opens directly in **Tableau Desktop**).
12
14
 
15
+ ### Queryable Memory for AI
16
+
17
+ Unlike flat-text memory systems that store blobs and retrieve by similarity search, HyperDB gives LLMs **structured, queryable long-term memory**. The persistent database survives across sessions — anything the LLM stores there can be JOINed, filtered, aggregated, and reasoned over with full SQL in any future conversation.
18
+
19
+ This means an LLM can:
20
+ - **Accumulate knowledge over time** — store reference tables, project decisions, user preferences, learned facts
21
+ - **Cross-reference across sessions** — JOIN today's analysis against historical data from last week
22
+ - **Answer complex recall questions** — "Which projects had budget overruns in Q1?" is a SQL query, not a fuzzy text search
23
+ - **Build on prior work** — load yesterday's cleaned dataset and extend it without re-processing from scratch
24
+ - **Maintain structured context** — store relationship graphs, timelines, or decision logs as proper tables with typed columns
25
+
26
+ The ephemeral database is scratch space (think: a whiteboard). The persistent database is long-term memory (think: a filing cabinet you can query). Multiple AI clients sharing the same daemon see the same persistent data — so Claude Code, Cursor, and VS Code Copilot can all read from and contribute to the same knowledge base.
27
+
13
28
  ---
14
29
 
15
30
  ## Features
16
31
 
17
32
  - **Zero setup** — `HyperProcess` auto-starts the Hyper server
33
+ - **Shared `hyperd` daemon** — one Hyper process per user, shared across all MCP clients (Claude Code, Cursor, VS Code, etc.) for reduced memory overhead and concurrent access to the same persistent databases
34
+ - **Queryable long-term memory** — persistent database survives across sessions; LLMs can store, recall, JOIN, and aggregate structured knowledge over time — not just retrieve text blobs, but reason over them with SQL
18
35
  - **Any data in** — JSON, CSV, Parquet, Arrow IPC, Apache Iceberg; schema inferred or exact
19
36
  - **SQL at scale** — thousands to billions of rows
20
37
  - **Data out** — export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (Tableau Desktop-ready)
21
38
  - **One-shot queries** — `query_file("/tmp/sales.csv", "SELECT ...")` — single call, zero management
22
- - **Persistent workspace** — load multiple tables, JOIN across them, persist across sessions
39
+ - **Cross-session continuity** — load multiple tables, JOIN across them, persist across sessions; pick up exactly where you left off
23
40
  - **Read-only safe mode** — `--read-only` flag for safe deployment
24
41
  - **Schema resources** — auto-discover table schemas via `resources/list`
25
42
  - **Guided prompts** — `analyze-table`, `compare-tables`, `data-quality`, `suggest-queries`
@@ -30,7 +47,7 @@ LLMs are powerful at reasoning but cannot natively crunch millions of rows. This
30
47
  - **Pre-ingest file inspection** — `inspect_file` dry-runs the same inference without touching Hyper so LLMs can build safe schema overrides in one shot
31
48
  - **Partial schema overrides** — supply just the columns you want to correct (e.g. `{"population":"BIGINT"}`) — the rest keep their inferred type
32
49
  - **Rich resource surface** — workspace readme, per-table JSON and CSV samples, and one JSON + one CSV resource per table so LLMs can orient themselves via `resources/list` without any tool calls
33
- - **Saved queries** — register named read-only SQL with `save_query`; each query becomes `hyper://queries/{name}/definition` (metadata) + `hyper://queries/{name}/result` (live re-run). Persisted in `--workspace` mode, session-only otherwise
50
+ - **Saved queries** — register named read-only SQL with `save_query`; each query becomes `hyper://queries/{name}/definition` (metadata) + `hyper://queries/{name}/result` (live re-run). Persisted in the persistent attachment, session-only when `--ephemeral-only`
34
51
  - **Live resource-update notifications** — MCP clients can `resources/subscribe` to any `hyper://...` URI; the server fires `notifications/resources/updated` after every ingest, DDL, watcher event, or saved-query mutation
35
52
 
36
53
  ---
@@ -131,11 +148,12 @@ Or if you built from source:
131
148
  }
132
149
  ```
133
150
 
134
- For a **persistent workspace** (tables survive across sessions), add `"args"`:
151
+ By default, persistent storage lives at the platform data dir (`~/Library/Application Support/hyperdb/workspace.hyper` on macOS, `~/.local/share/hyperdb/workspace.hyper` on Linux, `%APPDATA%\hyperdb\workspace.hyper` on Windows). To use a custom path:
135
152
  ```json
136
- "args": ["--workspace", "/path/to/my-project.hyper"]
153
+ "args": ["--persistent-db", "/path/to/my-project.hyper"]
137
154
  ```
138
- This is still **experimental** and will only work with only one session at a time since the Hyper database is locked by Hyper. Each session is isolated and has its own Hyper instance running. Future work will allow multiple sessions to share the same database but requires work to spin up a shared Hyper instance.
155
+
156
+ Multiple MCP clients can point at the **same** persistent file simultaneously — they all connect through the shared `hyperd` daemon and use Hyper's MVCC transaction isolation. See [Operating Modes](#operating-modes) below.
139
157
 
140
158
  #### Claude Code / AI Suite
141
159
 
@@ -159,6 +177,97 @@ Any tool that supports the MCP stdio transport can use this server. Point it at
159
177
 
160
178
  ---
161
179
 
180
+ ## Operating Modes
181
+
182
+ Each session has **two databases**: an ephemeral primary (scratch space — always created fresh per session, deleted on exit) and a persistent database (queryable long-term memory — stored at the platform-default location or a path you supply, survives indefinitely). Unqualified SQL targets the ephemeral primary; the persistent database is reachable as the `"persistent"` alias.
183
+
184
+ ### Hyper engine
185
+
186
+ | Mode | Flag | Behavior |
187
+ |---|---|---|
188
+ | **Shared daemon** *(default)* | *(none)* | One `hyperd` process per user, shared across all MCP clients. The first client auto-spawns the daemon; subsequent clients discover and reuse it. Idle for 30 minutes → daemon shuts itself down; the next client spawns a fresh one. |
189
+ | **Private hyperd** | `--no-daemon` | Each MCP client spawns its own `hyperd` (legacy behavior, one per session). |
190
+
191
+ The shared daemon is the bigger win for users running multiple AI clients (Claude Code + Cursor + VS Code) — they all share one Hyper engine instead of spawning three.
192
+
193
+ ### Database storage
194
+
195
+ | Mode | Flag | Behavior |
196
+ |---|---|---|
197
+ | **Default** | *(none)* | Ephemeral primary in `$TMPDIR/hyperdb-mcp-<pid>-<n>/scratch.hyper` + persistent attachment at the platform data dir (e.g. `~/Library/Application Support/hyperdb/workspace.hyper` on macOS). |
198
+ | **Custom persistent path** | `--persistent-db <PATH>` | Same as default but the persistent file lives at `<PATH>`. The deprecated `--workspace <PATH>` is accepted as an alias with a stderr warning. |
199
+ | **Ephemeral-only** | `--ephemeral-only` | No persistent attachment; the session has only the ephemeral primary plus any user-attached databases via `attach_database`. Saved queries fall back to in-memory storage and disappear when the session ends. |
200
+
201
+ `HYPERDB_PERSISTENT_DB` overrides the default persistent path the same way `--persistent-db` does.
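
The lookup described above can be sketched in a few lines of Python. This is an illustration only, not the server's code: the function names are hypothetical, and the precedence (CLI flag first, then `HYPERDB_PERSISTENT_DB`, then the platform default) is an assumption, since the README only says that both override the default.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_persistent_db(platform: str, home: str, appdata: str = "") -> str:
    """Return the platform default path as listed in the README."""
    if platform == "darwin":
        return str(PurePosixPath(home) / "Library/Application Support/hyperdb/workspace.hyper")
    if platform == "win32":
        return str(PureWindowsPath(appdata) / "hyperdb" / "workspace.hyper")
    # Everything else is treated as Linux here.
    return str(PurePosixPath(home) / ".local/share/hyperdb/workspace.hyper")


def resolve_persistent_db(flag, env, platform, home, appdata=""):
    """Pick the persistent .hyper path.

    Assumed precedence (not confirmed by the README):
    --persistent-db, then HYPERDB_PERSISTENT_DB, then the platform default.
    """
    if flag:
        return flag
    from_env = env.get("HYPERDB_PERSISTENT_DB")
    if from_env:
        return from_env
    return default_persistent_db(platform, home, appdata)


print(resolve_persistent_db(None, {}, "linux", "/home/ana"))
print(resolve_persistent_db(None, {"HYPERDB_PERSISTENT_DB": "/data/kb.hyper"}, "linux", "/home/ana"))
```

Passing the platform, home directory, and environment in as arguments keeps the sketch deterministic; a real caller would read them from `sys.platform`, `Path.home()`, and `os.environ`.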
202
+
203
+ ### Working with both databases
204
+
205
+ Tool calls default to the ephemeral primary — that's the LLM's scratch space for exploratory work that doesn't need to outlive the session. To store data in long-term memory (the persistent database), there are two ways to reach it:
206
+
207
+ **1. Per-tool `database` parameter** (preferred for ergonomic LLM workflows):
208
+
209
+ ```jsonc
210
+ // Save a useful table to the persistent database
211
+ load_data({ table: "customers", data: "[...]", persist: true })
212
+ // ↑ shorthand for `database: "persistent"`
213
+
214
+ // Query from persistent
215
+ query({ sql: "SELECT * FROM customers", database: "persistent" })
216
+
217
+ // Inspect persistent tables
218
+ describe({ database: "persistent" })
219
+ sample({ table: "customers", database: "persistent" })
220
+ ```
221
+
222
+ The `database` parameter is available on `query`, `execute`, `load_data`, `load_file`, `load_files`, `watch_directory`, `describe`, `sample`, `chart`, `export`, and `set_table_metadata`. The shorthand `persist: true` (sugar for `database: "persistent"`) is available on `load_data`, `load_file`, `load_files`, and `watch_directory`. Pass any user-attached writable alias (created via `attach_database`) to target a custom database.
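
On the wire, an MCP client sends each of these tool calls as a JSON-RPC `tools/call` request. The sketch below shows roughly what the payloads might look like; the `tool_call` helper is hypothetical, and the argument shapes (`data` as a JSON string, `sql` as a plain string for `query`) simply follow the examples in this README rather than a verified schema.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids, unique per request


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` request for the given tool and arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Store a table in the persistent database using the `persist` shorthand.
save = tool_call("load_data", table="customers",
                 data='[{"id": 1, "name": "Ada"}]', persist=True)

# Read it back by naming the database explicitly.
read = tool_call("query", sql="SELECT * FROM customers", database="persistent")

print(json.dumps(read, indent=2))
```

Most MCP client libraries build this envelope for you, so the sketch is mainly useful for reading protocol logs.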
223
+
224
+ (`query_data` and `query_file` are one-shot tools that materialize the inline data into their own temp table and query it — they do not accept a `database` parameter because the data isn't in a persisted database to begin with.)
225
+
226
+ **2. Fully-qualified SQL** (for power users or complex multi-DB joins):
227
+
228
+ ```sql
229
+ -- Read from persistent
230
+ SELECT * FROM "persistent"."public"."customers";
231
+
232
+ -- Write to persistent
233
+ CREATE TABLE "persistent"."public"."revenue_2026" AS
234
+ SELECT region, SUM(amount) FROM scratch_orders GROUP BY region;
235
+ ```
236
+
237
+ **Per-database `_table_catalog`:** every writable database — persistent and any user-attached writable file — gets its own `_table_catalog` lazily seeded on first ingest. MCP-managed metadata (load tool, params, timestamps, prose fields set via `set_table_metadata`) lives alongside the data file, so opening a `.hyper` file later as a primary workspace finds the catalog ready. If you want a pristine `.hyper` file for export with no MCP bookkeeping, run `DROP TABLE "<alias>"."public"."_table_catalog"` once and subsequent sessions opening that file will leave it dropped.
238
+
239
+ **Detach safety:** `detach_database` rejects with `InvalidArgument` if any active watcher targets the alias — call `unwatch_directory` first. This prevents the watcher's pool from silently writing into a now-detached file (or worse, the wrong file if the alias is later re-attached to a different path).
240
+
241
+ ### Daemon management
242
+
243
+ The daemon is normally invisible — it auto-spawns and idle-times-out on its own. For diagnostics:
244
+
245
+ ```bash
246
+ hyperdb-mcp daemon status # Show running daemon (PID, endpoint, started_at, version)
247
+ hyperdb-mcp daemon stop # Gracefully shut down the daemon
248
+ hyperdb-mcp daemon # Run as a daemon explicitly (rarely needed)
249
+ ```
250
+
251
+ State files live at `~/.hyperdb/` by default (override with `HYPERDB_STATE_DIR`).
252
+
253
+ ### Recovery from hyperd crashes
254
+
255
+ The daemon polls `hyperd` every 5 seconds. If the process has exited (crashed, OOM, killed), the daemon spawns a replacement, atomically updates `~/.hyperdb/daemon.json` with the new endpoint, and continues serving clients. Clients see one failed tool call (the request that was in flight when hyperd died); the next tool call transparently reconnects to the new hyperd via the same recovery path used for normal connection drops.
256
+
257
+ If a client itself notices hyperd is unreachable before the next polling tick, it sends a fast-path `REPORT_HYPERD_ERROR` signal to the daemon so the restart kicks off without waiting for the timer.
258
+
259
+ If hyperd repeatedly fails to start (3 attempts within 60 seconds — e.g., misconfigured `HYPERD_PATH`, port exhaustion, broken binary), the daemon shuts itself down and removes the discovery file. The next MCP client to start up will then spawn a fresh daemon, surfacing any persistent failure clearly to the user rather than spinning silently.
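
One way to picture the "3 attempts within 60 seconds" rule is as a sliding-window failure counter. The class below is a hypothetical sketch of that idea, not the daemon's implementation; whether the real daemon uses a sliding window, a fixed window, or a simple counter is not stated in the README.

```python
from collections import deque


class RestartBudget:
    """Track failed hyperd starts and decide when to give up."""

    def __init__(self, max_failures: int = 3, window_secs: float = 60.0):
        self.max_failures = max_failures
        self.window_secs = window_secs
        self._failures = deque()  # timestamps of recent failed starts

    def record_failure(self, now: float) -> bool:
        """Record a failed start at time `now` (seconds).

        Returns True when the number of failures inside the window has
        reached the limit, i.e. the daemon should shut itself down.
        """
        self._failures.append(now)
        # Forget failures that have aged out of the window.
        while now - self._failures[0] > self.window_secs:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures


budget = RestartBudget()
for t in (0.0, 10.0, 20.0):
    give_up = budget.record_failure(t)
print(give_up)
```

With failures at 0, 10, and 20 seconds the third call reports that the budget is exhausted; failures spaced more than a minute apart never accumulate.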
260
+
261
+ **Known limitation:** if hyperd hangs (alive at the OS level but unresponsive to queries), the daemon's polling can't detect it and your tool call may stall indefinitely. The recovery path is `hyperdb-mcp daemon stop` followed by reconnecting from your MCP client.
262
+
263
+ ### Other behavioral flags
264
+
265
+ | Flag | Behavior |
266
+ |---|---|
267
+ | `--read-only` | Disables `execute`, `load_data`, `load_file`, `watch_directory`, `save_query`, `delete_query`, and Hyper-format export. See [Read-Only Mode](#read-only-mode). |
268
+
269
+ ---
270
+
162
271
  ## MCP Tools
163
272
 
164
273
  ### One-Shot Tools
@@ -264,12 +373,27 @@ query(sql: 'SELECT c.name, SUM(o.amount) FROM orders o JOIN customers c ON o.cus
264
373
 
265
374
  #### `execute`
266
375
 
267
- Execute a **mutating** SQL statement: `CREATE TABLE`, `INSERT`, `UPDATE`, `DELETE`, `DROP TABLE`, `ALTER`, `COPY`, etc. Returns the affected row count. Disabled in read-only mode.
376
+ Execute one or more **mutating** SQL statements as an atomic batch: `CREATE TABLE`, `INSERT`, `UPDATE`, `DELETE`, `DROP TABLE`, `ALTER`, `COPY`, etc. `sql` is an array of statements; multi-element batches run inside a transaction (all commit or all roll back). Single-element batches auto-commit, same as a one-off statement. Returns the per-statement affected row counts plus a total. Disabled in read-only mode.
268
377
 
269
378
  ```
270
- execute(sql: 'CREATE TABLE archived_orders AS SELECT * FROM orders WHERE year < 2024')
379
+ // Single statement (auto-commit)
380
+ execute(sql: ['CREATE TABLE archived_orders AS SELECT * FROM orders WHERE year < 2024'])
381
+
382
+ // Atomic upsert — both run or neither runs
383
+ execute(sql: [
384
+ "UPDATE settings SET value = 'dark' WHERE key = 'theme'",
385
+ "INSERT INTO settings (key, value) SELECT 'theme', 'dark' \
386
+ WHERE NOT EXISTS (SELECT 1 FROM settings WHERE key = 'theme')"
387
+ ])
271
388
  ```
272
389
 
390
+ Validation rules enforced before any SQL hits the server:
391
+ - Array must be non-empty; no element may be empty / whitespace-only / comment-only.
392
+ - No element may be read-only — use `query` for SELECT/WITH/EXPLAIN.
393
+ - DDL and DML cannot be mixed in one batch (Hyper aborts mixed transactions with SQLSTATE 0A000).
394
+ - Multi-element all-DDL batches are rejected because Hyper auto-commits CREATE/DROP/ALTER even inside a transaction; issue each DDL in its own `execute` call.
395
+ - Explicit transaction-control statements (`BEGIN` / `COMMIT` / `ROLLBACK` / `SAVEPOINT`) in batch elements are rejected — the tool manages the transaction for you, and a user-issued COMMIT mid-batch would defeat atomicity.
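
A client can mirror these rules in a pre-flight check before calling `execute`. The following Python sketch is an approximation under stated assumptions: it classifies each statement by its first keyword only, strips comments with a simple regex (so comment markers inside string literals would confuse it), and uses hypothetical names throughout. It is not the server's validator.

```python
import re

DDL = {"CREATE", "DROP", "ALTER"}
DML = {"INSERT", "UPDATE", "DELETE"}
READ_ONLY = {"SELECT", "WITH", "EXPLAIN"}
TXN_CONTROL = {"BEGIN", "COMMIT", "ROLLBACK", "SAVEPOINT"}

# Matches `-- line comments` and `/* block comments */`.
_COMMENTS = re.compile(r"--[^\n]*|/\*.*?\*/", re.DOTALL)


def first_keyword(statement: str) -> str:
    """Return the first SQL keyword, upper-cased, or '' if there is none."""
    body = _COMMENTS.sub(" ", statement).strip()
    match = re.match(r"[A-Za-z]+", body)
    return match.group(0).upper() if match else ""


def validate_batch(statements: list[str]) -> None:
    """Raise ValueError if the batch breaks one of the rules listed above."""
    if not statements:
        raise ValueError("batch must be non-empty")
    kinds = set()
    for i, stmt in enumerate(statements):
        keyword = first_keyword(stmt)
        if not keyword:
            raise ValueError(f"statement {i} is empty, whitespace-only, or comment-only")
        if keyword in READ_ONLY:
            raise ValueError(f"statement {i} is read-only; use `query` instead")
        if keyword in TXN_CONTROL:
            raise ValueError(f"statement {i} is transaction control; the tool manages the transaction")
        if keyword in DDL:
            kinds.add("ddl")
        elif keyword in DML:
            kinds.add("dml")
    if kinds == {"ddl", "dml"}:
        raise ValueError("DDL and DML cannot be mixed in one batch")
    if kinds == {"ddl"} and len(statements) > 1:
        raise ValueError("multi-element all-DDL batches are not atomic; send each DDL separately")


validate_batch([
    "UPDATE settings SET value = 'dark' WHERE key = 'theme'",
    "INSERT INTO settings (key, value) VALUES ('lang', 'en')",
])
```

Statements that start with other keywords (for example `COPY`) pass through unclassified here, because the README does not say how the server categorizes them.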
396
+
273
397
  #### `describe`
274
398
 
275
399
  List all workspace tables with their schemas, column types, and row counts.
@@ -343,10 +467,10 @@ Each saved query produces **two** resources:
343
467
  - `hyper://queries/{name}/result` — re-runs the SQL on every read and
344
468
  returns `{ name, result: [...], stats: {...} }`.
345
469
 
346
- **Persistence:** queries saved while `--workspace <path>` is set are
347
- stored in the `_hyperdb_saved_queries` meta-table inside the `.hyper`
348
- file and survive server restarts. In ephemeral workspaces they live only
349
- for the lifetime of the server process.
470
+ **Persistence:** saved queries land in the persistent attachment's
471
+ `_hyperdb_saved_queries` meta-table (`"persistent"."public"."_hyperdb_saved_queries"`)
472
+ and survive server restarts. In `--ephemeral-only` sessions they live
473
+ only for the lifetime of the server process.
350
474
 
351
475
  #### `save_query`
352
476
 
@@ -533,7 +657,7 @@ Four guided analytical workflows registered as MCP **Prompts**.
533
657
  ## Read-Only Mode
534
658
 
535
659
  ```bash
536
- hyperdb-mcp --workspace ~/analytics.hyper --read-only
660
+ hyperdb-mcp --persistent-db ~/analytics.hyper --read-only
537
661
  ```
538
662
 
539
663
  - **Allowed:** `query`, `query_data`, `query_file`, `describe`, `sample`, `inspect_file`, `status`, `export`
@@ -632,6 +756,31 @@ Semantics:
632
756
 
633
757
  Hyper uses the Salesforce Data Cloud SQL dialect (PostgreSQL-compatible with extensions). Supports `SELECT`, JOINs, subqueries, CTEs, window functions, aggregations, DDL, DML, and `COPY FROM`.
634
758
 
759
+ ### Upserts (INSERT or UPDATE)
760
+
761
+ Hyper does **not** support `ON CONFLICT` or `INSERT ... ON DUPLICATE KEY`. Use the `execute` tool's atomic batch shape instead:
762
+
763
+ ```
764
+ execute(sql: [
765
+ "UPDATE settings SET value = 'dark' WHERE key = 'theme'",
766
+ "INSERT INTO settings (key, value) SELECT 'theme', 'dark' \
767
+ WHERE NOT EXISTS (SELECT 1 FROM settings WHERE key = 'theme')"
768
+ ])
769
+ ```
770
+
771
+ Both statements run inside a single Hyper transaction — they commit together or both roll back. No race window between them.
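
To see why the UPDATE-then-conditional-INSERT pair behaves like an upsert, the pattern can be tried against any SQL engine. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in (an assumption for illustration; it does not talk to Hyper or go through the `execute` tool), with both statements inside one transaction.

```python
import sqlite3


def upsert_setting(conn: sqlite3.Connection, key: str, value: str) -> None:
    """Update the row if it exists, otherwise insert it, in one transaction."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute("UPDATE settings SET value = ? WHERE key = ?", (value, key))
        conn.execute(
            "INSERT INTO settings (key, value) SELECT ?, ? "
            "WHERE NOT EXISTS (SELECT 1 FROM settings WHERE key = ?)",
            (key, value, key),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT, value TEXT)")

upsert_setting(conn, "theme", "light")  # no row yet: UPDATE is a no-op, INSERT adds it
upsert_setting(conn, "theme", "dark")   # row exists: UPDATE changes it, INSERT is skipped

rows = conn.execute("SELECT key, value FROM settings").fetchall()
print(rows)
```

After both calls the table holds a single `theme` row with the value `dark`, which is the same outcome the batched `execute` example aims for.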
772
+
773
+ > **Tip:** For file-based upserts (merging updated data from a CSV/JSON file into an existing table), use `load_file` with `mode: "merge"` and a `merge_key` instead of writing manual SQL — it handles the UPDATE/INSERT logic automatically and also auto-adds new columns.
774
+
775
+ ### Transactions
776
+
777
+ The Hyper Rust API supports `BEGIN` / `COMMIT` / `ROLLBACK` plus an RAII `Transaction` guard (see [`docs/TRANSACTIONS.md`](../docs/TRANSACTIONS.md)). The MCP `execute` tool surfaces this as the `sql` array shape: pass multiple statements and they run atomically.
778
+
779
+ Hyper-specific limits worth remembering when batching:
780
+ - **DDL after DML in the same transaction is rejected** with SQLSTATE 0A000. The `execute` tool catches this up front — mixing CREATE/DROP/ALTER with INSERT/UPDATE/DELETE in one batch is rejected with an actionable error.
781
+ - **DDL is auto-committed** even inside a transaction. `execute` rejects multi-element all-DDL batches because the "atomic" promise can't be honored — issue each DDL call as its own one-element array.
782
+ - **After any error inside a transaction**, the connection enters aborted state and only ROLLBACK is accepted next. The `execute` tool handles this for you — on any per-statement failure the wrapper issues ROLLBACK before surfacing the error.
783
+
635
784
  Full reference: [Data Cloud SQL Reference](https://developer.salesforce.com/docs/data/data-cloud-query-guide/references/dc-sql-reference/data-cloud-sql-context.html)
636
785
 
637
786
  ---
@@ -639,15 +788,40 @@ Full reference: [Data Cloud SQL Reference](https://developer.salesforce.com/docs
639
788
  ## CLI Reference
640
789
 
641
790
  ```
642
- hyperdb-mcp [OPTIONS]
791
+ hyperdb-mcp [OPTIONS] [COMMAND]
792
+
793
+ Commands:
794
+ daemon Run as a background daemon managing a shared hyperd process
643
795
 
644
796
  Options:
645
- --workspace <PATH> Path to the `.hyper` workspace file for persistent mode (omit for ephemeral)
646
- --read-only Disable mutating tools (execute, load_data, load_file, save_query, delete_query, watch_directory)
647
- --bare Skip MCP-managed auxiliary tables (`_table_catalog`) and force saved queries into in-memory storage, even with --workspace
797
+ --persistent-db <PATH> Path to the persistent .hyper file. Defaults to the platform
798
+ data dir (~/Library/Application Support/hyperdb/workspace.hyper
799
+ on macOS, ~/.local/share/hyperdb/workspace.hyper on Linux,
800
+ %APPDATA%\hyperdb\workspace.hyper on Windows). Override via
801
+ the HYPERDB_PERSISTENT_DB env var.
802
+ --ephemeral-only Skip the persistent attachment entirely. Disables save_query
803
+ persistence (queries fall back to session storage).
804
+ --read-only Disable mutating tools (execute, load_data, load_file,
805
+ save_query, delete_query, watch_directory)
806
+ --no-daemon Disable the shared daemon and spawn a private hyperd
807
+
808
+ Deprecated:
809
+ --workspace <PATH> Old name for --persistent-db. Still accepted, emits a
810
+ stderr warning, and will be removed in a future release.
811
+
812
+ Daemon subcommand:
813
+ hyperdb-mcp daemon Start the daemon (usually auto-spawned)
814
+ hyperdb-mcp daemon stop Gracefully stop the running daemon
815
+ hyperdb-mcp daemon status Show running daemon info
816
+ hyperdb-mcp daemon --port <PORT> Override the health/lock port (default 7484)
817
+ hyperdb-mcp daemon --idle-timeout <SECS> Override idle timeout (default 1800 = 30 min)
648
818
 
649
819
  Environment:
650
- HYPERD_PATH Path to hyperd binary (auto-detected if on PATH)
820
+ HYPERD_PATH Path to hyperd binary (auto-detected if on PATH)
821
+ HYPERDB_PERSISTENT_DB Override the default persistent-db path
822
+ HYPERDB_STATE_DIR Override daemon state directory (default ~/.hyperdb/)
823
+ HYPERDB_DAEMON_PORT Override daemon health/lock port (default 7484)
824
+ HYPERDB_DAEMON_IDLE_TIMEOUT Override daemon idle timeout in seconds (default 1800)
651
825
  ```
652
826
 
653
827
  ---
package/package.json CHANGED
@@ -1,15 +1,9 @@
1
1
  {
2
2
  "name": "hyperdb-mcp",
3
- "version": "0.1.3",
4
3
  "description": "HyperDB MCP server — instant SQL analytics for LLM workflows",
5
4
  "bin": {
6
5
  "hyperdb-mcp": "bin.js"
7
6
  },
8
- "optionalDependencies": {
9
- "hyperdb-mcp-darwin-arm64": "0.1.3",
10
- "hyperdb-mcp-linux-x64-gnu": "0.1.3",
11
- "hyperdb-mcp-win32-x64-msvc": "0.1.3"
12
- },
13
7
  "files": [
14
8
  "bin.js",
15
9
  "README.md"
@@ -28,11 +22,17 @@
28
22
  },
29
23
  "repository": {
30
24
  "type": "git",
31
- "url": "https://github.com/tableau/hyper-api-rust.git",
25
+ "url": "git+https://github.com/tableau/hyper-api-rust.git",
32
26
  "directory": "hyperdb-mcp"
33
27
  },
34
28
  "license": "MIT OR Apache-2.0",
35
29
  "engines": {
36
30
  "node": ">= 21"
31
+ },
32
+ "version": "0.2.3",
33
+ "optionalDependencies": {
34
+ "hyperdb-mcp-darwin-arm64": "0.2.3",
35
+ "hyperdb-mcp-linux-x64-gnu": "0.2.3",
36
+ "hyperdb-mcp-win32-x64-msvc": "0.2.3"
37
37
  }
38
38
  }