npm - hyperdb-mcp - Versions diffs - 0.1.3 → 0.2.3 - Mend

hyperdb-mcp 0.1.3 → 0.2.3

Files changed (2)

package/README.md +191 -17
package/package.json +7 -7

package/README.md CHANGED

@@ -1,5 +1,7 @@
 # hyperdb-mcp
+> **Note:** This crate was vibe-engineered with heavy use of AI coding assistants. The 0.2.x line may still undergo large breaking changes; the public API won't settle until the 1.0.0 release.
 An MCP (Model Context Protocol) server that turns the Hyper columnar database into an instant SQL analytics engine. Data flows in from other MCP plugins or files, lands in Hyper automatically, and becomes queryable with SQL — no setup, no schema files, no database management.
 Built on the pure-Rust [`hyperdb-api`](../hyperdb-api/) crate for maximum performance: 22M+ rows/sec inserts, 18M+ rows/sec queries, constant memory for billion-row results.
@@ -10,16 +12,31 @@ Built on the pure-Rust [`hyperdb-api`](../hyperdb-api/) crate for maximum perfor
 LLMs are powerful at reasoning but cannot natively crunch millions of rows. This plugin bridges that gap: another MCP tool produces data, the LLM passes it to `hyperdb-mcp`, Hyper ingests it and makes it SQL-queryable, the LLM runs analytical SQL, and results come back as JSON. Optionally export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (opens directly in **Tableau Desktop**).
+### Queryable Memory for AI
+Unlike flat-text memory systems that store blobs and retrieve by similarity search, HyperDB gives LLMs **structured, queryable long-term memory**. The persistent database survives across sessions — anything the LLM stores there can be JOINed, filtered, aggregated, and reasoned over with full SQL in any future conversation.
+This means an LLM can:
+- **Accumulate knowledge over time** — store reference tables, project decisions, user preferences, learned facts
+- **Cross-reference across sessions** — JOIN today's analysis against historical data from last week
+- **Answer complex recall questions** — "Which projects had budget overruns in Q1?" is a SQL query, not a fuzzy text search
+- **Build on prior work** — load yesterday's cleaned dataset and extend it without re-processing from scratch
+- **Maintain structured context** — store relationship graphs, timelines, or decision logs as proper tables with typed columns
+The ephemeral database is scratch space (think: a whiteboard). The persistent database is long-term memory (think: a filing cabinet you can query). Multiple AI clients sharing the same daemon see the same persistent data — so Claude Code, Cursor, and VS Code Copilot can all read from and contribute to the same knowledge base.
 ---
 ## Features
 - **Zero setup** — `HyperProcess` auto-starts the Hyper server
+- **Shared `hyperd` daemon** — one Hyper process per user, shared across all MCP clients (Claude Code, Cursor, VS Code, etc.) for reduced memory overhead and concurrent access to the same persistent databases
+- **Queryable long-term memory** — persistent database survives across sessions; LLMs can store, recall, JOIN, and aggregate structured knowledge over time — not just retrieve text blobs, but reason over them with SQL
 - **Any data in** — JSON, CSV, Parquet, Arrow IPC, Apache Iceberg; schema inferred or exact
 - **SQL at scale** — thousands to billions of rows
 - **Data out** — export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (Tableau Desktop-ready)
 - **One-shot queries** — `query_file("/tmp/sales.csv", "SELECT ...")` — single call, zero management
-- **Persistent workspace** — load multiple tables, JOIN across them, persist across sessions
+- **Cross-session continuity** — load multiple tables, JOIN across them, persist across sessions; pick up exactly where you left off
 - **Read-only safe mode** — `--read-only` flag for safe deployment
 - **Schema resources** — auto-discover table schemas via `resources/list`
 - **Guided prompts** — `analyze-table`, `compare-tables`, `data-quality`, `suggest-queries`
@@ -30,7 +47,7 @@ LLMs are powerful at reasoning but cannot natively crunch millions of rows. This
 - **Pre-ingest file inspection** — `inspect_file` dry-runs the same inference without touching Hyper so LLMs can build safe schema overrides in one shot
 - **Partial schema overrides** — supply just the columns you want to correct (e.g. `{"population":"BIGINT"}`) — the rest keep their inferred type
 - **Rich resource surface** — workspace readme, per-table JSON and CSV samples, and one JSON + one CSV resource per table so LLMs can orient themselves via `resources/list` without any tool calls
-- **Saved queries** — register named read-only SQL with `save_query`; each query becomes `hyper://queries/{name}/definition` (metadata) + `hyper://queries/{name}/result` (live re-run). Persisted in `--workspace` mode, session-only otherwise
+- **Saved queries** — register named read-only SQL with `save_query`; each query becomes `hyper://queries/{name}/definition` (metadata) + `hyper://queries/{name}/result` (live re-run). Persisted in the persistent attachment, session-only when `--ephemeral-only`
 - **Live resource-update notifications** — MCP clients can `resources/subscribe` to any `hyper://...` URI; the server fires `notifications/resources/updated` after every ingest, DDL, watcher event, or saved-query mutation
 ---
@@ -131,11 +148,12 @@ Or if you built from source:
 }
 ```
-For a **persistent workspace** (tables survive across sessions), add `"args"`:
+By default, persistent storage lives at the platform data dir (`~/Library/Application Support/hyperdb/workspace.hyper` on macOS, `~/.local/share/hyperdb/workspace.hyper` on Linux, `%APPDATA%\hyperdb\workspace.hyper` on Windows). To use a custom path:
 ```json
-"args": ["--workspace", "/path/to/my-project.hyper"]
+"args": ["--persistent-db", "/path/to/my-project.hyper"]
 ```
-This is still **experimental** and will only work with only one session at a time since the Hyper database is locked by Hyper. Each session is isolated and has its own Hyper instance running. Future work will allow multiple sessions to share the same database but requires work to spin up a shared Hyper instance.
+Multiple MCP clients can point at the **same** persistent file simultaneously — they all connect through the shared `hyperd` daemon and use Hyper's MVCC transaction isolation. See [Operating Modes](#operating-modes) below.
 #### Claude Code / AI Suite
@@ -159,6 +177,97 @@ Any tool that supports the MCP stdio transport can use this server. Point it at
 ---
+## Operating Modes
+Each session has **two databases**: an ephemeral primary (scratch space — always created fresh per session, deleted on exit) and a persistent database (queryable long-term memory — stored at the platform-default location or a path you supply, survives indefinitely). Unqualified SQL targets the ephemeral primary; the persistent database is reachable as the `"persistent"` alias.
+### Hyper engine
+| Mode | Flag | Behavior |
+|---|---|---|
+| **Shared daemon** *(default)* | *(none)* | One `hyperd` process per user, shared across all MCP clients. The first client auto-spawns the daemon; subsequent clients discover and reuse it. Idle for 30 minutes → daemon shuts itself down; the next client spawns a fresh one. |
+| **Private hyperd** | `--no-daemon` | Each MCP client spawns its own `hyperd` (legacy behavior, one per session). |
+The shared daemon is the bigger win for users running multiple AI clients (Claude Code + Cursor + VS Code) — they all share one Hyper engine instead of spawning three.
+### Database storage
+| Mode | Flag | Behavior |
+|---|---|---|
+| **Default** | *(none)* | Ephemeral primary in `$TMPDIR/hyperdb-mcp-<pid>-<n>/scratch.hyper` + persistent attachment at the platform data dir (e.g. `~/Library/Application Support/hyperdb/workspace.hyper` on macOS). |
+| **Custom persistent path** | `--persistent-db <PATH>` | Same as default but the persistent file lives at `<PATH>`. The deprecated `--workspace <PATH>` is accepted as an alias with a stderr warning. |
+| **Ephemeral-only** | `--ephemeral-only` | No persistent attachment; the session has only the ephemeral primary plus any user-attached databases via `attach_database`. Saved queries fall back to in-memory storage and disappear when the session ends. |
+`HYPERDB_PERSISTENT_DB` overrides the default persistent path the same way `--persistent-db` does.
+### Working with both databases
+Tool calls default to the ephemeral primary — that's the LLM's scratch space for exploratory work that doesn't need to outlive the session. To store data in long-term memory (the persistent database), there are two ways to reach it:
+**1. Per-tool `database` parameter** (preferred for ergonomic LLM workflows):
+```jsonc
+// Save a useful table to the persistent database
+load_data({ table: "customers", data: "[...]", persist: true })
+//   ↑ shorthand for `database: "persistent"`
+// Query from persistent
+query({ sql: "SELECT * FROM customers", database: "persistent" })
+// Inspect persistent tables
+describe({ database: "persistent" })
+sample({ table: "customers", database: "persistent" })
+```
+The `database` parameter is available on `query`, `execute`, `load_data`, `load_file`, `load_files`, `watch_directory`, `describe`, `sample`, `chart`, `export`, and `set_table_metadata`. The shorthand `persist: true` (sugar for `database: "persistent"`) is available on `load_data`, `load_file`, `load_files`, and `watch_directory`. Pass any user-attached writable alias (created via `attach_database`) to target a custom database.
+(`query_data` and `query_file` are one-shot tools that materialize the inline data into their own temp table and query it — they do not accept a `database` parameter because the data isn't in a persisted database to begin with.)
+**2. Fully-qualified SQL** (for power users or complex multi-DB joins):
+```sql
+-- Read from persistent
+SELECT * FROM "persistent"."public"."customers";
+-- Write to persistent
+CREATE TABLE "persistent"."public"."revenue_2026" AS
+  SELECT region, SUM(amount) FROM scratch_orders GROUP BY region;
+```
+**Per-database `_table_catalog`:** every writable database — persistent and any user-attached writable file — gets its own `_table_catalog` lazily seeded on first ingest. MCP-managed metadata (load tool, params, timestamps, prose fields set via `set_table_metadata`) lives alongside the data file, so opening a `.hyper` file later as a primary workspace finds the catalog ready. If you want a pristine `.hyper` file for export with no MCP bookkeeping, run `DROP TABLE "<alias>"."public"."_table_catalog"` once and subsequent sessions opening that file will leave it dropped.
+**Detach safety:** `detach_database` rejects with `InvalidArgument` if any active watcher targets the alias — call `unwatch_directory` first. This prevents the watcher's pool from silently writing into a now-detached file (or worse, the wrong file if the alias is later re-attached to a different path).
+### Daemon management
+The daemon is normally invisible — it auto-spawns and idle-times-out on its own. For diagnostics:
+```bash
+hyperdb-mcp daemon status   # Show running daemon (PID, endpoint, started_at, version)
+hyperdb-mcp daemon stop     # Gracefully shut down the daemon
+hyperdb-mcp daemon          # Run as a daemon explicitly (rarely needed)
+```
+State files live at `~/.hyperdb/` by default (override with `HYPERDB_STATE_DIR`).
+### Recovery from hyperd crashes
+The daemon polls `hyperd` every 5 seconds. If the process has exited (crashed, OOM, killed), the daemon spawns a replacement, atomically updates `~/.hyperdb/daemon.json` with the new endpoint, and continues serving clients. Clients see one failed tool call (the request that was in flight when hyperd died); the next tool call transparently reconnects to the new hyperd via the same recovery path used for normal connection drops.
+If a client itself notices hyperd is unreachable before the next polling tick, it sends a fast-path `REPORT_HYPERD_ERROR` signal to the daemon so the restart kicks off without waiting for the timer.
+If hyperd repeatedly fails to start (3 attempts within 60 seconds — e.g., misconfigured `HYPERD_PATH`, port exhaustion, broken binary), the daemon shuts itself down and removes the discovery file. The next MCP client to start up will then spawn a fresh daemon, surfacing any persistent failure clearly to the user rather than spinning silently.
+**Known limitation:** if hyperd hangs (alive at the OS level but unresponsive to queries), the daemon's polling can't detect it and your tool call may stall indefinitely. The recovery path is `hyperdb-mcp daemon stop` followed by reconnecting from your MCP client.
+### Other behavioral flags
+| Flag | Behavior |
+|---|---|
+| `--read-only` | Disables `execute`, `load_data`, `load_file`, `watch_directory`, `save_query`, `delete_query`, and Hyper-format export. See [Read-Only Mode](#read-only-mode). |
+---
 ## MCP Tools
 ### One-Shot Tools
@@ -264,12 +373,27 @@ query(sql: 'SELECT c.name, SUM(o.amount) FROM orders o JOIN customers c ON o.cus
 #### `execute`
-Execute a **mutating** SQL statement: `CREATE TABLE`, `INSERT`, `UPDATE`, `DELETE`, `DROP TABLE`, `ALTER`, `COPY`, etc. Returns the affected row count. Disabled in read-only mode.
+Execute one or more **mutating** SQL statements as an atomic batch: `CREATE TABLE`, `INSERT`, `UPDATE`, `DELETE`, `DROP TABLE`, `ALTER`, `COPY`, etc. `sql` is an array of statements; multi-element batches run inside a transaction (all commit or all roll back). Single-element batches auto-commit, same as a one-off statement. Returns the per-statement affected row counts plus a total. Disabled in read-only mode.
 ```
-execute(sql: 'CREATE TABLE archived_orders AS SELECT * FROM orders WHERE year < 2024')
+// Single statement (auto-commit)
+execute(sql: ['CREATE TABLE archived_orders AS SELECT * FROM orders WHERE year < 2024'])
+// Atomic upsert — both run or neither runs
+execute(sql: [
+  "UPDATE settings SET value = 'dark' WHERE key = 'theme'",
+  "INSERT INTO settings (key, value) SELECT 'theme', 'dark' \
+     WHERE NOT EXISTS (SELECT 1 FROM settings WHERE key = 'theme')"
+])
 ```
+Validation rules enforced before any SQL hits the server:
+- Array must be non-empty; no element may be empty / whitespace-only / comment-only.
+- No element may be read-only — use `query` for SELECT/WITH/EXPLAIN.
+- DDL and DML cannot be mixed in one batch (Hyper aborts mixed transactions with SQLSTATE 0A000).
+- Multi-element all-DDL batches are rejected because Hyper auto-commits CREATE/DROP/ALTER even inside a transaction; issue each DDL in its own `execute` call.
+- Explicit transaction-control statements (`BEGIN` / `COMMIT` / `ROLLBACK` / `SAVEPOINT`) in batch elements are rejected — the tool manages the transaction for you, and a user-issued COMMIT mid-batch would defeat atomicity.
 #### `describe`
 List all workspace tables with their schemas, column types, and row counts.
@@ -343,10 +467,10 @@ Each saved query produces **two** resources:
 - `hyper://queries/{name}/result` — re-runs the SQL on every read and
   returns `{ name, result: [...], stats: {...} }`.
-**Persistence:** queries saved while `--workspace <path>` is set are
-stored in the `_hyperdb_saved_queries` meta-table inside the `.hyper`
-file and survive server restarts. In ephemeral workspaces they live only
-for the lifetime of the server process.
+**Persistence:** saved queries land in the persistent attachment's
+`_hyperdb_saved_queries` meta-table (`"persistent"."public"."_hyperdb_saved_queries"`)
+and survive server restarts. In `--ephemeral-only` sessions they live
+only for the lifetime of the server process.
 #### `save_query`
@@ -533,7 +657,7 @@ Four guided analytical workflows registered as MCP **Prompts**.
 ## Read-Only Mode
 ```bash
-hyperdb-mcp --workspace ~/analytics.hyper --read-only
+hyperdb-mcp --persistent-db ~/analytics.hyper --read-only
 ```
 - **Allowed:** `query`, `query_data`, `query_file`, `describe`, `sample`, `inspect_file`, `status`, `export`
@@ -632,6 +756,31 @@ Semantics:
 Hyper uses the Salesforce Data Cloud SQL dialect (PostgreSQL-compatible with extensions). Supports `SELECT`, JOINs, subqueries, CTEs, window functions, aggregations, DDL, DML, and `COPY FROM`.
+### Upserts (INSERT or UPDATE)
+Hyper does **not** support `ON CONFLICT` or `INSERT ... ON DUPLICATE KEY`. Use the `execute` tool's atomic batch shape instead:
+```
+execute(sql: [
+  "UPDATE settings SET value = 'dark' WHERE key = 'theme'",
+  "INSERT INTO settings (key, value) SELECT 'theme', 'dark' \
+     WHERE NOT EXISTS (SELECT 1 FROM settings WHERE key = 'theme')"
+])
+```
+Both statements run inside a single Hyper transaction — they commit together or both roll back. No race window between them.
+> **Tip:** For file-based upserts (merging updated data from a CSV/JSON file into an existing table), use `load_file` with `mode: "merge"` and a `merge_key` instead of writing manual SQL — it handles the UPDATE/INSERT logic automatically and also auto-adds new columns.
+### Transactions
+The Hyper Rust API supports `BEGIN` / `COMMIT` / `ROLLBACK` plus an RAII `Transaction` guard (see [`docs/TRANSACTIONS.md`](../docs/TRANSACTIONS.md)). The MCP `execute` tool surfaces this as the `sql` array shape: pass multiple statements and they run atomically.
+Hyper-specific limits worth remembering when batching:
+- **DDL after DML in the same transaction is rejected** with SQLSTATE 0A000. The `execute` tool catches this up front — mixing CREATE/DROP/ALTER with INSERT/UPDATE/DELETE in one batch is rejected with an actionable error.
+- **DDL is auto-committed** even inside a transaction. `execute` rejects multi-element all-DDL batches because the "atomic" promise can't be honored — issue each DDL call as its own one-element array.
+- **After any error inside a transaction**, the connection enters aborted state and only ROLLBACK is accepted next. The `execute` tool handles this for you — on any per-statement failure the wrapper issues ROLLBACK before surfacing the error.
 Full reference: [Data Cloud SQL Reference](https://developer.salesforce.com/docs/data/data-cloud-query-guide/references/dc-sql-reference/data-cloud-sql-context.html)
 ---
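The batch rules the README describes for `execute` (non-empty array, no read-only or transaction-control statements, no mixing of DDL and DML, no multi-statement all-DDL batches) can be sketched as a pre-flight check. This is a minimal illustration only, not the crate's actual validator: the function names `first_keyword` and `validate_batch` are hypothetical, and classifying a statement by its leading keyword is an assumption made for brevity (it does not handle comment markers inside string literals).

```python
import re

# Keyword groups taken from the README text; classification by leading
# keyword is an assumption of this sketch, not the server's real logic.
READ_ONLY = {"SELECT", "WITH", "EXPLAIN"}
DDL = {"CREATE", "DROP", "ALTER"}
TXN_CONTROL = {"BEGIN", "COMMIT", "ROLLBACK", "SAVEPOINT"}


def first_keyword(statement: str) -> str:
    """Return the upper-cased leading keyword, or "" if there is none."""
    without_block = re.sub(r"/\*.*?\*/", " ", statement, flags=re.DOTALL)
    without_line = re.sub(r"--[^\n]*", " ", without_block)
    match = re.match(r"[A-Za-z]+", without_line.strip())
    return match.group(0).upper() if match else ""


def validate_batch(statements: list[str]) -> None:
    """Raise ValueError if the batch breaks one of the documented rules."""
    if not statements:
        raise ValueError("batch must be non-empty")
    keywords = [first_keyword(s) for s in statements]
    for keyword in keywords:
        if keyword == "":
            # Covers empty, whitespace-only, and comment-only elements.
            raise ValueError("statement has no leading SQL keyword")
        if keyword in READ_ONLY:
            raise ValueError("read-only statement; use `query` instead")
        if keyword in TXN_CONTROL:
            raise ValueError("explicit transaction control is not allowed")
    is_ddl = [keyword in DDL for keyword in keywords]
    if any(is_ddl) and not all(is_ddl):
        raise ValueError("DDL and DML cannot be mixed in one batch")
    if len(statements) > 1 and all(is_ddl):
        raise ValueError("issue each DDL statement in its own call")
```

Running the checks before any SQL is sent means a rejected batch never opens a transaction, which matches the README's statement that validation happens before any SQL hits the server.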
@@ -639,15 +788,40 @@ Full reference: [Data Cloud SQL Reference](https://developer.salesforce.com/docs
 ## CLI Reference
 ```
-hyperdb-mcp [OPTIONS]
+hyperdb-mcp [OPTIONS] [COMMAND]
+Commands:
+  daemon                  Run as a background daemon managing a shared hyperd process
 Options:
-  --workspace <PATH>    Path to the `.hyper` workspace file for persistent mode (omit for ephemeral)
-  --read-only           Disable mutating tools (execute, load_data, load_file, save_query, delete_query, watch_directory)
-  --bare                Skip MCP-managed auxiliary tables (`_table_catalog`) and force saved queries into in-memory storage, even with --workspace
+  --persistent-db <PATH>  Path to the persistent .hyper file. Defaults to the platform
+                          data dir (~/Library/Application Support/hyperdb/workspace.hyper
+                          on macOS, ~/.local/share/hyperdb/workspace.hyper on Linux,
+                          %APPDATA%\hyperdb\workspace.hyper on Windows). Override via
+                          the HYPERDB_PERSISTENT_DB env var.
+  --ephemeral-only        Skip the persistent attachment entirely. Disables save_query
+                          persistence (queries fall back to session storage).
+  --read-only             Disable mutating tools (execute, load_data, load_file,
+                          save_query, delete_query, watch_directory)
+  --no-daemon             Disable the shared daemon and spawn a private hyperd
+Deprecated:
+  --workspace <PATH>      Old name for --persistent-db. Still accepted, emits a
+                          stderr warning, and will be removed in a future release.
+Daemon subcommand:
+  hyperdb-mcp daemon                          Start the daemon (usually auto-spawned)
+  hyperdb-mcp daemon stop                     Gracefully stop the running daemon
+  hyperdb-mcp daemon status                   Show running daemon info
+  hyperdb-mcp daemon --port <PORT>            Override the health/lock port (default 7484)
+  hyperdb-mcp daemon --idle-timeout <SECS>    Override idle timeout (default 1800 = 30 min)
 Environment:
-  HYPERD_PATH           Path to hyperd binary (auto-detected if on PATH)
+  HYPERD_PATH                  Path to hyperd binary (auto-detected if on PATH)
+  HYPERDB_PERSISTENT_DB        Override the default persistent-db path
+  HYPERDB_STATE_DIR            Override daemon state directory (default ~/.hyperdb/)
+  HYPERDB_DAEMON_PORT          Override daemon health/lock port (default 7484)
+  HYPERDB_DAEMON_IDLE_TIMEOUT  Override daemon idle timeout in seconds (default 1800)
 ```
 ---

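The CLI reference in the README diff above names three inputs for the persistent database location: the `--persistent-db` flag, the `HYPERDB_PERSISTENT_DB` environment variable, and a per-platform default. A rough sketch of how those might combine is shown here. The function names are hypothetical, and the precedence (flag before environment variable) is an assumption, since the README does not say which wins when both are set. The sketch also ignores any `XDG_DATA_HOME` handling the real binary may perform on Linux.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_persistent_path(platform: str, home: str, appdata: str = "") -> str:
    """Build the documented default path for a given platform name."""
    if platform == "darwin":
        base = PurePosixPath(home) / "Library" / "Application Support"
        return str(base / "hyperdb" / "workspace.hyper")
    if platform == "win32":
        return str(PureWindowsPath(appdata) / "hyperdb" / "workspace.hyper")
    # Treated as the Linux layout for every other platform (assumption).
    base = PurePosixPath(home) / ".local" / "share"
    return str(base / "hyperdb" / "workspace.hyper")


def resolve_persistent_path(cli_path, env, platform, home, appdata="",
                            ephemeral_only=False):
    """Return the persistent file path, or None for --ephemeral-only."""
    if ephemeral_only:
        return None  # no persistent attachment at all
    if cli_path:
        return cli_path  # assumed to take precedence over the env var
    if env.get("HYPERDB_PERSISTENT_DB"):
        return env["HYPERDB_PERSISTENT_DB"]
    return default_persistent_path(platform, home, appdata)
```

For example, `resolve_persistent_path(None, {}, "linux", "/home/me")` yields `/home/me/.local/share/hyperdb/workspace.hyper`, the Linux default listed in the CLI reference.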
package/package.json CHANGED

@@ -1,15 +1,9 @@
 {
   "name": "hyperdb-mcp",
-  "version": "0.1.3",
   "description": "HyperDB MCP server — instant SQL analytics for LLM workflows",
   "bin": {
     "hyperdb-mcp": "bin.js"
   },
-  "optionalDependencies": {
-    "hyperdb-mcp-darwin-arm64": "0.1.3",
-    "hyperdb-mcp-linux-x64-gnu": "0.1.3",
-    "hyperdb-mcp-win32-x64-msvc": "0.1.3"
-  },
   "files": [
     "bin.js",
     "README.md"
@@ -28,11 +22,17 @@
   },
   "repository": {
     "type": "git",
-    "url": "https://github.com/tableau/hyper-api-rust.git",
+    "url": "git+https://github.com/tableau/hyper-api-rust.git",
     "directory": "hyperdb-mcp"
   },
   "license": "MIT OR Apache-2.0",
   "engines": {
     "node": ">= 21"
+  },
+  "version": "0.2.3",
+  "optionalDependencies": {
+    "hyperdb-mcp-darwin-arm64": "0.2.3",
+    "hyperdb-mcp-linux-x64-gnu": "0.2.3",
+    "hyperdb-mcp-win32-x64-msvc": "0.2.3"
   }
 }
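
In the new manifest, the package version and all three platform-specific `optionalDependencies` move to `0.2.3` together. A small check like the one below could confirm that the pins stay in lockstep with the top-level version. It is only a sketch: the helper name `mismatched_platform_pins` is hypothetical, and the requirement that every pin equals the package version is inferred from this diff rather than stated by the project.

```python
import json


def mismatched_platform_pins(manifest: dict) -> dict:
    """Return optionalDependencies whose pin differs from the version."""
    version = manifest.get("version")
    pins = manifest.get("optionalDependencies", {})
    return {name: pin for name, pin in pins.items() if pin != version}


# Fields copied from the 0.2.3 side of the package.json diff.
manifest = json.loads("""
{
  "name": "hyperdb-mcp",
  "version": "0.2.3",
  "optionalDependencies": {
    "hyperdb-mcp-darwin-arm64": "0.2.3",
    "hyperdb-mcp-linux-x64-gnu": "0.2.3",
    "hyperdb-mcp-win32-x64-msvc": "0.2.3"
  }
}
""")

stale = mismatched_platform_pins(manifest)
```

An empty result means every platform package is pinned to the same release as the wrapper package.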