hyperdb-mcp 0.1.1 → 0.2.1
- package/README.md +828 -0
- package/package.json +6 -5
package/README.md
ADDED
|
@@ -0,0 +1,828 @@
|
|
|
1
|
+
# hyperdb-mcp
|
|
2
|
+
|
|
3
|
+
> **Note:** This crate was vibe-engineered with heavy use of AI coding assistants. The 0.1.x line may still undergo large breaking changes; the public API won't settle until the 1.0.0 release.
|
|
4
|
+
|
|
5
|
+
An MCP (Model Context Protocol) server that turns the Hyper columnar database into an instant SQL analytics engine. Data flows in from other MCP plugins or files, lands in Hyper automatically, and becomes queryable with SQL — no setup, no schema files, no database management.
|
|
6
|
+
|
|
7
|
+
Built on the pure-Rust [`hyperdb-api`](../hyperdb-api/) crate for maximum performance: 22M+ rows/sec inserts, 18M+ rows/sec queries, constant memory for billion-row results.
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## Why
|
|
12
|
+
|
|
13
|
+
LLMs are powerful at reasoning but cannot natively crunch millions of rows. This plugin bridges that gap: another MCP tool produces data, the LLM passes it to `hyperdb-mcp`, Hyper ingests it and makes it SQL-queryable, the LLM runs analytical SQL, and results come back as JSON. Optionally export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (opens directly in **Tableau Desktop**).
|
|
14
|
+
|
|
15
|
+
### Queryable Memory for AI
|
|
16
|
+
|
|
17
|
+
Unlike flat-text memory systems that store blobs and retrieve by similarity search, HyperDB gives LLMs **structured, queryable long-term memory**. The persistent database survives across sessions — anything the LLM stores there can be JOINed, filtered, aggregated, and reasoned over with full SQL in any future conversation.
|
|
18
|
+
|
|
19
|
+
This means an LLM can:
|
|
20
|
+
- **Accumulate knowledge over time** — store reference tables, project decisions, user preferences, learned facts
|
|
21
|
+
- **Cross-reference across sessions** — JOIN today's analysis against historical data from last week
|
|
22
|
+
- **Answer complex recall questions** — "Which projects had budget overruns in Q1?" is a SQL query, not a fuzzy text search
|
|
23
|
+
- **Build on prior work** — load yesterday's cleaned dataset and extend it without re-processing from scratch
|
|
24
|
+
- **Maintain structured context** — store relationship graphs, timelines, or decision logs as proper tables with typed columns
|
|
25
|
+
|
|
26
|
+
The ephemeral database is scratch space (think: a whiteboard). The persistent database is long-term memory (think: a filing cabinet you can query). Multiple AI clients sharing the same daemon see the same persistent data — so Claude Code, Cursor, and VS Code Copilot can all read from and contribute to the same knowledge base.
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Features
|
|
31
|
+
|
|
32
|
+
- **Zero setup** — `HyperProcess` auto-starts the Hyper server
|
|
33
|
+
- **Shared `hyperd` daemon** — one Hyper process per user, shared across all MCP clients (Claude Code, Cursor, VS Code, etc.) for reduced memory overhead and concurrent access to the same persistent databases
|
|
34
|
+
- **Queryable long-term memory** — persistent database survives across sessions; LLMs can store, recall, JOIN, and aggregate structured knowledge over time — not just retrieve text blobs, but reason over them with SQL
|
|
35
|
+
- **Any data in** — JSON, CSV, Parquet, Arrow IPC, Apache Iceberg; schema inferred or exact
|
|
36
|
+
- **SQL at scale** — thousands to billions of rows
|
|
37
|
+
- **Data out** — export to CSV, Parquet, Apache Iceberg, Arrow IPC, or `.hyper` (Tableau Desktop-ready)
|
|
38
|
+
- **One-shot queries** — `query_file("/tmp/sales.csv", "SELECT ...")` — single call, zero management
|
|
39
|
+
- **Cross-session continuity** — load multiple tables, JOIN across them, persist across sessions; pick up exactly where you left off
|
|
40
|
+
- **Read-only safe mode** — `--read-only` flag for safe deployment
|
|
41
|
+
- **Schema resources** — auto-discover table schemas via `resources/list`
|
|
42
|
+
- **Guided prompts** — `analyze-table`, `compare-tables`, `data-quality`, `suggest-queries`
|
|
43
|
+
- **Inline charts** — bar/line/scatter/histogram as PNG or SVG
|
|
44
|
+
- **Incremental ingest** — `watch_directory` monitors for `.ready` sentinel files
|
|
45
|
+
- **Performance telemetry** — every response includes throughput stats
|
|
46
|
+
- **Smart schema inference** — exact (Arrow/Parquet), structural (JSON), heuristic (CSV) with full-file numeric widening
|
|
47
|
+
- **Pre-ingest file inspection** — `inspect_file` dry-runs the same inference without touching Hyper so LLMs can build safe schema overrides in one shot
|
|
48
|
+
- **Partial schema overrides** — supply just the columns you want to correct (e.g. `{"population":"BIGINT"}`) — the rest keep their inferred type
|
|
49
|
+
- **Rich resource surface** — workspace readme, per-table JSON and CSV samples, and one JSON + one CSV resource per table so LLMs can orient themselves via `resources/list` without any tool calls
|
|
50
|
+
- **Saved queries** — register named read-only SQL with `save_query`; each query becomes `hyper://queries/{name}/definition` (metadata) + `hyper://queries/{name}/result` (live re-run). Persisted in the persistent attachment, session-only when `--ephemeral-only`
|
|
51
|
+
- **Live resource-update notifications** — MCP clients can `resources/subscribe` to any `hyper://...` URI; the server fires `notifications/resources/updated` after every ingest, DDL, watcher event, or saved-query mutation
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
## Installation
|
|
56
|
+
|
|
57
|
+
### From npm
|
|
58
|
+
|
|
59
|
+
> **Requirement:** Node.js **v21 or later**. Earlier versions ship an
|
|
60
|
+
> older `npx` whose argument parsing is incompatible with the
|
|
61
|
+
> `npx -y hyperdb-mcp` invocation in the MCP config below. If you're
|
|
62
|
+
> on an older Node, see [Upgrading Node.js with nvm](#upgrading-nodejs-with-nvm)
|
|
63
|
+
> below.
|
|
64
|
+
|
|
65
|
+
```bash
|
|
66
|
+
npm install -g hyperdb-mcp
|
|
67
|
+
```
|
|
68
|
+
|
|
69
|
+
The npm package bundles both the `hyperdb-mcp` binary and the `hyperd` database server — no additional setup required.
|
|
70
|
+
|
|
71
|
+
### Upgrading Node.js with nvm
|
|
72
|
+
|
|
73
|
+
`nvm` (Node Version Manager) makes it easy to install and switch between Node.js versions.
|
|
74
|
+
|
|
75
|
+
**macOS / Linux** ([nvm-sh/nvm](https://github.com/nvm-sh/nvm)):
|
|
76
|
+
```bash
|
|
77
|
+
# install nvm if you don't have it
|
|
78
|
+
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
|
|
79
|
+
|
|
80
|
+
# install and use the latest LTS (>= 21)
|
|
81
|
+
nvm install --lts
|
|
82
|
+
nvm use --lts
|
|
83
|
+
node --version # should report v22.x.x or newer
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
**Windows** ([coreybutler/nvm-windows](https://github.com/coreybutler/nvm-windows)): download the installer, then in a new shell:
|
|
87
|
+
```powershell
|
|
88
|
+
nvm install lts
|
|
89
|
+
nvm use lts
|
|
90
|
+
node --version
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
After upgrading, restart your MCP client so it picks up the new Node binary on `PATH`.
|
|
94
|
+
|
|
95
|
+
### Building from Source
|
|
96
|
+
|
|
97
|
+
```bash
|
|
98
|
+
cd hyper-api-rust
|
|
99
|
+
cargo build --release -p hyperdb-mcp
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
The binary is at `target/release/hyperdb-mcp`.
|
|
103
|
+
|
|
104
|
+
When building from source the `hyperd` executable is **not** bundled, so
|
|
105
|
+
you'll need to provide one. The easiest path is the companion
|
|
106
|
+
[`hyperdb-bootstrap`](../hyperdb-bootstrap/) CLI, which downloads a
|
|
107
|
+
matching pinned `hyperd` for your platform:
|
|
108
|
+
|
|
109
|
+
```bash
|
|
110
|
+
cargo install hyperdb-bootstrap
|
|
111
|
+
hyperdb-bootstrap download # installs into ./.hyperd/current/
|
|
112
|
+
export HYPERD_PATH="$PWD/.hyperd/current" # or pass via your MCP config
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
`hyperdb-bootstrap` also has a library API if you'd rather wire the
|
|
116
|
+
download into your own build script — see its
|
|
117
|
+
[README](../hyperdb-bootstrap/README.md). If you already have `hyperd`
|
|
118
|
+
elsewhere (Tableau Hyper API for C++/Python/Java ships one), point
|
|
119
|
+
`HYPERD_PATH` at it or add it to your `PATH`.
|
|
120
|
+
|
|
121
|
+
### MCP Client Configuration
|
|
122
|
+
|
|
123
|
+
Each AI tool reads MCP server config from a different file, but all use the same JSON shape. The base config block, using `npx` (recommended):
|
|
124
|
+
```json
|
|
125
|
+
{
|
|
126
|
+
"mcpServers": {
|
|
127
|
+
"HyperDB": {
|
|
128
|
+
"type": "stdio",
|
|
129
|
+
"command": "npx",
|
|
130
|
+
"args": ["-y", "hyperdb-mcp"]
|
|
131
|
+
}
|
|
132
|
+
}
|
|
133
|
+
}
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
Or if you built from source:
|
|
137
|
+
```json
|
|
138
|
+
{
|
|
139
|
+
"mcpServers": {
|
|
140
|
+
"HyperDB": {
|
|
141
|
+
"type": "stdio",
|
|
142
|
+
"command": "/path/to/hyperdb-mcp",
|
|
143
|
+
"env": {
|
|
144
|
+
"HYPERD_PATH": "/path/to/hyperd"
|
|
145
|
+
}
|
|
146
|
+
}
|
|
147
|
+
}
|
|
148
|
+
}
|
|
149
|
+
```
|
|
150
|
+
|
|
151
|
+
By default, persistent storage lives at the platform data dir (`~/Library/Application Support/hyperdb/workspace.hyper` on macOS, `~/.local/share/hyperdb/workspace.hyper` on Linux, `%APPDATA%\hyperdb\workspace.hyper` on Windows). To use a custom path:
|
|
152
|
+
```json
|
|
153
|
+
"args": ["--persistent-db", "/path/to/my-project.hyper"]
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
Multiple MCP clients can point at the **same** persistent file simultaneously — they all connect through the shared `hyperd` daemon and use Hyper's MVCC transaction isolation. See [Operating Modes](#operating-modes) below.
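Since each client keeps its own config file, it can help to merge the `HyperDB` entry into whatever is already there rather than overwrite the file. The following is a minimal sketch, not part of `hyperdb-mcp`: the helper name `add_hyperdb_server` is hypothetical, and placing `--persistent-db` after the package name in the `npx` argument list is an assumption based on the two snippets above.

```python
import json
from typing import Optional


def add_hyperdb_server(config: dict, persistent_db: Optional[str] = None) -> dict:
    """Return a copy of an MCP client config with the HyperDB entry added."""
    merged = json.loads(json.dumps(config))  # cheap deep copy of JSON-like data
    args = ["-y", "hyperdb-mcp"]
    if persistent_db is not None:
        # Assumption: server flags follow the package name in the npx args.
        args += ["--persistent-db", persistent_db]
    servers = merged.setdefault("mcpServers", {})
    servers["HyperDB"] = {"type": "stdio", "command": "npx", "args": args}
    return merged


# Hypothetical existing config with one unrelated server already registered.
existing = {"mcpServers": {"other": {"type": "stdio", "command": "other-server"}}}
updated = add_hyperdb_server(existing, "/path/to/my-project.hyper")
print(json.dumps(updated, indent=2))
```

The helper leaves other `mcpServers` entries untouched and does not mutate its input, so the result can be written back to the client's config file in one step.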
|
|
157
|
+
|
|
158
|
+
#### Claude Code / AI Suite
|
|
159
|
+
|
|
160
|
+
Create or edit `~/.claude/.mcp.json` (global) or `.mcp.json` in the project root (project-scoped). Use the base config block above.
|
|
161
|
+
|
|
162
|
+
After adding the config:
|
|
163
|
+
1. Start a new Claude Code session. You'll be prompted to approve the server on first use.
|
|
164
|
+
2. **Auto-approve tools (optional):** Add `"mcp__HyperDB__*"` to the `permissions.allow` array in `~/.claude/settings.json`.
|
|
165
|
+
|
|
166
|
+
#### Claude Desktop
|
|
167
|
+
|
|
168
|
+
Edit `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows). Use the base config block above.
|
|
169
|
+
|
|
170
|
+
#### Cursor
|
|
171
|
+
|
|
172
|
+
Edit `~/.cursor/mcp.json` (global) or `.cursor/mcp.json` (project root). Use the base config block above.
|
|
173
|
+
|
|
174
|
+
#### Other MCP Clients
|
|
175
|
+
|
|
176
|
+
Any tool that supports the MCP stdio transport can use this server. Point it at the `hyperdb-mcp` binary and set `HYPERD_PATH` in the environment.
|
|
177
|
+
|
|
178
|
+
---
|
|
179
|
+
|
|
180
|
+
## Operating Modes
|
|
181
|
+
|
|
182
|
+
Each session has **two databases**: an ephemeral primary (scratch space — always created fresh per session, deleted on exit) and a persistent database (queryable long-term memory — stored at the platform-default location or a path you supply, survives indefinitely). Unqualified SQL targets the ephemeral primary; the persistent database is reachable as the `"persistent"` alias.
|
|
183
|
+
|
|
184
|
+
### Hyper engine
|
|
185
|
+
|
|
186
|
+
| Mode | Flag | Behavior |
|
|
187
|
+
|---|---|---|
|
|
188
|
+
| **Shared daemon** *(default)* | *(none)* | One `hyperd` process per user, shared across all MCP clients. The first client auto-spawns the daemon; subsequent clients discover and reuse it. After 30 minutes idle, the daemon shuts itself down; the next client spawns a fresh one. |
|
|
189
|
+
| **Private hyperd** | `--no-daemon` | Each MCP client spawns its own `hyperd` (legacy behavior, one per session). |
|
|
190
|
+
|
|
191
|
+
The shared daemon pays off most for users running multiple AI clients (Claude Code + Cursor + VS Code): they all share one Hyper engine instead of spawning three.
|
|
192
|
+
|
|
193
|
+
### Database storage
|
|
194
|
+
|
|
195
|
+
| Mode | Flag | Behavior |
|
|
196
|
+
|---|---|---|
|
|
197
|
+
| **Default** | *(none)* | Ephemeral primary in `$TMPDIR/hyperdb-mcp-<pid>-<n>/scratch.hyper` + persistent attachment at the platform data dir (e.g. `~/Library/Application Support/hyperdb/workspace.hyper` on macOS). |
|
|
198
|
+
| **Custom persistent path** | `--persistent-db <PATH>` | Same as default but the persistent file lives at `<PATH>`. The deprecated `--workspace <PATH>` is accepted as an alias with a stderr warning. |
|
|
199
|
+
| **Ephemeral-only** | `--ephemeral-only` | No persistent attachment; the session has only the ephemeral primary plus any user-attached databases via `attach_database`. Saved queries fall back to in-memory storage and disappear when the session ends. |
|
|
200
|
+
|
|
201
|
+
`HYPERDB_PERSISTENT_DB` overrides the default persistent path the same way `--persistent-db` does.
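The lookup order for the persistent path can be pictured with a short sketch. This is illustrative only and not the server's code: the function names are hypothetical, the rule that the `--persistent-db` flag wins over the environment variable is an assumption (the text only says both override the default), and every non-macOS, non-Windows platform is treated as Linux.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Mapping, Optional


def default_persistent_db(platform: str, env: Mapping[str, str]) -> str:
    """Platform-default location, mirroring the paths listed in this README."""
    if platform == "darwin":
        return str(PurePosixPath(env["HOME"], "Library", "Application Support",
                                 "hyperdb", "workspace.hyper"))
    if platform == "win32":
        return str(PureWindowsPath(env["APPDATA"], "hyperdb", "workspace.hyper"))
    # Simplification: anything else is treated as Linux.
    return str(PurePosixPath(env["HOME"], ".local", "share",
                             "hyperdb", "workspace.hyper"))


def resolve_persistent_db(cli_path: Optional[str], platform: str,
                          env: Mapping[str, str]) -> str:
    if cli_path:
        return cli_path  # assumption: the explicit flag takes precedence
    from_env = env.get("HYPERDB_PERSISTENT_DB")
    if from_env:
        return from_env
    return default_persistent_db(platform, env)


print(resolve_persistent_db(None, "linux", {"HOME": "/home/ada"}))
```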
|
|
202
|
+
|
|
203
|
+
### Working with both databases
|
|
204
|
+
|
|
205
|
+
Tool calls default to the ephemeral primary, the LLM's scratch space for exploratory work that doesn't need to outlive the session. There are two ways to reach the persistent database (long-term memory) instead:
|
|
206
|
+
|
|
207
|
+
**1. Per-tool `database` parameter** (preferred for ergonomic LLM workflows):
|
|
208
|
+
|
|
209
|
+
```jsonc
|
|
210
|
+
// Save a useful table to the persistent database
|
|
211
|
+
load_data({ table: "customers", data: "[...]", persist: true })
|
|
212
|
+
// ↑ shorthand for `database: "persistent"`
|
|
213
|
+
|
|
214
|
+
// Query from persistent
|
|
215
|
+
query({ sql: "SELECT * FROM customers", database: "persistent" })
|
|
216
|
+
|
|
217
|
+
// Inspect persistent tables
|
|
218
|
+
describe({ database: "persistent" })
|
|
219
|
+
sample({ table: "customers", database: "persistent" })
|
|
220
|
+
```
|
|
221
|
+
|
|
222
|
+
The `database` parameter is available on `query`, `execute`, `load_data`, `load_file`, `load_files`, `watch_directory`, `describe`, `sample`, `chart`, `export`, and `set_table_metadata`. The shorthand `persist: true` (sugar for `database: "persistent"`) is available on `load_data`, `load_file`, `load_files`, and `watch_directory`. Pass any user-attached writable alias (created via `attach_database`) to target a custom database.
|
|
223
|
+
|
|
224
|
+
(`query_data` and `query_file` are one-shot tools: each materializes its input, inline data or a file, into its own temporary table and queries it. They do not accept a `database` parameter because that data isn't stored in a persistent database to begin with.)
|
|
225
|
+
|
|
226
|
+
**2. Fully-qualified SQL** (for power users or complex multi-DB joins):
|
|
227
|
+
|
|
228
|
+
```sql
|
|
229
|
+
-- Read from persistent
|
|
230
|
+
SELECT * FROM "persistent"."public"."customers";
|
|
231
|
+
|
|
232
|
+
-- Write to persistent
|
|
233
|
+
CREATE TABLE "persistent"."public"."revenue_2026" AS
|
|
234
|
+
SELECT region, SUM(amount) FROM scratch_orders GROUP BY region;
|
|
235
|
+
```
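When the fully qualified names above are generated by a program rather than typed by hand, each part needs to be quoted separately. A minimal sketch, assuming Hyper follows the standard SQL rule that an embedded double quote inside a quoted identifier is escaped by doubling it; the helper names are hypothetical:

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified_table(table: str, database: str = "persistent",
                    schema: str = "public") -> str:
    """Build a database.schema.table reference with every part quoted."""
    return ".".join(quote_ident(part) for part in (database, schema, table))


print(qualified_table("customers"))
print(qualified_table("revenue_2026", database="persistent"))
```

Quoting each part on its own keeps a table name that happens to contain a dot or a quote from being read as several identifiers.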
|
|
236
|
+
|
|
237
|
+
**Per-database `_table_catalog`:** every writable database — persistent and any user-attached writable file — gets its own `_table_catalog` lazily seeded on first ingest. MCP-managed metadata (load tool, params, timestamps, prose fields set via `set_table_metadata`) lives alongside the data file, so opening a `.hyper` file later as a primary workspace finds the catalog ready. If you want a pristine `.hyper` file for export with no MCP bookkeeping, run `DROP TABLE "<alias>"."public"."_table_catalog"` once and subsequent sessions opening that file will leave it dropped.
|
|
238
|
+
|
|
239
|
+
**Detach safety:** `detach_database` rejects with `InvalidArgument` if any active watcher targets the alias — call `unwatch_directory` first. This prevents the watcher's pool from silently writing into a now-detached file (or worse, the wrong file if the alias is later re-attached to a different path).
|
|
240
|
+
|
|
241
|
+
### Daemon management
|
|
242
|
+
|
|
243
|
+
The daemon is normally invisible — it auto-spawns and idle-times-out on its own. For diagnostics:
|
|
244
|
+
|
|
245
|
+
```bash
|
|
246
|
+
hyperdb-mcp daemon status # Show running daemon (PID, endpoint, started_at, version)
|
|
247
|
+
hyperdb-mcp daemon stop # Gracefully shut down the daemon
|
|
248
|
+
hyperdb-mcp daemon # Run as a daemon explicitly (rarely needed)
|
|
249
|
+
```
|
|
250
|
+
|
|
251
|
+
State files live at `~/.hyperdb/` by default (override with `HYPERDB_STATE_DIR`).
|
|
252
|
+
|
|
253
|
+
### Recovery from hyperd crashes
|
|
254
|
+
|
|
255
|
+
The daemon polls `hyperd` every 5 seconds. If the process has exited (crashed, OOM, killed), the daemon spawns a replacement, atomically updates `~/.hyperdb/daemon.json` with the new endpoint, and continues serving clients. Clients see one failed tool call (the request that was in flight when hyperd died); the next tool call transparently reconnects to the new hyperd via the same recovery path used for normal connection drops.
|
|
256
|
+
|
|
257
|
+
If a client itself notices hyperd is unreachable before the next polling tick, it sends a fast-path `REPORT_HYPERD_ERROR` signal to the daemon so the restart kicks off without waiting for the timer.
|
|
258
|
+
|
|
259
|
+
If hyperd repeatedly fails to start (3 attempts within 60 seconds — e.g., misconfigured `HYPERD_PATH`, port exhaustion, broken binary), the daemon shuts itself down and removes the discovery file. The next MCP client to start up will then spawn a fresh daemon, surfacing any persistent failure clearly to the user rather than spinning silently.
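The "3 attempts within 60 seconds" rule can be pictured as a sliding-window counter. The sketch below is one possible reading, not the daemon's implementation: the class name `RestartBudget` and the exact window semantics are assumptions.

```python
from collections import deque


class RestartBudget:
    """Track failed hyperd starts inside a sliding time window."""

    def __init__(self, max_attempts: int = 3, window_seconds: float = 60.0) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._failures = deque()  # timestamps of recent failed starts

    def record_failure(self, now: float) -> bool:
        """Record a failed start; return True when the daemon should give up."""
        self._failures.append(now)
        # Drop failures that have fallen out of the window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.max_attempts


budget = RestartBudget()
results = [budget.record_failure(t) for t in (0.0, 10.0, 20.0)]
print(results)  # [False, False, True]
```

Failures that are spread out (say one every 50 seconds) never exhaust the budget, which matches the intent of only giving up on a persistent startup problem.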
|
|
260
|
+
|
|
261
|
+
**Known limitation:** if hyperd hangs (alive at the OS level but unresponsive to queries), the daemon's polling can't detect it and your tool call may stall indefinitely. The recovery path is `hyperdb-mcp daemon stop` followed by reconnecting from your MCP client.
|
|
262
|
+
|
|
263
|
+
### Other behavioral flags
|
|
264
|
+
|
|
265
|
+
| Flag | Behavior |
|
|
266
|
+
|---|---|
|
|
267
|
+
| `--read-only` | Disables `execute`, `load_data`, `load_file`, `watch_directory`, `save_query`, `delete_query`, and Hyper-format export. See [Read-Only Mode](#read-only-mode). |
|
|
268
|
+
|
|
269
|
+
---
|
|
270
|
+
|
|
271
|
+
## MCP Tools
|
|
272
|
+
|
|
273
|
+
### One-Shot Tools
|
|
274
|
+
|
|
275
|
+
#### `query_data`
|
|
276
|
+
|
|
277
|
+
Ingest inline data and run a SQL query in a single call.
|
|
278
|
+
|
|
279
|
+
```
|
|
280
|
+
query_data(data: '[{"region":"West","revenue":1200},...]', sql: 'SELECT region, SUM(revenue) FROM data GROUP BY region')
|
|
281
|
+
```
|
|
282
|
+
|
|
283
|
+
| Parameter | Type | Required | Description |
|
|
284
|
+
|-----------|------|----------|-------------|
|
|
285
|
+
| `data` | string | yes | JSON array of objects, or CSV text |
|
|
286
|
+
| `sql` | string | yes | SQL query to run against the data |
|
|
287
|
+
| `format` | string | no | `"json"` or `"csv"` — auto-detected if omitted |
|
|
288
|
+
| `table_name` | string | no | Table name for use in SQL — defaults to `"data"` |
|
|
289
|
+
| `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
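Because `data` is a string parameter, a client holding rows in memory has to serialize them before the call. A minimal sketch of building the arguments and wrapping them in an MCP `tools/call` request; the helper names are hypothetical, and the JSON-RPC envelope follows the MCP specification rather than anything specific to this server:

```python
import json
from typing import Optional


def query_data_arguments(rows: list, sql: str, table_name: str = "data",
                         schema: Optional[dict] = None) -> dict:
    """Build the arguments object for a query_data call from in-memory rows."""
    args = {"data": json.dumps(rows), "sql": sql, "format": "json"}
    if table_name != "data":
        args["table_name"] = table_name  # only send non-default names
    if schema:
        args["schema"] = schema  # partial column-name -> type map
    return args


def tool_call_request(tool: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap tool arguments in an MCP tools/call JSON-RPC request."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": tool, "arguments": arguments}}


rows = [{"region": "West", "revenue": 1200}, {"region": "East", "revenue": 900}]
args = query_data_arguments(
    rows, "SELECT region, SUM(revenue) FROM data GROUP BY region")
request = tool_call_request("query_data", args)
print(json.dumps(request, indent=2))
```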
|
|
290
|
+
|
|
291
|
+
#### `query_file`
|
|
292
|
+
|
|
293
|
+
Ingest a file and run a SQL query in a single call. Streams from disk — handles files of any size.
|
|
294
|
+
|
|
295
|
+
```
|
|
296
|
+
query_file(path: '/tmp/sales.parquet', sql: 'SELECT * FROM sales ORDER BY amount DESC LIMIT 10')
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
| Parameter | Type | Required | Description |
|
|
300
|
+
|-----------|------|----------|-------------|
|
|
301
|
+
| `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
|
|
302
|
+
| `sql` | string | yes | SQL query to run |
|
|
303
|
+
| `table_name` | string | no | Table name — defaults to filename stem |
|
|
304
|
+
| `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
|
|
305
|
+
|
|
306
|
+
### Workspace Tools
|
|
307
|
+
|
|
308
|
+
#### `load_data`
|
|
309
|
+
|
|
310
|
+
Load inline data into a named workspace table.
|
|
311
|
+
|
|
312
|
+
```
|
|
313
|
+
load_data(table: 'customers', data: '[{"id":1,"name":"Alice"},...]')
|
|
314
|
+
```
|
|
315
|
+
|
|
316
|
+
| Parameter | Type | Required | Description |
|
|
317
|
+
|-----------|------|----------|-------------|
|
|
318
|
+
| `table` | string | yes | Table name |
|
|
319
|
+
| `data` | string | yes | JSON array of objects, or CSV text |
|
|
320
|
+
| `format` | string | no | `"json"` or `"csv"` — auto-detected |
|
|
321
|
+
| `mode` | string | no | `"replace"` (default) or `"append"` |
|
|
322
|
+
| `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
|
|
323
|
+
|
|
324
|
+
#### `load_file`
|
|
325
|
+
|
|
326
|
+
Load a file into a named workspace table.
|
|
327
|
+
|
|
328
|
+
```
|
|
329
|
+
load_file(table: 'orders', path: '/tmp/orders.csv')
|
|
330
|
+
```
|
|
331
|
+
|
|
332
|
+
| Parameter | Type | Required | Description |
|
|
333
|
+
|-----------|------|----------|-------------|
|
|
334
|
+
| `table` | string | yes | Table name |
|
|
335
|
+
| `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
|
|
336
|
+
| `mode` | string | no | `"replace"` (default) or `"append"` |
|
|
337
|
+
| `schema` | object | no | Partial column-name → type map (see [Schema Overrides](#schema-overrides)) |
|
|
338
|
+
|
|
339
|
+
When you're unsure of the right types — or recovering from a previous
|
|
340
|
+
`SCHEMA_MISMATCH` — call [`inspect_file`](#inspect_file) first. It reports the
|
|
341
|
+
exact schema `load_file` would use plus per-column `min` / `max` / `null_count`
|
|
342
|
+
so you can build a minimal, correct override in one shot.
|
|
343
|
+
|
|
344
|
+
#### `load_iceberg`
|
|
345
|
+
|
|
346
|
+
Load an [Apache Iceberg](https://iceberg.apache.org/) table into a named
|
|
347
|
+
workspace table. Pass the absolute path to the Iceberg table root (the
|
|
348
|
+
directory containing `metadata/` and `data/`); hyperd's native Iceberg
|
|
349
|
+
reader derives the schema and resolves the snapshot.
|
|
350
|
+
|
|
351
|
+
```
|
|
352
|
+
load_iceberg(table: 'sales', path: '/lake/warehouse/db/sales')
|
|
353
|
+
```
|
|
354
|
+
|
|
355
|
+
| Parameter | Type | Required | Description |
|
|
356
|
+
|-----------|------|----------|-------------|
|
|
357
|
+
| `table` | string | yes | Target Hyper table name |
|
|
358
|
+
| `path` | string | yes | Absolute path to the Iceberg table root directory |
|
|
359
|
+
| `mode` | string | no | `"replace"` (default) or `"append"` |
|
|
360
|
+
| `metadata_filename` | string | no | Pin a specific snapshot, e.g. `"v2.metadata.json"`. Omit for latest. |
|
|
361
|
+
| `version_as_of` | int | no | Pin a snapshot by version number |
|
|
362
|
+
|
|
363
|
+
Schema overrides are not accepted — hyperd derives the schema from the
|
|
364
|
+
Iceberg table metadata.
|
|
365
|
+
|
|
366
|
+
#### `query`
|
|
367
|
+
|
|
368
|
+
Run a **read-only** SQL query against the workspace. Accepts `SELECT`, `WITH`, `EXPLAIN`, `SHOW`, `VALUES`. For DDL/DML use `execute`.
|
|
369
|
+
|
|
370
|
+
```
|
|
371
|
+
query(sql: 'SELECT c.name, SUM(o.amount) FROM orders o JOIN customers c ON o.customer_id = c.id GROUP BY c.name')
|
|
372
|
+
```
|
|
373
|
+
|
|
374
|
+
#### `execute`
|
|
375
|
+
|
|
376
|
+
Execute a **mutating** SQL statement: `CREATE TABLE`, `INSERT`, `UPDATE`, `DELETE`, `DROP TABLE`, `ALTER`, `COPY`, etc. Returns the affected row count. Disabled in read-only mode.
|
|
377
|
+
|
|
378
|
+
```
|
|
379
|
+
execute(sql: 'CREATE TABLE archived_orders AS SELECT * FROM orders WHERE year < 2024')
|
|
380
|
+
```
|
|
381
|
+
|
|
382
|
+
#### `describe`
|
|
383
|
+
|
|
384
|
+
List all workspace tables with their schemas, column types, and row counts.
|
|
385
|
+
|
|
386
|
+
#### `sample`
|
|
387
|
+
|
|
388
|
+
Return the schema, total row count, and first N rows of a table in a single call.
|
|
389
|
+
|
|
390
|
+
```
|
|
391
|
+
sample(table: 'orders', n: 10)
|
|
392
|
+
```
|
|
393
|
+
|
|
394
|
+
| Parameter | Type | Required | Description |
|
|
395
|
+
|-----------|------|----------|-------------|
|
|
396
|
+
| `table` | string | yes | Table name |
|
|
397
|
+
| `n` | int | no | Rows to return (default: 5, clamped to 1..=100) |
|
|
398
|
+
|
|
399
|
+
### Diagnostics
|
|
400
|
+
|
|
401
|
+
#### `inspect_file`
|
|
402
|
+
|
|
403
|
+
Dry-run schema inference on a CSV, Parquet, or Arrow IPC file **without ingesting
|
|
404
|
+
it**. Returns the exact schema `load_file` / `query_file` would use (including
|
|
405
|
+
the full-file numeric widening pass) plus per-column `min`, `max`, `null_count`,
|
|
406
|
+
and `sample_values`. Nothing is written to Hyper and `hyperd` is not even
|
|
407
|
+
started.
|
|
408
|
+
|
|
409
|
+
Use it **before** `load_file` whenever you are unsure about types, or **after** a
|
|
410
|
+
`SCHEMA_MISMATCH` failure to pick the right widening. The LLM can feed the
|
|
411
|
+
reported `type` + `min` / `max` directly into a partial `schema` override on the
|
|
412
|
+
subsequent `load_file` call.
|
|
413
|
+
|
|
414
|
+
```
|
|
415
|
+
inspect_file(path: '/tmp/owid-population.csv')
|
|
416
|
+
```
|
|
417
|
+
|
|
418
|
+
| Parameter | Type | Required | Description |
|
|
419
|
+
|-----------|------|----------|-------------|
|
|
420
|
+
| `path` | string | yes | Path to CSV / JSON / JSONL / Parquet / Arrow IPC file |
|
|
421
|
+
| `sample_rows` | int | no | Sample values / rows per column (default 5, clamped 1..=50) |
|
|
422
|
+
|
|
423
|
+
Response shape:
|
|
424
|
+
|
|
425
|
+
```json
|
|
426
|
+
{
|
|
427
|
+
"file_format": "csv",
|
|
428
|
+
"row_count": 63000,
|
|
429
|
+
"file_size_bytes": 4831204,
|
|
430
|
+
"columns": [
|
|
431
|
+
{ "name": "Entity", "type": "TEXT", "nullable": true, "null_count": 0, "sample_values": ["Afghanistan", ...] },
|
|
432
|
+
{ "name": "Year", "type": "INT", "nullable": true, "null_count": 0, "min": 1800, "max": 2023, "sample_values": ["1800", ...] },
|
|
433
|
+
{ "name": "Population", "type": "BIGINT", "nullable": true, "null_count": 12, "min": 500, "max": 8002572256, "sample_values": ["4000000", ...] }
|
|
434
|
+
],
|
|
435
|
+
"sample_rows": [ { "Entity": "Afghanistan", "Year": "1800", "Population": "2805829" } ]
|
|
436
|
+
}
|
|
437
|
+
```
|
|
438
|
+
|
|
439
|
+
`sample_values` and `sample_rows` are **always strings**, regardless of the inferred column `type` — they report what the file contains on disk, before any type coercion, so the LLM can compare the raw text against `min` / `max` when building a `schema` override. Use `type` (and `min` / `max`) for the typed view; use `sample_values` for the raw view.
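To illustrate how the typed view can feed a partial `schema` override, here is a minimal sketch that pins to `BIGINT` every numeric column whose reported range does not fit a 32-bit integer. The helper name is hypothetical, and treating `INT` as a 32-bit type is an assumption; the input mirrors the response shape shown above.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def bigint_overrides(inspection: dict) -> dict:
    """Map column name -> "BIGINT" for columns whose min/max exceed 32 bits."""
    overrides = {}
    for column in inspection["columns"]:
        lo, hi = column.get("min"), column.get("max")
        if not isinstance(lo, int) or not isinstance(hi, int):
            continue  # text columns and non-integer ranges are left alone
        if lo < INT32_MIN or hi > INT32_MAX:
            overrides[column["name"]] = "BIGINT"
    return overrides


inspection = {
    "columns": [
        {"name": "Entity", "type": "TEXT", "null_count": 0},
        {"name": "Year", "type": "INT", "min": 1800, "max": 2023},
        {"name": "Population", "type": "BIGINT", "min": 500, "max": 8002572256},
    ]
}
print(bigint_overrides(inspection))  # {'Population': 'BIGINT'}
```

The resulting map can be passed as the `schema` argument of a later `load_file` call; columns it does not mention keep their inferred type.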
|
|
440
|
+
|
|
441
|
+
### Saved Queries
|
|
442
|
+
|
|
443
|
+
Register a named read-only SQL query once; read its live result as many
|
|
444
|
+
times as you like via a resource URI. Useful for dashboard-style recurring
|
|
445
|
+
views and for giving LLMs a stable "bookmark" set of key queries that
|
|
446
|
+
`resources/list` advertises up front.
|
|
447
|
+
|
|
448
|
+
Each saved query produces **two** resources:
|
|
449
|
+
|
|
450
|
+
- `hyper://queries/{name}/definition` — the stored SQL plus metadata
|
|
451
|
+
(description, `created_at`) as JSON.
|
|
452
|
+
- `hyper://queries/{name}/result` — re-runs the SQL on every read and
|
|
453
|
+
returns `{ name, result: [...], stats: {...} }`.
|
|
454
|
+
|
|
455
|
+
**Persistence:** saved queries land in the persistent attachment's
|
|
456
|
+
`_hyperdb_saved_queries` meta-table (`"persistent"."public"."_hyperdb_saved_queries"`)
|
|
457
|
+
and survive server restarts. In `--ephemeral-only` sessions they live
|
|
458
|
+
only for the lifetime of the server process.
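A client that knows a saved query's name can derive both resource URIs and read either one with a standard MCP `resources/read` request. The sketch below is illustrative: the helper names are hypothetical, and percent-encoding the name is an assumption (for simple names such as `top_5_customers` it changes nothing).

```python
from urllib.parse import quote


def saved_query_uris(name: str) -> dict:
    """Return the definition and result URIs for a saved query name."""
    base = "hyper://queries/" + quote(name, safe="")
    return {"definition": base + "/definition", "result": base + "/result"}


def read_request(uri: str, request_id: int = 1) -> dict:
    """Build an MCP resources/read JSON-RPC request for a resource URI."""
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "resources/read", "params": {"uri": uri}}


uris = saved_query_uris("top_5_customers")
print(uris["definition"])
print(read_request(uris["result"]))
```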
|
|
459
|
+
|
|
460
|
+
#### `save_query`
|
|
461
|
+
|
|
462
|
+
```
|
|
463
|
+
save_query(name: 'top_5_customers', sql: 'SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY total DESC LIMIT 5', description: 'Biggest spenders this year')
|
|
464
|
+
```
|
|
465
|
+
|
|
466
|
+
| Parameter | Type | Required | Description |
|
|
467
|
+
|---|---|---|---|
|
|
468
|
+
| `name` | string | yes | Unique identifier used as the URI path component |
|
|
469
|
+
| `sql` | string | yes | Read-only SQL (SELECT / WITH / EXPLAIN / SHOW / VALUES) |
|
|
470
|
+
| `description` | string | no | Human-friendly summary |
|
|
471
|
+
|
|
472
|
+
Duplicate names are rejected with `INVALID_ARGUMENT` — use `delete_query`
|
|
473
|
+
first if you intend to overwrite. Non-read-only SQL is rejected with
|
|
474
|
+
`SQL_ERROR`. Disabled in read-only mode.
|
|
475
|
+
|
|
476
|
+
#### `delete_query`
|
|
477
|
+
|
|
478
|
+
```
|
|
479
|
+
delete_query(name: 'top_5_customers')
|
|
480
|
+
```
|
|
481
|
+
|
|
482
|
+
| Parameter | Type | Required | Description |
|
|
483
|
+
|---|---|---|---|
|
|
484
|
+
| `name` | string | yes | Name of the saved query to remove |
|
|
485
|
+
|
|
486
|
+
Returns `{ "deleted": true }` when the query existed, `{ "deleted": false }`
|
|
487
|
+
when it did not (no error on unknown names). Disabled in read-only mode.
|
|
488
|
+
|
|
489
|
+
### Export Tools
|
|
490
|
+
|
|
491
|
+
#### `export`
|
|
492
|
+
|
|
493
|
+
Write query results or a table to a file.
|
|
494
|
+
|
|
495
|
+
```
|
|
496
|
+
export(table: 'orders', path: '~/Desktop/orders.parquet', format: 'parquet')
|
|
497
|
+
export(sql: 'SELECT ...', path: '~/Desktop/analysis.hyper', format: 'hyper')
|
|
498
|
+
```
|
|
499
|
+
|
|
500
|
+
| Parameter | Type | Required | Description |
|
|
501
|
+
|-----------|------|----------|-------------|
|
|
502
|
+
| `sql` | string | no | Query to export (if omitted, exports whole table) |
|
|
503
|
+
| `table` | string | no | Table name (used if `sql` omitted) |
|
|
504
|
+
| `path` | string | yes | Output file path |
|
|
505
|
+
| `format` | string | yes | `"csv"`, `"parquet"`, `"iceberg"`, `"arrow_ipc"`, or `"hyper"` |
|
|
506
|
+
|
|
507
|
+
The `"hyper"` format produces a `.hyper` file that opens directly in **Tableau Desktop**.
|
|
508
|
+
|
|
509
|
+
### Visualization
|
|
510
|
+
|
|
511
|
+
#### `chart`
|
|
512
|
+
|
|
513
|
+
Render a chart from a SQL query and return it inline as an image.
|
|
514
|
+
|
|
515
|
+
```
|
|
516
|
+
chart(sql: 'SELECT product, SUM(revenue) as total FROM sales GROUP BY product', chart_type: 'bar', x: 'product', y: 'total', title: 'Revenue by Product')
|
|
517
|
+
```
|
|
518
|
+
|
|
519
|
+
| Parameter | Type | Required | Description |
|
|
520
|
+
|-----------|------|----------|-------------|
|
|
521
|
+
| `sql` | string | yes | Read-only SQL query returning the data to plot |
|
|
522
|
+
| `chart_type` | string | yes | `bar`, `line`, `scatter`, or `histogram` |
|
|
523
|
+
| `x` | string | yes* | X-axis column (for histogram, the value column) |
|
|
524
|
+
| `y` | string | yes* | Y-axis column (not required for histogram) |
|
|
525
|
+
| `series` | string | no | Grouping column for multi-series plots |
|
|
526
|
+
| `title` | string | no | Chart title |
|
|
527
|
+
| `format` | string | no | `png` (default) or `svg` |
|
|
528
|
+
| `width` | int | no | Pixels (default 800, clamped 200..4096) |
|
|
529
|
+
| `height` | int | no | Pixels (default 480, clamped 150..4096) |
|
|
530
|
+
| `bins` | int | no | Histogram bins (default 20, clamped 1..500) |
|
|
531
|
+
|
|
532
|
+
Returns an `ImageContent` (base64 PNG or SVG) plus a stats JSON block.
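On the client side, the image payload has to be base64-decoded before it can be saved or displayed. A minimal sketch, assuming the item follows the MCP `ImageContent` shape (`type`, `data`, `mimeType`) and that SVG output is labelled `image/svg+xml`; the helper name and the sample payload are hypothetical:

```python
import base64
import tempfile
from pathlib import Path

EXTENSIONS = {"image/png": ".png", "image/svg+xml": ".svg"}


def save_image_content(item: dict, stem: Path) -> Path:
    """Decode an MCP ImageContent item and write it next to `stem`."""
    if item.get("type") != "image":
        raise ValueError("not an image content item")
    target = stem.with_suffix(EXTENSIONS[item["mimeType"]])
    target.write_bytes(base64.b64decode(item["data"]))
    return target


# Stand-in payload; a real one would come from the chart tool's response.
fake_item = {
    "type": "image",
    "mimeType": "image/svg+xml",
    "data": base64.b64encode(b"<svg xmlns='http://www.w3.org/2000/svg'/>").decode("ascii"),
}
with tempfile.TemporaryDirectory() as tmp:
    written = save_image_content(fake_item, Path(tmp) / "chart")
    saved_name = written.name
    saved_bytes = written.read_bytes()
print(saved_name)  # chart.svg
```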

### Incremental Ingest

#### `watch_directory` / `unwatch_directory`

Monitor a directory for data files and auto-append them to a target table.

```
watch_directory(path: '/tmp/inbox', table: 'events')
unwatch_directory(path: '/tmp/inbox')
```

**Producer protocol (`.ready` sentinel):**

1. Write the data file (e.g. `foo.csv`) and close it.
2. Create a zero-byte companion `foo.csv.ready`; this is the atomic signal.
3. Poll for the absence of `foo.csv.ready` to confirm the watcher is done.

On success, both files are deleted. On failure, both are moved to `failed/` with a `.error` JSON file.
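
The producer side of this protocol can be sketched in a few lines of Python. The function names below are hypothetical, and the sketch assumes that `failed/` is a subdirectory of the watched directory and that failed files keep their original name; neither detail is confirmed above.

```python
import time
from pathlib import Path


def publish(inbox: Path, name: str, payload: str) -> Path:
    """Steps 1-2: write and close the data file, then create the sentinel."""
    data_path = inbox / name
    with open(data_path, "w", encoding="utf-8") as handle:
        handle.write(payload)  # the file is closed when the block exits
    ready_path = inbox / (name + ".ready")
    ready_path.touch()  # zero-byte companion: the "safe to ingest" signal
    return ready_path


def wait_for_watcher(inbox: Path, name: str,
                     timeout_s: float = 30.0, poll_s: float = 0.1) -> str:
    """Step 3: poll until the sentinel disappears, then report the outcome."""
    ready_path = inbox / (name + ".ready")
    deadline = time.monotonic() + timeout_s
    while ready_path.exists():
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_s)
    # Assumption: a failed file is moved to <inbox>/failed/<name>.
    if (inbox / "failed" / name).exists():
        return "failed"
    return "ingested"
```

Creating the sentinel only after the data file is closed is what keeps the watcher from reading a half-written file.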

Key properties:

- **One directory, one table, append mode:** files must match the target schema.
- **Initial sweep:** pre-existing `.ready` files are processed immediately.
- **Read-only mode:** `watch_directory` is blocked; `unwatch_directory` is always allowed.
- **Cleanup:** dropping the server or calling `unwatch_directory` terminates the background thread.

### Utility Tools

#### `status`

Returns plugin health, workspace mode, table count, total rows, disk usage, read-only flag, and active directory watchers with per-watcher stats.

---

## MCP Resources

The server exposes workspace state as MCP **Resources**, discoverable via
`resources/list`. Each resource advertises its own MIME type so clients
can route it appropriately (LLM context vs. file download vs. chart).

| URI | MIME | Content |
|-----|------|---------|
| `hyper://workspace` | `application/json` | Workspace mode, table count, total rows, disk usage |
| `hyper://tables` | `application/json` | Full list of tables with schemas and row counts |
| `hyper://readme` | `text/markdown` | Workspace overview as markdown: table catalog, related resources per table, and tool hints for a cold-started LLM |
| `hyper://tables/{name}/schema` | `application/json` | Columns, types, nullability, and row count for one table |
| `hyper://tables/{name}/sample` | `application/json` | First 5 rows of a table as JSON, with schema |
| `hyper://tables/{name}/csv-sample` | `text/csv` | First 20 rows of a table as CSV, header-first |
| `hyper://queries/{name}/definition` | `application/json` | Stored SQL + metadata for a saved query |
| `hyper://queries/{name}/result` | `application/json` | Live result of a saved query; re-runs on every read |

Resource templates (discoverable via `resources/templates/list`):

- `hyper://tables/{name}/schema`
- `hyper://tables/{name}/sample`
- `hyper://tables/{name}/csv-sample`
- `hyper://queries/{name}/definition`
- `hyper://queries/{name}/result`

The internal `_hyperdb_saved_queries` meta-table used to persist saved
queries is deliberately hidden from `resources/list` and
`hyper://tables`, so callers see only user-visible data tables.

### Resource-update notifications

HyperDB advertises both the `resources.subscribe` and
`resources.listChanged` capabilities in its `initialize` response. Clients
can subscribe to any `hyper://...` URI via `resources/subscribe` and will
then receive `notifications/resources/updated` messages whenever the
server detects a change, without polling.

The server fires **targeted** updates for the URIs affected by each kind
of mutation:

| Trigger | Updated URIs | `notifications/resources/list_changed`? |
|---|---|---|
| `load_data` / `load_file` (replace mode) | `hyper://workspace`, `hyper://tables`, `hyper://readme`, per-table schema + sample + csv-sample | Yes |
| `load_data` / `load_file` (append mode) | Same per-table + summary URIs | No ¹ |
| `watch_directory` ingest of a `.ready` pair | Same per-table + summary URIs | No ¹ |
| `execute` (INSERT / UPDATE / DELETE) | Workspace summary URIs | No |
| `execute` (CREATE / DROP / ALTER / TRUNCATE / RENAME) | Workspace summary URIs | Yes |
| `save_query` | (none per-URI) | Yes (two new `hyper://queries/{name}/...` resources appear) |
| `delete_query` | `hyper://queries/{name}/definition`, `hyper://queries/{name}/result` | Yes (two resources disappear) |

¹ Append-mode ingest (both `load_*` and the watcher) auto-creates the target table when it doesn't exist, but **does not** fire `list_changed` for that creation. Clients that need to discover watcher-created tables should re-read `hyper://tables` after subscribing, or use the per-table `updated` notification as a trigger to refresh their list. Tracked in `DEVELOPMENT.md` as tech debt.
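
The workaround in the footnote can be sketched on the client side as follows. This is an illustrative Python sketch, not part of HyperDB: `subscribe_request` and `uris_to_refresh` are hypothetical helper names, and the message shapes follow the standard MCP JSON-RPC methods `resources/subscribe`, `notifications/resources/updated`, and `notifications/resources/list_changed`.

```python
import json
from typing import Dict, List, Set

TABLE_PREFIX = "hyper://tables/"


def subscribe_request(request_id: int, uri: str) -> str:
    """Build a JSON-RPC `resources/subscribe` request for one URI."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/subscribe",
        "params": {"uri": uri},
    })


def uris_to_refresh(notification: Dict, known_tables: Set[str]) -> List[str]:
    """Decide which resources to re-read after a server notification."""
    method = notification.get("method")
    if method == "notifications/resources/list_changed":
        # A full client would also re-issue `resources/list` here.
        return ["hyper://tables"]
    if method != "notifications/resources/updated":
        return []
    uri = notification["params"]["uri"]
    refresh = [uri]
    if uri.startswith(TABLE_PREFIX):
        table = uri[len(TABLE_PREFIX):].split("/", 1)[0]
        if table not in known_tables:
            # Update for a table we have never seen: it was probably
            # auto-created by an append, so refresh the table list too.
            refresh.append("hyper://tables")
    return refresh
```

Treating an `updated` notification for an unknown table as a cue to re-read `hyper://tables` covers the case where no `list_changed` is fired.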

Notifications are fire-and-forget: send failures (typically due to a
client disconnect) are logged at the `debug` level and the registry
prunes dead peers lazily. This keeps mutation paths fast and free of
back-pressure concerns.

All JSON-typed resources return a pretty-printed object; Markdown and
CSV resources are returned verbatim.

---

## MCP Prompts

Four guided analytical workflows are registered as MCP **Prompts**.

| Prompt | Arguments | What it does |
|--------|-----------|--------------|
| `analyze-table` | `table` | Schema walkthrough, column statistics, data quality flags |
| `compare-tables` | `table_a`, `table_b` | Schema alignment, JOIN key suggestions, analytical opportunities |
| `data-quality` | `table` | Systematic NULL / duplicate / cardinality / outlier checks |
| `suggest-queries` | `table`, `goal?` | 5 analytical SQL queries with explanations, optionally goal-guided |

---

## Read-Only Mode

```bash
hyperdb-mcp --persistent-db ~/analytics.hyper --read-only
```

- **Allowed:** `query`, `query_data`, `query_file`, `describe`, `sample`, `inspect_file`, `status`, `export`
- **Blocked:** `execute`, `load_data`, `load_file`, `watch_directory`, `save_query`, `delete_query` (these return `READ_ONLY_VIOLATION`)
- **Resources, prompts, and resource subscriptions** work normally: read-only clients can still subscribe to `hyper://...` URIs and receive notifications when other (non-read-only) connections mutate state

The `query` tool also enforces read-only at the SQL level: only `SELECT`/`WITH`/`EXPLAIN`/`SHOW`/`VALUES` are accepted.
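
A client can apply the same rule before sending a statement, to avoid a round-trip that would be rejected. The Python sketch below is an assumption about how such a pre-check might look: it inspects only the first keyword after leading whitespace and comments. The server's actual check is not shown in this document and may be stricter, and the function names are hypothetical.

```python
import re

READ_ONLY_KEYWORDS = {"SELECT", "WITH", "EXPLAIN", "SHOW", "VALUES"}

# Leading whitespace, `-- line comments`, and `/* block comments */`.
_LEADING_NOISE = re.compile(r"^(\s+|--[^\n]*\n?|/\*.*?\*/)+", re.DOTALL)


def leading_keyword(sql: str) -> str:
    """Return the first SQL keyword, upper-cased, or '' if there is none."""
    stripped = _LEADING_NOISE.sub("", sql)
    match = re.match(r"[A-Za-z]+", stripped)
    return match.group(0).upper() if match else ""


def looks_read_only(sql: str) -> bool:
    """True when the statement starts with one of the accepted keywords."""
    return leading_keyword(sql) in READ_ONLY_KEYWORDS
```

A first-keyword test is deliberately conservative: a statement that starts with a parenthesis, for example, is reported as not read-only by this sketch even if it is a plain query.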

---

## Data Flow Patterns

- **Small data (LLM relay):** For <10K rows. The LLM gets data from another plugin and passes it inline via `query_data`.
- **Large data (file intermediary):** For thousands to billions of rows. The source plugin exports to a file and the LLM calls `query_file`. Data never enters the LLM context, so memory use stays constant regardless of file size.

---

## Schema Inference

Three tiers, chosen automatically based on the data source:

| Tier | Source | How |
|------|--------|-----|
| **Exact** | Arrow IPC, Parquet | Schema read from file metadata. Types preserved exactly. |
| **Structural** | JSON | All objects scanned. Per-column type widening: Int → BigInt → Double. Mixed types → TEXT. |
| **Heuristic** | CSV | Header row for names, first 1,000 rows sampled for types. A second full-file streaming pass then **widens** numeric columns if needed (INT → BIGINT → NUMERIC(38,0); INT/BIGINT → DOUBLE PRECISION if any later row contains a decimal). |

**JSON file shapes.** `load_file` and `query_file` accept two JSON
representations and auto-detect between them from the first non-whitespace
byte: a top-level JSON array of objects (e.g. `[{...}, {...}]`) or
newline-delimited JSON (JSONL / NDJSON: one JSON object per line, the
format hyperd's own logs use). Blank lines are tolerated. Malformed
JSONL surfaces a `SCHEMA_MISMATCH` error naming the offending line
number.

**Content sniffing for unknown extensions.** Files with extensions the
dispatcher doesn't recognize (`.log`, `.txt`, no extension at all) are
classified by peeking at the first non-whitespace byte: `[` or `{`
routes to JSON, anything else to CSV. This means hyperd's raw `.log`
files load through `load_file` directly, no rename or preprocessing
required. Binary formats (`.parquet`, `.arrow`, `.ipc`, `.feather`,
`.pq`) always win by extension since they're not text-sniffable.
`inspect_file` uses the exact same dispatcher so its report can never
disagree with what `load_file` would do.
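
The first-byte rule is simple enough to sketch in Python. This is an illustration of the documented behavior, not the dispatcher's code: the function names are hypothetical, and mapping `[` to a JSON array and `{` to JSONL is a reading of the two paragraphs above rather than something they state outright.

```python
import os

# Extensions that are routed by name because their content is binary.
BINARY_EXTENSIONS = {".parquet", ".pq", ".arrow", ".ipc", ".feather"}


def wins_by_extension(filename: str) -> bool:
    """True when the file is a binary format identified by its extension."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in BINARY_EXTENSIONS


def sniff_text_format(head: bytes) -> str:
    """Classify a text file from the first non-whitespace byte of its head."""
    first = head.lstrip()[:1]  # bytes.lstrip() drops ASCII whitespace
    if first == b"[":
        return "json-array"
    if first == b"{":
        return "jsonl"
    return "csv"  # anything else, including an empty file
```

For a raw hyperd log, `sniff_text_format` sees `{` as the first byte and routes the file to the JSONL reader without a rename.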

**CSV NULL handling.** Unquoted empty cells (`,,`) load as SQL NULL,
matching PostgreSQL's CSV convention and `inspect_file`'s `null_count`
diagnostics. Quoted empty strings (`,"",`) load as the literal empty
string. This means downstream `WHERE col IS NULL` works directly without
a defensive `OR col = ''` clause.

The full-file CSV widening pass specifically protects against the "big value
hidden at the end of the file" failure mode, e.g. an aggregate row whose
`population` is ~8 billion tucked in after 60,000 country-sized rows. Without
it, the first-pass sample would pick `INT` and the COPY would fail with
`SCHEMA_MISMATCH` / SQLSTATE 22003 mid-ingest.
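
The widening ladder can be illustrated with a short Python sketch that folds one CSV cell at a time into a column type. It is a simplified model under stated assumptions, not the logic in `src/schema.rs`: the names are hypothetical, only numeric columns are handled, and widening `NUMERIC(38,0)` to `DOUBLE PRECISION` on a decimal is an assumption, since the rule above only covers `INT` and `BIGINT`.

```python
INTEGER_LADDER = ["INT", "BIGINT", "NUMERIC(38,0)"]


def integer_type_for(value: int) -> str:
    """Smallest integer type on the ladder that can hold `value`."""
    if -2**31 <= value <= 2**31 - 1:
        return "INT"
    if -2**63 <= value <= 2**63 - 1:
        return "BIGINT"
    return "NUMERIC(38,0)"


def widen(current: str, token: str) -> str:
    """Return the column type after seeing one more cell; never narrows."""
    if token == "" or current == "DOUBLE PRECISION":
        return current  # empty cell is NULL; DOUBLE PRECISION is the top
    try:
        needed = integer_type_for(int(token))
    except ValueError:
        float(token)  # raises ValueError for non-numeric text
        # Assumption: any integer type widens to DOUBLE PRECISION here.
        return "DOUBLE PRECISION"
    # Keep whichever of the two types sits higher on the ladder.
    return max(current, needed, key=INTEGER_LADDER.index)
```

Folding the tokens `1`, `60000`, and `8000000000` through `widen` starting from `INT` ends at `BIGINT`, which is the outcome the second pass is meant to guarantee for the population example.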

For implementation details (widening rules, type mapping tables), see the
module docs in `src/schema.rs` and `src/ingest_arrow.rs`.

### Schema Overrides

Every data-in tool (`query_data`, `query_file`, `load_data`, `load_file`)
accepts an optional `schema` parameter: a **partial** map from column name to
Hyper SQL type.

```json
{ "schema": { "population": "BIGINT", "order_date": "DATE" } }
```

Semantics:

- Keys are matched to columns **by name** (case-sensitive). Column order in
  the JSON object does not need to match the file; the inferred order from
  the file is preserved.
- Columns **not** listed in the override keep their inferred type. You only
  specify the columns you want to correct.
- Unknown column names and unknown type strings are rejected up front with a
  `SCHEMA_MISMATCH` error that lists the real column names, so the LLM can
  self-correct without another round-trip.
- Supported type strings: `INT`, `BIGINT`, `NUMERIC(p,s)` (e.g.
  `NUMERIC(38,0)` or `NUMERIC(12,2)`), `DOUBLE PRECISION`, `TEXT`, `BOOL`,
  `DATE`, `TIMESTAMP`.

**Recommended workflow for unfamiliar data:**

1. Call `inspect_file` → read the reported `type` + `min` / `max` per column.
2. For any column whose `max` exceeds its inferred type's range, or where
   you want stricter parsing than CSV heuristics give, build a partial
   override.
3. Pass it to `load_file` / `query_file`.
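
Step 2 of this workflow might look like the Python sketch below. The shape of the `inspect_file` report is an assumption made for illustration (a list of dicts with `name`, `type`, `min`, and `max` keys); the actual response format is not shown in this document, and `build_override` is a hypothetical helper.

```python
from typing import Dict, List

# Inclusive value ranges of the integer types named in this README.
RANGES = {
    "INT": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
}


def fits(type_name: str, low: int, high: int) -> bool:
    """True when both observed bounds fit in the given integer type."""
    type_low, type_high = RANGES[type_name]
    return type_low <= low and high <= type_high


def build_override(columns: List[dict]) -> Dict[str, Dict[str, str]]:
    """Build a partial `schema` argument for columns that would overflow."""
    schema: Dict[str, str] = {}
    for column in columns:
        low, high = column.get("min"), column.get("max")
        if column["type"] not in RANGES:
            continue  # only integer columns are checked in this sketch
        if not isinstance(low, int) or not isinstance(high, int):
            continue  # no numeric bounds reported for this column
        if fits(column["type"], low, high):
            continue  # inferred type is already wide enough
        schema[column["name"]] = (
            "BIGINT" if fits("BIGINT", low, high) else "NUMERIC(38,0)"
        )
    return {"schema": schema}
```

Only the columns that need correcting end up in the result, which matches the partial-override semantics described above.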

---

## SQL Dialect

Hyper uses the Salesforce Data Cloud SQL dialect (PostgreSQL-compatible with extensions). Supports `SELECT`, JOINs, subqueries, CTEs, window functions, aggregations, DDL, DML, and `COPY FROM`.

Full reference: [Data Cloud SQL Reference](https://developer.salesforce.com/docs/data/data-cloud-query-guide/references/dc-sql-reference/data-cloud-sql-context.html)

---

## CLI Reference

```
hyperdb-mcp [OPTIONS] [COMMAND]

Commands:
  daemon    Run as a background daemon managing a shared hyperd process

Options:
  --persistent-db <PATH>  Path to the persistent .hyper file. Defaults to the platform
                          data dir (~/Library/Application Support/hyperdb/workspace.hyper
                          on macOS, ~/.local/share/hyperdb/workspace.hyper on Linux,
                          %APPDATA%\hyperdb\workspace.hyper on Windows). Override via
                          the HYPERDB_PERSISTENT_DB env var.
  --ephemeral-only        Skip the persistent attachment entirely. Disables save_query
                          persistence (queries fall back to session storage).
  --read-only             Disable mutating tools (execute, load_data, load_file,
                          save_query, delete_query, watch_directory)
  --no-daemon             Disable the shared daemon and spawn a private hyperd

Deprecated:
  --workspace <PATH>      Old name for --persistent-db. Still accepted, emits a
                          stderr warning, and will be removed in a future release.

Daemon subcommand:
  hyperdb-mcp daemon                        Start the daemon (usually auto-spawned)
  hyperdb-mcp daemon stop                   Gracefully stop the running daemon
  hyperdb-mcp daemon status                 Show running daemon info
  hyperdb-mcp daemon --port <PORT>          Override the health/lock port (default 7484)
  hyperdb-mcp daemon --idle-timeout <SECS>  Override idle timeout (default 1800 = 30 min)

Environment:
  HYPERD_PATH                  Path to hyperd binary (auto-detected if on PATH)
  HYPERDB_PERSISTENT_DB        Override the default persistent-db path
  HYPERDB_STATE_DIR            Override daemon state directory (default ~/.hyperdb/)
  HYPERDB_DAEMON_PORT          Override daemon health/lock port (default 7484)
  HYPERDB_DAEMON_IDLE_TIMEOUT  Override daemon idle timeout in seconds (default 1800)
```

---

## Error Handling

Each error code below maps to a specific recovery step:

| Code | When | Recovery |
|---|---|---|
| `HYPERD_NOT_FOUND` | `hyperd` not found | Set `HYPERD_PATH` or install Hyper |
| `FILE_NOT_FOUND` | File path doesn't exist | Verify the path |
| `UNSUPPORTED_FORMAT` | Unrecognized file type | Specify `format` explicitly |
| `SCHEMA_MISMATCH` | Data doesn't match inferred types, numeric overflow (SQLSTATE 22003), or invalid text for target type (SQLSTATE 22P02) | Call `inspect_file` then retry with a partial `schema` override (e.g. `{"population":"BIGINT"}` or `{"id":"TEXT"}`) |
| `SQL_ERROR` | Invalid SQL | Fix the query |
| `TABLE_NOT_FOUND` | Table doesn't exist | Use `describe` to list tables |
| `READ_ONLY_VIOLATION` | Mutating op in read-only mode | Use `query_*` / `inspect_file`, or restart without `--read-only` |
| `CONNECTION_LOST` | `hyperd` crashed or wire protocol desynchronized | Retry; the server tears down the engine and reconnects on the next call |

Server-returned errors include a machine-readable `code`, a `message`, and a
`suggestion` with concrete retry guidance. The `SCHEMA_MISMATCH` suggestion for
an overflow names the workflow directly: "call `inspect_file`, then retry with
a partial schema override", so the LLM does not need to infer the recovery
steps from the SQLSTATE alone.
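
A client that wants to act on these fields programmatically could group the codes by the kind of recovery they call for. The Python sketch below is one possible grouping, offered as an assumption rather than prescribed behavior; the strategy names and `plan_recovery` are hypothetical, while the `code` and `suggestion` field names come from the paragraph above.

```python
from typing import Dict

# The same call can simply be repeated.
AUTO_RETRY = {"CONNECTION_LOST"}

# The caller (typically the LLM) can fix its own input and try again.
RETRY_WITH_CHANGES = {
    "SCHEMA_MISMATCH",
    "SQL_ERROR",
    "TABLE_NOT_FOUND",
    "FILE_NOT_FOUND",
    "UNSUPPORTED_FORMAT",
}


def plan_recovery(error: Dict[str, str]) -> Dict[str, str]:
    """Map a server error to a coarse recovery strategy plus its hint."""
    code = error.get("code", "")
    if code in AUTO_RETRY:
        strategy = "retry_same_call"
    elif code in RETRY_WITH_CHANGES:
        strategy = "retry_with_changes"
    else:
        # Environment problems (HYPERD_NOT_FOUND, READ_ONLY_VIOLATION)
        # and unknown codes need a human or a restart.
        strategy = "ask_user"
    return {"strategy": strategy, "hint": error.get("suggestion", "")}
```

Passing the server's `suggestion` through unchanged keeps the concrete retry guidance available to whoever handles the error next.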

---

## Troubleshooting

**Tools not discovered by the client:** Verify the `initialize` response advertises `"capabilities": {"tools": {}}`. Pipe a raw `initialize` JSON-RPC request to the binary to check.
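
A raw `initialize` request for this check can be generated with a few lines of Python. The sketch assumes the standard MCP stdio framing (one JSON message per line) and uses `2024-11-05` as the protocol version and `capability-probe` as a placeholder client name; both are illustrative choices, not values taken from this project.

```python
import json


def initialize_request(request_id: int = 1) -> str:
    """Build a newline-terminated MCP `initialize` request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumed protocol revision
            "capabilities": {},
            "clientInfo": {"name": "capability-probe", "version": "0.0.0"},
        },
    }) + "\n"


def advertises_tools(response_line: str) -> bool:
    """Check one response line for a `tools` entry under `capabilities`."""
    response = json.loads(response_line)
    capabilities = response.get("result", {}).get("capabilities", {})
    return "tools" in capabilities
```

Printing `initialize_request()` and piping it into the server's stdin should yield a single response line that `advertises_tools` can inspect.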

**Server registered but tools not callable (Claude Code):** Add `"mcp__HyperDB__*"` to the `permissions.allow` array in `~/.claude/settings.json`.

**hyperd not found:** Set `HYPERD_PATH` in the MCP server's `env` config, or place `hyperd` on your `PATH`.

---

## Related Documentation

- **[Main README](../README.md)**: Getting started with the Hyper API
- **[hyperdb-api](../hyperdb-api/)**: Core Rust API (sync/async connections, inserter, query)
- **[DEVELOPMENT.md](DEVELOPMENT.md)**: Internal architecture, design decisions, contributor guide
- **[ROADMAP.md](ROADMAP.md)**: Forward-looking design sketches for features that aren't built yet
- **[Design Spec](../docs/specs/hyperdb-mcp-design.md)**: Full design document

package/package.json
CHANGED
@@ -1,17 +1,18 @@
 {
 "name": "hyperdb-mcp",
-"version": "0.
+"version": "0.2.1",
 "description": "HyperDB MCP server — instant SQL analytics for LLM workflows",
 "bin": {
 "hyperdb-mcp": "bin.js"
 },
 "optionalDependencies": {
-"hyperdb-mcp-darwin-arm64": "0.
-"hyperdb-mcp-linux-x64-gnu": "0.
-"hyperdb-mcp-win32-x64-msvc": "0.
+"hyperdb-mcp-darwin-arm64": "0.2.1",
+"hyperdb-mcp-linux-x64-gnu": "0.2.1",
+"hyperdb-mcp-win32-x64-msvc": "0.2.1"
 },
 "files": [
-"bin.js"
+"bin.js",
+"README.md"
 ],
 "keywords": [
 "hyper",
|