@deepsql/mcp 0.10.1 → 0.11.0
- package/package.json +3 -3
- package/src/cli.js +418 -127
- package/src/cli.test.js +81 -1
- package/src/commands/indexes.js +306 -0
- package/src/commands/indexes.test.js +298 -0
- package/CLAUDE.md +0 -330
package/CLAUDE.md
DELETED
@@ -1,330 +0,0 @@
# DeepSQL — guidance for AI agents (Claude Code / Cursor / Codex / etc.)

> Read this file before invoking DeepSQL tools. It exists because there are
> **two surfaces** (MCP tools + CLI commands) that look similar and agents
> tend to pick the wrong one or use the right one inefficiently.

DeepSQL is an autonomous database performance assistant. It exposes itself
to AI agents in two ways:

| Surface | Where it lives | When you use it |
|---|---|---|
| **MCP tools** (programmatic) | The stdio server you connected via `deepsql mcp` | Default. You're an MCP client and these are first-class tool calls. |
| **CLI** (`deepsql` binary) | The user's `$PATH`, invoked via Bash | Only when an MCP tool can't do it (admin ops, auth, multi-step user flows) **or** the user explicitly asked you to "run `deepsql ...`". |

If you have both available, **prefer MCP tools.** They're structured, typed,
faster, and don't depend on the user's shell environment.

---

## Decision tree — "I want to..."

```
1. Find which databases I can query
   → list_connections          (returns id, name, type for each)

2. Understand a database's structure
   → get_brain_context         (RAG retrieval — best when you have a
                                natural-language question; returns ranked
                                tables/columns/FKs/docs/business rules)
   → get_schema                (full deterministic schema dump — when you
                                need every table; expensive on large DBs)
   → get_database_objects      (tables/views/functions/procedures only)

3. Answer a business question about data
   → get_brain_context first   (retrieves what tables hold what, FK edges,
                                business rules; gives you grounded context)
   → then construct SQL yourself, then:
   → explain_readonly_sql      (validate the plan)
   → execute_readonly_sql      (run it)

4. Find inferred relationships between tables
   → get_relationships         (returns FK candidates with confidence scores)

5. Read business rules / data-access policies
   → list_business_rules       (active rules + guardrails for a connection;
                                pass `question` to scope to relevant ones)

6. Find anti-patterns
   → get_anti_patterns kind=table    (schema/structural anti-patterns)
   → get_anti_patterns kind=query    (slow/expensive query patterns)

7. Investigate slow queries
   → analyze_slow_queries      (recent slow queries with fingerprints + ms)

8. Run SQL to inspect data
   → execute_readonly_sql      (read-only — backend rejects mutations)
```

**Rule of thumb for question-answering:** start with `get_brain_context`,
not `execute_readonly_sql`. The brain context tells you which tables matter,
their FKs, what the columns mean, and what business rules apply. Skipping
straight to SQL is how you write queries against the wrong tables.

---

## MCP tool reference (10 tools)

Every tool requires a `connectionId` (string UUID) **except** `list_connections`.
Always call `list_connections` first if you don't already know the ID.

### Discovery

#### `list_connections`
- **Args:** none
- **Returns:** array of `{ id, connectionName, databaseType, ... }`
- **Use when:** the user mentions a DB by name and you need its ID, or you
  don't know which DBs are available.

#### `get_schema(connectionId)`
- **Returns:** the cached schema metadata for the whole DB.
- **Use when:** you need an exhaustive listing. **Avoid** when the DB is
  large (hundreds of tables) — `get_brain_context` ranks the relevant
  subset much faster.

#### `get_database_objects(connectionId)`
- **Returns:** tables, views, functions, procedures.
- **Use when:** the user asks "what views/functions exist?" — narrower
  than `get_schema`.

### RAG / brain retrieval (preferred for question-answering)

#### `get_brain_context(connectionId, question, topK?)`
- **Args:**
  - `question` — natural-language question used for retrieval ranking
  - `topK` (optional, 1–100) — when provided, returns ranked diagnostic
    snippets (good for "show me the top 5 most relevant tables"). When
    omitted, returns the rich training-context payload (tables + columns
    + FKs + business rules + docs assembled for prompt-grounding).
- **Use when:** the user asks any analytical question. This is the cheapest
  way to ground yourself before generating SQL.
- **Output:** typically includes `trainingContext` (text block ready to feed
  into your own context window) plus structured ranked results.
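The two modes of `get_brain_context` differ only in whether `topK` is sent. A minimal sketch of building the argument object follows. The argument names `connectionId`, `question`, and `topK` and the 1–100 range come from the reference above; the helper name `brain_context_args` and the non-empty-question check are assumptions for illustration, and how your MCP client actually sends the call is not shown.

```python
from typing import Any, Dict, Optional


def brain_context_args(
    connection_id: str, question: str, top_k: Optional[int] = None
) -> Dict[str, Any]:
    """Build the arguments for a get_brain_context call (hypothetical helper)."""
    if not question.strip():
        # Assumption: an empty question cannot be ranked, so fail early.
        raise ValueError("question must be a non-empty string")
    args: Dict[str, Any] = {"connectionId": connection_id, "question": question}
    if top_k is not None:
        # The reference documents topK as optional, in the range 1-100.
        if not 1 <= top_k <= 100:
            raise ValueError("topK must be between 1 and 100")
        args["topK"] = top_k  # ranked-snippet mode
    return args  # without topK: training-context mode
```

Leaving `top_k` as `None` omits the key entirely, which the reference describes as the mode that returns the full training-context payload.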

#### `list_business_rules(connectionId, question?)`
- **Returns:** `activeRules` + `applicableGuardrails` + `guardrailContext`.
- **Use when:** before generating SQL that touches sensitive entities. If
  the rules say "PII columns are blocked," respect that in your output.
  Pass `question` to filter to rules applicable to the user's intent.

#### `get_relationships(connectionId)`
- **Returns:** array of `{ sourceTable, sourceColumn, targetTable, targetColumn, confidence, inferenceMethod, validationStatus }`.
- **Use when:** writing JOINs and the actual FK constraint isn't declared
  in the schema. Anything with `confidence >= 0.8` is safe; lower confidence
  means the link was inferred from naming patterns or data — verify with the user.
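The confidence rule above can be applied mechanically before writing JOINs. This is a minimal sketch that assumes each relationship is a dict with the field names listed in the **Returns** line; the helper names `split_by_confidence` and `join_condition` are hypothetical, and the sample rows in the usage note are invented.

```python
from typing import Any, Dict, List, Tuple

Relationship = Dict[str, Any]


def split_by_confidence(
    relationships: List[Relationship], threshold: float = 0.8
) -> Tuple[List[Relationship], List[Relationship]]:
    """Separate relationships that can be used directly from ones to verify."""
    trusted = [r for r in relationships if r.get("confidence", 0.0) >= threshold]
    needs_review = [r for r in relationships if r.get("confidence", 0.0) < threshold]
    return trusted, needs_review


def join_condition(rel: Relationship) -> str:
    """Render one relationship as a JOIN ... ON condition."""
    return (
        f'{rel["sourceTable"]}.{rel["sourceColumn"]} = '
        f'{rel["targetTable"]}.{rel["targetColumn"]}'
    )
```

For example, a relationship from `orders.customer_id` to `customers.id` with confidence 0.93 lands in `trusted`, while one with confidence 0.55 lands in `needs_review` and should be confirmed with the user before it appears in a JOIN.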

#### `get_anti_patterns(connectionId, kind?, limit?)`
- **`kind="table"` (default):** schema-level anti-patterns (missing
  indexes, wide tables, etc.).
- **`kind="query"`:** query-level patterns; pass `limit` (1–500).
- **Use when:** the user asks "what's wrong with this DB?" or you've
  generated a query and want to sanity-check it.

### Operations

#### `analyze_slow_queries(connectionId, thresholdMs?, limit?)`
- **Args:** `thresholdMs` defaults to 100, `limit` defaults to 10.
- **Returns:** recent slow queries from `pg_stat_statements` with
  fingerprints, durations, example statements.
- **Use when:** the user asks "what's slow?" or you're triaging a
  performance incident.

### Execution

#### `execute_readonly_sql(connectionId, query, limit?, timeoutSeconds?)`
- **Read-only enforced at four layers:** client SQL parser, backend SQL
  parser, per-connection ACL on the calling user's token, and the DB
  role itself usually only has SELECT/EXPLAIN. Mutations (INSERT, UPDATE,
  DELETE, DDL, etc.) are rejected — **don't try to work around this**.
- **Multi-statement SQL is rejected** in phase 1. Send one statement.
- **Defaults:** 100-row `limit`, backend default `timeoutSeconds`.
- **Use when:** you've grounded yourself with `get_brain_context` and need
  to fetch concrete numbers.
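Since mutations and multi-statement SQL are rejected anyway, an agent can catch the obvious cases before spending a tool call. The sketch below is an illustrative pre-check only, not DeepSQL's parser: the function name `precheck_readonly` is hypothetical, the keyword list is taken from the statements this document names, and the `;` test is deliberately naive (it also flags a semicolon inside a string literal). The backend remains the authority.

```python
import re
from typing import List

# Statement types this document says are rejected.
MUTATING_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "CREATE", "DROP", "ALTER", "TRUNCATE",
}


def precheck_readonly(query: str) -> List[str]:
    """Return a list of problems; an empty list means the cheap checks passed."""
    body = query.strip().rstrip(";").strip()
    if not body:
        return ["query is empty"]
    problems: List[str] = []
    if ";" in body:
        # Naive: a ';' inside a quoted literal triggers this too.
        problems.append("multiple statements (or a ';' inside a literal)")
    first_word = re.split(r"\s+", body, maxsplit=1)[0].upper()
    if first_word in MUTATING_KEYWORDS:
        problems.append(f"{first_word} is a mutation")
    return problems
```

An empty result does not mean the query will be accepted; it only means none of these cheap checks fired.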

#### `explain_readonly_sql(connectionId, query)`
- **Don't include `EXPLAIN` in the query string** — the tool wraps it.
  `ANALYZE` is also rejected (read-only).
- **Use when:** you want to validate a plan before running it, or you're
  diagnosing why a query is slow.

---

## CLI commands (run via Bash, only when MCP isn't enough)

The CLI exposes the same data plane plus admin operations the MCP server
deliberately doesn't expose. **Only run CLI commands when the user explicitly
asks you to**, or when an MCP tool can't do what's needed (admin, auth,
multi-step flows).

### Quick reference

```
# Auth (the user typically did this once; don't re-run unless asked)
deepsql login --url https://<host>
deepsql whoami
deepsql logout

# Connections — the human's "active DB" pin (CLI-only; MCP tools don't read this)
deepsql connections list                  # marks active with *
deepsql connections use <name>            # pin
deepsql connections current               # show pinned
deepsql connections unset

# Read-only data ops (mirror MCP tools — same backend, same guardrails)
deepsql query "SELECT ..." --connection <name>
deepsql explain "SELECT ..." --connection <name>
deepsql schema [tables|objects] --connection <name>

# Brain / RAG (mirror the MCP brain tools)
deepsql brain-context "<question>" --connection <name> [--top-k N]
deepsql business-rules --connection <name> [--question "..."]
deepsql relationships --connection <name>
deepsql anti-patterns --connection <name> [--kind table|query] [--limit N]

# Slack daily digest
deepsql digest [N] --connection <name>

# Slow-query operations
deepsql slow-queries latest --connection <name>
deepsql slow-queries history --connection <name> [N]
deepsql slow-queries analyze --connection <name>
deepsql slow-queries optimize --connection <name> --query-id <id>   # SSE stream

# Admin (require ADMIN role)
deepsql users list | get | add | set-role | lock | unlock | disable | delete
deepsql access list | grant | revoke | policy
deepsql permissions list | override | reset
deepsql setup [--skip-email] [--skip-slack]   # post-install wizard
```

### When CLI is the right call (vs MCP)

- The user said "run `deepsql ...`" or "use the CLI."
- The operation is admin (`users`, `access`, `permissions`, `setup`) — these
  aren't exposed via MCP intentionally.
- The user wants Slack digest content (`digest`).
- You're in a script context where structured stdin/stdout is preferable.

### When CLI is the **wrong** call

- For everything in the decision tree above. The MCP equivalents are
  faster and don't depend on the user's `$PATH`, env, or saved auth.
- For executing SQL the user is paying you to write — use
  `execute_readonly_sql`, not `Bash("deepsql query ...")`.

---

## Common mistakes — and how to avoid them

### ❌ Generating SQL without retrieving brain context first
The user asks: *"How many active customers do we have?"*

**Wrong:** call `execute_readonly_sql("SELECT COUNT(*) FROM customers WHERE active = true")` —
guesses at the table name (`customers` vs `dim_customer` vs `users`),
guesses at the column (`active` vs `is_active` vs `status='ACTIVE'`),
ignores any business rule that defines what "active" means.

**Right:**
1. `get_brain_context(connectionId, "how many active customers")` — returns the
   right table (`dim_customer`) and the column convention.
2. `list_business_rules(connectionId, "active customers")` — returns the rule
   if "active" has a workspace-specific definition.
3. Generate SQL using the names + rules from #1 and #2.
4. `explain_readonly_sql(...)` — sanity check.
5. `execute_readonly_sql(...)` — run it.

### ❌ Calling `get_schema` on every analysis question
`get_schema` returns the entire DB. On a 200-table OLAP warehouse that's a
huge response and most of it is irrelevant to the question. **Use
`get_brain_context` for question-scoped retrieval.** Reserve `get_schema`
for exhaustive listing tasks.

### ❌ Trying to mutate data
Every execution path is read-only. INSERT/UPDATE/DELETE/CREATE/DROP/ALTER/
TRUNCATE are all rejected at the SQL parser layer. If the user asks for a
mutation, **stop and tell them DeepSQL is read-only**, then offer to draft
the SQL for them to run themselves.

### ❌ Forgetting the connectionId
Every tool except `list_connections` requires it. If the user mentions a DB
by name (e.g., "look at prod-replica"), call `list_connections` first to
resolve the name → UUID. Don't guess.
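Resolving a name to a UUID is a lookup over the `list_connections` result. This is a minimal sketch that assumes the result has already been parsed into a list of dicts carrying the `id` and `connectionName` fields documented in the tool reference; the function name `resolve_connection_id` is hypothetical. It refuses to pick when the name is missing or ambiguous, in keeping with "don't guess".

```python
from typing import Any, Dict, List


def resolve_connection_id(connections: List[Dict[str, Any]], name: str) -> str:
    """Map a connection name to its ID, failing loudly instead of guessing."""
    matches = [c for c in connections if c.get("connectionName") == name]
    if len(matches) != 1:
        known = sorted(str(c.get("connectionName", "")) for c in connections)
        raise LookupError(
            f"expected exactly one connection named {name!r}; known names: {known}"
        )
    return matches[0]["id"]
```

On a `LookupError`, the known names in the message give the agent something concrete to show the user when asking which database they meant.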

### ❌ Re-fetching context on every turn
Schema and brain context don't change minute-to-minute. If you already
called `get_brain_context` for a related question this conversation, reuse
the result. Don't re-call unless the question has shifted topics.
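One way to make that reuse explicit is a small per-conversation cache. The sketch below is an assumption-heavy illustration: the class name `BrainContextCache`, the injected `fetch` callable (standing in for whatever issues the real `get_brain_context` call), and the caller-supplied `topic` label are all hypothetical. Deciding when a question has "shifted topics" is left to the caller.

```python
from typing import Any, Callable, Dict, Tuple


class BrainContextCache:
    """Reuse brain context per (connection, topic) within one conversation."""

    def __init__(self, fetch: Callable[[str, str], Any]) -> None:
        self._fetch = fetch  # e.g. a wrapper around the get_brain_context tool
        self._store: Dict[Tuple[str, str], Any] = {}

    def get(self, connection_id: str, topic: str, question: str) -> Any:
        key = (connection_id, topic.strip().lower())
        if key not in self._store:
            # First question on this topic: fetch once and remember it.
            self._store[key] = self._fetch(connection_id, question)
        return self._store[key]
```

Two questions tagged with the same topic on the same connection trigger one fetch; a new topic or a different connection triggers another.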

### ❌ Mixing CLI invocations and MCP tool calls in the same session
Pick one. If you have MCP available, stay in MCP. If you only have Bash,
use the CLI. Mixing forces the user to debug two surfaces.

### ❌ Calling `analyze_slow_queries` and immediately querying the slow-query log table directly
The MCP tool already does the right query against `pg_stat_statements` (or
the equivalent for MySQL) with the right thresholds. Don't reinvent it.

---

## Output handling tips

- **`get_brain_context` returns a `trainingContext` text block.** It's
  designed to drop into your prompt as-is. Don't summarize it before
  generating SQL — let the structured names flow through.
- **`execute_readonly_sql` returns `{ result: { columns, rows, rowCount, totalRowCount, isLimited, ... }, success, queryType }`.**
  `rows` is array-of-arrays (column-positional), not array-of-objects. The
  CLI's `query` command renders this; if you're consuming the structured
  response yourself, zip `columns` and `rows[i]` to get an object.
- **`explain_readonly_sql` returns the plan as JSON.** Postgres-style
  textual EXPLAIN is in `plan` if available; structured form may be
  alongside.
- **`analyze_slow_queries` returns slow queries with fingerprints, not raw
  SQL.** Fingerprints are normalized (`?` for literals). Use the
  `queryId` to feed back into `optimize` flows.
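The "zip `columns` and `rows[i]`" step from the `execute_readonly_sql` tip can be sketched as follows. It assumes the response has been parsed into a dict shaped like the structure above and that `columns` is a list of column-name strings; if the backend returns richer column objects, extract the names first. The function name `rows_as_dicts` is hypothetical.

```python
from typing import Any, Dict, List


def rows_as_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn column-positional rows into one dict per row."""
    result = response["result"]
    columns = result["columns"]  # assumed: list of column-name strings
    return [dict(zip(columns, row)) for row in result["rows"]]


# Invented sample payload in the documented shape.
sample = {
    "success": True,
    "queryType": "SELECT",
    "result": {
        "columns": ["region", "active_customers"],
        "rows": [["EMEA", 120], ["APAC", 85]],
        "rowCount": 2,
        "isLimited": False,
    },
}
records = rows_as_dicts(sample)
# records[0] is {"region": "EMEA", "active_customers": 120}
```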

---

## Multi-database situations

DeepSQL doesn't support cross-connection JOINs at the SQL layer. If the user
asks a question that spans DBs:

1. Call `list_connections` to enumerate.
2. For each relevant DB, call `get_brain_context` and/or `execute_readonly_sql`.
3. Combine the results in your reasoning, not in SQL.
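If the combination in step 3 is mechanical, it can be done in code on the client side rather than in SQL. The sketch below shows an inner join of two result sets that have already been converted to one dict per row; the function name `join_on_key` and the sample data in the usage note are invented for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_on_key(left_rows: List[Row], right_rows: List[Row], key: str) -> List[Row]:
    """Inner-join two result sets from different connections on a shared key."""
    right_by_key: Dict[Any, List[Row]] = {}
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)

    joined: List[Row] = []
    for left in left_rows:
        for right in right_by_key.get(left[key], []):
            merged = dict(right)
            merged.update(left)  # on a column-name collision, the left value wins
            joined.append(merged)
    return joined
```

For instance, customer rows from one connection and invoice rows from another can be joined on `customer_id`; customers with no invoices are dropped, as in a SQL inner join. Keep each per-connection query small, since this join happens in the agent's memory.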

The CLI's "active connection" pin (`deepsql connections use`) is **not** read
by MCP tools — it only saves typing for human CLI users. As an MCP client,
always pass `connectionId` explicitly per call.

---

## Authentication & security model

You don't need to manage auth — the MCP server was launched with a saved
token from `~/.config/deepsql/auth.json`. Every tool call carries that
token. The token is bound to a specific user identity:

- The user's role and per-connection ACLs are enforced **server-side**.
  If you call a tool and get an authorization error, surface it to the
  user — don't retry with different parameters.
- The user may have **chat-access policies** (plain-English rules
  attached to a connection). The brain context already reflects them; if
  a query you generate triggers a policy violation, the backend rejects
  it. Trust the rejection and ask the user how to proceed.
- **Read-only is enforced at four independent layers** (client parser,
  backend parser, per-connection ACL, DB role). Don't try to bypass any of
  them — each rejection is a real signal that the operation isn't safe.

---

## When in doubt

1. Call `list_connections` first if you don't have a connectionId.
2. Call `get_brain_context` second if you have a question.
3. Generate your SQL using the names and rules from those calls.
4. Call `explain_readonly_sql` if performance matters.
5. Call `execute_readonly_sql` last.

That five-step flow handles 80% of legitimate analytical workloads. Anything
that doesn't fit this pattern probably warrants asking the user a
clarifying question instead of guessing.
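The five-step flow above can be sketched as a single function. This is an illustration under stated assumptions, not DeepSQL's client code: `call_tool(name, args)` is a hypothetical stand-in for however your MCP client invokes a tool, `write_sql(question, context)` stands in for the agent's own SQL generation, and the function name `answer_question` is invented. Tool names and argument names are the ones documented in the tool reference.

```python
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Any]


def answer_question(
    call_tool: ToolCaller,
    write_sql: Callable[[str, Any], str],
    question: str,
    connection_name: str,
    check_plan: bool = True,
) -> Any:
    # 1. Resolve the connection name to its ID; never guess.
    connections = call_tool("list_connections", {})
    ids = [c["id"] for c in connections if c["connectionName"] == connection_name]
    if len(ids) != 1:
        raise LookupError(f"no unique connection named {connection_name!r}")
    connection_id = ids[0]

    # 2. Ground on brain context before writing any SQL.
    context = call_tool(
        "get_brain_context", {"connectionId": connection_id, "question": question}
    )

    # 3. Generate SQL from the grounded names and rules.
    sql = write_sql(question, context)

    # 4. Optionally validate the plan when performance matters.
    if check_plan:
        call_tool("explain_readonly_sql", {"connectionId": connection_id, "query": sql})

    # 5. Execute last.
    return call_tool("execute_readonly_sql", {"connectionId": connection_id, "query": sql})
```

Injecting `call_tool` keeps the sketch independent of any particular MCP client library and makes the call order easy to test with a stub.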