@deepsql/mcp 0.10.1 → 0.11.0

package/CLAUDE.md DELETED
@@ -1,330 +0,0 @@
- # DeepSQL — guidance for AI agents (Claude Code / Cursor / Codex / etc.)
-
- > Read this file before invoking DeepSQL tools. It exists because there are
- > **two surfaces** (MCP tools + CLI commands) that look similar and agents
- > tend to pick the wrong one or use the right one inefficiently.
-
- DeepSQL is an autonomous database performance assistant. It exposes itself
- to AI agents in two ways:
-
- | Surface | Where it lives | When you use it |
- |---|---|---|
- | **MCP tools** (programmatic) | The stdio server you connected via `deepsql mcp` | Default. You're an MCP client and these are first-class tool calls. |
- | **CLI** (`deepsql` binary) | The user's `$PATH`, invoked via Bash | Only when an MCP tool can't do it (admin ops, auth, multi-step user flows) **or** the user explicitly asked you to "run `deepsql ...`". |
-
- If you have both available, **prefer MCP tools.** They're structured, typed,
- faster, and don't depend on the user's shell environment.
-
- ---
-
- ## Decision tree — "I want to..."
-
- ```
- 1. Find which databases I can query
-    → list_connections       (returns id, name, type for each)
-
- 2. Understand a database's structure
-    → get_brain_context      (RAG retrieval — best when you have a
-                              natural-language question; returns ranked
-                              tables/columns/FKs/docs/business rules)
-    → get_schema             (full deterministic schema dump — when you
-                              need every table; expensive on large DBs)
-    → get_database_objects   (tables/views/functions/procedures only)
-
- 3. Answer a business question about data
-    → get_brain_context first (retrieves what tables hold what, FK edges,
-                               business rules; gives you grounded context)
-    → then construct SQL yourself, then:
-       → explain_readonly_sql (validate the plan)
-       → execute_readonly_sql (run it)
-
- 4. Find inferred relationships between tables
-    → get_relationships      (returns FK candidates with confidence scores)
-
- 5. Read business rules / data-access policies
-    → list_business_rules    (active rules + guardrails for a connection;
-                              pass `question` to scope to relevant ones)
-
- 6. Find anti-patterns
-    → get_anti_patterns kind=table   (schema/structural anti-patterns)
-    → get_anti_patterns kind=query   (slow/expensive query patterns)
-
- 7. Investigate slow queries
-    → analyze_slow_queries   (recent slow queries with fingerprints + ms)
-
- 8. Run SQL to inspect data
-    → execute_readonly_sql   (read-only — backend rejects mutations)
- ```
-
- **Rule of thumb for question-answering:** start with `get_brain_context`,
- not `execute_readonly_sql`. The brain context tells you which tables matter,
- their FKs, what the columns mean, and what business rules apply. Skipping
- straight to SQL is how you write queries against the wrong tables.
-
- ---
-
- ## MCP tool reference (10 tools)
-
- Every tool requires a `connectionId` (string UUID) **except** `list_connections`.
- Always call `list_connections` first if you don't already know the ID.
-
- ### Discovery
-
- #### `list_connections`
- - **Args:** none
- - **Returns:** array of `{ id, connectionName, databaseType, ... }`
- - **Use when:** the user mentions a DB by name and you need its ID, or you
-   don't know which DBs are available.
-
- #### `get_schema(connectionId)`
- - **Returns:** the cached schema metadata for the whole DB.
- - **Use when:** you need an exhaustive listing. **Avoid** when the DB is
-   large (hundreds of tables) — `get_brain_context` ranks the relevant
-   subset much faster.
-
- #### `get_database_objects(connectionId)`
- - **Returns:** tables, views, functions, procedures.
- - **Use when:** the user asks "what views/functions exist?" — narrower
-   than `get_schema`.
-
- ### RAG / brain retrieval (preferred for question-answering)
-
- #### `get_brain_context(connectionId, question, topK?)`
- - **Args:**
-   - `question` — natural-language question used for retrieval ranking
-   - `topK` (optional, 1–100) — when provided, returns ranked diagnostic
-     snippets (good for "show me the top 5 most relevant tables"). When
-     omitted, returns the rich training-context payload (tables + columns
-     + FKs + business rules + docs assembled for prompt-grounding).
- - **Use when:** the user asks any analytical question. This is the cheapest
-   way to ground yourself before generating SQL.
- - **Output:** typically includes `trainingContext` (text block ready to feed
-   into your own context window) plus structured ranked results.
-
- #### `list_business_rules(connectionId, question?)`
- - **Returns:** `activeRules` + `applicableGuardrails` + `guardrailContext`.
- - **Use when:** before generating SQL that touches sensitive entities. If
-   the rules say "PII columns are blocked," respect that in your output.
-   Pass `question` to filter to rules applicable to the user's intent.
-
- #### `get_relationships(connectionId)`
- - **Returns:** array of `{ sourceTable, sourceColumn, targetTable, targetColumn, confidence, inferenceMethod, validationStatus }`.
- - **Use when:** writing JOINs and the actual FK constraint isn't declared
-   in the schema. Anything `confidence >= 0.8` is safe; lower confidence
-   means inferred from naming patterns or data — verify with the user.
-
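
The `confidence >= 0.8` guidance for `get_relationships` can be applied mechanically before writing JOINs. The following is a minimal sketch, not part of the package: the field names come from the return shape listed above, but their TypeScript types, the helper name `partitionByConfidence`, and the sample rows are assumptions made for illustration.

```typescript
// One get_relationships entry. Field names are from the tool reference;
// the string/number types are assumed, not confirmed by the package.
interface Relationship {
  sourceTable: string;
  sourceColumn: string;
  targetTable: string;
  targetColumn: string;
  confidence: number;
  inferenceMethod: string;
  validationStatus: string;
}

// Hypothetical helper: split FK candidates at the 0.8 threshold so that
// low-confidence ones can be confirmed with the user before use in a JOIN.
function partitionByConfidence(
  relationships: Relationship[],
  threshold: number = 0.8,
): { safe: Relationship[]; needsReview: Relationship[] } {
  const safe: Relationship[] = [];
  const needsReview: Relationship[] = [];
  for (const rel of relationships) {
    (rel.confidence >= threshold ? safe : needsReview).push(rel);
  }
  return { safe, needsReview };
}

// Made-up sample data; table names, methods, and statuses are illustrative only.
const sampleRelationships: Relationship[] = [
  { sourceTable: "orders", sourceColumn: "customer_id", targetTable: "dim_customer",
    targetColumn: "id", confidence: 0.95, inferenceMethod: "example", validationStatus: "example" },
  { sourceTable: "orders", sourceColumn: "ref", targetTable: "invoices",
    targetColumn: "id", confidence: 0.55, inferenceMethod: "example", validationStatus: "example" },
];
const partitioned = partitionByConfidence(sampleRelationships);
console.log(partitioned.safe.length, partitioned.needsReview.length);
```

Entries in `needsReview` are the ones to raise with the user rather than join on silently.
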
- #### `get_anti_patterns(connectionId, kind?, limit?)`
- - **`kind="table"` (default):** schema-level anti-patterns (missing
-   indexes, wide tables, etc.).
- - **`kind="query"`:** query-level patterns; pass `limit` (1–500).
- - **Use when:** the user asks "what's wrong with this DB?" or you've
-   generated a query and want to sanity-check it.
-
- ### Operations
-
- #### `analyze_slow_queries(connectionId, thresholdMs?, limit?)`
- - **Args:** `thresholdMs` defaults to 100, `limit` defaults to 10.
- - **Returns:** recent slow queries from `pg_stat_statements` with
-   fingerprints, durations, example statements.
- - **Use when:** the user asks "what's slow?" or you're triaging a
-   performance incident.
-
- ### Execution
-
- #### `execute_readonly_sql(connectionId, query, limit?, timeoutSeconds?)`
- - **Read-only enforced at four layers:** client SQL parser, backend SQL
-   parser, per-connection ACL on the calling user's token, and the DB
-   role itself, which usually only has SELECT/EXPLAIN. Mutations (INSERT,
-   UPDATE, DELETE, DDL, etc.) are rejected — **don't try to work around this**.
- - **Multi-statement SQL is rejected** in phase 1. Send one statement.
- - **Defaults:** 100-row `limit`, backend default `timeoutSeconds`.
- - **Use when:** you've grounded yourself with `get_brain_context` and need
-   to fetch concrete numbers.
-
- #### `explain_readonly_sql(connectionId, query)`
- - **Don't include `EXPLAIN` in the query string** — the tool wraps it.
-   `ANALYZE` is also rejected (read-only).
- - **Use when:** you want to validate a plan before running it, or you're
-   diagnosing why a query is slow.
-
- ---
-
- ## CLI commands (run via Bash, only when MCP isn't enough)
-
- The CLI exposes the same data plane plus admin operations the MCP server
- deliberately doesn't expose. **Only run CLI commands when the user explicitly
- asks you to**, or when an MCP tool can't do what's needed (admin, auth,
- multi-step flows).
-
- ### Quick reference
-
- ```
- # Auth (the user typically did this once; don't re-run unless asked)
- deepsql login --url https://<host>
- deepsql whoami
- deepsql logout
-
- # Connections — the human's "active DB" pin (CLI-only; MCP tools don't read this)
- deepsql connections list # marks active with *
- deepsql connections use <name> # pin
- deepsql connections current # show pinned
- deepsql connections unset
-
- # Read-only data ops (mirror MCP tools — same backend, same guardrails)
- deepsql query "SELECT ..." --connection <name>
- deepsql explain "SELECT ..." --connection <name>
- deepsql schema [tables|objects] --connection <name>
-
- # Brain / RAG (mirror the MCP brain tools)
- deepsql brain-context "<question>" --connection <name> [--top-k N]
- deepsql business-rules --connection <name> [--question "..."]
- deepsql relationships --connection <name>
- deepsql anti-patterns --connection <name> [--kind table|query] [--limit N]
-
- # Slack daily digest
- deepsql digest [N] --connection <name>
-
- # Slow-query operations
- deepsql slow-queries latest --connection <name>
- deepsql slow-queries history --connection <name> [N]
- deepsql slow-queries analyze --connection <name>
- deepsql slow-queries optimize --connection <name> --query-id <id> # SSE stream
-
- # Admin (require ADMIN role)
- deepsql users list | get | add | set-role | lock | unlock | disable | delete
- deepsql access list | grant | revoke | policy
- deepsql permissions list | override | reset
- deepsql setup [--skip-email] [--skip-slack] # post-install wizard
- ```
-
- ### When CLI is the right call (vs MCP)
-
- - The user said "run `deepsql ...`" or "use the CLI."
- - The operation is admin (`users`, `access`, `permissions`, `setup`) — these
-   aren't exposed via MCP intentionally.
- - The user wants Slack digest content (`digest`).
- - You're in a script context where structured stdin/stdout is preferable.
-
- ### When CLI is the **wrong** call
-
- - For everything in the decision tree above. The MCP equivalents are
-   faster and don't depend on the user's `$PATH`, env, or saved auth.
- - For executing SQL the user is paying you to write — use
-   `execute_readonly_sql`, not `Bash("deepsql query ...")`.
-
- ---
-
- ## Common mistakes — and how to avoid them
-
- ### ❌ Generating SQL without retrieving brain context first
- The user asks: *"How many active customers do we have?"*
-
- **Wrong:** call `execute_readonly_sql("SELECT COUNT(*) FROM customers WHERE active = true")` —
- guesses at the table name (`customers` vs `dim_customer` vs `users`),
- guesses at the column (`active` vs `is_active` vs `status='ACTIVE'`),
- ignores any business rule that defines what "active" means.
-
- **Right:**
- 1. `get_brain_context(connectionId, "how many active customers")` — returns the
-    right table (`dim_customer`) and the column convention.
- 2. `list_business_rules(connectionId, "active customers")` — returns the rule
-    if "active" has a workspace-specific definition.
- 3. Generate SQL using the names + rules from #1 and #2.
- 4. `explain_readonly_sql(...)` — sanity check.
- 5. `execute_readonly_sql(...)` — run it.
-
- ### ❌ Calling `get_schema` on every analysis question
- `get_schema` returns the entire DB. On a 200-table OLAP warehouse that's a
- huge response and most of it is irrelevant to the question. **Use
- `get_brain_context` for question-scoped retrieval.** Reserve `get_schema`
- for exhaustive listing tasks.
-
- ### ❌ Trying to mutate data
- Every execution path is read-only. INSERT/UPDATE/DELETE/CREATE/DROP/ALTER/
- TRUNCATE are all rejected at the SQL parser layer. If the user asks for a
- mutation, **stop and tell them DeepSQL is read-only**, then offer to draft
- the SQL for them to run themselves.
-
- ### ❌ Forgetting the connectionId
- Every tool except `list_connections` requires it. If the user mentions a DB
- by name (e.g., "look at prod-replica"), call `list_connections` first to
- resolve the name → UUID. Don't guess.
-
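
Resolving a name to a UUID from a `list_connections` result can be sketched as below. This is an illustration under assumptions: the `id`, `connectionName`, and `databaseType` fields come from the tool reference, while the helper name `resolveConnectionId`, the exact-match rule, and the sample IDs are hypothetical.

```typescript
// Subset of a list_connections entry; field names are from the tool reference,
// the string types are assumed.
interface ConnectionSummary {
  id: string;
  connectionName: string;
  databaseType: string;
}

// Hypothetical helper: return the ID only when exactly one connection matches,
// so an ambiguous or unknown name is surfaced instead of guessed at.
function resolveConnectionId(connections: ConnectionSummary[], wantedName: string): string {
  const matches = connections.filter((c) => c.connectionName === wantedName);
  const [match] = matches;
  if (matches.length !== 1 || match === undefined) {
    throw new Error(
      `Expected exactly one connection named "${wantedName}", found ${matches.length}`,
    );
  }
  return match.id;
}

// Made-up sample data for illustration only.
const exampleConnections: ConnectionSummary[] = [
  { id: "id-for-prod-replica", connectionName: "prod-replica", databaseType: "postgres" },
  { id: "id-for-staging", connectionName: "staging", databaseType: "mysql" },
];
console.log(resolveConnectionId(exampleConnections, "prod-replica"));
```

When the helper throws, the error message is something to relay to the user, in line with the "Don't guess" rule.
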
- ### ❌ Re-fetching context on every turn
- Schema and brain context don't change minute-to-minute. If you already
- called `get_brain_context` for a related question earlier in this
- conversation, reuse the result. Don't re-call unless the question has
- shifted topics.
-
- ### ❌ Mixing CLI invocations and MCP tool calls in the same session
- Pick one. If you have MCP available, stay in MCP. If you only have Bash,
- use the CLI. Mixing forces the user to debug two surfaces.
-
- ### ❌ Calling `analyze_slow_queries` and immediately querying the slow-query log table directly
- The MCP tool already does the right query against `pg_stat_statements` (or
- the equivalent for MySQL) with the right thresholds. Don't reinvent it.
-
- ---
-
- ## Output handling tips
-
- - **`get_brain_context` returns a `trainingContext` text block.** It's
-   designed to drop into your prompt as-is. Don't summarize it before
-   generating SQL — let the structured names flow through.
- - **`execute_readonly_sql` returns `{ result: { columns, rows, rowCount, totalRowCount, isLimited, ... }, success, queryType }`.**
-   `rows` is array-of-arrays (column-positional), not array-of-objects. The
-   CLI's `query` command renders this; if you're consuming the structured
-   response yourself, zip `columns` and `rows[i]` to get an object.
- - **`explain_readonly_sql` returns the plan as JSON.** Postgres-style
-   textual EXPLAIN is in `plan` if available; structured form may be
-   alongside.
- - **`analyze_slow_queries` returns slow queries with fingerprints, not raw
-   SQL.** Fingerprints are normalized (`?` for literals). Use the
-   `queryId` to feed back into `optimize` flows.
-
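
Zipping `columns` with each positional row, as described for `execute_readonly_sql`, might look like the sketch below. It assumes `columns` is an array of column-name strings; the source does not say whether entries are plain names or richer objects, so adjust if the real payload differs. The helper name `rowsToObjects` is hypothetical.

```typescript
// Minimal view of the `result` object. Only the two fields used here are
// modelled; `columns` being string[] is an assumption.
interface QueryResult {
  columns: string[];
  rows: unknown[][];
}

// Hypothetical helper: turn column-positional rows into keyed objects.
function rowsToObjects(result: QueryResult): Record<string, unknown>[] {
  return result.rows.map((row) => {
    const obj: Record<string, unknown> = {};
    result.columns.forEach((column, i) => {
      obj[column] = row[i];
    });
    return obj;
  });
}

// Made-up sample payload for illustration only.
const exampleResult: QueryResult = {
  columns: ["id", "name"],
  rows: [
    [1, "Ada"],
    [2, "Lin"],
  ],
};
console.log(rowsToObjects(exampleResult));
// prints [ { id: 1, name: 'Ada' }, { id: 2, name: 'Lin' } ]
```
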
- ---
-
- ## Multi-database situations
-
- DeepSQL doesn't support cross-connection JOINs at the SQL layer. If the user
- asks a question that spans DBs:
-
- 1. Call `list_connections` to enumerate.
- 2. For each relevant DB, call `get_brain_context` and/or `execute_readonly_sql`.
- 3. Combine the results in your reasoning, not in SQL.
-
- The CLI's "active connection" pin (`deepsql connections use`) is **not** read
- by MCP tools — it only saves typing for human CLI users. As an MCP client,
- always pass `connectionId` explicitly per call.
-
- ---
-
- ## Authentication & security model
-
- You don't need to manage auth — the MCP server was launched with a saved
- token from `~/.config/deepsql/auth.json`. Every tool call carries that
- token. The token is bound to a specific user identity:
-
- - The user's role and per-connection ACLs are enforced **server-side**.
-   If you call a tool and get an authorization error, surface it to the
-   user — don't retry with different parameters.
- - The user may have **chat-access policies** (plain-English rules
-   attached to a connection). The brain context already reflects them; if
-   a query you generate triggers a policy violation, the backend rejects
-   it. Trust the rejection and ask the user how to proceed.
- - **Read-only is enforced at four independent layers** (client parser,
-   backend parser, per-connection ACL, DB role). Don't try to bypass any of
-   them — each rejection is a real signal that the operation isn't safe.
-
- ---
-
- ## When in doubt
-
- 1. Call `list_connections` first if you don't have a connectionId.
- 2. Call `get_brain_context` second if you have a question.
- 3. Generate your SQL using the names and rules from those calls.
- 4. Call `explain_readonly_sql` if performance matters.
- 5. Call `execute_readonly_sql` last.
-
- That five-step flow handles 80% of legitimate analytical workloads. Anything
- that doesn't fit this pattern probably warrants asking the user a
- clarifying question instead of guessing.
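
The ordering of steps 2 to 5 in the flow above can be sketched as a small driver. This is an illustration only: `callTool` is a hypothetical stand-in for whatever function your MCP client uses to issue a tool call, `buildSql` stands in for your own SQL generation, and the helper assumes `connectionId` was already resolved via `list_connections`. The tool names and argument names (`connectionId`, `question`, `query`) are taken from the tool reference.

```typescript
// Hypothetical signature for an MCP client's tool-call function.
type ToolCaller = (tool: string, args: Record<string, unknown>) => Promise<unknown>;

// Runs steps 2-5 for an already-resolved connectionId:
// ground with brain context, build SQL, validate the plan, then execute.
async function answerQuestion(
  callTool: ToolCaller,
  connectionId: string,
  question: string,
  buildSql: (brainContext: unknown) => string,
): Promise<unknown> {
  const brainContext = await callTool("get_brain_context", { connectionId, question });
  const query = buildSql(brainContext);
  // If callTool rejects here (assumed behaviour on a tool error), the
  // execute step below is never reached.
  await callTool("explain_readonly_sql", { connectionId, query });
  return callTool("execute_readonly_sql", { connectionId, query });
}
```

Passing `callTool` in as a parameter keeps the sketch independent of any particular MCP client library and makes the call order easy to check with a fake.
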