capybara-db-mcp 1.0.6 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,8 +1,17 @@
  # capybara-db-mcp

- > **Your data is safe with Capybara.** Just like capybaras are famously safe and peaceful to be around, **capybara-db-mcp keeps your database data safe—your query results are never shared with an LLM.** Data stays on your machine; the model receives only success/failure.
+ ## ⚠️ Production & Governance Notice

- **capybara-db-mcp** is a community fork of [DBHub](https://github.com/bytebase/dbhub) by [Bytebase](https://www.bytebase.com/). The key difference: **DBHub sends query results (rows, columns, counts) directly to the LLM**, which can expose sensitive data. **capybara-db-mcp is PII-safe**: it writes results to local files, opens them in the editor, and returns only success/failure to the LLM—no data or file path ever reaches the model. It also enforces read-only SQL, keeps the same internal names (e.g. `dbhub.toml`) for easy merging from upstream, and adds **default-schema support** for PostgreSQL and multi-database setups.
+ This project is intended for development, sandbox, or formally reviewed environments. Before connecting to any production system:
+
+ - Conduct a security review
+ - Validate data classification and handling requirements
+ - Ensure compliance with internal AI and data governance policies
+ - Confirm logging, auditing, and DLP controls are in place
+
+ This project is designed to reduce the likelihood of exposing query results to LLMs, but it does not replace enterprise security controls and should not be used to bypass governance processes.
+
+ **capybara-db-mcp** is a community fork of [DBHub](https://github.com/bytebase/dbhub) by [Bytebase](https://www.bytebase.com/). The key difference: **DBHub sends query results (rows, columns, counts) directly to the LLM**, which can expose sensitive data. capybara-db-mcp is designed to reduce the likelihood of exposing query results to LLMs by writing results to local files, opening them in the editor, and returning status-oriented metadata to the MCP client instead of result sets. It uses connector-level read-only connections (PostgreSQL, SQLite), keeps the same internal names (e.g. `dbhub.toml`) for easy merging from upstream, and adds **default-schema support** for PostgreSQL and multi-database setups.

  - **Original project:** [github.com/bytebase/dbhub](https://github.com/bytebase/dbhub)
  - **This fork:** [github.com/ajgreyling/capybara-db-mcp](https://github.com/ajgreyling/capybara-db-mcp)
@@ -58,9 +67,24 @@ flowchart LR
  M --> Ma
  ```

- ### PII-safe data flow
+ ## Security Model Overview
+
+ capybara-db-mcp is designed to reduce the likelihood of transmitting query result data to an LLM by isolating result sets to the local filesystem and returning status-oriented metadata to the MCP client.
+
+ - **1) LLM generates SQL**: The MCP client sends an `execute_sql` request containing SQL text.
+ - **2) Connector is read-only**: Database connections are opened in read-only mode (PostgreSQL: `default_transaction_read_only`; SQLite: readonly file mode). Write attempts fail at the database level.
+ - **3) Query executes against the database**: The query runs using the configured connector.
+ - **4) Results are written locally**: Result sets are written to `.safe-sql-results/` and opened in the editor (configurable).
+ - **5) LLM receives metadata only**: The MCP tool response is formatted to avoid including raw query results in the response payload.
+ - **6) Logging remains local**: Operational logs and diagnostic details are written locally.
+
+ This design reduces the likelihood of transmitting result data to an LLM, but it does not eliminate operational, environment, or governance risks. Database-level controls (RBAC, network segmentation, auditing) and approved operating procedures remain required.

- SQL results never reach the LLM. They are written to local files and opened in the editor; only a success/failure status is returned:
+ For detailed PII safety mechanisms (result isolation, generic errors, log redaction, search_objects names-only, request telemetry redaction, HTTP hardening), see [ARCHITECTURE.md](ARCHITECTURE.md).
+
+ ### Result handling and LLM exposure minimization
+
+ Query results are written to local files and opened in the editor; the MCP tool response is formatted to return success/failure metadata rather than result sets:

  ```mermaid
  flowchart TB
@@ -93,29 +117,21 @@ flowchart TB

  capybara-db-mcp is a zero-dependency, token-efficient MCP server implementing the Model Context Protocol (MCP). It supports the same features as DBHub, plus a default schema.

- **This fork is unconditionally read-only.** Only read-only SQL (SELECT, WITH, EXPLAIN, SHOW, etc.) is allowed. Write operations (UPDATE, DELETE, INSERT, MERGE, etc.) are never permitted.
+ **Read-only enforcement**: Database connections are opened in read-only mode (PostgreSQL: `default_transaction_read_only`; SQLite: readonly file mode). UPDATE, DELETE, INSERT, and other write operations fail at the connection level. This reduces the risk of accidental writes but does not replace database-level RBAC or permissions configuration.

- **Your data is safe with Capybara.** Capybaras are famously safe and peaceful—and so is your data. Query results are **never shared with an LLM**. Raw data is written to local files (`.safe-sql-results/`) and opened in the editor; the LLM receives only success/failure. No file path, row count, or column names are returned (to prevent exfiltration via dynamic SQL). Error responses are also PII-safe: SQL statements and parameter values are never sent to the LLM; they are logged to stderr for local debugging, and database error messages are truncated. This prevents personally identifiable information (PII) from ever reaching the model. There is a default timeout of 60 seconds to ensure queries are not tying up the server.
+ **Output isolation controls**: By default, query results are written to local files (`.safe-sql-results/`) and opened in the editor; tool responses are formatted to avoid returning result sets. Error responses return generic messages only (e.g. "Execution failed. See server logs for details."); no SQL, parameter values, or database error text are returned. Logs never include SQL or parameter values. These mechanisms are designed to reduce LLM data exposure risk when used appropriately, and do not constitute regulatory compliance or replace enterprise data governance and DLP controls.

  - **Local Development First**: Zero dependency, token efficient with just two MCP tools to maximize context window
  - **Multi-Database**: PostgreSQL, MySQL, MariaDB, SQL Server, and SQLite through a single interface
  - **Multi-Connection**: Connect to multiple databases simultaneously with TOML configuration
  - **Default schema**: Use `--schema` (or TOML `schema = "..."`) so PostgreSQL uses that schema for `execute_sql` and `search_objects` is restricted to it (see below)
- - **Guardrails**: Unconditionally read-only, row limiting, and a safe 60-second query timeout default (overridable per source via `query_timeout` in `dbhub.toml`) to prevent runaway operations
- - **PII-safe**: Query results are written to `.safe-sql-results/` and opened in the editor; only success/failure is sent to the LLM—no file path, row data, count, or column names (prevents exfiltration via dynamic column aliasing). Error responses are hardened: SQL and parameter values are logged locally, not returned to the LLM; database error text is truncated.
+ - **Guardrails**: Connector-level read-only connections, row limiting, and a 60-second query timeout default (overridable per source via `query_timeout` in `dbhub.toml`) to reduce runaway operations
+ - **Designed to reduce LLM data exposure**: Results are written to `.safe-sql-results/` and opened in the editor; tool responses return only success/failure metadata (no file path, row data, row counts, or column names). Error responses use generic messages only; no SQL, parameter values, or database error text reach the client. Logs are redacted to avoid SQL and parameter values.
  - **Secure Access**: SSH tunneling and SSL/TLS encryption

  ## Why Capybara?

- The capybara is the spirit animal of capybara-db-mcp: calm, social, and famously safe to be around. **Just as capybaras are safe, your database data stays safe—never shared with an LLM**. It reflects the project's philosophy of peaceful coexistence, predictable behavior, and built-in guardrails.
-
- ### The Capybara: A Paragon of Peaceful Coexistence
-
- - **Docile temperament**: Capybaras are known for gentle, non-aggressive behavior and are often seen peacefully sharing space with many species.
- - **Herbivorous nature**: As herbivores, they pose no predatory threat to humans or other animals.
- - **Social harmony**: They live in cooperative groups, reinforcing a "safe by default" ecosystem.
- - **Adaptability**: They thrive in different environments, reducing conflict and stress.
- - **Confident calm**: Other animals prefer their company, and capybaras are rarely rattled by neighbors around them.
+ Capybara branding reflects a calm, predictable design philosophy: minimal surface area, conservative defaults, and straightforward operational behavior. Branding is not a security or compliance claim; apply your organization’s governance and review standards before production use.

  ## Supported Databases

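The connector-level read-only settings named in the new README text (PostgreSQL's `default_transaction_read_only`, SQLite's readonly file mode) can be sketched like this. `pgReadOnlyConfig` and `sqliteReadOnlyUrl` are hypothetical helpers, not the project's connectors; verify the option names against your driver's documentation.

```javascript
// Sketch: connector-level read-only settings (hypothetical helpers; the
// fork's real connectors are in its source tree, not shown in this diff).
function pgReadOnlyConfig(connectionString) {
  return {
    connectionString,
    // node-postgres forwards `options` to the server at connection
    // startup; this makes every transaction on the connection read-only,
    // so INSERT/UPDATE/DELETE fail at the database level.
    options: "-c default_transaction_read_only=on",
  };
}

function sqliteReadOnlyUrl(file) {
  // SQLite URI filename syntax: mode=ro opens the database file read-only.
  return `file:${file}?mode=ro`;
}

console.log(pgReadOnlyConfig("postgres://localhost:5432/app").options);
console.log(sqliteReadOnlyUrl("app.db"));
```

As the README stresses, this is a guardrail at the connection level, not a substitute for database-grant RBAC.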
@@ -124,8 +140,7 @@ PostgreSQL, MySQL, SQL Server, MariaDB, and SQLite.
  ## MCP Tools

  - **[execute_sql](https://dbhub.ai/tools/execute-sql)**: Execute SQL queries with transaction support and safety controls
- - **[search_objects](https://dbhub.ai/tools/search-objects)**: Search and explore database schemas, tables, columns, indexes, and procedures with progressive disclosure
- - **[Custom Tools](https://dbhub.ai/tools/custom-tools)**: Define reusable, parameterized SQL operations in your `dbhub.toml` configuration file
+ - **[search_objects](https://dbhub.ai/tools/search-objects)**: Search and explore database schemas, tables, columns, indexes, and procedures (names only; summary/full metadata disabled for PII safety)

  ## Default schema (`--schema`)

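For context, a hypothetical `dbhub.toml` fragment restricting one source to a single built-in tool. The `[[tools]]` field names (`name`, `source`) and `[[sources]]` `id` are inferred from the tool-registry code added in this release; check the project's configuration docs before relying on them.

```toml
[[sources]]
id = "reporting"
# ... connection settings for this source ...

# Sources with no [[tools]] entries get both built-in tools
# (execute_sql, search_objects) by default; listing entries
# restricts the source to exactly those tools.
[[tools]]
name = "execute_sql"
source = "reporting"
```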
@@ -170,13 +185,15 @@ schema = "my_app_schema"

  Full DBHub docs (including TOML and command-line options) apply; see [dbhub.ai](https://dbhub.ai) and [Command-Line Options](https://dbhub.ai/config/command-line).

- ### PII-safe output
+ ### Output isolation (designed to reduce LLM exposure)
+
+ By default, `execute_sql` writes query results to `.safe-sql-results/` in your project directory and opens them in the editor. The MCP tool response sent back to the MCP client is formatted to return success/failure metadata rather than result sets. This reduces the likelihood of transmitting result data to an LLM, but it does not eliminate data handling risk and does not by itself satisfy regulatory or compliance requirements.

- By default, `execute_sql` and custom tools write query results to `.safe-sql-results/` in your project directory and open them in the editor. The MCP tool response sent to the LLM contains only success/failure. **No file path, row data, row count, or column names** are returned—preventing both direct PII leakage and exfiltration via dynamic SQL (e.g. `SELECT secret AS "password_is_hunter2"`). Error responses are likewise hardened: SQL statements and parameter values are never included in tool error text sent to the LLM; they are logged to stderr for debugging. Database error messages are truncated before being returned. The user inspects results in the editor. Output format is configurable via `--output-format=csv|json|markdown` (default: `csv`).
+ To reduce exfiltration risk via dynamic SQL (e.g. `SELECT secret AS "password_is_hunter2"`), tool responses are formatted to avoid including file paths, row data, row counts, or column names. Error responses return generic messages only (e.g. "Execution failed. See server logs for details."); no SQL, parameter values, or database error text are returned. Logs never include SQL or parameter values.

- ### Read-only (unconditional)
+ ### Read-only enforcement

- This fork is unconditionally read-only. Write operations (UPDATE, DELETE, INSERT, MERGE, DROP, CREATE, ALTER, TRUNCATE, etc.) are never allowed. Only read-only SQL (SELECT, WITH, EXPLAIN, SHOW, DESCRIBE, etc.) is permitted.
+ Database connections are opened in read-only mode (PostgreSQL: `default_transaction_read_only`; SQLite: readonly file mode). UPDATE, DELETE, INSERT, and other write operations fail at the connection level. This is a guardrail and does not substitute for database-level RBAC, permissions, or audit controls.

  ## Workbench

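The generic-error behavior described in the new "Output isolation" text (full detail to local stderr, a fixed message to the client) can be sketched as below. `handleExecuteError` and the log shape are assumptions for illustration, not the fork's actual code.

```javascript
// Sketch: generic error responses with local-only detail (illustrative;
// the real server's error path is in the project's source, not this diff).
function handleExecuteError(err, sql, params) {
  // Full context goes to stderr for local debugging only.
  console.error("execute_sql failed:", { sql, params, error: String(err) });
  // The MCP client sees a fixed generic message: no SQL text, parameter
  // values, or database error text can leak through the tool response.
  return {
    isError: true,
    content: [{ type: "text", text: "Execution failed. See server logs for details." }],
  };
}

const res = handleExecuteError(
  new Error('relation "people" does not exist'),
  "SELECT ssn FROM people WHERE id = $1",
  ["42"]
);
console.log(res.content[0].text);
```

Returning a constant string (rather than truncating the database error) is what makes the guarantee easy to audit: there is no path from query context into the response.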
@@ -0,0 +1,118 @@
+ // src/tools/builtin-tools.ts
+ var BUILTIN_TOOL_EXECUTE_SQL = "execute_sql";
+ var BUILTIN_TOOL_SEARCH_OBJECTS = "search_objects";
+ var BUILTIN_TOOLS = [
+   BUILTIN_TOOL_EXECUTE_SQL,
+   BUILTIN_TOOL_SEARCH_OBJECTS
+ ];
+
+ // src/tools/registry.ts
+ var ToolRegistry = class {
+   constructor(config) {
+     this.toolsBySource = this.buildRegistry(config);
+   }
+   /**
+    * Check if a tool name is a built-in tool
+    */
+   isBuiltinTool(toolName) {
+     return BUILTIN_TOOLS.includes(toolName);
+   }
+   /**
+    * Build the internal registry mapping sources to their enabled tools
+    */
+   buildRegistry(config) {
+     const registry = /* @__PURE__ */ new Map();
+     for (const tool of config.tools || []) {
+       if (!this.isBuiltinTool(tool.name)) {
+         throw new Error(
+           `Unknown tool '${tool.name}'. Valid tools: ${BUILTIN_TOOLS.join(", ")}. Custom tools are not supported.`
+         );
+       }
+       const existing = registry.get(tool.source) || [];
+       existing.push(tool);
+       registry.set(tool.source, existing);
+     }
+     for (const source of config.sources) {
+       if (!registry.has(source.id)) {
+         const defaultTools = BUILTIN_TOOLS.map((name) => {
+           if (name === "execute_sql") {
+             return { name: "execute_sql", source: source.id };
+           } else {
+             return { name: "search_objects", source: source.id };
+           }
+         });
+         registry.set(source.id, defaultTools);
+       }
+     }
+     return registry;
+   }
+   /**
+    * Get all enabled tool configs for a specific source
+    */
+   getEnabledToolConfigs(sourceId) {
+     return this.toolsBySource.get(sourceId) || [];
+   }
+   /**
+    * Get built-in tool configuration for a specific source
+    * Returns undefined if tool is not enabled or not a built-in
+    */
+   getBuiltinToolConfig(toolName, sourceId) {
+     if (!this.isBuiltinTool(toolName)) {
+       return void 0;
+     }
+     const tools = this.getEnabledToolConfigs(sourceId);
+     return tools.find((t) => t.name === toolName);
+   }
+   /**
+    * Get all unique tools across all sources (for tools/list response)
+    * Returns the union of all enabled tools
+    */
+   getAllTools() {
+     const seen = /* @__PURE__ */ new Set();
+     const result = [];
+     for (const tools of this.toolsBySource.values()) {
+       for (const tool of tools) {
+         if (!seen.has(tool.name)) {
+           seen.add(tool.name);
+           result.push(tool);
+         }
+       }
+     }
+     return result;
+   }
+   /**
+    * Get all built-in tool names that are enabled across any source
+    */
+   getEnabledBuiltinToolNames() {
+     const enabledBuiltins = /* @__PURE__ */ new Set();
+     for (const tools of this.toolsBySource.values()) {
+       for (const tool of tools) {
+         if (this.isBuiltinTool(tool.name)) {
+           enabledBuiltins.add(tool.name);
+         }
+       }
+     }
+     return Array.from(enabledBuiltins);
+   }
+ };
+ var globalRegistry = null;
+ function initializeToolRegistry(config) {
+   globalRegistry = new ToolRegistry(config);
+ }
+ function getToolRegistry() {
+   if (!globalRegistry) {
+     throw new Error(
+       "Tool registry not initialized. Call initializeToolRegistry first."
+     );
+   }
+   return globalRegistry;
+ }
+
+ export {
+   BUILTIN_TOOL_EXECUTE_SQL,
+   BUILTIN_TOOL_SEARCH_OBJECTS,
+   BUILTIN_TOOLS,
+   ToolRegistry,
+   initializeToolRegistry,
+   getToolRegistry
+ };
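The defaulting rule in `buildRegistry` above (explicit `[[tools]]` entries win; any source without entries gets both built-ins) can be exercised with a small sketch. The registry logic is condensed and inlined here so the example runs on its own; it mirrors, but is not, the shipped class.

```javascript
// Condensed reimplementation of ToolRegistry.buildRegistry for illustration.
const BUILTIN_TOOLS = ["execute_sql", "search_objects"];

function buildRegistry(config) {
  const registry = new Map();
  // Explicit tool entries are grouped by their source id.
  for (const tool of config.tools || []) {
    if (!BUILTIN_TOOLS.includes(tool.name)) {
      throw new Error(`Unknown tool '${tool.name}'`);
    }
    registry.set(tool.source, [...(registry.get(tool.source) || []), tool]);
  }
  // Sources with no explicit entries default to all built-in tools.
  for (const source of config.sources) {
    if (!registry.has(source.id)) {
      registry.set(
        source.id,
        BUILTIN_TOOLS.map((name) => ({ name, source: source.id }))
      );
    }
  }
  return registry;
}

const registry = buildRegistry({
  sources: [{ id: "main" }, { id: "reporting" }],
  tools: [{ name: "execute_sql", source: "reporting" }],
});

// "main" has no explicit entries, so it gets both built-ins;
// "reporting" is restricted to execute_sql only.
console.log(registry.get("main").map((t) => t.name));
console.log(registry.get("reporting").map((t) => t.name));
```

This also shows why the 1.1.1 registry rejects unknown names outright: with custom tools removed from the fork, any non-built-in entry in `dbhub.toml` is a configuration error rather than a tool definition.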