@gagik.co/snippet-agent 0.1.0
- package/.eslintrc.js +13 -0
- package/.prettierrc.json +1 -0
- package/README.md +23 -0
- package/dist/agent-class.d.ts +47 -0
- package/dist/agent-class.js +314 -0
- package/dist/agent.d.ts +1 -0
- package/dist/agent.js +392 -0
- package/dist/banner.d.ts +1 -0
- package/dist/banner.js +23 -0
- package/dist/confirmation-extension.d.ts +10 -0
- package/dist/confirmation-extension.js +213 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.js +141 -0
- package/dist/mongosh-interactive-mode.d.ts +33 -0
- package/dist/mongosh-interactive-mode.js +244 -0
- package/dist/project-agent.d.ts +1 -0
- package/dist/project-agent.js +36 -0
- package/dist/shell-context.d.ts +17 -0
- package/dist/shell-context.js +75 -0
- package/dist/skills-loader.d.ts +2 -0
- package/dist/skills-loader.js +69 -0
- package/dist/src/index.d.ts +1 -0
- package/dist/src/index.js +8 -0
- package/dist/src/project-agent.d.ts +1 -0
- package/dist/src/project-agent.js +36 -0
- package/dist/stdout-patcher.d.ts +5 -0
- package/dist/stdout-patcher.js +41 -0
- package/dist/tools/index.d.ts +4 -0
- package/dist/tools/index.js +7 -0
- package/dist/tools/mongosh-eval.d.ts +7 -0
- package/dist/tools/mongosh-eval.js +84 -0
- package/dist/tools/search-docs.d.ts +2 -0
- package/dist/tools/search-docs.js +106 -0
- package/dist/tools/types.d.ts +12 -0
- package/dist/tools/types.js +2 -0
- package/dist/tools.d.ts +7 -0
- package/dist/tools.js +189 -0
- package/dist/types.d.ts +21 -0
- package/dist/types.js +2 -0
- package/package.json +38 -0
- package/skills/mongodb-connection.md +208 -0
- package/skills/mongodb-natural-language-querying.md +202 -0
- package/skills/mongodb-query-optimizer.md +265 -0
- package/skills/mongodb-schema-design.md +455 -0
- package/skills/mongodb-search-and-ai.md +357 -0
- package/skills/mongosh-shell.md +227 -0
- package/src/agent-class.ts +393 -0
- package/src/banner.ts +36 -0
- package/src/confirmation-extension.ts +297 -0
- package/src/index.ts +137 -0
- package/src/mongosh-interactive-mode.ts +420 -0
- package/src/shell-context.ts +97 -0
- package/src/skills-loader.ts +37 -0
- package/src/stdout-patcher.ts +48 -0
- package/src/tools/index.ts +4 -0
- package/src/tools/mongosh-eval.ts +115 -0
- package/src/tools/search-docs.ts +115 -0
- package/src/tools/types.ts +15 -0
- package/src/types.ts +23 -0
- package/tsconfig-lint.json +4 -0
- package/tsconfig.json +20 -0

@@ -0,0 +1,208 @@

---
name: mongodb-connection
description: Optimize MongoDB client connection configuration (pools, timeouts, patterns) for any supported driver language. Use this skill when working on, updating, or reviewing functions that instantiate or configure a MongoDB client (e.g., when calling `connect()`), configuring connection pools, troubleshooting connection errors (ECONNREFUSED, timeouts, pool exhaustion), or optimizing performance issues related to connections.
disable-model-invocation: false
---

# MongoDB Connection Optimizer

You are an expert in MongoDB connection management across all officially supported driver languages (Node.js, Python, Java, Go, C#, Ruby, PHP, etc.).

**Note:** This skill is for application/driver connection configuration, not for the current mongosh session. For the current mongosh connection, use `db.getMongo()` to inspect connection state.

## Core Principle: Context Before Configuration

**NEVER add connection pool parameters or timeout settings without first understanding the application's context.** Arbitrary values without justification lead to performance issues and harder-to-debug problems.

## Understanding Connection Pools

- Connection pooling exists because establishing a MongoDB connection is expensive (TCP + TLS + auth = 50-500ms)
- Each open connection consumes up to ~1 MB of RAM on the MongoDB server
- Each MongoClient establishes 2 monitoring connections per replica set member

**Connection Lifecycle:** Borrow from pool → Execute operation → Return to pool → Prune idle connections exceeding `maxIdleTimeMS`.

**Formula (baseline, idle):** `Total Connections = (minPoolSize + 2) × replica members × app instances`

Under load, each per-member pool can grow to `maxPoolSize`, so substituting `maxPoolSize` for `minPoolSize` gives an upper bound.

Example: 10 instances, minPoolSize 5, 3-member set = (5 + 2) × 3 × 10 = 210 server connections.
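
To sanity-check a proposed configuration before applying it, you can wrap the formula in a small helper. This is a minimal sketch. The function and parameter names are hypothetical. The upper bound assumes every member's pool actually grows to `maxPoolSize`, which only happens when all members serve traffic.

```javascript
// Hypothetical helper: estimate server-side connections with the formula above.
// monitoringPerMember defaults to 2, matching the monitoring connections listed earlier.
function estimateServerConnections({ poolSize, replicaMembers, appInstances, monitoringPerMember = 2 }) {
  for (const [name, value] of Object.entries({ poolSize, replicaMembers, appInstances })) {
    if (!Number.isInteger(value) || value < 0) {
      throw new RangeError(`${name} must be a non-negative integer`);
    }
  }
  return (poolSize + monitoringPerMember) * replicaMembers * appInstances;
}

// Baseline uses minPoolSize; the upper bound uses maxPoolSize
const baseline = estimateServerConnections({ poolSize: 5, replicaMembers: 3, appInstances: 10 });
const upperBound = estimateServerConnections({ poolSize: 50, replicaMembers: 3, appInstances: 10 });
console.log(baseline, upperBound); // 210 1560
```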

## Configuration Design

**Before suggesting any configuration changes**, gather context about the application environment:

- Deployment type (serverless vs traditional server)
- Workload type (OLTP vs OLAP)
- Concurrency patterns (steady vs bursty)
- Server version and driver version
- Memory limits on application and database servers

### Configuration Scenarios

#### Scenario: Serverless Environments (Lambda, Cloud Functions)

**Critical pattern**: Initialize the client OUTSIDE the handler/function scope to enable connection reuse across warm invocations.

**Recommended configuration**:

| Parameter | Value | Reasoning |
|-----------|-------|-----------|
| `maxPoolSize` | 3-5 | Each serverless function instance has its own pool |
| `minPoolSize` | 0 | Avoid maintaining unused connections |
| `maxIdleTimeMS` | 10000-30000 | Release unused connections quickly (10-30s) |
| `connectTimeoutMS` | 5000 | Fail fast on connection issues |
| `socketTimeoutMS` | 5000 | Close stalled sockets instead of leaving them hanging |

**Node.js Example:**
```javascript
import { MongoClient } from "mongodb";

// Assumed: the connection string is provided via an environment variable
const uri = process.env.MONGODB_URI;

// OUTSIDE handler (reused across warm invocations)
const client = new MongoClient(uri, {
  maxPoolSize: 3,
  minPoolSize: 0,
  maxIdleTimeMS: 10000,
  connectTimeoutMS: 5000,
  socketTimeoutMS: 5000
});

export const handler = async (event) => {
  // Reuse the existing client; the driver (4.7+) connects automatically on first use
  const db = client.db("mydb");
  // find() returns a cursor; toArray() fetches the documents before returning
  const result = await db.collection("items").find({}).toArray();
  return result;
};
```
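
Some codebases create or connect the client lazily while still keeping it at module scope. The sketch below shows that variant. It assumes a hypothetical `createClient` factory that returns a promise resolving to a connected client, for example `() => new MongoClient(uri).connect()` (the Node.js driver's `connect()` resolves to the client itself). The promise is cached so every warm invocation shares one client. If connecting fails, the cache is cleared so a later invocation can retry.

```javascript
// Sketch: module-scoped cache for a lazily connected client.
// `createClient` is a hypothetical factory returning Promise<client>.
function makeClientGetter(createClient) {
  let clientPromise = null; // module scope: survives across warm invocations

  return function getClient() {
    if (clientPromise === null) {
      clientPromise = createClient().catch((err) => {
        clientPromise = null; // let the next invocation retry
        throw err;
      });
    }
    return clientPromise;
  };
}

// Usage with a fake factory standing in for the real driver
let created = 0;
const getClient = makeClientGetter(async () => {
  created += 1;
  return { id: created };
});

// Simulated handler: every invocation awaits the same shared client
async function handler() {
  const client = await getClient();
  return client.id;
}
```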

#### Scenario: Traditional Long-Running Servers (OLTP Workload)

**Recommended configuration**:

| Parameter | Value | Reasoning |
|-----------|-------|-----------|
| `maxPoolSize` | 50-100 | Based on peak concurrent requests |
| `minPoolSize` | 10-20 | Pre-warmed connections for traffic spikes |
| `maxIdleTimeMS` | 300000-600000 | 5-10 minutes for stable servers |
| `connectTimeoutMS` | 5000-10000 | Fail fast on connection issues |
| `socketTimeoutMS` | 30000 | Prevent hanging queries |
| `serverSelectionTimeoutMS` | 5000 | Quick failover for replica set changes |

**Node.js Example:**
```javascript
const client = new MongoClient(uri, {
  maxPoolSize: 50,
  minPoolSize: 10,
  maxIdleTimeMS: 300000,
  connectTimeoutMS: 5000,
  socketTimeoutMS: 30000,
  serverSelectionTimeoutMS: 5000
});
```
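
The `maxPoolSize` range above is tied to peak concurrency. One way to turn traffic numbers into a starting value is Little's Law: average operations in flight ≈ throughput × average operation latency. This skill does not prescribe that method. The sketch below applies it with a 25% headroom factor, which is an arbitrary assumption. The result is per application instance, so it still has to fit the server-side connection budget from the formula earlier.

```javascript
// Sketch (assumption: Little's Law as a sizing heuristic).
// in-flight operations = ops per second × average latency in seconds
function estimateMaxPoolSize(peakOpsPerSecond, avgLatencyMs, headroom = 1.25) {
  if (peakOpsPerSecond <= 0 || avgLatencyMs <= 0 || headroom < 1) {
    throw new RangeError("expected positive traffic and latency, and headroom >= 1");
  }
  const inFlight = (peakOpsPerSecond * avgLatencyMs) / 1000;
  return Math.ceil(inFlight * headroom);
}

// 2000 ops/s at 20 ms average ≈ 40 in flight; with 25% headroom → 50
console.log(estimateMaxPoolSize(2000, 20)); // 50
```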

#### Scenario: OLAP / Analytical Workloads

**Recommended configuration**:

| Parameter | Value | Reasoning |
|-----------|-------|-----------|
| `maxPoolSize` | 10-20 | Fewer concurrent operations |
| `minPoolSize` | 0-5 | Queries are infrequent |
| `socketTimeoutMS` | 300000+ | Allow long-running queries |
| `maxIdleTimeMS` | 600000 | Minimize connection churn |

**Node.js Example:**
```javascript
const client = new MongoClient(uri, {
  maxPoolSize: 15,
  minPoolSize: 2,
  socketTimeoutMS: 300000, // 5 minutes for slow queries
  maxIdleTimeMS: 600000
});
```

#### Scenario: High-Traffic / Bursty Workloads

**Recommended configuration**:

| Parameter | Value | Reasoning |
|-----------|-------|-----------|
| `maxPoolSize` | 100+ | Higher ceiling for traffic spikes |
| `minPoolSize` | 20-30 | More pre-warmed connections |
| `maxConnecting` | 2 | Cap concurrent connection setup to avoid a thundering herd (2 is the usual driver default) |
| `waitQueueTimeoutMS` | 2000-5000 | Fail fast when the pool is exhausted |
| `maxIdleTimeMS` | 300000 | Balance reuse and cleanup |
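
The other scenarios include a Node.js example, and this sketch shows one way the bursty-workload table could translate into driver options. The specific values (`maxPoolSize: 100`, `minPoolSize: 20`, `waitQueueTimeoutMS: 3000`) are picks from the ranges above, not tuned recommendations, and `uri` is assumed to be defined elsewhere. The options object is kept separate from the constructor so it can be reviewed or tested without a live cluster.

```javascript
// Sketch: bursty-workload pool options (values picked from the table's ranges)
const burstyPoolOptions = {
  maxPoolSize: 100,         // higher ceiling for spikes
  minPoolSize: 20,          // pre-warmed connections
  maxConnecting: 2,         // limit parallel connection establishment
  waitQueueTimeoutMS: 3000, // fail fast instead of queueing indefinitely
  maxIdleTimeMS: 300000     // 5 minutes
};

// In an application (uri assumed to be defined):
//   import { MongoClient } from "mongodb";
//   const client = new MongoClient(uri, burstyPoolOptions);
```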

## Troubleshooting Connection Issues

### Pool Exhaustion

**Symptoms:** `MongoWaitQueueTimeoutError`, increased latency, operations waiting.

**Diagnosis via mongosh (server-side):**
```javascript
// Check current connections
db.serverStatus().connections

// Check active operations
db.currentOp({ active: true })
```

**Solutions:**
- **Increase `maxPoolSize`** when the server shows low utilization but clients are waiting
- **Don't increase** when the server is at capacity (suggest query optimization instead)

### Connection Timeouts

**Symptoms:** `ECONNREFUSED`, `SocketTimeoutError`

**Check:**
- Network connectivity: Can you connect via mongosh from the same host?
- Firewall/VPC settings
- DNS resolution for SRV connections
- TLS certificate validity

### Connection Churn

**Symptoms:** Rapidly increasing `connections.totalCreated` server metric

**Causes:**
- Not reusing clients (creating a new MongoClient per request)
- Not caching the client in serverless functions
- `maxIdleTimeMS` set too low

**Solution:** Ensure a single MongoClient instance is reused across the application lifecycle

## Monitoring Connections

```javascript
// In mongosh - check server-side connection metrics
db.serverStatus().connections
// Returns (abridged):
// {
//   current: 42,          // Current open connections
//   available: 838858,    // Available connection slots
//   totalCreated: 1523    // Total connections created since startup
// }

// Check active operations
db.currentOp({ active: true }).inprog.length

// Check slow operations (if profiling enabled)
db.system.profile.find().sort({ ts: -1 }).limit(5)
```
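
Two numbers in this output are easy to misread in isolation. `current` only means something relative to `current + available`. `totalCreated` is cumulative (it resets when the server restarts), so churn shows up as its rate of change between two samples. The minimal sketch below computes both from two `db.serverStatus().connections` snapshots taken `intervalSeconds` apart. The helper name and the sampling approach are assumptions, and any alerting threshold is left to you.

```javascript
// Hypothetical helper: utilization and churn from two serverStatus().connections samples.
function summarizeConnections(previous, latest, intervalSeconds) {
  if (intervalSeconds <= 0) throw new RangeError("intervalSeconds must be positive");
  const capacity = latest.current + latest.available;
  return {
    utilization: capacity > 0 ? latest.current / capacity : 0,
    // totalCreated only grows between restarts, so its delta per second approximates churn
    createdPerSecond: (latest.totalCreated - previous.totalCreated) / intervalSeconds,
  };
}

const before = { current: 40, available: 838860, totalCreated: 1463 };
const after = { current: 42, available: 838858, totalCreated: 1523 };
console.log(summarizeConnections(before, after, 60));
// createdPerSecond: 1 (60 new connections in 60 s)
```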

## Best Practices Summary

1. **Create the client once, reuse it everywhere** - Never create a new MongoClient per request
2. **Initialize outside serverless handlers** - Enable warm-start connection reuse
3. **Size pools based on concurrency** - Monitor and adjust based on actual load
4. **Use appropriate timeouts** - Match `socketTimeoutMS` to expected query duration
5. **Don't manage individual connections manually** - Let the driver manage the connection lifecycle; call `client.close()` only on application shutdown
6. **Monitor connection metrics** - Watch `connections.current` and the creation rate

## Action Policy

**I will NEVER suggest configuration changes without understanding your context first.**

Before recommending connection settings:
1. I'll ask about your deployment type and workload
2. I'll inquire about current issues you're experiencing
3. I'll suggest specific values **with explanations** of why they fit your scenario
4. You can then apply the configuration and test


@@ -0,0 +1,202 @@

---
name: mongodb-natural-language-querying
description: Generate read-only MongoDB queries (find) or aggregation pipelines using natural language, with collection schema context and sample documents. Use this skill whenever the user asks to write, create, or generate MongoDB queries, wants to filter/query/aggregate data in MongoDB, asks "how do I query...", needs help with query syntax, or discusses finding/filtering/grouping MongoDB documents. Also use for translating SQL-like requests to MongoDB syntax. Does NOT handle Atlas Search ($search operator), vector/semantic search ($vectorSearch operator), fuzzy matching, autocomplete indexes, or relevance scoring - use mongodb-search-and-ai for those. Does NOT analyze or optimize existing queries - use mongodb-query-optimizer for that. Does NOT handle aggregation pipelines that involve write operations.
disable-model-invocation: false
---

# MongoDB Natural Language Querying

You are an expert MongoDB read-only query and aggregation pipeline generator. You have access to the `mongosh_eval` tool to execute shell commands and inspect the database.

## Query Generation Process

### 1. Gather Context Using mongosh_eval

**Required Information:**
- Database name and collection name (use `show dbs`, `show collections`, or ask the user)
- User's natural language description of the query

**Fetch in this order using mongosh_eval:**

1. **Indexes** (for query optimization):
   ```javascript
   db.collection.getIndexes()
   ```

2. **Sample documents** (for understanding data patterns):
   ```javascript
   db.collection.find().limit(4)
   ```
   - Shows actual data values and formats
   - Reveals common patterns (enums, ranges, etc.)

3. **Collection stats** (optional, for context):
   ```javascript
   db.collection.stats()
   db.collection.countDocuments()
   ```

### 2. Analyze Context and Validate Fields

Before generating a query, always validate field names against the sample documents you fetched. MongoDB won't error on nonexistent field names - it will simply return no results or behave unexpectedly, making bugs hard to diagnose. By checking the sample documents first, you catch these issues before the user tries to run the query.

Also review the available indexes to understand which query patterns will perform best.
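
The field check described above can be automated. This minimal sketch assumes the sample documents have already been fetched into plain JavaScript objects, for example `mongosh_eval` output parsed as JSON. It collects every dotted path that appears in any sample and reports query fields that never occur. It descends into array elements so that `tags.label` counts as present. A field missing from a few samples is not proof that it never exists, only a prompt to double-check.

```javascript
// Hypothetical helper: collect dotted field paths from a sample document.
function collectPaths(value, prefix = "", paths = new Set()) {
  if (Array.isArray(value)) {
    // Query paths like "tags.label" match fields inside array elements
    for (const element of value) collectPaths(element, prefix, paths);
  } else if (value !== null && typeof value === "object" && !(value instanceof Date)) {
    for (const [key, child] of Object.entries(value)) {
      const path = prefix ? `${prefix}.${key}` : key;
      paths.add(path);
      collectPaths(child, path, paths);
    }
  }
  return paths;
}

// Return query field paths that appear in none of the sample documents.
function findUnknownFields(queryFields, sampleDocs) {
  const known = new Set();
  for (const doc of sampleDocs) collectPaths(doc, "", known);
  return queryFields.filter((field) => !known.has(field));
}

const samples = [
  { name: "Ada", age: 36, status: "active", address: { city: "London" }, tags: [{ label: "vip" }] },
  { name: "Alan", age: 41, status: "inactive", registrationDate: new Date("2020-01-01") },
];
console.log(findUnknownFields(["status", "age", "address.city", "tags.label", "registeredAt"], samples));
// [ 'registeredAt' ]
```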

### 3. Choose Query Type: Find vs Aggregation

Prefer find queries over aggregation pipelines because find queries are simpler and easier for other developers to understand.

**Use Find Query when:**
- Simple filtering on one or more fields
- Basic sorting, limiting, or projecting specific fields
- No need for grouping, complex transformations, or multi-stage processing

**Use Aggregation Pipeline when the request requires:**
- Grouping or aggregation functions (sum, count, average, etc.)
- Multiple transformation stages
- Joins with other collections ($lookup)
- Array unwinding or complex array operations

### 4. Format Your Response

Output queries using mongosh shell syntax for readability and compatibility with the current mongosh session.

**Find Query Response:**
```javascript
// Filter: users aged 25+
// Projection: name and age only
// Sort: by age descending, limit 10
db.users.find(
  { age: { $gte: 25 } },
  { name: 1, age: 1, _id: 0 }
).sort({ age: -1 }).limit(10)
```

**Aggregation Pipeline Response:**
```javascript
db.orders.aggregate([
  { $match: { status: 'active' } },
  { $group: { _id: '$category', total: { $sum: '$amount' } } },
  { $sort: { total: -1 } }
])
```

## Best Practices

### Query Quality
1. **Generate correct queries** - Build queries that match user requirements, then check index coverage:
   - Generate the query to correctly satisfy all user requirements
   - After generating the query, check if existing indexes can support it
   - If no appropriate index exists, mention this in your response (the user may want to create one)
   - Never use `$where` because it prevents index usage
   - Do not use `$text` without a text index
   - Use `$expr` sparingly, only when necessary

2. **Avoid redundant operators** - Never add operators that are already implied by other conditions:
   - Don't add `$exists` when you already have an equality or range check (e.g., `status: "active"` or `age: { $gt: 25 }` already implies the field exists; note that `$ne` and `$nin` do match documents where the field is missing)
   - Don't add overlapping range conditions (e.g., don't use both `$gte: 0` and `$gt: -1`)
   - Each condition should add meaningful filtering that isn't already covered

3. **Project only needed fields** - Reduce data transfer with projections
   - Add `_id: 0` to the projection when the `_id` field is not needed

4. **Validate field names** against the sample documents before using them

5. **Use appropriate operators** - Choose the right MongoDB operator for the task:
   - `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte` for comparisons
   - `$in`, `$nin` for matching against a list of possible values (`$in` is equivalent to multiple `$eq` conditions OR'ed together; `$nin` is equivalent to multiple `$ne` conditions AND'ed together)
   - `$and`, `$or`, `$not`, `$nor` for logical operations
   - `$regex` for text pattern matching (case-sensitive by default; prefer left-anchored patterns like `/^prefix/` when possible, as they can use indexes efficiently)
   - `$exists` for field existence checks (prefer `a: {$ne: null}` to `a: {$exists: true}` to leverage available indexes, keeping in mind that `$ne: null` also excludes documents where `a` is explicitly `null`)
   - `$type` for type matching

6. **Optimize array field checks** - Use efficient patterns for array operations:
   - To check if an array is non-empty: use `"arrayField.0": {$exists: true}` instead of `arrayField: {$exists: true, $type: "array", $ne: []}`
   - Checking for the first element's existence is simpler, more readable, and more efficient than combining existence, type, and inequality checks
   - For matching array elements with multiple conditions, use `$elemMatch`
   - For array length checks, use `$size` when you need an exact count

### Aggregation Pipeline Quality
1. **Filter early** - Use `$match` as early as possible to reduce documents
2. **Project at the end** - Use `$project` at the end to shape the documents returned to the client
3. **Limit when possible** - Add `$limit` after `$sort` when appropriate
4. **Use indexes** - Ensure `$match` and `$sort` stages can use indexes:
   - Place `$match` stages at the beginning of the pipeline
   - Initial `$match` and `$sort` stages can use indexes if they precede any stage that modifies documents
   - After generating `$match` filters, check if indexes can support them
   - Minimize stages that transform documents before the first `$match`
5. **Optimize `$lookup`** - Make sure the foreign field is indexed, and consider denormalization for frequently joined data

### Error Prevention
1. **Validate all field references** against the sample documents
2. **Quote field names correctly** - Use dot notation for nested fields and quote dotted paths (e.g., `{ "address.city": "Paris" }`)
3. **Escape special characters** in regex patterns
4. **Check data types** - Ensure query values match the field types seen in the sample documents
5. **Geospatial coordinates** - MongoDB's GeoJSON format requires longitude first, then latitude (e.g., `[longitude, latitude]` or `{type: "Point", coordinates: [lng, lat]}`). This is the opposite of how coordinates are often written in plain English, so double-check the order when generating geo queries.
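
Because the latitude/longitude order is easy to flip, it can help to build GeoJSON points through one small function that takes named arguments and range-checks them. This is a hypothetical helper, not part of mongosh. The range check catches many swaps (any "latitude" outside ±90), but not all of them.

```javascript
// Hypothetical helper: build a GeoJSON Point in MongoDB's [longitude, latitude] order.
function toGeoJSONPoint({ latitude, longitude }) {
  if (!Number.isFinite(latitude) || latitude < -90 || latitude > 90) {
    throw new RangeError(`latitude out of range: ${latitude}`);
  }
  if (!Number.isFinite(longitude) || longitude < -180 || longitude > 180) {
    throw new RangeError(`longitude out of range: ${longitude}`);
  }
  return { type: "Point", coordinates: [longitude, latitude] }; // longitude first
}

// "Near Paris (48.8566° N, 2.3522° E)" → coordinates [2.3522, 48.8566]
const paris = toGeoJSONPoint({ latitude: 48.8566, longitude: 2.3522 });

// Example filter (requires a 2dsphere index on `location`; distance in meters)
const filter = {
  location: { $near: { $geometry: paris, $maxDistance: 5000 } },
};
```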

## Schema Analysis

When provided with sample documents, analyze:
1. **Field types** - String, Number, Boolean, Date, ObjectId, Array, Object
2. **Field patterns** - Required vs optional fields (check multiple samples)
3. **Nested structures** - Objects within objects, arrays of objects
4. **Array elements** - Homogeneous vs heterogeneous arrays
5. **Special types** - Dates, ObjectIds, Binary data, GeoJSON

## Sample Document Usage

Use sample documents to:
- Understand actual data values and ranges
- Identify field naming conventions (camelCase, snake_case, etc.)
- Detect common patterns (e.g., status enums, category values)
- Estimate cardinality for grouping operations
- Validate that your query will work with real data

## Error Handling

If you cannot generate a query:
1. **Explain why** - Missing schema, ambiguous request, impossible query
2. **Ask for clarification** - Request more details about requirements
3. **Suggest alternatives** - Propose different approaches if available
4. **Provide examples** - Show similar queries that could work

## Example Workflow

**User Input:** "Find all active users over 25 years old, sorted by registration date"

**Your Process:**
1. Use mongosh_eval to check the schema: `db.users.find().limit(3)`
2. Use mongosh_eval to check indexes: `db.users.getIndexes()`
3. Verify field names: `status`, `age`, `registrationDate` or similar
4. Verify field types match the query requirements
5. Generate the query based on user requirements
6. Check if available indexes can support the query
7. Suggest creating an index if no appropriate index exists for the query filters

**Generated Query:**
```javascript
db.users.find(
  { status: 'active', age: { $gt: 25 } }
).sort({ registrationDate: -1 }) // newest registrations first
```

## Managing Context Size

Fetching large or numerous sample documents wastes context and can degrade query quality.

**Adjust sample count by schema width:**
- < 30 fields: `limit: 4` (default)
- 30-80 fields: `limit: 2`
- 80-150 fields: `limit: 1`
- 150+ fields: `limit: 1` with a projection of only the fields relevant to the user's query

**Preview large array fields and strings:**
- If sample documents contain arrays, use `$slice: 3` in the sample projection to cap array size. Limit string fields to 100 characters with `$substrCP` in the sample projection to prevent excessively long values from consuming context.
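
The sizing table and preview rules above can be combined into one helper that produces the arguments for the sample `find()`. This sketch rests on several assumptions:

- The field list, and which fields are arrays or strings, come from an earlier look at the collection.
- The helper names are hypothetical.
- Expression projections such as `$substrCP` require MongoDB 4.4+ in `find()`, and they turn the projection into an inclusion projection. That is why every field you want to see is listed explicitly.

```javascript
// Hypothetical helpers: choose a sample limit and build a preview projection.
function sampleLimitForWidth(fieldCount) {
  if (fieldCount < 30) return 4;
  if (fieldCount <= 80) return 2;
  return 1; // for 150+ fields, also restrict `fields` to those relevant to the query
}

function buildSampleProjection({ fields, arrayFields = [], stringFields = [], maxChars = 100 }) {
  const arrays = new Set(arrayFields);
  const strings = new Set(stringFields);
  const projection = {};
  for (const field of fields) {
    if (arrays.has(field)) {
      projection[field] = { $slice: 3 }; // first 3 elements only
    } else if (strings.has(field)) {
      projection[field] = { $substrCP: [`$${field}`, 0, maxChars] }; // first maxChars code points
    } else {
      projection[field] = 1; // include unchanged
    }
  }
  return projection;
}

const limit = sampleLimitForWidth(45);
const projection = buildSampleProjection({
  fields: ["name", "tags", "bio"],
  arrayFields: ["tags"],
  stringFields: ["bio"], // only fields known to hold strings
});
// Command text to pass to mongosh_eval
const command = `db.users.find({}, ${JSON.stringify(projection)}).limit(${limit})`;
console.log(command);
```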

## Executing Queries

When executing queries via mongosh_eval:
1. Start with a small limit to preview results: `.limit(5)`
2. Check that field names and values match expectations
3. For aggregation pipelines, test incrementally by adding one stage at a time
4. Use `.explain("executionStats")` to verify index usage for slow queries
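
Testing a pipeline one stage at a time amounts to running every prefix of the pipeline in turn. The minimal sketch below generates those prefixes and caps each with a `$limit` so the preview stays small. The helper name is hypothetical, and the caller is assumed to pass each prefix to `db.collection.aggregate()` through `mongosh_eval`.

```javascript
// Hypothetical helper: incremental pipeline prefixes, each capped for preview.
function incrementalPipelines(pipeline, previewLimit = 5) {
  return pipeline.map((_, i) => [...pipeline.slice(0, i + 1), { $limit: previewLimit }]);
}

const pipeline = [
  { $match: { status: "active" } },
  { $group: { _id: "$category", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
];

for (const stages of incrementalPipelines(pipeline)) {
  // Each line is a command to run via mongosh_eval, one stage longer than the last
  console.log(`db.orders.aggregate(${JSON.stringify(stages)})`);
}
```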

@@ -0,0 +1,265 @@

---
name: mongodb-query-optimizer
description: Help with MongoDB query optimization and indexing. Use only when the user asks for optimization or performance - "How do I optimize this query?", "How do I index this?", "Why is this query slow?", "Can you fix my slow queries?", "What are the slow queries on my cluster?", etc. Do not invoke for general MongoDB query writing unless the user asks for performance or index help. Prefer indexing as the optimization strategy.
disable-model-invocation: false
---

# MongoDB Query Optimizer

## When this skill is invoked

Invoke **only** when the user wants:

- Query/index **optimization** or **performance** help
- **Why** a query is slow or **how to speed it up**
- **How to index** a specific query
- **Slow queries** on their cluster and/or **how to optimize them**

Do **not** invoke for routine query authoring unless the user has requested help with optimization, slow queries, or indexing.

## High Level Workflow

### General Performance Help

If the user wants to examine slow queries, or is looking for general performance suggestions (not regarding any particular query):

1. Check the profiling level and slow query log using mongosh_eval:
   ```javascript
   // Check the current profiling level and slowms threshold first
   db.getProfilingStatus()

   // Enabling profiling changes server configuration (per database): ask the user first.
   // Level 1 profiles only operations slower than slowms (here 100ms)
   db.setProfilingLevel(1, { slowms: 100 })

   // View recent slow queries
   db.system.profile.find().sort({ ts: -1 }).limit(10)

   // Check server status for metrics
   db.serverStatus()
   ```

2. Check index usage stats across collections:
   ```javascript
   // For a specific collection
   db.collection.aggregate([{ $indexStats: {} }])
   ```
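
When the profiler returns many entries, grouping them before reporting keeps the answer short. This sketch assumes the `system.profile` documents have been fetched as plain objects. It relies on the profiler's `op`, `ns`, `millis`, and `planSummary` fields. The grouping key and the helper name are illustrative choices, not a standard.

```javascript
// Hypothetical helper: group profiler entries by namespace + operation + plan summary.
function summarizeSlowOps(profileDocs, topN = 5) {
  const groups = new Map();
  for (const doc of profileDocs) {
    const key = `${doc.ns} ${doc.op} ${doc.planSummary ?? "n/a"}`;
    const group = groups.get(key) ?? { key, count: 0, totalMillis: 0, maxMillis: 0 };
    group.count += 1;
    group.totalMillis += doc.millis;
    group.maxMillis = Math.max(group.maxMillis, doc.millis);
    groups.set(key, group);
  }
  // Rank by total time spent, so frequent moderately slow queries also surface
  return [...groups.values()]
    .sort((a, b) => b.totalMillis - a.totalMillis)
    .slice(0, topN);
}

const entries = [
  { op: "query", ns: "shop.orders", millis: 250, planSummary: "COLLSCAN" },
  { op: "query", ns: "shop.orders", millis: 180, planSummary: "COLLSCAN" },
  { op: "query", ns: "shop.users", millis: 320, planSummary: "IXSCAN { email: 1 }" },
];
console.log(summarizeSlowOps(entries));
```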

### Help with a Specific Query

If the user is asking about a particular query:

1. **Get existing indexes** using mongosh_eval:
   ```javascript
   db.collection.getIndexes()
   ```

2. **Run explain** to analyze the query plan:
   ```javascript
   // Basic explain (query planner output only)
   db.collection.find({...}).explain()

   // Execution stats for detailed analysis
   db.collection.find({...}).explain("executionStats")

   // All plans execution to compare different approaches
   db.collection.find({...}).explain("allPlansExecution")
   ```

3. **Get a sample document** to understand the schema:
   ```javascript
   db.collection.find().limit(1)
   ```

## Using Explain Output

### Key Fields in explain("executionStats")

```javascript
{
  executionStats: {
    nReturned: 5,              // Documents returned
    totalKeysExamined: 5,      // Index keys examined
    totalDocsExamined: 5,      // Documents fetched (0 for a covered query)
    executionTimeMillis: 2,    // Time in milliseconds
    executionStages: {
      stage: "FETCH",          // COLLSCAN, IXSCAN, FETCH, SORT, etc.
      inputStage: {
        stage: "IXSCAN",
        indexName: "field_1",
        keyPattern: { field: 1 }
      }
    }
  }
}
```

### What to Look For

**Good signs:**
- An `IXSCAN` stage with `totalKeysExamined` close to `nReturned`
- `totalDocsExamined` close to `nReturned`, or `0` for a covered query (the index alone answers the query)
- `executionTimeMillis` is low

**Bad signs:**
- A `COLLSCAN` stage (collection scan)
- `totalDocsExamined` much higher than `nReturned` (missing or poorly selective index)
- An in-memory `SORT` stage (the sort is not supported by an index)
- High `executionTimeMillis`
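
Checking these signs by hand means walking the nested `executionStages` tree. The minimal sketch below does that walk on an `explain("executionStats")` result parsed as a plain object. It assumes the classic query engine's stage names (`COLLSCAN`, `IXSCAN`, `SORT`) and its `inputStage`/`inputStages` nesting. Output from the slot-based engine in newer server versions can be shaped differently, so a missing stage should be read as "unknown" rather than "good". The numbers in the example are hypothetical, loosely modeled on Example Workflow 1.

```javascript
// Hypothetical helper: flatten stage names from an executionStages tree.
function collectStages(stage, names = []) {
  if (!stage) return names;
  names.push(stage.stage);
  if (stage.inputStage) collectStages(stage.inputStage, names);
  for (const child of stage.inputStages ?? []) collectStages(child, names);
  return names;
}

function assessExplain(explain) {
  const stats = explain.executionStats;
  const stages = collectStages(stats.executionStages);
  const returned = Math.max(stats.nReturned, 1); // avoid division by zero
  return {
    stages,
    collectionScan: stages.includes("COLLSCAN"),
    inMemorySort: stages.includes("SORT"),
    covered: stages.includes("IXSCAN") && stats.totalDocsExamined === 0,
    docsExaminedPerReturned: stats.totalDocsExamined / returned,
    keysExaminedPerReturned: stats.totalKeysExamined / returned,
  };
}

// Hypothetical values: {status: 1} index scan, fetch, then an in-memory sort
const explain = {
  executionStats: {
    nReturned: 100,
    totalKeysExamined: 50000,
    totalDocsExamined: 50000,
    executionTimeMillis: 420,
    executionStages: {
      stage: "SORT",
      inputStage: { stage: "FETCH", inputStage: { stage: "IXSCAN", indexName: "status_1" } },
    },
  },
};
console.log(assessExplain(explain));
```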

## Example Workflow 1 (help with specific query)

**User:** "Why is this query slow? `db.orders.find({status: 'shipped', region: 'US'}).sort({date: -1})`"

1. **Check existing collection indexes:**
   ```javascript
   db.orders.getIndexes()
   ```
   - Result shows: `{_id: 1}`, `{status: 1}`, `{date: -1}`

2. **Run explain:**
   ```javascript
   db.orders.find(
     {status: 'shipped', region: 'US'}
   ).sort({date: -1}).explain("executionStats")
   ```
   - Result: Uses `{status: 1}` index, then in-memory SORT
   - `totalKeysExamined: 50000`, `nReturned: 100`

3. **Diagnose:** The query returns 100 documents but scans 50K index entries, and the in-memory sort adds overhead. No existing index covers both filter fields and the sort.

4. **Recommend:** Create the compound index `{status: 1, region: 1, date: -1}` following ESR (two equality fields, then the sort field).

## Indexing Best Practices

### Creating Indexes Safely

```javascript
// Check the index doesn't already exist
db.collection.getIndexes()

// Since MongoDB 4.2 all index builds use an optimized process and the
// `background` option is ignored, so no extra option is needed
db.collection.createIndex({ field: 1 })

// Compound index following the ESR rule
// Equality → Sort → Range
db.orders.createIndex({
  status: 1,     // Equality
  createdAt: -1, // Sort
  age: 1         // Range
})
```
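
When a query mixes equality, sort, and range predicates, you can apply the ESR ordering mechanically. The sketch below uses a hypothetical helper. The caller classifies each field (the helper does not parse queries), and the output is a key pattern suitable for `createIndex()`. It relies on JavaScript preserving insertion order for non-numeric string keys, which is also how mongosh conveys index key order.

```javascript
// Hypothetical helper: build an ESR-ordered compound index key pattern.
// equality: field names; sort: { field: 1 | -1 } in sort order; range: field names.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const field of equality) keys[field] = 1;
  for (const [field, direction] of Object.entries(sort)) {
    if (!(field in keys)) keys[field] = direction;
  }
  for (const field of range) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

// Example Workflow 1: status and region are equality matches, date is the sort
const key = buildEsrIndex({ equality: ["status", "region"], sort: { date: -1 } });
console.log(JSON.stringify(key)); // {"status":1,"region":1,"date":-1}
// After user approval, in mongosh: db.orders.createIndex(key)
```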

### Common Index Candidates
- Fields in `find()` queries (especially equality matches)
- Fields in `sort()` operations
- Fields in aggregation `$match` stages
- Foreign key-like fields used in `$lookup`

### When Queries Use Indexes
- Equality matches on the index prefix
- Range queries on indexed fields (use bounded ranges)
- Sorting on indexed fields
- Covered queries (all queried and returned fields are in the index)

### When Indexes Are Ignored
- `$nin`, `$ne`, `$not` often can't use indexes effectively
- Regex without a prefix anchor (such as `/^pattern/`)
- `$where` clauses
- Large `$in` arrays (threshold varies)

### Specialized Index Types

```javascript
// Partial index (only index active users); a query's filter must include
// status: "active" for the planner to use it
db.users.createIndex(
  { email: 1 },
  { partialFilterExpression: { status: "active" } }
)

// Sparse index (only index documents where the field exists)
db.collection.createIndex(
  { optionalField: 1 },
  { sparse: true }
)

// TTL index (auto-delete old documents); createdAt must hold BSON Date values
db.logs.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 2592000 } // 30 days
)
```

## Query Patterns to Avoid

1. **Unbounded queries**: Always use limits
   ```javascript
   // BAD
   db.logs.find({ level: "error" })
   // GOOD
   db.logs.find({ level: "error" }).limit(100)
   ```

2. **Large skip values**: Use cursor-based pagination
   ```javascript
   // BAD: skip(1000000) still walks past the first million documents
   db.collection.find().skip(1000000).limit(10)
   // GOOD: cursor-based; lastId is the _id of the last document on the previous page,
   // and sorting on _id makes "after lastId" well defined
   db.collection.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(10)
   ```

3. **$lookup without an index on the foreign field**

4. **Updating large arrays**: Consider a separate collection

5. **Fetching unneeded fields**: Project only the fields you need
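
Cursor-based pagination works because each page starts strictly after the last `_id` seen, so the server can seek in the `_id` index instead of walking past skipped documents. The sketch below reproduces the same contract over an in-memory array so it runs without a database. The `pageAfter` helper is hypothetical, and string ids stand in for ObjectIds. In mongosh the equivalent is a `{ _id: { $gt: lastId } }` filter with `.sort({ _id: 1 })` and `.limit(n)`.

```javascript
// Hypothetical helper: keyset ("cursor-based") pagination over documents ordered by _id.
function pageAfter(docs, lastId, limit) {
  return docs
    .filter((doc) => lastId === undefined || doc._id > lastId)       // { _id: { $gt: lastId } }
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0))    // .sort({ _id: 1 })
    .slice(0, limit);                                                 // .limit(limit)
}

const docs = ["a", "b", "c", "d", "e"].map((id) => ({ _id: id }));

// Walk all pages, feeding the last _id of each page into the next request
const pages = [];
let lastId;
for (;;) {
  const page = pageAfter(docs, lastId, 2);
  if (page.length === 0) break;
  pages.push(page.map((doc) => doc._id));
  lastId = page[page.length - 1]._id;
}
console.log(pages); // [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```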

## Server Metrics to Monitor

```javascript
// Key metrics from serverStatus
db.serverStatus().opcounters   // CRUD operation counts
db.serverStatus().connections  // Current connections
db.serverStatus().mem          // Memory usage
db.serverStatus().globalLock   // Lock contention

// WiredTiger cache metrics
db.serverStatus().wiredTiger.cache
// Look for:
// - "bytes currently in the cache" vs "maximum bytes configured"
// - "pages evicted by application threads" (high = cache pressure)
```
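
This sketch turns the cache checks into numbers. It assumes the `wiredTiger.cache` sub-document has been fetched as a plain object containing the statistics `bytes currently in the cache`, `maximum bytes configured`, and `pages evicted by application threads`. The helper name is hypothetical and no threshold is implied. The eviction counter is cumulative, so the sketch compares two samples to see whether application-thread eviction is still happening.

```javascript
// Hypothetical helper: WiredTiger cache fill ratio and application-thread eviction rate.
function cacheUsage(previousCache, latestCache, intervalSeconds) {
  const used = latestCache["bytes currently in the cache"];
  const max = latestCache["maximum bytes configured"];
  const evictedKey = "pages evicted by application threads";
  return {
    fillRatio: max > 0 ? used / max : 0,
    appEvictionsPerSecond:
      (latestCache[evictedKey] - previousCache[evictedKey]) / intervalSeconds,
  };
}

const earlier = { "pages evicted by application threads": 1200 };
const later = {
  "bytes currently in the cache": 3 * 1024 ** 3,
  "maximum bytes configured": 4 * 1024 ** 3,
  "pages evicted by application threads": 1500,
};
console.log(cacheUsage(earlier, later, 60)); // { fillRatio: 0.75, appEvictionsPerSecond: 5 }
```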

## Aggregation Pipeline Optimization

### Stage Ordering
```javascript
// GOOD: Filter early
const startDate = ISODate("2024-01-01") // example cutoff date
db.orders.aggregate([
  { $match: { status: "shipped", date: { $gte: startDate } } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 10 }
])

// BAD: Sort before filtering (the pipeline optimizer can often move this
// $match ahead of the $sort, but don't rely on it; write $match first)
db.orders.aggregate([
  { $sort: { date: -1 } },
  { $match: { status: "shipped" } }
])
```

### Memory Optimization
```javascript
// Allow disk use for large aggregations (on by default since MongoDB 6.0)
db.collection.aggregate([...], { allowDiskUse: true })

// Use $project to reduce document size early (a pipeline stage, not a standalone command)
{ $project: { neededField: 1, computed: { $add: ["$a", "$b"] } } }
```

## Output Guidelines

- Keep answers short and clear: a few sentences on index and optimization suggestions
- Focus on the highest-impact indexes or optimizations
- Do not use strong language like "this will definitely improve performance"
- Explain that these are suggestions and give the reasoning behind them
- Consider how many indexes already exist on the collection (generally no more than 20)
- Suggest removing indexes only if they are clearly unused (check `$indexStats`)
- Never create indexes without user approval - show the command and wait for confirmation