@mhalder/qdrant-mcp-server 1.1.0 → 1.1.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +82 -69
- package/CONTRIBUTING.md +81 -92
- package/README.md +97 -634
- package/examples/README.md +63 -253
- package/examples/basic/README.md +18 -72
- package/examples/filters/README.md +55 -155
- package/examples/knowledge-base/README.md +36 -98
- package/examples/rate-limiting/README.md +81 -290
- package/package.json +1 -1
- package/docs/test_report.md +0 -259
package/examples/rate-limiting/README.md CHANGED

@@ -1,124 +1,81 @@
-# Rate Limiting
+# Rate Limiting

-Learn how the
+Learn how the server handles embedding provider API rate limits automatically with intelligent throttling and retry mechanisms.

-
+**Time:** 10-15 minutes | **Difficulty:** Beginner to Intermediate

-
+## Why It Matters

-
-
-
--
-
-
+| Provider             | Rate Limits     | Notes                           |
+| -------------------- | --------------- | ------------------------------- |
+| **Ollama** (default) | None            | Local processing, no limits!    |
+| **OpenAI**           | 500-10,000+/min | Based on tier (Free/Tier 1/2/3) |
+| **Cohere**           | ~100/min        | Varies by plan                  |
+| **Voyage AI**        | ~300/min        | Varies by plan                  |

-**
-**Difficulty:** Beginner to Intermediate
-
-## Why Rate Limiting Matters
-
-**Ollama (Default):** Since Ollama runs locally, there are no API rate limits! You can process as many embeddings as your system can handle.
-
-**Cloud Embedding Providers** (OpenAI, Cohere, Voyage AI) enforce rate limits based on your account tier:
-
-**OpenAI:**
-| Tier    | Requests/Minute |
-| ------- | --------------- |
-| Free    | 500             |
-| Tier 1  | 3,500           |
-| Tier 2  | 5,000           |
-| Tier 3+ | 10,000+         |
-
-**Other Cloud Providers:**
-
-- **Cohere**: ~100 requests/minute (varies by plan)
-- **Voyage AI**: ~300 requests/minute (varies by plan)
-
-Without rate limiting, batch operations with cloud providers can exceed these limits and fail. This is one reason why **Ollama is the default** - no rate limits to worry about!
+Without rate limiting, batch operations with cloud providers can fail. **This is why Ollama is the default** - no rate limits to worry about!

## How It Works

The server automatically:

-1. **Throttles
-2. **Retries
-3. **Respects
-4. **Provides Feedback
+1. **Throttles** - Queues API calls within limits
+2. **Retries** - Exponential backoff (1s, 2s, 4s, 8s...)
+3. **Respects Headers** - Follows provider retry guidance
+4. **Provides Feedback** - Console shows retry progress
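
The four steps above are the whole pipeline. As an illustration of step 1, here is a minimal TypeScript sketch of a sliding-window throttle. It is a sketch only, assuming a hypothetical `RateLimiter` class and `callEmbeddingApi` helper; it is not this package's implementation.

```typescript
// Illustrative sliding-window throttle; not this package's actual code.
class RateLimiter {
  private timestamps: number[] = [];

  constructor(private maxPerMinute: number) {}

  // Resolves once another call fits inside the 60-second window.
  async acquire(): Promise<void> {
    for (;;) {
      const now = Date.now();
      // Forget calls older than the window.
      this.timestamps = this.timestamps.filter((t) => now - t < 60_000);
      if (this.timestamps.length < this.maxPerMinute) {
        this.timestamps.push(now);
        return;
      }
      // Sleep until the oldest call ages out of the window.
      const wait = 60_000 - (now - this.timestamps[0]);
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
}

declare function callEmbeddingApi(text: string): Promise<number[]>; // hypothetical

const limiter = new RateLimiter(500); // e.g. a 500/min tier

async function embed(text: string): Promise<number[]> {
  await limiter.acquire(); // step 1: throttle before every request
  return callEmbeddingApi(text);
}
```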

## Configuration

-###
+### Provider Defaults
+
+| Provider  | Default Limit     | Retry Attempts | Retry Delay |
+| --------- | ----------------- | -------------- | ----------- |
+| Ollama    | 1000/min          | 3              | 500ms       |
+| OpenAI    | 3500/min (Tier 1) | 3              | 1000ms      |
+| Cohere    | 100/min           | 3              | 1000ms      |
+| Voyage AI | 300/min           | 3              | 1000ms      |
+
+### Custom Settings

```bash
-
-
-
-#
+# Adjust for your provider tier
+EMBEDDING_MAX_REQUESTS_PER_MINUTE=500  # Free tier
+EMBEDDING_RETRY_ATTEMPTS=5             # More resilient
+EMBEDDING_RETRY_DELAY=2000             # Longer initial delay
```
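
The defaults table and the `EMBEDDING_*` variables map naturally onto a small config object. The sketch below shows one way such settings could be read, using the table's values as fallbacks; the `loadRateLimitConfig` helper and the provider keys (in particular `voyage`) are assumptions, not this server's actual code.

```typescript
// Hypothetical loader for the EMBEDDING_* variables shown above;
// defaults mirror the "Provider Defaults" table. Not this package's code.
interface RateLimitConfig {
  maxRequestsPerMinute: number;
  retryAttempts: number;
  retryDelayMs: number;
}

const DEFAULTS: Record<string, RateLimitConfig> = {
  ollama: { maxRequestsPerMinute: 1000, retryAttempts: 3, retryDelayMs: 500 },
  openai: { maxRequestsPerMinute: 3500, retryAttempts: 3, retryDelayMs: 1000 },
  cohere: { maxRequestsPerMinute: 100, retryAttempts: 3, retryDelayMs: 1000 },
  voyage: { maxRequestsPerMinute: 300, retryAttempts: 3, retryDelayMs: 1000 },
};

function loadRateLimitConfig(env = process.env): RateLimitConfig {
  const base = DEFAULTS[env.EMBEDDING_PROVIDER ?? "ollama"] ?? DEFAULTS.ollama;
  // Fall back to the provider default when a variable is unset or not numeric.
  const num = (v: string | undefined, fallback: number) =>
    v !== undefined && Number.isFinite(Number(v)) ? Number(v) : fallback;
  return {
    maxRequestsPerMinute: num(env.EMBEDDING_MAX_REQUESTS_PER_MINUTE, base.maxRequestsPerMinute),
    retryAttempts: num(env.EMBEDDING_RETRY_ATTEMPTS, base.retryAttempts),
    retryDelayMs: num(env.EMBEDDING_RETRY_DELAY, base.retryDelayMs),
  };
}
```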

-###
+### Provider Examples

-**
+**Ollama (Default):**

```bash
-EMBEDDING_PROVIDER=
-
-
-EMBEDDING_RETRY_DELAY=1000
+EMBEDDING_PROVIDER=ollama
+EMBEDDING_BASE_URL=http://localhost:11434
+# No rate limit config needed!
```

-**Free Tier:**
+**OpenAI Free Tier:**

```bash
EMBEDDING_PROVIDER=openai
EMBEDDING_MAX_REQUESTS_PER_MINUTE=500
EMBEDDING_RETRY_ATTEMPTS=5
-EMBEDDING_RETRY_DELAY=2000
-```
-
-### Cohere Settings
-
-```bash
-EMBEDDING_PROVIDER=cohere
-EMBEDDING_MAX_REQUESTS_PER_MINUTE=100
-EMBEDDING_RETRY_ATTEMPTS=3
-EMBEDDING_RETRY_DELAY=1000
```

-
+**OpenAI Paid Tier:**

```bash
-EMBEDDING_PROVIDER=
-EMBEDDING_MAX_REQUESTS_PER_MINUTE=
-EMBEDDING_RETRY_ATTEMPTS=3
-EMBEDDING_RETRY_DELAY=1000
+EMBEDDING_PROVIDER=openai
+EMBEDDING_MAX_REQUESTS_PER_MINUTE=3500  # Tier 1
```

-
-
-```bash
-EMBEDDING_PROVIDER=ollama
-EMBEDDING_MAX_REQUESTS_PER_MINUTE=1000
-EMBEDDING_RETRY_ATTEMPTS=3
-EMBEDDING_RETRY_DELAY=500
-```
-
-## Example: Batch Document Processing
-
-Let's test rate limiting by adding many documents at once.
-
-### Step 1: Create Collection
+## Example: Batch Processing

```
+# Create collection
Create a collection named "rate-limit-test"
-```
-
-### Step 2: Add Batch of Documents

-
-
-```
+# Add batch of documents (tests rate limiting)
Add these documents to "rate-limit-test":
- id: 1, text: "Introduction to machine learning algorithms", metadata: {"topic": "ml"}
- id: 2, text: "Deep learning neural networks explained", metadata: {"topic": "dl"}
@@ -130,125 +87,23 @@ Add these documents to "rate-limit-test":
- id: 8, text: "Hyperparameter optimization methods", metadata: {"topic": "tuning"}
- id: 9, text: "Transfer learning and fine-tuning", metadata: {"topic": "transfer"}
- id: 10, text: "Ensemble methods and boosting", metadata: {"topic": "ensemble"}
-```

-
-
-- The server generates embeddings for all 10 documents
-- Requests are automatically queued and throttled
-- If rate limits are hit, automatic retry with backoff occurs
-- Console shows retry messages with wait times
-
-### Step 3: Test Search
-
-```
+# Search
Search "rate-limit-test" for "neural networks and deep learning"
-```
-
-### Step 4: Monitor Console Output
-
-Watch for rate limiting messages:
-
-```
-Rate limit reached. Retrying in 1.0s (attempt 1/3)...
-Rate limit reached. Retrying in 2.0s (attempt 2/3)...
-```
-
-These messages indicate:
-
-- Rate limit was detected (429 error)
-- Automatic retry is in progress
-- Current attempt number and delay
-
-## Simulating Rate Limit Scenarios
-
-### Scenario 1: Free Tier User
-
-**Configuration:**
-
-```bash
-OPENAI_MAX_REQUESTS_PER_MINUTE=500
-```
-
-**Test:** Add 50 documents in batches of 10
-
-- Server automatically spaces requests
-- No manual rate limit handling needed
-- Operations complete successfully
-
-### Scenario 2: High-Volume Batch
-
-**Test:** Add 100+ documents
-
-- Create collection: `batch-test-collection`
-- Add documents in chunks
-- Server queues requests automatically
-- Monitor console for throttling behavior
-
-### Scenario 3: Concurrent Operations
-
-**Test:** Multiple searches simultaneously
-
-- Perform several searches in quick succession
-- Rate limiter queues them appropriately
-- All complete without errors
-
-## Best Practices
-
-### 1. Configure for Your Provider
-
-Always set `EMBEDDING_MAX_REQUESTS_PER_MINUTE` to match your provider's limits:

-
+# Watch console for rate limit messages:
+# "Rate limit reached. Retrying in 1.0s (attempt 1/3)..."
+# "Rate limit reached. Retrying in 2.0s (attempt 2/3)..."

-
-
-EMBEDDING_MAX_REQUESTS_PER_MINUTE=<your-limit>
-```
-
-**Other Providers:**
-
-- Check your provider's dashboard for rate limits
-- Start conservative and increase if needed
-
-### 2. Adjust Retry Settings for Reliability
-
-For critical operations, increase retry attempts:
-
-```bash
-EMBEDDING_RETRY_ATTEMPTS=5  # More resilient
-```
-
-For development/testing, reduce retries:
-
-```bash
-EMBEDDING_RETRY_ATTEMPTS=1  # Fail faster
+# Cleanup
+Delete collection "rate-limit-test"
```
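
The fenced block above is written as natural-language prompts for an MCP client such as Claude Desktop. The same flow can also be driven programmatically over stdio with the official MCP TypeScript SDK; in the sketch below the tool names (`create_collection`, `add_documents`) and argument shapes are guesses, so check them against the server's `listTools()` output.

```typescript
// Sketch only: drives the server over stdio using the MCP TypeScript SDK.
// The tool names and argument shapes below are assumptions, not taken from
// this package -- discover the real ones via listTools().
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  const transport = new StdioClientTransport({
    command: "node",
    args: ["/path/to/qdrant-mcp-server/build/index.js"],
    env: {
      QDRANT_URL: "http://localhost:6333",
      EMBEDDING_BASE_URL: "http://localhost:11434",
    },
  });
  const client = new Client({ name: "rate-limit-demo", version: "0.0.1" });
  await client.connect(transport);

  console.log(await client.listTools()); // discover the real tool names first

  // Hypothetical equivalents of the prompts above.
  await client.callTool({
    name: "create_collection",
    arguments: { name: "rate-limit-test" },
  });
  await client.callTool({
    name: "add_documents",
    arguments: {
      collection: "rate-limit-test",
      documents: [
        { id: 1, text: "Introduction to machine learning algorithms", metadata: { topic: "ml" } },
      ],
    },
  });

  await client.close();
}

main().catch(console.error);
```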

-
-
-Most embedding providers support batch operations:
-
-- **OpenAI**: Up to 2048 texts per request
-- **Cohere**: Batch support available
-- **Voyage AI**: Batch support available
-- **Ollama**: Sequential processing (one at a time)
-
-The server automatically uses batch APIs when available for efficiency.
-
-### 4. Monitor Your Usage
-
-Watch console output during operations:
-
-- No messages = smooth operation
-- Retry messages = hitting limits (consider reducing rate)
-- Error after max retries = need to reduce request volume
+## Retry Behavior

-
+### Exponential Backoff

-
-
-With `OPENAI_RETRY_DELAY=1000`:
+With `EMBEDDING_RETRY_DELAY=1000`:

| Attempt | Delay | Total Wait |
| ------- | ----- | ---------- |
@@ -259,118 +114,54 @@ With `OPENAI_RETRY_DELAY=1000`:

### Retry-After Header

-If
+If provider sends `Retry-After` header (OpenAI):

-- Server uses
+- Server uses exact delay
- Ignores exponential backoff
- Ensures optimal recovery
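
Both rules, doubled delays and the `Retry-After` override, fit in one retry wrapper. Here is a minimal TypeScript sketch of the documented behavior; the error shape with `status` and `headers` is an assumption, and this is not the package's actual retry code.

```typescript
// Sketch of the documented retry pattern: exponential backoff, capped
// attempts, and a Retry-After override. Not this package's actual code.
// The error shape (err.status, err.headers) is assumed for illustration.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  initialDelayMs = 1000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      // Only retry rate-limit (429) errors, and only up to the cap.
      if (attempt >= attempts || err?.status !== 429) throw err;
      // Prefer the provider's Retry-After (seconds) when present.
      const retryAfter = Number(err?.headers?.["retry-after"]);
      const delay = Number.isFinite(retryAfter)
        ? retryAfter * 1000
        : initialDelayMs * 2 ** (attempt - 1); // 1s, 2s, 4s, ...
      console.error(
        `Rate limit reached. Retrying in ${(delay / 1000).toFixed(1)}s (attempt ${attempt}/${attempts})...`,
      );
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With `attempts = 3` and `initialDelayMs = 1000`, this reproduces the 1s/2s/4s schedule in the table above.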

-##
-
-### Success Messages
-
-```
-Successfully added 10 document(s) to collection "rate-limit-test".
-```
-
-### Retry Messages (Normal)
-
-```
-Rate limit reached. Retrying in 2.0s (attempt 1/3)...
-```
-
-**Action:** None needed, automatic retry in progress
-
-### Max Retries Exceeded (Rare)
-
-```
-Error: [Provider] API rate limit exceeded after 3 retry attempts.
-Please try again later or reduce request frequency.
-```
-
-**Action:**
+## Best Practices

--
--
-
+1. **Match Your Tier** - Set `EMBEDDING_MAX_REQUESTS_PER_MINUTE` to your provider's limit
+2. **Check Dashboards** - Verify limits at provider's dashboard
+3. **Start Conservative** - Lower limits, increase if needed
+4. **Monitor Console** - Watch for retry messages
+5. **Use Ollama** - For unlimited local processing

-
+### Batch Operation Tips

-
+- **OpenAI**: Up to 2048 texts per request
+- **Cohere**: Batch support available
+- **Voyage AI**: Batch support available
+- **Ollama**: Sequential processing (one at a time)

-
+Server automatically uses batch APIs when available.
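
Those per-provider batch sizes imply a single chunking step before each embeddings request. A short TypeScript sketch, where `embedBatch` is a hypothetical provider call and 2048 is the OpenAI cap quoted above:

```typescript
// Split texts into provider-sized chunks, e.g. 2048 for OpenAI.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

declare function embedBatch(texts: string[]): Promise<number[][]>; // hypothetical

async function embedAll(texts: string[], batchSize = 2048): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await embedBatch(batch))); // one API request per chunk
  }
  return vectors;
}
```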

-
-{
-  "mcpServers": {
-    "qdrant": {
-      "command": "node",
-      "args": ["/path/to/qdrant-mcp-server/build/index.js"],
-      "env": {
-        "QDRANT_URL": "http://localhost:6333",
-        "EMBEDDING_BASE_URL": "http://localhost:11434"
-      }
-    }
-  }
-}
-```
+## Error Messages

-**Example with OpenAI (Alternative):**
-
-```json
-{
-  "mcpServers": {
-    "qdrant": {
-      "command": "node",
-      "args": ["/path/to/qdrant-mcp-server/build/index.js"],
-      "env": {
-        "EMBEDDING_PROVIDER": "openai",
-        "OPENAI_API_KEY": "sk-your-key",
-        "QDRANT_URL": "http://localhost:6333",
-        "EMBEDDING_MAX_REQUESTS_PER_MINUTE": "3500",
-        "EMBEDDING_RETRY_ATTEMPTS": "3",
-        "EMBEDDING_RETRY_DELAY": "1000"
-      }
-    }
-  }
-}
```
+# Success
+Successfully added 10 document(s) to collection "rate-limit-test".

-
+# Retry (Normal)
+Rate limit reached. Retrying in 2.0s (attempt 1/3)...
+# Action: None needed, automatic retry

-
-
+# Max Retries Exceeded (Rare)
+Error: API rate limit exceeded after 3 retry attempts.
+# Action: Wait, reduce EMBEDDING_MAX_REQUESTS_PER_MINUTE, check dashboard
```

-##
+## Troubleshooting

-
-
-
-
-
-6. ✅ **Efficient**: Batch operations optimize API usage
+| Issue                  | Solution                                           |
+| ---------------------- | -------------------------------------------------- |
+| Persistent rate limits | Reduce `EMBEDDING_MAX_REQUESTS_PER_MINUTE` by 20%  |
+| Slow performance       | Expected with rate limiting - better than failures |
+| Need faster processing | Upgrade provider tier or use Ollama                |

## Next Steps

-- Explore [Knowledge Base
+- Explore [Knowledge Base](../knowledge-base/) for real-world usage patterns
- Learn [Advanced Filtering](../filters/) for complex queries
--
-
-## Troubleshooting
-
-### Still Getting Rate Limit Errors?
-
-1. **Check your provider's limits**: Visit your provider's dashboard
-2. **Reduce request rate**: Lower `EMBEDDING_MAX_REQUESTS_PER_MINUTE` by 20%
-3. **Increase retry attempts**: Set `EMBEDDING_RETRY_ATTEMPTS=5`
-4. **Wait between batches**: For very large operations, split into multiple sessions
-
-### Slow Performance?
-
-If operations seem slow:
-
-- This is expected with rate limiting
-- It's better than failed operations
-- Upgrade your provider's tier for higher limits
-- Consider using Ollama for unlimited local processing
+- Review [main README](../../README.md) for all configuration options

package/package.json CHANGED