@mhalder/qdrant-mcp-server 1.1.0 → 1.1.1

@@ -1,124 +1,81 @@
- # Rate Limiting Example
+ # Rate Limiting

- Learn how the Qdrant MCP Server handles embedding provider API rate limits automatically with intelligent throttling and retry mechanisms.
+ Learn how the server handles embedding provider API rate limits automatically with intelligent throttling and retry mechanisms.

- ## Overview
+ **Time:** 10-15 minutes | **Difficulty:** Beginner to Intermediate

- This example demonstrates:
+ ## Why It Matters

- - How rate limiting prevents API failures (for cloud providers)
- - Configuring rate limits for your embedding provider
- - Batch operations with automatic throttling
- - Exponential backoff retry behavior
- - Monitoring rate limit events
- - Why Ollama doesn't need rate limiting (local processing)
+ | Provider | Rate Limits | Notes |
+ | -------------------- | --------------- | ------------------------------- |
+ | **Ollama** (default) | None | Local processing, no limits! |
+ | **OpenAI** | 500-10,000+/min | Based on tier (Free/Tier 1/2/3) |
+ | **Cohere** | ~100/min | Varies by plan |
+ | **Voyage AI** | ~300/min | Varies by plan |

- **Time:** 10-15 minutes
- **Difficulty:** Beginner to Intermediate
-
- ## Why Rate Limiting Matters
-
- **Ollama (Default):** Since Ollama runs locally, there are no API rate limits! You can process as many embeddings as your system can handle.
-
- **Cloud Embedding Providers** (OpenAI, Cohere, Voyage AI) enforce rate limits based on your account tier:
-
- **OpenAI:**
- | Tier | Requests/Minute |
- | ------- | --------------- |
- | Free | 500 |
- | Tier 1 | 3,500 |
- | Tier 2 | 5,000 |
- | Tier 3+ | 10,000+ |
-
- **Other Cloud Providers:**
-
- - **Cohere**: ~100 requests/minute (varies by plan)
- - **Voyage AI**: ~300 requests/minute (varies by plan)
-
- Without rate limiting, batch operations with cloud providers can exceed these limits and fail. This is one reason why **Ollama is the default** - no rate limits to worry about!
+ Without rate limiting, batch operations with cloud providers can fail. **This is why Ollama is the default** - no rate limits to worry about!

  ## How It Works

  The server automatically:

- 1. **Throttles Requests**: Queues API calls to stay within limits
- 2. **Retries on Failure**: Uses exponential backoff (1s, 2s, 4s, 8s...)
- 3. **Respects Retry-After**: Follows provider retry guidance (when available)
- 4. **Provides Feedback**: Shows retry progress in console
+ 1. **Throttles** - Queues API calls within limits
+ 2. **Retries** - Exponential backoff (1s, 2s, 4s, 8s...)
+ 3. **Respects Headers** - Follows provider retry guidance
+ 4. **Provides Feedback** - Console shows retry progress
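
The four steps above are the behavior contract; to make the mechanism concrete, here is a minimal TypeScript sketch of how throttling and exponential-backoff retry can compose. All names in it are invented for the illustration - this is not the package's actual implementation, though the log line mirrors the console output quoted later in this diff.

```typescript
type Task<T> = () => Promise<T>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Spaces calls evenly so at most `requestsPerMinute` start per minute.
class RateLimiter {
  private nextSlot = 0;
  constructor(private requestsPerMinute: number) {}

  async run<T>(task: Task<T>): Promise<T> {
    const interval = 60_000 / this.requestsPerMinute;
    const now = Date.now();
    const wait = Math.max(0, this.nextSlot - now);
    this.nextSlot = Math.max(now, this.nextSlot) + interval;
    if (wait > 0) await sleep(wait);
    return task();
  }
}

// Retries with doubling delays: 1s, 2s, 4s... for baseDelayMs = 1000.
// A real implementation would retry only on rate-limit (HTTP 429) errors.
async function withRetry<T>(task: Task<T>, attempts = 3, baseDelayMs = 1000): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= attempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      console.error(`Rate limit reached. Retrying in ${(delay / 1000).toFixed(1)}s (attempt ${attempt}/${attempts})...`);
      await sleep(delay);
    }
  }
}
```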

  ## Configuration

- ### Ollama Settings (Default - No Rate Limiting Needed)
+ ### Provider Defaults
+
+ | Provider | Default Limit | Retry Attempts | Retry Delay |
+ | --------- | ----------------- | -------------- | ----------- |
+ | Ollama | 1000/min | 3 | 500ms |
+ | OpenAI | 3500/min (Tier 1) | 3 | 1000ms |
+ | Cohere | 100/min | 3 | 1000ms |
+ | Voyage AI | 300/min | 3 | 1000ms |
+
+ ### Custom Settings

  ```bash
- EMBEDDING_PROVIDER=ollama # or omit (ollama is default)
- EMBEDDING_BASE_URL=http://localhost:11434
- EMBEDDING_MODEL=nomic-embed-text
- # No rate limit configuration needed - runs locally!
+ # Adjust for your provider tier
+ EMBEDDING_MAX_REQUESTS_PER_MINUTE=500 # Free tier
+ EMBEDDING_RETRY_ATTEMPTS=5 # More resilient
+ EMBEDDING_RETRY_DELAY=2000 # Longer initial delay
  ```
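
For orientation, here is a hypothetical sketch of how a Node process could pick these settings up. Only the variable names come from the documentation above; the parsing code and the fallback values (taken from the Ollama row of the defaults table) are assumptions.

```typescript
// Hypothetical sketch - variable names are documented above; the parsing
// and the fallback values (Ollama row of the defaults table) are assumed.
const requestsPerMinute = Number(process.env.EMBEDDING_MAX_REQUESTS_PER_MINUTE ?? "1000");
const retryAttempts = Number(process.env.EMBEDDING_RETRY_ATTEMPTS ?? "3");
const retryDelayMs = Number(process.env.EMBEDDING_RETRY_DELAY ?? "500");

console.log({ requestsPerMinute, retryAttempts, retryDelayMs });
```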

- ### OpenAI Settings
+ ### Provider Examples

- **Default (Tier 1 Paid):**
+ **Ollama (Default):**

  ```bash
- EMBEDDING_PROVIDER=openai
- EMBEDDING_MAX_REQUESTS_PER_MINUTE=3500
- EMBEDDING_RETRY_ATTEMPTS=3
- EMBEDDING_RETRY_DELAY=1000
+ EMBEDDING_PROVIDER=ollama
+ EMBEDDING_BASE_URL=http://localhost:11434
+ # No rate limit config needed!
  ```

- **Free Tier:**
+ **OpenAI Free Tier:**

  ```bash
  EMBEDDING_PROVIDER=openai
  EMBEDDING_MAX_REQUESTS_PER_MINUTE=500
  EMBEDDING_RETRY_ATTEMPTS=5
- EMBEDDING_RETRY_DELAY=2000
- ```
-
- ### Cohere Settings
-
- ```bash
- EMBEDDING_PROVIDER=cohere
- EMBEDDING_MAX_REQUESTS_PER_MINUTE=100
- EMBEDDING_RETRY_ATTEMPTS=3
- EMBEDDING_RETRY_DELAY=1000
  ```

- ### Voyage AI Settings
+ **OpenAI Paid Tier:**

  ```bash
- EMBEDDING_PROVIDER=voyage
- EMBEDDING_MAX_REQUESTS_PER_MINUTE=300
- EMBEDDING_RETRY_ATTEMPTS=3
- EMBEDDING_RETRY_DELAY=1000
+ EMBEDDING_PROVIDER=openai
+ EMBEDDING_MAX_REQUESTS_PER_MINUTE=3500 # Tier 1
  ```

- ### Ollama Settings (Local)
-
- ```bash
- EMBEDDING_PROVIDER=ollama
- EMBEDDING_MAX_REQUESTS_PER_MINUTE=1000
- EMBEDDING_RETRY_ATTEMPTS=3
- EMBEDDING_RETRY_DELAY=500
- ```
-
- ## Example: Batch Document Processing
-
- Let's test rate limiting by adding many documents at once.
-
- ### Step 1: Create Collection
+ ## Example: Batch Processing

  ```
+ # Create collection
  Create a collection named "rate-limit-test"
- ```
-
- ### Step 2: Add Batch of Documents

- Try adding multiple documents in a single operation:
-
- ```
+ # Add batch of documents (tests rate limiting)
  Add these documents to "rate-limit-test":
  - id: 1, text: "Introduction to machine learning algorithms", metadata: {"topic": "ml"}
  - id: 2, text: "Deep learning neural networks explained", metadata: {"topic": "dl"}
@@ -130,125 +87,23 @@ Add these documents to "rate-limit-test":
  - id: 8, text: "Hyperparameter optimization methods", metadata: {"topic": "tuning"}
  - id: 9, text: "Transfer learning and fine-tuning", metadata: {"topic": "transfer"}
  - id: 10, text: "Ensemble methods and boosting", metadata: {"topic": "ensemble"}
- ```

- **What happens:**
-
- - The server generates embeddings for all 10 documents
- - Requests are automatically queued and throttled
- - If rate limits are hit, automatic retry with backoff occurs
- - Console shows retry messages with wait times
-
- ### Step 3: Test Search
-
- ```
+ # Search
  Search "rate-limit-test" for "neural networks and deep learning"
- ```
-
- ### Step 4: Monitor Console Output
-
- Watch for rate limiting messages:
-
- ```
- Rate limit reached. Retrying in 1.0s (attempt 1/3)...
- Rate limit reached. Retrying in 2.0s (attempt 2/3)...
- ```
-
- These messages indicate:
-
- - Rate limit was detected (429 error)
- - Automatic retry is in progress
- - Current attempt number and delay
-
- ## Simulating Rate Limit Scenarios
-
- ### Scenario 1: Free Tier User
-
- **Configuration:**
-
- ```bash
- OPENAI_MAX_REQUESTS_PER_MINUTE=500
- ```
-
- **Test:** Add 50 documents in batches of 10
-
- - Server automatically spaces requests
- - No manual rate limit handling needed
- - Operations complete successfully
-
- ### Scenario 2: High-Volume Batch
-
- **Test:** Add 100+ documents
-
- - Create collection: `batch-test-collection`
- - Add documents in chunks
- - Server queues requests automatically
- - Monitor console for throttling behavior
-
- ### Scenario 3: Concurrent Operations
-
- **Test:** Multiple searches simultaneously
-
- - Perform several searches in quick succession
- - Rate limiter queues them appropriately
- - All complete without errors
-
- ## Best Practices
-
- ### 1. Configure for Your Provider
-
- Always set `EMBEDDING_MAX_REQUESTS_PER_MINUTE` to match your provider's limits:

- **OpenAI:**
+ # Watch console for rate limit messages:
+ # "Rate limit reached. Retrying in 1.0s (attempt 1/3)..."
+ # "Rate limit reached. Retrying in 2.0s (attempt 2/3)..."

- ```bash
- # Check your tier at: https://platform.openai.com/account/limits
- EMBEDDING_MAX_REQUESTS_PER_MINUTE=<your-limit>
- ```
-
- **Other Providers:**
-
- - Check your provider's dashboard for rate limits
- - Start conservative and increase if needed
-
- ### 2. Adjust Retry Settings for Reliability
-
- For critical operations, increase retry attempts:
-
- ```bash
- EMBEDDING_RETRY_ATTEMPTS=5 # More resilient
- ```
-
- For development/testing, reduce retries:
-
- ```bash
- EMBEDDING_RETRY_ATTEMPTS=1 # Fail faster
+ # Cleanup
+ Delete collection "rate-limit-test"
  ```

- ### 3. Batch Operations Wisely
-
- Most embedding providers support batch operations:
-
- - **OpenAI**: Up to 2048 texts per request
- - **Cohere**: Batch support available
- - **Voyage AI**: Batch support available
- - **Ollama**: Sequential processing (one at a time)
-
- The server automatically uses batch APIs when available for efficiency.
-
- ### 4. Monitor Your Usage
-
- Watch console output during operations:
-
- - No messages = smooth operation
- - Retry messages = hitting limits (consider reducing rate)
- - Error after max retries = need to reduce request volume
+ ## Retry Behavior

- ## Understanding Retry Behavior
+ ### Exponential Backoff

- ### Exponential Backoff Example
-
- With `OPENAI_RETRY_DELAY=1000`:
+ With `EMBEDDING_RETRY_DELAY=1000`:

  | Attempt | Delay | Total Wait |
  | ------- | ----- | ---------- |
@@ -259,118 +114,54 @@ With `EMBEDDING_RETRY_DELAY=1000`:

  ### Retry-After Header

- If the provider provides a `Retry-After` header (OpenAI, some others):
+ If provider sends `Retry-After` header (OpenAI):

- - Server uses that exact delay
+ - Server uses exact delay
  - Ignores exponential backoff
  - Ensures optimal recovery
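
One plausible way to implement that precedence, sketched under assumptions rather than taken from the package's source: treat a numeric `Retry-After` as seconds, a date-form value as an absolute time, and fall back to exponential backoff otherwise.

```typescript
// Sketch: pick the retry delay, preferring provider guidance when present.
function retryDelayMs(retryAfter: string | null, attempt: number, baseDelayMs: number): number {
  if (retryAfter) {
    const seconds = Number(retryAfter);
    if (!Number.isNaN(seconds)) return seconds * 1000; // "Retry-After: 30"
    const at = Date.parse(retryAfter); // "Retry-After: Wed, 21 Oct 2025 07:28:00 GMT"
    if (!Number.isNaN(at)) return Math.max(0, at - Date.now());
  }
  return baseDelayMs * 2 ** (attempt - 1); // exponential backoff fallback
}
```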

- ## Error Messages
-
- ### Success Messages
-
- ```
- Successfully added 10 document(s) to collection "rate-limit-test".
- ```
-
- ### Retry Messages (Normal)
-
- ```
- Rate limit reached. Retrying in 2.0s (attempt 1/3)...
- ```
-
- **Action:** None needed, automatic retry in progress
-
- ### Max Retries Exceeded (Rare)
-
- ```
- Error: [Provider] API rate limit exceeded after 3 retry attempts.
- Please try again later or reduce request frequency.
- ```
-
- **Action:**
+ ## Best Practices

- - Wait a few minutes
- - Reduce `EMBEDDING_MAX_REQUESTS_PER_MINUTE`
- - Check your provider's dashboard for current usage
+ 1. **Match Your Tier** - Set `EMBEDDING_MAX_REQUESTS_PER_MINUTE` to your provider's limit
+ 2. **Check Dashboards** - Verify limits at provider's dashboard
+ 3. **Start Conservative** - Lower limits, increase if needed
+ 4. **Monitor Console** - Watch for retry messages
+ 5. **Use Ollama** - For unlimited local processing

- ## Integration with Claude Code
+ ### Batch Operation Tips

- The rate limiting works seamlessly with Claude Code.
+ - **OpenAI**: Up to 2048 texts per request
+ - **Cohere**: Batch support available
+ - **Voyage AI**: Batch support available
+ - **Ollama**: Sequential processing (one at a time)

- **Example with Ollama (Default - No Rate Limits):**
+ Server automatically uses batch APIs when available.
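
The practical consequence of batch support is that each batch costs a single rate-limited request, so splitting inputs at the provider's ceiling minimizes request count. A generic TypeScript sketch of that splitting (the helper is invented for illustration; the 2048 figure is OpenAI's per-request ceiling from the list above):

```typescript
// Split inputs into provider-sized batches; each batch = one API request.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// e.g. chunk(texts, 2048) for OpenAI; chunk(texts, 1) mimics Ollama's
// one-at-a-time processing.
```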

- ```json
- {
- "mcpServers": {
- "qdrant": {
- "command": "node",
- "args": ["/path/to/qdrant-mcp-server/build/index.js"],
- "env": {
- "QDRANT_URL": "http://localhost:6333",
- "EMBEDDING_BASE_URL": "http://localhost:11434"
- }
- }
- }
- }
- ```
+ ## Error Messages

- **Example with OpenAI (Alternative):**
-
- ```json
- {
- "mcpServers": {
- "qdrant": {
- "command": "node",
- "args": ["/path/to/qdrant-mcp-server/build/index.js"],
- "env": {
- "EMBEDDING_PROVIDER": "openai",
- "OPENAI_API_KEY": "sk-your-key",
- "QDRANT_URL": "http://localhost:6333",
- "EMBEDDING_MAX_REQUESTS_PER_MINUTE": "3500",
- "EMBEDDING_RETRY_ATTEMPTS": "3",
- "EMBEDDING_RETRY_DELAY": "1000"
- }
- }
- }
- }
  ```
+ # Success
+ Successfully added 10 document(s) to collection "rate-limit-test".

- ## Cleanup
+ # Retry (Normal)
+ Rate limit reached. Retrying in 2.0s (attempt 1/3)...
+ # Action: None needed, automatic retry

- ```
- Delete collection "rate-limit-test"
+ # Max Retries Exceeded (Rare)
+ Error: API rate limit exceeded after 3 retry attempts.
+ # Action: Wait, reduce EMBEDDING_MAX_REQUESTS_PER_MINUTE, check dashboard
  ```

- ## Key Takeaways
+ ## Troubleshooting

- 1. **Ollama Default**: No rate limits with local processing
- 2. **Automatic**: Rate limiting works out-of-the-box for cloud providers
- 3. **Configurable**: Adjust for your cloud provider tier
- 4. **Resilient**: Exponential backoff handles temporary issues
- 5. **Transparent**: Console feedback shows what's happening
- 6. ✅ **Efficient**: Batch operations optimize API usage
+ | Issue | Solution |
+ | ---------------------- | -------------------------------------------------- |
+ | Persistent rate limits | Reduce `EMBEDDING_MAX_REQUESTS_PER_MINUTE` by 20% |
+ | Slow performance | Expected with rate limiting - better than failures |
+ | Need faster processing | Upgrade provider tier or use Ollama |

  ## Next Steps

- - Explore [Knowledge Base example](../knowledge-base/) for real-world usage
+ - Explore [Knowledge Base](../knowledge-base/) for real-world usage patterns
  - Learn [Advanced Filtering](../filters/) for complex queries
- - Read [main README](../../README.md) for all configuration options
-
- ## Troubleshooting
-
- ### Still Getting Rate Limit Errors?
-
- 1. **Check your provider's limits**: Visit your provider's dashboard
- 2. **Reduce request rate**: Lower `EMBEDDING_MAX_REQUESTS_PER_MINUTE` by 20%
- 3. **Increase retry attempts**: Set `EMBEDDING_RETRY_ATTEMPTS=5`
- 4. **Wait between batches**: For very large operations, split into multiple sessions
-
- ### Slow Performance?
-
- If operations seem slow:
-
- - This is expected with rate limiting
- - It's better than failed operations
- - Upgrade your provider's tier for higher limits
- - Consider using Ollama for unlimited local processing
+ - Review [main README](../../README.md) for all configuration options
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@mhalder/qdrant-mcp-server",
- "version": "1.1.0",
+ "version": "1.1.1",
  "description": "MCP server for semantic search using local Qdrant and Ollama (default) with support for OpenAI, Cohere, and Voyage AI",
  "type": "module",
  "bin": {