@mhalder/qdrant-mcp-server 1.1.0 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +87 -69
- package/CONTRIBUTING.md +81 -92
- package/README.md +99 -634
- package/biome.json +34 -0
- package/build/embeddings/sparse.d.ts +40 -0
- package/build/embeddings/sparse.d.ts.map +1 -0
- package/build/embeddings/sparse.js +105 -0
- package/build/embeddings/sparse.js.map +1 -0
- package/build/embeddings/sparse.test.d.ts +2 -0
- package/build/embeddings/sparse.test.d.ts.map +1 -0
- package/build/embeddings/sparse.test.js +69 -0
- package/build/embeddings/sparse.test.js.map +1 -0
- package/build/index.js +130 -30
- package/build/index.js.map +1 -1
- package/build/qdrant/client.d.ts +21 -2
- package/build/qdrant/client.d.ts.map +1 -1
- package/build/qdrant/client.js +131 -17
- package/build/qdrant/client.js.map +1 -1
- package/build/qdrant/client.test.js +429 -21
- package/build/qdrant/client.test.js.map +1 -1
- package/examples/README.md +78 -253
- package/examples/basic/README.md +19 -72
- package/examples/filters/README.md +55 -155
- package/examples/hybrid-search/README.md +199 -0
- package/examples/knowledge-base/README.md +36 -98
- package/examples/rate-limiting/README.md +81 -290
- package/package.json +1 -1
- package/src/embeddings/sparse.test.ts +87 -0
- package/src/embeddings/sparse.ts +127 -0
- package/src/index.ts +161 -57
- package/src/qdrant/client.test.ts +544 -56
- package/src/qdrant/client.ts +162 -22
- package/vitest.config.ts +3 -3
- package/docs/test_report.md +0 -259
package/examples/knowledge-base/README.md
CHANGED
@@ -1,33 +1,24 @@
-# Knowledge Base
+# Knowledge Base
 
-
+Build a searchable documentation system with rich metadata for filtering and organization.
+
+**Time:** 15-20 minutes | **Difficulty:** Intermediate
 
 ## Use Case
 
-
+Company knowledge base with:
 
 - Documentation from multiple teams
--
--
-
-## What You'll Learn
-
-- Organizing documents with metadata
-- Using metadata for categorization
-- Filtering searches by metadata fields
-- Building a scalable knowledge base structure
+- Content with varying topics and difficulty levels
+- Searchable and filterable articles
 
 ## Setup
 
-### 1. Create the Collection
-
 ```
+# Create collection
 Create a collection named "company-kb"
-```
-
-### 2. Add Structured Documents
 
-
+# Add structured documents
 Add these documents to company-kb:
 - id: "eng-001", text: "Our API uses REST principles with JSON payloads. Authentication is handled via JWT tokens in the Authorization header.", metadata: {"team": "engineering", "topic": "api", "difficulty": "intermediate", "category": "technical"}
 - id: "eng-002", text: "To deploy to production, merge your PR to main. The CI/CD pipeline automatically runs tests and deploys if all checks pass.", metadata: {"team": "engineering", "topic": "deployment", "difficulty": "beginner", "category": "process"}
@@ -39,51 +30,26 @@ Add these documents to company-kb:
 
 ## Search Examples
 
-### Basic Search (No Filters)
-
 ```
+# Basic search
 Search company-kb for "how do I deploy code"
-```
-
-Expected: Returns deployment-related docs (eng-002 likely ranks highest)
-
-### Filter by Team
 
-
+# Filter by team
 Search company-kb for "process documentation" with filter {"must": [{"key": "team", "match": {"value": "engineering"}}]}
-```
-
-Returns only engineering team documents.
 
-
-
-```
+# Filter by difficulty
 Search company-kb for "getting started" with filter {"must": [{"key": "difficulty", "match": {"value": "beginner"}}]}
-```
 
-
-
-### Multiple Filters (AND)
-
-```
+# Multiple filters (AND)
 Search company-kb for "company procedures" with filter {"must": [{"key": "category", "match": {"value": "process"}}, {"key": "difficulty", "match": {"value": "beginner"}}]}
-```
-
-Returns beginner process documents only.
 
-
-
-```
+# Filter by topic
 Search company-kb for "pricing information" with filter {"must": [{"key": "team", "match": {"value": "sales"}}]}
 ```
 
-
-
-## Metadata Design Best Practices
+## Metadata Design
 
-###
-
-Use the same metadata fields across all documents:
+### Schema Pattern
 
 ```json
 {
@@ -94,9 +60,9 @@ Use the same metadata fields across all documents:
 }
 ```
 
-###
+### Advanced Patterns
 
-
+**Hierarchical:**
 
 ```json
 {
@@ -107,9 +73,7 @@ Consider nesting metadata for complex taxonomies:
 }
 ```
 
-
-
-Use arrays for multi-category documents:
+**Multi-category:**
 
 ```json
 {
@@ -118,9 +82,7 @@ Use arrays for multi-category documents:
 }
 ```
 
-
-
-Track freshness and versions:
+**Versioned:**
 
 ```json
 {
@@ -131,18 +93,7 @@ Track freshness and versions:
 }
 ```
 
-
-
-### Add More Content Types
-
-- Code examples with language tags
-- Video transcripts with duration metadata
-- Meeting notes with attendees and dates
-- Product specs with version numbers
-
-### Implement Access Control
-
-Use metadata for permissions:
+**Access Control:**
 
 ```json
 {
@@ -151,51 +102,38 @@ Use metadata for permissions:
 }
 ```
 
-
+## Scaling
 
-###
+### Content Types
 
-
-
-
-
-  "views": 0,
-  "last_accessed": null,
-  "author": "user@company.com"
-}
-```
-
-## Maintenance
+- Code examples with language tags
+- Video transcripts with duration
+- Meeting notes with attendees/dates
+- Product specs with versions
 
-###
+### Maintenance
 
-
+**Update documents:**
 
 ```
 Delete documents ["eng-001"] from company-kb
-
 Add these documents to company-kb:
-- id: "eng-001", text: "Updated
+- id: "eng-001", text: "Updated content...", metadata: {...}
 ```
 
-
-
-Use status metadata to hide outdated docs:
+**Archive old content:**
 
 ```json
-{
-  "status": "archived",
-  "archived_date": "2024-12-01"
-}
+{ "status": "archived", "archived_date": "2024-12-01" }
 ```
 
-Then filter searches
+Then filter searches:
 
 ```
 Search company-kb for "deployment" with filter {"must_not": [{"key": "status", "match": {"value": "archived"}}]}
 ```
 
-##
+## Cleanup
 
 ```
 Delete collection "company-kb"
@@ -203,5 +141,5 @@ Delete collection "company-kb"
 
 ## Next Steps
 
-- [Advanced Filtering
--
+- Explore [Advanced Filtering](../filters/) for complex filter patterns
+- Review [main README](../../README.md) for batch operations and advanced features
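For anyone calling Qdrant directly rather than through the MCP prompts in this example, the filter objects shown above use Qdrant's native filter syntax, so the same shape can be passed to the REST client unchanged. A rough sketch (not part of this package) using `@qdrant/js-client-rest`, assuming a query embedding has already been produced by whatever embedding provider is configured:

```typescript
// Illustrative sketch only - mirrors the "Filter by team" example above.
// Assumes Qdrant is running locally and `queryVector` was produced elsewhere.
import { QdrantClient } from "@qdrant/js-client-rest";

const client = new QdrantClient({ url: "http://localhost:6333" });

async function searchEngineeringDocs(queryVector: number[]) {
  return client.search("company-kb", {
    vector: queryVector,
    limit: 5,
    filter: {
      must: [{ key: "team", match: { value: "engineering" } }],
    },
    with_payload: true, // include the metadata (team, topic, difficulty, ...)
  });
}
```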
package/examples/rate-limiting/README.md
CHANGED
@@ -1,124 +1,81 @@
-# Rate Limiting
+# Rate Limiting
 
-Learn how the
+Learn how the server handles embedding provider API rate limits automatically with intelligent throttling and retry mechanisms.
 
-
+**Time:** 10-15 minutes | **Difficulty:** Beginner to Intermediate
 
-
+## Why It Matters
 
-
-
-
--
-
-
+| Provider | Rate Limits | Notes |
+| -------------------- | --------------- | ------------------------------- |
+| **Ollama** (default) | None | Local processing, no limits! |
+| **OpenAI** | 500-10,000+/min | Based on tier (Free/Tier 1/2/3) |
+| **Cohere** | ~100/min | Varies by plan |
+| **Voyage AI** | ~300/min | Varies by plan |
 
-**
-**Difficulty:** Beginner to Intermediate
-
-## Why Rate Limiting Matters
-
-**Ollama (Default):** Since Ollama runs locally, there are no API rate limits! You can process as many embeddings as your system can handle.
-
-**Cloud Embedding Providers** (OpenAI, Cohere, Voyage AI) enforce rate limits based on your account tier:
-
-**OpenAI:**
-| Tier | Requests/Minute |
-| ------- | --------------- |
-| Free | 500 |
-| Tier 1 | 3,500 |
-| Tier 2 | 5,000 |
-| Tier 3+ | 10,000+ |
-
-**Other Cloud Providers:**
-
-- **Cohere**: ~100 requests/minute (varies by plan)
-- **Voyage AI**: ~300 requests/minute (varies by plan)
-
-Without rate limiting, batch operations with cloud providers can exceed these limits and fail. This is one reason why **Ollama is the default** - no rate limits to worry about!
+Without rate limiting, batch operations with cloud providers can fail. **This is why Ollama is the default** - no rate limits to worry about!
 
 ## How It Works
 
 The server automatically:
 
-1. **Throttles
-2. **Retries
-3. **Respects
-4. **Provides Feedback
+1. **Throttles** - Queues API calls within limits
+2. **Retries** - Exponential backoff (1s, 2s, 4s, 8s...)
+3. **Respects Headers** - Follows provider retry guidance
+4. **Provides Feedback** - Console shows retry progress
 
 ## Configuration
 
-###
+### Provider Defaults
+
+| Provider | Default Limit | Retry Attempts | Retry Delay |
+| --------- | ----------------- | -------------- | ----------- |
+| Ollama | 1000/min | 3 | 500ms |
+| OpenAI | 3500/min (Tier 1) | 3 | 1000ms |
+| Cohere | 100/min | 3 | 1000ms |
+| Voyage AI | 300/min | 3 | 1000ms |
+
+### Custom Settings
 
 ```bash
-
-
-
-#
+# Adjust for your provider tier
+EMBEDDING_MAX_REQUESTS_PER_MINUTE=500 # Free tier
+EMBEDDING_RETRY_ATTEMPTS=5 # More resilient
+EMBEDDING_RETRY_DELAY=2000 # Longer initial delay
 ```
 
-###
+### Provider Examples
 
-**
+**Ollama (Default):**
 
 ```bash
-EMBEDDING_PROVIDER=
-
-
-EMBEDDING_RETRY_DELAY=1000
+EMBEDDING_PROVIDER=ollama
+EMBEDDING_BASE_URL=http://localhost:11434
+# No rate limit config needed!
 ```
 
-**Free Tier:**
+**OpenAI Free Tier:**
 
 ```bash
 EMBEDDING_PROVIDER=openai
 EMBEDDING_MAX_REQUESTS_PER_MINUTE=500
 EMBEDDING_RETRY_ATTEMPTS=5
-EMBEDDING_RETRY_DELAY=2000
-```
-
-### Cohere Settings
-
-```bash
-EMBEDDING_PROVIDER=cohere
-EMBEDDING_MAX_REQUESTS_PER_MINUTE=100
-EMBEDDING_RETRY_ATTEMPTS=3
-EMBEDDING_RETRY_DELAY=1000
 ```
 
-
+**OpenAI Paid Tier:**
 
 ```bash
-EMBEDDING_PROVIDER=
-EMBEDDING_MAX_REQUESTS_PER_MINUTE=
-EMBEDDING_RETRY_ATTEMPTS=3
-EMBEDDING_RETRY_DELAY=1000
+EMBEDDING_PROVIDER=openai
+EMBEDDING_MAX_REQUESTS_PER_MINUTE=3500 # Tier 1
 ```
 
-
-
-```bash
-EMBEDDING_PROVIDER=ollama
-EMBEDDING_MAX_REQUESTS_PER_MINUTE=1000
-EMBEDDING_RETRY_ATTEMPTS=3
-EMBEDDING_RETRY_DELAY=500
-```
-
-## Example: Batch Document Processing
-
-Let's test rate limiting by adding many documents at once.
-
-### Step 1: Create Collection
+## Example: Batch Processing
 
 ```
+# Create collection
 Create a collection named "rate-limit-test"
-```
-
-### Step 2: Add Batch of Documents
 
-
-
-```
+# Add batch of documents (tests rate limiting)
 Add these documents to "rate-limit-test":
 - id: 1, text: "Introduction to machine learning algorithms", metadata: {"topic": "ml"}
 - id: 2, text: "Deep learning neural networks explained", metadata: {"topic": "dl"}
@@ -130,125 +87,23 @@ Add these documents to "rate-limit-test":
 - id: 8, text: "Hyperparameter optimization methods", metadata: {"topic": "tuning"}
 - id: 9, text: "Transfer learning and fine-tuning", metadata: {"topic": "transfer"}
 - id: 10, text: "Ensemble methods and boosting", metadata: {"topic": "ensemble"}
-```
 
-
-
-- The server generates embeddings for all 10 documents
-- Requests are automatically queued and throttled
-- If rate limits are hit, automatic retry with backoff occurs
-- Console shows retry messages with wait times
-
-### Step 3: Test Search
-
-```
+# Search
 Search "rate-limit-test" for "neural networks and deep learning"
-```
-
-### Step 4: Monitor Console Output
-
-Watch for rate limiting messages:
-
-```
-Rate limit reached. Retrying in 1.0s (attempt 1/3)...
-Rate limit reached. Retrying in 2.0s (attempt 2/3)...
-```
-
-These messages indicate:
-
-- Rate limit was detected (429 error)
-- Automatic retry is in progress
-- Current attempt number and delay
-
-## Simulating Rate Limit Scenarios
-
-### Scenario 1: Free Tier User
-
-**Configuration:**
-
-```bash
-OPENAI_MAX_REQUESTS_PER_MINUTE=500
-```
-
-**Test:** Add 50 documents in batches of 10
-
-- Server automatically spaces requests
-- No manual rate limit handling needed
-- Operations complete successfully
-
-### Scenario 2: High-Volume Batch
-
-**Test:** Add 100+ documents
-
-- Create collection: `batch-test-collection`
-- Add documents in chunks
-- Server queues requests automatically
-- Monitor console for throttling behavior
-
-### Scenario 3: Concurrent Operations
-
-**Test:** Multiple searches simultaneously
-
-- Perform several searches in quick succession
-- Rate limiter queues them appropriately
-- All complete without errors
-
-## Best Practices
-
-### 1. Configure for Your Provider
-
-Always set `EMBEDDING_MAX_REQUESTS_PER_MINUTE` to match your provider's limits:
 
-
+# Watch console for rate limit messages:
+# "Rate limit reached. Retrying in 1.0s (attempt 1/3)..."
+# "Rate limit reached. Retrying in 2.0s (attempt 2/3)..."
 
-
-
-EMBEDDING_MAX_REQUESTS_PER_MINUTE=<your-limit>
-```
-
-**Other Providers:**
-
-- Check your provider's dashboard for rate limits
-- Start conservative and increase if needed
-
-### 2. Adjust Retry Settings for Reliability
-
-For critical operations, increase retry attempts:
-
-```bash
-EMBEDDING_RETRY_ATTEMPTS=5 # More resilient
-```
-
-For development/testing, reduce retries:
-
-```bash
-EMBEDDING_RETRY_ATTEMPTS=1 # Fail faster
+# Cleanup
+Delete collection "rate-limit-test"
 ```
 
-
-
-Most embedding providers support batch operations:
-
-- **OpenAI**: Up to 2048 texts per request
-- **Cohere**: Batch support available
-- **Voyage AI**: Batch support available
-- **Ollama**: Sequential processing (one at a time)
-
-The server automatically uses batch APIs when available for efficiency.
-
-### 4. Monitor Your Usage
-
-Watch console output during operations:
-
-- No messages = smooth operation
-- Retry messages = hitting limits (consider reducing rate)
-- Error after max retries = need to reduce request volume
+## Retry Behavior
 
-
+### Exponential Backoff
 
-
-
-With `OPENAI_RETRY_DELAY=1000`:
+With `EMBEDDING_RETRY_DELAY=1000`:
 
 | Attempt | Delay | Total Wait |
 | ------- | ----- | ---------- |
@@ -259,118 +114,54 @@ With `OPENAI_RETRY_DELAY=1000`:
 
 ### Retry-After Header
 
-If
+If provider sends `Retry-After` header (OpenAI):
 
-- Server uses
+- Server uses exact delay
 - Ignores exponential backoff
 - Ensures optimal recovery
 
-##
-
-### Success Messages
-
-```
-Successfully added 10 document(s) to collection "rate-limit-test".
-```
-
-### Retry Messages (Normal)
-
-```
-Rate limit reached. Retrying in 2.0s (attempt 1/3)...
-```
-
-**Action:** None needed, automatic retry in progress
-
-### Max Retries Exceeded (Rare)
-
-```
-Error: [Provider] API rate limit exceeded after 3 retry attempts.
-Please try again later or reduce request frequency.
-```
-
-**Action:**
+## Best Practices
 
--
--
-
+1. **Match Your Tier** - Set `EMBEDDING_MAX_REQUESTS_PER_MINUTE` to your provider's limit
+2. **Check Dashboards** - Verify limits at provider's dashboard
+3. **Start Conservative** - Lower limits, increase if needed
+4. **Monitor Console** - Watch for retry messages
+5. **Use Ollama** - For unlimited local processing
 
-
+### Batch Operation Tips
 
-
+- **OpenAI**: Up to 2048 texts per request
+- **Cohere**: Batch support available
+- **Voyage AI**: Batch support available
+- **Ollama**: Sequential processing (one at a time)
 
-
+Server automatically uses batch APIs when available.
 
-
-{
-  "mcpServers": {
-    "qdrant": {
-      "command": "node",
-      "args": ["/path/to/qdrant-mcp-server/build/index.js"],
-      "env": {
-        "QDRANT_URL": "http://localhost:6333",
-        "EMBEDDING_BASE_URL": "http://localhost:11434"
-      }
-    }
-  }
-}
-```
+## Error Messages
 
-**Example with OpenAI (Alternative):**
-
-```json
-{
-  "mcpServers": {
-    "qdrant": {
-      "command": "node",
-      "args": ["/path/to/qdrant-mcp-server/build/index.js"],
-      "env": {
-        "EMBEDDING_PROVIDER": "openai",
-        "OPENAI_API_KEY": "sk-your-key",
-        "QDRANT_URL": "http://localhost:6333",
-        "EMBEDDING_MAX_REQUESTS_PER_MINUTE": "3500",
-        "EMBEDDING_RETRY_ATTEMPTS": "3",
-        "EMBEDDING_RETRY_DELAY": "1000"
-      }
-    }
-  }
-}
 ```
+# Success
+Successfully added 10 document(s) to collection "rate-limit-test".
 
-
+# Retry (Normal)
+Rate limit reached. Retrying in 2.0s (attempt 1/3)...
+# Action: None needed, automatic retry
 
-
-
+# Max Retries Exceeded (Rare)
+Error: API rate limit exceeded after 3 retry attempts.
+# Action: Wait, reduce EMBEDDING_MAX_REQUESTS_PER_MINUTE, check dashboard
 ```
 
-##
+## Troubleshooting
 
-
-
-
-
-
-6. ✅ **Efficient**: Batch operations optimize API usage
+| Issue | Solution |
+| ---------------------- | -------------------------------------------------- |
+| Persistent rate limits | Reduce `EMBEDDING_MAX_REQUESTS_PER_MINUTE` by 20% |
+| Slow performance | Expected with rate limiting - better than failures |
+| Need faster processing | Upgrade provider tier or use Ollama |
 
 ## Next Steps
 
-- Explore [Knowledge Base
+- Explore [Knowledge Base](../knowledge-base/) for real-world usage patterns
 - Learn [Advanced Filtering](../filters/) for complex queries
--
-
-## Troubleshooting
-
-### Still Getting Rate Limit Errors?
-
-1. **Check your provider's limits**: Visit your provider's dashboard
-2. **Reduce request rate**: Lower `EMBEDDING_MAX_REQUESTS_PER_MINUTE` by 20%
-3. **Increase retry attempts**: Set `EMBEDDING_RETRY_ATTEMPTS=5`
-4. **Wait between batches**: For very large operations, split into multiple sessions
-
-### Slow Performance?
-
-If operations seem slow:
-
-- This is expected with rate limiting
-- It's better than failed operations
-- Upgrade your provider's tier for higher limits
-- Consider using Ollama for unlimited local processing
+- Review [main README](../../README.md) for all configuration options
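The retry behavior described in the updated rate-limiting README (exponential backoff seeded by `EMBEDDING_RETRY_DELAY`, capped by `EMBEDDING_RETRY_ATTEMPTS`, with a provider-supplied `Retry-After` header taking precedence) can be sketched roughly as follows; this is an illustrative sketch against `fetch`, not the package's actual implementation:

```typescript
// Illustrative sketch only - not the package's actual code.
const maxRetries = Number(process.env.EMBEDDING_RETRY_ATTEMPTS ?? "3");
const baseDelayMs = Number(process.env.EMBEDDING_RETRY_DELAY ?? "1000");

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRateLimitRetry(call: () => Promise<Response>): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await call();
    if (res.status !== 429) return res; // only retry rate-limit responses
    if (attempt >= maxRetries) {
      throw new Error(`API rate limit exceeded after ${maxRetries} retry attempts.`);
    }
    // Prefer the provider's Retry-After value (assumed to be seconds here);
    // otherwise back off exponentially: 1x, 2x, 4x, 8x the base delay.
    const retryAfter = res.headers.get("retry-after");
    const delayMs = retryAfter ? Number(retryAfter) * 1000 : baseDelayMs * 2 ** attempt;
    console.error(
      `Rate limit reached. Retrying in ${(delayMs / 1000).toFixed(1)}s (attempt ${attempt + 1}/${maxRetries})...`,
    );
    await sleep(delayMs);
  }
}

// Example: wrap an embedding request to a hypothetical provider endpoint.
// const res = await withRateLimitRetry(() => fetch(embeddingsUrl, { method: "POST", body }));
```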
package/package.json
CHANGED