@mhalder/qdrant-mcp-server 1.1.0 → 1.2.0

@@ -1,18 +1,11 @@
1
- # Advanced Filtering Examples
1
+ # Advanced Filtering
2
2
 
3
- This example demonstrates powerful metadata filtering capabilities using Qdrant's filter syntax.
3
+ Master complex metadata filtering with boolean logic for powerful search refinement.
4
4
 
5
- ## What You'll Learn
6
-
7
- - Complex boolean logic (AND, OR, NOT)
8
- - Range filters for numeric values
9
- - Combining multiple filter conditions
10
- - Real-world filtering scenarios
5
+ **Time:** 20-30 minutes | **Difficulty:** Intermediate to Advanced
11
6
 
12
7
  ## Setup
13
8
 
14
- Create a collection with sample e-commerce data:
15
-
16
9
  ```
17
10
  Create a collection named "products"
18
11
 
@@ -29,117 +22,46 @@ Add these documents to products:
29
22
 
30
23
  ## Filter Examples
31
24
 
32
- ### 1. Simple Match Filter (AND)
33
-
34
- Find electronics products:
25
+ ### Basic Filters
35
26
 
36
27
  ```
28
+ # Match single category
37
29
  Search products for "device for music" with filter {"must": [{"key": "category", "match": {"value": "electronics"}}]}
38
- ```
39
-
40
- Expected: Returns p1, p4, p6 (all electronics)
30
+ # Returns: p1, p4, p6
41
31
 
42
- ### 2. Multiple Conditions (AND)
43
-
44
- Find in-stock electronics:
45
-
46
- ```
32
+ # Multiple conditions (AND)
47
33
  Search products for "gadgets" with filter {"must": [{"key": "category", "match": {"value": "electronics"}}, {"key": "in_stock", "match": {"value": true}}]}
48
- ```
49
-
50
- Expected: Returns p1, p4, p6 (in-stock electronics only)
34
+ # Returns: p1, p4, p6 (in-stock electronics)
51
35
 
52
- ### 3. OR Logic with Should
53
-
54
- Find either sports or accessories:
55
-
56
- ```
36
+ # OR logic
57
37
  Search products for "gear" with filter {"should": [{"key": "category", "match": {"value": "sports"}}, {"key": "category", "match": {"value": "accessories"}}]}
58
- ```
59
-
60
- Expected: Returns p3, p5, p7, p8 (sports OR accessories)
38
+ # Returns: p3, p5, p7, p8
61
39
 
62
- ### 4. Negation with Must Not
63
-
64
- Find everything except clothing:
65
-
66
- ```
40
+ # NOT logic
67
41
  Search products for "shopping" with filter {"must_not": [{"key": "category", "match": {"value": "clothing"}}]}
42
+ # Returns: All except p2
68
43
  ```
69
44
 
70
- Expected: Returns all products except p2
71
-
72
- ### 5. Range Filter - Greater Than
73
-
74
- **Note:** Range filters require Qdrant's range condition syntax. The current implementation supports match filters. For range queries, you would need to use Qdrant's native range syntax:
75
-
76
- Conceptual example (not yet implemented in MCP server):
77
-
78
- ```json
79
- {
80
- "must": [
81
- {
82
- "key": "price",
83
- "range": {
84
- "gt": 100.0
85
- }
86
- }
87
- ]
88
- }
89
- ```
90
-
91
- ### 6. Complex Boolean Logic
92
-
93
- Find in-stock products that are either:
94
-
95
- - Electronics with rating > 4.5, OR
96
- - Sports items under $50
45
+ ### Complex Combinations
97
46
 
98
47
  ```
48
+ # In-stock products (electronics OR sports)
99
49
  Search products for "quality products" with filter {"must": [{"key": "in_stock", "match": {"value": true}}], "should": [{"key": "category", "match": {"value": "electronics"}}, {"key": "category", "match": {"value": "sports"}}]}
100
- ```
101
-
102
- ### 7. Combining Multiple Must Conditions
103
-
104
- Find highly-rated in-stock electronics:
105
-
106
- ```
107
- Search products for "best gadgets" with filter {"must": [{"key": "category", "match": {"value": "electronics"}}, {"key": "in_stock", "match": {"value": true}}, {"key": "rating", "match": {"value": 4.5}}]}
108
- ```
109
-
110
- Note: Exact match on rating. For range queries, use Qdrant's range filter syntax.
111
-
112
- ### 8. Brand Filtering
113
50
 
114
- Find AudioTech products:
115
-
116
- ```
117
- Search products for "audio equipment" with filter {"must": [{"key": "brand", "match": {"value": "AudioTech"}}]}
118
- ```
119
-
120
- Expected: Returns p1 only
121
-
122
- ### 9. Out of Stock Products
123
-
124
- Find what needs restocking:
51
+ # Non-electronic in-stock items
52
+ Search products for "shopping" with filter {"must": [{"key": "in_stock", "match": {"value": true}}], "must_not": [{"key": "category", "match": {"value": "electronics"}}]}
53
+ # Returns: p2, p5, p8
125
54
 
126
- ```
55
+ # Out of stock items
127
56
  Search products for "items" with filter {"must": [{"key": "in_stock", "match": {"value": false}}]}
128
- ```
57
+ # Returns: p3, p7
129
58
 
130
- Expected: Returns p3, p7 (out of stock items)
131
-
132
- ### 10. Category Exclusion with Multiple Conditions
133
-
134
- Find non-electronic in-stock items:
135
-
136
- ```
137
- Search products for "shopping" with filter {"must": [{"key": "in_stock", "match": {"value": true}}], "must_not": [{"key": "category", "match": {"value": "electronics"}}]}
59
+ # Brand filtering
60
+ Search products for "audio equipment" with filter {"must": [{"key": "brand", "match": {"value": "AudioTech"}}]}
61
+ # Returns: p1 only
138
62
  ```
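The `must`, `should`, and `must_not` clauses in the examples above combine as AND, OR, and NOT. The sketch below shows one way that logic could be evaluated against a flat payload. It is an illustration only: the server hands filters to Qdrant rather than evaluating them itself, the names `MatchCondition`, `Filter`, and `matchesFilter` are hypothetical, and the sample payloads are invented rather than taken from the `products` collection. Treating an empty `should` list as "no constraint" is also an assumption.

```typescript
// Hypothetical types mirroring the match-filter shape used in this guide.
type Scalar = string | number | boolean;

interface MatchCondition {
  key: string;
  match: { value: Scalar };
}

interface Filter {
  must?: MatchCondition[];
  should?: MatchCondition[];
  must_not?: MatchCondition[];
}

type Payload = Record<string, Scalar>;

// Exact-value comparison, as with the match filter.
function matchesCondition(payload: Payload, cond: MatchCondition): boolean {
  return payload[cond.key] === cond.match.value;
}

// must: every condition holds; should: at least one holds (if any are given);
// must_not: no condition holds. The three clauses are combined with AND.
function matchesFilter(payload: Payload, filter: Filter): boolean {
  const must = filter.must ?? [];
  const should = filter.should ?? [];
  const mustNot = filter.must_not ?? [];

  const mustOk = must.every((c) => matchesCondition(payload, c));
  const shouldOk =
    should.length === 0 || should.some((c) => matchesCondition(payload, c));
  const mustNotOk = !mustNot.some((c) => matchesCondition(payload, c));

  return mustOk && shouldOk && mustNotOk;
}

// Invented sample payloads (not the p1..p8 documents from the setup step).
const inStockSports: Payload = { category: "sports", in_stock: true };
const outOfStockElectronics: Payload = { category: "electronics", in_stock: false };

// "In-stock products (electronics OR sports)" from the examples above.
const inStockElectronicsOrSports: Filter = {
  must: [{ key: "in_stock", match: { value: true } }],
  should: [
    { key: "category", match: { value: "electronics" } },
    { key: "category", match: { value: "sports" } },
  ],
};
```

Running a filter through a helper like this before sending it can catch typos in keys or values early, in the spirit of the "test filters first" advice.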
139
63
 
140
- Expected: Returns p2, p5, p8 (in-stock, non-electronics)
141
-
142
- ## Filter Syntax Reference
64
+ ## Filter Syntax
143
65
 
144
66
  ### Structure
145
67
 
@@ -153,24 +75,16 @@ Expected: Returns p2, p5, p8 (in-stock, non-electronics)
153
75
 
154
76
  ### Match Filter
155
77
 
156
- Exact value matching:
157
-
158
78
  ```json
159
79
  {
160
80
  "key": "field_name",
161
81
  "match": {
162
- "value": "exact_value"
82
+ "value": "exact_value" // Works with strings, numbers, booleans
163
83
  }
164
84
  }
165
85
  ```
166
86
 
167
- Works with:
168
-
169
- - Strings: `"value": "electronics"`
170
- - Numbers: `"value": 4.5`
171
- - Booleans: `"value": true`
172
-
173
- ### Range Filter (Qdrant Native)
87
+ ### Range Filters (Native Qdrant)
174
88
 
175
89
  For numeric comparisons (future enhancement):
176
90
 
@@ -188,68 +102,54 @@ For numeric comparisons (future enhancement):
188
102
 
189
103
  ## Real-World Scenarios
190
104
 
191
- ### E-commerce Product Search
192
-
193
- "Show me affordable fitness equipment"
194
-
195
105
  ```
106
+ # E-commerce: affordable fitness equipment
196
107
  Search products for "fitness equipment" with filter {"must": [{"key": "category", "match": {"value": "sports"}}, {"key": "in_stock", "match": {"value": true}}]}
197
- ```
198
-
199
- ### Inventory Management
200
108
 
201
- "Which electronics need restocking?"
202
-
203
- ```
109
+ # Inventory: electronics needing restock
204
110
  Search products for "electronics" with filter {"must": [{"key": "category", "match": {"value": "electronics"}}, {"key": "in_stock", "match": {"value": false}}]}
205
- ```
206
-
207
- ### Quality Control
208
111
 
209
- "Show me all highly-rated available products"
210
-
211
- ```
112
+ # Quality control: highly-rated available products
212
113
  Search products for "top rated products" with filter {"must": [{"key": "in_stock", "match": {"value": true}}]}
213
114
  ```
214
115
 
215
- ## Limitations and Workarounds
116
+ ## Workarounds
216
117
 
217
- ### Current Limitations
118
+ ### Price Ranges
218
119
 
219
- 1. **No native range filters**: Can't directly filter by price ranges like "between $50-$100"
220
- 2. **No text search on metadata**: Metadata matching is exact, not fuzzy
221
- 3. **No nested object queries**: Flat metadata structure only
120
+ Add `price_tier` to metadata:
222
121
 
223
- ### Workarounds
122
+ ```json
123
+ {"price_tier": "budget", "price": 29.99} // <$50
124
+ {"price_tier": "mid", "price": 149.99} // $50-$200
125
+ {"price_tier": "premium", "price": 299.99} // >$200
126
+ ```
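A minimal sketch of how the tier could be derived when documents are written, assuming the boundaries shown in the comments above (under $50, $50 to $200, over $200). The function names `priceTier` and `tierFilter` are hypothetical and not part of the server.

```typescript
type PriceTier = "budget" | "mid" | "premium";

// Map a numeric price onto the tiers used in the metadata example.
// Assumed boundaries: < 50 is budget, 50..200 inclusive is mid, > 200 is premium.
function priceTier(price: number): PriceTier {
  if (price < 50) return "budget";
  if (price <= 200) return "mid";
  return "premium";
}

// Build an exact-match filter on the derived tier.
function tierFilter(tier: PriceTier) {
  return { must: [{ key: "price_tier", match: { value: tier } }] };
}
```

Storing both `price` and `price_tier` keeps the exact value available for display while the tier stays filterable with a plain match condition.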
224
127
 
225
- 1. **Price ranges**: Add price_tier to metadata:
128
+ ### Multiple Categories
226
129
 
227
- ```json
228
- {"price_tier": "budget", "price": 29.99} // budget: <$50
229
- {"price_tier": "mid", "price": 149.99} // mid: $50-$200
230
- {"price_tier": "premium", "price": 299.99} // premium: >$200
231
- ```
130
+ Use array-based tags:
232
131
 
233
- 2. **Multiple categories**: Use array-based tags:
132
+ ```json
133
+ { "tags": ["electronics", "wearable", "fitness"] }
134
+ ```
135
+
136
+ ### Date Filtering
234
137
 
235
- ```json
236
- { "tags": ["electronics", "wearable", "fitness"] }
237
- ```
138
+ Use comparable string format:
238
139
 
239
- 3. **Date filtering**: Store dates as strings in comparable format:
240
- ```json
241
- { "created_date": "2024-03-15" } // YYYY-MM-DD for lexicographic comparison
242
- ```
140
+ ```json
141
+ { "created_date": "2024-03-15" } // YYYY-MM-DD
142
+ ```
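A small sketch of producing and comparing such date strings. Because match filters are exact, the comparison here is assumed to happen in client code (for example, when post-filtering results); the helper names `toDateKey` and `isOnOrAfter` are hypothetical.

```typescript
// Format a Date as YYYY-MM-DD using its UTC calendar date.
function toDateKey(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Zero-padded YYYY-MM-DD strings sort in chronological order,
// so plain string comparison is enough.
function isOnOrAfter(dateKey: string, cutoffKey: string): boolean {
  return dateKey >= cutoffKey;
}
```

Note that `toISOString` reports the UTC date, which can differ from the local calendar date near midnight.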
243
143
 
244
144
  ## Best Practices
245
145
 
246
- 1. **Keep metadata flat**: Avoid deep nesting for better filter performance
247
- 2. **Use consistent types**: Don't mix strings and numbers for the same field
248
- 3. **Index commonly filtered fields**: Design metadata around common queries
249
- 4. **Test filters first**: Validate filter syntax before complex queries
250
- 5. **Combine with semantic search**: Use filters to narrow, then semantic search to rank
146
+ 1. **Flat metadata** - Avoid deep nesting
147
+ 2. **Consistent types** - Don't mix strings and numbers for the same field
148
+ 3. **Index common fields** - Design around frequent queries
149
+ 4. **Test filters first** - Validate syntax before complex queries
150
+ 5. **Combine with search** - Use filters to narrow, semantic search to rank
251
151
 
252
- ## Clean Up
152
+ ## Cleanup
253
153
 
254
154
  ```
255
155
  Delete collection "products"
@@ -258,5 +158,5 @@ Delete collection "products"
258
158
  ## Next Steps
259
159
 
260
160
  - Review [Qdrant filtering documentation](https://qdrant.tech/documentation/concepts/filtering/)
261
- - Explore the [Knowledge Base Example](../knowledge-base/) for organizational patterns
262
- - Check the main README for full filter syntax support
161
+ - Explore the [Knowledge Base](../knowledge-base/) example for organizational patterns
162
+ - See the [main README](../../README.md) for the complete filter syntax reference
@@ -0,0 +1,199 @@
1
+ # Hybrid Search
2
+
3
+ Combine semantic vector search with keyword (BM25) search for more accurate and comprehensive results.
4
+
5
+ **Time:** 15-20 minutes | **Difficulty:** Intermediate
6
+
7
+ ## What is Hybrid Search?
8
+
9
+ Hybrid search combines two search approaches:
10
+
11
+ 1. **Semantic Search**: Understands meaning and context using vector embeddings
12
+ 2. **Keyword Search**: Exact term matching using BM25 sparse vectors
13
+
14
+ The results are merged using **Reciprocal Rank Fusion (RRF)**, which combines rankings from both methods to produce the best overall results.
15
+
16
+ ## When to Use Hybrid Search
17
+
18
+ Hybrid search is ideal for:
19
+
20
+ - **Technical documentation**: Users search for exact function names + concepts
21
+ - **Product search**: Match SKUs/model numbers + descriptions
22
+ - **Legal documents**: Exact citations + semantic context
23
+ - **Code search**: Function names + natural language descriptions
24
+ - **Mixed queries**: "authentication JWT" (semantic + exact keyword)
25
+
26
+ ## Benefits
27
+
28
+ - **Best of both worlds**: Precision (keyword) + recall (semantic)
29
+ - **Better results for ambiguous queries**
30
+ - **Handles typos** (semantic) and **exact matches** (keyword)
31
+ - **More control** over result relevance
32
+
33
+ ## Workflow
34
+
35
+ ### 1. Create a Collection with Hybrid Search Enabled
36
+
37
+ ```
38
+ Create a collection named "technical_docs" with Cosine distance and enableHybrid set to true
39
+ ```
40
+
41
+ **Important**: Set `enableHybrid: true` to enable hybrid search capabilities.
42
+
43
+ ### 2. Add Documents
44
+
45
+ Documents are automatically indexed for both semantic and keyword search:
46
+
47
+ ```
48
+ Add these documents to technical_docs:
49
+ - id: 1, text: "The authenticateUser function validates JWT tokens for user sessions",
50
+ metadata: {"category": "authentication", "type": "function"}
51
+ - id: 2, text: "JWT (JSON Web Token) is a compact URL-safe means of representing claims",
52
+ metadata: {"category": "authentication", "type": "definition"}
53
+ - id: 3, text: "OAuth2 provides authorization framework for third-party applications",
54
+ metadata: {"category": "authentication", "type": "protocol"}
55
+ - id: 4, text: "The login endpoint requires username and password credentials",
56
+ metadata: {"category": "authentication", "type": "endpoint"}
57
+ ```
58
+
59
+ ### 3. Perform Hybrid Search
60
+
61
+ Search using both semantic understanding and keyword matching:
62
+
63
+ ```
64
+ Search technical_docs for "JWT authentication function" with limit 3 using hybrid_search
65
+ ```
66
+
67
+ **Result**: Documents are ranked by combining:
68
+
69
+ - Semantic similarity to "authentication function"
70
+ - Exact keyword matches for "JWT"
71
+
72
+ ### 4. Hybrid Search with Filters
73
+
74
+ Combine hybrid search with metadata filtering:
75
+
76
+ ```
77
+ Search technical_docs for "JWT token validation" with limit 2 and filter {"type": "function"} using hybrid_search
78
+ ```
79
+
80
+ ## Comparison: Semantic vs Hybrid Search
81
+
82
+ ### Semantic Search Only
83
+
84
+ ```
85
+ Search technical_docs for "JWT authentication" with limit 3 using semantic_search
86
+ ```
87
+
88
+ **Result**: May miss documents with exact "JWT" match if they're not semantically similar.
89
+
90
+ ### Hybrid Search
91
+
92
+ ```
93
+ Search technical_docs for "JWT authentication" with limit 3 using hybrid_search
94
+ ```
95
+
96
+ **Result**: Finds all of the following:
97
+
98
+ - Documents semantically related to authentication
99
+ - Documents with exact "JWT" keyword match
100
+ - Best combination ranked by RRF
101
+
102
+ ## Example Scenarios
103
+
104
+ ### Scenario 1: Exact Term + Context
105
+
106
+ **Query**: "authenticateUser JWT"
107
+
108
+ **Hybrid Search finds**:
109
+
110
+ 1. Documents with `authenticateUser` function name (keyword match)
111
+ 2. Documents about JWT authentication (semantic match)
112
+ 3. Best combination of both
113
+
114
+ **Pure semantic search might miss**: The exact function name, if the surrounding text uses different terminology.
115
+
116
+ ### Scenario 2: Acronym + Description
117
+
118
+ **Query**: "API rate limiting"
119
+
120
+ **Hybrid Search finds**:
121
+
122
+ 1. Documents with "API" acronym (keyword match)
123
+ 2. Documents about rate limiting concepts (semantic match)
124
+ 3. Documents mentioning "API rate limiting" get highest score
125
+
126
+ ### Scenario 3: Typos + Exact Terms
127
+
128
+ **Query**: "OAuth2 authentification"
129
+
130
+ **Hybrid Search finds**:
131
+
132
+ 1. "OAuth2" exact matches (keyword - ignores typo in other term)
133
+ 2. Authentication concepts (semantic - understands "authentification" ≈ "authentication")
134
+
135
+ ## Technical Details
136
+
137
+ ### How It Works
138
+
139
+ 1. **Dense Vector Generation**: Your query is embedded using the configured embedding provider (Ollama, OpenAI, etc.)
140
+ 2. **Sparse Vector Generation**: Query is tokenized and BM25 scores are calculated
141
+ 3. **Parallel Search**: Both vectors are searched simultaneously
142
+ 4. **Result Fusion**: RRF combines rankings from both searches
143
+ 5. **Final Ranking**: Merged results with combined relevance scores
144
+
145
+ ### BM25 Sparse Vectors
146
+
147
+ The server uses a lightweight BM25 implementation for sparse vectors:
148
+
149
+ - Tokenization: Lowercase + whitespace splitting
150
+ - IDF scoring: Inverse document frequency
151
+ - Configurable parameters: k1=1.2, b=0.75
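To make the bullets above concrete, here is a sketch of textbook Okapi BM25 scoring with lowercase whitespace tokenization and the listed defaults (k1=1.2, b=0.75). It is not the server's code: the server produces sparse vectors rather than scoring documents directly, and the IDF variant used here (the common `ln(1 + (N - df + 0.5) / (df + 0.5))` form) and the names `tokenize` and `bm25Score` are assumptions.

```typescript
// Lowercase + whitespace splitting, as described above.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

interface Bm25Params {
  k1: number; // term-frequency saturation
  b: number; // document-length normalization strength
}

const DEFAULT_PARAMS: Bm25Params = { k1: 1.2, b: 0.75 };

// BM25 score of corpus[docIndex] for the given query.
function bm25Score(
  query: string,
  docIndex: number,
  corpus: string[],
  params: Bm25Params = DEFAULT_PARAMS,
): number {
  const docs = corpus.map(tokenize);
  const doc = docs[docIndex];
  if (doc === undefined) throw new RangeError("docIndex out of range");

  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / docs.length;
  let score = 0;

  for (const term of Array.from(new Set(tokenize(query)))) {
    const tf = doc.filter((t) => t === term).length;
    if (tf === 0) continue;
    // Document frequency: number of documents containing the term.
    const df = docs.filter((d) => d.includes(term)).length;
    const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
    const lengthNorm = 1 - params.b + params.b * (doc.length / avgLen);
    score += (idf * tf * (params.k1 + 1)) / (tf + params.k1 * lengthNorm);
  }
  return score;
}
```

With plain whitespace splitting, punctuation stays attached to tokens, so a term like "(JSON" would not match a query for "json"; that is one reason exact keyword hits depend on how the text is written.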
152
+
153
+ ### Reciprocal Rank Fusion (RRF)
154
+
155
+ RRF formula: `score = Σ(1 / (k + rank))` where k=60 (default)
156
+
157
+ Benefits:
158
+
159
+ - No score normalization needed
160
+ - Robust to differences in score scales
161
+ - Works well for combining different ranking methods
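The formula above can be sketched in a few lines. This is an illustrative implementation, not the fusion code Qdrant or the server runs: it assumes ranks start at 1, uses k=60 as stated, and the function names and document ids are hypothetical.

```typescript
// Sum 1 / (k + rank) for each id across all ranked lists.
function reciprocalRankFusion(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // assumed 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Return ids ordered by fused score, highest first.
function fuse(rankings: string[][], k = 60): string[] {
  return Array.from(reciprocalRankFusion(rankings, k).entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical result lists from the semantic and keyword searches.
const semanticRanking = ["doc2", "doc1", "doc3"];
const keywordRanking = ["doc1", "doc4", "doc2"];
const fused = fuse([semanticRanking, keywordRanking]);
```

Only rank positions enter the formula, which is why the raw similarity and BM25 scores never need to be put on a common scale.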
162
+
163
+ ## Best Practices
164
+
165
+ 1. **Enable hybrid for technical content**: Use when exact terms matter
166
+ 2. **Use semantic for general content**: Natural language queries without technical terms
167
+ 3. **Combine with filters**: Narrow down results by category or type
168
+ 4. **Test both approaches**: Compare semantic vs hybrid for your use case
169
+ 5. **Monitor performance**: Hybrid search requires more computation
170
+
171
+ ## Performance Considerations
172
+
173
+ - **Storage**: Hybrid collections require more space (dense + sparse vectors)
174
+ - **Indexing**: Document indexing is slightly slower
175
+ - **Query time**: Hybrid search performs two searches and fusion
176
+ - **Scalability**: Qdrant optimizes both vector types efficiently
177
+
178
+ ## Troubleshooting
179
+
180
+ ### "Collection does not have hybrid search enabled"
181
+
182
+ **Solution**: Create a new collection with `enableHybrid: true`. Existing collections cannot be converted.
183
+
184
+ ### Poor results with hybrid search
185
+
186
+ **Try**:
187
+
188
+ 1. Adjust query phrasing to include key terms
189
+ 2. Use metadata filters to narrow scope
190
+ 3. Increase `limit` to see more results
191
+ 4. Compare with pure semantic search
192
+
193
+ ### Slow query performance
194
+
195
+ **Solutions**:
196
+
197
+ 1. Reduce prefetch limit (contact support for tuning)
198
+ 2. Add filters to narrow search space
199
+ 3. Use fewer documents or partition data