@mhalder/qdrant-mcp-server 1.1.1 → 1.3.0
- package/CHANGELOG.md +18 -0
- package/README.md +36 -0
- package/biome.json +34 -0
- package/build/embeddings/sparse.d.ts +40 -0
- package/build/embeddings/sparse.d.ts.map +1 -0
- package/build/embeddings/sparse.js +105 -0
- package/build/embeddings/sparse.js.map +1 -0
- package/build/embeddings/sparse.test.d.ts +2 -0
- package/build/embeddings/sparse.test.d.ts.map +1 -0
- package/build/embeddings/sparse.test.js +69 -0
- package/build/embeddings/sparse.test.js.map +1 -0
- package/build/index.js +333 -32
- package/build/index.js.map +1 -1
- package/build/qdrant/client.d.ts +21 -2
- package/build/qdrant/client.d.ts.map +1 -1
- package/build/qdrant/client.js +131 -17
- package/build/qdrant/client.js.map +1 -1
- package/build/qdrant/client.test.js +429 -21
- package/build/qdrant/client.test.js.map +1 -1
- package/build/transport.test.d.ts +2 -0
- package/build/transport.test.d.ts.map +1 -0
- package/build/transport.test.js +168 -0
- package/build/transport.test.js.map +1 -0
- package/examples/README.md +16 -1
- package/examples/basic/README.md +1 -0
- package/examples/hybrid-search/README.md +236 -0
- package/package.json +3 -1
- package/src/embeddings/sparse.test.ts +87 -0
- package/src/embeddings/sparse.ts +127 -0
- package/src/index.ts +393 -59
- package/src/qdrant/client.test.ts +544 -56
- package/src/qdrant/client.ts +162 -22
- package/src/transport.test.ts +202 -0
- package/vitest.config.ts +3 -3
package/CHANGELOG.md
CHANGED
|
@@ -1,3 +1,21 @@
|
|
|
1
|
+
## 1.3.0 (2025-10-11)
|
|
2
|
+
|
|
3
|
+
* Merge pull request #25 from mhalder/feature/http-transport ([efc90c3](https://github.com/mhalder/qdrant-mcp-server/commit/efc90c3)), closes [#25](https://github.com/mhalder/qdrant-mcp-server/issues/25)
|
|
4
|
+
* fix: address PR feedback for HTTP transport implementation ([1aec6d5](https://github.com/mhalder/qdrant-mcp-server/commit/1aec6d5))
|
|
5
|
+
* fix: address PR feedback for HTTP transport implementation ([3243d0e](https://github.com/mhalder/qdrant-mcp-server/commit/3243d0e))
|
|
6
|
+
* fix: clear cleanup interval on shutdown and improve error messages ([6aa29f3](https://github.com/mhalder/qdrant-mcp-server/commit/6aa29f3))
|
|
7
|
+
* fix: implement per-IP rate limiting and consolidate port validation ([c3bfc92](https://github.com/mhalder/qdrant-mcp-server/commit/c3bfc92))
|
|
8
|
+
* fix: prevent transport double closure and add rate limiter memory management ([2f92d78](https://github.com/mhalder/qdrant-mcp-server/commit/2f92d78))
|
|
9
|
+
* fix: resolve critical issues in HTTP transport implementation ([7951f2b](https://github.com/mhalder/qdrant-mcp-server/commit/7951f2b))
|
|
10
|
+
* fix: resolve race condition and resource leak in HTTP timeout handler ([6635ccb](https://github.com/mhalder/qdrant-mcp-server/commit/6635ccb))
|
|
11
|
+
* docs: add Try It and Cleanup sections to hybrid-search example ([5e32f16](https://github.com/mhalder/qdrant-mcp-server/commit/5e32f16))
|
|
12
|
+
* feat: add HTTP transport support for remote MCP server deployment ([983a9d6](https://github.com/mhalder/qdrant-mcp-server/commit/983a9d6)), closes [#24](https://github.com/mhalder/qdrant-mcp-server/issues/24)
|
|
13
|
+
|
|
14
|
+
## 1.2.0 (2025-10-11)
|
|
15
|
+
|
|
16
|
+
* Merge pull request #23 from mhalder/feature/hybrid-search ([5925df7](https://github.com/mhalder/qdrant-mcp-server/commit/5925df7)), closes [#23](https://github.com/mhalder/qdrant-mcp-server/issues/23)
|
|
17
|
+
* feat: enable semantic search on hybrid collections ([c99e177](https://github.com/mhalder/qdrant-mcp-server/commit/c99e177))
|
|
18
|
+
|
|
1
19
|
## <small>1.1.1 (2025-10-11)</small>
|
|
2
20
|
|
|
3
21
|
* Merge pull request #22 from mhalder/docs/clean-and-condense ([991cb9d](https://github.com/mhalder/qdrant-mcp-server/commit/991cb9d)), closes [#22](https://github.com/mhalder/qdrant-mcp-server/issues/22)
|
package/README.md
CHANGED
|
@@ -10,9 +10,11 @@ A Model Context Protocol (MCP) server providing semantic search capabilities usi
|
|
|
10
10
|
- **Zero Setup**: Works out of the box with Ollama - no API keys required
|
|
11
11
|
- **Privacy-First**: Local embeddings and vector storage - data never leaves your machine
|
|
12
12
|
- **Multiple Providers**: Ollama (default), OpenAI, Cohere, and Voyage AI
|
|
13
|
+
- **Hybrid Search**: Combine semantic and keyword search for better results
|
|
13
14
|
- **Semantic Search**: Natural language search with metadata filtering
|
|
14
15
|
- **Rate Limiting**: Intelligent throttling with exponential backoff
|
|
15
16
|
- **Full CRUD**: Create, search, and manage collections and documents
|
|
17
|
+
- **Flexible Deployment**: Run locally (stdio) or as a remote HTTP server
|
|
16
18
|
|
|
17
19
|
## Quick Start
|
|
18
20
|
|
|
@@ -39,6 +41,8 @@ npm run build
|
|
|
39
41
|
|
|
40
42
|
### Configuration
|
|
41
43
|
|
|
44
|
+
#### Local Setup (stdio transport)
|
|
45
|
+
|
|
42
46
|
Add to `~/.claude/claude_code_config.json`:
|
|
43
47
|
|
|
44
48
|
```json
|
|
@@ -56,6 +60,35 @@ Add to `~/.claude/claude_code_config.json`:
|
|
|
56
60
|
}
|
|
57
61
|
```
|
|
58
62
|
|
|
63
|
+
#### Remote Setup (HTTP transport)
|
|
64
|
+
|
|
65
|
+
> **⚠️ Security Warning**: When deploying the HTTP transport in production:
|
|
66
|
+
>
|
|
67
|
+
> - **Always** run behind a reverse proxy (nginx, Caddy) with HTTPS
|
|
68
|
+
> - Implement authentication/authorization at the proxy level
|
|
69
|
+
> - Use firewalls to restrict access to trusted networks
|
|
70
|
+
> - Never expose directly to the public internet without protection
|
|
71
|
+
> - Consider implementing rate limiting at the proxy level
|
|
72
|
+
> - Monitor server logs for suspicious activity
|
|
73
|
+
|
|
74
|
+
**Start the server:**
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
TRANSPORT_MODE=http HTTP_PORT=3000 node build/index.js
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
**Configure client:**
|
|
81
|
+
|
|
82
|
+
```json
|
|
83
|
+
{
|
|
84
|
+
"mcpServers": {
|
|
85
|
+
"qdrant": {
|
|
86
|
+
"url": "http://your-server:3000/mcp"
|
|
87
|
+
}
|
|
88
|
+
}
|
|
89
|
+
}
|
|
90
|
+
```
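The `TRANSPORT_MODE` and `HTTP_PORT` variables used in the start command could be validated before a transport is chosen. The following is a minimal sketch, not the package's code: the helper name `resolveTransport` and the `TransportConfig` type are hypothetical. Only the variable names and their documented defaults (`stdio`, `3000`) come from the README.

```typescript
// Hypothetical helper, not part of the package: it only illustrates how the
// documented TRANSPORT_MODE and HTTP_PORT variables could be interpreted.
type TransportConfig = { mode: "stdio" } | { mode: "http"; port: number };

function resolveTransport(env: Record<string, string | undefined>): TransportConfig {
  const mode = (env.TRANSPORT_MODE ?? "stdio").toLowerCase();
  if (mode === "stdio") {
    return { mode: "stdio" };
  }
  if (mode !== "http") {
    throw new Error(`Unsupported TRANSPORT_MODE "${mode}" (expected "stdio" or "http")`);
  }

  const rawPort = env.HTTP_PORT ?? "3000";
  // Accept only plain decimal digits, then check the valid TCP port range.
  if (!/^\d+$/.test(rawPort)) {
    throw new Error(`HTTP_PORT must be an integer, got "${rawPort}"`);
  }
  const port = Number(rawPort);
  if (port < 1 || port > 65535) {
    throw new Error(`HTTP_PORT must be between 1 and 65535, got ${port}`);
  }
  return { mode: "http", port };
}

console.log(resolveTransport({ TRANSPORT_MODE: "http", HTTP_PORT: "3000" }));
```

In a Node.js entry point such a helper would typically be called with `process.env`, so that a bad port is rejected at startup rather than when the listener is created.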
|
|
91
|
+
|
|
59
92
|
**Using a different provider:**
|
|
60
93
|
|
|
61
94
|
```json
|
|
@@ -87,6 +120,7 @@ See [Advanced Configuration](#advanced-configuration) section below for all opti
|
|
|
87
120
|
| ------------------ | ----------------------------------------------------------------------------- |
|
|
88
121
|
| `add_documents` | Add documents with automatic embedding (supports string/number IDs, metadata) |
|
|
89
122
|
| `semantic_search` | Natural language search with optional metadata filtering |
|
|
123
|
+
| `hybrid_search` | Hybrid search combining semantic and keyword (BM25) search with RRF |
|
|
90
124
|
| `delete_documents` | Delete specific documents by ID |
|
|
91
125
|
|
|
92
126
|
### Resources
|
|
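The `hybrid_search` row above mentions RRF (Reciprocal Rank Fusion). As an illustration of the idea only, the sketch below fuses two ranked ID lists with the usual RRF score, the sum over lists of `1 / (k + rank)`. The function name `fuseRrf`, the sample IDs, and the constant `k = 60` (a commonly used value) are assumptions; this is not the package's code, which may delegate fusion to Qdrant itself.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d),
// with ranks starting at 1. Names and k = 60 are illustrative assumptions.
function fuseRrf(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      const rank = position + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  const fused: Array<{ id: string; score: number }> = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  // Highest fused score first.
  return fused.sort((a, b) => b.score - a.score);
}

const semanticRanking = ["doc-a", "doc-b", "doc-c"]; // hypothetical dense-vector results
const keywordRanking = ["doc-b", "doc-d", "doc-a"]; // hypothetical BM25 results
console.log(fuseRrf([semanticRanking, keywordRanking]));
```

With these sample lists `doc-b` comes out first (1/62 + 1/61), just ahead of `doc-a` (1/61 + 1/63), because it sits near the top of both rankings. Documents found by only one method still appear, with lower scores.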
@@ -109,6 +143,8 @@ See [examples/](examples/) directory for detailed guides:
|
|
|
109
143
|
|
|
110
144
|
| Variable | Description | Default |
|
|
111
145
|
| ----------------------------------- | -------------------------------------- | --------------------- |
|
|
146
|
+
| `TRANSPORT_MODE` | "stdio" or "http" | stdio |
|
|
147
|
+
| `HTTP_PORT` | Port for HTTP transport | 3000 |
|
|
112
148
|
| `EMBEDDING_PROVIDER` | "ollama", "openai", "cohere", "voyage" | ollama |
|
|
113
149
|
| `QDRANT_URL` | Qdrant server URL | http://localhost:6333 |
|
|
114
150
|
| `EMBEDDING_MODEL` | Model name | Provider-specific |
|
package/biome.json
ADDED
|
@@ -0,0 +1,34 @@
|
|
|
1
|
+
{
|
|
2
|
+
"$schema": "https://biomejs.dev/schemas/2.2.5/schema.json",
|
|
3
|
+
"vcs": {
|
|
4
|
+
"enabled": true,
|
|
5
|
+
"clientKind": "git",
|
|
6
|
+
"useIgnoreFile": true,
|
|
7
|
+
"defaultBranch": "main"
|
|
8
|
+
},
|
|
9
|
+
"files": {
|
|
10
|
+
"ignoreUnknown": false
|
|
11
|
+
},
|
|
12
|
+
"formatter": {
|
|
13
|
+
"enabled": true,
|
|
14
|
+
"formatWithErrors": true,
|
|
15
|
+
"indentStyle": "space",
|
|
16
|
+
"indentWidth": 2,
|
|
17
|
+
"lineWidth": 100
|
|
18
|
+
},
|
|
19
|
+
"linter": {
|
|
20
|
+
"enabled": true,
|
|
21
|
+
"rules": {
|
|
22
|
+
"recommended": true
|
|
23
|
+
}
|
|
24
|
+
},
|
|
25
|
+
"javascript": {
|
|
26
|
+
"formatter": {
|
|
27
|
+
"quoteStyle": "double",
|
|
28
|
+
"jsxQuoteStyle": "double",
|
|
29
|
+
"trailingCommas": "es5",
|
|
30
|
+
"semicolons": "always",
|
|
31
|
+
"arrowParentheses": "always"
|
|
32
|
+
}
|
|
33
|
+
}
|
|
34
|
+
}
|
package/build/embeddings/sparse.d.ts
ADDED
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* BM25 Sparse Vector Generator
|
|
3
|
+
*
|
|
4
|
+
* This module provides a simple BM25-like sparse vector generation for keyword search.
|
|
5
|
+
* For production use, consider using a proper BM25 implementation or Qdrant's built-in
|
|
6
|
+
* sparse vector generation via FastEmbed.
|
|
7
|
+
*/
|
|
8
|
+
import type { SparseVector } from "../qdrant/client.js";
|
|
9
|
+
export declare class BM25SparseVectorGenerator {
|
|
10
|
+
private vocabulary;
|
|
11
|
+
private idfScores;
|
|
12
|
+
private documentCount;
|
|
13
|
+
private k1;
|
|
14
|
+
private b;
|
|
15
|
+
constructor(k1?: number, b?: number);
|
|
16
|
+
/**
|
|
17
|
+
* Tokenize text into words (simple whitespace tokenization + lowercase)
|
|
18
|
+
*/
|
|
19
|
+
private tokenize;
|
|
20
|
+
/**
|
|
21
|
+
* Calculate term frequency for a document
|
|
22
|
+
*/
|
|
23
|
+
private getTermFrequency;
|
|
24
|
+
/**
|
|
25
|
+
* Build vocabulary from training documents (optional pre-training step)
|
|
26
|
+
* In a simple implementation, we can skip this and use on-the-fly vocabulary
|
|
27
|
+
*/
|
|
28
|
+
train(documents: string[]): void;
|
|
29
|
+
/**
|
|
30
|
+
* Generate sparse vector for a query or document
|
|
31
|
+
* Returns indices and values for non-zero dimensions
|
|
32
|
+
*/
|
|
33
|
+
generate(text: string, avgDocLength?: number): SparseVector;
|
|
34
|
+
/**
|
|
35
|
+
* Simple static method for generating sparse vectors without training
|
|
36
|
+
* Useful for quick implementation
|
|
37
|
+
*/
|
|
38
|
+
static generateSimple(text: string): SparseVector;
|
|
39
|
+
}
|
|
40
|
+
//# sourceMappingURL=sparse.d.ts.map
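The `tokenize` declaration above describes lowercase, whitespace-based tokenization. The standalone sketch below mirrors the steps of the compiled `tokenize` method shown in `sparse.js` in this diff, so they can be tried in isolation (the real method is private). One behavior worth knowing: without the `u` flag, `\w` matches only ASCII letters, digits, and underscore, so accented characters are replaced by spaces along with punctuation.

```typescript
// Standalone copy of the tokenization steps for experimentation; the real
// method is private on BM25SparseVectorGenerator.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^\w\s]/g, " ") // punctuation and non-ASCII letters become spaces
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

console.log(tokenize("hello, world! how are you?")); // [ 'hello', 'world', 'how', 'are', 'you' ]
console.log(tokenize("Caf\u00e9 au lait")); // [ 'caf', 'au', 'lait' ]
```

For corpora with non-English text, a Unicode-aware pattern would probably be needed to keep such words intact.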
|
package/build/embeddings/sparse.d.ts.map
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"sparse.d.ts","sourceRoot":"","sources":["../../src/embeddings/sparse.ts"],"names":[],"mappings":"AAAA;;;;;;GAMG;AAEH,OAAO,KAAK,EAAE,YAAY,EAAE,MAAM,qBAAqB,CAAC;AAMxD,qBAAa,yBAAyB;IACpC,OAAO,CAAC,UAAU,CAAsB;IACxC,OAAO,CAAC,SAAS,CAAsB;IACvC,OAAO,CAAC,aAAa,CAAS;IAC9B,OAAO,CAAC,EAAE,CAAS;IACnB,OAAO,CAAC,CAAC,CAAS;gBAEN,EAAE,GAAE,MAAY,EAAE,CAAC,GAAE,MAAa;IAQ9C;;OAEG;IACH,OAAO,CAAC,QAAQ;IAQhB;;OAEG;IACH,OAAO,CAAC,gBAAgB;IAQxB;;;OAGG;IACH,KAAK,CAAC,SAAS,EAAE,MAAM,EAAE,GAAG,IAAI;IAwBhC;;;OAGG;IACH,QAAQ,CAAC,IAAI,EAAE,MAAM,EAAE,YAAY,GAAE,MAAW,GAAG,YAAY;IAmC/D;;;OAGG;IACH,MAAM,CAAC,cAAc,CAAC,IAAI,EAAE,MAAM,GAAG,YAAY;CAIlD"}
|
package/build/embeddings/sparse.js
ADDED
|
@@ -0,0 +1,105 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* BM25 Sparse Vector Generator
|
|
3
|
+
*
|
|
4
|
+
* This module provides a simple BM25-like sparse vector generation for keyword search.
|
|
5
|
+
* For production use, consider using a proper BM25 implementation or Qdrant's built-in
|
|
6
|
+
* sparse vector generation via FastEmbed.
|
|
7
|
+
*/
|
|
8
|
+
export class BM25SparseVectorGenerator {
|
|
9
|
+
vocabulary;
|
|
10
|
+
idfScores;
|
|
11
|
+
documentCount;
|
|
12
|
+
k1;
|
|
13
|
+
b;
|
|
14
|
+
constructor(k1 = 1.2, b = 0.75) {
|
|
15
|
+
this.vocabulary = new Map();
|
|
16
|
+
this.idfScores = new Map();
|
|
17
|
+
this.documentCount = 0;
|
|
18
|
+
this.k1 = k1;
|
|
19
|
+
this.b = b;
|
|
20
|
+
}
|
|
21
|
+
/**
|
|
22
|
+
* Tokenize text into words (simple whitespace tokenization + lowercase)
|
|
23
|
+
*/
|
|
24
|
+
tokenize(text) {
|
|
25
|
+
return text
|
|
26
|
+
.toLowerCase()
|
|
27
|
+
.replace(/[^\w\s]/g, " ")
|
|
28
|
+
.split(/\s+/)
|
|
29
|
+
.filter((token) => token.length > 0);
|
|
30
|
+
}
|
|
31
|
+
/**
|
|
32
|
+
* Calculate term frequency for a document
|
|
33
|
+
*/
|
|
34
|
+
getTermFrequency(tokens) {
|
|
35
|
+
const tf = {};
|
|
36
|
+
for (const token of tokens) {
|
|
37
|
+
tf[token] = (tf[token] || 0) + 1;
|
|
38
|
+
}
|
|
39
|
+
return tf;
|
|
40
|
+
}
|
|
41
|
+
/**
|
|
42
|
+
* Build vocabulary from training documents (optional pre-training step)
|
|
43
|
+
* In a simple implementation, we can skip this and use on-the-fly vocabulary
|
|
44
|
+
*/
|
|
45
|
+
train(documents) {
|
|
46
|
+
this.documentCount = documents.length;
|
|
47
|
+
const documentFrequency = new Map();
|
|
48
|
+
// Calculate document frequency for each term
|
|
49
|
+
for (const doc of documents) {
|
|
50
|
+
const tokens = this.tokenize(doc);
|
|
51
|
+
const uniqueTokens = new Set(tokens);
|
|
52
|
+
for (const token of uniqueTokens) {
|
|
53
|
+
if (!this.vocabulary.has(token)) {
|
|
54
|
+
this.vocabulary.set(token, this.vocabulary.size);
|
|
55
|
+
}
|
|
56
|
+
documentFrequency.set(token, (documentFrequency.get(token) || 0) + 1);
|
|
57
|
+
}
|
|
58
|
+
}
|
|
59
|
+
// Calculate IDF scores
|
|
60
|
+
for (const [token, df] of documentFrequency.entries()) {
|
|
61
|
+
const idf = Math.log((this.documentCount - df + 0.5) / (df + 0.5) + 1.0);
|
|
62
|
+
this.idfScores.set(token, idf);
|
|
63
|
+
}
|
|
64
|
+
}
|
|
65
|
+
/**
|
|
66
|
+
* Generate sparse vector for a query or document
|
|
67
|
+
* Returns indices and values for non-zero dimensions
|
|
68
|
+
*/
|
|
69
|
+
generate(text, avgDocLength = 50) {
|
|
70
|
+
const tokens = this.tokenize(text);
|
|
71
|
+
const tf = this.getTermFrequency(tokens);
|
|
72
|
+
const docLength = tokens.length;
|
|
73
|
+
const indices = [];
|
|
74
|
+
const values = [];
|
|
75
|
+
// Calculate BM25 score for each term
|
|
76
|
+
for (const [token, freq] of Object.entries(tf)) {
|
|
77
|
+
// Ensure token is in vocabulary
|
|
78
|
+
if (!this.vocabulary.has(token)) {
|
|
79
|
+
// For unseen tokens, add them to vocabulary dynamically
|
|
80
|
+
this.vocabulary.set(token, this.vocabulary.size);
|
|
81
|
+
}
|
|
82
|
+
const index = this.vocabulary.get(token);
|
|
83
|
+
// Use a default IDF if not trained
|
|
84
|
+
const idf = this.idfScores.get(token) || 1.0;
|
|
85
|
+
// BM25 formula
|
|
86
|
+
const numerator = freq * (this.k1 + 1);
|
|
87
|
+
const denominator = freq + this.k1 * (1 - this.b + this.b * (docLength / avgDocLength));
|
|
88
|
+
const score = idf * (numerator / denominator);
|
|
89
|
+
if (score > 0) {
|
|
90
|
+
indices.push(index);
|
|
91
|
+
values.push(score);
|
|
92
|
+
}
|
|
93
|
+
}
|
|
94
|
+
return { indices, values };
|
|
95
|
+
}
|
|
96
|
+
/**
|
|
97
|
+
* Simple static method for generating sparse vectors without training
|
|
98
|
+
* Useful for quick implementation
|
|
99
|
+
*/
|
|
100
|
+
static generateSimple(text) {
|
|
101
|
+
const generator = new BM25SparseVectorGenerator();
|
|
102
|
+
return generator.generate(text);
|
|
103
|
+
}
|
|
104
|
+
}
|
|
105
|
+
//# sourceMappingURL=sparse.js.map
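To make the scoring in `generate` concrete, the sketch below reproduces the per-term weight as a standalone function, using the same defaults that appear in the code above (`k1 = 1.2`, `b = 0.75`, `avgDocLength = 50`, fallback IDF `1.0`). The function name `bm25TermWeight` and the options object are illustrative and not part of the package.

```typescript
interface Bm25Options {
  k1?: number;
  b?: number;
  avgDocLength?: number;
  idf?: number;
}

// Same arithmetic as the loop body in generate(): idf * tf-saturation term
// with document-length normalization.
function bm25TermWeight(freq: number, docLength: number, options: Bm25Options = {}): number {
  const { k1 = 1.2, b = 0.75, avgDocLength = 50, idf = 1.0 } = options;
  const numerator = freq * (k1 + 1);
  const denominator = freq + k1 * (1 - b + b * (docLength / avgDocLength));
  return idf * (numerator / denominator);
}

// "hello world": each term occurs once in a two-token text.
console.log(bm25TermWeight(1, 2).toFixed(4)); // "1.6467"
```

For an untrained generator, both tokens of `"hello world"` therefore get a weight of about 1.65. Because the text is much shorter than the assumed average of 50 tokens, length normalization lifts the weight above the fallback IDF of 1.0. A text of exactly 50 tokens would give a single-occurrence term a weight of 1.0.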
|
package/build/embeddings/sparse.js.map
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"sparse.js","sourceRoot":"","sources":["../../src/embeddings/sparse.ts"],"names":[],"mappings":"AAAA;;;;;;GAMG;AAQH,MAAM,OAAO,yBAAyB;IAC5B,UAAU,CAAsB;IAChC,SAAS,CAAsB;IAC/B,aAAa,CAAS;IACtB,EAAE,CAAS;IACX,CAAC,CAAS;IAElB,YAAY,KAAa,GAAG,EAAE,IAAY,IAAI;QAC5C,IAAI,CAAC,UAAU,GAAG,IAAI,GAAG,EAAE,CAAC;QAC5B,IAAI,CAAC,SAAS,GAAG,IAAI,GAAG,EAAE,CAAC;QAC3B,IAAI,CAAC,aAAa,GAAG,CAAC,CAAC;QACvB,IAAI,CAAC,EAAE,GAAG,EAAE,CAAC;QACb,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC;IACb,CAAC;IAED;;OAEG;IACK,QAAQ,CAAC,IAAY;QAC3B,OAAO,IAAI;aACR,WAAW,EAAE;aACb,OAAO,CAAC,UAAU,EAAE,GAAG,CAAC;aACxB,KAAK,CAAC,KAAK,CAAC;aACZ,MAAM,CAAC,CAAC,KAAK,EAAE,EAAE,CAAC,KAAK,CAAC,MAAM,GAAG,CAAC,CAAC,CAAC;IACzC,CAAC;IAED;;OAEG;IACK,gBAAgB,CAAC,MAAgB;QACvC,MAAM,EAAE,GAAmB,EAAE,CAAC;QAC9B,KAAK,MAAM,KAAK,IAAI,MAAM,EAAE,CAAC;YAC3B,EAAE,CAAC,KAAK,CAAC,GAAG,CAAC,EAAE,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC;QACnC,CAAC;QACD,OAAO,EAAE,CAAC;IACZ,CAAC;IAED;;;OAGG;IACH,KAAK,CAAC,SAAmB;QACvB,IAAI,CAAC,aAAa,GAAG,SAAS,CAAC,MAAM,CAAC;QACtC,MAAM,iBAAiB,GAAG,IAAI,GAAG,EAAkB,CAAC;QAEpD,6CAA6C;QAC7C,KAAK,MAAM,GAAG,IAAI,SAAS,EAAE,CAAC;YAC5B,MAAM,MAAM,GAAG,IAAI,CAAC,QAAQ,CAAC,GAAG,CAAC,CAAC;YAClC,MAAM,YAAY,GAAG,IAAI,GAAG,CAAC,MAAM,CAAC,CAAC;YAErC,KAAK,MAAM,KAAK,IAAI,YAAY,EAAE,CAAC;gBACjC,IAAI,CAAC,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,KAAK,CAAC,EAAE,CAAC;oBAChC,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,KAAK,EAAE,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC,CAAC;gBACnD,CAAC;gBACD,iBAAiB,CAAC,GAAG,CAAC,KAAK,EAAE,CAAC,iBAAiB,CAAC,GAAG,CAAC,KAAK,CAAC,IAAI,CAAC,CAAC,GAAG,CAAC,CAAC,CAAC;YACxE,CAAC;QACH,CAAC;QAED,uBAAuB;QACvB,KAAK,MAAM,CAAC,KAAK,EAAE,EAAE,CAAC,IAAI,iBAAiB,CAAC,OAAO,EAAE,EAAE,CAAC;YACtD,MAAM,GAAG,GAAG,IAAI,CAAC,GAAG,CAAC,CAAC,IAAI,CAAC,aAAa,GAAG,EAAE,GAAG,GAAG,CAAC,GAAG,CAAC,EAAE,GAAG,GAAG,CAAC,GAAG,GAAG,CAAC,CAAC;YACzE,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,KAAK,EAAE,GAAG,CAAC,CAAC;QACjC,CAAC;IACH,CAAC;IAED;;;OAGG;IACH,QAAQ,CAAC,IAAY,EAAE,eAAuB,EAAE;QAC9C,MAAM,MAAM,GAAG,IAAI,CAAC,QAAQ,CAAC,IAAI,CAAC,CAAC;QACnC,MAAM,EAAE,GAAG,IAAI,CAAC,gBAAgB,CAAC,MAAM,C
AAC,CAAC;QACzC,MAAM,SAAS,GAAG,MAAM,CAAC,MAAM,CAAC;QAEhC,MAAM,OAAO,GAAa,EAAE,CAAC;QAC7B,MAAM,MAAM,GAAa,EAAE,CAAC;QAE5B,qCAAqC;QACrC,KAAK,MAAM,CAAC,KAAK,EAAE,IAAI,CAAC,IAAI,MAAM,CAAC,OAAO,CAAC,EAAE,CAAC,EAAE,CAAC;YAC/C,gCAAgC;YAChC,IAAI,CAAC,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,KAAK,CAAC,EAAE,CAAC;gBAChC,wDAAwD;gBACxD,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,KAAK,EAAE,IAAI,CAAC,UAAU,CAAC,IAAI,CAAC,CAAC;YACnD,CAAC;YAED,MAAM,KAAK,GAAG,IAAI,CAAC,UAAU,CAAC,GAAG,CAAC,KAAK,CAAE,CAAC;YAE1C,mCAAmC;YACnC,MAAM,GAAG,GAAG,IAAI,CAAC,SAAS,CAAC,GAAG,CAAC,KAAK,CAAC,IAAI,GAAG,CAAC;YAE7C,eAAe;YACf,MAAM,SAAS,GAAG,IAAI,GAAG,CAAC,IAAI,CAAC,EAAE,GAAG,CAAC,CAAC,CAAC;YACvC,MAAM,WAAW,GAAG,IAAI,GAAG,IAAI,CAAC,EAAE,GAAG,CAAC,CAAC,GAAG,IAAI,CAAC,CAAC,GAAG,IAAI,CAAC,CAAC,GAAG,CAAC,SAAS,GAAG,YAAY,CAAC,CAAC,CAAC;YACxF,MAAM,KAAK,GAAG,GAAG,GAAG,CAAC,SAAS,GAAG,WAAW,CAAC,CAAC;YAE9C,IAAI,KAAK,GAAG,CAAC,EAAE,CAAC;gBACd,OAAO,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;gBACpB,MAAM,CAAC,IAAI,CAAC,KAAK,CAAC,CAAC;YACrB,CAAC;QACH,CAAC;QAED,OAAO,EAAE,OAAO,EAAE,MAAM,EAAE,CAAC;IAC7B,CAAC;IAED;;;OAGG;IACH,MAAM,CAAC,cAAc,CAAC,IAAY;QAChC,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,OAAO,SAAS,CAAC,QAAQ,CAAC,IAAI,CAAC,CAAC;IAClC,CAAC;CACF"}
|
package/build/embeddings/sparse.test.d.ts.map
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"sparse.test.d.ts","sourceRoot":"","sources":["../../src/embeddings/sparse.test.ts"],"names":[],"mappings":""}
|
package/build/embeddings/sparse.test.js
ADDED
|
@@ -0,0 +1,69 @@
|
|
|
1
|
+
import { describe, expect, it } from "vitest";
|
|
2
|
+
import { BM25SparseVectorGenerator } from "./sparse.js";
|
|
3
|
+
describe("BM25SparseVectorGenerator", () => {
|
|
4
|
+
it("should generate sparse vectors for simple text", () => {
|
|
5
|
+
const generator = new BM25SparseVectorGenerator();
|
|
6
|
+
const result = generator.generate("hello world");
|
|
7
|
+
expect(result.indices).toBeDefined();
|
|
8
|
+
expect(result.values).toBeDefined();
|
|
9
|
+
expect(result.indices.length).toBeGreaterThan(0);
|
|
10
|
+
expect(result.values.length).toBe(result.indices.length);
|
|
11
|
+
});
|
|
12
|
+
it("should generate different vectors for different texts", () => {
|
|
13
|
+
const generator = new BM25SparseVectorGenerator();
|
|
14
|
+
const result1 = generator.generate("hello world");
|
|
15
|
+
const result2 = generator.generate("goodbye world");
|
|
16
|
+
// Different texts should have different sparse representations
|
|
17
|
+
expect(result1.indices).not.toEqual(result2.indices);
|
|
18
|
+
});
|
|
19
|
+
it("should generate consistent vectors for the same text", () => {
|
|
20
|
+
const generator = new BM25SparseVectorGenerator();
|
|
21
|
+
const result1 = generator.generate("hello world");
|
|
22
|
+
const result2 = generator.generate("hello world");
|
|
23
|
+
expect(result1.indices).toEqual(result2.indices);
|
|
24
|
+
expect(result1.values).toEqual(result2.values);
|
|
25
|
+
});
|
|
26
|
+
it("should handle empty strings", () => {
|
|
27
|
+
const generator = new BM25SparseVectorGenerator();
|
|
28
|
+
const result = generator.generate("");
|
|
29
|
+
expect(result.indices).toHaveLength(0);
|
|
30
|
+
expect(result.values).toHaveLength(0);
|
|
31
|
+
});
|
|
32
|
+
it("should handle special characters and punctuation", () => {
|
|
33
|
+
const generator = new BM25SparseVectorGenerator();
|
|
34
|
+
const result = generator.generate("hello, world! how are you?");
|
|
35
|
+
expect(result.indices).toBeDefined();
|
|
36
|
+
expect(result.values).toBeDefined();
|
|
37
|
+
expect(result.indices.length).toBeGreaterThan(0);
|
|
38
|
+
});
|
|
39
|
+
it("should train on corpus and generate IDF scores", () => {
|
|
40
|
+
const generator = new BM25SparseVectorGenerator();
|
|
41
|
+
const corpus = ["the quick brown fox", "jumps over the lazy dog", "the fox is quick"];
|
|
42
|
+
generator.train(corpus);
|
|
43
|
+
const result = generator.generate("quick fox");
|
|
44
|
+
expect(result.indices).toBeDefined();
|
|
45
|
+
expect(result.values).toBeDefined();
|
|
46
|
+
expect(result.indices.length).toBeGreaterThan(0);
|
|
47
|
+
});
|
|
48
|
+
it("should use static generateSimple method", () => {
|
|
49
|
+
const result = BM25SparseVectorGenerator.generateSimple("hello world");
|
|
50
|
+
expect(result.indices).toBeDefined();
|
|
51
|
+
expect(result.values).toBeDefined();
|
|
52
|
+
expect(result.indices.length).toBeGreaterThan(0);
|
|
53
|
+
});
|
|
54
|
+
it("should lowercase and tokenize text properly", () => {
|
|
55
|
+
const generator = new BM25SparseVectorGenerator();
|
|
56
|
+
const result1 = generator.generate("HELLO WORLD");
|
|
57
|
+
const result2 = generator.generate("hello world");
|
|
58
|
+
// Should produce same results due to lowercasing
|
|
59
|
+
expect(result1.indices).toEqual(result2.indices);
|
|
60
|
+
});
|
|
61
|
+
it("should generate positive values", () => {
|
|
62
|
+
const generator = new BM25SparseVectorGenerator();
|
|
63
|
+
const result = generator.generate("hello world");
|
|
64
|
+
result.values.forEach((value) => {
|
|
65
|
+
expect(value).toBeGreaterThan(0);
|
|
66
|
+
});
|
|
67
|
+
});
|
|
68
|
+
});
|
|
69
|
+
//# sourceMappingURL=sparse.test.js.map
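The training test above only checks that a vector is produced. For reference, the IDF values that `train` would assign to that three-document corpus can be worked out with the same formula used in `sparse.js`, `ln((N - df + 0.5) / (df + 0.5) + 1)`. The helper names below are illustrative, and the tokenizer is simplified to a whitespace split, which is enough for this punctuation-free corpus.

```typescript
// IDF as computed in train(): N is the number of documents, df the number
// of documents containing the term.
function idf(documentCount: number, documentFrequency: number): number {
  return Math.log((documentCount - documentFrequency + 0.5) / (documentFrequency + 0.5) + 1.0);
}

// Count, for each term, how many documents contain it at least once.
function documentFrequencies(corpus: string[]): Map<string, number> {
  const frequencies = new Map<string, number>();
  for (const doc of corpus) {
    const unique = new Set(doc.toLowerCase().split(/\s+/).filter((t) => t.length > 0));
    unique.forEach((token) => frequencies.set(token, (frequencies.get(token) ?? 0) + 1));
  }
  return frequencies;
}

const corpus = ["the quick brown fox", "jumps over the lazy dog", "the fox is quick"];
const df = documentFrequencies(corpus);
for (const term of ["the", "quick", "brown"]) {
  console.log(term, idf(corpus.length, df.get(term) ?? 0).toFixed(4));
}
// the 0.1335
// quick 0.4700
// brown 0.9808
```

Rarer terms receive larger weights: `brown` (one document) scores about 0.98, `quick` (two documents) about 0.47, and `the` (all three documents) about 0.13. An assertion on these values would make the training test sensitive to regressions in the IDF formula.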
|
package/build/embeddings/sparse.test.js.map
ADDED
|
@@ -0,0 +1 @@
|
|
|
1
|
+
{"version":3,"file":"sparse.test.js","sourceRoot":"","sources":["../../src/embeddings/sparse.test.ts"],"names":[],"mappings":"AAAA,OAAO,EAAE,QAAQ,EAAE,MAAM,EAAE,EAAE,EAAE,MAAM,QAAQ,CAAC;AAC9C,OAAO,EAAE,yBAAyB,EAAE,MAAM,aAAa,CAAC;AAExD,QAAQ,CAAC,2BAA2B,EAAE,GAAG,EAAE;IACzC,EAAE,CAAC,gDAAgD,EAAE,GAAG,EAAE;QACxD,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,MAAM,MAAM,GAAG,SAAS,CAAC,QAAQ,CAAC,aAAa,CAAC,CAAC;QAEjD,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,WAAW,EAAE,CAAC;QACrC,MAAM,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,WAAW,EAAE,CAAC;QACpC,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC,eAAe,CAAC,CAAC,CAAC,CAAC;QACjD,MAAM,CAAC,MAAM,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,IAAI,CAAC,MAAM,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;IAC3D,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,uDAAuD,EAAE,GAAG,EAAE;QAC/D,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,MAAM,OAAO,GAAG,SAAS,CAAC,QAAQ,CAAC,aAAa,CAAC,CAAC;QAClD,MAAM,OAAO,GAAG,SAAS,CAAC,QAAQ,CAAC,eAAe,CAAC,CAAC;QAEpD,+DAA+D;QAC/D,MAAM,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC,GAAG,CAAC,OAAO,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC;IACvD,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,sDAAsD,EAAE,GAAG,EAAE;QAC9D,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,MAAM,OAAO,GAAG,SAAS,CAAC,QAAQ,CAAC,aAAa,CAAC,CAAC;QAClD,MAAM,OAAO,GAAG,SAAS,CAAC,QAAQ,CAAC,aAAa,CAAC,CAAC;QAElD,MAAM,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC;QACjD,MAAM,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC;IACjD,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,6BAA6B,EAAE,GAAG,EAAE;QACrC,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,MAAM,MAAM,GAAG,SAAS,CAAC,QAAQ,CAAC,EAAE,CAAC,CAAC;QAEtC,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,YAAY,CAAC,CAAC,CAAC,CAAC;QACvC,MAAM,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,YAAY,CAAC,CAAC,CAAC,CAAC;IACxC,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,kDAAkD,EAAE,GAAG,EAAE;QAC1D,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,MAAM,MAAM,GAAG,SAAS,CAAC,QAAQ,CAAC,4BAA4B,CAAC,CAAC;QAEhE,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,WAAW,EAAE,CAAC;QACrC,MAAM,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,WAAW,EAAE,CAAC;QACpC,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC,eAAe,CAAC,CAAC,CAAC,CAAC
;IACnD,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,gDAAgD,EAAE,GAAG,EAAE;QACxD,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,MAAM,MAAM,GAAG,CAAC,qBAAqB,EAAE,yBAAyB,EAAE,kBAAkB,CAAC,CAAC;QAEtF,SAAS,CAAC,KAAK,CAAC,MAAM,CAAC,CAAC;QACxB,MAAM,MAAM,GAAG,SAAS,CAAC,QAAQ,CAAC,WAAW,CAAC,CAAC;QAE/C,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,WAAW,EAAE,CAAC;QACrC,MAAM,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,WAAW,EAAE,CAAC;QACpC,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC,eAAe,CAAC,CAAC,CAAC,CAAC;IACnD,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,yCAAyC,EAAE,GAAG,EAAE;QACjD,MAAM,MAAM,GAAG,yBAAyB,CAAC,cAAc,CAAC,aAAa,CAAC,CAAC;QAEvE,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,WAAW,EAAE,CAAC;QACrC,MAAM,CAAC,MAAM,CAAC,MAAM,CAAC,CAAC,WAAW,EAAE,CAAC;QACpC,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,MAAM,CAAC,CAAC,eAAe,CAAC,CAAC,CAAC,CAAC;IACnD,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,6CAA6C,EAAE,GAAG,EAAE;QACrD,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,MAAM,OAAO,GAAG,SAAS,CAAC,QAAQ,CAAC,aAAa,CAAC,CAAC;QAClD,MAAM,OAAO,GAAG,SAAS,CAAC,QAAQ,CAAC,aAAa,CAAC,CAAC;QAElD,iDAAiD;QACjD,MAAM,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC,OAAO,CAAC,OAAO,CAAC,OAAO,CAAC,CAAC;IACnD,CAAC,CAAC,CAAC;IAEH,EAAE,CAAC,iCAAiC,EAAE,GAAG,EAAE;QACzC,MAAM,SAAS,GAAG,IAAI,yBAAyB,EAAE,CAAC;QAClD,MAAM,MAAM,GAAG,SAAS,CAAC,QAAQ,CAAC,aAAa,CAAC,CAAC;QAEjD,MAAM,CAAC,MAAM,CAAC,OAAO,CAAC,CAAC,KAAK,EAAE,EAAE;YAC9B,MAAM,CAAC,KAAK,CAAC,CAAC,eAAe,CAAC,CAAC,CAAC,CAAC;QACnC,CAAC,CAAC,CAAC;IACL,CAAC,CAAC,CAAC;AACL,CAAC,CAAC,CAAC"}
|