@nlptools/distance 0.0.2 → 0.0.4
This diff reflects the content of publicly released package versions as they appear in their respective public registries, and is provided for informational purposes only.
- package/LICENSE +21 -21
- package/README.md +262 -66
- package/dist/index.d.mts +679 -4
- package/dist/index.mjs +1134 -6
- package/package.json +30 -27
- package/dist/index.d.ts +0 -5
package/LICENSE
CHANGED

All 21 lines were removed and re-added with identical text (a formatting-only change). The license reads:

MIT License

Copyright (c) 2023 Demo Macro

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
CHANGED

Removed: the Contributor Covenant badge, the old description of the WebAssembly implementation from `@nlptools/distance-wasm`, and the previous algorithm catalog (`ratcliff_obershelp`, `smith_waterman`, `tversky`, `overlap`, `cosine_bigram`, `suffix`, `length`). The rewritten README follows.

# @nlptools/distance

> High-performance string distance and similarity algorithms, implemented in pure TypeScript

## Features

- Pure TypeScript implementation, zero native dependencies
- Edit distance: Levenshtein, LCS (Myers O(ND) and DP)
- Token similarity: Jaccard, Cosine, Sorensen-Dice (character multiset and n-gram variants)
- Hash-based deduplication: SimHash, MinHash, LSH
- Fuzzy search: `FuzzySearch` class and `findBestMatch` with multi-algorithm support
- Diff: based on `@algorithm.ts/diff` (Myers and DP backends)
- All distance algorithms include normalized similarity variants (0-1 range)

## Installation

`pnpm add @nlptools/distance`

## Usage

### Edit Distance

```typescript
import { levenshtein, levenshteinNormalized } from "@nlptools/distance";

levenshtein("kitten", "sitting"); // 3
levenshteinNormalized("kitten", "sitting"); // 0.571
```

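The normalized variant is the edit distance scaled by the longer string's length, `1 - distance / max(|a|, |b|)` (so `1 - 3/7 ≈ 0.571` for the pair above). A self-contained sketch of that relationship, independent of the package:

```typescript
// Classic Levenshtein DP with a rolling row, for illustration only.
function lev(a: string, b: string): number {
  const m = a.length, n = b.length;
  const row: number[] = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    let prev = row[0]; // holds dp[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= n; j++) {
      const tmp = row[j]; // dp[i-1][j]
      row[j] = Math.min(
        tmp + 1, // deletion
        row[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return row[n];
}

function levNormalized(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - lev(a, b) / maxLen;
}

console.log(lev("kitten", "sitting")); // 3
console.log(levNormalized("kitten", "sitting")); // 0.5714...
```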
### LCS (Longest Common Subsequence)

```typescript
import { lcsDistance, lcsNormalized, lcsLength, lcsPairs } from "@nlptools/distance";

lcsDistance("abcde", "ace"); // 2 (= 5 + 3 - 2 * 3)
lcsNormalized("abcde", "ace"); // 0.75
lcsLength("abcde", "ace"); // 3
lcsPairs("abcde", "ace"); // [[0,0], [2,1], [4,2]]
```

By default the Myers O(ND) algorithm is used; switch to the DP implementation with `algorithm: "dp"`.

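The `// 2 (= 5 + 3 - 2 * 3)` comment reflects the identity `lcsDistance(a, b) = |a| + |b| - 2 * lcsLength(a, b)`, and the normalized score is `2 * lcsLength / (|a| + |b|)`. A minimal DP sketch of those relationships, independent of the package:

```typescript
// LCS length via the classic DP recurrence, one row at a time.
function lcsLen(a: string, b: string): number {
  const n = b.length;
  let prev: number[] = new Array(n + 1).fill(0);
  for (const ch of a) {
    const cur: number[] = [0];
    for (let j = 1; j <= n; j++) {
      cur[j] = ch === b[j - 1] ? prev[j - 1] + 1 : Math.max(prev[j], cur[j - 1]);
    }
    prev = cur;
  }
  return prev[n];
}

const a = "abcde", b = "ace";
const lcs = lcsLen(a, b); // 3 ("ace")
const distance = a.length + b.length - 2 * lcs; // 2
const normalized = (2 * lcs) / (a.length + b.length); // 0.75
console.log(lcs, distance, normalized);
```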
### Token Similarity (Character Multiset)

Based on character frequency maps (counters), matching the `textdistance` crate's semantics:

```typescript
import { jaccard, cosine, sorensen } from "@nlptools/distance";

jaccard("abc", "abd"); // 0.667
cosine("hello", "hallo"); // 0.8
sorensen("test", "text"); // 0.75
```

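The cosine and Sorensen-Dice values above fall out of plain character counters. A sketch under the assumption (consistent with the examples) that cosine is `|A∩B| / sqrt(|A|·|B|)` and Sorensen-Dice is `2|A∩B| / (|A| + |B|)`, where `|A∩B|` sums per-character minimum counts:

```typescript
// Character multiset intersection via frequency counters.
function counter(s: string): Map<string, number> {
  const m = new Map<string, number>();
  for (const ch of s) m.set(ch, (m.get(ch) ?? 0) + 1);
  return m;
}

function intersectionSize(a: string, b: string): number {
  const ca = counter(a), cb = counter(b);
  let n = 0;
  for (const [ch, count] of ca) n += Math.min(count, cb.get(ch) ?? 0);
  return n;
}

// cosine = |A ∩ B| / sqrt(|A| * |B|)
const cos = (a: string, b: string) =>
  intersectionSize(a, b) / Math.sqrt(a.length * b.length);

// sorensen-dice = 2 * |A ∩ B| / (|A| + |B|)
const dice = (a: string, b: string) =>
  (2 * intersectionSize(a, b)) / (a.length + b.length);

console.log(cos("hello", "hallo")); // 0.8
console.log(dice("test", "text")); // 0.75
```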
### N-Gram Variants

```typescript
import { jaccardNgram, cosineNgram, sorensenNgram } from "@nlptools/distance";

jaccardNgram("hello", "hallo"); // 0.333 (bigram-based)
cosineNgram("hello", "hallo"); // 0.5 (bigram-based)
```

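The bigram values can be reproduced with unique-bigram sets. This sketch uses plain sets (an assumption; the package may weight repeated n-grams differently):

```typescript
// Unique character bigrams of a string.
function bigrams(s: string): Set<string> {
  const out = new Set<string>();
  for (let i = 0; i + 1 < s.length; i++) out.add(s.slice(i, i + 2));
  return out;
}

function sharedBigrams(a: string, b: string): number {
  const A = bigrams(a), B = bigrams(b);
  let inter = 0;
  for (const g of A) if (B.has(g)) inter++;
  return inter;
}

// "hello" -> {he, el, ll, lo}, "hallo" -> {ha, al, ll, lo}; 2 shared.
function jaccardBigrams(a: string, b: string): number {
  const inter = sharedBigrams(a, b);
  return inter / (bigrams(a).size + bigrams(b).size - inter);
}

function cosineBigrams(a: string, b: string): number {
  return sharedBigrams(a, b) / Math.sqrt(bigrams(a).size * bigrams(b).size);
}

console.log(jaccardBigrams("hello", "hallo")); // 0.333...
console.log(cosineBigrams("hello", "hallo")); // 0.5
```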
### SimHash (Document Fingerprinting)

```typescript
import { simhash, hammingDistance, SimHasher } from "@nlptools/distance";

// Function-based
const fp1 = simhash(["hello", "world"]);
const fp2 = simhash(["hello", "earth"]);
hammingDistance(fp1, fp2); // small = similar

// Class-based
const hasher = new SimHasher();
const a = hasher.hash(["hello", "world"]);
const b = hasher.hash(["hello", "earth"]);
hasher.isDuplicate(a, b); // true if hamming distance <= 3
```

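The duplicate check reduces to a popcount of the XOR of the two 64-bit fingerprints. A minimal sketch of Hamming distance over `bigint` values:

```typescript
// Hamming distance between two fingerprints: count the bits
// where the two values differ (popcount of their XOR).
function hamming(a: bigint, b: bigint): number {
  let x = a ^ b;
  let bits = 0;
  while (x) {
    bits += Number(x & 1n);
    x >>= 1n;
  }
  return bits;
}

console.log(hamming(0b1011n, 0b0010n)); // 2
```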
### MinHash (Jaccard Similarity Estimation)

```typescript
import { MinHash } from "@nlptools/distance";

const mh1 = new MinHash({ numHashes: 128 });
mh1.update("hello");
mh1.update("world");

const mh2 = new MinHash({ numHashes: 128 });
mh2.update("hello");
mh2.update("earth");

MinHash.estimate(mh1.digest(), mh2.digest()); // ~0.67
```

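The estimate is the fraction of signature positions on which the two MinHashes agree, which converges to the true Jaccard similarity as `numHashes` grows. A sketch of the estimation step alone (the hashing itself is omitted):

```typescript
// Jaccard estimate from two equal-length MinHash signatures:
// the fraction of positions where the signatures match.
function estimate(sig1: number[], sig2: number[]): number {
  let matches = 0;
  for (let i = 0; i < sig1.length; i++) {
    if (sig1[i] === sig2[i]) matches++;
  }
  return matches / sig1.length;
}

console.log(estimate([1, 7, 3, 9], [1, 7, 4, 9])); // 0.75
```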
### LSH (Approximate Nearest Neighbor Search)

```typescript
import { LSH, MinHash } from "@nlptools/distance";

const lsh = new LSH({ numBands: 16, numHashes: 128 });

const mh = new MinHash({ numHashes: 128 });
mh.update("hello");
mh.update("world");
lsh.insert("doc1", mh.digest());

// Query for similar documents
const query = lsh.query(mh.digest(), 0.5);
// => [["doc1", 0.67]]
```

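With `numBands: 16` and `numHashes: 128`, each band covers 8 signature positions, and two items become candidates when any band matches exactly. An illustrative banding sketch (the real implementation hashes bands into buckets rather than comparing keys pairwise):

```typescript
// Split a signature into equal bands; each band becomes a string key.
function bandKeys(sig: number[], numBands: number): string[] {
  const rows = sig.length / numBands;
  const keys: string[] = [];
  for (let b = 0; b < numBands; b++) {
    keys.push(b + ":" + sig.slice(b * rows, (b + 1) * rows).join(","));
  }
  return keys;
}

// Candidate pair if any band key matches exactly.
function isCandidate(s1: number[], s2: number[], numBands: number): boolean {
  const k2 = new Set(bandKeys(s2, numBands));
  return bandKeys(s1, numBands).some((k) => k2.has(k));
}

const sigA = [1, 2, 3, 4, 5, 6, 7, 8];
const sigB = [1, 2, 3, 4, 9, 9, 9, 9]; // first band identical
console.log(isCandidate(sigA, sigB, 2)); // true
```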
### Fuzzy Search

```typescript
import { FuzzySearch, findBestMatch } from "@nlptools/distance";

// String array search
const search = new FuzzySearch(["apple", "banana", "cherry"]);
search.search("aple");
// => [{ item: "apple", score: 0.8, index: 0 }]

// Object array with weighted keys
const books = [
  { title: "Old Man's War", author: "John Scalzi" },
  { title: "The Great Gatsby", author: "F. Scott Fitzgerald" },
];
const bookSearch = new FuzzySearch(books, {
  keys: [
    { name: "title", weight: 0.7 },
    { name: "author", weight: 0.3 },
  ],
  algorithm: "cosine",
  threshold: 0.3,
});
bookSearch.search("old man");
// => [{ item: { title: "Old Man's War", ... }, score: 0.52, index: 0 }]

// One-shot best match
findBestMatch("kitten", ["sitting", "kit", "mitten"]);
// => { item: "kit", score: 0.5, index: 1 }

// With per-key details
const detailed = new FuzzySearch(books, {
  keys: [{ name: "title" }, { name: "author" }],
  includeMatchDetails: true,
});
detailed.search("gatsby");
// => [{ item: ..., score: 0.45, index: 1, matches: { title: 0.6, author: 0.1 } }]
```

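With weighted keys, one plausible aggregation is a weight-normalized sum of per-key similarities. A sketch of that idea (an assumption for illustration; the package's exact scoring is not documented here, and the key names and scores below are hypothetical):

```typescript
// Combine per-key similarity scores using normalized weights.
type KeyScore = { key: string; weight: number; score: number };

function weightedScore(parts: KeyScore[]): number {
  const total = parts.reduce((sum, p) => sum + p.weight, 0);
  return parts.reduce((sum, p) => sum + (p.weight / total) * p.score, 0);
}

const score = weightedScore([
  { key: "title", weight: 0.7, score: 0.6 },
  { key: "author", weight: 0.3, score: 0.1 },
]);
console.log(score); // ≈ 0.45
```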
### Diff

```typescript
import { diff, DiffType } from "@nlptools/distance";

const result = diff("abc", "ac");
// => [
//   { type: DiffType.COMMON, tokens: "a" },
//   { type: DiffType.REMOVED, tokens: "b" },
//   { type: DiffType.COMMON, tokens: "c" },
// ]
```

## API Reference

## API Reference

### Edit Distance

| Function | Description | Returns |
| --- | --- | --- |
| `levenshtein(a, b)` | Levenshtein edit distance | `number` |
| `levenshteinNormalized(a, b)` | Normalized similarity | `number` (0-1) |
| `lcsDistance(a, b, algorithm?)` | LCS distance | `number` |
| `lcsNormalized(a, b, algorithm?)` | Normalized LCS similarity | `number` (0-1) |
| `lcsLength(a, b, algorithm?)` | LCS length | `number` |
| `lcsPairs(a, b, algorithm?)` | LCS matching pairs | `[number, number][]` |

### Token Similarity

| Function | Description | Returns |
| --- | --- | --- |
| `jaccard(a, b)` | Jaccard similarity (character multiset) | `number` (0-1) |
| `jaccardNgram(a, b, n?)` | Jaccard on character n-grams | `number` (0-1) |
| `cosine(a, b)` | Cosine similarity (character multiset) | `number` (0-1) |
| `cosineNgram(a, b, n?)` | Cosine on character n-grams | `number` (0-1) |
| `sorensen(a, b)` | Sorensen-Dice coefficient (character multiset) | `number` (0-1) |
| `sorensenNgram(a, b, n?)` | Sorensen-Dice on character n-grams | `number` (0-1) |

### Hash-Based Deduplication

| Function / Class | Description |
| --- | --- |
| `simhash(features, options?)` | Generate 64-bit fingerprint as `bigint` |
| `hammingDistance(a, b)` | Hamming distance between two fingerprints |
| `hammingSimilarity(a, b, bits?)` | Normalized Hamming similarity |
| `SimHasher` | Class with `hash()`, `distance()`, `similarity()`, `isDuplicate()` |
| `MinHash` | Class with `update()`, `digest()`, `estimate()` |
| `MinHash.estimate(sig1, sig2)` | Static: estimate Jaccard from signatures |
| `LSH` | Class with `insert()`, `query()`, `remove()` |

### Fuzzy Search

| Function / Class | Description |
| --- | --- |
| `FuzzySearch<T>(collection, options?)` | Search engine with dynamic collection management |
| `findBestMatch(query, collection, options?)` | One-shot convenience: returns the best match or `null` |

**FuzzySearch options:**

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `algorithm` | `BuiltinAlgorithm \| SimilarityFn` | `"levenshtein"` | Similarity algorithm to use |
| `keys` | `ISearchKey[]` | `[]` | Object fields to search on |
| `threshold` | `number` | `0` | Minimum similarity score (0-1) |
| `limit` | `number` | `Infinity` | Maximum results to return |
| `caseSensitive` | `boolean` | `false` | Case-insensitive by default |
| `includeMatchDetails` | `boolean` | `false` | Include per-key scores |
| `lsh` | `{ numHashes?, numBands? }` | — | Enable LSH for large datasets |

**Built-in algorithms:** `"levenshtein"`, `"lcs"`, `"jaccard"`, `"jaccardNgram"`, `"cosine"`, `"cosineNgram"`, `"sorensen"`, `"sorensenNgram"`
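The `threshold` and `limit` options typically interact as a post-processing step on the scored results: drop entries below the threshold, sort the rest by score descending, then cap the list. An illustrative sketch (not the package's exact pipeline):

```typescript
// Filter, rank, and cap scored search results.
type SearchResult = { item: string; score: number; index: number };

function postProcess(
  results: SearchResult[],
  threshold: number,
  limit: number,
): SearchResult[] {
  return results
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const raw: SearchResult[] = [
  { item: "apple", score: 0.8, index: 0 },
  { item: "banana", score: 0.1, index: 1 },
  { item: "cherry", score: 0.4, index: 2 },
];
console.log(postProcess(raw, 0.3, 1)); // top hit: "apple" (score 0.8)
```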

### Diff

| Function | Description | Returns |
| --- | --- | --- |
| `diff(a, b, options?)` | Sequence diff (Myers or DP) | `IDiffItem<T>[]` |

### Types

| Type | Description |
| --- | --- |
| `DiffType` | Enum: `ADDED`, `REMOVED`, `COMMON` |
| `IDiffItem<T>` | Diff result item with type and tokens |
| `IDiffOptions<T>` | Options for diff (equals, LCS algorithm) |
| `ISimHashOptions` | Options for SimHash (bits, hashFn) |
| `IMinHashOptions` | Options for MinHash (numHashes, seed) |
| `ILSHOptions` | Options for LSH (numBands, numHashes) |
| `IFuzzySearchOptions` | Options for the FuzzySearch constructor |
| `IFindBestMatchOptions` | Options for the findBestMatch function |
| `ISearchKey` | Searchable key config (name, weight, getter) |
| `ISearchResult<T>` | Search result with item, score, index |
| `SimilarityFn` | `(a: string, b: string) => number` in [0,1] |

## Performance

Benchmark: 1000 iterations per pair, same test data across all runtimes. Unit: microseconds per operation (us/op).

### Edit Distance

| Algorithm | Size | TS (V8 JIT) | WASM (via JS) | Rust (native) |
| --- | --- | --- | --- | --- |
| levenshtein | Short (<10) | 0.3 | 7.9 | 0.11 |
| levenshtein | Medium (10-100) | 1.3 | 116.2 | 0.98 |
| levenshtein | Long (>200) | 15.2 | 2,877.5 | 39.68 |
| levenshteinNorm | Short | 0.3 | 7.9 | 0.11 |
| lcs | Short (<10) | 1.6 | 16.5 | 0.41 |
| lcs | Medium (10-100) | 6.8 | 272.6 | 3.22 |
| lcs | Long (>200) | 217.8 | 6,574.1 | 122.63 |
| lcsNorm | Short | 1.7 | 16.2 | 0.48 |

### Token Similarity (Character Multiset)

| Algorithm | Size | TS (V8 JIT) | WASM (via JS) | Rust (native) |
| --- | --- | --- | --- | --- |
| jaccard | Short (<10) | 0.8 | 25.2 | 0.42 |
| jaccard | Medium (10-100) | 0.8 | 74.3 | 1.55 |
| jaccard | Long (>200) | 1.6 | 171.5 | 5.54 |
| cosine | Short (<10) | 0.8 | 19.3 | 0.32 |
| cosine | Medium (10-100) | 0.8 | 61.4 | 1.35 |
| cosine | Long (>200) | 1.5 | 158.5 | 4.77 |
| sorensen | Short (<10) | 0.7 | 19.3 | 0.33 |
| sorensen | Medium (10-100) | 0.7 | 61.0 | 1.33 |
| sorensen | Long (>200) | 1.5 | 160.0 | 4.46 |

### Bigram Variants

| Algorithm | Size | TS (V8 JIT) | WASM (via JS) | Rust (native) |
| --- | --- | --- | --- | --- |
| jaccardBigram | Short (<10) | 1.1 | 27.4 | 0.45 |
| jaccardBigram | Medium (10-100) | 7.7 | 160.4 | 3.86 |
| cosineBigram | Short (<10) | 0.8 | 21.2 | 0.36 |
| cosineBigram | Medium (10-100) | 5.9 | 127.0 | 3.12 |

The TS implementations rely on V8 JIT optimization, an `Int32Array` ASCII fast path, and integer-encoded bigrams, avoiding JS-WASM boundary overhead entirely.
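The ASCII fast path can be sketched as counting characters in a flat `Int32Array` indexed by char code, falling back to a map-based path for non-ASCII input (illustrative only, not the package's actual code):

```typescript
// ASCII fast path: character counts in a flat Int32Array keyed by
// char code, avoiding Map overhead for the common case.
function asciiCounts(s: string): Int32Array | null {
  const counts = new Int32Array(128);
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 128) return null; // caller falls back to a Map-based path
    counts[c]++;
  }
  return counts;
}

const counts = asciiCounts("hello")!;
console.log(counts["l".charCodeAt(0)]); // 2
console.log(asciiCounts("héllo")); // null (non-ASCII input)
```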
### Fuzzy Search: NLPTools vs Fuse.js

Benchmark: 20 items in the collection, 6 queries per iteration, 1000 iterations. Unit: milliseconds per operation (ms/op). Algorithm: levenshtein (default).

| Scenario | NLPTools | Fuse.js |
| --- | --- | --- |
| Setup (constructor) | 0.0002 | 0.0050 |
| Search (string array) | 0.0114 | 0.1077 |
| Search (object, 1 key) | 0.0176 | 0.3308 |
| Search (object, 2 keys) | 0.0289 | 0.6445 |

Both libraries return identical top-1 results for all test queries. NLPTools scores are normalized similarity (0-1, higher is better); Fuse.js uses Bitap error scores (0 = perfect, lower is better).
## Dependencies

- `fastest-levenshtein` — fastest JS Levenshtein implementation
- `@algorithm.ts/lcs` — Myers and DP longest common subsequence
- `@algorithm.ts/diff` — sequence diff built on LCS

## License

[MIT](../../LICENSE) © [Demo Macro](https://www.demomacro.com/)