cmpstr 2.0.0 → 2.0.2
- package/README.md +81 -65
- package/package.json +4 -2
- package/src/CmpStr.d.ts +70 -0
- package/src/CmpStr.js +194 -66
- package/src/CmpStrAsync.d.ts +19 -0
- package/src/CmpStrAsync.js +20 -7
- package/src/index.d.ts +3 -0
- package/src/index.js +1 -1
package/README.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
|
2
2
|
|
|
3
3
|
CmpStr is a lightweight and powerful npm package for calculating string similarity, finding the closest matches in arrays, performing phonetic searches, and more. It supports a variety of built-in algorithms (e.g., Levenshtein, Dice-Sørensen, Damerau-Levenshtein, Soundex) and allows users to add custom algorithms and normalization filters.
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
**Key Features**
|
|
6
6
|
|
|
7
7
|
- Built-in support for multiple similarity algorithms.
|
|
8
8
|
- Phonetic search with language-specific configurations (e.g., Soundex).
|
|
@@ -10,6 +10,7 @@ CmpStr is a lightweight and powerful npm package for calculating string similari
|
|
|
10
10
|
- Customizable normalization with global flags and caching.
|
|
11
11
|
- Asynchronous support for non-blocking workflows.
|
|
12
12
|
- Extensible with custom algorithms and filters.
|
|
13
|
+
- TypeScript declarations for better developer experience.
|
|
13
14
|
|
|
14
15
|
## Installation
|
|
15
16
|
|
|
@@ -61,15 +62,19 @@ Sets the base string for comparison.
|
|
|
61
62
|
|
|
62
63
|
Parameters:
|
|
63
64
|
|
|
64
|
-
|
|
65
|
+
`<String> str` – string to set as the base
|
|
66
|
+
|
|
67
|
+
#### `getStr()`
|
|
68
|
+
|
|
69
|
+
Gets the base string for comparison.
|
|
65
70
|
|
|
66
71
|
#### `setFlags( [ flags = '' ] )`
|
|
67
72
|
|
|
68
73
|
Set default normalization flags. They will be overwritten by passing `flags` through the configuration object. See description of available flags / normalization options below in the documentation.
|
|
69
74
|
|
|
70
|
-
|
|
75
|
+
#### `getFlags()`
|
|
71
76
|
|
|
72
|
-
|
|
77
|
+
Gets the default normalization flags.
|
|
73
78
|
|
|
74
79
|
#### `clearCache()`
|
|
75
80
|
|
|
@@ -77,17 +82,21 @@ Clears the normalization cache.
|
|
|
77
82
|
|
|
78
83
|
### Algorithms
|
|
79
84
|
|
|
80
|
-
#### `listAlgo()`
|
|
85
|
+
#### `listAlgo( [ loadedOnly = false ] )`
|
|
81
86
|
|
|
82
87
|
List all registered similarity algorithms.
|
|
83
88
|
|
|
89
|
+
Parameters:
|
|
90
|
+
|
|
91
|
+
`<Boolean> loadedOnly` – if true, only loaded algorithm names are returned
|
|
92
|
+
|
|
84
93
|
#### `isAlgo( algo )`
|
|
85
94
|
|
|
86
95
|
Checks if an algorithm is registered. Returns `true` if so, `false` otherwise.
|
|
87
96
|
|
|
88
97
|
Parameters:
|
|
89
98
|
|
|
90
|
-
|
|
99
|
+
`<String> algo` – name of the algorithm
|
|
91
100
|
|
|
92
101
|
#### `setAlgo( algo )`
|
|
93
102
|
|
|
@@ -97,7 +106,11 @@ Allowed options for built-in algorithms are `cosine`, `damerau`, `dice`, `hammi
|
|
|
97
106
|
|
|
98
107
|
Parameters:
|
|
99
108
|
|
|
100
|
-
|
|
109
|
+
`<String> algo` – name of the algorithm
|
|
110
|
+
|
|
111
|
+
#### `getAlgo()`
|
|
112
|
+
|
|
113
|
+
Gets the current algorithm to use for similarity calculations.
|
|
101
114
|
|
|
102
115
|
#### `addAlgo( algo, callback [, useIt = true ] )`
|
|
103
116
|
|
|
@@ -105,9 +118,9 @@ Adding a new similarity algorithm by using the `addAlgo()` method passing the na
|
|
|
105
118
|
|
|
106
119
|
Parameters:
|
|
107
120
|
|
|
108
|
-
|
|
109
|
-
|
|
110
|
-
|
|
121
|
+
`<String> algo` – name of the algorithm
|
|
122
|
+
`<Function> callback` – callback function implementing the algorithm
|
|
123
|
+
`<Boolean> useIt` – whether to set this algorithm as the current one
|
|
111
124
|
|
|
112
125
|
Example:
|
|
113
126
|
|
|
@@ -128,7 +141,7 @@ Removing a registered similarity algorithm.
|
|
|
128
141
|
|
|
129
142
|
Parameters:
|
|
130
143
|
|
|
131
|
-
|
|
144
|
+
`<String> algo` – name of the algorithm
|
|
132
145
|
|
|
133
146
|
### Filters
|
|
134
147
|
|
|
@@ -142,9 +155,9 @@ Adds a custom normalization filter. Needs to be passed a unique name and callbac
|
|
|
142
155
|
|
|
143
156
|
Parameters:
|
|
144
157
|
|
|
145
|
-
|
|
146
|
-
|
|
147
|
-
|
|
158
|
+
`<String> name` – filter name
|
|
159
|
+
`<Function> callback` – callback function implementing the filter
|
|
160
|
+
`<Int> priority` – priority of the filter
|
|
148
161
|
|
|
149
162
|
Example:
|
|
150
163
|
|
|
@@ -160,7 +173,7 @@ Removes a custom normalization filter.
|
|
|
160
173
|
|
|
161
174
|
Parameters:
|
|
162
175
|
|
|
163
|
-
|
|
176
|
+
`<String> name` – filter name
|
|
164
177
|
|
|
165
178
|
#### `pauseFilter( name )`
|
|
166
179
|
|
|
@@ -168,7 +181,7 @@ Pauses a custom normalization filter.
|
|
|
168
181
|
|
|
169
182
|
Parameters:
|
|
170
183
|
|
|
171
|
-
|
|
184
|
+
`<String> name` – filter name
|
|
172
185
|
|
|
173
186
|
#### `resumeFilter( name )`
|
|
174
187
|
|
|
@@ -176,7 +189,7 @@ Resumes a custom normalization filter.
|
|
|
176
189
|
|
|
177
190
|
Parameters:
|
|
178
191
|
|
|
179
|
-
|
|
192
|
+
`<String> name` – filter name
|
|
180
193
|
|
|
181
194
|
#### `clearFilter( name )`
|
|
182
195
|
|
|
@@ -190,10 +203,10 @@ Compares two strings using the specified algorithm. The method returns either th
|
|
|
190
203
|
|
|
191
204
|
Parameters:
|
|
192
205
|
|
|
193
|
-
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
206
|
+
`<String> algo` – name of the algorithm
|
|
207
|
+
`<String> a` – first string
|
|
208
|
+
`<String> b` – second string
|
|
209
|
+
`<Object> config` – configuration object
|
|
197
210
|
|
|
198
211
|
Example:
|
|
199
212
|
|
|
@@ -210,8 +223,8 @@ Tests the similarity between the base string and a given target string. Returns
|
|
|
210
223
|
|
|
211
224
|
Parameters:
|
|
212
225
|
|
|
213
|
-
|
|
214
|
-
|
|
226
|
+
`<String> str` – target string
|
|
227
|
+
`<Object> config` – configuration object
|
|
215
228
|
|
|
216
229
|
Example:
|
|
217
230
|
|
|
@@ -228,8 +241,8 @@ Tests the similarity of multiple strings against the base string. Returns an arr
|
|
|
228
241
|
|
|
229
242
|
Parameters:
|
|
230
243
|
|
|
231
|
-
|
|
232
|
-
|
|
244
|
+
`<String[]> arr` – array of strings
|
|
245
|
+
`<Object> config` – configuration object
|
|
233
246
|
|
|
234
247
|
Example:
|
|
235
248
|
|
|
@@ -246,8 +259,8 @@ Finds strings in an array that exceed a similarity threshold and sorts them by h
|
|
|
246
259
|
|
|
247
260
|
Parameters:
|
|
248
261
|
|
|
249
|
-
|
|
250
|
-
|
|
262
|
+
`<String[]> arr` – array of strings
|
|
263
|
+
`<Object> config` – configuration object
|
|
251
264
|
|
|
252
265
|
Example:
|
|
253
266
|
|
|
@@ -266,8 +279,8 @@ Finds the closest matching string from an array and returns them.
|
|
|
266
279
|
|
|
267
280
|
Parameters:
|
|
268
281
|
|
|
269
|
-
|
|
270
|
-
|
|
282
|
+
`<String[]> arr` – array of strings
|
|
283
|
+
`<Object> config` – configuration object
|
|
271
284
|
|
|
272
285
|
Example:
|
|
273
286
|
|
|
@@ -284,9 +297,9 @@ Generates a similarity matrix for an array of strings. Returns an 2D array that
|
|
|
284
297
|
|
|
285
298
|
Parameters:
|
|
286
299
|
|
|
287
|
-
|
|
288
|
-
|
|
289
|
-
|
|
300
|
+
`<String> algo` – name of the algorithm
|
|
301
|
+
`<String[]> arr` – array of strings
|
|
302
|
+
`<Object> config` – configuration object
|
|
290
303
|
|
|
291
304
|
Example:
|
|
292
305
|
|
|
@@ -307,24 +320,24 @@ The `CmpStr` package allows strings to be normalized before the similarity compa
|
|
|
307
320
|
|
|
308
321
|
#### Supported Flags
|
|
309
322
|
|
|
310
|
-
|
|
311
|
-
|
|
312
|
-
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
|
|
317
|
-
|
|
318
|
-
|
|
323
|
+
`s` – remove special chars
|
|
324
|
+
`w` – collapse whitespaces
|
|
325
|
+
`r` – remove repeated chars
|
|
326
|
+
`k` – keep only letters
|
|
327
|
+
`n` – ignore numbers
|
|
328
|
+
`t` – trim whitespaces
|
|
329
|
+
`i` – case insensitivity
|
|
330
|
+
`d` – decompose unicode
|
|
331
|
+
`u` – normalize unicode
|
|
319
332
|
|
|
320
|
-
#### `normalize(
|
|
333
|
+
#### `normalize( input [, flags = '' ] )`
|
|
321
334
|
|
|
322
335
|
The method for normalizing strings can also be called on its own, without comparing the similarity of two strings. This also applies all filters and reads from or writes to the cache. This can be helpful if certain strings should be saved beforehand or if different normalization options need to be tested.
|
|
323
336
|
|
|
324
337
|
Parameters:
|
|
325
338
|
|
|
326
|
-
|
|
327
|
-
|
|
339
|
+
`<String|String[]> input` – single string or array of strings to normalize
|
|
340
|
+
`<String> flags` – normalization flags
|
|
328
341
|
|
|
329
342
|
Example:
|
|
330
343
|
|
|
@@ -333,6 +346,9 @@ const cmp = new CmpStr();
|
|
|
333
346
|
|
|
334
347
|
console.log( cmp.normalize( ' he123LLo ', 'nti' ) );
|
|
335
348
|
// Output: hello
|
|
349
|
+
|
|
350
|
+
console.log( cmp.normalize( [ 'Hello World!', 'CmpStr 123' ], 'nwti' ) );
|
|
351
|
+
// Output: [ 'hello world!', 'cmpstr' ]
|
|
336
352
|
```
|
|
337
353
|
|
|
338
354
|
### Configuration Object
|
|
@@ -343,9 +359,9 @@ It also contains `options` as an object of key-value pairs that are passed to th
|
|
|
343
359
|
|
|
344
360
|
Global config options:
|
|
345
361
|
|
|
346
|
-
|
|
347
|
-
|
|
348
|
-
|
|
362
|
+
`<String> flags` – normalization flags
|
|
363
|
+
`<Number> threshold` – similarity threshold between 0 and 1
|
|
364
|
+
`<Object> options` – options passed to the algorithm
|
|
349
365
|
|
|
350
366
|
Example:
|
|
351
367
|
|
|
@@ -368,9 +384,9 @@ console.log( cmp.match( [
|
|
|
368
384
|
|
|
369
385
|
## Asynchronous Support
|
|
370
386
|
|
|
371
|
-
The `CmpStrAsync` class provides asynchronous
|
|
387
|
+
The `CmpStrAsync` class provides an asynchronous wrapper for all comparison methods as well as the string normalization function. It is ideal for large datasets or non-blocking workflows.
|
|
372
388
|
|
|
373
|
-
The asynchronous class supports the methods `compareAsync`, `testAsync`, `batchTestAsync`, `matchAsync`, `closestAsync` and `similarityMatrixAsync`. Each of these methods returns a `Promise`.
|
|
389
|
+
The asynchronous class supports the methods `normalizeAsync`, `compareAsync`, `testAsync`, `batchTestAsync`, `matchAsync`, `closestAsync` and `similarityMatrixAsync`. Each of these methods returns a `Promise`.
|
|
374
390
|
|
|
375
391
|
For options, arguments and returned values, see the documentation above.
|
|
376
392
|
|
|
@@ -398,7 +414,7 @@ The Levenshtein distance between two strings is the minimum number of single-cha
|
|
|
398
414
|
|
|
399
415
|
Options:
|
|
400
416
|
|
|
401
|
-
|
|
417
|
+
`<Boolean> raw` – if true the raw distance is returned
|
|
402
418
|
|
|
403
419
|
#### Damerau-Levenshtein – `damerau`
|
|
404
420
|
|
|
@@ -406,7 +422,7 @@ The Damerau-Levenshtein distance differs from the classical Levenshtein distance
|
|
|
406
422
|
|
|
407
423
|
Options:
|
|
408
424
|
|
|
409
|
-
|
|
425
|
+
`<Boolean> raw` – if true the raw distance is returned
|
|
410
426
|
|
|
411
427
|
#### Jaro-Winkler – `jaro`
|
|
412
428
|
|
|
@@ -414,7 +430,7 @@ Jaro-Winkler is a string similarity metric that gives more weight to matching ch
|
|
|
414
430
|
|
|
415
431
|
Options:
|
|
416
432
|
|
|
417
|
-
|
|
433
|
+
`<Boolean> raw` – if true the raw distance is returned
|
|
418
434
|
|
|
419
435
|
#### Cosine Similarity – `cosine`
|
|
420
436
|
|
|
@@ -422,7 +438,7 @@ Cosine similarity is a measure how similar two vectors are. It's often used in t
|
|
|
422
438
|
|
|
423
439
|
Options:
|
|
424
440
|
|
|
425
|
-
|
|
441
|
+
`<String> delimiter` – term delimiter
|
|
426
442
|
|
|
427
443
|
#### Dice Coefficient – `dice`
|
|
428
444
|
|
|
@@ -446,9 +462,9 @@ The Needleman-Wunsch algorithm performs global alignment, aligning two strings e
|
|
|
446
462
|
|
|
447
463
|
Options:
|
|
448
464
|
|
|
449
|
-
|
|
450
|
-
|
|
451
|
-
|
|
465
|
+
`<Number> match` – score for a match
|
|
466
|
+
`<Number> mismatch` – penalty for a mismatch
|
|
467
|
+
`<Number> gap` – penalty for a gap
|
|
452
468
|
|
|
453
469
|
#### Smith-Waterman – `smithWaterman`
|
|
454
470
|
|
|
@@ -456,9 +472,9 @@ The Smith-Waterman algorithm performs local alignment, finding the best matching
|
|
|
456
472
|
|
|
457
473
|
Options:
|
|
458
474
|
|
|
459
|
-
|
|
460
|
-
|
|
461
|
-
|
|
475
|
+
`<Number> match` – score for a match
|
|
476
|
+
`<Number> mismatch` – penalty for a mismatch
|
|
477
|
+
`<Number> gap` – penalty for a gap
|
|
462
478
|
|
|
463
479
|
#### q-Gram – `qGram`
|
|
464
480
|
|
|
@@ -466,7 +482,7 @@ Q-gram similarity is a string-matching algorithm that compares two strings by br
|
|
|
466
482
|
|
|
467
483
|
Options:
|
|
468
484
|
|
|
469
|
-
|
|
485
|
+
`<Int> q` – length of substrings
|
|
470
486
|
|
|
471
487
|
### Phonetic Algorithms
|
|
472
488
|
|
|
@@ -476,8 +492,8 @@ The Soundex algorithm generates a phonetic representation of a string based on h
|
|
|
476
492
|
|
|
477
493
|
Options:
|
|
478
494
|
|
|
479
|
-
|
|
480
|
-
|
|
481
|
-
|
|
482
|
-
|
|
483
|
-
|
|
495
|
+
`<String> lang` – language code for predefined setups (e.g., `en`, `de`)
|
|
496
|
+
`<Boolean> raw` – if true, returns the raw sound index codes
|
|
497
|
+
`<Object> mapping` – custom phonetic mapping (overrides predefined)
|
|
498
|
+
`<String> exclude` – characters to exclude from the input (overrides predefined)
|
|
499
|
+
`<Number> maxLength` – maximum length of the phonetic code
|
package/package.json
CHANGED
@@ -7,8 +7,9 @@
     "url" : "https://komed3.de"
   },
   "homepage": "https://github.com/komed3/cmpstr#readme",
-  "version": "2.0.
+  "version": "2.0.2",
   "main": "src/index.js",
+  "types": "src/index.d.ts",
   "license": "MIT",
   "keywords": [
     "string-similarity",
@@ -35,7 +36,8 @@
     "text-processing",
     "fuzzy-matching",
     "string-matching",
-    "text-similarity"
+    "text-similarity",
+    "typescript-definitions"
   ],
   "repository": {
     "type": "git",
package/src/CmpStr.d.ts
ADDED
@@ -0,0 +1,70 @@
+export interface Config {
+    flags?: string;
+    threshold?: number;
+    options?: Record<string, any>;
+}
+
+export interface BatchResult {
+    target: string;
+    match: number | any;
+}
+
+export declare class CmpStr {
+
+    constructor ( algo?: string, str?: string );
+
+    isReady () : boolean;
+
+    setStr ( str: string ) : boolean;
+
+    getStr () : string;
+
+    listAlgo ( loadedOnly?: boolean ) : string[];
+
+    isAlgo ( algo: string ) : boolean;
+
+    setAlgo ( algo: string ) : boolean;
+
+    getAlgo () : string;
+
+    addAlgo ( algo: string, callback: (
+        a: string, b: string, ...args : any
+    ) => number | any, useIt?: boolean ) : boolean;
+
+    rmvAlgo( algo: string ) : boolean;
+
+    listFilter () : string[];
+
+    addFilter ( name: string, callback: (
+        str: string
+    ) => string, priority?: number ) : boolean;
+
+    rmvFilter ( name: string ) : boolean;
+
+    pauseFilter ( name: string ) : boolean;
+
+    resumeFilter ( name: string ) : boolean;
+
+    clearFilter () : boolean;
+
+    setFlags( flags: string ) : void;
+
+    getFlags () : string;
+
+    normalize ( input: string|string[], flags?: string ) : string|string[];
+
+    clearCache () : boolean;
+
+    compare ( algo: string, a: string, b: string, config?: Config ) : number | any;
+
+    test ( str: string, config?: Config ) : number | any;
+
+    batchTest ( arr: string[], config?: Config ) : BatchResult[];
+
+    match ( arr: string[], config?: Config ) : BatchResult[];
+
+    closest ( arr: string[], config?: Config ) : string | undefined;
+
+    similarityMatrix ( algo: string, arr: string[], config?: Config ) : number[][];
+
+}
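Note on the `Config` and `BatchResult` declarations above: the sketch below illustrates how a `threshold` option and `{ target, match }` result objects could fit together in a `match`-style call. It is not the package's implementation. `positionalSimilarity` and `matchSketch` are hypothetical names, the similarity measure is a deliberately simple stand-in, and treating the threshold as inclusive (`>=`) is an assumption.

```javascript
// Hypothetical stand-in similarity: share of positions holding the same character.
function positionalSimilarity ( a, b ) {

    const len = Math.max( a.length, b.length );

    if ( len === 0 ) return 1;

    let same = 0;

    for ( let i = 0; i < Math.min( a.length, b.length ); i++ ) {
        if ( a[ i ] === b[ i ] ) same++;
    }

    return same / len;

}

// Returns BatchResult-shaped objects at or above the threshold (assumed inclusive),
// sorted by highest similarity first.
function matchSketch ( base, arr, { threshold = 0 } = {} ) {

    return arr
        .map( ( target ) => ( { target, match: positionalSimilarity( base, target ) } ) )
        .filter( ( res ) => res.match >= threshold )
        .sort( ( x, y ) => y.match - x.match );

}

console.log( matchSketch( 'hello', [ 'hallo', 'help', 'world' ], { threshold: 0.5 } ) );
// [ { target: 'hallo', match: 0.8 }, { target: 'help', match: 0.6 } ]
```

The result objects carry the original target string next to its score, which is what lets a caller sort or filter without recomputing anything.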
package/src/CmpStr.js
CHANGED
|
@@ -20,6 +20,12 @@
|
|
|
20
20
|
|
|
21
21
|
module.exports = class CmpStr {
|
|
22
22
|
|
|
23
|
+
/**
|
|
24
|
+
* --------------------------------------------------
|
|
25
|
+
* Global Variables
|
|
26
|
+
* --------------------------------------------------
|
|
27
|
+
*/
|
|
28
|
+
|
|
23
29
|
/**
|
|
24
30
|
* all pre-defined similarity algorithms
|
|
25
31
|
*
|
|
@@ -41,6 +47,15 @@ module.exports = class CmpStr {
|
|
|
41
47
|
soundex: './algorithms/soundex'
|
|
42
48
|
};
|
|
43
49
|
|
|
50
|
+
/**
|
|
51
|
+
* stores the names of loaded algorithms
|
|
52
|
+
*
|
|
53
|
+
* @since 2.0.2
|
|
54
|
+
* @private
|
|
55
|
+
* @type {Set<String>}
|
|
56
|
+
*/
|
|
57
|
+
#loadedAlgo = new Set ();
|
|
58
|
+
|
|
44
59
|
/**
|
|
45
60
|
* normalized strings cache
|
|
46
61
|
*
|
|
@@ -61,28 +76,43 @@ module.exports = class CmpStr {
|
|
|
61
76
|
* default normalization flags
|
|
62
77
|
* set by setFlags()
|
|
63
78
|
*
|
|
64
|
-
* @
|
|
79
|
+
* @private
|
|
80
|
+
* @type {String}
|
|
81
|
+
*/
|
|
82
|
+
#flags = '';
|
|
83
|
+
|
|
84
|
+
/**
|
|
85
|
+
* current algorithm to use for similarity calculations
|
|
86
|
+
* set by setAlgo(), addAlgo() or constructor()
|
|
87
|
+
*
|
|
88
|
+
* @private
|
|
65
89
|
* @type {String}
|
|
66
90
|
*/
|
|
67
|
-
|
|
91
|
+
#algo;
|
|
68
92
|
|
|
69
93
|
/**
|
|
70
94
|
* base string for comparison
|
|
71
95
|
* set by setStr or constructor()
|
|
72
96
|
*
|
|
73
|
-
* @
|
|
97
|
+
* @private
|
|
74
98
|
* @type {String}
|
|
75
99
|
*/
|
|
76
|
-
str;
|
|
100
|
+
#str;
|
|
77
101
|
|
|
78
102
|
/**
|
|
79
|
-
*
|
|
80
|
-
* set by setAlgo(), addAlgo() or constructor()
|
|
103
|
+
* stores the current ready state
|
|
81
104
|
*
|
|
82
|
-
* @
|
|
83
|
-
* @
|
|
105
|
+
* @since 2.0.2
|
|
106
|
+
* @private
|
|
107
|
+
* @type {Boolean}
|
|
108
|
+
*/
|
|
109
|
+
#readyState = false;
|
|
110
|
+
|
|
111
|
+
/**
|
|
112
|
+
* --------------------------------------------------
|
|
113
|
+
* Constructor
|
|
114
|
+
* --------------------------------------------------
|
|
84
115
|
*/
|
|
85
|
-
algo;
|
|
86
116
|
|
|
87
117
|
/**
|
|
88
118
|
* initializes a CmpStr instance
|
|
@@ -107,6 +137,12 @@ module.exports = class CmpStr {
|
|
|
107
137
|
|
|
108
138
|
};
|
|
109
139
|
|
|
140
|
+
/**
|
|
141
|
+
* --------------------------------------------------
|
|
142
|
+
* Ready State
|
|
143
|
+
* --------------------------------------------------
|
|
144
|
+
*/
|
|
145
|
+
|
|
110
146
|
/**
|
|
111
147
|
* checks whether string and algorithm are set correctly
|
|
112
148
|
*
|
|
@@ -114,11 +150,23 @@ module.exports = class CmpStr {
|
|
|
114
150
|
*/
|
|
115
151
|
isReady () {
|
|
116
152
|
|
|
117
|
-
return
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
153
|
+
return this.#readyState;
|
|
154
|
+
|
|
155
|
+
};
|
|
156
|
+
|
|
157
|
+
/**
|
|
158
|
+
* updates the readiness state
|
|
159
|
+
*
|
|
160
|
+
* @since 2.0.2
|
|
161
|
+
* @private
|
|
162
|
+
*/
|
|
163
|
+
#updateReadyState () {
|
|
164
|
+
|
|
165
|
+
this.#readyState = (
|
|
166
|
+
typeof this.#algo === 'string' &&
|
|
167
|
+
this.isAlgo( this.#algo ) &&
|
|
168
|
+
typeof this.#str === 'string' &&
|
|
169
|
+
this.#str.length !== 0
|
|
122
170
|
);
|
|
123
171
|
|
|
124
172
|
};
|
|
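Note on the hunk above: `isReady()` now returns a stored flag that `#updateReadyState()` recomputes whenever the algorithm or base string changes. The following is a minimal sketch of that pattern under stated assumptions. `ReadySketch` and its fixed `#known` set are hypothetical stand-ins for the real class and its algorithm registry.

```javascript
// Hypothetical minimal class; only the cached ready-state idea follows the diff.
class ReadySketch {

    #known = new Set( [ 'levenshtein', 'dice' ] ); // stand-in for registered algorithms
    #algo;
    #str;
    #readyState = false;

    // recompute the flag once per state change instead of on every isReady() call
    #updateReadyState () {

        this.#readyState = (
            typeof this.#algo === 'string' &&
            this.#known.has( this.#algo ) &&
            typeof this.#str === 'string' &&
            this.#str.length !== 0
        );

    }

    setAlgo ( algo ) {

        if ( !this.#known.has( algo ) ) return false;

        this.#algo = algo;
        this.#updateReadyState();

        return true;

    }

    setStr ( str ) {

        this.#str = String( str );
        this.#updateReadyState();

        return true;

    }

    isReady () {

        return this.#readyState;

    }

}

const ready = new ReadySketch();
console.log( ready.isReady() ); // false
ready.setAlgo( 'dice' );
console.log( ready.isReady() ); // false, no base string yet
ready.setStr( 'hello' );
console.log( ready.isReady() ); // true
```

Because the fields are private (`#`), outside code cannot change them without going through a setter, so the cached flag cannot go stale.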
@@ -126,12 +174,13 @@ module.exports = class CmpStr {
|
|
|
126
174
|
/**
|
|
127
175
|
* checks ready state and throws an error if not
|
|
128
176
|
*
|
|
177
|
+
* @private
|
|
129
178
|
* @returns {Boolean} true if ready
|
|
130
179
|
* @throws {Error} if CmpStr is not ready
|
|
131
180
|
*/
|
|
132
|
-
|
|
181
|
+
#checkReady () {
|
|
133
182
|
|
|
134
|
-
if ( !this
|
|
183
|
+
if ( !this.#readyState ) {
|
|
135
184
|
|
|
136
185
|
throw new Error(
|
|
137
186
|
`CmpStr instance is not ready. Ensure the algorithm and base string are set.`
|
|
@@ -143,6 +192,12 @@ module.exports = class CmpStr {
|
|
|
143
192
|
|
|
144
193
|
};
|
|
145
194
|
|
|
195
|
+
/**
|
|
196
|
+
* --------------------------------------------------
|
|
197
|
+
* Base String
|
|
198
|
+
* --------------------------------------------------
|
|
199
|
+
*/
|
|
200
|
+
|
|
146
201
|
/**
|
|
147
202
|
* sets the base string for comparison
|
|
148
203
|
*
|
|
@@ -151,12 +206,26 @@ module.exports = class CmpStr {
|
|
|
151
206
|
*/
|
|
152
207
|
setStr ( str ) {
|
|
153
208
|
|
|
154
|
-
this
|
|
209
|
+
this.#str = String ( str );
|
|
210
|
+
|
|
211
|
+
this.#updateReadyState();
|
|
155
212
|
|
|
156
213
|
return true;
|
|
157
214
|
|
|
158
215
|
};
|
|
159
216
|
|
|
217
|
+
/**
|
|
218
|
+
* gets the base string for comparison
|
|
219
|
+
*
|
|
220
|
+
* @since 2.0.2
|
|
221
|
+
* @returns {String} base string
|
|
222
|
+
*/
|
|
223
|
+
getStr () {
|
|
224
|
+
|
|
225
|
+
return this.#str;
|
|
226
|
+
|
|
227
|
+
};
|
|
228
|
+
|
|
160
229
|
/**
|
|
161
230
|
* --------------------------------------------------
|
|
162
231
|
* Algorithms
|
|
@@ -166,11 +235,14 @@ module.exports = class CmpStr {
|
|
|
166
235
|
/**
|
|
167
236
|
* list all registered similarity algorithms
|
|
168
237
|
*
|
|
238
|
+
* @param {Boolean} [loadedOnly=false] if true, only loaded algorithm names are returned
|
|
169
239
|
* @returns {String[]} array of algorithm names
|
|
170
240
|
*/
|
|
171
|
-
listAlgo () {
|
|
241
|
+
listAlgo ( loadedOnly = false ) {
|
|
172
242
|
|
|
173
|
-
return
|
|
243
|
+
return loadedOnly
|
|
244
|
+
? [ ...this.#loadedAlgo ]
|
|
245
|
+
: [ ...Object.keys( this.#algorithms ) ];
|
|
174
246
|
|
|
175
247
|
};
|
|
176
248
|
|
|
@@ -194,9 +266,11 @@ module.exports = class CmpStr {
|
|
|
194
266
|
*/
|
|
195
267
|
setAlgo ( algo ) {
|
|
196
268
|
|
|
197
|
-
if ( this
|
|
269
|
+
if ( this.#loadAlgo( algo ) ) {
|
|
198
270
|
|
|
199
|
-
this
|
|
271
|
+
this.#algo = algo;
|
|
272
|
+
|
|
273
|
+
this.#updateReadyState();
|
|
200
274
|
|
|
201
275
|
return true;
|
|
202
276
|
|
|
@@ -204,6 +278,18 @@ module.exports = class CmpStr {
|
|
|
204
278
|
|
|
205
279
|
};
|
|
206
280
|
|
|
281
|
+
/**
|
|
282
|
+
* gets the current algorithm to use for similarity calculations
|
|
283
|
+
*
|
|
284
|
+
* @since 2.0.2
|
|
285
|
+
* @returns {String} name of the algorithm
|
|
286
|
+
*/
|
|
287
|
+
getAlgo () {
|
|
288
|
+
|
|
289
|
+
return this.#algo;
|
|
290
|
+
|
|
291
|
+
};
|
|
292
|
+
|
|
207
293
|
/**
|
|
208
294
|
* adds a new similarity algorithm
|
|
209
295
|
*
|
|
@@ -255,11 +341,15 @@ module.exports = class CmpStr {
|
|
|
255
341
|
|
|
256
342
|
delete this.#algorithms[ algo ];
|
|
257
343
|
|
|
258
|
-
|
|
344
|
+
this.#loadedAlgo.delete( algo );
|
|
345
|
+
|
|
346
|
+
if ( this.#algo === algo ) {
|
|
259
347
|
|
|
260
348
|
/* reset current algorithm if it was removed */
|
|
261
349
|
|
|
262
|
-
this
|
|
350
|
+
this.#algo = undefined;
|
|
351
|
+
|
|
352
|
+
this.#updateReadyState();
|
|
263
353
|
|
|
264
354
|
}
|
|
265
355
|
|
|
@@ -278,18 +368,25 @@ module.exports = class CmpStr {
|
|
|
278
368
|
/**
|
|
279
369
|
* lazy-loads the specified algorithm module
|
|
280
370
|
*
|
|
371
|
+
* @private
|
|
281
372
|
* @param {String} algo name of the similarity algorithm
|
|
282
373
|
* @returns {Boolean} true if the algorithm is loaded
|
|
283
374
|
* @throws {Error} if the algorithm cannot be loaded or is not defined
|
|
284
375
|
*/
|
|
285
|
-
|
|
376
|
+
#loadAlgo ( algo ) {
|
|
286
377
|
|
|
287
|
-
if ( this.
|
|
378
|
+
if ( this.#loadedAlgo.has( algo ) ) {
|
|
379
|
+
|
|
380
|
+
return true;
|
|
381
|
+
|
|
382
|
+
} else if ( this.isAlgo( algo ) ) {
|
|
288
383
|
|
|
289
384
|
let typeOf = typeof this.#algorithms[ algo ];
|
|
290
385
|
|
|
291
386
|
if ( typeOf === 'function' ) {
|
|
292
387
|
|
|
388
|
+
this.#loadedAlgo.add( algo );
|
|
389
|
+
|
|
293
390
|
return true;
|
|
294
391
|
|
|
295
392
|
} else if ( typeOf === 'string' ) {
|
|
@@ -302,6 +399,8 @@ module.exports = class CmpStr {
|
|
|
302
399
|
this.#algorithms[ algo ]
|
|
303
400
|
);
|
|
304
401
|
|
|
402
|
+
this.#loadedAlgo.add( algo );
|
|
403
|
+
|
|
305
404
|
return true;
|
|
306
405
|
|
|
307
406
|
} catch ( err ) {
|
|
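Note on the `#loadAlgo()` and `listAlgo( loadedOnly )` changes above: a `Set` of loaded names lets repeated loads return early and lets callers list only what has actually been loaded. The sketch below illustrates the idea in isolation. `LazyRegistrySketch`, the `./hypothetical/length` path and the `#modules` lookup (a stand-in for `require()`) are assumptions made for illustration, not part of the package.

```javascript
class LazyRegistrySketch {

    // name -> function (ready to use) or string (placeholder for a module path)
    #algorithms = {
        exact: ( a, b ) => ( a === b ? 1 : 0 ),
        length: './hypothetical/length'
    };

    // stand-in for require(): maps the placeholder path to an implementation
    #modules = {
        './hypothetical/length': ( a, b ) =>
            Math.min( a.length, b.length ) / Math.max( a.length, b.length, 1 )
    };

    #loaded = new Set();

    load ( name ) {

        // already loaded: nothing to do
        if ( this.#loaded.has( name ) ) return true;

        const entry = this.#algorithms[ name ];

        if ( typeof entry === 'string' ) {

            // resolve the placeholder on first use only
            this.#algorithms[ name ] = this.#modules[ entry ];

        } else if ( typeof entry !== 'function' ) {

            throw new Error( `${name} is not defined.` );

        }

        this.#loaded.add( name );

        return true;

    }

    list ( loadedOnly = false ) {

        return loadedOnly
            ? [ ...this.#loaded ]
            : Object.keys( this.#algorithms );

    }

}

const registry = new LazyRegistrySketch();
console.log( registry.list() );       // [ 'exact', 'length' ]
console.log( registry.list( true ) ); // []
registry.load( 'length' );
console.log( registry.list( true ) ); // [ 'length' ]
```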
@@ -482,11 +581,12 @@ module.exports = class CmpStr {
|
|
|
482
581
|
/**
|
|
483
582
|
* applies all active filters to a string
|
|
484
583
|
*
|
|
584
|
+
* @private
|
|
485
585
|
* @param {String} str string to process
|
|
486
586
|
* @returns {String} filtered string
|
|
487
587
|
* @throws {Error} if applying filters cause an error
|
|
488
588
|
*/
|
|
489
|
-
|
|
589
|
+
#applyFilters ( str ) {
|
|
490
590
|
|
|
491
591
|
try {
|
|
492
592
|
|
|
@@ -524,7 +624,19 @@ module.exports = class CmpStr {
|
|
|
524
624
|
*/
|
|
525
625
|
setFlags ( flags = '' ) {
|
|
526
626
|
|
|
527
|
-
this
|
|
627
|
+
this.#flags = String ( flags );
|
|
628
|
+
|
|
629
|
+
};
|
|
630
|
+
|
|
631
|
+
/**
|
|
632
|
+
* get default normalization flags
|
|
633
|
+
*
|
|
634
|
+
* @since 2.0.2
|
|
635
|
+
* @returns {String} normalization flags
|
|
636
|
+
*/
|
|
637
|
+
getFlags () {
|
|
638
|
+
|
|
639
|
+
return this.#flags;
|
|
528
640
|
|
|
529
641
|
};
|
|
530
642
|
|
|
@@ -544,57 +656,73 @@ module.exports = class CmpStr {
|
|
|
544
656
|
* d :: decompose unicode
|
|
545
657
|
* u :: normalize unicode
|
|
546
658
|
*
|
|
547
|
-
* @param {String} string string to normalize
|
|
659
|
+
* @param {String|String[]} string string(s) to normalize
|
|
548
660
|
* @param {String} [flags=''] normalization flags
|
|
549
|
-
* @returns {String} normalized string
|
|
661
|
+
* @returns {String|String[]} normalized string(s)
|
|
550
662
|
* @throws {Error} if normalization cause an error
|
|
551
663
|
*/
|
|
552
|
-
normalize (
|
|
664
|
+
normalize ( input, flags = '' ) {
|
|
553
665
|
|
|
554
|
-
|
|
666
|
+
const processStr = ( str ) => {
|
|
555
667
|
|
|
556
|
-
|
|
668
|
+
let res = String ( str );
|
|
557
669
|
|
|
558
|
-
|
|
670
|
+
/* use normalized string from cache to increase performance */
|
|
559
671
|
|
|
560
|
-
|
|
672
|
+
let key = `${res}::${flags}`;
|
|
561
673
|
|
|
562
|
-
|
|
674
|
+
if ( this.#cache.has( key ) ) {
|
|
563
675
|
|
|
564
|
-
|
|
676
|
+
return this.#cache.get( key );
|
|
565
677
|
|
|
566
|
-
|
|
678
|
+
}
|
|
567
679
|
|
|
568
|
-
|
|
680
|
+
/* apply custom filters */
|
|
569
681
|
|
|
570
|
-
|
|
682
|
+
res = this.#applyFilters( res );
|
|
571
683
|
|
|
572
|
-
|
|
684
|
+
/* normalize using flags */
|
|
573
685
|
|
|
574
|
-
|
|
575
|
-
if ( flags.includes( 'w' ) ) res = res.replace( /\s+/g, ' ' );
|
|
576
|
-
if ( flags.includes( 'r' ) ) res = res.replace( /(.)\1+/g, '$1' );
|
|
577
|
-
if ( flags.includes( 'k' ) ) res = res.replace( /[^a-z]/gi, '' );
|
|
578
|
-
if ( flags.includes( 'n' ) ) res = res.replace( /[0-9]/g, '' );
|
|
579
|
-
if ( flags.includes( 't' ) ) res = res.trim();
|
|
580
|
-
if ( flags.includes( 'i' ) ) res = res.toLowerCase();
|
|
581
|
-
if ( flags.includes( 'd' ) ) res = res.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
|
|
582
|
-
if ( flags.includes( 'u' ) ) res = res.normalize( 'NFC' );
|
|
686
|
+
try {
|
|
583
687
|
|
|
584
|
-
|
|
688
|
+
if ( flags.includes( 's' ) ) res = res.replace( /[^a-z0-9]/gi, '' );
|
|
689
|
+
if ( flags.includes( 'w' ) ) res = res.replace( /\s+/g, ' ' );
|
|
690
|
+
if ( flags.includes( 'r' ) ) res = res.replace( /(.)\1+/g, '$1' );
|
|
691
|
+
if ( flags.includes( 'k' ) ) res = res.replace( /[^a-z]/gi, '' );
|
|
692
|
+
if ( flags.includes( 'n' ) ) res = res.replace( /[0-9]/g, '' );
|
|
693
|
+
if ( flags.includes( 't' ) ) res = res.trim();
|
|
694
|
+
if ( flags.includes( 'i' ) ) res = res.toLowerCase();
|
|
695
|
+
if ( flags.includes( 'd' ) ) res = res.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
|
|
696
|
+
if ( flags.includes( 'u' ) ) res = res.normalize( 'NFC' );
|
|
585
697
|
|
|
586
|
-
|
|
587
|
-
|
|
588
|
-
|
|
589
|
-
|
|
698
|
+
} catch ( err ) {
|
|
699
|
+
|
|
700
|
+
throw new Error (
|
|
701
|
+
`Error while normalization.`,
|
|
702
|
+
{ cause: err }
|
|
703
|
+
);
|
|
704
|
+
|
|
705
|
+
}
|
|
706
|
+
|
|
707
|
+
/* store the normalized string in the cache */
|
|
708
|
+
|
|
709
|
+
this.#cache.set( key, res );
|
|
710
|
+
|
|
711
|
+
return res;
|
|
590
712
|
|
|
591
713
|
}
|
|
592
714
|
|
|
593
|
-
/*
|
|
715
|
+
/* processing multiple strings */
|
|
716
|
+
|
|
717
|
+
if ( Array.isArray( input ) ) {
|
|
594
718
|
|
|
595
|
-
|
|
719
|
+
return input.map(
|
|
720
|
+
( str ) => processStr( str )
|
|
721
|
+
);
|
|
722
|
+
|
|
723
|
+
}
|
|
596
724
|
|
|
597
|
-
return
|
|
725
|
+
return processStr( input );
|
|
598
726
|
|
|
599
727
|
};
|
|
600
728
|
|
|
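Note on the `normalize()` hunk above: the method now accepts either a single string or an array, and applies the flags in a fixed order regardless of how they are written in the flag string. The sketch below reproduces only that flag pipeline as a standalone function. `normalizeSketch` is a hypothetical name, and the cache and custom filters of the real method are left out on purpose.

```javascript
// Standalone sketch; the flag letters and regular expressions are copied from the diff.
function normalizeSketch ( input, flags = '' ) {

    const processStr = ( str ) => {

        let res = String( str );

        if ( flags.includes( 's' ) ) res = res.replace( /[^a-z0-9]/gi, '' );  // remove special chars
        if ( flags.includes( 'w' ) ) res = res.replace( /\s+/g, ' ' );        // collapse whitespaces
        if ( flags.includes( 'r' ) ) res = res.replace( /(.)\1+/g, '$1' );    // remove repeated chars
        if ( flags.includes( 'k' ) ) res = res.replace( /[^a-z]/gi, '' );     // keep only letters
        if ( flags.includes( 'n' ) ) res = res.replace( /[0-9]/g, '' );       // ignore numbers
        if ( flags.includes( 't' ) ) res = res.trim();                        // trim whitespaces
        if ( flags.includes( 'i' ) ) res = res.toLowerCase();                 // case insensitivity
        if ( flags.includes( 'd' ) ) res = res.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
        if ( flags.includes( 'u' ) ) res = res.normalize( 'NFC' );

        return res;

    };

    // arrays are processed element by element, single values directly
    return Array.isArray( input )
        ? input.map( ( str ) => processStr( str ) )
        : processStr( input );

}

console.log( normalizeSketch( ' he123LLo ', 'nti' ) );
// hello

console.log( normalizeSketch( [ 'Hello World!', 'CmpStr 123' ], 'nwti' ) );
// [ 'hello world!', 'cmpstr' ]
```

Both outputs match the examples given in the README changes of this diff.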
@@ -629,7 +757,7 @@ module.exports = class CmpStr {
|
|
|
629
757
|
*/
|
|
630
758
|
compare ( algo, a, b, config = {} ) {
|
|
631
759
|
|
|
632
|
-
if ( this
|
|
760
|
+
if ( this.#loadAlgo( algo ) ) {
|
|
633
761
|
|
|
634
762
|
/* handle trivial cases */
|
|
635
763
|
|
|
@@ -639,7 +767,7 @@ module.exports = class CmpStr {
|
|
|
639
767
|
/* apply similarity algorithm */
|
|
640
768
|
|
|
641
769
|
const {
|
|
642
|
-
flags = this
|
|
770
|
+
flags = this.#flags,
|
|
643
771
|
options = {}
|
|
644
772
|
} = config;
|
|
645
773
|
|
|
@@ -674,11 +802,11 @@ module.exports = class CmpStr {
|
|
|
674
802
|
*/
|
|
675
803
|
test ( str, config = {} ) {
|
|
676
804
|
|
|
677
|
-
if ( this
|
|
805
|
+
if ( this.#checkReady() ) {
|
|
678
806
|
|
|
679
807
|
return this.compare(
|
|
680
|
-
this
|
|
681
|
-
this
|
|
808
|
+
this.#algo,
|
|
809
|
+
this.#str, str,
|
|
682
810
|
config
|
|
683
811
|
);
|
|
684
812
|
|
|
@@ -695,13 +823,13 @@ module.exports = class CmpStr {
|
|
|
695
823
|
*/
|
|
696
824
|
batchTest ( arr, config = {} ) {
|
|
697
825
|
|
|
698
|
-
if ( this
|
|
826
|
+
if ( this.#checkReady() ) {
|
|
699
827
|
|
|
700
828
|
return [ ...arr ].map( ( str ) => ( {
|
|
701
829
|
target: str,
|
|
702
830
|
match: this.compare(
|
|
703
|
-
this
|
|
704
|
-
this
|
|
831
|
+
this.#algo,
|
|
832
|
+
this.#str, str,
|
|
705
833
|
config
|
|
706
834
|
)
|
|
707
835
|
} ) );
|
|
@@ -763,7 +891,7 @@ module.exports = class CmpStr {
|
|
|
763
891
|
*/
|
|
764
892
|
similarityMatrix ( algo, arr, config = {} ) {
|
|
765
893
|
|
|
766
|
-
if ( this
|
|
894
|
+
if ( this.#loadAlgo( algo ) ) {
|
|
767
895
|
|
|
768
896
|
delete config?.options?.raw;
|
|
769
897
|
|
package/src/CmpStrAsync.d.ts
ADDED
@@ -0,0 +1,19 @@
+import { CmpStr, Config, BatchResult } from './CmpStr';
+
+export declare class CmpStrAsync extends CmpStr {
+
+    normalizeAsync ( input: string|string[], flags?: string ) : string|string[];
+
+    compareAsync ( algo: string, a: string, b: string, config?: Config ) : Promise<number | any>;
+
+    testAsync ( str: string, config?: Config ) : Promise<number | any>;
+
+    batchTestAsync ( arr: string[], config?: Config ) : Promise<BatchResult[]>;
+
+    matchAsync ( arr: string[], config?: Config ) : Promise<BatchResult[]>;
+
+    closestAsync ( arr: string[], config?: Config ) : Promise<string | undefined>;
+
+    similarityMatrixAsync ( algo: string, arr: string[], config?: Config ) : Promise<number[][]>;
+
+}
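Note on the async declarations above: the README changes state that each `...Async` method returns a `Promise`, and the JavaScript side routes calls through a private generic wrapper whose body is not part of this diff. The sketch below shows one way such a wrapper could look. `asyncWrapperSketch` is a hypothetical name, and deferring the call with `setImmediate` is an assumption, not the package's confirmed approach.

```javascript
// Hypothetical generic wrapper: runs a synchronous function on a later tick
// of the event loop and settles a Promise with its result or error.
function asyncWrapperSketch ( method, ...args ) {

    return new Promise( ( resolve, reject ) => {

        setImmediate( () => {

            try {

                resolve( method( ...args ) );

            } catch ( err ) {

                reject( err );

            }

        } );

    } );

}

const upper = ( str ) => String( str ).toUpperCase();

asyncWrapperSketch( upper, 'cmpstr' ).then( ( res ) => console.log( res ) );
// CMPSTR
```

Note that a wrapper like this only defers the work. The wrapped function still runs on the main thread, so very large inputs would still block while it executes.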
package/src/CmpStrAsync.js
CHANGED
@@ -40,9 +40,10 @@ module.exports = class CmpStrAsync extends CmpStr {
     };
 
     /**
-     * @private
      * generic async wrapper for methods
+     * @async
      *
+     * @private
      * @param {Function} method method to call
      * @param {...any} args arguments to pass to the method
      * @returns {Promise} Promise resolving the result of the method
@@ -76,8 +77,25 @@ module.exports = class CmpStrAsync extends CmpStr {
      */
 
     /**
-     *
+     * normalizes a string by chainable options; uses cache to increase
+     * performance and custom filters for advanced behavior
      *
+     * @since 2.0.2
+     * @param {String|String[]} input string(s) to normalize
+     * @param {String} [flags=''] normalization flags
+     * @returns {Promise} Promise resolving string normalization
+     */
+    normalizeAsync ( input, flags = '' ) {
+
+        return this.#asyncWrapper(
+            this.normalize,
+            input, flags
+        );
+
+    };
+
+    /**
+     * compares two string a and b using the passed algorithm
      * @async
      *
      * @param {String} algo name of the algorithm
@@ -98,7 +116,6 @@ module.exports = class CmpStrAsync extends CmpStr {
     /**
      * tests the similarity between the base string and a target string
      * using the current algorithm
-     *
      * @async
      *
      * @param {String} str target string
@@ -116,7 +133,6 @@ module.exports = class CmpStrAsync extends CmpStr {
 
     /**
      * tests the similarity of multiple strings against the base string
-     *
      * @async
      *
      * @param {String[]} arr array of strings
@@ -135,7 +151,6 @@ module.exports = class CmpStrAsync extends CmpStr {
     /**
      * finds strings in an array that exceed a similarity threshold
      * returns the array sorted by highest similarity
-     *
      * @async
      *
      * @param {String[]} arr array of strings
@@ -153,7 +168,6 @@ module.exports = class CmpStrAsync extends CmpStr {
 
     /**
      * finds the closest matching string from an array
-     *
      * @async
      *
      * @param {String[]} arr array of strings
@@ -171,7 +185,6 @@ module.exports = class CmpStrAsync extends CmpStr {
 
     /**
      * generate a similarity matrix for an array of strings
-     *
      * @async
      *
      * @param {String} algo name of the algorithm
package/src/index.d.ts
ADDED