cmpstr 2.0.1 → 2.0.3
- package/README.md +87 -68
- package/package.json +1 -1
- package/src/CmpStr.d.ts +10 -4
- package/src/CmpStr.js +203 -70
- package/src/CmpStrAsync.d.ts +3 -1
- package/src/CmpStrAsync.js +21 -15
- package/src/index.js +1 -1
package/README.md
CHANGED
|
@@ -10,7 +10,7 @@ CmpStr is a lightweight and powerful npm package for calculating string similari
|
|
|
10
10
|
- Customizable normalization with global flags and caching.
|
|
11
11
|
- Asynchronous support for non-blocking workflows.
|
|
12
12
|
- Extensible with custom algorithms and filters.
|
|
13
|
-
- TypeScript
|
|
13
|
+
- TypeScript declarations for better developer experience.
|
|
14
14
|
|
|
15
15
|
## Installation
|
|
16
16
|
|
|
@@ -62,15 +62,19 @@ Sets the base string for comparison.
|
|
|
62
62
|
|
|
63
63
|
Parameters:
|
|
64
64
|
|
|
65
|
-
|
|
65
|
+
`<String> str` – string to set as the base
|
|
66
|
+
|
|
67
|
+
#### `getStr()`
|
|
68
|
+
|
|
69
|
+
Gets the base string for comparison.
|
|
66
70
|
|
|
67
71
|
#### `setFlags( [ flags = '' ] )`
|
|
68
72
|
|
|
69
73
|
Sets the default normalization flags. They are overridden when `flags` is passed through the configuration object. See the description of available flags / normalization options further below in the documentation.
|
|
70
74
|
|
|
71
|
-
|
|
75
|
+
#### `getFlags()`
|
|
72
76
|
|
|
73
|
-
|
|
77
|
+
Gets the default normalization flags.
|
|
74
78
|
|
|
75
79
|
#### `clearCache()`
|
|
76
80
|
|
|
@@ -78,9 +82,13 @@ Clears the normalization cache.
|
|
|
78
82
|
|
|
79
83
|
### Algorithms
|
|
80
84
|
|
|
81
|
-
#### `listAlgo()`
|
|
85
|
+
#### `listAlgo( [ loadedOnly = false ] )`
|
|
86
|
+
|
|
87
|
+
List all registered or loaded similarity algorithms.
|
|
88
|
+
|
|
89
|
+
Parameters:
|
|
82
90
|
|
|
83
|
-
|
|
91
|
+
`<Boolean> loadedOnly` – if true, only loaded algorithm names are returned
|
|
84
92
|
|
|
85
93
|
#### `isAlgo( algo )`
|
|
86
94
|
|
|
@@ -88,7 +96,7 @@ Checks if an algorithm is registered. Returns `true` if so, `false` otherwise.
|
|
|
88
96
|
|
|
89
97
|
Parameters:
|
|
90
98
|
|
|
91
|
-
|
|
99
|
+
`<String> algo` – name of the algorithm
|
|
92
100
|
|
|
93
101
|
#### `setAlgo( algo )`
|
|
94
102
|
|
|
@@ -98,7 +106,11 @@ Allowed options for built-in algorithms are `cosine`, `damerau`, `dice`, `hammi
|
|
|
98
106
|
|
|
99
107
|
Parameters:
|
|
100
108
|
|
|
101
|
-
|
|
109
|
+
`<String> algo` – name of the algorithm
|
|
110
|
+
|
|
111
|
+
#### `getAlgo()`
|
|
112
|
+
|
|
113
|
+
Gets the current algorithm to use for similarity calculations.
|
|
102
114
|
|
|
103
115
|
#### `addAlgo( algo, callback [, useIt = true ] )`
|
|
104
116
|
|
|
@@ -106,9 +118,9 @@ Adding a new similarity algorithm by using the `addAlgo()` method passing the na
|
|
|
106
118
|
|
|
107
119
|
Parameters:
|
|
108
120
|
|
|
109
|
-
|
|
110
|
-
|
|
111
|
-
|
|
121
|
+
`<String> algo` – name of the algorithm
|
|
122
|
+
`<Function> callback` – callback function implementing the algorithm
|
|
123
|
+
`<Boolean> useIt` – whether to set this algorithm as the current one
|
|
112
124
|
|
|
113
125
|
Example:
|
|
114
126
|
|
|
@@ -129,13 +141,17 @@ Removing a registered similarity algorithm.
|
|
|
129
141
|
|
|
130
142
|
Parameters:
|
|
131
143
|
|
|
132
|
-
|
|
144
|
+
`<String> algo` – name of the algorithm
|
|
133
145
|
|
|
134
146
|
### Filters
|
|
135
147
|
|
|
136
|
-
#### `listFilter()`
|
|
148
|
+
#### `listFilter( [ activeOnly = false ] )`
|
|
149
|
+
|
|
150
|
+
List all added or active filter names.
|
|
137
151
|
|
|
138
|
-
|
|
152
|
+
Parameters:
|
|
153
|
+
|
|
154
|
+
`<Boolean> activeOnly` – if true, only names of active filters are returned
|
|
139
155
|
|
|
140
156
|
#### `addFilter( name, callback [, priority = 10 ] )`
|
|
141
157
|
|
|
@@ -143,9 +159,9 @@ Adds a custom normalization filter. Needs to be passed a unique name and callbac
|
|
|
143
159
|
|
|
144
160
|
Parameters:
|
|
145
161
|
|
|
146
|
-
|
|
147
|
-
|
|
148
|
-
|
|
162
|
+
`<String> name` – filter name
|
|
163
|
+
`<Function> callback` – callback function implementing the filter
|
|
164
|
+
`<Int> priority` – priority of the filter
|
|
149
165
|
|
|
150
166
|
Example:
|
|
151
167
|
|
|
@@ -161,7 +177,7 @@ Removes a custom normalization filter.
|
|
|
161
177
|
|
|
162
178
|
Parameters:
|
|
163
179
|
|
|
164
|
-
|
|
180
|
+
`<String> name` – filter name
|
|
165
181
|
|
|
166
182
|
#### `pauseFilter( name )`
|
|
167
183
|
|
|
@@ -169,7 +185,7 @@ Pauses a custom normalization filter.
|
|
|
169
185
|
|
|
170
186
|
Parameters:
|
|
171
187
|
|
|
172
|
-
|
|
188
|
+
`<String> name` – filter name
|
|
173
189
|
|
|
174
190
|
#### `resumeFilter( name )`
|
|
175
191
|
|
|
@@ -177,7 +193,7 @@ Resumes a custom normalization filter.
|
|
|
177
193
|
|
|
178
194
|
Parameters:
|
|
179
195
|
|
|
180
|
-
|
|
196
|
+
`<String> name` – filter name
|
|
181
197
|
|
|
182
198
|
#### `clearFilter( name )`
|
|
183
199
|
|
|
@@ -191,10 +207,10 @@ Compares two strings using the specified algorithm. The method returns either th
|
|
|
191
207
|
|
|
192
208
|
Parameters:
|
|
193
209
|
|
|
194
|
-
|
|
195
|
-
|
|
196
|
-
|
|
197
|
-
|
|
210
|
+
`<String> algo` – name of the algorithm
|
|
211
|
+
`<String> a` – first string
|
|
212
|
+
`<String> b` – second string
|
|
213
|
+
`<Object> config` – configuration object
|
|
198
214
|
|
|
199
215
|
Example:
|
|
200
216
|
|
|
@@ -211,8 +227,8 @@ Tests the similarity between the base string and a given target string. Returns
|
|
|
211
227
|
|
|
212
228
|
Parameters:
|
|
213
229
|
|
|
214
|
-
|
|
215
|
-
|
|
230
|
+
`<String> str` – target string
|
|
231
|
+
`<Object> config` – configuration object
|
|
216
232
|
|
|
217
233
|
Example:
|
|
218
234
|
|
|
@@ -229,8 +245,8 @@ Tests the similarity of multiple strings against the base string. Returns an arr
|
|
|
229
245
|
|
|
230
246
|
Parameters:
|
|
231
247
|
|
|
232
|
-
|
|
233
|
-
|
|
248
|
+
`<String[]> arr` – array of strings
|
|
249
|
+
`<Object> config` – configuration object
|
|
234
250
|
|
|
235
251
|
Example:
|
|
236
252
|
|
|
@@ -247,8 +263,8 @@ Finds strings in an array that exceed a similarity threshold and sorts them by h
|
|
|
247
263
|
|
|
248
264
|
Parameters:
|
|
249
265
|
|
|
250
|
-
|
|
251
|
-
|
|
266
|
+
`<String[]> arr` – array of strings
|
|
267
|
+
`<Object> config` – configuration object
|
|
252
268
|
|
|
253
269
|
Example:
|
|
254
270
|
|
|
@@ -267,8 +283,8 @@ Finds the closest matching string from an array and returns it.
|
|
|
267
283
|
|
|
268
284
|
Parameters:
|
|
269
285
|
|
|
270
|
-
|
|
271
|
-
|
|
286
|
+
`<String[]> arr` – array of strings
|
|
287
|
+
`<Object> config` – configuration object
|
|
272
288
|
|
|
273
289
|
Example:
|
|
274
290
|
|
|
@@ -285,9 +301,9 @@ Generates a similarity matrix for an array of strings. Returns a 2D array that
|
|
|
285
301
|
|
|
286
302
|
Parameters:
|
|
287
303
|
|
|
288
|
-
|
|
289
|
-
|
|
290
|
-
|
|
304
|
+
`<String> algo` – name of the algorithm
|
|
305
|
+
`<String[]> arr` – array of strings
|
|
306
|
+
`<Object> config` – configuration object
|
|
291
307
|
|
|
292
308
|
Example:
|
|
293
309
|
|
|
@@ -308,24 +324,24 @@ The `CmpStr` package allows strings to be normalized before the similarity compa
|
|
|
308
324
|
|
|
309
325
|
#### Supported Flags
|
|
310
326
|
|
|
311
|
-
|
|
312
|
-
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
|
|
317
|
-
|
|
318
|
-
|
|
319
|
-
|
|
327
|
+
`s` – remove special chars
|
|
328
|
+
`w` – collapse whitespaces
|
|
329
|
+
`r` – remove repeated chars
|
|
330
|
+
`k` – keep only letters
|
|
331
|
+
`n` – ignore numbers
|
|
332
|
+
`t` – trim whitespaces
|
|
333
|
+
`i` – case insensitivity
|
|
334
|
+
`d` – decompose unicode
|
|
335
|
+
`u` – normalize unicode
|
|
320
336
|
|
|
321
|
-
#### `normalize(
|
|
337
|
+
#### `normalize( input [, flags = '' ] )`
|
|
322
338
|
|
|
323
339
|
The method for normalizing strings can also be called on its own, without comparing the similarity of two strings. It still applies all filters and reads from or writes to the cache. This can be helpful if certain strings should be normalized and stored beforehand, or if different normalization options are to be tested.
|
|
324
340
|
|
|
325
341
|
Parameters:
|
|
326
342
|
|
|
327
|
-
|
|
328
|
-
|
|
343
|
+
`<String|String[]> input` – single string or array of strings to normalize
|
|
344
|
+
`<String> flags` – normalization flags
|
|
329
345
|
|
|
330
346
|
Example:
|
|
331
347
|
|
|
@@ -334,6 +350,9 @@ const cmp = new CmpStr();
|
|
|
334
350
|
|
|
335
351
|
console.log( cmp.normalize( ' he123LLo ', 'nti' ) );
|
|
336
352
|
// Output: hello
|
|
353
|
+
|
|
354
|
+
console.log( cmp.normalize( [ 'Hello World!', 'CmpStr 123' ], 'nwti' ) );
|
|
355
|
+
// Output: [ 'hello world!', 'cmpstr' ]
|
|
337
356
|
```
|
|
338
357
|
|
|
339
358
|
### Configuration Object
|
|
@@ -344,9 +363,9 @@ It also contains `options` as an object of key-value pairs that are passed to th
|
|
|
344
363
|
|
|
345
364
|
Global config options:
|
|
346
365
|
|
|
347
|
-
|
|
348
|
-
|
|
349
|
-
|
|
366
|
+
`<String> flags` – normalization flags
|
|
367
|
+
`<Number> threshold` – similarity threshold between 0 and 1
|
|
368
|
+
`<Object> options` – options passed to the algorithm
|
|
350
369
|
|
|
351
370
|
Example:
|
|
352
371
|
|
|
@@ -369,9 +388,9 @@ console.log( cmp.match( [
|
|
|
369
388
|
|
|
370
389
|
## Asynchronous Support
|
|
371
390
|
|
|
372
|
-
The `CmpStrAsync` class provides asynchronous
|
|
391
|
+
The `CmpStrAsync` class provides an asynchronous wrapper for all comparison methods as well as the string normalization function. It is ideal for large datasets or non-blocking workflows.
|
|
373
392
|
|
|
374
|
-
The asynchronous class supports the methods `compareAsync`, `testAsync`, `batchTestAsync`, `matchAsync`, `closestAsync` and `similarityMatrixAsync`. Each of these methods returns a `Promise`.
|
|
393
|
+
The asynchronous class supports the methods `normalizeAsync`, `compareAsync`, `testAsync`, `batchTestAsync`, `matchAsync`, `closestAsync` and `similarityMatrixAsync`. Each of these methods returns a `Promise`.
|
|
375
394
|
|
|
376
395
|
For options, arguments and returned values, see the documentation above.
|
|
377
396
|
|
|
@@ -399,7 +418,7 @@ The Levenshtein distance between two strings is the minimum number of single-cha
|
|
|
399
418
|
|
|
400
419
|
Options:
|
|
401
420
|
|
|
402
|
-
|
|
421
|
+
`<Boolean> raw` – if true the raw distance is returned
|
|
403
422
|
|
|
404
423
|
#### Damerau-Levenshtein – `damerau`
|
|
405
424
|
|
|
@@ -407,7 +426,7 @@ The Damerau-Levenshtein distance differs from the classical Levenshtein distance
|
|
|
407
426
|
|
|
408
427
|
Options:
|
|
409
428
|
|
|
410
|
-
|
|
429
|
+
`<Boolean> raw` – if true the raw distance is returned
|
|
411
430
|
|
|
412
431
|
#### Jaro-Winkler – `jaro`
|
|
413
432
|
|
|
@@ -415,7 +434,7 @@ Jaro-Winkler is a string similarity metric that gives more weight to matching ch
|
|
|
415
434
|
|
|
416
435
|
Options:
|
|
417
436
|
|
|
418
|
-
|
|
437
|
+
`<Boolean> raw` – if true the raw distance is returned
|
|
419
438
|
|
|
420
439
|
#### Cosine Similarity – `cosine`
|
|
421
440
|
|
|
@@ -423,7 +442,7 @@ Cosine similarity is a measure of how similar two vectors are. It's often used in t
|
|
|
423
442
|
|
|
424
443
|
Options:
|
|
425
444
|
|
|
426
|
-
|
|
445
|
+
`<String> delimiter` – term delimiter
|
|
427
446
|
|
|
428
447
|
#### Dice Coefficient – `dice`
|
|
429
448
|
|
|
@@ -447,9 +466,9 @@ The Needleman-Wunsch algorithm performs global alignment, aligning two strings e
|
|
|
447
466
|
|
|
448
467
|
Options:
|
|
449
468
|
|
|
450
|
-
|
|
451
|
-
|
|
452
|
-
|
|
469
|
+
`<Number> match` – score for a match
|
|
470
|
+
`<Number> mismatch` – penalty for a mismatch
|
|
471
|
+
`<Number> gap` – penalty for a gap
|
|
453
472
|
|
|
454
473
|
#### Smith-Waterman – `smithWaterman`
|
|
455
474
|
|
|
@@ -457,9 +476,9 @@ The Smith-Waterman algorithm performs local alignment, finding the best matching
|
|
|
457
476
|
|
|
458
477
|
Options:
|
|
459
478
|
|
|
460
|
-
|
|
461
|
-
|
|
462
|
-
|
|
479
|
+
`<Number> match` – score for a match
|
|
480
|
+
`<Number> mismatch` – penalty for a mismatch
|
|
481
|
+
`<Number> gap` – penalty for a gap
|
|
463
482
|
|
|
464
483
|
#### q-Gram – `qGram`
|
|
465
484
|
|
|
@@ -467,7 +486,7 @@ Q-gram similarity is a string-matching algorithm that compares two strings by br
|
|
|
467
486
|
|
|
468
487
|
Options:
|
|
469
488
|
|
|
470
|
-
|
|
489
|
+
`<Int> q` – length of substrings
|
|
471
490
|
|
|
472
491
|
### Phonetic Algorithms
|
|
473
492
|
|
|
@@ -477,8 +496,8 @@ The Soundex algorithm generates a phonetic representation of a string based on h
|
|
|
477
496
|
|
|
478
497
|
Options:
|
|
479
498
|
|
|
480
|
-
|
|
481
|
-
|
|
482
|
-
|
|
483
|
-
|
|
484
|
-
|
|
499
|
+
`<String> lang` – language code for predefined setups (e.g., `en`, `de`)
|
|
500
|
+
`<Boolean> raw` – if true, returns the raw sound index codes
|
|
501
|
+
`<Object> mapping` – custom phonetic mapping (overrides predefined)
|
|
502
|
+
`<String> exclude` – characters to exclude from the input (overrides predefined)
|
|
503
|
+
`<Number> maxLength` – maximum length of the phonetic code
|
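The flag list and the `normalize( input [, flags = '' ] )` signature documented in the README changes above can be illustrated with a small standalone sketch. This is not the package's code path: `normalizeSketch` is a hypothetical helper that only applies the flag replacements in the order they appear in the `src/CmpStr.js` part of this diff, and it deliberately leaves out custom filters and the cache.

```javascript
// Hypothetical standalone helper (assumption, not part of cmpstr).
// Applies the documented flags in the order used in src/CmpStr.js.
function normalizeSketch ( input, flags = '' ) {

    const processStr = ( str ) => {

        let res = String( str );

        if ( flags.includes( 's' ) ) res = res.replace( /[^a-z0-9]/gi, '' ); // remove special chars
        if ( flags.includes( 'w' ) ) res = res.replace( /\s+/g, ' ' );       // collapse whitespaces
        if ( flags.includes( 'r' ) ) res = res.replace( /(.)\1+/g, '$1' );   // remove repeated chars
        if ( flags.includes( 'k' ) ) res = res.replace( /[^a-z]/gi, '' );    // keep only letters
        if ( flags.includes( 'n' ) ) res = res.replace( /[0-9]/g, '' );      // ignore numbers
        if ( flags.includes( 't' ) ) res = res.trim();                       // trim whitespaces
        if ( flags.includes( 'i' ) ) res = res.toLowerCase();                // case insensitivity
        if ( flags.includes( 'd' ) ) res = res.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
        if ( flags.includes( 'u' ) ) res = res.normalize( 'NFC' );

        return res;

    };

    // a single string or an array of strings is accepted
    return Array.isArray( input )
        ? input.map( ( str ) => processStr( str ) )
        : processStr( input );

}

console.log( normalizeSketch( ' he123LLo ', 'nti' ) );
// hello

console.log( normalizeSketch( [ 'Hello World!', 'CmpStr 123' ], 'nwti' ) );
// [ 'hello world!', 'cmpstr' ]
```

Because the `s` pattern keeps only letters and digits, it also removes whitespace when applied as written, so combining `s` with `w` or `t` has no additional effect.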
package/package.json
CHANGED
package/src/CmpStr.d.ts
CHANGED
|
@@ -17,23 +17,27 @@ export declare class CmpStr {
|
|
|
17
17
|
|
|
18
18
|
setStr ( str: string ) : boolean;
|
|
19
19
|
|
|
20
|
-
|
|
20
|
+
getStr () : string;
|
|
21
|
+
|
|
22
|
+
listAlgo ( loadedOnly?: boolean ) : string[];
|
|
21
23
|
|
|
22
24
|
isAlgo ( algo: string ) : boolean;
|
|
23
25
|
|
|
24
26
|
setAlgo ( algo: string ) : boolean;
|
|
25
27
|
|
|
28
|
+
getAlgo () : string;
|
|
29
|
+
|
|
26
30
|
addAlgo ( algo: string, callback: (
|
|
27
31
|
a: string, b: string, ...args : any
|
|
28
32
|
) => number | any, useIt?: boolean ) : boolean;
|
|
29
33
|
|
|
30
34
|
rmvAlgo( algo: string ) : boolean;
|
|
31
35
|
|
|
32
|
-
listFilter () : string[];
|
|
36
|
+
listFilter ( activeOnly?: boolean ) : string[];
|
|
33
37
|
|
|
34
38
|
addFilter ( name: string, callback: (
|
|
35
39
|
str: string
|
|
36
|
-
) => string, priority?: number ) : boolean;
|
|
40
|
+
) => string, priority?: number ) : boolean;
|
|
37
41
|
|
|
38
42
|
rmvFilter ( name: string ) : boolean;
|
|
39
43
|
|
|
@@ -45,7 +49,9 @@ export declare class CmpStr {
|
|
|
45
49
|
|
|
46
50
|
setFlags( flags: string ) : void;
|
|
47
51
|
|
|
48
|
-
|
|
52
|
+
getFlags () : string;
|
|
53
|
+
|
|
54
|
+
normalize ( input: string|string[], flags?: string ) : string|string[];
|
|
49
55
|
|
|
50
56
|
clearCache () : boolean;
|
|
51
57
|
|
package/src/CmpStr.js
CHANGED
|
@@ -20,6 +20,12 @@
|
|
|
20
20
|
|
|
21
21
|
module.exports = class CmpStr {
|
|
22
22
|
|
|
23
|
+
/**
|
|
24
|
+
* --------------------------------------------------
|
|
25
|
+
* Global Variables
|
|
26
|
+
* --------------------------------------------------
|
|
27
|
+
*/
|
|
28
|
+
|
|
23
29
|
/**
|
|
24
30
|
* all pre-defined similarity algorithms
|
|
25
31
|
*
|
|
@@ -41,6 +47,15 @@ module.exports = class CmpStr {
|
|
|
41
47
|
soundex: './algorithms/soundex'
|
|
42
48
|
};
|
|
43
49
|
|
|
50
|
+
/**
|
|
51
|
+
* stores the names of loaded algorithms
|
|
52
|
+
*
|
|
53
|
+
* @since 2.0.2
|
|
54
|
+
* @private
|
|
55
|
+
* @type {Set<String>}
|
|
56
|
+
*/
|
|
57
|
+
#loadedAlgo = new Set ();
|
|
58
|
+
|
|
44
59
|
/**
|
|
45
60
|
* normalized strings cache
|
|
46
61
|
*
|
|
@@ -61,28 +76,43 @@ module.exports = class CmpStr {
|
|
|
61
76
|
* default normalization flags
|
|
62
77
|
* set by setFlags()
|
|
63
78
|
*
|
|
64
|
-
* @
|
|
79
|
+
* @private
|
|
80
|
+
* @type {String}
|
|
81
|
+
*/
|
|
82
|
+
#flags = '';
|
|
83
|
+
|
|
84
|
+
/**
|
|
85
|
+
* current algorithm to use for similarity calculations
|
|
86
|
+
* set by setAlgo(), addAlgo() or constructor()
|
|
87
|
+
*
|
|
88
|
+
* @private
|
|
65
89
|
* @type {String}
|
|
66
90
|
*/
|
|
67
|
-
|
|
91
|
+
#algo;
|
|
68
92
|
|
|
69
93
|
/**
|
|
70
94
|
* base string for comparison
|
|
71
95
|
* set by setStr or constructor()
|
|
72
96
|
*
|
|
73
|
-
* @
|
|
97
|
+
* @private
|
|
74
98
|
* @type {String}
|
|
75
99
|
*/
|
|
76
|
-
str;
|
|
100
|
+
#str;
|
|
77
101
|
|
|
78
102
|
/**
|
|
79
|
-
*
|
|
80
|
-
* set by setAlgo(), addAlgo() or constructor()
|
|
103
|
+
* stores the current ready state
|
|
81
104
|
*
|
|
82
|
-
* @
|
|
83
|
-
* @
|
|
105
|
+
* @since 2.0.2
|
|
106
|
+
* @private
|
|
107
|
+
* @type {Boolean}
|
|
108
|
+
*/
|
|
109
|
+
#readyState = false;
|
|
110
|
+
|
|
111
|
+
/**
|
|
112
|
+
* --------------------------------------------------
|
|
113
|
+
* Constructor
|
|
114
|
+
* --------------------------------------------------
|
|
84
115
|
*/
|
|
85
|
-
algo;
|
|
86
116
|
|
|
87
117
|
/**
|
|
88
118
|
* initializes a CmpStr instance
|
|
@@ -107,6 +137,12 @@ module.exports = class CmpStr {
|
|
|
107
137
|
|
|
108
138
|
};
|
|
109
139
|
|
|
140
|
+
/**
|
|
141
|
+
* --------------------------------------------------
|
|
142
|
+
* Ready State
|
|
143
|
+
* --------------------------------------------------
|
|
144
|
+
*/
|
|
145
|
+
|
|
110
146
|
/**
|
|
111
147
|
* checks whether string and algorithm are set correctly
|
|
112
148
|
*
|
|
@@ -114,11 +150,23 @@ module.exports = class CmpStr {
|
|
|
114
150
|
*/
|
|
115
151
|
isReady () {
|
|
116
152
|
|
|
117
|
-
return
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
121
|
-
|
|
153
|
+
return this.#readyState;
|
|
154
|
+
|
|
155
|
+
};
|
|
156
|
+
|
|
157
|
+
/**
|
|
158
|
+
* updates the readiness state
|
|
159
|
+
*
|
|
160
|
+
* @since 2.0.2
|
|
161
|
+
* @private
|
|
162
|
+
*/
|
|
163
|
+
#updateReadyState () {
|
|
164
|
+
|
|
165
|
+
this.#readyState = (
|
|
166
|
+
typeof this.#algo === 'string' &&
|
|
167
|
+
this.isAlgo( this.#algo ) &&
|
|
168
|
+
typeof this.#str === 'string' &&
|
|
169
|
+
this.#str.length !== 0
|
|
122
170
|
);
|
|
123
171
|
|
|
124
172
|
};
|
|
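The hunk above moves the readiness check into a cached private flag that is recomputed whenever the algorithm or the base string changes. A minimal sketch of that pattern follows, assuming a hypothetical `ReadyStateSketch` class with a placeholder registry; it is not the package's class and only mirrors the condition shown in `#updateReadyState()`.

```javascript
// Hypothetical class illustrating a cached ready state (assumption).
class ReadyStateSketch {

    // placeholder registry; the real class stores its algorithms differently
    #algorithms = new Map( [ [ 'levenshtein', () => 0 ] ] );

    #algo;
    #str;
    #readyState = false;

    // recompute the flag; called by every setter that can affect it
    #updateReadyState () {

        this.#readyState = (
            typeof this.#algo === 'string' &&
            this.#algorithms.has( this.#algo ) &&
            typeof this.#str === 'string' &&
            this.#str.length !== 0
        );

    }

    setStr ( str ) {

        this.#str = String( str );
        this.#updateReadyState();

        return true;

    }

    setAlgo ( algo ) {

        if ( !this.#algorithms.has( algo ) ) return false;

        this.#algo = algo;
        this.#updateReadyState();

        return true;

    }

    // cheap getter: no checks are repeated here
    isReady () {

        return this.#readyState;

    }

}

const readySketch = new ReadyStateSketch();

console.log( readySketch.isReady() ); // false

readySketch.setAlgo( 'levenshtein' );
readySketch.setStr( 'hello' );

console.log( readySketch.isReady() ); // true

readySketch.setStr( '' );

console.log( readySketch.isReady() ); // false
```

The trade-off is that every mutation path has to call the update method, which is why the diff also adds the call to `setStr()`, `setAlgo()` and the algorithm removal branch.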
@@ -126,12 +174,13 @@ module.exports = class CmpStr {
|
|
|
126
174
|
/**
|
|
127
175
|
* checks ready state and throws an error if not
|
|
128
176
|
*
|
|
177
|
+
* @private
|
|
129
178
|
* @returns {Boolean} true if ready
|
|
130
179
|
* @throws {Error} if CmpStr is not ready
|
|
131
180
|
*/
|
|
132
|
-
|
|
181
|
+
#checkReady () {
|
|
133
182
|
|
|
134
|
-
if ( !this
|
|
183
|
+
if ( !this.#readyState ) {
|
|
135
184
|
|
|
136
185
|
throw new Error(
|
|
137
186
|
`CmpStr instance is not ready. Ensure the algorithm and base string are set.`
|
|
@@ -143,6 +192,12 @@ module.exports = class CmpStr {
|
|
|
143
192
|
|
|
144
193
|
};
|
|
145
194
|
|
|
195
|
+
/**
|
|
196
|
+
* --------------------------------------------------
|
|
197
|
+
* Base String
|
|
198
|
+
* --------------------------------------------------
|
|
199
|
+
*/
|
|
200
|
+
|
|
146
201
|
/**
|
|
147
202
|
* sets the base string for comparison
|
|
148
203
|
*
|
|
@@ -151,12 +206,26 @@ module.exports = class CmpStr {
|
|
|
151
206
|
*/
|
|
152
207
|
setStr ( str ) {
|
|
153
208
|
|
|
154
|
-
this
|
|
209
|
+
this.#str = String ( str );
|
|
210
|
+
|
|
211
|
+
this.#updateReadyState();
|
|
155
212
|
|
|
156
213
|
return true;
|
|
157
214
|
|
|
158
215
|
};
|
|
159
216
|
|
|
217
|
+
/**
|
|
218
|
+
* gets the base string for comparison
|
|
219
|
+
*
|
|
220
|
+
* @since 2.0.2
|
|
221
|
+
* @returns {String} base string
|
|
222
|
+
*/
|
|
223
|
+
getStr () {
|
|
224
|
+
|
|
225
|
+
return this.#str;
|
|
226
|
+
|
|
227
|
+
};
|
|
228
|
+
|
|
160
229
|
/**
|
|
161
230
|
* --------------------------------------------------
|
|
162
231
|
* Algorithms
|
|
@@ -164,13 +233,16 @@ module.exports = class CmpStr {
|
|
|
164
233
|
*/
|
|
165
234
|
|
|
166
235
|
/**
|
|
167
|
-
* list all registered similarity algorithms
|
|
236
|
+
* list all registered or loaded similarity algorithms
|
|
168
237
|
*
|
|
238
|
+
* @param {Boolean} [loadedOnly=false] if true, only loaded algorithm names are returned
|
|
169
239
|
* @returns {String[]} array of algorithm names
|
|
170
240
|
*/
|
|
171
|
-
listAlgo () {
|
|
241
|
+
listAlgo ( loadedOnly = false ) {
|
|
172
242
|
|
|
173
|
-
return
|
|
243
|
+
return loadedOnly
|
|
244
|
+
? [ ...this.#loadedAlgo ]
|
|
245
|
+
: [ ...Object.keys( this.#algorithms ) ];
|
|
174
246
|
|
|
175
247
|
};
|
|
176
248
|
|
|
@@ -194,9 +266,11 @@ module.exports = class CmpStr {
|
|
|
194
266
|
*/
|
|
195
267
|
setAlgo ( algo ) {
|
|
196
268
|
|
|
197
|
-
if ( this
|
|
269
|
+
if ( this.#loadAlgo( algo ) ) {
|
|
198
270
|
|
|
199
|
-
this
|
|
271
|
+
this.#algo = algo;
|
|
272
|
+
|
|
273
|
+
this.#updateReadyState();
|
|
200
274
|
|
|
201
275
|
return true;
|
|
202
276
|
|
|
@@ -204,6 +278,18 @@ module.exports = class CmpStr {
|
|
|
204
278
|
|
|
205
279
|
};
|
|
206
280
|
|
|
281
|
+
/**
|
|
282
|
+
* gets the current algorithm to use for similarity calculations
|
|
283
|
+
*
|
|
284
|
+
* @since 2.0.2
|
|
285
|
+
* @returns {String} name of the algorithm
|
|
286
|
+
*/
|
|
287
|
+
getAlgo () {
|
|
288
|
+
|
|
289
|
+
return this.#algo;
|
|
290
|
+
|
|
291
|
+
};
|
|
292
|
+
|
|
207
293
|
/**
|
|
208
294
|
* adds a new similarity algorithm
|
|
209
295
|
*
|
|
@@ -255,11 +341,15 @@ module.exports = class CmpStr {
|
|
|
255
341
|
|
|
256
342
|
delete this.#algorithms[ algo ];
|
|
257
343
|
|
|
258
|
-
|
|
344
|
+
this.#loadedAlgo.delete( algo );
|
|
345
|
+
|
|
346
|
+
if ( this.#algo === algo ) {
|
|
259
347
|
|
|
260
348
|
/* reset current algorithm if it was removed */
|
|
261
349
|
|
|
262
|
-
this
|
|
350
|
+
this.#algo = undefined;
|
|
351
|
+
|
|
352
|
+
this.#updateReadyState();
|
|
263
353
|
|
|
264
354
|
}
|
|
265
355
|
|
|
@@ -278,18 +368,25 @@ module.exports = class CmpStr {
|
|
|
278
368
|
/**
|
|
279
369
|
* lazy-loads the specified algorithm module
|
|
280
370
|
*
|
|
371
|
+
* @private
|
|
281
372
|
* @param {String} algo name of the similarity algorithm
|
|
282
373
|
* @returns {Boolean} true if the algorithm is loaded
|
|
283
374
|
* @throws {Error} if the algorithm cannot be loaded or is not defined
|
|
284
375
|
*/
|
|
285
|
-
|
|
376
|
+
#loadAlgo ( algo ) {
|
|
286
377
|
|
|
287
|
-
if ( this.
|
|
378
|
+
if ( this.#loadedAlgo.has( algo ) ) {
|
|
379
|
+
|
|
380
|
+
return true;
|
|
381
|
+
|
|
382
|
+
} else if ( this.isAlgo( algo ) ) {
|
|
288
383
|
|
|
289
384
|
let typeOf = typeof this.#algorithms[ algo ];
|
|
290
385
|
|
|
291
386
|
if ( typeOf === 'function' ) {
|
|
292
387
|
|
|
388
|
+
this.#loadedAlgo.add( algo );
|
|
389
|
+
|
|
293
390
|
return true;
|
|
294
391
|
|
|
295
392
|
} else if ( typeOf === 'string' ) {
|
|
@@ -302,6 +399,8 @@ module.exports = class CmpStr {
|
|
|
302
399
|
this.#algorithms[ algo ]
|
|
303
400
|
);
|
|
304
401
|
|
|
402
|
+
this.#loadedAlgo.add( algo );
|
|
403
|
+
|
|
305
404
|
return true;
|
|
306
405
|
|
|
307
406
|
} catch ( err ) {
|
|
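The changes to `#loadAlgo()` above track which algorithms have actually been loaded in a `Set`, separate from the registry of all known names. The following sketch shows the idea under stated assumptions: `LazyRegistrySketch` and `fakeModules` are hypothetical, and a lookup table stands in for the `require()` call on a module path so the example stays self-contained.

```javascript
// Stand-in for module files the real class would resolve from a path (assumption).
const fakeModules = {
    './algorithms/equal': ( a, b ) => ( a === b ? 1 : 0 )
};

// Hypothetical registry that distinguishes registered from loaded algorithms.
class LazyRegistrySketch {

    #algorithms = {
        equal: './algorithms/equal', // registered by path, not loaded yet
        always: () => 1              // custom function, usable as-is
    };

    #loadedAlgo = new Set();

    isAlgo ( algo ) {

        return Object.prototype.hasOwnProperty.call( this.#algorithms, algo );

    }

    listAlgo ( loadedOnly = false ) {

        return loadedOnly
            ? [ ...this.#loadedAlgo ]
            : Object.keys( this.#algorithms );

    }

    load ( algo ) {

        // fast path: nothing to do for already loaded algorithms
        if ( this.#loadedAlgo.has( algo ) ) return true;

        if ( !this.isAlgo( algo ) ) {

            throw new Error( `${algo} is not defined` );

        }

        const entry = this.#algorithms[ algo ];

        if ( typeof entry === 'string' ) {

            // the package would load the module here; the sketch uses a table
            this.#algorithms[ algo ] = fakeModules[ entry ];

        }

        this.#loadedAlgo.add( algo );

        return true;

    }

}

const registry = new LazyRegistrySketch();

console.log( registry.listAlgo() );       // [ 'equal', 'always' ]
console.log( registry.listAlgo( true ) ); // []

registry.load( 'equal' );

console.log( registry.listAlgo( true ) ); // [ 'equal' ]
```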
@@ -338,13 +437,18 @@ module.exports = class CmpStr {
|
|
|
338
437
|
*/
|
|
339
438
|
|
|
340
439
|
/**
|
|
341
|
-
* list all added
|
|
440
|
+
* list all added or active filter names
|
|
342
441
|
*
|
|
442
|
+
* @param {Boolean} [activeOnly=false] if true, only names of active filters are returned
|
|
343
443
|
* @returns {String[]} array of filter names
|
|
344
444
|
*/
|
|
345
|
-
listFilter () {
|
|
445
|
+
listFilter ( activeOnly = false ) {
|
|
346
446
|
|
|
347
|
-
return
|
|
447
|
+
return activeOnly
|
|
448
|
+
? Array.from( this.#filter.entries() )
|
|
449
|
+
.filter( ( [ _, filter ] ) => filter.active )
|
|
450
|
+
.map( ( [ name ] ) => name )
|
|
451
|
+
: [ ...this.#filter.keys() ];
|
|
348
452
|
|
|
349
453
|
};
|
|
350
454
|
|
|
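The new `listFilter( activeOnly )` body above filters the entries of a `Map` by an `active` property. A short sketch of that behavior follows, assuming a hypothetical `FilterListSketch` class; the entry shape `{ callback, priority, active }` is an assumption, since only the `active` property is visible in this diff.

```javascript
// Hypothetical filter store; only the `active` property is confirmed by the diff.
class FilterListSketch {

    #filter = new Map();

    addFilter ( name, callback, priority = 10 ) {

        this.#filter.set( name, { callback, priority, active: true } );

        return true;

    }

    pauseFilter ( name ) {

        if ( !this.#filter.has( name ) ) return false;

        this.#filter.get( name ).active = false;

        return true;

    }

    listFilter ( activeOnly = false ) {

        return activeOnly
            ? Array.from( this.#filter.entries() )
                .filter( ( [ , filter ] ) => filter.active )
                .map( ( [ name ] ) => name )
            : [ ...this.#filter.keys() ];

    }

}

const filters = new FilterListSketch();

filters.addFilter( 'prefix', ( str ) => str.replace( /^pre/, '' ) );
filters.addFilter( 'dashes', ( str ) => str.replace( /-/g, '' ) );
filters.pauseFilter( 'prefix' );

console.log( filters.listFilter() );       // [ 'prefix', 'dashes' ]
console.log( filters.listFilter( true ) ); // [ 'dashes' ]
```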
@@ -482,11 +586,12 @@ module.exports = class CmpStr {
|
|
|
482
586
|
/**
|
|
483
587
|
* applies all active filters to a string
|
|
484
588
|
*
|
|
589
|
+
* @private
|
|
485
590
|
* @param {String} str string to process
|
|
486
591
|
* @returns {String} filtered string
|
|
487
592
|
* @throws {Error} if applying filters causes an error
|
|
488
593
|
*/
|
|
489
|
-
|
|
594
|
+
#applyFilters ( str ) {
|
|
490
595
|
|
|
491
596
|
try {
|
|
492
597
|
|
|
@@ -524,7 +629,19 @@ module.exports = class CmpStr {
|
|
|
524
629
|
*/
|
|
525
630
|
setFlags ( flags = '' ) {
|
|
526
631
|
|
|
527
|
-
this
|
|
632
|
+
this.#flags = String ( flags );
|
|
633
|
+
|
|
634
|
+
};
|
|
635
|
+
|
|
636
|
+
/**
|
|
637
|
+
* get default normalization flags
|
|
638
|
+
*
|
|
639
|
+
* @since 2.0.2
|
|
640
|
+
* @returns {String} normalization flags
|
|
641
|
+
*/
|
|
642
|
+
getFlags () {
|
|
643
|
+
|
|
644
|
+
return this.#flags;
|
|
528
645
|
|
|
529
646
|
};
|
|
530
647
|
|
|
@@ -544,57 +661,73 @@ module.exports = class CmpStr {
|
|
|
544
661
|
* d :: decompose unicode
|
|
545
662
|
* u :: normalize unicode
|
|
546
663
|
*
|
|
547
|
-
* @param {String} string string to normalize
|
|
664
|
+
* @param {String|String[]} string string(s) to normalize
|
|
548
665
|
* @param {String} [flags=''] normalization flags
|
|
549
|
-
* @returns {String} normalized string
|
|
666
|
+
* @returns {String|String[]} normalized string(s)
|
|
550
667
|
* @throws {Error} if normalization causes an error
|
|
551
668
|
*/
|
|
552
|
-
normalize (
|
|
669
|
+
normalize ( input, flags = '' ) {
|
|
553
670
|
|
|
554
|
-
|
|
671
|
+
const processStr = ( str ) => {
|
|
555
672
|
|
|
556
|
-
|
|
673
|
+
let res = String ( str );
|
|
557
674
|
|
|
558
|
-
|
|
675
|
+
/* use normalized string from cache to increase performance */
|
|
559
676
|
|
|
560
|
-
|
|
677
|
+
let key = `${res}::${flags}`;
|
|
561
678
|
|
|
562
|
-
|
|
679
|
+
if ( this.#cache.has( key ) ) {
|
|
563
680
|
|
|
564
|
-
|
|
681
|
+
return this.#cache.get( key );
|
|
565
682
|
|
|
566
|
-
|
|
683
|
+
}
|
|
567
684
|
|
|
568
|
-
|
|
685
|
+
/* apply custom filters */
|
|
569
686
|
|
|
570
|
-
|
|
687
|
+
res = this.#applyFilters( res );
|
|
571
688
|
|
|
572
|
-
|
|
689
|
+
/* normalize using flags */
|
|
573
690
|
|
|
574
|
-
|
|
575
|
-
if ( flags.includes( 'w' ) ) res = res.replace( /\s+/g, ' ' );
|
|
576
|
-
if ( flags.includes( 'r' ) ) res = res.replace( /(.)\1+/g, '$1' );
|
|
577
|
-
if ( flags.includes( 'k' ) ) res = res.replace( /[^a-z]/gi, '' );
|
|
578
|
-
if ( flags.includes( 'n' ) ) res = res.replace( /[0-9]/g, '' );
|
|
579
|
-
if ( flags.includes( 't' ) ) res = res.trim();
|
|
580
|
-
if ( flags.includes( 'i' ) ) res = res.toLowerCase();
|
|
581
|
-
if ( flags.includes( 'd' ) ) res = res.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
|
|
582
|
-
if ( flags.includes( 'u' ) ) res = res.normalize( 'NFC' );
|
|
691
|
+
try {
|
|
583
692
|
|
|
584
|
-
|
|
693
|
+
if ( flags.includes( 's' ) ) res = res.replace( /[^a-z0-9]/gi, '' );
|
|
694
|
+
if ( flags.includes( 'w' ) ) res = res.replace( /\s+/g, ' ' );
|
|
695
|
+
if ( flags.includes( 'r' ) ) res = res.replace( /(.)\1+/g, '$1' );
|
|
696
|
+
if ( flags.includes( 'k' ) ) res = res.replace( /[^a-z]/gi, '' );
|
|
697
|
+
if ( flags.includes( 'n' ) ) res = res.replace( /[0-9]/g, '' );
|
|
698
|
+
if ( flags.includes( 't' ) ) res = res.trim();
|
|
699
|
+
if ( flags.includes( 'i' ) ) res = res.toLowerCase();
|
|
700
|
+
if ( flags.includes( 'd' ) ) res = res.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
|
|
701
|
+
if ( flags.includes( 'u' ) ) res = res.normalize( 'NFC' );
|
|
585
702
|
|
|
586
|
-
|
|
587
|
-
|
|
588
|
-
|
|
589
|
-
|
|
703
|
+
} catch ( err ) {
|
|
704
|
+
|
|
705
|
+
throw new Error (
|
|
706
|
+
`Error while normalization.`,
|
|
707
|
+
{ cause: err }
|
|
708
|
+
);
|
|
709
|
+
|
|
710
|
+
}
|
|
711
|
+
|
|
712
|
+
/* store the normalized string in the cache */
|
|
713
|
+
|
|
714
|
+
this.#cache.set( key, res );
|
|
715
|
+
|
|
716
|
+
return res;
|
|
590
717
|
|
|
591
718
|
}
|
|
592
719
|
|
|
593
|
-
/*
|
|
720
|
+
/* processing multiple strings */
|
|
721
|
+
|
|
722
|
+
if ( Array.isArray( input ) ) {
|
|
594
723
|
|
|
595
|
-
|
|
724
|
+
return input.map(
|
|
725
|
+
( str ) => processStr( str )
|
|
726
|
+
);
|
|
727
|
+
|
|
728
|
+
}
|
|
596
729
|
|
|
597
|
-
return
|
|
730
|
+
return processStr( input );
|
|
598
731
|
|
|
599
732
|
};
|
|
600
733
|
|
|
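The rewritten `normalize()` above looks up each string in a cache keyed by the input and the flags before doing any work. The sketch below isolates that memoization step. `cachedNormalize`, `sketchCache` and the `computed` counter are hypothetical names, and only two flags are implemented to keep the example short.

```javascript
// Hypothetical memoized normalizer; mirrors the `${res}::${flags}` cache key.
const sketchCache = new Map();

let computed = 0; // counts cache misses, for demonstration only

function cachedNormalize ( str, flags = '' ) {

    const res = String( str );
    const key = `${res}::${flags}`;

    // return the stored result if this input/flags pair was seen before
    if ( sketchCache.has( key ) ) {

        return sketchCache.get( key );

    }

    computed++;

    let out = res;

    if ( flags.includes( 't' ) ) out = out.trim();
    if ( flags.includes( 'i' ) ) out = out.toLowerCase();

    sketchCache.set( key, out );

    return out;

}

cachedNormalize( ' Hello ', 'ti' ); // miss, computed
cachedNormalize( ' Hello ', 'ti' ); // hit, served from the cache
cachedNormalize( ' Hello ', 't' );  // different flags, separate entry

console.log( computed, sketchCache.size ); // 2 2
```

Since the key uses the flag string verbatim, `'ti'` and `'it'` produce the same output but occupy two cache entries in this sketch.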
@@ -629,7 +762,7 @@ module.exports = class CmpStr {
|
|
|
629
762
|
*/
|
|
630
763
|
compare ( algo, a, b, config = {} ) {
|
|
631
764
|
|
|
632
|
-
if ( this
|
|
765
|
+
if ( this.#loadAlgo( algo ) ) {
|
|
633
766
|
|
|
634
767
|
/* handle trivial cases */
|
|
635
768
|
|
|
@@ -639,7 +772,7 @@ module.exports = class CmpStr {
|
|
|
639
772
|
/* apply similarity algorithm */
|
|
640
773
|
|
|
641
774
|
const {
|
|
642
|
-
flags = this
|
|
775
|
+
flags = this.#flags,
|
|
643
776
|
options = {}
|
|
644
777
|
} = config;
|
|
645
778
|
|
|
@@ -674,11 +807,11 @@ module.exports = class CmpStr {
|
|
|
674
807
|
*/
|
|
675
808
|
test ( str, config = {} ) {
|
|
676
809
|
|
|
677
|
-
if ( this
|
|
810
|
+
if ( this.#checkReady() ) {
|
|
678
811
|
|
|
679
812
|
return this.compare(
|
|
680
|
-
this
|
|
681
|
-
this
|
|
813
|
+
this.#algo,
|
|
814
|
+
this.#str, str,
|
|
682
815
|
config
|
|
683
816
|
);
|
|
684
817
|
|
|
@@ -695,13 +828,13 @@ module.exports = class CmpStr {
|
|
|
695
828
|
*/
|
|
696
829
|
batchTest ( arr, config = {} ) {
|
|
697
830
|
|
|
698
|
-
if ( this
|
|
831
|
+
if ( this.#checkReady() ) {
|
|
699
832
|
|
|
700
833
|
return [ ...arr ].map( ( str ) => ( {
|
|
701
834
|
target: str,
|
|
702
835
|
match: this.compare(
|
|
703
|
-
this
|
|
704
|
-
this
|
|
836
|
+
this.#algo,
|
|
837
|
+
this.#str, str,
|
|
705
838
|
config
|
|
706
839
|
)
|
|
707
840
|
} ) );
|
|
@@ -763,7 +896,7 @@ module.exports = class CmpStr {
|
|
|
763
896
|
*/
|
|
764
897
|
similarityMatrix ( algo, arr, config = {} ) {
|
|
765
898
|
|
|
766
|
-
if ( this
|
|
899
|
+
if ( this.#loadAlgo( algo ) ) {
|
|
767
900
|
|
|
768
901
|
delete config?.options?.raw;
|
|
769
902
|
|
package/src/CmpStrAsync.d.ts
CHANGED
|
@@ -2,6 +2,8 @@ import { CmpStr, Config, BatchResult } from './CmpStr';
|
|
|
2
2
|
|
|
3
3
|
export declare class CmpStrAsync extends CmpStr {
|
|
4
4
|
|
|
5
|
+
normalizeAsync ( input: string|string[], flags?: string ) : Promise<string|string[]>;
|
|
6
|
+
|
|
5
7
|
compareAsync ( algo: string, a: string, b: string, config?: Config ) : Promise<number | any>;
|
|
6
8
|
|
|
7
9
|
testAsync ( str: string, config?: Config ) : Promise<number | any>;
|
|
@@ -14,4 +16,4 @@ export declare class CmpStrAsync extends CmpStr {
|
|
|
14
16
|
|
|
15
17
|
similarityMatrixAsync ( algo: string, arr: string[], config?: Config ) : Promise<number[][]>;
|
|
16
18
|
|
|
17
|
-
}
|
|
19
|
+
}
|
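The declaration above types `normalizeAsync` as returning a `Promise`, and the `src/CmpStrAsync.js` changes route it through a private generic wrapper whose body is not part of this diff. One plausible shape for such a wrapper is sketched here. `asyncWrapperSketch` is a hypothetical function, and deferring with `setTimeout` is an assumption rather than the package's confirmed approach.

```javascript
// Hypothetical wrapper (assumption): runs a synchronous method in a later
// macrotask and settles a Promise with its return value or thrown error.
function asyncWrapperSketch ( method, thisArg, ...args ) {

    return new Promise( ( resolve, reject ) => {

        setTimeout( () => {

            try {

                // call with an explicit `this`, since the method is passed unbound
                resolve( method.apply( thisArg, args ) );

            } catch ( err ) {

                reject( err );

            }

        }, 0 );

    } );

}

// stand-in object with a synchronous method that relies on `this`
const shouter = {
    suffix: '!',
    shout ( str ) { return String( str ).toUpperCase() + this.suffix; }
};

const pending = asyncWrapperSketch( shouter.shout, shouter, 'hello' );

console.log( pending instanceof Promise ); // true

pending.then( ( res ) => console.log( res ) ); // HELLO!
```

Passing the instance explicitly matters because a method handed over as a bare function reference loses its `this` binding.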
package/src/CmpStrAsync.js
CHANGED
|
@@ -40,9 +40,9 @@ module.exports = class CmpStrAsync extends CmpStr {
|
|
|
40
40
|
};
|
|
41
41
|
|
|
42
42
|
/**
|
|
43
|
-
* @private
|
|
44
43
|
* generic async wrapper for methods
|
|
45
44
|
*
|
|
45
|
+
* @private
|
|
46
46
|
* @param {Function} method method to call
|
|
47
47
|
* @param {...any} args arguments to pass to the method
|
|
48
48
|
* @returns {Promise} Promise resolving the result of the method
|
|
@@ -76,9 +76,25 @@ module.exports = class CmpStrAsync extends CmpStr {
|
|
|
76
76
|
*/
|
|
77
77
|
|
|
78
78
|
/**
|
|
79
|
-
*
|
|
79
|
+
* normalizes a string by chainable options; uses cache to increase
|
|
80
|
+
* performance and custom filters for advanced behavior
|
|
80
81
|
*
|
|
81
|
-
* @
|
|
82
|
+
* @since 2.0.2
|
|
83
|
+
* @param {String|String[]} input string(s) to normalize
|
|
84
|
+
* @param {String} [flags=''] normalization flags
|
|
85
|
+
* @returns {Promise} Promise resolving string normalization
|
|
86
|
+
*/
|
|
87
|
+
normalizeAsync ( input, flags = '' ) {
|
|
88
|
+
|
|
89
|
+
return this.#asyncWrapper(
|
|
90
|
+
this.normalize,
|
|
91
|
+
input, flags
|
|
92
|
+
);
|
|
93
|
+
|
|
94
|
+
};
|
|
95
|
+
|
|
96
|
+
/**
|
|
97
|
+
* compares two strings a and b using the passed algorithm
|
|
82
98
|
*
|
|
83
99
|
* @param {String} algo name of the algorithm
|
|
84
100
|
* @param {String} a string a
|
|
@@ -99,8 +115,6 @@ module.exports = class CmpStrAsync extends CmpStr {
|
|
|
99
115
|
* tests the similarity between the base string and a target string
|
|
100
116
|
* using the current algorithm
|
|
101
117
|
*
|
|
102
|
-
* @async
|
|
103
|
-
*
|
|
104
118
|
* @param {String} str target string
|
|
105
119
|
* @param {Object} [config={}] config (flags, args)
|
|
106
120
|
* @returns {Promise} Promise resolving similarity to base string
|
|
@@ -117,8 +131,6 @@ module.exports = class CmpStrAsync extends CmpStr {
|
|
|
117
131
|
/**
|
|
118
132
|
* tests the similarity of multiple strings against the base string
|
|
119
133
|
*
|
|
120
|
-
* @async
|
|
121
|
-
*
|
|
122
134
|
* @param {String[]} arr array of strings
|
|
123
135
|
* @param {Object} [config={}] config (flags, args)
|
|
124
136
|
* @returns {Promise} Promise resolving an array of objects, each containing target string and similarity score
|
|
@@ -136,13 +148,11 @@ module.exports = class CmpStrAsync extends CmpStr {
|
|
|
136
148
|
* finds strings in an array that exceed a similarity threshold
|
|
137
149
|
* returns the array sorted by highest similarity
|
|
138
150
|
*
|
|
139
|
-
* @async
|
|
140
|
-
*
|
|
141
151
|
* @param {String[]} arr array of strings
|
|
142
152
|
* @param {Object} [config={}] config (flags, threshold, args)
|
|
143
153
|
* @returns {Promise} Promise resolving an array of objects, sorted by highest similarity
|
|
144
154
|
*/
|
|
145
|
-
|
|
155
|
+
matchAsync ( arr, config = {} ) {
|
|
146
156
|
|
|
147
157
|
return this.#asyncWrapper(
|
|
148
158
|
this.match,
|
|
@@ -154,13 +164,11 @@ module.exports = class CmpStrAsync extends CmpStr {
|
|
|
154
164
|
/**
|
|
155
165
|
* finds the closest matching string from an array
|
|
156
166
|
*
|
|
157
|
-
* @async
|
|
158
|
-
*
|
|
159
167
|
* @param {String[]} arr array of strings
|
|
160
168
|
* @param {Object} [config={}] config (flags, args)
|
|
161
169
|
* @returns {Promise} Promise resolving the closest matching string
|
|
162
170
|
*/
|
|
163
|
-
|
|
171
|
+
closestAsync ( arr, config = {} ) {
|
|
164
172
|
|
|
165
173
|
return this.#asyncWrapper(
|
|
166
174
|
this.closest,
|
|
@@ -172,8 +180,6 @@ module.exports = class CmpStrAsync extends CmpStr {
|
|
|
172
180
|
/**
|
|
173
181
|
* generate a similarity matrix for an array of strings
|
|
174
182
|
*
|
|
175
|
-
* @async
|
|
176
|
-
*
|
|
177
183
|
* @param {String} algo name of the algorithm
|
|
178
184
|
* @param {String[]} arr array of strings to cross-compare
|
|
179
185
|
* @param {Object} [config={}] config (flags, args)
|