cmpstr 2.0.1 → 2.0.2
- package/README.md +80 -65
- package/package.json +1 -1
- package/src/CmpStr.d.ts +8 -2
- package/src/CmpStr.js +194 -66
- package/src/CmpStrAsync.d.ts +2 -0
- package/src/CmpStrAsync.js +20 -7
- package/src/index.js +1 -1
package/README.md
CHANGED
@@ -10,7 +10,7 @@ CmpStr is a lightweight and powerful npm package for calculating string similari
 - Customizable normalization with global flags and caching.
 - Asynchronous support for non-blocking workflows.
 - Extensible with custom algorithms and filters.
-- TypeScript
+- TypeScript declarations for better developer experience.
 
 ## Installation
 
@@ -62,15 +62,19 @@ Sets the base string for comparison.
 
 Parameters:
 
-
+`<String> str` – string to set as the base
+
+#### `getStr()`
+
+Gets the base string for comparison.
 
 #### `setFlags( [ flags = '' ] )`
 
 Set default normalization flags. They will be overwritten by passing `flags` through the configuration object. See description of available flags / normalization options below in the documentation.
 
-
+#### `getFlags()`
 
-
+Gets the default normalization flags.
 
 #### `clearCache()`
 
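The getters added in this hunk pair up with the existing setters. A minimal sketch of that setter/getter pattern follows; the class name `BaseState` is hypothetical and the code is not the cmpstr source, it only assumes that values are coerced with `String()` as the `CmpStr.js` changes in this diff suggest.

```javascript
// Hypothetical stand-in for the setter/getter pairs documented above;
// this is an illustration, not the cmpstr implementation.
class BaseState {

    #str;          // base string, undefined until setStr() is called
    #flags = '';   // default normalization flags

    setStr ( str ) { this.#str = String( str ); return true; }
    getStr () { return this.#str; }

    setFlags ( flags = '' ) { this.#flags = String( flags ); }
    getFlags () { return this.#flags; }

}

const state = new BaseState();
state.setStr( 42 );        // non-string input is coerced to a string
state.setFlags( 'ti' );

console.log( state.getStr() );    // 42 (as a string)
console.log( state.getFlags() );  // ti
```

Because the fields are private, the getters are the only way for outside code to read the current state.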
@@ -78,17 +82,21 @@ Clears the normalization cache.
 
 ### Algorithms
 
-#### `listAlgo()`
+#### `listAlgo( [ loadedOnly = false ] )`
 
 List all registered similarity algorithms.
 
+Parameters:
+
+`<Boolean> loadedOnly` – if true, only loaded algorithm names are returned
+
 #### `isAlgo( algo )`
 
 Checks if an algorithm is registered. Returns `true` if so, `false` otherwise.
 
 Parameters:
 
-
+`<String> algo` – name of the algorithm
 
 #### `setAlgo( algo )`
 
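The `loadedOnly` switch distinguishes algorithms that are merely registered from those whose implementation has already been resolved. The sketch below illustrates that distinction under stated assumptions: `AlgoRegistry`, `register()` and `load()` are hypothetical names, and a factory function stands in for the module path that the package lazy-loads.

```javascript
// Hypothetical registry; names and factories are illustrative only.
class AlgoRegistry {

    #algorithms = new Map();   // every registered name -> entry
    #loaded = new Set();       // names whose implementation has been resolved

    register ( name, factory ) {
        this.#algorithms.set( name, { factory, fn: null } );
    }

    load ( name ) {
        if ( this.#loaded.has( name ) ) return true;   // already resolved
        const entry = this.#algorithms.get( name );
        if ( !entry ) throw new Error( `${name} is not registered.` );
        entry.fn = entry.factory();   // deferred work happens once, here
        this.#loaded.add( name );
        return true;
    }

    listAlgo ( loadedOnly = false ) {
        return loadedOnly
            ? [ ...this.#loaded ]
            : [ ...this.#algorithms.keys() ];
    }

}

const registry = new AlgoRegistry();
registry.register( 'exact', () => ( a, b ) => ( a === b ? 1 : 0 ) );
registry.register( 'sameLength', () => ( a, b ) => ( a.length === b.length ? 1 : 0 ) );
registry.load( 'exact' );

console.log( registry.listAlgo() );        // [ 'exact', 'sameLength' ]
console.log( registry.listAlgo( true ) );  // [ 'exact' ]
```

Tracking loaded names in a separate `Set` keeps the lookup cheap and leaves the registry itself untouched.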
@@ -98,7 +106,11 @@ Allowed options for built-in algorithms are `cosine`, `damerau`, `dice`, `hammi
 
 Parameters:
 
-
+`<String> algo` – name of the algorithm
+
+#### `getAlgo()`
+
+Gets the current algorithm to use for similarity calculations.
 
 #### `addAlgo( algo, callback [, useIt = true ] )`
 
@@ -106,9 +118,9 @@ Adding a new similarity algorithm by using the `addAlgo()` method passing the na
 
 Parameters:
 
-
-
-
+`<String> algo` – name of the algorithm
+`<Function> callback` – callback function implementing the algorithm
+`<Boolean> useIt` – whether to set this algorithm as the current one
 
 Example:
 
@@ -129,7 +141,7 @@ Removing a registered similarity algorithm.
 
 Parameters:
 
-
+`<String> algo` – name of the algorithm
 
 ### Filters
 
@@ -143,9 +155,9 @@ Adds a custom normalization filter. Needs to be passed a unique name and callbac
 
 Parameters:
 
-
-
-
+`<String> name` – filter name
+`<Function> callback` – callback function implementing the filter
+`<Int> priority` – priority of the filter
 
 Example:
 
@@ -161,7 +173,7 @@ Removes a custom normalization filter.
 
 Parameters:
 
-
+`<String> name` – filter name
 
 #### `pauseFilter( name )`
 
@@ -169,7 +181,7 @@ Pauses a custom normalization filter.
 
 Parameters:
 
-
+`<String> name` – filter name
 
 #### `resumeFilter( name )`
 
@@ -177,7 +189,7 @@ Resumes a custom normalization filter.
 
 Parameters:
 
-
+`<String> name` – filter name
 
 #### `clearFilter( name )`
 
@@ -191,10 +203,10 @@ Compares two strings using the specified algorithm. The method returns either th
 
 Parameters:
 
-
-
-
-
+`<String> algo` – name of the algorithm
+`<String> a` – first string
+`<String> b` – second string
+`<Object> config` – configuration object
 
 Example:
 
@@ -211,8 +223,8 @@ Tests the similarity between the base string and a given target string. Returns
 
 Parameters:
 
-
-
+`<String> str` – target string
+`<Object> config` – configuration object
 
 Example:
 
@@ -229,8 +241,8 @@ Tests the similarity of multiple strings against the base string. Returns an arr
 
 Parameters:
 
-
-
+`<String[]> arr` – array of strings
+`<Object> config` – configuration object
 
 Example:
 
@@ -247,8 +259,8 @@ Finds strings in an array that exceed a similarity threshold and sorts them by h
 
 Parameters:
 
-
-
+`<String[]> arr` – array of strings
+`<Object> config` – configuration object
 
 Example:
 
@@ -267,8 +279,8 @@ Finds the closest matching string from an array and returns them.
 
 Parameters:
 
-
-
+`<String[]> arr` – array of strings
+`<Object> config` – configuration object
 
 Example:
 
@@ -285,9 +297,9 @@ Generates a similarity matrix for an array of strings. Returns a 2D array that
 
 Parameters:
 
-
-
-
+`<String> algo` – name of the algorithm
+`<String[]> arr` – array of strings
+`<Object> config` – configuration object
 
 Example:
 
@@ -308,24 +320,24 @@ The `CmpStr` package allows strings to be normalized before the similarity compa
 
 #### Supported Flags
 
-
-
-
-
-
-
-
-
-
+`s` – remove special chars
+`w` – collapse whitespaces
+`r` – remove repeated chars
+`k` – keep only letters
+`n` – ignore numbers
+`t` – trim whitespaces
+`i` – case insensitivity
+`d` – decompose unicode
+`u` – normalize unicode
 
-#### `normalize(
+#### `normalize( input [, flags = '' ] )`
 
 The method for normalizing strings can also be called on its own, without comparing the similarity of two strings. This also applies all filters and reads or writes to the cache. This can be helpful if certain strings should be saved beforehand or if different normalization options are to be tested.
 
 Parameters:
 
-
-
+`<String|String[]> input` – single string or array of strings to normalize
+`<String> flags` – normalization flags
 
 Example:
 
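The flag list and the new array input can be tried out with a short sketch. It assumes the regular expressions shown in the `CmpStr.js` hunk later in this diff and leaves out filters and caching; `STEPS` and `normalizeSketch` are hypothetical names, not part of the package.

```javascript
// Sketch of flag-driven normalization (no custom filters, no cache).
// Each entry pairs a flag with the transformation it switches on.
const STEPS = [
    [ 's', ( s ) => s.replace( /[^a-z0-9]/gi, '' ) ],   // remove special chars
    [ 'w', ( s ) => s.replace( /\s+/g, ' ' ) ],         // collapse whitespaces
    [ 'r', ( s ) => s.replace( /(.)\1+/g, '$1' ) ],     // remove repeated chars
    [ 'k', ( s ) => s.replace( /[^a-z]/gi, '' ) ],      // keep only letters
    [ 'n', ( s ) => s.replace( /[0-9]/g, '' ) ],        // ignore numbers
    [ 't', ( s ) => s.trim() ],                         // trim whitespaces
    [ 'i', ( s ) => s.toLowerCase() ],                  // case insensitivity
    [ 'd', ( s ) => s.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' ) ],
    [ 'u', ( s ) => s.normalize( 'NFC' ) ]
];

function normalizeSketch ( input, flags = '' ) {

    // steps run in the fixed order above, not in the order of the flag string
    const one = ( str ) => STEPS.reduce(
        ( res, [ flag, step ] ) => ( flags.includes( flag ) ? step( res ) : res ),
        String( str )
    );

    return Array.isArray( input ) ? input.map( one ) : one( input );

}

console.log( normalizeSketch( ' he123LLo ', 'nti' ) );
// hello

console.log( normalizeSketch( [ 'Hello World!', 'CmpStr 123' ], 'nwti' ) );
// [ 'hello world!', 'cmpstr' ]
```

Because the steps are applied in a fixed order, `'nti'` and `'itn'` produce the same result in this sketch.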
@@ -334,6 +346,9 @@ const cmp = new CmpStr();
 
 console.log( cmp.normalize( ' he123LLo ', 'nti' ) );
 // Output: hello
+
+console.log( cmp.normalize( [ 'Hello World!', 'CmpStr 123' ], 'nwti' ) );
+// Output: [ 'hello world!', 'cmpstr' ]
 ```
 
 ### Configuration Object
@@ -344,9 +359,9 @@ It also contains `options` as an object of key-value pairs that are passed to th
 
 Global config options:
 
-
-
-
+`<String> flags` – normalization flags
+`<Number> threshold` – similarity threshold between 0 and 1
+`<Object> options` – options passed to the algorithm
 
 Example:
 
@@ -369,9 +384,9 @@ console.log( cmp.match( [
 
 ## Asynchronous Support
 
-The `CmpStrAsync` class provides asynchronous
+The `CmpStrAsync` class provides an asynchronous wrapper for all comparison methods as well as the string normalization function. It is ideal for large datasets or non-blocking workflows.
 
-The asynchronous class supports the methods `compareAsync`, `testAsync`, `batchTestAsync`, `matchAsync`, `closestAsync` and `similarityMatrixAsync`. Each of these methods returns a `Promise`.
+The asynchronous class supports the methods `normalizeAsync`, `compareAsync`, `testAsync`, `batchTestAsync`, `matchAsync`, `closestAsync` and `similarityMatrixAsync`. Each of these methods returns a `Promise`.
 
 For options, arguments and returned values, see the documentation above.
 
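How a synchronous method can be exposed as a `Promise`-returning one is sketched below. The package's private `#asyncWrapper` is not shown in this diff, so the use of `setImmediate` (a Node.js API) and the helper names `asyncWrapper`, `normalizeSync` and `normalizeAsyncSketch` are assumptions for illustration only.

```javascript
// Assumed shape of a generic async wrapper; details are guesses, not the
// package's implementation.
function asyncWrapper ( method, ...args ) {

    return new Promise( ( resolve, reject ) => {

        // defer the synchronous work to a later event loop iteration
        setImmediate( () => {
            try {
                resolve( method( ...args ) );
            } catch ( err ) {
                reject( err );
            }
        } );

    } );

}

// stand-in for a synchronous normalize(): trims and lower-cases only
const normalizeSync = ( input ) => String( input ).trim().toLowerCase();

const normalizeAsyncSketch = ( input ) => asyncWrapper( normalizeSync, input );

normalizeAsyncSketch( '  HeLLo ' ).then( ( res ) => console.log( res ) );
// hello
```

Note that deferring work this way only yields to the event loop before the call starts; the wrapped method itself still runs synchronously once it is scheduled.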
@@ -399,7 +414,7 @@ The Levenshtein distance between two strings is the minimum number of single-cha
 
 Options:
 
-
+`<Boolean> raw` – if true the raw distance is returned
 
 #### Damerau-Levenshtein – `damerau`
 
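The effect of a `raw` option can be illustrated with a small Levenshtein sketch. The edit-distance computation is the standard dynamic-programming form; the normalization `1 - distance / max(length)` used when `raw` is false is an assumption and may differ from what the package returns.

```javascript
// Illustrative Levenshtein with a `raw` switch; not the package code.
function levenshteinSketch ( a, b, { raw = false } = {} ) {

    const m = a.length, n = b.length;

    // prev[ j ]: distance between the processed prefix of a and the first j chars of b
    let prev = Array.from( { length: n + 1 }, ( _, j ) => j );

    for ( let i = 1; i <= m; i++ ) {

        const curr = [ i ];

        for ( let j = 1; j <= n; j++ ) {
            const cost = a[ i - 1 ] === b[ j - 1 ] ? 0 : 1;
            curr[ j ] = Math.min(
                prev[ j ] + 1,         // deletion
                curr[ j - 1 ] + 1,     // insertion
                prev[ j - 1 ] + cost   // substitution
            );
        }

        prev = curr;

    }

    const distance = prev[ n ];

    if ( raw ) return distance;

    // assumed normalization to a similarity between 0 and 1
    const maxLen = Math.max( m, n );
    return maxLen === 0 ? 1 : 1 - distance / maxLen;

}

console.log( levenshteinSketch( 'kitten', 'sitting', { raw: true } ) );  // 3
console.log( levenshteinSketch( 'kitten', 'sitting' ) );  // 1 - 3/7, roughly 0.571
```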
@@ -407,7 +422,7 @@ The Damerau-Levenshtein distance differs from the classical Levenshtein distance
 
 Options:
 
-
+`<Boolean> raw` – if true the raw distance is returned
 
 #### Jaro-Winkler – `jaro`
 
@@ -415,7 +430,7 @@ Jaro-Winkler is a string similarity metric that gives more weight to matching ch
 
 Options:
 
-
+`<Boolean> raw` – if true the raw distance is returned
 
 #### Cosine Similarity – `cosine`
 
@@ -423,7 +438,7 @@ Cosine similarity is a measure of how similar two vectors are. It's often used in t
 
 Options:
 
-
+`<String> delimiter` – term delimiter
 
 #### Dice Coefficient – `dice`
 
@@ -447,9 +462,9 @@ The Needleman-Wunsch algorithm performs global alignment, aligning two strings e
 
 Options:
 
-
-
-
+`<Number> match` – score for a match
+`<Number> mismatch` – penalty for a mismatch
+`<Number> gap` – penalty for a gap
 
 #### Smith-Waterman – `smithWaterman`
 
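The roles of `match`, `mismatch` and `gap` can be seen in a compact global-alignment score. This is a textbook Needleman-Wunsch sketch, not the package implementation; the default values (`1`, `-1`, `-1`) and the convention of passing penalties as negative numbers are assumptions.

```javascript
// Textbook Needleman-Wunsch score using two rolling rows.
// Defaults and sign conventions are assumptions for illustration.
function needlemanWunschScore ( a, b, { match = 1, mismatch = -1, gap = -1 } = {} ) {

    const m = a.length, n = b.length;

    // first row: aligning nothing from a against j chars of b costs j gaps
    let prev = Array.from( { length: n + 1 }, ( _, j ) => j * gap );

    for ( let i = 1; i <= m; i++ ) {

        const curr = [ i * gap ];

        for ( let j = 1; j <= n; j++ ) {
            const diag = prev[ j - 1 ] + ( a[ i - 1 ] === b[ j - 1 ] ? match : mismatch );
            curr[ j ] = Math.max(
                diag,                 // align a[ i - 1 ] with b[ j - 1 ]
                prev[ j ] + gap,      // gap in b
                curr[ j - 1 ] + gap   // gap in a
            );
        }

        prev = curr;

    }

    return prev[ n ];

}

console.log( needlemanWunschScore( 'abc', 'abc' ) );                // 3
console.log( needlemanWunschScore( 'abc', 'abd' ) );                // 1
console.log( needlemanWunschScore( 'abc', 'abc', { match: 2 } ) );  // 6
```

A Smith-Waterman variant would additionally clamp every cell at zero and return the maximum cell instead of the last one.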
@@ -457,9 +472,9 @@ The Smith-Waterman algorithm performs local alignment, finding the best matching
 
 Options:
 
-
-
-
+`<Number> match` – score for a match
+`<Number> mismatch` – penalty for a mismatch
+`<Number> gap` – penalty for a gap
 
 #### q-Gram – `qGram`
 
@@ -467,7 +482,7 @@ Q-gram similarity is a string-matching algorithm that compares two strings by br
 
 Options:
 
-
+`<Int> q` – length of substrings
 
 ### Phonetic Algorithms
 
@@ -477,8 +492,8 @@ The Soundex algorithm generates a phonetic representation of a string based on h
 
 Options:
 
-
-
-
-
-
+`<String> lang` – language code for predefined setups (e.g., `en`, `de`)
+`<Boolean> raw` – if true, returns the raw sound index codes
+`<Object> mapping` – custom phonetic mapping (overrides predefined)
+`<String> exclude` – characters to exclude from the input (overrides predefined)
+`<Number> maxLength` – maximum length of the phonetic code
package/package.json
CHANGED
package/src/CmpStr.d.ts
CHANGED
@@ -17,12 +17,16 @@ export declare class CmpStr {
 
     setStr ( str: string ) : boolean;
 
-
+    getStr () : string;
+
+    listAlgo ( loadedOnly?: boolean ) : string[];
 
     isAlgo ( algo: string ) : boolean;
 
     setAlgo ( algo: string ) : boolean;
 
+    getAlgo () : string;
+
     addAlgo ( algo: string, callback: (
         a: string, b: string, ...args : any
     ) => number | any, useIt?: boolean ) : boolean;
@@ -45,7 +49,9 @@ export declare class CmpStr {
 
     setFlags( flags: string ) : void;
 
-
+    getFlags () : string;
+
+    normalize ( input: string|string[], flags?: string ) : string|string[];
 
     clearCache () : boolean;
 
package/src/CmpStr.js
CHANGED
@@ -20,6 +20,12 @@
 
 module.exports = class CmpStr {
 
+    /**
+     * --------------------------------------------------
+     * Global Variables
+     * --------------------------------------------------
+     */
+
     /**
      * all pre-defined similarity algorithms
      *
@@ -41,6 +47,15 @@ module.exports = class CmpStr {
         soundex: './algorithms/soundex'
     };
 
+    /**
+     * stores the names of loaded algorithms
+     *
+     * @since 2.0.2
+     * @private
+     * @type {Set<String>}
+     */
+    #loadedAlgo = new Set ();
+
     /**
      * normalized strings cache
      *
@@ -61,28 +76,43 @@ module.exports = class CmpStr {
      * default normalization flags
      * set by setFlags()
      *
-     * @
+     * @private
+     * @type {String}
+     */
+    #flags = '';
+
+    /**
+     * current algorithm to use for similarity calculations
+     * set by setAlgo(), addAlgo() or constructor()
+     *
+     * @private
      * @type {String}
      */
-
+    #algo;
 
     /**
      * base string for comparison
      * set by setStr or constructor()
      *
-     * @
+     * @private
      * @type {String}
      */
-    str;
+    #str;
 
     /**
-     *
-     * set by setAlgo(), addAlgo() or constructor()
+     * stores the current ready state
      *
-     * @
-     * @
+     * @since 2.0.2
+     * @private
+     * @type {Boolean}
+     */
+    #readyState = false;
+
+    /**
+     * --------------------------------------------------
+     * Constructor
+     * --------------------------------------------------
      */
-    algo;
 
     /**
      * initializes a CmpStr instance
@@ -107,6 +137,12 @@ module.exports = class CmpStr {
 
     };
 
+    /**
+     * --------------------------------------------------
+     * Ready State
+     * --------------------------------------------------
+     */
+
     /**
      * checks whether string and algorithm are set correctly
      *
@@ -114,11 +150,23 @@ module.exports = class CmpStr {
      */
     isReady () {
 
-        return
-
-
-
-
+        return this.#readyState;
+
+    };
+
+    /**
+     * updates the readiness state
+     *
+     * @since 2.0.2
+     * @private
+     */
+    #updateReadyState () {
+
+        this.#readyState = (
+            typeof this.#algo === 'string' &&
+            this.isAlgo( this.#algo ) &&
+            typeof this.#str === 'string' &&
+            this.#str.length !== 0
         );
 
     };
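The hunk above replaces an on-demand readiness check with a cached flag that is recomputed whenever the base string or the algorithm changes. A self-contained sketch of that idea follows; `ReadyTracker` and its fixed algorithm list are hypothetical, and returning `false` for an unknown algorithm is a simplification of this sketch.

```javascript
// Hypothetical illustration of a cached ready state; not the cmpstr class.
class ReadyTracker {

    #algorithms = new Set( [ 'levenshtein', 'dice' ] );
    #algo;
    #str;
    #readyState = false;

    // recompute the cached flag after every state change
    #updateReadyState () {
        this.#readyState = (
            typeof this.#algo === 'string' &&
            this.#algorithms.has( this.#algo ) &&
            typeof this.#str === 'string' &&
            this.#str.length !== 0
        );
    }

    setStr ( str ) {
        this.#str = String( str );
        this.#updateReadyState();
        return true;
    }

    setAlgo ( algo ) {
        if ( !this.#algorithms.has( algo ) ) return false;
        this.#algo = algo;
        this.#updateReadyState();
        return true;
    }

    removeAlgo ( algo ) {
        this.#algorithms.delete( algo );
        if ( this.#algo === algo ) {
            this.#algo = undefined;   // reset current algorithm if it was removed
            this.#updateReadyState();
        }
        return true;
    }

    isReady () { return this.#readyState; }

}

const tracker = new ReadyTracker();
console.log( tracker.isReady() );   // false

tracker.setStr( 'hello' );
tracker.setAlgo( 'dice' );
console.log( tracker.isReady() );   // true

tracker.removeAlgo( 'dice' );
console.log( tracker.isReady() );   // false
```

With this design `isReady()` becomes a plain field read, at the cost of having to call the update helper from every method that mutates the relevant state.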
@@ -126,12 +174,13 @@ module.exports = class CmpStr {
     /**
      * checks ready state and throws an error if not
      *
+     * @private
      * @returns {Boolean} true if ready
      * @throws {Error} if CmpStr is not ready
      */
-
+    #checkReady () {
 
-        if ( !this
+        if ( !this.#readyState ) {
 
             throw new Error(
                 `CmpStr instance is not ready. Ensure the algorithm and base string are set.`
@@ -143,6 +192,12 @@ module.exports = class CmpStr {
 
     };
 
+    /**
+     * --------------------------------------------------
+     * Base String
+     * --------------------------------------------------
+     */
+
     /**
      * sets the base string for comparison
      *
@@ -151,12 +206,26 @@ module.exports = class CmpStr {
      */
     setStr ( str ) {
 
-        this
+        this.#str = String ( str );
+
+        this.#updateReadyState();
 
         return true;
 
     };
 
+    /**
+     * gets the base string for comparison
+     *
+     * @since 2.0.2
+     * @returns {String} base string
+     */
+    getStr () {
+
+        return this.#str;
+
+    };
+
     /**
      * --------------------------------------------------
      * Algorithms
@@ -166,11 +235,14 @@ module.exports = class CmpStr {
     /**
      * list all registered similarity algorithms
      *
+     * @param {Boolean} [loadedOnly=false] if true, only loaded algorithm names are returned
      * @returns {String[]} array of algorithm names
      */
-    listAlgo () {
+    listAlgo ( loadedOnly = false ) {
 
-        return
+        return loadedOnly
+            ? [ ...this.#loadedAlgo ]
+            : [ ...Object.keys( this.#algorithms ) ];
 
     };
 
@@ -194,9 +266,11 @@ module.exports = class CmpStr {
      */
     setAlgo ( algo ) {
 
-        if ( this
+        if ( this.#loadAlgo( algo ) ) {
 
-            this
+            this.#algo = algo;
+
+            this.#updateReadyState();
 
             return true;
 
@@ -204,6 +278,18 @@ module.exports = class CmpStr {
 
     };
 
+    /**
+     * gets the current algorithm to use for similarity calculations
+     *
+     * @since 2.0.2
+     * @returns {String} name of the algorithm
+     */
+    getAlgo () {
+
+        return this.#algo;
+
+    };
+
     /**
      * adds a new similarity algorithm
      *
@@ -255,11 +341,15 @@ module.exports = class CmpStr {
 
             delete this.#algorithms[ algo ];
 
-
+            this.#loadedAlgo.delete( algo );
+
+            if ( this.#algo === algo ) {
 
                 /* reset current algorithm if it was removed */
 
-                this
+                this.#algo = undefined;
+
+                this.#updateReadyState();
 
             }
 
@@ -278,18 +368,25 @@ module.exports = class CmpStr {
     /**
      * lazy-loads the specified algorithm module
      *
+     * @private
      * @param {String} algo name of the similarity algorithm
      * @returns {Boolean} true if the algorithm is loaded
      * @throws {Error} if the algorithm cannot be loaded or is not defined
      */
-
+    #loadAlgo ( algo ) {
 
-        if ( this.
+        if ( this.#loadedAlgo.has( algo ) ) {
+
+            return true;
+
+        } else if ( this.isAlgo( algo ) ) {
 
             let typeOf = typeof this.#algorithms[ algo ];
 
             if ( typeOf === 'function' ) {
 
+                this.#loadedAlgo.add( algo );
+
                 return true;
 
             } else if ( typeOf === 'string' ) {
@@ -302,6 +399,8 @@ module.exports = class CmpStr {
                         this.#algorithms[ algo ]
                     );
 
+                    this.#loadedAlgo.add( algo );
+
                     return true;
 
                 } catch ( err ) {
@@ -482,11 +581,12 @@ module.exports = class CmpStr {
     /**
      * applies all active filters to a string
      *
+     * @private
      * @param {String} str string to process
      * @returns {String} filtered string
      * @throws {Error} if applying filters causes an error
      */
-
+    #applyFilters ( str ) {
 
         try {
 
@@ -524,7 +624,19 @@ module.exports = class CmpStr {
      */
     setFlags ( flags = '' ) {
 
-        this
+        this.#flags = String ( flags );
+
+    };
+
+    /**
+     * get default normalization flags
+     *
+     * @since 2.0.2
+     * @returns {String} normalization flags
+     */
+    getFlags () {
+
+        return this.#flags;
 
     };
 
@@ -544,57 +656,73 @@ module.exports = class CmpStr {
      * d :: decompose unicode
      * u :: normalize unicode
      *
-     * @param {String} string string to normalize
+     * @param {String|String[]} input string(s) to normalize
      * @param {String} [flags=''] normalization flags
-     * @returns {String} normalized string
+     * @returns {String|String[]} normalized string(s)
      * @throws {Error} if normalization causes an error
      */
-    normalize (
+    normalize ( input, flags = '' ) {
 
-
+        const processStr = ( str ) => {
 
-
+            let res = String ( str );
 
-
+            /* use normalized string from cache to increase performance */
 
-
+            let key = `${res}::${flags}`;
 
-
+            if ( this.#cache.has( key ) ) {
 
-
+                return this.#cache.get( key );
 
-
+            }
 
-
+            /* apply custom filters */
 
-
+            res = this.#applyFilters( res );
 
-
+            /* normalize using flags */
 
-
-        if ( flags.includes( 'w' ) ) res = res.replace( /\s+/g, ' ' );
-        if ( flags.includes( 'r' ) ) res = res.replace( /(.)\1+/g, '$1' );
-        if ( flags.includes( 'k' ) ) res = res.replace( /[^a-z]/gi, '' );
-        if ( flags.includes( 'n' ) ) res = res.replace( /[0-9]/g, '' );
-        if ( flags.includes( 't' ) ) res = res.trim();
-        if ( flags.includes( 'i' ) ) res = res.toLowerCase();
-        if ( flags.includes( 'd' ) ) res = res.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
-        if ( flags.includes( 'u' ) ) res = res.normalize( 'NFC' );
+            try {
 
-
+                if ( flags.includes( 's' ) ) res = res.replace( /[^a-z0-9]/gi, '' );
+                if ( flags.includes( 'w' ) ) res = res.replace( /\s+/g, ' ' );
+                if ( flags.includes( 'r' ) ) res = res.replace( /(.)\1+/g, '$1' );
+                if ( flags.includes( 'k' ) ) res = res.replace( /[^a-z]/gi, '' );
+                if ( flags.includes( 'n' ) ) res = res.replace( /[0-9]/g, '' );
+                if ( flags.includes( 't' ) ) res = res.trim();
+                if ( flags.includes( 'i' ) ) res = res.toLowerCase();
+                if ( flags.includes( 'd' ) ) res = res.normalize( 'NFD' ).replace( /[\u0300-\u036f]/g, '' );
+                if ( flags.includes( 'u' ) ) res = res.normalize( 'NFC' );
 
-
-
-
-
+            } catch ( err ) {
+
+                throw new Error (
+                    `Error while normalization.`,
+                    { cause: err }
+                );
+
+            }
+
+            /* store the normalized string in the cache */
+
+            this.#cache.set( key, res );
+
+            return res;
 
         }
 
-        /*
+        /* processing multiple strings */
+
+        if ( Array.isArray( input ) ) {
 
-
+            return input.map(
+                ( str ) => processStr( str )
+            );
+
+        }
 
-        return
+        return processStr( input );
 
     };
 
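The cache lookup in the new `normalize()` keys each entry by the input string and the flag string together. The sketch below isolates that memoization and the error wrapping; `CachedNormalizer` and its public `calls` counter are hypothetical, and only the `i` and `t` flags are implemented to keep it short.

```javascript
// Hypothetical illustration of the `${string}::${flags}` cache key;
// not the cmpstr class.
class CachedNormalizer {

    #cache = new Map();

    calls = 0;   // counts how often real work ran (for demonstration only)

    normalize ( input, flags = '' ) {

        const processStr = ( str ) => {

            const res = String( str );
            const key = `${res}::${flags}`;

            // identical string and flags: serve the stored result
            if ( this.#cache.has( key ) ) return this.#cache.get( key );

            let out;

            try {
                this.calls++;
                out = flags.includes( 'i' ) ? res.toLowerCase() : res;
                if ( flags.includes( 't' ) ) out = out.trim();
            } catch ( err ) {
                // keep the original error reachable through `cause`
                throw new Error( 'Error while normalization.', { cause: err } );
            }

            this.#cache.set( key, out );

            return out;

        };

        return Array.isArray( input ) ? input.map( processStr ) : processStr( input );

    }

}

const normalizer = new CachedNormalizer();

normalizer.normalize( ' Hello ', 'ti' );
normalizer.normalize( ' Hello ', 'ti' );   // served from the cache
normalizer.normalize( ' Hello ', 'i' );    // different flags, different key

console.log( normalizer.calls );   // 2
console.log( normalizer.normalize( [ ' Hello ', ' Hello ' ], 'ti' ) );
// [ 'hello', 'hello' ]
```

Including the flags in the key matters: the same input normalized with different flags must not share a cache entry.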
@@ -629,7 +757,7 @@ module.exports = class CmpStr {
      */
     compare ( algo, a, b, config = {} ) {
 
-        if ( this
+        if ( this.#loadAlgo( algo ) ) {
 
             /* handle trivial cases */
 
@@ -639,7 +767,7 @@ module.exports = class CmpStr {
             /* apply similarity algorithm */
 
             const {
-                flags = this
+                flags = this.#flags,
                 options = {}
             } = config;
 
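The destructuring in `compare()` falls back to the instance-wide flags only when the configuration object leaves `flags` undefined. A small sketch of that behavior follows; `ConfigDemo` and `resolve()` are hypothetical names used for illustration.

```javascript
// Hypothetical illustration of per-call config with instance defaults.
class ConfigDemo {

    #flags = '';

    setFlags ( flags = '' ) { this.#flags = String( flags ); }

    resolve ( config = {} ) {

        // destructuring defaults apply only when a property is undefined
        const {
            flags = this.#flags,
            options = {}
        } = config;

        return { flags, options };

    }

}

const demo = new ConfigDemo();
demo.setFlags( 'ti' );

console.log( demo.resolve() );
// { flags: 'ti', options: {} }

console.log( demo.resolve( { flags: 'w', options: { raw: true } } ) );
// { flags: 'w', options: { raw: true } }

console.log( demo.resolve( { flags: '' } ) );
// { flags: '', options: {} }
```

The last call shows a consequence of this pattern: an explicit empty string overrides the instance default, because only `undefined` triggers the fallback.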
@@ -674,11 +802,11 @@ module.exports = class CmpStr {
      */
     test ( str, config = {} ) {
 
-        if ( this
+        if ( this.#checkReady() ) {
 
             return this.compare(
-                this
-                this
+                this.#algo,
+                this.#str, str,
                 config
             );
 
@@ -695,13 +823,13 @@ module.exports = class CmpStr {
      */
     batchTest ( arr, config = {} ) {
 
-        if ( this
+        if ( this.#checkReady() ) {
 
             return [ ...arr ].map( ( str ) => ( {
                 target: str,
                 match: this.compare(
-                    this
-                    this
+                    this.#algo,
+                    this.#str, str,
                     config
                 )
             } ) );
@@ -763,7 +891,7 @@ module.exports = class CmpStr {
      */
     similarityMatrix ( algo, arr, config = {} ) {
 
-        if ( this
+        if ( this.#loadAlgo( algo ) ) {
 
             delete config?.options?.raw;
 
package/src/CmpStrAsync.d.ts
CHANGED
@@ -2,6 +2,8 @@ import { CmpStr, Config, BatchResult } from './CmpStr';
 
 export declare class CmpStrAsync extends CmpStr {
 
+    normalizeAsync ( input: string|string[], flags?: string ) : Promise<string|string[]>;
+
     compareAsync ( algo: string, a: string, b: string, config?: Config ) : Promise<number | any>;
 
     testAsync ( str: string, config?: Config ) : Promise<number | any>;
package/src/CmpStrAsync.js
CHANGED
@@ -40,9 +40,10 @@ module.exports = class CmpStrAsync extends CmpStr {
     };
 
     /**
-     * @private
      * generic async wrapper for methods
+     * @async
      *
+     * @private
      * @param {Function} method method to call
      * @param {...any} args arguments to pass to the method
      * @returns {Promise} Promise resolving the result of the method
@@ -76,8 +77,25 @@ module.exports = class CmpStrAsync extends CmpStr {
      */
 
     /**
-     *
+     * normalizes a string by chainable options; uses cache to increase
+     * performance and custom filters for advanced behavior
      *
+     * @since 2.0.2
+     * @param {String|String[]} input string(s) to normalize
+     * @param {String} [flags=''] normalization flags
+     * @returns {Promise} Promise resolving string normalization
+     */
+    normalizeAsync ( input, flags = '' ) {
+
+        return this.#asyncWrapper(
+            this.normalize,
+            input, flags
+        );
+
+    };
+
+    /**
+     * compares two strings a and b using the passed algorithm
      * @async
      *
      * @param {String} algo name of the algorithm
@@ -98,7 +116,6 @@ module.exports = class CmpStrAsync extends CmpStr {
     /**
      * tests the similarity between the base string and a target string
      * using the current algorithm
-     *
      * @async
      *
      * @param {String} str target string
@@ -116,7 +133,6 @@ module.exports = class CmpStrAsync extends CmpStr {
 
     /**
      * tests the similarity of multiple strings against the base string
-     *
      * @async
      *
      * @param {String[]} arr array of strings
@@ -135,7 +151,6 @@ module.exports = class CmpStrAsync extends CmpStr {
     /**
      * finds strings in an array that exceed a similarity threshold
      * returns the array sorted by highest similarity
-     *
      * @async
      *
      * @param {String[]} arr array of strings
@@ -153,7 +168,6 @@ module.exports = class CmpStrAsync extends CmpStr {
 
     /**
      * finds the closest matching string from an array
-     *
      * @async
      *
      * @param {String[]} arr array of strings
@@ -171,7 +185,6 @@ module.exports = class CmpStrAsync extends CmpStr {
 
     /**
      * generate a similarity matrix for an array of strings
-     *
      * @async
      *
      * @param {String} algo name of the algorithm