@sepiariver/unique-set 1.1.1 → 2.0.1

package/README.md CHANGED
@@ -1,10 +1,86 @@
1
1
  # @sepiariver/unique-set
2
2
 
3
- Extends the `has` and `add` methods on the native JavaScript `Set` object to use [fast-deep-equal](https://www.npmjs.com/package/fast-deep-equal) as the equality algorithm.
3
+ Extends the native `Set` class to deeply compare using [fast-deep-equal](https://www.npmjs.com/package/fast-deep-equal), with optional Bloom filter optimization.
4
4
 
5
- The extended methods iterate through the elements of the `UniqueSet` until equality is found. If no elements match, the entire `UniqueSet` would have been iterated to determine so. However fast `fast-deep-equal` is, calling it in a loop like this makes performance many times poorer than the native `Set`. For datasets greater than a thousand elements, there is probably a better way to achieve what you're trying to do. Otherwise, `UniqueSet` is convenient.
5
+ Supports ESM and CommonJS.
6
6
 
7
- Requires @babel/core 7+
7
+ ```js
8
+ import { BloomSet, UniqueSet } from '@sepiariver/unique-set';
9
+ ```
10
+
11
+ ```js
12
+ const { BloomSet, UniqueSet } = require('@sepiariver/unique-set');
13
+ ```
14
+
15
+ WARNING: This version exports two named classes instead of a single default class, which breaks backward compatibility with version 1.
16
+
17
+ The overridden methods iterate through the elements of the `UniqueSet`, deep-comparing each one until a match is found. If no element matches, the entire `UniqueSet` is iterated. However fast `fast-deep-equal` is [reported to be](https://github.com/epoberezkin/fast-deep-equal?tab=readme-ov-file#performance-benchmark), its time complexity depends on the depth of the objects being compared. Calling it in a loop makes performance many, many times worse than the native `Set`.
18
+
19
+ _For datasets greater than a thousand elements, there is probably a better way to achieve what you're trying to do._ Otherwise, `UniqueSet` is convenient.
20
+
21
+ **UPDATE:** Version 2 ships with `BloomSet`, which uses a Bloom filter to greatly optimize absence checks, falling back to `fast-deep-equal` to validate potential false positives. This class is useful for larger datasets, up to the tens of thousands or even 100k elements, depending largely on configuration. It performs about 3-10 times faster than `UniqueSet` for datasets greater than 1000 elements. Below a few hundred (~400) elements, `UniqueSet` can be faster; it all depends on your dataset and config options. In all scenarios except the absolute best case, `BloomSet` is still orders of magnitude slower than the native `Set`, but if deep equality is required, this is a decent option.
22
+
23
+ Highly recommended: experiment with config options to find the best performance for your use case.
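
The "Bloom filter first, deep comparison second" flow described in the update note can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `BloomSketch`, `keyOf`, and the simple string hash are hypothetical stand-ins, and deep equality is approximated by comparing `JSON.stringify` output (which is key-order sensitive) rather than by calling `fast-deep-equal`.

```js
// Hypothetical stand-in for a canonical key; the real package uses its own serializer.
const keyOf = (value) => String(JSON.stringify(value));

// Hypothetical sketch: a Set with a Bloom-filter pre-check in front of a linear scan.
class BloomSketch extends Set {
  #bits;
  #size;
  #hashCount;

  constructor(size = 1024, hashCount = 3) {
    super();
    this.#size = size;
    this.#hashCount = hashCount;
    this.#bits = new Uint8Array(Math.ceil(size / 8));
  }

  // Derive `hashCount` bit positions from the key with a simple seeded hash.
  #positions(key) {
    const positions = [];
    for (let seed = 0; seed < this.#hashCount; seed++) {
      let h = seed + 1;
      for (let i = 0; i < key.length; i++) {
        h = (Math.imul(h, 31) + key.charCodeAt(i)) >>> 0;
      }
      positions.push(h % this.#size);
    }
    return positions;
  }

  has(o) {
    const key = keyOf(o);
    const maybePresent = this.#positions(key).every(
      (p) => (this.#bits[p >> 3] & (1 << (p & 7))) !== 0
    );
    if (!maybePresent) return false; // an unset bit means definitely absent
    for (const item of this) {
      if (keyOf(item) === key) return true; // confirm, ruling out false positives
    }
    return false;
  }

  add(o) {
    if (!this.has(o)) {
      for (const p of this.#positions(keyOf(o))) {
        this.#bits[p >> 3] |= 1 << (p & 7);
      }
      super.add(o);
    }
    return this;
  }
}

const sketch = new BloomSketch();
sketch.add({ a: 1 }).add({ a: 1 }).add([1, 2]);
console.log(sketch.size); // 2
console.log(sketch.has({ a: 2 })); // false
```

The linear scan only runs when every probed bit is set, which is why absence checks are the cheap case.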
24
+
25
+ IMPORTANT: The `delete` method is unmodified in both classes. In the case of duplicate objects that are equivalent but have different references, the results of `delete` operations may be unexpected.
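
To illustrate the `delete` caveat, here is a small sketch. It does not import the package; `DeepSetSketch` and `sameShape` are hypothetical stand-ins that mimic a deep `has`/`add` on top of the native `Set`, with `JSON.stringify` as a simplified substitute for `fast-deep-equal`. The inherited `delete` still matches by reference, as the native `Set` does.

```js
// Simplified stand-in for deep equality (key-order sensitive; not fast-deep-equal).
const sameShape = (a, b) => JSON.stringify(a) === JSON.stringify(b);

// Hypothetical stand-in: deep `has`/`add`, inherited (reference-based) `delete`.
class DeepSetSketch extends Set {
  has(o) {
    for (const i of this) {
      if (sameShape(o, i)) return true;
    }
    return false;
  }
  add(o) {
    if (!this.has(o)) super.add(o);
    return this;
  }
}

const stored = { id: 1 };
const deepSet = new DeepSetSketch();
deepSet.add(stored);

const deletedByCopy = deepSet.delete({ id: 1 }); // equal shape, different reference
const stillThere = deepSet.has({ id: 1 });
const deletedByRef = deepSet.delete(stored); // same reference

console.log(deletedByCopy, stillThere, deletedByRef, deepSet.size); // false true true 0
```

In practice this means a caller needs to hold on to the original reference (or look it up by iterating) before calling `delete`.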
26
+
27
+ ## Config Options
28
+
29
+ ### Constructor Signature
30
+
31
+ `new BloomSet(iterable = [], options = { size, hashCount });`
32
+
33
+ ### Options
34
+
35
+ The options object allows you to customize the behavior and performance of the BloomSet. The following properties can be configured:
36
+
37
+ #### 1. size (number)
38
+
39
+ Description: Specifies the size of the bit array used internally by the Bloom filter. This directly impacts the memory usage and false positive probability.
40
+
41
+ Default: 6,553,577 (a prime number using roughly 800 KB of memory).
42
+
43
+ Recommendations:
44
+
45
+ For datasets with ~100,000 elements, this default size provides excellent performance (compared against `UniqueSet`) with minimal (< 1) false positives.
46
+
47
+ Larger datasets may require increasing the size for lower false positive rates. Remember though, false positives are mitigated by a fallback to `fast-deep-equal`, so you may be able to squeeze more performance from a higher tolerance for false positives, depending on your dataset.
48
+
49
+ #### 2. hashCount (number)
50
+
51
+ Description: Specifies the number of hash functions used by the Bloom filter. This impacts both the false positive probability and the computational cost of adding/checking elements.
52
+
53
+ Default: 7
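
As a rough guide to how `size` and `hashCount` interact, the textbook Bloom filter estimate p ≈ (1 - e^(-k·n/m))^k can be evaluated directly. The function name below is hypothetical, and the formula assumes independent, uniformly distributed hash functions; `BloomSet` derives its hash values from a single base hash, so real-world rates may differ from this estimate.

```js
// Textbook estimate: p ≈ (1 - e^(-k*n/m))^k
// m = bit array size, k = number of hash functions, n = stored elements.
function estimateFalsePositiveRate(size, hashCount, elementCount) {
  const fillRatio = 1 - Math.exp((-hashCount * elementCount) / size);
  return Math.pow(fillRatio, hashCount);
}

// Defaults from this README with 100,000 stored elements.
const rate = estimateFalsePositiveRate(6553577, 7, 100000);
console.log(rate); // roughly 1e-7 under this model
console.log(rate * 100000 < 1); // true: fewer than one expected false positive per 100,000 lookups
```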
54
+
55
+ ### Examples
56
+
57
+ Default Configuration:
58
+
59
+ ```js
60
+ const bloomSet = new BloomSet();
61
+ bloomSet.add("example");
62
+ console.log(bloomSet.has("example")); // true
63
+ ```
64
+
65
+ Custom Configuration for Larger Datasets:
66
+
67
+ For example, a bit array size of 28,755,000 uses roughly 3.5 MB of memory, and this configuration is robust for datasets of around 1M elements. The practicality of using `BloomSet` with that many elements is low, however, because of the performance hit of deep equality checks.
68
+
69
+ ```js
70
+ const bloomSet = new BloomSet([], { size: 28755000, hashCount: 20 });
71
+ bloomSet.add("custom");
72
+ console.log(bloomSet.has("custom")); // true
73
+ ```
74
+
75
+ ### Considerations
76
+
77
+ - Memory Usage: The bit array uses size / 8 bytes of memory. Even at 800 KB, initializing 1250 BloomSets in the same scope would use a gigabyte of memory.
78
+ - False Positive Rate: The probability of a false positive is influenced by size, hashCount, and the number of elements. Adjust these values to balance performance and accuracy for your dataset.
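
The memory figures above follow from the `size / 8` rule. A quick check, using a hypothetical helper name:

```js
// Bytes needed for a bit array of `size` bits (one bit per slot, rounded up).
const bitArrayBytes = (size) => Math.ceil(size / 8);

const defaultBytes = bitArrayBytes(6553577);
console.log(defaultBytes); // 819198
console.log(bitArrayBytes(28755000)); // 3594375
console.log(((1250 * defaultBytes) / 1e9).toFixed(2)); // 1.02 (GB for 1250 default-sized instances)
```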
79
+
80
+ #### Further Tuning
81
+
82
+ - Use a larger size for datasets exceeding 100,000 elements.
83
+ - Reduce hashCount if performance is critical and your dataset contains very few duplicates.
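
When picking a larger `size`, one option is to solve the textbook estimate p = (1 - e^(-k·n/m))^k for m with `hashCount` held fixed. The helper below is a hypothetical sketch under that model (independent, uniform hashes), so treat the result as a starting point for experimentation rather than a guarantee. Note that `BloomSet` rounds the requested size up to the next prime for its modulus.

```js
// Solve p = (1 - e^(-k*n/m))^k for m, with k fixed:
//   m = -k*n / ln(1 - p^(1/k))
function suggestSize(elementCount, hashCount, targetRate) {
  const perHashRate = Math.pow(targetRate, 1 / hashCount);
  return Math.ceil((-hashCount * elementCount) / Math.log(1 - perHashRate));
}

// 100,000 elements, 7 hashes, target rate of one in ten million.
const suggested = suggestSize(100000, 7, 1e-7);
console.log(suggested); // roughly 6.6 million bits, the same order as the default size
```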
8
84
 
9
85
  ## Installation
10
86
 
@@ -15,7 +91,7 @@ npm install @sepiariver/unique-set
15
91
  ## Usage
16
92
 
17
93
  ```js
18
- const UniqueSet = require('@sepiariver/unique-set');
94
+ const { BloomSet, UniqueSet } = require('@sepiariver/unique-set');
19
95
 
20
96
  const data = [
21
97
  "string",
@@ -38,13 +114,21 @@ const data = [
38
114
  [1, 2, 3],
39
115
  ];
40
116
 
41
- let unique1 = new UniqueSet();
117
+ const unique1 = new UniqueSet();
42
118
  data.forEach((el) => {
43
119
  unique1.add(el);
44
120
  });
45
- let unique2 = new UniqueSet(data);
121
+ const unique2 = new UniqueSet(data);
46
122
  console.log(unique1.size); // 6 instead of 8 with Set
47
123
  console.log(unique2.size); // 6
124
+
125
+ const bloom1 = new BloomSet();
126
+ data.forEach((el) => {
127
+ bloom1.add(el);
128
+ });
129
+ const bloom2 = new BloomSet(data);
130
+ console.log(bloom1.size); // 6 instead of 8 with Set
131
+ console.log(bloom2.size); // 6
48
132
  ```
49
133
 
50
134
  ## Testing
@@ -56,3 +140,7 @@ console.log(unique2.size); // 6
56
140
  ## Contributing
57
141
 
58
142
  Submit pull requests to [https://github.com/sepiariver/unique-set/pulls]
143
+
144
+ ## Issues
145
+
146
+ Issue reporting is encouraged: [https://github.com/sepiariver/unique-set/issues]
@@ -0,0 +1,41 @@
1
+ /** A `Set` extension that ensures uniqueness of items using deep equality checks. */
2
+ declare class UniqueSet<T> extends Set<T> {
3
+ /*** @throws TypeError If the input is not iterable. */
4
+ constructor(iterable?: Iterable<T>);
5
+ /**
6
+ * Determines whether an object is in the UniqueSet using deep equality.
7
+ * @param o The object to check for presence in the UniqueSet.
8
+ * @returns `true` if the object is found, `false` otherwise.
9
+ */
10
+ has(o: T): boolean;
11
+ /**
12
+ * Adds a new object to the UniqueSet if it is not already present.
13
+ * @param o The object to add to the UniqueSet.
14
+ * @returns The `UniqueSet` instance, allowing for chaining.
15
+ */
16
+ add(o: T): this;
17
+ }
18
+ /** A `Set` extension that uses a Bloom filter for fast existence checks combined with deep equality for accuracy. */
19
+ declare class BloomSet<T> extends Set<T> {
20
+ #private;
21
+ /**
22
+ * Creates a new `BloomSet` instance.
23
+ * @param iterable Optional: an iterable object with which to initialize the BloomSet.
24
+ * @param options Bloom filter configuration options.
25
+ * @param options.size The size of the Bloom filter's bit array. Defaults to 6553577.
26
+ * @param options.hashCount The number of hash functions to use. Defaults to 7.
27
+ * @throws TypeError If the input is not iterable.
28
+ */
29
+ constructor(iterable?: Iterable<T>, options?: {
30
+ size?: number;
31
+ hashCount?: number;
32
+ });
33
+ /** Determines existence of an object in the BloomSet using the Bloom filter and deep equality */
34
+ has(o: T): boolean;
35
+ /** Adds a new object to the BloomSet if it is not already present.
36
+ * @returns The `BloomSet` instance, allowing for chaining.
37
+ */
38
+ add(o: T): this;
39
+ }
40
+
41
+ export { BloomSet, UniqueSet };
package/dist/index.js CHANGED
@@ -1,12 +1,27 @@
1
1
  "use strict";
2
2
 
3
- function _typeof(obj) { "@babel/helpers - typeof"; return _typeof = "function" == typeof Symbol && "symbol" == typeof Symbol.iterator ? function (obj) { return typeof obj; } : function (obj) { return obj && "function" == typeof Symbol && obj.constructor === Symbol && obj !== Symbol.prototype ? "symbol" : typeof obj; }, _typeof(obj); }
3
+ Object.defineProperty(exports, "__esModule", {
4
+ value: true
5
+ });
6
+ exports.UniqueSet = exports.BloomSet = void 0;
4
7
 
5
8
  var _fastDeepEqual = _interopRequireDefault(require("fast-deep-equal"));
6
9
 
7
10
  function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { "default": obj }; }
8
11
 
9
- function _createForOfIteratorHelper(o, allowArrayLike) { var it = typeof Symbol !== "undefined" && o[Symbol.iterator] || o["@@iterator"]; if (!it) { if (Array.isArray(o) || (it = _unsupportedIterableToArray(o)) || allowArrayLike && o && typeof o.length === "number") { if (it) o = it; var i = 0; var F = function F() {}; return { s: F, n: function n() { if (i >= o.length) return { done: true }; return { done: false, value: o[i++] }; }, e: function e(_e) { throw _e; }, f: F }; } throw new TypeError("Invalid attempt to iterate non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method."); } var normalCompletion = true, didErr = false, err; return { s: function s() { it = it.call(o); }, n: function n() { var step = it.next(); normalCompletion = step.done; return step; }, e: function e(_e2) { didErr = true; err = _e2; }, f: function f() { try { if (!normalCompletion && it["return"] != null) it["return"](); } finally { if (didErr) throw err; } } }; }
12
+ function _slicedToArray(arr, i) { return _arrayWithHoles(arr) || _iterableToArrayLimit(arr, i) || _unsupportedIterableToArray(arr, i) || _nonIterableRest(); }
13
+
14
+ function _nonIterableRest() { throw new TypeError("Invalid attempt to destructure non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method."); }
15
+
16
+ function _iterableToArrayLimit(arr, i) { var _i = arr == null ? null : typeof Symbol !== "undefined" && arr[Symbol.iterator] || arr["@@iterator"]; if (_i == null) return; var _arr = []; var _n = true; var _d = false; var _s, _e; try { for (_i = _i.call(arr); !(_n = (_s = _i.next()).done); _n = true) { _arr.push(_s.value); if (i && _arr.length === i) break; } } catch (err) { _d = true; _e = err; } finally { try { if (!_n && _i["return"] != null) _i["return"](); } finally { if (_d) throw _e; } } return _arr; }
17
+
18
+ function _arrayWithHoles(arr) { if (Array.isArray(arr)) return arr; }
19
+
20
+ function _typeof(obj) { "@babel/helpers - typeof"; return _typeof = "function" == typeof Symbol && "symbol" == typeof Symbol.iterator ? function (obj) { return typeof obj; } : function (obj) { return obj && "function" == typeof Symbol && obj.constructor === Symbol && obj !== Symbol.prototype ? "symbol" : typeof obj; }, _typeof(obj); }
21
+
22
+ function _readOnlyError(name) { throw new TypeError("\"" + name + "\" is read-only"); }
23
+
24
+ function _createForOfIteratorHelper(o, allowArrayLike) { var it = typeof Symbol !== "undefined" && o[Symbol.iterator] || o["@@iterator"]; if (!it) { if (Array.isArray(o) || (it = _unsupportedIterableToArray(o)) || allowArrayLike && o && typeof o.length === "number") { if (it) o = it; var i = 0; var F = function F() {}; return { s: F, n: function n() { if (i >= o.length) return { done: true }; return { done: false, value: o[i++] }; }, e: function e(_e2) { throw _e2; }, f: F }; } throw new TypeError("Invalid attempt to iterate non-iterable instance.\nIn order to be iterable, non-array objects must have a [Symbol.iterator]() method."); } var normalCompletion = true, didErr = false, err; return { s: function s() { it = it.call(o); }, n: function n() { var step = it.next(); normalCompletion = step.done; return step; }, e: function e(_e3) { didErr = true; err = _e3; }, f: function f() { try { if (!normalCompletion && it["return"] != null) it["return"](); } finally { if (didErr) throw err; } } }; }
10
25
 
11
26
  function _unsupportedIterableToArray(o, minLen) { if (!o) return; if (typeof o === "string") return _arrayLikeToArray(o, minLen); var n = Object.prototype.toString.call(o).slice(8, -1); if (n === "Object" && o.constructor) n = o.constructor.name; if (n === "Map" || n === "Set") return Array.from(o); if (n === "Arguments" || /^(?:Ui|I)nt(?:8|16|32)(?:Clamped)?Array$/.test(n)) return _arrayLikeToArray(o, minLen); }
12
27
 
@@ -18,6 +33,10 @@ function _defineProperties(target, props) { for (var i = 0; i < props.length; i+
18
33
 
19
34
  function _createClass(Constructor, protoProps, staticProps) { if (protoProps) _defineProperties(Constructor.prototype, protoProps); if (staticProps) _defineProperties(Constructor, staticProps); Object.defineProperty(Constructor, "prototype", { writable: false }); return Constructor; }
20
35
 
36
+ function _get() { if (typeof Reflect !== "undefined" && Reflect.get) { _get = Reflect.get; } else { _get = function _get(target, property, receiver) { var base = _superPropBase(target, property); if (!base) return; var desc = Object.getOwnPropertyDescriptor(base, property); if (desc.get) { return desc.get.call(arguments.length < 3 ? target : receiver); } return desc.value; }; } return _get.apply(this, arguments); }
37
+
38
+ function _superPropBase(object, property) { while (!Object.prototype.hasOwnProperty.call(object, property)) { object = _getPrototypeOf(object); if (object === null) break; } return object; }
39
+
21
40
  function _inherits(subClass, superClass) { if (typeof superClass !== "function" && superClass !== null) { throw new TypeError("Super expression must either be null or a function"); } subClass.prototype = Object.create(superClass && superClass.prototype, { constructor: { value: subClass, writable: true, configurable: true } }); Object.defineProperty(subClass, "prototype", { writable: false }); if (superClass) _setPrototypeOf(subClass, superClass); }
22
41
 
23
42
  function _createSuper(Derived) { var hasNativeReflectConstruct = _isNativeReflectConstruct(); return function _createSuperInternal() { var Super = _getPrototypeOf(Derived), result; if (hasNativeReflectConstruct) { var NewTarget = _getPrototypeOf(this).constructor; result = Reflect.construct(Super, arguments, NewTarget); } else { result = Super.apply(this, arguments); } return _possibleConstructorReturn(this, result); }; }
@@ -44,33 +63,54 @@ var UniqueSet = /*#__PURE__*/function (_Set) {
44
63
  var _super = _createSuper(UniqueSet);
45
64
 
46
65
  function UniqueSet() {
66
+ var _this;
67
+
68
+ var iterable = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : [];
69
+
47
70
  _classCallCheck(this, UniqueSet);
48
71
 
49
- for (var _len = arguments.length, args = new Array(_len), _key = 0; _key < _len; _key++) {
50
- args[_key] = arguments[_key];
72
+ if (!Array.isArray(iterable) && !iterable[Symbol.iterator]) {
73
+ throw new TypeError("UniqueSet requires an iterable");
74
+ }
75
+
76
+ _this = _super.call(this);
77
+
78
+ var _iterator = _createForOfIteratorHelper(iterable),
79
+ _step;
80
+
81
+ try {
82
+ for (_iterator.s(); !(_step = _iterator.n()).done;) {
83
+ var item = _step.value;
84
+
85
+ _this.add(item);
86
+ }
87
+ } catch (err) {
88
+ _iterator.e(err);
89
+ } finally {
90
+ _iterator.f();
51
91
  }
52
92
 
53
- return _super.call.apply(_super, [this].concat(args));
93
+ return _this;
54
94
  }
55
95
 
56
96
  _createClass(UniqueSet, [{
57
97
  key: "has",
58
98
  value: function has(o) {
59
- var _iterator = _createForOfIteratorHelper(this),
60
- _step;
99
+ var _iterator2 = _createForOfIteratorHelper(this),
100
+ _step2;
61
101
 
62
102
  try {
63
- for (_iterator.s(); !(_step = _iterator.n()).done;) {
64
- var i = _step.value;
103
+ for (_iterator2.s(); !(_step2 = _iterator2.n()).done;) {
104
+ var i = _step2.value;
65
105
 
66
106
  if ((0, _fastDeepEqual["default"])(o, i)) {
67
107
  return true;
68
108
  }
69
109
  }
70
110
  } catch (err) {
71
- _iterator.e(err);
111
+ _iterator2.e(err);
72
112
  } finally {
73
- _iterator.f();
113
+ _iterator2.f();
74
114
  }
75
115
 
76
116
  return false;
@@ -79,12 +119,271 @@ var UniqueSet = /*#__PURE__*/function (_Set) {
79
119
  key: "add",
80
120
  value: function add(o) {
81
121
  if (!this.has(o)) {
82
- Set.prototype.add.call(this, o);
122
+ _get(_getPrototypeOf(UniqueSet.prototype), "add", this).call(this, o);
83
123
  }
124
+
125
+ return this;
84
126
  }
85
127
  }]);
86
128
 
87
129
  return UniqueSet;
88
130
  }( /*#__PURE__*/_wrapNativeSuper(Set));
89
131
 
90
- module.exports = UniqueSet;
132
+ exports.UniqueSet = UniqueSet;
133
+
134
+ var BloomSet = /*#__PURE__*/function (_Set2) {
135
+ _inherits(BloomSet, _Set2);
136
+
137
+ var _super2 = _createSuper(BloomSet);
138
+
139
+ function BloomSet() {
140
+ var _this2;
141
+
142
+ var iterable = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : [];
143
+ var options = arguments.length > 1 && arguments[1] !== undefined ? arguments[1] : {};
144
+
145
+ _classCallCheck(this, BloomSet);
146
+
147
+ if (!Array.isArray(iterable) && !iterable[Symbol.iterator]) {
148
+ throw new TypeError("BloomSet requires an iterable");
149
+ }
150
+
151
+ _this2 = _super2.call(this);
152
+
153
+ if (!options || _typeof(options) !== "object") {
154
+ options = {};
155
+ }
156
+
157
+ var _options = options,
158
+ _options$size = _options.size,
159
+ size = _options$size === void 0 ? 6553577 : _options$size,
160
+ _options$hashCount = _options.hashCount,
161
+ hashCount = _options$hashCount === void 0 ? 7 : _options$hashCount;
162
+
163
+ if (typeof size !== "number" || size <= 0) {
164
+ 6553577, _readOnlyError("size"); // Targeting < 1 collision per 100,000 elements, ~819 KB memory, needs 7 hashes
165
+ }
166
+
167
+ _this2.aSize = _this2._findNextPrime(size);
168
+
169
+ if (typeof hashCount !== "number" || hashCount <= 0) {
170
+ 7, _readOnlyError("hashCount");
171
+ }
172
+
173
+ _this2.hashCount = hashCount;
174
+ _this2.bitArray = new Uint8Array(Math.ceil(size / 8));
175
+
176
+ var _iterator3 = _createForOfIteratorHelper(iterable),
177
+ _step3;
178
+
179
+ try {
180
+ for (_iterator3.s(); !(_step3 = _iterator3.n()).done;) {
181
+ var item = _step3.value;
182
+
183
+ _this2.add(item);
184
+ }
185
+ } catch (err) {
186
+ _iterator3.e(err);
187
+ } finally {
188
+ _iterator3.f();
189
+ }
190
+
191
+ return _this2;
192
+ }
193
+
194
+ _createClass(BloomSet, [{
195
+ key: "_findNextPrime",
196
+ value: function _findNextPrime(num) {
197
+ if (num < 2) return 2;
198
+ if (num % 2 === 0) num++; // Odd numbers only
199
+
200
+ while (!this._isPrime(num)) {
201
+ num += 2; // Odd numbers only
202
+ }
203
+
204
+ return num;
205
+ }
206
+ }, {
207
+ key: "_isPrime",
208
+ value: function _isPrime(num) {
209
+ if (num < 2) return false;
210
+ if (num === 2 || num === 3) return true;
211
+ if (num % 2 === 0 || num % 3 === 0) return false;
212
+ var sqrt = Math.floor(Math.sqrt(num));
213
+
214
+ for (var i = 5; i <= sqrt; i += 6) {
215
+ if (num % i === 0 || num % (i + 2) === 0) return false;
216
+ }
217
+
218
+ return true;
219
+ }
220
+ }, {
221
+ key: "_serialize",
222
+ value: function _serialize(item) {
223
+ if (typeof item === "number" && isNaN(item)) {
224
+ return "NaN";
225
+ }
226
+
227
+ if (item && _typeof(item) === "object") {
228
+ var serialize = this._serialize.bind(this);
229
+
230
+ if (Array.isArray(item)) {
231
+ return "[".concat(item.map(serialize).join(","), "]");
232
+ } else {
233
+ return "{".concat(Object.entries(item).sort(function (_ref, _ref2) {
234
+ var _ref3 = _slicedToArray(_ref, 1),
235
+ a = _ref3[0];
236
+
237
+ var _ref4 = _slicedToArray(_ref2, 1),
238
+ b = _ref4[0];
239
+
240
+ return a.localeCompare(b);
241
+ }).map(function (_ref5) {
242
+ var _ref6 = _slicedToArray(_ref5, 2),
243
+ k = _ref6[0],
244
+ v = _ref6[1];
245
+
246
+ return "".concat(k, ":").concat(serialize(v));
247
+ }).join(","), "}");
248
+ }
249
+ }
250
+
251
+ return String(item);
252
+ }
253
+ }, {
254
+ key: "_hashes",
255
+ value: function _hashes(item) {
256
+ var hashes = [];
257
+
258
+ var str = this._serialize(item);
259
+
260
+ var hash = this._fnv1a(str); // Base hash
261
+ // Bloom into hashCount hash values
262
+
263
+
264
+ for (var i = 0; i < this.hashCount; i++) {
265
+ hash %= this.aSize; // Ensure within bounds
266
+ // Track
267
+
268
+ hashes.push(hash); // Modify
269
+
270
+ hash = (hash ^ hash >>> 13) * 0xc2b2ae35;
271
+ hash >>>= 0; // Ensure unsigned 32-bit integer
272
+ }
273
+
274
+ return hashes;
275
+ }
276
+ }, {
277
+ key: "_fnv1a",
278
+ value: function _fnv1a(str) {
279
+ if (typeof str !== "string") {
280
+ str = String(str);
281
+ }
282
+
283
+ var hash = 2166136261; // FNV offset basis for 32-bit
284
+
285
+ for (var i = 0; i < str.length; i++) {
286
+ hash ^= str.charCodeAt(i);
287
+ hash = hash * 16777619 >>> 0; // Multiply by the FNV prime and ensure 32-bit unsigned
288
+ }
289
+
290
+ return hash >>> 0;
291
+ }
292
+ }, {
293
+ key: "_setBits",
294
+ value: function _setBits(hashes) {
295
+ var _iterator4 = _createForOfIteratorHelper(hashes),
296
+ _step4;
297
+
298
+ try {
299
+ for (_iterator4.s(); !(_step4 = _iterator4.n()).done;) {
300
+ var hash = _step4.value;
301
+ var index = Math.floor(hash / 8);
302
+ var bit = hash % 8;
303
+ this.bitArray[index] |= 1 << bit;
304
+ }
305
+ } catch (err) {
306
+ _iterator4.e(err);
307
+ } finally {
308
+ _iterator4.f();
309
+ }
310
+ }
311
+ }, {
312
+ key: "_checkBits",
313
+ value: function _checkBits(hashes) {
314
+ var _iterator5 = _createForOfIteratorHelper(hashes),
315
+ _step5;
316
+
317
+ try {
318
+ for (_iterator5.s(); !(_step5 = _iterator5.n()).done;) {
319
+ var hash = _step5.value;
320
+ var index = Math.floor(hash / 8);
321
+ var bit = hash % 8;
322
+
323
+ if (!(this.bitArray[index] & 1 << bit)) {
324
+ return false;
325
+ }
326
+ }
327
+ } catch (err) {
328
+ _iterator5.e(err);
329
+ } finally {
330
+ _iterator5.f();
331
+ }
332
+
333
+ return true;
334
+ }
335
+ }, {
336
+ key: "has",
337
+ value: function has(o) {
338
+ var hashes = this._hashes(o);
339
+
340
+ if (!this._checkBits(hashes)) {
341
+ return false; // Definitely not in the set
342
+ } // Fall back to fast-deep-equal for false positives
343
+
344
+
345
+ var _iterator6 = _createForOfIteratorHelper(this),
346
+ _step6;
347
+
348
+ try {
349
+ for (_iterator6.s(); !(_step6 = _iterator6.n()).done;) {
350
+ var i = _step6.value;
351
+
352
+ if ((0, _fastDeepEqual["default"])(o, i)) {
353
+ return true;
354
+ }
355
+ }
356
+ } catch (err) {
357
+ _iterator6.e(err);
358
+ } finally {
359
+ _iterator6.f();
360
+ }
361
+
362
+ return false;
363
+ }
364
+ }, {
365
+ key: "add",
366
+ value: function add(o) {
367
+ if (!this.has(o)) {
368
+ var hashes = this._hashes(o);
369
+
370
+ this._setBits(hashes);
371
+
372
+ _get(_getPrototypeOf(BloomSet.prototype), "add", this).call(this, o);
373
+ }
374
+
375
+ return this;
376
+ }
377
+ }]);
378
+
379
+ return BloomSet;
380
+ }( /*#__PURE__*/_wrapNativeSuper(Set));
381
+
382
+ exports.BloomSet = BloomSet;
383
+
384
+ if (typeof module !== "undefined" && module.exports) {
385
+ module.exports = {
386
+ UniqueSet: UniqueSet,
387
+ BloomSet: BloomSet
388
+ };
389
+ }
package/dist/index.mjs ADDED
@@ -0,0 +1,182 @@
1
+ // index.ts
2
+ import equal from "fast-deep-equal/es6/index.js";
3
+ var UniqueSet = class extends Set {
4
+ /*** @throws TypeError If the input is not iterable. */
5
+ constructor(iterable = []) {
6
+ if (!Array.isArray(iterable) && !iterable[Symbol.iterator]) {
7
+ throw new TypeError("UniqueSet requires an iterable");
8
+ }
9
+ super();
10
+ for (const item of iterable) {
11
+ this.add(item);
12
+ }
13
+ }
14
+ /**
15
+ * Determines whether an object is in the UniqueSet using deep equality.
16
+ * @param o The object to check for presence in the UniqueSet.
17
+ * @returns `true` if the object is found, `false` otherwise.
18
+ */
19
+ has(o) {
20
+ for (const i of this) {
21
+ if (equal(o, i)) {
22
+ return true;
23
+ }
24
+ }
25
+ return false;
26
+ }
27
+ /**
28
+ * Adds a new object to the UniqueSet if it is not already present.
29
+ * @param o The object to add to the UniqueSet.
30
+ * @returns The `UniqueSet` instance, allowing for chaining.
31
+ */
32
+ add(o) {
33
+ if (!this.has(o)) {
34
+ super.add(o);
35
+ }
36
+ return this;
37
+ }
38
+ };
39
+ var BloomSet = class extends Set {
40
+ #bitArray;
41
+ #aSize;
42
+ #hashCount;
43
+ /**
44
+ * Creates a new `BloomSet` instance.
45
+ * @param iterable Optional: an iterable object with which to initialize the BloomSet.
46
+ * @param options Bloom filter configuration options.
47
+ * @param options.size The size of the Bloom filter's bit array. Defaults to 6553577.
48
+ * @param options.hashCount The number of hash functions to use. Defaults to 7.
49
+ * @throws TypeError If the input is not iterable.
50
+ */
51
+ constructor(iterable = [], options = {}) {
52
+ if (!Array.isArray(iterable) && !iterable[Symbol.iterator]) {
53
+ throw new TypeError("BloomSet requires an iterable");
54
+ }
55
+ super();
56
+ if (!options || typeof options !== "object") {
57
+ options = {};
58
+ }
59
+ options.hashCount ??= 7;
60
+ options.size ??= 6553577;
61
+ let { size, hashCount } = options;
62
+ if (typeof size !== "number" || size <= 0) {
63
+ size = 6553577;
64
+ }
65
+ this.#aSize = this.#findNextPrime(size);
66
+ if (typeof hashCount !== "number" || hashCount <= 0) {
67
+ hashCount = 7;
68
+ }
69
+ this.#hashCount = hashCount;
70
+ this.#bitArray = new Uint8Array(Math.ceil(size / 8));
71
+ for (const item of iterable) {
72
+ this.add(item);
73
+ }
74
+ }
75
+ /** @internal */
76
+ #findNextPrime(num) {
77
+ if (num < 2) return 2;
78
+ if (num % 2 === 0) num++;
79
+ while (!this.#isPrime(num)) {
80
+ num += 2;
81
+ }
82
+ return num;
83
+ }
84
+ /** @internal */
85
+ #isPrime(num) {
86
+ if (num < 2) return false;
87
+ if (num === 2 || num === 3) return true;
88
+ if (num % 2 === 0 || num % 3 === 0) return false;
89
+ const sqrt = Math.floor(Math.sqrt(num));
90
+ for (let i = 5; i <= sqrt; i += 6) {
91
+ if (num % i === 0 || num % (i + 2) === 0) return false;
92
+ }
93
+ return true;
94
+ }
95
+ /** @internal */
96
+ #serialize(item) {
97
+ if (typeof item === "number" && isNaN(item)) {
98
+ return "NaN";
99
+ }
100
+ if (item && typeof item === "object") {
101
+ const serialize = this.#serialize.bind(this);
102
+ if (Array.isArray(item)) {
103
+ return `[${item.map(serialize).join(",")}]`;
104
+ } else {
105
+ return `{${Object.entries(item).sort(([a], [b]) => a.localeCompare(b)).map(([k, v]) => `${k}:${serialize(v)}`).join(",")}}`;
106
+ }
107
+ }
108
+ return String(item);
109
+ }
110
+ /** @internal */
111
+ #hashes(item) {
112
+ const hashes = [];
113
+ const str = this.#serialize(item);
114
+ let hash = this.#fnv1a(str);
115
+ for (let i = 0; i < this.#hashCount; i++) {
116
+ hash %= this.#aSize;
117
+ hashes.push(hash);
118
+ hash = (hash ^ hash >>> 13) * 3266489909;
119
+ hash >>>= 0;
120
+ }
121
+ return hashes;
122
+ }
123
+ /** @internal */
124
+ #fnv1a(str) {
125
+ if (typeof str !== "string") {
126
+ str = String(str);
127
+ }
128
+ let hash = 2166136261;
129
+ for (let i = 0; i < str.length; i++) {
130
+ hash ^= str.charCodeAt(i);
131
+ hash = hash * 16777619 >>> 0;
132
+ }
133
+ return hash >>> 0;
134
+ }
135
+ /** @internal */
136
+ #setBits(hashes) {
137
+ for (const hash of hashes) {
138
+ const index = Math.floor(hash / 8);
139
+ const bit = hash % 8;
140
+ this.#bitArray[index] |= 1 << bit;
141
+ }
142
+ }
143
+ /** @internal */
144
+ #checkBits(hashes) {
145
+ for (const hash of hashes) {
146
+ const index = Math.floor(hash / 8);
147
+ const bit = hash % 8;
148
+ if (!(this.#bitArray[index] & 1 << bit)) {
149
+ return false;
150
+ }
151
+ }
152
+ return true;
153
+ }
154
+ /** Determines existence of an object in the BloomSet using the Bloom filter and deep equality */
155
+ has(o) {
156
+ const hashes = this.#hashes(o);
157
+ if (!this.#checkBits(hashes)) {
158
+ return false;
159
+ }
160
+ for (const i of this) {
161
+ if (equal(o, i)) {
162
+ return true;
163
+ }
164
+ }
165
+ return false;
166
+ }
167
+ /** Adds a new object to the BloomSet if it is not already present.
168
+ * @returns The `BloomSet` instance, allowing for chaining.
169
+ */
170
+ add(o) {
171
+ if (!this.has(o)) {
172
+ const hashes = this.#hashes(o);
173
+ this.#setBits(hashes);
174
+ super.add(o);
175
+ }
176
+ return this;
177
+ }
178
+ };
179
+ export {
180
+ BloomSet,
181
+ UniqueSet
182
+ };
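
The hashing path in `BloomSet` (serialize to a canonical string, then hash it) can be tried in isolation with the sketch below. `serialize` mirrors the logic of `#serialize` above. `fnv1a` is a deliberate variation: it uses `Math.imul` for an exact 32-bit multiply, whereas the diffed code multiplies with `*`, whose intermediate product can exceed 2^53 and be rounded, so the package's hash values may not match canonical FNV-1a. Both are standalone functions written for illustration, not exports of the package.

```js
// Canonical string form: object keys are sorted so key order does not matter.
function serialize(item) {
  if (typeof item === "number" && Number.isNaN(item)) return "NaN";
  if (item && typeof item === "object") {
    if (Array.isArray(item)) return `[${item.map((v) => serialize(v)).join(",")}]`;
    return `{${Object.entries(item)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${k}:${serialize(v)}`)
      .join(",")}}`;
  }
  return String(item);
}

// 32-bit FNV-1a over UTF-16 code units (assumption: Math.imul instead of `*`).
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis (2166136261)
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime (16777619)
  }
  return hash >>> 0; // unsigned 32-bit result
}

const first = serialize({ x: 1, y: [2, 3] });
const second = serialize({ y: [2, 3], x: 1 });
console.log(first); // {x:1,y:[2,3]}
console.log(first === second); // true
console.log(fnv1a("a").toString(16)); // e40c292c
```

Because `serialize` ends in `String(item)`, values such as `1` and `"1"` produce the same string and therefore the same bit positions. That only triggers the deep-equality fallback in `has`; it does not make the two values equal.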
package/index.ts ADDED
@@ -0,0 +1,212 @@
1
+ import equal from "fast-deep-equal/es6/index.js";
2
+
3
+ /** A `Set` extension that ensures uniqueness of items using deep equality checks. */
4
+ export class UniqueSet<T> extends Set<T> {
5
+ /*** @throws TypeError If the input is not iterable. */
6
+ constructor(iterable: Iterable<T> = []) {
7
+ if (!Array.isArray(iterable) && !iterable[Symbol.iterator]) {
8
+ throw new TypeError("UniqueSet requires an iterable");
9
+ }
10
+ super();
11
+ for (const item of iterable) {
12
+ this.add(item);
13
+ }
14
+ }
15
+ /**
16
+ * Determines whether an object is in the UniqueSet using deep equality.
17
+ * @param o The object to check for presence in the UniqueSet.
18
+ * @returns `true` if the object is found, `false` otherwise.
19
+ */
20
+ has(o: T): boolean {
21
+ for (const i of this) {
22
+ if (equal(o, i)) {
23
+ return true;
24
+ }
25
+ }
26
+ return false;
27
+ }
28
+ /**
29
+ * Adds a new object to the UniqueSet if it is not already present.
30
+ * @param o The object to add to the UniqueSet.
31
+ * @returns The `UniqueSet` instance, allowing for chaining.
32
+ */
33
+ add(o: T): this {
34
+ if (!this.has(o)) {
35
+ super.add(o);
36
+ }
37
+ return this;
38
+ }
39
+ }
40
+
41
+ /** A `Set` extension that uses a Bloom filter for fast existence checks combined with deep equality for accuracy. */
42
+ export class BloomSet<T> extends Set<T> {
43
+ #bitArray: Uint8Array;
44
+ #aSize: number;
45
+ #hashCount: number;
46
+ /**
47
+ * Creates a new `BloomSet` instance.
48
+ * @param iterable Optional: an iterable object with which to initialize the BloomSet.
49
+ * @param options Bloom filter configuration options.
50
+ * @param options.size The size of the Bloom filter's bit array. Defaults to 6553577.
51
+ * @param options.hashCount The number of hash functions to use. Defaults to 7.
52
+ * @throws TypeError If the input is not iterable.
53
+ */
54
+ constructor(
55
+ iterable: Iterable<T> = [],
56
+ options: { size?: number; hashCount?: number } = {}
57
+ ) {
58
+ if (!Array.isArray(iterable) && !iterable[Symbol.iterator]) {
59
+ throw new TypeError("BloomSet requires an iterable");
60
+ }
61
+ super();
62
+
63
+ if (!options || typeof options !== "object") {
64
+ options = {};
65
+ }
66
+
67
+ options.hashCount ??= 7;
68
+ options.size ??= 6553577;
69
+
70
+ let { size, hashCount } = options;
71
+
72
+ if (typeof size !== "number" || size <= 0) {
73
+ size = 6553577; // Targeting < 1 collision per 100,000 elements, ~819 KB memory, needs 7 hashes
74
+ }
75
+ this.#aSize = this.#findNextPrime(size);
76
+
77
+ if (typeof hashCount !== "number" || hashCount <= 0) {
78
+ hashCount = 7;
79
+ }
80
+ this.#hashCount = hashCount;
81
+ this.#bitArray = new Uint8Array(Math.ceil(size / 8));
82
+
83
+ for (const item of iterable) {
84
+ this.add(item);
85
+ }
86
+ }
87
+
88
+ /** @internal */
89
+ #findNextPrime(num: number) {
90
+ if (num < 2) return 2;
91
+ if (num % 2 === 0) num++; // Odd numbers only
92
+
93
+ while (!this.#isPrime(num)) {
94
+ num += 2; // Odd numbers only
95
+ }
96
+
97
+ return num;
98
+ }
99
+
100
+ /** @internal */
101
+ #isPrime(num: number) {
102
+ if (num < 2) return false;
103
+ if (num === 2 || num === 3) return true;
104
+ if (num % 2 === 0 || num % 3 === 0) return false;
105
+
106
+ const sqrt = Math.floor(Math.sqrt(num));
107
+ for (let i = 5; i <= sqrt; i += 6) {
108
+ if (num % i === 0 || num % (i + 2) === 0) return false;
109
+ }
110
+
111
+ return true;
112
+ }
113
+
114
+ /** @internal */
115
+ #serialize(item: T | number | object): string {
116
+ if (typeof item === "number" && isNaN(item)) {
117
+ return "NaN";
118
+ }
119
+
120
+ if (item && typeof item === "object") {
121
+ const serialize = this.#serialize.bind(this);
122
+ if (Array.isArray(item)) {
123
+ return `[${item.map(serialize).join(",")}]`;
124
+ } else {
125
+ return `{${Object.entries(item)
126
+ .sort(([a], [b]) => a.localeCompare(b))
127
+ .map(([k, v]) => `${k}:${serialize(v)}`)
128
+ .join(",")}}`;
129
+ }
130
+ }
131
+
132
+ return String(item);
133
+ }
134
+
135
+ /** @internal */
136
+ #hashes(item: T) {
137
+ const hashes: number[] = [];
138
+ const str = this.#serialize(item);
139
+ let hash = this.#fnv1a(str); // Base hash
140
+
141
+ // Bloom into hashCount hash values
142
+ for (let i = 0; i < this.#hashCount; i++) {
143
+ hash %= this.#aSize; // Ensure within bounds
144
+ // Track
145
+ hashes.push(hash);
146
+ // Modify
147
+ hash = (hash ^ (hash >>> 13)) * 0xc2b2ae35;
148
+ hash >>>= 0; // Ensure unsigned 32-bit integer
149
+ }
150
+
151
+ return hashes;
152
+ }
153
+
154
+ /** @internal */
155
+ #fnv1a(str: string) {
156
+ if (typeof str !== "string") {
157
+ str = String(str);
158
+ }
159
+ let hash = 2166136261; // FNV offset basis for 32-bit
160
+ for (let i = 0; i < str.length; i++) {
161
+ hash ^= str.charCodeAt(i);
162
+ hash = (hash * 16777619) >>> 0; // Multiply by the FNV prime and ensure 32-bit unsigned
163
+ }
164
+ return hash >>> 0;
165
+ }
166
+
167
+ /** @internal */
168
+ #setBits(hashes: number[]): void {
169
+ for (const hash of hashes) {
170
+ const index = Math.floor(hash / 8);
171
+ const bit = hash % 8;
172
+ this.#bitArray[index]! |= 1 << bit;
173
+ }
174
+ }
175
+
176
+ /** @internal */
177
+ #checkBits(hashes: number[]): boolean {
178
+ for (const hash of hashes) {
179
+ const index = Math.floor(hash / 8);
180
+ const bit = hash % 8;
181
+ if (!(this.#bitArray[index]! & (1 << bit))) {
182
+ return false;
183
+ }
184
+ }
185
+ return true;
186
+ }
187
+ /** Determines existence of an object in the BloomSet using the Bloom filter and deep equality */
188
+ has(o: T): boolean {
189
+ const hashes = this.#hashes(o);
190
+ if (!this.#checkBits(hashes)) {
191
+ return false; // Definitely not in the set
192
+ }
193
+ // Fall back to fast-deep-equal for false positives
194
+ for (const i of this) {
195
+ if (equal(o, i)) {
196
+ return true;
197
+ }
198
+ }
199
+ return false;
200
+ }
201
+ /** Adds a new object to the BloomSet if it is not already present.
202
+ * @returns The `BloomSet` instance, allowing for chaining.
203
+ */
204
+ add(o: T): this {
205
+ if (!this.has(o)) {
206
+ const hashes = this.#hashes(o);
207
+ this.#setBits(hashes);
208
+ super.add(o);
209
+ }
210
+ return this;
211
+ }
212
+ }
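Note on the bit array sizing in `BloomSet`: indices are reduced modulo `#aSize` (the next prime at or above `size`), while the byte array is allocated from `size`. For some inputs the prime exceeds the byte-rounded capacity. A requested size of 24, for example, yields the prime 29 but only 3 bytes (24 bits), so indices 24 to 28 would appear to fall outside the array. The following is a minimal standalone sketch, not the package's implementation, of allocating from the modulus instead. `TinyBloom`, `indices`, and `mightContain` are hypothetical names, and `Math.imul` is swapped in for the plain multiplication used in the diff so the 32-bit arithmetic stays exact.

```typescript
// Hypothetical standalone sketch; names are illustrative, not the package's API.

// 32-bit FNV-1a over UTF-16 code units (Math.imul keeps the multiply exact).
function fnv1a(str: string): number {
  let hash = 2166136261; // 32-bit FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0; // multiply by the FNV prime
  }
  return hash >>> 0;
}

class TinyBloom {
  readonly modulus: number;
  readonly hashCount: number;
  readonly bits: Uint8Array;

  constructor(modulus: number, hashCount: number) {
    this.modulus = modulus;
    this.hashCount = hashCount;
    // Allocate from the modulus so every index in [0, modulus) has a bit.
    this.bits = new Uint8Array(Math.ceil(modulus / 8));
  }

  // Derive hashCount indices from one base hash, as the diff's #hashes does.
  indices(key: string): number[] {
    const out: number[] = [];
    let hash = fnv1a(key);
    for (let i = 0; i < this.hashCount; i++) {
      hash %= this.modulus; // keep within [0, modulus)
      out.push(hash);
      hash = Math.imul(hash ^ (hash >>> 13), 0xc2b2ae35) >>> 0; // remix
    }
    return out;
  }

  add(key: string): void {
    for (const h of this.indices(key)) {
      const byte = h >> 3;
      this.bits[byte] = (this.bits[byte] ?? 0) | (1 << (h & 7));
    }
  }

  // false means "definitely absent"; true means "possibly present".
  mightContain(key: string): boolean {
    return this.indices(key).every(
      (h) => ((this.bits[h >> 3] ?? 0) & (1 << (h & 7))) !== 0
    );
  }
}

const filter = new TinyBloom(29, 7); // 29 is the next prime above 24
filter.add("foo");
console.log(filter.bits.length, filter.mightContain("foo"));
```

With the array sized this way, a key that has been added can never be reported as absent, which is the property the `has` method relies on before falling back to `fast-deep-equal`.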
package/package.json CHANGED
@@ -1,11 +1,18 @@
1
1
  {
2
2
  "name": "@sepiariver/unique-set",
3
- "version": "1.1.1",
4
- "description": "Extends the has and add methods on the native JavaScript Set object to deeply compare using fast-deep-equal",
5
- "main": "index.js",
3
+ "version": "2.0.1",
4
+ "description": "Extends the native Set class to deeply compare using fast-deep-equal, with optional Bloom filter optimization. This version exports 2 classes instead of a default, breaking b/c with version 1.",
5
+ "main": "dist/index.js",
6
+ "module": "dist/index.mjs",
7
+ "types": "dist/index.d.mts",
8
+ "exports": {
9
+ "require": "./dist/index.js",
10
+ "import": "./dist/index.mjs"
11
+ },
6
12
  "scripts": {
7
- "test": "jest",
8
- "build": "babel src -d dist"
13
+ "test": "npm run build && vitest",
14
+ "lint": "tsc",
15
+ "build": "tsup index.ts --format esm --dts"
9
16
  },
10
17
  "repository": {
11
18
  "type": "git",
@@ -14,7 +21,7 @@
14
21
  "keywords": [
15
22
  "Set",
16
23
  "unique",
17
- "object",
24
+ "bloom",
18
25
  "deep",
19
26
  "compare",
20
27
  "equal"
@@ -29,10 +36,8 @@
29
36
  "fast-deep-equal": "^3.1.3"
30
37
  },
31
38
  "devDependencies": {
32
- "@babel/cli": "^7.17.6",
33
- "@babel/core": "^7.17.8",
34
- "@babel/preset-env": "^7.16.11",
35
- "babel-jest": "^27.5.1",
36
- "jest": "^27.5.1"
39
+ "tsup": "^8.3.5",
40
+ "typescript": "^5.7.2",
41
+ "vitest": "^2.1.8"
37
42
  }
38
43
  }
package/temp.cjs ADDED
@@ -0,0 +1,10 @@
1
+ const { BloomSet, UniqueSet } = require("./dist/index.js");
2
+
3
+ const bloom = new BloomSet();
4
+ bloom.add("foo");
5
+ console.log(bloom.has("foo")); // true
6
+
7
+ const unique = new UniqueSet();
8
+ unique.add("foo");
9
+ unique.add("foo");
10
+ console.log(unique.size); // 1
package/temp.mjs ADDED
@@ -0,0 +1,10 @@
1
+ import { BloomSet, UniqueSet } from "./dist/index.js";
2
+
3
+ const bloom = new BloomSet();
4
+ bloom.add("foo");
5
+ console.log(bloom.has("foo")); // true
6
+
7
+ const unique = new UniqueSet();
8
+ unique.add("foo");
9
+ unique.add("foo");
10
+ console.log(unique.size); // 1
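Both smoke scripts exercise only string values. For objects, the Bloom pre-check in `index.ts` depends on the serializer producing the same string for deeply equal values regardless of key order. The sketch below illustrates that idea in isolation. `canonical` is a hypothetical helper modelled on the private `#serialize` method, and it sorts keys with a plain comparison rather than `localeCompare`.

```typescript
// Hypothetical helper modelled on the diff's private #serialize method.
function canonical(item: unknown): string {
  if (typeof item === "number" && Number.isNaN(item)) return "NaN";
  if (item !== null && typeof item === "object") {
    if (Array.isArray(item)) {
      // Arrays keep their order; each element is serialized recursively.
      return `[${item.map((v) => canonical(v)).join(",")}]`;
    }
    // Plain objects: sort entries by key so insertion order does not matter.
    const body = Object.entries(item as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${k}:${canonical(v)}`)
      .join(",");
    return `{${body}}`;
  }
  return String(item);
}

const left = canonical({ a: 1, b: [2, { d: 4, c: 3 }] });
const right = canonical({ b: [2, { c: 3, d: 4 }], a: 1 });
console.log(left === right); // true: key order is normalized

// Distinct values can still share a string, e.g. the number 1 and the string "1".
console.log(canonical(1) === canonical("1")); // true
```

Collisions such as `1` versus `"1"` only produce false positives in the filter, which is why `has` still scans with `fast-deep-equal` after the bit check passes.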
package/tsconfig.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "compilerOptions": {
3
+ /* Language and Environment */
4
+ "target": "ES2023", /* Set the JavaScript language version for emitted JavaScript and include compatible library declarations. */
5
+
6
+ /* Modules */
7
+ "module": "ES2022", /* Specify what module code is generated. */
8
+ "removeComments": false, /* Disable emitting comments. */
9
+ "moduleResolution": "bundler",
10
+ // "importHelpers": true, /* Allow importing helper functions from tslib once per project, instead of including them per-file. */
11
+ // "downlevelIteration": true, /* Emit more compliant, but verbose and less performant JavaScript for iteration. */
12
+ /* Interop Constraints */
13
+ "esModuleInterop": true, /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
14
+ "forceConsistentCasingInFileNames": true, /* Ensure that casing is correct in imports. */
15
+
16
+ /* Type Checking */
17
+ "strict": true, /* Enable all strict type-checking options. */
18
+ /* Completeness */
19
+ "skipLibCheck": true, /* Skip type checking all .d.ts files. */
20
+ "noUncheckedIndexedAccess": true,
21
+ "noEmit": true
22
+ }
23
+ }
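The `noUncheckedIndexedAccess` flag above is the likely reason `index.ts` writes `this.#bitArray[index]!`: with the flag on, indexed reads are typed as possibly `undefined`, and the `!` asserts otherwise. A small sketch of what that means at runtime, using a hypothetical `readBit` helper that defaults instead of asserting:

```typescript
// Hypothetical helper; shows the runtime behaviour the flag is warning about.
const bytes = new Uint8Array(2); // 16 bits, all zero

const inRange: number | undefined = bytes[1]; // 0
const outOfRange: number | undefined = bytes[5]; // undefined at runtime

// Read one bit, treating out-of-range bytes as zero instead of asserting with `!`.
function readBit(arr: Uint8Array, bitIndex: number): boolean {
  const byte = arr[bitIndex >> 3] ?? 0;
  return (byte & (1 << (bitIndex & 7))) !== 0;
}

bytes[0] = 0b00000100; // set bit 2
console.log(inRange, outOfRange, readBit(bytes, 2), readBit(bytes, 40));
```

The non-null assertion silences the compiler but does not change what an out-of-range read returns, so it is only safe when every index is known to be within the allocated array.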
package/index.js DELETED
@@ -1 +0,0 @@
1
- module.exports = require("./dist");
package/src/index.js DELETED
@@ -1,22 +0,0 @@
1
- import equal from "fast-deep-equal";
2
-
3
- class UniqueSet extends Set {
4
- constructor(...args) {
5
- super(...args);
6
- }
7
- has(o) {
8
- for (const i of this) {
9
- if (equal(o, i)) {
10
- return true;
11
- }
12
- }
13
- return false;
14
- }
15
- add(o) {
16
- if (!this.has(o)) {
17
- Set.prototype.add.call(this, o);
18
- }
19
- }
20
- }
21
-
22
- module.exports = UniqueSet;
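The removed v1 constructor forwarded its arguments to `super(...args)`, whereas the v2 constructors call `super()` with no arguments and then add items in a loop. One plausible reason, at least for `BloomSet`, is that the native `Set` constructor calls the overridden `add` while `super()` is still running, before the subclass's private fields exist. The sketch below reproduces that situation with two hypothetical classes, `EagerSet` and `DeferredSet`, that are not part of the package.

```typescript
// Hypothetical classes for illustration only; not part of the package.
class EagerSet<T> extends Set<T> {
  #count = 0;
  constructor(iterable: Iterable<T> = []) {
    super(iterable); // native Set calls this.add() here, before #count exists
  }
  add(o: T): this {
    this.#count++; // throws TypeError when reached from inside super(iterable)
    return super.add(o);
  }
  get count(): number {
    return this.#count;
  }
}

class DeferredSet<T> extends Set<T> {
  #count = 0;
  constructor(iterable: Iterable<T> = []) {
    super(); // create an empty Set first
    for (const item of iterable) {
      this.add(item); // private state is initialized by now
    }
  }
  add(o: T): this {
    this.#count++;
    return super.add(o);
  }
  get count(): number {
    return this.#count;
  }
}

// Report whether a constructor call succeeds or fails with a TypeError.
function tryBuild(build: () => unknown): string {
  try {
    build();
    return "ok";
  } catch (e) {
    return e instanceof TypeError ? "TypeError" : "other";
  }
}

const eagerResult = tryBuild(() => new EagerSet([1, 2]));
const deferred = new DeferredSet([1, 2, 2]);
console.log(eagerResult, deferred.size, deferred.count);
```

A second behavioural difference visible in the diff is that v1's `add` returned nothing, while v2's `add` returns `this`, so chained calls such as `set.add(a).add(b)` only work from version 2 onward.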