hyll 0.1.1 → 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +102 -0
- data/README.md +132 -18
- data/examples/redis_comparison_benchmark.rb +539 -0
- data/examples/v1_benchmark.rb +93 -0
- data/lib/hyll/algorithms/enhanced_hyperloglog.rb +240 -119
- data/lib/hyll/algorithms/hyperloglog.rb +263 -327
- data/lib/hyll/constants.rb +75 -0
- data/lib/hyll/utils/hash.rb +132 -21
- data/lib/hyll/utils/math.rb +136 -66
- data/lib/hyll/version.rb +1 -1
- metadata +4 -2
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 92ad297dec2242d67ec3d5fb377658db11bb68fde8b26856f4bdf86f4d28fdc9
+data.tar.gz: b8cb857f998f3eaee640d8958abf2ca9696488ea3517709eb4dc231643b89865
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 97ee62fadbb6c31e90b9a2b711347af4651a886e220906060c0a41402fe2ddece0d06d3e362d9cf1b8cc5ccee1ad0bee2accc156b39a4b9a6551e8b0ee5c97e4
+data.tar.gz: 82fc6e4fa956a1c0aa9ab7e70faa35cb36ab5687d65cd9907b46faee23090922e21e1f2005371e7c5fcc1657764bc8b704347962a569b319e43f57fa380cf7ea
```
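The `checksums.yaml` entries record SHA256 and SHA512 digests of the `metadata.gz` and `data.tar.gz` members of the 1.0.0 gem. Below is a minimal sketch of checking an extracted member against one of these digests with Ruby's standard `digest` library. `verify_digest` is a hypothetical helper. The path in the usage comment assumes you have already unpacked the `.gem` archive, which is a plain tar file.

```ruby
require "digest"

# Hypothetical helper: check a file against a published hex digest.
# algorithm is :sha256 or :sha512, matching the two sections of checksums.yaml.
def verify_digest(path, expected_hex, algorithm: :sha256)
  digest_class = algorithm == :sha512 ? Digest::SHA512 : Digest::SHA256
  digest_class.file(path).hexdigest == expected_hex.downcase
end

# Usage, assuming data.tar.gz was first extracted with `tar -xf hyll-1.0.0.gem`:
# verify_digest("data.tar.gz",
#               "b8cb857f998f3eaee640d8958abf2ca9696488ea3517709eb4dc231643b89865")
```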
data/CHANGELOG.md
CHANGED
```diff
@@ -5,6 +5,108 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.0] - 2025-11-28
+
+### 🚀 MAJOR PERFORMANCE RELEASE - "Blazing Fast Edition"
+
+This release marks the first stable version of Hyll with a complete performance overhaul, delivering significant speed improvements through optimizations at every level of the codebase.
+
+### Performance Highlights
+
+- **2x faster batch operations** through optimized `add_all` with chunk processing
+- **Faster cardinality estimation** via pre-computed lookup tables
+- **Reduced memory allocations** in hot paths
+- **O(1) bit operations** using byte-level lookup tables for CLZ
+
+### Added
+
+#### Pre-computed Lookup Tables (Constants)
+- `POW2_NEG_TABLE`: Pre-computed 2^(-n) values for n=0..64, eliminating expensive power calculations
+- `POW2_TABLE`: Pre-computed 2^n values for instant bit shifting
+- `CLZ8_TABLE`: 256-entry byte-level count-leading-zeros table for O(1) CLZ operations
+- `LOG2_TABLE`: Pre-computed log2 values for common register counts
+- `REGISTER_MASKS`: Pre-computed masks for register extraction (precision 4-16)
+- `ALPHA_M_SQUARED`: Pre-computed α×m² values for each precision level
+- `OPTIMAL_BATCH_SIZE`: Tuned batch size (1024) for optimal cache utilization
+- Inlined MurmurHash3 constants for maximum speed
+
+#### Ultra-Fast Hash Functions
+- `murmurhash3_batch`: Batch hashing for multiple elements with amortized overhead
+- `hash_and_extract`: Combined hash + HLL extraction in single pass
+- `fast_clz32`: O(1) 32-bit count leading zeros using 256-entry byte lookup table
+- Loop unrolling for 4-byte block processing in MurmurHash3
+- Optimized tail byte handling
+
+#### Optimized Math Utilities
+- `pow2_neg`: O(1) power of 2 negative lookup
+- `sum_pow2_neg`: Vectorized sum of 2^(-v) for register arrays
+- `alpha_m_squared`: Pre-computed α×m² retrieval
+- `harmonic_mean_sum`: Optimized harmonic mean for register values
+- Cached Taylor series coefficients for h(x) calculation
+
+#### HyperLogLog Core Optimizations
+- `@register_mask`: Pre-computed bitmask for register index extraction
+- `@alpha_m_squared`: Pre-computed estimation constant
+- `@pow2_neg_table`: Instance-level table reference for cache locality
+- `add_to_registers_fast`: Inlined hash + update path
+- `update_register_fast`: Minimized branching with overflow handling
+- `get_register_value_fast`: Optimized nibble extraction with bit operations
+- `set_register_value_fast`: Direct nibble setting without conditionals
+- `extract_counts_fast`: Single-pass register value counting
+- Batch-optimized `add_all` with chunk processing
+
+#### EnhancedHyperLogLog Optimizations
+- `modification_probability_fast`: Cached probability calculation
+- `@cached_mod_prob`: Modification probability cache with dirty flag
+- `@registers_dirty`: Change tracking for cache invalidation
+- `merge_dense_registers_optimized`: Direct register comparison without allocation
+- `adjust_registers_for_estimation`: Non-mutating register adjustment
+- `compute_cardinality_from_registers`: Separated estimation logic
+- Eliminated `@registers.dup` in hot paths
+
+### Changed
+
+- **Bit operations**: Replaced `index / 2` with `index >> 1` and `index % 2` with `index & 1`
+- **Power calculations**: All `2.0 ** -x` replaced with lookup table access
+- **Register initialization**: `(@m + 1) >> 1` instead of `(@m / 2.0).ceil`
+- **Loop optimization**: Integer iteration with early-exit patterns
+- **Memory allocation**: Minimal object creation in hot paths
+- **Method inlining**: Critical paths inlined to reduce call overhead
+
+### Fixed
+
+- Edge case in `linear_counting` when zero_registers >= m
+- Potential division by zero in modification probability
+- Memory leak in streaming estimate accumulation
+
+### Internal
+
+- Backward-compatible aliases for all renamed methods
+- Comprehensive documentation for all new performance constants
+- Type annotations preserved in signature files
+
+## [0.2.0] - 2025-03-24
+
+### Added
+- Associativity test for HyperLogLog merges to ensure (A ∪ B) ∪ C = A ∪ (B ∪ C)
+- Guards against invalid inputs in mathematical functions
+- Memoization for h_values calculation in MLE algorithm
+- More comprehensive error handling with safeguards
+- Added `examples/redis_comparison_benchmark.rb` for Redis comparison
+
+### Changed
+- Optimized Maximum Likelihood Estimation (MLE) algorithm for better performance
+- Improved numerical stability in the secant method implementation
+- Enhanced Taylor series calculation for better accuracy
+- Fixed Math module namespace conflicts
+- Added safeguards against division by zero and other numerical errors
+- Limited maximum iterations in convergence algorithms to prevent infinite loops
+
+### Fixed
+- Addressed potential numerical instability in calculate_h_values method
+- Fixed undefined method 'exp' error by using global namespace operator
+- Improved edge case handling in the MLE algorithm
+
 ## [0.1.1] - 2025-03-21
 
 ### Changed
```
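The "Changed" entries above describe 4-bit register packing. `index >> 1` picks the byte, `index & 1` picks the nibble, and `(@m + 1) >> 1` sizes the byte array. The sketch below shows that arithmetic under some assumptions. `NibbleRegisters` is a hypothetical class, and it assumes even indexes use the low nibble. It also clamps values above 15 instead of using the overflow handling the changelog mentions for `update_register_fast`.

```ruby
# Hypothetical sketch of 4-bit register packing; not Hyll's source code.
class NibbleRegisters
  MAX_VALUE = 0x0F # largest value a 4-bit register can hold

  attr_reader :bytes

  def initialize(m)
    # Two registers per byte, rounded up: equivalent to (m / 2.0).ceil
    @bytes = Array.new((m + 1) >> 1, 0)
  end

  def get(index)
    shift = (index & 1) << 2 # assumed layout: even index -> low nibble, odd -> high
    (@bytes[index >> 1] >> shift) & 0x0F
  end

  def set(index, value)
    value = MAX_VALUE if value > MAX_VALUE # this sketch clamps; it does not track overflow
    shift = (index & 1) << 2
    pos = index >> 1
    # Clear the target nibble, then OR in the new value (no branch on index parity)
    @bytes[pos] = (@bytes[pos] & ~(0x0F << shift)) | (value << shift)
  end

  # HyperLogLog registers only grow, so an update keeps the larger value.
  def update(index, value)
    set(index, value) if value > get(index)
  end
end

regs = NibbleRegisters.new(5) # 5 registers fit in 3 bytes
regs.update(0, 3)
regs.update(1, 7)
regs.bytes[0] # => 115 (0x73): register 1 in the high nibble, register 0 in the low
```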
data/README.md
CHANGED
```diff
@@ -1,11 +1,32 @@
 # Hyll
 
+
+
 [](https://github.com/davidesantangelo/hyll/actions)
 
-Hyll is a Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
+Hyll is a **blazing-fast** Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
 
 > The name "Hyll" is a shortened form of "HyperLogLog", keeping the characteristic "H" and "LL" sounds.
 
+## 🚀 Version 1.0.0 - Blazing Fast Edition
+
+Version 1.0.0 marks the first stable release with a **complete performance overhaul**:
+
+| Improvement | Description |
+|-------------|-------------|
+| Batch Operations | **2x faster** with optimized chunk processing |
+| Lookup Tables | O(1) power-of-2 and CLZ operations |
+| Memory Efficiency | Reduced allocations in hot paths |
+| Hash Function | Inlined MurmurHash3 with loop unrolling |
+
+### Key Optimizations
+
+- **Pre-computed Lookup Tables**: All power-of-2 calculations use O(1) table lookups
+- **Fast CLZ**: 256-entry byte-level lookup table for count-leading-zeros
+- **Optimized MurmurHash3**: Loop unrolling and tail optimization
+- **Cached Computations**: α×m², modification probability, register masks pre-computed
+- **Batch Processing**: Chunk-based `add_all` for better performance
+
 ## Installation
 
 Add this line to your application's Gemfile:
```
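The "Fast CLZ" item refers to a 256-entry byte lookup table for count-leading-zeros. The sketch below shows how such a table, and a 32-bit CLZ built on it, can work. `CLZ8_TABLE` and `fast_clz32` borrow their names from the changelog, but the bodies are an illustrative reconstruction, not Hyll's code.

```ruby
# Hypothetical reconstruction; names echo the changelog, the code is illustrative.
# CLZ8_TABLE[b] = number of leading zero bits in the 8-bit value b (8 when b == 0).
CLZ8_TABLE = Array.new(256) { |b| 8 - b.bit_length }.freeze

# Count leading zeros of a 32-bit unsigned value with at most four table lookups.
def fast_clz32(x)
  x &= 0xFFFF_FFFF
  return 32 if x.zero?

  # Find the highest non-zero byte, then look up zeros inside that byte.
  if (x >> 24) != 0
    CLZ8_TABLE[x >> 24]
  elsif (x >> 16) != 0
    8 + CLZ8_TABLE[x >> 16]
  elsif (x >> 8) != 0
    16 + CLZ8_TABLE[x >> 8]
  else
    24 + CLZ8_TABLE[x]
  end
end

fast_clz32(1)           # => 31
fast_clz32(0x0001_0000) # => 15
```

In HyperLogLog, each register stores a rank. The rank is the number of leading zeros in the hash bits left over after the register index is taken, plus one, so this CLZ runs once per element. In Ruby, `32 - x.bit_length` gives the same answer directly. The table form mirrors what lower-level implementations do.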
````diff
@@ -46,6 +67,19 @@ hll.add("apple") # Duplicates don't affect the cardinality
 puts hll.cardinality # Output: approximately 3
 ```
 
+### Batch Operations (Optimized in 1.0.0)
+
+```ruby
+# Efficient batch adding with chunk processing
+hll = Hyll.new(precision: 12)
+
+# Add many elements efficiently
+elements = (1..100_000).map { |i| "element-#{i}" }
+hll.add_all(elements)
+
+puts hll.cardinality # Estimated count
+```
+
 ### With Custom Precision
 
 ```ruby
````
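The batch snippet above calls `add_all`, which the changelog describes as chunk-processed with a batch size of 1024. The toy counter below is a self-contained sketch of a dense HyperLogLog with a chunked `add_all`. It rests on these assumptions:

- `ToyHLL` is hypothetical.
- MD5 truncated to 64 bits stands in for Hyll's MurmurHash3.
- Registers live in a plain Array instead of packed nibbles.

In this Ruby sketch, slicing alone does not make anything faster; it only shows the shape of chunked processing.

```ruby
require "digest"

# Hypothetical toy HyperLogLog illustrating a chunked add_all; not Hyll's implementation.
class ToyHLL
  BATCH_SIZE = 1024 # mirrors the OPTIMAL_BATCH_SIZE value quoted in the changelog

  def initialize(precision: 12)
    @m = 1 << precision
    @registers = Array.new(@m, 0)
    @low_bits = 64 - precision          # hash bits left after the register index
    @low_mask = (1 << @low_bits) - 1
  end

  def add(element)
    # Assumption: MD5 truncated to 64 bits stands in for MurmurHash3.
    h = Digest::MD5.digest(element.to_s).unpack1("Q>")
    index = h >> @low_bits                            # top bits select the register
    rank = @low_bits - (h & @low_mask).bit_length + 1 # leading zeros of the rest, plus one
    @registers[index] = rank if rank > @registers[index]
    self
  end

  # Walk the input in fixed-size slices and run a tight loop over each slice.
  def add_all(elements)
    elements.each_slice(BATCH_SIZE) { |chunk| chunk.each { |e| add(e) } }
    self
  end

  def cardinality
    estimate = alpha * @m * @m / @registers.sum { |r| 2.0**-r }
    zeros = @registers.count(0)
    # Small-range correction (linear counting) from the original HyperLogLog paper
    estimate = @m * Math.log(@m.to_f / zeros) if estimate <= 2.5 * @m && zeros.positive?
    estimate.round
  end

  private

  def alpha
    case @m
    when 16 then 0.673
    when 32 then 0.697
    when 64 then 0.709
    else 0.7213 / (1 + 1.079 / @m)
    end
  end
end

hll = ToyHLL.new(precision: 12)
hll.add_all((1..100_000).map { |i| "element-#{i}" })
puts hll.cardinality
```

With precision 12 there are 4,096 registers. The typical error is therefore about 1.04 / √4096 ≈ 1.6%, so the printed estimate should land close to 100,000.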
```diff
@@ -204,18 +238,17 @@ This table compares different configurations of the HyperLogLog algorithm:
 | K-Minimum Values | Medium | High (Approximate) | Yes | Medium | High accuracy cardinality estimation, set operations |
 | Bloom Filter | Medium | N/A (Membership) | No (Cardinality) / Yes (Union) | Low | Membership testing with false positives, not cardinality |
 
-### Benchmark Results
+### Benchmark Results (v1.0.0)
 
-Below are
+Below are performance measurements from an Apple Mac Mini M4:
 
 | Operation | Implementation | Time (seconds) | Items/Operations |
 | ----------------------- | -------------------- | -------------- | ---------------- |
-| Element Addition | Standard HyperLogLog | 0.
-| Element Addition | EnhancedHyperLogLog
-
-| Cardinality Calculation |
-
-| Deserialization | Standard HyperLogLog | 0.0005 | 10 operations |
+| Element Addition | Standard HyperLogLog | 0.15 | 100,000 items |
+| Element Addition | EnhancedHyperLogLog | 0.16 | 100,000 items |
+| Batch Addition | Standard HyperLogLog | 0.075 | 100,000 items |
+| Cardinality Calculation | Standard HyperLogLog | 0.07 | 1,000 calls |
+| Hash Function | MurmurHash3 | 0.05 | 100,000 hashes |
 
 #### Memory Efficiency
 
```
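The new table rows can be turned into throughput figures and checked against the "2x faster batch operations" claim in the changelog: 100,000 items take 0.15 s one at a time versus 0.075 s in a batch. Below is a small sketch of that arithmetic, plus Ruby's standard `Benchmark.realtime` for timing a workload of your own. `throughput` is a hypothetical helper, and the timed block is only a placeholder.

```ruby
require "benchmark"

# Items per second for one row of the benchmark table.
def throughput(items, seconds)
  items / seconds.to_f
end

single = throughput(100_000, 0.15)  # Element Addition, Standard HyperLogLog
batch  = throughput(100_000, 0.075) # Batch Addition, Standard HyperLogLog
puts format("single: %.0f items/s, batch: %.0f items/s, speedup: %.1fx", single, batch, batch / single)
# prints "single: 666667 items/s, batch: 1333333 items/s, speedup: 2.0x"

# Timing your own workload the same way; the block body is only a placeholder.
elapsed = Benchmark.realtime { 100_000.times { |i| "element-#{i}" } }
puts format("placeholder workload: %.3f s", elapsed)
```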
````diff
@@ -226,15 +259,87 @@ Below are actual performance measurements from an Apple Mac Mini M4 with 24GB RA
 
 These benchmarks demonstrate HyperLogLog's exceptional memory efficiency, maintaining a compression ratio of over 6,250x compared to storing the raw elements, while still providing accurate cardinality estimates.
 
+## Benchmark Comparison with Redis
+
+Hyll has been benchmarked against Redis' HyperLogLog implementation to provide a comparison with a widely-used production system. The tests were run on an Apple Silicon M1 Mac using Ruby 3.1.4 with 10,000 elements and a precision of 10.
+
+### Insertion Performance
+
+| Implementation | Operations/sec | Relative Performance |
+|----------------|---------------:|---------------------:|
+| Hyll Standard | 86.32 | 1.00x (fastest) |
+| Hyll Batch | 85.98 | 1.00x |
+| Redis Pipelined| 20.51 | 4.21x slower |
+| Redis PFADD | 4.93 | 17.51x slower |
+| Hyll Enhanced | 1.20 | 71.87x slower |
+
+### Cardinality Estimation Performance
+
+| Implementation | Operations/sec | Relative Performance |
+|---------------------|---------------:|---------------------:|
+| Redis PFCOUNT | 53,131 | 1.00x (fastest) |
+| Hyll Enhanced Stream| 24,412 | 2.18x slower |
+| Hyll Enhanced | 8,843 | 6.01x slower |
+| Hyll Standard | 8,538 | 6.22x slower |
+| Hyll MLE | 5,645 | 9.41x slower |
+
+### Merge Performance
+
+| Implementation | Operations/sec | Relative Performance |
+|----------------|---------------:|---------------------:|
+| Redis PFMERGE | 12,735 | 1.00x (fastest) |
+| Hyll Enhanced | 6,523 | 1.95x slower |
+| Hyll Standard | 2,932 | 4.34x slower |
+
+### Memory Usage
+
+| Implementation | Memory Usage |
+|------------------|-------------:|
+| Hyll Enhanced | 0.28 KB |
+| Hyll Standard | 18.30 KB |
+| Redis | 12.56 KB |
+| Raw Elements | 0.04 KB |
+
+### Accuracy Comparison
+
+| Implementation | Estimated Count | Actual Count | Error |
+|---------------------|----------------:|-------------:|---------:|
+| Redis | 9,990 | 10,000 | 0.10% |
+| Hyll Enhanced | 3,018 | 10,000 | 69.82% |
+| Hyll (High Prec) | 19,016 | 10,000 | 90.16% |
+| Hyll Standard | 32,348 | 10,000 | 223.48% |
+| Hyll Enhanced Stream| 8,891,659 | 10,000 | 88,816.59% |
+| Hyll MLE | 19,986,513 | 10,000 | 199,765.13% |
+
+### Summary of Findings
+
+- **Insertion Performance**: Hyll Standard and Batch operations are significantly faster than Redis for adding elements.
+- **Cardinality Estimation**: Redis has the fastest cardinality estimation, with Hyll Enhanced Stream as a close second.
+- **Merge Operations**: Redis outperforms Hyll for merging HyperLogLog sketches, but Hyll Enhanced provides competitive performance.
+- **Memory Usage**: Hyll Enhanced offers the most memory-efficient implementation.
+- **Accuracy**: Redis provides the best accuracy in this test scenario.
+
+#### Recommendation
+
+For most use cases, Redis offers an excellent balance of accuracy and performance. However, Hyll provides superior insertion performance and memory efficiency, making it a good choice for scenarios where these attributes are prioritized.
+
+You can run these benchmarks yourself using the included script:
+
+```ruby
+ruby examples/redis_comparison_benchmark.rb
+```
+
 ## Features
 
-- Standard HyperLogLog implementation with customizable precision
+- Standard HyperLogLog implementation with customizable precision (4-16)
+- **Pre-computed lookup tables** for O(1) power-of-2 and CLZ operations
 - Memory-efficient register storage with 4-bit packing (inspired by Facebook's Presto implementation)
-- Sparse representation for small cardinalities
+- Sparse representation for small cardinalities (exact counting)
 - Dense representation for larger datasets
 - EnhancedHyperLogLog format for compatibility with other systems
 - Streaming martingale estimator for improved accuracy with EnhancedHyperLogLog
-- Maximum Likelihood Estimation for improved accuracy
+- Maximum Likelihood Estimation (MLE) for improved accuracy
+- **Optimized batch processing** with `add_all`
 - Merge and serialization capabilities
 - Factory pattern for creating and deserializing counters
 
````
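The Error column in the accuracy table is |estimate - actual| / actual, expressed as a percentage. For context, the usual HyperLogLog standard error is about 1.04 / √m for m = 2^precision registers. That is roughly 3.25% at the precision of 10 used in these runs. Below is a sketch of both calculations; the helper names are hypothetical.

```ruby
# Relative error in percent, matching the "Error" column of the accuracy table.
def relative_error_pct(estimated, actual)
  (estimated - actual).abs * 100.0 / actual
end

# Approximate HyperLogLog standard error in percent for m = 2**precision registers.
def expected_std_error_pct(precision)
  104.0 / Math.sqrt(1 << precision)
end

{ "Redis" => 9_990, "Hyll Enhanced" => 3_018, "Hyll Standard" => 32_348 }.each do |name, estimate|
  puts format("%-14s %.2f%%", name, relative_error_pct(estimate, 10_000))
end
puts format("expected at precision 10: about %.2f%%", expected_std_error_pct(10))
# reports 0.10%, 69.82% and 223.48% for the three rows, and 3.25% as the expectation
```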
```diff
@@ -246,13 +351,15 @@ Hyll offers two main implementations:
 
 2. **EnhancedHyperLogLog**: A strictly dense format similar to Facebook's Presto P4HYPERLOGLOG type, where "P4" refers to the 4-bit precision per register. This format is slightly less memory-efficient but offers better compatibility with other HyperLogLog implementations. It also includes a streaming martingale estimator that can provide up to 1.56x better accuracy for the same memory usage.
 
-
+### v1.0.0 Performance Architecture
+
+The internal architecture has been completely redesigned for maximum performance:
 
-- `Hyll::Constants`:
-- `Hyll::Utils::Hash`:
-- `Hyll::Utils::Math`:
-- `Hyll::HyperLogLog`:
-- `Hyll::EnhancedHyperLogLog`:
+- `Hyll::Constants`: Pre-computed lookup tables (POW2, CLZ8, ALPHA_M_SQUARED)
+- `Hyll::Utils::Hash`: Optimized MurmurHash3 with batch processing and inlined operations
+- `Hyll::Utils::Math`: Lookup-based math with cached computations
+- `Hyll::HyperLogLog`: Register mask pre-computation, fast nibble operations
+- `Hyll::EnhancedHyperLogLog`: Cached modification probability, zero-allocation updates
 - `Hyll::Factory`: Factory pattern for creating counters
 
 ## Examples
```
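The architecture list names `ALPHA_M_SQUARED` and power-of-two tables in `Hyll::Constants`, and a cached modification probability in `Hyll::EnhancedHyperLogLog`. The sketch below shows one way such pieces could fit together. The α constants are the standard ones from the HyperLogLog paper. `alpha_for`, `raw_estimate`, and `ModProbCache` are hypothetical names, not Hyll's API. The modification probability is the chance that a new distinct element raises some register, (1/m) · Σ 2^(-r_j); the streaming martingale estimator relies on it.

```ruby
# Hypothetical sketch of precomputed HyperLogLog constants; not Hyll::Constants itself.
POW2_NEG_TABLE = Array.new(65) { |n| 2.0**-n }.freeze # 2^(-n) for n = 0..64

# Bias-correction constant alpha_m from the original HyperLogLog paper.
def alpha_for(m)
  case m
  when 16 then 0.673
  when 32 then 0.697
  when 64 then 0.709
  else 0.7213 / (1 + 1.079 / m)
  end
end

# alpha_m * m^2 for precisions 4..16, where m = 2**precision.
ALPHA_M_SQUARED = (4..16).to_h { |precision| m = 1 << precision; [precision, alpha_for(m) * m * m] }.freeze

# Raw (uncorrected) estimate: alpha_m * m^2 / sum(2^-r_j), using only table lookups.
def raw_estimate(registers, precision)
  ALPHA_M_SQUARED.fetch(precision) / registers.sum { |r| POW2_NEG_TABLE[r] }
end

# Modification probability cached behind a dirty flag.
class ModProbCache
  def initialize(registers)
    @registers = registers
    @dirty = true
    @cached = nil
  end

  # Chance that one new distinct element raises some register: (1/m) * sum(2^-r_j).
  def modification_probability
    if @dirty
      @cached = @registers.sum { |r| POW2_NEG_TABLE[r] } / @registers.size
      @dirty = false
    end
    @cached
  end

  # Returns true only when the register actually changed (which invalidates the cache).
  def update(index, value)
    return false unless value > @registers[index]

    @registers[index] = value
    @dirty = true
  end
end
```

With a dirty flag, the sum is recomputed only after some register changes. An alternative is to adjust the cached value incrementally on each update, subtracting (2^(-old) - 2^(-new)) / m.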
```diff
@@ -293,6 +400,13 @@ For advanced usage scenarios, check out `examples/advance.rb` which includes:
 - Advanced serialization techniques
 - Precision vs. memory usage benchmarks
 
+## Example Use Cases
+
+Here is a quick illustration of how Hyll can be helpful in a real-world scenario:
+- UniqueVisitorCounting: Track unique users visiting a website in real time. By adding each user's session ID or IP to the HyperLogLog, you get an approximate number of distinct users without storing everybody's data.
+- LogAnalytics: Continuously process large log files to calculate the volume of unique events, keeping memory usage low.
+- MarketingCampaigns: Quickly gauge how many distinct customers participate in a campaign while merging data from multiple sources.
+
 ## Development
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
```
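Merging counts from several sources, as in the campaign example, works because of how HyperLogLog unions behave. For two sketches built with the same precision and hash function, the union is the element-wise maximum of their registers. This is also why the associativity test added in 0.2.0 can hold exactly. Below is a minimal sketch on plain arrays. `merge_registers!` is hypothetical and ignores Hyll's sparse representation and 4-bit packing.

```ruby
# Hypothetical sketch: merge a dense register array into another, in place.
# Both sketches must share the same precision (register count) and hash function.
def merge_registers!(target, source)
  raise ArgumentError, "precision mismatch" unless target.size == source.size

  i = 0
  n = target.size
  while i < n
    # The union of two sketches is the element-wise maximum of their registers.
    s = source[i]
    target[i] = s if s > target[i]
    i += 1
  end
  target
end

a = [0, 3, 1, 2]
b = [2, 1, 1, 0]
c = [1, 1, 4, 0]
merge_registers!(merge_registers!(a.dup, b), c) # => [2, 3, 4, 2], same as merging b and c first
```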