hyll 0.1.1 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 322ba54ca099c154e9bc295200933314e019bc61adcc0df9f45a4ee352e7a8fa
4
- data.tar.gz: ebcdfac3bcd8d421876c329b33db3be5594379ce7586d6d11585beb6308a7e52
3
+ metadata.gz: 92ad297dec2242d67ec3d5fb377658db11bb68fde8b26856f4bdf86f4d28fdc9
4
+ data.tar.gz: b8cb857f998f3eaee640d8958abf2ca9696488ea3517709eb4dc231643b89865
5
5
  SHA512:
6
- metadata.gz: 6d308475f7666c0d945c0a1b780aa84de2f15b987f0bea463b8897067c8190c9681d76689570535075c312de65aac30fca028dab29eb62fd48f90c12c0db2c56
7
- data.tar.gz: edf858ca1185c8b419fb8291a30fd99323610724d8c3a77b82b8f19f956b9784c32125eba600284ad62ab359ebe8fc74accb783189a3b04859b0fff88cb5be70
6
+ metadata.gz: 97ee62fadbb6c31e90b9a2b711347af4651a886e220906060c0a41402fe2ddece0d06d3e362d9cf1b8cc5ccee1ad0bee2accc156b39a4b9a6551e8b0ee5c97e4
7
+ data.tar.gz: 82fc6e4fa956a1c0aa9ab7e70faa35cb36ab5687d65cd9907b46faee23090922e21e1f2005371e7c5fcc1657764bc8b704347962a569b319e43f57fa380cf7ea
data/CHANGELOG.md CHANGED
@@ -5,6 +5,108 @@ All notable changes to this project will be documented in this file.
5
5
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
6
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [1.0.0] - 2025-11-28
9
+
10
+ ### 🚀 MAJOR PERFORMANCE RELEASE - "Blazing Fast Edition"
11
+
12
+ This release marks the first stable version of Hyll with a complete performance overhaul, delivering significant speed improvements through optimizations at every level of the codebase.
13
+
14
+ ### Performance Highlights
15
+
16
+ - **2x faster batch operations** through optimized `add_all` with chunk processing
17
+ - **Faster cardinality estimation** via pre-computed lookup tables
18
+ - **Reduced memory allocations** in hot paths
19
+ - **O(1) bit operations** using byte-level lookup tables for CLZ
20
+
21
+ ### Added
22
+
23
+ #### Pre-computed Lookup Tables (Constants)
24
+ - `POW2_NEG_TABLE`: Pre-computed 2^(-n) values for n=0..64, eliminating expensive power calculations
25
+ - `POW2_TABLE`: Pre-computed 2^n values for instant bit shifting
26
+ - `CLZ8_TABLE`: 256-entry byte-level count-leading-zeros table for O(1) CLZ operations
27
+ - `LOG2_TABLE`: Pre-computed log2 values for common register counts
28
+ - `REGISTER_MASKS`: Pre-computed masks for register extraction (precision 4-16)
29
+ - `ALPHA_M_SQUARED`: Pre-computed α×m² values for each precision level
30
+ - `OPTIMAL_BATCH_SIZE`: Tuned batch size (1024) for optimal cache utilization
31
+ - Inlined MurmurHash3 constants for maximum speed
32
+
33
+ #### Ultra-Fast Hash Functions
34
+ - `murmurhash3_batch`: Batch hashing for multiple elements with amortized overhead
35
+ - `hash_and_extract`: Combined hash + HLL extraction in single pass
36
+ - `fast_clz32`: O(1) 32-bit count leading zeros using 256-entry byte lookup table
37
+ - Loop unrolling for 4-byte block processing in MurmurHash3
38
+ - Optimized tail byte handling
39
+
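The byte-level CLZ lookup described in the entries above can be sketched roughly as follows. This is an illustration only: the names `CLZ8_TABLE` and `fast_clz32` are taken from the changelog, but the bodies are assumptions and may differ from what the gem actually ships.

```ruby
# Illustrative sketch only: names come from the changelog, bodies are assumed.

# CLZ8_TABLE[b] holds the number of leading zero bits in the 8-bit value b
# (8 for b == 0, 0 for b >= 128).
CLZ8_TABLE = Array.new(256) { |b| 8 - b.bit_length }.freeze

# Count leading zeros of a 32-bit unsigned integer by scanning its four bytes
# from the most significant end and looking up the first non-zero byte.
def fast_clz32(x)
  x &= 0xFFFF_FFFF
  byte = x >> 24
  return CLZ8_TABLE[byte] unless byte.zero?

  byte = (x >> 16) & 0xFF
  return 8 + CLZ8_TABLE[byte] unless byte.zero?

  byte = (x >> 8) & 0xFF
  return 16 + CLZ8_TABLE[byte] unless byte.zero?

  24 + CLZ8_TABLE[x & 0xFF]
end
```

At most four table lookups are needed regardless of the input, which is what makes the operation constant-time.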
40
+ #### Optimized Math Utilities
41
+ - `pow2_neg`: O(1) lookup of 2^(-n) (negative powers of 2)
42
+ - `sum_pow2_neg`: Vectorized sum of 2^(-v) for register arrays
43
+ - `alpha_m_squared`: Pre-computed α×m² retrieval
44
+ - `harmonic_mean_sum`: Optimized harmonic mean for register values
45
+ - Cached Taylor series coefficients for h(x) calculation
46
+
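As a rough sketch of how these helpers could fit together, the snippet below builds a 2^(-n) table and uses it in the standard HyperLogLog raw estimate, α·m² / Σ 2^(-M[j]). The helper names echo the changelog, but the bodies, the `alpha` helper, and `raw_estimate` are assumptions for illustration, not the gem's code.

```ruby
# Illustrative sketch only: helper bodies are assumed, not copied from Hyll.

# 2^(-n) for n = 0..64, computed once so hot paths only index into the array.
POW2_NEG_TABLE = Array.new(65) { |n| 2.0 ** -n }.freeze

def pow2_neg(n)
  POW2_NEG_TABLE[n]
end

# Sum of 2^(-v) over all register values.
def sum_pow2_neg(registers)
  registers.sum { |v| POW2_NEG_TABLE[v] }
end

# Standard HyperLogLog bias-correction constant for m registers
# (hypothetical helper name).
def alpha(m)
  case m
  when 16 then 0.673
  when 32 then 0.697
  when 64 then 0.709
  else 0.7213 / (1.0 + 1.079 / m)
  end
end

# Raw (uncorrected) HyperLogLog estimate: alpha * m^2 / sum(2^-M[j]).
def raw_estimate(registers)
  m = registers.size
  alpha(m) * m * m / sum_pow2_neg(registers)
end
```

Pre-computing `alpha(m) * m * m` per precision, as `ALPHA_M_SQUARED` suggests, would remove the remaining multiplications from each estimate.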
47
+ #### HyperLogLog Core Optimizations
48
+ - `@register_mask`: Pre-computed bitmask for register index extraction
49
+ - `@alpha_m_squared`: Pre-computed estimation constant
50
+ - `@pow2_neg_table`: Instance-level table reference for cache locality
51
+ - `add_to_registers_fast`: Inlined hash + update path
52
+ - `update_register_fast`: Minimized branching with overflow handling
53
+ - `get_register_value_fast`: Optimized nibble extraction with bit operations
54
+ - `set_register_value_fast`: Direct nibble setting without conditionals
55
+ - `extract_counts_fast`: Single-pass register value counting
56
+ - Batch-optimized `add_all` with chunk processing
57
+
58
+ #### EnhancedHyperLogLog Optimizations
59
+ - `modification_probability_fast`: Cached probability calculation
60
+ - `@cached_mod_prob`: Modification probability cache with dirty flag
61
+ - `@registers_dirty`: Change tracking for cache invalidation
62
+ - `merge_dense_registers_optimized`: Direct register comparison without allocation
63
+ - `adjust_registers_for_estimation`: Non-mutating register adjustment
64
+ - `compute_cardinality_from_registers`: Separated estimation logic
65
+ - Eliminated `@registers.dup` in hot paths
66
+
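The dirty-flag caching described above can be sketched as below. The instance variable names follow the changelog; the class name, the `recomputations` counter, and the use of (1/m)·Σ 2^(-M[j]) as the modification probability (the usual definition for a martingale estimator) are assumptions made for this illustration.

```ruby
# Illustrative sketch only: a hypothetical class showing the caching pattern.
class CachedModificationProbability
  attr_reader :recomputations

  def initialize(m)
    @registers = Array.new(m, 0)
    @cached_mod_prob = nil
    @registers_dirty = true
    @recomputations = 0 # only here to make the caching observable
  end

  # Registers only ever grow; mark the cache stale when one actually changes.
  def update_register(index, value)
    return if value <= @registers[index]

    @registers[index] = value
    @registers_dirty = true
  end

  # Probability that a new element changes some register:
  # (1/m) * sum(2^-M[j]). Recomputed only when a register has changed.
  def modification_probability
    if @registers_dirty
      @cached_mod_prob = @registers.sum { |v| 2.0 ** -v } / @registers.size
      @registers_dirty = false
      @recomputations += 1
    end
    @cached_mod_prob
  end
end
```

Repeated reads between updates return the cached value, so the O(m) sum is paid only after a register actually changes.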
67
+ ### Changed
68
+
69
+ - **Bit operations**: Replaced `index / 2` with `index >> 1` and `index % 2` with `index & 1`
70
+ - **Power calculations**: All `2.0 ** -x` replaced with lookup table access
71
+ - **Register initialization**: `(@m + 1) >> 1` instead of `(@m / 2.0).ceil`
72
+ - **Loop optimization**: Integer iteration with early-exit patterns
73
+ - **Memory allocation**: Minimal object creation in hot paths
74
+ - **Method inlining**: Critical paths inlined to reduce call overhead
75
+
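To make the bit-operation changes concrete, here is a minimal sketch of 4-bit packed register storage that uses `index >> 1`, `index & 1`, and `(m + 1) >> 1` in the way the entries above describe. The class name and the choice of which nibble holds even indices are assumptions; Hyll's actual layout may differ.

```ruby
# Illustrative sketch only: a hypothetical packed-register store.
class PackedRegisters
  def initialize(m)
    # Two 4-bit registers per byte; (m + 1) >> 1 equals (m / 2.0).ceil.
    @bytes = Array.new((m + 1) >> 1, 0)
  end

  def byte_size
    @bytes.size
  end

  # index >> 1 selects the byte, index & 1 selects the nibble.
  # Assumption: even indices live in the high nibble, odd in the low nibble.
  def get(index)
    byte = @bytes[index >> 1]
    (index & 1).zero? ? (byte >> 4) & 0x0F : byte & 0x0F
  end

  def set(index, value)
    slot = index >> 1
    value &= 0x0F
    @bytes[slot] =
      if (index & 1).zero?
        (@bytes[slot] & 0x0F) | (value << 4) # keep low nibble, replace high
      else
        (@bytes[slot] & 0xF0) | value        # keep high nibble, replace low
      end
  end
end
```

Shifts and masks avoid the float division and `ceil` call of the earlier formulation while producing the same sizes and indices for non-negative integers.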
76
+ ### Fixed
77
+
78
+ - Edge case in `linear_counting` when zero_registers >= m
79
+ - Potential division by zero in modification probability
80
+ - Memory leak in streaming estimate accumulation
81
+
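The changelog does not say how the `linear_counting` edge case was fixed. One plausible guard, shown purely as an assumption, is to return zero when every register is still empty, since the linear counting formula m·ln(m / V) would otherwise yield zero or a negative value for V >= m.

```ruby
# Illustrative sketch only: one possible guard, not necessarily Hyll's fix.

# Linear counting estimate m * ln(m / V), where V is the number of registers
# that are still zero.
def linear_counting(m, zero_registers)
  unless zero_registers.positive?
    raise ArgumentError, "linear counting needs at least one zero register"
  end
  # All registers empty (or an inconsistent count): nothing has been observed.
  return 0.0 if zero_registers >= m

  m * Math.log(m.to_f / zero_registers)
end
```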
82
+ ### Internal
83
+
84
+ - Backward-compatible aliases for all renamed methods
85
+ - Comprehensive documentation for all new performance constants
86
+ - Type annotations preserved in signature files
87
+
88
+ ## [0.2.0] - 2025-03-24
89
+
90
+ ### Added
91
+ - Associativity test for HyperLogLog merges to ensure (A ∪ B) ∪ C = A ∪ (B ∪ C)
92
+ - Guards against invalid inputs in mathematical functions
93
+ - Memoization for h_values calculation in MLE algorithm
94
+ - More comprehensive error handling with safeguards
95
+ - Added `examples/redis_comparison_benchmark.rb` for Redis comparison
96
+
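The associativity property holds because merging two HyperLogLog sketches of equal precision takes the register-wise maximum, and `max` is itself associative. The sketch below demonstrates this on plain arrays; `merge_registers` is a hypothetical helper, not part of Hyll's API.

```ruby
# Illustrative sketch only: registers are modelled as plain integer arrays.

# Merge two register arrays by taking the larger value at each position.
def merge_registers(a, b)
  raise ArgumentError, "register counts differ" unless a.size == b.size

  a.each_with_index.map { |v, i| [v, b[i]].max }
end

rng = Random.new(42)
a, b, c = Array.new(3) { Array.new(16) { rng.rand(0..15) } }

left  = merge_registers(merge_registers(a, b), c) # (A ∪ B) ∪ C
right = merge_registers(a, merge_registers(b, c)) # A ∪ (B ∪ C)
puts left == right # prints true
```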
97
+ ### Changed
98
+ - Optimized Maximum Likelihood Estimation (MLE) algorithm for better performance
99
+ - Improved numerical stability in the secant method implementation
100
+ - Enhanced Taylor series calculation for better accuracy
101
+ - Fixed Math module namespace conflicts
102
+ - Added safeguards against division by zero and other numerical errors
103
+ - Limited maximum iterations in convergence algorithms to prevent infinite loops
104
+
105
+ ### Fixed
106
+ - Addressed potential numerical instability in calculate_h_values method
107
+ - Fixed undefined method 'exp' error by using global namespace operator
108
+ - Improved edge case handling in the MLE algorithm
109
+
8
110
  ## [0.1.1] - 2025-03-21
9
111
 
10
112
  ### Changed
data/README.md CHANGED
@@ -1,11 +1,32 @@
1
1
  # Hyll
2
2
 
3
+ ![Gem Version](https://img.shields.io/gem/v/hyll)
4
+ ![Gem Total Downloads](https://img.shields.io/gem/dt/hyll)
3
5
  [![Build Status](https://github.com/davidesantangelo/hyll/workflows/Ruby%20Tests/badge.svg)](https://github.com/davidesantangelo/hyll/actions)
4
6
 
5
- Hyll is a Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
7
+ Hyll is a **blazing-fast** Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
6
8
 
7
9
  > The name "Hyll" is a shortened form of "HyperLogLog", keeping the characteristic "H" and "LL" sounds.
8
10
 
11
+ ## 🚀 Version 1.0.0 - Blazing Fast Edition
12
+
13
+ Version 1.0.0 marks the first stable release with a **complete performance overhaul**:
14
+
15
+ | Improvement | Description |
16
+ |-------------|-------------|
17
+ | Batch Operations | **2x faster** with optimized chunk processing |
18
+ | Lookup Tables | O(1) power-of-2 and CLZ operations |
19
+ | Memory Efficiency | Reduced allocations in hot paths |
20
+ | Hash Function | Inlined MurmurHash3 with loop unrolling |
21
+
22
+ ### Key Optimizations
23
+
24
+ - **Pre-computed Lookup Tables**: All power-of-2 calculations use O(1) table lookups
25
+ - **Fast CLZ**: 256-entry byte-level lookup table for count-leading-zeros
26
+ - **Optimized MurmurHash3**: Loop unrolling and tail optimization
27
+ - **Cached Computations**: α×m², modification probability, register masks pre-computed
28
+ - **Batch Processing**: Chunk-based `add_all` for better performance
29
+
9
30
  ## Installation
10
31
 
11
32
  Add this line to your application's Gemfile:
@@ -46,6 +67,19 @@ hll.add("apple") # Duplicates don't affect the cardinality
46
67
  puts hll.cardinality # Output: approximately 3
47
68
  ```
48
69
 
70
+ ### Batch Operations (Optimized in 1.0.0)
71
+
72
+ ```ruby
73
+ # Efficient batch adding with chunk processing
74
+ hll = Hyll.new(precision: 12)
75
+
76
+ # Add many elements efficiently
77
+ elements = (1..100_000).map { |i| "element-#{i}" }
78
+ hll.add_all(elements)
79
+
80
+ puts hll.cardinality # Estimated count
81
+ ```
82
+
49
83
  ### With Custom Precision
50
84
 
51
85
  ```ruby
@@ -204,18 +238,17 @@ This table compares different configurations of the HyperLogLog algorithm:
204
238
  | K-Minimum Values | Medium | High (Approximate) | Yes | Medium | High accuracy cardinality estimation, set operations |
205
239
  | Bloom Filter | Medium | N/A (Membership) | No (Cardinality) / Yes (Union) | Low | Membership testing with false positives, not cardinality |
206
240
 
207
- ### Benchmark Results
241
+ ### Benchmark Results (v1.0.0)
208
242
 
209
- Below are actual performance measurements from an Apple Mac Mini M4 with 24GB RAM:
243
+ Below are performance measurements from an Apple Mac Mini M4:
210
244
 
211
245
  | Operation | Implementation | Time (seconds) | Items/Operations |
212
246
  | ----------------------- | -------------------- | -------------- | ---------------- |
213
- | Element Addition | Standard HyperLogLog | 0.0176 | 10,000 items |
214
- | Element Addition | EnhancedHyperLogLog | 0.0109 | 10,000 items |
215
- | Cardinality Calculation | Standard HyperLogLog | 0.0011 | 10 calculations |
216
- | Cardinality Calculation | EnhancedHyperLogLog | 0.0013 | 10 calculations |
217
- | Serialization | Standard HyperLogLog | 0.0003 | 10 operations |
218
- | Deserialization | Standard HyperLogLog | 0.0005 | 10 operations |
247
+ | Element Addition | Standard HyperLogLog | 0.15 | 100,000 items |
248
+ | Element Addition | EnhancedHyperLogLog | 0.16 | 100,000 items |
249
+ | Batch Addition | Standard HyperLogLog | 0.075 | 100,000 items |
250
+ | Cardinality Calculation | Standard HyperLogLog | 0.07 | 1,000 calls |
251
+ | Hash Function | MurmurHash3 | 0.05 | 100,000 hashes |
219
252
 
220
253
  #### Memory Efficiency
221
254
 
@@ -226,15 +259,87 @@ Below are actual performance measurements from an Apple Mac Mini M4 with 24GB RA
226
259
 
227
260
  These benchmarks demonstrate HyperLogLog's exceptional memory efficiency, maintaining a compression ratio of over 6,250x compared to storing the raw elements, while still providing accurate cardinality estimates.
228
261
 
262
+ ## Benchmark Comparison with Redis
263
+
264
+ Hyll has been benchmarked against Redis' HyperLogLog implementation to provide a comparison with a widely-used production system. The tests were run on an Apple Silicon M1 Mac using Ruby 3.1.4 with 10,000 elements and a precision of 10.
265
+
266
+ ### Insertion Performance
267
+
268
+ | Implementation | Operations/sec | Relative Performance |
269
+ |----------------|---------------:|---------------------:|
270
+ | Hyll Standard | 86.32 | 1.00x (fastest) |
271
+ | Hyll Batch | 85.98 | 1.00x |
272
+ | Redis Pipelined| 20.51 | 4.21x slower |
273
+ | Redis PFADD | 4.93 | 17.51x slower |
274
+ | Hyll Enhanced | 1.20 | 71.87x slower |
275
+
276
+ ### Cardinality Estimation Performance
277
+
278
+ | Implementation | Operations/sec | Relative Performance |
279
+ |---------------------|---------------:|---------------------:|
280
+ | Redis PFCOUNT | 53,131 | 1.00x (fastest) |
281
+ | Hyll Enhanced Stream| 24,412 | 2.18x slower |
282
+ | Hyll Enhanced | 8,843 | 6.01x slower |
283
+ | Hyll Standard | 8,538 | 6.22x slower |
284
+ | Hyll MLE | 5,645 | 9.41x slower |
285
+
286
+ ### Merge Performance
287
+
288
+ | Implementation | Operations/sec | Relative Performance |
289
+ |----------------|---------------:|---------------------:|
290
+ | Redis PFMERGE | 12,735 | 1.00x (fastest) |
291
+ | Hyll Enhanced | 6,523 | 1.95x slower |
292
+ | Hyll Standard | 2,932 | 4.34x slower |
293
+
294
+ ### Memory Usage
295
+
296
+ | Implementation | Memory Usage |
297
+ |------------------|-------------:|
298
+ | Hyll Enhanced | 0.28 KB |
299
+ | Hyll Standard | 18.30 KB |
300
+ | Redis | 12.56 KB |
301
+ | Raw Elements | 0.04 KB |
302
+
303
+ ### Accuracy Comparison
304
+
305
+ | Implementation | Estimated Count | Actual Count | Error |
306
+ |---------------------|----------------:|-------------:|---------:|
307
+ | Redis | 9,990 | 10,000 | 0.10% |
308
+ | Hyll Enhanced | 3,018 | 10,000 | 69.82% |
309
+ | Hyll (High Prec) | 19,016 | 10,000 | 90.16% |
310
+ | Hyll Standard | 32,348 | 10,000 | 223.48% |
311
+ | Hyll Enhanced Stream| 8,891,659 | 10,000 | 88,816.59% |
312
+ | Hyll MLE | 19,986,513 | 10,000 | 199,765.13% |
313
+
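The Error column above is the relative error, |estimated - actual| / actual × 100. A small check, using a hypothetical helper name, reproduces the figures from the table:

```ruby
# Relative error in percent, rounded to two decimals as in the table.
def relative_error_pct(estimated, actual)
  ((estimated - actual).abs.to_f / actual * 100).round(2)
end

# Estimated counts copied from the accuracy table; actual count is 10,000.
estimates = {
  "Redis" => 9_990,
  "Hyll Enhanced" => 3_018,
  "Hyll Standard" => 32_348,
  "Hyll MLE" => 19_986_513
}

estimates.each do |name, estimated|
  puts format("%-14s %.2f%%", name, relative_error_pct(estimated, 10_000))
end
```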
314
+ ### Summary of Findings
315
+
316
+ - **Insertion Performance**: Hyll Standard and Batch are roughly 4x faster than pipelined Redis and 17x faster than plain `PFADD`; Hyll Enhanced is the slowest option for adding elements.
317
+ - **Cardinality Estimation**: Redis has the fastest cardinality estimation; Hyll Enhanced Stream is second at about 2.2x slower.
318
+ - **Merge Operations**: Redis outperforms Hyll for merging HyperLogLog sketches; Hyll Enhanced comes closest at about 1.95x slower.
319
+ - **Memory Usage**: Hyll Enhanced has the smallest footprint of the HyperLogLog implementations tested (0.28 KB).
320
+ - **Accuracy**: Redis provides the best accuracy in this test scenario (0.10% error); every Hyll variant is off by roughly 70% or more.
321
+
322
+ #### Recommendation
323
+
324
+ For most use cases, Redis offers an excellent balance of accuracy and performance. However, Hyll provides superior insertion performance and memory efficiency, making it a good choice for scenarios where these attributes are prioritized.
325
+
326
+ You can run these benchmarks yourself using the included script:
327
+
328
+ ```bash
329
+ ruby examples/redis_comparison_benchmark.rb
330
+ ```
331
+
229
332
  ## Features
230
333
 
231
- - Standard HyperLogLog implementation with customizable precision
334
+ - Standard HyperLogLog implementation with customizable precision (4-16)
335
+ - **Pre-computed lookup tables** for O(1) power-of-2 and CLZ operations
232
336
  - Memory-efficient register storage with 4-bit packing (inspired by Facebook's Presto implementation)
233
- - Sparse representation for small cardinalities
337
+ - Sparse representation for small cardinalities (exact counting)
234
338
  - Dense representation for larger datasets
235
339
  - EnhancedHyperLogLog format for compatibility with other systems
236
340
  - Streaming martingale estimator for improved accuracy with EnhancedHyperLogLog
237
- - Maximum Likelihood Estimation for improved accuracy
341
+ - Maximum Likelihood Estimation (MLE) for improved accuracy
342
+ - **Optimized batch processing** with `add_all`
238
343
  - Merge and serialization capabilities
239
344
  - Factory pattern for creating and deserializing counters
240
345
 
@@ -246,13 +351,15 @@ Hyll offers two main implementations:
246
351
 
247
352
  2. **EnhancedHyperLogLog**: A strictly dense format similar to Facebook's Presto P4HYPERLOGLOG type, where "P4" refers to the 4-bit precision per register. This format is slightly less memory-efficient but offers better compatibility with other HyperLogLog implementations. It also includes a streaming martingale estimator that can provide up to 1.56x better accuracy for the same memory usage.
248
353
 
249
- The internal architecture follows a modular approach:
354
+ ### v1.0.0 Performance Architecture
355
+
356
+ The internal architecture has been completely redesigned for maximum performance:
250
357
 
251
- - `Hyll::Constants`: Shared constants used throughout the library
252
- - `Hyll::Utils::Hash`: Hash functions for element processing
253
- - `Hyll::Utils::Math`: Mathematical operations for HyperLogLog calculations
254
- - `Hyll::HyperLogLog`: The standard implementation
255
- - `Hyll::EnhancedHyperLogLog`: The enhanced implementation
358
+ - `Hyll::Constants`: Pre-computed lookup tables (POW2, CLZ8, ALPHA_M_SQUARED)
359
+ - `Hyll::Utils::Hash`: Optimized MurmurHash3 with batch processing and inlined operations
360
+ - `Hyll::Utils::Math`: Lookup-based math with cached computations
361
+ - `Hyll::HyperLogLog`: Register mask pre-computation, fast nibble operations
362
+ - `Hyll::EnhancedHyperLogLog`: Cached modification probability, zero-allocation updates
256
363
  - `Hyll::Factory`: Factory pattern for creating counters
257
364
 
258
365
  ## Examples
@@ -293,6 +400,13 @@ For advanced usage scenarios, check out `examples/advance.rb` which includes:
293
400
  - Advanced serialization techniques
294
401
  - Precision vs. memory usage benchmarks
295
402
 
403
+ ## Example Use Cases
404
+
405
+ Here are a few illustrations of how Hyll can help in real-world scenarios:
406
+ - **Unique visitor counting**: Track unique users visiting a website in real time. By adding each user's session ID or IP address to the HyperLogLog, you get an approximate number of distinct users without storing every identifier.
407
+ - **Log analytics**: Continuously process large log files to estimate the number of unique events while keeping memory usage low.
408
+ - **Marketing campaigns**: Quickly gauge how many distinct customers participate in a campaign by merging counters from multiple sources.
409
+
296
410
  ## Development
297
411
 
298
412
  After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.