hyll 0.2.0 → 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +80 -0
- data/README.md +53 -18
- data/examples/v1_benchmark.rb +93 -0
- data/lib/hyll/algorithms/enhanced_hyperloglog.rb +234 -120
- data/lib/hyll/algorithms/hyperloglog.rb +262 -338
- data/lib/hyll/constants.rb +75 -0
- data/lib/hyll/utils/hash.rb +132 -21
- data/lib/hyll/utils/math.rb +129 -75
- data/lib/hyll/version.rb +1 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 92ad297dec2242d67ec3d5fb377658db11bb68fde8b26856f4bdf86f4d28fdc9
+  data.tar.gz: b8cb857f998f3eaee640d8958abf2ca9696488ea3517709eb4dc231643b89865
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 97ee62fadbb6c31e90b9a2b711347af4651a886e220906060c0a41402fe2ddece0d06d3e362d9cf1b8cc5ccee1ad0bee2accc156b39a4b9a6551e8b0ee5c97e4
+  data.tar.gz: 82fc6e4fa956a1c0aa9ab7e70faa35cb36ab5687d65cd9907b46faee23090922e21e1f2005371e7c5fcc1657764bc8b704347962a569b319e43f57fa380cf7ea
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,86 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.0] - 2025-11-28
+
+### 🚀 MAJOR PERFORMANCE RELEASE - "Blazing Fast Edition"
+
+This release marks the first stable version of Hyll with a complete performance overhaul, delivering significant speed improvements through optimizations at every level of the codebase.
+
+### Performance Highlights
+
+- **2x faster batch operations** through optimized `add_all` with chunk processing
+- **Faster cardinality estimation** via pre-computed lookup tables
+- **Reduced memory allocations** in hot paths
+- **O(1) bit operations** using byte-level lookup tables for CLZ
+
+### Added
+
+#### Pre-computed Lookup Tables (Constants)
+- `POW2_NEG_TABLE`: Pre-computed 2^(-n) values for n=0..64, eliminating expensive power calculations
+- `POW2_TABLE`: Pre-computed 2^n values for instant bit shifting
+- `CLZ8_TABLE`: 256-entry byte-level count-leading-zeros table for O(1) CLZ operations
+- `LOG2_TABLE`: Pre-computed log2 values for common register counts
+- `REGISTER_MASKS`: Pre-computed masks for register extraction (precision 4-16)
+- `ALPHA_M_SQUARED`: Pre-computed α×m² values for each precision level
+- `OPTIMAL_BATCH_SIZE`: Tuned batch size (1024) for optimal cache utilization
+- Inlined MurmurHash3 constants for maximum speed
+
+#### Ultra-Fast Hash Functions
+- `murmurhash3_batch`: Batch hashing for multiple elements with amortized overhead
+- `hash_and_extract`: Combined hash + HLL extraction in single pass
+- `fast_clz32`: O(1) 32-bit count leading zeros using 256-entry byte lookup table
+- Loop unrolling for 4-byte block processing in MurmurHash3
+- Optimized tail byte handling
+
+#### Optimized Math Utilities
+- `pow2_neg`: O(1) power of 2 negative lookup
+- `sum_pow2_neg`: Vectorized sum of 2^(-v) for register arrays
+- `alpha_m_squared`: Pre-computed α×m² retrieval
+- `harmonic_mean_sum`: Optimized harmonic mean for register values
+- Cached Taylor series coefficients for h(x) calculation
+
+#### HyperLogLog Core Optimizations
+- `@register_mask`: Pre-computed bitmask for register index extraction
+- `@alpha_m_squared`: Pre-computed estimation constant
+- `@pow2_neg_table`: Instance-level table reference for cache locality
+- `add_to_registers_fast`: Inlined hash + update path
+- `update_register_fast`: Minimized branching with overflow handling
+- `get_register_value_fast`: Optimized nibble extraction with bit operations
+- `set_register_value_fast`: Direct nibble setting without conditionals
+- `extract_counts_fast`: Single-pass register value counting
+- Batch-optimized `add_all` with chunk processing
+
+#### EnhancedHyperLogLog Optimizations
+- `modification_probability_fast`: Cached probability calculation
+- `@cached_mod_prob`: Modification probability cache with dirty flag
+- `@registers_dirty`: Change tracking for cache invalidation
+- `merge_dense_registers_optimized`: Direct register comparison without allocation
+- `adjust_registers_for_estimation`: Non-mutating register adjustment
+- `compute_cardinality_from_registers`: Separated estimation logic
+- Eliminated `@registers.dup` in hot paths
+
+### Changed
+
+- **Bit operations**: Replaced `index / 2` with `index >> 1` and `index % 2` with `index & 1`
+- **Power calculations**: All `2.0 ** -x` replaced with lookup table access
+- **Register initialization**: `(@m + 1) >> 1` instead of `(@m / 2.0).ceil`
+- **Loop optimization**: Integer iteration with early-exit patterns
+- **Memory allocation**: Minimal object creation in hot paths
+- **Method inlining**: Critical paths inlined to reduce call overhead
+
+### Fixed
+
+- Edge case in `linear_counting` when zero_registers >= m
+- Potential division by zero in modification probability
+- Memory leak in streaming estimate accumulation
+
+### Internal
+
+- Backward-compatible aliases for all renamed methods
+- Comprehensive documentation for all new performance constants
+- Type annotations preserved in signature files
+
 ## [0.2.0] - 2025-03-24
 
 ### Added
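To make the lookup-table entries above concrete, here is a minimal, illustrative Ruby sketch of a byte-level CLZ table and a negative power-of-2 table. The names mirror the changelog (`CLZ8_TABLE`, `POW2_NEG_TABLE`, `fast_clz32`), but this is a sketch of the technique, not the gem's actual source.

```ruby
# Sketch only. CLZ8_TABLE[b] = number of leading zero bits in the 8-bit value b.
CLZ8_TABLE = Array.new(256) { |byte| byte.zero? ? 8 : 8 - byte.bit_length }.freeze

# POW2_NEG_TABLE[n] = 2.0**-n, computed once so hot paths avoid Float#** entirely.
POW2_NEG_TABLE = Array.new(65) { |n| 2.0**-n }.freeze

# Constant-time count of leading zeros for a 32-bit word: check the four bytes
# from most significant to least and finish with a single table lookup.
def fast_clz32(value)
  return 32 if value.zero?

  if (value & 0xFF000000) != 0
    CLZ8_TABLE[(value >> 24) & 0xFF]
  elsif (value & 0x00FF0000) != 0
    8 + CLZ8_TABLE[(value >> 16) & 0xFF]
  elsif (value & 0x0000FF00) != 0
    16 + CLZ8_TABLE[(value >> 8) & 0xFF]
  else
    24 + CLZ8_TABLE[value & 0xFF]
  end
end

fast_clz32(1)     # => 31
POW2_NEG_TABLE[3] # => 0.125, same result as 2.0**-3 without the power call
```

In a HyperLogLog update, the register value is essentially one plus the count of leading zeros in part of the hash, which is why a constant-time CLZ on the add path matters.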
data/README.md
CHANGED
@@ -4,10 +4,29 @@
 
 [](https://github.com/davidesantangelo/hyll/actions)
 
-Hyll is a Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
+Hyll is a **blazing-fast** Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
 
 > The name "Hyll" is a shortened form of "HyperLogLog", keeping the characteristic "H" and "LL" sounds.
 
+## 🚀 Version 1.0.0 - Blazing Fast Edition
+
+Version 1.0.0 marks the first stable release with a **complete performance overhaul**:
+
+| Improvement | Description |
+|-------------|-------------|
+| Batch Operations | **2x faster** with optimized chunk processing |
+| Lookup Tables | O(1) power-of-2 and CLZ operations |
+| Memory Efficiency | Reduced allocations in hot paths |
+| Hash Function | Inlined MurmurHash3 with loop unrolling |
+
+### Key Optimizations
+
+- **Pre-computed Lookup Tables**: All power-of-2 calculations use O(1) table lookups
+- **Fast CLZ**: 256-entry byte-level lookup table for count-leading-zeros
+- **Optimized MurmurHash3**: Loop unrolling and tail optimization
+- **Cached Computations**: α×m², modification probability, register masks pre-computed
+- **Batch Processing**: Chunk-based `add_all` for better performance
+
 ## Installation
 
 Add this line to your application's Gemfile:
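The "Cached Computations" bullet above mentions pre-computed register masks. As a hedged sketch (the names and bit layout are illustrative, not the gem's exact code, and implementations differ on whether the index comes from the high or the low hash bits), this is the split of a hash into register index and rank that such a mask serves:

```ruby
# Illustrative only: split a 64-bit hash into (register index, rank) for precision P.
P = 12                       # m = 2**12 = 4096 registers
REGISTER_MASK = (1 << P) - 1 # 0xFFF - the kind of value a pre-computed mask table stores

def index_and_rank(hash64)
  index = hash64 & REGISTER_MASK # low P bits select the register
  rest  = hash64 >> P            # the remaining 64 - P bits
  width = 64 - P
  # rank = leading zeros of `rest` within its width, plus one
  rank  = rest.zero? ? width + 1 : width - rest.bit_length + 1
  [index, rank]
end

index, rank = index_and_rank(0x0000_0000_0001_0ABC)
# index => 0xABC, rank => 48 (the remaining 52 bits equal 0x10, i.e. 47 leading zeros)
```

Pre-computing and caching the mask per instance simply avoids rebuilding `(1 << P) - 1` on every `add` call.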
@@ -48,6 +67,19 @@ hll.add("apple") # Duplicates don't affect the cardinality
 puts hll.cardinality # Output: approximately 3
 ```
 
+### Batch Operations (Optimized in 1.0.0)
+
+```ruby
+# Efficient batch adding with chunk processing
+hll = Hyll.new(precision: 12)
+
+# Add many elements efficiently
+elements = (1..100_000).map { |i| "element-#{i}" }
+hll.add_all(elements)
+
+puts hll.cardinality # Estimated count
+```
+
 ### With Custom Precision
 
 ```ruby
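The batch section above and the changelog's `OPTIMAL_BATCH_SIZE` describe chunked processing. A rough sketch of the pattern follows, with a hypothetical helper name and batch size; the real `add_all` additionally amortizes hashing and register updates per chunk.

```ruby
# Hypothetical illustration of chunked batch insertion, not the gem's implementation.
BATCH_SIZE = 1024 # on the order of the changelog's OPTIMAL_BATCH_SIZE

def add_in_chunks(hll, elements)
  # Fixed-size slices keep the working set cache-friendly and let per-chunk
  # work (hashing, register updates) run in tight loops.
  elements.each_slice(BATCH_SIZE) do |chunk|
    chunk.each { |element| hll.add(element) }
  end
  hll
end

hll = Hyll.new(precision: 12)
add_in_chunks(hll, (1..100_000).map { |i| "element-#{i}" })
```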
@@ -206,18 +238,17 @@ This table compares different configurations of the HyperLogLog algorithm:
 | K-Minimum Values | Medium | High (Approximate) | Yes | Medium | High accuracy cardinality estimation, set operations |
 | Bloom Filter | Medium | N/A (Membership) | No (Cardinality) / Yes (Union) | Low | Membership testing with false positives, not cardinality |
 
-### Benchmark Results
+### Benchmark Results (v1.0.0)
 
-Below are
+Below are performance measurements from an Apple Mac Mini M4:
 
 | Operation | Implementation | Time (seconds) | Items/Operations |
 | ----------------------- | -------------------- | -------------- | ---------------- |
-| Element Addition | Standard HyperLogLog | 0.
-| Element Addition | EnhancedHyperLogLog
-|
-| Cardinality Calculation |
-|
-| Deserialization | Standard HyperLogLog | 0.0005 | 10 operations |
+| Element Addition | Standard HyperLogLog | 0.15 | 100,000 items |
+| Element Addition | EnhancedHyperLogLog | 0.16 | 100,000 items |
+| Batch Addition | Standard HyperLogLog | 0.075 | 100,000 items |
+| Cardinality Calculation | Standard HyperLogLog | 0.07 | 1,000 calls |
+| Hash Function | MurmurHash3 | 0.05 | 100,000 hashes |
 
 #### Memory Efficiency
 
@@ -300,13 +331,15 @@ ruby examples/redis_comparison_benchmark.rb
 
 ## Features
 
-- Standard HyperLogLog implementation with customizable precision
+- Standard HyperLogLog implementation with customizable precision (4-16)
+- **Pre-computed lookup tables** for O(1) power-of-2 and CLZ operations
 - Memory-efficient register storage with 4-bit packing (inspired by Facebook's Presto implementation)
-- Sparse representation for small cardinalities
+- Sparse representation for small cardinalities (exact counting)
 - Dense representation for larger datasets
 - EnhancedHyperLogLog format for compatibility with other systems
 - Streaming martingale estimator for improved accuracy with EnhancedHyperLogLog
-- Maximum Likelihood Estimation for improved accuracy
+- Maximum Likelihood Estimation (MLE) for improved accuracy
+- **Optimized batch processing** with `add_all`
 - Merge and serialization capabilities
 - Factory pattern for creating and deserializing counters
 
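The 4-bit packing feature listed above stores two registers per byte; the changelog's switch to `index >> 1` and `index & 1` is exactly the byte/nibble addressing for it. A hedged sketch of that nibble arithmetic follows (the method names and the even-index-equals-low-nibble convention are assumptions, not the gem's exact layout):

```ruby
# Illustrative 4-bit register packing: two registers per byte of a binary String.
# byte position = index >> 1, nibble within the byte = index & 1
def get_register(bytes, index)
  byte = bytes.getbyte(index >> 1)
  (index & 1).zero? ? (byte & 0x0F) : (byte >> 4)
end

def set_register(bytes, index, value)
  pos  = index >> 1
  byte = bytes.getbyte(pos)
  byte = if (index & 1).zero?
           (byte & 0xF0) | (value & 0x0F)        # overwrite the low nibble
         else
           (byte & 0x0F) | ((value & 0x0F) << 4) # overwrite the high nibble
         end
  bytes.setbyte(pos, byte)
end

registers = ("\x00" * 2048).b # 4096 registers x 4 bits = 2 KB
set_register(registers, 7, 9)
get_register(registers, 7) # => 9
```

Since each packed register holds at most 15, larger values need the overflow handling the changelog mentions for `update_register_fast`.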
@@ -318,13 +351,15 @@ Hyll offers two main implementations:
 
 2. **EnhancedHyperLogLog**: A strictly dense format similar to Facebook's Presto P4HYPERLOGLOG type, where "P4" refers to the 4-bit precision per register. This format is slightly less memory-efficient but offers better compatibility with other HyperLogLog implementations. It also includes a streaming martingale estimator that can provide up to 1.56x better accuracy for the same memory usage.
 
-
+### v1.0.0 Performance Architecture
+
+The internal architecture has been completely redesigned for maximum performance:
 
-- `Hyll::Constants`:
-- `Hyll::Utils::Hash`:
-- `Hyll::Utils::Math`:
-- `Hyll::HyperLogLog`:
-- `Hyll::EnhancedHyperLogLog`:
+- `Hyll::Constants`: Pre-computed lookup tables (POW2, CLZ8, ALPHA_M_SQUARED)
+- `Hyll::Utils::Hash`: Optimized MurmurHash3 with batch processing and inlined operations
+- `Hyll::Utils::Math`: Lookup-based math with cached computations
+- `Hyll::HyperLogLog`: Register mask pre-computation, fast nibble operations
+- `Hyll::EnhancedHyperLogLog`: Cached modification probability, zero-allocation updates
 - `Hyll::Factory`: Factory pattern for creating counters
 
 ## Examples
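For orientation on what `Hyll::Utils::Math`'s cached pieces feed into: the standard HyperLogLog raw estimate combines α×m² with the harmonic sum of the registers. Below is a hedged sketch with illustrative names (`alpha_for`, `raw_estimate`), not the gem's API.

```ruby
# Standard HyperLogLog raw estimator, sketched for illustration.
def alpha_for(m)
  case m
  when 16 then 0.673
  when 32 then 0.697
  when 64 then 0.709
  else 0.7213 / (1.0 + 1.079 / m) # m >= 128
  end
end

# registers: Array of per-register rank values (m = registers.size)
def raw_estimate(registers)
  m = registers.size
  alpha_m_squared = alpha_for(m) * m * m       # what ALPHA_M_SQUARED caches per precision
  harmonic_sum = registers.sum { |v| 2.0**-v } # POW2_NEG_TABLE lookups in the optimized path
  alpha_m_squared / harmonic_sum
end

raw_estimate(Array.new(4096, 0)) # => ~2954 even for an empty sketch, before corrections
```

Real implementations then apply a small-range correction (the `linear_counting` named in the changelog's Fixed section) and a large-range correction before returning a cardinality.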
data/examples/v1_benchmark.rb
ADDED
@@ -0,0 +1,93 @@
+# frozen_string_literal: true
+
+# Hyll v1.0.0 - Blazing Fast Edition Benchmark
+# Run with: ruby examples/v1_benchmark.rb
+
+$LOAD_PATH.unshift File.expand_path("../lib", __dir__)
+require "hyll"
+require "benchmark"
+
+puts "=" * 60
+puts "HYLL v1.0.0 - BLAZING FAST EDITION BENCHMARK"
+puts "=" * 60
+puts
+
+# Test 1: Element Addition
+puts "1. Element Addition Performance"
+puts "-" * 40
+
+hll_standard = Hyll.new(type: :standard)
+hll_enhanced = Hyll.new(type: :enhanced)
+
+# Use integers for cleaner testing
+elements = (1..100_000).to_a
+
+time_standard = Benchmark.measure { elements.each { |e| hll_standard.add(e) } }
+puts "Standard HLL (100k elements): #{time_standard.real.round(4)}s"
+
+time_enhanced = Benchmark.measure { elements.each { |e| hll_enhanced.add(e) } }
+puts "Enhanced HLL (100k elements): #{time_enhanced.real.round(4)}s"
+puts
+
+# Test 2: Batch Addition
+puts "2. Batch Addition Performance"
+puts "-" * 40
+
+hll_batch = Hyll.new(type: :standard)
+time_batch = Benchmark.measure { hll_batch.add_all(elements) }
+puts "Standard HLL batch (100k elements): #{time_batch.real.round(4)}s"
+puts "Speedup vs individual: #{(time_standard.real / time_batch.real).round(2)}x"
+puts
+
+# Test 3: Cardinality Calculation
+puts "3. Cardinality Calculation Performance"
+puts "-" * 40
+
+hll = Hyll.new(type: :standard)
+hll.add_all(elements)
+
+time_card = Benchmark.measure { 1000.times { hll.cardinality } }
+puts "Standard HLL (1000 calls): #{time_card.real.round(4)}s"
+puts "Per call: #{((time_card.real / 1000) * 1_000_000).round(2)} microseconds"
+puts
+
+# Test 4: Memory Efficiency
+puts "4. Memory Efficiency"
+puts "-" * 40
+
+# For integers, estimate as 8 bytes each (Fixnum size)
+array_size = elements.size * 8
+hll_size = hll.serialize.bytesize
+
+puts "Raw data size: #{array_size} bytes"
+puts "HLL serialized size: #{hll_size} bytes"
+puts "Compression ratio: #{(array_size.to_f / hll_size).round(2)}x"
+puts
+
+# Test 5: Accuracy
+puts "5. Accuracy Check"
+puts "-" * 40
+puts "Actual count: 100,000"
+puts "Estimated count: #{hll.cardinality.round(0)}"
+puts "Error: #{((hll.cardinality - 100_000).abs / 100_000.0 * 100).round(2)}%"
+puts
+
+# Test 6: Hash Performance
+puts "6. Hash Function Performance"
+puts "-" * 40
+
+class HashTester
+  include Hyll::Utils::Hash
+end
+
+tester = HashTester.new
+time_hash = Benchmark.measure do
+  100_000.times { |i| tester.murmurhash3(i.to_s) }
+end
+puts "MurmurHash3 (100k hashes): #{time_hash.real.round(4)}s"
+puts "Per hash: #{((time_hash.real / 100_000) * 1_000_000).round(2)} microseconds"
+puts
+
+puts "=" * 60
+puts "BENCHMARK COMPLETE"
+puts "=" * 60