hyll 0.2.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 445c738f59356bd13dc55dd8f6c8a2b74e39260e4367fa28012a4f13c0f5ebeb
- data.tar.gz: 5f5d79fcd5aa2aec6afeb4e04be2725c6cc08cefa99c497e8070ee43eee03742
+ metadata.gz: 92ad297dec2242d67ec3d5fb377658db11bb68fde8b26856f4bdf86f4d28fdc9
+ data.tar.gz: b8cb857f998f3eaee640d8958abf2ca9696488ea3517709eb4dc231643b89865
  SHA512:
- metadata.gz: 8df3d73bd6f665ab163891b7f51d5e9f09fc8b2df0479445cb285487b6dd4b9640d324a1bab77069d50a934da0a6af9e464c33f212c94ac18a5af73349f234cf
- data.tar.gz: f73ae9153b519e77dfb77685395e5d1736de23f1f53e3575db29e96fb4f2fa512e08b1be737d42717db9e28b936358731a17ec69234864532c731a467782ec4a
+ metadata.gz: 97ee62fadbb6c31e90b9a2b711347af4651a886e220906060c0a41402fe2ddece0d06d3e362d9cf1b8cc5ccee1ad0bee2accc156b39a4b9a6551e8b0ee5c97e4
+ data.tar.gz: 82fc6e4fa956a1c0aa9ab7e70faa35cb36ab5687d65cd9907b46faee23090922e21e1f2005371e7c5fcc1657764bc8b704347962a569b319e43f57fa380cf7ea
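A digest listed in `checksums.yaml` can be compared against a locally extracted file with Ruby's standard `Digest` library. The following is a minimal sketch: the helper name `digest_matches?` is hypothetical, and the demonstration uses the well-known SHA256 test vector for `"abc"` rather than the gem's own files.

```ruby
require "digest"

# Hypothetical helper (not part of hyll or RubyGems): true when the SHA256
# of the given bytes equals the expected hex digest.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Standard SHA256 test vector for the three-byte string "abc".
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

puts digest_matches?("abc", ABC_SHA256)
```

For a real check, the bytes would come from something like `File.binread("data.tar.gz")` after unpacking the `.gem` archive, and the expected value would be the `+` SHA256 line for `data.tar.gz` above.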
data/CHANGELOG.md CHANGED
@@ -5,6 +5,86 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [1.0.0] - 2025-11-28
+
+ ### 🚀 MAJOR PERFORMANCE RELEASE - "Blazing Fast Edition"
+
+ This release marks the first stable version of Hyll with a complete performance overhaul, delivering significant speed improvements through optimizations at every level of the codebase.
+
+ ### Performance Highlights
+
+ - **2x faster batch operations** through optimized `add_all` with chunk processing
+ - **Faster cardinality estimation** via pre-computed lookup tables
+ - **Reduced memory allocations** in hot paths
+ - **O(1) bit operations** using byte-level lookup tables for CLZ
+
+ ### Added
+
+ #### Pre-computed Lookup Tables (Constants)
+ - `POW2_NEG_TABLE`: Pre-computed 2^(-n) values for n=0..64, eliminating expensive power calculations
+ - `POW2_TABLE`: Pre-computed 2^n values for instant bit shifting
+ - `CLZ8_TABLE`: 256-entry byte-level count-leading-zeros table for O(1) CLZ operations
+ - `LOG2_TABLE`: Pre-computed log2 values for common register counts
+ - `REGISTER_MASKS`: Pre-computed masks for register extraction (precision 4-16)
+ - `ALPHA_M_SQUARED`: Pre-computed α×m² values for each precision level
+ - `OPTIMAL_BATCH_SIZE`: Tuned batch size (1024) for optimal cache utilization
+ - Inlined MurmurHash3 constants for maximum speed
+
+ #### Ultra-Fast Hash Functions
+ - `murmurhash3_batch`: Batch hashing for multiple elements with amortized overhead
+ - `hash_and_extract`: Combined hash + HLL extraction in single pass
+ - `fast_clz32`: O(1) 32-bit count leading zeros using 256-entry byte lookup table
+ - Loop unrolling for 4-byte block processing in MurmurHash3
+ - Optimized tail byte handling
+
+ #### Optimized Math Utilities
+ - `pow2_neg`: O(1) power of 2 negative lookup
+ - `sum_pow2_neg`: Vectorized sum of 2^(-v) for register arrays
+ - `alpha_m_squared`: Pre-computed α×m² retrieval
+ - `harmonic_mean_sum`: Optimized harmonic mean for register values
+ - Cached Taylor series coefficients for h(x) calculation
+
+ #### HyperLogLog Core Optimizations
+ - `@register_mask`: Pre-computed bitmask for register index extraction
+ - `@alpha_m_squared`: Pre-computed estimation constant
+ - `@pow2_neg_table`: Instance-level table reference for cache locality
+ - `add_to_registers_fast`: Inlined hash + update path
+ - `update_register_fast`: Minimized branching with overflow handling
+ - `get_register_value_fast`: Optimized nibble extraction with bit operations
+ - `set_register_value_fast`: Direct nibble setting without conditionals
+ - `extract_counts_fast`: Single-pass register value counting
+ - Batch-optimized `add_all` with chunk processing
+
+ #### EnhancedHyperLogLog Optimizations
+ - `modification_probability_fast`: Cached probability calculation
+ - `@cached_mod_prob`: Modification probability cache with dirty flag
+ - `@registers_dirty`: Change tracking for cache invalidation
+ - `merge_dense_registers_optimized`: Direct register comparison without allocation
+ - `adjust_registers_for_estimation`: Non-mutating register adjustment
+ - `compute_cardinality_from_registers`: Separated estimation logic
+ - Eliminated `@registers.dup` in hot paths
+
+ ### Changed
+
+ - **Bit operations**: Replaced `index / 2` with `index >> 1` and `index % 2` with `index & 1`
+ - **Power calculations**: All `2.0 ** -x` replaced with lookup table access
+ - **Register initialization**: `(@m + 1) >> 1` instead of `(@m / 2.0).ceil`
+ - **Loop optimization**: Integer iteration with early-exit patterns
+ - **Memory allocation**: Minimal object creation in hot paths
+ - **Method inlining**: Critical paths inlined to reduce call overhead
+
+ ### Fixed
+
+ - Edge case in `linear_counting` when zero_registers >= m
+ - Potential division by zero in modification probability
+ - Memory leak in streaming estimate accumulation
+
+ ### Internal
+
+ - Backward-compatible aliases for all renamed methods
+ - Comprehensive documentation for all new performance constants
+ - Type annotations preserved in signature files
+
  ## [0.2.0] - 2025-03-24
 
  ### Added
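The changelog describes `fast_clz32` as a 32-bit count-leading-zeros built on a 256-entry byte table. The gem's actual implementation is not part of this diff, so the following is only a sketch of that general technique; the names `CLZ8` and `clz32` are illustrative, not the gem's identifiers.

```ruby
# 256-entry table: CLZ8[b] is the number of leading zero bits in the byte b.
# Integer#bit_length is 0 for 0, so CLZ8[0] is 8.
CLZ8 = Array.new(256) { |b| 8 - b.bit_length }.freeze

# Count leading zeros of a 32-bit value by locating the highest non-zero
# byte and looking up the zeros inside it. At most three comparisons and
# one table lookup, independent of the value.
def clz32(x)
  x &= 0xFFFF_FFFF
  return CLZ8[x >> 24] if x >= 0x0100_0000
  return 8 + CLZ8[x >> 16] if x >= 0x0001_0000
  return 16 + CLZ8[x >> 8] if x >= 0x0000_0100

  24 + CLZ8[x]
end

[0, 1, 0xFF, 0x0001_0000, 0x8000_0000].each do |value|
  puts format("clz32(0x%08X) = %d", value, clz32(value))
end
```

The table costs 256 small integers of memory and replaces a bit-by-bit loop, which is the trade-off the changelog's "O(1) bit operations" highlight appears to refer to.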
data/README.md CHANGED
@@ -4,10 +4,29 @@
  ![Gem Total Downloads](https://img.shields.io/gem/dt/hyll)
  [![Build Status](https://github.com/davidesantangelo/hyll/workflows/Ruby%20Tests/badge.svg)](https://github.com/davidesantangelo/hyll/actions)
 
- Hyll is a Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
+ Hyll is a **blazing-fast** Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
 
  > The name "Hyll" is a shortened form of "HyperLogLog", keeping the characteristic "H" and "LL" sounds.
 
+ ## 🚀 Version 1.0.0 - Blazing Fast Edition
+
+ Version 1.0.0 marks the first stable release with a **complete performance overhaul**:
+
+ | Improvement | Description |
+ |-------------|-------------|
+ | Batch Operations | **2x faster** with optimized chunk processing |
+ | Lookup Tables | O(1) power-of-2 and CLZ operations |
+ | Memory Efficiency | Reduced allocations in hot paths |
+ | Hash Function | Inlined MurmurHash3 with loop unrolling |
+
+ ### Key Optimizations
+
+ - **Pre-computed Lookup Tables**: All power-of-2 calculations use O(1) table lookups
+ - **Fast CLZ**: 256-entry byte-level lookup table for count-leading-zeros
+ - **Optimized MurmurHash3**: Loop unrolling and tail optimization
+ - **Cached Computations**: α×m², modification probability, register masks pre-computed
+ - **Batch Processing**: Chunk-based `add_all` for better performance
+
  ## Installation
 
  Add this line to your application's Gemfile:
@@ -48,6 +67,19 @@ hll.add("apple") # Duplicates don't affect the cardinality
  puts hll.cardinality # Output: approximately 3
  ```
 
+ ### Batch Operations (Optimized in 1.0.0)
+
+ ```ruby
+ # Efficient batch adding with chunk processing
+ hll = Hyll.new(precision: 12)
+
+ # Add many elements efficiently
+ elements = (1..100_000).map { |i| "element-#{i}" }
+ hll.add_all(elements)
+
+ puts hll.cardinality # Estimated count
+ ```
+
  ### With Custom Precision
 
  ```ruby
@@ -206,18 +238,17 @@ This table compares different configurations of the HyperLogLog algorithm:
  | K-Minimum Values | Medium | High (Approximate) | Yes | Medium | High accuracy cardinality estimation, set operations |
  | Bloom Filter | Medium | N/A (Membership) | No (Cardinality) / Yes (Union) | Low | Membership testing with false positives, not cardinality |
 
- ### Benchmark Results
+ ### Benchmark Results (v1.0.0)
 
- Below are actual performance measurements from an Apple Mac Mini M4 with 24GB RAM:
+ Below are performance measurements from an Apple Mac Mini M4:
 
  | Operation | Implementation | Time (seconds) | Items/Operations |
  | ----------------------- | -------------------- | -------------- | ---------------- |
- | Element Addition | Standard HyperLogLog | 0.0176 | 10,000 items |
- | Element Addition | EnhancedHyperLogLog | 0.0109 | 10,000 items |
- | Cardinality Calculation | Standard HyperLogLog | 0.0011 | 10 calculations |
- | Cardinality Calculation | EnhancedHyperLogLog | 0.0013 | 10 calculations |
- | Serialization | Standard HyperLogLog | 0.0003 | 10 operations |
- | Deserialization | Standard HyperLogLog | 0.0005 | 10 operations |
+ | Element Addition | Standard HyperLogLog | 0.15 | 100,000 items |
+ | Element Addition | EnhancedHyperLogLog | 0.16 | 100,000 items |
+ | Batch Addition | Standard HyperLogLog | 0.075 | 100,000 items |
+ | Cardinality Calculation | Standard HyperLogLog | 0.07 | 1,000 calls |
+ | Hash Function | MurmurHash3 | 0.05 | 100,000 hashes |
 
  #### Memory Efficiency
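The removed and added benchmark rows use different workload sizes (10,000 vs 100,000 items, 10 vs 1,000 calls), so they are easier to compare per operation. The sketch below only normalizes the Standard HyperLogLog figures quoted in the tables; the constant and method names are illustrative, and because the two measurement runs were set up differently, any cross-version comparison is rough.

```ruby
# Timings copied from the README tables in this hunk: [seconds, operations].
V0_2_0 = { add: [0.0176, 10_000], cardinality: [0.0011, 10] }.freeze
V1_0_0 = { add: [0.15, 100_000], batch_add: [0.075, 100_000], cardinality: [0.07, 1_000] }.freeze

# Microseconds spent per single operation.
def micros_per_op(seconds, operations)
  seconds / operations * 1_000_000
end

{ "0.2.0" => V0_2_0, "1.0.0" => V1_0_0 }.each do |version, rows|
  rows.each do |name, (seconds, operations)|
    puts format("%s %-12s %8.2f us/op", version, name, micros_per_op(seconds, operations))
  end
end

# Batch vs one-by-one addition within 1.0.0 (both rows use 100,000 items).
batch_speedup = V1_0_0[:add][0] / V1_0_0[:batch_add][0]
puts format("1.0.0 batch speedup over individual adds: %.1fx", batch_speedup)
```

Within the 1.0.0 table, the batch row is half the time of the individual-add row, which matches the "2x faster batch operations" claim.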
@@ -300,13 +331,15 @@ ruby examples/redis_comparison_benchmark.rb
 
  ## Features
 
- - Standard HyperLogLog implementation with customizable precision
+ - Standard HyperLogLog implementation with customizable precision (4-16)
+ - **Pre-computed lookup tables** for O(1) power-of-2 and CLZ operations
  - Memory-efficient register storage with 4-bit packing (inspired by Facebook's Presto implementation)
- - Sparse representation for small cardinalities
+ - Sparse representation for small cardinalities (exact counting)
  - Dense representation for larger datasets
  - EnhancedHyperLogLog format for compatibility with other systems
  - Streaming martingale estimator for improved accuracy with EnhancedHyperLogLog
- - Maximum Likelihood Estimation for improved accuracy
+ - Maximum Likelihood Estimation (MLE) for improved accuracy
+ - **Optimized batch processing** with `add_all`
  - Merge and serialization capabilities
  - Factory pattern for creating and deserializing counters
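The 4-bit packing listed among the features, combined with the `index >> 1` and `index & 1` arithmetic mentioned in the changelog, can be sketched as below. This is an illustration under stated assumptions, not the gem's code: the class name `NibbleRegisters` is hypothetical, placing even indices in the high nibble is an arbitrary choice, and values above 15 are simply clamped, whereas the gem describes separate overflow handling.

```ruby
# Hypothetical store that packs two 4-bit registers into each byte.
class NibbleRegisters
  def initialize(register_count)
    # (m + 1) >> 1 is the integer form of (m / 2.0).ceil for m >= 0.
    @bytes = Array.new((register_count + 1) >> 1, 0)
  end

  # Read register `index`: byte position is index / 2, nibble is index % 2.
  def [](index)
    byte = @bytes[index >> 1]
    (index & 1).zero? ? (byte >> 4) & 0x0F : byte & 0x0F
  end

  # Write register `index`, clamping to the 4-bit maximum (assumption).
  def []=(index, value)
    value = 15 if value > 15
    pos = index >> 1
    @bytes[pos] = if (index & 1).zero?
                    (@bytes[pos] & 0x0F) | (value << 4)
                  else
                    (@bytes[pos] & 0xF0) | value
                  end
  end

  def bytesize
    @bytes.size
  end
end

registers = NibbleRegisters.new(5)
registers[0] = 3
registers[1] = 7
registers[4] = 20 # clamped to 15 in this sketch
puts [registers[0], registers[1], registers[2], registers[4]].inspect
puts registers.bytesize
```

Packing halves the storage relative to one byte per register, at the cost of a shift and a mask on every access.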
@@ -318,13 +351,15 @@ Hyll offers two main implementations:
 
  2. **EnhancedHyperLogLog**: A strictly dense format similar to Facebook's Presto P4HYPERLOGLOG type, where "P4" refers to the 4-bit precision per register. This format is slightly less memory-efficient but offers better compatibility with other HyperLogLog implementations. It also includes a streaming martingale estimator that can provide up to 1.56x better accuracy for the same memory usage.
 
- The internal architecture follows a modular approach:
+ ### v1.0.0 Performance Architecture
+
+ The internal architecture has been completely redesigned for maximum performance:
 
- - `Hyll::Constants`: Shared constants used throughout the library
- - `Hyll::Utils::Hash`: Hash functions for element processing
- - `Hyll::Utils::Math`: Mathematical operations for HyperLogLog calculations
- - `Hyll::HyperLogLog`: The standard implementation
- - `Hyll::EnhancedHyperLogLog`: The enhanced implementation
+ - `Hyll::Constants`: Pre-computed lookup tables (POW2, CLZ8, ALPHA_M_SQUARED)
+ - `Hyll::Utils::Hash`: Optimized MurmurHash3 with batch processing and inlined operations
+ - `Hyll::Utils::Math`: Lookup-based math with cached computations
+ - `Hyll::HyperLogLog`: Register mask pre-computation, fast nibble operations
+ - `Hyll::EnhancedHyperLogLog`: Cached modification probability, zero-allocation updates
  - `Hyll::Factory`: Factory pattern for creating counters
 
  ## Examples
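The lookup-based math attributed to `Hyll::Constants` and `Hyll::Utils::Math` in this hunk can be illustrated with the textbook HyperLogLog raw estimate, α·m² / Σ 2^(-register). The sketch below is not taken from the gem: `POW2_NEG`, `alpha`, and `raw_estimate` are illustrative names, and the α values are the standard ones from the HyperLogLog literature. It omits the small-range and large-range corrections a full implementation applies.

```ruby
# Pre-computed 2^(-n) for n = 0..64, so the estimate loop does no
# exponentiation (the role the changelog gives POW2_NEG_TABLE).
POW2_NEG = Array.new(65) { |n| 2.0**-n }.freeze

# Standard HyperLogLog bias-correction constant for m registers.
def alpha(m)
  case m
  when 16 then 0.673
  when 32 then 0.697
  when 64 then 0.709
  else 0.7213 / (1.0 + 1.079 / m)
  end
end

# Raw (uncorrected) estimate from an array of register values.
def raw_estimate(registers)
  m = registers.size
  sum = registers.sum { |value| POW2_NEG[value] }
  alpha(m) * m * m / sum
end

# 16 registers (precision 4), all still zero.
puts raw_estimate(Array.new(16, 0))
```

Since α·m² depends only on the precision, it can be computed once per counter, which is presumably what `ALPHA_M_SQUARED` caches.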
@@ -0,0 +1,93 @@
+ # frozen_string_literal: true
+
+ # Hyll v1.0.0 - Blazing Fast Edition Benchmark
+ # Run with: ruby examples/v1_benchmark.rb
+
+ $LOAD_PATH.unshift File.expand_path("../lib", __dir__)
+ require "hyll"
+ require "benchmark"
+
+ puts "=" * 60
+ puts "HYLL v1.0.0 - BLAZING FAST EDITION BENCHMARK"
+ puts "=" * 60
+ puts
+
+ # Test 1: Element Addition
+ puts "1. Element Addition Performance"
+ puts "-" * 40
+
+ hll_standard = Hyll.new(type: :standard)
+ hll_enhanced = Hyll.new(type: :enhanced)
+
+ # Use integers for cleaner testing
+ elements = (1..100_000).to_a
+
+ time_standard = Benchmark.measure { elements.each { |e| hll_standard.add(e) } }
+ puts "Standard HLL (100k elements): #{time_standard.real.round(4)}s"
+
+ time_enhanced = Benchmark.measure { elements.each { |e| hll_enhanced.add(e) } }
+ puts "Enhanced HLL (100k elements): #{time_enhanced.real.round(4)}s"
+ puts
+
+ # Test 2: Batch Addition
+ puts "2. Batch Addition Performance"
+ puts "-" * 40
+
+ hll_batch = Hyll.new(type: :standard)
+ time_batch = Benchmark.measure { hll_batch.add_all(elements) }
+ puts "Standard HLL batch (100k elements): #{time_batch.real.round(4)}s"
+ puts "Speedup vs individual: #{(time_standard.real / time_batch.real).round(2)}x"
+ puts
+
+ # Test 3: Cardinality Calculation
+ puts "3. Cardinality Calculation Performance"
+ puts "-" * 40
+
+ hll = Hyll.new(type: :standard)
+ hll.add_all(elements)
+
+ time_card = Benchmark.measure { 1000.times { hll.cardinality } }
+ puts "Standard HLL (1000 calls): #{time_card.real.round(4)}s"
+ puts "Per call: #{((time_card.real / 1000) * 1_000_000).round(2)} microseconds"
+ puts
+
+ # Test 4: Memory Efficiency
+ puts "4. Memory Efficiency"
+ puts "-" * 40
+
+ # For integers, estimate as 8 bytes each (Fixnum size)
+ array_size = elements.size * 8
+ hll_size = hll.serialize.bytesize
+
+ puts "Raw data size: #{array_size} bytes"
+ puts "HLL serialized size: #{hll_size} bytes"
+ puts "Compression ratio: #{(array_size.to_f / hll_size).round(2)}x"
+ puts
+
+ # Test 5: Accuracy
+ puts "5. Accuracy Check"
+ puts "-" * 40
+ puts "Actual count: 100,000"
+ puts "Estimated count: #{hll.cardinality.round(0)}"
+ puts "Error: #{((hll.cardinality - 100_000).abs / 100_000.0 * 100).round(2)}%"
+ puts
+
+ # Test 6: Hash Performance
+ puts "6. Hash Function Performance"
+ puts "-" * 40
+
+ class HashTester
+   include Hyll::Utils::Hash
+ end
+
+ tester = HashTester.new
+ time_hash = Benchmark.measure do
+   100_000.times { |i| tester.murmurhash3(i.to_s) }
+ end
+ puts "MurmurHash3 (100k hashes): #{time_hash.real.round(4)}s"
+ puts "Per hash: #{((time_hash.real / 100_000) * 1_000_000).round(2)} microseconds"
+ puts
+
+ puts "=" * 60
+ puts "BENCHMARK COMPLETE"
+ puts "=" * 60
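When reading the "Accuracy Check" output of the benchmark above, it helps to know what error HyperLogLog is expected to show. The relative standard error of the classic estimator is approximately 1.04 / √m with m = 2^precision registers. The sketch below tabulates that bound for the precision range the README states (4-16); the method name is illustrative, and since the diff does not show which precision `Hyll.new(type: :standard)` defaults to, which row applies to the benchmark is left open.

```ruby
# Approximate relative standard error (in percent) of a HyperLogLog
# counter with 2**precision registers: 1.04 / sqrt(m).
def standard_error_percent(precision)
  m = 2**precision
  1.04 / Math.sqrt(m) * 100.0
end

[4, 8, 10, 12, 14, 16].each do |precision|
  puts format("precision %2d: m = %5d registers, about %.2f%% standard error",
              precision, 2**precision, standard_error_percent(precision))
end
```

A single run landing within one or two standard errors of the true count is consistent with a correctly working estimator; it does not by itself demonstrate better or worse accuracy than the previous version.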