RubyGems - hyll - Versions diffs - 0.1.1 → 1.0.0 - Mend

hyll 0.1.1 → 1.0.0


Files changed (12)

checksums.yaml +4 -4
data/CHANGELOG.md +102 -0
data/README.md +132 -18
data/examples/redis_comparison_benchmark.rb +539 -0
data/examples/v1_benchmark.rb +93 -0
data/lib/hyll/algorithms/enhanced_hyperloglog.rb +240 -119
data/lib/hyll/algorithms/hyperloglog.rb +263 -327
data/lib/hyll/constants.rb +75 -0
data/lib/hyll/utils/hash.rb +132 -21
data/lib/hyll/utils/math.rb +136 -66
data/lib/hyll/version.rb +1 -1
metadata +4 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 322ba54ca099c154e9bc295200933314e019bc61adcc0df9f45a4ee352e7a8fa
-  data.tar.gz: ebcdfac3bcd8d421876c329b33db3be5594379ce7586d6d11585beb6308a7e52
+  metadata.gz: 92ad297dec2242d67ec3d5fb377658db11bb68fde8b26856f4bdf86f4d28fdc9
+  data.tar.gz: b8cb857f998f3eaee640d8958abf2ca9696488ea3517709eb4dc231643b89865
 SHA512:
-  metadata.gz: 6d308475f7666c0d945c0a1b780aa84de2f15b987f0bea463b8897067c8190c9681d76689570535075c312de65aac30fca028dab29eb62fd48f90c12c0db2c56
-  data.tar.gz: edf858ca1185c8b419fb8291a30fd99323610724d8c3a77b82b8f19f956b9784c32125eba600284ad62ab359ebe8fc74accb783189a3b04859b0fff88cb5be70
+  metadata.gz: 97ee62fadbb6c31e90b9a2b711347af4651a886e220906060c0a41402fe2ddece0d06d3e362d9cf1b8cc5ccee1ad0bee2accc156b39a4b9a6551e8b0ee5c97e4
+  data.tar.gz: 82fc6e4fa956a1c0aa9ab7e70faa35cb36ab5687d65cd9907b46faee23090922e21e1f2005371e7c5fcc1657764bc8b704347962a569b319e43f57fa380cf7ea
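The SHA256 entries above can be checked against a locally unpacked gem. The following is a minimal sketch, not part of the package: the helper names `sha256_matches?` and `demo_sha256_check` are hypothetical, and the demo hashes a throwaway file rather than the real `data.tar.gz`.

```ruby
require "digest"
require "tempfile"

# Returns true when the SHA256 hex digest of the file at `path`
# equals `expected_hex` (compared case-insensitively).
# Hypothetical helper, not part of hyll or RubyGems.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Demo on a temporary file. For a real check, pass the path of
# data.tar.gz extracted from the .gem archive together with the
# SHA256 value listed in checksums.yaml.
def demo_sha256_check
  Tempfile.create("checksum-demo") do |f|
    f.write("hello")
    f.flush
    sha256_matches?(f.path, Digest::SHA256.hexdigest("hello"))
  end
end
```

A mismatch returns `false` rather than raising, so the caller decides how to react.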

data/CHANGELOG.md CHANGED

@@ -5,6 +5,108 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [1.0.0] - 2025-11-28
+### 🚀 MAJOR PERFORMANCE RELEASE - "Blazing Fast Edition"
+This release marks the first stable version of Hyll with a complete performance overhaul, delivering significant speed improvements through optimizations at every level of the codebase.
+### Performance Highlights
+- **2x faster batch operations** through optimized `add_all` with chunk processing
+- **Faster cardinality estimation** via pre-computed lookup tables
+- **Reduced memory allocations** in hot paths
+- **O(1) bit operations** using byte-level lookup tables for CLZ
+### Added
+#### Pre-computed Lookup Tables (Constants)
+- `POW2_NEG_TABLE`: Pre-computed 2^(-n) values for n=0..64, eliminating expensive power calculations
+- `POW2_TABLE`: Pre-computed 2^n values for instant bit shifting
+- `CLZ8_TABLE`: 256-entry byte-level count-leading-zeros table for O(1) CLZ operations
+- `LOG2_TABLE`: Pre-computed log2 values for common register counts
+- `REGISTER_MASKS`: Pre-computed masks for register extraction (precision 4-16)
+- `ALPHA_M_SQUARED`: Pre-computed α×m² values for each precision level
+- `OPTIMAL_BATCH_SIZE`: Tuned batch size (1024) for optimal cache utilization
+- Inlined MurmurHash3 constants for maximum speed
+#### Ultra-Fast Hash Functions
+- `murmurhash3_batch`: Batch hashing for multiple elements with amortized overhead
+- `hash_and_extract`: Combined hash + HLL extraction in single pass
+- `fast_clz32`: O(1) 32-bit count leading zeros using 256-entry byte lookup table
+- Loop unrolling for 4-byte block processing in MurmurHash3
+- Optimized tail byte handling
+#### Optimized Math Utilities
+- `pow2_neg`: O(1) power of 2 negative lookup
+- `sum_pow2_neg`: Vectorized sum of 2^(-v) for register arrays
+- `alpha_m_squared`: Pre-computed α×m² retrieval
+- `harmonic_mean_sum`: Optimized harmonic mean for register values
+- Cached Taylor series coefficients for h(x) calculation
+#### HyperLogLog Core Optimizations
+- `@register_mask`: Pre-computed bitmask for register index extraction
+- `@alpha_m_squared`: Pre-computed estimation constant
+- `@pow2_neg_table`: Instance-level table reference for cache locality
+- `add_to_registers_fast`: Inlined hash + update path
+- `update_register_fast`: Minimized branching with overflow handling
+- `get_register_value_fast`: Optimized nibble extraction with bit operations
+- `set_register_value_fast`: Direct nibble setting without conditionals
+- `extract_counts_fast`: Single-pass register value counting
+- Batch-optimized `add_all` with chunk processing
+#### EnhancedHyperLogLog Optimizations
+- `modification_probability_fast`: Cached probability calculation
+- `@cached_mod_prob`: Modification probability cache with dirty flag
+- `@registers_dirty`: Change tracking for cache invalidation
+- `merge_dense_registers_optimized`: Direct register comparison without allocation
+- `adjust_registers_for_estimation`: Non-mutating register adjustment
+- `compute_cardinality_from_registers`: Separated estimation logic
+- Eliminated `@registers.dup` in hot paths
+### Changed
+- **Bit operations**: Replaced `index / 2` with `index >> 1` and `index % 2` with `index & 1`
+- **Power calculations**: All `2.0 ** -x` replaced with lookup table access
+- **Register initialization**: `(@m + 1) >> 1` instead of `(@m / 2.0).ceil`
+- **Loop optimization**: Integer iteration with early-exit patterns
+- **Memory allocation**: Minimal object creation in hot paths
+- **Method inlining**: Critical paths inlined to reduce call overhead
+### Fixed
+- Edge case in `linear_counting` when zero_registers >= m
+- Potential division by zero in modification probability
+- Memory leak in streaming estimate accumulation
+### Internal
+- Backward-compatible aliases for all renamed methods
+- Comprehensive documentation for all new performance constants
+- Type annotations preserved in signature files
+## [0.2.0] - 2025-03-24
+### Added
+- Associativity test for HyperLogLog merges to ensure (A ∪ B) ∪ C = A ∪ (B ∪ C)
+- Guards against invalid inputs in mathematical functions
+- Memoization for h_values calculation in MLE algorithm
+- More comprehensive error handling with safeguards
+- Added `examples/redis_comparison_benchmark.rb` for Redis comparison
+### Changed
+- Optimized Maximum Likelihood Estimation (MLE) algorithm for better performance
+- Improved numerical stability in the secant method implementation
+- Enhanced Taylor series calculation for better accuracy
+- Fixed Math module namespace conflicts
+- Added safeguards against division by zero and other numerical errors
+- Limited maximum iterations in convergence algorithms to prevent infinite loops
+### Fixed
+- Addressed potential numerical instability in calculate_h_values method
+- Fixed undefined method 'exp' error by using global namespace operator
+- Improved edge case handling in the MLE algorithm
 ## [0.1.1] - 2025-03-21
 ### Changed
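The 1.0.0 changelog entries on byte-level CLZ lookup and nibble access (`index >> 1`, `index & 1`, `(@m + 1) >> 1`) can be illustrated roughly as follows. This is a sketch under assumptions, not the gem's code: every name ending in `_sketch` is hypothetical, and the nibble order (even index in the low nibble) is a guess that may differ from what hyll actually uses.

```ruby
# 256-entry table: entry b holds the number of leading zero bits
# in the 8-bit value b (8 for b == 0).
CLZ8_SKETCH = Array.new(256) { |b| b.zero? ? 8 : 8 - b.bit_length }.freeze

# Count leading zeros of a 32-bit value using at most one table lookup.
def clz32_sketch(x)
  x &= 0xFFFFFFFF
  return CLZ8_SKETCH[x >> 24] if x >= 0x0100_0000
  return 8 + CLZ8_SKETCH[x >> 16] if x >= 0x0001_0000
  return 16 + CLZ8_SKETCH[x >> 8] if x >= 0x0000_0100

  24 + CLZ8_SKETCH[x]
end

# Allocate packed storage for m 4-bit registers: two registers per byte,
# rounded up, which is what (m + 1) >> 1 computes.
def new_registers_sketch(m)
  Array.new((m + 1) >> 1, 0)
end

# Read the 4-bit register at `index` without branching on parity.
# Assumed layout: even index in the low nibble, odd index in the high nibble.
def get_nibble_sketch(bytes, index)
  shift = (index & 1) << 2 # 0 for even, 4 for odd
  (bytes[index >> 1] >> shift) & 0x0F
end

# Write the 4-bit register at `index`, preserving the neighbouring nibble.
def set_nibble_sketch(bytes, index, value)
  shift = (index & 1) << 2
  i = index >> 1
  bytes[i] = (bytes[i] & (0xF0 >> shift)) | ((value & 0x0F) << shift)
end
```

Computing the shift from `index & 1` replaces the even/odd conditional with arithmetic, which is one way to read the "without conditionals" wording in the changelog.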

data/README.md CHANGED

@@ -1,11 +1,32 @@
 # Hyll
+![Gem Version](https://img.shields.io/gem/v/hyll)
+![Gem Total Downloads](https://img.shields.io/gem/dt/hyll)
 [![Build Status](https://github.com/davidesantangelo/hyll/workflows/Ruby%20Tests/badge.svg)](https://github.com/davidesantangelo/hyll/actions)
-Hyll is a Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
+Hyll is a **blazing-fast** Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
 > The name "Hyll" is a shortened form of "HyperLogLog", keeping the characteristic "H" and "LL" sounds.
+## 🚀 Version 1.0.0 - Blazing Fast Edition
+Version 1.0.0 marks the first stable release with a **complete performance overhaul**:
+| Improvement | Description |
+|-------------|-------------|
+| Batch Operations | **2x faster** with optimized chunk processing |
+| Lookup Tables | O(1) power-of-2 and CLZ operations |
+| Memory Efficiency | Reduced allocations in hot paths |
+| Hash Function | Inlined MurmurHash3 with loop unrolling |
+### Key Optimizations
+- **Pre-computed Lookup Tables**: All power-of-2 calculations use O(1) table lookups
+- **Fast CLZ**: 256-entry byte-level lookup table for count-leading-zeros
+- **Optimized MurmurHash3**: Loop unrolling and tail optimization
+- **Cached Computations**: α×m², modification probability, register masks pre-computed
+- **Batch Processing**: Chunk-based `add_all` for better performance
 ## Installation
 Add this line to your application's Gemfile:
@@ -46,6 +67,19 @@ hll.add("apple")  # Duplicates don't affect the cardinality
 puts hll.cardinality  # Output: approximately 3
 ```
+### Batch Operations (Optimized in 1.0.0)
+```ruby
+# Efficient batch adding with chunk processing
+hll = Hyll.new(precision: 12)
+# Add many elements efficiently
+elements = (1..100_000).map { |i| "element-#{i}" }
+hll.add_all(elements)
+puts hll.cardinality  # Estimated count
+```
 ### With Custom Precision
 ```ruby
@@ -204,18 +238,17 @@ This table compares different configurations of the HyperLogLog algorithm:
 | K-Minimum Values | Medium            | High (Approximate)     | Yes                            | Medium                    | High accuracy cardinality estimation, set operations     |
 | Bloom Filter     | Medium            | N/A (Membership)       | No (Cardinality) / Yes (Union) | Low                       | Membership testing with false positives, not cardinality |
-### Benchmark Results
+### Benchmark Results (v1.0.0)
-Below are actual performance measurements from an Apple Mac Mini M4 with 24GB RAM:
+Below are performance measurements from an Apple Mac Mini M4:
 | Operation               | Implementation       | Time (seconds) | Items/Operations |
 | ----------------------- | -------------------- | -------------- | ---------------- |
-| Element Addition        | Standard HyperLogLog | 0.0176         | 10,000 items     |
-| Element Addition        | EnhancedHyperLogLog        | 0.0109         | 10,000 items     |
-| Cardinality Calculation | Standard HyperLogLog | 0.0011         | 10 calculations  |
-| Cardinality Calculation | EnhancedHyperLogLog        | 0.0013         | 10 calculations  |
-| Serialization           | Standard HyperLogLog | 0.0003         | 10 operations    |
-| Deserialization         | Standard HyperLogLog | 0.0005         | 10 operations    |
+| Element Addition        | Standard HyperLogLog | 0.15           | 100,000 items    |
+| Element Addition        | EnhancedHyperLogLog  | 0.16           | 100,000 items    |
+| Batch Addition          | Standard HyperLogLog | 0.075          | 100,000 items    |
+| Cardinality Calculation | Standard HyperLogLog | 0.07           | 1,000 calls      |
+| Hash Function           | MurmurHash3          | 0.05           | 100,000 hashes   |
 #### Memory Efficiency
@@ -226,15 +259,87 @@ Below are actual performance measurements from an Apple Mac Mini M4 with 24GB RA
 These benchmarks demonstrate HyperLogLog's exceptional memory efficiency, maintaining a compression ratio of over 6,250x compared to storing the raw elements, while still providing accurate cardinality estimates.
+## Benchmark Comparison with Redis
+Hyll has been benchmarked against Redis' HyperLogLog implementation to provide a comparison with a widely-used production system. The tests were run on an Apple Silicon M1 Mac using Ruby 3.1.4 with 10,000 elements and a precision of 10.
+### Insertion Performance
+| Implementation | Operations/sec | Relative Performance |
+|----------------|---------------:|---------------------:|
+| Hyll Standard  | 86.32          | 1.00x (fastest)      |
+| Hyll Batch     | 85.98          | 1.00x                |
+| Redis Pipelined| 20.51          | 4.21x slower         |
+| Redis PFADD    | 4.93           | 17.51x slower        |
+| Hyll Enhanced  | 1.20           | 71.87x slower        |
+### Cardinality Estimation Performance
+| Implementation      | Operations/sec | Relative Performance |
+|---------------------|---------------:|---------------------:|
+| Redis PFCOUNT       | 53,131         | 1.00x (fastest)      |
+| Hyll Enhanced Stream| 24,412         | 2.18x slower         |
+| Hyll Enhanced       | 8,843          | 6.01x slower         |
+| Hyll Standard       | 8,538          | 6.22x slower         |
+| Hyll MLE            | 5,645          | 9.41x slower         |
+### Merge Performance
+| Implementation | Operations/sec | Relative Performance |
+|----------------|---------------:|---------------------:|
+| Redis PFMERGE  | 12,735         | 1.00x (fastest)      |
+| Hyll Enhanced  | 6,523          | 1.95x slower         |
+| Hyll Standard  | 2,932          | 4.34x slower         |
+### Memory Usage
+| Implementation   | Memory Usage |
+|------------------|-------------:|
+| Hyll Enhanced    | 0.28 KB      |
+| Hyll Standard    | 18.30 KB     |
+| Redis            | 12.56 KB     |
+| Raw Elements     | 0.04 KB      |
+### Accuracy Comparison
+| Implementation      | Estimated Count | Actual Count | Error    |
+|---------------------|----------------:|-------------:|---------:|
+| Redis               | 9,990           | 10,000       | 0.10%    |
+| Hyll Enhanced       | 3,018           | 10,000       | 69.82%   |
+| Hyll (High Prec)    | 19,016          | 10,000       | 90.16%   |
+| Hyll Standard       | 32,348          | 10,000       | 223.48%  |
+| Hyll Enhanced Stream| 8,891,659       | 10,000       | 88,816.59% |
+| Hyll MLE            | 19,986,513      | 10,000       | 199,765.13% |
+### Summary of Findings
+- **Insertion Performance**: Hyll Standard and Batch operations are significantly faster than Redis for adding elements.
+- **Cardinality Estimation**: Redis has the fastest cardinality estimation, with Hyll Enhanced Stream as a close second.
+- **Merge Operations**: Redis outperforms Hyll for merging HyperLogLog sketches, but Hyll Enhanced provides competitive performance.
+- **Memory Usage**: Hyll Enhanced offers the most memory-efficient implementation.
+- **Accuracy**: Redis provides the best accuracy in this test scenario.
+#### Recommendation
+For most use cases, Redis offers an excellent balance of accuracy and performance. However, Hyll provides superior insertion performance and memory efficiency, making it a good choice for scenarios where these attributes are prioritized.
+You can run these benchmarks yourself using the included script:
+```ruby
+ruby examples/redis_comparison_benchmark.rb
+```
 ## Features
-- Standard HyperLogLog implementation with customizable precision
+- Standard HyperLogLog implementation with customizable precision (4-16)
+- **Pre-computed lookup tables** for O(1) power-of-2 and CLZ operations
 - Memory-efficient register storage with 4-bit packing (inspired by Facebook's Presto implementation)
-- Sparse representation for small cardinalities
+- Sparse representation for small cardinalities (exact counting)
 - Dense representation for larger datasets
 - EnhancedHyperLogLog format for compatibility with other systems
 - Streaming martingale estimator for improved accuracy with EnhancedHyperLogLog
-- Maximum Likelihood Estimation for improved accuracy
+- Maximum Likelihood Estimation (MLE) for improved accuracy
+- **Optimized batch processing** with `add_all`
 - Merge and serialization capabilities
 - Factory pattern for creating and deserializing counters
@@ -246,13 +351,15 @@ Hyll offers two main implementations:
 2. **EnhancedHyperLogLog**: A strictly dense format similar to Facebook's Presto P4HYPERLOGLOG type, where "P4" refers to the 4-bit precision per register. This format is slightly less memory-efficient but offers better compatibility with other HyperLogLog implementations. It also includes a streaming martingale estimator that can provide up to 1.56x better accuracy for the same memory usage.
-The internal architecture follows a modular approach:
+### v1.0.0 Performance Architecture
+The internal architecture has been completely redesigned for maximum performance:
-- `Hyll::Constants`: Shared constants used throughout the library
-- `Hyll::Utils::Hash`: Hash functions for element processing
-- `Hyll::Utils::Math`: Mathematical operations for HyperLogLog calculations
-- `Hyll::HyperLogLog`: The standard implementation
-- `Hyll::EnhancedHyperLogLog`: The enhanced implementation
+- `Hyll::Constants`: Pre-computed lookup tables (POW2, CLZ8, ALPHA_M_SQUARED)
+- `Hyll::Utils::Hash`: Optimized MurmurHash3 with batch processing and inlined operations
+- `Hyll::Utils::Math`: Lookup-based math with cached computations
+- `Hyll::HyperLogLog`: Register mask pre-computation, fast nibble operations
+- `Hyll::EnhancedHyperLogLog`: Cached modification probability, zero-allocation updates
 - `Hyll::Factory`: Factory pattern for creating counters
 ## Examples
@@ -293,6 +400,13 @@ For advanced usage scenarios, check out `examples/advance.rb` which includes:
 - Advanced serialization techniques
 - Precision vs. memory usage benchmarks
+## Example Use Cases
+Here is a quick illustration of how Hyll can be helpful in a real-world scenario:
+- UniqueVisitorCounting: Track unique users visiting a website in real time. By adding each user's session ID or IP to the HyperLogLog, you get an approximate number of distinct users without storing everybody's data.
+- LogAnalytics: Continuously process large log files to calculate the volume of unique events, keeping memory usage low.
+- MarketingCampaigns: Quickly gauge how many distinct customers participate in a campaign while merging data from multiple sources.
 ## Development
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
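The README's claim that power-of-2 calculations use O(1) table lookups can be illustrated with the textbook HyperLogLog raw estimator, E = α·m² / Σ 2^(-M[j]), with linear counting for small ranges. This is a minimal sketch of that standard formula, not hyll's estimator (the gem also offers MLE and a streaming estimator); `POW2_NEG_SKETCH` and `raw_estimate_sketch` are hypothetical names, and the α approximation used here is only valid for m ≥ 128.

```ruby
# Pre-computed 2^(-n) for n = 0..64, so the hot loop does no exponentiation.
POW2_NEG_SKETCH = Array.new(65) { |n| 2.0**(-n) }.freeze

# Textbook HyperLogLog estimate from an array of register values.
# Assumes registers.length is a power of two and at least 128.
def raw_estimate_sketch(registers)
  m = registers.length
  alpha = 0.7213 / (1.0 + 1.079 / m)

  sum = 0.0
  zeros = 0
  registers.each do |v|
    sum += POW2_NEG_SKETCH[v] # table lookup instead of 2.0 ** -v
    zeros += 1 if v.zero?
  end

  estimate = alpha * m * m / sum

  # Small-range correction: fall back to linear counting while
  # some registers are still empty.
  if estimate <= 2.5 * m && zeros.positive?
    m * Math.log(m.to_f / zeros)
  else
    estimate
  end
end
```

With every register empty the function returns 0.0 through the linear-counting branch; once no register is zero, the harmonic-mean estimate is returned unchanged.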