hyll 0.2.0 → 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +80 -0
- data/README.md +53 -18
- data/examples/v1_benchmark.rb +93 -0
- data/lib/hyll/algorithms/enhanced_hyperloglog.rb +234 -120
- data/lib/hyll/algorithms/hyperloglog.rb +262 -338
- data/lib/hyll/constants.rb +75 -0
- data/lib/hyll/utils/hash.rb +132 -21
- data/lib/hyll/utils/math.rb +129 -75
- data/lib/hyll/version.rb +1 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 92ad297dec2242d67ec3d5fb377658db11bb68fde8b26856f4bdf86f4d28fdc9
+  data.tar.gz: b8cb857f998f3eaee640d8958abf2ca9696488ea3517709eb4dc231643b89865
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 97ee62fadbb6c31e90b9a2b711347af4651a886e220906060c0a41402fe2ddece0d06d3e362d9cf1b8cc5ccee1ad0bee2accc156b39a4b9a6551e8b0ee5c97e4
+  data.tar.gz: 82fc6e4fa956a1c0aa9ab7e70faa35cb36ab5687d65cd9907b46faee23090922e21e1f2005371e7c5fcc1657764bc8b704347962a569b319e43f57fa380cf7ea
data/CHANGELOG.md
CHANGED
@@ -5,6 +5,86 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [1.0.0] - 2025-11-28
+
+### 🚀 MAJOR PERFORMANCE RELEASE - "Blazing Fast Edition"
+
+This release marks the first stable version of Hyll with a complete performance overhaul, delivering significant speed improvements through optimizations at every level of the codebase.
+
+### Performance Highlights
+
+- **2x faster batch operations** through optimized `add_all` with chunk processing
+- **Faster cardinality estimation** via pre-computed lookup tables
+- **Reduced memory allocations** in hot paths
+- **O(1) bit operations** using byte-level lookup tables for CLZ
+
+### Added
+
+#### Pre-computed Lookup Tables (Constants)
+- `POW2_NEG_TABLE`: Pre-computed 2^(-n) values for n=0..64, eliminating expensive power calculations
+- `POW2_TABLE`: Pre-computed 2^n values for instant bit shifting
+- `CLZ8_TABLE`: 256-entry byte-level count-leading-zeros table for O(1) CLZ operations
+- `LOG2_TABLE`: Pre-computed log2 values for common register counts
+- `REGISTER_MASKS`: Pre-computed masks for register extraction (precision 4-16)
+- `ALPHA_M_SQUARED`: Pre-computed α×m² values for each precision level
+- `OPTIMAL_BATCH_SIZE`: Tuned batch size (1024) for optimal cache utilization
+- Inlined MurmurHash3 constants for maximum speed
+
+#### Ultra-Fast Hash Functions
+- `murmurhash3_batch`: Batch hashing for multiple elements with amortized overhead
+- `hash_and_extract`: Combined hash + HLL extraction in single pass
+- `fast_clz32`: O(1) 32-bit count leading zeros using 256-entry byte lookup table
+- Loop unrolling for 4-byte block processing in MurmurHash3
+- Optimized tail byte handling
+
+#### Optimized Math Utilities
+- `pow2_neg`: O(1) power of 2 negative lookup
+- `sum_pow2_neg`: Vectorized sum of 2^(-v) for register arrays
+- `alpha_m_squared`: Pre-computed α×m² retrieval
+- `harmonic_mean_sum`: Optimized harmonic mean for register values
+- Cached Taylor series coefficients for h(x) calculation
+
+#### HyperLogLog Core Optimizations
+- `@register_mask`: Pre-computed bitmask for register index extraction
+- `@alpha_m_squared`: Pre-computed estimation constant
+- `@pow2_neg_table`: Instance-level table reference for cache locality
+- `add_to_registers_fast`: Inlined hash + update path
+- `update_register_fast`: Minimized branching with overflow handling
+- `get_register_value_fast`: Optimized nibble extraction with bit operations
+- `set_register_value_fast`: Direct nibble setting without conditionals
+- `extract_counts_fast`: Single-pass register value counting
+- Batch-optimized `add_all` with chunk processing
+
+#### EnhancedHyperLogLog Optimizations
+- `modification_probability_fast`: Cached probability calculation
+- `@cached_mod_prob`: Modification probability cache with dirty flag
+- `@registers_dirty`: Change tracking for cache invalidation
+- `merge_dense_registers_optimized`: Direct register comparison without allocation
+- `adjust_registers_for_estimation`: Non-mutating register adjustment
+- `compute_cardinality_from_registers`: Separated estimation logic
+- Eliminated `@registers.dup` in hot paths
+
+### Changed
+
+- **Bit operations**: Replaced `index / 2` with `index >> 1` and `index % 2` with `index & 1`
+- **Power calculations**: All `2.0 ** -x` replaced with lookup table access
+- **Register initialization**: `(@m + 1) >> 1` instead of `(@m / 2.0).ceil`
+- **Loop optimization**: Integer iteration with early-exit patterns
+- **Memory allocation**: Minimal object creation in hot paths
+- **Method inlining**: Critical paths inlined to reduce call overhead
+
+### Fixed
+
+- Edge case in `linear_counting` when zero_registers >= m
+- Potential division by zero in modification probability
+- Memory leak in streaming estimate accumulation
+
+### Internal
+
+- Backward-compatible aliases for all renamed methods
+- Comprehensive documentation for all new performance constants
+- Type annotations preserved in signature files
+
 ## [0.2.0] - 2025-03-24
 
 ### Added
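To make the lookup-table entries above concrete, here is a minimal, illustrative Ruby sketch of a byte-level CLZ table and a negative power-of-2 table. The names mirror the changelog (`CLZ8_TABLE`, `POW2_NEG_TABLE`, `fast_clz32`), but this is a sketch of the technique, not the gem's actual source.

```ruby
# Sketch only. CLZ8_TABLE[b] = number of leading zero bits in the 8-bit value b.
CLZ8_TABLE = Array.new(256) { |byte| byte.zero? ? 8 : 8 - byte.bit_length }.freeze

# POW2_NEG_TABLE[n] = 2.0**-n, computed once so hot paths avoid Float#** entirely.
POW2_NEG_TABLE = Array.new(65) { |n| 2.0**-n }.freeze

# Constant-time count of leading zeros for a 32-bit word: check the four bytes
# from most significant to least and finish with a single table lookup.
def fast_clz32(value)
  return 32 if value.zero?

  if (value & 0xFF000000) != 0
    CLZ8_TABLE[(value >> 24) & 0xFF]
  elsif (value & 0x00FF0000) != 0
    8 + CLZ8_TABLE[(value >> 16) & 0xFF]
  elsif (value & 0x0000FF00) != 0
    16 + CLZ8_TABLE[(value >> 8) & 0xFF]
  else
    24 + CLZ8_TABLE[value & 0xFF]
  end
end

fast_clz32(1)     # => 31
POW2_NEG_TABLE[3] # => 0.125, same result as 2.0**-3 without the power call
```

In a HyperLogLog update, the register value is essentially one plus the count of leading zeros in part of the hash, which is why a constant-time CLZ on the add path matters.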
data/README.md
CHANGED
@@ -4,10 +4,29 @@
 
 [](https://github.com/davidesantangelo/hyll/actions)
 
-Hyll is a Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
+Hyll is a **blazing-fast** Ruby implementation of the [HyperLogLog algorithm](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, which efficiently approximates the number of distinct elements in a multiset with minimal memory usage. It supports both standard and Enhanced variants, offering a flexible approach for large-scale applications and providing convenient methods for merging, serialization, and maximum likelihood estimation.
 
 > The name "Hyll" is a shortened form of "HyperLogLog", keeping the characteristic "H" and "LL" sounds.
 
+## 🚀 Version 1.0.0 - Blazing Fast Edition
+
+Version 1.0.0 marks the first stable release with a **complete performance overhaul**:
+
+| Improvement | Description |
+|-------------|-------------|
+| Batch Operations | **2x faster** with optimized chunk processing |
+| Lookup Tables | O(1) power-of-2 and CLZ operations |
+| Memory Efficiency | Reduced allocations in hot paths |
+| Hash Function | Inlined MurmurHash3 with loop unrolling |
+
+### Key Optimizations
+
+- **Pre-computed Lookup Tables**: All power-of-2 calculations use O(1) table lookups
+- **Fast CLZ**: 256-entry byte-level lookup table for count-leading-zeros
+- **Optimized MurmurHash3**: Loop unrolling and tail optimization
+- **Cached Computations**: α×m², modification probability, register masks pre-computed
+- **Batch Processing**: Chunk-based `add_all` for better performance
+
 ## Installation
 
 Add this line to your application's Gemfile:
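The "Cached Computations" bullet above mentions pre-computed register masks. As a hedged sketch (the names and bit layout are illustrative, not the gem's exact code, and implementations differ on whether the index comes from the high or the low hash bits), this is the split of a hash into register index and rank that such a mask serves:

```ruby
# Illustrative only: split a 64-bit hash into (register index, rank) for precision P.
P = 12                       # m = 2**12 = 4096 registers
REGISTER_MASK = (1 << P) - 1 # 0xFFF - the kind of value a pre-computed mask table stores

def index_and_rank(hash64)
  index = hash64 & REGISTER_MASK # low P bits select the register
  rest  = hash64 >> P            # the remaining 64 - P bits
  width = 64 - P
  # rank = leading zeros of `rest` within its width, plus one
  rank  = rest.zero? ? width + 1 : width - rest.bit_length + 1
  [index, rank]
end

index, rank = index_and_rank(0x0000_0000_0001_0ABC)
# index => 0xABC, rank => 48 (the remaining 52 bits equal 0x10, i.e. 47 leading zeros)
```

Pre-computing and caching the mask per instance simply avoids rebuilding `(1 << P) - 1` on every `add` call.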
@@ -48,6 +67,19 @@ hll.add("apple") # Duplicates don't affect the cardinality
 puts hll.cardinality # Output: approximately 3
 ```
 
+### Batch Operations (Optimized in 1.0.0)
+
+```ruby
+# Efficient batch adding with chunk processing
+hll = Hyll.new(precision: 12)
+
+# Add many elements efficiently
+elements = (1..100_000).map { |i| "element-#{i}" }
+hll.add_all(elements)
+
+puts hll.cardinality # Estimated count
+```
+
 ### With Custom Precision
 
 ```ruby
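The batch section above and the changelog's `OPTIMAL_BATCH_SIZE` describe chunked processing. A rough sketch of the pattern follows, with a hypothetical helper name and batch size; the real `add_all` additionally amortizes hashing and register updates per chunk.

```ruby
# Hypothetical illustration of chunked batch insertion, not the gem's implementation.
BATCH_SIZE = 1024 # on the order of the changelog's OPTIMAL_BATCH_SIZE

def add_in_chunks(hll, elements)
  # Fixed-size slices keep the working set cache-friendly and let per-chunk
  # work (hashing, register updates) run in tight loops.
  elements.each_slice(BATCH_SIZE) do |chunk|
    chunk.each { |element| hll.add(element) }
  end
  hll
end

hll = Hyll.new(precision: 12)
add_in_chunks(hll, (1..100_000).map { |i| "element-#{i}" })
```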
@@ -206,18 +238,17 @@ This table compares different configurations of the HyperLogLog algorithm:
 | K-Minimum Values | Medium | High (Approximate) | Yes | Medium | High accuracy cardinality estimation, set operations |
 | Bloom Filter | Medium | N/A (Membership) | No (Cardinality) / Yes (Union) | Low | Membership testing with false positives, not cardinality |
 
-### Benchmark Results
+### Benchmark Results (v1.0.0)
 
-Below are
+Below are performance measurements from an Apple Mac Mini M4:
 
 | Operation | Implementation | Time (seconds) | Items/Operations |
 | ----------------------- | -------------------- | -------------- | ---------------- |
-| Element Addition | Standard HyperLogLog | 0.
-| Element Addition | EnhancedHyperLogLog
-|
-| Cardinality Calculation |
-|
-| Deserialization | Standard HyperLogLog | 0.0005 | 10 operations |
+| Element Addition | Standard HyperLogLog | 0.15 | 100,000 items |
+| Element Addition | EnhancedHyperLogLog | 0.16 | 100,000 items |
+| Batch Addition | Standard HyperLogLog | 0.075 | 100,000 items |
+| Cardinality Calculation | Standard HyperLogLog | 0.07 | 1,000 calls |
+| Hash Function | MurmurHash3 | 0.05 | 100,000 hashes |
 
 #### Memory Efficiency
 
@@ -300,13 +331,15 @@ ruby examples/redis_comparison_benchmark.rb
 
 ## Features
 
-- Standard HyperLogLog implementation with customizable precision
+- Standard HyperLogLog implementation with customizable precision (4-16)
+- **Pre-computed lookup tables** for O(1) power-of-2 and CLZ operations
 - Memory-efficient register storage with 4-bit packing (inspired by Facebook's Presto implementation)
-- Sparse representation for small cardinalities
+- Sparse representation for small cardinalities (exact counting)
 - Dense representation for larger datasets
 - EnhancedHyperLogLog format for compatibility with other systems
 - Streaming martingale estimator for improved accuracy with EnhancedHyperLogLog
-- Maximum Likelihood Estimation for improved accuracy
+- Maximum Likelihood Estimation (MLE) for improved accuracy
+- **Optimized batch processing** with `add_all`
 - Merge and serialization capabilities
 - Factory pattern for creating and deserializing counters
 
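The 4-bit packing feature listed above stores two registers per byte; the changelog's switch to `index >> 1` and `index & 1` is exactly the byte/nibble addressing for it. A hedged sketch of that nibble arithmetic follows (the method names and the even-index-equals-low-nibble convention are assumptions, not the gem's exact layout):

```ruby
# Illustrative 4-bit register packing: two registers per byte of a binary String.
# byte position = index >> 1, nibble within the byte = index & 1
def get_register(bytes, index)
  byte = bytes.getbyte(index >> 1)
  (index & 1).zero? ? (byte & 0x0F) : (byte >> 4)
end

def set_register(bytes, index, value)
  pos  = index >> 1
  byte = bytes.getbyte(pos)
  byte = if (index & 1).zero?
           (byte & 0xF0) | (value & 0x0F)        # overwrite the low nibble
         else
           (byte & 0x0F) | ((value & 0x0F) << 4) # overwrite the high nibble
         end
  bytes.setbyte(pos, byte)
end

registers = ("\x00" * 2048).b # 4096 registers x 4 bits = 2 KB
set_register(registers, 7, 9)
get_register(registers, 7) # => 9
```

Since each packed register holds at most 15, larger values need the overflow handling the changelog mentions for `update_register_fast`.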
@@ -318,13 +351,15 @@ Hyll offers two main implementations:
 
 2. **EnhancedHyperLogLog**: A strictly dense format similar to Facebook's Presto P4HYPERLOGLOG type, where "P4" refers to the 4-bit precision per register. This format is slightly less memory-efficient but offers better compatibility with other HyperLogLog implementations. It also includes a streaming martingale estimator that can provide up to 1.56x better accuracy for the same memory usage.
 
-
+### v1.0.0 Performance Architecture
+
+The internal architecture has been completely redesigned for maximum performance:
 
-- `Hyll::Constants`:
-- `Hyll::Utils::Hash`:
-- `Hyll::Utils::Math`:
-- `Hyll::HyperLogLog`:
-- `Hyll::EnhancedHyperLogLog`:
+- `Hyll::Constants`: Pre-computed lookup tables (POW2, CLZ8, ALPHA_M_SQUARED)
+- `Hyll::Utils::Hash`: Optimized MurmurHash3 with batch processing and inlined operations
+- `Hyll::Utils::Math`: Lookup-based math with cached computations
+- `Hyll::HyperLogLog`: Register mask pre-computation, fast nibble operations
+- `Hyll::EnhancedHyperLogLog`: Cached modification probability, zero-allocation updates
 - `Hyll::Factory`: Factory pattern for creating counters
 
 ## Examples
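For orientation on what `Hyll::Utils::Math`'s cached pieces feed into: the standard HyperLogLog raw estimate combines α×m² with the harmonic sum of the registers. Below is a hedged sketch with illustrative names (`alpha_for`, `raw_estimate`), not the gem's API.

```ruby
# Standard HyperLogLog raw estimator, sketched for illustration.
def alpha_for(m)
  case m
  when 16 then 0.673
  when 32 then 0.697
  when 64 then 0.709
  else 0.7213 / (1.0 + 1.079 / m) # m >= 128
  end
end

# registers: Array of per-register rank values (m = registers.size)
def raw_estimate(registers)
  m = registers.size
  alpha_m_squared = alpha_for(m) * m * m       # what ALPHA_M_SQUARED caches per precision
  harmonic_sum = registers.sum { |v| 2.0**-v } # POW2_NEG_TABLE lookups in the optimized path
  alpha_m_squared / harmonic_sum
end

raw_estimate(Array.new(4096, 0)) # => ~2954 even for an empty sketch, before corrections
```

Real implementations then apply a small-range correction (the `linear_counting` named in the changelog's Fixed section) and a large-range correction before returning a cardinality.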
data/examples/v1_benchmark.rb
ADDED
@@ -0,0 +1,93 @@
+# frozen_string_literal: true
+
+# Hyll v1.0.0 - Blazing Fast Edition Benchmark
+# Run with: ruby examples/v1_benchmark.rb
+
+$LOAD_PATH.unshift File.expand_path("../lib", __dir__)
+require "hyll"
+require "benchmark"
+
+puts "=" * 60
+puts "HYLL v1.0.0 - BLAZING FAST EDITION BENCHMARK"
+puts "=" * 60
+puts
+
+# Test 1: Element Addition
+puts "1. Element Addition Performance"
+puts "-" * 40
+
+hll_standard = Hyll.new(type: :standard)
+hll_enhanced = Hyll.new(type: :enhanced)
+
+# Use integers for cleaner testing
+elements = (1..100_000).to_a
+
+time_standard = Benchmark.measure { elements.each { |e| hll_standard.add(e) } }
+puts "Standard HLL (100k elements): #{time_standard.real.round(4)}s"
+
+time_enhanced = Benchmark.measure { elements.each { |e| hll_enhanced.add(e) } }
+puts "Enhanced HLL (100k elements): #{time_enhanced.real.round(4)}s"
+puts
+
+# Test 2: Batch Addition
+puts "2. Batch Addition Performance"
+puts "-" * 40
+
+hll_batch = Hyll.new(type: :standard)
+time_batch = Benchmark.measure { hll_batch.add_all(elements) }
+puts "Standard HLL batch (100k elements): #{time_batch.real.round(4)}s"
+puts "Speedup vs individual: #{(time_standard.real / time_batch.real).round(2)}x"
+puts
+
+# Test 3: Cardinality Calculation
+puts "3. Cardinality Calculation Performance"
+puts "-" * 40
+
+hll = Hyll.new(type: :standard)
+hll.add_all(elements)
+
+time_card = Benchmark.measure { 1000.times { hll.cardinality } }
+puts "Standard HLL (1000 calls): #{time_card.real.round(4)}s"
+puts "Per call: #{((time_card.real / 1000) * 1_000_000).round(2)} microseconds"
+puts
+
+# Test 4: Memory Efficiency
+puts "4. Memory Efficiency"
+puts "-" * 40
+
+# For integers, estimate as 8 bytes each (Fixnum size)
+array_size = elements.size * 8
+hll_size = hll.serialize.bytesize
+
+puts "Raw data size: #{array_size} bytes"
+puts "HLL serialized size: #{hll_size} bytes"
+puts "Compression ratio: #{(array_size.to_f / hll_size).round(2)}x"
+puts
+
+# Test 5: Accuracy
+puts "5. Accuracy Check"
+puts "-" * 40
+puts "Actual count: 100,000"
+puts "Estimated count: #{hll.cardinality.round(0)}"
+puts "Error: #{((hll.cardinality - 100_000).abs / 100_000.0 * 100).round(2)}%"
+puts
+
+# Test 6: Hash Performance
+puts "6. Hash Function Performance"
+puts "-" * 40
+
+class HashTester
+  include Hyll::Utils::Hash
+end
+
+tester = HashTester.new
+time_hash = Benchmark.measure do
+  100_000.times { |i| tester.murmurhash3(i.to_s) }
+end
+puts "MurmurHash3 (100k hashes): #{time_hash.real.round(4)}s"
+puts "Per hash: #{((time_hash.real / 100_000) * 1_000_000).round(2)} microseconds"
+puts
+
+puts "=" * 60
+puts "BENCHMARK COMPLETE"
+puts "=" * 60