monotonic_grouper 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/README.md +156 -0
- data/ext/monotonic_grouper/extconf.rb +5 -0
- data/ext/monotonic_grouper/monotonic_grouper.c +173 -0
- data/lib/monotonic_grouper/monotonic_grouper.bundle +0 -0
- data/lib/monotonic_grouper/version.rb +3 -0
- data/lib/monotonic_grouper.rb +50 -0
- metadata +51 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA256:
|
|
3
|
+
metadata.gz: 508346e7c015553230c1d980e1a8d8830e5960185b78d34fba53eaf6fd475aef
|
|
4
|
+
data.tar.gz: bdb5b75f93444edfa9f2bdaa3461864e17ec628bf9db388995aa42fbeb34dac1
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: 3b55f5727e17235ecd3666fb90b39cc01e117ff766e96294254d8e7a4f1767b7929770e32fc7f473349f6905e7c21fee57bf99ccb584079484b46bc1ca01b3b0
|
|
7
|
+
data.tar.gz: 1ee0c95d40d94801c0871f7e4951a57e69c32dc5563f3e9d2742c48fbbdb1d73693ec3664e3d6f690e5079f58211636a075dbbfd8b2f9c8c284d1b3c768619dc
|
data/README.md
ADDED
|
@@ -0,0 +1,156 @@
|
|
|
1
|
+
# MonotonicGrouper
|
|
2
|
+
|
|
3
|
+
Fast C extension for grouping monotonic sequences in Ruby arrays. Groups consecutive monotonic sequences into ranges while keeping isolated elements as singles.
|
|
4
|
+
|
|
5
|
+
## Features
|
|
6
|
+
|
|
7
|
+
- ⚡ **High Performance**: C implementation for maximum speed (O(n) complexity)
|
|
8
|
+
- 🔄 **Generic Support**: Works with any Comparable type that has `succ` method
|
|
9
|
+
- 📅 **Multiple Types**: Integers, Dates, Characters, and more
|
|
10
|
+
- 🎯 **Configurable**: Adjustable minimum range size
|
|
11
|
+
- 💎 **Ruby-friendly**: Seamless integration as Array method
|
|
12
|
+
|
|
13
|
+
## Installation
|
|
14
|
+
|
|
15
|
+
```bash
|
|
16
|
+
gem install monotonic_grouper
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
Or in your Gemfile:
|
|
20
|
+
|
|
21
|
+
```ruby
|
|
22
|
+
gem 'monotonic_grouper'
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
## Usage
|
|
26
|
+
|
|
27
|
+
```ruby
|
|
28
|
+
require 'monotonic_grouper'
|
|
29
|
+
|
|
30
|
+
# Basic integer usage
|
|
31
|
+
[1, 2, 3, 4, 5, 10, 11, 12].group_monotonic(3)
|
|
32
|
+
# => [2..5, 10..12]
|
|
33
|
+
|
|
34
|
+
# With singles (sequences shorter than min_range_size)
|
|
35
|
+
[1, 2, 3, 4, 7, 9, 10, 11, 12].group_monotonic(3)
|
|
36
|
+
# => [2..4, 7, 9, 10..12]
|
|
37
|
+
|
|
38
|
+
# Custom minimum range size
|
|
39
|
+
[1, 2, 3, 5, 6, 8].group_monotonic(2)
|
|
40
|
+
# => [2..3, 5..6, 8]
|
|
41
|
+
|
|
42
|
+
# Works with Dates
|
|
43
|
+
require 'date'
|
|
44
|
+
dates = [
|
|
45
|
+
Date.new(2024, 1, 1),
|
|
46
|
+
Date.new(2024, 1, 2),
|
|
47
|
+
Date.new(2024, 1, 3),
|
|
48
|
+
Date.new(2024, 1, 5),
|
|
49
|
+
Date.new(2024, 1, 6),
|
|
50
|
+
Date.new(2024, 1, 7)
|
|
51
|
+
]
|
|
52
|
+
dates.group_monotonic(3)
|
|
53
|
+
# => [Date.new(2024, 1, 2)..Date.new(2024, 1, 3),
|
|
54
|
+
# Date.new(2024, 1, 5)..Date.new(2024, 1, 7)]
|
|
55
|
+
|
|
56
|
+
# Works with characters
|
|
57
|
+
['a', 'b', 'c', 'd', 'f', 'g', 'h'].group_monotonic(3)
|
|
58
|
+
# => ['b'..'d', 'f'..'h']
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
## API
|
|
62
|
+
|
|
63
|
+
### `Array#group_monotonic(min_range_size = 3)`
|
|
64
|
+
|
|
65
|
+
Groups consecutive monotonic sequences in an array.
|
|
66
|
+
|
|
67
|
+
**Parameters:**
|
|
68
|
+
- `min_range_size` (Integer, optional): Minimum number of consecutive elements to form a range. Default: 3
|
|
69
|
+
|
|
70
|
+
**Returns:**
|
|
71
|
+
- Array containing Range objects for sequences and individual elements for singles
|
|
72
|
+
- First element is always excluded from the result
|
|
73
|
+
|
|
74
|
+
**Requirements:**
|
|
75
|
+
- All elements must be Comparable (respond to `<=>`)
|
|
76
|
+
- All elements must have `succ` method
|
|
77
|
+
- All elements must be of the same type
|
|
78
|
+
|
|
79
|
+
**Raises:**
|
|
80
|
+
- `TypeError` if elements don't meet requirements
|
|
81
|
+
|
|
82
|
+
## Algorithm Complexity
|
|
83
|
+
|
|
84
|
+
**Time Complexity**: O(n)
|
|
85
|
+
- Single pass through the array
|
|
86
|
+
- Each element is checked exactly once
|
|
87
|
+
- This is optimal - you cannot detect sequences without examining each element
|
|
88
|
+
|
|
89
|
+
**Space Complexity**: O(n)
|
|
90
|
+
- Result array can contain up to n elements in worst case
|
|
91
|
+
|
|
92
|
+
**Why O(n) is optimal:**
|
|
93
|
+
You cannot find monotonic sequences faster than O(n) because:
|
|
94
|
+
1. You must examine each element at least once
|
|
95
|
+
2. Any algorithm that skips elements might miss important data
|
|
96
|
+
3. The problem requires complete information about all elements
|
|
97
|
+
|
|
98
|
+
**C Extension Benefits:**
|
|
99
|
+
While algorithmic complexity stays O(n), the C extension provides:
|
|
100
|
+
- ~10-50x speedup over pure Ruby (depending on array size)
|
|
101
|
+
- Elimination of Ruby interpreter overhead
|
|
102
|
+
- Direct memory access
|
|
103
|
+
- Compiler optimizations
|
|
104
|
+
|
|
105
|
+
## Performance
|
|
106
|
+
|
|
107
|
+
Benchmark on 10,000 elements:
|
|
108
|
+
|
|
109
|
+
```ruby
|
|
110
|
+
require 'benchmark'
|
|
111
|
+
|
|
112
|
+
arr = (1..10000).to_a
|
|
113
|
+
|
|
114
|
+
Benchmark.bm do |x|
|
|
115
|
+
x.report("C extension:") { arr.group_monotonic }
|
|
116
|
+
end
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
The C extension is significantly faster than pure Ruby implementation while maintaining the same O(n) complexity.
|
|
120
|
+
|
|
121
|
+
## Building from Source
|
|
122
|
+
|
|
123
|
+
```bash
|
|
124
|
+
git clone https://github.com/yourusername/monotonic_grouper
|
|
125
|
+
cd monotonic_grouper
|
|
126
|
+
bundle install
|
|
127
|
+
rake compile
|
|
128
|
+
rake test
|
|
129
|
+
```
|
|
130
|
+
|
|
131
|
+
## Requirements
|
|
132
|
+
|
|
133
|
+
- Ruby >= 2.5.0
|
|
134
|
+
- C compiler (gcc, clang, etc.)
|
|
135
|
+
- Make
|
|
136
|
+
|
|
137
|
+
## Supported Types
|
|
138
|
+
|
|
139
|
+
Any Ruby object that:
|
|
140
|
+
1. Implements `Comparable` (`<=>` operator)
|
|
141
|
+
2. Has `succ` method (successor)
|
|
142
|
+
|
|
143
|
+
Examples:
|
|
144
|
+
- Integer
|
|
145
|
+
- Date
|
|
146
|
+
- Time
|
|
147
|
+
- String (single characters)
|
|
148
|
+
- Custom classes implementing these methods
|
|
149
|
+
|
|
150
|
+
## Contributing
|
|
151
|
+
|
|
152
|
+
Bug reports and pull requests are welcome!
|
|
153
|
+
|
|
154
|
+
## License
|
|
155
|
+
|
|
156
|
+
MIT License
|
|
@@ -0,0 +1,173 @@
|
|
|
1
|
+
#include <ruby.h>
|
|
2
|
+
|
|
3
|
+
#ifndef RB_BIGNUM_TYPE_P
|
|
4
|
+
#define RB_BIGNUM_TYPE_P(obj) (RB_TYPE_P((obj), T_BIGNUM))
|
|
5
|
+
#endif
|
|
6
|
+
|
|
7
|
+
static VALUE rb_mMonotonicGrouper;
|
|
8
|
+
static ID id_succ;
|
|
9
|
+
static ID id_eq;
|
|
10
|
+
|
|
11
|
+
static inline int
|
|
12
|
+
is_next_integer(VALUE a, VALUE b)
|
|
13
|
+
{
|
|
14
|
+
if (FIXNUM_P(a) && FIXNUM_P(b)) {
|
|
15
|
+
long av = FIX2LONG(a);
|
|
16
|
+
long bv = FIX2LONG(b);
|
|
17
|
+
return bv == av + 1;
|
|
18
|
+
}
|
|
19
|
+
return 0;
|
|
20
|
+
}
|
|
21
|
+
|
|
22
|
+
static inline int
|
|
23
|
+
is_next_in_sequence_generic(VALUE a, VALUE b)
|
|
24
|
+
{
|
|
25
|
+
VALUE succ_a = rb_funcall(a, id_succ, 0);
|
|
26
|
+
return RTEST(rb_funcall(b, id_eq, 1, succ_a));
|
|
27
|
+
}
|
|
28
|
+
|
|
29
|
+
static void
|
|
30
|
+
add_group_to_result_integer(VALUE result, VALUE group_start, VALUE group_end, long size, long min_range_size)
|
|
31
|
+
{
|
|
32
|
+
if (size >= min_range_size) {
|
|
33
|
+
VALUE range = rb_range_new(group_start, group_end, 0);
|
|
34
|
+
rb_ary_push(result, range);
|
|
35
|
+
} else {
|
|
36
|
+
long j;
|
|
37
|
+
long start_val = FIX2LONG(group_start);
|
|
38
|
+
for (j = 0; j < size; j++) {
|
|
39
|
+
rb_ary_push(result, LONG2FIX(start_val + j));
|
|
40
|
+
}
|
|
41
|
+
}
|
|
42
|
+
}
|
|
43
|
+
|
|
44
|
+
static void
|
|
45
|
+
add_group_to_result_generic(VALUE result, VALUE group_start, VALUE group_end, long size, long min_range_size)
|
|
46
|
+
{
|
|
47
|
+
if (size >= min_range_size) {
|
|
48
|
+
VALUE range = rb_range_new(group_start, group_end, 0);
|
|
49
|
+
rb_ary_push(result, range);
|
|
50
|
+
} else {
|
|
51
|
+
long j;
|
|
52
|
+
VALUE curr = group_start;
|
|
53
|
+
rb_ary_push(result, curr);
|
|
54
|
+
for (j = 1; j < size; j++) {
|
|
55
|
+
curr = rb_funcall(curr, id_succ, 0);
|
|
56
|
+
rb_ary_push(result, curr);
|
|
57
|
+
}
|
|
58
|
+
}
|
|
59
|
+
}
|
|
60
|
+
|
|
61
|
+
static VALUE
|
|
62
|
+
process_integer_array(VALUE self, long len, long min_range_size)
|
|
63
|
+
{
|
|
64
|
+
VALUE result = rb_ary_new2(len / 2);
|
|
65
|
+
VALUE first_elem = rb_ary_entry(self, 0);
|
|
66
|
+
VALUE group_start = first_elem;
|
|
67
|
+
VALUE group_end = first_elem;
|
|
68
|
+
VALUE prev_value = first_elem;
|
|
69
|
+
long current_size = 1;
|
|
70
|
+
long i;
|
|
71
|
+
|
|
72
|
+
for (i = 1; i < len; i++) {
|
|
73
|
+
VALUE curr_value = rb_ary_entry(self, i);
|
|
74
|
+
|
|
75
|
+
if (!FIXNUM_P(curr_value) && !RB_BIGNUM_TYPE_P(curr_value)) {
|
|
76
|
+
rb_raise(rb_eTypeError, "All elements must be of the same type");
|
|
77
|
+
}
|
|
78
|
+
|
|
79
|
+
if (is_next_integer(prev_value, curr_value)) {
|
|
80
|
+
group_end = curr_value;
|
|
81
|
+
current_size++;
|
|
82
|
+
} else {
|
|
83
|
+
add_group_to_result_integer(result, group_start, group_end, current_size, min_range_size);
|
|
84
|
+
group_start = curr_value;
|
|
85
|
+
group_end = curr_value;
|
|
86
|
+
current_size = 1;
|
|
87
|
+
}
|
|
88
|
+
|
|
89
|
+
prev_value = curr_value;
|
|
90
|
+
}
|
|
91
|
+
|
|
92
|
+
add_group_to_result_integer(result, group_start, group_end, current_size, min_range_size);
|
|
93
|
+
|
|
94
|
+
return result;
|
|
95
|
+
}
|
|
96
|
+
|
|
97
|
+
static VALUE
|
|
98
|
+
process_generic_array(VALUE self, long len, long min_range_size, VALUE first_elem)
|
|
99
|
+
{
|
|
100
|
+
VALUE result = rb_ary_new2(len / 2);
|
|
101
|
+
VALUE group_start = first_elem;
|
|
102
|
+
VALUE group_end = first_elem;
|
|
103
|
+
VALUE prev_value = first_elem;
|
|
104
|
+
VALUE first_class = CLASS_OF(first_elem);
|
|
105
|
+
long current_size = 1;
|
|
106
|
+
long i;
|
|
107
|
+
|
|
108
|
+
for (i = 1; i < len; i++) {
|
|
109
|
+
VALUE curr_value = rb_ary_entry(self, i);
|
|
110
|
+
if (CLASS_OF(curr_value) != first_class) {
|
|
111
|
+
rb_raise(rb_eTypeError, "All elements must be of the same type");
|
|
112
|
+
}
|
|
113
|
+
|
|
114
|
+
if (is_next_in_sequence_generic(prev_value, curr_value)) {
|
|
115
|
+
group_end = curr_value;
|
|
116
|
+
current_size++;
|
|
117
|
+
} else {
|
|
118
|
+
add_group_to_result_generic(result, group_start, group_end, current_size, min_range_size);
|
|
119
|
+
group_start = curr_value;
|
|
120
|
+
group_end = curr_value;
|
|
121
|
+
current_size = 1;
|
|
122
|
+
}
|
|
123
|
+
|
|
124
|
+
prev_value = curr_value;
|
|
125
|
+
}
|
|
126
|
+
|
|
127
|
+
add_group_to_result_generic(result, group_start, group_end, current_size, min_range_size);
|
|
128
|
+
|
|
129
|
+
return result;
|
|
130
|
+
}
|
|
131
|
+
|
|
132
|
+
static VALUE
|
|
133
|
+
rb_array_group_monotonic(int argc, VALUE *argv, VALUE self)
|
|
134
|
+
{
|
|
135
|
+
VALUE min_range_size_val;
|
|
136
|
+
long min_range_size;
|
|
137
|
+
long len;
|
|
138
|
+
VALUE first_elem;
|
|
139
|
+
|
|
140
|
+
rb_scan_args(argc, argv, "01", &min_range_size_val);
|
|
141
|
+
min_range_size = NIL_P(min_range_size_val) ? 3 : NUM2LONG(min_range_size_val);
|
|
142
|
+
|
|
143
|
+
if (min_range_size < 1) {
|
|
144
|
+
rb_raise(rb_eArgError, "min_range_size must be at least 1");
|
|
145
|
+
}
|
|
146
|
+
|
|
147
|
+
len = RARRAY_LEN(self);
|
|
148
|
+
|
|
149
|
+
if (len == 0) {
|
|
150
|
+
return rb_ary_new();
|
|
151
|
+
}
|
|
152
|
+
|
|
153
|
+
first_elem = rb_ary_entry(self, 0);
|
|
154
|
+
|
|
155
|
+
if (FIXNUM_P(first_elem) || RB_BIGNUM_TYPE_P(first_elem)) {
|
|
156
|
+
return process_integer_array(self, len, min_range_size);
|
|
157
|
+
}
|
|
158
|
+
|
|
159
|
+
if (!rb_respond_to(first_elem, id_succ)) {
|
|
160
|
+
rb_raise(rb_eTypeError, "Elements must respond to :succ method");
|
|
161
|
+
}
|
|
162
|
+
|
|
163
|
+
return process_generic_array(self, len, min_range_size, first_elem);
|
|
164
|
+
}
|
|
165
|
+
|
|
166
|
+
void
|
|
167
|
+
Init_monotonic_grouper(void)
|
|
168
|
+
{
|
|
169
|
+
id_succ = rb_intern("succ");
|
|
170
|
+
id_eq = rb_intern("==");
|
|
171
|
+
rb_mMonotonicGrouper = rb_define_module("MonotonicGrouper");
|
|
172
|
+
rb_define_method(rb_cArray, "group_monotonic", rb_array_group_monotonic, -1);
|
|
173
|
+
}
|
|
Binary file
|
|
@@ -0,0 +1,50 @@
|
|
|
1
|
+
require 'monotonic_grouper/version'
|
|
2
|
+
|
|
3
|
+
# Load the compiled extension (.so on Linux, .bundle on macOS)
|
|
4
|
+
begin
|
|
5
|
+
require 'monotonic_grouper/monotonic_grouper'
|
|
6
|
+
rescue LoadError
|
|
7
|
+
# Fallback for different extension names
|
|
8
|
+
ext_dir = File.expand_path('../monotonic_grouper', __FILE__)
|
|
9
|
+
if File.exist?(File.join(ext_dir, 'monotonic_grouper.bundle'))
|
|
10
|
+
require File.join(ext_dir, 'monotonic_grouper.bundle')
|
|
11
|
+
elsif File.exist?(File.join(ext_dir, 'monotonic_grouper.so'))
|
|
12
|
+
require File.join(ext_dir, 'monotonic_grouper.so')
|
|
13
|
+
else
|
|
14
|
+
raise LoadError, "Could not find compiled extension"
|
|
15
|
+
end
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
module FastBloomFilter
|
|
19
|
+
class Filter
|
|
20
|
+
def add_all(items)
|
|
21
|
+
items.each { |item| add(item.to_s) }
|
|
22
|
+
self
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
def count_possible_matches(items)
|
|
26
|
+
items.count { |item| include?(item.to_s) }
|
|
27
|
+
end
|
|
28
|
+
|
|
29
|
+
def inspect
|
|
30
|
+
s = stats
|
|
31
|
+
total_kb = (s[:total_bytes] / 1024.0).round(2)
|
|
32
|
+
fill_pct = (s[:fill_ratio] * 100).round(2)
|
|
33
|
+
|
|
34
|
+
"#<FastBloomFilter::Filter v2 layers=#{s[:num_layers]} " \
|
|
35
|
+
"count=#{s[:total_count]} size=#{total_kb}KB fill=#{fill_pct}%>"
|
|
36
|
+
end
|
|
37
|
+
|
|
38
|
+
def to_s
|
|
39
|
+
inspect
|
|
40
|
+
end
|
|
41
|
+
end
|
|
42
|
+
|
|
43
|
+
def self.for_emails(error_rate: 0.001, initial_capacity: 10_000)
|
|
44
|
+
Filter.new(error_rate: error_rate, initial_capacity: initial_capacity)
|
|
45
|
+
end
|
|
46
|
+
|
|
47
|
+
def self.for_urls(error_rate: 0.01, initial_capacity: 10_000)
|
|
48
|
+
Filter.new(error_rate: error_rate, initial_capacity: initial_capacity)
|
|
49
|
+
end
|
|
50
|
+
end
|
metadata
ADDED
|
@@ -0,0 +1,51 @@
|
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
|
2
|
+
name: monotonic_grouper
|
|
3
|
+
version: !ruby/object:Gem::Version
|
|
4
|
+
version: 1.0.0
|
|
5
|
+
platform: ruby
|
|
6
|
+
authors:
|
|
7
|
+
- Roman Hajdarov
|
|
8
|
+
autorequire:
|
|
9
|
+
bindir: bin
|
|
10
|
+
cert_chain: []
|
|
11
|
+
date: 2026-02-13 00:00:00.000000000 Z
|
|
12
|
+
dependencies: []
|
|
13
|
+
description: Groups consecutive monotonic sequences in arrays into ranges. Supports
|
|
14
|
+
any Comparable type with succ method.
|
|
15
|
+
email:
|
|
16
|
+
- romnhajdarov@gmail.com
|
|
17
|
+
executables: []
|
|
18
|
+
extensions:
|
|
19
|
+
- ext/monotonic_grouper/extconf.rb
|
|
20
|
+
extra_rdoc_files: []
|
|
21
|
+
files:
|
|
22
|
+
- README.md
|
|
23
|
+
- ext/monotonic_grouper/extconf.rb
|
|
24
|
+
- ext/monotonic_grouper/monotonic_grouper.c
|
|
25
|
+
- lib/monotonic_grouper.rb
|
|
26
|
+
- lib/monotonic_grouper/monotonic_grouper.bundle
|
|
27
|
+
- lib/monotonic_grouper/version.rb
|
|
28
|
+
homepage: https://github.com/romanhajdarov/monotonic_grouper
|
|
29
|
+
licenses:
|
|
30
|
+
- MIT
|
|
31
|
+
metadata: {}
|
|
32
|
+
post_install_message:
|
|
33
|
+
rdoc_options: []
|
|
34
|
+
require_paths:
|
|
35
|
+
- lib
|
|
36
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
|
37
|
+
requirements:
|
|
38
|
+
- - ">="
|
|
39
|
+
- !ruby/object:Gem::Version
|
|
40
|
+
version: 2.5.0
|
|
41
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
42
|
+
requirements:
|
|
43
|
+
- - ">="
|
|
44
|
+
- !ruby/object:Gem::Version
|
|
45
|
+
version: '0'
|
|
46
|
+
requirements: []
|
|
47
|
+
rubygems_version: 3.4.22
|
|
48
|
+
signing_key:
|
|
49
|
+
specification_version: 4
|
|
50
|
+
summary: Fast C extension for grouping monotonic sequences
|
|
51
|
+
test_files: []
|