fast_string 0.1.0
- checksums.yaml +7 -0
- data/LICENSE.txt +21 -0
- data/README.md +89 -0
- data/benchmark/benchmark.rb +82 -0
- data/ext/fast_string/extconf.rb +50 -0
- data/ext/fast_string/fast_string.c +166 -0
- data/lib/fast_string/version.rb +3 -0
- data/lib/fast_string.rb +21 -0
- metadata +126 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA256:
|
|
3
|
+
metadata.gz: b9fd44e3eb58b60d33076e9bde88fed85a53604fb0b3f67d5dbf9d7ade85f496
|
|
4
|
+
data.tar.gz: 006a7207651e9eb23169f0eb340c502d5a8a796a98ae3e59cc95281949af52f7
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: 28ac83efab70d112ab6d0a3a99c8d82d49912201b1f646ea701374fcb21acb71ca7a30741595980dfbb55e658737e5ed05eac1570646cfeccf74af8e1c6d2312
|
|
7
|
+
data.tar.gz: a7cf692ee5e95c9c865a909c691370bd33b90646704fa18546fa190de752788e602b59625f187698c0bb9246bdc5dca9607523f2cba4f9ef3ae3110e21cdf521
|
data/LICENSE.txt
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Roman Haydarov
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,89 @@
|
|
|
1
|
+
# fast_string
|
|
2
|
+
|
|
3
|
+
High-performance Ruby `String` extensions implemented in **C**.
|
|
4
|
+
|
|
5
|
+
<https://badge.fury.io/rb/fast_string>
|
|
6
|
+
<https://github.com/roman-haidarov/fast_string/actions>
|
|
7
|
+
|
|
8
|
+
Optimized string methods for high-throughput workloads: log processing, CSV parsing, HTTP parsing, text analytics, streaming pipelines.
|
|
9
|
+
|
|
10
|
+
## Installation
|
|
11
|
+
|
|
12
|
+
```ruby
|
|
13
|
+
gem 'fast_string'
|
|
14
|
+
```
|
|
15
|
+
|
|
16
|
+
## Usage
|
|
17
|
+
|
|
18
|
+
```ruby
|
|
19
|
+
require 'fast_string'
|
|
20
|
+
|
|
21
|
+
"hello".fs_count("l") #=> 2
|
|
22
|
+
"line1\nline2\nline3".fs_lines #=> 2
|
|
23
|
+
" \t\n ".fs_blank? #=> true
|
|
24
|
+
" hello ".fs_trim #=> "hello"
|
|
25
|
+
"hello\nworld".fs_byte_replace("\n", " ") #=> "hello world"
|
|
26
|
+
"hello\r\nworld".fs_byte_delete("\r") #=> "hello\nworld"
|
|
27
|
+
|
|
28
|
+
data.fs_each_line { |line| puts line }
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
## API
|
|
32
|
+
|
|
33
|
+
| Method | Ruby equivalent | What it does |
|
|
34
|
+
|---|---|---|
|
|
35
|
+
| `fs_count(char)` | `str.count(char)` | Count occurrences of a single byte |
|
|
36
|
+
| `fs_lines` | `str.count("\n")` | Count newline characters |
|
|
37
|
+
| `fs_blank?` | `str.strip.empty?` | Check if string is whitespace-only |
|
|
38
|
+
| `fs_trim` | `str.strip` | Strip whitespace (zero-copy) |
|
|
39
|
+
| `fs_byte_replace(from, to)` | `str.tr(from, to)` | Replace one byte with another |
|
|
40
|
+
| `fs_byte_delete(char)` | `str.delete(char)` | Delete all occurrences of a byte |
|
|
41
|
+
| `fs_each_line { }` | `str.each_line { }` | Iterate lines via memchr |
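Note that the table maps `fs_lines` to `str.count("\n")`, so it counts newline bytes rather than lines. A minimal stdlib-only sketch of the difference (it does not load fast_string; the variable names are illustrative):

```ruby
# Stdlib-only illustration; fast_string is not required here.
text = "line1\nline2\nline3"

# What the API table lists as the Ruby equivalent of fs_lines.
newline_count = text.count("\n")

# Number of lines, counting the last line that has no trailing newline.
line_count = text.lines.size

puts newline_count  # 2
puts line_count     # 3
```

For input without a trailing newline, add one to the newline count if a line count is what you need.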
|
|
42
|
+
|
|
43
|
+
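All methods operate at **byte level** with direct memory access, using `memchr` for searching and a lookup table for whitespace checks. Arguments must be exactly one byte long: a multibyte character such as "é" raises `ArgumentError`.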
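To make the byte-level rule concrete, here is a pure-Ruby model of the argument check and the counting. `ref_byte_count` is a hypothetical reference helper written for this example, not part of the gem, and it assumes the check is on byte length as in the C source:

```ruby
# Hypothetical reference helper (not part of fast_string): a pure-Ruby
# model of the single-byte argument check and of byte counting.
def ref_byte_count(str, target)
  unless target.bytesize == 1
    raise ArgumentError, "target must be a single byte"
  end
  wanted = target.getbyte(0)
  # Count bytes equal to the target byte.
  str.each_byte.count(wanted)
end

e_acute = "\u00e9"                  # one character, two bytes in UTF-8
puts e_acute.length                 # 1
puts e_acute.bytesize               # 2
puts ref_byte_count("hello", "l")   # 2
```

Passing `e_acute` as the target raises `ArgumentError` in this model, because its byte length is 2.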
|
|
44
|
+
|
|
45
|
+
`fs_trim` returns a zero-copy shared substring — no `memcpy` unlike Ruby's `strip`.
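The whitespace set used by `fs_trim` and `fs_blank?` comes from the lookup table in `ext/fast_string/fast_string.c`: space, `\t`, `\n`, `\r`, `\f`, and `\v`. The sketch below models that behavior in pure Ruby. `FS_WHITESPACE`, `ref_blank?`, and `ref_trim` are hypothetical names for this example, and the model copies bytes rather than sharing memory:

```ruby
# Bytes the C lookup table marks as whitespace (NUL is not among them).
FS_WHITESPACE = [" ", "\t", "\n", "\r", "\f", "\v"].map { |c| c.getbyte(0) }.freeze

# Hypothetical pure-Ruby model of fs_blank?.
def ref_blank?(str)
  str.each_byte.all? { |b| FS_WHITESPACE.include?(b) }
end

# Hypothetical pure-Ruby model of fs_trim.
def ref_trim(str)
  start = 0
  stop = str.bytesize
  start += 1 while start < stop && FS_WHITESPACE.include?(str.getbyte(start))
  stop -= 1 while stop > start && FS_WHITESPACE.include?(str.getbyte(stop - 1))
  str.byteslice(start, stop - start)
end

puts ref_trim("  hello  ")        # hello
puts ref_blank?(" \t\n ")         # true
puts ref_trim("hi\0") == "hi\0"   # true: a trailing NUL is kept
```

One difference from the stdlib is worth checking in your own data: Ruby's `strip` also removes trailing NUL bytes, while this table does not treat NUL as whitespace.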
|
|
46
|
+
|
|
47
|
+
`fs_byte_replace` and `fs_byte_delete` return new strings; originals are not mutated.
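As a rough model of these non-mutating semantics, the following pure-Ruby helpers return new strings and keep the receiver's encoding. `ref_byte_replace` and `ref_byte_delete` are hypothetical names used only for illustration; they are far slower than the C versions:

```ruby
# Hypothetical pure-Ruby model of fs_byte_replace: copy, then patch bytes.
def ref_byte_replace(str, from, to)
  unless from.bytesize == 1 && to.bytesize == 1
    raise ArgumentError, "from and to must be single bytes"
  end
  from_byte = from.getbyte(0)
  to_byte = to.getbyte(0)
  out = str.dup # unfrozen copy; the original is never modified
  out.bytesize.times { |i| out.setbyte(i, to_byte) if out.getbyte(i) == from_byte }
  out
end

# Hypothetical pure-Ruby model of fs_byte_delete: keep every other byte.
def ref_byte_delete(str, target)
  raise ArgumentError, "target must be a single byte" unless target.bytesize == 1
  target_byte = target.getbyte(0)
  kept = str.bytes.reject { |b| b == target_byte }
  kept.pack("C*").force_encoding(str.encoding)
end

original = "hello\r\nworld".freeze
puts ref_byte_delete(original, "\r").inspect              # "hello\nworld"
puts ref_byte_replace("hello\nworld", "\n", " ").inspect  # "hello world"
puts original.inspect                                     # "hello\r\nworld"
```

Because the input is frozen in the usage lines, any accidental in-place mutation would raise `FrozenError`.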
|
|
48
|
+
|
|
49
|
+
## Benchmark
|
|
50
|
+
|
|
51
|
+
Measured on Apple Silicon (M1) with Ruby 2.7. Each number is how many **times faster** the method ran than its Ruby stdlib equivalent:
|
|
52
|
+
|
|
53
|
+
| Method | 42KB | 10MB | 9.6MB CSV | 38MB Log | 5.8MB Unicode |
|
|
54
|
+
|---|---|---|---|---|---|
|
|
55
|
+
| `fs_count` | 1.8x | 7.9x | 6.2x | 9.0x | 9.2x |
|
|
56
|
+
| `fs_lines` | 1.9x | 7.9x | 6.2x | 9.0x | 9.1x |
|
|
57
|
+
| `fs_blank?` | 4.1x | 4.0x | 4.0x | 4.0x | 4.1x |
|
|
58
|
+
| `fs_trim` | 3.8x | 4.0x | 4.0x | 3.9x | 4.0x |
|
|
59
|
+
| `fs_byte_replace` | 5.7x | 15.0x | 13.0x | **16.1x** | **67.9x** |
|
|
60
|
+
| `fs_byte_delete` | 3.8x | 13.5x | 12.2x | **15.1x** | **33.2x** |
|
|
61
|
+
| `fs_each_line` | 1.1x | 1.1x | 1.0x | 1.1x | 1.0x |
|
|
62
|
+
|
|
63
|
+
`fs_byte_replace` and `fs_byte_delete` show the largest gains because Ruby's `tr` and `delete` go through the encoding layer per character, while fast_string uses `memchr` to skip non-matching regions in bulk.
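The skip-in-bulk idea can be sketched in pure Ruby: search for the next matching byte, copy the whole run before it in one step, and continue after the match. This mirrors the loop structure of the C `fs_byte_delete`, with `String#index` on a binary copy standing in for `memchr`. `chunked_byte_delete` is a hypothetical name, and this model does not reproduce the C performance:

```ruby
# Hypothetical model of the memchr-style loop used for byte deletion.
def chunked_byte_delete(str, target)
  raise ArgumentError, "target must be a single byte" unless target.bytesize == 1

  src = str.b       # binary copy, so indexes are byte offsets
  needle = target.b
  out = String.new(capacity: src.bytesize) # empty binary buffer
  pos = 0

  while pos < src.bytesize
    found = src.index(needle, pos) # stand-in for memchr
    if found.nil?
      # No more matches: copy the remaining tail in one step.
      out << src.byteslice(pos, src.bytesize - pos)
      break
    end
    # Copy the run of non-matching bytes before the match.
    out << src.byteslice(pos, found - pos) if found > pos
    pos = found + 1 # skip the matched byte
  end

  out.force_encoding(str.encoding)
end

puts chunked_byte_delete("a\r\nb\r\n", "\r").inspect  # "a\nb\n"
```

The work per matching byte is one search plus one bulk copy, so input with few matches is handled in a few large steps rather than byte by byte.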
|
|
64
|
+
|
|
65
|
+
```bash
|
|
66
|
+
ruby benchmark/benchmark.rb
|
|
67
|
+
```
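A "times faster" figure like those in the table is the baseline time divided by the candidate time. The sketch below shows one way to compute it with the stdlib `Benchmark` module. Both timed blocks are stdlib stand-ins chosen so the example runs without the gem, and `speedup` is a hypothetical helper; swap in a `fs_` method as the candidate once fast_string is installed:

```ruby
require 'benchmark'

# Hypothetical helper: ratio of baseline time to candidate time.
def speedup(baseline_seconds, candidate_seconds)
  baseline_seconds / candidate_seconds
end

data = "hello\nworld\n" * 1000

# Stand-in baseline: a deliberately slow per-character count.
baseline = Benchmark.realtime { 50.times { data.each_char.count("\n") } }

# Stand-in candidate: replace with data.fs_count("\n") when the gem is loaded.
candidate = Benchmark.realtime { 50.times { data.count("\n") } }

ratio = speedup(baseline, candidate)
puts format("%.1fx", ratio)
```

Timings vary between runs and machines, so treat a single ratio as indicative only.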
|
|
68
|
+
|
|
69
|
+
## Requirements
|
|
70
|
+
|
|
71
|
+
- Ruby >= 2.7.0
|
|
72
|
+
- C compiler (GCC, Clang)
|
|
73
|
+
|
|
74
|
+
## Platforms
|
|
75
|
+
|
|
76
|
+
Linux, macOS, BSD, ARM (Apple Silicon), x86_64
|
|
77
|
+
|
|
78
|
+
## Development
|
|
79
|
+
|
|
80
|
+
```bash
|
|
81
|
+
bundle install
|
|
82
|
+
rake compile
|
|
83
|
+
ruby test_basic.rb
|
|
84
|
+
ruby benchmark/benchmark.rb
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
## License
|
|
88
|
+
|
|
89
|
+
MIT
|
data/benchmark/benchmark.rb
ADDED
|
@@ -0,0 +1,82 @@
|
|
|
1
|
+
require 'benchmark'
|
|
2
|
+
|
|
3
|
+
$LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
|
|
4
|
+
require 'fast_string'
|
|
5
|
+
|
|
6
|
+
small_data = "hello\nworld\ntest line with some text here\n" * 1000
|
|
7
|
+
medium_data = "This is a longer test line with various characters and symbols @#$%^&*().\nAnother line here with different content and more text to process.\nYet another line with even more content to make this realistic.\n" * 50000
|
|
8
|
+
csv_data = ("user_id,name,email,age,city,country\n" + "12345,John Doe,john@example.com,30,New York,USA\n" * 200000).freeze
|
|
9
|
+
log_data = ("2024-01-01T10:30:45.123Z [INFO ] ApplicationController - Processing request\n" * 500000).freeze
|
|
10
|
+
mixed_unicode = "ASCII text here with some unicode: тест, 测试, مرحبا, こんにちは\n" * 100000
|
|
11
|
+
|
|
12
|
+
puts "FastString Benchmark Results"
|
|
13
|
+
puts "=" * 50
|
|
14
|
+
puts
|
|
15
|
+
|
|
16
|
+
puts "Test data sizes:"
|
|
17
|
+
puts "Small: #{small_data.bytesize} bytes"
|
|
18
|
+
puts "Medium: #{medium_data.bytesize} bytes"
|
|
19
|
+
puts "CSV: #{csv_data.bytesize} bytes"
|
|
20
|
+
puts "Log: #{log_data.bytesize} bytes"
|
|
21
|
+
puts "Mixed: #{mixed_unicode.bytesize} bytes"
|
|
22
|
+
puts
|
|
23
|
+
|
|
24
|
+
test_cases = [
|
|
25
|
+
["Small Data", small_data],
|
|
26
|
+
["Medium Data", medium_data],
|
|
27
|
+
["CSV Data", csv_data],
|
|
28
|
+
["Log Data", log_data],
|
|
29
|
+
["Mixed Unicode", mixed_unicode]
|
|
30
|
+
]
|
|
31
|
+
|
|
32
|
+
test_cases.each do |name, data|
|
|
33
|
+
puts "#{name} (#{data.bytesize} bytes):"
|
|
34
|
+
puts "-" * 30
|
|
35
|
+
|
|
36
|
+
puts "Counting newline characters:"
|
|
37
|
+
Benchmark.bm(19) do |x|
|
|
38
|
+
x.report("ruby count") { 1000.times { data.count("\n") } }
|
|
39
|
+
x.report("fs_count") { 1000.times { data.fs_count("\n") } }
|
|
40
|
+
x.report("fs_lines") { 1000.times { data.fs_lines } }
|
|
41
|
+
end
|
|
42
|
+
puts
|
|
43
|
+
|
|
44
|
+
whitespace_data = " \t\n\r " * (data.length / 20)
|
|
45
|
+
puts "Checking blank string:"
|
|
46
|
+
Benchmark.bm(19) do |x|
|
|
47
|
+
x.report("ruby strip.empty?") { 1000.times { whitespace_data.strip.empty? } }
|
|
48
|
+
x.report("fs_blank?") { 1000.times { whitespace_data.fs_blank? } }
|
|
49
|
+
end
|
|
50
|
+
puts
|
|
51
|
+
|
|
52
|
+
puts "Stripping whitespace:"
|
|
53
|
+
Benchmark.bm(19) do |x|
|
|
54
|
+
x.report("ruby strip") { 1000.times { whitespace_data.strip } }
|
|
55
|
+
x.report("fs_trim") { 1000.times { whitespace_data.fs_trim } }
|
|
56
|
+
end
|
|
57
|
+
puts
|
|
58
|
+
|
|
59
|
+
puts "Replacing byte (\\n -> space):"
|
|
60
|
+
Benchmark.bm(19) do |x|
|
|
61
|
+
x.report("ruby tr") { 100.times { data.tr("\n", " ") } }
|
|
62
|
+
x.report("fs_byte_replace") { 100.times { data.fs_byte_replace("\n", " ") } }
|
|
63
|
+
end
|
|
64
|
+
puts
|
|
65
|
+
|
|
66
|
+
puts "Deleting byte (\\r):"
|
|
67
|
+
crlf_data = data.gsub("\n", "\r\n")
|
|
68
|
+
Benchmark.bm(19) do |x|
|
|
69
|
+
x.report("ruby delete") { 100.times { crlf_data.delete("\r") } }
|
|
70
|
+
x.report("fs_byte_delete") { 100.times { crlf_data.fs_byte_delete("\r") } }
|
|
71
|
+
end
|
|
72
|
+
puts
|
|
73
|
+
|
|
74
|
+
puts "Iterating lines:"
|
|
75
|
+
Benchmark.bm(19) do |x|
|
|
76
|
+
x.report("ruby each_line") { 100.times { data.each_line { |line| line.length } } }
|
|
77
|
+
x.report("fs_each_line") { 100.times { data.fs_each_line { |line| line.length } } }
|
|
78
|
+
end
|
|
79
|
+
|
|
80
|
+
puts "=" * 50
|
|
81
|
+
puts
|
|
82
|
+
end
|
data/ext/fast_string/extconf.rb
ADDED
|
@@ -0,0 +1,50 @@
|
|
|
1
|
+
require 'mkmf'
|
|
2
|
+
|
|
3
|
+
# Add optimization flags for maximum performance
|
|
4
|
+
$CFLAGS += " -O3 -funroll-loops -ffast-math"
|
|
5
|
+
|
|
6
|
+
# Additional performance flags
|
|
7
|
+
$CFLAGS += " -DNDEBUG" # Disable debug assertions
|
|
8
|
+
|
|
9
|
+
# SIMD support detection and flags
|
|
10
|
+
def check_simd_support
|
|
11
|
+
# Check for AVX2 support
|
|
12
|
+
if try_compile("#include <immintrin.h>\nint main() { __m256i v = _mm256_setzero_si256(); return 0; }")
|
|
13
|
+
puts "AVX2 support detected"
|
|
14
|
+
$CFLAGS += " -mavx2"
|
|
15
|
+
# Check for SSE4.1 support
|
|
16
|
+
elsif try_compile("#include <smmintrin.h>\nint main() { __m128i v = _mm_setzero_si128(); return 0; }")
|
|
17
|
+
puts "SSE4.1 support detected"
|
|
18
|
+
$CFLAGS += " -msse4.1"
|
|
19
|
+
else
|
|
20
|
+
puts "No advanced SIMD support detected, using scalar fallback"
|
|
21
|
+
end
|
|
22
|
+
|
|
23
|
+
# Check for ARM NEON on ARM platforms
|
|
24
|
+
if RUBY_PLATFORM =~ /arm|aarch64/i
|
|
25
|
+
if try_compile("#include <arm_neon.h>\nint main() { uint8x16_t v = vdupq_n_u8(0); return 0; }")
|
|
26
|
+
puts "ARM NEON support detected"
|
|
27
|
+
$CFLAGS += " -mfpu=neon" if RUBY_PLATFORM !~ /aarch64/i
|
|
28
|
+
end
|
|
29
|
+
end
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
# Enable native arch optimizations if supported
|
|
33
|
+
if try_compile("int main() { return 0; }", "-march=native")
|
|
34
|
+
$CFLAGS += " -march=native"
|
|
35
|
+
puts "Using -march=native for optimal performance"
|
|
36
|
+
else
|
|
37
|
+
puts "march=native not supported, using manual SIMD detection"
|
|
38
|
+
check_simd_support
|
|
39
|
+
end
|
|
40
|
+
|
|
41
|
+
# Check for required headers and functions
|
|
42
|
+
have_header('immintrin.h')
|
|
43
|
+
have_header('smmintrin.h')
|
|
44
|
+
have_header('arm_neon.h')
|
|
45
|
+
|
|
46
|
+
# Ensure we have string.h and required Ruby functions
|
|
47
|
+
abort "string.h is required" unless have_header('string.h')
|
|
48
|
+
abort "Ruby encoding functions not available" unless have_func('rb_enc_get', 'ruby/encoding.h')
|
|
49
|
+
|
|
50
|
+
create_makefile('fast_string/fast_string')
|
data/ext/fast_string/fast_string.c
ADDED
|
@@ -0,0 +1,166 @@
|
|
|
1
|
+
#include <ruby.h>
#include <ruby/encoding.h> /* declares rb_enc_get and rb_enc_associate used below */
|
|
2
|
+
#include <string.h>
|
|
3
|
+
#include <stdint.h>
|
|
4
|
+
|
|
5
|
+
static const int whitespace_table[256] = {
|
|
6
|
+
[' '] = 1, ['\t'] = 1, ['\n'] = 1, ['\r'] = 1, ['\f'] = 1, ['\v'] = 1
|
|
7
|
+
};
|
|
8
|
+
|
|
9
|
+
static VALUE rb_string_fs_count(VALUE self, VALUE target) {
|
|
10
|
+
Check_Type(target, T_STRING);
|
|
11
|
+
if (RSTRING_LEN(target) != 1)
|
|
12
|
+
rb_raise(rb_eArgError, "target must be a single character");
|
|
13
|
+
|
|
14
|
+
char *ptr = RSTRING_PTR(self);
|
|
15
|
+
long len = RSTRING_LEN(self);
|
|
16
|
+
unsigned char target_char = (unsigned char)RSTRING_PTR(target)[0];
|
|
17
|
+
long count = 0;
|
|
18
|
+
char *current = ptr;
|
|
19
|
+
char *end = ptr + len;
|
|
20
|
+
|
|
21
|
+
while (current < end) {
|
|
22
|
+
char *found = memchr(current, target_char, end - current);
|
|
23
|
+
if (found == NULL) break;
|
|
24
|
+
count++;
|
|
25
|
+
current = found + 1;
|
|
26
|
+
}
|
|
27
|
+
return LONG2NUM(count);
|
|
28
|
+
}
|
|
29
|
+
|
|
30
|
+
static VALUE rb_string_fs_lines(VALUE self) {
|
|
31
|
+
char *ptr = RSTRING_PTR(self);
|
|
32
|
+
long len = RSTRING_LEN(self);
|
|
33
|
+
long count = 0;
|
|
34
|
+
char *current = ptr;
|
|
35
|
+
char *end = ptr + len;
|
|
36
|
+
|
|
37
|
+
while (current < end) {
|
|
38
|
+
char *found = memchr(current, '\n', end - current);
|
|
39
|
+
if (found == NULL) break;
|
|
40
|
+
count++;
|
|
41
|
+
current = found + 1;
|
|
42
|
+
}
|
|
43
|
+
return LONG2NUM(count);
|
|
44
|
+
}
|
|
45
|
+
|
|
46
|
+
static VALUE rb_string_fs_blank(VALUE self) {
|
|
47
|
+
char *ptr = RSTRING_PTR(self);
|
|
48
|
+
long len = RSTRING_LEN(self);
|
|
49
|
+
|
|
50
|
+
for (long i = 0; i < len; i++) {
|
|
51
|
+
if (!whitespace_table[(unsigned char)ptr[i]])
|
|
52
|
+
return Qfalse;
|
|
53
|
+
}
|
|
54
|
+
return Qtrue;
|
|
55
|
+
}
|
|
56
|
+
|
|
57
|
+
static VALUE rb_string_fs_each_line(VALUE self) {
|
|
58
|
+
RETURN_ENUMERATOR(self, 0, 0);
|
|
59
|
+
|
|
60
|
+
char *ptr = RSTRING_PTR(self);
|
|
61
|
+
long len = RSTRING_LEN(self);
|
|
62
|
+
if (len == 0) return self;
|
|
63
|
+
|
|
64
|
+
char *current = ptr;
|
|
65
|
+
char *end = ptr + len;
|
|
66
|
+
|
|
67
|
+
while (current < end) {
|
|
68
|
+
char *found = memchr(current, '\n', end - current);
|
|
69
|
+
long line_len = found ? (found - current + 1) : (end - current);
|
|
70
|
+
rb_yield(rb_str_subseq(self, current - ptr, line_len));
|
|
71
|
+
current += line_len;
|
|
72
|
+
}
|
|
73
|
+
return self;
|
|
74
|
+
}
|
|
75
|
+
|
|
76
|
+
static VALUE rb_string_fs_trim(VALUE self) {
|
|
77
|
+
char *ptr = RSTRING_PTR(self);
|
|
78
|
+
long len = RSTRING_LEN(self);
|
|
79
|
+
long start = 0;
|
|
80
|
+
long end = len;
|
|
81
|
+
|
|
82
|
+
while (start < end && whitespace_table[(unsigned char)ptr[start]])
|
|
83
|
+
start++;
|
|
84
|
+
while (end > start && whitespace_table[(unsigned char)ptr[end - 1]])
|
|
85
|
+
end--;
|
|
86
|
+
|
|
87
|
+
if (start == 0 && end == len)
|
|
88
|
+
return rb_str_dup(self);
|
|
89
|
+
|
|
90
|
+
return rb_str_subseq(self, start, end - start);
|
|
91
|
+
}
|
|
92
|
+
|
|
93
|
+
static VALUE rb_string_fs_byte_replace(VALUE self, VALUE from, VALUE to) {
|
|
94
|
+
Check_Type(from, T_STRING);
|
|
95
|
+
Check_Type(to, T_STRING);
|
|
96
|
+
if (RSTRING_LEN(from) != 1 || RSTRING_LEN(to) != 1)
|
|
97
|
+
rb_raise(rb_eArgError, "from and to must be single characters");
|
|
98
|
+
|
|
99
|
+
char *ptr = RSTRING_PTR(self);
|
|
100
|
+
long len = RSTRING_LEN(self);
|
|
101
|
+
unsigned char from_byte = (unsigned char)RSTRING_PTR(from)[0];
|
|
102
|
+
unsigned char to_byte = (unsigned char)RSTRING_PTR(to)[0];
|
|
103
|
+
|
|
104
|
+
if (from_byte == to_byte)
|
|
105
|
+
return rb_str_dup(self);
|
|
106
|
+
|
|
107
|
+
VALUE result = rb_str_new(ptr, len);
|
|
108
|
+
rb_enc_associate(result, rb_enc_get(self));
|
|
109
|
+
char *out = RSTRING_PTR(result);
|
|
110
|
+
char *current = out;
|
|
111
|
+
char *end = out + len;
|
|
112
|
+
|
|
113
|
+
while (current < end) {
|
|
114
|
+
char *found = memchr(current, from_byte, end - current);
|
|
115
|
+
if (found == NULL) break;
|
|
116
|
+
*found = (char)to_byte;
|
|
117
|
+
current = found + 1;
|
|
118
|
+
}
|
|
119
|
+
return result;
|
|
120
|
+
}
|
|
121
|
+
|
|
122
|
+
static VALUE rb_string_fs_byte_delete(VALUE self, VALUE target) {
|
|
123
|
+
Check_Type(target, T_STRING);
|
|
124
|
+
if (RSTRING_LEN(target) != 1)
|
|
125
|
+
rb_raise(rb_eArgError, "target must be a single character");
|
|
126
|
+
|
|
127
|
+
char *ptr = RSTRING_PTR(self);
|
|
128
|
+
long len = RSTRING_LEN(self);
|
|
129
|
+
unsigned char target_byte = (unsigned char)RSTRING_PTR(target)[0];
|
|
130
|
+
|
|
131
|
+
VALUE result = rb_str_buf_new(len);
|
|
132
|
+
rb_enc_associate(result, rb_enc_get(self));
|
|
133
|
+
char *out = RSTRING_PTR(result);
|
|
134
|
+
long out_len = 0;
|
|
135
|
+
char *current = ptr;
|
|
136
|
+
char *end = ptr + len;
|
|
137
|
+
|
|
138
|
+
while (current < end) {
|
|
139
|
+
char *found = memchr(current, target_byte, end - current);
|
|
140
|
+
if (found == NULL) {
|
|
141
|
+
long chunk = end - current;
|
|
142
|
+
memcpy(out + out_len, current, chunk);
|
|
143
|
+
out_len += chunk;
|
|
144
|
+
break;
|
|
145
|
+
}
|
|
146
|
+
long chunk = found - current;
|
|
147
|
+
if (chunk > 0) {
|
|
148
|
+
memcpy(out + out_len, current, chunk);
|
|
149
|
+
out_len += chunk;
|
|
150
|
+
}
|
|
151
|
+
current = found + 1;
|
|
152
|
+
}
|
|
153
|
+
|
|
154
|
+
rb_str_set_len(result, out_len);
|
|
155
|
+
return result;
|
|
156
|
+
}
|
|
157
|
+
|
|
158
|
+
void Init_fast_string(void) {
|
|
159
|
+
rb_define_method(rb_cString, "fs_count", rb_string_fs_count, 1);
|
|
160
|
+
rb_define_method(rb_cString, "fs_lines", rb_string_fs_lines, 0);
|
|
161
|
+
rb_define_method(rb_cString, "fs_blank?", rb_string_fs_blank, 0);
|
|
162
|
+
rb_define_method(rb_cString, "fs_each_line", rb_string_fs_each_line, 0);
|
|
163
|
+
rb_define_method(rb_cString, "fs_trim", rb_string_fs_trim, 0);
|
|
164
|
+
rb_define_method(rb_cString, "fs_byte_replace", rb_string_fs_byte_replace, 2);
|
|
165
|
+
rb_define_method(rb_cString, "fs_byte_delete", rb_string_fs_byte_delete, 1);
|
|
166
|
+
}
|
data/lib/fast_string.rb
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
require 'fast_string/version'
|
|
2
|
+
|
|
3
|
+
# Load the compiled extension (.so on Linux, .bundle on macOS)
|
|
4
|
+
begin
|
|
5
|
+
require 'fast_string/fast_string'
|
|
6
|
+
rescue LoadError
|
|
7
|
+
# Fallback for different extension names
|
|
8
|
+
ext_dir = File.expand_path('../fast_string', __FILE__)
|
|
9
|
+
if File.exist?(File.join(ext_dir, 'fast_string.bundle'))
|
|
10
|
+
require File.join(ext_dir, 'fast_string.bundle')
|
|
11
|
+
elsif File.exist?(File.join(ext_dir, 'fast_string.so'))
|
|
12
|
+
require File.join(ext_dir, 'fast_string.so')
|
|
13
|
+
else
|
|
14
|
+
raise LoadError, "Could not find compiled extension"
|
|
15
|
+
end
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
# The C extension will automatically extend String class with fs_ methods
|
|
19
|
+
module FastString
|
|
20
|
+
# Module can be used for future utility methods if needed
|
|
21
|
+
end
|
metadata
ADDED
|
@@ -0,0 +1,126 @@
|
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
|
2
|
+
name: fast_string
|
|
3
|
+
version: !ruby/object:Gem::Version
|
|
4
|
+
version: 0.1.0
|
|
5
|
+
platform: ruby
|
|
6
|
+
authors:
|
|
7
|
+
- Roman Haydarov
|
|
8
|
+
autorequire:
|
|
9
|
+
bindir: exe
|
|
10
|
+
cert_chain: []
|
|
11
|
+
date: 2026-03-20 00:00:00.000000000 Z
|
|
12
|
+
dependencies:
|
|
13
|
+
- !ruby/object:Gem::Dependency
|
|
14
|
+
name: bundler
|
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
|
16
|
+
requirements:
|
|
17
|
+
- - "~>"
|
|
18
|
+
- !ruby/object:Gem::Version
|
|
19
|
+
version: '2.0'
|
|
20
|
+
type: :development
|
|
21
|
+
prerelease: false
|
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
23
|
+
requirements:
|
|
24
|
+
- - "~>"
|
|
25
|
+
- !ruby/object:Gem::Version
|
|
26
|
+
version: '2.0'
|
|
27
|
+
- !ruby/object:Gem::Dependency
|
|
28
|
+
name: rake
|
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
|
30
|
+
requirements:
|
|
31
|
+
- - "~>"
|
|
32
|
+
- !ruby/object:Gem::Version
|
|
33
|
+
version: '13.0'
|
|
34
|
+
type: :development
|
|
35
|
+
prerelease: false
|
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
37
|
+
requirements:
|
|
38
|
+
- - "~>"
|
|
39
|
+
- !ruby/object:Gem::Version
|
|
40
|
+
version: '13.0'
|
|
41
|
+
- !ruby/object:Gem::Dependency
|
|
42
|
+
name: rake-compiler
|
|
43
|
+
requirement: !ruby/object:Gem::Requirement
|
|
44
|
+
requirements:
|
|
45
|
+
- - "~>"
|
|
46
|
+
- !ruby/object:Gem::Version
|
|
47
|
+
version: '1.2'
|
|
48
|
+
type: :development
|
|
49
|
+
prerelease: false
|
|
50
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
51
|
+
requirements:
|
|
52
|
+
- - "~>"
|
|
53
|
+
- !ruby/object:Gem::Version
|
|
54
|
+
version: '1.2'
|
|
55
|
+
- !ruby/object:Gem::Dependency
|
|
56
|
+
name: minitest
|
|
57
|
+
requirement: !ruby/object:Gem::Requirement
|
|
58
|
+
requirements:
|
|
59
|
+
- - "~>"
|
|
60
|
+
- !ruby/object:Gem::Version
|
|
61
|
+
version: '5.0'
|
|
62
|
+
type: :development
|
|
63
|
+
prerelease: false
|
|
64
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
65
|
+
requirements:
|
|
66
|
+
- - "~>"
|
|
67
|
+
- !ruby/object:Gem::Version
|
|
68
|
+
version: '5.0'
|
|
69
|
+
- !ruby/object:Gem::Dependency
|
|
70
|
+
name: benchmark-ips
|
|
71
|
+
requirement: !ruby/object:Gem::Requirement
|
|
72
|
+
requirements:
|
|
73
|
+
- - "~>"
|
|
74
|
+
- !ruby/object:Gem::Version
|
|
75
|
+
version: '2.0'
|
|
76
|
+
type: :development
|
|
77
|
+
prerelease: false
|
|
78
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
79
|
+
requirements:
|
|
80
|
+
- - "~>"
|
|
81
|
+
- !ruby/object:Gem::Version
|
|
82
|
+
version: '2.0'
|
|
83
|
+
description: Minimal set of optimized string scanning methods for high-throughput
|
|
84
|
+
workloads like log processing, CSV parsing, HTTP parsing, and text analytics. Pure
|
|
85
|
+
C implementation with no external dependencies.
|
|
86
|
+
email:
|
|
87
|
+
- romnhajdarov@gmail.com
|
|
88
|
+
executables: []
|
|
89
|
+
extensions:
|
|
90
|
+
- ext/fast_string/extconf.rb
|
|
91
|
+
extra_rdoc_files: []
|
|
92
|
+
files:
|
|
93
|
+
- LICENSE.txt
|
|
94
|
+
- README.md
|
|
95
|
+
- benchmark/benchmark.rb
|
|
96
|
+
- ext/fast_string/extconf.rb
|
|
97
|
+
- ext/fast_string/fast_string.c
|
|
98
|
+
- lib/fast_string.rb
|
|
99
|
+
- lib/fast_string/version.rb
|
|
100
|
+
homepage: https://github.com/roman-haidarov/fast_string
|
|
101
|
+
licenses:
|
|
102
|
+
- MIT
|
|
103
|
+
metadata:
|
|
104
|
+
homepage_uri: https://github.com/roman-haidarov/fast_string
|
|
105
|
+
source_code_uri: https://github.com/roman-haidarov/fast_string
|
|
106
|
+
changelog_uri: https://github.com/roman-haidarov/fast_string/blob/main/CHANGELOG.md
|
|
107
|
+
post_install_message:
|
|
108
|
+
rdoc_options: []
|
|
109
|
+
require_paths:
|
|
110
|
+
- lib
|
|
111
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
|
112
|
+
requirements:
|
|
113
|
+
- - ">="
|
|
114
|
+
- !ruby/object:Gem::Version
|
|
115
|
+
version: 2.7.0
|
|
116
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
117
|
+
requirements:
|
|
118
|
+
- - ">="
|
|
119
|
+
- !ruby/object:Gem::Version
|
|
120
|
+
version: '0'
|
|
121
|
+
requirements: []
|
|
122
|
+
rubygems_version: 3.4.22
|
|
123
|
+
signing_key:
|
|
124
|
+
specification_version: 4
|
|
125
|
+
summary: High-performance Ruby String extensions implemented in C
|
|
126
|
+
test_files: []
|