ahocorasick-rust 1.0.2 → 2.0.0
- checksums.yaml +4 -4
- data/README.md +124 -11
- data/docs/match_kind.md +139 -0
- data/docs/reference.md +396 -0
- data/ext/rahocorasick/src/lib.rs +164 -5
- metadata +8 -8
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 040bfa66b5fdcf34e1ac7c7abc0ea45fa52561e83964750c59e54213bb5dbe10
|
|
4
|
+
data.tar.gz: 3e33df04f2ae0972f696085efe98984131eecf17cece98b9ae33c9a497f8b690
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 58bfcb0c7c05b0b477d1e85e9c9ed7d9ab683967771d2ec512a6f2f36b5c72c6c39beb4ba4f3c0ab3e1c66e3492f53fa0ccdff2b57b0dfe62a75114105810583
|
|
7
|
+
data.tar.gz: 77839a72ffe30a4f80aa6c1626d7770d6a215954f3365c9967b739f968a4b52d88de7d350e672cd4f5feea7c7a293fbea636a7194aaeecdbe085c94410a4af18
|
data/README.md
CHANGED
|
@@ -19,10 +19,12 @@ Aho-Corasick is a powerful string searching algorithm that can find **multiple p
|
|
|
19
19
|
|
|
20
20
|
**Why this gem rocks:**
|
|
21
21
|
- 🦀 Powered by Rust for maximum speed
|
|
22
|
-
- 💎
|
|
22
|
+
- 💎 Clean, intuitive Ruby API with 7+ search methods
|
|
23
23
|
- 🚀 Up to **67x faster** than pure Ruby implementations
|
|
24
24
|
- ✨ Precompiled binaries for major platforms
|
|
25
|
-
-
|
|
25
|
+
- 🎯 Multiple search modes: overlapping, positioned, existence checks
|
|
26
|
+
- 🔄 Find & replace with hash or block-based logic
|
|
27
|
+
- 🌈 Works with Ruby 2.7+ and UTF-8/emoji
|
|
26
28
|
|
|
27
29
|
## Installation 📦
|
|
28
30
|
|
|
@@ -44,24 +46,135 @@ Or install it yourself:
|
|
|
44
46
|
gem install ahocorasick-rust
|
|
45
47
|
```
|
|
46
48
|
|
|
47
|
-
##
|
|
49
|
+
## Features ✨
|
|
48
50
|
|
|
49
|
-
|
|
51
|
+
- **Multiple search modes** - Find all matches, overlapping matches, or just check existence
|
|
52
|
+
- **Position tracking** - Get byte offsets for every match
|
|
53
|
+
- **Case-insensitive matching** - Optional ASCII case-insensitive search
|
|
54
|
+
- **Match strategies** - Control priority when patterns overlap
|
|
55
|
+
- **Find & replace** - Replace patterns with strings or dynamic logic via blocks
|
|
56
|
+
- **Unicode support** - Works seamlessly with UTF-8 text and emoji
|
|
57
|
+
- **Zero-copy where possible** - Efficient memory usage
|
|
58
|
+
|
|
59
|
+
## Quick Start 🎀
|
|
60
|
+
|
|
61
|
+
### Basic Pattern Matching
|
|
50
62
|
|
|
51
63
|
```ruby
|
|
52
64
|
require 'ahocorasick-rust'
|
|
53
65
|
|
|
54
|
-
# Create a
|
|
55
|
-
|
|
56
|
-
matcher = AhoCorasickRust.new(animals)
|
|
66
|
+
# Create a matcher with your patterns
|
|
67
|
+
matcher = AhoCorasickRust.new(['cat', 'dog', 'fox'])
|
|
57
68
|
|
|
58
|
-
#
|
|
59
|
-
|
|
60
|
-
matcher.lookup(text)
|
|
69
|
+
# Find all matches
|
|
70
|
+
matcher.lookup("The quick brown fox jumps over the lazy dog.")
|
|
61
71
|
# => ["fox", "dog"]
|
|
72
|
+
|
|
73
|
+
# Check if any pattern exists
|
|
74
|
+
matcher.match?("I have a cat")
|
|
75
|
+
# => true
|
|
62
76
|
```
|
|
63
77
|
|
|
64
|
-
|
|
78
|
+
### Case-Insensitive Matching
|
|
79
|
+
|
|
80
|
+
```ruby
|
|
81
|
+
matcher = AhoCorasickRust.new(['Ruby', 'Python'], case_insensitive: true)
|
|
82
|
+
|
|
83
|
+
matcher.lookup('I love RUBY and python!')
|
|
84
|
+
# => ["Ruby", "Python"]
|
|
85
|
+
```
|
|
86
|
+
|
|
87
|
+
### Get Match Positions
|
|
88
|
+
|
|
89
|
+
```ruby
|
|
90
|
+
matcher = AhoCorasickRust.new(['fox', 'dog'])
|
|
91
|
+
|
|
92
|
+
matcher.lookup_with_positions('The fox and dog')
|
|
93
|
+
# => [
|
|
94
|
+
# { pattern: 'fox', start: 4, end: 7 },
|
|
95
|
+
# { pattern: 'dog', start: 12, end: 15 }
|
|
96
|
+
# ]
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
### Find & Replace
|
|
100
|
+
|
|
101
|
+
```ruby
|
|
102
|
+
matcher = AhoCorasickRust.new(['bad', 'worse', 'worst'])
|
|
103
|
+
|
|
104
|
+
# Replace with hash
|
|
105
|
+
matcher.replace_all('This is bad and worse', { 'bad' => 'good', 'worse' => 'better' })
|
|
106
|
+
# => "This is good and better"
|
|
107
|
+
|
|
108
|
+
# Replace with block
|
|
109
|
+
matcher.replace_all('This is bad and worse') { |word| '*' * word.length }
|
|
110
|
+
# => "This is *** and *****"
|
|
111
|
+
```
|
|
112
|
+
|
|
113
|
+
### Overlapping Matches
|
|
114
|
+
|
|
115
|
+
```ruby
|
|
116
|
+
matcher = AhoCorasickRust.new(['abc', 'bcd', 'cde'])
|
|
117
|
+
|
|
118
|
+
# Regular lookup finds non-overlapping matches
|
|
119
|
+
matcher.lookup('abcde')
|
|
120
|
+
# => ["abc"]
|
|
121
|
+
|
|
122
|
+
# Overlapping lookup finds all matches
|
|
123
|
+
matcher.lookup_overlapping('abcde')
|
|
124
|
+
# => ["abc", "bcd", "cde"]
|
|
125
|
+
```
|
|
126
|
+
|
|
127
|
+
### Advanced: Match Strategies
|
|
128
|
+
|
|
129
|
+
```ruby
|
|
130
|
+
# Prefer longest matches
|
|
131
|
+
matcher = AhoCorasickRust.new(
|
|
132
|
+
['test', 'testing'],
|
|
133
|
+
match_kind: :leftmost_longest
|
|
134
|
+
)
|
|
135
|
+
|
|
136
|
+
matcher.lookup('testing')
|
|
137
|
+
# => ["testing"] # chooses longer match over 'test'
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
### Find First (Efficient for Existence Checks)
|
|
141
|
+
|
|
142
|
+
```ruby
|
|
143
|
+
matcher = AhoCorasickRust.new(['foo', 'bar', 'baz'])
|
|
144
|
+
|
|
145
|
+
# Get just the first match (faster than getting all matches)
|
|
146
|
+
matcher.find_first('hello foo bar baz')
|
|
147
|
+
# => "foo"
|
|
148
|
+
|
|
149
|
+
# Or with position
|
|
150
|
+
matcher.find_first_with_position('hello foo bar')
|
|
151
|
+
# => { pattern: 'foo', start: 6, end: 9 }
|
|
152
|
+
```
|
|
153
|
+
|
|
154
|
+
## API Overview 🔍
|
|
155
|
+
|
|
156
|
+
**Constructor:**
|
|
157
|
+
- `AhoCorasickRust.new(patterns, case_insensitive: false, match_kind: :leftmost_first)`
|
|
158
|
+
|
|
159
|
+
**Search Methods:**
|
|
160
|
+
- `#lookup(text)` - Find all non-overlapping matches
|
|
161
|
+
- `#lookup_overlapping(text)` - Find all matches including overlaps
|
|
162
|
+
- `#lookup_with_positions(text)` - Find matches with byte positions
|
|
163
|
+
- `#match?(text)` - Check if any pattern exists (returns boolean)
|
|
164
|
+
- `#find_first(text)` - Get first match only
|
|
165
|
+
- `#find_first_with_position(text)` - Get first match with position
|
|
166
|
+
|
|
167
|
+
**Replace Methods:**
|
|
168
|
+
- `#replace_all(text, hash)` - Replace with hash mapping
|
|
169
|
+
- `#replace_all(text) { |match| ... }` - Replace with block
|
|
170
|
+
|
|
171
|
+
## Documentation 📖
|
|
172
|
+
|
|
173
|
+
- **[API Reference](docs/reference.md)** - Complete method documentation with examples
|
|
174
|
+
- **[Match Kind Guide](docs/match_kind.md)** - Understanding match strategies
|
|
175
|
+
- **[Example Script](scripts/example.rb)** - Real-world usage examples
|
|
176
|
+
|
|
177
|
+
**Want more examples?** Check out our example script with content filtering, language detection, and more! 🌈
|
|
65
178
|
|
|
66
179
|
## Benchmark 📊
|
|
67
180
|
|
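The README's "Get Match Positions" example returns hashes with `:pattern`, `:start`, and `:end`, where the positions are byte offsets. As a rough illustration of how such a result could be consumed, the sketch below wraps each matched span in a tag. It is pure Ruby and does not call the gem: `wrap_matches` is a hypothetical helper, the `<mark>` tag is an arbitrary choice, and the `positions` array is hard-coded to mirror the README output rather than produced by a real matcher. It assumes the positions are sorted and non-overlapping.

```ruby
# Hypothetical helper: wraps each byte span in an opening/closing tag.
# Assumes `positions` is sorted by :start and spans do not overlap.
def wrap_matches(text, positions, open_tag: '<mark>', close_tag: '</mark>')
  out = +''
  cursor = 0 # byte offset of the first byte not yet copied
  positions.each do |pos|
    # Copy the untouched text before the match, then the wrapped match.
    out << text.byteslice(cursor, pos[:start] - cursor)
    out << open_tag << text.byteslice(pos[:start], pos[:end] - pos[:start]) << close_tag
    cursor = pos[:end]
  end
  # Copy whatever follows the last match.
  out << text.byteslice(cursor, text.bytesize - cursor)
  out
end

# Hard-coded to mirror the README's example output (not a real gem call).
positions = [
  { pattern: 'fox', start: 4, end: 7 },
  { pattern: 'dog', start: 12, end: 15 }
]
highlighted = wrap_matches('The fox and dog', positions)
```

Working with `byteslice` rather than character indexing keeps the helper consistent with byte offsets when the text contains multi-byte characters.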
data/docs/match_kind.md
ADDED
|
@@ -0,0 +1,139 @@
|
|
|
1
|
+
# Match Kind Strategies
|
|
2
|
+
|
|
3
|
+
The `match_kind` option controls how the Aho-Corasick automaton behaves when multiple patterns could match at the same position in the text. This is important when you have overlapping patterns like `'abc'` and `'abcd'`.
|
|
4
|
+
|
|
5
|
+
## Available Options
|
|
6
|
+
|
|
7
|
+
### `:leftmost_first` (default)
|
|
8
|
+
|
|
9
|
+
**Priority:** First pattern in the list wins
|
|
10
|
+
|
|
11
|
+
When multiple patterns could match at the same position, the pattern that appears first in your pattern list takes precedence.
|
|
12
|
+
|
|
13
|
+
```ruby
|
|
14
|
+
matcher = AhoCorasickRust.new(['abc', 'abcd'], match_kind: :leftmost_first)
|
|
15
|
+
matcher.lookup('abcd')
|
|
16
|
+
# => ['abc'] # 'abc' appears first in the pattern list, so it wins
|
|
17
|
+
```
|
|
18
|
+
|
|
19
|
+
**Use cases:**
|
|
20
|
+
- Keyword replacement where you want specific patterns to take precedence
|
|
21
|
+
- Content filtering with priority rules
|
|
22
|
+
- When you want explicit control over match priority by ordering your patterns
|
|
23
|
+
|
|
24
|
+
---
|
|
25
|
+
|
|
26
|
+
### `:leftmost_longest`
|
|
27
|
+
|
|
28
|
+
**Priority:** Longest match wins
|
|
29
|
+
|
|
30
|
+
When multiple patterns could match at the same position, the longest matching pattern is preferred.
|
|
31
|
+
|
|
32
|
+
```ruby
|
|
33
|
+
matcher = AhoCorasickRust.new(['abc', 'abcd'], match_kind: :leftmost_longest)
|
|
34
|
+
matcher.lookup('abcd')
|
|
35
|
+
# => ['abcd'] # 'abcd' is longer, so it wins
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
**Use cases:**
|
|
39
|
+
- Tokenization where longer tokens are more meaningful
|
|
40
|
+
- Entity recognition where you want the most specific match
|
|
41
|
+
- Parsing structured text where longer patterns indicate more context
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
### `:standard`
|
|
46
|
+
|
|
47
|
+
**Priority:** Standard Aho-Corasick algorithm behavior
|
|
48
|
+
|
|
49
|
+
Reports matches as they're encountered in the automaton's state machine. This is the classical Aho-Corasick behavior.
|
|
50
|
+
|
|
51
|
+
```ruby
|
|
52
|
+
matcher = AhoCorasickRust.new(['abc', 'abcd'], match_kind: :standard)
|
|
53
|
+
matcher.lookup('abcd')
|
|
54
|
+
# => ['abc'] # Reports matches as the automaton finds them
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
**Use cases:**
|
|
58
|
+
- When you need strict Aho-Corasick semantics
|
|
59
|
+
- Potentially slightly faster for certain pattern sets
|
|
60
|
+
- Academic or research purposes requiring standard algorithm behavior
|
|
61
|
+
|
|
62
|
+
## Comparison Example
|
|
63
|
+
|
|
64
|
+
Given patterns `['ab', 'abc', 'abcd']` and text `'abcd'`:
|
|
65
|
+
|
|
66
|
+
| match_kind | Result | Reason |
|
|
67
|
+
|------------|--------|--------|
|
|
68
|
+
| `:leftmost_first` | `['ab']` | First pattern in list |
|
|
69
|
+
| `:leftmost_longest` | `['abcd']` | Longest matching pattern |
|
|
70
|
+
| `:standard` | `['ab']` | First match encountered by automaton |
|
|
71
|
+
|
|
72
|
+
## Interaction with Other Features
|
|
73
|
+
|
|
74
|
+
### With `#lookup_overlapping`
|
|
75
|
+
|
|
76
|
+
The `match_kind` option does **not** affect `#lookup_overlapping`, which always returns all overlapping matches regardless of the strategy:
|
|
77
|
+
|
|
78
|
+
```ruby
|
|
79
|
+
matcher = AhoCorasickRust.new(['abc', 'bcd'], match_kind: :leftmost_first)
|
|
80
|
+
|
|
81
|
+
# match_kind affects non-overlapping lookup
|
|
82
|
+
matcher.lookup('abcd')
|
|
83
|
+
# => ['abc'] (only one match, respects match_kind)
|
|
84
|
+
|
|
85
|
+
# overlapping lookup ignores match_kind
|
|
86
|
+
matcher.lookup_overlapping('abcd')
|
|
87
|
+
# => ['abc', 'bcd'] (all matches, ignores match_kind)
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
### With `case_insensitive`
|
|
91
|
+
|
|
92
|
+
The `match_kind` and `case_insensitive` options work together seamlessly:
|
|
93
|
+
|
|
94
|
+
```ruby
|
|
95
|
+
matcher = AhoCorasickRust.new(
|
|
96
|
+
['abc', 'abcd'],
|
|
97
|
+
match_kind: :leftmost_longest,
|
|
98
|
+
case_insensitive: true
|
|
99
|
+
)
|
|
100
|
+
matcher.lookup('ABCD')
|
|
101
|
+
# => ['abcd'] # Finds longest match, case-insensitively
|
|
102
|
+
```
|
|
103
|
+
|
|
104
|
+
### With `#find_first`
|
|
105
|
+
|
|
106
|
+
The `match_kind` affects which match is returned by `#find_first`:
|
|
107
|
+
|
|
108
|
+
```ruby
|
|
109
|
+
# With leftmost_first
|
|
110
|
+
matcher1 = AhoCorasickRust.new(['abc', 'abcd'], match_kind: :leftmost_first)
|
|
111
|
+
matcher1.find_first('abcd') # => 'abc'
|
|
112
|
+
|
|
113
|
+
# With leftmost_longest
|
|
114
|
+
matcher2 = AhoCorasickRust.new(['abc', 'abcd'], match_kind: :leftmost_longest)
|
|
115
|
+
matcher2.find_first('abcd') # => 'abcd'
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
## Choosing the Right Strategy
|
|
119
|
+
|
|
120
|
+
**Use `:leftmost_first` when:**
|
|
121
|
+
- You want explicit control over priority
|
|
122
|
+
- Pattern order is meaningful in your domain
|
|
123
|
+
- You're doing rule-based text processing
|
|
124
|
+
|
|
125
|
+
**Use `:leftmost_longest` when:**
|
|
126
|
+
- Longer matches are more specific/important
|
|
127
|
+
- You're tokenizing or parsing
|
|
128
|
+
- You want the most complete match
|
|
129
|
+
|
|
130
|
+
**Use `:standard` when:**
|
|
131
|
+
- You need classical Aho-Corasick semantics
|
|
132
|
+
- You're comparing against other implementations
|
|
133
|
+
- Performance is critical and you understand the tradeoffs
|
|
134
|
+
|
|
135
|
+
## Performance Notes
|
|
136
|
+
|
|
137
|
+
All three strategies use the same underlying automaton construction, so performance differences are minimal. The main difference is in the matching logic when choosing between multiple possible matches at the same position.
|
|
138
|
+
|
|
139
|
+
In practice, `:leftmost_first` (the default) provides the best balance of performance, predictability, and control for most use cases.
|
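The selection rules that match_kind.md describes for `:leftmost_first` and `:leftmost_longest` can be illustrated with a naive scan. This is a sketch of the selection rule only, written in pure Ruby; it is not how the gem or the underlying Rust crate performs matching, and it does not model `:standard`. The name `naive_lookup` and its `strategy` argument are hypothetical.

```ruby
# Naive reference scan: at each position, collect every pattern that matches
# there, pick one according to the strategy, then continue after the match.
def naive_lookup(patterns, text, strategy)
  matches = []
  i = 0
  while i < text.length
    candidates = patterns.select { |p| !p.empty? && text[i, p.length] == p }
    if candidates.empty?
      i += 1
      next
    end
    chosen =
      case strategy
      when :leftmost_first   then candidates.first             # earliest in the pattern list
      when :leftmost_longest then candidates.max_by(&:length)  # longest candidate
      else raise ArgumentError, "unsupported strategy: #{strategy.inspect}"
      end
    matches << chosen
    i += chosen.length # non-overlapping: resume after the chosen match
  end
  matches
end

first_result   = naive_lookup(['ab', 'abc', 'abcd'], 'abcd', :leftmost_first)
longest_result = naive_lookup(['ab', 'abc', 'abcd'], 'abcd', :leftmost_longest)
```

Under these assumptions the two calls reproduce the first two rows of the comparison table: `['ab']` for `:leftmost_first` and `['abcd']` for `:leftmost_longest`.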
data/docs/reference.md
ADDED
|
@@ -0,0 +1,396 @@
|
|
|
1
|
+
# API Reference
|
|
2
|
+
|
|
3
|
+
Complete reference for all methods and options in the `ahocorasick-rust` gem.
|
|
4
|
+
|
|
5
|
+
## Table of Contents
|
|
6
|
+
|
|
7
|
+
- [Constructor](#constructor)
|
|
8
|
+
- [Search Methods](#search-methods)
|
|
9
|
+
- [Replace Methods](#replace-methods)
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Constructor
|
|
14
|
+
|
|
15
|
+
### `AhoCorasickRust.new(patterns, **options)`
|
|
16
|
+
|
|
17
|
+
Creates a new Aho-Corasick matcher from an array of pattern strings.
|
|
18
|
+
|
|
19
|
+
**Parameters:**
|
|
20
|
+
- `patterns` (Array<String>) - Array of pattern strings to search for
|
|
21
|
+
- `options` (Hash) - Optional configuration
|
|
22
|
+
|
|
23
|
+
**Options:**
|
|
24
|
+
- `case_insensitive` (Boolean) - Enable ASCII case-insensitive matching (default: `false`)
|
|
25
|
+
- `match_kind` (Symbol) - Strategy for handling overlapping patterns (default: `:leftmost_first`)
|
|
26
|
+
- `:leftmost_first` - First pattern in list wins
|
|
27
|
+
- `:leftmost_longest` - Longest pattern wins
|
|
28
|
+
- `:standard` - Standard Aho-Corasick behavior
|
|
29
|
+
|
|
30
|
+
**Examples:**
|
|
31
|
+
|
|
32
|
+
```ruby
|
|
33
|
+
# Basic usage
|
|
34
|
+
matcher = AhoCorasickRust.new(['foo', 'bar', 'baz'])
|
|
35
|
+
|
|
36
|
+
# Case-insensitive matching
|
|
37
|
+
matcher = AhoCorasickRust.new(['Ruby', 'Python'], case_insensitive: true)
|
|
38
|
+
|
|
39
|
+
# Control match priority
|
|
40
|
+
matcher = AhoCorasickRust.new(['abc', 'abcd'], match_kind: :leftmost_longest)
|
|
41
|
+
|
|
42
|
+
# Combine options
|
|
43
|
+
matcher = AhoCorasickRust.new(
|
|
44
|
+
['test', 'testing'],
|
|
45
|
+
case_insensitive: true,
|
|
46
|
+
match_kind: :leftmost_longest
|
|
47
|
+
)
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
**Raises:**
|
|
51
|
+
- `TypeError` - If patterns is not an array or contains non-strings
|
|
52
|
+
- `ArgumentError` - If match_kind is invalid
|
|
53
|
+
- `RuntimeError` - If automaton construction fails
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
## Search Methods
|
|
58
|
+
|
|
59
|
+
### `#lookup(haystack)`
|
|
60
|
+
|
|
61
|
+
Finds all non-overlapping pattern matches in the haystack.
|
|
62
|
+
|
|
63
|
+
**Parameters:**
|
|
64
|
+
- `haystack` (String) - The text to search
|
|
65
|
+
|
|
66
|
+
**Returns:** Array<String> - Array of matched patterns
|
|
67
|
+
|
|
68
|
+
**Examples:**
|
|
69
|
+
|
|
70
|
+
```ruby
|
|
71
|
+
matcher = AhoCorasickRust.new(['foo', 'bar'])
|
|
72
|
+
|
|
73
|
+
matcher.lookup('foo and bar')
|
|
74
|
+
# => ['foo', 'bar']
|
|
75
|
+
|
|
76
|
+
matcher.lookup('hello world')
|
|
77
|
+
# => []
|
|
78
|
+
|
|
79
|
+
# Non-overlapping: finds 'abc' and stops
|
|
80
|
+
matcher = AhoCorasickRust.new(['abc', 'bcd'])
|
|
81
|
+
matcher.lookup('abcd')
|
|
82
|
+
# => ['abc']
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
---
|
|
86
|
+
|
|
87
|
+
### `#lookup_overlapping(haystack)`
|
|
88
|
+
|
|
89
|
+
Finds all pattern matches, including overlapping ones.
|
|
90
|
+
|
|
91
|
+
**Parameters:**
|
|
92
|
+
- `haystack` (String) - The text to search
|
|
93
|
+
|
|
94
|
+
**Returns:** Array<String> - Array of all matched patterns, including overlaps
|
|
95
|
+
|
|
96
|
+
**Examples:**
|
|
97
|
+
|
|
98
|
+
```ruby
|
|
99
|
+
matcher = AhoCorasickRust.new(['abc', 'bcd', 'cde'])
|
|
100
|
+
|
|
101
|
+
matcher.lookup_overlapping('abcde')
|
|
102
|
+
# => ['abc', 'bcd', 'cde']
|
|
103
|
+
|
|
104
|
+
# Compare with non-overlapping
|
|
105
|
+
matcher.lookup('abcde')
|
|
106
|
+
# => ['abc']
|
|
107
|
+
|
|
108
|
+
# Finds multiple occurrences of same pattern at different positions
|
|
109
|
+
matcher = AhoCorasickRust.new(['a', 'ab'])
|
|
110
|
+
matcher.lookup_overlapping('aab')
|
|
111
|
+
# => ['a', 'a', 'ab']
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
---
|
|
115
|
+
|
|
116
|
+
### `#lookup_with_positions(haystack)`
|
|
117
|
+
|
|
118
|
+
Finds all non-overlapping matches with their byte positions.
|
|
119
|
+
|
|
120
|
+
**Parameters:**
|
|
121
|
+
- `haystack` (String) - The text to search
|
|
122
|
+
|
|
123
|
+
**Returns:** Array<Hash> - Array of hashes with `:pattern`, `:start`, `:end` keys
|
|
124
|
+
|
|
125
|
+
**Examples:**
|
|
126
|
+
|
|
127
|
+
```ruby
|
|
128
|
+
matcher = AhoCorasickRust.new(['fox', 'dog'])
|
|
129
|
+
|
|
130
|
+
matcher.lookup_with_positions('The quick brown fox jumps over the lazy dog.')
|
|
131
|
+
# => [
|
|
132
|
+
# { pattern: 'fox', start: 16, end: 19 },
|
|
133
|
+
# { pattern: 'dog', start: 40, end: 43 }
|
|
134
|
+
# ]
|
|
135
|
+
|
|
136
|
+
matcher.lookup_with_positions('hello world')
|
|
137
|
+
# => []
|
|
138
|
+
|
|
139
|
+
# Positions are byte offsets, not character offsets
|
|
140
|
+
matcher = AhoCorasickRust.new(['数据'])
|
|
141
|
+
matcher.lookup_with_positions('金数据工具')
|
|
142
|
+
# => [{ pattern: '数据', start: 3, end: 9 }] # byte positions
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
---
|
|
146
|
+
|
|
147
|
+
### `#match?(haystack)`
|
|
148
|
+
|
|
149
|
+
Checks if any pattern matches in the haystack (predicate method).
|
|
150
|
+
|
|
151
|
+
**Parameters:**
|
|
152
|
+
- `haystack` (String) - The text to search
|
|
153
|
+
|
|
154
|
+
**Returns:** Boolean - `true` if any pattern matches, `false` otherwise
|
|
155
|
+
|
|
156
|
+
**Examples:**
|
|
157
|
+
|
|
158
|
+
```ruby
|
|
159
|
+
matcher = AhoCorasickRust.new(['foo', 'bar'])
|
|
160
|
+
|
|
161
|
+
matcher.match?('hello foo world')
|
|
162
|
+
# => true
|
|
163
|
+
|
|
164
|
+
matcher.match?('hello world')
|
|
165
|
+
# => false
|
|
166
|
+
|
|
167
|
+
# Works with case-insensitive
|
|
168
|
+
matcher = AhoCorasickRust.new(['Ruby'], case_insensitive: true)
|
|
169
|
+
matcher.match?('I love ruby')
|
|
170
|
+
# => true
|
|
171
|
+
```
|
|
172
|
+
|
|
173
|
+
---
|
|
174
|
+
|
|
175
|
+
### `#find_first(haystack)`
|
|
176
|
+
|
|
177
|
+
Returns the first pattern match found, or `nil` if no match.
|
|
178
|
+
|
|
179
|
+
**Parameters:**
|
|
180
|
+
- `haystack` (String) - The text to search
|
|
181
|
+
|
|
182
|
+
**Returns:** String or nil - First matched pattern, or `nil` if no match
|
|
183
|
+
|
|
184
|
+
**Examples:**
|
|
185
|
+
|
|
186
|
+
```ruby
|
|
187
|
+
matcher = AhoCorasickRust.new(['foo', 'bar', 'baz'])
|
|
188
|
+
|
|
189
|
+
matcher.find_first('hello foo bar baz')
|
|
190
|
+
# => 'foo'
|
|
191
|
+
|
|
192
|
+
matcher.find_first('hello world')
|
|
193
|
+
# => nil
|
|
194
|
+
|
|
195
|
+
# Stops after first match (more efficient than #lookup)
|
|
196
|
+
matcher = AhoCorasickRust.new(['cat', 'dog', 'bird'])
|
|
197
|
+
matcher.find_first('The cat and dog are friends')
|
|
198
|
+
# => 'cat' (stops, doesn't find 'dog')
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
---
|
|
202
|
+
|
|
203
|
+
### `#find_first_with_position(haystack)`
|
|
204
|
+
|
|
205
|
+
Returns the first pattern match with its position, or `nil` if no match.
|
|
206
|
+
|
|
207
|
+
**Parameters:**
|
|
208
|
+
- `haystack` (String) - The text to search
|
|
209
|
+
|
|
210
|
+
**Returns:** Hash or nil - Hash with `:pattern`, `:start`, `:end` keys, or `nil` if no match
|
|
211
|
+
|
|
212
|
+
**Examples:**
|
|
213
|
+
|
|
214
|
+
```ruby
|
|
215
|
+
matcher = AhoCorasickRust.new(['foo', 'bar'])
|
|
216
|
+
|
|
217
|
+
matcher.find_first_with_position('hello foo world')
|
|
218
|
+
# => { pattern: 'foo', start: 6, end: 9 }
|
|
219
|
+
|
|
220
|
+
matcher.find_first_with_position('hello world')
|
|
221
|
+
# => nil
|
|
222
|
+
|
|
223
|
+
# Finds earliest match in text, not first in pattern list
|
|
224
|
+
matcher = AhoCorasickRust.new(['bar', 'foo'])
|
|
225
|
+
matcher.find_first_with_position('foo bar baz')
|
|
226
|
+
# => { pattern: 'foo', start: 0, end: 3 } # 'foo' appears first in text
|
|
227
|
+
```
|
|
228
|
+
|
|
229
|
+
---
|
|
230
|
+
|
|
231
|
+
## Replace Methods
|
|
232
|
+
|
|
233
|
+
### `#replace_all(haystack, replacements)`
|
|
234
|
+
|
|
235
|
+
Replaces all pattern matches with their corresponding replacements.
|
|
236
|
+
|
|
237
|
+
**Parameters:**
|
|
238
|
+
- `haystack` (String) - The text to search
|
|
239
|
+
- `replacements` (Hash or Block) - Replacement mapping or block
|
|
240
|
+
|
|
241
|
+
**Returns:** String - New string with replacements applied
|
|
242
|
+
|
|
243
|
+
**Hash-based replacement:**
|
|
244
|
+
|
|
245
|
+
Maps pattern strings to their replacements. Patterns not in the hash remain unchanged.
|
|
246
|
+
|
|
247
|
+
```ruby
|
|
248
|
+
matcher = AhoCorasickRust.new(['foo', 'bar', 'baz'])
|
|
249
|
+
|
|
250
|
+
matcher.replace_all('foo and bar', { 'foo' => 'FOO', 'bar' => 'BAR' })
|
|
251
|
+
# => 'FOO and BAR'
|
|
252
|
+
|
|
253
|
+
# Partial replacement - 'baz' not in hash, stays unchanged
|
|
254
|
+
matcher.replace_all('foo bar baz', { 'foo' => 'hello' })
|
|
255
|
+
# => 'hello bar baz'
|
|
256
|
+
|
|
257
|
+
# Empty hash - no replacements
|
|
258
|
+
matcher.replace_all('foo and bar', {})
|
|
259
|
+
# => 'foo and bar'
|
|
260
|
+
```
|
|
261
|
+
|
|
262
|
+
**Block-based replacement:**
|
|
263
|
+
|
|
264
|
+
Passes each matched pattern to the block, uses return value as replacement.
|
|
265
|
+
|
|
266
|
+
```ruby
|
|
267
|
+
matcher = AhoCorasickRust.new(['foo', 'bar'])
|
|
268
|
+
|
|
269
|
+
matcher.replace_all('foo and bar') { |match| match.upcase }
|
|
270
|
+
# => 'FOO and BAR'
|
|
271
|
+
|
|
272
|
+
# Dynamic replacement logic
|
|
273
|
+
matcher = AhoCorasickRust.new(['apple', 'banana', 'cherry'])
|
|
274
|
+
text = 'I like apple, banana, and cherry'
|
|
275
|
+
result = matcher.replace_all(text) do |fruit|
|
|
276
|
+
{ 'apple' => '🍎', 'banana' => '🍌', 'cherry' => '🍒' }[fruit]
|
|
277
|
+
end
|
|
278
|
+
# => 'I like 🍎, 🍌, and 🍒'
|
|
279
|
+
|
|
280
|
+
# Replacement length can differ from match
|
|
281
|
+
matcher = AhoCorasickRust.new(['a', 'bb'])
|
|
282
|
+
matcher.replace_all('a bb a bb') { |m| m.length.to_s }
|
|
283
|
+
# => '1 2 1 2'
|
|
284
|
+
```
|
|
285
|
+
|
|
286
|
+
**Raises:**
|
|
287
|
+
- `ArgumentError` - If replacements is neither a Hash nor a block given
|
|
288
|
+
|
|
289
|
+
---
|
|
290
|
+
|
|
291
|
+
## Usage Patterns
|
|
292
|
+
|
|
293
|
+
### Content Filtering
|
|
294
|
+
|
|
295
|
+
```ruby
|
|
296
|
+
# Filter profanity with asterisks
|
|
297
|
+
bad_words = ['bad', 'worse', 'worst']
|
|
298
|
+
filter = AhoCorasickRust.new(bad_words, case_insensitive: true)
|
|
299
|
+
|
|
300
|
+
filter.replace_all('This is bad and worse') { |word| '*' * word.length }
|
|
301
|
+
# => 'This is *** and *****'
|
|
302
|
+
```
|
|
303
|
+
|
|
304
|
+
### Keyword Highlighting
|
|
305
|
+
|
|
306
|
+
```ruby
|
|
307
|
+
keywords = ['Ruby', 'Python', 'JavaScript']
|
|
308
|
+
matcher = AhoCorasickRust.new(keywords)
|
|
309
|
+
|
|
310
|
+
positions = matcher.lookup_with_positions('I love Ruby and Python')
|
|
311
|
+
# Use positions to add HTML tags, syntax highlighting, etc.
|
|
312
|
+
```
|
|
313
|
+
|
|
314
|
+
### Quick Existence Check
|
|
315
|
+
|
|
316
|
+
```ruby
|
|
317
|
+
# Check if any banned word appears
|
|
318
|
+
banned = ['spam', 'scam', 'fraud']
|
|
319
|
+
checker = AhoCorasickRust.new(banned, case_insensitive: true)
|
|
320
|
+
|
|
321
|
+
if checker.match?(user_input)
|
|
322
|
+
reject_message
|
|
323
|
+
end
|
|
324
|
+
```
|
|
325
|
+
|
|
326
|
+
### DNA Sequence Analysis
|
|
327
|
+
|
|
328
|
+
```ruby
|
|
329
|
+
# Find all overlapping genetic markers
|
|
330
|
+
markers = ['ATCG', 'TCGA', 'CGAT']
|
|
331
|
+
analyzer = AhoCorasickRust.new(markers)
|
|
332
|
+
|
|
333
|
+
sequence = 'ATCGAT'
|
|
334
|
+
analyzer.lookup_overlapping(sequence)
|
|
335
|
+
# => ['ATCG', 'TCGA', 'CGAT'] # all overlapping matches
|
|
336
|
+
```
|
|
337
|
+
|
|
338
|
+
### Tokenization
|
|
339
|
+
|
|
340
|
+
```ruby
|
|
341
|
+
# Prefer longest matches for tokens
|
|
342
|
+
keywords = ['if', 'iffy', 'then', 'end', 'endif']
|
|
343
|
+
tokenizer = AhoCorasickRust.new(keywords, match_kind: :leftmost_longest)
|
|
344
|
+
|
|
345
|
+
tokenizer.lookup('iffy then endif')
|
|
346
|
+
# => ['iffy', 'then', 'endif'] # chooses longer 'iffy' over 'if'
|
|
347
|
+
```
|
|
348
|
+
|
|
349
|
+
---
|
|
350
|
+
|
|
351
|
+
## Type Compatibility
|
|
352
|
+
|
|
353
|
+
### Accepted Types
|
|
354
|
+
|
|
355
|
+
- **Patterns:** Array of String objects
|
|
356
|
+
- **Haystack:** String objects
|
|
357
|
+
- **Options:** Symbol keys (`:case_insensitive`, `:match_kind`)
|
|
358
|
+
- **Replacements:** Hash with String keys/values, or Block returning String
|
|
359
|
+
|
|
360
|
+
### Unicode Support
|
|
361
|
+
|
|
362
|
+
All methods support UTF-8 encoded strings:
|
|
363
|
+
|
|
364
|
+
```ruby
|
|
365
|
+
matcher = AhoCorasickRust.new(['こんにちは', '世界'])
|
|
366
|
+
matcher.lookup('こんにちは世界')
|
|
367
|
+
# => ['こんにちは', '世界']
|
|
368
|
+
|
|
369
|
+
# Emoji support
|
|
370
|
+
matcher = AhoCorasickRust.new(['😊', '🎉'])
|
|
371
|
+
matcher.lookup('I am 😊 today 🎉')
|
|
372
|
+
# => ['😊', '🎉']
|
|
373
|
+
```
|
|
374
|
+
|
|
375
|
+
**Note:** Position values in `#lookup_with_positions` and `#find_first_with_position` are byte offsets, not character offsets. For multi-byte UTF-8 characters, byte positions will differ from character positions.
|
|
376
|
+
|
|
377
|
+
---
|
|
378
|
+
|
|
379
|
+
## Error Handling
|
|
380
|
+
|
|
381
|
+
```ruby
|
|
382
|
+
# TypeError: patterns must be array of strings
|
|
383
|
+
AhoCorasickRust.new('not an array')
|
|
384
|
+
# TypeError: wrong argument type String (expected Array)
|
|
385
|
+
|
|
386
|
+
AhoCorasickRust.new(['foo', 123])
|
|
387
|
+
# TypeError: wrong argument type Integer (expected String)
|
|
388
|
+
|
|
389
|
+
# ArgumentError: invalid match_kind
|
|
390
|
+
AhoCorasickRust.new(['foo'], match_kind: :invalid)
|
|
391
|
+
# ArgumentError: Invalid match_kind: 'invalid'...
|
|
392
|
+
|
|
393
|
+
# TypeError: haystack must be string
|
|
394
|
+
matcher.lookup(123)
|
|
395
|
+
# TypeError: wrong argument type Integer (expected String)
|
|
396
|
+
```
|
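reference.md notes that `:start` and `:end` are byte offsets, not character offsets. When character indices are needed (for example, to slice with `String#[]`), the conversion might look like the sketch below. It is pure Ruby and independent of the gem; `byte_span_to_char_span` is a hypothetical helper name, and it assumes both offsets fall on character boundaries of a validly encoded string.

```ruby
# Hypothetical helper: converts a byte span into a character span.
# Assumes byte_start and byte_end lie on character boundaries.
def byte_span_to_char_span(text, byte_start, byte_end)
  # Characters before the match determine the character start index.
  char_start = text.byteslice(0, byte_start).length
  # Characters inside the match determine the span length.
  char_len = text.byteslice(byte_start, byte_end - byte_start).length
  [char_start, char_start + char_len]
end

text = '金数据工具'
# Byte span taken from the reference.md example: start 3, end 9.
char_start, char_end = byte_span_to_char_span(text, 3, 9)
matched = text[char_start...char_end]
```

Each of these CJK characters occupies three bytes in UTF-8, so the byte span 3...9 corresponds to the character span 1...3.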
data/ext/rahocorasick/src/lib.rs
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
|
-
use aho_corasick::AhoCorasick;
|
|
2
|
-
use magnus::{method, function, prelude::*, Error, Ruby};
|
|
1
|
+
use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};
|
|
2
|
+
use magnus::{method, function, prelude::*, Error, Ruby, RHash, RArray, Value, Symbol};
|
|
3
3
|
|
|
4
4
|
#[magnus::wrap(class = "AhoCorasickRust")]
|
|
5
5
|
pub struct AhoCorasickRust {
|
|
@@ -8,12 +8,56 @@ pub struct AhoCorasickRust {
|
|
|
8
8
|
}
|
|
9
9
|
|
|
10
10
|
impl AhoCorasickRust {
|
|
11
|
-
fn
|
|
12
|
-
let
|
|
11
|
+
fn new_impl(ruby: &Ruby, words: Vec<String>, kwargs: Option<RHash>) -> Result<Self, Error> {
|
|
12
|
+
let mut builder = AhoCorasickBuilder::new();
|
|
13
|
+
|
|
14
|
+
// Check for options if kwargs provided
|
|
15
|
+
if let Some(kwargs) = kwargs {
|
|
16
|
+
// case_insensitive option
|
|
17
|
+
if let Some(val) = kwargs.get(ruby.to_symbol("case_insensitive")) {
|
|
18
|
+
if let Ok(case_insensitive) = bool::try_convert(val) {
|
|
19
|
+
if case_insensitive {
|
|
20
|
+
builder.ascii_case_insensitive(true);
|
|
21
|
+
}
|
|
22
|
+
}
|
|
23
|
+
}
|
|
24
|
+
|
|
25
|
+
// match_kind option
|
|
26
|
+
if let Some(val) = kwargs.get(ruby.to_symbol("match_kind")) {
|
|
27
|
+
if let Some(sym) = Symbol::from_value(val) {
|
|
28
|
+
let kind_str = sym.name()?.to_string();
|
|
29
|
+
let match_kind = match kind_str.as_ref() {
|
|
30
|
+
"standard" => MatchKind::Standard,
|
|
31
|
+
"leftmost_first" => MatchKind::LeftmostFirst,
|
|
32
|
+
"leftmost_longest" => MatchKind::LeftmostLongest,
|
|
33
|
+
_ => {
|
|
34
|
+
return Err(Error::new(
|
|
35
|
+
ruby.exception_arg_error(),
|
|
36
|
+
format!("Invalid match_kind: '{}'. Valid values are :standard, :leftmost_first, :leftmost_longest", kind_str)
|
|
37
|
+
));
|
|
38
|
+
}
|
|
39
|
+
};
|
|
40
|
+
builder.match_kind(match_kind);
|
|
41
|
+
}
|
|
42
|
+
}
|
|
43
|
+
}
|
|
44
|
+
|
|
45
|
+
let ac = builder.build(&words)
|
|
13
46
|
.map_err(|e| Error::new(ruby.exception_runtime_error(), format!("Failed to build automaton: {}", e)))?;
|
|
14
47
|
Ok(Self { words, ac })
|
|
15
48
|
}
|
|
16
49
|
|
|
50
|
+
fn new(ruby: &Ruby, args: &[Value]) -> Result<Self, Error> {
|
|
51
|
+
let args = magnus::scan_args::scan_args::<(Vec<String>,), (), (), (), RHash, ()>(args)?;
|
|
52
|
+
let (words,) = args.required;
|
|
53
|
+
let kwargs = args.keywords;
|
|
54
|
+
|
|
55
|
+
// Only pass kwargs if non-empty
|
|
56
|
+
let kwargs_opt = if kwargs.len() > 0 { Some(kwargs) } else { None };
|
|
57
|
+
|
|
58
|
+
Self::new_impl(ruby, words, kwargs_opt)
|
|
59
|
+
}
|
|
60
|
+
|
|
17
61
|
fn lookup(&self, haystack: String) -> Vec<String> {
|
|
18
62
|
let mut matches = vec![];
|
|
19
63
|
for mat in self.ac.find_iter(&haystack) {
|
|
@@ -21,12 +65,127 @@ impl AhoCorasickRust {
|
|
|
21
65
|
}
|
|
22
66
|
matches
|
|
23
67
|
}
|
|
68
|
+
|
|
69
|
+
fn is_match(&self, haystack: String) -> bool {
|
|
70
|
+
self.ac.is_match(&haystack)
|
|
71
|
+
}
|
|
72
|
+
|
|
73
|
+
fn lookup_overlapping(&self, haystack: String) -> Vec<String> {
|
|
74
|
+
let mut matches = vec![];
|
|
75
|
+
for mat in self.ac.find_overlapping_iter(&haystack) {
|
|
76
|
+
matches.push(self.words[mat.pattern()].clone());
|
|
77
|
+
}
|
|
78
|
+
matches
|
|
79
|
+
}
|
|
80
|
+
|
|
81
|
+
fn find_first(&self, haystack: String) -> Option<String> {
|
|
82
|
+
self.ac.find(&haystack).map(|mat| self.words[mat.pattern()].clone())
|
|
83
|
+
}
|
|
84
|
+
|
|
85
|
+
fn find_first_with_position(&self, haystack: String) -> Result<Option<RHash>, Error> {
|
|
86
|
+
let ruby = Ruby::get().unwrap();
|
|
87
|
+
if let Some(mat) = self.ac.find(&haystack) {
|
|
88
|
+
let hash = ruby.hash_new();
|
|
89
|
+
hash.aset(ruby.to_symbol("pattern"), self.words[mat.pattern()].clone())?;
|
|
90
|
+
hash.aset(ruby.to_symbol("start"), mat.start())?;
|
|
91
|
+
hash.aset(ruby.to_symbol("end"), mat.end())?;
|
|
92
|
+
Ok(Some(hash))
|
|
93
|
+
} else {
|
|
94
|
+
Ok(None)
|
|
95
|
+
}
|
|
96
|
+
}
|
|
97
|
+
|
|
98
|
+
fn lookup_with_positions(&self, haystack: String) -> Result<RArray, Error> {
|
|
99
|
+
let ruby = Ruby::get().unwrap();
|
|
100
|
+
let matches = ruby.ary_new();
|
|
101
|
+
for mat in self.ac.find_iter(&haystack) {
|
|
102
|
+
let hash = ruby.hash_new();
|
|
103
|
+
hash.aset(ruby.to_symbol("pattern"), self.words[mat.pattern()].clone())?;
|
|
104
|
+
hash.aset(ruby.to_symbol("start"), mat.start())?;
|
|
105
|
+
hash.aset(ruby.to_symbol("end"), mat.end())?;
|
|
106
|
+
matches.push(hash)?;
|
|
107
|
+
}
|
|
108
|
+
Ok(matches)
|
|
109
|
+
}
|
|
110
|
+
|
+    fn replace_all(&self, args: &[Value]) -> Result<String, Error> {
+        let ruby = Ruby::get().unwrap();
+
+        // Parse arguments: haystack (required), replacements (optional if block given)
+        let haystack: String = if args.is_empty() {
+            return Err(Error::new(
+                ruby.exception_arg_error(),
+                "wrong number of arguments (given 0, expected 1..2)"
+            ));
+        } else {
+            String::try_convert(args[0])?
+        };
+
+        // Check if a block was given
+        match ruby.block_proc() {
+            Ok(proc) => {
+                // Block-based replacement
+                let mut result = haystack.clone();
+                let mut offset: isize = 0;
+
+                for mat in self.ac.find_iter(&haystack) {
+                    let pattern = &self.words[mat.pattern()];
+                    let replacement: String = proc.call((pattern.clone(),))?;
+
+                    let start = (mat.start() as isize + offset) as usize;
+                    let end = (mat.end() as isize + offset) as usize;
+
+                    result.replace_range(start..end, &replacement);
+                    offset += replacement.len() as isize - (mat.end() - mat.start()) as isize;
+                }
+
+                Ok(result)
+            }
+            Err(_) if args.len() >= 2 => {
+                // Hash-based replacement
+                if let Some(hash) = RHash::from_value(args[1]) {
+                    let mut replace_with: Vec<String> = Vec::with_capacity(self.words.len());
+
+                    for word in &self.words {
+                        if let Some(val) = hash.get(word.clone()) {
+                            if let Ok(replacement) = String::try_convert(val) {
+                                replace_with.push(replacement);
+                            } else {
+                                replace_with.push(word.clone());
+                            }
+                        } else {
+                            replace_with.push(word.clone());
+                        }
+                    }
+
+                    Ok(self.ac.replace_all(&haystack, &replace_with))
+                } else {
+                    Err(Error::new(
+                        ruby.exception_arg_error(),
+                        "replace_all requires a Hash or block"
+                    ))
+                }
+            }
+            Err(_) => {
+                Err(Error::new(
+                    ruby.exception_arg_error(),
+                    "replace_all requires a Hash or block"
+                ))
+            }
+        }
+    }
 }
 
 #[magnus::init]
 fn main(ruby: &Ruby) -> Result<(), Error> {
     let class = ruby.define_class("AhoCorasickRust", ruby.class_object())?;
-    class.define_singleton_method("new", function!(AhoCorasickRust::new, 1))?;
+    class.define_singleton_method("new", function!(AhoCorasickRust::new, -1))?;
     class.define_method("lookup", method!(AhoCorasickRust::lookup, 1))?;
+    class.define_method("match?", method!(AhoCorasickRust::is_match, 1))?;
+    class.define_method("lookup_overlapping", method!(AhoCorasickRust::lookup_overlapping, 1))?;
+    class.define_method("find_first", method!(AhoCorasickRust::find_first, 1))?;
+    class.define_method("find_first_with_position", method!(AhoCorasickRust::find_first_with_position, 1))?;
+    class.define_method("lookup_with_positions", method!(AhoCorasickRust::lookup_with_positions, 1))?;
+    class.define_method("replace_all", method!(AhoCorasickRust::replace_all, -1))?;
     Ok(())
 }
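The block-based branch of `replace_all` above edits a copy of the haystack in place and keeps a running `offset` so that later match positions still line up after earlier replacements change the string's length. The following is a minimal, standard-library-only sketch of that bookkeeping, assuming the matches arrive left to right and do not overlap. `Span` and `replace_spans` are hypothetical names used for illustration; they are not part of the gem or of the aho-corasick crate.

```rust
/// A byte-offset match. Hypothetical stand-in for the crate's match type,
/// carrying only the fields this sketch needs.
struct Span {
    pattern: usize, // index of the matched pattern
    start: usize,   // byte offset where the match begins
    end: usize,     // byte offset one past the match's last byte
}

/// Replace each span in `haystack` with the string returned by
/// `replacement_for(pattern_index)`.
/// Assumes `spans` is sorted by `start`, non-overlapping, and that every
/// bound falls on a UTF-8 character boundary.
fn replace_spans<F>(haystack: &str, spans: &[Span], mut replacement_for: F) -> String
where
    F: FnMut(usize) -> String,
{
    let mut result = haystack.to_string();
    // Net change in byte length caused by the replacements applied so far.
    let mut offset: isize = 0;

    for span in spans {
        let replacement = replacement_for(span.pattern);

        // Shift the original positions by the accumulated length change.
        let start = (span.start as isize + offset) as usize;
        let end = (span.end as isize + offset) as usize;

        result.replace_range(start..end, &replacement);
        offset += replacement.len() as isize - (span.end - span.start) as isize;
    }

    result
}

/// Small usage example: double and upcase each matched word.
fn example() -> String {
    let words = ["cat", "dog"];
    let spans = [
        Span { pattern: 0, start: 2, end: 5 },
        Span { pattern: 1, start: 12, end: 15 },
    ];
    // Returns "a CATCAT and a DOGDOG"
    replace_spans("a cat and a dog", &spans, |p| words[p].to_uppercase().repeat(2))
}
```

The positions here are byte offsets, as they are for `start()` and `end()` on an aho-corasick match, and `String::replace_range` panics if a bound does not fall on a character boundary. For multibyte input such as emoji, these offsets can therefore differ from Ruby character indices.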
metadata
CHANGED
@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: ahocorasick-rust
 version: !ruby/object:Gem::Version
-  version:
+  version: 2.0.0
 platform: ruby
 authors:
 - Eric
-autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 1980-01-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rb_sys
@@ -26,8 +25,9 @@ dependencies:
         version: 0.9.117
 description: A Ruby gem wrapping the legendary Rust Aho-Corasick algorithm! Aho-Corasick
   is a powerful string searching algorithm that finds multiple patterns simultaneously
-  in a text.
-
+  in a text. Features include overlapping matches, case-insensitive search, find &
+  replace, match positions, and configurable match strategies. Perfect for content
+  filtering, tokenization, and multi-pattern search at lightning speed! (ノ◕ヮ◕)ノ*:・゚✧
 email:
 - eric@ebj.dev
 executables: []
@@ -37,6 +37,8 @@ extra_rdoc_files: []
 files:
 - README.md
 - Rakefile
+- docs/match_kind.md
+- docs/reference.md
 - ext/rahocorasick/Cargo.lock
 - ext/rahocorasick/Cargo.toml
 - ext/rahocorasick/extconf.rb
@@ -46,7 +48,6 @@ homepage: https://github.com/jetpks/ahocorasick-rust-ruby
 licenses:
 - MIT
 metadata: {}
-post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -61,8 +62,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.6.9
 specification_version: 4
 summary: Blazing-fast ✨ Ruby wrapper for the Rust Aho-Corasick string matching algorithm!
 test_files: []