list_matcher 1.0.0
- checksums.yaml +7 -0
- data/.gitignore +14 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +22 -0
- data/README.md +444 -0
- data/Rakefile +9 -0
- data/examples/date_grammar.rb +49 -0
- data/lib/list_matcher/version.rb +3 -0
- data/lib/list_matcher.rb +729 -0
- data/list_matcher.gemspec +23 -0
- data/test/basic_test.rb +248 -0
- data/test/benchmarks.rb +149 -0
- data/test/stress.rb +44 -0
- metadata +87 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: a4f9013c830b1d516bbf895732abce4ee0fd327f
  data.tar.gz: a6f49385a4dc17ffe42fe706bfce7e2cef2e4135
SHA512:
  metadata.gz: d97d5937836d59422ec61c8f4daa1e859ea8b62aa8598f74d875b7dbde4426a6d24368cd94ed0936baedb5dde357cfc10492ab173fe6f7e0b67748856aabf7f8
  data.tar.gz: 343cbde71b7cb693b4dc67965a5334a5cdc05ebcf34d103bfd0be21633abd80aa6950c6718360d0a05fbea7a55ff3e553edcfdaa7d06f0777773a9e623a162a6
data/.gitignore
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
Copyright (c) 2015 dfhoughton

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,444 @@
# ListMatcher

For creating compact, non-backtracking regular expressions from a list of strings.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'list_matcher'
```

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install list_matcher

## Usage

```ruby
require 'list_matcher'

m = List::Matcher.new
puts m.pattern %w( cat dog )                    # (?:cat|dog)
puts m.pattern %w( cat rat )                    # (?:[cr]at)
puts m.pattern %w( cat camel )                  # (?:ca(?:mel|t))
puts m.pattern %w( cat flat sprat )             # (?:(?:c|fl|spr)at)
puts m.pattern %w( catttttttttt )               # (?:cat{10})
puts m.pattern %w( cat-t-t-t-t-t-t-t-t-t )      # (?:ca(?:t-){9}t)
puts m.pattern %w( catttttttttt batttttttttt )  # (?:[bc]at{10})
puts m.pattern %w( cad bad dad )                # (?:[b-d]ad)
puts m.pattern %w( cat catalog )                # (?:cat(?:alog)?+)
puts m.pattern (1..31).to_a                     # (?:[4-9]|1\d?+|2\d?+|3[01]?+)
```

## Description

`List::Matcher` facilitates generating efficient regexen programmatically. This is useful, for example, when looking for
occurrences of particular words or phrases in free-form text. `List::Matcher` will automatically generate regular expressions
that minimize backtracking, so they tend to be as fast as one could hope a regular expression to be. (The general strategy is
to represent the items in the list as a trie.)

`List::Matcher` has many options and the initialization of a matcher for pattern generation is somewhat complex, so various methods
are provided to minimize initializations and the number of times you specify options. For one-off patterns, you may as well call
class methods, either `pattern`, which generates a string, or `rx`, which returns a `Regexp` object:

```ruby
List::Matcher.pattern %w( cat dog ) # "(?:cat|dog)"
List::Matcher.rx %w( cat dog )      # /(?:cat|dog)/
```

If you plan to generate multiple regexen, or have complicated options which you always use, you should generate a configured
instance first:

```ruby
m = List::Matcher.new normalize_whitespace: true, bound: true, case_insensitive: true, multiline: true, atomic: false, symbols: { num: '\d++' }
m.pattern method_that_gets_a_long_list
m.rx method_that_gets_a_long_list
...
```

If you have a basic set of options and you need to modify these in particular cases, you can:

```ruby
m.pattern list, case_insensitive: false
```

You can also generate a prototype list matcher with a particular variation and bud off children with their own properties:

```ruby
m = List::Matcher.new normalize_whitespace: true, bound: true, case_insensitive: true, multiline: true, atomic: false, symbols: { num: '\d++' }
m2 = m.bud case_insensitive: false
```

Basically, you can mix in options in whatever way suits you. Constructing configured instances gives you a tiny bit of efficiency, but
mostly it saves you from specifying these options in multiple places.
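
To make the trie strategy mentioned at the start of this description more concrete, here is a deliberately simplified sketch. It is not `List::Matcher`'s actual implementation: `build_trie` and `trie_to_pattern` are hypothetical helpers, and the sketch ignores character classes, repetition counts, and possessive quantifiers. It only shows how shared prefixes collapse, so that alternatives at any branch point begin with different characters.

```ruby
# Simplified illustration only; these helpers are NOT part of List::Matcher.
END_MARK = :end # marks that a complete word ends at this node

# Build a nested-hash trie: each key is a character, each value a child node.
def build_trie(words)
  words.each_with_object({}) do |word, root|
    node = root
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[END_MARK] = true
  end
end

# Turn a trie node into a regex fragment covering every suffix below it.
def trie_to_pattern(node)
  branches = node.reject { |key, _| key == END_MARK }
                 .sort_by { |ch, _| ch }
                 .map { |ch, child| Regexp.escape(ch) + trie_to_pattern(child) }
  return '' if branches.empty?

  body = branches.join('|')
  if node[END_MARK]
    "(?:#{body})?"  # a word may also stop here, so the rest is optional
  elsif branches.size > 1
    "(?:#{body})"   # several continuations need a group
  else
    body            # a single continuation needs no group
  end
end

pattern = trie_to_pattern(build_trie(%w[cat camel dog]))
puts pattern # (?:ca(?:mel|t)|dog)
```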

## Options

The options one can provide to `new`, `bud`, `pattern`, or `rx` are all the same. These are:

### atomic

```ruby
default: true
```

If true, the returned expression is always wrapped in some grouping expression -- `(?:...)`, `(?>...)`, `(?i:...)`, etc.; whatever
is appropriate given the other options and defaults -- so it can receive a quantification suffix.

```ruby
List::Matcher.pattern %w(cat dog), atomic: false # "cat|dog"
List::Matcher.pattern %w(cat dog), atomic: true  # "(?:cat|dog)"
```
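The group matters because a quantifier binds to the atom immediately before it, not to a whole alternation. The following sketch uses only Ruby's built-in `Regexp` and the two strings from the example above; the variable names are chosen purely for illustration.

```ruby
bare    = 'cat|dog'      # what `atomic: false` returned above
grouped = '(?:cat|dog)'  # what `atomic: true` returned above

# Appending {2} to the bare alternation quantifies only the final "g".
loose = Regexp.new("\\A#{bare}{2}\\z")     # \Acat|dog{2}\z
tight = Regexp.new("\\A#{grouped}{2}\\z")  # \A(?:cat|dog){2}\z

puts tight.match?('catdog') # true
puts tight.match?('dogg')   # false
puts loose.match?('dogg')   # true: "dog{2}" means d, o, g, g
```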

### backtracking

```ruby
default: true
```

If true, the default non-capturing grouping expression is `(?:...)` rather than `(?>...)`, and the optional quantifier is
`?` rather than `?+`.

```ruby
List::Matcher.pattern %w( cat dog )                      # "(?:cat|dog)"
List::Matcher.pattern %w( cat dog ), backtracking: false # "(?>cat|dog)"
```
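To see what the two grouping forms and the two optional quantifiers actually do, here is a small sketch using hand-written regexes in plain Ruby (these are not patterns generated by `List::Matcher`). An atomic group or possessive quantifier commits to its first successful match and never gives characters back.

```ruby
text = 'catalogs'

backtracking = /\A(?:cat|catalog)s\z/ # may retry "catalog" after "cat" fails to reach "s"
atomic       = /\A(?>cat|catalog)s\z/ # commits to "cat" and never retries

puts backtracking.match?(text) # true
puts atomic.match?(text)       # false

# The possessive optional quantifier behaves analogously.
puts(/\Acat(?:alog)?alog\z/.match?('catalog'))  # true: the optional group gives "alog" back
puts(/\Acat(?:alog)?+alog\z/.match?('catalog')) # false: the group keeps "alog"
```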

### bound

```ruby
default: false
```

Whether boundary expressions should be attached to the margins of every expression in the list. If this value is simply `true`,
each item's marginal characters, the first and the last, are tested to see whether they are word characters and, if so, the word
boundary symbol, `\b`, is attached to them where appropriate. There are several variants on this, however:

```ruby
bound: :word
```

This is the same as `bound: true`.

```ruby
List::Matcher.pattern %w(cat), bound: :word # "(?:\\bcat\\b)"
List::Matcher.pattern %w(cat), bound: true  # "(?:\\bcat\\b)"
```

```ruby
bound: :line
```

Each item should take up an entire line, so the boundary symbols are `^` and `$`.

```ruby
List::Matcher.pattern %w(cat), bound: :line # "(?:^cat$)"
```

```ruby
bound: :string
```

Each item should match the entire string compared against, so the boundary symbols are `\A` and `\z`.

```ruby
List::Matcher.pattern %w(cat), bound: :string # "(?:\\Acat\\z)"
```

```ruby
bound: { test: /\d/, left: '(?<!\d)', right: '(?!\d)'}
```

If you have an ad hoc boundary definition -- here it is a digit/non-digit boundary -- you may specify it so. The `:test` parameter
identifies marginal characters that require the boundary tests and the `:left` and `:right` symbols identify the boundary conditions.

```ruby
List::Matcher.pattern (1...1000).to_a, bound: { test: /\d/, left: '(?<!\d)', right: '(?!\d)'}
# "(?:(?<!\\d)[1-9](?:\\d\\d?)?(?!\\d))"
```
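The difference between the digit boundary above and the ordinary word boundary is easiest to see with a hand-written regex. This sketch uses plain Ruby and the same lookaround snippets as the `:left` and `:right` values above; the sample strings are made up for illustration.

```ruby
digit_bound = /(?<!\d)42(?!\d)/ # no digit immediately before or after
word_bound  = /\b42\b/

puts digit_bound.match?('room 42') # true
puts digit_bound.match?('a42b')    # true: letters are not digits
puts digit_bound.match?('1420')    # false: digits on both sides
puts word_bound.match?('a42b')     # false: \b treats letters and digits alike
```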

### strip

```ruby
default: false
```

Strip whitespace off the margins of items in the list.

```ruby
List::Matcher.pattern ['     cat     ']              # "(?:(?:\\ ){5}cat(?:\\ ){5})"
List::Matcher.pattern ['     cat     '], strip: true # "(?:cat)"
```

### case_insensitive

```ruby
default: false
```

Generate a case-insensitive regular expression.

```ruby
List::Matcher.pattern %w( Cat cat CAT )                         # "(?:C(?:AT|at)|cat)"
List::Matcher.pattern %w( Cat cat CAT ), case_insensitive: true # "(?i:cat)"
```

### multiline

```ruby
default: false
```

Generate a multi-line regex.

```ruby
List::Matcher.pattern %w(cat), multiline: true # "(?m:cat)"
```

The special feature of a multi-line regular expression is that `.` can grab newline characters. Because `List::Matcher`
never produces `.` on its own, this option is only useful in conjunction with the `symbols` option, which lets one
inject snippets of regex into the one generated.
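
The effect of the multi-line flag on `.` can be checked with plain Ruby regexes. This is a sketch with hand-written patterns and a made-up sample string, not output from `List::Matcher`.

```ruby
text = "cat\ndog"

puts(/cat.dog/.match?(text))      # false: "." does not match a newline by default
puts(/cat.dog/m.match?(text))     # true: the m flag lets "." match a newline
puts(/(?m:cat.dog)/.match?(text)) # true: the same flag, applied inline to the group
```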

### normalize_whitespace

```ruby
default: false
```

This strips whitespace from items in the list and treats all internal whitespace as equivalent.

```ruby
List::Matcher.pattern [ ' cat  walker ', '  dog walker', 'camel  walker' ]
# "(?:\\ (?:\\ dog\\ walker|cat\\ \\ walker\\ )|camel\\ \\ walker)"
List::Matcher.pattern [ ' cat  walker ', '  dog walker', 'camel  walker' ], normalize_whitespace: true
# "(?:(?:ca(?:mel|t)|dog)\\s++walker)"
```
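As a quick check of what the normalized pattern accepts, the sketch below feeds the second pattern string shown above to Ruby's `Regexp` directly, anchored with `\A` and `\z` for the test. The sample strings are made up for illustration.

```ruby
rx = Regexp.new('\A(?:(?:ca(?:mel|t)|dog)\s++walker)\z')

# Any run of whitespace between the words is accepted.
["cat walker", "dog  \t walker", "camel\nwalker"].each do |candidate|
  puts rx.match?(candidate) # true
end

puts rx.match?('catwalker') # false: at least one whitespace character is required
```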

### symbols

You can tell `List::Matcher` that certain character sequences should be regarded as "symbols". It will then leave
these unmolested, replacing them in the generated regex with whatever you map the symbol sequences to. The keys in
the symbol hash are expected to be strings, symbols, or `Regexps`. Symbol keys are converted to their
sequence by stringification. `Regexp` keys convert any sequence they match.

```ruby
List::Matcher.pattern [ 'Catch 22', '1984', 'Fahrenheit 451' ], symbols: { /\d+/ => '\d++' }
# "(?:(?:(?:Catch|Fahrenheit)\\ )?\\d++)"
List::Matcher.pattern [ 'Catch foo', 'foo', 'Fahrenheit foo' ], symbols: { 'foo' => '\d++' }
# "(?:(?:(?:Catch|Fahrenheit)\\ )?\\d++)"
List::Matcher.pattern [ 'Catch foo', 'foo', 'Fahrenheit foo' ], symbols: { foo: '\d++' }
# "(?:(?:(?:Catch|Fahrenheit)\\ )?\\d++)"
```

Because it is possible for symbol sequences to overlap, sequences with string or symbol keys are evaluated before `Regexps`, and longer keys are
evaluated before shorter ones.
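
The evaluation order just described can be sketched as a sort. This is only an illustration: `order_symbol_keys` is a hypothetical helper, not part of `List::Matcher`, and measuring a `Regexp` key's length by its source text is an assumption made here for the sake of the example.

```ruby
# Hypothetical helper: literal (string or symbol) keys first, then Regexps,
# with longer keys ahead of shorter ones inside each group.
def order_symbol_keys(keys)
  text_of = ->(key) { key.is_a?(Regexp) ? key.source : key.to_s }
  literals, regexps = keys.partition { |key| !key.is_a?(Regexp) }
  longest_first = ->(group) { group.sort_by { |key| -text_of.call(key).length } }
  longest_first.call(literals) + longest_first.call(regexps)
end

ordered = order_symbol_keys([/\d+/, :num, 'number', /[a-z]+\d/])
p ordered # ["number", :num, /[a-z]+\d/, /\d+/]
```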

### name

If you assign your pattern a name, it will be constructed with a named group such that you can extract
the substring matched.

```ruby
List::Matcher.pattern %w(cat), name: :cat # "(?<cat>cat)"
```

This is mostly useful if you are using `List::Matcher` to compose complex regexen incrementally. E.g., from the examples directory,

```ruby
require 'list_matcher'

m = List::Matcher.new atomic: false, bound: true

year = m.pattern( (1901..2000).to_a, name: :year )
mday = m.pattern( (1..31).to_a, name: :mday )
weekdays = %w( Monday Tuesday Wednesday Thursday Friday Saturday Sunday )
weekdays += weekdays.map{ |w| w[0...3] }
wday = m.pattern weekdays, case_insensitive: true, name: :wday
months = %w( January February March April May June July August September October November December )
months += months.map{ |w| w[0...3] }
mo = m.pattern months, case_insensitive: true, name: :mo

date_20th_century = m.rx(
  [
    'wday, mo mday',
    'wday, mo mday year',
    'mo mday, year',
    'mo year',
    'mday mo year',
    'wday',
    'year',
    'mday mo',
    'mo mday',
    'mo mday year'
  ],
  normalize_whitespace: true,
  atomic: true,
  symbols: {
    year: year,
    mday: mday,
    wday: wday,
    mo: mo
  }
)

[
  'Friday',
  'August 27',
  'May 6, 1969',
  '1 Jan 2000',
  'this is not actually a date'
].each do |candidate|
  if m = date_20th_century.match(candidate)
    puts "candidate: #{candidate}; year: #{m[:year]}; month: #{m[:mo]}; weekday: #{m[:wday]}; day of the month: #{m[:mday]}"
  else
    puts "#{candidate} does not look like a plausible date in the 20th century"
  end
end
```
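For readers unfamiliar with named groups, the extraction step relies on nothing more than Ruby's standard `MatchData`. The sketch below builds a regex from the pattern string shown for `name: :cat` above; the sample sentence is made up.

```ruby
rx = Regexp.new('(?<cat>cat)')
match = rx.match('the cat sat')

puts match[:cat]       # cat
puts match.begin(:cat) # 4, the offset where the named group starts
puts rx.names.inspect  # ["cat"]
```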

### vet

```ruby
default: false
```

If true, all patterns associated with symbols will be tested upon initialization to make sure they will
create legitimate regular expressions. If you are prone to doing this, for example:

```ruby
List::Matcher.new symbols: { aw_nuts: '+++' }
```

then you may want to vet your symbols. Vetting is not done by default because one assumes you've worked out
your substitutions on your own time and we need not waste runtime checking them.
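
If you would rather check your substitutions yourself, the idea can be sketched in a few lines of plain Ruby. `invalid_symbol_patterns` is a hypothetical helper written for this illustration, not part of `List::Matcher`, and it may differ from what the `vet` option does internally: it simply tries to compile each snippet and collects the failures.

```ruby
# Hypothetical helper: returns a hash of symbol name => error message
# for every snippet that Ruby cannot compile into a Regexp.
def invalid_symbol_patterns(symbols)
  symbols.each_with_object({}) do |(name, pattern), errors|
    begin
      Regexp.new(pattern)
    rescue RegexpError => e
      errors[name] = e.message
    end
  end
end

problems = invalid_symbol_patterns(num: '\d++', aw_nuts: '+++')
p problems.keys # [:aw_nuts]
```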

## Benchmarks

Efficiency isn't the principal purpose of `List::Matcher`, but in almost all cases `List::Matcher`
regular expressions are more efficient than a regular expression generated by simply joining alternates
with `|`. The following results were extracted from the output of the benchmark script included with this
distribution. Sets are provided as a baseline for comparison, though there are many things one can do
with a regular expression that one cannot do with a set.

```
RANDOM WORDS, VARIABLE LENGTH

number of words: 100

set good:                 53360.1 i/s
List::Matcher good:       22211.7 i/s - 2.40x slower
simple rx good:           13086.6 i/s - 4.08x slower
list good:                 4748.0 i/s - 11.24x slower

set bad:                  57387.1 i/s
List::Matcher bad:        14398.7 i/s - 3.99x slower
simple rx bad:             7347.1 i/s - 7.81x slower
list bad:                  2583.1 i/s - 22.22x slower


number of words: 1000

set good:                  5380.5 i/s
List::Matcher good:        1665.3 i/s - 3.23x slower
simple rx good:             166.7 i/s - 32.27x slower
list good:                   52.8 i/s - 101.98x slower

set bad:                   5294.8 i/s
List::Matcher bad:         1061.1 i/s - 4.99x slower
simple rx bad:               81.0 i/s - 65.34x slower
list bad:                    26.1 i/s - 202.51x slower


number of words: 10000

set good:                   361.3 i/s
List::Matcher good:         146.4 i/s - 2.47x slower
simple rx good:               1.7 i/s - 210.46x slower
list good:                    0.4 i/s - 1027.74x slower

set bad:                    370.3 i/s
List::Matcher bad:           82.2 i/s - 4.51x slower
simple rx bad:                0.8 i/s - 447.85x slower
list bad:                     0.2 i/s - 1882.35x slower


FIXED LENGTH, FULL RANGE

number of words: 10; List::Matcher rx: (?-mix:\A\d\z)

set good:                520144.5 i/s
List::Matcher good:      382968.0 i/s - 1.36x slower
list good:               323052.6 i/s - 1.61x slower
simple rx good:          316058.3 i/s - 1.65x slower

set bad:                 624424.8 i/s
List::Matcher bad:       270882.3 i/s - 2.31x slower
simple rx bad:           266277.3 i/s - 2.35x slower
list bad:                175058.3 i/s - 3.57x slower


number of words: 100; List::Matcher rx: (?-mix:\A\d\d\z)

set creation:                20.3 i/s
simple rx creation:          15.9 i/s - 1.28x slower
List::Matcher creation:      15.9 i/s - 1.28x slower

set good:                 52058.4 i/s
List::Matcher good:       41841.7 i/s - 1.24x slower
simple rx good:           15095.6 i/s - 3.45x slower
list good:                 4350.1 i/s - 11.97x slower

set bad:                  59315.4 i/s
simple rx bad:            28063.6 i/s - 2.11x slower
List::Matcher bad:        27823.9 i/s - 2.13x slower
list bad:                  2083.9 i/s - 28.46x slower


number of words: 1000; List::Matcher rx: (?-mix:\A\d{3}\z)

set creation:                 2.1 i/s
List::Matcher creation:       1.5 i/s - 1.40x slower
simple rx creation:           1.5 i/s - 1.41x slower

set good:                  4664.2 i/s
List::Matcher good:        3514.1 i/s - 1.33x slower
simple rx good:             225.6 i/s - 20.67x slower
list good:                   44.2 i/s - 105.57x slower

set bad:                   5830.5 i/s
simple rx bad:             2802.5 i/s - 2.08x slower
List::Matcher bad:         2717.0 i/s - 2.15x slower
list bad:                    20.0 i/s - 291.10x slower


number of words: 10000; List::Matcher rx: (?-mix:\A\d{4}\z)

set creation:                 0.2 i/s
simple rx creation:           0.1 i/s - 1.21x slower
List::Matcher creation:       0.1 i/s - 1.31x slower

set good:                   369.4 i/s
List::Matcher good:         326.2 i/s - 1.13x slower
simple rx good:               2.3 i/s - 159.07x slower
list good:                    0.4 i/s - 966.48x slower

set bad:                    426.6 i/s
simple rx bad:              285.6 i/s - 1.49x slower
List::Matcher bad:          277.1 i/s - 1.54x slower
list bad:                     0.2 i/s - 2236.24x slower

```
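If you want a rough feel for this kind of comparison without the bundled script, the following sketch times a set lookup against a regex built by simply joining alternates. It is not the benchmark script from this distribution: the word list, the `seconds` helper, the anchoring with `\A` and `\z`, and the iteration count are all assumptions made for illustration, and the timings will vary from machine to machine.

```ruby
require 'set'

words  = %w[cat camel catalog dog dodo zebra]
set    = Set.new(words)
# "Simple" regex: escaped alternates joined with "|" and anchored to the whole string.
simple = Regexp.new("\\A(?:#{words.map { |w| Regexp.escape(w) }.join('|')})\\z")

# Hypothetical timing helper based on the monotonic clock.
def seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

probes = words + %w[cab doge zebras] # candidates that should and should not match
n = 20_000

set_time = seconds { n.times { probes.each { |w| set.include?(w) } } }
rx_time  = seconds { n.times { probes.each { |w| simple.match?(w) } } }

printf("set:       %.3fs\n", set_time)
printf("simple rx: %.3fs\n", rx_time)
```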

## Contributing

1. Fork it ( https://github.com/[my-github-username]/list_matcher/fork )
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request
data/Rakefile
ADDED
data/examples/date_grammar.rb
ADDED
@@ -0,0 +1,49 @@
require 'list_matcher'

m = List::Matcher.new atomic: false, bound: true

year = m.pattern( (1901..2000).to_a, name: :year )
mday = m.pattern( (1..31).to_a, name: :mday )
weekdays = %w( Monday Tuesday Wednesday Thursday Friday Saturday Sunday )
weekdays += weekdays.map{ |w| w[0...3] }
wday = m.pattern weekdays, case_insensitive: true, name: :wday
months = %w( January February March April May June July August September October November December )
months += months.map{ |w| w[0...3] }
mo = m.pattern months, case_insensitive: true, name: :mo

date_20th_century = m.rx(
  [
    'wday, mo mday',
    'wday, mo mday year',
    'mo mday, year',
    'mo year',
    'mday mo year',
    'wday',
    'year',
    'mday mo',
    'mo mday',
    'mo mday year'
  ],
  normalize_whitespace: true,
  atomic: true,
  symbols: {
    year: year,
    mday: mday,
    wday: wday,
    mo: mo
  }
)

[
  'Friday',
  'August 27',
  'May 6, 1969',
  '1 Jan 2000',
  'this is not actually a date'
].each do |candidate|
  if m = date_20th_century.match(candidate)
    puts "candidate: #{candidate}; year: #{m[:year]}; month: #{m[:mo]}; weekday: #{m[:wday]}; day of the month: #{m[:mday]}"
  else
    puts "#{candidate} does not look like a plausible date in the 20th century"
  end
end