regexador 0.4.5
- checksums.yaml +7 -0
- data/README.md +385 -0
- data/lib/chars.rb +22 -0
- data/lib/keywords.rb +22 -0
- data/lib/predefs.rb +49 -0
- data/lib/regexador.rb +79 -0
- data/lib/regexador_parser.rb +113 -0
- data/lib/regexador_xform.rb +180 -0
- data/spec/parsing_spec.rb +174 -0
- data/spec/programs_spec.rb +2928 -0
- data/spec/testing.rb +35 -0
- data/test/test.rb +39 -0
- metadata +109 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: b16d02292d2ae9d888a09c04d2e3330c72be9836
  data.tar.gz: d4b472b343ece2a984b30ceecae7968fefdb6c93
SHA512:
  metadata.gz: 1f12498dd028fb55dcef7f927d81a1057af6441824c56d2c6fb8dfe9ee9c99e3c6fb7caa9c94069e848489b737e5d025196a9bf74f4fa844014a61509d361adb
  data.tar.gz: 896ba315badba2f4bdfce8d731c3a3802111ce712ac1bbb768bb4fa8c0f54d1306eabbba896214ccd97fef74c9439f66abcd66a5c9612eb494959dc67d2f256f
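
A downloaded archive can be compared against digests like these with Ruby's standard `digest` library. This is only a sketch: the `digest_matches?` helper and the in-memory sample data are assumptions made for illustration and are not part of the gem.

```ruby
require "digest"

# Hypothetical helper: true when `data` hashes to `expected_hex`
# under the given digest class (Digest::SHA1 or Digest::SHA512).
def digest_matches?(data, expected_hex, algorithm: Digest::SHA512)
  algorithm.hexdigest(data) == expected_hex.downcase
end

# Real use would read the archive from disk (for example with File.binread)
# and compare it with the matching entry from checksums.yaml.
sample      = "hello"
sample_sha1 = Digest::SHA1.hexdigest(sample)

puts digest_matches?(sample, sample_sha1, algorithm: Digest::SHA1)
puts digest_matches?(sample + "!", sample_sha1, algorithm: Digest::SHA1)
```

The first check succeeds and the second fails, because any change to the data changes the digest.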
data/README.md
ADDED
@@ -0,0 +1,385 @@
**UPDATING for 2019**
- create gemspec
- update code for Ruby 2.6
- convert RSpec to MiniTest
- add more tests
- add more examples
- add a tutorial
- begin work on translating Ruby regexes
- investigate Python/Perl/Elixir compatibility
- investigate possibility of engine mockup with debugger


# regexador

An external DSL for Ruby that tries to make regular expressions readable and maintainable.

**PLEASE NOTE**: This README may not be as up-to-date
as [the wiki](http://github.com/Hal9000/regexador/wiki).

### The Basic Concept

Many people are intimidated or confused by regular expressions.
A large part of this is the confusing syntax.

Regexador is a mini-language purely for building regular expressions.
It's purely a Ruby project for now, though in theory it could be
implemented in/for other languages.

For an analogy, think of how we sometimes manipulate databases by
constructing SQL queries and passing them into the appropriate
methods. Regexador works much the same way.

### A Short Example

Suppose we want to match a string consisting of a single IP address.
(Remember that the numbers can only range as high as 255.)

Here is traditional regular expression notation:

    /^(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})\.(25[0-5]|2[0-4]\d|([01])?(\d){1,2})$/

And here is Regexador notation:

    dot = "."
    num = "25" D5 | `2 D4 D | maybe D1 1,2*D
    match BOS num dot num dot num dot num EOS end

In your Ruby code, you can create a Regexador "script" or "program"
(probably by means of a here-document) that you can then pass into
the Regexador class. At minimum, you can convert this into a "real"
Ruby regular expression; there are a few other features and functions,
and more may be added.

So here is a complete Ruby program:

    require 'regexador'

    program = <<-EOS
      dot = "."
      num = "25" D5 | `2 D4 D | maybe D1 1,2*D
      match WB num dot num dot num dot num WB end
    EOS

    pattern = Regexador.new(program)

    puts "Give me an IP address"
    str = gets.chomp

    rx = pattern.to_regex   # Can retrieve the actual regex

    if pattern.match?(str)  # ...or use in other direct ways
      puts "Valid"
    else
      puts "Invalid"
    end
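
For comparison, the hand-written regex can be exercised in plain Ruby without Regexador. The sketch below factors the octet into a constant (`OCTET` and `IP_ADDRESS` are names chosen here, not part of the gem) and uses `\A`/`\z` so the whole string must be an address.

```ruby
# One octet: 250-255, 200-249, or 0-199 (with an optional leading 0 or 1).
OCTET = /25[0-5]|2[0-4]\d|[01]?\d{1,2}/

# Interpolating a Regexp wraps it in a non-capturing group, so the
# alternation inside OCTET stays contained.
IP_ADDRESS = /\A#{OCTET}\.#{OCTET}\.#{OCTET}\.#{OCTET}\z/

%w[192.168.0.1 255.255.255.255 256.1.1.1 1.2.3 10.0.0.].each do |candidate|
  verdict = IP_ADDRESS.match?(candidate) ? "Valid" : "Invalid"
  puts "#{candidate}: #{verdict}"
end
```

Even with the repetition factored out, the octet rule itself is still dense punctuation, which is the problem the Regexador notation is aimed at.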


**Traditional Syntax: Things I Personally Dislike**

- There are no keywords -- only punctuation.
  These symbols all have special meanings: ^$.\[]()+\*? (and others)
- ^ has at least three different meanings
- [ and ] each have two or three different meanings
- Parentheses aren't just for grouping, but for specifying captures
- Character literals are "naked"
- Excessive punctuation makes use of backslash common
- Repetition is strictly postfix form
- Typically (except for Ruby's /x): They're not multi-line, they don't allow comments, and whitespace is highly significant.
- There's no way to avoid duplication (e.g., by assigning subexpressions to variables).
- And other things I'm forgetting


### Regexador at a Glance

I'm attracted to old-fashioned line-oriented syntax; but I don't want
to lock myself into that completely.

In general, useful definitions (variables) will come first. Many things
are predefined already, such as all the usual anchors and the POSIX
character classes. These are in all caps and are considered constants.

At the end, a *match* clause drives the actual building of the final
regular expression. Within this clause, names may be assigned to the
individual sub-matches (using variables that start with "@"). These will
naturally be available externally as named captures.

Because this is really just a "builder," and because we don't have "hooks"
into the regular expression engine itself, a Regexador script will not
look or act much like a "real program." There will be no arithmetic, no
function calls, no looping or branching. Also there can be no printing
of debug information "at matching time"; in principle, printing could be
done during parsing/compilation, but I don't see any value in this.

Of course, syntax errors in Regexador will be found and made available
to the caller.


**Beginning at the Beginning**

I've tried to "think ahead" so as not to paint myself into a corner
too much.

However, probably not all of this can be implemented in the first
version. The current "working version" (0.2.7) has been implemented
over a period of nine weeks.

Therefore some of the syntax described in the following will not be
available right away.

Features still postponed:
- intra-line comments: #{...}
- case/end
- unsure about upto, thru
- unsure about next, last
- pos/neg lookahead/behind


**Syntax notes:**

    "abc"         A char string                 /abc/
    `a            A single character            /a/
    &2345         Unicode char U+2345
    ~`a           Negated char class            /[^a]/
    'abc'         One of class a, b, c          /[abc]/
    `a-`z         Char range                    /[a-z]/
    `a~`z         Negated char range            /[^a-z]/
    p1 | p2       Alternative
    maybe PAT     Optional pattern              PAT?
    any PAT       Zero or more of pattern       PAT*
    many PAT      One or more of pattern        PAT+
    nocase PAT    Case-insensitive PAT          (?i)PAT
    0,1 * PAT     Same as maybe                 PAT?
    1,3 * PAT     One to three of PAT           PAT{1,3}
    5 * PAT       Five of PAT                   PAT{5}
    @var          A named capture               \g<var>{0}
    :var          A parameter passed in
    %alpha        POSIX or Ruby char class      [[:alpha:]]
    var = val     Assign value to local var
    match         Start assembling the regex
    # ...         Comment
    D             Digit                         /[0-9]/
    D1, D2, ...   0 through whatever            /[0-1]/ /[0-2]/ ...
    X             Any character                 /./
    WB            Word boundary                 /\b/
    CR            Carriage return "\r"          /\r/
    LF            Linefeed "\n"                 /\n/
    NL            Newline "\n"                  /\n/
    START         Start of the string           /\A/
    END           End of the string             /\Z/
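
Several entries in the right-hand column can be checked directly in plain Ruby. The sketch below pairs a few of those regexes with a sample string each; the samples are chosen here for illustration, and nothing in it calls Regexador itself.

```ruby
# Each pair: a regex from the right-hand column and a string it should
# match in full. The comment names the Regexador form it corresponds to.
checks = {
  /[^a]/        => "b",     # ~`a
  /[abc]/       => "c",     # 'abc'
  /[a-z]/       => "q",     # `a-`z
  /[^a-z]/      => "Q",     # `a~`z
  /(?:xy){1,3}/ => "xyxy",  # 1,3 * "xy"
  /[[:alpha:]]/ => "Z",     # %alpha
  /[0-2]/       => "2"      # D2
}

checks.each do |rx, sample|
  anchored = /\A#{rx}\z/   # require the whole sample to match
  puts "#{rx.inspect} matches #{sample.inspect}: #{anchored.match?(sample)}"
end
```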


"On hold" for now...

    upto `a         All non-a chars until a        /([^a]*?a)/
    thru `a         All chars including next a     /(.*?a)/
    last PAT        Greedy                         (.*)PAT
    next PAT        Non-greedy (default)           (.*?)PAT
    #{...}          Inline comment
    case/when/end   Complex alternatives


### Notes, precedence, etc.

any, many, maybe, nocase ... These refer to the very next pattern (but parentheses are legal):

    maybe "abc" many "xyz"        /(abc)?(xyz)+/
    maybe many "def"              /((def)+)?/
    maybe ("abc" many "xyz")      /(abc(xyz)+)?/
    "abc" nocase "def" "ghi"      /abc((?i)def)ghi/

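In Ruby's own notation, the reach of a case-insensitive section can be made explicit with `(?i: ... )`. The sketch below uses that scoped form as a stand-in for the nocase example; this README does not say which form Regexador actually emits, so treat the regex here as an assumption.

```ruby
# Only the middle section ignores case; "abc" and "ghi" stay case-sensitive.
rx = /abc(?i:def)ghi/

["abcdefghi", "abcDEFghi", "ABCdefghi", "abcdefGHI"].each do |text|
  puts "#{text}: #{rx.match?(text)}"
end
```

Only the first two strings match, since the qualifier applies to the very next pattern and nothing else.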
String concatenation is implied:

    str = "abc" NL "def"          /abc\ndef/

Strings don't interpolate and the backslash is not special (unsure?):

    str = "lm\nop"                /lm\\nop/

A character literal is essentially the same as a one-character string.

    c1 = `$                       /\$/
    s1 = "$"                      /\$/

However, a character can be negated, while a string (at present) cannot.

    n1 = ~`$                      /[^$]/
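
The escaping shown in these examples is the same thing Ruby's `Regexp.escape` does for literal text. A small sketch in plain Ruby (it does not go through Regexador; the variable names are chosen here):

```ruby
# A literal "$" must be escaped to lose its end-of-line meaning.
dollar = Regexp.new(Regexp.escape("$"))
puts Regexp.escape("$")
puts dollar.match?("cost: $5")

# A backslash followed by "n" (two characters, not a newline) has its
# backslash escaped, so the regex matches those two characters literally.
puts Regexp.escape('lm\nop')

# Inside a character class "$" is already literal, so negation needs no escape.
not_dollar = /\A[^$]\z/
puts not_dollar.match?("a")
puts not_dollar.match?("$")
```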

It is possible to use the "ampersand" notation (with four hex digits)
to specify a Unicode codepoint explicitly.

    &20ac                         /€/

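For reference, plain Ruby can name the same codepoint with `\u` followed by four hex digits, both in strings and in regex literals. A minimal sketch, assuming a UTF-8 source file (Ruby's default):

```ruby
euro = [0x20ac].pack("U")   # build the character from its codepoint
rx   = /\u20ac/             # the same codepoint inside a regex literal

puts euro
puts euro.ord.to_s(16)
puts rx.match?("price: 5 #{euro}")
```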
The encoding is assumed to be UTF-8. Characters used as literals are limited
only by the editor and the current Ruby encoding.

    str = "æßçöñ"                 /æßçöñ/

Tokens such as any, many, match, (etc.) are keywords, and as such cannot be local variable names.

However, parameters (starting with colon) and named matches (starting with @) can be named :many, @any, and so on.

Capitalized predefined matches such as WB (word boundary) are really keywords also.

Alternation binds very loosely:

    many "abc" | "xyz"            /(abc)+|xyz/
    (many "abc") | "xyz"          /(abc)+|xyz/    # Same as above
    many ("abc" | "xyz")          /(abc|xyz)+/    # Different!
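
The difference between the two groupings shows up as soon as the alternatives are mixed in one string. A plain-Ruby sketch of the regexes on the right (anchors added here so the whole string must match; `loose` and `tight` are names chosen for illustration):

```ruby
loose = /\A(?:(abc)+|xyz)\z/   # many "abc" | "xyz"
tight = /\A(abc|xyz)+\z/       # many ("abc" | "xyz")

%w[abcabc xyz abcxyz xyzxyz].each do |text|
  puts "#{text}: loose=#{loose.match?(text)} tight=#{tight.match?(text)}"
end
```

The loose form accepts repeated "abc" or a single "xyz", while the tight form accepts any run of either.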

A variable may refer to a string, a number, or a pattern:

    var1 = 3
    var2 = "abc"      # Really a string is a pattern too
    var3 = maybe many D

There is no arithmetic, but variables may be used where numbers may:

    m = 3
    n = 5
    m,n * "xyz"       /(xyz){3,5}/

Parameters may be used the same way:

    # Assuming params :m, :n are 2 and 4
    :m,:n * "xyz"     /(xyz){2,4}/
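
In plain Ruby the same effect comes from interpolating numbers into a counted repetition. The helper below is hypothetical (it is not part of Regexador) and only sketches how bounds passed in from outside end up inside `{min,max}`:

```ruby
# Hypothetical helper: build /(literal){min,max}/, anchored so that the
# whole string must consist of the repetitions.
def repeated(literal, min, max)
  unless min.is_a?(Integer) && max.is_a?(Integer)
    raise ArgumentError, "repetition bounds must be integers"
  end
  /\A(?:#{Regexp.escape(literal)}){#{min},#{max}}\z/
end

rx = repeated("xyz", 2, 4)
puts rx.match?("xyz")        # one copy: too few
puts rx.match?("xyzxyz")     # two copies: within 2..4
puts rx.match?("xyz" * 5)    # five copies: too many
```

The integer check is there because a non-numeric bound would otherwise be spliced into the regex as text.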

But data type matters, of course:

    m = 3
    n = "foo"
    m,n * "def"       # Syntax error!

The "match clause" uses all previous definitions to finally build the regular expression. It starts with "match" and ends with "end":

    match "abc" | "def" | many `x end

Named matches are only used inside the match clause; anywhere a pattern may be used, "@var = pattern" may also be used.

    match @first = (many %alpha) SPACES @last = (many %alpha) end

Multiple lines are fine (and more readable):

    match
      @first = many %alpha
      SPACES
      @last = many %alpha
    end
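
On the Ruby side, named matches like these correspond to named captures. The regex below is hand-written for illustration and is an assumption about the shape of the result, not Regexador's actual output; SPACES is approximated with `\s+`.

```ruby
# Hand-written equivalent of the match clause above, using Ruby named groups.
name_rx = /(?<first>[[:alpha:]]+)\s+(?<last>[[:alpha:]]+)/

if (m = name_rx.match("Ada Lovelace"))
  puts m[:first]
  puts m[:last]
  puts m.names.inspect   # the capture names available to the caller
end
```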

A "case" may be used for more complex alternatives (needed??):

    case
      when "abc" ...
      when "def" ...
      when "xyz" ...
    end

Multiple "programs" can be concatenated, assuming the initial ones are all definitions and there is only one match clause at the end.

    # Ruby code
    defs = "..."
    prog = "..."
    matcher = Regexador.new(defs + prog)

Pass in parameters this way:

    # Ruby code
    prog = "..."
    matcher = Regexador.new(prog, this: 3, that: "foo")

Possibly invoke "on its own" (compile to regex internally) or explicitly compile?

    result = matcher.match(str)
    if result.ok?
      alpha, beta = result[:alpha, :beta]   # Captured matches
    end

    # Alternatively:
    rx = matcher.regexp     # Return a Ruby regex, use however

### Examples

Match a signed float /[-+]?[0-9]+\.[0-9]+([Ee][-+]?[0-9]+)?/

    sign = '+-'
    digits = many D
    match
      @sign = maybe sign
      @left = digits
      `.
      @right = digits
      maybe ('Ee' @exp=(maybe sign digits))
    end
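
A hand-written Ruby regex with the same four named parts behaves as follows. This is a sketch for comparison, not Regexador output; it allows a sign in the exponent, as the Regexador version does, and the `FLOAT` constant is a name chosen here.

```ruby
FLOAT = /\A(?<sign>[-+])?(?<left>\d+)\.(?<right>\d+)(?:[Ee](?<exp>[-+]?\d+))?\z/

["-3.14", "6.02e+23", "+1.5E-3", "42"].each do |text|
  m = FLOAT.match(text)
  if m
    puts "#{text}: sign=#{m[:sign].inspect} left=#{m[:left]} " \
         "right=#{m[:right]} exp=#{m[:exp].inspect}"
  else
    puts "#{text}: no match"   # "42" has no fractional part
  end
end
```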

Match balanced HTML tags and capture cdata /\<TAG\b[^\>]\*\>(.\*?)\<\/TAG\>/

    # Note that :tag is a parameter, so for example,
    # TABLE or BODY might be passed in
    match
      `< :tag WB
      @cdata = (upto `>)
      "</" :tag `>
    end


Match IP address (honoring 255 limit) Regex: /\b(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]{1,2})\b/

    dot = "."
    num = "25" D5 | `2 D4 D | maybe D1 1,2*D
    match WB num dot num dot num dot num WB end

Determine whether a credit card number is valid Regex: /^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$/

    # Warning: This one likely has errors!
    # Assuming no spaces

    # Visa: ^4[0-9]{12}(?:[0-9]{3})?$
    #   All Visa card numbers start with a 4. New cards have 16 digits. Old cards have 13.
    # MasterCard: ^5[1-5][0-9]{14}$
    #   All MasterCard numbers start with the numbers 51 through 55. All have 16 digits.
    # American Express: ^3[47][0-9]{13}$
    #   American Express card numbers start with 34 or 37 and have 15 digits.
    # Diners Club: ^3(?:0[0-5]|[68][0-9])[0-9]{11}$
    #   Diners Club card numbers begin with 300 through 305, 36 or 38. All have 14 digits.
    #   There are Diners Club cards that begin with 5 and have 16 digits. These are a
    #   joint venture between Diners Club and MasterCard, and should be processed like
    #   a MasterCard.
    # Discover: ^6(?:011|5[0-9]{2})[0-9]{12}$
    #   Discover card numbers begin with 6011 or 65. All have 16 digits.
    # JCB: ^(?:2131|1800|35\d{3})\d{11}$
    #   JCB cards beginning with 2131 or 1800 have 15 digits. JCB cards beginning with 35 have 16 digits.

    visa     = `4 12*D maybe 3*D
    mc       = `5 D5 14*D
    amex     = `3 '47' 13*D
    diners   = `3 (`0 D5 | '68' D) 11*D
    discover = `6 ("011" | `5 2*D) 12*D
    jcb      = ("2131"|"1800"|"35" 3*D) 11*D

    match visa | mc | amex | diners | discover | jcb end
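
The traditional regex for this example can be tried in plain Ruby. The sketch below copies that regex into a constant (`CARD` is a name chosen here) and runs it on a few sample digit strings picked for illustration. It checks only prefix and length, exactly as the regex does, and says nothing about whether a number was ever issued.

```ruby
CARD = /^(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|6(?:011|5[0-9][0-9])[0-9]{12}|3[47][0-9]{13}|3(?:0[0-5]|[68][0-9])[0-9]{11}|(?:2131|1800|35\d{3})\d{11})$/

samples = {
  "4111111111111111" => "Visa-style, 16 digits",
  "5555555555554444" => "MasterCard-style, 16 digits",
  "378282246310005"  => "American Express-style, 15 digits",
  "411111111111"     => "starts with 4 but only 12 digits",
  "1234567890123456" => "no known prefix"
}

samples.each do |number, note|
  puts "#{number} (#{note}): #{CARD.match?(number)}"
end
```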

### Open Questions

1. What about pos/neg lookahead/lookbehind, possessive matches? Laziness??
2. Do upto and thru really make sense?
3. Do next and last really make sense?
4. How to handle /m? /o?
5. What special symbols/anchors do we need to predefine?
6. Possibly allow postfix repetition as well as prefix? (e.g.: pattern \* 1,3)
7. Other issues...

### Update history

This history has been maintained only since version 0.4.2.

*0.4.3*
- Experimenting with lookarounds (pos/neg lookahead/behind)
- Rearranged tests
- Added "escaping" keyword

*0.4.2*
- UTF-8 encoding is assumed
- &xxxx notation can specify an arbitrary Unicode codepoint
- Backreferences work as expected
- Backreferences now can be inlined and parenthesized
- The nocase qualifier permits case-insensitive sub-expressions
data/lib/chars.rb
ADDED
@@ -0,0 +1,22 @@
abort "Require out of order" if ! defined? Regexador

# Single-character tokens of the Regexador grammar.
class Regexador::Parser
  rule(:cSQUOTE)     { str("'") }
  rule(:cQUOTE)      { str('"') }
  rule(:cAMPERSAND)  { str('&') }
  rule(:cTICK)       { str('`') }
  rule(:cBAR)        { str('|') }
  rule(:cPERCENT)    { str('%') }
  rule(:cCOMMA)      { str(',') }
  rule(:cHYPHEN)     { str('-') }
  rule(:cTILDE)      { str('~') }
  rule(:cUNDERSCORE) { str('_') }
  rule(:cEQUAL)      { str('=') }
  rule(:cHASH)       { str('#') }
  rule(:cTIMES)      { str('*') }
  rule(:cAT)         { str("@") }
  rule(:cCOLON)      { str(":") }
  rule(:cLPAREN)     { str('(') }
  rule(:cRPAREN)     { str(')') }
  rule(:cNEWLINE)    { str("\n") }
end
data/lib/keywords.rb
ADDED
@@ -0,0 +1,22 @@
abort "Require out of order" if ! defined? Regexador

class Regexador::Parser

  rule(:kANY)       { str("any") }
  rule(:kMANY)      { str("many") }
  rule(:kMAYBE)     { str("maybe") }
  rule(:kMATCH)     { str("match") }
  rule(:kEND)       { str("end") }
  rule(:kNOCASE)    { str("nocase") }

  rule(:kWITH)      { str("with") }
  rule(:kWITHOUT)   { str("without") }
  rule(:kFIND)      { str("find") }

  rule(:kWITHIN)    { str("within") }
  rule(:kESCAPING)  { str("escaping") }

  # Alternatives are tried in order, so the longer "without" and "within"
  # come before their prefix "with"; otherwise "with" would match first.
  rule(:keyword)    { kANY | kMANY | kMAYBE | kMATCH | kEND | kNOCASE |
                      kWITHOUT | kWITHIN | kWITH | kFIND | kESCAPING }
end
data/lib/predefs.rb
ADDED
@@ -0,0 +1,49 @@
abort "Require out of order" if ! defined? Regexador

class Regexador::Parser

  # Maps each predefined name (prefixed with "p") to its regex source text.
  Predef2Regex = {
    pD:      "\\d",
    pD0:     "0",
    pD1:     "[01]",
    pD2:     "[0-2]",
    pD3:     "[0-3]",
    pD4:     "[0-4]",
    pD5:     "[0-5]",
    pD6:     "[0-6]",
    pD7:     "[0-7]",
    pD8:     "[0-8]",
    pD9:     "\\d",
    pX:      ".",

    pCR:     "\r",
    pLF:     "\n",
    pNL:     "\n",
    pCRLF:   "\r\n",

    # Note: in a double-quoted Ruby string, "\s" is a literal space
    # character, not the regex whitespace class.
    pSPACE:  "\s",       # ?
    pSPACES: "\s+",
    pBLANK:  "\s",
    pBLANKS: "\s+",

    pWB:     "\\b",
    pBOS:    "^",
    pEOS:    "$",
    # The backslash must be doubled: "\A" and "\Z" in double quotes
    # would lose it and yield plain "A" and "Z".
    pSTART:  "\\A",
    pEND:    "\\Z"
  }

  # We need to reverse sort the keys so that longer keys are used before
  # shorter keys. (ie D0 vs. D)
  syms = Predef2Regex.keys.sort.reverse

  syms.each do |sym|
    # rule(:WB) { str('WB') }
    rule(sym) { str(sym.to_s[1..-1]) }   # strip leading "p"
  end

  # rule(:predef) { (pD | pD0 | ...).as(:predef) }
  rule(:predef) {
    syms.map { |s| self.send(s) }.reduce(&:|).as(:predef) }
end
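
The reason for the reverse sort can be seen with ordinary Ruby regex alternation, which also tries alternatives left to right. This sketch uses `Regexp.union` as a stand-in for the parser's ordered choice; that substitution is an assumption made for illustration, and the name list is a small subset chosen here.

```ruby
names = %w[D D0 D1 SPACE SPACES]

naive   = Regexp.union(names)                # D is tried before D0
ordered = Regexp.union(names.sort.reverse)   # SPACES, SPACE, D1, D0, D

puts names.sort.reverse.inspect
puts "D0"[naive]         # stops at the shorter name
puts "D0"[ordered]       # finds the full name
puts "SPACES"[naive]
puts "SPACES"[ordered]
```

Sorting works here because a name always sorts after any of its own prefixes, so reversing puts the longer name first.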