citrus 2.1.2 → 2.2.0
- data/README +106 -49
- data/benchmark/after.dat +192 -0
- data/benchmark/before.dat +192 -0
- data/citrus.gemspec +0 -1
- data/doc/extras.markdown +16 -0
- data/doc/syntax.markdown +76 -29
- data/doc/testing.markdown +12 -20
- data/examples/calc.citrus +12 -11
- data/examples/calc.rb +12 -11
- data/lib/citrus.rb +416 -253
- data/lib/citrus/file.rb +66 -33
- data/test/_files/super.citrus +1 -1
- data/test/_files/super2.citrus +13 -0
- data/test/alias_test.rb +18 -34
- data/test/and_predicate_test.rb +15 -10
- data/test/but_predicate_test.rb +22 -17
- data/test/calc_file_test.rb +1 -1
- data/test/choice_test.rb +12 -37
- data/test/{rule_test.rb → extension_test.rb} +17 -16
- data/test/file_test.rb +350 -244
- data/test/grammar_test.rb +5 -11
- data/test/helper.rb +1 -17
- data/test/input_test.rb +172 -2
- data/test/label_test.rb +0 -10
- data/test/match_test.rb +91 -35
- data/test/multibyte_test.rb +4 -4
- data/test/not_predicate_test.rb +15 -10
- data/test/parse_error_test.rb +1 -3
- data/test/repeat_test.rb +59 -32
- data/test/sequence_test.rb +19 -31
- data/test/string_terminal_test.rb +55 -0
- data/test/super_test.rb +31 -31
- data/test/terminal_test.rb +12 -37
- metadata +13 -23
- data/lib/citrus/debug.rb +0 -69
- data/test/debug_test.rb +0 -23
data/README
CHANGED
@@ -113,46 +113,57 @@ already be familiar to Ruby programmers.
|
|
113
113
|
Terminals may be represented by a string or a regular expression. Both follow
|
114
114
|
the same rules as Ruby string and regular expression literals.
|
115
115
|
|
116
|
-
'abc'
|
117
|
-
"abc\n"
|
118
|
-
|
116
|
+
'abc' # match "abc"
|
117
|
+
"abc\n" # match "abc\n"
|
118
|
+
/abc/i # match "abc" in any case
|
119
|
+
/\xFF/ # match "\xFF"
|
119
120
|
|
120
121
|
Character classes and the dot (match anything) symbol are supported as well for
|
121
122
|
compatibility with other parsing expression implementations.
|
122
123
|
|
123
124
|
[a-z0-9] # match any lowercase letter or digit
|
124
125
|
[\x00-\xFF] # match any octet
|
125
|
-
. # match
|
126
|
+
. # match any single character, including new lines
|
126
127
|
|
127
|
-
|
128
|
+
Also, strings may use backticks instead of quotes to indicate that they should
|
129
|
+
match in a case-insensitive manner.
|
130
|
+
|
131
|
+
`abc` # match "abc" in any case
|
132
|
+
|
133
|
+
See [Terminal](api/classes/Citrus/Terminal.html) and
|
134
|
+
[StringTerminal](api/classes/Citrus/StringTerminal.html) for more information.
|
128
135
|
|
129
136
|
## Repetition
|
130
137
|
|
131
138
|
Quantifiers may be used after any expression to specify a number of times it
|
132
|
-
must match. The universal form of a quantifier is N*M where N is the minimum
|
133
|
-
M is the maximum number of times the expression may match.
|
139
|
+
must match. The universal form of a quantifier is `N*M` where `N` is the minimum
|
140
|
+
and `M` is the maximum number of times the expression may match.
|
134
141
|
|
135
|
-
'abc'1*2 # match "abc" a minimum of one, maximum
|
136
|
-
# of two times
|
142
|
+
'abc'1*2 # match "abc" a minimum of one, maximum of two times
|
137
143
|
'abc'1* # match "abc" at least once
|
138
144
|
'abc'*2 # match "abc" a maximum of twice
|
139
145
|
|
140
|
-
|
141
|
-
|
146
|
+
Additionally, the minimum and maximum may be omitted entirely to specify that an
|
147
|
+
expression may match zero or more times.
|
148
|
+
|
149
|
+
'abc'* # match "abc" zero or more times
|
142
150
|
|
143
|
-
|
144
|
-
|
151
|
+
The `+` and `?` operators are supported as well for the common cases of `1*` and
|
152
|
+
`*1` respectively.
|
153
|
+
|
154
|
+
'abc'+ # match "abc" one or more times
|
155
|
+
'abc'? # match "abc" zero or one time
|
145
156
|
|
146
157
|
See [Repeat](api/classes/Citrus/Repeat.html) for more information.
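For readers more familiar with Ruby regular expressions than with Citrus quantifiers, the forms described in this section line up roughly with regexp interval quantifiers. The sketch below uses only Ruby's built-in `Regexp` and is an analogy, not Citrus code; the `QUANTIFIERS` table and the `reps?` helper are hypothetical names introduced for illustration.

```ruby
# Rough Regexp counterparts of the Citrus quantifiers described above.
# This is an analogy in plain Ruby, not Citrus itself.
QUANTIFIERS = {
  "1*2" => "{1,2}", # minimum of one, maximum of two
  "1*"  => "{1,}",  # at least once
  "*2"  => "{0,2}", # a maximum of twice
  "*"   => "*",     # zero or more
  "+"   => "+",     # shorthand for 1*
  "?"   => "?"      # shorthand for *1
}

# Returns true if "abc" repeated `count` times satisfies the quantifier.
def reps?(quantifier, count)
  pattern = Regexp.new("\\A(?:abc)#{QUANTIFIERS.fetch(quantifier)}\\z")
  pattern.match?("abc" * count)
end

reps?("1*2", 2) # => true
reps?("1*2", 3) # => false
reps?("?", 0)   # => true
```

The table form keeps the mapping visible at a glance: omitting `N` means "no minimum" and omitting `M` means "no maximum".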
|
147
158
|
|
148
159
|
## Lookahead
|
149
160
|
|
150
|
-
Both positive and negative lookahead are supported in Citrus. Use the
|
151
|
-
operators to indicate that an expression either should or should not match.
|
152
|
-
neither case is any input consumed.
|
161
|
+
Both positive and negative lookahead are supported in Citrus. Use the `&` and
|
162
|
+
`!` operators to indicate that an expression either should or should not match.
|
163
|
+
In neither case is any input consumed.
|
153
164
|
|
154
165
|
&'a' 'b' # match a "b" preceded by an "a"
|
155
|
-
|
166
|
+
'a' !'b' # match an "a" that is not followed by a "b"
|
156
167
|
!'a' . # match any character except for "a"
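The "no input is consumed" property can be observed with Ruby's own regexp lookahead, which behaves similarly. The sketch below is plain Ruby rather than Citrus, and the variable names are chosen only for illustration.

```ruby
# Plain-Ruby analogy for lookahead; not Citrus code.

# 'a' !'b' corresponds roughly to /a(?!b)/: the "b" is checked, never consumed.
not_followed = /a(?!b)/
m = not_followed.match("ac")
matched   = m[0]          # "a"
remaining = m.post_match  # "c"
rejected  = not_followed.match("ab") # nil, because a "b" follows the "a"

# Positive lookahead: the "b" must be present but stays in the remaining input.
followed = /a(?=b)/.match("ab")
followed_rest = followed.post_match # "b"

# !'a' . corresponds roughly to /(?!a)./m: any single character except "a".
any_but_a = /\A(?!a)./m
```

In both the positive and the negative case, the text inspected by the lookahead is still available to whatever expression comes next.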
|
157
168
|
|
158
169
|
A special form of lookahead is also supported which will match any character
|
@@ -178,20 +189,17 @@ See [Sequence](api/classes/Citrus/Sequence.html) for more information.
|
|
178
189
|
## Choices
|
179
190
|
|
180
191
|
Ordered choice is indicated by a vertical bar that separates two expressions.
|
181
|
-
|
192
|
+
When using choice, each expression is tried in order. When one matches, the
|
193
|
+
rule returns the match immediately without trying the remaining rules.
|
182
194
|
|
183
195
|
'a' | 'b' # match "a" or "b"
|
184
196
|
'a' 'b' | 'c' # match "a" then "b" (in sequence), or "c"
|
185
197
|
|
186
|
-
|
187
|
-
|
188
|
-
|
189
|
-
|
190
|
-
When including a grammar inside another, all rules in the child that have the
|
191
|
-
same name as a rule in the parent also have access to the "super" keyword to
|
192
|
-
invoke the parent rule.
|
198
|
+
It is important to note when using ordered choice that any operator binds more
|
199
|
+
tightly than the vertical bar. A full chart of operators and their respective
|
200
|
+
levels of precedence is below.
|
193
201
|
|
194
|
-
See [
|
202
|
+
See [Choice](api/classes/Citrus/Choice.html) for more information.
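Because the choice is ordered, the order of alternatives can change the result. The following is a minimal plain-Ruby sketch of that behavior, restricted to string literals; `ordered_choice` is a hypothetical helper and not part of Citrus.

```ruby
# Try each alternative in order at the start of `input` and return the first
# one that matches, without looking at the remaining alternatives.
# Hypothetical helper for illustration only; not the Citrus implementation.
def ordered_choice(input, alternatives)
  alternatives.each do |literal|
    return literal if input.start_with?(literal)
  end
  nil
end

ordered_choice("abc", ["a", "ab"]) # => "a"
ordered_choice("abc", ["ab", "a"]) # => "ab"
ordered_choice("xyz", ["a", "ab"]) # => nil
```

With `"a"` listed first, the longer alternative `"ab"` is never tried, which is why more specific alternatives usually go first in an ordered choice.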
|
195
203
|
|
196
204
|
## Labels
|
197
205
|
|
@@ -199,12 +207,50 @@ Match objects may be referred to by a different name than the rule that
|
|
199
207
|
originally generated them. Labels are created by placing the label and a colon
|
200
208
|
immediately preceding any expression.
|
201
209
|
|
202
|
-
chars:/[a-z]+/ # the characters matched by the regular
|
203
|
-
#
|
204
|
-
#
|
210
|
+
chars:/[a-z]+/ # the characters matched by the regular expression
|
211
|
+
# may be referred to as "chars" in an extension
|
212
|
+
# method
|
205
213
|
|
206
214
|
See [Label](api/classes/Citrus/Label.html) for more information.
|
207
215
|
|
216
|
+
## Grouping
|
217
|
+
|
218
|
+
As is common in many programming languages, parentheses may be used to override
|
219
|
+
the normal binding order of operators.
|
220
|
+
|
221
|
+
'a' ('b' | 'c') # match "a", then "b" or "c"
|
222
|
+
|
223
|
+
## Extensions
|
224
|
+
|
225
|
+
Extensions may be specified using either "module" or "block" syntax. When using
|
226
|
+
module syntax, specify the name of a module that is used to extend match objects
|
227
|
+
in between less than and greater than symbols.
|
228
|
+
|
229
|
+
[a-z0-9]5*9 <CouponCode> # match a string that consists of any lower
|
230
|
+
# cased letter or digit between 5 and 9
|
231
|
+
# times and extend the match with the
|
232
|
+
# CouponCode module
|
233
|
+
|
234
|
+
Additionally, extensions may be specified inline using curly braces. Inside the
|
235
|
+
curly braces you may embed method definitions that will be used to extend match
|
236
|
+
objects.
|
237
|
+
|
238
|
+
# match any digit and return its integer value when calling the
|
239
|
+
# #value method on the match object
|
240
|
+
[0-9] {
|
241
|
+
def value
|
242
|
+
to_i
|
243
|
+
end
|
244
|
+
}
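This README notes elsewhere that match objects are themselves strings, so the effect of the block above can be approximated in plain Ruby by extending a string with a module. The sketch below assumes nothing about how Citrus applies extensions internally; `DigitValue` is a hypothetical module name and a plain `String` stands in for the match object.

```ruby
# Hypothetical module playing the role of the inline block above.
module DigitValue
  def value
    to_i
  end
end

# A plain String stands in for a match object here.
match = "7".dup
match.extend(DigitValue)
match.value # => 7
```

Only the extended object gains `#value`; other strings are unaffected, which is the main appeal of extending individual objects over reopening `String`.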
|
245
|
+
|
246
|
+
## Super
|
247
|
+
|
248
|
+
When including a grammar inside another, all rules in the child that have the
|
249
|
+
same name as a rule in the parent also have access to the `super` keyword to
|
250
|
+
invoke the parent rule.
|
251
|
+
|
252
|
+
See [Super](api/classes/Citrus/Super.html) for more information.
|
253
|
+
|
208
254
|
## Precedence
|
209
255
|
|
210
256
|
The following table contains a list of all Citrus symbols and operators and
|
@@ -214,6 +260,7 @@ Operator | Name | Precedence
|
|
214
260
|
--------- | ------------------------- | ----------
|
215
261
|
'' | String (single quoted) | 6
|
216
262
|
"" | String (double quoted) | 6
|
263
|
+
`` | String (case insensitive) | 6
|
217
264
|
[] | Character class | 6
|
218
265
|
. | Dot (any character) | 6
|
219
266
|
// | Regular expression | 6
|
@@ -410,12 +457,11 @@ case that could be used to test that our grammar works properly.
|
|
410
457
|
end
|
411
458
|
end
|
412
459
|
|
413
|
-
The key here is using the
|
414
|
-
|
415
|
-
|
416
|
-
|
417
|
-
|
418
|
-
on the fly like this enables easy unit testing of the entire grammar.
|
460
|
+
The key here is using the `:root` option when performing the parse to specify
|
461
|
+
the name of the rule at which the parse should start. In `test_number`, since
|
462
|
+
`:number` was given the parse will start at that rule as if it were the root
|
463
|
+
rule of the entire grammar. The ability to change the root rule on the fly like
|
464
|
+
this enables easy unit testing of the entire grammar.
|
419
465
|
|
420
466
|
Also note that because match objects are themselves strings, assertions may be
|
421
467
|
made to test equality of match objects with string values.
|
@@ -424,9 +470,9 @@ made to test equality of match objects with string values.
|
|
424
470
|
|
425
471
|
When a parse fails, a [ParseError](api/classes/Citrus/ParseError.html) object is
|
426
472
|
generated which provides a wealth of information about exactly where the parse
|
427
|
-
failed
|
428
|
-
|
429
|
-
to do this.
|
473
|
+
failed including the offset, line number, line text, and line offset. Using this
|
474
|
+
object, you could possibly provide some useful feedback to the user about why
|
475
|
+
the input was bad. The following code demonstrates one way to do this.
|
430
476
|
|
431
477
|
def parse_some_stuff(stuff)
|
432
478
|
match = StuffGrammar.parse(stuff)
|
@@ -435,17 +481,28 @@ to do this.
|
|
435
481
|
[e.line_number, e.line_offset]
|
436
482
|
end
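To make the relationship between an offset, a line number, and a line offset concrete, here is a small plain-Ruby sketch that derives the latter two from the former. It is not how Citrus computes these values; `line_position` is a hypothetical helper, and the conventions it uses (lines counted from 1, offsets from 0) are assumptions that may differ from those of `ParseError`.

```ruby
# Hypothetical helper: convert a 0-based character offset into a
# [line_number, line_offset] pair. Lines are counted from 1 and the
# offset within the line from 0; Citrus's own conventions may differ.
def line_position(text, offset)
  before = text[0, offset]
  line_number = before.count("\n") + 1
  last_newline = before.rindex("\n")
  line_offset = last_newline ? offset - last_newline - 1 : offset
  [line_number, line_offset]
end

line_position("ab\ncd\nef", 4) # => [2, 1]
```

A pair like this is usually enough to print the offending line with a caret under the failing column.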
|
437
483
|
|
438
|
-
In addition to useful error objects, Citrus also includes a
|
439
|
-
|
440
|
-
|
441
|
-
|
442
|
-
|
443
|
-
|
444
|
-
|
445
|
-
|
446
|
-
|
447
|
-
|
448
|
-
|
484
|
+
In addition to useful error objects, Citrus also includes a means of visualizing
|
485
|
+
match trees in the console via `Match#dump`. This can help when determining
|
486
|
+
which rules are generating which matches and how they are organized in the
|
487
|
+
match tree.
|
488
|
+
|
489
|
+
|
490
|
+
# Extras
|
491
|
+
|
492
|
+
|
493
|
+
Several files are included in the Citrus repository that make it easier to work
|
494
|
+
with grammar files in various editors.
|
495
|
+
|
496
|
+
## TextMate
|
497
|
+
|
498
|
+
To install the Citrus [TextMate](http://macromates.com/) bundle, simply
|
499
|
+
double-click on the `Citrus.tmbundle` file in the `extras` directory.
|
500
|
+
|
501
|
+
## Vim
|
502
|
+
|
503
|
+
To install the [Vim](http://www.vim.org/) scripts, copy the files in
|
504
|
+
`extras/vim` to a directory in Vim's
|
505
|
+
[runtimepath](http://vimdoc.sourceforge.net/htmldoc/options.html#\'runtimepath\').
|
449
506
|
|
450
507
|
|
451
508
|
# Links
|
data/benchmark/after.dat
ADDED
@@ -0,0 +1,192 @@
|
|
1
|
+
12 0
|
2
|
+
30 0
|
3
|
+
35 0
|
4
|
+
1749 19
|
5
|
+
1962 9
|
6
|
+
2383 20
|
7
|
+
3728 29
|
8
|
+
3919 30
|
9
|
+
3952 30
|
10
|
+
3995 30
|
11
|
+
4063 30
|
12
|
+
4325 40
|
13
|
+
4527 39
|
14
|
+
4570 39
|
15
|
+
4607 50
|
16
|
+
4654 40
|
17
|
+
4679 39
|
18
|
+
4774 40
|
19
|
+
4968 40
|
20
|
+
5059 50
|
21
|
+
5383 39
|
22
|
+
5915 49
|
23
|
+
6109 50
|
24
|
+
6122 49
|
25
|
+
6218 50
|
26
|
+
6332 50
|
27
|
+
6681 59
|
28
|
+
7440 60
|
29
|
+
7530 60
|
30
|
+
7605 70
|
31
|
+
8155 60
|
32
|
+
8402 69
|
33
|
+
8420 70
|
34
|
+
8617 80
|
35
|
+
8635 70
|
36
|
+
8841 79
|
37
|
+
8843 79
|
38
|
+
8852 69
|
39
|
+
9151 80
|
40
|
+
9271 80
|
41
|
+
9521 80
|
42
|
+
9525 80
|
43
|
+
9566 80
|
44
|
+
9584 80
|
45
|
+
9642 70
|
46
|
+
10138 89
|
47
|
+
10181 80
|
48
|
+
10225 80
|
49
|
+
10338 80
|
50
|
+
10449 89
|
51
|
+
10629 90
|
52
|
+
10763 89
|
53
|
+
10817 89
|
54
|
+
11059 90
|
55
|
+
11062 90
|
56
|
+
11215 89
|
57
|
+
11698 99
|
58
|
+
11891 99
|
59
|
+
11945 100
|
60
|
+
11956 100
|
61
|
+
12018 100
|
62
|
+
12053 100
|
63
|
+
12178 99
|
64
|
+
12283 100
|
65
|
+
12326 109
|
66
|
+
12430 99
|
67
|
+
12438 99
|
68
|
+
12572 99
|
69
|
+
12638 99
|
70
|
+
12687 99
|
71
|
+
12703 109
|
72
|
+
12896 109
|
73
|
+
12922 109
|
74
|
+
12996 99
|
75
|
+
13137 110
|
76
|
+
13211 129
|
77
|
+
13462 109
|
78
|
+
13477 109
|
79
|
+
13576 109
|
80
|
+
13577 120
|
81
|
+
13584 110
|
82
|
+
13605 109
|
83
|
+
13631 109
|
84
|
+
14216 120
|
85
|
+
14237 120
|
86
|
+
14260 130
|
87
|
+
14367 119
|
88
|
+
14371 120
|
89
|
+
14741 120
|
90
|
+
14893 120
|
91
|
+
14910 120
|
92
|
+
14917 129
|
93
|
+
14977 130
|
94
|
+
15049 119
|
95
|
+
15191 130
|
96
|
+
15382 129
|
97
|
+
15618 129
|
98
|
+
15623 130
|
99
|
+
15629 129
|
100
|
+
15856 129
|
101
|
+
16496 130
|
102
|
+
16512 159
|
103
|
+
16956 129
|
104
|
+
17074 140
|
105
|
+
17237 139
|
106
|
+
17371 150
|
107
|
+
17568 149
|
108
|
+
17945 140
|
109
|
+
18147 149
|
110
|
+
18343 150
|
111
|
+
18417 160
|
112
|
+
18823 159
|
113
|
+
18970 149
|
114
|
+
19285 170
|
115
|
+
19333 160
|
116
|
+
19500 160
|
117
|
+
19548 150
|
118
|
+
19634 160
|
119
|
+
19673 160
|
120
|
+
19689 179
|
121
|
+
19909 169
|
122
|
+
20054 160
|
123
|
+
20107 169
|
124
|
+
20248 169
|
125
|
+
20580 169
|
126
|
+
20744 169
|
127
|
+
20806 189
|
128
|
+
20954 169
|
129
|
+
21034 179
|
130
|
+
21187 179
|
131
|
+
21303 189
|
132
|
+
21450 169
|
133
|
+
21626 179
|
134
|
+
21931 179
|
135
|
+
21950 169
|
136
|
+
22359 189
|
137
|
+
22626 190
|
138
|
+
22638 180
|
139
|
+
22772 189
|
140
|
+
22885 190
|
141
|
+
22897 190
|
142
|
+
23114 190
|
143
|
+
23242 189
|
144
|
+
23428 189
|
145
|
+
23452 209
|
146
|
+
23495 199
|
147
|
+
23499 189
|
148
|
+
23558 190
|
149
|
+
23744 190
|
150
|
+
23881 219
|
151
|
+
23945 189
|
152
|
+
24361 210
|
153
|
+
24501 210
|
154
|
+
24642 199
|
155
|
+
24672 200
|
156
|
+
24694 210
|
157
|
+
24706 210
|
158
|
+
24931 210
|
159
|
+
25065 210
|
160
|
+
25140 199
|
161
|
+
25402 210
|
162
|
+
25702 209
|
163
|
+
25743 209
|
164
|
+
27139 230
|
165
|
+
27316 240
|
166
|
+
27333 229
|
167
|
+
27414 220
|
168
|
+
27771 230
|
169
|
+
27798 229
|
170
|
+
28583 240
|
171
|
+
28906 239
|
172
|
+
29025 240
|
173
|
+
29209 240
|
174
|
+
29272 239
|
175
|
+
29273 250
|
176
|
+
29359 240
|
177
|
+
29577 240
|
178
|
+
30886 259
|
179
|
+
31170 250
|
180
|
+
31593 259
|
181
|
+
32460 269
|
182
|
+
32486 259
|
183
|
+
32630 269
|
184
|
+
33010 269
|
185
|
+
33137 289
|
186
|
+
33142 280
|
187
|
+
33739 280
|
188
|
+
39880 319
|
189
|
+
39940 339
|
190
|
+
42952 350
|
191
|
+
43227 350
|
192
|
+
52855 439
|
data/benchmark/before.dat
ADDED
@@ -0,0 +1,192 @@
|
|
1
|
+
12 0
|
2
|
+
30 0
|
3
|
+
35 0
|
4
|
+
1749 20
|
5
|
+
1962 20
|
6
|
+
2383 30
|
7
|
+
3728 39
|
8
|
+
3919 49
|
9
|
+
3952 99
|
10
|
+
3995 120
|
11
|
+
4063 49
|
12
|
+
4325 49
|
13
|
+
4527 109
|
14
|
+
4570 89
|
15
|
+
4607 100
|
16
|
+
4654 59
|
17
|
+
4679 80
|
18
|
+
4774 130
|
19
|
+
4968 59
|
20
|
+
5059 60
|
21
|
+
5383 90
|
22
|
+
5915 70
|
23
|
+
6109 70
|
24
|
+
6122 150
|
25
|
+
6218 190
|
26
|
+
6332 89
|
27
|
+
6681 160
|
28
|
+
7440 149
|
29
|
+
7530 100
|
30
|
+
7605 129
|
31
|
+
8155 169
|
32
|
+
8402 210
|
33
|
+
8420 160
|
34
|
+
8617 179
|
35
|
+
8635 200
|
36
|
+
8841 129
|
37
|
+
8843 159
|
38
|
+
8852 140
|
39
|
+
9151 100
|
40
|
+
9271 170
|
41
|
+
9521 200
|
42
|
+
9525 179
|
43
|
+
9566 119
|
44
|
+
9584 149
|
45
|
+
9642 119
|
46
|
+
10138 200
|
47
|
+
10181 120
|
48
|
+
10225 160
|
49
|
+
10338 149
|
50
|
+
10449 160
|
51
|
+
10629 179
|
52
|
+
10763 190
|
53
|
+
10817 200
|
54
|
+
11059 169
|
55
|
+
11062 200
|
56
|
+
11215 179
|
57
|
+
11698 179
|
58
|
+
11891 219
|
59
|
+
11945 229
|
60
|
+
11956 209
|
61
|
+
12018 219
|
62
|
+
12053 179
|
63
|
+
12178 239
|
64
|
+
12283 150
|
65
|
+
12326 239
|
66
|
+
12430 210
|
67
|
+
12438 160
|
68
|
+
12572 190
|
69
|
+
12638 250
|
70
|
+
12687 219
|
71
|
+
12703 219
|
72
|
+
12896 190
|
73
|
+
12922 239
|
74
|
+
12996 189
|
75
|
+
13137 230
|
76
|
+
13211 230
|
77
|
+
13462 230
|
78
|
+
13477 199
|
79
|
+
13576 209
|
80
|
+
13577 219
|
81
|
+
13584 279
|
82
|
+
13605 210
|
83
|
+
13631 219
|
84
|
+
14216 230
|
85
|
+
14237 240
|
86
|
+
14260 190
|
87
|
+
14367 210
|
88
|
+
14371 240
|
89
|
+
14741 269
|
90
|
+
14893 229
|
91
|
+
14910 239
|
92
|
+
14917 239
|
93
|
+
14977 260
|
94
|
+
15049 260
|
95
|
+
15191 199
|
96
|
+
15382 260
|
97
|
+
15618 230
|
98
|
+
15623 250
|
99
|
+
15629 239
|
100
|
+
15856 269
|
101
|
+
16496 300
|
102
|
+
16512 250
|
103
|
+
16956 300
|
104
|
+
17074 250
|
105
|
+
17237 259
|
106
|
+
17371 300
|
107
|
+
17568 290
|
108
|
+
17945 300
|
109
|
+
18147 270
|
110
|
+
18343 269
|
111
|
+
18417 279
|
112
|
+
18823 400
|
113
|
+
18970 280
|
114
|
+
19285 269
|
115
|
+
19333 329
|
116
|
+
19500 269
|
117
|
+
19548 299
|
118
|
+
19634 280
|
119
|
+
19673 330
|
120
|
+
19689 359
|
121
|
+
19909 300
|
122
|
+
20054 280
|
123
|
+
20107 359
|
124
|
+
20248 319
|
125
|
+
20580 320
|
126
|
+
20744 299
|
127
|
+
20806 320
|
128
|
+
20954 359
|
129
|
+
21034 299
|
130
|
+
21187 309
|
131
|
+
21303 429
|
132
|
+
21450 310
|
133
|
+
21626 340
|
134
|
+
21931 369
|
135
|
+
21950 320
|
136
|
+
22359 370
|
137
|
+
22626 339
|
138
|
+
22638 359
|
139
|
+
22772 359
|
140
|
+
22885 350
|
141
|
+
22897 339
|
142
|
+
23114 320
|
143
|
+
23242 310
|
144
|
+
23428 359
|
145
|
+
23452 359
|
146
|
+
23495 379
|
147
|
+
23499 370
|
148
|
+
23558 350
|
149
|
+
23744 380
|
150
|
+
23881 400
|
151
|
+
23945 330
|
152
|
+
24361 350
|
153
|
+
24501 370
|
154
|
+
24642 369
|
155
|
+
24672 360
|
156
|
+
24694 429
|
157
|
+
24706 359
|
158
|
+
24931 370
|
159
|
+
25065 400
|
160
|
+
25140 330
|
161
|
+
25402 349
|
162
|
+
25702 380
|
163
|
+
25743 339
|
164
|
+
27139 410
|
165
|
+
27316 449
|
166
|
+
27333 430
|
167
|
+
27414 349
|
168
|
+
27771 390
|
169
|
+
27798 399
|
170
|
+
28583 370
|
171
|
+
28906 410
|
172
|
+
29025 490
|
173
|
+
29209 399
|
174
|
+
29272 460
|
175
|
+
29273 460
|
176
|
+
29359 420
|
177
|
+
29577 459
|
178
|
+
30886 500
|
179
|
+
31170 459
|
180
|
+
31593 500
|
181
|
+
32460 510
|
182
|
+
32486 509
|
183
|
+
32630 509
|
184
|
+
33010 480
|
185
|
+
33137 510
|
186
|
+
33142 470
|
187
|
+
33739 530
|
188
|
+
39880 629
|
189
|
+
39940 619
|
190
|
+
42952 659
|
191
|
+
43227 619
|
192
|
+
52855 780
|