citrus 1.8.0 → 2.0.0
- data/README +53 -46
- data/benchmark/after.dat +192 -0
- data/benchmark/before.dat +192 -0
- data/benchmark/master.dat +192 -0
- data/doc/background.markdown +9 -10
- data/doc/example.markdown +24 -15
- data/doc/syntax.markdown +20 -21
- data/lib/citrus.rb +208 -178
- data/lib/citrus/debug.rb +34 -4
- data/test/file_test.rb +12 -12
- data/test/helper.rb +27 -5
- data/test/match_test.rb +18 -34
- data/test/parse_error_test.rb +56 -0
- data/test/terminal_test.rb +56 -0
- metadata +10 -7
- data/test/expression_test.rb +0 -29
- data/test/fixed_width_test.rb +0 -37
data/README
CHANGED
@@ -56,9 +56,9 @@ A [Rule](api/classes/Citrus/Rule.html) is an object that specifies some matching
 behavior on a string. There are two types of rules: terminals and non-terminals.
 Terminals can be either Ruby strings or regular expressions that specify some
 input to match. For example, a terminal created from the string "end" would
-match any sequence of the characters "e", "n", and "d", in that order.
-
-
+match any sequence of the characters "e", "n", and "d", in that order. Terminals
+created from regular expressions may match any sequence of characters that can
+be generated from that expression.
 
 Non-terminals are rules that may contain other rules but do not themselves match
 directly on the input. For example, a Repeat is a non-terminal that may contain
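The distinction this hunk draws between string terminals and regular-expression terminals can be sketched in plain Ruby. This is only an illustration: `match_terminal` is a hypothetical helper, not part of the Citrus API, and it uses Ruby's standard `StringScanner` rather than anything Citrus itself is known to use.

```ruby
require 'strscan'

# Hypothetical helper (not a Citrus method): try to match a terminal,
# given as a String or a Regexp, at the scanner's current position.
def match_terminal(rule, scanner)
  # A string terminal matches its exact characters, so escape it first.
  pattern = rule.is_a?(Regexp) ? rule : Regexp.new(Regexp.escape(rule))
  scanner.scan(pattern) # the matched text, or nil when the terminal fails
end

scanner = StringScanner.new("end42")
keyword = match_terminal("end", scanner)     # string terminal
digits  = match_terminal(/[0-9]+/, scanner)  # regular-expression terminal

puts keyword
puts digits
```

The string terminal can only ever produce the text "end", while the regular-expression terminal accepts any run of digits found at the current position.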
@@ -85,10 +85,10 @@ similar to Ruby's super keyword.
 ## Matches
 
 Matches are created by rule objects when they match on the input. A
-[Match](api/classes/Citrus/Match.html)
-[String](http://ruby-doc.org/core/classes/String.html) with some extra
-information attached such as the name(s) of the rule(s) which
-
+[Match](api/classes/Citrus/Match.html) is actually a
+[String](http://ruby-doc.org/core/classes/String.html) object with some extra
+information attached such as the name(s) of the rule(s) from which it was
+generated and any submatches it may contain.
 
 During a parse, matches are arranged in a tree structure where any match may
 contain any number of other matches. This structure is determined by the way in
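The idea of a match being a String with extra information attached can be sketched as follows. `SketchMatch` is a hypothetical stand-in written for this illustration; the real `Citrus::Match` class may be built differently.

```ruby
# Hypothetical stand-in for a match object: a String subclass that also
# records rule names and submatches. Not the actual Citrus::Match class.
class SketchMatch < String
  attr_reader :names, :submatches

  def initialize(text, names = [], submatches = [])
    super(text)
    @names = names
    @submatches = submatches
  end
end

digits = SketchMatch.new("42", [:digits])
space  = SketchMatch.new(" ", [:space])
number = SketchMatch.new("42 ", [:number], [digits, space])

puts number.strip             # ordinary String methods still work
puts number.names.inspect     # the rule name(s) attached to the match
puts number.submatches.length # the matches nested inside it
```

Because the object is still a String, it can be passed to any code that expects text, while the tree of submatches stays available for interpretation.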
@@ -97,9 +97,8 @@ match that is created from a non-terminal rule that contains several other
 terminals will likewise contain several matches, one for each terminal.
 
 Match objects may be extended with semantic information in the form of methods.
-These methods
-
-and any submatches.
+These methods should provide various interpretations for the semantic value of a
+match.
 
 
 # Syntax
@@ -125,8 +124,7 @@ compatibility with other parsing expression implementations.
 [\x00-\xFF] # match any octet
 . # match anything, even new lines
 
-See [
-[Expression](api/classes/Citrus/Expression.html) for more information.
+See [Terminal](api/classes/Citrus/Terminal.html) for more information.
 
 ## Repetition
 
@@ -212,25 +210,25 @@ See [Label](api/classes/Citrus/Label.html) for more information.
 The following table contains a list of all Citrus operators and their
 precedence. A higher precedence indicates tighter binding.
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+Operator    | Name                      | Precedence
+----------- | ------------------------- | ----------
+''          | String (single quoted)    | 6
+""          | String (double quoted)    | 6
+[]          | Character class           | 6
+.           | Dot (any character)       | 6
+//          | Regular expression        | 6
+()          | Grouping                  | 6
+*           | Repetition (arbitrary)    | 5
++           | Repetition (one or more)  | 5
+?           | Repetition (zero or one)  | 5
+&           | And predicate             | 4
+!           | Not predicate             | 4
+~           | But predicate             | 4
+:           | Label                     | 4
+<>          | Extension (module name)   | 3
+{}          | Extension (literal)       | 3
+e1 e2       | Sequence                  | 2
+e1 | e2     | Ordered choice            | 1
 
 
 # Example
@@ -282,9 +280,9 @@ Submatches are created whenever a rule contains another rule. For example, in
 the grammar above the number rule matches a string of digits followed by white
 space. Thus, a match generated by the number rule will contain two submatches.
 
-We can
-matches when they are created
-
+We can define methods inside a set of curly braces that will be used to extend
+matches when they are created. This works in similar fashion to using Ruby's
+blocks. Let's extend the `Addition` grammar using this technique.
 
 grammar Addition
 rule additive
@@ -318,25 +316,27 @@ on all match objects that result from matches of those particular rules. It's
 easiest to explain what is going on here by starting with the lowest level
 block, which is defined within the number rule.
 
-The semantic block associated with the number rule defines one method, value
+The semantic block associated with the number rule defines one method, `value`.
 Inside this method, we can see that the value of a number match is determined to
-be its text value, stripped of white space and converted to an integer.
-that matches are simply strings, so the `strip`
-
-
-
-
-
-
+be its text value, stripped of white space and converted to an integer.
+[Remember](background.html) that matches are simply strings, so the `strip`
+method in this case is actually
+[String#strip](http://ruby-doc.org/core/classes/String.html#M000820).
+
+The `additive` rule also extends its matches with a `value` method. Notice the
+use of the `term` label within the rule definition. This label allows the match
+that is created by either the additive or the number rule to be retrieved using
+the `term` label. The value of an additive is determined to be the values of its
 `number` and `term` matches added together using Ruby's addition operator.
 
 Since additive is the first rule defined in the grammar, any match that results
 from parsing a string with this grammar will have a `value` method that can be
 used to recursively calculate the collective value of the entire match tree.
 
-To give it a try, save the code for the Addition grammar in a file called
-addition.citrus. Next, assuming you have the Citrus
-
+To give it a try, save the code for the `Addition` grammar in a file called
+addition.citrus. Next, assuming you have the Citrus
+[gem](https://rubygems.org/gems/citrus) installed, try the following sequence of
+commands in a terminal.
 
 $ irb
 > require 'citrus'
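The recursive `value` calculation described in this hunk can be mimicked in plain Ruby without the gem. Everything named below (`NumberValue`, `SketchAdditive`, `number_match`) is hypothetical and exists only for this sketch; in Citrus the methods come from the semantic blocks in the grammar and the parts are retrieved through labels, not struct members.

```ruby
# Hypothetical module standing in for the number rule's semantic block.
module NumberValue
  def value
    strip.to_i # String#strip, then String#to_i
  end
end

# Hypothetical stand-in for an additive match: it holds a number and a
# term as plain attributes instead of labelled submatches.
SketchAdditive = Struct.new(:number, :term) do
  def value
    number.value + term.value
  end
end

# Build a string that has been extended with the value method.
def number_match(text)
  String.new(text).extend(NumberValue)
end

# Roughly the shape of "1 + 2 + 3": the term of an additive may itself
# be an additive, so value recurses through the tree.
sum = SketchAdditive.new(
  number_match("1 "),
  SketchAdditive.new(number_match("2 "), number_match("3"))
)

puts sum.value
```

Calling `value` on the outermost object walks the whole tree, which mirrors how the first rule of the grammar yields the value of the entire parse.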
@@ -350,6 +350,13 @@ following sequence of commands in a terminal.
 
 Congratulations! You just ran your first piece of Citrus code.
 
+One interesting thing to notice about the above sequence of commands is the
+return value of [Citrus#load](api/classes/Citrus.html#M000003). When you use
+`Citrus.load` to
+load a grammar file (and likewise [Citrus#eval](api/classes/Citrus.html#M000004) to evaluate
+a raw string of grammar code), the return value is an array of all the grammars
+present in that file.
+
 Take a look at
 [examples/calc.citrus](http://github.com/mjijackson/citrus/blob/master/examples/calc.citrus)
 for an example of a calculator that is able to parse and evaluate more complex
data/benchmark/after.dat
ADDED
@@ -0,0 +1,192 @@
+12 0
+30 0
+35 0
+1749 9
+1962 9
+2383 10
+3728 19
+3919 19
+3952 19
+3995 30
+4063 19
+4325 20
+4527 29
+4570 20
+4607 20
+4654 19
+4679 19
+4774 29
+4968 20
+5059 30
+5383 19
+5915 29
+6109 30
+6122 20
+6218 39
+6332 49
+6681 29
+7440 50
+7530 30
+7605 40
+8155 60
+8402 49
+8420 60
+8617 50
+8635 69
+8841 60
+8843 40
+8852 70
+9151 59
+9271 50
+9521 70
+9525 49
+9566 40
+9584 60
+9642 40
+10138 40
+10181 70
+10225 49
+10338 49
+10449 70
+10629 60
+10763 69
+10817 50
+11059 50
+11062 70
+11215 60
+11698 50
+11891 89
+11945 89
+11956 49
+12018 50
+12053 89
+12178 69
+12283 70
+12326 70
+12430 79
+12438 80
+12572 60
+12638 70
+12687 80
+12703 80
+12896 89
+12922 90
+12996 70
+13137 80
+13211 80
+13462 69
+13477 59
+13576 80
+13577 70
+13584 89
+13605 80
+13631 80
+14216 99
+14237 70
+14260 89
+14367 80
+14371 80
+14741 110
+14893 89
+14910 89
+14917 70
+14977 70
+15049 89
+15191 89
+15382 89
+15618 100
+15623 89
+15629 70
+15856 100
+16496 100
+16512 109
+16956 110
+17074 99
+17237 89
+17371 80
+17568 120
+17945 80
+18147 100
+18343 99
+18417 89
+18823 89
+18970 109
+19285 89
+19333 120
+19500 109
+19548 109
+19634 120
+19673 120
+19689 109
+19909 109
+20054 110
+20107 120
+20248 109
+20580 120
+20744 109
+20806 120
+20954 120
+21034 120
+21187 100
+21303 119
+21450 129
+21626 139
+21931 129
+21950 100
+22359 119
+22626 119
+22638 120
+22772 130
+22885 119
+22897 160
+23114 150
+23242 120
+23428 150
+23452 110
+23495 159
+23499 120
+23558 140
+23744 120
+23881 120
+23945 130
+24361 149
+24501 119
+24642 139
+24672 139
+24694 160
+24706 150
+24931 129
+25065 130
+25140 130
+25402 160
+25702 150
+25743 140
+27139 160
+27316 129
+27333 149
+27414 170
+27771 139
+27798 139
+28583 150
+28906 160
+29025 179
+29209 160
+29272 179
+29273 150
+29359 140
+29577 160
+30886 189
+31170 169
+31593 199
+32460 230
+32486 180
+32630 179
+33010 199
+33137 189
+33142 189
+33739 180
+39880 229
+39940 220
+42952 269
+43227 240
+52855 290
@@ -0,0 +1,192 @@
+12 0
+30 0
+35 0
+1749 19
+1962 9
+2383 20
+3728 29
+3919 30
+3952 29
+3995 30
+4063 20
+4325 30
+4527 30
+4570 50
+4607 49
+4654 49
+4679 29
+4774 30
+4968 80
+5059 89
+5383 39
+5915 59
+6109 40
+6122 80
+6218 99
+6332 39
+6681 89
+7440 89
+7530 70
+7605 39
+8155 89
+8402 60
+8420 99
+8617 89
+8635 110
+8841 99
+8843 89
+8852 89
+9151 70
+9271 99
+9521 89
+9525 89
+9566 80
+9584 80
+9642 59
+10138 119
+10181 89
+10225 80
+10338 99
+10449 99
+10629 109
+10763 109
+10817 120
+11059 100
+11062 99
+11215 100
+11698 79
+11891 130
+11945 109
+11956 109
+12018 129
+12053 100
+12178 130
+12283 110
+12326 119
+12430 99
+12438 109
+12572 109
+12638 120
+12687 120
+12703 110
+12896 109
+12922 119
+12996 99
+13137 109
+13211 140
+13462 120
+13477 120
+13576 130
+13577 109
+13584 109
+13605 119
+13631 120
+14216 129
+14237 119
+14260 129
+14367 120
+14371 140
+14741 140
+14893 120
+14910 120
+14917 140
+14977 100
+15049 140
+15191 130
+15382 129
+15618 130
+15623 139
+15629 129
+15856 140
+16496 120
+16512 149
+16956 160
+17074 130
+17237 150
+17371 149
+17568 150
+17945 160
+18147 140
+18343 140
+18417 160
+18823 189
+18970 140
+19285 149
+19333 169
+19500 149
+19548 170
+19634 160
+19673 160
+19689 200
+19909 169
+20054 150
+20107 199
+20248 169
+20580 179
+20744 160
+20806 170
+20954 179
+21034 160
+21187 170
+21303 209
+21450 169
+21626 179
+21931 160
+21950 179
+22359 179
+22626 190
+22638 200
+22772 200
+22885 199
+22897 189
+23114 190
+23242 170
+23428 210
+23452 190
+23495 209
+23499 209
+23558 220
+23744 200
+23881 199
+23945 189
+24361 200
+24501 199
+24642 219
+24672 199
+24694 190
+24706 199
+24931 210
+25065 190
+25140 210
+25402 199
+25702 199
+25743 219
+27139 240
+27316 239
+27333 250
+27414 209
+27771 210
+27798 229
+28583 240
+28906 240
+29025 259
+29209 219
+29272 249
+29273 250
+29359 230
+29577 250
+30886 279
+31170 259
+31593 259
+32460 300
+32486 269
+32630 280
+33010 269
+33137 270
+33142 259
+33739 289
+39880 349
+39940 359
+42952 359
+43227 360
+52855 450
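Two-column data like the benchmark hunks above can be compared with a few lines of Ruby. This is a rough sketch under stated assumptions: the diff does not say what the columns mean, so the code only treats each row as a pair of integers and sums the second column. The sample rows are rows 4 to 6 of the two hunks, and `parse_rows` is a hypothetical helper.

```ruby
# Hypothetical helper: parse whitespace-separated "x y" rows into pairs
# of integers. The meaning of the columns is not stated in the diff.
def parse_rows(text)
  text.each_line.map { |line| line.split.map { |field| Integer(field) } }
end

# Rows 4 to 6 of the first benchmark hunk.
first_series = parse_rows(<<~DATA)
  1749 9
  1962 9
  2383 10
DATA

# Rows 4 to 6 of the second benchmark hunk.
second_series = parse_rows(<<~DATA)
  1749 19
  1962 9
  2383 20
DATA

# Sum the second column of each series for a coarse comparison.
first_total  = first_series.sum { |_, y| y }
second_total = second_series.sum { |_, y| y }

puts "first: #{first_total}, second: #{second_total}"
```

The same helper can be pointed at the full files with `File.read`, since the first column is identical in both hunks and the rows line up one to one.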