citrus 1.8.0 → 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README +53 -46
- data/benchmark/after.dat +192 -0
- data/benchmark/before.dat +192 -0
- data/benchmark/master.dat +192 -0
- data/doc/background.markdown +9 -10
- data/doc/example.markdown +24 -15
- data/doc/syntax.markdown +20 -21
- data/lib/citrus.rb +208 -178
- data/lib/citrus/debug.rb +34 -4
- data/test/file_test.rb +12 -12
- data/test/helper.rb +27 -5
- data/test/match_test.rb +18 -34
- data/test/parse_error_test.rb +56 -0
- data/test/terminal_test.rb +56 -0
- metadata +10 -7
- data/test/expression_test.rb +0 -29
- data/test/fixed_width_test.rb +0 -37
data/README
CHANGED
```diff
@@ -56,9 +56,9 @@ A [Rule](api/classes/Citrus/Rule.html) is an object that specifies some matching
 behavior on a string. There are two types of rules: terminals and non-terminals.
 Terminals can be either Ruby strings or regular expressions that specify some
 input to match. For example, a terminal created from the string "end" would
-match any sequence of the characters "e", "n", and "d", in that order.
-
-
+match any sequence of the characters "e", "n", and "d", in that order. Terminals
+created from regular expressions may match any sequence of characters that can
+be generated from that expression.
 
 Non-terminals are rules that may contain other rules but do not themselves match
 directly on the input. For example, a Repeat is a non-terminal that may contain
```
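The README hunk above distinguishes string terminals (literal characters, in order) from regular-expression terminals (anything the expression can generate). This plain-Ruby sketch illustrates that distinction with the standard library's `StringScanner`. `match_terminal` is a hypothetical helper made up for illustration. It is not a Citrus API and only mimics the described behavior of matching at the current input position.

```ruby
require 'strscan'

# Hypothetical helper (not a Citrus API): match a terminal, given as a String
# or a Regexp, at the start of the input and return the matched text or nil.
def match_terminal(terminal, input)
  # A String terminal is escaped so its characters are matched literally, in order.
  pattern = terminal.is_a?(Regexp) ? terminal : Regexp.new(Regexp.escape(terminal))
  # StringScanner#scan only succeeds when the pattern matches at the current position.
  StringScanner.new(input).scan(pattern)
end

match_terminal("end", "endless")   # => "end"
match_terminal("end", "blend")     # => nil
match_terminal(/[0-9]+/, "123abc") # => "123"
```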
```diff
@@ -85,10 +85,10 @@ similar to Ruby's super keyword.
 ## Matches
 
 Matches are created by rule objects when they match on the input. A
-[Match](api/classes/Citrus/Match.html)
-[String](http://ruby-doc.org/core/classes/String.html) with some extra
-information attached such as the name(s) of the rule(s) which
-
+[Match](api/classes/Citrus/Match.html) is actually a
+[String](http://ruby-doc.org/core/classes/String.html) object with some extra
+information attached such as the name(s) of the rule(s) from which it was
+generated and any submatches it may contain.
 
 During a parse, matches are arranged in a tree structure where any match may
 contain any number of other matches. This structure is determined by the way in
```
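The revised text says a match "is actually a String object with some extra information attached". This sketch shows what that shape could look like in Ruby. `SketchMatch` is a hypothetical class invented for illustration and is not `Citrus::Match`. It keeps every String method and adds rule names plus a list of submatches, which together form the tree described above.

```ruby
# Hypothetical stand-in (not Citrus::Match): a String subclass that also
# records the names of the rules that produced it and any submatches.
class SketchMatch < String
  attr_reader :names, :matches

  def initialize(text, names = [], matches = [])
    super(text)        # the match's text is the string itself
    @names = names     # rule name(s) this match was generated from
    @matches = matches # child matches, forming a tree
  end
end

digits = SketchMatch.new("42", [:digits])
space  = SketchMatch.new(" ", [:space])
number = SketchMatch.new("42 ", [:number], [digits, space])

number.strip        # => "42" (ordinary String methods still work)
number.names        # => [:number]
number.matches.size # => 2
```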
```diff
@@ -97,9 +97,8 @@ match that is created from a non-terminal rule that contains several other
 terminals will likewise contain several matches, one for each terminal.
 
 Match objects may be extended with semantic information in the form of methods.
-These methods
-
-and any submatches.
+These methods should provide various interpretations for the semantic value of a
+match.
 
 
 # Syntax
```
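The new wording says semantic methods "should provide various interpretations for the semantic value of a match". This sketch uses Ruby's `Object#extend` to attach two such interpretations to a single string. The module `TimeValue` and its methods are hypothetical. The sketch assumes match extension behaves like `Object#extend`, meaning the methods are added to that one object only.

```ruby
# Hypothetical module of semantic methods; each method offers a different
# interpretation of the same matched text.
module TimeValue
  # Interpret a match like "1:30" as a number of minutes.
  def minutes
    hours, mins = split(":").map(&:to_i)
    hours * 60 + mins
  end

  # Interpret the same match as an [hours, minutes] pair.
  def to_pair
    split(":").map(&:to_i)
  end
end

match = String.new("1:30") # a plain String standing in for a match object
match.extend(TimeValue)    # Object#extend adds the methods to this one object only
match.minutes # => 90
match.to_pair # => [1, 30]
```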
```diff
@@ -125,8 +124,7 @@ compatibility with other parsing expression implementations.
 [\x00-\xFF] # match any octet
 . # match anything, even new lines
 
-See [
-[Expression](api/classes/Citrus/Expression.html) for more information.
+See [Terminal](api/classes/Citrus/Terminal.html) for more information.
 
 ## Repetition
 
```
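Ruby's own regular expressions behave differently from the Citrus dot described in the hunk above ("match anything, even new lines"). In Ruby, a bare `.` does not match a newline unless the `/m` flag is given. The comparison below covers plain Ruby only. It makes no claim about how Citrus implements its dot internally.

```ruby
# Ruby's "." skips newlines unless the /m flag is given, so /./m is the
# closest plain-Ruby analogue of a dot that matches "anything, even new lines".
"\n" =~ /./  # => nil
"\n" =~ /./m # => 0
```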
```diff
@@ -212,25 +210,25 @@ See [Label](api/classes/Citrus/Label.html) for more information.
 The following table contains a list of all Citrus operators and their
 precedence. A higher precedence indicates tighter binding.
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+Operator | Name | Precedence
+----------- | ------------------------- | ----------
+'' | String (single quoted) | 6
+"" | String (double quoted) | 6
+[] | Character class | 6
+. | Dot (any character) | 6
+// | Regular expression | 6
+() | Grouping | 6
+* | Repetition (arbitrary) | 5
++ | Repetition (one or more) | 5
+? | Repetition (zero or one) | 5
+& | And predicate | 4
+! | Not predicate | 4
+~ | But predicate | 4
+: | Label | 4
+<> | Extension (module name) | 3
+{} | Extension (literal) | 3
+e1 e2 | Sequence | 2
+e1 | e2 | Ordered choice | 1
 
 
 # Example
```
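The precedence table added above can be read as a lookup where the higher number binds tighter. This sketch copies the table's values into a Ruby hash and uses them to explain two groupings. For example, `a b | c` reads as `(a b) | c` because a sequence (2) binds tighter than an ordered choice (1). `PRECEDENCE` and `binds_tighter?` are hypothetical names used only for this illustration.

```ruby
# Precedence values copied from the table above (higher binds tighter).
PRECEDENCE = {
  "''" => 6, '""' => 6, "[]" => 6, "." => 6, "//" => 6, "()" => 6,
  "*" => 5, "+" => 5, "?" => 5,
  "&" => 4, "!" => 4, "~" => 4, ":" => 4,
  "<>" => 3, "{}" => 3,
  "e1 e2" => 2,
  "e1 | e2" => 1
}.freeze

# True when operator a binds more tightly than operator b.
def binds_tighter?(a, b)
  PRECEDENCE.fetch(a) > PRECEDENCE.fetch(b)
end

binds_tighter?("e1 e2", "e1 | e2") # => true, so `a b | c` groups as `(a b) | c`
binds_tighter?("*", "e1 e2")       # => true, so `a b*` groups as `a (b*)`
```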
```diff
@@ -282,9 +280,9 @@ Submatches are created whenever a rule contains another rule. For example, in
 the grammar above the number rule matches a string of digits followed by white
 space. Thus, a match generated by the number rule will contain two submatches.
 
-We can
-matches when they are created
-
+We can define methods inside a set of curly braces that will be used to extend
+matches when they are created. This works in similar fashion to using Ruby's
+blocks. Let's extend the `Addition` grammar using this technique.
 
 grammar Addition
 rule additive
```
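The README compares curly-brace extensions to Ruby blocks. One plausible way to turn a block of `def`s into something that can extend a match is `Module.new` with a block, which evaluates the block as a module body. This sketch assumes that approach purely for illustration. It is not taken from Citrus's source, and `extension_from` is a hypothetical helper.

```ruby
# Hypothetical sketch (not Citrus's implementation): Module.new evaluates the
# block as a module body, so a "curly brace" extension can become a module
# that is then used to extend each match.
def extension_from(&block)
  Module.new(&block)
end

NumberValue = extension_from do
  # Same idea as the number rule's block: text stripped and converted to an integer.
  def value
    strip.to_i
  end
end

match = String.new("7 ").extend(NumberValue) # Object#extend returns the receiver
match.value # => 7
```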
```diff
@@ -318,25 +316,27 @@ on all match objects that result from matches of those particular rules. It's
 easiest to explain what is going on here by starting with the lowest level
 block, which is defined within the number rule.
 
-The semantic block associated with the number rule defines one method, value
+The semantic block associated with the number rule defines one method, `value`.
 Inside this method, we can see that the value of a number match is determined to
-be its text value, stripped of white space and converted to an integer.
-that matches are simply strings, so the `strip`
-
-
-
-
-
-
+be its text value, stripped of white space and converted to an integer.
+[Remember](background.html) that matches are simply strings, so the `strip`
+method in this case is actually
+[String#strip](http://ruby-doc.org/core/classes/String.html#M000820).
+
+The `additive` rule also extends its matches with a `value` method. Notice the
+use of the `term` label within the rule definition. This label allows the match
+that is created by either the additive or the number rule to be retrieved using
+the `term` label. The value of an additive is determined to be the values of its
 `number` and `term` matches added together using Ruby's addition operator.
 
 Since additive is the first rule defined in the grammar, any match that results
 from parsing a string with this grammar will have a `value` method that can be
 used to recursively calculate the collective value of the entire match tree.
 
-To give it a try, save the code for the Addition grammar in a file called
-addition.citrus. Next, assuming you have the Citrus
-
+To give it a try, save the code for the `Addition` grammar in a file called
+addition.citrus. Next, assuming you have the Citrus
+[gem](https://rubygems.org/gems/citrus) installed, try the following sequence of
+commands in a terminal.
 
 $ irb
 > require 'citrus'
```
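The hunk above explains that a number's `value` is its stripped text as an integer, and that an additive's `value` is `number.value + term.value`, computed recursively over the match tree. The plain-Ruby mirror below makes that concrete without the gem. It assumes an additive is a number, then `+`, then either another additive or a number (the `term`). `NumberNode`, `AdditiveNode`, `parse_number`, `parse_additive` and `evaluate` are all hypothetical and much simpler than anything Citrus generates.

```ruby
require 'strscan'

# Leaf node: value is the matched text, stripped and converted to an integer.
NumberNode = Struct.new(:text) do
  def value
    text.strip.to_i
  end
end

# Inner node: value is the number's value plus the term's value (recursive).
AdditiveNode = Struct.new(:number, :term) do
  def value
    number.value + term.value
  end
end

# number: digits followed by optional spaces or tabs
def parse_number(scanner)
  text = scanner.scan(/[0-9]+[ \t]*/)
  raise ArgumentError, "expected a number at #{scanner.pos}" unless text
  NumberNode.new(text)
end

# additive: number "+" term, where term is another additive or a number
def parse_additive(scanner)
  number = parse_number(scanner)
  return number unless scanner.scan(/\+[ \t]*/)
  AdditiveNode.new(number, parse_additive(scanner))
end

def evaluate(source)
  scanner = StringScanner.new(source)
  tree = parse_additive(scanner)
  raise ArgumentError, "unexpected input at #{scanner.pos}" unless scanner.eos?
  tree.value
end

evaluate("1 + 2 + 3") # => 6
```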
```diff
@@ -350,6 +350,13 @@ following sequence of commands in a terminal.
 
 Congratulations! You just ran your first piece of Citrus code.
 
+One interesting thing to notice about the above sequence of commands is the
+return value of [Citrus#load](api/classes/Citrus.html#M000003). When you use
+`Citrus.load` to
+load a grammar file (and likewise [Citrus#eval](api/classes/Citrus.html#M000004) to evaluate
+a raw string of grammar code), the return value is an array of all the grammars
+present in that file.
+
 Take a look at
 [examples/calc.citrus](http://github.com/mjijackson/citrus/blob/master/examples/calc.citrus)
 for an example of a calculator that is able to parse and evaluate more complex
```
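`Citrus.load` returns an array because one file of grammar code can declare more than one grammar. As a rough illustration only, the hypothetical `grammar_names` helper below lists the `grammar` declarations in a source string with a regular expression, in the order they appear. It is not how Citrus parses grammar files, and the two small grammars in the string are made-up examples.

```ruby
# Hypothetical helper (not part of Citrus): list the grammar names declared in
# a string of grammar code, in the order they appear.
def grammar_names(source)
  source.scan(/^\s*grammar\s+([A-Z]\w*(?:::[A-Z]\w*)*)/).flatten
end

source = <<~CITRUS
  grammar Addition
    rule number
      [0-9]+
    end
  end

  grammar Digits
    rule digit
      [0-9]
    end
  end
CITRUS

grammar_names(source) # => ["Addition", "Digits"]
```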
data/benchmark/after.dat
ADDED
```diff
@@ -0,0 +1,192 @@
+12 0
+30 0
+35 0
+1749 9
+1962 9
+2383 10
+3728 19
+3919 19
+3952 19
+3995 30
+4063 19
+4325 20
+4527 29
+4570 20
+4607 20
+4654 19
+4679 19
+4774 29
+4968 20
+5059 30
+5383 19
+5915 29
+6109 30
+6122 20
+6218 39
+6332 49
+6681 29
+7440 50
+7530 30
+7605 40
+8155 60
+8402 49
+8420 60
+8617 50
+8635 69
+8841 60
+8843 40
+8852 70
+9151 59
+9271 50
+9521 70
+9525 49
+9566 40
+9584 60
+9642 40
+10138 40
+10181 70
+10225 49
+10338 49
+10449 70
+10629 60
+10763 69
+10817 50
+11059 50
+11062 70
+11215 60
+11698 50
+11891 89
+11945 89
+11956 49
+12018 50
+12053 89
+12178 69
+12283 70
+12326 70
+12430 79
+12438 80
+12572 60
+12638 70
+12687 80
+12703 80
+12896 89
+12922 90
+12996 70
+13137 80
+13211 80
+13462 69
+13477 59
+13576 80
+13577 70
+13584 89
+13605 80
+13631 80
+14216 99
+14237 70
+14260 89
+14367 80
+14371 80
+14741 110
+14893 89
+14910 89
+14917 70
+14977 70
+15049 89
+15191 89
+15382 89
+15618 100
+15623 89
+15629 70
+15856 100
+16496 100
+16512 109
+16956 110
+17074 99
+17237 89
+17371 80
+17568 120
+17945 80
+18147 100
+18343 99
+18417 89
+18823 89
+18970 109
+19285 89
+19333 120
+19500 109
+19548 109
+19634 120
+19673 120
+19689 109
+19909 109
+20054 110
+20107 120
+20248 109
+20580 120
+20744 109
+20806 120
+20954 120
+21034 120
+21187 100
+21303 119
+21450 129
+21626 139
+21931 129
+21950 100
+22359 119
+22626 119
+22638 120
+22772 130
+22885 119
+22897 160
+23114 150
+23242 120
+23428 150
+23452 110
+23495 159
+23499 120
+23558 140
+23744 120
+23881 120
+23945 130
+24361 149
+24501 119
+24642 139
+24672 139
+24694 160
+24706 150
+24931 129
+25065 130
+25140 130
+25402 160
+25702 150
+25743 140
+27139 160
+27316 129
+27333 149
+27414 170
+27771 139
+27798 139
+28583 150
+28906 160
+29025 179
+29209 160
+29272 179
+29273 150
+29359 140
+29577 160
+30886 189
+31170 169
+31593 199
+32460 230
+32486 180
+32630 179
+33010 199
+33137 189
+33142 189
+33739 180
+39880 229
+39940 220
+42952 269
+43227 240
+52855 290
```
```diff
@@ -0,0 +1,192 @@
+12 0
+30 0
+35 0
+1749 19
+1962 9
+2383 20
+3728 29
+3919 30
+3952 29
+3995 30
+4063 20
+4325 30
+4527 30
+4570 50
+4607 49
+4654 49
+4679 29
+4774 30
+4968 80
+5059 89
+5383 39
+5915 59
+6109 40
+6122 80
+6218 99
+6332 39
+6681 89
+7440 89
+7530 70
+7605 39
+8155 89
+8402 60
+8420 99
+8617 89
+8635 110
+8841 99
+8843 89
+8852 89
+9151 70
+9271 99
+9521 89
+9525 89
+9566 80
+9584 80
+9642 59
+10138 119
+10181 89
+10225 80
+10338 99
+10449 99
+10629 109
+10763 109
+10817 120
+11059 100
+11062 99
+11215 100
+11698 79
+11891 130
+11945 109
+11956 109
+12018 129
+12053 100
+12178 130
+12283 110
+12326 119
+12430 99
+12438 109
+12572 109
+12638 120
+12687 120
+12703 110
+12896 109
+12922 119
+12996 99
+13137 109
+13211 140
+13462 120
+13477 120
+13576 130
+13577 109
+13584 109
+13605 119
+13631 120
+14216 129
+14237 119
+14260 129
+14367 120
+14371 140
+14741 140
+14893 120
+14910 120
+14917 140
+14977 100
+15049 140
+15191 130
+15382 129
+15618 130
+15623 139
+15629 129
+15856 140
+16496 120
+16512 149
+16956 160
+17074 130
+17237 150
+17371 149
+17568 150
+17945 160
+18147 140
+18343 140
+18417 160
+18823 189
+18970 140
+19285 149
+19333 169
+19500 149
+19548 170
+19634 160
+19673 160
+19689 200
+19909 169
+20054 150
+20107 199
+20248 169
+20580 179
+20744 160
+20806 170
+20954 179
+21034 160
+21187 170
+21303 209
+21450 169
+21626 179
+21931 160
+21950 179
+22359 179
+22626 190
+22638 200
+22772 200
+22885 199
+22897 189
+23114 190
+23242 170
+23428 210
+23452 190
+23495 209
+23499 209
+23558 220
+23744 200
+23881 199
+23945 189
+24361 200
+24501 199
+24642 219
+24672 199
+24694 190
+24706 199
+24931 210
+25065 190
+25140 210
+25402 199
+25702 199
+25743 219
+27139 240
+27316 239
+27333 250
+27414 209
+27771 210
+27798 229
+28583 240
+28906 240
+29025 259
+29209 219
+29272 249
+29273 250
+29359 230
+29577 250
+30886 279
+31170 259
+31593 259
+32460 300
+32486 269
+32630 280
+33010 269
+33137 270
+33142 259
+33739 289
+39880 349
+39940 359
+42952 359
+43227 360
+52855 450
```