citrus 2.4.0 → 2.4.1
- data/CHANGES +6 -0
- data/{README → README.md} +89 -86
- data/citrus.gemspec +2 -2
- data/doc/syntax.markdown +4 -4
- data/lib/citrus/file.rb +1 -1
- data/lib/citrus/grammars.rb +2 -1
- data/lib/citrus/version.rb +1 -1
- data/test/_files/{super.citrus → file1.citrus} +0 -0
- data/test/_files/{super2.citrus → file2.citrus} +0 -0
- data/test/_files/file3.citrus +9 -0
- data/test/_files/rule5.citrus +1 -0
- data/test/file_test.rb +1 -1
- metadata +31 -62
- data/benchmark/after.dat +0 -192
- data/benchmark/before.dat +0 -192
- data/test/alias_test.rbc +0 -1491
- data/test/and_predicate_test.rbc +0 -1327
- data/test/but_predicate_test.rbc +0 -1398
- data/test/choice_test.rbc +0 -1278
- data/test/extension_test.rbc +0 -1121
- data/test/file_test.rbc +0 -21465
- data/test/grammar_test.rbc +0 -3490
- data/test/grammars/calc_test.rbc +0 -2268
- data/test/grammars/ipaddress_test.rbc +0 -515
- data/test/grammars/ipv4address_test.rbc +0 -1112
- data/test/grammars/ipv6address_test.rbc +0 -1195
- data/test/helper.rbc +0 -1046
- data/test/input_test.rbc +0 -1775
- data/test/label_test.rbc +0 -692
- data/test/match_test.rbc +0 -2415
- data/test/memoized_input_test.rbc +0 -1607
- data/test/multibyte_test.rbc +0 -1597
- data/test/not_predicate_test.rbc +0 -1327
- data/test/parse_error_test.rbc +0 -1253
- data/test/repeat_test.rbc +0 -3245
- data/test/sequence_test.rbc +0 -1534
- data/test/string_terminal_test.rbc +0 -1802
- data/test/super_test.rbc +0 -2083
- data/test/terminal_test.rbc +0 -1315
data/CHANGES
CHANGED
data/{README → README.md}
RENAMED
@@ -1,9 +1,4 @@
-
-
-~* Citrus *~
-
-Parsing Expressions for Ruby
-
+Citrus :: Parsing Expressions for Ruby
 
 Citrus is a compact and powerful parsing library for
 [Ruby](http://ruby-lang.org/) that combines the elegance and expressiveness of
@@ -52,13 +47,13 @@ In Citrus, there are three main types of objects: rules, grammars, and matches.
 
 ## Rules
 
-A [Rule](api/classes/Citrus/Rule.html) is an object
-behavior on a string. There are two types of rules:
-Terminals can be either Ruby strings or regular
-input to match. For example, a terminal created
-match any sequence of the characters "e", "n", and
-created from regular expressions may match any
-be generated from that expression.
+A [Rule](http://mjijackson.com/citrus/api/classes/Citrus/Rule.html) is an object
+that specifies some matching behavior on a string. There are two types of rules:
+terminals and non-terminals. Terminals can be either Ruby strings or regular
+expressions that specify some input to match. For example, a terminal created
+from the string "end" would match any sequence of the characters "e", "n", and
+"d", in that order. Terminals created from regular expressions may match any
+sequence of characters that can be generated from that expression.
 
 Non-terminals are rules that may contain other rules but do not themselves match
 directly on the input. For example, a Repeat is a non-terminal that may contain
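The Rules hunk above distinguishes two kinds of terminal. A string terminal matches its characters literally and in order, while a regular-expression terminal matches any text its expression can generate. The rough plain-Ruby sketch below models both kinds as a pattern that must match at the current input position. It uses only the standard `strscan` library. The helper names `string_terminal` and `match_terminal` are hypothetical, and this is not Citrus's own implementation.

```ruby
require 'strscan'

# Hypothetical helpers, not Citrus's implementation.
# A string terminal matches its characters literally and in order, so it is
# modeled here as an escaped Regexp.
def string_terminal(str)
  Regexp.new(Regexp.escape(str))
end

# StringScanner#scan succeeds only if the pattern matches at the current
# position; it returns the matched text, or nil on failure.
def match_terminal(pattern, input)
  StringScanner.new(input).scan(pattern)
end

p match_terminal(string_terminal('end'), 'endless')  # => "end"
p match_terminal(string_terminal('end'), 'the end')  # => nil
p match_terminal(/[0-9]+/, '123abc')                 # => "123"
```

The second call fails even though `'the end'` contains "end". This is because the sketch only attempts a match at the start of the input.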
@@ -70,9 +65,9 @@ of Ruby modules. Rules use these modules to extend the matches they create.
 
 ## Grammars
 
-A [Grammar](api/classes/Citrus/Grammar.html) is a
-the rules in a grammar collectively form a complete
-language, or a well-defined subset thereof.
+A [Grammar](http://mjijackson.com/citrus/api/classes/Citrus/Grammar.html) is a
+container for rules. Usually the rules in a grammar collectively form a complete
+specification for some language, or a well-defined subset thereof.
 
 A Citrus grammar is really just a souped-up Ruby
 [module](http://ruby-doc.org/core/classes/Module.html). These modules may be
@@ -84,8 +79,9 @@ Ruby's `super` keyword.
 
 ## Matches
 
-A [Match](api/classes/Citrus/Match.html) object
-recognition of some piece of the input. Matches are
+A [Match](http://mjijackson.com/citrus/api/classes/Citrus/Match.html) object
+represents a successful recognition of some piece of the input. Matches are
+created by rule objects during a parse.
 
 Matches are arranged in a tree structure where any match may contain any number
 of other matches. Each match contains information about its own subtree. The
@@ -132,8 +128,9 @@ match in a case-insensitive manner.
 Besides case sensitivity, case-insensitive strings have the same behavior as
 double quoted strings.
 
-See [Terminal](api/classes/Citrus/Terminal.html) and
-[StringTerminal](api/classes/Citrus/StringTerminal.html)
+See [Terminal](http://mjijackson.com/citrus/api/classes/Citrus/Terminal.html) and
+[StringTerminal](http://mjijackson.com/citrus/api/classes/Citrus/StringTerminal.html)
+for more information.
 
 ## Repetition
 
@@ -156,7 +153,8 @@ The `+` and `?` operators are supported as well for the common cases of `1*` and
 'abc'+ # match "abc" one or more times
 'abc'? # match "abc" zero or one time
 
-See [Repeat](api/classes/Citrus/Repeat.html) for
+See [Repeat](http://mjijackson.com/citrus/api/classes/Citrus/Repeat.html) for
+more information.
 
 ## Lookahead
 
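According to the Repetition hunk above, `+` and `?` are shorthand for the common cases of bounded repetition. The minimal plain-Ruby sketch below illustrates that idea with a hypothetical `repeat(input, pattern, min, max)` helper. It is not Citrus's `Repeat` class. `'abc'+` corresponds to a minimum of 1 with no maximum, and `'abc'?` to a minimum of 0 and a maximum of 1.

```ruby
require 'strscan'

# Hypothetical sketch of bounded repetition, not Citrus's Repeat class.
# Match `pattern` at least `min` and at most `max` times (nil = unbounded),
# returning the consumed text, or nil if fewer than `min` matches occurred.
def repeat(input, pattern, min, max = nil)
  s = StringScanner.new(input)
  count = 0
  while max.nil? || count < max
    matched = s.scan(pattern)
    break if matched.nil? || matched.empty?  # stop on failure or empty match
    count += 1
  end
  count >= min ? input.byteslice(0, s.pos) : nil
end

p repeat('abcabcx', /abc/, 1)    # 'abc'+ => "abcabc"
p repeat('x', /abc/, 1)          # 'abc'+ => nil
p repeat('abcabc', /abc/, 0, 1)  # 'abc'? => "abc"
p repeat('x', /abc/, 0, 1)       # 'abc'? => ""
```

The empty-match guard keeps an unbounded repetition of a pattern like `/a*/` from looping forever.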
@@ -164,7 +162,7 @@ Both positive and negative lookahead are supported in Citrus. Use the `&` and
 `!` operators to indicate that an expression either should or should not match.
 In neither case is any input consumed.
 
-
+'a' &'b' # match an "a" that is followed by a "b"
 'a' !'b' # match an "a" that is not followed by a "b"
 !'a' . # match any character except for "a"
 
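Both lookahead operators in the hunk above test an expression without consuming any input. The sketch below approximates them with `StringScanner#check`, which reports whether a pattern matches but leaves the scan position unchanged. The helper names are hypothetical, and this is not how Citrus implements `AndPredicate` or `NotPredicate`.

```ruby
require 'strscan'

# Hypothetical helpers, not Citrus's AndPredicate/NotPredicate classes.
# StringScanner#check tests a pattern at the current position without
# advancing the scan pointer, so neither predicate consumes input.
def and_predicate?(scanner, pattern)
  !scanner.check(pattern).nil?  # like &e: succeed if e would match here
end

def not_predicate?(scanner, pattern)
  scanner.check(pattern).nil?   # like !e: succeed if e would not match here
end

# 'a' &'b' : match an "a" only when it is followed by a "b".
def a_followed_by_b(input)
  s = StringScanner.new(input)
  a = s.scan(/a/) or return nil
  and_predicate?(s, /b/) ? a : nil
end

# !'a' . : match any single character except "a".
def any_but_a(input)
  s = StringScanner.new(input)
  not_predicate?(s, /a/) ? s.getch : nil
end

p a_followed_by_b('ab')  # => "a" (the "b" is only checked, not consumed)
p a_followed_by_b('ac')  # => nil
p any_but_a('xyz')       # => "x"
p any_but_a('abc')       # => nil
```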
@@ -177,9 +175,10 @@ that does not match a given expression.
 When using this operator (the tilde), at least one character must be consumed
 for the rule to succeed.
 
-See [AndPredicate](api/classes/Citrus/AndPredicate.html),
-[NotPredicate](api/classes/Citrus/NotPredicate.html),
-[ButPredicate](api/classes/Citrus/ButPredicate.html)
+See [AndPredicate](http://mjijackson.com/citrus/api/classes/Citrus/AndPredicate.html),
+[NotPredicate](http://mjijackson.com/citrus/api/classes/Citrus/NotPredicate.html),
+and [ButPredicate](http://mjijackson.com/citrus/api/classes/Citrus/ButPredicate.html)
+for more information.
 
 ## Sequences
 
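The hunk above states that the tilde (but) operator must consume at least one character to succeed. The plain-Ruby sketch below assumes that `~e` consumes characters until `e` would match or the input runs out. That reading comes from the hunk's context line ("does not match a given expression"), not from an explicit definition in this diff. The `but_predicate` helper is hypothetical and is not Citrus's `ButPredicate`.

```ruby
require 'strscan'

# Hypothetical sketch of the but predicate (~e), not Citrus's ButPredicate:
# consume one character at a time until `pattern` would match at the current
# position or the input runs out, then fail if nothing was consumed.
def but_predicate(input, pattern)
  s = StringScanner.new(input)
  s.getch until s.eos? || s.check(pattern)
  s.pos.zero? ? nil : input.byteslice(0, s.pos)
end

p but_predicate('xyza', /a/)         # => "xyz"
p but_predicate('abc', /a/)          # => nil (no character consumed)
p but_predicate('hello xyz', /xyz/)  # => "hello "
```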
@@ -189,7 +188,8 @@ should match in that order.
 'a' 'b' 'c' # match "a", then "b", then "c"
 'a' [0-9] # match "a", then a numeric digit
 
-See [Sequence](api/classes/Citrus/Sequence.html)
+See [Sequence](http://mjijackson.com/citrus/api/classes/Citrus/Sequence.html)
+for more information.
 
 ## Choices
 
@@ -204,7 +204,8 @@ It is important to note when using ordered choice that any operator binds more
 tightly than the vertical bar. A full chart of operators and their respective
 levels of precedence is below.
 
-See [Choice](api/classes/Citrus/Choice.html) for
+See [Choice](http://mjijackson.com/citrus/api/classes/Citrus/Choice.html) for
+more information.
 
 ## Labels
 
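The Choices hunk above notes that every operator binds more tightly than the vertical bar, so `'a' 'b' | 'c'` groups as `('a' 'b') | 'c'`. The small plain-Ruby sketch below illustrates ordered choice with a hypothetical `ordered_choice` helper, not Citrus's `Choice` class. Each alternative is tried at the same position, and the first success wins even when a later alternative would match more text.

```ruby
require 'strscan'

# Hypothetical sketch of ordered choice, not Citrus's Choice class.
# Alternatives are tried in order at the same position; a failed
# StringScanner#scan does not move the position, so no reset is needed.
def ordered_choice(input, *alternatives)
  s = StringScanner.new(input)
  alternatives.each do |alt|
    matched = s.scan(alt)
    return matched if matched
  end
  nil
end

# 'a' 'b' | 'c' : the first alternative is the whole sequence "ab".
p ordered_choice('abc', /ab/, /c/)  # => "ab"
p ordered_choice('cab', /ab/, /c/)  # => "c"
p ordered_choice('a', /ab/, /c/)    # => nil
# First success wins, even though /ab/ would match more.
p ordered_choice('ab', /a/, /ab/)   # => "a"
```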
@@ -245,14 +246,14 @@ same name as a rule in the parent also have access to the `super` keyword to
 invoke the parent rule.
 
 grammar Number
-
+rule number
 [0-9]+
 end
 end
-
+
 grammar FloatingPoint
 include Number
-
+
 rule number
 super ('.' super)?
 end
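The README describes a Citrus grammar as a souped-up Ruby module, so the `Number`/`FloatingPoint` example above has a direct plain-Ruby analogue. This sketch is an analogy only: the method bodies build `Regexp` objects and are not Citrus code. `FloatingPoint#number` calls `super` to reuse `Number#number`, much as `super ('.' super)?` reuses the parent rule. The `Parser` class exists only to host the modules and is hypothetical.

```ruby
# Plain-Ruby analogy, not Citrus itself.
module Number
  def number
    /[0-9]+/                     # like: [0-9]+
  end
end

module FloatingPoint
  include Number

  def number
    digits = super               # calls Number#number
    /#{digits}(?:\.#{digits})?/  # like: super ('.' super)?
  end
end

# Hypothetical host class so the module methods can be called.
class Parser
  include FloatingPoint
end

pattern = /\A#{Parser.new.number}\z/
p pattern.match?('3.14')  # => true
p pattern.match?('42')    # => true
p pattern.match?('3.')    # => false
```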
@@ -262,33 +263,34 @@ In the example above, the `FloatingPoint` grammar includes `Number`. Both have a
 rule named `number`, so `FloatingPoint#number` has access to `Number#number` by
 means of using `super`.
 
-See [Super](api/classes/Citrus/Super.html) for more
+See [Super](http://mjijackson.com/citrus/api/classes/Citrus/Super.html) for more
+information.
 
 ## Precedence
 
 The following table contains a list of all Citrus symbols and operators and
 their precedence. A higher precedence indicates tighter binding.
 
-Operator
-
-''
-""
-
-[]
-
-
-()
-
-
-
-
-
-
-
-{}
-
-e1 e2
-e1
+Operator | Name | Precedence
+------------------------- | ------------------------- | ----------
+`''` | String (single quoted) | 7
+`""` | String (double quoted) | 7
+<code>``</code> | String (case insensitive) | 7
+`[]` | Character class | 7
+`.` | Dot (any character) | 7
+`//` | Regular expression | 7
+`()` | Grouping | 7
+`*` | Repetition (arbitrary) | 6
+`+` | Repetition (one or more) | 6
+`?` | Repetition (zero or one) | 6
+`&` | And predicate | 5
+`!` | Not predicate | 5
+`~` | But predicate | 5
+`<>` | Extension (module name) | 4
+`{}` | Extension (literal) | 4
+`:` | Label | 3
+`e1 e2` | Sequence | 2
+<code>e1 | e2</code> | Ordered choice | 1
 
 ## Grouping
 
@@ -310,15 +312,15 @@ integers separated by any amount of white space and a `+` symbol.
 rule additive
 number plus (additive | number)
 end
-
+
 rule number
 [0-9]+ space
 end
-
+
 rule plus
 '+' space
 end
-
+
 rule space
 [ \t]*
 end
@@ -341,8 +343,9 @@ and "1 + 2+3", but it does not have enough semantic information to be able to
 actually interpret these expressions.
 
 At this point, when the grammar parses a string it generates a tree of
-[Match](api/classes/Citrus/Match.html) objects.
-and may itself be comprised of any number of
+[Match](http://mjijackson.com/citrus/api/classes/Citrus/Match.html) objects.
+Each match is created by a rule and may itself be comprised of any number of
+submatches.
 
 Submatches are created whenever a rule contains another rule. For example, in
 the grammar above `number` matches a string of digits followed by white space.
@@ -358,17 +361,17 @@ blocks. Let's extend the `Addition` grammar using this technique.
 number.value + term.value
 }
 end
-
+
 rule number
 ([0-9]+ space) {
 to_i
 }
 end
-
+
 rule plus
 '+' space
 end
-
+
 rule space
 [ \t]*
 end
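The `Addition` grammar in this hunk attaches values with semantic blocks: `to_i` on a number and `number.value + term.value` on a sum. To show what those values compute, here is a hand-written recursive-descent parser in plain Ruby that follows the same four rules. The `AdditionSketch` class is hypothetical. It does not reflect how Citrus builds match trees or evaluates semantic blocks.

```ruby
require 'strscan'

# Hypothetical recursive-descent sketch of the Addition grammar:
#   additive <- number plus (additive | number)   value: lhs + rhs
#   number   <- [0-9]+ space                      value: to_i
#   plus     <- '+' space
#   space    <- [ \t]*
class AdditionSketch
  def initialize(input)
    @s = StringScanner.new(input)
  end

  # Returns the computed value, or nil unless the whole input matched.
  def parse
    value = additive || number
    value if value && @s.eos?
  end

  private

  def space
    @s.scan(/[ \t]*/)
  end

  def number
    digits = @s.scan(/[0-9]+/) or return nil
    space
    digits.to_i
  end

  def plus
    @s.scan(/\+/) or return nil
    space
    true
  end

  def additive
    start = @s.pos
    lhs = number
    if lhs && plus
      rhs = additive || number  # ordered choice: try additive first
      return lhs + rhs if rhs
    end
    @s.pos = start              # backtrack so the caller can try another rule
    nil
  end
end

p AdditionSketch.new('23 + 12').parse  # => 35
p AdditionSketch.new('1 + 2+3').parse  # => 6
p AdditionSketch.new('1 +').parse      # => nil
```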
@@ -415,14 +418,14 @@ commands in a terminal.
 Congratulations! You just ran your first piece of Citrus code.
 
 One interesting thing to notice about the above sequence of commands is the
-return value of [Citrus#load](api/classes/Citrus.html#M000003).
-`Citrus.load` to load a grammar file (and likewise
-[Citrus#eval](api/classes/Citrus.html#M000004) to
-grammar code), the return value is an array of all the
-file.
+return value of [Citrus#load](http://mjijackson.com/citrus/api/classes/Citrus.html#M000003).
+When you use `Citrus.load` to load a grammar file (and likewise
+[Citrus#eval](http://mjijackson.com/citrus/api/classes/Citrus.html#M000004) to
+evaluate a raw string of grammar code), the return value is an array of all the
+grammars present in that file.
 
 Take a look at
-[
+[calc.citrus](http://github.com/mjijackson/citrus/blob/master/lib/citrus/grammars/calc.citrus)
 for an example of a calculator that is able to parse and evaluate more complex
 mathematical expressions.
 
@@ -431,20 +434,20 @@ mathematical expressions.
 If you need more than just a `value` method on your match object, you can attach
 additional methods as well. There are two ways to do this. The first lets you
 define additional methods inline in your semantic block. This block will be used
-to create a new Module using [Module#new](http://ruby-doc.org/core/classes/Module.html#M001682).
-`Addition` example above, we might refactor the `additive` rule to
-this:
+to create a new Module using [Module#new](http://ruby-doc.org/core/classes/Module.html#M001682).
+Using the `Addition` example above, we might refactor the `additive` rule to
+look like this:
 
 rule additive
 (number plus term:(additive | number)) {
 def lhs
 number.value
 end
-
+
 def rhs
 term.value
 end
-
+
 def value
 lhs + rhs
 end
@@ -474,11 +477,11 @@ define the following module.
 def lhs
 number.value
 end
-
+
 def rhs
 term.value
 end
-
+
 def value
 lhs + rhs
 end
@@ -510,7 +513,7 @@ case that could be used to test that our grammar works properly.
 assert_equal('23 + 12', match)
 assert_equal(35, match.value)
 end
-
+
 def test_number
 match = Addition.parse('23', :root => :number)
 assert(match)
@@ -530,11 +533,11 @@ made to test equality of match objects with string values.
 
 ## Debugging
 
-When a parse fails, a [ParseError](api/classes/Citrus/ParseError.html)
-generated which provides a wealth of information about exactly where
-failed including the offset, line number, line text, and line offset.
-object, you could possibly provide some useful feedback to the user
-the input was bad. The following code demonstrates one way to do this.
+When a parse fails, a [ParseError](http://mjijackson.com/citrus/api/classes/Citrus/ParseError.html)
+object is generated which provides a wealth of information about exactly where
+the parse failed including the offset, line number, line text, and line offset.
+Using this object, you could possibly provide some useful feedback to the user
+about why the input was bad. The following code demonstrates one way to do this.
 
 def parse_some_stuff(stuff)
 match = StuffGrammar.parse(stuff)
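The Debugging hunk says that a `ParseError` reports the offset, line number, line text, and line offset of a failure. The hypothetical `error_location` helper below sketches how the last three can be derived from the offset. It is not Citrus's `ParseError` API. It assumes 1-based line numbers and 0-based offsets within the line, which may differ from the conventions Citrus uses.

```ruby
# Hypothetical helper, not Citrus's ParseError API. Assumes 1-based line
# numbers and 0-based offsets within the line.
def error_location(input, offset)
  before = input[0, offset]
  line_start = (before.rindex("\n") || -1) + 1  # index just past the last newline
  {
    offset: offset,
    line_number: before.count("\n") + 1,
    line_text: input[line_start..-1].to_s.lines.first.to_s.chomp,
    line_offset: offset - line_start
  }
end

info = error_location("1 + 2\n3 + * 4\n", 10)
puts "line #{info[:line_number]}, column #{info[:line_offset]}: #{info[:line_text]}"
# prints: line 2, column 4: 3 + * 4
```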
@@ -606,7 +609,7 @@ included here for those who may wish to explore an alternative implementation.
 # License
 
 
-Copyright 2010 Michael Jackson
+Copyright 2010-2011 Michael Jackson
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
@@ -618,10 +621,10 @@ furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.
 
-
-
-
-
-
-
-
+The software is provided "as is", without warranty of any kind, express or
+implied, including but not limited to the warranties of merchantability,
+fitness for a particular purpose and non-infringement. In no event shall the
+authors or copyright holders be liable for any claim, damages or other
+liability, whether in an action of contract, tort or otherwise, arising from,
+out of or in connection with the software or the use or other dealings in
+the software.
data/citrus.gemspec
CHANGED
@@ -20,7 +20,7 @@ Gem::Specification.new do |s|
 Dir['extras/**'] +
 Dir['lib/**/*.rb'] +
 Dir['test/**/*'] +
-%w< citrus.gemspec Rakefile README CHANGES >
+%w< citrus.gemspec Rakefile README.md CHANGES >
 
 s.test_files = s.files.select {|path| path =~ /^test\/.*_test.rb/ }
 
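The `s.test_files` filter in this hunk uses `/^test\/.*_test.rb/`, which has no end anchor and leaves the `.` unescaped. A quick plain-Ruby check against a hypothetical list of paths shows that this pattern also selects `.rbc` files such as `test/alias_test.rbc`, one of the files deleted in this release. The stricter pattern is only a suggestion. It is not what the gemspec uses.

```ruby
# Hypothetical path list; the first pattern is copied from the gemspec line.
paths = %w[
  test/alias_test.rb
  test/alias_test.rbc
  test/helper.rb
  test/_files/rule5.citrus
]

loose  = paths.select { |path| path =~ /^test\/.*_test.rb/ }
strict = paths.select { |path| path =~ /\Atest\/.*_test\.rb\z/ }  # suggestion only

p loose   # => ["test/alias_test.rb", "test/alias_test.rbc"]
p strict  # => ["test/alias_test.rb"]
```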
@@ -28,7 +28,7 @@ Gem::Specification.new do |s|
 
 s.has_rdoc = true
 s.rdoc_options = %w< --line-numbers --inline-source --title Citrus --main Citrus >
-s.extra_rdoc_files = %w< README CHANGES >
+s.extra_rdoc_files = %w< README.md CHANGES >
 
 s.homepage = 'http://mjijackson.com/citrus'
 end
data/doc/syntax.markdown
CHANGED
@@ -62,7 +62,7 @@ Both positive and negative lookahead are supported in Citrus. Use the `&` and
 `!` operators to indicate that an expression either should or should not match.
 In neither case is any input consumed.
 
-
+'a' &'b' # match an "a" that is followed by a "b"
 'a' !'b' # match an "a" that is not followed by a "b"
 !'a' . # match any character except for "a"
 
@@ -143,14 +143,14 @@ same name as a rule in the parent also have access to the `super` keyword to
 invoke the parent rule.
 
 grammar Number
-
+rule number
 [0-9]+
 end
 end
-
+
 grammar FloatingPoint
 include Number
-
+
 rule number
 super ('.' super)?
 end
data/lib/citrus/file.rb
CHANGED
data/lib/citrus/grammars.rb
CHANGED
data/lib/citrus/version.rb
CHANGED
File without changes
File without changes
@@ -0,0 +1 @@
+rule super '' end