citrus 2.4.0 → 2.4.1
- data/CHANGES +6 -0
- data/{README → README.md} +89 -86
- data/citrus.gemspec +2 -2
- data/doc/syntax.markdown +4 -4
- data/lib/citrus/file.rb +1 -1
- data/lib/citrus/grammars.rb +2 -1
- data/lib/citrus/version.rb +1 -1
- data/test/_files/{super.citrus → file1.citrus} +0 -0
- data/test/_files/{super2.citrus → file2.citrus} +0 -0
- data/test/_files/file3.citrus +9 -0
- data/test/_files/rule5.citrus +1 -0
- data/test/file_test.rb +1 -1
- metadata +31 -62
- data/benchmark/after.dat +0 -192
- data/benchmark/before.dat +0 -192
- data/test/alias_test.rbc +0 -1491
- data/test/and_predicate_test.rbc +0 -1327
- data/test/but_predicate_test.rbc +0 -1398
- data/test/choice_test.rbc +0 -1278
- data/test/extension_test.rbc +0 -1121
- data/test/file_test.rbc +0 -21465
- data/test/grammar_test.rbc +0 -3490
- data/test/grammars/calc_test.rbc +0 -2268
- data/test/grammars/ipaddress_test.rbc +0 -515
- data/test/grammars/ipv4address_test.rbc +0 -1112
- data/test/grammars/ipv6address_test.rbc +0 -1195
- data/test/helper.rbc +0 -1046
- data/test/input_test.rbc +0 -1775
- data/test/label_test.rbc +0 -692
- data/test/match_test.rbc +0 -2415
- data/test/memoized_input_test.rbc +0 -1607
- data/test/multibyte_test.rbc +0 -1597
- data/test/not_predicate_test.rbc +0 -1327
- data/test/parse_error_test.rbc +0 -1253
- data/test/repeat_test.rbc +0 -3245
- data/test/sequence_test.rbc +0 -1534
- data/test/string_terminal_test.rbc +0 -1802
- data/test/super_test.rbc +0 -2083
- data/test/terminal_test.rbc +0 -1315
data/CHANGES
CHANGED
data/{README → README.md}
RENAMED
@@ -1,9 +1,4 @@
-
-
-~* Citrus *~
-
-Parsing Expressions for Ruby
-
+Citrus :: Parsing Expressions for Ruby
 
 Citrus is a compact and powerful parsing library for
 [Ruby](http://ruby-lang.org/) that combines the elegance and expressiveness of
@@ -52,13 +47,13 @@ In Citrus, there are three main types of objects: rules, grammars, and matches.
 
 ## Rules
 
-A [Rule](api/classes/Citrus/Rule.html) is an object
-behavior on a string. There are two types of rules:
-Terminals can be either Ruby strings or regular
-input to match. For example, a terminal created
-match any sequence of the characters "e", "n", and
-created from regular expressions may match any
-be generated from that expression.
+A [Rule](http://mjijackson.com/citrus/api/classes/Citrus/Rule.html) is an object
+that specifies some matching behavior on a string. There are two types of rules:
+terminals and non-terminals. Terminals can be either Ruby strings or regular
+expressions that specify some input to match. For example, a terminal created
+from the string "end" would match any sequence of the characters "e", "n", and
+"d", in that order. Terminals created from regular expressions may match any
+sequence of characters that can be generated from that expression.
 
 Non-terminals are rules that may contain other rules but do not themselves match
 directly on the input. For example, a Repeat is a non-terminal that may contain
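As a rough illustration of the terminal behavior described in the added lines above, the following plain-Ruby sketch tries a string or a regular expression at the current position of an input. It uses `StringScanner` from Ruby's standard library rather than Citrus itself, and `match_terminal` is a hypothetical helper name, not part of the Citrus API.

```ruby
require "strscan"

# Hypothetical helper (not Citrus API): try a terminal, given as a String
# or a Regexp, at the scanner's current position. Returns the matched
# text and advances the scanner, or returns nil and leaves it in place.
def match_terminal(scanner, terminal)
  pattern = terminal.is_a?(Regexp) ? terminal : Regexp.new(Regexp.escape(terminal))
  scanner.scan(pattern)
end

scanner = StringScanner.new("end42")
puts match_terminal(scanner, "end")           # end
puts match_terminal(scanner, /[0-9]+/)        # 42
puts match_terminal(scanner, "end").inspect   # nil
```

The string terminal is escaped before being compiled, so characters such as `+` or `.` in a string match literally, which is the distinction the README draws between string and regular-expression terminals.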
@@ -70,9 +65,9 @@ of Ruby modules. Rules use these modules to extend the matches they create.
 
 ## Grammars
 
-A [Grammar](api/classes/Citrus/Grammar.html) is a
-the rules in a grammar collectively form a complete
-language, or a well-defined subset thereof.
+A [Grammar](http://mjijackson.com/citrus/api/classes/Citrus/Grammar.html) is a
+container for rules. Usually the rules in a grammar collectively form a complete
+specification for some language, or a well-defined subset thereof.
 
 A Citrus grammar is really just a souped-up Ruby
 [module](http://ruby-doc.org/core/classes/Module.html). These modules may be
@@ -84,8 +79,9 @@ Ruby's `super` keyword.
 
 ## Matches
 
-A [Match](api/classes/Citrus/Match.html) object
-recognition of some piece of the input. Matches are
+A [Match](http://mjijackson.com/citrus/api/classes/Citrus/Match.html) object
+represents a successful recognition of some piece of the input. Matches are
+created by rule objects during a parse.
 
 Matches are arranged in a tree structure where any match may contain any number
 of other matches. Each match contains information about its own subtree. The
@@ -132,8 +128,9 @@ match in a case-insensitive manner.
 Besides case sensitivity, case-insensitive strings have the same behavior as
 double quoted strings.
 
-See [Terminal](api/classes/Citrus/Terminal.html) and
-[StringTerminal](api/classes/Citrus/StringTerminal.html)
+See [Terminal](http://mjijackson.com/citrus/api/classes/Citrus/Terminal.html) and
+[StringTerminal](http://mjijackson.com/citrus/api/classes/Citrus/StringTerminal.html)
+for more information.
 
 ## Repetition
 
@@ -156,7 +153,8 @@ The `+` and `?` operators are supported as well for the common cases of `1*` and
 'abc'+ # match "abc" one or more times
 'abc'? # match "abc" zero or one time
 
-See [Repeat](api/classes/Citrus/Repeat.html) for
+See [Repeat](http://mjijackson.com/citrus/api/classes/Citrus/Repeat.html) for
+more information.
 
 ## Lookahead
 
@@ -164,7 +162,7 @@ Both positive and negative lookahead are supported in Citrus. Use the `&` and
 `!` operators to indicate that an expression either should or should not match.
 In neither case is any input consumed.
 
-
+'a' &'b' # match an "a" that is followed by a "b"
 'a' !'b' # match an "a" that is not followed by a "b"
 !'a' . # match any character except for "a"
 
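The three lookahead expressions in the hunk above have close counterparts in Ruby's own regular expressions. The sketch below is only an analogy written with plain `Regexp` lookahead, not Citrus grammar syntax, and the constant names are made up for illustration.

```ruby
# Plain Ruby regular expressions used as an analogy for the Citrus
# lookahead operators. This is not Citrus grammar syntax.
FOLLOWED_BY_B     = /a(?=b)/  # roughly 'a' &'b'
NOT_FOLLOWED_BY_B = /a(?!b)/  # roughly 'a' !'b'
ANY_BUT_A         = /(?!a)./  # roughly !'a' .

# Lookahead consumes no input, so only the "a" is part of the match.
puts FOLLOWED_BY_B.match("ab")[0]      # a
puts NOT_FOLLOWED_BY_B.match?("ab")    # false
puts NOT_FOLLOWED_BY_B.match?("ac")    # true
puts ANY_BUT_A.match("abc")[0]         # b
```

In each case the text inside the lookahead is checked but never becomes part of the result, which mirrors the statement that no input is consumed.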
@@ -177,9 +175,10 @@ that does not match a given expression.
 When using this operator (the tilde), at least one character must be consumed
 for the rule to succeed.
 
-See [AndPredicate](api/classes/Citrus/AndPredicate.html),
-[NotPredicate](api/classes/Citrus/NotPredicate.html),
-[ButPredicate](api/classes/Citrus/ButPredicate.html)
+See [AndPredicate](http://mjijackson.com/citrus/api/classes/Citrus/AndPredicate.html),
+[NotPredicate](http://mjijackson.com/citrus/api/classes/Citrus/NotPredicate.html),
+and [ButPredicate](http://mjijackson.com/citrus/api/classes/Citrus/ButPredicate.html)
+for more information.
 
 ## Sequences
 
@@ -189,7 +188,8 @@ should match in that order.
 'a' 'b' 'c' # match "a", then "b", then "c"
 'a' [0-9] # match "a", then a numeric digit
 
-See [Sequence](api/classes/Citrus/Sequence.html)
+See [Sequence](http://mjijackson.com/citrus/api/classes/Citrus/Sequence.html)
+for more information.
 
 ## Choices
 
@@ -204,7 +204,8 @@ It is important to note when using ordered choice that any operator binds more
 tightly than the vertical bar. A full chart of operators and their respective
 levels of precedence is below.
 
-See [Choice](api/classes/Citrus/Choice.html) for
+See [Choice](http://mjijackson.com/citrus/api/classes/Citrus/Choice.html) for
+more information.
 
 ## Labels
 
@@ -245,14 +246,14 @@ same name as a rule in the parent also have access to the `super` keyword to
 invoke the parent rule.
 
 grammar Number
-
+rule number
 [0-9]+
 end
 end
-
+
 grammar FloatingPoint
 include Number
-
+
 rule number
 super ('.' super)?
 end
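Because the README describes grammars as Ruby modules, the `super` behavior in the grammar above resembles ordinary Ruby method lookup. The following sketch is an analogy in plain Ruby, not Citrus code: `NumberLike`, `FloatingPointLike`, `number_pattern`, and `Demo` are hypothetical names, and the rules are modelled as regular-expression source strings.

```ruby
# Plain Ruby analogy for rule inheritance; none of these names come
# from Citrus.
module NumberLike
  def number_pattern
    "[0-9]+"
  end
end

module FloatingPointLike
  include NumberLike

  # Mirrors `super ('.' super)?`: the parent pattern is reused twice.
  def number_pattern
    "#{super}(\\.#{super})?"
  end
end

class Demo
  include FloatingPointLike
end

PATTERN = Regexp.new("\\A#{Demo.new.number_pattern}\\z")
puts PATTERN.match?("3.14")  # true
puts PATTERN.match?("42")    # true
puts PATTERN.match?("3.")    # false
```

The including module sits ahead of the included one in the ancestor chain, so its definition wins and `super` reaches the parent's version.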
@@ -262,33 +263,34 @@ In the example above, the `FloatingPoint` grammar includes `Number`. Both have a
 rule named `number`, so `FloatingPoint#number` has access to `Number#number` by
 means of using `super`.
 
-See [Super](api/classes/Citrus/Super.html) for more
+See [Super](http://mjijackson.com/citrus/api/classes/Citrus/Super.html) for more
+information.
 
 ## Precedence
 
 The following table contains a list of all Citrus symbols and operators and
 their precedence. A higher precedence indicates tighter binding.
 
-Operator
-
-''
-""
-
-[]
-
-
-()
-
-
-
-
-
-
-
-{}
-
-e1 e2
-e1
+Operator | Name | Precedence
+------------------------- | ------------------------- | ----------
+`''` | String (single quoted) | 7
+`""` | String (double quoted) | 7
+<code>``</code> | String (case insensitive) | 7
+`[]` | Character class | 7
+`.` | Dot (any character) | 7
+`//` | Regular expression | 7
+`()` | Grouping | 7
+`*` | Repetition (arbitrary) | 6
+`+` | Repetition (one or more) | 6
+`?` | Repetition (zero or one) | 6
+`&` | And predicate | 5
+`!` | Not predicate | 5
+`~` | But predicate | 5
+`<>` | Extension (module name) | 4
+`{}` | Extension (literal) | 4
+`:` | Label | 3
+`e1 e2` | Sequence | 2
+<code>e1 | e2</code> | Ordered choice | 1
 
 ## Grouping
 
@@ -310,15 +312,15 @@ integers separated by any amount of white space and a `+` symbol.
 rule additive
 number plus (additive | number)
 end
-
+
 rule number
 [0-9]+ space
 end
-
+
 rule plus
 '+' space
 end
-
+
 rule space
 [ \t]*
 end
@@ -341,8 +343,9 @@ and "1 + 2+3", but it does not have enough semantic information to be able to
 actually interpret these expressions.
 
 At this point, when the grammar parses a string it generates a tree of
-[Match](api/classes/Citrus/Match.html) objects.
-and may itself be comprised of any number of
+[Match](http://mjijackson.com/citrus/api/classes/Citrus/Match.html) objects.
+Each match is created by a rule and may itself be comprised of any number of
+submatches.
 
 Submatches are created whenever a rule contains another rule. For example, in
 the grammar above `number` matches a string of digits followed by white space.
@@ -358,17 +361,17 @@ blocks. Let's extend the `Addition` grammar using this technique.
 number.value + term.value
 }
 end
-
+
 rule number
 ([0-9]+ space) {
 to_i
 }
 end
-
+
 rule plus
 '+' space
 end
-
+
 rule space
 [ \t]*
 end
@@ -415,14 +418,14 @@ commands in a terminal.
 Congratulations! You just ran your first piece of Citrus code.
 
 One interesting thing to notice about the above sequence of commands is the
-return value of [Citrus#load](api/classes/Citrus.html#M000003).
-`Citrus.load` to load a grammar file (and likewise
-[Citrus#eval](api/classes/Citrus.html#M000004) to
-grammar code), the return value is an array of all the
-file.
+return value of [Citrus#load](http://mjijackson.com/citrus/api/classes/Citrus.html#M000003).
+When you use `Citrus.load` to load a grammar file (and likewise
+[Citrus#eval](http://mjijackson.com/citrus/api/classes/Citrus.html#M000004) to
+evaluate a raw string of grammar code), the return value is an array of all the
+grammars present in that file.
 
 Take a look at
-[
+[calc.citrus](http://github.com/mjijackson/citrus/blob/master/lib/citrus/grammars/calc.citrus)
 for an example of a calculator that is able to parse and evaluate more complex
 mathematical expressions.
 
@@ -431,20 +434,20 @@ mathematical expressions.
 If you need more than just a `value` method on your match object, you can attach
 additional methods as well. There are two ways to do this. The first lets you
 define additional methods inline in your semantic block. This block will be used
-to create a new Module using [Module#new](http://ruby-doc.org/core/classes/Module.html#M001682).
-`Addition` example above, we might refactor the `additive` rule to
-this:
+to create a new Module using [Module#new](http://ruby-doc.org/core/classes/Module.html#M001682).
+Using the `Addition` example above, we might refactor the `additive` rule to
+look like this:
 
 rule additive
 (number plus term:(additive | number)) {
 def lhs
 number.value
 end
-
+
 def rhs
 term.value
 end
-
+
 def value
 lhs + rhs
 end
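The README text in this hunk says a semantic block is turned into a module with `Module.new`, and an earlier hunk header notes that rules use such modules to extend the matches they create. A minimal plain-Ruby sketch of that mechanism follows. `FakeMatch` and `ADDITIVE_METHODS` are stand-ins invented for this example; Citrus's real `Match` class is not used.

```ruby
# Stand-in for a match object (hypothetical, not Citrus::Match).
FakeMatch = Struct.new(:number_value, :term_value)

# Module.new evaluates its block in the context of the new module, so
# each `def` below becomes an instance method of that module.
ADDITIVE_METHODS = Module.new do
  def lhs
    number_value
  end

  def rhs
    term_value
  end

  def value
    lhs + rhs
  end
end

match = FakeMatch.new(23, 12)
match.extend(ADDITIVE_METHODS)  # only this one object gains the methods
puts match.value                # 35
```

Extending a single object, rather than reopening its class, keeps the extra methods local to the matches produced by one rule.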
@@ -474,11 +477,11 @@ define the following module.
 def lhs
 number.value
 end
-
+
 def rhs
 term.value
 end
-
+
 def value
 lhs + rhs
 end
@@ -510,7 +513,7 @@ case that could be used to test that our grammar works properly.
 assert_equal('23 + 12', match)
 assert_equal(35, match.value)
 end
-
+
 def test_number
 match = Addition.parse('23', :root => :number)
 assert(match)
@@ -530,11 +533,11 @@ made to test equality of match objects with string values.
 
 ## Debugging
 
-When a parse fails, a [ParseError](api/classes/Citrus/ParseError.html)
-generated which provides a wealth of information about exactly where
-failed including the offset, line number, line text, and line offset.
-object, you could possibly provide some useful feedback to the user
-the input was bad. The following code demonstrates one way to do this.
+When a parse fails, a [ParseError](http://mjijackson.com/citrus/api/classes/Citrus/ParseError.html)
+object is generated which provides a wealth of information about exactly where
+the parse failed including the offset, line number, line text, and line offset.
+Using this object, you could possibly provide some useful feedback to the user
+about why the input was bad. The following code demonstrates one way to do this.
 
 def parse_some_stuff(stuff)
 match = StuffGrammar.parse(stuff)
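To make the four pieces of position information named above more concrete, here is a small plain-Ruby sketch that derives them from an input string and a character offset. It is an assumption-laden illustration, not how Citrus's `ParseError` is implemented, and `describe_offset` is a hypothetical helper name.

```ruby
# Hypothetical helper, not part of Citrus: compute line-oriented details
# for a zero-based character offset into the input.
def describe_offset(input, offset)
  before      = input[0, offset]
  line_number = before.count("\n") + 1            # lines are 1-based here
  line_start  = (before.rindex("\n") || -1) + 1   # offset of the line's first char
  {
    offset: offset,
    line_number: line_number,
    line_offset: offset - line_start,
    line_text: input.lines[line_number - 1].to_s.chomp
  }
end

info = describe_offset("1 + 2\n3 + x\n", 10)
puts "line #{info[:line_number]}, column #{info[:line_offset]}: #{info[:line_text]}"
puts " " * info[:line_offset] + "^"
```

Printing a caret under the offending column is one simple way to turn such details into feedback for the user.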
@@ -606,7 +609,7 @@ included here for those who may wish to explore an alternative implementation.
 # License
 
 
-Copyright 2010 Michael Jackson
+Copyright 2010-2011 Michael Jackson
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
@@ -618,10 +621,10 @@ furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in
 all copies or substantial portions of the Software.
 
-
-
-
-
-
-
-
+The software is provided "as is", without warranty of any kind, express or
+implied, including but not limited to the warranties of merchantability,
+fitness for a particular purpose and non-infringement. In no event shall the
+authors or copyright holders be liable for any claim, damages or other
+liability, whether in an action of contract, tort or otherwise, arising from,
+out of or in connection with the software or the use or other dealings in
+the software.
data/citrus.gemspec
CHANGED
@@ -20,7 +20,7 @@ Gem::Specification.new do |s|
 Dir['extras/**'] +
 Dir['lib/**/*.rb'] +
 Dir['test/**/*'] +
-%w< citrus.gemspec Rakefile README CHANGES >
+%w< citrus.gemspec Rakefile README.md CHANGES >
 
 s.test_files = s.files.select {|path| path =~ /^test\/.*_test.rb/ }
 
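The `s.test_files` line in the context above filters the gem's file list with a regular expression. The sketch below applies the same pattern to a handful of sample paths modelled on the file list at the top of this diff; the sample array and constant names are assumptions for illustration, and the gem's real list comes from `Dir` globs that are not evaluated here.

```ruby
# Same pattern as in the gemspec context line.
TEST_FILE_PATTERN = /^test\/.*_test.rb/

# Sample paths modelled on this diff's file list (hypothetical selection).
SAMPLE_FILES = %w[
  test/file_test.rb
  test/grammars/calc_test.rb
  test/helper.rb
  test/_files/file1.citrus
  lib/citrus/file.rb
]

puts SAMPLE_FILES.select { |path| path =~ TEST_FILE_PATTERN }
# test/file_test.rb
# test/grammars/calc_test.rb

# The pattern has no end anchor and its final dot is unescaped, so a
# compiled file such as test/alias_test.rbc would be selected as well.
puts TEST_FILE_PATTERN.match?("test/alias_test.rbc")  # true
```

Whether that looseness had anything to do with the `.rbc` files dropped in this release is not stated in the diff; the example only shows what the pattern accepts.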
@@ -28,7 +28,7 @@ Gem::Specification.new do |s|
 
 s.has_rdoc = true
 s.rdoc_options = %w< --line-numbers --inline-source --title Citrus --main Citrus >
-s.extra_rdoc_files = %w< README CHANGES >
+s.extra_rdoc_files = %w< README.md CHANGES >
 
 s.homepage = 'http://mjijackson.com/citrus'
 end
data/doc/syntax.markdown
CHANGED
@@ -62,7 +62,7 @@ Both positive and negative lookahead are supported in Citrus. Use the `&` and
 `!` operators to indicate that an expression either should or should not match.
 In neither case is any input consumed.
 
-
+'a' &'b' # match an "a" that is followed by a "b"
 'a' !'b' # match an "a" that is not followed by a "b"
 !'a' . # match any character except for "a"
 
@@ -143,14 +143,14 @@ same name as a rule in the parent also have access to the `super` keyword to
 invoke the parent rule.
 
 grammar Number
-
+rule number
 [0-9]+
 end
 end
-
+
 grammar FloatingPoint
 include Number
-
+
 rule number
 super ('.' super)?
 end
data/lib/citrus/file.rb
CHANGED
data/lib/citrus/grammars.rb
CHANGED
data/lib/citrus/version.rb
CHANGED
File without changes
File without changes
@@ -0,0 +1 @@
+rule super '' end