citrus 2.1.2 → 2.2.0
- data/README +106 -49
- data/benchmark/after.dat +192 -0
- data/benchmark/before.dat +192 -0
- data/citrus.gemspec +0 -1
- data/doc/extras.markdown +16 -0
- data/doc/syntax.markdown +76 -29
- data/doc/testing.markdown +12 -20
- data/examples/calc.citrus +12 -11
- data/examples/calc.rb +12 -11
- data/lib/citrus.rb +416 -253
- data/lib/citrus/file.rb +66 -33
- data/test/_files/super.citrus +1 -1
- data/test/_files/super2.citrus +13 -0
- data/test/alias_test.rb +18 -34
- data/test/and_predicate_test.rb +15 -10
- data/test/but_predicate_test.rb +22 -17
- data/test/calc_file_test.rb +1 -1
- data/test/choice_test.rb +12 -37
- data/test/{rule_test.rb → extension_test.rb} +17 -16
- data/test/file_test.rb +350 -244
- data/test/grammar_test.rb +5 -11
- data/test/helper.rb +1 -17
- data/test/input_test.rb +172 -2
- data/test/label_test.rb +0 -10
- data/test/match_test.rb +91 -35
- data/test/multibyte_test.rb +4 -4
- data/test/not_predicate_test.rb +15 -10
- data/test/parse_error_test.rb +1 -3
- data/test/repeat_test.rb +59 -32
- data/test/sequence_test.rb +19 -31
- data/test/string_terminal_test.rb +55 -0
- data/test/super_test.rb +31 -31
- data/test/terminal_test.rb +12 -37
- metadata +13 -23
- data/lib/citrus/debug.rb +0 -69
- data/test/debug_test.rb +0 -23
data/citrus.gemspec
CHANGED
data/doc/extras.markdown
ADDED
@@ -0,0 +1,16 @@
+# Extras
+
+
+Several files are included in the Citrus repository that make it easier to work
+with grammar files in various editors.
+
+## TextMate
+
+To install the Citrus [TextMate](http://macromates.com/) bundle, simply
+double-click on the `Citrus.tmbundle` file in the `extras` directory.
+
+## Vim
+
+To install the [Vim](http://www.vim.org/) scripts, copy the files in
+`extras/vim` to a directory in Vim's
+[runtimepath](http://vimdoc.sourceforge.net/htmldoc/options.html#\'runtimepath\').
data/doc/syntax.markdown
CHANGED
@@ -10,46 +10,57 @@ already be familiar to Ruby programmers.
|
|
10
10
|
Terminals may be represented by a string or a regular expression. Both follow
|
11
11
|
the same rules as Ruby string and regular expression literals.
|
12
12
|
|
13
|
-
'abc'
|
14
|
-
"abc\n"
|
15
|
-
|
13
|
+
'abc' # match "abc"
|
14
|
+
"abc\n" # match "abc\n"
|
15
|
+
/abc/i # match "abc" in any case
|
16
|
+
/\xFF/ # match "\xFF"
|
16
17
|
|
17
18
|
Character classes and the dot (match anything) symbol are supported as well for
|
18
19
|
compatibility with other parsing expression implementations.
|
19
20
|
|
20
21
|
[a-z0-9] # match any lowercase letter or digit
|
21
22
|
[\x00-\xFF] # match any octet
|
22
|
-
. # match
|
23
|
+
. # match any single character, including new lines
|
23
24
|
|
24
|
-
|
25
|
+
Also, strings may use backticks instead of quotes to indicate that they should
|
26
|
+
match in a case-insensitive manner.
|
27
|
+
|
28
|
+
`abc` # match "abc" in any case
|
29
|
+
|
30
|
+
See [Terminal](api/classes/Citrus/Terminal.html) and
|
31
|
+
[StringTerminal](api/classes/Citrus/StringTerminal.html) for more information.
|
25
32
|
|
26
33
|
## Repetition
|
27
34
|
|
28
35
|
Quantifiers may be used after any expression to specify a number of times it
|
29
|
-
must match. The universal form of a quantifier is N*M where N is the minimum
|
30
|
-
M is the maximum number of times the expression may match.
|
36
|
+
must match. The universal form of a quantifier is `N*M` where `N` is the minimum
|
37
|
+
and `M` is the maximum number of times the expression may match.
|
31
38
|
|
32
|
-
'abc'1*2 # match "abc" a minimum of one, maximum
|
33
|
-
# of two times
|
39
|
+
'abc'1*2 # match "abc" a minimum of one, maximum of two times
|
34
40
|
'abc'1* # match "abc" at least once
|
35
41
|
'abc'*2 # match "abc" a maximum of twice
|
36
42
|
|
37
|
-
|
38
|
-
|
43
|
+
Additionally, the minimum and maximum may be omitted entirely to specify that an
|
44
|
+
expression may match zero or more times.
|
45
|
+
|
46
|
+
'abc'* # match "abc" zero or more times
|
47
|
+
|
48
|
+
The `+` and `?` operators are supported as well for the common cases of `1*` and
|
49
|
+
`*1` respectively.
|
39
50
|
|
40
|
-
'abc'+ # match "abc"
|
41
|
-
'abc'? # match "abc"
|
51
|
+
'abc'+ # match "abc" one or more times
|
52
|
+
'abc'? # match "abc" zero or one time
|
42
53
|
|
43
54
|
See [Repeat](api/classes/Citrus/Repeat.html) for more information.
|
44
55
|
|
45
56
|
## Lookahead
|
46
57
|
|
47
|
-
Both positive and negative lookahead are supported in Citrus. Use the
|
48
|
-
operators to indicate that an expression either should or should not match.
|
49
|
-
neither case is any input consumed.
|
58
|
+
Both positive and negative lookahead are supported in Citrus. Use the `&` and
|
59
|
+
`!` operators to indicate that an expression either should or should not match.
|
60
|
+
In neither case is any input consumed.
|
50
61
|
|
51
62
|
&'a' 'b' # match a "b" preceded by an "a"
|
52
|
-
|
63
|
+
'a' !'b' # match an "a" that is not followed by a "b"
|
53
64
|
!'a' . # match any character except for "a"
|
54
65
|
|
55
66
|
A special form of lookahead is also supported which will match any character
|
@@ -75,20 +86,17 @@ See [Sequence](api/classes/Citrus/Sequence.html) for more information.
|
|
75
86
|
## Choices
|
76
87
|
|
77
88
|
Ordered choice is indicated by a vertical bar that separates two expressions.
|
78
|
-
|
89
|
+
When using choice, each expression is tried in order. When one matches, the
|
90
|
+
rule returns the match immediately without trying the remaining rules.
|
79
91
|
|
80
92
|
'a' | 'b' # match "a" or "b"
|
81
93
|
'a' 'b' | 'c' # match "a" then "b" (in sequence), or "c"
|
82
94
|
|
83
|
-
|
84
|
-
|
85
|
-
|
86
|
-
|
87
|
-
When including a grammar inside another, all rules in the child that have the
|
88
|
-
same name as a rule in the parent also have access to the "super" keyword to
|
89
|
-
invoke the parent rule.
|
95
|
+
It is important to note when using ordered choice that any operator binds more
|
96
|
+
tightly than the vertical bar. A full chart of operators and their respective
|
97
|
+
levels of precedence is below.
|
90
98
|
|
91
|
-
See [
|
99
|
+
See [Choice](api/classes/Citrus/Choice.html) for more information.
|
92
100
|
|
93
101
|
## Labels
|
94
102
|
|
@@ -96,12 +104,50 @@ Match objects may be referred to by a different name than the rule that
|
|
96
104
|
originally generated them. Labels are created by placing the label and a colon
|
97
105
|
immediately preceding any expression.
|
98
106
|
|
99
|
-
chars:/[a-z]+/ # the characters matched by the regular
|
100
|
-
#
|
101
|
-
#
|
107
|
+
chars:/[a-z]+/ # the characters matched by the regular expression
|
108
|
+
# may be referred to as "chars" in an extension
|
109
|
+
# method
|
102
110
|
|
103
111
|
See [Label](api/classes/Citrus/Label.html) for more information.
|
104
112
|
|
113
|
+
## Grouping
|
114
|
+
|
115
|
+
As is common in many programming languages, parentheses may be used to override
|
116
|
+
the normal binding order of operators.
|
117
|
+
|
118
|
+
'a' ('b' | 'c') # match "a", then "b" or "c"
|
119
|
+
|
120
|
+
## Extensions
|
121
|
+
|
122
|
+
Extensions may be specified using either "module" or "block" syntax. When using
|
123
|
+
module syntax, specify the name of a module that is used to extend match objects
|
124
|
+
in between less than and greater than symbols.
|
125
|
+
|
126
|
+
[a-z0-9]5*9 <CouponCode> # match a string that consists of any lower
|
127
|
+
# cased letter or digit between 5 and 9
|
128
|
+
# times and extend the match with the
|
129
|
+
# CouponCode module
|
130
|
+
|
131
|
+
Additionally, extensions may be specified inline using curly braces. Inside the
|
132
|
+
curly braces you may embed method definitions that will be used to extend match
|
133
|
+
objects.
|
134
|
+
|
135
|
+
# match any digit and return its integer value when calling the
|
136
|
+
# #value method on the match object
|
137
|
+
[0-9] {
|
138
|
+
def value
|
139
|
+
to_i
|
140
|
+
end
|
141
|
+
}
|
142
|
+
|
143
|
+
## Super
|
144
|
+
|
145
|
+
When including a grammar inside another, all rules in the child that have the
|
146
|
+
same name as a rule in the parent also have access to the `super` keyword to
|
147
|
+
invoke the parent rule.
|
148
|
+
|
149
|
+
See [Super](api/classes/Citrus/Super.html) for more information.
|
150
|
+
|
105
151
|
## Precedence
|
106
152
|
|
107
153
|
The following table contains a list of all Citrus symbols and operators and
|
@@ -111,6 +157,7 @@ Operator | Name | Precedence
|
|
111
157
|
------------------------- | ------------------------- | ----------
|
112
158
|
`''` | String (single quoted) | 6
|
113
159
|
`""` | String (double quoted) | 6
|
160
|
+
<code>``</code> | String (case insensitive) | 6
|
114
161
|
`[]` | Character class | 6
|
115
162
|
`.` | Dot (any character) | 6
|
116
163
|
`//` | Regular expression | 6
|
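Note on the repetition section above: the `N*M` quantifier can be approximated in plain Ruby with a bounded regular expression. This is only a rough analogy and not how Citrus implements repetition. The helper name `repeat_regexp` is hypothetical, and regular expression quantifiers may backtrack where a parsing expression would not.

```ruby
# Hypothetical helper (not part of Citrus): builds a Regexp that repeats
# +literal+ between +min+ and +max+ times, anchored at the start of input.
def repeat_regexp(literal, min = 0, max = nil)
  # "{min,}" means "at least min"; "{min,max}" bounds both ends.
  bounds = max ? "{#{min},#{max}}" : "{#{min},}"
  Regexp.new("\\A(?:#{Regexp.escape(literal)})#{bounds}")
end

one_to_two = repeat_regexp('abc', 1, 2) # comparable to 'abc'1*2
at_least_1 = repeat_regexp('abc', 1)    # comparable to 'abc'1* or 'abc'+
optional   = repeat_regexp('abc', 0, 1) # comparable to 'abc'*1 or 'abc'?

puts one_to_two.match('abcabcabc')[0]   # abcabc
puts at_least_1.match('abcabcabc')[0]   # abcabcabc
puts optional.match('xyz')[0].inspect   # ""
```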
data/doc/testing.markdown
CHANGED
@@ -22,12 +22,11 @@ case that could be used to test that our grammar works properly.
|
|
22
22
|
end
|
23
23
|
end
|
24
24
|
|
25
|
-
The key here is using the
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
on the fly like this enables easy unit testing of the entire grammar.
|
25
|
+
The key here is using the `:root` option when performing the parse to specify
|
26
|
+
the name of the rule at which the parse should start. In `test_number`, since
|
27
|
+
`:number` was given the parse will start at that rule as if it were the root
|
28
|
+
rule of the entire grammar. The ability to change the root rule on the fly like
|
29
|
+
this enables easy unit testing of the entire grammar.
|
31
30
|
|
32
31
|
Also note that because match objects are themselves strings, assertions may be
|
33
32
|
made to test equality of match objects with string values.
|
@@ -36,9 +35,9 @@ made to test equality of match objects with string values.
|
|
36
35
|
|
37
36
|
When a parse fails, a [ParseError](api/classes/Citrus/ParseError.html) object is
|
38
37
|
generated which provides a wealth of information about exactly where the parse
|
39
|
-
failed
|
40
|
-
|
41
|
-
to do this.
|
38
|
+
failed including the offset, line number, line text, and line offset. Using this
|
39
|
+
object, you could possibly provide some useful feedback to the user about why
|
40
|
+
the input was bad. The following code demonstrates one way to do this.
|
42
41
|
|
43
42
|
def parse_some_stuff(stuff)
|
44
43
|
match = StuffGrammar.parse(stuff)
|
@@ -47,14 +46,7 @@ to do this.
|
|
47
46
|
[e.line_number, e.line_offset]
|
48
47
|
end
|
49
48
|
|
50
|
-
In addition to useful error objects, Citrus also includes a
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
When debugging is enabled, you can visualize parse trees in the console as XML
|
56
|
-
documents. This can help when determining which rules are generating which
|
57
|
-
matches and how they are organized in the output. Also when debugging, each
|
58
|
-
match object automatically records its offset in the original input, which can
|
59
|
-
also be very helpful in keeping track of which offsets in the input generated
|
60
|
-
which matches.
|
49
|
+
In addition to useful error objects, Citrus also includes a means of visualizing
|
50
|
+
match trees in the console via `Match#dump`. This can help when determining
|
51
|
+
which rules are generating which matches and how they are organized in the
|
52
|
+
match tree.
|
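Note on the error reporting described above: the relationship between an absolute offset, a line number, and a line offset can be sketched in plain Ruby. The helper `locate` is hypothetical and does not use Citrus. Line numbers are 1-based here, which is an assumption about the convention rather than something the diff states.

```ruby
# Hypothetical helper (not part of Citrus): converts an absolute +offset+ in
# +text+ into [line_number, line_offset, line_text]. Line numbers are 1-based
# in this sketch; that convention is an assumption.
def locate(text, offset)
  before      = text[0, offset]
  line_number = before.count("\n") + 1
  last_break  = before.rindex("\n")
  line_start  = last_break ? last_break + 1 : 0
  line_offset = offset - line_start
  line_text   = text[line_start..-1].lines.first.to_s.chomp
  [line_number, line_offset, line_text]
end

p locate("1 + 2\n3 * x\n", 10) # [2, 4, "3 * x"]
```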
data/examples/calc.citrus
CHANGED
@@ -5,7 +5,7 @@
 # An identical grammar that is written using pure Ruby can be found in calc.rb.
 grammar Calc
 
-  ##
+  ## Hierarchical syntax
 
   rule term
     additive | factor
@@ -55,50 +55,51 @@ grammar Calc
     (lparen term rparen) { term.value }
   end
 
-  ##
+  ## Lexical syntax
 
   rule number
     float | integer
   end
 
   rule float
-    (digits '.' digits space) { strip.to_f }
+    (digits '.' digits space*) { strip.to_f }
   end
 
   rule integer
-    (digits space) { strip.to_i }
+    (digits space*) { strip.to_i }
   end
 
   rule digits
+    # Numbers may contain underscores in Ruby.
     [0-9]+ ('_' [0-9]+)*
   end
 
   rule additive_operator
-    (('+' | '-') space) { |a, b|
+    (('+' | '-') space*) { |a, b|
       a.send(strip, b)
     }
   end
 
   rule multiplicative_operator
-    (('*' | '/' | '%') space) { |a, b|
+    (('*' | '/' | '%') space*) { |a, b|
       a.send(strip, b)
     }
   end
 
   rule exponential_operator
-    ('**' space) { |a, b|
+    ('**' space*) { |a, b|
       a ** b
     }
   end
 
   rule unary_operator
-    (('~' | '+' | '-') space) { |n|
+    (('~' | '+' | '-') space*) { |n|
       # Unary + and - require an @.
       n.send(strip == '~' ? strip : '%s@' % strip)
     }
   end
 
-  rule lparen '(' space
-  rule rparen ')' space
-  rule space [ \t\n\r]
+  rule lparen '(' space* end
+  rule rparen ')' space* end
+  rule space [ \t\n\r] end
 end
data/examples/calc.rb
CHANGED
@@ -8,7 +8,7 @@ require 'citrus'
|
|
8
8
|
# found in calc.citrus.
|
9
9
|
grammar :Calc do
|
10
10
|
|
11
|
-
##
|
11
|
+
## Hierarchical syntax
|
12
12
|
|
13
13
|
rule :term do
|
14
14
|
any(:additive, :factor)
|
@@ -58,50 +58,51 @@ grammar :Calc do
|
|
58
58
|
all(:lparen, :term, :rparen) { term.value }
|
59
59
|
end
|
60
60
|
|
61
|
-
##
|
61
|
+
## Lexical syntax
|
62
62
|
|
63
63
|
rule :number do
|
64
64
|
any(:float, :integer)
|
65
65
|
end
|
66
66
|
|
67
67
|
rule :float do
|
68
|
-
all(:digits, '.', :digits, :space) { strip.to_f }
|
68
|
+
all(:digits, '.', :digits, zero_or_more(:space)) { strip.to_f }
|
69
69
|
end
|
70
70
|
|
71
71
|
rule :integer do
|
72
|
-
all(:digits, :space) { strip.to_i }
|
72
|
+
all(:digits, zero_or_more(:space)) { strip.to_i }
|
73
73
|
end
|
74
74
|
|
75
75
|
rule :digits do
|
76
|
+
# Numbers may contain underscores in Ruby.
|
76
77
|
/[0-9]+(?:_[0-9]+)*/
|
77
78
|
end
|
78
79
|
|
79
80
|
rule :additive_operator do
|
80
|
-
all(any('+', '-'), :space) { |a, b|
|
81
|
+
all(any('+', '-'), zero_or_more(:space)) { |a, b|
|
81
82
|
a.send(strip, b)
|
82
83
|
}
|
83
84
|
end
|
84
85
|
|
85
86
|
rule :multiplicative_operator do
|
86
|
-
all(any('*', '/', '%'), :space) { |a, b|
|
87
|
+
all(any('*', '/', '%'), zero_or_more(:space)) { |a, b|
|
87
88
|
a.send(strip, b)
|
88
89
|
}
|
89
90
|
end
|
90
91
|
|
91
92
|
rule :exponential_operator do
|
92
|
-
all('**', :space) { |a, b|
|
93
|
+
all('**', zero_or_more(:space)) { |a, b|
|
93
94
|
a ** b
|
94
95
|
}
|
95
96
|
end
|
96
97
|
|
97
98
|
rule :unary_operator do
|
98
|
-
all(any('~', '+', '-'), :space) { |n|
|
99
|
+
all(any('~', '+', '-'), zero_or_more(:space)) { |n|
|
99
100
|
# Unary + and - require an @.
|
100
101
|
n.send(strip == '~' ? strip : '%s@' % strip)
|
101
102
|
}
|
102
103
|
end
|
103
104
|
|
104
|
-
rule :lparen, ['(', :space]
|
105
|
-
rule :rparen, [')', :space]
|
106
|
-
rule :space, /[ \t\n\r]
|
105
|
+
rule :lparen, ['(', zero_or_more(:space)]
|
106
|
+
rule :rparen, [')', zero_or_more(:space)]
|
107
|
+
rule :space, /[ \t\n\r]/
|
107
108
|
end
|
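Note on the operator rules above: both calc examples turn the matched operator text into a Ruby method call with `send`. The sketch below reproduces that idea outside of Citrus. The helper names `apply_binary` and `apply_unary` are hypothetical, and the operator strings stand in for the matched text that the grammar blocks strip.

```ruby
# Hypothetical helpers (not part of Citrus) showing the dispatch used by the
# operator blocks above.

# Binary operators are ordinary methods named after the operator itself.
def apply_binary(operator, a, b)
  a.send(operator.strip, b)
end

# Unary + and - are methods named "+@" and "-@"; "~" keeps its own name.
def apply_unary(operator, n)
  op = operator.strip
  n.send(op == '~' ? op : '%s@' % op)
end

p apply_binary('+ ', 2, 3)  # 5
p apply_binary('** ', 2, 3) # 8
p apply_unary('- ', 5)      # -5
p apply_unary('~', 5)       # -6
```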
data/lib/citrus.rb
CHANGED
@@ -8,7 +8,7 @@ require 'strscan'
|
|
8
8
|
module Citrus
|
9
9
|
autoload :File, 'citrus/file'
|
10
10
|
|
11
|
-
VERSION = [2,
|
11
|
+
VERSION = [2, 2, 0]
|
12
12
|
|
13
13
|
# Returns the current version of Citrus as a string.
|
14
14
|
def self.version
|
@@ -22,6 +22,8 @@ module Citrus
|
|
22
22
|
|
23
23
|
F = ::File
|
24
24
|
|
25
|
+
CLOSE = -1
|
26
|
+
|
25
27
|
# Loads the grammar from the given +file+ into the global scope using #eval.
|
26
28
|
def self.load(file)
|
27
29
|
file << '.citrus' unless F.file?(file)
|
@@ -40,26 +42,12 @@ module Citrus
|
|
40
42
|
# Parses the given Citrus +code+ using the given +options+. Returns the
|
41
43
|
# generated match tree. Raises a +SyntaxError+ if the parse fails.
|
42
44
|
def self.parse(code, options={})
|
43
|
-
|
44
|
-
File.parse(code, options)
|
45
|
-
rescue ParseError => e
|
46
|
-
raise SyntaxError.new(e)
|
47
|
-
end
|
45
|
+
File.parse(code, options)
|
48
46
|
end
|
49
47
|
|
50
48
|
# A standard error class that all Citrus errors extend.
|
51
49
|
class Error < RuntimeError; end
|
52
50
|
|
53
|
-
# Raised when there is an error parsing Citrus code.
|
54
|
-
class SyntaxError < Error
|
55
|
-
# The +error+ given here should be a +ParseError+ object.
|
56
|
-
def initialize(error)
|
57
|
-
msg = "Syntax error on line %d at offset %d\n%s" %
|
58
|
-
[error.line_number, error.line_offset, error.detail]
|
59
|
-
super(msg)
|
60
|
-
end
|
61
|
-
end
|
62
|
-
|
63
51
|
# Raised when a match cannot be found.
|
64
52
|
class NoMatchError < Error; end
|
65
53
|
|
@@ -71,8 +59,8 @@ module Citrus
|
|
71
59
|
@line_offset = input.line_offset(offset)
|
72
60
|
@line_number = input.line_number(offset)
|
73
61
|
@line = input.line(offset)
|
74
|
-
msg = "Failed to parse input at offset %d\n" %
|
75
|
-
|
62
|
+
msg = "Failed to parse input on line %d at offset %d\n%s" %
|
63
|
+
[line_number, line_offset, detail]
|
76
64
|
super(msg)
|
77
65
|
end
|
78
66
|
|
@@ -106,7 +94,7 @@ module Citrus
|
|
106
94
|
@max_offset = 0
|
107
95
|
end
|
108
96
|
|
109
|
-
# The maximum offset
|
97
|
+
# The maximum offset in the input that was successfully parsed.
|
110
98
|
attr_reader :max_offset
|
111
99
|
|
112
100
|
# A nested hash of rule id's to offsets and their respective matches. Only
|
@@ -116,11 +104,11 @@ module Citrus
|
|
116
104
|
# The number of times the cache was hit. Only present if memoing is enabled.
|
117
105
|
attr_reader :cache_hits
|
118
106
|
|
119
|
-
# Resets all internal variables so that this object may be used in
|
120
|
-
#
|
121
|
-
def reset
|
122
|
-
super
|
107
|
+
# Resets all internal variables so that this object may be used in another
|
108
|
+
# parse.
|
109
|
+
def reset # :nodoc:
|
123
110
|
@max_offset = 0
|
111
|
+
super
|
124
112
|
end
|
125
113
|
|
126
114
|
# Returns the length of this input.
|
@@ -153,7 +141,7 @@ module Citrus
|
|
153
141
|
# Returns the 0-based number of the line that contains the character at the
|
154
142
|
# given +pos+. +pos+ defaults to the current pointer position.
|
155
143
|
def line_index(pos=pos)
|
156
|
-
p
|
144
|
+
p = n = 0
|
157
145
|
each_line do |line|
|
158
146
|
p += line.length
|
159
147
|
return n if p >= pos
|
@@ -176,20 +164,29 @@ module Citrus
|
|
176
164
|
lines[line_index(pos)]
|
177
165
|
end
|
178
166
|
|
179
|
-
# Returns
|
180
|
-
#
|
181
|
-
|
182
|
-
|
183
|
-
|
167
|
+
# Returns an array of events for the given +rule+ at the current pointer
|
168
|
+
# position. Objects in this array may be one of three types: a rule id,
|
169
|
+
# Citrus::CLOSE, or a length.
|
170
|
+
def exec(rule, events=[])
|
171
|
+
start = pos
|
172
|
+
index = events.size
|
184
173
|
|
185
|
-
|
174
|
+
rule.exec(self, events)
|
175
|
+
|
176
|
+
if index < events.size
|
177
|
+
self.pos = start + events[-1]
|
186
178
|
@max_offset = pos if pos > @max_offset
|
187
179
|
else
|
188
|
-
|
189
|
-
self.pos = offset unless match
|
180
|
+
self.pos = start
|
190
181
|
end
|
191
182
|
|
192
|
-
|
183
|
+
events
|
184
|
+
end
|
185
|
+
|
186
|
+
# Returns the length of a match for the given +rule+ at the current pointer
|
187
|
+
# position, +nil+ if none can be made.
|
188
|
+
def test(rule)
|
189
|
+
rule.exec(self)[-1]
|
193
190
|
end
|
194
191
|
|
195
192
|
# Returns +true+ when using memoization to cache match results.
|
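Note on the event array described in the hunk above: a successful terminal match is recorded as a rule id, a close marker, and a length, and the scanner position is not advanced by the test itself. The sketch below shows that shape with Ruby's standard `StringScanner`. The method `terminal_events` and the rule ids are hypothetical stand-ins, not Citrus's API.

```ruby
require 'strscan'

# Marker that closes an event group, mirroring the CLOSE constant in the diff.
CLOSE = -1

# Hypothetical stand-in for a terminal rule: appends [rule_id, CLOSE, length]
# to +events+ when +pattern+ matches at the scanner's current position.
def terminal_events(scanner, rule_id, pattern, events = [])
  # scan_full(pattern, false, false) returns the match length without
  # advancing the scan pointer, or nil when there is no match.
  length = scanner.scan_full(pattern, false, false)
  events.push(rule_id, CLOSE, length) if length
  events
end

scanner = StringScanner.new('abc def')
p terminal_events(scanner, 1, /abc/) # [1, -1, 3]
p terminal_events(scanner, 2, /def/) # []
p scanner.pos                        # 0
```

An empty array signals failure, which is why callers in the diff can read the match length from the last element or get `nil` when nothing matched.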
@@ -205,29 +202,31 @@ module Citrus
|
|
205
202
|
def memoize!
|
206
203
|
return if memoized?
|
207
204
|
|
205
|
+
@cache = {}
|
206
|
+
@cache_hits = 0
|
207
|
+
|
208
208
|
# Using +instance_eval+ here preserves access to +super+ within the
|
209
209
|
# methods we define inside the block.
|
210
210
|
instance_eval do
|
211
|
-
def
|
211
|
+
def exec(rule, events=[]) # :nodoc:
|
212
212
|
c = @cache[rule.id] ||= {}
|
213
213
|
|
214
|
-
if c
|
214
|
+
e = if c[pos]
|
215
215
|
@cache_hits += 1
|
216
216
|
c[pos]
|
217
217
|
else
|
218
|
-
c[pos] = super
|
218
|
+
c[pos] = super(rule)
|
219
219
|
end
|
220
|
+
|
221
|
+
events.concat(e)
|
220
222
|
end
|
221
223
|
|
222
224
|
def reset # :nodoc:
|
223
|
-
|
224
|
-
@cache = {}
|
225
|
+
@cache.clear
|
225
226
|
@cache_hits = 0
|
227
|
+
super
|
226
228
|
end
|
227
229
|
end
|
228
|
-
|
229
|
-
@cache = {}
|
230
|
-
@cache_hits = 0
|
231
230
|
end
|
232
231
|
end
|
233
232
|
|
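Note on the memoization hunk above: results are cached per rule id and then per position, and a counter tracks cache hits. A minimal standalone sketch of that two-level cache follows. The class `MemoSketch` and its `fetch` method are hypothetical and simplify the diff by using `key?` to detect a cached entry.

```ruby
# Hypothetical two-level cache (not part of Citrus): rule id => position => result.
class MemoSketch
  attr_reader :cache_hits

  def initialize
    @cache = {}
    @cache_hits = 0
  end

  # Returns the cached result for (rule_id, pos). On a miss, the block is
  # evaluated once and its value is stored.
  def fetch(rule_id, pos)
    by_pos = @cache[rule_id] ||= {}
    if by_pos.key?(pos)
      @cache_hits += 1
      by_pos[pos]
    else
      by_pos[pos] = yield
    end
  end

  # Clears cached results so the object can be reused for another parse.
  def reset
    @cache.clear
    @cache_hits = 0
  end
end

memo = MemoSketch.new
evaluations = 0
2.times { memo.fetch(1, 0) { evaluations += 1; [1, -1, 3] } }
p evaluations     # 1
p memo.cache_hits # 1
```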
@@ -266,6 +265,16 @@ module Citrus
|
|
266
265
|
super
|
267
266
|
end
|
268
267
|
|
268
|
+
# Parses the given +string+ using this grammar's root rule. Optionally, the
|
269
|
+
# name of a different rule may be given here as the value of the +:root+
|
270
|
+
# option. Otherwise, all options are the same as in Rule#parse.
|
271
|
+
def parse(string, options={})
|
272
|
+
rule_name = options.delete(:root) || root
|
273
|
+
rule = rule(rule_name)
|
274
|
+
raise 'No rule named "%s"' % rule_name unless rule
|
275
|
+
rule.parse(string, options)
|
276
|
+
end
|
277
|
+
|
269
278
|
# Returns the name of this grammar as a string.
|
270
279
|
def name
|
271
280
|
super.to_s
|
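Note on the `:root` option introduced above: the grammar looks up the named rule and falls back to its default root when the option is absent. The sketch below mimics that lookup with lambdas in place of real rules. `TinyGrammar` is hypothetical, and unlike the diff it copies the options hash before deleting `:root`.

```ruby
# Hypothetical miniature grammar (not part of Citrus): rules are lambdas that
# take the input string and return a value.
class TinyGrammar
  def initialize(root, rules)
    @root = root
    @rules = rules
  end

  # Uses the rule named by the :root option, or the default root otherwise.
  def parse(string, options = {})
    options = options.dup # avoid mutating the caller's hash
    rule_name = options.delete(:root) || @root
    rule = @rules[rule_name]
    raise 'No rule named "%s"' % rule_name unless rule
    rule.call(string)
  end
end

grammar = TinyGrammar.new(:sum,
  :sum    => lambda { |s| s.split('+').map { |n| Integer(n.strip) }.inject(0) { |a, b| a + b } },
  :number => lambda { |s| Integer(s.strip) })

p grammar.parse('1 + 2')                  # 3
p grammar.parse(' 42 ', :root => :number) # 42
```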
@@ -310,9 +319,9 @@ module Citrus
|
|
310
319
|
# and returns it on success. Returns +nil+ on failure.
|
311
320
|
def super_rule(name)
|
312
321
|
sym = name.to_sym
|
313
|
-
included_grammars.each do |
|
314
|
-
|
315
|
-
return
|
322
|
+
included_grammars.each do |grammar|
|
323
|
+
rule = grammar.rule(sym)
|
324
|
+
return rule if rule
|
316
325
|
end
|
317
326
|
nil
|
318
327
|
end
|
@@ -433,48 +442,6 @@ module Citrus
|
|
433
442
|
rule.extension = mod if mod
|
434
443
|
rule
|
435
444
|
end
|
436
|
-
|
437
|
-
# Parses the given input +string+ using the given +options+. If no match can
|
438
|
-
# be made, a ParseError is raised. See #default_parse_options for a detailed
|
439
|
-
# description of available parse options.
|
440
|
-
def parse(string, options={})
|
441
|
-
opts = default_parse_options.merge(options)
|
442
|
-
raise 'No root rule specified' unless opts[:root]
|
443
|
-
|
444
|
-
root_rule = rule(opts[:root])
|
445
|
-
raise 'No rule named "%s"' % root unless root_rule
|
446
|
-
|
447
|
-
input = Input.new(string)
|
448
|
-
input.memoize! if opts[:memoize]
|
449
|
-
input.pos = opts[:offset] if opts[:offset] > 0
|
450
|
-
|
451
|
-
match = input.match(root_rule)
|
452
|
-
if match.nil? || (opts[:consume] && input.length != match.length)
|
453
|
-
raise ParseError.new(input)
|
454
|
-
end
|
455
|
-
|
456
|
-
match
|
457
|
-
end
|
458
|
-
|
459
|
-
# The default set of options that is used in #parse. The options hash may
|
460
|
-
# have any of the following keys:
|
461
|
-
#
|
462
|
-
# offset:: The offset at which the parse should start. Defaults to 0.
|
463
|
-
# root:: The name of the root rule to use for the parse. Defaults
|
464
|
-
# to the name supplied by calling #root.
|
465
|
-
# memoize:: If this is +true+ the matches generated during a parse are
|
466
|
-
# memoized. See Input#memoize! for more information. Defaults to
|
467
|
-
# +false+.
|
468
|
-
# consume:: If this is +true+ a ParseError will be raised during a parse
|
469
|
-
# unless the entire input string is consumed. Defaults to
|
470
|
-
# +false+.
|
471
|
-
def default_parse_options
|
472
|
-
{ :offset => 0,
|
473
|
-
:root => root,
|
474
|
-
:memoize => false,
|
475
|
-
:consume => false
|
476
|
-
}
|
477
|
-
end
|
478
445
|
end
|
479
446
|
|
480
447
|
# A Rule is an object that is used by a grammar to create matches on the
|
@@ -491,12 +458,13 @@ module Citrus
|
|
491
458
|
# Returns a new Rule object depending on the type of object given.
|
492
459
|
def self.new(obj)
|
493
460
|
case obj
|
494
|
-
when Rule
|
495
|
-
when Symbol
|
496
|
-
when String
|
497
|
-
when
|
498
|
-
when
|
499
|
-
when
|
461
|
+
when Rule then obj
|
462
|
+
when Symbol then Alias.new(obj)
|
463
|
+
when String then StringTerminal.new(obj)
|
464
|
+
when Regexp then Terminal.new(obj)
|
465
|
+
when Array then Sequence.new(obj)
|
466
|
+
when Range then Choice.new(obj.to_a)
|
467
|
+
when Numeric then StringTerminal.new(obj.to_s)
|
500
468
|
else
|
501
469
|
raise ArgumentError, "Invalid rule object: %s" % obj.inspect
|
502
470
|
end
|
@@ -504,26 +472,44 @@ module Citrus
|
|
504
472
|
|
505
473
|
@unique_id = 0
|
506
474
|
|
507
|
-
#
|
508
|
-
|
509
|
-
|
475
|
+
# A global registry for Rule objects. Keyed by rule id.
|
476
|
+
@rules = {}
|
477
|
+
|
478
|
+
# Adds the given +rule+ to the global registry and gives it an id.
|
479
|
+
def self.<<(rule) # :nodoc:
|
480
|
+
rule.id = (@unique_id += 1)
|
481
|
+
@rules[rule.id] = rule
|
510
482
|
end
|
511
483
|
|
512
|
-
#
|
513
|
-
|
484
|
+
# Returns the Rule object with the given +id+.
|
485
|
+
def self.[](id)
|
486
|
+
@rules[id]
|
487
|
+
end
|
514
488
|
|
515
|
-
|
516
|
-
|
517
|
-
@id ||= Rule.new_id
|
489
|
+
def initialize(*args) # :nodoc:
|
490
|
+
Rule << self
|
518
491
|
end
|
519
492
|
|
493
|
+
# An integer id that is unique to this rule.
|
494
|
+
attr_accessor :id
|
495
|
+
|
496
|
+
# The grammar this rule belongs to.
|
497
|
+
attr_accessor :grammar
|
498
|
+
|
520
499
|
# Sets the name of this rule.
|
521
500
|
def name=(name)
|
522
501
|
@name = name.to_sym
|
523
502
|
end
|
524
503
|
|
525
|
-
#
|
526
|
-
|
504
|
+
# Returns the name of this rule.
|
505
|
+
def name
|
506
|
+
@name || '<anonymous>'
|
507
|
+
end
|
508
|
+
|
509
|
+
# Returns +true+ if this rule has a name, +false+ otherwise.
|
510
|
+
def named?
|
511
|
+
!! @name
|
512
|
+
end
|
527
513
|
|
528
514
|
# Specifies a module that will be used to extend all Match objects that
|
529
515
|
# result from this rule. If +mod+ is a Proc, it is used to create an
|
@@ -532,9 +518,9 @@ module Citrus
|
|
532
518
|
if Proc === mod
|
533
519
|
begin
|
534
520
|
tmp = Module.new(&mod)
|
535
|
-
raise ArgumentError
|
521
|
+
raise ArgumentError if tmp.instance_methods.empty?
|
536
522
|
mod = tmp
|
537
|
-
rescue ArgumentError, NameError
|
523
|
+
rescue NoMethodError, ArgumentError, NameError
|
538
524
|
mod = Module.new { define_method(:value, &mod) }
|
539
525
|
end
|
540
526
|
end
|
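Note on the extension fallback above: a block that defines methods becomes a module directly, while a block that is just an expression becomes the body of a `value` method. The standalone sketch below follows the same steps. The helper `build_extension` is a hypothetical name, and plain strings stand in for match objects.

```ruby
# Hypothetical helper (not part of Citrus) mirroring the fallback in the hunk
# above: try the block as a module body first, then as a #value method.
def build_extension(block)
  begin
    candidate = Module.new(&block)
    # A block that ran without defining any method is not a module body.
    raise ArgumentError if candidate.instance_methods.empty?
    candidate
  rescue NoMethodError, ArgumentError, NameError
    Module.new { define_method(:value, &block) }
  end
end

with_methods = build_extension(proc { def doubled; to_i * 2; end })
bare_block   = build_extension(proc { to_i })

p '21'.dup.extend(with_methods).doubled # 42
p '7'.dup.extend(bare_block).value      # 7
```

The bare block fails when evaluated against the new module because a module has no `to_i`, which is what triggers the rescue and the `define_method` path.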
@@ -547,11 +533,70 @@ module Citrus
|
|
547
533
|
# The module this rule uses to extend new matches.
|
548
534
|
attr_reader :extension
|
549
535
|
|
536
|
+
# Attempts to parse the given +string+ and return a Match if any can be
|
537
|
+
# made. The +options+ may contain any of the following keys:
|
538
|
+
#
|
539
|
+
# offset:: The offset in +string+ at which to start the parse. Defaults
|
540
|
+
# to 0.
|
541
|
+
# memoize:: If this is +true+ the matches generated during a parse are
|
542
|
+
# memoized. See Input#memoize! for more information. Defaults to
|
543
|
+
# +false+.
|
544
|
+
# consume:: If this is +true+ a ParseError will be raised during a parse
|
545
|
+
# unless the entire input string is consumed. Defaults to
|
546
|
+
# +false+.
|
547
|
+
def parse(string, options={})
|
548
|
+
opts = default_parse_options.merge(options)
|
549
|
+
|
550
|
+
input = Input.new(string)
|
551
|
+
input.memoize! if opts[:memoize]
|
552
|
+
input.pos = opts[:offset] if opts[:offset] > 0
|
553
|
+
|
554
|
+
start = input.pos
|
555
|
+
events = input.exec(self)
|
556
|
+
length = events[-1]
|
557
|
+
|
558
|
+
if !length || (opts[:consume] && length < (input.length - opts[:offset]))
|
559
|
+
raise ParseError.new(input)
|
560
|
+
end
|
561
|
+
|
562
|
+
Match.new(string.slice(start, length), events)
|
563
|
+
end
|
564
|
+
|
565
|
+
# The default set of options to use when parsing.
|
566
|
+
def default_parse_options # :nodoc:
|
567
|
+
{ :offset => 0,
|
568
|
+
:memoize => false,
|
569
|
+
:consume => false
|
570
|
+
}
|
571
|
+
end
|
572
|
+
|
573
|
+
# Tests whether or not this rule matches on the given +string+. Returns the
|
574
|
+
# length of the match if any can be made, +nil+ otherwise.
|
575
|
+
def test(string)
|
576
|
+
input = Input.new(string)
|
577
|
+
input.test(self)
|
578
|
+
end
|
579
|
+
|
550
580
|
# Returns +true+ if this rule is a Terminal.
|
551
581
|
def terminal?
|
552
582
|
is_a?(Terminal)
|
553
583
|
end
|
554
584
|
|
585
|
+
# Returns +true+ if this rule is able to propagate extensions from child
|
586
|
+
# rules to the scope of the parent, +false+ otherwise. In general, this will
|
587
|
+
# return +false+ for any rule whose match value is derived from an arbitrary
|
588
|
+
# number of child rules, such as a Repeat or a Sequence. Note that this is
|
589
|
+
# not true for Choice objects because they rely on exactly 1 rule to match,
|
590
|
+
# as do Proxy objects.
|
591
|
+
def propagates_extensions?
|
592
|
+
case self
|
593
|
+
when AndPredicate, NotPredicate, ButPredicate, Repeat, Sequence
|
594
|
+
false
|
595
|
+
else
|
596
|
+
true
|
597
|
+
end
|
598
|
+
end
|
599
|
+
|
555
600
|
# Returns +true+ if this rule needs to be surrounded by parentheses when
|
556
601
|
# using #embed.
|
557
602
|
def paren?
|
@@ -561,23 +606,90 @@ module Citrus
|
|
561
606
|
# Returns a string version of this rule that is suitable to be used in the
|
562
607
|
# string representation of another rule.
|
563
608
|
def embed
|
564
|
-
|
609
|
+
named? ? name.to_s : (paren? ? '(%s)' % to_s : to_s)
|
565
610
|
end
|
566
611
|
|
567
612
|
def inspect # :nodoc:
|
568
613
|
to_s
|
569
614
|
end
|
615
|
+
end
|
570
616
|
|
571
|
-
|
617
|
+
# A Terminal is a Rule that matches directly on the input stream and may not
|
618
|
+
# contain any other rule. Terminals are essentially wrappers for regular
|
619
|
+
# expressions. As such, the Citrus notation is identical to Ruby's regular
|
620
|
+
# expression notation, e.g.:
|
621
|
+
#
|
622
|
+
# /expr/
|
623
|
+
#
|
624
|
+
# Character classes and the dot symbol may also be used in Citrus notation for
|
625
|
+
# compatibility with other parsing expression implementations, e.g.:
|
626
|
+
#
|
627
|
+
# [a-zA-Z]
|
628
|
+
# .
|
629
|
+
#
|
630
|
+
class Terminal
|
631
|
+
include Rule
|
632
|
+
|
633
|
+
def initialize(rule=/^/)
|
634
|
+
super
|
635
|
+
@rule = rule
|
636
|
+
end
|
637
|
+
|
638
|
+
# The actual Regexp object this rule uses to match.
|
639
|
+
attr_reader :rule
|
640
|
+
|
641
|
+
# Returns an array of events for this rule on the given +input+.
|
642
|
+
def exec(input, events=[])
|
643
|
+
length = input.scan_full(rule, false, false)
|
644
|
+
if length
|
645
|
+
events << id
|
646
|
+
events << CLOSE
|
647
|
+
events << length
|
648
|
+
end
|
649
|
+
events
|
650
|
+
end
|
572
651
|
|
573
|
-
|
574
|
-
|
575
|
-
|
576
|
-
match
|
652
|
+
# Returns +true+ if this rule is case sensitive.
|
653
|
+
def case_sensitive?
|
654
|
+
!rule.casefold?
|
577
655
|
end
|
578
656
|
|
579
|
-
|
580
|
-
|
657
|
+
# Returns the Citrus notation of this rule as a string.
|
658
|
+
def to_s
|
659
|
+
rule.inspect
|
660
|
+
end
|
661
|
+
end
|
662
|
+
|
663
|
+
# A StringTerminal is a Terminal that may be instantiated from a String
|
664
|
+
# object. The Citrus notation is any sequence of characters enclosed in either
|
665
|
+
# single or double quotes, e.g.:
|
666
|
+
#
|
667
|
+
# 'expr'
|
668
|
+
# "expr"
|
669
|
+
#
|
670
|
+
# This notation works the same as it does in Ruby; i.e. strings in double
|
671
|
+
# quotes may contain escape sequences while strings in single quotes may not.
|
672
|
+
# In order to specify that a string should ignore case when matching, enclose
|
673
|
+
# it in backticks instead of single or double quotes, e.g.:
|
674
|
+
#
|
675
|
+
# `expr`
|
676
|
+
#
|
677
|
+
# Besides case sensitivity, case-insensitive strings have the same semantics
|
678
|
+
# as double-quoted strings.
|
679
|
+
class StringTerminal < Terminal
|
680
|
+
# The +flags+ will be passed directly to Regexp#new.
|
681
|
+
def initialize(rule='', flags=0)
|
682
|
+
super(Regexp.new(Regexp.escape(rule), flags))
|
683
|
+
@string = rule
|
684
|
+
end
|
685
|
+
|
686
|
+
# Returns the Citrus notation of this rule as a string.
|
687
|
+
def to_s
|
688
|
+
if case_sensitive?
|
689
|
+
@string.inspect
|
690
|
+
else
|
691
|
+
@string.inspect.gsub(/^"|"$/, '`')
|
692
|
+
end
|
581
693
|
end
|
582
694
|
end
|
583
695
|
|
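The `StringTerminal` added above escapes its string before compiling it, so regular expression metacharacters match literally, and `to_s` swaps the surrounding double quotes for backticks when the rule ignores case. A minimal sketch of that behaviour outside of Citrus, where `string_terminal_notation` is a hypothetical helper and not part of the library:

```ruby
# Hypothetical helper (not part of Citrus): mirrors how StringTerminal
# builds its Regexp and renders its notation in the diff above.
def string_terminal_notation(string, flags = 0)
  regexp = Regexp.new(Regexp.escape(string), flags)
  if regexp.casefold?
    # Case-insensitive strings are written in backticks.
    string.inspect.gsub(/^"|"$/, '`')
  else
    string.inspect
  end
end

# Escaping means the dot only matches a literal dot.
literal = Regexp.new(Regexp.escape('a.b'))
puts literal.match?('a.b')   # true
puts literal.match?('axb')   # false

puts string_terminal_notation('abc')                      # "abc"
puts string_terminal_notation('abc', Regexp::IGNORECASE)  # `abc`
```

Passing `Regexp::IGNORECASE` as the flags is what makes `casefold?` return true, which is the condition `case_sensitive?` negates.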
@@ -589,6 +701,7 @@ module Citrus
|
|
589
701
|
include Rule
|
590
702
|
|
591
703
|
def initialize(rule_name='<proxy>')
|
704
|
+
super
|
592
705
|
self.rule_name = rule_name
|
593
706
|
end
|
594
707
|
|
@@ -605,10 +718,9 @@ module Citrus
|
|
605
718
|
@rule ||= resolve!
|
606
719
|
end
|
607
720
|
|
608
|
-
# Returns
|
609
|
-
def
|
610
|
-
|
611
|
-
extend_match(m, name) if m
|
721
|
+
# Returns an array of events for this rule on the given +input+.
|
722
|
+
def exec(input, events=[])
|
723
|
+
input.exec(rule, events)
|
612
724
|
end
|
613
725
|
end
|
614
726
|
|
@@ -631,10 +743,8 @@ module Citrus
|
|
631
743
|
# Searches this proxy's grammar and any included grammars for a rule with
|
632
744
|
# this proxy's #rule_name. Raises an error if one cannot be found.
|
633
745
|
def resolve!
|
634
|
-
|
635
|
-
|
636
|
-
[rule_name, grammar.name] unless rule
|
637
|
-
rule
|
746
|
+
grammar.rule(rule_name) or raise RuntimeError,
|
747
|
+
'No rule named "%s" in grammar %s' % [rule_name, grammar.name]
|
638
748
|
end
|
639
749
|
end
|
640
750
|
|
@@ -658,60 +768,8 @@ module Citrus
|
|
658
768
|
# Searches this proxy's included grammars for a rule with this proxy's
|
659
769
|
# #rule_name. Raises an error if one cannot be found.
|
660
770
|
def resolve!
|
661
|
-
|
662
|
-
|
663
|
-
[rule_name, grammar.name] unless rule
|
664
|
-
rule
|
665
|
-
end
|
666
|
-
end
|
667
|
-
|
668
|
-
# A Terminal is a Rule that matches directly on the input stream and may not
|
669
|
-
# contain any other rule. Terminals may be created from either a String or a
|
670
|
-
# Regexp object. When created from strings, the Citrus notation is any
|
671
|
-
# sequence of characters enclosed in either single or double quotes, e.g.:
|
672
|
-
#
|
673
|
-
# 'expr'
|
674
|
-
# "expr"
|
675
|
-
#
|
676
|
-
# When created from a regular expression, the Citrus notation is identical to
|
677
|
-
# Ruby's regular expression notation, e.g.:
|
678
|
-
#
|
679
|
-
# /expr/
|
680
|
-
#
|
681
|
-
# Character classes and the dot symbol may also be used in Citrus notation for
|
682
|
-
# compatibility with other parsing expression implementations, e.g.:
|
683
|
-
#
|
684
|
-
# [a-zA-Z]
|
685
|
-
# .
|
686
|
-
#
|
687
|
-
class Terminal
|
688
|
-
include Rule
|
689
|
-
|
690
|
-
def initialize(rule='')
|
691
|
-
case rule
|
692
|
-
when String
|
693
|
-
@string = rule
|
694
|
-
@rule = Regexp.new(Regexp.escape(rule))
|
695
|
-
when Regexp
|
696
|
-
@rule = rule
|
697
|
-
else
|
698
|
-
raise ArgumentError, "Cannot create terminal from object: %s" %
|
699
|
-
rule.inspect
|
700
|
-
end
|
701
|
-
end
|
702
|
-
|
703
|
-
# The actual Regexp object this rule uses to match.
|
704
|
-
attr_reader :rule
|
705
|
-
|
706
|
-
# Returns the Match for this rule on +input+, +nil+ if no match can be made.
|
707
|
-
def match(input)
|
708
|
-
m = input.scan(rule)
|
709
|
-
create_match(m) if m
|
710
|
-
end
|
711
|
-
|
712
|
-
# Returns the Citrus notation of this rule as a string.
|
713
|
-
def to_s
|
714
|
-
(@string || @rule).inspect
|
771
|
+
grammar.super_rule(rule_name) or raise RuntimeError,
|
772
|
+
'No rule named "%s" in hierarchy of grammar %s' % [rule_name, grammar.name]
|
715
773
|
end
|
716
774
|
end
|
717
775
|
|
@@ -723,15 +781,16 @@ module Citrus
|
|
723
781
|
include Rule
|
724
782
|
|
725
783
|
def initialize(rules=[])
|
784
|
+
super
|
726
785
|
@rules = rules.map {|r| Rule.new(r) }
|
727
786
|
end
|
728
787
|
|
729
788
|
# An array of the actual Rule objects this rule uses to match.
|
730
789
|
attr_reader :rules
|
731
790
|
|
732
|
-
def grammar=(grammar)
|
733
|
-
@rules.each {|r| r.grammar = grammar }
|
791
|
+
def grammar=(grammar) # :nodoc:
|
734
792
|
super
|
793
|
+
@rules.each {|r| r.grammar = grammar }
|
735
794
|
end
|
736
795
|
end
|
737
796
|
|
@@ -758,9 +817,14 @@ module Citrus
|
|
758
817
|
class AndPredicate
|
759
818
|
include Predicate
|
760
819
|
|
761
|
-
# Returns
|
762
|
-
def
|
763
|
-
|
820
|
+
# Returns an array of events for this rule on the given +input+.
|
821
|
+
def exec(input, events=[])
|
822
|
+
if input.test(rule)
|
823
|
+
events << id
|
824
|
+
events << CLOSE
|
825
|
+
events << 0
|
826
|
+
end
|
827
|
+
events
|
764
828
|
end
|
765
829
|
|
766
830
|
# Returns the Citrus notation of this rule as a string.
|
@@ -778,9 +842,14 @@ module Citrus
|
|
778
842
|
class NotPredicate
|
779
843
|
include Predicate
|
780
844
|
|
781
|
-
# Returns
|
782
|
-
def
|
783
|
-
|
845
|
+
# Returns an array of events for this rule on the given +input+.
|
846
|
+
def exec(input, events=[])
|
847
|
+
unless input.test(rule)
|
848
|
+
events << id
|
849
|
+
events << CLOSE
|
850
|
+
events << 0
|
851
|
+
end
|
852
|
+
events
|
784
853
|
end
|
785
854
|
|
786
855
|
# Returns the Citrus notation of this rule as a string.
|
@@ -800,16 +869,20 @@ module Citrus
|
|
800
869
|
|
801
870
|
DOT_RULE = Rule.new(DOT)
|
802
871
|
|
803
|
-
# Returns
|
804
|
-
def
|
805
|
-
|
806
|
-
|
807
|
-
|
808
|
-
break unless
|
809
|
-
|
872
|
+
# Returns an array of events for this rule on the given +input+.
|
873
|
+
def exec(input, events=[])
|
874
|
+
length = 0
|
875
|
+
until input.test(rule)
|
876
|
+
len = input.exec(DOT_RULE)[-1]
|
877
|
+
break unless len
|
878
|
+
length += len
|
879
|
+
end
|
880
|
+
if length > 0
|
881
|
+
events << id
|
882
|
+
events << CLOSE
|
883
|
+
events << length
|
810
884
|
end
|
811
|
-
|
812
|
-
create_match(matches.join) if matches.any?
|
885
|
+
events
|
813
886
|
end
|
814
887
|
|
815
888
|
# Returns the Citrus notation of this rule as a string.
|
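The `exec` method in the hunk above consumes one character at a time via `DOT_RULE` until the wrapped rule would match or the input runs out, then reports the total length consumed. A rough sketch of the same loop using Ruby's standard `StringScanner` in place of the Citrus `Input` class; `consume_until` is a hypothetical name, and `match?` stands in for `input.test`:

```ruby
require 'strscan'

# Hypothetical helper (not part of Citrus): advances the scanner until
# +stop+ would match at the current position or the input is exhausted.
# Returns the number of characters consumed.
def consume_until(scanner, stop)
  length = 0
  until scanner.match?(stop)   # look ahead without consuming
    len = scanner.skip(/./m)   # consume any single character
    break unless len           # nil at end of input
    length += len
  end
  length
end

scanner = StringScanner.new('abc;def')
puts consume_until(scanner, /;/)  # 3
puts scanner.rest                 # ;def
```

As in the diff, a result of zero would mean nothing was consumed, which is the case where the rule emits no events.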
@@ -841,12 +914,9 @@ module Citrus
|
|
841
914
|
# The label this rule adds to all its matches.
|
842
915
|
attr_reader :label
|
843
916
|
|
844
|
-
# Returns
|
845
|
-
|
846
|
-
|
847
|
-
def match(input)
|
848
|
-
m = input.match(rule)
|
849
|
-
extend_match(m, label) if m
|
917
|
+
# Returns an array of events for this rule on the given +input+.
|
918
|
+
def exec(input, events=[])
|
919
|
+
input.exec(rule, events)
|
850
920
|
end
|
851
921
|
|
852
922
|
# Returns the Citrus notation of this rule as a string.
|
@@ -878,20 +948,32 @@ module Citrus
|
|
878
948
|
include Predicate
|
879
949
|
|
880
950
|
def initialize(rule='', min=1, max=Infinity)
|
881
|
-
super(rule)
|
882
951
|
raise ArgumentError, "Min cannot be greater than max" if min > max
|
952
|
+
super(rule)
|
883
953
|
@range = Range.new(min, max)
|
884
954
|
end
|
885
955
|
|
886
|
-
# Returns
|
887
|
-
def
|
888
|
-
|
889
|
-
|
890
|
-
|
891
|
-
|
892
|
-
|
956
|
+
# Returns an array of events for this rule on the given +input+.
|
957
|
+
def exec(input, events=[])
|
958
|
+
events << id
|
959
|
+
|
960
|
+
index = events.size
|
961
|
+
start = index - 1
|
962
|
+
length = n = 0
|
963
|
+
while n < max && input.exec(rule, events).size > index
|
964
|
+
index = events.size
|
965
|
+
length += events[-1]
|
966
|
+
n += 1
|
893
967
|
end
|
894
|
-
|
968
|
+
|
969
|
+
if n >= min
|
970
|
+
events << CLOSE
|
971
|
+
events << length
|
972
|
+
else
|
973
|
+
events.slice!(start, events.size)
|
974
|
+
end
|
975
|
+
|
976
|
+
events
|
895
977
|
end
|
896
978
|
|
897
979
|
# The minimum number of times this rule must match.
|
@@ -941,13 +1023,25 @@ module Citrus
|
|
941
1023
|
class Choice
|
942
1024
|
include List
|
943
1025
|
|
944
|
-
# Returns
|
945
|
-
def
|
946
|
-
|
947
|
-
|
948
|
-
|
1026
|
+
# Returns an array of events for this rule on the given +input+.
|
1027
|
+
def exec(input, events=[])
|
1028
|
+
events << id
|
1029
|
+
|
1030
|
+
index = events.size
|
1031
|
+
start = index - 1
|
1032
|
+
n = 0
|
1033
|
+
while n < rules.length && input.exec(rules[n], events).size == index
|
1034
|
+
n += 1
|
949
1035
|
end
|
950
|
-
|
1036
|
+
|
1037
|
+
if index < events.size
|
1038
|
+
events << CLOSE
|
1039
|
+
events << events[-2]
|
1040
|
+
else
|
1041
|
+
events.slice!(start, events.size)
|
1042
|
+
end
|
1043
|
+
|
1044
|
+
events
|
951
1045
|
end
|
952
1046
|
|
953
1047
|
# Returns the Citrus notation of this rule as a string.
|
@@ -964,15 +1058,27 @@ module Citrus
|
|
964
1058
|
class Sequence
|
965
1059
|
include List
|
966
1060
|
|
967
|
-
# Returns
|
968
|
-
def
|
969
|
-
|
970
|
-
|
971
|
-
|
972
|
-
|
973
|
-
|
1061
|
+
# Returns an array of events for this rule on the given +input+.
|
1062
|
+
def exec(input, events=[])
|
1063
|
+
events << id
|
1064
|
+
|
1065
|
+
index = events.size
|
1066
|
+
start = index - 1
|
1067
|
+
length = n = 0
|
1068
|
+
while n < rules.length && input.exec(rules[n], events).size > index
|
1069
|
+
index = events.size
|
1070
|
+
length += events[-1]
|
1071
|
+
n += 1
|
1072
|
+
end
|
1073
|
+
|
1074
|
+
if n == rules.length
|
1075
|
+
events << CLOSE
|
1076
|
+
events << length
|
1077
|
+
else
|
1078
|
+
events.slice!(start, events.size)
|
974
1079
|
end
|
975
|
-
|
1080
|
+
|
1081
|
+
events
|
976
1082
|
end
|
977
1083
|
|
978
1084
|
# Returns the Citrus notation of this rule as a string.
|
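The `Sequence#exec` hunk above appends its own id, lets each sub-rule append its events, and either closes the group with the total length or slices its partial events back off on failure. A minimal sketch of that append-then-rollback pattern, assuming terminals are plain `[id, Regexp]` pairs and using `StringScanner` instead of the Citrus `Input` class. The `:close` marker, the symbol ids, and the explicit position reset are assumptions of this sketch, not the library's actual values:

```ruby
require 'strscan'

CLOSE = :close  # assumed stand-in for the CLOSE marker in the diff

# Appends [id, CLOSE, length] when +regexp+ matches at the scanner position.
def exec_terminal(id, regexp, scanner, events)
  length = scanner.skip(regexp)
  events.push(id, CLOSE, length) if length
  events
end

# Runs every terminal in order; rolls the events back if any of them fails.
def exec_sequence(id, terminals, scanner, events = [])
  events << id
  index = events.size
  start = index - 1
  pos = scanner.pos
  length = n = 0
  while n < terminals.length
    tid, regexp = terminals[n]
    # A sub-rule succeeded only if it added events.
    break unless exec_terminal(tid, regexp, scanner, events).size > index
    index = events.size
    length += events[-1]
    n += 1
  end
  if n == terminals.length
    events.push(CLOSE, length)
  else
    events.slice!(start, events.size)  # drop this rule's partial events
    scanner.pos = pos                  # assumption: restore input position
  end
  events
end

rules = [[:a, /a/], [:b, /b/]]
p exec_sequence(:seq, rules, StringScanner.new('ab'))
p exec_sequence(:seq, rules, StringScanner.new('ac'))
```

The first call yields a nested stream for the whole sequence, while the second returns an empty array because the partial events for `:a` are removed when `:b` fails.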
@@ -985,19 +1091,19 @@ module Citrus
|
|
985
1091
|
# match may contain any number of other matches. This class provides several
|
986
1092
|
# convenient tree traversal methods that help when examining parse results.
|
987
1093
|
class Match < String
|
988
|
-
def initialize(
|
989
|
-
|
990
|
-
|
991
|
-
|
992
|
-
|
993
|
-
|
994
|
-
|
995
|
-
|
996
|
-
raise ArgumentError, "Cannot create match from object: %s" %
|
997
|
-
data.inspect
|
998
|
-
end
|
1094
|
+
def initialize(string, events=[])
|
1095
|
+
raise ArgumentError, "Invalid events for match length %d" %
|
1096
|
+
string.length if events[-1] && string.length != events[-1]
|
1097
|
+
|
1098
|
+
super(string)
|
1099
|
+
@events = events
|
1100
|
+
|
1101
|
+
extend!
|
999
1102
|
end
|
1000
1103
|
|
1104
|
+
# The array of events that was passed to the constructor.
|
1105
|
+
attr_reader :events
|
1106
|
+
|
1001
1107
|
# An array of all names of this match. A name is added to a match object
|
1002
1108
|
# for each rule that returns that object when matching. These names can then
|
1003
1109
|
# be used to determine which rules were satisfied by a given match.
|
@@ -1012,20 +1118,64 @@ module Citrus
|
|
1012
1118
|
|
1013
1119
|
# Returns +true+ if this match has the given +name+.
|
1014
1120
|
def has_name?(name)
|
1015
|
-
names.include?(name)
|
1121
|
+
names.include?(name.to_sym)
|
1122
|
+
end
|
1123
|
+
|
1124
|
+
# Returns an array of all Rule objects that extend this match.
|
1125
|
+
def extenders
|
1126
|
+
@extenders ||= begin
|
1127
|
+
extenders = []
|
1128
|
+
@events.each do |event|
|
1129
|
+
break if event == CLOSE
|
1130
|
+
rule = Rule[event]
|
1131
|
+
extenders.unshift(rule)
|
1132
|
+
break unless rule.propagates_extensions?
|
1133
|
+
end
|
1134
|
+
extenders
|
1135
|
+
end
|
1136
|
+
end
|
1137
|
+
|
1138
|
+
# Returns a reference to the Rule object that first created this match.
|
1139
|
+
def creator
|
1140
|
+
extenders.first
|
1016
1141
|
end
|
1017
1142
|
|
1018
|
-
#
|
1143
|
+
# Returns an array of Match objects that are submatches of this match in the
|
1144
|
+
# order they appeared in the input.
|
1019
1145
|
def matches
|
1020
|
-
@matches ||=
|
1146
|
+
@matches ||= begin
|
1147
|
+
matches = []
|
1148
|
+
stack = []
|
1149
|
+
offset = 0
|
1150
|
+
close = false
|
1151
|
+
index = 0
|
1152
|
+
|
1153
|
+
while index < @events.size
|
1154
|
+
event = @events[index]
|
1155
|
+
if close
|
1156
|
+
start = stack.pop
|
1157
|
+
if stack.size == extenders.size
|
1158
|
+
matches << Match.new(slice(offset, event), @events[start..index])
|
1159
|
+
offset += event
|
1160
|
+
end
|
1161
|
+
close = false
|
1162
|
+
elsif event == CLOSE
|
1163
|
+
close = true
|
1164
|
+
else
|
1165
|
+
stack << index
|
1166
|
+
end
|
1167
|
+
index += 1
|
1168
|
+
end
|
1169
|
+
|
1170
|
+
matches
|
1171
|
+
end
|
1021
1172
|
end
|
1022
1173
|
|
1023
1174
|
# Returns an array of all sub-matches with the given +name+. If +deep+ is
|
1024
1175
|
# +false+, returns only sub-matches that are immediate descendants of this
|
1025
1176
|
# match.
|
1026
1177
|
def find(name, deep=true)
|
1027
|
-
|
1028
|
-
ms = matches.select {|m| m.has_name?(sym) }
|
1178
|
+
ms = matches.select {|m| m.has_name?(name) }
|
1029
1179
|
matches.each {|m| ms.concat(m.find(name, deep)) } if deep
|
1030
1180
|
ms
|
1031
1181
|
end
|
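The new `Match#matches` in the hunk above rebuilds sub-matches lazily from the flat event array: it keeps a stack of open rule indexes and, each time a group closes at the depth just below the match's own extenders, slices out that group's text and events. A simplified sketch of the walk on plain arrays, where `submatches`, the symbol ids, and the `:close` marker are hypothetical stand-ins, and `depth` plays the role of `extenders.size`:

```ruby
CLOSE = :close  # assumed stand-in for the CLOSE marker in the diff

# Returns [text, events] pairs for the groups nested directly below the
# outermost +depth+ rules of the stream.
def submatches(string, events, depth = 1)
  matches = []
  stack = []
  offset = 0
  close = false
  events.each_with_index do |event, index|
    if close
      # The event after CLOSE is the length of the group being closed.
      start = stack.pop
      if stack.size == depth
        matches << [string[offset, event], events[start..index]]
        offset += event
      end
      close = false
    elsif event == CLOSE
      close = true
    else
      stack << index  # a rule id opens a new group
    end
  end
  matches
end

events = [:seq, :a, :close, 1, :b, :close, 1, :close, 2]
submatches('ab', events).each { |text, evs| p [text, evs] }
```

Each pair carries its own slice of the event stream, which is how a sub-match can later expand its own children without re-parsing the input.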
@@ -1034,31 +1184,44 @@ module Citrus
|
|
1034
1184
|
# +name+ is given, attempts to retrieve the first immediate sub-match named
|
1035
1185
|
# +name+.
|
1036
1186
|
def first(name=nil)
|
1037
|
-
name
|
1187
|
+
name ? find(name, false).first : matches.first
|
1038
1188
|
end
|
1039
1189
|
|
1040
|
-
#
|
1041
|
-
#
|
1042
|
-
def
|
1043
|
-
|
1190
|
+
# Allows sub-matches of this match to be retrieved by name as instance
|
1191
|
+
# methods.
|
1192
|
+
def method_missing(sym, *args)
|
1193
|
+
if sym == :to_ary
|
1194
|
+
# This is a workaround for a bug in Ruby 1.9 with classes that
|
1195
|
+
# extend String.
|
1196
|
+
super
|
1197
|
+
else
|
1198
|
+
first(sym) or raise NoMatchError, 'No match named "%s" in %s (%s)' %
|
1199
|
+
[sym, self, name]
|
1200
|
+
end
|
1044
1201
|
end
|
1045
1202
|
|
1046
|
-
#
|
1047
|
-
|
1048
|
-
|
1203
|
+
# Returns a string representation of this match that displays the entire
|
1204
|
+
# match tree for easy viewing in the console.
|
1205
|
+
def dump
|
1206
|
+
dump_lines.join("\n")
|
1049
1207
|
end
|
1050
1208
|
|
1051
|
-
|
1052
|
-
|
1053
|
-
|
1054
|
-
|
1055
|
-
|
1056
|
-
|
1057
|
-
[sym, self, name || '<anonymous>']
|
1209
|
+
def dump_lines(indent=' ') # :nodoc:
|
1210
|
+
line = to_s.inspect
|
1211
|
+
line << ' (%s)' % names.join(',') unless names.empty?
|
1212
|
+
matches.inject([line]) do |lines, m|
|
1213
|
+
lines.concat(m.dump_lines(indent).map {|line| indent + line })
|
1214
|
+
end
|
1058
1215
|
end
|
1059
1216
|
|
1060
|
-
|
1061
|
-
|
1217
|
+
private
|
1218
|
+
|
1219
|
+
# Extends this match with the extensions provided by its #rules.
|
1220
|
+
def extend! # :nodoc:
|
1221
|
+
extenders.each do |rule|
|
1222
|
+
self.names << rule.name if rule.named?
|
1223
|
+
extend(rule.extension) if rule.extension
|
1224
|
+
end
|
1062
1225
|
end
|
1063
1226
|
end
|
1064
1227
|
end
|