citrus 2.2.2 → 2.3.0
- data/README +160 -98
- data/doc/background.markdown +16 -17
- data/doc/example.markdown +86 -46
- data/doc/syntax.markdown +59 -36
- data/examples/calc.citrus +9 -3
- data/examples/calc.rb +9 -3
- data/examples/ip.rb +2 -0
- data/lib/citrus.rb +576 -473
- data/lib/citrus/file.rb +80 -71
- data/test/alias_test.rb +8 -2
- data/test/and_predicate_test.rb +13 -2
- data/test/but_predicate_test.rb +9 -3
- data/test/calc_file_test.rb +9 -4
- data/test/calc_test.rb +4 -4
- data/test/choice_test.rb +11 -5
- data/test/extension_test.rb +2 -12
- data/test/file_test.rb +215 -175
- data/test/grammar_test.rb +1 -1
- data/test/helper.rb +2 -2
- data/test/input_test.rb +44 -48
- data/test/label_test.rb +14 -17
- data/test/match_test.rb +21 -63
- data/test/not_predicate_test.rb +13 -2
- data/test/repeat_test.rb +20 -20
- data/test/sequence_test.rb +22 -8
- data/test/string_terminal_test.rb +10 -5
- data/test/super_test.rb +19 -16
- data/test/terminal_test.rb +7 -2
- metadata +21 -12
- data/benchmark/after.dat +0 -192
- data/benchmark/before.dat +0 -192
- data/test/_files/grammar3.citrus +0 -112
data/doc/syntax.markdown
CHANGED
@@ -27,6 +27,9 @@ match in a case-insensitive manner.
|
|
27
27
|
|
28
28
|
`abc` # match "abc" in any case
|
29
29
|
|
30
|
+
Besides case sensitivity, case-insensitive strings have the same behavior as
|
31
|
+
double quoted strings.
|
32
|
+
|
30
33
|
See [Terminal](api/classes/Citrus/Terminal.html) and
|
31
34
|
[StringTerminal](api/classes/Citrus/StringTerminal.html) for more information.
|
32
35
|
|
@@ -69,6 +72,9 @@ that does not match a given expression.
|
|
69
72
|
~'a' # match all characters until an "a"
|
70
73
|
~/xyz/ # match all characters until /xyz/ matches
|
71
74
|
|
75
|
+
When using this operator (the tilde), at least one character must be consumed
|
76
|
+
for the rule to succeed.
|
77
|
+
|
72
78
|
See [AndPredicate](api/classes/Citrus/AndPredicate.html),
|
73
79
|
[NotPredicate](api/classes/Citrus/NotPredicate.html), and
|
74
80
|
[ButPredicate](api/classes/Citrus/ButPredicate.html) for more information.
|
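The note added in this hunk says the tilde operator must consume at least one character to succeed. A minimal standalone sketch of that rule, using only Ruby's `strscan`; the helper `match_until` is hypothetical and is not how Citrus implements `ButPredicate`, and running to the end of input when the pattern never matches is an assumption of the sketch:

```ruby
require 'strscan'

# Hypothetical helper (not Citrus code): advance until +pattern+ matches at
# the current position or the input ends. Returns the number of bytes
# consumed, or nil (with the position restored) if nothing was consumed.
def match_until(scanner, pattern)
  start = scanner.pos
  scanner.getch until scanner.eos? || scanner.match?(pattern)
  length = scanner.pos - start
  return length if length > 0
  scanner.pos = start
  nil
end

match_until(StringScanner.new("xyza"), /a/) # consumes "xyz"
match_until(StringScanner.new("abc"), /a/)  # fails: nothing before "a"
```

The second call fails because the pattern already matches at the first position, so zero characters would be consumed.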
@@ -98,25 +104,25 @@ levels of precedence is below.
|
|
98
104
|
|
99
105
|
See [Choice](api/classes/Citrus/Choice.html) for more information.
|
100
106
|
|
107
|
+
## Grouping
|
108
|
+
|
109
|
+
As is common in many programming languages, parentheses may be used to override
|
110
|
+
the normal binding order of operators. In the following example parentheses are
|
111
|
+
used to make the vertical bar between `'b'` and `'c'` bind tighter than the
|
112
|
+
space between `'a'` and `'b'`.
|
113
|
+
|
114
|
+
'a' ('b' | 'c') # match "a", then "b" or "c"
|
115
|
+
|
101
116
|
## Labels
|
102
117
|
|
103
118
|
Match objects may be referred to by a different name than the rule that
|
104
|
-
originally generated them. Labels are
|
119
|
+
originally generated them. Labels are added by placing the label and a colon
|
105
120
|
immediately preceding any expression.
|
106
121
|
|
107
122
|
chars:/[a-z]+/ # the characters matched by the regular expression
|
108
123
|
# may be referred to as "chars" in an extension
|
109
124
|
# method
|
110
125
|
|
111
|
-
See [Label](api/classes/Citrus/Label.html) for more information.
|
112
|
-
|
113
|
-
## Grouping
|
114
|
-
|
115
|
-
As is common in many programming languages, parentheses may be used to override
|
116
|
-
the normal binding order of operators.
|
117
|
-
|
118
|
-
'a' ('b' | 'c') # match "a", then "b" or "c"
|
119
|
-
|
120
126
|
## Extensions
|
121
127
|
|
122
128
|
Extensions may be specified using either "module" or "block" syntax. When using
|
@@ -128,17 +134,16 @@ in between less than and greater than symbols.
|
|
128
134
|
# times and extend the match with the
|
129
135
|
# CouponCode module
|
130
136
|
|
131
|
-
Additionally, extensions may be specified inline using curly braces.
|
132
|
-
|
133
|
-
|
137
|
+
Additionally, extensions may be specified inline using curly braces. When using
|
138
|
+
this method, the code inside the curly braces may be invoked by calling the
|
139
|
+
`value` method on the match object.
|
134
140
|
|
135
|
-
# match any digit and return its integer value when
|
136
|
-
|
137
|
-
|
138
|
-
|
139
|
-
|
140
|
-
|
141
|
-
}
|
141
|
+
[0-9] { to_i } # match any digit and return its integer value when
|
142
|
+
# calling the #value method on the match object
|
143
|
+
|
144
|
+
Note that when using the inline block method you may also specify arguments in
|
145
|
+
between vertical bars immediately following the opening curly brace, just like
|
146
|
+
in Ruby blocks.
|
142
147
|
|
143
148
|
## Super
|
144
149
|
|
@@ -146,6 +151,24 @@ When including a grammar inside another, all rules in the child that have the
|
|
146
151
|
same name as a rule in the parent also have access to the `super` keyword to
|
147
152
|
invoke the parent rule.
|
148
153
|
|
154
|
+
grammar Number
|
155
|
+
def number
|
156
|
+
[0-9]+
|
157
|
+
end
|
158
|
+
end
|
159
|
+
|
160
|
+
grammar FloatingPoint
|
161
|
+
include Number
|
162
|
+
|
163
|
+
rule number
|
164
|
+
super ('.' super)?
|
165
|
+
end
|
166
|
+
end
|
167
|
+
|
168
|
+
In the example above, the `FloatingPoint` grammar includes `Number`. Both have a
|
169
|
+
rule named `number`, so `FloatingPoint#number` has access to `Number#number` by
|
170
|
+
means of using `super`.
|
171
|
+
|
149
172
|
See [Super](api/classes/Citrus/Super.html) for more information.
|
150
173
|
|
151
174
|
## Precedence
|
@@ -155,21 +178,21 @@ their precedence. A higher precedence indicates tighter binding.
|
|
155
178
|
|
156
179
|
Operator | Name | Precedence
|
157
180
|
------------------------- | ------------------------- | ----------
|
158
|
-
`''` | String (single quoted) |
|
159
|
-
`""` | String (double quoted) |
|
160
|
-
<code>``</code> | String (case insensitive) |
|
161
|
-
`[]` | Character class |
|
162
|
-
`.` | Dot (any character) |
|
163
|
-
`//` | Regular expression |
|
164
|
-
`()` | Grouping |
|
165
|
-
`*` | Repetition (arbitrary) |
|
166
|
-
`+` | Repetition (one or more) |
|
167
|
-
`?` | Repetition (zero or one) |
|
168
|
-
`&` | And predicate |
|
169
|
-
`!` | Not predicate |
|
170
|
-
`~` | But predicate |
|
171
|
-
|
172
|
-
|
173
|
-
|
181
|
+
`''` | String (single quoted) | 7
|
182
|
+
`""` | String (double quoted) | 7
|
183
|
+
<code>``</code> | String (case insensitive) | 7
|
184
|
+
`[]` | Character class | 7
|
185
|
+
`.` | Dot (any character) | 7
|
186
|
+
`//` | Regular expression | 7
|
187
|
+
`()` | Grouping | 7
|
188
|
+
`*` | Repetition (arbitrary) | 6
|
189
|
+
`+` | Repetition (one or more) | 6
|
190
|
+
`?` | Repetition (zero or one) | 6
|
191
|
+
`&` | And predicate | 5
|
192
|
+
`!` | Not predicate | 5
|
193
|
+
`~` | But predicate | 5
|
194
|
+
`<>` | Extension (module name) | 4
|
195
|
+
`{}` | Extension (literal) | 4
|
196
|
+
`:` | Label | 3
|
174
197
|
`e1 e2` | Sequence | 2
|
175
198
|
<code>e1 | e2</code> | Ordered choice | 1
|
data/examples/calc.citrus
CHANGED
@@ -52,7 +52,9 @@ grammar Calc
|
|
52
52
|
end
|
53
53
|
|
54
54
|
rule group
|
55
|
-
(lparen term rparen) {
|
55
|
+
(lparen term rparen) {
|
56
|
+
term.value
|
57
|
+
}
|
56
58
|
end
|
57
59
|
|
58
60
|
## Lexical syntax
|
@@ -62,11 +64,15 @@ grammar Calc
|
|
62
64
|
end
|
63
65
|
|
64
66
|
rule float
|
65
|
-
(digits '.' digits space*) {
|
67
|
+
(digits '.' digits space*) {
|
68
|
+
strip.to_f
|
69
|
+
}
|
66
70
|
end
|
67
71
|
|
68
72
|
rule integer
|
69
|
-
(digits space*) {
|
73
|
+
(digits space*) {
|
74
|
+
strip.to_i
|
75
|
+
}
|
70
76
|
end
|
71
77
|
|
72
78
|
rule digits
|
data/examples/calc.rb
CHANGED
@@ -55,7 +55,9 @@ grammar :Calc do
|
|
55
55
|
end
|
56
56
|
|
57
57
|
rule :group do
|
58
|
-
all(:lparen, :term, :rparen) {
|
58
|
+
all(:lparen, :term, :rparen) {
|
59
|
+
term.value
|
60
|
+
}
|
59
61
|
end
|
60
62
|
|
61
63
|
## Lexical syntax
|
@@ -65,11 +67,15 @@ grammar :Calc do
|
|
65
67
|
end
|
66
68
|
|
67
69
|
rule :float do
|
68
|
-
all(:digits, '.', :digits, zero_or_more(:space)) {
|
70
|
+
all(:digits, '.', :digits, zero_or_more(:space)) {
|
71
|
+
strip.to_f
|
72
|
+
}
|
69
73
|
end
|
70
74
|
|
71
75
|
rule :integer do
|
72
|
-
all(:digits, zero_or_more(:space)) {
|
76
|
+
all(:digits, zero_or_more(:space)) {
|
77
|
+
strip.to_i
|
78
|
+
}
|
73
79
|
end
|
74
80
|
|
75
81
|
rule :digits do
|
data/examples/ip.rb
CHANGED
data/lib/citrus.rb
CHANGED
@@ -8,7 +8,7 @@ require 'strscan'
|
|
8
8
|
module Citrus
|
9
9
|
autoload :File, 'citrus/file'
|
10
10
|
|
11
|
-
VERSION = [2,
|
11
|
+
VERSION = [2, 3, 0]
|
12
12
|
|
13
13
|
# Returns the current version of Citrus as a string.
|
14
14
|
def self.version
|
@@ -20,37 +20,47 @@ module Citrus
|
|
20
20
|
|
21
21
|
Infinity = 1.0 / 0
|
22
22
|
|
23
|
-
F = ::File
|
24
|
-
|
25
23
|
CLOSE = -1
|
26
24
|
|
27
|
-
#
|
28
|
-
def self.
|
29
|
-
|
30
|
-
raise "Cannot find file #{file}" unless F.file?(file)
|
31
|
-
raise "Cannot read file #{file}" unless F.readable?(file)
|
32
|
-
eval(F.read(file))
|
25
|
+
# Parses the given Citrus +code+ using +options+.
|
26
|
+
def self.parse(code, options={})
|
27
|
+
File.parse(code, options)
|
33
28
|
end
|
34
29
|
|
35
30
|
# Evaluates the given Citrus parsing expression grammar +code+ in the global
|
36
|
-
# scope. Returns an array of any grammar modules that are created.
|
37
|
-
#
|
31
|
+
# scope. Returns an array of any grammar modules that are created.
|
32
|
+
#
|
33
|
+
# Citrus.eval(<<CITRUS)
|
34
|
+
# grammar MyGrammar
|
35
|
+
# rule abc
|
36
|
+
# "abc"
|
37
|
+
# end
|
38
|
+
# end
|
39
|
+
# CITRUS
|
40
|
+
#
|
38
41
|
def self.eval(code)
|
39
|
-
parse(code
|
42
|
+
parse(code).value
|
40
43
|
end
|
41
44
|
|
42
|
-
#
|
43
|
-
#
|
44
|
-
|
45
|
-
|
45
|
+
# Evaluates the given expression and creates a new Rule object from it.
|
46
|
+
#
|
47
|
+
# Citrus.rule('"a" | "b"')
|
48
|
+
#
|
49
|
+
def self.rule(expr)
|
50
|
+
parse(expr, :root => :rule_body).value
|
51
|
+
end
|
52
|
+
|
53
|
+
# Loads the grammar from the given +file+ into the global scope using #eval.
|
54
|
+
def self.load(file)
|
55
|
+
file << '.citrus' unless ::File.file?(file)
|
56
|
+
raise "Cannot find file #{file}" unless ::File.file?(file)
|
57
|
+
raise "Cannot read file #{file}" unless ::File.readable?(file)
|
58
|
+
eval(::File.read(file))
|
46
59
|
end
|
47
60
|
|
48
61
|
# A standard error class that all Citrus errors extend.
|
49
62
|
class Error < RuntimeError; end
|
50
63
|
|
51
|
-
# Raised when a match cannot be found.
|
52
|
-
class NoMatchError < Error; end
|
53
|
-
|
54
64
|
# Raised when a parse fails.
|
55
65
|
class ParseError < Error
|
56
66
|
# The +input+ given here is an instance of Citrus::Input.
|
@@ -59,9 +69,7 @@ module Citrus
|
|
59
69
|
@line_offset = input.line_offset(offset)
|
60
70
|
@line_number = input.line_number(offset)
|
61
71
|
@line = input.line(offset)
|
62
|
-
|
63
|
-
[line_number, line_offset, detail]
|
64
|
-
super(msg)
|
72
|
+
super("Failed to parse input on line #{line_number} at offset #{line_offset}\n#{detail}")
|
65
73
|
end
|
66
74
|
|
67
75
|
# The 0-based offset at which the error occurred in the input, i.e. the
|
@@ -82,12 +90,12 @@ module Citrus
|
|
82
90
|
# Returns a string that, when printed, gives a visual representation of
|
83
91
|
# exactly where the error occurred on its line in the input.
|
84
92
|
def detail
|
85
|
-
"
|
93
|
+
"#{line}\n#{' ' * line_offset}^"
|
86
94
|
end
|
87
95
|
end
|
88
96
|
|
89
|
-
#
|
90
|
-
# string and
|
97
|
+
# An Input is a scanner that is responsible for executing rules at different
|
98
|
+
# positions in the input string and persisting event streams.
|
91
99
|
class Input < StringScanner
|
92
100
|
def initialize(string)
|
93
101
|
super(string)
|
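The new `detail` body in this hunk builds a two-line string: the offending line, then a caret indented by `line_offset` spaces. A minimal standalone sketch of that formatting; the helper name `caret_detail` is hypothetical and exists only for illustration:

```ruby
# Hypothetical helper (not part of Citrus) that mirrors the string built by
# ParseError#detail in the diff: the line, a newline, then a caret padded
# with +line_offset+ spaces.
def caret_detail(line, line_offset)
  "#{line}\n#{' ' * line_offset}^"
end

puts caret_detail("1 + * 2", 4)
```

Printed, the caret lands under the character at offset 4 of the line.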
@@ -97,40 +105,25 @@ module Citrus
|
|
97
105
|
# The maximum offset in the input that was successfully parsed.
|
98
106
|
attr_reader :max_offset
|
99
107
|
|
100
|
-
# A nested hash of rule id's to offsets and their respective matches. Only
|
101
|
-
# present if memoing is enabled.
|
102
|
-
attr_reader :cache
|
103
|
-
|
104
|
-
# The number of times the cache was hit. Only present if memoing is enabled.
|
105
|
-
attr_reader :cache_hits
|
106
|
-
|
107
|
-
# Resets all internal variables so that this object may be used in another
|
108
|
-
# parse.
|
109
108
|
def reset # :nodoc:
|
110
109
|
@max_offset = 0
|
111
110
|
super
|
112
111
|
end
|
113
112
|
|
114
|
-
# Returns the length of this input.
|
115
|
-
def length
|
116
|
-
string.length
|
117
|
-
end
|
118
|
-
|
119
113
|
# Returns an array containing the lines of text in the input.
|
120
114
|
def lines
|
121
|
-
string.
|
122
|
-
|
123
|
-
|
124
|
-
|
125
|
-
|
126
|
-
string.each_line(&block)
|
115
|
+
if string.respond_to?(:lines)
|
116
|
+
string.lines.to_a
|
117
|
+
else
|
118
|
+
string.to_a
|
119
|
+
end
|
127
120
|
end
|
128
121
|
|
129
122
|
# Returns the 0-based offset of the given +pos+ in the input on the line
|
130
123
|
# on which it is found. +pos+ defaults to the current pointer position.
|
131
124
|
def line_offset(pos=pos)
|
132
125
|
p = 0
|
133
|
-
each_line do |line|
|
126
|
+
string.each_line do |line|
|
134
127
|
len = line.length
|
135
128
|
return (pos - p) if p + len >= pos
|
136
129
|
p += len
|
@@ -142,7 +135,7 @@ module Citrus
|
|
142
135
|
# given +pos+. +pos+ defaults to the current pointer position.
|
143
136
|
def line_index(pos=pos)
|
144
137
|
p = n = 0
|
145
|
-
each_line do |line|
|
138
|
+
string.each_line do |line|
|
146
139
|
p += line.length
|
147
140
|
return n if p >= pos
|
148
141
|
n += 1
|
@@ -156,7 +149,7 @@ module Citrus
|
|
156
149
|
line_index(pos) + 1
|
157
150
|
end
|
158
151
|
|
159
|
-
|
152
|
+
alias_method :lineno, :line_number
|
160
153
|
|
161
154
|
# Returns the text of the line that contains the character at the given
|
162
155
|
# +pos+. +pos+ defaults to the current pointer position.
|
@@ -165,17 +158,16 @@ module Citrus
|
|
165
158
|
end
|
166
159
|
|
167
160
|
# Returns an array of events for the given +rule+ at the current pointer
|
168
|
-
# position. Objects in this array may be one of three types: a
|
169
|
-
# Citrus::CLOSE, or a length.
|
161
|
+
# position. Objects in this array may be one of three types: a Rule,
|
162
|
+
# Citrus::CLOSE, or a length (integer).
|
170
163
|
def exec(rule, events=[])
|
171
164
|
start = pos
|
172
165
|
index = events.size
|
173
166
|
|
174
|
-
rule.exec(self, events)
|
175
|
-
|
176
|
-
if index < events.size
|
177
|
-
self.pos = start + events[-1]
|
167
|
+
if rule.exec(self, events).size > index
|
168
|
+
pos = start + events[-1]
|
178
169
|
@max_offset = pos if pos > @max_offset
|
170
|
+
self.pos = pos
|
179
171
|
else
|
180
172
|
self.pos = start
|
181
173
|
end
|
@@ -186,47 +178,59 @@ module Citrus
|
|
186
178
|
# Returns the length of a match for the given +rule+ at the current pointer
|
187
179
|
# position, +nil+ if none can be made.
|
188
180
|
def test(rule)
|
189
|
-
|
181
|
+
start = pos
|
182
|
+
events = rule.exec(self)
|
183
|
+
self.pos = start
|
184
|
+
events[-1]
|
190
185
|
end
|
191
186
|
|
192
187
|
# Returns +true+ when using memoization to cache match results.
|
193
188
|
def memoized?
|
194
|
-
|
189
|
+
false
|
195
190
|
end
|
191
|
+
end
|
196
192
|
|
197
|
-
|
198
|
-
|
199
|
-
|
200
|
-
|
201
|
-
|
202
|
-
|
203
|
-
|
204
|
-
|
193
|
+
# A MemoizingInput is an Input that caches segments of the event stream for
|
194
|
+
# particular rules in a parse. This technique (also known as "Packrat"
|
195
|
+
# parsing) guarantees parsers will operate in linear time but costs
|
196
|
+
# significantly more in terms of time and memory required to perform a parse.
|
197
|
+
# For more information, please read the paper on Packrat parsing at
|
198
|
+
# http://pdos.csail.mit.edu/~baford/packrat/icfp02/.
|
199
|
+
class MemoizingInput < Input
|
200
|
+
def initialize(string)
|
201
|
+
super(string)
|
205
202
|
@cache = {}
|
206
203
|
@cache_hits = 0
|
204
|
+
end
|
207
205
|
|
208
|
-
|
209
|
-
|
210
|
-
instance_eval do
|
211
|
-
def exec(rule, events=[]) # :nodoc:
|
212
|
-
c = @cache[rule.id] ||= {}
|
206
|
+
# A nested hash of rules to offsets and their respective matches.
|
207
|
+
attr_reader :cache
|
213
208
|
|
214
|
-
|
215
|
-
|
216
|
-
c[pos]
|
217
|
-
else
|
218
|
-
c[pos] = super(rule)
|
219
|
-
end
|
209
|
+
# The number of times the cache was hit.
|
210
|
+
attr_reader :cache_hits
|
220
211
|
|
221
|
-
|
222
|
-
|
212
|
+
def reset # :nodoc:
|
213
|
+
@cache.clear
|
214
|
+
@cache_hits = 0
|
215
|
+
super
|
216
|
+
end
|
223
217
|
|
224
|
-
|
225
|
-
|
226
|
-
|
227
|
-
|
228
|
-
|
218
|
+
def exec(rule, events=[]) # :nodoc:
|
219
|
+
c = @cache[rule] ||= {}
|
220
|
+
|
221
|
+
e = if c[pos]
|
222
|
+
@cache_hits += 1
|
223
|
+
c[pos]
|
224
|
+
else
|
225
|
+
c[pos] = super(rule)
|
229
226
|
end
|
227
|
+
|
228
|
+
events.concat(e)
|
229
|
+
end
|
230
|
+
|
231
|
+
# Returns +true+ when using memoization to cache match results.
|
232
|
+
def memoized?
|
233
|
+
true
|
230
234
|
end
|
231
235
|
end
|
232
236
|
|
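This hunk moves caching out of `Input` into a separate `MemoizingInput` subclass whose cache is keyed by rule and then by position. The idea can be sketched with a plain `StringScanner` subclass. Everything below is an assumption for illustration: the class `MemoSketch` is hypothetical, rules are modeled as callables, and the sketch uses `key?` so that `nil` results are cached too, whereas the code in the diff checks `c[pos]` directly.

```ruby
require 'strscan'

# Hypothetical stand-in for MemoizingInput (not Citrus code). A "rule" here
# is any object responding to #call(scanner).
class MemoSketch < StringScanner
  attr_reader :cache, :cache_hits

  def initialize(string)
    super(string)
    @cache = {}      # rule => { position => result }
    @cache_hits = 0
  end

  def exec(rule)
    c = @cache[rule] ||= {}
    if c.key?(pos)
      @cache_hits += 1
      c[pos]
    else
      c[pos] = rule.call(self)
    end
  end
end

# StringScanner#match? returns the match length without advancing.
digits = lambda { |scanner| scanner.match?(/\d+/) }

input  = MemoSketch.new("123abc")
first  = input.exec(digits) # computed by calling the rule
second = input.exec(digits) # served from the cache
```

After both calls `input.cache_hits` is 1, because only the second lookup found an entry for the same rule at the same position.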
@@ -327,7 +331,7 @@ module Citrus
|
|
327
331
|
end
|
328
332
|
|
329
333
|
# Gets/sets the rule with the given +name+. If +obj+ is given the rule
|
330
|
-
# will be set to the value of +obj+ passed through Rule
|
334
|
+
# will be set to the value of +obj+ passed through Rule.for. If a block is
|
331
335
|
# given, its return value will be used for the value of +obj+.
|
332
336
|
#
|
333
337
|
# It is important to note that this method will also check any included
|
@@ -340,7 +344,7 @@ module Citrus
|
|
340
344
|
if obj
|
341
345
|
rule_names << sym unless has_rule?(sym)
|
342
346
|
|
343
|
-
rule = Rule.
|
347
|
+
rule = Rule.for(obj)
|
344
348
|
rule.name = name
|
345
349
|
setup_super(rule, name)
|
346
350
|
rule.grammar = self
|
@@ -350,7 +354,9 @@ module Citrus
|
|
350
354
|
|
351
355
|
rules[sym] || super_rule(sym)
|
352
356
|
rescue => e
|
353
|
-
|
357
|
+
# This preserves the backtrace.
|
358
|
+
e.message.replace("Cannot create rule \"#{name}\": #{e.message}")
|
359
|
+
raise e
|
354
360
|
end
|
355
361
|
|
356
362
|
# Gets/sets the +name+ of the root rule of this grammar. If no root rule is
|
@@ -364,7 +370,7 @@ module Citrus
|
|
364
370
|
# Creates a new rule that will match any single character. A block may be
|
365
371
|
# provided to specify semantic behavior (via #ext).
|
366
372
|
def dot(&block)
|
367
|
-
ext(Rule.
|
373
|
+
ext(Rule.for(DOT), block)
|
368
374
|
end
|
369
375
|
|
370
376
|
# Creates a new Super for the rule currently being defined in the grammar. A
|
@@ -387,18 +393,10 @@ module Citrus
|
|
387
393
|
|
388
394
|
# Creates a new ButPredicate using the given +rule+. A block may be provided
|
389
395
|
# to specify semantic behavior (via #ext).
|
390
|
-
def
|
396
|
+
def butp(rule, &block)
|
391
397
|
ext(ButPredicate.new(rule), block)
|
392
398
|
end
|
393
399
|
|
394
|
-
alias butp but # For consistency with #andp and #notp.
|
395
|
-
|
396
|
-
# Creates a new Label using the given +rule+ and +label+. A block may be
|
397
|
-
# provided to specify semantic behavior (via #ext).
|
398
|
-
def label(rule, label, &block)
|
399
|
-
ext(Label.new(rule, label), block)
|
400
|
-
end
|
401
|
-
|
402
400
|
# Creates a new Repeat using the given +rule+. +min+ and +max+ specify the
|
403
401
|
# minimum and maximum number of times the rule must match. A block may be
|
404
402
|
# provided to specify semantic behavior (via #ext).
|
@@ -433,30 +431,37 @@ module Citrus
|
|
433
431
|
ext(Choice.new(args), block)
|
434
432
|
end
|
435
433
|
|
434
|
+
# Adds +label+ to the given +rule+. A block may be provided to specify
|
435
|
+
# semantic behavior (via #ext).
|
436
|
+
def label(rule, label, &block)
|
437
|
+
rule = ext(rule, block)
|
438
|
+
rule.label = label
|
439
|
+
rule
|
440
|
+
end
|
441
|
+
|
436
442
|
# Specifies a Module that will be used to extend all matches created with
|
437
443
|
# the given +rule+. A block may also be given that will be used to create
|
438
|
-
# an anonymous module. See Rule#
|
444
|
+
# an anonymous module. See Rule#extension=.
|
439
445
|
def ext(rule, mod=nil, &block)
|
440
|
-
rule = Rule.
|
446
|
+
rule = Rule.for(rule)
|
441
447
|
mod = block if block
|
442
448
|
rule.extension = mod if mod
|
443
449
|
rule
|
444
450
|
end
|
451
|
+
|
452
|
+
# Creates a new Module from the given +block+ and sets it to be the
|
453
|
+
# extension of the given +rule+. See Rule#extension=.
|
454
|
+
def mod(rule, &block)
|
455
|
+
rule.extension = Module.new(&block)
|
456
|
+
rule
|
457
|
+
end
|
445
458
|
end
|
446
459
|
|
447
|
-
# A Rule is an object that is used by a grammar to create matches on
|
460
|
+
# A Rule is an object that is used by a grammar to create matches on an
|
448
461
|
# Input during parsing.
|
449
462
|
module Rule
|
450
|
-
# Evaluates the given expression and creates a new rule object from it.
|
451
|
-
#
|
452
|
-
# Citrus::Rule.eval('"a" | "b"')
|
453
|
-
#
|
454
|
-
def self.eval(expr)
|
455
|
-
Citrus.parse(expr, :root => :rule_body, :consume => true).value
|
456
|
-
end
|
457
|
-
|
458
463
|
# Returns a new Rule object depending on the type of object given.
|
459
|
-
def self.
|
464
|
+
def self.for(obj)
|
460
465
|
case obj
|
461
466
|
when Rule then obj
|
462
467
|
when Symbol then Alias.new(obj)
|
@@ -466,33 +471,10 @@ module Citrus
|
|
466
471
|
when Range then Choice.new(obj.to_a)
|
467
472
|
when Numeric then StringTerminal.new(obj.to_s)
|
468
473
|
else
|
469
|
-
raise ArgumentError, "Invalid rule object:
|
474
|
+
raise ArgumentError, "Invalid rule object: #{obj.inspect}"
|
470
475
|
end
|
471
476
|
end
|
472
477
|
|
473
|
-
@unique_id = 0
|
474
|
-
|
475
|
-
# A global registry for Rule objects. Keyed by rule id.
|
476
|
-
@rules = {}
|
477
|
-
|
478
|
-
# Adds the given +rule+ to the global registry and gives it an id.
|
479
|
-
def self.<<(rule) # :nodoc:
|
480
|
-
rule.id = (@unique_id += 1)
|
481
|
-
@rules[rule.id] = rule
|
482
|
-
end
|
483
|
-
|
484
|
-
# Returns the Rule object with the given +id+.
|
485
|
-
def self.[](id)
|
486
|
-
@rules[id]
|
487
|
-
end
|
488
|
-
|
489
|
-
def initialize(*args) # :nodoc:
|
490
|
-
Rule << self
|
491
|
-
end
|
492
|
-
|
493
|
-
# An integer id that is unique to this rule.
|
494
|
-
attr_accessor :id
|
495
|
-
|
496
478
|
# The grammar this rule belongs to.
|
497
479
|
attr_accessor :grammar
|
498
480
|
|
@@ -501,28 +483,25 @@ module Citrus
|
|
501
483
|
@name = name.to_sym
|
502
484
|
end
|
503
485
|
|
504
|
-
#
|
505
|
-
|
506
|
-
@name || '<anonymous>'
|
507
|
-
end
|
486
|
+
# The name of this rule.
|
487
|
+
attr_reader :name
|
508
488
|
|
509
|
-
#
|
510
|
-
def
|
511
|
-
|
489
|
+
# Sets the label of this rule.
|
490
|
+
def label=(label)
|
491
|
+
@label = label.to_sym
|
512
492
|
end
|
513
493
|
|
494
|
+
# A label for this rule. If a rule has a label, all matches that it creates
|
495
|
+
# will be accessible as named captures from the scope of their parent match
|
496
|
+
# using that label.
|
497
|
+
attr_reader :label
|
498
|
+
|
514
499
|
# Specifies a module that will be used to extend all Match objects that
|
515
500
|
# result from this rule. If +mod+ is a Proc, it is used to create an
|
516
|
-
# anonymous module.
|
501
|
+
# anonymous module with a +value+ method.
|
517
502
|
def extension=(mod)
|
518
503
|
if Proc === mod
|
519
|
-
|
520
|
-
tmp = Module.new(&mod)
|
521
|
-
raise ArgumentError if tmp.instance_methods.empty?
|
522
|
-
mod = tmp
|
523
|
-
rescue NoMethodError, ArgumentError, NameError
|
524
|
-
mod = Module.new { define_method(:value, &mod) }
|
525
|
-
end
|
504
|
+
mod = Module.new { define_method(:value, &mod) }
|
526
505
|
end
|
527
506
|
|
528
507
|
raise ArgumentError unless Module === mod
|
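With this hunk, a `Proc` passed to `extension=` is always wrapped in an anonymous module that defines a single `value` method. The mechanism can be tried outside Citrus with plain Ruby. In this sketch a `String` stands in for a match object, which is an assumption made only to keep the example self-contained:

```ruby
# The same wrapping the diff performs: the Proc becomes the body of #value.
block = proc { to_i }
mod = Module.new { define_method(:value, &block) }

# A plain String stands in for a Citrus match object in this sketch.
match = "42".dup
match.extend(mod)
value = match.value # the block runs with the extended object as self
```

Because `define_method` runs the block with the receiver as `self`, the bare `to_i` inside the block is sent to the extended object.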
@@ -543,18 +522,22 @@ module Citrus
|
|
543
522
|
# +false+.
|
544
523
|
# consume:: If this is +true+ a ParseError will be raised during a parse
|
545
524
|
# unless the entire input string is consumed. Defaults to
|
546
|
-
# +
|
525
|
+
# +true+.
|
547
526
|
def parse(string, options={})
|
548
527
|
opts = default_parse_options.merge(options)
|
549
528
|
|
550
|
-
input =
|
551
|
-
|
529
|
+
input = if opts[:memoize]
|
530
|
+
MemoizingInput.new(string)
|
531
|
+
else
|
532
|
+
Input.new(string)
|
533
|
+
end
|
534
|
+
|
552
535
|
input.pos = opts[:offset] if opts[:offset] > 0
|
553
536
|
|
554
537
|
events = input.exec(self)
|
555
538
|
length = events[-1]
|
556
539
|
|
557
|
-
if !length || (opts[:consume] && length < (
|
540
|
+
if !length || (opts[:consume] && length < (string.length - opts[:offset]))
|
558
541
|
raise ParseError.new(input)
|
559
542
|
end
|
560
543
|
|
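The failure condition in this hunk compares the matched length against the input remaining after `:offset`, and only when `:consume` is set. A small standalone restatement of that condition; `parse_failed?` is a hypothetical helper, and its keyword defaults copy the `default_parse_options` values shown for 2.3.0:

```ruby
# Hypothetical helper (not Citrus code) restating the check from Rule#parse:
# a parse fails when there is no match, or when :consume is set and the match
# is shorter than the input remaining after :offset.
def parse_failed?(string, length, offset: 0, consume: true)
  !length || (consume && length < (string.length - offset))
end

parse_failed?("abc", 2)                 # partial match with consume enabled
parse_failed?("abc", 2, consume: false) # partial match allowed
parse_failed?("xxabc", 3, offset: 2)    # full match of the remaining input
```

Under these assumptions a partial match only counts as a failure when `consume` is true.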
@@ -565,135 +548,71 @@ module Citrus
|
|
565
548
|
def default_parse_options # :nodoc:
|
566
549
|
{ :offset => 0,
|
567
550
|
:memoize => false,
|
568
|
-
:consume =>
|
551
|
+
:consume => true
|
569
552
|
}
|
570
553
|
end
|
571
554
|
|
572
555
|
# Tests whether or not this rule matches on the given +string+. Returns the
|
573
556
|
# length of the match if any can be made, +nil+ otherwise.
|
574
557
|
def test(string)
|
575
|
-
|
576
|
-
input.test(self)
|
558
|
+
Input.new(string).test(self)
|
577
559
|
end
|
578
560
|
|
579
561
|
# Returns +true+ if this rule is a Terminal.
|
580
562
|
def terminal?
|
581
|
-
is_a?(Terminal)
|
582
|
-
end
|
583
|
-
|
584
|
-
# Returns +true+ if this rule is able to propagate extensions from child
|
585
|
-
# rules to the scope of the parent, +false+ otherwise. In general, this will
|
586
|
-
# return +false+ for any rule whose match value is derived from an arbitrary
|
587
|
-
# number of child rules, such as a Repeat or a Sequence. Note that this is
|
588
|
-
# not true for Choice objects because they rely on exactly 1 rule to match,
|
589
|
-
# as do Proxy objects.
|
590
|
-
def propagates_extensions?
|
591
|
-
case self
|
592
|
-
when AndPredicate, NotPredicate, ButPredicate, Repeat, Sequence
|
593
|
-
false
|
594
|
-
else
|
595
|
-
true
|
596
|
-
end
|
597
|
-
end
|
598
|
-
|
599
|
-
# Returns +true+ if this rule needs to be surrounded by parentheses when
|
600
|
-
# using #embed.
|
601
|
-
def paren?
|
602
563
|
false
|
603
564
|
end
|
604
565
|
|
605
|
-
# Returns
|
606
|
-
#
|
607
|
-
def
|
608
|
-
|
609
|
-
end
|
610
|
-
|
611
|
-
def inspect # :nodoc:
|
612
|
-
to_s
|
566
|
+
# Returns +true+ if this rule should extend a match but should not appear in
|
567
|
+
# its event stream.
|
568
|
+
def elide?
|
569
|
+
false
|
613
570
|
end
|
614
571
|
|
615
|
-
|
616
|
-
|
617
|
-
|
572
|
+
# Returns +true+ if this rule needs to be surrounded by parentheses when
|
573
|
+
# using #to_embedded_s.
|
574
|
+
def needs_paren? # :nodoc:
|
575
|
+
is_a?(Nonterminal) && rules.length > 1
|
618
576
|
end
|
619
|
-
end
|
620
|
-
|
621
|
-
# A Terminal is a Rule that matches directly on the input stream and may not
|
622
|
-
# contain any other rule. Terminals are essentially wrappers for regular
|
623
|
-
# expressions. As such, the Citrus notation is identical to Ruby's regular
|
624
|
-
# expression notation, e.g.:
|
625
|
-
#
|
626
|
-
# /expr/
|
627
|
-
#
|
628
|
-
# Character classes and the dot symbol may also be used in Citrus notation for
|
629
|
-
# compatibility with other parsing expression implementations, e.g.:
|
630
|
-
#
|
631
|
-
# [a-zA-Z]
|
632
|
-
# .
|
633
|
-
#
|
634
|
-
class Terminal
|
635
|
-
include Rule
|
636
577
|
|
637
|
-
|
638
|
-
|
639
|
-
|
578
|
+
# Returns the Citrus notation of this rule as a string.
|
579
|
+
def to_s
|
580
|
+
if label
|
581
|
+
"#{label}:" + (needs_paren? ? "(#{to_citrus})" : to_citrus)
|
582
|
+
else
|
583
|
+
to_citrus
|
584
|
+
end
|
640
585
|
end
|
641
586
|
|
642
|
-
|
643
|
-
attr_reader :rule
|
587
|
+
alias_method :to_str, :to_s
|
644
588
|
|
645
|
-
# Returns
|
646
|
-
|
647
|
-
|
648
|
-
if
|
649
|
-
|
650
|
-
|
651
|
-
|
589
|
+
# Returns the Citrus notation of this rule as a string that is suitable to
|
590
|
+
# be embedded in the string representation of another rule.
|
591
|
+
def to_embedded_s # :nodoc:
|
592
|
+
if name
|
593
|
+
name.to_s
|
594
|
+
else
|
595
|
+
needs_paren? && label.nil? ? "(#{to_s})" : to_s
|
652
596
|
end
|
653
|
-
events
|
654
597
|
end
|
655
598
|
|
656
|
-
|
657
|
-
|
658
|
-
|
599
|
+
def ==(other)
|
600
|
+
case other
|
601
|
+
when Rule
|
602
|
+
to_s == other.to_s
|
603
|
+
else
|
604
|
+
super
|
605
|
+
end
|
659
606
|
end
|
660
607
|
|
661
|
-
|
662
|
-
def to_s
|
663
|
-
rule.inspect
|
664
|
-
end
|
665
|
-
end
|
608
|
+
alias_method :eql?, :==
|
666
609
|
|
667
|
-
|
668
|
-
|
669
|
-
# single or double quotes, e.g.:
|
670
|
-
#
|
671
|
-
# 'expr'
|
672
|
-
# "expr"
|
673
|
-
#
|
674
|
-
# This notation works the same as it does in Ruby; i.e. strings in double
|
675
|
-
# quotes may contain escape sequences while strings in single quotes may not.
|
676
|
-
# In order to specify that a string should ignore case when matching, enclose
|
677
|
-
# it in backticks instead of single or double quotes, e.g.:
|
678
|
-
#
|
679
|
-
# `expr`
|
680
|
-
#
|
681
|
-
# Besides case sensitivity, case-insensitive strings have the same semantics
|
682
|
-
# as double-quoted strings.
|
683
|
-
class StringTerminal < Terminal
|
684
|
-
# The +flags+ will be passed directly to Regexp#new.
|
685
|
-
def initialize(rule='', flags=0)
|
686
|
-
super(Regexp.new(Regexp.escape(rule), flags))
|
687
|
-
@string = rule
|
610
|
+
def inspect # :nodoc:
|
611
|
+
to_s
|
688
612
|
end
|
689
613
|
|
690
|
-
|
691
|
-
|
692
|
-
if case_sensitive?
|
693
|
-
@string.inspect
|
694
|
-
else
|
695
|
-
@string.inspect.gsub(/^"|"$/, '`')
|
696
|
-
end
|
614
|
+
def extend_match(match) # :nodoc:
|
615
|
+
match.extend(extension) if extension
|
697
616
|
end
|
698
617
|
end
|
699
618
|
|
@@ -705,7 +624,6 @@ module Citrus
|
|
705
624
|
include Rule
|
706
625
|
|
707
626
|
def initialize(rule_name='<proxy>')
|
708
|
-
super
|
709
627
|
self.rule_name = rule_name
|
710
628
|
end
|
711
629
|
|
@@ -724,19 +642,29 @@ module Citrus
|
|
724
642
|
|
725
643
|
# Returns an array of events for this rule on the given +input+.
|
726
644
|
def exec(input, events=[])
|
727
|
-
events << id
|
728
|
-
|
729
645
|
index = events.size
|
730
|
-
|
646
|
+
|
731
647
|
if input.exec(rule, events).size > index
|
732
|
-
|
733
|
-
|
734
|
-
|
735
|
-
events.slice!(start, events.size)
|
648
|
+
# Proxy objects insert themselves into the event stream in place of the
|
649
|
+
# rule they are proxy for.
|
650
|
+
events[index] = self
|
736
651
|
end
|
737
652
|
|
738
653
|
events
|
739
654
|
end
|
655
|
+
|
656
|
+
# Returns +true+ if this rule should extend a match but should not appear in
|
657
|
+
# its event stream.
|
658
|
+
def elide? # :nodoc:
|
659
|
+
rule.elide?
|
660
|
+
end
|
661
|
+
|
662
|
+
def extend_match(match) # :nodoc:
|
663
|
+
# Proxy objects preserve the extension of the rule they are proxy for, and
|
664
|
+
# may also use their own extension.
|
665
|
+
rule.extend_match(match)
|
666
|
+
super
|
667
|
+
end
|
740
668
|
end
|
741
669
|
|
742
670
|
# An Alias is a Proxy for a rule in the same grammar. It is used in rule
|
@@ -749,7 +677,7 @@ module Citrus
|
|
749
677
|
include Proxy
|
750
678
|
|
751
679
|
# Returns the Citrus notation of this rule as a string.
|
752
|
-
def
|
680
|
+
def to_citrus # :nodoc:
|
753
681
|
rule_name.to_s
|
754
682
|
end
|
755
683
|
|
@@ -758,8 +686,14 @@ module Citrus
|
|
758
686
|
# Searches this proxy's grammar and any included grammars for a rule with
|
759
687
|
# this proxy's #rule_name. Raises an error if one cannot be found.
|
760
688
|
def resolve!
|
761
|
-
grammar.rule(rule_name)
|
762
|
-
|
689
|
+
rule = grammar.rule(rule_name)
|
690
|
+
|
691
|
+
unless rule
|
692
|
+
raise RuntimeError,
|
693
|
+
"No rule named \"#{rule_name}\" in grammar #{grammar.name}"
|
694
|
+
end
|
695
|
+
|
696
|
+
rule
|
763
697
|
end
|
764
698
|
end
|
765
699
|
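Note on the `resolve!` change above: the lookup result is now checked, and a missing rule raises a `RuntimeError` that names the rule and the grammar instead of silently returning `nil`. A minimal standalone sketch of the same pattern, using a plain Hash as a hypothetical stand-in for the grammar's rule table:

```ruby
# rules and grammar_name are illustrative stand-ins, not Citrus objects.
def resolve_rule(rules, rule_name, grammar_name)
  rule = rules[rule_name]

  unless rule
    raise RuntimeError,
      "No rule named \"#{rule_name}\" in grammar #{grammar_name}"
  end

  rule
end

rules = { number: /\d+/ }
found = resolve_rule(rules, :number, 'Calc')

message =
  begin
    resolve_rule(rules, :missing, 'Calc')
  rescue RuntimeError => e
    e.message
  end
# message => "No rule named \"missing\" in grammar Calc"
```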
|
@@ -774,7 +708,7 @@ module Citrus
|
|
774
708
|
include Proxy
|
775
709
|
|
776
710
|
# Returns the Citrus notation of this rule as a string.
|
777
|
-
def
|
711
|
+
def to_citrus # :nodoc:
|
778
712
|
'super'
|
779
713
|
end
|
780
714
|
|
@@ -783,8 +717,119 @@ module Citrus
|
|
783
717
|
# Searches this proxy's included grammars for a rule with this proxy's
|
784
718
|
# #rule_name. Raises an error if one cannot be found.
|
785
719
|
def resolve!
|
786
|
-
grammar.super_rule(rule_name)
|
787
|
-
|
720
|
+
rule = grammar.super_rule(rule_name)
|
721
|
+
|
722
|
+
unless rule
|
723
|
+
raise RuntimeError,
|
724
|
+
"No rule named \"#{rule_name}\" in hierarchy of grammar #{grammar.name}"
|
725
|
+
end
|
726
|
+
|
727
|
+
rule
|
728
|
+
end
|
729
|
+
end
|
730
|
+
|
731
|
+
# A Terminal is a Rule that matches directly on the input stream and may not
|
732
|
+
# contain any other rule. Terminals are essentially wrappers for regular
|
733
|
+
# expressions. As such, the Citrus notation is identical to Ruby's regular
|
734
|
+
# expression notation, e.g.:
|
735
|
+
#
|
736
|
+
# /expr/
|
737
|
+
#
|
738
|
+
# Character classes and the dot symbol may also be used in Citrus notation for
|
739
|
+
# compatibility with other parsing expression implementations, e.g.:
|
740
|
+
#
|
741
|
+
# [a-zA-Z]
|
742
|
+
# .
|
743
|
+
#
|
744
|
+
# Character classes have the same semantics as character classes inside Ruby
|
745
|
+
# regular expressions. The dot matches any character, including newlines.
|
746
|
+
class Terminal
|
747
|
+
include Rule
|
748
|
+
|
749
|
+
def initialize(regexp=/^/)
|
750
|
+
@regexp = regexp
|
751
|
+
end
|
752
|
+
|
753
|
+
# The actual Regexp object this rule uses to match.
|
754
|
+
attr_reader :regexp
|
755
|
+
|
756
|
+
# Returns an array of events for this rule on the given +input+.
|
757
|
+
def exec(input, events=[])
|
758
|
+
length = input.scan_full(@regexp, false, false)
|
759
|
+
|
760
|
+
if length
|
761
|
+
events << self
|
762
|
+
events << CLOSE
|
763
|
+
events << length
|
764
|
+
end
|
765
|
+
|
766
|
+
events
|
767
|
+
end
|
768
|
+
|
769
|
+
# Returns +true+ if this rule is case sensitive.
|
770
|
+
def case_sensitive?
|
771
|
+
!@regexp.casefold?
|
772
|
+
end
|
773
|
+
|
774
|
+
def ==(other)
|
775
|
+
case other
|
776
|
+
when Regexp
|
777
|
+
@regexp == other
|
778
|
+
else
|
779
|
+
super
|
780
|
+
end
|
781
|
+
end
|
782
|
+
|
783
|
+
# Returns +true+ if this rule is a Terminal.
|
784
|
+
def terminal? # :nodoc:
|
785
|
+
true
|
786
|
+
end
|
787
|
+
|
788
|
+
# Returns the Citrus notation of this rule as a string.
|
789
|
+
def to_citrus # :nodoc:
|
790
|
+
@regexp.inspect
|
791
|
+
end
|
792
|
+
end
|
793
|
+
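Note on `Terminal#exec` above: a successful match contributes three events (the rule, `CLOSE`, and the match length). The sketch below reproduces that shape with Ruby's `StringScanner`, on the assumption that the input object behaves like one; `StringScanner#scan_full(regexp, false, false)` returns the match length at the current position without advancing the pointer. `CLOSE` and `terminal_events` are hypothetical names.

```ruby
require 'strscan'

# Hypothetical sentinel; the real constant's value is not shown in this diff.
CLOSE = -1

# Appends [rule_id, CLOSE, length] when regexp matches at the scanner's
# current position; appends nothing otherwise.
def terminal_events(rule_id, regexp, scanner, events = [])
  length = scanner.scan_full(regexp, false, false)

  if length
    events << rule_id
    events << CLOSE
    events << length
  end

  events
end

scanner = StringScanner.new('abc123')
hit  = terminal_events(:letters, /[a-z]+/, scanner) # => [:letters, -1, 3]
miss = terminal_events(:digits, /\d+/, scanner)     # => []
```

The second call fails because the scan is anchored at position 0, where the input starts with letters.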
|
794
|
+
# A StringTerminal is a Terminal that may be instantiated from a String
|
795
|
+
# object. The Citrus notation is any sequence of characters enclosed in either
|
796
|
+
# single or double quotes, e.g.:
|
797
|
+
#
|
798
|
+
# 'expr'
|
799
|
+
# "expr"
|
800
|
+
#
|
801
|
+
# This notation works the same as it does in Ruby; i.e. strings in double
|
802
|
+
# quotes may contain escape sequences while strings in single quotes may not.
|
803
|
+
# In order to specify that a string should ignore case when matching, enclose
|
804
|
+
# it in backticks instead of single or double quotes, e.g.:
|
805
|
+
#
|
806
|
+
# `expr`
|
807
|
+
#
|
808
|
+
# Besides case sensitivity, case-insensitive strings have the same semantics
|
809
|
+
# as double-quoted strings.
|
810
|
+
class StringTerminal < Terminal
|
811
|
+
# The +flags+ will be passed directly to Regexp#new.
|
812
|
+
def initialize(rule='', flags=0)
|
813
|
+
super(Regexp.new(Regexp.escape(rule), flags))
|
814
|
+
@string = rule
|
815
|
+
end
|
816
|
+
|
817
|
+
def ==(other)
|
818
|
+
case other
|
819
|
+
when String
|
820
|
+
@string == other
|
821
|
+
else
|
822
|
+
super
|
823
|
+
end
|
824
|
+
end
|
825
|
+
|
826
|
+
# Returns the Citrus notation of this rule as a string.
|
827
|
+
def to_citrus # :nodoc:
|
828
|
+
if case_sensitive?
|
829
|
+
@string.inspect
|
830
|
+
else
|
831
|
+
@string.inspect.gsub(/^"|"$/, '`')
|
832
|
+
end
|
788
833
|
end
|
789
834
|
end
|
790
835
|
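Note on `StringTerminal` above: the string is escaped before it becomes a Regexp, so regexp metacharacters match literally, and `to_citrus` swaps the surrounding double quotes for backticks when the regexp ignores case. A standalone sketch of both steps; the helper names are hypothetical, while `Regexp.escape`, `Regexp#casefold?`, and `String#inspect` are core Ruby.

```ruby
# Builds the regexp the same way StringTerminal#initialize does.
def string_terminal_regexp(string, flags = 0)
  Regexp.new(Regexp.escape(string), flags)
end

# Mirrors StringTerminal#to_citrus: double quotes for case-sensitive
# strings, backticks for case-insensitive ones.
def citrus_notation(string, regexp)
  if regexp.casefold?
    string.inspect.gsub(/^"|"$/, '`')
  else
    string.inspect
  end
end

sensitive   = string_terminal_regexp('a.b')
insensitive = string_terminal_regexp('a.b', Regexp::IGNORECASE)

sensitive.match?('a.b')   # => true
sensitive.match?('axb')   # => false, the dot is literal
insensitive.match?('A.B') # => true

citrus_notation('abc', sensitive)   # => "\"abc\""
citrus_notation('abc', insensitive) # => "`abc`"
```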
|
@@ -796,8 +841,7 @@ module Citrus
|
|
796
841
|
include Rule
|
797
842
|
|
798
843
|
def initialize(rules=[])
|
799
|
-
|
800
|
-
@rules = rules.map {|r| Rule.new(r) }
|
844
|
+
@rules = rules.map {|r| Rule.for(r) }
|
801
845
|
end
|
802
846
|
|
803
847
|
# An array of the actual Rule objects this rule uses to match.
|
@@ -809,8 +853,13 @@ module Citrus
|
|
809
853
|
end
|
810
854
|
end
|
811
855
|
|
812
|
-
#
|
813
|
-
|
856
|
+
# An AndPredicate is a Nonterminal that contains a rule that must match. Upon
|
857
|
+
# success an empty match is returned and no input is consumed. The Citrus
|
858
|
+
# notation is any expression preceded by an ampersand, e.g.:
|
859
|
+
#
|
860
|
+
# &expr
|
861
|
+
#
|
862
|
+
class AndPredicate
|
814
863
|
include Nonterminal
|
815
864
|
|
816
865
|
def initialize(rule='')
|
@@ -821,145 +870,108 @@ module Citrus
|
|
821
870
|
def rule
|
822
871
|
rules[0]
|
823
872
|
end
|
824
|
-
end
|
825
|
-
|
826
|
-
# An AndPredicate is a Predicate that contains a rule that must match. Upon
|
827
|
-
# success an empty match is returned and no input is consumed. The Citrus
|
828
|
-
# notation is any expression preceded by an ampersand, e.g.:
|
829
|
-
#
|
830
|
-
# &expr
|
831
|
-
#
|
832
|
-
class AndPredicate
|
833
|
-
include Predicate
|
834
873
|
|
835
874
|
# Returns an array of events for this rule on the given +input+.
|
836
875
|
def exec(input, events=[])
|
837
876
|
if input.test(rule)
|
838
|
-
events <<
|
877
|
+
events << self
|
839
878
|
events << CLOSE
|
840
879
|
events << 0
|
841
880
|
end
|
881
|
+
|
842
882
|
events
|
843
883
|
end
|
844
884
|
|
845
885
|
# Returns the Citrus notation of this rule as a string.
|
846
|
-
def
|
847
|
-
'&' + rule.
|
886
|
+
def to_citrus # :nodoc:
|
887
|
+
'&' + rule.to_embedded_s
|
848
888
|
end
|
849
889
|
end
|
850
890
|
|
851
|
-
# A NotPredicate is a
|
852
|
-
# success an empty match is returned and no input is consumed. The Citrus
|
891
|
+
# A NotPredicate is a Nonterminal that contains a rule that must not match.
|
892
|
+
# Upon success an empty match is returned and no input is consumed. The Citrus
|
853
893
|
# notation is any expression preceded by an exclamation mark, e.g.:
|
854
894
|
#
|
855
895
|
# !expr
|
856
896
|
#
|
857
897
|
class NotPredicate
|
858
|
-
include
|
898
|
+
include Nonterminal
|
899
|
+
|
900
|
+
def initialize(rule='')
|
901
|
+
super([rule])
|
902
|
+
end
|
903
|
+
|
904
|
+
# Returns the Rule object this rule uses to match.
|
905
|
+
def rule
|
906
|
+
rules[0]
|
907
|
+
end
|
859
908
|
|
860
909
|
# Returns an array of events for this rule on the given +input+.
|
861
910
|
def exec(input, events=[])
|
862
911
|
unless input.test(rule)
|
863
|
-
events <<
|
912
|
+
events << self
|
864
913
|
events << CLOSE
|
865
914
|
events << 0
|
866
915
|
end
|
916
|
+
|
867
917
|
events
|
868
918
|
end
|
869
919
|
|
870
920
|
# Returns the Citrus notation of this rule as a string.
|
871
|
-
def
|
872
|
-
'!' + rule.
|
921
|
+
def to_citrus # :nodoc:
|
922
|
+
'!' + rule.to_embedded_s
|
873
923
|
end
|
874
924
|
end
|
875
925
|
|
876
|
-
# A ButPredicate is a
|
926
|
+
# A ButPredicate is a Nonterminal that consumes all characters until its rule
|
877
927
|
# matches. It must match at least one character in order to succeed. The
|
878
928
|
# Citrus notation is any expression preceded by a tilde, e.g.:
|
879
929
|
#
|
880
930
|
# ~expr
|
881
931
|
#
|
882
932
|
class ButPredicate
|
883
|
-
include
|
933
|
+
include Nonterminal
|
884
934
|
|
885
|
-
DOT_RULE = Rule.
|
935
|
+
DOT_RULE = Rule.for(DOT)
|
936
|
+
|
937
|
+
def initialize(rule='')
|
938
|
+
super([rule])
|
939
|
+
end
|
940
|
+
|
941
|
+
# Returns the Rule object this rule uses to match.
|
942
|
+
def rule
|
943
|
+
rules[0]
|
944
|
+
end
|
886
945
|
|
887
946
|
# Returns an array of events for this rule on the given +input+.
|
888
947
|
def exec(input, events=[])
|
889
948
|
length = 0
|
949
|
+
|
890
950
|
until input.test(rule)
|
891
951
|
len = input.exec(DOT_RULE)[-1]
|
892
952
|
break unless len
|
893
953
|
length += len
|
894
954
|
end
|
955
|
+
|
895
956
|
if length > 0
|
896
|
-
events <<
|
957
|
+
events << self
|
897
958
|
events << CLOSE
|
898
959
|
events << length
|
899
960
|
end
|
900
|
-
events
|
901
|
-
end
|
902
|
-
|
903
|
-
# Returns the Citrus notation of this rule as a string.
|
904
|
-
def to_s
|
905
|
-
'~' + rule.embed
|
906
|
-
end
|
907
|
-
end
|
908
|
-
|
909
|
-
# A Label is a Predicate that applies a new name to any matches made by its
|
910
|
-
# rule. The Citrus notation is any sequence of word characters (i.e.
|
911
|
-
# <tt>[a-zA-Z0-9_]</tt>) followed by a colon, followed by any other
|
912
|
-
# expression, e.g.:
|
913
|
-
#
|
914
|
-
# label:expr
|
915
|
-
#
|
916
|
-
class Label
|
917
|
-
include Predicate
|
918
|
-
|
919
|
-
def initialize(rule='', label='<label>')
|
920
|
-
super(rule)
|
921
|
-
self.label = label
|
922
|
-
end
|
923
|
-
|
924
|
-
# Sets the name of this label.
|
925
|
-
def label=(label)
|
926
|
-
@label = label.to_sym
|
927
|
-
end
|
928
|
-
|
929
|
-
# The label this rule adds to all its matches.
|
930
|
-
attr_reader :label
|
931
|
-
|
932
|
-
# Returns an array of events for this rule on the given +input+.
|
933
|
-
def exec(input, events=[])
|
934
|
-
events << id
|
935
|
-
|
936
|
-
index = events.size
|
937
|
-
start = index - 1
|
938
|
-
if input.exec(rule, events).size > index
|
939
|
-
events << CLOSE
|
940
|
-
events << events[-2]
|
941
|
-
else
|
942
|
-
events.slice!(start, events.size)
|
943
|
-
end
|
944
961
|
|
945
962
|
events
|
946
963
|
end
|
947
964
|
|
948
965
|
# Returns the Citrus notation of this rule as a string.
|
949
|
-
def
|
950
|
-
|
951
|
-
end
|
952
|
-
|
953
|
-
def extend_match(match) # :nodoc:
|
954
|
-
match.names << label
|
955
|
-
super
|
966
|
+
def to_citrus # :nodoc:
|
967
|
+
'~' + rule.to_embedded_s
|
956
968
|
end
|
957
969
|
end
|
958
970
|
|
959
|
-
# A Repeat is a
|
960
|
-
# its rule must match. The Citrus notation is an integer, +N+, followed
|
961
|
-
# asterisk, followed by another integer, +M+, all of which follow any
|
962
|
-
# expression, e.g.:
|
971
|
+
# A Repeat is a Nonterminal that specifies a minimum and maximum number of
|
972
|
+
# times its rule must match. The Citrus notation is an integer, +N+, followed
|
973
|
+
# by an asterisk, followed by another integer, +M+, all of which follow any
|
974
|
+
# other expression, e.g.:
|
963
975
|
#
|
964
976
|
# expr N*M
|
965
977
|
#
|
@@ -976,22 +988,29 @@ module Citrus
|
|
976
988
|
# expr?
|
977
989
|
#
|
978
990
|
class Repeat
|
979
|
-
include
|
991
|
+
include Nonterminal
|
980
992
|
|
981
993
|
def initialize(rule='', min=1, max=Infinity)
|
982
994
|
raise ArgumentError, "Min cannot be greater than max" if min > max
|
983
|
-
super(rule)
|
995
|
+
super([rule])
|
984
996
|
@range = Range.new(min, max)
|
985
997
|
end
|
986
998
|
|
999
|
+
# Returns the Rule object this rule uses to match.
|
1000
|
+
def rule
|
1001
|
+
rules[0]
|
1002
|
+
end
|
1003
|
+
|
987
1004
|
# Returns an array of events for this rule on the given +input+.
|
988
1005
|
def exec(input, events=[])
|
989
|
-
events <<
|
1006
|
+
events << self
|
990
1007
|
|
991
1008
|
index = events.size
|
992
1009
|
start = index - 1
|
993
1010
|
length = n = 0
|
994
|
-
|
1011
|
+
m = max
|
1012
|
+
|
1013
|
+
while n < m && input.exec(rule, events).size > index
|
995
1014
|
index = events.size
|
996
1015
|
length += events[-1]
|
997
1016
|
n += 1
|
@@ -1030,44 +1049,37 @@ module Citrus
|
|
1030
1049
|
end
|
1031
1050
|
|
1032
1051
|
# Returns the Citrus notation of this rule as a string.
|
1033
|
-
def
|
1034
|
-
rule.
|
1035
|
-
end
|
1036
|
-
end
|
1037
|
-
|
1038
|
-
# A List is a Nonterminal that contains any number of other rules and tests
|
1039
|
-
# them for matches in sequential order.
|
1040
|
-
module List
|
1041
|
-
include Nonterminal
|
1042
|
-
|
1043
|
-
# See Rule#paren?.
|
1044
|
-
def paren?
|
1045
|
-
rules.length > 1
|
1052
|
+
def to_citrus # :nodoc:
|
1053
|
+
rule.to_embedded_s + operator
|
1046
1054
|
end
|
1047
1055
|
end
|
1048
1056
|
|
1049
|
-
# A
|
1050
|
-
# two or more expressions separated by a
|
1057
|
+
# A Sequence is a Nonterminal where all rules must match. The Citrus notation
|
1058
|
+
# is two or more expressions separated by a space, e.g.:
|
1051
1059
|
#
|
1052
|
-
# expr
|
1060
|
+
# expr expr
|
1053
1061
|
#
|
1054
|
-
class
|
1055
|
-
include
|
1062
|
+
class Sequence
|
1063
|
+
include Nonterminal
|
1056
1064
|
|
1057
1065
|
# Returns an array of events for this rule on the given +input+.
|
1058
1066
|
def exec(input, events=[])
|
1059
|
-
events <<
|
1067
|
+
events << self
|
1060
1068
|
|
1061
1069
|
index = events.size
|
1062
1070
|
start = index - 1
|
1063
|
-
n = 0
|
1064
|
-
|
1071
|
+
length = n = 0
|
1072
|
+
m = rules.length
|
1073
|
+
|
1074
|
+
while n < m && input.exec(rules[n], events).size > index
|
1075
|
+
index = events.size
|
1076
|
+
length += events[-1]
|
1065
1077
|
n += 1
|
1066
1078
|
end
|
1067
1079
|
|
1068
|
-
if
|
1080
|
+
if n == rules.length
|
1069
1081
|
events << CLOSE
|
1070
|
-
events <<
|
1082
|
+
events << length
|
1071
1083
|
else
|
1072
1084
|
events.slice!(start, events.size)
|
1073
1085
|
end
|
@@ -1076,181 +1088,272 @@ module Citrus
|
|
1076
1088
|
end
|
1077
1089
|
|
1078
1090
|
# Returns the Citrus notation of this rule as a string.
|
1079
|
-
def
|
1080
|
-
rules.map {|r| r.
|
1091
|
+
def to_citrus # :nodoc:
|
1092
|
+
rules.map {|r| r.to_embedded_s }.join(' ')
|
1081
1093
|
end
|
1082
1094
|
end
|
1083
1095
|
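Note on `Sequence#exec` above: the rule pushes itself, runs each sub-rule in order while summing their lengths, and either closes with the total length or removes everything it added. The sketch below imitates this with `StringScanner` and lambdas. `CLOSE`, `part`, and `exec_sequence` are hypothetical, and resetting the scanner position on failure is an assumption about what the real input does, since that code is not part of this hunk.

```ruby
require 'strscan'

# Hypothetical sentinel; the real constant's value is not shown in this diff.
CLOSE = -1

# A stand-in sub-rule: consumes regexp at the current position and appends
# [name, CLOSE, length] on success.
def part(name, regexp)
  lambda do |scanner, events|
    text = scanner.scan(regexp)
    events << name << CLOSE << text.length if text
    events
  end
end

def exec_sequence(seq, parts, scanner, events = [])
  origin = scanner.pos
  events << seq

  index = events.size
  start = index - 1
  length = n = 0
  m = parts.length

  while n < m && parts[n].call(scanner, events).size > index
    index = events.size
    length += events[-1]
    n += 1
  end

  if n == m
    events << CLOSE
    events << length
  else
    # Roll back every event added by this sequence, including itself.
    events.slice!(start, events.size)
    scanner.pos = origin # assumption: the real input also rewinds
  end

  events
end

sum = [part(:num, /\d+/), part(:plus, /\+/), part(:num, /\d+/)]
ok  = exec_sequence(:sum, sum, StringScanner.new('12+3'))
# => [:sum, :num, -1, 2, :plus, -1, 1, :num, -1, 1, -1, 4]
bad = exec_sequence(:sum, sum, StringScanner.new('12-3'))
# => []
```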
|
1084
|
-
# A
|
1085
|
-
# or more expressions separated by a
|
1096
|
+
# A Choice is a Nonterminal where only one rule must match. The Citrus
|
1097
|
+
# notation is two or more expressions separated by a vertical bar, e.g.:
|
1086
1098
|
#
|
1087
|
-
# expr expr
|
1099
|
+
# expr | expr
|
1088
1100
|
#
|
1089
|
-
class
|
1090
|
-
include
|
1101
|
+
class Choice
|
1102
|
+
include Nonterminal
|
1091
1103
|
|
1092
1104
|
# Returns an array of events for this rule on the given +input+.
|
1093
1105
|
def exec(input, events=[])
|
1094
|
-
events <<
|
1106
|
+
events << self
|
1095
1107
|
|
1096
1108
|
index = events.size
|
1097
|
-
|
1098
|
-
|
1099
|
-
|
1100
|
-
|
1101
|
-
length += events[-1]
|
1109
|
+
n = 0
|
1110
|
+
m = rules.length
|
1111
|
+
|
1112
|
+
while n < m && input.exec(rules[n], events).size == index
|
1102
1113
|
n += 1
|
1103
1114
|
end
|
1104
1115
|
|
1105
|
-
if
|
1116
|
+
if index < events.size
|
1106
1117
|
events << CLOSE
|
1107
|
-
events <<
|
1118
|
+
events << events[-2]
|
1108
1119
|
else
|
1109
|
-
events.
|
1120
|
+
events.pop
|
1110
1121
|
end
|
1111
1122
|
|
1112
1123
|
events
|
1113
1124
|
end
|
1114
1125
|
|
1126
|
+
# Returns +true+ if this rule should extend a match but should not appear in
|
1127
|
+
# its event stream.
|
1128
|
+
def elide? # :nodoc:
|
1129
|
+
true
|
1130
|
+
end
|
1131
|
+
|
1115
1132
|
# Returns the Citrus notation of this rule as a string.
|
1116
|
-
def
|
1117
|
-
rules.map {|r| r.
|
1133
|
+
def to_citrus # :nodoc:
|
1134
|
+
rules.map {|r| r.to_embedded_s }.join(' | ')
|
1118
1135
|
end
|
1119
1136
|
end
|
1120
1137
|
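Note on `Choice#exec` above: alternatives are tried in order until one appends events; the choice then closes with the same length as that alternative (`events[-2]` after `CLOSE` has been pushed), or pops itself if none matched. A standalone sketch of that control flow, with hypothetical names and lambdas standing in for sub-rules:

```ruby
# Hypothetical sentinel; the real constant's value is not shown in this diff.
CLOSE = -1

# A stand-in alternative: appends [name, CLOSE, length] when regexp matches
# at the start of text.
def alternative(name, regexp)
  lambda do |text, events|
    m = text.match(/\A(?:#{regexp.source})/)
    events << name << CLOSE << m[0].length if m
    events
  end
end

def exec_choice(choice, alternatives, text, events = [])
  events << choice

  index = events.size
  n = 0
  m = alternatives.length

  # Keep trying while the current alternative adds nothing.
  while n < m && alternatives[n].call(text, events).size == index
    n += 1
  end

  if index < events.size
    events << CLOSE
    events << events[-2] # length of the alternative that matched
  else
    events.pop # remove the choice itself
  end

  events
end

token = [alternative(:letters, /[a-z]+/), alternative(:digits, /\d+/)]
hit  = exec_choice(:token, token, '123abc') # => [:token, :digits, -1, 3, -1, 3]
miss = exec_choice(:token, token, '!!!')    # => []
```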
|
1121
1138
|
# The base class for all matches. Matches are organized into a tree where any
|
1122
|
-
# match may contain any number of other matches.
|
1123
|
-
#
|
1124
|
-
|
1139
|
+
# match may contain any number of other matches. Nodes of the tree are lazily
|
1140
|
+
# instantiated as needed. This class provides several convenient tree
|
1141
|
+
# traversal methods that help when examining and interpreting parse results.
|
1142
|
+
class Match
|
1125
1143
|
def initialize(string, events=[])
|
1126
|
-
|
1127
|
-
string.length if events[-1] && string.length != events[-1]
|
1144
|
+
@string = string
|
1128
1145
|
|
1129
|
-
|
1130
|
-
|
1146
|
+
if events.length > 0
|
1147
|
+
if events[-1] != string.length
|
1148
|
+
raise ArgumentError, "Invalid events for length #{string.length}"
|
1149
|
+
end
|
1131
1150
|
|
1132
|
-
|
1133
|
-
end
|
1151
|
+
elisions = []
|
1134
1152
|
|
1135
|
-
|
1136
|
-
|
1153
|
+
while events[0].elide?
|
1154
|
+
elisions.unshift(events.shift)
|
1155
|
+
events = events.slice(0, events.length - 2)
|
1156
|
+
end
|
1137
1157
|
|
1138
|
-
|
1139
|
-
# for each rule that returns that object when matching. These names can then
|
1140
|
-
# be used to determine which rules were satisfied by a given match.
|
1141
|
-
def names
|
1142
|
-
@names ||= []
|
1143
|
-
end
|
1158
|
+
events[0].extend_match(self)
|
1144
1159
|
|
1145
|
-
|
1146
|
-
|
1147
|
-
|
1148
|
-
|
1160
|
+
elisions.each do |rule|
|
1161
|
+
rule.extend_match(self)
|
1162
|
+
end
|
1163
|
+
end
|
1149
1164
|
|
1150
|
-
|
1151
|
-
def has_name?(name)
|
1152
|
-
names.include?(name.to_sym)
|
1165
|
+
@events = events
|
1153
1166
|
end
|
1154
1167
|
|
1155
|
-
#
|
1156
|
-
|
1157
|
-
|
1158
|
-
|
1159
|
-
|
1160
|
-
|
1161
|
-
rule = Rule[event]
|
1162
|
-
extenders.unshift(rule)
|
1163
|
-
break unless rule.propagates_extensions?
|
1164
|
-
end
|
1165
|
-
extenders
|
1166
|
-
end
|
1168
|
+
# The array of events for this match.
|
1169
|
+
attr_reader :events
|
1170
|
+
|
1171
|
+
# Returns the length of this match.
|
1172
|
+
def length
|
1173
|
+
@string.length
|
1167
1174
|
end
|
1168
1175
|
|
1169
|
-
# Returns
|
1170
|
-
# order they appeared in the input.
|
1171
|
-
def
|
1172
|
-
@
|
1173
|
-
|
1176
|
+
# Returns a hash of capture names to arrays of matches with that name,
|
1177
|
+
# in the order they appeared in the input.
|
1178
|
+
def captures
|
1179
|
+
@captures ||= begin
|
1180
|
+
captures = {}
|
1174
1181
|
stack = []
|
1175
1182
|
offset = 0
|
1176
1183
|
close = false
|
1177
1184
|
index = 0
|
1185
|
+
last_length = nil
|
1186
|
+
in_proxy = false
|
1187
|
+
count = 0
|
1178
1188
|
|
1179
1189
|
while index < @events.size
|
1180
1190
|
event = @events[index]
|
1191
|
+
|
1181
1192
|
if close
|
1182
1193
|
start = stack.pop
|
1183
|
-
|
1184
|
-
|
1185
|
-
|
1194
|
+
|
1195
|
+
if Rule === start
|
1196
|
+
rule = start
|
1197
|
+
os = stack.pop
|
1198
|
+
start = stack.pop
|
1199
|
+
|
1200
|
+
match = Match.new(@string.slice(os, event), @events[start..index])
|
1201
|
+
|
1202
|
+
# We can lookup immediate submatches by their index.
|
1203
|
+
if stack.size == 1
|
1204
|
+
captures[count] = match
|
1205
|
+
count += 1
|
1206
|
+
end
|
1207
|
+
|
1208
|
+
# We can lookup matches that were created by proxy by the name of
|
1209
|
+
# the rule they are proxy for.
|
1210
|
+
if Proxy === rule
|
1211
|
+
if captures[rule.rule_name]
|
1212
|
+
captures[rule.rule_name] << match
|
1213
|
+
else
|
1214
|
+
captures[rule.rule_name] = [match]
|
1215
|
+
end
|
1216
|
+
end
|
1217
|
+
|
1218
|
+
# We can lookup matches that were created by rules with labels by
|
1219
|
+
# that label.
|
1220
|
+
if rule.label
|
1221
|
+
if captures[rule.label]
|
1222
|
+
captures[rule.label] << match
|
1223
|
+
else
|
1224
|
+
captures[rule.label] = [match]
|
1225
|
+
end
|
1226
|
+
end
|
1227
|
+
|
1228
|
+
in_proxy = false
|
1186
1229
|
end
|
1230
|
+
|
1231
|
+
unless last_length
|
1232
|
+
last_length = event
|
1233
|
+
end
|
1234
|
+
|
1187
1235
|
close = false
|
1188
1236
|
elsif event == CLOSE
|
1189
1237
|
close = true
|
1190
1238
|
else
|
1191
1239
|
stack << index
|
1240
|
+
|
1241
|
+
# We can calculate the offset of this rule event by adding back the
|
1242
|
+
# last match length.
|
1243
|
+
if last_length
|
1244
|
+
offset += last_length
|
1245
|
+
last_length = nil
|
1246
|
+
end
|
1247
|
+
|
1248
|
+
# We should not create captures when traversing the portion of the
|
1249
|
+
# event stream that is masked by a proxy in the original rule
|
1250
|
+
# definition.
|
1251
|
+
unless in_proxy || stack.size == 1
|
1252
|
+
stack << offset
|
1253
|
+
stack << event
|
1254
|
+
in_proxy = true if Proxy === event
|
1255
|
+
end
|
1192
1256
|
end
|
1257
|
+
|
1193
1258
|
index += 1
|
1194
1259
|
end
|
1195
1260
|
|
1196
|
-
|
1261
|
+
captures
|
1197
1262
|
end
|
1198
1263
|
end
|
1199
1264
|
|
1200
|
-
# Returns an array of all
|
1201
|
-
|
1202
|
-
|
1203
|
-
def find(name, deep=true)
|
1204
|
-
ms = matches.select {|m| m.has_name?(name) }
|
1205
|
-
matches.each {|m| ms.concat(m.find(name, deep)) } if deep
|
1206
|
-
ms
|
1265
|
+
# Returns an array of all immediate submatches of this match.
|
1266
|
+
def matches
|
1267
|
+
@matches ||= (0...captures.size).map {|n| captures[n] }.compact
|
1207
1268
|
end
|
1208
1269
|
|
1209
|
-
# A shortcut for retrieving the first immediate
|
1210
|
-
|
1211
|
-
|
1212
|
-
def first(name=nil)
|
1213
|
-
name ? find(name, false).first : matches.first
|
1270
|
+
# A shortcut for retrieving the first immediate submatch of this match.
|
1271
|
+
def first
|
1272
|
+
captures[0]
|
1214
1273
|
end
|
1215
1274
|
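Note on `matches` above: the `captures` hash holds immediate submatches under integer keys alongside name-keyed arrays, so iterating `0...captures.size` overshoots the integer keys and the trailing `nil` values are dropped by `compact`. A small illustration with a hand-built hash; the values are plain strings here rather than Match objects.

```ruby
# Integer keys index immediate submatches; symbol keys group them by name.
captures = {
  0    => 'a',
  :lhs => ['a'],
  1    => '+',
  2    => 'b',
  :rhs => ['b']
}

# Same expression as Match#matches: indexes 3 and 4 are absent and yield
# nil, which compact removes.
matches = (0...captures.size).map { |n| captures[n] }.compact
# => ["a", "+", "b"]
```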
|
1216
1275
|
# The default value for a match is its string value. This method is
|
1217
1276
|
# overridden in most cases to be more meaningful according to the desired
|
1218
1277
|
# interpretation.
|
1219
|
-
|
1220
|
-
|
1221
|
-
# Allows
|
1222
|
-
#
|
1223
|
-
def method_missing(sym, *args)
|
1224
|
-
if sym
|
1225
|
-
|
1226
|
-
# extend String.
|
1227
|
-
super
|
1278
|
+
alias_method :value, :to_s
|
1279
|
+
|
1280
|
+
# Allows methods of this match's string to be called directly and provides
|
1281
|
+
# a convenient interface for retrieving the first match with a given name.
|
1282
|
+
def method_missing(sym, *args, &block)
|
1283
|
+
if @string.respond_to?(sym)
|
1284
|
+
@string.__send__(sym, *args, &block)
|
1228
1285
|
else
|
1229
|
-
first
|
1230
|
-
[sym, self, name]
|
1286
|
+
captures[sym].first if captures[sym]
|
1231
1287
|
end
|
1232
1288
|
end
|
1233
1289
|
|
1234
|
-
|
1235
|
-
|
1236
|
-
def dump
|
1237
|
-
dump_lines.join("\n")
|
1290
|
+
def to_s
|
1291
|
+
@string
|
1238
1292
|
end
|
1239
1293
|
|
1240
|
-
|
1241
|
-
|
1242
|
-
|
1243
|
-
|
1244
|
-
|
1294
|
+
alias_method :to_str, :to_s
|
1295
|
+
|
1296
|
+
def ==(other)
|
1297
|
+
case other
|
1298
|
+
when String
|
1299
|
+
@string == other
|
1300
|
+
when Match
|
1301
|
+
@string == other.to_s
|
1302
|
+
else
|
1303
|
+
super
|
1245
1304
|
end
|
1246
1305
|
end
|
1247
1306
|
|
1248
|
-
|
1307
|
+
alias_method :eql?, :==
|
1249
1308
|
|
1250
|
-
def
|
1251
|
-
|
1252
|
-
|
1309
|
+
def inspect
|
1310
|
+
@string.inspect
|
1311
|
+
end
|
1312
|
+
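Note on the `Match` methods above: `method_missing` forwards anything the underlying string understands and otherwise returns the first capture with that name, while `to_str` and `==` let a match stand in for a String. The sketch below shows the same pattern in a hypothetical `TinyMatch` class that takes its captures as a ready-made hash; `respond_to_missing?` is an addition of this sketch and is not part of the diff.

```ruby
class TinyMatch
  def initialize(string, captures = {})
    @string = string
    @captures = captures
  end

  attr_reader :captures

  # String methods are delegated; any other name is treated as a capture.
  def method_missing(sym, *args, &block)
    if @string.respond_to?(sym)
      @string.__send__(sym, *args, &block)
    else
      captures[sym].first if captures[sym]
    end
  end

  # Not in the diff: keeps respond_to? consistent with method_missing.
  def respond_to_missing?(sym, include_private = false)
    @string.respond_to?(sym) || captures.key?(sym) || super
  end

  def to_s
    @string
  end

  alias_method :to_str, :to_s

  def ==(other)
    case other
    when String
      @string == other
    when TinyMatch
      @string == other.to_s
    else
      super
    end
  end
end

m = TinyMatch.new('a + b', lhs: [TinyMatch.new('a')], rhs: [TinyMatch.new('b')])

m.upcase     # => "A + B", delegated to the string
m.lhs == 'a' # => true, first capture named :lhs
m.op         # => nil, no such capture
'sum: ' + m  # => "sum: a + b", via to_str
```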
|
1313
|
+
# Prints the entire subtree of this match using the given +indent+ to
|
1314
|
+
# indicate nested match levels. Useful for debugging.
|
1315
|
+
def dump(indent=' ')
|
1316
|
+
lines = []
|
1317
|
+
stack = []
|
1318
|
+
offset = 0
|
1319
|
+
close = false
|
1320
|
+
index = 0
|
1321
|
+
last_length = nil
|
1322
|
+
|
1323
|
+
while index < @events.size
|
1324
|
+
event = @events[index]
|
1325
|
+
|
1326
|
+
if close
|
1327
|
+
os = stack.pop
|
1328
|
+
start = stack.pop
|
1329
|
+
rule = stack.pop
|
1330
|
+
|
1331
|
+
space = indent * (stack.size / 3)
|
1332
|
+
string = @string.slice(os, event)
|
1333
|
+
lines[start] = "#{space}#{string.inspect} rule=#{rule}, offset=#{os}, length=#{event}"
|
1334
|
+
|
1335
|
+
unless last_length
|
1336
|
+
last_length = event
|
1337
|
+
end
|
1338
|
+
|
1339
|
+
close = false
|
1340
|
+
elsif event == CLOSE
|
1341
|
+
close = true
|
1342
|
+
else
|
1343
|
+
if last_length
|
1344
|
+
offset += last_length
|
1345
|
+
last_length = nil
|
1346
|
+
end
|
1347
|
+
|
1348
|
+
stack << event
|
1349
|
+
stack << index
|
1350
|
+
stack << offset
|
1351
|
+
end
|
1352
|
+
|
1353
|
+
index += 1
|
1253
1354
|
end
|
1355
|
+
|
1356
|
+
puts lines.compact.join("\n")
|
1254
1357
|
end
|
1255
1358
|
end
|
1256
1359
|
end
|