citrus 2.2.2 → 2.3.0

data/doc/syntax.markdown CHANGED
@@ -27,6 +27,9 @@ match in a case-insensitive manner.
27
27
 
28
28
  `abc` # match "abc" in any case
29
29
 
30
+ Besides case sensitivity, case-insensitive strings have the same behavior as
31
+ double quoted strings.
32
+
30
33
  See [Terminal](api/classes/Citrus/Terminal.html) and
31
34
  [StringTerminal](api/classes/Citrus/StringTerminal.html) for more information.
32
35
 
@@ -69,6 +72,9 @@ that does not match a given expression.
69
72
  ~'a' # match all characters until an "a"
70
73
  ~/xyz/ # match all characters until /xyz/ matches
71
74
 
75
+ When using this operator (the tilde), at least one character must be consumed
76
+ for the rule to succeed.
77
+
72
78
  See [AndPredicate](api/classes/Citrus/AndPredicate.html),
73
79
  [NotPredicate](api/classes/Citrus/NotPredicate.html), and
74
80
  [ButPredicate](api/classes/Citrus/ButPredicate.html) for more information.
@@ -98,25 +104,25 @@ levels of precedence is below.
98
104
 
99
105
  See [Choice](api/classes/Citrus/Choice.html) for more information.
100
106
 
107
+ ## Grouping
108
+
109
+ As is common in many programming languages, parentheses may be used to override
110
+ the normal binding order of operators. In the following example parentheses are
111
+ used to make the vertical bar between `'b'` and `'c'` bind tighter than the
112
+ space between `'a'` and `'b'`.
113
+
114
+ 'a' ('b' | 'c') # match "a", then "b" or "c"
115
+
101
116
  ## Labels
102
117
 
103
118
  Match objects may be referred to by a different name than the rule that
104
- originally generated them. Labels are created by placing the label and a colon
119
+ originally generated them. Labels are added by placing the label and a colon
105
120
  immediately preceding any expression.
106
121
 
107
122
  chars:/[a-z]+/ # the characters matched by the regular expression
108
123
  # may be referred to as "chars" in an extension
109
124
  # method
110
125
 
111
- See [Label](api/classes/Citrus/Label.html) for more information.
112
-
113
- ## Grouping
114
-
115
- As is common in many programming languages, parentheses may be used to override
116
- the normal binding order of operators.
117
-
118
- 'a' ('b' | 'c') # match "a", then "b" or "c"
119
-
120
126
  ## Extensions
121
127
 
122
128
  Extensions may be specified using either "module" or "block" syntax. When using
@@ -128,17 +134,16 @@ in between less than and greater than symbols.
128
134
  # times and extend the match with the
129
135
  # CouponCode module
130
136
 
131
- Additionally, extensions may be specified inline using curly braces. Inside the
132
- curly braces you may embed method definitions that will be used to extend match
133
- objects.
137
+ Additionally, extensions may be specified inline using curly braces. When using
138
+ this method, the code inside the curly braces may be invoked by calling the
139
+ `value` method on the match object.
134
140
 
135
- # match any digit and return its integer value when calling the
136
- # #value method on the match object
137
- [0-9] {
138
- def value
139
- to_i
140
- end
141
- }
141
+ [0-9] { to_i } # match any digit and return its integer value when
142
+ # calling the #value method on the match object
143
+
144
+ Note that when using the inline block method you may also specify arguments in
145
+ between vertical bars immediately following the opening curly brace, just like
146
+ in Ruby blocks.
142
147
 
143
148
  ## Super
144
149
 
@@ -146,6 +151,24 @@ When including a grammar inside another, all rules in the child that have the
146
151
  same name as a rule in the parent also have access to the `super` keyword to
147
152
  invoke the parent rule.
148
153
 
154
+ grammar Number
155
+ rule number
156
+ [0-9]+
157
+ end
158
+ end
159
+
160
+ grammar FloatingPoint
161
+ include Number
162
+
163
+ rule number
164
+ super ('.' super)?
165
+ end
166
+ end
167
+
168
+ In the example above, the `FloatingPoint` grammar includes `Number`. Both have a
169
+ rule named `number`, so `FloatingPoint#number` has access to `Number#number` by
170
+ means of using `super`.
171
+
149
172
  See [Super](api/classes/Citrus/Super.html) for more information.
150
173
 
151
174
  ## Precedence
@@ -155,21 +178,21 @@ their precedence. A higher precedence indicates tighter binding.
155
178
 
156
179
  Operator | Name | Precedence
157
180
  ------------------------- | ------------------------- | ----------
158
- `''` | String (single quoted) | 6
159
- `""` | String (double quoted) | 6
160
- <code>``</code> | String (case insensitive) | 6
161
- `[]` | Character class | 6
162
- `.` | Dot (any character) | 6
163
- `//` | Regular expression | 6
164
- `()` | Grouping | 6
165
- `*` | Repetition (arbitrary) | 5
166
- `+` | Repetition (one or more) | 5
167
- `?` | Repetition (zero or one) | 5
168
- `&` | And predicate | 4
169
- `!` | Not predicate | 4
170
- `~` | But predicate | 4
171
- `:` | Label | 4
172
- `<>` | Extension (module name) | 3
173
- `{}` | Extension (literal) | 3
181
+ `''` | String (single quoted) | 7
182
+ `""` | String (double quoted) | 7
183
+ <code>``</code> | String (case insensitive) | 7
184
+ `[]` | Character class | 7
185
+ `.` | Dot (any character) | 7
186
+ `//` | Regular expression | 7
187
+ `()` | Grouping | 7
188
+ `*` | Repetition (arbitrary) | 6
189
+ `+` | Repetition (one or more) | 6
190
+ `?` | Repetition (zero or one) | 6
191
+ `&` | And predicate | 5
192
+ `!` | Not predicate | 5
193
+ `~` | But predicate | 5
194
+ `<>` | Extension (module name) | 4
195
+ `{}` | Extension (literal) | 4
196
+ `:` | Label | 3
174
197
  `e1 e2` | Sequence | 2
175
198
  <code>e1 &#124; e2</code> | Ordered choice | 1
data/examples/calc.citrus CHANGED
@@ -52,7 +52,9 @@ grammar Calc
52
52
  end
53
53
 
54
54
  rule group
55
- (lparen term rparen) { term.value }
55
+ (lparen term rparen) {
56
+ term.value
57
+ }
56
58
  end
57
59
 
58
60
  ## Lexical syntax
@@ -62,11 +64,15 @@ grammar Calc
62
64
  end
63
65
 
64
66
  rule float
65
- (digits '.' digits space*) { strip.to_f }
67
+ (digits '.' digits space*) {
68
+ strip.to_f
69
+ }
66
70
  end
67
71
 
68
72
  rule integer
69
- (digits space*) { strip.to_i }
73
+ (digits space*) {
74
+ strip.to_i
75
+ }
70
76
  end
71
77
 
72
78
  rule digits
data/examples/calc.rb CHANGED
@@ -55,7 +55,9 @@ grammar :Calc do
55
55
  end
56
56
 
57
57
  rule :group do
58
- all(:lparen, :term, :rparen) { term.value }
58
+ all(:lparen, :term, :rparen) {
59
+ term.value
60
+ }
59
61
  end
60
62
 
61
63
  ## Lexical syntax
@@ -65,11 +67,15 @@ grammar :Calc do
65
67
  end
66
68
 
67
69
  rule :float do
68
- all(:digits, '.', :digits, zero_or_more(:space)) { strip.to_f }
70
+ all(:digits, '.', :digits, zero_or_more(:space)) {
71
+ strip.to_f
72
+ }
69
73
  end
70
74
 
71
75
  rule :integer do
72
- all(:digits, zero_or_more(:space)) { strip.to_i }
76
+ all(:digits, zero_or_more(:space)) {
77
+ strip.to_i
78
+ }
73
79
  end
74
80
 
75
81
  rule :digits do
data/examples/ip.rb CHANGED
@@ -1,3 +1,5 @@
1
+ $LOAD_PATH.unshift(File.expand_path('../../lib', __FILE__))
2
+
1
3
  require 'citrus'
2
4
 
3
5
  # This file contains a small suite of tests for the grammars found in ip.citrus.
data/lib/citrus.rb CHANGED
@@ -8,7 +8,7 @@ require 'strscan'
8
8
  module Citrus
9
9
  autoload :File, 'citrus/file'
10
10
 
11
- VERSION = [2, 2, 2]
11
+ VERSION = [2, 3, 0]
12
12
 
13
13
  # Returns the current version of Citrus as a string.
14
14
  def self.version
@@ -20,37 +20,47 @@ module Citrus
20
20
 
21
21
  Infinity = 1.0 / 0
22
22
 
23
- F = ::File
24
-
25
23
  CLOSE = -1
26
24
 
27
- # Loads the grammar from the given +file+ into the global scope using #eval.
28
- def self.load(file)
29
- file << '.citrus' unless F.file?(file)
30
- raise "Cannot find file #{file}" unless F.file?(file)
31
- raise "Cannot read file #{file}" unless F.readable?(file)
32
- eval(F.read(file))
25
+ # Parses the given Citrus +code+ using +options+.
26
+ def self.parse(code, options={})
27
+ File.parse(code, options)
33
28
  end
34
29
 
35
30
  # Evaluates the given Citrus parsing expression grammar +code+ in the global
36
- # scope. Returns an array of any grammar modules that are created. Implicitly
37
- # raises +SyntaxError+ on a failed parse.
31
+ # scope. Returns an array of any grammar modules that are created.
32
+ #
33
+ # Citrus.eval(<<CITRUS)
34
+ # grammar MyGrammar
35
+ # rule abc
36
+ # "abc"
37
+ # end
38
+ # end
39
+ # CITRUS
40
+ #
38
41
  def self.eval(code)
39
- parse(code, :consume => true).value
42
+ parse(code).value
40
43
  end
41
44
 
42
- # Parses the given Citrus +code+ using the given +options+. Returns the
43
- # generated match tree. Raises a +SyntaxError+ if the parse fails.
44
- def self.parse(code, options={})
45
- File.parse(code, options)
45
+ # Evaluates the given expression and creates a new Rule object from it.
46
+ #
47
+ # Citrus.rule('"a" | "b"')
48
+ #
49
+ def self.rule(expr)
50
+ parse(expr, :root => :rule_body).value
51
+ end
52
+
53
+ # Loads the grammar from the given +file+ into the global scope using #eval.
54
+ def self.load(file)
55
+ file << '.citrus' unless ::File.file?(file)
56
+ raise "Cannot find file #{file}" unless ::File.file?(file)
57
+ raise "Cannot read file #{file}" unless ::File.readable?(file)
58
+ eval(::File.read(file))
46
59
  end
47
60
 
48
61
  # A standard error class that all Citrus errors extend.
49
62
  class Error < RuntimeError; end
50
63
 
51
- # Raised when a match cannot be found.
52
- class NoMatchError < Error; end
53
-
54
64
  # Raised when a parse fails.
55
65
  class ParseError < Error
56
66
  # The +input+ given here is an instance of Citrus::Input.
@@ -59,9 +69,7 @@ module Citrus
59
69
  @line_offset = input.line_offset(offset)
60
70
  @line_number = input.line_number(offset)
61
71
  @line = input.line(offset)
62
- msg = "Failed to parse input on line %d at offset %d\n%s" %
63
- [line_number, line_offset, detail]
64
- super(msg)
72
+ super("Failed to parse input on line #{line_number} at offset #{line_offset}\n#{detail}")
65
73
  end
66
74
 
67
75
  # The 0-based offset at which the error occurred in the input, i.e. the
@@ -82,12 +90,12 @@ module Citrus
82
90
  # Returns a string that, when printed, gives a visual representation of
83
91
  # exactly where the error occurred on its line in the input.
84
92
  def detail
85
- "%s\n%s^" % [line, ' ' * line_offset]
93
+ "#{line}\n#{' ' * line_offset}^"
86
94
  end
87
95
  end
88
96
 
89
- # This class represents the core of the parsing algorithm. It wraps the input
90
- # string and serves matches to all nonterminals.
97
+ # An Input is a scanner that is responsible for executing rules at different
98
+ # positions in the input string and persisting event streams.
91
99
  class Input < StringScanner
92
100
  def initialize(string)
93
101
  super(string)
@@ -97,40 +105,25 @@ module Citrus
97
105
  # The maximum offset in the input that was successfully parsed.
98
106
  attr_reader :max_offset
99
107
 
100
- # A nested hash of rule id's to offsets and their respective matches. Only
101
- # present if memoing is enabled.
102
- attr_reader :cache
103
-
104
- # The number of times the cache was hit. Only present if memoing is enabled.
105
- attr_reader :cache_hits
106
-
107
- # Resets all internal variables so that this object may be used in another
108
- # parse.
109
108
  def reset # :nodoc:
110
109
  @max_offset = 0
111
110
  super
112
111
  end
113
112
 
114
- # Returns the length of this input.
115
- def length
116
- string.length
117
- end
118
-
119
113
  # Returns an array containing the lines of text in the input.
120
114
  def lines
121
- string.send(string.respond_to?(:lines) ? :lines : :to_s).to_a
122
- end
123
-
124
- # Iterates over the lines of text in the input using the given +block+.
125
- def each_line(&block)
126
- string.each_line(&block)
115
+ if string.respond_to?(:lines)
116
+ string.lines.to_a
117
+ else
118
+ string.to_a
119
+ end
127
120
  end
128
121
 
129
122
  # Returns the 0-based offset of the given +pos+ in the input on the line
130
123
  # on which it is found. +pos+ defaults to the current pointer position.
131
124
  def line_offset(pos=pos)
132
125
  p = 0
133
- each_line do |line|
126
+ string.each_line do |line|
134
127
  len = line.length
135
128
  return (pos - p) if p + len >= pos
136
129
  p += len
@@ -142,7 +135,7 @@ module Citrus
142
135
  # given +pos+. +pos+ defaults to the current pointer position.
143
136
  def line_index(pos=pos)
144
137
  p = n = 0
145
- each_line do |line|
138
+ string.each_line do |line|
146
139
  p += line.length
147
140
  return n if p >= pos
148
141
  n += 1
@@ -156,7 +149,7 @@ module Citrus
156
149
  line_index(pos) + 1
157
150
  end
158
151
 
159
- alias lineno line_number
152
+ alias_method :lineno, :line_number
160
153
 
161
154
  # Returns the text of the line that contains the character at the given
162
155
  # +pos+. +pos+ defaults to the current pointer position.
@@ -165,17 +158,16 @@ module Citrus
165
158
  end
166
159
 
167
160
  # Returns an array of events for the given +rule+ at the current pointer
168
- # position. Objects in this array may be one of three types: a rule id,
169
- # Citrus::CLOSE, or a length.
161
+ # position. Objects in this array may be one of three types: a Rule,
162
+ # Citrus::CLOSE, or a length (integer).
170
163
  def exec(rule, events=[])
171
164
  start = pos
172
165
  index = events.size
173
166
 
174
- rule.exec(self, events)
175
-
176
- if index < events.size
177
- self.pos = start + events[-1]
167
+ if rule.exec(self, events).size > index
168
+ pos = start + events[-1]
178
169
  @max_offset = pos if pos > @max_offset
170
+ self.pos = pos
179
171
  else
180
172
  self.pos = start
181
173
  end
@@ -186,47 +178,59 @@ module Citrus
186
178
  # Returns the length of a match for the given +rule+ at the current pointer
187
179
  # position, +nil+ if none can be made.
188
180
  def test(rule)
189
- rule.exec(self)[-1]
181
+ start = pos
182
+ events = rule.exec(self)
183
+ self.pos = start
184
+ events[-1]
190
185
  end
191
186
 
192
187
  # Returns +true+ when using memoization to cache match results.
193
188
  def memoized?
194
- !! @cache
189
+ false
195
190
  end
191
+ end
196
192
 
197
- # Modifies this object to cache match results during a parse. This technique
198
- # (also known as "Packrat" parsing) guarantees parsers will operate in
199
- # linear time but costs significantly more in terms of time and memory
200
- # required to perform a parse. For more information, please read the paper
201
- # on Packrat parsing at http://pdos.csail.mit.edu/~baford/packrat/icfp02/.
202
- def memoize!
203
- return if memoized?
204
-
193
+ # A MemoizingInput is an Input that caches segments of the event stream for
194
+ # particular rules in a parse. This technique (also known as "Packrat"
195
+ # parsing) guarantees parsers will operate in linear time but costs
196
+ # significantly more in terms of time and memory required to perform a parse.
197
+ # For more information, please read the paper on Packrat parsing at
198
+ # http://pdos.csail.mit.edu/~baford/packrat/icfp02/.
199
+ class MemoizingInput < Input
200
+ def initialize(string)
201
+ super(string)
205
202
  @cache = {}
206
203
  @cache_hits = 0
204
+ end
207
205
 
208
- # Using +instance_eval+ here preserves access to +super+ within the
209
- # methods we define inside the block.
210
- instance_eval do
211
- def exec(rule, events=[]) # :nodoc:
212
- c = @cache[rule.id] ||= {}
206
+ # A nested hash of rules to offsets and their respective matches.
207
+ attr_reader :cache
213
208
 
214
- e = if c[pos]
215
- @cache_hits += 1
216
- c[pos]
217
- else
218
- c[pos] = super(rule)
219
- end
209
+ # The number of times the cache was hit.
210
+ attr_reader :cache_hits
220
211
 
221
- events.concat(e)
222
- end
212
+ def reset # :nodoc:
213
+ @cache.clear
214
+ @cache_hits = 0
215
+ super
216
+ end
223
217
 
224
- def reset # :nodoc:
225
- @cache.clear
226
- @cache_hits = 0
227
- super
228
- end
218
+ def exec(rule, events=[]) # :nodoc:
219
+ c = @cache[rule] ||= {}
220
+
221
+ e = if c[pos]
222
+ @cache_hits += 1
223
+ c[pos]
224
+ else
225
+ c[pos] = super(rule)
229
226
  end
227
+
228
+ events.concat(e)
229
+ end
230
+
231
+ # Returns +true+ when using memoization to cache match results.
232
+ def memoized?
233
+ true
230
234
  end
231
235
  end
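Review note: the move from `memoize!` to a `MemoizingInput` subclass can be pictured with a small standalone sketch. It is not the Citrus implementation: `SketchInput` and `SketchMemoizingInput` are hypothetical names, and a plain Regexp stands in for a rule. Unlike the code in the diff, it caches with `key?` so that failed matches (`nil`) are remembered too.

```ruby
require 'strscan'

# Hypothetical base scanner: a "rule" is just a Regexp in this sketch.
class SketchInput < StringScanner
  # Returns the match length at the current position without advancing.
  def exec(regexp)
    scan_full(regexp, false, false)
  end
end

# Hypothetical memoizing subclass, modeled on MemoizingInput in the diff.
class SketchMemoizingInput < SketchInput
  attr_reader :cache, :cache_hits

  def initialize(string)
    super(string)
    @cache = {}
    @cache_hits = 0
  end

  def exec(regexp)
    c = @cache[regexp] ||= {}

    if c.key?(pos)
      @cache_hits += 1
      c[pos]
    else
      c[pos] = super(regexp)
    end
  end
end

input = SketchMemoizingInput.new("aaa")
rule = /a+/
first = input.exec(rule)   # computed by the scanner
second = input.exec(rule)  # served from the cache
```

Keeping the cache in a subclass means the plain input class carries no memoization state at all, which appears to be the motivation for the change above.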
232
236
 
@@ -327,7 +331,7 @@ module Citrus
327
331
  end
328
332
 
329
333
  # Gets/sets the rule with the given +name+. If +obj+ is given the rule
330
- # will be set to the value of +obj+ passed through Rule#new. If a block is
334
+ # will be set to the value of +obj+ passed through Rule.for. If a block is
331
335
  # given, its return value will be used for the value of +obj+.
332
336
  #
333
337
  # It is important to note that this method will also check any included
@@ -340,7 +344,7 @@ module Citrus
340
344
  if obj
341
345
  rule_names << sym unless has_rule?(sym)
342
346
 
343
- rule = Rule.new(obj)
347
+ rule = Rule.for(obj)
344
348
  rule.name = name
345
349
  setup_super(rule, name)
346
350
  rule.grammar = self
@@ -350,7 +354,9 @@ module Citrus
350
354
 
351
355
  rules[sym] || super_rule(sym)
352
356
  rescue => e
353
- raise 'Cannot create rule "%s": %s' % [name, e.message]
357
+ # This preserves the backtrace.
358
+ e.message.replace("Cannot create rule \"#{name}\": #{e.message}")
359
+ raise e
354
360
  end
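Review note: the in-place `e.message.replace` above depends on the message string being mutable. A possible alternative, shown here only as a sketch and not as what Citrus does, is the three-argument form of `Kernel#raise`, which builds a new exception of the same class and reuses the original backtrace. `create_rule` is a hypothetical wrapper; only the message format comes from the diff.

```ruby
# Hypothetical wrapper; only the message format is taken from the diff.
def create_rule(name)
  yield
rescue => e
  # raise(class, message, backtrace) keeps the original backtrace without
  # mutating the original exception's message string.
  raise e.class, "Cannot create rule \"#{name}\": #{e.message}", e.backtrace
end

wrapped = begin
  create_rule(:abc) { raise ArgumentError, "Invalid rule object: nil" }
rescue ArgumentError => e
  e
end

puts wrapped.message
```

This variant assumes the exception class can be constructed from a single message string, which holds for the standard error classes but is not guaranteed for custom ones.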
355
361
 
356
362
  # Gets/sets the +name+ of the root rule of this grammar. If no root rule is
@@ -364,7 +370,7 @@ module Citrus
364
370
  # Creates a new rule that will match any single character. A block may be
365
371
  # provided to specify semantic behavior (via #ext).
366
372
  def dot(&block)
367
- ext(Rule.new(DOT), block)
373
+ ext(Rule.for(DOT), block)
368
374
  end
369
375
 
370
376
  # Creates a new Super for the rule currently being defined in the grammar. A
@@ -387,18 +393,10 @@ module Citrus
387
393
 
388
394
  # Creates a new ButPredicate using the given +rule+. A block may be provided
389
395
  # to specify semantic behavior (via #ext).
390
- def but(rule, &block)
396
+ def butp(rule, &block)
391
397
  ext(ButPredicate.new(rule), block)
392
398
  end
393
399
 
394
- alias butp but # For consistency with #andp and #notp.
395
-
396
- # Creates a new Label using the given +rule+ and +label+. A block may be
397
- # provided to specify semantic behavior (via #ext).
398
- def label(rule, label, &block)
399
- ext(Label.new(rule, label), block)
400
- end
401
-
402
400
  # Creates a new Repeat using the given +rule+. +min+ and +max+ specify the
403
401
  # minimum and maximum number of times the rule must match. A block may be
404
402
  # provided to specify semantic behavior (via #ext).
@@ -433,30 +431,37 @@ module Citrus
433
431
  ext(Choice.new(args), block)
434
432
  end
435
433
 
434
+ # Adds +label+ to the given +rule+. A block may be provided to specify
435
+ # semantic behavior (via #ext).
436
+ def label(rule, label, &block)
437
+ rule = ext(rule, block)
438
+ rule.label = label
439
+ rule
440
+ end
441
+
436
442
  # Specifies a Module that will be used to extend all matches created with
437
443
  # the given +rule+. A block may also be given that will be used to create
438
- # an anonymous module. See Rule#ext=.
444
+ # an anonymous module. See Rule#extension=.
439
445
  def ext(rule, mod=nil, &block)
440
- rule = Rule.new(rule)
446
+ rule = Rule.for(rule)
441
447
  mod = block if block
442
448
  rule.extension = mod if mod
443
449
  rule
444
450
  end
451
+
452
+ # Creates a new Module from the given +block+ and sets it to be the
453
+ # extension of the given +rule+. See Rule#extension=.
454
+ def mod(rule, &block)
455
+ rule.extension = Module.new(&block)
456
+ rule
457
+ end
445
458
  end
446
459
 
447
- # A Rule is an object that is used by a grammar to create matches on the
460
+ # A Rule is an object that is used by a grammar to create matches on an
448
461
  # Input during parsing.
449
462
  module Rule
450
- # Evaluates the given expression and creates a new rule object from it.
451
- #
452
- # Citrus::Rule.eval('"a" | "b"')
453
- #
454
- def self.eval(expr)
455
- Citrus.parse(expr, :root => :rule_body, :consume => true).value
456
- end
457
-
458
463
  # Returns a new Rule object depending on the type of object given.
459
- def self.new(obj)
464
+ def self.for(obj)
460
465
  case obj
461
466
  when Rule then obj
462
467
  when Symbol then Alias.new(obj)
@@ -466,33 +471,10 @@ module Citrus
466
471
  when Range then Choice.new(obj.to_a)
467
472
  when Numeric then StringTerminal.new(obj.to_s)
468
473
  else
469
- raise ArgumentError, "Invalid rule object: %s" % obj.inspect
474
+ raise ArgumentError, "Invalid rule object: #{obj.inspect}"
470
475
  end
471
476
  end
472
477
 
473
- @unique_id = 0
474
-
475
- # A global registry for Rule objects. Keyed by rule id.
476
- @rules = {}
477
-
478
- # Adds the given +rule+ to the global registry and gives it an id.
479
- def self.<<(rule) # :nodoc:
480
- rule.id = (@unique_id += 1)
481
- @rules[rule.id] = rule
482
- end
483
-
484
- # Returns the Rule object with the given +id+.
485
- def self.[](id)
486
- @rules[id]
487
- end
488
-
489
- def initialize(*args) # :nodoc:
490
- Rule << self
491
- end
492
-
493
- # An integer id that is unique to this rule.
494
- attr_accessor :id
495
-
496
478
  # The grammar this rule belongs to.
497
479
  attr_accessor :grammar
498
480
 
@@ -501,28 +483,25 @@ module Citrus
501
483
  @name = name.to_sym
502
484
  end
503
485
 
504
- # Returns the name of this rule.
505
- def name
506
- @name || '<anonymous>'
507
- end
486
+ # The name of this rule.
487
+ attr_reader :name
508
488
 
509
- # Returns +true+ if this rule has a name, +false+ otherwise.
510
- def named?
511
- !! @name
489
+ # Sets the label of this rule.
490
+ def label=(label)
491
+ @label = label.to_sym
512
492
  end
513
493
 
494
+ # A label for this rule. If a rule has a label, all matches that it creates
495
+ # will be accessible as named captures from the scope of their parent match
496
+ # using that label.
497
+ attr_reader :label
498
+
514
499
  # Specifies a module that will be used to extend all Match objects that
515
500
  # result from this rule. If +mod+ is a Proc, it is used to create an
516
- # anonymous module.
501
+ # anonymous module with a +value+ method.
517
502
  def extension=(mod)
518
503
  if Proc === mod
519
- begin
520
- tmp = Module.new(&mod)
521
- raise ArgumentError if tmp.instance_methods.empty?
522
- mod = tmp
523
- rescue NoMethodError, ArgumentError, NameError
524
- mod = Module.new { define_method(:value, &mod) }
525
- end
504
+ mod = Module.new { define_method(:value, &mod) }
526
505
  end
527
506
 
528
507
  raise ArgumentError unless Module === mod
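Review note: after this change a Proc extension always becomes an anonymous module with a single `value` method. A minimal standalone sketch of that idea follows. `module_for` is a hypothetical helper, and a plain String stands in for a match object, so this only illustrates the mechanism rather than reproducing Citrus.

```ruby
# Hypothetical helper modeled on the Proc branch of Rule#extension=.
def module_for(ext)
  if Proc === ext
    block = ext
    # define_method runs the block with self set to the extended object.
    ext = Module.new { define_method(:value, &block) }
  end

  raise ArgumentError unless Module === ext
  ext
end

# A plain String stands in for a Citrus match object (an assumption).
match = String.new("7")
match.extend(module_for(proc { to_i }))
digit_value = match.value
```

Because the block body becomes the `value` method itself, a block such as `{ to_i }` no longer needs a surrounding `def value ... end`, which matches the shortened example in the syntax documentation.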
@@ -543,18 +522,22 @@ module Citrus
543
522
  # +false+.
544
523
  # consume:: If this is +true+ a ParseError will be raised during a parse
545
524
  # unless the entire input string is consumed. Defaults to
546
- # +false+.
525
+ # +true+.
547
526
  def parse(string, options={})
548
527
  opts = default_parse_options.merge(options)
549
528
 
550
- input = Input.new(string)
551
- input.memoize! if opts[:memoize]
529
+ input = if opts[:memoize]
530
+ MemoizingInput.new(string)
531
+ else
532
+ Input.new(string)
533
+ end
534
+
552
535
  input.pos = opts[:offset] if opts[:offset] > 0
553
536
 
554
537
  events = input.exec(self)
555
538
  length = events[-1]
556
539
 
557
- if !length || (opts[:consume] && length < (input.length - opts[:offset]))
540
+ if !length || (opts[:consume] && length < (string.length - opts[:offset]))
558
541
  raise ParseError.new(input)
559
542
  end
560
543
 
@@ -565,135 +548,71 @@ module Citrus
565
548
  def default_parse_options # :nodoc:
566
549
  { :offset => 0,
567
550
  :memoize => false,
568
- :consume => false
551
+ :consume => true
569
552
  }
570
553
  end
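Review note: with `:consume` now defaulting to `true`, a parse that matches only a prefix of the input fails unless the caller opts out. The standalone sketch below mirrors the length check from `Rule#parse` using a bare `StringScanner`. `sketch_parse` is a hypothetical function, and it raises `ArgumentError` instead of `Citrus::ParseError` so that it runs without the gem.

```ruby
require 'strscan'

# Hypothetical stand-in for Rule#parse that matches a single Regexp.
def sketch_parse(regexp, string, options = {})
  opts = { :offset => 0, :consume => true }.merge(options)

  input = StringScanner.new(string)
  input.pos = opts[:offset] if opts[:offset] > 0

  # Length of the match at the current position, or nil when nothing matches.
  length = input.scan_full(regexp, false, false)

  if !length || (opts[:consume] && length < (string.length - opts[:offset]))
    raise ArgumentError, "Failed to parse #{string.inspect}"
  end

  length
end

partial = sketch_parse(/abc/, "abcdef", :consume => false)
full = sketch_parse(/abc/, "abc")
strict_failed = begin
  sketch_parse(/abc/, "abcdef")
  false
rescue ArgumentError
  true
end
```

Callers that relied on the old default would presumably need to pass `:consume => false` explicitly to keep accepting partial matches.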
571
554
 
572
555
  # Tests whether or not this rule matches on the given +string+. Returns the
573
556
  # length of the match if any can be made, +nil+ otherwise.
574
557
  def test(string)
575
- input = Input.new(string)
576
- input.test(self)
558
+ Input.new(string).test(self)
577
559
  end
578
560
 
579
561
  # Returns +true+ if this rule is a Terminal.
580
562
  def terminal?
581
- is_a?(Terminal)
582
- end
583
-
584
- # Returns +true+ if this rule is able to propagate extensions from child
585
- # rules to the scope of the parent, +false+ otherwise. In general, this will
586
- # return +false+ for any rule whose match value is derived from an arbitrary
587
- # number of child rules, such as a Repeat or a Sequence. Note that this is
588
- # not true for Choice objects because they rely on exactly 1 rule to match,
589
- # as do Proxy objects.
590
- def propagates_extensions?
591
- case self
592
- when AndPredicate, NotPredicate, ButPredicate, Repeat, Sequence
593
- false
594
- else
595
- true
596
- end
597
- end
598
-
599
- # Returns +true+ if this rule needs to be surrounded by parentheses when
600
- # using #embed.
601
- def paren?
602
563
  false
603
564
  end
604
565
 
605
- # Returns a string version of this rule that is suitable to be used in the
606
- # string representation of another rule.
607
- def embed
608
- named? ? name.to_s : (paren? ? '(%s)' % to_s : to_s)
609
- end
610
-
611
- def inspect # :nodoc:
612
- to_s
566
+ # Returns +true+ if this rule should extend a match but should not appear in
567
+ # its event stream.
568
+ def elide?
569
+ false
613
570
  end
614
571
 
615
- def extend_match(match) # :nodoc:
616
- match.names << name if named?
617
- match.extend(extension) if extension
572
+ # Returns +true+ if this rule needs to be surrounded by parentheses when
573
+ # using #to_embedded_s.
574
+ def needs_paren? # :nodoc:
575
+ is_a?(Nonterminal) && rules.length > 1
618
576
  end
619
- end
620
-
621
- # A Terminal is a Rule that matches directly on the input stream and may not
622
- # contain any other rule. Terminals are essentially wrappers for regular
623
- # expressions. As such, the Citrus notation is identical to Ruby's regular
624
- # expression notation, e.g.:
625
- #
626
- # /expr/
627
- #
628
- # Character classes and the dot symbol may also be used in Citrus notation for
629
- # compatibility with other parsing expression implementations, e.g.:
630
- #
631
- # [a-zA-Z]
632
- # .
633
- #
634
- class Terminal
635
- include Rule
636
577
 
637
- def initialize(rule=/^/)
638
- super
639
- @rule = rule
578
+ # Returns the Citrus notation of this rule as a string.
579
+ def to_s
580
+ if label
581
+ "#{label}:" + (needs_paren? ? "(#{to_citrus})" : to_citrus)
582
+ else
583
+ to_citrus
584
+ end
640
585
  end
641
586
 
642
- # The actual Regexp object this rule uses to match.
643
- attr_reader :rule
587
+ alias_method :to_str, :to_s
644
588
 
645
- # Returns an array of events for this rule on the given +input+.
646
- def exec(input, events=[])
647
- length = input.scan_full(rule, false, false)
648
- if length
649
- events << id
650
- events << CLOSE
651
- events << length
589
+ # Returns the Citrus notation of this rule as a string that is suitable to
590
+ # be embedded in the string representation of another rule.
591
+ def to_embedded_s # :nodoc:
592
+ if name
593
+ name.to_s
594
+ else
595
+ needs_paren? && label.nil? ? "(#{to_s})" : to_s
652
596
  end
653
- events
654
597
  end
655
598
 
656
- # Returns +true+ if this rule is case sensitive.
657
- def case_sensitive?
658
- !rule.casefold?
599
+ def ==(other)
600
+ case other
601
+ when Rule
602
+ to_s == other.to_s
603
+ else
604
+ super
605
+ end
659
606
  end
660
607
 
661
- # Returns the Citrus notation of this rule as a string.
662
- def to_s
663
- rule.inspect
664
- end
665
- end
608
+ alias_method :eql?, :==
666
609
 
667
- # A StringTerminal is a Terminal that may be instantiated from a String
668
- # object. The Citrus notation is any sequence of characters enclosed in either
669
- # single or double quotes, e.g.:
670
- #
671
- # 'expr'
672
- # "expr"
673
- #
674
- # This notation works the same as it does in Ruby; i.e. strings in double
675
- # quotes may contain escape sequences while strings in single quotes may not.
676
- # In order to specify that a string should ignore case when matching, enclose
677
- # it in backticks instead of single or double quotes, e.g.:
678
- #
679
- # `expr`
680
- #
681
- # Besides case sensitivity, case-insensitive strings have the same semantics
682
- # as double-quoted strings.
683
- class StringTerminal < Terminal
684
- # The +flags+ will be passed directly to Regexp#new.
685
- def initialize(rule='', flags=0)
686
- super(Regexp.new(Regexp.escape(rule), flags))
687
- @string = rule
610
+ def inspect # :nodoc:
611
+ to_s
688
612
  end
689
613
 
690
- # Returns the Citrus notation of this rule as a string.
691
- def to_s
692
- if case_sensitive?
693
- @string.inspect
694
- else
695
- @string.inspect.gsub(/^"|"$/, '`')
696
- end
614
+ def extend_match(match) # :nodoc:
615
+ match.extend(extension) if extension
697
616
  end
698
617
  end
699
618
 
@@ -705,7 +624,6 @@ module Citrus
705
624
  include Rule
706
625
 
707
626
  def initialize(rule_name='<proxy>')
708
- super
709
627
  self.rule_name = rule_name
710
628
  end
711
629
 
@@ -724,19 +642,29 @@ module Citrus
724
642
 
725
643
  # Returns an array of events for this rule on the given +input+.
726
644
  def exec(input, events=[])
727
- events << id
728
-
729
645
  index = events.size
730
- start = index - 1
646
+
731
647
  if input.exec(rule, events).size > index
732
- events << CLOSE
733
- events << events[-2]
734
- else
735
- events.slice!(start, events.size)
648
+ # Proxy objects insert themselves into the event stream in place of the
649
+ # rule they are proxy for.
650
+ events[index] = self
736
651
  end
737
652
 
738
653
  events
739
654
  end
655
+
656
+ # Returns +true+ if this rule should extend a match but should not appear in
657
+ # its event stream.
658
+ def elide? # :nodoc:
659
+ rule.elide?
660
+ end
661
+
662
+ def extend_match(match) # :nodoc:
663
+ # Proxy objects preserve the extension of the rule they are proxy for, and
664
+ # may also use their own extension.
665
+ rule.extend_match(match)
666
+ super
667
+ end
740
668
  end
741
669
 
742
670
  # An Alias is a Proxy for a rule in the same grammar. It is used in rule
@@ -749,7 +677,7 @@ module Citrus
749
677
  include Proxy
750
678
 
751
679
  # Returns the Citrus notation of this rule as a string.
752
- def to_s
680
+ def to_citrus # :nodoc:
753
681
  rule_name.to_s
754
682
  end
755
683
 
@@ -758,8 +686,14 @@ module Citrus
758
686
  # Searches this proxy's grammar and any included grammars for a rule with
759
687
  # this proxy's #rule_name. Raises an error if one cannot be found.
760
688
  def resolve!
761
- grammar.rule(rule_name) or raise RuntimeError,
762
- 'No rule named "%s" in grammar %s' % [rule_name, grammar.name]
689
+ rule = grammar.rule(rule_name)
690
+
691
+ unless rule
692
+ raise RuntimeError,
693
+ "No rule named \"#{rule_name}\" in grammar #{grammar.name}"
694
+ end
695
+
696
+ rule
763
697
  end
764
698
  end
765
699
 
@@ -774,7 +708,7 @@ module Citrus
774
708
  include Proxy
775
709
 
776
710
  # Returns the Citrus notation of this rule as a string.
777
- def to_s
711
+ def to_citrus # :nodoc:
778
712
  'super'
779
713
  end
780
714
 
@@ -783,8 +717,119 @@ module Citrus
783
717
  # Searches this proxy's included grammars for a rule with this proxy's
784
718
  # #rule_name. Raises an error if one cannot be found.
785
719
  def resolve!
786
- grammar.super_rule(rule_name) or raise RuntimeError,
787
- 'No rule named "%s" in hierarchy of grammar %s' % [rule_name, grammar.name]
720
+ rule = grammar.super_rule(rule_name)
721
+
722
+ unless rule
723
+ raise RuntimeError,
724
+ "No rule named \"#{rule_name}\" in hierarchy of grammar #{grammar.name}"
725
+ end
726
+
727
+ rule
728
+ end
729
+ end
730
+
731
+ # A Terminal is a Rule that matches directly on the input stream and may not
732
+ # contain any other rule. Terminals are essentially wrappers for regular
733
+ # expressions. As such, the Citrus notation is identical to Ruby's regular
734
+ # expression notation, e.g.:
735
+ #
736
+ # /expr/
737
+ #
738
+ # Character classes and the dot symbol may also be used in Citrus notation for
739
+ # compatibility with other parsing expression implementations, e.g.:
740
+ #
741
+ # [a-zA-Z]
742
+ # .
743
+ #
744
+ # Character classes have the same semantics as character classes inside Ruby
745
+ # regular expressions. The dot matches any character, including newlines.
746
+ class Terminal
747
+ include Rule
748
+
749
+ def initialize(regexp=/^/)
750
+ @regexp = regexp
751
+ end
752
+
753
+ # The actual Regexp object this rule uses to match.
754
+ attr_reader :regexp
755
+
756
+ # Returns an array of events for this rule on the given +input+.
757
+ def exec(input, events=[])
758
+ length = input.scan_full(@regexp, false, false)
759
+
760
+ if length
761
+ events << self
762
+ events << CLOSE
763
+ events << length
764
+ end
765
+
766
+ events
767
+ end
768
+
769
+ # Returns +true+ if this rule is case sensitive.
770
+ def case_sensitive?
771
+ !@regexp.casefold?
772
+ end
773
+
774
+ def ==(other)
775
+ case other
776
+ when Regexp
777
+ @regexp == other
778
+ else
779
+ super
780
+ end
781
+ end
782
+
783
+ # Returns +true+ if this rule is a Terminal.
784
+ def terminal? # :nodoc:
785
+ true
786
+ end
787
+
788
+ # Returns the Citrus notation of this rule as a string.
789
+ def to_citrus # :nodoc:
790
+ @regexp.inspect
791
+ end
792
+ end
793
+
794
+ # A StringTerminal is a Terminal that may be instantiated from a String
795
+ # object. The Citrus notation is any sequence of characters enclosed in either
796
+ # single or double quotes, e.g.:
797
+ #
798
+ # 'expr'
799
+ # "expr"
800
+ #
801
+ # This notation works the same as it does in Ruby; i.e. strings in double
802
+ # quotes may contain escape sequences while strings in single quotes may not.
803
+ # In order to specify that a string should ignore case when matching, enclose
804
+ # it in backticks instead of single or double quotes, e.g.:
805
+ #
806
+ # `expr`
807
+ #
808
+ # Besides case sensitivity, case-insensitive strings have the same semantics
809
+ # as double-quoted strings.
810
+ class StringTerminal < Terminal
811
+ # The +flags+ will be passed directly to Regexp#new.
812
+ def initialize(rule='', flags=0)
813
+ super(Regexp.new(Regexp.escape(rule), flags))
814
+ @string = rule
815
+ end
816
+
817
+ def ==(other)
818
+ case other
819
+ when String
820
+ @string == other
821
+ else
822
+ super
823
+ end
824
+ end
825
+
826
+ # Returns the Citrus notation of this rule as a string.
827
+ def to_citrus # :nodoc:
828
+ if case_sensitive?
829
+ @string.inspect
830
+ else
831
+ @string.inspect.gsub(/^"|"$/, '`')
832
+ end
788
833
  end
789
834
  end
790
835
 
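A minimal sketch, not part of the diff, of how the `StringTerminal` added in this hunk appears to build its regexp and render its Citrus notation. It uses only Ruby's standard library. The helper names `build_regexp` and `citrus_notation` are hypothetical stand-ins for `StringTerminal#initialize` and `StringTerminal#to_citrus`, and a plain `StringScanner` stands in for the Citrus input object.

```ruby
require 'strscan'

# Hypothetical stand-in for StringTerminal#initialize: escape the string so
# that regexp metacharacters match literally, then apply the given flags.
def build_regexp(string, flags = 0)
  Regexp.new(Regexp.escape(string), flags)
end

# Hypothetical stand-in for StringTerminal#to_citrus: case-sensitive strings
# keep their double quotes, case-insensitive ones are wrapped in backticks.
def citrus_notation(string, regexp)
  if regexp.casefold?
    string.inspect.gsub(/^"|"$/, '`')
  else
    string.inspect
  end
end

sensitive   = build_regexp('a.c')
insensitive = build_regexp('a.c', Regexp::IGNORECASE)

# scan_full(regexp, false, false) reports the match length at the current
# position without advancing the scanner; Terminal#exec pushes that length
# onto the event stream after the rule and the CLOSE marker.
scanner = StringScanner.new('A.C rest')
p scanner.scan_full(sensitive, false, false)    # no match at position 0
p scanner.scan_full(insensitive, false, false)  # length of "A.C"

puts citrus_notation('a.c', sensitive)
puts citrus_notation('a.c', insensitive)
```

Because the string passes through `Regexp.escape`, the dot in `'a.c'` matches only a literal dot, so a string terminal never behaves like a pattern.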
@@ -796,8 +841,7 @@ module Citrus
796
841
  include Rule
797
842
 
798
843
  def initialize(rules=[])
799
- super
800
- @rules = rules.map {|r| Rule.new(r) }
844
+ @rules = rules.map {|r| Rule.for(r) }
801
845
  end
802
846
 
803
847
  # An array of the actual Rule objects this rule uses to match.
@@ -809,8 +853,13 @@ module Citrus
809
853
  end
810
854
  end
811
855
 
812
- # A Predicate is a Nonterminal that contains one other rule.
813
- module Predicate
856
+ # An AndPredicate is a Nonterminal that contains a rule that must match. Upon
857
+ # success an empty match is returned and no input is consumed. The Citrus
858
+ # notation is any expression preceded by an ampersand, e.g.:
859
+ #
860
+ # &expr
861
+ #
862
+ class AndPredicate
814
863
  include Nonterminal
815
864
 
816
865
  def initialize(rule='')
@@ -821,145 +870,108 @@ module Citrus
821
870
  def rule
822
871
  rules[0]
823
872
  end
824
- end
825
-
826
- # An AndPredicate is a Predicate that contains a rule that must match. Upon
827
- # success an empty match is returned and no input is consumed. The Citrus
828
- # notation is any expression preceded by an ampersand, e.g.:
829
- #
830
- # &expr
831
- #
832
- class AndPredicate
833
- include Predicate
834
873
 
835
874
  # Returns an array of events for this rule on the given +input+.
836
875
  def exec(input, events=[])
837
876
  if input.test(rule)
838
- events << id
877
+ events << self
839
878
  events << CLOSE
840
879
  events << 0
841
880
  end
881
+
842
882
  events
843
883
  end
844
884
 
845
885
  # Returns the Citrus notation of this rule as a string.
846
- def to_s
847
- '&' + rule.embed
886
+ def to_citrus # :nodoc:
887
+ '&' + rule.to_embedded_s
848
888
  end
849
889
  end
850
890
 
851
- # A NotPredicate is a Predicate that contains a rule that must not match. Upon
852
- # success an empty match is returned and no input is consumed. The Citrus
891
+ # A NotPredicate is a Nonterminal that contains a rule that must not match.
892
+ # Upon success an empty match is returned and no input is consumed. The Citrus
853
893
  # notation is any expression preceded by an exclamation mark, e.g.:
854
894
  #
855
895
  # !expr
856
896
  #
857
897
  class NotPredicate
858
- include Predicate
898
+ include Nonterminal
899
+
900
+ def initialize(rule='')
901
+ super([rule])
902
+ end
903
+
904
+ # Returns the Rule object this rule uses to match.
905
+ def rule
906
+ rules[0]
907
+ end
859
908
 
860
909
  # Returns an array of events for this rule on the given +input+.
861
910
  def exec(input, events=[])
862
911
  unless input.test(rule)
863
- events << id
912
+ events << self
864
913
  events << CLOSE
865
914
  events << 0
866
915
  end
916
+
867
917
  events
868
918
  end
869
919
 
870
920
  # Returns the Citrus notation of this rule as a string.
871
- def to_s
872
- '!' + rule.embed
921
+ def to_citrus # :nodoc:
922
+ '!' + rule.to_embedded_s
873
923
  end
874
924
  end
875
925
 
876
- # A ButPredicate is a Predicate that consumes all characters until its rule
926
+ # A ButPredicate is a Nonterminal that consumes all characters until its rule
877
927
  # matches. It must match at least one character in order to succeed. The
878
928
  # Citrus notation is any expression preceded by a tilde, e.g.:
879
929
  #
880
930
  # ~expr
881
931
  #
882
932
  class ButPredicate
883
- include Predicate
933
+ include Nonterminal
884
934
 
885
- DOT_RULE = Rule.new(DOT)
935
+ DOT_RULE = Rule.for(DOT)
936
+
937
+ def initialize(rule='')
938
+ super([rule])
939
+ end
940
+
941
+ # Returns the Rule object this rule uses to match.
942
+ def rule
943
+ rules[0]
944
+ end
886
945
 
887
946
  # Returns an array of events for this rule on the given +input+.
888
947
  def exec(input, events=[])
889
948
  length = 0
949
+
890
950
  until input.test(rule)
891
951
  len = input.exec(DOT_RULE)[-1]
892
952
  break unless len
893
953
  length += len
894
954
  end
955
+
895
956
  if length > 0
896
- events << id
957
+ events << self
897
958
  events << CLOSE
898
959
  events << length
899
960
  end
900
- events
901
- end
902
-
903
- # Returns the Citrus notation of this rule as a string.
904
- def to_s
905
- '~' + rule.embed
906
- end
907
- end
908
-
909
- # A Label is a Predicate that applies a new name to any matches made by its
910
- # rule. The Citrus notation is any sequence of word characters (i.e.
911
- # <tt>[a-zA-Z0-9_]</tt>) followed by a colon, followed by any other
912
- # expression, e.g.:
913
- #
914
- # label:expr
915
- #
916
- class Label
917
- include Predicate
918
-
919
- def initialize(rule='', label='<label>')
920
- super(rule)
921
- self.label = label
922
- end
923
-
924
- # Sets the name of this label.
925
- def label=(label)
926
- @label = label.to_sym
927
- end
928
-
929
- # The label this rule adds to all its matches.
930
- attr_reader :label
931
-
932
- # Returns an array of events for this rule on the given +input+.
933
- def exec(input, events=[])
934
- events << id
935
-
936
- index = events.size
937
- start = index - 1
938
- if input.exec(rule, events).size > index
939
- events << CLOSE
940
- events << events[-2]
941
- else
942
- events.slice!(start, events.size)
943
- end
944
961
 
945
962
  events
946
963
  end
947
964
 
948
965
  # Returns the Citrus notation of this rule as a string.
949
- def to_s
950
- label.to_s + ':' + rule.embed
951
- end
952
-
953
- def extend_match(match) # :nodoc:
954
- match.names << label
955
- super
966
+ def to_citrus # :nodoc:
967
+ '~' + rule.to_embedded_s
956
968
  end
957
969
  end
958
970
 
959
- # A Repeat is a Predicate that specifies a minimum and maximum number of times
960
- # its rule must match. The Citrus notation is an integer, +N+, followed by an
961
- # asterisk, followed by another integer, +M+, all of which follow any other
962
- # expression, e.g.:
971
+ # A Repeat is a Nonterminal that specifies a minimum and maximum number of
972
+ # times its rule must match. The Citrus notation is an integer, +N+, followed
973
+ # by an asterisk, followed by another integer, +M+, all of which follow any
974
+ # other expression, e.g.:
963
975
  #
964
976
  # expr N*M
965
977
  #
@@ -976,22 +988,29 @@ module Citrus
976
988
  # expr?
977
989
  #
978
990
  class Repeat
979
- include Predicate
991
+ include Nonterminal
980
992
 
981
993
  def initialize(rule='', min=1, max=Infinity)
982
994
  raise ArgumentError, "Min cannot be greater than max" if min > max
983
- super(rule)
995
+ super([rule])
984
996
  @range = Range.new(min, max)
985
997
  end
986
998
 
999
+ # Returns the Rule object this rule uses to match.
1000
+ def rule
1001
+ rules[0]
1002
+ end
1003
+
987
1004
  # Returns an array of events for this rule on the given +input+.
988
1005
  def exec(input, events=[])
989
- events << id
1006
+ events << self
990
1007
 
991
1008
  index = events.size
992
1009
  start = index - 1
993
1010
  length = n = 0
994
- while n < max && input.exec(rule, events).size > index
1011
+ m = max
1012
+
1013
+ while n < m && input.exec(rule, events).size > index
995
1014
  index = events.size
996
1015
  length += events[-1]
997
1016
  n += 1
@@ -1030,44 +1049,37 @@ module Citrus
1030
1049
  end
1031
1050
 
1032
1051
  # Returns the Citrus notation of this rule as a string.
1033
- def to_s
1034
- rule.embed + operator
1035
- end
1036
- end
1037
-
1038
- # A List is a Nonterminal that contains any number of other rules and tests
1039
- # them for matches in sequential order.
1040
- module List
1041
- include Nonterminal
1042
-
1043
- # See Rule#paren?.
1044
- def paren?
1045
- rules.length > 1
1052
+ def to_citrus # :nodoc:
1053
+ rule.to_embedded_s + operator
1046
1054
  end
1047
1055
  end
1048
1056
 
1049
- # A Choice is a List where only one rule must match. The Citrus notation is
1050
- # two or more expressions separated by a vertical bar, e.g.:
1057
+ # A Sequence is a Nonterminal where all rules must match. The Citrus notation
1058
+ # is two or more expressions separated by a space, e.g.:
1051
1059
  #
1052
- # expr | expr
1060
+ # expr expr
1053
1061
  #
1054
- class Choice
1055
- include List
1062
+ class Sequence
1063
+ include Nonterminal
1056
1064
 
1057
1065
  # Returns an array of events for this rule on the given +input+.
1058
1066
  def exec(input, events=[])
1059
- events << id
1067
+ events << self
1060
1068
 
1061
1069
  index = events.size
1062
1070
  start = index - 1
1063
- n = 0
1064
- while n < rules.length && input.exec(rules[n], events).size == index
1071
+ length = n = 0
1072
+ m = rules.length
1073
+
1074
+ while n < m && input.exec(rules[n], events).size > index
1075
+ index = events.size
1076
+ length += events[-1]
1065
1077
  n += 1
1066
1078
  end
1067
1079
 
1068
- if index < events.size
1080
+ if n == rules.length
1069
1081
  events << CLOSE
1070
- events << events[-2]
1082
+ events << length
1071
1083
  else
1072
1084
  events.slice!(start, events.size)
1073
1085
  end
@@ -1076,181 +1088,272 @@ module Citrus
1076
1088
  end
1077
1089
 
1078
1090
  # Returns the Citrus notation of this rule as a string.
1079
- def to_s
1080
- rules.map {|r| r.embed }.join(' | ')
1091
+ def to_citrus # :nodoc:
1092
+ rules.map {|r| r.to_embedded_s }.join(' ')
1081
1093
  end
1082
1094
  end
1083
1095
 
1084
- # A Sequence is a List where all rules must match. The Citrus notation is two
1085
- # or more expressions separated by a space, e.g.:
1096
+ # A Choice is a Nonterminal where only one rule must match. The Citrus
1097
+ # notation is two or more expressions separated by a vertical bar, e.g.:
1086
1098
  #
1087
- # expr expr
1099
+ # expr | expr
1088
1100
  #
1089
- class Sequence
1090
- include List
1101
+ class Choice
1102
+ include Nonterminal
1091
1103
 
1092
1104
  # Returns an array of events for this rule on the given +input+.
1093
1105
  def exec(input, events=[])
1094
- events << id
1106
+ events << self
1095
1107
 
1096
1108
  index = events.size
1097
- start = index - 1
1098
- length = n = 0
1099
- while n < rules.length && input.exec(rules[n], events).size > index
1100
- index = events.size
1101
- length += events[-1]
1109
+ n = 0
1110
+ m = rules.length
1111
+
1112
+ while n < m && input.exec(rules[n], events).size == index
1102
1113
  n += 1
1103
1114
  end
1104
1115
 
1105
- if n == rules.length
1116
+ if index < events.size
1106
1117
  events << CLOSE
1107
- events << length
1118
+ events << events[-2]
1108
1119
  else
1109
- events.slice!(start, events.size)
1120
+ events.pop
1110
1121
  end
1111
1122
 
1112
1123
  events
1113
1124
  end
1114
1125
 
1126
+ # Returns +true+ if this rule should extend a match but should not appear in
1127
+ # its event stream.
1128
+ def elide? # :nodoc:
1129
+ true
1130
+ end
1131
+
1115
1132
  # Returns the Citrus notation of this rule as a string.
1116
- def to_s
1117
- rules.map {|r| r.embed }.join(' ')
1133
+ def to_citrus # :nodoc:
1134
+ rules.map {|r| r.to_embedded_s }.join(' | ')
1118
1135
  end
1119
1136
  end
1120
1137
 
1121
1138
  # The base class for all matches. Matches are organized into a tree where any
1122
- # match may contain any number of other matches. This class provides several
1123
- # convenient tree traversal methods that help when examining parse results.
1124
- class Match < String
1139
+ # match may contain any number of other matches. Nodes of the tree are lazily
1140
+ # instantiated as needed. This class provides several convenient tree
1141
+ # traversal methods that help when examining and interpreting parse results.
1142
+ class Match
1125
1143
  def initialize(string, events=[])
1126
- raise ArgumentError, "Invalid events for match length %d" %
1127
- string.length if events[-1] && string.length != events[-1]
1144
+ @string = string
1128
1145
 
1129
- super(string)
1130
- @events = events
1146
+ if events.length > 0
1147
+ if events[-1] != string.length
1148
+ raise ArgumentError, "Invalid events for length #{string.length}"
1149
+ end
1131
1150
 
1132
- extend!
1133
- end
1151
+ elisions = []
1134
1152
 
1135
- # The array of events that was passed to the constructor.
1136
- attr_reader :events
1153
+ while events[0].elide?
1154
+ elisions.unshift(events.shift)
1155
+ events = events.slice(0, events.length - 2)
1156
+ end
1137
1157
 
1138
- # An array of all names of this match. A name is added to a match object
1139
- # for each rule that returns that object when matching. These names can then
1140
- # be used to determine which rules were satisfied by a given match.
1141
- def names
1142
- @names ||= []
1143
- end
1158
+ events[0].extend_match(self)
1144
1159
 
1145
- # The name of the lowest level rule that originally created this match.
1146
- def name
1147
- names.first
1148
- end
1160
+ elisions.each do |rule|
1161
+ rule.extend_match(self)
1162
+ end
1163
+ end
1149
1164
 
1150
- # Returns +true+ if this match has the given +name+.
1151
- def has_name?(name)
1152
- names.include?(name.to_sym)
1165
+ @events = events
1153
1166
  end
1154
1167
 
1155
- # Returns an array of all Rule objects that extend this match.
1156
- def extenders
1157
- @extenders ||= begin
1158
- extenders = []
1159
- @events.each do |event|
1160
- break if event == CLOSE
1161
- rule = Rule[event]
1162
- extenders.unshift(rule)
1163
- break unless rule.propagates_extensions?
1164
- end
1165
- extenders
1166
- end
1168
+ # The array of events for this match.
1169
+ attr_reader :events
1170
+
1171
+ # Returns the length of this match.
1172
+ def length
1173
+ @string.length
1167
1174
  end
1168
1175
 
1169
- # Returns an array of Match objects that are submatches of this match in the
1170
- # order they appeared in the input.
1171
- def matches
1172
- @matches ||= begin
1173
- matches = []
1176
+ # Returns a hash of capture names to arrays of matches with that name,
1177
+ # in the order they appeared in the input.
1178
+ def captures
1179
+ @captures ||= begin
1180
+ captures = {}
1174
1181
  stack = []
1175
1182
  offset = 0
1176
1183
  close = false
1177
1184
  index = 0
1185
+ last_length = nil
1186
+ in_proxy = false
1187
+ count = 0
1178
1188
 
1179
1189
  while index < @events.size
1180
1190
  event = @events[index]
1191
+
1181
1192
  if close
1182
1193
  start = stack.pop
1183
- if stack.size == extenders.size
1184
- matches << Match.new(slice(offset, event), @events[start..index])
1185
- offset += event
1194
+
1195
+ if Rule === start
1196
+ rule = start
1197
+ os = stack.pop
1198
+ start = stack.pop
1199
+
1200
+ match = Match.new(@string.slice(os, event), @events[start..index])
1201
+
1202
+ # We can lookup immediate submatches by their index.
1203
+ if stack.size == 1
1204
+ captures[count] = match
1205
+ count += 1
1206
+ end
1207
+
1208
+ # We can lookup matches that were created by proxy by the name of
1209
+ # the rule they are proxy for.
1210
+ if Proxy === rule
1211
+ if captures[rule.rule_name]
1212
+ captures[rule.rule_name] << match
1213
+ else
1214
+ captures[rule.rule_name] = [match]
1215
+ end
1216
+ end
1217
+
1218
+ # We can lookup matches that were created by rules with labels by
1219
+ # that label.
1220
+ if rule.label
1221
+ if captures[rule.label]
1222
+ captures[rule.label] << match
1223
+ else
1224
+ captures[rule.label] = [match]
1225
+ end
1226
+ end
1227
+
1228
+ in_proxy = false
1186
1229
  end
1230
+
1231
+ unless last_length
1232
+ last_length = event
1233
+ end
1234
+
1187
1235
  close = false
1188
1236
  elsif event == CLOSE
1189
1237
  close = true
1190
1238
  else
1191
1239
  stack << index
1240
+
1241
+ # We can calculate the offset of this rule event by adding back the
1242
+ # last match length.
1243
+ if last_length
1244
+ offset += last_length
1245
+ last_length = nil
1246
+ end
1247
+
1248
+ # We should not create captures when traversing the portion of the
1249
+ # event stream that is masked by a proxy in the original rule
1250
+ # definition.
1251
+ unless in_proxy || stack.size == 1
1252
+ stack << offset
1253
+ stack << event
1254
+ in_proxy = true if Proxy === event
1255
+ end
1192
1256
  end
1257
+
1193
1258
  index += 1
1194
1259
  end
1195
1260
 
1196
- matches
1261
+ captures
1197
1262
  end
1198
1263
  end
1199
1264
 
1200
- # Returns an array of all sub-matches with the given +name+. If +deep+ is
1201
- # +false+, returns only sub-matches that are immediate descendants of this
1202
- # match.
1203
- def find(name, deep=true)
1204
- ms = matches.select {|m| m.has_name?(name) }
1205
- matches.each {|m| ms.concat(m.find(name, deep)) } if deep
1206
- ms
1265
+ # Returns an array of all immediate submatches of this match.
1266
+ def matches
1267
+ @matches ||= (0...captures.size).map {|n| captures[n] }.compact
1207
1268
  end
1208
1269
 
1209
- # A shortcut for retrieving the first immediate sub-match of this match. If
1210
- # +name+ is given, attempts to retrieve the first immediate sub-match named
1211
- # +name+.
1212
- def first(name=nil)
1213
- name ? find(name, false).first : matches.first
1270
+ # A shortcut for retrieving the first immediate submatch of this match.
1271
+ def first
1272
+ captures[0]
1214
1273
  end
1215
1274
 
1216
1275
  # The default value for a match is its string value. This method is
1217
1276
  # overridden in most cases to be more meaningful according to the desired
1218
1277
  # interpretation.
1219
- alias value to_s
1220
-
1221
- # Allows sub-matches of this match to be retrieved by name as instance
1222
- # methods.
1223
- def method_missing(sym, *args)
1224
- if sym == :to_ary
1225
- # This is a workaround for a bug in Ruby 1.9 with classes that
1226
- # extend String.
1227
- super
1278
+ alias_method :value, :to_s
1279
+
1280
+ # Allows methods of this match's string to be called directly and provides
1281
+ # a convenient interface for retrieving the first match with a given name.
1282
+ def method_missing(sym, *args, &block)
1283
+ if @string.respond_to?(sym)
1284
+ @string.__send__(sym, *args, &block)
1228
1285
  else
1229
- first(sym) or raise NoMatchError, 'No match named "%s" in %s (%s)' %
1230
- [sym, self, name]
1286
+ captures[sym].first if captures[sym]
1231
1287
  end
1232
1288
  end
1233
1289
 
1234
- # Returns a string representation of this match that displays the entire
1235
- # match tree for easy viewing in the console.
1236
- def dump
1237
- dump_lines.join("\n")
1290
+ def to_s
1291
+ @string
1238
1292
  end
1239
1293
 
1240
- def dump_lines(indent=' ') # :nodoc:
1241
- line = to_s.inspect
1242
- line << ' (%s)' % names.join(',') unless names.empty?
1243
- matches.inject([line]) do |lines, m|
1244
- lines.concat(m.dump_lines(indent).map {|line| indent + line })
1294
+ alias_method :to_str, :to_s
1295
+
1296
+ def ==(other)
1297
+ case other
1298
+ when String
1299
+ @string == other
1300
+ when Match
1301
+ @string == other.to_s
1302
+ else
1303
+ super
1245
1304
  end
1246
1305
  end
1247
1306
 
1248
- private
1307
+ alias_method :eql?, :==
1249
1308
 
1250
- def extend! # :nodoc:
1251
- extenders.each do |rule|
1252
- rule.extend_match(self)
1309
+ def inspect
1310
+ @string.inspect
1311
+ end
1312
+
1313
+ # Prints the entire subtree of this match using the given +indent+ to
1314
+ # indicate nested match levels. Useful for debugging.
1315
+ def dump(indent=' ')
1316
+ lines = []
1317
+ stack = []
1318
+ offset = 0
1319
+ close = false
1320
+ index = 0
1321
+ last_length = nil
1322
+
1323
+ while index < @events.size
1324
+ event = @events[index]
1325
+
1326
+ if close
1327
+ os = stack.pop
1328
+ start = stack.pop
1329
+ rule = stack.pop
1330
+
1331
+ space = indent * (stack.size / 3)
1332
+ string = @string.slice(os, event)
1333
+ lines[start] = "#{space}#{string.inspect} rule=#{rule}, offset=#{os}, length=#{event}"
1334
+
1335
+ unless last_length
1336
+ last_length = event
1337
+ end
1338
+
1339
+ close = false
1340
+ elsif event == CLOSE
1341
+ close = true
1342
+ else
1343
+ if last_length
1344
+ offset += last_length
1345
+ last_length = nil
1346
+ end
1347
+
1348
+ stack << event
1349
+ stack << index
1350
+ stack << offset
1351
+ end
1352
+
1353
+ index += 1
1253
1354
  end
1355
+
1356
+ puts lines.compact.join("\n")
1254
1357
  end
1255
1358
  end
1256
1359
  end
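A minimal sketch, not part of the diff, of how a flat event stream like the one `Match#dump` walks above can be turned back into nested matches with offsets. It assumes each match is encoded as `rule, ...child events..., CLOSE, length`, as the `exec` methods in this diff suggest. The value `CLOSE = -1`, the `describe` helper, and the use of symbols in place of Rule objects are assumptions made for illustration.

```ruby
# Assumed sentinel value; the diff only shows that a CLOSE marker exists.
CLOSE = -1

# Hypothetical helper that mirrors the bookkeeping in Match#dump. Returns one
# [depth, rule, offset, text] entry per match, in the order rules were opened.
def describe(string, events)
  lines = []
  stack = []
  offset = 0
  close = false
  last_length = nil

  events.each_with_index do |event, index|
    if close
      # The event after CLOSE is the length of the match being closed.
      os    = stack.pop
      start = stack.pop
      rule  = stack.pop
      lines[start] = [stack.size / 3, rule, os, string.slice(os, event)]
      last_length ||= event
      close = false
    elsif event == CLOSE
      close = true
    else
      # A new rule opens: advance the offset past the last closed match.
      if last_length
        offset += last_length
        last_length = nil
      end
      stack.push(event, index, offset)
    end
  end

  lines.compact
end

# A sequence matching "abc" as "a" followed by "bc".
events = [:seq, :a, CLOSE, 1, :bc, CLOSE, 2, CLOSE, 3]

describe('abc', events).each do |depth, rule, os, text|
  puts "#{' ' * depth}#{text.inspect} rule=#{rule}, offset=#{os}, length=#{text.length}"
end
```

Keeping the stream flat means a rule only appends to one array while matching, and the nested structure is rebuilt later, which fits the "lazily instantiated" wording in the new `Match` comment.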