citrus 2.1.2 → 2.2.0

data/citrus.gemspec CHANGED
@@ -25,7 +25,6 @@ Gem::Specification.new do |s|
25
25
 
26
26
  s.test_files = s.files.select {|path| path =~ /^test\/.*_test.rb/ }
27
27
 
28
- s.add_dependency('builder')
29
28
  s.add_development_dependency('rake')
30
29
 
31
30
  s.has_rdoc = true
@@ -0,0 +1,16 @@
1
+ # Extras
2
+
3
+
4
+ Several files are included in the Citrus repository that make it easier to work
5
+ with grammar files in various editors.
6
+
7
+ ## TextMate
8
+
9
+ To install the Citrus [TextMate](http://macromates.com/) bundle, simply
10
+ double-click on the `Citrus.tmbundle` file in the `extras` directory.
11
+
12
+ ## Vim
13
+
14
+ To install the [Vim](http://www.vim.org/) scripts, copy the files in
15
+ `extras/vim` to a directory in Vim's
16
+ [runtimepath](http://vimdoc.sourceforge.net/htmldoc/options.html#\'runtimepath\').
data/doc/syntax.markdown CHANGED
@@ -10,46 +10,57 @@ already be familiar to Ruby programmers.
10
10
  Terminals may be represented by a string or a regular expression. Both follow
11
11
  the same rules as Ruby string and regular expression literals.
12
12
 
13
- 'abc'
14
- "abc\n"
15
- /\xFF/
13
+ 'abc' # match "abc"
14
+ "abc\n" # match "abc\n"
15
+ /abc/i # match "abc" in any case
16
+ /\xFF/ # match "\xFF"
16
17
 
17
18
  Character classes and the dot (match anything) symbol are supported as well for
18
19
  compatibility with other parsing expression implementations.
19
20
 
20
21
  [a-z0-9] # match any lowercase letter or digit
21
22
  [\x00-\xFF] # match any octet
22
- . # match anything, even new lines
23
+ . # match any single character, including new lines
23
24
 
24
- See [Terminal](api/classes/Citrus/Terminal.html) for more information.
25
+ Also, strings may use backticks instead of quotes to indicate that they should
26
+ match in a case-insensitive manner.
27
+
28
+ `abc` # match "abc" in any case
29
+
30
+ See [Terminal](api/classes/Citrus/Terminal.html) and
31
+ [StringTerminal](api/classes/Citrus/StringTerminal.html) for more information.
25
32
 
26
33
  ## Repetition
27
34
 
28
35
  Quantifiers may be used after any expression to specify a number of times it
29
- must match. The universal form of a quantifier is N*M where N is the minimum and
30
- M is the maximum number of times the expression may match.
36
+ must match. The universal form of a quantifier is `N*M` where `N` is the minimum
37
+ and `M` is the maximum number of times the expression may match.
31
38
 
32
- 'abc'1*2 # match "abc" a minimum of one, maximum
33
- # of two times
39
+ 'abc'1*2 # match "abc" a minimum of one, maximum of two times
34
40
  'abc'1* # match "abc" at least once
35
41
  'abc'*2 # match "abc" a maximum of twice
36
42
 
37
- The + and ? operators are supported as well for the common cases of 1* and *1
38
- respectively.
43
+ Additionally, the minimum and maximum may be omitted entirely to specify that an
44
+ expression may match zero or more times.
45
+
46
+ 'abc'* # match "abc" zero or more times
47
+
48
+ The `+` and `?` operators are supported as well for the common cases of `1*` and
49
+ `*1` respectively.
39
50
 
40
- 'abc'+ # match "abc" at least once
41
- 'abc'? # match "abc" a maximum of once
51
+ 'abc'+ # match "abc" one or more times
52
+ 'abc'? # match "abc" zero or one time
42
53
 
43
54
  See [Repeat](api/classes/Citrus/Repeat.html) for more information.
44
55
 
45
56
  ## Lookahead
46
57
 
47
- Both positive and negative lookahead are supported in Citrus. Use the & and !
48
- operators to indicate that an expression either should or should not match. In
49
- neither case is any input consumed.
58
+ Both positive and negative lookahead are supported in Citrus. Use the `&` and
59
+ `!` operators to indicate that an expression either should or should not match.
60
+ In neither case is any input consumed.
50
61
 
51
62
  &'a' 'b' # match a "b" preceded by an "a"
52
- !'a' 'b' # match a "b" that is not preceded by an "a"
63
+ 'a' !'b' # match an "a" that is not followed by a "b"
53
64
  !'a' . # match any character except for "a"
54
65
 
55
66
  A special form of lookahead is also supported which will match any character
@@ -75,20 +86,17 @@ See [Sequence](api/classes/Citrus/Sequence.html) for more information.
75
86
  ## Choices
76
87
 
77
88
  Ordered choice is indicated by a vertical bar that separates two expressions.
78
- Note that any operator binds more tightly than the bar.
89
+ When using choice, each expression is tried in order. When one matches, the
90
+ rule returns the match immediately without trying the remaining rules.
79
91
 
80
92
  'a' | 'b' # match "a" or "b"
81
93
  'a' 'b' | 'c' # match "a" then "b" (in sequence), or "c"
82
94
 
83
- See [Choice](api/classes/Citrus/Choice.html) for more information.
84
-
85
- ## Super
86
-
87
- When including a grammar inside another, all rules in the child that have the
88
- same name as a rule in the parent also have access to the "super" keyword to
89
- invoke the parent rule.
95
+ It is important to note when using ordered choice that any operator binds more
96
+ tightly than the vertical bar. A full chart of operators and their respective
97
+ levels of precedence is below.
90
98
 
91
- See [Super](api/classes/Citrus/Super.html) for more information.
99
+ See [Choice](api/classes/Citrus/Choice.html) for more information.
92
100
 
93
101
  ## Labels
94
102
 
@@ -96,12 +104,50 @@ Match objects may be referred to by a different name than the rule that
96
104
  originally generated them. Labels are created by placing the label and a colon
97
105
  immediately preceding any expression.
98
106
 
99
- chars:/[a-z]+/ # the characters matched by the regular
100
- # expression may be referred to as "chars"
101
- # in a block method
107
+ chars:/[a-z]+/ # the characters matched by the regular expression
108
+ # may be referred to as "chars" in an extension
109
+ # method
102
110
 
103
111
  See [Label](api/classes/Citrus/Label.html) for more information.
104
112
 
113
+ ## Grouping
114
+
115
+ As is common in many programming languages, parentheses may be used to override
116
+ the normal binding order of operators.
117
+
118
+ 'a' ('b' | 'c') # match "a", then "b" or "c"
119
+
120
+ ## Extensions
121
+
122
+ Extensions may be specified using either "module" or "block" syntax. When using
123
+ module syntax, specify the name of a module that is used to extend match objects
124
+ in between less than and greater than symbols.
125
+
126
+ [a-z0-9]5*9 <CouponCode> # match a string that consists of any lower
127
+ # cased letter or digit between 5 and 9
128
+ # times and extend the match with the
129
+ # CouponCode module
130
+
131
+ Additionally, extensions may be specified inline using curly braces. Inside the
132
+ curly braces you may embed method definitions that will be used to extend match
133
+ objects.
134
+
135
+ # match any digit and return its integer value when calling the
136
+ # #value method on the match object
137
+ [0-9] {
138
+ def value
139
+ to_i
140
+ end
141
+ }
142
+
143
+ ## Super
144
+
145
+ When including a grammar inside another, all rules in the child that have the
146
+ same name as a rule in the parent also have access to the `super` keyword to
147
+ invoke the parent rule.
148
+
149
+ See [Super](api/classes/Citrus/Super.html) for more information.
150
+
105
151
  ## Precedence
106
152
 
107
153
  The following table contains a list of all Citrus symbols and operators and
@@ -111,6 +157,7 @@ Operator | Name | Precedence
111
157
  ------------------------- | ------------------------- | ----------
112
158
  `''` | String (single quoted) | 6
113
159
  `""` | String (double quoted) | 6
160
+ <code>``</code> | String (case insensitive) | 6
114
161
  `[]` | Character class | 6
115
162
  `.` | Dot (any character) | 6
116
163
  `//` | Regular expression | 6
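Note on the Repetition changes in doc/syntax.markdown: the `N*M` quantifier forms read much like bounded repetition in Ruby regular expressions. The sketch below uses plain `Regexp` only to illustrate the repetition counts; it is not how Citrus implements Repeat. The method names `quantifier_equivalents` and `matches?` are hypothetical, and anchoring to the whole input is a choice made for this sketch.

```ruby
# Plain-Ruby analogues of the Citrus quantifier forms. Illustration only:
# the patterns are anchored to the whole input to make the counts visible.
def quantifier_equivalents
  {
    "'abc'1*2" => /\A(?:abc){1,2}\z/, # minimum of one, maximum of two
    "'abc'1*"  => /\A(?:abc){1,}\z/,  # at least once (same as +)
    "'abc'*2"  => /\A(?:abc){0,2}\z/, # at most twice
    "'abc'*"   => /\A(?:abc)*\z/,     # zero or more times
    "'abc'?"   => /\A(?:abc)?\z/      # zero or one time (same as *1)
  }
end

# Hypothetical helper: does the whole input satisfy the given notation?
def matches?(notation, input)
  !!(quantifier_equivalents.fetch(notation) =~ input)
end

puts matches?("'abc'1*2", "abcabc")
puts matches?("'abc'1*2", "abcabcabc")
```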
data/doc/testing.markdown CHANGED
@@ -22,12 +22,11 @@ case that could be used to test that our grammar works properly.
22
22
  end
23
23
  end
24
24
 
25
- The key here is using the `root`
26
- [option](api/classes/Citrus/GrammarMethods.html#M000031) when performing the
27
- parse to specify the name of the rule at which the parse should start. In
28
- `test_number`, since `:number` was given the parse will start at that rule as if
29
- it were the root rule of the entire grammar. The ability to change the root rule
30
- on the fly like this enables easy unit testing of the entire grammar.
25
+ The key here is using the `:root` option when performing the parse to specify
26
+ the name of the rule at which the parse should start. In `test_number`, since
27
+ `:number` was given the parse will start at that rule as if it were the root
28
+ rule of the entire grammar. The ability to change the root rule on the fly like
29
+ this enables easy unit testing of the entire grammar.
31
30
 
32
31
  Also note that because match objects are themselves strings, assertions may be
33
32
  made to test equality of match objects with string values.
@@ -36,9 +35,9 @@ made to test equality of match objects with string values.
36
35
 
37
36
  When a parse fails, a [ParseError](api/classes/Citrus/ParseError.html) object is
38
37
  generated which provides a wealth of information about exactly where the parse
39
- failed. Using this object, you could possibly provide some useful feedback to
40
- the user about why the input was bad. The following code demonstrates one way
41
- to do this.
38
+ failed including the offset, line number, line text, and line offset. Using this
39
+ object, you could possibly provide some useful feedback to the user about why
40
+ the input was bad. The following code demonstrates one way to do this.
42
41
 
43
42
  def parse_some_stuff(stuff)
44
43
  match = StuffGrammar.parse(stuff)
@@ -47,14 +46,7 @@ to do this.
47
46
  [e.line_number, e.line_offset]
48
47
  end
49
48
 
50
- In addition to useful error objects, Citrus also includes a special file that
51
- should help grammar authors when debugging grammars. To get this extra
52
- functionality, simply `require 'citrus/debug'` instead of `require 'citrus'`
53
- when running your code.
54
-
55
- When debugging is enabled, you can visualize parse trees in the console as XML
56
- documents. This can help when determining which rules are generating which
57
- matches and how they are organized in the output. Also when debugging, each
58
- match object automatically records its offset in the original input, which can
59
- also be very helpful in keeping track of which offsets in the input generated
60
- which matches.
49
+ In addition to useful error objects, Citrus also includes a means of visualizing
50
+ match trees in the console via `Match#dump`. This can help when determining
51
+ which rules are generating which matches and how they are organized in the
52
+ match tree.
data/examples/calc.citrus CHANGED
@@ -5,7 +5,7 @@
5
5
  # An identical grammar that is written using pure Ruby can be found in calc.rb.
6
6
  grammar Calc
7
7
 
8
- ## Hierarchy
8
+ ## Hierarchical syntax
9
9
 
10
10
  rule term
11
11
  additive | factor
@@ -55,50 +55,51 @@ grammar Calc
55
55
  (lparen term rparen) { term.value }
56
56
  end
57
57
 
58
- ## Syntax
58
+ ## Lexical syntax
59
59
 
60
60
  rule number
61
61
  float | integer
62
62
  end
63
63
 
64
64
  rule float
65
- (digits '.' digits space) { strip.to_f }
65
+ (digits '.' digits space*) { strip.to_f }
66
66
  end
67
67
 
68
68
  rule integer
69
- (digits space) { strip.to_i }
69
+ (digits space*) { strip.to_i }
70
70
  end
71
71
 
72
72
  rule digits
73
+ # Numbers may contain underscores in Ruby.
73
74
  [0-9]+ ('_' [0-9]+)*
74
75
  end
75
76
 
76
77
  rule additive_operator
77
- (('+' | '-') space) { |a, b|
78
+ (('+' | '-') space*) { |a, b|
78
79
  a.send(strip, b)
79
80
  }
80
81
  end
81
82
 
82
83
  rule multiplicative_operator
83
- (('*' | '/' | '%') space) { |a, b|
84
+ (('*' | '/' | '%') space*) { |a, b|
84
85
  a.send(strip, b)
85
86
  }
86
87
  end
87
88
 
88
89
  rule exponential_operator
89
- ('**' space) { |a, b|
90
+ ('**' space*) { |a, b|
90
91
  a ** b
91
92
  }
92
93
  end
93
94
 
94
95
  rule unary_operator
95
- (('~' | '+' | '-') space) { |n|
96
+ (('~' | '+' | '-') space*) { |n|
96
97
  # Unary + and - require an @.
97
98
  n.send(strip == '~' ? strip : '%s@' % strip)
98
99
  }
99
100
  end
100
101
 
101
- rule lparen '(' space end
102
- rule rparen ')' space end
103
- rule space [ \t\n\r]* end
102
+ rule lparen '(' space* end
103
+ rule rparen ')' space* end
104
+ rule space [ \t\n\r] end
104
105
  end
data/examples/calc.rb CHANGED
@@ -8,7 +8,7 @@ require 'citrus'
8
8
  # found in calc.citrus.
9
9
  grammar :Calc do
10
10
 
11
- ## Hierarchy
11
+ ## Hierarchical syntax
12
12
 
13
13
  rule :term do
14
14
  any(:additive, :factor)
@@ -58,50 +58,51 @@ grammar :Calc do
58
58
  all(:lparen, :term, :rparen) { term.value }
59
59
  end
60
60
 
61
- ## Syntax
61
+ ## Lexical syntax
62
62
 
63
63
  rule :number do
64
64
  any(:float, :integer)
65
65
  end
66
66
 
67
67
  rule :float do
68
- all(:digits, '.', :digits, :space) { strip.to_f }
68
+ all(:digits, '.', :digits, zero_or_more(:space)) { strip.to_f }
69
69
  end
70
70
 
71
71
  rule :integer do
72
- all(:digits, :space) { strip.to_i }
72
+ all(:digits, zero_or_more(:space)) { strip.to_i }
73
73
  end
74
74
 
75
75
  rule :digits do
76
+ # Numbers may contain underscores in Ruby.
76
77
  /[0-9]+(?:_[0-9]+)*/
77
78
  end
78
79
 
79
80
  rule :additive_operator do
80
- all(any('+', '-'), :space) { |a, b|
81
+ all(any('+', '-'), zero_or_more(:space)) { |a, b|
81
82
  a.send(strip, b)
82
83
  }
83
84
  end
84
85
 
85
86
  rule :multiplicative_operator do
86
- all(any('*', '/', '%'), :space) { |a, b|
87
+ all(any('*', '/', '%'), zero_or_more(:space)) { |a, b|
87
88
  a.send(strip, b)
88
89
  }
89
90
  end
90
91
 
91
92
  rule :exponential_operator do
92
- all('**', :space) { |a, b|
93
+ all('**', zero_or_more(:space)) { |a, b|
93
94
  a ** b
94
95
  }
95
96
  end
96
97
 
97
98
  rule :unary_operator do
98
- all(any('~', '+', '-'), :space) { |n|
99
+ all(any('~', '+', '-'), zero_or_more(:space)) { |n|
99
100
  # Unary + and - require an @.
100
101
  n.send(strip == '~' ? strip : '%s@' % strip)
101
102
  }
102
103
  end
103
104
 
104
- rule :lparen, ['(', :space]
105
- rule :rparen, [')', :space]
106
- rule :space, /[ \t\n\r]*/
105
+ rule :lparen, ['(', zero_or_more(:space)]
106
+ rule :rparen, [')', zero_or_more(:space)]
107
+ rule :space, /[ \t\n\r]/
107
108
  end
data/lib/citrus.rb CHANGED
@@ -8,7 +8,7 @@ require 'strscan'
8
8
  module Citrus
9
9
  autoload :File, 'citrus/file'
10
10
 
11
- VERSION = [2, 1, 2]
11
+ VERSION = [2, 2, 0]
12
12
 
13
13
  # Returns the current version of Citrus as a string.
14
14
  def self.version
@@ -22,6 +22,8 @@ module Citrus
22
22
 
23
23
  F = ::File
24
24
 
25
+ CLOSE = -1
26
+
25
27
  # Loads the grammar from the given +file+ into the global scope using #eval.
26
28
  def self.load(file)
27
29
  file << '.citrus' unless F.file?(file)
@@ -40,26 +42,12 @@ module Citrus
40
42
  # Parses the given Citrus +code+ using the given +options+. Returns the
41
43
  # generated match tree. Raises a +SyntaxError+ if the parse fails.
42
44
  def self.parse(code, options={})
43
- begin
44
- File.parse(code, options)
45
- rescue ParseError => e
46
- raise SyntaxError.new(e)
47
- end
45
+ File.parse(code, options)
48
46
  end
49
47
 
50
48
  # A standard error class that all Citrus errors extend.
51
49
  class Error < RuntimeError; end
52
50
 
53
- # Raised when there is an error parsing Citrus code.
54
- class SyntaxError < Error
55
- # The +error+ given here should be a +ParseError+ object.
56
- def initialize(error)
57
- msg = "Syntax error on line %d at offset %d\n%s" %
58
- [error.line_number, error.line_offset, error.detail]
59
- super(msg)
60
- end
61
- end
62
-
63
51
  # Raised when a match cannot be found.
64
52
  class NoMatchError < Error; end
65
53
 
@@ -71,8 +59,8 @@ module Citrus
71
59
  @line_offset = input.line_offset(offset)
72
60
  @line_number = input.line_number(offset)
73
61
  @line = input.line(offset)
74
- msg = "Failed to parse input at offset %d\n" % offset
75
- msg << detail
62
+ msg = "Failed to parse input on line %d at offset %d\n%s" %
63
+ [line_number, line_offset, detail]
76
64
  super(msg)
77
65
  end
78
66
 
@@ -106,7 +94,7 @@ module Citrus
106
94
  @max_offset = 0
107
95
  end
108
96
 
109
- # The maximum offset that has been achieved during a parse.
97
+ # The maximum offset in the input that was successfully parsed.
110
98
  attr_reader :max_offset
111
99
 
112
100
  # A nested hash of rule id's to offsets and their respective matches. Only
@@ -116,11 +104,11 @@ module Citrus
116
104
  # The number of times the cache was hit. Only present if memoing is enabled.
117
105
  attr_reader :cache_hits
118
106
 
119
- # Resets all internal variables so that this object may be used in
120
- # another parse.
121
- def reset
122
- super
107
+ # Resets all internal variables so that this object may be used in another
108
+ # parse.
109
+ def reset # :nodoc:
123
110
  @max_offset = 0
111
+ super
124
112
  end
125
113
 
126
114
  # Returns the length of this input.
@@ -153,7 +141,7 @@ module Citrus
153
141
  # Returns the 0-based number of the line that contains the character at the
154
142
  # given +pos+. +pos+ defaults to the current pointer position.
155
143
  def line_index(pos=pos)
156
- p, n = 0, 0
144
+ p = n = 0
157
145
  each_line do |line|
158
146
  p += line.length
159
147
  return n if p >= pos
@@ -176,20 +164,29 @@ module Citrus
176
164
  lines[line_index(pos)]
177
165
  end
178
166
 
179
- # Returns the match for the given +rule+ at the current pointer position,
180
- # which is +nil+ if no match can be made.
181
- def match(rule)
182
- offset = pos
183
- match = rule.match(self)
167
+ # Returns an array of events for the given +rule+ at the current pointer
168
+ # position. Objects in this array may be one of three types: a rule id,
169
+ # Citrus::CLOSE, or a length.
170
+ def exec(rule, events=[])
171
+ start = pos
172
+ index = events.size
184
173
 
185
- if match
174
+ rule.exec(self, events)
175
+
176
+ if index < events.size
177
+ self.pos = start + events[-1]
186
178
  @max_offset = pos if pos > @max_offset
187
179
  else
188
- # Reset the position for the next attempt at a match.
189
- self.pos = offset unless match
180
+ self.pos = start
190
181
  end
191
182
 
192
- match
183
+ events
184
+ end
185
+
186
+ # Returns the length of a match for the given +rule+ at the current pointer
187
+ # position, +nil+ if none can be made.
188
+ def test(rule)
189
+ rule.exec(self)[-1]
193
190
  end
194
191
 
195
192
  # Returns +true+ when using memoization to cache match results.
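Note on the hunk above: `Input#match` is replaced by `Input#exec`, which fills a flat array of events (a rule id, `Citrus::CLOSE`, or a length) instead of returning a Match object. The sketch below reproduces that idea for a single terminal using only `StringScanner` from the standard library. `CLOSE_EVENT`, `terminal_exec`, and `input_exec` are hypothetical stand-ins, and the layout of events for nested rules is not shown in this diff, so it is not attempted here.

```ruby
require 'strscan'

CLOSE_EVENT = -1 # same value the diff assigns to Citrus::CLOSE

# Hypothetical stand-in for Terminal#exec: append id, CLOSE_EVENT, and the
# match length to +events+ when +regexp+ matches at the current position.
def terminal_exec(scanner, id, regexp, events = [])
  # With both flags false, scan_full neither advances the pointer nor
  # returns the matched string; it returns the match length or nil.
  length = scanner.scan_full(regexp, false, false)
  events.push(id, CLOSE_EVENT, length) if length
  events
end

# Hypothetical stand-in for Input#exec: if events were appended, advance by
# the trailing length; otherwise restore the starting position.
def input_exec(scanner, id, regexp, events = [])
  start = scanner.pos
  index = events.size
  terminal_exec(scanner, id, regexp, events)
  scanner.pos = index < events.size ? start + events[-1] : start
  events
end

scanner = StringScanner.new("abc123")
p input_exec(scanner, 1, /[a-z]+/)
```

Because a failed attempt appends nothing, a caller can detect failure by comparing the array size before and after the call, which is what the `index < events.size` check does.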
@@ -205,29 +202,31 @@ module Citrus
205
202
  def memoize!
206
203
  return if memoized?
207
204
 
205
+ @cache = {}
206
+ @cache_hits = 0
207
+
208
208
  # Using +instance_eval+ here preserves access to +super+ within the
209
209
  # methods we define inside the block.
210
210
  instance_eval do
211
- def match(rule) # :nodoc:
211
+ def exec(rule, events=[]) # :nodoc:
212
212
  c = @cache[rule.id] ||= {}
213
213
 
214
- if c.key?(pos)
214
+ e = if c[pos]
215
215
  @cache_hits += 1
216
216
  c[pos]
217
217
  else
218
- c[pos] = super
218
+ c[pos] = super(rule)
219
219
  end
220
+
221
+ events.concat(e)
220
222
  end
221
223
 
222
224
  def reset # :nodoc:
223
- super
224
- @cache = {}
225
+ @cache.clear
225
226
  @cache_hits = 0
227
+ super
226
228
  end
227
229
  end
228
-
229
- @cache = {}
230
- @cache_hits = 0
231
230
  end
232
231
  end
233
232
 
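Note on the memoization hunk above: the memoized `exec` caches the event array per rule id and position, then appends the cached events to the caller's array. The sketch below isolates that caching pattern in plain Ruby. `EventCache` and its `fetch` method are hypothetical names; only the nested-hash shape and the hit counter are taken from the diff.

```ruby
# Hypothetical memo table keyed by rule id, then by input position, in the
# same nested-hash shape as the @cache used in the diff.
class EventCache
  attr_reader :hits

  def initialize
    @cache = {}
    @hits = 0
  end

  # Returns +events+ with the cached entry for (rule_id, pos) appended,
  # computing the entry with the block on a miss.
  def fetch(rule_id, pos, events = [])
    c = @cache[rule_id] ||= {}
    e = if c[pos]
      @hits += 1
      c[pos]
    else
      c[pos] = yield
    end
    events.concat(e)
  end
end

cache = EventCache.new
cache.fetch(1, 0) { [1, -1, 3] } # miss: the block runs
cache.fetch(1, 0) { [1, -1, 3] } # hit: the block is skipped
puts cache.hits
```

An empty array is truthy in Ruby, so with a `c[pos]` check a failed attempt (no events) is cached and counted as a hit as well.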
@@ -266,6 +265,16 @@ module Citrus
266
265
  super
267
266
  end
268
267
 
268
+ # Parses the given +string+ using this grammar's root rule. Optionally, the
269
+ # name of a different rule may be given here as the value of the +:root+
270
+ # option. Otherwise, all options are the same as in Rule#parse.
271
+ def parse(string, options={})
272
+ rule_name = options.delete(:root) || root
273
+ rule = rule(rule_name)
274
+ raise 'No rule named "%s"' % rule_name unless rule
275
+ rule.parse(string, options)
276
+ end
277
+
269
278
  # Returns the name of this grammar as a string.
270
279
  def name
271
280
  super.to_s
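Note on the new `parse` method in the hunk above: the `:root` option is removed from the options hash and used to pick the starting rule, falling back to the grammar's root. The sketch below shows the same lookup with a plain hash of callables standing in for a grammar. `parse_with_root` and its arguments are hypothetical. Unlike the diff, the sketch copies the options first, since `Hash#delete` modifies the hash it is called on.

```ruby
# Hypothetical stand-in for a grammar: +rules+ maps rule names to callables.
def parse_with_root(rules, default_root, string, options = {})
  opts = options.dup # copy so the caller's hash keeps its :root key
  rule_name = opts.delete(:root) || default_root
  rule = rules[rule_name]
  raise 'No rule named "%s"' % rule_name unless rule
  rule.call(string, opts)
end

rules = {
  :number => lambda { |s, _opts| s.to_i },
  :word   => lambda { |s, _opts| s.upcase }
}

p parse_with_root(rules, :word, "ab")
p parse_with_root(rules, :word, "12", :root => :number)
```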
@@ -310,9 +319,9 @@ module Citrus
310
319
  # and returns it on success. Returns +nil+ on failure.
311
320
  def super_rule(name)
312
321
  sym = name.to_sym
313
- included_grammars.each do |g|
314
- r = g.rule(sym)
315
- return r if r
322
+ included_grammars.each do |grammar|
323
+ rule = grammar.rule(sym)
324
+ return rule if rule
316
325
  end
317
326
  nil
318
327
  end
@@ -433,48 +442,6 @@ module Citrus
433
442
  rule.extension = mod if mod
434
443
  rule
435
444
  end
436
-
437
- # Parses the given input +string+ using the given +options+. If no match can
438
- # be made, a ParseError is raised. See #default_parse_options for a detailed
439
- # description of available parse options.
440
- def parse(string, options={})
441
- opts = default_parse_options.merge(options)
442
- raise 'No root rule specified' unless opts[:root]
443
-
444
- root_rule = rule(opts[:root])
445
- raise 'No rule named "%s"' % root unless root_rule
446
-
447
- input = Input.new(string)
448
- input.memoize! if opts[:memoize]
449
- input.pos = opts[:offset] if opts[:offset] > 0
450
-
451
- match = input.match(root_rule)
452
- if match.nil? || (opts[:consume] && input.length != match.length)
453
- raise ParseError.new(input)
454
- end
455
-
456
- match
457
- end
458
-
459
- # The default set of options that is used in #parse. The options hash may
460
- # have any of the following keys:
461
- #
462
- # offset:: The offset at which the parse should start. Defaults to 0.
463
- # root:: The name of the root rule to use for the parse. Defaults
464
- # to the name supplied by calling #root.
465
- # memoize:: If this is +true+ the matches generated during a parse are
466
- # memoized. See Input#memoize! for more information. Defaults to
467
- # +false+.
468
- # consume:: If this is +true+ a ParseError will be raised during a parse
469
- # unless the entire input string is consumed. Defaults to
470
- # +false+.
471
- def default_parse_options
472
- { :offset => 0,
473
- :root => root,
474
- :memoize => false,
475
- :consume => false
476
- }
477
- end
478
445
  end
479
446
 
480
447
  # A Rule is an object that is used by a grammar to create matches on the
@@ -491,12 +458,13 @@ module Citrus
491
458
  # Returns a new Rule object depending on the type of object given.
492
459
  def self.new(obj)
493
460
  case obj
494
- when Rule then obj
495
- when Symbol then Alias.new(obj)
496
- when String, Regexp then Terminal.new(obj)
497
- when Array then Sequence.new(obj)
498
- when Range then Choice.new(obj.to_a)
499
- when Numeric then Terminal.new(obj.to_s)
461
+ when Rule then obj
462
+ when Symbol then Alias.new(obj)
463
+ when String then StringTerminal.new(obj)
464
+ when Regexp then Terminal.new(obj)
465
+ when Array then Sequence.new(obj)
466
+ when Range then Choice.new(obj.to_a)
467
+ when Numeric then StringTerminal.new(obj.to_s)
500
468
  else
501
469
  raise ArgumentError, "Invalid rule object: %s" % obj.inspect
502
470
  end
@@ -504,26 +472,44 @@ module Citrus
504
472
 
505
473
  @unique_id = 0
506
474
 
507
- # Generates a new rule id.
508
- def self.new_id
509
- @unique_id += 1
475
+ # A global registry for Rule objects. Keyed by rule id.
476
+ @rules = {}
477
+
478
+ # Adds the given +rule+ to the global registry and gives it an id.
479
+ def self.<<(rule) # :nodoc:
480
+ rule.id = (@unique_id += 1)
481
+ @rules[rule.id] = rule
510
482
  end
511
483
 
512
- # The grammar this rule belongs to.
513
- attr_accessor :grammar
484
+ # Returns the Rule object with the given +id+.
485
+ def self.[](id)
486
+ @rules[id]
487
+ end
514
488
 
515
- # An integer id that is unique to this rule.
516
- def id
517
- @id ||= Rule.new_id
489
+ def initialize(*args) # :nodoc:
490
+ Rule << self
518
491
  end
519
492
 
493
+ # An integer id that is unique to this rule.
494
+ attr_accessor :id
495
+
496
+ # The grammar this rule belongs to.
497
+ attr_accessor :grammar
498
+
520
499
  # Sets the name of this rule.
521
500
  def name=(name)
522
501
  @name = name.to_sym
523
502
  end
524
503
 
525
- # The name of this rule.
526
- attr_reader :name
504
+ # Returns the name of this rule.
505
+ def name
506
+ @name || '<anonymous>'
507
+ end
508
+
509
+ # Returns +true+ if this rule has a name, +false+ otherwise.
510
+ def named?
511
+ !! @name
512
+ end
527
513
 
528
514
  # Specifies a module that will be used to extend all Match objects that
529
515
  # result from this rule. If +mod+ is a Proc, it is used to create an
@@ -532,9 +518,9 @@ module Citrus
532
518
  if Proc === mod
533
519
  begin
534
520
  tmp = Module.new(&mod)
535
- raise ArgumentError unless tmp.instance_methods.any?
521
+ raise ArgumentError if tmp.instance_methods.empty?
536
522
  mod = tmp
537
- rescue ArgumentError, NameError, NoMethodError
523
+ rescue NoMethodError, ArgumentError, NameError
538
524
  mod = Module.new { define_method(:value, &mod) }
539
525
  end
540
526
  end
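Note on the extension hunk above: when the extension is a Proc, the code first tries to build a module from it, and falls back to wrapping the block as a `value` method if that raises or defines no methods. The sketch below reproduces that fallback in isolation. The method name `extension_module` is hypothetical; the rescue list mirrors the one in the diff.

```ruby
# Hypothetical helper mirroring the Proc branch of Rule#extension=.
def extension_module(block)
  # A block that contains method definitions yields a module with methods.
  tmp = Module.new(&block)
  raise ArgumentError if tmp.instance_methods.empty?
  tmp
rescue NoMethodError, ArgumentError, NameError
  # Otherwise treat the block itself as the body of a #value method.
  Module.new { define_method(:value, &block) }
end

with_methods = extension_module(proc { def twice; self * 2; end })
value_only   = extension_module(proc { to_i })

text = "ab".dup
text.extend(with_methods)
puts text.twice

digits = "42".dup
digits.extend(value_only)
puts digits.value
```

In the second case, evaluating `to_i` against the new module raises a NameError, which triggers the fallback.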
@@ -547,11 +533,70 @@ module Citrus
547
533
  # The module this rule uses to extend new matches.
548
534
  attr_reader :extension
549
535
 
536
+ # Attempts to parse the given +string+ and return a Match if any can be
537
+ # made. The +options+ may contain any of the following keys:
538
+ #
539
+ # offset:: The offset in +string+ at which to start the parse. Defaults
540
+ # to 0.
541
+ # memoize:: If this is +true+ the matches generated during a parse are
542
+ # memoized. See Input#memoize! for more information. Defaults to
543
+ # +false+.
544
+ # consume:: If this is +true+ a ParseError will be raised during a parse
545
+ # unless the entire input string is consumed. Defaults to
546
+ # +false+.
547
+ def parse(string, options={})
548
+ opts = default_parse_options.merge(options)
549
+
550
+ input = Input.new(string)
551
+ input.memoize! if opts[:memoize]
552
+ input.pos = opts[:offset] if opts[:offset] > 0
553
+
554
+ start = input.pos
555
+ events = input.exec(self)
556
+ length = events[-1]
557
+
558
+ if !length || (opts[:consume] && length < (input.length - opts[:offset]))
559
+ raise ParseError.new(input)
560
+ end
561
+
562
+ Match.new(string.slice(start, length), events)
563
+ end
564
+
565
+ # The default set of options to use when parsing.
566
+ def default_parse_options # :nodoc:
567
+ { :offset => 0,
568
+ :memoize => false,
569
+ :consume => false
570
+ }
571
+ end
572
+
573
+ # Tests whether or not this rule matches on the given +string+. Returns the
574
+ # length of the match if any can be made, +nil+ otherwise.
575
+ def test(string)
576
+ input = Input.new(string)
577
+ input.test(self)
578
+ end
579
+
550
580
  # Returns +true+ if this rule is a Terminal.
551
581
  def terminal?
552
582
  is_a?(Terminal)
553
583
  end
554
584
 
585
+ # Returns +true+ if this rule is able to propagate extensions from child
586
+ # rules to the scope of the parent, +false+ otherwise. In general, this will
587
+ # return +false+ for any rule whose match value is derived from an arbitrary
588
+ # number of child rules, such as a Repeat or a Sequence. Note that this is
589
+ # not true for Choice objects because they rely on exactly 1 rule to match,
590
+ # as do Proxy objects.
591
+ def propagates_extensions?
592
+ case self
593
+ when AndPredicate, NotPredicate, ButPredicate, Repeat, Sequence
594
+ false
595
+ else
596
+ true
597
+ end
598
+ end
599
+
555
600
  # Returns +true+ if this rule needs to be surrounded by parentheses when
556
601
  # using #embed.
557
602
  def paren?
@@ -561,23 +606,90 @@ module Citrus
561
606
  # Returns a string version of this rule that is suitable to be used in the
562
607
  # string representation of another rule.
563
608
  def embed
564
- name ? name.to_s : (paren? ? '(%s)' % to_s : to_s)
609
+ named? ? name.to_s : (paren? ? '(%s)' % to_s : to_s)
565
610
  end
566
611
 
567
612
  def inspect # :nodoc:
568
613
  to_s
569
614
  end
615
+ end
570
616
 
571
- private
617
+ # A Terminal is a Rule that matches directly on the input stream and may not
618
+ # contain any other rule. Terminals are essentially wrappers for regular
619
+ # expressions. As such, the Citrus notation is identical to Ruby's regular
620
+ # expression notation, e.g.:
621
+ #
622
+ # /expr/
623
+ #
624
+ # Character classes and the dot symbol may also be used in Citrus notation for
625
+ # compatibility with other parsing expression implementations, e.g.:
626
+ #
627
+ # [a-zA-Z]
628
+ # .
629
+ #
630
+ class Terminal
631
+ include Rule
632
+
633
+ def initialize(rule=/^/)
634
+ super
635
+ @rule = rule
636
+ end
637
+
638
+ # The actual Regexp object this rule uses to match.
639
+ attr_reader :rule
640
+
641
+ # Returns an array of events for this rule on the given +input+.
642
+ def exec(input, events=[])
643
+ length = input.scan_full(rule, false, false)
644
+ if length
645
+ events << id
646
+ events << CLOSE
647
+ events << length
648
+ end
649
+ events
650
+ end
572
651
 
573
- def extend_match(match, name)
574
- match.extend(extension) if extension
575
- match.names << name if name
576
- match
652
+ # Returns +true+ if this rule is case sensitive.
653
+ def case_sensitive?
654
+ !rule.casefold?
577
655
  end
578
656
 
579
- def create_match(data)
580
- extend_match(Match.new(data), name)
657
+ # Returns the Citrus notation of this rule as a string.
658
+ def to_s
659
+ rule.inspect
660
+ end
661
+ end
662
+
663
+ # A StringTerminal is a Terminal that may be instantiated from a String
664
+ # object. The Citrus notation is any sequence of characters enclosed in either
665
+ # single or double quotes, e.g.:
666
+ #
667
+ # 'expr'
668
+ # "expr"
669
+ #
670
+ # This notation works the same as it does in Ruby; i.e. strings in double
671
+ # quotes may contain escape sequences while strings in single quotes may not.
672
+ # In order to specify that a string should ignore case when matching, enclose
673
+ # it in backticks instead of single or double quotes, e.g.:
674
+ #
675
+ # `expr`
676
+ #
677
+ # Besides case sensitivity, case-insensitive strings have the same semantics
678
+ # as double-quoted strings.
679
+ class StringTerminal < Terminal
680
+ # The +flags+ will be passed directly to Regexp#new.
681
+ def initialize(rule='', flags=0)
682
+ super(Regexp.new(Regexp.escape(rule), flags))
683
+ @string = rule
684
+ end
685
+
686
+ # Returns the Citrus notation of this rule as a string.
687
+ def to_s
688
+ if case_sensitive?
689
+ @string.inspect
690
+ else
691
+ @string.inspect.gsub(/^"|"$/, '`')
692
+ end
581
693
  end
582
694
  end
583
695
 
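Note on the new StringTerminal class above: a string is turned into a Regexp by escaping it and passing the flags through, and `to_s` switches to backticks when the resulting regexp ignores case. The sketch below shows both steps with standard `Regexp` calls only. The helper names `string_terminal_regexp` and `string_terminal_notation` are hypothetical.

```ruby
# Hypothetical helper mirroring StringTerminal#initialize: escape the string
# so that regexp metacharacters match literally, then apply the flags.
def string_terminal_regexp(string, flags = 0)
  Regexp.new(Regexp.escape(string), flags)
end

# Hypothetical helper mirroring StringTerminal#to_s: backticks when the
# regexp ignores case, the usual double-quoted form otherwise.
def string_terminal_notation(string, regexp)
  if regexp.casefold?
    string.inspect.gsub(/^"|"$/, '`')
  else
    string.inspect
  end
end

sensitive   = string_terminal_regexp("a.c")
insensitive = string_terminal_regexp("abc", Regexp::IGNORECASE)

puts string_terminal_notation("a.c", sensitive)
puts string_terminal_notation("abc", insensitive)
```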
@@ -589,6 +701,7 @@ module Citrus
589
701
  include Rule
590
702
 
591
703
  def initialize(rule_name='<proxy>')
704
+ super
592
705
  self.rule_name = rule_name
593
706
  end
594
707
 
@@ -605,10 +718,9 @@ module Citrus
605
718
  @rule ||= resolve!
606
719
  end
607
720
 
608
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
609
- def match(input)
610
- m = input.match(rule)
611
- extend_match(m, name) if m
721
+ # Returns an array of events for this rule on the given +input+.
722
+ def exec(input, events=[])
723
+ input.exec(rule, events)
612
724
  end
613
725
  end
614
726
 
@@ -631,10 +743,8 @@ module Citrus
631
743
  # Searches this proxy's grammar and any included grammars for a rule with
632
744
  # this proxy's #rule_name. Raises an error if one cannot be found.
633
745
  def resolve!
634
- rule = grammar.rule(rule_name)
635
- raise RuntimeError, 'No rule named "%s" in grammar %s' %
636
- [rule_name, grammar.name] unless rule
637
- rule
746
+ grammar.rule(rule_name) or raise RuntimeError,
747
+ 'No rule named "%s" in grammar %s' % [rule_name, grammar.name]
638
748
  end
639
749
  end
640
750
 
@@ -658,60 +768,8 @@ module Citrus
658
768
  # Searches this proxy's included grammars for a rule with this proxy's
659
769
  # #rule_name. Raises an error if one cannot be found.
660
770
  def resolve!
661
- rule = grammar.super_rule(rule_name)
662
- raise RuntimeError, 'No rule named "%s" in hierarchy of grammar %s' %
663
- [rule_name, grammar.name] unless rule
664
- rule
665
- end
666
- end
667
-
668
- # A Terminal is a Rule that matches directly on the input stream and may not
669
- # contain any other rule. Terminals may be created from either a String or a
670
- # Regexp object. When created from strings, the Citrus notation is any
671
- # sequence of characters enclosed in either single or double quotes, e.g.:
672
- #
673
- # 'expr'
674
- # "expr"
675
- #
676
- # When created from a regular expression, the Citrus notation is identical to
677
- # Ruby's regular expression notation, e.g.:
678
- #
679
- # /expr/
680
- #
681
- # Character classes and the dot symbol may also be used in Citrus notation for
682
- # compatibility with other parsing expression implementations, e.g.:
683
- #
684
- # [a-zA-Z]
685
- # .
686
- #
687
- class Terminal
688
- include Rule
689
-
690
- def initialize(rule='')
691
- case rule
692
- when String
693
- @string = rule
694
- @rule = Regexp.new(Regexp.escape(rule))
695
- when Regexp
696
- @rule = rule
697
- else
698
- raise ArgumentError, "Cannot create terminal from object: %s" %
699
- rule.inspect
700
- end
701
- end
702
-
703
- # The actual Regexp object this rule uses to match.
704
- attr_reader :rule
705
-
706
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
707
- def match(input)
708
- m = input.scan(rule)
709
- create_match(m) if m
710
- end
711
-
712
- # Returns the Citrus notation of this rule as a string.
713
- def to_s
714
- (@string || @rule).inspect
771
+ grammar.super_rule(rule_name) or raise RuntimeError,
772
+ 'No rule named "%s" in hierarchy of grammar %s' % [rule_name, grammar.name]
715
773
  end
716
774
  end
717
775
 
@@ -723,15 +781,16 @@ module Citrus
723
781
  include Rule
724
782
 
725
783
  def initialize(rules=[])
784
+ super
726
785
  @rules = rules.map {|r| Rule.new(r) }
727
786
  end
728
787
 
729
788
  # An array of the actual Rule objects this rule uses to match.
730
789
  attr_reader :rules
731
790
 
732
- def grammar=(grammar)
733
- @rules.each {|r| r.grammar = grammar }
791
+ def grammar=(grammar) # :nodoc:
734
792
  super
793
+ @rules.each {|r| r.grammar = grammar }
735
794
  end
736
795
  end
737
796
 
@@ -758,9 +817,14 @@ module Citrus
758
817
  class AndPredicate
759
818
  include Predicate
760
819
 
761
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
762
- def match(input)
763
- create_match('') if input.match(rule)
820
+ # Returns an array of events for this rule on the given +input+.
821
+ def exec(input, events=[])
822
+ if input.test(rule)
823
+ events << id
824
+ events << CLOSE
825
+ events << 0
826
+ end
827
+ events
764
828
  end
765
829
 
766
830
  # Returns the Citrus notation of this rule as a string.
@@ -778,9 +842,14 @@ module Citrus
778
842
  class NotPredicate
779
843
  include Predicate
780
844
 
781
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
782
- def match(input)
783
- create_match('') unless input.match(rule)
845
+ # Returns an array of events for this rule on the given +input+.
846
+ def exec(input, events=[])
847
+ unless input.test(rule)
848
+ events << id
849
+ events << CLOSE
850
+ events << 0
851
+ end
852
+ events
784
853
  end
785
854
 
786
855
  # Returns the Citrus notation of this rule as a string.
@@ -800,16 +869,20 @@ module Citrus
800
869
 
801
870
  DOT_RULE = Rule.new(DOT)
802
871
 
803
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
804
- def match(input)
805
- matches = []
806
- while input.match(rule).nil?
807
- m = input.match(DOT_RULE)
808
- break unless m
809
- matches << m
872
+ # Returns an array of events for this rule on the given +input+.
873
+ def exec(input, events=[])
874
+ length = 0
875
+ until input.test(rule)
876
+ len = input.exec(DOT_RULE)[-1]
877
+ break unless len
878
+ length += len
879
+ end
880
+ if length > 0
881
+ events << id
882
+ events << CLOSE
883
+ events << length
810
884
  end
811
- # Create a single match from the aggregate text value of all submatches.
812
- create_match(matches.join) if matches.any?
885
+ events
813
886
  end
814
887
 
815
888
  # Returns the Citrus notation of this rule as a string.
@@ -841,12 +914,9 @@ module Citrus
841
914
  # The label this rule adds to all its matches.
842
915
  attr_reader :label
843
916
 
844
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
845
- # When a Label makes a match, it re-names the match to the value of its
846
- # #label.
847
- def match(input)
848
- m = input.match(rule)
849
- extend_match(m, label) if m
917
+ # Returns an array of events for this rule on the given +input+.
918
+ def exec(input, events=[])
919
+ input.exec(rule, events)
850
920
  end
851
921
 
852
922
  # Returns the Citrus notation of this rule as a string.
@@ -878,20 +948,32 @@ module Citrus
878
948
  include Predicate
879
949
 
880
950
  def initialize(rule='', min=1, max=Infinity)
881
- super(rule)
882
951
  raise ArgumentError, "Min cannot be greater than max" if min > max
952
+ super(rule)
883
953
  @range = Range.new(min, max)
884
954
  end
885
955
 
886
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
887
- def match(input)
888
- matches = []
889
- while matches.length < @range.end
890
- m = input.match(rule)
891
- break unless m
892
- matches << m
956
+ # Returns an array of events for this rule on the given +input+.
957
+ def exec(input, events=[])
958
+ events << id
959
+
960
+ index = events.size
961
+ start = index - 1
962
+ length = n = 0
963
+ while n < max && input.exec(rule, events).size > index
964
+ index = events.size
965
+ length += events[-1]
966
+ n += 1
893
967
  end
894
- create_match(matches) if @range.include?(matches.length)
968
+
969
+ if n >= min
970
+ events << CLOSE
971
+ events << length
972
+ else
973
+ events.slice!(start, events.size)
974
+ end
975
+
976
+ events
895
977
  end
896
978
 
897
979
  # The minimum number of times this rule must match.
@@ -941,13 +1023,25 @@ module Citrus
941
1023
  class Choice
942
1024
  include List
943
1025
 
944
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
945
- def match(input)
946
- rules.each do |rule|
947
- m = input.match(rule)
948
- return extend_match(m, name) if m
1026
+ # Returns an array of events for this rule on the given +input+.
1027
+ def exec(input, events=[])
1028
+ events << id
1029
+
1030
+ index = events.size
1031
+ start = index - 1
1032
+ n = 0
1033
+ while n < rules.length && input.exec(rules[n], events).size == index
1034
+ n += 1
949
1035
  end
950
- nil
1036
+
1037
+ if index < events.size
1038
+ events << CLOSE
1039
+ events << events[-2]
1040
+ else
1041
+ events.slice!(start, events.size)
1042
+ end
1043
+
1044
+ events
951
1045
  end
952
1046
 
953
1047
  # Returns the Citrus notation of this rule as a string.
@@ -964,15 +1058,27 @@ module Citrus
964
1058
  class Sequence
965
1059
  include List
966
1060
 
967
- # Returns the Match for this rule on +input+, +nil+ if no match can be made.
968
- def match(input)
969
- matches = []
970
- rules.each do |rule|
971
- m = input.match(rule)
972
- break unless m
973
- matches << m
1061
+ # Returns an array of events for this rule on the given +input+.
1062
+ def exec(input, events=[])
1063
+ events << id
1064
+
1065
+ index = events.size
1066
+ start = index - 1
1067
+ length = n = 0
1068
+ while n < rules.length && input.exec(rules[n], events).size > index
1069
+ index = events.size
1070
+ length += events[-1]
1071
+ n += 1
1072
+ end
1073
+
1074
+ if n == rules.length
1075
+ events << CLOSE
1076
+ events << length
1077
+ else
1078
+ events.slice!(start, events.size)
974
1079
  end
975
- create_match(matches) if matches.length == rules.length
1080
+
1081
+ events
976
1082
  end
977
1083
 
978
1084
  # Returns the Citrus notation of this rule as a string.
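
The `Sequence#exec` hunk above appends events optimistically and rolls them back with `slice!` when a sub-rule fails. Below is a simplified standalone sketch of that pattern. It assumes `CLOSE` is a sentinel value of `-1`, tracks the input position by hand, and replaces the real `input.exec` with a hypothetical `exec_terminal` helper, so it is an approximation rather than the library's implementation.

```ruby
CLOSE = -1 # assumed sentinel value, chosen for this sketch

# Hypothetical terminal: appends [id, CLOSE, length] to +events+ when
# +string+ occurs in +input+ at +pos+, otherwise leaves +events+ as is.
def exec_terminal(id, string, input, pos, events)
  events << id << CLOSE << string.length if input[pos, string.length] == string
  events
end

# Sequence of terminals, each given as [id, string]. On success the
# events of every terminal are wrapped in [id, ..., CLOSE, total_length].
# On failure everything appended by this call is removed again.
def exec_sequence(id, terminals, input, pos, events = [])
  events << id
  index = events.size
  start = index - 1
  length = n = 0
  while n < terminals.length
    term_id, string = terminals[n]
    exec_terminal(term_id, string, input, pos + length, events)
    break unless events.size > index # the terminal added nothing
    index = events.size
    length += events[-1]
    n += 1
  end
  if n == terminals.length
    events << CLOSE << length
  else
    events.slice!(start, events.size) # roll back, including our own id
  end
  events
end

p exec_sequence(1, [[2, 'ab'], [3, 'c']], 'abc', 0)
# => [1, 2, -1, 2, 3, -1, 1, -1, 3]
p exec_sequence(1, [[2, 'ab'], [3, 'x']], 'abc', 0)
# => []
```

A failed sequence leaves the caller's array exactly as it found it, which is what lets a caller detect failure by comparing array sizes before and after the call.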
@@ -985,19 +1091,19 @@ module Citrus
985
1091
  # match may contain any number of other matches. This class provides several
986
1092
  # convenient tree traversal methods that help when examining parse results.
987
1093
  class Match < String
988
- def initialize(data)
989
- case data
990
- when String
991
- super(data)
992
- when Array
993
- super(data.join)
994
- @matches = data
995
- else
996
- raise ArgumentError, "Cannot create match from object: %s" %
997
- data.inspect
998
- end
1094
+ def initialize(string, events=[])
1095
+ raise ArgumentError, "Invalid events for match length %d" %
1096
+ string.length if events[-1] && string.length != events[-1]
1097
+
1098
+ super(string)
1099
+ @events = events
1100
+
1101
+ extend!
999
1102
  end
1000
1103
 
1104
+ # The array of events that was passed to the constructor.
1105
+ attr_reader :events
1106
+
1001
1107
  # An array of all names of this match. A name is added to a match object
1002
1108
  # for each rule that returns that object when matching. These names can then
1003
1109
  # be used to determine which rules were satisfied by a given match.
@@ -1012,20 +1118,64 @@ module Citrus
1012
1118
 
1013
1119
  # Returns +true+ if this match has the given +name+.
1014
1120
  def has_name?(name)
1015
- names.include?(name)
1121
+ names.include?(name.to_sym)
1122
+ end
1123
+
1124
+ # Returns an array of all Rule objects that extend this match.
1125
+ def extenders
1126
+ @extenders ||= begin
1127
+ extenders = []
1128
+ @events.each do |event|
1129
+ break if event == CLOSE
1130
+ rule = Rule[event]
1131
+ extenders.unshift(rule)
1132
+ break unless rule.propagates_extensions?
1133
+ end
1134
+ extenders
1135
+ end
1136
+ end
1137
+
1138
+ # Returns a reference to the Rule object that first created this match.
1139
+ def creator
1140
+ extenders.first
1016
1141
  end
1017
1142
 
1018
- # An array of all sub-matches of this match.
1143
+ # Returns an array of Match objects that are submatches of this match in the
1144
+ # order they appeared in the input.
1019
1145
  def matches
1020
- @matches ||= []
1146
+ @matches ||= begin
1147
+ matches = []
1148
+ stack = []
1149
+ offset = 0
1150
+ close = false
1151
+ index = 0
1152
+
1153
+ while index < @events.size
1154
+ event = @events[index]
1155
+ if close
1156
+ start = stack.pop
1157
+ if stack.size == extenders.size
1158
+ matches << Match.new(slice(offset, event), @events[start..index])
1159
+ offset += event
1160
+ end
1161
+ close = false
1162
+ elsif event == CLOSE
1163
+ close = true
1164
+ else
1165
+ stack << index
1166
+ end
1167
+ index += 1
1168
+ end
1169
+
1170
+ matches
1171
+ end
1021
1172
  end
1022
1173
 
1023
1174
  # Returns an array of all sub-matches with the given +name+. If +deep+ is
1024
1175
  # +false+, returns only sub-matches that are immediate descendants of this
1025
1176
  # match.
1026
1177
  def find(name, deep=true)
1027
- sym = name.to_sym
1028
- ms = matches.select {|m| m.has_name?(sym) }
1178
+ ms = matches.select {|m| m.has_name?(name) }
1029
1179
  matches.each {|m| ms.concat(m.find(name, deep)) } if deep
1030
1180
  ms
1031
1181
  end
@@ -1034,31 +1184,44 @@ module Citrus
1034
1184
  # +name+ is given, attempts to retrieve the first immediate sub-match named
1035
1185
  # +name+.
1036
1186
  def first(name=nil)
1037
- name.nil? ? matches.first : find(name, false).first
1187
+ name ? find(name, false).first : matches.first
1038
1188
  end
1039
1189
 
1040
- # Returns +true+ if this match has no descendants (was created from a
1041
- # Terminal).
1042
- def terminal?
1043
- matches.length == 0
1190
+ # Allows sub-matches of this match to be retrieved by name as instance
1191
+ # methods.
1192
+ def method_missing(sym, *args)
1193
+ if sym == :to_ary
1194
+ # This is a workaround for a bug in Ruby 1.9 with classes that
1195
+ # extend String.
1196
+ super
1197
+ else
1198
+ first(sym) or raise NoMatchError, 'No match named "%s" in %s (%s)' %
1199
+ [sym, self, name]
1200
+ end
1044
1201
  end
1045
1202
 
1046
- # Creates a new String object from the contents of this match.
1047
- def to_s
1048
- String.new(self)
1203
+ # Returns a string representation of this match that displays the entire
1204
+ # match tree for easy viewing in the console.
1205
+ def dump
1206
+ dump_lines.join("\n")
1049
1207
  end
1050
1208
 
1051
- # Allows sub-matches of this match to be retrieved by name as instance
1052
- # methods.
1053
- def method_missing(sym, *args)
1054
- m = first(sym)
1055
- return m if m
1056
- raise NoMatchError, 'No match named "%s" in %s (%s)' %
1057
- [sym, self, name || '<anonymous>']
1209
+ def dump_lines(indent=' ') # :nodoc:
1210
+ line = to_s.inspect
1211
+ line << ' (%s)' % names.join(',') unless names.empty?
1212
+ matches.inject([line]) do |lines, m|
1213
+ lines.concat(m.dump_lines(indent).map {|line| indent + line })
1214
+ end
1058
1215
  end
1059
1216
 
1060
- def to_ary
1061
- # This method intentionally left blank to work around a bug in Ruby 1.9.
1217
+ private
1218
+
1219
+ # Extends this match with the extensions provided by its #extenders.
1220
+ def extend! # :nodoc:
1221
+ extenders.each do |rule|
1222
+ self.names << rule.name if rule.named?
1223
+ extend(rule.extension) if rule.extension
1224
+ end
1062
1225
  end
1063
1226
  end
1064
1227
  end
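
The rewritten `Match#matches` walks a flat event array in which each rule contributes its id, then its children's events, then `CLOSE` followed by the matched length. The sketch below shows that traversal in isolation. It assumes `CLOSE` is `-1`, that rule ids are positive integers, and that the match has exactly one extender (the outermost rule); `child_spans` is a hypothetical helper, not a Citrus method.

```ruby
CLOSE = -1 # assumed sentinel value, chosen for this sketch

# Returns [rule_id, offset, length] for each immediate child found in a
# flat event array. Assumes a single outermost rule, so children are the
# events that close while exactly one rule is still open on the stack.
def child_spans(events)
  spans = []
  stack = []   # indexes of rule ids that have not been closed yet
  offset = 0   # position of the next child within the parent's text
  close = false

  events.each_with_index do |event, index|
    if close
      # The value after CLOSE is the length of the rule being closed.
      start = stack.pop
      if stack.size == 1
        spans << [events[start], offset, event]
        offset += event
      end
      close = false
    elsif event == CLOSE
      close = true
    else
      stack << index
    end
  end

  spans
end

# Rule 1 spans 5 characters and contains rule 2 (3 chars) and rule 3 (2 chars).
events = [1, 2, CLOSE, 3, 3, CLOSE, 2, CLOSE, 5]
p child_spans(events) # => [[2, 0, 3], [3, 3, 2]]
```

The `close` flag is what distinguishes a length from a rule id: a number is read as a length only when it directly follows `CLOSE`, so the second `3` in the example opens rule 3 rather than closing anything.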