citrus 2.0.1 → 2.1.1

data/README CHANGED
@@ -5,8 +5,8 @@
5
5
  Parsing Expressions for Ruby
6
6
 
7
7
 
8
- Citrus is a compact and powerful parsing library for
9
- [Ruby](http://ruby-lang.org/) that combines the elegance and expressiveness of
8
+ Citrus is a compact and powerful parsing library for
9
+ [Ruby](http://ruby-lang.org/) that combines the elegance and expressiveness of
10
10
  the language with the simplicity and power of
11
11
  [parsing expressions](http://en.wikipedia.org/wiki/Parsing_expression_grammar).
12
12
 
@@ -16,13 +16,13 @@ the language with the simplicity and power of
16
16
 
17
17
  Via [RubyGems](http://rubygems.org/):
18
18
 
19
- $ sudo gem install citrus
19
+ $ gem install citrus
20
20
 
21
21
  From a local copy:
22
22
 
23
23
  $ git clone git://github.com/mjijackson/citrus.git
24
24
  $ cd citrus
25
- $ rake package && sudo rake install
25
+ $ rake package install
26
26
 
27
27
 
28
28
  # Background
@@ -77,23 +77,23 @@ thereof.
77
77
  A Citrus grammar is really just a souped-up Ruby
78
78
  [module](http://ruby-doc.org/core/classes/Module.html). These modules may be
79
79
  included in other grammar modules in the same way that Ruby modules are normally
80
- used. This property allows you to divide a complex grammar into more manageable,
81
- reusable pieces that may be combined at runtime. Any grammar rule with the same
82
- name as a rule in an included grammar may access that rule with a mechanism
80
+ used. This property allows you to divide a complex grammar into more manageable,
81
+ reusable pieces that may be combined at runtime. Any grammar rule with the same
82
+ name as a rule in an included grammar may access that rule with a mechanism
83
83
  similar to Ruby's super keyword.
84
84
 
85
85
  ## Matches
86
86
 
87
- Matches are created by rule objects when they match on the input. A
88
- [Match](api/classes/Citrus/Match.html) is actually a
89
- [String](http://ruby-doc.org/core/classes/String.html) object with some extra
87
+ Matches are created by rule objects when they match on the input. A
88
+ [Match](api/classes/Citrus/Match.html) is actually a
89
+ [String](http://ruby-doc.org/core/classes/String.html) object with some extra
90
90
  information attached such as the name(s) of the rule(s) from which it was
91
91
  generated and any submatches it may contain.
92
92
 
93
93
  During a parse, matches are arranged in a tree structure where any match may
94
94
  contain any number of other matches. This structure is determined by the way in
95
- which the rule that generated each match is used in the grammar. For example, a
96
- match that is created from a non-terminal rule that contains several other
95
+ which the rule that generated each match is used in the grammar. For example, a
96
+ match that is created from a non-terminal rule that contains several other
97
97
  terminals will likewise contain several matches, one for each terminal.
98
98
 
99
99
  Match objects may be extended with semantic information in the form of methods.
@@ -207,28 +207,28 @@ See [Label](api/classes/Citrus/Label.html) for more information.
207
207
 
208
208
  ## Precedence
209
209
 
210
- The following table contains a list of all Citrus operators and their
211
- precedence. A higher precedence indicates tighter binding.
212
-
213
- Operator | Name | Precedence
214
- ----------- | ------------------------- | ----------
215
- '' | String (single quoted) | 6
216
- "" | String (double quoted) | 6
217
- [] | Character class | 6
218
- . | Dot (any character) | 6
219
- // | Regular expression | 6
220
- () | Grouping | 6
221
- * | Repetition (arbitrary) | 5
222
- + | Repetition (one or more) | 5
223
- ? | Repetition (zero or one) | 5
224
- & | And predicate | 4
225
- ! | Not predicate | 4
226
- ~ | But predicate | 4
227
- : | Label | 4
228
- <> | Extension (module name) | 3
229
- {} | Extension (literal) | 3
230
- e1 e2 | Sequence | 2
231
- e1 | e2 | Ordered choice | 1
210
+ The following table contains a list of all Citrus symbols and operators and
211
+ their precedence. A higher precedence indicates tighter binding.
212
+
213
+ Operator | Name | Precedence
214
+ --------- | ------------------------- | ----------
215
+ '' | String (single quoted) | 6
216
+ "" | String (double quoted) | 6
217
+ [] | Character class | 6
218
+ . | Dot (any character) | 6
219
+ // | Regular expression | 6
220
+ () | Grouping | 6
221
+ * | Repetition (arbitrary) | 5
222
+ + | Repetition (one or more) | 5
223
+ ? | Repetition (zero or one) | 5
224
+ & | And predicate | 4
225
+ ! | Not predicate | 4
226
+ ~ | But predicate | 4
227
+ : | Label | 4
228
+ <> | Extension (module name) | 3
229
+ {} | Extension (literal) | 3
230
+ e1 e2 | Sequence | 2
231
+ e1 | e2 | Ordered choice | 1
232
232
 
233
233
 
234
234
  # Example
@@ -272,13 +272,12 @@ and "1 + 2+3", but it does not have enough semantic information to be able to
272
272
  actually interpret these expressions.
273
273
 
274
274
  At this point, when the grammar parses a string it generates a tree of
275
- [Match](api/classes/Citrus/Match.html) objects. Each match is created by a rule.
276
- A match knows what text it contains, its offset in the original input, and what
277
- submatches it contains.
275
+ [Match](api/classes/Citrus/Match.html) objects. Each match is created by a rule
276
+ and may itself be comprised of any number of submatches.
278
277
 
279
278
  Submatches are created whenever a rule contains another rule. For example, in
280
- the grammar above the number rule matches a string of digits followed by white
281
- space. Thus, a match generated by the number rule will contain two submatches.
279
+ the grammar above `number` matches a string of digits followed by white space.
280
+ Thus, a match generated by this rule will contain two submatches.
282
281
 
283
282
  We can define methods inside a set of curly braces that will be used to extend
284
283
  matches when they are created. This works in similar fashion to using Ruby's
@@ -352,14 +351,14 @@ Congratulations! You just ran your first piece of Citrus code.
352
351
 
353
352
  One interesting thing to notice about the above sequence of commands is the
354
353
  return value of [Citrus#load](api/classes/Citrus.html#M000003). When you use
355
- `Citrus.load` to
356
- load a grammar file (and likewise [Citrus#eval](api/classes/Citrus.html#M000004) to evaluate
357
- a raw string of grammar code), the return value is an array of all the grammars
358
- present in that file.
354
+ `Citrus.load` to load a grammar file (and likewise
355
+ [Citrus#eval](api/classes/Citrus.html#M000004) to evaluate a raw string of
356
+ grammar code), the return value is an array of all the grammars present in that
357
+ file.
359
358
 
360
- Take a look at
359
+ Take a look at
361
360
  [examples/calc.citrus](http://github.com/mjijackson/citrus/blob/master/examples/calc.citrus)
362
- for an example of a calculator that is able to parse and evaluate more complex
361
+ for an example of a calculator that is able to parse and evaluate more complex
363
362
  mathematical expressions.
364
363
 
365
364
  ## Implicit Value
@@ -383,23 +382,86 @@ as:
383
382
  }
384
383
  end
385
384
 
386
- Since no method name is explicitly specified in the semantic blocks, they may be
385
+ Since no method name is explicitly specified in the semantic blocks, they may be
387
386
  called using the `value` method.
388
387
 
389
388
 
389
+ # Testing
390
+
391
+
392
+ Citrus was designed to facilitate simple and powerful testing of grammars. To
393
+ demonstrate how this is to be done, we'll use the `Addition` grammar from our
394
+ previous [example](example.html). The following code demonstrates a simple test
395
+ case that could be used to test that our grammar works properly.
396
+
397
+ class AdditionTest < Test::Unit::TestCase
398
+ def test_additive
399
+ match = Addition.parse('23 + 12', :root => :additive)
400
+ assert(match)
401
+ assert_equal('23 + 12', match)
402
+ assert_equal(35, match.value)
403
+ end
404
+
405
+ def test_number
406
+ match = Addition.parse('23', :root => :number)
407
+ assert(match)
408
+ assert_equal('23', match)
409
+ assert_equal(23, match.value)
410
+ end
411
+ end
412
+
413
+ The key here is using the `root`
414
+ [option](api/classes/Citrus/GrammarMethods.html#M000031) when performing the
415
+ parse to specify the name of the rule at which the parse should start. In
416
+ `test_number`, since `:number` was given the parse will start at that rule as if
417
+ it were the root rule of the entire grammar. The ability to change the root rule
418
+ on the fly like this enables easy unit testing of the entire grammar.
419
+
420
+ Also note that because match objects are themselves strings, assertions may be
421
+ made to test equality of match objects with string values.
422
+
423
+ ## Debugging
424
+
425
+ When a parse fails, a [ParseError](api/classes/Citrus/ParseError.html) object is
426
+ generated which provides a wealth of information about exactly where the parse
427
+ failed. Using this object, you could possibly provide some useful feedback to
428
+ the user about why the input was bad. The following code demonstrates one way
429
+ to do this.
430
+
431
+ def parse_some_stuff(stuff)
432
+ match = StuffGrammar.parse(stuff)
433
+ rescue Citrus::ParseError => e
434
+ raise ArgumentError, "Invalid stuff on line %d, offset %d!" %
435
+ [e.line_number, e.line_offset]
436
+ end
437
+
438
+ In addition to useful error objects, Citrus also includes a special file that
439
+ should help grammar authors when debugging grammars. To get this extra
440
+ functionality, simply `require 'citrus/debug'` instead of `require 'citrus'`
441
+ when running your code.
442
+
443
+ When debugging is enabled, you can visualize parse trees in the console as XML
444
+ documents. This can help when determining which rules are generating which
445
+ matches and how they are organized in the output. Also when debugging, each
446
+ match object automatically records its offset in the original input, which can
447
+ also be very helpful in keeping track of which offsets in the input generated
448
+ which matches.
449
+
450
+
390
451
  # Links
391
452
 
392
453
 
393
454
  The primary resource for all things to do with parsing expressions can be found
394
- on the original [Packrat and Parsing Expression Grammars page](http://pdos.csail.mit.edu/~baford/packrat) at MIT.
455
+ on the original [Packrat and Parsing Expression Grammars page](http://pdos.csail.mit.edu/~baford/packrat)
456
+ at MIT.
395
457
 
396
- Also, a useful summary of parsing expression grammars can be found on
458
+ Also, a useful summary of parsing expression grammars can be found on
397
459
  [Wikipedia](http://en.wikipedia.org/wiki/Parsing_expression_grammar).
398
460
 
399
461
  Citrus draws inspiration from another Ruby library for writing parsing
400
462
  expression grammars, Treetop. While Citrus' syntax is similar to that of
401
- [Treetop](http://treetop.rubyforge.org), it's not identical. The link is
402
- included here for those who may wish toexplore an alternative implementation.
463
+ [Treetop](http://treetop.rubyforge.org), it's not identical. The link is
464
+ included here for those who may wish to explore an alternative implementation.
403
465
 
404
466
 
405
467
  # License
@@ -50,23 +50,23 @@ thereof.
50
50
  A Citrus grammar is really just a souped-up Ruby
51
51
  [module](http://ruby-doc.org/core/classes/Module.html). These modules may be
52
52
  included in other grammar modules in the same way that Ruby modules are normally
53
- used. This property allows you to divide a complex grammar into more manageable,
54
- reusable pieces that may be combined at runtime. Any grammar rule with the same
55
- name as a rule in an included grammar may access that rule with a mechanism
53
+ used. This property allows you to divide a complex grammar into more manageable,
54
+ reusable pieces that may be combined at runtime. Any grammar rule with the same
55
+ name as a rule in an included grammar may access that rule with a mechanism
56
56
  similar to Ruby's super keyword.
57
57
 
58
58
  ## Matches
59
59
 
60
- Matches are created by rule objects when they match on the input. A
61
- [Match](api/classes/Citrus/Match.html) is actually a
62
- [String](http://ruby-doc.org/core/classes/String.html) object with some extra
60
+ Matches are created by rule objects when they match on the input. A
61
+ [Match](api/classes/Citrus/Match.html) is actually a
62
+ [String](http://ruby-doc.org/core/classes/String.html) object with some extra
63
63
  information attached such as the name(s) of the rule(s) from which it was
64
64
  generated and any submatches it may contain.
65
65
 
66
66
  During a parse, matches are arranged in a tree structure where any match may
67
67
  contain any number of other matches. This structure is determined by the way in
68
- which the rule that generated each match is used in the grammar. For example, a
69
- match that is created from a non-terminal rule that contains several other
68
+ which the rule that generated each match is used in the grammar. For example, a
69
+ match that is created from a non-terminal rule that contains several other
70
70
  terminals will likewise contain several matches, one for each terminal.
71
71
 
72
72
  Match objects may be extended with semantic information in the form of methods.
data/doc/example.markdown CHANGED
@@ -39,13 +39,12 @@ and "1 + 2+3", but it does not have enough semantic information to be able to
39
39
  actually interpret these expressions.
40
40
 
41
41
  At this point, when the grammar parses a string it generates a tree of
42
- [Match](api/classes/Citrus/Match.html) objects. Each match is created by a rule.
43
- A match knows what text it contains, its offset in the original input, and what
44
- submatches it contains.
42
+ [Match](api/classes/Citrus/Match.html) objects. Each match is created by a rule
43
+ and may itself be comprised of any number of submatches.
45
44
 
46
45
  Submatches are created whenever a rule contains another rule. For example, in
47
- the grammar above the number rule matches a string of digits followed by white
48
- space. Thus, a match generated by the number rule will contain two submatches.
46
+ the grammar above `number` matches a string of digits followed by white space.
47
+ Thus, a match generated by this rule will contain two submatches.
49
48
 
50
49
  We can define methods inside a set of curly braces that will be used to extend
51
50
  matches when they are created. This works in similar fashion to using Ruby's
@@ -119,14 +118,14 @@ Congratulations! You just ran your first piece of Citrus code.
119
118
 
120
119
  One interesting thing to notice about the above sequence of commands is the
121
120
  return value of [Citrus#load](api/classes/Citrus.html#M000003). When you use
122
- `Citrus.load` to
123
- load a grammar file (and likewise [Citrus#eval](api/classes/Citrus.html#M000004) to evaluate
124
- a raw string of grammar code), the return value is an array of all the grammars
125
- present in that file.
121
+ `Citrus.load` to load a grammar file (and likewise
122
+ [Citrus#eval](api/classes/Citrus.html#M000004) to evaluate a raw string of
123
+ grammar code), the return value is an array of all the grammars present in that
124
+ file.
126
125
 
127
- Take a look at
126
+ Take a look at
128
127
  [examples/calc.citrus](http://github.com/mjijackson/citrus/blob/master/examples/calc.citrus)
129
- for an example of a calculator that is able to parse and evaluate more complex
128
+ for an example of a calculator that is able to parse and evaluate more complex
130
129
  mathematical expressions.
131
130
 
132
131
  ## Implicit Value
@@ -150,5 +149,5 @@ as:
150
149
  }
151
150
  end
152
151
 
153
- Since no method name is explicitly specified in the semantic blocks, they may be
152
+ Since no method name is explicitly specified in the semantic blocks, they may be
154
153
  called using the `value` method.
data/doc/index.markdown CHANGED
@@ -1,5 +1,5 @@
1
- Citrus is a compact and powerful parsing library for
2
- [Ruby](http://ruby-lang.org/) that combines the elegance and expressiveness of
1
+ Citrus is a compact and powerful parsing library for
2
+ [Ruby](http://ruby-lang.org/) that combines the elegance and expressiveness of
3
3
  the language with the simplicity and power of
4
4
  [parsing expressions](http://en.wikipedia.org/wiki/Parsing_expression_grammar).
5
5
 
@@ -9,10 +9,10 @@ the language with the simplicity and power of
9
9
 
10
10
  Via [RubyGems](http://rubygems.org/):
11
11
 
12
- $ sudo gem install citrus
12
+ $ gem install citrus
13
13
 
14
14
  From a local copy:
15
15
 
16
16
  $ git clone git://github.com/mjijackson/citrus.git
17
17
  $ cd citrus
18
- $ rake package && sudo rake install
18
+ $ rake package install
data/doc/links.markdown CHANGED
@@ -2,12 +2,13 @@
2
2
 
3
3
 
4
4
  The primary resource for all things to do with parsing expressions can be found
5
- on the original [Packrat and Parsing Expression Grammars page](http://pdos.csail.mit.edu/~baford/packrat) at MIT.
5
+ on the original [Packrat and Parsing Expression Grammars page](http://pdos.csail.mit.edu/~baford/packrat)
6
+ at MIT.
6
7
 
7
- Also, a useful summary of parsing expression grammars can be found on
8
+ Also, a useful summary of parsing expression grammars can be found on
8
9
  [Wikipedia](http://en.wikipedia.org/wiki/Parsing_expression_grammar).
9
10
 
10
11
  Citrus draws inspiration from another Ruby library for writing parsing
11
12
  expression grammars, Treetop. While Citrus' syntax is similar to that of
12
- [Treetop](http://treetop.rubyforge.org), it's not identical. The link is
13
- included here for those who may wish toexplore an alternative implementation.
13
+ [Treetop](http://treetop.rubyforge.org), it's not identical. The link is
14
+ included here for those who may wish to explore an alternative implementation.
data/doc/syntax.markdown CHANGED
@@ -104,8 +104,8 @@ See [Label](api/classes/Citrus/Label.html) for more information.
104
104
 
105
105
  ## Precedence
106
106
 
107
- The following table contains a list of all Citrus operators and their
108
- precedence. A higher precedence indicates tighter binding.
107
+ The following table contains a list of all Citrus symbols and operators and
108
+ their precedence. A higher precedence indicates tighter binding.
109
109
 
110
110
  Operator | Name | Precedence
111
111
  ------------------------- | ------------------------- | ----------
@@ -0,0 +1,60 @@
1
+ # Testing
2
+
3
+
4
+ Citrus was designed to facilitate simple and powerful testing of grammars. To
5
+ demonstrate how this is to be done, we'll use the `Addition` grammar from our
6
+ previous [example](example.html). The following code demonstrates a simple test
7
+ case that could be used to test that our grammar works properly.
8
+
9
+ class AdditionTest < Test::Unit::TestCase
10
+ def test_additive
11
+ match = Addition.parse('23 + 12', :root => :additive)
12
+ assert(match)
13
+ assert_equal('23 + 12', match)
14
+ assert_equal(35, match.value)
15
+ end
16
+
17
+ def test_number
18
+ match = Addition.parse('23', :root => :number)
19
+ assert(match)
20
+ assert_equal('23', match)
21
+ assert_equal(23, match.value)
22
+ end
23
+ end
24
+
25
+ The key here is using the `root`
26
+ [option](api/classes/Citrus/GrammarMethods.html#M000031) when performing the
27
+ parse to specify the name of the rule at which the parse should start. In
28
+ `test_number`, since `:number` was given the parse will start at that rule as if
29
+ it were the root rule of the entire grammar. The ability to change the root rule
30
+ on the fly like this enables easy unit testing of the entire grammar.
31
+
32
+ Also note that because match objects are themselves strings, assertions may be
33
+ made to test equality of match objects with string values.
34
+
35
+ ## Debugging
36
+
37
+ When a parse fails, a [ParseError](api/classes/Citrus/ParseError.html) object is
38
+ generated which provides a wealth of information about exactly where the parse
39
+ failed. Using this object, you could possibly provide some useful feedback to
40
+ the user about why the input was bad. The following code demonstrates one way
41
+ to do this.
42
+
43
+ def parse_some_stuff(stuff)
44
+ match = StuffGrammar.parse(stuff)
45
+ rescue Citrus::ParseError => e
46
+ raise ArgumentError, "Invalid stuff on line %d, offset %d!" %
47
+ [e.line_number, e.line_offset]
48
+ end
49
+
50
+ In addition to useful error objects, Citrus also includes a special file that
51
+ should help grammar authors when debugging grammars. To get this extra
52
+ functionality, simply `require 'citrus/debug'` instead of `require 'citrus'`
53
+ when running your code.
54
+
55
+ When debugging is enabled, you can visualize parse trees in the console as XML
56
+ documents. This can help when determining which rules are generating which
57
+ matches and how they are organized in the output. Also when debugging, each
58
+ match object automatically records its offset in the original input, which can
59
+ also be very helpful in keeping track of which offsets in the input generated
60
+ which matches.
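Note on the new testing documentation: its claim that match objects can be compared directly with strings follows from `Match` being a `String`. A minimal standalone sketch of that behavior, assuming a hypothetical `TinyMatch` class rather than the real `Citrus::Match`:

```ruby
# TinyMatch is a hypothetical stand-in for illustration only. The real
# Citrus::Match carries rule names and submatches as well.
class TinyMatch < String
  # Interprets the matched text as an integer, as a semantic block might.
  def value
    to_i
  end
end

match = TinyMatch.new('23')

# String#== compares content, so a String subclass instance equals a plain
# string with the same characters, in either operand order.
puts(match == '23')
puts('23' == match)
puts(match.value + 12)
```

Because equality holds in both directions, an assertion that takes the expected string first and the match second should behave the same as one written the other way around.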
data/lib/citrus.rb CHANGED
@@ -8,7 +8,7 @@ require 'strscan'
8
8
  module Citrus
9
9
  autoload :File, 'citrus/file'
10
10
 
11
- VERSION = [2, 0, 1]
11
+ VERSION = [2, 1, 1]
12
12
 
13
13
  # Returns the current version of Citrus as a string.
14
14
  def self.version
@@ -27,92 +27,204 @@ module Citrus
27
27
  file << '.citrus' unless F.file?(file)
28
28
  raise "Cannot find file #{file}" unless F.file?(file)
29
29
  raise "Cannot read file #{file}" unless F.readable?(file)
30
- self.eval(F.read(file))
30
+ eval(F.read(file))
31
31
  end
32
32
 
33
33
  # Evaluates the given Citrus parsing expression grammar +code+ in the global
34
- # scope. The +code+ may contain the definition of any number of modules.
35
- # Returns an array of any grammar modules that are created.
34
+ # scope. Returns an array of any grammar modules that are created. Implicitly
35
+ # raises +SyntaxError+ on a failed parse.
36
36
  def self.eval(code)
37
- File.parse(code).value
37
+ parse(code, :consume => true).value
38
38
  end
39
39
 
40
- # This error is raised whenever a parse fails.
41
- class ParseError < Exception
42
- def initialize(input)
43
- @input = input
44
- msg = "Failed to parse input at offset %d\n" % offset
45
- msg << detail
46
- super(msg)
40
+ # Parses the given Citrus +code+ using the given +options+. Returns the
41
+ # generated match tree. Raises a +SyntaxError+ if the parse fails.
42
+ def self.parse(code, options={})
43
+ begin
44
+ File.parse(code, options)
45
+ rescue ParseError => e
46
+ raise SyntaxError.new(e)
47
47
  end
48
+ end
48
49
 
49
- # The Input object that was used for the parse.
50
- attr_reader :input
50
+ # A standard error class that all Citrus errors extend.
51
+ class Error < RuntimeError; end
51
52
 
52
- # Returns the 0-based offset at which the error occurred in the input, i.e.
53
- # the maximum offset in the input that was successfully parsed before the
54
- # error occurred.
55
- def offset
56
- input.max_offset
53
+ # Raised when there is an error parsing Citrus code.
54
+ class SyntaxError < Error
55
+ # The +error+ given here should be a +ParseError+ object.
56
+ def initialize(error)
57
+ msg = "Syntax error on line %d at offset %d\n%s" %
58
+ [error.line_number, error.line_offset, error.detail]
59
+ super(msg)
57
60
  end
61
+ end
62
+
63
+ # Raised when a match cannot be found.
64
+ class NoMatchError < Error; end
58
65
 
59
- # Returns the text of the line on which the error occurred.
60
- def line
61
- lines[line_index]
66
+ # Raised when a parse fails.
67
+ class ParseError < Error
68
+ # The +input+ given here is an instance of Citrus::Input.
69
+ def initialize(input)
70
+ @offset = input.max_offset
71
+ @line_offset = input.line_offset(offset)
72
+ @line_number = input.line_number(offset)
73
+ @line = input.line(offset)
74
+ msg = "Failed to parse input at offset %d\n" % offset
75
+ msg << detail
76
+ super(msg)
62
77
  end
63
78
 
64
- # Returns the 1-based number of the line in the input where the error
79
+ # The 0-based offset at which the error occurred in the input, i.e. the
80
+ # maximum offset in the input that was successfully parsed before the error
65
81
  # occurred.
66
- def line_number
67
- line_index + 1
68
- end
82
+ attr_reader :offset
69
83
 
70
- alias lineno line_number
84
+ # The 0-based offset at which the error occurred on the line on which it
85
+ # occurred in the input.
86
+ attr_reader :line_offset
71
87
 
72
- # Returns the 0-based offset at which the error occurred on the line on
73
- # which it occurred.
74
- def line_offset
75
- pos = 0
76
- each_line do |line|
77
- len = line.length
78
- return (offset - pos) if pos + len >= offset
79
- pos += len
80
- end
81
- 0
82
- end
88
+ # The 1-based number of the line in the input where the error occurred.
89
+ attr_reader :line_number
90
+
91
+ # The text of the line in the input where the error occurred.
92
+ attr_reader :line
83
93
 
84
94
  # Returns a string that, when printed, gives a visual representation of
85
95
  # exactly where the error occurred on its line in the input.
86
96
  def detail
87
97
  "%s\n%s^" % [line, ' ' * line_offset]
88
98
  end
99
+ end
89
100
 
90
- private
101
+ # This class represents the core of the parsing algorithm. It wraps the input
102
+ # string and serves matches to all nonterminals.
103
+ class Input < StringScanner
104
+ def initialize(string)
105
+ super(string)
106
+ @max_offset = 0
107
+ end
108
+
109
+ # The maximum offset that has been achieved during a parse.
110
+ attr_reader :max_offset
111
+
112
+ # A nested hash of rule id's to offsets and their respective matches. Only
113
+ # present if memoing is enabled.
114
+ attr_reader :cache
115
+
116
+ # The number of times the cache was hit. Only present if memoing is enabled.
117
+ attr_reader :cache_hits
91
118
 
92
- def string
93
- input.string
119
+ # Returns the length of this input.
120
+ def length
121
+ string.length
94
122
  end
95
123
 
124
+ # Returns an array containing the lines of text in the input.
96
125
  def lines
97
126
  string.send(string.respond_to?(:lines) ? :lines : :to_s).to_a
98
127
  end
99
128
 
129
+ # Iterates over the lines of text in the input using the given +block+.
100
130
  def each_line(&block)
101
131
  string.each_line(&block)
102
132
  end
103
133
 
104
- # Returns the 0-based number of the line in the input where the error
105
- # occurred.
106
- def line_index
107
- pos = 0
108
- idx = 0
134
+ # Returns the 0-based offset of the given +pos+ in the input on the line
135
+ # on which it is found. +pos+ defaults to the current pointer position.
136
+ def line_offset(pos=pos)
137
+ p = 0
138
+ each_line do |line|
139
+ len = line.length
140
+ return (pos - p) if p + len >= pos
141
+ p += len
142
+ end
143
+ 0
144
+ end
145
+
146
+ # Returns the 0-based number of the line that contains the character at the
147
+ # given +pos+. +pos+ defaults to the current pointer position.
148
+ def line_index(pos=pos)
149
+ p, n = 0, 0
109
150
  each_line do |line|
110
- pos += line.length
111
- return idx if pos >= offset
112
- idx += 1
151
+ p += line.length
152
+ return n if p >= pos
153
+ n += 1
113
154
  end
114
155
  0
115
156
  end
157
+
158
+ # Returns the 1-based number of the line that contains the character at the
159
+ # given +pos+. +pos+ defaults to the current pointer position.
160
+ def line_number(pos=pos)
161
+ line_index(pos) + 1
162
+ end
163
+
164
+ alias lineno line_number
165
+
166
+ # Returns the text of the line that contains the character at the given
167
+ # +pos+. +pos+ defaults to the current pointer position.
168
+ def line(pos=pos)
169
+ lines[line_index(pos)]
170
+ end
171
+
172
+ # Returns the match for the given +rule+ at the current pointer position,
173
+ # which is +nil+ if no match can be made.
174
+ def match(rule)
175
+ offset = pos
176
+ match = rule.match(self)
177
+
178
+ if match
179
+ @max_offset = pos if pos > @max_offset
180
+ else
181
+ # Reset the position for the next attempt at a match.
182
+ self.pos = offset unless match
183
+ end
184
+
185
+ match
186
+ end
187
+
188
+ # Returns +true+ when using memoization to cache match results.
189
+ def memoized?
190
+ !! @cache
191
+ end
192
+
193
+ # Modifies this object to cache match results during a parse. This technique
194
+ # (also known as "Packrat" parsing) guarantees parsers will operate in
195
+ # linear time but costs significantly more in terms of time and memory
196
+ # required to perform a parse. For more information, please read the paper
197
+ # on Packrat parsing at http://pdos.csail.mit.edu/~baford/packrat/icfp02/.
198
+ def memoize!
199
+ return if memoized?
200
+
201
+ # Using +instance_eval+ here preserves access to +super+ within the
202
+ # methods we define inside the block.
203
+ instance_eval do
204
+ def match(rule) # :nodoc:
205
+ c = @cache[rule.id] ||= {}
206
+
207
+ if c.key?(pos)
208
+ @cache_hits += 1
209
+ c[pos]
210
+ else
211
+ c[pos] = super
212
+ end
213
+ end
214
+
215
+ # Resets all internal variables so that this object may be used in
216
+ # another parse.
217
+ def reset
218
+ super
219
+ @max_offset = 0
220
+ @cache = {}
221
+ @cache_hits = 0
222
+ end
223
+ end
224
+
225
+ @cache = {}
226
+ @cache_hits = 0
227
+ end
116
228
  end
117
229
 
118
230
  # Inclusion of this module into another extends the receiver with the grammar
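Note on the hunk above: it moves the line bookkeeping (`line_offset`, `line_number`, `line`) from `ParseError` into `Input`, and `ParseError#detail` keeps rendering a caret under the failing column. The arithmetic can be sketched without Citrus. The helper names `locate` and `caret_detail` are hypothetical, and the boundary rule used here (an offset belongs to the line whose text contains it) is an assumption of this sketch, not a copy of the comparison in the diff:

```ruby
# Hypothetical helper: returns [line_number, line_offset, line_text] for a
# 0-based character +offset+ into +text+. Line numbers are 1-based.
def locate(text, offset)
  start = 0
  text.each_line.with_index do |line, index|
    # The offset is on this line if it falls before the line's end.
    if offset < start + line.length
      return [index + 1, offset - start, line.chomp]
    end
    start += line.length
  end
  # Offset at or past the end of input: point at the end of the last line.
  last = (text.lines.last || '').chomp
  [[text.lines.length, 1].max, last.length, last]
end

# Hypothetical helper: prints the line with a caret under the offset, in the
# same "%s\n%s^" shape that ParseError#detail uses.
def caret_detail(text, offset)
  _number, column, line = locate(text, offset)
  "%s\n%s^" % [line, ' ' * column]
end

puts caret_detail("1 + 2\n3 + x\n", 10)
```

For the input shown, offset 10 is the `x` on the second line, so the caret lands under column 4 of `3 + x`.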
@@ -361,85 +473,6 @@ module Citrus
361
473
  end
362
474
  end
363
475
 
364
- # This class represents the core of the parsing algorithm. It wraps the input
365
- # string and serves matches to all nonterminals.
366
- class Input < StringScanner
367
- def initialize(string)
368
- super(string)
369
- @max_offset = 0
370
- end
371
-
372
- # The maximum offset that has been achieved during a parse.
373
- attr_reader :max_offset
374
-
375
- # A nested hash of rule id's to offsets and their respective matches. Only
376
- # present if memoing is enabled.
377
- attr_reader :cache
378
-
379
- # The number of times the cache was hit. Only present if memoing is enabled.
380
- attr_reader :cache_hits
381
-
382
- # Returns the length of this input.
383
- def length
384
- string.length
385
- end
386
-
387
- # Returns the match for a given +rule+ at the current position in the input.
388
- def match(rule)
389
- offset = pos
390
- match = rule.match(self)
391
-
392
- if match
393
- @max_offset = pos if pos > @max_offset
394
- else
395
- # Reset the position for the next attempt at a match.
396
- self.pos = offset
397
- end
398
-
399
- match
400
- end
401
-
402
- # Returns true if this input uses memoization to cache match results. See
403
- # #memoize!.
404
- def memoized?
405
- !! @cache
406
- end
407
-
408
- # Modifies this object to cache match results during a parse. This technique
409
- # (also known as "Packrat" parsing) guarantees parsers will operate in
410
- # linear time but costs significantly more in terms of time and memory
411
- # required to perform a parse. For more information, please read the paper
412
- # on Packrat parsing at http://pdos.csail.mit.edu/~baford/packrat/icfp02/.
413
- def memoize!
414
- return if memoized?
415
-
416
- # Using +instance_eval+ here preserves access to +super+ within the
417
- # methods we define inside the block.
418
- instance_eval do
419
- def match(rule)
420
- c = @cache[rule.id] ||= {}
421
-
422
- if c.key?(pos)
423
- @cache_hits += 1
424
- c[pos]
425
- else
426
- c[pos] = super
427
- end
428
- end
429
-
430
- def reset
431
- super
432
- @max_offset = 0
433
- @cache = {}
434
- @cache_hits = 0
435
- end
436
- end
437
-
438
- @cache = {}
439
- @cache_hits = 0
440
- end
441
- end
442
-
443
476
  # A Rule is an object that is used by a grammar to create matches on the
444
477
  # Input during parsing.
445
478
  module Rule
@@ -448,7 +481,7 @@ module Citrus
448
481
  # Citrus::Rule.eval('"a" | "b"')
449
482
  #
450
483
  def self.eval(expr)
451
- File.parse(expr, :root => :rule_body).value
484
+ Citrus.parse(expr, :root => :rule_body, :consume => true).value
452
485
  end
453
486
 
454
487
  # Returns a new Rule object depending on the type of object given.
@@ -668,7 +701,7 @@ module Citrus
668
701
 
669
702
  # Returns the Match for this rule on +input+, +nil+ if no match can be made.
670
703
  def match(input)
671
- m = input.scan(@rule)
704
+ m = input.scan(rule)
672
705
  create_match(m) if m
673
706
  end
674
707
 
@@ -1016,7 +1049,8 @@ module Citrus
1016
1049
  def method_missing(sym, *args)
1017
1050
  m = first(sym)
1018
1051
  return m if m
1019
- raise 'No match named "%s" in %s (%s)' % [sym, self, name]
1052
+ raise NoMatchError, 'No match named "%s" in %s (%s)' %
1053
+ [sym, self, name || '<anonymous>']
1020
1054
  end
1021
1055
 
1022
1056
  def to_ary
@@ -1037,8 +1071,8 @@ class Object
1037
1071
  # end
1038
1072
  #
1039
1073
  def grammar(name, &block)
1040
- obj = respond_to?(:const_set) ? self : Object
1041
- obj.const_set(name, Citrus::Grammar.new(&block))
1074
+ namespace = respond_to?(:const_set) ? self : Object
1075
+ namespace.const_set(name, Citrus::Grammar.new(&block))
1042
1076
  rescue NameError
1043
1077
  raise ArgumentError, 'Invalid grammar name: %s' % name
1044
1078
  end
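Note on the memoized `match` removed from lib/citrus.rb in this hunk set: it caches results per rule id and input position, and relies on `def` inside `instance_eval` so that `super` still reaches the class-level `match`. A minimal sketch of that idea follows. `SketchInput` and `LiteralRule` are hypothetical stand-ins, not Citrus classes, and on a cache hit this sketch returns the cached result without advancing the scanner position.

```ruby
require 'strscan'

# Hypothetical stand-in for a rule: matches a fixed string at the scanner position.
class LiteralRule
  attr_reader :id, :calls

  def initialize(id, string)
    @id = id
    @pattern = Regexp.new(Regexp.escape(string))
    @calls = 0
  end

  def match(input)
    @calls += 1
    input.scan(@pattern)
  end
end

# Hypothetical input class, loosely modelled on the removed code.
class SketchInput < StringScanner
  attr_reader :cache_hits

  # Unmemoized match: restore the position when the rule fails.
  def match(rule)
    offset = pos
    m = rule.match(self)
    self.pos = offset unless m
    m
  end

  # Replace #match on this one object with a caching version.
  def memoize!
    return if @cache

    # +def+ inside +instance_eval+ defines a singleton method, so +super+
    # below still calls SketchInput#match.
    instance_eval do
      def match(rule)
        c = @cache[rule.id] ||= {}
        if c.key?(pos)
          @cache_hits += 1
          c[pos]
        else
          c[pos] = super
        end
      end
    end

    @cache = {}
    @cache_hits = 0
  end
end
```

Matching the same rule twice at the same position should then call the rule only once; the second lookup is served from the cache.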
data/lib/citrus/debug.rb CHANGED
@@ -58,7 +58,7 @@ module Citrus
58
58
  ].each do |rule_class|
59
59
  rule_class.class_eval do
60
60
  alias original_match match
61
-
61
+
62
62
  def match(input)
63
63
  m = original_match(input)
64
64
  m.offset = input.pos - m.length if m
data/lib/citrus/file.rb CHANGED
@@ -1,41 +1,67 @@
1
1
  require 'citrus'
2
2
 
3
3
  module Citrus
4
+ # Some helper methods for rules that alias +module_name+ and don't want to
5
+ # use +Kernel#eval+ to retrieve Module objects.
6
+ module ModuleHelpers #:nodoc:
7
+ def module_segments
8
+ @module_segments ||= module_name.value.split('::')
9
+ end
10
+
11
+ def module_namespace
12
+ module_segments[0..-2].inject(Object) do |namespace, constant|
13
+ constant.empty? ? namespace : namespace.const_get(constant)
14
+ end
15
+ end
16
+
17
+ def module_basename
18
+ module_segments.last
19
+ end
20
+ end
21
+
4
22
  # A grammar for Citrus grammar files. This grammar is used in Citrus#eval to
5
23
  # parse and evaluate Citrus grammars and serves as a prime example of how to
6
24
  # create a complex grammar complete with semantic interpretation in pure Ruby.
7
- File = Grammar.new do
25
+ File = Grammar.new do #:nodoc:
8
26
 
9
27
  ## Hierarchical syntax
10
28
 
11
29
  rule :file do
12
30
  all(:space, zero_or_more(any(:require, :grammar))) {
13
- find(:require).each { |r| require r.value }
14
- find(:grammar).map { |g| g.value }
31
+ find(:require).each {|r| require r.value }
32
+ find(:grammar).map {|g| g.value }
15
33
  }
16
34
  end
17
35
 
18
36
  rule :grammar do
19
37
  all(:grammar_keyword, :module_name, :grammar_body, :end_keyword) {
20
- code = '%s = Citrus::Grammar.new' % module_name.value
21
- grammar = eval(code, TOPLEVEL_BINDING)
38
+ include ModuleHelpers
39
+
40
+ def value
41
+ module_namespace.const_set(module_basename, grammar_body.value)
42
+ end
43
+ }
44
+ end
45
+
46
+ rule :grammar_body do
47
+ zero_or_more(any(:include, :root, :rule)) {
48
+ grammar = Grammar.new
22
49
 
23
- modules = find(:include).map { |inc| eval(inc.value, TOPLEVEL_BINDING) }
24
- modules.each { |mod| grammar.include(mod) }
50
+ find(:include).map do |inc|
51
+ grammar.include(inc.value)
52
+ end
25
53
 
26
54
  root = find(:root).last
27
55
  grammar.root(root.value) if root
28
56
 
29
- find(:rule).each { |r| grammar.rule(r.rule_name.value, r.value) }
57
+ find(:rule).each do |r|
58
+ grammar.rule(r.rule_name.value, r.value)
59
+ end
30
60
 
31
61
  grammar
32
62
  }
33
63
  end
34
64
 
35
- rule :grammar_body do
36
- zero_or_more(any(:include, :root, :rule))
37
- end
38
-
39
65
  rule :rule do
40
66
  all(:rule_keyword, :rule_name, :rule_body, :end_keyword) {
41
67
  rule_body.value
@@ -43,27 +69,37 @@ module Citrus
43
69
  end
44
70
 
45
71
  rule :rule_body do
46
- all(:sequence, :choice) {
47
- @choices ||= [ sequence ] + choice.value
48
- values = @choices.map { |c| c.value }
49
- values.length > 1 ? Choice.new(values) : values[0]
72
+ zero_or_one(:choice) {
73
+ # An empty rule definition matches the empty string.
74
+ matches.length > 0 ? choice.value : Rule.new('')
50
75
  }
51
76
  end
52
77
 
53
78
  rule :choice do
54
- zero_or_more([ :bar, :sequence ]) {
55
- matches.map { |m| m.matches[1] }
79
+ all(:sequence, zero_or_more([ :bar, :sequence ])) {
80
+ def rules
81
+ @rules ||= [ sequence.value ] + matches[1].matches.map {|m| m.matches[1].value }
82
+ end
83
+
84
+ def value
85
+ rules.length > 1 ? Choice.new(rules) : rules.first
86
+ end
56
87
  }
57
88
  end
58
89
 
59
90
  rule :sequence do
60
- zero_or_more(:appendix) {
61
- values = matches.map { |m| m.value }
62
- values.length > 1 ? Sequence.new(values) : values[0]
91
+ one_or_more(:expression) {
92
+ def rules
93
+ @rules ||= matches.map {|m| m.value }
94
+ end
95
+
96
+ def value
97
+ rules.length > 1 ? Sequence.new(rules) : rules.first
98
+ end
63
99
  }
64
100
  end
65
101
 
66
- rule :appendix do
102
+ rule :expression do
67
103
  all(:prefix, zero_or_one(:extension)) {
68
104
  rule = prefix.value
69
105
  extension = matches[1].first
@@ -105,7 +141,13 @@ module Citrus
105
141
  end
106
142
 
107
143
  rule :include do
108
- all(:include_keyword, :module_name) { module_name.value }
144
+ all(:include_keyword, :module_name) {
145
+ include ModuleHelpers
146
+
147
+ def value
148
+ module_namespace.const_get(module_basename)
149
+ end
150
+ }
109
151
  end
110
152
 
111
153
  rule :root do
@@ -142,13 +184,13 @@ module Citrus
142
184
 
143
185
  rule :quoted_string do
144
186
  all(/(["'])(?:\\?.)*?\1/, :space) {
145
- eval(first.to_s)
187
+ eval(first)
146
188
  }
147
189
  end
148
190
 
149
191
  rule :character_class do
150
192
  all(/\[(?:\\?.)*?\]/, :space) {
151
- Regexp.new('\A' + first.to_s, nil, 'n')
193
+ Regexp.new('\A' + first, nil, 'n')
152
194
  }
153
195
  end
154
196
 
@@ -160,7 +202,7 @@ module Citrus
160
202
 
161
203
  rule :regular_expression do
162
204
  all(/\/(?:\\?.)*?\/[imxouesn]*/, :space) {
163
- eval(first.to_s)
205
+ eval(first)
164
206
  }
165
207
  end
166
208
 
@@ -198,12 +240,16 @@ module Citrus
198
240
 
199
241
  rule :tag do
200
242
  all(:lt, :module_name, :gt) {
201
- eval(module_name.value, TOPLEVEL_BINDING)
243
+ include ModuleHelpers
244
+
245
+ def value
246
+ module_namespace.const_get(module_basename)
247
+ end
202
248
  }
203
249
  end
204
250
 
205
251
  rule :block do
206
- all(:lcurly, zero_or_more(any(:block, /[^}]+/)), :rcurly) {
252
+ all(:lcurly, zero_or_more(any(:block, /[^{}]+/)), :rcurly) {
207
253
  eval('Proc.new ' + to_s)
208
254
  }
209
255
  end
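Note on `ModuleHelpers` in the file.rb changes: the new code resolves names such as `::Foo::Bar` by walking the `::`-separated segments with `const_get` instead of calling `eval` against `TOPLEVEL_BINDING`. A minimal standalone sketch of the same lookup follows. `resolve_constant` is a hypothetical helper name, not part of Citrus, and it folds the namespace and basename steps into one pass.

```ruby
# Resolve a constant path such as "File::Stat" or "::Comparable" without eval.
# A leading "::" produces an empty first segment, which is skipped so the
# lookup still starts at Object.
def resolve_constant(name)
  name.split('::').inject(Object) do |namespace, constant|
    constant.empty? ? namespace : namespace.const_get(constant)
  end
end

resolve_constant('Comparable')    # same object as ::Comparable
resolve_constant('::File::Stat')  # same object as File::Stat
```

Compared with `eval`, this only ever performs constant lookups, so arbitrary Ruby embedded in a module name is not executed; an unknown name raises `NameError`.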
data/test/alias_test.rb CHANGED
@@ -32,7 +32,7 @@ class AliasTest < Test::Unit::TestCase
32
32
  assert(match)
33
33
  assert('ab', match.value)
34
34
 
35
- assert_raise RuntimeError do
35
+ assert_raise NoMatchError do
36
36
  match.b
37
37
  end
38
38
  end
data/test/debug_test.rb ADDED
@@ -0,0 +1,23 @@
1
+ require File.expand_path('../helper', __FILE__)
2
+
3
+ # This file tests functionality that is only present when debugging is enabled.
4
+
5
+ class DebugTest < Test::Unit::TestCase
6
+
7
+ def test_offset
8
+ match = Words.parse('one two')
9
+ assert(match)
10
+ assert_equal(0, match.offset)
11
+
12
+ words = match.find(:word)
13
+ assert(match)
14
+ assert_equal(2, words.length)
15
+
16
+ assert_equal('one', words[0])
17
+ assert_equal(0, words[0].offset)
18
+
19
+ assert_equal('two', words[1])
20
+ assert_equal(4, words[1].offset)
21
+ end
22
+
23
+ end
data/test/file_test.rb CHANGED
@@ -210,20 +210,60 @@ class CitrusFileTest < Test::Unit::TestCase
210
210
  assert_instance_of(AndPredicate, match.value)
211
211
  end
212
212
 
213
+ def test_empty
214
+ grammar = file(:rule_body)
215
+
216
+ match = grammar.parse('')
217
+ assert(match)
218
+ end
219
+
220
+ def test_choice
221
+ grammar = file(:choice)
222
+
223
+ match = grammar.parse('"a" | "b"')
224
+ assert(match)
225
+ assert_equal(2, match.rules.length)
226
+ assert_instance_of(Choice, match.value)
227
+
228
+ match = grammar.parse('"a" | ("b" "c")')
229
+ assert(match)
230
+ assert_equal(2, match.rules.length)
231
+ assert_instance_of(Choice, match.value)
232
+ end
233
+
213
234
  def test_sequence
214
235
  grammar = file(:sequence)
215
236
 
216
237
  match = grammar.parse('"" ""')
217
238
  assert(match)
218
- assert_kind_of(Rule, match.value)
239
+ assert_equal(2, match.rules.length)
219
240
  assert_instance_of(Sequence, match.value)
220
241
 
221
242
  match = grammar.parse('"a" "b" "c"')
222
243
  assert(match)
223
- assert_kind_of(Rule, match.value)
244
+ assert_equal(3, match.rules.length)
224
245
  assert_instance_of(Sequence, match.value)
225
246
  end
226
247
 
248
+ def test_expression
249
+ grammar = file(:expression)
250
+
251
+ match = grammar.parse('"" <Module>')
252
+ assert(match)
253
+ assert_kind_of(Rule, match.value)
254
+ assert_kind_of(Module, match.value.extension)
255
+
256
+ match = grammar.parse('"" {}')
257
+ assert(match)
258
+ assert_kind_of(Rule, match.value)
259
+ assert_kind_of(Module, match.value.extension)
260
+
261
+ match = grammar.parse('"" {} ')
262
+ assert(match)
263
+ assert_kind_of(Rule, match.value)
264
+ assert_kind_of(Module, match.value.extension)
265
+ end
266
+
227
267
  def test_prefix
228
268
  grammar = file(:prefix)
229
269
 
@@ -248,25 +288,6 @@ class CitrusFileTest < Test::Unit::TestCase
248
288
  assert_instance_of(Label, match.value)
249
289
  end
250
290
 
251
- def test_appendix
252
- grammar = file(:appendix)
253
-
254
- match = grammar.parse('"" <Module>')
255
- assert(match)
256
- assert_kind_of(Rule, match.value)
257
- assert_kind_of(Module, match.value.extension)
258
-
259
- match = grammar.parse('"" {}')
260
- assert(match)
261
- assert_kind_of(Rule, match.value)
262
- assert_kind_of(Module, match.value.extension)
263
-
264
- match = grammar.parse('"" {} ')
265
- assert(match)
266
- assert_kind_of(Rule, match.value)
267
- assert_kind_of(Module, match.value.extension)
268
- end
269
-
270
291
  def test_suffix
271
292
  grammar = file(:suffix)
272
293
 
@@ -325,11 +346,11 @@ class CitrusFileTest < Test::Unit::TestCase
325
346
 
326
347
  match = grammar.parse('include Module')
327
348
  assert(match)
328
- assert_equal('Module', match.value)
349
+ assert_equal(Module, match.value)
329
350
 
330
351
  match = grammar.parse('include ::Module')
331
352
  assert(match)
332
- assert_equal('::Module', match.value)
353
+ assert_equal(Module, match.value)
333
354
  end
334
355
 
335
356
  def test_root
@@ -577,6 +598,15 @@ class CitrusFileTest < Test::Unit::TestCase
577
598
  match = grammar.parse("{\n def value\n 'a'\n end\n} ")
578
599
  assert(match)
579
600
  assert(match.value)
601
+
602
+ end
603
+
604
+ def test_block_with_interpolation
605
+ grammar = file(:block)
606
+
607
+ match = grammar.parse('{ "#{number}" }')
608
+ assert(match)
609
+ assert(match.value)
580
610
  end
581
611
 
582
612
  def test_repeat
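Note on `test_block_with_interpolation` and the `:block` rule change from `/[^}]+/` to `/[^{}]+/` in data/lib/citrus/file.rb: with the old filler pattern, the `{` of an interpolation such as `#{number}` is swallowed as plain text, so the first `}` closes the outer block too early. The sketch below imitates the rule with a small hand-written recursive matcher. `scan_block` and `block?` are hypothetical helpers, not Citrus code, and the matcher does not backtrack into the filler, in the way a parsing expression would not.

```ruby
require 'strscan'

# Match "{", then any mix of nested blocks and filler text, then "}".
# Returns true on success; the scanner position marks how much was consumed.
def scan_block(scanner, filler)
  return false unless scanner.scan(/\{/)

  loop do
    start = scanner.pos
    next if scan_block(scanner, filler)  # try a nested block first
    scanner.pos = start                  # undo a partial nested match
    break unless scanner.scan(filler)    # otherwise consume filler text
  end

  !scanner.scan(/\}/).nil?
end

# True only if the whole string is a single block.
def block?(string, filler)
  scanner = StringScanner.new(string)
  scan_block(scanner, filler) && scanner.eos?
end

source = '{ "#{number}" }'
old_ok = block?(source, /[^}]+/)   # old filler: also consumes "{"
new_ok = block?(source, /[^{}]+/)  # new filler: stops at "{" so nesting is seen
```

Under these assumptions `old_ok` comes out false, because the match stops at the first `}` and leaves input unconsumed, while `new_ok` comes out true.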
data/test/input_test.rb ADDED
@@ -0,0 +1,22 @@
1
+ require File.expand_path('../helper', __FILE__)
2
+
3
+ class InputTest < Test::Unit::TestCase
4
+
5
+ def test_new_input
6
+ input = Input.new("abc\ndef\nghi")
7
+ assert_equal(0, input.line_offset)
8
+ assert_equal(0, input.line_index)
9
+ assert_equal(1, input.line_number)
10
+ assert_equal("abc\n", input.line)
11
+ end
12
+
13
+ def test_advanced_input
14
+ input = Input.new("abc\ndef\nghi")
15
+ input.pos = 6
16
+ assert_equal(2, input.line_offset)
17
+ assert_equal(1, input.line_index)
18
+ assert_equal(2, input.line_number)
19
+ assert_equal("def\n", input.line)
20
+ end
21
+
22
+ end
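Note on the new input tests: they expect `line_offset`, `line_index`, `line_number` and `line` to be derived from the current position in the input. The sketch below shows one way those four values could be computed from a string and a position. `line_info` is a hypothetical helper, not the Citrus implementation, and it assumes `"\n"` line endings.

```ruby
# Derive line information for +pos+ within +string+.
def line_info(string, pos)
  before = string[0, pos]                       # text to the left of pos
  line_index = before.count("\n")               # 0-based line index
  line_start = (before.rindex("\n") || -1) + 1  # offset of the line's first char

  {
    :line_offset => pos - line_start,           # column within the line
    :line_index  => line_index,
    :line_number => line_index + 1,             # 1-based line number
    :line        => string.lines[line_index]    # line text, newline included
  }
end

info = line_info("abc\ndef\nghi", 6)
```

For the input used in `test_advanced_input`, `info` holds offset 2, index 1, number 2 and the line `"def\n"`, matching the assertions in that test.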
data/test/match_test.rb CHANGED
@@ -53,20 +53,4 @@ class MatchTest < Test::Unit::TestCase
53
53
  assert_equal(15, match.find(:alpha).length)
54
54
  end
55
55
 
56
- def test_offset
57
- match = Words.parse('one two')
58
- assert(match)
59
- assert_equal(0, match.offset)
60
-
61
- words = match.find(:word)
62
- assert(match)
63
- assert_equal(2, words.length)
64
-
65
- assert_equal('one', words[0])
66
- assert_equal(0, words[0].offset)
67
-
68
- assert_equal('two', words[1])
69
- assert_equal(4, words[1].offset)
70
- end
71
-
72
56
  end
metadata CHANGED
@@ -4,9 +4,9 @@ version: !ruby/object:Gem::Version
4
4
  prerelease: false
5
5
  segments:
6
6
  - 2
7
- - 0
8
7
  - 1
9
- version: 2.0.1
8
+ - 1
9
+ version: 2.1.1
10
10
  platform: ruby
11
11
  authors:
12
12
  - Michael Jackson
@@ -61,6 +61,7 @@ files:
61
61
  - doc/license.markdown
62
62
  - doc/links.markdown
63
63
  - doc/syntax.markdown
64
+ - doc/testing.markdown
64
65
  - examples/calc.citrus
65
66
  - examples/calc.rb
66
67
  - examples/ip.citrus
@@ -83,9 +84,11 @@ files:
83
84
  - test/calc_file_test.rb
84
85
  - test/calc_test.rb
85
86
  - test/choice_test.rb
87
+ - test/debug_test.rb
86
88
  - test/file_test.rb
87
89
  - test/grammar_test.rb
88
90
  - test/helper.rb
91
+ - test/input_test.rb
89
92
  - test/label_test.rb
90
93
  - test/match_test.rb
91
94
  - test/multibyte_test.rb
@@ -143,8 +146,10 @@ test_files:
143
146
  - test/calc_file_test.rb
144
147
  - test/calc_test.rb
145
148
  - test/choice_test.rb
149
+ - test/debug_test.rb
146
150
  - test/file_test.rb
147
151
  - test/grammar_test.rb
152
+ - test/input_test.rb
148
153
  - test/label_test.rb
149
154
  - test/match_test.rb
150
155
  - test/multibyte_test.rb