treetop 1.2.3 → 1.2.4

data/README CHANGED
@@ -15,6 +15,7 @@ The first step in using Treetop is defining a grammar in a file with the `.treet
15
15
  Next, you start filling your grammar with rules. Each rule associates a name with a parsing expression, like the following:
16
16
 
17
17
  # my_grammar.treetop
18
+ # You can use a .tt extension instead if you wish
18
19
  grammar MyGrammar
19
20
  rule hello
20
21
  'hello chomsky'
@@ -27,10 +28,12 @@ The first rule becomes the *root* of the grammar, causing its expression to be m
27
28
  require 'rubygems'
28
29
  require 'treetop'
29
30
  Treetop.load 'my_grammar'
31
+ # or just:
32
+ # require 'my_grammar' # This works because Polyglot hooks "require" to find and load Treetop files
30
33
 
31
34
  parser = MyGrammarParser.new
32
- puts parser.parse('hello chomsky').success? # => true
33
- puts parser.parse('silly generativists!').success? # => false
35
+ puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
36
+ puts parser.parse('silly generativists!') # => nil
34
37
 
35
38
  Users of *regular expressions* will find parsing expressions familiar. They share the same basic purpose, matching strings against patterns. However, parsing expressions can recognize a broader category of languages than their less expressive brethren. Before we get into demonstrating that, let's cover some basics. At first parsing expressions won't seem much different. Trust that they are.
36
39
 
@@ -50,10 +53,12 @@ Ordered choices are *composite expressions*, which allow for any of several sube
50
53
  end
51
54
 
52
55
  # fragment of use_grammar.rb
53
- puts parser.parse('hello chomsky').success? # => true
54
- puts parser.parse('hello lambek').success? # => true
55
- puts parser.parse('silly generativists!').success? # => false
56
-
56
+ puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
57
+ puts parser.parse('hello lambek') # => Treetop::Runtime::SyntaxNode
58
+ puts parser.parse('silly generativists!') # => nil
59
+
60
+ Note that once a choice rule has matched the text using a particular alternative at a particular location in the input and hence has succeeded, that choice will never be reconsidered, even if the chosen alternative causes another rule to fail where a later alternative wouldn't have. It's always a later alternative, since the first to succeed is final - why keep looking when you've found what you wanted? This is a feature of PEG parsers that you need to understand if you're going to succeed in using Treetop. In order to memoize successes and failures, such decisions cannot be reversed. Luckily Treetop provides a variety of clever ways you can tell it to avoid making the wrong decisions. But more on that later.
61
+
57
62
  Sequences
58
63
  ---------
59
64
  Sequences are composed of other parsing expressions separated by spaces. Using sequences, we can tighten up the above grammar.
@@ -65,7 +70,9 @@ Sequences are composed of other parsing expressions separated by spaces. Using s
65
70
  end
66
71
  end
67
72
 
68
- Node the use of parentheses to override the default precedence rules, which bind sequences more tightly than choices.
73
+ Note the use of parentheses to override the default precedence rules, which bind sequences more tightly than choices.
74
+
75
+ Once the whole sequence has been matched, the result is memoized and the details of the match will not be reconsidered for that location in the input.
69
76
 
70
77
  Nonterminal Symbols
71
78
  -------------------
@@ -94,6 +101,47 @@ The true power of this facility, however, is unleashed when writing *recursive e
94
101
 
95
102
  The `parens` expression simply states that a `parens` is a set of parentheses surrounding another `parens` expression or, if that doesn't match, the empty string. If you are uncomfortable with recursion, it's time to get comfortable, because it is the basis of language. Here's a tip: Don't try to imagine the parser circling round and round through the same rule. Instead, imagine the rule is *already* defined while you are defining it. If you imagine that `parens` already matches a string of matching parentheses, then it's easy to think of `parens` as an opening and a closing parenthesis around another set of matching parentheses, which, conveniently, you happen to be defining. You know that `parens` is supposed to represent a string of matched parentheses, so trust in that meaning, even if you haven't fully implemented it yet.
96
103
 
104
+ Repetition
105
+ ----------
106
+ Any item in a rule may be followed by a '+' or a '*' character, signifying one-or-more and zero-or-more occurrences of that item. Beware though; the match is greedy, and if it matches too many items and causes subsequent items in the sequence to fail, the number matched will never be reconsidered. Here's a simple example of a rule that will never succeed:
107
+
108
+ # toogreedy.treetop
109
+ grammar TooGreedy
110
+ rule a_s
111
+ 'a'* 'a'
112
+ end
113
+ end
114
+
115
+ The 'a'* will always eat up any 'a's that follow, and the subsequent 'a' will find none there, so the whole rule will fail. You might need to use lookahead to avoid matching too much.
116
+
117
+ Negative Lookahead
118
+ ------------------
119
+
120
+ When you need to ensure that the following item *doesn't* match in some case where it might otherwise, you can use negat!ve lookahead, which is an item preceded by a ! - here's an example:
121
+
122
+ # postcondition.treetop
123
+ grammar PostCondition
124
+ rule conditional_sentence
125
+ ( !conditional_keyword word )+ conditional_keyword [ \t]+ word*
126
+ end
127
+
128
+ rule word
129
+ ([a-zA-Z]+ [ \t]+)
130
+ end
131
+
132
+ rule conditional_keyword
133
+ 'if' / 'while' / 'until'
134
+ end
135
+ end
136
+
137
+ Even though the rule `word` would match any of the conditional keywords, the first words of a conditional_sentence must not be conditional_keywords. The negative lookahead prevents that matching, and prevents the repetition from matching too much input. Note that the lookahead may be a grammar rule of any complexity, including one that isn't used elsewhere in your grammar.
138
+
139
+ Positive lookahead
140
+ ------------------
141
+
142
+ Sometimes you want an item to match, but only if the *following* text would match some pattern. You don't want to consume that following text, but if it's not there, you want this rule to fail. You can add a positive lookahead like this to a rule by appending the lookahead rule preceded by an & character.
143
+
144
+
97
145
 
98
146
  Features to cover in the talk
99
147
  =============================
@@ -114,5 +162,3 @@ Features to cover in the talk
114
162
  * Use of super within labels
115
163
  * Grammar composition with include
116
164
  * Use of super with grammar composition
117
-
118
-
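Reviewer's sketch for the README additions above (ordered choice, repetition, lookahead). The no-backtracking behaviour the new text describes can be reproduced with a few hand-rolled matchers. The helpers `lit`, `any`, `seq`, `star`, `ahead`, `not_ahead` and `matches?` are hypothetical names invented for this illustration. They are not Treetop's API, and the sketch makes no claim about how Treetop implements these operators internally. Each matcher takes the input and a position and returns the new position, or nil on failure.

```ruby
# Hypothetical PEG-style matchers (not Treetop's API). Each matcher is a
# lambda taking (input, pos) and returning the new position, or nil on failure.

# Match a literal string at pos.
def lit(str)
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

# Match any single character (the '.' of a parsing expression).
def any
  ->(input, pos) { pos < input.length ? pos + 1 : nil }
end

# Match each part in order; once one part fails, the whole sequence fails.
def seq(*parts)
  ->(input, pos) { parts.reduce(pos) { |p, part| p && part.call(input, p) } }
end

# Greedy zero-or-more: consume while the item matches and never give back.
def star(item)
  lambda do |input, pos|
    while (nxt = item.call(input, pos)) && nxt > pos
      pos = nxt
    end
    pos
  end
end

# Positive lookahead (&item): succeed without consuming if item matches.
def ahead(item)
  ->(input, pos) { item.call(input, pos) ? pos : nil }
end

# Negative lookahead (!item): succeed without consuming if item does not match.
def not_ahead(item)
  ->(input, pos) { item.call(input, pos) ? nil : pos }
end

# True when the rule consumes the whole input.
def matches?(rule, input)
  rule.call(input, 0) == input.length
end

too_greedy = seq(star(lit('a')), lit('a'))                        # 'a'* 'a'
with_ahead = seq(star(seq(lit('a'), ahead(lit('a')))), lit('a'))  # ('a' &'a')* 'a'
until_b    = seq(star(seq(not_ahead(lit('b')), any)), lit('b'))   # (!'b' .)* 'b'

puts matches?(too_greedy, 'aaa')  # prints false: 'a'* already ate every 'a'
puts matches?(with_ahead, 'aaa')  # prints true: the star stops one 'a' early
puts matches?(until_b, 'aab')     # prints true: the star stops before 'b'
```

The first rule is the README's `TooGreedy` example. The other two show the two lookahead forms keeping a repetition from consuming input that a later item in the sequence still needs.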
data/Rakefile CHANGED
@@ -15,7 +15,7 @@ end
15
15
 
16
16
  gemspec = Gem::Specification.new do |s|
17
17
  s.name = "treetop"
18
- s.version = "1.2.3"
18
+ s.version = "1.2.4"
19
19
  s.author = "Nathan Sobo"
20
20
  s.email = "nathansobo@gmail.com"
21
21
  s.homepage = "http://functionalform.blogspot.com"
data/bin/tt CHANGED
File without changes
@@ -3,9 +3,10 @@ I've created a <a href="http://groups.google.com/group/treetop-dev">Google Group
3
3
  treetop-dev@google-groups.com
4
4
 
5
5
  #Contributing
6
+ Visit <a href="http://github.com/nathansobo/treetop/tree/master">the Treetop repository page on GitHub</a> in your browser for more information about checking out the source code.
7
+
6
8
  I like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.
7
9
 
8
- The source code is currently stored in a git repository at <a href="http://repo.or.cz/w/treetop.git">http://repo.or.cz/w/treetop.git</a>
9
10
 
10
11
  ##Getting Started with the Code
11
12
  Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around `metagrammar.treetop`, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.
@@ -10,7 +10,7 @@ class Layout < Erector::Widget
10
10
  :type => "text/css",
11
11
  :href => "./screen.css"
12
12
 
13
- text %(
13
+ rawtext %(
14
14
  <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
15
15
  </script>
16
16
  <script type="text/javascript">
@@ -42,7 +42,7 @@ class Layout < Erector::Widget
42
42
 
43
43
  def bluecloth(relative_path)
44
44
  File.open(File.join(File.dirname(__FILE__), relative_path)) do |file|
45
- text BlueCloth.new(file.read).to_html
45
+ rawtext BlueCloth.new(file.read).to_html
46
46
  end
47
47
  end
48
48
 
File without changes
@@ -6,23 +6,35 @@ module Treetop
6
6
  target_file.write(ruby_source(source_path))
7
7
  end
8
8
  end
9
-
9
+
10
+ # compile a treetop file into ruby
10
11
  def ruby_source(source_path)
11
- File.open(source_path) do |source_file|
12
- parser = MetagrammarParser.new
13
- result = parser.parse(source_file.read)
14
- unless result
15
- raise RuntimeError.new(parser.failure_reason)
16
- end
17
- result.compile
12
+ ruby_source_from_string(File.read(source_path))
13
+ end
14
+
15
+ # compile a string containing treetop source into ruby
16
+ def ruby_source_from_string(s)
17
+ parser = MetagrammarParser.new
18
+ result = parser.parse(s)
19
+ unless result
20
+ raise RuntimeError.new(parser.failure_reason)
18
21
  end
22
+ result.compile
19
23
  end
20
24
  end
21
25
  end
22
26
 
27
+ # compile a treetop source file and load it
23
28
  def self.load(path)
24
29
  adjusted_path = path =~ /\.(treetop|tt)\Z/ ? path : path + '.treetop'
30
+ File.open(adjusted_path) do |source_file|
31
+ load_from_string(source_file.read)
32
+ end
33
+ end
34
+
35
+ # compile a treetop source string and load it
36
+ def self.load_from_string(s)
25
37
  compiler = Treetop::Compiler::GrammarCompiler.new
26
- Object.class_eval(compiler.ruby_source(adjusted_path))
38
+ Object.class_eval(compiler.ruby_source_from_string(s))
27
39
  end
28
40
  end
@@ -141,7 +141,7 @@ module Treetop
141
141
  r3 = _nt_space
142
142
  s1 << r3
143
143
  if r3
144
- if input.index(/[A-Z]/, index) == index
144
+ if input.index(Regexp.new('[A-Z]'), index) == index
145
145
  r4 = (SyntaxNode).new(input, index...(index + 1))
146
146
  @index += 1
147
147
  else
@@ -324,7 +324,7 @@ module Treetop
324
324
  end
325
325
 
326
326
  i0, s0 = index, []
327
- if input.index(/[A-Z]/, index) == index
327
+ if input.index(Regexp.new('[A-Z]'), index) == index
328
328
  r1 = (SyntaxNode).new(input, index...(index + 1))
329
329
  @index += 1
330
330
  else
@@ -523,7 +523,7 @@ module Treetop
523
523
  r2 = _nt_space
524
524
  s0 << r2
525
525
  if r2
526
- if input.index(/[A-Z]/, index) == index
526
+ if input.index(Regexp.new('[A-Z]'), index) == index
527
527
  r3 = (SyntaxNode).new(input, index...(index + 1))
528
528
  @index += 1
529
529
  else
@@ -2508,7 +2508,7 @@ module Treetop
2508
2508
  else
2509
2509
  i5, s5 = index, []
2510
2510
  i6 = index
2511
- if input.index(/[{}]/, index) == index
2511
+ if input.index(Regexp.new('[{}]'), index) == index
2512
2512
  r7 = (SyntaxNode).new(input, index...(index + 1))
2513
2513
  @index += 1
2514
2514
  else
@@ -2691,7 +2691,7 @@ module Treetop
2691
2691
  return cached
2692
2692
  end
2693
2693
 
2694
- if input.index(/[A-Za-z_]/, index) == index
2694
+ if input.index(Regexp.new('[A-Za-z_]'), index) == index
2695
2695
  r0 = (SyntaxNode).new(input, index...(index + 1))
2696
2696
  @index += 1
2697
2697
  else
@@ -2716,7 +2716,7 @@ module Treetop
2716
2716
  if r1
2717
2717
  r0 = r1
2718
2718
  else
2719
- if input.index(/[0-9]/, index) == index
2719
+ if input.index(Regexp.new('[0-9]'), index) == index
2720
2720
  r2 = (SyntaxNode).new(input, index...(index + 1))
2721
2721
  @index += 1
2722
2722
  else
@@ -2841,12 +2841,7 @@ module Treetop
2841
2841
  break
2842
2842
  end
2843
2843
  end
2844
- if s2.empty?
2845
- self.index = i2
2846
- r2 = nil
2847
- else
2848
- r2 = SyntaxNode.new(input, i2...index, s2)
2849
- end
2844
+ r2 = SyntaxNode.new(input, i2...index, s2)
2850
2845
  s0 << r2
2851
2846
  end
2852
2847
  if s0.last
@@ -2870,7 +2865,7 @@ module Treetop
2870
2865
  return cached
2871
2866
  end
2872
2867
 
2873
- if input.index(/[ \t\n\r]/, index) == index
2868
+ if input.index(Regexp.new('[ \\t\\n\\r]'), index) == index
2874
2869
  r0 = (SyntaxNode).new(input, index...(index + 1))
2875
2870
  @index += 1
2876
2871
  else
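Reviewer's sketch for the generated-parser hunks above, which swap regexp literals for `Regexp.new` calls with string arguments. The diff does not state why the change was made. The snippet only illustrates that the two forms match the same text, how the `input.index(pattern, index) == index` idiom tests for a match at exactly one position, and why the whitespace class needs doubled backslashes once it is written as a string. The sample input and variable names are assumptions made for the illustration.

```ruby
input = 'rule Foo'

literal     = /[A-Z]/
constructed = Regexp.new('[A-Z]')

# Both forms carry the same pattern source.
puts literal.source == constructed.source   # prints true

# String#index returns the position of the first match at or after the offset,
# so comparing the result with the offset tests for a match exactly there.
puts input.index(constructed, 5) == 5       # prints true: 'F' is at offset 5
puts input.index(constructed, 0) == 0       # prints false: first match is at 5

# In a single-quoted Ruby string, '\\t' is a backslash followed by 't', which
# the regexp engine then reads as a tab escape.
whitespace = Regexp.new('[ \\t\\n\\r]')
puts "a\tb".index(whitespace, 1) == 1       # prints true
```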
@@ -393,7 +393,7 @@ module Treetop
393
393
  end
394
394
 
395
395
  rule comment_to_eol
396
- '#' (!"\n" .)+
396
+ '#' (!"\n" .)*
397
397
  end
398
398
 
399
399
  rule white
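Reviewer's sketch for the `comment_to_eol` change above. Replacing `+` with `*` means a `#` followed immediately by a newline now counts as a comment. The generated-parser hunk that drops the `s2.empty?` branch looks like the compiled form of the same change, although the diff does not say so. The snippet uses Ruby regexps as a stand-in for the two rule bodies; the patterns are analogues chosen for illustration, not code from Treetop.

```ruby
one_or_more  = /\A#[^\n]+/   # regexp analogue of '#' (!"\n" .)+
zero_or_more = /\A#[^\n]*/   # regexp analogue of '#' (!"\n" .)*

bare_comment = "#\nrule foo"
full_comment = "# a note\nrule foo"

p one_or_more.match(bare_comment)       # prints nil
p zero_or_more.match(bare_comment)[0]   # prints "#"
p one_or_more.match(full_comment)[0]    # prints "# a note"
p zero_or_more.match(full_comment)[0]   # prints "# a note"
```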
File without changes
File without changes
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: treetop
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2.3
4
+ version: 1.2.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Nathan Sobo
@@ -9,7 +9,7 @@ autorequire: treetop
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2008-03-07 00:00:00 -08:00
12
+ date: 2008-06-02 00:00:00 +10:00
13
13
  default_executable:
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
@@ -32,6 +32,7 @@ extra_rdoc_files: []
32
32
  files:
33
33
  - README
34
34
  - Rakefile
35
+ - lib/metagrammar.rb
35
36
  - lib/treetop
36
37
  - lib/treetop/bootstrap_gen_1_metagrammar.rb
37
38
  - lib/treetop/compiler
@@ -82,20 +83,6 @@ files:
82
83
  - doc/index.markdown
83
84
  - doc/pitfalls_and_advanced_techniques.markdown
84
85
  - doc/semantic_interpretation.markdown
85
- - doc/site
86
- - doc/site/contribute.html
87
- - doc/site/images
88
- - doc/site/images/bottom_background.png
89
- - doc/site/images/middle_background.png
90
- - doc/site/images/paren_language_output.png
91
- - doc/site/images/pivotal.gif
92
- - doc/site/images/top_background.png
93
- - doc/site/index.html
94
- - doc/site/pitfalls_and_advanced_techniques.html
95
- - doc/site/screen.css
96
- - doc/site/semantic_interpretation.html
97
- - doc/site/syntactic_recognition.html
98
- - doc/site/using_in_ruby.html
99
86
  - doc/site.rb
100
87
  - doc/sitegen.rb
101
88
  - doc/syntactic_recognition.markdown
@@ -133,7 +120,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
133
120
  requirements: []
134
121
 
135
122
  rubyforge_project:
136
- rubygems_version: 1.0.1
123
+ rubygems_version: 1.1.0
137
124
  signing_key:
138
125
  specification_version: 2
139
126
  summary: A Ruby-based text parsing and interpretation DSL
@@ -1,123 +0,0 @@
1
- <html><head><link type="text/css" href="./screen.css" rel="stylesheet" />
2
- <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
- </script>
4
- <script type="text/javascript">
5
- _uacct = "UA-3418876-1";
6
- urchinTracker();
7
- </script>
8
- </head><body><div id="top"><div id="main_navigation"><ul><li><a href="syntactic_recognition.html">Documentation</a></li><li>Contribute</li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><h1>Google Group</h1>
9
-
10
- <p>I've created a <a href="http://groups.google.com/group/treetop-dev">Google Group</a> as a better place to organize discussion and development.
11
- treetop-dev@google-groups.com</p>
12
-
13
- <h1>Contributing</h1>
14
-
15
- <p>I like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.</p>
16
-
17
- <p>The source code is currently stored in a git repository at <a href="http://repo.or.cz/w/treetop.git">http://repo.or.cz/w/treetop.git</a></p>
18
-
19
- <h2>Getting Started with the Code</h2>
20
-
21
- <p>Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around <code>metagrammar.treetop</code>, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.</p>
22
-
23
- <p>Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of Treetop to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change <code>metagrammar.treetop</code> and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so must be kept backward compatible until such time as a new stable version can be produced to replace it.</p>
24
-
25
- <h2>Tests</h2>
26
-
27
- <p>Most of the compiler's tests are functional in nature. The grammar under test is used to parse and compile piece of sample code. Then I attempt to parse input with the compiled output and test its results.</p>
28
-
29
- <h1>What Needs to be Done</h1>
30
-
31
- <h2>Small Stuff</h2>
32
-
33
- <ul>
34
- <li>Improve the <code>tt</code> command line tool to allow <code>.treetop</code> extensions to be elided in its arguments.</li>
35
- <li>Generate and load temp files with <code>Treetop.load</code> rather than evaluating strings to improve stack trace readability.</li>
36
- <li>Allow <code>do/end</code> style blocks as well as curly brace blocks. This was originally omitted because I thought it would be confusing. It probably isn't.</li>
37
- </ul>
38
-
39
- <h2>Big Stuff</h2>
40
-
41
- <h4>Transient Expressions</h4>
42
-
43
- <p>Currently, every parsing expression instantiates a syntax node. This includes even very simple parsing expressions, like single characters. It is probably unnecessary for every single expression in the parse to correspond to its own syntax node, so much savings could be garnered from a transient declaration that instructs the parser only to attempt a match without instantiating nodes.</p>
44
-
45
- <h3>Generate Rule Implementations in C</h3>
46
-
47
- <p>Parsing expressions are currently compiled into simple Ruby source code that comprises the body of parsing rules, which are translated into Ruby methods. The generator could produce C instead of Ruby in the body of these method implementations.</p>
48
-
49
- <h3>Global Parsing State and Semantic Backtrack Triggering</h3>
50
-
51
- <p>Some programming language grammars are not entirely context-free, requiring that global state dictate the behavior of the parser in certain circumstances. Treetop does not currently expose explicit parser control to the grammar writer, and instead automatically constructs the syntax tree for them. A means of semantic parser control compatible with this approach would involve callback methods defined on parsing nodes. Each time a node is successfully parsed it will be given an opportunity to set global state and optionally trigger a parse failure on <em>extrasyntactic</em> grounds. Nodes will probably need to define an additional method that undoes their changes to global state when there is a parse failure and they are backtracked.</p>
52
-
53
- <p>Here is a sketch of the potential utility of such mechanisms. Consider the structure of YAML, which uses indentation to indicate block structure.</p>
54
-
55
- <pre><code>level_1:
56
- level_2a:
57
- level_2b:
58
- level_3a:
59
- level_2c:
60
- </code></pre>
61
-
62
- <p>Imagine a grammar like the following:</p>
63
-
64
- <pre><code>rule yaml_element
65
- name ':' block
66
- /
67
- name ':' value
68
- end
69
-
70
- rule block
71
- indent yaml_elements outdent
72
- end
73
-
74
- rule yaml_elements
75
- yaml_element (samedent yaml_element)*
76
- end
77
-
78
- rule samedent
79
- newline spaces {
80
- def after_success(parser_state)
81
- spaces.length == parser_state.indent_level
82
- end
83
- }
84
- end
85
-
86
- rule indent
87
- newline spaces {
88
- def after_success(parser_state)
89
- if spaces.length == parser_state.indent_level + 2
90
- parser_state.indent_level += 2
91
- true
92
- else
93
- false # fail the parse on extrasyntactic grounds
94
- end
95
- end
96
-
97
- def undo_success(parser_state)
98
- parser_state.indent_level -= 2
99
- end
100
- }
101
- end
102
-
103
- rule outdent
104
- newline spaces {
105
- def after_success(parser_state)
106
- if spaces.length == parser_state.indent_level - 2
107
- parser_state.indent_level -= 2
108
- true
109
- else
110
- false # fail the parse on extrasyntactic grounds
111
- end
112
- end
113
-
114
- def undo_success(parser_state)
115
- parser_state.indent_level += 2
116
- end
117
- }
118
- end
119
- </code></pre>
120
-
121
- <p>In this case a block will be detected only if a change in indentation warrants it. Note that this change in the state of indentation must be undone if a subsequent failure causes this node not to ultimately be incorporated into a successful result.</p>
122
-
123
- <p>I am by no means sure that the above sketch is free of problems, or even that this overall strategy is sound, but it seems like a promising path.</p></div></div><div id="bottom"></div></body></html>
Binary file
@@ -1,102 +0,0 @@
1
- <html><head><link type="text/css" href="./screen.css" rel="stylesheet" />
2
- <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
- </script>
4
- <script type="text/javascript">
5
- _uacct = "UA-3418876-1";
6
- urchinTracker();
7
- </script>
8
- </head><body><div id="top"><div id="main_navigation"><ul><li><a href="syntactic_recognition.html">Documentation</a></li><li><a href="contribute.html">Contribute</a></li><li>Home</li></ul></div></div><div id="middle"><div id="content"><p class="intro_text">
9
-
10
- Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge <em>parsing expression grammars</em>, it helps you analyze syntax with revolutionarily ease.
11
-
12
- </p>
13
-
14
- <pre><code>sudo gem install treetop
15
- </code></pre>
16
-
17
- <h1>Intuitive Grammar Specifications</h1>
18
-
19
- <p>Parsing expression grammars (PEGs) are simple to write and easy to maintain. They are a simple but powerful generalization of regular expressions that are easier to work with than the LALR or LR-1 grammars of traditional parser generators. There's no need for a tokenization phase, and <em>lookahead assertions</em> can be used for a limited degree of context-sensitivity. Here's an extremely simple Treetop grammar that matches a subset of arithmetic, respecting operator precedence:</p>
20
-
21
- <pre><code>grammar Arithmetic
22
- rule additive
23
- multitive '+' additive / multitive
24
- end
25
-
26
- rule multitive
27
- primary '*' multitive / primary
28
- end
29
-
30
- rule primary
31
- '(' additive ')' / number
32
- end
33
-
34
- rule number
35
- [1-9] [0-9]*
36
- end
37
- end
38
- </code></pre>
39
-
40
- <h1>Syntax-Oriented Programming</h1>
41
-
42
- <p>Rather than implementing semantic actions that construct parse trees, Treetop lets you define methods on trees that it constructs for you automatically. You can define these methods directly within the grammar...</p>
43
-
44
- <pre><code>grammar Arithmetic
45
- rule additive
46
- multitive '+' additive {
47
- def value
48
- multitive.value + additive.value
49
- end
50
- }
51
- /
52
- multitive
53
- end
54
-
55
- # other rules below ...
56
- end
57
- </code></pre>
58
-
59
- <p>...or associate rules with classes of nodes you wish your parsers to instantiate upon matching a rule.</p>
60
-
61
- <pre><code>grammar Arithmetic
62
- rule additive
63
- multitive '+' additive &lt;AdditiveNode&gt;
64
- /
65
- multitive
66
- end
67
-
68
- # other rules below ...
69
- end
70
- </code></pre>
71
-
72
- <h1>Reusable, Composable Language Descriptions</h1>
73
-
74
- <p>Because PEGs are closed under composition, Treetop grammars can be treated like Ruby modules. You can mix them into one another and override rules with access to the <code>super</code> keyword. You can break large grammars down into coherent units or make your language's syntax modular. This is especially useful if you want other programmers to be able to reuse your work.</p>
75
-
76
- <pre><code>grammar RubyWithEmbeddedSQL
77
- include SQL
78
-
79
- rule string
80
- quote sql_expression quote / super
81
- end
82
- end
83
- </code></pre>
84
-
85
- <h1>Acknowledgements</h1>
86
-
87
- <p><a href="http://pivotallabs.com"><img id="pivotal_logo" src="./images/pivotal.gif"></a></p>
88
-
89
- <p>First, thank you to my employer Rob Mee of <a href="http://pivotallabs.com"/>Pivotal Labs</a> for funding a substantial portion of Treetop's development. He gets it.</p>
90
-
91
- <p>I'd also like to thank:</p>
92
-
93
- <ul>
94
- <li>Damon McCormick for several hours of pair programming.</li>
95
- <li>Nick Kallen for lots of well-considered feedback and a few afternoons of programming.</li>
96
- <li>Brian Takita for a night of pair programming.</li>
97
- <li>Eliot Miranda for urging me rewrite as a compiler right away rather than putting it off.</li>
98
- <li>Ryan Davis and Eric Hodel for hurting my code.</li>
99
- <li>Dav Yaginuma for kicking me into action on my idea.</li>
100
- <li>Bryan Ford for his seminal work on Packrat Parsers.</li>
101
- <li>The editors of Lambda the Ultimate, where I discovered parsing expression grammars.</li>
102
- </ul></div></div><div id="bottom"></div></body></html>
@@ -1,68 +0,0 @@
1
- <html><head><link type="text/css" href="./screen.css" rel="stylesheet" />
2
- <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
- </script>
4
- <script type="text/javascript">
5
- _uacct = "UA-3418876-1";
6
- urchinTracker();
7
- </script>
8
- </head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li><a href="syntactic_recognition.html">Syntax</a></li><li><a href="semantic_interpretation.html">Semantics</a></li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li>Advanced Techniques</li></ul></div><div id="documentation_content"><h1>Pitfalls</h1>
9
-
10
- <h2>Left Recursion</h2>
11
-
12
- <p>An weakness shared by all recursive descent parsers is the inability to parse left-recursive rules. Consider the following rule:</p>
13
-
14
- <pre><code>rule left_recursive
15
- left_recursive 'a' / 'a'
16
- end
17
- </code></pre>
18
-
19
- <p>Logically it should match a list of 'a' characters. But it never consumes anything, because attempting to recognize <code>left_recursive</code> begins by attempting to recognize <code>left_recursive</code>, and so goes an infinite recursion. There's always a way to eliminate these types of structures from your grammar. There's a mechanistic transformation called <em>left factorization</em> that can eliminate it, but it isn't always pretty, especially in combination with automatically constructed syntax trees. So far, I have found more thoughtful ways around the problem. For instance, in the interpreter example I interpret inherently left-recursive function application right recursively in syntax, then correct the directionality in my semantic interpretation. You may have to be clever.</p>
20
-
21
- <h1>Advanced Techniques</h1>
22
-
23
- <p>Here are a few interesting problems I've encountered. I figure sharing them may give you insight into how these types of issues are addressed with the tools of parsing expressions.</p>
24
-
25
- <h2>Matching a String</h2>
26
-
27
- <pre><code>rule string
28
- '"' (!'"' . / '\"')* '"'
29
- end
30
- </code></pre>
31
-
32
- <p>This expression says: Match a quote, then zero or more of any character but a quote or an escaped quote followed by a quote. Lookahead assertions are essential for these types of problems.</p>
33
-
34
- <h2>Matching Nested Structures With Non-Unique Delimeters</h2>
35
-
36
- <p>Say I want to parse a diabolical wiki syntax in which the following interpretations apply.</p>
37
-
38
- <pre><code>** *hello* ** --&gt; &lt;strong&gt;&lt;em&gt;hello&lt;/em&gt;&lt;/strong&gt;
39
- * **hello** * --&gt; &lt;em&gt;&lt;strong&gt;hello&lt;/strong&gt;&lt;/em&gt;
40
-
41
- rule strong
42
- '**' (em / !'*' . / '*')+ '**'
43
- end
44
-
45
- rule em
46
- '**' (strong / !'*' . / '*')+ '**'
47
- end
48
- </code></pre>
49
-
50
- <p>Emphasized text is allowed within strong text by virtue of <code>em</code> being the first alternative. Since <code>em</code> will only successfully parse if a matching <code>*</code> is found, it is permitted, but other than that, no <code>*</code> characters are allowed unless they are escaped.</p>
51
-
52
- <h2>Matching a Keyword But Not Words Prefixed Therewith</h2>
53
-
54
- <p>Say I want to consider a given string a characters only when it occurs in isolation. Lets use the <code>end</code> keyword as an example. We don't want the prefix of <code>'enders_game'</code> to be considered a keyword. A naiive implementation might be the following.</p>
55
-
56
- <pre><code>rule end_keyword
57
- 'end' &amp;space
58
- end
59
- </code></pre>
60
-
61
- <p>This says that <code>'end'</code> must be followed by a space, but this space is not consumed as part of the matching of <code>keyword</code>. This works in most cases, but is actually incorrect. What if <code>end</code> occurs at the end of the buffer? In that case, it occurs in isolation but will not match the above expression. What we really mean is that <code>'end'</code> cannot be followed by a <em>non-space</em> character.</p>
62
-
63
- <pre><code>rule end_keyword
64
- 'end' !(!' ' .)
65
- end
66
- </code></pre>
67
-
68
- <p>In general, when the syntax gets tough, it helps to focus on what you really mean. A keyword is a character not followed by another character that isn't a space.</p></div></div></div><div id="bottom"></div></body></html>
@@ -1,129 +0,0 @@
1
- body {
2
- margin: 0;
3
- padding: 0;
4
- background: #666666;
5
- font-family: "Lucida Grande", Geneva, Arial, Verdana, sans-serif;
6
- color: #333333;
7
- }
8
-
9
- div {
10
- margin: 0;
11
- background-position: center;
12
- background-repeat: none;
13
- }
14
-
15
- h1 {
16
- font-size: 125%;
17
- margin-top: 1.5em;
18
- margin-bottom: .5em;
19
- }
20
-
21
- h2 {
22
- font-size: 115%;
23
- margin-top: 3em;
24
- margin-bottom: .5em;
25
- }
26
-
27
- h3 {
28
- font-size: 105%;
29
- margin-top: 1.5em;
30
- margin-bottom: .5em;
31
- }
32
-
33
- a {
34
- color: #ff8429;
35
- }
36
-
37
-
38
- div#top {
39
- background-image: url( "images/top_background.png" );
40
- height: 200px;
41
- width: 100%;
42
- }
43
-
44
- div#middle {
45
- padding-top: 10px;
46
- background-image: url( "images/middle_background.png" );
47
- background-repeat: repeat-y;
48
- }
49
-
50
- div#bottom {
51
- background-image: url( "images/bottom_background.png" );
52
- height: 13px;
53
- margin-bottom: 30px;
54
- }
55
-
56
- div#main_navigation {
57
- width: 300px;
58
- margin: 0px auto 0 auto;
59
- padding-top: 43px;
60
- padding-right: 10px;
61
- position: relative;
62
- right: 500px;
63
- text-align: right;
64
- line-height: 130%;
65
- font-size: 90%;
66
- }
67
-
68
- div#main_navigation ul {
69
- list-style-type: none;
70
- padding: 0;
71
- }
72
-
73
- div#main_navigation a, div#main_navigation a:visited {
74
- color: white;
75
- text-decoration: none;
76
- }
77
-
78
- div#main_navigation a:hover {
79
- text-decoration: underline;
80
- }
81
-
82
- div#secondary_navigation {
83
- position: relative;
84
- font-size: 90%;
85
- margin: 0 auto 0 auto;
86
- padding: 0px;
87
- text-align: center;
88
- position: relative;
89
- top: -10px;
90
- }
91
-
92
- div#secondary_navigation ul {
93
- list-style-type: none;
94
- padding: 0;
95
- }
96
-
97
- div#secondary_navigation li {
98
- display: inline;
99
- margin-left: 10px;
100
- margin-right: 10px;
101
- }
102
-
103
- div#content {
104
- width: 545px;
105
- margin: 0 auto 0 auto;
106
- padding: 0 60px 25px 60px;
107
- }
108
-
109
- pre {
110
- background: #333333;
111
- color: white;
112
- padding: 15px;
113
- border: 1px solid #666666;
114
- }
115
-
116
- p {
117
- line-height: 150%;
118
- }
119
-
120
- p.intro_text {
121
- color: #C45900;
122
- font-size: 115%;
123
- }
124
-
125
- img#pivotal_logo {
126
- border: none;
127
- margin-left: auto;
128
- margin-right: auto;
129
- }
@@ -1,214 +0,0 @@
1
- <html><head><link type="text/css" href="./screen.css" rel="stylesheet" />
2
- <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
- </script>
4
- <script type="text/javascript">
5
- _uacct = "UA-3418876-1";
6
- urchinTracker();
7
- </script>
8
- </head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li><a href="syntactic_recognition.html">Syntax</a></li><li>Semantics</li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Semantic Interpretation</h1>
9
-
10
- <p>Let's use the grammar below as an example. It describes parentheses wrapping a single character to an arbitrary depth.</p>
11
-
12
- <pre><code>grammar ParenLanguage
13
- rule parenthesized_letter
14
- '(' parenthesized_letter ')'
15
- /
16
- [a-z]
17
- end
18
- end
19
- </code></pre>
20
-
21
- <p>Matches:</p>
22
-
23
- <ul>
24
- <li><code>'a'</code></li>
25
- <li><code>'(a)'</code></li>
26
- <li><code>'((a))'</code></li>
27
- <li>etc.</li>
28
- </ul>
29
-
30
- <p>Output from a parser for this grammar looks like this:</p>
31
-
32
- <p><img src="./images/paren_language_output.png" alt="Tree Returned By ParenLanguageParser"/></p>
33
-
34
- <p>This is a parse tree whose nodes are instances of <code>Treetop::Runtime::SyntaxNode</code>. What if we could define methods on these node objects? We would then have an object-oriented program whose structure corresponded to the structure of our language. Treetop provides two techniques for doing just this.</p>
35
-
36
- <h2>Associating Methods with Node-Instantiating Expressions</h2>
37
-
38
- <p>Sequences and all types of terminals are node-instantiating expressions. When they match, they create instances of <code>Treetop::Runtime::SyntaxNode</code>. Methods can be added to these nodes in the following ways:</p>
39
-
40
- <h3>Inline Method Definition</h3>
41
-
42
- <p>Methods can be added to the nodes instantiated by the successful match of an expression:</p>
43
-
44
- <pre><code>grammar ParenLanguage
45
- rule parenthesized_letter
46
- '(' parenthesized_letter ')' {
47
- def depth
48
- parenthesized_letter.depth + 1
49
- end
50
- }
51
- /
52
- [a-z] {
53
- def depth
54
- 0
55
- end
56
- }
57
- end
58
- end
59
- </code></pre>
60
-
61
- <p>Note that each alternative expression is followed by a block containing a method definition. A <code>depth</code> method is defined on both expressions. The recursive <code>depth</code> method defined in the block following the first expression determines the depth of the nested parentheses and adds one to it. The base case is implemented in the block following the second expression; a single character has a depth of 0.</p>
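<p>The recursion can be traced without Treetop at all. The sketch below is a plain-Ruby stand-in: <code>LetterNode</code> and <code>WrappedNode</code> are hypothetical classes invented for illustration (they are not part of Treetop), and the tree is built by hand rather than by a parser.</p>

```ruby
# Plain-Ruby stand-ins for the two node kinds; these classes are hypothetical
# and are not part of Treetop.
class LetterNode
  def initialize(letter)
    @letter = letter
  end

  # Base case: a bare letter is not nested at all.
  def depth
    0
  end
end

class WrappedNode
  attr_reader :parenthesized_letter

  def initialize(inner)
    @parenthesized_letter = inner
  end

  # Recursive case: one level deeper than whatever is inside the parentheses.
  def depth
    parenthesized_letter.depth + 1
  end
end

# Hand-built tree for the input '((a))'.
tree = WrappedNode.new(WrappedNode.new(LetterNode.new('a')))
puts tree.depth # prints 2
```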
62
-
63
- <h3>Custom <code>SyntaxNode</code> Subclass Declarations</h3>
64
-
65
- <p>You can instruct the parser to instantiate a custom subclass of <code>Treetop::Runtime::SyntaxNode</code> for an expression by following it with the name of that class enclosed in angle brackets (<code>&lt;&gt;</code>). The above inline method definitions could have been moved out into a single class like so:</p>
66
-
67
- <pre><code># in .treetop file
68
- grammar ParenLanguage
69
- rule parenthesized_letter
70
- '(' parenthesized_letter ')' &lt;ParenNode&gt;
71
- /
72
- [a-z] &lt;ParenNode&gt;
73
- end
74
- end
75
-
76
- # in separate .rb file
77
- class ParenNode &lt; Treetop::Runtime::SyntaxNode
78
- def depth
79
- if nonterminal?
80
- parenthesized_letter.depth + 1
81
- else
82
- 0
83
- end
84
- end
85
- end
86
- </code></pre>
87
-
88
- <h2>Automatic Extension of Results</h2>
89
-
90
- <p>Nonterminal and ordered choice expressions do not instantiate new nodes, but rather pass through nodes that are instantiated by other expressions. They can extend the nodes they propagate with anonymous or declared modules, using constructs similar to those used with expressions that instantiate their own syntax nodes.</p>
91
-
92
- <h3>Extending a Propagated Node with an Anonymous Module</h3>
93
-
94
- <pre><code>rule parenthesized_letter
95
- ('(' parenthesized_letter ')' / [a-z]) {
96
- def depth
97
- if nonterminal?
98
- parenthesized_letter.depth + 1
99
- else
100
- 0
101
- end
102
- end
103
- }
104
- end
105
- </code></pre>
106
-
107
- <p>The parenthesized choice above can result in a node matching either of the two choices. That node will be extended with the methods defined in the subsequent block. Note that a choice must always be parenthesized to be associated with a following block.</p>
108
-
109
- <h3>Extending a Propagated Node with a Declared Module</h3>
110
-
111
- <pre><code># in .treetop file
112
- rule parenthesized_letter
113
- ('(' parenthesized_letter ')' / [a-z]) &lt;ParenNode&gt;
114
- end
115
-
116
- # in separate .rb file
117
- module ParenNode
118
- def depth
119
- if nonterminal?
120
- parenthesized_letter.depth + 1
121
- else
122
- 0
123
- end
124
- end
125
- end
126
- </code></pre>
127
-
128
- <p>Here the result is extended with the <code>ParenNode</code> module. Note that, unlike in the previous example for node-instantiating expressions, the constant in the declaration must be a module because the result is extended with it.</p>
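<p>The module requirement follows from how Ruby extends an existing object. A minimal sketch, assuming the propagated node is extended with Ruby's ordinary <code>Object#extend</code>; <code>DepthMethods</code> and <code>DepthClass</code> are hypothetical names, and a plain <code>Object</code> stands in for the node.</p>

```ruby
# Hypothetical stand-in for a node that some other expression already created.
node = Object.new

module DepthMethods
  def depth
    0
  end
end

class DepthClass
  def depth
    0
  end
end

# Extending an existing object with a module adds the module's methods to it.
node.extend(DepthMethods)
puts node.depth

# Ruby refuses to extend an object with a class.
begin
  node.extend(DepthClass)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```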
129
-
130
- <h2>Automatically-Defined Element Accessor Methods</h2>
131
-
132
- <h3>Default Accessors</h3>
133
-
134
- <p>Nodes instantiated upon the matching of sequences have methods automatically defined for any nonterminals in the sequence.</p>
135
-
136
- <pre><code>rule abc
137
- a b c {
138
- def to_s
139
- a.to_s + b.to_s + c.to_s
140
- end
141
- }
142
- end
143
- </code></pre>
144
-
145
- <p>In the above code, the <code>to_s</code> method calls automatically-defined element accessors for the nodes returned by parsing nonterminals <code>a</code>, <code>b</code>, and <code>c</code>. </p>
146
-
147
- <h3>Labels</h3>
148
-
149
- <p>Subexpressions can be given an explicit label to have an element accessor method defined for them. This is useful in cases of ambiguity between two references to the same nonterminal or when you need to access an unnamed subexpression.</p>
150
-
151
- <pre><code>rule labels
152
- first_letter:[a-z] rest_letters:(', ' letter:[a-z])* {
153
- def letters
154
- [first_letter] + rest_letters.elements.map do |comma_and_letter|
155
- comma_and_letter.letter
156
- end
157
- end
158
- }
159
- end
160
- </code></pre>
161
-
162
- <p>The above grammar uses label-derived accessors to determine the letters in a comma-delimited list of letters. The labeled expressions <em>could</em> have been extracted to their own rules, but if they aren't used elsewhere, labels still enable them to be referenced by a name within the expression's methods.</p>
163
-
164
- <h3>Overriding Element Accessors</h3>
165
-
166
- <p>The module containing automatically defined element accessor methods is an ancestor of the module in which you define your own methods, meaning you can override them with access to the <code>super</code> keyword. Here's an example of how this fact can improve the readability of the example above.</p>
167
-
168
- <pre><code>rule labels
169
- first_letter:[a-z] rest_letters:(', ' letter:[a-z])* {
170
- def letters
171
- [first_letter] + rest_letters
172
- end
173
-
174
- def rest_letters
175
- super.elements.map { |comma_and_letter| comma_and_letter.letter }
176
- end
177
- }
178
- end
179
- </code></pre>
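<p>Why <code>super</code> works here can be shown in plain Ruby. The sketch below only illustrates the ancestry relationship described above: <code>GeneratedAccessors</code> and <code>InlineMethods</code> are hypothetical module names, and the arrays are stand-in data rather than real syntax nodes.</p>

```ruby
# Hypothetical stand-in for the module of automatically defined accessors.
module GeneratedAccessors
  def rest_letters
    [[', ', 'b'], [', ', 'c']] # pretend each pair is (comma, letter)
  end
end

# Hypothetical stand-in for the module built from the inline block.
module InlineMethods
  include GeneratedAccessors # makes GeneratedAccessors an ancestor

  # Overrides the generated accessor but can still reach it through super.
  def rest_letters
    super.map { |_comma, letter| letter }
  end
end

node = Object.new.extend(InlineMethods)
p node.rest_letters # prints ["b", "c"]
```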
180
-
181
- <h2>Methods Available on <code>Treetop::Runtime::SyntaxNode</code></h2>
182
-
183
- <table>
184
- <tr>
185
- <td>
186
- <code>terminal?</code>
187
- </td>
188
- <td>
189
- Was this node produced by the matching of a terminal symbol?
190
- </td>
191
- </tr>
192
- <tr>
193
- <td>
194
- <code>nonterminal?</code>
195
- </td>
196
- <td>
197
- Was this node produced by the matching of a nonterminal symbol?
198
- </td>
- </tr>
199
- <tr>
200
- <td>
201
- <code>text_value</code>
202
- </td>
203
- <td>
204
- The substring of the input represented by this node.
205
- </td>
- </tr>
206
- <tr>
207
- <td>
208
- <code>elements</code>
209
- </td>
210
- <td>
211
- Available only on nonterminal nodes, returns the nodes parsed by the elements of the matched sequence.
212
- </td>
213
- </tr>
214
- </table></div></div></div><div id="bottom"></div></body></html>
@@ -1,142 +0,0 @@
1
- <html><head><link type="text/css" href="./screen.css" rel="stylesheet" />
2
- <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
- </script>
4
- <script type="text/javascript">
5
- _uacct = "UA-3418876-1";
6
- urchinTracker();
7
- </script>
8
- </head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li>Syntax</li><li><a href="semantic_interpretation.html">Semantics</a></li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Syntactic Recognition</h1>
9
-
10
- <p>Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammars</a> is useful in writing Treetop grammars.</p>
11
-
12
- <h1>Grammar Structure</h1>
13
-
14
- <p>Treetop grammars look like this:</p>
15
-
16
- <pre><code>grammar GrammarName
17
- rule rule_name
18
- ...
19
- end
20
-
21
- rule rule_name
22
- ...
23
- end
24
-
25
- ...
26
- end
27
- </code></pre>
28
-
29
- <p>The main keywords are:</p>
30
-
31
- <ul>
32
- <li><p><code>grammar</code> : This introduces a new grammar. It is followed by a constant name to which the grammar will be bound when it is loaded.</p></li>
33
- <li><p><code>rule</code> : This defines a parsing rule within the grammar. It is followed by a name by which this rule can be referenced within other rules. It is then followed by a parsing expression defining the rule.</p></li>
34
- </ul>
35
-
36
- <h1>Parsing Expressions</h1>
37
-
38
- <p>Each rule associates a name with a <em>parsing expression</em>. Parsing expressions are a generalization of vanilla regular expressions. Their key feature is the ability to reference other expressions in the grammar by name.</p>
39
-
40
- <h2>Terminal Symbols</h2>
41
-
42
- <h3>Strings</h3>
43
-
44
- <p>Strings are surrounded by double or single quotes and must be matched exactly.</p>
45
-
46
- <ul>
47
- <li><code>"foo"</code></li>
48
- <li><code>'foo'</code></li>
49
- </ul>
50
-
51
- <h3>Character Classes</h3>
52
-
53
- <p>Character classes are surrounded by brackets. Their semantics are identical to those used in Ruby's regular expressions.</p>
54
-
55
- <ul>
56
- <li><code>[a-zA-Z]</code></li>
57
- <li><code>[0-9]</code></li>
58
- </ul>
59
-
60
- <h3>The Anything Symbol</h3>
61
-
62
- <p>The anything symbol is represented by a dot (<code>.</code>) and matches any single character.</p>
63
-
64
- <h2>Nonterminal Symbols</h2>
65
-
66
- <p>Nonterminal symbols are unquoted references to other named rules. They are equivalent to an inline substitution of the named expression.</p>
67
-
68
- <pre><code>rule foo
69
- "the dog " bar
70
- end
71
-
72
- rule bar
73
- "jumped"
74
- end
75
- </code></pre>
76
-
77
- <p>The above grammar is equivalent to:</p>
78
-
79
- <pre><code>rule foo
80
- "the dog jumped"
81
- end
82
- </code></pre>
83
-
84
- <h2>Ordered Choice</h2>
85
-
86
- <p>Parsers attempt to match ordered choices in left-to-right order, and stop after the first successful match.</p>
87
-
88
- <pre><code>"foobar" / "foo" / "bar"
89
- </code></pre>
90
-
91
- <p>Note that if <code>"foo"</code> in the above expression came first, <code>"foobar"</code> would never be matched.</p>
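<p>The first-match-wins behavior can be imitated in a few lines of plain Ruby. This is a sketch of the idea only, not how Treetop implements choices: <code>ordered_choice</code> is a hypothetical helper, and the alternatives are limited to literal strings.</p>

```ruby
# Returns the first alternative that matches at the start of the input,
# trying alternatives strictly left to right (first match wins).
def ordered_choice(input, alternatives)
  alternatives.find { |alternative| input.start_with?(alternative) }
end

p ordered_choice('foobar', %w[foobar foo bar]) # prints "foobar"
p ordered_choice('foobar', %w[foo foobar bar]) # prints "foo"
```

<p>With <code>"foo"</code> listed first, the longer <code>"foobar"</code> alternative is never reached.</p>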
92
-
93
- <h2>Sequences</h2>
94
-
95
- <p>Sequences are a space-separated list of parsing expressions. They have higher precedence than choices, so choices must be parenthesized to be used as the elements of a sequence. </p>
96
-
97
- <pre><code>"foo" "bar" ("baz" / "bop")
98
- </code></pre>
99
-
100
- <h2>Zero or More</h2>
101
-
102
- <p>Parsers will greedily match an expression zero or more times if it is followed by the star (<code>*</code>) symbol.</p>
103
-
104
- <ul>
105
- <li><code>'foo'*</code> matches the empty string, <code>"foo"</code>, <code>"foofoo"</code>, etc.</li>
106
- </ul>
107
-
108
- <h2>One or More</h2>
109
-
110
- <p>Parsers will greedily match an expression one or more times if it is followed by the plus (<code>+</code>) symbol.</p>
111
-
112
- <ul>
113
- <li><code>'foo'+</code> does not match the empty string, but matches <code>"foo"</code>, <code>"foofoo"</code>, etc.</li>
114
- </ul>
115
-
116
- <h2>Optional Expressions</h2>
117
-
118
- <p>An expression can be declared optional by following it with a question mark (<code>?</code>).</p>
119
-
120
- <ul>
121
- <li><code>'foo'?</code> matches <code>"foo"</code> or the empty string.</li>
122
- </ul>
123
-
124
- <h2>Lookahead Assertions</h2>
125
-
126
- <p>Lookahead assertions can be used to give parsing expressions a limited degree of context-sensitivity. The parser will look ahead into the buffer and attempt to match an expression without consuming input.</p>
127
-
128
- <h3>Positive Lookahead Assertion</h3>
129
-
130
- <p>Preceding an expression with an ampersand <code>(&amp;)</code> indicates that it must match, but no input will be consumed in the process of determining whether this is true.</p>
131
-
132
- <ul>
133
- <li><code>"foo" &amp;"bar"</code> matches <code>"foobar"</code> but only consumes up to the end of <code>"foo"</code>. It will not match <code>"foobaz"</code>.</li>
134
- </ul>
135
-
136
- <h3>Negative Lookahead Assertion</h3>
137
-
138
- <p>Preceding an expression with a bang <code>(!)</code> indicates that the expression must not match, but no input will be consumed in the process of determining whether this is true.</p>
139
-
140
- <ul>
141
- <li><code>"foo" !"bar"</code> matches <code>"foobaz"</code> but only consumes up to the end of <code>"foo"</code>. It will not match <code>"foobar"</code>.</li>
142
- </ul></div></div></div><div id="bottom"></div></body></html>
@@ -1,34 +0,0 @@
1
- <html><head><link type="text/css" href="./screen.css" rel="stylesheet" />
2
- <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
- </script>
4
- <script type="text/javascript">
5
- _uacct = "UA-3418876-1";
6
- urchinTracker();
7
- </script>
8
- </head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="content"><div id="secondary_navigation"><ul><li><a href="syntactic_recognition.html">Syntax</a></li><li><a href="semantic_interpretation.html">Semantics</a></li><li>Using In Ruby</li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Using Treetop Grammars in Ruby</h1>
9
-
10
- <h2>Using the Command Line Compiler</h2>
11
-
12
- <p>You can compile <code>.treetop</code> files into Ruby source code with the <code>tt</code> command line script. <code>tt</code> takes a list of files with a <code>.treetop</code> extension and compiles them into <code>.rb</code> files of the same name. You can then <code>require</code> these files like any other Ruby script. Alternatively, you can supply just one <code>.treetop</code> file and a <code>-o</code> flag to specify the name of the output file. Improvements to this compilation script are welcome.</p>
13
-
14
- <pre><code>tt foo.treetop bar.treetop
15
- tt foo.treetop -o foogrammar.rb
16
- </code></pre>
17
-
18
- <h2>Loading a Grammar Directly</h2>
19
-
20
- <p>The Polyglot gem makes it possible to load <code>.treetop</code> or <code>.tt</code> files directly with <code>require</code>. This will invoke <code>Treetop.load</code>, which automatically compiles the grammar to Ruby and then evaluates the Ruby source. If you are getting errors in methods you define on the syntax tree, try using the command line compiler for better stack trace feedback. A better solution to this issue is in the works.</p>
21
-
22
- <h2>Instantiating and Using Parsers</h2>
23
-
24
- <p>If a grammar by the name of <code>Foo</code> is defined, the compiled Ruby source will define a <code>FooParser</code> class. To parse input, create an instance and call its <code>parse</code> method with a string. The parser will return the syntax tree of the match or <code>nil</code> if there is a failure.</p>
25
-
26
- <pre><code>Treetop.load "arithmetic"
27
-
28
- parser = ArithmeticParser.new
29
- if parser.parse('1+1')
30
- puts 'success'
31
- else
32
- puts 'failure'
33
- end
34
- </code></pre></div></div></div><div id="bottom"></div></body></html>