treetop 1.4.8 → 1.4.9

@@ -0,0 +1,271 @@
1
+ <html><head><link href="./screen.css" rel="stylesheet" type="text/css" />
2
+ <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
+ </script>
4
+ <script type="text/javascript">
5
+ _uacct = "UA-3418876-1";
6
+ urchinTracker();
7
+ </script>
8
+ </head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="main_content"><div id="secondary_navigation"><ul><li>Syntax</li><li><a href="semantic_interpretation.html">Semantics</a></li><li><a href="using_in_ruby.html">Using In Ruby</a></li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Syntactic Recognition</h1>
9
+
10
+ <p>Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammars</a> (PEGs) is useful in writing Treetop grammars.</p>
11
+
12
+ <p>PEGs have no separate lexical analyser (since the algorithm has the same time-complexity guarantees as the best lexical analysers) so all whitespace and other lexical niceties (like comments) must be explicitly handled in the grammar. A further benefit is that multiple PEG grammars may be seamlessly composed into a single parser.</p>
13
+
14
+ <h1>Grammar Structure</h1>
15
+
16
+ <p>Treetop grammars look like this:</p>
17
+
18
+ <pre><code>require "my_stuff"
19
+
20
+ grammar GrammarName
21
+ include Module::Submodule
22
+
23
+ rule rule_name
24
+ ...
25
+ end
26
+
27
+ rule rule_name
28
+ ...
29
+ end
30
+
31
+ ...
32
+ end
33
+ </code></pre>
34
+
35
+ <p>The main keywords are:</p>
36
+
37
+ <ul>
38
+ <li><p><code>grammar</code> : This introduces a new grammar. It is followed by a constant name to which the grammar will be bound when it is loaded.</p></li>
39
+ <li><p><code>include</code>: This causes the generated parser to include the referenced Ruby module (which may be another parser)</p></li>
40
+ <li><p><code>require</code>: This must be at the start of the file, and is passed through to the emitted Ruby grammar</p></li>
41
+ <li><p><code>rule</code> : This defines a parsing rule within the grammar. It is followed by a name by which this rule can be referenced within other rules. It is then followed by a parsing expression defining the rule.</p></li>
42
+ </ul>
43
+
44
+
45
+ <p>A grammar may be surrounded by one or more nested <code>module</code> statements, which provides a namespace for the generated Ruby parser.</p>
46
+
47
+ <p>Treetop will emit a module called <code>GrammarName</code> and a parser class called <code>GrammarNameParser</code> (in the module namespace, if specified).</p>
48
+
49
+ <h1>Parsing Expressions</h1>
50
+
51
+ <p>Each rule associates a name with a <em>parsing expression</em>. Parsing expressions are a generalization of vanilla regular expressions. Their key feature is the ability to reference other expressions in the grammar by name.</p>
52
+
53
+ <h2>Terminal Symbols</h2>
54
+
55
+ <h3>Strings</h3>
56
+
57
+ <p>Strings are surrounded in double or single quotes and must be matched exactly.</p>
58
+
59
+ <ul>
60
+ <li><code>"foo"</code></li>
61
+ <li><code>'foo'</code></li>
62
+ </ul>
63
+
64
+
65
+ <h3>Character Classes</h3>
66
+
67
+ <p>Character classes are surrounded by brackets. Their semantics are identical to those used in Ruby's regular expressions.</p>
68
+
69
+ <ul>
70
+ <li><code>[a-zA-Z]</code></li>
71
+ <li><code>[0-9]</code></li>
72
+ </ul>
73
+
74
+
75
+ <h3>The Anything Symbol</h3>
76
+
77
+ <p>The anything symbol is represented by a dot (<code>.</code>) and matches any single character.</p>
78
+
79
+ <h3>Ellipsis</h3>
80
+
81
+ <p>An empty string matches at any position and consumes no input. It's useful when you wish to treat a single symbol as part of a sequence, for example when an alternate rule will be processed using shared code.</p>
82
+
83
+ <ul>
84
+ <li><code>''</code></li>
85
+ </ul>
86
+
87
+
88
+ <h2>Nonterminal Symbols</h2>
89
+
90
+ <p>Nonterminal symbols are unquoted references to other named rules. They are equivalent to an inline substitution of the named expression.</p>
91
+
92
+ <pre><code>rule foo
93
+ "the dog " bar
94
+ end
95
+
96
+ rule bar
97
+ "jumped"
98
+ end
99
+ </code></pre>
100
+
101
+ <p>The above grammar is equivalent to:</p>
102
+
103
+ <pre><code>rule foo
104
+ "the dog jumped"
105
+ end
106
+ </code></pre>
107
+
108
+ <h2>Ordered Choice</h2>
109
+
110
+ <p>Parsers attempt to match ordered choices in left-to-right order, and stop after the first successful match.</p>
111
+
112
+ <pre><code>"foobar" / "foo" / "bar"
113
+ </code></pre>
114
+
115
+ <p>Note that if <code>"foo"</code> in the above expression came first, <code>"foobar"</code> would never be matched.
116
+ Note also that the above rule will match <code>"bar"</code> as a prefix of <code>"barbie"</code>.
117
+ Care is required when it's desired to match language keywords exactly.</p>
118
+
119
+ <h2>Sequences</h2>
120
+
121
+ <p>Sequences are a space-separated list of parsing expressions. They have higher precedence than choices, so choices must be parenthesized to be used as the elements of a sequence.</p>
122
+
123
+ <pre><code>"foo" "bar" ("baz" / "bop")
124
+ </code></pre>
125
+
126
+ <h2>Zero or More</h2>
127
+
128
+ <p>Parsers will greedily match an expression zero or more times if it is followed by the star (<code>*</code>) symbol.</p>
129
+
130
+ <ul>
131
+ <li><code>'foo'*</code> matches the empty string, <code>"foo"</code>, <code>"foofoo"</code>, etc.</li>
132
+ </ul>
133
+
134
+
135
+ <h2>One or More</h2>
136
+
137
+ <p>Parsers will greedily match an expression one or more times if it is followed by the plus (<code>+</code>) symbol.</p>
138
+
139
+ <ul>
140
+ <li><code>'foo'+</code> does not match the empty string, but matches <code>"foo"</code>, <code>"foofoo"</code>, etc.</li>
141
+ </ul>
142
+
143
+
144
+ <h2>Optional Expressions</h2>
145
+
146
+ <p>An expression can be declared optional by following it with a question mark (<code>?</code>).</p>
147
+
148
+ <ul>
149
+ <li><code>'foo'?</code> matches <code>"foo"</code> or the empty string.</li>
150
+ </ul>
151
+
152
+
153
+ <h2>Repetition count</h2>
154
+
155
+ <p>A generalised repetition count (minimum, maximum) is also available.</p>
156
+
157
+ <ul>
158
+ <li><code>'foo' 2..</code> matches <code>'foo'</code> two or more times</li>
159
+ <li><code>'foo' 3..5</code> matches <code>'foo'</code> from three to five times</li>
160
+ <li><code>'foo' ..4</code> matches <code>'foo'</code> from zero to four times</li>
161
+ </ul>
162
+
163
+
164
+ <h2>Lookahead Assertions</h2>
165
+
166
+ <p>Lookahead assertions can be used to make parsing expressions context-sensitive.
167
+ The parser will look ahead into the buffer and attempt to match an expression without consuming input.</p>
168
+
169
+ <h3>Positive Lookahead Assertion</h3>
170
+
171
+ <p>Preceding an expression with an ampersand <code>(&amp;)</code> indicates that it must match, but no input will be consumed in the process of determining whether this is true.</p>
172
+
173
+ <ul>
174
+ <li><code>"foo" &amp;"bar"</code> matches <code>"foobar"</code> but only consumes up to the end of <code>"foo"</code>. It will not match <code>"foobaz"</code>.</li>
175
+ </ul>
176
+
177
+
178
+ <h3>Negative Lookahead Assertion</h3>
179
+
180
+ <p>Preceding an expression with a bang <code>(!)</code> indicates that the expression must not match, but no input will be consumed in the process of determining whether this is true.</p>
181
+
182
+ <ul>
183
+ <li><code>"foo" !"bar"</code> matches <code>"foobaz"</code> but only consumes up to the end of <code>"foo"</code>. It will not match <code>"foobar"</code>.</li>
184
+ </ul>
185
+
186
+
187
+ <p>Note that a lookahead assertion may be used on any rule, not just a string terminal.</p>
188
+
189
+ <pre><code>rule things
190
+ thing (!(disallowed / ',') following)*
191
+ end
192
+ </code></pre>
193
+
194
+ <p>Here's a common use case:</p>
195
+
196
+ <pre><code>rule word
197
+ [a-zA-Z]+
198
+ end
199
+
200
+ rule conjunction
201
+ primitive ('and' ' '+ primitive)*
202
+ end
203
+
204
+ rule primitive
205
+ (!'and' word ' '+)*
206
+ end
207
+ </code></pre>
208
+
209
+ <p>Here's the easiest way to handle C-style comments:</p>
210
+
211
+ <pre><code>rule c_comment
212
+ '/*'
213
+ (
214
+ !'*/'
215
+ (. / "\n")
216
+ )*
217
+ '*/'
218
+ end
219
+ </code></pre>
220
+
221
+ <h2>Semantic predicates (positive and negative)</h2>
222
+
223
+ <p>Sometimes you must execute Ruby code during parsing in order to decide how to proceed.
224
+ This is an advanced feature, and must be used with great care, because it can change the
225
+ way a Treetop parser backtracks in a way that breaks the parsing algorithm. See the
226
+ notes below on how to use this feature safely.</p>
227
+
228
+ <p>The code block is the body of a Ruby lambda block, and should return true or false, to cause this
229
+ parse rule to continue or fail (for positive sempreds), fail or continue (for negative sempreds).</p>
230
+
231
+ <ul>
232
+ <li><code>&amp;{ ... }</code> Evaluate the block and fail this rule if the result is false or nil</li>
233
+ <li><code>!{ ... }</code> Evaluate the block and fail this rule if the result is not false or nil</li>
234
+ </ul>
235
+
236
+
237
+ <p>The lambda is passed a single argument which is the array of syntax nodes matched so far in the
238
+ current sequence. Note that because the current rule has not yet succeeded, no syntax node has
239
+ yet been constructed, and so the lambda block is being run in a context where the <code>names</code> of the
240
+ preceding rules (or as assigned by labels) are not available to access the sub-rules.</p>
241
+
242
+ <pre><code>rule id
243
+ [a-zA-Z][a-zA-Z0-9]*
244
+ {
245
+ def is_reserved
246
+ ReservedSymbols.include? text_value
247
+ end
248
+ }
249
+ end
250
+
251
+ rule foo_rule
252
+ foo id &amp;{|seq| seq[1].is_reserved } baz
253
+ end
254
+ </code></pre>
255
+
256
+ <p>Match "foo id baz" only if <code>id.is_reserved</code>. Note that <code>id</code> cannot be referenced by name from <code>foo_rule</code>,
257
+ since that rule has not yet succeeded, but <code>id</code> has completed and so its added methods are available.</p>
258
+
259
+ <pre><code>rule test_it
260
+ foo bar &amp;{|s| debugger; true } baz
261
+ end
262
+ </code></pre>
263
+
264
+ <p>Match <code>foo</code> then <code>bar</code>, stop to enter the debugger (make sure you have said <code>require "ruby-debug"</code> somewhere),
265
+ then continue by trying to match <code>baz</code>.</p>
266
+
267
+ <p>Treetop, like other PEG parsers, achieves its performance guarantee by remembering which rules it has
268
+ tried at which locations in the input, and what the result was. This process, called memoization,
269
+ requires that the rule would produce the same result (if run again) as it produced the first time when
270
+ the result was remembered. If you violate this principle in your semantic predicates, be prepared to
271
+ fight Cerberus before you're allowed out of Hades again.</p></div></div></div><div id="bottom"></div></body></html>
@@ -0,0 +1,123 @@
1
+ <html><head><link href="./screen.css" rel="stylesheet" type="text/css" />
2
+ <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
+ </script>
4
+ <script type="text/javascript">
5
+ _uacct = "UA-3418876-1";
6
+ urchinTracker();
7
+ </script>
8
+ </head><body><div id="top"><div id="main_navigation"><ul><li>Documentation</li><li><a href="contribute.html">Contribute</a></li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="main_content"><div id="secondary_navigation"><ul><li><a href="syntactic_recognition.html">Syntax</a></li><li><a href="semantic_interpretation.html">Semantics</a></li><li>Using In Ruby</li><li><a href="pitfalls_and_advanced_techniques.html">Advanced Techniques</a></li></ul></div><div id="documentation_content"><h1>Using Treetop Grammars in Ruby</h1>
9
+
10
+ <h2>Using the Command Line Compiler</h2>
11
+
12
+ <p>You can compile <code>.treetop</code> files into Ruby source code with the <code>tt</code> command line script. <code>tt</code> takes a list of files with a <code>.treetop</code> extension and compiles them into <code>.rb</code> files of the same name. You can then <code>require</code> these files like any other Ruby script. Alternatively, you can supply just one <code>.treetop</code> file and a <code>-o</code> flag to specify the name of the output file. Improvements to this compilation script are welcome.</p>
13
+
14
+ <pre><code>tt foo.treetop bar.treetop
15
+ tt foo.treetop -o foogrammar.rb
16
+ </code></pre>
17
+
18
+ <h2>Loading A Grammar Directly</h2>
19
+
20
+ <p>The Polyglot gem makes it possible to load <code>.treetop</code> or <code>.tt</code> files directly with <code>require</code>. This will invoke <code>Treetop.load</code>, which automatically compiles the grammar to Ruby and then evaluates the Ruby source. If you are getting errors in methods you define on the syntax tree, try using the command line compiler for better stack trace feedback. A better solution to this issue is in the works.</p>
21
+
22
+ <p>To use Polyglot's dynamic loading of <code>.treetop</code> or <code>.tt</code> files, you must require the Polyglot gem before the Treetop gem, because Treetop only creates its hooks into Polyglot if Polyglot is already loaded. So you need to use:</p>
23
+
24
+ <pre><code>require 'polyglot'
25
+ require 'treetop'
26
+ </code></pre>
27
+
28
+ <p>in order to use Polyglot auto loading with Treetop in Ruby.</p>
29
+
30
+ <h2>Instantiating and Using Parsers</h2>
31
+
32
+ <p>If a grammar by the name of <code>Foo</code> is defined, the compiled Ruby source will define a <code>FooParser</code> class.
33
+ To parse input, create an instance and call its <code>parse</code> method with a string.
34
+ The parser will return the syntax tree of the match or <code>nil</code> if there is a failure.
35
+ Note that by default, the parser will fail unless <em>all</em> input is consumed.</p>
36
+
37
+ <pre><code>Treetop.load "arithmetic"
38
+
39
+ parser = ArithmeticParser.new
40
+ if parser.parse('1+1')
41
+ puts 'success'
42
+ else
43
+ puts 'failure'
44
+ end
45
+ </code></pre>
46
+
47
+ <h2>Parser Options</h2>
48
+
49
+ <p>A Treetop parser has several options you may set.
50
+ Some are settable permanently by methods on the parser, but all may be passed in as options to the <code>parse</code> method.</p>
51
+
52
+ <pre><code>parser = ArithmeticParser.new
53
+ input = 'x = 2; y = x+3;'
54
+
55
+ # Temporarily override an option:
56
+ result1 = parser.parse(input, :consume_all_input =&gt; false)
57
+ puts "consumed #{parser.index} characters"
58
+
59
+ parser.consume_all_input = false
60
+ result1 = parser.parse(input)
61
+ puts "consumed #{parser.index} characters"
62
+
63
+ # Continue the parse with the next character:
64
+ result2 = parser.parse(input, :index =&gt; parser.index)
65
+
66
+ # Parse, but match rule `variable` instead of the normal root rule:
67
+ parser.parse(input, :root =&gt; :variable)
68
+ parser.root = :variable # Permanent setting
69
+ </code></pre>
70
+
71
+ <p>If you have a statement-oriented language, you can save memory by parsing just one statement at a time,
72
+ and discarding the parse tree after each statement.</p>
73
+
74
+ <h2>Learning From Failure</h2>
75
+
76
+ <p>If a parse fails, it returns nil. In this case, you can ask the parser for an explanation.
77
+ The failure reasons include the terminal nodes which were tried at the furthermost point the parse reached.</p>
78
+
79
+ <pre><code>parser = ArithmeticParser.new
80
+ result = parser.parse('4+=3')
81
+
82
+ if !result
83
+ puts parser.failure_reason
84
+ puts parser.failure_line
85
+ puts parser.failure_column
86
+ end
87
+
88
+ =&gt;
89
+ Expected one of (, - at line 1, column 3 (byte 3) after +
90
+ 1
91
+ 3
92
+ </code></pre>
93
+
94
+ <h2>Using Parse Results</h2>
95
+
96
+ <p>Please don't try to walk down the syntax tree yourself, and please don't use the tree as your own convenient data structure.
97
+ It contains many more nodes than your application needs, often even more than one for every character of input.</p>
98
+
99
+ <pre><code>parser = ArithmeticParser.new
100
+ p parser.parse('2+3')
101
+
102
+ =&gt;
103
+ SyntaxNode+Additive1 offset=0, "2+3" (multitive):
104
+ SyntaxNode+Multitive1 offset=0, "2" (primary):
105
+ SyntaxNode+Number0 offset=0, "2":
106
+ SyntaxNode offset=0, ""
107
+ SyntaxNode offset=0, "2"
108
+ SyntaxNode offset=1, ""
109
+ SyntaxNode offset=1, ""
110
+ SyntaxNode offset=1, "+3":
111
+ SyntaxNode+Additive0 offset=1, "+3" (multitive):
112
+ SyntaxNode offset=1, "+"
113
+ SyntaxNode+Multitive1 offset=2, "3" (primary):
114
+ SyntaxNode+Number0 offset=2, "3":
115
+ SyntaxNode offset=2, ""
116
+ SyntaxNode offset=2, "3"
117
+ SyntaxNode offset=3, ""
118
+ SyntaxNode offset=3, ""
119
+ </code></pre>
120
+
121
+ <p>Instead, add methods to the root rule which return the information you require in a sensible form.
122
+ Each rule can call its sub-rules, and this method of walking the syntax tree is much preferable to
123
+ attempting to walk it from the outside.</p></div></div></div><div id="bottom"></div></body></html>
data/doc/sitegen.rb CHANGED
@@ -16,7 +16,7 @@ class Layout < Erector::Widget
16
16
 
17
17
  def generate_html
18
18
  File.open(absolute_path, 'w') do |file|
19
- file.write(new.render)
19
+ file.write(new.to_html)
20
20
  end
21
21
  end
22
22
 
@@ -1,10 +1,16 @@
1
1
  #Syntactic Recognition
2
- Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammars</a> is useful in writing Treetop grammars.
2
+ Treetop grammars are written in a custom language based on parsing expression grammars. Literature on the subject of <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">parsing expression grammars</a> (PEGs) is useful in writing Treetop grammars.
3
+
4
+ PEGs have no separate lexical analyser (since the algorithm has the same time-complexity guarantees as the best lexical analysers) so all whitespace and other lexical niceties (like comments) must be explicitly handled in the grammar. A further benefit is that multiple PEG grammars may be seamlessly composed into a single parser.
3
5
 
4
6
  #Grammar Structure
5
7
  Treetop grammars look like this:
6
8
 
9
+ require "my_stuff"
10
+
7
11
  grammar GrammarName
12
+ include Module::Submodule
13
+
8
14
  rule rule_name
9
15
  ...
10
16
  end
@@ -20,8 +26,16 @@ The main keywords are:
20
26
 
21
27
  * `grammar` : This introduces a new grammar. It is followed by a constant name to which the grammar will be bound when it is loaded.
22
28
 
29
+ * `include`: This causes the generated parser to include the referenced Ruby module (which may be another parser)
30
+
31
+ * `require`: This must be at the start of the file, and is passed through to the emitted Ruby grammar
32
+
23
33
  * `rule` : This defines a parsing rule within the grammar. It is followed by a name by which this rule can be referenced within other rules. It is then followed by a parsing expression defining the rule.
24
34
 
35
+ A grammar may be surrounded by one or more nested `module` statements, which provides a namespace for the generated Ruby parser.
36
+
37
+ Treetop will emit a module called `GrammarName` and a parser class called `GrammarNameParser` (in the module namespace, if specified).
38
+
25
39
  #Parsing Expressions
26
40
  Each rule associates a name with a _parsing expression_. Parsing expressions are a generalization of vanilla regular expressions. Their key feature is the ability to reference other expressions in the grammar by name.
27
41
 
@@ -41,6 +55,11 @@ Character classes are surrounded by brackets. Their semantics are identical to t
41
55
  ###The Anything Symbol
42
56
  The anything symbol is represented by a dot (`.`) and matches any single character.
43
57
 
58
+ ###Ellipsis
59
+ An empty string matches at any position and consumes no input. It's useful when you wish to treat a single symbol as part of a sequence, for example when an alternate rule will be processed using shared code.
60
+
61
+ * `''`
62
+
44
63
  ##Nonterminal Symbols
45
64
  Nonterminal symbols are unquoted references to other named rules. They are equivalent to an inline substitution of the named expression.
46
65
 
@@ -64,6 +83,8 @@ Parsers attempt to match ordered choices in left-to-right order, and stop after
64
83
  "foobar" / "foo" / "bar"
65
84
 
66
85
  Note that if `"foo"` in the above expression came first, `"foobar"` would never be matched.
86
+ Note also that the above rule will match `"bar"` as a prefix of `"barbie"`.
87
+ Care is required when it's desired to match language keywords exactly.
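
The two notes above can be illustrated outside Treetop. The following is a minimal sketch in plain Ruby, using the standard library's `StringScanner` to imitate an ordered choice; it is not how Treetop itself is implemented. The helper names `ordered_choice` and `keyword` are hypothetical, and the identifier character class used for the keyword boundary is an assumption about what counts as a word character in your language.

```ruby
require "strscan"

# Hypothetical helper: try each literal in order at the scanner's current
# position and return the first one that matches, like a PEG ordered choice.
def ordered_choice(scanner, alternatives)
  alternatives.each do |literal|
    matched = scanner.scan(/#{Regexp.escape(literal)}/)
    return matched if matched
  end
  nil
end

# Hypothetical helper: match a keyword only when no identifier character
# follows it (assumed boundary: letters, digits and underscore).
def keyword(scanner, word)
  scanner.scan(/#{Regexp.escape(word)}(?![a-zA-Z0-9_])/)
end

# "foobar" listed first: the longer alternative wins.
LONGEST_FIRST  = ordered_choice(StringScanner.new("foobar"), %w[foobar foo bar])
# "foo" listed first: "foobar" is never reached.
SHORTEST_FIRST = ordered_choice(StringScanner.new("foobar"), %w[foo foobar bar])
# "bar" matches as a prefix of "barbie".
PREFIX_MATCH   = ordered_choice(StringScanner.new("barbie"), %w[foobar foo bar])
# With a boundary check, "bar" no longer matches inside "barbie".
KEYWORD_MATCH  = keyword(StringScanner.new("barbie"), "bar")
```

In a grammar, the same boundary check would typically be written as a negative lookahead after the keyword, as described under Lookahead Assertions.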
67
88
 
68
89
  ##Sequences
69
90
 
@@ -77,7 +98,7 @@ Parsers will greedily match an expression zero or more times if it is followed b
77
98
  * `'foo'*` matches the empty string, `"foo"`, `"foofoo"`, etc.
78
99
 
79
100
  ##One or More
80
- Parsers will greedily match an expression one or more times if it is followed by the star (`+`) symbol.
101
+ Parsers will greedily match an expression one or more times if it is followed by the plus (`+`) symbol.
81
102
 
82
103
  * `'foo'+` does not match the empty string, but matches `"foo"`, `"foofoo"`, etc.
83
104
 
@@ -86,8 +107,16 @@ An expression can be declared optional by following it with a question mark (`?`
86
107
 
87
108
  * `'foo'?` matches `"foo"` or the empty string.
88
109
 
110
+ ##Repetition count
111
+ A generalised repetition count (minimum, maximum) is also available.
112
+
113
+ * `'foo' 2..` matches `'foo'` two or more times
114
+ * `'foo' 3..5` matches `'foo'` from three to five times
115
+ * `'foo' ..4` matches `'foo'` from zero to four times
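
For readers more familiar with regular expressions, the counts above correspond roughly to bounded quantifiers. This is only an analogy, sketched in plain Ruby with hypothetical constant names: a regular expression may backtrack into a repetition, whereas a PEG repetition is greedy and does not give input back.

```ruby
# Rough regex analogues of the repetition counts listed above.
# The anchors \A and \z force the whole string to match.
TWO_OR_MORE   = /\A(?:foo){2,}\z/   # 'foo' 2..
THREE_TO_FIVE = /\A(?:foo){3,5}\z/  # 'foo' 3..5
UP_TO_FOUR    = /\A(?:foo){0,4}\z/  # 'foo' ..4
```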
116
+
89
117
  ##Lookahead Assertions
90
- Lookahead assertions can be used to give parsing expressions a limited degree of context-sensitivity. The parser will look ahead into the buffer and attempt to match an expression without consuming input.
118
+ Lookahead assertions can be used to make parsing expressions context-sensitive.
119
+ The parser will look ahead into the buffer and attempt to match an expression without consuming input.
91
120
 
92
121
  ###Positive Lookahead Assertion
93
122
  Preceding an expression with an ampersand `(&)` indicates that it must match, but no input will be consumed in the process of determining whether this is true.
@@ -98,3 +127,80 @@ Preceding an expression with an ampersand `(&)` indicates that it must match, bu
98
127
  Preceding an expression with a bang `(!)` indicates that the expression must not match, but no input will be consumed in the process of determining whether this is true.
99
128
 
100
129
  * `"foo" !"bar"` matches `"foobaz"` but only consumes up to the end of `"foo"`. It will not match `"foobar"`.
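
The "match without consuming" behaviour of both assertions can be sketched in plain Ruby with `StringScanner`, whose `check` method tests a pattern at the current position without advancing. The two helper names are hypothetical and this is an illustration of the idea, not Treetop's generated code. Each helper returns the position reached after `"foo"`, or `nil` with the position restored when the expression fails.

```ruby
require "strscan"

# Hypothetical helper for: "foo" &"bar"
def foo_followed_by_bar(scanner)
  start = scanner.pos
  # scan consumes "foo"; check only peeks at "bar".
  return scanner.pos if scanner.scan(/foo/) && scanner.check(/bar/)
  scanner.pos = start # failure: give back anything consumed
  nil
end

# Hypothetical helper for: "foo" !"bar"
def foo_not_followed_by_bar(scanner)
  start = scanner.pos
  return scanner.pos if scanner.scan(/foo/) && !scanner.check(/bar/)
  scanner.pos = start
  nil
end
```

On the input `"foobar"` the first helper stops at position 3, leaving `"bar"` unconsumed for whatever expression comes next.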
130
+
131
+ Note that a lookahead assertion may be used on any rule, not just a string terminal.
132
+
133
+ rule things
134
+ thing (!(disallowed / ',') following)*
135
+ end
136
+
137
+ Here's a common use case:
138
+
139
+ rule word
140
+ [a-zA-Z]+
141
+ end
142
+
143
+ rule conjunction
144
+ primitive ('and' ' '+ primitive)*
145
+ end
146
+
147
+ rule primitive
148
+ (!'and' word ' '+)*
149
+ end
150
+
151
+ Here's the easiest way to handle C-style comments:
152
+
153
+ rule c_comment
154
+ '/*'
155
+ (
156
+ !'*/'
157
+ (. / "\n")
158
+ )*
159
+ '*/'
160
+ end
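
To see what the `c_comment` rule does step by step, here is a hand-written equivalent in plain Ruby. It is a sketch under the assumption that a comment ends at the first `*/` and that an unterminated comment is a failure; the method name `scan_c_comment` is hypothetical.

```ruby
require "strscan"

# Hypothetical helper mirroring the c_comment rule. Returns the comment text
# when one starts at the scanner position, or nil with the position restored.
def scan_c_comment(scanner)
  start = scanner.pos
  return nil unless scanner.scan(%r{/\*})        # '/*'

  # (!'*/' .)* : while the terminator is not next, consume one character.
  until scanner.check(%r{\*/})
    if scanner.eos?                              # unterminated comment
      scanner.pos = start
      return nil
    end
    scanner.getch                                # any character, newline included
  end

  scanner.scan(%r{\*/})                          # '*/'
  # pos is a byte offset, so slice by bytes.
  scanner.string.byteslice(start, scanner.pos - start)
end
```

The negative lookahead is what stops the loop: without it, the "any character" step would swallow the closing `*/` and the final terminator could never match.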
161
+
162
+ ##Semantic predicates (positive and negative)
163
+ Sometimes you must execute Ruby code during parsing in order to decide how to proceed.
164
+ This is an advanced feature, and must be used with great care, because it can change the
165
+ way a Treetop parser backtracks in a way that breaks the parsing algorithm. See the
166
+ notes below on how to use this feature safely.
167
+
168
+ The code block is the body of a Ruby lambda block, and should return true or false, to cause this
169
+ parse rule to continue or fail (for positive sempreds), fail or continue (for negative sempreds).
170
+
171
+ * `&{ ... }` Evaluate the block and fail this rule if the result is false or nil
172
+ * `!{ ... }` Evaluate the block and fail this rule if the result is not false or nil
173
+
174
+ The lambda is passed a single argument which is the array of syntax nodes matched so far in the
175
+ current sequence. Note that because the current rule has not yet succeeded, no syntax node has
176
+ yet been constructed, and so the lambda block is being run in a context where the `names` of the
177
+ preceding rules (or as assigned by labels) are not available to access the sub-rules.
178
+
179
+ rule id
180
+ [a-zA-Z][a-zA-Z0-9]*
181
+ {
182
+ def is_reserved
183
+ ReservedSymbols.include? text_value
184
+ end
185
+ }
186
+ end
187
+
188
+ rule foo_rule
189
+ foo id &{|seq| seq[1].is_reserved } baz
190
+ end
191
+
192
+ Match "foo id baz" only if `id.is_reserved`. Note that `id` cannot be referenced by name from `foo_rule`,
193
+ since that rule has not yet succeeded, but `id` has completed and so its added methods are available.
194
+
195
+ rule test_it
196
+ foo bar &{|s| debugger; true } baz
197
+ end
198
+
199
+ Match `foo` then `bar`, stop to enter the debugger (make sure you have said `require "ruby-debug"` somewhere),
200
+ then continue by trying to match `baz`.
201
+
202
+ Treetop, like other PEG parsers, achieves its performance guarantee by remembering which rules it has
203
+ tried at which locations in the input, and what the result was. This process, called memoization,
204
+ requires that the rule would produce the same result (if run again) as it produced the first time when
205
+ the result was remembered. If you violate this principle in your semantic predicates, be prepared to
206
+ fight Cerberus before you're allowed out of Hades again.
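
The hazard described above can be reproduced with a toy memo table. This is a minimal sketch in plain Ruby, not Treetop's actual cache: the class `MemoizingMatcher`, the `STATE` hash and the predicate are all hypothetical names chosen for illustration. The predicate depends on mutable state outside the input, so the second lookup replays the first answer even though the predicate would now return something else.

```ruby
# Minimal packrat-style memo table (hypothetical, for illustration only).
class MemoizingMatcher
  attr_reader :evaluations

  def initialize(input)
    @input = input
    @memo = {}
    @evaluations = 0
  end

  # Run the block for (rule, index) once; replay the stored result afterwards.
  def apply(rule, index)
    key = [rule, index]
    return @memo[key] if @memo.key?(key)

    @evaluations += 1
    @memo[key] = yield(@input, index)
  end
end

# An impure predicate: its answer depends on STATE, not only on the input.
STATE = { allow: true }
IMPURE_PREDICATE = ->(input, index) { STATE[:allow] && input[index, 3] == "foo" }

MATCHER = MemoizingMatcher.new("foo")
FIRST   = MATCHER.apply(:foo_rule, 0, &IMPURE_PREDICATE)  # evaluated
STATE[:allow] = false
SECOND  = MATCHER.apply(:foo_rule, 0, &IMPURE_PREDICATE)  # served from the memo
```

A predicate that reads only the syntax nodes it is given produces the same result on every run, which is the property memoization relies on.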