treetop 1.4.8 → 1.4.9

data/Rakefile CHANGED
@@ -26,3 +26,18 @@ end
  task :version do
    puts RUBY_VERSION
  end
+
+ desc 'Generate website files'
+ task :website_generate do
+   `cd doc; ruby ./site.rb`
+ end
+
+ desc 'Upload website files'
+ task :website_upload do
+   rubyforge_config_file = "#{ENV['HOME']}/.rubyforge/user-config.yml"
+   rubyforge_config = YAML.load_file(rubyforge_config_file)
+   `rsync -aCv doc/site/ #{rubyforge_config['username']}@rubyforge.org:/var/www/gforge-projects/treetop/`
+ end
+
+ desc 'Generate and upload website files'
+ task :website => [:website_generate, :website_upload]
@@ -7,7 +7,6 @@ Visit <a href="http://github.com/nathansobo/treetop/tree/master">the Treetop rep

  I like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.

-
  ##Getting Started with the Code
  Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around `metagrammar.treetop`, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.

data/doc/index.markdown CHANGED
@@ -1,6 +1,6 @@
  <p class="intro_text">
-
- Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge <em>parsing expression grammars</em>, it helps you analyze syntax with revolutionarily ease.
+
+ Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge <em>parsing expression grammars</em>, it helps you analyze syntax with revolutionary ease.

  </p>

@@ -11,49 +11,47 @@ Parsing expression grammars (PEGs) are simple to write and easy to maintain. The

      grammar Arithmetic
        rule additive
-         multitive '+' additive / multitive
+         multitive ( '+' multitive )*
        end
-
+
        rule multitive
-         primary '*' multitive / primary
+         primary ( [*/%] primary )*
        end
-
+
        rule primary
          '(' additive ')' / number
        end
-
+
        rule number
-         [1-9] [0-9]*
+         '-'? [1-9] [0-9]*
        end
      end
-
+

  #Syntax-Oriented Programming
  Rather than implementing semantic actions that construct parse trees, Treetop lets you define methods on trees that it constructs for you automatically. You can define these methods directly within the grammar...

      grammar Arithmetic
        rule additive
-         multitive '+' additive {
+         multitive a:( '+' multitive )* {
            def value
-             multitive.value + additive.value
+             a.elements.inject(multitive.value) { |sum, e|
+               sum+e.multitive.value
+             }
            end
          }
-         /
-         multitive
        end
-
+
        # other rules below ...
      end
-
+
  ...or associate rules with classes of nodes you wish your parsers to instantiate upon matching a rule.

      grammar Arithmetic
        rule additive
-         multitive '+' additive <AdditiveNode>
-         /
-         multitive
+         multitive ('+' multitive)* <AdditiveNode>
        end
-
+
        # other rules below ...
      end

@@ -63,7 +61,7 @@ Because PEGs are closed under composition, Treetop grammars can be treated like

      grammar RubyWithEmbeddedSQL
        include SQL
-
+
        rule string
          quote sql_expression quote / super
        end
@@ -87,4 +85,4 @@ I'd also like to thank:
  * Ryan Davis and Eric Hodel for hurting my code.
  * Dav Yaginuma for kicking me into action on my idea.
  * Bryan Ford for his seminal work on Packrat Parsers.
- * The editors of Lambda the Ultimate, where I discovered parsing expression grammars.
+ * The editors of Lambda the Ultimate, where I discovered parsing expression grammars.
@@ -87,7 +87,7 @@ Nonterminal and ordered choice expressions do not instantiate new nodes, but rat
        }
      end

- The parenthesized choice above can result in a node matching either of the two choices. Than node will be extended with methods defined in the subsequent block. Note that a choice must always be parenthesized to be associated with a following block.
+ The parenthesized choice above can result in a node matching either of the two choices. The node will be extended with methods defined in the subsequent block. Note that a choice must always be parenthesized to be associated with a following block, otherwise the block will apply to just the last alternative.

  ###Extending A Propagated Node with a Declared Module
      # in .treetop file
@@ -185,5 +185,34 @@ The module containing automatically defined element accessor methods is an ances
  <td>
  Available only on nonterminal nodes, returns the nodes parsed by the elements of the matched sequence.
  </td>
+ <tr>
+ <td>
+ <code>input</code>
+ </td>
+ <td>
+ The entire input string, which is useful mainly in conjunction with <code>interval</code>
+ </td>
+ <tr>
+ <td>
+ <code>interval</code>
+ </td>
+ <td>
+ The Range of characters in <code>input</code> matched by this rule
+ </td>
+ <tr>
+ <td>
+ <code>empty?</code>
+ </td>
+ <td>
+ returns true if this rule matched no characters of input
+ </td>
+ <tr>
+ <td>
+ <code>inspect</code>
+ </td>
+ <td>
+ Handy-dandy method that returns an indented subtree dump of the syntax tree starting here.
+ This dump includes, for each node, the offset and a snippet of the text this rule matched, and the names of mixin modules and the accessor and extension methods.
+ </td>
  </tr>
  </table>
data/doc/site.rb CHANGED
@@ -5,7 +5,7 @@ require 'fileutils'
  require 'bluecloth'

  class Layout < Erector::Widget
-   def render
+   def content
      html do
        head do
          link :rel => "stylesheet",
@@ -29,8 +29,8 @@ class Layout < Erector::Widget
            end
          end
          div :id => 'middle' do
-           div :id => 'content' do
-             content
+           div :id => 'main_content' do
+             main_content
            end
          end
          div :id => 'bottom' do
@@ -48,12 +48,12 @@ class Layout < Erector::Widget
      end
    end

-   def content
+   def main_content
    end
  end

  class Index < Layout
-   def content
+   def main_content
      bluecloth "index.markdown"
    end
  end
@@ -61,7 +61,7 @@ end
  class Documentation < Layout
    abstract

-   def content
+   def main_content
      div :id => 'secondary_navigation' do
        ul do
          li { link_to 'Syntax', SyntacticRecognition }
@@ -103,7 +103,7 @@ end


  class Contribute < Layout
-   def content
+   def main_content
      bluecloth "contributing_and_planned_features.markdown"
    end
  end
@@ -0,0 +1,124 @@
+ <html><head><link href="./screen.css" rel="stylesheet" type="text/css" />
+ <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
+ </script>
+ <script type="text/javascript">
+ _uacct = "UA-3418876-1";
+ urchinTracker();
+ </script>
+ </head><body><div id="top"><div id="main_navigation"><ul><li><a href="syntactic_recognition.html">Documentation</a></li><li>Contribute</li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="main_content"><h1>Google Group</h1>
+
+ <p>I've created a <a href="http://groups.google.com/group/treetop-dev">Google Group</a> as a better place to organize discussion and development.
+ treetop-dev@google-groups.com</p>
+
+ <h1>Contributing</h1>
+
+ <p>Visit <a href="http://github.com/nathansobo/treetop/tree/master">the Treetop repository page on GitHub</a> in your browser for more information about checking out the source code.</p>
+
+ <p>I like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.</p>
+
+ <h2>Getting Started with the Code</h2>
+
+ <p>Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around <code>metagrammar.treetop</code>, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.</p>
+
+ <p>Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of Treetop to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change <code>metagrammar.treetop</code> and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so must be kept backward compatible until such time as a new stable version can be produced to replace it.</p>
+
+ <h2>Tests</h2>
+
+ <p>Most of the compiler's tests are functional in nature. The grammar under test is used to parse and compile piece of sample code. Then I attempt to parse input with the compiled output and test its results.</p>
+
+ <h1>What Needs to be Done</h1>
+
+ <h2>Small Stuff</h2>
+
+ <ul>
+ <li>Improve the <code>tt</code> command line tool to allow <code>.treetop</code> extensions to be elided in its arguments.</li>
+ <li>Generate and load temp files with <code>Treetop.load</code> rather than evaluating strings to improve stack trace readability.</li>
+ <li>Allow <code>do/end</code> style blocks as well as curly brace blocks. This was originally omitted because I thought it would be confusing. It probably isn't.</li>
+ </ul>
+
+
+ <h2>Big Stuff</h2>
+
+ <h4>Transient Expressions</h4>
+
+ <p>Currently, every parsing expression instantiates a syntax node. This includes even very simple parsing expressions, like single characters. It is probably unnecessary for every single expression in the parse to correspond to its own syntax node, so much savings could be garnered from a transient declaration that instructs the parser only to attempt a match without instantiating nodes.</p>
+
+ <h3>Generate Rule Implementations in C</h3>
+
+ <p>Parsing expressions are currently compiled into simple Ruby source code that comprises the body of parsing rules, which are translated into Ruby methods. The generator could produce C instead of Ruby in the body of these method implementations.</p>
+
+ <h3>Global Parsing State and Semantic Backtrack Triggering</h3>
+
+ <p>Some programming language grammars are not entirely context-free, requiring that global state dictate the behavior of the parser in certain circumstances. Treetop does not currently expose explicit parser control to the grammar writer, and instead automatically constructs the syntax tree for them. A means of semantic parser control compatible with this approach would involve callback methods defined on parsing nodes. Each time a node is successfully parsed it will be given an opportunity to set global state and optionally trigger a parse failure on <em>extrasyntactic</em> grounds. Nodes will probably need to define an additional method that undoes their changes to global state when there is a parse failure and they are backtracked.</p>
+
+ <p>Here is a sketch of the potential utility of such mechanisms. Consider the structure of YAML, which uses indentation to indicate block structure.</p>
+
+ <pre><code>level_1:
+   level_2a:
+   level_2b:
+     level_3a:
+   level_2c:
+ </code></pre>
+
+ <p>Imagine a grammar like the following:</p>
+
+ <pre><code>rule yaml_element
+   name ':' block
+   /
+   name ':' value
+ end
+
+ rule block
+   indent yaml_elements outdent
+ end
+
+ rule yaml_elements
+   yaml_element (samedent yaml_element)*
+ end
+
+ rule samedent
+   newline spaces {
+     def after_success(parser_state)
+       spaces.length == parser_state.indent_level
+     end
+   }
+ end
+
+ rule indent
+   newline spaces {
+     def after_success(parser_state)
+       if spaces.length == parser_state.indent_level + 2
+         parser_state.indent_level += 2
+         true
+       else
+         false # fail the parse on extrasyntactic grounds
+       end
+     end
+
+     def undo_success(parser_state)
+       parser_state.indent_level -= 2
+     end
+   }
+ end
+
+ rule outdent
+   newline spaces {
+     def after_success(parser_state)
+       if spaces.length == parser_state.indent_level - 2
+         parser_state.indent_level -= 2
+         true
+       else
+         false # fail the parse on extrasyntactic grounds
+       end
+     end
+
+     def undo_success(parser_state)
+       parser_state.indent_level += 2
+     end
+   }
+ end
+ </code></pre>
+
+ <p>In this case a block will be detected only if a change in indentation warrants it. Note that this change in the state of indentation must be undone if a subsequent failure causes this node not to ultimately be incorporated into a successful result.</p>
+
+ <p>I am by no means sure that the above sketch is free of problems, or even that this overall strategy is sound, but it seems like a promising path.</p></div></div><div id="bottom"></div></body></html>
Binary file
Binary file
@@ -0,0 +1,102 @@
+ <html><head><link href="./screen.css" rel="stylesheet" type="text/css" />
+ <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
+ </script>
+ <script type="text/javascript">
+ _uacct = "UA-3418876-1";
+ urchinTracker();
+ </script>
+ </head><body><div id="top"><div id="main_navigation"><ul><li><a href="syntactic_recognition.html">Documentation</a></li><li><a href="contribute.html">Contribute</a></li><li>Home</li></ul></div></div><div id="middle"><div id="main_content"><p class="intro_text">
+
+ Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge <em>parsing expression grammars</em>, it helps you analyze syntax with revolutionary ease.
+
+ </p>
+
+
+ <pre><code>sudo gem install treetop
+ </code></pre>
+
+ <h1>Intuitive Grammar Specifications</h1>
+
+ <p>Parsing expression grammars (PEGs) are simple to write and easy to maintain. They are a simple but powerful generalization of regular expressions that are easier to work with than the LALR or LR-1 grammars of traditional parser generators. There's no need for a tokenization phase, and <em>lookahead assertions</em> can be used for a limited degree of context-sensitivity. Here's an extremely simple Treetop grammar that matches a subset of arithmetic, respecting operator precedence:</p>
+
+ <pre><code>grammar Arithmetic
+   rule additive
+     multitive ( '+' multitive )*
+   end
+
+   rule multitive
+     primary ( [*/%] primary )*
+   end
+
+   rule primary
+     '(' additive ')' / number
+   end
+
+   rule number
+     '-'? [1-9] [0-9]*
+   end
+ end
+ </code></pre>
+
+ <h1>Syntax-Oriented Programming</h1>
+
+ <p>Rather than implementing semantic actions that construct parse trees, Treetop lets you define methods on trees that it constructs for you automatically. You can define these methods directly within the grammar...</p>
+
+ <pre><code>grammar Arithmetic
+   rule additive
+     multitive a:( '+' multitive )* {
+       def value
+         a.elements.inject(multitive.value) { |sum, e|
+           sum+e.multitive.value
+         }
+       end
+     }
+   end
+
+   # other rules below ...
+ end
+ </code></pre>
+
+ <p>...or associate rules with classes of nodes you wish your parsers to instantiate upon matching a rule.</p>
+
+ <pre><code>grammar Arithmetic
+   rule additive
+     multitive ('+' multitive)* &lt;AdditiveNode&gt;
+   end
+
+   # other rules below ...
+ end
+ </code></pre>
+
+ <h1>Reusable, Composable Language Descriptions</h1>
+
+ <p>Because PEGs are closed under composition, Treetop grammars can be treated like Ruby modules. You can mix them into one another and override rules with access to the <code>super</code> keyword. You can break large grammars down into coherent units or make your language's syntax modular. This is especially useful if you want other programmers to be able to reuse your work.</p>
+
+ <pre><code>grammar RubyWithEmbeddedSQL
+   include SQL
+
+   rule string
+     quote sql_expression quote / super
+   end
+ end
+ </code></pre>
+
+ <h1>Acknowledgements</h1>
+
+ <p><a href="http://pivotallabs.com"><img id="pivotal_logo" src="./images/pivotal.gif"></a></p>
+
+ <p>First, thank you to my employer Rob Mee of <a href="http://pivotallabs.com"/>Pivotal Labs</a> for funding a substantial portion of Treetop's development. He gets it.</p>
+
+ <p>I'd also like to thank:</p>
+
+ <ul>
+ <li>Damon McCormick for several hours of pair programming.</li>
+ <li>Nick Kallen for lots of well-considered feedback and a few afternoons of programming.</li>
+ <li>Brian Takita for a night of pair programming.</li>
+ <li>Eliot Miranda for urging me rewrite as a compiler right away rather than putting it off.</li>
+ <li>Ryan Davis and Eric Hodel for hurting my code.</li>
+ <li>Dav Yaginuma for kicking me into action on my idea.</li>
+ <li>Bryan Ford for his seminal work on Packrat Parsers.</li>
+ <li>The editors of Lambda the Ultimate, where I discovered parsing expression grammars.</li>
+ </ul>
+ </div></div><div id="bottom"></div></body></html>
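
The revised `additive` rule in the index page replaces the right-recursive form with a repetition, so its `value` method folds over the repeated `( '+' multitive )` elements instead of recursing. A minimal Ruby sketch of that fold follows. It does not use Treetop: `Leaf`, `Tail`, `Repetition`, and `Additive` are hypothetical stand-ins for the nodes a generated parser would build, and none of these names come from this release.

```ruby
# Hypothetical stand-ins for syntax nodes; these are NOT Treetop classes.
Leaf       = Struct.new(:value)      # plays the role of a `multitive` node
Tail       = Struct.new(:multitive)  # one ( '+' multitive ) repetition
Repetition = Struct.new(:elements)   # the node labelled `a:` in the rule

class Additive
  attr_reader :multitive, :a

  def initialize(multitive, a)
    @multitive = multitive
    @a = a
  end

  # Same fold as the `value` method added in the diff: start from the first
  # operand and add the value of each repeated operand in turn.
  def value
    a.elements.inject(multitive.value) { |sum, e| sum + e.multitive.value }
  end
end

# Stand-in tree for the input "1+2+3".
node = Additive.new(
  Leaf.new(1),
  Repetition.new([Tail.new(Leaf.new(2)), Tail.new(Leaf.new(3))])
)
puts node.value
```

When the repetition matches nothing, `elements` is empty and the fold simply returns the first operand's value, which is why the separate `/ multitive` alternative could be dropped from the rule.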