treetop 1.4.14 → 1.4.15
- data/README.md +17 -15
- data/Rakefile +5 -1
- data/doc/tt.1 +83 -0
- data/lib/treetop/ruby_extensions/string.rb +8 -12
- data/lib/treetop/version.rb +1 -1
- data/spec/compiler/character_class_spec.rb +1 -1
- data/spec/compiler/grammar_compiler_spec.rb +22 -0
- data/spec/compiler/test_grammar_magic_coding.treetop +8 -0
- data/spec/compiler/test_grammar_magic_encoding.treetop +8 -0
- data/spec/spec_helper.rb +1 -1
- data/treetop.gemspec +5 -15
- metadata +6 -16
- data/doc/site/contribute.html +0 -124
- data/doc/site/images/bottom_background.png +0 -0
- data/doc/site/images/middle_background.png +0 -0
- data/doc/site/images/paren_language_output.png +0 -0
- data/doc/site/images/pivotal.gif +0 -0
- data/doc/site/images/top_background.png +0 -0
- data/doc/site/index.html +0 -102
- data/doc/site/pitfalls_and_advanced_techniques.html +0 -68
- data/doc/site/robots.txt +0 -5
- data/doc/site/screen.css +0 -134
- data/doc/site/semantic_interpretation.html +0 -245
- data/doc/site/syntactic_recognition.html +0 -278
- data/doc/site/using_in_ruby.html +0 -123
data/README.md
CHANGED
@@ -30,17 +30,18 @@ Next, you start filling your grammar with rules. Each rule associates a name wit
|
|
30
30
|
|
31
31
|
The first rule becomes the *root* of the grammar, causing its expression to be matched when a parser for the grammar is fed a string. The above grammar can now be used in a Ruby program. Notice how a string matching the first rule parses successfully, but a second nonmatching string does not.
|
32
32
|
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
33
|
+
```ruby
|
34
|
+
# use_grammar.rb
|
35
|
+
require 'rubygems'
|
36
|
+
require 'treetop'
|
37
|
+
Treetop.load 'my_grammar'
|
38
|
+
# or just:
|
39
|
+
# require 'my_grammar' # This works because Polyglot hooks "require" to find and load Treetop files
|
39
40
|
|
40
|
-
|
41
|
-
|
42
|
-
|
43
|
-
|
41
|
+
parser = MyGrammarParser.new
|
42
|
+
puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
|
43
|
+
puts parser.parse('silly generativists!') # => nil
|
44
|
+
```
|
44
45
|
Users of *regular expressions* will find parsing expressions familiar. They share the same basic purpose, matching strings against patterns. However, parsing expressions can recognize a broader category of languages than their less expressive brethren. Before we get into demonstrating that, lets cover some basics. At first parsing expressions won't seem much different. Trust that they are.
|
45
46
|
|
46
47
|
Terminal Symbols
|
@@ -57,12 +58,13 @@ Ordered choices are *composite expressions*, which allow for any of several sube
|
|
57
58
|
'hello chomsky' / 'hello lambek'
|
58
59
|
end
|
59
60
|
end
|
60
|
-
|
61
|
-
# fragment of use_grammar.rb
|
62
|
-
puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
|
63
|
-
puts parser.parse('hello lambek') # => Treetop::Runtime::SyntaxNode
|
64
|
-
puts parser.parse('silly generativists!') # => nil
|
65
61
|
|
62
|
+
```ruby
|
63
|
+
# fragment of use_grammar.rb
|
64
|
+
puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
|
65
|
+
puts parser.parse('hello lambek') # => Treetop::Runtime::SyntaxNode
|
66
|
+
puts parser.parse('silly generativists!') # => nil
|
67
|
+
```
|
66
68
|
Note that once a choice rule has matched the text using a particular alternative at a particular location in the input and hence has succeeded, that choice will never be reconsidered, even if the chosen alternative causes another rule to fail where a later alternative wouldn't have. It's always a later alternative, since the first to succeed is final - why keep looking when you've found what you wanted? This is a feature of PEG parsers that you need to understand if you're going to succeed in using Treetop. In order to memoize success and failures, such decisions cannot be reversed. Luckily Treetop provides a variety of clever ways you can tell it to avoid making the wrong decisions. But more on that later.
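To make the "first success is final" behaviour concrete, here is a minimal sketch of ordered choice in plain Ruby. It is not Treetop's implementation; the helpers `terminal`, `choice`, `sequence` and `full_match?` are hypothetical names introduced only for this illustration. Each parser is a lambda that takes the input and a position and returns the new position, or `nil` on failure.

```ruby
# Match a literal string at the current position.
def terminal(str)
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

# Ordered choice: the first alternative that succeeds wins, and later
# alternatives are never tried at this position afterwards.
def choice(*alternatives)
  lambda do |input, pos|
    result = nil
    alternatives.each do |alt|
      result = alt.call(input, pos)
      break if result
    end
    result
  end
end

# Sequence: every part must match, each starting where the previous one ended.
def sequence(*parts)
  lambda do |input, pos|
    parts.each do |part|
      pos = part.call(input, pos)
      break if pos.nil?
    end
    pos
  end
end

# True only if the parser consumes the whole input.
def full_match?(parser, input)
  parser.call(input, 0) == input.length
end

short_first = sequence(choice(terminal('hello'), terminal('hello there')), terminal(' chomsky'))
long_first  = sequence(choice(terminal('hello there'), terminal('hello')), terminal(' chomsky'))

puts full_match?(short_first, 'hello there chomsky') # => false
puts full_match?(long_first,  'hello there chomsky') # => true
puts full_match?(long_first,  'hello chomsky')       # => true
```

In the first case `'hello'` succeeds, the following `' chomsky'` fails, and the choice is not revisited to try `'hello there'`. Listing the longer alternative first is one way to avoid committing to the wrong decision.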
|
67
69
|
|
68
70
|
Sequences
|
data/Rakefile
CHANGED
@@ -15,7 +15,11 @@ Jeweler::Tasks.new do |gem|
|
|
15
15
|
gem.homepage = "https://github.com/cjheath/treetop"
|
16
16
|
gem.platform = Gem::Platform::RUBY
|
17
17
|
gem.summary = "A Ruby-based text parsing and interpretation DSL"
|
18
|
-
gem.files = [
|
18
|
+
gem.files = [
|
19
|
+
"LICENSE", "README.md", "Rakefile", "treetop.gemspec",
|
20
|
+
"{spec,lib,bin,examples}/**/*",
|
21
|
+
"doc/*"
|
22
|
+
].map{|p| Dir[p] }.flatten
|
19
23
|
gem.bindir = "bin"
|
20
24
|
gem.executables = ["tt"]
|
21
25
|
gem.require_path = "lib"
|
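The new `gem.files` value in the hunk above is built by expanding glob patterns with `Dir[]` and flattening the results. The sketch below shows that expansion on a throwaway directory; the helper name `expand_patterns` and the sample files are assumptions made for the example and are not part of the Rakefile.

```ruby
require 'tmpdir'
require 'fileutils'

# Expand each pattern relative to the current directory and merge the results.
def expand_patterns(patterns)
  patterns.map { |p| Dir[p] }.flatten
end

Dir.mktmpdir do |root|
  Dir.chdir(root) do
    # A tiny stand-in for a gem checkout (hypothetical contents).
    FileUtils.mkdir_p(%w[lib/treetop doc])
    FileUtils.touch(%w[LICENSE README.md lib/treetop/version.rb doc/tt.1])

    files = expand_patterns(
      ['LICENSE', 'README.md', 'Rakefile', '{spec,lib,bin,examples}/**/*', 'doc/*']
    )
    puts files.sort
  end
end
```

Two properties are worth noticing: a pattern that matches nothing (here `Rakefile`) simply contributes no entries, and `**/*` returns directories such as `lib/treetop` alongside regular files.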
data/doc/tt.1
ADDED
@@ -0,0 +1,83 @@
|
|
1
|
+
.\" treetop - Bringing the simplicity of Ruby to syntactic analysis
|
2
|
+
.\"
|
3
|
+
.\" Copyright (c) 2007 Nathan Sobo.
|
4
|
+
.\"
|
5
|
+
.\" Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
+
.\" of this software and associated documentation files (the "Software"), to deal
|
7
|
+
.\" in the Software without restriction, including without limitation the rights
|
8
|
+
.\" to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
.\" copies of the Software, and to permit persons to whom the Software is
|
10
|
+
.\" furnished to do so, subject to the following conditions:
|
11
|
+
.\"
|
12
|
+
.\" The above copyright notice and this permission notice shall be included in
|
13
|
+
.\" all copies or substantial portions of the Software.
|
14
|
+
.\"
|
15
|
+
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
+
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
+
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
+
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
+
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
+
.\" OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
21
|
+
.\" THE SOFTWARE.
|
22
|
+
.TH tt 1 2013-06-19 Treetop "Treetop v1.4.14"
|
23
|
+
.SH NAME
|
24
|
+
tt \- Compile a treetop grammar file to ruby source code
|
25
|
+
.SH SYNOPSIS
|
26
|
+
.B tt
|
27
|
+
.RI [ options "] " grammar_file "[.treetop|.tt] ..."
|
28
|
+
|
29
|
+
.SH DESCRIPTION
|
30
|
+
The
|
31
|
+
.B tt
|
32
|
+
program is a command-line script to compile .treetop files into Ruby
|
33
|
+
source code.
|
34
|
+
|
35
|
+
The
|
36
|
+
.B tt
|
37
|
+
program takes a list of files with a .treetop extension and compiles
|
38
|
+
them into .rb files of the same name. You can then require these files
|
39
|
+
like any other Ruby script.
|
40
|
+
|
41
|
+
Alternately, you can supply just one .treetop file and a \-o flag to
|
42
|
+
specify the name of the output file.
|
43
|
+
|
44
|
+
Note: while treetop grammar files
|
45
|
+
.B must
|
46
|
+
have a supported filename extensions, (.treetop or .tt), the extension
|
47
|
+
name is not required when calling the compiler with grammar file
|
48
|
+
names.
|
49
|
+
.SH OPTIONS
|
50
|
+
.TP 4
|
51
|
+
.BI "\-o, \-\-output" " FILENAME"
|
52
|
+
|
53
|
+
Write parser source to
|
54
|
+
.I FILENAME.
|
55
|
+
.TP 4
|
56
|
+
.B \-f, \-\-force
|
57
|
+
|
58
|
+
Overwrite existing output file(s)
|
59
|
+
.TP 4
|
60
|
+
.B \-v, \-\-version
|
61
|
+
|
62
|
+
Show Treetop version
|
63
|
+
.TP 4
|
64
|
+
.B \-h, \-\-help
|
65
|
+
|
66
|
+
.SH EXAMPLES
|
67
|
+
.TP 4
|
68
|
+
1 grammar -> 1 parser source
|
69
|
+
|
70
|
+
tt foo.tt
|
71
|
+
.TP 4
|
72
|
+
2 grammars -> 2 separate parsers
|
73
|
+
|
74
|
+
tt foo bar.treetop
|
75
|
+
.TP 4
|
76
|
+
Alternately named output file
|
77
|
+
|
78
|
+
tt \-o alterate_name.rb foo
|
79
|
+
.SH SEE ALSO
|
80
|
+
|
81
|
+
The treetop website:
|
82
|
+
|
83
|
+
.B http://treetop.rubyforge.org
|
data/lib/treetop/ruby_extensions/string.rb
CHANGED
@@ -22,23 +22,19 @@ class String
|
|
22
22
|
# The following methods are lifted from Facets 2.0.2
|
23
23
|
def tabto(n)
|
24
24
|
if self =~ /^( *)\S/
|
25
|
-
indent(n - $1.length)
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
end
|
30
|
-
|
31
|
-
unless method_defined?(:indent)
|
32
|
-
def indent(n)
|
33
|
-
if n >= 0
|
34
|
-
gsub(/^/, ' ' * n)
|
25
|
+
# Inlined due to collision with ActiveSupport 4.0: indent(n - $1.length)
|
26
|
+
m = n - $1.length
|
27
|
+
if m >= 0
|
28
|
+
gsub(/^/, ' ' * m)
|
35
29
|
else
|
36
|
-
gsub(/^ {0,#{-
|
30
|
+
gsub(/^ {0,#{-m}}/, "")
|
37
31
|
end
|
32
|
+
else
|
33
|
+
self
|
38
34
|
end
|
39
35
|
end
|
40
36
|
|
41
37
|
def treetop_camelize
|
42
38
|
to_s.gsub(/\/(.?)/){ "::" + $1.upcase }.gsub(/(^|_)(.)/){ $2.upcase }
|
43
39
|
end
|
44
|
-
end
|
40
|
+
end
|
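The `tabto` hunk above replaces the call to `indent` with inlined logic. As a rough illustration of what that logic does, here is the same computation as a standalone method that does not reopen `String`. The name `tabto_sketch` is hypothetical, and the body mirrors the added lines of the hunk rather than the full file.

```ruby
# Shift every line of text so that its first non-blank line starts at column n.
def tabto_sketch(text, n)
  if text =~ /^( *)\S/
    m = n - $1.length # positive: indent by m spaces; negative: remove up to -m spaces
    if m >= 0
      text.gsub(/^/, ' ' * m)
    else
      text.gsub(/^ {0,#{-m}}/, '')
    end
  else
    text # no non-blank line found, nothing to align
  end
end

p tabto_sketch("  foo\n    bar", 4)     # => "    foo\n      bar"
p tabto_sketch("      foo\n    bar", 2) # => "  foo\nbar"
p tabto_sketch("   ", 4)                # => "   "
```

When outdenting, each line loses at most `-m` leading spaces, so a line with less indentation than the first one is simply moved to column zero.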
data/lib/treetop/version.rb
CHANGED
data/spec/compiler/character_class_spec.rb
CHANGED
@@ -86,7 +86,7 @@ module CharacterClassSpec
|
|
86
86
|
end
|
87
87
|
|
88
88
|
describe "a character class with a negated POSIX bracket expression" do
|
89
|
-
testing_expression "[[
|
89
|
+
testing_expression "[^[:space:]]"
|
90
90
|
it "matches a character outside the negated class" do
|
91
91
|
parse('a').should_not be_nil
|
92
92
|
end
|
data/spec/compiler/grammar_compiler_spec.rb
CHANGED
@@ -12,8 +12,12 @@ describe Compiler::GrammarCompiler do
|
|
12
12
|
@source_path_with_treetop_extension = "#{dir}/test_grammar.treetop"
|
13
13
|
@source_path_with_do = "#{dir}/test_grammar_do.treetop"
|
14
14
|
@source_path_with_tt_extension = "#{dir}/test_grammar.tt"
|
15
|
+
@source_path_with_magic_coding = "#{dir}/test_grammar_magic_coding.treetop"
|
16
|
+
@source_path_with_magic_encoding = "#{dir}/test_grammar_magic_encoding.treetop"
|
15
17
|
@target_path = "#{@tmpdir}/test_grammar.rb"
|
16
18
|
@target_path_with_do = "#{@tmpdir}/test_grammar_do.rb"
|
19
|
+
@target_path_with_magic_coding = "#{@tmpdir}/test_grammar_magic_coding.rb"
|
20
|
+
@target_path_with_magic_encoding = "#{@tmpdir}/test_grammar_magic_encoding.rb"
|
17
21
|
@alternate_target_path = "#{@tmpdir}/test_grammar_alt.rb"
|
18
22
|
delete_target_files
|
19
23
|
end
|
@@ -82,6 +86,24 @@ describe Compiler::GrammarCompiler do
|
|
82
86
|
Test::GrammarParser.new.parse('foo').should_not be_nil
|
83
87
|
end
|
84
88
|
|
89
|
+
specify "grammars with magic 'encoding' comments keep those comments at the top" do
|
90
|
+
src_copy = "#{@tmpdir}/test_grammar_magic_encoding.treetop"
|
91
|
+
File.open(@source_path_with_magic_encoding) do |f|
|
92
|
+
File.open(src_copy,'w'){|o|o.write(f.read)}
|
93
|
+
end
|
94
|
+
compiler.compile(src_copy)
|
95
|
+
File.open(@target_path_with_magic_encoding).readline.should == "# encoding: UTF-8\n"
|
96
|
+
end
|
97
|
+
|
98
|
+
specify "grammars with magic 'coding' comments keep those comments at the top" do
|
99
|
+
src_copy = "#{@tmpdir}/test_grammar_magic_coding.treetop"
|
100
|
+
File.open(@source_path_with_magic_coding) do |f|
|
101
|
+
File.open(src_copy,'w'){|o|o.write(f.read)}
|
102
|
+
end
|
103
|
+
compiler.compile(src_copy)
|
104
|
+
File.open(@target_path_with_magic_coding).readline.should == "# coding: UTF-8\n"
|
105
|
+
end
|
106
|
+
|
85
107
|
def delete_target_files
|
86
108
|
File.delete(target_path) if File.exists?(target_path)
|
87
109
|
File.delete(@target_path_with_do) if File.exists?(@target_path_with_do)
|
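The two new specs above assert that the first line of the compiled output is the magic `encoding` or `coding` comment. The sketch below shows one way to perform that kind of check in plain Ruby. It does not call the Treetop compiler; the `MAGIC_COMMENT` pattern and the `first_line_encoding` helper are assumptions for illustration, loosely modelled on the form of Ruby's magic comments.

```ruby
require 'tempfile'

# Matches "# coding: X" and "# encoding: X" ("encoding" contains "coding").
MAGIC_COMMENT = /\A#.*?coding[:=]\s*([\w.-]+)/

# Return the encoding named on the first line of the file, or nil if absent.
def first_line_encoding(path)
  first_line = File.open(path) { |f| f.gets }.to_s
  match = MAGIC_COMMENT.match(first_line)
  match && match[1]
end

# Stand-in for a generated parser file (hypothetical content).
Tempfile.create(['test_grammar_magic_encoding', '.rb']) do |f|
  f.write("# encoding: UTF-8\nmodule Test\nend\n")
  f.flush
  puts first_line_encoding(f.path) # => UTF-8
end
```

A magic comment only takes effect on the first line of a Ruby file (or the second, after a shebang), which is why the specs read exactly one line from the target file.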
data/spec/spec_helper.rb
CHANGED
@@ -63,7 +63,7 @@ module Treetop
|
|
63
63
|
def parse_multibyte(input, options = {})
|
64
64
|
require 'active_support/all'
|
65
65
|
|
66
|
-
if RUBY_VERSION !~ /^1
|
66
|
+
if RUBY_VERSION !~ /^(1\.9|2\.0)/ && 'NONE' == $KCODE then $KCODE = 'UTF8' end
|
67
67
|
# rspec 1.3 used to do something similar (set it to 'u') that we need
|
68
68
|
# for activerecord multibyte wrapper to kick in (1.8 only? @todo)
|
69
69
|
|
data/treetop.gemspec
CHANGED
@@ -5,12 +5,12 @@
|
|
5
5
|
|
6
6
|
Gem::Specification.new do |s|
|
7
7
|
s.name = "treetop"
|
8
|
-
s.version = "1.4.
|
8
|
+
s.version = "1.4.15"
|
9
9
|
|
10
10
|
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
|
11
11
|
s.authors = ["Nathan Sobo", "Clifford Heath"]
|
12
12
|
s.autorequire = "treetop"
|
13
|
-
s.date = "2013-
|
13
|
+
s.date = "2013-08-17"
|
14
14
|
s.email = "cliffordheath@gmail.com"
|
15
15
|
s.executables = ["tt"]
|
16
16
|
s.extra_rdoc_files = [
|
@@ -28,21 +28,9 @@ Gem::Specification.new do |s|
|
|
28
28
|
"doc/pitfalls_and_advanced_techniques.markdown",
|
29
29
|
"doc/semantic_interpretation.markdown",
|
30
30
|
"doc/site.rb",
|
31
|
-
"doc/site/contribute.html",
|
32
|
-
"doc/site/images/bottom_background.png",
|
33
|
-
"doc/site/images/middle_background.png",
|
34
|
-
"doc/site/images/paren_language_output.png",
|
35
|
-
"doc/site/images/pivotal.gif",
|
36
|
-
"doc/site/images/top_background.png",
|
37
|
-
"doc/site/index.html",
|
38
|
-
"doc/site/pitfalls_and_advanced_techniques.html",
|
39
|
-
"doc/site/robots.txt",
|
40
|
-
"doc/site/screen.css",
|
41
|
-
"doc/site/semantic_interpretation.html",
|
42
|
-
"doc/site/syntactic_recognition.html",
|
43
|
-
"doc/site/using_in_ruby.html",
|
44
31
|
"doc/sitegen.rb",
|
45
32
|
"doc/syntactic_recognition.markdown",
|
33
|
+
"doc/tt.1",
|
46
34
|
"doc/using_in_ruby.markdown",
|
47
35
|
"examples/lambda_calculus/arithmetic.rb",
|
48
36
|
"examples/lambda_calculus/arithmetic.treetop",
|
@@ -120,6 +108,8 @@ Gem::Specification.new do |s|
|
|
120
108
|
"spec/compiler/test_grammar.treetop",
|
121
109
|
"spec/compiler/test_grammar.tt",
|
122
110
|
"spec/compiler/test_grammar_do.treetop",
|
111
|
+
"spec/compiler/test_grammar_magic_coding.treetop",
|
112
|
+
"spec/compiler/test_grammar_magic_encoding.treetop",
|
123
113
|
"spec/compiler/tt_compiler_spec.rb",
|
124
114
|
"spec/compiler/zero_or_more_spec.rb",
|
125
115
|
"spec/composition/a.treetop",
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: treetop
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.4.
|
4
|
+
version: 1.4.15
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -10,7 +10,7 @@ authors:
|
|
10
10
|
autorequire: treetop
|
11
11
|
bindir: bin
|
12
12
|
cert_chain: []
|
13
|
-
date: 2013-
|
13
|
+
date: 2013-08-17 00:00:00.000000000 Z
|
14
14
|
dependencies:
|
15
15
|
- !ruby/object:Gem::Dependency
|
16
16
|
name: polyglot
|
@@ -159,21 +159,9 @@ files:
|
|
159
159
|
- doc/pitfalls_and_advanced_techniques.markdown
|
160
160
|
- doc/semantic_interpretation.markdown
|
161
161
|
- doc/site.rb
|
162
|
-
- doc/site/contribute.html
|
163
|
-
- doc/site/images/bottom_background.png
|
164
|
-
- doc/site/images/middle_background.png
|
165
|
-
- doc/site/images/paren_language_output.png
|
166
|
-
- doc/site/images/pivotal.gif
|
167
|
-
- doc/site/images/top_background.png
|
168
|
-
- doc/site/index.html
|
169
|
-
- doc/site/pitfalls_and_advanced_techniques.html
|
170
|
-
- doc/site/robots.txt
|
171
|
-
- doc/site/screen.css
|
172
|
-
- doc/site/semantic_interpretation.html
|
173
|
-
- doc/site/syntactic_recognition.html
|
174
|
-
- doc/site/using_in_ruby.html
|
175
162
|
- doc/sitegen.rb
|
176
163
|
- doc/syntactic_recognition.markdown
|
164
|
+
- doc/tt.1
|
177
165
|
- doc/using_in_ruby.markdown
|
178
166
|
- examples/lambda_calculus/arithmetic.rb
|
179
167
|
- examples/lambda_calculus/arithmetic.treetop
|
@@ -251,6 +239,8 @@ files:
|
|
251
239
|
- spec/compiler/test_grammar.treetop
|
252
240
|
- spec/compiler/test_grammar.tt
|
253
241
|
- spec/compiler/test_grammar_do.treetop
|
242
|
+
- spec/compiler/test_grammar_magic_coding.treetop
|
243
|
+
- spec/compiler/test_grammar_magic_encoding.treetop
|
254
244
|
- spec/compiler/tt_compiler_spec.rb
|
255
245
|
- spec/compiler/zero_or_more_spec.rb
|
256
246
|
- spec/composition/a.treetop
|
@@ -289,7 +279,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
289
279
|
version: '0'
|
290
280
|
segments:
|
291
281
|
- 0
|
292
|
-
hash:
|
282
|
+
hash: 2062680504230675145
|
293
283
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
294
284
|
none: false
|
295
285
|
requirements:
|
data/doc/site/contribute.html
DELETED
@@ -1,124 +0,0 @@
|
|
1
|
-
<html><head><link href="./screen.css" rel="stylesheet" type="text/css" />
|
2
|
-
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
|
3
|
-
</script>
|
4
|
-
<script type="text/javascript">
|
5
|
-
_uacct = "UA-3418876-1";
|
6
|
-
urchinTracker();
|
7
|
-
</script>
|
8
|
-
</head><body><div id="top"><div id="main_navigation"><ul><li><a href="syntactic_recognition.html">Documentation</a></li><li>Contribute</li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="main_content"><h1>Google Group</h1>
|
9
|
-
|
10
|
-
<p>I've created a <a href="http://groups.google.com/group/treetop-dev">Google Group</a> as a better place to organize discussion and development.
|
11
|
-
treetop-dev@google-groups.com</p>
|
12
|
-
|
13
|
-
<h1>Contributing</h1>
|
14
|
-
|
15
|
-
<p>Visit <a href="http://github.com/nathansobo/treetop/tree/master">the Treetop repository page on GitHub</a> in your browser for more information about checking out the source code.</p>
|
16
|
-
|
17
|
-
<p>I like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.</p>
|
18
|
-
|
19
|
-
<h2>Getting Started with the Code</h2>
|
20
|
-
|
21
|
-
<p>Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around <code>metagrammar.treetop</code>, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.</p>
|
22
|
-
|
23
|
-
<p>Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of Treetop to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change <code>metagrammar.treetop</code> and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so must be kept backward compatible until such time as a new stable version can be produced to replace it.</p>
|
24
|
-
|
25
|
-
<h2>Tests</h2>
|
26
|
-
|
27
|
-
<p>Most of the compiler's tests are functional in nature. The grammar under test is used to parse and compile piece of sample code. Then I attempt to parse input with the compiled output and test its results.</p>
|
28
|
-
|
29
|
-
<h1>What Needs to be Done</h1>
|
30
|
-
|
31
|
-
<h2>Small Stuff</h2>
|
32
|
-
|
33
|
-
<ul>
|
34
|
-
<li>Improve the <code>tt</code> command line tool to allow <code>.treetop</code> extensions to be elided in its arguments.</li>
|
35
|
-
<li>Generate and load temp files with <code>Treetop.load</code> rather than evaluating strings to improve stack trace readability.</li>
|
36
|
-
<li>Allow <code>do/end</code> style blocks as well as curly brace blocks. This was originally omitted because I thought it would be confusing. It probably isn't.</li>
|
37
|
-
</ul>
|
38
|
-
|
39
|
-
|
40
|
-
<h2>Big Stuff</h2>
|
41
|
-
|
42
|
-
<h4>Transient Expressions</h4>
|
43
|
-
|
44
|
-
<p>Currently, every parsing expression instantiates a syntax node. This includes even very simple parsing expressions, like single characters. It is probably unnecessary for every single expression in the parse to correspond to its own syntax node, so much savings could be garnered from a transient declaration that instructs the parser only to attempt a match without instantiating nodes.</p>
|
45
|
-
|
46
|
-
<h3>Generate Rule Implementations in C</h3>
|
47
|
-
|
48
|
-
<p>Parsing expressions are currently compiled into simple Ruby source code that comprises the body of parsing rules, which are translated into Ruby methods. The generator could produce C instead of Ruby in the body of these method implementations.</p>
|
49
|
-
|
50
|
-
<h3>Global Parsing State and Semantic Backtrack Triggering</h3>
|
51
|
-
|
52
|
-
<p>Some programming language grammars are not entirely context-free, requiring that global state dictate the behavior of the parser in certain circumstances. Treetop does not currently expose explicit parser control to the grammar writer, and instead automatically constructs the syntax tree for them. A means of semantic parser control compatible with this approach would involve callback methods defined on parsing nodes. Each time a node is successfully parsed it will be given an opportunity to set global state and optionally trigger a parse failure on <em>extrasyntactic</em> grounds. Nodes will probably need to define an additional method that undoes their changes to global state when there is a parse failure and they are backtracked.</p>
|
53
|
-
|
54
|
-
<p>Here is a sketch of the potential utility of such mechanisms. Consider the structure of YAML, which uses indentation to indicate block structure.</p>
|
55
|
-
|
56
|
-
<pre><code>level_1:
|
57
|
-
level_2a:
|
58
|
-
level_2b:
|
59
|
-
level_3a:
|
60
|
-
level_2c:
|
61
|
-
</code></pre>
|
62
|
-
|
63
|
-
<p>Imagine a grammar like the following:</p>
|
64
|
-
|
65
|
-
<pre><code>rule yaml_element
|
66
|
-
name ':' block
|
67
|
-
/
|
68
|
-
name ':' value
|
69
|
-
end
|
70
|
-
|
71
|
-
rule block
|
72
|
-
indent yaml_elements outdent
|
73
|
-
end
|
74
|
-
|
75
|
-
rule yaml_elements
|
76
|
-
yaml_element (samedent yaml_element)*
|
77
|
-
end
|
78
|
-
|
79
|
-
rule samedent
|
80
|
-
newline spaces {
|
81
|
-
def after_success(parser_state)
|
82
|
-
spaces.length == parser_state.indent_level
|
83
|
-
end
|
84
|
-
}
|
85
|
-
end
|
86
|
-
|
87
|
-
rule indent
|
88
|
-
newline spaces {
|
89
|
-
def after_success(parser_state)
|
90
|
-
if spaces.length == parser_state.indent_level + 2
|
91
|
-
parser_state.indent_level += 2
|
92
|
-
true
|
93
|
-
else
|
94
|
-
false # fail the parse on extrasyntactic grounds
|
95
|
-
end
|
96
|
-
end
|
97
|
-
|
98
|
-
def undo_success(parser_state)
|
99
|
-
parser_state.indent_level -= 2
|
100
|
-
end
|
101
|
-
}
|
102
|
-
end
|
103
|
-
|
104
|
-
rule outdent
|
105
|
-
newline spaces {
|
106
|
-
def after_success(parser_state)
|
107
|
-
if spaces.length == parser_state.indent_level - 2
|
108
|
-
parser_state.indent_level -= 2
|
109
|
-
true
|
110
|
-
else
|
111
|
-
false # fail the parse on extrasyntactic grounds
|
112
|
-
end
|
113
|
-
end
|
114
|
-
|
115
|
-
def undo_success(parser_state)
|
116
|
-
parser_state.indent_level += 2
|
117
|
-
end
|
118
|
-
}
|
119
|
-
end
|
120
|
-
</code></pre>
|
121
|
-
|
122
|
-
<p>In this case a block will be detected only if a change in indentation warrants it. Note that this change in the state of indentation must be undone if a subsequent failure causes this node not to ultimately be incorporated into a successful result.</p>
|
123
|
-
|
124
|
-
<p>I am by no means sure that the above sketch is free of problems, or even that this overall strategy is sound, but it seems like a promising path.</p></div></div><div id="bottom"></div></body></html>
|