treetop 1.4.14 → 1.4.15

data/README.md CHANGED
@@ -30,17 +30,18 @@ Next, you start filling your grammar with rules. Each rule associates a name wit
30
30
 
31
31
  The first rule becomes the *root* of the grammar, causing its expression to be matched when a parser for the grammar is fed a string. The above grammar can now be used in a Ruby program. Notice how a string matching the first rule parses successfully, but a second nonmatching string does not.
32
32
 
33
- # use_grammar.rb
34
- require 'rubygems'
35
- require 'treetop'
36
- Treetop.load 'my_grammar'
37
- # or just:
38
- # require 'my_grammar' # This works because Polyglot hooks "require" to find and load Treetop files
33
+ ```ruby
34
+ # use_grammar.rb
35
+ require 'rubygems'
36
+ require 'treetop'
37
+ Treetop.load 'my_grammar'
38
+ # or just:
39
+ # require 'my_grammar' # This works because Polyglot hooks "require" to find and load Treetop files
39
40
 
40
- parser = MyGrammarParser.new
41
- puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
42
- puts parser.parse('silly generativists!') # => nil
43
-
41
+ parser = MyGrammarParser.new
42
+ puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
43
+ puts parser.parse('silly generativists!') # => nil
44
+ ```
44
45
  Users of *regular expressions* will find parsing expressions familiar. They share the same basic purpose, matching strings against patterns. However, parsing expressions can recognize a broader category of languages than their less expressive brethren. Before we get into demonstrating that, lets cover some basics. At first parsing expressions won't seem much different. Trust that they are.
45
46
 
46
47
  Terminal Symbols
@@ -57,12 +58,13 @@ Ordered choices are *composite expressions*, which allow for any of several sube
57
58
  'hello chomsky' / 'hello lambek'
58
59
  end
59
60
  end
60
-
61
- # fragment of use_grammar.rb
62
- puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
63
- puts parser.parse('hello lambek') # => Treetop::Runtime::SyntaxNode
64
- puts parser.parse('silly generativists!') # => nil
65
61
 
62
+ ```ruby
63
+ # fragment of use_grammar.rb
64
+ puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
65
+ puts parser.parse('hello lambek') # => Treetop::Runtime::SyntaxNode
66
+ puts parser.parse('silly generativists!') # => nil
67
+ ```
66
68
  Note that once a choice rule has matched the text using a particular alternative at a particular location in the input and hence has succeeded, that choice will never be reconsidered, even if the chosen alternative causes another rule to fail where a later alternative wouldn't have. It's always a later alternative, since the first to succeed is final - why keep looking when you've found what you wanted? This is a feature of PEG parsers that you need to understand if you're going to succeed in using Treetop. In order to memoize success and failures, such decisions cannot be reversed. Luckily Treetop provides a variety of clever ways you can tell it to avoid making the wrong decisions. But more on that later.
67
69
 
68
70
  Sequences
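The README note on ordered choice in the hunk above can be illustrated without Treetop. The following is a minimal sketch in plain Ruby. The helpers `literal`, `choice`, `sequence`, and `full_match?` are hypothetical and are not part of Treetop's API; they only mimic the commit-on-first-success behaviour the note describes.

```ruby
# Hypothetical mini-matchers, NOT Treetop's API. Each matcher is a lambda
# taking (input, pos) and returning the position after the match, or nil.

def literal(text)
  ->(input, pos) { input[pos, text.length] == text ? pos + text.length : nil }
end

# Ordered choice: try alternatives in order and commit to the first success.
def choice(*alternatives)
  lambda do |input, pos|
    result = nil
    alternatives.each do |alternative|
      result = alternative.call(input, pos)
      break if result
    end
    result
  end
end

# Sequence: every part must match, each starting where the previous one ended.
def sequence(*parts)
  ->(input, pos) { parts.inject(pos) { |at, part| at && part.call(input, at) } }
end

# The whole input must be consumed for the parse to count as a success.
def full_match?(matcher, input)
  matcher.call(input, 0) == input.length
end

short_first = sequence(choice(literal('a'), literal('ab')), literal('c'))
long_first  = sequence(choice(literal('ab'), literal('a')), literal('c'))

puts full_match?(short_first, 'abc') # => false
puts full_match?(long_first, 'abc')  # => true
puts full_match?(long_first, 'ac')   # => true
```

In `short_first` the choice commits to `'a'`, the following `'c'` fails against `'b'`, and the `'ab'` alternative is never tried. Listing the longer alternative first, as in `long_first`, is one way to avoid that.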
data/Rakefile CHANGED
@@ -15,7 +15,11 @@ Jeweler::Tasks.new do |gem|
15
15
  gem.homepage = "https://github.com/cjheath/treetop"
16
16
  gem.platform = Gem::Platform::RUBY
17
17
  gem.summary = "A Ruby-based text parsing and interpretation DSL"
18
- gem.files = ["LICENSE", "README.md", "Rakefile", "treetop.gemspec", "{spec,lib,bin,doc,examples}/**/*"].map{|p| Dir[p]}.flatten
18
+ gem.files = [
19
+ "LICENSE", "README.md", "Rakefile", "treetop.gemspec",
20
+ "{spec,lib,bin,examples}/**/*",
21
+ "doc/*"
22
+ ].map{|p| Dir[p] }.flatten
19
23
  gem.bindir = "bin"
20
24
  gem.executables = ["tt"]
21
25
  gem.require_path = "lib"
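The `gem.files` change above swaps the recursive `doc/**/*` pattern for the shallow `doc/*`. A minimal sketch of the difference, using a throwaway directory tree that mirrors two paths appearing in this diff. The helper `glob_in_sample_tree` is hypothetical and exists only for this illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: build a tiny sample tree in a temp directory and
# return the sorted matches of the given glob pattern inside it.
def glob_in_sample_tree(pattern)
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, 'doc', 'site'))
    FileUtils.touch(File.join(root, 'doc', 'tt.1'))
    FileUtils.touch(File.join(root, 'doc', 'site', 'index.html'))
    Dir.chdir(root) { Dir[pattern].sort }
  end
end

shallow = glob_in_sample_tree('doc/*')
deep    = glob_in_sample_tree('doc/**/*')

p shallow # => ["doc/site", "doc/tt.1"]
p deep    # => ["doc/site", "doc/site/index.html", "doc/tt.1"]
```

Note that `doc/*` still matches the `doc/site` directory entry itself, but none of the files beneath it.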
data/doc/tt.1 ADDED
@@ -0,0 +1,83 @@
1
+ .\" treetop - Bringing the simplicity of Ruby to syntactic analysis
2
+ .\"
3
+ .\" Copyright (c) 2007 Nathan Sobo.
4
+ .\"
5
+ .\" Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ .\" of this software and associated documentation files (the "Software"), to deal
7
+ .\" in the Software without restriction, including without limitation the rights
8
+ .\" to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ .\" copies of the Software, and to permit persons to whom the Software is
10
+ .\" furnished to do so, subject to the following conditions:
11
+ .\"
12
+ .\" The above copyright notice and this permission notice shall be included in
13
+ .\" all copies or substantial portions of the Software.
14
+ .\"
15
+ .\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ .\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ .\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ .\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ .\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ .\" OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ .\" THE SOFTWARE.
22
+ .TH tt 1 2013-06-19 Treetop "Treetop v1.4.14"
23
+ .SH NAME
24
+ tt \- Compile a treetop grammar file to ruby source code
25
+ .SH SYNOPSIS
26
+ .B tt
27
+ .RI [ options "] " grammar_file "[.treetop|.tt] ..."
28
+
29
+ .SH DESCRIPTION
30
+ The
31
+ .B tt
32
+ program is a command-line script to compile .treetop files into Ruby
33
+ source code.
34
+
35
+ The
36
+ .B tt
37
+ program takes a list of files with a .treetop extension and compiles
38
+ them into .rb files of the same name. You can then require these files
39
+ like any other Ruby script.
40
+
41
+ Alternately, you can supply just one .treetop file and a \-o flag to
42
+ specify the name of the output file.
43
+
44
+ Note: while treetop grammar files
45
+ .B must
46
+ have a supported filename extensions, (.treetop or .tt), the extension
47
+ name is not required when calling the compiler with grammar file
48
+ names.
49
+ .SH OPTIONS
50
+ .TP 4
51
+ .BI "\-o, \-\-output" " FILENAME"
52
+
53
+ Write parser source to
54
+ .I FILENAME.
55
+ .TP 4
56
+ .B \-f, \-\-force
57
+
58
+ Overwrite existing output file(s)
59
+ .TP 4
60
+ .B \-v, \-\-version
61
+
62
+ Show Treetop version
63
+ .TP 4
64
+ .B \-h, \-\-help
65
+
66
+ .SH EXAMPLES
67
+ .TP 4
68
+ 1 grammar -> 1 parser source
69
+
70
+ tt foo.tt
71
+ .TP 4
72
+ 2 grammars -> 2 separate parsers
73
+
74
+ tt foo bar.treetop
75
+ .TP 4
76
+ Alternately named output file
77
+
78
+ tt \-o alterate_name.rb foo
79
+ .SH SEE ALSO
80
+
81
+ The treetop website:
82
+
83
+ .B http://treetop.rubyforge.org
@@ -22,23 +22,19 @@ class String
22
22
  # The following methods are lifted from Facets 2.0.2
23
23
  def tabto(n)
24
24
  if self =~ /^( *)\S/
25
- indent(n - $1.length)
26
- else
27
- self
28
- end
29
- end
30
-
31
- unless method_defined?(:indent)
32
- def indent(n)
33
- if n >= 0
34
- gsub(/^/, ' ' * n)
25
+ # Inlined due to collision with ActiveSupport 4.0: indent(n - $1.length)
26
+ m = n - $1.length
27
+ if m >= 0
28
+ gsub(/^/, ' ' * m)
35
29
  else
36
- gsub(/^ {0,#{-n}}/, "")
30
+ gsub(/^ {0,#{-m}}/, "")
37
31
  end
32
+ else
33
+ self
38
34
  end
39
35
  end
40
36
 
41
37
  def treetop_camelize
42
38
  to_s.gsub(/\/(.?)/){ "::" + $1.upcase }.gsub(/(^|_)(.)/){ $2.upcase }
43
39
  end
44
- end
40
+ end
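The inlined `tabto` logic above can be exercised on its own. A minimal sketch as a standalone method rather than a `String` extension; the two-argument signature `tabto(text, n)` is an assumption made here so the example does not reopen `String`.

```ruby
# Standalone variant of the inlined logic: shift every line so that the
# first non-blank line starts at column n. Hypothetical signature.
def tabto(text, n)
  if text =~ /^( *)\S/
    shift = n - $1.length
    if shift >= 0
      text.gsub(/^/, ' ' * shift)          # indent every line by shift spaces
    else
      text.gsub(/^ {0,#{-shift}}/, '')     # remove up to -shift leading spaces
    end
  else
    text                                   # no non-blank content: unchanged
  end
end

puts tabto("  foo\n    bar", 0).inspect # => "foo\n  bar"
puts tabto("foo\n  bar", 4).inspect     # => "    foo\n      bar"
```

Relative indentation between lines is preserved in both directions, because every line is shifted by the same amount.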
@@ -2,7 +2,7 @@ module Treetop #:nodoc:
2
2
  module VERSION #:nodoc:
3
3
  MAJOR = 1
4
4
  MINOR = 4
5
- TINY = 14
5
+ TINY = 15
6
6
 
7
7
  STRING = [MAJOR, MINOR, TINY].join('.')
8
8
  end
@@ -86,7 +86,7 @@ module CharacterClassSpec
86
86
  end
87
87
 
88
88
  describe "a character class with a negated POSIX bracket expression" do
89
- testing_expression "[[:^space:]]"
89
+ testing_expression "[^[:space:]]"
90
90
  it "matches a character outside the negated class" do
91
91
  parse('a').should_not be_nil
92
92
  end
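The spec above now spells the negated POSIX bracket expression as `[^[:space:]]`. A minimal sketch of how that spelling behaves as a plain Ruby `Regexp`, assuming (this is not shown in the hunk) that the character class is ultimately matched by Ruby's regular expression engine. `NON_SPACE` is a hypothetical name.

```ruby
# Matches exactly one character that is not in the POSIX space class.
NON_SPACE = /\A[^[:space:]]\z/

["a", " ", "\t"].each do |char|
  puts "#{char.inspect}: #{!!(char =~ NON_SPACE)}"
end
# "a": true
# " ": false
# "\t": false
```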
@@ -12,8 +12,12 @@ describe Compiler::GrammarCompiler do
12
12
  @source_path_with_treetop_extension = "#{dir}/test_grammar.treetop"
13
13
  @source_path_with_do = "#{dir}/test_grammar_do.treetop"
14
14
  @source_path_with_tt_extension = "#{dir}/test_grammar.tt"
15
+ @source_path_with_magic_coding = "#{dir}/test_grammar_magic_coding.treetop"
16
+ @source_path_with_magic_encoding = "#{dir}/test_grammar_magic_encoding.treetop"
15
17
  @target_path = "#{@tmpdir}/test_grammar.rb"
16
18
  @target_path_with_do = "#{@tmpdir}/test_grammar_do.rb"
19
+ @target_path_with_magic_coding = "#{@tmpdir}/test_grammar_magic_coding.rb"
20
+ @target_path_with_magic_encoding = "#{@tmpdir}/test_grammar_magic_encoding.rb"
17
21
  @alternate_target_path = "#{@tmpdir}/test_grammar_alt.rb"
18
22
  delete_target_files
19
23
  end
@@ -82,6 +86,24 @@ describe Compiler::GrammarCompiler do
82
86
  Test::GrammarParser.new.parse('foo').should_not be_nil
83
87
  end
84
88
 
89
+ specify "grammars with magic 'encoding' comments keep those comments at the top" do
90
+ src_copy = "#{@tmpdir}/test_grammar_magic_encoding.treetop"
91
+ File.open(@source_path_with_magic_encoding) do |f|
92
+ File.open(src_copy,'w'){|o|o.write(f.read)}
93
+ end
94
+ compiler.compile(src_copy)
95
+ File.open(@target_path_with_magic_encoding).readline.should == "# encoding: UTF-8\n"
96
+ end
97
+
98
+ specify "grammars with magic 'coding' comments keep those comments at the top" do
99
+ src_copy = "#{@tmpdir}/test_grammar_magic_coding.treetop"
100
+ File.open(@source_path_with_magic_coding) do |f|
101
+ File.open(src_copy,'w'){|o|o.write(f.read)}
102
+ end
103
+ compiler.compile(src_copy)
104
+ File.open(@target_path_with_magic_coding).readline.should == "# coding: UTF-8\n"
105
+ end
106
+
85
107
  def delete_target_files
86
108
  File.delete(target_path) if File.exists?(target_path)
87
109
  File.delete(@target_path_with_do) if File.exists?(@target_path_with_do)
@@ -0,0 +1,8 @@
1
+ # coding: UTF-8
2
+ module Test
3
+ grammar Grammar do
4
+ rule foo do
5
+ 'foo'
6
+ end
7
+ end
8
+ end
@@ -0,0 +1,8 @@
1
+ # encoding: UTF-8
2
+ module Test
3
+ grammar Grammar do
4
+ rule foo do
5
+ 'foo'
6
+ end
7
+ end
8
+ end
data/spec/spec_helper.rb CHANGED
@@ -63,7 +63,7 @@ module Treetop
63
63
  def parse_multibyte(input, options = {})
64
64
  require 'active_support/all'
65
65
 
66
- if RUBY_VERSION !~ /^1.9/ && 'NONE' == $KCODE then $KCODE = 'UTF8' end
66
+ if RUBY_VERSION !~ /^(1\.9|2\.0)/ && 'NONE' == $KCODE then $KCODE = 'UTF8' end
67
67
  # rspec 1.3 used to do something similar (set it to 'u') that we need
68
68
  # for activerecord multibyte wrapper to kick in (1.8 only? @todo)
69
69
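The `RUBY_VERSION` guard changed in this hunk can be checked in isolation. A minimal sketch comparing which version strings each pattern matches; the names `OLD_GUARD`, `NEW_GUARD`, and `matches?` are hypothetical and do not appear in the spec helper.

```ruby
# Patterns copied from the removed and added lines of the hunk above.
OLD_GUARD = /^1.9/
NEW_GUARD = /^(1\.9|2\.0)/

# Hypothetical helper: true when the version string matches the guard,
# i.e. when the spec helper would leave $KCODE alone.
def matches?(version, guard)
  !!(version =~ guard)
end

%w[1.8.7 1.9.3 2.0.0 2.1.0].each do |version|
  puts "#{version}: old=#{matches?(version, OLD_GUARD)} new=#{matches?(version, NEW_GUARD)}"
end
# 1.8.7: old=false new=false
# 1.9.3: old=true new=true
# 2.0.0: old=false new=true
# 2.1.0: old=false new=false
```

Of these inputs, only `2.0.0` is treated differently by the new pattern. The new pattern also escapes the dot, so a string such as `1x9` matches the old guard but not the new one.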
 
data/treetop.gemspec CHANGED
@@ -5,12 +5,12 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = "treetop"
8
- s.version = "1.4.14"
8
+ s.version = "1.4.15"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
11
  s.authors = ["Nathan Sobo", "Clifford Heath"]
12
12
  s.autorequire = "treetop"
13
- s.date = "2013-06-04"
13
+ s.date = "2013-08-17"
14
14
  s.email = "cliffordheath@gmail.com"
15
15
  s.executables = ["tt"]
16
16
  s.extra_rdoc_files = [
@@ -28,21 +28,9 @@ Gem::Specification.new do |s|
28
28
  "doc/pitfalls_and_advanced_techniques.markdown",
29
29
  "doc/semantic_interpretation.markdown",
30
30
  "doc/site.rb",
31
- "doc/site/contribute.html",
32
- "doc/site/images/bottom_background.png",
33
- "doc/site/images/middle_background.png",
34
- "doc/site/images/paren_language_output.png",
35
- "doc/site/images/pivotal.gif",
36
- "doc/site/images/top_background.png",
37
- "doc/site/index.html",
38
- "doc/site/pitfalls_and_advanced_techniques.html",
39
- "doc/site/robots.txt",
40
- "doc/site/screen.css",
41
- "doc/site/semantic_interpretation.html",
42
- "doc/site/syntactic_recognition.html",
43
- "doc/site/using_in_ruby.html",
44
31
  "doc/sitegen.rb",
45
32
  "doc/syntactic_recognition.markdown",
33
+ "doc/tt.1",
46
34
  "doc/using_in_ruby.markdown",
47
35
  "examples/lambda_calculus/arithmetic.rb",
48
36
  "examples/lambda_calculus/arithmetic.treetop",
@@ -120,6 +108,8 @@ Gem::Specification.new do |s|
120
108
  "spec/compiler/test_grammar.treetop",
121
109
  "spec/compiler/test_grammar.tt",
122
110
  "spec/compiler/test_grammar_do.treetop",
111
+ "spec/compiler/test_grammar_magic_coding.treetop",
112
+ "spec/compiler/test_grammar_magic_encoding.treetop",
123
113
  "spec/compiler/tt_compiler_spec.rb",
124
114
  "spec/compiler/zero_or_more_spec.rb",
125
115
  "spec/composition/a.treetop",
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: treetop
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.4.14
4
+ version: 1.4.15
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -10,7 +10,7 @@ authors:
10
10
  autorequire: treetop
11
11
  bindir: bin
12
12
  cert_chain: []
13
- date: 2013-06-04 00:00:00.000000000 Z
13
+ date: 2013-08-17 00:00:00.000000000 Z
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
16
16
  name: polyglot
@@ -159,21 +159,9 @@ files:
159
159
  - doc/pitfalls_and_advanced_techniques.markdown
160
160
  - doc/semantic_interpretation.markdown
161
161
  - doc/site.rb
162
- - doc/site/contribute.html
163
- - doc/site/images/bottom_background.png
164
- - doc/site/images/middle_background.png
165
- - doc/site/images/paren_language_output.png
166
- - doc/site/images/pivotal.gif
167
- - doc/site/images/top_background.png
168
- - doc/site/index.html
169
- - doc/site/pitfalls_and_advanced_techniques.html
170
- - doc/site/robots.txt
171
- - doc/site/screen.css
172
- - doc/site/semantic_interpretation.html
173
- - doc/site/syntactic_recognition.html
174
- - doc/site/using_in_ruby.html
175
162
  - doc/sitegen.rb
176
163
  - doc/syntactic_recognition.markdown
164
+ - doc/tt.1
177
165
  - doc/using_in_ruby.markdown
178
166
  - examples/lambda_calculus/arithmetic.rb
179
167
  - examples/lambda_calculus/arithmetic.treetop
@@ -251,6 +239,8 @@ files:
251
239
  - spec/compiler/test_grammar.treetop
252
240
  - spec/compiler/test_grammar.tt
253
241
  - spec/compiler/test_grammar_do.treetop
242
+ - spec/compiler/test_grammar_magic_coding.treetop
243
+ - spec/compiler/test_grammar_magic_encoding.treetop
254
244
  - spec/compiler/tt_compiler_spec.rb
255
245
  - spec/compiler/zero_or_more_spec.rb
256
246
  - spec/composition/a.treetop
@@ -289,7 +279,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
289
279
  version: '0'
290
280
  segments:
291
281
  - 0
292
- hash: 622706517614693275
282
+ hash: 2062680504230675145
293
283
  required_rubygems_version: !ruby/object:Gem::Requirement
294
284
  none: false
295
285
  requirements:
@@ -1,124 +0,0 @@
1
- <html><head><link href="./screen.css" rel="stylesheet" type="text/css" />
2
- <script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
3
- </script>
4
- <script type="text/javascript">
5
- _uacct = "UA-3418876-1";
6
- urchinTracker();
7
- </script>
8
- </head><body><div id="top"><div id="main_navigation"><ul><li><a href="syntactic_recognition.html">Documentation</a></li><li>Contribute</li><li><a href="index.html">Home</a></li></ul></div></div><div id="middle"><div id="main_content"><h1>Google Group</h1>
9
-
10
- <p>I've created a <a href="http://groups.google.com/group/treetop-dev">Google Group</a> as a better place to organize discussion and development.
11
- treetop-dev@google-groups.com</p>
12
-
13
- <h1>Contributing</h1>
14
-
15
- <p>Visit <a href="http://github.com/nathansobo/treetop/tree/master">the Treetop repository page on GitHub</a> in your browser for more information about checking out the source code.</p>
16
-
17
- <p>I like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.</p>
18
-
19
- <h2>Getting Started with the Code</h2>
20
-
21
- <p>Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around <code>metagrammar.treetop</code>, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.</p>
22
-
23
- <p>Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of Treetop to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change <code>metagrammar.treetop</code> and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so must be kept backward compatible until such time as a new stable version can be produced to replace it.</p>
24
-
25
- <h2>Tests</h2>
26
-
27
- <p>Most of the compiler's tests are functional in nature. The grammar under test is used to parse and compile piece of sample code. Then I attempt to parse input with the compiled output and test its results.</p>
28
-
29
- <h1>What Needs to be Done</h1>
30
-
31
- <h2>Small Stuff</h2>
32
-
33
- <ul>
34
- <li>Improve the <code>tt</code> command line tool to allow <code>.treetop</code> extensions to be elided in its arguments.</li>
35
- <li>Generate and load temp files with <code>Treetop.load</code> rather than evaluating strings to improve stack trace readability.</li>
36
- <li>Allow <code>do/end</code> style blocks as well as curly brace blocks. This was originally omitted because I thought it would be confusing. It probably isn't.</li>
37
- </ul>
38
-
39
-
40
- <h2>Big Stuff</h2>
41
-
42
- <h4>Transient Expressions</h4>
43
-
44
- <p>Currently, every parsing expression instantiates a syntax node. This includes even very simple parsing expressions, like single characters. It is probably unnecessary for every single expression in the parse to correspond to its own syntax node, so much savings could be garnered from a transient declaration that instructs the parser only to attempt a match without instantiating nodes.</p>
45
-
46
- <h3>Generate Rule Implementations in C</h3>
47
-
48
- <p>Parsing expressions are currently compiled into simple Ruby source code that comprises the body of parsing rules, which are translated into Ruby methods. The generator could produce C instead of Ruby in the body of these method implementations.</p>
49
-
50
- <h3>Global Parsing State and Semantic Backtrack Triggering</h3>
51
-
52
- <p>Some programming language grammars are not entirely context-free, requiring that global state dictate the behavior of the parser in certain circumstances. Treetop does not currently expose explicit parser control to the grammar writer, and instead automatically constructs the syntax tree for them. A means of semantic parser control compatible with this approach would involve callback methods defined on parsing nodes. Each time a node is successfully parsed it will be given an opportunity to set global state and optionally trigger a parse failure on <em>extrasyntactic</em> grounds. Nodes will probably need to define an additional method that undoes their changes to global state when there is a parse failure and they are backtracked.</p>
53
-
54
- <p>Here is a sketch of the potential utility of such mechanisms. Consider the structure of YAML, which uses indentation to indicate block structure.</p>
55
-
56
- <pre><code>level_1:
57
- level_2a:
58
- level_2b:
59
- level_3a:
60
- level_2c:
61
- </code></pre>
62
-
63
- <p>Imagine a grammar like the following:</p>
64
-
65
- <pre><code>rule yaml_element
66
- name ':' block
67
- /
68
- name ':' value
69
- end
70
-
71
- rule block
72
- indent yaml_elements outdent
73
- end
74
-
75
- rule yaml_elements
76
- yaml_element (samedent yaml_element)*
77
- end
78
-
79
- rule samedent
80
- newline spaces {
81
- def after_success(parser_state)
82
- spaces.length == parser_state.indent_level
83
- end
84
- }
85
- end
86
-
87
- rule indent
88
- newline spaces {
89
- def after_success(parser_state)
90
- if spaces.length == parser_state.indent_level + 2
91
- parser_state.indent_level += 2
92
- true
93
- else
94
- false # fail the parse on extrasyntactic grounds
95
- end
96
- end
97
-
98
- def undo_success(parser_state)
99
- parser_state.indent_level -= 2
100
- end
101
- }
102
- end
103
-
104
- rule outdent
105
- newline spaces {
106
- def after_success(parser_state)
107
- if spaces.length == parser_state.indent_level - 2
108
- parser_state.indent_level -= 2
109
- true
110
- else
111
- false # fail the parse on extrasyntactic grounds
112
- end
113
- end
114
-
115
- def undo_success(parser_state)
116
- parser_state.indent_level += 2
117
- end
118
- }
119
- end
120
- </code></pre>
121
-
122
- <p>In this case a block will be detected only if a change in indentation warrants it. Note that this change in the state of indentation must be undone if a subsequent failure causes this node not to ultimately be incorporated into a successful result.</p>
123
-
124
- <p>I am by no means sure that the above sketch is free of problems, or even that this overall strategy is sound, but it seems like a promising path.</p></div></div><div id="bottom"></div></body></html>