regex-treetop 1.4.8

Files changed (65)
  1. data/LICENSE +19 -0
  2. data/README.md +164 -0
  3. data/Rakefile +19 -0
  4. data/bin/tt +112 -0
  5. data/doc/contributing_and_planned_features.markdown +103 -0
  6. data/doc/grammar_composition.markdown +65 -0
  7. data/doc/index.markdown +90 -0
  8. data/doc/pitfalls_and_advanced_techniques.markdown +51 -0
  9. data/doc/semantic_interpretation.markdown +189 -0
  10. data/doc/site.rb +112 -0
  11. data/doc/sitegen.rb +65 -0
  12. data/doc/syntactic_recognition.markdown +100 -0
  13. data/doc/using_in_ruby.markdown +21 -0
  14. data/examples/lambda_calculus/arithmetic.rb +551 -0
  15. data/examples/lambda_calculus/arithmetic.treetop +97 -0
  16. data/examples/lambda_calculus/arithmetic_node_classes.rb +7 -0
  17. data/examples/lambda_calculus/arithmetic_test.rb +54 -0
  18. data/examples/lambda_calculus/lambda_calculus +0 -0
  19. data/examples/lambda_calculus/lambda_calculus.rb +718 -0
  20. data/examples/lambda_calculus/lambda_calculus.treetop +132 -0
  21. data/examples/lambda_calculus/lambda_calculus_node_classes.rb +5 -0
  22. data/examples/lambda_calculus/lambda_calculus_test.rb +89 -0
  23. data/examples/lambda_calculus/test_helper.rb +18 -0
  24. data/lib/treetop.rb +16 -0
  25. data/lib/treetop/bootstrap_gen_1_metagrammar.rb +45 -0
  26. data/lib/treetop/compiler.rb +6 -0
  27. data/lib/treetop/compiler/grammar_compiler.rb +44 -0
  28. data/lib/treetop/compiler/lexical_address_space.rb +17 -0
  29. data/lib/treetop/compiler/metagrammar.rb +3392 -0
  30. data/lib/treetop/compiler/metagrammar.treetop +454 -0
  31. data/lib/treetop/compiler/node_classes.rb +21 -0
  32. data/lib/treetop/compiler/node_classes/anything_symbol.rb +18 -0
  33. data/lib/treetop/compiler/node_classes/atomic_expression.rb +14 -0
  34. data/lib/treetop/compiler/node_classes/character_class.rb +28 -0
  35. data/lib/treetop/compiler/node_classes/choice.rb +31 -0
  36. data/lib/treetop/compiler/node_classes/declaration_sequence.rb +24 -0
  37. data/lib/treetop/compiler/node_classes/grammar.rb +28 -0
  38. data/lib/treetop/compiler/node_classes/inline_module.rb +27 -0
  39. data/lib/treetop/compiler/node_classes/nonterminal.rb +13 -0
  40. data/lib/treetop/compiler/node_classes/optional.rb +19 -0
  41. data/lib/treetop/compiler/node_classes/parenthesized_expression.rb +9 -0
  42. data/lib/treetop/compiler/node_classes/parsing_expression.rb +146 -0
  43. data/lib/treetop/compiler/node_classes/parsing_rule.rb +55 -0
  44. data/lib/treetop/compiler/node_classes/predicate.rb +45 -0
  45. data/lib/treetop/compiler/node_classes/predicate_block.rb +16 -0
  46. data/lib/treetop/compiler/node_classes/regex.rb +23 -0
  47. data/lib/treetop/compiler/node_classes/repetition.rb +55 -0
  48. data/lib/treetop/compiler/node_classes/sequence.rb +71 -0
  49. data/lib/treetop/compiler/node_classes/terminal.rb +20 -0
  50. data/lib/treetop/compiler/node_classes/transient_prefix.rb +9 -0
  51. data/lib/treetop/compiler/node_classes/treetop_file.rb +9 -0
  52. data/lib/treetop/compiler/ruby_builder.rb +113 -0
  53. data/lib/treetop/ruby_extensions.rb +2 -0
  54. data/lib/treetop/ruby_extensions/string.rb +42 -0
  55. data/lib/treetop/runtime.rb +5 -0
  56. data/lib/treetop/runtime/compiled_parser.rb +118 -0
  57. data/lib/treetop/runtime/interval_skip_list.rb +4 -0
  58. data/lib/treetop/runtime/interval_skip_list/head_node.rb +15 -0
  59. data/lib/treetop/runtime/interval_skip_list/interval_skip_list.rb +200 -0
  60. data/lib/treetop/runtime/interval_skip_list/node.rb +164 -0
  61. data/lib/treetop/runtime/syntax_node.rb +114 -0
  62. data/lib/treetop/runtime/terminal_parse_failure.rb +16 -0
  63. data/lib/treetop/runtime/terminal_syntax_node.rb +17 -0
  64. data/lib/treetop/version.rb +9 -0
  65. metadata +138 -0
data/LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ Copyright (c) 2007 Nathan Sobo.
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,164 @@
1
+ Tutorial
2
+ ========
3
+ Languages can be split into two components, their *syntax* and their *semantics*. It's your understanding of English syntax that tells you the stream of words "Sleep furiously green ideas colorless" is not a valid sentence. Semantics is deeper. Even if we rearrange the above sentence to be "Colorless green ideas sleep furiously", which is syntactically correct, it remains nonsensical on a semantic level. With Treetop, you'll be dealing with languages that are much simpler than English, but these basic concepts apply. Your programs will need to address both the syntax and the semantics of the languages they interpret.
4
+
5
+ Treetop equips you with powerful tools for each of these two aspects of interpreter writing. You'll describe the syntax of your language with a *parsing expression grammar*. From this description, Treetop will generate a Ruby parser that transforms streams of characters written in your language into *abstract syntax trees* representing their structure. You'll then describe the semantics of your language in Ruby by defining methods on the syntax trees the parser generates.
6
+
7
+ Parsing Expression Grammars, The Basics
8
+ =======================================
9
+ The first step in using Treetop is defining a grammar in a file with the `.treetop` extension. Here's a grammar that's useless because it's empty:
10
+
11
+ # my_grammar.treetop
12
+ grammar MyGrammar
13
+ end
14
+
15
+ Next, you start filling your grammar with rules. Each rule associates a name with a parsing expression, like the following:
16
+
17
+ # my_grammar.treetop
18
+ # You can use a .tt extension instead if you wish
19
+ grammar MyGrammar
20
+ rule hello
21
+ 'hello chomsky'
22
+ end
23
+ end
24
+
25
+ The first rule becomes the *root* of the grammar, causing its expression to be matched when a parser for the grammar is fed a string. The above grammar can now be used in a Ruby program. Notice how a string matching the first rule parses successfully, but a second nonmatching string does not.
26
+
27
+ # use_grammar.rb
28
+ require 'rubygems'
29
+ require 'treetop'
30
+ Treetop.load 'my_grammar'
31
+ # or just:
32
+ # require 'my_grammar' # This works because Polyglot hooks "require" to find and load Treetop files
33
+
34
+ parser = MyGrammarParser.new
35
+ puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
36
+ puts parser.parse('silly generativists!') # => nil
37
+
38
+ Users of *regular expressions* will find parsing expressions familiar. They share the same basic purpose, matching strings against patterns. However, parsing expressions can recognize a broader category of languages than their less expressive brethren. Before we get into demonstrating that, let's cover some basics. At first parsing expressions won't seem much different. Trust that they are.
39
+
40
+ Terminal Symbols
41
+ ----------------
42
+ The expression in the grammar above is a terminal symbol. It will only match a string that matches it exactly. There are two other kinds of terminal symbols, which we'll revisit later. Terminals are called *atomic expressions* because they aren't composed of smaller expressions.
43
+
44
+ Ordered Choices
45
+ ---------------
46
+ Ordered choices are *composite expressions*, which allow for any of several subexpressions to be matched. These should be familiar from regular expressions, but in parsing expressions, they are delimited by the `/` character. It's important to note that the choices are prioritized in the order they appear. If an earlier expression is matched, no subsequent expressions are tried. Here's an example:
47
+
48
+ # my_grammar.treetop
49
+ grammar MyGrammar
50
+ rule hello
51
+ 'hello chomsky' / 'hello lambek'
52
+ end
53
+ end
54
+
55
+ # fragment of use_grammar.rb
56
+ puts parser.parse('hello chomsky') # => Treetop::Runtime::SyntaxNode
57
+ puts parser.parse('hello lambek') # => Treetop::Runtime::SyntaxNode
58
+ puts parser.parse('silly generativists!') # => nil
59
+
60
+ Note that once a choice rule has matched the text using a particular alternative at a particular location in the input and hence has succeeded, that choice will never be reconsidered, even if the chosen alternative causes another rule to fail where a later alternative wouldn't have. It's always a later alternative, since the first to succeed is final - why keep looking when you've found what you wanted? This is a feature of PEG parsers that you need to understand if you're going to succeed in using Treetop. In order to memoize successes and failures, such decisions cannot be reversed. Luckily, Treetop provides a variety of clever ways you can tell it to avoid making the wrong decisions. But more on that later.
61
+
62
+ Sequences
63
+ ---------
64
+ Sequences are composed of other parsing expressions separated by spaces. Using sequences, we can tighten up the above grammar.
65
+
66
+ # my_grammar.treetop
67
+ grammar MyGrammar
68
+ rule hello
69
+ 'hello ' ('chomsky' / 'lambek')
70
+ end
71
+ end
72
+
73
+ Note the use of parentheses to override the default precedence rules, which bind sequences more tightly than choices.
74
+
75
+ Once the whole sequence has been matched, the result is memoized and the details of the match will not be reconsidered for that location in the input.
76
+
77
+ Nonterminal Symbols
78
+ -------------------
79
+ Here we leave regular expressions behind. Nonterminals allow expressions to refer to other expressions by name. A trivial use of this facility would allow us to make the above grammar more readable should the list of names grow longer.
80
+
81
+ # my_grammar.treetop
82
+ grammar MyGrammar
83
+ rule hello
84
+ 'hello ' linguist
85
+ end
86
+
87
+ rule linguist
88
+ 'chomsky' / 'lambek' / 'jacobsen' / 'frege'
89
+ end
90
+ end
91
+
92
+ The true power of this facility, however, is unleashed when writing *recursive expressions*. Here is a self-referential expression that matches any number of open parentheses followed by the same number of closing parentheses. No regular expression can recognize this language, which can be proved with the *pumping lemma*.
93
+
94
+ # parentheses.treetop
95
+ grammar Parentheses
96
+ rule parens
97
+ '(' parens ')' / ''
98
+ end
99
+ end
100
+
101
+
102
+ The `parens` expression simply states that a `parens` is a set of parentheses surrounding another `parens` expression or, if that doesn't match, the empty string. If you are uncomfortable with recursion, it's time to get comfortable, because it is the basis of language. Here's a tip: Don't try to imagine the parser circling round and round through the same rule. Instead, imagine the rule is *already* defined while you are defining it. If you imagine that `parens` already matches a string of matching parentheses, then it's easy to think of `parens` as an opening and a closing parenthesis around another set of matching parentheses, which, conveniently, you happen to be defining. You know that `parens` is supposed to represent a string of matched parentheses, so trust in that meaning, even if you haven't fully implemented it yet.
103
+
104
+ Repetition
105
+ ----------
106
+ Any item in a rule may be followed by a '+' or a '*' character, signifying one-or-more and zero-or-more occurrences of that item, respectively. Beware though; the match is greedy, and if it matches too many items and causes subsequent items in the sequence to fail, the number matched will never be reconsidered. Here's a simple example of a rule that will never succeed:
107
+
108
+ # toogreedy.treetop
109
+ grammar TooGreedy
110
+ rule a_s
111
+ 'a'* 'a'
112
+ end
113
+ end
114
+
115
+ The 'a'* will always eat up any 'a's that follow, and the subsequent 'a' will find none there, so the whole rule will fail. You might need to use lookahead to avoid matching too much.
116
+
117
+ Negative Lookahead
118
+ ------------------
119
+
120
+ When you need to ensure that the following item *doesn't* match in some case where it might otherwise, you can use negat!ve lookahead, which is an item preceded by a ! - here's an example:
121
+
122
+ # postcondition.treetop
123
+ grammar PostCondition
124
+ rule conditional_sentence
125
+ ( !conditional_keyword word )+ conditional_keyword [ \t]+ word*
126
+ end
127
+
128
+ rule word
129
+ ([a-zA-Z]+ [ \t]+)
130
+ end
131
+
132
+ rule conditional_keyword
133
+ 'if' / 'while' / 'until'
134
+ end
135
+ end
136
+
137
+ Even though the rule `word` would match any of the conditional keywords, the first words of a conditional_sentence must not be conditional_keywords. The negative lookahead prevents that matching, and prevents the repetition from matching too much input. Note that the lookahead may be a grammar rule of any complexity, including one that isn't used elsewhere in your grammar.
138
+
139
+ Positive lookahead
140
+ ------------------
141
+
142
+ Sometimes you want an item to match, but only if the *following* text would match some pattern. You don't want to consume that following text, but if it's not there, you want this rule to fail. You can add such a positive lookahead to a rule by appending the lookahead expression preceded by an & character.
143
+
144
+
145
+
146
+ Features to cover in the talk
147
+ =============================
148
+
149
+ * Treetop files
150
+ * Grammar definition
151
+ * Rules
152
+ * Loading a grammar
153
+ * Compiling a grammar with the `tt` command
154
+ * Accessing a parser for the grammar from Ruby
155
+ * Parsing Expressions of all kinds
156
+ ? Left recursion and factorization
157
+ - Here I can talk about function application, discussing how the operator
158
+ could be an arbitrary expression
159
+ * Inline node class eval blocks
160
+ * Node class declarations
161
+ * Labels
162
+ * Use of super within labels
163
+ * Grammar composition with include
164
+ * Use of super with grammar composition
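The matching behavior the README describes (ordered choices that commit to the first success, greedy repetition that never gives input back, and recursive rules) can be sketched in plain Ruby without Treetop. The helpers below (`term`, `seq`, `choice`, `star`, `parens`, `full_match?`) are hypothetical names invented for this illustration; they are not Treetop's API, and unlike Treetop they do no memoization and build no syntax nodes.

```ruby
# Toy PEG matchers in plain Ruby. These helpers are illustrative only and are
# not part of Treetop. A matcher is a lambda taking (input, pos) and returning
# the position after the match, or nil on failure.

# Terminal: matches the exact string at the current position.
def term(str)
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

# Sequence: each part starts where the previous one ended; any failure fails all.
def seq(*parts)
  ->(input, pos) { parts.reduce(pos) { |at, part| at && part.call(input, at) } }
end

# Ordered choice: the first alternative that succeeds wins and is never revisited.
def choice(*alts)
  lambda do |input, pos|
    result = nil
    alts.each do |alt|
      result = alt.call(input, pos)
      break if result
    end
    result
  end
end

# Greedy zero-or-more: consumes as many matches as it can and never gives any back.
def star(part)
  lambda do |input, pos|
    loop do
      nxt = part.call(input, pos)
      break if nxt.nil? || nxt == pos # stop on failure or on an empty match
      pos = nxt
    end
    pos
  end
end

# Recursive rule, mirroring: '(' parens ')' / ''
def parens
  choice(seq(term('('), ->(input, pos) { parens.call(input, pos) }, term(')')), term(''))
end

# True only when the matcher consumes the whole input.
def full_match?(matcher, input)
  matcher.call(input, 0) == input.length
end

puts full_match?(seq(star(term('a')), term('a')), 'aaa')               # => false
puts full_match?(seq(choice(term('a'), term('ab')), term('c')), 'abc') # => false
puts full_match?(seq(choice(term('ab'), term('a')), term('c')), 'abc') # => true
puts full_match?(parens, '((()))')                                     # => true
puts full_match?(parens, '(()')                                        # => false
```

The second and third lines differ only in the order of the alternatives: once `'a'` succeeds, `'ab'` is never tried, so putting the longer alternative first is what lets the sequence succeed.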
data/Rakefile ADDED
@@ -0,0 +1,19 @@
1
+ dir = File.dirname(__FILE__)
2
+ require 'rubygems'
3
+ require 'rake'
4
+ $LOAD_PATH.unshift(File.join(dir, 'vendor', 'rspec', 'lib'))
5
+ require 'spec/rake/spectask'
6
+
7
+ require 'rake/gempackagetask'
8
+
9
+ task :default => :spec
10
+
11
+ Spec::Rake::SpecTask.new do |t|
12
+ t.pattern = 'spec/**/*spec.rb'
13
+ end
14
+
15
+ load "./treetop.gemspec"
16
+
17
+ Rake::GemPackageTask.new($gemspec) do |pkg|
18
+ pkg.need_tar = true
19
+ end
data/bin/tt ADDED
@@ -0,0 +1,112 @@
1
+ #!/usr/bin/env ruby
2
+ require 'optparse'
3
+ require 'rubygems'
4
+ require 'treetop'
5
+
6
+ $LOAD_PATH.unshift(File.expand_path(File.dirname(__FILE__) + "/../lib"))
7
+ require 'treetop'
8
+ require 'treetop/version'
9
+
10
+ options = {}
11
+ parser = OptionParser.new do |opts|
12
+ exts = Treetop::VALID_GRAMMAR_EXT.collect { |i| '.' + i }
13
+
14
+ opts.banner = "Treetop Parsing Expression Grammar (PEG) Command Line Compiler"
15
+ opts.define_head "Usage: tt [options] grammar_file[#{exts.join('|')}] ..."
16
+ opts.separator ''
17
+ opts.separator 'Examples:'
18
+ opts.separator ' tt foo.tt # 1 grammar -> 1 parser source'
19
+ opts.separator ' tt foo bar.treetop # 2 grammars -> 2 separate parsers'
20
+ opts.separator ' tt -o alt_name.rb foo # alternately named output file'
21
+ opts.separator ''
22
+ opts.separator ''
23
+ opts.separator 'NOTE: while treetop grammar files *must* have one of the following'
24
+ opts.separator 'filename extensions, the extension name is not required when calling'
25
+ opts.separator 'the compiler with grammar file names.'
26
+ opts.separator ''
27
+ opts.separator " Valid extensions: #{exts.join(', ')}"
28
+ opts.separator ''
29
+ opts.separator ''
30
+ opts.separator 'Options:'
31
+
32
+ opts.on('-o', '--output FILENAME', 'Write parser source to FILENAME') do |fn|
33
+ options[:out_file] = fn
34
+ end
35
+
36
+ opts.on('-f', '--force', 'Overwrite existing output file(s)') do
37
+ options[:force] = true
38
+ end
39
+
40
+ opts.on_tail('-v', '--version', 'Show Treetop version') do
41
+ puts "Treetop v#{Treetop::VERSION::STRING}"
42
+ exit
43
+ end
44
+
45
+ opts.on_tail('-h', '--help', 'Show this help message') do
46
+ puts opts
47
+ exit
48
+ end
49
+
50
+ end
51
+ file_list = parser.parse!
52
+
53
+ # check options and arg constraints
54
+ if file_list.empty? || (options[:out_file] && file_list.size > 1)
55
+ puts parser
56
+ exit 1
57
+ end
58
+
59
+ def grammar_exist?(filename)
60
+ if File.extname(filename).empty?
61
+ Treetop::VALID_GRAMMAR_EXT.each do |ext|
62
+ fn_ext = "#{filename}.#{ext}"
63
+ return true if File.exist?(fn_ext) && !File.zero?(fn_ext)
64
+ end
65
+ end
66
+ File.exist?(filename) && !File.zero?(filename)
67
+ end
68
+
69
+ def full_grammar_filename(filename)
70
+ return filename if !File.extname(filename).empty?
71
+ Treetop::VALID_GRAMMAR_EXT.each do |ext|
72
+ fn_ext = "#{filename}.#{ext}"
73
+ return fn_ext if File.exist?(fn_ext) && !File.zero?(fn_ext)
74
+ end
75
+ end
76
+
77
+ def protect_output?(filename, forced=false)
78
+ if !forced and
79
+ File.exist?(filename) and
80
+ (l=File.open(filename) { |f| f.gets rescue "" }) != Treetop::Compiler::AUTOGENERATED
81
+ puts "ERROR: '#{filename}' output already exists; skipping compilation...\n"
82
+ return true
83
+ end
84
+ false
85
+ end
86
+
87
+ compiler = Treetop::Compiler::GrammarCompiler.new
88
+
89
+ while !file_list.empty?
90
+ treetop_file = file_list.shift
91
+
92
+ # handle nonexistent and existent grammar files mixed together
93
+ if !grammar_exist?(treetop_file)
94
+ puts "ERROR: input grammar file '#{treetop_file}' does not exist; continuing...\n"
95
+ next
96
+ end
97
+
98
+ # try to compile
99
+ treetop_file = full_grammar_filename(treetop_file)
100
+ std_output_file = treetop_file.gsub(Treetop::VALID_GRAMMAR_EXT_REGEXP, '.rb')
101
+
102
+ if options[:out_file]
103
+ # explicit output file name option; never overwrite unless forced
104
+ next if protect_output?(options[:out_file], options[:force])
105
+ compiler.compile(treetop_file, options[:out_file])
106
+ else
107
+ # compile one input file from input file list option; never overwrite unless forced
108
+ next if protect_output?(std_output_file, options[:force])
109
+ compiler.compile(treetop_file)
110
+ end
111
+
112
+ end
data/doc/contributing_and_planned_features.markdown ADDED
@@ -0,0 +1,103 @@
1
+ #Google Group
2
+ I've created a <a href="http://groups.google.com/group/treetop-dev">Google Group</a> as a better place to organize discussion and development.
3
+ treetop-dev@google-groups.com
4
+
5
+ #Contributing
6
+ Visit <a href="http://github.com/nathansobo/treetop/tree/master">the Treetop repository page on GitHub</a> in your browser for more information about checking out the source code.
7
+
8
+ I'd like to try Rubinius's policy regarding commit rights. If you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.
9
+
10
+
11
+ ##Getting Started with the Code
12
+ The Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around `metagrammar.treetop`, which specifies the grammar for Treetop grammars. I took a hybrid approach with regard to definition of methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.
13
+
14
+ Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of Treetop to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change `metagrammar.treetop` and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so must be kept backward compatible until such time as a new stable version can be produced to replace it.
15
+
16
+ ##Tests
17
+ Most of the compiler's tests are functional in nature. The grammar under test is used to parse and compile pieces of sample code. Then I attempt to parse input with the compiled output and test its results.
18
+
19
+ #What Needs to be Done
20
+ ##Small Stuff
21
+ * Improve the `tt` command line tool to allow `.treetop` extensions to be elided in its arguments.
22
+ * Generate and load temp files with `Treetop.load` rather than evaluating strings to improve stack trace readability.
23
+ * Allow `do/end` style blocks as well as curly brace blocks. This was originally omitted because I thought it would be confusing. It probably isn't.
24
+
25
+ ##Big Stuff
26
+ ###Transient Expressions
27
+ Currently, every parsing expression instantiates a syntax node. This includes even very simple parsing expressions, like single characters. It is probably unnecessary for every single expression in the parse to correspond to its own syntax node, so much savings could be garnered from a transient declaration that instructs the parser only to attempt a match without instantiating nodes.
28
+
29
+ ###Generate Rule Implementations in C
30
+ Parsing expressions are currently compiled into simple Ruby source code that comprises the body of parsing rules, which are translated into Ruby methods. The generator could produce C instead of Ruby in the body of these method implementations.
31
+
32
+ ###Global Parsing State and Semantic Backtrack Triggering
33
+ Some programming language grammars are not entirely context-free, requiring that global state dictate the behavior of the parser in certain circumstances. Treetop does not currently expose explicit parser control to the grammar writer, and instead automatically constructs the syntax tree for them. A means of semantic parser control compatible with this approach would involve callback methods defined on parsing nodes. Each time a node is successfully parsed it will be given an opportunity to set global state and optionally trigger a parse failure on _extrasyntactic_ grounds. Nodes will probably need to define an additional method that undoes their changes to global state when there is a parse failure and they are backtracked.
34
+
35
+ Here is a sketch of the potential utility of such mechanisms. Consider the structure of YAML, which uses indentation to indicate block structure.
36
+
37
+ level_1:
38
+ level_2a:
39
+ level_2b:
40
+ level_3a:
41
+ level_2c:
42
+
43
+ Imagine a grammar like the following:
44
+
45
+ rule yaml_element
46
+ name ':' block
47
+ /
48
+ name ':' value
49
+ end
50
+
51
+ rule block
52
+ indent yaml_elements outdent
53
+ end
54
+
55
+ rule yaml_elements
56
+ yaml_element (samedent yaml_element)*
57
+ end
58
+
59
+ rule samedent
60
+ newline spaces {
61
+ def after_success(parser_state)
62
+ spaces.length == parser_state.indent_level
63
+ end
64
+ }
65
+ end
66
+
67
+ rule indent
68
+ newline spaces {
69
+ def after_success(parser_state)
70
+ if spaces.length == parser_state.indent_level + 2
71
+ parser_state.indent_level += 2
72
+ true
73
+ else
74
+ false # fail the parse on extrasyntactic grounds
75
+ end
76
+ end
77
+
78
+ def undo_success(parser_state)
79
+ parser_state.indent_level -= 2
80
+ end
81
+ }
82
+ end
83
+
84
+ rule outdent
85
+ newline spaces {
86
+ def after_success(parser_state)
87
+ if spaces.length == parser_state.indent_level - 2
88
+ parser_state.indent_level -= 2
89
+ true
90
+ else
91
+ false # fail the parse on extrasyntactic grounds
92
+ end
93
+ end
94
+
95
+ def undo_success(parser_state)
96
+ parser_state.indent_level += 2
97
+ end
98
+ }
99
+ end
100
+
101
+ In this case a block will be detected only if a change in indentation warrants it. Note that this change in the state of indentation must be undone if a subsequent failure causes this node not to ultimately be incorporated into a successful result.
102
+
103
+ I am by no means sure that the above sketch is free of problems, or even that this overall strategy is sound, but it seems like a promising path.
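The set-then-undo idea behind the proposed `after_success` and `undo_success` callbacks can be tried out in plain Ruby, independent of any parser. This is only a sketch of the proposal, which the document itself marks as unimplemented: `ParserState` and `IndentNode` are hypothetical stand-ins, and `width` plays the role of `spaces.length` in the `indent` rule.

```ruby
# Hypothetical parser state; the proposal only names an `indent_level` field.
ParserState = Struct.new(:indent_level)

# Stand-in for a node produced by the `indent` rule (not a Treetop class).
class IndentNode
  def initialize(width)
    @width = width # corresponds to spaces.length in the grammar sketch
  end

  # Accept only if indentation grew by exactly two, and record the new level.
  def after_success(state)
    return false unless @width == state.indent_level + 2

    state.indent_level += 2
    true
  end

  # Called if the node is later discarded by backtracking.
  def undo_success(state)
    state.indent_level -= 2
  end
end

state = ParserState.new(0)
node = IndentNode.new(2)
puts node.after_success(state)               # => true
puts state.indent_level                      # => 2
node.undo_success(state)                     # backtracking discards the node
puts state.indent_level                      # => 0
puts IndentNode.new(6).after_success(state)  # => false
```

A rejected node leaves the state untouched, so only nodes whose `after_success` returned true would need a matching `undo_success` call.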
data/doc/grammar_composition.markdown ADDED
@@ -0,0 +1,65 @@
1
+ #Grammar Composition
2
+ A unique property of parsing expression grammars is that they are _closed under composition_. This means that when you compose two grammars they yield another grammar that can be composed yet again. This is a radical departure from parsing frameworks that rely on lexical scanning, which makes composition impossible. Treetop's facilities for composition are built upon those of Ruby.
3
+
4
+ ##The Mapping of Treetop Constructs to Ruby Constructs
5
+ When Treetop compiles a grammar definition, it produces a module and a class. The module contains methods implementing all of the rules defined in the grammar. The generated class is a subclass of Treetop::Runtime::CompiledParser and includes the module. For example:
6
+
7
+ grammar Foo
8
+ ...
9
+ end
10
+
11
+ results in a Ruby module named `Foo` and a Ruby class named `FooParser` that `include`s the `Foo` module.
12
+
13
+ ##Using Mixin Semantics to Compose Grammars
14
+ Because grammars are just modules, they can be mixed into one another. This enables grammars to share rules.
15
+
16
+ grammar A
17
+ rule a
18
+ 'a'
19
+ end
20
+ end
21
+
22
+ grammar B
23
+ include A
24
+
25
+ rule ab
26
+ a 'b'
27
+ end
28
+ end
29
+
30
+ Grammar `B` above references rule `a` defined in a separate grammar that it includes. Because module inclusion places modules in the ancestor chain, rules may also be overridden with the use of the `super` keyword accessing the overridden rule.
31
+
32
+ grammar A
33
+ rule a
34
+ 'a'
35
+ end
36
+ end
37
+
38
+ grammar B
39
+ include A
40
+
41
+ rule a
42
+ super / 'b'
43
+ end
44
+ end
45
+
46
+ Now rule `a` in grammar `B` matches either `'a'` or `'b'`.
47
+
48
+ ##Motivation
49
+ Imagine a grammar for Ruby that took account of SQL queries embedded in strings within the language. That could be achieved by combining two existing grammars.
50
+
51
+ grammar RubyPlusSQL
52
+ include Ruby
53
+ include SQL
54
+
55
+ rule expression
56
+ ruby_expression
57
+ end
58
+
59
+ rule ruby_string
60
+ ruby_quote sql_expression ruby_quote / super
61
+ end
62
+ end
63
+
64
+ ##Work to be Done
65
+ It has become clear that the include facility in grammars would be more useful if it could add a name prefix to all rules from the included grammar to avoid collisions. This is a planned but currently unimplemented feature.
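The override of rule `a` with `super / 'b'` relies on ordinary Ruby mixin semantics, which can be seen in isolation. In this sketch the modules `A` and `B` mirror the grammars above, but the method name `match_a` and the plain `BParser` class are assumptions made for illustration; they are not the code Treetop generates, and the real parser class would subclass `Treetop::Runtime::CompiledParser`.

```ruby
# Hypothetical stand-ins for the modules a grammar compiles to. The method
# name `match_a` is illustrative and is not what the compiler emits.
module A
  # Rule a: matches the literal 'a'.
  def match_a(input)
    input == 'a'
  end
end

module B
  include A

  # Overrides rule a as `super / 'b'`: try A's version first, then 'b'.
  def match_a(input)
    super || input == 'b'
  end
end

# Plain class standing in for the generated parser class.
class BParser
  include B
end

parser = BParser.new
puts parser.match_a('a')                 # => true
puts parser.match_a('b')                 # => true
puts parser.match_a('c')                 # => false
puts BParser.ancestors.first(3).inspect  # => [BParser, B, A]
```

Because `B` sits ahead of `A` in the ancestor chain, `super` inside `B#match_a` reaches `A#match_a`, which is the mechanism the override in grammar `B` depends on.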