personify 1.0.0

Files changed (130)
  1. data/.gitignore +1 -0
  2. data/LICENSE +20 -0
  3. data/README.md +172 -0
  4. data/Rakefile +53 -0
  5. data/VERSION +1 -0
  6. data/doc/syntax_ideas.md +141 -0
  7. data/lib/personify/context.rb +55 -0
  8. data/lib/personify/parser/personify.rb +1071 -0
  9. data/lib/personify/parser/personify.treetop +107 -0
  10. data/lib/personify/parser/personify_node_classes.rb +121 -0
  11. data/lib/personify/template.rb +17 -0
  12. data/lib/personify.rb +8 -0
  13. data/script/generate_parser.rb +6 -0
  14. data/test/context_test.rb +122 -0
  15. data/test/fixtures/multiple_tags.txt +8 -0
  16. data/test/parse_runner.rb +60 -0
  17. data/test/parser_test.rb +291 -0
  18. data/test/test_helper.rb +16 -0
  19. data/vendor/treetop/.gitignore +5 -0
  20. data/vendor/treetop/History.txt +9 -0
  21. data/vendor/treetop/README +164 -0
  22. data/vendor/treetop/Rakefile +20 -0
  23. data/vendor/treetop/Treetop.tmbundle/Snippets/grammar ___ end.tmSnippet +20 -0
  24. data/vendor/treetop/Treetop.tmbundle/Snippets/rule ___ end.tmSnippet +18 -0
  25. data/vendor/treetop/Treetop.tmbundle/Syntaxes/Treetop Grammar.tmLanguage +251 -0
  26. data/vendor/treetop/Treetop.tmbundle/info.plist +10 -0
  27. data/vendor/treetop/bin/tt +28 -0
  28. data/vendor/treetop/doc/contributing_and_planned_features.markdown +103 -0
  29. data/vendor/treetop/doc/grammar_composition.markdown +65 -0
  30. data/vendor/treetop/doc/index.markdown +90 -0
  31. data/vendor/treetop/doc/pitfalls_and_advanced_techniques.markdown +51 -0
  32. data/vendor/treetop/doc/semantic_interpretation.markdown +189 -0
  33. data/vendor/treetop/doc/site.rb +110 -0
  34. data/vendor/treetop/doc/sitegen.rb +60 -0
  35. data/vendor/treetop/doc/syntactic_recognition.markdown +100 -0
  36. data/vendor/treetop/doc/using_in_ruby.markdown +21 -0
  37. data/vendor/treetop/examples/lambda_calculus/arithmetic.rb +551 -0
  38. data/vendor/treetop/examples/lambda_calculus/arithmetic.treetop +97 -0
  39. data/vendor/treetop/examples/lambda_calculus/arithmetic_node_classes.rb +7 -0
  40. data/vendor/treetop/examples/lambda_calculus/arithmetic_test.rb +54 -0
  41. data/vendor/treetop/examples/lambda_calculus/lambda_calculus +0 -0
  42. data/vendor/treetop/examples/lambda_calculus/lambda_calculus.rb +718 -0
  43. data/vendor/treetop/examples/lambda_calculus/lambda_calculus.treetop +132 -0
  44. data/vendor/treetop/examples/lambda_calculus/lambda_calculus_node_classes.rb +5 -0
  45. data/vendor/treetop/examples/lambda_calculus/lambda_calculus_test.rb +89 -0
  46. data/vendor/treetop/examples/lambda_calculus/test_helper.rb +18 -0
  47. data/vendor/treetop/lib/treetop/bootstrap_gen_1_metagrammar.rb +45 -0
  48. data/vendor/treetop/lib/treetop/compiler/grammar_compiler.rb +40 -0
  49. data/vendor/treetop/lib/treetop/compiler/lexical_address_space.rb +17 -0
  50. data/vendor/treetop/lib/treetop/compiler/metagrammar.rb +2955 -0
  51. data/vendor/treetop/lib/treetop/compiler/metagrammar.treetop +404 -0
  52. data/vendor/treetop/lib/treetop/compiler/node_classes/anything_symbol.rb +20 -0
  53. data/vendor/treetop/lib/treetop/compiler/node_classes/atomic_expression.rb +14 -0
  54. data/vendor/treetop/lib/treetop/compiler/node_classes/character_class.rb +22 -0
  55. data/vendor/treetop/lib/treetop/compiler/node_classes/choice.rb +31 -0
  56. data/vendor/treetop/lib/treetop/compiler/node_classes/declaration_sequence.rb +24 -0
  57. data/vendor/treetop/lib/treetop/compiler/node_classes/grammar.rb +28 -0
  58. data/vendor/treetop/lib/treetop/compiler/node_classes/inline_module.rb +27 -0
  59. data/vendor/treetop/lib/treetop/compiler/node_classes/nonterminal.rb +13 -0
  60. data/vendor/treetop/lib/treetop/compiler/node_classes/optional.rb +19 -0
  61. data/vendor/treetop/lib/treetop/compiler/node_classes/parenthesized_expression.rb +9 -0
  62. data/vendor/treetop/lib/treetop/compiler/node_classes/parsing_expression.rb +138 -0
  63. data/vendor/treetop/lib/treetop/compiler/node_classes/parsing_rule.rb +55 -0
  64. data/vendor/treetop/lib/treetop/compiler/node_classes/predicate.rb +45 -0
  65. data/vendor/treetop/lib/treetop/compiler/node_classes/repetition.rb +55 -0
  66. data/vendor/treetop/lib/treetop/compiler/node_classes/sequence.rb +68 -0
  67. data/vendor/treetop/lib/treetop/compiler/node_classes/terminal.rb +20 -0
  68. data/vendor/treetop/lib/treetop/compiler/node_classes/transient_prefix.rb +9 -0
  69. data/vendor/treetop/lib/treetop/compiler/node_classes/treetop_file.rb +9 -0
  70. data/vendor/treetop/lib/treetop/compiler/node_classes.rb +19 -0
  71. data/vendor/treetop/lib/treetop/compiler/ruby_builder.rb +113 -0
  72. data/vendor/treetop/lib/treetop/compiler.rb +6 -0
  73. data/vendor/treetop/lib/treetop/ruby_extensions/string.rb +42 -0
  74. data/vendor/treetop/lib/treetop/ruby_extensions.rb +2 -0
  75. data/vendor/treetop/lib/treetop/runtime/compiled_parser.rb +95 -0
  76. data/vendor/treetop/lib/treetop/runtime/interval_skip_list/head_node.rb +15 -0
  77. data/vendor/treetop/lib/treetop/runtime/interval_skip_list/interval_skip_list.rb +200 -0
  78. data/vendor/treetop/lib/treetop/runtime/interval_skip_list/node.rb +164 -0
  79. data/vendor/treetop/lib/treetop/runtime/interval_skip_list.rb +4 -0
  80. data/vendor/treetop/lib/treetop/runtime/syntax_node.rb +72 -0
  81. data/vendor/treetop/lib/treetop/runtime/terminal_parse_failure.rb +16 -0
  82. data/vendor/treetop/lib/treetop/runtime/terminal_syntax_node.rb +17 -0
  83. data/vendor/treetop/lib/treetop/runtime.rb +5 -0
  84. data/vendor/treetop/lib/treetop/version.rb +9 -0
  85. data/vendor/treetop/lib/treetop.rb +11 -0
  86. data/vendor/treetop/script/generate_metagrammar.rb +14 -0
  87. data/vendor/treetop/script/svnadd +11 -0
  88. data/vendor/treetop/script/svnrm +11 -0
  89. data/vendor/treetop/spec/compiler/and_predicate_spec.rb +36 -0
  90. data/vendor/treetop/spec/compiler/anything_symbol_spec.rb +52 -0
  91. data/vendor/treetop/spec/compiler/character_class_spec.rb +188 -0
  92. data/vendor/treetop/spec/compiler/choice_spec.rb +80 -0
  93. data/vendor/treetop/spec/compiler/circular_compilation_spec.rb +28 -0
  94. data/vendor/treetop/spec/compiler/failure_propagation_functional_spec.rb +21 -0
  95. data/vendor/treetop/spec/compiler/grammar_compiler_spec.rb +84 -0
  96. data/vendor/treetop/spec/compiler/grammar_spec.rb +41 -0
  97. data/vendor/treetop/spec/compiler/nonterminal_symbol_spec.rb +40 -0
  98. data/vendor/treetop/spec/compiler/not_predicate_spec.rb +38 -0
  99. data/vendor/treetop/spec/compiler/one_or_more_spec.rb +35 -0
  100. data/vendor/treetop/spec/compiler/optional_spec.rb +37 -0
  101. data/vendor/treetop/spec/compiler/parenthesized_expression_spec.rb +19 -0
  102. data/vendor/treetop/spec/compiler/parsing_rule_spec.rb +32 -0
  103. data/vendor/treetop/spec/compiler/sequence_spec.rb +115 -0
  104. data/vendor/treetop/spec/compiler/terminal_spec.rb +81 -0
  105. data/vendor/treetop/spec/compiler/terminal_symbol_spec.rb +37 -0
  106. data/vendor/treetop/spec/compiler/test_grammar.treetop +7 -0
  107. data/vendor/treetop/spec/compiler/test_grammar.tt +7 -0
  108. data/vendor/treetop/spec/compiler/test_grammar_do.treetop +7 -0
  109. data/vendor/treetop/spec/compiler/zero_or_more_spec.rb +56 -0
  110. data/vendor/treetop/spec/composition/a.treetop +11 -0
  111. data/vendor/treetop/spec/composition/b.treetop +11 -0
  112. data/vendor/treetop/spec/composition/c.treetop +10 -0
  113. data/vendor/treetop/spec/composition/d.treetop +10 -0
  114. data/vendor/treetop/spec/composition/grammar_composition_spec.rb +26 -0
  115. data/vendor/treetop/spec/ruby_extensions/string_spec.rb +32 -0
  116. data/vendor/treetop/spec/runtime/compiled_parser_spec.rb +101 -0
  117. data/vendor/treetop/spec/runtime/interval_skip_list/delete_spec.rb +147 -0
  118. data/vendor/treetop/spec/runtime/interval_skip_list/expire_range_spec.rb +349 -0
  119. data/vendor/treetop/spec/runtime/interval_skip_list/insert_and_delete_node.rb +385 -0
  120. data/vendor/treetop/spec/runtime/interval_skip_list/insert_spec.rb +660 -0
  121. data/vendor/treetop/spec/runtime/interval_skip_list/interval_skip_list_spec.graffle +6175 -0
  122. data/vendor/treetop/spec/runtime/interval_skip_list/interval_skip_list_spec.rb +58 -0
  123. data/vendor/treetop/spec/runtime/interval_skip_list/palindromic_fixture.rb +23 -0
  124. data/vendor/treetop/spec/runtime/interval_skip_list/palindromic_fixture_spec.rb +164 -0
  125. data/vendor/treetop/spec/runtime/interval_skip_list/spec_helper.rb +84 -0
  126. data/vendor/treetop/spec/runtime/syntax_node_spec.rb +53 -0
  127. data/vendor/treetop/spec/spec_helper.rb +106 -0
  128. data/vendor/treetop/spec/spec_suite.rb +4 -0
  129. data/vendor/treetop/treetop.gemspec +18 -0
  130. metadata +196 -0
@@ -0,0 +1,251 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+ <plist version="1.0">
+ <dict>
+ <key>fileTypes</key>
+ <array>
+ <string>treetop</string>
+ </array>
+ <key>foldingStartMarker</key>
+ <string>(grammar|rule).*$</string>
+ <key>foldingStopMarker</key>
+ <string>^\s*end</string>
+ <key>keyEquivalent</key>
+ <string>^~T</string>
+ <key>name</key>
+ <string>Treetop Grammar</string>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>begin</key>
+ <string>^(grammar) ([A-Z]\w+)</string>
+ <key>beginCaptures</key>
+ <dict>
+ <key>1</key>
+ <dict>
+ <key>name</key>
+ <string>keyword.begin.grammar.treetop</string>
+ </dict>
+ <key>2</key>
+ <dict>
+ <key>name</key>
+ <string>entity.name.grammar.treetop</string>
+ </dict>
+ </dict>
+ <key>end</key>
+ <string>^end$</string>
+ <key>endCaptures</key>
+ <dict>
+ <key>0</key>
+ <dict>
+ <key>name</key>
+ <string>keyword.end.grammar.treetop</string>
+ </dict>
+ </dict>
+ <key>name</key>
+ <string>meta.grammar.treetop</string>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>begin</key>
+ <string>\b(rule)\b (.+)$</string>
+ <key>beginCaptures</key>
+ <dict>
+ <key>1</key>
+ <dict>
+ <key>name</key>
+ <string>keyword.begin.rule.treetop</string>
+ </dict>
+ <key>2</key>
+ <dict>
+ <key>name</key>
+ <string>entity.name.rule.treetop</string>
+ </dict>
+ </dict>
+ <key>end</key>
+ <string>^\s+\bend\b\s*$</string>
+ <key>endCaptures</key>
+ <dict>
+ <key>0</key>
+ <dict>
+ <key>name</key>
+ <string>keyword.end.rule.treetop</string>
+ </dict>
+ </dict>
+ <key>name</key>
+ <string>meta.rule.treetop</string>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>include</key>
+ <string>#strings</string>
+ </dict>
+ <dict>
+ <key>include</key>
+ <string>#character-class</string>
+ </dict>
+ <dict>
+ <key>match</key>
+ <string>\/</string>
+ <key>name</key>
+ <string>keyword.operator.or.treetop</string>
+ </dict>
+ <dict>
+ <key>match</key>
+ <string>&lt;\w+?&gt;</string>
+ <key>name</key>
+ <string>variable.class-instance.treetop</string>
+ </dict>
+ <dict>
+ <key>match</key>
+ <string>\w+?:</string>
+ <key>name</key>
+ <string>support.operand.treetop</string>
+ </dict>
+ <dict>
+ <key>begin</key>
+ <string>\{</string>
+ <key>end</key>
+ <string>\}</string>
+ <key>name</key>
+ <string>meta.embedded-ruby.treetop</string>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>include</key>
+ <string>source.ruby</string>
+ </dict>
+ </array>
+ </dict>
+ </array>
+ </dict>
+ </array>
+ </dict>
+ </array>
+ <key>repository</key>
+ <dict>
+ <key>character-class</key>
+ <dict>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>match</key>
+ <string>\\[wWsSdDhH]|\.</string>
+ <key>name</key>
+ <string>constant.character.character-class.regexp</string>
+ </dict>
+ <dict>
+ <key>match</key>
+ <string>\\.</string>
+ <key>name</key>
+ <string>constant.character.escape.backslash.regexp</string>
+ </dict>
+ <dict>
+ <key>begin</key>
+ <string>(\[)(\^)?</string>
+ <key>beginCaptures</key>
+ <dict>
+ <key>1</key>
+ <dict>
+ <key>name</key>
+ <string>punctuation.definition.character-class.regexp</string>
+ </dict>
+ <key>2</key>
+ <dict>
+ <key>name</key>
+ <string>keyword.operator.negation.regexp</string>
+ </dict>
+ </dict>
+ <key>end</key>
+ <string>(\])</string>
+ <key>endCaptures</key>
+ <dict>
+ <key>1</key>
+ <dict>
+ <key>name</key>
+ <string>punctuation.definition.character-class.regexp</string>
+ </dict>
+ </dict>
+ <key>name</key>
+ <string>constant.other.character-class.set.regexp</string>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>include</key>
+ <string>#character-class</string>
+ </dict>
+ <dict>
+ <key>captures</key>
+ <dict>
+ <key>2</key>
+ <dict>
+ <key>name</key>
+ <string>constant.character.escape.backslash.regexp</string>
+ </dict>
+ <key>4</key>
+ <dict>
+ <key>name</key>
+ <string>constant.character.escape.backslash.regexp</string>
+ </dict>
+ </dict>
+ <key>match</key>
+ <string>(.|(\\.))\-([^\]]|(\\.))</string>
+ <key>name</key>
+ <string>constant.other.character-class.range.regexp</string>
+ </dict>
+ <dict>
+ <key>match</key>
+ <string>&amp;&amp;</string>
+ <key>name</key>
+ <string>keyword.operator.intersection.regexp</string>
+ </dict>
+ </array>
+ </dict>
+ </array>
+ </dict>
+ <key>strings</key>
+ <dict>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>begin</key>
+ <string>'</string>
+ <key>beginCaptures</key>
+ <dict>
+ <key>0</key>
+ <dict>
+ <key>name</key>
+ <string>punctuation.definition.string.begin.treetop</string>
+ </dict>
+ </dict>
+ <key>end</key>
+ <string>'</string>
+ <key>endCaptures</key>
+ <dict>
+ <key>0</key>
+ <dict>
+ <key>name</key>
+ <string>punctuation.definition.string.end.treetop</string>
+ </dict>
+ </dict>
+ <key>name</key>
+ <string>string.quoted.single.treetop</string>
+ <key>patterns</key>
+ <array>
+ <dict>
+ <key>match</key>
+ <string>\\(u\h{4}|.)</string>
+ <key>name</key>
+ <string>constant.character.escape.antlr</string>
+ </dict>
+ </array>
+ </dict>
+ </array>
+ </dict>
+ </dict>
+ <key>scopeName</key>
+ <string>source.treetop</string>
+ <key>uuid</key>
+ <string>A1604A34-0B73-4D5A-9499-87D881DFA8D5</string>
+ </dict>
+ </plist>
@@ -0,0 +1,10 @@
+ <?xml version="1.0" encoding="UTF-8"?>
+ <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+ <plist version="1.0">
+ <dict>
+ <key>name</key>
+ <string>Treetop</string>
+ <key>uuid</key>
+ <string>83A8B700-143D-4BD6-B4EA-D73796E8F883</string>
+ </dict>
+ </plist>
@@ -0,0 +1,28 @@
+ #!/usr/bin/env ruby
+ require 'rubygems'
+ gem 'treetop'
+
+ $LOAD_PATH.unshift(File.expand_path(File.dirname(__FILE__) + "/../lib"))
+ require 'treetop'
+
+ if ARGV.empty?
+   puts "Usage:\n\ntt foo.treetop bar.treetop ...\n or\ntt foo.treetop -o alternate_name.rb\n\n"
+   exit
+ end
+
+ compiler = Treetop::Compiler::GrammarCompiler.new
+
+ while !ARGV.empty?
+   treetop_file = ARGV.shift
+   if !File.exist?(treetop_file)
+     puts "Error: file '#{treetop_file}' doesn't exist\n\n"
+     exit(2)
+   end
+   # After the shift above, '-o' (if given) is the first remaining argument.
+   if ARGV.size >= 2 && ARGV[0] == '-o'
+     ARGV.shift # explicit output file name option
+     compiler.compile(treetop_file, ARGV.shift)
+   else
+     # list of input files option
+     compiler.compile(treetop_file)
+   end
+ end
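The `-o` handling in `bin/tt` can be exercised without Treetop installed by pulling the argument loop into a pure function. The sketch below is hypothetical (`plan_compiles` is not part of Treetop): it looks for `-o` immediately after the input file name, where the usage message places it, and returns the arguments that would be passed to `compiler.compile` instead of compiling anything.

```ruby
# Hypothetical helper, not part of Treetop: mirrors the argument loop in
# bin/tt but returns the planned compile calls instead of running them.
def plan_compiles(argv)
  args = argv.dup # leave the caller's array untouched
  plan = []
  until args.empty?
    treetop_file = args.shift
    if args.size >= 2 && args[0] == '-o'
      args.shift                          # drop the '-o' flag itself
      plan << [treetop_file, args.shift]  # explicit output file name
    else
      plan << [treetop_file]              # default output file name
    end
  end
  plan
end

p plan_compiles(%w[foo.treetop -o alternate_name.rb])
# => [["foo.treetop", "alternate_name.rb"]]
p plan_compiles(%w[foo.treetop bar.treetop])
# => [["foo.treetop"], ["bar.treetop"]]
```

Once the input file has been shifted off, `-o` sits at index 0 of the remaining arguments; a check against index 1 would instead see `alternate_name.rb` and treat `-o` as another input file.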
@@ -0,0 +1,103 @@
+ #Google Group
+ I've created a <a href="http://groups.google.com/group/treetop-dev">Google Group</a> as a better place to organize discussion and development.
+ treetop-dev@googlegroups.com
+
+ #Contributing
+ Visit <a href="http://github.com/nathansobo/treetop/tree/master">the Treetop repository page on GitHub</a> in your browser for more information about checking out the source code.
+
+ I'd like to try Rubinius's policy regarding commit rights: if you submit one patch worth integrating, I'll give you commit rights. We'll see how this goes, but I think it's a good policy.
+
+
+ ##Getting Started with the Code
+ The Treetop compiler is interesting in that it is implemented in itself. Its functionality revolves around `metagrammar.treetop`, which specifies the grammar for Treetop grammars. I took a hybrid approach to defining methods on syntax nodes in the metagrammar. Methods that are more syntactic in nature, like those that provide access to elements of the syntax tree, are often defined inline, directly in the grammar. More semantic methods are defined in custom node classes.
+
+ Iterating on the metagrammar is tricky. The current testing strategy uses the last stable version of Treetop to parse the version under test. Then the version under test is used to parse and functionally test the various pieces of syntax it should recognize and translate to Ruby. As you change `metagrammar.treetop` and its associated node classes, note that the node classes you are changing are also used to support the previous stable version of the metagrammar, so they must be kept backward compatible until a new stable version can be produced to replace it.
+
+ ##Tests
+ Most of the compiler's tests are functional in nature. The grammar under test is used to parse and compile a piece of sample code. Then I attempt to parse input with the compiled output and test its results.
+
+ #What Needs to be Done
+ ##Small Stuff
+ * Improve the `tt` command line tool to allow `.treetop` extensions to be elided in its arguments.
+ * Generate and load temp files with `Treetop.load` rather than evaluating strings to improve stack trace readability.
+ * Allow `do/end` style blocks as well as curly brace blocks. This was originally omitted because I thought it would be confusing. It probably isn't.
+
+ ##Big Stuff
+ ###Transient Expressions
+ Currently, every parsing expression instantiates a syntax node. This includes even very simple parsing expressions, like single characters. It is probably unnecessary for every single expression in the parse to correspond to its own syntax node, so a transient declaration that instructs the parser only to attempt a match, without instantiating nodes, could yield substantial savings.
+
+ ###Generate Rule Implementations in C
+ Parsing expressions are currently compiled into simple Ruby source code that comprises the body of parsing rules, which are translated into Ruby methods. The generator could produce C instead of Ruby in the body of these method implementations.
+
+ ###Global Parsing State and Semantic Backtrack Triggering
+ Some programming language grammars are not entirely context-free, requiring that global state dictate the behavior of the parser in certain circumstances. Treetop does not currently expose explicit parser control to the grammar writer, and instead automatically constructs the syntax tree for them. A means of semantic parser control compatible with this approach would involve callback methods defined on parsing nodes. Each time a node is successfully parsed it will be given an opportunity to set global state and optionally trigger a parse failure on _extrasyntactic_ grounds. Nodes will probably need to define an additional method that undoes their changes to global state when there is a parse failure and they are backtracked.
+
+ Here is a sketch of the potential utility of such mechanisms. Consider the structure of YAML, which uses indentation to indicate block structure.
+
+     level_1:
+       level_2a:
+       level_2b:
+         level_3a:
+       level_2c:
+
+ Imagine a grammar like the following:
+
+     rule yaml_element
+       name ':' block
+       /
+       name ':' value
+     end
+
+     rule block
+       indent yaml_elements outdent
+     end
+
+     rule yaml_elements
+       yaml_element (samedent yaml_element)*
+     end
+
+     rule samedent
+       newline spaces {
+         def after_success(parser_state)
+           spaces.length == parser_state.indent_level
+         end
+       }
+     end
+
+     rule indent
+       newline spaces {
+         def after_success(parser_state)
+           if spaces.length == parser_state.indent_level + 2
+             parser_state.indent_level += 2
+             true
+           else
+             false # fail the parse on extrasyntactic grounds
+           end
+         end
+
+         def undo_success(parser_state)
+           parser_state.indent_level -= 2
+         end
+       }
+     end
+
+     rule outdent
+       newline spaces {
+         def after_success(parser_state)
+           if spaces.length == parser_state.indent_level - 2
+             parser_state.indent_level -= 2
+             true
+           else
+             false # fail the parse on extrasyntactic grounds
+           end
+         end
+
+         def undo_success(parser_state)
+           parser_state.indent_level += 2
+         end
+       }
+     end
+
+ In this case a block will be detected only if a change in indentation warrants it. Note that this change in the state of indentation must be undone if a subsequent failure causes this node not to ultimately be incorporated into a successful result.
+
+ I am by no means sure that the above sketch is free of problems, or even that this overall strategy is sound, but it seems like a promising path.
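The `after_success`/`undo_success` hooks in the YAML sketch above are only a proposal, so the following is a plain-Ruby simulation rather than Treetop code. `ParserState`, `IndentNode` and `attempt` are hypothetical names; the sketch shows an `indent` node committing a change to global state and that change being rolled back when the rest of the parse fails.

```ruby
# Hypothetical simulation of the proposed hooks; Treetop itself does not
# call after_success/undo_success.
ParserState = Struct.new(:indent_level)

# Stand-in for a node produced by the `indent` rule.
class IndentNode
  def initialize(spaces)
    @spaces = spaces
  end

  # Accept only an indent two spaces deeper than the current level,
  # recording the new level in the shared parser state.
  def after_success(state)
    return false unless @spaces == state.indent_level + 2
    state.indent_level += 2
    true
  end

  # Revert the state change when this node is backtracked.
  def undo_success(state)
    state.indent_level -= 2
  end
end

# Runs the node's hook, then the rest of the parse (the block). If the rest
# fails, the node is backtracked and its state change undone.
def attempt(node, state)
  return false unless node.after_success(state)
  return true if yield
  node.undo_success(state)
  false
end

state = ParserState.new(0)
attempt(IndentNode.new(2), state) { false } # later failure: change undone
p state.indent_level # => 0
attempt(IndentNode.new(2), state) { true }  # success: change kept
p state.indent_level # => 2
attempt(IndentNode.new(6), state) { true }  # wrong depth: rejected
p state.indent_level # => 2
```

Pairing every state mutation with its inverse keeps global state consistent with whichever alternative the parser finally commits to, which is the property the proposal depends on.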
@@ -0,0 +1,65 @@
+ #Grammar Composition
+ A unique property of parsing expression grammars is that they are _closed under composition_. This means that when you compose two grammars, they yield another grammar that can be composed yet again. This is a radical departure from parsing frameworks that rely on lexical scanning, which makes composition impossible. Treetop's facilities for composition are built upon those of Ruby.
+
+ ##The Mapping of Treetop Constructs to Ruby Constructs
+ When Treetop compiles a grammar definition, it produces a module and a class. The module contains methods implementing all of the rules defined in the grammar. The generated class is a subclass of `Treetop::Runtime::CompiledParser` and includes the module. For example:
+
+     grammar Foo
+       ...
+     end
+
+ results in a Ruby module named `Foo` and a Ruby class named `FooParser` that `include`s the `Foo` module.
+
+ ##Using Mixin Semantics to Compose Grammars
+ Because grammars are just modules, they can be mixed into one another. This enables grammars to share rules.
+
+     grammar A
+       rule a
+         'a'
+       end
+     end
+
+     grammar B
+       include A
+
+       rule ab
+         a 'b'
+       end
+     end
+
+ Grammar `B` above references rule `a` defined in a separate grammar that it includes. Because module inclusion places modules in the ancestor chain, rules may also be overridden, with the `super` keyword invoking the overridden rule.
+
+     grammar A
+       rule a
+         'a'
+       end
+     end
+
+     grammar B
+       include A
+
+       rule a
+         super / 'b'
+       end
+     end
+
+ Now rule `a` in grammar `B` matches either `'a'` or `'b'`.
+
+ ##Motivation
+ Imagine a grammar for Ruby that took account of SQL queries embedded in strings within the language. That could be achieved by combining two existing grammars.
+
+     grammar RubyPlusSQL
+       include Ruby
+       include SQL
+
+       rule expression
+         ruby_expression
+       end
+
+       rule ruby_string
+         ruby_quote sql_expression ruby_quote / super
+       end
+     end
+
+ ##Work to be Done
+ It has become clear that the include facility in grammars would be more useful if it could prefix the names of all rules from an included grammar to avoid collisions. This is a planned but currently unimplemented feature.
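Because the mapping to Ruby is module-and-class, the override behaviour of grammar composition can be previewed in plain Ruby. In this hypothetical analogue, the modules `A` and `B` stand in for grammars and each method for a rule; the methods only check a prefix of the input and are not what Treetop generates. `BParser` plays the role of the generated `...Parser` class.

```ruby
# Plain-Ruby analogue of grammar composition (not Treetop output):
# modules stand in for grammars, methods for rules.
module A
  # rule a: 'a'
  def a(input)
    input.start_with?('a') ? 'a' : nil
  end
end

module B
  include A

  # rule a: super / 'b'  (try A's rule first, then fall back to 'b')
  def a(input)
    super || (input.start_with?('b') ? 'b' : nil)
  end
end

# Plays the role of the generated parser class that includes the grammar.
class BParser
  include B
end

parser = BParser.new
p parser.a('a')             # => "a"
p parser.a('b')             # => "b"
p parser.a('c')             # => nil
p BParser.ancestors.take(3) # => [BParser, B, A]
```

The `ancestors` output shows why `super` works: Ruby's method lookup walks from `B` to the included `A`, just as an overriding rule reaches the rule it replaces.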
@@ -0,0 +1,90 @@
+ <p class="intro_text">
+
+ Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge <em>parsing expression grammars</em>, it helps you analyze syntax with revolutionary ease.
+
+ </p>
+
+     sudo gem install treetop
+
+ #Intuitive Grammar Specifications
+ Parsing expression grammars (PEGs) are simple to write and easy to maintain. They are a simple but powerful generalization of regular expressions that are easier to work with than the LALR or LR(1) grammars of traditional parser generators. There's no need for a tokenization phase, and _lookahead assertions_ can be used for a limited degree of context-sensitivity. Here's an extremely simple Treetop grammar that matches a subset of arithmetic, respecting operator precedence:
+
+     grammar Arithmetic
+       rule additive
+         multitive '+' additive / multitive
+       end
+
+       rule multitive
+         primary '*' multitive / primary
+       end
+
+       rule primary
+         '(' additive ')' / number
+       end
+
+       rule number
+         [1-9] [0-9]*
+       end
+     end
+
+
+ #Syntax-Oriented Programming
+ Rather than implementing semantic actions that construct parse trees, Treetop lets you define methods on trees that it constructs for you automatically. You can define these methods directly within the grammar...
+
+     grammar Arithmetic
+       rule additive
+         multitive '+' additive {
+           def value
+             multitive.value + additive.value
+           end
+         }
+         /
+         multitive
+       end
+
+       # other rules below ...
+     end
+
+ ...or associate rules with classes of nodes you wish your parsers to instantiate upon matching a rule.
+
+     grammar Arithmetic
+       rule additive
+         multitive '+' additive <AdditiveNode>
+         /
+         multitive
+       end
+
+       # other rules below ...
+     end
+
+
+ #Reusable, Composable Language Descriptions
+ Because PEGs are closed under composition, Treetop grammars can be treated like Ruby modules. You can mix them into one another and override rules with access to the `super` keyword. You can break large grammars down into coherent units or make your language's syntax modular. This is especially useful if you want other programmers to be able to reuse your work.
+
+     grammar RubyWithEmbeddedSQL
+       include SQL
+
+       rule string
+         quote sql_expression quote / super
+       end
+     end
+
+
+ #Acknowledgements
+
+
+ <a href="http://pivotallabs.com"><img id="pivotal_logo" src="./images/pivotal.gif"></a>
+
+ First, thank you to my employer Rob Mee of <a href="http://pivotallabs.com">Pivotal Labs</a> for funding a substantial portion of Treetop's development. He gets it.
+
+
+ I'd also like to thank:
+
+ * Damon McCormick for several hours of pair programming.
+ * Nick Kallen for lots of well-considered feedback and a few afternoons of programming.
+ * Brian Takita for a night of pair programming.
+ * Eliot Miranda for urging me to rewrite it as a compiler right away rather than putting it off.
+ * Ryan Davis and Eric Hodel for hurting my code.
+ * Dav Yaginuma for kicking me into action on my idea.
+ * Bryan Ford for his seminal work on Packrat Parsers.
+ * The editors of Lambda the Ultimate, where I discovered parsing expression grammars.
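To see how the ordered choices in the `Arithmetic` grammar above behave, here is a hand-written recursive-descent evaluator in plain Ruby that follows the same four rules and computes what the `value` methods would. It is a hypothetical sketch, not the code Treetop generates: `ArithmeticSketch` is an invented name, each rule method returns `[value, next_index]` or `nil` on failure, and there is no memoization.

```ruby
# Hypothetical recursive-descent evaluator mirroring the Arithmetic grammar.
# Each rule method returns [value, next_index] on success or nil on failure.
class ArithmeticSketch
  def initialize(input)
    @input = input
  end

  # Parses the whole input or raises ArgumentError.
  def parse
    result = additive(0)
    raise ArgumentError, "parse failed" unless result && result[1] == @input.length
    result[0]
  end

  # additive <- multitive '+' additive / multitive
  def additive(i)
    if (left = multitive(i)) && @input[left[1]] == '+' &&
       (right = additive(left[1] + 1))
      return [left[0] + right[0], right[1]]
    end
    multitive(i) # second alternative, retried from the same position
  end

  # multitive <- primary '*' multitive / primary
  def multitive(i)
    if (left = primary(i)) && @input[left[1]] == '*' &&
       (right = multitive(left[1] + 1))
      return [left[0] * right[0], right[1]]
    end
    primary(i)
  end

  # primary <- '(' additive ')' / number
  def primary(i)
    if @input[i] == '(' && (inner = additive(i + 1)) && @input[inner[1]] == ')'
      return [inner[0], inner[1] + 1]
    end
    number(i)
  end

  # number <- [1-9] [0-9]*
  def number(i)
    m = /\A[1-9][0-9]*/.match(@input[i..-1] || '')
    m && [m[0].to_i, i + m[0].length]
  end
end

p ArithmeticSketch.new('2+3*4').parse   # => 14
p ArithmeticSketch.new('(2+3)*4').parse # => 20
```

Because each rule retries its second alternative from the same position after the first fails, this sketch can re-parse the same text repeatedly; Treetop's generated parsers cache rule results by position (packrat parsing), which this sketch deliberately omits. Both operators associate to the right, exactly as the grammar is written.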
@@ -0,0 +1,51 @@
+ #Pitfalls
+ ##Left Recursion
+ A weakness shared by all recursive descent parsers is the inability to parse left-recursive rules. Consider the following rule:
+
+     rule left_recursive
+       left_recursive 'a' / 'a'
+     end
+
+ Logically it should match a list of 'a' characters. But it never consumes anything, because attempting to recognize `left_recursive` begins by attempting to recognize `left_recursive`, which leads to infinite recursion. There's always a way to eliminate these types of structures from your grammar. There's a mechanical transformation, _left-recursion elimination_, that can remove it, but the result isn't always pretty, especially in combination with automatically constructed syntax trees. So far, I have found more thoughtful ways around the problem. For instance, in the interpreter example I interpret inherently left-recursive function application right-recursively in syntax, then correct the directionality in my semantic interpretation. You may have to be clever.
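The infinite recursion can be reproduced with two hand-written Ruby matchers. `left_recursive` transliterates the rule above, while `a_plus` matches the same language with a loop, the way a Treetop repetition such as `'a'+` would. Both method names are hypothetical and this is not generated parser code.

```ruby
# Hypothetical matchers: each returns the index after the match, or nil.

# left_recursive <- left_recursive 'a' / 'a'
# The first thing the rule does is call itself at the same position, so it
# never reaches the 'a' terminal and Ruby eventually overflows the stack.
def left_recursive(input, i)
  if (j = left_recursive(input, i)) && input[j] == 'a'
    return j + 1
  end
  input[i] == 'a' ? i + 1 : nil
end

# Same language without left recursion: 'a'+ as a greedy loop.
def a_plus(input, i)
  j = i
  j += 1 while input[j] == 'a'
  j > i ? j : nil
end

p a_plus('aaab', 0) # => 3

begin
  left_recursive('aaa', 0)
rescue SystemStackError
  puts 'left_recursive: stack overflow'
end
```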
+
+ #Advanced Techniques
+ Here are a few interesting problems I've encountered. I figure sharing them may give you insight into how these types of issues are addressed with the tools of parsing expressions.
+
+ ##Matching a String
+
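+     rule string
+       '"' ('\"' / !'"' .)* '"'
+     end
+
+ This expression says: match a quote, then zero or more of either an escaped quote or any character other than a quote, followed by a closing quote. The escaped quote has to be tried first: because PEG choice is ordered, putting `!'"' .` first would let it consume the backslash on its own, so the quote after the backslash would end the string early. Lookahead assertions are essential for these types of problems.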
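Ordered choice is what makes the order of alternatives in the string rule matter. The Ruby sketch below uses hypothetical helper names (`ESCAPED_QUOTE`, `NON_QUOTE`, `match_string`) and is not Treetop code; it runs the `'"' (... / ...)* '"'` loop with the two alternatives in either order on the same input.

```ruby
# Hypothetical helpers mirroring the two alternatives of the string rule.
# Each takes the input and a position and returns the new position or nil.
ESCAPED_QUOTE = ->(s, i) { s[i, 2] == '\\"' ? i + 2 : nil }             # '\"'
NON_QUOTE     = ->(s, i) { (i < s.length && s[i] != '"') ? i + 1 : nil } # !'"' .

# Matches '"' (ALT / ALT)* '"' at the start of input, trying the
# alternatives in the given order; returns the matched text or nil.
def match_string(input, alternatives)
  return nil unless input[0] == '"'
  i = 1
  loop do
    nxt = nil
    alternatives.each do |alt| # ordered choice: first success wins
      nxt = alt.call(input, i)
      break if nxt
    end
    break unless nxt           # zero-or-more stops when nothing matches
    i = nxt
  end
  input[i] == '"' ? input[0..i] : nil # closing quote
end

text = '"say \"hi\"" rest'
puts match_string(text, [ESCAPED_QUOTE, NON_QUOTE]) # prints "say \"hi\""
puts match_string(text, [NON_QUOTE, ESCAPED_QUOTE]) # prints "say \"
```

With the non-quote alternative first, the backslash is consumed as an ordinary character and the escaped quote that follows is taken as the closing delimiter.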
+
+ ##Matching Nested Structures With Non-Unique Delimiters
+ Say I want to parse a diabolical wiki syntax in which the following interpretations apply.
+
+     ** *hello* ** --> <strong><em>hello</em></strong>
+     * **hello** * --> <em><strong>hello</strong></em>
+
+     rule strong
+       '**' (em / '\*' / !'*' .)+ '**'
+     end
+
+     rule em
+       '*' (strong / '\*' / !'*' .)+ '*'
+     end
+
+ Emphasized text is allowed within strong text by virtue of `em` being the first alternative. Because `em` succeeds only when it finds a matching closing `*`, a single `*` inside strong text is permitted when it opens a complete `em`; other than that, no `*` characters are allowed unless they are escaped.
+
+ ##Matching a Keyword But Not Words Prefixed Therewith
+ Say I want to consider a given string a keyword only when it occurs in isolation. Let's use the `end` keyword as an example. We don't want the prefix of `'enders_game'` to be considered a keyword. A naive implementation might be the following.
+
+     rule end_keyword
+       'end' &space
+     end
+
+ This says that `'end'` must be followed by a space, but this space is not consumed as part of the matching of `end_keyword`. This works in most cases, but is actually incorrect. What if `end` occurs at the end of the buffer? In that case, it occurs in isolation but will not match the above expression. What we really mean is that `'end'` cannot be followed by a _non-space_ character.
+
+     rule end_keyword
+       'end' !(!' ' .)
+     end
+
+ In general, when the syntax gets tough, it helps to focus on what you really mean. A keyword is a string that is not followed by a non-space character.
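Ruby regular expressions offer the same two lookahead forms, which makes it easy to compare the naive and corrected `end_keyword` rules. This is an analogy rather than Treetop code: it assumes the `space` rule matches a single `' '` character, and it relies on Treetop's `.` matching any character, which the `[^ ]` class mirrors here.

```ruby
# Regex analogues of the two end_keyword rules (not Treetop code).
# (?= ) is a positive lookahead, (?![^ ]) a negative one.
NAIVE   = /\Aend(?= )/    # 'end' &space, assuming space matches one ' '
CORRECT = /\Aend(?![^ ])/ # 'end' !(!' ' .): no following non-space char

['end', 'end ', 'enders_game'].each do |s|
  puts "#{s.inspect}: naive=#{NAIVE.match?(s)} correct=#{CORRECT.match?(s)}"
end
# "end": naive=false correct=true
# "end ": naive=true correct=true
# "enders_game": naive=false correct=false
```

Only the first input separates the two versions: at the end of the buffer there is no space for `&space` to see, but there is also no non-space character for the negative lookahead to reject.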