babel_bridge 0.5.1 → 0.5.3

Files changed (40)
  1. data/CHANGE_LOG +165 -0
  2. data/Gemfile +4 -0
  3. data/Guardfile +7 -0
  4. data/LICENCE +24 -0
  5. data/README.md +244 -0
  6. data/Rakefile +8 -2
  7. data/TODO +100 -0
  8. data/babel_bridge.gemspec +11 -3
  9. data/examples/json/json_parser.rb +23 -0
  10. data/examples/json/json_parser2.rb +37 -0
  11. data/lib/babel_bridge.rb +3 -2
  12. data/lib/{nodes.rb → babel_bridge/nodes.rb} +0 -0
  13. data/lib/{nodes → babel_bridge/nodes}/empty_node.rb +0 -0
  14. data/lib/{nodes → babel_bridge/nodes}/node.rb +1 -1
  15. data/lib/{nodes → babel_bridge/nodes}/non_terminal_node.rb +0 -8
  16. data/lib/{nodes → babel_bridge/nodes}/root_node.rb +0 -0
  17. data/lib/{nodes → babel_bridge/nodes}/rule_node.rb +0 -0
  18. data/lib/{nodes → babel_bridge/nodes}/terminal_node.rb +0 -0
  19. data/lib/{parser.rb → babel_bridge/parser.rb} +7 -14
  20. data/lib/{pattern_element.rb → babel_bridge/pattern_element.rb} +27 -25
  21. data/lib/babel_bridge/pattern_element_hash.rb +22 -0
  22. data/lib/{rule.rb → babel_bridge/rule.rb} +0 -0
  23. data/lib/{rule_variant.rb → babel_bridge/rule_variant.rb} +0 -4
  24. data/lib/{shell.rb → babel_bridge/shell.rb} +0 -0
  25. data/lib/{string.rb → babel_bridge/string.rb} +0 -0
  26. data/lib/{tools.rb → babel_bridge/tools.rb} +0 -0
  27. data/lib/babel_bridge/version.rb +3 -0
  28. data/spec/advanced_parsers_spec.rb +1 -0
  29. data/spec/basic_parsing_spec.rb +43 -0
  30. data/spec/bb_spec.rb +19 -0
  31. data/spec/compound_patterns_spec.rb +61 -0
  32. data/spec/node_spec.rb +3 -3
  33. data/spec/pattern_generators_spec.rb +4 -4
  34. data/spec/spec_helper.rb +3 -0
  35. metadata +115 -33
  36. data/README +0 -144
  37. data/examples/turing/examples.turing +0 -33
  38. data/examples/turing/notes.rb +0 -111
  39. data/examples/turing/turing_demo.rb +0 -71
  40. data/lib/version.rb +0 -4
data/CHANGE_LOG ADDED
@@ -0,0 +1,165 @@
1
+ 2013-2-12 v0.5.3
2
+
3
+ fixed bug with 0-length matches' to_s returning non-zero-length strings
4
+
5
+ 2012-1-25 v0.5.1
6
+
7
+ added parser.relative_source_file
8
+
9
+ 2012-1-12 v0.5.0
10
+
11
+ added Parser.new :source_file => String
12
+ Sets parser.source_file value
13
+
14
+ Changed uniform_tabs to NOT include at least one space. If you want to ensure at least one space, you should add a space after your tab.
15
+
16
+ Fixed out-of-date tests in tools_spec.
17
+
18
+ 2012-1-6 v0.5.0
19
+
20
+ Nodes now have #line and #column methods which return the line and column of the source for the start of that Node's match.
21
+
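As a rough illustration of what #line and #column report, the sketch below derives a 1-based line and column from a 0-based source offset. `line_and_column` is a hypothetical helper written for this note, not BabelBridge's implementation; the 1-based convention is assumed from the parser failure output in the README ("line 1 column 4 offset 3").

```ruby
# Hypothetical helper (not part of BabelBridge): derive a 1-based line and
# column from a 0-based character offset into the source string.
def line_and_column(source, offset)
  preceding = source[0, offset]          # text before the start of the match
  line = preceding.count("\n") + 1       # one extra line per newline seen so far
  last_newline = preceding.rindex("\n")  # nil while still on the first line
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

line_and_column("foo is not", 3)  # => [1, 4]
line_and_column("a\nbcd", 4)      # => [2, 3]
```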
22
+ 2012-1-5 v0.5.0
23
+
24
+ Completely reworked ignore_whitespace - again.
25
+
26
+ Now there is a global "delimiter" pattern which is matched between every sub-pattern of every rule AND at the beginning and end of the entire parse.
27
+
28
+ ignore_whitespace sets this delimiter to: /\s*/
29
+
30
+ You can set your own delimiter with the delimiter method:
31
+
32
+ class MyParser < BabelBridge::Parser
33
+ delimiter :hi, "there", "/[mM]ust/", "be between every sub-pattern!" # delimiter can take any pattern "rule" can
34
+ rule :hi, "hi"
35
+ end
36
+
37
+ You can override the delimiter pattern for a single rule to put in special code:
38
+
39
+ class MyParser < BabelBridge::Parser
40
+ ignore_whitespace
41
+
42
+ rule :root, many(:statement, ';')
43
+ rule :statement, many(:word, / +/), :delimiter => // # disable the global delimiter
44
+ end
45
+
46
+ INCOMPATIBLE CHANGE: node.matches is no longer positional
47
+
48
+ node.matches now includes only things that were matched. This means conditional matches which do not match no longer add an EmptyNode to node.matches.
49
+
50
+ node.matches now contains all delimiter matches.
51
+
52
+ INCOMPATIBLE CHANGE: no more ManyNode
53
+
54
+ The many(rule) parser pattern no longer generates a special kind of parse-tree node. Instead it adds all its matches to the parent rule's .matches list. It also adds all the many-delimiters.
55
+
56
+ NOTE: 'delimiter' refers to the global delimiter pattern or the rule-local override. 'many-delimiter' refers to the optional, explicit delimiter specified for the many-pattern.
57
+
58
+ NOTE: many(:rule,:many_delimiter) will effectively match: [rule]([delimiter][many_delimiter][delimiter][rule])*
59
+
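The effective match described in the NOTE above can be approximated with a plain Ruby Regexp. This is only a sketch under stated assumptions: `rule`, `delimiter`, `many_delimiter` and `matches_many?` are illustrative names, the `:word` rule is assumed to be `/[a-z]+/`, and the delimiter that BabelBridge also matches at the very beginning and end of the parse is left out.

```ruby
# Plain-Regexp approximation (not BabelBridge code) of
#   many(:word, ",")  with ignore_whitespace enabled.
rule           = /[a-z]+/  # stands in for the :word rule (assumption)
delimiter      = /\s*/     # what ignore_whitespace sets the global delimiter to
many_delimiter = /,/       # the explicit delimiter passed to many

# [rule]([delimiter][many_delimiter][delimiter][rule])*
MANY_WITH_DELIMITER =
  /\A#{rule}(?:#{delimiter}#{many_delimiter}#{delimiter}#{rule})*\z/

# Hypothetical helper: true when the whole text fits the pattern above.
def matches_many?(text)
  !(MANY_WITH_DELIMITER =~ text).nil?
end

matches_many?("a , b,c")  # => true
matches_many?("a b")      # => false (no many-delimiter between the words)
```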
60
+ 2012-12-31 v0.4.2
61
+
62
+ Bugfix: parser_failure_info now works when nothing is matched
63
+
64
+ 2012-12-17 v0.4.1
65
+
66
+ rewind_whitespace usage example:
67
+
68
+ rule :end_statement, rewind_whitespace, /([\t ]*[\n;])+/
69
+
70
+ In this example, end_statement is similar to the end-of-statement pattern for the ruby language. Each statement either ends with a new line or a semicolon. "rewind_whitespace" indicates the parser should back up to the end of the last match and then continue matching.
71
+
72
+ 2012-11-20 v0.4.0
73
+
74
+ INCOMPATIBLE CHANGE: Removed the post-match pattern option from the "many" pattern matcher. It simplifies things and can easily be reproduced with a custom rule.
75
+
76
+ Did significant code cleanup. NonTerminalNode was renamed RuleNode and a new NonTerminalNode class was created as a parent for RuleNode and ManyNode.
77
+
78
+ ignore_whitespace is now just a regexp. An empty regexp is used if ignore_whitespace is not specified. It is now handled consistently throughout. Every node has postwhitespace_range and prewhitespace_range methods that allow you to find the whitespace after and before that node.
79
+
80
+ node.to_s and node.text now both just return the matched text WITHOUT the preceding and trailing whitespace. Note, however, that it will still include any whitespace in between as it is just a single slice out of the source.
81
+
82
+ 2012-11-13
83
+
84
+ ignore_whitespace now optionally takes a regexp for what to ignore after every TerminalNode. Default: /\s*/
85
+
86
+ rewind_whitespace matching pattern added. This allows you to match the string ignored by "ignore_whitespace" after the previous token.
87
+
88
+ Example: Implements the Ruby ";" / new-line parsing rule.
89
+
90
+ class MyParser < BabelBridge::Parser
91
+ ignore_whitespace
92
+
93
+ rule :pair, :statement, :end_statement, :statement
94
+ rule :end_statement, rewind_whitespace(/([\t ]*[\n;])+/)
95
+ rule :statement, "0"
96
+ end
97
+
98
+ # matches two 0s separated by one or more ";" or "\n" and any whitespace
99
+
100
+
101
+ 2012-09-28
102
+
103
+ Added to_sym on nodes.
104
+
105
+ 2012-09-19 version 0.3.1
106
+
107
+ Added refinements to the parser-failure output.
108
+
109
+ 2012-09-13
110
+
111
+ Reversed the precedence order for binary_operators_rule. The first element has the highest precedence, i.e., it is computed first.
112
+
113
+ Now, the correct precedence order for the basic operators is:
114
+
115
+ [["*", "/"], ["+", "-"]]
116
+
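To make the precedence order above concrete, here is a small plain-Ruby sketch that folds a flat list of operands and operators into a nested tree, with the first group binding tightest. `resolve_precedence` is a hypothetical helper, not the code behind binary_operators_rule; it treats every operator as left-associative and ignores :right_operators.

```ruby
# Hypothetical sketch (not BabelBridge's implementation).
# operands:   [a, b, c, ...]
# operators:  the operators between them, so operators.length == operands.length - 1
# precedence: groups of operators, highest precedence first
def resolve_precedence(operands, operators, precedence)
  operands = operands.dup
  operators = operators.dup
  precedence.each do |group|
    i = 0
    while i < operators.length
      if group.include?(operators[i])
        # combine the operands on either side of this operator into one node
        node = [operators[i], operands[i], operands[i + 1]]
        operands[i, 2] = [node]
        operators.delete_at(i)
      else
        i += 1
      end
    end
  end
  operands.first
end

resolve_precedence([1, 2, 3], ["+", "*"], [["*", "/"], ["+", "-"]])
# => ["+", 1, ["*", 2, 3]]
```

Operators within one group are combined left to right, so `8 - 4 - 2` becomes `["-", ["-", 8, 4], 2]`.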
117
+ 2012-09-12
118
+
119
+ using readline for shell
120
+
121
+ added support for infix binary operator precedence resolution:
122
+
123
+ USAGE:
124
+
125
+ binary_operators_rule :any_rule_name, :operands_pattern, operators, [:right_operators => [...]]
126
+
127
+ Where "operators" is an array of operators, ordered by precedence such as: ["+", "-", "*", "/"].
128
+
129
+ The last operators in the array are matched first.
130
+
131
+ You can also group operators into the same precedence level: [["+", "-"], ["*", "/"]]
132
+
133
+ Operators in the same precedence level are matched left-to-right.
134
+
135
+ You optionally can list one or more "right_operators" - which can be strings or regexps - to specify which operators are right-associative.
136
+
137
+ MATCHING:
138
+
139
+ binary_operators_rule :any_rule_name, :operands_pattern, ["+", "-", "*", "/"]
140
+
141
+ matches the same string as:
142
+
143
+ rule :any_rule_name, many(:operands_pattern,/[-+*\/]/)
144
+
145
+ PARSE TREE:
146
+
147
+ The resulting parse-tree consists of 1 or more instances of the :any_rule_name rule's variant class. Each node has methods for easy access to:
148
+
149
+ left -> the left operand node
150
+ right -> the right operand node
151
+ operator -> the operator as a symbol
152
+ operator_node -> the operator node
153
+
154
+ ignore_whitespace feature added
155
+
156
+ Called in the parser's class. Sets a flag that causes all future parsing to ignore white spaces. Specifically, this means that after each terminal-node match, all trailing-whitespace is consumed before the next terminal match is attempted.
157
+
158
+ This means that terminal nodes can still match any white-spaces they require.
159
+
160
+ The exact matched string, including trailing whitespace, is still available via the "text" method. The "to_s" method, though, now returns the stripped token value (if ignore_whitespace is enabled).
161
+
162
+ 2012-09-09
163
+
164
+ forward_to now scans all pattern elements for the first one that responds to the method
165
+ added shell
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in babel_bridge.gemspec
4
+ gemspec
data/Guardfile ADDED
@@ -0,0 +1,7 @@
1
+ guard 'rspec', :cli => "--color" do
2
+ watch(%r{^spec/.+_spec\.rb$})
3
+ watch(%r{^lib/(.+)\.rb$}) { "spec" }
4
+ watch('spec/spec_helper.rb') { "spec" }
5
+
6
+ end
7
+
data/LICENCE ADDED
@@ -0,0 +1,24 @@
1
+ Copyright (c) 2010, Shane Brinkman-Davis
2
+ All rights reserved.
3
+
4
+ Redistribution and use in source and binary forms, with or without
5
+ modification, are permitted provided that the following conditions are met:
6
+ * Redistributions of source code must retain the above copyright
7
+ notice, this list of conditions and the following disclaimer.
8
+ * Redistributions in binary form must reproduce the above copyright
9
+ notice, this list of conditions and the following disclaimer in the
10
+ documentation and/or other materials provided with the distribution.
11
+ * Neither the name of the <organization> nor the
12
+ names of its contributors may be used to endorse or promote products
13
+ derived from this software without specific prior written permission.
14
+
15
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
16
+ ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
17
+ WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
18
+ DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
19
+ DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
20
+ (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
21
+ LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
22
+ ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
23
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
24
+ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.md ADDED
@@ -0,0 +1,244 @@
1
+ Summary
2
+ -------
3
+
4
+ Babel Bridge lets you generate parsers 100% in Ruby code. It is a memoizing Parsing Expression Grammar (PEG) generator like Treetop, but it doesn't require special file-types or new syntax. Overall focus is on simplicity and usability over performance.
5
+
6
+ Goals
7
+ -----
8
+
9
+ * Allow expression 100% in ruby
10
+ * Productivity through Simplicity and Understandability first
11
+ * Performance second
12
+
13
+
14
+ Example
15
+ -------
16
+
17
+ ``` ruby
18
+ require "babel_bridge"
19
+
20
+ class MyParser < BabelBridge::Parser
21
+
22
+ # foo rule: match "foo" optionally followed by the :bar rule
23
+ rule :foo, "foo", :bar?
24
+
25
+ # bar rule: match "bar"
26
+ rule :bar, "bar"
27
+ end
28
+
29
+ # create one or more instances of your parser
30
+ parser = MyParser.new
31
+
32
+ parser.parse "foo" # matches "foo"
33
+ # => FooNode1 > "foo"
34
+
35
+ parser.parse "foobar" # matches "foobar"
36
+ # => FooNode1
37
+ # "foo"
38
+ # BarNode1 > "bar"
39
+
40
+ parser.parse "fribar" # fails to match
41
+ # => nil
42
+
43
+ parser.parse "foobarbar" # fails to match entire input
44
+ # => nil
45
+ ```
46
+
47
+ More elaborate examples:
48
+ * [Parsing JSON the Not-So-Hard Way](http://www.essenceandartifact.com/2013/01/parsing-json-not-so-hard-way.html)
49
+ * [How to Create a Turing Complete Programming Language in 40 Minutes](http://www.essenceandartifact.com/2012/09/how-to-create-turing-complete.html)
50
+
51
+ Features
52
+ --------
53
+
54
+ ``` ruby
55
+
56
+ # returns the BabelBridge::Rule instance for that rule
57
+ rule = MyParser[:foo]
58
+ # => rule :foo, "foo", :bar?
59
+
60
+ # nice human-readable view of the rule with extra info:
61
+ rule.to_s
62
+ # rule :foo, node_class: MyParser::FooNode
63
+ # variant_class: MyParser::FooNode1, pattern: "foo", :bar?
64
+
65
+ # returns the code necessary for generating the rule and all its variants
66
+ # (minus any class_eval code)
67
+ rule.inspect
68
+ # => rule :foo, "foo", :bar?
69
+
70
+ # returns the Node class for a rule
71
+ MyParser.node_class(:foo)
72
+ # => MyParser::FooNode
73
+
74
+ MyParser.node_class(:foo) do
75
+ # class_eval inside the rule's Node-class
76
+ end
77
+
78
+ # parses Text starting with the MyParser.root_rule
79
+ # The root_rule is defined automatically by the first rule defined, but can be set by:
80
+ # MyParser.root_rule=v
81
+ # where v is the symbol name of the rule or the actual rule object from MyParser[rule]
82
+ text = "foobar"
83
+ parser.parse(text)
84
+
85
+ # do a one-time parse with :bar set as the root-rule
86
+ text = "bar"
87
+ parser.parse(text, :rule => :bar)
88
+
89
+ # relax requirement to match entire input
90
+ parser.parse "foobar and then something", :partial_match => true
91
+
92
+ # parse failure
93
+ parser.parse "foo is not immediately followed by bar"
94
+
95
+ # human readable parser failure info
96
+ puts parser.parser_failure_info
97
+ ```
98
+
99
+ Parser failure info output:
100
+ ```
101
+ Parsing error at line 1 column 4 offset 3
102
+
103
+ Source:
104
+ ...
105
+ foo<HERE> is not immediately followed by bar
106
+ ...
107
+
108
+ Parser did not match entire input.
109
+
110
+ Parse path at failure:
111
+ FooNode1
112
+
113
+ Expecting:
114
+ "bar" BarNode1
115
+ ```
116
+ NOTE: This is an evolving feature. This output is as of 0.5.1 and may not match the current version.
117
+
118
+ Defining Rules
119
+ --------------
120
+
121
+ Inside the parser class, a rule is defined as follows:
122
+
123
+ ``` ruby
124
+ class MyParser < BabelBridge::Parser
125
+ rule :rule_name, pattern
126
+ end
127
+ ```
128
+
129
+ Where:
130
+
131
+ * :rule_name is a symbol
132
+ * pattern see Patterns below
133
+
134
+ You can also add new rules outside the class definition by:
135
+
136
+ ``` ruby
137
+ MyParser.rule :rule_name, pattern
138
+ ```
139
+
140
+ Patterns
141
+ --------
142
+
143
+ Patterns are a list of pattern elements, matched in order:
144
+
145
+ Example:
146
+
147
+ ``` ruby
148
+ rule :my_rule, "match", "this", "in", "order" # matches "matchthisinorder"
149
+ ```
150
+
151
+ Pattern Elements
152
+ ----------------
153
+
154
+ Pattern elements are either a basic-pattern-element or an extended-pattern-element (expressed as a hash). Internally, they are "compiled" into instances of PatternElement with optimized lambda functions for parsing.
155
+
156
+ ## Basic Pattern Elements (basic_element)
157
+
158
+ ``` ruby
159
+ :my_rule # matches the Rule named :my_rule
160
+ :my_rule? # optional: optionally matches Rule :my_rule
161
+ :my_rule! # negative: success only if it DOESN'T match Rule :my_rule
162
+ "string" # matches the string exactly
163
+ /regex/ # matches the regex exactly
164
+ ```
165
+
166
+ ## Advanced Pattern Elements
167
+
168
+ ``` ruby
169
+
170
+ # success if basic_element could be matched, but the input is not consumed
171
+ could.match(pattern_element)
172
+
173
+ # negative (two equivalent methods)
174
+ dont.match(pattern_element)
175
+ match!(pattern_element)
176
+
177
+ # optional (two equivalent methods)
178
+ optionally.match(pattern_element)
179
+ match?(pattern_element)
180
+
181
+ # match 1 or more
182
+ many(pattern_element)
183
+
184
+ # match 1 or more of one basic_element delimited by another basic_element
185
+ many(pattern_element, delimiter_pattern_element)
186
+
187
+ # match 0 or more
188
+ many?(pattern_element)
189
+
190
+ # An array of patterns tells BB to match those patterns in order ("and" matching)
191
+ [pattern_element_a, pattern_element_b, pattern_element_c, ...]
192
+
193
+ # match any one of the listed patterns ("or" matching)
194
+ any(pattern_element_a, pattern_element_b, pattern_element_c, ...)
195
+
196
+ # optionally match any of the patterns
197
+ any?(pattern_element_a, pattern_element_b, pattern_element_c, ...)
198
+
199
+ # don't match any of the patterns
200
+ any!(pattern_element_a, pattern_element_b, pattern_element_c, ...)
201
+
202
+ ```
203
+
204
+ ## Custom Pattern Element Parser
205
+
206
+ Custom pattern elements are not generally needed, but for certain patterns, particularly context-sensitive ones, we provide a way to do it.
207
+
208
+ ``` ruby
209
+ class MyParser < BabelBridge::Parser
210
+
211
+ # custom parser to match an all upper-case word followed by any number of characters before that word is repeated
212
+ rule :foo, (custom_parser do |parent_node|
213
+ offset = parent_node.next
214
+ src = parent_node.src
215
+
216
+ # Note, the \A anchors the search at the beginning of the string
217
+ if src[offset..-1].index(/\A[A-Z]+/) == 0
218
+ endpattern=$~.to_s
219
+ if i = src.index(endpattern, offset + endpattern.length)
220
+ range = offset..(i + endpattern.length)
221
+ BabelBridge::TerminalNode.new(parent_node, range, "endpattern")
222
+ end
223
+ end
224
+ end)
225
+ end
226
+
227
+ parser = MyParser.new
228
+ parser.parse "END this is in the middle END"
229
+ # => FooNode1 > "END this is in the middle END"
230
+
231
+ parser.parse "DRUID this is in the middle DRUID"
232
+ # => FooNode1 > "DRUID this is in the middle DRUID"
233
+
234
+ parser.parse "DRUID this is in the middle DRUI"
235
+ # => nil
236
+ ```
237
+
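The matching logic inside the custom parser above can be tried out without BabelBridge. The sketch below is a plain-Ruby stand-in: `match_repeated_word` is a hypothetical helper that returns the matched range (end-exclusive) instead of building a TerminalNode, and it assumes `offset` is within the source string.

```ruby
# Hypothetical stand-in for the custom parser block: match an all upper-case
# word followed by any characters up to and including the next occurrence of
# that same word. Returns an end-exclusive Range, or nil when nothing matches.
def match_repeated_word(src, offset)
  # \A anchors the search at the beginning of the remaining input
  opening = src[offset..-1][/\A[A-Z]+/]
  return nil unless opening

  # look for the same word again, starting just past the opening word
  closing_at = src.index(opening, offset + opening.length)
  return nil unless closing_at

  offset...(closing_at + opening.length)
end

src = "END this is in the middle END"
src[match_repeated_word(src, 0)]  # => "END this is in the middle END"

match_repeated_word("DRUID this is in the middle DRUI", 0)  # => nil
```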
238
+ Structure
239
+ ---------
240
+
241
+ * Each Rule defines a subclass of Node
242
+ * Each RuleVariant defines a subclass of the parent Rule's node-class
243
+
244
+ Therefore you can easily define code to be shared across all variants as well as define code specific to one variant.
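
The class layout described under Structure can be pictured with plain Ruby classes. Everything below is a stand-in written for illustration: BabelBridge generates the real node classes from rule definitions, and `Node`, `FooNode`, `FooNode1`, `FooNode2`, `describe` and `variant_name` here are hand-written placeholders, not the library's classes or methods.

```ruby
# Stand-in classes only (assumption: mirrors the Rule / RuleVariant layout).
class Node; end

# One node class per rule: behaviour shared by all variants lives here.
class FooNode < Node
  def describe
    "foo rule, #{variant_name}"
  end
end

# One subclass per variant: variant-specific behaviour lives here.
class FooNode1 < FooNode
  def variant_name
    "first variant"
  end
end

class FooNode2 < FooNode
  def variant_name
    "second variant"
  end
end

FooNode1.new.describe  # => "foo rule, first variant"
FooNode2.new.describe  # => "foo rule, second variant"
```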