babel_bridge 0.5.1 → 0.5.3
- data/CHANGE_LOG +165 -0
- data/Gemfile +4 -0
- data/Guardfile +7 -0
- data/LICENCE +24 -0
- data/README.md +244 -0
- data/Rakefile +8 -2
- data/TODO +100 -0
- data/babel_bridge.gemspec +11 -3
- data/examples/json/json_parser.rb +23 -0
- data/examples/json/json_parser2.rb +37 -0
- data/lib/babel_bridge.rb +3 -2
- data/lib/{nodes.rb → babel_bridge/nodes.rb} +0 -0
- data/lib/{nodes → babel_bridge/nodes}/empty_node.rb +0 -0
- data/lib/{nodes → babel_bridge/nodes}/node.rb +1 -1
- data/lib/{nodes → babel_bridge/nodes}/non_terminal_node.rb +0 -8
- data/lib/{nodes → babel_bridge/nodes}/root_node.rb +0 -0
- data/lib/{nodes → babel_bridge/nodes}/rule_node.rb +0 -0
- data/lib/{nodes → babel_bridge/nodes}/terminal_node.rb +0 -0
- data/lib/{parser.rb → babel_bridge/parser.rb} +7 -14
- data/lib/{pattern_element.rb → babel_bridge/pattern_element.rb} +27 -25
- data/lib/babel_bridge/pattern_element_hash.rb +22 -0
- data/lib/{rule.rb → babel_bridge/rule.rb} +0 -0
- data/lib/{rule_variant.rb → babel_bridge/rule_variant.rb} +0 -4
- data/lib/{shell.rb → babel_bridge/shell.rb} +0 -0
- data/lib/{string.rb → babel_bridge/string.rb} +0 -0
- data/lib/{tools.rb → babel_bridge/tools.rb} +0 -0
- data/lib/babel_bridge/version.rb +3 -0
- data/spec/advanced_parsers_spec.rb +1 -0
- data/spec/basic_parsing_spec.rb +43 -0
- data/spec/bb_spec.rb +19 -0
- data/spec/compound_patterns_spec.rb +61 -0
- data/spec/node_spec.rb +3 -3
- data/spec/pattern_generators_spec.rb +4 -4
- data/spec/spec_helper.rb +3 -0
- metadata +115 -33
- data/README +0 -144
- data/examples/turing/examples.turing +0 -33
- data/examples/turing/notes.rb +0 -111
- data/examples/turing/turing_demo.rb +0 -71
- data/lib/version.rb +0 -4
data/CHANGE_LOG
ADDED
@@ -0,0 +1,165 @@
2013-2-12 v0.5.3

fixed bug with 0-length matches' to_s returning non-zero-length strings

2012-1-25 v0.5.1

added parser.relative_source_file

2012-1-12 v0.5.0

added Parser.new :source_file => String
Sets parser.source_file value

Changed uniform_tabs to NOT include at least one space. If you want to ensure at least one space, you should add a space after your tab.

Fixed out-of-date tests in tools_spec.

2012-1-6 v0.5.0

Nodes now have #line and #column methods which return the line and column of the source for the start of that Node's match.

2012-1-5 v0.5.0

Completely reworked ignore_whitespace - again.

Now there is a global "delimiter" pattern which is matched between every sub-pattern of every rule AND at the beginning and end of the entire parse.

ignore_whitespace sets this delimiter to: /\s*/

You can set your own delimiter with the delimiter method:

    class MyParser < BabelBridge::Parser
      delimiter :hi, "there", "/[mM]ust/", "be between every sub-pattern!" # delimiter can take any pattern "rule" can
      rule :hi, "hi"
    end

You can override the delimiter pattern for a single rule to put in special code:

    class MyParser < BabelBridge::Parser
      ignore_whitespace

      rule :root, many(:statement, ';')
      rule :statement, many(:word, / +/), :delimiter => // # disable the global delimiter
    end

INCOMPATIBLE CHANGE: node.matches is no longer positional

node.matches now includes only things that were matched. This means conditional matches which do not match no longer add an EmptyNode to node.matches.

node.matches now contains all delimiter matches.

INCOMPATIBLE CHANGE: no more ManyNode

The many(rule) parser pattern no longer generates a special kind of parse-tree node. Instead it adds all its matches to the parent rule's .matches list. It also adds all the many-delimiters.

NOTE: 'delimiter' refers to the global delimiter pattern or the rule-local override. 'many-delimiter' refers to the optional, explicit delimiter specified for the many-pattern.

NOTE: many(:rule,:many_delimiter) will effectively match: [rule]([delimiter][many_delimiter][delimiter][rule])*
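
The expansion in the note above can be tried out with plain regular expressions. This is only a sketch of the matching shape, not BabelBridge code: `many_regexp` is a hypothetical helper, and the default delimiter of `/\s*/` is assumed from the ignore_whitespace entry.

```ruby
# Hypothetical helper (not part of BabelBridge): build one regexp with the
# shape [rule]([delimiter][many_delimiter][delimiter][rule])*
def many_regexp(rule, many_delimiter, delimiter = /\s*/)
  /#{rule}(?:#{delimiter}#{many_delimiter}#{delimiter}#{rule})*/
end

words = many_regexp(/\w+/, /;/)

# Whitespace around each ";" is absorbed by the global delimiter.
words.match("a ; b;c")[0]            # => "a ; b;c"

# Two many-delimiters in a row leave no rule match between them.
/\A#{words}\z/.match?("a;;b")        # => false
```

A real parser records each delimiter as its own match, which a single regexp does not do; the sketch only shows which strings are accepted.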

2012-12-31 v0.4.2

Bugfix: parser_failure_info now works when nothing is matched

2012-12-17 v0.4.1

rewind_whitespace usage example:

    rule :end_statement, rewind_whitespace, /([\t ]*[\n;])+/

In this example, end_statement is similar to the end-of-statement pattern for the Ruby language. Each statement ends with either a new line or a semicolon. "rewind_whitespace" indicates the parser should back up to the end of the last match and then continue matching.

2012-11-20 v0.4.0

INCOMPATIBLE CHANGE: Removed the post-match pattern option from the "many" pattern matcher. It simplifies things and can easily be reproduced with a custom rule.

Did significant code cleanup. NonTerminalNode was renamed RuleNode and a new NonTerminalNode class was created as a parent for RuleNode and ManyNode.

ignore_whitespace is now just a regexp. An empty regexp is used if ignore_whitespace is not specified. It is now handled consistently throughout. Every node has postwhitespace_range and prewhitespace_range methods that allow you to find the whitespace after and before that node.

node.to_s and node.text now both just return the matched text WITHOUT the preceding and trailing whitespace. Note, however, that it will still include any whitespace in between, as it is just a single slice out of the source.

2012-11-13

ignore_whitespace now optionally takes a regexp for what to ignore after every TerminalNode. Default: /\s*/

rewind_whitespace matching pattern added. This allows you to match the string ignored by "ignore_whitespace" after the previous token.

Example: Implements the Ruby ";" / new-line parsing rule.

    class MyParser < BabelBridge::Parser
      ignore_whitespace

      rule :pair, :statement, :end_statement, :statement
      rule :end_statement, rewind_whitespace(/([\t ]*[\n;])+/)
      rule :statement, "0"
    end

    # matches two 0s separated by one or more ";" or "\n" and any whitespace


2012-09-28

Added to_sym on nodes.

2012-09-19 version 0.3.1

Added refinements to the parser-failure output.

2012-09-13

Reversed the precedence order for binary_operators_rule. The first element has the highest precedence, i.e., it is computed first.

Now, the correct precedence order for the basic operators is:

    [["*", "/"], ["+", "-"]]
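
To make the ordering concrete, here is a minimal plain-Ruby sketch of how a precedence list in this order can turn a flat operand/operator sequence into a tree. It is not BabelBridge's implementation: `BinaryNode` and `build_tree` are hypothetical names, every operator is treated as left-associative, and right_operators are ignored.

```ruby
# Hypothetical stand-in for a parse-tree node; BabelBridge's own classes differ.
BinaryNode = Struct.new(:left, :operator, :right)

# operands [1, 2, 3] with operators [:+, :*] represent "1 + 2 * 3".
# precedence lists operator groups, highest precedence first.
def build_tree(operands, operators, precedence)
  operands = operands.dup
  operators = operators.dup
  precedence.each do |group|
    i = 0
    while i < operators.length
      if group.include?(operators[i].to_s)
        # Fold the two neighbouring operands into one node, left to right.
        node = BinaryNode.new(operands[i], operators[i], operands[i + 1])
        operands[i, 2] = [node]
        operators.delete_at(i)
      else
        i += 1
      end
    end
  end
  operands.first
end

PRECEDENCE = [["*", "/"], ["+", "-"]]

tree = build_tree([1, 2, 3], [:+, :*], PRECEDENCE)
tree.operator # => :+
tree.right    # => #<struct BinaryNode left=2, operator=:*, right=3>
```

Because the "*" group is folded first, `2 * 3` becomes a subtree before "+" is considered, which is what "computed first" means for the first element.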

2012-09-12

using readline for shell

added support for infix binary operator precedence resolution:

USAGE:

    binary_operators_rule :any_rule_name, :operands_pattern, operators, [:right_operators => [...]]

Where "operators" is an array of operators, ordered by precedence such as: ["+", "-", "*", "/"].

The last operators in the array are matched first.

You can also group operators into the same precedence level: [["+", "-"], ["*", "/"]]

Operators in the same precedence level are matched left-to-right.

You can optionally list one or more "right_operators" - which can be strings or regexps - to specify which operators are right-associative.

MATCHING:

    binary_operators_rule :any_rule_name, :operands_pattern, ["+", "-", "*", "/"]

matches the same string as:

    rule :any_rule_name, many(:operands_pattern,/[-+*\/]/)

PARSE TREE:

The resulting parse-tree consists of 1 or more instances of the :any_rule_name rule's variant class. Each node has methods for easy access to:

    left -> the left operand node
    right -> the right operand node
    operator -> the operator as a symbol
    operator_node -> the operator node

ignore_whitespace feature added

Called in the parser's class. Sets a flag that causes all future parsing to ignore white spaces. Specifically, this means that after each terminal-node match, all trailing whitespace is consumed before the next terminal match is attempted.

This means that terminal nodes can still match any white-spaces they require.

The exact matched string, including trailing whitespace, is still available via the "text" method. The "to_s" method, though, now returns the stripped token value (if ignore_whitespace is enabled).

2012-09-09

forward_to now scans all pattern elements for the first one that responds to the method
added shell
data/Gemfile
ADDED
data/Guardfile
ADDED
data/LICENCE
ADDED
@@ -0,0 +1,24 @@
Copyright (c) 2010, Shane Brinkman-Davis
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of the <organization> nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.md
ADDED
@@ -0,0 +1,244 @@
Summary
-------

Babel Bridge lets you generate parsers 100% in Ruby code. It is a memoizing Parsing Expression Grammar (PEG) generator like Treetop, but it doesn't require special file-types or new syntax. The overall focus is on simplicity and usability over performance.
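
"Memoizing" here means that the result of trying a rule at a given input offset is cached, so backtracking never re-runs the same rule at the same position. The sketch below illustrates that idea in plain Ruby. It is not Babel Bridge's internal cache: `MemoizingMatcher` and its methods are hypothetical names used only for this example.

```ruby
# Hypothetical illustration of PEG memoization; not Babel Bridge's internals.
class MemoizingMatcher
  attr_reader :attempts

  def initialize(source)
    @source = source
    @memo = {}      # [rule_name, offset] => end offset, or nil for a failed match
    @attempts = 0   # number of times a pattern was actually run
  end

  # Returns the offset just past the match, or nil if the pattern
  # does not match exactly at the given offset.
  def match(rule_name, pattern, offset)
    key = [rule_name, offset]
    return @memo[key] if @memo.key?(key)

    @attempts += 1
    found = pattern.match(@source, offset)
    @memo[key] = found && found.begin(0) == offset ? found.end(0) : nil
  end
end

matcher = MemoizingMatcher.new("foobar")
matcher.match(:foo, /foo/, 0) # => 3
matcher.match(:foo, /foo/, 0) # => 3, answered from the cache
matcher.attempts              # => 1
```

Failed attempts are cached as well (as `nil`), which is why the lookup uses `key?` rather than testing the stored value.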

Goals
-----

* Allow expression 100% in ruby
* Productivity through Simplicity and Understandability first
* Performance second


Example
-------

``` ruby
require "babel_bridge"

class MyParser < BabelBridge::Parser

  # foo rule: match "foo" optionally followed by the :bar rule
  rule :foo, "foo", :bar?

  # bar rule: match "bar"
  rule :bar, "bar"
end

# create one or more instances of your parser
parser = MyParser.new

parser.parse "foo" # matches "foo"
# => FooNode1 > "foo"

parser.parse "foobar" # matches "foobar"
# => FooNode1
#   "foo"
#   BarNode1 > "bar"

parser.parse "fribar" # fails to match
# => nil

parser.parse "foobarbar" # fails to match entire input
# => nil
```

More elaborate examples:
* [Parsing JSON the Not-So-Hard Way](http://www.essenceandartifact.com/2013/01/parsing-json-not-so-hard-way.html)
* [How to Create a Turing Complete Programming Language in 40 Minutes](http://www.essenceandartifact.com/2012/09/how-to-create-turing-complete.html)

Features
--------

``` ruby

# returns the BabelBridge::Rule instance for that rule
rule = MyParser[:foo]
# => rule :foo, "foo", :bar?

# nice human-readable view of the rule with extra info:
rule.to_s
# rule :foo, node_class: MyParser::FooNode
#   variant_class: MyParser::FooNode1, pattern: "foo", :bar?

# returns the code necessary for generating the rule and all its variants
# (minus any class_eval code)
rule.inspect
# => rule :foo, "foo", :bar?

# returns the Node class for a rule
MyParser.node_class(:foo)
# => MyParser::FooNode

MyParser.node_class(:foo) do
  # class_eval inside the rule's Node-class
end

# parses text starting with the MyParser.root_rule
# The root_rule is defined automatically by the first rule defined, but can be set by:
#   MyParser.root_rule=v
#   where v is the symbol name of the rule or the actual rule object from MyParser[rule]
text = "foobar"
parser.parse(text)

# do a one-time parse with :bar set as the root-rule
text = "bar"
parser.parse(text, :rule => :bar)

# relax requirement to match entire input
parser.parse "foobar and then something", :partial_match => true

# parse failure
parser.parse "foo is not immediately followed by bar"

# human readable parser failure info
puts parser.parser_failure_info
```

Parser failure info output:
```
Parsing error at line 1 column 4 offset 3

Source:
...
foo<HERE> is not immediately followed by bar
...

Parser did not match entire input.

Parse path at failure:
  FooNode1

Expecting:
  "bar" BarNode1
```
NOTE: This is an evolving feature; this output is as of 0.5.1 and may not match the current version.
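
The failure report above gives a 0-based offset alongside a 1-based line and column. A minimal sketch of how the three relate, assuming lines are separated by "\n"; `line_and_column` is a hypothetical helper and is not part of the gem:

```ruby
# Hypothetical helper, not part of BabelBridge: convert a 0-based character
# offset into a 1-based [line, column] pair.
def line_and_column(source, offset)
  before = source[0, offset]           # text preceding the failure point
  line = before.count("\n") + 1        # newlines seen so far, made 1-based
  last_newline = before.rindex("\n")   # nil while still on the first line
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

line_and_column("foo is not immediately followed by bar", 3) # => [1, 4]
line_and_column("ab\ncd", 4)                                  # => [2, 2]
```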

Defining Rules
--------------

Inside the parser class, a rule is defined as follows:

``` ruby
class MyParser < BabelBridge::Parser
  rule :rule_name, pattern
end
```

Where:

* :rule_name is a symbol
* pattern: see Patterns below

You can also add new rules outside the class definition by:

``` ruby
MyParser.rule :rule_name, pattern
```

Patterns
--------

Patterns are a list of pattern elements, matched in order:

Example:

``` ruby
rule :my_rule, "match", "this", "in", "order" # matches "matchthisinorder"
```
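
To show what "matched in order" means without the gem, here is a small stand-alone sketch using Ruby's `StringScanner`. `match_sequence` is a hypothetical helper, it handles only strings and regexps, and it applies no delimiter between elements.

```ruby
require "strscan"

# Hypothetical helper, not part of BabelBridge: match each element in order
# from the start of the source. Returns the matched pieces, or nil if any
# element fails or input is left over.
def match_sequence(source, elements)
  scanner = StringScanner.new(source)
  pieces = elements.map do |element|
    pattern = element.is_a?(Regexp) ? element : Regexp.new(Regexp.escape(element))
    scanner.scan(pattern) or return nil   # scan only matches at the current position
  end
  scanner.eos? ? pieces : nil
end

match_sequence("matchthisinorder", %w[match this in order])
# => ["match", "this", "in", "order"]

match_sequence("match this in order", %w[match this in order])
# => nil (the spaces are not part of any element)
```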

Pattern Elements
----------------

A pattern element is either a basic pattern element or an extended pattern element (expressed as a hash). Internally, they are "compiled" into instances of PatternElement with optimized lambda functions for parsing.

## Basic Pattern Elements (basic_element)

``` ruby
:my_rule      # matches the Rule named :my_rule
:my_rule?     # optional: optionally matches Rule :my_rule
:my_rule!     # negative: success only if it DOESN'T match Rule :my_rule
"string"      # matches the string exactly
/regex/       # matches the regex exactly
```

## Advanced Pattern Elements

``` ruby

# success if basic_element could be matched, but the input is not consumed
could.match(pattern_element)

# negative (two equivalent methods)
dont.match(pattern_element)
match!(pattern_element)

# optional (two equivalent methods)
optionally.match(pattern_element)
match?(pattern_element)

# match 1 or more
many(pattern_element)

# match 1 or more of one basic_element delimited by another basic_element
many(pattern_element, delimiter_pattern_element)

# match 0 or more
many?(pattern_element)

# An array of patterns tells BB to match those patterns in order ("and" matching)
[pattern_element_a, pattern_element_b, pattern_element_c, ...]

# match any one of the listed patterns ("or" matching)
any(pattern_element_a, pattern_element_b, pattern_element_c, ...)

# optionally match any of the patterns
any?(pattern_element_a, pattern_element_b, pattern_element_c, ...)

# don't match any of the patterns
any!(pattern_element_a, pattern_element_b, pattern_element_c, ...)

```

## Custom Pattern Element Parser

Custom pattern elements are not generally needed, but for certain patterns, particularly context-sensitive ones, we provide a way to write them.

``` ruby
class MyParser < BabelBridge::Parser

  # custom parser to match an all upper-case word followed by any number of characters before that word is repeated
  rule :foo, (custom_parser do |parent_node|
    offset = parent_node.next
    src = parent_node.src

    # Note, the \A anchors the search at the beginning of the string
    if src[offset..-1].index(/\A[A-Z]+/) == 0
      endpattern = $~.to_s
      if i = src.index(endpattern, offset + endpattern.length)
        range = offset..(i + endpattern.length)
        BabelBridge::TerminalNode.new(parent_node, range, "endpattern")
      end
    end
  end)
end

parser = MyParser.new
parser.parse "END this is in the middle END"
# => FooNode1 > "END this is in the middle END"

parser.parse "DRUID this is in the middle DRUID"
# => FooNode1 > "DRUID this is in the middle DRUID"

parser.parse "DRUID this is in the middle DRUI"
# => nil
```

Structure
---------

* Each Rule defines a subclass of Node
* Each RuleVariant defines a subclass of the parent Rule's node-class

Therefore you can easily define code to be shared across all variants as well as define code specific to one variant.
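
The hierarchy can be pictured with ordinary Ruby classes. This is only an illustration: in Babel Bridge the node classes are generated for you, and the hand-written `Node`, `FooNode`, `FooNode1`, `FooNode2` classes and the `describe` method below are hypothetical stand-ins.

```ruby
# Hypothetical, hand-written stand-ins for the generated classes.
class Node
  attr_reader :text

  def initialize(text)
    @text = text
  end
end

# One class per rule: code here is shared by every variant of the rule.
class FooNode < Node
  def describe
    "foo rule matched #{text.inspect}"
  end
end

# One subclass per variant: the first variant inherits everything unchanged.
class FooNode1 < FooNode; end

# The second variant adds behaviour specific to itself.
class FooNode2 < FooNode
  def describe
    super + " (variant 2)"
  end
end

FooNode1.new("foo").describe    # => "foo rule matched \"foo\""
FooNode2.new("foobar").describe # => "foo rule matched \"foobar\" (variant 2)"
```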