babel_bridge 0.5.1 → 0.5.3
- data/CHANGE_LOG +165 -0
- data/Gemfile +4 -0
- data/Guardfile +7 -0
- data/LICENCE +24 -0
- data/README.md +244 -0
- data/Rakefile +8 -2
- data/TODO +100 -0
- data/babel_bridge.gemspec +11 -3
- data/examples/json/json_parser.rb +23 -0
- data/examples/json/json_parser2.rb +37 -0
- data/lib/babel_bridge.rb +3 -2
- data/lib/{nodes.rb → babel_bridge/nodes.rb} +0 -0
- data/lib/{nodes → babel_bridge/nodes}/empty_node.rb +0 -0
- data/lib/{nodes → babel_bridge/nodes}/node.rb +1 -1
- data/lib/{nodes → babel_bridge/nodes}/non_terminal_node.rb +0 -8
- data/lib/{nodes → babel_bridge/nodes}/root_node.rb +0 -0
- data/lib/{nodes → babel_bridge/nodes}/rule_node.rb +0 -0
- data/lib/{nodes → babel_bridge/nodes}/terminal_node.rb +0 -0
- data/lib/{parser.rb → babel_bridge/parser.rb} +7 -14
- data/lib/{pattern_element.rb → babel_bridge/pattern_element.rb} +27 -25
- data/lib/babel_bridge/pattern_element_hash.rb +22 -0
- data/lib/{rule.rb → babel_bridge/rule.rb} +0 -0
- data/lib/{rule_variant.rb → babel_bridge/rule_variant.rb} +0 -4
- data/lib/{shell.rb → babel_bridge/shell.rb} +0 -0
- data/lib/{string.rb → babel_bridge/string.rb} +0 -0
- data/lib/{tools.rb → babel_bridge/tools.rb} +0 -0
- data/lib/babel_bridge/version.rb +3 -0
- data/spec/advanced_parsers_spec.rb +1 -0
- data/spec/basic_parsing_spec.rb +43 -0
- data/spec/bb_spec.rb +19 -0
- data/spec/compound_patterns_spec.rb +61 -0
- data/spec/node_spec.rb +3 -3
- data/spec/pattern_generators_spec.rb +4 -4
- data/spec/spec_helper.rb +3 -0
- metadata +115 -33
- data/README +0 -144
- data/examples/turing/examples.turing +0 -33
- data/examples/turing/notes.rb +0 -111
- data/examples/turing/turing_demo.rb +0 -71
- data/lib/version.rb +0 -4
data/CHANGE_LOG
ADDED
@@ -0,0 +1,165 @@
2013-2-12 v0.5.3

fixed bug with 0-length matches' to_s returning non-zero-length strings

2012-1-25 v0.5.1

added parser.relative_source_file

2012-1-12 v0.5.0

added Parser.new :source_file => String
Sets parser.source_file value

Changed uniform_tabs to NOT include at least one space. If you want to ensure at least one space, you should add a space after your tab.

Fixed out-of-date tests in tools_spec.

2012-1-6 v0.5.0

Nodes now have #line and #column methods which return the line and column of the source for the start of that Node's match.

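The arithmetic behind #line and #column can be sketched in plain Ruby. This is only an illustration, not BabelBridge's implementation: the helper name `line_and_column` is hypothetical, and 1-based numbering is assumed because the README's failure output reports "line 1 column 4 offset 3".

```ruby
# Hypothetical helper: convert a 0-based offset into a 1-based [line, column] pair.
def line_and_column(source, offset)
  before = source[0, offset]               # everything before the match start
  line = before.count("\n") + 1            # one line per newline already passed
  last_newline = before.rindex("\n")
  # Column counts characters since the last newline (or since the start of input).
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

first_line = line_and_column("foo is not immediately followed by bar", 3)
second_line = line_and_column("a\nbc", 3)
```

Here `first_line` is `[1, 4]`, which agrees with the README's failure output, and `second_line` is `[2, 2]`.
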
2012-1-5 v0.5.0

Completely reworked ignore_whitespace - again.

Now there is a global "delimiter" pattern which is matched between every sub-pattern of every rule AND at the beginning and end of the entire parse.

ignore_whitespace sets this delimiter to: /\s*/

You can set your own delimiter with the delimiter method:

class MyParser < BabelBridge::Parser
  delimiter :hi, "there", "/[mM]ust/", "be between every sub-pattern!" # delimiter can take any pattern "rule" can
  rule :hi, "hi"
end

You can override the delimiter pattern for a single rule to put in special code:

class MyParser < BabelBridge::Parser
  ignore_whitespace

  rule :root, many(:statement, ';')
  rule :statement, many(:word, / +/), :delimiter => // # disable the global delimiter
end

INCOMPATIBLE CHANGE: node.matches is no longer positional

node.matches now includes only things that were matched. This means conditional matches which do not match no longer add an EmptyNode to node.matches.

node.matches now contains all delimiter matches.

INCOMPATIBLE CHANGE: no more ManyNode

The many(rule) parser pattern no longer generates a special kind of parse-tree node. Instead it adds all its matches to the parent rule's .matches list. It also adds all the many-delimiters.

NOTE: 'delimiter' refers to the global delimiter pattern or the rule-local override. 'many-delimiter' refers to the optional, explicit delimiter specified for the many-pattern.

NOTE: many(:rule,:many_delimiter) will effectively match: [rule]([delimiter][many_delimiter][delimiter][rule])*

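The expansion [rule]([delimiter][many_delimiter][delimiter][rule])* can be tried out with ordinary regexps. A rough sketch, assuming every piece is a plain Regexp: `many_regexp` is a hypothetical helper, not part of BabelBridge, and real rules are not limited to regexps.

```ruby
# Hypothetical helper: build rule (delimiter many_delimiter delimiter rule)*
# out of plain regexps. The default delimiter mirrors ignore_whitespace (/\s*/).
def many_regexp(rule, many_delimiter, delimiter = /\s*/)
  /#{rule}(?:#{delimiter}#{many_delimiter}#{delimiter}#{rule})*/
end

# Words separated by ";" with optional whitespace around each ";".
words = /\A#{many_regexp(/[a-z]+/, /;/)}\z/
loose_ok = words.match?("a ; b;c")
trailing_delimiter = words.match?("a ; ")

# An empty delimiter, like :delimiter => //, leaves only the many-delimiter.
spaced = /\A#{many_regexp(/[a-z]+/, / +/, //)}\z/
spaces_ok = spaced.match?("a b  c")
tab_rejected = !spaced.match?("a\tb")
```

The second case fails because the pattern always ends on a [rule], so a dangling many-delimiter is never consumed.
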
2012-12-31 v0.4.2

Bugfix: parser_failure_info now works when nothing is matched

2012-12-17 v0.4.1

rewind_whitespace usage example:

rule :end_statement, rewind_whitespace, /([\t ]*[\n;])+/

In this example, end_statement is similar to the end-of-statement pattern for the Ruby language. Each statement ends with either a new line or a semicolon. "rewind_whitespace" indicates the parser should back up to the end of the last match and then continue matching.

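Why backing up matters can be shown with Ruby's standard StringScanner. This sketch only imitates the idea and does not use BabelBridge: it assumes ignore_whitespace behaves like skipping /\s*/ after a token, and the variable names are illustrative.

```ruby
require "strscan"

END_STATEMENT = /([\t ]*[\n;])+/

scanner = StringScanner.new("0 \n\n  0")

scanner.scan(/0/)          # first statement
token_end = scanner.pos    # end of the last real match
scanner.skip(/\s*/)        # imitate ignore_whitespace consuming " \n\n  "

# The newlines are already consumed, so the pattern cannot match here.
without_rewind = scanner.check(END_STATEMENT)

# Back up to the end of the last match, then try again.
scanner.pos = token_end
with_rewind = scanner.scan(END_STATEMENT)

scanner.skip(/\s*/)        # skip the indentation before the next statement
second_statement = scanner.scan(/0/)
```

Without the rewind the end-of-statement pattern sees only the second "0" and fails; after the rewind it consumes the space and both newlines.
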
2012-11-20 v0.4.0

INCOMPATIBLE CHANGE: Removed the post-match pattern option from the "many" pattern matcher. It simplifies things and can easily be reproduced with a custom rule.

Did significant code cleanup. NonTerminalNode was renamed RuleNode and a new NonTerminalNode class was created as a parent for RuleNode and ManyNode.

ignore_whitespace is now just a regexp. An empty regexp is used if ignore_whitespace is not specified. It is now handled consistently throughout. Every node has postwhitespace_range and prewhitespace_range methods that allow you to find the whitespace after and before that node.

node.to_s and node.text now both just return the matched text WITHOUT the preceding and trailing whitespace. Note, however, that it will still include any whitespace in between as it is just a single slice out of the source.

2012-11-13

ignore_whitespace now optionally takes a regexp for what to ignore after every TerminalNode. Default: /\s*/

rewind_whitespace matching pattern added. This allows you to match the string ignored by "ignore_whitespace" after the previous token.

Example: Implements the Ruby ";" / new-line parsing rule.

class MyParser < BabelBridge::Parser
  ignore_whitespace

  rule :pair, :statement, :end_statement, :statement
  rule :end_statement, rewind_whitespace(/([\t ]*[\n;])+/)
  rule :statement, "0"
end

# matches two 0s separated by one or more ";" or "\n" and any whitespace

2012-09-28

Added to_sym on nodes.

2012-09-19 version 0.3.1

Added refinements to the parser-failure output.

2012-09-13

Reversed the precedence order for binary_operators_rule. The first element has the highest precedence, i.e., it is computed first.

Now, the correct precedence order for the basic operators is:

[["*", "/"], ["+", "-"]]

2012-09-12

using readline for shell

added support for infix binary operator precedence resolution:

USAGE:

binary_operators_rule :any_rule_name, :operands_pattern, operators, [:right_operators => [...]]

Where "operators" is an array of operators, ordered by precedence such as: ["+", "-", "*", "/"].

The last operators in the array are matched first.

You can also group operators into the same precedence level: [["+", "-"], ["*", "/"]]

Operators in the same precedence level are matched left-to-right.

You can optionally list one or more "right_operators" - which can be strings or regexps - to specify which operators are right-associative.

MATCHING:

binary_operators_rule :any_rule_name, :operands_pattern, ["+", "-", "*", "/"]

matches the same string as:

rule :any_rule_name, many(:operands_pattern,/[-+*\/]/)

PARSE TREE:

The resulting parse-tree consists of 1 or more instances of the :any_rule_name rule's variant class. Each node has methods for easy access to:

left -> the left operand node
right -> the right operand node
operator -> the operator as a symbol
operator_node -> the operator node

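The precedence resolution described here can be sketched in plain Ruby, starting from the flat operand and operator lists that a many(operand, operator) match would produce. This is an illustrative sketch, not BabelBridge's code: `BinaryNode`, `resolve_precedence` and `evaluate` are hypothetical names, the level order follows the 2012-09-13 entry (first level has the highest precedence), right_operators are plain strings only, and a level is treated as right-associative as a whole.

```ruby
# Hypothetical stand-in for a parse-tree node with left, operator and right.
BinaryNode = Struct.new(:left, :operator, :right)

def resolve_precedence(operands, operators, levels, right_operators = [])
  operands = operands.dup
  operators = operators.dup

  levels.each do |level|
    level = Array(level) # allow both "+" and ["+", "-"] as a level
    right = level.any? { |op| right_operators.include?(op) }
    i = right ? operators.length - 1 : 0

    while i >= 0 && i < operators.length
      if level.include?(operators[i])
        # Fold the two neighbouring operands into one node.
        node = BinaryNode.new(operands[i], operators[i].to_sym, operands[i + 1])
        operands[i, 2] = [node]
        operators.delete_at(i)
        i -= 1 if right # a left-to-right scan stays put: the next operator slid into slot i
      else
        i += right ? -1 : 1
      end
    end
  end

  operands.first
end

# Walk the tree, applying each operator symbol to its evaluated operands.
def evaluate(node)
  return node unless node.is_a?(BinaryNode)
  evaluate(node.left).public_send(node.operator, evaluate(node.right))
end

levels = [["**"], ["*", "/"], ["+", "-"]]
tree = resolve_precedence([1, 2, 3, 4], ["+", "*", "-"], levels)     # 1 + 2 * 3 - 4
power = resolve_precedence([2, 3, 2], ["**", "**"], levels, ["**"])  # 2 ** 3 ** 2
```

`tree` groups as (1 + (2 * 3)) - 4, and `power` groups as 2 ** (3 ** 2) because "**" is listed as right-associative.
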
ignore_whitespace feature added

Called in the parser's class. Sets a flag that causes all future parsing to ignore white spaces. Specifically, this means that after each terminal-node match, all trailing-whitespace is consumed before the next terminal match is attempted.

This means that terminal nodes can still match any white-spaces they require.

The exact matched string, including trailing whitespace, is still available via the "text" method. The "to_s" method, though, now returns the stripped token value (if ignore_whitespace is enabled).

2012-09-09

forward_to now scans all pattern elements for the first one that responds to the method
added shell
data/Gemfile
ADDED
data/Guardfile
ADDED
data/LICENCE
ADDED
@@ -0,0 +1,24 @@
Copyright (c) 2010, Shane Brinkman-Davis
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name of the <organization> nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README.md
ADDED
@@ -0,0 +1,244 @@
Summary
-------

Babel Bridge lets you generate parsers 100% in Ruby code. It is a memoizing Parsing Expression Grammar (PEG) generator like Treetop, but it doesn't require special file-types or new syntax. Overall focus is on simplicity and usability over performance.

Goals
-----

* Allow expression 100% in ruby
* Productivity through Simplicity and Understandability first
* Performance second


Example
-------

``` ruby
require "babel_bridge"

class MyParser < BabelBridge::Parser

  # foo rule: match "foo" optionally followed by the :bar rule
  rule :foo, "foo", :bar?

  # bar rule: match "bar"
  rule :bar, "bar"
end

# create one or more instances of your parser
parser = MyParser.new

parser.parse "foo" # matches "foo"
# => FooNode1 > "foo"

parser.parse "foobar" # matches "foobar"
# => FooNode1
#   "foo"
#   BarNode1 > "bar"

parser.parse "fribar" # fails to match
# => nil

parser.parse "foobarbar" # fails to match entire input
# => nil
```

More elaborate examples:
* [Parsing JSON the Not-So-Hard Way](http://www.essenceandartifact.com/2013/01/parsing-json-not-so-hard-way.html)
* [How to Create a Turing Complete Programming Language in 40 Minutes](http://www.essenceandartifact.com/2012/09/how-to-create-turing-complete.html)

Features
--------

``` ruby

# returns the BabelBridge::Rule instance for that rule
rule = MyParser[:foo]
# => rule :foo, "foo", :bar?

# nice human-readable view of the rule with extra info:
rule.to_s
# rule :foo, node_class: MyParser::FooNode
#   variant_class: MyParser::FooNode1, pattern: "foo", :bar?

# returns the code necessary for generating the rule and all its variants
# (minus any class_eval code)
rule.inspect
# => rule :foo, "foo", :bar?

# returns the Node class for a rule
MyParser.node_class(:foo)
# => MyParser::FooNode

MyParser.node_class(:foo) do
  # class_eval inside the rule's Node-class
end

# parses Text starting with the MyParser.root_rule
# The root_rule is defined automatically by the first rule defined, but can be set by:
#   MyParser.root_rule=v
#   where v is the symbol name of the rule or the actual rule object from MyParser[rule]
text = "foobar"
parser.parse(text)

# do a one-time parse with :bar set as the root-rule
text = "bar"
parser.parse(text, :rule => :bar)

# relax requirement to match entire input
parser.parse "foobar and then something", :partial_match => true

# parse failure
parser.parse "foo is not immediately followed by bar"

# human readable parser failure info
puts parser.parser_failure_info
```

Parser failure info output:
```
Parsing error at line 1 column 4 offset 3

Source:
...
foo<HERE> is not immediately followed by bar
...

Parser did not match entire input.

Parse path at failure:
  FooNode1

Expecting:
  "bar" BarNode1
```
NOTE: This is an evolving feature; this output is as of 0.5.1 and may not match the current version.

Defining Rules
--------------

Inside the parser class, a rule is defined as follows:

``` ruby
class MyParser < BabelBridge::Parser
  rule :rule_name, pattern
end
```

Where:

* :rule_name is a symbol
* pattern: see Patterns below

You can also add new rules outside the class definition by:

``` ruby
MyParser.rule :rule_name, pattern
```

Patterns
--------

Patterns are a list of pattern elements, matched in order:

Example:

``` ruby
rule :my_rule, "match", "this", "in", "order" # matches "matchthisinorder"
```

Pattern Elements
----------------

Pattern elements are either basic pattern elements or extended pattern elements (expressed as a hash). Internally, they are "compiled" into instances of PatternElement with optimized lambda functions for parsing.

## Basic Pattern Elements (basic_element)

``` ruby
:my_rule  # matches the Rule named :my_rule
:my_rule? # optional: optionally matches Rule :my_rule
:my_rule! # negative: success only if it DOESN'T match Rule :my_rule
"string"  # matches the string exactly
/regex/   # matches the regex exactly
```

## Advanced Pattern Elements

``` ruby

# success if basic_element could be matched, but the input is not consumed
could.match(pattern_element)

# negative (two equivalent methods)
dont.match(pattern_element)
match!(pattern_element)

# optional (two equivalent methods)
optionally.match(pattern_element)
match?(pattern_element)

# match 1 or more
many(pattern_element)

# match 1 or more of one basic_element delimited by another basic_element
many(pattern_element, delimiter_pattern_element)

# match 0 or more
many?(pattern_element)

# An array of patterns tells BB to match those patterns in order ("and" matching)
[pattern_element_a, pattern_element_b, pattern_element_c, ...]

# match any one of the listed patterns ("or" matching)
any(pattern_element_a, pattern_element_b, pattern_element_c, ...)

# optionally match any of the patterns
any?(pattern_element_a, pattern_element_b, pattern_element_c, ...)

# don't match any of the patterns
any!(pattern_element_a, pattern_element_b, pattern_element_c, ...)

```

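The difference between "could", "dont" and "optionally" comes down to whether input is consumed and whether a failed match counts as failure. The sketch below imitates those semantics with Ruby's standard StringScanner. It is an assumption-laden illustration, not BabelBridge internals: the three helper names are hypothetical and only regexps are handled.

```ruby
require "strscan"

# Succeeds if the pattern matches here; never consumes input (positive lookahead).
def could_match(scanner, pattern)
  !scanner.check(pattern).nil?
end

# Succeeds only if the pattern does NOT match here; never consumes input.
def dont_match(scanner, pattern)
  scanner.check(pattern).nil?
end

# Always succeeds; consumes input only when the pattern matches.
def optionally_match(scanner, pattern)
  scanner.scan(pattern)
  true
end

scanner = StringScanner.new("foobar")

peeked = could_match(scanner, /foo/)
pos_after_could = scanner.pos

negated = dont_match(scanner, /bar/)
pos_after_dont = scanner.pos

skipped = optionally_match(scanner, /xyz/)
pos_after_miss = scanner.pos

taken = optionally_match(scanner, /foo/)
pos_after_hit = scanner.pos
```

Only the last call moves the position (to 3); the lookahead helpers and the failed optional match all leave it at 0.
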
## Custom Pattern Element Parser

Custom pattern elements are not generally needed, but for certain patterns, particularly context-sensitive ones, we provide a way to do it.

``` ruby
class MyParser < BabelBridge::Parser

  # custom parser to match an all upper-case word followed by any number of characters before that word is repeated
  rule :foo, (custom_parser do |parent_node|
    offset = parent_node.next
    src = parent_node.src

    # Note, the \A anchors the search at the beginning of the string
    if src[offset..-1].index(/\A[A-Z]+/) == 0
      endpattern = $~.to_s
      if i = src.index(endpattern, offset + endpattern.length)
        range = offset..(i + endpattern.length)
        BabelBridge::TerminalNode.new(parent_node, range, "endpattern")
      end
    end
  end)
end

parser = MyParser.new
parser.parse "END this is in the middle END"
# => FooNode1 > "END this is in the middle END"

parser.parse "DRUID this is in the middle DRUID"
# => FooNode1 > "DRUID this is in the middle DRUID"

parser.parse "DRUID this is in the middle DRUI"
# => nil
```

Structure
---------

* Each Rule defines a subclass of Node
* Each RuleVariant defines a subclass of the parent Rule's node-class

Therefore you can easily define code to be shared across all variants as well as define code specific to one variant.
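The class layout described above can be pictured with ordinary Ruby inheritance. This is a hand-written sketch of the shape only: the real classes are generated by BabelBridge, `FooNode2` and the method names are hypothetical, and `Node` here is a bare stand-in rather than the library's class.

```ruby
# Bare stand-in for the library's base node class.
class Node; end

# One class per rule: code here is shared by every variant of :foo.
class FooNode < Node
  def rule_name
    :foo
  end
end

# One subclass per variant: code here applies to that variant only.
class FooNode1 < FooNode
  def variant
    1
  end
end

class FooNode2 < FooNode
  def variant
    2
  end
end

shared = [FooNode1.new.rule_name, FooNode2.new.rule_name]
specific = [FooNode1.new.variant, FooNode2.new.variant]
```

Both variants answer `rule_name` through the shared parent, while `variant` differs per subclass.
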