babel_bridge 0.1.0
- data/README +144 -0
- data/babel_bridge.gemspec +14 -0
- data/lib/babel_bridge.rb +529 -0
- data/lib/nodes.rb +256 -0
- data/test/test_bb.rb +387 -0
- data/test/test_helper.rb +44 -0
- metadata +60 -0
data/README
ADDED
@@ -0,0 +1,144 @@
Summary
-------

Babel Bridge lets you generate parsers 100% in Ruby code. It is a memoizing Parsing Expression Grammar (PEG) generator like Treetop, but it doesn't require special file types or new syntax. The overall focus is on simplicity and usability over performance.

Example
-------

  require "babel_bridge"
  class MyParser < BabelBridge::Parser
    rule :foo, "foo", :bar? # match "foo" optionally followed by the rule :bar
    rule :bar, "bar" # match "bar"
  end

  MyParser.new.parse("foo") # matches "foo"
  MyParser.new.parse("foobar") # matches "foobar"

Babel Bridge is a parser-generator for Parsing Expression Grammars.
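The "memoizing" part of the summary can be illustrated without the gem. The following is a minimal standalone sketch, assuming a hypothetical TinyPeg class (not part of Babel Bridge) that hard-codes the two rules from the example and caches the outcome of every (rule, offset) pair, so a rule is never evaluated twice at the same position:

```ruby
# Standalone sketch; TinyPeg is hypothetical and does not use babel_bridge.
class TinyPeg
  attr_reader :calls

  def initialize(src)
    @src = src
    @cache = {}   # {[rule_name, offset] => end_offset or :no_match}
    @calls = 0    # counts uncached rule evaluations
  end

  # Returns the offset just past the match, or nil if the rule does not match.
  def apply(rule_name, offset)
    key = [rule_name, offset]
    if @cache.key?(key)
      cached = @cache[key]
      return cached == :no_match ? nil : cached
    end
    @calls += 1
    result = send(rule_name, offset)
    @cache[key] = result || :no_match
    result
  end

  private

  # foo <- "foo" bar?
  def foo(offset)
    return nil unless @src[offset, 3] == "foo"
    apply(:bar, offset + 3) || offset + 3
  end

  # bar <- "bar"
  def bar(offset)
    @src[offset, 3] == "bar" ? offset + 3 : nil
  end
end

peg = TinyPeg.new("foobar")
peg.apply(:foo, 0)   # evaluates :foo and :bar once each
peg.apply(:foo, 0)   # answered from the cache
```

Failures are cached as well as successes, which is what keeps repeated backtracking over the same position cheap.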
Goals
-----

  Allow expression 100% in Ruby
  Productivity through simplicity and understandability first
  Performance second

Features
--------

  rule=MyParser[:foo] # returns the BabelBridge::Rule instance for that rule

  rule.to_s
    nice human-readable view of the rule with extra info

  rule.inspect
    returns the code necessary for generating the rule and all its variants
    (minus any class_eval code)

  MyParser.node_class(:foo)
    returns the Node class for the rule named :foo

  MyParser.node_class(:foo) do
    # class_eval inside the rule's Node-class
  end

  MyParser.new.parse(text)
    # parses text starting with MyParser.root_rule
    # The root_rule is set automatically to the first rule defined, but can be changed with:
    # MyParser.root_rule=v # where v is the symbol name of the rule
  MyParser.new.parse(text,offset,rule) # only has to match the rule - it's OK if there is input left
    if rule is omitted, parse uses the root_rule and must match the entire input

  detailed parser_failure_info report
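The failure report locates an error by line, column and character offset. As a rough illustration of that conversion only, here is a standalone sketch using a hypothetical helper named line_col_at (it is not the gem's own method), which turns an offset into a 1-based line number and a 0-based column:

```ruby
# Hypothetical helper, not part of babel_bridge: convert a character offset
# into a 1-based line number and a 0-based column.
def line_col_at(src, offset)
  before = src[0, offset]              # text preceding the failure point
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")   # nil while still on the first line
  col = last_newline ? offset - last_newline - 1 : offset
  [line, col]
end

src = "foo\nbar\nbaz"
line, col = line_col_at(src, 5)        # offset 5 points at the "a" in "bar"
puts "Parsing error at line #{line} column #{col} offset 5"
```

Counting newlines rather than splitting the text keeps the result well defined at offset 0 and immediately after a newline.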
Defining Rules
--------------

Inside the parser class, a rule is defined as follows:

  class MyParser < BabelBridge::Parser
    rule :rule_name, pattern
  end

Where:

  :rule_name is a symbol
  pattern see Patterns below

You can also add new rules outside the class definition by:

  MyParser.rule :rule_name, pattern
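In the library source shipped with this release, calling rule again with a name that already exists adds another variant to that rule, and variants are tried in the order they were defined. The standalone sketch below illustrates that ordered-choice behaviour only; TinyRule is a hypothetical class limited to literal strings, not the gem's Rule class:

```ruby
# Standalone sketch of ordered choice; TinyRule is hypothetical.
class TinyRule
  def initialize
    @variants = []   # each variant is an Array of literal strings
  end

  # Calling this repeatedly appends alternatives in definition order.
  def add_variant(*strings)
    @variants << strings
    self
  end

  # Try each variant in order; return the text matched by the first variant
  # whose strings all match consecutively at offset, or nil if none does.
  def parse(src, offset = 0)
    @variants.each do |strings|
      pos = offset
      ok = strings.all? do |s|
        matched = src[pos, s.length] == s
        pos += s.length if matched
        matched
      end
      return src[offset...pos] if ok
    end
    nil
  end
end

greeting = TinyRule.new
greeting.add_variant("hello", " ", "world")   # tried first
greeting.add_variant("hello")                 # fallback
```

Because the first successful variant wins, the longer alternative has to be listed before the shorter one that is its prefix.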
Patterns
--------

A pattern is an Array of pattern elements, matched in order:

Ex (both are equivalent):
  rule :my_rule, "match", "this", "in", "order" # matches "matchthisinorder"
  rule :my_rule, ["match", "this", "in", "order"] # matches "matchthisinorder"
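The equivalence of the two calling forms comes down to unwrapping a single Array argument. A minimal standalone sketch, assuming two hypothetical helpers (normalize_pattern and match_in_order, neither part of the gem) and string elements only:

```ruby
# Hypothetical helper: accept a pattern either as separate arguments or as
# one Array, and return a flat Array of elements.
def normalize_pattern(*pattern)
  pattern = pattern[0] if pattern[0].is_a?(Array)
  pattern
end

# Hypothetical helper: match each string element consecutively, starting at
# offset. Returns the offset just past the last element, or nil on failure.
def match_in_order(src, elements, offset = 0)
  elements.each do |s|
    return nil unless src[offset, s.length] == s
    offset += s.length
  end
  offset
end

a = normalize_pattern("match", "this", "in", "order")
b = normalize_pattern(["match", "this", "in", "order"])
puts a == b   # both forms yield the same element list
```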
Pattern Elements
----------------

A pattern element is either a basic-pattern-element or an extended-pattern-element (expressed as a Hash). Internally, they are "compiled" into instances of PatternElement with optimized lambda functions for parsing.

basic-pattern-element:
  :my_rule matches the Rule named :my_rule
  :my_rule? optional: optionally matches Rule :my_rule
  :my_rule! negative: succeeds only if it DOESN'T match Rule :my_rule
  "string" matches the string exactly
  /regex/ matches the regex at the current parse position
  true always matches the empty string (useful as a no-op if you don't want to change the length of your pattern)
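The ? and ! suffixes are ordinary characters in a Ruby symbol, so a rule reference can be split into a bare name and a modifier with one regular expression. A standalone sketch, assuming a hypothetical helper named split_rule_reference that is not part of the gem's API:

```ruby
# Hypothetical helper: split a rule reference such as :my_rule? into the
# bare rule name and its modifier (:optional, :negative or nil).
def split_rule_reference(sym)
  m = /\A([^?!]*)([?!])?\z/.match(sym.to_s)
  raise ArgumentError, "invalid rule reference: #{sym.inspect}" unless m
  modifier = case m[2]
             when "?" then :optional
             when "!" then :negative
             end
  [m[1].to_sym, modifier]
end

p split_rule_reference(:my_rule?)
```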
extended-pattern-element:

  A Hash with :match or :parser set and zero or more additional options:

    :match => basic_element
      provide one of the basic elements above
      NOTE: Optional and Negative options are preserved, but they are overridden by any such directives in the Hash-Element

    :parser => lambda {|parent_node| ... }
      Custom lambda function for parsing the input.
      Return nil if no parse could be found; otherwise return a new Node, typically a TerminalNode.
      Make sure the returned node.next value is the index where you wish parsing to resume.

    :as => :my_name
      Assign a name to an element for later programmatic reference:
        rule_variant_node_class_instance.my_name

    :optionally => true
      PEG equivalent: term?
      turn this into an optional-match element
      optional elements cannot be negative

    :dont => true
      PEG equivalent: !term
      turn this into a negative-match element
      negative elements cannot be optional

    :could => true
      PEG equivalent: &term
      succeeds only if term matches, but does not advance the read pointer

    :many => PatternElement
      PEG equivalent: term+ (for "term*", use optionally + many)
      accept 1 or more repetitions of PatternElement, separated by the :delimiter pattern
      NOTE: :delimiter defaults to true, which means no delimiter (since true matches the empty string)

    :delimiter => PatternElement
      pattern to match between the :many patterns

    :post_delimiter => true # use the :delimiter PatternElement for final match
    :post_delimiter => PatternElement # use custom post_delimiter PatternElement for final match
      if set, then poly will match a delimiter after the last poly-match
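The interaction of :many, :delimiter and :post_delimiter is easier to follow as a loop. The sketch below is a standalone approximation built on Ruby's StringScanner, not the gem's implementation; scan_many and its keyword arguments are hypothetical names, the elements are plain Regexps, and the element pattern is assumed never to match the empty string:

```ruby
require "strscan"

# Standalone sketch: scan one or more occurrences of element, separated by
# delimiter. Returns the Array of matched items, or nil (with the scanner
# rewound) when the overall match fails.
def scan_many(scanner, element, delimiter: //, post_delimiter: nil)
  start = scanner.pos
  items = []
  while (item = scanner.scan(element))
    items << item
    before_delimiter = scanner.pos
    break unless scanner.scan(delimiter)
    # A delimiter only counts if another element follows it.
    unless scanner.check(element)
      scanner.pos = before_delimiter
      break
    end
  end

  # post_delimiter: true reuses the delimiter pattern for the final match.
  post = post_delimiter == true ? delimiter : post_delimiter
  failed = items.empty? || (post && !scanner.scan(post))
  if failed
    scanner.pos = start
    return nil
  end
  items
end

s = StringScanner.new("1,2,3;")
p scan_many(s, /\d+/, delimiter: /,/)   # stops before the ";"
```

The default delimiter is the empty pattern, which mirrors the README's note that an always-matching delimiter means the repetitions are simply adjacent.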
Structure
---------

Each Rule defines a subclass of Node.
Each RuleVariant defines a subclass of the parent Rule's node-class.

Therefore you can easily define code to be shared across all variants, as well
as code specific to one variant.
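The class layout described under Structure can be pictured with plain Ruby classes. This is a standalone sketch with hypothetical stand-in classes (SketchNode, NumberNode, NumberNode1, NumberNode2); it does not use the gem's Node classes and only shows how shared and variant-specific code would sit in such a hierarchy:

```ruby
# Hypothetical stand-in for the common Node base class.
class SketchNode
end

# One class per rule: code here is shared by every variant of the rule.
NumberNode = Class.new(SketchNode) do
  def label
    "a number"
  end

  def shout
    label.upcase
  end
end

# One subclass per variant: free to inherit or override the shared code.
NumberNode1 = Class.new(NumberNode)      # inherits label unchanged
NumberNode2 = Class.new(NumberNode) do   # overrides label for this variant
  def label
    "a negative number"
  end
end

puts NumberNode1.new.shout
puts NumberNode2.new.shout
```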
data/babel_bridge.gemspec
ADDED
@@ -0,0 +1,14 @@
$gemspec = Gem::Specification.new do |s|
  s.name = "babel_bridge"
  s.version = "0.1.0"
  s.author = "Shane Brinkman-Davis"
  s.date = "2010-11-28"
  s.email = "shanebdavis@gmail.com"
  s.homepage = "http://babel-bridge.rubyforge.org"
  s.platform = Gem::Platform::RUBY
  s.rubyforge_project = "babel-bridge"
  s.summary = "A Ruby-based parser-generator based on Parsing Expression Grammars."
  s.description = "Babel Bridge lets you generate parsers 100% in Ruby code. It is a memoizing Parsing Expression Grammar (PEG) generator like Treetop, but it doesn't require special file-types or new syntax. Overall focus is on simplicity and usability over performance."
  s.files = ["LICENSE", "README", "Rakefile", "babel_bridge.gemspec", "{test,lib,doc,examples}/**/*"].map{|p| Dir[p]}.flatten
  s.has_rdoc = false
end
data/lib/babel_bridge.rb
ADDED
@@ -0,0 +1,529 @@
=begin

See README

TODO-FEATURE: :pre_delimiter option
TODO-FEATURE: The "expecting" feature is so good I wonder if we should add the ability to automatically repair the parse!
  This would need:
    a) default values for regex terminals (string terminals are their own default values)
       default values for regex should be verified to match the regex
    b) an interactive prompter if there is more than one option

TODO-IMPROVEMENT: "Expecting" should show line numbers instead of char numbers, but they should only be calculated
  on demand. This means we need a smarter formatter for our possible-error-logging.
TODO-IMPROVEMENT: "Expecting" code lines dump should show line numbers

TODO-BUG: "Expecting" doesn't do the right thing if a "dont" clause matched
  Should say "something other than #{the don't clause}"
  Ideally, we would continue matching and list all the possible next clauses that would allow us to continue

IDEA: could use the "-" prefix operator to mean "dont":
  -"this"
  -:that
  -match(:foo)
  -many(:foo)

TODO-OPTIMIZATION: add memoizing (caching / dynamic-programming) to guarantee linear time parsing
  http://en.wikipedia.org/wiki/Parsing_expression_grammar#Implementing_parsers_from_parsing_expression_grammars
=end

require File.dirname(__FILE__) + "/nodes.rb"

class String
  def camelize
    self.split("_").collect {|a| a.capitalize}.join
  end

  def first_lines(n)
    lines=self.split("\n",-1)
    lines.length<=n ? self : lines[0..n-1].join("\n")
  end

  def last_lines(n)
    lines=self.split("\n",-1)
    lines.length<=n ? self : lines[-n..-1].join("\n")
  end

  def line_col(offset)
    lines=self[0..offset-1].split("\n")
    return lines.length, lines[-1].length
  end
end
module BabelBridge

  # hash which can be used declaratively
  class PatternElementHash < Hash
    def method_missing(method_name, *args) #method_name is a symbol
      return self if args.length==1 && !args[0] # if nil is provided, don't set anything
      self[method_name]=args[0] || true # on the other hand, if no args are provided, assume true
      self
    end
  end
  # PatternElement provides optimized parsing for each Element of a pattern
  # PatternElement provides all the logic for parsing:
  #   :many
  #   :optional
  class PatternElement
    attr_accessor :parser,:optional,:negative,:name,:terminal,:could_match
    attr_accessor :match,:rule_variant

    #match can be:
    # true, Hash, Symbol, String, Regexp
    def initialize(match,rule_variant)
      self.rule_variant=rule_variant
      init(match)

      raise "pattern element cannot be both :dont and :optional" if negative && optional
    end

    def to_s
      match.inspect
    end

    def parse(parent_node)
      # run element parser
      match=parser.call(parent_node)

      # Negative patterns (PEG: !element)
      match=match ? nil : EmptyNode.new(parent_node) if negative

      # Optional patterns (PEG: element?)
      match=EmptyNode.new(parent_node) if !match && optional

      # Could-match patterns (PEG: &element)
      match.match_length=0 if match && could_match

      # return match
      match
    end

    private

    def init(match)
      self.match=match
      case match
      when TrueClass then init_true
      when Hash then init_hash match
      when Symbol then init_rule match
      when String then init_string match
      when Regexp then init_regex match
      else raise "invalid pattern type: #{match.inspect}"
      end
    end

    def init_rule(rule_name)
      rule_name.to_s[/^([^?!]*)([?!])?$/]
      rule_name=$1.to_sym
      option=$2
      match_rule=rule_variant.rule.parser.rules[rule_name]
      raise "no rule for #{rule_name}" unless match_rule

      self.parser =lambda {|parent_node| match_rule.parse(parent_node)}
      self.name = rule_name
      case option
      when "?" then self.optional=true
      when "!" then self.negative=true
      end
    end

    def init_hash(hash)
      if hash[:parser]
        self.parser=hash[:parser]
      elsif hash[:many]
        init hash[:many]
        #generate parser for poly
        delimiter_pattern_element= PatternElement.new(hash[:delimiter]||true,rule_variant)

        post_delimiter_element=case hash[:post_delimiter]
          when TrueClass then delimiter_pattern_element
          when nil then nil
          else PatternElement.new(hash[:post_delimiter],rule_variant)
        end

        # convert the single element parser into a poly-parser
        single_parser=parser
        self.parser= lambda do |parent_node|
          last_match=single_parser.call(parent_node)
          many_node=ManyNode.new(parent_node)
          while last_match
            many_node<<last_match

            #match delimiter
            delimiter_match=delimiter_pattern_element.parse(many_node)
            break unless delimiter_match
            many_node.delimiter_matches<<delimiter_match

            #match next
            last_match=single_parser.call(many_node)
          end

          # success only if we have at least one match
          return nil unless many_node.length>0

          # pop the post delimiter matched with delimiter_pattern_element
          many_node.delimiter_matches.pop if many_node.length==many_node.delimiter_matches.length

          # If post_delimiter is requested, many_node and delimiter_matches must be the same length
          if post_delimiter_element
            post_delimiter_match=post_delimiter_element.parse(many_node)

            # fail if post_delimiter didn't match
            return nil unless post_delimiter_match
            many_node.delimiter_matches<<post_delimiter_match
          end

          many_node
        end
      elsif hash[:match]
        init hash[:match]
      else
        raise "extended-options patterns (specified by a hash) must have either :parser=> or a :match=> set"
      end

      self.name = hash[:as] || self.name
      self.optional ||= hash[:optional] || hash[:optionally]
      self.could_match ||= hash[:could]
      self.negative ||= hash[:dont]

    end

    # "true" parser always matches the empty string
    def init_true
      self.parser=lambda {|parent_node| EmptyNode.new(parent_node)}
    end

    # parser that matches exactly the string specified
    def init_string(string)
      self.parser=lambda {|parent_node| parent_node.src[parent_node.next,string.length]==string && TerminalNode.new(parent_node,string.length,string)}
      self.terminal=true
    end

    # parser that matches the given regex
    def init_regex(regex)
      self.parser=lambda {|parent_node| offset=parent_node.next;parent_node.src.index(regex,offset)==offset && (o=$~.offset(0)) && TerminalNode.new(parent_node,o[1]-o[0],regex)}
      self.terminal=true
    end

  end

  # Each Rule has one or more RuleVariant
  # Rules attempt to match each of their Variants in order. The first one to succeed returns its node and the Rule succeeds.
  class RuleVariant
    attr_accessor :pattern,:rule,:node_class

    def initialize(pattern,rule,node_class=nil)
      self.pattern=pattern
      self.rule=rule
      self.node_class=node_class
    end

    def inspect
      pattern.collect{|a|a.inspect}.join(', ')
    end

    def to_s
      "variant_class: #{node_class}, pattern: #{inspect}"
    end

    # convert the pattern into a set of lambda functions
    def pattern_elements
      @pattern_elements||=pattern.collect { |match| PatternElement.new match, self }
    end

    # returns a Node object if it matches, nil otherwise
    def parse(parent_node)
      #return parse_nongreedy_optional(src,offset,parent_node) # nongreedy optionals break standard PEG
      node=node_class.new(parent_node)

      pattern_elements.each do |pe|
        match=pe.parse(node)

        # if parse failed
        if !match
          if pe.terminal
            # log failures on Terminal patterns for debug output if overall parse fails
            node.parser.log_parsing_failure(node.next,:pattern=>pe.match,:node=>node)
          end
          return nil
        end

        # parse succeeded, add to node and continue
        node.add_match(match,pe.name)
      end
      node
    end
  end
  # Rules define one or more patterns (RuleVariants) to match for a given non-terminal
  class Rule
    attr_accessor :name,:variants,:parser,:node_class

    def initialize(name,parser)
      self.name=name
      self.variants=[]
      self.parser=parser

      class_name = "#{parser.module_name}_#{name}_node".camelize
      self.node_class = parser.const_set(class_name,Class.new(NodeNT))
    end

    def add_variant(pattern, &block)

      rule_variant_class_name = "#{name}_node#{self.variants.length+1}".camelize
      rule_variant_class = parser.const_set(rule_variant_class_name,Class.new(node_class))
      self.variants << RuleVariant.new(pattern,self,rule_variant_class)
      rule_variant_class.class_eval &block if block
      rule_variant_class
    end

    def parse(node)
      if cached=node.parser.cached(name,node.next)
        return cached==:no_match ? nil : cached # return nil if cached==:no_match
      end

      variants.each do |v|
        match=v.parse(node)
        if match
          node.parser.cache_match(name,match)
          return match
        end
      end
      node.parser.cache_no_match(name,node.next)
      nil
    end

    # inspect returns a string which approximates the syntax for generating the rule and all its variants
    def inspect
      variants.collect do |v|
        "rule #{name.inspect}, #{v.inspect}"
      end.join("\n")
    end

    # returns a more human-readable explanation of the rule
    def to_s
      "rule #{name.inspect}, node_class: #{node_class}\n\t"+
      "#{variants.collect {|v|v.to_s}.join("\n\t")}"
    end
  end
  # primary object used by the client
  # Used to generate the grammar with .rule methods
  # Used to parse with .parse
  class Parser

    # Parser sub-class grammar definition
    # These methods are used in the creation of a Parser Sub-Class to define
    # its grammar
    class <<self
      attr_accessor :rules,:module_name,:root_rule

      def rules
        @rules||={}
      end
      # rules can be specified as:
      #   parser.rule :name, to_match1, to_match2, etc...
      #or
      #   parser.rule :name, [to_match1, to_match2, etc...]
      def rule(name,*pattern,&block)
        pattern=pattern[0] if pattern[0].kind_of?(Array)
        rule=self.rules[name]||=Rule.new(name,self)
        self.root_rule||=name
        rule.add_variant(pattern,&block)
      end

      def node_class(name,&block)
        klass=self.rules[name].node_class
        return klass unless block
        klass.class_eval &block
      end

      def [](i)
        rules[i]
      end

      # rule must be the symbol name of one of the rules already defined in rules
      def root_rule=(rule)
        raise "Symbol required" unless rule.kind_of?(Symbol)
        raise "rule #{rule.inspect} not found" unless rules[rule]
        @root_rule=rule
      end
    end
    #*********************************************
    # pattern construction tools
    #
    # Ex:
    #   # match 'keyword'
    #   # (succeeds if keyword is matched; advances the read pointer)
    #   rule :sample_rule, "keyword"
    #   rule :sample_rule, match("keyword")
    #
    #   # don't match 'keyword'
    #   # (succeeds only if keyword is NOT matched; does not advance the read pointer)
    #   rule :sample_rule, match!("keyword")
    #   rule :sample_rule, dont.match("keyword")
    #
    #   # optionally match 'keyword'
    #   # (always succeeds; advances the read pointer if keyword is matched)
    #   rule :sample_rule, match?("keyword")
    #   rule :sample_rule, optionally.match("keyword")
    #
    #   # ensure we could match 'keyword'
    #   # (succeeds only if keyword is matched, but does not advance the read pointer)
    #   rule :sample_rule, could.match("keyword")
    #

    # dont.match("keyword") #
    #*********************************************
    class <<self
      def many(m,delimiter=nil,post_delimiter=nil) PatternElementHash.new.match.many(m).delimiter(delimiter).post_delimiter(post_delimiter) end
      def many?(m,delimiter=nil,post_delimiter=nil) PatternElementHash.new.optionally.match.many(m).delimiter(delimiter).post_delimiter(post_delimiter) end
      def many!(m,delimiter=nil,post_delimiter=nil) PatternElementHash.new.dont.match.many(m).delimiter(delimiter).post_delimiter(post_delimiter) end

      def match?(*args) PatternElementHash.new.optionally.match(*args) end
      def match(*args) PatternElementHash.new.match(*args) end
      def match!(*args) PatternElementHash.new.dont.match(*args) end

      def dont; PatternElementHash.new.dont end
      def optionally; PatternElementHash.new.optionally end
      def could; PatternElementHash.new.could end
    end

    #*********************************************
    #*********************************************
    # parser instance implementation
    # these methods are used for each actual parse run
    # they are tied to an instance of the Parser Sub-class so you can have more than one
    # parser active at a time
    attr_accessor :failure_index
    attr_accessor :expecting_list
    attr_accessor :src
    attr_accessor :parse_cache

    def initialize
      reset_parser_tracking
    end

    def reset_parser_tracking
      self.src=nil
      self.failure_index=0
      self.expecting_list={}
      self.parse_cache={}
    end

    def cached(rule_class,offset)
      (parse_cache[rule_class]||={})[offset]
    end

    def cache_match(rule_class,match)
      (parse_cache[rule_class]||={})[match.offset]=match
    end

    def cache_no_match(rule_class,offset)
      (parse_cache[rule_class]||={})[offset]=:no_match
    end

    def log_parsing_failure(index,expecting)
      if index>failure_index
        key=expecting[:pattern]
        @expecting_list={key=>expecting}
        @failure_index = index
      elsif index == failure_index
        key=expecting[:pattern]
        self.expecting_list[key]=expecting
      else
        # ignored
      end
    end


    def parse(src,offset=0,rule=nil)
      reset_parser_tracking
      @start_time=Time.now
      self.src=src
      root_node=RootNode.new(self)
      ret=self.class[rule||self.class.root_rule].parse(root_node)
      unless rule
        if ret
          if ret.next<src.length # parse only succeeds if the whole input is matched
            @parsing_did_not_match_entire_input=true
            @failure_index=ret.next
            ret=nil
          else
            reset_parser_tracking
          end
        end
      end
      @end_time=Time.now
      ret
    end

    def parse_time
      @end_time-@start_time
    end

    def parse_and_puts_errors(src,out=$stdout)
      ret=parse(src)
      unless ret
        out.puts parser_failure_info
      end
      ret
    end

    def node_list_string(node_list,common_root=[])
      node_list && node_list[common_root.length..-1].map{|p|"#{p.class}(#{p.offset})"}.join(" > ")
    end

    def parser_failure_info
      return unless src
      bracketing_lines=5
      line,col=src.line_col(failure_index)
      ret=<<-ENDTXT
Parsing error at line #{line} column #{col} offset #{failure_index}

Source:
...
#{(failure_index==0 ? "" : src[0..(failure_index-1)]).last_lines(bracketing_lines)}<HERE>#{src[(failure_index)..-1].first_lines(bracketing_lines)}
...
ENDTXT

      if @parsing_did_not_match_entire_input
        ret+="\nParser did not match entire input."
      else

        common_root=nil
        expecting_list.values.each do |e|
          node=e[:node]
          pl=node.parent_list
          if common_root
            common_root.each_index do |i|
              if pl[i]!=common_root[i]
                common_root=common_root[0..i-1]
                break
              end
            end
          else
            common_root=node.parent_list
          end
        end
        ret+=<<ENDTXT

Successfully matched rules up to failure:
#{node_list_string(common_root)}

Expecting#{expecting_list.length>1 ? ' one of' : ''}:
#{expecting_list.values.collect do |a|
  list=node_list_string(a[:node].parent_list,common_root)
  [list,"#{a[:pattern].inspect} (#{list})"]
end.sort.map{|i|i[1]}.join("\n ")}
ENDTXT
      end
      ret
    end
  end
end