babel_bridge 0.1.0
- data/README +144 -0
- data/babel_bridge.gemspec +14 -0
- data/lib/babel_bridge.rb +529 -0
- data/lib/nodes.rb +256 -0
- data/test/test_bb.rb +387 -0
- data/test/test_helper.rb +44 -0
- metadata +60 -0
data/README
ADDED
@@ -0,0 +1,144 @@
|
|
1
|
+
Summary
|
2
|
+
-------
|
3
|
+
|
4
|
+
Babel Bridge lets you generate parsers 100% in Ruby code. It is a memoizing Parsing Expression Grammar (PEG) generator like Treetop, but it doesn't require special file-types or new syntax. Overall focus is on simplicity and usability over performance.
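To illustrate what "memoizing" means for a PEG parser, here is a minimal standalone sketch. It is not Babel Bridge's own code: the class name MemoSketch and its methods apply and calls are hypothetical. The idea is that a result is cached per rule name and input offset, so a rule is never evaluated twice at the same position.

```ruby
# Standalone sketch of packrat-style memoization; all names are hypothetical.
class MemoSketch
  attr_reader :calls

  def initialize(src)
    @src = src
    @cache = {}   # {rule_name => {offset => match length or :no_match}}
    @calls = 0    # counts how often a rule body actually runs
  end

  # Runs the block for (rule_name, offset) only once; later calls hit the cache.
  def apply(rule_name, offset)
    per_rule = (@cache[rule_name] ||= {})
    if per_rule.key?(offset)
      cached = per_rule[offset]
      return cached == :no_match ? nil : cached
    end

    @calls += 1
    result = yield(@src, offset)
    per_rule[offset] = result || :no_match
    result
  end
end

memo = MemoSketch.new("foobar")
match_foo = lambda { |src, offset| src[offset, 3] == "foo" ? 3 : nil }

first  = memo.apply(:foo, 0, &match_foo)   # runs the lambda
second = memo.apply(:foo, 0, &match_foo)   # served from the cache
miss   = memo.apply(:foo, 3, &match_foo)   # "bar" is not "foo", cached as :no_match
```

Failures are cached as well as successes, which is what keeps repeated backtracking over the same position cheap.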
|
5
|
+
|
6
|
+
Example
|
7
|
+
-------
|
8
|
+
|
9
|
+
require "babel_bridge"
|
10
|
+
class MyParser < BabelBridge::Parser
|
11
|
+
rule :foo, "foo", :bar? # match "foo" optionally followed by the :bar
|
12
|
+
rule :bar, "bar" # match "bar"
|
13
|
+
end
|
14
|
+
|
15
|
+
MyParser.new.parse("foo") # matches "foo"
|
16
|
+
MyParser.new.parse("foobar") # matches "foobar"
|
17
|
+
|
18
|
+
Babel Bridge is a parser-generator for Parsing Expression Grammars
|
19
|
+
|
20
|
+
Goals
|
21
|
+
-----
|
22
|
+
|
23
|
+
Allow expression 100% in ruby
|
24
|
+
Productivity through Simplicity and Understandability first
|
25
|
+
Performance second
|
26
|
+
|
27
|
+
Features
|
28
|
+
--------
|
29
|
+
|
30
|
+
rule=MyParser[:foo] # returns the BabelBridge::Rule instance for that rule
|
31
|
+
|
32
|
+
rule.to_s
|
33
|
+
nice human-readable view of the rule with extra info
|
34
|
+
|
35
|
+
rule.inspect
|
36
|
+
returns the code necessary for generating the rule and all its variants
|
37
|
+
(minus any class_eval code)
|
38
|
+
|
39
|
+
MyParser.node_class(rule)
|
40
|
+
returns the Node class for a rule
|
41
|
+
|
42
|
+
MyParser.node_class(rule) do
|
43
|
+
# class_eval inside the rule's Node-class
|
44
|
+
end
|
45
|
+
|
46
|
+
MyParser.new.parse(text)
|
47
|
+
# parses Text starting with the MyParser.root_rule
|
48
|
+
# The root_rule is defined automatically by the first rule defined, but can be set by:
|
49
|
+
# MyParser.root_rule=v # where v is the symbol name of the rule (the setter raises "Symbol required" for anything else)
|
50
|
+
MyParser.new.parse(text,offset,rule) # only has to match the rule - it's ok if there is input left
|
51
|
+
parser.parse uses the root_rule
|
52
|
+
|
53
|
+
detailed parser_failure_info report
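A failure report of this kind is usually built by remembering the furthest input position at which a terminal failed to match, together with everything that was expected there. The following is a minimal standalone sketch of that bookkeeping, under the assumption that failures are reported as (index, pattern) pairs; FailureLogSketch and its method names are hypothetical and are not the library's API.

```ruby
# Standalone sketch of furthest-failure tracking; names are hypothetical.
class FailureLogSketch
  attr_reader :failure_index, :expecting

  def initialize
    @failure_index = 0
    @expecting = []
  end

  # Record that `pattern` was expected at `index` but did not match.
  def log(index, pattern)
    if index > @failure_index
      @failure_index = index        # a deeper failure replaces earlier ones
      @expecting = [pattern]
    elsif index == @failure_index
      @expecting << pattern unless @expecting.include?(pattern)
    end                             # shallower failures are ignored
  end
end

log = FailureLogSketch.new
log.log(0, "foo")
log.log(3, "bar")
log.log(3, /baz/)
log.log(1, "qux")
```

After these calls the log points at offset 3 and lists both patterns that were tried there, which is the information an "Expecting one of:" message needs.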
|
54
|
+
|
55
|
+
Defining Rules
|
56
|
+
--------------
|
57
|
+
|
58
|
+
Inside the parser class, a rule is defined as follows:
|
59
|
+
|
60
|
+
class MyParser < BabelBridge::Parser
|
61
|
+
rule :rule_name, pattern
|
62
|
+
end
|
63
|
+
|
64
|
+
Where:
|
65
|
+
|
66
|
+
:rule_name is a symbol
|
67
|
+
pattern see Patterns below
|
68
|
+
|
69
|
+
You can also add new rules outside the class definition by:
|
70
|
+
|
71
|
+
MyParser.rule :rule_name, pattern
|
72
|
+
|
73
|
+
Patterns
|
74
|
+
--------
|
75
|
+
|
76
|
+
Patterns are an Array of pattern elements, matched in order:
|
77
|
+
|
78
|
+
Ex (both are equivalent):
|
79
|
+
rule :my_rule, "match", "this", "in", "order" # matches "matchthisinorder"
|
80
|
+
rule :my_rule, ["match", "this", "in", "order"] # matches "matchthisinorder"
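The two forms can be treated as the same pattern by unwrapping a leading Array argument. A minimal sketch of that normalization follows; normalize_pattern is a hypothetical helper name used only for illustration.

```ruby
# Hypothetical helper: a splat of elements and a single Array
# both end up as the same flat list of pattern elements.
def normalize_pattern(*pattern)
  pattern = pattern[0] if pattern[0].kind_of?(Array)
  pattern
end

splat_form = normalize_pattern("match", "this", "in", "order")
array_form = normalize_pattern(["match", "this", "in", "order"])
```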
|
81
|
+
|
82
|
+
Pattern Elements
|
83
|
+
----------------
|
84
|
+
|
85
|
+
Each pattern element is either a basic-pattern-element or an extended-pattern-element (expressed as a Hash). Internally, they are "compiled" into instances of PatternElement with optimized lambda functions for parsing.
|
86
|
+
|
87
|
+
basic-pattern-element:
|
88
|
+
:my_rule matches the Rule named :my_rule
|
89
|
+
:my_rule? optional: optionally matches Rule :my_rule
|
90
|
+
:my_rule! negative: success only if it DOESN'T match Rule :my_rule
|
91
|
+
"string" matches the string exactly
|
92
|
+
/regex/ matches the regex exactly
|
93
|
+
true always matches the empty string (useful as a no-op if you don't want to change the length of your pattern)
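The semantics of the basic elements listed above can be sketched without the library. This is a minimal illustration, assuming each matcher is a lambda that takes (src, offset) and returns the matched length or nil. The helper names (string_matcher, regex_matcher, optional, negative, match_sequence) are hypothetical and are not part of Babel Bridge.

```ruby
# Standalone sketch of basic element semantics; not Babel Bridge's internals.
# Each matcher takes (src, offset) and returns the match length or nil.

def string_matcher(string)
  lambda { |src, offset| src[offset, string.length] == string ? string.length : nil }
end

def regex_matcher(regex)
  lambda do |src, offset|
    m = regex.match(src, offset)
    # Accept the match only if it starts exactly at the current offset.
    m && m.begin(0) == offset ? m[0].length : nil
  end
end

# element? : always succeeds, consuming input only if the element matches.
def optional(matcher)
  lambda { |src, offset| matcher.call(src, offset) || 0 }
end

# element! : succeeds (consuming nothing) only if the element does NOT match.
def negative(matcher)
  lambda { |src, offset| matcher.call(src, offset) ? nil : 0 }
end

# Match elements in order; return the total length matched or nil.
def match_sequence(src, matchers, offset = 0)
  start = offset
  matchers.each do |matcher|
    length = matcher.call(src, offset)
    return nil unless length
    offset += length
  end
  offset - start
end

foo = string_matcher("foo")
bar = string_matcher("bar")
digits = regex_matcher(/[0-9]+/)
always_match = lambda { |_src, _offset| 0 }   # plays the role of `true`

with_optional = match_sequence("foobar", [foo, optional(bar)])
with_negative = match_sequence("foobar", [foo, negative(bar)])
```

Because 0 is truthy in Ruby, a zero-length match counts as success, which is what lets optional, negative, and the always-matching element fit into a sequence without consuming input.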
|
94
|
+
|
95
|
+
extended-pattern-element:
|
96
|
+
|
97
|
+
A Hash with :match or :parser set and zero or more additional options:
|
98
|
+
|
99
|
+
:match => basic_element
|
100
|
+
provide one of the basic elements above
|
101
|
+
NOTE: Optional and Negative options are preserved, but they are overridden by any such directives in the Hash-Element
|
102
|
+
|
103
|
+
:parser => lambda {|parent_node| ... }
|
104
|
+
Custom lambda function for parsing the input.
|
105
|
+
Return nil if no parse could be found; otherwise return a new Node, typically a TerminalNode
|
106
|
+
Make sure the returned node.next value is the index where you wish parsing to resume
|
107
|
+
|
108
|
+
:as => :my_name
|
109
|
+
Assign a name to an element for later programmatic reference:
|
110
|
+
rule_variant_node_class_instance.my_name
|
111
|
+
|
112
|
+
:optionally => true
|
113
|
+
PEG equivalent: term?
|
114
|
+
turn this into an optional-match element
|
115
|
+
optional elements cannot be negative
|
116
|
+
|
117
|
+
:dont => true
|
118
|
+
PEG equivalent: !term
|
119
|
+
turn this into a Negative-match element
|
120
|
+
negative elements cannot be optional
|
121
|
+
|
122
|
+
:could => true
|
123
|
+
PEG equivalent: &term
|
124
|
+
|
125
|
+
:many => PatternElement
|
126
|
+
PEG equivalent: term+ (for "term*", use optionally + many)
|
127
|
+
accept 1 or more repetitions of this element delimited by PatternElement
|
128
|
+
NOTE: PatternElement can be "true" for no delimiter (since "true" matches the empty string)
|
129
|
+
|
130
|
+
:delimiter => PatternElement
|
131
|
+
pattern to match between the :many patterns
|
132
|
+
|
133
|
+
:post_delimiter => true # use the :delimiter PatternElement for final match
|
134
|
+
:post_delimiter => PatternElement # use custom post_delimiter PatternElement for final match
|
135
|
+
if true, then poly will match a delimiter after the last poly-match
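How :many, :delimiter and :post_delimiter interact can be sketched as a standalone loop. This is an illustration under stated assumptions, not the library's implementation: matchers are lambdas taking (src, offset) and returning a length or nil, the element is assumed to consume at least one character, and match_many is a hypothetical name.

```ruby
# Standalone sketch of delimited repetition; all names are hypothetical.
# element, delimiter, post_delimiter: lambdas (src, offset) -> length or nil.
# Assumes element consumes at least one character when it matches.
# Returns [match_count, end_offset], or nil if no element matched.
def match_many(src, offset, element, delimiter, post_delimiter = nil)
  count = 0
  length = element.call(src, offset)
  while length
    count += 1
    offset += length

    delimiter_length = delimiter.call(src, offset)
    break unless delimiter_length

    # Only consume the delimiter if another element follows it.
    length = element.call(src, offset + delimiter_length)
    offset += delimiter_length if length
  end
  return nil if count.zero?

  if post_delimiter
    post_length = post_delimiter.call(src, offset)
    return nil unless post_length   # a requested post delimiter is mandatory
    offset += post_length
  end
  [count, offset]
end

digit = lambda { |src, offset| src[offset, 1].to_s =~ /\A[0-9]\z/ ? 1 : nil }
comma = lambda { |src, offset| src[offset, 1] == "," ? 1 : nil }
empty = lambda { |_src, _offset| 0 }   # "no delimiter", like `true`

plain    = match_many("1,2,3", 0, digit, comma)
trailing = match_many("1,2,", 0, digit, comma)
with_end = match_many("1,2,", 0, digit, comma, comma)
```

In this sketch a trailing delimiter is left unconsumed unless a post delimiter is requested, so "1,2," matches two elements and stops at offset 3 in the plain case and at offset 4 when the post delimiter is required.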
|
136
|
+
|
137
|
+
Structure
|
138
|
+
---------
|
139
|
+
|
140
|
+
Each Rule defines a subclass of Node
|
141
|
+
Each RuleVariant defines a subclass of the parent Rule's node-class
|
142
|
+
|
143
|
+
Therefore you can easily define code to be shared across all variants as well
|
144
|
+
as define code specific to one variant.
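The class layout described above can be pictured with plain Ruby classes. This is a minimal sketch in which Node, AddNode, AddNode1 and AddNode2 are stand-in names chosen for illustration; the real classes are generated by the library.

```ruby
# Standalone sketch of the class layout; Node and the class names are stand-ins.
class Node; end

# One class per rule: behavior shared by every variant goes here.
class AddNode < Node
  def describe
    "add rule, #{variant_name}"
  end
end

# One subclass per variant: variant-specific behavior goes here.
class AddNode1 < AddNode
  def variant_name
    "variant 1"
  end
end

class AddNode2 < AddNode
  def variant_name
    "variant 2"
  end
end

first_variant  = AddNode1.new.describe
second_variant = AddNode2.new.describe
```

Shared methods live on the rule-level class and are inherited by each variant, while a variant can add or override methods of its own.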
|
data/babel_bridge.gemspec
ADDED
@@ -0,0 +1,14 @@
|
|
1
|
+
$gemspec = Gem::Specification.new do |s|
|
2
|
+
s.name = "babel_bridge"
|
3
|
+
s.version = "0.1.0"
|
4
|
+
s.author = "Shane Brinkman-Davis"
|
5
|
+
s.date = "2010-11-28"
|
6
|
+
s.email = "shanebdavis@gmail.com"
|
7
|
+
s.homepage = "http://babel-bridge.rubyforge.org"
|
8
|
+
s.platform = Gem::Platform::RUBY
|
9
|
+
s.rubyforge_project = "babel-bridge"
|
10
|
+
s.summary = "A Ruby-based parser-generator based on Parsing Expression Grammars."
|
11
|
+
s.description = "Babel Bridge lets you generate parsers 100% in Ruby code. It is a memoizing Parsing Expression Grammar (PEG) generator like Treetop, but it doesn't require special file-types or new syntax. Overall focus is on simplicity and usability over performance."
|
12
|
+
s.files = ["LICENSE", "README", "Rakefile", "babel_bridge.gemspec", "{test,lib,doc,examples}/**/*"].map{|p| Dir[p]}.flatten
|
13
|
+
s.has_rdoc = false
|
14
|
+
end
|
data/lib/babel_bridge.rb
ADDED
@@ -0,0 +1,529 @@
|
|
1
|
+
=begin
|
2
|
+
|
3
|
+
See README
|
4
|
+
|
5
|
+
TODO-FEATURE: :pre_delimiter option
|
6
|
+
TODO-FEATURE: The "expecting" feature is so good I wonder if we should add the ability to automatically repair the parse!
|
7
|
+
This would need:
|
8
|
+
a) default values for regex terminals (string terminals are their own default values)
|
9
|
+
default values for regex should be verified to match the regex
|
10
|
+
b) an interactive prompter if there is more than one option
|
11
|
+
|
12
|
+
TODO-IMPROVEMENT: "Expecting" should show line numbers instead of char numbers, but it should only be calculated
|
13
|
+
on demand. This means we need a smarter formatter for our possible-error-logging.
|
14
|
+
TODO-IMPROVEMENT: "Expecting" code lines dump should show line numbers
|
15
|
+
|
16
|
+
TODO-BUG: "Expecting" doesn't do the right thing if a "dont" clause matched
|
17
|
+
Should say "something other than #{the don't clause}"
|
18
|
+
Ideally, we would continue matching and list all the possible next clauses that would allow us to continue
|
19
|
+
|
20
|
+
IDEA: could use the "-" prefix operator to mean "dont":
|
21
|
+
-"this"
|
22
|
+
-:that
|
23
|
+
-match(:foo)
|
24
|
+
-many(:foo)
|
25
|
+
|
26
|
+
TODO-OPTIMIZATION: add memoizing (caching / dynamic-programming) to guarantee linear time parsing
|
27
|
+
http://en.wikipedia.org/wiki/Parsing_expression_grammar#Implementing_parsers_from_parsing_expression_grammars
|
28
|
+
=end
|
29
|
+
|
30
|
+
require File.dirname(__FILE__) + "/nodes.rb"
|
31
|
+
|
32
|
+
class String
|
33
|
+
def camelize
|
34
|
+
self.split("_").collect {|a| a.capitalize}.join
|
35
|
+
end
|
36
|
+
|
37
|
+
def first_lines(n)
|
38
|
+
lines=self.split("\n",-1)
|
39
|
+
lines.length<=n ? self : lines[0..n-1].join("\n")
|
40
|
+
end
|
41
|
+
|
42
|
+
def last_lines(n)
|
43
|
+
lines=self.split("\n",-1)
|
44
|
+
lines.length<=n ? self : lines[-n..-1].join("\n")
|
45
|
+
end
|
46
|
+
|
47
|
+
def line_col(offset)
|
48
|
+
lines=self[0..offset-1].split("\n")
|
49
|
+
return lines.length, lines[-1].length
|
50
|
+
end
|
51
|
+
end
|
52
|
+
|
53
|
+
module BabelBridge
|
54
|
+
|
55
|
+
# hash which can be used declaratively
|
56
|
+
class PatternElementHash < Hash
|
57
|
+
def method_missing(method_name, *args) #method_name is a symbol
|
58
|
+
return self if args.length==1 && !args[0] # if nil is provided, don't set anything
|
59
|
+
self[method_name]=args[0] || true # on the other hand, if no args are provided, assume true
|
60
|
+
self
|
61
|
+
end
|
62
|
+
end
|
63
|
+
|
64
|
+
# PatternElement provides optimized parsing for each Element of a pattern
|
65
|
+
# PatternElement provides all the logic for parsing:
|
66
|
+
# :many
|
67
|
+
# :optional
|
68
|
+
class PatternElement
|
69
|
+
attr_accessor :parser,:optional,:negative,:name,:terminal,:could_match
|
70
|
+
attr_accessor :match,:rule_variant
|
71
|
+
|
72
|
+
#match can be:
|
73
|
+
# true, Hash, Symbol, String, Regexp
|
74
|
+
def initialize(match,rule_variant)
|
75
|
+
self.rule_variant=rule_variant
|
76
|
+
init(match)
|
77
|
+
|
78
|
+
raise "pattern element cannot be both :dont and :optional" if negative && optional
|
79
|
+
end
|
80
|
+
|
81
|
+
def to_s
|
82
|
+
match.inspect
|
83
|
+
end
|
84
|
+
|
85
|
+
def parse(parent_node)
|
86
|
+
# run element parser
|
87
|
+
match=parser.call(parent_node)
|
88
|
+
|
89
|
+
# Negative patterns (PEG: !element)
|
90
|
+
match=match ? nil : EmptyNode.new(parent_node) if negative
|
91
|
+
|
92
|
+
# Optional patterns (PEG: element?)
|
93
|
+
match=EmptyNode.new(parent_node) if !match && optional
|
94
|
+
|
95
|
+
# Could-match patterns (PEG: &element)
|
96
|
+
match.match_length=0 if match && could_match
|
97
|
+
|
98
|
+
# return match
|
99
|
+
match
|
100
|
+
end
|
101
|
+
|
102
|
+
private
|
103
|
+
|
104
|
+
def init(match)
|
105
|
+
self.match=match
|
106
|
+
case match
|
107
|
+
when TrueClass then init_true
|
108
|
+
when Hash then init_hash match
|
109
|
+
when Symbol then init_rule match
|
110
|
+
when String then init_string match
|
111
|
+
when Regexp then init_regex match
|
112
|
+
else raise "invalid pattern type: #{match.inspect}"
|
113
|
+
end
|
114
|
+
end
|
115
|
+
|
116
|
+
def init_rule(rule_name)
|
117
|
+
rule_name.to_s[/^([^?!]*)([?!])?$/]
|
118
|
+
rule_name=$1.to_sym
|
119
|
+
option=$2
|
120
|
+
match_rule=rule_variant.rule.parser.rules[rule_name]
|
121
|
+
raise "no rule for #{rule_name}" unless match_rule
|
122
|
+
|
123
|
+
self.parser =lambda {|parent_node| match_rule.parse(parent_node)}
|
124
|
+
self.name = rule_name
|
125
|
+
case option
|
126
|
+
when "?" then self.optional=true
|
127
|
+
when "!" then self.negative=true
|
128
|
+
end
|
129
|
+
end
|
130
|
+
|
131
|
+
def init_hash(hash)
|
132
|
+
if hash[:parser]
|
133
|
+
self.parser=hash[:parser]
|
134
|
+
elsif hash[:many]
|
135
|
+
init hash[:many]
|
136
|
+
#generate parser for poly
|
137
|
+
delimiter_pattern_element= PatternElement.new(hash[:delimiter]||true,rule_variant)
|
138
|
+
|
139
|
+
post_delimiter_element=case hash[:post_delimiter]
|
140
|
+
when TrueClass then delimiter_pattern_element
|
141
|
+
when nil then nil
|
142
|
+
else PatternElement.new(hash[:post_delimiter],rule_variant)
|
143
|
+
end
|
144
|
+
|
145
|
+
# convert the single element parser into a poly-parser
|
146
|
+
single_parser=parser
|
147
|
+
self.parser= lambda do |parent_node|
|
148
|
+
last_match=single_parser.call(parent_node)
|
149
|
+
many_node=ManyNode.new(parent_node)
|
150
|
+
while last_match
|
151
|
+
many_node<<last_match
|
152
|
+
|
153
|
+
#match delimiter
|
154
|
+
delimiter_match=delimiter_pattern_element.parse(many_node)
|
155
|
+
break unless delimiter_match
|
156
|
+
many_node.delimiter_matches<<delimiter_match
|
157
|
+
|
158
|
+
#match next
|
159
|
+
last_match=single_parser.call(many_node)
|
160
|
+
end
|
161
|
+
|
162
|
+
# success only if we have at least one match
|
163
|
+
return nil unless many_node.length>0
|
164
|
+
|
165
|
+
# pop the post delimiter matched with delimiter_pattern_element
|
166
|
+
many_node.delimiter_matches.pop if many_node.length==many_node.delimiter_matches.length
|
167
|
+
|
168
|
+
# If post_delimiter is requested, many_node and delimiter_matches must be the same length
|
169
|
+
if post_delimiter_element
|
170
|
+
post_delimiter_match=post_delimiter_element.parse(many_node)
|
171
|
+
|
172
|
+
# fail if post_delimiter didn't match
|
173
|
+
return nil unless post_delimiter_match
|
174
|
+
many_node.delimiter_matches<<post_delimiter_match
|
175
|
+
end
|
176
|
+
|
177
|
+
many_node
|
178
|
+
end
|
179
|
+
elsif hash[:match]
|
180
|
+
init hash[:match]
|
181
|
+
else
|
182
|
+
raise "extended-options patterns (specified by a hash) must have either :parser=> or a :match=> set"
|
183
|
+
end
|
184
|
+
|
185
|
+
self.name = hash[:as] || self.name
|
186
|
+
self.optional ||= hash[:optional] || hash[:optionally]
|
187
|
+
self.could_match ||= hash[:could]
|
188
|
+
self.negative ||= hash[:dont]
|
189
|
+
|
190
|
+
end
|
191
|
+
|
192
|
+
# "true" parser always matches the empty string
|
193
|
+
def init_true
|
194
|
+
self.parser=lambda {|parent_node| EmptyNode.new(parent_node)}
|
195
|
+
end
|
196
|
+
|
197
|
+
# parser that matches exactly the string specified
|
198
|
+
def init_string(string)
|
199
|
+
self.parser=lambda {|parent_node| parent_node.src[parent_node.next,string.length]==string && TerminalNode.new(parent_node,string.length,string)}
|
200
|
+
self.terminal=true
|
201
|
+
end
|
202
|
+
|
203
|
+
# parser that matches the given regex
|
204
|
+
def init_regex(regex)
|
205
|
+
self.parser=lambda {|parent_node| offset=parent_node.next;parent_node.src.index(regex,offset)==offset && (o=$~.offset(0)) && TerminalNode.new(parent_node,o[1]-o[0],regex)}
|
206
|
+
self.terminal=true
|
207
|
+
end
|
208
|
+
|
209
|
+
end
|
210
|
+
|
211
|
+
|
212
|
+
# Each Rule has one or more RuleVariant
|
213
|
+
# Rules attempt to match each of their Variants in order. The first one to succeed returns its Node and the Rule succeeds.
|
214
|
+
class RuleVariant
|
215
|
+
attr_accessor :pattern,:rule,:node_class
|
216
|
+
|
217
|
+
def initialize(pattern,rule,node_class=nil)
|
218
|
+
self.pattern=pattern
|
219
|
+
self.rule=rule
|
220
|
+
self.node_class=node_class
|
221
|
+
end
|
222
|
+
|
223
|
+
def inspect
|
224
|
+
pattern.collect{|a|a.inspect}.join(', ')
|
225
|
+
end
|
226
|
+
|
227
|
+
def to_s
|
228
|
+
"variant_class: #{node_class}, pattern: #{inspect}"
|
229
|
+
end
|
230
|
+
|
231
|
+
# convert the pattern into a set of lambda functions
|
232
|
+
def pattern_elements
|
233
|
+
@pattern_elements||=pattern.collect { |match| PatternElement.new match, self }
|
234
|
+
end
|
235
|
+
|
236
|
+
# returns a Node object if it matches, nil otherwise
|
237
|
+
def parse(parent_node)
|
238
|
+
#return parse_nongreedy_optional(src,offset,parent_node) # nongreedy optionals break standard PEG
|
239
|
+
node=node_class.new(parent_node)
|
240
|
+
|
241
|
+
pattern_elements.each do |pe|
|
242
|
+
match=pe.parse(node)
|
243
|
+
|
244
|
+
# if parse failed
|
245
|
+
if !match
|
246
|
+
if pe.terminal
|
247
|
+
# log failures on Terminal patterns for debug output if overall parse fails
|
248
|
+
node.parser.log_parsing_failure(node.next,:pattern=>pe.match,:node=>node)
|
249
|
+
end
|
250
|
+
return nil
|
251
|
+
end
|
252
|
+
|
253
|
+
# parse succeeded, add to node and continue
|
254
|
+
node.add_match(match,pe.name)
|
255
|
+
end
|
256
|
+
node
|
257
|
+
end
|
258
|
+
end
|
259
|
+
|
260
|
+
# Rules define one or more patterns (RuleVariants) to match for a given non-terminal
|
261
|
+
class Rule
|
262
|
+
attr_accessor :name,:variants,:parser,:node_class
|
263
|
+
|
264
|
+
def initialize(name,parser)
|
265
|
+
self.name=name
|
266
|
+
self.variants=[]
|
267
|
+
self.parser=parser
|
268
|
+
|
269
|
+
class_name = "#{parser.module_name}_#{name}_node".camelize
|
270
|
+
self.node_class = parser.const_set(class_name,Class.new(NodeNT))
|
271
|
+
end
|
272
|
+
|
273
|
+
def add_variant(pattern, &block)
|
274
|
+
|
275
|
+
rule_variant_class_name = "#{name}_node#{self.variants.length+1}".camelize
|
276
|
+
rule_variant_class = parser.const_set(rule_variant_class_name,Class.new(node_class))
|
277
|
+
self.variants << RuleVariant.new(pattern,self,rule_variant_class)
|
278
|
+
rule_variant_class.class_eval &block if block
|
279
|
+
rule_variant_class
|
280
|
+
end
|
281
|
+
|
282
|
+
def parse(node)
|
283
|
+
if cached=node.parser.cached(name,node.next)
|
284
|
+
return cached==:no_match ? nil : cached # return nil if cached==:no_match
|
285
|
+
end
|
286
|
+
|
287
|
+
variants.each do |v|
|
288
|
+
match=v.parse(node)
|
289
|
+
if match
|
290
|
+
node.parser.cache_match(name,match)
|
291
|
+
return match
|
292
|
+
end
|
293
|
+
end
|
294
|
+
node.parser.cache_no_match(name,node.next)
|
295
|
+
nil
|
296
|
+
end
|
297
|
+
|
298
|
+
# inspect returns a string which approximates the syntax for generating the rule and all its variants
|
299
|
+
def inspect
|
300
|
+
variants.collect do |v|
|
301
|
+
"rule #{name.inspect}, #{v.inspect}"
|
302
|
+
end.join("\n")
|
303
|
+
end
|
304
|
+
|
305
|
+
# returns a more human-readable explanation of the rule
|
306
|
+
def to_s
|
307
|
+
"rule #{name.inspect}, node_class: #{node_class}\n\t"+
|
308
|
+
"#{variants.collect {|v|v.to_s}.join("\n\t")}"
|
309
|
+
end
|
310
|
+
end
|
311
|
+
|
312
|
+
# primary object used by the client
|
313
|
+
# Used to generate the grammar with .rule methods
|
314
|
+
# Used to parse with .parse
|
315
|
+
class Parser
|
316
|
+
|
317
|
+
# Parser sub-class grammar definition
|
318
|
+
# These methods are used in the creation of a Parser Sub-Class to define
|
319
|
+
# its grammar
|
320
|
+
class <<self
|
321
|
+
attr_accessor :rules,:module_name,:root_rule
|
322
|
+
|
323
|
+
def rules
|
324
|
+
@rules||={}
|
325
|
+
end
|
326
|
+
# rules can be specified as:
|
327
|
+
# parser.rule :name, to_match1, to_match2, etc...
|
328
|
+
#or
|
329
|
+
# parser.rule :name, [to_match1, to_match2, etc...]
|
330
|
+
def rule(name,*pattern,&block)
|
331
|
+
pattern=pattern[0] if pattern[0].kind_of?(Array)
|
332
|
+
rule=self.rules[name]||=Rule.new(name,self)
|
333
|
+
self.root_rule||=name
|
334
|
+
rule.add_variant(pattern,&block)
|
335
|
+
end
|
336
|
+
|
337
|
+
def node_class(name,&block)
|
338
|
+
klass=self.rules[name].node_class
|
339
|
+
return klass unless block
|
340
|
+
klass.class_eval &block
|
341
|
+
end
|
342
|
+
|
343
|
+
def [](i)
|
344
|
+
rules[i]
|
345
|
+
end
|
346
|
+
|
347
|
+
# rule must be the symbol name of one of the defined rules; anything else raises "Symbol required"
|
348
|
+
def root_rule=(rule)
|
349
|
+
raise "Symbol required" unless rule.kind_of?(Symbol)
|
350
|
+
raise "rule #{rule.inspect} not found" unless rules[rule]
|
351
|
+
@root_rule=rule
|
352
|
+
end
|
353
|
+
end
|
354
|
+
|
355
|
+
#*********************************************
|
356
|
+
# pattern construction tools
|
357
|
+
#
|
358
|
+
# Ex:
|
359
|
+
# # match 'keyword'
|
360
|
+
# # (succeeds if keyword is matched; advances the read pointer)
|
361
|
+
# rule :sample_rule, "keyword"
|
362
|
+
# rule :sample_rule, match("keyword")
|
363
|
+
#
|
364
|
+
# # don't match 'keyword'
|
365
|
+
# # (succeeds only if keyword is NOT matched; does not advance the read pointer)
|
366
|
+
# rule :sample_rule, match!("keyword")
|
367
|
+
# rule :sample_rule, dont.match("keyword")
|
368
|
+
#
|
369
|
+
# # optionally match 'keyword'
|
370
|
+
# # (always succeeds; advances the read pointer if keyword is matched)
|
371
|
+
# rule :sample_rule, match?("keyword")
|
372
|
+
# rule :sample_rule, optionally.match("keyword")
|
373
|
+
#
|
374
|
+
# # ensure we could match 'keyword'
|
375
|
+
# # (succeeds only if keyword is matched, but does not advance the read pointer)
|
376
|
+
# rule :sample_rule, could.match("keyword")
|
377
|
+
#
|
378
|
+
|
379
|
+
# dont.match("keyword") #
|
380
|
+
#*********************************************
|
381
|
+
class <<self
|
382
|
+
def many(m,delimiter=nil,post_delimiter=nil) PatternElementHash.new.match.many(m).delimiter(delimiter).post_delimiter(post_delimiter) end
|
383
|
+
def many?(m,delimiter=nil,post_delimiter=nil) PatternElementHash.new.optionally.match.many(m).delimiter(delimiter).post_delimiter(post_delimiter) end
|
384
|
+
def many!(m,delimiter=nil,post_delimiter=nil) PatternElementHash.new.dont.match.many(m).delimiter(delimiter).post_delimiter(post_delimiter) end
|
385
|
+
|
386
|
+
def match?(*args) PatternElementHash.new.optionally.match(*args) end
|
387
|
+
def match(*args) PatternElementHash.new.match(*args) end
|
388
|
+
def match!(*args) PatternElementHash.new.dont.match(*args) end
|
389
|
+
|
390
|
+
def dont; PatternElementHash.new.dont end
|
391
|
+
def optionally; PatternElementHash.new.optionally end
|
392
|
+
def could; PatternElementHash.new.could end
|
393
|
+
end
|
394
|
+
|
395
|
+
|
396
|
+
#*********************************************
|
397
|
+
#*********************************************
|
398
|
+
# parser instance implementation
|
399
|
+
# these methods are used for each actual parse run
|
400
|
+
# they are tied to an instance of the Parser Sub-class so you can have more than one
|
401
|
+
# parser active at a time
|
402
|
+
attr_accessor :failure_index
|
403
|
+
attr_accessor :expecting_list
|
404
|
+
attr_accessor :src
|
405
|
+
attr_accessor :parse_cache
|
406
|
+
|
407
|
+
def initialize
|
408
|
+
reset_parser_tracking
|
409
|
+
end
|
410
|
+
|
411
|
+
def reset_parser_tracking
|
412
|
+
self.src=nil
|
413
|
+
self.failure_index=0
|
414
|
+
self.expecting_list={}
|
415
|
+
self.parse_cache={}
|
416
|
+
end
|
417
|
+
|
418
|
+
def cached(rule_class,offset)
|
419
|
+
(parse_cache[rule_class]||={})[offset]
|
420
|
+
end
|
421
|
+
|
422
|
+
def cache_match(rule_class,match)
|
423
|
+
(parse_cache[rule_class]||={})[match.offset]=match
|
424
|
+
end
|
425
|
+
|
426
|
+
def cache_no_match(rule_class,offset)
|
427
|
+
(parse_cache[rule_class]||={})[offset]=:no_match
|
428
|
+
end
|
429
|
+
|
430
|
+
def log_parsing_failure(index,expecting)
|
431
|
+
if index>failure_index
|
432
|
+
key=expecting[:pattern]
|
433
|
+
@expecting_list={key=>expecting}
|
434
|
+
@failure_index = index
|
435
|
+
elsif index == failure_index
|
436
|
+
key=expecting[:pattern]
|
437
|
+
self.expecting_list[key]=expecting
|
438
|
+
else
|
439
|
+
# ignored
|
440
|
+
end
|
441
|
+
end
|
442
|
+
|
443
|
+
|
444
|
+
def parse(src,offset=0,rule=nil)
|
445
|
+
reset_parser_tracking
|
446
|
+
@start_time=Time.now
|
447
|
+
self.src=src
|
448
|
+
root_node=RootNode.new(self)
|
449
|
+
ret=self.class[rule||self.class.root_rule].parse(root_node)
|
450
|
+
unless rule
|
451
|
+
if ret
|
452
|
+
if ret.next<src.length # parse only succeeds if the whole input is matched
|
453
|
+
@parsing_did_not_match_entire_input=true
|
454
|
+
@failure_index=ret.next
|
455
|
+
ret=nil
|
456
|
+
else
|
457
|
+
reset_parser_tracking
|
458
|
+
end
|
459
|
+
end
|
460
|
+
end
|
461
|
+
@end_time=Time.now
|
462
|
+
ret
|
463
|
+
end
|
464
|
+
|
465
|
+
def parse_time
|
466
|
+
@end_time-@start_time
|
467
|
+
end
|
468
|
+
|
469
|
+
def parse_and_puts_errors(src,out=$stdout)
|
470
|
+
ret=parse(src)
|
471
|
+
unless ret
|
472
|
+
out.puts parser_failure_info
|
473
|
+
end
|
474
|
+
ret
|
475
|
+
end
|
476
|
+
|
477
|
+
def node_list_string(node_list,common_root=[])
|
478
|
+
node_list && node_list[common_root.length..-1].map{|p|"#{p.class}(#{p.offset})"}.join(" > ")
|
479
|
+
end
|
480
|
+
|
481
|
+
def parser_failure_info
|
482
|
+
return unless src
|
483
|
+
bracketing_lines=5
|
484
|
+
line,col=src.line_col(failure_index)
|
485
|
+
ret=<<-ENDTXT
|
486
|
+
Parsing error at line #{line} column #{col} offset #{failure_index}
|
487
|
+
|
488
|
+
Source:
|
489
|
+
...
|
490
|
+
#{(failure_index==0 ? "" : src[0..(failure_index-1)]).last_lines(bracketing_lines)}<HERE>#{src[(failure_index)..-1].first_lines(bracketing_lines)}
|
491
|
+
...
|
492
|
+
ENDTXT
|
493
|
+
|
494
|
+
if @parsing_did_not_match_entire_input
|
495
|
+
ret+="\nParser did not match entire input."
|
496
|
+
else
|
497
|
+
|
498
|
+
common_root=nil
|
499
|
+
expecting_list.values.each do |e|
|
500
|
+
node=e[:node]
|
501
|
+
pl=node.parent_list
|
502
|
+
if common_root
|
503
|
+
common_root.each_index do |i|
|
504
|
+
if pl[i]!=common_root[i]
|
505
|
+
common_root=common_root[0..i-1]
|
506
|
+
break
|
507
|
+
end
|
508
|
+
end
|
509
|
+
else
|
510
|
+
common_root=node.parent_list
|
511
|
+
end
|
512
|
+
end
|
513
|
+
ret+=<<ENDTXT
|
514
|
+
|
515
|
+
Successfully matched rules up to failure:
|
516
|
+
#{node_list_string(common_root)}
|
517
|
+
|
518
|
+
Expecting#{expecting_list.length>1 ? ' one of' : ''}:
|
519
|
+
#{expecting_list.values.collect do |a|
|
520
|
+
list=node_list_string(a[:node].parent_list,common_root)
|
521
|
+
[list,"#{a[:pattern].inspect} (#{list})"]
|
522
|
+
end.sort.map{|i|i[1]}.join("\n ")}
|
523
|
+
ENDTXT
|
524
|
+
end
|
525
|
+
ret
|
526
|
+
end
|
527
|
+
end
|
528
|
+
end
|
529
|
+
|