babel_bridge 0.5.1 → 0.5.3

Files changed (40)
  1. data/CHANGE_LOG +165 -0
  2. data/Gemfile +4 -0
  3. data/Guardfile +7 -0
  4. data/LICENCE +24 -0
  5. data/README.md +244 -0
  6. data/Rakefile +8 -2
  7. data/TODO +100 -0
  8. data/babel_bridge.gemspec +11 -3
  9. data/examples/json/json_parser.rb +23 -0
  10. data/examples/json/json_parser2.rb +37 -0
  11. data/lib/babel_bridge.rb +3 -2
  12. data/lib/{nodes.rb → babel_bridge/nodes.rb} +0 -0
  13. data/lib/{nodes → babel_bridge/nodes}/empty_node.rb +0 -0
  14. data/lib/{nodes → babel_bridge/nodes}/node.rb +1 -1
  15. data/lib/{nodes → babel_bridge/nodes}/non_terminal_node.rb +0 -8
  16. data/lib/{nodes → babel_bridge/nodes}/root_node.rb +0 -0
  17. data/lib/{nodes → babel_bridge/nodes}/rule_node.rb +0 -0
  18. data/lib/{nodes → babel_bridge/nodes}/terminal_node.rb +0 -0
  19. data/lib/{parser.rb → babel_bridge/parser.rb} +7 -14
  20. data/lib/{pattern_element.rb → babel_bridge/pattern_element.rb} +27 -25
  21. data/lib/babel_bridge/pattern_element_hash.rb +22 -0
  22. data/lib/{rule.rb → babel_bridge/rule.rb} +0 -0
  23. data/lib/{rule_variant.rb → babel_bridge/rule_variant.rb} +0 -4
  24. data/lib/{shell.rb → babel_bridge/shell.rb} +0 -0
  25. data/lib/{string.rb → babel_bridge/string.rb} +0 -0
  26. data/lib/{tools.rb → babel_bridge/tools.rb} +0 -0
  27. data/lib/babel_bridge/version.rb +3 -0
  28. data/spec/advanced_parsers_spec.rb +1 -0
  29. data/spec/basic_parsing_spec.rb +43 -0
  30. data/spec/bb_spec.rb +19 -0
  31. data/spec/compound_patterns_spec.rb +61 -0
  32. data/spec/node_spec.rb +3 -3
  33. data/spec/pattern_generators_spec.rb +4 -4
  34. data/spec/spec_helper.rb +3 -0
  35. metadata +115 -33
  36. data/README +0 -144
  37. data/examples/turing/examples.turing +0 -33
  38. data/examples/turing/notes.rb +0 -111
  39. data/examples/turing/turing_demo.rb +0 -71
  40. data/lib/version.rb +0 -4
data/Rakefile CHANGED
@@ -1,6 +1,12 @@
  require "bundler/gem_tasks"
  require "rspec/core/rake_task"
  
- RSpec::Core::RakeTask.new(:spec)
-
  task :default => :spec
+
+ desc "Run specs"
+ RSpec::Core::RakeTask.new do |task|
+   task.pattern = "**/spec/*_spec.rb"
+   task.rspec_opts = Dir.glob("[0-9][0-9][0-9]_*").collect { |x| "-I#{x}" }.sort
+   task.rspec_opts << '--color'
+   task.rspec_opts << '-f documentation'
+ end
data/TODO ADDED
@@ -0,0 +1,100 @@
+ TODO: merge date entries into new TOPIC organized section
+
+ By-Topic
+ --------
+
+ Parser Feedback (updated Jan 2013)
+
+ If parsing failed and there was a negative (dont) match that prevented further parsing at the failure index, report it in a separate list: "Possibly could have continued if DIDN'T match:"
+
+ --
+
+ One confusing parser failure is when the greedy nature of PEG causes one rule to prevent another from matching.
+
+ Is it possible to a) detect this situation and b) provide a suggestion to fix it (re-order rules), or should the parser automatically try alternatives when parsing fails? Likely the latter changes the Big-O behavior of the algorithm.
+
+ Convert ignore_whitespace engine to "delimiters" (Dec 2012)
+
+ Replace engine under the hood for ignore_whitespace:
+
+ * Add global inter-token delimiter pattern with optional, per-rule overrides.
+ * ignore_whitespace will still be supported - just sets the global delimiter to /\s*/
+ * rewind_whitespace will be removed - override the containing rule's delimiter to ""
+
+ possible syntax:
+
+ rule :rule_name, pattern_a, pattern_b, :delimiter => "" # disable global delimiter; pattern_b must come immediately after pattern_a
+ rule :rule_name, pattern_a, pattern_b, :delimiter => "..." # must match "..." between pattern_a and pattern_b
+
+ Discussion:
+
+ ignore-whitespace should be considered a "token delimiter" much like the optional delimiter of the "many" node. We could then eliminate the need for "rewind_whitespace" if we allowed you to change the "token delimiter" for a given rule. Often when you need rewind_whitespace, you need it more than once in the same rule. It seems silly to first match the whitespace, and then unmatch it. Let's just allow you to change the delimiter to whatever you want - including the empty string - as well as have a global default (which ignore_whitespace sets to /\s*/).
+
+ The one question is should we provide some way to access what this "token delimiter" matches? Note this gets a little strange with "many" and its delimiter since it will be matching: match_pattern, token_delimiter, match_delimiter, token_delimiter, match_pattern.
+
+ "Python-like" Support (Dec 2012)
+
+ Add clean, easy support for indentation-based languages like python or coffeescript.
+
+ Nodes use ruby ranges instead of "offset" and "length" (Nov 2012)
+
+ Would like to convert all Node member variables "offset" and "length" to "range" -- use Ruby ranges.
+
+ Left-Recursion (Nov 2012)
+
+ http://en.wikipedia.org/wiki/Parsing_expression_grammar
+
+ Left-Recursion: Given what wikipedia says about left-recursion, I don't want to "support it" at the expense of losing linear-time parsing. I think the right answer is to "handle it nicely"
+
+ Detection:
+ * detect when we attempt to match a rule-variant that is already being matched at the same character position further up the stack.
+ * When we detect such a situation, immediately fail to match - as attempting to match leads to infinite recursion.
+
+ Idea 1: detect and throw error
+
+ Idea 2: This changes parser behavior in that it will not cause an error. Instead, the parser will proceed to attempt to match other variants.
+
+ Idea 3: An alternative behavior for left-recursion could be "greedy". The first time it happens, we proceed to any alternate rules. If there are none, then we fail. If we succeed, then attempt to match with just one recursive loop. If that succeeds, we advance to two recursive loops. This is kind-of like tail recursion optimization except it is more like head-recursion in this sense :).
+
+ Arbitrary Nested Patterns (Dec 2010)
+
+ rule :my_rule, dont.match("hi", "there", /mom|dad/)
+
+ This just streamlines some rules into one-liners.
+
+ Add "or(a,b)" Pattern Matcher (Dec 2010)
+
+ Including an or(a,b) pattern matcher.
+
+ Pluralize Many-Rules (?) (Aug 2011)
+
+ When matching "many(:stick)", it would be nice to be able to refer to all the matches as "sticks" not "stick". Need "pluralize".
+
+ Counter: our namespace is already pretty overloaded. This may confuse things.
+
+ Add Shared Methods for all Nodes for a specific Parser
+
+ I want the ability to add methods to the base-class for all Nodes on a per-parser basis.
+
+ This means that each parser needs to define a new node base-class, derived from BabelBridge::Node, from which all the Rule Nodes derive.
+
+ Better handling of EmptyNode? (Dec 2010)
+
+ Right now an Optional node that is not matched returns an instance of EmptyNode. However, in the parsed result, ideally that match-slot would have the value "nil". How can we accomplish that simply?
+
+ 2010-11-*
+ ---------
+ TODO-FEATURE: The "expecting" feature is so good I wonder if we should add the ability to make and apply suggestions to "repair the parse".
+ This would need:
+ a) default values for regex terminals
+ . string terminals are their own default values
+ . default values for regex should be verified to match the regex
+ b) an interactive prompter if there is more than one option
+
+ TODO-IMPROVEMENT: "Expecting" should show line numbers instead of char numbers, but it should only be calculated on demand. This means we need a smarter formatter for our possible-error-logging.
+
+ IDEA: could use the "-" prefix operator to mean "dont":
+ -"this"
+ -:that
+ -match(:foo)
+ -many(:foo)
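Review note: the "Detection" bullets under Left-Recursion in the TODO above could be sketched roughly as follows. This is only an illustration under stated assumptions: `RecursionGuard`, `attempt`, and the toy `parse_expr` grammar are hypothetical names that do not exist in babel_bridge, and the sketch follows "Idea 2" (the re-entrant attempt fails quietly so the remaining variants can still be tried).

```ruby
# Hypothetical sketch; none of these names come from babel_bridge.
class RecursionGuard
  def initialize
    @active = {} # [rule_name, offset] => true while a match attempt is in progress
  end

  # Runs the block unless the same rule is already being matched at the same
  # offset further up the stack; in that case it fails immediately with nil.
  def attempt(rule_name, offset)
    key = [rule_name, offset]
    return nil if @active[key]
    @active[key] = true
    begin
      yield
    ensure
      @active.delete(key)
    end
  end
end

# Toy left-recursive grammar: expr <- expr "+" digit / digit
# Returns the offset after the match, or nil on failure.
def parse_expr(src, offset, guard)
  guard.attempt(:expr, offset) do
    # Variant 1 is left-recursive: the nested attempt at the same offset fails fast.
    left = parse_expr(src, offset, guard)
    if left && src[left] == "+" && /\d/.match?(src[left + 1].to_s)
      next left + 2
    end
    # Variant 2: a single digit.
    /\d/.match?(src[offset].to_s) ? offset + 1 : nil
  end
end

parse_expr("1+2", 0, RecursionGuard.new) # terminates instead of recursing forever
```

With this behavior the parse terminates but only consumes the leading digit, which is the trade-off the TODO describes: the grammar author still has to restructure the rule, or the parser would need something like "Idea 3" to grow the match.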
data/babel_bridge.gemspec CHANGED
@@ -1,4 +1,6 @@
- require File.join(File.dirname(__FILE__),"lib/babel_bridge.rb")
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'babel_bridge/version'
  
  $gemspec = Gem::Specification.new do |gem|
    gem.name = "babel_bridge"
@@ -15,9 +17,15 @@ Babel Bridge is an object oriented parser generator for parsing expression gramm
  Generate memoizing packrat parsers 100% in Ruby code with a simple embedded DSL.
  DESCRIPTION
  
-   gem.files = ["LICENSE", "README", "Rakefile", "babel_bridge.gemspec", "{test,spec,lib,doc,examples}/**/*"].map{|p| Dir[p]}.flatten
-   gem.has_rdoc = false
+   gem.files = `git ls-files`.split($/)
+   gem.executables = gem.files.grep(%r{^bin/}).map{ |f| File.basename(f) }
+   gem.test_files = gem.files.grep(%r{^(test|spec|features)/})
+   gem.require_paths = ["lib"]
  
    gem.add_development_dependency 'rake'
    gem.add_development_dependency 'rspec'
+   gem.add_development_dependency 'simplecov'
+   gem.add_development_dependency 'guard-rspec'
+   gem.add_development_dependency 'guard-test'
+   gem.add_development_dependency 'rb-fsevent'
  end
data/examples/json/json_parser.rb ADDED
@@ -0,0 +1,23 @@
+ # basic parser that accepts only legal JSON
+ require "babel_bridge"
+
+ class JsonParser < BabelBridge::Parser
+   ignore_whitespace
+
+   rule :document, any(:object, :array)
+
+   rule :array, '[', many?(:value, ','), ']'
+   rule :object, '{', many?(:pair, ','), '}'
+
+   rule :pair, :string, ':', :value
+
+   rule :value, any(:object, :array, :number, :string, :true, :false, :null)
+
+   rule :string, /"(?:[^"\\]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"/
+   rule :number, /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/
+   rule :true, "true"
+   rule :false, "false"
+   rule :null, "null"
+ end
+
+ BabelBridge::Shell.new(JsonParser.new).start
data/examples/json/json_parser2.rb ADDED
@@ -0,0 +1,37 @@
+ # basic parser that accepts only legal JSON
+ # parse-tree-nodes support "#evaluate" which returns the ruby-equivalent data-structure
+ require "babel_bridge"
+
+ class JsonParser < BabelBridge::Parser
+   ignore_whitespace
+
+   rule :document, any(:object, :array)
+
+   rule :array, '[', many?(:value, ','), ']' do
+     def evaluate; value.collect {|v| v.evaluate} end
+   end
+
+   rule :object, '{', many?(:pair, ','), '}' do
+     def evaluate; Hash[ pair.collect {|p| p.evaluate } ] end
+   end
+
+   rule :pair, :string, ':', :value do
+     def evaluate; [ eval(string.to_s), value.evaluate ] end
+   end
+
+   rule :value, any(:object, :array, :ruby_compatible_literal, :null)
+
+   rule :ruby_compatible_literal, any(:number, :string, :true, :false) do
+     def evaluate; eval(to_s); end
+   end
+
+   rule :string, /"(?:[^"\\]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"/
+   rule :number, /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/
+   rule :true, "true"
+   rule :false, "false"
+   rule :null, "null" do
+     def evaluate; nil end
+   end
+ end
+
+ BabelBridge::Shell.new(JsonParser.new).start
data/lib/babel_bridge.rb CHANGED
@@ -9,11 +9,12 @@ http://babel-bridge.rubyforge.org/
  string
  version
  nodes
+ pattern_element_hash
  pattern_element
  shell
  rule_variant
  rule
  parser
  }.each do |file|
-   require File.join(File.dirname(__FILE__),file)
- end
+   require File.join(File.dirname(__FILE__),"babel_bridge",file)
+ end
File without changes
File without changes
@@ -88,7 +88,7 @@ class Node
      # info methods
      #********************
      alias :next :offset_after_match
-     def text; src[match_range] end # the substring in src matched
+     def text; match_length == 0 ? "" : src[match_range] end # the substring in src matched
  
      # length returns the number of sub-nodes
      def length
@@ -36,13 +36,5 @@ class NonTerminalNode < Node
          update_match_length
        end
      end
-
-     def [](i)
-       matches[i]
-     end
-
-     def each(&block)
-       matches.each(&block)
-     end
    end
  end
File without changes
File without changes
File without changes
@@ -145,14 +145,17 @@ class Parser
      #
      #*********************************************
      class <<self
-       def many(m,delimiter=nil) PatternElementHash.new.match.many(m).delimiter(delimiter) end
+       def many(m,delimiter=nil) PatternElementHash.new.match.many(m).delimiter(delimiter) end
        def many?(m,delimiter=nil) PatternElementHash.new.optionally.match.many(m).delimiter(delimiter) end
-       def many!(m,delimiter=nil) PatternElementHash.new.dont.match.many(m).delimiter(delimiter) end
  
+       def match(*args) PatternElementHash.new.match(*args) end
        def match?(*args) PatternElementHash.new.optionally.match(*args) end
-       def match(*args) PatternElementHash.new.match(*args) end
        def match!(*args) PatternElementHash.new.dont.match(*args) end
  
+       def any(*args) PatternElementHash.new.any(args) end
+       def any?(*args) PatternElementHash.new.optionally.any(args) end
+       def any!(*args) PatternElementHash.new.dont.any(args) end
+
        def dont; PatternElementHash.new.dont end
        def optionally; PatternElementHash.new.optionally end
        def could; PatternElementHash.new.could end
@@ -234,11 +237,9 @@ class Parser
      # If nil is returned, parsing failed. Call .parser_failure_info after failure for a human-readable description of the failure.
      # src: the string to parse
      # options:
-     # offset: where to start in the string for parsing
      # rule: lets you specify the root rule for matching
      # partial_match: allow partial matching
      def parse(src, options={})
-       offset = options[:offset] || 0
        rule = options[:rule] || self.class.root_rule
        reset_parser_tracking
        @start_time = Time.now
@@ -266,14 +267,6 @@ class Parser
        @end_time-@start_time
      end
  
-     def parse_and_puts_errors(src,out=$stdout)
-       ret=parse(src)
-       unless ret
-         out.puts parser_failure_info
-       end
-       ret
-     end
-
      # options[:verbose] => false
      def node_list_string(node_list,common_root=[],options={})
        return unless node_list
@@ -323,7 +316,7 @@ Parse path at failure:
  Expecting#{expecting_list.length>1 ? ' one of' : ''}:
  #{Tools.uniform_tabs(Tools.indent(expecting_list.values.collect do |a|
  list=node_list_string(nodes_interesting_parse_path(a[:node]),common_root,options)
- "#{a[:pattern].inspect}\t #{list}"
+ "#{a[:pattern]}\t #{list}"
  end.sort.join("\n")," "))}
  ENDTXT
  end
@@ -5,24 +5,6 @@ http://babel-bridge.rubyforge.org/
  =end
  
  module BabelBridge
-   # hash which can be used declaratively
-   class PatternElementHash
-     attr_accessor :hash
-
-     def initialize
-       @hash = {}
-     end
-
-     def [](key) @hash[key] end
-     def []=(key,value) @hash[key]=value end
-
-     def method_missing(method_name, *args) #method_name is a symbol
-       return self if args.length==1 && !args[0] # if nil is provided, don't set anything
-       raise "More than one argument is not supported. #{self.class}##{method_name} args=#{args.inspect}" if args.length > 1
-       @hash[method_name] = args[0] || true # on the other hand, if no args are provided, assume true
-       self
-     end
-   end
  
    # PatternElement provides optimized parsing for each Element of a pattern
    # PatternElement provides all the logic for parsing:
@@ -90,7 +72,7 @@ class PatternElement
  
        if !match && (terminal || negative)
          # log failures on Terminal patterns for debug output if overall parse fails
-         parent_node.parser.log_parsing_failure parent_node.next, :pattern => self.match, :node => parent_node
+         parent_node.parser.log_parsing_failure parent_node.next, :pattern => self.to_s, :node => parent_node
        end
  
        match.delimiter = delimiter if match
@@ -106,20 +88,15 @@ class PatternElement
        self.match = match
        match = match[0] if match.kind_of?(Array) && match.length == 1
        case match
-       when TrueClass then init_true
        when String then init_string match
        when Regexp then init_regex match
        when Symbol then init_rule match
        when PatternElementHash then init_hash match
+       when Array then init_array match
        else raise "invalid pattern type: #{match.inspect}"
        end
      end
  
-     # "true" parser always matches the empty string
-     def init_true
-       self.parser=lambda {|parent_node| EmptyNode.new(parent_node)}
-     end
-
      # initialize PatternElement as a parser that matches exactly the string specified
      def init_string(string)
        init_regex Regexp.escape(string)
@@ -155,6 +132,29 @@ class PatternElement
        end
      end
  
+     def init_any(patterns)
+       patterns = patterns.collect {|p| PatternElement.new(p,@init_options)}
+       self.parser = lambda do |parent_node|
+         patterns.each do |p|
+           node = p.parse(parent_node)
+           return node if node
+         end
+         nil
+       end
+     end
+
+     def init_array(patterns)
+       patterns = patterns.collect {|p| PatternElement.new(p,@init_options)}
+       self.parser = lambda do |parent_node|
+         parent_node.attempt_match do
+           patterns.each do |p|
+             return nil unless parent_node.match p
+           end
+         end
+         parent_node
+       end
+     end
+
      # initialize the PatternElement from hashed parameters
      def init_hash(hash)
        if hash[:parser]
@@ -163,6 +163,8 @@ class PatternElement
          init_many hash
        elsif hash[:match]
          init hash[:match]
+       elsif hash[:any]
+         init_any hash[:any]
        else
          raise "extended-options patterns (specified by a hash) must have either :parser=> or a :match=> set"
        end
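Review note: the new `init_any` returns the first alternative that parses, which is the ordered-choice behavior the TODO describes as one rule preventing another from matching. A minimal standalone sketch of that behavior follows. It assumes nothing about babel_bridge internals: `literal` and `any_of` are hypothetical helpers, and each matcher is just a lambda that takes `(src, offset)` and returns the offset after the match or nil.

```ruby
# Hypothetical helpers for illustration; not part of babel_bridge.

# Matches an exact string at the given offset.
def literal(string)
  lambda do |src, offset|
    src[offset, string.length] == string ? offset + string.length : nil
  end
end

# Ordered choice: try each alternative in turn and stop at the first success.
def any_of(*alternatives)
  lambda do |src, offset|
    alternatives.each do |alt|
      result = alt.call(src, offset)
      return result if result # later alternatives are never tried
    end
    nil
  end
end

short_first = any_of(literal("a"), literal("ab"))
long_first  = any_of(literal("ab"), literal("a"))

short_first.call("ab", 0) # => 1, "a" wins and "ab" is never attempted
long_first.call("ab", 0)  # => 2
```

Listing the longer alternative first is the usual way to avoid the shadowing shown by `short_first`.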
data/lib/babel_bridge/pattern_element_hash.rb ADDED
@@ -0,0 +1,22 @@
+ module BabelBridge
+   # hash which can be used declaratively
+   class PatternElementHash
+     attr_accessor :hash
+
+     def initialize
+       @hash = {}
+     end
+
+     def inspect; hash.inspect; end
+
+     def [](key) @hash[key] end
+     def []=(key,value) @hash[key]=value end
+
+     def method_missing(method_name, *args) #method_name is a symbol
+       return self if args.length==1 && !args[0] # if nil is provided, don't set anything
+       raise "More than one argument is not supported. #{self.class}##{method_name} args=#{args.inspect}" if args.length > 1
+       @hash[method_name] = args[0] || true # on the other hand, if no args are provided, assume true
+       self
+     end
+   end
+ end
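Review note: to make the `method_missing` builder easier to follow, here is a small standalone sketch of how a chained call accumulates options. The class body is a trimmed copy of the file added above, placed in a hypothetical `Sketch` module so it runs without the gem. The chains mirror `Parser.many?` and `Parser.any` from the parser.rb hunk.

```ruby
# Trimmed copy of PatternElementHash for illustration; the Sketch module is hypothetical.
module Sketch
  class PatternElementHash
    attr_accessor :hash

    def initialize
      @hash = {}
    end

    def method_missing(method_name, *args)
      return self if args.length == 1 && !args[0]   # a nil argument sets nothing
      raise "More than one argument is not supported." if args.length > 1
      @hash[method_name] = args[0] || true          # no argument means "true"
      self
    end
  end
end

# Equivalent of many?(:value, ','): every chained call stores one key.
optional_list = Sketch::PatternElementHash.new.optionally.match.many(:value).delimiter(',')
optional_list.hash # holds :optionally, :match, :many and :delimiter

# A nil delimiter is skipped, so the key is absent rather than set to nil.
no_delimiter = Sketch::PatternElementHash.new.match.many(:value).delimiter(nil)
no_delimiter.hash.key?(:delimiter) # => false

# any(*args) passes the whole argument list as a single array value.
choice = Sketch::PatternElementHash.new.any([:object, :array])
choice.hash[:any] # => [:object, :array]
```

Because the alternatives arrive as one array, `init_hash` can hand `hash[:any]` straight to `init_any`.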
File without changes
@@ -37,10 +37,6 @@ class RuleVariant
        @pattern_elements||=pattern.collect { |match| [PatternElement.new(match, :rule_variant => self, :pattern_element => true), delimiter_pattern] }.flatten[0..-2]
      end
  
-     def parse_element(element_parser, node)
-       node.add_match element_parser.parse(node), element_parser.name
-     end
-
      # returns a Node object if it matches, nil otherwise
      def parse(parent_node)
        #return parse_nongreedy_optional(src,offset,parent_node) # nongreedy optionals break standard PEG