neg 0.3.0 → 1.0.0

@@ -2,7 +2,7 @@
2
2
  # neg - CHANGELOG.md
3
3
 
4
4
 
5
- ## neg - 1.0.0 not yet released
5
+ ## neg - 1.0.0 released 2013-01-16
6
6
 
7
7
  - initial release
8
8
 
@@ -1,5 +1,5 @@
1
1
 
2
- Copyright (c) 2012-2012, John Mettraux, jmettraux@gmail.com
2
+ Copyright (c) 2012-2013, John Mettraux, jmettraux@gmail.com
3
3
 
4
4
  Permission is hereby granted, free of charge, to any person obtaining a copy
5
5
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -1,10 +1,177 @@
1
1
 
2
2
  # neg
3
3
 
4
- A neg parser.
4
+ A neg narser.
5
5
 
6
6
  A silly little exploration project.
7
7
 
8
+ It could have been "peg" as in "peg, a peg parser", but that would have been presumptuous. It could have been "leg" as in "leg, a leg larser", but there is already a [leg](http://piumarta.com/software/peg/peg.1.html), so it became "neg" as in "neg, a neg narser". It sounds neg-ative, but whatever, it's just a toy project.
9
+
10
+
11
+ ## Ruby PEG libraries
12
+
13
+ Ruby has many such libraries. Here are three preeminent ones:
14
+
15
+ * Treetop: <http://treetop.rubyforge.org/>
16
+ * Citrus: <http://mjijackson.com/citrus/>
17
+ * Parslet: <http://kschiess.github.com/parslet/>
18
+
19
+ My favourite is Parslet. Neg is born out of the ashes of contribution attempts to Parslet. Studying this great library made me want to implement my own mini PEG library, for the fun of it.
20
+
21
+ So if you're looking for something robust and battle-tested, something for the long term, stop reading here and use one of the three gems above. IMHO, [Parslet](http://kschiess.github.com/parslet/) stands above for its error reporting.
22
+
23
+
24
+ ## expressing a grammar with neg
25
+
26
+ Here is the classical arithmetic example:
27
+
28
+ ```ruby
29
+ class ArithParser < Neg::Parser
30
+
31
+ expression == operation
32
+
33
+ operator == `+` | `-` | `*` | `/`
34
+ operation == value + (operator + value) * 0
35
+ value == parenthese | number
36
+ parenthese == `(` + expression + `)`
37
+ number == `-` * -1 + _('0-9') * 1
38
+ end
39
+
40
+ tree = ArithParser.parse("1+(2*12)")
41
+ ```
42
+
43
+ (Note: this is Ruby code)
44
+
45
+
46
+ ## grammar building blocks
47
+
48
+ ```ruby
49
+ # leaves
50
+
51
+ StringParser
52
+ text == `foreach`
53
+
54
+ CharacterParser
55
+ stuff == _ # any character
56
+ stuff == _ * 1 # one or more of any character
57
+ stuff == _("0-9") * 1 # like /[0-9]+/
58
+
59
+ # composite
60
+
61
+ SequenceParser
62
+ sentence == subject + verb + object
63
+
64
+ AlternativeParser
65
+ subject == person | animal | place
66
+
67
+ # parentheses
68
+ sentence == (person | animal) + verb + (object | (`in ` + place))
69
+
70
+ # modifiers
71
+
72
+ RepetitionParser
73
+ text == `x` * 0 # 0 or more
74
+ text == `x` * 1 # 1 or more
75
+ text == `x` * -1 # 0 or 1
76
+ text == `x` * [2, 4] # 2, 3 or 4
77
+
78
+ LookaheadParser
79
+ x_then_z == `x` + ~`z` # presence
80
+ x_then_not_z == `x` + -`z` # absence
81
+
82
+ # naming
83
+
84
+ NonTerminalParser
85
+ brand == `mazda` | `ford` # "brand" is the non-terminal
86
+
87
+ NonTerminalParser (name is omitted in output parse tree)
88
+ _operator == `+` | `*` | `-` | `/`
89
+
90
+ Embedded naming (here "operator")
91
+ operation == number + (`+` | `-`)["operator"] + number
92
+ ```
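The repetition modifiers listed above can be read as `[min, max]` bounds. The following is a minimal sketch of that reading in plain Ruby. It does not use neg: `repetition_bounds` and `consume` are hypothetical helpers, only the four documented modifier forms are covered, and the greedy "stop at max" behaviour is an assumption rather than something the diff confirms.

```ruby
# Hypothetical helper, not neg's API: map the documented `*` modifiers
# to [min, max] bounds (a nil max means "unbounded").
def repetition_bounds(mod)
  case mod
  when 0 then [ 0, nil ]   # 0 or more
  when 1 then [ 1, nil ]   # 1 or more
  when -1 then [ 0, 1 ]    # 0 or 1
  when Array then mod      # [ min, max ]
  else raise ArgumentError, "undocumented modifier: #{mod.inspect}"
  end
end

# Greedily count repetitions of a non-empty `literal` at the start of `input`.
# Returns the number of repetitions consumed, or nil when fewer than min matched.
def consume(input, literal, mod)
  min, max = repetition_bounds(mod)
  count = 0
  while (max.nil? || count < max) &&
        input[count * literal.length, literal.length] == literal
    count += 1
  end
  count >= min ? count : nil
end

p consume('xxy', 'x', 0)
p consume('y', 'x', 1)
p consume('xxxxx', 'x', [ 2, 4 ])
```

Returning `nil` rather than `0` on failure keeps "matched zero times" (valid for `* 0` and `* -1`) distinct from "did not match".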
93
+
94
+
95
+ ## parser output
96
+
97
+ Without a translator, the parser outputs a raw parse tree, something like:
98
+
99
+ ```ruby
100
+ [ :json,
101
+ [ 0, 1, 1 ],
102
+ true,
103
+ nil,
104
+ [ [ :spaces?, [ 0, 1, 1 ], true, '', [] ],
105
+ [ :value, [ 0, 1, 1 ], true, nil, [
106
+ [ :bfalse, [ 0, 1, 1 ], true, 'false', [] ] ] ],
107
+ [ :spaces?, [ 5, 1, 6 ], true, '', [] ] ] ]
108
+ ```
109
+
110
+ It's a nested assemblage of result nodes.
111
+
112
+ ```ruby
113
+ [ rule_name, [ offset, line, column ], success?, result, children ]
114
+ #
115
+ # for example
116
+ [ :bfalse, [ 0, 1, 1 ], true, 'false', [] ]
117
+ ```
118
+
119
+ In case of successful parsing, the nodes with success? == false get pruned. In case of failed parsing, they are left in the output parse tree.
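Since every result node has the same five-slot shape, a raw parse tree can be walked with a few lines of plain Ruby. This sketch reuses the sample tree shown above and does not need neg. `each_node` is a hypothetical helper introduced here for illustration only.

```ruby
# Raw parse tree copied from the sample above.
tree =
  [ :json, [ 0, 1, 1 ], true, nil, [
    [ :spaces?, [ 0, 1, 1 ], true, '', [] ],
    [ :value, [ 0, 1, 1 ], true, nil, [
      [ :bfalse, [ 0, 1, 1 ], true, 'false', [] ] ] ],
    [ :spaces?, [ 5, 1, 6 ], true, '', [] ] ] ]

# Hypothetical helper (not part of neg): depth-first walk yielding each
# node together with its nesting depth. Children live in slot 4.
def each_node(node, depth = 0, &block)
  block.call(node, depth)
  node[4].each { |child| each_node(child, depth + 1, &block) }
end

lines = []
each_node(tree) do |node, depth|
  name, position, success, result = node
  lines << "#{'  ' * depth}#{name} offset=#{position[0]} " \
           "success=#{success} result=#{result.inspect}"
end

puts lines
```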
120
+
121
+ A translator turns a raw parse tree into some final result. Look below and at the JSON parser sample in the specs for more information. If the parse failed and a translator is present, a ParseError is raised.
122
+
123
+
124
+ ## parser + translator
125
+
126
+ It's OK to stuff the translator inside of the parser:
127
+
128
+ ```ruby
129
+ class CompactArithParser < Neg::Parser
130
+
131
+ parser do
132
+
133
+ expression == operation
134
+
135
+ operator == `+` | `-` | `*` | `/`
136
+ operation == value + (operator + value) * 0
137
+ value == parenthese | number
138
+ parenthese == `(` + expression + `)`
139
+ number == `-` * -1 + _('0-9') * 1
140
+ end
141
+
142
+ translator do
143
+
144
+ on(:number) { |n| n.result.to_i }
145
+ on(:operator) { |n| n.result }
146
+ on(:value) { |n| n.results.first }
147
+
148
+ on(:expression) { |n|
149
+ results = n.results.flatten(2)
150
+ results.size == 1 ? results.first : results
151
+ }
152
+ end
153
+ end
154
+
155
+ CompactArithParser.parse("1+2+3")
156
+ # => [ 1, '+', 2, '+', 3 ]
157
+ ```
158
+
159
+ As said above, when a translator is present and the parsing fails (before the translator kicks in), a ParseError is raised, with fancy methods to navigate the failed parse tree.
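As a rough illustration of how a failed parse tree can be navigated, here is a standalone sketch that mirrors the selection logic of `ParseError#errors` and `#deepest_error` in `lib/neg/errors.rb` from this diff. The tree is hand-written and its error strings are hypothetical, not messages neg actually produces.

```ruby
# Hand-written failed parse tree. Rule names echo the arithmetic grammar;
# the error strings are hypothetical placeholders.
failed_tree =
  [ :expression, [ 0, 1, 1 ], false, nil, [
    [ :number, [ 0, 1, 1 ], false, 'hypothetical error at offset 0', [] ],
    [ :operation, [ 0, 1, 1 ], false, nil, [
      [ :value, [ 0, 1, 1 ], true, '1', [] ],
      [ :operator, [ 1, 1, 2 ], true, '+', [] ],
      [ :value, [ 2, 1, 3 ], false, 'hypothetical error at offset 2', [] ] ] ] ] ]

# Flatten the tree depth-first, as list_nodes does in errors.rb.
def list_nodes(node, acc = [])
  acc << node
  node[4].each { |child| list_nodes(child, acc) }
  acc
end

# Error nodes: failed nodes whose result slot holds a message string.
errors = list_nodes(failed_tree).select { |n| n[2] == false && n[3].is_a?(String) }

# Deepest error: the error node with the greatest input offset
# (on a tie, the node encountered first is kept).
deepest = errors.inject { |e, n| e[1][0] < n[1][0] ? n : e }

puts "#{deepest[0]} at offset #{deepest[1][0]}: #{deepest[3]}"
```

Picking the error furthest into the input is a common heuristic: it usually points at the place where the parser got the furthest before giving up.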
160
+
161
+
162
+ ## presentations
163
+
164
+ Neg was published on 2012-10-06, when it was presented at [Hiroshima.rb](http://hiroshimarb.github.com/).
165
+
166
+ The \[very dry\] deck of slides that accompanied it can be found at <https://speakerdeck.com/u/jmettraux/p/neg-a-neg-narser>.
167
+
168
+
169
+ ## links
170
+
171
+ * source: <https://github.com/jmettraux/neg>
172
+ * issues: <https://github.com/jmettraux/neg/issues>
173
+ * irc: freenode.net #ruote
174
+
8
175
 
9
176
  ## license
10
177
 
data/TODO.txt CHANGED
@@ -4,15 +4,14 @@
4
4
  [o] switch from ^ to * (how * is related to +)
5
5
  [o] _ (any)
6
6
  [o] chars
7
- [ ] lookahead present/absent
7
+ [o] lookahead present/absent
8
8
  ~ x --> present
9
9
  ! x --> absent
10
+
10
11
  [ ] blankslate
11
12
  [ ] drop UnconsumedInput, replace with regular [ false, ... ] output
12
13
  [ ] x * '?' / x * '+' / x * '*' as shortcuts
14
+ [ ] memoization (only at non-terminal level?)
13
15
 
14
- `x` + [.]
15
- `x` + [a-z]
16
- `x` + c('a-z')
17
- `x` + _('a-z')
16
+ [ ] "xxx" instead of `xxx` (trick on the right side)
18
17
 
data/lib/neg.rb CHANGED
@@ -1,3 +1,4 @@
1
1
 
2
+ require 'neg/version'
2
3
  require 'neg/parser'
3
4
 
@@ -0,0 +1,66 @@
1
+ #--
2
+ # Copyright (c) 2012-2013, John Mettraux, jmettraux@gmail.com
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the "Software"), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in
12
+ # all copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
20
+ # THE SOFTWARE.
21
+ #
22
+ # Made in Japan.
23
+ #++
24
+
25
+
26
+ module Neg
27
+
28
+ class NegError < StandardError; end
29
+
30
+ class UnconsumedInputError < NegError; end
31
+ class ParserError < NegError; end
32
+
33
+ class ParseError < NegError
34
+
35
+ attr_reader :tree
36
+
37
+ def initialize(tree)
38
+
39
+ @tree = tree
40
+ @nodes = list_nodes(tree)
41
+
42
+ super(deepest_error[3])
43
+ end
44
+
45
+ def errors
46
+
47
+ @nodes.select { |n| n[2] == false && n[3].is_a?(String) }
48
+ end
49
+
50
+ def deepest_error
51
+
52
+ errors.inject { |e, n| e[1][0] < n[1][0] ? n : e }
53
+ end
54
+
55
+ protected
56
+
57
+ def list_nodes(start, accumulator=[])
58
+
59
+ accumulator << start
60
+ start[4].each { |n| list_nodes(n, accumulator) }
61
+
62
+ accumulator
63
+ end
64
+ end
65
+ end
66
+
@@ -1,5 +1,5 @@
1
1
  #--
2
- # Copyright (c) 2012-2012, John Mettraux, jmettraux@gmail.com
2
+ # Copyright (c) 2012-2013, John Mettraux, jmettraux@gmail.com
3
3
  #
4
4
  # Permission is hereby granted, free of charge, to any person obtaining a copy
5
5
  # of this software and associated documentation files (the "Software"), to deal
@@ -1,5 +1,5 @@
1
1
  #--
2
- # Copyright (c) 2012-2012, John Mettraux, jmettraux@gmail.com
2
+ # Copyright (c) 2012-2013, John Mettraux, jmettraux@gmail.com
3
3
  #
4
4
  # Permission is hereby granted, free of charge, to any person obtaining a copy
5
5
  # of this software and associated documentation files (the "Software"), to deal
@@ -22,20 +22,29 @@
22
22
  # Made in Japan.
23
23
  #++
24
24
 
25
- require 'neg/version'
26
25
  require 'neg/input'
26
+ require 'neg/errors'
27
+ require 'neg/translator'
27
28
 
28
29
 
29
30
  module Neg
30
31
 
31
- class UnconsumedInputError < StandardError; end
32
- class ParseError < StandardError; end
33
-
34
32
  class Parser
35
33
 
36
34
  def self.`(s) ; StringParser.new(s); end
37
35
  def self._(c=nil) ; CharacterParser.new(c); end
38
36
 
37
+ def self.parser(&block)
38
+
39
+ self.instance_eval(&block)
40
+ end
41
+
42
+ def self.translator(&block)
43
+
44
+ @translator = Class.new(Neg::Translator)
45
+ @translator.instance_eval(&block)
46
+ end
47
+
39
48
  def self.method_missing(m, *args)
40
49
 
41
50
  return super if args.any?
@@ -44,22 +53,30 @@ module Neg
44
53
  @root ||= m
45
54
  pa = NonTerminalParser.new(m)
46
55
 
47
- (class << self; self; end).send(:define_method, m) { pa }
56
+ (class << self; self; end).__send__(:define_method, m) { pa }
48
57
 
49
58
  pa
50
59
  end
51
60
 
52
- def self.parse(s)
61
+ def self.parse(s, opts={})
53
62
 
54
63
  i = Neg::Input(s)
55
64
 
56
- result = send(@root).parse(i)
65
+ result = __send__(@root).parse(i, opts)
57
66
 
58
67
  raise UnconsumedInputError.new(
59
68
  "remaining: #{i.remains.inspect}"
60
69
  ) if result[2] && ( ! i.eoi?)
61
70
 
62
- result
71
+ if @translator && opts[:translate] != false
72
+ if result[2]
73
+ @translator.translate(result)
74
+ else
75
+ raise ParseError.new(result)
76
+ end
77
+ else
78
+ result
79
+ end
63
80
  end
64
81
 
65
82
  def self.to_s
@@ -71,12 +88,12 @@ module Neg
71
88
  m = method(mname)
72
89
 
73
90
  next if m.owner == Class
74
- next if %w[ _ to_s ].include?(mname.to_s)
91
+ next if %w[ _ to_s parser translator ].include?(mname.to_s)
75
92
  next unless m.arity == (RUBY_VERSION > '1.9' ? 0 : -1)
76
93
  next unless m.owner.ancestors.include?(Class)
77
94
  next unless m.receiver.ancestors.include?(Neg::Parser)
78
95
 
79
- s << " #{send(mname).to_s}"
96
+ s << " #{__send__(mname).to_s}"
80
97
  end
81
98
 
82
99
  s << " root: #{@root}"
@@ -97,15 +114,19 @@ module Neg
97
114
  def ~ ; LookaheadParser.new(self, true); end
98
115
  def -@ ; LookaheadParser.new(self, false); end
99
116
 
100
- def parse(input_or_string)
117
+ def parse(input_or_string, opts)
101
118
 
102
119
  input = Neg::Input(input_or_string)
103
120
  start = input.position
104
121
 
105
- success, result, children = do_parse(input)
122
+ success, result, children = do_parse(input, opts)
106
123
 
107
124
  input.rewind(start) unless success
108
125
 
126
+ #if success && children.size == 1 && children.first[1] == start
127
+ # return children.first
128
+ #end
129
+
109
130
  [ nil, start, success, result, children ]
110
131
  end
111
132
  end
@@ -123,39 +144,38 @@ module Neg
123
144
  @child = pa
124
145
  end
125
146
 
126
- # def reduce(children_results)
127
- #
128
- # children_results.collect { |cr|
129
- # if cr[0] && cr[0] != :digit
130
- # false
131
- # elsif cr[2]
132
- # cr[3] ? cr[3] : reduce(cr[4])
133
- # else
134
- # nil
135
- # end
136
- # }.flatten.compact
137
- # end
147
+ def reduce(children_results)
148
+
149
+ children_results.collect { |cr|
150
+ if cr[0] && cr[0].to_s.match(/^_/).nil?
151
+ false
152
+ elsif cr[2]
153
+ cr[3] ? cr[3] : reduce(cr[4])
154
+ else
155
+ nil
156
+ end
157
+ }.flatten.compact
158
+ end
138
159
 
139
- def do_parse(i)
160
+ def do_parse(i, opts)
140
161
 
141
- raise ParseError.new("\"#{@name}\" is missing") if @child.nil?
162
+ raise ParserError.new("\"#{@name}\" is missing") if @child.nil?
142
163
 
143
- r = @child.do_parse(i)
164
+ r = @child.do_parse(i, opts)
144
165
 
145
- return r
166
+ return r if r[0] == false
167
+ return r if r[1].is_a?(String)
146
168
 
147
- # return r if r[0] == false
148
- #
149
- # report = reduce(r[2])
150
- #
151
- # return r if report.include?(false)
152
- #
153
- # [ true, report.join, [] ]
169
+ report = reduce(r[2])
170
+
171
+ return r if report.include?(false)
172
+
173
+ [ true, report.join, opts[:noreduce] ? r[2] : [] ]
154
174
  end
155
175
 
156
- def parse(input_or_string)
176
+ def parse(input_or_string, opts)
157
177
 
158
- r = super(input_or_string)
178
+ r = super
159
179
  r[0] = @name
160
180
 
161
181
  r
@@ -189,12 +209,12 @@ module Neg
189
209
  end
190
210
  end
191
211
 
192
- def do_parse(i)
212
+ def do_parse(i, opts)
193
213
 
194
214
  rs = []
195
215
 
196
216
  loop do
197
- r = @child.parse(i)
217
+ r = @child.parse(i, opts)
198
218
  break if ! r[2] && rs.size >= @min && (@max.nil? || rs.size <= @max)
199
219
  rs << r
200
220
  break if ! r[2]
@@ -219,7 +239,7 @@ module Neg
219
239
  @s = s
220
240
  end
221
241
 
222
- def do_parse(i)
242
+ def do_parse(i, opts)
223
243
 
224
244
  if (s = i.read(@s.length)) == @s
225
245
  [ true, @s, [] ]
@@ -242,7 +262,7 @@ module Neg
242
262
  @r = Regexp.new(c ? "[#{c}]" : '.')
243
263
  end
244
264
 
245
- def do_parse(i)
265
+ def do_parse(i, opts)
246
266
 
247
267
  if (s = i.read(1)).match(@r)
248
268
  [ true, s, [] ]
@@ -274,13 +294,13 @@ module Neg
274
294
  self
275
295
  end
276
296
 
277
- def do_parse(i)
297
+ def do_parse(i, opts)
278
298
 
279
299
  results = []
280
300
 
281
301
  @children.each do |c|
282
302
 
283
- results << c.parse(i)
303
+ results << c.parse(i, opts)
284
304
  break unless results.last[2]
285
305
  end
286
306
 
@@ -302,15 +322,17 @@ module Neg
302
322
  self
303
323
  end
304
324
 
305
- def do_parse(i)
325
+ def do_parse(i, opts)
306
326
 
307
327
  results = []
308
328
 
309
329
  @children.each { |c|
310
- results << c.parse(i)
330
+ results << c.parse(i, opts)
311
331
  break if results.last[2]
312
332
  }
313
333
 
334
+ results = results[-1, 1] if results.last[2] && ! opts[:noreduce]
335
+
314
336
  [ results.last[2], nil, results ]
315
337
  end
316
338
 
@@ -328,18 +350,18 @@ module Neg
328
350
  @presence = presence
329
351
  end
330
352
 
331
- def do_parse(i)
353
+ def do_parse(i, opts)
332
354
 
333
355
  start = i.position
334
356
 
335
- r = @child.parse(i)
357
+ r = @child.parse(i, opts)
336
358
  i.rewind(start)
337
359
 
338
360
  success = r[2]
339
361
  success = ! success if ! @presence
340
362
 
341
363
  result = if success
342
- nil
364
+ '' # for NonTerminal#reduce not to continue
343
365
  else
344
366
  [
345
367
  @child.to_s(nil), 'is not', @presence ? 'present' : 'absent'