neg 0.3.0 → 1.0.0
- data/CHANGELOG.md +1 -1
- data/LICENSE.txt +1 -1
- data/README.md +168 -1
- data/TODO.txt +4 -5
- data/lib/neg.rb +1 -0
- data/lib/neg/errors.rb +66 -0
- data/lib/neg/input.rb +1 -1
- data/lib/neg/parser.rb +71 -49
- data/lib/neg/translator.rb +76 -0
- data/lib/neg/version.rb +2 -2
- data/spec/parser_alternative_spec.rb +8 -3
- data/spec/parser_character_spec.rb +2 -19
- data/spec/parser_lookahead_parser_spec.rb +18 -16
- data/spec/parser_non_terminal_spec.rb +9 -12
- data/spec/parser_repetition_spec.rb +5 -14
- data/spec/parser_sequence_spec.rb +2 -15
- data/spec/parser_spec.rb +15 -0
- data/spec/parser_string_spec.rb +3 -3
- data/spec/sample_arith_spec.rb +90 -0
- data/spec/sample_compact_spec.rb +50 -0
- data/spec/sample_json_parser_spec.rb +169 -50
- metadata +12 -2
data/CHANGELOG.md
CHANGED
data/LICENSE.txt
CHANGED
@@ -1,5 +1,5 @@
|
|
1
1
|
|
2
|
-
Copyright (c) 2012-
|
2
|
+
Copyright (c) 2012-2013, John Mettraux, jmettraux@gmail.com
|
3
3
|
|
4
4
|
Permission is hereby granted, free of charge, to any person obtaining a copy
|
5
5
|
of this software and associated documentation files (the "Software"), to deal
|
data/README.md
CHANGED
@@ -1,10 +1,177 @@
|
|
1
1
|
|
2
2
|
# neg
|
3
3
|
|
4
|
-
A neg
|
4
|
+
A neg narser.
|
5
5
|
|
6
6
|
A silly little exploration project.
|
7
7
|
|
8
|
+
It could have been "peg" as in "peg, a peg parser" but that would have been presumptuous, it could have been "leg" as in "leg, a leg larser", but there is already a [leg](http://piumarta.com/software/peg/peg.1.html), so it became "neg" as in "neg, a neg narser". It sounds neg-ative, but whatever, it's just a toy project.
|
9
|
+
|
10
|
+
|
11
|
+
## Ruby PEG libraries
|
12
|
+
|
13
|
+
Ruby has many such libraries. Here are three preeminent ones:
|
14
|
+
|
15
|
+
* Treetop: <http://treetop.rubyforge.org/>
|
16
|
+
* Citrus: <http://mjijackson.com/citrus/>
|
17
|
+
* Parslet: <http://kschiess.github.com/parslet/>
|
18
|
+
|
19
|
+
My favourite is Parslet. Neg is born out of the ashes of contribution attempts to Parslet. Studying this great library made me want to implement my own mini PEG library, for the fun of it.
|
20
|
+
|
21
|
+
So if you're looking for something robust and battle-tested, something for the long term, stop reading here and use one of the three gems above. IMHO, [Parslet](http://kschiess.github.com/parslet/) stands above for its error reporting.
|
22
|
+
|
23
|
+
|
24
|
+
## expressing a grammar with neg
|
25
|
+
|
26
|
+
Here is the classical arithmetic example:
|
27
|
+
|
28
|
+
```ruby
|
29
|
+
class ArithParser < Neg::Parser
|
30
|
+
|
31
|
+
expression == operation
|
32
|
+
|
33
|
+
operator == `+` | `-` | `*` | `/`
|
34
|
+
operation == value + (operator + value) * 0
|
35
|
+
value == parenthese | number
|
36
|
+
parenthese == `(` + expression + `)`
|
37
|
+
number == `-` * -1 + _('0-9') * 1
|
38
|
+
end
|
39
|
+
|
40
|
+
tree = ArithParser.parse("1+(2*12)")
|
41
|
+
```
|
42
|
+
|
43
|
+
(Note: this is Ruby code)
|
44
|
+
|
45
|
+
|
46
|
+
## grammar building blocks
|
47
|
+
|
48
|
+
```ruby
|
49
|
+
# leaves
|
50
|
+
|
51
|
+
StringParser
|
52
|
+
text == `foreach`
|
53
|
+
|
54
|
+
CharacterParser
|
55
|
+
stuff == _ # any character
|
56
|
+
stuff == _ * 1 # one or more of any character
|
57
|
+
stuff == _("0-9") * 1 # like /[0-9]+/
|
58
|
+
|
59
|
+
# composite
|
60
|
+
|
61
|
+
SequenceParser
|
62
|
+
sentence == subject + verb + object
|
63
|
+
|
64
|
+
AlternativeParser
|
65
|
+
subject == person | animal | place
|
66
|
+
|
67
|
+
# parentheses
|
68
|
+
sentence == (person | animal) + verb + (object | (`in ` + place))
|
69
|
+
|
70
|
+
# modifiers
|
71
|
+
|
72
|
+
RepetitionParser
|
73
|
+
text == `x` * 0 # 0 or more
|
74
|
+
text == `x` * 1 # 1 or more
|
75
|
+
text == `x` * -1 # 0 or 1
|
76
|
+
text == `x` * [2, 4] # 2, 3 or 4
|
77
|
+
|
78
|
+
LookaheadParser
|
79
|
+
x_then_z == `x` + ~`z` # presence
|
80
|
+
x_then_not_z == `x` + -`z` # absence
|
81
|
+
|
82
|
+
# naming
|
83
|
+
|
84
|
+
NonTerminalParser
|
85
|
+
brand == `mazda` | `ford` # "brand" is the non-terminal
|
86
|
+
|
87
|
+
NonTerminalParser (name is omitted in output parse tree)
|
88
|
+
_operator == `+` | `*` | `-` | `/`
|
89
|
+
|
90
|
+
Embedded naming (here "operator")
|
91
|
+
operation == number + (`+` | `-`)["operator"] + number
|
92
|
+
```
|
93
|
+
|
94
|
+
|
95
|
+
## parser output
|
96
|
+
|
97
|
+
Without a translator, the parser outputs a raw parse tree, something like:
|
98
|
+
|
99
|
+
```ruby
|
100
|
+
[ :json,
|
101
|
+
[ 0, 1, 1 ],
|
102
|
+
true,
|
103
|
+
nil,
|
104
|
+
[ [ :spaces?, [ 0, 1, 1 ], true, '', [] ],
|
105
|
+
[ :value, [ 0, 1, 1 ], true, nil, [
|
106
|
+
[ :bfalse, [ 0, 1, 1 ], true, 'false', [] ] ] ],
|
107
|
+
[ :spaces?, [ 5, 1, 6 ], true, '', [] ] ] ]
|
108
|
+
```
|
109
|
+
|
110
|
+
It's a nested assemblage of result nodes.
|
111
|
+
|
112
|
+
```ruby
|
113
|
+
[ rule_name, [ offset, line, column ], success?, result, children ]
|
114
|
+
#
|
115
|
+
# for example
|
116
|
+
[ :bfalse, [ 0, 1, 1 ], true, 'false', [] ]
|
117
|
+
```
|
118
|
+
|
119
|
+
In case of successful parsing, the nodes with success? == false all get pruned. In case of failed parsing, they are left in the output parse tree.
|
120
|
+
|
121
|
+
A translator turns a raw parse tree into some final result. Look below and at the JSON parser sample in the specs for more information. If the parse failed and a translator is present, a ParseError is raised.
|
122
|
+
|
123
|
+
|
124
|
+
## parser + translator
|
125
|
+
|
126
|
+
It's OK to stuff the translator inside of the parser:
|
127
|
+
|
128
|
+
```ruby
|
129
|
+
class CompactArithParser < Neg::Parser
|
130
|
+
|
131
|
+
parser do
|
132
|
+
|
133
|
+
expression == operation
|
134
|
+
|
135
|
+
operator == `+` | `-` | `*` | `/`
|
136
|
+
operation == value + (operator + value) * 0
|
137
|
+
value == parenthese | number
|
138
|
+
parenthese == `(` + expression + `)`
|
139
|
+
number == `-` * -1 + _('0-9') * 1
|
140
|
+
end
|
141
|
+
|
142
|
+
translator do
|
143
|
+
|
144
|
+
on(:number) { |n| n.result.to_i }
|
145
|
+
on(:operator) { |n| n.result }
|
146
|
+
on(:value) { |n| n.results.first }
|
147
|
+
|
148
|
+
on(:expression) { |n|
|
149
|
+
results = n.results.flatten(2)
|
150
|
+
results.size == 1 ? results.first : results
|
151
|
+
}
|
152
|
+
end
|
153
|
+
end
|
154
|
+
|
155
|
+
CompactArithParser.parse("1+2+3")
|
156
|
+
# => [ 1, '+', 2, '+', 3 ]
|
157
|
+
```
|
158
|
+
|
159
|
+
As said above, when a translator is present and the parsing fails (before the translator kicks in), a ParseError is raised, with fancy methods to navigate the failed parse tree.
|
160
|
+
|
161
|
+
|
162
|
+
## presentations
|
163
|
+
|
164
|
+
Neg was published on 2012-10-06, when it was presented to [Hiroshima.rb](http://hiroshimarb.github.com/).
|
165
|
+
|
166
|
+
The [very dry] deck of slides that accompanied it can be found at <https://speakerdeck.com/u/jmettraux/p/neg-a-neg-narser>.
|
167
|
+
|
168
|
+
|
169
|
+
## links
|
170
|
+
|
171
|
+
* source: <https://github.com/jmettraux/neg>
|
172
|
+
* issues: <https://github.com/jmettraux/neg/issues>
|
173
|
+
* irc: freenode.net #ruote
|
174
|
+
|
8
175
|
|
9
176
|
## license
|
10
177
|
|
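Reading aid for the "parser output" section of the README hunk above: a minimal sketch that walks a raw parse tree and collects the nodes carrying a string result. The `leaf_results` helper is hypothetical and not part of neg; it only assumes the node layout the README documents, and the tree literal is the README's own example.

```ruby
# Hypothetical helper, not part of neg. It walks a raw parse tree whose
# nodes follow the layout given in the README:
#   [ rule_name, [ offset, line, column ], success?, result, children ]
# and returns [ rule_name, result ] pairs for nodes with a String result.
def leaf_results(node, acc = [])
  name, _position, _success, result, children = node
  acc << [ name, result ] if result.is_a?(String)
  children.each { |child| leaf_results(child, acc) }
  acc
end

# The raw tree shown in the README for the input "false".
README_TREE =
  [ :json, [ 0, 1, 1 ], true, nil, [
    [ :spaces?, [ 0, 1, 1 ], true, '', [] ],
    [ :value, [ 0, 1, 1 ], true, nil, [
      [ :bfalse, [ 0, 1, 1 ], true, 'false', [] ] ] ],
    [ :spaces?, [ 5, 1, 6 ], true, '', [] ] ] ]

p leaf_results(README_TREE)
```

Nodes such as `:json` and `:value` have a `nil` result and only group their children, so they do not appear in the collected pairs.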
data/TODO.txt
CHANGED
@@ -4,15 +4,14 @@
|
|
4
4
|
[o] switch from ^ to * (how * is related to +)
|
5
5
|
[o] _ (any)
|
6
6
|
[o] chars
|
7
|
-
[
|
7
|
+
[o] lookahead present/absent
|
8
8
|
~ x --> present
|
9
9
|
! x --> absent
|
10
|
+
|
10
11
|
[ ] blankslate
|
11
12
|
[ ] drop UnconsumedInput, replace with regular [ false, ... ] output
|
12
13
|
[ ] x * '?' / x * '+' / x * '*' as shortcuts
|
14
|
+
[ ] memoization (only at non-terminal level?)
|
13
15
|
|
14
|
-
|
15
|
-
`x` + [a-z]
|
16
|
-
`x` + c('a-z')
|
17
|
-
`x` + _('a-z')
|
16
|
+
[ ] "xxx" instead of `xxx` (trick on the right side)
|
18
17
|
|
data/lib/neg.rb
CHANGED
data/lib/neg/errors.rb
ADDED
@@ -0,0 +1,66 @@
|
|
1
|
+
#--
|
2
|
+
# Copyright (c) 2012-2013, John Mettraux, jmettraux@gmail.com
|
3
|
+
#
|
4
|
+
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
5
|
+
# of this software and associated documentation files (the "Software"), to deal
|
6
|
+
# in the Software without restriction, including without limitation the rights
|
7
|
+
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
8
|
+
# copies of the Software, and to permit persons to whom the Software is
|
9
|
+
# furnished to do so, subject to the following conditions:
|
10
|
+
#
|
11
|
+
# The above copyright notice and this permission notice shall be included in
|
12
|
+
# all copies or substantial portions of the Software.
|
13
|
+
#
|
14
|
+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
15
|
+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
16
|
+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
17
|
+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
18
|
+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
19
|
+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
20
|
+
# THE SOFTWARE.
|
21
|
+
#
|
22
|
+
# Made in Japan.
|
23
|
+
#++
|
24
|
+
|
25
|
+
|
26
|
+
module Neg
|
27
|
+
|
28
|
+
class NegError < StandardError; end
|
29
|
+
|
30
|
+
class UnconsumedInputError < NegError; end
|
31
|
+
class ParserError < NegError; end
|
32
|
+
|
33
|
+
class ParseError < NegError
|
34
|
+
|
35
|
+
attr_reader :tree
|
36
|
+
|
37
|
+
def initialize(tree)
|
38
|
+
|
39
|
+
@tree = tree
|
40
|
+
@nodes = list_nodes(tree)
|
41
|
+
|
42
|
+
super(deepest_error[3])
|
43
|
+
end
|
44
|
+
|
45
|
+
def errors
|
46
|
+
|
47
|
+
@nodes.select { |n| n[2] == false && n[3].is_a?(String) }
|
48
|
+
end
|
49
|
+
|
50
|
+
def deepest_error
|
51
|
+
|
52
|
+
errors.inject { |e, n| e[1][0] < n[1][0] ? n : e }
|
53
|
+
end
|
54
|
+
|
55
|
+
protected
|
56
|
+
|
57
|
+
def list_nodes(start, accumulator=[])
|
58
|
+
|
59
|
+
accumulator << start
|
60
|
+
start[4].each { |n| list_nodes(n, accumulator) }
|
61
|
+
|
62
|
+
accumulator
|
63
|
+
end
|
64
|
+
end
|
65
|
+
end
|
66
|
+
|
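To make the error selection in `ParseError` easier to follow, here is a standalone restatement of `list_nodes`, `errors` and `deepest_error` as plain functions, applied to a hand-built failed tree. The function signatures taking a `tree` argument and the error messages in the sample tree are assumptions made for illustration; they are not real neg output.

```ruby
# Standalone restatement of the ParseError helpers, as plain functions.
# Nodes are [ name, [ offset, line, column ], success?, result, children ].

# Flattens the tree into a list, parents before children.
def list_nodes(start, accumulator = [])
  accumulator << start
  start[4].each { |n| list_nodes(n, accumulator) }
  accumulator
end

# Failed nodes that carry a String message.
def errors(tree)
  list_nodes(tree).select { |n| n[2] == false && n[3].is_a?(String) }
end

# The failed node with the greatest offset; on equal offsets the node
# encountered first is kept.
def deepest_error(tree)
  errors(tree).inject { |e, n| e[1][0] < n[1][0] ? n : e }
end

# Hand-built failed tree with made-up messages.
FAILED_TREE =
  [ :expression, [ 0, 1, 1 ], false, nil, [
    [ :number, [ 0, 1, 1 ], false, 'expected digit at 0', [] ],
    [ :operation, [ 0, 1, 1 ], false, nil, [
      [ :number, [ 3, 1, 4 ], false, 'expected digit at 3', [] ] ] ] ] ]

p deepest_error(FAILED_TREE)[3]
```

One thing this sketch makes visible: when no failed node carries a String result, `inject` on the empty list returns `nil`, so an expression like `deepest_error[3]` would raise a `NoMethodError` in that case.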
data/lib/neg/input.rb
CHANGED
@@ -1,5 +1,5 @@
|
|
1
1
|
#--
|
2
|
-
# Copyright (c) 2012-
|
2
|
+
# Copyright (c) 2012-2013, John Mettraux, jmettraux@gmail.com
|
3
3
|
#
|
4
4
|
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
5
5
|
# of this software and associated documentation files (the "Software"), to deal
|
data/lib/neg/parser.rb
CHANGED
@@ -1,5 +1,5 @@
|
|
1
1
|
#--
|
2
|
-
# Copyright (c) 2012-
|
2
|
+
# Copyright (c) 2012-2013, John Mettraux, jmettraux@gmail.com
|
3
3
|
#
|
4
4
|
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
5
5
|
# of this software and associated documentation files (the "Software"), to deal
|
@@ -22,20 +22,29 @@
|
|
22
22
|
# Made in Japan.
|
23
23
|
#++
|
24
24
|
|
25
|
-
require 'neg/version'
|
26
25
|
require 'neg/input'
|
26
|
+
require 'neg/errors'
|
27
|
+
require 'neg/translator'
|
27
28
|
|
28
29
|
|
29
30
|
module Neg
|
30
31
|
|
31
|
-
class UnconsumedInputError < StandardError; end
|
32
|
-
class ParseError < StandardError; end
|
33
|
-
|
34
32
|
class Parser
|
35
33
|
|
36
34
|
def self.`(s) ; StringParser.new(s); end
|
37
35
|
def self._(c=nil) ; CharacterParser.new(c); end
|
38
36
|
|
37
|
+
def self.parser(&block)
|
38
|
+
|
39
|
+
self.instance_eval(&block)
|
40
|
+
end
|
41
|
+
|
42
|
+
def self.translator(&block)
|
43
|
+
|
44
|
+
@translator = Class.new(Neg::Translator)
|
45
|
+
@translator.instance_eval(&block)
|
46
|
+
end
|
47
|
+
|
39
48
|
def self.method_missing(m, *args)
|
40
49
|
|
41
50
|
return super if args.any?
|
@@ -44,22 +53,30 @@ module Neg
|
|
44
53
|
@root ||= m
|
45
54
|
pa = NonTerminalParser.new(m)
|
46
55
|
|
47
|
-
(class << self; self; end).
|
56
|
+
(class << self; self; end).__send__(:define_method, m) { pa }
|
48
57
|
|
49
58
|
pa
|
50
59
|
end
|
51
60
|
|
52
|
-
def self.parse(s)
|
61
|
+
def self.parse(s, opts={})
|
53
62
|
|
54
63
|
i = Neg::Input(s)
|
55
64
|
|
56
|
-
result =
|
65
|
+
result = __send__(@root).parse(i, opts)
|
57
66
|
|
58
67
|
raise UnconsumedInputError.new(
|
59
68
|
"remaining: #{i.remains.inspect}"
|
60
69
|
) if result[2] && ( ! i.eoi?)
|
61
70
|
|
62
|
-
|
71
|
+
if @translator && opts[:translate] != false
|
72
|
+
if result[2]
|
73
|
+
@translator.translate(result)
|
74
|
+
else
|
75
|
+
raise ParseError.new(result)
|
76
|
+
end
|
77
|
+
else
|
78
|
+
result
|
79
|
+
end
|
63
80
|
end
|
64
81
|
|
65
82
|
def self.to_s
|
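The branching added to `Parser.parse` in the hunk above can be summarized with a small standalone sketch. `finish_parse` and `SketchParseError` are hypothetical stand-ins, not neg's own names, and the translator is modelled as anything responding to `call`; only the decision logic mirrors the diff.

```ruby
# Hypothetical stand-ins (not neg's classes) illustrating the decision made
# at the end of Parser.parse once a raw result node is available.
class SketchParseError < StandardError; end

# result:     raw node [ name, position, success?, result, children ]
# translator: anything responding to #call, or nil when none is defined
# opts:       :translate => false asks for the raw tree even with a translator
def finish_parse(result, translator, opts = {})
  if translator && opts[:translate] != false
    raise SketchParseError, 'parse failed' unless result[2]
    translator.call(result)
  else
    result # raw tree, whether the parse succeeded or not
  end
end

ok = [ :number, [ 0, 1, 1 ], true, '12', [] ]
ko = [ :number, [ 0, 1, 1 ], false, 'expected digit', [] ]
to_i = lambda { |node| node[3].to_i }

p finish_parse(ok, to_i)                      # translated value
p finish_parse(ok, to_i, :translate => false) # raw node
p finish_parse(ko, nil)                       # raw failed node, no exception
```

The sketch reflects the behaviour visible in the diff: a failed parse only raises when a translator is defined and translation has not been switched off; otherwise the caller gets the failed tree back and can inspect it.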
@@ -71,12 +88,12 @@ module Neg
|
|
71
88
|
m = method(mname)
|
72
89
|
|
73
90
|
next if m.owner == Class
|
74
|
-
next if %w[ _ to_s ].include?(mname.to_s)
|
91
|
+
next if %w[ _ to_s parser translator ].include?(mname.to_s)
|
75
92
|
next unless m.arity == (RUBY_VERSION > '1.9' ? 0 : -1)
|
76
93
|
next unless m.owner.ancestors.include?(Class)
|
77
94
|
next unless m.receiver.ancestors.include?(Neg::Parser)
|
78
95
|
|
79
|
-
s << " #{
|
96
|
+
s << " #{__send__(mname).to_s}"
|
80
97
|
end
|
81
98
|
|
82
99
|
s << " root: #{@root}"
|
@@ -97,15 +114,19 @@ module Neg
|
|
97
114
|
def ~ ; LookaheadParser.new(self, true); end
|
98
115
|
def -@ ; LookaheadParser.new(self, false); end
|
99
116
|
|
100
|
-
def parse(input_or_string)
|
117
|
+
def parse(input_or_string, opts)
|
101
118
|
|
102
119
|
input = Neg::Input(input_or_string)
|
103
120
|
start = input.position
|
104
121
|
|
105
|
-
success, result, children = do_parse(input)
|
122
|
+
success, result, children = do_parse(input, opts)
|
106
123
|
|
107
124
|
input.rewind(start) unless success
|
108
125
|
|
126
|
+
#if success && children.size == 1 && children.first[1] == start
|
127
|
+
# return children.first
|
128
|
+
#end
|
129
|
+
|
109
130
|
[ nil, start, success, result, children ]
|
110
131
|
end
|
111
132
|
end
|
@@ -123,39 +144,38 @@ module Neg
|
|
123
144
|
@child = pa
|
124
145
|
end
|
125
146
|
|
126
|
-
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
|
131
|
-
|
132
|
-
|
133
|
-
|
134
|
-
|
135
|
-
|
136
|
-
|
137
|
-
|
147
|
+
def reduce(children_results)
|
148
|
+
|
149
|
+
children_results.collect { |cr|
|
150
|
+
if cr[0] && cr[0].to_s.match(/^_/).nil?
|
151
|
+
false
|
152
|
+
elsif cr[2]
|
153
|
+
cr[3] ? cr[3] : reduce(cr[4])
|
154
|
+
else
|
155
|
+
nil
|
156
|
+
end
|
157
|
+
}.flatten.compact
|
158
|
+
end
|
138
159
|
|
139
|
-
def do_parse(i)
|
160
|
+
def do_parse(i, opts)
|
140
161
|
|
141
|
-
raise
|
162
|
+
raise ParserError.new("\"#{@name}\" is missing") if @child.nil?
|
142
163
|
|
143
|
-
r = @child.do_parse(i)
|
164
|
+
r = @child.do_parse(i, opts)
|
144
165
|
|
145
|
-
return r
|
166
|
+
return r if r[0] == false
|
167
|
+
return r if r[1].is_a?(String)
|
146
168
|
|
147
|
-
|
148
|
-
|
149
|
-
|
150
|
-
|
151
|
-
|
152
|
-
#
|
153
|
-
# [ true, report.join, [] ]
|
169
|
+
report = reduce(r[2])
|
170
|
+
|
171
|
+
return r if report.include?(false)
|
172
|
+
|
173
|
+
[ true, report.join, opts[:noreduce] ? r[2] : [] ]
|
154
174
|
end
|
155
175
|
|
156
|
-
def parse(input_or_string)
|
176
|
+
def parse(input_or_string, opts)
|
157
177
|
|
158
|
-
r = super
|
178
|
+
r = super
|
159
179
|
r[0] = @name
|
160
180
|
|
161
181
|
r
|
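The `reduce` method introduced in the hunk above collapses a non-terminal's subtree into a single string when that subtree contains no visible named node. The sketch below copies the logic into a plain function and runs it on hand-built children. The shape of those children (in particular what a repetition yields for zero matches) is an assumption for illustration, not captured neg output.

```ruby
# Standalone copy of the reduce logic, as a plain function.
# Nodes are [ name, position, success?, result, children ].
def reduce(children_results)
  children_results.collect { |cr|
    if cr[0] && cr[0].to_s.match(/^_/).nil?
      false                         # visible named node: do not collapse
    elsif cr[2]
      cr[3] ? cr[3] : reduce(cr[4]) # own string, else dig into the children
    else
      nil                           # failed branch, removed by compact
    end
  }.flatten.compact
end

POS = [ 0, 1, 1 ]

# Assumed children for a rule like  number == `-` * -1 + _('0-9') * 1
# matched against "12": an empty optional minus, then two digits.
DIGITS = [
  [ nil, POS, true, nil, [] ],
  [ nil, POS, true, nil, [
    [ nil, POS, true, '1', [] ],
    [ nil, POS, true, '2', [] ] ] ] ]

p reduce(DIGITS).join

# A visible named child yields false, which tells the caller to keep the tree.
p reduce(DIGITS + [ [ :operator, POS, true, '+', [] ] ]).include?(false)

# A name starting with "_" is treated like an anonymous node.
p reduce([ [ :_operator, POS, true, '+', [] ] ])
```

Since an empty string is truthy in Ruby, a node whose result is `''` stops the recursion at that node, which matches the comment on the lookahead parser's `''` result later in this diff.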
@@ -189,12 +209,12 @@ module Neg
|
|
189
209
|
end
|
190
210
|
end
|
191
211
|
|
192
|
-
def do_parse(i)
|
212
|
+
def do_parse(i, opts)
|
193
213
|
|
194
214
|
rs = []
|
195
215
|
|
196
216
|
loop do
|
197
|
-
r = @child.parse(i)
|
217
|
+
r = @child.parse(i, opts)
|
198
218
|
break if ! r[2] && rs.size >= @min && (@max.nil? || rs.size <= @max)
|
199
219
|
rs << r
|
200
220
|
break if ! r[2]
|
@@ -219,7 +239,7 @@ module Neg
|
|
219
239
|
@s = s
|
220
240
|
end
|
221
241
|
|
222
|
-
def do_parse(i)
|
242
|
+
def do_parse(i, opts)
|
223
243
|
|
224
244
|
if (s = i.read(@s.length)) == @s
|
225
245
|
[ true, @s, [] ]
|
@@ -242,7 +262,7 @@ module Neg
|
|
242
262
|
@r = Regexp.new(c ? "[#{c}]" : '.')
|
243
263
|
end
|
244
264
|
|
245
|
-
def do_parse(i)
|
265
|
+
def do_parse(i, opts)
|
246
266
|
|
247
267
|
if (s = i.read(1)).match(@r)
|
248
268
|
[ true, s, [] ]
|
@@ -274,13 +294,13 @@ module Neg
|
|
274
294
|
self
|
275
295
|
end
|
276
296
|
|
277
|
-
def do_parse(i)
|
297
|
+
def do_parse(i, opts)
|
278
298
|
|
279
299
|
results = []
|
280
300
|
|
281
301
|
@children.each do |c|
|
282
302
|
|
283
|
-
results << c.parse(i)
|
303
|
+
results << c.parse(i, opts)
|
284
304
|
break unless results.last[2]
|
285
305
|
end
|
286
306
|
|
@@ -302,15 +322,17 @@ module Neg
|
|
302
322
|
self
|
303
323
|
end
|
304
324
|
|
305
|
-
def do_parse(i)
|
325
|
+
def do_parse(i, opts)
|
306
326
|
|
307
327
|
results = []
|
308
328
|
|
309
329
|
@children.each { |c|
|
310
|
-
results << c.parse(i)
|
330
|
+
results << c.parse(i, opts)
|
311
331
|
break if results.last[2]
|
312
332
|
}
|
313
333
|
|
334
|
+
results = results[-1, 1] if results.last[2] && ! opts[:noreduce]
|
335
|
+
|
314
336
|
[ results.last[2], nil, results ]
|
315
337
|
end
|
316
338
|
|
@@ -328,18 +350,18 @@ module Neg
|
|
328
350
|
@presence = presence
|
329
351
|
end
|
330
352
|
|
331
|
-
def do_parse(i)
|
353
|
+
def do_parse(i, opts)
|
332
354
|
|
333
355
|
start = i.position
|
334
356
|
|
335
|
-
r = @child.parse(i)
|
357
|
+
r = @child.parse(i, opts)
|
336
358
|
i.rewind(start)
|
337
359
|
|
338
360
|
success = r[2]
|
339
361
|
success = ! success if ! @presence
|
340
362
|
|
341
363
|
result = if success
|
342
|
-
|
364
|
+
'' # for NonTerminal#reduce not to continue
|
343
365
|
else
|
344
366
|
[
|
345
367
|
@child.to_s(nil), 'is not', @presence ? 'present' : 'absent'
|