whittle 0.0.2 → 0.0.3
- data/README.md +41 -13
- data/lib/whittle/parser.rb +38 -48
- data/lib/whittle/version.rb +1 -1
- data/spec/unit/parser/surplus_input_spec.rb +37 -0
- metadata +6 -4
data/README.md
CHANGED
@@ -83,9 +83,10 @@ program, which in this case is the `:expr` rule that can add two numbers togethe
 
 There are two terminal rules (`"+"` and `:int`) and one nonterminal (`:expr`) in the above
 grammar. Each rule can have a block attached to it. The block is invoked with the result
-evaluating
+evaluating each of its inputs via their own blocks (in a depth-first manner). The default
 action if no block is given, is to return whatever the leftmost input to the rule happens to
-be.
+be. We use `#as` to provide an action that actually does something meaningful with the
+inputs.
 
 We can optionally use the Hash notation to map a name with a pattern (or a fixed string) when
 we declare terminal rules too, as we have done with the `:int` rule above. Note that the
@@ -94,16 +95,26 @@ block, but since this is such a common use-case, Whittle offers the shorthand.
 
 As the input string is parsed, it *must* match the start rule `:expr`.
 
-Let's step through the parse for the above input "1+2".
-
-
-
-
-
-
-
-
-
+Let's step through the parse for the above input "1+2".
+
+- When the parser starts, it looks at the start rule `:expr` and decides what tokens would
+be valid if they were encountered.
+- Since `:expr` starts with `:int`, the only thing that would be valid is anything matching
+`/[0-9]+/`.
+- When the parser reads the "1", it recognizes it as an `:int`, evaluates its block (thus
+casting it to an Integer), and moves it aside (puts it on the stack, to be precise).
+- Now it advances through the rule for `:expr` and decides the only valid input would be a
+"+"
+- Upon reading the "+", the rule for "+" is invoked (which does nothing) and the "+" is put
+on the stack, along with the `:int` we already have.
+- Now the parser's only valid input is another `:int`, which it gets from the "2", casting
+it to an Integer according to its block, and putting it on the stack.
+- Finally, upon having read the sequence `:int`, "+", `:int`, our block attached to that
+particular rule is invoked to return a result by summing the 1 and the 2 to make 3. Magic!
+
+This was a simple parse. At each point there was only one valid input. As we'll see, parses
+can be arbitrarily complex, without increasing the amount of work needed to process the input
+string.
 
 ## Nonterminal rules can have more than one valid sequence
 
@@ -474,12 +485,29 @@ would probably be a useful exercise.
 
 If you have any examples you'd like to contribute, I will gladly add them to the repository.
 
+## Issues & Questions
+
+Any issues, I will address them quickly as it is still early days, though I am pretty confident,
+since this is based on a scientific algorithm, issues would be relatively minor. Post them to
+the issue tracker:
+
+- https://github.com/d11wtq/whittle/issues
+
+If you have any suggestions for how I might improve the DSL in order to be more human-friendly,
+you can suggest those in the issue tracker too.
+
+For any "how do I do this?" type questions, you can message me directly (via my github profile
+page):
+
+- https://github.com/d11wtq
+
+Or simply post an issue.
+
 ## TODO
 
 - Provide a more powerful (state based) lexer algorithm, or at least document how users can
 override `#lex`.
 - Allow inspection of the parse table (it is not very human friendly right now).
-- Allow inspection of the AST (maybe).
 - Given in an input String, provide a human readble explanation of the parse.
 
 ## License & Copyright
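Reviewer note: the step-through added to the README above can be traced in a few lines of plain Ruby. This is only a sketch under stated assumptions. It hard-codes the single `:int`, "+", `:int` sequence instead of building a parse table, and `TERMINALS`, `EXPR_SEQUENCE`, `EXPR_ACTION` and `walk` are hypothetical names, not part of Whittle.

```ruby
require "strscan"

# Hypothetical stand-ins for the README grammar; this is not Whittle's API.
# Each terminal maps to a pattern and the block that would be attached to it.
TERMINALS = {
  :int => [/[0-9]+/, lambda { |s| Integer(s) }],
  "+"  => [/\+/,     lambda { |s| s }] # no block given: the input is returned as-is
}

# The only sequence for :expr, plus the block that #as would attach to it.
EXPR_SEQUENCE = [:int, "+", :int]
EXPR_ACTION   = lambda { |left, _, right| left + right }

# Reads the input left to right, pushing each evaluated token onto a stack,
# then reduces the completed sequence to a single value.
def walk(input)
  scanner = StringScanner.new(input)
  stack   = []
  trace   = []

  EXPR_SEQUENCE.each do |expected|
    pattern, action = TERMINALS.fetch(expected)
    text = scanner.scan(pattern)
    raise ArgumentError, "expected #{expected.inspect} at offset #{scanner.pos}" if text.nil?

    stack << action.call(text)
    trace << "read #{text.inspect} as #{expected.inspect}, stack is now #{stack.inspect}"
  end

  raise ArgumentError, "surplus input at offset #{scanner.pos}" unless scanner.eos?

  result = EXPR_ACTION.call(*stack.pop(EXPR_SEQUENCE.length))
  trace << "reduced to :expr with value #{result.inspect}"
  [result, trace]
end

result, trace = walk("1+2")
puts trace
puts result # prints 3
```

At every step exactly one terminal is acceptable, which is why a fixed sequence is enough here; grammars with more than one valid sequence need the table-driven approach used in `parser.rb`.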
data/lib/whittle/parser.rb
CHANGED
@@ -19,25 +19,21 @@ module Whittle
 # @example A simple Whittle Parser
 #
 # class Calculator < Whittle::Parser
-# rule(:wsp
-# r[/s+/] # skip whitespace
-# end
+# rule(:wsp => /\s+/).skip!
 #
-# rule(:int)
-# r[/[0-9]+/].as { |i| Integer(i) }
-# end
+# rule(:int => /[0-9]+/).as { |i| Integer(i) }
 #
-# rule("+") % :left
-# rule("-") % :left
-# rule("/") % :left
-# rule("*") % :left
+# rule("+") % :left ^ 1
+# rule("-") % :left ^ 1
+# rule("/") % :left ^ 2
+# rule("*") % :left ^ 2
 #
 # rule(:expr) do |r|
 # r[:expr, "+", :expr].as { |left, _, right| left + right }
 # r[:expr, "-", :expr].as { |left, _, right| left - right }
 # r[:expr, "/", :expr].as { |left, _, right| left / right }
 # r[:expr, "*", :expr].as { |left, _, right| left * right }
-# r[:int]
+# r[:int]
 # end
 #
 # start(:expr)
@@ -158,11 +154,11 @@ module Whittle
     raise GrammarError, "Undefined start rule #{start.inspect}" unless rules.key?(start)
 
     if rules[start].terminal?
-      rule(
+      rule(:$start) do |r|
         r[start].as { |prog| prog }
       end
 
-      start(
+      start(:$start)
     end
   end
 end
@@ -177,12 +173,13 @@ module Whittle
 # Accepts input in the form of a String and attempts to parse it according to the grammar.
 #
 # The input is scanned using a lexical analysis routine, defined by the #lex method. Each
-# token detected by the routine is used to pick an action from the parse table.
-# reduction initially builds a branch in an AST (abstract syntax tree), until all input has
-# been read and the start rule has been recognized, at which point the AST is evaluated by
-# invoking the callbacks defined in the grammar in a depth-first fashion.
+# token detected by the routine is used to pick an action from the parse table.
 #
-#
+# Each time a sequence of inputs has been read that concludes a rule in the grammar, the
+# inputs are passed as arguments to the block for that rule, converting the sequence into
+# single input before the parse continues.
+#
+# If the parser encounters a token it does not expect, a parse error will be raised,
 # specifying what was expected, what was received, and on which line the error occurred.
 #
 # A successful parse returns the result of evaluating the start rule, whatever that may be.
@@ -200,39 +197,32 @@ module Whittle
 
   lex(input) do |token|
     line = token[:line]
-    input = token
 
-
-
-        state = table[states.last]
+    loop do
+      state = table[states.last]
 
-
-
-
-
-
-
-
-
-
-
-
-              :name => ins[:rule].name,
-              :line => line,
-              :args => args.pop(size)
-            }
-            states.pop(size)
-            args << input
+      if instruction = state[token[:name]] || state[nil]
+        case instruction[:action]
+        when :shift
+          states << instruction[:state]
+          args << token[:rule].action.call(token[:value])
+          break
+        when :reduce
+          rule = instruction[:rule]
+          size = rule.components.length
+          args << rule.action.call(*args.pop(size))
+          states.pop(size)
 
-
-
-
-
+          if states.length == 1 && token[:name] == :$end
+            return args.pop
+          elsif goto = table[states.last][rule.name]
+            states << goto[:state]
+            next
           end
-        else
-          error(state, input, :states => states, :args => args)
         end
       end
+
+      error(state, token, :states => states, :args => args)
     end
   end
 end
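Reviewer note: the rewritten loop in this hunk is a conventional shift/reduce driver that runs each rule's block at reduce time instead of building an AST first. The sketch below shows the same control flow (`break` after a shift, `next` after a reduce so the same token is looked at again) against hand-written tables for `expr := int "+" int`. It is not Whittle's code: `ACTIONS`, `GOTOS`, `EXPR_RULE`, `SketchParseError`, `lex` and `parse` are hypothetical names, acceptance uses an explicit `:accept` entry rather than the `states.length == 1` check in the diff, gotos live in a separate table, and the toy lexer casts integers itself instead of calling a terminal rule's action on shift.

```ruby
# Hypothetical names throughout; hand-written tables for: expr := int "+" int

class SketchParseError < StandardError
  attr_reader :expected, :received

  def initialize(expected, received)
    @expected = expected
    @received = received
    super("expected #{expected.inspect}, received #{received.inspect}")
  end
end

EXPR_RULE = {
  :name   => :expr,
  :size   => 3,
  :action => lambda { |left, _, right| left + right }
}

# A nil key means "whatever the next token is", like the state[nil] lookup in the diff.
ACTIONS = {
  0 => { :int  => { :action => :shift, :state => 1 } },
  1 => { "+"   => { :action => :shift, :state => 2 } },
  2 => { :int  => { :action => :shift, :state => 3 } },
  3 => { nil   => { :action => :reduce, :rule => EXPR_RULE } },
  4 => { :$end => { :action => :accept } }
}

# After reducing to :expr with state 0 on top of the stack, continue in state 4.
GOTOS = { 0 => { :expr => 4 } }

# Toy lexer: characters other than digits and "+" are silently ignored.
def lex(input)
  tokens = input.scan(/[0-9]+|\+/).map do |text|
    if text == "+"
      { :name => "+", :value => text }
    else
      { :name => :int, :value => Integer(text, 10) }
    end
  end
  tokens << { :name => :$end, :value => nil }
end

def parse(input)
  states = [0]
  args   = []

  lex(input).each do |token|
    loop do
      state       = ACTIONS[states.last]
      instruction = state[token[:name]] || state[nil]

      case instruction && instruction[:action]
      when :shift
        states << instruction[:state]
        args   << token[:value]
        break # token consumed; move on to the next one
      when :reduce
        rule = instruction[:rule]
        args << rule[:action].call(*args.pop(rule[:size]))
        states.pop(rule[:size])
        states << GOTOS[states.last][rule[:name]]
        next # look at the same token again from the new state
      when :accept
        return args.pop
      else
        raise SketchParseError.new(state.keys, token[:value])
      end
    end
  end
end

puts parse("1+2") # prints 3
```

With these tables, surplus input such as `"1+2+3"` fails in state 4, where only `:$end` is acceptable, which is the behaviour the new `surplus_input_spec.rb` checks for in the real parser.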
@@ -282,7 +272,7 @@ module Whittle
 # @param [Hash] stack
 # the current parse context (arg stack + state stack)
 def error(state, input, stack)
-  expected = state
+  expected = extract_expected_tokens(state)
   message = <<-ERROR.gsub(/\n\s+/, " ").strip
     Parse error:
     expected
@@ -309,8 +299,8 @@ module Whittle
       nil
     end
 
-    def
-
+    def extract_expected_tokens(state)
+      state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
     end
   end
 end
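Reviewer note: to make the new helper concrete, here is its body lifted out of the class and run against a made-up parse-table row. The method body is copied from the hunk above; the `state` hash is a hypothetical example, and the real table entries may carry other keys.

```ruby
# Same body as the method added in the diff, defined at top level so it runs on its own.
def extract_expected_tokens(state)
  state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
end

# Hypothetical parse-table row: two token entries, one goto, and the nil entry.
state = {
  "+"   => { :action => :shift,  :state => 5 },
  :int  => { :action => :shift,  :state => 2 },
  :expr => { :action => :goto,   :state => 7 },
  nil   => { :action => :reduce, :rule  => :placeholder }
}

p extract_expected_tokens(state) # prints ["+", :int, :$end]
```

Goto entries are keyed by nonterminal names, which never appear in the input, so dropping them leaves only the tokens a user could actually supply; the `nil` key is reported as `:$end`.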
data/lib/whittle/version.rb
CHANGED
data/spec/unit/parser/surplus_input_spec.rb
ADDED
@@ -0,0 +1,37 @@
+require "spec_helper"
+
+describe "a parser expecting a fixed amount of input" do
+  let(:parser) do
+    Class.new(Whittle::Parser) do
+      rule("a")
+      rule("b")
+      rule("c")
+
+      rule(:prog) do |r|
+        r["a", "b", "c"]
+      end
+
+      start(:prog)
+    end
+  end
+
+  it "raises a parse error if additional input is encountered" do
+    expect { parser.new.parse("abcabc") }.to raise_error(Whittle::ParseError)
+  end
+
+  it "indicates that :$end is the expected token" do
+    begin
+      parser.new.parse("abcabc")
+    rescue Whittle::ParseError => e
+      e.expected.should == [:$end]
+    end
+  end
+
+  it "indicates that the first surplus token is the received input" do
+    begin
+      parser.new.parse("abcabc")
+    rescue Whittle::ParseError => e
+      e.received.should == "a"
+    end
+  end
+end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: whittle
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.3
 prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-11-
+date: 2011-11-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &
+  requirement: &70110140399280 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -21,7 +21,7 @@ dependencies:
         version: '2.6'
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *70110140399280
 description: ! "Write powerful parsers by defining a series of very simple rules\n
   \ and operations to perform as those rules are matched. Whittle\n
   \ parsers are written in pure ruby and as such are extremely
@@ -62,6 +62,7 @@ files:
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb
+- spec/unit/parser/surplus_input_spec.rb
 - spec/unit/parser/typecast_parser_spec.rb
 - whittle.gemspec
 homepage: https://github.com/d11wtq/whittle
@@ -101,5 +102,6 @@ test_files:
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb
+- spec/unit/parser/surplus_input_spec.rb
 - spec/unit/parser/typecast_parser_spec.rb
 has_rdoc: