whittle 0.0.2 → 0.0.3
- data/README.md +41 -13
- data/lib/whittle/parser.rb +38 -48
- data/lib/whittle/version.rb +1 -1
- data/spec/unit/parser/surplus_input_spec.rb +37 -0
- metadata +6 -4
data/README.md
CHANGED
@@ -83,9 +83,10 @@ program, which in this case is the `:expr` rule that can add two numbers togethe
|
|
83
83
|
|
84
84
|
There are two terminal rules (`"+"` and `:int`) and one nonterminal (`:expr`) in the above
|
85
85
|
grammar. Each rule can have a block attached to it. The block is invoked with the result
|
86
|
-
evaluating
|
86
|
+
evaluating each of its inputs via their own blocks (in a depth-first manner). The default
|
87
87
|
action, if no block is given, is to return whatever the leftmost input to the rule happens to
|
88
|
-
be.
|
88
|
+
be. We use `#as` to provide an action that actually does something meaningful with the
|
89
|
+
inputs.
|
89
90
|
|
90
91
|
We can optionally use the Hash notation to map a name with a pattern (or a fixed string) when
|
91
92
|
we declare terminal rules too, as we have done with the `:int` rule above. Note that the
|
@@ -94,16 +95,26 @@ block, but since this is such a common use-case, Whittle offers the shorthand.
|
|
94
95
|
|
95
96
|
As the input string is parsed, it *must* match the start rule `:expr`.
|
96
97
|
|
97
|
-
Let's step through the parse for the above input "1+2".
|
98
|
-
|
99
|
-
|
100
|
-
|
101
|
-
|
102
|
-
|
103
|
-
|
104
|
-
|
105
|
-
|
106
|
-
|
98
|
+
Let's step through the parse for the above input "1+2".
|
99
|
+
|
100
|
+
- When the parser starts, it looks at the start rule `:expr` and decides what tokens would
|
101
|
+
be valid if they were encountered.
|
102
|
+
- Since `:expr` starts with `:int`, the only thing that would be valid is anything matching
|
103
|
+
`/[0-9]+/`.
|
104
|
+
- When the parser reads the "1", it recognizes it as an `:int`, evaluates its block (thus
|
105
|
+
casting it to an Integer), and moves it aside (puts it on the stack, to be precise).
|
106
|
+
- Now it advances through the rule for `:expr` and decides the only valid input would be a
|
107
|
+
"+"
|
108
|
+
- Upon reading the "+", the rule for "+" is invoked (which does nothing) and the "+" is put
|
109
|
+
on the stack, along with the `:int` we already have.
|
110
|
+
- Now the parser's only valid input is another `:int`, which it gets from the "2", casting
|
111
|
+
it to an Integer according to its block, and putting it on the stack.
|
112
|
+
- Finally, upon having read the sequence `:int`, "+", `:int`, our block attached to that
|
113
|
+
particular rule is invoked to return a result by summing the 1 and the 2 to make 3. Magic!
|
114
|
+
|
115
|
+
This was a simple parse. At each point there was only one valid input. As we'll see, parses
|
116
|
+
can be arbitrarily complex, without increasing the amount of work needed to process the input
|
117
|
+
string.
|
107
118
|
|
108
119
|
## Nonterminal rules can have more than one valid sequence
|
109
120
|
|
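The README walk-through of parsing "1+2" in the hunk above can be sketched in plain Ruby. This is only an illustration of the described steps (match a terminal, run its block, push the result on the stack, then run the block for the completed sequence). It does not use Whittle itself, and `TERMINALS`, `EXPR_SEQUENCE`, `EXPR_ACTION` and `parse_sum` are hypothetical names invented for this sketch.

```ruby
# Hypothetical stand-in for the README grammar; Whittle is not used here.
# Each terminal maps to [pattern, action].
TERMINALS = {
  :int => [/[0-9]+/, lambda { |s| Integer(s) }],  # cast the matched text to an Integer
  "+"  => [/\+/,     lambda { |s| s }]            # default action: return the input unchanged
}

# The single valid sequence for :expr, and the block attached to it.
EXPR_SEQUENCE = [:int, "+", :int]
EXPR_ACTION   = lambda { |left, _, right| left + right }

def parse_sum(input)
  stack = []
  rest  = input

  EXPR_SEQUENCE.each do |expected|
    pattern, action = TERMINALS.fetch(expected)

    # At each point only one kind of token is valid, as in the walk-through.
    match = rest.match(/\A(?:#{pattern.source})/)
    raise ArgumentError, "expected #{expected.inspect} at #{rest.inspect}" unless match

    stack << action.call(match[0])  # evaluate the terminal's block, then put it on the stack
    rest = match.post_match
  end

  raise ArgumentError, "unexpected trailing input #{rest.inspect}" unless rest.empty?

  EXPR_ACTION.call(*stack)          # the whole sequence has been read: run the :expr block
end

puts parse_sum("1+2")  # prints 3
```

Unlike a real table-driven parser, this sketch hard-codes the one sequence it accepts, which is enough to mirror the step-by-step description but nothing more.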
@@ -474,12 +485,29 @@ would probably be a useful exercise.
|
|
474
485
|
|
475
486
|
If you have any examples you'd like to contribute, I will gladly add them to the repository.
|
476
487
|
|
488
|
+
## Issues & Questions
|
489
|
+
|
490
|
+
Any issues, I will address them quickly as it is still early days, though I am pretty confident,
|
491
|
+
since this is based on a scientific algorithm, issues would be relatively minor. Post them to
|
492
|
+
the issue tracker:
|
493
|
+
|
494
|
+
- https://github.com/d11wtq/whittle/issues
|
495
|
+
|
496
|
+
If you have any suggestions for how I might improve the DSL in order to be more human-friendly,
|
497
|
+
you can suggest those in the issue tracker too.
|
498
|
+
|
499
|
+
For any "how do I do this?" type questions, you can message me directly (via my github profile
|
500
|
+
page):
|
501
|
+
|
502
|
+
- https://github.com/d11wtq
|
503
|
+
|
504
|
+
Or simply post an issue.
|
505
|
+
|
477
506
|
## TODO
|
478
507
|
|
479
508
|
- Provide a more powerful (state based) lexer algorithm, or at least document how users can
|
480
509
|
override `#lex`.
|
481
510
|
- Allow inspection of the parse table (it is not very human friendly right now).
|
482
|
-
- Allow inspection of the AST (maybe).
|
483
511
|
- Given an input String, provide a human-readable explanation of the parse.
|
484
512
|
|
485
513
|
## License & Copyright
|
data/lib/whittle/parser.rb
CHANGED
@@ -19,25 +19,21 @@ module Whittle
|
|
19
19
|
# @example A simple Whittle Parser
|
20
20
|
#
|
21
21
|
# class Calculator < Whittle::Parser
|
22
|
-
# rule(:wsp
|
23
|
-
# r[/s+/] # skip whitespace
|
24
|
-
# end
|
22
|
+
# rule(:wsp => /\s+/).skip!
|
25
23
|
#
|
26
|
-
# rule(:int)
|
27
|
-
# r[/[0-9]+/].as { |i| Integer(i) }
|
28
|
-
# end
|
24
|
+
# rule(:int => /[0-9]+/).as { |i| Integer(i) }
|
29
25
|
#
|
30
|
-
# rule("+") % :left
|
31
|
-
# rule("-") % :left
|
32
|
-
# rule("/") % :left
|
33
|
-
# rule("*") % :left
|
26
|
+
# rule("+") % :left ^ 1
|
27
|
+
# rule("-") % :left ^ 1
|
28
|
+
# rule("/") % :left ^ 2
|
29
|
+
# rule("*") % :left ^ 2
|
34
30
|
#
|
35
31
|
# rule(:expr) do |r|
|
36
32
|
# r[:expr, "+", :expr].as { |left, _, right| left + right }
|
37
33
|
# r[:expr, "-", :expr].as { |left, _, right| left - right }
|
38
34
|
# r[:expr, "/", :expr].as { |left, _, right| left / right }
|
39
35
|
# r[:expr, "*", :expr].as { |left, _, right| left * right }
|
40
|
-
# r[:int]
|
36
|
+
# r[:int]
|
41
37
|
# end
|
42
38
|
#
|
43
39
|
# start(:expr)
|
@@ -158,11 +154,11 @@ module Whittle
|
|
158
154
|
raise GrammarError, "Undefined start rule #{start.inspect}" unless rules.key?(start)
|
159
155
|
|
160
156
|
if rules[start].terminal?
|
161
|
-
rule(
|
157
|
+
rule(:$start) do |r|
|
162
158
|
r[start].as { |prog| prog }
|
163
159
|
end
|
164
160
|
|
165
|
-
start(
|
161
|
+
start(:$start)
|
166
162
|
end
|
167
163
|
end
|
168
164
|
end
|
@@ -177,12 +173,13 @@ module Whittle
|
|
177
173
|
# Accepts input in the form of a String and attempts to parse it according to the grammar.
|
178
174
|
#
|
179
175
|
# The input is scanned using a lexical analysis routine, defined by the #lex method. Each
|
180
|
-
# token detected by the routine is used to pick an action from the parse table.
|
181
|
-
# reduction initially builds a branch in an AST (abstract syntax tree), until all input has
|
182
|
-
# been read and the start rule has been recognized, at which point the AST is evaluated by
|
183
|
-
# invoking the callbacks defined in the grammar in a depth-first fashion.
|
176
|
+
# token detected by the routine is used to pick an action from the parse table.
|
184
177
|
#
|
185
|
-
#
|
178
|
+
# Each time a sequence of inputs has been read that concludes a rule in the grammar, the
|
179
|
+
# inputs are passed as arguments to the block for that rule, converting the sequence into
|
180
|
+
# a single input before the parse continues.
|
181
|
+
#
|
182
|
+
# If the parser encounters a token it does not expect, a parse error will be raised,
|
186
183
|
# specifying what was expected, what was received, and on which line the error occurred.
|
187
184
|
#
|
188
185
|
# A successful parse returns the result of evaluating the start rule, whatever that may be.
|
@@ -200,39 +197,32 @@ module Whittle
|
|
200
197
|
|
201
198
|
lex(input) do |token|
|
202
199
|
line = token[:line]
|
203
|
-
input = token
|
204
200
|
|
205
|
-
|
206
|
-
|
207
|
-
state = table[states.last]
|
201
|
+
loop do
|
202
|
+
state = table[states.last]
|
208
203
|
|
209
|
-
|
210
|
-
|
211
|
-
|
212
|
-
|
213
|
-
|
214
|
-
|
215
|
-
|
216
|
-
|
217
|
-
|
218
|
-
|
219
|
-
|
220
|
-
:name => ins[:rule].name,
|
221
|
-
:line => line,
|
222
|
-
:args => args.pop(size)
|
223
|
-
}
|
224
|
-
states.pop(size)
|
225
|
-
args << input
|
204
|
+
if instruction = state[token[:name]] || state[nil]
|
205
|
+
case instruction[:action]
|
206
|
+
when :shift
|
207
|
+
states << instruction[:state]
|
208
|
+
args << token[:rule].action.call(token[:value])
|
209
|
+
break
|
210
|
+
when :reduce
|
211
|
+
rule = instruction[:rule]
|
212
|
+
size = rule.components.length
|
213
|
+
args << rule.action.call(*args.pop(size))
|
214
|
+
states.pop(size)
|
226
215
|
|
227
|
-
|
228
|
-
|
229
|
-
|
230
|
-
|
216
|
+
if states.length == 1 && token[:name] == :$end
|
217
|
+
return args.pop
|
218
|
+
elsif goto = table[states.last][rule.name]
|
219
|
+
states << goto[:state]
|
220
|
+
next
|
231
221
|
end
|
232
|
-
else
|
233
|
-
error(state, input, :states => states, :args => args)
|
234
222
|
end
|
235
223
|
end
|
224
|
+
|
225
|
+
error(state, token, :states => states, :args => args)
|
236
226
|
end
|
237
227
|
end
|
238
228
|
end
|
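The rewritten loop in the hunk above shifts tokens onto a state stack and, on a reduce, immediately calls the rule's block instead of building an AST first. A minimal self-contained sketch of that idea follows. The table is hand-written for the grammar `sum := int | sum "+" int`, and its layout, the `Rule` struct, the use of `:$end` as a table key, and `run_table` are all assumptions made for illustration; Whittle's real table is generated and may differ.

```ruby
Rule = Struct.new(:name, :components, :action)

# sum := int | sum "+" int
R_INT = Rule.new(:sum, [:int], lambda { |i| i })
R_ADD = Rule.new(:sum, [:sum, "+", :int], lambda { |left, _, right| left + right })

# Hand-written parse table (hypothetical layout, keyed by state then token name).
TABLE = {
  0 => { :int  => { :action => :shift,  :state => 1 },
         :sum  => { :action => :goto,   :state => 2 } },
  1 => { "+"   => { :action => :reduce, :rule => R_INT },
         :$end => { :action => :reduce, :rule => R_INT } },
  2 => { "+"   => { :action => :shift,  :state => 3 } },
  3 => { :int  => { :action => :shift,  :state => 4 } },
  4 => { "+"   => { :action => :reduce, :rule => R_ADD },
         :$end => { :action => :reduce, :rule => R_ADD } }
}

def run_table(tokens)
  states = [0]
  args   = []

  tokens.each do |token|
    loop do
      state       = TABLE[states.last]
      instruction = state[token[:name]]

      if instruction.nil? || instruction[:action] == :goto
        raise ArgumentError, "unexpected #{token[:name].inspect} in state #{states.last}"
      end

      if instruction[:action] == :shift
        states << instruction[:state]
        args   << token[:value]
        break                         # token consumed, fetch the next one
      end

      # :reduce - collapse the matched inputs into one value right away
      rule = instruction[:rule]
      size = rule.components.length
      args << rule.action.call(*args.pop(size))
      states.pop(size)

      return args.pop if states.length == 1 && token[:name] == :$end

      # Otherwise follow the goto for the reduced rule and retry the same token.
      states << TABLE[states.last][rule.name][:state]
    end
  end

  raise ArgumentError, "token stream ended without :$end"
end

tokens = [
  { :name => :int, :value => 1 },
  { :name => "+",  :value => "+" },
  { :name => :int, :value => 2 },
  { :name => :$end }
]
puts run_table(tokens)  # prints 3
```

Because each reduce replaces its inputs with a single value on the `args` stack, no tree has to be kept around until the end of the parse.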
@@ -282,7 +272,7 @@ module Whittle
|
|
282
272
|
# @param [Hash] stack
|
283
273
|
# the current parse context (arg stack + state stack)
|
284
274
|
def error(state, input, stack)
|
285
|
-
expected = state
|
275
|
+
expected = extract_expected_tokens(state)
|
286
276
|
message = <<-ERROR.gsub(/\n\s+/, " ").strip
|
287
277
|
Parse error:
|
288
278
|
expected
|
@@ -309,8 +299,8 @@ module Whittle
|
|
309
299
|
nil
|
310
300
|
end
|
311
301
|
|
312
|
-
def
|
313
|
-
|
302
|
+
def extract_expected_tokens(state)
|
303
|
+
state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
|
314
304
|
end
|
315
305
|
end
|
316
306
|
end
|
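The new `extract_expected_tokens` helper in the hunk above filters `:goto` entries out of a state and reports a `nil` key as `:$end`. The sketch below lifts the method body out of the class so it can be run on its own; the sample `state` hash is hypothetical, and the shape of its entries is an assumption based only on the keys visible in this diff.

```ruby
# Same body as the method added in the diff, as a free-standing method.
def extract_expected_tokens(state)
  state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
end

# Hypothetical parse-table state, for illustration only.
state = {
  "+"   => { :action => :shift,  :state => 3 },
  nil   => { :action => :reduce, :rule  => :expr },  # nil stands for end of input
  :expr => { :action => :goto,   :state => 5 }       # gotos are not tokens, so they are dropped
}

p extract_expected_tokens(state)  # prints ["+", :$end]
```

This matches the new spec's expectation that surplus input reports `[:$end]` as the expected token when the only non-goto entry in the state is keyed by `nil`.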
data/lib/whittle/version.rb
CHANGED
data/spec/unit/parser/surplus_input_spec.rb
ADDED
@@ -0,0 +1,37 @@
|
|
1
|
+
require "spec_helper"
|
2
|
+
|
3
|
+
describe "a parser expecting a fixed amount of input" do
|
4
|
+
let(:parser) do
|
5
|
+
Class.new(Whittle::Parser) do
|
6
|
+
rule("a")
|
7
|
+
rule("b")
|
8
|
+
rule("c")
|
9
|
+
|
10
|
+
rule(:prog) do |r|
|
11
|
+
r["a", "b", "c"]
|
12
|
+
end
|
13
|
+
|
14
|
+
start(:prog)
|
15
|
+
end
|
16
|
+
end
|
17
|
+
|
18
|
+
it "raises a parse error if additional input is encountered" do
|
19
|
+
expect { parser.new.parse("abcabc") }.to raise_error(Whittle::ParseError)
|
20
|
+
end
|
21
|
+
|
22
|
+
it "indicates that :$end is the expected token" do
|
23
|
+
begin
|
24
|
+
parser.new.parse("abcabc")
|
25
|
+
rescue Whittle::ParseError => e
|
26
|
+
e.expected.should == [:$end]
|
27
|
+
end
|
28
|
+
end
|
29
|
+
|
30
|
+
it "indicates that the first surplus token is the received input" do
|
31
|
+
begin
|
32
|
+
parser.new.parse("abcabc")
|
33
|
+
rescue Whittle::ParseError => e
|
34
|
+
e.received.should == "a"
|
35
|
+
end
|
36
|
+
end
|
37
|
+
end
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: whittle
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.3
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -9,11 +9,11 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2011-11-
|
12
|
+
date: 2011-11-29 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: rspec
|
16
|
-
requirement: &
|
16
|
+
requirement: &70110140399280 !ruby/object:Gem::Requirement
|
17
17
|
none: false
|
18
18
|
requirements:
|
19
19
|
- - ~>
|
@@ -21,7 +21,7 @@ dependencies:
|
|
21
21
|
version: '2.6'
|
22
22
|
type: :development
|
23
23
|
prerelease: false
|
24
|
-
version_requirements: *
|
24
|
+
version_requirements: *70110140399280
|
25
25
|
description: ! "Write powerful parsers by defining a series of very simple rules\n
|
26
26
|
\ and operations to perform as those rules are matched. Whittle\n
|
27
27
|
\ parsers are written in pure ruby and as such are extremely
|
@@ -62,6 +62,7 @@ files:
|
|
62
62
|
- spec/unit/parser/self_referential_expr_spec.rb
|
63
63
|
- spec/unit/parser/skipped_tokens_spec.rb
|
64
64
|
- spec/unit/parser/sum_parser_spec.rb
|
65
|
+
- spec/unit/parser/surplus_input_spec.rb
|
65
66
|
- spec/unit/parser/typecast_parser_spec.rb
|
66
67
|
- whittle.gemspec
|
67
68
|
homepage: https://github.com/d11wtq/whittle
|
@@ -101,5 +102,6 @@ test_files:
|
|
101
102
|
- spec/unit/parser/self_referential_expr_spec.rb
|
102
103
|
- spec/unit/parser/skipped_tokens_spec.rb
|
103
104
|
- spec/unit/parser/sum_parser_spec.rb
|
105
|
+
- spec/unit/parser/surplus_input_spec.rb
|
104
106
|
- spec/unit/parser/typecast_parser_spec.rb
|
105
107
|
has_rdoc:
|