whittle 0.0.2 → 0.0.3

data/README.md CHANGED
@@ -83,9 +83,10 @@ program, which in this case is the `:expr` rule that can add two numbers togethe
83
83
 
84
84
  There are two terminal rules (`"+"` and `:int`) and one nonterminal (`:expr`) in the above
85
85
  grammar. Each rule can have a block attached to it. The block is invoked with the result
86
- evaluating the blocks attached to each of its inputs (in a depth-first manner). The default
86
+ evaluating each of its inputs via their own blocks (in a depth-first manner). The default
87
87
  action if no block is given, is to return whatever the leftmost input to the rule happens to
88
- be.
88
+ be. We use `#as` to provide an action that actually does something meaningful with the
89
+ inputs.
89
90
 
90
91
  We can optionally use the Hash notation to map a name with a pattern (or a fixed string) when
91
92
  we declare terminal rules too, as we have done with the `:int` rule above. Note that the
@@ -94,16 +95,26 @@ block, but since this is such a common use-case, Whittle offers the shorthand.
94
95
 
95
96
  As the input string is parsed, it *must* match the start rule `:expr`.
96
97
 
97
- Let's step through the parse for the above input "1+2". When the parser starts, it looks at
98
- the start rule `:expr` and decides what tokens would be valid if they were encountered. Since
99
- `:expr` starts with `:int`, the only thing that would be valid is anything matching
100
- `/[0-9]+/`. When the parser reads the "1", it recognizes it as an `:int`, puts at aside (puts
101
- it on the stack, in technical terms). Now it advances through the rule for `:expr` and
102
- decides the only possible valid input would be a "+", and finally the last `:int`. Upon
103
- having read the sequence `:int`, "+", `:int`, our block attached to that rule is invoked to
104
- return a result. First the three inputs are passed through their respective blocks (so the
105
- "1" and the "2" are cast to integers, according to the rule for `:int`), then they are passed
106
- to the `:expr`, which adds the 1 and the 2 to make 3. Magic!
98
+ Let's step through the parse for the above input "1+2".
99
+
100
+ - When the parser starts, it looks at the start rule `:expr` and decides what tokens would
101
+ be valid if they were encountered.
102
+ - Since `:expr` starts with `:int`, the only thing that would be valid is anything matching
103
+ `/[0-9]+/`.
104
+ - When the parser reads the "1", it recognizes it as an `:int`, evaluates its block (thus
105
+ casting it to an Integer), and moves it aside (puts it on the stack, to be precise).
106
+ - Now it advances through the rule for `:expr` and decides the only valid input would be a
107
+ "+".
108
+ - Upon reading the "+", the rule for "+" is invoked (which does nothing) and the "+" is put
109
+ on the stack, along with the `:int` we already have.
110
+ - Now the parser's only valid input is another `:int`, which it gets from the "2", casting
111
+ it to an Integer according to its block, and putting it on the stack.
112
+ - Finally, upon having read the sequence `:int`, "+", `:int`, our block attached to that
113
+ particular rule is invoked to return a result by summing the 1 and the 2 to make 3. Magic!
114
+
115
+ This was a simple parse. At each point there was only one valid input. As we'll see, parses
116
+ can be arbitrarily complex, without increasing the amount of work needed to process the input
117
+ string.
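
The walk-through above can be sketched in a few lines of plain Ruby. This is a hand-rolled illustration only, not Whittle's implementation: `TERMINALS`, `EXPR_SEQUENCE`, `EXPR_ACTION` and `parse_sum` are hypothetical names, and the sketch hard-codes the single sequence `:int`, "+", `:int` rather than building a parse table.

```ruby
require "strscan"

# Hypothetical illustration; none of these names come from Whittle itself.
# Each terminal has a pattern and a block that is applied as soon as it is read.
TERMINALS = {
  :int => [/[0-9]+/, lambda { |text| Integer(text) }],
  "+"  => [/\+/,     lambda { |text| text }]
}

# The only sequence :expr accepts here, and the block run once it is complete.
EXPR_SEQUENCE = [:int, "+", :int]
EXPR_ACTION   = lambda { |left, _, right| left + right }

def parse_sum(input)
  scanner = StringScanner.new(input)
  stack   = []

  EXPR_SEQUENCE.each do |expected|
    pattern, action = TERMINALS.fetch(expected)
    text = scanner.scan(pattern)
    if text.nil?
      raise ArgumentError, "expected #{expected.inspect} at offset #{scanner.pos}"
    end
    # Evaluate the terminal's block, then put the result on the stack.
    stack << action.call(text)
  end

  raise ArgumentError, "unexpected trailing input" unless scanner.eos?

  # The whole sequence has been read: hand the three inputs to the :expr block.
  EXPR_ACTION.call(*stack.pop(EXPR_SEQUENCE.length))
end

puts parse_sum("1+2")
```

At every step only one kind of input is acceptable, which is why a fixed list of expected tokens is enough for this sketch. A real grammar with alternatives needs the table-driven approach instead.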
107
118
 
108
119
  ## Nonterminal rules can have more than one valid sequence
109
120
 
@@ -474,12 +485,29 @@ would probably be a useful exercise.
474
485
 
475
486
  If you have any examples you'd like to contribute, I will gladly add them to the repository.
476
487
 
488
+ ## Issues & Questions
489
+
490
+ I will address any issues quickly, as it is still early days. Since this is based on a
491
+ scientific algorithm, I am fairly confident that any issues will be relatively minor. Post
492
+ them to the issue tracker:
493
+
494
+ - https://github.com/d11wtq/whittle/issues
495
+
496
+ If you have any suggestions for how I might improve the DSL in order to be more human-friendly,
497
+ you can suggest those in the issue tracker too.
498
+
499
+ For any "how do I do this?" type questions, you can message me directly (via my github profile
500
+ page):
501
+
502
+ - https://github.com/d11wtq
503
+
504
+ Or simply post an issue.
505
+
477
506
  ## TODO
478
507
 
479
508
  - Provide a more powerful (state based) lexer algorithm, or at least document how users can
480
509
  override `#lex`.
481
510
  - Allow inspection of the parse table (it is not very human friendly right now).
482
- - Allow inspection of the AST (maybe).
483
511
  - Given an input String, provide a human-readable explanation of the parse.
484
512
 
485
513
  ## License & Copyright
@@ -19,25 +19,21 @@ module Whittle
19
19
  # @example A simple Whittle Parser
20
20
  #
21
21
  # class Calculator < Whittle::Parser
22
- # rule(:wsp) do |r|
23
- # r[/s+/] # skip whitespace
24
- # end
22
+ # rule(:wsp => /\s+/).skip!
25
23
  #
26
- # rule(:int) do |r|
27
- # r[/[0-9]+/].as { |i| Integer(i) }
28
- # end
24
+ # rule(:int => /[0-9]+/).as { |i| Integer(i) }
29
25
  #
30
- # rule("+") % :left
31
- # rule("-") % :left
32
- # rule("/") % :left
33
- # rule("*") % :left
26
+ # rule("+") % :left ^ 1
27
+ # rule("-") % :left ^ 1
28
+ # rule("/") % :left ^ 2
29
+ # rule("*") % :left ^ 2
34
30
  #
35
31
  # rule(:expr) do |r|
36
32
  # r[:expr, "+", :expr].as { |left, _, right| left + right }
37
33
  # r[:expr, "-", :expr].as { |left, _, right| left - right }
38
34
  # r[:expr, "/", :expr].as { |left, _, right| left / right }
39
35
  # r[:expr, "*", :expr].as { |left, _, right| left * right }
40
- # r[:int].as(:value)
36
+ # r[:int]
41
37
  # end
42
38
  #
43
39
  # start(:expr)
@@ -158,11 +154,11 @@ module Whittle
158
154
  raise GrammarError, "Undefined start rule #{start.inspect}" unless rules.key?(start)
159
155
 
160
156
  if rules[start].terminal?
161
- rule(:*) do |r|
157
+ rule(:$start) do |r|
162
158
  r[start].as { |prog| prog }
163
159
  end
164
160
 
165
- start(:*)
161
+ start(:$start)
166
162
  end
167
163
  end
168
164
  end
@@ -177,12 +173,13 @@ module Whittle
177
173
  # Accepts input in the form of a String and attempts to parse it according to the grammar.
178
174
  #
179
175
  # The input is scanned using a lexical analysis routine, defined by the #lex method. Each
180
- # token detected by the routine is used to pick an action from the parse table. Each
181
- # reduction initially builds a branch in an AST (abstract syntax tree), until all input has
182
- # been read and the start rule has been recognized, at which point the AST is evaluated by
183
- # invoking the callbacks defined in the grammar in a depth-first fashion.
176
+ # token detected by the routine is used to pick an action from the parse table.
184
177
  #
185
- # If the parser encounters a token it does not recognise, a parse error will be raised,
178
+ # Each time a sequence of inputs has been read that concludes a rule in the grammar, the
179
+ # inputs are passed as arguments to the block for that rule, converting the sequence into
180
+ # a single input before the parse continues.
181
+ #
182
+ # If the parser encounters a token it does not expect, a parse error will be raised,
186
183
  # specifying what was expected, what was received, and on which line the error occurred.
187
184
  #
188
185
  # A successful parse returns the result of evaluating the start rule, whatever that may be.
@@ -200,39 +197,32 @@ module Whittle
200
197
 
201
198
  lex(input) do |token|
202
199
  line = token[:line]
203
- input = token
204
200
 
205
- catch(:shifted) do
206
- loop do
207
- state = table[states.last]
201
+ loop do
202
+ state = table[states.last]
208
203
 
209
- if ins = state[input[:name]] || state[nil]
210
- case ins[:action]
211
- when :shift
212
- input[:args] = [input.delete(:value)]
213
- states << ins[:state]
214
- args << input
215
- throw :shifted
216
- when :reduce
217
- size = ins[:rule].components.length
218
- input = {
219
- :rule => ins[:rule],
220
- :name => ins[:rule].name,
221
- :line => line,
222
- :args => args.pop(size)
223
- }
224
- states.pop(size)
225
- args << input
204
+ if instruction = state[token[:name]] || state[nil]
205
+ case instruction[:action]
206
+ when :shift
207
+ states << instruction[:state]
208
+ args << token[:rule].action.call(token[:value])
209
+ break
210
+ when :reduce
211
+ rule = instruction[:rule]
212
+ size = rule.components.length
213
+ args << rule.action.call(*args.pop(size))
214
+ states.pop(size)
226
215
 
227
- return accept(args.pop) if states.length == 1 && token[:name] == :$end
228
- when :goto
229
- input = token
230
- states << ins[:state]
216
+ if states.length == 1 && token[:name] == :$end
217
+ return args.pop
218
+ elsif goto = table[states.last][rule.name]
219
+ states << goto[:state]
220
+ next
231
221
  end
232
- else
233
- error(state, input, :states => states, :args => args)
234
222
  end
235
223
  end
224
+
225
+ error(state, token, :states => states, :args => args)
236
226
  end
237
227
  end
238
228
  end
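
The shape of the reworked loop in this hunk (evaluate each block at shift or reduce time instead of building a tree first) can be tried in isolation. The sketch below is an assumption-laden stand-in: `Rule`, `INT`, `PLUS`, `EXPR`, `TABLE` and `run_parse` are hypothetical, the parse table is written by hand for the single sequence `:int`, "+", `:int`, and the token list is supplied directly rather than produced by `#lex`.

```ruby
# Hypothetical stand-in for a grammar rule; this is not Whittle's own class.
Rule = Struct.new(:name, :components, :action)

INT  = Rule.new(:int,  [], lambda { |text| Integer(text) })
PLUS = Rule.new("+",   [], lambda { |text| text })
EXPR = Rule.new(:expr, [:int, "+", :int], lambda { |left, _, right| left + right })

# Hand-written parse table (an assumption): state => { token name => instruction }.
TABLE = {
  0 => { :int  => { :action => :shift,  :state => 1 } },
  1 => { "+"   => { :action => :shift,  :state => 2 } },
  2 => { :int  => { :action => :shift,  :state => 3 } },
  3 => { :$end => { :action => :reduce, :rule  => EXPR } }
}

def run_parse(tokens)
  states = [0]
  args   = []

  tokens.each do |token|
    loop do
      state       = TABLE.fetch(states.last)
      instruction = state[token[:name]]
      if instruction.nil?
        raise ArgumentError,
              "unexpected #{token[:name].inspect}, expected #{state.keys.inspect}"
      end

      case instruction[:action]
      when :shift
        # Evaluate the terminal's block immediately and move on to the next token.
        states << instruction[:state]
        args << token[:rule].action.call(token[:value])
        break
      when :reduce
        # Collapse the completed sequence into one value via the rule's block.
        rule = instruction[:rule]
        size = rule.components.length
        args << rule.action.call(*args.pop(size))
        states.pop(size)

        return args.pop if states.length == 1 && token[:name] == :$end

        goto = TABLE.fetch(states.last)[rule.name]
        raise ArgumentError, "no goto for #{rule.name.inspect}" if goto.nil?
        states << goto[:state]
      end
    end
  end

  raise ArgumentError, "input ended before the start rule was recognized"
end

tokens = [
  { :name => :int, :rule => INT,  :value => "1" },
  { :name => "+",  :rule => PLUS, :value => "+" },
  { :name => :int, :rule => INT,  :value => "2" },
  { :name => :$end }
]

puts run_parse(tokens)
```

Because every value on `args` is already evaluated, there is no separate tree-walking step once the start rule has been recognized, which appears to be the point of removing `accept` in this release.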
@@ -282,7 +272,7 @@ module Whittle
282
272
  # @param [Hash] stack
283
273
  # the current parse context (arg stack + state stack)
284
274
  def error(state, input, stack)
285
- expected = state.reject { |s, i| i[:action] == :goto }.keys
275
+ expected = extract_expected_tokens(state)
286
276
  message = <<-ERROR.gsub(/\n\s+/, " ").strip
287
277
  Parse error:
288
278
  expected
@@ -309,8 +299,8 @@ module Whittle
309
299
  nil
310
300
  end
311
301
 
312
- def accept(tree)
313
- tree[:rule].action.call(*tree[:args].map { |arg| Hash === arg ? accept(arg) : arg })
302
+ def extract_expected_tokens(state)
303
+ state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
314
304
  end
315
305
  end
316
306
  end
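
As a quick illustration of what the new `extract_expected_tokens` helper does, the sketch below copies its one-line body into a standalone method and applies it to a made-up state. The `state` hash is hypothetical; the assumption, taken from the `k.nil?` check in the hunk above, is that a `nil` key in the parse table stands for the end of input.

```ruby
# Standalone copy of the helper's logic, for experimentation outside the parser.
def extract_expected_tokens(state)
  state.reject { |_, instruction| instruction[:action] == :goto }
       .keys
       .collect { |key| key.nil? ? :$end : key }
end

# Hypothetical parse-table state: token name => instruction.
state = {
  "+"   => { :action => :shift, :state => 2 },
  :expr => { :action => :goto,  :state => 5 },
  nil   => { :action => :reduce }
}

p extract_expected_tokens(state)
```

Goto entries are dropped because they name nonterminals, which never appear directly in the input, and the `nil` key is reported as `:$end` so that the error message and the `expected` list stay readable.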
@@ -3,5 +3,5 @@
3
3
  # Copyright (c) Chris Corbyn, 2011
4
4
 
5
5
  module Whittle
6
- VERSION = "0.0.2"
6
+ VERSION = "0.0.3"
7
7
  end
@@ -0,0 +1,37 @@
1
+ require "spec_helper"
2
+
3
+ describe "a parser expecting a fixed amount of input" do
4
+ let(:parser) do
5
+ Class.new(Whittle::Parser) do
6
+ rule("a")
7
+ rule("b")
8
+ rule("c")
9
+
10
+ rule(:prog) do |r|
11
+ r["a", "b", "c"]
12
+ end
13
+
14
+ start(:prog)
15
+ end
16
+ end
17
+
18
+ it "raises a parse error if additional input is encountered" do
19
+ expect { parser.new.parse("abcabc") }.to raise_error(Whittle::ParseError)
20
+ end
21
+
22
+ it "indicates that :$end is the expected token" do
23
+ begin
24
+ parser.new.parse("abcabc")
25
+ rescue Whittle::ParseError => e
26
+ e.expected.should == [:$end]
27
+ end
28
+ end
29
+
30
+ it "indicates that the first surplus token is the received input" do
31
+ begin
32
+ parser.new.parse("abcabc")
33
+ rescue Whittle::ParseError => e
34
+ e.received.should == "a"
35
+ end
36
+ end
37
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: whittle
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.2
4
+ version: 0.0.3
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,11 +9,11 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2011-11-28 00:00:00.000000000 Z
12
+ date: 2011-11-29 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
16
- requirement: &70351976364700 !ruby/object:Gem::Requirement
16
+ requirement: &70110140399280 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ~>
@@ -21,7 +21,7 @@ dependencies:
21
21
  version: '2.6'
22
22
  type: :development
23
23
  prerelease: false
24
- version_requirements: *70351976364700
24
+ version_requirements: *70110140399280
25
25
  description: ! "Write powerful parsers by defining a series of very simple rules\n
26
26
  \ and operations to perform as those rules are matched. Whittle\n
27
27
  \ parsers are written in pure ruby and as such are extremely
@@ -62,6 +62,7 @@ files:
62
62
  - spec/unit/parser/self_referential_expr_spec.rb
63
63
  - spec/unit/parser/skipped_tokens_spec.rb
64
64
  - spec/unit/parser/sum_parser_spec.rb
65
+ - spec/unit/parser/surplus_input_spec.rb
65
66
  - spec/unit/parser/typecast_parser_spec.rb
66
67
  - whittle.gemspec
67
68
  homepage: https://github.com/d11wtq/whittle
@@ -101,5 +102,6 @@ test_files:
101
102
  - spec/unit/parser/self_referential_expr_spec.rb
102
103
  - spec/unit/parser/skipped_tokens_spec.rb
103
104
  - spec/unit/parser/sum_parser_spec.rb
105
+ - spec/unit/parser/surplus_input_spec.rb
104
106
  - spec/unit/parser/typecast_parser_spec.rb
105
107
  has_rdoc: