whittle 0.0.2 → 0.0.3

data/README.md CHANGED
@@ -83,9 +83,10 @@ program, which in this case is the `:expr` rule that can add two numbers togethe
 
  There are two terminal rules (`"+"` and `:int`) and one nonterminal (`:expr`) in the above
  grammar. Each rule can have a block attached to it. The block is invoked with the result
- evaluating the blocks attached to each of its inputs (in a depth-first manner). The default
+ evaluating each of its inputs via their own blocks (in a depth-first manner). The default
  action if no block is given, is to return whatever the leftmost input to the rule happens to
- be.
+ be. We use `#as` to provide an action that actually does something meaningful with the
+ inputs.
 
  We can optionally use the Hash notation to map a name with a pattern (or a fixed string) when
  we declare terminal rules too, as we have done with the `:int` rule above. Note that the
@@ -94,16 +95,26 @@ block, but since this is such a common use-case, Whittle offers the shorthand.
 
  As the input string is parsed, it *must* match the start rule `:expr`.
 
- Let's step through the parse for the above input "1+2". When the parser starts, it looks at
- the start rule `:expr` and decides what tokens would be valid if they were encountered. Since
- `:expr` starts with `:int`, the only thing that would be valid is anything matching
- `/[0-9]+/`. When the parser reads the "1", it recognizes it as an `:int`, puts at aside (puts
- it on the stack, in technical terms). Now it advances through the rule for `:expr` and
- decides the only possible valid input would be a "+", and finally the last `:int`. Upon
- having read the sequence `:int`, "+", `:int`, our block attached to that rule is invoked to
- return a result. First the three inputs are passed through their respective blocks (so the
- "1" and the "2" are cast to integers, according to the rule for `:int`), then they are passed
- to the `:expr`, which adds the 1 and the 2 to make 3. Magic!
+ Let's step through the parse for the above input "1+2".
+
+ - When the parser starts, it looks at the start rule `:expr` and decides what tokens would
+ be valid if they were encountered.
+ - Since `:expr` starts with `:int`, the only thing that would be valid is anything matching
+ `/[0-9]+/`.
+ - When the parser reads the "1", it recognizes it as an `:int`, evaluates its block (thus
+ casting it to an Integer), and moves it aside (puts it on the stack, to be precise).
+ - Now it advances through the rule for `:expr` and decides the only valid input would be a
+ "+"
+ - Upon reading the "+", the rule for "+" is invoked (which does nothing) and the "+" is put
+ on the stack, along with the `:int` we already have.
+ - Now the parser's only valid input is another `:int`, which it gets from the "2", casting
+ it to an Integer according to its block, and putting it on the stack.
+ - Finally, upon having read the sequence `:int`, "+", `:int`, our block attached to that
+ particular rule is invoked to return a result by summing the 1 and the 2 to make 3. Magic!
+
+ This was a simple parse. At each point there was only one valid input. As we'll see, parses
+ can be arbitrarily complex, without increasing the amount of work needed to process the input
+ string.
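The stepped walk-through added to the README in this hunk can be mimicked in a few lines of plain Ruby. This is only a hand-rolled sketch of the idea (each input is evaluated by its own block as it is read, then the finished sequence is reduced by the block for `:expr`); it does not use Whittle, and `TERMINALS`, `EXPR_COMPONENTS`, `EXPR_ACTION` and `parse_sum` are hypothetical names made up for the illustration.

```ruby
require "strscan"

# Terminal rules for the toy grammar: name => [pattern, action].
# The "+" rule has no block, so its action simply returns the matched text.
TERMINALS = {
  :int => [/[0-9]+/, lambda { |text| Integer(text) }],
  "+"  => [/\+/,     lambda { |text| text }]
}

# The single nonterminal: :expr is :int "+" :int, summed by its block.
EXPR_COMPONENTS = [:int, "+", :int]
EXPR_ACTION     = lambda { |left, _, right| left + right }

# Reads the input left to right. At each step exactly one terminal is valid;
# its block is evaluated immediately and the result is pushed on the stack.
def parse_sum(input)
  scanner = StringScanner.new(input)
  stack   = []

  EXPR_COMPONENTS.each do |name|
    pattern, action = TERMINALS.fetch(name)
    text = scanner.scan(pattern)
    raise ArgumentError, "expected #{name.inspect} at offset #{scanner.pos}" if text.nil?
    stack << action.call(text)
  end

  raise ArgumentError, "unexpected trailing input" unless scanner.eos?

  # The whole sequence :int, "+", :int has been read: reduce it to one value.
  EXPR_ACTION.call(*stack.pop(EXPR_COMPONENTS.length))
end

puts parse_sum("1+2") # prints 3
```

Because this grammar has exactly one valid input at every step, the sketch can walk the rule's components in order; a real parser consults a parse table instead.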
 
  ## Nonterminal rules can have more than one valid sequence
 
@@ -474,12 +485,29 @@ would probably be a useful exercise.
 
  If you have any examples you'd like to contribute, I will gladly add them to the repository.
 
+ ## Issues & Questions
+
+ Any issues, I will address them quickly as it is still early days, though I am pretty confident,
+ since this is based on a scientific algorithm, issues would be relatively minor. Post them to
+ the issue tracker:
+
+ - https://github.com/d11wtq/whittle/issues
+
+ If you have any suggestions for how I might improve the DSL in order to be more human-friendly,
+ you can suggest those in the issue tracker too.
+
+ For any "how do I do this?" type questions, you can message me directly (via my github profile
+ page):
+
+ - https://github.com/d11wtq
+
+ Or simply post an issue.
+
  ## TODO
 
  - Provide a more powerful (state based) lexer algorithm, or at least document how users can
  override `#lex`.
  - Allow inspection of the parse table (it is not very human friendly right now).
- - Allow inspection of the AST (maybe).
  - Given in an input String, provide a human readble explanation of the parse.
 
  ## License & Copyright
@@ -19,25 +19,21 @@ module Whittle
    # @example A simple Whittle Parser
    #
    #   class Calculator < Whittle::Parser
-   #     rule(:wsp) do |r|
-   #       r[/s+/] # skip whitespace
-   #     end
+   #     rule(:wsp => /\s+/).skip!
    #
-   #     rule(:int) do |r|
-   #       r[/[0-9]+/].as { |i| Integer(i) }
-   #     end
+   #     rule(:int => /[0-9]+/).as { |i| Integer(i) }
    #
-   #     rule("+") % :left
-   #     rule("-") % :left
-   #     rule("/") % :left
-   #     rule("*") % :left
+   #     rule("+") % :left ^ 1
+   #     rule("-") % :left ^ 1
+   #     rule("/") % :left ^ 2
+   #     rule("*") % :left ^ 2
    #
    #     rule(:expr) do |r|
    #       r[:expr, "+", :expr].as { |left, _, right| left + right }
    #       r[:expr, "-", :expr].as { |left, _, right| left - right }
    #       r[:expr, "/", :expr].as { |left, _, right| left / right }
    #       r[:expr, "*", :expr].as { |left, _, right| left * right }
-   #       r[:int].as(:value)
+   #       r[:int]
    #     end
    #
    #     start(:expr)
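The updated example attaches a level to each operator (`^ 1` for `+` and `-`, `^ 2` for `/` and `*`) alongside `:left` associativity. As a rough illustration of what such levels mean for evaluation order, here is a small precedence-climbing evaluator in plain Ruby. It is a different technique from the table-driven parser in this gem and assumes, as the example's levels suggest, that a higher number binds tighter; `OPERATORS`, `tokenize` and `evaluate` are hypothetical names used only for this sketch.

```ruby
# Operator table mirroring the levels in the example: higher binds tighter.
OPERATORS = {
  "+" => { :prec => 1, :apply => lambda { |l, r| l + r } },
  "-" => { :prec => 1, :apply => lambda { |l, r| l - r } },
  "/" => { :prec => 2, :apply => lambda { |l, r| l / r } },
  "*" => { :prec => 2, :apply => lambda { |l, r| l * r } }
}

# Splits the input into Integers and operator strings, ignoring whitespace.
def tokenize(input)
  input.scan(/[0-9]+|[-+\/*]/).map { |t| t =~ /\A[0-9]+\z/ ? t.to_i : t }
end

# Precedence climbing over a token array; consumes tokens from the front.
def evaluate(tokens, min_prec = 1)
  left = tokens.shift

  while (op = OPERATORS[tokens.first]) && op[:prec] >= min_prec
    tokens.shift
    # Left associativity: the right operand may only contain tighter operators.
    right = evaluate(tokens, op[:prec] + 1)
    left  = op[:apply].call(left, right)
  end

  left
end

puts evaluate(tokenize("1 + 2 * 3")) # prints 7
puts evaluate(tokenize("8 - 3 - 2")) # prints 3
```

With all four operators on the same level, as in the 0.0.2 example, `1 + 2 * 3` would group as `(1 + 2) * 3` under this scheme.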
@@ -158,11 +154,11 @@ module Whittle
 
          raise GrammarError, "Undefined start rule #{start.inspect}" unless rules.key?(start)
 
          if rules[start].terminal?
-           rule(:*) do |r|
+           rule(:$start) do |r|
              r[start].as { |prog| prog }
            end
 
-           start(:*)
+           start(:$start)
          end
        end
      end
@@ -177,12 +173,13 @@ module Whittle
      # Accepts input in the form of a String and attempts to parse it according to the grammar.
      #
      # The input is scanned using a lexical analysis routine, defined by the #lex method. Each
-     # token detected by the routine is used to pick an action from the parse table. Each
-     # reduction initially builds a branch in an AST (abstract syntax tree), until all input has
-     # been read and the start rule has been recognized, at which point the AST is evaluated by
-     # invoking the callbacks defined in the grammar in a depth-first fashion.
+     # token detected by the routine is used to pick an action from the parse table.
      #
-     # If the parser encounters a token it does not recognise, a parse error will be raised,
+     # Each time a sequence of inputs has been read that concludes a rule in the grammar, the
+     # inputs are passed as arguments to the block for that rule, converting the sequence into
+     # single input before the parse continues.
+     #
+     # If the parser encounters a token it does not expect, a parse error will be raised,
      # specifying what was expected, what was received, and on which line the error occurred.
      #
      # A successful parse returns the result of evaluating the start rule, whatever that may be.
@@ -200,39 +197,32 @@ module Whittle
 
        lex(input) do |token|
          line = token[:line]
-         input = token
 
-         catch(:shifted) do
-           loop do
-             state = table[states.last]
+         loop do
+           state = table[states.last]
 
-             if ins = state[input[:name]] || state[nil]
-               case ins[:action]
-               when :shift
-                 input[:args] = [input.delete(:value)]
-                 states << ins[:state]
-                 args << input
-                 throw :shifted
-               when :reduce
-                 size = ins[:rule].components.length
-                 input = {
-                   :rule => ins[:rule],
-                   :name => ins[:rule].name,
-                   :line => line,
-                   :args => args.pop(size)
-                 }
-                 states.pop(size)
-                 args << input
+           if instruction = state[token[:name]] || state[nil]
+             case instruction[:action]
+             when :shift
+               states << instruction[:state]
+               args << token[:rule].action.call(token[:value])
+               break
+             when :reduce
+               rule = instruction[:rule]
+               size = rule.components.length
+               args << rule.action.call(*args.pop(size))
+               states.pop(size)
 
-                 return accept(args.pop) if states.length == 1 && token[:name] == :$end
-               when :goto
-                 input = token
-                 states << ins[:state]
+               if states.length == 1 && token[:name] == :$end
+                 return args.pop
+               elsif goto = table[states.last][rule.name]
+                 states << goto[:state]
+                 next
                end
-             else
-               error(state, input, :states => states, :args => args)
              end
            end
+
+           error(state, token, :states => states, :args => args)
          end
        end
      end
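To see the new loop in isolation, the following self-contained sketch runs the same shift/reduce/goto logic against a hand-written parse table for the grammar `expr: int "+" int`. Nothing here is taken from Whittle itself: `Rule`, `SKETCH_TABLE`, `sketch_lex` and `sketch_parse` are hypothetical stand-ins, the table was written by hand rather than generated, and errors are plain `ArgumentError`s. The point is that each block runs as soon as its input is shifted or its rule is reduced, so no tree has to be walked afterwards.

```ruby
require "strscan"

# Minimal stand-in for a grammar rule: a name, its components and a block.
Rule = Struct.new(:name, :components, :action)

INT  = Rule.new(:int,  [/[0-9]+/],        lambda { |text| Integer(text) })
PLUS = Rule.new("+",   ["+"],             lambda { |text| text })
EXPR = Rule.new(:expr, [:int, "+", :int], lambda { |left, _, right| left + right })

# Hand-written table: state => { token name => instruction }.
# The nil key means "whatever comes next", as with state[nil] in the diff.
SKETCH_TABLE = {
  0 => { :int  => { :action => :shift, :state => 1 },
         :expr => { :action => :goto,  :state => 4 } },
  1 => { "+"   => { :action => :shift, :state => 2 } },
  2 => { :int  => { :action => :shift, :state => 3 } },
  3 => { nil   => { :action => :reduce, :rule => EXPR } },
  4 => {}
}

# Yields one Hash per token, followed by a final :$end marker.
def sketch_lex(input)
  scanner = StringScanner.new(input)
  until scanner.eos?
    if (text = scanner.scan(/[0-9]+/))
      yield({ :name => :int, :rule => INT, :value => text })
    elsif (text = scanner.scan(/\+/))
      yield({ :name => "+", :rule => PLUS, :value => text })
    else
      raise ArgumentError, "cannot tokenize #{scanner.rest.inspect}"
    end
  end
  yield({ :name => :$end })
end

def sketch_parse(input)
  states = [0]
  args   = []

  sketch_lex(input) do |token|
    loop do
      state = SKETCH_TABLE[states.last]

      if (instruction = state[token[:name]] || state[nil])
        case instruction[:action]
        when :shift
          # Evaluate the terminal's block right away and move to the next token.
          states << instruction[:state]
          args << token[:rule].action.call(token[:value])
          break
        when :reduce
          # Replace the rule's inputs on the stack with the block's result.
          rule = instruction[:rule]
          size = rule.components.length
          args << rule.action.call(*args.pop(size))
          states.pop(size)

          return args.pop if states.length == 1 && token[:name] == :$end

          if (goto = SKETCH_TABLE[states.last][rule.name])
            states << goto[:state]
            next
          end
        end
      end

      raise ArgumentError, "unexpected #{token[:name].inspect}"
    end
  end
end

puts sketch_parse("1+2") # prints 3
```

Feeding it `"1+2+3"` reduces the first sum, follows the goto into state 4, finds no instruction for the surplus `"+"`, and raises.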
@@ -282,7 +272,7 @@ module Whittle
      # @param [Hash] stack
      #   the current parse context (arg stack + state stack)
      def error(state, input, stack)
-       expected = state.reject { |s, i| i[:action] == :goto }.keys
+       expected = extract_expected_tokens(state)
        message = <<-ERROR.gsub(/\n\s+/, " ").strip
          Parse error:
          expected
@@ -309,8 +299,8 @@ module Whittle
        nil
      end
 
-     def accept(tree)
-       tree[:rule].action.call(*tree[:args].map { |arg| Hash === arg ? accept(arg) : arg })
+     def extract_expected_tokens(state)
+       state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
      end
    end
  end
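The effect of `extract_expected_tokens` is easiest to see on a concrete state. The sketch below applies the same reject/keys/collect chain to a made-up state Hash; the state contents and the `expected_tokens` name are assumptions for illustration, not data from a real Whittle parse table.

```ruby
# Same filtering as in the diff: drop goto entries (they are keyed by
# nonterminals, not tokens) and report the nil key as :$end.
def expected_tokens(state)
  state.reject { |_, instruction| instruction[:action] == :goto }
       .keys
       .collect { |name| name.nil? ? :$end : name }
end

# A hypothetical parse-table state: token name => instruction.
state = {
  "+"   => { :action => :shift,  :state => 2 },
  :expr => { :action => :goto,   :state => 4 },
  nil   => { :action => :reduce, :rule  => :prog }
}

p expected_tokens(state) # prints ["+", :$end]
```

Before this change the `nil` key was reported as-is, so an error message for surplus input would have listed `nil` rather than `:$end` among the expected tokens.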
@@ -3,5 +3,5 @@
  # Copyright (c) Chris Corbyn, 2011
 
  module Whittle
-   VERSION = "0.0.2"
+   VERSION = "0.0.3"
  end
@@ -0,0 +1,37 @@
+ require "spec_helper"
+
+ describe "a parser expecting a fixed amount of input" do
+   let(:parser) do
+     Class.new(Whittle::Parser) do
+       rule("a")
+       rule("b")
+       rule("c")
+
+       rule(:prog) do |r|
+         r["a", "b", "c"]
+       end
+
+       start(:prog)
+     end
+   end
+
+   it "raises a parse error if additional input is encountered" do
+     expect { parser.new.parse("abcabc") }.to raise_error(Whittle::ParseError)
+   end
+
+   it "indicates that :$end is the expected token" do
+     begin
+       parser.new.parse("abcabc")
+     rescue Whittle::ParseError => e
+       e.expected.should == [:$end]
+     end
+   end
+
+   it "indicates that the first surplus token is the received input" do
+     begin
+       parser.new.parse("abcabc")
+     rescue Whittle::ParseError => e
+       e.received.should == "a"
+     end
+   end
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: whittle
  version: !ruby/object:Gem::Version
-   version: 0.0.2
+   version: 0.0.3
  prerelease:
  platform: ruby
  authors:
@@ -9,11 +9,11 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2011-11-28 00:00:00.000000000 Z
+ date: 2011-11-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rspec
-   requirement: &70351976364700 !ruby/object:Gem::Requirement
+   requirement: &70110140399280 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -21,7 +21,7 @@ dependencies:
          version: '2.6'
    type: :development
    prerelease: false
-   version_requirements: *70351976364700
+   version_requirements: *70110140399280
  description: ! "Write powerful parsers by defining a series of very simple rules\n
    \ and operations to perform as those rules are matched. Whittle\n
    \ parsers are written in pure ruby and as such are extremely
@@ -62,6 +62,7 @@ files:
  - spec/unit/parser/self_referential_expr_spec.rb
  - spec/unit/parser/skipped_tokens_spec.rb
  - spec/unit/parser/sum_parser_spec.rb
+ - spec/unit/parser/surplus_input_spec.rb
  - spec/unit/parser/typecast_parser_spec.rb
  - whittle.gemspec
  homepage: https://github.com/d11wtq/whittle
@@ -101,5 +102,6 @@ test_files:
  - spec/unit/parser/self_referential_expr_spec.rb
  - spec/unit/parser/skipped_tokens_spec.rb
  - spec/unit/parser/sum_parser_spec.rb
+ - spec/unit/parser/surplus_input_spec.rb
  - spec/unit/parser/typecast_parser_spec.rb
  has_rdoc: