whittle 0.0.3 → 0.0.4

data/README.md CHANGED
@@ -5,6 +5,8 @@ it's 100% ruby. You write parsers by specifying sequences of allowable rules (w
5
5
  other rules, or even to themselves). For each rule in your grammar, you provide a block that
6
6
  is invoked when the grammar is recognized.
7
7
 
8
+ **TL;DR** (Skip to 'Summary & FAQ')
9
+
8
10
  If you're *not* familiar with parsing, you should find Whittle to be a very friendly little
9
11
  parser.
10
12
 
@@ -485,6 +487,198 @@ would probably be a useful exercise.
485
487
 
486
488
  If you have any examples you'd like to contribute, I will gladly add them to the repository.
487
489
 
490
+ ## Summary & FAQ
491
+
492
+ ### Defining a rule to match a chunk of the input string
493
+
494
+ These are called "terminal rules", since they don't lead anywhere beyond themselves. A word of
495
+ caution here: the ordering matters. They are scanned in order from top to bottom.
496
+
497
+ ``` ruby
498
+ rule("keyword")
499
+ # or
500
+ rule(:name => /pattern/)
501
+ ```
502
+
503
+ ### Providing a semantic action for a terminal rule
504
+
505
+ ``` ruby
506
+ rule(:int => /[0-9]+/).as { |str| Integer(str) }
507
+ ```
508
+
509
+ ### Defining a rule to match a sequence of other rules
510
+
511
+ These are called "nonterminal rules", since they require chaining to other rules.
512
+
513
+ ``` ruby
514
+ rule(:sum) do |r|
515
+ r[:int, "+", :int].as { |a, _, b| a + b }
516
+ end
517
+ ```
518
+
519
+ Where `:int` and `"+"` have been previously declared by other rules. Arguments `a` and `b` in the
520
+ block are the two integers. Argument `_` is the "+" (which we're not using, hence the argument
521
+ name).
522
+
523
+ ### Defining alternatives for the same rule
524
+
525
+ Call `[](*args)` more than once.
526
+
527
+ ``` ruby
528
+ rule(:expr) do |r|
529
+ r[:expr, "+", :expr].as { |a, _, b| a + b }
530
+ r[:expr, "-", :expr].as { |a, _, b| a - b }
531
+ r[:int]
532
+ end
533
+ ```
534
+
535
+ ### Skipping whitespace and comments
536
+
537
+ ``` ruby
538
+ rule(:wsp => /\s+/).skip!
539
+ rule(:comment => /#.*$/m).skip!
540
+ ```
541
+
542
+ ### Looking for the same thing multiple times
543
+
544
+ Define the rule for the single item, then add another rule for itself, followed by the single item.
545
+
546
+ ``` ruby
547
+ rule(:list) do |r|
548
+ r[:list, :id].as { |list, id| list << id }
549
+ r[:id].as { |id| [id] }
550
+ end
551
+ ```
552
+
553
+ If you want to allow zero of something, add an additional `r[]`.
554
+
555
+ ### Looking for a comma separated list of something
556
+
557
+ Just like above, but with a comma in the recursive rule.
558
+
559
+ ``` ruby
560
+ rule(:list) do |r|
561
+ r[:list, ",", :id].as { |list, _, id| list << id }
562
+ r[:id].as { |id| [id] }
563
+ end
564
+ ```
565
+
566
+ ### Evaluate the left hand side of binary expressions as early as possible
567
+
568
+ This is called left association. Tag the operators with `% :left`. They are tagged `% :right` by default.
569
+
570
+ ``` ruby
571
+ rule("+") % :left
572
+ ```
573
+
574
+ ### Give one operator a higher precedence than another
575
+
576
+ Attach a precedence number to any operators that need them. The higher the number, the higher the precedence.
577
+
578
+ ``` ruby
579
+ rule("+") ^ 1
580
+ rule("*") ^ 2
581
+ ```
582
+
583
+ ### I have two types of expression: binary and function call. How can I allow a binary expression in a function call argument, and a function call in a binary expression?
584
+
585
+ If you can explain it this simply on paper, you can explain it formally in your grammar. If `:binary_expr`
586
+ allows `:invocation_expr` as an operand, and if `:invocation_expr` allows `:binary_expr` as an argument, then
587
+ what you're saying is they can be used in place of each other; thus, define a rule that represents the two of them
588
+ and use that new rule where you want to support both types of expression.
589
+
590
+ Assuming your grammar looked something like this:
591
+
592
+ ``` ruby
593
+ rule("+")
594
+
595
+ rule(:int => /[0-9]+/).as { |i| Integer(i) }
596
+ rule(:id => /\w+/)
597
+
598
+ rule(:binary_expr) do |r|
599
+ r[:binary_expr, "+", :binary_expr].as { |a, _, b| a + b}
600
+ r[:int]
601
+ end
602
+
603
+ rule(:args) do |r|
604
+ r[].as { [] } # empty list
605
+ r[:args, :int].as { |args, i| args << i }
606
+ r[:int].as { |i| [i] }
607
+ end
608
+
609
+ rule(:invocation_expr) do |r|
610
+ r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
611
+ end
612
+ ```
613
+
614
+ This grammar can parse things like "1 + 2 + 3" and "foo(1, 2, 3)", but it can't parse something like
615
+ "1 + foo(2 + 3) + 4".
616
+
617
+ The goal is to replace `:int` in the `:args` rule and `:binary_expr` in the `:binary_expr` rule, with
618
+ something that represents both types of expression.
619
+
620
+ ``` ruby
621
+ rule("+")
622
+
623
+ rule(:int => /[0-9]+/).as { |i| Integer(i) }
624
+ rule(:id => /\w+/)
625
+
626
+ rule(:expr) do |r|
627
+ r[:binary_expr]
628
+ r[:invocation_expr]
629
+ end
630
+
631
+ rule(:binary_expr) do |r|
632
+ r[:expr, "+", :expr].as { |a, _, b| a + b}
633
+ r[:int]
634
+ end
635
+
636
+ rule(:args) do |r|
637
+ r[].as { [] } # empty list
638
+ r[:args, :expr].as { |args, expr| args << expr }
639
+ r[:expr].as { |expr| [expr] }
640
+ end
641
+
642
+ rule(:invocation_expr) do |r|
643
+ r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
644
+ end
645
+ ```
646
+
647
+ Now we can parse the more complex expression "1 + foo(2 + 3) + 4" without any issues.
648
+
649
+ ### How do I track state to store variables etc with Whittle?
650
+
651
+ One of the goals of making Whittle all ruby was that I wouldn't have to tie people into any particular way of doing
652
+ something. Your blocks can call any ruby code they like, so create an object of some sort those blocks can reference
653
+ and do as you need during the parse. For example, you could add a method to the class called something like `runtime`,
654
+ which is accessible from each block.
655
+
656
+ ### I just want Whittle to give me an AST of my input
657
+
658
+ AST (abstract syntax tree) is a loose term. Early versions originally created an AST, but the format you want the AST
659
+ in probably differs from the format the next developer wants it in. It's really easy to use your grammar to make one
660
+ however you please:
661
+
662
+ ``` ruby
663
+ class Parser < Whittle::Parser
664
+ rule("+")
665
+
666
+ rule(:int => /[0-9]+/).as { |int| { :int => int } }
667
+
668
+ rule(:sum) do |r|
669
+ r[:int, "+", :int].as { |a, _, b| { :sum => [a, b] } }
670
+ end
671
+
672
+ start(:sum)
673
+ end
674
+
675
+ p Parser.new.parse("1+2")
676
+ # =>
677
+ # {:sum=>[{:int=>"1"}, {:int=>"2"}]}
678
+ ```
679
+
680
+ (There could be a side-project in this if somebody thinks a "generic AST" is useful enough).
681
+
488
682
  ## Issues & Questions
489
683
 
490
684
  Any issues, I will address them quickly as it is still early days, though I am pretty confident,
@@ -139,10 +139,11 @@ module Whittle
139
139
  {},
140
140
  self,
141
141
  {
142
- :state => initial_state,
143
- :seen => [],
144
- :offset => 0,
145
- :prec => 0
142
+ :initial => true,
143
+ :state => initial_state,
144
+ :seen => [],
145
+ :offset => 0,
146
+ :prec => 0
146
147
  }
147
148
  )
148
149
  end
@@ -207,13 +208,13 @@ module Whittle
207
208
  states << instruction[:state]
208
209
  args << token[:rule].action.call(token[:value])
209
210
  break
210
- when :reduce
211
+ when :reduce, :accept
211
212
  rule = instruction[:rule]
212
213
  size = rule.components.length
213
214
  args << rule.action.call(*args.pop(size))
214
215
  states.pop(size)
215
216
 
216
- if states.length == 1 && token[:name] == :$end
217
+ if states.length == 1 && instruction[:action] == :accept
217
218
  return args.pop
218
219
  elsif goto = table[states.last][rule.name]
219
220
  states << goto[:state]
@@ -300,7 +301,7 @@ module Whittle
300
301
  end
301
302
 
302
303
  def extract_expected_tokens(state)
303
- state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
304
+ state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
304
305
  end
305
306
  end
306
307
  end
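
The `extract_expected_tokens` change above can be illustrated with a plain hash. This is a minimal sketch, not Whittle's real parse table: the `state` hash below is hypothetical, and only the `:action` key and the two expressions are taken from the hunk. It suggests that the old expression reported a `nil`-keyed reduce entry as `:$end`, while the new one lists only keys whose action is `:shift` or `:accept`.

``` ruby
# Hypothetical parse-table state; the entries are invented for illustration.
state = {
  ";"   => { :action => :shift },
  :$end => { :action => :accept },
  :prog => { :action => :goto },
  nil   => { :action => :reduce }
}

# 0.0.3 expression: drop gotos, then report a nil key as :$end.
old_expected = state.reject { |_, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }

# 0.0.4 expression: keep only tokens that can be shifted or accepted.
new_expected = state.select { |_, i| [:shift, :accept].include?(i[:action]) }.keys

p old_expected
p new_expected
```

With this hypothetical state, the old expression lists `:$end` twice (once for the accept entry, once for the `nil` key), whereas the new one lists each shiftable or acceptable token once.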
@@ -88,6 +88,13 @@ module Whittle
88
88
  :rule => self,
89
89
  :prec => context[:prec]
90
90
  }
91
+
92
+ if context[:initial]
93
+ state[:$end] = {
94
+ :action => :accept,
95
+ :rule => self
96
+ }
97
+ end
91
98
  else
92
99
  raise GrammarError, "Unreferenced rule #{sym.inspect}" if rule.nil?
93
100
 
@@ -126,10 +133,11 @@ module Whittle
126
133
  table,
127
134
  parser,
128
135
  {
129
- :state => new_state,
130
- :seen => context[:seen],
131
- :offset => new_offset,
132
- :prec => new_prec
136
+ :initial => context[:initial],
137
+ :state => new_state,
138
+ :seen => context[:seen],
139
+ :offset => new_offset,
140
+ :prec => new_prec
133
141
  }
134
142
  )
135
143
  end
@@ -227,8 +235,7 @@ module Whittle
227
235
  {
228
236
  :rule => self,
229
237
  :value => match[0],
230
- # FIXME: Optimize this line count in a cross-platform compatible way
231
- :line => line + ("~" + match[0] + "~").lines.count - 1,
238
+ :line => line + match[0].count("\r\n", "\n"),
232
239
  :discarded => @action.equal?(NULL_ACTION)
233
240
  }
234
241
  end
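
A short sketch of why the new `:line` expression appears to give the same result as the one it replaces, using only core `String` methods. The sample `text` is an assumption for illustration. When `String#count` is given several arguments it counts characters in the intersection of the sets, so `count("\r\n", "\n")` counts only `"\n"` and a `"\r\n"` pair is not counted twice.

``` ruby
# Hypothetical token text with mixed line endings.
text = "a\r\nb\nc"

# 0.0.4 expression: the intersection of the sets {"\r", "\n"} and {"\n"} is {"\n"}.
new_count = text.count("\r\n", "\n")

# 0.0.3 expression, kept here for comparison.
old_count = ("~" + text + "~").lines.count - 1

# A lone carriage return is not treated as a line break by the new expression.
lone_cr_count = "x\ry".count("\r\n", "\n")

p new_count
p old_count
p lone_cr_count
```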
@@ -3,5 +3,5 @@
3
3
  # Copyright (c) Chris Corbyn, 2011
4
4
 
5
5
  module Whittle
6
- VERSION = "0.0.3"
6
+ VERSION = "0.0.4"
7
7
  end
@@ -0,0 +1,43 @@
1
+ require "spec_helper"
2
+
3
+ describe "a parser receiving only partial input" do
4
+ let(:parser) do
5
+ Class.new(Whittle::Parser) do
6
+ rule("a")
7
+ rule("b")
8
+ rule("c")
9
+
10
+ rule(";")
11
+
12
+ rule(:abc) do |r|
13
+ r["a", "b", "c"]
14
+ end
15
+
16
+ rule(:prog) do |r|
17
+ r[:abc, ";"]
18
+ end
19
+
20
+ start(:prog)
21
+ end
22
+ end
23
+
24
+ it "raises a parse error" do
25
+ expect { parser.new.parse("abc") }.to raise_error(Whittle::ParseError)
26
+ end
27
+
28
+ it "reports the expected token" do
29
+ begin
30
+ parser.new.parse("abc")
31
+ rescue Whittle::ParseError => e
32
+ e.expected.should == [";"]
33
+ end
34
+ end
35
+
36
+ it "indicates :$end as the received token" do
37
+ begin
38
+ parser.new.parse("abc")
39
+ rescue Whittle::ParseError => e
40
+ e.received.should == :$end
41
+ end
42
+ end
43
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: whittle
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.3
4
+ version: 0.0.4
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,11 +9,11 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2011-11-29 00:00:00.000000000 Z
12
+ date: 2011-11-30 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
16
- requirement: &70110140399280 !ruby/object:Gem::Requirement
16
+ requirement: &70302063205940 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ~>
@@ -21,7 +21,7 @@ dependencies:
21
21
  version: '2.6'
22
22
  type: :development
23
23
  prerelease: false
24
- version_requirements: *70110140399280
24
+ version_requirements: *70302063205940
25
25
  description: ! "Write powerful parsers by defining a series of very simple rules\n
26
26
  \ and operations to perform as those rules are matched. Whittle\n
27
27
  \ parsers are written in pure ruby and as such are extremely
@@ -59,6 +59,7 @@ files:
59
59
  - spec/unit/parser/noop_spec.rb
60
60
  - spec/unit/parser/pass_through_parser_spec.rb
61
61
  - spec/unit/parser/precedence_spec.rb
62
+ - spec/unit/parser/premature_eof_spec.rb
62
63
  - spec/unit/parser/self_referential_expr_spec.rb
63
64
  - spec/unit/parser/skipped_tokens_spec.rb
64
65
  - spec/unit/parser/sum_parser_spec.rb
@@ -99,6 +100,7 @@ test_files:
99
100
  - spec/unit/parser/noop_spec.rb
100
101
  - spec/unit/parser/pass_through_parser_spec.rb
101
102
  - spec/unit/parser/precedence_spec.rb
103
+ - spec/unit/parser/premature_eof_spec.rb
102
104
  - spec/unit/parser/self_referential_expr_spec.rb
103
105
  - spec/unit/parser/skipped_tokens_spec.rb
104
106
  - spec/unit/parser/sum_parser_spec.rb