whittle 0.0.3 → 0.0.4

data/README.md CHANGED
@@ -5,6 +5,8 @@ it's 100% ruby. You write parsers by specifying sequences of allowable rules (w
5
5
  other rules, or even to themselves). For each rule in your grammar, you provide a block that
6
6
  is invoked when the grammar is recognized.
7
7
 
8
+ **TL;DR** (Skip to 'Summary & FAQ')
9
+
8
10
  If you're *not* familiar with parsing, you should find Whittle to be a very friendly little
9
11
  parser.
10
12
 
@@ -485,6 +487,198 @@ would probably be a useful exercise.
485
487
 
486
488
  If you have any examples you'd like to contribute, I will gladly add them to the repository.
487
489
 
490
+ ## Summary & FAQ
491
+
492
+ ### Defining a rule to match a chunk of the input string
493
+
494
+ These are called "terminal rules", since they don't lead anywhere beyond themselves. A word of
495
+ caution here: the ordering matters. They are scanned in order from top to bottom.
496
+
497
+ ``` ruby
498
+ rule("keyword")
499
+ # or
500
+ rule(:name => /pattern/)
501
+ ```
502
+
503
+ ### Providing a semantic action for a terminal rule
504
+
505
+ ``` ruby
506
+ rule(:int => /[0-9]+/).as { |str| Integer(str) }
507
+ ```
508
+
509
+ ### Defining a rule to match a sequence of other rules
510
+
511
+ These are called "nonterminal rules", since they require chaining to other rules.
512
+
513
+ ``` ruby
514
+ rule(:sum) do |r|
515
+ r[:int, "+", :int].as { |a, _, b| a + b }
516
+ end
517
+ ```
518
+
519
+ Here, `:int` and `"+"` have been previously declared by other rules. Arguments `a` and `b` in the
520
+ block are the two integers. Argument `_` is the "+" (which we're not using, hence the argument
521
+ name).
522
+
523
+ ### Defining alternatives for the same rule
524
+
525
+ Call `[](*args)` more than once.
526
+
527
+ ``` ruby
528
+ rule(:expr) do |r|
529
+ r[:expr, "+", :expr].as { |a, _, b| a + b }
530
+ r[:expr, "-", :expr].as { |a, _, b| a - b }
531
+ r[:int]
532
+ end
533
+ ```
534
+
535
+ ### Skipping whitespace and comments
536
+
537
+ ``` ruby
538
+ rule(:wsp => /\s+/).skip!
539
+ rule(:comment => /#.*$/).skip!
540
+ ```
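One detail worth checking when writing a comment rule: in Ruby, `$` already matches at the end of each line, while the `/m` flag makes `.` match newlines as well. The following is a minimal plain-Ruby check, not Whittle code, and the sample input is made up for illustration. It shows how far each variant of the pattern reaches:

```ruby
# Hypothetical input: a comment line followed by a line of code.
source = "# a comment\nx = 1"

# Without /m, "." stops at the newline, so only the comment is matched.
single_line = source[/#.*$/]

# With /m, "." also matches "\n", so the match runs to the end of the input.
multi_line = source[/#.*$/m]

p single_line
p multi_line
```

For a skipped comment rule, the variant without `/m` is probably the one you want, since the other would discard everything after the first `#`.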
541
+
542
+ ### Looking for the same thing multiple times
543
+
544
+ Define an alternative that matches a single item, then add a recursive alternative that matches the rule itself followed by a single item.
545
+
546
+ ``` ruby
547
+ rule(:list) do |r|
548
+ r[:list, :id].as { |list, id| list << id }
549
+ r[:id].as { |id| [id] }
550
+ end
551
+ ```
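To see how the two actions of the `:list` rule cooperate, here is a rough plain-Ruby simulation of the order in which a left-recursive rule would typically invoke them. It does not use Whittle; the lambdas `single` and `append` and the sample ids are hypothetical stand-ins for the blocks above.

```ruby
# The two actions from the :list rule, written as plain lambdas.
single = lambda { |id| [id] }              # corresponds to r[:id]
append = lambda { |list, id| list << id }  # corresponds to r[:list, :id]

# Hypothetical token values, in input order.
ids = ["a", "b", "c"]

# The first item is reduced on its own, then each later item is folded
# into the list that has been built so far.
list = ids.drop(1).inject(single.call(ids.first)) do |acc, id|
  append.call(acc, id)
end

p list
```

Because the recursion is on the left, the list is built front to back, which is why `list << id` keeps the items in input order.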
552
+
553
+ If you want to allow zero of something, add an additional `r[]`.
554
+
555
+ ### Looking for a comma separated list of something
556
+
557
+ Just like above, but with a comma in the recursive alternative.
558
+
559
+ ``` ruby
560
+ rule(:list) do |r|
561
+ r[:list, ",", :id].as { |list, _, id| list << id }
562
+ r[:id].as { |id| [id] }
563
+ end
564
+ ```
565
+
566
+ ### Evaluate the left hand side of binary expressions as early as possible
567
+
568
+ This is called left association. Tag the operators with `% :left`. They are tagged `% :right` by default.
569
+
570
+ ``` ruby
571
+ rule("+") % :left
572
+ ```
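Association makes no visible difference for `+`, but it does for an operator like `-`. A small plain-Ruby sketch (no Whittle involved; the operands are arbitrary) of the two groupings:

```ruby
operands = [1, 2, 3]

# Left association groups from the left: (1 - 2) - 3
left = operands.inject { |a, b| a - b }

# Right association groups from the right: 1 - (2 - 3)
right = operands.reverse.inject { |acc, x| x - acc }

p left   # => -4
p right  # => 2
```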
573
+
574
+ ### Give one operator a higher precedence than another
575
+
576
+ Attach a precedence number to any operators that need them. The higher the number, the higher the precedence.
577
+
578
+ ``` ruby
579
+ rule("+") ^ 1
580
+ rule("*") ^ 2
581
+ ```
582
+
583
+ ### I have two types of expression: binary and function call. How can I allow a binary expression in a function call argument, and a function call in a binary expression?
584
+
585
+ If you can explain it this simply on paper, you can explain it formally in your grammar. If `:binary_expr`
586
+ allows `:invocation_expr` as an operand, and if `:invocation_expr` allows `:binary_expr` as an argument, then
587
+ what you're saying is they can be used in place of each other; thus, define a rule that represents the two of them
588
+ and use that new rule where you want to support both types of expression.
589
+
590
+ Assuming your grammar looked something like this:
591
+
592
+ ``` ruby
593
+ rule("+")
594
+
595
+ rule(:int => /[0-9]+/).as { |i| Integer(i) }
596
+ rule(:id => /\w+/)
597
+
598
+ rule(:binary_expr) do |r|
599
+ r[:binary_expr, "+", :binary_expr].as { |a, _, b| a + b}
600
+ r[:int]
601
+ end
602
+
603
+ rule(:args) do |r|
604
+ r[].as { [] } # empty list
605
+ r[:args, :int].as { |args, i| args << i }
606
+ r[:int].as { |i| [i] }
607
+ end
608
+
609
+ rule(:invocation_expr) do |r|
610
+ r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
611
+ end
612
+ ```
613
+
614
+ This grammar can parse things like "1 + 2 + 3" and "foo(1, 2, 3)", but it can't parse something like
615
+ "1 + foo(2 + 3) + 4".
616
+
617
+ The goal is to replace `:int` in the `:args` rule and `:binary_expr` in the `:binary_expr` rule, with
618
+ something that represents both types of expression.
619
+
620
+ ``` ruby
621
+ rule("+")
622
+
623
+ rule(:int => /[0-9]+/).as { |i| Integer(i) }
624
+ rule(:id => /\w+/)
625
+
626
+ rule(:expr) do |r|
627
+ r[:binary_expr]
628
+ r[:invocation_expr]
629
+ end
630
+
631
+ rule(:binary_expr) do |r|
632
+ r[:expr, "+", :expr].as { |a, _, b| a + b}
633
+ r[:int]
634
+ end
635
+
636
+ rule(:args) do |r|
637
+ r[].as { [] } # empty list
638
+ r[:args, :expr].as { |args, expr| args << expr }
639
+ r[:expr].as { |expr| [expr] }
640
+ end
641
+
642
+ rule(:invocation_expr) do |r|
643
+ r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
644
+ end
645
+ ```
646
+
647
+ Now we can parse the more complex expression "1 + foo(2 + 3) + 4" without any issues.
648
+
649
+ ### How do I track state to store variables etc with Whittle?
650
+
651
+ One of the goals of making Whittle all ruby was that I wouldn't have to tie people into any particular way of doing
652
+ something. Your blocks can call any ruby code they like, so create an object of some sort those blocks can reference
653
+ and do as you need during the parse. For example, you could add a method to the class called something like `runtime`,
654
+ which is accessible from each block.
655
+
656
+ ### I just want Whittle to give me an AST of my input
657
+
658
+ AST (abstract syntax tree) is a loose term. Early versions created an AST, but the format you want the AST
659
+ in probably differs from the format the next developer wants it in. It's really easy to use your grammar to make one
660
+ however you please:
661
+
662
+ ``` ruby
663
+ class Parser < Whittle::Parser
664
+ rule("+")
665
+
666
+ rule(:int => /[0-9]+/).as { |int| { :int => int } }
667
+
668
+ rule(:sum) do |r|
669
+ r[:int, "+", :int].as { |a, _, b| { :sum => [a, b] } }
670
+ end
671
+
672
+ start(:sum)
673
+ end
674
+
675
+ p Parser.new.parse("1+2")
676
+ # =>
677
+ # {:sum=>[{:int=>"1"}, {:int=>"2"}]}
678
+ ```
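Once the parser returns a hash-based tree like the one above, evaluating it can be a short recursive walk. This is only a sketch: `evaluate` is a hypothetical helper, not part of Whittle, and it assumes every node is a single-pair hash of the form shown in the output.

```ruby
# Hypothetical helper (not part of Whittle): evaluates the hash-based tree.
def evaluate(node)
  type, value = node.first # each node is assumed to be a single-pair hash
  case type
  when :int then Integer(value)
  when :sum then value.map { |child| evaluate(child) }.inject(:+)
  else raise ArgumentError, "unknown node type #{type.inspect}"
  end
end

# The tree printed by the parser example above.
ast = { :sum => [{ :int => "1" }, { :int => "2" }] }

p evaluate(ast)
```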
679
+
680
+ (There could be a side-project in this if somebody thinks a "generic AST" is useful enough).
681
+
488
682
  ## Issues & Questions
489
683
 
490
684
  Any issues, I will address them quickly as it is still early days, though I am pretty confident,
@@ -139,10 +139,11 @@ module Whittle
139
139
  {},
140
140
  self,
141
141
  {
142
- :state => initial_state,
143
- :seen => [],
144
- :offset => 0,
145
- :prec => 0
142
+ :initial => true,
143
+ :state => initial_state,
144
+ :seen => [],
145
+ :offset => 0,
146
+ :prec => 0
146
147
  }
147
148
  )
148
149
  end
@@ -207,13 +208,13 @@ module Whittle
207
208
  states << instruction[:state]
208
209
  args << token[:rule].action.call(token[:value])
209
210
  break
210
- when :reduce
211
+ when :reduce, :accept
211
212
  rule = instruction[:rule]
212
213
  size = rule.components.length
213
214
  args << rule.action.call(*args.pop(size))
214
215
  states.pop(size)
215
216
 
216
- if states.length == 1 && token[:name] == :$end
217
+ if states.length == 1 && instruction[:action] == :accept
217
218
  return args.pop
218
219
  elsif goto = table[states.last][rule.name]
219
220
  states << goto[:state]
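One plausible reading of this hunk (an interpretation, not something the diff states): with the old check, a reduction that left a single state on the stack at end of input was treated as success, even if the start rule was not complete. The sketch below lifts both conditions out of the parse loop and evaluates them on a made-up snapshot; `old_done`, `new_done` and the snapshot values are hypothetical.

```ruby
# Old check (0.0.3) and new check (0.0.4), lifted out of the parse loop.
old_done = lambda { |states, token, instruction| states.length == 1 && token[:name] == :$end }
new_done = lambda { |states, token, instruction| states.length == 1 && instruction[:action] == :accept }

# Hypothetical snapshot: an inner rule has just been reduced at end of input,
# but the instruction that triggered it was a plain :reduce, not :accept.
states      = [:initial_state]
token       = { :name => :$end }
instruction = { :action => :reduce }

p old_done.call(states, token, instruction) # => true
p new_done.call(states, token, instruction) # => false
```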
@@ -300,7 +301,7 @@ module Whittle
300
301
  end
301
302
 
302
303
  def extract_expected_tokens(state)
303
- state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
304
+ state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
304
305
  end
305
306
  end
306
307
  end
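The effect of the `extract_expected_tokens` change can be tried on a plain hash. The state below is invented for illustration (real tables are built by Whittle from the grammar), and the sketch assumes Ruby 1.9 or later, where `Hash#select` returns a Hash.

```ruby
# Hypothetical parse-table state, keyed by token or rule name.
state = {
  ";"   => { :action => :shift },
  "+"   => { :action => :reduce },
  :expr => { :action => :goto },
  :$end => { :action => :accept }
}

# 0.0.3: everything except gotos, with a nil key reported as :$end.
old_expected = state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }

# 0.0.4: only tokens that can be shifted or accepted.
new_expected = state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys

p old_expected # => [";", "+", :$end]
p new_expected # => [";", :$end]
```

In this made-up state, the newer version no longer reports `"+"`, which only appears as a lookahead for a reduction.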
@@ -88,6 +88,13 @@ module Whittle
88
88
  :rule => self,
89
89
  :prec => context[:prec]
90
90
  }
91
+
92
+ if context[:initial]
93
+ state[:$end] = {
94
+ :action => :accept,
95
+ :rule => self
96
+ }
97
+ end
91
98
  else
92
99
  raise GrammarError, "Unreferenced rule #{sym.inspect}" if rule.nil?
93
100
 
@@ -126,10 +133,11 @@ module Whittle
126
133
  table,
127
134
  parser,
128
135
  {
129
- :state => new_state,
130
- :seen => context[:seen],
131
- :offset => new_offset,
132
- :prec => new_prec
136
+ :initial => context[:initial],
137
+ :state => new_state,
138
+ :seen => context[:seen],
139
+ :offset => new_offset,
140
+ :prec => new_prec
133
141
  }
134
142
  )
135
143
  end
@@ -227,8 +235,7 @@ module Whittle
227
235
  {
228
236
  :rule => self,
229
237
  :value => match[0],
230
- # FIXME: Optimize this line count in a cross-platform compatible way
231
- :line => line + ("~" + match[0] + "~").lines.count - 1,
238
+ :line => line + match[0].count("\r\n", "\n"),
232
239
  :discarded => @action.equal?(NULL_ACTION)
233
240
  }
234
241
  end
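A quick plain-Ruby comparison of the two line-counting expressions from this hunk. Note that `String#count` with several arguments counts the characters in the intersection of the sets, so `count("\r\n", "\n")` effectively counts `"\n"`. The sample strings are arbitrary.

```ruby
samples = ["", "abc", "a\nb", "a\r\nb", "\n\n", "a\rb"]

samples.each do |text|
  old_count = ("~" + text + "~").lines.count - 1 # expression used in 0.0.3
  new_count = text.count("\r\n", "\n")           # expression used in 0.0.4
  puts "#{text.inspect}: old=#{old_count} new=#{new_count}"
end
```

For these samples both expressions give the same number of line breaks; the newer one just avoids building intermediate strings.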
@@ -3,5 +3,5 @@
3
3
  # Copyright (c) Chris Corbyn, 2011
4
4
 
5
5
  module Whittle
6
- VERSION = "0.0.3"
6
+ VERSION = "0.0.4"
7
7
  end
@@ -0,0 +1,43 @@
1
+ require "spec_helper"
2
+
3
+ describe "a parser receiving only partial input" do
4
+ let(:parser) do
5
+ Class.new(Whittle::Parser) do
6
+ rule("a")
7
+ rule("b")
8
+ rule("c")
9
+
10
+ rule(";")
11
+
12
+ rule(:abc) do |r|
13
+ r["a", "b", "c"]
14
+ end
15
+
16
+ rule(:prog) do |r|
17
+ r[:abc, ";"]
18
+ end
19
+
20
+ start(:prog)
21
+ end
22
+ end
23
+
24
+ it "raises a parse error" do
25
+ expect { parser.new.parse("abc") }.to raise_error(Whittle::ParseError)
26
+ end
27
+
28
+ it "reports the expected token" do
29
+ begin
30
+ parser.new.parse("abc")
31
+ rescue Whittle::ParseError => e
32
+ e.expected.should == [";"]
33
+ end
34
+ end
35
+
36
+ it "indicates :$end as the received token" do
37
+ begin
38
+ parser.new.parse("abc")
39
+ rescue Whittle::ParseError => e
40
+ e.received.should == :$end
41
+ end
42
+ end
43
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: whittle
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.3
4
+ version: 0.0.4
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,11 +9,11 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2011-11-29 00:00:00.000000000 Z
12
+ date: 2011-11-30 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rspec
16
- requirement: &70110140399280 !ruby/object:Gem::Requirement
16
+ requirement: &70302063205940 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ~>
@@ -21,7 +21,7 @@ dependencies:
21
21
  version: '2.6'
22
22
  type: :development
23
23
  prerelease: false
24
- version_requirements: *70110140399280
24
+ version_requirements: *70302063205940
25
25
  description: ! "Write powerful parsers by defining a series of very simple rules\n
26
26
  \ and operations to perform as those rules are matched. Whittle\n
27
27
  \ parsers are written in pure ruby and as such are extremely
@@ -59,6 +59,7 @@ files:
59
59
  - spec/unit/parser/noop_spec.rb
60
60
  - spec/unit/parser/pass_through_parser_spec.rb
61
61
  - spec/unit/parser/precedence_spec.rb
62
+ - spec/unit/parser/premature_eof_spec.rb
62
63
  - spec/unit/parser/self_referential_expr_spec.rb
63
64
  - spec/unit/parser/skipped_tokens_spec.rb
64
65
  - spec/unit/parser/sum_parser_spec.rb
@@ -99,6 +100,7 @@ test_files:
99
100
  - spec/unit/parser/noop_spec.rb
100
101
  - spec/unit/parser/pass_through_parser_spec.rb
101
102
  - spec/unit/parser/precedence_spec.rb
103
+ - spec/unit/parser/premature_eof_spec.rb
102
104
  - spec/unit/parser/self_referential_expr_spec.rb
103
105
  - spec/unit/parser/skipped_tokens_spec.rb
104
106
  - spec/unit/parser/sum_parser_spec.rb