whittle 0.0.3 → 0.0.4
- data/README.md +194 -0
- data/lib/whittle/parser.rb +8 -7
- data/lib/whittle/rule.rb +13 -6
- data/lib/whittle/version.rb +1 -1
- data/spec/unit/parser/premature_eof_spec.rb +43 -0
- metadata +6 -4
data/README.md
CHANGED
@@ -5,6 +5,8 @@ it's 100% ruby. You write parsers by specifying sequences of allowable rules (w
 other rules, or even to themselves). For each rule in your grammar, you provide a block that
 is invoked when the grammar is recognized.
 
+**TL;DR** (Skip to 'Summary & FAQ')
+
 If you're *not* familiar with parsing, you should find Whittle to be a very friendly little
 parser.
 
@@ -485,6 +487,198 @@ would probably be a useful exercise.
 
 If you have any examples you'd like to contribute, I will gladly add them to the repository.
 
+## Summary & FAQ
+
+### Defining a rule to match a chunk of the input string
+
+These are called "terminal rules", since they don't lead anywhere beyond themselves. A word of
+caution here: the ordering matters. They are scanned in order from top to bottom.
+
+``` ruby
+rule("keyword")
+# or
+rule(:name => /pattern/)
+```
+
+### Providing a semantic action for a terminal rule
+
+``` ruby
+rule(:int => /[0-9]+/).as { |str| Integer(str) }
+```
+
+### Defining a rule to match a sequence of other rules
+
+These are called "nonterminal rules", since they require chaining to other rules.
+
+``` ruby
+rule(:sum) do |r|
+  r[:int, "+", :int].as { |a, _, b| a + b }
+end
+```
+
+Where `:int` and `"+"` have been previously declared by other rules. Arguments `a` and `b` in the
+block are the two integers. Argument `_` is the "+" (which we're not using, hence the argument
+name).
+
+### Defining alternatives for the same rule
+
+Call `[](*args)` more than once.
+
+``` ruby
+rule(:expr) do |r|
+  r[:expr, "+", :expr].as { |a, _, b| a + b }
+  r[:expr, "-", :expr].as { |a, _, b| a - b }
+  r[:int]
+end
+```
+
+### Skipping whitespace and comments
+
+``` ruby
+rule(:wsp => /\s+/).skip!
+rule(:comment => /#.*$/m).skip!
+```
+
+### Looking for the same thing multiple times
+
+Define the rule for the single item, then add another rule for itself, followed by the single item.
+
+``` ruby
+rule(:list) do |r|
+  r[:list, :id].as { |list, id| list << id }
+  r[:id].as { |id| [id] }
+end
+```
+
+If you want to allow zero of something, add an additional `r[]`.
+
+### Looking for a comma separated list of something
+
+Just like for above, but with a comma in our recursive rule.
+
+``` ruby
+rule(:list) do |r|
+  r[:list, ",", :id].as { |list, _, id| list << id }
+  r[:id].as { |id| [id] }
+end
+```
+
+### Evaluate the left hand side of binary expressions as early as possible
+
+This is called left association. Tag the operators with `% :left`. They are tagged `% :right` by default.
+
+``` ruby
+rule("+") % :left
+```
+
+### Give one operator a higher precedence than another
+
+Attach a precedence number to any operators that need them. The higher the number, the higher the precedence.
+
+``` ruby
+rule("+") ^ 1
+rule("*") ^ 2
+```
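The effect of the precedence numbers (`^`) and the associativity tags (`% :left`, `% :right`) described above can be illustrated without Whittle. The following is a minimal sketch using precedence climbing, which is a different technique from the shift/reduce table in `parser.rb`; it only shows how the two settings change the grouping of operands. The names `LEFT_OPS`, `RIGHT_MINUS_OPS`, `tokenize` and `evaluate` are made up for this illustration and are not part of Whittle.

```ruby
# Hypothetical operator tables (not Whittle's API): each operator gets a
# precedence number and an associativity, mirroring `^` and `%` above.
LEFT_OPS = {
  "+" => { :prec => 1, :assoc => :left },
  "-" => { :prec => 1, :assoc => :left },
  "*" => { :prec => 2, :assoc => :left }
}

# Same precedences, but "-" groups to the right.
RIGHT_MINUS_OPS = LEFT_OPS.merge("-" => { :prec => 1, :assoc => :right })

# Split the input into integer and operator tokens; whitespace is dropped.
def tokenize(input)
  input.scan(/\d+|[-+*]/)
end

# Precedence climbing: read one integer, then absorb every following operator
# whose precedence is at least min_prec, recursing for the right operand.
def evaluate(tokens, ops, min_prec = 1)
  left = Integer(tokens.shift)
  while (op = ops[tokens.first]) && op[:prec] >= min_prec
    name = tokens.shift
    # A left-associative operator refuses equal-precedence operators on its
    # right; a right-associative one accepts them.
    next_min = op[:assoc] == :left ? op[:prec] + 1 : op[:prec]
    left = left.send(name, evaluate(tokens, ops, next_min))
  end
  left
end

p evaluate(tokenize("1 + 2 * 3"), LEFT_OPS)         # "*" binds tighter: 1 + (2 * 3)
p evaluate(tokenize("10 - 4 - 3"), LEFT_OPS)        # grouped as (10 - 4) - 3
p evaluate(tokenize("10 - 4 - 3"), RIGHT_MINUS_OPS) # grouped as 10 - (4 - 3)
```

With subtraction the grouping is visible in the result, which is why associativity matters even when every operator has the same precedence.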
+
+### I have two types of expression: binary and function call. How can I allow a binary expression in a function call argument, and a function call in a binary expression?
+
+If you can explain it this simply on paper, you can explain it formally in your grammar. If `:binary_expr`
+allows `:invocation_expr` as an operand, and if `:invocation_expr` allows `:binary_expr` as an argument, then
+what you're saying is they can be used in place of each other; thus, define a rule that represents the two of them
+and use that new rule where you want to support both types of expression.
+
+Assuming your grammar looked something like this:
+
+``` ruby
+rule("+")
+
+rule(:int => /[0-9]+/).as { |i| Integer(i) }
+rule(:id => /\w+/)
+
+rule(:binary_expr) do |r|
+  r[:binary_expr, "+", :binary_expr].as { |a, _, b| a + b}
+  r[:int]
+end
+
+rule(:args) do |r|
+  r[].as { [] } # empty list
+  r[:args, :int].as { |args, i| args << i }
+  r[:int].as { |i| [i] }
+end
+
+rule(:invocation_expr) do |r|
+  r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
+end
+```
+
+This grammar can parse things like "1 + 2 + 3" and "foo(1, 2, 3)", but it can't parse something like
+"1 + foo(2 + 3) + 4".
+
+The goal is to replace `:int` in the `:args` rule and `:binary_expr` in the `:binary_expr` rule, with
+something that represents both types of expression.
+
+``` ruby
+rule("+")
+
+rule(:int => /[0-9]+/).as { |i| Integer(i) }
+rule(:id => /\w+/)
+
+rule(:expr) do |r|
+  r[:binary_expr]
+  r[:invocation_expr]
+end
+
+rule(:binary_expr) do |r|
+  r[:expr, "+", :expr].as { |a, _, b| a + b}
+  r[:int]
+end
+
+rule(:args) do |r|
+  r[].as { [] } # empty list
+  r[:args, :expr].as { |args, expr| args << expr }
+  r[:expr].as { |expr| [expr] }
+end
+
+rule(:invocation_expr) do |r|
+  r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
+end
+```
+
+Now we can parse the more complex expression "1 + foo(2, 3) + 4" without any issues.
+
+### How do I track state to store variables etc with Whittle?
+
+One of the goals of making Whittle all ruby was that I wouldn't have to tie people into any particular way of doing
+something. Your blocks can call any ruby code they like, so create an object of some sort those blocks can reference
+and do as you need during the parse. For example, you could add a method to the class called something like `runtime`,
+which is accessible from each block.
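The `runtime` idea above might look roughly like the following. This is a plain-Ruby sketch that does not load Whittle: `VariableStore`, `Interpreter`, `assignment_action` and `variable_action` are hypothetical names, and the lambdas only stand in for the blocks you would hand to `.as { ... }` in a real grammar.

```ruby
# Hypothetical state object (not part of Whittle): holds variables seen so far.
class VariableStore
  def initialize
    @vars = {}
  end

  def assign(name, value)
    @vars[name] = value
  end

  def lookup(name)
    @vars.fetch(name) # raises KeyError for an undefined variable
  end
end

# Stand-in for a parser class. The lambdas play the role of semantic action
# blocks; they reach shared state through the `runtime` method.
class Interpreter
  def runtime
    @runtime ||= VariableStore.new
  end

  # Would correspond to a rule shaped like [:id, "=", :expr].
  def assignment_action
    lambda { |name, _, value| runtime.assign(name, value) }
  end

  # Would correspond to a rule shaped like [:id].
  def variable_action
    lambda { |name| runtime.lookup(name) }
  end
end

interp = Interpreter.new
interp.assignment_action.call("x", "=", 42)
p interp.variable_action.call("x")
```

Keeping the state in its own object means a fresh parser instance starts with an empty set of variables, and the store can be tested without parsing anything.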
+
+### I just want Whittle to give me an AST of my input
+
+AST (abstract syntax tree) is a loose term. Early versions originally created an AST, but the format you want the AST
+in probably differs from the format the next developer wants it in. It's really easy to use your grammar to make one
+however you please:
+
+``` ruby
+class Parser < Whittle::Parser
+  rule("+")
+
+  rule(:int => /[0-9]+/).as { |int| { :int => int } }
+
+  rule(:sum) do |r|
+    r[:int, "+", :int].as { |a, _, b| { :sum => [a, b] } }
+  end
+
+  start(:sum)
+end
+
+p Parser.new.parse("1+2")
+# =>
+# {:sum=>[{:int=>"1"}, {:int=>"2"}]}
+```
+
+(There could be a side-project in this if somebody thinks a "generic AST" is useful enough).
+
 ## Issues & Questions
 
 Any issues, I will address them quickly as it is still early days, though I am pretty confident,
data/lib/whittle/parser.rb
CHANGED
@@ -139,10 +139,11 @@ module Whittle
           {},
           self,
           {
-            :
-            :
-            :
-            :
+            :initial => true,
+            :state => initial_state,
+            :seen => [],
+            :offset => 0,
+            :prec => 0
           }
         )
       end
@@ -207,13 +208,13 @@ module Whittle
             states << instruction[:state]
             args << token[:rule].action.call(token[:value])
             break
-          when :reduce
+          when :reduce, :accept
             rule = instruction[:rule]
             size = rule.components.length
             args << rule.action.call(*args.pop(size))
             states.pop(size)
 
-            if states.length == 1 &&
+            if states.length == 1 && instruction[:action] == :accept
               return args.pop
             elsif goto = table[states.last][rule.name]
               states << goto[:state]
@@ -300,7 +301,7 @@ module Whittle
     end
 
     def extract_expected_tokens(state)
-      state.
+      state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
     end
   end
 end
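To see what the new `extract_expected_tokens` filter returns, here is a small sketch run against a hand-written state. The state hash is invented for illustration (the `:goto` entry, the state numbers and the combination of entries are assumptions, not taken from a real Whittle parse table); only the `select` expression is copied from the hunk above.

```ruby
# Hypothetical parse-table state: keys are the tokens or rule names that may
# come next, values are instructions. Entries and state numbers are invented.
state = {
  ";"   => { :action => :shift,  :state => 5 },
  :prog => { :action => :goto,   :state => 6 },
  :$end => { :action => :accept, :rule  => :prog }
}

# Same filter as the 0.0.4 line in the diff: keep only the entries that
# consume input (:shift) or finish the parse (:accept), and return their keys.
def extract_expected_tokens(state)
  state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
end

p extract_expected_tokens(state)
```

In this sketch the `:goto` entry is filtered out, so only tokens that could legitimately appear next in the input are reported as expected.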
data/lib/whittle/rule.rb
CHANGED
@@ -88,6 +88,13 @@ module Whittle
             :rule => self,
             :prec => context[:prec]
           }
+
+          if context[:initial]
+            state[:$end] = {
+              :action => :accept,
+              :rule => self
+            }
+          end
         else
           raise GrammarError, "Unreferenced rule #{sym.inspect}" if rule.nil?
 
@@ -126,10 +133,11 @@ module Whittle
             table,
             parser,
             {
-              :
-              :
-              :
-              :
+              :initial => context[:initial],
+              :state => new_state,
+              :seen => context[:seen],
+              :offset => new_offset,
+              :prec => new_prec
             }
           )
         end
@@ -227,8 +235,7 @@ module Whittle
         {
           :rule => self,
           :value => match[0],
-
-          :line => line + ("~" + match[0] + "~").lines.count - 1,
+          :line => line + match[0].count("\r\n", "\n"),
           :discarded => @action.equal?(NULL_ACTION)
         }
       end
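The last hunk swaps one way of counting newlines in a matched token for another. A quick way to compare the two expressions is to run both on a few strings; the sketch below does that in plain Ruby. The method names `newlines_via_lines` and `newlines_via_count` are made up here, and the sample strings are arbitrary.

```ruby
# 0.0.3 expression: pad the text so a trailing newline still opens a new
# line, split into lines, and subtract the line the token started on.
def newlines_via_lines(text)
  ("~" + text + "~").lines.count - 1
end

# 0.0.4 expression: String#count with several arguments counts characters in
# the intersection of the given sets, which for these two sets is just "\n".
def newlines_via_count(text)
  text.count("\r\n", "\n")
end

["foo", "foo\nbar", "foo\r\nbar\n", "\n\n", ""].each do |text|
  p [text, newlines_via_lines(text), newlines_via_count(text)]
end
```

Because the intersection of `"\r\n"` and `"\n"` is the single character `"\n"`, a Windows line ending counts once rather than twice, and a lone `"\r"` is not counted at all.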
data/lib/whittle/version.rb
CHANGED
data/spec/unit/parser/premature_eof_spec.rb
ADDED
@@ -0,0 +1,43 @@
+require "spec_helper"
+
+describe "a parser receiving only partial input" do
+  let(:parser) do
+    Class.new(Whittle::Parser) do
+      rule("a")
+      rule("b")
+      rule("c")
+
+      rule(";")
+
+      rule(:abc) do |r|
+        r["a", "b", "c"]
+      end
+
+      rule(:prog) do |r|
+        r[:abc, ";"]
+      end
+
+      start(:prog)
+    end
+  end
+
+  it "raises a parse error" do
+    expect { parser.new.parse("abc") }.to raise_error(Whittle::ParseError)
+  end
+
+  it "reports the expected token" do
+    begin
+      parser.new.parse("abc")
+    rescue Whittle::ParseError => e
+      e.expected.should == [";"]
+    end
+  end
+
+  it "indicates :$end as the received token" do
+    begin
+      parser.new.parse("abc")
+    rescue Whittle::ParseError => e
+      e.received.should == :$end
+    end
+  end
+end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: whittle
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.4
 prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-11-
+date: 2011-11-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &
+  requirement: &70302063205940 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -21,7 +21,7 @@ dependencies:
         version: '2.6'
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *70302063205940
 description: ! "Write powerful parsers by defining a series of very simple rules\n
   \ and operations to perform as those rules are matched. Whittle\n
   \ parsers are written in pure ruby and as such are extremely
@@ -59,6 +59,7 @@ files:
 - spec/unit/parser/noop_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
+- spec/unit/parser/premature_eof_spec.rb
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb
@@ -99,6 +100,7 @@ test_files:
 - spec/unit/parser/noop_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
+- spec/unit/parser/premature_eof_spec.rb
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb