whittle 0.0.3 → 0.0.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.md +194 -0
- data/lib/whittle/parser.rb +8 -7
- data/lib/whittle/rule.rb +13 -6
- data/lib/whittle/version.rb +1 -1
- data/spec/unit/parser/premature_eof_spec.rb +43 -0
- metadata +6 -4
data/README.md
CHANGED
@@ -5,6 +5,8 @@ it's 100% ruby. You write parsers by specifying sequences of allowable rules (w
 other rules, or even to themselves). For each rule in your grammar, you provide a block that
 is invoked when the grammar is recognized.
 
+**TL;DR** (Skip to 'Summary & FAQ')
+
 If you're *not* familiar with parsing, you should find Whittle to be a very friendly little
 parser.
 
@@ -485,6 +487,198 @@ would probably be a useful exercise.
 
 If you have any examples you'd like to contribute, I will gladly add them to the repository.
 
+## Summary & FAQ
+
+### Defining a rule to match a chunk of the input string
+
+These are called "terminal rules", since they don't lead anywhere beyond themselves. A word of
+caution here: the ordering matters. They are scanned in order from top to bottom.
+
+``` ruby
+rule("keyword")
+# or
+rule(:name => /pattern/)
+```
+
+### Providing a semantic action for a terminal rule
+
+``` ruby
+rule(:int => /[0-9]+/).as { |str| Integer(str) }
+```
+
+### Defining a rule to match a sequence of other rules
+
+These are called "nonterminal rules", since they require chaining to other rules.
+
+``` ruby
+rule(:sum) do |r|
+  r[:int, "+", :int].as { |a, _, b| a + b }
+end
+```
+
+Where `:int` and `"+"` have been previously declared by other rules. Arguments `a` and `b` in the
+block are the two integers. Argument `_` is the "+" (which we're not using, hence the argument
+name).
+
+### Defining alternatives for the same rule
+
+Call `[](*args)` more than once.
+
+``` ruby
+rule(:expr) do |r|
+  r[:expr, "+", :expr].as { |a, _, b| a + b }
+  r[:expr, "-", :expr].as { |a, _, b| a - b }
+  r[:int]
+end
+```
+
+### Skipping whitespace and comments
+
+``` ruby
+rule(:wsp => /\s+/).skip!
+rule(:comment => /#.*$/m).skip!
+```
+
+### Looking for the same thing multiple times
+
+Define the rule for the single item, then add another rule for itself, followed by the single item.
+
+``` ruby
+rule(:list) do |r|
+  r[:list, :id].as { |list, id| list << id }
+  r[:id].as { |id| [id] }
+end
+```
+
+If you want to allow zero of something, add an additional `r[]`.
+
+### Looking for a comma separated list of something
+
+Just like for above, but with a comma in our recursive rule.
+
+``` ruby
+rule(:list) do |r|
+  r[:list, ",", :id].as { |list, _, id| list << id }
+  r[:id].as { |id| [id] }
+end
+```
+
+### Evaluate the left hand side of binary expressions as early as possible
+
+This is called left association. Tag the operators with `% :left`. They are tagged `% :right` by default.
+
+``` ruby
+rule("+") % :left
+```
+
+### Give one operator a higher precedence than another
+
+Attach a precedence number to any operators that need them. The higher the number, the higher the precedence.
+
+``` ruby
+rule("+") ^ 1
+rule("*") ^ 2
+```
+
+### I have two types of expression: binary and function call. How can I allow a binary expression in a function call argument, and a function call in a binary expression?
+
+If you can explain it this simply on paper, you can explain it formally in your grammar. If `:binary_expr`
+allows `:invocation_expr` as an operand, and if `:invocation_expr` allows `:binary_expr` as an argument, then
+what you're saying is they can be used in place of each other; thus, define a rule that represents the two of them
+and use that new rule where you want to support both types of expression.
+
+Assuming your grammar looked something like this:
+
+``` ruby
+rule("+")
+
+rule(:int => /[0-9]+/).as { |i| Integer(i) }
+rule(:id => /\w+/)
+
+rule(:binary_expr) do |r|
+  r[:binary_expr, "+", :binary_expr].as { |a, _, b| a + b}
+  r[:int]
+end
+
+rule(:args) do |r|
+  r[].as { [] } # empty list
+  r[:args, :int].as { |args, i| args << i }
+  r[:int].as { |i| [i] }
+end
+
+rule(:invocation_expr) do |r|
+  r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
+end
+```
+
+This grammar can parse things like "1 + 2 + 3" and "foo(1, 2, 3)", but it can't parse something like
+"1 + foo(2 + 3) + 4".
+
+The goal is to replace `:int` in the `:args` rule and `:binary_expr` in the `:binary_expr` rule, with
+something that represents both types of expression.
+
+``` ruby
+rule("+")
+
+rule(:int => /[0-9]+/).as { |i| Integer(i) }
+rule(:id => /\w+/)
+
+rule(:expr) do |r|
+  r[:binary_expr]
+  r[:invocation_expr]
+end
+
+rule(:binary_expr) do |r|
+  r[:expr, "+", :expr].as { |a, _, b| a + b}
+  r[:int]
+end
+
+rule(:args) do |r|
+  r[].as { [] } # empty list
+  r[:args, :expr].as { |args, expr| args << expr }
+  r[:expr].as { |expr| [expr] }
+end
+
+rule(:invocation_expr) do |r|
+  r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
+end
+```
+
+Now we can parse the more complex expression "1 + foo(2, 3) + 4" without any issues.
+
+### How do I track state to store variables etc with Whittle?
+
+One of the goals of making Whittle all ruby was that I wouldn't have to tie people into any particular way of doing
+something. Your blocks can call any ruby code they like, so create an object of some sort those blocks can reference
+and do as you need during the parse. For example, you could add a method to the class called something like `runtime`,
+which is accessible from each block.
+
+### I just want Whittle to give me an AST of my input
+
+AST (abstract syntax tree) is a loose term. Early versions originally created an AST, but the format you want the AST
+in probably differs from the format the next developer wants it in. It's really easy to use your grammar to make one
+however you please:
+
+``` ruby
+class Parser < Whittle::Parser
+  rule("+")
+
+  rule(:int => /[0-9]+/).as { |int| { :int => int } }
+
+  rule(:sum) do |r|
+    r[:int, "+", :int].as { |a, _, b| { :sum => [a, b] } }
+  end
+
+  start(:sum)
+end
+
+p Parser.new.parse("1+2")
+# =>
+# {:sum=>[{:int=>"1"}, {:int=>"2"}]}
+```
+
+(There could be a side-project in this if somebody thinks a "generic AST" is useful enough).
+
 ## Issues & Questions
 
 Any issues, I will address them quickly as it is still early days, though I am pretty confident,
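The associativity entry in the new FAQ is brief, so here is a plain-Ruby illustration that does not use Whittle at all. It evaluates the operands of "1 - 2 - 3" both ways: left association groups the chain as (1 - 2) - 3, which is the grouping the README describes for operators tagged `% :left`, while right association (the README's stated default) groups it as 1 - (2 - 3). The helper names `eval_left` and `eval_right` are hypothetical.

```ruby
# Plain-Ruby illustration (no Whittle involved) of how associativity
# changes the grouping of a chain of the same binary operator.

# Left association: fold from the left, as in (1 - 2) - 3.
def eval_left(operands)
  operands.inject { |acc, n| acc - n }
end

# Right association: fold from the right, as in 1 - (2 - 3).
def eval_right(operands)
  operands.reverse.inject { |acc, n| n - acc }
end

operands = [1, 2, 3] # the operands of "1 - 2 - 3"
puts eval_left(operands)  # prints -4
puts eval_right(operands) # prints 2
```

For `+` on integers both groupings give the same result, so the choice only becomes visible with non-associative operators such as `-` or `/`.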
data/lib/whittle/parser.rb
CHANGED
@@ -139,10 +139,11 @@ module Whittle
 {},
 self,
 {
-:
-:
-:
-:
+:initial => true,
+:state => initial_state,
+:seen => [],
+:offset => 0,
+:prec => 0
 }
 )
 end
@@ -207,13 +208,13 @@ module Whittle
 states << instruction[:state]
 args << token[:rule].action.call(token[:value])
 break
-when :reduce
+when :reduce, :accept
 rule = instruction[:rule]
 size = rule.components.length
 args << rule.action.call(*args.pop(size))
 states.pop(size)
 
-if states.length == 1 &&
+if states.length == 1 && instruction[:action] == :accept
 return args.pop
 elsif goto = table[states.last][rule.name]
 states << goto[:state]
@@ -300,7 +301,7 @@ module Whittle
 end
 
 def extract_expected_tokens(state)
-state.
+state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
 end
 end
 end
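To see why 0.0.4 routes `:accept` through the same branch as `:reduce` but lets only `:accept` end the parse, here is a toy table-driven driver in plain Ruby. It is a sketch, not Whittle's code: the hash-based table, the `TABLE`, `ABC` and `PROG` constants, the `ParseError` class and the `expected_tokens` helper are assumptions loosely modeled on the `instruction[:action]`, `instruction[:rule]` and `goto[:state]` lookups visible in the diff. The hand-built table encodes the grammar from the new `premature_eof_spec.rb` (`prog := abc ";"`, `abc := "a" "b" "c"`), and tokens are passed in pre-split rather than lexed.

```ruby
# Toy LR driver with a hypothetical, hand-built table (not Whittle internals).

class ParseError < StandardError
  attr_reader :expected, :received

  def initialize(expected, received)
    @expected = expected
    @received = received
    super("expected #{expected.inspect}, received #{received.inspect}")
  end
end

ABC  = { :name => :abc,  :size => 3, :action => ->(a, b, c) { a + b + c } }
PROG = { :name => :prog, :size => 2, :action => ->(abc, _semi) { abc } }

# Grammar: prog := abc ";"   abc := "a" "b" "c"
TABLE = {
  0 => { "a" => { :action => :shift, :state => 1 }, :abc => { :action => :goto, :state => 4 } },
  1 => { "b" => { :action => :shift, :state => 2 } },
  2 => { "c" => { :action => :shift, :state => 3 } },
  3 => { ";"   => { :action => :reduce, :rule => ABC },
         :$end => { :action => :reduce, :rule => ABC } },
  4 => { ";" => { :action => :shift, :state => 5 } },
  5 => { :$end => { :action => :accept, :rule => PROG } }
}

# Same selection as the new extract_expected_tokens: report only tokens the
# state can shift, or :$end if the state can accept.
def expected_tokens(state)
  state.select { |_, i| [:shift, :accept].include?(i[:action]) }.keys
end

def parse(tokens)
  states = [0]
  args   = []
  (tokens + [:$end]).each do |token|
    loop do
      instruction = TABLE[states.last][token]
      raise ParseError.new(expected_tokens(TABLE[states.last]), token) if instruction.nil?

      case instruction[:action]
      when :shift
        states << instruction[:state]
        args << token
        break # move on to the next input token
      when :reduce, :accept
        rule = instruction[:rule]
        size = rule[:size]
        args << rule[:action].call(*args.pop(size))
        states.pop(size)
        # Only :accept may finish; a :reduce that leaves one state on the
        # stack (e.g. right after abc) must still follow its goto.
        return args.pop if states.length == 1 && instruction[:action] == :accept
        states << TABLE[states.last][rule[:name]][:state]
      end
    end
  end
end
```

In this toy, a stack-depth test alone would stop right after `abc` is reduced (the stack is back to `[0]`), before the `";"` is seen. Checking for `:accept` keeps the driver running until `:$end`, and truncated input such as `%w[a b c]` raises an error whose `expected` is `[";"]` and whose `received` is `:$end`, mirroring what the new spec asserts.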
data/lib/whittle/rule.rb
CHANGED
@@ -88,6 +88,13 @@ module Whittle
 :rule => self,
 :prec => context[:prec]
 }
+
+if context[:initial]
+state[:$end] = {
+:action => :accept,
+:rule => self
+}
+end
 else
 raise GrammarError, "Unreferenced rule #{sym.inspect}" if rule.nil?
 
@@ -126,10 +133,11 @@ module Whittle
 table,
 parser,
 {
-:
-:
-:
-:
+:initial => context[:initial],
+:state => new_state,
+:seen => context[:seen],
+:offset => new_offset,
+:prec => new_prec
 }
 )
 end
@@ -227,8 +235,7 @@ module Whittle
 {
 :rule => self,
 :value => match[0],
-
-:line => line + ("~" + match[0] + "~").lines.count - 1,
+:line => line + match[0].count("\r\n", "\n"),
 :discarded => @action.equal?(NULL_ACTION)
 }
 end
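The `rule.rb` hunk replaces the expression that advances the line counter after a token is matched. The snippet below wraps the removed and the added expressions in two hypothetical helpers, `old_line_advance` and `new_line_advance`, and compares them on a few sample token texts in plain Ruby.

```ruby
# Compare the line-advance expression removed in 0.0.4 with its replacement.

def old_line_advance(text)
  # Wrapping in "~" guarantees a non-empty last line, so a trailing
  # newline still produces an extra element that the "- 1" accounts for.
  ("~" + text + "~").lines.count - 1
end

def new_line_advance(text)
  # String#count counts characters in the intersection of its argument
  # sets: {"\r", "\n"} & {"\n"} leaves only "\n".
  text.count("\r\n", "\n")
end

samples = ["", "abc", "a\nb", "a\r\nb\r\n", "\n\n", "x\ry"]
samples.each do |s|
  puts "#{s.inspect}: old=#{old_line_advance(s)} new=#{new_line_advance(s)}"
end
```

Because `String#count` works on character sets rather than substrings, `count("\r\n", "\n")` behaves exactly like `count("\n")`: a `"\r\n"` pair advances the line by one and a lone `"\r"` does not advance it, the same as the old `lines`-based expression, but without building an intermediate string and line array.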
data/spec/unit/parser/premature_eof_spec.rb
CHANGED
@@ -0,0 +1,43 @@
+require "spec_helper"
+
+describe "a parser receiving only partial input" do
+  let(:parser) do
+    Class.new(Whittle::Parser) do
+      rule("a")
+      rule("b")
+      rule("c")
+
+      rule(";")
+
+      rule(:abc) do |r|
+        r["a", "b", "c"]
+      end
+
+      rule(:prog) do |r|
+        r[:abc, ";"]
+      end
+
+      start(:prog)
+    end
+  end
+
+  it "raises a parse error" do
+    expect { parser.new.parse("abc") }.to raise_error(Whittle::ParseError)
+  end
+
+  it "reports the expected token" do
+    begin
+      parser.new.parse("abc")
+    rescue Whittle::ParseError => e
+      e.expected.should == [";"]
+    end
+  end
+
+  it "indicates :$end as the received token" do
+    begin
+      parser.new.parse("abc")
+    rescue Whittle::ParseError => e
+      e.received.should == :$end
+    end
+  end
+end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: whittle
 version: !ruby/object:Gem::Version
-version: 0.0.
+version: 0.0.4
 prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-11-
+date: 2011-11-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rspec
-requirement: &
+requirement: &70302063205940 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -21,7 +21,7 @@ dependencies:
 version: '2.6'
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *70302063205940
 description: ! "Write powerful parsers by defining a series of very simple rules\n
 \ and operations to perform as those rules are matched. Whittle\n
 \ parsers are written in pure ruby and as such are extremely
@@ -59,6 +59,7 @@ files:
 - spec/unit/parser/noop_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
+- spec/unit/parser/premature_eof_spec.rb
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb
@@ -99,6 +100,7 @@ test_files:
 - spec/unit/parser/noop_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
+- spec/unit/parser/premature_eof_spec.rb
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb