whittle 0.0.1 → 0.0.2
- data/README.md +102 -75
- data/examples/calculator.rb +59 -0
- data/lib/whittle/parser.rb +25 -52
- data/lib/whittle/rule.rb +11 -1
- data/lib/whittle/version.rb +5 -1
- data/lib/whittle.rb +4 -0
- data/spec/unit/parser/error_reporting_spec.rb +2 -6
- data/spec/unit/parser/grouped_expr_spec.rb +2 -4
- data/spec/unit/parser/multiple_precedence_spec.rb +2 -4
- data/spec/unit/parser/noop_spec.rb +2 -4
- data/spec/unit/parser/pass_through_parser_spec.rb +1 -3
- data/spec/unit/parser/precedence_spec.rb +2 -4
- data/spec/unit/parser/self_referential_expr_spec.rb +2 -4
- data/spec/unit/parser/skipped_tokens_spec.rb +3 -7
- data/spec/unit/parser/sum_parser_spec.rb +1 -3
- data/spec/unit/parser/typecast_parser_spec.rb +1 -3
- metadata +5 -4
data/README.md
CHANGED
@@ -2,32 +2,57 @@
|
|
2
2
|
|
3
3
|
Whittle is a LALR(1) parser. It's very small, easy to understand, and what's most important,
|
4
4
|
it's 100% ruby. You write parsers by specifying sequences of allowable rules (which refer to
|
5
|
-
other rules, or even to themselves)
|
5
|
+
other rules, or even to themselves). For each rule in your grammar, you provide a block that
|
6
6
|
is invoked when the grammar is recognized.
|
7
7
|
|
8
|
-
If you're not familiar with parsing, you should find Whittle to be a very friendly little
|
8
|
+
If you're *not* familiar with parsing, you should find Whittle to be a very friendly little
|
9
9
|
parser.
|
10
10
|
|
11
|
-
It is related, somewhat, to yacc and bison, which belong to the class of parsers
|
12
|
-
LALR(1):
|
13
|
-
|
11
|
+
It is related, somewhat, to yacc and bison, which belong to the class of parsers known as
|
12
|
+
LALR(1): Look-Ahead LR (Left-to-right, Rightmost derivation), using 1 Lookahead token. This class of parsers is both easy to work with
|
13
|
+
and particularly powerful (ruby itself is parsed using a LALR(1) parser). Since the algorithm
|
14
|
+
is based around a theory that *never* has to backtrack (that is, each token read takes the
|
15
|
+
parse forward, with just a single lookup in a parse table), parse time is also fast. Parse
|
16
|
+
time is governed by the size of the input, not by the size of the grammar.
|
14
17
|
|
15
|
-
Whittle provides meaningful error reporting
|
16
|
-
if you need to write some sort of crazy
|
18
|
+
Whittle provides meaningful error reporting (line number, expected tokens, received token) and
|
19
|
+
even lets you hook into the error handling logic if you need to write some sort of crazy
|
20
|
+
madman-forgiving parser.
|
21
|
+
|
22
|
+
If you've had issues with other parsers hitting "stack level too deep" errors, you should find
|
23
|
+
that Whittle does not suffer from the same issues, since it uses a state-switching algorithm
|
24
|
+
(a pushdown automaton to be precise), rather than simply having one parse function call another
|
25
|
+
and so on. Whittle also supports the following concepts:
|
26
|
+
|
27
|
+
- Left/right recursion
|
28
|
+
- Left/right associativity
|
29
|
+
- Operator precedences
|
30
|
+
- Skipping of silent tokens in the input (e.g. whitespace/comments)
|
31
|
+
|
32
|
+
## Installation
|
33
|
+
|
34
|
+
Via rubygems:
|
35
|
+
|
36
|
+
gem install whittle
|
37
|
+
|
38
|
+
Or in your Gemfile, if you're using bundler:
|
39
|
+
|
40
|
+
gem 'whittle'
|
17
41
|
|
18
42
|
## The Basics
|
19
43
|
|
20
|
-
Parsers using Whittle
|
21
|
-
odd, but c'mon, we're using Ruby, right?
|
44
|
+
Parsers using Whittle do not generate ruby code from a grammar file. This may strike users of
|
45
|
+
other LALR(1) parsers as odd, but c'mon, we're using Ruby, right?
|
22
46
|
|
23
47
|
I'll avoid discussing the algorithm until we get into the really advanced stuff, but you will
|
24
48
|
need to understand a few fundamental ideas before we begin.
|
25
49
|
|
26
|
-
1. There are two types of rule that make up a complete parser: terminal
|
50
|
+
1. There are two types of rule that make up a complete parser: *terminal*, and *nonterminal*
|
27
51
|
- A terminal rule is quite simply a chunk of the input string, like '42', or 'function'
|
28
|
-
- A nonterminal rule is a rule that makes reference to other rules (terminal and
|
52
|
+
- A nonterminal rule is a rule that makes reference to other rules (both terminal and
|
53
|
+
nonterminal)
|
29
54
|
2. The input to be parsed *always* conforms to just one rule at the topmost level. This is
|
30
|
-
known as the "start rule".
|
55
|
+
known as the "start rule" and describes the structure of the program as a whole.
|
31
56
|
|
32
57
|
The easiest way to understand how the parser works is just to learn by example, so let's see an
|
33
58
|
example.
|
@@ -38,9 +63,7 @@ require 'whittle'
|
|
38
63
|
class Mathematician < Whittle::Parser
|
39
64
|
rule("+")
|
40
65
|
|
41
|
-
rule(:int)
|
42
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
43
|
-
end
|
66
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
44
67
|
|
45
68
|
rule(:expr) do |r|
|
46
69
|
r[:int, "+", :int].as { |a, _, b| a + b }
|
@@ -55,21 +78,32 @@ mathematician.parse("1+2")
|
|
55
78
|
```
|
56
79
|
|
57
80
|
Let's break this down a bit. As you can see, the whole thing is really just `rule` used in
|
58
|
-
different ways. We also have to set the rule that we can use to describe an entire
|
59
|
-
which in this case is the `:expr` rule that can add two numbers together.
|
81
|
+
different ways. We also have to set the start rule that we can use to describe an entire
|
82
|
+
program, which in this case is the `:expr` rule that can add two numbers together.
|
60
83
|
|
61
84
|
There are two terminal rules (`"+"` and `:int`) and one nonterminal (`:expr`) in the above
|
62
85
|
grammar. Each rule can have a block attached to it. The block is invoked with the result
|
63
|
-
evaluating the blocks
|
64
|
-
|
65
|
-
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
86
|
+
evaluating the blocks attached to each of its inputs (in a depth-first manner). The default
|
87
|
+
action, if no block is given, is to return whatever the leftmost input to the rule happens to
|
88
|
+
be.
|
89
|
+
|
90
|
+
We can optionally use the Hash notation to map a name with a pattern (or a fixed string) when
|
91
|
+
we declare terminal rules too, as we have done with the `:int` rule above. Note that the
|
92
|
+
longer way around defining terminal rules is to do as we have done for `:expr` and define a
|
93
|
+
block, but since this is such a common use-case, Whittle offers the shorthand.
|
94
|
+
|
95
|
+
As the input string is parsed, it *must* match the start rule `:expr`.
|
96
|
+
|
97
|
+
Let's step through the parse for the above input "1+2". When the parser starts, it looks at
|
98
|
+
the start rule `:expr` and decides what tokens would be valid if they were encountered. Since
|
99
|
+
`:expr` starts with `:int`, the only thing that would be valid is anything matching
|
100
|
+
`/[0-9]+/`. When the parser reads the "1", it recognizes it as an `:int`, puts it aside (puts
|
101
|
+
it on the stack, in technical terms). Now it advances through the rule for `:expr` and
|
102
|
+
decides the only possible valid input would be a "+", and finally the last `:int`. Upon
|
103
|
+
having read the sequence `:int`, "+", `:int`, our block attached to that rule is invoked to
|
104
|
+
return a result. First the three inputs are passed through their respective blocks (so the
|
105
|
+
"1" and the "2" are cast to integers, according to the rule for `:int`), then they are passed
|
106
|
+
to the `:expr`, which adds the 1 and the 2 to make 3. Magic!
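
The walk-through above can be sketched in plain Ruby, without Whittle. This is only an illustration under simplifying assumptions: the grammar is hard-coded as `:int`, "+", `:int`, the names `TOKENS`, `EXPR` and `parse_sum` are hypothetical, and nothing here reflects how Whittle builds or consults its parse table internally.

```ruby
require "strscan"

# Hypothetical token table for this sketch: name => pattern.
TOKENS = { :int => /[0-9]+/, "+" => /\+/ }

# The single rule of the toy grammar: expr := int "+" int
EXPR = [:int, "+", :int]

def parse_sum(input)
  scanner = StringScanner.new(input)
  stack   = []

  EXPR.each do |expected|
    # Only the token the rule expects next is valid at this position.
    text = scanner.scan(TOKENS[expected])
    raise ArgumentError, "expected #{expected.inspect} at offset #{scanner.pos}" if text.nil?

    # Put the token aside on the stack, casting integers as the :int block would.
    stack.push(expected == :int ? Integer(text, 10) : text)
  end

  raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless scanner.eos?

  # The whole sequence has been read, so run the block attached to the rule.
  a, _, b = stack
  a + b
end

p parse_sum("1+2")
# => 3
```

Because the sketch only ever looks at the next expected token and never rewinds the scanner, it shares the "no backtracking" property described earlier, although a real LALR(1) parser chooses between many rules using its parse table.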
|
73
107
|
|
74
108
|
## Nonterminal rules can have more than one valid sequence
|
75
109
|
|
@@ -88,9 +122,7 @@ class Mathematician < Whittle::Parser
|
|
88
122
|
rule("*")
|
89
123
|
rule("/")
|
90
124
|
|
91
|
-
rule(:int)
|
92
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
93
|
-
end
|
125
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
94
126
|
|
95
127
|
rule(:expr) do |r|
|
96
128
|
r[:int, "+", :int].as { |a, _, b| a + b }
|
@@ -117,7 +149,9 @@ mathematician.parse("4/2")
|
|
117
149
|
# => 2
|
118
150
|
```
|
119
151
|
|
120
|
-
Now you're probably
|
152
|
+
Now you're probably beginning to see how matching just one rule for the entire input is not a
|
153
|
+
problem. To think about a more real world example, you can describe most programming
|
154
|
+
languages as a series of statements and constructs.
|
121
155
|
|
122
156
|
## Rules can refer to themselves
|
123
157
|
|
@@ -133,16 +167,14 @@ class Mathematician < Whittle::Parser
|
|
133
167
|
rule("*")
|
134
168
|
rule("/")
|
135
169
|
|
136
|
-
rule(:int)
|
137
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
138
|
-
end
|
170
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
139
171
|
|
140
172
|
rule(:expr) do |r|
|
141
173
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
142
174
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
143
175
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
144
176
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
145
|
-
r[:int]
|
177
|
+
r[:int]
|
146
178
|
end
|
147
179
|
|
148
180
|
start(:expr)
|
@@ -156,14 +188,15 @@ mathematician.parse("1+5-2")
|
|
156
188
|
Adding a rule of just `:int` to the `:expr` rule means that any integer is also a valid `:expr`.
|
157
189
|
It is now possible to say that any `:expr` can be added to, multiplied by, divided by or
|
158
190
|
subtracted from another `:expr`. It is this ability to self-reference that makes LALR(1)
|
159
|
-
parsers so powerful and easy to use. Note that because the result each
|
160
|
-
*before* being passed as arguments to the block, each `:expr` in the calculations
|
161
|
-
always be a number, since each `:expr` returns a number.
|
191
|
+
parsers so powerful and easy to use. Note that because the result of each input to any given rule
|
192
|
+
is computed *before* being passed as arguments to the block, each `:expr` in the calculations
|
193
|
+
above will always be a number, since each `:expr` returns a number. The recursion in these rules
|
194
|
+
is practically limitless. You can write "1+2-3*4+775/3" and it's still an `:expr`.
|
162
195
|
|
163
196
|
## Specifying the associativity
|
164
197
|
|
165
|
-
|
166
|
-
what happens when we do the following:
|
198
|
+
If we poke around for more than a few seconds, we'll soon realize that our mathematician makes
|
199
|
+
some silly mistakes. Let's see what happens when we do the following:
|
167
200
|
|
168
201
|
``` ruby
|
169
202
|
mathematician.parse("6-3-1")
|
@@ -196,16 +229,14 @@ class Mathematician < Whittle::Parser
|
|
196
229
|
rule("*") % :left
|
197
230
|
rule("/") % :left
|
198
231
|
|
199
|
-
rule(:int)
|
200
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
201
|
-
end
|
232
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
202
233
|
|
203
234
|
rule(:expr) do |r|
|
204
235
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
205
236
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
206
237
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
207
238
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
208
|
-
r[:int]
|
239
|
+
r[:int]
|
209
240
|
end
|
210
241
|
|
211
242
|
start(:expr)
|
@@ -217,11 +248,12 @@ mathematician.parse("6-3-1")
|
|
217
248
|
```
|
218
249
|
|
219
250
|
Attaching a percent sign followed by either `:left` or `:right` changes the associativity of a
|
220
|
-
rule. We now get the correct result.
|
251
|
+
terminal rule. We now get the correct result.
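
To see why associativity changes the answer, the two possible groupings of "6-3-1" can be computed by hand in plain Ruby. This sketch does not use Whittle; `fold_left` and `fold_right` are hypothetical helper names that only mimic how a left- or right-associative rule would group the operands.

```ruby
# Left-associative grouping: ((6 - 3) - 1)
def fold_left(numbers)
  numbers.inject { |acc, n| acc - n }
end

# Right-associative grouping: (6 - (3 - 1))
def fold_right(numbers)
  numbers.reverse.inject { |acc, n| n - acc }
end

p fold_left([6, 3, 1])   # => 2
p fold_right([6, 3, 1])  # => 4
```

Rules default to right associativity, which corresponds to the `fold_right` grouping; marking the operators with `% :left` corresponds to `fold_left`.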
|
221
252
|
|
222
253
|
## Specifying the operator precedence
|
223
254
|
|
224
|
-
Well, despite fixing the associativity, we find we still
|
255
|
+
Basic arithmetic is easy peasy, right? Well, despite fixing the associativity, we find we still
|
256
|
+
have a problem:
|
225
257
|
|
226
258
|
``` ruby
|
227
259
|
mathematician.parse("1+2*3")
|
@@ -241,16 +273,14 @@ class Mathematician < Whittle::Parser
|
|
241
273
|
rule("*") % :left ^ 2
|
242
274
|
rule("/") % :left ^ 2
|
243
275
|
|
244
|
-
rule(:int)
|
245
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
246
|
-
end
|
276
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
247
277
|
|
248
278
|
rule(:expr) do |r|
|
249
279
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
250
280
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
251
281
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
252
282
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
253
|
-
r[:int]
|
283
|
+
r[:int]
|
254
284
|
end
|
255
285
|
|
256
286
|
start(:expr)
|
@@ -270,7 +300,7 @@ The same applies to "*" and "/", but these both usually have a higher precedence
|
|
270
300
|
## Disambiguating expressions with the use of parentheses
|
271
301
|
|
272
302
|
Sometimes we really do want "1+2*3" to mean "(1+2)*3", so we should really support this in our
|
273
|
-
mathematician. Fortunately adjusting the syntax rules in Whittle is a painless exercise.
|
303
|
+
mathematician class. Fortunately adjusting the syntax rules in Whittle is a painless exercise.
|
274
304
|
|
275
305
|
``` ruby
|
276
306
|
require 'whittle'
|
@@ -284,9 +314,7 @@ class Mathematician < Whittle::Parser
|
|
284
314
|
rule("(")
|
285
315
|
rule(")")
|
286
316
|
|
287
|
-
rule(:int)
|
288
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
289
|
-
end
|
317
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
290
318
|
|
291
319
|
rule(:expr) do |r|
|
292
320
|
r["(", :expr, ")"].as { |_, exp, _| exp }
|
@@ -294,7 +322,7 @@ class Mathematician < Whittle::Parser
|
|
294
322
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
295
323
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
296
324
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
297
|
-
r[:int]
|
325
|
+
r[:int]
|
298
326
|
end
|
299
327
|
|
300
328
|
start(:expr)
|
@@ -306,22 +334,22 @@ mathematician.parse("(1+2)*3")
|
|
306
334
|
```
|
307
335
|
|
308
336
|
All we had to do was add the new terminal rules for "(" and ")" then specify that the value of
|
309
|
-
an expression enclosed in parentheses is simply the value of the expression itself.
|
337
|
+
an expression enclosed in parentheses is simply the value of the expression itself. We could
|
338
|
+
just as easily pick some other characters to surround the grouping (maybe "~1+2~*3"), but then
|
339
|
+
people would think we were silly (arguably, we would be a bit silly if we gave the expression a
|
340
|
+
curly moustache like that!).
|
310
341
|
|
311
342
|
## Skipping whitespace
|
312
343
|
|
313
344
|
Most languages contain tokens that are ignored when interpreting the input, such as whitespace
|
314
345
|
and comments. Accounting for the possibility of these in all rules would be both wasteful and
|
315
|
-
tiresome. Instead, we skip them entirely, by declaring a terminal rule
|
316
|
-
action, or if you want to be explicit, with `as(:nothing)`.
|
346
|
+
tiresome. Instead, we skip them entirely, by declaring a terminal rule with `#skip!`.
|
317
347
|
|
318
348
|
``` ruby
|
319
349
|
require 'whittle'
|
320
350
|
|
321
351
|
class Mathematician < Whittle::Parser
|
322
|
-
rule(:wsp
|
323
|
-
r[/\s+/]
|
324
|
-
end
|
352
|
+
rule(:wsp => /\s+/).skip!
|
325
353
|
|
326
354
|
rule("+") % :left ^ 1
|
327
355
|
rule("-") % :left ^ 1
|
@@ -331,9 +359,7 @@ class Mathematician < Whittle::Parser
|
|
331
359
|
rule("(")
|
332
360
|
rule(")")
|
333
361
|
|
334
|
-
rule(:int)
|
335
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
336
|
-
end
|
362
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
337
363
|
|
338
364
|
rule(:expr) do |r|
|
339
365
|
r["(", :expr, ")"].as { |_, exp, _| exp }
|
@@ -341,7 +367,7 @@ class Mathematician < Whittle::Parser
|
|
341
367
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
342
368
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
343
369
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
344
|
-
r[:int]
|
370
|
+
r[:int]
|
345
371
|
end
|
346
372
|
|
347
373
|
start(:expr)
|
@@ -387,9 +413,7 @@ match nothing at all, which is what we hit in the middle of our nested parenthes
|
|
387
413
|
This is most useful in constructs like the following:
|
388
414
|
|
389
415
|
``` ruby
|
390
|
-
rule(:id
|
391
|
-
r[/[a-z]+/].as(:value)
|
392
|
-
end
|
416
|
+
rule(:id => /[a-z]+/)
|
393
417
|
|
394
418
|
rule(:list) do |r|
|
395
419
|
r[].as { [] }
|
@@ -412,13 +436,9 @@ information.
|
|
412
436
|
|
413
437
|
``` ruby
|
414
438
|
class ListParser < Whittle::Parser
|
415
|
-
rule(:wsp
|
416
|
-
r[/\s+/]
|
417
|
-
end
|
439
|
+
rule(:wsp => /\s+/).skip!
|
418
440
|
|
419
|
-
rule(:id
|
420
|
-
r[/[a-z]+/].as(:value)
|
421
|
-
end
|
441
|
+
rule(:id => /[a-z]+/)
|
422
442
|
|
423
443
|
rule(",")
|
424
444
|
rule("-")
|
@@ -447,10 +467,17 @@ something else, or rewinding the parse stack to a point where the error would no
|
|
447
467
|
need to write some specs on this and explore it fully myself before I document it. 99% of users
|
448
468
|
would never need to do such a thing.
|
449
469
|
|
470
|
+
## More examples
|
471
|
+
|
472
|
+
There are some runnable examples included in the examples/ directory. Playing around with these
|
473
|
+
would probably be a useful exercise.
|
474
|
+
|
475
|
+
If you have any examples you'd like to contribute, I will gladly add them to the repository.
|
476
|
+
|
450
477
|
## TODO
|
451
478
|
|
452
479
|
- Provide a more powerful (state based) lexer algorithm, or at least document how users can
|
453
|
-
override `#lex`.
|
480
|
+
override `#lex`.
|
454
481
|
- Allow inspection of the parse table (it is not very human friendly right now).
|
455
482
|
- Allow inspection of the AST (maybe).
|
456
483
|
- Given an input String, provide a human-readable explanation of the parse.
|
data/examples/calculator.rb
ADDED
@@ -0,0 +1,59 @@
|
|
1
|
+
# Whittle: A little LALR(1) parser in pure ruby, without a generator.
|
2
|
+
#
|
3
|
+
# Copyright (c) Chris Corbyn, 2011
|
4
|
+
|
5
|
+
# This example creates a simple infix calculator, supporting the four basic arithmetic
|
6
|
+
# functions, add, subtract, multiply and divide, along with logic grouping and operator
|
7
|
+
# precedence
|
8
|
+
|
9
|
+
require "whittle"
|
10
|
+
require "bigdecimal"
|
11
|
+
|
12
|
+
class Calculator < Whittle::Parser
|
13
|
+
rule(:wsp => /\s+/).skip!
|
14
|
+
|
15
|
+
rule("+") % :left ^ 1
|
16
|
+
rule("-") % :left ^ 1
|
17
|
+
rule("*") % :left ^ 2
|
18
|
+
rule("/") % :left ^ 2
|
19
|
+
|
20
|
+
rule("(")
|
21
|
+
rule(")")
|
22
|
+
|
23
|
+
rule(:decimal => /([0-9]*\.)?[0-9]+/).as { |num| BigDecimal(num) }
|
24
|
+
|
25
|
+
rule(:expr) do |r|
|
26
|
+
r["(", :expr, ")"].as { |_, e, _| e }
|
27
|
+
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
28
|
+
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
29
|
+
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
30
|
+
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
31
|
+
r["-", :expr].as { |_, e| -e }
|
32
|
+
r[:decimal]
|
33
|
+
end
|
34
|
+
|
35
|
+
start(:expr)
|
36
|
+
end
|
37
|
+
|
38
|
+
calculator = Calculator.new
|
39
|
+
|
40
|
+
p calculator.parse("5-2-1").to_f
|
41
|
+
# => 2
|
42
|
+
|
43
|
+
p calculator.parse("5-2*3").to_f
|
44
|
+
# => -1
|
45
|
+
|
46
|
+
p calculator.parse(".7").to_f
|
47
|
+
# => 0.7
|
48
|
+
|
49
|
+
p calculator.parse("3.3 - .7").to_f
|
50
|
+
# => 2.6
|
51
|
+
|
52
|
+
p calculator.parse("5-(2-1)").to_f
|
53
|
+
# => 4
|
54
|
+
|
55
|
+
p calculator.parse("5 - -2").to_f
|
56
|
+
# => 7
|
57
|
+
|
58
|
+
p calculator.parse("5 * 2 - -2").to_f
|
59
|
+
# => 12
|
data/lib/whittle/parser.rb
CHANGED
@@ -58,69 +58,42 @@ module Whittle
|
|
58
58
|
|
59
59
|
# Declares a new rule.
|
60
60
|
#
|
61
|
-
# The are
|
62
|
-
# in the +name+ parameter, along with a block, in which you will add one more possible
|
63
|
-
# rules.
|
61
|
+
# There are three ways to call this method:
|
64
62
|
#
|
65
|
-
#
|
63
|
+
# 1. rule("+")
|
64
|
+
# 2. rule(:int => /[0-9]+/)
|
65
|
+
# 3. rule(:expr) do |r|
|
66
|
+
# r[:int, "+", :int].as { |a, _, b| a + b }
|
67
|
+
# end
|
66
68
|
#
|
67
|
-
#
|
68
|
-
#
|
69
|
-
# r[:expr, "-", :expr].as { |a, _, b| a - b }
|
70
|
-
# r[:expr, "/", :expr].as { |a, _, b| a / b }
|
71
|
-
# r[:expr, "*", :expr].as { |a, _, b| a * b }
|
72
|
-
# r[:integer].as { |i| Integer(i) }
|
73
|
-
# end
|
69
|
+
# Variants (1) and (2) define basic terminal symbols (direct chunks of the input string),
|
70
|
+
# while variant (3) takes a block to define one or more nonterminal rules.
|
74
71
|
#
|
75
|
-
#
|
76
|
-
#
|
77
|
-
# expr:
|
78
|
-
#
|
79
|
-
# 42
|
80
|
-
#
|
81
|
-
# Therefore any sum of integers as also a valid expr:
|
82
|
-
#
|
83
|
-
# 42 + 24
|
84
|
-
#
|
85
|
-
# Therefore any multiplication of sums of integers is also a valid expr, and so on.
|
86
|
-
#
|
87
|
-
# 42 + 24 * 7 + 52
|
88
|
-
#
|
89
|
-
# A rule like the above is called a 'nonterminal', because upon recognizing any expr, it
|
90
|
-
# is possible for the rule to continue collecting input and becoming a larger expr.
|
91
|
-
#
|
92
|
-
# In subtle contrast, a rule like the following:
|
93
|
-
#
|
94
|
-
# rule("+") do |r|
|
95
|
-
# r["+"].as { |plus| plus }
|
96
|
-
# end
|
97
|
-
#
|
98
|
-
# Is called a 'terminal' token, since upon recognizing a "+", the parser cannot
|
99
|
-
# add further input to the "+" itself... it is the tip of a branch in the parse tree; the
|
100
|
-
# branch terminates here, and subsequently the rule is terminal.
|
101
|
-
#
|
102
|
-
# There is a shorthand way to write the above rule:
|
103
|
-
#
|
104
|
-
# rule("+")
|
105
|
-
#
|
106
|
-
# Not given a block, #rule treats the name parameter as a literal token.
|
107
|
-
#
|
108
|
-
# Note that nonterminal rules are composed of other nonterminal rules and/or terminal
|
109
|
-
# rules. Terminal rules contain one, and only one Regexp pattern or fixed string.
|
110
|
-
#
|
111
|
-
# @param [Symbol, String] name
|
112
|
-
# the name of the ruleset (note the one ruleset can contain multiple rules)
|
72
|
+
# @param [Symbol, String, Hash] name
|
73
|
+
# the name of the rule, or a Hash mapping the name to a pattern
|
113
74
|
#
|
114
75
|
# @return [RuleSet, Rule]
|
115
76
|
# the newly created RuleSet if a block was given, otherwise a rule representing a
|
116
77
|
# terminal token for the input string +name+.
|
117
78
|
def rule(name)
|
118
|
-
rules[name] = RuleSet.new(name)
|
119
|
-
|
120
79
|
if block_given?
|
80
|
+
raise ArgumentError,
|
81
|
+
"Parser#rule does not accept both a Hash and a block" if name.kind_of?(Hash)
|
82
|
+
|
83
|
+
rules[name] = RuleSet.new(name)
|
121
84
|
rules[name].tap { |r| yield r }
|
122
85
|
else
|
123
|
-
|
86
|
+
key, value = if name.kind_of?(Hash)
|
87
|
+
raise ArgumentError,
|
88
|
+
"Only one element allowed in Hash for Parser#rule" unless name.length == 1
|
89
|
+
|
90
|
+
name.first
|
91
|
+
else
|
92
|
+
[name, name]
|
93
|
+
end
|
94
|
+
|
95
|
+
rules[key] = RuleSet.new(key)
|
96
|
+
rules[key][value].as(:value)
|
124
97
|
end
|
125
98
|
end
|
126
99
|
|
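
The Hash handling in the non-block branch of `rule` can be exercised in isolation. The sketch below mirrors only the key/value extraction shown in the diff above; `terminal_key_and_pattern` is a hypothetical helper name, not part of Whittle, and no RuleSet is created.

```ruby
# Mirrors the non-block branch of Parser.rule: returns [key, pattern].
def terminal_key_and_pattern(name)
  if name.kind_of?(Hash)
    raise ArgumentError,
      "Only one element allowed in Hash for Parser#rule" unless name.length == 1

    name.first # a one-element Hash yields its single [key, value] pair
  else
    [name, name] # a bare String or Symbol is both the name and the pattern
  end
end

p terminal_key_and_pattern("+")
# => ["+", "+"]
p terminal_key_and_pattern({ :int => /[0-9]+/ })
```

The first call shows why `rule("+")` needs no pattern: the name doubles as the literal to match. The second call returns the `:int` name paired with its Regexp.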
data/lib/whittle/rule.rb
CHANGED
@@ -26,7 +26,7 @@ module Whittle
|
|
26
26
|
# a variable list of components that make up the Rule
|
27
27
|
def initialize(name, *components)
|
28
28
|
@components = components
|
29
|
-
@action =
|
29
|
+
@action = DUMP_ACTION
|
30
30
|
@name = name
|
31
31
|
@terminal = components.length == 1 && !components.first.kind_of?(Symbol)
|
32
32
|
@assoc = :right
|
@@ -142,6 +142,8 @@ module Whittle
|
|
142
142
|
# Given a block, the Rule will be reduced by passing the result of reducing
|
143
143
|
# all inputs as arguments to the block.
|
144
144
|
#
|
145
|
+
# The default action is to return the leftmost input unchanged.
|
146
|
+
#
|
145
147
|
# Given the Symbol :value, the matched input will be returned verbatim.
|
146
148
|
# Given the Symbol :nothing, nil will be returned; you can use this to
|
147
149
|
# skip whitespace and comments, for example.
|
@@ -165,6 +167,14 @@ module Whittle
|
|
165
167
|
end
|
166
168
|
end
|
167
169
|
|
170
|
+
# Alias for as(:nothing).
|
171
|
+
#
|
172
|
+
# @return [Rule]
|
173
|
+
# returns self
|
174
|
+
def skip!
|
175
|
+
as(:nothing)
|
176
|
+
end
|
177
|
+
|
168
178
|
# Set the associativity of this Rule.
|
169
179
|
#
|
170
180
|
# Accepts values of :left, :right (default) or :nonassoc.
|
data/lib/whittle/version.rb
CHANGED
data/lib/whittle.rb
CHANGED
@@ -3,13 +3,9 @@ require "spec_helper"
|
|
3
3
|
describe "a parser encountering unexpected input" do
|
4
4
|
let(:parser) do
|
5
5
|
Class.new(Whittle::Parser) do
|
6
|
-
rule(:wsp
|
7
|
-
r[/\s+/]
|
8
|
-
end
|
6
|
+
rule(:wsp => /\s+/).skip!
|
9
7
|
|
10
|
-
rule(:id
|
11
|
-
r[/[a-z]+/].as(:value)
|
12
|
-
end
|
8
|
+
rule(:id => /[a-z]+/)
|
13
9
|
|
14
10
|
rule(",")
|
15
11
|
rule("-")
|
@@ -6,12 +6,10 @@ describe "a parser with logical grouping" do
|
|
6
6
|
rule(:expr) do |r|
|
7
7
|
r["(", :expr, ")"].as { |_, expr, _| expr }
|
8
8
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
9
|
-
r[:int]
|
9
|
+
r[:int]
|
10
10
|
end
|
11
11
|
|
12
|
-
rule(:int)
|
13
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
14
|
-
end
|
12
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
15
13
|
|
16
14
|
rule("(")
|
17
15
|
rule(")")
|
@@ -9,12 +9,10 @@ describe "a parser with multiple precedence levels" do
|
|
9
9
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
10
10
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
11
11
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
12
|
-
r[:int]
|
12
|
+
r[:int]
|
13
13
|
end
|
14
14
|
|
15
|
-
rule(:int)
|
16
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
17
|
-
end
|
15
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
18
16
|
|
19
17
|
rule("(")
|
20
18
|
rule(")")
|
@@ -6,14 +6,12 @@ describe "a parser depending on operator precedences" do
|
|
6
6
|
rule("+") % :left ^ 1
|
7
7
|
rule("*") % :left ^ 2
|
8
8
|
|
9
|
-
rule(:int)
|
10
|
-
r[/[0-9]+/].as { |i| Integer(i) }
|
11
|
-
end
|
9
|
+
rule(:int => /[0-9]+/).as { |i| Integer(i) }
|
12
10
|
|
13
11
|
rule(:expr) do |r|
|
14
12
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
15
13
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
16
|
-
r[:int]
|
14
|
+
r[:int]
|
17
15
|
end
|
18
16
|
|
19
17
|
start(:expr)
|
@@ -7,13 +7,11 @@ describe "a parser with a self-referential rule" do
|
|
7
7
|
rule(")")
|
8
8
|
rule("+")
|
9
9
|
|
10
|
-
rule(:int)
|
11
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
12
|
-
end
|
10
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
13
11
|
|
14
12
|
rule(:expr) do |r|
|
15
13
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
16
|
-
r[:int]
|
14
|
+
r[:int]
|
17
15
|
end
|
18
16
|
|
19
17
|
start(:expr)
|
@@ -3,19 +3,15 @@ require "spec_helper"
|
|
3
3
|
describe "a parser that skips tokens" do
|
4
4
|
let(:parser) do
|
5
5
|
Class.new(Whittle::Parser) do
|
6
|
-
rule(:wsp
|
7
|
-
r[/\s+/]
|
8
|
-
end
|
6
|
+
rule(:wsp => /\s+/).skip!
|
9
7
|
|
10
8
|
rule("-") % :left
|
11
9
|
|
12
|
-
rule(:int)
|
13
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
14
|
-
end
|
10
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
15
11
|
|
16
12
|
rule(:expr) do |r|
|
17
13
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
18
|
-
r[:int]
|
14
|
+
r[:int]
|
19
15
|
end
|
20
16
|
|
21
17
|
start(:expr)
|
@@ -5,9 +5,7 @@ describe "a parser returning the sum of two integers" do
|
|
5
5
|
Class.new(Whittle::Parser) do
|
6
6
|
rule("+")
|
7
7
|
|
8
|
-
rule(:int)
|
9
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
10
|
-
end
|
8
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
11
9
|
|
12
10
|
rule(:sum) do |r|
|
13
11
|
r[:int, "+", :int].as { |a, _, b| a + b }
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: whittle
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.2
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -9,11 +9,11 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2011-11-
|
12
|
+
date: 2011-11-28 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: rspec
|
16
|
-
requirement: &
|
16
|
+
requirement: &70351976364700 !ruby/object:Gem::Requirement
|
17
17
|
none: false
|
18
18
|
requirements:
|
19
19
|
- - ~>
|
@@ -21,7 +21,7 @@ dependencies:
|
|
21
21
|
version: '2.6'
|
22
22
|
type: :development
|
23
23
|
prerelease: false
|
24
|
-
version_requirements: *
|
24
|
+
version_requirements: *70351976364700
|
25
25
|
description: ! "Write powerful parsers by defining a series of very simple rules\n
|
26
26
|
\ and operations to perform as those rules are matched. Whittle\n
|
27
27
|
\ parsers are written in pure ruby and as such are extremely
|
@@ -40,6 +40,7 @@ files:
|
|
40
40
|
- LICENSE
|
41
41
|
- README.md
|
42
42
|
- Rakefile
|
43
|
+
- examples/calculator.rb
|
43
44
|
- lib/whittle.rb
|
44
45
|
- lib/whittle/error.rb
|
45
46
|
- lib/whittle/errors/grammar_error.rb
|