whittle 0.0.1 → 0.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/README.md +102 -75
- data/examples/calculator.rb +59 -0
- data/lib/whittle/parser.rb +25 -52
- data/lib/whittle/rule.rb +11 -1
- data/lib/whittle/version.rb +5 -1
- data/lib/whittle.rb +4 -0
- data/spec/unit/parser/error_reporting_spec.rb +2 -6
- data/spec/unit/parser/grouped_expr_spec.rb +2 -4
- data/spec/unit/parser/multiple_precedence_spec.rb +2 -4
- data/spec/unit/parser/noop_spec.rb +2 -4
- data/spec/unit/parser/pass_through_parser_spec.rb +1 -3
- data/spec/unit/parser/precedence_spec.rb +2 -4
- data/spec/unit/parser/self_referential_expr_spec.rb +2 -4
- data/spec/unit/parser/skipped_tokens_spec.rb +3 -7
- data/spec/unit/parser/sum_parser_spec.rb +1 -3
- data/spec/unit/parser/typecast_parser_spec.rb +1 -3
- metadata +5 -4
data/README.md
CHANGED
@@ -2,32 +2,57 @@
|
|
2
2
|
|
3
3
|
Whittle is a LALR(1) parser. It's very small, easy to understand, and what's most important,
|
4
4
|
it's 100% ruby. You write parsers by specifying sequences of allowable rules (which refer to
|
5
|
-
other rules, or even to themselves)
|
5
|
+
other rules, or even to themselves). For each rule in your grammar, you provide a block that
|
6
6
|
is invoked when the grammar is recognized.
|
7
7
|
|
8
|
-
If you're not familiar with parsing, you should find Whittle to be a very friendly little
|
8
|
+
If you're *not* familiar with parsing, you should find Whittle to be a very friendly little
|
9
9
|
parser.
|
10
10
|
|
11
|
-
It is related, somewhat, to yacc and bison, which belong to the class of parsers
|
12
|
-
LALR(1):
|
13
|
-
|
11
|
+
It is related, somewhat, to yacc and bison, which belong to the class of parsers known as
|
12
|
+
LALR(1): Look-Ahead LR, meaning a Left-to-right scan producing a Rightmost derivation (in reverse), using 1 token of lookahead. This class of parsers is both easy to work with
|
13
|
+
and particularly powerful (ruby itself is parsed using a LALR(1) parser). Since the algorithm
|
14
|
+
is based around a theory that *never* has to backtrack (that is, each token read takes the
|
15
|
+
parse forward, with just a single lookup in a parse table), parse time is also fast. Parse
|
16
|
+
time is governed by the size of the input, not by the size of the grammar.
|
14
17
|
|
15
|
-
Whittle provides meaningful error reporting
|
16
|
-
if you need to write some sort of crazy
|
18
|
+
Whittle provides meaningful error reporting (line number, expected tokens, received token) and
|
19
|
+
even lets you hook into the error handling logic if you need to write some sort of crazy
|
20
|
+
madman-forgiving parser.
|
21
|
+
|
22
|
+
If you've had issues with other parsers hitting "stack level too deep" errors, you should find
|
23
|
+
that Whittle does not suffer from the same issues, since it uses a state-switching algorithm
|
24
|
+
(a pushdown automaton to be precise), rather than simply having one parse function call another
|
25
|
+
and so on. Whittle also supports the following concepts:
|
26
|
+
|
27
|
+
- Left/right recursion
|
28
|
+
- Left/right associativity
|
29
|
+
- Operator precedences
|
30
|
+
- Skipping of silent tokens in the input (e.g. whitespace/comments)
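Skipping a silent token means the lexer still matches it but never hands it to the parser. The sketch below shows that idea with Ruby's standard `StringScanner`; the `TOKEN_RULES` table and the `tokenize` method are hypothetical illustrations, not Whittle's own lexer.

```ruby
require "strscan"

# Hypothetical token table for illustration only: [name, pattern, skip?].
TOKEN_RULES = [
  [:wsp, /\s+/,    true],  # silent: matched, then discarded
  [:int, /[0-9]+/, false],
  ["+",  /\+/,     false]
].freeze

# Splits +input+ into [name, text] pairs, dropping tokens marked as skipped.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    # StringScanner#scan only matches at the current position and advances
    # past the match; the first rule that matches wins.
    rule = TOKEN_RULES.find { |_, pattern, _| scanner.scan(pattern) }
    raise ArgumentError, "unexpected input at offset #{scanner.pos}" unless rule

    name, _, skip = rule
    tokens << [name, scanner.matched] unless skip
  end
  tokens
end

p tokenize("1 +  2") # => [[:int, "1"], ["+", "+"], [:int, "2"]]
```

The parser only ever sees the three significant tokens, so no grammar rule has to mention whitespace.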
|
31
|
+
|
32
|
+
## Installation
|
33
|
+
|
34
|
+
Via rubygems:
|
35
|
+
|
36
|
+
gem install whittle
|
37
|
+
|
38
|
+
Or in your Gemfile, if you're using bundler:
|
39
|
+
|
40
|
+
gem 'whittle'
|
17
41
|
|
18
42
|
## The Basics
|
19
43
|
|
20
|
-
Parsers using Whittle
|
21
|
-
odd, but c'mon, we're using Ruby, right?
|
44
|
+
Parsers using Whittle do not generate ruby code from a grammar file. This may strike users of
|
45
|
+
other LALR(1) parsers as odd, but c'mon, we're using Ruby, right?
|
22
46
|
|
23
47
|
I'll avoid discussing the algorithm until we get into the really advanced stuff, but you will
|
24
48
|
need to understand a few fundamental ideas before we begin.
|
25
49
|
|
26
|
-
1. There are two types of rule that make up a complete parser: terminal
|
50
|
+
1. There are two types of rule that make up a complete parser: *terminal* and *nonterminal*
|
27
51
|
- A terminal rule is quite simply a chunk of the input string, like '42', or 'function'
|
28
|
-
- A nonterminal rule is a rule that makes reference to other rules (terminal and
|
52
|
+
- A nonterminal rule is a rule that makes reference to other rules (both terminal and
|
53
|
+
nonterminal)
|
29
54
|
2. The input to be parsed *always* conforms to just one rule at the topmost level. This is
|
30
|
-
known as the "start rule".
|
55
|
+
known as the "start rule" and describes the structure of the program as a whole.
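The terminal/nonterminal distinction can be checked mechanically. The sketch below copies the test that `Rule#initialize` uses in this release (`components.length == 1 && !components.first.kind_of?(Symbol)`) and applies it to a toy grammar stored as plain Ruby data; the `GRAMMAR` hash and the `terminal?` helper are hypothetical, not part of Whittle's API.

```ruby
# Same test as Rule#initialize in this release: exactly one component that
# is not a Symbol (so a String or Regexp) makes a terminal rule.
def terminal?(components)
  components.length == 1 && !components.first.kind_of?(Symbol)
end

# Hypothetical toy grammar: rule name => list of component sequences.
GRAMMAR = {
  "+"      => [["+"]],
  :int     => [[/[0-9]+/]],
  :expr    => [[:int, "+", :int]],
  :wrapped => [[:expr]] # one component, but it refers to another rule
}.freeze

GRAMMAR.each do |name, sequences|
  kinds = sequences.map { |seq| terminal?(seq) ? "terminal" : "nonterminal" }
  puts "#{name.inspect}: #{kinds.join(', ')}"
end
# prints:
# "+": terminal
# :int: terminal
# :expr: nonterminal
# :wrapped: nonterminal
```

Note that a rule made of a single reference to another rule (like `r[:int]` inside `:expr`) is still nonterminal, because its only component is a Symbol.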
|
31
56
|
|
32
57
|
The easiest way to understand how the parser works is just to learn by example, so let's see an
|
33
58
|
example.
|
@@ -38,9 +63,7 @@ require 'whittle'
|
|
38
63
|
class Mathematician < Whittle::Parser
|
39
64
|
rule("+")
|
40
65
|
|
41
|
-
rule(:int)
|
42
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
43
|
-
end
|
66
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
44
67
|
|
45
68
|
rule(:expr) do |r|
|
46
69
|
r[:int, "+", :int].as { |a, _, b| a + b }
|
@@ -55,21 +78,32 @@ mathematician.parse("1+2")
|
|
55
78
|
```
|
56
79
|
|
57
80
|
Let's break this down a bit. As you can see, the whole thing is really just `rule` used in
|
58
|
-
different ways. We also have to set the rule that we can use to describe an entire
|
59
|
-
which in this case is the `:expr` rule that can add two numbers together.
|
81
|
+
different ways. We also have to set the start rule that we can use to describe an entire
|
82
|
+
program, which in this case is the `:expr` rule that can add two numbers together.
|
60
83
|
|
61
84
|
There are two terminal rules (`"+"` and `:int`) and one nonterminal (`:expr`) in the above
|
62
85
|
grammar. Each rule can have a block attached to it. The block is invoked with the result
|
63
|
-
evaluating the blocks
|
64
|
-
|
65
|
-
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
86
|
+
of evaluating the blocks attached to each of its inputs (in a depth-first manner). The default
|
87
|
+
action, if no block is given, is to return whatever the leftmost input to the rule happens to
|
88
|
+
be.
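The depth-first evaluation order and the default action can be modelled with a tiny tree walker. This is a sketch under assumptions: `Node`, `evaluate` and `DEFAULT_ACTION` are hypothetical names, and a real LALR(1) parser performs these reductions as it reads input rather than building a tree first.

```ruby
# Hypothetical parse-tree node: a rule name, its inputs (child nodes or raw
# token strings) and an optional action block.
Node = Struct.new(:name, :inputs, :action)

# Used when a rule has no block: return the leftmost input unchanged.
DEFAULT_ACTION = ->(*inputs) { inputs.first }

# Depth-first: every input is reduced to a value before the rule's own
# action is called with those values.
def evaluate(node)
  return node unless node.is_a?(Node) # a raw token string is its own value

  values = node.inputs.map { |input| evaluate(input) }
  (node.action || DEFAULT_ACTION).call(*values)
end

int = ->(text) { Node.new(:int, [text], ->(num) { Integer(num, 10) }) }
sum = Node.new(:expr, [int.call("1"), "+", int.call("2")], ->(a, _, b) { a + b })
passthrough = Node.new(:expr, [sum], nil) # no block given

p evaluate(sum)         # => 3
p evaluate(passthrough) # => 3
```

The `passthrough` node behaves like `r[:int]` in the grammar above: with no block attached, its value is simply the value of its leftmost (here, only) input.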
|
89
|
+
|
90
|
+
We can optionally use the Hash notation to map a name to a pattern (or a fixed string) when
|
91
|
+
we declare terminal rules too, as we have done with the `:int` rule above. Note that the
|
92
|
+
longer way of defining terminal rules is to pass a block, as we have done for `:expr`, but
|
93
|
+
since this is such a common use case, Whittle offers the shorthand.
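The two call styles for terminal rules differ only in how the name and the pattern are obtained. The sketch below reproduces, as a standalone method, the branching that this release adds to `Parser#rule`; `split_rule_name` is a hypothetical name used for illustration.

```ruby
# Mirrors the non-block branch of Parser#rule in this release: a Hash maps a
# rule name to its pattern, while a bare String serves as both.
def split_rule_name(name)
  if name.kind_of?(Hash)
    raise ArgumentError, "Only one element allowed in Hash" unless name.length == 1

    name.first # [key, pattern]
  else
    [name, name]
  end
end

p split_rule_name("+")              # => ["+", "+"]
p split_rule_name(:int => /[0-9]+/) # => [:int, /[0-9]+/]
```

In the diff, the resulting pair is then registered as `rules[key][value].as(:value)`, so `rule(:int => /[0-9]+/)` yields a terminal named `:int` whose default value is the matched text.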
|
94
|
+
|
95
|
+
As the input string is parsed, it *must* match the start rule `:expr`.
|
96
|
+
|
97
|
+
Let's step through the parse for the above input "1+2". When the parser starts, it looks at
|
98
|
+
the start rule `:expr` and decides what tokens would be valid if they were encountered. Since
|
99
|
+
`:expr` starts with `:int`, the only thing that would be valid is anything matching
|
100
|
+
`/[0-9]+/`. When the parser reads the "1", it recognizes it as an `:int` and puts it aside (puts
|
101
|
+
it on the stack, in technical terms). Now it advances through the rule for `:expr` and
|
102
|
+
decides that the only valid input would be a "+", followed by the last `:int`. Upon
|
103
|
+
having read the sequence `:int`, "+", `:int`, our block attached to that rule is invoked to
|
104
|
+
return a result. First the three inputs are passed through their respective blocks (so the
|
105
|
+
"1" and the "2" are cast to integers, according to the rule for `:int`), then they are passed
|
106
|
+
to the `:expr`, which adds the 1 and the 2 to make 3. Magic!
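The walkthrough above can be traced by hand with an explicit stack. This sketch hard-codes the single sequence `:int, "+", :int` instead of consulting a generated LALR(1) parse table, so it only shows the shift and reduce steps; `parse_sum` and `EXPECTED` are hypothetical names, not Whittle internals.

```ruby
# Hypothetical stand-in for a parse table: the token kind expected next,
# indexed by how many symbols are already on the stack.
EXPECTED = [:int, "+", :int].freeze

def parse_sum(tokens)
  stack = []
  tokens.each do |kind, text|
    wanted = EXPECTED.fetch(stack.length, :end_of_input)
    unless kind == wanted
      raise ArgumentError, "expected #{wanted.inspect}, received #{kind.inspect}"
    end

    # Shift: push the token's value. The :int block is applied here for
    # brevity; the README describes it running just before the reduce.
    stack.push(kind == :int ? Integer(text, 10) : text)
  end
  raise ArgumentError, "unexpected end of input" unless stack.length == EXPECTED.length

  # Reduce: pop the whole sequence and hand it to the :expr action.
  a, _, b = stack.pop(EXPECTED.length)
  a + b
end

p parse_sum([[:int, "1"], ["+", "+"], [:int, "2"]]) # => 3
```

The input is consumed strictly left to right with one lookup per token and no backtracking, and the error message carries the same information (expected token, received token) that the README says Whittle reports.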
|
73
107
|
|
74
108
|
## Nonterminal rules can have more than one valid sequence
|
75
109
|
|
@@ -88,9 +122,7 @@ class Mathematician < Whittle::Parser
|
|
88
122
|
rule("*")
|
89
123
|
rule("/")
|
90
124
|
|
91
|
-
rule(:int)
|
92
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
93
|
-
end
|
125
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
94
126
|
|
95
127
|
rule(:expr) do |r|
|
96
128
|
r[:int, "+", :int].as { |a, _, b| a + b }
|
@@ -117,7 +149,9 @@ mathematician.parse("4/2")
|
|
117
149
|
# => 2
|
118
150
|
```
|
119
151
|
|
120
|
-
Now you're probably
|
152
|
+
Now you're probably beginning to see how matching just one rule for the entire input is not a
|
153
|
+
problem. To think about a more real world example, you can describe most programming
|
154
|
+
languages as a series of statements and constructs.
|
121
155
|
|
122
156
|
## Rules can refer to themselves
|
123
157
|
|
@@ -133,16 +167,14 @@ class Mathematician < Whittle::Parser
|
|
133
167
|
rule("*")
|
134
168
|
rule("/")
|
135
169
|
|
136
|
-
rule(:int)
|
137
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
138
|
-
end
|
170
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
139
171
|
|
140
172
|
rule(:expr) do |r|
|
141
173
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
142
174
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
143
175
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
144
176
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
145
|
-
r[:int]
|
177
|
+
r[:int]
|
146
178
|
end
|
147
179
|
|
148
180
|
start(:expr)
|
@@ -156,14 +188,15 @@ mathematician.parse("1+5-2")
|
|
156
188
|
Adding a rule of just `:int` to the `:expr` rule means that any integer is also a valid `:expr`.
|
157
189
|
It is now possible to say that any `:expr` can be added to, multiplied by, divided by or
|
158
190
|
subtracted from another `:expr`. It is this ability to self-reference that makes LALR(1)
|
159
|
-
parsers so powerful and easy to use. Note that because the result each
|
160
|
-
*before* being passed as arguments to the block, each `:expr` in the calculations
|
161
|
-
always be a number, since each `:expr` returns a number.
|
191
|
+
parsers so powerful and easy to use. Note that because the result of each input to any given rule
|
192
|
+
is computed *before* being passed as arguments to the block, each `:expr` in the calculations
|
193
|
+
above will always be a number, since each `:expr` returns a number. The recursion in these rules
|
194
|
+
is practically limitless. You can write "1+2-3*4+775/3" and it's still an `:expr`.
|
162
195
|
|
163
196
|
## Specifying the associativity
|
164
197
|
|
165
|
-
|
166
|
-
what happens when we do the following:
|
198
|
+
If we poke around for more than a few seconds, we'll soon realize that our mathematician makes
|
199
|
+
some silly mistakes. Let's see what happens when we do the following:
|
167
200
|
|
168
201
|
``` ruby
|
169
202
|
mathematician.parse("6-3-1")
|
@@ -196,16 +229,14 @@ class Mathematician < Whittle::Parser
|
|
196
229
|
rule("*") % :left
|
197
230
|
rule("/") % :left
|
198
231
|
|
199
|
-
rule(:int)
|
200
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
201
|
-
end
|
232
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
202
233
|
|
203
234
|
rule(:expr) do |r|
|
204
235
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
205
236
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
206
237
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
207
238
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
208
|
-
r[:int]
|
239
|
+
r[:int]
|
209
240
|
end
|
210
241
|
|
211
242
|
start(:expr)
|
@@ -217,11 +248,12 @@ mathematician.parse("6-3-1")
|
|
217
248
|
```
|
218
249
|
|
219
250
|
Attaching a percent sign followed by either `:left` or `:right` changes the associativity of a
|
220
|
-
rule. We now get the correct result.
|
251
|
+
terminal rule. We now get the correct result.
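Associativity only decides how a chain of operators with the same precedence is grouped. The two folds below compute "6-3-1" both ways. `Rule#initialize` in this release sets `@assoc = :right` by default, which would account for the unannotated grammar grouping the chain as 6-(3-1).

```ruby
# "6-3-1" as a list of operands joined by the same operator.
operands = [6, 3, 1]

# Left associative: ((6 - 3) - 1), i.e. a left fold.
left = operands.reduce { |acc, n| acc - n }

# Right associative: (6 - (3 - 1)), i.e. a right fold.
right = operands.reverse.reduce { |acc, n| n - acc }

p left  # => 2
p right # => 4
```

For "+" and "*" both groupings give the same number, which is why the mistake only shows up with operators like "-" and "/".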
|
221
252
|
|
222
253
|
## Specifying the operator precedence
|
223
254
|
|
224
|
-
Well, despite fixing the associativity, we find we still
|
255
|
+
Basic arithmetic is easy peasy, right? Well, despite fixing the associativity, we find we still
|
256
|
+
have a problem:
|
225
257
|
|
226
258
|
``` ruby
|
227
259
|
mathematician.parse("1+2*3")
|
@@ -241,16 +273,14 @@ class Mathematician < Whittle::Parser
|
|
241
273
|
rule("*") % :left ^ 2
|
242
274
|
rule("/") % :left ^ 2
|
243
275
|
|
244
|
-
rule(:int)
|
245
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
246
|
-
end
|
276
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
247
277
|
|
248
278
|
rule(:expr) do |r|
|
249
279
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
250
280
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
251
281
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
252
282
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
253
|
-
r[:int]
|
283
|
+
r[:int]
|
254
284
|
end
|
255
285
|
|
256
286
|
start(:expr)
|
@@ -270,7 +300,7 @@ The same applies to "*" and "/", but these both usually have a higher precedence
|
|
270
300
|
## Disambiguating expressions with the use of parentheses
|
271
301
|
|
272
302
|
Sometimes we really do want "1+2*3" to mean "(1+2)*3", so we should really support this in our
|
273
|
-
mathematician. Fortunately adjusting the syntax rules in Whittle is a painless exercise.
|
303
|
+
mathematician class. Fortunately, adjusting the syntax rules in Whittle is a painless exercise.
|
274
304
|
|
275
305
|
``` ruby
|
276
306
|
require 'whittle'
|
@@ -284,9 +314,7 @@ class Mathematician < Whittle::Parser
|
|
284
314
|
rule("(")
|
285
315
|
rule(")")
|
286
316
|
|
287
|
-
rule(:int)
|
288
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
289
|
-
end
|
317
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
290
318
|
|
291
319
|
rule(:expr) do |r|
|
292
320
|
r["(", :expr, ")"].as { |_, exp, _| exp }
|
@@ -294,7 +322,7 @@ class Mathematician < Whittle::Parser
|
|
294
322
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
295
323
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
296
324
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
297
|
-
r[:int]
|
325
|
+
r[:int]
|
298
326
|
end
|
299
327
|
|
300
328
|
start(:expr)
|
@@ -306,22 +334,22 @@ mathematician.parse("(1+2)*3")
|
|
306
334
|
```
|
307
335
|
|
308
336
|
All we had to do was add the new terminal rules for "(" and ")" then specify that the value of
|
309
|
-
an expression enclosed in parentheses is simply the value of the expression itself.
|
337
|
+
an expression enclosed in parentheses is simply the value of the expression itself. We could
|
338
|
+
just as easily pick some other characters to surround the grouping (maybe "~1+2~*3"), but then
|
339
|
+
people would think we were silly (arguably, we would be a bit silly if we gave the expression a
|
340
|
+
curly moustache like that!).
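Whittle applies precedence and associativity while building its LALR(1) parse table. To see the combined effect of `% :left ^ 1`, `% :left ^ 2` and the grouping rule without a parser generator, the sketch below swaps in a different technique, precedence climbing, over the same four operators plus parentheses. Every name in it is hypothetical, and like the `a / b` blocks above it uses Ruby's integer division.

```ruby
require "strscan"

# Precedence level and associativity per operator, mirroring
# rule("+") % :left ^ 1 ... rule("/") % :left ^ 2 above.
OPERATORS = {
  "+" => [1, :left], "-" => [1, :left],
  "*" => [2, :left], "/" => [2, :left]
}.freeze

# Integers become Integer values; operators and parentheses stay Strings;
# whitespace is skipped.
def tokenize_arith(src)
  scanner = StringScanner.new(src)
  tokens = []
  until scanner.eos?
    if scanner.scan(/\s+/)
      next
    elsif scanner.scan(/[0-9]+/)
      tokens << Integer(scanner.matched, 10)
    elsif scanner.scan(%r{[-+*/()]})
      tokens << scanner.matched
    else
      raise ArgumentError, "unexpected input at offset #{scanner.pos}"
    end
  end
  tokens
end

# Precedence climbing: read one operand, then keep folding in operators
# whose precedence is at least min_prec.
def parse_expr(tokens, min_prec = 1)
  lhs = parse_operand(tokens)
  while (op = tokens.first) && OPERATORS.key?(op) && OPERATORS[op][0] >= min_prec
    tokens.shift
    prec, assoc = OPERATORS[op]
    # For :left, the right-hand side may only contain tighter operators,
    # so "6-3-1" groups as (6-3)-1.
    rhs = parse_expr(tokens, assoc == :left ? prec + 1 : prec)
    lhs = lhs.public_send(op, rhs)
  end
  lhs
end

# An operand is an integer or a parenthesised expression.
def parse_operand(tokens)
  token = tokens.shift
  return token if token.is_a?(Integer)
  raise ArgumentError, "unexpected #{token.inspect}" unless token == "("

  value = parse_expr(tokens)
  raise ArgumentError, 'expected ")"' unless tokens.shift == ")"
  value
end

def calc(src)
  tokens = tokenize_arith(src)
  value = parse_expr(tokens)
  raise ArgumentError, "trailing input: #{tokens.inspect}" unless tokens.empty?
  value
end

p calc("1+2*3")   # => 7
p calc("(1+2)*3") # => 9
p calc("6-3-1")   # => 2
```

With this sketch the README's "1+2-3*4+775/3" evaluates to 249. It passes base 10 to `Integer` because, without a base, `Integer("010")` is read as octal and returns 8.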
|
310
341
|
|
311
342
|
## Skipping whitespace
|
312
343
|
|
313
344
|
Most languages contain tokens that are ignored when interpreting the input, such as whitespace
|
314
345
|
and comments. Accounting for the possibility of these in all rules would be both wasteful and
|
315
|
-
tiresome. Instead, we skip them entirely, by declaring a terminal rule
|
316
|
-
action, or if you want to be explicit, with `as(:nothing)`.
|
346
|
+
tiresome. Instead, we skip them entirely, by declaring a terminal rule with `#skip!`.
|
317
347
|
|
318
348
|
``` ruby
|
319
349
|
require 'whittle'
|
320
350
|
|
321
351
|
class Mathematician < Whittle::Parser
|
322
|
-
rule(:wsp
|
323
|
-
r[/\s+/]
|
324
|
-
end
|
352
|
+
rule(:wsp => /\s+/).skip!
|
325
353
|
|
326
354
|
rule("+") % :left ^ 1
|
327
355
|
rule("-") % :left ^ 1
|
@@ -331,9 +359,7 @@ class Mathematician < Whittle::Parser
|
|
331
359
|
rule("(")
|
332
360
|
rule(")")
|
333
361
|
|
334
|
-
rule(:int)
|
335
|
-
r[/[0-9]+/].as { |num| Integer(num) }
|
336
|
-
end
|
362
|
+
rule(:int => /[0-9]+/).as { |num| Integer(num) }
|
337
363
|
|
338
364
|
rule(:expr) do |r|
|
339
365
|
r["(", :expr, ")"].as { |_, exp, _| exp }
|
@@ -341,7 +367,7 @@ class Mathematician < Whittle::Parser
|
|
341
367
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
342
368
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
343
369
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
344
|
-
r[:int]
|
370
|
+
r[:int]
|
345
371
|
end
|
346
372
|
|
347
373
|
start(:expr)
|
@@ -387,9 +413,7 @@ match nothing at all, which is what we hit in the middle of our nested parenthes
|
|
387
413
|
This is most useful in constructs like the following:
|
388
414
|
|
389
415
|
``` ruby
|
390
|
-
rule(:id
|
391
|
-
r[/[a-z]+/].as(:value)
|
392
|
-
end
|
416
|
+
rule(:id => /[a-z]+/)
|
393
417
|
|
394
418
|
rule(:list) do |r|
|
395
419
|
r[].as { [] }
|
@@ -412,13 +436,9 @@ information.
|
|
412
436
|
|
413
437
|
``` ruby
|
414
438
|
class ListParser < Whittle::Parser
|
415
|
-
rule(:wsp
|
416
|
-
r[/\s+/]
|
417
|
-
end
|
439
|
+
rule(:wsp => /\s+/).skip!
|
418
440
|
|
419
|
-
rule(:id
|
420
|
-
r[/[a-z]+/].as(:value)
|
421
|
-
end
|
441
|
+
rule(:id => /[a-z]+/)
|
422
442
|
|
423
443
|
rule(",")
|
424
444
|
rule("-")
|
@@ -447,10 +467,17 @@ something else, or rewinding the parse stack to a point where the error would no
|
|
447
467
|
need to write some specs on this and explore it fully myself before I document it. 99% of users
|
448
468
|
would never need to do such a thing.
|
449
469
|
|
470
|
+
## More examples
|
471
|
+
|
472
|
+
There are some runnable examples included in the examples/ directory. Playing around with these
|
473
|
+
would probably be a useful exercise.
|
474
|
+
|
475
|
+
If you have any examples you'd like to contribute, I will gladly add them to the repository.
|
476
|
+
|
450
477
|
## TODO
|
451
478
|
|
452
479
|
- Provide a more powerful (state based) lexer algorithm, or at least document how users can
|
453
|
-
override `#lex`.
|
480
|
+
override `#lex`.
|
454
481
|
- Allow inspection of the parse table (it is not very human friendly right now).
|
455
482
|
- Allow inspection of the AST (maybe).
|
456
483
|
- Given an input String, provide a human-readable explanation of the parse.
|
data/examples/calculator.rb
@@ -0,0 +1,59 @@
|
|
1
|
+
# Whittle: A little LALR(1) parser in pure ruby, without a generator.
|
2
|
+
#
|
3
|
+
# Copyright (c) Chris Corbyn, 2011
|
4
|
+
|
5
|
+
# This example creates a simple infix calculator, supporting the four basic arithmetic
|
6
|
+
# operations (add, subtract, multiply and divide), along with logical grouping and operator
|
7
|
+
# precedence.
|
8
|
+
|
9
|
+
require "whittle"
|
10
|
+
require "bigdecimal"
|
11
|
+
|
12
|
+
class Calculator < Whittle::Parser
|
13
|
+
rule(:wsp => /\s+/).skip!
|
14
|
+
|
15
|
+
rule("+") % :left ^ 1
|
16
|
+
rule("-") % :left ^ 1
|
17
|
+
rule("*") % :left ^ 2
|
18
|
+
rule("/") % :left ^ 2
|
19
|
+
|
20
|
+
rule("(")
|
21
|
+
rule(")")
|
22
|
+
|
23
|
+
rule(:decimal => /([0-9]*\.)?[0-9]+/).as { |num| BigDecimal(num) }
|
24
|
+
|
25
|
+
rule(:expr) do |r|
|
26
|
+
r["(", :expr, ")"].as { |_, e, _| e }
|
27
|
+
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
28
|
+
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
29
|
+
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
30
|
+
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
31
|
+
r["-", :expr].as { |_, e| -e }
|
32
|
+
r[:decimal]
|
33
|
+
end
|
34
|
+
|
35
|
+
start(:expr)
|
36
|
+
end
|
37
|
+
|
38
|
+
calculator = Calculator.new
|
39
|
+
|
40
|
+
p calculator.parse("5-2-1").to_f
|
41
|
+
# => 2.0
|
42
|
+
|
43
|
+
p calculator.parse("5-2*3").to_f
|
44
|
+
# => -1.0
|
45
|
+
|
46
|
+
p calculator.parse(".7").to_f
|
47
|
+
# => 0.7
|
48
|
+
|
49
|
+
p calculator.parse("3.3 - .7").to_f
|
50
|
+
# => 2.6
|
51
|
+
|
52
|
+
p calculator.parse("5-(2-1)").to_f
|
53
|
+
# => 4.0
|
54
|
+
|
55
|
+
p calculator.parse("5 - -2").to_f
|
56
|
+
# => 7.0
|
57
|
+
|
58
|
+
p calculator.parse("5 * 2 - -2").to_f
|
59
|
+
# => 12.0
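The calculator parses numbers into `BigDecimal` rather than `Float`, which keeps decimal inputs such as "3.3 - .7" exact until the final `.to_f`. A quick standard-library comparison (the values are illustrative, not taken from the example's output):

```ruby
require "bigdecimal"

# Binary floats cannot hold most decimal fractions exactly.
float_sum = 0.1 + 0.2

# BigDecimal stores decimal digits, so this subtraction is exact.
decimal_diff = BigDecimal("3.3") - BigDecimal("0.7")

p float_sum == 0.3                  # => false
p decimal_diff == BigDecimal("2.6") # => true
p decimal_diff.to_f                 # => 2.6
```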
|
data/lib/whittle/parser.rb
CHANGED
@@ -58,69 +58,42 @@ module Whittle
|
|
58
58
|
|
59
59
|
# Declares a new rule.
|
60
60
|
#
|
61
|
-
# The are
|
62
|
-
# in the +name+ parameter, along with a block, in which you will add one more possible
|
63
|
-
# rules.
|
61
|
+
# There are three ways to call this method:
|
64
62
|
#
|
65
|
-
#
|
63
|
+
# 1. rule("+")
|
64
|
+
# 2. rule(:int => /[0-9]+/)
|
65
|
+
# 3. rule(:expr) do |r|
|
66
|
+
# r[:int, "+", :int].as { |a, _, b| a + b }
|
67
|
+
# end
|
66
68
|
#
|
67
|
-
#
|
68
|
-
#
|
69
|
-
# r[:expr, "-", :expr].as { |a, _, b| a - b }
|
70
|
-
# r[:expr, "/", :expr].as { |a, _, b| a / b }
|
71
|
-
# r[:expr, "*", :expr].as { |a, _, b| a * b }
|
72
|
-
# r[:integer].as { |i| Integer(i) }
|
73
|
-
# end
|
69
|
+
# Variants (1) and (2) define basic terminal symbols (direct chunks of the input string),
|
70
|
+
# while variant (3) takes a block to define one or more nonterminal rules.
|
74
71
|
#
|
75
|
-
#
|
76
|
-
#
|
77
|
-
# expr:
|
78
|
-
#
|
79
|
-
# 42
|
80
|
-
#
|
81
|
-
# Therefore any sum of integers as also a valid expr:
|
82
|
-
#
|
83
|
-
# 42 + 24
|
84
|
-
#
|
85
|
-
# Therefore any multiplication of sums of integers is also a valid expr, and so on.
|
86
|
-
#
|
87
|
-
# 42 + 24 * 7 + 52
|
88
|
-
#
|
89
|
-
# A rule like the above is called a 'nonterminal', because upon recognizing any expr, it
|
90
|
-
# is possible for the rule to continue collecting input and becoming a larger expr.
|
91
|
-
#
|
92
|
-
# In subtle contrast, a rule like the following:
|
93
|
-
#
|
94
|
-
# rule("+") do |r|
|
95
|
-
# r["+"].as { |plus| plus }
|
96
|
-
# end
|
97
|
-
#
|
98
|
-
# Is called a 'terminal' token, since upon recognizing a "+", the parser cannot
|
99
|
-
# add further input to the "+" itself... it is the tip of a branch in the parse tree; the
|
100
|
-
# branch terminates here, and subsequently the rule is terminal.
|
101
|
-
#
|
102
|
-
# There is a shorthand way to write the above rule:
|
103
|
-
#
|
104
|
-
# rule("+")
|
105
|
-
#
|
106
|
-
# Not given a block, #rule treats the name parameter as a literal token.
|
107
|
-
#
|
108
|
-
# Note that nonterminal rules are composed of other nonterminal rules and/or terminal
|
109
|
-
# rules. Terminal rules contain one, and only one Regexp pattern or fixed string.
|
110
|
-
#
|
111
|
-
# @param [Symbol, String] name
|
112
|
-
# the name of the ruleset (note the one ruleset can contain multiple rules)
|
72
|
+
# @param [Symbol, String, Hash] name
|
73
|
+
# the name of the rule, or a Hash mapping the name to a pattern
|
113
74
|
#
|
114
75
|
# @return [RuleSet, Rule]
|
115
76
|
# the newly created RuleSet if a block was given, otherwise a rule representing a
|
116
77
|
# terminal token for the input string +name+.
|
117
78
|
def rule(name)
|
118
|
-
rules[name] = RuleSet.new(name)
|
119
|
-
|
120
79
|
if block_given?
|
80
|
+
raise ArgumentError,
|
81
|
+
"Parser#rule does not accept both a Hash and a block" if name.kind_of?(Hash)
|
82
|
+
|
83
|
+
rules[name] = RuleSet.new(name)
|
121
84
|
rules[name].tap { |r| yield r }
|
122
85
|
else
|
123
|
-
|
86
|
+
key, value = if name.kind_of?(Hash)
|
87
|
+
raise ArgumentError,
|
88
|
+
"Only one element allowed in Hash for Parser#rule" unless name.length == 1
|
89
|
+
|
90
|
+
name.first
|
91
|
+
else
|
92
|
+
[name, name]
|
93
|
+
end
|
94
|
+
|
95
|
+
rules[key] = RuleSet.new(key)
|
96
|
+
rules[key][value].as(:value)
|
124
97
|
end
|
125
98
|
end
|
126
99
|
|
data/lib/whittle/rule.rb
CHANGED
@@ -26,7 +26,7 @@ module Whittle
|
|
26
26
|
# a variable list of components that make up the Rule
|
27
27
|
def initialize(name, *components)
|
28
28
|
@components = components
|
29
|
-
@action =
|
29
|
+
@action = DUMP_ACTION
|
30
30
|
@name = name
|
31
31
|
@terminal = components.length == 1 && !components.first.kind_of?(Symbol)
|
32
32
|
@assoc = :right
|
@@ -142,6 +142,8 @@ module Whittle
|
|
142
142
|
# Given a block, the Rule will be reduced by passing the result of reducing
|
143
143
|
# all inputs as arguments to the block.
|
144
144
|
#
|
145
|
+
# The default action is to return the leftmost input unchanged.
|
146
|
+
#
|
145
147
|
# Given the Symbol :value, the matched input will be returned verbatim.
|
146
148
|
# Given the Symbol :nothing, nil will be returned; you can use this to
|
147
149
|
# skip whitespace and comments, for example.
|
@@ -165,6 +167,14 @@ module Whittle
|
|
165
167
|
end
|
166
168
|
end
|
167
169
|
|
170
|
+
# Alias for as(:nothing).
|
171
|
+
#
|
172
|
+
# @return [Rule]
|
173
|
+
# returns self
|
174
|
+
def skip!
|
175
|
+
as(:nothing)
|
176
|
+
end
|
177
|
+
|
168
178
|
# Set the associativity of this Rule.
|
169
179
|
#
|
170
180
|
# Accepts values of :left, :right (default) or :nonassoc.
|
data/lib/whittle/version.rb
CHANGED
data/lib/whittle.rb
CHANGED
data/spec/unit/parser/error_reporting_spec.rb
CHANGED
@@ -3,13 +3,9 @@ require "spec_helper"
|
|
3
3
|
describe "a parser encountering unexpected input" do
|
4
4
|
let(:parser) do
|
5
5
|
Class.new(Whittle::Parser) do
|
6
|
-
rule(:wsp
|
7
|
-
r[/\s+/]
|
8
|
-
end
|
6
|
+
rule(:wsp => /\s+/).skip!
|
9
7
|
|
10
|
-
rule(:id
|
11
|
-
r[/[a-z]+/].as(:value)
|
12
|
-
end
|
8
|
+
rule(:id => /[a-z]+/)
|
13
9
|
|
14
10
|
rule(",")
|
15
11
|
rule("-")
|
data/spec/unit/parser/grouped_expr_spec.rb
CHANGED
@@ -6,12 +6,10 @@ describe "a parser with logical grouping" do
|
|
6
6
|
rule(:expr) do |r|
|
7
7
|
r["(", :expr, ")"].as { |_, expr, _| expr }
|
8
8
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
9
|
-
r[:int]
|
9
|
+
r[:int]
|
10
10
|
end
|
11
11
|
|
12
|
-
rule(:int)
|
13
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
14
|
-
end
|
12
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
15
13
|
|
16
14
|
rule("(")
|
17
15
|
rule(")")
|
data/spec/unit/parser/multiple_precedence_spec.rb
CHANGED
@@ -9,12 +9,10 @@ describe "a parser with multiple precedence levels" do
|
|
9
9
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
10
10
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
11
11
|
r[:expr, "/", :expr].as { |a, _, b| a / b }
|
12
|
-
r[:int]
|
12
|
+
r[:int]
|
13
13
|
end
|
14
14
|
|
15
|
-
rule(:int)
|
16
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
17
|
-
end
|
15
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
18
16
|
|
19
17
|
rule("(")
|
20
18
|
rule(")")
|
data/spec/unit/parser/precedence_spec.rb
CHANGED
@@ -6,14 +6,12 @@ describe "a parser depending on operator precedences" do
|
|
6
6
|
rule("+") % :left ^ 1
|
7
7
|
rule("*") % :left ^ 2
|
8
8
|
|
9
|
-
rule(:int)
|
10
|
-
r[/[0-9]+/].as { |i| Integer(i) }
|
11
|
-
end
|
9
|
+
rule(:int => /[0-9]+/).as { |i| Integer(i) }
|
12
10
|
|
13
11
|
rule(:expr) do |r|
|
14
12
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
15
13
|
r[:expr, "*", :expr].as { |a, _, b| a * b }
|
16
|
-
r[:int]
|
14
|
+
r[:int]
|
17
15
|
end
|
18
16
|
|
19
17
|
start(:expr)
|
data/spec/unit/parser/self_referential_expr_spec.rb
CHANGED
@@ -7,13 +7,11 @@ describe "a parser with a self-referential rule" do
|
|
7
7
|
rule(")")
|
8
8
|
rule("+")
|
9
9
|
|
10
|
-
rule(:int)
|
11
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
12
|
-
end
|
10
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
13
11
|
|
14
12
|
rule(:expr) do |r|
|
15
13
|
r[:expr, "+", :expr].as { |a, _, b| a + b }
|
16
|
-
r[:int]
|
14
|
+
r[:int]
|
17
15
|
end
|
18
16
|
|
19
17
|
start(:expr)
|
data/spec/unit/parser/skipped_tokens_spec.rb
CHANGED
@@ -3,19 +3,15 @@ require "spec_helper"
|
|
3
3
|
describe "a parser that skips tokens" do
|
4
4
|
let(:parser) do
|
5
5
|
Class.new(Whittle::Parser) do
|
6
|
-
rule(:wsp
|
7
|
-
r[/\s+/]
|
8
|
-
end
|
6
|
+
rule(:wsp => /\s+/).skip!
|
9
7
|
|
10
8
|
rule("-") % :left
|
11
9
|
|
12
|
-
rule(:int)
|
13
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
14
|
-
end
|
10
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
15
11
|
|
16
12
|
rule(:expr) do |r|
|
17
13
|
r[:expr, "-", :expr].as { |a, _, b| a - b }
|
18
|
-
r[:int]
|
14
|
+
r[:int]
|
19
15
|
end
|
20
16
|
|
21
17
|
start(:expr)
|
data/spec/unit/parser/sum_parser_spec.rb
CHANGED
@@ -5,9 +5,7 @@ describe "a parser returning the sum of two integers" do
|
|
5
5
|
Class.new(Whittle::Parser) do
|
6
6
|
rule("+")
|
7
7
|
|
8
|
-
rule(:int)
|
9
|
-
r[/[0-9]+/].as { |int| Integer(int) }
|
10
|
-
end
|
8
|
+
rule(:int => /[0-9]+/).as { |int| Integer(int) }
|
11
9
|
|
12
10
|
rule(:sum) do |r|
|
13
11
|
r[:int, "+", :int].as { |a, _, b| a + b }
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: whittle
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.2
|
5
5
|
prerelease:
|
6
6
|
platform: ruby
|
7
7
|
authors:
|
@@ -9,11 +9,11 @@ authors:
|
|
9
9
|
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2011-11-
|
12
|
+
date: 2011-11-28 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: rspec
|
16
|
-
requirement: &
|
16
|
+
requirement: &70351976364700 !ruby/object:Gem::Requirement
|
17
17
|
none: false
|
18
18
|
requirements:
|
19
19
|
- - ~>
|
@@ -21,7 +21,7 @@ dependencies:
|
|
21
21
|
version: '2.6'
|
22
22
|
type: :development
|
23
23
|
prerelease: false
|
24
|
-
version_requirements: *
|
24
|
+
version_requirements: *70351976364700
|
25
25
|
description: ! "Write powerful parsers by defining a series of very simple rules\n
|
26
26
|
\ and operations to perform as those rules are matched. Whittle\n
|
27
27
|
\ parsers are written in pure ruby and as such are extremely
|
@@ -40,6 +40,7 @@ files:
|
|
40
40
|
- LICENSE
|
41
41
|
- README.md
|
42
42
|
- Rakefile
|
43
|
+
- examples/calculator.rb
|
43
44
|
- lib/whittle.rb
|
44
45
|
- lib/whittle/error.rb
|
45
46
|
- lib/whittle/errors/grammar_error.rb
|