parsby 0.1.0
- checksums.yaml +7 -0
- data/.gitignore +13 -0
- data/.rspec +3 -0
- data/.ruby-version +1 -0
- data/.travis.yml +7 -0
- data/Gemfile +6 -0
- data/Gemfile.lock +41 -0
- data/README.md +607 -0
- data/Rakefile +6 -0
- data/bin/all-methods +35 -0
- data/bin/console +40 -0
- data/bin/methods-with-pending-documentation +49 -0
- data/bin/setup +8 -0
- data/bin/tested-methods +47 -0
- data/bin/vestigial-methods +30 -0
- data/lib/parsby.rb +804 -0
- data/lib/parsby/combinators.rb +384 -0
- data/lib/parsby/example/arithmetic_parser.rb +96 -0
- data/lib/parsby/example/csv_parser.rb +41 -0
- data/lib/parsby/example/json_parser.rb +92 -0
- data/lib/parsby/example/lisp_parser.rb +135 -0
- data/lib/parsby/version.rb +3 -0
- data/parsby.gemspec +42 -0
- metadata +121 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
|
|
1
|
+
---
|
2
|
+
SHA1:
|
3
|
+
metadata.gz: 3313d5b66907eba101433827e5cff5823b5736d1
|
4
|
+
data.tar.gz: d4aa6d380a96be184afd6344627679ce5cd6ca27
|
5
|
+
SHA512:
|
6
|
+
metadata.gz: 16680e582808dcbaec3f5c9d98bdbf477359d65a6448d0e48a6c3cf173d575b399b14cbeb4ccdbd754d3be4fdcd865a23a85707058c9e6b7a140f6c61598668d
|
7
|
+
data.tar.gz: b8e78360be81c54cb29a8c5ebad112488c2b4f90930bd56d8b029361192bfd0530914df8f2789031f4cf35adf5d60f3e9004113d8d04ac2e20127ed45ac75f1a
|
data/.gitignore
ADDED
data/.rspec
ADDED
data/.ruby-version
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
2.4.1
|
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/Gemfile.lock
ADDED
@@ -0,0 +1,41 @@
|
|
1
|
+
PATH
|
2
|
+
remote: .
|
3
|
+
specs:
|
4
|
+
parsby (0.1.0)
|
5
|
+
|
6
|
+
GEM
|
7
|
+
remote: https://rubygems.org/
|
8
|
+
specs:
|
9
|
+
coderay (1.1.2)
|
10
|
+
diff-lcs (1.3)
|
11
|
+
method_source (1.0.0)
|
12
|
+
pry (0.13.1)
|
13
|
+
coderay (~> 1.1)
|
14
|
+
method_source (~> 1.0)
|
15
|
+
rake (10.5.0)
|
16
|
+
rspec (3.9.0)
|
17
|
+
rspec-core (~> 3.9.0)
|
18
|
+
rspec-expectations (~> 3.9.0)
|
19
|
+
rspec-mocks (~> 3.9.0)
|
20
|
+
rspec-core (3.9.2)
|
21
|
+
rspec-support (~> 3.9.3)
|
22
|
+
rspec-expectations (3.9.2)
|
23
|
+
diff-lcs (>= 1.2.0, < 2.0)
|
24
|
+
rspec-support (~> 3.9.0)
|
25
|
+
rspec-mocks (3.9.1)
|
26
|
+
diff-lcs (>= 1.2.0, < 2.0)
|
27
|
+
rspec-support (~> 3.9.0)
|
28
|
+
rspec-support (3.9.3)
|
29
|
+
|
30
|
+
PLATFORMS
|
31
|
+
ruby
|
32
|
+
|
33
|
+
DEPENDENCIES
|
34
|
+
bundler (~> 1.17)
|
35
|
+
parsby!
|
36
|
+
pry
|
37
|
+
rake (~> 10.0)
|
38
|
+
rspec (~> 3.0)
|
39
|
+
|
40
|
+
BUNDLED WITH
|
41
|
+
1.17.1
|
data/README.md
ADDED
@@ -0,0 +1,607 @@
|
|
1
|
+
# Parsby
|
2
|
+
|
3
|
+
Parser combinator library for Ruby, based on Haskell's Parsec.
|
4
|
+
|
5
|
+
- [Installation](#installation)
|
6
|
+
- [Examples](#examples)
|
7
|
+
- [Introduction](#introduction)
|
8
|
+
- [Defining combinators](#defining-combinators)
|
9
|
+
- [`Parsby.new`](#parsbynew)
|
10
|
+
- [Defining parsers as modules](#defining-parsers-as-modules)
|
11
|
+
- [`ExpectationFailed`](#expectationfailed)
|
12
|
+
- [Cleaning up the parse tree for the trace](#cleaning-up-the-parse-tree-for-the-trace)
|
13
|
+
- [`splicer.start` combinator](#splicerstart-combinator)
|
14
|
+
- [Recursive parsers with `lazy`](#recursive-parsers-with-lazy)
|
15
|
+
- [Parsing left-recursive languages with `reduce` combinator](#parsing-left-recursive-languages-with-reduce-combinator)
|
16
|
+
- [Parsing from a string, a file, a pipe, a socket, ...](#parsing-from-a-string-a-file-a-pipe-a-socket-)
|
17
|
+
- [Comparing with Haskell's Parsec](#comparing-with-haskells-parsec)
|
18
|
+
- [Development](#development)
|
19
|
+
|
20
|
+
## Installation
|
21
|
+
|
22
|
+
Add this line to your application's Gemfile:
|
23
|
+
|
24
|
+
```ruby
|
25
|
+
gem 'parsby'
|
26
|
+
```
|
27
|
+
|
28
|
+
And then execute:
|
29
|
+
|
30
|
+
$ bundle
|
31
|
+
|
32
|
+
Or install it yourself as:
|
33
|
+
|
34
|
+
$ gem install parsby
|
35
|
+
|
36
|
+
## Examples
|
37
|
+
|
38
|
+
If you'd like to jump right into example parsers that use this library,
|
39
|
+
there are a few in this source:
|
40
|
+
|
41
|
+
- [CSV (RFC 4180 compliant)](lib/parsby/example/csv_parser.rb)
|
42
|
+
- [JSON](lib/parsby/example/json_parser.rb)
|
43
|
+
- [Lisp](lib/parsby/example/lisp_parser.rb)
|
44
|
+
- [Arithmetic expressions](lib/parsby/example/arithmetic_parser.rb)
|
45
|
+
|
46
|
+
## Introduction
|
47
|
+
|
48
|
+
This is a library used to define parsers by declaratively describing a
|
49
|
+
syntax using what's commonly referred to as combinators. Parser combinators
|
50
|
+
are functions that take parsers as inputs and/or return parsers as outputs,
|
51
|
+
i.e. they *combine* parsers into new parsers.
|
52
|
+
|
53
|
+
As an example, `between` is a combinator with 3 parameters: a parser for
|
54
|
+
what's to the left, one for what's to the right, and lastly one for what's
|
55
|
+
in-between them, and it returns a parser that, after parsing, returns the
|
56
|
+
result of the in-between parser:
|
57
|
+
|
58
|
+
```ruby
|
59
|
+
between(lit("<"), lit(">"), decimal).parse "<100>"
|
60
|
+
#=> 100
|
61
|
+
```
|
62
|
+
|
63
|
+
`lit` is a combinator that takes a string and returns a parser for
|
64
|
+
`lit`erally that string.
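
To make the idea of combining parsers concrete, here is a minimal standalone sketch that models parsers as plain Ruby lambdas. It does not use Parsby and is not how Parsby is implemented. The names `ToyFailure`, `toy_lit`, `toy_decimal`, and `toy_between` are hypothetical and exist only for this illustration. Each toy parser takes a `StringIO` and either returns a value or raises.

```ruby
require "stringio"

# Hypothetical error type for this sketch only.
ToyFailure = Class.new(StandardError)

# Returns a toy parser that expects exactly the given string.
def toy_lit(expected)
  lambda do |io|
    actual = io.read(expected.length)
    raise ToyFailure, "expected #{expected.inspect}" unless actual == expected
    actual
  end
end

# Returns a toy parser for a non-negative decimal integer.
def toy_decimal
  lambda do |io|
    digits = +""
    while (ch = io.getc)
      if ch =~ /\d/
        digits << ch
      else
        io.ungetc(ch) # put back the first non-digit
        break
      end
    end
    raise ToyFailure, "expected a digit" if digits.empty?
    digits.to_i
  end
end

# A combinator: takes three parsers and returns a new parser that
# keeps only the result of the in-between one.
def toy_between(left, right, inner)
  lambda do |io|
    left.call(io)
    result = inner.call(io)
    right.call(io)
    result
  end
end

toy_result = toy_between(toy_lit("<"), toy_lit(">"), toy_decimal).call(StringIO.new("<100>"))
#=> 100
```

The point of the sketch is only that `toy_between` never touches the input itself: it builds a new parser out of the three it was given.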
|
65
|
+
|
66
|
+
## Defining combinators
|
67
|
+
|
68
|
+
If you look at the examples in this source, you'll notice that all
|
69
|
+
combinators are defined with `define_combinator`. Strictly speaking, it's
|
70
|
+
not necessary to use that to define combinators. You can do it with
|
71
|
+
variable assignment or `def` syntax. Nevertheless, `define_combinator` is
|
72
|
+
preferred because it automates the assignment of a label to the combinator.
|
73
|
+
Consider this example:
|
74
|
+
|
75
|
+
```ruby
|
76
|
+
define_combinator :between do |left, right, p|
|
77
|
+
left > p < right
|
78
|
+
end
|
79
|
+
|
80
|
+
between(lit("<"), lit(">"), lit("foo")).label
|
81
|
+
#=> 'between(lit("<"), lit(">"), lit("foo"))'
|
82
|
+
```
|
83
|
+
|
84
|
+
If we use `def` instead of `define_combinator`, then the label would be
|
85
|
+
that of its definition. In the following case, it would be that assigned by
|
86
|
+
`<`.
|
87
|
+
|
88
|
+
```ruby
|
89
|
+
def between(left, right, p)
|
90
|
+
left > p < right
|
91
|
+
end
|
92
|
+
|
93
|
+
between(lit("<"), lit(">"), lit("foo")).label
|
94
|
+
#=> '((lit("<") > lit("foo")) < lit(">"))'
|
95
|
+
```
|
96
|
+
|
97
|
+
If we were to wrap that parser in a new one, then the label would simply be
|
98
|
+
unknown.
|
99
|
+
|
100
|
+
```ruby
|
101
|
+
def between(left, right, p)
|
102
|
+
Parsby.new {|c| (left > p < right).parse c }
|
103
|
+
end
|
104
|
+
|
105
|
+
between(lit("<"), lit(">"), lit("foo")).label.to_s
|
106
|
+
#=> "<unknown>"
|
107
|
+
```
|
108
|
+
|
109
|
+
## `Parsby.new`
|
110
|
+
|
111
|
+
Now, normally one ought to be able to define parsers using just
|
112
|
+
combinators, but there are times when one might need more control. For
|
113
|
+
those times, the lowest-level way to define a parser is using `Parsby.new`.
|
114
|
+
|
115
|
+
Here's `lit` as an example:
|
116
|
+
|
117
|
+
```ruby
|
118
|
+
define_combinator :lit, wrap: false do |e, case_sensitive: true|
|
119
|
+
Parsby.new do |c|
|
120
|
+
a = c.bio.read e.length
|
121
|
+
if case_sensitive ? a == e : a.to_s.downcase == e.downcase
|
122
|
+
a
|
123
|
+
else
|
124
|
+
raise ExpectationFailed.new c
|
125
|
+
end
|
126
|
+
end
|
127
|
+
end
|
128
|
+
```
|
129
|
+
|
130
|
+
It takes a string argument for what it `e`xpects to parse, and returns what
|
131
|
+
was `a`ctually parsed if it matches the expectation.
|
132
|
+
|
133
|
+
The block parameter `c` is a `Parsby::Context`. `c.bio` holds a
|
134
|
+
`Parsby::BackedIO`. The `parse` method of `Parsby` objects accepts ideally
|
135
|
+
any `IO` (and `String`s, which it turns into `StringIO`) and then wraps
|
136
|
+
them with `BackedIO` to give the `IO` the ability to backtrack.
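
The backtracking idea can be sketched roughly as follows. This is a simplified, hypothetical illustration and not the actual `Parsby::BackedIO` implementation: the class name `ToyBackedIO` and its `read` and `backtrack_on_failure` methods are assumptions made up for this example. The sketch keeps everything it has read in a buffer, so the read position can be rewound even on an `IO` that cannot seek.

```ruby
require "stringio"

# Hypothetical sketch: wraps an IO and buffers everything read so far, so
# the position can be rewound without calling seek on the underlying IO.
# Assumes single-byte characters for simplicity.
class ToyBackedIO
  def initialize(io)
    @io = io
    @buffer = +"" # everything read from @io so far
    @pos = 0      # current position within @buffer
  end

  # Reads up to count bytes, serving already-buffered data first.
  def read(count)
    missing = @pos + count - @buffer.length
    if missing > 0
      chunk = @io.read(missing)
      @buffer << chunk if chunk
    end
    result = @buffer[@pos, count] || ""
    @pos += result.length
    result
  end

  # Runs the block; if it raises, restores the read position and re-raises.
  def backtrack_on_failure
    saved = @pos
    yield
  rescue StandardError
    @pos = saved
    raise
  end
end

bio = ToyBackedIO.new(StringIO.new("foobar"))
begin
  bio.backtrack_on_failure do
    bio.read(3)                  # consumes "foo"
    raise "not what we expected" # simulate a failed parse
  end
rescue RuntimeError
  # the position was rewound to before "foo"
end
toy_backed_after = bio.read(6)
#=> "foobar"
```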
|
137
|
+
|
138
|
+
## Defining parsers as modules
|
139
|
+
|
140
|
+
The typical pattern I use is something like this:
|
141
|
+
|
142
|
+
```ruby
|
143
|
+
module FoobarParser
|
144
|
+
include Parsby::Combinators
|
145
|
+
extend self
|
146
|
+
|
147
|
+
# Entrypoint nicety
|
148
|
+
def parse(s)
|
149
|
+
foobar.parse s
|
150
|
+
end
|
151
|
+
|
152
|
+
define_combinator :foobar do
|
153
|
+
foo + bar
|
154
|
+
end
|
155
|
+
|
156
|
+
define_combinator :foo do
|
157
|
+
lit("foo")
|
158
|
+
end
|
159
|
+
|
160
|
+
define_combinator :bar do
|
161
|
+
lit("bar")
|
162
|
+
end
|
163
|
+
end
|
164
|
+
```
|
165
|
+
|
166
|
+
From that, you can use it directly as:
|
167
|
+
|
168
|
+
```ruby
|
169
|
+
FoobarParser.parse "foobar"
|
170
|
+
#=> "foobar"
|
171
|
+
FoobarParser.foo.parse "foo"
|
172
|
+
#=> "foo"
|
173
|
+
```
|
174
|
+
|
175
|
+
Being able to use subparsers directly is useful for when you want to e.g.
|
176
|
+
parse a JSON array, instead of any JSON value.
|
177
|
+
|
178
|
+
Writing the parser as a module like that also makes it easy to make a new
|
179
|
+
parser based on it:
|
180
|
+
|
181
|
+
```ruby
|
182
|
+
module FoobarbazParser
|
183
|
+
include FoobarParser
|
184
|
+
extend self
|
185
|
+
|
186
|
+
def parse(s)
|
187
|
+
foobarbaz.parse s
|
188
|
+
end
|
189
|
+
|
190
|
+
define_combinator :foobarbaz do
|
191
|
+
foobar + baz
|
192
|
+
end
|
193
|
+
|
194
|
+
define_combinator :baz do
|
195
|
+
lit("baz")
|
196
|
+
end
|
197
|
+
end
|
198
|
+
```
|
199
|
+
|
200
|
+
You can also define such a module to hold your own project's combinators to
|
201
|
+
use in multiple parsers.
|
202
|
+
|
203
|
+
## `ExpectationFailed`
|
204
|
+
|
205
|
+
Here's an example of the error raised when parsing fails:
|
206
|
+
|
207
|
+
```
|
208
|
+
pry(main)> Parsby::Example::LispParser.sexp.parse "(foo `(foo ,bar) 2.3 . . nil)"
|
209
|
+
Parsby::ExpectationFailed: line 1:
|
210
|
+
(foo `(foo ,bar) 2.3 . . nil)
|
211
|
+
| * failure: char_in("([")
|
212
|
+
| * failure: list
|
213
|
+
| *| failure: symbol
|
214
|
+
| *|| failure: nil
|
215
|
+
| *||| failure: string
|
216
|
+
| *|||| failure: number
|
217
|
+
\\\||
|
218
|
+
| *| failure: atom
|
219
|
+
| *|| failure: abbrev
|
220
|
+
\\|
|
221
|
+
| * failure: sexp
|
222
|
+
V *| success: lit(".")
|
223
|
+
\-/ *|| success: sexp
|
224
|
+
\---------/ *||| success: sexp
|
225
|
+
\-/ *|||| success: sexp
|
226
|
+
V *||||| success: char_in("([")
|
227
|
+
\\\\\|
|
228
|
+
| * failure: list
|
229
|
+
| * failure: sexp
|
230
|
+
```
|
231
|
+
|
232
|
+
As can be seen, Parsby manages a tree structure representing parsers and
|
233
|
+
their subparsers, with the information of where a particular parser began
|
234
|
+
parsing, where it ended, whether it succeeded or failed, and the label of
|
235
|
+
the parser.
|
236
|
+
|
237
|
+
It might be worth mentioning that when debugging a parser from an
|
238
|
+
unexpected `ExpectationFailed` error, the backtrace isn't really useful.
|
239
|
+
That's because the backtrace points to the code involved in parsing, not
|
240
|
+
the code involved in constructing the parsers, which succeeded, but is
|
241
|
+
where the problem typically lies. The tree-looking exception message above
|
242
|
+
is meant to somewhat substitute the utility of the backtrace in these
|
243
|
+
cases.
|
244
|
+
|
245
|
+
Relating to that, the right-most text shows the labels of the corresponding
|
246
|
+
parsers. I find that labels that resemble the source code are quite useful,
|
247
|
+
just like the code location descriptions that appear right-most in
|
248
|
+
backtraces. It's because of this that I consider the use of
|
249
|
+
`define_combinator` preferable to using `def` and explicitly
|
250
|
+
assigning labels.
|
251
|
+
|
252
|
+
### Cleaning up the parse tree for the trace
|
253
|
+
|
254
|
+
If you look at the source of the example lisp parser, you might note that
|
255
|
+
there are a lot more parsers in between those shown in the tree above.
|
256
|
+
`sexp` is not a direct child of `list`, for example, despite it appearing
|
257
|
+
as such. There are at least 6 ancestor/descendant parsers between `list` and
|
258
|
+
`sexp`. It'd be very much pointless to show them all. They convey little
|
259
|
+
additional information and their labels are very verbose.
|
260
|
+
|
261
|
+
### `splicer.start` combinator
|
262
|
+
|
263
|
+
They don't appear because `splicer` is used to make the
|
264
|
+
tree look a little cleaner.
|
265
|
+
|
266
|
+
The name comes from JS's `Array.prototype.splice`, to which you can give a
|
267
|
+
starting position, and a count specifying the end, and it'll remove the
|
268
|
+
specified elements from an Array. We use `splicer` likewise, only it works
|
269
|
+
on parse trees. To show an example, here's a simplified definition of
|
270
|
+
`choice`:
|
271
|
+
|
272
|
+
```ruby
|
273
|
+
define_combinator :choice do |*ps|
|
274
|
+
ps = ps.flatten
|
275
|
+
|
276
|
+
ps.reduce(unparseable) do |a, p|
|
277
|
+
a | p
|
278
|
+
end
|
279
|
+
end
|
280
|
+
```
|
281
|
+
|
282
|
+
Let's fail it:
|
283
|
+
|
284
|
+
```
|
285
|
+
pry(main)> choice(lit("foo"), lit("bar"), lit("baz")).parse "qux"
|
286
|
+
Parsby::ExpectationFailed: line 1:
|
287
|
+
qux
|
288
|
+
\-/ * failure: lit("baz")
|
289
|
+
\-/ *| failure: lit("bar")
|
290
|
+
\-/ *|| failure: lit("foo")
|
291
|
+
| *||| failure: unparseable
|
292
|
+
\|||
|
293
|
+
| *|| failure: (unparseable | lit("foo"))
|
294
|
+
\||
|
295
|
+
| *| failure: ((unparseable | lit("foo")) | lit("bar"))
|
296
|
+
\|
|
297
|
+
| * failure: (((unparseable | lit("foo")) | lit("bar")) | lit("baz"))
|
298
|
+
| * failure: choice(lit("foo"), lit("bar"), lit("baz"))
|
299
|
+
```
|
300
|
+
|
301
|
+
Those parser intermediaries that use `|` aren't really making things any
|
302
|
+
clearer. Let's use `splicer` to remove those:
|
303
|
+
|
304
|
+
```ruby
|
305
|
+
define_combinator :choice do |*ps|
|
306
|
+
ps = ps.flatten
|
307
|
+
|
308
|
+
splicer.start do |m|
|
309
|
+
ps.reduce(unparseable) do |a, p|
|
310
|
+
a | m.end(p)
|
311
|
+
end
|
312
|
+
end
|
313
|
+
end
|
314
|
+
```
|
315
|
+
|
316
|
+
Let's fail it, again:
|
317
|
+
|
318
|
+
```
|
319
|
+
pry(main)> choice(lit("foo"), lit("bar"), lit("baz")).parse "qux"
|
320
|
+
Parsby::ExpectationFailed: line 1:
|
321
|
+
qux
|
322
|
+
\-/ * failure: lit("baz")
|
323
|
+
\-/ *| failure: lit("bar")
|
324
|
+
\-/ *|| failure: lit("foo")
|
325
|
+
\\|
|
326
|
+
| * failure: splicer.start((((unparseable | splicer.end(lit("foo"))) | splicer.end(lit("bar"))) | splicer.end(lit("baz"))))
|
327
|
+
| * failure: choice(lit("foo"), lit("bar"), lit("baz"))
|
328
|
+
```
|
329
|
+
|
330
|
+
Now, the only issue left is that `define_combinator` wraps the result of
|
331
|
+
the parser in another parser. Let's disable that wrapping by passing `wrap:
|
332
|
+
false` to it:
|
333
|
+
|
334
|
+
```ruby
|
335
|
+
define_combinator :choice, wrap: false do |*ps|
|
336
|
+
ps = ps.flatten
|
337
|
+
|
338
|
+
splicer.start do |m|
|
339
|
+
ps.reduce(unparseable) do |a, p|
|
340
|
+
a | m.end(p)
|
341
|
+
end
|
342
|
+
end
|
343
|
+
end
|
344
|
+
```
|
345
|
+
|
346
|
+
Let's fail it, again:
|
347
|
+
|
348
|
+
```
|
349
|
+
pry(main)> choice(lit("foo"), lit("bar"), lit("baz")).parse "qux"
|
350
|
+
Parsby::ExpectationFailed: line 1:
|
351
|
+
qux
|
352
|
+
\-/ * failure: lit("baz")
|
353
|
+
\-/ *| failure: lit("bar")
|
354
|
+
\-/ *|| failure: lit("foo")
|
355
|
+
\\|
|
356
|
+
| * failure: choice(lit("foo"), lit("bar"), lit("baz"))
|
357
|
+
```
|
358
|
+
|
359
|
+
## Recursive parsers with `lazy`
|
360
|
+
|
361
|
+
If we try to define a recursive parser using combinators like so:
|
362
|
+
|
363
|
+
```ruby
|
364
|
+
define_combinator :value do
|
365
|
+
list | lit("foo")
|
366
|
+
end
|
367
|
+
|
368
|
+
define_combinator :list do
|
369
|
+
between(lit("["), lit("]"), sep_by(lit(","), spaced(value)))
|
370
|
+
end
|
371
|
+
|
372
|
+
value
|
373
|
+
#=> SystemStackError: stack level too deep
|
374
|
+
```
|
375
|
+
|
376
|
+
We get a stack overflow.
|
377
|
+
|
378
|
+
This isn't a problem in Haskell because the language evaluates lazily by
|
379
|
+
default. This allows it to define recursive parsers without even thinking
|
380
|
+
about it.
|
381
|
+
|
382
|
+
In Ruby's case, we need to be explicit about our laziness. For that,
|
383
|
+
there's `lazy`. We just need to wrap one of the expressions in the
|
384
|
+
recursive loop with it. It could be the `value` call in `list`; it could be
|
385
|
+
the `list` call in `value`; it could be the whole of `value`. It really doesn't
|
386
|
+
matter.
|
387
|
+
|
388
|
+
```ruby
|
389
|
+
define_combinator :value do
|
390
|
+
lazy { list | lit("foo") }
|
391
|
+
end
|
392
|
+
|
393
|
+
define_combinator :list do
|
394
|
+
between(lit("["), lit("]"), sep_by(lit(","), spaced(value)))
|
395
|
+
end
|
396
|
+
|
397
|
+
value.parse "[[[[foo, foo]]]]"
|
398
|
+
#=> [[[["foo", "foo"]]]]
|
399
|
+
```
|
400
|
+
|
401
|
+
## Parsing left-recursive languages with `reduce` combinator
|
402
|
+
|
403
|
+
Here's a little arithmetic parser based on the
|
404
|
+
`Parsby::Example::ArithmeticParser`:
|
405
|
+
|
406
|
+
```ruby
|
407
|
+
define_combinator :div_op {|left, right| group(left, spaced(lit("/")), right) }
|
408
|
+
define_combinator :mul_op {|left, right| group(left, spaced(lit("*")), right) }
|
409
|
+
define_combinator :add_op {|left, right| group(left, spaced(lit("+")), right) }
|
410
|
+
define_combinator :sub_op {|left, right| group(left, spaced(lit("-")), right) }
|
411
|
+
|
412
|
+
def scope(x, &b)
|
413
|
+
b.call x
|
414
|
+
end
|
415
|
+
|
416
|
+
define_combinator :expr do
|
417
|
+
lazy do
|
418
|
+
e = decimal
|
419
|
+
|
420
|
+
# hpe -- higher precedence level expression
|
421
|
+
# spe -- same precedence level expression
|
422
|
+
|
423
|
+
e = scope e do |hpe|
|
424
|
+
recursive do |spe|
|
425
|
+
choice(
|
426
|
+
mul_op(hpe, spe),
|
427
|
+
div_op(hpe, spe),
|
428
|
+
hpe,
|
429
|
+
)
|
430
|
+
end
|
431
|
+
end
|
432
|
+
|
433
|
+
e = scope e do |hpe|
|
434
|
+
recursive do |spe|
|
435
|
+
choice(
|
436
|
+
add_op(hpe, spe),
|
437
|
+
sub_op(hpe, spe),
|
438
|
+
hpe,
|
439
|
+
)
|
440
|
+
end
|
441
|
+
end
|
442
|
+
end
|
443
|
+
end
|
444
|
+
|
445
|
+
expr.parse "5 - 4 - 3"
|
446
|
+
#=> [5, "-", [4, "-", 3]]
|
447
|
+
```
|
448
|
+
|
449
|
+
Now, that's incorrectly right-associative because we made the
|
450
|
+
precedence-level parsers right-recursive. See how the block parameter of
|
451
|
+
`recursive` is used for the right operands and not the left ones?
|
452
|
+
|
453
|
+
Let's fix that by switching the parsers used for the operands:
|
454
|
+
|
455
|
+
```ruby
|
456
|
+
define_combinator :expr do
|
457
|
+
lazy do
|
458
|
+
e = decimal
|
459
|
+
|
460
|
+
# hpe -- higher precedence level expression
|
461
|
+
# spe -- same precedence level expression
|
462
|
+
|
463
|
+
e = scope e do |hpe|
|
464
|
+
recursive do |spe|
|
465
|
+
choice(
|
466
|
+
mul_op(spe, hpe),
|
467
|
+
div_op(spe, hpe),
|
468
|
+
hpe,
|
469
|
+
)
|
470
|
+
end
|
471
|
+
end
|
472
|
+
|
473
|
+
e = scope e do |hpe|
|
474
|
+
recursive do |spe|
|
475
|
+
choice(
|
476
|
+
add_op(spe, hpe),
|
477
|
+
sub_op(spe, hpe),
|
478
|
+
hpe,
|
479
|
+
)
|
480
|
+
end
|
481
|
+
end
|
482
|
+
end
|
483
|
+
end
|
484
|
+
|
485
|
+
expr.parse "5 - 4 - 3"
|
486
|
+
# ...
|
487
|
+
```
|
488
|
+
|
489
|
+
If you run this, it might take a while, but eventually you'll get a bunch
|
490
|
+
of `SystemStackError: stack level too deep` errors.
|
491
|
+
|
492
|
+
What's happening is that e.g. while trying to check whether the expression
|
493
|
+
is a subtraction, it needs to first resolve the left operand, and as part
|
494
|
+
of that it needs to check whether *that's* a subtraction, and so on and so
|
495
|
+
forth. In other words, this causes infinite recursion. It can't even read a
|
496
|
+
single character of the input because of this.
|
497
|
+
|
498
|
+
Our problem is that we're parsing top-down. We're trying to understand what
|
499
|
+
the whole thing is before looking at the parts. We need to parse bottom-up.
|
500
|
+
Successfully parse a small piece, then figure out what the whole thing is
|
501
|
+
as we keep reading. To do that while keeping our definitions declarative,
|
502
|
+
we can use the `reduce` combinator (in combination with `pure`):
|
503
|
+
|
504
|
+
```ruby
|
505
|
+
define_combinator :expr do
|
506
|
+
lazy do
|
507
|
+
e = decimal
|
508
|
+
|
509
|
+
# hpe -- higher precedence level expression
|
510
|
+
# spe -- same precedence level expression
|
511
|
+
|
512
|
+
e = scope e do |hpe|
|
513
|
+
reduce hpe do |left_result|
|
514
|
+
choice(
|
515
|
+
mul_op(pure(left_result), hpe),
|
516
|
+
div_op(pure(left_result), hpe),
|
517
|
+
)
|
518
|
+
end
|
519
|
+
end
|
520
|
+
|
521
|
+
e = scope e do |hpe|
|
522
|
+
reduce hpe do |left_result|
|
523
|
+
choice(
|
524
|
+
add_op(pure(left_result), hpe),
|
525
|
+
sub_op(pure(left_result), hpe),
|
526
|
+
)
|
527
|
+
end
|
528
|
+
end
|
529
|
+
end
|
530
|
+
end
|
531
|
+
|
532
|
+
expr.parse "5 - 4 - 3"
|
533
|
+
#=> [[5, "-", 4], "-", 3]
|
534
|
+
```
|
535
|
+
|
536
|
+
`reduce` starts parsing with its argument, in this case `hpe`, then passes
|
537
|
+
the result to the block, which uses it for the resolved left operand.
|
538
|
+
`reduce` then parses with the parser returned by the block and passes the
|
539
|
+
result again to the block, and so on and so forth until parsing fails,
|
540
|
+
returning the result of the last successful parse.
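
The loop described above can be pictured with a standalone sketch that works over a pre-tokenized array instead of an `IO`. It only illustrates the left-fold idea and is not Parsby's code; the method name `toy_reduce_subtraction` and the token array are hypothetical.

```ruby
# Hypothetical illustration of the loop: parse one operand, then keep
# wrapping the previous result as the left operand for as long as an
# "operator operand" pair follows.
def toy_reduce_subtraction(tokens)
  tokens = tokens.dup
  result = tokens.shift # the initial parse (the role `hpe` plays above)
  raise ArgumentError, "expected an operand" unless result.is_a?(Integer)

  # Each iteration plays the role of one call to the block: the previous
  # result is the already-resolved left operand.
  while tokens[0] == "-" && tokens[1].is_a?(Integer)
    operator = tokens.shift
    right = tokens.shift
    result = [result, operator, right]
  end

  result # the result of the last successful step
end

toy_tree = toy_reduce_subtraction([5, "-", 4, "-", 3])
#=> [[5, "-", 4], "-", 3]
```

Because the left operand is always a finished result rather than a parser that still has to run, nothing recurses before input is consumed.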
|
541
|
+
|
542
|
+
In effect, we're parsing left operands bottom-up and right operands
|
543
|
+
top-down.
|
544
|
+
|
545
|
+
## Parsing from a string, a file, a pipe, a socket, ...
|
546
|
+
|
547
|
+
Any `IO` ought to work (unit tests currently have only checked pipes,
|
548
|
+
though). When you pass a string to `Parsby#parse` it wraps it with
|
549
|
+
`StringIO` before using it.
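
As a rough illustration of that normalization, the following standalone sketch turns strings into `StringIO` and passes everything else through, then reads from a string and from a pipe with the same code. The helper name `toy_to_io` is hypothetical and is not part of Parsby.

```ruby
require "stringio"

# Hypothetical helper: strings become StringIO; anything else is assumed
# to already be an IO and is passed through unchanged.
def toy_to_io(source)
  source.is_a?(String) ? StringIO.new(source) : source
end

# The same reading code then works for a string...
from_string = toy_to_io("foo").read(3)

# ...and for a pipe.
reader, writer = IO.pipe
writer.write("foo")
writer.close
from_pipe = toy_to_io(reader).read(3)
reader.close
```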
|
550
|
+
|
551
|
+
## Comparing with Haskell's Parsec
|
552
|
+
|
553
|
+
If you're already familiar with Parsec, here are some similarities:
|
554
|
+
|
555
|
+
```ruby
|
556
|
+
# Parsby # -- Parsec
|
557
|
+
#
|
558
|
+
lit("foo") # string "foo"
|
559
|
+
#
|
560
|
+
foo | bar # foo <|> bar
|
561
|
+
#
|
562
|
+
pure "foo" # pure "foo"
|
563
|
+
#
|
564
|
+
foo.then {|x| bar x } # foo >>= \x -> bar x
|
565
|
+
#
|
566
|
+
foobar = Parsby.new do |c| # foobar = do
|
567
|
+
x = foo.parse c # x <- foo
|
568
|
+
bar(x).parse c # bar x
|
569
|
+
end #
|
570
|
+
#
|
571
|
+
lit("(") > foo < lit(")") # string "(" *> foo <* string ")"
|
572
|
+
#
|
573
|
+
lit("5").fmap {|n| n.to_i + 1 } # fmap (\n -> read n + 1) (string "5")
|
574
|
+
#
|
575
|
+
group(x, y, z) # (,,) <$> x <*> y <*> z
|
576
|
+
#
|
577
|
+
group( #
|
578
|
+
w, #
|
579
|
+
group(x, y), #
|
580
|
+
z, #
|
581
|
+
).fmap do |(wr, (xr, yr), zr)| #
|
582
|
+
Foo.new(wr, Bar.new(xr, yr), zr) # Foo <$> w <*> (Bar <$> x <*> y) <*> z
|
583
|
+
end #
|
584
|
+
#
|
585
|
+
# -- Means the same, but this
|
586
|
+
# -- raises an error in Haskell
|
587
|
+
# -- because it requires an
|
588
|
+
# -- infinite type [[[[...]]]]
|
589
|
+
recursive do |p| # fix $ \p ->
|
590
|
+
between(lit("("), lit(")"), # between (string "(") (string ")") $
|
591
|
+
single(p) | pure([]) # ((:[]) <$> p) <|> pure []
|
592
|
+
end #
|
593
|
+
end #
|
594
|
+
```
|
595
|
+
|
596
|
+
## Development
|
597
|
+
|
598
|
+
After checking out the repo, run `bin/setup` to install dependencies. Then,
|
599
|
+
run `rake spec` to run the tests. You can also run `bin/console` for an
|
600
|
+
interactive prompt that will allow you to experiment.
|
601
|
+
|
602
|
+
`bin/console` includes `Parsby::Combinators` into the top-level so the
|
603
|
+
combinators and `define_combinator` are available directly from the prompt.
|
604
|
+
It also defines `reload!` to quickly load changes made to the source.
|
605
|
+
|
606
|
+
To install this gem onto your local machine, run `bundle exec rake
|
607
|
+
install`.
|