ruby-ll 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: ef87cedfa33d3340b77133abff1bdb11f5ad767e
4
+ data.tar.gz: 6e25ecbd4a78bc7f3469bba05d85a1cd9634b13a
5
+ SHA512:
6
+ metadata.gz: 07561fc9d28c285ec101c5f9d2c67f16f15b700efd814af5a49174e9e47de90f44f2ffb0859d039f7022bd494f74b8867677c5fd93b6189dd44c40367e5dc62d
7
+ data.tar.gz: 35a3352d8d2207937d5e1d8657f2c366f8417e6604fc2aef88effe9ed6d94a710e01b8ce4830881ca439fb7fa25a4b326e49a5157a925277d6cc260e4ae5c3f1
@@ -0,0 +1,13 @@
1
+ ./lib/ll/**/*.rb ./lib/ll.rb
2
+ -m markdown
3
+ -M kramdown
4
+ -o yardoc
5
+ -r ./README.md
6
+ --private
7
+ --protected
8
+ --asset ./doc/css/common.css:css/common.css
9
+ --verbose
10
+ -
11
+ ./doc/*.md
12
+ LICENSE
13
+ CONTRIBUTING.md
data/LICENSE ADDED
@@ -0,0 +1,19 @@
1
+ Copyright (c) 2015, Yorick Peterse
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
@@ -0,0 +1,380 @@
1
+ # ruby-ll
2
+
3
+ ruby-ll is a high-performance LL(1) table-based parser generator for Ruby. The
4
+ parser driver is written in C/Java to ensure good runtime performance, while the
5
+ compiler is written entirely in Ruby.
6
+
7
+ ruby-ll was written to serve as a fast and easy to use alternative to
8
+ [Racc][racc] for the various parsers used in [Oga][oga]. However, ruby-ll isn't
9
+ limited to just Oga: you can use it to write a parser for any language that can
10
+ be represented using an LL(1) grammar.
11
+
12
+ ruby-ll is self-hosting, which allows one to use ruby-ll to modify its own
13
+ parser. Self-hosting was achieved by bootstrapping the parser using a Racc
14
+ parser that outputs the same AST as the ruby-ll parser. The Racc parser remains
15
+ in the repository for historical purposes and in case it's ever needed again; it
16
+ can be found in [bootstrap/parser.y](lib/ll/bootstrap/parser.y).
17
+
18
+ For more information on LL parsing, see
19
+ <https://en.wikipedia.org/wiki/LL_parser>.
20
+
21
+ ## Features
22
+
23
+ * Support for detecting first/first and first/follow conflicts
24
+ * clang-like error/warning messages to ease debugging parsers
25
+ * High performance and a low memory footprint
26
+
27
+ ## Requirements
28
+
29
+ | Ruby | Required | Recommended |
30
+ |:---------|:--------------|:------------|
31
+ | MRI | >= 1.9.3 | >= 2.1.0 |
32
+ | Rubinius | >= 2.2 | >= 2.5.0 |
33
+ | JRuby | >= 1.7 | >= 1.7.0 |
34
+ | Maglev | Not supported | |
35
+ | Topaz | Not supported | |
36
+ | mruby | Not supported | |
37
+
38
+ For MRI/Rubinius you'll need a C90 compatible compiler such as clang or gcc. For
39
+ JRuby you don't need any compilers to be installed as the .jar is packaged with
40
+ the Gem itself.
41
+
42
+ When hacking on ruby-ll you'll also need to have the following installed:
43
+
44
+ * Ragel 6 for building the grammar lexer
45
+ * javac for building the JRuby extension
46
+
47
+ ## Usage
48
+
49
+ The CLI takes a grammar input file (see below for the exact syntax) with the
50
+ extension `.rll` and turns it into a corresponding Ruby file. For example:
51
+
52
+ ruby-ll lib/my-gem/parser.rll
53
+
54
+ This would result in the parser being written to `lib/my-gem/parser.rb`. If you
55
+ want to customize the output path you can do so using the `-o` / `--output`
56
+ option:
57
+
58
+ ruby-ll lib/my-gem/parser.rll -o lib/my-gem/my-parser.rb
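
As a rough illustration of the default mapping from a `.rll` grammar to its `.rb` output described above, the helper below derives the output path in plain Ruby. The method name `default_output_path` is made up for this sketch; it is not part of ruby-ll's CLI, whose actual implementation may differ.

```ruby
# Hypothetical helper (not part of ruby-ll): derives the output path the
# CLI is described as using when no -o / --output option is given.
def default_output_path(grammar_path)
  directory = File.dirname(grammar_path)
  basename  = File.basename(grammar_path, '.rll') # strips the .rll extension

  File.join(directory, "#{basename}.rb")
end

output = default_output_path('lib/my-gem/parser.rll')
```

Something along these lines can be handy in a build script that needs to know which generated files to expect.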
59
+
60
+ By default ruby-ll adds various `require` calls to ensure you can load the
61
+ parser _without_ having to load all of ruby-ll (e.g. the compiler code). If you
62
+ want to disable this behaviour you can use the `--no-requires` option when
63
+ processing a grammar:
64
+
65
+ ruby-ll lib/my-gem/parser.rll --no-requires
66
+
67
+ Once generated you can use the parser class like any other parser. To start
68
+ parsing simply call the `parse` method:
69
+
70
+ parser = MyGem::Parser.new
71
+
72
+ parser.parse
73
+
74
+ The return value of this method is whatever the root rule (= the first rule
75
+ defined) returned.
76
+
77
+ ## Grammar Syntax
78
+
79
+ The syntax of a ruby-ll grammar file is fairly simple and consists of
80
+ directives, rules, comments and code blocks.
81
+
82
+ Directives can be seen as configuration options, for example to set the name of
83
+ the parser class. Rules are, well, the parsing rules. Code blocks can be used to
84
+ associate Ruby code with either a branch of a rule or a certain section of the
85
+ parser (the header or its inner body).
86
+
87
+ Directives and rules must be terminated using a semicolon; this is not needed
88
+ for `%inner` / `%header` blocks.
89
+
90
+ For a full example, see ruby-ll's own parser located at
91
+ [lib/ll/parser.rll](lib/ll/parser.rll).
92
+
93
+ ### Comments
94
+
95
+ Comments start with a hash (`#`) sign and continue until the end of the line,
96
+ just like Ruby. Example:
97
+
98
+ # Some say comments are a code smell.
99
+
100
+ ### %name
101
+
102
+ The `%name` directive is used to set the full name/namespace of the parser
103
+ class. The name consists of a single identifier or multiple identifiers
104
+ separated by `::` (just like Ruby). Some examples:
105
+
106
+ %name A;
107
+ %name A::B;
108
+ %name A::B::C;
109
+
110
+ The last identifier is used as the actual class name. This class will be nested
111
+ inside a module for every other segment leading up to the last one. For example,
112
+ this:
113
+
114
+ %name A;
115
+
116
+ Gets turned into this:
117
+
118
+ class A < LL::Driver
119
+
120
+ end
121
+
122
+ While this:
123
+
124
+ %name A::B::C;
125
+
126
+ Gets turned into this:
127
+
128
+ module A
129
+   module B
130
+     class C < LL::Driver
131
+
132
+     end
133
+   end
134
+ end
135
+
136
+ By nesting the parser class in modules any constants in the scope can be
137
+ referred to without requiring the use of a full namespace. For example, the
138
+ constant `A::B::X` can just be referred to as `X` in the above example.
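
The effect of nesting on constant lookup can be tried out in plain Ruby. The sketch below uses a stand-in base class (`FakeDriver`, an assumption so that it runs without ruby-ll installed) and compares the nested form with the compact `class A::B::D` form, which does not put `A::B` in the lexical scope.

```ruby
# Stand-in for LL::Driver so the sketch runs without ruby-ll installed
# (hypothetical; the real base class is provided by the gem).
class FakeDriver; end

module A
  module B
    X = 42 # a constant living in the A::B namespace

    class C < FakeDriver
      def answer
        X # resolved lexically, i.e. this is A::B::X
      end
    end
  end
end

# Compact form: A::B is not part of the lexical scope here, so the bare
# constant X cannot be resolved from inside this class.
class A::B::D < FakeDriver
  def answer
    X
  end
end

nested_result = A::B::C.new.answer

compact_error =
  begin
    A::B::D.new.answer
    nil
  rescue NameError => error
    error
  end
```

Here `nested_result` is `42`, while the compact form raises a `NameError`, which is why the nested layout is the more convenient one for generated parsers.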
139
+
140
+ Multiple calls to this directive will result in previous values being
141
+ overwritten.
142
+
143
+ ### %terminals
144
+
145
+ The `%terminals` directive is used to list one or more terminals of the grammar.
146
+ Each terminal is an identifier separated by a space. For example:
147
+
148
+ %terminals A B C;
149
+
150
+ This would define 3 terminals: `A`, `B` and `C`. While there's no specific
151
+ requirement as to how you name your terminals it's common practice to capitalize
152
+ them and prefix them with `T_`, like so:
153
+
154
+ %terminals T_A T_B T_C;
155
+
156
+ Multiple calls to this directive will result in the terminals being appended to
157
+ the existing list.
158
+
159
+ ### %inner
160
+
161
+ The `%inner` directive can be used to specify a code block that should be placed
162
+ inside the parser's body, just after the section containing all parsing tables.
163
+ This directive should be used for adding custom methods and such to the parser.
164
+ For example:
165
+
166
+ %inner
167
+ {
168
+   def initialize(input)
169
+     @input = input
170
+   end
171
+ }
172
+
173
+ This would result in the following:
174
+
175
+ class A < LL::Driver
176
+   def initialize(input)
177
+     @input = input
178
+   end
179
+ end
180
+
181
+ Curly braces can either be placed on the same line as the `%inner` directive or
182
+ on a new line; it's up to you.
183
+
184
+ Unlike regular directives this directive should not be terminated using a
185
+ semicolon.
186
+
187
+ ### %header
188
+
189
+ The `%header` directive is similar to the `%inner` directive in that it can be
190
+ used to add a code block to the parser. The code of this directive is placed
191
+ just before the `class` definition of the parser. This directive can be used to
192
+ add documentation to the parser class. For example:
193
+
194
+ %header
195
+ {
196
+   # Hello world
197
+ }
198
+
199
+ This would result in the following:
200
+
201
+ # Hello world
202
+ class A < LL::Driver
203
+ end
204
+
205
+ ### Rules
206
+
207
+ Rules consist of a name followed by an equals sign (`=`) followed by 1 or
208
+ more branches. Each branch is separated using a pipe (`|`). A branch can consist
209
+ of 1 or more steps, or an epsilon. Branches can be followed by a code block
210
+ starting with `{` and ending with `}`. A rule must be terminated using a
211
+ semicolon.
212
+
213
+ An epsilon is represented as a single underscore (`_`) and is used to denote a
214
+ branch that matches nothing (the empty input).
215
+
216
+ A simple example:
217
+
218
+ %terminals A B;
219
+
220
+ numbers = A | B;
221
+
222
+ Here the rule `numbers` is defined and has two branches. If we wanted a rule
223
+ that would match terminal `A` or nothing we'd use the following:
224
+
225
+ %terminals A;
226
+
227
+ numbers = A | _;
228
+
229
+ Code blocks can also be added:
230
+
231
+ numbers
232
+   = A { 'A' }
233
+   | B { 'B' }
234
+   ;
235
+
236
+ When the terminal `A` is processed the returned value would be "A", for
237
+ terminal `B` the returned value would be "B".
238
+
239
+ Code blocks have access to an array called `val` which contains the values of
240
+ every step of a branch. For example:
241
+
242
+ numbers = A B { val };
243
+
244
+ Here `val` would return `[A, B]`. Since `val` is just an Array you can also
245
+ return specific elements from it:
246
+
247
+ numbers = A B { val[0] };
248
+
249
+ Values returned by code blocks are passed to whatever other rule called it. This
250
+ allows code blocks to be used for building ASTs and the like. If no explicit
251
+ code block is defined `val` is returned as is.
252
+
253
+ ruby-ll parsers recurse into rules before unwinding; this means that the
254
+ inner-most rule is processed first.
255
+
256
+ Branches of a rule can also refer to other rules:
257
+
258
+ numbers = A other_rule;
259
+ other_rule = B;
260
+
261
+ The value for `other_rule` in the `numbers` rule would be whatever the
262
+ `other_rule` below it returns.
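
To make the flow of `val` and of returned values more concrete, here is a small hand-written stand-in for what a parser does with two such rules. It is only a sketch: `SketchParser`, its `expect` helper and the `[type, value]` token format are assumptions made for illustration and are not code generated by ruby-ll.

```ruby
# Hand-written stand-in (not generated by ruby-ll) for these two rules:
#
#   numbers    = A other_rule { val };
#   other_rule = B            { val[0].downcase };
#
# Tokens are [type, value] pairs; this token format is an assumption.
class SketchParser
  def initialize(tokens)
    @tokens = tokens.dup
  end

  def parse
    numbers # the first rule defined acts as the root rule
  end

  private

  # Consumes the next token and returns its value, raising when the
  # token type is not the expected one.
  def expect(type)
    found, value = @tokens.shift
    raise ArgumentError, "expected #{type}, got #{found.inspect}" unless found == type

    value
  end

  def numbers
    # One entry per step; other_rule finishes before this block runs.
    val = [expect(:A), other_rule]
    val # corresponds to the code block `{ val }`
  end

  def other_rule
    val = [expect(:B)]
    val[0].downcase # corresponds to `{ val[0].downcase }`
  end
end

result = SketchParser.new([[:A, 'X'], [:B, 'Y']]).parse
```

With the input above `result` is `['X', 'y']`: the inner rule has already produced `'y'` by the time the outer rule assembles its own `val`.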
263
+
264
+ The grammar compiler adds errors whenever it encounters a rule with the same
265
+ name as a terminal. As such, the following is invalid:
266
+
267
+ %terminals A B;
268
+
269
+ A = B;
270
+
271
+ It's also an error to re-define an existing rule.
272
+
273
+ ## Conflicts
274
+
275
+ LL(1) grammars can have two kinds of conflicts in a rule:
276
+
277
+ * first/first
278
+ * first/follow
279
+
280
+ ### first/first
281
+
282
+ A first/first conflict means that multiple branches of a rule start with the
283
+ same terminal, resulting in the parser being unable to choose what branch to
284
+ use. For example:
285
+
286
+ %terminals A B;
287
+
288
+ rule = A | A B;
289
+
290
+ This would result in the following output:
291
+
292
+ example.rll:5:1:error: first/first conflict, multiple branches start with the same terminals
293
+ rule = A | A B;
294
+ ^
295
+ example.rll:5:8:error: branch starts with: A
296
+ rule = A | A B;
297
+        ^
298
+ example.rll:5:12:error: branch starts with: A
299
+ rule = A | A B;
300
+            ^
301
+
302
+ To solve a first/first conflict you'll have to factor out the common left
303
+ factor. For example:
304
+
305
+ %name Example;
306
+
307
+ %terminals A B;
308
+
309
+ rule = A rule_follow;
310
+ rule_follow = B | _;
311
+
312
+ Here the `rule` rule starts with terminal `A` and can optionally be followed by
313
+ `B`, without introducing any first/first conflicts.
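
The check behind this kind of error can be sketched in a few lines of Ruby. The grammar representation used here (a Hash of rule names to branches, with an empty branch standing for epsilon) and the method names are assumptions for illustration, not ruby-ll's internals. The sketch is also simplified: it only looks at the first step of each branch.

```ruby
require 'set'

# Terminals a branch can start with. Simplified: only the first step is
# inspected, so rules that can match nothing are not looked through.
def first_set(grammar, terminals, branch, seen = Set.new)
  head = branch.first

  return Set.new if head.nil?                  # epsilon branch
  return Set[head] if terminals.include?(head) # branch starts with a terminal
  return Set.new if seen.include?(head)        # guard against left recursion

  grammar.fetch(head).reduce(Set.new) do |acc, sub_branch|
    acc | first_set(grammar, terminals, sub_branch, seen | [head])
  end
end

# Returns a Hash mapping rule names to the terminals that start more than
# one of their branches.
def first_first_conflicts(grammar, terminals)
  conflicts = {}

  grammar.each do |name, branches|
    counts = Hash.new(0)

    branches.each do |branch|
      first_set(grammar, terminals, branch).each { |term| counts[term] += 1 }
    end

    shared = counts.select { |_, count| count > 1 }.keys
    conflicts[name] = shared unless shared.empty?
  end

  conflicts
end

terminals   = Set[:A, :B]
conflicting = { rule: [[:A], [:A, :B]] }                            # rule = A | A B;
factored    = { rule: [[:A, :rule_follow]], rule_follow: [[:B], []] }

before = first_first_conflicts(conflicting, terminals)
after  = first_first_conflicts(factored, terminals)
```

For the conflicting grammar `before` reports that `rule` has two branches starting with `A`, while `after` is empty for the left-factored version.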
314
+
315
+ ### first/follow
316
+
317
+ A first/follow conflict occurs when a branch in a rule starts with an epsilon
318
+ and is followed by one or more terminals and/or rules. An example of a
319
+ first/follow conflict:
320
+
321
+ %name Example;
322
+
323
+ %terminals A B;
324
+
325
+ rule       = other_rule B;
326
+ other_rule = A | _;
327
+
328
+ This produces the following errors:
329
+
330
+ example.rll:5:14:error: first/follow conflict, branch can start with epsilon and is followed by (non) terminals
331
+ rule       = other_rule B;
332
+              ^
333
+ example.rll:6:18:error: epsilon originates from here
334
+ other_rule = A | _;
335
+                  ^
336
+
337
+ There's no specific procedure for solving such a conflict other than simply
338
+ removing the starting epsilon.
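
The condition as worded in this section (a branch that starts with a rule that can be empty and that is followed by further steps) can be sketched as below. As before, the Hash-based grammar format and the method names are assumptions for illustration; ruby-ll's actual analysis may be more involved.

```ruby
# True when the given rule has an epsilon (empty) branch. Terminals are
# not keys of the grammar Hash, so they never count as "can be empty".
def epsilon_branch?(grammar, name)
  grammar.fetch(name, []).any?(&:empty?)
end

# Returns [rule_name, branch] pairs for branches that start with a rule
# that can be empty and that have at least one more step after it.
def first_follow_conflicts(grammar)
  grammar.flat_map do |name, branches|
    branches
      .select { |branch| branch.size > 1 && epsilon_branch?(grammar, branch.first) }
      .map { |branch| [name, branch] }
  end
end

# rule       = other_rule B;
# other_rule = A | _;
conflicting = { rule: [[:other_rule, :B]], other_rule: [[:A], []] }

# rule        = A rule_follow;
# rule_follow = B | _;
clean = { rule: [[:A, :rule_follow]], rule_follow: [[:B], []] }

conflicts = first_follow_conflicts(conflicting)
none      = first_follow_conflicts(clean)
```

Only the first grammar is flagged: in the second one the epsilon sits at the end of the branch, so nothing follows it.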
339
+
340
+ ## Performance
341
+
342
+ One of the goals of ruby-ll is to be faster than existing parser generators,
343
+ Racc in particular. How much faster ruby-ll will be depends on the use case. For
344
+ example, for the benchmark
345
+ [benchmark/ll/simple\_json\_bench.rb](benchmark/ll/simple_json_bench.rb) the
346
+ performance gains of ruby-ll over Racc are as follows:
347
+
348
+ | Ruby | Speed |
349
+ |:----------------|:------|
350
+ | MRI 2.2 | 1.75x |
351
+ | Rubinius 2.5.2 | 3.85x |
352
+ | JRuby 1.7.18 | 6.44x |
353
+ | JRuby 9000 pre1 | 7.50x |
354
+
355
+ This benchmark was run on a Thinkpad T520 laptop so it's probably best to run
356
+ the benchmark yourself to see how it behaves on your platform.
357
+
358
+ Depending on the complexity of your parser you might end up with
359
+ different numbers. The above metrics are simply an indication of the maximum
360
+ performance gain of ruby-ll compared to Racc.
361
+
362
+ ## Thread Safety
363
+
364
+ Parsers generated by ruby-ll share an internal, mutable state on a per instance
365
+ basis. As a result of this a single instance of your parser _cannot_ be used by
366
+ multiple threads in parallel. If it wasn't for MRI's C API (specifically due to
367
+ how `rb_block_call` works) this wouldn't have been an issue.
368
+
369
+ To mitigate the above simply create a new instance of your parser every time you
370
+ need it and have the GC clean it up once you're done. This _will_ introduce a
371
+ slight allocation overhead but it beats having to deal with race conditions.
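
A minimal sketch of the "one instance per thread" approach is shown below. `StatefulParser` is a hypothetical stand-in that keeps mutable per-instance state; it is not a parser generated by ruby-ll, but the pattern for using it from threads is the same.

```ruby
# Hypothetical stand-in for a generated parser: it keeps mutable state on
# the instance, so a single instance must not be shared between threads.
class StatefulParser
  def initialize(input)
    @input = input
    @stack = []
  end

  def parse
    @input.each_char { |char| @stack << char.upcase }
    @stack.join
  end
end

inputs = %w[foo bar baz]

# Every thread builds its own parser, so no mutable state is shared.
threads = inputs.map do |input|
  Thread.new { StatefulParser.new(input).parse }
end

# Thread#value waits for the thread to finish and returns its result.
results = threads.map(&:value)
```

Each parser instance becomes garbage once its thread is done, which is the allocation overhead mentioned above.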
372
+
373
+ ## License
374
+
375
+ All source code in this repository is licensed under the MIT license unless
376
+ specified otherwise. A copy of this license can be found in the file "LICENSE"
377
+ in the root directory of this repository.
378
+
379
+ [racc]: https://github.com/tenderlove/racc
380
+ [oga]: https://github.com/yorickpeterse/oga
@@ -0,0 +1,5 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require_relative '../lib/ll'
4
+
5
+ LL::CLI.new.run
@@ -0,0 +1,25 @@
1
+ # Developer's Certificate of Origin 1.0
2
+
3
+ By making a contribution to this project, I certify that:
4
+
5
+ 1. The contribution was created in whole or in part by me and I
6
+ have the right to submit it under the open source license
7
+ indicated in the file LICENSE; or
8
+
9
+ 2. The contribution is based upon previous work that, to the best
10
+ of my knowledge, is covered under an appropriate open source
11
+ license and I have the right under that license to submit that
12
+ work with modifications, whether created in whole or in part
13
+ by me, under the same open source license (unless I am
14
+ permitted to submit under a different license), as indicated
15
+ in the file LICENSE; or
16
+
17
+ 3. The contribution was provided directly to me by some other
18
+ person who certified (1), (2) or (3) and I have not modified
19
+ it.
20
+
21
+ 4. I understand and agree that this project and the contribution
22
+ are public and that a record of the contribution (including all
23
+ personal information I submit with it, including my sign-off) is
24
+ maintained indefinitely and may be redistributed consistent with
25
+ this project or the open source license(s) involved.