ruby-ll 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: ef87cedfa33d3340b77133abff1bdb11f5ad767e
+   data.tar.gz: 6e25ecbd4a78bc7f3469bba05d85a1cd9634b13a
+ SHA512:
+   metadata.gz: 07561fc9d28c285ec101c5f9d2c67f16f15b700efd814af5a49174e9e47de90f44f2ffb0859d039f7022bd494f74b8867677c5fd93b6189dd44c40367e5dc62d
+   data.tar.gz: 35a3352d8d2207937d5e1d8657f2c366f8417e6604fc2aef88effe9ed6d94a710e01b8ce4830881ca439fb7fa25a4b326e49a5157a925277d6cc260e4ae5c3f1
@@ -0,0 +1,13 @@
+ ./lib/ll/**/*.rb ./lib/ll.rb
+ -m markdown
+ -M kramdown
+ -o yardoc
+ -r ./README.md
+ --private
+ --protected
+ --asset ./doc/css/common.css:css/common.css
+ --verbose
+ -
+ ./doc/*.md
+ LICENSE
+ CONTRIBUTING.md
data/LICENSE ADDED
@@ -0,0 +1,19 @@
+ Copyright (c) 2015, Yorick Peterse
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
@@ -0,0 +1,380 @@
+ # ruby-ll
+
+ ruby-ll is a high-performance, table-based LL(1) parser generator for Ruby. The
+ parser driver is written in C/Java to ensure good runtime performance; the
+ compiler is written entirely in Ruby.
+
+ ruby-ll was written to serve as a fast and easy-to-use alternative to
+ [Racc][racc] for the various parsers used in [Oga][oga]. However, ruby-ll isn't
+ limited to Oga; you can use it to write a parser for any language that can be
+ represented using an LL(1) grammar.
+
+ ruby-ll is self-hosting, which allows one to use ruby-ll to modify its own
+ parser. Self-hosting was achieved by bootstrapping the parser using a Racc
+ parser that outputs the same AST as the ruby-ll parser. The Racc parser remains
+ in the repository for historical purposes and in case it's ever needed again; it
+ can be found in [bootstrap/parser.y](lib/ll/bootstrap/parser.y).
+
+ For more information on LL parsing, see
+ <https://en.wikipedia.org/wiki/LL_parser>.
+
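To make "table-based" concrete, here is a minimal sketch of a predictive LL(1) driver in plain Ruby. It is not ruby-ll's driver, which is written in C/Java and whose internals are not described here. The grammar, the `TABLE` layout and the `ll1_accepts?` helper are all hypothetical. The driver keeps a stack of expected symbols, and for each rule on top of the stack it picks the branch to use by looking only at the current terminal.

```ruby
# Hypothetical grammar, shown in ruby-ll syntax for reference:
#
#   %terminals T_NUM T_COMMA;
#
#   list      = T_NUM list_tail;
#   list_tail = T_COMMA T_NUM list_tail | _;
TERMINALS = %i[T_NUM T_COMMA EOF].freeze

# Parse table: [rule, lookahead terminal] => steps of the branch to use.
TABLE = {
  [:list, :T_NUM]        => %i[T_NUM list_tail],
  [:list_tail, :T_COMMA] => %i[T_COMMA T_NUM list_tail],
  # Epsilon branch, chosen when the lookahead is in FOLLOW(list_tail) = {EOF}.
  [:list_tail, :EOF]     => []
}.freeze

# Returns true if the sequence of token types forms a valid `list`.
def ll1_accepts?(token_types)
  input = token_types + [:EOF]
  stack = [:EOF, :list] # start rule on top, end marker at the bottom
  pos = 0

  until stack.empty?
    top = stack.pop
    lookahead = input[pos]

    if TERMINALS.include?(top)
      return false unless top == lookahead # a terminal must match the input
      pos += 1
    else
      steps = TABLE[[top, lookahead]]
      return false unless steps # no table entry: syntax error
      stack.concat(steps.reverse) # push so the branch's first step is on top
    end
  end

  pos == input.length
end

ll1_accepts?(%i[T_NUM T_COMMA T_NUM]) # => true
ll1_accepts?(%i[T_NUM T_COMMA])       # => false
```

Because every table cell holds at most one branch, a first/first conflict corresponds to two branches of the same rule competing for the same cell.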
+ ## Features
+
+ * Support for detecting first/first and first/follow conflicts
+ * clang-like error/warning messages to ease debugging of parsers
+ * High performance and a low memory footprint
+
+ ## Requirements
+
+ | Ruby     | Required      | Recommended |
+ |:---------|:--------------|:------------|
+ | MRI      | >= 1.9.3      | >= 2.1.0    |
+ | Rubinius | >= 2.2        | >= 2.5.0    |
+ | JRuby    | >= 1.7        | >= 1.7.0    |
+ | Maglev   | Not supported |             |
+ | Topaz    | Not supported |             |
+ | mruby    | Not supported |             |
+
+ For MRI/Rubinius you'll need a C90-compatible compiler such as clang or gcc. For
+ JRuby you don't need any compilers to be installed, as the .jar is packaged with
+ the gem itself.
+
+ When hacking on ruby-ll you'll also need to have the following installed:
+
+ * Ragel 6 for building the grammar lexer
+ * javac for building the JRuby extension
+
+ ## Usage
+
+ The CLI takes a grammar input file (see below for the exact syntax) with the
+ extension `.rll` and turns it into a corresponding Ruby file. For example:
+
+     ruby-ll lib/my-gem/parser.rll
+
+ This would result in the parser being written to `lib/my-gem/parser.rb`. If you
+ want to customize the output path, you can do so using the `-o` / `--output`
+ option:
+
+     ruby-ll lib/my-gem/parser.rll -o lib/my-gem/my-parser.rb
+
+ By default ruby-ll adds various `require` calls to ensure you can load the
+ parser _without_ having to load all of ruby-ll (e.g. the compiler code). If you
+ want to disable this behaviour, you can use the `--no-requires` option when
+ processing a grammar:
+
+     ruby-ll lib/my-gem/parser.rll --no-requires
+
+ Once generated, the parser class can be used like any other parser. To start
+ parsing, simply call the `parse` method:
+
+     parser = MyGem::Parser.new
+
+     parser.parse
+
+ The return value of this method is whatever the root rule (i.e. the first rule
+ defined) returns.
+
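If you regenerate parsers from a script or a Rake task, you can assemble the documented options programmatically. The sketch below shows one way you might wrap the CLI; it is not part of ruby-ll. `ruby_ll_command` and `default_output_path` are hypothetical helpers, and they use only the `-o` and `--no-requires` options shown above.

```ruby
# Hypothetical helper: builds an argv array for the ruby-ll CLI using only
# the options documented above.
def ruby_ll_command(grammar, output: nil, requires: true)
  command = ['ruby-ll', grammar]
  command.push('-o', output) if output       # custom output path
  command << '--no-requires' unless requires # skip the generated `require` calls
  command
end

# Mirrors the documented default: `parser.rll` is written to `parser.rb`
# in the same directory.
def default_output_path(grammar)
  grammar.sub(/\.rll\z/, '.rb')
end

command = ruby_ll_command('lib/my-gem/parser.rll', requires: false)
# system(*command) would then run the CLI, provided the gem is installed.
```

Passing the array to `system(*command)` instead of a single string avoids going through a shell, so paths containing spaces need no quoting.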
+ ## Grammar Syntax
+
+ The syntax of a ruby-ll grammar file is fairly simple and consists of
+ directives, rules, comments and code blocks.
+
+ Directives can be seen as configuration options, for example to set the name of
+ the parser class. Rules are, well, the parsing rules. Code blocks can be used to
+ associate Ruby code with either a branch of a rule or a certain section of the
+ parser (the header or its inner body).
+
+ Directives and rules must be terminated with a semicolon; this is not needed
+ for `%inner` / `%header` blocks.
+
+ For a full example, see ruby-ll's own parser located at
+ [lib/ll/parser.rll](lib/ll/parser.rll).
+
+ ### Comments
+
+ Comments start with a hash (`#`) sign and continue until the end of the line,
+ just like Ruby. Example:
+
+     # Some say comments are a code smell.
+
+ ### %name
+
+ The `%name` directive is used to set the full name/namespace of the parser
+ class. The name consists of a single identifier or multiple identifiers
+ separated by `::` (just like Ruby). Some examples:
+
+     %name A;
+     %name A::B;
+     %name A::B::C;
+
+ The last identifier is used as the actual class name. This class will be nested
+ inside a module for every other segment leading up to the last one. For example,
+ this:
+
+     %name A;
+
+ Gets turned into this:
+
+     class A < LL::Driver
+
+     end
+
+ While this:
+
+     %name A::B::C;
+
+ Gets turned into this:
+
+     module A
+       module B
+         class C < LL::Driver
+
+         end
+       end
+     end
+
+ By nesting the parser class in modules, any constants in the enclosing scopes
+ can be referred to without requiring the use of a full namespace. For example,
+ the constant `A::B::X` can just be referred to as `X` in the above example.
+
+ Multiple calls to this directive will result in previous values being
+ overwritten.
+
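The constant lookup benefit comes from Ruby's lexical scoping (`Module.nesting`), and you can check it in plain Ruby without ruby-ll. In this sketch `X`, `C` and `D` are illustrative names, and `LL::Driver` is left out so that the snippet runs on its own.

```ruby
module A
  module B
    X = 42

    # Nested definition, as generated for `%name A::B::C;`: the lexical
    # scopes A::B and A are searched, so the bare `X` resolves to A::B::X.
    class C
      def x
        X
      end
    end
  end
end

# Compact definition: only A::B::D itself is in lexical scope, so a bare `X`
# is looked up in D, its ancestors and Object, and is not found.
class A::B::D
  def x
    X
  end
end

A::B::C.new.x # => 42
# A::B::D.new.x would raise NameError.
```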
+ ### %terminals
+
+ The `%terminals` directive is used to list one or more terminals of the grammar.
+ Terminals are identifiers separated by spaces. For example:
+
+     %terminals A B C;
+
+ This would define 3 terminals: `A`, `B` and `C`. While there's no specific
+ requirement as to how you name your terminals, it's common practice to capitalize
+ them and prefix them with `T_`, like so:
+
+     %terminals T_A T_B T_C;
+
+ Multiple calls to this directive will result in the terminals being appended to
+ the existing list.
+
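The two directives accumulate state differently: a later `%name` overwrites an earlier one, while `%terminals` values accumulate. A small accumulator can model this. It is a hypothetical sketch, not ruby-ll's compiler. `DirectiveState` and its regular expressions cover only the two directive forms shown above.

```ruby
# Hypothetical model of how %name and %terminals accumulate state.
class DirectiveState
  NAME = /\A%name\s+([A-Za-z_]\w*(?:::[A-Za-z_]\w*)*)\s*;\z/
  TERMINALS = /\A%terminals\s+(.+?)\s*;\z/

  attr_reader :name, :terminals

  def initialize
    @name = nil
    @terminals = []
  end

  def apply(directive)
    case directive
    when NAME
      @name = Regexp.last_match(1) # a later %name replaces the earlier one
    when TERMINALS
      @terminals.concat(Regexp.last_match(1).split) # appended to the list
    else
      raise ArgumentError, "unsupported directive: #{directive}"
    end
    self
  end

  # Splits the name into the enclosing modules and the class name.
  def namespace
    *modules, klass = name.split('::')
    [modules, klass]
  end
end

state = DirectiveState.new
['%name A;', '%terminals T_A T_B;', '%name A::B::C;', '%terminals T_C;'].each do |line|
  state.apply(line)
end

state.name      # => "A::B::C"
state.terminals # => ["T_A", "T_B", "T_C"]
state.namespace # => [["A", "B"], "C"]
```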
+ ### %inner
+
+ The `%inner` directive can be used to specify a code block that should be placed
+ inside the parser's body, just after the section containing all parsing tables.
+ This directive should be used for adding custom methods and such to the parser.
+ For example:
+
+     %inner
+     {
+       def initialize(input)
+         @input = input
+       end
+     }
+
+ This would result in the following:
+
+     class A < LL::Driver
+       def initialize(input)
+         @input = input
+       end
+     end
+
+ Curly braces can be placed either on the same line as the `%inner` directive or
+ on a new line; it's up to you.
+
+ Unlike regular directives, this directive should not be terminated with a
+ semicolon.
+
+ ### %header
+
+ The `%header` directive is similar to the `%inner` directive in that it can be
+ used to add a code block to the parser. The code of this directive is placed
+ just before the `class` definition of the parser. This directive can be used to
+ add documentation to the parser class. For example:
+
+     %header
+     {
+       # Hello world
+     }
+
+ This would result in the following:
+
+     # Hello world
+     class A < LL::Driver
+     end
+
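A small source generator can summarize where `%name`, `%header` and `%inner` place their code. It is a hypothetical sketch, not ruby-ll's code generator. It omits the parsing tables and `require` calls a real parser contains. It also assumes that, when modules are involved, the header goes directly above the `class` line; the text above only shows the single-segment case.

```ruby
# Hypothetical generator illustrating where each directive's code ends up.
def render_parser(name, header: nil, inner: nil)
  *modules, klass = name.split('::')
  lines = []

  # %name: one `module` per leading segment, each nested one level deeper.
  modules.each_with_index do |mod, depth|
    lines << "#{'  ' * depth}module #{mod}"
  end

  indent = '  ' * modules.length

  # %header: assumed to sit directly above the `class` line.
  header.to_s.each_line { |line| lines << "#{indent}#{line.chomp}" }

  lines << "#{indent}class #{klass} < LL::Driver"
  # A real parser has its parsing tables here; %inner code comes after them.
  inner.to_s.each_line { |line| lines << "#{indent}  #{line.chomp}" }
  lines << "#{indent}end"

  # Close the modules from the innermost one outwards.
  (modules.length - 1).downto(0) { |depth| lines << "#{'  ' * depth}end" }

  lines.join("\n") + "\n"
end

puts render_parser('A::B::C', header: '# Hello world')
```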
+ ### Rules
+
+ Rules consist of a name followed by an equals sign (`=`) followed by one or
+ more branches. Branches are separated using a pipe (`|`). A branch can consist
+ of one or more steps, or an epsilon. Branches can be followed by a code block
+ starting with `{` and ending with `}`. A rule must be terminated using a
+ semicolon.
+
+ An epsilon is represented as a single underscore (`_`) and denotes an empty
+ branch, i.e. one that matches nothing.
+
+ A simple example:
+
+     %terminals A B;
+
+     numbers = A | B;
+
+ Here the rule `numbers` is defined and has two branches. If we wanted a rule
+ that would match terminal `A` or nothing we'd use the following:
+
+     %terminals A;
+
+     numbers = A | _;
+
+ Code blocks can also be added:
+
+     numbers
+       = A { 'A' }
+       | B { 'B' }
+       ;
+
+ When terminal `A` is processed the returned value is "A"; for terminal `B` the
+ returned value is "B".
+
+ Code blocks have access to an array called `val` which contains the values of
+ every step of a branch. For example:
+
+     numbers = A B { val };
+
+ Here `val` would be `[A, B]`, an array holding the values of the `A` and `B`
+ steps. Since `val` is just an Array you can also return specific elements from it:
+
+     numbers = A B { val[0] };
+
+ Values returned by code blocks are passed to whichever rule invoked the current
+ rule. This allows code blocks to be used for building ASTs and the like. If no
+ explicit code block is defined, `val` is returned as is.
+
+ ruby-ll parsers recurse into rules before unwinding, which means that the
+ innermost rule is processed first.
+
+ Branches of a rule can also refer to other rules:
+
+     numbers = A other_rule;
+     other_rule = B;
+
+ The value for `other_rule` in the `numbers` rule would be whatever the
+ `other_rule` rule defined below it returns.
+
+ The grammar compiler reports an error whenever it encounters a rule with the
+ same name as a terminal; as such, the following is invalid:
+
+     %terminals A B;
+
+     A = B;
+
+ It's also an error to redefine an existing rule.
+
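A toy recursive evaluator can reproduce the `val` semantics described above. This sketch is hypothetical and much simpler than ruby-ll's table-driven driver. Grammars are Ruby hashes, every branch must start with a terminal (so there are no epsilons), and `Token` is a stand-in for whatever your lexer produces. The sketch shows three things: each step contributes one element to `val`, a branch without a code block returns `val` unchanged, and the innermost rule finishes first.

```ruby
# Stand-in token: a terminal type plus its value.
Token = Struct.new(:type, :value)

class ToyEvaluator
  # rules: { rule_name => [{ steps: [...], action: callable_or_nil }, ...] }
  def initialize(rules, tokens)
    @rules = rules
    @tokens = tokens.dup
  end

  def parse(rule)
    lookahead = @tokens.first&.type
    branch = @rules.fetch(rule).find { |b| b[:steps].first == lookahead }
    raise ArgumentError, "no branch of #{rule} starts with #{lookahead.inspect}" unless branch

    # One entry per step. Rule steps are evaluated completely (recursively)
    # before this branch's own code block runs.
    val = branch[:steps].map do |step|
      if @rules.key?(step)
        parse(step)
      else
        token = @tokens.shift
        raise ArgumentError, "expected #{step}, got #{token.inspect}" unless token && token.type == step
        token.value
      end
    end

    # Without a code block, `val` itself is the rule's value.
    branch[:action] ? branch[:action].call(val) : val
  end
end

# numbers = A other_rule { val };  other_rule = B;
rules = {
  numbers:    [{ steps: %i[A other_rule], action: ->(val) { val } }],
  other_rule: [{ steps: %i[B], action: nil }]
}

ToyEvaluator.new(rules, [Token.new(:A, 'a'), Token.new(:B, 'b')]).parse(:numbers)
# => ["a", ["b"]]
```

Note that `other_rule` has no code block, so its value is its own `val` array, `["b"]`.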
+ ## Conflicts
+
+ LL(1) grammars can have two kinds of conflicts in a rule:
+
+ * first/first
+ * first/follow
+
+ ### first/first
+
+ A first/first conflict means that multiple branches of a rule start with the
+ same terminal, resulting in the parser being unable to choose which branch to
+ use. For example:
+
+     %terminals A B;
+
+     rule = A | A B;
+
+ This would result in the following output:
+
+     example.rll:5:1:error: first/first conflict, multiple branches start with the same terminals
+     rule = A | A B;
+     ^
+     example.rll:5:8:error: branch starts with: A
+     rule = A | A B;
+            ^
+     example.rll:5:12:error: branch starts with: A
+     rule = A | A B;
+                ^
+
+ To solve a first/first conflict you'll have to left-factor the rule, moving the
+ common prefix of its branches into a separate rule. For example:
+
+     %name Example;
+
+     %terminals A B;
+
+     rule = A rule_follow;
+     rule_follow = B | _;
+
+ Here the `rule` rule starts with terminal `A` and can optionally be followed by
+ `B`, without introducing any first/first conflicts.
+
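A first/first check comes down to two steps. First, compute the FIRST set of every branch, meaning the terminals it can start with. Then report any overlap between branches of the same rule. The sketch below does this for grammars written as Ruby hashes, where the epsilon branch is written `[:_]`. It is a textbook fixed-point computation written for illustration, not ruby-ll's implementation. `first_sets`, `first_of_sequence` and `first_first_conflicts` are hypothetical names.

```ruby
require 'set'

EPSILON = :_

# FIRST set of a sequence of steps, using the rule FIRST sets known so far.
def first_of_sequence(steps, first, terminals)
  result = Set.new
  steps.each do |step|
    return result << EPSILON if step == EPSILON
    return result << step if terminals.include?(step)

    step_first = first[step]
    result.merge(step_first - [EPSILON])
    return result unless step_first.include?(EPSILON) # step cannot be skipped
  end
  result << EPSILON # every step can produce epsilon
end

# Iterates until no rule's FIRST set grows any more (a fixed point).
def first_sets(grammar, terminals)
  first = Hash.new { |hash, rule| hash[rule] = Set.new }
  loop do
    changed = false
    grammar.each do |rule, branches|
      branches.each do |steps|
        before = first[rule].size
        first[rule].merge(first_of_sequence(steps, first, terminals))
        changed = true if first[rule].size != before
      end
    end
    break unless changed
  end
  first
end

# Returns [rule, branch_a, branch_b, shared_terminals] for every overlap.
def first_first_conflicts(grammar, terminals)
  first = first_sets(grammar, terminals)
  conflicts = []
  grammar.each do |rule, branches|
    sets = branches.map { |steps| first_of_sequence(steps, first, terminals) }
    sets.each_with_index.to_a.combination(2) do |(a, i), (b, j)|
      shared = (a & b) - [EPSILON]
      conflicts << [rule, i, j, shared.to_a] unless shared.empty?
    end
  end
  conflicts
end

# rule = A | A B;
first_first_conflicts({ rule: [%i[A], %i[A B]] }, %i[A B])
# => [[:rule, 0, 1, [:A]]]
```

For the left-factored grammar above (`rule = A rule_follow; rule_follow = B | _;`) the same function returns an empty list.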
+ ### first/follow
+
+ A first/follow conflict occurs when a branch in a rule starts with a step that
+ can produce an epsilon (e.g. a rule with a `_` branch) followed by one or more
+ terminals and/or rules. An example of a first/follow conflict:
+
+     %name Example;
+
+     %terminals A B;
+
+     rule = other_rule B;
+     other_rule = A | _;
+
+ This produces the following errors:
+
+     example.rll:5:14:error: first/follow conflict, branch can start with epsilon and is followed by (non) terminals
+     rule = other_rule B;
+                  ^
+     example.rll:6:18:error: epsilon originates from here
+     other_rule = A | _;
+                      ^
+
+ There's no specific procedure for solving such a conflict other than
+ removing the starting epsilon.
+
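To detect the situation described above, you only need to know which rules can produce an epsilon (are "nullable"). The following hypothetical sketch flags any branch whose first step is nullable and is followed by further steps, which is the condition the error message describes. Judging by the example output, this check appears to be stricter than the textbook LL(1) condition. FIRST(`other_rule`) is {`A`, epsilon} and `other_rule` is only ever followed by `B`, so a textbook parse table would have no conflict here. Inlining `other_rule` gives the epsilon-free `rule = A B | B;`, which the sketch accepts.

```ruby
EPSILON = :_

# Rules that can produce epsilon, found by iterating until nothing changes.
def nullable_rules(grammar)
  nullable = []
  loop do
    added = grammar.keys.reject { |rule| nullable.include?(rule) }.select do |rule|
      grammar[rule].any? do |steps|
        steps.all? { |step| step == EPSILON || nullable.include?(step) }
      end
    end
    break if added.empty?

    nullable.concat(added)
  end
  nullable
end

# Flags [rule, branch_index, first_step] when a branch starts with a nullable
# rule that is followed by more steps.
def first_follow_conflicts(grammar)
  nullable = nullable_rules(grammar)
  conflicts = []
  grammar.each do |rule, branches|
    branches.each_with_index do |steps, index|
      head, *rest = steps
      conflicts << [rule, index, head] if nullable.include?(head) && !rest.empty?
    end
  end
  conflicts
end

# rule = other_rule B;  other_rule = A | _;
first_follow_conflicts({ rule: [%i[other_rule B]], other_rule: [%i[A], [:_]] })
# => [[:rule, 0, :other_rule]]
```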
+ ## Performance
+
+ One of the goals of ruby-ll is to be faster than existing parser generators,
+ Racc in particular. How much faster ruby-ll will be depends on the use case. For
+ example, for the benchmark
+ [benchmark/ll/simple\_json\_bench.rb](benchmark/ll/simple_json_bench.rb) the
+ performance gains of ruby-ll over Racc are as follows:
+
+ | Ruby            | Speedup over Racc |
+ |:----------------|:------------------|
+ | MRI 2.2         | 1.75x             |
+ | Rubinius 2.5.2  | 3.85x             |
+ | JRuby 1.7.18    | 6.44x             |
+ | JRuby 9000 pre1 | 7.50x             |
+
+ This benchmark was run on a Thinkpad T520 laptop, so it's probably best to run
+ the benchmark yourself to see how it behaves on your platform.
+
+ Depending on the complexity of your parser you might end up with different
+ numbers. The above metrics are simply an indication of the maximum
+ performance gain of ruby-ll compared to Racc.
+
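To get comparable numbers for your own grammar, you can time the same input with both parsers and divide the elapsed times. The sketch below assumes the table's figures are such a ratio (Racc time divided by ruby-ll time), which the text does not state explicitly. `speedup` is a hypothetical helper built on Ruby's standard `Benchmark.realtime`. The two lambdas are placeholders for calls to your Racc and ruby-ll parsers.

```ruby
require 'benchmark'

# Hypothetical helper: how many times faster `candidate` runs than `baseline`.
def speedup(baseline, candidate, iterations: 1_000)
  # Warm up once so one-off costs (e.g. lazy loading, JIT) are not measured.
  baseline.call
  candidate.call

  baseline_time  = Benchmark.realtime { iterations.times { baseline.call } }
  candidate_time = Benchmark.realtime { iterations.times { candidate.call } }

  baseline_time / candidate_time
end

# Placeholder workloads; replace them with e.g. `-> { MyRaccParser.new.parse(input) }`,
# where MyRaccParser is a hypothetical name for your own parser class.
slow = -> { 200.times.map(&:to_s).join }
fast = -> { 20.times.map(&:to_s).join }

puts format('%.2fx', speedup(slow, fast))
```

Timings from a single run are noisy, so repeat the measurement a few times before drawing conclusions.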
+ ## Thread Safety
+
+ Each parser instance generated by ruby-ll has its own internal, mutable state.
+ As a result, a single instance of your parser _cannot_ be used by multiple
+ threads in parallel. If it weren't for MRI's C API (specifically due to how
+ `rb_block_call` works) this wouldn't have been an issue.
+
+ To mitigate the above, simply create a new instance of your parser every time
+ you need it and let the GC clean it up once you're done. This _will_ introduce a
+ slight allocation overhead, but it beats having to deal with race conditions.
+
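In practice, the advice above means constructing the parser inside each thread instead of sharing one instance. In the sketch below, `StatefulParser` is a hypothetical stand-in for a generated parser. Like the parsers described above, it keeps mutable per-instance state. Substitute your own parser class and input.

```ruby
# Hypothetical stand-in for a generated parser: it keeps mutable
# per-instance state, so one instance must not be shared between threads.
class StatefulParser
  def initialize(input)
    @input = input
    @stack = []
  end

  def parse
    @input.each_char { |char| @stack << char.to_i }
    @stack.sum
  end
end

# One fresh parser per thread, as recommended above; Thread#value waits for
# the thread and returns its result.
def parse_all(inputs)
  threads = inputs.map { |input| Thread.new { StatefulParser.new(input).parse } }
  threads.map(&:value)
end

parse_all(%w[12 345 6]) # => [3, 12, 6]
```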
+ ## License
+
+ All source code in this repository is licensed under the MIT license unless
+ specified otherwise. A copy of this license can be found in the file "LICENSE"
+ in the root directory of this repository.
+
+ [racc]: https://github.com/tenderlove/racc
+ [oga]: https://github.com/yorickpeterse/oga
@@ -0,0 +1,5 @@
+ #!/usr/bin/env ruby
+
+ require_relative '../lib/ll'
+
+ LL::CLI.new.run
@@ -0,0 +1,25 @@
+ # Developer's Certificate of Origin 1.0
+
+ By making a contribution to this project, I certify that:
+
+ 1. The contribution was created in whole or in part by me and I
+    have the right to submit it under the open source license
+    indicated in the file LICENSE; or
+
+ 2. The contribution is based upon previous work that, to the best
+    of my knowledge, is covered under an appropriate open source
+    license and I have the right under that license to submit that
+    work with modifications, whether created in whole or in part
+    by me, under the same open source license (unless I am
+    permitted to submit under a different license), as indicated
+    in the file LICENSE; or
+
+ 3. The contribution was provided directly to me by some other
+    person who certified (1), (2) or (3) and I have not modified
+    it.
+
+ 4. I understand and agree that this project and the contribution
+    are public and that a record of the contribution (including all
+    personal information I submit with it, including my sign-off) is
+    maintained indefinitely and may be redistributed consistent with
+    this project or the open source license(s) involved.