rltk 2.2.1 → 3.0.0
- checksums.yaml +7 -0
- data/LICENSE +12 -12
- data/README.md +458 -285
- data/Rakefile +99 -92
- data/lib/rltk/ast.rb +221 -126
- data/lib/rltk/cfg.rb +218 -239
- data/lib/rltk/cg/basic_block.rb +1 -1
- data/lib/rltk/cg/bindings.rb +9 -26
- data/lib/rltk/cg/builder.rb +40 -8
- data/lib/rltk/cg/context.rb +1 -1
- data/lib/rltk/cg/contractor.rb +51 -0
- data/lib/rltk/cg/execution_engine.rb +45 -8
- data/lib/rltk/cg/function.rb +12 -2
- data/lib/rltk/cg/generated_bindings.rb +2541 -575
- data/lib/rltk/cg/generic_value.rb +2 -2
- data/lib/rltk/cg/instruction.rb +104 -83
- data/lib/rltk/cg/llvm.rb +44 -3
- data/lib/rltk/cg/memory_buffer.rb +22 -5
- data/lib/rltk/cg/module.rb +85 -36
- data/lib/rltk/cg/old_generated_bindings.rb +6152 -0
- data/lib/rltk/cg/pass_manager.rb +87 -43
- data/lib/rltk/cg/support.rb +2 -4
- data/lib/rltk/cg/target.rb +158 -28
- data/lib/rltk/cg/triple.rb +8 -8
- data/lib/rltk/cg/type.rb +69 -25
- data/lib/rltk/cg/value.rb +107 -66
- data/lib/rltk/cg.rb +16 -17
- data/lib/rltk/lexer.rb +21 -11
- data/lib/rltk/lexers/calculator.rb +1 -1
- data/lib/rltk/lexers/ebnf.rb +8 -7
- data/lib/rltk/parser.rb +300 -247
- data/lib/rltk/parsers/infix_calc.rb +1 -1
- data/lib/rltk/parsers/postfix_calc.rb +2 -2
- data/lib/rltk/parsers/prefix_calc.rb +2 -2
- data/lib/rltk/token.rb +1 -2
- data/lib/rltk/version.rb +3 -3
- data/lib/rltk.rb +6 -6
- data/test/cg/tc_basic_block.rb +83 -0
- data/test/cg/tc_control_flow.rb +191 -0
- data/test/cg/tc_function.rb +54 -0
- data/test/cg/tc_generic_value.rb +33 -0
- data/test/cg/tc_instruction.rb +256 -0
- data/test/cg/tc_llvm.rb +25 -0
- data/test/cg/tc_math.rb +88 -0
- data/test/cg/tc_module.rb +89 -0
- data/test/cg/tc_transforms.rb +68 -0
- data/test/cg/tc_type.rb +69 -0
- data/test/cg/tc_value.rb +151 -0
- data/test/cg/ts_cg.rb +23 -0
- data/test/tc_ast.rb +105 -8
- data/test/tc_cfg.rb +63 -48
- data/test/tc_lexer.rb +84 -96
- data/test/tc_parser.rb +224 -52
- data/test/tc_token.rb +6 -6
- data/test/ts_rltk.rb +12 -15
- metadata +149 -75
- data/lib/rltk/cg/generated_extended_bindings.rb +0 -287
- data/lib/rltk/util/abstract_class.rb +0 -25
- data/lib/rltk/util/monkeys.rb +0 -129
data/README.md
CHANGED
@@ -12,7 +12,7 @@ In addition, RLTK includes several ready-made lexers and parsers and a Turing-co
|
|
12
12
|
|
13
13
|
## Why Use RLTK
|
14
14
|
|
15
|
-
Here are some reasons to use RLTK to build your lexers, parsers, and abstract syntax trees:
|
15
|
+
Here are some reasons to use RLTK to build your lexers, parsers, and abstract syntax trees, as well as to generate LLVM IR and native object files:
|
16
16
|
|
17
17
|
* **Lexer and Parser Definitions in Ruby** - Many tools require you to write your lexer/parser definitions in their own format, which is then processed and used to generate Ruby code. RLTK lexers/parsers are written entirely in Ruby and use syntax you are already familiar with.
|
18
18
|
|
@@ -34,6 +34,10 @@ Here are some reasons to use RLTK to build your lexers, parsers, and abstract sy
|
|
34
34
|
|
35
35
|
* **Parse Tree Graphs** - RLTK parsers can print parse trees (in the DOT language) of accepted strings.
|
36
36
|
|
37
|
+
* **LLVM Bindings** - RLTK provides wrappers for most of the C LLVM bindings.
|
38
|
+
|
39
|
+
* **The Contractor** - LLVM's method of building instructions is a bit cumbersome, and is very imperative in style. RLTK provides the Contractor class to make things easier.
|
40
|
+
|
37
41
|
* **Documentation** - We have it!
|
38
42
|
|
39
43
|
* **I Eat My Own Dog Food** - I'm using RLTK for my own projects so if there is a bug I'll most likely be the first one to know.
|
@@ -42,19 +46,21 @@ Here are some reasons to use RLTK to build your lexers, parsers, and abstract sy
|
|
42
46
|
|
43
47
|
To create your own lexer using RLTK you simply need to subclass the {RLTK::Lexer} class and define the *rules* that will be used for matching text and generating tokens. Here we see a simple lexer for a calculator:
|
44
48
|
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
49
|
+
```Ruby
|
50
|
+
class Calculator < RLTK::Lexer
|
51
|
+
rule(/\+/) { :PLS }
|
52
|
+
rule(/-/) { :SUB }
|
53
|
+
rule(/\*/) { :MUL }
|
54
|
+
rule(/\//) { :DIV }
|
50
55
|
|
51
|
-
|
52
|
-
|
56
|
+
rule(/\(/) { :LPAREN }
|
57
|
+
rule(/\)/) { :RPAREN }
|
53
58
|
|
54
|
-
|
59
|
+
rule(/[0-9]+/) { |t| [:NUM, t.to_i] }
|
55
60
|
|
56
|
-
|
57
|
-
|
61
|
+
rule(/\s/)
|
62
|
+
end
|
63
|
+
```
|
58
64
|
|
59
65
|
The {RLTK::Lexer.rule} method's first argument is the regular expression used for matching text. The block passed to the function is the action that executes when a substring is matched by the rule. These blocks must return the *type* of the token (which must be in ALL CAPS; see the Parsers section), and optionally a *value*. In the latter case you must return an array containing the *type* and *value*, which you can see an example of in the Calculator lexer shown above. The values returned by the proc object are used to build a {RLTK::Token} object that includes the *type* and *value* information, as well as information about the line number the token was found on, the offset from the beginning of the line to the start of the token, and the length of the token's text. If the *type* value returned by the proc is `nil` the input is discarded and no token is produced.
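To make this flow concrete, here is a minimal sketch in plain Ruby of how a rule table and its action blocks could turn text into tokens. It does not use RLTK: `SketchToken`, `SKETCH_RULES`, and `sketch_lex` are hypothetical names, and the sketch records only a character offset and a length rather than everything an {RLTK::Token} carries.

```ruby
require 'strscan'

# Hypothetical stand-in for a token; this is not RLTK::Token.
SketchToken = Struct.new(:type, :value, :offset, :length)

# Each rule pairs a pattern with an action. An action returns a type,
# a [type, value] pair, or nil (meaning "discard the matched text").
SKETCH_RULES = [
  [/\+/,     ->(_text) { :PLS }],
  [/[0-9]+/, ->(text)  { [:NUM, text.to_i] }],
  [/\s/,     ->(_text) { nil }]
].freeze

def sketch_lex(input)
  scanner = StringScanner.new(input)
  tokens  = []

  until scanner.eos?
    # Longest match: try every rule at the current position and keep the longest.
    pattern, action = SKETCH_RULES.max_by { |pat, _| scanner.match?(pat) || -1 }
    raise "no rule matches at offset #{scanner.pos}" if scanner.match?(pattern).nil?

    start = scanner.pos
    text  = scanner.scan(pattern)

    # A bare symbol leaves value as nil; an array is split into type and value.
    type, value = action.call(text)
    tokens << SketchToken.new(type, value, start, text.length) unless type.nil?
  end

  tokens
end
```

Calling `sketch_lex('1 + 23')` yields three tokens of types `:NUM`, `:PLS`, and `:NUM`; the whitespace rule returns `nil`, so no token is produced for the spaces.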
|
60
66
|
|
@@ -64,17 +70,19 @@ The {RLTK::Lexer} class provides both {RLTK::Lexer.lex} and {RLTK::Lexer.lex_fil
|
|
64
70
|
|
65
71
|
The proc objects passed to the {RLTK::Lexer.rule} methods are evaluated inside an instance of the {RLTK::Lexer::Environment} class. This gives you access to methods for manipulating the lexer's state and flags (see below). You can also subclass the environment inside your lexer to provide additional functionality to your rule blocks. When doing so you need to ensure that you name your new class Environment, as in the following example:
|
66
72
|
|
67
|
-
|
68
|
-
|
69
|
-
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
|
74
|
-
|
75
|
-
|
76
|
-
|
77
|
-
|
73
|
+
```Ruby
|
74
|
+
class MyLexer < RLTK::Lexer
|
75
|
+
...
|
76
|
+
|
77
|
+
class Environment < Environment
|
78
|
+
def helper_function
|
79
|
+
...
|
80
|
+
end
|
81
|
+
|
82
|
+
...
|
83
|
+
end
|
84
|
+
end
|
85
|
+
```
|
78
86
|
|
79
87
|
### Using States
|
80
88
|
|
@@ -89,16 +97,18 @@ The methods used to manipulate state are:
|
|
89
97
|
|
90
98
|
States may be used to easily support nested comments.
|
91
99
|
|
92
|
-
|
93
|
-
|
94
|
-
|
95
|
-
|
96
|
-
|
97
|
-
|
98
|
-
|
99
|
-
|
100
|
-
|
101
|
-
|
100
|
+
```Ruby
|
101
|
+
class StateLexer < RLTK::Lexer
|
102
|
+
rule(/a/) { :A }
|
103
|
+
rule(/\s/)
|
104
|
+
|
105
|
+
rule(/\(\*/) { push_state(:comment) }
|
106
|
+
|
107
|
+
rule(/\(\*/, :comment) { push_state(:comment) }
|
108
|
+
rule(/\*\)/, :comment) { pop_state }
|
109
|
+
rule(/./, :comment)
|
110
|
+
end
|
111
|
+
```
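The state stack behind this example can be illustrated without RLTK. The following is only a sketch: `strip_nested_comments` is a hypothetical helper that mimics `push_state(:comment)` and `pop_state` with an ordinary array, and it returns the text outside comments instead of producing tokens.

```ruby
require 'strscan'

def strip_nested_comments(input)
  scanner = StringScanner.new(input)
  states  = [:default]   # the state stack; the last entry is the current state
  kept    = String.new

  until scanner.eos?
    if scanner.scan(/\(\*/)
      states.push(:comment)            # analogous to push_state(:comment)
    elsif states.last == :comment && scanner.scan(/\*\)/)
      states.pop                       # analogous to pop_state
    else
      char = scanner.getch
      kept << char if states.last == :default
    end
  end

  kept
end
```

Because each `(*` pushes another `:comment` entry, the lexer only returns to `:default` after every opening delimiter has been closed, which is what makes nesting work.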
|
102
112
|
|
103
113
|
By default the lexer will start in the `:default` state. To change this, you may use the {RLTK::Lexer.start} method.
|
104
114
|
|
@@ -112,14 +122,16 @@ The lexing environment also maintains a set of *flags*. This set is manipulated
|
|
112
122
|
|
113
123
|
When *rules* are defined they may use a third parameter to specify a list of flags that must be set before the rule is considered when matching substrings. An example of this usage follows:
|
114
124
|
|
115
|
-
|
116
|
-
|
117
|
-
|
118
|
-
|
119
|
-
|
120
|
-
|
121
|
-
|
122
|
-
|
125
|
+
```Ruby
|
126
|
+
class FlagLexer < RLTK::Lexer
|
127
|
+
rule(/a/) { set_flag(:a); :A }
|
128
|
+
|
129
|
+
rule(/\s/)
|
130
|
+
|
131
|
+
rule(/b/, :default, [:a]) { set_flag(:b); :B }
|
132
|
+
rule(/c/, :default, [:a, :b]) { :C }
|
133
|
+
end
|
134
|
+
```
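The effect of the flag list can be sketched in plain Ruby. This is an illustration only, not RLTK's implementation: `SKETCH_FLAG_RULES` and `sketch_flag_lex` are hypothetical, rules match single characters, and the sketch raises an `ArgumentError` when no rule applies.

```ruby
require 'set'

# Hypothetical rule table: :needs lists the flags that must already be set,
# :sets names the flag (if any) that the rule's action sets.
SKETCH_FLAG_RULES = [
  { char: 'a', needs: [],      sets: :a,  type: :A },
  { char: 'b', needs: [:a],    sets: :b,  type: :B },
  { char: 'c', needs: %i[a b], sets: nil, type: :C }
].freeze

def sketch_flag_lex(input)
  flags = Set.new

  # Whitespace is discarded up front, like the bare rule(/\s/) above.
  input.delete(" \t\n").each_char.map do |char|
    # A rule is only considered when all of its required flags are set.
    rule = SKETCH_FLAG_RULES.find do |r|
      r[:char] == char && r[:needs].all? { |flag| flags.include?(flag) }
    end
    raise ArgumentError, "no applicable rule for #{char.inspect}" if rule.nil?

    flags << rule[:sets] if rule[:sets]
    rule[:type]
  end
end
```

With this table `'a b c'` lexes to `[:A, :B, :C]`, while `'b'` on its own is rejected because the `:a` flag has not been set yet.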
|
123
135
|
|
124
136
|
### Instantiating Lexers
|
125
137
|
|
@@ -129,16 +141,57 @@ In addition to using the {RLTK::Lexer.lex} class method you may also instantiate
|
|
129
141
|
|
130
142
|
A RLTK::Lexer may be told to select either the first substring that is found to match a rule or the longest substring to match any rule. The default behavior is to match the longest substring possible, but you can change this by using the {RLTK::Lexer.match_first} method inside your class definition as follows:
|
131
143
|
|
132
|
-
|
133
|
-
|
134
|
-
|
135
|
-
|
136
|
-
|
144
|
+
```Ruby
|
145
|
+
class MyLexer < RLTK::Lexer
|
146
|
+
match_first
|
147
|
+
|
148
|
+
...
|
149
|
+
end
|
150
|
+
```
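The difference between the two strategies is easiest to see on an input that more than one rule can match. The sketch below is plain Ruby with hypothetical names (`SKETCH_PATTERNS`, `first_match`, `longest_match`); it only models the selection step, not RLTK's lexer.

```ruby
# Rules in definition order; \A anchors each pattern at the current position.
SKETCH_PATTERNS = [
  [:IF,    /\Aif/],
  [:IDENT, /\A[a-z]+/]
].freeze

# First match: take the first rule (in definition order) that matches.
def first_match(input)
  SKETCH_PATTERNS.each do |type, pattern|
    match = pattern.match(input)
    return [type, match[0]] if match
  end
  nil
end

# Longest match: try every rule and keep the one that consumes the most text.
def longest_match(input)
  candidates = SKETCH_PATTERNS.filter_map do |type, pattern|
    match = pattern.match(input)
    [type, match[0]] if match
  end
  candidates.max_by { |_type, text| text.length }
end
```

For the input `'iffy'`, `first_match` selects the `:IF` rule and consumes only `'if'`, whereas `longest_match` selects `:IDENT` and consumes the whole word.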
|
137
151
|
|
138
152
|
### Match Data
|
139
153
|
|
140
154
|
Because it isn't RLTK's job to tell you how to write lexers and parsers, the MatchData object from a pattern match is available inside the Lexer::Environment object via the `match` accessor.
|
141
155
|
|
156
|
+
## Context-Free Grammars
|
157
|
+
|
158
|
+
The {RLTK::CFG} class provides an abstraction for context-free grammars. For the purpose of this class terminal symbols appear in **ALL CAPS**, and non-terminal symbols appear in **all lowercase**. Once a grammar is defined the {RLTK::CFG#first_set} and {RLTK::CFG#follow_set} methods can be used to find *first* and *follow* sets.
|
159
|
+
|
160
|
+
### Defining Grammars
|
161
|
+
|
162
|
+
A grammar is defined by first instantiating the {RLTK::CFG} class. The {RLTK::CFG#production} and {RLTK::CFG#clause} methods may then be used to define the productions of the grammar. The `production` method can take a Symbol denoting the left-hand side of the production and a string describing the right-hand side of the production, or the left-hand side symbol and a block. In the first usage a single production is created. In the second usage the block may contain repeated calls to the `clause` method, each call producing a new production with the same left-hand side but a different right-hand side. {RLTK::CFG#clause} may not be called outside of {RLTK::CFG#production}. Below we see a grammar definition that uses both methods:
|
163
|
+
|
164
|
+
```Ruby
|
165
|
+
grammar = RLTK::CFG.new
|
166
|
+
|
167
|
+
grammar.production(:s) do
|
168
|
+
clause('A G D')
|
169
|
+
clause('A a C')
|
170
|
+
clause('B a D')
|
171
|
+
clause('B G C')
|
172
|
+
end
|
173
|
+
|
174
|
+
grammar.production(:a, 'b')
|
175
|
+
grammar.production(:b, 'G')
|
176
|
+
```
|
177
|
+
|
178
|
+
### Extended Backus–Naur Form
|
179
|
+
|
180
|
+
The RLTK::CFG class understands grammars written in the extended Backus–Naur form. This allows you to use the \*, \+, and ? operators in your grammar definitions. When one of these operators is encountered, additional productions are generated. For example, if the right-hand side of a production contained `NUM*` a production of the form `num_star -> | NUM num_star` is added to the grammar. As such, your grammar should not contain productions with similar left-hand sides (e.g. foo_star, bar_question, or baz_plus).
|
181
|
+
|
182
|
+
As these additional productions are added internally to the grammar a callback functionality is provided to let you know when such an event occurs. The callback proc object can either be specified when the CFG object is created, or by using the {RLTK::CFG#callback} method. The callback will receive three arguments: the production generated, the operator that triggered the generation, and a symbol (:first or :second) specifying which clause of the production this callback is for.
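The expansion and the callback can be sketched in plain Ruby. Everything here is hypothetical: `sketch_expand` is not an RLTK method, and while the `*` case follows the `num_star` example above, the productions shown for `+` and `?` are assumptions about the natural equivalents rather than something stated in this document.

```ruby
# Returns the two productions generated for `symbol` followed by `operator`,
# each as [left_hand_side, right_hand_side]. An empty array is the empty
# right-hand side. If a block is given it plays the role of the callback.
def sketch_expand(symbol, operator)
  suffix = { '*' => 'star', '+' => 'plus', '?' => 'question' }.fetch(operator)
  name   = "#{symbol.to_s.downcase}_#{suffix}".to_sym

  productions =
    case operator
    when '*' then [[name, []],       [name, [symbol, name]]]
    when '+' then [[name, [symbol]], [name, [symbol, name]]] # assumed form
    when '?' then [[name, []],       [name, [symbol]]]       # assumed form
    end

  if block_given?
    productions.each_with_index do |production, index|
      yield production, operator, (index.zero? ? :first : :second)
    end
  end

  productions
end
```

For example, `sketch_expand(:NUM, '*')` returns the two clauses of `num_star`, and a block passed to it is called once per clause with `:first` and then `:second`.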
|
183
|
+
|
184
|
+
### Helper Functions
|
185
|
+
|
186
|
+
Once a grammar has been defined you can use the following functions to obtain information about it:
|
187
|
+
|
188
|
+
* {RLTK::CFG#first_set} - Returns the *first set* for the provided symbol or sentence.
|
189
|
+
* {RLTK::CFG#follow_set} - Returns the *follow set* for the provided symbol.
|
190
|
+
* {RLTK::CFG#nonterms} - Returns a list of the non-terminal symbols used in the grammar's definition.
|
191
|
+
* {RLTK::CFG#productions} - Provides either a hash or array of the grammar's productions.
|
192
|
+
* {RLTK::CFG#symbols} - Returns a list of all symbols used in the grammar's definition.
|
193
|
+
* {RLTK::CFG#terms} - Returns a list of the terminal symbols used in the grammar's definition.
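
For readers unfamiliar with *first sets*, the following sketch computes them with the textbook fixed-point algorithm. It is plain Ruby and not RLTK's implementation: `SKETCH_GRAMMAR`, `terminal?`, and `sketch_first_sets` are hypothetical names, and right-hand sides are written as arrays of symbols instead of strings. The grammar is the one from the Defining Grammars example.

```ruby
require 'set'

# Left-hand side => list of right-hand sides. An empty array would be an
# empty production.
SKETCH_GRAMMAR = {
  s: [%i[A G D], %i[A a C], %i[B a D], %i[B G C]],
  a: [%i[b]],
  b: [%i[G]]
}.freeze

# Terminals are ALL CAPS, non-terminals are all lowercase.
def terminal?(symbol)
  symbol.to_s == symbol.to_s.upcase
end

def sketch_first_sets(grammar)
  first    = Hash.new { |hash, key| hash[key] = Set.new }
  nullable = Set.new
  changed  = true

  # Repeat until a full pass adds nothing new.
  while changed
    changed = false
    grammar.each do |lhs, alternatives|
      alternatives.each do |rhs|
        before       = [first[lhs].size, nullable.size]
        all_nullable = true

        rhs.each do |symbol|
          if terminal?(symbol)
            first[lhs] << symbol
            all_nullable = false
          else
            first[lhs].merge(first[symbol])
            all_nullable = nullable.include?(symbol)
          end
          break unless all_nullable
        end

        nullable << lhs if all_nullable
        changed ||= before != [first[lhs].size, nullable.size]
      end
    end
  end

  first
end
```

For this grammar the sketch finds that `s` can start with `A` or `B`, while both `a` and `b` can only start with `G`.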
|
194
|
+
|
142
195
|
## Parsers
|
143
196
|
|
144
197
|
To create a parser using RLTK simply subclass RLTK::Parser, define the productions of the grammar you wish to parse, and call `finalize`. During finalization RLTK will build an LALR(1) parsing table, which may contain conflicts that can't be resolved with LALR(1) lookahead sets or precedence/associativity information. Traditionally, when parser generators such as **YACC** encounter conflicts during parsing table generation they will resolve shift/reduce conflicts in favor of shifts and reduce/reduce conflicts in favor of the production that was defined first. This means that the generated parsers can't handle ambiguous grammars.
|
@@ -149,18 +202,20 @@ RLTK parsers, on the other hand, can handle *all* context-free grammars by forki
|
|
149
202
|
|
150
203
|
Let us look at the simple prefix calculator included with RLTK:
|
151
204
|
|
152
|
-
|
153
|
-
|
154
|
-
|
155
|
-
|
156
|
-
|
157
|
-
|
158
|
-
|
159
|
-
|
160
|
-
|
161
|
-
|
162
|
-
|
163
|
-
|
205
|
+
```Ruby
|
206
|
+
class PrefixCalc < RLTK::Parser
|
207
|
+
production(:e) do
|
208
|
+
clause('NUM') {|n| n}
|
209
|
+
|
210
|
+
clause('PLS e e') { |_, e0, e1| e0 + e1 }
|
211
|
+
clause('SUB e e') { |_, e0, e1| e0 - e1 }
|
212
|
+
clause('MUL e e') { |_, e0, e1| e0 * e1 }
|
213
|
+
clause('DIV e e') { |_, e0, e1| e0 / e1 }
|
214
|
+
end
|
215
|
+
|
216
|
+
finalize
|
217
|
+
end
|
218
|
+
```
|
164
219
|
|
165
220
|
The parser uses the same method for defining productions as the {RLTK::CFG} class. In fact, the parser forwards the {RLTK::Parser.production} and {RLTK::Parser.clause} method invocations to an internal {RLTK::CFG} object after removing the parser specific information. To see a detailed description of grammar definitions please read the Context-Free Grammars section above.
|
166
221
|
|
@@ -172,39 +227,67 @@ The default starting symbol of the grammar is the left-hand side of the first pr
|
|
172
227
|
|
173
228
|
### Shortcuts
|
174
229
|
|
175
|
-
RLTK provides several shortcuts for common grammar constructs. Right now these shortcuts include the {RLTK::Parser.
|
230
|
+
RLTK provides several shortcuts for common grammar constructs. Right now these shortcuts include the {RLTK::Parser.list} and {RLTK::Parser.nonempty_list} methods. A list may contain 0, 1, or more elements, with an optional token or tokens separating each element. A non-empty list contains **at least** 1 element. A list with only a single list element and an empty separator is equivalent to the Kleene star. Similarly, a non-empty list with only a single list element and an empty separator is equivalent to the Kleene plus.
|
176
231
|
|
177
232
|
This example shows how these shortcuts may be used to define a list of integers separated by a `:COMMA` token:
|
178
233
|
|
179
|
-
|
180
|
-
|
181
|
-
|
182
|
-
|
183
|
-
|
234
|
+
```Ruby
|
235
|
+
class ListParser < RLTK::Parser
|
236
|
+
nonempty_list(:int_list, :INT, :COMMA)
|
237
|
+
|
238
|
+
finalize
|
239
|
+
end
|
240
|
+
```
|
184
241
|
|
185
242
|
If you wanted to define a list of floats or integers you could define your parser like this:
|
186
243
|
|
187
|
-
|
188
|
-
|
189
|
-
|
190
|
-
|
191
|
-
|
244
|
+
```Ruby
|
245
|
+
class ListParser < RLTK::Parser
|
246
|
+
nonempty_list(:mixed_list, [:INT, :FLOAT], :COMMA)
|
247
|
+
|
248
|
+
finalize
|
249
|
+
end
|
250
|
+
```
|
251
|
+
|
252
|
+
If you don't want to require a separator you can do this:
|
253
|
+
|
254
|
+
```Ruby
|
255
|
+
class ListParser < RLTK::Parser
|
256
|
+
nonempty_list(:mixed_nonsep_list, [:INT, :FLOAT])
|
257
|
+
|
258
|
+
finalize
|
259
|
+
end
|
260
|
+
```
|
261
|
+
|
262
|
+
You can also use separators that are made up of multiple tokens:
|
263
|
+
|
264
|
+
```Ruby
|
265
|
+
class ListParser < RLTK::Parser
|
266
|
+
nonempty_list(:mixed_nonsep_list, [:INT, :FLOAT], 'COMMA NEWLINE?')
|
267
|
+
|
268
|
+
finalize
|
269
|
+
end
|
270
|
+
```
|
192
271
|
|
193
272
|
A list may also contain multiple tokens between the separator:
|
194
273
|
|
195
|
-
|
196
|
-
|
197
|
-
|
198
|
-
|
199
|
-
|
274
|
+
```Ruby
|
275
|
+
class ListParser < RLTK::Parser
|
276
|
+
nonempty_list(:foo_bar_list, 'FOO BAR', :COMMA)
|
277
|
+
|
278
|
+
finalize
|
279
|
+
end
|
280
|
+
```
|
200
281
|
|
201
282
|
Lastly, you can mix all of these features together:
|
202
283
|
|
203
|
-
|
204
|
-
|
205
|
-
|
206
|
-
|
207
|
-
|
284
|
+
```Ruby
|
285
|
+
class ListParser < RLTK::Parser
|
286
|
+
nonempty_list(:foo_list, ['FOO BAR', 'FOO BAZ+'], :COMMA)
|
287
|
+
|
288
|
+
finalize
|
289
|
+
end
|
290
|
+
```
|
208
291
|
|
209
292
|
The productions generated by these shortcuts will always evaluate to an array. In the first two examples above the productions will produce a 1-D array containing the values of the `INT` or `FLOAT` tokens. In the last two examples the productions `foo_bar_list` and `foo_list` will produce 2-D arrays where the top level array is composed of tuples corresponding to the values of `FOO`, and `BAR` or one or more `BAZ`s.
|
210
293
|
|
@@ -220,48 +303,83 @@ To assign precedence to terminal symbols you can use the {RLTK::Parser.left}, {R
|
|
220
303
|
|
221
304
|
Let's look at the infix calculator example now:
|
222
305
|
|
223
|
-
|
224
|
-
|
225
|
-
|
226
|
-
|
227
|
-
|
228
|
-
|
229
|
-
|
230
|
-
|
231
|
-
|
232
|
-
|
233
|
-
|
234
|
-
|
235
|
-
|
236
|
-
|
237
|
-
|
238
|
-
|
239
|
-
|
240
|
-
|
241
|
-
|
242
|
-
|
243
|
-
|
244
|
-
|
245
|
-
|
246
|
-
|
306
|
+
```Ruby
|
307
|
+
class InfixCalc < RLTK::Parser
|
308
|
+
|
309
|
+
left :PLS, :SUB
|
310
|
+
right :MUL, :DIV
|
311
|
+
|
312
|
+
production(:e) do
|
313
|
+
clause('NUM') { |n| n }
|
314
|
+
|
315
|
+
clause('LPAREN e RPAREN') { |_, e, _| e }
|
316
|
+
|
317
|
+
clause('e PLS e') { |e0, _, e1| e0 + e1 }
|
318
|
+
clause('e SUB e') { |e0, _, e1| e0 - e1 }
|
319
|
+
clause('e MUL e') { |e0, _, e1| e0 * e1 }
|
320
|
+
clause('e DIV e') { |e0, _, e1| e0 / e1 }
|
321
|
+
end
|
322
|
+
|
323
|
+
finalize
|
324
|
+
end
|
325
|
+
```
|
326
|
+
|
327
|
+
The standard order of mathematical operations tells us that the correct way to group the operations in the expression `2 + 3 * 4` is `2 + (3 * 4)`. However, our grammar tells us that `(2 + 3) * 4` is also a valid way to parse the expression, leading to a shift/reduce conflict in the parser. To get rid of the shift/reduce conflict we need some way to tell the parser how to distinguish between these two parse trees. This is where associativity comes in. If the parser has already read `NUM PLS NUM` and the current symbol is a `MUL` symbol we want to tell the parser to shift the new `MUL` symbol onto the stack and continue on. We do this by making the `MUL` symbol right associative. When the parser generator encounters a shift/reduce conflict it looks at the token currently being read. If it has no associativity information, the conflict can't be resolved; if the token is left associative, it will remove the shift action from the parser (leaving only the reduce action); if the token is right associative, it will remove the reduce action from the parser (leaving only the shift action).
|
328
|
+
|
329
|
+
Now, let us consider the expression `3 - 2 - 1`. Here, the correct way to parse the expression is `(3 - 2) - 1`. To ensure that this case is selected over `3 - (2 - 1)` we can make the `SUB` token left associative. This will cause the symbols `NUM SUB NUM` to be reduced before the second `SUB` symbol is shifted onto the parse stack.
|
330
|
+
|
331
|
+
Note that, to resolve a shift/reduce or reduce/reduce conflict, precedence and associativity information must be present for all actions involved in the conflict. As such, it isn't enough to simply make the `MUL` and `DIV` tokens right associative; we must also make the `PLS` and `SUB` tokens left associative.
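
The two groupings of a subtraction chain give different answers, which is why the choice between reducing and shifting matters. The plain-Ruby sketch below only models the arithmetic of the two parse trees; `group_left` and `group_right` are hypothetical helpers, not part of RLTK.

```ruby
# Left grouping, ((3 - 2) - 1): reduce as soon as `NUM SUB NUM` is on the stack.
def group_left(numbers)
  numbers.reduce { |acc, n| acc - n }
end

# Right grouping, (3 - (2 - 1)): shift everything, then reduce from the right.
def group_right(numbers)
  numbers.reverse.reduce { |acc, n| n - acc }
end
```

For `[3, 2, 1]`, `group_left` evaluates to `0` and `group_right` evaluates to `2`, so only the left-associative reading matches ordinary arithmetic.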
|
332
|
+
|
333
|
+
### Token Selectors
|
334
|
+
|
335
|
+
In many cases productions contain tokens whose values are unimportant. In such situations passing `nil` to the production's action is not useful. To prevent this from happening you may use *token selectors*. By placing a period (`.`) in front of a token you can indicate to the parser that the following token is important and you wish for its value to be passed to the action. In the following example selectors are used to only pass the sub-expressions' values to the action:
|
336
|
+
|
337
|
+
```Ruby
|
338
|
+
class InfixCalc < RLTK::Parser
|
339
|
+
|
340
|
+
left :PLS, :SUB
|
341
|
+
right :MUL, :DIV
|
342
|
+
|
343
|
+
production(:e) do
|
344
|
+
clause('NUM') { |n| n }
|
345
|
+
|
346
|
+
clause('LPAREN .e RPAREN') { |e| e }
|
347
|
+
|
348
|
+
clause('.e PLS .e') { |e0, e1| e0 + e1 }
|
349
|
+
clause('.e SUB .e') { |e0, e1| e0 - e1 }
|
350
|
+
clause('.e MUL .e') { |e0, e1| e0 * e1 }
|
351
|
+
clause('.e DIV .e') { |e0, e1| e0 / e1 }
|
352
|
+
end
|
353
|
+
|
354
|
+
finalize
|
355
|
+
end
|
356
|
+
```
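The filtering that selectors perform can be sketched as follows. `sketch_select` is a hypothetical helper written in plain Ruby; it assumes one value per right-hand-side symbol and that a clause without any selectors passes every value through, as in the earlier InfixCalc definition.

```ruby
# clause: a right-hand side such as '.e PLS .e'
# values: one value per symbol on that right-hand side
def sketch_select(clause, values)
  symbols = clause.split
  unless symbols.length == values.length
    raise ArgumentError, 'expected one value per right-hand-side symbol'
  end

  # Without any selectors, every value is passed through unchanged.
  return values unless symbols.any? { |symbol| symbol.start_with?('.') }

  # Otherwise keep only the values whose symbol is prefixed with a period.
  symbols.zip(values).select { |symbol, _| symbol.start_with?('.') }.map(&:last)
end
```

With `'.e PLS .e'` and the values `[1, nil, 2]` the action would receive only `1` and `2`, which is why the blocks above take two parameters instead of three.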
|
357
|
+
|
358
|
+
### Argument Passing for Actions
|
359
|
+
|
360
|
+
By default the proc objects associated with productions are passed one argument for each symbol on the right-hand side of the production. This can lead to long, unwieldy argument lists. To change this behaviour you can use the {RLTK::Parser.default_arg_type} method, which accepts the `:splat` (default) and `:array` arguments. Any production actions that are defined after a call to {RLTK::Parser.default_arg_type} will use the argument passing method currently set as the default. You can switch between the different argument passing methods by calling {RLTK::Parser.default_arg_type} repeatedly.
|
361
|
+
|
362
|
+
Individual productions may specify the argument type used by their action via the `arg_type` parameter. If the {RLTK::Parser.production} method is passed an argument type and a block, any clauses defined inside the block will use the argument type specified by the `arg_type` parameter.
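
The difference between the two argument types can be sketched in plain Ruby. `sketch_call` is a hypothetical dispatcher, not RLTK code, and lambdas are used instead of procs so that the arity difference is strict and visible.

```ruby
# Invoke an action with the right-hand-side values using the given style.
def sketch_call(action, values, arg_type)
  case arg_type
  when :splat then action.call(*values) # one argument per symbol
  when :array then action.call(values)  # a single array argument
  else raise ArgumentError, "unknown argument type: #{arg_type.inspect}"
  end
end

splat_action = ->(e0, _op, e1) { e0 + e1 }
array_action = ->(values) { values.first + values.last }
```

Both `sketch_call(splat_action, [1, nil, 2], :splat)` and `sketch_call(array_action, [1, nil, 2], :array)` evaluate to `3`; the `:array` style keeps the parameter list short when a right-hand side has many symbols.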
|
247
363
|
|
248
364
|
### The Parsing Environment
|
249
365
|
|
250
366
|
The parsing environment is the context in which the proc objects associated with productions are evaluated, and can be used to provide helper functions and to keep state while parsing. To define a custom environment simply subclass {RLTK::Parser::Environment} inside your parser definition as follows:
|
251
367
|
|
252
|
-
|
253
|
-
|
254
|
-
|
255
|
-
|
256
|
-
|
257
|
-
|
258
|
-
|
259
|
-
|
260
|
-
|
261
|
-
|
262
|
-
|
263
|
-
|
264
|
-
|
368
|
+
```Ruby
|
369
|
+
class MyParser < RLTK::Parser
|
370
|
+
...
|
371
|
+
|
372
|
+
class Environment < Environment
|
373
|
+
def helper_function
|
374
|
+
...
|
375
|
+
end
|
376
|
+
|
377
|
+
...
|
378
|
+
end
|
379
|
+
|
380
|
+
finalize
|
381
|
+
end
|
382
|
+
```
|
265
383
|
|
266
384
|
(The definition of the Environment class may occur anywhere inside the MyParser class definition.)
|
267
385
|
|
@@ -320,23 +438,29 @@ The value of an `ERROR` non-terminal will be an array containing all of the toke
|
|
320
438
|
|
321
439
|
The example below, based on one of the unit tests, shows a very basic usage of error productions:
|
322
440
|
|
323
|
-
|
324
|
-
|
325
|
-
|
326
|
-
|
327
|
-
|
328
|
-
|
329
|
-
|
330
|
-
|
331
|
-
|
332
|
-
|
333
|
-
|
334
|
-
|
335
|
-
|
336
|
-
|
337
|
-
|
338
|
-
|
339
|
-
|
441
|
+
```Ruby
|
442
|
+
class ErrorCalc < RLTK::Parser
|
443
|
+
left :ERROR
|
444
|
+
right :PLS, :SUB, :MUL, :DIV, :NUM
|
445
|
+
|
446
|
+
production(:e) do
|
447
|
+
clause('NUM') {|n| n}
|
448
|
+
|
449
|
+
clause('e PLS e') { |e0, _, e1| e0 + e1 }
|
450
|
+
clause('e SUB e') { |e0, _, e1| e0 - e1 }
|
451
|
+
clause('e MUL e') { |e0, _, e1| e0 * e1 }
|
452
|
+
clause('e DIV e') { |e0, _, e1| e0 / e1 }
|
453
|
+
|
454
|
+
clause('e PLS ERROR e') { |e0, _, err, e1| error("#{err.length} tokens skipped."); e0 + e1 }
|
455
|
+
end
|
456
|
+
|
457
|
+
finalize
|
458
|
+
end
|
459
|
+
```
|
460
|
+
|
461
|
+
## A Note on Token Naming
|
462
|
+
|
463
|
+
In the world of RLTK both terminal and non-terminal symbols may contain only alphanumeric characters and underscores. The difference between terminal and non-terminal symbols is that terminals are **ALL\_UPPER\_CASE** and non-terminals are **all\_lower\_case**.
|
340
464
|
|
341
465
|
## ASTNode
|
342
466
|
|
@@ -345,42 +469,48 @@ The {RLTK::ASTNode} base class is meant to be a good starting point for implemen
|
|
345
469
|
To create your own AST node classes you subclass the {RLTK::ASTNode} class and then use the {RLTK::ASTNode.child} and {RLTK::ASTNode.value} methods. By declaring the children and values of a node the class will define the appropriate accessors with type checking, know how to pack and unpack a node's children, and know how to handle constructor arguments.
|
346
470
|
|
347
471
|
Here we can see the definition of several AST node classes that might be used to implement binary operations for a language:
|
348
|
-
|
349
|
-
|
350
|
-
|
351
|
-
|
352
|
-
|
353
|
-
|
354
|
-
|
355
|
-
|
356
|
-
|
357
|
-
|
358
|
-
|
359
|
-
|
360
|
-
|
361
|
-
|
362
|
-
|
472
|
+
|
473
|
+
```Ruby
|
474
|
+
class Expression < RLTK::ASTNode; end
|
475
|
+
|
476
|
+
class Number < Expression
|
477
|
+
value :value, Fixnum
|
478
|
+
end
|
479
|
+
|
480
|
+
class BinOp < Expression
|
481
|
+
value :op, String
|
482
|
+
|
483
|
+
child :left, Expression
|
484
|
+
child :right, Expression
|
485
|
+
end
|
486
|
+
```
|
487
|
+
|
488
|
+
The assignment functions that are generated for the children and values perform type checking to make sure that the AST is well-formed. The type of a child must be a subclass of the {RLTK::ASTNode} class, whereas the type of a value can be any Ruby class. While child and value objects are stored as instance variables it is unsafe to assign to these variables directly, and it is strongly recommended to always use the accessor functions.
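
A type-checking accessor of this kind can be sketched with a small class macro. This is not RLTK's implementation: `SketchNode` and `SketchNumber` are hypothetical classes, only values (not children) are modelled, `nil` is assumed to be an acceptable value, and `Integer` is used as the type.

```ruby
class SketchNode
  # Declare a typed value: defines a reader and a type-checking writer.
  def self.value(name, type)
    attr_reader name

    define_method("#{name}=") do |new_value|
      unless new_value.nil? || new_value.is_a?(type)
        raise TypeError, "#{name} must be a #{type}, got #{new_value.class}"
      end

      instance_variable_set("@#{name}", new_value)
    end
  end
end

class SketchNumber < SketchNode
  value :value, Integer
end
```

Assigning `42` through `value=` succeeds, whereas assigning a String raises a `TypeError` and leaves the stored value untouched. Writing to `@value` directly would bypass that check, which is the reason for the recommendation above.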
|
363
489
|
|
364
490
|
When instantiating a subclass of {RLTK::ASTNode} the arguments to the constructor should be the node's values (in order of definition) followed by the node's children (in order of definition). If a constructor is given fewer arguments than the number of values and children the remaining arguments are assumed to be `nil`. Example:
|
365
491
|
|
366
|
-
|
367
|
-
|
368
|
-
|
369
|
-
|
370
|
-
|
371
|
-
|
372
|
-
|
373
|
-
|
374
|
-
|
375
|
-
|
376
|
-
|
377
|
-
|
492
|
+
```Ruby
|
493
|
+
class Foo < RLTK::ASTNode
|
494
|
+
value :a, Fixnum
|
495
|
+
child :b, Bar
|
496
|
+
value :c, String
|
497
|
+
child :d, Bar
|
498
|
+
end
|
499
|
+
|
500
|
+
class Bar < RLTK::ASTNode
|
501
|
+
value :a, String
|
502
|
+
end
|
503
|
+
|
504
|
+
Foo.new(1, 'baz', Bar.new)
|
505
|
+
```
|
378
506
|
|
379
507
|
Lastly, the type of a child or value can be defined as an array of objects of a specific type as follows:
|
380
508
|
|
381
|
-
|
382
|
-
|
383
|
-
|
509
|
+
```Ruby
|
510
|
+
class Foo < RLTK::ASTNode
|
511
|
+
value :strings, [String]
|
512
|
+
end
|
513
|
+
```
|
384
514
|
|
385
515
|
### Tree Iteration and Mapping
|
386
516
|
|
@@ -398,15 +528,6 @@ You can also map one tree to another tree using the {RLTK::ASTNode#map} and {RLT
|
|
398
528
|
|
399
529
|
RLTK supports the generation of native code and LLVM IR, as well as JIT compilation and execution, through the {RLTK::CG} module. This module is built on top of bindings to [LLVM](http://llvm.org) and provides much, though not all, of the functionality of the LLVM libraries.
|
400
530
|
|
401
|
-
A small amount of the functionality of the RLTK::CG module requires the [LLVM Extended C Bindings](https://github.com/chriswailes/llvm-ecb) library. If this library is missing the rest of the module should behave properly, but this functionality will be missing. The features that require this library are:
|
402
|
-
|
403
|
-
* **Shared Library Loading** - Load shared libraries into the process so that their exported symbols are visible to LLVM via the {RLTK::CG::Support.load\_library} method.
|
404
|
-
* **ASM Printer and Parser Initialization** - Available through the {RLTK::CG::LLVM.init\_asm\_parser} and {RLTK::CG::LLVM.init\_asm\_printer} methods.
|
405
|
-
* **LLVM IR Loading** - LLVM IR files can be loaded into RLTK via the {RLTK::CG::Module.read\_ir\_file} method.
|
406
|
-
* **Value Printing** - Print any value's LLVM IR to a given file descriptor using {RLTK::CG::Value#print}.
|
407
|
-
* **Targets, Target Data, and Target Machines** - Manipulate LLVM structures that contain data about the target environment.
|
408
|
-
* **Object File Generation** - LLVM Modules can be compiled to object files via the {RLTK::CG::Module#compile} method.
|
409
|
-
|
410
531
|
### Acknowledgments and Discussion
|
411
532
|
|
412
533
|
Before we get started with the details, I would like to thank [Jeremy Voorhis](https://github.com/jvoorhis/). The bindings present in RLTK are really a fork of the great work that he did on [ruby-llvm](https://github.com/jvoorhis/ruby-llvm).
|
@@ -468,40 +589,54 @@ Functions in LLVM are much like C functions; they have a return type, argument t
|
|
468
589
|
|
469
590
|
The first way to create functions is via a module's function collection:
|
470
591
|
|
471
|
-
|
592
|
+
```Ruby
|
593
|
+
mod.functions.add('my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
|
594
|
+
```
|
472
595
|
|
473
596
|
Here we have defined a function named 'my function' in the `mod` module. It takes two native integers as arguments and returns a native integer. It is also possible to define the type of a function ahead of time and pass it to this method:
|
474
597
|
|
475
|
-
|
476
|
-
|
598
|
+
```Ruby
|
599
|
+
type = RLTK::CG::FunctionType.new(RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
|
600
|
+
mod.functions.add('my function', type)
|
601
|
+
```
|
477
602
|
|
478
603
|
Functions may also be created directly via the {RLTK::CG::Function#initialize RLTK::CG::Function.new} method, though a reference to a module is still necessary:
|
479
604
|
|
480
|
-
|
481
|
-
|
605
|
+
```Ruby
|
606
|
+
mod = RLTK::CG::Module.new('my module')
|
607
|
+
fun = RLTK::CG::Function.new(mod, 'my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
|
608
|
+
```
|
482
609
|
|
483
610
|
or
|
484
|
-
|
485
|
-
|
486
|
-
|
487
|
-
|
611
|
+
|
612
|
+
```Ruby
|
613
|
+
mod = RLTK::CG::Module.new('my module')
|
614
|
+
type = RLTK::CG::FunctionType.new(RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
|
615
|
+
fun = RLTK::CG::Function.new(mod, 'my function', type)
|
616
|
+
```
|
488
617
|
|
489
618
|
Lastly, whenever you use one of these methods to create a function you may give it a block to be executed inside the context of the function object. This allows for easier building of functions:
|
490
619
|
|
491
|
-
|
492
|
-
|
493
|
-
|
494
|
-
|
620
|
+
```Ruby
|
621
|
+
mod.functions.add('my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
|
622
|
+
bb = blocks.append('entry')
|
623
|
+
...
|
624
|
+
end
|
625
|
+
```
|
495
626
|
|
496
627
|
### Basic Blocks
|
497
628
|
|
498
629
|
Once a function has been added to a module you will need to add {RLTK::CG::BasicBlock BasicBlocks} to the function. This can be done easily:
|
499
630
|
|
500
|
-
|
631
|
+
```Ruby
|
632
|
+
bb = fun.blocks.append('entry')
|
633
|
+
```
|
501
634
|
|
502
635
|
We now have a basic block that we can use to add instructions to our function and get it to actually do something. You can also instantiate basic blocks directly:
|
503
636
|
|
504
|
-
|
637
|
+
```Ruby
|
638
|
+
bb = RLTK::CG::BasicBlock.new(fun, 'entry')
|
639
|
+
```
|
505
640
|
|
506
641
|
### The Builder
|
507
642
|
|
@@ -509,121 +644,151 @@ Now that you have a basic block you need to add instructions to it. This is acc
|
|
509
644
|
|
510
645
|
To add instructions using a builder directly (this is most similar to how it is done using C/C++) you create the builder, position it where you want to add instructions, and then build them:
|
511
646
|
|
512
|
-
|
513
|
-
|
514
|
-
|
515
|
-
|
516
|
-
|
517
|
-
|
518
|
-
|
519
|
-
|
520
|
-
|
521
|
-
|
522
|
-
|
523
|
-
|
647
|
+
```Ruby
|
648
|
+
fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
|
649
|
+
bb = fun.blocks.append('entry')
|
650
|
+
|
651
|
+
builder = RLTK::CG::Builder.new
|
652
|
+
|
653
|
+
builder.position_at_end(bb)
|
654
|
+
|
655
|
+
# Generate an add instruction.
|
656
|
+
inst0 = builder.add(fun.params[0], fun.params[1])
|
657
|
+
|
658
|
+
# Generate a return instruction.
|
659
|
+
builder.ret(inst0)
|
660
|
+
```
|
524
661
|
|
525
662
|
You can get rid of some of those references to the builder by using the {RLTK::CG::Builder#build} method:
|
526
663
|
|
527
|
-
|
528
|
-
|
529
|
-
|
530
|
-
|
531
|
-
|
532
|
-
|
533
|
-
|
534
|
-
|
664
|
+
```Ruby
|
665
|
+
fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
|
666
|
+
bb = fun.blocks.append('entry')
|
667
|
+
|
668
|
+
builder = RLTK::CG::Builder.new
|
669
|
+
|
670
|
+
builder.build(bb) do
|
671
|
+
ret add(fun.params[0], fun.params[1])
|
672
|
+
end
|
673
|
+
```
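
One way to picture what `build` does is to assume it positions the builder at the end of the given block and then evaluates the Ruby block with the builder as `self`. `ToyBuilder` below is a hypothetical model that records instructions as arrays; it is a sketch under that assumption, not RLTK's code:

```Ruby
# Hypothetical model of a builder; not RLTK's implementation.
class ToyBuilder
  def initialize(bb = nil, &block)
    build(bb, &block) if bb && block
  end

  # Subsequent instructions are appended to the array +bb+.
  def position_at_end(bb)
    @bb = bb
    self
  end

  def add(lhs, rhs)
    emit([:add, lhs, rhs])
  end

  def ret(val)
    emit([:ret, val])
  end

  # Position at the end of +bb+, then run the block with self set to the builder.
  def build(bb, &block)
    position_at_end(bb)
    instance_exec(&block)
  end

  private

  # Append an instruction to the current block and return it as a value.
  def emit(inst)
    @bb << inst
    inst
  end
end

# Explicit style: every call names the builder.
explicit = []
b = ToyBuilder.new
b.position_at_end(explicit)
b.ret(b.add(:p0, :p1))

# Block style: the builder is implicit inside the block.
sugared = []
ToyBuilder.new.build(sugared) { ret add(:p0, :p1) }
```

Both styles record the same two instructions; the block form only removes the repeated receiver.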
|
535
674
|
|
536
675
|
To get rid of more code:
|
537
676
|
|
538
|
-
|
539
|
-
|
540
|
-
|
541
|
-
|
542
|
-
|
543
|
-
|
677
|
+
```Ruby
|
678
|
+
fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
|
679
|
+
bb = fun.blocks.append('entry')
|
680
|
+
|
681
|
+
RLTK::CG::Builder.new(bb) do
|
682
|
+
ret add(fun.params[0], fun.params[1])
|
683
|
+
end
|
684
|
+
```
|
544
685
|
|
545
|
-
|
686
|
+
or
|
546
687
|
|
547
|
-
|
548
|
-
|
549
|
-
|
550
|
-
|
688
|
+
```Ruby
|
689
|
+
fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
|
690
|
+
fun.blocks.append('entry') do
|
691
|
+
ret add(fun.params[0], fun.params[1])
|
692
|
+
end
|
693
|
+
```
|
551
694
|
|
552
|
-
|
695
|
+
or even
|
553
696
|
|
554
|
-
|
555
|
-
|
556
|
-
|
557
|
-
|
558
|
-
|
697
|
+
```Ruby
|
698
|
+
mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
|
699
|
+
blocks.append('entry') do |fun|
|
700
|
+
ret add(fun.params[0], fun.params[1])
|
701
|
+
end
|
702
|
+
end
|
703
|
+
```
|
559
704
|
|
560
705
|
In the last two examples a new builder object is created for the block. It is possible to specify the builder to be used:
|
561
706
|
|
562
|
-
|
563
|
-
|
564
|
-
|
565
|
-
|
566
|
-
|
567
|
-
|
568
|
-
|
707
|
+
```Ruby
|
708
|
+
builder = RLTK::CG::Builder.new
|
709
|
+
|
710
|
+
mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
|
711
|
+
blocks.append('entry', builder) do |fun|
|
712
|
+
ret add(fun.params[0], fun.params[1])
|
713
|
+
end
|
714
|
+
end
|
715
|
+
```
|
569
716
|
|
570
717
|
For an example of where this is useful, see the Kazoo tutorial.
|
571
718
|
|
572
|
-
###
|
719
|
+
### The Contractor
|
573
720
|
|
574
|
-
|
721
|
+
An alternative to using the {RLTK::CG::Builder} class is to use the {RLTK::CG::Contractor} class, which is a subclass of the Builder and includes the Filigree::Visitor module. (Get it? It's a visiting builder!) By subclassing the Contractor you can define blocks of code for handling various types of AST nodes and leave the selection of the correct code up to the {RLTK::CG::Contractor#visit} method. In addition, the `:at` and `:rcb` options to the *visit* method make it much easier to manage the positioning of the Builder.
|
575
722
|
|
576
|
-
|
577
|
-
jit = RLTK::CG::JITCompiler(mod)
|
578
|
-
|
579
|
-
mod.functions.add('add', RLTK:CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
|
580
|
-
blocks.append('entry', nil, nil, self) do |fun|
|
581
|
-
ret add(fun.params[0], fun.params[1])
|
582
|
-
end
|
583
|
-
end
|
723
|
+
Here we can see how easy it is to define a block that builds the instructions for binary operations:
|
584
724
|
|
585
|
-
|
725
|
+
```Ruby
|
726
|
+
on Binary do |node|
|
727
|
+
left = visit node.left
|
728
|
+
right = visit node.right
|
586
729
|
|
587
|
-
|
730
|
+
case node
|
731
|
+
when Add then fadd(left, right, 'addtmp')
|
732
|
+
when Sub then fsub(left, right, 'subtmp')
|
733
|
+
when Mul then fmul(left, right, 'multmp')
|
734
|
+
when Div then fdiv(left, right, 'divtmp')
|
735
|
+
when LT then ui2fp(fcmp(:ult, left, right, 'cmptmp'), RLTK::CG::DoubleType, 'booltmp')
|
736
|
+
end
|
737
|
+
end
|
738
|
+
```
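
The dispatch idea behind `on` and `visit` can be sketched in plain Ruby. The code below is a simplified, hypothetical model (`ToyVisitor`, `Evaluator`, and the node classes are made up for illustration) that matches on node class only and evaluates the tree directly instead of emitting instructions; Filigree's actual pattern matching is richer than this:

```Ruby
# Hypothetical sketch of visitor-style dispatch; not Filigree's or RLTK's code.
Num    = Struct.new(:value)
Binary = Struct.new(:left, :right)
class Add < Binary; end
class Mul < Binary; end

class ToyVisitor
  # Map from node class to handler block, stored per visitor class.
  def self.handlers
    @handlers ||= {}
  end

  def self.on(klass, &block)
    handlers[klass] = block
  end

  # Find the first handler whose class matches the node and run it
  # with self set to the visitor instance.
  def visit(node)
    _, handler = self.class.handlers.find { |klass, _| node.is_a?(klass) }
    raise ArgumentError, "no handler for #{node.class}" unless handler
    instance_exec(node, &handler)
  end
end

class Evaluator < ToyVisitor
  on Num do |node|
    node.value.to_f
  end

  on Binary do |node|
    left  = visit node.left
    right = visit node.right

    case node
    when Add then left + right
    when Mul then left * right
    end
  end
end

# 1 + (2 * 3)
result = Evaluator.new.visit(Add.new(Num.new(1), Mul.new(Num.new(2), Num.new(3))))
```

Because handlers run with the visitor as `self`, recursive `visit` calls and builder-style methods can be used without a receiver, which mirrors the shape of the Contractor example.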
|
588
739
|
|
589
|
-
|
740
|
+
AST nodes whose translation requires the generation of control flow will require the creation of new BasicBlocks and the repositioning of the builder. This can be easily managed:
|
590
741
|
|
591
|
-
|
742
|
+
```Ruby
|
743
|
+
on If do |node|
|
744
|
+
cond_val = visit node.cond
|
745
|
+
cond_val = fcmp :one, cond_val, ZERO, 'ifcond'
|
592
746
|
|
593
|
-
|
747
|
+
start_bb = current_block
|
748
|
+
fun = start_bb.parent
|
594
749
|
|
595
|
-
|
750
|
+
then_bb = fun.blocks.append('then')
|
751
|
+
then_val, new_then_bb = visit node.then, at: then_bb, rcb: true
|
596
752
|
|
597
|
-
|
753
|
+
else_bb = fun.blocks.append('else')
|
754
|
+
else_val, new_else_bb = visit node.else, at: else_bb, rcb: true
|
598
755
|
|
599
|
-
|
600
|
-
|
601
|
-
grammar.production(:s) do
|
602
|
-
clause('A G D')
|
603
|
-
clause('A a C')
|
604
|
-
clause('B a D')
|
605
|
-
clause('B G C')
|
606
|
-
end
|
607
|
-
|
608
|
-
grammar.production(:a, 'b')
|
609
|
-
grammar.production(:b, 'G')
|
756
|
+
merge_bb = fun.blocks.append('merge', self)
|
757
|
+
phi_inst = build(merge_bb) { phi RLTK::CG::DoubleType, {new_then_bb => then_val, new_else_bb => else_val}, 'iftmp' }
|
610
758
|
|
611
|
-
|
759
|
+
build(start_bb) { cond cond_val, then_bb, else_bb }
|
612
760
|
|
613
|
-
|
761
|
+
build(new_then_bb) { br merge_bb }
|
762
|
+
build(new_else_bb) { br merge_bb }
|
614
763
|
|
615
|
-
|
764
|
+
returning(phi_inst) { target merge_bb }
|
765
|
+
end
|
766
|
+
```
|
616
767
|
|
617
|
-
|
768
|
+
More extensive examples of how to use the Contractor class can be found in the Kazoo tutorial chapters.
|
618
769
|
|
619
|
-
|
770
|
+
### Execution Engines
|
620
771
|
|
621
|
-
|
622
|
-
|
623
|
-
|
624
|
-
|
625
|
-
|
626
|
-
|
772
|
+
Once you have generated your code you may want to run it. RLTK provides bindings to both the LLVM interpreter and JIT compiler to help you do just that. Creating a JIT compiler is pretty simple.
|
773
|
+
|
774
|
+
```Ruby
|
775
|
+
mod = RLTK::CG::Module.new('my module')
|
776
|
+
jit = RLTK::CG::JITCompiler.new(mod)
|
777
|
+
|
778
|
+
fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
|
779
|
+
blocks.append('entry', nil, nil, self) do |fun|
|
780
|
+
ret add(fun.params[0], fun.params[1])
|
781
|
+
end
|
782
|
+
end
|
783
|
+
```
|
784
|
+
|
785
|
+
Now you can run your 'add' function like this:
|
786
|
+
|
787
|
+
```Ruby
|
788
|
+
jit.run(fun, 1, 2)
|
789
|
+
```
|
790
|
+
|
791
|
+
The result will be a {RLTK::CG::GenericValue} object, and you will want to use its {RLTK::CG::GenericValue#to\_i #to\_i} and {RLTK::CG::GenericValue#to\_f #to\_f} methods to get the Ruby value result.
|
627
792
|
|
628
793
|
## Tutorial
|
629
794
|
|
@@ -637,22 +802,24 @@ The Kazoo toy language is a procedural language that allows you to define functi
|
|
637
802
|
|
638
803
|
Because we want to keep things simple the only datatype in Kazoo is a 64-bit floating point type (a C double or a Ruby float). As such, all values are implicitly double precision and the language doesn’t require type declarations. This gives the language a very nice and simple syntax. For example, the following example computes Fibonacci numbers:
|
639
804
|
|
640
|
-
|
641
|
-
|
642
|
-
|
643
|
-
|
644
|
-
|
805
|
+
```
|
806
|
+
def fib(x)
|
807
|
+
if x < 3 then
|
808
|
+
1
|
809
|
+
else
|
810
|
+
fib(x-1) + fib(x-2)
|
811
|
+
```
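
For comparison, the same function can be written in Ruby. This is only a transliteration to show what the Kazoo program computes, assuming Kazoo's `if`/`then`/`else` behaves like Ruby's conditional expression; the name `fib_rb` and the explicit floats are illustrative choices:

```Ruby
# Ruby transliteration of the Kazoo example; every value is a Float,
# mirroring Kazoo's single double-precision datatype.
def fib_rb(x)
  if x < 3
    1.0
  else
    fib_rb(x - 1) + fib_rb(x - 2)
  end
end

first_ten = (1..10).map { |n| fib_rb(n.to_f) }
```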
|
645
812
|
|
646
813
|
The tutorial is organized as follows:
|
647
814
|
|
648
|
-
* [Chapter 1: The Lexer](
|
649
|
-
* [Chapter 2: The AST Nodes](
|
650
|
-
* [Chapter 3: The Parser](
|
651
|
-
* [Chapter 4: AST Translation](
|
652
|
-
* [Chapter 5: JIT Compilation](
|
653
|
-
* [Chapter 6: Adding Control Flow](
|
654
|
-
* [Chapter 7: Playtime](
|
655
|
-
* [Chapter 8: Mutable Variables](
|
815
|
+
* [Chapter 1: The Lexer](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%201/Chapter1.md)
|
816
|
+
* [Chapter 2: The AST Nodes](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%202/Chapter2.md)
|
817
|
+
* [Chapter 3: The Parser](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%203/Chapter3.md)
|
818
|
+
* [Chapter 4: AST Translation](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%204/Chapter4.md)
|
819
|
+
* [Chapter 5: JIT Compilation](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%205/Chapter5.md)
|
820
|
+
* [Chapter 6: Adding Control Flow](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%206/Chapter6.md)
|
821
|
+
* [Chapter 7: Playtime](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%207/Chapter7.md)
|
822
|
+
* [Chapter 8: Mutable Variables](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%208/Chapter8.md)
|
656
823
|
|
657
824
|
Before starting this tutorial you should know about regular expressions, the basic ideas behind lexing and parsing, and be able to read context-free grammar (CFG) definitions. By the end of this tutorial we will have written 372 lines of source code and have a JIT compiler for a Turing complete language.
|
658
825
|
|
@@ -668,10 +835,16 @@ The following lexer and parser classes are included as part of RLTK:
|
|
668
835
|
|
669
836
|
## Contributing
|
670
837
|
|
671
|
-
If you are interested in contributing to RLTK you can:
|
838
|
+
If you are interested in contributing to RLTK there are many aspects of the library that you can work on. A detailed TODO list can be found [here](https://github.com/chriswailes/RLTK/blob/master/TODO.md). If you are looking for smaller units of work feel free to:
|
672
839
|
|
673
840
|
* Help provide unit tests. Not all of RLTK is tested as well as it could be. Specifically, more tests for the RLTK::CFG and RLTK::Parser classes would be appreciated.
|
674
841
|
* Write lexers or parsers that you think others might want to use. Possibilities include HTML, JSON/YAML, Javascript, and Ruby.
|
675
|
-
* Write a class for dealing with regular languages.
|
676
842
|
* Extend the RLTK::CFG class with additional functionality.
|
677
|
-
|
843
|
+
|
844
|
+
Lastly, I love hearing back from users. If you find any part of the documentation unclear or incomplete let me know. It is also helpful to me to know how people are using the library, so if you are using RLTK in your project send me an email. This lets me know what features are being used and where I should focus my development efforts.
|
845
|
+
|
846
|
+
## News
|
847
|
+
|
848
|
+
Aaaaand we're back. Development of RLTK has been on hold for a while as I worked on other projects. If you want to see what I've been up to, you can check out [Clang's](http://llvm.org/clang) new `-Wconsumed` flag and the [Filigree](http://github.com/chriswailes/filigree) gem.
|
849
|
+
|
850
|
+
The next version of RLTK is going to be updated to require Ruby 2.0 as well as LLVM 3.4. Previous versions of RLTK required my LLVM-ECB library to expose extra LLVM features through the C bindings; this is no longer necessary as this functionality has been moved into LLVM proper. If anyone has any requests for new or improved features for RLTK version 3.0, let me know.
|