rltk3 3.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (56)
  1. checksums.yaml +7 -0
  2. data/AUTHORS +1 -0
  3. data/LICENSE +27 -0
  4. data/README.md +852 -0
  5. data/Rakefile +197 -0
  6. data/lib/rltk/ast.rb +573 -0
  7. data/lib/rltk/cfg.rb +683 -0
  8. data/lib/rltk/cg/basic_block.rb +157 -0
  9. data/lib/rltk/cg/bindings.rb +151 -0
  10. data/lib/rltk/cg/builder.rb +1127 -0
  11. data/lib/rltk/cg/context.rb +48 -0
  12. data/lib/rltk/cg/contractor.rb +51 -0
  13. data/lib/rltk/cg/execution_engine.rb +194 -0
  14. data/lib/rltk/cg/function.rb +237 -0
  15. data/lib/rltk/cg/generated_bindings.rb +8118 -0
  16. data/lib/rltk/cg/generic_value.rb +95 -0
  17. data/lib/rltk/cg/instruction.rb +519 -0
  18. data/lib/rltk/cg/llvm.rb +150 -0
  19. data/lib/rltk/cg/memory_buffer.rb +75 -0
  20. data/lib/rltk/cg/module.rb +451 -0
  21. data/lib/rltk/cg/pass_manager.rb +252 -0
  22. data/lib/rltk/cg/support.rb +29 -0
  23. data/lib/rltk/cg/target.rb +230 -0
  24. data/lib/rltk/cg/triple.rb +58 -0
  25. data/lib/rltk/cg/type.rb +554 -0
  26. data/lib/rltk/cg/value.rb +1272 -0
  27. data/lib/rltk/cg.rb +32 -0
  28. data/lib/rltk/lexer.rb +372 -0
  29. data/lib/rltk/lexers/calculator.rb +44 -0
  30. data/lib/rltk/lexers/ebnf.rb +38 -0
  31. data/lib/rltk/parser.rb +1702 -0
  32. data/lib/rltk/parsers/infix_calc.rb +43 -0
  33. data/lib/rltk/parsers/postfix_calc.rb +34 -0
  34. data/lib/rltk/parsers/prefix_calc.rb +34 -0
  35. data/lib/rltk/token.rb +90 -0
  36. data/lib/rltk/version.rb +11 -0
  37. data/lib/rltk.rb +16 -0
  38. data/test/cg/tc_basic_block.rb +83 -0
  39. data/test/cg/tc_control_flow.rb +191 -0
  40. data/test/cg/tc_function.rb +54 -0
  41. data/test/cg/tc_generic_value.rb +33 -0
  42. data/test/cg/tc_instruction.rb +256 -0
  43. data/test/cg/tc_llvm.rb +25 -0
  44. data/test/cg/tc_math.rb +88 -0
  45. data/test/cg/tc_module.rb +89 -0
  46. data/test/cg/tc_transforms.rb +68 -0
  47. data/test/cg/tc_type.rb +69 -0
  48. data/test/cg/tc_value.rb +151 -0
  49. data/test/cg/ts_cg.rb +23 -0
  50. data/test/tc_ast.rb +332 -0
  51. data/test/tc_cfg.rb +164 -0
  52. data/test/tc_lexer.rb +216 -0
  53. data/test/tc_parser.rb +711 -0
  54. data/test/tc_token.rb +34 -0
  55. data/test/ts_rltk.rb +47 -0
  56. metadata +317 -0
data/README.md ADDED
@@ -0,0 +1,852 @@
1
+ # Welcome to the Ruby Language Toolkit
2
+
3
+ **NOTE: This is not actively maintained or developed, and I probably cannot help you with support. This is a fork of [Chris Wailes' project](https://github.com/chriswailes/RLTK) intended to keep the Ruby gem up to date, and accept merge requests to fix any breaking changes.**
4
+
5
+ RLTK is a collection of classes and methods designed to help programmers work with languages in an easy to use and straightforward manner. This toolkit provides the following features:
6
+
7
+ * Lexer generator
8
+ * Parser generator
9
+ * AST node baseclass
10
+ * Class for representing context free grammars
11
+ * [Low Level Virtual Machine](http://llvm.org) (LLVM) bindings for code generation
12
+
13
+ In addition, RLTK includes several ready-made lexers and parsers and a Turing-complete language called Kazoo for use in your code and as examples for how to use the toolkit.
14
+
15
+ ## Why Use RLTK
16
+
17
+ Here are some reasons to use RLTK to build your lexers, parsers, and abstract syntax trees, as well as generating LLVM IR and native object files:
18
+
19
+ * **Lexer and Parser Definitions in Ruby** - Many tools require you to write your lexer/parser definitions in their own format, which is then processed and used to generate Ruby code. RLTK lexers/parsers are written entirely in Ruby and use syntax you are already familiar with.
20
+
21
+ * **Re-entrant Code** - The lexers and parsers generated by RLTK are fully re-entrant.
22
+
23
+ * **Multiple Lexers and Parsers** - You can define as many lexers and parsers as you want, and instantiate as many of them as you need.
24
+
25
+ * **Token Positions** - Detailed information about a token's position is available in the parser.
26
+
27
+ * **Feature Rich Lexing and Parsing** - Often, lexer and parser generators will try and force you to do everything their way. RLTK gives you more flexibility with features such as states and flags for lexers, and argument arrays for parsers. What's more, these features actually work (I'm looking at you REX).
28
+
29
+ * **LALR(1)/GLR Parsing** - RLTK parsers use the LALR(1)/GLR parsing algorithms, which means you get both speed and the ability to handle **any** context-free grammar.
30
+
31
+ * **Parser Serialization** - RLTK parsers can be serialized and saved after they are generated for faster loading the next time they are required.
32
+
33
+ * **Error Productions** - RLTK parsers can use error productions to recover from, and report on, errors.
34
+
35
+ * **Fast Prototyping** - If you need to change your lexer/parser you don't have to re-run the lexer and parser generation tools, simply make the changes and be on your way.
36
+
37
+ * **Parse Tree Graphs** - RLTK parsers can print parse trees (in the DOT language) of accepted strings.
38
+
39
+ * **LLVM Bindings** - RLTK provides wrappers for most of the C LLVM bindings.
40
+
41
+ * **The Contractor** - LLVM's method of building instructions is a bit cumbersome, and is very imperative in style. RLTK provides the Contractor class to make things easier.
42
+
43
+ * **Documentation** - We have it!
44
+
45
+ * **I Eat My Own Dog Food** - I'm using RLTK for my own projects so if there is a bug I'll most likely be the first one to know.
46
+
47
+ ## Lexers
48
+
49
+ To create your own lexer using RLTK you simply need to subclass the {RLTK::Lexer} class and define the *rules* that will be used for matching text and generating tokens. Here we see a simple lexer for a calculator:
50
+
51
+ ```Ruby
52
+ class Calculator < RLTK::Lexer
53
+ rule(/\+/) { :PLS }
54
+ rule(/-/) { :SUB }
55
+ rule(/\*/) { :MUL }
56
+ rule(/\//) { :DIV }
57
+
58
+ rule(/\(/) { :LPAREN }
59
+ rule(/\)/) { :RPAREN }
60
+
61
+ rule(/[0-9]+/) { |t| [:NUM, t.to_i] }
62
+
63
+ rule(/\s/)
64
+ end
65
+ ```
66
+
67
+ The {RLTK::Lexer.rule} method's first argument is the regular expression used for matching text. The block passed to the function is the action that executes when a substring is matched by the rule. These blocks must return the *type* of the token (which must be in ALL CAPS; see the Parsers section), and optionally a *value*. In the latter case you must return an array containing the *type* and *value*, which you can see an example of in the Calculator lexer shown above. The values returned by the proc object are used to build a {RLTK::Token} object that includes the *type* and *value* information, as well as information about the line number the token was found on, the offset from the beginning of the line to the start of the token, and the length of the token's text. If the *type* value returned by the proc is `nil` the input is discarded and no token is produced.
68
+
69
+ The {RLTK::Lexer} class provides both {RLTK::Lexer.lex} and {RLTK::Lexer.lex_file}. The {RLTK::Lexer.lex} method takes a string as its argument and returns an array of tokens, with an *end of stream* token automatically added to the result. The {RLTK::Lexer.lex_file} method takes the name of a file as input, and lexes the contents of the specified file.
70
+
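The rule semantics described above can be modelled in plain Ruby. This is only a rough sketch and not RLTK's implementation: the `Token` struct, the `sketch_lex` helper, the first-match rule selection, and the `:EOS` type name for the end of stream token are all assumptions made for illustration.

```ruby
require 'strscan'

# Hypothetical stand-in for RLTK::Token: just a type and an optional value.
Token = Struct.new(:type, :value)

# Each rule pairs a pattern with an action; a nil action discards the match.
RULES = [
  [/\+/,     ->(_t) { :PLS }],
  [/-/,      ->(_t) { :SUB }],
  [/\*/,     ->(_t) { :MUL }],
  [/\//,     ->(_t) { :DIV }],
  [/\(/,     ->(_t) { :LPAREN }],
  [/\)/,     ->(_t) { :RPAREN }],
  [/[0-9]+/, ->(t)  { [:NUM, t.to_i] }],
  [/\s/,     nil]
].freeze

def sketch_lex(input)
  scanner = StringScanner.new(input)
  tokens  = []

  until scanner.eos?
    # Pick the first rule whose pattern matches at the current position.
    pattern, action = RULES.find { |pat, _| scanner.match?(pat) }
    raise "No rule matches at offset #{scanner.pos}" unless pattern

    text = scanner.scan(pattern)
    next if action.nil?

    # An action returns either a bare type or a [type, value] pair.
    type, value = action.call(text)
    tokens << Token.new(type, value) unless type.nil?
  end

  # Assumed name for the automatically appended end of stream token.
  tokens << Token.new(:EOS, nil)
end
```

Calling `sketch_lex('1 + 22')` yields tokens of type `:NUM`, `:PLS`, `:NUM`, and `:EOS`; the whitespace is matched but produces no tokens.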
71
+ ### The Lexing Environment
72
+
73
+ The proc objects passed to the {RLTK::Lexer.rule} methods are evaluated inside an instance of the {RLTK::Lexer::Environment} class. This gives you access to methods for manipulating the lexer's state and flags (see below). You can also subclass the environment inside your lexer to provide additional functionality to your rule blocks. When doing so you need to ensure that you name your new class Environment like in the following example:
74
+
75
+ ```Ruby
76
+ class MyLexer < RLTK::Lexer
77
+ ...
78
+
79
+ class Environment < Environment
80
+ def helper_function
81
+ ...
82
+ end
83
+
84
+ ...
85
+ end
86
+ end
87
+ ```
88
+
89
+ ### Using States
90
+
91
+ The lexing environment may be used to keep track of state inside your lexer. When rules are defined they are defined inside a given state, which is specified by the second parameter to {RLTK::Lexer.rule}. The default state is cleverly named `:default`. When the lexer is scanning the input string for matching rules, it only considers the rules for the given state.
92
+
93
+ The methods used to manipulate state are:
94
+
95
+ * **RLTK::Lexer::Environment.push_state** - Pushes a new state onto the stack.
96
+ * **RLTK::Lexer::Environment.pop_state** - Pops a state off of the stack.
97
+ * **RLTK::Lexer::Environment.set_state** - Sets the state at the top of the stack.
98
+ * **RLTK::Lexer::Environment.state** - Returns the current state.
99
+
100
+ States may be used to easily support nested comments.
101
+
102
+ ```Ruby
103
+ class StateLexer < RLTK::Lexer
104
+ rule(/a/) { :A }
105
+ rule(/\s/)
106
+
107
+ rule(/\(\*/) { push_state(:comment) }
108
+
109
+ rule(/\(\*/, :comment) { push_state(:comment) }
110
+ rule(/\*\)/, :comment) { pop_state }
111
+ rule(/./, :comment)
112
+ end
113
+ ```
114
+
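To make the state stack concrete, here is a small plain-Ruby model of the nested-comment lexer. It is a sketch under assumptions, not RLTK code: `STATE_RULES` and `lex_with_states` are hypothetical names, and the longest-match selection is hand-rolled with `StringScanner`.

```ruby
require 'strscan'

# Rules are grouped by state; each entry is [pattern, action].
# An action receives the state stack and returns a token type or nil.
STATE_RULES = {
  default: [
    [/a/,    ->(_stack) { :A }],
    [/\s/,   ->(_stack) { nil }],
    [/\(\*/, ->(stack)  { stack.push(:comment); nil }]
  ],
  comment: [
    [/\(\*/, ->(stack)  { stack.push(:comment); nil }],
    [/\*\)/, ->(stack)  { stack.pop; nil }],
    [/./,    ->(_stack) { nil }]
  ]
}.freeze

def lex_with_states(input)
  scanner = StringScanner.new(input)
  stack   = [:default]
  types   = []

  until scanner.eos?
    # Only the rules of the state on top of the stack are considered.
    rules = STATE_RULES.fetch(stack.last)
    pattern, action = rules.max_by { |pat, _| scanner.match?(pat) || -1 }
    raise "No rule matches at offset #{scanner.pos}" unless scanner.match?(pattern)

    scanner.scan(pattern)
    type = action.call(stack)
    types << type if type
  end

  types
end

lex_with_states('a (* x (* a *) a *) a')  # => [:A, :A]
```

Each `(*` pushes another `:comment` entry, so the inner `*)` only closes the inner comment and the `a` characters inside either comment are discarded.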
115
+ By default the lexer will start in the `:default` state. To change this, you may use the {RLTK::Lexer.start} method.
116
+
117
+ ### Using Flags
118
+
119
+ The lexing environment also maintains a set of *flags*. This set is manipulated using the following methods:
120
+
121
+ * **RLTK::Lexer::Environment.set_flag** - Adds the specified flag to the set of flags.
122
+ * **RLTK::Lexer::Environment.unset_flag** - Removes the specified flag from the set of flags.
123
+ * **RLTK::Lexer::Environment.clear_flags** - Unsets all flags.
124
+
125
+ When *rules* are defined they may use a third parameter to specify a list of flags that must be set before the rule is considered when matching substrings. An example of this usage follows:
126
+
127
+ ```Ruby
128
+ class FlagLexer < RLTK::Lexer
129
+ rule(/a/) { set_flag(:a); :A }
130
+
131
+ rule(/\s/)
132
+
133
+ rule(/b/, :default, [:a]) { set_flag(:b); :B }
134
+ rule(/c/, :default, [:a, :b]) { :C }
135
+ end
136
+ ```
137
+
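The effect of flag-guarded rules can be sketched without RLTK as follows. The names `FLAG_RULES` and `lex_with_flags` are hypothetical, and raising a plain `RuntimeError` when no active rule matches is an assumption of this sketch rather than documented RLTK behaviour.

```ruby
require 'set'
require 'strscan'

# Each entry is [pattern, required flags, action]; actions may set flags.
FLAG_RULES = [
  [/a/,  [],       ->(flags)  { flags << :a; :A }],
  [/\s/, [],       ->(_flags) { nil }],
  [/b/,  [:a],     ->(flags)  { flags << :b; :B }],
  [/c/,  [:a, :b], ->(_flags) { :C }]
].freeze

def lex_with_flags(input)
  scanner = StringScanner.new(input)
  flags   = Set.new
  types   = []

  until scanner.eos?
    # A rule is only a candidate when all of its required flags are set.
    rule = FLAG_RULES.find do |pattern, required, _|
      required.all? { |flag| flags.include?(flag) } && scanner.match?(pattern)
    end
    raise "No active rule matches at offset #{scanner.pos}" unless rule

    pattern, _, action = rule
    scanner.scan(pattern)
    type = action.call(flags)
    types << type if type
  end

  types
end

lex_with_flags('a b c')  # => [:A, :B, :C]
```

With this model the input `b` on its own fails, because the rule for `b` stays inactive until an `a` has set the `:a` flag.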
138
+ ### Instantiating Lexers
139
+
140
+ In addition to using the {RLTK::Lexer.lex} class method you may also instantiate lexer objects. The only difference then is that the lexing environment used between subsequent calls to {RLTK::Lexer#lex} is the same object, and therefore allows you to keep persistent state.
141
+
142
+ ### First and Longest Match
143
+
144
+ An RLTK::Lexer may be told to select either the first substring that is found to match a rule or the longest substring to match any rule. The default behavior is to match the longest substring possible, but you can change this by using the {RLTK::Lexer.match_first} method inside your class definition as follows:
145
+
146
+ ```Ruby
147
+ class MyLexer < RLTK::Lexer
148
+ match_first
149
+
150
+ ...
151
+ end
152
+ ```
153
+
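The difference between the two strategies shows up when rules overlap. The following plain-Ruby sketch (the `KEYWORD_RULES` table and `pick_rule` helper are hypothetical and independent of RLTK) compares them on a keyword rule and an identifier rule:

```ruby
require 'strscan'

# Two overlapping rules, listed in definition order.
KEYWORD_RULES = [
  [/if/,     :IF],
  [/[a-z]+/, :IDENT]
].freeze

# Returns [type, text] for the rule chosen at the start of `input`.
def pick_rule(input, strategy)
  scanner = StringScanner.new(input)

  # Collect every rule that matches at the current position.
  matches = KEYWORD_RULES.map do |pattern, type|
    text = scanner.check(pattern)
    [type, text] if text
  end.compact

  case strategy
  when :first   then matches.first
  when :longest then matches.max_by { |_, text| text.length }
  end
end

pick_rule('iffy', :first)    # => [:IF, "if"]
pick_rule('iffy', :longest)  # => [:IDENT, "iffy"]
```

Longest match keeps `iffy` together as one identifier, while first match makes rule order significant, so more specific rules need to be defined first.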
154
+ ### Match Data
155
+
156
+ Because it isn't RLTK's job to tell you how to write lexers and parsers, the MatchData object from a pattern match is available inside the Lexer::Environment object via the `match` accessor.
157
+
158
+ ## Context-Free Grammars
159
+
160
+ The {RLTK::CFG} class provides an abstraction for context-free grammars. For the purpose of this class terminal symbols appear in **ALL CAPS**, and non-terminal symbols appear in **all lowercase**. Once a grammar is defined the {RLTK::CFG#first_set} and {RLTK::CFG#follow_set} methods can be used to find *first* and *follow* sets.
161
+
162
+ ### Defining Grammars
163
+
164
+ A grammar is defined by first instantiating the {RLTK::CFG} class. The {RLTK::CFG#production} and {RLTK::CFG#clause} methods may then be used to define the productions of the grammar. The `production` method can take a Symbol denoting the left-hand side of the production and a string describing the right-hand side of the production, or the left-hand side symbol and a block. In the first usage a single production is created. In the second usage the block may contain repeated calls to the `clause` method, each call producing a new production with the same left-hand side but different right-hand sides. {RLTK::CFG#clause} may not be called outside of {RLTK::CFG#production}. Below we see a grammar definition that uses both methods:
165
+
166
+ ```Ruby
167
+ grammar = RLTK::CFG.new
168
+
169
+ grammar.production(:s) do
170
+ clause('A G D')
171
+ clause('A a C')
172
+ clause('B a D')
173
+ clause('B G C')
174
+ end
175
+
176
+ grammar.production(:a, 'b')
177
+ grammar.production(:b, 'G')
178
+ ```
179
+
180
+ ### Extended Backus–Naur Form
181
+
182
+ The RLTK::CFG class understands grammars written in the extended Backus–Naur form. This allows you to use the \*, \+, and ? operators in your grammar definitions. When any of these operators is encountered, additional productions are generated. For example, if the right-hand side of a production contained `NUM*` a production of the form `num_star -> | NUM num_star` is added to the grammar. As such, your grammar should not contain productions with similar left-hand sides (e.g. foo_star, bar_question, or baz_plus).
183
+
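As a rough illustration of this expansion, the sketch below derives the generated productions for a single symbol. Only the `NUM*` case is taken from the text above; the shapes used for `+` and `?`, and the names `SUFFIXES` and `expand_ebnf`, are assumptions for illustration and may differ from what RLTK generates.

```ruby
# Maps each EBNF operator to the suffix of the generated non-terminal.
SUFFIXES = { '*' => 'star', '+' => 'plus', '?' => 'question' }.freeze

# Returns [new_symbol, productions], where productions maps the generated
# left-hand side to its right-hand sides ('' is the empty production).
def expand_ebnf(symbol)
  match = symbol.match(/\A(\w+)([*+?])\z/)
  return [symbol, {}] unless match

  base = match[1]
  op   = match[2]
  name = "#{base.downcase}_#{SUFFIXES.fetch(op)}"

  rhs = case op
        when '*' then ['', "#{base} #{name}"]   # zero or more (from the text)
        when '+' then [base, "#{base} #{name}"] # one or more (assumed shape)
        when '?' then ['', base]                # zero or one (assumed shape)
        end

  [name, { name => rhs }]
end

name, productions = expand_ebnf('NUM*')
# name is "num_star"; productions maps it to ["", "NUM num_star"]
```

Because the generated names are derived from the base symbol, a hand-written `num_star` production would collide with the generated one, which is why such names should be avoided.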
184
+ As these additional productions are added internally to the grammar a callback functionality is provided to let you know when such an event occurs. The callback proc object can either be specified when the CFG object is created, or by using the {RLTK::CFG#callback} method. The callback will receive three arguments: the production generated, the operator that triggered the generation, and a symbol (:first or :second) specifying which clause of the production this callback is for.
185
+
186
+ ### Helper Functions
187
+
188
+ Once a grammar has been defined you can use the following functions to obtain information about it:
189
+
190
+ * {RLTK::CFG#first_set} - Returns the *first set* for the provided symbol or sentence.
191
+ * {RLTK::CFG#follow_set} - Returns the *follow set* for the provided symbol.
192
+ * {RLTK::CFG#nonterms} - Returns a list of the non-terminal symbols used in the grammar's definition.
193
+ * {RLTK::CFG#productions} - Provides either a hash or array of the grammar's productions.
194
+ * {RLTK::CFG#symbols} - Returns a list of all symbols used in the grammar's definition.
195
+ * {RLTK::CFG#terms} - Returns a list of the terminal symbols used in the grammar's definition.
196
+
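For readers unfamiliar with *first sets*, here is the textbook fixed-point computation applied to the grammar from the Defining Grammars example. It is a standalone sketch, not the code behind {RLTK::CFG#first_set}; the hash-based grammar encoding, the `:epsilon` marker, and the `first_sets` helper are assumptions.

```ruby
require 'set'

# Lowercase symbols are non-terminals, uppercase symbols are terminals.
GRAMMAR = {
  s: [%i[A G D], %i[A a C], %i[B a D], %i[B G C]],
  a: [%i[b]],
  b: [%i[G]]
}.freeze

EPSILON = :epsilon

def first_sets(grammar)
  first   = Hash.new { |hash, key| hash[key] = Set.new }
  changed = true

  # Repeat until no first set grows any further.
  while changed
    changed = false
    grammar.each do |lhs, alternatives|
      alternatives.each do |rhs|
        before       = first[lhs].size
        all_nullable = true

        rhs.each do |sym|
          if grammar.key?(sym)
            first[lhs].merge(first[sym] - [EPSILON])
            # A nullable non-terminal lets the next symbol contribute too.
            next if first[sym].include?(EPSILON)
          else
            first[lhs] << sym
          end
          all_nullable = false
          break
        end

        first[lhs] << EPSILON if all_nullable
        changed = true if first[lhs].size != before
      end
    end
  end

  first
end

sets = first_sets(GRAMMAR)
sets[:s]  # contains :A and :B
sets[:a]  # contains :G
```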
197
+ ## Parsers
198
+
199
+ To create a parser using RLTK simply subclass RLTK::Parser, define the productions of the grammar you wish to parse, and call `finalize`. During finalization RLTK will build an LALR(1) parsing table, which may contain conflicts that can't be resolved with LALR(1) lookahead sets or precedence/associativity information. Traditionally, when parser generators such as **YACC** encounter conflicts during parsing table generation they will resolve shift/reduce conflicts in favor of shifts and reduce/reduce conflicts in favor of the production that was defined first. This means that the generated parsers can't handle ambiguous grammars.
200
+
201
+ RLTK parsers, on the other hand, can handle *all* context-free grammars by forking the parse stack when shift/reduce or reduce/reduce conflicts are encountered. This method is called the GLR parsing algorithm and allows the parser to explore multiple different possible derivations, discarding the ones that don't produce valid parse trees. GLR parsing is more expensive, in both time and space requirements, but these penalties are only paid when a parser for an ambiguous grammar is given an input with multiple parse trees, and as such most parsing should proceed using the faster LALR(1) base algorithm.
202
+
203
+ ### Defining a Grammar
204
+
205
+ Let us look at the simple prefix calculator included with RLTK:
206
+
207
+ ```Ruby
208
+ class PrefixCalc < RLTK::Parser
209
+ production(:e) do
210
+ clause('NUM') {|n| n}
211
+
212
+ clause('PLS e e') { |_, e0, e1| e0 + e1 }
213
+ clause('SUB e e') { |_, e0, e1| e0 - e1 }
214
+ clause('MUL e e') { |_, e0, e1| e0 * e1 }
215
+ clause('DIV e e') { |_, e0, e1| e0 / e1 }
216
+ end
217
+
218
+ finalize
219
+ end
220
+ ```
221
+
222
+ The parser uses the same method for defining productions as the {RLTK::CFG} class. In fact, the parser forwards the {RLTK::Parser.production} and {RLTK::Parser.clause} method invocations to an internal {RLTK::CFG} object after removing the parser specific information. To see a detailed description of grammar definitions please read the Context-Free Grammars section above.
223
+
224
+ It is important to note that the proc objects associated with productions should evaluate to the value you wish the left-hand side of the production to take.
225
+
226
+ The default starting symbol of the grammar is the left-hand side of the first production defined (in this case, _e_). This can be changed using the {RLTK::Parser.start} function when defining your parser.
227
+
228
+ **Make sure you call `finalize` at the end of your parser definition, and only call it once.**
229
+
230
+ ### Shortcuts
231
+
232
+ RLTK provides several shortcuts for common grammar constructs. Right now these shortcuts include the {RLTK::Parser.list} and {RLTK::Parser.nonempty_list} methods. A list may contain 0, 1, or more elements, with an optional token or tokens separating each element. A non-empty list contains **at least** 1 element. A list with only a single list element and an empty separator is equivalent to the Kleene star. Similarly, a non-empty list with only a single list element and an empty separator is equivalent to the Kleene plus.
233
+
234
+ This example shows how these shortcuts may be used to define a list of integers separated by a `:COMMA` token:
235
+
236
+ ```Ruby
237
+ class ListParser < RLTK::Parser
238
+ nonempty_list(:int_list, :INT, :COMMA)
239
+
240
+ finalize
241
+ end
242
+ ```
243
+
244
+ If you wanted to define a list of floats or integers you could define your parser like this:
245
+
246
+ ```Ruby
247
+ class ListParser < RLTK::Parser
248
+ nonempty_list(:mixed_list, [:INT, :FLOAT], :COMMA)
249
+
250
+ finalize
251
+ end
252
+ ```
253
+
254
+ If you don't want to require a separator you can do this:
255
+
256
+ ```Ruby
257
+ class ListParser < RLTK::Parser
258
+ nonempty_list(:mixed_nonsep_list, [:INT, :FLOAT])
259
+
260
+ finalize
261
+ end
262
+ ```
263
+
264
+ You can also use separators that are made up of multiple tokens:
265
+
266
+ ```Ruby
267
+ class ListParser < RLTK::Parser
268
+ nonempty_list(:mixed_nonsep_list, [:INT, :FLOAT], 'COMMA NEWLINE?')
269
+
270
+ finalize
271
+ end
272
+ ```
273
+
274
+ A list may also contain multiple tokens between the separator:
275
+
276
+ ```Ruby
277
+ class ListParser < RLTK::Parser
278
+ nonempty_list(:foo_bar_list, 'FOO BAR', :COMMA)
279
+
280
+ finalize
281
+ end
282
+ ```
283
+
284
+ Lastly, you can mix all of these features together:
285
+
286
+ ```Ruby
287
+ class ListParser < RLTK::Parser
288
+ nonempty_list(:foo_list, ['FOO BAR', 'FOO BAZ+'], :COMMA)
289
+
290
+ finalize
291
+ end
292
+ ```
293
+
294
+ The productions generated by these shortcuts will always evaluate to an array. In the first two examples above the productions will produce a 1-D array containing the values of the `INT` or `FLOAT` tokens. In the last two examples the productions `foo_bar_list` and `foo_list` will produce 2-D arrays where the top level array is composed of tuples corresponding to the values of `FOO`, and `BAR` or one or more `BAZ`s.
295
+
296
+ ### Precedence and Associativity
297
+
298
+ To help you remove ambiguity from your grammars RLTK lets you assign precedence and associativity information to terminal symbols. Productions then get assigned precedence and associativity based on either the last terminal symbol on the right-hand side of the production, or an optional parameter to the {RLTK::Parser.production} or {RLTK::Parser.clause} methods. When an {RLTK::Parser} encounters a shift/reduce error it will attempt to resolve it using the following rules:
299
+
300
+ 1. If there is precedence and associativity information present for all reduce actions involved and for the input token we attempt to resolve the conflict using the following rule. If not, no resolution is possible and the parser generator moves on. This conflict will later be reported to the programmer.
301
+
302
+ 2. The precedences of the actions involved in the conflict are compared (a shift action's precedence is based on the input token), and the action with the highest precedence is selected. If two actions have the same precedence the associativity of the input symbol is used: left associativity means we select the reduce action, right associativity means we select the shift action, and non-associativity means that we have encountered an error.
303
+
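The two rules above can be written down as a small decision function. This is a sketch only: the numeric precedence levels, the `PREC` table, and the `resolve_shift_reduce` helper are hypothetical and do not reflect how RLTK stores precedence internally.

```ruby
# prec maps a terminal to [level, associativity]; higher levels bind tighter.
# production_token is the terminal that gives the reduce action its precedence.
# Returns :shift, :reduce, :error, or :unresolved.
def resolve_shift_reduce(prec, input_token, production_token)
  shift_level, assoc = prec[input_token]
  reduce_level, _    = prec[production_token]

  # Rule 1: every action involved needs precedence information.
  return :unresolved if shift_level.nil? || reduce_level.nil?

  # Rule 2: the higher precedence wins; ties fall back to associativity.
  return :shift  if shift_level > reduce_level
  return :reduce if reduce_level > shift_level

  case assoc
  when :left     then :reduce
  when :right    then :shift
  when :nonassoc then :error
  end
end

# Assumed levels for a calculator grammar.
PREC = {
  PLS: [1, :left],  SUB: [1, :left],
  MUL: [2, :right], DIV: [2, :right]
}.freeze

resolve_shift_reduce(PREC, :MUL, :PLS)  # => :shift
resolve_shift_reduce(PREC, :SUB, :SUB)  # => :reduce
resolve_shift_reduce(PREC, :NUM, :PLS)  # => :unresolved
```

The first call corresponds to seeing `MUL` after `e PLS e`, the second to seeing a second `SUB` after `e SUB e`, and the third to an input token that has no precedence information at all.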
304
+ To assign precedence to terminal symbols you can use the {RLTK::Parser.left}, {RLTK::Parser.right}, and {RLTK::Parser.nonassoc} methods inside your parser class definition. Later declarations of associativity have higher levels of precedence than earlier declarations of the same associativity.
305
+
306
+ Let's look at the infix calculator example now:
307
+
308
+ ```Ruby
309
+ class InfixCalc < RLTK::Parser
310
+
311
+ left :PLS, :SUB
312
+ right :MUL, :DIV
313
+
314
+ production(:e) do
315
+ clause('NUM') { |n| n }
316
+
317
+ clause('LPAREN e RPAREN') { |_, e, _| e }
318
+
319
+ clause('e PLS e') { |e0, _, e1| e0 + e1 }
320
+ clause('e SUB e') { |e0, _, e1| e0 - e1 }
321
+ clause('e MUL e') { |e0, _, e1| e0 * e1 }
322
+ clause('e DIV e') { |e0, _, e1| e0 / e1 }
323
+ end
324
+
325
+ finalize
326
+ end
327
+ ```
328
+
329
+ The standard order of mathematical operations tells us that the correct way to group the operations in the expression `2 + 3 * 4` is `2 + (3 * 4)`. However, our grammar tells us that `(2 + 3) * 4` is also a valid way to parse the expression, leading to a shift/reduce error in the parser. To get rid of the shift/reduce error we need some way to tell the parser how to distinguish between these two parse trees. This is where associativity comes in. If the parser has already read `NUM PLS NUM` and the current symbol is a `MUL` symbol we want to tell the parser to shift the new `MUL` symbol onto the stack and continue on. We do this by making the `MUL` symbol right associative. When the parser generator encounters a shift/reduce error it looks at the token currently being read. If it has no associativity information, the error can't be resolved; if the token is left associative, it will remove the shift action from the parser (leaving only the reduce action); if the token is right associative, it will remove the reduce action from the parser (leaving only the shift action).
330
+
331
+ Now, let us consider the expression `3 - 2 - 1`. Here, the correct way to parse the expression is `(3 - 2) - 1`. To ensure that this case is selected over `3 - (2 - 1)` we can make the `SUB` token left associative. This will cause the symbols `NUM SUB NUM` to be reduced before the second `SUB` symbol is shifted onto the parse stack.
332
+
333
+ Note that, to resolve a shift/reduce or reduce/reduce conflict, precedence and associativity information must be present for all actions involved in the conflict. As such, it isn't enough to simply make the `MUL` and `DIV` tokens right associative; we must also make the `PLS` and `SUB` tokens left associative.
334
+
335
+ ### Token Selectors
336
+
337
+ In many cases productions contain tokens whose value is unimportant. In such situations passing `nil` to the production's action is not useful. To prevent this happening you may use *token selectors*. By placing a period (`.`) in front of a token you can indicate to the parser that the following token is important and you wish for its value to be passed to the action. In the following example selectors are used to only pass the sub-expressions' values to the action:
338
+
339
+ ```Ruby
340
+ class InfixCalc < RLTK::Parser
341
+
342
+ left :PLS, :SUB
343
+ right :MUL, :DIV
344
+
345
+ production(:e) do
346
+ clause('NUM') { |n| n }
347
+
348
+ clause('LPAREN .e RPAREN') { |e| e }
349
+
350
+ clause('.e PLS .e') { |e0, e1| e0 + e1 }
351
+ clause('.e SUB .e') { |e0, e1| e0 - e1 }
352
+ clause('.e MUL .e') { |e0, e1| e0 * e1 }
353
+ clause('.e DIV .e') { |e0, e1| e0 / e1 }
354
+ end
355
+
356
+ finalize
357
+ end
358
+ ```
359
+
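One way to picture what selectors do is as a filter over the values of the right-hand side symbols. The helper below is a hypothetical model (`select_values` is not an RLTK method) that assumes a clause without any selectors passes every value through:

```ruby
# Given a clause's right-hand side and the values of all of its symbols,
# returns the values that would reach the action.
def select_values(rhs, values)
  symbols  = rhs.split
  selected = symbols.each_index.select { |i| symbols[i].start_with?('.') }

  # No selectors: pass every value, as in the earlier examples.
  return values if selected.empty?

  values.values_at(*selected)
end

select_values('LPAREN .e RPAREN', [nil, 5, nil])  # => [5]
select_values('.e PLS .e', [2, nil, 3])           # => [2, 3]
select_values('e PLS e', [2, nil, 3])             # => [2, nil, 3]
```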
360
+ ### Argument Passing for Actions
361
+
362
+ By default the proc objects associated with productions are passed one argument for each symbol on the right-hand side of the production. This can lead to long, unwieldy argument lists. To change this behaviour you can use the {RLTK::Parser.default_arg_type} method, which accepts the `:splat` (default) and `:array` arguments. Any production actions that are defined after a call to {RLTK::Parser.default_arg_type} will use the argument passing method currently set as the default. You can switch between the different argument passing methods by calling {RLTK::Parser.default_arg_type} repeatedly.
363
+
364
+ Individual productions may specify the argument type used by their action via the `arg_type` parameter. If the {RLTK::Parser.production} method is passed an argument type and a block, any clauses defined inside the block will use the argument type specified by the `arg_type` parameter.
365
+
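The two argument styles differ only in how the right-hand side values reach the action. The sketch below uses a hypothetical `call_action` helper to show the difference and is not taken from RLTK's source:

```ruby
# Calls an action with the right-hand side values using the given style.
def call_action(action, values, arg_type)
  case arg_type
  when :splat then action.call(*values) # one argument per symbol
  when :array then action.call(values)  # a single array argument
  else raise ArgumentError, "unknown arg_type: #{arg_type.inspect}"
  end
end

splat_action = ->(e0, _op, e1) { e0 + e1 }
array_action = ->(vals) { vals.first + vals.last }

call_action(splat_action, [2, nil, 3], :splat)  # => 5
call_action(array_action, [2, nil, 3], :array)  # => 5
```

The `:array` style keeps the block signature short for long productions, at the cost of indexing into the array by position.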
366
+ ### The Parsing Environment
367
+
368
+ The parsing environment is the context in which the proc objects associated with productions are evaluated, and can be used to provide helper functions and to keep state while parsing. To define a custom environment simply subclass {RLTK::Parser::Environment} inside your parser definition as follows:
369
+
370
+ ```Ruby
371
+ class MyParser < RLTK::Parser
372
+ ...
373
+
374
+ class Environment < Environment
375
+ def helper_function
376
+ ...
377
+ end
378
+
379
+ ...
380
+ end
381
+
382
+ finalize
383
+ end
384
+ ```
385
+
386
+ (The definition of the Environment class may occur anywhere inside the MyParser class definition.)
387
+
388
+ ### Instantiating Parsers
389
+
390
+ In addition to using the {RLTK::Parser.parse} class method you may also instantiate parser objects. The only difference then is that the parsing environment used between subsequent calls to {RLTK::Parser#parse} is the same object, and therefore allows you to keep persistent state.
391
+
392
+ ### Finalization Options
393
+
394
+ The {RLTK::Parser.finalize} method has several options that you should be aware of:
395
+
396
+ * **explain** - Value should be `true`, `false`, an `IO` object, or a file name. Default value is `false`. If a non `false` (or `nil`) value is specified `finalize` will print an explanation of the parser to $stdout, the provided `IO` object, or the specified file. This explanation will include all of the productions defined, all of the terminal symbols used in the grammar definition, and the states present in the parsing table along with their items, actions, and conflicts.
397
+
398
+ * **lookahead** - Either `true` or `false`. Default value is `true`. Specifies whether the parser generator should build an LALR(1) or LR(0) parsing table. The LALR(1) table may have the same actions as the LR(0) table or fewer reduce actions if it is possible to resolve conflicts using lookahead sets.
399
+
400
+ * **precedence** - Either `true` or `false`. Default value is `true`. Specifies whether the parser generator should use precedence and associativity information to solve conflicts.
401
+
402
+ * **use** - Value should be `false`, the name of a file, or a file object. If the file exists and the parser definition hasn't been modified since the file was written, RLTK will load the parser definition from the file, saving a bunch of time. If the file doesn't exist or the parser has been modified since it was last used, RLTK will save the parser's data structures to this file.
403
+
404
+ ### Parsing Options
405
+
406
+ The {RLTK::Parser.parse} and {RLTK::Parser#parse} methods also have several options that you should be aware of:
407
+
408
+ * **accept** - Either `:first` or `:all`. Default value is `:first`. This option tells the parser to accept the first successful parse-tree found, or all parse-trees that enter the accept state. It only affects the behavior of the parser if the defined grammar is ambiguous.
409
+
410
+ * **env** - This option specifies the environment in which the productions' proc objects are evaluated. The RLTK::Parser.parse class method will create a new RLTK::Parser::Environment on each call unless one is specified. RLTK::Parser objects have an internal, per-instance RLTK::Parser::Environment that is the default value for this option when calling RLTK::Parser#parse.
411
+
412
+ * **parse_tree** - Value should be `true`, `false`, an `IO` object, or a file name. Default value is `false`. If a non `false` (or `nil`) value is specified a DOT language description of all accepted parse trees will be printed out to $stdout, the provided `IO` object, or the specified file.
413
+
414
+ * **verbose** - Value should be `true`, `false`, an `IO` object, or a file name. Default value is `false`. If a non `false` (or `nil`) value is specified a detailed description of the actions of the parser are printed to $stdout, the provided `IO` object, or the specified file as it parses the input.
415
+
416
+ ### Parse Trees
417
+
418
+ The above section briefly mentions the *parse_tree* option. So that this neat feature doesn't get lost in the rest of the documentation here is the tree generated by the Kazoo parser from Chapter 7 of the tutorial when it parses the line `def fib(a) if a < 2 then 1 else fib(a-1) + fib(a-2);`:
419
+
420
+ ![Kazoo parse tree.](https://github.com/chriswailes/RLTK/raw/master/resources/simple_tree.png)
421
+
422
+ ### Parsing Exceptions
423
+
424
+ Calls to {RLTK::Parser.parse} may raise one of four exceptions:
425
+
426
+ * **RLTK::BadToken** - This exception is raised when a token is observed in the input stream that wasn't used in the language's definition.
427
+ * **RLTK::HandledError** - This exception is raised whenever an error production is encountered. The input stream is not actually in the language, but we were able to handle the encountered errors in a way that makes it appear that it is.
428
+ * **RLTK::InternalParserError** - This exception tells you that something REALLY went wrong. Users should never receive this exception.
429
+ * **RLTK::NotInLanguage** - This exception indicates that the input token stream is not in the parser's language.
430
+
431
+ ### Error Productions
432
+
433
+ **Warning: this is the least tested feature of RLTK. If you encounter any problems while using it, please let me know so I can fix any bugs as soon as possible.**
434
+
435
+ When an RLTK parser encounters a token for which there are no more valid actions (and it is on the last parse stack / possible parse-tree path) it will enter error handling mode. In this mode the parser pops states and input off of the parse stack (the parser is a pushdown automaton after all) until it finds a state that has a shift action for the `ERROR` terminal. A dummy `ERROR` terminal is then placed onto the parse stack and the shift action is taken. This error token will have the position information of the token that caused the parser to enter error handling mode. Additional tokens may have been discarded after this token.
436
+
437
+ If the input (including the `ERROR` token) can be reduced immediately the associated error handling proc is evaluated and we continue parsing. If no shift or reduce action is available the parser will begin shifting tokens off of the input stack until a token appears with a valid action in the current state, in which case parsing resumes as normal.
438
+
439
+ The value of an `ERROR` terminal will be an array containing all of the tokens that were discarded while the parser was searching for a valid action.
440
+
441
+ The example below, based on one of the unit tests, shows a very basic usage of error productions:
442
+
443
+ ```Ruby
444
+ class ErrorCalc < RLTK::Parser
445
+ left :ERROR
446
+ right :PLS, :SUB, :MUL, :DIV, :NUM
447
+
448
+ production(:e) do
449
+ clause('NUM') {|n| n}
450
+
451
+ clause('e PLS e') { |e0, _, e1| e0 + e1 }
452
+ clause('e SUB e') { |e0, _, e1| e0 - e1 }
453
+ clause('e MUL e') { |e0, _, e1| e0 * e1 }
454
+ clause('e DIV e') { |e0, _, e1| e0 / e1 }
455
+
456
+ clause('e PLS ERROR e') { |e0, _, err, e1| error("#{err.length} tokens skipped."); e0 + e1 }
457
+ end
458
+
459
+ finalize
460
+ end
461
+ ```
462
+
463
+ ## A Note on Token Naming
464
+
465
+ In the world of RLTK both terminal and non-terminal symbols may contain only alphanumeric characters and underscores. The difference between terminal and non-terminal symbols is that terminals are **ALL\_UPPER\_CASE** and non-terminals are **all\_lower\_case**.
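
The naming rule can be expressed as a pair of regular expressions. The snippet below is only an illustrative sketch: `symbol_kind` is a hypothetical helper that is not part of RLTK, and the library's own check may differ in detail (for instance in how it treats names made up solely of digits or underscores).

```ruby
# Hypothetical helper (not part of RLTK): classify a grammar symbol name
# according to the naming convention described above.
def symbol_kind(name)
  case name.to_s
  when /\A[A-Z0-9_]+\z/ then :terminal     # ALL_UPPER_CASE
  when /\A[a-z0-9_]+\z/ then :nonterminal  # all_lower_case
  else :invalid                            # mixed case or other characters
  end
end

kinds = %w[NUM e arg_list MixedCase].map { |n| [n, symbol_kind(n)] }
kinds.each { |name, kind| puts "#{name}: #{kind}" }
```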
466
+
467
+ ## ASTNode
468
+
469
+ The {RLTK::ASTNode} base class is meant to be a good starting point for implementing your own abstract syntax tree nodes. By subclassing {RLTK::ASTNode} you automagically get features such as tree comparison, notes, value accessors with type checking, child node accessors and `each` and `map` methods (with type checking), and the ability to retrieve the root of a tree from any member node.
470
+
471
+ To create your own AST node classes you subclass the {RLTK::ASTNode} class and then use the {RLTK::ASTNode.child} and {RLTK::ASTNode.value} methods. By declaring the children and values of a node the class will define the appropriate accessors with type checking, know how to pack and unpack a node's children, and know how to handle constructor arguments.
472
+
473
+ Here we can see the definition of several AST node classes that might be used to implement binary operations for a language:
474
+
475
+ ```Ruby
476
+ class Expression < RLTK::ASTNode; end
477
+
478
+ class Number < Expression
479
+ value :value, Fixnum
480
+ end
481
+
482
+ class BinOp < Expression
483
+ value :op, String
484
+
485
+ child :left, Expression
486
+ child :right, Expression
487
+ end
488
+ ```
489
+
490
+ The assignment functions that are generated for the children and values perform type checking to make sure that the AST is well-formed. The type of a child must be a subclass of the {RLTK::ASTNode} class, whereas the type of a value can be any Ruby class. While child and value objects are stored as instance variables it is unsafe to assign to these variables directly, and it is strongly recommended to always use the accessor functions.
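
To see why direct assignment is unsafe, consider the following self-contained sketch of a type-checked accessor. It is a simplified stand-in and not RLTK's implementation: `TypedValue`, `typed_value`, and `DemoNumber` are hypothetical names, and allowing `nil` is an assumption made for the example.

```ruby
# Hypothetical, simplified stand-in for type-checked accessors.
# This is NOT how RLTK::ASTNode is implemented; it only shows the idea.
module TypedValue
  def typed_value(name, type)
    attr_reader name

    # The generated writer rejects objects of the wrong class.
    define_method("#{name}=") do |val|
      unless val.nil? || val.is_a?(type)
        raise TypeError, "#{name} must be a #{type}, got #{val.class}"
      end
      instance_variable_set("@#{name}", val)
    end
  end
end

class DemoNumber
  extend TypedValue
  typed_value :value, Integer
end

n = DemoNumber.new
n.value = 42                        # passes the type check

rejected = begin
  n.value = 'forty-two'             # fails the type check
  false
rescue TypeError
  true
end

# Writing the instance variable directly skips the check entirely,
# which is how a tree can end up malformed.
n.instance_variable_set(:@value, 'forty-two')
bypassed = n.value.is_a?(String)
```

Only the writer method performs the check, so any code that touches the instance variable itself can silently store a value of the wrong type.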
491
+
492
+ When instantiating a subclass of {RLTK::ASTNode} the arguments to the constructor should be the node's values (in order of definition) followed by the node's children (in order of definition). If a constructor is given fewer arguments than the number of values and children, the remaining arguments are assumed to be `nil`. Example:
493
+
494
+ ```Ruby
495
+ # Bar must be defined before Foo refers to it.
495
+ class Bar < RLTK::ASTNode
496
+ value :a, String
497
+ end
498
+
499
+ class Foo < RLTK::ASTNode
500
+ value :a, Fixnum
501
+ child :b, Bar
502
+ value :c, String
503
+ child :d, Bar
504
+ end
505
+
506
+ Foo.new(1, 'baz', Bar.new)
507
+ ```
508
+
509
+ Lastly, the type of a child or value can be defined as an array of objects of a specific type as follows:
510
+
511
+ ```Ruby
512
+ class Foo < RLTK::ASTNode
513
+ value :strings, [String]
514
+ end
515
+ ```
516
+
517
+ ### Tree Iteration and Mapping
518
+
519
+ RLTK Abstract Syntax Trees may be [traversed](http://en.wikipedia.org/wiki/Tree_traversal) in three different ways:
520
+
521
+ * Pre-order
522
+ * Post-order
523
+ * Level-order
524
+
525
+ The order in which you wish to traverse the tree can be specified by passing the appropriate symbol to {RLTK::ASTNode#each}: `:pre`, `:post`, or `:level`.
526
+
527
+ You can also map one tree to another tree using the {RLTK::ASTNode#map} and {RLTK::ASTNode#map!} methods. In the former case a new tree is created and returned; in the latter case the current tree is transformed and the result of calling the provided block on the root node is returned. These methods will always visit nodes in *post-order*, so that all children of a node are visited before the node itself.
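
For readers unfamiliar with the three traversal orders, here is a small plain-Ruby sketch that does not use RLTK at all. `DemoNode` and the three helper methods are hypothetical names introduced only for this illustration.

```ruby
# Hypothetical minimal tree node, used only for this illustration.
DemoNode = Struct.new(:name, :children)

# Visit a node before its children.
def pre_order(node, &blk)
  blk.call(node)
  node.children.each { |c| pre_order(c, &blk) }
end

# Visit a node after all of its children.
def post_order(node, &blk)
  node.children.each { |c| post_order(c, &blk) }
  blk.call(node)
end

# Visit nodes one depth level at a time, using a queue.
def level_order(node)
  queue = [node]
  until queue.empty?
    current = queue.shift
    yield current
    queue.concat(current.children)
  end
end

# The tree for (2 * 3) + 4.
tree = DemoNode.new('+', [
  DemoNode.new('*', [DemoNode.new('2', []), DemoNode.new('3', [])]),
  DemoNode.new('4', [])
])

# Collect node names in the requested order.
def names(tree, order)
  out = []
  send(order, tree) { |n| out << n.name }
  out
end

puts names(tree, :pre_order).join(' ')    # + * 2 3 4
puts names(tree, :post_order).join(' ')   # 2 3 * 4 +
puts names(tree, :level_order).join(' ')  # + * 4 2 3
```

Post-order is the natural choice for mapping because a node's replacement can be computed from its already-transformed children.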
528
+
529
+ ## Code Generation
530
+
531
+ RLTK supports the generation of native code and LLVM IR, as well as JIT compilation and execution, through the {RLTK::CG} module. This module is built on top of bindings to [LLVM](http://llvm.org) and provides much, though not all, of the functionality of the LLVM libraries.
532
+
533
+ ### Acknowledgments and Discussion
534
+
535
+ Before we get started with the details, I would like to thank [Jeremy Voorhis](https://github.com/jvoorhis/). The bindings present in RLTK are really a fork of the great work that he did on [ruby-llvm](https://github.com/jvoorhis/ruby-llvm).
536
+
537
+ Why did I fork ruby-llvm, and why might you want to use the RLTK bindings over ruby-llvm? There are a couple of reasons:
538
+
539
+ * **Cleaner Codebase** - The RLTK bindings present a cleaner interface to the LLVM library by conforming to more standard Ruby programming practices, providing better abstractions and cleaner inheritance hierarchies, overloading constructors and other methods properly, and performing type checking on objects to better aid in debugging.
540
+ * **Documentation** - RLTK's bindings provide better documentation.
541
+ * **Completeness** - The RLTK bindings provide several features that are missing from the ruby-llvm project. These include the ability to initialize LLVM for architectures besides x86 (RLTK supports all architectures supported by LLVM), the presence of all of LLVM's optimization passes, the ability to print the LLVM IR representation of modules and values to files and load modules *from* files, easy initialization of native architectures, initialization for ASM printers and parsers, and compiling modules to object files.
542
+ * **Ease of Use** - Several features have been added to make generating code easier such as automatic management of memory resources used by LLVM.
543
+ * **Speed** - The RLTK bindings are ever so slightly faster due to avoiding unnecessary FFI calls.
544
+
545
+ Before you dive into generating code, here are some resources you might want to look over to build up some background knowledge on how LLVM works:
546
+
547
+ * [Static Single Assignment Form](http://en.wikipedia.org/wiki/Static_single_assignment_form)
548
+ * [LLVM Intermediate Representation](http://llvm.org/docs/LangRef.html)
549
+
550
+ ### LLVM
551
+
552
+ Since RLTK's code generation functionality is built on top of LLVM the first step in generating code is to inform LLVM of the target architecture. This is accomplished via the {RLTK::CG::LLVM.init} method, which is used like this: `RLTK::CG::LLVM.init(:PPC)`. The {RLTK::CG::Bindings::ARCHS} constant provides a list of supported architectures. This call must appear before any other calls to the RLTK::CG module.
553
+
554
+ If you would like to see what version of LLVM is targeted by your version of RLTK you can either call the {RLTK::CG::LLVM.version} method or look at the {RLTK::LLVM\_TARGET\_VERSION} constant.
555
+
556
+ ### Modules
557
+
558
+ Modules are one of the core building blocks of the code generation module. Functions, constants, and global variables all exist inside a particular module and, if you use the JIT compiler, a module provides the context for your executing code. New modules can be created using the {RLTK::CG::Module#initialize RLTK::CG::Module.new} method. While this method is overloaded you, as a library user, will always pass it a string as its first argument. This allows you to name your modules for easier debugging later.
559
+
560
+ Once you have created a module you can serialize the code inside of it into *bitcode* via the {RLTK::CG::Module#write\_bitcode} method. This allows you to save partially generated code and then use it later. To load a module from *bitcode* you use the {RLTK::CG::Module.read\_bitcode} method.
561
+
562
+ ### Types
563
+
564
+ Types are an important part of generating code using LLVM. Functions, operations, and other constructs use types to make sure that the generated code is sane. All types in RLTK are subclasses of the {RLTK::CG::Type} class, and have class names that end in "Type". Types can be grouped into two categories: fundamental and composite.
565
+
566
+ Fundamental types are those like {RLTK::CG::Int32Type} and {RLTK::CG::FloatType} that don't take any arguments when they are created. Indeed, these types are represented using a Singleton class, and so the `new` method is disabled. Instead you can use the `instance` method to get an instantiated type, or simply pass in the class itself whenever you need to reference the type. In this last case, the method you pass the class to will instantiate the type for you.
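
The behaviour described above matches Ruby's standard `Singleton` module, which can be demonstrated without RLTK. In this sketch `DemoInt32Type` and `resolve_type` are hypothetical stand-ins; how RLTK itself resolves a class to an instance is not shown here.

```ruby
require 'singleton'

# Hypothetical stand-in for a fundamental type; not an RLTK class.
class DemoInt32Type
  include Singleton
end

# Hypothetical helper: accept either the singleton class or its instance,
# mirroring the "pass the class itself" convenience described above.
def resolve_type(type)
  type.is_a?(Class) ? type.instance : type
end

from_class    = resolve_type(DemoInt32Type)
from_instance = resolve_type(DemoInt32Type.instance)
same_object   = from_class.equal?(from_instance)

# Singleton makes `new` private, so calling it raises NoMethodError.
new_disabled = begin
  DemoInt32Type.new
  false
rescue NoMethodError
  true
end

puts "same object: #{same_object}, new disabled: #{new_disabled}"
```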
567
+
568
+ Composite types are constructed from other types. These include the {RLTK::CG::ArrayType}, {RLTK::CG::FunctionType}, and other classes. These types you must instantiate directly before they can be used, and you may not simply pass the type class as the type argument to functions inside the RLTK::CG module.
569
+
570
+ For convenience, the native integer type of the host platform is made available via {RLTK::CG::NativeIntType}.
571
+
572
+ ### Values
573
+
574
+ The {RLTK::CG::Value} class is the common ancestor of many classes inside the RLTK::CG module. The main way in which you, the library user, will interact with them is when creating constant values. Here is a list of some of the value classes you might use:
575
+
576
+ * {RLTK::CG::Int1}
577
+ * {RLTK::CG::Int8}
578
+ * {RLTK::CG::Int16}
579
+ * {RLTK::CG::Int32}
580
+ * {RLTK::CG::Int64}
581
+ * {RLTK::CG::Float}
582
+ * {RLTK::CG::Double}
583
+ * {RLTK::CG::ConstantArray}
584
+ * {RLTK::CG::ConstantStruct}
585
+
586
+ Again, for convenience, the native integer class of the host platform is made available via {RLTK::CG::NativeInt}.
587
+
588
+ ### Functions
589
+
590
+ Functions in LLVM are much like C functions; they have a return type, argument types, and a body. Functions may be created in several ways, though they all require a module in which to place the function.
591
+
592
+ The first way to create functions is via a module's function collection:
593
+
594
+ ```Ruby
595
+ mod.functions.add('my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
596
+ ```
597
+
598
+ Here we have defined a function named 'my function' in the `mod` module. It takes two native integers as arguments and returns a native integer. It is also possible to define the type of a function ahead of time and pass it to this method:
599
+
600
+ ```Ruby
601
+ type = RLTK::CG::FunctionType.new(RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
602
+ mod.functions.add('my function', type)
603
+ ```
604
+
605
+ Functions may also be created directly via the {RLTK::CG::Function#initialize RLTK::CG::Function.new} method, though a reference to a module is still necessary:
606
+
607
+ ```Ruby
608
+ mod = RLTK::CG::Module.new('my module')
609
+ fun = RLTK::CG::Function.new(mod, 'my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
610
+ ```
611
+
612
+ or
613
+
614
+ ```Ruby
615
+ mod = RLTK::CG::Module.new('my module')
616
+ type = RLTK::CG::FunctionType.new(RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
617
+ fun = RLTK::CG::Function.new(mod, 'my function', type)
618
+ ```
619
+
620
+ Lastly, whenever you use one of these methods to create a function you may give it a block to be executed inside the context of the function object. This allows for easier building of functions:
621
+
622
+ ```Ruby
623
+ mod.functions.add('my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
624
+ bb = blocks.append('entry')
625
+ ...
626
+ end
627
+ ```
628
+
629
+ ### Basic Blocks
630
+
631
+ Once a function has been added to a module you will need to add {RLTK::CG::BasicBlock BasicBlocks} to the function. This can be done easily:
632
+
633
+ ```Ruby
634
+ bb = fun.blocks.append('entry')
635
+ ```
636
+
637
+ We now have a basic block that we can use to add instructions to our function and get it to actually do something. You can also instantiate basic blocks directly:
638
+
639
+ ```Ruby
640
+ bb = RLTK::CG::BasicBlock.new(fun, 'entry')
641
+ ```
642
+
643
+ ### The Builder
644
+
645
+ Now that you have a basic block you need to add instructions to it. This is accomplished using a {RLTK::CG::Builder builder}, either directly or indirectly.
646
+
647
+ To add instructions using a builder directly (this is most similar to how it is done using C/C++) you create the builder, position it where you want to add instructions, and then build them:
648
+
649
+ ```Ruby
650
+ fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
651
+ bb = fun.blocks.append('entry')
652
+
653
+ builder = RLTK::CG::Builder.new
654
+
655
+ builder.position_at_end(bb)
656
+
657
+ # Generate an add instruction.
658
+ inst0 = builder.add(fun.params[0], fun.params[1])
659
+
660
+ # Generate a return instruction.
661
+ builder.ret(inst0)
662
+ ```
663
+
664
+ You can get rid of some of those references to the builder by using the {RLTK::CG::Builder#build} method:
665
+
666
+ ```Ruby
667
+ fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
668
+ bb = fun.blocks.append('entry')
669
+
670
+ builder = RLTK::CG::Builder.new
671
+
672
+ builder.build(bb) do
673
+ ret add(fun.params[0], fun.params[1])
674
+ end
675
+ ```
676
+
677
+ To get rid of more code:
678
+
679
+ ```Ruby
680
+ fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
681
+ bb = fun.blocks.append('entry')
682
+
683
+ RLTK::CG::Builder.new(bb) do
684
+ ret add(fun.params[0], fun.params[1])
685
+ end
686
+ ```
687
+
688
+ or
689
+
690
+ ```Ruby
691
+ fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
692
+ fun.blocks.append('entry') do
693
+ ret add(fun.params[0], fun.params[1])
694
+ end
695
+ ```
696
+
697
+ or even
698
+
699
+ ```Ruby
700
+ mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
701
+ blocks.append('entry') do |fun|
702
+ ret add(fun.params[0], fun.params[1])
703
+ end
704
+ end
705
+ ```
706
+
707
+ In the last two examples a new builder object is created for the block. It is possible to specify the builder to be used:
708
+
709
+ ```Ruby
710
+ builder = RLTK::CG::Builder.new
711
+
712
+ mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
713
+ blocks.append('entry', builder) do |fun|
714
+ ret add(fun.params[0], fun.params[1])
715
+ end
716
+ end
717
+ ```
718
+
719
+ For an example of where this is useful, see the Kazoo tutorial.
720
+
721
+ ### The Contractor
722
+
723
+ An alternative to using the {RLTK::CG::Builder} class is to use the {RLTK::CG::Contractor} class, which is a subclass of the Builder and includes the Filigree::Visitor module. (Get it? It's a visiting builder!) By subclassing the Contractor you can define blocks of code for handling various types of AST nodes and leave the selection of the correct code up to the {RLTK::CG::Contractor#visit} method. In addition, the `:at` and `:rcb` options to the *visit* method make it much easier to manage the positioning of the Builder.
724
+
725
+ Here we can see how easy it is to define a block that builds the instructions for binary operations:
726
+
727
+ ```Ruby
728
+ on Binary do |node|
729
+ left = visit node.left
730
+ right = visit node.right
731
+
732
+ case node
733
+ when Add then fadd(left, right, 'addtmp')
734
+ when Sub then fsub(left, right, 'subtmp')
735
+ when Mul then fmul(left, right, 'multmp')
736
+ when Div then fdiv(left, right, 'divtmp')
737
+ when LT then ui2fp(fcmp(:ult, left, right, 'cmptmp'), RLTK::CG::DoubleType, 'booltmp')
738
+ end
739
+ end
740
+ ```
741
+
742
+ AST nodes whose translation requires the generation of control flow will require the creation of new BasicBlocks and the repositioning of the builder. This can be easily managed:
743
+
744
+ ```Ruby
745
+ on If do |node|
746
+ cond_val = visit node.cond
747
+ cond_val = fcmp :one, cond_val, ZERO, 'ifcond'
748
+
749
+ start_bb = current_block
750
+ fun = start_bb.parent
751
+
752
+ then_bb = fun.blocks.append('then')
753
+ then_val, new_then_bb = visit node.then, at: then_bb, rcb: true
754
+
755
+ else_bb = fun.blocks.append('else')
756
+ else_val, new_else_bb = visit node.else, at: else_bb, rcb: true
757
+
758
+ merge_bb = fun.blocks.append('merge', self)
759
+ phi_inst = build(merge_bb) { phi RLTK::CG::DoubleType, {new_then_bb => then_val, new_else_bb => else_val}, 'iftmp' }
760
+
761
+ build(start_bb) { cond cond_val, then_bb, else_bb }
762
+
763
+ build(new_then_bb) { br merge_bb }
764
+ build(new_else_bb) { br merge_bb }
765
+
766
+ returning(phi_inst) { target merge_bb }
767
+ end
768
+ ```
769
+
770
+ More extensive examples of how to use the Contractor class can be found in the Kazoo tutorial chapters.
771
+
772
+ ### Execution Engines
773
+
774
+ Once you have generated your code you may want to run it. RLTK provides bindings to both the LLVM interpreter and JIT compiler to help you do just that. Creating a JIT compiler is pretty simple.
775
+
776
+ ```Ruby
777
+ mod = RLTK::CG::Module.new('my module')
778
+ jit = RLTK::CG::JITCompiler.new(mod)
779
+
780
+ fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
781
+ blocks.append('entry', nil, nil, self) do |fun|
782
+ ret add(fun.params[0], fun.params[1])
783
+ end
784
+ end
785
+ ```
786
+
787
+ Now you can run your 'add' function like this:
788
+
789
+ ```Ruby
790
+ jit.run(fun, 1, 2)
791
+ ```
792
+
793
+ The result will be a {RLTK::CG::GenericValue} object, and you will want to use its {RLTK::CG::GenericValue#to\_i #to\_i} or {RLTK::CG::GenericValue#to\_f #to\_f} method to get the result as a Ruby value.
794
+
795
+ ## Tutorial
796
+
797
+ What follows is an in-depth example of how to use the Ruby Language Toolkit. This tutorial will show you how to use RLTK to build a lexer, parser, AST nodes, and compiler to create a toy language called Kazoo. The tutorial is based on the LLVM [Kaleidoscope](http://llvm.org/docs/tutorial/) tutorial, but has been modified to:
798
+
799
+ * a) be done in Ruby
800
+ * 2) use a lexer and parser generator and
801
+ * III) use a language that I call Kazoo, which is really just a cleaned up and simplified version of the Kaleidoscope language used in the LLVM tutorial (as opposed to the [Kaleidoscope language](http://en.wikipedia.org/wiki/Kaleidoscope_%28programming_language%29) from the 90′s).
802
+
803
+ The Kazoo toy language is a procedural language that allows you to define functions, use conditionals, and perform basic mathematical operations. Over the course of the tutorial we’ll extend Kazoo to support the if/then/else construct, for loops, JIT compilation, and a simple command line interface to the JIT.
804
+
805
+ Because we want to keep things simple the only datatype in Kazoo is a 64-bit floating point type (a C double or a Ruby float). As such, all values are implicitly double precision and the language doesn’t require type declarations. This gives the language a very nice and simple syntax. For example, the following code computes Fibonacci numbers:
806
+
807
+ ```
808
+ def fib(x)
809
+ if x < 3 then
810
+ 1
811
+ else
812
+ fib(x-1) + fib(x-2)
813
+ ```
814
+
815
+ The tutorial is organized as follows:
816
+
817
+ * [Chapter 1: The Lexer](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%201/Chapter1.md)
818
+ * [Chapter 2: The AST Nodes](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%202/Chapter2.md)
819
+ * [Chapter 3: The Parser](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%203/Chapter3.md)
820
+ * [Chapter 4: AST Translation](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%204/Chapter4.md)
821
+ * [Chapter 5: JIT Compilation](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%205/Chapter5.md)
822
+ * [Chapter 6: Adding Control Flow](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%206/Chapter6.md)
823
+ * [Chapter 7: Playtime](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%207/Chapter7.md)
824
+ * [Chapter 8: Mutable Variables](https://github.com/chriswailes/RLTK/blob/master/examples/kazoo/chapter%208/Chapter8.md)
825
+
826
+ Before starting this tutorial you should be familiar with regular expressions, understand the basic ideas behind lexing and parsing, and be able to read context-free grammar (CFG) definitions. By the end of this tutorial we will have written 372 lines of source code and have a JIT compiler for a Turing complete language.
827
+
828
+ ## Provided Lexers and Parsers
829
+
830
+ The following lexer and parser classes are included as part of RLTK:
831
+
832
+ * {RLTK::Lexers::Calculator}
833
+ * {RLTK::Lexers::EBNF}
834
+ * {RLTK::Parsers::PrefixCalc}
835
+ * {RLTK::Parsers::InfixCalc}
836
+ * {RLTK::Parsers::PostfixCalc}
837
+
838
+ ## Contributing
839
+
840
+ If you are interested in contributing to RLTK there are many aspects of the library that you can work on. A detailed TODO list can be found [here](https://github.com/chriswailes/RLTK/blob/master/TODO.md). If you are looking for smaller units of work feel free to:
841
+
842
+ * Help provide unit tests. Not all of RLTK is tested as well as it could be. Specifically, more tests for the RLTK::CFG and RLTK::Parser classes would be appreciated.
843
+ * Write lexers or parsers that you think others might want to use. Possibilities include HTML, JSON/YAML, JavaScript, and Ruby.
844
+ * Extend the RLTK::CFG class with additional functionality.
845
+
846
+ Lastly, I love hearing back from users. If you find any part of the documentation unclear or incomplete let me know. It is also helpful to me to know how people are using the library, so if you are using RLTK in your project send me an email. This lets me know what features are being used and where I should focus my development efforts.
847
+
848
+ ## News
849
+
850
+ Aaaaand we're back. Development of RLTK has been on hold for a while as I worked on other projects. If you want to see what I've been up to, you can check out [Clang's](http://llvm.org/clang) new `-Wconsumed` flag and the [Filigree](http://github.com/chriswailes/filigree) gem.
851
+
852
+ The next version of RLTK is going to be updated to require Ruby 2.0 as well as LLVM 3.4. Previous versions of RLTK required my LLVM-ECB library to expose extra LLVM features through the C bindings; this is no longer necessary as this functionality has been moved into LLVM proper. If anyone has any requests for new or improved features for RLTK version 3.0, let me know.