oedipus_lex 2.0.0

@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: ae00ca9bebb1e13070e5b9ee44b4cfa762333cce
  data.tar.gz: 5b798da04b678300ed976de7a281fcbb6d39a979
SHA512:
  metadata.gz: 86902463f0f518bca1f7a02412a70e9e90aa95cd6be8169a48ed88a11228b520ffb11ea9cccb5e212c8e2fe3183e4a4b54a73c2aabbee8e61ac15147671028ef
  data.tar.gz: c0b06ce63410490eafdfa2d8ee5758d7e92dd824d444b9e30767ce65e82c5439a04c51e4959bc97d5f2b55b80968f3b046320f92c2be9944b7b7444b7c189265
Binary file
@@ -0,0 +1,26 @@
# -*- ruby -*-

require "autotest/restart"

Autotest.add_hook :initialize do |at|
  at.testlib = "minitest/autorun"
  at.add_exception "tmp"

  # at.extra_files << "../some/external/dependency.rb"
  #
  # at.libs << ":../some/external"
  #
  # at.add_exception "vendor"
  #
  # at.add_mapping(/dependency.rb/) do |f, _|
  #   at.files_matching(/test_.*rb$/)
  # end
  #
  # %w(TestA TestB).each do |klass|
  #   at.extra_class_map[klass] = "test/test_misc.rb"
  # end
end

# Autotest.add_hook :run_command do |at|
#   system "rake build"
# end
File without changes
@@ -0,0 +1,6 @@
=== 1.0.0 / 2013-12-13

* 1 major enhancement

  * Birthday!

@@ -0,0 +1,27 @@
.autotest
History.rdoc
Manifest.txt
README.rdoc
Rakefile
lib/oedipus_lex.rake
lib/oedipus_lex.rb
lib/oedipus_lex.rex
lib/oedipus_lex.rex.rb
rex-mode.el
sample/calc3.racc
sample/calc3.rex
sample/error1.rex
sample/error1.txt
sample/error2.rex
sample/sample.html
sample/sample.rex
sample/sample.xhtml
sample/sample1.c
sample/sample1.rex
sample/sample2.bas
sample/sample2.rex
sample/xhtmlparser.html
sample/xhtmlparser.racc
sample/xhtmlparser.rex
sample/xhtmlparser.xhtml
test/test_oedipus_lex.rb
@@ -0,0 +1,468 @@
= Oedipus Lex - This is not your father's lexer

home :: http://github.com/seattlerb/oedipus_lex
rdoc :: http://docs.seattlerb.org/oedipus_lex

== DESCRIPTION

Oedipus Lex is a lexer generator in the same family as Rexical and
Rex. Oedipus Lex is my independent lexer fork of Rexical. Rexical was
in turn a fork of Rex. We've been unable to contact the author of rex
in order to take it over, fix it up, extend it, and relicense it to
MIT. So, Oedipus was written clean-room in order to bypass licensing
constraints (and because bootstrapping is fun).

Oedipus brings a lot of extras to the table and at this point is only
historically related to rexical. The syntax has changed enough that
any rexical lexer will have to be tweaked to work inside of oedipus.
At the very least, you need to add slashes to all your regexps.

Oedipus, like rexical, is based primarily on generating code much like
you would write for a hand-written lexer. It is _not_ a table or hash
driven lexer. It uses StringScanner within a multi-level case
statement. As such, Oedipus matches on the _first_ match, not the
longest (as lex and its ilk do).

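To make the first-match behavior concrete, here is a minimal hand-written lexer in the same style: a `StringScanner` inside a `case` statement. It is only a sketch of the technique, not code generated by Oedipus, and the class name `FirstMatchLexer` is made up. Because `/=/` is listed before `/==/`, the input `"=="` comes out as two `:assign` tokens, where a longest-match lexer would produce a single `==` token.

```ruby
require "strscan"

# Illustrative first-match lexer (hypothetical class, not Oedipus output).
# Rules are tried strictly in order; the first one that matches wins.
class FirstMatchLexer
  def initialize(str)
    @ss = StringScanner.new(str)
  end

  def next_token
    until @ss.eos?
      case
      when @ss.skip(/\s+/)        then nil # skip whitespace, keep looping
      when text = @ss.scan(/=/)   then return [:assign, text]
      when text = @ss.scan(/==/)  then return [:equal, text] # never reached: /=/ wins
      when text = @ss.scan(/\w+/) then return [:word, text]
      else raise ArgumentError, "can't lex: #{@ss.rest.inspect}"
      end
    end
    nil # end of input
  end

  def tokens
    result = []
    while (tok = next_token)
      result << tok
    end
    result
  end
end

p FirstMatchLexer.new("a == b").tokens
# => [[:word, "a"], [:assign, "="], [:assign, "="], [:word, "b"]]
```

Reordering the two rules (`/==/` first) would be the usual fix in a first-match lexer.
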
This documentation is not meant to bypass any prerequisite knowledge
on lexing or parsing. If you'd like to study the subject in further
detail, please try [TIN321] or the [LLVM Tutorial] or some other good
resource for CS learning. Books... books are good. I like books.

== Syntax:

  lexer          = (misc_line)*
                   /class/ class_id
                   (option_section)?
                   (inner_section)?
                   (start_section)?
                   (macro_section)?
                   (rule_section)?
                   /end/
                   (misc_line)*

  misc_line      = /.*/

  class_id       = /\w+.*/

  option_section = /options?/ NL (option)*
  option         = /stub/i
                 | /debug/i

  inner_section  = /inner/ NL (misc_line)*

  start_section  = /start/ NL (misc_line)*

  macro_section  = /macros?/ NL (macro)*
  macro          = name regexp
  name           = /\w+/
  regexp         = /(\/(?:\\.|[^\/])+\/[io]?)/

  rule_section   = /rules?/ NL (rule)*
  rule           = (state)? regexp (action)?
  state          = label
                 | predicate
  label          = /:\w+/
  predicate      = /\w+\?/
  action         = name
                 | /\{.*\}.*/

=== Basic Example

  class Calculator
  macros
    NUMBER /\d+/
  rules
         /rpn/       :RPN  # sets @state to :RPN
         /#{NUMBER}/ { [:number, text.to_i] }
         /\s+/
         /[+-]/      { [:op, text] }

    :RPN /\s+/
    :RPN /[+-]/      { [:op2, text] }
    :RPN /#{NUMBER}/ { [:number2, text.to_i] }
    :RPN /alg/       nil   # clears state
  end

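The sketch below hand-expands the Calculator rules into plain Ruby to show roughly how a first-match, state-aware `next_token` behaves. It is an approximation under stated assumptions, not Oedipus's generated code. In particular, the `/rpn/` and `/alg/` rules here simply switch `state` and keep scanning, which may differ from how the generated lexer reports state changes. `CalculatorSketch` is a made-up name.

```ruby
require "strscan"

# Hand-expanded approximation of the Calculator rules (hypothetical
# class name; not Oedipus output). :RPN is an exclusive state, so only
# the :RPN rules are tried while it is active.
class CalculatorSketch
  NUMBER = /\d+/ # the NUMBER macro as a constant

  attr_accessor :state

  def initialize(str)
    @ss    = StringScanner.new(str)
    @state = nil
  end

  def next_token
    until @ss.eos?
      case state
      when nil
        case
        when @ss.skip(/rpn/)              then self.state = :RPN # assumption: switch and keep scanning
        when text = @ss.scan(/#{NUMBER}/) then return [:number, text.to_i]
        when @ss.skip(/\s+/)              then nil # ignore whitespace
        when text = @ss.scan(/[+-]/)      then return [:op, text]
        else raise ArgumentError, "unexpected input: #{@ss.rest.inspect}"
        end
      when :RPN
        case
        when @ss.skip(/\s+/)              then nil
        when text = @ss.scan(/[+-]/)      then return [:op2, text]
        when text = @ss.scan(/#{NUMBER}/) then return [:number2, text.to_i]
        when @ss.skip(/alg/)              then self.state = nil # clears state
        else raise ArgumentError, "unexpected input: #{@ss.rest.inspect}"
        end
      else
        raise ArgumentError, "unknown state: #{state.inspect}"
      end
    end
    nil
  end
end

lexer  = CalculatorSketch.new("1 + 2 rpn 3 4 + alg 5")
tokens = []
while (tok = lexer.next_token)
  tokens << tok
end
p tokens
# => [[:number, 1], [:op, "+"], [:number, 2], [:number2, 3], [:number2, 4], [:op2, "+"], [:number, 5]]
```
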
==== Header

Anything before the class line is considered the "header" and will be
added to the top of your file. This includes extra lines like module
namespacing.

==== Class Line

The class line, like a regular ruby class declaration, specifies what
class all of the lexer code belongs to. You may simply specify a class
name like:

  class MyLexer

or it may specify a superclass as well:

  class MyLexer < MyParser

You might use the latter form to mix your lexer and your racc parser
together.

Personally, I recommend keeping them apart for cleanliness and
testability.

==== Options

There are currently only two options for Oedipus: "debug" and "stub".

Specify `debug` to turn on basic tracing output.

Specify `stub` to generate a generic handler that processes all files
specified on the command line. This makes it easy to get up and
running before you have the rest of your system in place.

==== Inner

The inner section is just code, like the header or footer, but inner
gets put _inside_ the class body. You can put extra methods here.

Personally, I recommend you don't use inner and you put all of your
extra methods and class code in a separate file. This makes lexer
generation faster and keeps things separate and small.

==== Macros

Macros define named regexps that you can use via interpolation inside
subsequent macros or within rule matchers.

==== Start

The lexer runs in a loop until it finds a match or has to bail. Use
the `start` section to place extra code at the top of your
`next_token` method, before the loop. Eg:

  start
    space_seen = false

This code will get expanded into the very top of the lexer method. Do
note that this code gets run before _every token_, not just on lexer
initialization.

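A minimal sketch of why `start` code re-runs per token, assuming it lands at the top of `next_token` as described. `StartSketch` and the `:spaced_word` token type are made up for illustration. `space_seen` is reset on every call, so it only records whitespace skipped *before the current token*.

```ruby
require "strscan"

# Sketch of where `start` code ends up (hypothetical class, not Oedipus
# output): at the top of next_token, so it re-runs for every token.
class StartSketch
  def initialize(str)
    @ss = StringScanner.new(str)
  end

  def next_token
    space_seen = false # <- code from the `start` section

    until @ss.eos?
      if @ss.skip(/\s+/)
        space_seen = true
      elsif (text = @ss.scan(/\w+/))
        return [space_seen ? :spaced_word : :word, text]
      else
        raise ArgumentError, "unexpected input: #{@ss.rest.inspect}"
      end
    end
    nil
  end
end

lexer = StartSketch.new("a b  c")
3.times.map { lexer.next_token }
# => [[:word, "a"], [:spaced_word, "b"], [:spaced_word, "c"]]
```
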
==== Rules

The rule section is the meat of the lexer. It contains one or more
rule lines where each line consists of:

* a required state (as a `:symbol`), a predicate method, or nothing.
* a regular expression.
* an action method, an action block, or nothing.

More often than not, a rule should not specify a required state. Only
use them when you're convinced you need them.

So a rule can be very simple, including _just_ a regexp:

  rules
    /#.*/ # ignore comments

or can contain any combination of state checks or action types:

  rules
    :state     /token/   action_method
    predicate? /another/ { do_something }

===== States and Predicates

In order for the tokenizer to determine if the rule's regexp should
even be considered, a rule may specify a required state, a predicate
method to call, or leave it blank.

If the rule does not specify a state, it can be used whenever `@state`
is nil or an inclusive state (a symbol that starts lowercase). If
`@state` is an exclusive state (a symbol that starts uppercase), _only_
the rules that specify that exact state will be used.

Alternatively, a rule may specify a predicate method to check. If that
method returns a truthy value, the rule is currently valid. This is
equivalent to setting the required state to nil, as it will be used
with inclusive and nil states, and ignored for exclusive states.

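The selection policy above can be summarized in a few lines of Ruby. This is a sketch of the described behavior only, not how Oedipus implements it. `RuleSketch`, `applicable_rules`, and `Mood` are hypothetical names, and a guard ending in `?` is assumed to mean "predicate method".

```ruby
# Sketch of the rule-selection policy described above (hypothetical
# helper, not Oedipus's generated code). A guard is nil, a state Symbol
# such as :rpn or :OP, or a predicate name ending in "?".
RuleSketch = Struct.new(:guard, :regexp)

def applicable_rules(rules, state, receiver)
  exclusive = !state.nil? && state.to_s.match?(/\A[A-Z]/)

  rules.select do |rule|
    guard = rule.guard
    if guard.nil?
      !exclusive                         # no state: nil or inclusive states
    elsif guard.to_s.end_with?("?")
      !exclusive && receiver.send(guard) # predicate: like nil, if truthy
    else
      guard == state                     # explicit state must match exactly
    end
  end
end

class Mood
  def happy?; true;  end
  def sad?;   false; end
end

rules = [
  RuleSketch.new(nil,     /a/),
  RuleSketch.new(:rpn,    /b/),
  RuleSketch.new(:OP,     /c/),
  RuleSketch.new(:happy?, /d/),
  RuleSketch.new(:sad?,   /e/),
]

applicable_rules(rules, :rpn, Mood.new).map { |r| r.regexp.source } # => ["a", "b", "d"]
applicable_rules(rules, :OP,  Mood.new).map { |r| r.regexp.source } # => ["c"]
```

Within the selected rules, source order still decides which one matches first.
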
==== End & Footer

Like the header, anything after the end line is considered the
"footer" and will be added to the bottom of your file.

== Suggested Structure

Here's how I suggest you structure things:

=== Rakefile

You only need a minimum of dependencies to wire stuff up if you use
the supplied rake rule.

  Rake.application.rake_require "oedipus_lex"

  task :lexer  => "lib/mylexer.rex.rb"
  task :parser => :lexer # plus appropriate parser rules/deps
  task :test   => :parser

=== lib/mylexer.rex

Put your lexer definition here. It will generate into
`"lib/mylexer.rex.rb"`.

  class MyLexer
  macros
    # ...
  rules
    # ...
  end

=== lib/mylexer.rb

  require "mylexer.rex"

  class MyLexer
    # ... predicate methods and stuff
  end

=== lib/myparser.rb

Assuming you're using a racc based parser, you'll need to define a
`next_token` method that bridges over to your lexer:

  class MyParser
    def next_token
      lexer.next_token # plus any sanity checking / error handling...
    end
  end

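Here is a sketch of what "plus any sanity checking" might look like. `TinyLexer` and `BridgeParser` are hypothetical stand-ins for a generated lexer and a racc parser. The sketch assumes racc's convention that a token whose type is `false` (or `nil`) means end of input.

```ruby
# Sketch of the parser-to-lexer bridge. TinyLexer is a hypothetical
# stand-in for a generated lexer; BridgeParser stands in for a racc parser.
class TinyLexer
  def initialize(tokens)
    @tokens = tokens.dup
  end

  def next_token
    @tokens.shift # nil once the input is exhausted
  end
end

class BridgeParser
  attr_reader :lexer

  def initialize(lexer)
    @lexer = lexer
  end

  # racc calls this whenever it needs another token.
  def next_token
    token = lexer.next_token
    return [false, false] if token.nil? # end-of-input marker

    unless token.is_a?(Array) && token.size == 2
      raise ArgumentError, "bad token from lexer: #{token.inspect}"
    end
    token
  end
end

parser = BridgeParser.new(TinyLexer.new([[:number, 1], [:op, "+"]]))
parser.next_token # => [:number, 1]
parser.next_token # => [:op, "+"]
parser.next_token # => [false, false]
```
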
== Differences with Rexical

If you're already familiar with rexical, this might help you get up
and running faster. If not, it could provide an overview of the
value-added.

=== Additions or Changes

==== A generic rake rule is defined for rex files.

Oedipus defines a rake rule that allows you to simply define a
file-based dependency, and rake will take care of the rest. Eg:

  file "lib/mylexer.rex.rb" => "lib/mylexer.rex"

  task :generated => %w[lib/mylexer.rex.rb]

  task :test => :generated

==== All regular expressions must be slash delimited.

Basically, regexps are now plain slashed ruby regexps. This allows
regexp flags to be provided individually: rather than specifying that
an entire grammar is case-insensitive, you can make a single rule
case-insensitive.

Right now only `/i` and `/o` are properly handled.

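Because each rule is an ordinary Ruby regexp, a flag only affects the rule it is written on. The sketch below shows that with a first-match loop over two made-up rules (`FLAG_RULES` and `first_match` are hypothetical). Only the keyword rule ignores case, so `"Selector"` matches nothing.

```ruby
require "strscan"

# Sketch: every rule is a plain Ruby regexp, so a flag such as /i only
# affects the rule it is written on. Rule list and names are made up.
FLAG_RULES = [
  [/select\b/i, :keyword], # case-insensitive rule
  [/[a-z]+/,    :ident],   # ordinary, case-sensitive rule
]

def first_match(str)
  ss = StringScanner.new(str)
  FLAG_RULES.each do |regexp, type|
    text = ss.scan(regexp) # anchored at the current scan position
    return [type, text] if text
  end
  nil
end

first_match("SELECT x") # => [:keyword, "SELECT"]
first_match("selector") # => [:ident, "selector"]
first_match("Selector") # => nil
```
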
==== Regular expressions now use ruby interpolation.

Instead of `aaa{{macro}}ccc` it is `/aaa#{MACRO}ccc/`.

==== Macros define class constants.

Macros simply become class constants inside the lexer class. This
makes them immediately available to other macros and to the regexps in
the rules section.

This also implies that they must start uppercase, since that is
required by ruby.

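Since macros are just constants holding regexps, plain Ruby shows how they compose. This is a sketch with hypothetical names (`MacroSketch`, `DIGITS`, `FLOAT`, `NUMBER`). Interpolating one regexp into another embeds it as a group, so later macros and rules can build on earlier ones. A lowercase name would be a local variable rather than a constant, and so would not be visible inside methods.

```ruby
# Sketch: macros as class constants (hypothetical names). Later macros
# interpolate earlier ones, just like rule regexps do.
class MacroSketch
  DIGITS = /\d+/
  FLOAT  = /#{DIGITS}\.#{DIGITS}/ # built from another macro
  NUMBER = /#{FLOAT}|#{DIGITS}/   # alternatives tried left to right

  def self.number?(str)
    !!(str =~ /\A#{NUMBER}\z/)
  end
end

MacroSketch.number?("3.14") # => true
MacroSketch.number?("42")   # => true
MacroSketch.number?("4.")   # => false
```
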
==== Rules can be activated by predicate methods.

Instead of just switching on state, rules can now check predicate
methods to see if they should trigger. Eg:

  rules
    sad?   /\w+/ { [:sad, text] }
    happy? /\w+/ { [:happy, text] }
  end

  # elsewhere:
  def sad?
    # ...
  end

  def happy?
    not sad?
  end

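Here is a hand-expanded sketch of those two predicate rules, assuming the predicate is checked before the regexp so that a false predicate never consumes input. `MoodLexer` and its `mood` flag are made up to give `sad?` something to test.

```ruby
require "strscan"

# Hand-expanded sketch of predicate-guarded rules (hypothetical class).
# The predicate is checked before the regexp, so a false predicate never
# consumes input. The mood flag is a made-up example of lexer state.
class MoodLexer
  attr_accessor :mood

  def initialize(str, mood = :happy)
    @ss   = StringScanner.new(str)
    @mood = mood
  end

  def sad?
    mood == :sad
  end

  def happy?
    not sad?
  end

  def next_token
    @ss.skip(/\s+/)
    return nil if @ss.eos?

    case
    when sad?   && (text = @ss.scan(/\w+/)) then [:sad, text]
    when happy? && (text = @ss.scan(/\w+/)) then [:happy, text]
    else raise ArgumentError, "unexpected input: #{@ss.rest.inspect}"
    end
  end
end

lexer = MoodLexer.new("hello cruel world")
lexer.next_token # => [:happy, "hello"]
lexer.mood = :sad
lexer.next_token # => [:sad, "cruel"]
```
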
==== Rule actions are limited to a single line.

In order to push complexity down, `{ rule actions }` may only be a
single line.

==== Rules can invoke methods.

For more complex actions, use a method by specifying its name:

  rules
    /\w+/ process_word
  end

And then define the handler method to return a result pair:

  def process_word text
    # do lots of normalization...
    [:word, text]
  end

This strikes a good balance between readability and maintainability.
It also makes it much easier to write unit tests for the complex
actions.

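As a small illustration of the testability point, an action method is a plain method that takes the matched text, so a test can call it directly without running the lexer. `WordLexer` and its normalization (downcase, drop underscores) are hypothetical.

```ruby
# Sketch of a method action like `/\w+/ process_word` (hypothetical
# normalization). Because it is a plain method taking the matched text,
# it can be unit tested without running the lexer at all.
class WordLexer
  def process_word text
    word = text.downcase.delete("_") # made-up normalization
    [:word, word]
  end
end

WordLexer.new.process_word("Hello_World") # => [:word, "helloworld"]
```
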
==== Rules can define state.

There are shortcuts built in to define or clear state:

  rules
         /rpn/ :RPN # sets @state to :RPN
    # ...
    :RPN /alg/ nil  # clears @state

==== Use a `start` section to define pre-lex code.

The lexer runs in a loop until it finds a match or has to bail.
Sometimes more complex lexers need to set some local state. You can
now do this in a `start` section. Eg:

  start
    space_seen = false

This code will get expanded into the very top of the lexer method. Do
note that this code gets run before _every token_, not just on
initialization.

==== Rule state can be inclusive or exclusive.

This actually isn't new from rexical... It just wasn't really well
documented.

Exclusive states start with an uppercase letter (and are generally all
uppercase). Inclusive states start with a lowercase letter. Exclusive
states will _only_ try their own matchers. Inclusive states will also
try any matcher without a state.

In both cases, the order of generated matchers is strictly defined by
the source file. Nothing is re-ordered, ever. Eg:

  rules
         /\d+/
         /\s+/  # used in both nil-state and :rpn state
         /[+-]/

    :rpn /\d+/  # won't hit, because of nil-state matcher above

    :OP  /\s+/  # must define its own because nil-state matchers aren't used
    :OP  /\d+/
  end

==== Default `do_parse` will dispatch to lex_xxx automatically.

The method `do_parse` is generated for you and automatically
dispatches off to user-defined methods named `lex_<token-type>` where
token-type is the first value returned from any matching action. Eg:

  rules
    /\s*(\#.*)/ { [:comment, text] }

  # elsewhere:

  def lex_comment line
    # do nothing
  end

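A hand-written sketch of that dispatch: each `[type, *values]` token is sent to `lex_<type>` with the remaining values as arguments. This approximates the described behavior and is not the generated method. `DispatchSketch`, its canned token list, and `lex_word` are made up.

```ruby
# Sketch of dispatching on token type (hand-written, not the generated
# code). Each [type, *values] token is sent to a lex_<type> method.
class DispatchSketch
  attr_reader :words

  def initialize(tokens)
    @tokens = tokens.dup
    @words  = []
  end

  def next_token
    @tokens.shift
  end

  def do_parse
    while (token = next_token)
      type, *values = token
      send("lex_#{type}", *values) # NoMethodError if lex_<type> is missing
    end
  end

  def lex_comment line
    # do nothing
  end

  def lex_word word
    @words << word
  end
end

parser = DispatchSketch.new([[:comment, "# hi"], [:word, "a"], [:word, "b"]])
parser.do_parse
parser.words # => ["a", "b"]
```
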
==== `text` is passed in, or use `match[n]` or `matches`

You can use the `text` variable for the entire match inside an action,
or you can use `match[n]` to access a specific match group, or
`matches` to get an array of all match groups. Eg:

  /class ([\w:]+)(.*)/ { [:class, *matches] }

In this case, the action will return something like:
`[:class, "ClassName", " < Superclass"]`.

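Here is a sketch of how those helpers relate to the scanner, assuming `match` is the scanner itself and `matches` is its capture groups. `MatchSketch` and `lex_class_line` are hypothetical, and `StringScanner#captures` stands in for whatever the generated `matches` actually does.

```ruby
require "strscan"

# Sketch of the helpers available to an action (hypothetical class;
# `matches` is approximated with StringScanner#captures).
class MatchSketch
  attr_reader :ss

  def initialize(str)
    @ss = StringScanner.new(str)
  end

  def match
    ss # so match[1] is the first capture group
  end

  def matches
    ss.captures # all capture groups of the last successful scan
  end

  def lex_class_line
    text = ss.scan(/class ([\w:]+)(.*)/)
    return nil unless text
    [:class, *matches]
  end
end

lexer = MatchSketch.new("class ClassName < Superclass")
lexer.lex_class_line # => [:class, "ClassName", " < Superclass"]
lexer.match[1]       # => "ClassName"
```
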
==== You can override the scanner class by defining `scanner_class`.

Oedipus will define the method `scanner_class` to return
`StringScanner` unless you define one yourself. Because it uses
reflection to figure out whether you've defined it or not, you may
need to require the generated lexer AFTER you've defined
`scanner_class`. Eg:

  class MyLexer
    # ...

    def scanner_class
      CustomStringScanner
    end

    # ...
  end

  require "my_lexer.rex"

**NOTE:** I'm _totally_ open to better ways of doing this. I simply
needed to get stuff done and this presented itself as _viable-enough_.

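The reflection idea can be sketched with `instance_methods(false)`: the default is only defined if no `scanner_class` exists yet on the class. This is an illustration of the technique, not the generated code. `CustomStringScanner` and `ScanLexer` are hypothetical, and the check can only see methods that already exist at the moment it runs.

```ruby
require "strscan"

# Sketch of "define a default only if the user hasn't" (hypothetical
# classes; not the generated code).
class CustomStringScanner < StringScanner
end

class ScanLexer # user code, loaded first
  def scanner_class
    CustomStringScanner
  end
end

class ScanLexer # "generated" code, loaded afterwards
  unless instance_methods(false).include?(:scanner_class)
    def scanner_class
      StringScanner
    end
  end

  def scanner_for(str)
    scanner_class.new(str)
  end
end

ScanLexer.new.scanner_for("abc").class # => CustomStringScanner
```
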
=== Removals

==== There is no command-line tool.

There is no command-line tool. Instead, use the rake rule described
above.

==== There are only two options: debug and stub.

All other options from rexical have been removed because they don't
make sense in Oedipus.

==== Probably others...

It's hard to think about what I took out. What I added is plain as
day. :P

[TIN321]: http://www.cse.chalmers.se/edu/year/2011/course/TIN321/lectures/proglang-04.html
[LLVM Tutorial]: http://llvm.org/docs/tutorial/LangImpl1.html#language

== Requirements:

* ruby version 1.8.x or later.

== Install

* sudo gem install oedipus_lex

== License

(The MIT License)

Copyright (c) Ryan Davis, seattle.rb

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
'Software'), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.