rubylexer 0.6.2 → 0.7.0

Files changed (76)
  1. data/History.txt +55 -0
  2. data/Manifest.txt +67 -0
  3. data/README.txt +103 -0
  4. data/Rakefile +24 -0
  5. data/howtouse.txt +9 -6
  6. data/{assert.rb → lib/assert.rb} +11 -11
  7. data/{rubylexer.rb → lib/rubylexer.rb} +645 -342
  8. data/lib/rubylexer/0.6.2.rb +39 -0
  9. data/lib/rubylexer/0.6.rb +5 -0
  10. data/lib/rubylexer/0.7.0.rb +2 -0
  11. data/{charhandler.rb → lib/rubylexer/charhandler.rb} +4 -2
  12. data/{charset.rb → lib/rubylexer/charset.rb} +4 -3
  13. data/{context.rb → lib/rubylexer/context.rb} +48 -18
  14. data/{rubycode.rb → lib/rubylexer/rubycode.rb} +5 -3
  15. data/{rulexer.rb → lib/rubylexer/rulexer.rb} +180 -102
  16. data/{symboltable.rb → lib/rubylexer/symboltable.rb} +10 -1
  17. data/{token.rb → lib/rubylexer/token.rb} +72 -20
  18. data/{tokenprinter.rb → lib/rubylexer/tokenprinter.rb} +39 -16
  19. data/lib/rubylexer/version.rb +3 -0
  20. data/{testcode → test/code}/deletewarns.rb +0 -0
  21. data/test/code/dl_all_gems.rb +43 -0
  22. data/{testcode → test/code}/dumptokens.rb +12 -9
  23. data/test/code/locatetest +30 -0
  24. data/test/code/locatetest.rb +49 -0
  25. data/test/code/rubylexervsruby.rb +173 -0
  26. data/{testcode → test/code}/tokentest.rb +62 -51
  27. data/{testcode → test/code}/torment +8 -8
  28. data/test/code/unpack_all_gems.rb +15 -0
  29. data/{testdata → test/data}/1.rb.broken +0 -0
  30. data/{testdata → test/data}/23.rb +0 -0
  31. data/test/data/__end__.rb +2 -0
  32. data/test/data/__end__2.rb +3 -0
  33. data/test/data/and.rb +5 -0
  34. data/test/data/blockassigntest.rb +23 -0
  35. data/test/data/chunky.plain.rb +75 -0
  36. data/test/data/chunky_bacon.rb +112 -0
  37. data/test/data/chunky_bacon2.rb +112 -0
  38. data/test/data/chunky_bacon3.rb +112 -0
  39. data/test/data/chunky_bacon4.rb +112 -0
  40. data/test/data/for.rb +45 -0
  41. data/test/data/format.rb +6 -0
  42. data/{testdata → test/data}/g.rb +0 -0
  43. data/test/data/gemlist.txt +280 -0
  44. data/test/data/heart.rb +7 -0
  45. data/test/data/if.rb +6 -0
  46. data/test/data/jarh.rb +369 -0
  47. data/test/data/lbrace.rb +4 -0
  48. data/test/data/lbrack.rb +4 -0
  49. data/{testdata → test/data}/newsyntax.rb +0 -0
  50. data/{testdata → test/data}/noeolatend.rb +0 -0
  51. data/test/data/p-op.rb +8 -0
  52. data/{testdata → test/data}/p.rb +671 -79
  53. data/{testdata → test/data}/pleac.rb.broken +0 -0
  54. data/{testdata → test/data}/pre.rb +0 -0
  55. data/{testdata → test/data}/pre.unix.rb +0 -0
  56. data/{testdata → test/data}/regtest.rb +0 -0
  57. data/test/data/rescue.rb +35 -0
  58. data/test/data/s.rb +186 -0
  59. data/test/data/strinc.rb +2 -0
  60. data/{testdata → test/data}/tokentest.assert.rb.can +0 -0
  61. data/test/data/untermed_here.rb.broken +2 -0
  62. data/test/data/untermed_string.rb.broken +1 -0
  63. data/{testdata → test/data}/untitled1.rb +0 -0
  64. data/{testdata → test/data}/w.rb +0 -0
  65. data/{testdata → test/data}/wsdlDriver.rb +0 -0
  66. data/testing.txt +6 -4
  67. metadata +163 -59
  68. data/README +0 -134
  69. data/Rantfile +0 -37
  70. data/io.each_til_charset.rb +0 -247
  71. data/require.rb +0 -103
  72. data/rlold.rb +0 -12
  73. data/testcode/locatetest +0 -12
  74. data/testcode/rubylexervsruby.rb +0 -104
  75. data/testcode/rubylexervsruby.sh +0 -51
  76. data/testresults/placeholder +0 -0
data/README DELETED
@@ -1,134 +0,0 @@
1
- -=RubyLexer 0.6.2=-
2
-
3
- RubyLexer is a lexer library for Ruby, written in Ruby. My goal with Rubylexer
4
- was to create a lexer for Ruby that's complete and correct; all legal Ruby
5
- code should be lexed correctly by RubyLexer as well. Just enough parsing
6
- capability is included to give RubyLexer enough context to tokenize correctly
7
- in all cases. (This turned out to be more parsing than I had thought or
8
- wanted to take on at first.)
9
-
10
- Other Ruby lexers exist, but most are inadequate. For instance, irb has it's
11
- own little lexer, as does, (I believe) RDoc, so do all the ide's that can
12
- colorize. I've seen several stand-alone libraries as well. All or almost all
13
- suffer from the same problems: they skip the hard part of lexing. RubyLexer
14
- handles the hard things like complicated strings, the ambiguous nature of
15
- some punctuation characters and keywords in ruby, and distinguishing methods
16
- and local variables.
17
-
18
- RubyLexer is not particularly clean code. As I progressed in writing this,
19
- I've learned a little about how these things are supposed to be done; the
20
- lexer is not supposed to have any state of it's own, instead it gets whatever
21
- it needs to know from the parser. As a stand-alone lexer, Rubylexer maintains
22
- quite a lot of state. Every instance variable in the RubyLexer class is some
23
- sort of lexer state. Most of the complication and ugly code in RubyLexer is
24
- in maintaining or using this state.
25
-
26
- For information about using RubyLexer in your program, please see howtouse.txt.
27
-
28
- For my notes on the testing of RubyLexer, see testing.txt.
29
-
30
- If you have any questions, comments, problems, new feature requests, or just
31
- want to figure out how to make it work for what you need to do, contact me:
32
- rubylexer _at_ inforadical.net
33
-
34
- RubyLexer is a RubyForge project. RubyForge is another good place to send your
35
- bug reports or whatever: http://rubyforge.org/projects/rubylexer/
36
-
37
- (There aren't any bug filed against RubyLexer there yet, but don't be afraid
38
- that your report will get lonely.)
39
-
40
- Status:
41
- RubyLexer can correctly lex all legal Ruby 1.8 code that I've been able to
42
- find on my Debian system. It can also handle (most of) my catalog of nasty
43
- test cases (in testdata/p.rb). At this point, new bugs are almost exclusively
44
- found by my home-grown test code, rather than ruby code gathered 'from the
45
- wild'. A largish sample of ruby recently tested for the first time (that is,
46
- Rubyx) had _0_ lex errors. (And this is not the only example.) There are a
47
- number of issues i know about and plan to fix, but it seems that Ruby coders
48
- don't write code complex enough to trigger them very often. Although
49
- incomplete, RubyLexer is nevertheless better than many existing ad-hoc
50
- lexers. For instance, RubyLexer can correctly distinguish all cases of the
51
- different uses the following operators, depending on context:
52
- % can be modulus operator or start of fancy string
53
- / can be division operator or start of regex
54
- * & + - can be unary or binary operator
55
- [] can be for array literal or index method
56
- << can be here document or left shift operator (or in class<<obj expr)
57
- :: can be unary or binary operator
58
- : can be start of symbol, substitute for then, or part of ternary op
59
- (there are other uses too, but they're not supported yet.)
60
- ? can be start of character constant or ternary operator
61
- ` can be method name or start of exec string
62
-
63
- todo:
64
- test w/ more code (rubygems, rpa, obfuscated ruby contest, rubicon, others?)
65
- these 5 should be my standard test suite: p.rb, (matz') test.rb, tk.rb, obfuscated ruby contest, rubicon
66
- test more ways: cvt source to dos or mac fmt before testing
67
- test more ways: run unit tests after passing thru rubylexer (0.7)
68
- test more ways: test require'd, load'd, or eval'd code as well (0.7)
69
- lex code a line (or chunk) at a time and save state for next line (irb wants this) (0.8)
70
- incremental lexing (ides want this (for performance))
71
- put everything in a namespace
72
- integrate w/ other tools...
73
- html colorized output?
74
- move more state onto @bracestack (ongoing)
75
- expand on test documentation above
76
- the new cases in p.rb now compile, but won't run
77
- use want_op_name more
78
- return result as a half-parsed tree (with parentheses and the like matched)
79
- emit advisory tokens when see beginword, then (or equivalent), or end... what else does florian want?
80
- strings are still slow
81
- rantfile
82
- emit advisory tokens when local var defined/goes out of scope (or hidden/unhidden?)
83
- fakefile should be a mixin
84
- token pruning in dumptokens...
85
-
86
- new ruby features not yet supported:
87
- procs without proc keyword, looks like hash to current lexer
88
- keyword arguments, in hash immediates or actual param lists (&formal param lists?)
89
- unicode (0.9)
90
- :wrap and friends... (i wish someone would make a list of all the uses of colon in ruby.)
91
- parens in block param list (works, but hacky)
92
-
93
-
94
- known issues: (and planned fix release)
95
- context not really preserved when entering or leaving string inclusions. this causes
96
- a number or problems. (0.8)
97
- string tokenization sometimes a little different from ruby around newlines
98
- (htree/template.rb) (0.8)
99
- string contents might not be correctly translated in a few cases (0.8?)
100
- the implicit tokens might be emitted at the wrong times. (or not at the right times) (need more test code) (0.7)
101
- local variables should be temporarily hidden by class, module, and def (0.7)
102
- windows or mac newline in source are likely to cause problems in obscure cases (need test case)
103
- line numbers are sometimes off... probably to do with multi-line strings (=begin...=end causes this) (0.8)
104
- symbols which contain string interpolations are flattened into one token. eg :"foo#{bar}" (0.8)
105
- methnames and varnames might get mixed up in def header (in idents after the 'def' but before param list) (0.7)
106
- FileAndLineToken not emitted everywhere it should be (0.8)
107
- '\r' whitespace sometimes seen in dos-formatted output.. shouldn't be (eg pre.rb) (0.7)
108
- no way to get offset of __END__ (??) (0.7)
109
- put things in lib/
110
-
111
-
112
- fixed issues in 0.6.2:
113
- testcode/dumptokens.rb charhandler.rb doesn't work... but does after unix2dos (not reproducible)
114
- files should be opened in binmode to avoid all possible eol translation
115
- (x.+?x) doesn't work
116
- methname/varname mixups in some cases
117
- performance, in most important cases.
118
- error handling tokens should be emitted on error input... ErrorToken mixin module
119
- but old error handling interface should be preserved and made available.
120
- move readahead and friends into IOext. make optimized readahead et al for fakefile.
121
- dos newlines (and newlines generally) can't be fancy string delimiters
122
- do,if,until, etc, have no way to tell if an end is associated
123
- break readme into pieces
124
-
125
-
126
- fixed issues in 0.6.0:
127
- the implicit tokens might be emitted at the wrong times. (or not at the right times) (partly fixed) (0.6)
128
- : operator might be a synonym for 'then' (0.6)
129
- variables other than the last are not recognized in multiple assignment. (0.7)
130
- variables created by for and rescue are not recognized. (0.7)
131
- token following :: should not be BareSymbolToken if begins with A-Z (unless obviously a func, eg b/c followed by func param list)
132
- read code to be lexed from a string. (irb wants this) (0.7)
133
- fancy symbols don't work at all. (like this: %s{abcdefg}) (0.7) [this is regressing now]
134
- Newline,EscNl,BareSymbolToken may get renamed
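Note on the deleted README: its Status section lists punctuation that a Ruby lexer has to disambiguate by context. The snippet below is a small illustrative sketch in plain Ruby, not taken from either package version, that puts several of those pairs side by side. The variable names are made up for the example.

```ruby
# Each group uses the same character(s) in two different syntactic roles.
x = 7
y = 2

modulus      = x % y        # '%' as the modulus operator
fancy_string = %q(x % y)    # '%' starting a "fancy" string literal

quotient = x / y            # '/' as the division operator
regex    = /y/              # '/' starting a regex literal

list  = [x, y]              # '[' starting an array literal
first = list[0]             # '[' as the index method

shifted = x << y            # '<<' as the left shift operator
heredoc = <<TEXT            # '<<' starting a here document
x << y
TEXT

symbol  = :x                       # ':' starting a symbol
ternary = x > y ? :big : :small    # '?' and ':' forming the ternary operator
char    = ?a                       # '?' starting a character constant
                                   # (a String in Ruby 1.9+, an Integer in 1.8)

p [modulus, fancy_string, quotient, regex, first, shifted, heredoc, symbol, ternary, char]
```

In every pair the lexer sees the same leading character and has to decide from the preceding tokens whether an operand or an operator is expected, which is why the README says some parsing capability is needed to tokenize correctly.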
data/Rantfile DELETED
@@ -1,37 +0,0 @@
1
- import %w(rubydoc rubypackage)
2
-
3
-
4
- test_files=Dir["test{code,data}/*.rb"]
5
- lib_files = Dir["lib/**/*.rb"] #need to actually put files here...
6
- dist_files = lib_files + %w(Rantfile README COPYING) + test_files
7
-
8
- desc "Run unit tests."
9
- task :test do
10
- sys.mkdir 'testresults'
11
- test_files.each{|f|
12
- sys.ruby "testcode/rubylexervsruby.rb", f
13
- }
14
- lib_files.each{|f|
15
- sys.ruby "testcode/rubylexervsruby.rb", f
16
- }
17
- system 'which locate grep' &&
18
- sys.ruby "testcode/rubylexervsruby.rb", `locate /tk.rb|grep 'tk.rb$'`
19
- end
20
-
21
- desc "Generate html documentation."
22
- gen RubyDoc do |t|
23
- t.opts = %w(--title RubyLexer --main README README)
24
- end
25
-
26
- desc "Create packages."
27
- gen RubyPackage, :rubylexer do |t|
28
- t.version = "0.6.1"
29
- t.summary = "A complete lexer of ruby in ruby."
30
- t.files = dist_files
31
- t.package_task :gem
32
- #need more here
33
- end
34
-
35
- task :clean do
36
- sys.rm_rf %w(doc pkg testresults)
37
- end
data/io.each_til_charset.rb DELETED
@@ -1,247 +0,0 @@
1
- =begin copyright
2
- rubylexer - a ruby lexer written in ruby
3
- Copyright (C) 2004,2005 Caleb Clausen
4
-
5
- This library is free software; you can redistribute it and/or
6
- modify it under the terms of the GNU Lesser General Public
7
- License as published by the Free Software Foundation; either
8
- version 2.1 of the License, or (at your option) any later version.
9
-
10
- This library is distributed in the hope that it will be useful,
11
- but WITHOUT ANY WARRANTY; without even the implied warranty of
12
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13
- Lesser General Public License for more details.
14
-
15
- You should have received a copy of the GNU Lesser General Public
16
- License along with this library; if not, write to the Free Software
17
- Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
18
- =end
19
-
20
-
21
- module IOext
22
- #read until a character in a user-supplied set is found.
23
- #charrex must be a regexp that contains _only_ a single character class
24
- def til_charset(charrex,blocksize=16)
25
- blocks=[]
26
- m=nil
27
- until eof?
28
- block=read blocksize
29
- #if near eof, less than a full block may have been read
30
-
31
- if m=charrex .match(block)
32
- self.pos-=m.post_match.length+1
33
- #'self.' shouldn't be needed... but is
34
-
35
- blocks.push m.pre_match if m.pre_match.length>0
36
- break
37
- end
38
- blocks<<block
39
- end
40
- return blocks.to_s
41
- end
42
-
43
-
44
-
45
-
46
-
47
- #-----------------------------------
48
- #read and return next char if it matches ch
49
- #else, leave input unread and return nil or false
50
- def eat_next_if(ch)
51
- oldpos=pos
52
- c=read(1)
53
-
54
- ch.kind_of? Integer and ch=ch.chr
55
-
56
- return case c
57
- when ch then c
58
- when '' then self.pos=oldpos; nil
59
- else self.pos=oldpos; false
60
- end
61
- end
62
-
63
- #-----------------------------------
64
- def eat_while(pat)
65
- pat.kind_of? Integer and pat=pat.chr
66
-
67
- result=''
68
- loop {
69
- ch=read(1)
70
- unless pat===ch
71
- back1char unless ch.nil? #nil ch mean eof
72
- return result
73
- end
74
- result << ch
75
- }
76
- return result
77
- end
78
-
79
- #-----------------------------------
80
- #returns previous character in stream
81
- #without changing stream position
82
- #or '' if at beginning
83
- def prevchar
84
- pos==0 and return ''
85
-
86
- back1char
87
- return getc.chr
88
- end
89
-
90
- #-----------------------------------
91
- #returns next character in stream
92
- #without changing stream position
93
- #or nil if at end
94
- def nextchar
95
- eof? and return nil
96
-
97
- result=getc
98
- back1char
99
- return result
100
- end
101
-
102
- #-----------------------------------
103
- #this should really be in class File...
104
- def getchar
105
- eof? and return ''
106
- return getc.chr
107
- end
108
-
109
- #-----------------------------------
110
- def back1char() self.pos-=1 end
111
-
112
- #-----------------------------------
113
- def readahead(len)
114
- oldpos=pos
115
- result=read(len)
116
- self.pos=oldpos
117
-
118
- return result
119
- end
120
-
121
- #-----------------------------------
122
- def readback(len)
123
- oldpos=pos
124
- self.pos-=len
125
- result=read(len)
126
- self.pos=oldpos
127
-
128
- return result
129
- end
130
-
131
- #-----------------------------------
132
- def readuntil(pat)
133
- each(pat) { |match|
134
- return match
135
- }
136
- end
137
-
138
-
139
-
140
- #-----------------------------------------------------------------------
141
- #a String with the duck-type of a File
142
- #just enough is emulated to fool RubyLexer
143
- class FakeFile < ::String #thanks to murphy for this lovely.
144
-
145
- def initialize(*)
146
- super
147
- @pos = 0
148
- end
149
-
150
- attr_accessor :pos
151
-
152
- def read x
153
- pos = @pos
154
- @pos += x
155
- @pos>size and @pos=size
156
- self[pos ... @pos]
157
- end
158
-
159
- def getc
160
- eof? and return nil
161
- pos = @pos
162
- @pos += 1
163
- self[pos]
164
- end
165
-
166
- def eof?
167
- @pos >= size
168
- end
169
-
170
- def each_byte
171
- until eof?
172
- yield getc
173
- end
174
- end
175
-
176
- def stat #cheezy cheat to make #stat.size work
177
- self
178
- end
179
-
180
- def close; end
181
-
182
- def binmode; end
183
-
184
- include IOext
185
-
186
-
187
- #-----------------------------------
188
- #read and return next char if it matches ch
189
- #else, leave input unread and return nil or false
190
- def eat_next_if(ch)
191
- c=self[@pos,1]
192
-
193
- ch.kind_of? Integer and ch=ch.chr
194
-
195
- case c
196
- when ch then @pos+=1;c
197
- when '' then nil
198
- else false
199
- end
200
- end
201
-
202
- #-----------------------------------
203
- #returns previous character in stream
204
- #without changing stream position
205
- #or '' if at beginning
206
- def prevchar #returns Fixnum
207
- pos==0 ? '' : self[@pos-1]
208
- end
209
-
210
- #-----------------------------------
211
- #returns next character in stream
212
- #without changing stream position
213
- #or nil if at end
214
- def nextchar #returns Fixnum
215
- self[@pos]
216
- end
217
-
218
- #-----------------------------------
219
- def getchar #returns String
220
- eof? and return ''
221
- pos = @pos
222
- @pos += 1
223
- self[pos,1]
224
- end
225
-
226
- #-----------------------------------
227
- def back1char() @pos-=1 end
228
-
229
- #-----------------------------------
230
- def readahead(len)
231
- self[@pos,len]
232
- end
233
-
234
- #-----------------------------------
235
- def readback(len)
236
- assert @pos-len>=0
237
- self[@pos-len,len]
238
- end
239
-
240
-
241
- end
242
-
243
- end
244
-
245
- class IO
246
- include IOext
247
- end
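Note on the deleted IOext module: most of its helpers share one pattern, which is to remember the stream position, read, and then restore the position when the input should stay unread. The following is a minimal sketch of that pattern against StringIO in current Ruby. The module name PeekHelpers is hypothetical, and the two methods only mirror readahead and eat_next_if in spirit; this is not the code that replaced the file in 0.7.0.

```ruby
require 'stringio'

# Hypothetical module name; the two methods mirror readahead and
# eat_next_if from the deleted IOext module.
module PeekHelpers
  # Return up to len characters without moving the stream position.
  def readahead(len)
    oldpos = pos
    result = read(len)
    self.pos = oldpos
    result
  end

  # Consume and return the next character if it equals ch.
  # Otherwise leave the input unread and return nil at end of input,
  # or false on a mismatch.
  def eat_next_if(ch)
    oldpos = pos
    c = read(1)
    return c if c == ch
    self.pos = oldpos
    c.nil? ? nil : false
  end
end

io = StringIO.new("x=1")
io.extend(PeekHelpers)

peeked  = io.readahead(2)      # peeks at "x=", position is still 0
matched = io.eat_next_if("x")  # consumes "x", position advances to 1
missed  = io.eat_next_if("y")  # mismatch, position stays at 1
```

Extending a single StringIO instance keeps the sketch from reopening IO globally, which is what the deleted file did with `class IO; include IOext; end`.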
data/require.rb DELETED
@@ -1,103 +0,0 @@
1
-
2
- #wrapper versions of all commands that import code into a running program:
3
- #require, load, eval and friends. the wrapped versions pass the code to
4
- #import to rubylexervsruby, to test whether it gets lexed correctly. an
5
- #exception is raised if an lex error happens, else the code should behave
6
- #as normal, just much slower.
7
- class Kernel
8
-
9
- System_extension_extension=
10
- case RUBY_PLATFORM
11
- when /darwin/: 'o'
12
- when /windows/i: 'dll'
13
- else 'so'
14
- end
15
- System_ext_rex=/\.#{System_extension_extension}$/o
16
-
17
- def require_name_resolve(name)
18
- add_ext=case name
19
- when System_ext_rex,/\.rb$/:
20
- else name=/(#{name})(#{System_ext_rex}|\.rb)?/
21
- end
22
- name=/#{File::SEPARATOR}#{name}$/
23
- $:.find{|dir|
24
- dir.chomp File::SEPARATOR
25
- Dir[/#{dir}#{name}/]
26
- }
27
- end
28
-
29
-
30
- #reallyy jonesing for :wrap here
31
- alias stdlib__require require
32
- def require feat
33
- name=feat
34
- name=require_name_resolve(name) unless File.abs_path?(name)
35
- return(false) unless name
36
- return(true) if $".grep(feat)
37
- $"<<feat
38
- return stdlib__require(name) if name[System_ext_rex]
39
- load name
40
- end
41
-
42
- alias stdlib__load load
43
- def load name,wrap=false
44
- name=$:.find{|dir| Dir[dir,name]} unless File.abs_path?(name)
45
- if wrap then Module.new {
46
- eval File.read(name), huh binding, name,1
47
- } else eval File.read(name), huh binding, name,1
48
- end
49
- true
50
- end
51
-
52
- @@evalpos=1 #eval saves a position for the next eval sometimes... when?
53
- alias stdlib__eval eval
54
- def eval code,binding=nil,name='(eval)',linenum=1
55
- if binding
56
- rubylexervsruby(code, :name=>name, :linenum=>linenum, :locals=>eval("local_variables",binding))
57
-
58
- return stdlib__eval code,binding,filename,linenum
59
- end
60
- huh Binding.of_caller{|bg| eval code,bg,name,linenum}
61
- end
62
-
63
- huh#got to do module_eval, class_eval, instance_eval, etc
64
- end
65
-
66
- class Object
67
- alias stdlib__instance_eval instance_eval
68
- def instance_eval(code,&block)
69
- block and return stdlib__instance_eval &block
70
- eval code, stdlib__instance_eval{binding}
71
- end
72
- end
73
-
74
- class Module
75
- alias stdlib__module_eval module_eval
76
- alias module_eval instance_eval
77
- end
78
-
79
- class Class
80
- alias stdlib__class_eval class_eval
81
- alias class_eval instance_eval
82
- end
83
-
84
-
85
- class Binding
86
- alias stdlib__eval eval
87
- def eval code,name='(eval)',linenum=1
88
- rubylexervsruby(code, :name=>name, :linenum=>linenum, :locals=>eval("local_variables",binding))
89
-
90
- huh #should set code to (effectively) output of tokentest
91
- #how to do that within rubylexervsruby
92
-
93
- return stdlib__eval code,self,filename,linenum
94
- end
95
- end
96
-
97
- =begin
98
- def Module
99
- def new
100
- o=Object.extend self
101
- end
102
- end
103
- =end
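Note on the deleted require.rb: the file relies on an alias-and-wrap idiom, where the original method is kept under a stdlib__ name and the replacement runs a lexer check before delegating. A minimal sketch of that idiom follows, assuming a stand-in class so that nothing in Kernel is overridden. Evaluator, plain_run and check_source are hypothetical names, and check_source only records its input rather than calling rubylexervsruby.

```ruby
# Hypothetical stand-in for the methods require.rb wraps (require, load, eval).
class Evaluator
  def run(code)
    eval(code)
  end
end

class Evaluator
  # Keep the original implementation reachable under a second name,
  # as require.rb does with its stdlib__ aliases.
  alias_method :plain_run, :run

  # Source strings that passed through the check, in order.
  def checked
    @checked ||= []
  end

  # Placeholder for the real check; here it only records the source.
  def check_source(code)
    checked << code
  end

  # Wrapped version: check first, then delegate to the original.
  def run(code)
    check_source(code)
    plain_run(code)
  end
end

ev = Evaluator.new
result = ev.run("1 + 2")
```

The wrapped method returns whatever the original returns, so callers see the same behavior apart from the extra check.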