rubylexer 0.6.2 → 0.7.0

Files changed (76)
  1. data/History.txt +55 -0
  2. data/Manifest.txt +67 -0
  3. data/README.txt +103 -0
  4. data/Rakefile +24 -0
  5. data/howtouse.txt +9 -6
  6. data/{assert.rb → lib/assert.rb} +11 -11
  7. data/{rubylexer.rb → lib/rubylexer.rb} +645 -342
  8. data/lib/rubylexer/0.6.2.rb +39 -0
  9. data/lib/rubylexer/0.6.rb +5 -0
  10. data/lib/rubylexer/0.7.0.rb +2 -0
  11. data/{charhandler.rb → lib/rubylexer/charhandler.rb} +4 -2
  12. data/{charset.rb → lib/rubylexer/charset.rb} +4 -3
  13. data/{context.rb → lib/rubylexer/context.rb} +48 -18
  14. data/{rubycode.rb → lib/rubylexer/rubycode.rb} +5 -3
  15. data/{rulexer.rb → lib/rubylexer/rulexer.rb} +180 -102
  16. data/{symboltable.rb → lib/rubylexer/symboltable.rb} +10 -1
  17. data/{token.rb → lib/rubylexer/token.rb} +72 -20
  18. data/{tokenprinter.rb → lib/rubylexer/tokenprinter.rb} +39 -16
  19. data/lib/rubylexer/version.rb +3 -0
  20. data/{testcode → test/code}/deletewarns.rb +0 -0
  21. data/test/code/dl_all_gems.rb +43 -0
  22. data/{testcode → test/code}/dumptokens.rb +12 -9
  23. data/test/code/locatetest +30 -0
  24. data/test/code/locatetest.rb +49 -0
  25. data/test/code/rubylexervsruby.rb +173 -0
  26. data/{testcode → test/code}/tokentest.rb +62 -51
  27. data/{testcode → test/code}/torment +8 -8
  28. data/test/code/unpack_all_gems.rb +15 -0
  29. data/{testdata → test/data}/1.rb.broken +0 -0
  30. data/{testdata → test/data}/23.rb +0 -0
  31. data/test/data/__end__.rb +2 -0
  32. data/test/data/__end__2.rb +3 -0
  33. data/test/data/and.rb +5 -0
  34. data/test/data/blockassigntest.rb +23 -0
  35. data/test/data/chunky.plain.rb +75 -0
  36. data/test/data/chunky_bacon.rb +112 -0
  37. data/test/data/chunky_bacon2.rb +112 -0
  38. data/test/data/chunky_bacon3.rb +112 -0
  39. data/test/data/chunky_bacon4.rb +112 -0
  40. data/test/data/for.rb +45 -0
  41. data/test/data/format.rb +6 -0
  42. data/{testdata → test/data}/g.rb +0 -0
  43. data/test/data/gemlist.txt +280 -0
  44. data/test/data/heart.rb +7 -0
  45. data/test/data/if.rb +6 -0
  46. data/test/data/jarh.rb +369 -0
  47. data/test/data/lbrace.rb +4 -0
  48. data/test/data/lbrack.rb +4 -0
  49. data/{testdata → test/data}/newsyntax.rb +0 -0
  50. data/{testdata → test/data}/noeolatend.rb +0 -0
  51. data/test/data/p-op.rb +8 -0
  52. data/{testdata → test/data}/p.rb +671 -79
  53. data/{testdata → test/data}/pleac.rb.broken +0 -0
  54. data/{testdata → test/data}/pre.rb +0 -0
  55. data/{testdata → test/data}/pre.unix.rb +0 -0
  56. data/{testdata → test/data}/regtest.rb +0 -0
  57. data/test/data/rescue.rb +35 -0
  58. data/test/data/s.rb +186 -0
  59. data/test/data/strinc.rb +2 -0
  60. data/{testdata → test/data}/tokentest.assert.rb.can +0 -0
  61. data/test/data/untermed_here.rb.broken +2 -0
  62. data/test/data/untermed_string.rb.broken +1 -0
  63. data/{testdata → test/data}/untitled1.rb +0 -0
  64. data/{testdata → test/data}/w.rb +0 -0
  65. data/{testdata → test/data}/wsdlDriver.rb +0 -0
  66. data/testing.txt +6 -4
  67. metadata +163 -59
  68. data/README +0 -134
  69. data/Rantfile +0 -37
  70. data/io.each_til_charset.rb +0 -247
  71. data/require.rb +0 -103
  72. data/rlold.rb +0 -12
  73. data/testcode/locatetest +0 -12
  74. data/testcode/rubylexervsruby.rb +0 -104
  75. data/testcode/rubylexervsruby.sh +0 -51
  76. data/testresults/placeholder +0 -0
data/README DELETED
@@ -1,134 +0,0 @@
- -=RubyLexer 0.6.2=-
-
- RubyLexer is a lexer library for Ruby, written in Ruby. My goal with RubyLexer
- was to create a lexer for Ruby that's complete and correct; all legal Ruby
- code should be lexed correctly by RubyLexer as well. Just enough parsing
- capability is included to give RubyLexer enough context to tokenize correctly
- in all cases. (This turned out to be more parsing than I had thought or
- wanted to take on at first.)
-
- Other Ruby lexers exist, but most are inadequate. For instance, irb has its
- own little lexer, as does (I believe) RDoc, and so do all the IDEs that can
- colorize. I've seen several stand-alone libraries as well. All or almost all
- suffer from the same problem: they skip the hard part of lexing. RubyLexer
- handles the hard things like complicated strings, the ambiguous nature of
- some punctuation characters and keywords in Ruby, and distinguishing methods
- from local variables.
-
- RubyLexer is not particularly clean code. As I progressed in writing this,
- I learned a little about how these things are supposed to be done: the
- lexer is not supposed to have any state of its own; instead, it gets whatever
- it needs to know from the parser. As a stand-alone lexer, RubyLexer maintains
- quite a lot of state. Every instance variable in the RubyLexer class is some
- sort of lexer state. Most of the complication and ugly code in RubyLexer is
- in maintaining or using this state.
-
- For information about using RubyLexer in your program, please see howtouse.txt.
-
- For my notes on the testing of RubyLexer, see testing.txt.
-
- If you have any questions, comments, problems, new feature requests, or just
- want to figure out how to make it work for what you need to do, contact me:
- rubylexer _at_ inforadical.net
-
- RubyLexer is a RubyForge project. RubyForge is another good place to send your
- bug reports or whatever: http://rubyforge.org/projects/rubylexer/
-
- (There aren't any bugs filed against RubyLexer there yet, but don't be afraid
- that your report will get lonely.)
-
- Status:
- RubyLexer can correctly lex all legal Ruby 1.8 code that I've been able to
- find on my Debian system. It can also handle (most of) my catalog of nasty
- test cases (in testdata/p.rb). At this point, new bugs are almost exclusively
- found by my home-grown test code, rather than Ruby code gathered 'from the
- wild'. A largish sample of Ruby recently tested for the first time (that is,
- Rubyx) had _0_ lex errors. (And this is not the only example.) There are a
- number of issues I know about and plan to fix, but it seems that Ruby coders
- don't write code complex enough to trigger them very often. Although
- incomplete, RubyLexer is nevertheless better than many existing ad-hoc
- lexers. For instance, RubyLexer can correctly distinguish all cases of the
- different uses of the following operators, depending on context:
- % can be modulus operator or start of fancy string
- / can be division operator or start of regex
- * & + - can be unary or binary operator
- [] can be array literal or index method
- << can be here document or left shift operator (or in class<<obj expr)
- :: can be unary or binary operator
- : can be start of symbol, substitute for then, or part of ternary op
- (there are other uses too, but they're not supported yet.)
- ? can be start of character constant or ternary operator
- ` can be method name or start of exec string
-
- todo:
- test w/ more code (rubygems, rpa, obfuscated ruby contest, rubicon, others?)
- these 5 should be my standard test suite: p.rb, (matz') test.rb, tk.rb, obfuscated ruby contest, rubicon
- test more ways: cvt source to dos or mac fmt before testing
- test more ways: run unit tests after passing thru rubylexer (0.7)
- test more ways: test require'd, load'd, or eval'd code as well (0.7)
- lex code a line (or chunk) at a time and save state for next line (irb wants this) (0.8)
- incremental lexing (IDEs want this (for performance))
- put everything in a namespace
- integrate w/ other tools...
- html colorized output?
- move more state onto @bracestack (ongoing)
- expand on test documentation above
- the new cases in p.rb now compile, but won't run
- use want_op_name more
- return result as a half-parsed tree (with parentheses and the like matched)
- emit advisory tokens when see beginword, then (or equivalent), or end... what else does florian want?
- strings are still slow
- rantfile
- emit advisory tokens when local var defined/goes out of scope (or hidden/unhidden?)
- fakefile should be a mixin
- token pruning in dumptokens...
-
- new ruby features not yet supported:
- procs without proc keyword, looks like hash to current lexer
- keyword arguments, in hash immediates or actual param lists (&formal param lists?)
- unicode (0.9)
- :wrap and friends... (I wish someone would make a list of all the uses of colon in ruby.)
- parens in block param list (works, but hacky)
-
-
- known issues: (and planned fix release)
- context not really preserved when entering or leaving string inclusions. this causes
- a number of problems. (0.8)
- string tokenization sometimes a little different from ruby around newlines
- (htree/template.rb) (0.8)
- string contents might not be correctly translated in a few cases (0.8?)
- the implicit tokens might be emitted at the wrong times. (or not at the right times) (need more test code) (0.7)
- local variables should be temporarily hidden by class, module, and def (0.7)
- windows or mac newlines in source are likely to cause problems in obscure cases (need test case)
- line numbers are sometimes off... probably to do with multi-line strings (=begin...=end causes this) (0.8)
- symbols which contain string interpolations are flattened into one token. eg :"foo#{bar}" (0.8)
- methnames and varnames might get mixed up in def header (in idents after the 'def' but before param list) (0.7)
- FileAndLineToken not emitted everywhere it should be (0.8)
- '\r' whitespace sometimes seen in dos-formatted output; shouldn't be (eg pre.rb) (0.7)
- no way to get offset of __END__ (??) (0.7)
- put things in lib/
-
-
- fixed issues in 0.6.2:
- testcode/dumptokens.rb charhandler.rb doesn't work... but does after unix2dos (not reproducible)
- files should be opened in binmode to avoid all possible eol translation
- (x.+?x) doesn't work
- methname/varname mixups in some cases
- performance, in most important cases.
- error handling tokens should be emitted on error input... ErrorToken mixin module
- but old error handling interface should be preserved and made available.
- move readahead and friends into IOext. make optimized readahead et al for fakefile.
- dos newlines (and newlines generally) can't be fancy string delimiters
- do, if, until, etc. have no way to tell if an end is associated
- break readme into pieces
-
-
- fixed issues in 0.6.0:
- the implicit tokens might be emitted at the wrong times. (or not at the right times) (partly fixed) (0.6)
- : operator might be a synonym for 'then' (0.6)
- variables other than the last are not recognized in multiple assignment. (0.7)
- variables created by for and rescue are not recognized. (0.7)
- token following :: should not be BareSymbolToken if begins with A-Z (unless obviously a func, eg b/c followed by func param list)
- read code to be lexed from a string. (irb wants this) (0.7)
- fancy symbols don't work at all. (like this: %s{abcdefg}) (0.7) [this is regressing now]
- Newline,EscNl,BareSymbolToken may get renamed
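The deleted README says RubyLexer uses context to tell `/` as division from `/` as the start of a regex, and that this requires distinguishing methods from local variables. Below is a minimal sketch of that kind of decision. It assumes a hypothetical `slash_kind` helper that sees only the previous token's text, the source string, the position of the `/`, and the set of local variables in scope. It is not RubyLexer's implementation. The rules are deliberately simplified and ignore newlines, comments, and the other ambiguous operators in the table.

```ruby
require 'set'

# Hypothetical, simplified heuristic; not RubyLexer's actual code.
# Tokens that end a value: a number, a closing bracket, or a simple string.
VALUE_END = /\A(?:\d[\d_]*(?:\.\d+)?|[)\]}]|"[^"]*"|'[^']*')\z/
# Keywords after which an expression (and so possibly a regex) is expected.
EXPR_KEYWORDS = %w[if unless while until when and or not return].freeze
IDENTIFIER = /\A[a-z_]\w*\z/

# Classify the "/" at src[pos] as :division or :regex.
# prev is the previous token's text (nil at start of input);
# locals is the set of local variable names currently in scope.
def slash_kind(prev, src, pos, locals)
  return :regex if prev.nil?
  return :division if VALUE_END.match?(prev)
  return :regex if EXPR_KEYWORDS.include?(prev)
  if IDENTIFIER.match?(prev)
    # A known local variable is a value, so "/" must be division.
    return :division if locals.include?(prev)
    # Otherwise assume a method call: "p /x/" passes a regex argument,
    # while "p / 2" and "p/2" are treated as division.
    space_before = pos > 0 && src[pos - 1] == ' '
    space_after = src[pos + 1] == ' '
    return space_before && !space_after ? :regex : :division
  end
  # After an operator such as "=" or "(", an operand is expected.
  :regex
end
```

For example, `slash_kind('x', 'x /y/', 2, Set['x'])` returns `:division` because `x` is a local variable. In contrast, `slash_kind('p', 'p /y/', 2, Set.new)` returns `:regex`, which mirrors how Ruby itself reads `p /y/` as a method call with a regex argument.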
data/Rantfile DELETED
@@ -1,37 +0,0 @@
- import %w(rubydoc rubypackage)
-
-
- test_files=Dir["test{code,data}/*.rb"]
- lib_files = Dir["lib/**/*.rb"] #need to actually put files here...
- dist_files = lib_files + %w(Rantfile README COPYING) + test_files
-
- desc "Run unit tests."
- task :test do
- sys.mkdir 'testresults'
- test_files.each{|f|
- sys.ruby "testcode/rubylexervsruby.rb", f
- }
- lib_files.each{|f|
- sys.ruby "testcode/rubylexervsruby.rb", f
- }
- system 'which locate grep' &&
- sys.ruby "testcode/rubylexervsruby.rb", `locate /tk.rb|grep 'tk.rb$'`
- end
-
- desc "Generate html documentation."
- gen RubyDoc do |t|
- t.opts = %w(--title RubyLexer --main README README)
- end
-
- desc "Create packages."
- gen RubyPackage, :rubylexer do |t|
- t.version = "0.6.1"
- t.summary = "A complete lexer of ruby in ruby."
- t.files = dist_files
- t.package_task :gem
- #need more here
- end
-
- task :clean do
- sys.rm_rf %w(doc pkg testresults)
- end
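In 0.7.0 this Rantfile is replaced by a Rakefile whose contents are not shown in this diff. As a rough sketch only, the Rantfile's `test` and `clean` tasks could be written in Rake as below. This assumes the new `test/code` and `test/data` layout from the file list, and that `test/code/rubylexervsruby.rb` still takes one file name per run. The `locate`-based tk.rb check and the RubyDoc/RubyPackage generators are left out.

```ruby
require 'rake'

# Hypothetical Rake translation of the deleted Rantfile's test and clean
# tasks. Paths assume the 0.7.0 test/code and test/data layout.
TEST_FILES = Rake::FileList['test/{code,data}/*.rb']
LIB_FILES  = Rake::FileList['lib/**/*.rb']

desc 'Run test/code/rubylexervsruby.rb on every test and library file.'
task :test do
  mkdir_p 'testresults'
  (TEST_FILES + LIB_FILES).each do |f|
    # Rake's ruby helper runs a child ruby and raises if it exits nonzero.
    ruby 'test/code/rubylexervsruby.rb', f
  end
end

desc 'Remove generated documentation, packages, and test results.'
task :clean do
  rm_rf %w[doc pkg testresults]
end
```

If this is saved as a project's Rakefile, it runs as `rake test` or `rake clean`. The explicit `require 'rake'` is then redundant but harmless.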
data/io.each_til_charset.rb DELETED
@@ -1,247 +0,0 @@
- =begin copyright
- rubylexer - a ruby lexer written in ruby
- Copyright (C) 2004,2005 Caleb Clausen
-
- This library is free software; you can redistribute it and/or
- modify it under the terms of the GNU Lesser General Public
- License as published by the Free Software Foundation; either
- version 2.1 of the License, or (at your option) any later version.
-
- This library is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
-
- You should have received a copy of the GNU Lesser General Public
- License along with this library; if not, write to the Free Software
- Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
- =end
-
-
- module IOext
- #read until a character in a user-supplied set is found.
- #charrex must be a regexp that contains _only_ a single character class
- def til_charset(charrex,blocksize=16)
- blocks=[]
- m=nil
- until eof?
- block=read blocksize
- #if near eof, less than a full block may have been read
-
- if m=charrex .match(block)
- self.pos-=m.post_match.length+1
- #'self.' shouldn't be needed... but is
-
- blocks.push m.pre_match if m.pre_match.length>0
- break
- end
- blocks<<block
- end
- return blocks.to_s
- end
-
-
-
-
-
- #-----------------------------------
- #read and return next char if it matches ch
- #else, leave input unread and return nil or false
- def eat_next_if(ch)
- oldpos=pos
- c=read(1)
-
- ch.kind_of? Integer and ch=ch.chr
-
- return case c
- when ch then c
- when '' then self.pos=oldpos; nil
- else self.pos=oldpos; false
- end
- end
-
- #-----------------------------------
- def eat_while(pat)
- pat.kind_of? Integer and pat=pat.chr
-
- result=''
- loop {
- ch=read(1)
- unless pat===ch
- back1char unless ch.nil? #nil ch mean eof
- return result
- end
- result << ch
- }
- return result
- end
-
- #-----------------------------------
- #returns previous character in stream
- #without changing stream position
- #or '' if at beginning
- def prevchar
- pos==0 and return ''
-
- back1char
- return getc.chr
- end
-
- #-----------------------------------
- #returns next character in stream
- #without changing stream position
- #or nil if at end
- def nextchar
- eof? and return nil
-
- result=getc
- back1char
- return result
- end
-
- #-----------------------------------
- #this should really be in class File...
- def getchar
- eof? and return ''
- return getc.chr
- end
-
- #-----------------------------------
- def back1char() self.pos-=1 end
-
- #-----------------------------------
- def readahead(len)
- oldpos=pos
- result=read(len)
- self.pos=oldpos
-
- return result
- end
-
- #-----------------------------------
- def readback(len)
- oldpos=pos
- self.pos-=len
- result=read(len)
- self.pos=oldpos
-
- return result
- end
-
- #-----------------------------------
- def readuntil(pat)
- each(pat) { |match|
- return match
- }
- end
-
-
-
- #-----------------------------------------------------------------------
- #a String with the duck-type of a File
- #just enough is emulated to fool RubyLexer
- class FakeFile < ::String #thanks to murphy for this lovely.
-
- def initialize(*)
- super
- @pos = 0
- end
-
- attr_accessor :pos
-
- def read x
- pos = @pos
- @pos += x
- @pos>size and @pos=size
- self[pos ... @pos]
- end
-
- def getc
- eof? and return nil
- pos = @pos
- @pos += 1
- self[pos]
- end
-
- def eof?
- @pos >= size
- end
-
- def each_byte
- until eof?
- yield getc
- end
- end
-
- def stat #cheezy cheat to make #stat.size work
- self
- end
-
- def close; end
-
- def binmode; end
-
- include IOext
-
-
- #-----------------------------------
- #read and return next char if it matches ch
- #else, leave input unread and return nil or false
- def eat_next_if(ch)
- c=self[@pos,1]
-
- ch.kind_of? Integer and ch=ch.chr
-
- case c
- when ch then @pos+=1;c
- when '' then nil
- else false
- end
- end
-
- #-----------------------------------
- #returns previous character in stream
- #without changing stream position
- #or '' if at beginning
- def prevchar #returns Fixnum
- pos==0 ? '' : self[@pos-1]
- end
-
- #-----------------------------------
- #returns next character in stream
- #without changing stream position
- #or nil if at end
- def nextchar #returns Fixnum
- self[@pos]
- end
-
- #-----------------------------------
- def getchar #returns String
- eof? and return ''
- pos = @pos
- @pos += 1
- self[pos,1]
- end
-
- #-----------------------------------
- def back1char() @pos-=1 end
-
- #-----------------------------------
- def readahead(len)
- self[@pos,len]
- end
-
- #-----------------------------------
- def readback(len)
- assert @pos-len>=0
- self[@pos-len,len]
- end
-
-
- end
-
- end
-
- class IO
- include IOext
- end
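The deleted IOext module gave IO, and the String-backed FakeFile, a small peek-and-consume API (`readahead`, `eat_next_if`, `til_charset`, and friends). Ruby's standard `StringIO` already supplies the `pos`, `read`, `getc`, `ungetc`, and `eof?` calls that this API is built on, so a string source does not need a hand-written File duck type. The sketch below reimplements three of the helpers as a hypothetical `PeekHelpers` mixin. It is illustrative only: whatever replaced IOext in 0.7.0 is not shown in this diff. One deliberate difference is that `til_charset` reads a character at a time instead of in 16-byte blocks.

```ruby
require 'stringio'

# Hypothetical mixin mirroring a few of the deleted IOext helpers.
# It relies only on pos, read, getc, ungetc, and eof?, which StringIO
# (and File) already provide.
module PeekHelpers
  # Return the next len bytes without consuming them.
  def readahead(len)
    oldpos = pos
    result = read(len) || ''
    self.pos = oldpos
    result
  end

  # Consume and return the next character if ch === it (ch may be a
  # String, Regexp, or Integer code point). Return nil at end of input,
  # or false on a mismatch, leaving the input unread.
  def eat_next_if(ch)
    ch = ch.chr if ch.is_a?(Integer)
    c = getc
    return nil if c.nil?
    return c if ch === c
    ungetc(c)
    false
  end

  # Read up to, but not including, the first character matching charrex.
  def til_charset(charrex)
    result = +''
    until eof?
      c = getc
      if charrex.match?(c)
        ungetc(c)
        break
      end
      result << c
    end
    result
  end
end
```

Extending a single object avoids reopening StringIO globally. For example, `io = StringIO.new('ab/c').extend(PeekHelpers)` can peek with `io.readahead(2)`, which gives `"ab"` and leaves `io.pos` at 0. A following `io.til_charset(%r{/})` then returns `"ab"` and leaves `/` as the next character.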
data/require.rb DELETED
@@ -1,103 +0,0 @@
-
- #wrapper versions of all commands that import code into a running program:
- #require, load, eval and friends. the wrapped versions pass the code to
- #import to rubylexervsruby, to test whether it gets lexed correctly. an
- #exception is raised if an lex error happens, else the code should behave
- #as normal, just much slower.
- class Kernel
-
- System_extension_extension=
- case RUBY_PLATFORM
- when /darwin/: 'o'
- when /windows/i: 'dll'
- else 'so'
- end
- System_ext_rex=/\.#{System_extension_extension}$/o
-
- def require_name_resolve(name)
- add_ext=case name
- when System_ext_rex,/\.rb$/:
- else name=/(#{name})(#{System_ext_rex}|\.rb)?/
- end
- name=/#{File::SEPARATOR}#{name}$/
- $:.find{|dir|
- dir.chomp File::SEPARATOR
- Dir[/#{dir}#{name}/]
- }
- end
-
-
- #reallyy jonesing for :wrap here
- alias stdlib__require require
- def require feat
- name=feat
- name=require_name_resolve(name) unless File.abs_path?(name)
- return(false) unless name
- return(true) if $".grep(feat)
- $"<<feat
- return stdlib__require(name) if name[System_ext_rex]
- load name
- end
-
- alias stdlib__load load
- def load name,wrap=false
- name=$:.find{|dir| Dir[dir,name]} unless File.abs_path?(name)
- if wrap then Module.new {
- eval File.read(name), huh binding, name,1
- } else eval File.read(name), huh binding, name,1
- end
- true
- end
-
- @@evalpos=1 #eval saves a position for the next eval sometimes... when?
- alias stdlib__eval eval
- def eval code,binding=nil,name='(eval)',linenum=1
- if binding
- rubylexervsruby(code, :name=>name, :linenum=>linenum, :locals=>eval("local_variables",binding))
-
- return stdlib__eval code,binding,filename,linenum
- end
- huh Binding.of_caller{|bg| eval code,bg,name,linenum}
- end
-
- huh#got to do module_eval, class_eval, instance_eval, etc
- end
-
- class Object
- alias stdlib__instance_eval instance_eval
- def instance_eval(code,&block)
- block and return stdlib__instance_eval &block
- eval code, stdlib__instance_eval{binding}
- end
- end
-
- class Module
- alias stdlib__module_eval module_eval
- alias module_eval instance_eval
- end
-
- class Class
- alias stdlib__class_eval class_eval
- alias class_eval instance_eval
- end
-
-
- class Binding
- alias stdlib__eval eval
- def eval code,name='(eval)',linenum=1
- rubylexervsruby(code, :name=>name, :linenum=>linenum, :locals=>eval("local_variables",binding))
-
- huh #should set code to (effectively) output of tokentest
- #how to do that within rubylexervsruby
-
- return stdlib__eval code,self,filename,linenum
- end
- end
-
- =begin
- def Module
- def new
- o=Object.extend self
- end
- end
- =end
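The deleted require.rb sketched a way to wrap `require`, `load`, and `eval` so that imported code is checked before it runs. As written, it fails as soon as it is loaded, because `Kernel` is a module and `class Kernel` raises a TypeError. The `huh` calls are also unfinished placeholders.

Below is a minimal sketch of the same alias-and-delegate pattern, applied to `load` only and written for current Ruby. rubylexervsruby's interface is not shown in this diff, so the check is a hypothetical `LexCheck.check!` that uses Ruby's bundled Ripper parser as a stand-in. It raises `LexCheck::LexError` when Ripper cannot parse the file; otherwise the original `load` runs unchanged.

```ruby
require 'ripper'

# Hypothetical stand-in for rubylexervsruby: Ripper (Ruby's own parser)
# is used only to show where a per-file check would go.
module LexCheck
  class LexError < StandardError; end

  # Ripper.sexp returns nil when the source has a syntax error.
  def self.check!(code, name)
    raise LexError, "#{name} failed to parse" if Ripper.sexp(code).nil?
  end
end

# Kernel is a module, so it must be reopened with `module`, not `class`.
module Kernel
  alias_method :stdlib__load, :load
  private :stdlib__load

  # Check the file's source first, then load it exactly as before.
  # Assumes name is a readable path; unlike the original load, this
  # wrapper does not search $LOAD_PATH before reading the file.
  def load(name, wrap = false)
    LexCheck.check!(File.read(name), name)
    stdlib__load(name, wrap)
  end
  private :load
end
```

With this wrapper in place, `load 'some_file.rb'` behaves as before for code Ripper accepts, and raises `LexCheck::LexError` otherwise. `require` and the `eval` family would each need the same treatment, as the original file's comments note.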