RubyGems - rubylexer - Versions diffs - 0.6.2 → 0.7.0 - Mend

rubylexer 0.6.2 → 0.7.0

Files changed (76)

data/History.txt +55 -0
data/Manifest.txt +67 -0
data/README.txt +103 -0
data/Rakefile +24 -0
data/howtouse.txt +9 -6
data/{assert.rb → lib/assert.rb} +11 -11
data/{rubylexer.rb → lib/rubylexer.rb} +645 -342
data/lib/rubylexer/0.6.2.rb +39 -0
data/lib/rubylexer/0.6.rb +5 -0
data/lib/rubylexer/0.7.0.rb +2 -0
data/{charhandler.rb → lib/rubylexer/charhandler.rb} +4 -2
data/{charset.rb → lib/rubylexer/charset.rb} +4 -3
data/{context.rb → lib/rubylexer/context.rb} +48 -18
data/{rubycode.rb → lib/rubylexer/rubycode.rb} +5 -3
data/{rulexer.rb → lib/rubylexer/rulexer.rb} +180 -102
data/{symboltable.rb → lib/rubylexer/symboltable.rb} +10 -1
data/{token.rb → lib/rubylexer/token.rb} +72 -20
data/{tokenprinter.rb → lib/rubylexer/tokenprinter.rb} +39 -16
data/lib/rubylexer/version.rb +3 -0
data/{testcode → test/code}/deletewarns.rb +0 -0
data/test/code/dl_all_gems.rb +43 -0
data/{testcode → test/code}/dumptokens.rb +12 -9
data/test/code/locatetest +30 -0
data/test/code/locatetest.rb +49 -0
data/test/code/rubylexervsruby.rb +173 -0
data/{testcode → test/code}/tokentest.rb +62 -51
data/{testcode → test/code}/torment +8 -8
data/test/code/unpack_all_gems.rb +15 -0
data/{testdata → test/data}/1.rb.broken +0 -0
data/{testdata → test/data}/23.rb +0 -0
data/test/data/__end__.rb +2 -0
data/test/data/__end__2.rb +3 -0
data/test/data/and.rb +5 -0
data/test/data/blockassigntest.rb +23 -0
data/test/data/chunky.plain.rb +75 -0
data/test/data/chunky_bacon.rb +112 -0
data/test/data/chunky_bacon2.rb +112 -0
data/test/data/chunky_bacon3.rb +112 -0
data/test/data/chunky_bacon4.rb +112 -0
data/test/data/for.rb +45 -0
data/test/data/format.rb +6 -0
data/{testdata → test/data}/g.rb +0 -0
data/test/data/gemlist.txt +280 -0
data/test/data/heart.rb +7 -0
data/test/data/if.rb +6 -0
data/test/data/jarh.rb +369 -0
data/test/data/lbrace.rb +4 -0
data/test/data/lbrack.rb +4 -0
data/{testdata → test/data}/newsyntax.rb +0 -0
data/{testdata → test/data}/noeolatend.rb +0 -0
data/test/data/p-op.rb +8 -0
data/{testdata → test/data}/p.rb +671 -79
data/{testdata → test/data}/pleac.rb.broken +0 -0
data/{testdata → test/data}/pre.rb +0 -0
data/{testdata → test/data}/pre.unix.rb +0 -0
data/{testdata → test/data}/regtest.rb +0 -0
data/test/data/rescue.rb +35 -0
data/test/data/s.rb +186 -0
data/test/data/strinc.rb +2 -0
data/{testdata → test/data}/tokentest.assert.rb.can +0 -0
data/test/data/untermed_here.rb.broken +2 -0
data/test/data/untermed_string.rb.broken +1 -0
data/{testdata → test/data}/untitled1.rb +0 -0
data/{testdata → test/data}/w.rb +0 -0
data/{testdata → test/data}/wsdlDriver.rb +0 -0
data/testing.txt +6 -4
metadata +163 -59
data/README +0 -134
data/Rantfile +0 -37
data/io.each_til_charset.rb +0 -247
data/require.rb +0 -103
data/rlold.rb +0 -12
data/testcode/locatetest +0 -12
data/testcode/rubylexervsruby.rb +0 -104
data/testcode/rubylexervsruby.sh +0 -51
data/testresults/placeholder +0 -0

data/README DELETED

@@ -1,134 +0,0 @@
-                              -=RubyLexer 0.6.2=-
-RubyLexer is a lexer library for Ruby, written in Ruby. My goal with Rubylexer
-was to create a lexer for Ruby that's complete and correct; all legal Ruby
-code should be lexed correctly by RubyLexer as well. Just enough parsing
-capability is included to give RubyLexer enough context to tokenize correctly
-in all cases. (This turned out to be more parsing than I had thought or
-wanted to take on at first.)
-Other Ruby lexers exist, but most are inadequate. For instance, irb has it's
-own little lexer, as does, (I believe) RDoc, so do all the ide's that can
-colorize. I've seen several stand-alone libraries as well. All or almost all
-suffer from the same problems: they skip the hard part of lexing. RubyLexer
-handles the hard things like complicated strings, the ambiguous nature of
-some punctuation characters and keywords in ruby, and distinguishing methods
-and local variables.
-RubyLexer is not particularly clean code. As I progressed in writing this,
-I've learned a little about how these things are supposed to be done; the
-lexer is not supposed to have any state of it's own, instead it gets whatever
-it needs to know from the parser. As a stand-alone lexer, Rubylexer maintains
-quite a lot of state. Every instance variable in the RubyLexer class is some
-sort of lexer state. Most of the complication and ugly code in RubyLexer is
-in maintaining or using this state.
-For information about using RubyLexer in your program, please see howtouse.txt.
-For my notes on the testing of RubyLexer, see testing.txt.
-If you have any questions, comments, problems, new feature requests, or just
-want to figure out how to make it work for what you need to do, contact me:
-       rubylexer _at_ inforadical.net
-RubyLexer is a RubyForge project. RubyForge is another good place to send your
-bug reports or whatever:  http://rubyforge.org/projects/rubylexer/
-(There aren't any bug filed against RubyLexer there yet, but don't be afraid
-that your report will get lonely.)
-Status:
-RubyLexer can correctly lex all legal Ruby 1.8 code that I've been able to
-find on my Debian system. It can also handle (most of) my catalog of nasty
-test cases (in testdata/p.rb). At this point, new bugs are almost exclusively
-found by my home-grown test code, rather than ruby code gathered 'from the
-wild'. A largish sample of ruby recently tested for the first time (that is,
-Rubyx) had _0_ lex errors. (And this is not the only example.) There are a
-number of issues i know about and plan to fix, but it seems that Ruby coders
-don't write code complex enough to trigger them very often. Although
-incomplete, RubyLexer is nevertheless better than many existing ad-hoc
-lexers. For instance, RubyLexer can correctly distinguish all cases of the
-different uses the following operators, depending on context:
-  %   can be modulus operator or start of fancy string
-  /   can be division operator or start of regex
-  * & + -  can be unary or binary operator
-  []  can be for array literal or index method
-  <<  can be here document or left shift operator (or in class<<obj expr)
-  ::  can be unary or binary operator
-  :   can be start of symbol, substitute for then, or part of ternary op
-      (there are other uses too, but they're not supported yet.)
-  ?   can be start of character constant or ternary operator
-  `   can be method name or start of exec string
-todo:
-test w/ more code (rubygems, rpa, obfuscated ruby contest, rubicon, others?)
-these 5 should be my standard test suite: p.rb, (matz') test.rb, tk.rb, obfuscated ruby contest, rubicon
-test more ways: cvt source to dos or mac fmt before testing
-test more ways: run unit tests after passing thru rubylexer (0.7)
-test more ways: test require'd, load'd, or eval'd code as well (0.7)
-lex code a line (or chunk) at a time and save state for next line (irb wants this) (0.8)
-incremental lexing (ides want this (for performance))
-put everything in a namespace
-integrate w/ other tools...
-html colorized output?
-move more state onto @bracestack (ongoing)
-expand on test documentation above
-the new cases in p.rb now compile, but won't run
-use want_op_name more
-return result as a half-parsed tree (with parentheses and the like matched)
-emit advisory tokens when see beginword, then (or equivalent), or end... what else does florian want?
-strings are still slow
-rantfile
-emit advisory tokens when local var defined/goes out of scope (or hidden/unhidden?)
-fakefile should be a mixin
-token pruning in dumptokens...
-new ruby features not yet supported:
-procs without proc keyword, looks like hash to current lexer
-keyword arguments, in hash immediates or actual param lists (&formal param lists?)
-unicode (0.9)
-:wrap and friends... (i wish someone would make a list of all the uses of colon in ruby.)
-parens in block param list (works, but hacky)
-known issues: (and planned fix release)
-context not really preserved when entering or leaving string inclusions. this causes
-a number or problems. (0.8)
-string tokenization sometimes a little different from ruby around newlines
-  (htree/template.rb) (0.8)
-string contents might not be correctly translated in a few cases (0.8?)
-the implicit tokens might be emitted at the wrong times. (or not at the right times) (need more test code) (0.7)
-local variables should be temporarily hidden by class, module, and def (0.7)
-windows or mac newline in source are likely to cause problems in obscure cases (need test case)
-line numbers are sometimes off... probably to do with multi-line strings (=begin...=end causes this) (0.8)
-symbols which contain string interpolations are flattened into one token. eg :"foo#{bar}" (0.8)
-methnames and varnames might get mixed up in def header (in idents after the 'def' but before param list) (0.7)
-FileAndLineToken not emitted everywhere it should be (0.8)
-'\r' whitespace sometimes seen in dos-formatted output.. shouldn't be (eg pre.rb) (0.7)
-no way to get offset of __END__ (??) (0.7)
-put things in lib/
-fixed issues in 0.6.2:
-testcode/dumptokens.rb charhandler.rb doesn't work... but does after unix2dos (not reproducible)
-files should be opened in binmode to avoid all possible eol translation
-(x.+?x) doesn't work
-methname/varname mixups in some cases
-performance, in most important cases.
-error handling tokens should be emitted on error input... ErrorToken mixin module
-but old error handling interface should be preserved and made available.
-move readahead and friends into IOext. make optimized readahead et al for fakefile.
-dos newlines (and newlines generally) can't be fancy string delimiters
-do,if,until, etc, have no way to tell if an end is associated
-break readme into pieces
-fixed issues in 0.6.0:
-the implicit tokens might be emitted at the wrong times. (or not at the right times) (partly fixed) (0.6)
-: operator might be a synonym for 'then' (0.6)
-variables other than the last are not recognized in multiple assignment. (0.7)
-variables created by for and rescue are not recognized. (0.7)
-token following :: should not be BareSymbolToken if begins with A-Z (unless obviously a func, eg b/c followed by func param list)
-read code to be lexed from a string. (irb wants this) (0.7)
-fancy symbols don't work at all. (like this:  %s{abcdefg}) (0.7) [this is regressing now]
-Newline,EscNl,BareSymbolToken may get renamed
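
The context-dependent operators listed in the deleted README can be illustrated with a few lines of plain Ruby. This is only a sketch of the ambiguity a lexer has to resolve, not code from rubylexer; the constant name AMBIGUITY_DEMO is made up for this example, and the character-constant line assumes Ruby 1.9 or later, where `?a` is a one-character String (in Ruby 1.8 it was an Integer).

```ruby
# Each pair below uses the same punctuation with a different meaning.
x = 7
y = 2

modulus      = x % y     # % as the modulus operator
fancy_string = %q(a b)   # % as the start of a "fancy" (percent) string

quotient = x / y         # / as the division operator
regex    = /y/           # / as the start of a regex

shifted = x << y         # << as the left shift operator
heredoc = <<EOS          # << as the start of a here document
text
EOS

char    = ?a                        # ? as the start of a character constant
ternary = x > y ? "big" : "small"   # ? and : as the ternary operator

array   = [x, y]         # [] as an array literal
indexed = array[0]       # [] as the index method

# Collected so the results can be inspected in one place.
AMBIGUITY_DEMO = {
  modulus: modulus, fancy_string: fancy_string,
  quotient: quotient, regex_source: regex.source,
  shifted: shifted, heredoc: heredoc,
  char: char, ternary: ternary, indexed: indexed
}
```

In every pair the lexer cannot decide from the punctuation alone; it has to know whether the preceding token is a value (a local variable, a literal, a closing bracket) or something that expects an operand, which is why the README says some parsing is needed.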

data/Rantfile DELETED

@@ -1,37 +0,0 @@
-  import %w(rubydoc rubypackage)
-    test_files=Dir["test{code,data}/*.rb"]
-    lib_files = Dir["lib/**/*.rb"]  #need to actually put files here...
-    dist_files = lib_files + %w(Rantfile README COPYING) + test_files
-    desc "Run unit tests."
-    task :test do
-      sys.mkdir 'testresults'
-      test_files.each{|f|
-            sys.ruby "testcode/rubylexervsruby.rb", f
-      }
-      lib_files.each{|f|
-            sys.ruby "testcode/rubylexervsruby.rb", f
-      }
-      system 'which locate grep' &&
-        sys.ruby "testcode/rubylexervsruby.rb",  `locate /tk.rb|grep 'tk.rb$'`
-    end
-    desc "Generate html documentation."
-    gen RubyDoc do |t|
-        t.opts = %w(--title RubyLexer --main README README)
-    end
-    desc "Create packages."
-    gen RubyPackage, :rubylexer do |t|
-        t.version = "0.6.1"
-        t.summary = "A complete lexer of ruby in ruby."
-        t.files = dist_files
-        t.package_task :gem
-        #need more here
-    end
-    task :clean do
-        sys.rm_rf %w(doc pkg testresults)
-    end

data/io.each_til_charset.rb DELETED

@@ -1,247 +0,0 @@
-=begin copyright
-    rubylexer - a ruby lexer written in ruby
-    Copyright (C) 2004,2005  Caleb Clausen
-    This library is free software; you can redistribute it and/or
-    modify it under the terms of the GNU Lesser General Public
-    License as published by the Free Software Foundation; either
-    version 2.1 of the License, or (at your option) any later version.
-    This library is distributed in the hope that it will be useful,
-    but WITHOUT ANY WARRANTY; without even the implied warranty of
-    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-    Lesser General Public License for more details.
-    You should have received a copy of the GNU Lesser General Public
-    License along with this library; if not, write to the Free Software
-    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-=end
-module IOext
-   #read until a character in a user-supplied set is found.
-   #charrex must be a regexp that contains _only_ a single character class
-   def til_charset(charrex,blocksize=16)
-      blocks=[]
-      m=nil
-      until eof?
-         block=read blocksize
-         #if near eof, less than a full block may have been read
-         if m=charrex .match(block)
-            self.pos-=m.post_match.length+1
-            #'self.' shouldn't be needed... but is
-            blocks.push m.pre_match if m.pre_match.length>0
-            break
-         end
-         blocks<<block
-      end
-      return blocks.to_s
-   end
-   #-----------------------------------
-   #read and return next char if it matches ch
-   #else, leave input unread and return nil or false
-   def eat_next_if(ch)
-      oldpos=pos
-      c=read(1)
-      ch.kind_of? Integer and ch=ch.chr
-      return case c
-         when ch then c
-         when '' then self.pos=oldpos; nil
-         else    self.pos=oldpos; false
-      end
-   end
-   #-----------------------------------
-   def eat_while(pat)
-      pat.kind_of? Integer and pat=pat.chr
-      result=''
-      loop {
-         ch=read(1)
-         unless pat===ch
-            back1char unless ch.nil? #nil ch mean eof
-            return result
-         end
-         result << ch
-      }
-      return result
-   end
-   #-----------------------------------
-   #returns previous character in stream
-   #without changing stream position
-   #or '' if at beginning
-   def prevchar
-      pos==0 and return ''
-      back1char
-      return getc.chr
-   end
-   #-----------------------------------
-   #returns next character in stream
-   #without changing stream position
-   #or nil if at end
-   def nextchar
-      eof? and return nil
-      result=getc
-      back1char
-      return result
-   end
-   #-----------------------------------
-   #this should really be in class File...
-   def getchar
-      eof? and return ''
-      return getc.chr
-   end
-   #-----------------------------------
-   def back1char()    self.pos-=1   end
-   #-----------------------------------
-   def readahead(len)
-      oldpos=pos
-         result=read(len)
-      self.pos=oldpos
-      return result
-   end
-   #-----------------------------------
-   def readback(len)
-      oldpos=pos
-         self.pos-=len
-         result=read(len)
-      self.pos=oldpos
-      return result
-   end
-   #-----------------------------------
-   def readuntil(pat)
-      each(pat) { |match|
-          return match
-      }
-   end
-#-----------------------------------------------------------------------
-        #a String with the duck-type of a File
-        #just enough is emulated to fool RubyLexer
-		class FakeFile < ::String  #thanks to murphy for this lovely.
-			def initialize(*)
-				super
-				@pos = 0
-			end
-			attr_accessor :pos
-			def read x
-				pos = @pos
-				@pos += x
-                @pos>size and @pos=size
-				self[pos ... @pos]
-			end
-			def getc
-                eof? and return nil
-				pos = @pos
-				@pos += 1
-				self[pos]
-			end
-			def eof?
-				@pos >= size
-			end
-			def each_byte
-				until eof?
-					yield getc
-				end
-			end
-            def stat  #cheezy cheat to make #stat.size work
-              self
-            end
-            def close; end
-            def binmode; end
-            include IOext
-   #-----------------------------------
-   #read and return next char if it matches ch
-   #else, leave input unread and return nil or false
-   def eat_next_if(ch)
-      c=self[@pos,1]
-      ch.kind_of? Integer and ch=ch.chr
-      case c
-         when ch then @pos+=1;c
-         when '' then nil
-         else  false
-      end
-   end
-   #-----------------------------------
-   #returns previous character in stream
-   #without changing stream position
-   #or '' if at beginning
-   def prevchar #returns Fixnum
-      pos==0 ? '' : self[@pos-1]
-   end
-   #-----------------------------------
-   #returns next character in stream
-   #without changing stream position
-   #or nil if at end
-   def nextchar #returns Fixnum
-      self[@pos]
-   end
-   #-----------------------------------
-   def getchar  #returns String
-      eof? and return ''
-				pos = @pos
-				@pos += 1
-				self[pos,1]
-   end
-   #-----------------------------------
-   def back1char()    @pos-=1   end
-   #-----------------------------------
-   def readahead(len)
-     self[@pos,len]
-   end
-   #-----------------------------------
-   def readback(len)
-      assert @pos-len>=0
-      self[@pos-len,len]
-   end
-		end
-end
-class IO
-  include IOext
-end
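
The readahead and readback methods in the deleted IOext module share one technique: remember the stream position, read, then restore the position. A minimal sketch of that idea on a modern Ruby follows, using the standard library's StringIO in place of the hand-written FakeFile. The module name PeekableIO and the helper peekable are hypothetical, and the bounds check in readback is an addition for this example rather than something the original did for real IO objects.

```ruby
require "stringio"

# Hypothetical mixin; the method names mirror the deleted IOext methods.
module PeekableIO
  # Return the next len bytes without advancing the stream position.
  def readahead(len)
    oldpos = pos
    result = read(len)
    self.pos = oldpos
    result
  end

  # Return the len bytes before the current position,
  # leaving the position unchanged.
  def readback(len)
    oldpos = pos
    raise ArgumentError, "cannot read back past start of stream" if len > oldpos
    self.pos = oldpos - len
    result = read(len)
    self.pos = oldpos
    result
  end
end

# Hypothetical convenience constructor: a string-backed stream with the mixin.
def peekable(source)
  StringIO.new(source).extend(PeekableIO)
end

io = peekable("def foo; end")
io.read(4)        # consume "def "
io.readahead(3)   # peek at "foo"; the position stays at 4
io.readback(4)    # look back at "def "; the position stays at 4
```

Because the mixin only relies on pos, pos= and read, the same module could in principle be extended onto a File opened in binary mode.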

data/require.rb DELETED

@@ -1,103 +0,0 @@
-#wrapper versions of all commands that import code into a running program:
-#require, load, eval and friends. the wrapped versions pass the code to
-#import to rubylexervsruby, to test whether it gets lexed correctly. an
-#exception is raised if an lex error happens, else the code should behave
-#as normal, just much slower.
-class Kernel
-  System_extension_extension=
-    case RUBY_PLATFORM
-      when /darwin/:   'o'
-      when /windows/i: 'dll'
-      else             'so'
-    end
-  System_ext_rex=/\.#{System_extension_extension}$/o
-  def require_name_resolve(name)
-    add_ext=case name
-      when System_ext_rex,/\.rb$/:
-      else name=/(#{name})(#{System_ext_rex}|\.rb)?/
-    end
-    name=/#{File::SEPARATOR}#{name}$/
-    $:.find{|dir|
-      dir.chomp File::SEPARATOR
-      Dir[/#{dir}#{name}/]
-    }
-  end
-  #reallyy jonesing for :wrap here
-  alias stdlib__require require
-  def require feat
-    name=feat
-    name=require_name_resolve(name)    unless File.abs_path?(name)
-    return(false) unless name
-    return(true) if $".grep(feat)
-    $"<<feat
-    return stdlib__require(name) if name[System_ext_rex]
-    load name
-  end
-  alias stdlib__load load
-  def load name,wrap=false
-    name=$:.find{|dir| Dir[dir,name]}    unless File.abs_path?(name)
-    if wrap then Module.new {
-           eval File.read(name), huh binding, name,1
-    } else eval File.read(name), huh binding, name,1
-    end
-    true
-  end
-  @@evalpos=1  #eval saves a position for the next eval sometimes... when?
-  alias stdlib__eval eval
-  def eval code,binding=nil,name='(eval)',linenum=1
-    if binding
-      rubylexervsruby(code, :name=>name, :linenum=>linenum, :locals=>eval("local_variables",binding))
-      return stdlib__eval code,binding,filename,linenum
-    end
-    huh Binding.of_caller{|bg| eval code,bg,name,linenum}
-  end
-  huh#got to do module_eval, class_eval, instance_eval,  etc
-end
-class Object
-  alias stdlib__instance_eval instance_eval
-  def instance_eval(code,&block)
-    block and return stdlib__instance_eval &block
-    eval code, stdlib__instance_eval{binding}
-  end
-end
-class Module
-  alias stdlib__module_eval module_eval
-  alias module_eval instance_eval
-end
-class Class
-  alias stdlib__class_eval class_eval
-  alias class_eval instance_eval
-end
-class Binding
-  alias stdlib__eval eval
-  def eval code,name='(eval)',linenum=1
-      rubylexervsruby(code, :name=>name, :linenum=>linenum, :locals=>eval("local_variables",binding))
-      huh #should set code to (effectively) output of tokentest
-      #how to do that within rubylexervsruby
-      return stdlib__eval code,self,filename,linenum
-  end
-end
-=begin
-def Module
-  def new
-    o=Object.extend self
-  end
-end
-=end
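
The header comment of the deleted require.rb describes an alias-and-wrap approach: keep the original method under a new name, then redefine the method so the code is checked first and an exception is raised on a lex error. The sketch below shows that pattern on a toy class instead of Kernel, so nothing global is patched. ToyRuntime, evaluate, lex_check! and LexCheckError are all hypothetical names, and the check itself is a stand-in (it rejects code containing a NUL byte), not the rubylexervsruby comparison.

```ruby
# Toy class standing in for Kernel; evaluate stands in for eval/load/require.
class ToyRuntime
  def evaluate(code)
    "ran: #{code}"
  end
end

# Hypothetical error type raised when the pre-check fails.
class LexCheckError < StandardError; end

class ToyRuntime
  # Keep the original implementation reachable under a new name,
  # in the same spirit as `alias stdlib__eval eval` in the deleted file.
  alias_method :stdlib__evaluate, :evaluate

  # Stand-in checker: a real one would compare lexer output against Ruby.
  # Here, code containing a NUL byte counts as a "lex error".
  def lex_check!(code)
    raise LexCheckError, "lex error in #{code.inspect}" if code.include?("\0")
  end

  # Wrapped version: check first, then behave as before.
  def evaluate(code)
    lex_check!(code)
    stdlib__evaluate(code)
  end
end
```

On current Ruby versions, Module#prepend is a common alternative to alias chaining for this kind of wrapper; the alias form is shown here only because it matches the deleted file.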