RubyGems - davidrichards-data_frame - Versions diffs - 0.0.3 - Mend

davidrichards-data_frame 0.0.3

Files changed (12)

data/README.rdoc +64 -0
data/VERSION.yml +4 -0
data/lib/data_frame.rb +114 -0
data/lib/data_frame/callback_array.rb +152 -0
data/lib/data_frame/transposable_array.rb +22 -0
data/lib/ext/string.rb +5 -0
data/lib/ext/symbol.rb +5 -0
data/spec/data_frame/callback_array_spec.rb +148 -0
data/spec/data_frame/transposable_array_spec.rb +138 -0
data/spec/data_frame_spec.rb +98 -0
data/spec/spec_helper.rb +8 -0
metadata +96 -0

data/README.rdoc ADDED

@@ -0,0 +1,64 @@
+== Data Frame
+This is a general data frame.  Load arrays and labels into it, and you will have a very powerful set of tools on your data set.
+==Usage
+  df = DataFrame.from_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv')
+  df.labels
+  # => [:x, :y, :month, :day, :ffmc, :dmc, :dc, :isi, :temp, :rh, :wind, :rain, :area]
+  df.dmc
+  # => [26.2, 35.4, 43.7, 33.3, 51.3, 85.3,...]
+  df.dmc.max
+  # => 291.3
+  df.dmc.min
+  # => 1.1
+  df.dmc.mean
+  # => 110.872340425532
+  df.dmc.std
+  # => 64.0464822492543
+  df = DataFrame.new(:list, :of, :things)
+  # => #<DataFrame:0x24ec6e8 @items=[], @labels=[:list, :of, :things]>
+  df.labels
+  # => [:list, :of, :things]
+  df << [1,2,3]
+  # => [[1, 2, 3]]
+  df.import([[2,3,4],[5,6,7]])
+  # => [[2, 3, 4], [5, 6, 7]]
+  df.items
+  # => [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
+  df.list
+  # => [1, 2, 5]
+  df.list.correlation(df.things)
+  # => 1.0
+  df.list
+  # => [1, 2, 5]
+  df.things
+  # => [3, 4, 7]
+There are a few important features to know:
+* DataFrame.from_csv works for a string, a filename, or a URL.
+* FasterCSV parsing parameters can be passed to DataFrame.from_csv
+* DataFrame resolves an unknown method first against the column labels, then against the row labels, then against the items table.  A label therefore shadows any Array method of the same name, so don't name things :mean, :standard_deviation, :min, and so on.
+* CallbackArray lets you register callbacks that run whenever an array is tainted or untainted.  Tainting methods include taint, shift, pop, clear, and map!.  This is generally useful and will probably be copied into the Repositories gem.
+* TransposableArray is a subclass of CallbackArray, demonstrating how to use it.  It creates a very simple approach to memoization.  It caches the transpose of the table and resets it whenever it is tainted.
+To get your feet wet, you may want to play with data sets found here:
+  http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html
+==Installation
+sudo gem install davidrichards-data_frame
+=== Dependencies
+* ActiveSupport: sudo gem install activesupport
+* JustEnumerableStats: sudo gem install davidrichards-just_enumerable_stats
+* FasterCSV: sudo gem install fastercsv
+==COPYRIGHT
+Copyright (c) 2009 David Richards. See LICENSE for details.
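
Note on the README above: FasterCSV was adopted as Ruby's standard `CSV` library in Ruby 1.9, so the loading step it describes can be approximated without the gem. The following is a minimal sketch under that assumption, not the gem's implementation. `load_frame`, `SAMPLE`, and the simplified lower-casing of labels are hypothetical and introduced only for illustration.

```ruby
require 'csv'

# Hypothetical helper: parse CSV text into [labels, rows]. The :all
# converters turn numeric-looking fields into Integer or Float values,
# which is what the gem's default options ask FasterCSV to do.
def load_frame(contents)
  table = CSV.parse(contents, converters: :all)
  # Simplified label handling (assumption): lower-case the header names.
  labels = table.shift.map { |header| header.to_s.downcase.to_sym }
  [labels, table]
end

# Two rows taken from the forest fires sample used in the spec, trimmed
# to four columns.
SAMPLE = <<~DATA
  X,Y,month,DMC
  7,5,mar,26.2
  7,4,oct,35.4
DATA

labels, rows = load_frame(SAMPLE)

# Columns are the transpose of the rows, keyed by label.
columns = labels.zip(rows.transpose).to_h

p labels          # [:x, :y, :month, :dmc]
p columns[:dmc]   # [26.2, 35.4]
p columns[:x]     # [7, 7]
```

The transpose step is the reason the gem caches it: every column access needs the whole table transposed.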

data/VERSION.yml ADDED

@@ -0,0 +1,4 @@
+---
+:major: 0
+:minor: 0
+:patch: 3

data/lib/data_frame.rb ADDED

@@ -0,0 +1,114 @@
+require 'rubygems'
+require 'activesupport'
+require 'just_enumerable_stats'
+require 'open-uri'
+require 'fastercsv'
+Dir.glob("#{File.dirname(__FILE__)}/ext/*.rb").each { |file| require file }
+$:.unshift(File.dirname(__FILE__))
+require 'data_frame/callback_array'
+require 'data_frame/transposable_array'
+# This allows me to have named columns and optionally named rows in a
+# data frame, to work calculations (usually on the columns), to
+# transpose the matrix and store the transposed matrix until the object
+# is tainted.
+class DataFrame
+  class << self
+    # This is the neatest part of this neat gem.
+    # DataFrame.from_csv can be called in a lot of ways:
+    # DataFrame.from_csv(csv_contents)
+    # DataFrame.from_csv(filename)
+    # DataFrame.from_csv(url)
+    # If you need to define converters for FasterCSV, do it before calling
+    # this method:
+    # FasterCSV::Converters[:special] = lambda{|f| f == 'foo' ? 'bar' : 'foo'}
+    # DataFrame.from_csv('http://example.com/my_special_url.csv', :converters => :special)
+    # This returns bar where 'foo' was found and 'foo' everywhere else.
+    def from_csv(obj, opts={})
+      labels, table = infer_csv_contents(obj, opts)
+      return nil unless labels and table
+      df = new(*labels)
+      df.import(table)
+      df
+    end
+    protected
+      def infer_csv_contents(obj, opts={})
+        contents = File.read(obj) if File.exist?(obj)
+        begin
+          open(obj) {|f| contents = f.read} unless contents
+        rescue
+          nil
+        end
+        contents ||= obj if obj.is_a?(String)
+        return nil unless contents
+        table = FCSV.parse(contents, default_csv_opts.merge(opts))
+        labels = table.shift
+        [labels, table]
+      end
+      def default_csv_opts; {:converters => :all}; end
+  end
+  # Loads a batch of rows.  Expects an array of arrays, else you don't
+  # know what you have.
+  def import(rows)
+    rows.each do |row|
+      self.add_item(row)
+    end
+  end
+  # The labels of the data items
+  attr_reader :labels
+  # The items stored in the frame
+  attr_reader :items
+  def initialize(*labels)
+    @labels = labels.map {|e| e.to_underscore_sym }
+    @items = TransposableArray.new
+  end
+  def add_item(item)
+    self.items << item
+  end
+  def row_labels
+    @row_labels ||= []
+  end
+  def row_labels=(ary)
+    raise ArgumentError, "Row labels must be an array" unless ary.is_a?(Array)
+    @row_labels = ary
+  end
+  def render_column(sym)
+    i = @labels.index(sym)
+    return nil unless i
+    @items.transpose[i]
+  end
+  def render_row(sym)
+    i = self.row_labels.index(sym)
+    return nil unless i
+    @items[i]
+  end
+  def method_missing(sym, *args, &block)
+    if self.labels.include?(sym)
+      render_column(sym)
+    elsif self.row_labels.include?(sym)
+      render_row(sym)
+    elsif @items.respond_to?(sym)
+      @items.send(sym, *args, &block)
+    else
+      super
+    end
+  end
+end
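
The lookup order in `method_missing` above (column labels, then row labels, then the items array) can be tried in isolation. This is a minimal sketch, assuming plain Ruby with no gem dependencies. `MiniFrame` is a hypothetical stand-in for `DataFrame`, and the `respond_to_missing?` override is an addition that the gem itself does not define.

```ruby
# MiniFrame is a hypothetical reduction of DataFrame to its dispatch order.
class MiniFrame
  attr_reader :labels, :items
  attr_accessor :row_labels

  def initialize(*labels)
    @labels = labels
    @row_labels = []
    @items = []
  end

  def method_missing(name, *args, &block)
    if (i = labels.index(name))
      items.transpose[i]                       # 1. column by label
    elsif (i = row_labels.index(name))
      items[i]                                 # 2. row by label
    elsif items.respond_to?(name)
      items.public_send(name, *args, &block)   # 3. delegate to the rows
    else
      super                                    # 4. normal NoMethodError
    end
  end

  # Keeps respond_to? in step with method_missing (not part of the gem).
  def respond_to_missing?(name, include_private = false)
    labels.include?(name) || row_labels.include?(name) ||
      items.respond_to?(name) || super
  end
end

frame = MiniFrame.new(:list, :of, :things)
frame << [1, 2, 3]
frame << [2, 3, 4]
frame.row_labels = [:alpha, :beta]

p frame.things   # [3, 4]     (column lookup)
p frame.beta     # [2, 3, 4]  (row lookup)
p frame.length   # 2          (delegated to the rows array)
```

One consequence of this design: `method_missing` only fires for methods the object does not already have, so a label that matches a method every object inherits (for example `:class` or `:hash`) never reaches the column lookup.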

data/lib/data_frame/callback_array.rb ADDED

@@ -0,0 +1,152 @@
+# This overloads the tainting methods in array with callbacks.  So, I
+# can block all changes to an array, or broadcast to observers after a
+# change, or limit the size of an array. It really just opens up the
+# array to one more dimension: change.  Before and after change, stack
+# up any activity to block or enhance the experience.  There are also
+# callbacks on untaint.  The tainting methods actually
+class CallbackArray < Array
+  include ActiveSupport::Callbacks
+  define_callbacks :before_taint, :after_taint, :before_untaint, :after_untaint
+  def wrap_call(safe_method, *args)
+    callback_result = run_callbacks(:before_taint)
+    if callback_result
+      result = self.send(safe_method, *args)
+      self.orig_taint
+      run_callbacks(:after_taint)
+    end
+    result
+  end
+  protected :wrap_call
+  # Need the original taint for all tainting methods
+  alias :orig_taint :taint
+  def taint
+    callback_result = run_callbacks(:before_taint)
+    if callback_result
+      result = self.orig_taint
+      run_callbacks(:after_taint)
+    end
+    result
+  end
+  # No other method needs orig_untaint, so building this in the cleanest
+  # way possible.
+  orig_untaint = instance_method(:untaint)
+  define_method(:untaint) {
+    callback_result = run_callbacks(:before_untaint)
+    if callback_result
+      val = orig_untaint.bind(self).call
+      run_callbacks(:after_untaint)
+    end
+    val
+  }
+  alias :nontainting_assign :[]=
+  def []=(index, value)
+    wrap_call(:nontainting_assign, index, value)
+  end
+  alias :nontainting_append :<<
+  def <<(value)
+    wrap_call(:nontainting_append, value)
+  end
+  alias :nontainting_delete :delete
+  def delete(value)
+    wrap_call(:nontainting_delete, value)
+  end
+  alias :nontainting_push :push
+  def push(value)
+    wrap_call(:nontainting_push, value)
+  end
+  alias :nontainting_pop :pop
+  def pop
+    wrap_call(:nontainting_pop)
+  end
+  alias :nontainting_shift :shift
+  def shift
+    wrap_call(:nontainting_shift)
+  end
+  alias :nontainting_unshift :unshift
+  def unshift(value)
+    wrap_call(:nontainting_unshift, value)
+  end
+  alias :nontainting_map! :map!
+  def map!(&block)
+    callback_result = run_callbacks(:before_taint)
+    if callback_result
+      result = nontainting_map!(&block)
+      self.orig_taint
+      run_callbacks(:after_taint)
+    end
+    result
+  end
+  alias :nontainting_sort! :sort!
+  def sort!(&block)
+    callback_result = run_callbacks(:before_taint)
+    if callback_result
+      result = nontainting_sort!(&block)
+      self.orig_taint
+      run_callbacks(:after_taint)
+    end
+    result
+  end
+  alias :nontainting_reverse! :reverse!
+  def reverse!
+    wrap_call(:nontainting_reverse!)
+  end
+  alias :nontainting_collect! :collect!
+  def collect!(&block)
+    callback_result = run_callbacks(:before_taint)
+    if callback_result
+      result = nontainting_collect!(&block)
+      self.orig_taint
+      run_callbacks(:after_taint)
+    end
+    result
+  end
+  alias :nontainting_compact! :compact!
+  def compact!
+    wrap_call(:nontainting_compact!)
+  end
+  alias :nontainting_reject! :reject!
+  def reject!(&block)
+    callback_result = run_callbacks(:before_taint)
+    if callback_result
+      result = nontainting_reject!(&block)
+      self.orig_taint
+      run_callbacks(:after_taint)
+    end
+    result
+  end
+  alias :nontainting_slice! :slice!
+  def slice!(*args)
+    wrap_call(:nontainting_slice!, *args)
+  end
+  alias :nontainting_flatten! :flatten!
+  def flatten!
+    wrap_call(:nontainting_flatten!)
+  end
+  alias :nontainting_uniq! :uniq!
+  def uniq!
+    wrap_call(:nontainting_uniq!)
+  end
+  alias :nontainting_clear :clear
+  def clear
+    wrap_call(:nontainting_clear)
+  end
+end
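
The class above repeats one wrapper pattern for each mutating method. It also depends on `Object#taint` and `untaint`, which were deprecated in Ruby 2.7 and removed in Ruby 3.2, so the file is unlikely to load on a current interpreter. The sketch below shows the same idea in plain Ruby, generating the wrappers in a loop. It is a hypothetical variant, not the gem's code: `NotifyingArray`, `before_change`, and `after_change` are names introduced here, and the veto rule (a before-callback returning `false` blocks the change) is an assumption of the sketch.

```ruby
# Hypothetical plain-Ruby variant of the callback idea, with no taint
# flag and no ActiveSupport::Callbacks.
class NotifyingArray < Array
  MUTATORS = %i([]= << push pop shift unshift delete clear
                map! collect! sort! reverse! compact! reject!
                slice! flatten! uniq!).freeze

  # Register a block to run before any mutator; returning false vetoes.
  def before_change(&block)
    (@before_change ||= []) << block
  end

  # Register a block to run after any mutator completes.
  def after_change(&block)
    (@after_change ||= []) << block
  end

  MUTATORS.each do |name|
    define_method(name) do |*args, &block|
      # Skip the mutation entirely if any before-callback returns false.
      next nil if (@before_change || []).any? { |cb| cb.call(self) == false }

      # Explicit arguments are required for super inside define_method.
      result = super(*args, &block)
      (@after_change || []).each { |cb| cb.call(self) }
      result
    end
  end
end

changes = 0
numbers = NotifyingArray.new([3, 1, 2])
numbers.after_change { changes += 1 }

numbers << 4
numbers.sort!
numbers.sort                 # non-mutating, so no callback runs
p changes                    # 2

numbers.before_change { false }
p numbers.push(9)            # nil, because the change was vetoed
p numbers                    # [1, 2, 3, 4]
```

Each mutator has to be wrapped individually because the built-in Array methods do not call one another through Ruby-level dispatch; overriding `<<` alone would not intercept `push`.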

data/lib/data_frame/transposable_array.rb ADDED

@@ -0,0 +1,22 @@
+# The only trick in this array is that its transpose is memoized until
+# it is tainted.  This will reduce computations elegantly.
+class TransposableArray < CallbackArray
+  after_taint :clear_cache
+  orig_transpose = instance_method(:transpose)
+  define_method(:transpose) {
+    @transpose ||= orig_transpose.bind(self).call
+  }
+  # For debugging and testing purposes, it just feels dirty to always ask
+  # for @ta.send(:instance_variable_get, :@transpose)
+  def cache
+    @transpose
+  end
+  def clear_cache
+    @transpose = nil
+  end
+  protected :clear_cache
+end
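
The memoize-then-invalidate pattern used by `TransposableArray` can be shown without the callback machinery. This is a minimal sketch under stated assumptions. `CachedTable` and its `computations` counter are hypothetical, and only two mutators are covered to keep it short.

```ruby
# CachedTable is a hypothetical, dependency-free illustration: cache the
# result of #transpose and drop the cache when the rows change.
class CachedTable < Array
  # Number of times the transpose was actually computed (for inspection).
  attr_reader :computations

  def transpose
    @transposed ||= begin
      @computations = (@computations || 0) + 1
      super
    end
  end

  # Invalidate on append. A complete version would have to cover every
  # mutating Array method, as CallbackArray does.
  def <<(row)
    @transposed = nil
    super
  end

  def clear
    @transposed = nil
    super
  end
end

table = CachedTable.new([[1, 2, 3], [4, 5, 6]])
table.transpose
table.transpose
p table.computations   # 1, the second call was served from the cache

table << [7, 8, 9]
p table.transpose      # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
p table.computations   # 2, the append invalidated the cache
```

A limitation worth knowing, which appears to apply to the gem's version as well: the cache is tied to mutations of the outer array only, so changing a cell in place (`table[0][0] = 99`) leaves a stale transpose in the cache.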

data/lib/ext/string.rb ADDED

@@ -0,0 +1,5 @@
+class String # :nodoc:
+  def to_underscore_sym
+    self.titleize.gsub(/\s+/, '').underscore.to_sym
+  end
+end

data/lib/ext/symbol.rb ADDED

@@ -0,0 +1,5 @@
+class Symbol # :nodoc:
+  def to_underscore_sym
+    self.to_s.titleize.gsub(/\s+/, '').underscore.to_sym
+  end
+end
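
Both extensions above rely on ActiveSupport's `titleize` and `underscore` to turn a label or CSV header into a snake_case symbol. For readers without ActiveSupport at hand, the effect can be approximated in plain Ruby. This is a rough sketch only: `normalize_label` is a hypothetical helper, and it is not guaranteed to match ActiveSupport's inflector on every input (acronym and inflection rules differ).

```ruby
# Hypothetical approximation of to_underscore_sym without ActiveSupport.
# It is NOT guaranteed to agree with the inflector on every input.
def normalize_label(label)
  label.to_s.strip
       .gsub(/([a-z\d])([A-Z])/, '\1_\2') # split camelCase boundaries
       .gsub(/[\s-]+/, '_')               # spaces and dashes to underscores
       .downcase
       .to_sym
end

p normalize_label('FFMC')         # :ffmc
p normalize_label('Wind Speed')   # :wind_speed
p normalize_label(:relHumidity)   # :rel_humidity
```

A free-standing helper is used here rather than reopening `String` and `Symbol`, which avoids adding methods to core classes in the sketch.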

data/spec/data_frame/callback_array_spec.rb ADDED

@@ -0,0 +1,148 @@
+require File.join(File.dirname(__FILE__), "/../spec_helper")
+# The TransposableArray spec thoroughly exercises the after_taint
+# callback, so only the other callbacks are tested here.
+class Register
+  def self.next(meth)
+    @@count ||= {}
+    @@count[meth] ||= 0
+    @@count[meth] += 1
+  end
+  def self.for(meth)
+    @@count ||= {}
+    @@count[meth]
+  end
+end
+class A < CallbackArray
+  before_taint :register_before_taint
+  def register_before_taint
+    Register.next(:before_taint)
+  end
+  before_untaint :register_before_untaint
+  def register_before_untaint
+    Register.next(:before_untaint)
+  end
+  after_untaint :register_after_untaint
+  def register_after_untaint
+    Register.next(:after_untaint)
+  end
+end
+describe CallbackArray do
+  before do
+    @a = A.new [1,2,3]
+  end
+  context "before_taint" do
+    before do
+      @c = Register.for(:before_taint) || 0
+    end
+    after do
+      Register.for(:before_taint).should eql(@c + 1)
+      @a.should be_tainted
+    end
+    it "should callback before taint" do
+      @a.taint
+    end
+    it "should callback before :[]=" do
+      @a[0] = 2
+    end
+    it "should callback before :<<" do
+      @a << 3
+    end
+    it "should callback before :delete" do
+      @a.delete(2)
+    end
+    it "should callback before :push" do
+      @a.push(5)
+    end
+    it "should callback before :pop" do
+      @a.pop
+    end
+    it "should callback before :shift" do
+      @a.shift
+    end
+    it "should callback before :unshift" do
+      @a.unshift(6)
+    end
+    it "should callback before :map!" do
+      @a.map! {|e| e}
+    end
+    it "should callback before :sort!" do
+      @a.sort!
+    end
+    it "should callback before :reverse!" do
+      @a.reverse!
+    end
+    it "should callback before :collect!" do
+      @a.collect! {|e| e}
+    end
+    it "should callback before :compact!" do
+      @a.compact!
+    end
+    it "should callback before :reject!" do
+      @a.reject! {|e| not e}
+    end
+    it "should callback before :slice!" do
+      @a.slice!(1,2)
+    end
+    it "should callback before :flatten!" do
+      @a.flatten!
+    end
+    it "should callback before :uniq!" do
+      @a.uniq!
+    end
+    it "should callback before :clear" do
+      @a.clear
+    end
+  end
+  it "should not adjust the array in other methods" do
+    @a.at(0)
+    @a.sort
+    @a.uniq
+    @a.find{|e| e}
+    Register.for(:before_taint).should be_nil
+    @a.should_not be_tainted
+  end
+  it "should callback before untaint" do
+    c = Register.for(:before_untaint) || 0
+    @a.taint
+    @a.untaint
+    Register.for(:before_untaint).should eql(c + 1)
+  end
+  it "should callback after untaint" do
+    c = Register.for(:after_untaint) || 0
+    @a.taint
+    @a.untaint
+    Register.for(:after_untaint).should eql(c + 1)
+  end
+end

data/spec/data_frame/transposable_array_spec.rb ADDED

@@ -0,0 +1,138 @@
+require File.join(File.dirname(__FILE__), "/../spec_helper")
+describe TransposableArray do
+  before do
+    @ta = TransposableArray.new [[1,2,3],[4,5,6],[7,8,9]]
+    @t = [[1,4,7],[2,5,8],[3,6,9]]
+  end
+  it "should be able to transpose itself" do
+    @ta.transpose.should eql(@t)
+  end
+  it "should cache the transpose" do
+    @ta.cache.should be_nil
+    @ta.transpose
+    @ta.cache.should eql(@t)
+  end
+  it "should clear the cache on taint" do
+    @count = nil
+    @ta.transpose
+    @ta.taint
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on []=" do
+    @ta.transpose
+    @ta[0] = 1
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on <<" do
+    @ta.transpose
+    @ta << 1
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on delete" do
+    @ta.transpose
+    @ta.delete(0)
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on push" do
+    @ta.transpose
+    @ta.push(1)
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on pop" do
+    @ta.transpose
+    @ta.pop
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on shift" do
+    @ta.transpose
+    @ta.shift
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on unshift" do
+    @ta.transpose
+    @ta.unshift(1)
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on map!" do
+    @ta.transpose
+    @ta.map!{ |e| e }
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on sort!" do
+    @ta.transpose
+    @ta.sort!
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on reverse!" do
+    @ta.transpose
+    @ta.reverse!
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on collect!" do
+    @ta.transpose
+    @ta.collect! {|e| e}
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on compact!" do
+    @ta.transpose
+    @ta.compact!
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on reject!" do
+    @ta.transpose
+    @ta.reject! {|e| e}
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on slice!" do
+    @ta.transpose
+    @ta.slice!(1,2)
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on flatten!" do
+    @ta.transpose
+    @ta.flatten!
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on uniq!" do
+    @ta.transpose
+    @ta.uniq!
+    @ta.cache.should be_nil
+  end
+  it "should clear the cache on clear" do
+    @ta.transpose
+    @ta.clear
+    @ta.cache.should be_nil
+  end
+  it "should not adjust the array in other methods" do
+    @ta.transpose
+    @ta.at(0)
+    @ta.sort
+    @ta.uniq
+    @ta.find{|e| e}
+    @ta.cache.should eql(@t)
+  end
+end

data/spec/data_frame_spec.rb ADDED

@@ -0,0 +1,98 @@
+require File.join(File.dirname(__FILE__), "/spec_helper")
+describe DataFrame do
+  before do
+    @labels = [:these, :are, :the, :labels]
+    @df = DataFrame.new(*@labels)
+  end
+  it "should initialize with labels" do
+    df = DataFrame.new(*@labels)
+    df.labels.should eql(@labels)
+  end
+  it "should initialize with an empty items list" do
+    @df.items.should be_is_a(TransposableArray)
+    @df.items.should be_empty
+  end
+  it "should be able to add an item" do
+    item = [1,2,3,4]
+    @df.add_item(item)
+    @df.items.should eql([item])
+  end
+  it "should use just_enumerable_stats" do
+    [1,2,3].std.should eql(1.0)
+    lambda{[1,2,3].cor([2,3,5])}.should_not raise_error
+  end
+  context "column and row operations" do
+    before do
+      @df.add_item([1,2,3,4])
+      @df.add_item([5,6,7,8])
+      @df.add_item([9,10,11,12])
+    end
+    it "should have a method for every label, the column in the data frame" do
+      @df.these.should eql([1,5,9])
+    end
+    it "should make columns easily computable" do
+      @df.these.std.should eql([1,5,9].std)
+    end
+    it "should defer unknown methods to the items in the data frame" do
+      @df[0].should eql([1,2,3,4])
+      @df << [13,14,15,16]
+      @df.last.should eql([13,14,15,16])
+      @df.map { |e| e.sum }.should eql([10,26,42,58])
+    end
+    it "should allow optional row labels" do
+      @df.row_labels.should eql([])
+    end
+    it "should have a setter for row labels" do
+      @df.row_labels = [:other, :things, :here]
+      @df.row_labels.should eql([:other, :things, :here])
+    end
+    it "should be able to access rows by their labels" do
+      @df.row_labels = [:other, :things, :here]
+      @df.here.should eql([9,10,11,12])
+    end
+    it "should make rows easily computable" do
+      @df.row_labels = [:other, :things, :here]
+      @df.here.std.should be_close(1.414, 0.001)
+    end
+  end
+  it "should be able to import more than one row at a time" do
+    @df.import([[2,2,2,2],[3,3,3,3],[4,4,4,4]])
+    @df.row_labels = [:twos, :threes, :fours]
+    @df.twos.should eql([2,2,2,2])
+    @df.threes.should eql([3,3,3,3])
+    @df.fours.should eql([4,4,4,4])
+  end
+  context "csv" do
+    it "should compute easily from csv" do
+      contents = %{X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
+7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0,0
+7,4,oct,tue,90.6,35.4,669.1,6.7,18,33,0.9,0,0
+}
+      labels = [:x, :y, :month, :day, :ffmc, :dmc, :dc, :isi, :temp, :rh, :wind, :rain, :area]
+      @df = DataFrame.from_csv(contents)
+      @df.labels.should eql(labels)
+      @df.x.should eql([7,7])
+      @df.area.should eql([0,0])
+    end
+  end
+end

data/spec/spec_helper.rb ADDED

@@ -0,0 +1,8 @@
+$: << File.join(File.dirname(__FILE__), "/../lib")
+require 'rubygems'
+require 'spec'
+require 'data_frame'
+Spec::Runner.configure do |config|
+end

metadata ADDED

@@ -0,0 +1,96 @@
+--- !ruby/object:Gem::Specification
+name: davidrichards-data_frame
+version: !ruby/object:Gem::Version
+  version: 0.0.3
+platform: ruby
+authors:
+- David Richards
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2009-07-23 00:00:00 -07:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: active_support
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0"
+    version:
+- !ruby/object:Gem::Dependency
+  name: davidrichards-just_enumerable_stats
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0"
+    version:
+- !ruby/object:Gem::Dependency
+  name: faster_csv
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0"
+    version:
+description: Data Frames with memoized transpose
+email: davidlamontrichards@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- README.rdoc
+- VERSION.yml
+- lib/data_frame
+- lib/data_frame/callback_array.rb
+- lib/data_frame/transposable_array.rb
+- lib/data_frame.rb
+- lib/ext
+- lib/ext/string.rb
+- lib/ext/symbol.rb
+- spec/data_frame
+- spec/data_frame/callback_array_spec.rb
+- spec/data_frame/transposable_array_spec.rb
+- spec/data_frame_spec.rb
+- spec/spec_helper.rb
+has_rdoc: true
+homepage: http://github.com/davidrichards/data_frame
+post_install_message:
+rdoc_options:
+- --inline-source
+- --charset=UTF-8
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements: []
+rubyforge_project:
+rubygems_version: 1.2.0
+signing_key:
+specification_version: 2
+summary: Data Frames with memoized transpose
+test_files: []