RubyGems - object_table - Versions diffs - 0.3.4 → 0.4.0 - Mend

object_table 0.3.4 → 0.4.0


Files changed (40)

checksums.yaml +4 -4
data/.travis.yml +0 -1
data/README.md +206 -108
data/lib/object_table/basic_grid.rb +1 -1
data/lib/object_table/column.rb +6 -7
data/lib/object_table/factory.rb +46 -0
data/lib/object_table/grouping/grid.rb +47 -0
data/lib/object_table/grouping.rb +109 -0
data/lib/object_table/joining.rb +71 -0
data/lib/object_table/masked_column.rb +2 -2
data/lib/object_table/printing.rb +69 -0
data/lib/object_table/stacking.rb +66 -0
data/lib/object_table/static_view.rb +2 -5
data/lib/object_table/table_methods.rb +35 -22
data/lib/object_table/util.rb +19 -0
data/lib/object_table/version.rb +1 -1
data/lib/object_table/view.rb +7 -5
data/lib/object_table/view_methods.rb +3 -2
data/lib/object_table.rb +8 -19
data/object_table.gemspec +2 -0
data/spec/object_table/column_spec.rb +2 -2
data/spec/object_table/grouping_spec.rb +475 -0
data/spec/object_table/static_view_spec.rb +2 -2
data/spec/object_table/util_spec.rb +43 -0
data/spec/object_table/view_spec.rb +6 -16
data/spec/object_table_spec.rb +45 -3
data/spec/subclassing_spec.rb +44 -5
data/spec/support/joining_example.rb +171 -0
data/spec/support/object_table_example.rb +124 -29
data/spec/support/stacking_example.rb +111 -0
data/spec/support/utils.rb +8 -0
data/spec/support/view_example.rb +10 -13
metadata +20 -12
data/lib/object_table/group.rb +0 -10
data/lib/object_table/grouped.rb +0 -93
data/lib/object_table/printable.rb +0 -72
data/lib/object_table/stacker.rb +0 -59
data/lib/object_table/table_child.rb +0 -19
data/spec/object_table/grouped_spec.rb +0 -351
data/spec/support/stacker_example.rb +0 -158

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 689d1d35ab1e6a33a345241e866169c7ddcf40fe
-  data.tar.gz: 11b9d772d9ddf5e2f32feb0b7250910f36be3007
+  metadata.gz: 02f7e1642a2f1f8106f32e3b04f8cfebf8676d52
+  data.tar.gz: 0986af221eab29e6654f511cebd5534962d82412
 SHA512:
-  metadata.gz: f93998847f29f3926d9d956098b3f6b29f6392c14bfa15b1300465d8f9d45cb05876e6a45a9db4e073ef3a10884f3c9c6f00d6365ef3dfff569eea4a13eb41f6
-  data.tar.gz: 669f9ed3791dcc71a29480a1be3073d700ae8bcba306fae78e4596da805f7b2c31830e23d43640f1d3b2360379194250a527cb366a6553fcd1253b5a4dd52d26
+  metadata.gz: d8ee1c4350da59156c2f17e23431969d126a5f8bac0a21348d048fb18486c765d1fe6f4f551a53ef18e03a9642e0704da93e997bb16c2d700003f5cde4f64079
+  data.tar.gz: d15636d515d1d5439f5233e58738b5ffdb273a4df77000a4fbdc5ba6ff5018e16e382eb487c6aa0d27e6a9b390e1d6b551a3010ca843123a227c4ccbdee0278d

data/.travis.yml CHANGED

@@ -1,6 +1,5 @@
 language: ruby
 rvm:
-  - 1.9.3
   - 2.0.0
   - 2.1.0
   - 2.2.0

data/README.md CHANGED

@@ -1,25 +1,17 @@
 ruby-object-table
 =================
-[![Gem Version][GV img]][Gem Version]
-[![Build Status][BS img]][Build Status]
-[![Code Climate][CC img]][Code Climate]
-[![Coverage Status][CS img]][Coverage Status]
-[Gem Version]: https://rubygems.org/gems/object_table
-[Build Status]: https://travis-ci.org/lincheney/ruby-object-table
-[Code Climate]: https://codeclimate.com/github/lincheney/ruby-object-table
-[Coverage Status]: https://coveralls.io/r/lincheney/ruby-object-table
-[GV img]: https://badge.fury.io/rb/object_table.png
-[BS img]: https://travis-ci.org/lincheney/ruby-object-table.png
-[CC img]: https://codeclimate.com/github/lincheney/ruby-object-table.png
-[CS img]: https://coveralls.io/repos/lincheney/ruby-object-table/badge.png?branch=master
-Simple data table/frame implementation in ruby
+[![Gem Version](https://badge.fury.io/rb/object_table.svg)](http://badge.fury.io/rb/object_table)
+[![Build Status](https://travis-ci.org/lincheney/ruby-object-table.svg?branch=master)](https://travis-ci.org/lincheney/ruby-object-table)
+[![Code Climate](https://codeclimate.com/github/lincheney/ruby-object-table/badges/gpa.svg)](https://codeclimate.com/github/lincheney/ruby-object-table)
+[![Coverage Status](https://coveralls.io/repos/lincheney/ruby-object-table/badge.svg?branch=master)](https://coveralls.io/r/lincheney/ruby-object-table?branch=master)
+Simple data table/frame implementation in ruby.
 Probably slow and extremely inefficient, but it works and that's all that matters.
 Uses NArrays (https://github.com/masa16/narray) for storing data.
+Be sure to check out the [release notes](https://github.com/lincheney/ruby-object-table/releases).
 ## Creating a table
 Just pass a hash of columns into the constructor.
@@ -69,7 +61,7 @@ Otherwise the scalars are extended to match the length of the vector columns
 - `#nrows` returns the number of rows
 - `#colnames` returns an array of the column names
 - `#clone` make a copy of the table
-- `#stack(table1, table2, ...)` appends then supplied tables
+- `#stack(table1, table2, ...)` appends the supplied tables
 - `#apply(&block)` evaluates `block` in the context of the table
 - `#where(&block)` filters the table
 - `#group_by(&block)` splits the table into groups
@@ -369,104 +361,223 @@ If you want to filter a table and keep that data (i.e. without it syncing with t
 ## Grouping (and aggregating)
 Use the `#group_by` method and pass column names or a block that returns grouping keys.
-Then call `#each` to iterate through the groups or `#apply` to aggregate the results.
-The argument to `#group_by` should be a hash mapping key name => key. See the below example.
+```ruby
+# group by column_1
+>>> data.group_by(:column_1)
+# or group by a dynamically calculated value
+# note the double braces: the outer pair is the block, the inner pair is a hash
+>>> data.group_by{{ key: column_1.round }}
+```
+This gives you an `ObjectTable::Grouping`.
+There are two ways to perform aggregation with a grouping: using `apply`/`each` or using `reduce`.
+Using `apply`/`each` is the most flexible and powerful.
+It iterates through each group and calls a supplied block for each group.
+`reduce` instead iterates through each *row* and keeps track of which group the row belongs to.
+It can only be used with [online algorithms](http://en.wikipedia.org/wiki/Online_algorithm)
+but can be much faster if there is a large number of groups (relative to the number of rows).
+### Using `apply`/`each`
+`each` enumerates through the groups.
+`apply` is similar to doing `grouping.each.map` but instead of collecting results in an `Array`
+the results are stacked into a new table.
 ```ruby
->>> data = ObjectTable.new(name: ['John', 'Tom', 'John', 'Tom', 'Jim'], value: 1..5)
- => ObjectTable(5, 2)
-         name  value
-  0:   "John"      1
-  1:    "Tom"      2
-  2:   "John"      3
-  3:    "Tom"      4
-  4:    "Jim"      5
-         name  value
-# group by the name and get the no. of rows in each group
->>> num_rows = []
->>> data.group_by(:name).each{ num_rows.push(nrows) }
->>> num_rows
- => [2, 2, 1]
-# or group with a block
->>> num_rows = []
-# let's group by initial letter of the name
->>> data.group_by{ {initial: name.map{|n| n[0]}} }.each{ num_rows.push(nrows) }
->>> num_rows
- => [3, 2]
+# let's create some data
+>>> data = ObjectTable.new(col1: 1..10, col2: (1..20).step(2).to_a)
+  => ObjectTable(10, 2)
+       col1  col2
+  0:      1     1
+  1:      2     3
+  2:      3     5
+  3:      4     7
+  4:      5     9
+  5:      6    11
+  6:      7    13
+  7:      8    15
+  8:      9    17
+  9:     10    19
+       col1  col2
+# print sum of col2 for col1 remainder 3
+>>> data.group_by{{ rem: col1 % 3 }}.each{ p col2.sum }; nil
+40
+27
+33
+# which sum is which group?
+# we can access the group keys through @K
+>>> data.group_by{{ rem: col1 % 3 }}.each{ p [@K.rem, col2.sum] }; nil
+[1, 40]
+[2, 27]
+[0, 33]
+# collect results into an array
+# note that we need an argument to the map block
+>>> data.group_by{{ rem: col1 % 3 }}.each.map{|grp| [grp.K.rem, grp.col2.sum] }
+ => [[1, 40], [2, 27], [0, 33]]
+# collect the results into a new table using apply()
+>>> data.group_by{{ rem: col1 % 3 }}.apply{ col2.sum }
+ => ObjectTable(3, 2)
+       rem  v_0
+  0:     1   40
+  1:     2   27
+  2:     0   33
+       rem  v_0
+# aggregated columns are given default names of v_0, v_1, etc.
+# let's set the names ourselves
+>>> data.group_by{{ rem: col1 % 3 }}.apply{ @R[sum: col2.sum] }
+ => ObjectTable(3, 2)
+       rem  sum
+  0:     1   40
+  1:     2   27
+  2:     0   33
+       rem  sum
 ```
-The group keys are accessible through the `@K` shortcut
+We can also assign new columns based on the group (you cannot do this with `reduce`).
 ```ruby
->>> data = ObjectTable.new(name: ['John', 'Tom', 'John', 'Tom', 'Jim'], value: 1..5)
->>> data.group_by(:name).each{ p @K }
-{:name=>"John"}
-{:name=>"Tom"}
-{:name=>"Jim"}
-# or if you are using a block with args
->>> data.group_by(:name).each{|grp| p grp.K }
-{:name=>"John"}
-{:name=>"Tom"}
-{:name=>"Jim"}
+>>> data.group_by{{ rem: col1 % 3 }}.each{ self[:sum] = col2.sum }
+>>> data
+ => ObjectTable(10, 3)
+       col1  col2  sum
+  0:      1     1   40
+  1:      2     3   27
+  2:      3     5   33
+  3:      4     7   40
+  4:      5     9   27
+  5:      6    11   33
+  6:      7    13   40
+  7:      8    15   27
+  8:      9    17   33
+  9:     10    19   40
+       col1  col2  sum
 ```
+### Using `reduce`
-### Aggregation
+`reduce` returns a new table like `apply`
+(and there is no equivalent for `each`, i.e. iterating through groups).
-Call `#apply` and the results are stored into a table.
+Pass a block to `reduce`; you will have access to the `@R` variable
+which is a group-specific hash where you can accumulate results.
+See the examples below.
 ```ruby
->>> data = ObjectTable.new(name: ['John', 'Tom', 'John', 'Tom', 'Jim'], value: 1..5)
->>> data.group_by(:name).apply{ value.mean }
+# sum of column 2
+>>> data.group_by{{ rem: col1 % 3 }}.reduce{ @R[:sum] += col2 }
+ => ObjectTable(3, 2)
+       rem  sum
+  0:     1   40
+  1:     2   27
+  2:     0   33
+       rem  sum
+# we can supply initial values, e.g. if we wish to calculate product
+>>> data.group_by{{ rem: col1 % 3 }}.reduce(prod: 1){ @R[:prod] *= col2 }
  => ObjectTable(3, 2)
-         name  v_0
-  0:   "John"  2.0
-  1:    "Tom"  3.0
-  2:    "Jim"  5.0
-         name  v_0
+       rem  prod
+  0:     1  1729
+  1:     2   405
+  2:     0   935
+       rem  prod
+```
+You should avoid `reduce` unless your aggregating operation is simple
+and you have a relatively large number of groups
+(`reduce` is slower than `apply` with few groups).
+### Comparison of `apply` and `reduce`
+The `reduce` version is more complicated because we must implement the
+online algorithm ourselves.
+#### Sum
+```ruby
+>>> data.group_by{{ rem: col1 % 3 }}.apply{ @R[sum: col2.sum] }
+>>> data.group_by{{ rem: col1 % 3 }}.reduce{ @R[:sum] += col2 }
 ```
-Normally you can only have one aggregated column with a default name of v_0.
-You can have more columns and set column names by making a `ObjectTable` or using the @R shortcut.
+#### Product
 ```ruby
->>> data.group_by(:name).apply{ @R[ mean: value.mean, sum: value.sum] }
- => ObjectTable(3, 3)
-         name  mean  sum
-  0:   "John"   2.0    4
-  1:    "Tom"   3.0    6
-  2:    "Jim"   5.0    5
-         name  mean  sum
-# or if you are using a block with args
->>> data.group_by(:name).apply{|grp| grp.R[ mean: grp.value.mean, sum: grp.value.sum] }
- => ObjectTable(3, 3)
-         name  mean  sum
-  0:   "John"   2.0    4
-  1:    "Tom"   3.0    6
-  2:    "Jim"   5.0    5
-         name  mean  sum
+>>> data.group_by{{ rem: col1 % 3 }}.apply{ @R[prod: col2.prod] }
+>>> data.group_by{{ rem: col1 % 3 }}.reduce(prod: 1){ @R[:prod] *= col2 }
+```
+#### Variance
+Online algorithm for variance taken from:
+http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
+```ruby
+>>> data.group_by{{ rem: col1 % 3 }}.apply{ @R[var: col2.stddev**2] }
+>>> data.group_by{{ rem: col1 % 3 }}.reduce(n: 0, mean: 0.0, m2: 0) do
+      @R[:n] += 1
+      delta = col2 - @R[:mean]
+      @R[:mean] += delta / @R[:n]
+      @R[:m2] += delta * (col2 - @R[:mean])
+    end.apply{ @R[rem: rem, variance: m2 / (n - 1)] }
 ```
-### Assigning to columns
+## Joining
-Assigning to columns will assign by group.
+Note the current joining algorithm is quite slow.
 ```ruby
-# every row with the same name will get the same group_values
->>> data.group_by(:name).each{|grp| grp[:group_values] = grp.value.to_a.join(',') }
+# let's create some data
+>>> left = ObjectTable.new( key: [1, 2, 3, 5, 7], val_1: 1..5 )
+>>> right = ObjectTable.new( key: [2, 3, 4, 5], val_2: 'a'..'d')
+# inner join
+>>> left.join(right, :key)
+  => ObjectTable(3, 3)
+       key  val_1  val_2
+  0:     2      2    "a"
+  1:     3      3    "b"
+  2:     5      4    "d"
+       key  val_1  val_2
+# left join
+>>> left.join(right, :key, type: 'left')
  => ObjectTable(5, 3)
-         name  value  group_values
-  0:   "John"      1         "1,3"
-  1:    "Tom"      2         "2,4"
-  2:   "John"      3         "1,3"
-  3:    "Tom"      4         "2,4"
-  4:    "Jim"      5           "5"
-         name  value  group_values
+       key  val_1  val_2
+  0:     1      1    nil
+  1:     2      2    "a"
+  2:     3      3    "b"
+  3:     5      4    "d"
+  4:     7      5    nil
+       key  val_1  val_2
+# right join
+>>> left.join(right, :key, type: 'right')
+ => ObjectTable(4, 3)
+       key  val_1  val_2
+  0:     2      2    "a"
+  1:     3      3    "b"
+  2:     5      4    "d"
+  3:     4      0    "c"
+       key  val_1  val_2
+# outer join
+>>> left.join(right, :key, type: 'outer')
+ => ObjectTable(6, 3)
+       key  val_1  val_2
+  0:     1      1    nil
+  1:     2      2    "a"
+  2:     3      3    "b"
+  3:     5      4    "d"
+  4:     7      5    nil
+  5:     4      0    "c"
+       key  val_1  val_2
 ```
 ## Subclassing ObjectTable
@@ -491,8 +602,8 @@ The act of subclassing itself is easy, but any methods you add won't be availabl
 NoMethodError: undefined method `a_plus_b' for #<ObjectTable::View:0x000000011d4dd0>
 ```
-To make it work, you'll need to subclass `View`, `StaticView` and `Group` too and assign those subclasses under your ObjectTable subclass.
-The easiest way is just to include a module with your common methods.
+The easiest way to make it work is to put your methods into a mixin
+and use the `fully_include` class method.
 ```ruby
 >>> class WorkingTable < ObjectTable
@@ -502,12 +613,7 @@ The easiest way is just to include a module with your common methods.
         end
       end
-      include Mixin
-      # subclass each of these and include the Mixin too
-      class StaticView < StaticView; include Mixin; end
-      class View < View; include Mixin; end
-      class Group < Group; include Mixin; end
+      fully_include Mixin
     end
 ...
@@ -518,15 +624,7 @@ The easiest way is just to include a module with your common methods.
 # hurrah!
 >>> data.where{ a > 1 }.a_plus_b
- => NArray.int(2):
+ => ObjectTable::MaskedColumn.int(2):
 [ 7, 9 ]
-# also works in groups!
->>> data.group_by{{odd: a % 2}}.each do
-      p "when a % 2 == #{@K[:odd]}, a + b == #{a_plus_b.to_a}"
-    end
-...
-"when a % 2 == 1, a + b == [5, 9]"
-"when a % 2 == 0, a + b == [7]"
 ```
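
The new README text contrasts per-group aggregation (`apply`/`each`) with per-row accumulation (`reduce`). The difference can be sketched in plain Ruby without `object_table` or NArray. Everything below (`ROWS`, `sum_by_group`, `online_variance_by_group`) is a hypothetical stand-in using hashes and arrays, not the gem's API. It assumes Ruby 2.4 or later for `Array#sum`.

```ruby
# Same data as the README example: col1 = 1..10, col2 = 1, 3, ..., 19.
ROWS = (1..10).map { |i| { col1: i, col2: 2 * i - 1 } }.freeze

# "apply" style: split the rows into groups first, then aggregate each whole group.
def sum_by_group(rows)
  rows.group_by { |r| r[:col1] % 3 }
      .map { |rem, grp| [rem, grp.sum { |r| r[:col2] }] }
      .to_h
end

# "reduce" style: one pass over the rows, updating a per-group accumulator.
# This is the online variance update that the README's reduce example uses.
def online_variance_by_group(rows)
  acc = Hash.new { |h, k| h[k] = { n: 0, mean: 0.0, m2: 0.0 } }
  rows.each do |r|
    a = acc[r[:col1] % 3]
    a[:n] += 1
    delta = r[:col2] - a[:mean]
    a[:mean] += delta / a[:n]
    a[:m2] += delta * (r[:col2] - a[:mean])
  end
  # Sample variance (divides by n - 1), as in the README's final apply step.
  acc.map { |rem, a| [rem, a[:m2] / (a[:n] - 1)] }.to_h
end

p sum_by_group(ROWS)
p online_variance_by_group(ROWS)
```

The per-row version never materialises a group, which is why it only suits aggregations that can be updated one value at a time.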

data/lib/object_table/basic_grid.rb CHANGED

@@ -8,7 +8,7 @@ class ObjectTable::BasicGrid < Hash
   def _get_number_rows!
     each{|k, v| self[k] = v.to_a if v.is_a?(Range)}
-    rows = map{|k, v| ObjectTable::Column.length_of(v) rescue nil}.compact.uniq
+    rows = map{|k, v| ObjectTable::Column.length_of(v)}.compact.uniq
   end
   def _ensure_uniform_columns!(rows = nil)

data/lib/object_table/column.rb CHANGED

@@ -4,17 +4,16 @@ module ObjectTable::Column
   def self.length_of(array)
     case array
-    when Array
-      array.length
-    when NArray
-      array.shape.last or 0
-    else
-      raise "Expected Array or NArray, got #{array}"
+    when Array then array.length
+    when NArray then (array.shape.last or 0)
+    else nil
     end
   end
-  def self.stack(*columns)
+  def self.stack(*columns); _stack(columns); end
+  def self._stack(columns)
     columns = columns.reject(&:empty?)
     return NArray[] if columns.empty?
     return columns[0].clone if columns.length == 1
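
The `length_of` change above (returning `nil` instead of raising) pairs with the `basic_grid.rb` change that drops `rescue nil` and relies on `compact`. A minimal sketch of that interaction follows. It is a simplified stand-in that handles only `Array` (the method in the diff also handles `NArray`), and `row_counts` is a hypothetical helper name.

```ruby
# Simplified stand-in for Column.length_of: Arrays have a length,
# anything else (for example a scalar) reports nil rather than raising.
def length_of(value)
  case value
  when Array then value.length
  else nil
  end
end

# Distinct lengths among the vector-like columns.
# Scalars yield nil and are dropped by compact, so no rescue is needed.
def row_counts(columns)
  columns.map { |_name, v| length_of(v) }.compact.uniq
end

p row_counts(a: [1, 2, 3], b: 5, c: [4, 5, 6])
```

More than one distinct length in the result would indicate columns that disagree on the number of rows.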

data/lib/object_table/factory.rb ADDED

@@ -0,0 +1,46 @@
+require 'forwardable'
+module ObjectTable::Factory
+  CLASS_MAP = {
+    '__static_view_cls__' => 'StaticView',
+    '__view_cls__'        => 'View',
+    '__group_cls__'       => 'Group',
+    }.freeze
+  FACTORIES = (CLASS_MAP.keys + ['__table_cls__']).freeze
+  module ClassMethods
+    CLASS_MAP.each do |name, const|
+      eval "def #{name}; self::#{const}; end"
+    end
+    def __table_cls__
+      self
+    end
+    def fully_include(mixin)
+      include(mixin)
+      constants = constants(false)
+      CLASS_MAP.each do |name, const|
+        child_cls = send(name)
+        # create a new subclass if there isn't already one
+        child_cls = const_set(const, Class.new(child_cls)) unless constants.include?(child_cls)
+        child_cls.send(:include, mixin)
+      end
+    end
+  end
+  extend Forwardable
+  def_delegators 'self.class', *FACTORIES
+  def self.included(base)
+    base.extend(ClassMethods)
+  end
+  module SubFactory
+    FACTORIES.each do |name|
+      eval "def #{name}; @#{name} ||= @parent.#{name}; end"
+    end
+  end
+end
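
The `fully_include` idea in this file (include a mixin in the table class and in each nested helper class, creating a subclass-specific helper where needed) can be sketched in isolation. `BaseTable`, `CHILD_CLASSES`, `Greeting` and `MyTable` below are hypothetical names, and only one nested class is modelled. This sketch checks for an existing nested class by its constant name as a Symbol, since `Module#constants` returns Symbols. That is an assumption about the intent, not a copy of the gem's code.

```ruby
# Hypothetical stand-ins for ObjectTable and one of its nested classes.
class BaseTable
  class View; end

  # Names of the nested classes that should also receive the mixin.
  CHILD_CLASSES = [:View].freeze

  def self.fully_include(mixin)
    include(mixin)
    own_constants = constants(false) # Symbols defined directly on this class
    CHILD_CLASSES.each do |name|
      child =
        if own_constants.include?(name)
          const_get(name, false)
        else
          # No class-specific child yet: derive one from the inherited class
          # so the parent's nested class is left untouched.
          const_set(name, Class.new(const_get(name)))
        end
      child.send(:include, mixin)
    end
  end
end

module Greeting
  def greet
    "hello from #{self.class.name}"
  end
end

class MyTable < BaseTable
  fully_include Greeting
end

puts MyTable.new.greet
puts MyTable::View.new.greet
```

Creating `MyTable::View` as a subclass keeps `BaseTable::View` free of the mixin, so other subclasses of `BaseTable` are unaffected.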

data/lib/object_table/grouping/grid.rb ADDED

@@ -0,0 +1,47 @@
+require_relative '../util'
+class ObjectTable::Grouping
+  class Grid
+    attr_reader :values, :index
+    def initialize(keys, defaults)
+      unless defaults.is_a?(Hash)
+        raise "Expected defaults to be a hash, got: #{defaults.inspect}"
+      end
+      defaults.default = 0
+      @defaults = defaults
+      @values = {}
+      @index = {}
+      @ids = keys.map{|k| @index[k] ||= @index.length}
+      @keys = keys
+      @length = @index.length
+    end
+    def [](k)
+      (@values[k] ||= Array.new(@length, @defaults[k]))[@id]
+    end
+    def []=(k, v)
+      @values[k][@id] = v
+    end
+    module RowFactory
+      def self.new(*args)
+        Struct.new(*args){ attr_accessor :K, :R }
+      end
+    end
+    def apply_to_rows(rows, key_struct, block)
+      @ids.zip(@keys, rows) do |id, key, row|
+        @id = id
+        row.K = key_struct.new(*key)
+        row.R = self
+        ObjectTable::Util.apply_block(row, block)
+      end
+    end
+  end
+end
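
The core of `Grid` is mapping each row's key to a dense group id and keeping one array slot per group for every accumulated name. A reduced sketch of that bookkeeping follows. `TinyGrid` and `Slot` are hypothetical, and rows here are plain values rather than the gem's row structs.

```ruby
# Hypothetical, simplified accumulator grid: one slot per distinct key.
class TinyGrid
  attr_reader :index, :values

  def initialize(keys, defaults = {})
    @defaults = defaults.dup
    @defaults.default = 0 # names without an explicit default start at 0
    @index = {}
    # Map each row's key to a dense group id, in order of first appearance.
    @ids = keys.map { |k| @index[k] ||= @index.length }
    @values = {}
  end

  # Lazily create the per-group array for an accumulated name.
  def column(name)
    @values[name] ||= Array.new(@index.length, @defaults[name])
  end

  # Yield each row together with an accumulator bound to that row's group.
  def each_row(rows)
    @ids.zip(rows) { |id, row| yield row, Slot.new(self, id) }
  end

  Slot = Struct.new(:grid, :id) do
    def [](name)
      grid.column(name)[id]
    end

    def []=(name, value)
      grid.column(name)[id] = value
    end
  end
end

col1 = (1..10).to_a
col2 = col1.map { |i| 2 * i - 1 }
grid = TinyGrid.new(col1.map { |v| [v % 3] }, prod: 1)
grid.each_row(col2) do |v, r|
  r[:sum] += v
  r[:prod] *= v
end
p grid.index.keys # group keys in first-seen order
p grid.values     # one accumulated value per group and name
```

Because every accumulator is an array indexed by group id, the result can be turned into table columns directly, which matches how `reduce` builds its output from `grid.index` and `grid.values`.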

data/lib/object_table/grouping.rb ADDED

@@ -0,0 +1,109 @@
+require_relative 'factory'
+require_relative 'util'
+require_relative 'static_view'
+require_relative 'grouping/grid'
+class ObjectTable
+  class Group < StaticView
+    attr_reader :K
+    def initialize(parent, keys, value)
+      super(parent, value)
+      @K = keys
+    end
+  end
+  class Grouping
+    DEFAULT_VALUE_PREFIX = 'v_'.freeze
+    include Factory::SubFactory
+    def initialize(parent, *columns, &grouper)
+      @parent = parent
+      @grouper = grouper
+      @columns = columns
+      @names = columns
+    end
+    def _keys
+      return Util.get_rows(@parent, @columns) unless @columns.empty?
+      keys = @parent.apply(&@grouper)
+      raise 'Group keys must be hashes' unless keys.is_a?(Hash)
+      keys = BasicGrid.new.replace keys
+      keys._ensure_uniform_columns!(@parent.nrows)
+      @names = keys.keys
+      keys.values.map(&:to_a).transpose
+    end
+    def each(&block)
+      groups = Util.group_indices(_keys)
+      return to_enum(:_make_groups, groups) unless block
+      _make_groups(groups){|grp| Util.apply_block(grp, block)}
+    end
+    def apply(&block)
+      groups = Util.group_indices(_keys)
+      return empty_aggregation if groups.empty?
+      value_key = self.class.generate_name(DEFAULT_VALUE_PREFIX, @names).to_sym
+      keys = []
+      data = groups.keys.zip(to_enum(:_make_groups, groups)).map do |key, group|
+        value = Util.apply_block(group, block)
+        case value
+        when TableMethods
+          nrows = value.nrows
+        when BasicGrid
+          nrows = value._ensure_uniform_columns!
+        else
+          nrows = (Column.length_of(value) or 1)
+          value = BasicGrid[value_key, value]
+        end
+        keys.concat( Array.new(nrows, key) )
+        value
+      end
+      keys = BasicGrid[@names.zip(keys.transpose)]
+      result = __table_cls__._stack(data)
+      __table_cls__.new(keys.merge!(result.columns))
+    end
+    def reduce(defaults={}, &block)
+      keys = _keys()
+      return empty_aggregation if keys.empty?
+      grid = Grid.new(keys, defaults)
+      rows = @parent.each_row(row_factory: Grid::RowFactory)
+      grid.apply_to_rows(rows, self.class.key_struct(@names), block)
+      keys = BasicGrid[@names.zip(grid.index.keys.transpose)]
+      __table_cls__.new(keys.merge!(grid.values))
+    end
+    def _make_groups(groups)
+      key_struct = self.class.key_struct(@names)
+      groups.each do |k, v|
+        yield __group_cls__.new(@parent, key_struct.new(*k), NArray.to_na(v))
+      end
+      @parent
+    end
+    def self.generate_name(prefix, names)
+      regex = Regexp.new(Regexp.quote(prefix) + '(\d+)')
+      i = names.map{|n| n =~ regex and $1.to_i}.compact.max || -1
+      "#{prefix}#{i + 1}"
+    end
+    def self.key_struct(names)
+      Struct.new(*names.map(&:to_sym))
+    end
+    def empty_aggregation
+      __table_cls__.new(@names.map{|n| [n, []]})
+    end
+  end
+end
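
`apply` names an unnamed aggregated column via `generate_name`, passing the group key names so that the default `v_N` name does not clash with a key already called `v_N`. The naming rule can be exercised on its own. `next_default_name` below is a hypothetical standalone rewrite that converts each name with `to_s` before matching.

```ruby
# Return the next unused "<prefix><n>" name given the existing column names.
def next_default_name(prefix, names)
  pattern = /#{Regexp.quote(prefix)}(\d+)/
  # Collect the numeric suffix of every name that contains the prefix.
  highest = names.map { |n| (m = pattern.match(n.to_s)) && m[1].to_i }.compact.max || -1
  "#{prefix}#{highest + 1}"
end

p next_default_name('v_', [:rem])
p next_default_name('v_', [:v_0, :v_3, :rem])
```

With no existing `v_N` names the counter starts at 0, which matches the `v_0` column shown in the README output.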