RubyGems - daru - Versions diffs - 0.2.1 → 0.2.2 - Mend

daru 0.2.1 → 0.2.2

Files changed (22)

checksums.yaml +5 -5
data/.gitignore +1 -0
data/.travis.yml +4 -1
data/CONTRIBUTING.md +9 -0
data/History.md +16 -1
data/README.md +22 -3
data/benchmarks/db_loading.rb +34 -0
data/daru.gemspec +4 -2
data/lib/daru/category.rb +13 -4
data/lib/daru/core/group_by.rb +40 -31
data/lib/daru/dataframe.rb +200 -54
data/lib/daru/index/index.rb +12 -11
data/lib/daru/index/multi_index.rb +8 -3
data/lib/daru/io/io.rb +5 -17
data/lib/daru/iruby/templates/dataframe.html.erb +1 -1
data/lib/daru/vector.rb +20 -6
data/lib/daru/version.rb +1 -1
data/spec/core/group_by_spec.rb +6 -1
data/spec/dataframe_spec.rb +110 -0
data/spec/index/index_spec.rb +26 -0
data/spec/index/multi_index_spec.rb +18 -0
metadata +10 -10

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: 87e4e2869fe6411e3eece92bb5dc24d48f890774
-  data.tar.gz: e711d0db1d57f51f31ccb7fb54078a6bdbcc4ff5
+SHA256:
+  metadata.gz: 617e082fd3366f695622071cf630690d102552821e82926af81a7007bb09093d
+  data.tar.gz: b6b995e35e8124768a15a3e32d1fc38515aecc55f070510d7a51b45945520eb7
 SHA512:
-  metadata.gz: afdb295d0d01542ba9f439cf5f7959d7f2a3b9e47de6047ecf7719548ef760e657c0dfe753ed16ee1da65e071bb5a182aaf03ee83c9de6075d54149753b9c346
-  data.tar.gz: e0c4ace661d9f1cb7e8040d424bb004a0b650a9605037d1aff258258bbac40a3c158e5f5b8a2a5c6a28070cf55566a0729ee9b77c8114d40d4d18cf9d26e69c3
+  metadata.gz: 8ae029cac761e4a7164b472ad6ef5275d18aa7d6dace9f61ab4553cc288a1d95e80412804e28a4405444a97dfd1e850ce00783d9b45126c0cb8e0c4dafa09e63
+  data.tar.gz: e8aa0aed6c05ec54ba4f5083ed3baa47b6eb02c3304bb0152beea1e57e181aafe45ebc779bbdec76b22b6c86b29f71dc16d013250870589cb93eeae0b8ca0917
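
Note: the hunk above replaces the SHA1 section of checksums.yaml with SHA256 and keeps SHA512. As a rough illustration of how hex digests of this shape can be produced, the sketch below uses Ruby's standard `Digest` library on an in-memory string. The `checksums_for` helper is hypothetical and is not claimed to be how RubyGems builds this file.

```ruby
require 'digest'

# Hypothetical helper: hex digests for a blob of bytes, keyed like the
# algorithm sections of checksums.yaml.
def checksums_for(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

sums = checksums_for('abc')
puts sums['SHA256'].length # prints 64  (256 bits as hex)
puts sums['SHA512'].length # prints 128 (512 bits as hex)
```

For a real package the input would be the bytes of metadata.gz or data.tar.gz, for example via `Digest::SHA256.file(path).hexdigest`.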

data/.gitignore CHANGED

@@ -6,3 +6,4 @@ doc/
 vendor/
 profile/out/
 coverage/
+.ruby-version

data/.travis.yml CHANGED

@@ -22,7 +22,10 @@ script:
   - bundle exec yard-junk
 install:
-  - gem install bundler
+  - if [ $TRAVIS_RUBY_VERSION == '2.2' ] || [ $TRAVIS_RUBY_VERSION == '2.1' ] || [ $TRAVIS_RUBY_VERSION == '2.0' ];
+      then gem install bundler -v '~> 1.6';
+      else gem install bundler;
+      fi
   - gem install rainbow -v '2.2.1'
   - bundle install

data/CONTRIBUTING.md CHANGED

@@ -22,12 +22,21 @@ And run the test suite (should be all green with pending tests):
 If you have problems installing nmatrix, please consult the [nmatrix installation wiki](https://github.com/SciRuby/nmatrix/wiki/Installation) or the [mailing list](https://groups.google.com/forum/#!forum/sciruby-dev).
+**NOTE**: `Daru` is compatible with Ruby versions < 2.5; for later Ruby versions it breaks, returning the following error in versions >= 2.5.
+```
+/gems/packable-1.3.10/lib/packable/extensions/io.rb:86:in `pos': Illegal seek @ rb_io_tell - <STDOUT> (Errno::ESPIPE)
+```
+To reproduce this issue or explore this error further, head over to
+[issue #500](https://github.com/SciRuby/daru/issues/500),
+[issue #503](https://github.com/SciRuby/daru/issues/503). Also, if you want to fix this issue, then please discuss it here : [#505](https://github.com/SciRuby/daru/issues/500)
 While preparing your pull requests, don't forget to check your code with Rubocop:
   `bundle exec rubocop`
 [Optional] Install all Ruby versions which Daru currently supports with `rake spec setup`.
 ## Basic Development Flow
 1. Create a new branch with `git checkout -b <branch_name>`.

data/History.md CHANGED

@@ -1,3 +1,18 @@
+# 0.2.2 (8 August 2019)
+* Minor Enhancements
+  - DataFrame#set_index can take column name array, which results in multi-index  https://github.com/SciRuby/daru/pull/471 (by @Yuki-Inoue)
+  - implements DataFrame#reset_index https://github.com/SciRuby/daru/pull/473  (by @Yuki-Inoue)
+  - Make DataFrame.from_activerecord faster https://github.com/SciRuby/daru/pull/464 (by @paisible-wanderer )
+  - Added access_row_tuples_by_indexs method https://github.com/SciRuby/daru/pull/463 (by @Prakriti-nith )
+* Fixes
+  - Fix reindex vector on argument error https://github.com/SciRuby/daru/pull/470 (by @Yuki-Inoue)
+  - Optimize aggregation https://github.com/SciRuby/daru/pull/464 (by @paisible-wanderer)
+  - Index#dup should copy reference to name too https://github.com/SciRuby/daru/pull/477 (by @Yuki-Inoue)
+  - Should support bundler version 2.x.x https://github.com/SciRuby/daru/pull/483/ (by @Shekharrajak )
+  - fix table style  https://github.com/SciRuby/daru/pull/489 (by @kojix2 )
 # 0.2.1 (02 July 2018)
 * Minor Enhancements
@@ -116,7 +131,7 @@
   - Support formatting empty dataframes. They were returning an error before. (@gnilrets)
   - method_missing in Daru::DataFrame would not detect the correct vector if it was a String. Fixed that. (@lokeshh)
   - Fix docs of contrast_code to specify that the default value is false. (@v0dro)
-  - Fix occurence of SystemStackError due to faulty arguement passing to Array#values_at. (@v0dro)
+  - Fix occurence of SystemStackError due to faulty argument passing to Array#values_at. (@v0dro)
   - Fix `DataFrame#pivot_table` regression that raised an ArgumentError if the `:index` option was not specified. (@zverok)
   - Fix `DateFrame.rows` to accept empty argument. (@zverok)
   - Fix bug with false values on dataframe create. DataFrame from an Array of hashes wasn't being created properly when some of the values were `false`. (@gnilrets)

data/README.md CHANGED

@@ -11,6 +11,25 @@ daru (Data Analysis in RUby) is a library for storage, analysis, manipulation an
 daru makes it easy and intuitive to process data predominantly through 2 data structures: `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby works with all ruby implementations. Tested with MRI 2.0, 2.1, 2.2, 2.3, and 2.4.
+## daru plugin gems
+- **[daru-view](https://github.com/SciRuby/daru-view)**
+daru-view is for easy and interactive plotting in web application & IRuby
+notebook. It can work in any Ruby web application frameworks like Rails, Sinatra, Nanoc and hopefully in others too.
+Articles/Blogs, that summarize powerful features of daru-view:
+* [GSoC 2017 daru-view](http://sciruby.com/blog/2017/09/01/gsoc-2017-data-visualization-using-daru-view/)
+* [GSoC 2018 Progress Report](https://github.com/SciRuby/daru-view/wiki/GSoC-2018---Progress-Report)
+* [HighCharts Official blog post regarding daru-view](https://www.highcharts.com/blog/post/i-am-ruby-developer-how-can-i-use-highcharts/)
+- **[daru-io](https://github.com/SciRuby/daru-io)**
+This gem extends support for many Import and Export methods of `Daru::DataFrame`. This gem is intended to help Rubyists who are into Data Analysis or Web Development, by serving as a general purpose conversion library that takes input in one format (say, JSON) and converts it another format (say, Avro) while also making it incredibly easy to getting started on analyzing data with daru. One can read more in [SciRuby/blog/daru-io](http://sciruby.com/blog/2017/08/29/gsoc-2017-support-to-import-export-of-more-formats/).
 ## Features
 * Data structures:
@@ -83,9 +102,9 @@ $ gem install daru
 ### Categorical Data
-* [Categorical Index](http://lokeshh.github.io/blog/2016/06/14/categorical-index/)
-* [Categorical Data](http://lokeshh.github.io/blog/2016/06/21/categorical-data/)
-* [Visualization with Categorical Data](http://lokeshh.github.io/blog/2016/07/02/visualization/)
+* [Categorical Index](http://lokeshh.github.io/gsoc2016/blog/2016/06/14/categorical-index/)
+* [Categorical Data](http://lokeshh.github.io/gsoc2016/blog/2016/06/21/categorical-data/)
+* [Visualization with Categorical Data](http://lokeshh.github.io/gsoc2016/blog/2016/07/02/visualization/)
 ## Basic Usage

data/benchmarks/db_loading.rb ADDED

@@ -0,0 +1,34 @@
+$:.unshift File.expand_path("../../lib", __FILE__)
+require 'benchmark'
+require 'daru'
+require 'sqlite3'
+require 'dbi'
+require 'active_record'
+db_name = 'daru_test.sqlite'
+FileUtils.rm(db_name) if File.file?(db_name)
+SQLite3::Database.new(db_name).tap do |db|
+  db.execute "create table accounts(id integer, name varchar, age integer, primary key(id))"
+  values = 1.upto(100_000).map { |i| %!(#{i},"name_#{i}",#{rand(100)})! }.join(",")
+  db.execute "insert into accounts values #{values}"
+end
+ActiveRecord::Base.establish_connection("sqlite3:#{db_name}")
+ActiveRecord::Base.connection
+class Account < ActiveRecord::Base; end
+Benchmark.bm do |x|
+  x.report("DataFrame.from_sql") do
+    Daru::DataFrame.from_sql(ActiveRecord::Base.connection, "SELECT * FROM accounts")
+  end
+  x.report("DataFrame.from_activerecord") do
+    Daru::DataFrame.from_activerecord(Account.all)
+  end
+end
+FileUtils.rm(db_name)

data/daru.gemspec CHANGED

@@ -33,7 +33,7 @@ Gem::Specification.new do |spec|
   spec.add_runtime_dependency 'packable', '~> 1.3.9'
   spec.add_development_dependency 'spreadsheet', '~> 1.1.1'
-  spec.add_development_dependency 'bundler', '~> 1.10'
+  spec.add_development_dependency 'bundler', '>= 1.10'
   spec.add_development_dependency 'rake', '~>10.5'
   spec.add_development_dependency 'pry', '~> 0.10'
   spec.add_development_dependency 'pry-byebug'
@@ -49,7 +49,9 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency 'dbi'
   spec.add_development_dependency 'activerecord', '~> 4.0'
   spec.add_development_dependency 'mechanize'
-  spec.add_development_dependency 'sqlite3'
+  # issue : https://github.com/SciRuby/daru/issues/493 occured
+  # with latest version of sqlite3
+  spec.add_development_dependency  'sqlite3', '~> 1.3.13'
   spec.add_development_dependency 'rubocop', '~> 0.49.0'
   spec.add_development_dependency 'ruby-prof'
   spec.add_development_dependency 'simplecov'
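
Note: the practical effect of the two constraint changes above can be checked with `Gem::Requirement`, which ships with RubyGems. This is a small sketch; the concrete version numbers `2.0.2` and `1.4.0` are illustrative assumptions and are not taken from the diff.

```ruby
require 'rubygems'

old_bundler = Gem::Requirement.new('~> 1.10')   # means >= 1.10 and < 2.0
new_bundler = Gem::Requirement.new('>= 1.10')   # no upper bound
sqlite_pin  = Gem::Requirement.new('~> 1.3.13') # means >= 1.3.13 and < 1.4

bundler_two = Gem::Version.new('2.0.2')

puts old_bundler.satisfied_by?(bundler_two)                # prints false
puts new_bundler.satisfied_by?(bundler_two)                # prints true
puts sqlite_pin.satisfied_by?(Gem::Version.new('1.3.13'))  # prints true
puts sqlite_pin.satisfied_by?(Gem::Version.new('1.4.0'))   # prints false
```

The pessimistic operator `~>` fixes every segment except the last one, which is why relaxing bundler to `>=` admits 2.x while the sqlite3 pin stays within the 1.3 series.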

data/lib/daru/category.rb CHANGED

@@ -74,6 +74,13 @@ module Daru
       end
     end
+    # this method is overwritten: see Daru::Category#plotting_library=
+    def plot(*args, **options, &b)
+      init_plotting_library
+      plot(*args, **options, &b)
+    end
     alias_method :rename, :name=
     # Returns an enumerator that enumerates on categorical data
@@ -174,7 +181,7 @@ module Daru
     # Returns vector for indexes/positions specified
     # @param [Array] indexes for which values has to be retrived
     # @note Since it accepts both indexes and postions. In case of collision,
-    #   arguement will be treated as index
+    #   argument will be treated as index
     # @return vector containing values specified at specified indexes/positions
     # @example
     #   dv = Daru::Vector.new [:a, 1, :a, 1, :c],
@@ -748,6 +755,11 @@ module Daru
     private
+    # Will lazily load the plotting library being used
+    def init_plotting_library
+      self.plotting_library = Daru.plotting_library
+    end
     def validate_categories input_categories
       raise ArgumentError, 'Input categories and speculated categories mismatch' unless
         (categories - input_categories).empty?
@@ -768,9 +780,6 @@ module Daru
       # To link every instance to its category,
       # it stores integer for every instance representing its category
       @array = map_cat_int.values_at(*data)
-      # Include plotting functionality
-      self.plotting_library = Daru.plotting_library
     end
     def category_from_position position
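
Note: the new `plot` method above looks recursive, but the comment in the diff says it is overwritten through `plotting_library=`. The sketch below shows one way such a placeholder can work in plain Ruby, assuming the setter extends the instance with a module that defines its own `plot`. `FakePlotting` and `Series` are hypothetical names, not daru classes.

```ruby
# Hypothetical stand-in for a plotting backend module.
module FakePlotting
  def plot(*args)
    "plotted #{args.inspect}"
  end
end

class Series
  attr_reader :backend_loads

  def initialize
    @backend_loads = 0 # nothing is loaded at construction time
  end

  # Placeholder: load the backend, then re-dispatch. After `extend`, the
  # module's #plot sits ahead of this method in the lookup chain, so the
  # inner call does not recurse into this method.
  def plot(*args)
    init_plotting_library
    plot(*args)
  end

  private

  def init_plotting_library
    @backend_loads += 1
    extend FakePlotting
  end
end

s = Series.new
puts s.backend_loads # prints 0
puts s.plot(1, 2)    # prints plotted [1, 2]
s.plot(3)
puts s.backend_loads # prints 1
```

Under this assumption the cost of wiring the backend is paid on the first `plot` call rather than for every object that is created.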

data/lib/daru/core/group_by.rb CHANGED

@@ -2,6 +2,7 @@ module Daru
   module Core
     class GroupBy
       class << self
+        # @private
         def get_positions_group_map_on(indexes_with_positions, sort: false)
           group_map = {}
@@ -17,6 +18,7 @@ module Daru
           group_map
         end
+        # @private
         def get_positions_group_for_aggregation(multi_index, level=-1)
           raise unless multi_index.is_a?(Daru::MultiIndex)
@@ -26,16 +28,19 @@ module Daru
           get_positions_group_map_on(new_index.each_with_index)
         end
+        # @private
         def get_positions_group_map_for_df(df, group_by_keys, sort: true)
           indexes_with_positions = df[*group_by_keys].to_df.each_row.map(&:to_a).each_with_index
           get_positions_group_map_on(indexes_with_positions, sort: sort)
         end
+        # @private
         def group_map_from_positions_to_indexes(positions_group_map, index)
           positions_group_map.map { |k, positions| [k, positions.map { |pos| index.at(pos) }] }.to_h
         end
+        # @private
         def df_from_group_map(df, group_map, remaining_vectors, from_position: true)
           return nil if group_map == {}
@@ -52,7 +57,17 @@ module Daru
         end
       end
-      attr_reader :groups, :df
+      # lazy accessor/attr_reader for the attribute groups
+      def groups
+        @groups ||= GroupBy.group_map_from_positions_to_indexes(@groups_by_pos, @context.index)
+      end
+      alias :groups_by_idx :groups
+      # lazy accessor/attr_reader for the attribute df
+      def df
+        @df ||= GroupBy.df_from_group_map(@context, @groups_by_pos, @non_group_vectors)
+      end
+      alias :grouped_df :df
       # Iterate over each group created by group_by. A DataFrame is yielded in
       # block.
@@ -75,8 +90,11 @@ module Daru
       end
       def initialize context, names
+        @group_vectors     = names
         @non_group_vectors = context.vectors.to_a - names
-        @context = context
+        @context = context # TODO: maybe rename in @original_df or @grouped_db
         # FIXME: It feels like we don't want to sort here. Ruby's #group_by
         # never sorts:
         #
@@ -84,22 +102,14 @@ module Daru
         #   #  => {4=>["test"], 2=>["me"], 6=>["please"]}
         #
         # - zverok, 2016-09-12
-        positions_groups = GroupBy.get_positions_group_map_for_df(@context, names, sort: true)
-        @groups = GroupBy.group_map_from_positions_to_indexes(positions_groups, @context.index)
-        @df     = GroupBy.df_from_group_map(@context, positions_groups, @non_group_vectors)
+        @groups_by_pos = GroupBy.get_positions_group_map_for_df(@context, @group_vectors, sort: true)
       end
       # Get a Daru::Vector of the size of each group.
       def size
-        index =
-          if multi_indexed_grouping?
-            Daru::MultiIndex.from_tuples @groups.keys
-          else
-            Daru::Index.new @groups.keys.flatten
-          end
+        index = get_grouped_index
-        values = @groups.values.map(&:size)
+        values = @groups_by_pos.values.map(&:size)
         Daru::Vector.new(values, index: index, name: :size)
       end
@@ -246,7 +256,7 @@ module Daru
       #   #                    a          b          c          d
       #   #         5        bar        two          6         66
       def get_group group
-        indexes   = @groups[group]
+        indexes   = groups_by_idx[group]
         elements  = @context.each_vector.map(&:to_a)
         transpose = elements.transpose
         rows      = indexes.each.map { |idx| transpose[idx] }
@@ -273,7 +283,7 @@ module Daru
       #   #   a ACE
       #   #   b BDF
       def reduce(init=nil)
-        result_hash = @groups.each_with_object({}) do |(group, indices), h|
+        result_hash = groups_by_idx.each_with_object({}) do |(group, indices), h|
           group_indices = indices.map { |v| @context.index.to_a[v] }
           grouped_result = init
@@ -284,18 +294,13 @@ module Daru
           h[group] = grouped_result
         end
-        index =
-          if multi_indexed_grouping?
-            Daru::MultiIndex.from_tuples result_hash.keys
-          else
-            Daru::Index.new result_hash.keys.flatten
-          end
+        index = get_grouped_index(result_hash.keys)
         Daru::Vector.new(result_hash.values, index: index)
       end
       def inspect
-        @df.inspect
+        grouped_df.inspect
       end
       # Function to use for aggregating the data.
@@ -335,7 +340,9 @@ module Daru
       #           Ram Hyderabad,Mumbai
       #
       def aggregate(options={})
-        @df.aggregate(options)
+        new_index = get_grouped_index
+        @context.aggregate(options) { [@groups_by_pos.values, new_index] }
       end
       private
@@ -344,7 +351,7 @@ module Daru
         selection     = @context
         rows, indexes = [], []
-        @groups.each_value do |index|
+        groups_by_idx.each_value do |index|
           index.send(method, quantity).each do |idx|
             rows << selection.row[idx].to_a
             indexes << idx
@@ -360,29 +367,31 @@ module Daru
           method_type == :numeric && @context[ngvec].type == :numeric
         end
-        rows = @groups.map do |_group, indexes|
+        rows = groups_by_idx.map do |_group, indexes|
           order.map do |ngvector|
             slice = @context[ngvector][*indexes]
             slice.is_a?(Daru::Vector) ? slice.send(method) : slice
           end
         end
-        index = apply_method_index
+        index = get_grouped_index
         order = Daru::Index.new(order)
         Daru::DataFrame.new(rows.transpose, index: index, order: order)
       end
-      def apply_method_index
+      def get_grouped_index(index_tuples=nil)
+        index_tuples = @groups_by_pos.keys if index_tuples.nil?
         if multi_indexed_grouping?
-          Daru::MultiIndex.from_tuples(@groups.keys)
+          Daru::MultiIndex.from_tuples(index_tuples)
         else
-          Daru::Index.new(@groups.keys.flatten)
+          Daru::Index.new(index_tuples.flatten)
         end
       end
       def multi_indexed_grouping?
-        return false unless @groups.keys[0]
-        @groups.keys[0].size > 1
+        return false unless @groups_by_pos.keys[0]
+        @groups_by_pos.keys[0].size > 1
       end
     end
   end
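
Note: the GroupBy change above keeps only the position-based group map at construction time and derives `groups` and `df` on first access with `||=`. A minimal plain-Ruby sketch of that memoization pattern follows. `LazyGrouping` and its `index_builds` counter are hypothetical and exist only to make the laziness observable.

```ruby
class LazyGrouping
  attr_reader :index_builds

  def initialize(index, keys)
    @index = index
    @index_builds = 0
    # Eager part: group row positions by key (the role of @groups_by_pos).
    @groups_by_pos = keys.each_with_index.each_with_object({}) do |(key, pos), map|
      (map[key] ||= []) << pos
    end
  end

  # Cheap: only needs the positions, never touches the index labels.
  def size
    @groups_by_pos.map { |key, positions| [key, positions.size] }
  end

  # Lazy: positions are translated to index labels on the first call only.
  def groups
    @groups ||= begin
      @index_builds += 1
      @groups_by_pos.map { |key, positions| [key, positions.map { |pos| @index[pos] }] }.to_h
    end
  end
end

g = LazyGrouping.new(%i[r1 r2 r3 r4], %w[a b a b])
p g.size          # prints [["a", 2], ["b", 2]]
puts g.index_builds # prints 0
g.groups
g.groups
puts g.index_builds # prints 1
```

Callers that only need `size` or an aggregation never pay for the label translation, which matches the intent stated in the diff comments ("lazy accessor").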

data/lib/daru/dataframe.rb CHANGED

@@ -10,7 +10,8 @@ module Daru
     include Daru::Maths::Arithmetic::DataFrame
     include Daru::Maths::Statistics::DataFrame
     # TODO: Remove this line but its causing erros due to unkown reason
-    include Daru::Plotting::DataFrame::NyaplotLibrary if Daru.has_nyaplot?
+    Daru.has_nyaplot?
     extend Gem::Deprecate
     class << self
@@ -346,20 +347,19 @@ module Daru
       @name = opts[:name]
       case source
-      when ->(s) { s.empty? }
-        @vectors = Index.coerce vectors
-        @index   = Index.coerce index
-        create_empty_vectors
+      when [], {}
+        create_empty_vectors(vectors, index)
       when Array
         initialize_from_array source, vectors, index, opts
       when Hash
         initialize_from_hash source, vectors, index, opts
+      when ->(s) { s.empty? } # TODO: likely want to remove this case
+        create_empty_vectors(vectors, index)
       end
       set_size
       validate
       update
-      self.plotting_library = Daru.plotting_library
     end
     def plotting_library= lib
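
Note: the reordered `case` above relies on how `when` uses `===`. Literal `[]` and `{}` match by equality, classes match by type, and a lambda matches when calling it returns a truthy value. The sketch below shows that dispatch order in isolation. `classify` is a hypothetical function, and the `respond_to?` guard is an addition that the original lambda does not have.

```ruby
def classify(source)
  case source
  when [], {} then :empty_literal # [] === source and {} === source use ==
  when Array  then :array         # Class#=== is an is_a? check
  when Hash   then :hash
  when ->(s) { s.respond_to?(:empty?) && s.empty? } then :other_empty # Proc#=== calls the lambda
  else :unhandled
  end
end

p classify([])       # prints :empty_literal
p classify({})       # prints :empty_literal
p classify([1])      # prints :array
p classify({ a: 1 }) # prints :hash
p classify('')       # prints :other_empty
p classify(42)       # prints :unhandled
```

Because branches are tested top to bottom, putting the literal checks first means the lambda only runs for sources that are neither arrays nor hashes.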
@@ -372,11 +372,18 @@ module Daru
           )
         end
       else
-        raise ArguementError, "Plotting library #{lib} not supported. "\
+        raise ArgumentError, "Plotting library #{lib} not supported. "\
           'Supported libraries are :nyaplot and :gruff'
       end
     end
+    # this method is overwritten: see Daru::DataFrame#plotting_library=
+    def plot(*args, **options, &b)
+      init_plotting_library
+      plot(*args, **options, &b)
+    end
     # Access row or vector. Specify name of row/vector followed by axis(:row, :vector).
     # Defaults to *:vector*. Use of this method is not recommended for accessing
     # rows. Use df.row[:a] for accessing row with index ':a'.
@@ -404,13 +411,11 @@ module Daru
       validate_positions(*positions, nrows)
       if positions.is_a? Integer
-        return Daru::Vector.new @data.map { |vec| vec.at(*positions) },
-          index: @vectors
+        row = get_rows_for([positions])
+        Daru::Vector.new row, index: @vectors
       else
-        new_rows = @data.map { |vec| vec.at(*original_positions) }
-        return Daru::DataFrame.new new_rows,
-          index: @index.at(*original_positions),
-          order: @vectors
+        new_rows = get_rows_for(original_positions)
+        Daru::DataFrame.new new_rows, index: @index.at(*original_positions), order: @vectors
       end
     end
@@ -621,7 +626,7 @@ module Daru
     deprecate :dup_only_valid, :reject_values, 2016, 10
     # Returns a dataframe in which rows with any of the mentioned values
-    #   are ignored.
+    # are ignored.
     # @param [Array] values to reject to form the new dataframe
     # @return [Daru::DataFrame] Data Frame with only rows which doesn't
     #   contain the mentioned values
@@ -752,7 +757,7 @@ module Daru
     #     3   4   d
     #
     def uniq(*vtrs)
-      vecs = vtrs.empty? ? vectors.map(&:to_s) : Array(vtrs)
+      vecs = vtrs.empty? ? vectors.to_a : Array(vtrs)
       grouped = group_by(vecs)
       indexes = grouped.groups.values.map { |v| v[0] }.sort
       row[*indexes]
@@ -1011,6 +1016,7 @@ module Daru
       case method
       when Symbol then df.send(method)
       when Proc   then method.call(df)
+      when Array  then method.map(&:to_proc).map { |proc| proc.call(df) } # works with Array of both Symbol and/or Proc
       else raise
       end
     end
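
Note: the added `when Array` branch above works because both `Symbol` and `Proc` respond to `to_proc`, so a mixed list can be normalized and then called uniformly. A minimal sketch on a plain Array instead of a DataFrame follows. The standalone `apply_method` function and the `ArgumentError` in the `else` branch are assumptions for illustration (the original uses a bare `raise`).

```ruby
def apply_method(method, target)
  case method
  when Symbol then target.send(method)
  when Proc   then method.call(target)
  # Symbol#to_proc sends the symbol to its argument; Proc#to_proc returns self.
  when Array  then method.map(&:to_proc).map { |callable| callable.call(target) }
  else raise ArgumentError, "unsupported method spec: #{method.inspect}"
  end
end

data = [3, 1, 2]
p apply_method(:max, data)                       # prints 3
p apply_method(->(a) { a.sum }, data)            # prints 6
p apply_method([:min, ->(a) { a.size }], data)   # prints [1, 3]
```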
@@ -1489,7 +1495,7 @@ module Daru
     def reindex_vectors new_vectors
       unless new_vectors.is_a?(Daru::Index)
         raise ArgumentError, 'Must pass the new index of type Index or its '\
-          "subclasses, not #{new_index.class}"
+          "subclasses, not #{new_vectors.class}"
       end
       cl = Daru::DataFrame.new({}, order: new_vectors, index: @index, name: @name)
@@ -1527,14 +1533,52 @@ module Daru
       df
     end
+    module SetSingleIndexStrategy
+      def self.uniq_size(df, col)
+        df[col].uniq.size
+      end
+      def self.new_index(df, col)
+        Daru::Index.new(df[col].to_a)
+      end
+      def self.delete_vector(df, col)
+        df.delete_vector(col)
+      end
+    end
+    module SetMultiIndexStrategy
+      def self.uniq_size(df, cols)
+        df[*cols].uniq.size
+      end
+      def self.new_index(df, cols)
+        Daru::MultiIndex.from_arrays(df[*cols].map_vectors(&:to_a)).tap do |mi|
+          mi.name = cols
+          mi
+        end
+      end
+      def self.delete_vector(df, cols)
+        df.delete_vectors(*cols)
+      end
+    end
     # Set a particular column as the new DF
-    def set_index new_index, opts={}
-      raise ArgumentError, 'All elements in new index must be unique.' if
-        @size != self[new_index].uniq.size
+    def set_index new_index_col, opts={}
+      if new_index_col.respond_to?(:to_a)
+        strategy = SetMultiIndexStrategy
+        new_index_col = new_index_col.to_a
+      else
+        strategy = SetSingleIndexStrategy
+      end
-      self.index = Daru::Index.new(self[new_index].to_a)
-      delete_vector(new_index) unless opts[:keep]
+      uniq_size = strategy.uniq_size(self, new_index_col)
+      raise ArgumentError, 'All elements in new index must be unique.' if
+        @size != uniq_size
+      self.index = strategy.new_index(self, new_index_col)
+      strategy.delete_vector(self, new_index_col) unless opts[:keep]
       self
     end
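
Note: `set_index` above picks a strategy module based on whether the argument responds to `to_a`, then runs the same uniqueness check and index construction through it. The sketch below reproduces that shape on an array of hashes rather than a DataFrame. `SingleKey`, `MultiKey`, `build_index` and the sample rows are all hypothetical.

```ruby
module SingleKey
  def self.uniq_size(rows, col)
    new_index(rows, col).uniq.size
  end

  def self.new_index(rows, col)
    rows.map { |row| row[col] }
  end
end

module MultiKey
  def self.uniq_size(rows, cols)
    new_index(rows, cols).uniq.size
  end

  # One tuple per row, comparable to the tuples of a multi-index.
  def self.new_index(rows, cols)
    rows.map { |row| row.values_at(*cols) }
  end
end

def build_index(rows, key)
  if key.respond_to?(:to_a) # arrays of column names take the multi-key path
    strategy = MultiKey
    key = key.to_a
  else
    strategy = SingleKey
  end

  raise ArgumentError, 'All elements in new index must be unique.' if
    rows.size != strategy.uniq_size(rows, key)

  strategy.new_index(rows, key)
end

rows = [
  { city: 'Pune',  year: 2018, n: 1 },
  { city: 'Pune',  year: 2019, n: 2 },
  { city: 'Kyoto', year: 2019, n: 3 }
]

p build_index(rows, :n)             # prints [1, 2, 3]
p build_index(rows, %i[city year])  # prints [["Pune", 2018], ["Pune", 2019], ["Kyoto", 2019]]
```

Here `:city` alone would be rejected because "Pune" repeats, while the pair `[:city, :year]` is unique per row. Note that `respond_to?(:to_a)` is also true for `nil` and for hashes, so the check is broader than "is an Array".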
@@ -1572,11 +1616,24 @@ module Daru
       end
     end
+    def reset_index
+      index_df = index.to_df
+      names = index.name
+      names = [names] unless names.instance_of?(Array)
+      new_vectors = names + vectors.to_a
+      self.index = index_df.index
+      names.each do |name|
+        self[name] = index_df[name]
+      end
+      self.order = new_vectors
+      self
+    end
     # Reassign index with a new index of type Daru::Index or any of its subclasses.
     #
     # @param [Daru::Index] idx New index object on which the rows of the dataframe
     #   are to be indexed.
-    # @example Reassgining index of a DataFrame
+    # @example Reassigining index of a DataFrame
     #   df = Daru::DataFrame.new({a: [1,2,3,4], b: [11,22,33,44]})
     #   df.index.to_a #=> [0,1,2,3]
     #
@@ -2088,7 +2145,7 @@ module Daru
     # Write this DataFrame to a CSV file.
     #
-    # == Arguements
+    # == Arguments
     #
     # * filename - Path of CSV file where the DataFrame is to be saved.
     #
@@ -2264,7 +2321,7 @@ module Daru
     #   #   2   3]
     def split_by_category cat_name
       cat_dv = self[cat_name]
-      raise ArguementError, "#{cat_name} is not a category vector" unless
+      raise ArgumentError, "#{cat_name} is not a category vector" unless
         cat_dv.category?
       cat_dv.categories.map do |cat|
@@ -2274,6 +2331,50 @@ module Daru
       end
     end
+    # @param indexes [Array] index(s) at which row tuples are retrieved
+    # @return [Array] returns array of row tuples at given index(s)
+    # @example Using Daru::Index
+    #   df = Daru::DataFrame.new({
+    #     a: [1, 2, 3],
+    #     b: ['a', 'a', 'b']
+    #   })
+    #
+    #   df.access_row_tuples_by_indexs(1,2)
+    #   # => [[2, "a"], [3, "b"]]
+    #
+    #   df.index = Daru::Index.new([:one,:two,:three])
+    #   df.access_row_tuples_by_indexs(:one,:three)
+    #   # => [[1, "a"], [3, "b"]]
+    #
+    # @example Using Daru::MultiIndex
+    #   mi_idx = Daru::MultiIndex.from_tuples [
+    #     [:a,:one,:bar],
+    #     [:a,:one,:baz],
+    #     [:b,:two,:bar],
+    #     [:a,:two,:baz],
+    #   ]
+    #   df_mi = Daru::DataFrame.new({
+    #     a: 1..4,
+    #     b: 'a'..'d'
+    #   }, index: mi_idx )
+    #
+    #   df_mi.access_row_tuples_by_indexs(:b, :two, :bar)
+    #   # => [[3, "c"]]
+    #   df_mi.access_row_tuples_by_indexs(:a)
+    #   # => [[1, "a"], [2, "b"], [4, "d"]]
+    def access_row_tuples_by_indexs *indexes
+      return get_sub_dataframe(indexes, by_position: false).map_rows(&:to_a) if
+      @index.is_a?(Daru::MultiIndex)
+      positions = @index.pos(*indexes)
+      if positions.is_a? Numeric
+        row = get_rows_for([positions])
+        row.first.is_a?(Array) ? row : [row]
+      else
+        new_rows = get_rows_for(indexes, by_position: false)
+        indexes.map { |index| new_rows.map { |r| r[index] } }
+      end
+    end
     # Function to use for aggregating the data.
     #
     # @param options [Hash] options for column, you want in resultant dataframe
@@ -2322,25 +2423,28 @@ module Daru
     # Note: `GroupBy` class `aggregate` method uses this `aggregate` method
     # internally.
     def aggregate(options={}, multi_index_level=-1)
-      positions_tuples, new_index = group_index_for_aggregation(@index, multi_index_level)
+      if block_given?
+        positions_tuples, new_index = yield(@index) # note: use of yield is private for now
+      else
+        positions_tuples, new_index = group_index_for_aggregation(@index, multi_index_level)
+      end
       colmn_value = aggregate_by_positions_tuples(options, positions_tuples)
       Daru::DataFrame.new(colmn_value, index: new_index, order: options.keys)
     end
-    # Is faster than using group_by followed by aggregate (because it doesn't generate an intermediary dataframe)
     def group_by_and_aggregate(*group_by_keys, **aggregation_map)
-      positions_groups = Daru::Core::GroupBy.get_positions_group_map_for_df(self, group_by_keys.flatten, sort: true)
-      new_index   = Daru::MultiIndex.from_tuples(positions_groups.keys).coerce_index
-      colmn_value = aggregate_by_positions_tuples(aggregation_map, positions_groups.values)
-      Daru::DataFrame.new(colmn_value, index: new_index, order: aggregation_map.keys)
+      group_by(*group_by_keys).aggregate(aggregation_map)
     end
     private
+    # Will lazily load the plotting library being used for this dataframe
+    def init_plotting_library
+      self.plotting_library = Daru.plotting_library
+    end
     def headers
       Daru::Index.new(Array(index.name) + @vectors.to_a)
     end
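
Note: `aggregate` above now accepts an optional block that supplies precomputed position tuples and the new index, which is how `GroupBy#aggregate` avoids building an intermediate dataframe. A small plain-Ruby sketch of the same "use the block if given, otherwise compute a default grouping" shape follows. `aggregate_sum` and its arguments are hypothetical.

```ruby
def aggregate_sum(values, labels)
  if block_given?
    # The caller already knows the grouping: take it as-is.
    positions_tuples, new_index = yield(labels)
  else
    # Default: group positions by identical label, in first-seen order.
    new_index = labels.uniq
    positions_tuples = new_index.map do |label|
      labels.each_index.select { |i| labels[i] == label }
    end
  end

  sums = positions_tuples.map { |positions| values.values_at(*positions).sum }
  new_index.zip(sums)
end

values = [1, 2, 3, 4]
labels = %w[x y x y]

p aggregate_sum(values, labels)
# prints [["x", 4], ["y", 6]]
p aggregate_sum(values, labels) { [[[0, 1], [2, 3]], %w[first second]] }
# prints [["first", 3], ["second", 7]]
```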
@@ -2452,19 +2556,30 @@ module Daru
       positions = @index.pos(*indexes)
       if positions.is_a? Numeric
-        return Daru::Vector.new populate_row_for(positions),
-          index: @vectors,
-          name: indexes.first
+        row = get_rows_for([positions])
+        Daru::Vector.new row, index: @vectors, name: indexes.first
       else
-        new_rows = @data.map { |vec| vec[*indexes] }
-        return Daru::DataFrame.new new_rows,
-          index: @index.subset(*indexes),
-          order: @vectors
+        new_rows = get_rows_for(indexes, by_position: false)
+        Daru::DataFrame.new new_rows, index: @index.subset(*indexes), order: @vectors
       end
     end
-    def populate_row_for pos
-      @data.map { |vector| vector.at(*pos) }
+    # @param keys [Array] can be an array of positions (if by_position is true) or indexes (if by_position if false)
+    # because of coercion by Daru::Vector#at and Daru::Vector#[], can return either an Array of
+    #   values (representing a row) or an array of Vectors (that can be seen as rows)
+    def get_rows_for(keys, by_position: true)
+      raise unless keys.is_a?(Array)
+      if by_position
+        pos = keys
+        @data.map { |vector| vector.at(*pos) }
+      else
+        # TODO: for now (2018-07-27), it is different than using
+        #    get_rows_for(@index.pos(*keys))
+        #    because Daru::Vector#at and Daru::Vector#[] don't handle Daru::MultiIndex the same way
+        indexes = keys
+        @data.map { |vec| vec[*indexes] }
+      end
     end
     def insert_or_modify_vector name, vector
@@ -2565,7 +2680,10 @@ module Daru
       set_size
     end
-    def create_empty_vectors
+    def create_empty_vectors(vectors, index)
+      @vectors = Index.coerce vectors
+      @index   = Index.coerce index
       @data = @vectors.map do |name|
         Daru::Vector.new([], name: coerce_name(name), index: @index)
       end
@@ -2885,7 +3003,6 @@ module Daru
     # Raises IndexError when one of the positions is not a valid position
     def validate_positions *positions, size
-      positions = [positions] if positions.is_a? Integer
       positions.each do |pos|
         raise IndexError, "#{pos} is not a valid position." if pos >= size
       end
@@ -2910,28 +3027,57 @@ module Daru
     end
     def aggregate_by_positions_tuples(options, positions_tuples)
-      options.map do |vect, method|
-        if @vectors.include?(vect)
-          vect = self[vect]
+      agg_over_vectors_only, options = cast_aggregation_options(options)
+      if agg_over_vectors_only
+        options.map do |vect_name, method|
+          vect = self[vect_name]
           positions_tuples.map do |positions|
             vect.apply_method_on_sub_vector(method, keys: positions)
           end
-        else
-          positions_tuples.map do |positions|
-            apply_method_on_sub_df(method, keys: positions)
-          end
         end
+      else
+        methods = options.values
+        # note: because we aggregate over rows, we don't have to re-get sub-dfs for each method (which is expensive)
+        rows = positions_tuples.map do |positions|
+          apply_method_on_sub_df(methods, keys: positions)
+        end
+        rows.transpose
+      end
+    end
+    # convert operations over sub-vectors to operations over sub-dfs when it improves perf
+    # note: we don't always "cast" because aggregation over a single vector / a few vector is faster
+    #   than aggregation over (sub-)dfs
+    def cast_aggregation_options(options)
+      vects, non_vects = options.keys.partition { |k| @vectors.include?(k) }
+      over_vectors = true
+      if non_vects.any?
+        options = options.clone
+        vects.each do |name|
+          proc_on_vect = options[name].to_proc
+          options[name] = ->(sub_df) { proc_on_vect.call(sub_df[name]) }
+        end
+        over_vectors = false
       end
+      [over_vectors, options]
     end
     def group_index_for_aggregation(index, multi_index_level=-1)
       case index
       when Daru::MultiIndex
-        groups = Daru::Core::GroupBy.get_positions_group_for_aggregation(index, multi_index_level)
-        new_index, pos_tuples = groups.keys, groups.values
+        groups_by_pos = Daru::Core::GroupBy.get_positions_group_for_aggregation(index, multi_index_level)
-        new_index = Daru::MultiIndex.from_tuples(new_index).coerce_index
+        new_index = Daru::MultiIndex.from_tuples(groups_by_pos.keys).coerce_index
+        pos_tuples = groups_by_pos.values
       when Daru::Index, Daru::CategoricalIndex
         new_index = Array(index).uniq
         pos_tuples = new_index.map { |idx| [*index.pos(idx)] }
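
Note: `cast_aggregation_options` above partitions the option keys into existing vectors and everything else. When any non-vector key is present, each per-vector method is wrapped in a lambda that first selects the column from a sub-dataframe, so all methods can run over the same row subset. The sketch below mirrors that logic with a Hash of arrays standing in for a sub-dataframe. The two-argument signature and the sample data are assumptions.

```ruby
def cast_aggregation_options(options, column_names)
  vects, non_vects = options.keys.partition { |key| column_names.include?(key) }
  return [true, options] if non_vects.empty? # pure per-column aggregation

  options = options.clone # leave the caller's hash untouched
  vects.each do |name|
    on_column = options[name].to_proc
    # Wrap: take a whole sub-table, pick the column, then apply the method.
    options[name] = ->(sub_table) { on_column.call(sub_table[name]) }
  end
  [false, options]
end

columns   = %i[a b]
sub_table = { a: [1, 2], b: [3] }
original  = { a: :sum, total: ->(t) { t[:a].sum + t[:b].sum } }

over_vectors, cast = cast_aggregation_options(original, columns)
p over_vectors                 # prints false
p cast[:a].call(sub_table)     # prints 3
p cast[:total].call(sub_table) # prints 6
p original[:a]                 # prints :sum
```

When every key names an existing column the options are returned unchanged with `true`, which corresponds to the faster per-vector path mentioned in the diff comments.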
@@ -2950,7 +3096,7 @@ module Daru
         when Range
           size.times.to_a[positions.first]
         else
-          raise ArgumentError, 'Unkown position type.'
+          raise ArgumentError, 'Unknown position type.'
         end
       else
         positions