daru 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 69452b32fd8ef0ef7fb4ed58ab53ffa8aa15806d
- data.tar.gz: 56927c77adbe7941eb2ca9a5e44d705931aad237
+ metadata.gz: 87e4e2869fe6411e3eece92bb5dc24d48f890774
+ data.tar.gz: e711d0db1d57f51f31ccb7fb54078a6bdbcc4ff5
  SHA512:
- metadata.gz: 8e7511133b3409f7821cfec944a950d53df57bcd5893bb8a9557c013f31bf1e4a9cc07bbe1c143c63684f00f7d8d8f1adf3b31df732508e667ba6677f47d1d96
- data.tar.gz: fc4beb70106372a276b21e0da645951595e5674f56e4422752aeeabc9cc2156983add90e59486aea4d88386fbeb2896d15f7ede30667bc84027abd900ee42e0e
+ metadata.gz: afdb295d0d01542ba9f439cf5f7959d7f2a3b9e47de6047ecf7719548ef760e657c0dfe753ed16ee1da65e071bb5a182aaf03ee83c9de6075d54149753b9c346
+ data.tar.gz: e0c4ace661d9f1cb7e8040d424bb004a0b650a9605037d1aff258258bbac40a3c158e5f5b8a2a5c6a28070cf55566a0729ee9b77c8114d40d4d18cf9d26e69c3
data/History.md CHANGED
@@ -1,3 +1,18 @@
+ # 0.2.1 (02 July 2018)
+
+ * Minor Enhancements
+ - Allow pasing singular Symbol to CSV converters option (@takkanm)
+ - Support calling GroupBy#each_group w/o blocks (@hibariya)
+ - Refactor grouping and aggregation (@paisible-wanderer)
+ - Add String Converter to Daru::IO::CSV::CONVERTERS (@takkanm)
+ - Fix annoying missing libraries warning
+ - Remove post-install message (nice yet useless)
+
+ * Fixes
+ - Fix group_by for DataFrame with single row (@baarkerlounger)
+ - `#rolling_fillna!` bugfixes on `Daru::Vector` and `Daru::DataFrame` (@mhammiche)
+ - Fixes `#include?` on multiindex (@rohitner)
+
  # 0.2.0 (31 October 2017)
  * Major Enhancements
  - Add `DataFrame#which` query DSL (experimental! @rainchen)
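A minimal usage sketch, not part of the released files, shown only to illustrate the 0.2.1 entries above (the sample data is made up):

require 'daru'

df = Daru::DataFrame.new(a: %w[foo foo bar], b: [1, 2, 3])

# 0.2.1: GroupBy#each_group called without a block now returns an Enumerator
df.group_by([:a]).each_group.to_a.length        # => 2 (one DataFrame per group)

# 0.2.1: #rolling_fillna! now returns self, so calls can be chained
Daru::Vector.new([1, nil, 3]).rolling_fillna!(:forward).to_a   # => [1, 1, 3]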
data/README.md CHANGED
@@ -3,12 +3,13 @@
  [![Gem Version](https://badge.fury.io/rb/daru.svg)](http://badge.fury.io/rb/daru)
  [![Build Status](https://travis-ci.org/SciRuby/daru.svg?branch=master)](https://travis-ci.org/SciRuby/daru)
  [![Gitter](https://badges.gitter.im/v0dro/daru.svg)](https://gitter.im/v0dro/daru?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)
+ [![Open Source Helpers](https://www.codetriage.com/sciruby/daru/badges/users.svg)](https://www.codetriage.com/sciruby/daru)

  ## Introduction

  daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.

- daru makes it easy and intuitive to process data predominantly through 2 data structures: `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby works with all ruby implementations. Tested with MRI 2.0, 2.1, 2.2 and 2.3.
+ daru makes it easy and intuitive to process data predominantly through 2 data structures: `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby works with all ruby implementations. Tested with MRI 2.0, 2.1, 2.2, 2.3, and 2.4.

  ## Features

@@ -73,6 +74,7 @@ $ gem install daru
  * [Data Analysis in RUby: Basic data manipulation and plotting](http://v0dro.github.io/blog/2014/11/25/data-analysis-in-ruby-basic-data-manipulation-and-plotting/)
  * [Data Analysis in RUby: Splitting, sorting, aggregating data and data types](http://v0dro.github.io/blog/2015/02/24/data-analysis-in-ruby-part-2/)
  * [Finding and Combining data in daru](http://v0dro.github.io/blog/2015/08/03/finding-and-combining-data-in-daru/)
+ * [Introduction to analyzing datasets with daru library](http://gafur.me/2018/02/05/analysing-datasets-with-daru-library.html)

  ### Time series

@@ -192,13 +194,13 @@ In addition to nyaplot, daru also supports plotting out of the box with [gnuplot

  ## Documentation

- Docs can be found [here](https://rubygems.org/gems/daru).
+ Docs can be found [here](http://www.rubydoc.info/gems/daru).

  ## Contributing

  Pick a feature from the Roadmap or the issue tracker or think of your own and send me a Pull Request!

- For details see [CONTRIBUTING](https://github.com/v0dro/daru/blob/master/CONTRIBUTING.md).
+ For details see [CONTRIBUTING](https://github.com/SciRuby/daru/blob/master/CONTRIBUTING.md).

  ## Acknowledgements

@@ -27,29 +27,6 @@ Gem::Specification.new do |spec|
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
  spec.require_paths = ["lib"]

- spec.post_install_message = <<-EOF
- *************************************************************************
- Thank you for installing daru!
-
- oOOOOOo
- ,| oO
- //| |
- \\\\| |
- `| |
- `-----`
-
-
- Hope you love daru! For enhanced interactivity and better visualizations,
- consider using gnuplotrb and nyaplot with iruby. For statistics use the
- statsample family.
-
- Read the README for interesting use cases and examples.
-
- Cheers!
- *************************************************************************
- EOF
-
-
  spec.add_runtime_dependency 'backports'

  # it is required by NMatrix, yet we want to specify clearly which minimal version is OK
@@ -86,16 +86,6 @@ module Daru
  create_has_library :gruff
  end

- {'spreadsheet' => '~>1.1.1', 'mechanize' => '~>2.7.5'}.each do |name, version|
- begin
- gem name, version
- require name
- rescue LoadError
- Daru.error "\nInstall the #{name} gem version #{version} for using"\
- " #{name} functions."
- end
- end
-
  autoload :CSV, 'csv'
  require 'matrix'
  require 'forwardable'
@@ -1,11 +1,64 @@
  module Daru
  module Core
  class GroupBy
+ class << self
+ def get_positions_group_map_on(indexes_with_positions, sort: false)
+ group_map = {}
+
+ indexes_with_positions.each do |idx, position|
+ (group_map[idx] ||= []) << position
+ end
+
+ if sort # TODO: maybe add a more "stable" sorting option?
+ sorted_keys = group_map.keys.sort(&Daru::Core::GroupBy::TUPLE_SORTER)
+ group_map = sorted_keys.map { |k| [k, group_map[k]] }.to_h
+ end
+
+ group_map
+ end
+
+ def get_positions_group_for_aggregation(multi_index, level=-1)
+ raise unless multi_index.is_a?(Daru::MultiIndex)
+
+ new_index = multi_index.dup
+ new_index.remove_layer(level) # TODO: recheck code of Daru::MultiIndex#remove_layer
+
+ get_positions_group_map_on(new_index.each_with_index)
+ end
+
+ def get_positions_group_map_for_df(df, group_by_keys, sort: true)
+ indexes_with_positions = df[*group_by_keys].to_df.each_row.map(&:to_a).each_with_index
+
+ get_positions_group_map_on(indexes_with_positions, sort: sort)
+ end
+
+ def group_map_from_positions_to_indexes(positions_group_map, index)
+ positions_group_map.map { |k, positions| [k, positions.map { |pos| index.at(pos) }] }.to_h
+ end
+
+ def df_from_group_map(df, group_map, remaining_vectors, from_position: true)
+ return nil if group_map == {}
+
+ new_index = group_map.flat_map { |group, values| values.map { |val| group + [val] } }
+ new_index = Daru::MultiIndex.from_tuples(new_index)
+
+ return Daru::DataFrame.new({}, index: new_index) if remaining_vectors == []
+
+ new_rows_order = group_map.values.flatten
+ new_df = df[*remaining_vectors].to_df.get_sub_dataframe(new_rows_order, by_position: from_position)
+ new_df.index = new_index
+
+ new_df
+ end
+ end
+
  attr_reader :groups, :df

  # Iterate over each group created by group_by. A DataFrame is yielded in
  # block.
  def each_group
+ return to_enum(:each_group) unless block_given?
+
  groups.keys.each do |k|
  yield get_group(k)
  end
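A toy sketch (not part of the diff) of what the new class-level helper computes, assuming daru 0.2.1 is loaded: it maps each distinct grouping key to the list of row positions at which it occurs, optionally sorting the keys.

pairs = [[['foo'], 0], [['bar'], 1], [['foo'], 2]]
Daru::Core::GroupBy.get_positions_group_map_on(pairs, sort: true)
# => {["bar"]=>[1], ["foo"]=>[0, 2]}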
@@ -22,11 +75,8 @@ module Daru
  end

  def initialize context, names
- @groups = {}
  @non_group_vectors = context.vectors.to_a - names
  @context = context
- vectors = names.map { |vec| context[vec].to_a }
- tuples = vectors[0].zip(*vectors[1..-1])
  # FIXME: It feels like we don't want to sort here. Ruby's #group_by
  # never sorts:
  #
@@ -34,7 +84,10 @@ module Daru
  # # => {4=>["test"], 2=>["me"], 6=>["please"]}
  #
  # - zverok, 2016-09-12
- init_groups_df tuples, names
+ positions_groups = GroupBy.get_positions_group_map_for_df(@context, names, sort: true)
+
+ @groups = GroupBy.group_map_from_positions_to_indexes(positions_groups, @context.index)
+ @df = GroupBy.df_from_group_map(@context, positions_groups, @non_group_vectors)
  end

  # Get a Daru::Vector of the size of each group.
@@ -282,26 +335,11 @@ module Daru
  # Ram Hyderabad,Mumbai
  #
  def aggregate(options={})
- @df.index = @df.index.remove_layer(@df.index.levels.size - 1)
  @df.aggregate(options)
  end

  private

- def init_groups_df tuples, names
- multi_index_tuples = []
- keys = tuples.uniq.sort(&TUPLE_SORTER)
- keys.each do |key|
- indices = all_indices_for(tuples, key)
- @groups[key] = indices
- indices.each do |indice|
- multi_index_tuples << key + [indice]
- end
- end
- @groups.freeze
- @df = resultant_context(multi_index_tuples, names) unless multi_index_tuples.empty?
- end
-
  def select_groups_from method, quantity
  selection = @context
  rows, indexes = [], []
@@ -342,33 +380,6 @@ module Daru
  end
  end

- def resultant_context(multi_index_tuples, names)
- multi_index = Daru::MultiIndex.from_tuples(multi_index_tuples)
- context_tmp = @context.dup.delete_vectors(*names)
- rows_tuples = context_tmp.access_row_tuples_by_indexs(
- *@groups.values.flatten!
- )
- context_new = Daru::DataFrame.rows(rows_tuples, index: multi_index)
- context_new.vectors = context_tmp.vectors
- context_new
- end
-
- def all_indices_for arry, element
- found, index, indexes = -1, -1, []
- while found
- found = arry[index+1..-1].index(element)
- if found
- index = index + found + 1
- indexes << index
- end
- end
- if indexes.count == 1
- [@context.index.at(*indexes)]
- else
- @context.index.at(*indexes).to_a
- end
- end
-
  def multi_indexed_grouping?
  return false unless @groups.keys[0]
  @groups.keys[0].size > 1
@@ -17,17 +17,17 @@ module Daru
  end
  end

- def initialize left_df, right_df, opts={}
+ def initialize left_df, right_df, opts={} # rubocop:disable Metrics/AbcSize -- quick-fix for issue #171
  init_opts(opts)
  validate_on!(left_df, right_df)
  key_sanitizer = ->(h) { sanitize_merge_keys(h.values_at(*on)) }

  @left = df_to_a(left_df)
- @left.sort_by!(&key_sanitizer)
+ @left.sort! { |a, b| safe_compare(a.values_at(*on), b.values_at(*on)) }
  @left_key_values = @left.map(&key_sanitizer)

  @right = df_to_a(right_df)
- @right.sort_by!(&key_sanitizer)
+ @right.sort! { |a, b| safe_compare(a.values_at(*on), b.values_at(*on)) }
  @right_key_values = @right.map(&key_sanitizer)

  @left_keys, @right_keys = merge_keys(left_df, right_df, on)
@@ -246,6 +246,15 @@ module Daru
  raise ArgumentError, "Both dataframes expected to have #{on.inspect} field"
  end
  end
+
+ def safe_compare(left_array, right_array)
+ left_array.zip(right_array).map { |l, r|
+ next 0 if l.nil? && r.nil?
+ next 1 if r.nil?
+ next -1 if l.nil?
+ l <=> r
+ }.reject(&:zero?).first || 0
+ end
  end

  module Merge
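For clarity, the nil-safe comparison introduced above can be restated outside the class (a standalone sketch, not the library's public API): nil keys no longer raise during the sort and compare as smaller than any non-nil value.

safe_compare = lambda do |left_array, right_array|
  left_array.zip(right_array).map { |l, r|
    next 0 if l.nil? && r.nil?
    next 1 if r.nil?
    next -1 if l.nil?
    l <=> r
  }.reject(&:zero?).first || 0
end

safe_compare.call([1, nil], [1, 2])   # => -1 (nil sorts before 2)
safe_compare.call([1, 2], [1, 2])     # =>  0
safe_compare.call([2, nil], [1, nil]) # =>  1 (decided by the first differing key)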
@@ -549,6 +549,20 @@ module Daru
  Daru::Accessors::DataFrameByRow.new(self)
  end

+ # Extract a dataframe given row indexes or positions
+ # @param keys [Array] can be positions (if by_position is true) or indexes (if by_position if false)
+ # @return [Daru::Dataframe]
+ def get_sub_dataframe(keys, by_position: true)
+ return Daru::DataFrame.new({}) if keys == []
+
+ keys = @index.pos(*keys) unless by_position
+
+ sub_df = row_at(*keys)
+ sub_df = sub_df.to_df.transpose if sub_df.is_a?(Daru::Vector)
+
+ sub_df
+ end
+
  # Duplicate the DataFrame entirely.
  #
  # == Arguments
@@ -698,6 +712,7 @@ module Daru
  #
  def rolling_fillna!(direction=:forward)
  @data.each { |vec| vec.rolling_fillna!(direction) }
+ self
  end

  def rolling_fillna(direction=:forward)
@@ -990,6 +1005,17 @@ module Daru
  self
  end

+ def apply_method(method, keys: nil, by_position: true)
+ df = keys ? get_sub_dataframe(keys, by_position: by_position) : self
+
+ case method
+ when Symbol then df.send(method)
+ when Proc then method.call(df)
+ else raise
+ end
+ end
+ alias :apply_method_on_sub_df :apply_method
+
  # Retrieves a Daru::Vector, based on the result of calculation
  # performed on each row.
  def collect_rows &block
@@ -1450,11 +1476,10 @@ module Daru
  # # ["foo", "two", 3]=>[2, 4]}
  def group_by *vectors
  vectors.flatten!
- # FIXME: wouldn't it better to do vectors - @vectors here and
- # raise one error with all non-existent vector names?.. - zverok, 2016-05-18
- vectors.each { |v|
- raise(ArgumentError, "Vector #{v} does not exist") unless has_vector?(v)
- }
+ missing = vectors - @vectors.to_a
+ unless missing.empty?
+ raise(ArgumentError, "Vector(s) missing: #{missing.join(', ')}")
+ end

  vectors = [@vectors.first] if vectors.empty?

@@ -2249,22 +2274,6 @@ module Daru
  end
  end

- # returns array of row tuples at given index(s)
- def access_row_tuples_by_indexs *indexes
- positions = @index.pos(*indexes)
-
- return populate_row_for(positions) if positions.is_a? Numeric
-
- res = []
- new_rows = @data.map { |vec| vec[*indexes] }
- indexes.each do |index|
- tuples = []
- new_rows.map { |row| tuples += [row[index]] }
- res << tuples
- end
- res
- end
-
  # Function to use for aggregating the data.
  #
  # @param options [Hash] options for column, you want in resultant dataframe
@@ -2282,7 +2291,7 @@ module Daru
  # 3 d 17
  # 4 e 1
  #
- # df.aggregate(num_100_times: ->(df) { df.num*100 })
+ # df.aggregate(num_100_times: ->(df) { (df.num*100).first })
  # => #<Daru::DataFrame(5x1)>
  # num_100_ti
  # 0 5200
@@ -2312,41 +2321,26 @@ module Daru
  #
  # Note: `GroupBy` class `aggregate` method uses this `aggregate` method
  # internally.
- def aggregate(options={})
- colmn_value, index_tuples = aggregated_colmn_value(options)
- Daru::DataFrame.new(
- colmn_value, index: index_tuples, order: options.keys
- )
- end
+ def aggregate(options={}, multi_index_level=-1)
+ positions_tuples, new_index = group_index_for_aggregation(@index, multi_index_level)

- private
+ colmn_value = aggregate_by_positions_tuples(options, positions_tuples)

- # Do the `method` (`method` can be :sum, :mean, :std, :median, etc or
- # lambda), on the column.
- def apply_method_on_colmns colmn, index_tuples, method
- rows = []
- index_tuples.each do |indexes|
- # If single element then also make it vector.
- slice = Daru::Vector.new(Array(self[colmn][*indexes]))
- case method
- when Symbol
- rows << (slice.is_a?(Daru::Vector) ? slice.send(method) : slice)
- when Proc
- rows << method.call(slice)
- end
- end
- rows
+ Daru::DataFrame.new(colmn_value, index: new_index, order: options.keys)
  end

- def apply_method_on_df index_tuples, method
- rows = []
- index_tuples.each do |indexes|
- slice = row[*indexes]
- rows << method.call(slice)
- end
- rows
+ # Is faster than using group_by followed by aggregate (because it doesn't generate an intermediary dataframe)
+ def group_by_and_aggregate(*group_by_keys, **aggregation_map)
+ positions_groups = Daru::Core::GroupBy.get_positions_group_map_for_df(self, group_by_keys.flatten, sort: true)
+
+ new_index = Daru::MultiIndex.from_tuples(positions_groups.keys).coerce_index
+ colmn_value = aggregate_by_positions_tuples(aggregation_map, positions_groups.values)
+
+ Daru::DataFrame.new(colmn_value, index: new_index, order: aggregation_map.keys)
  end

+ private
+
  def headers
  Daru::Index.new(Array(index.name) + @vectors.to_a)
  end
@@ -2910,27 +2904,41 @@ module Daru
  end

  def update_data source, vectors
- @data = @vectors.each_with_index.map do |_vec,idx|
+ @data = @vectors.each_with_index.map do |_vec, idx|
  Daru::Vector.new(source[idx], index: @index, name: vectors[idx])
  end
  end

- def aggregated_colmn_value(options)
- colmn_value = []
- index_tuples = Array(@index).uniq
- options.keys.each do |vec|
- do_this_on_vec = options[vec]
- colmn_value << if @vectors.include?(vec)
- apply_method_on_colmns(
- vec, index_tuples, do_this_on_vec
- )
- else
- apply_method_on_df(
- index_tuples, do_this_on_vec
- )
- end
+ def aggregate_by_positions_tuples(options, positions_tuples)
+ options.map do |vect, method|
+ if @vectors.include?(vect)
+ vect = self[vect]
+
+ positions_tuples.map do |positions|
+ vect.apply_method_on_sub_vector(method, keys: positions)
+ end
+ else
+ positions_tuples.map do |positions|
+ apply_method_on_sub_df(method, keys: positions)
+ end
+ end
  end
- [colmn_value, index_tuples]
+ end
+
+ def group_index_for_aggregation(index, multi_index_level=-1)
+ case index
+ when Daru::MultiIndex
+ groups = Daru::Core::GroupBy.get_positions_group_for_aggregation(index, multi_index_level)
+ new_index, pos_tuples = groups.keys, groups.values
+
+ new_index = Daru::MultiIndex.from_tuples(new_index).coerce_index
+ when Daru::Index, Daru::CategoricalIndex
+ new_index = Array(index).uniq
+ pos_tuples = new_index.map { |idx| [*index.pos(idx)] }
+ else raise
+ end
+
+ [pos_tuples, new_index]
  end

  # coerce ranges, integers and array in appropriate ways
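The new aggregation entry points, sketched on a hypothetical DataFrame `df` with :year, :category and :spending vectors (mirroring the specs further down in this diff):

# group_by followed by aggregate, as before
df.group_by([:year, :category]).aggregate(spending: :sum)

# new in 0.2.1: same result without materialising the intermediate grouped DataFrame
df.group_by_and_aggregate([:year, :category], spending: :sum)

# aggregate now accepts the MultiIndex layer to collapse (last layer by default)
df.group_by([:year, :category]).aggregate(spending: :sum).aggregate({spending: :sum}, 0)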
@@ -244,8 +244,21 @@ module Daru
  @labels.delete_at(layer_index)
  @name.delete_at(layer_index) unless @name.nil?

- # CategoricalIndex is used , to allow duplicate indexes.
- @levels.size == 1 ? Daru::CategoricalIndex.new(to_a.flatten) : self
+ coerce_index
+ end
+
+ def coerce_index
+ if @levels.size == 1
+ elements = to_a.flatten
+
+ if elements.uniq.length == elements.length
+ Daru::Index.new(elements)
+ else
+ Daru::CategoricalIndex.new(elements)
+ end
+ else
+ self
+ end
  end

  # Array `name` must have same length as levels and labels.
@@ -272,7 +285,7 @@ module Daru
  end

  def dup
- MultiIndex.new levels: levels.dup, labels: labels
+ MultiIndex.new levels: levels.dup, labels: labels.dup, name: (@name.nil? ? nil : @name.dup)
  end

  def drop_left_level by=1
@@ -293,8 +306,9 @@ module Daru

  def include? tuple
  return false unless tuple.is_a? Enumerable
- tuple.flatten.each_with_index
- .all? { |tup, i| @levels[i][tup] }
+ @labels[0...tuple.flatten.size]
+ .transpose
+ .include?(tuple.flatten.each_with_index.map { |e, i| @levels[i][e] })
  end

  def size
@@ -11,6 +11,9 @@ module Daru
  else
  f
  end
+ },
+ string: lambda { |f, _|
+ f
  }
  }.freeze
  end
@@ -34,11 +34,12 @@ module Daru
  end
  end

- module IO
+ module IO # rubocop:disable Metrics/ModuleLength
  class << self
  # Functions for loading/writing Excel files.

  def from_excel path, opts={}
+ optional_gem 'spreadsheet', '~>1.1.1'
  opts = {
  worksheet_id: 0
  }.merge opts
@@ -185,19 +186,25 @@ module Daru
  end

  def from_html path, opts
+ optional_gem 'mechanize', '~>2.7.5'
  page = Mechanize.new.get(path)
  page.search('table').map { |table| html_parse_table table }
  .keep_if { |table| html_search table, opts[:match] }
  .compact
  .map { |table| html_decide_values table, opts }
  .map { |table| html_table_to_dataframe table }
- rescue LoadError
- raise 'Install the mechanize gem version 2.7.5 with `gem install mechanize`,'\
- ' for using the from_html function.'
  end

  private

+ def optional_gem(name, version)
+ gem name, version
+ require name
+ rescue LoadError
+ Daru.error "\nInstall the #{name} gem version #{version} for using"\
+ " #{name} functions."
+ end
+
  DARU_OPT_KEYS = %i[clone order index name].freeze

  def from_csv_prepare_opts opts
@@ -214,7 +221,7 @@ module Daru
  end

  def from_csv_prepare_converters(converters)
- converters.flat_map do |c|
+ Array(converters).flat_map do |c|
  if ::CSV::Converters[c]
  ::CSV::Converters[c]
  elsif Daru::IO::CSV::CONVERTERS[c]
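Together with the :string converter added earlier, the CSV loader now accepts a bare Symbol as well as an Array for :converters. A usage sketch (the paths are the spec fixtures listed elsewhere in this diff):

Daru::DataFrame.from_csv 'spec/fixtures/string_converter_test.csv', converters: [:string]
Daru::DataFrame.from_csv 'spec/fixtures/boolean_converter_test.csv', converters: :boolean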
@@ -122,6 +122,17 @@ module Daru
  self
  end

+ def apply_method(method, keys: nil, by_position: true)
+ vect = keys ? get_sub_vector(keys, by_position: by_position) : self
+
+ case method
+ when Symbol then vect.send(method)
+ when Proc then method.call(vect)
+ else raise
+ end
+ end
+ alias :apply_method_on_sub_vector :apply_method
+
  # The name of the Daru::Vector. String.
  attr_reader :name
  # The row index. Can be either Daru::Index or Daru::MultiIndex.
@@ -790,6 +801,7 @@ module Daru
  self[idx] = last_valid_value
  end
  end
+ self
  end

  # Non-destructive version of rolling_fillna!
@@ -870,6 +882,19 @@ module Daru
  @index.include? index
  end

+ # @param keys [Array] can be positions (if by_position is true) or indexes (if by_position if false)
+ # @return [Daru::Vector]
+ def get_sub_vector(keys, by_position: true)
+ return Daru::Vector.new([]) if keys == []
+
+ keys = @index.pos(*keys) unless by_position
+
+ sub_vect = at(*keys)
+ sub_vect = Daru::Vector.new([sub_vect]) unless sub_vect.is_a?(Daru::Vector)
+
+ sub_vect
+ end
+
  # @return [Daru::DataFrame] the vector as a single-vector dataframe
  def to_df
  Daru::DataFrame.new({@name => @data}, name: @name, index: @index)
@@ -1,3 +1,3 @@
  module Daru
- VERSION = '0.2.0'.freeze
+ VERSION = '0.2.1'.freeze
  end
@@ -201,6 +201,22 @@ describe Daru::Core::GroupBy do
  end
  end

+ context '#each_group without block' do
+ it 'enumerates groups' do
+ enum = @dl_group.each_group
+
+ expect(enum.count).to eq 6
+ expect(enum).to all be_a(Daru::DataFrame)
+ expect(enum.to_a.last).to eq(Daru::DataFrame.new({
+ a: ['foo', 'foo'],
+ b: ['two', 'two'],
+ c: [3, 3],
+ d: [33, 55]
+ }, index: [2, 4]
+ ))
+ end
+ end
+
  context '#first' do
  it 'gets the first row from each group' do
  expect(@dl_group.first).to eq(Daru::DataFrame.new({
@@ -223,10 +239,6 @@ describe Daru::Core::GroupBy do
  end
  end

- context "#aggregate" do
- pending
- end
-
  context "#mean" do
  it "computes mean of the numeric columns of a single layer group" do
  expect(@sl_group.mean).to eq(Daru::DataFrame.new({
@@ -498,23 +510,6 @@ describe Daru::Core::GroupBy do
  }
  end

- context 'group and aggregate sum for two vectors' do
- subject {
- dataframe.group_by([:employee, :month]).aggregate(salary: :sum) }
-
- it { is_expected.to eq Daru::DataFrame.new({
- salary: [600, 500, 1200, 1000, 600, 700]},
- index: Daru::MultiIndex.from_tuples([
- ['Jane', 'July'],
- ['Jane', 'June'],
- ['John', 'July'],
- ['John', 'June'],
- ['Mark', 'July'],
- ['Mark', 'June']
- ])
- )}
- end
-
  context 'group and aggregate sum and lambda function for vectors' do
  subject { dataframe.group_by([:employee]).aggregate(
  salary: :sum,
@@ -592,5 +587,64 @@ describe Daru::Core::GroupBy do
  )
  end
  end
+
+ let(:spending_df) {
+ Daru::DataFrame.rows([
+ [2010, 'dev', 50, 1],
+ [2010, 'dev', 150, 1],
+ [2010, 'dev', 200, 1],
+ [2011, 'dev', 50, 1],
+ [2012, 'dev', 150, 1],
+
+ [2011, 'office', 300, 1],
+
+ [2010, 'market', 50, 1],
+ [2011, 'market', 500, 1],
+ [2012, 'market', 500, 1],
+ [2012, 'market', 300, 1],
+
+ [2012, 'R&D', 10, 1],],
+ order: [:year, :category, :spending, :nb_spending])
+ }
+ let(:multi_index_year_category) {
+ Daru::MultiIndex.from_tuples([
+ [2010, "dev"], [2010, "market"],
+ [2011, "dev"], [2011, "market"], [2011, "office"],
+ [2012, "R&D"], [2012, "dev"], [2012, "market"]])
+ }
+
+ context 'group_by and aggregate on multiple elements' do
+ it 'does aggregate' do
+ expect(spending_df.group_by([:year, :category]).aggregate(spending: :sum)).to eq(
+ Daru::DataFrame.new({spending: [400, 50, 50, 500, 300, 10, 150, 800]}, index: multi_index_year_category))
+ end
+
+ it 'works as older methods' do
+ newer_way = spending_df.group_by([:year, :category]).aggregate(spending: :sum, nb_spending: :sum)
+ older_way = spending_df.group_by([:year, :category]).sum
+ expect(newer_way).to eq(older_way)
+ end
+
+ context 'can aggregate on MultiIndex' do
+ let(:multi_indexed_aggregated_df) { spending_df.group_by([:year, :category]).aggregate(spending: :sum) }
+ let(:index_year) { Daru::Index.new([2010, 2011, 2012]) }
+ let(:index_category) { Daru::Index.new(["dev", "market", "office", "R&D"]) }
+
+ it 'aggregates by default on the last layer of MultiIndex' do
+ expect(multi_indexed_aggregated_df.aggregate(spending: :sum)).to eq(
+ Daru::DataFrame.new({spending: [450, 850, 960]}, index: index_year))
+ end
+
+ it 'can aggregate on the first layer of MultiIndex' do
+ expect(multi_indexed_aggregated_df.aggregate({spending: :sum},0)).to eq(
+ Daru::DataFrame.new({spending: [600, 1350, 300, 10]}, index: index_category))
+ end
+
+ it 'does coercion: when one layer is remaining, MultiIndex is coerced in Index that does not aggregate anymore' do
+ df_with_simple_index = multi_indexed_aggregated_df.aggregate(spending: :sum)
+ expect(df_with_simple_index.aggregate(spending: :sum)).to eq(df_with_simple_index)
+ end
+ end
+ end
  end
  end
@@ -1858,7 +1858,7 @@ describe Daru::DataFrame do

  context 'rolling_fillna! forwards' do
  before { subject.rolling_fillna!(:forward) }
- it { is_expected.to be_a Daru::DataFrame }
+ it { expect(subject.rolling_fillna!(:forward)).to eq(subject) }
  its(:'a.to_a') { is_expected.to eq [1, 2, 3, 3, 3, 3, 1, 7] }
  its(:'b.to_a') { is_expected.to eq [:a, :b, :b, :b, :b, 3, 5, 5] }
  its(:'c.to_a') { is_expected.to eq ['a', 'a', 3, 4, 3, 5, 5, 7] }
@@ -1866,7 +1866,7 @@ describe Daru::DataFrame do

  context 'rolling_fillna! backwards' do
  before { subject.rolling_fillna!(:backward) }
- it { is_expected.to be_a Daru::DataFrame }
+ it { expect(subject.rolling_fillna!(:backward)).to eq(subject) }
  its(:'a.to_a') { is_expected.to eq [1, 2, 3, 1, 1, 1, 1, 7] }
  its(:'b.to_a') { is_expected.to eq [:a, :b, 3, 3, 3, 3, 5, 0] }
  its(:'c.to_a') { is_expected.to eq ['a', 3, 3, 4, 3, 5, 7, 7] }
@@ -3266,6 +3266,18 @@ describe Daru::DataFrame do
  end
  end

+ context "group_by" do
+ context "on a single row DataFrame" do
+ let(:df){ Daru::DataFrame.new(city: %w[Kyiv], year: [2015], value: [1]) }
+ it "returns a groupby object" do
+ expect(df.group_by([:city])).to be_a(Daru::Core::GroupBy)
+ end
+ it "has the correct index" do
+ expect(df.group_by([:city]).groups).to eq({["Kyiv"]=>[0]})
+ end
+ end
+ end
+
  context "#vector_sum" do
  before do
  a1 = Daru::Vector.new [1, 2, 3, 4, 5, nil, nil]
@@ -4032,7 +4044,7 @@ describe Daru::DataFrame do
  Daru::DataFrame.new({num: [52,12,07,17,01]}, index: cat_idx) }

  it 'lambda function on particular column' do
- expect(df.aggregate(num_100_times: ->(df) { df.num*100 })).to eq(
+ expect(df.aggregate(num_100_times: ->(df) { (df.num*100).first })).to eq(
  Daru::DataFrame.new(num_100_times: [5200, 1200, 700, 1700, 100])
  )
  end
@@ -4043,6 +4055,34 @@ describe Daru::DataFrame do
  end
  end

+ context '#group_by_and_aggregate' do
+ let(:spending_df) {
+ Daru::DataFrame.rows([
+ [2010, 'dev', 50, 1],
+ [2010, 'dev', 150, 1],
+ [2010, 'dev', 200, 1],
+ [2011, 'dev', 50, 1],
+ [2012, 'dev', 150, 1],
+
+ [2011, 'office', 300, 1],
+
+ [2010, 'market', 50, 1],
+ [2011, 'market', 500, 1],
+ [2012, 'market', 500, 1],
+ [2012, 'market', 300, 1],
+
+ [2012, 'R&D', 10, 1],],
+ order: [:year, :category, :spending, :nb_spending])
+ }
+
+ it 'works as group_by + aggregate' do
+ expect(spending_df.group_by_and_aggregate(:year, {spending: :sum})).to eq(
+ spending_df.group_by(:year).aggregate(spending: :sum))
+ expect(spending_df.group_by_and_aggregate([:year, :category], spending: :sum, nb_spending: :size)).to eq(
+ spending_df.group_by([:year, :category]).aggregate(spending: :sum, nb_spending: :size))
+ end
+ end
+
  context '#create_sql' do
  let(:df) { Daru::DataFrame.new({
  a: [1,2,3],
@@ -0,0 +1,5 @@
+ ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
+ 8517337,094652,03/12/2012 02:00:00 PM,027XX S HAMLIN AVE,1152,DECEPTIVE PRACTICE,ILLEGAL USE CASH CARD,ATM (AUTOMATIC TELLER MACHINE),false,true,1031,010,22,30,11,1151482,1885517,2012,02/04/2016 06:33:39 AM,41.841738053,-87.719605942,"(41.841738053, -87.719605942)"
+ 8517338,194241,03/06/2012 10:49:00 PM,102XX S VERNON AVE,0917,MOTOR VEHICLE THEFT,"CYCLE, SCOOTER, BIKE W-VIN",STREET,false,false,0511,005,9,49,07,1181052,1837191,2012,02/04/2016 06:33:39 AM,41.708495677,-87.612580474,"(41.708495677, -87.612580474)"
+ 8517339,194563,02/01/2012 08:15:00 AM,003XX W 108TH ST,0460,BATTERY,SIMPLE,"SCHOOL, PRIVATE, BUILDING",false,false,0513,005,34,49,08B,1176016,1833309,2012,02/04/2016 06:33:39 AM,41.6979571,-87.631138505,"(41.6979571, -87.631138505)"
+ 8517340,194531,03/12/2012 05:50:00 PM,089XX S CARPENTER ST,0560,ASSAULT,SIMPLE,STREET,false,false,2222,022,21,73,08A,1170886,1845421,2012,02/04/2016 06:33:39 AM,41.731307475,-87.649569675,"(41.731307475, -87.649569675)"
@@ -202,8 +202,16 @@ describe Daru::MultiIndex do
  expect(@multi_mi.include?([:a, :one])).to eq(true)
  end

- it "checks for non-existence of a tuple" do
- expect(@multi_mi.include?([:boo])).to eq(false)
+ it "checks for non-existence of completely specified tuple" do
+ expect(@multi_mi.include?([:b, :two, :foo])).to eq(false)
+ end
+
+ it "checks for non-existence of a top layer incomplete tuple" do
+ expect(@multi_mi.include?([:d])).to eq(false)
+ end
+
+ it "checks for non-existence of a middle layer incomplete tuple" do
+ expect(@multi_mi.include?([:c, :three])).to eq(false)
  end
  end

@@ -51,6 +51,16 @@ describe Daru::IO do
  expect(df['Domestic'].to_a).to all be_boolean
  end

+ it "uses the custom string converter correctly" do
+ df = Daru::DataFrame.from_csv 'spec/fixtures/string_converter_test.csv', converters: [:string]
+ expect(df['Case Number'].to_a.all? {|x| String === x }).to be_truthy
+ end
+
+ it "allow symbol to converters option" do
+ df = Daru::DataFrame.from_csv 'spec/fixtures/boolean_converter_test.csv', converters: :boolean
+ expect(df['Domestic'].to_a).to all be_boolean
+ end
+
  it "checks for equal parsing of local CSV files and remote CSV files" do
  %w[matrix_test repeated_fields scientific_notation sales-funnel].each do |file|
  df_local = Daru::DataFrame.from_csv("spec/fixtures/#{file}.csv")
@@ -1808,6 +1808,22 @@ describe Daru::Vector do
  end
  end

+ context '#rolling_fillna' do
+ subject do
+ Daru::Vector.new(
+ [Float::NAN, 2, 1, 4, nil, Float::NAN, 3, nil, Float::NAN]
+ )
+ end
+
+ context 'rolling_fillna forwards' do
+ it { expect(subject.rolling_fillna(:forward).to_a).to eq [0, 2, 1, 4, 4, 4, 3, 3, 3] }
+ end
+
+ context 'rolling_fillna backwards' do
+ it { expect(subject.rolling_fillna(direction: :backward).to_a).to eq [2, 2, 1, 4, 3, 3, 3, 0, 0] }
+ end
+ end
+
  context "#type" do
  before(:each) do
  @numeric = Daru::Vector.new([1,2,3,4,5])
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: daru
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.2.1
  platform: ruby
  authors:
  - Sameer Deshmukh
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-10-31 00:00:00.000000000 Z
+ date: 2018-07-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: backports
@@ -532,6 +532,7 @@ files:
  - spec/fixtures/repeated_fields.csv
  - spec/fixtures/sales-funnel.csv
  - spec/fixtures/scientific_notation.csv
+ - spec/fixtures/string_converter_test.csv
  - spec/fixtures/strings.dat
  - spec/fixtures/test_xls.xls
  - spec/fixtures/url_test.txt~
@@ -569,26 +570,7 @@ homepage: http://github.com/v0dro/daru
  licenses:
  - BSD-2
  metadata: {}
- post_install_message: |
- *************************************************************************
- Thank you for installing daru!
-
- oOOOOOo
- ,| oO
- //| |
- \\| |
- `| |
- `-----`
-
-
- Hope you love daru! For enhanced interactivity and better visualizations,
- consider using gnuplotrb and nyaplot with iruby. For statistics use the
- statsample family.
-
- Read the README for interesting use cases and examples.
-
- Cheers!
- *************************************************************************
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -604,7 +586,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.6.10
+ rubygems_version: 2.6.14
  signing_key:
  specification_version: 4
  summary: Data Analysis in RUby
@@ -638,6 +620,7 @@ test_files:
  - spec/fixtures/repeated_fields.csv
  - spec/fixtures/sales-funnel.csv
  - spec/fixtures/scientific_notation.csv
+ - spec/fixtures/string_converter_test.csv
  - spec/fixtures/strings.dat
  - spec/fixtures/test_xls.xls
  - spec/fixtures/url_test.txt~