daru 0.2.1 → 0.2.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 87e4e2869fe6411e3eece92bb5dc24d48f890774
4
- data.tar.gz: e711d0db1d57f51f31ccb7fb54078a6bdbcc4ff5
2
+ SHA256:
3
+ metadata.gz: 617e082fd3366f695622071cf630690d102552821e82926af81a7007bb09093d
4
+ data.tar.gz: b6b995e35e8124768a15a3e32d1fc38515aecc55f070510d7a51b45945520eb7
5
5
  SHA512:
6
- metadata.gz: afdb295d0d01542ba9f439cf5f7959d7f2a3b9e47de6047ecf7719548ef760e657c0dfe753ed16ee1da65e071bb5a182aaf03ee83c9de6075d54149753b9c346
7
- data.tar.gz: e0c4ace661d9f1cb7e8040d424bb004a0b650a9605037d1aff258258bbac40a3c158e5f5b8a2a5c6a28070cf55566a0729ee9b77c8114d40d4d18cf9d26e69c3
6
+ metadata.gz: 8ae029cac761e4a7164b472ad6ef5275d18aa7d6dace9f61ab4553cc288a1d95e80412804e28a4405444a97dfd1e850ce00783d9b45126c0cb8e0c4dafa09e63
7
+ data.tar.gz: e8aa0aed6c05ec54ba4f5083ed3baa47b6eb02c3304bb0152beea1e57e181aafe45ebc779bbdec76b22b6c86b29f71dc16d013250870589cb93eeae0b8ca0917
data/.gitignore CHANGED
@@ -6,3 +6,4 @@ doc/
6
6
  vendor/
7
7
  profile/out/
8
8
  coverage/
9
+ .ruby-version
@@ -22,7 +22,10 @@ script:
22
22
  - bundle exec yard-junk
23
23
 
24
24
  install:
25
- - gem install bundler
25
+ - if [ $TRAVIS_RUBY_VERSION == '2.2' ] || [ $TRAVIS_RUBY_VERSION == '2.1' ] || [ $TRAVIS_RUBY_VERSION == '2.0' ];
26
+ then gem install bundler -v '~> 1.6';
27
+ else gem install bundler;
28
+ fi
26
29
  - gem install rainbow -v '2.2.1'
27
30
  - bundle install
28
31
 
@@ -22,12 +22,21 @@ And run the test suite (should be all green with pending tests):
22
22
 
23
23
  If you have problems installing nmatrix, please consult the [nmatrix installation wiki](https://github.com/SciRuby/nmatrix/wiki/Installation) or the [mailing list](https://groups.google.com/forum/#!forum/sciruby-dev).
24
24
 
25
+ **NOTE**: `Daru` is compatible with Ruby versions < 2.5. On Ruby >= 2.5 it breaks with the following error:
26
+ ```
27
+ /gems/packable-1.3.10/lib/packable/extensions/io.rb:86:in `pos': Illegal seek @ rb_io_tell - <STDOUT> (Errno::ESPIPE)
28
+ ```
29
+ To reproduce this issue or explore this error further, head over to
30
+ [issue #500](https://github.com/SciRuby/daru/issues/500),
31
+ [issue #503](https://github.com/SciRuby/daru/issues/503). If you want to fix this issue, please discuss it here: [#505](https://github.com/SciRuby/daru/issues/500)
32
+
25
33
  While preparing your pull requests, don't forget to check your code with Rubocop:
26
34
 
27
35
  `bundle exec rubocop`
28
36
 
29
37
  [Optional] Install all Ruby versions which Daru currently supports with `rake spec setup`.
30
38
 
39
+
31
40
  ## Basic Development Flow
32
41
 
33
42
  1. Create a new branch with `git checkout -b <branch_name>`.
data/History.md CHANGED
@@ -1,3 +1,18 @@
1
+ # 0.2.2 (8 August 2019)
2
+
3
+ * Minor Enhancements
4
+ - DataFrame#set_index can take column name array, which results in multi-index https://github.com/SciRuby/daru/pull/471 (by @Yuki-Inoue)
5
+ - implements DataFrame#reset_index https://github.com/SciRuby/daru/pull/473 (by @Yuki-Inoue)
6
+ - Make DataFrame.from_activerecord faster https://github.com/SciRuby/daru/pull/464 (by @paisible-wanderer )
7
+ - Added access_row_tuples_by_indexs method https://github.com/SciRuby/daru/pull/463 (by @Prakriti-nith )
8
+
9
+ * Fixes
10
+ - Fix reindex vector on argument error https://github.com/SciRuby/daru/pull/470 (by @Yuki-Inoue)
11
+ - Optimize aggregation https://github.com/SciRuby/daru/pull/464 (by @paisible-wanderer)
12
+ - Index#dup should copy reference to name too https://github.com/SciRuby/daru/pull/477 (by @Yuki-Inoue)
13
+ - Should support bundler version 2.x.x https://github.com/SciRuby/daru/pull/483/ (by @Shekharrajak )
14
+ - fix table style https://github.com/SciRuby/daru/pull/489 (by @kojix2 )
15
+
1
16
  # 0.2.1 (02 July 2018)
2
17
 
3
18
  * Minor Enhancements
@@ -116,7 +131,7 @@
116
131
  - Support formatting empty dataframes. They were returning an error before. (@gnilrets)
117
132
  - method_missing in Daru::DataFrame would not detect the correct vector if it was a String. Fixed that. (@lokeshh)
118
133
  - Fix docs of contrast_code to specify that the default value is false. (@v0dro)
119
- - Fix occurence of SystemStackError due to faulty arguement passing to Array#values_at. (@v0dro)
134
+ - Fix occurrence of SystemStackError due to faulty argument passing to Array#values_at. (@v0dro)
120
135
  - Fix `DataFrame#pivot_table` regression that raised an ArgumentError if the `:index` option was not specified. (@zverok)
121
136
  - Fix `DateFrame.rows` to accept empty argument. (@zverok)
122
137
  - Fix bug with false values on dataframe create. DataFrame from an Array of hashes wasn't being created properly when some of the values were `false`. (@gnilrets)
data/README.md CHANGED
@@ -11,6 +11,25 @@ daru (Data Analysis in RUby) is a library for storage, analysis, manipulation an
11
11
 
12
12
  daru makes it easy and intuitive to process data predominantly through 2 data structures: `Daru::DataFrame` and `Daru::Vector`. Written in pure Ruby works with all ruby implementations. Tested with MRI 2.0, 2.1, 2.2, 2.3, and 2.4.
13
13
 
14
+
15
+ ## daru plugin gems
16
+
17
+ - **[daru-view](https://github.com/SciRuby/daru-view)**
18
+
19
+ daru-view is for easy and interactive plotting in web applications & IRuby
20
+ notebooks. It can work in any Ruby web application framework, such as Rails, Sinatra, Nanoc and hopefully others too.
21
+
22
+ Articles/Blogs, that summarize powerful features of daru-view:
23
+
24
+ * [GSoC 2017 daru-view](http://sciruby.com/blog/2017/09/01/gsoc-2017-data-visualization-using-daru-view/)
25
+ * [GSoC 2018 Progress Report](https://github.com/SciRuby/daru-view/wiki/GSoC-2018---Progress-Report)
26
+ * [HighCharts Official blog post regarding daru-view](https://www.highcharts.com/blog/post/i-am-ruby-developer-how-can-i-use-highcharts/)
27
+
28
+ - **[daru-io](https://github.com/SciRuby/daru-io)**
29
+
30
+ This gem extends support for many Import and Export methods of `Daru::DataFrame`. This gem is intended to help Rubyists who are into Data Analysis or Web Development, by serving as a general purpose conversion library that takes input in one format (say, JSON) and converts it to another format (say, Avro) while also making it incredibly easy to get started on analyzing data with daru. One can read more in [SciRuby/blog/daru-io](http://sciruby.com/blog/2017/08/29/gsoc-2017-support-to-import-export-of-more-formats/).
31
+
32
+
14
33
  ## Features
15
34
 
16
35
  * Data structures:
@@ -83,9 +102,9 @@ $ gem install daru
83
102
 
84
103
  ### Categorical Data
85
104
 
86
- * [Categorical Index](http://lokeshh.github.io/blog/2016/06/14/categorical-index/)
87
- * [Categorical Data](http://lokeshh.github.io/blog/2016/06/21/categorical-data/)
88
- * [Visualization with Categorical Data](http://lokeshh.github.io/blog/2016/07/02/visualization/)
105
+ * [Categorical Index](http://lokeshh.github.io/gsoc2016/blog/2016/06/14/categorical-index/)
106
+ * [Categorical Data](http://lokeshh.github.io/gsoc2016/blog/2016/06/21/categorical-data/)
107
+ * [Visualization with Categorical Data](http://lokeshh.github.io/gsoc2016/blog/2016/07/02/visualization/)
89
108
 
90
109
  ## Basic Usage
91
110
 
@@ -0,0 +1,34 @@
1
+ $:.unshift File.expand_path("../../lib", __FILE__)
2
+
3
+ require 'benchmark'
4
+ require 'daru'
5
+ require 'sqlite3'
6
+ require 'dbi'
7
+ require 'active_record'
8
+
9
+ db_name = 'daru_test.sqlite'
10
+ FileUtils.rm(db_name) if File.file?(db_name)
11
+
12
+ SQLite3::Database.new(db_name).tap do |db|
13
+ db.execute "create table accounts(id integer, name varchar, age integer, primary key(id))"
14
+
15
+ values = 1.upto(100_000).map { |i| %!(#{i},"name_#{i}",#{rand(100)})! }.join(",")
16
+ db.execute "insert into accounts values #{values}"
17
+ end
18
+
19
+ ActiveRecord::Base.establish_connection("sqlite3:#{db_name}")
20
+ ActiveRecord::Base.connection
21
+
22
+ class Account < ActiveRecord::Base; end
23
+
24
+ Benchmark.bm do |x|
25
+ x.report("DataFrame.from_sql") do
26
+ Daru::DataFrame.from_sql(ActiveRecord::Base.connection, "SELECT * FROM accounts")
27
+ end
28
+
29
+ x.report("DataFrame.from_activerecord") do
30
+ Daru::DataFrame.from_activerecord(Account.all)
31
+ end
32
+ end
33
+
34
+ FileUtils.rm(db_name)
@@ -33,7 +33,7 @@ Gem::Specification.new do |spec|
33
33
  spec.add_runtime_dependency 'packable', '~> 1.3.9'
34
34
 
35
35
  spec.add_development_dependency 'spreadsheet', '~> 1.1.1'
36
- spec.add_development_dependency 'bundler', '~> 1.10'
36
+ spec.add_development_dependency 'bundler', '>= 1.10'
37
37
  spec.add_development_dependency 'rake', '~>10.5'
38
38
  spec.add_development_dependency 'pry', '~> 0.10'
39
39
  spec.add_development_dependency 'pry-byebug'
@@ -49,7 +49,9 @@ Gem::Specification.new do |spec|
49
49
  spec.add_development_dependency 'dbi'
50
50
  spec.add_development_dependency 'activerecord', '~> 4.0'
51
51
  spec.add_development_dependency 'mechanize'
52
- spec.add_development_dependency 'sqlite3'
52
+ # issue : https://github.com/SciRuby/daru/issues/493 occurred
53
+ # with latest version of sqlite3
54
+ spec.add_development_dependency 'sqlite3', '~> 1.3.13'
53
55
  spec.add_development_dependency 'rubocop', '~> 0.49.0'
54
56
  spec.add_development_dependency 'ruby-prof'
55
57
  spec.add_development_dependency 'simplecov'
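
The two constraint changes in this gemspec hunk (`bundler` from `'~> 1.10'` to `'>= 1.10'`, and `sqlite3` pinned to `'~> 1.3.13'`) can be checked with RubyGems' own requirement classes. The following is a minimal sketch; the helper name `allows?` and the concrete version numbers tested are illustrative assumptions, not part of the diff.

```ruby
require 'rubygems' # provides Gem::Requirement and Gem::Version

# Hypothetical helper: true when the version string satisfies the constraint.
def allows?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

old_bundler = '~> 1.10'   # pessimistic: >= 1.10 and < 2.0
new_bundler = '>= 1.10'   # open-ended: any version from 1.10 upwards
sqlite_pin  = '~> 1.3.13' # pessimistic: >= 1.3.13 and < 1.4

puts allows?(old_bundler, '1.17.3') # true
puts allows?(old_bundler, '2.0.2')  # false
puts allows?(new_bundler, '2.0.2')  # true
puts allows?(sqlite_pin, '1.3.13')  # true
puts allows?(sqlite_pin, '1.4.0')   # false
```

Under the old constraint a Bundler 2.x install would be rejected, which appears to be what the "Should support bundler version 2.x.x" changelog entry addresses.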
@@ -74,6 +74,13 @@ module Daru
74
74
  end
75
75
  end
76
76
 
77
+ # this method is overwritten: see Daru::Category#plotting_library=
78
+ def plot(*args, **options, &b)
79
+ init_plotting_library
80
+
81
+ plot(*args, **options, &b)
82
+ end
83
+
77
84
  alias_method :rename, :name=
78
85
 
79
86
  # Returns an enumerator that enumerates on categorical data
@@ -174,7 +181,7 @@ module Daru
174
181
  # Returns vector for indexes/positions specified
175
182
  # @param [Array] indexes for which values has to be retrived
176
183
  # @note Since it accepts both indexes and postions. In case of collision,
177
- # arguement will be treated as index
184
+ # argument will be treated as index
178
185
  # @return vector containing values specified at specified indexes/positions
179
186
  # @example
180
187
  # dv = Daru::Vector.new [:a, 1, :a, 1, :c],
@@ -748,6 +755,11 @@ module Daru
748
755
 
749
756
  private
750
757
 
758
+ # Will lazily load the plotting library being used
759
+ def init_plotting_library
760
+ self.plotting_library = Daru.plotting_library
761
+ end
762
+
751
763
  def validate_categories input_categories
752
764
  raise ArgumentError, 'Input categories and speculated categories mismatch' unless
753
765
  (categories - input_categories).empty?
@@ -768,9 +780,6 @@ module Daru
768
780
  # To link every instance to its category,
769
781
  # it stores integer for every instance representing its category
770
782
  @array = map_cat_int.values_at(*data)
771
-
772
- # Include plotting functionality
773
- self.plotting_library = Daru.plotting_library
774
783
  end
775
784
 
776
785
  def category_from_position position
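
The `plot` method added in this file calls `plot` again after `init_plotting_library`, which looks recursive at first glance. The comment in the diff says the method is overwritten by `plotting_library=`. The sketch below shows one way such lazy loading can work in plain Ruby, assuming the setter extends the object with a plotting module (as `Object#extend` would); `FakePlotting` and `Series` are hypothetical names, not daru classes.

```ruby
# Hypothetical plotting backend; stands in for a real plotting module.
module FakePlotting
  def plot(*args)
    "plotted #{args.inspect}"
  end
end

class Series
  attr_reader :init_calls

  def initialize
    @init_calls = 0
  end

  # Placeholder: runs only until the backend module is mixed in.
  def plot(*args, &block)
    init_plotting_library
    # After `extend`, the singleton class looks up FakePlotting#plot first,
    # so this call does not come back to this method.
    plot(*args, &block)
  end

  private

  # Lazily mixes the backend into this one object.
  def init_plotting_library
    @init_calls += 1
    extend FakePlotting
  end
end

series = Series.new
first  = series.plot(1, 2) # goes through the placeholder once
second = series.plot(3)    # dispatches straight to FakePlotting#plot
puts first, second, series.init_calls
```

The design choice is that objects which are never plotted never pay for loading the plotting library, which matches the removal of the eager `self.plotting_library = ...` call from the constructor in this diff.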
@@ -2,6 +2,7 @@ module Daru
2
2
  module Core
3
3
  class GroupBy
4
4
  class << self
5
+ # @private
5
6
  def get_positions_group_map_on(indexes_with_positions, sort: false)
6
7
  group_map = {}
7
8
 
@@ -17,6 +18,7 @@ module Daru
17
18
  group_map
18
19
  end
19
20
 
21
+ # @private
20
22
  def get_positions_group_for_aggregation(multi_index, level=-1)
21
23
  raise unless multi_index.is_a?(Daru::MultiIndex)
22
24
 
@@ -26,16 +28,19 @@ module Daru
26
28
  get_positions_group_map_on(new_index.each_with_index)
27
29
  end
28
30
 
31
+ # @private
29
32
  def get_positions_group_map_for_df(df, group_by_keys, sort: true)
30
33
  indexes_with_positions = df[*group_by_keys].to_df.each_row.map(&:to_a).each_with_index
31
34
 
32
35
  get_positions_group_map_on(indexes_with_positions, sort: sort)
33
36
  end
34
37
 
38
+ # @private
35
39
  def group_map_from_positions_to_indexes(positions_group_map, index)
36
40
  positions_group_map.map { |k, positions| [k, positions.map { |pos| index.at(pos) }] }.to_h
37
41
  end
38
42
 
43
+ # @private
39
44
  def df_from_group_map(df, group_map, remaining_vectors, from_position: true)
40
45
  return nil if group_map == {}
41
46
 
@@ -52,7 +57,17 @@ module Daru
52
57
  end
53
58
  end
54
59
 
55
- attr_reader :groups, :df
60
+ # lazy accessor/attr_reader for the attribute groups
61
+ def groups
62
+ @groups ||= GroupBy.group_map_from_positions_to_indexes(@groups_by_pos, @context.index)
63
+ end
64
+ alias :groups_by_idx :groups
65
+
66
+ # lazy accessor/attr_reader for the attribute df
67
+ def df
68
+ @df ||= GroupBy.df_from_group_map(@context, @groups_by_pos, @non_group_vectors)
69
+ end
70
+ alias :grouped_df :df
56
71
 
57
72
  # Iterate over each group created by group_by. A DataFrame is yielded in
58
73
  # block.
@@ -75,8 +90,11 @@ module Daru
75
90
  end
76
91
 
77
92
  def initialize context, names
93
+ @group_vectors = names
78
94
  @non_group_vectors = context.vectors.to_a - names
79
- @context = context
95
+
96
+ @context = context # TODO: maybe rename in @original_df or @grouped_db
97
+
80
98
  # FIXME: It feels like we don't want to sort here. Ruby's #group_by
81
99
  # never sorts:
82
100
  #
@@ -84,22 +102,14 @@ module Daru
84
102
  # # => {4=>["test"], 2=>["me"], 6=>["please"]}
85
103
  #
86
104
  # - zverok, 2016-09-12
87
- positions_groups = GroupBy.get_positions_group_map_for_df(@context, names, sort: true)
88
-
89
- @groups = GroupBy.group_map_from_positions_to_indexes(positions_groups, @context.index)
90
- @df = GroupBy.df_from_group_map(@context, positions_groups, @non_group_vectors)
105
+ @groups_by_pos = GroupBy.get_positions_group_map_for_df(@context, @group_vectors, sort: true)
91
106
  end
92
107
 
93
108
  # Get a Daru::Vector of the size of each group.
94
109
  def size
95
- index =
96
- if multi_indexed_grouping?
97
- Daru::MultiIndex.from_tuples @groups.keys
98
- else
99
- Daru::Index.new @groups.keys.flatten
100
- end
110
+ index = get_grouped_index
101
111
 
102
- values = @groups.values.map(&:size)
112
+ values = @groups_by_pos.values.map(&:size)
103
113
  Daru::Vector.new(values, index: index, name: :size)
104
114
  end
105
115
 
@@ -246,7 +256,7 @@ module Daru
246
256
  # # a b c d
247
257
  # # 5 bar two 6 66
248
258
  def get_group group
249
- indexes = @groups[group]
259
+ indexes = groups_by_idx[group]
250
260
  elements = @context.each_vector.map(&:to_a)
251
261
  transpose = elements.transpose
252
262
  rows = indexes.each.map { |idx| transpose[idx] }
@@ -273,7 +283,7 @@ module Daru
273
283
  # # a ACE
274
284
  # # b BDF
275
285
  def reduce(init=nil)
276
- result_hash = @groups.each_with_object({}) do |(group, indices), h|
286
+ result_hash = groups_by_idx.each_with_object({}) do |(group, indices), h|
277
287
  group_indices = indices.map { |v| @context.index.to_a[v] }
278
288
 
279
289
  grouped_result = init
@@ -284,18 +294,13 @@ module Daru
284
294
  h[group] = grouped_result
285
295
  end
286
296
 
287
- index =
288
- if multi_indexed_grouping?
289
- Daru::MultiIndex.from_tuples result_hash.keys
290
- else
291
- Daru::Index.new result_hash.keys.flatten
292
- end
297
+ index = get_grouped_index(result_hash.keys)
293
298
 
294
299
  Daru::Vector.new(result_hash.values, index: index)
295
300
  end
296
301
 
297
302
  def inspect
298
- @df.inspect
303
+ grouped_df.inspect
299
304
  end
300
305
 
301
306
  # Function to use for aggregating the data.
@@ -335,7 +340,9 @@ module Daru
335
340
  # Ram Hyderabad,Mumbai
336
341
  #
337
342
  def aggregate(options={})
338
- @df.aggregate(options)
343
+ new_index = get_grouped_index
344
+
345
+ @context.aggregate(options) { [@groups_by_pos.values, new_index] }
339
346
  end
340
347
 
341
348
  private
@@ -344,7 +351,7 @@ module Daru
344
351
  selection = @context
345
352
  rows, indexes = [], []
346
353
 
347
- @groups.each_value do |index|
354
+ groups_by_idx.each_value do |index|
348
355
  index.send(method, quantity).each do |idx|
349
356
  rows << selection.row[idx].to_a
350
357
  indexes << idx
@@ -360,29 +367,31 @@ module Daru
360
367
  method_type == :numeric && @context[ngvec].type == :numeric
361
368
  end
362
369
 
363
- rows = @groups.map do |_group, indexes|
370
+ rows = groups_by_idx.map do |_group, indexes|
364
371
  order.map do |ngvector|
365
372
  slice = @context[ngvector][*indexes]
366
373
  slice.is_a?(Daru::Vector) ? slice.send(method) : slice
367
374
  end
368
375
  end
369
376
 
370
- index = apply_method_index
377
+ index = get_grouped_index
371
378
  order = Daru::Index.new(order)
372
379
  Daru::DataFrame.new(rows.transpose, index: index, order: order)
373
380
  end
374
381
 
375
- def apply_method_index
382
+ def get_grouped_index(index_tuples=nil)
383
+ index_tuples = @groups_by_pos.keys if index_tuples.nil?
384
+
376
385
  if multi_indexed_grouping?
377
- Daru::MultiIndex.from_tuples(@groups.keys)
386
+ Daru::MultiIndex.from_tuples(index_tuples)
378
387
  else
379
- Daru::Index.new(@groups.keys.flatten)
388
+ Daru::Index.new(index_tuples.flatten)
380
389
  end
381
390
  end
382
391
 
383
392
  def multi_indexed_grouping?
384
- return false unless @groups.keys[0]
385
- @groups.keys[0].size > 1
393
+ return false unless @groups_by_pos.keys[0]
394
+ @groups_by_pos.keys[0].size > 1
386
395
  end
387
396
  end
388
397
  end
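
The `GroupBy` changes above keep only the position map (`@groups_by_pos`) at construction time and compute `groups` and `df` on first access with `||=`. A minimal plain-Ruby sketch of that memoization pattern follows; `LazyGrouping`, its `labels` argument and the `computations` counter are hypothetical and exist only to make the laziness observable.

```ruby
class LazyGrouping
  attr_reader :computations

  # rows:   one grouping key per row
  # labels: one index label per row (same length as rows)
  def initialize(rows, labels)
    @labels = labels
    @computations = 0
    # Eager and cheap: key => array of row positions.
    @groups_by_pos = rows.each_with_index
                         .group_by { |key, _pos| key }
                         .map { |key, pairs| [key, pairs.map(&:last)] }
                         .to_h
  end

  # Lazy accessor: positions are translated to labels on first use only.
  def groups
    @groups ||= begin
      @computations += 1
      @groups_by_pos.map { |key, positions| [key, positions.map { |pos| @labels[pos] }] }.to_h
    end
  end

  # Needs only the position map, so it never triggers the lazy step.
  def size
    @groups_by_pos.map { |key, positions| [key, positions.size] }.to_h
  end
end

grouping = LazyGrouping.new(%w[a b a], %i[x y z])
p grouping.size         # uses positions only
p grouping.computations # still 0
p grouping.groups       # computed now, cached afterwards
```

Callers that only need sizes or aggregates skip the translation step entirely, which is presumably the point of the "Optimize aggregation" entry in the changelog.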
@@ -10,7 +10,8 @@ module Daru
10
10
  include Daru::Maths::Arithmetic::DataFrame
11
11
  include Daru::Maths::Statistics::DataFrame
12
12
  # TODO: Remove this line but its causing erros due to unkown reason
13
- include Daru::Plotting::DataFrame::NyaplotLibrary if Daru.has_nyaplot?
13
+ Daru.has_nyaplot?
14
+
14
15
  extend Gem::Deprecate
15
16
 
16
17
  class << self
@@ -346,20 +347,19 @@ module Daru
346
347
  @name = opts[:name]
347
348
 
348
349
  case source
349
- when ->(s) { s.empty? }
350
- @vectors = Index.coerce vectors
351
- @index = Index.coerce index
352
- create_empty_vectors
350
+ when [], {}
351
+ create_empty_vectors(vectors, index)
353
352
  when Array
354
353
  initialize_from_array source, vectors, index, opts
355
354
  when Hash
356
355
  initialize_from_hash source, vectors, index, opts
356
+ when ->(s) { s.empty? } # TODO: likely want to remove this case
357
+ create_empty_vectors(vectors, index)
357
358
  end
358
359
 
359
360
  set_size
360
361
  validate
361
362
  update
362
- self.plotting_library = Daru.plotting_library
363
363
  end
364
364
 
365
365
  def plotting_library= lib
@@ -372,11 +372,18 @@ module Daru
372
372
  )
373
373
  end
374
374
  else
375
- raise ArguementError, "Plotting library #{lib} not supported. "\
375
+ raise ArgumentError, "Plotting library #{lib} not supported. "\
376
376
  'Supported libraries are :nyaplot and :gruff'
377
377
  end
378
378
  end
379
379
 
380
+ # this method is overwritten: see Daru::DataFrame#plotting_library=
381
+ def plot(*args, **options, &b)
382
+ init_plotting_library
383
+
384
+ plot(*args, **options, &b)
385
+ end
386
+
380
387
  # Access row or vector. Specify name of row/vector followed by axis(:row, :vector).
381
388
  # Defaults to *:vector*. Use of this method is not recommended for accessing
382
389
  # rows. Use df.row[:a] for accessing row with index ':a'.
@@ -404,13 +411,11 @@ module Daru
404
411
  validate_positions(*positions, nrows)
405
412
 
406
413
  if positions.is_a? Integer
407
- return Daru::Vector.new @data.map { |vec| vec.at(*positions) },
408
- index: @vectors
414
+ row = get_rows_for([positions])
415
+ Daru::Vector.new row, index: @vectors
409
416
  else
410
- new_rows = @data.map { |vec| vec.at(*original_positions) }
411
- return Daru::DataFrame.new new_rows,
412
- index: @index.at(*original_positions),
413
- order: @vectors
417
+ new_rows = get_rows_for(original_positions)
418
+ Daru::DataFrame.new new_rows, index: @index.at(*original_positions), order: @vectors
414
419
  end
415
420
  end
416
421
 
@@ -621,7 +626,7 @@ module Daru
621
626
  deprecate :dup_only_valid, :reject_values, 2016, 10
622
627
 
623
628
  # Returns a dataframe in which rows with any of the mentioned values
624
- # are ignored.
629
+ # are ignored.
625
630
  # @param [Array] values to reject to form the new dataframe
626
631
  # @return [Daru::DataFrame] Data Frame with only rows which doesn't
627
632
  # contain the mentioned values
@@ -752,7 +757,7 @@ module Daru
752
757
  # 3 4 d
753
758
  #
754
759
  def uniq(*vtrs)
755
- vecs = vtrs.empty? ? vectors.map(&:to_s) : Array(vtrs)
760
+ vecs = vtrs.empty? ? vectors.to_a : Array(vtrs)
756
761
  grouped = group_by(vecs)
757
762
  indexes = grouped.groups.values.map { |v| v[0] }.sort
758
763
  row[*indexes]
@@ -1011,6 +1016,7 @@ module Daru
1011
1016
  case method
1012
1017
  when Symbol then df.send(method)
1013
1018
  when Proc then method.call(df)
1019
+ when Array then method.map(&:to_proc).map { |proc| proc.call(df) } # works with Array of both Symbol and/or Proc
1014
1020
  else raise
1015
1021
  end
1016
1022
  end
@@ -1489,7 +1495,7 @@ module Daru
1489
1495
  def reindex_vectors new_vectors
1490
1496
  unless new_vectors.is_a?(Daru::Index)
1491
1497
  raise ArgumentError, 'Must pass the new index of type Index or its '\
1492
- "subclasses, not #{new_index.class}"
1498
+ "subclasses, not #{new_vectors.class}"
1493
1499
  end
1494
1500
 
1495
1501
  cl = Daru::DataFrame.new({}, order: new_vectors, index: @index, name: @name)
@@ -1527,14 +1533,52 @@ module Daru
1527
1533
  df
1528
1534
  end
1529
1535
 
1536
+ module SetSingleIndexStrategy
1537
+ def self.uniq_size(df, col)
1538
+ df[col].uniq.size
1539
+ end
1540
+
1541
+ def self.new_index(df, col)
1542
+ Daru::Index.new(df[col].to_a)
1543
+ end
1544
+
1545
+ def self.delete_vector(df, col)
1546
+ df.delete_vector(col)
1547
+ end
1548
+ end
1549
+
1550
+ module SetMultiIndexStrategy
1551
+ def self.uniq_size(df, cols)
1552
+ df[*cols].uniq.size
1553
+ end
1554
+
1555
+ def self.new_index(df, cols)
1556
+ Daru::MultiIndex.from_arrays(df[*cols].map_vectors(&:to_a)).tap do |mi|
1557
+ mi.name = cols
1558
+ mi
1559
+ end
1560
+ end
1561
+
1562
+ def self.delete_vector(df, cols)
1563
+ df.delete_vectors(*cols)
1564
+ end
1565
+ end
1566
+
1530
1567
  # Set a particular column as the new DF
1531
- def set_index new_index, opts={}
1532
- raise ArgumentError, 'All elements in new index must be unique.' if
1533
- @size != self[new_index].uniq.size
1568
+ def set_index new_index_col, opts={}
1569
+ if new_index_col.respond_to?(:to_a)
1570
+ strategy = SetMultiIndexStrategy
1571
+ new_index_col = new_index_col.to_a
1572
+ else
1573
+ strategy = SetSingleIndexStrategy
1574
+ end
1534
1575
 
1535
- self.index = Daru::Index.new(self[new_index].to_a)
1536
- delete_vector(new_index) unless opts[:keep]
1576
+ uniq_size = strategy.uniq_size(self, new_index_col)
1577
+ raise ArgumentError, 'All elements in new index must be unique.' if
1578
+ @size != uniq_size
1537
1579
 
1580
+ self.index = strategy.new_index(self, new_index_col)
1581
+ strategy.delete_vector(self, new_index_col) unless opts[:keep]
1538
1582
  self
1539
1583
  end
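
The new `set_index` picks a strategy module based on `respond_to?(:to_a)` and then calls the same three methods on whichever module was chosen. The sketch below reproduces that dispatch over plain arrays of hashes rather than a `Daru::DataFrame`; `SingleKeyStrategy`, `MultiKeyStrategy` and `build_index` are hypothetical names.

```ruby
# Strategy for a single column name: the index is a flat list of values.
module SingleKeyStrategy
  def self.index_for(rows, key)
    rows.map { |row| row[key] }
  end
end

# Strategy for several column names: the index is a list of tuples.
module MultiKeyStrategy
  def self.index_for(rows, keys)
    rows.map { |row| keys.map { |key| row[key] } }
  end
end

def build_index(rows, key_or_keys)
  # Same test as in the diff: array-like arguments select the multi strategy.
  if key_or_keys.respond_to?(:to_a)
    strategy = MultiKeyStrategy
    key_or_keys = key_or_keys.to_a
  else
    strategy = SingleKeyStrategy
  end

  index = strategy.index_for(rows, key_or_keys)
  raise ArgumentError, 'All elements in new index must be unique.' if
    index.uniq.size != rows.size

  index
end

rows = [{ a: 1, b: 'x' }, { a: 1, b: 'y' }]
p build_index(rows, :b)       # single key
p build_index(rows, %i[a b])  # tuples from two keys
```

A Symbol or String column name does not respond to `to_a`, so it takes the single-column path, while an Array of names takes the multi-index path.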
1540
1584
 
@@ -1572,11 +1616,24 @@ module Daru
1572
1616
  end
1573
1617
  end
1574
1618
 
1619
+ def reset_index
1620
+ index_df = index.to_df
1621
+ names = index.name
1622
+ names = [names] unless names.instance_of?(Array)
1623
+ new_vectors = names + vectors.to_a
1624
+ self.index = index_df.index
1625
+ names.each do |name|
1626
+ self[name] = index_df[name]
1627
+ end
1628
+ self.order = new_vectors
1629
+ self
1630
+ end
1631
+
1575
1632
  # Reassign index with a new index of type Daru::Index or any of its subclasses.
1576
1633
  #
1577
1634
  # @param [Daru::Index] idx New index object on which the rows of the dataframe
1578
1635
  # are to be indexed.
1579
- # @example Reassgining index of a DataFrame
1636
+ # @example Reassigning index of a DataFrame
1580
1637
  # df = Daru::DataFrame.new({a: [1,2,3,4], b: [11,22,33,44]})
1581
1638
  # df.index.to_a #=> [0,1,2,3]
1582
1639
  #
@@ -2088,7 +2145,7 @@ module Daru
2088
2145
 
2089
2146
  # Write this DataFrame to a CSV file.
2090
2147
  #
2091
- # == Arguements
2148
+ # == Arguments
2092
2149
  #
2093
2150
  # * filename - Path of CSV file where the DataFrame is to be saved.
2094
2151
  #
@@ -2264,7 +2321,7 @@ module Daru
2264
2321
  # # 2 3]
2265
2322
  def split_by_category cat_name
2266
2323
  cat_dv = self[cat_name]
2267
- raise ArguementError, "#{cat_name} is not a category vector" unless
2324
+ raise ArgumentError, "#{cat_name} is not a category vector" unless
2268
2325
  cat_dv.category?
2269
2326
 
2270
2327
  cat_dv.categories.map do |cat|
@@ -2274,6 +2331,50 @@ module Daru
2274
2331
  end
2275
2332
  end
2276
2333
 
2334
+ # @param indexes [Array] index(s) at which row tuples are retrieved
2335
+ # @return [Array] returns array of row tuples at given index(s)
2336
+ # @example Using Daru::Index
2337
+ # df = Daru::DataFrame.new({
2338
+ # a: [1, 2, 3],
2339
+ # b: ['a', 'a', 'b']
2340
+ # })
2341
+ #
2342
+ # df.access_row_tuples_by_indexs(1,2)
2343
+ # # => [[2, "a"], [3, "b"]]
2344
+ #
2345
+ # df.index = Daru::Index.new([:one,:two,:three])
2346
+ # df.access_row_tuples_by_indexs(:one,:three)
2347
+ # # => [[1, "a"], [3, "b"]]
2348
+ #
2349
+ # @example Using Daru::MultiIndex
2350
+ # mi_idx = Daru::MultiIndex.from_tuples [
2351
+ # [:a,:one,:bar],
2352
+ # [:a,:one,:baz],
2353
+ # [:b,:two,:bar],
2354
+ # [:a,:two,:baz],
2355
+ # ]
2356
+ # df_mi = Daru::DataFrame.new({
2357
+ # a: 1..4,
2358
+ # b: 'a'..'d'
2359
+ # }, index: mi_idx )
2360
+ #
2361
+ # df_mi.access_row_tuples_by_indexs(:b, :two, :bar)
2362
+ # # => [[3, "c"]]
2363
+ # df_mi.access_row_tuples_by_indexs(:a)
2364
+ # # => [[1, "a"], [2, "b"], [4, "d"]]
2365
+ def access_row_tuples_by_indexs *indexes
2366
+ return get_sub_dataframe(indexes, by_position: false).map_rows(&:to_a) if
2367
+ @index.is_a?(Daru::MultiIndex)
2368
+ positions = @index.pos(*indexes)
2369
+ if positions.is_a? Numeric
2370
+ row = get_rows_for([positions])
2371
+ row.first.is_a?(Array) ? row : [row]
2372
+ else
2373
+ new_rows = get_rows_for(indexes, by_position: false)
2374
+ indexes.map { |index| new_rows.map { |r| r[index] } }
2375
+ end
2376
+ end
2377
+
2277
2378
  # Function to use for aggregating the data.
2278
2379
  #
2279
2380
  # @param options [Hash] options for column, you want in resultant dataframe
@@ -2322,25 +2423,28 @@ module Daru
2322
2423
  # Note: `GroupBy` class `aggregate` method uses this `aggregate` method
2323
2424
  # internally.
2324
2425
  def aggregate(options={}, multi_index_level=-1)
2325
- positions_tuples, new_index = group_index_for_aggregation(@index, multi_index_level)
2426
+ if block_given?
2427
+ positions_tuples, new_index = yield(@index) # note: use of yield is private for now
2428
+ else
2429
+ positions_tuples, new_index = group_index_for_aggregation(@index, multi_index_level)
2430
+ end
2326
2431
 
2327
2432
  colmn_value = aggregate_by_positions_tuples(options, positions_tuples)
2328
2433
 
2329
2434
  Daru::DataFrame.new(colmn_value, index: new_index, order: options.keys)
2330
2435
  end
2331
2436
 
2332
- # Is faster than using group_by followed by aggregate (because it doesn't generate an intermediary dataframe)
2333
2437
  def group_by_and_aggregate(*group_by_keys, **aggregation_map)
2334
- positions_groups = Daru::Core::GroupBy.get_positions_group_map_for_df(self, group_by_keys.flatten, sort: true)
2335
-
2336
- new_index = Daru::MultiIndex.from_tuples(positions_groups.keys).coerce_index
2337
- colmn_value = aggregate_by_positions_tuples(aggregation_map, positions_groups.values)
2338
-
2339
- Daru::DataFrame.new(colmn_value, index: new_index, order: aggregation_map.keys)
2438
+ group_by(*group_by_keys).aggregate(aggregation_map)
2340
2439
  end
2341
2440
 
2342
2441
  private
2343
2442
 
2443
+ # Will lazily load the plotting library being used for this dataframe
2444
+ def init_plotting_library
2445
+ self.plotting_library = Daru.plotting_library
2446
+ end
2447
+
2344
2448
  def headers
2345
2449
  Daru::Index.new(Array(index.name) + @vectors.to_a)
2346
2450
  end
@@ -2452,19 +2556,30 @@ module Daru
2452
2556
  positions = @index.pos(*indexes)
2453
2557
 
2454
2558
  if positions.is_a? Numeric
2455
- return Daru::Vector.new populate_row_for(positions),
2456
- index: @vectors,
2457
- name: indexes.first
2559
+ row = get_rows_for([positions])
2560
+ Daru::Vector.new row, index: @vectors, name: indexes.first
2458
2561
  else
2459
- new_rows = @data.map { |vec| vec[*indexes] }
2460
- return Daru::DataFrame.new new_rows,
2461
- index: @index.subset(*indexes),
2462
- order: @vectors
2562
+ new_rows = get_rows_for(indexes, by_position: false)
2563
+ Daru::DataFrame.new new_rows, index: @index.subset(*indexes), order: @vectors
2463
2564
  end
2464
2565
  end
2465
2566
 
2466
- def populate_row_for pos
2467
- @data.map { |vector| vector.at(*pos) }
2567
+ # @param keys [Array] can be an array of positions (if by_position is true) or indexes (if by_position is false)
2568
+ # because of coercion by Daru::Vector#at and Daru::Vector#[], can return either an Array of
2569
+ # values (representing a row) or an array of Vectors (that can be seen as rows)
2570
+ def get_rows_for(keys, by_position: true)
2571
+ raise unless keys.is_a?(Array)
2572
+
2573
+ if by_position
2574
+ pos = keys
2575
+ @data.map { |vector| vector.at(*pos) }
2576
+ else
2577
+ # TODO: for now (2018-07-27), it is different than using
2578
+ # get_rows_for(@index.pos(*keys))
2579
+ # because Daru::Vector#at and Daru::Vector#[] don't handle Daru::MultiIndex the same way
2580
+ indexes = keys
2581
+ @data.map { |vec| vec[*indexes] }
2582
+ end
2468
2583
  end
2469
2584
 
2470
2585
  def insert_or_modify_vector name, vector
@@ -2565,7 +2680,10 @@ module Daru
2565
2680
  set_size
2566
2681
  end
2567
2682
 
2568
- def create_empty_vectors
2683
+ def create_empty_vectors(vectors, index)
2684
+ @vectors = Index.coerce vectors
2685
+ @index = Index.coerce index
2686
+
2569
2687
  @data = @vectors.map do |name|
2570
2688
  Daru::Vector.new([], name: coerce_name(name), index: @index)
2571
2689
  end
@@ -2885,7 +3003,6 @@ module Daru
2885
3003
 
2886
3004
  # Raises IndexError when one of the positions is not a valid position
2887
3005
  def validate_positions *positions, size
2888
- positions = [positions] if positions.is_a? Integer
2889
3006
  positions.each do |pos|
2890
3007
  raise IndexError, "#{pos} is not a valid position." if pos >= size
2891
3008
  end
@@ -2910,28 +3027,57 @@ module Daru
2910
3027
  end
2911
3028
 
2912
3029
  def aggregate_by_positions_tuples(options, positions_tuples)
2913
- options.map do |vect, method|
2914
- if @vectors.include?(vect)
2915
- vect = self[vect]
3030
+ agg_over_vectors_only, options = cast_aggregation_options(options)
3031
+
3032
+ if agg_over_vectors_only
3033
+ options.map do |vect_name, method|
3034
+ vect = self[vect_name]
2916
3035
 
2917
3036
  positions_tuples.map do |positions|
2918
3037
  vect.apply_method_on_sub_vector(method, keys: positions)
2919
3038
  end
2920
- else
2921
- positions_tuples.map do |positions|
2922
- apply_method_on_sub_df(method, keys: positions)
2923
- end
2924
3039
  end
3040
+ else
3041
+ methods = options.values
3042
+
3043
+ # note: because we aggregate over rows, we don't have to re-get sub-dfs for each method (which is expensive)
3044
+ rows = positions_tuples.map do |positions|
3045
+ apply_method_on_sub_df(methods, keys: positions)
3046
+ end
3047
+
3048
+ rows.transpose
3049
+ end
3050
+ end
3051
+
3052
+ # convert operations over sub-vectors to operations over sub-dfs when it improves perf
3053
+ # note: we don't always "cast" because aggregation over a single vector / a few vector is faster
3054
+ # than aggregation over (sub-)dfs
3055
+ def cast_aggregation_options(options)
3056
+ vects, non_vects = options.keys.partition { |k| @vectors.include?(k) }
3057
+
3058
+ over_vectors = true
3059
+
3060
+ if non_vects.any?
3061
+ options = options.clone
3062
+
3063
+ vects.each do |name|
3064
+ proc_on_vect = options[name].to_proc
3065
+ options[name] = ->(sub_df) { proc_on_vect.call(sub_df[name]) }
3066
+ end
3067
+
3068
+ over_vectors = false
2925
3069
  end
3070
+
3071
+ [over_vectors, options]
2926
3072
  end
2927
3073
 
2928
3074
  def group_index_for_aggregation(index, multi_index_level=-1)
2929
3075
  case index
2930
3076
  when Daru::MultiIndex
2931
- groups = Daru::Core::GroupBy.get_positions_group_for_aggregation(index, multi_index_level)
2932
- new_index, pos_tuples = groups.keys, groups.values
3077
+ groups_by_pos = Daru::Core::GroupBy.get_positions_group_for_aggregation(index, multi_index_level)
2933
3078
 
2934
- new_index = Daru::MultiIndex.from_tuples(new_index).coerce_index
3079
+ new_index = Daru::MultiIndex.from_tuples(groups_by_pos.keys).coerce_index
3080
+ pos_tuples = groups_by_pos.values
2935
3081
  when Daru::Index, Daru::CategoricalIndex
2936
3082
  new_index = Array(index).uniq
2937
3083
  pos_tuples = new_index.map { |idx| [*index.pos(idx)] }
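
`cast_aggregation_options` in this hunk decides whether every aggregation targets an existing vector. If not, it wraps the per-vector operations so that all of them can run over one sub-dataframe per group. The sketch below mimics that idea with a Hash of arrays in place of a DataFrame; `TABLE`, `sub_table`, `cast_options` and `aggregate` are hypothetical stand-ins, not daru's implementation.

```ruby
# Hypothetical stand-in for a data frame: column name => values.
TABLE = { a: [1, 2, 3, 4], b: [10, 20, 30, 40] }.freeze

# Builds the sub-table for one group of row positions.
def sub_table(table, positions)
  table.map { |name, values| [name, values.values_at(*positions)] }.to_h
end

# Returns [over_columns_only, options]. When some keys are not columns,
# column operations are wrapped to accept a whole sub-table instead.
def cast_options(options, column_names)
  cols, non_cols = options.keys.partition { |key| column_names.include?(key) }
  return [true, options] if non_cols.empty?

  wrapped = options.dup
  cols.each do |name|
    per_column = options[name].to_proc
    wrapped[name] = ->(table) { per_column.call(table[name]) }
  end
  [false, wrapped]
end

def aggregate(table, options, groups)
  over_columns, options = cast_options(options, table.keys)

  if over_columns
    # One pass per column; no sub-tables are built.
    options.map do |name, meth|
      groups.map { |positions| meth.to_proc.call(table[name].values_at(*positions)) }
    end
  else
    # One sub-table per group, shared by every operation.
    rows = groups.map do |positions|
      sub = sub_table(table, positions)
      options.values.map { |fn| fn.to_proc.call(sub) }
    end
    rows.transpose
  end
end

groups = [[0, 1], [2, 3]]
p aggregate(TABLE, { a: :sum, b: :max }, groups)
p aggregate(TABLE, { a: :sum, spread: ->(t) { t[:b].max - t[:a].min } }, groups)
```

Both branches return one array per requested output column, so the caller can build the result the same way in either case.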
@@ -2950,7 +3096,7 @@ module Daru
2950
3096
  when Range
2951
3097
  size.times.to_a[positions.first]
2952
3098
  else
2953
- raise ArgumentError, 'Unkown position type.'
3099
+ raise ArgumentError, 'Unknown position type.'
2954
3100
  end
2955
3101
  else
2956
3102
  positions