daru 0.1.6 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 10338f8d554cc2c70b2dcc2d8fd029e73446f4de
4
- data.tar.gz: b848b5923eebe90577ef93d4c9e988d7aa09fe9b
3
+ metadata.gz: 69452b32fd8ef0ef7fb4ed58ab53ffa8aa15806d
4
+ data.tar.gz: 56927c77adbe7941eb2ca9a5e44d705931aad237
5
5
  SHA512:
6
- metadata.gz: ad6c0de4217a65a2c6245c4b969e98d970da47a08ce8128b060cd06a13ae415df6d5e7fdc7bba4c84115fc8559e80bea0a39c87efb1c3401df358f4df5d43117
7
- data.tar.gz: 34abe4afd0c88c24d3d0bbd6f11df5efe554f50b0d3d045a8a969f4eae2232d8d1cfeead814e336da507ca94b3126447631699258fe1a5dc5ea736de77c587f8
6
+ metadata.gz: 8e7511133b3409f7821cfec944a950d53df57bcd5893bb8a9557c013f31bf1e4a9cc07bbe1c143c63684f00f7d8d8f1adf3b31df732508e667ba6677f47d1d96
7
+ data.tar.gz: fc4beb70106372a276b21e0da645951595e5674f56e4422752aeeabc9cc2156983add90e59486aea4d88386fbeb2896d15f7ede30667bc84027abd900ee42e0e
@@ -0,0 +1,18 @@
1
+ Heya! We are glad you are going to contribute to Daru by creating an issue, and kindly ask you to
2
+ follow the simple rules:
3
+
4
+ 1. If it is a bug report, please provide a **self-containing** Ruby code for reproducing the bug.
5
+ This means if Daru contributors just copy-paste the code from issue into `this-is-bug.rb` and run
6
+ `ruby this-is-bug.rb`, it will be reproduced. If the bug is hard to spot (e.g. it is not some
7
+ `NoMethodError`, but the differences in data structure), please show it with comment in code or
8
+ plain text in the issue.
9
+ 2. If it is a feature request, try to do the following (if possible):
10
+ * show how new feature will work with small code example;
11
+ * explain the use case (if it is not 200% obvious);
12
+ * if you are aware of it, show how it works in pandas and/or R.
13
+ 3. If it is just a question ("how to do this or that" or "why Daru does this or that") feel free to
14
+ write it in any form that is convenient to you, but remember code examples and use cases are always
15
+ welcome.
16
+
17
+ Thanks! And please remove this text when finished with your issue description :)
18
+
@@ -12,6 +12,7 @@ AllCops:
12
12
  - 'vendor/**/*'
13
13
  - 'benchmarks/*'
14
14
  - 'profile/*'
15
+ - 'tmp/*'
15
16
  DisplayCopNames: true
16
17
  TargetRubyVersion: 2.0
17
18
 
@@ -9,12 +9,17 @@ rvm:
9
9
  - '2.4.0'
10
10
 
11
11
  matrix:
12
+ allow_failures:
13
+ - rvm: '2.0'
12
14
  fast_finish:
13
15
  true
14
16
 
15
17
  script:
18
+ - bundle add yard-junk
19
+ - bundle install
16
20
  - bundle exec rspec
17
21
  - bundle exec rubocop
22
+ - bundle exec yard-junk
18
23
 
19
24
  install:
20
25
  - gem install bundler
data/History.md CHANGED
@@ -1,3 +1,31 @@
1
+ # 0.2.0 (31 October 2017)
2
+ * Major Enhancements
3
+ - Add `DataFrame#which` query DSL (experimental! @rainchen)
4
+ - Add `DataFrame/Vector#rolling_fillna` (@baarkerlounger)
5
+ - Add `GroupBy#aggregate` (@shekharrajak)
6
+ - Add `DataFrame#uniq` (@baarkerlounger)
7
+
8
+ * Minor Enhancements
9
+ - Allow `Vector#count` to be called without param for category type Vector (@rainchen)
10
+ - Add option to `DataFrame#vector_sum` to skip nils (@parthm)
11
+ - Add installation instructions to README.md (@koishimasato)
12
+ - Add release policy documentation (@baarkerlounger)
13
+ - Set index as DataFrame's default x axis for nyaplot (@matugm)
14
+
15
+ * Fixes
16
+ - Fix `DataFrame/Vector#to_s` when name is a symbol (@baarkerlounger)
17
+ - Force `Vector#proportions` to return float (@rainchen)
18
+ - `DataFrame#new` creates empty DataFrame when given empty hash (@parthm)
19
+ - Remove unnecessary backports dependencies (@zverok)
20
+ - Specify minimum packable dependency (@zverok)
21
+ - Preserve key/column order when creating DataFrame from hash (@baarkerlounger)
22
+ - Fix `DataFrame#add_row` for DF with multi-index (@zverok)
23
+ - Fix `Vector#min, `#max`, `#index_of_min`, `#index_of_max` (0.1.6 regression) (@athityakumar)
24
+ - Integrate yard-junk into CI (@rohitner)
25
+ - Remove Travis spec restriction (@zverok)
26
+ - Fix tuple sorting for DataFrames with nils (@baarkerlounger)
27
+ - Fix merge on index dropping default index (@rohitner)
28
+
1
29
  # 0.1.6 (04 August 2017)
2
30
  * Major Enhancements
3
31
  - Add support for reading HTML tables into DataFrames (@athityakumar)
data/README.md CHANGED
@@ -26,6 +26,12 @@ daru makes it easy and intuitive to process data predominantly through 2 data st
26
26
  * Quickly reducing data with pivot tables for quick data summary.
27
27
  * Import and export data from and to Excel, CSV, SQL Databases, ActiveRecord and plain text files.
28
28
 
29
+ ## Installation
30
+
31
+ ```console
32
+ $ gem install daru
33
+ ```
34
+
29
35
  ## Notebooks
30
36
 
31
37
  #### Notebooks on most use cases
@@ -0,0 +1,20 @@
1
+ # Gem Release Policy
2
+
3
+ Applicable to Daru > 0.1.6
4
+
5
+ ## Versioning
6
+
7
+ Daru follows semantic versioning whereby the version number is always in the form MAJOR.MINOR.PATCH
8
+
9
+ * Patch bump = Bug fixes
10
+ * Minor bump = New features but backwards compatible
11
+ * Major bump = API breaking changes
12
+
13
+ For Major and Minor bumps release candidates should be released around 2 weeks prior to the bump and are indicated by MAJOR.MINOR.0.rc.
14
+
15
+ For more information see the full semantic versioning specification at http://semver.org/.
16
+
17
+ ## Release Timing
18
+
19
+ Patch releases should be done after every fix of a major bug (as tagged in the github issue tracker).
20
+ Major releases should be kept to the minimum.
@@ -52,6 +52,9 @@ EOF
52
52
 
53
53
  spec.add_runtime_dependency 'backports'
54
54
 
55
+ # it is required by NMatrix, yet we want to specify clearly which minimal version is OK
56
+ spec.add_runtime_dependency 'packable', '~> 1.3.9'
57
+
55
58
  spec.add_development_dependency 'spreadsheet', '~> 1.1.1'
56
59
  spec.add_development_dependency 'bundler', '~> 1.10'
57
60
  spec.add_development_dependency 'rake', '~>10.5'
@@ -75,6 +78,7 @@ EOF
75
78
  spec.add_development_dependency 'simplecov'
76
79
  spec.add_development_dependency 'gruff'
77
80
  spec.add_development_dependency 'webmock'
81
+
78
82
  if RUBY_VERSION < '2.1.0'
79
83
  spec.add_development_dependency 'nokogiri', '<= 1.6.8.1'
80
84
  else
@@ -105,6 +105,7 @@ require 'date'
105
105
  require 'daru/version.rb'
106
106
 
107
107
  require 'open-uri'
108
+ require 'backports/2.1.0/array/to_h'
108
109
 
109
110
  require 'daru/index/index.rb'
110
111
  require 'daru/index/multi_index.rb'
@@ -124,5 +125,3 @@ require 'daru/core/merge.rb'
124
125
 
125
126
  require 'daru/date_time/offsets.rb'
126
127
  require 'daru/date_time/index.rb'
127
-
128
- require 'backports'
@@ -1,5 +1,7 @@
1
1
  module Daru
2
2
  module Category # rubocop:disable Metrics/ModuleLength
3
+ UNDEFINED = Object.new.freeze
4
+
3
5
  attr_accessor :base_category
4
6
  attr_reader :index, :coding_scheme, :name
5
7
 
@@ -113,7 +115,7 @@ module Daru
113
115
  end
114
116
 
115
117
  # Associates a category to the vector.
116
- # @param [Array] *new_categories new categories to be associated
118
+ # @param [Array] new_categories new categories to be associated
117
119
  # @example
118
120
  # dv = Daru::Vector.new [:a, 1, :a, 1, :c], type: :category
119
121
  # dv.add_category :b
@@ -131,7 +133,10 @@ module Daru
131
133
  # dv = Daru::Vector.new [:a, 1, :a, 1, :c], type: :category
132
134
  # dv.count :a
133
135
  # # => 2
134
- def count category
136
+ # dv.count
137
+ # # => 5
138
+ def count category=UNDEFINED
139
+ return @cat_hash.values.map(&:size).inject(&:+) if category == UNDEFINED # count all
135
140
  raise ArgumentError, "Invalid category #{category}" unless
136
141
  categories.include?(category)
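The `UNDEFINED` sentinel introduced in this hunk lets `count` tell "no argument given" apart from a legitimate `nil` category. Below is a minimal plain-Ruby sketch of that pattern. `CategoryCounter` and its internals are hypothetical stand-ins and not Daru's `Category` module, and the sketch compares with `equal?` where the diff uses `==`.

```ruby
# Hypothetical stand-in for a category-typed vector; not Daru's implementation.
class CategoryCounter
  # Unique frozen object: no caller can pass it by accident, unlike nil.
  UNDEFINED = Object.new.freeze

  def initialize(values)
    # Maps each category to the positions where it occurs.
    @cat_hash = Hash.new { |hash, key| hash[key] = [] }
    values.each_with_index { |value, pos| @cat_hash[value] << pos }
  end

  def count(category = UNDEFINED)
    # No argument: count every element across all categories.
    return @cat_hash.values.map(&:size).inject(0, :+) if category.equal?(UNDEFINED)

    raise ArgumentError, "Invalid category #{category.inspect}" unless @cat_hash.key?(category)

    @cat_hash[category].size
  end
end

counter = CategoryCounter.new([:a, 1, :a, 1, :c])
puts counter.count(:a) # prints 2
puts counter.count     # prints 5
```

A default of `nil` would not work here, because `nil` can itself be a category value that a caller wants to count.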
137
142
 
@@ -167,7 +172,7 @@ module Daru
167
172
  end
168
173
 
169
174
  # Returns vector for indexes/positions specified
170
- # @param [Array] *indexes indexes/positions for which values has to be retrived
175
+ # @param [Array] indexes for which values has to be retrived
171
176
  # @note Since it accepts both indexes and postions. In case of collision,
172
177
  # arguement will be treated as index
173
178
  # @return vector containing values specified at specified indexes/positions
@@ -196,7 +201,7 @@ module Daru
196
201
  end
197
202
 
198
203
  # Returns vector for positions specified.
199
- # @param [Array] *positions positions at which values to be retrived.
204
+ # @param [Array] positions at which values to be retrived.
200
205
  # @return vector containing values specified at specified positions
201
206
  # @example
202
207
  # dv = Daru::Vector.new [:a, 1, :a, 1, :c], type: :category
@@ -223,7 +228,7 @@ module Daru
223
228
 
224
229
  # Modifies values at specified indexes/positions.
225
230
  # @note In order to add a new category you need to associate it via #add_category
226
- # @param [Array] *indexes indexes/positions at which to modify value
231
+ # @param [Array] indexes at which to modify value
227
232
  # @param [object] val value to assign at specific indexes/positions
228
233
  # @return modified vector
229
234
  # @example
@@ -584,7 +589,7 @@ module Daru
584
589
  alias :gteq :mteq
585
590
 
586
591
  # For querying the data
587
- # @param [object] arel like query syntax
592
+ # @param bool_array [object] arel like query syntax
588
593
  # @return [Daru::Vector] Vector which makes the conditions true
589
594
  # @example
590
595
  # dv = Daru::Vector.new ['I', 'II', 'I', 'III', 'I', 'II'],
@@ -658,7 +663,7 @@ module Daru
658
663
  end
659
664
 
660
665
  # Check if any one of mentioned values occur in the vector
661
- # @param [Array] *values values to check for
666
+ # @param [Array] values to check for
662
667
  # @return [true, false] returns true if any one of specified values
663
668
  # occur in the vector
664
669
  # @example
@@ -670,7 +675,7 @@ module Daru
670
675
  end
671
676
 
672
677
  # Return a vector with specified values removed
673
- # @param [Array] *values values to reject from resultant vector
678
+ # @param [Array] values to reject from resultant vector
674
679
  # @return [Daru::Vector] vector with specified values removed
675
680
  # @example
676
681
  # dv = Daru::Vector.new [1, 2, nil, Float::NAN], type: :category
@@ -689,7 +694,7 @@ module Daru
689
694
  end
690
695
 
691
696
  # Count the number of values specified
692
- # @param [Array] *values values to count for
697
+ # @param [Array] values to count for
693
698
  # @return [Integer] the number of times the values mentioned occurs
694
699
  # @example
695
700
  # dv = Daru::Vector.new [1, 2, 1, 2, 3, 4, nil, nil]
@@ -702,7 +707,7 @@ module Daru
702
707
  end
703
708
 
704
709
  # Return indexes of values specified
705
- # @param [Array] *values values to find indexes for
710
+ # @param [Array] values to find indexes for
706
711
  # @return [Array] array of indexes of values specified
707
712
  # @example
708
713
  # dv = Daru::Vector.new [1, 2, nil, Float::NAN], index: 11..14
@@ -11,12 +11,14 @@ module Daru
11
11
  end
12
12
  end
13
13
 
14
- TUPLE_SORTER = lambda do |a, b|
15
- if a && b
16
- a.compact <=> b.compact
17
- else
18
- a ? 1 : -1
19
- end
14
+ TUPLE_SORTER = lambda do |left, right|
15
+ return -1 unless right
16
+ return 1 unless left
17
+
18
+ left = left.compact
19
+ right = right.compact
20
+ return left <=> right || 0 if left.length == right.length
21
+ left.length <=> right.length
20
22
  end
21
23
 
22
24
  def initialize context, names
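The rewritten `TUPLE_SORTER` can be exercised outside Daru. The sketch below copies the comparator from the hunk under a hypothetical constant name (`TUPLE_SORTER_SKETCH`) and adds a hypothetical helper, `sort_tuples`. It shows the three cases the new version handles: a missing tuple sorts last, tuples whose compacted lengths differ are ordered by length, and tuples that `<=>` cannot compare are treated as equal instead of yielding `nil`.

```ruby
# Copy of the comparator from the diff, under a hypothetical name.
TUPLE_SORTER_SKETCH = lambda do |left, right|
  return -1 unless right # a nil tuple on the right sorts after left
  return 1 unless left   # a nil tuple on the left sorts after right

  left = left.compact
  right = right.compact
  # Same length: element-wise comparison, falling back to 0 when
  # Array#<=> returns nil (for example, mixed String/Integer elements).
  return (left <=> right) || 0 if left.length == right.length

  left.length <=> right.length
end

# Hypothetical helper that sorts a list of tuples with the comparator.
def sort_tuples(tuples)
  tuples.sort { |a, b| TUPLE_SORTER_SKETCH.call(a, b) }
end

p sort_tuples([[2, nil], [1, 3], nil, [1, 2]])
# => [[2, nil], [1, 2], [1, 3], nil]
```

The removed version passed the result of `a.compact <=> b.compact` straight back to the sort, so it could hand back `nil` whenever the compacted tuples were not comparable.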
@@ -203,8 +205,8 @@ module Daru
203
205
 
204
206
  # Iteratively applies a function to the values in a group and accumulates the result.
205
207
  # @param init (nil) The initial value of the accumulator.
206
- # @param block [Proc] A proc or lambda that accepts two arguments. The first argument
207
- # is the accumulated result. The second argument is a DataFrame row.
208
+ # @yieldparam block [Proc] A proc or lambda that accepts two arguments. The first argument
209
+ # is the accumulated result. The second argument is a DataFrame row.
208
210
  # @example Usage of reduce
209
211
  # df = Daru::DataFrame.new({
210
212
  # a: ['a','b'] * 3,
@@ -243,6 +245,47 @@ module Daru
243
245
  @df.inspect
244
246
  end
245
247
 
248
+ # Function to use for aggregating the data.
249
+ # `group_by` is using Daru::DataFrame#aggregate
250
+ #
251
+ # @param options [Hash] options for column, you want in resultant dataframe
252
+ #
253
+ # @return [Daru::DataFrame]
254
+ #
255
+ # @example
256
+ #
257
+ # df = Daru::DataFrame.new(
258
+ # name: ['Ram','Krishna','Ram','Krishna','Krishna'],
259
+ # visited: ['Hyderabad', 'Delhi', 'Mumbai', 'Raipur', 'Banglore'])
260
+ #
261
+ # => #<Daru::DataFrame(5x2)>
262
+ # name visited
263
+ # 0 Ram Hyderabad
264
+ # 1 Krishna Delhi
265
+ # 2 Ram Mumbai
266
+ # 3 Krishna Raipur
267
+ # 4 Krishna Banglore
268
+ #
269
+ # df.group_by(:name)
270
+ # => #<Daru::DataFrame(5x1)>
271
+ # visited
272
+ # Krishna 1 Delhi
273
+ # 3 Raipur
274
+ # 4 Banglore
275
+ # Ram 0 Hyderabad
276
+ # 2 Mumbai
277
+ #
278
+ # df.group_by(:name).aggregate(visited: -> (vec){vec.to_a.join(',')})
279
+ # => #<Daru::DataFrame(2x1)>
280
+ # visited
281
+ # Krishna Delhi,Raipur,Banglore
282
+ # Ram Hyderabad,Mumbai
283
+ #
284
+ def aggregate(options={})
285
+ @df.index = @df.index.remove_layer(@df.index.levels.size - 1)
286
+ @df.aggregate(options)
287
+ end
288
+
246
289
  private
247
290
 
248
291
  def init_groups_df tuples, names
@@ -84,7 +84,7 @@ module Daru
84
84
  # Read a dataframe from AR::Relation
85
85
  #
86
86
  # @param relation [ActiveRecord::Relation] An AR::Relation object from which data is loaded
87
- # @params fields [Array] Field names to be loaded (optional)
87
+ # @param fields [Array] Field names to be loaded (optional)
88
88
  #
89
89
  # @return A dataframe containing the data loaded from the relation
90
90
  #
@@ -277,6 +277,17 @@ module Daru
277
277
  # Default to *true*.
278
278
  #
279
279
  # == Usage
280
+ #
281
+ # df = Daru::DataFrame.new
282
+ # # =>
283
+ # # <Daru::DataFrame(0x0)>
284
+ # # Creates an empty DataFrame with no rows or columns.
285
+ #
286
+ # df = Daru::DataFrame.new({}, order: [:a, :b])
287
+ # #<Daru::DataFrame(0x2)>
288
+ # a b
289
+ # # Creates a DataFrame with no rows and columns :a and :b
290
+ #
280
291
  # df = Daru::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
281
292
  # index: [:a, :b, :c, :d], name: :spider_man)
282
293
  #
@@ -329,7 +340,7 @@ module Daru
329
340
  # # 1 4 14 44
330
341
  # # 2 5 15 55
331
342
 
332
- def initialize source, opts={} # rubocop:disable Metrics/MethodLength
343
+ def initialize source={}, opts={} # rubocop:disable Metrics/MethodLength
333
344
  vectors, index = opts[:order], opts[:index] # FIXME: just keyword arges after Ruby 2.1
334
345
  @data = []
335
346
  @name = opts[:name]
@@ -375,7 +386,7 @@ module Daru
375
386
  end
376
387
 
377
388
  # Retrive rows by positions
378
- # @param [Array<Integer>] *positions positions of rows to retrive
389
+ # @param [Array<Integer>] positions of rows to retrive
379
390
  # @return [Daru::Vector, Daru::DataFrame] vector for single position and dataframe for multiple positions
380
391
  # @example
381
392
  # df = Daru::DataFrame.new({
@@ -405,7 +416,7 @@ module Daru
405
416
 
406
417
  # Set rows by positions
407
418
  # @param [Array<Integer>] positions positions of rows to set
408
- # @vector [Array, Daru::Vector] vector vector to be assigned
419
+ # @param [Array, Daru::Vector] vector vector to be assigned
409
420
  # @example
410
421
  # df = Daru::DataFrame.new({
411
422
  # a: [1, 2, 3],
@@ -438,7 +449,7 @@ module Daru
438
449
  end
439
450
 
440
451
  # Retrive vectors by positions
441
- # @param [Array<Integer>] *positions positions of vectors to retrive
452
+ # @param [Array<Integer>] positions of vectors to retrive
442
453
  # @return [Daru::Vector, Daru::DataFrame] vector for single position and dataframe for multiple positions
443
454
  # @example
444
455
  # df = Daru::DataFrame.new({
@@ -522,7 +533,7 @@ module Daru
522
533
  end
523
534
 
524
535
  def add_row row, index=nil
525
- self.row[index || @size] = row
536
+ self.row[*(index || @size)] = row
526
537
  end
527
538
 
528
539
  def add_vector n, vector
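The one-character change in `add_row` (adding a splat) matters when the index is a tuple, as with a multi-index. The following sketch uses a hypothetical `RowRecorder` class that only records the arguments its `[]=` receives; it is not Daru's row accessor. It shows that splatting an Integer passes a single key, while splatting an Array passes each tuple element as its own key.

```ruby
# Hypothetical recorder: stores the keys and value passed to []=.
class RowRecorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  # All leading arguments are keys; the last one is the assigned value.
  def []=(*keys, value)
    @calls << [keys, value]
  end
end

rows = RowRecorder.new
size = 3

index = nil
rows[*(index || size)] = [1, 2] # keys: [3]

index = [:a, :one]
rows[*(index || size)] = [3, 4] # keys: [:a, :one]
rows[index || size] = [5, 6]    # keys: [[:a, :one]] (tuple passed as one key)

p rows.calls
# => [[[3], [1, 2]], [[:a, :one], [3, 4]], [[[:a, :one]], [5, 6]]]
```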
@@ -597,7 +608,7 @@ module Daru
597
608
 
598
609
  # Returns a dataframe in which rows with any of the mentioned values
599
610
  # are ignored.
600
- # @param [Array] *values values to reject to form the new dataframe
611
+ # @param [Array] values to reject to form the new dataframe
601
612
  # @return [Daru::DataFrame] Data Frame with only rows which doesn't
602
613
  # contain the mentioned values
603
614
  # @example
@@ -650,6 +661,88 @@ module Daru
650
661
  self
651
662
  end
652
663
 
664
+ # Rolling fillna
665
+ # replace all Float::NAN and NIL values with the preceeding or following value
666
+ #
667
+ # @param direction [Symbol] (:forward, :backward) whether replacement value is preceeding or following
668
+ #
669
+ # @example
670
+ # df = Daru::DataFrame.new({
671
+ # a: [1, 2, 3, nil, Float::NAN, nil, 1, 7],
672
+ # b: [:a, :b, nil, Float::NAN, nil, 3, 5, nil],
673
+ # c: ['a', Float::NAN, 3, 4, 3, 5, nil, 7]
674
+ # })
675
+ #
676
+ # => #<Daru::DataFrame(8x3)>
677
+ # a b c
678
+ # 0 1 a a
679
+ # 1 2 b NaN
680
+ # 2 3 nil 3
681
+ # 3 nil NaN 4
682
+ # 4 NaN nil 3
683
+ # 5 nil 3 5
684
+ # 6 1 5 nil
685
+ # 7 7 nil 7
686
+ #
687
+ # 2.3.3 :068 > df.rolling_fillna(:forward)
688
+ # => #<Daru::DataFrame(8x3)>
689
+ # a b c
690
+ # 0 1 a a
691
+ # 1 2 b a
692
+ # 2 3 b 3
693
+ # 3 3 b 4
694
+ # 4 3 b 3
695
+ # 5 3 3 5
696
+ # 6 1 5 5
697
+ # 7 7 5 7
698
+ #
699
+ def rolling_fillna!(direction=:forward)
700
+ @data.each { |vec| vec.rolling_fillna!(direction) }
701
+ end
702
+
703
+ def rolling_fillna(direction=:forward)
704
+ dup.rolling_fillna!(direction)
705
+ end
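The fill behaviour documented above can be sketched on a plain Array. `rolling_fill` is a hypothetical helper, not Daru's `Vector#rolling_fillna!` (whose implementation is not part of this section). One assumption to note: the sketch leaves leading missing values as `nil` because there is nothing earlier to copy, and Daru may treat that case differently.

```ruby
# Hypothetical helper: replace nil and Float::NAN with the nearest
# preceding (:forward) or following (:backward) valid value.
def rolling_fill(values, direction = :forward)
  missing = ->(v) { v.nil? || (v.is_a?(Float) && v.nan?) }

  # Filling backward is the same as filling forward over the reversed list.
  ordered = direction == :backward ? values.reverse : values.dup

  last_seen = nil # assumption: leading gaps stay nil
  filled = ordered.map do |value|
    if missing.call(value)
      last_seen
    else
      last_seen = value
    end
  end

  direction == :backward ? filled.reverse : filled
end

p rolling_fill([1, 2, 3, nil, Float::NAN, nil, 1, 7])
# => [1, 2, 3, 3, 3, 3, 1, 7]
p rolling_fill([1, 2, 3, nil, Float::NAN, nil, 1, 7], :backward)
# => [1, 2, 3, 1, 1, 1, 1, 7]
```

The forward result matches column `a` in the documented `df.rolling_fillna(:forward)` output.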
706
+
707
+ # Return unique rows by vector specified or all vectors
708
+ #
709
+ # @param vtrs [String][Symbol] vector names(s) that should be considered
710
+ #
711
+ # @example
712
+ #
713
+ # => #<Daru::DataFrame(6x2)>
714
+ # a b
715
+ # 0 1 a
716
+ # 1 2 b
717
+ # 2 3 c
718
+ # 3 4 d
719
+ # 2 3 c
720
+ # 3 4 f
721
+ #
722
+ # 2.3.3 :> df.unique
723
+ # => #<Daru::DataFrame(5x2)>
724
+ # a b
725
+ # 0 1 a
726
+ # 1 2 b
727
+ # 2 3 c
728
+ # 3 4 d
729
+ # 3 4 f
730
+ #
731
+ # 2.3.3 :> df.unique(:a)
732
+ # => #<Daru::DataFrame(5x2)>
733
+ # a b
734
+ # 0 1 a
735
+ # 1 2 b
736
+ # 2 3 c
737
+ # 3 4 d
738
+ #
739
+ def uniq(*vtrs)
740
+ vecs = vtrs.empty? ? vectors.map(&:to_s) : Array(vtrs)
741
+ grouped = group_by(vecs)
742
+ indexes = grouped.groups.values.map { |v| v[0] }.sort
743
+ row[*indexes]
744
+ end
745
+
653
746
  # Iterate over each index of the DataFrame.
654
747
  def each_index &block
655
748
  return to_enum(:each_index) unless block_given?
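The new `uniq` groups rows by the chosen vectors, keeps the first row position of each group, and returns those rows in their original order. A plain-Ruby sketch of the same idea is below. `unique_rows` is a hypothetical helper operating on an Array of Hashes rather than a DataFrame. Note that the doc comment in the hunk calls `df.unique`, whereas the method it documents is defined as `uniq`, and that the `df.unique(:a)` example is labelled `(5x2)` but lists four rows.

```ruby
# Hypothetical helper: rows is an Array of Hashes sharing the same keys.
def unique_rows(rows, *keys)
  return [] if rows.empty?

  keys = rows.first.keys if keys.empty?

  # Group row positions by the values in the selected columns.
  groups = rows.each_index.group_by { |pos| rows[pos].values_at(*keys) }

  # Keep the first position of each group, restoring original order.
  groups.values.map(&:first).sort.map { |pos| rows[pos] }
end

rows = [1, 2, 3, 4, 3, 4].zip(%w[a b c d c f]).map { |a, b| { a: a, b: b } }

puts unique_rows(rows).size     # prints 5 (the second 3/c row is dropped)
puts unique_rows(rows, :a).size # prints 4 (one row per distinct :a value)
```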
@@ -1024,9 +1117,9 @@ module Daru
1024
1117
  dup.tap { |df| df.keep_vector_if(&block) }
1025
1118
  end
1026
1119
 
1027
- # Test each row with one or more tests. Each test is a Proc with the form
1028
- # *Proc.new {|row| row[:age] > 0}*
1029
- #
1120
+ # Test each row with one or more tests.
1121
+ # @param tests [Proc] Each test is a Proc with the form
1122
+ # *Proc.new {|row| row[:age] > 0}*
1030
1123
  # The function returns an array with all errors.
1031
1124
  #
1032
1125
  # FIXME: description here is too sparse. As far as I can get,
@@ -1128,7 +1221,7 @@ module Daru
1128
1221
  deprecate :flawed?, :include_values?, 2016, 10
1129
1222
 
1130
1223
  # Check if any of given values occur in the data frame
1131
- # @param [Array] *values values to check for
1224
+ # @param [Array] values to check for
1132
1225
  # @return [true, false] true if any of the given values occur in the
1133
1226
  # dataframe, false otherwise
1134
1227
  # @example
@@ -1259,13 +1352,60 @@ module Daru
1259
1352
 
1260
1353
  alias :last :tail
1261
1354
 
1262
- # Returns a vector with sum of all vectors specified in the argument.
1263
- # If vecs parameter is empty, sum all numeric vector.
1264
- def vector_sum vecs=nil
1355
+ # Sum all numeric/specified vectors in the DataFrame.
1356
+ #
1357
+ # Returns a new vector that's a containing a sum of all numeric
1358
+ # or specified vectors of the DataFrame. By default, if the vector
1359
+ # contains a nil, the sum is nil.
1360
+ # With :skipnil argument set to true, nil values are assumed to be
1361
+ # 0 (zero) and the sum vector is returned.
1362
+ #
1363
+ # @param args [Array] List of vectors to sum. Default is nil in which case
1364
+ # all numeric vectors are summed.
1365
+ #
1366
+ # @option opts [Boolean] :skipnil Consider nils as 0. Default is false.
1367
+ #
1368
+ # @return Vector with sum of all vectors specified in the argument.
1369
+ # If vecs parameter is empty, sum all numeric vector.
1370
+ #
1371
+ # @example
1372
+ # df = Daru::DataFrame.new({
1373
+ # a: [1, 2, nil],
1374
+ # b: [2, 1, 3],
1375
+ # c: [1, 1, 1]
1376
+ # })
1377
+ # => #<Daru::DataFrame(3x3)>
1378
+ # a b c
1379
+ # 0 1 2 1
1380
+ # 1 2 1 1
1381
+ # 2 nil 3 1
1382
+ # df.vector_sum [:a, :c]
1383
+ # => #<Daru::Vector(3)>
1384
+ # 0 2
1385
+ # 1 3
1386
+ # 2 nil
1387
+ # df.vector_sum
1388
+ # => #<Daru::Vector(3)>
1389
+ # 0 4
1390
+ # 1 4
1391
+ # 2 nil
1392
+ # df.vector_sum skipnil: true
1393
+ # => #<Daru::Vector(3)>
1394
+ # c
1395
+ # 0 4
1396
+ # 1 4
1397
+ # 2 4
1398
+ #
1399
+ def vector_sum(*args)
1400
+ defaults = {vecs: nil, skipnil: false}
1401
+ options = args.last.is_a?(::Hash) ? args.pop : {}
1402
+ options = defaults.merge(options)
1403
+ vecs = args[0] || options[:vecs]
1404
+ skipnil = args[1] || options[:skipnil]
1405
+
1265
1406
  vecs ||= numeric_vectors
1266
1407
  sum = Daru::Vector.new [0]*@size, index: @index, name: @name, dtype: @dtype
1267
-
1268
- vecs.inject(sum) { |memo, n| memo + self[n] }
1408
+ vecs.inject(sum) { |memo, n| self[n].add(memo, skipnil: skipnil) }
1269
1409
  end
1270
1410
 
1271
1411
  # Calculate mean of the rows of the dataframe.
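The new `vector_sum` signature accepts an optional list of vectors plus a trailing options Hash. The sketch below isolates the two ideas in plain Ruby: the argument parsing copied from the hunk, and the nil handling described in the doc comment. `parse_sum_args` and `row_sums` are hypothetical helpers; the latter works on Arrays of column values and does not use Daru's `Vector#add`.

```ruby
# Argument parsing as in the diff: a trailing Hash is treated as options.
def parse_sum_args(*args)
  defaults = { vecs: nil, skipnil: false }
  options = args.last.is_a?(::Hash) ? args.pop : {}
  options = defaults.merge(options)
  vecs = args[0] || options[:vecs]
  skipnil = args[1] || options[:skipnil]
  [vecs, skipnil]
end

# Hypothetical row-wise sum over equally sized columns.
# With skipnil, nil counts as 0; otherwise any nil makes the row sum nil.
def row_sums(columns, skipnil = false)
  columns.transpose.map do |row|
    next nil if !skipnil && row.include?(nil)

    row.compact.inject(0, :+)
  end
end

columns = [[1, 2, nil], [2, 1, 3], [1, 1, 1]] # a, b, c from the doc example

p parse_sum_args([:a, :c])     # => [[:a, :c], false]
p parse_sum_args(skipnil: true) # => [nil, true]
p row_sums(columns)            # => [4, 4, nil]
p row_sums(columns, true)      # => [4, 4, 4]
```

Because the options are recovered by inspecting the last positional argument, a call such as `vector_sum skipnil: true` works without naming any vectors.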
@@ -1427,7 +1567,7 @@ module Daru
1427
1567
 
1428
1568
  # Reassign vectors with a new index of type Daru::Index or any of its subclasses.
1429
1569
  #
1430
- # @param [Daru::Index] idx The new index object on which the vectors are to
1570
+ # @param new_index [Daru::Index] idx The new index object on which the vectors are to
1431
1571
  # be indexed. Must of the same size as ncols.
1432
1572
  # @example Reassigning vectors of a DataFrame
1433
1573
  # df = Daru::DataFrame.new({a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44]})
@@ -1513,9 +1653,9 @@ module Daru
1513
1653
  # Sorts a dataframe (ascending/descending) in the given pripority sequence of
1514
1654
  # vectors, with or without a block.
1515
1655
  #
1516
- # @param order [Array] The order of vector names in which the DataFrame
1656
+ # @param vector_order [Array] The order of vector names in which the DataFrame
1517
1657
  # should be sorted.
1518
- # @param [Hash] opts The options to sort with.
1658
+ # @param opts [Hash] opts The options to sort with.
1519
1659
  # @option opts [TrueClass,FalseClass,Array] :ascending (true) Sort in ascending
1520
1660
  # or descending order. Specify Array corresponding to *order* for multiple
1521
1661
  # sort orders.
@@ -1684,12 +1824,11 @@ module Daru
1684
1824
 
1685
1825
  new_fields = (@vectors.to_a + other_df.vectors.to_a)
1686
1826
  new_fields = ArrayHelper.recode_repeated(new_fields)
1687
-
1688
1827
  DataFrame.new({}, order: new_fields).tap do |df_new|
1689
1828
  (0...nrows).each do |i|
1690
1829
  df_new.add_row row[i].to_a + other_df.row[i].to_a
1691
1830
  end
1692
-
1831
+ df_new.index = @index if @index == other_df.index
1693
1832
  df_new.update
1694
1833
  end
1695
1834
  end
@@ -2035,7 +2174,7 @@ module Daru
2035
2174
  end
2036
2175
 
2037
2176
  # Converts the specified non category type vectors to category type vectors
2038
- # @param [Array] *names names of non category type vectors to be converted
2177
+ # @param [Array] names of non category type vectors to be converted
2039
2178
  # @return [Daru::DataFrame] data frame in which specified vectors have been
2040
2179
  # converted to category type
2041
2180
  # @example
@@ -2126,8 +2265,88 @@ module Daru
2126
2265
  res
2127
2266
  end
2128
2267
 
2268
+ # Function to use for aggregating the data.
2269
+ #
2270
+ # @param options [Hash] options for column, you want in resultant dataframe
2271
+ #
2272
+ # @return [Daru::DataFrame]
2273
+ #
2274
+ # @example
2275
+ # df = Daru::DataFrame.new(
2276
+ # {col: [:a, :b, :c, :d, :e], num: [52,12,07,17,01]})
2277
+ # => #<Daru::DataFrame(5x2)>
2278
+ # col num
2279
+ # 0 a 52
2280
+ # 1 b 12
2281
+ # 2 c 7
2282
+ # 3 d 17
2283
+ # 4 e 1
2284
+ #
2285
+ # df.aggregate(num_100_times: ->(df) { df.num*100 })
2286
+ # => #<Daru::DataFrame(5x1)>
2287
+ # num_100_ti
2288
+ # 0 5200
2289
+ # 1 1200
2290
+ # 2 700
2291
+ # 3 1700
2292
+ # 4 100
2293
+ #
2294
+ # When we have duplicate index :
2295
+ #
2296
+ # idx = Daru::CategoricalIndex.new [:a, :b, :a, :a, :c]
2297
+ # df = Daru::DataFrame.new({num: [52,12,07,17,01]}, index: idx)
2298
+ # => #<Daru::DataFrame(5x1)>
2299
+ # num
2300
+ # a 52
2301
+ # b 12
2302
+ # a 7
2303
+ # a 17
2304
+ # c 1
2305
+ #
2306
+ # df.aggregate(num: :mean)
2307
+ # => #<Daru::DataFrame(3x1)>
2308
+ # num
2309
+ # a 25.3333333
2310
+ # b 12
2311
+ # c 1
2312
+ #
2313
+ # Note: `GroupBy` class `aggregate` method uses this `aggregate` method
2314
+ # internally.
2315
+ def aggregate(options={})
2316
+ colmn_value, index_tuples = aggregated_colmn_value(options)
2317
+ Daru::DataFrame.new(
2318
+ colmn_value, index: index_tuples, order: options.keys
2319
+ )
2320
+ end
2321
+
2129
2322
  private
2130
2323
 
2324
+ # Do the `method` (`method` can be :sum, :mean, :std, :median, etc or
2325
+ # lambda), on the column.
2326
+ def apply_method_on_colmns colmn, index_tuples, method
2327
+ rows = []
2328
+ index_tuples.each do |indexes|
2329
+ # If single element then also make it vector.
2330
+ slice = Daru::Vector.new(Array(self[colmn][*indexes]))
2331
+ case method
2332
+ when Symbol
2333
+ rows << (slice.is_a?(Daru::Vector) ? slice.send(method) : slice)
2334
+ when Proc
2335
+ rows << method.call(slice)
2336
+ end
2337
+ end
2338
+ rows
2339
+ end
2340
+
2341
+ def apply_method_on_df index_tuples, method
2342
+ rows = []
2343
+ index_tuples.each do |indexes|
2344
+ slice = row[*indexes]
2345
+ rows << method.call(slice)
2346
+ end
2347
+ rows
2348
+ end
2349
+
2131
2350
  def headers
2132
2351
  Daru::Index.new(Array(index.name) + @vectors.to_a)
2133
2352
  end
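`DataFrame#aggregate` collapses rows that share an index label, applying either a method name or a lambda to each slice. A rough plain-Ruby equivalent for a single column is sketched below. `aggregate_by_index` is a hypothetical helper that works on parallel Arrays, and the Symbol branch calls the method on an Array slice, so only methods that Array provides (such as `:max`) apply here, whereas Daru sends the Symbol to a `Daru::Vector`.

```ruby
# Hypothetical helper: index and values are parallel Arrays.
def aggregate_by_index(index, values, reducer)
  index.uniq.map do |label|
    # Collect the values whose index label matches.
    slice = values.each_index.select { |i| index[i] == label }.map { |i| values[i] }

    result =
      case reducer
      when Symbol then slice.public_send(reducer) # e.g. :max, :min
      when Proc   then reducer.call(slice)        # custom aggregation
      else raise ArgumentError, "unsupported reducer: #{reducer.inspect}"
      end

    [label, result]
  end.to_h
end

index = [:a, :b, :a, :a, :c]
nums  = [52, 12, 7, 17, 1]
mean  = ->(slice) { slice.inject(0, :+) / slice.size.to_f }

maxima = aggregate_by_index(index, nums, :max)
means  = aggregate_by_index(index, nums, mean)

puts maxima[:a]         # prints 52
puts means[:a].round(2) # prints 25.33
```

The mean for `:a` agrees with the `25.3333333` shown in the documented `df.aggregate(num: :mean)` output.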
@@ -2224,9 +2443,7 @@ module Daru
2224
2443
  rescue IndexError
2225
2444
  raise IndexError, "Specified vector #{names.first} does not exist"
2226
2445
  end
2227
-
2228
2446
  return @data[pos] if pos.is_a?(Numeric)
2229
-
2230
2447
  names = pos
2231
2448
  end
2232
2449
 
@@ -2396,7 +2613,7 @@ module Daru
2396
2613
  end
2397
2614
 
2398
2615
  def create_vectors_index_with vectors, source
2399
- vectors = source.keys.sort_by(&:to_s) if vectors.nil?
2616
+ vectors = source.keys if vectors.nil?
2400
2617
 
2401
2618
  @vectors =
2402
2619
  if vectors.is_a?(Index) || vectors.is_a?(MultiIndex)
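Dropping `sort_by(&:to_s)` means the column order now follows the Hash's insertion order, which Ruby preserves. The difference can be seen with a small sketch; `column_order` is a hypothetical helper that contrasts the old and new expressions from the hunk.

```ruby
# Hypothetical helper contrasting the two expressions from the diff.
def column_order(source, sorted = false)
  # sorted: the removed behaviour (keys ordered by their string form)
  # unsorted: the new behaviour (keys in insertion order)
  sorted ? source.keys.sort_by(&:to_s) : source.keys
end

source = { b: [1, 2], a: [3, 4], 'c' => [5, 6] }

p column_order(source, true) # => [:a, :b, "c"]
p column_order(source)       # => [:b, :a, "c"]
```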
@@ -2443,9 +2660,7 @@ module Daru
2443
2660
  @index = Index.coerce(index || source[0].size)
2444
2661
  @vectors = Index.coerce(vectors)
2445
2662
 
2446
- @data = @vectors.each_with_index.map do |_vec,idx|
2447
- Daru::Vector.new(source[idx], index: @index, name: vectors[idx])
2448
- end
2663
+ update_data source, vectors
2449
2664
  end
2450
2665
 
2451
2666
  def initialize_from_array_of_vectors source, vectors, index, opts
@@ -2694,6 +2909,30 @@ module Daru
2694
2909
  end
2695
2910
  end
2696
2911
 
2912
+ def update_data source, vectors
2913
+ @data = @vectors.each_with_index.map do |_vec,idx|
2914
+ Daru::Vector.new(source[idx], index: @index, name: vectors[idx])
2915
+ end
2916
+ end
2917
+
2918
+ def aggregated_colmn_value(options)
2919
+ colmn_value = []
2920
+ index_tuples = Array(@index).uniq
2921
+ options.keys.each do |vec|
2922
+ do_this_on_vec = options[vec]
2923
+ colmn_value << if @vectors.include?(vec)
2924
+ apply_method_on_colmns(
2925
+ vec, index_tuples, do_this_on_vec
2926
+ )
2927
+ else
2928
+ apply_method_on_df(
2929
+ index_tuples, do_this_on_vec
2930
+ )
2931
+ end
2932
+ end
2933
+ [colmn_value, index_tuples]
2934
+ end
2935
+
2697
2936
  # coerce ranges, integers and array in appropriate ways
2698
2937
  def coerce_positions *positions, size
2699
2938
  if positions.size == 1