daru 0.0.4 → 0.0.5

Files changed (40)
  1. checksums.yaml +4 -4
  2. data/CONTRIBUTING.md +0 -0
  3. data/Gemfile +0 -1
  4. data/History.txt +35 -0
  5. data/README.md +178 -198
  6. data/daru.gemspec +5 -7
  7. data/lib/daru.rb +10 -2
  8. data/lib/daru/accessors/array_wrapper.rb +36 -198
  9. data/lib/daru/accessors/nmatrix_wrapper.rb +60 -209
  10. data/lib/daru/core/group_by.rb +183 -0
  11. data/lib/daru/dataframe.rb +615 -167
  12. data/lib/daru/index.rb +17 -16
  13. data/lib/daru/io/io.rb +5 -12
  14. data/lib/daru/maths/arithmetic/dataframe.rb +72 -8
  15. data/lib/daru/maths/arithmetic/vector.rb +19 -6
  16. data/lib/daru/maths/statistics/dataframe.rb +103 -2
  17. data/lib/daru/maths/statistics/vector.rb +102 -61
  18. data/lib/daru/monkeys.rb +8 -0
  19. data/lib/daru/multi_index.rb +199 -0
  20. data/lib/daru/plotting/dataframe.rb +24 -24
  21. data/lib/daru/plotting/vector.rb +14 -15
  22. data/lib/daru/vector.rb +402 -98
  23. data/lib/version.rb +1 -1
  24. data/notebooks/grouping_splitting_pivots.ipynb +529 -0
  25. data/notebooks/intro_with_music_data_.ipynb +104 -119
  26. data/spec/accessors/wrappers_spec.rb +36 -0
  27. data/spec/core/group_by_spec.rb +331 -0
  28. data/spec/dataframe_spec.rb +1237 -475
  29. data/spec/fixtures/sales-funnel.csv +18 -0
  30. data/spec/index_spec.rb +10 -21
  31. data/spec/io/io_spec.rb +4 -14
  32. data/spec/math/arithmetic/dataframe_spec.rb +66 -0
  33. data/spec/math/arithmetic/vector_spec.rb +45 -4
  34. data/spec/math/statistics/dataframe_spec.rb +91 -1
  35. data/spec/math/statistics/vector_spec.rb +32 -6
  36. data/spec/monkeys_spec.rb +10 -1
  37. data/spec/multi_index_spec.rb +216 -0
  38. data/spec/spec_helper.rb +1 -0
  39. data/spec/vector_spec.rb +505 -57
  40. metadata +21 -15
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: dc60c1a59b4c112dface4bdd6d0a40cf1bcc499c
- data.tar.gz: aec8fd874e2e74ded4eeb2cebe8503b5a3cbefbc
+ metadata.gz: fd2dec0795f15ca1e45bdad5238fb7dbe33e1089
+ data.tar.gz: 634ff6e6b533cad019893a6e248706c824933e1d
  SHA512:
- metadata.gz: 091f1ce3e0da8b9083dc6f53a9977550def5d10a4f759797ec56cb6b90788aa8d6a7fb8ad1a0e4105c640b5987e0b200877a12441cc8cd6b2614f26141554d9e
- data.tar.gz: d189373c2196a3ddc10856f4800fee908d9361f83921e68078c971402a3996b02d82c13bb275a26d070518e90626d4e654556d970d91b69179f2899065b31685
+ metadata.gz: 2c4aed326afacb2fe2324dd720e302564ab973b7fe69e17daf8f4902fecf7a2bbe34a26b0681dc42eaef14bd511a439a2717a115a7f577f700212d0d605d6dee
+ data.tar.gz: be1bc452b188d233a6c668a008ed9f9e4cd77cf9b24a574559bf27c8c28ab34b0c23d51cc4321ab49c1416a53b0b74571afca698cdf2106d407f744204191362
data/CONTRIBUTING.md CHANGED
File without changes
data/Gemfile CHANGED
@@ -1,4 +1,3 @@
  source 'https://rubygems.org'
 
- # Specify your gem's dependencies in daru.gemspec
  gemspec
data/History.txt CHANGED
@@ -41,3 +41,38 @@
  * #uniq for Vector.
  * #max for Vector can return a Vector object with the index set to the index of the max value.
  * Tonnes of documentation for most methods.
+
+ == 0.0.5
+
+ * Easy accessors for some methods
+ * Faster CSV loading.
+ * Changed vector #is\_valid? to #exists?
+ * Revamped dtype specifiers for Vector. Now specify :array/:nmatrix for changing underlying data implementation. Specigfy nm\_dtype for specifying the data type of the NMatrix object.
+ * #sort for Vector. Quick sort algorithm with preservation of original indexes.
+ * Removed #re\_index and #to\_index from Daru::Index.
+ * Ability to change the index of Vector and DataFrame with #reindex/#reindex!.
+ * Multi-level #sort! and #sort for DataFrames. Preserves indexing.
+ * All vector statistics now work with NMatrix as the underlying data type.
+ * Vectors keep a record of all positions with nils with #nil\_positions.
+ * Know whether a position has nils or not with #is_nil?
+ * Added #clone_structure to Vector for cloning only the index and structure or a vector.
+ * Figure out the type of data using #type. Running thru the data to determine its type is delayed till the last possible moment.
+ * Added arithmetic operations between data frame and scalars or other data frames.
+ * Added #map_vectors!.
+ * Create a DataFrame from Array of Arrays and Array of Vectors.
+ * Refactored DataFrame.rows and the DataFrame constructor.
+ * Added hierarchial indexing to Vector and DataFrame with MultiIndex.
+ * Convert DataFrame to ruby Matrix or NMatrix with #to\_matrix and #to\_nmatrix.
+ * Added #group_by to DataFrame for grouping rows according to elements in a given column. Works similar to SQL GROUP BY, only much simpler.
+ * Added new class Daru::Core::GroupBy for supporting various grouping methods like #head, #tail, #get_group, #size, #count, #mean, #std, #min, #max.
+ * Tranpose indexed/multi-indexed DataFrame with #transpose.
+ * Convert Daru::Vector to horizontal or vertical Ruby Matrix with #to_matrix.
+ * Added shortcut to DataFrame to allow access of vectors by using only #[] instead of calling #vector or *[vector_names, :vector]*.
+ * Added DSL for Vector and DataFrame plotting with nyaplot. Can now grab the underlying Nyaplot::Plot and Nyaplot::Diagram object for performing different operations. Only need to supply parameters for the initial creation of the diagram.
+ * Added #pivot_table to DataFrame for reducing and aggregating data to generate a quick summary.
+ * Added #shape to DataFrame for knowing the numbers of rows and columns in a DataFrame.
+ * Added statistics methods #mean, #std, #max, #min, #count, #product, #sum to DataFrame.
+ * Added #describe to DataFrame for producing multiple statistics data of numerical vectors in one shot.
+ * Monkey patched Ruby Matrix to include #elementwise_division.
+ * Added #covariance to calculate the covariance between numbers of a DataFrame and #correlation to calculate correlation.
+ * Enumerators return Enumerator objects if there is no block.
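Reviewer note: the last changelog entry ("Enumerators return Enumerator objects if there is no block") describes a common Ruby idiom. The sketch below shows that idiom on a hypothetical `TinyVector` class. The class and its names are assumptions made for illustration; they are not part of daru, and daru's own implementation may differ.

```ruby
# Hypothetical stand-in class for illustration only; not daru's implementation.
class TinyVector
  include Enumerable

  def initialize(data)
    @data = data
  end

  # Yields each element when a block is given; otherwise returns an
  # Enumerator so that calls can be chained (e.g. each.with_index).
  def each
    return to_enum(:each) unless block_given?

    @data.each { |element| yield element }
    self
  end
end

vector = TinyVector.new([10, 20, 30])
enum   = vector.each                  # no block given: an Enumerator comes back
pairs  = vector.each.with_index.to_a  # chaining works without a block
```

Returning an Enumerator in the block-less case lets callers compose iteration lazily instead of hitting a `LocalJumpError` from `yield`.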
data/README.md CHANGED
@@ -7,30 +7,35 @@ Data Analysis in RUby
 
  ## Introduction
 
- daru (Data Analysis in RUby) is a library for storage, analysis and manipulation of data.
-
- Development of daru was started to address the fragmentation of Dataframe-like classes which were created in many ruby gems as per their own needs. daru offers a uniform interface for all sorts of data analysis and manipulation operations and aims to be compatible with all ruby gems involved in any way with data.
+ daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data.
 
  daru is inspired by `Statsample::Dataset` and pandas, a very mature solution in Python.
 
- daru works with CRuby (1.9.3+) and JRuby.
+ Written in pure Ruby so should work with all ruby implementations.
 
  ## Features
 
  * Data structures:
  - Vector - A basic 1-D vector.
- - DataFrame - A 2-D matrix-like structure which is internally composed of named `Vector` classes.
- * Compatible with IRuby notebook.
- * Indexed and named data structures.
+ - DataFrame - A 2-D table-like structure which is internally composed of named `Vectors`.
+ * Compatible with [IRuby notebook](https://github.com/minad/iruby) and [statsample](https://github.com/clbustos/statsample).
+ * Singly and hierarchially indexed data structures.
  * Flexible and intuitive API for manipulation and analysis of data.
+ * Easy plotting, statistics and arithmetic.
+ * Plentiful iterators.
+ * Optional speed and space optimization on MRI with [NMatrix](https://github.com/SciRuby/nmatrix).
+ * Easy splitting, aggregation and grouping of data.
+ * Quickly reducing data with pivot tables for quick data summary.
 
  ## Notebooks
 
- * [Analysis and plotting of a data set comprising of music listening habits of a last.fm user(iruby notebook)](http://nbviewer.ipython.org/github/v0dro/daru/blob/master/notebooks/intro_with_music_data_.ipynb)
+ * [Analysis and plotting of a data set comprising of music listening habits of a last.fm user](http://nbviewer.ipython.org/github/v0dro/daru/blob/master/notebooks/intro_with_music_data_.ipynb)
+ * [Basic splitting, grouping and aggregating of data](http://nbviewer.ipython.org/github/v0dro/daru/blob/master/notebooks/grouping_splitting_pivots.ipynb)
 
  ## Blog Posts
 
  * [Data Analysis in RUby: Basic data manipulation and plotting](http://v0dro.github.io/blog/2014/11/25/data-analysis-in-ruby-basic-data-manipulation-and-plotting/)
+ * [Data Analysis in RUby: Splitting, sorting, aggregating data and data types](http://v0dro.github.io/blog/2015/02/24/data-analysis-in-ruby-part-2/)
 
  ## Documentation
 
@@ -38,34 +43,23 @@ Docs can be found [here](https://rubygems.org/gems/daru).
 
  ## Basic Usage
 
- daru has been created with keeping extreme ease of use in mind.
-
- The gem consists of two data structures, Vector and DataFrame. Any data in a serial format is a Vector and a table is a DataFrame.
-
  #### Initialization of DataFrame
 
- A data frame can be initialized from the following sources:
- * Hash of indexed order: `{ b: Daru::Vector.new(:b, [11,12,13,14,15], [:two, :one, :four, :five, :three]), a: Daru::Vector.new(:a, [1,2,3,4,5], [:two,:one,:three, :four, :five])}`.
- * Array of hashes: `[{a: 1, b: 11}, {a: 2, b: 12}, {a: 3, b: 13},{a: 4, b: 14}, {a: 5, b: 15}]`.
- * Hash of names and Arrays: `{b: [11,12,13,14,15], a: [1,2,3,4,5]}`
-
- The DataFrame constructor takes 4 arguments: source, vectors, indexes and name in that order. The last 3 are optional while the first is mandatory.
-
  A basic DataFrame can be initialized like this:
 
  ```ruby
 
- df = Daru::DataFrame.new({b: [11,12,13,14,15], a: [1,2,3,4,5]}, order: [:a, :b], index: [:one, :two, :three, :four, :five])
- df
- # =>
- # # <Daru::DataFrame:87274040 @name = 7308c587-4073-4e7d-b3ca-3679d1dcc946 # @size = 5>
- # a b
- # one 1 11
- # two 2 12
- # three 3 13
- # four 4 14
- # five 5 15
+ df = Daru::DataFrame.new({b: [11,12,13,14,15], a: [1,2,3,4,5]}, order: [:a, :b], index: [:one, :two, :three, :four, :five])
+ df
 
+ # =>
+ # # <Daru::DataFrame:87274040 @name = 7308c587-4073-4e7d-b3ca-3679d1dcc946 # @size = 5>
+ # a b
+ # one 1 11
+ # two 2 12
+ # three 3 13
+ # four 4 14
+ # five 5 15
  ```
  Daru will automatically align the vectors correctly according to the specified index and then create the DataFrame. Thus, elements having the same index will show up in the same row. The indexes will be arranged alphabetically if vectors with unaligned indexes are supplied.
 
@@ -73,69 +67,63 @@ The vectors of the DataFrame will be arranged according to the array specified i
 
  ```ruby
 
- df = Daru::DataFrame.new({
- b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
- a: [1,2,3,4,5].dv(:a, [:two,:one,:three, :four, :five])
- },
- order: [:a, :b]
- )
- df
-
- # =>
- # #<Daru::DataFrame:87363700 @name = 75ba0a14-8291-48ac-ac30-35017e4d6c5f # @size = 5>
- # a b
- # five 5 14
- # four 4 13
- # one 2 12
- # three 3 15
- # two 1 11
+ df = Daru::DataFrame.new({
+ b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
+ a: [1,2,3,4,5].dv(:a, [:two,:one,:three, :four, :five])
+ }, order: [:a, :b]
+ )
+ df
 
+ # =>
+ # #<Daru::DataFrame:87363700 @name = 75ba0a14-8291-48ac-ac30-35017e4d6c5f # @size = 5>
+ # a b
+ # five 5 14
+ # four 4 13
+ # one 2 12
+ # three 3 15
+ # two 1 11
  ```
 
  If an index for the DataFrame is supplied (third argument), then the indexes of the individual vectors will be matched to the DataFrame index. If any of the indexes do not match, nils will be inserted instead:
 
  ```ruby
 
- df = Daru::DataFrame.new({
- b: [11] .dv(nil, [:one]),
- a: [1,2,3] .dv(nil, [:one, :two, :three]),
- c: [11,22,33,44,55] .dv(nil, [:one, :two, :three, :four, :five]),
- d: [49,69,89,99,108,44].dv(nil, [:one, :two, :three, :four, :five, :six])
- }, order: [:a, :b, :c, :d], index: [:one, :two, :three, :four, :five, :six])
- df
- # =>
- # #<Daru::DataFrame:87523270 @name = bda4eb68-afdd-4404-9981-708edab14201 #@size = 6>
- # a b c d
- # one 1 11 11 49
- # two 2 nil 22 69
- # three 3 nil 33 89
- # four nil nil 44 99
- # five nil nil 55 108
- # six nil nil nil 44
-
+ df = Daru::DataFrame.new({
+ b: [11] .dv(nil, [:one]),
+ a: [1,2,3] .dv(nil, [:one, :two, :three]),
+ c: [11,22,33,44,55] .dv(nil, [:one, :two, :three, :four, :five]),
+ d: [49,69,89,99,108,44].dv(nil, [:one, :two, :three, :four, :five, :six])
+ }, order: [:a, :b, :c, :d], index: [:one, :two, :three, :four, :five, :six])
+ df
+ # =>
+ # #<Daru::DataFrame:87523270 @name = bda4eb68-afdd-4404-9981-708edab14201 #@size = 6>
+ # a b c d
+ # one 1 11 11 49
+ # two 2 nil 22 69
+ # three 3 nil 33 89
+ # four nil nil 44 99
+ # five nil nil 55 108
+ # six nil nil nil 44
  ```
 
  If some of the supplied vectors do not contain certain indexes that are contained in other vectors, they are added to those vectors and the correspoding elements are set to `nil`.
 
  ```ruby
 
- df = Daru::DataFrame.new({
- b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
- a: [1,2,3] .dv(:a, [:two,:one,:three])
- },
- order: [:a, :b]
- )
- df
-
- # =>
- # #<Daru::DataFrame:87612510 @name = 1e904c15-e095-4dce-bfdf-c07ee4d6e4a4 # @size = 5>
- # a b
- # five nil 14
- # four nil 13
- # one 2 12
- # three 3 15
- # two 1 11
-
+ df = Daru::DataFrame.new({
+ b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
+ a: [1,2,3] .dv(:a, [:two,:one,:three])
+ }, order: [:a, :b])
+ df
+
+ # =>
+ # #<Daru::DataFrame:87612510 @name = 1e904c15-e095-4dce-bfdf-c07ee4d6e4a4 # @size = 5>
+ # a b
+ # five nil 14
+ # four nil 13
+ # one 2 12
+ # three 3 15
+ # two 1 11
  ```
 
  #### Initialization of Vector
@@ -146,36 +134,32 @@ In the simplest case it can be constructed like this:
 
  ```ruby
 
- dv = Daru::Vector.new [1,2,3,4,5], name: ravan, index: [:ek, :don, :teen, :char, :pach]
- dv
-
- # =>
- # #<Daru::Vector:87630270 @name = ravan @size = 5 >
- # ravan
- # ek 1
- # don 2
- # teen 3
- # char 4
- # pach 5
-
+ dv = Daru::Vector.new [1,2,3,4,5], name: ravan, index: [:ek, :don, :teen, :char, :pach]
+ dv
+ # =>
+ # #<Daru::Vector:87630270 @name = ravan @size = 5 >
+ # ravan
+ # ek 1
+ # don 2
+ # teen 3
+ # char 4
+ # pach 5
  ```
 
  Initializing a vector with indexes will insert nils in places where elements dont exist:
 
  ```ruby
 
- dv = Daru::Vector.new [1,2,3], name: yoga, index: [0,1,2,3,4]
- dv
- # =>
- # #<Daru::Vector:87890840 @name = yoga @size = 5 >
- # y
- # 0 1
- # 1 2
- # 2 3
- # 3 nil
- # 4 nil
-
-
+ dv = Daru::Vector.new [1,2,3], name: yoga, index: [0,1,2,3,4]
+ dv
+ # =>
+ # #<Daru::Vector:87890840 @name = yoga @size = 5 >
+ # y
+ # 0 1
+ # 1 2
+ # 2 3
+ # 3 nil
+ # 4 nil
  ```
 
  #### Basic Selection Operations
@@ -184,34 +168,32 @@ Initialize a dataframe:
 
  ```ruby
 
- df = Daru::DataFrame.new({
- b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
- a: [1,2,3,4,5].dv(:a, [:two,:one,:three, :four, :five])
- },
- order: [:a, :b]
- )
-
- # =>
- # #<Daru::DataFrame:87455010 @name = b3d14e23-98c2-4741-a563-92e8f1fd0f13 # @size = 5>
- # a b
- # five 5 14
- # four 4 13
- # one 2 12
- # three 3 15
- # two 1 11
+ df = Daru::DataFrame.new({
+ b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
+ a: [1,2,3,4,5].dv(:a, [:two,:one,:three, :four, :five])
+ }, order: [:a, :b])
+
+ # =>
+ # #<Daru::DataFrame:87455010 @name = b3d14e23-98c2-4741-a563-92e8f1fd0f13 # @size = 5>
+ # a b
+ # five 5 14
+ # four 4 13
+ # one 2 12
+ # three 3 15
+ # two 1 11
 
  ```
  Select a row from a DataFrame:
 
  ```ruby
 
- df.row[:one]
+ df.row[:one]
 
- # =>
- # #<Daru::Vector:87432070 @name = one @size = 2 >
- # one
- # a 2
- # b 12
+ # =>
+ # #<Daru::Vector:87432070 @name = one @size = 2 >
+ # one
+ # a 2
+ # b 12
  ```
  A row or a vector is returned as a `Daru::Vector` object, so any manipulations supported by `Daru::Vector` can be performed on the chosen row as well.
 
@@ -233,101 +215,92 @@ Select a single vector:
 
  ```ruby
 
- df.vector[:a] # or simply df.a
-
- # =>
- # #<Daru::Vector:87454270 @name = a @size = 5 >
- # a
- # five 5
- # four 4
- # one 2
- # three 3
- # two 1
-
+ df.vector[:a] # or simply df.a
+
+ # =>
+ # #<Daru::Vector:87454270 @name = a @size = 5 >
+ # a
+ # five 5
+ # four 4
+ # one 2
+ # three 3
+ # two 1
  ```
 
  Select multiple vectors and return a DataFrame in the specified order:
 
  ```ruby
 
- df.vector[:b, :a]
- # =>
- # #<Daru::DataFrame:87835960 @name = e80902cc-cff9-4b23-9eca-5da36ebc88a8 # @size = 5>
- # b a
- # five 14 5
- # four 13 4
- # one 12 2
- # three 15 3
- # two 11 1
-
+ df.vector[:b, :a]
+ # =>
+ # #<Daru::DataFrame:87835960 @name = e80902cc-cff9-4b23-9eca-5da36ebc88a8 # @size = 5>
+ # b a
+ # five 14 5
+ # four 13 4
+ # one 12 2
+ # three 15 3
+ # two 11 1
  ```
 
  Keep/remove row according to a specified condition:
 
  ```ruby
 
- df = df.filter_rows do |row|
- row[:a] == 5
- end
-
- df
- # =>
- # #<Daru::DataFrame:87455010 @name = b3d14e23-98c2-4741-a563-92e8f1fd0f13 # @size = 1>
- # a b
- # five 5 14
-
+ df = df.filter_rows do |row|
+ row[:a] == 5
+ end
+ df
+ # =>
+ # #<Daru::DataFrame:87455010 @name = b3d14e23-98c2-4741-a563-92e8f1fd0f13 # @size = 1>
+ # a b
+ # five 5 14
  ```
  The same can be applied to vectors using `filter_vectors`.
 
- To iterate over a DataFrame and perform operations on rows or vectors, use `#each_row` or `#each_vector`.
-
  To change the values of a row/vector while iterating through the DataFrame, use `map_rows` or `map_vectors`:
 
  ```ruby
 
- df.map_rows do |row|
- row = row * row
- end
-
- df
-
- # =>
- # #<Daru::DataFrame:86826830 @name = b092ca5b-7b83-4dbe-a469-124f7f25a568 # @size = 5>
- # a b
- # five 25 196
- # four 16 169
- # one 4 144
- # three 9 225
- # two 1 121
-
+ df.map_rows do |row|
+ row = row * row
+ end
+
+ df
+ # =>
+ # #<Daru::DataFrame:86826830 @name = b092ca5b-7b83-4dbe-a469-124f7f25a568 # @size = 5>
+ # a b
+ # five 25 196
+ # four 16 169
+ # one 4 144
+ # three 9 225
+ # two 1 121
  ```
 
- Rows/vectors can be deleted using `delete_row` or `delete_vector`.
-
  #### Basic Maths Operations
 
  Performing a binary arithmetic operation on two `Daru::Vector` objects will return a `Vector` object in which the operation will be performed on elements of the same index.
 
  ```ruby
 
- dv1 = Daru::Vector.new [1,2,3,4], name: :boozy, index: [:a, :b, :c, :d]
-
- dv2 = Daru::Vector.new [1,2,3,4], name: :mayer, index: [:e, :f, :b, :d]
+ dv1 = Daru::Vector.new [1,2,3,4], name: :boozy, index: [:a, :b, :c, :d]
+ dv2 = Daru::Vector.new [1,2,3,4], name: :mayer, index: [:e, :f, :b, :d]
+ dv1 * dv2
 
- dv1 * dv2
-
- # #<Daru::Vector:80924700 @name = boozy @size = 2 >
- # boozy
- # b 6
- # d 16
-
+ # #<Daru::Vector:80924700 @name = boozy @size = 2 >
+ # boozy
+ # b 6
+ # d 16
  ```
 
  Arithmetic operators applied on a single Numeric will perform the operation with that number against the entire vector.
 
- #### Statistics Operations
+ Same applies to DataFrame as well.
+
+ #### Splitting and aggregation of data
 
- Daru::Vector has a whole lot of statistics operations to maintain compatibility with Statsample::Vector. Check the docs for details.
+ `Daru::DataFrame` provides the `#group_by` method to split or aggregate data. Its very similar to SQL GROUP BY. Check the [blog post]() for details.
+
+ You can also generate Excel-style pivot tables with `#pivot_table`.
 
  #### Plotting
 
@@ -335,35 +308,42 @@ daru uses [Nyaplot](https://github.com/domitry/nyaplot) for plotting and an exam
 
  Head over to the tutorials and notebooks listed above for more examples.
 
+ #### Working with missing data
+
+ Missing data is an integral part of any data analysis operation and [this blog post](http://v0dro.github.io/blog/2015/02/24/data-analysis-in-ruby-part-2/) provides details on dealing with missing data.
+
  ## Roadmap
 
  * Automate testing for both MRI and JRuby.
  * Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
  * Destructive map iterators for DataFrame.
- * Completely test all functionality for NMatrix and MDArray.
+ * Completely test all functionality for MDArray.
  * Basic Data manipulation and analysis operations:
  - Different kinds of join operations
- - Dataframe/vector merge
- - Creation of correlation, covariance matrices
+ - Dataframe/vector merge (left, right, inner, outer)
  - Verification of data in a vector
- * Transpose a dataframe.
+ - DF concat
  * Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
  * Assignment of a column to a single number should set the entire column to that number.
  * == between daru_vector and string/number.
  * Multiple column assignment with []=
- * Creation of DataFrame from Array of Arrays.
  * Multiple value assignment for vectors with []=.
  * Load DataFrame from multiple sources (excel, SQL, etc.).
  * Deletion of elements from Vector should only modify the index and leave the vector as it is so that compacting is not needed and things are faster.
- * Add a #sync method which will sync the modified index with the unmodified vector.
- * Ability to reorder the index of a dataframe.
- * head/tail for DV.
  * #find\_max function which will evaluate a block and return the row for the value of the block is max.
  * Function to check if a value of a row/vector is within a specified range.
  * Create a new vector in map_rows if any of the already present rows dont match the one assigned in the block.
- * Direct functions to answer something like 'number of something per thousand of something else'.
- * Tests for checking NMatrix resizing
- * Sort while preserving index.
+ * Sort by index.
+ * Statistics on DataFrame over rows and columns.
+ * Cumulative sum.
+ * Time series support.
+ * Calculate percentage change.
+ * Working with missing data - drop\_missing\_data, dropping rows with missing data.
+ * Have some sample data sets for users to play around with. Should be able to load these from the code itself.
+ * Sorting with missing data present.
+ * Make vectors aware of the data frame that they are a part of.
+ * re_index should re establish previous index values in the newly supplied index.
+ * Reset index.
 
  ## Contributing
 
@@ -373,5 +353,5 @@ Pick a feature from the Roadmap above or think of your own and send me a Pull Re
 
  * Thank you [last.fm](http://www.last.fm/) for making user data accessible to the public.
 
- Copyright (c) 2014, Sameer Deshmukh
+ Copyright (c) 2015, Sameer Deshmukh
  All rights reserved