daru 0.0.4 → 0.0.5
- checksums.yaml +4 -4
- data/CONTRIBUTING.md +0 -0
- data/Gemfile +0 -1
- data/History.txt +35 -0
- data/README.md +178 -198
- data/daru.gemspec +5 -7
- data/lib/daru.rb +10 -2
- data/lib/daru/accessors/array_wrapper.rb +36 -198
- data/lib/daru/accessors/nmatrix_wrapper.rb +60 -209
- data/lib/daru/core/group_by.rb +183 -0
- data/lib/daru/dataframe.rb +615 -167
- data/lib/daru/index.rb +17 -16
- data/lib/daru/io/io.rb +5 -12
- data/lib/daru/maths/arithmetic/dataframe.rb +72 -8
- data/lib/daru/maths/arithmetic/vector.rb +19 -6
- data/lib/daru/maths/statistics/dataframe.rb +103 -2
- data/lib/daru/maths/statistics/vector.rb +102 -61
- data/lib/daru/monkeys.rb +8 -0
- data/lib/daru/multi_index.rb +199 -0
- data/lib/daru/plotting/dataframe.rb +24 -24
- data/lib/daru/plotting/vector.rb +14 -15
- data/lib/daru/vector.rb +402 -98
- data/lib/version.rb +1 -1
- data/notebooks/grouping_splitting_pivots.ipynb +529 -0
- data/notebooks/intro_with_music_data_.ipynb +104 -119
- data/spec/accessors/wrappers_spec.rb +36 -0
- data/spec/core/group_by_spec.rb +331 -0
- data/spec/dataframe_spec.rb +1237 -475
- data/spec/fixtures/sales-funnel.csv +18 -0
- data/spec/index_spec.rb +10 -21
- data/spec/io/io_spec.rb +4 -14
- data/spec/math/arithmetic/dataframe_spec.rb +66 -0
- data/spec/math/arithmetic/vector_spec.rb +45 -4
- data/spec/math/statistics/dataframe_spec.rb +91 -1
- data/spec/math/statistics/vector_spec.rb +32 -6
- data/spec/monkeys_spec.rb +10 -1
- data/spec/multi_index_spec.rb +216 -0
- data/spec/spec_helper.rb +1 -0
- data/spec/vector_spec.rb +505 -57
- metadata +21 -15
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fd2dec0795f15ca1e45bdad5238fb7dbe33e1089
+  data.tar.gz: 634ff6e6b533cad019893a6e248706c824933e1d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 2c4aed326afacb2fe2324dd720e302564ab973b7fe69e17daf8f4902fecf7a2bbe34a26b0681dc42eaef14bd511a439a2717a115a7f577f700212d0d605d6dee
+  data.tar.gz: be1bc452b188d233a6c668a008ed9f9e4cd77cf9b24a574559bf27c8c28ab34b0c23d51cc4321ab49c1416a53b0b74571afca698cdf2106d407f744204191362
data/CONTRIBUTING.md
ADDED
File without changes
|
data/Gemfile
CHANGED
data/History.txt
CHANGED
@@ -41,3 +41,38 @@
 * #uniq for Vector.
 * #max for Vector can return a Vector object with the index set to the index of the max value.
 * Tonnes of documentation for most methods.
+
+== 0.0.5
+
+* Easy accessors for some methods
+* Faster CSV loading.
+* Changed vector #is\_valid? to #exists?
+* Revamped dtype specifiers for Vector. Now specify :array/:nmatrix for changing underlying data implementation. Specify nm\_dtype for specifying the data type of the NMatrix object.
+* #sort for Vector. Quick sort algorithm with preservation of original indexes.
+* Removed #re\_index and #to\_index from Daru::Index.
+* Ability to change the index of Vector and DataFrame with #reindex/#reindex!.
+* Multi-level #sort! and #sort for DataFrames. Preserves indexing.
+* All vector statistics now work with NMatrix as the underlying data type.
+* Vectors keep a record of all positions with nils with #nil\_positions.
+* Know whether a position has nils or not with #is_nil?
+* Added #clone_structure to Vector for cloning only the index and structure of a vector.
+* Figure out the type of data using #type. Running through the data to determine its type is delayed till the last possible moment.
+* Added arithmetic operations between data frame and scalars or other data frames.
+* Added #map_vectors!.
+* Create a DataFrame from Array of Arrays and Array of Vectors.
+* Refactored DataFrame.rows and the DataFrame constructor.
+* Added hierarchical indexing to Vector and DataFrame with MultiIndex.
+* Convert DataFrame to ruby Matrix or NMatrix with #to\_matrix and #to\_nmatrix.
+* Added #group_by to DataFrame for grouping rows according to elements in a given column. Works similar to SQL GROUP BY, only much simpler.
+* Added new class Daru::Core::GroupBy for supporting various grouping methods like #head, #tail, #get_group, #size, #count, #mean, #std, #min, #max.
+* Transpose indexed/multi-indexed DataFrame with #transpose.
+* Convert Daru::Vector to horizontal or vertical Ruby Matrix with #to_matrix.
+* Added shortcut to DataFrame to allow access of vectors by using only #[] instead of calling #vector or *[vector_names, :vector]*.
+* Added DSL for Vector and DataFrame plotting with nyaplot. Can now grab the underlying Nyaplot::Plot and Nyaplot::Diagram object for performing different operations. Only need to supply parameters for the initial creation of the diagram.
+* Added #pivot_table to DataFrame for reducing and aggregating data to generate a quick summary.
+* Added #shape to DataFrame for knowing the numbers of rows and columns in a DataFrame.
+* Added statistics methods #mean, #std, #max, #min, #count, #product, #sum to DataFrame.
+* Added #describe to DataFrame for producing multiple statistics data of numerical vectors in one shot.
+* Monkey patched Ruby Matrix to include #elementwise_division.
+* Added #covariance to calculate the covariance between numbers of a DataFrame and #correlation to calculate correlation.
+* Enumerators return Enumerator objects if there is no block.
data/README.md
CHANGED
@@ -7,30 +7,35 @@ Data Analysis in RUby
|
|
7
7
|
|
8
8
|
## Introduction
|
9
9
|
|
10
|
-
daru (Data Analysis in RUby) is a library for storage, analysis and
|
11
|
-
|
12
|
-
Development of daru was started to address the fragmentation of Dataframe-like classes which were created in many ruby gems as per their own needs. daru offers a uniform interface for all sorts of data analysis and manipulation operations and aims to be compatible with all ruby gems involved in any way with data.
|
10
|
+
daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data.
|
13
11
|
|
14
12
|
daru is inspired by `Statsample::Dataset` and pandas, a very mature solution in Python.
|
15
13
|
|
16
|
-
|
14
|
+
Written in pure Ruby, so it should work with all Ruby implementations.
|
17
15
|
|
18
16
|
## Features
|
19
17
|
|
20
18
|
* Data structures:
|
21
19
|
- Vector - A basic 1-D vector.
|
22
|
-
- DataFrame - A 2-D
|
23
|
-
* Compatible with IRuby notebook.
|
24
|
-
*
|
20
|
+
- DataFrame - A 2-D table-like structure which is internally composed of named `Vectors`.
|
21
|
+
* Compatible with [IRuby notebook](https://github.com/minad/iruby) and [statsample](https://github.com/clbustos/statsample).
|
22
|
+
* Singly and hierarchically indexed data structures.
|
25
23
|
* Flexible and intuitive API for manipulation and analysis of data.
|
24
|
+
* Easy plotting, statistics and arithmetic.
|
25
|
+
* Plentiful iterators.
|
26
|
+
* Optional speed and space optimization on MRI with [NMatrix](https://github.com/SciRuby/nmatrix).
|
27
|
+
* Easy splitting, aggregation and grouping of data.
|
28
|
+
* Quick reduction of data with pivot tables for fast data summaries.
|
26
29
|
|
27
30
|
## Notebooks
|
28
31
|
|
29
|
-
* [Analysis and plotting of a data set comprising of music listening habits of a last.fm user
|
32
|
+
* [Analysis and plotting of a data set comprising of music listening habits of a last.fm user](http://nbviewer.ipython.org/github/v0dro/daru/blob/master/notebooks/intro_with_music_data_.ipynb)
|
33
|
+
* [Basic splitting, grouping and aggregating of data](http://nbviewer.ipython.org/github/v0dro/daru/blob/master/notebooks/grouping_splitting_pivots.ipynb)
|
30
34
|
|
31
35
|
## Blog Posts
|
32
36
|
|
33
37
|
* [Data Analysis in RUby: Basic data manipulation and plotting](http://v0dro.github.io/blog/2014/11/25/data-analysis-in-ruby-basic-data-manipulation-and-plotting/)
|
38
|
+
* [Data Analysis in RUby: Splitting, sorting, aggregating data and data types](http://v0dro.github.io/blog/2015/02/24/data-analysis-in-ruby-part-2/)
|
34
39
|
|
35
40
|
## Documentation
|
36
41
|
|
@@ -38,34 +43,23 @@ Docs can be found [here](https://rubygems.org/gems/daru).
|
|
38
43
|
|
39
44
|
## Basic Usage
|
40
45
|
|
41
|
-
daru has been created with keeping extreme ease of use in mind.
|
42
|
-
|
43
|
-
The gem consists of two data structures, Vector and DataFrame. Any data in a serial format is a Vector and a table is a DataFrame.
|
44
|
-
|
45
46
|
#### Initialization of DataFrame
|
46
47
|
|
47
|
-
A data frame can be initialized from the following sources:
|
48
|
-
* Hash of indexed order: `{ b: Daru::Vector.new(:b, [11,12,13,14,15], [:two, :one, :four, :five, :three]), a: Daru::Vector.new(:a, [1,2,3,4,5], [:two,:one,:three, :four, :five])}`.
|
49
|
-
* Array of hashes: `[{a: 1, b: 11}, {a: 2, b: 12}, {a: 3, b: 13},{a: 4, b: 14}, {a: 5, b: 15}]`.
|
50
|
-
* Hash of names and Arrays: `{b: [11,12,13,14,15], a: [1,2,3,4,5]}`
|
51
|
-
|
52
|
-
The DataFrame constructor takes 4 arguments: source, vectors, indexes and name in that order. The last 3 are optional while the first is mandatory.
|
53
|
-
|
54
48
|
A basic DataFrame can be initialized like this:
|
55
49
|
|
56
50
|
```ruby
|
57
51
|
|
58
|
-
|
59
|
-
|
60
|
-
# =>
|
61
|
-
# # <Daru::DataFrame:87274040 @name = 7308c587-4073-4e7d-b3ca-3679d1dcc946 # @size = 5>
|
62
|
-
# a b
|
63
|
-
# one 1 11
|
64
|
-
# two 2 12
|
65
|
-
# three 3 13
|
66
|
-
# four 4 14
|
67
|
-
# five 5 15
|
52
|
+
df = Daru::DataFrame.new({b: [11,12,13,14,15], a: [1,2,3,4,5]}, order: [:a, :b], index: [:one, :two, :three, :four, :five])
|
53
|
+
df
|
68
54
|
|
55
|
+
# =>
|
56
|
+
# # <Daru::DataFrame:87274040 @name = 7308c587-4073-4e7d-b3ca-3679d1dcc946 # @size = 5>
|
57
|
+
# a b
|
58
|
+
# one 1 11
|
59
|
+
# two 2 12
|
60
|
+
# three 3 13
|
61
|
+
# four 4 14
|
62
|
+
# five 5 15
|
69
63
|
```
|
70
64
|
Daru will automatically align the vectors correctly according to the specified index and then create the DataFrame. Thus, elements having the same index will show up in the same row. The indexes will be arranged alphabetically if vectors with unaligned indexes are supplied.
|
71
65
|
|
@@ -73,69 +67,63 @@ The vectors of the DataFrame will be arranged according to the array specified i
|
|
73
67
|
|
74
68
|
```ruby
|
75
69
|
|
76
|
-
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
df
|
83
|
-
|
84
|
-
# =>
|
85
|
-
# #<Daru::DataFrame:87363700 @name = 75ba0a14-8291-48ac-ac30-35017e4d6c5f # @size = 5>
|
86
|
-
# a b
|
87
|
-
# five 5 14
|
88
|
-
# four 4 13
|
89
|
-
# one 2 12
|
90
|
-
# three 3 15
|
91
|
-
# two 1 11
|
70
|
+
df = Daru::DataFrame.new({
|
71
|
+
b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
|
72
|
+
a: [1,2,3,4,5].dv(:a, [:two,:one,:three, :four, :five])
|
73
|
+
}, order: [:a, :b]
|
74
|
+
)
|
75
|
+
df
|
92
76
|
|
77
|
+
# =>
|
78
|
+
# #<Daru::DataFrame:87363700 @name = 75ba0a14-8291-48ac-ac30-35017e4d6c5f # @size = 5>
|
79
|
+
# a b
|
80
|
+
# five 5 14
|
81
|
+
# four 4 13
|
82
|
+
# one 2 12
|
83
|
+
# three 3 15
|
84
|
+
# two 1 11
|
93
85
|
```
|
94
86
|
|
95
87
|
If an index for the DataFrame is supplied (the `index:` option), then the indexes of the individual vectors will be matched to the DataFrame index. If any of the indexes do not match, nils will be inserted instead:
|
96
88
|
|
97
89
|
```ruby
|
98
90
|
|
99
|
-
|
100
|
-
|
101
|
-
|
102
|
-
|
103
|
-
|
104
|
-
|
105
|
-
|
106
|
-
|
107
|
-
|
108
|
-
|
109
|
-
|
110
|
-
|
111
|
-
|
112
|
-
|
113
|
-
|
114
|
-
|
115
|
-
|
91
|
+
df = Daru::DataFrame.new({
|
92
|
+
b: [11] .dv(nil, [:one]),
|
93
|
+
a: [1,2,3] .dv(nil, [:one, :two, :three]),
|
94
|
+
c: [11,22,33,44,55] .dv(nil, [:one, :two, :three, :four, :five]),
|
95
|
+
d: [49,69,89,99,108,44].dv(nil, [:one, :two, :three, :four, :five, :six])
|
96
|
+
}, order: [:a, :b, :c, :d], index: [:one, :two, :three, :four, :five, :six])
|
97
|
+
df
|
98
|
+
# =>
|
99
|
+
# #<Daru::DataFrame:87523270 @name = bda4eb68-afdd-4404-9981-708edab14201 #@size = 6>
|
100
|
+
# a b c d
|
101
|
+
# one 1 11 11 49
|
102
|
+
# two 2 nil 22 69
|
103
|
+
# three 3 nil 33 89
|
104
|
+
# four nil nil 44 99
|
105
|
+
# five nil nil 55 108
|
106
|
+
# six nil nil nil 44
|
116
107
|
```
|
117
108
|
|
118
109
|
If some of the supplied vectors do not contain certain indexes that are contained in other vectors, they are added to those vectors and the corresponding elements are set to `nil`.
|
119
110
|
|
120
111
|
```ruby
|
121
112
|
|
122
|
-
|
123
|
-
|
124
|
-
|
125
|
-
|
126
|
-
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
|
131
|
-
|
132
|
-
|
133
|
-
|
134
|
-
|
135
|
-
|
136
|
-
# three 3 15
|
137
|
-
# two 1 11
|
138
|
-
|
113
|
+
df = Daru::DataFrame.new({
|
114
|
+
b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
|
115
|
+
a: [1,2,3] .dv(:a, [:two,:one,:three])
|
116
|
+
}, order: [:a, :b])
|
117
|
+
df
|
118
|
+
|
119
|
+
# =>
|
120
|
+
# #<Daru::DataFrame:87612510 @name = 1e904c15-e095-4dce-bfdf-c07ee4d6e4a4 # @size = 5>
|
121
|
+
# a b
|
122
|
+
# five nil 14
|
123
|
+
# four nil 13
|
124
|
+
# one 2 12
|
125
|
+
# three 3 15
|
126
|
+
# two 1 11
|
139
127
|
```
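The alignment rule described above can be sketched in plain Ruby, with Hashes standing in for indexed vectors. This is only an illustration of the behaviour visible in the output, not daru's implementation; the helper name `align_columns` is hypothetical.

```ruby
# Plain-Ruby sketch of index alignment (an assumption-laden illustration,
# not daru's actual code). Each column is a Hash of index label => value.
def align_columns(columns)
  # Union of every label seen in any column, sorted alphabetically.
  index = columns.values.flat_map(&:keys).uniq.sort
  # Rebuild each column over the full index; missing labels become nil.
  aligned = columns.transform_values do |col|
    index.map { |label| [label, col[label]] }.to_h
  end
  [index, aligned]
end

columns = {
  a: { two: 1, one: 2, three: 3 },
  b: { two: 11, one: 12, four: 13, five: 14, three: 15 }
}
index, aligned = align_columns(columns)
```

Here `index` ends up as `[:five, :four, :one, :three, :two]`, and column `a` holds `nil` under `:five` and `:four`, mirroring the DataFrame printed above.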
|
140
128
|
|
141
129
|
#### Initialization of Vector
|
@@ -146,36 +134,32 @@ In the simplest case it can be constructed like this:
|
|
146
134
|
|
147
135
|
```ruby
|
148
136
|
|
149
|
-
|
150
|
-
|
151
|
-
|
152
|
-
|
153
|
-
|
154
|
-
|
155
|
-
|
156
|
-
|
157
|
-
|
158
|
-
|
159
|
-
# pach 5
|
160
|
-
|
137
|
+
dv = Daru::Vector.new [1,2,3,4,5], name: :ravan, index: [:ek, :don, :teen, :char, :pach]
|
138
|
+
dv
|
139
|
+
# =>
|
140
|
+
# #<Daru::Vector:87630270 @name = ravan @size = 5 >
|
141
|
+
# ravan
|
142
|
+
# ek 1
|
143
|
+
# don 2
|
144
|
+
# teen 3
|
145
|
+
# char 4
|
146
|
+
# pach 5
|
161
147
|
```
|
162
148
|
|
163
149
|
Initializing a vector with indexes will insert nils in places where elements don't exist:
|
164
150
|
|
165
151
|
```ruby
|
166
152
|
|
167
|
-
|
168
|
-
|
169
|
-
|
170
|
-
|
171
|
-
|
172
|
-
|
173
|
-
|
174
|
-
|
175
|
-
|
176
|
-
|
177
|
-
|
178
|
-
|
153
|
+
dv = Daru::Vector.new [1,2,3], name: :yoga, index: [0,1,2,3,4]
|
154
|
+
dv
|
155
|
+
# =>
|
156
|
+
# #<Daru::Vector:87890840 @name = yoga @size = 5 >
|
157
|
+
# y
|
158
|
+
# 0 1
|
159
|
+
# 1 2
|
160
|
+
# 2 3
|
161
|
+
# 3 nil
|
162
|
+
# 4 nil
|
179
163
|
```
|
180
164
|
|
181
165
|
#### Basic Selection Operations
|
@@ -184,34 +168,32 @@ Initialize a dataframe:
|
|
184
168
|
|
185
169
|
```ruby
|
186
170
|
|
187
|
-
|
188
|
-
|
189
|
-
|
190
|
-
|
191
|
-
|
192
|
-
|
193
|
-
|
194
|
-
|
195
|
-
|
196
|
-
|
197
|
-
|
198
|
-
|
199
|
-
|
200
|
-
# three 3 15
|
201
|
-
# two 1 11
|
171
|
+
df = Daru::DataFrame.new({
|
172
|
+
b: [11,12,13,14,15].dv(:b, [:two, :one, :four, :five, :three]),
|
173
|
+
a: [1,2,3,4,5].dv(:a, [:two,:one,:three, :four, :five])
|
174
|
+
}, order: [:a, :b])
|
175
|
+
|
176
|
+
# =>
|
177
|
+
# #<Daru::DataFrame:87455010 @name = b3d14e23-98c2-4741-a563-92e8f1fd0f13 # @size = 5>
|
178
|
+
# a b
|
179
|
+
# five 5 14
|
180
|
+
# four 4 13
|
181
|
+
# one 2 12
|
182
|
+
# three 3 15
|
183
|
+
# two 1 11
|
202
184
|
|
203
185
|
```
|
204
186
|
Select a row from a DataFrame:
|
205
187
|
|
206
188
|
```ruby
|
207
189
|
|
208
|
-
|
190
|
+
df.row[:one]
|
209
191
|
|
210
|
-
|
211
|
-
|
212
|
-
|
213
|
-
|
214
|
-
|
192
|
+
# =>
|
193
|
+
# #<Daru::Vector:87432070 @name = one @size = 2 >
|
194
|
+
# one
|
195
|
+
# a 2
|
196
|
+
# b 12
|
215
197
|
```
|
216
198
|
A row or a vector is returned as a `Daru::Vector` object, so any manipulations supported by `Daru::Vector` can be performed on the chosen row as well.
|
217
199
|
|
@@ -233,101 +215,92 @@ Select a single vector:
|
|
233
215
|
|
234
216
|
```ruby
|
235
217
|
|
236
|
-
|
237
|
-
|
238
|
-
|
239
|
-
|
240
|
-
|
241
|
-
|
242
|
-
|
243
|
-
|
244
|
-
|
245
|
-
|
246
|
-
|
218
|
+
df.vector[:a] # or simply df.a
|
219
|
+
|
220
|
+
# =>
|
221
|
+
# #<Daru::Vector:87454270 @name = a @size = 5 >
|
222
|
+
# a
|
223
|
+
# five 5
|
224
|
+
# four 4
|
225
|
+
# one 2
|
226
|
+
# three 3
|
227
|
+
# two 1
|
247
228
|
```
|
248
229
|
|
249
230
|
Select multiple vectors and return a DataFrame in the specified order:
|
250
231
|
|
251
232
|
```ruby
|
252
233
|
|
253
|
-
|
254
|
-
|
255
|
-
|
256
|
-
|
257
|
-
|
258
|
-
|
259
|
-
|
260
|
-
|
261
|
-
|
262
|
-
|
234
|
+
df.vector[:b, :a]
|
235
|
+
# =>
|
236
|
+
# #<Daru::DataFrame:87835960 @name = e80902cc-cff9-4b23-9eca-5da36ebc88a8 # @size = 5>
|
237
|
+
# b a
|
238
|
+
# five 14 5
|
239
|
+
# four 13 4
|
240
|
+
# one 12 2
|
241
|
+
# three 15 3
|
242
|
+
# two 11 1
|
263
243
|
```
|
264
244
|
|
265
245
|
Keep/remove row according to a specified condition:
|
266
246
|
|
267
247
|
```ruby
|
268
248
|
|
269
|
-
|
270
|
-
|
271
|
-
|
272
|
-
|
273
|
-
|
274
|
-
|
275
|
-
|
276
|
-
|
277
|
-
# five 5 14
|
278
|
-
|
249
|
+
df = df.filter_rows do |row|
|
250
|
+
row[:a] == 5
|
251
|
+
end
|
252
|
+
df
|
253
|
+
# =>
|
254
|
+
# #<Daru::DataFrame:87455010 @name = b3d14e23-98c2-4741-a563-92e8f1fd0f13 # @size = 1>
|
255
|
+
# a b
|
256
|
+
# five 5 14
|
279
257
|
```
|
280
258
|
The same can be applied to vectors using `filter_vectors`.
|
281
259
|
|
282
|
-
To iterate over a DataFrame and perform operations on rows or vectors, use `#each_row` or `#each_vector`.
|
283
|
-
|
284
260
|
To change the values of a row/vector while iterating through the DataFrame, use `map_rows` or `map_vectors`:
|
285
261
|
|
286
262
|
```ruby
|
287
263
|
|
288
|
-
|
289
|
-
|
290
|
-
|
291
|
-
|
292
|
-
|
293
|
-
|
294
|
-
|
295
|
-
|
296
|
-
|
297
|
-
|
298
|
-
|
299
|
-
|
300
|
-
|
301
|
-
# two 1 121
|
302
|
-
|
264
|
+
df.map_rows do |row|
|
265
|
+
row = row * row
|
266
|
+
end
|
267
|
+
|
268
|
+
df
|
269
|
+
# =>
|
270
|
+
# #<Daru::DataFrame:86826830 @name = b092ca5b-7b83-4dbe-a469-124f7f25a568 # @size = 5>
|
271
|
+
# a b
|
272
|
+
# five 25 196
|
273
|
+
# four 16 169
|
274
|
+
# one 4 144
|
275
|
+
# three 9 225
|
276
|
+
# two 1 121
|
303
277
|
```
|
304
278
|
|
305
|
-
Rows/vectors can be deleted using `delete_row` or `delete_vector`.
|
306
|
-
|
307
279
|
#### Basic Maths Operations
|
308
280
|
|
309
281
|
Performing a binary arithmetic operation on two `Daru::Vector` objects will return a `Vector` object in which the operation will be performed on elements of the same index.
|
310
282
|
|
311
283
|
```ruby
|
312
284
|
|
313
|
-
|
314
|
-
|
315
|
-
|
285
|
+
dv1 = Daru::Vector.new [1,2,3,4], name: :boozy, index: [:a, :b, :c, :d]
|
286
|
+
dv2 = Daru::Vector.new [1,2,3,4], name: :mayer, index: [:e, :f, :b, :d]
|
287
|
+
dv1 * dv2
|
316
288
|
|
317
|
-
|
318
|
-
|
319
|
-
|
320
|
-
|
321
|
-
# b 6
|
322
|
-
# d 16
|
323
|
-
|
289
|
+
# #<Daru::Vector:80924700 @name = boozy @size = 2 >
|
290
|
+
# boozy
|
291
|
+
# b 6
|
292
|
+
# d 16
|
324
293
|
```
|
325
294
|
|
326
295
|
Arithmetic operators applied on a single Numeric will perform the operation with that number against the entire vector.
|
327
296
|
|
328
|
-
|
297
|
+
The same applies to DataFrame.
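The two arithmetic rules can be sketched in plain Ruby, again with Hashes standing in for indexed vectors. This is a rough illustration under that assumption, not daru's implementation; `multiply_aligned` and `multiply_scalar` are hypothetical helper names.

```ruby
# Plain-Ruby sketch of index-aligned arithmetic (not daru's actual code).
# A "vector" here is just a Hash of index label => value.
def multiply_aligned(left, right)
  # Keep only labels present in both operands, in the left operand's order.
  common = left.keys & right.keys
  common.map { |label| [label, left[label] * right[label]] }.to_h
end

def multiply_scalar(vector, number)
  # A scalar is applied to every element; the index is unchanged.
  vector.transform_values { |value| value * number }
end

dv1 = { a: 1, b: 2, c: 3, d: 4 }
dv2 = { e: 1, f: 2, b: 3, d: 4 }

product = multiply_aligned(dv1, dv2)  # only :b and :d survive
doubled = multiply_scalar(dv1, 2)     # every element multiplied by 2
```

With the same data as the `dv1 * dv2` example, `product` contains 6 under `:b` and 16 under `:d`, matching the output shown there.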
|
298
|
+
|
299
|
+
#### Splitting and aggregation of data
|
329
300
|
|
330
|
-
Daru::
|
301
|
+
`Daru::DataFrame` provides the `#group_by` method to split or aggregate data. It's very similar to SQL GROUP BY. Check the [blog post]() for details.
|
302
|
+
|
303
|
+
You can also generate Excel-style pivot tables with `#pivot_table`.
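For readers unfamiliar with the idea, splitting, aggregating and pivoting can be sketched with Ruby's own `Enumerable#group_by`. This is a conceptual illustration only: it does not use or reproduce daru's `#group_by` or `#pivot_table`, and the column names and figures are invented sample data.

```ruby
# Plain-Ruby sketch of split-apply-combine (not daru's actual code).
# Rows are Hashes; the columns and values are invented sample data.
rows = [
  { city: "Pune",   product: "tea",    sales: 10 },
  { city: "Pune",   product: "coffee", sales: 20 },
  { city: "Mumbai", product: "tea",    sales: 30 },
  { city: "Mumbai", product: "tea",    sales: 50 }
]

# Split: bucket the rows by the value of one column, like GROUP BY city.
groups = rows.group_by { |row| row[:city] }

# Apply and combine: reduce each bucket to its mean sales.
mean_sales = groups.transform_values do |bucket|
  bucket.sum { |row| row[:sales] }.to_f / bucket.size
end

# Pivot: one output row per city, one column per product, summed sales in cells.
pivot = Hash.new { |hash, key| hash[key] = Hash.new(0) }
rows.each { |row| pivot[row[:city]][row[:product]] += row[:sales] }
```

`mean_sales` maps each city to its average, and `pivot` is a nested Hash of city to product to total, which is the shape a pivot table summarises.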
|
331
304
|
|
332
305
|
#### Plotting
|
333
306
|
|
@@ -335,35 +308,42 @@ daru uses [Nyaplot](https://github.com/domitry/nyaplot) for plotting and an exam
|
|
335
308
|
|
336
309
|
Head over to the tutorials and notebooks listed above for more examples.
|
337
310
|
|
311
|
+
#### Working with missing data
|
312
|
+
|
313
|
+
Missing data is an integral part of any data analysis operation and [this blog post](http://v0dro.github.io/blog/2015/02/24/data-analysis-in-ruby-part-2/) provides details on dealing with missing data.
|
314
|
+
|
338
315
|
## Roadmap
|
339
316
|
|
340
317
|
* Automate testing for both MRI and JRuby.
|
341
318
|
* Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
|
342
319
|
* Destructive map iterators for DataFrame.
|
343
|
-
* Completely test all functionality for
|
320
|
+
* Completely test all functionality for MDArray.
|
344
321
|
* Basic Data manipulation and analysis operations:
|
345
322
|
- Different kinds of join operations
|
346
|
-
- Dataframe/vector merge
|
347
|
-
- Creation of correlation, covariance matrices
|
323
|
+
- Dataframe/vector merge (left, right, inner, outer)
|
348
324
|
- Verification of data in a vector
|
349
|
-
|
325
|
+
- DF concat
|
350
326
|
* Option to express a DataFrame as an NMatrix or MDArray so as to use more efficient storage techniques.
|
351
327
|
* Assignment of a column to a single number should set the entire column to that number.
|
352
328
|
* == between daru_vector and string/number.
|
353
329
|
* Multiple column assignment with []=
|
354
|
-
* Creation of DataFrame from Array of Arrays.
|
355
330
|
* Multiple value assignment for vectors with []=.
|
356
331
|
* Load DataFrame from multiple sources (excel, SQL, etc.).
|
357
332
|
* Deletion of elements from Vector should only modify the index and leave the vector as it is so that compacting is not needed and things are faster.
|
358
|
-
* Add a #sync method which will sync the modified index with the unmodified vector.
|
359
|
-
* Ability to reorder the index of a dataframe.
|
360
|
-
* head/tail for DV.
|
361
333
|
* #find\_max function which will evaluate a block and return the row for which the value of the block is max.
|
362
334
|
* Function to check if a value of a row/vector is within a specified range.
|
363
335
|
* Create a new vector in map_rows if any of the already present rows don't match the one assigned in the block.
|
364
|
-
*
|
365
|
-
*
|
366
|
-
*
|
336
|
+
* Sort by index.
|
337
|
+
* Statistics on DataFrame over rows and columns.
|
338
|
+
* Cumulative sum.
|
339
|
+
* Time series support.
|
340
|
+
* Calculate percentage change.
|
341
|
+
* Working with missing data - drop\_missing\_data, dropping rows with missing data.
|
342
|
+
* Have some sample data sets for users to play around with. Should be able to load these from the code itself.
|
343
|
+
* Sorting with missing data present.
|
344
|
+
* Make vectors aware of the data frame that they are a part of.
|
345
|
+
* re_index should re establish previous index values in the newly supplied index.
|
346
|
+
* Reset index.
|
367
347
|
|
368
348
|
## Contributing
|
369
349
|
|
@@ -373,5 +353,5 @@ Pick a feature from the Roadmap above or think of your own and send me a Pull Re
|
|
373
353
|
|
374
354
|
* Thank you [last.fm](http://www.last.fm/) for making user data accessible to the public.
|
375
355
|
|
376
|
-
Copyright (c)
|
356
|
+
Copyright (c) 2015, Sameer Deshmukh
|
377
357
|
All rights reserved
|