daru 0.1.1 → 0.1.2
- checksums.yaml +4 -4
- data/.travis.yml +1 -5
- data/CONTRIBUTING.md +2 -11
- data/History.md +18 -0
- data/README.md +109 -11
- data/daru.gemspec +11 -6
- data/images/README.md +5 -0
- data/images/con0.png +0 -0
- data/images/con1.png +0 -0
- data/images/init0.png +0 -0
- data/images/init1.png +0 -0
- data/images/man0.png +0 -0
- data/images/man1.png +0 -0
- data/images/man2.png +0 -0
- data/images/man3.png +0 -0
- data/images/man4.png +0 -0
- data/images/man5.png +0 -0
- data/images/man6.png +0 -0
- data/images/plot0.png +0 -0
- data/lib/daru.rb +5 -2
- data/lib/daru/core/group_by.rb +45 -45
- data/lib/daru/core/merge.rb +59 -1
- data/lib/daru/dataframe.rb +255 -226
- data/lib/daru/exceptions.rb +2 -0
- data/lib/daru/io/io.rb +41 -19
- data/lib/daru/io/sql_data_source.rb +116 -0
- data/lib/daru/vector.rb +124 -104
- data/lib/daru/version.rb +1 -1
- data/spec/core/group_by_spec.rb +12 -2
- data/spec/core/merge_spec.rb +14 -1
- data/spec/dataframe_spec.rb +189 -158
- data/spec/io/io_spec.rb +80 -2
- data/spec/io/sql_data_source_spec.rb +67 -0
- data/spec/spec_helper.rb +4 -2
- data/spec/support/database_helper.rb +30 -0
- data/spec/vector_spec.rb +45 -46
- metadata +104 -16
- data/.build.sh +0 -14
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: ed2a3e2a4cd9fce8d95af6aac9c3db532eed444f
|
4
|
+
data.tar.gz: 90ca6a62ee824d20f72a9f6689c03f27d7667168
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: e6f3345ef4372e1c45a3d80c0cc61c2b4c72e4c810cfb183f30bfd9285a09639ea39cd0a3597fc63551d7f72398d8d83af4424855018e8a5b2a99274b46625cd
|
7
|
+
data.tar.gz: 65d262b1deec54680a5fdcfecda3530c9fb9450dbd280c18833b655521418ed340f311253221dfd4e018577b063f9d3638d0d600a108c3b98ec5a7cd2dfe98ec
|
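Note on the checksums hunk: `checksums.yaml` records SHA1 and SHA512 hex digests for `metadata.gz` and `data.tar.gz`. The sketch below shows how such digests could be recomputed and compared using Ruby's standard `digest` library. The helper name `checksums_match?` and the sample payload are hypothetical and are not part of the gem.

```ruby
require 'digest'

# Hypothetical helper (not part of daru or RubyGems): returns true when both
# digests of the given bytes equal the expected hex strings.
def checksums_match?(bytes, sha1:, sha512:)
  Digest::SHA1.hexdigest(bytes) == sha1 &&
    Digest::SHA512.hexdigest(bytes) == sha512
end

# Stand-in for the contents of an archive such as data.tar.gz.
payload = 'example archive bytes'

# In practice the expected values would be read from checksums.yaml.
expected_sha1   = Digest::SHA1.hexdigest(payload)
expected_sha512 = Digest::SHA512.hexdigest(payload)

puts checksums_match?(payload, sha1: expected_sha1, sha512: expected_sha512)
```

A SHA1 hex digest is 40 characters long and a SHA512 hex digest is 128, which matches the lengths of the values in the hunk above.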
data/.travis.yml
CHANGED
@@ -1,13 +1,11 @@
|
|
1
1
|
language:
|
2
2
|
ruby
|
3
3
|
|
4
|
-
env:
|
5
|
-
- CPLUS_INCLUDE_PATH=/usr/include/atlas C_INCLUDE_PATH=/usr/include/atlas
|
6
|
-
|
7
4
|
rvm:
|
8
5
|
- '2.0'
|
9
6
|
- '2.1'
|
10
7
|
- '2.2'
|
8
|
+
- '2.3.0'
|
11
9
|
|
12
10
|
matrix:
|
13
11
|
fast_finish:
|
@@ -17,11 +15,9 @@ script: "bundle exec rspec"
|
|
17
15
|
|
18
16
|
install:
|
19
17
|
- gem install bundler
|
20
|
-
- ./.build.sh
|
21
18
|
- bundle install
|
22
19
|
|
23
20
|
before_install:
|
24
21
|
- sudo apt-get update -qq
|
25
|
-
- sudo apt-get install -qq libatlas-base-dev
|
26
22
|
- sudo apt-get install -y libgsl0-dev r-base r-base-dev
|
27
23
|
- sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"
|
data/CONTRIBUTING.md
CHANGED
@@ -2,25 +2,16 @@
|
|
2
2
|
|
3
3
|
## Installing daru development dependencies
|
4
4
|
|
5
|
-
|
6
|
-
|
7
|
-
Keep in mind that either nmatrix or rb-gsl are NOT NECESSARY for using daru. They are just required for an optional speed up and for running the test suite.
|
5
|
+
Neither nmatrix nor rb-gsl is necessary for using daru. They are only required for an optional speed-up and for running the test suite.
|
8
6
|
|
9
7
|
To install dependencies, execute the following commands:
|
10
8
|
|
11
|
-
`export CPLUS_INCLUDE_PATH=/usr/include/atlas`
|
12
|
-
`export C_INCLUDE_PATH=/usr/include/atlas`
|
13
9
|
`sudo apt-get update -qq`
|
14
|
-
`sudo apt-get install -qq libatlas-base-dev`
|
15
|
-
`sudo apt-get --purge remove liblapack-dev liblapack3 liblapack3gf`
|
16
10
|
`sudo apt-get install -y libgsl0-dev r-base r-base-dev`
|
17
11
|
`sudo Rscript -e "install.packages(c('Rserve','irr'),,'http://cran.us.r-project.org')"`
|
18
12
|
|
19
|
-
Then execute the [.build.sh script](https://github.com/v0dro/daru/blob/master/.build.sh) to clone and install the latest nmatrix system:
|
20
|
-
|
21
|
-
`./.build.sh`
|
22
13
|
|
23
|
-
Then
|
14
|
+
Then install remaining dependencies:
|
24
15
|
|
25
16
|
`bundle install`
|
26
17
|
|
data/History.md
CHANGED
@@ -1,3 +1,21 @@
|
|
1
|
+
# 0.1.2
|
2
|
+
|
3
|
+
* Enhancements
|
4
|
+
- New method `DataFrame.from_activerecord` for importing data sets from ActiveRecord. (by @mrkn)
|
5
|
+
- Better importing of data from SQL databases by extracting that functionality into a separate class called `Daru::IO::SqlDataSource` (by @mrkn).
|
6
|
+
- Faster algorithm for performing inner joins by using the bloomfilter-rb gem. Available only for MRI. (by Peter Tung)
|
7
|
+
- Added exception `SizeError` (by Peter Tung).
|
8
|
+
- Removed outdated dependencies and build scripts, updated existing dependencies.
|
9
|
+
- Ability to sort a Daru::Vector with nils present (by @gnilrets)
|
10
|
+
|
11
|
+
* Fixes
|
12
|
+
- Fix column creation for `DataFrame.from_sql` (by @dansbits).
|
13
|
+
- group_by can now be performed on DataFrames with nils (@gnilrets).
|
14
|
+
- Bug fix for DataFrame Vectors not duplicating when calling `DataFrame#dup` (by @gnilrets).
|
15
|
+
- Bug fix when concatenating DataFrames (by @gnilrets).
|
16
|
+
- Handling improper arguments to `Daru::Vector#[]` (by @lokeshh)
|
17
|
+
- Resolve narray conflict by using the latest nmatrix require methods (by @lokeshh)
|
18
|
+
|
1
19
|
# 0.1.1
|
2
20
|
|
3
21
|
* Enhancements
|
data/README.md
CHANGED
@@ -1,18 +1,13 @@
|
|
1
|
-
daru
|
2
|
-
====
|
3
|
-
|
4
|
-
Data Analysis in RUby
|
1
|
+
# daru - Data Analysis in RUby
|
5
2
|
|
6
3
|
[![Gem Version](https://badge.fury.io/rb/daru.svg)](http://badge.fury.io/rb/daru)
|
7
4
|
[![Build Status](https://travis-ci.org/v0dro/daru.svg)](https://travis-ci.org/v0dro/daru)
|
8
5
|
|
9
6
|
## Introduction
|
10
7
|
|
11
|
-
daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data.
|
12
|
-
|
13
|
-
daru is inspired by pandas, a very mature solution in Python.
|
8
|
+
daru (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data in Ruby.
|
14
9
|
|
15
|
-
Written in pure Ruby
|
10
|
+
daru makes it easy and intuitive to process data predominantly through 2 data structures: `Daru::DataFrame` and `Daru::Vector`. It is written in pure Ruby and works with all Ruby implementations. Tested with MRI 2.0, 2.1, 2.2 and 2.3.
|
16
11
|
|
17
12
|
## Features
|
18
13
|
|
@@ -28,7 +23,7 @@ Written in pure Ruby so should work with all ruby implementations. Tested with M
|
|
28
23
|
* Optional speed and space optimization on MRI with [NMatrix](https://github.com/SciRuby/nmatrix) and GSL.
|
29
24
|
* Easy splitting, aggregation and grouping of data.
|
30
25
|
* Quickly reducing data with pivot tables for quick data summary.
|
31
|
-
* Import and export data from and to Excel, CSV, SQL Databases and plain text files.
|
26
|
+
* Import and export data from and to Excel, CSV, SQL Databases, ActiveRecord and plain text files.
|
32
27
|
|
33
28
|
## Notebooks
|
34
29
|
|
@@ -64,6 +59,111 @@ Written in pure Ruby so should work with all ruby implementations. Tested with M
|
|
64
59
|
* [Analysis of Time Series in daru](http://v0dro.github.io/blog/2015/07/31/analysis-of-time-series-in-daru/)
|
65
60
|
* [Date Offsets in Daru](http://v0dro.github.io/blog/2015/07/27/date-offsets-in-daru/)
|
66
61
|
|
62
|
+
## Basic Usage
|
63
|
+
|
64
|
+
daru exposes two major data structures: `DataFrame` and `Vector`. The `Vector` is a basic 1-D structure corresponding to a labelled Array, while the `DataFrame` - daru's primary data structure - is a 2-D spreadsheet-like structure for manipulating and storing data sets.
|
65
|
+
|
66
|
+
Basic DataFrame initialization.
|
67
|
+
|
68
|
+
``` ruby
|
69
|
+
data_frame = Daru::DataFrame.new(
|
70
|
+
{
|
71
|
+
'Beer' => ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser'],
|
72
|
+
'Gallons sold' => [500, 400, 450, 200, 250]
|
73
|
+
},
|
74
|
+
index: ['India', 'China', 'USA', 'Malaysia', 'Canada']
|
75
|
+
)
|
76
|
+
data_frame
|
77
|
+
```
|
78
|
+
![init0](images/init0.png)
|
79
|
+
|
80
|
+
|
81
|
+
Load data from CSV files.
|
82
|
+
``` ruby
|
83
|
+
df = Daru::DataFrame.from_csv('TradeoffData.csv')
|
84
|
+
```
|
85
|
+
![init1](images/init1.png)
|
86
|
+
|
87
|
+
*Basic Data Manipulation*
|
88
|
+
|
89
|
+
Selecting rows.
|
90
|
+
``` ruby
|
91
|
+
data_frame.row['USA']
|
92
|
+
```
|
93
|
+
![man0](images/man0.png)
|
94
|
+
|
95
|
+
Selecting columns.
|
96
|
+
``` ruby
|
97
|
+
data_frame['Beer']
|
98
|
+
```
|
99
|
+
![man1](images/man1.png)
|
100
|
+
|
101
|
+
A range of rows.
|
102
|
+
``` ruby
|
103
|
+
data_frame.row['India'..'USA']
|
104
|
+
```
|
105
|
+
![man2](images/man2.png)
|
106
|
+
|
107
|
+
The first 2 rows.
|
108
|
+
``` ruby
|
109
|
+
data_frame.first(2)
|
110
|
+
```
|
111
|
+
![man3](images/man3.png)
|
112
|
+
|
113
|
+
The last 2 rows.
|
114
|
+
``` ruby
|
115
|
+
data_frame.last(2)
|
116
|
+
```
|
117
|
+
![man4](images/man4.png)
|
118
|
+
|
119
|
+
Adding a new column.
|
120
|
+
``` ruby
|
121
|
+
data_frame['Gallons produced'] = [550, 500, 600, 210, 240]
|
122
|
+
```
|
123
|
+
![man5](images/man5.png)
|
124
|
+
|
125
|
+
Creating a new column based on data in other columns.
|
126
|
+
``` ruby
|
127
|
+
data_frame['Demand supply gap'] = data_frame['Gallons produced'] - data_frame['Gallons sold']
|
128
|
+
```
|
129
|
+
![man6](images/man6.png)
|
130
|
+
|
131
|
+
*Condition based selection*
|
132
|
+
|
133
|
+
Selecting countries based on the number of gallons sold in each. We use a syntax similar to that defined by [Arel](https://github.com/rails/arel), i.e. by using the `where` clause.
|
134
|
+
``` ruby
|
135
|
+
data_frame.where(data_frame['Gallons sold'].lt(300))
|
136
|
+
```
|
137
|
+
![con0](images/con0.png)
|
138
|
+
|
139
|
+
You can pass a combination of boolean operations into the `#where` method and it should work fine:
|
140
|
+
``` ruby
|
141
|
+
data_frame.where(
|
142
|
+
data_frame['Beer']
|
143
|
+
.in(['Snow', 'Kingfisher','Tiger Beer'])
|
144
|
+
.and(
|
145
|
+
data_frame['Gallons produced'].gt(520).or(data_frame['Gallons produced'].lt(250))
|
146
|
+
)
|
147
|
+
)
|
148
|
+
```
|
149
|
+
![con1](images/con1.png)
|
150
|
+
|
151
|
+
*Plotting*
|
152
|
+
|
153
|
+
Daru supports plotting of interactive graphs with [nyaplot](). You can easily create a plot with the `#plot` method. Here we plot the gallons sold on the Y axis and name of the brand on the X axis in a bar graph.
|
154
|
+
``` ruby
|
155
|
+
data_frame.plot type: :bar, x: 'Beer', y: 'Gallons sold' do |plot, diagram|
|
156
|
+
plot.x_label "Beer"
|
157
|
+
plot.y_label "Gallons Sold"
|
158
|
+
plot.yrange [0,600]
|
159
|
+
plot.width 500
|
160
|
+
plot.height 400
|
161
|
+
end
|
162
|
+
```
|
163
|
+
![plot0](images/plot0.png)
|
164
|
+
|
165
|
+
In addition to nyaplot, daru also supports plotting out of the box with [gnuplotrb](https://github.com/SciRuby/gnuplotrb).
|
166
|
+
|
67
167
|
## Documentation
|
68
168
|
|
69
169
|
Docs can be found [here](https://rubygems.org/gems/daru).
|
@@ -71,8 +171,6 @@ Docs can be found [here](https://rubygems.org/gems/daru).
|
|
71
171
|
## Roadmap
|
72
172
|
|
73
173
|
* Enable creation of DataFrame by only specifying an NMatrix/MDArray in initialize. Vector naming happens automatically (alphabetic) or is specified in an Array.
|
74
|
-
* Basic Data manipulation and analysis operations:
|
75
|
-
- DF concat
|
76
174
|
* Assignment of a column to a single number should set the entire column to that number.
|
77
175
|
* Multiple column assignment with []=
|
78
176
|
* Multiple value assignment for vectors with []=.
|
data/daru.gemspec
CHANGED
@@ -34,7 +34,7 @@ Thank you for installing daru!
|
|
34
34
|
oOOOOOo
|
35
35
|
,| oO
|
36
36
|
//| |
|
37
|
-
|
37
|
+
\\\\| |
|
38
38
|
`| |
|
39
39
|
`-----`
|
40
40
|
|
@@ -50,17 +50,22 @@ Cheers!
|
|
50
50
|
EOF
|
51
51
|
|
52
52
|
spec.add_runtime_dependency 'reportbuilder', '~> 1.4'
|
53
|
-
spec.add_runtime_dependency 'spreadsheet', '~> 1.
|
53
|
+
spec.add_runtime_dependency 'spreadsheet', '~> 1.1.1'
|
54
54
|
|
55
55
|
spec.add_development_dependency 'bundler', '~> 1.10'
|
56
|
-
spec.add_development_dependency 'rake'
|
56
|
+
spec.add_development_dependency 'rake', '~>10.5'
|
57
57
|
spec.add_development_dependency 'pry', '~> 0.10'
|
58
58
|
spec.add_development_dependency 'pry-byebug'
|
59
59
|
spec.add_development_dependency 'rserve-client', '~> 0.3'
|
60
|
-
spec.add_development_dependency 'rspec'
|
60
|
+
spec.add_development_dependency 'rspec', '~> 3.4'
|
61
61
|
spec.add_development_dependency 'awesome_print'
|
62
62
|
spec.add_development_dependency 'nyaplot', '~> 0.1.5'
|
63
|
-
spec.add_development_dependency 'nmatrix', '~> 0.1
|
63
|
+
spec.add_development_dependency 'nmatrix', '~> 0.2.1'
|
64
64
|
spec.add_development_dependency 'distribution', '~> 0.7'
|
65
65
|
spec.add_development_dependency 'rb-gsl', '~>1.16'
|
66
|
-
|
66
|
+
spec.add_development_dependency 'bloomfilter-rb', '~> 2.1'
|
67
|
+
spec.add_development_dependency 'dbd-sqlite3'
|
68
|
+
spec.add_development_dependency 'dbi'
|
69
|
+
spec.add_development_dependency 'activerecord', '~> 4.0'
|
70
|
+
spec.add_development_dependency 'sqlite3'
|
71
|
+
end
|
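Note on the gemspec hunk: several dependencies gain pessimistic (`~>`) version constraints. As a rough illustration of what those constraints accept, the sketch below checks a few versions with RubyGems' own `Gem::Requirement`. The helper `satisfied?` and the specific version numbers tested are chosen for illustration only.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the requirement string?
def satisfied?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# '~> 1.1.1' allows patch releases within 1.1.x only.
puts satisfied?('~> 1.1.1', '1.1.9')   # true
puts satisfied?('~> 1.1.1', '1.2.0')   # false

# '~> 3.4' allows any 3.x release from 3.4 up, but not 4.0.
puts satisfied?('~> 3.4', '3.9.0')     # true
puts satisfied?('~> 3.4', '4.0.0')     # false

# Whitespace after the operator is optional, as in '~>10.5'.
puts satisfied?('~>10.5', '10.5.0')    # true
```

The number of segments in the constraint decides which segment is free to grow: with three segments only the last one may increase, with two segments the minor version may increase.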
data/images/README.md
ADDED
data/images/con0.png
ADDED
Binary file
|
data/images/con1.png
ADDED
Binary file
|
data/images/init0.png
ADDED
Binary file
|
data/images/init1.png
ADDED
Binary file
|
data/images/man0.png
ADDED
Binary file
|
data/images/man1.png
ADDED
Binary file
|
data/images/man2.png
ADDED
Binary file
|
data/images/man3.png
ADDED
Binary file
|
data/images/man4.png
ADDED
Binary file
|
data/images/man5.png
ADDED
Binary file
|
data/images/man6.png
ADDED
Binary file
|
data/images/plot0.png
ADDED
Binary file
|
data/lib/daru.rb
CHANGED
@@ -38,10 +38,12 @@ module Daru
|
|
38
38
|
attr_accessor :lazy_update
|
39
39
|
|
40
40
|
def create_has_library(library)
|
41
|
-
|
42
|
-
|
41
|
+
lib_underscore = library.to_s.gsub(/-/, '_')
|
42
|
+
define_singleton_method("has_#{lib_underscore}?") do
|
43
|
+
cv = "@@#{lib_underscore}"
|
43
44
|
unless class_variable_defined? cv
|
44
45
|
begin
|
46
|
+
library = 'nmatrix/nmatrix' if library == :nmatrix
|
45
47
|
require library.to_s
|
46
48
|
class_variable_set(cv, true)
|
47
49
|
rescue LoadError
|
@@ -56,6 +58,7 @@ module Daru
|
|
56
58
|
create_has_library :gsl
|
57
59
|
create_has_library :nmatrix
|
58
60
|
create_has_library :nyaplot
|
61
|
+
create_has_library :'bloomfilter-rb'
|
59
62
|
end
|
60
63
|
|
61
64
|
autoload :Spreadsheet, 'spreadsheet'
|
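Note on the `create_has_library` change: the hunk replaces hyphens with underscores before building the predicate name, since a name like `bloomfilter-rb` cannot be used directly in a method or variable name. The following is a minimal standalone sketch of the same pattern, not daru's actual code. The module name `OptionalLibs` is hypothetical, and it caches the result in a module-level instance variable rather than the class variable used in the diff.

```ruby
# Hypothetical module illustrating optional-library detection.
module OptionalLibs
  class << self
    # Defines a `has_<library>?` predicate that tries to require the library
    # once and remembers whether that succeeded.
    def create_has_library(library)
      lib_underscore = library.to_s.tr('-', '_')
      define_singleton_method("has_#{lib_underscore}?") do
        ivar = "@#{lib_underscore}"
        unless instance_variable_defined?(ivar)
          begin
            require library.to_s
            instance_variable_set(ivar, true)
          rescue LoadError
            instance_variable_set(ivar, false)
          end
        end
        instance_variable_get(ivar)
      end
    end
  end

  create_has_library :set                    # ships with Ruby
  create_has_library :'no-such-library-xyz'  # deliberately missing
end

puts OptionalLibs.has_set?                  # true
puts OptionalLibs.has_no_such_library_xyz?  # false
```

Callers can then guard optional code paths with the predicate instead of rescuing `LoadError` at each use site.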
data/lib/daru/core/group_by.rb
CHANGED
@@ -18,7 +18,7 @@ module Daru
|
|
18
18
|
@context = context
|
19
19
|
vectors = names.map { |vec| context[vec].to_a }
|
20
20
|
tuples = vectors[0].zip(*vectors[1..-1])
|
21
|
-
keys = tuples.uniq.sort
|
21
|
+
keys = tuples.uniq.sort { |a,b| a && b ? a.compact <=> b.compact : a ? 1 : -1 }
|
22
22
|
|
23
23
|
keys.each do |key|
|
24
24
|
@groups[key] = all_indices_for(tuples, key)
|
@@ -28,7 +28,7 @@ module Daru
|
|
28
28
|
|
29
29
|
# Get a Daru::Vector of the size of each group.
|
30
30
|
def size
|
31
|
-
index =
|
31
|
+
index =
|
32
32
|
if multi_indexed_grouping?
|
33
33
|
Daru::MultiIndex.from_tuples @groups.keys
|
34
34
|
else
|
@@ -59,15 +59,15 @@ module Daru
|
|
59
59
|
# d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
|
60
60
|
# })
|
61
61
|
# df.group_by([:a, :b]).head(1)
|
62
|
-
# # =>
|
62
|
+
# # =>
|
63
63
|
# # #<Daru::DataFrame:82745170 @name = d7003f75-5eb9-4967-9303-c08dd9160224 @size = 6>
|
64
|
-
# # a b c d
|
65
|
-
# # 1 bar one 2 22
|
66
|
-
# # 3 bar three 1 44
|
67
|
-
# # 5 bar two 6 66
|
68
|
-
# # 0 foo one 1 11
|
69
|
-
# # 7 foo three 8 88
|
70
|
-
# # 2 foo two 3 33
|
64
|
+
# # a b c d
|
65
|
+
# # 1 bar one 2 22
|
66
|
+
# # 3 bar three 1 44
|
67
|
+
# # 5 bar two 6 66
|
68
|
+
# # 0 foo one 1 11
|
69
|
+
# # 7 foo three 8 88
|
70
|
+
# # 2 foo two 3 33
|
71
71
|
def head quantity=5
|
72
72
|
select_groups_from :first, quantity
|
73
73
|
end
|
@@ -82,14 +82,14 @@ module Daru
|
|
82
82
|
# d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
|
83
83
|
# })
|
84
84
|
# # df.group_by([:a, :b]).tail(1)
|
85
|
-
# # =>
|
85
|
+
# # =>
|
86
86
|
# # #<Daru::DataFrame:82378270 @name = 0623db46-5425-41bd-a843-99baac3d1d9a @size = 6>
|
87
|
-
# # a b c d
|
88
|
-
# # 1 bar one 2 22
|
89
|
-
# # 3 bar three 1 44
|
90
|
-
# # 5 bar two 6 66
|
91
|
-
# # 6 foo one 3 77
|
92
|
-
# # 7 foo three 8 88
|
87
|
+
# # a b c d
|
88
|
+
# # 1 bar one 2 22
|
89
|
+
# # 3 bar three 1 44
|
90
|
+
# # 5 bar two 6 66
|
91
|
+
# # 6 foo one 3 77
|
92
|
+
# # 7 foo three 8 88
|
93
93
|
# # 4 foo two 3 55
|
94
94
|
def tail quantity=5
|
95
95
|
select_groups_from :last, quantity
|
@@ -103,15 +103,15 @@ module Daru
|
|
103
103
|
# c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
|
104
104
|
# d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
|
105
105
|
# df.group_by([:a, :b]).mean
|
106
|
-
# # =>
|
106
|
+
# # =>
|
107
107
|
# # #<Daru::DataFrame:81097450 @name = 0c32983f-3e06-451f-a9c9-051cadfe7371 @size = 6>
|
108
|
-
# # c d
|
109
|
-
# # ["bar", "one"] 2 22
|
110
|
-
# # ["bar", "three"] 1 44
|
111
|
-
# # ["bar", "two"] 6 66
|
112
|
-
# # ["foo", "one"] 2.0 44.0
|
113
|
-
# # ["foo", "three"] 8 88
|
114
|
-
# # ["foo", "two"] 3.0 44.0
|
108
|
+
# # c d
|
109
|
+
# # ["bar", "one"] 2 22
|
110
|
+
# # ["bar", "three"] 1 44
|
111
|
+
# # ["bar", "two"] 6 66
|
112
|
+
# # ["foo", "one"] 2.0 44.0
|
113
|
+
# # ["foo", "three"] 8 88
|
114
|
+
# # ["foo", "two"] 3.0 44.0
|
115
115
|
def mean
|
116
116
|
apply_method :numeric, :mean
|
117
117
|
end
|
@@ -128,28 +128,28 @@ module Daru
|
|
128
128
|
|
129
129
|
# Count groups, excludes missing values.
|
130
130
|
# @example Using count
|
131
|
-
# df = Daru::DataFrame.new({
|
132
|
-
# a: %w{foo bar foo bar foo bar foo foo},
|
133
|
-
# b: %w{one one two three two two one three},
|
134
|
-
# c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
|
135
|
-
# d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
|
136
|
-
# })
|
131
|
+
# df = Daru::DataFrame.new({
|
132
|
+
# a: %w{foo bar foo bar foo bar foo foo},
|
133
|
+
# b: %w{one one two three two two one three},
|
134
|
+
# c: [1 ,2 ,3 ,1 ,3 ,6 ,3 ,8],
|
135
|
+
# d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
|
136
|
+
# })
|
137
137
|
# df.group_by([:a, :b]).count
|
138
|
-
# # =>
|
138
|
+
# # =>
|
139
139
|
# # #<Daru::DataFrame:76900210 @name = 7b9cf55d-17f8-48c7-b03a-2586c6e5ec5a @size = 6>
|
140
|
-
# # c d
|
141
|
-
# # ["bar", "one"] 1 1
|
142
|
-
# # ["bar", "two"] 1 1
|
143
|
-
# # ["bar", "three"] 1 1
|
144
|
-
# # ["foo", "one"] 2 2
|
145
|
-
# # ["foo", "three"] 1 1
|
146
|
-
# # ["foo", "two"] 2 2
|
140
|
+
# # c d
|
141
|
+
# # ["bar", "one"] 1 1
|
142
|
+
# # ["bar", "two"] 1 1
|
143
|
+
# # ["bar", "three"] 1 1
|
144
|
+
# # ["foo", "one"] 2 2
|
145
|
+
# # ["foo", "three"] 1 1
|
146
|
+
# # ["foo", "two"] 2 2
|
147
147
|
def count
|
148
148
|
width = @non_group_vectors.size
|
149
149
|
Daru::DataFrame.new([size]*width, order: @non_group_vectors)
|
150
150
|
end
|
151
151
|
|
152
|
-
# Calculate sample standard deviation of numeric vector groups, excluding
|
152
|
+
# Calculate sample standard deviation of numeric vector groups, excluding
|
153
153
|
# missing values.
|
154
154
|
def std
|
155
155
|
apply_method :numeric, :std
|
@@ -177,9 +177,9 @@ module Daru
|
|
177
177
|
# d: [11 ,22 ,33 ,44 ,55 ,66 ,77 ,88]
|
178
178
|
# })
|
179
179
|
# df.group_by([:a, :b]).get_group ['bar','two']
|
180
|
-
# #=>
|
180
|
+
# #=>
|
181
181
|
# ##<Daru::DataFrame:83258980 @name = 687ee3f6-8874-4899-97fa-9b31d84fa1d5 @size = 1>
|
182
|
-
# # a b c d
|
182
|
+
# # a b c d
|
183
183
|
# # 5 bar two 6 66
|
184
184
|
def get_group group
|
185
185
|
indexes = @groups[group]
|
@@ -198,7 +198,7 @@ module Daru
|
|
198
198
|
rows, index: @context.index[indexes], order: @context.vectors)
|
199
199
|
end
|
200
200
|
|
201
|
-
private
|
201
|
+
private
|
202
202
|
|
203
203
|
def select_groups_from method, quantity
|
204
204
|
selection = @context
|
@@ -227,7 +227,7 @@ module Daru
|
|
227
227
|
slice = vec[*indexes]
|
228
228
|
single_row << (slice.is_a?(Numeric) ? slice : slice.send(method))
|
229
229
|
end
|
230
|
-
end
|
230
|
+
end
|
231
231
|
|
232
232
|
rows << single_row
|
233
233
|
end
|
@@ -260,4 +260,4 @@ module Daru
|
|
260
260
|
end
|
261
261
|
end
|
262
262
|
end
|
263
|
-
end
|
263
|
+
end
|
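Note on the `keys = tuples.uniq.sort { ... }` change in `group_by.rb`: a plain `sort` fails when two tuples differ only by a `nil`, because comparing `nil` with a String returns `nil`. The sketch below runs the comparator from the hunk on a small sample. The tuple data is made up for illustration.

```ruby
# Sample grouping tuples; one of them contains a nil.
tuples = [['foo', 'one'], ['bar', nil], ['bar', 'two'], ['foo', 'one']]

# A plain sort raises, since ['bar', nil] <=> ['bar', 'two'] is nil.
plain_sort_failed =
  begin
    tuples.uniq.sort
    false
  rescue ArgumentError
    true
  end

# Comparator taken from the diff: nils are dropped before comparing.
keys = tuples.uniq.sort { |a, b| a && b ? a.compact <=> b.compact : a ? 1 : -1 }

puts plain_sort_failed  # true
p keys                  # [["bar", nil], ["bar", "two"], ["foo", "one"]]
```

One property worth keeping in mind is that `compact` shifts the remaining elements left, so a tuple such as `[nil, 'x']` is compared as `['x']`. This makes the ordering of keys with leading nils positional rather than column-wise.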