red_amber 0.2.1 → 0.2.2
- checksums.yaml +4 -4
- data/.rubocop.yml +3 -0
- data/CHANGELOG.md +69 -2
- data/README.md +83 -280
- data/doc/DataFrame.md +279 -265
- data/doc/Vector.md +28 -36
- data/doc/image/basic_verbs.png +0 -0
- data/doc/image/dataframe/assign.png +0 -0
- data/doc/image/dataframe/assign_operation.png +0 -0
- data/doc/image/dataframe/drop.png +0 -0
- data/doc/image/dataframe/pick.png +0 -0
- data/doc/image/dataframe/pick_operation.png +0 -0
- data/doc/image/dataframe/remove.png +0 -0
- data/doc/image/dataframe/rename.png +0 -0
- data/doc/image/dataframe/rename_operation.png +0 -0
- data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
- data/doc/image/dataframe/slice.png +0 -0
- data/doc/image/dataframe/slice_operation.png +0 -0
- data/doc/image/dataframe_model.png +0 -0
- data/doc/image/group_operation.png +0 -0
- data/doc/image/replace-if_then.png +0 -0
- data/doc/image/reshaping_dataframe.png +0 -0
- data/doc/image/screenshot.png +0 -0
- data/doc/image/vector/binary_element_wise.png +0 -0
- data/doc/image/vector/unary_aggregation.png +0 -0
- data/doc/image/vector/unary_aggregation_w_option.png +0 -0
- data/doc/image/vector/unary_element_wise.png +0 -0
- data/lib/red_amber/data_frame.rb +10 -37
- data/lib/red_amber/data_frame_displayable.rb +56 -3
- data/lib/red_amber/data_frame_loadsave.rb +36 -0
- data/lib/red_amber/data_frame_reshaping.rb +8 -6
- data/lib/red_amber/data_frame_variable_operation.rb +25 -19
- data/lib/red_amber/group.rb +5 -3
- data/lib/red_amber/helper.rb +20 -18
- data/lib/red_amber/vector.rb +49 -30
- data/lib/red_amber/vector_selectable.rb +9 -1
- data/lib/red_amber/vector_updatable.rb +6 -3
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +1 -0
- metadata +13 -3
- data/doc/examples_of_red_amber.ipynb +0 -8979
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a16699a945f41bf98790f698998126cc6b4a5e916eccb805e78448ec029f9310
+  data.tar.gz: 5e7fa732f64567fd85e5a74b046e80861824f13d15dc910278b6c62359db9a22
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6ae7a6e3a8015b6b9736fb934526d9dc96b43830f0890ccbc16e175e539a8df1053432a63dde84a31dbd3a170aa6256b681127c510117723427bce815568c981
+  data.tar.gz: a0e7d86a7bdc6be7ec493ef5331ced5ecf4e6b89458f4252f208435905a7e4e80a088a718098073fb0c65c86d76297c70c978cd4dec28b1eb1a0d915bb7e3608
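The entries in checksums.yaml are SHA256 and SHA512 digests of the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. As a rough sketch of how a file can be checked against such a digest with Ruby's standard `Digest` library (the helper name `digest_matches?` and the temporary sample file are hypothetical, not part of RubyGems or red_amber):

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when the file's digest equals the expected hex string.
def digest_matches?(path, expected_hex, algorithm: Digest::SHA256)
  algorithm.file(path).hexdigest == expected_hex.strip.downcase
end

# Demonstration on a throwaway file rather than a real gem member.
sample = Tempfile.new('sample')
sample.write('hello')
sample.flush

expected = Digest::SHA256.hexdigest('hello')
puts digest_matches?(sample.path, expected)                             # true
puts digest_matches?(sample.path, expected, algorithm: Digest::SHA512)  # false
```

For a real check, the path would point at an extracted `metadata.gz` or `data.tar.gz` and the expected value would come from the matching line above.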
data/.rubocop.yml
CHANGED
@@ -63,6 +63,7 @@ Metrics/AbcSize:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
     - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
     - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
+    - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 30.15
     - 'lib/red_amber/vector_updatable.rb' # Max: 36
     - 'lib/red_amber/vector_selectable.rb' # Max: 33
 
@@ -86,6 +87,7 @@ Metrics/CyclomaticComplexity:
   Exclude:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
     - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+    - 'lib/red_amber/helper.rb' # Max: 15
     - 'lib/red_amber/vector_selectable.rb' # Max: 13
     - 'lib/red_amber/vector_updatable.rb' # Max: 14
 
@@ -111,6 +113,7 @@ Metrics/PerceivedComplexity:
   Max: 13
   Exclude:
     - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+    - 'lib/red_amber/helper.rb' # Max: 15
     - 'lib/red_amber/vector_updatable.rb' # Max: 15
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 19
 
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,63 @@
+## [0.2.2] - 2022-10-04
+
+- Bug fixes
+
+- Return self when no replacement happen in Vector#replace. (#92)
+
+- Limit n-digits in to_iruby. (#111)
+
+- Fix displaying space in to_iruby. (#111)
+
+- Raise error if key is duplicated. (#113)
+
+- Fix DataFrame#pick/#drop with endless Range. (#113)
+
+- Change type from dictionary to string in DataFrame reshaping methods. (#113)
+
+- Fix arguments parser to accept Enumerator. (#114)
+
+- New features and improvements
+
+- Support to make a data frame from a to_arrow-responsible object. (#106) [Patch by Kenta Murata]
+
+- Introduce DataFrame#auto_cast (experimental feature) (#105)
+
+- Change default name in DataFrame#transpose, #to_long, #to_wide. (#110)
+
+- Add Vector#dictionary? method. (#113)
+
+- Add display mode 'Plain' and 'Minimum'. (#113)
+
+- Refactor code
+
+- Refine test_vector_selectable. (#92)
+- Refine test_vector_updatable. (#92)
+- Refine Vector.new. (#113)
+- Refine DataFrame#pick, #drop. (#113)
+
+- Documents
+
+- Update images. (#90, #105, #113)
+
+- Update README to use simpler examples. (#112)
+- Update README with a new screenshot example. (#113)
+
+- GitHub site
+
+- Update Jupyter notebooks in Binder (#88, #115)
+- Move binder support to heronshoes/docker-stacks repository.
+- Update README notebook on binder.
+- Add examples_of_RedAmber notebook on binder.
+
+- Start to use discussions.
+
+- Thanks
+
+- Kenta Murata
+
 ## [0.2.1] - 2022-09-07
 
--Bug fixes
+- Bug fixes
 
 - Fix `Vector#each` with block (#66)
 `Vector#each` will return value of each element with block.
@@ -49,12 +106,15 @@
 
 - Add binary function `Vector#logb`
 
-- Docker image and Jupyter Notebook
+- Docker image and Jupyter Notebook [Thanks to Kenta Murata]
 - Add link to RubyData in README
 - Add link to interactive README by Binder
 
 - Update Jupyter Notebook `71 examples of RedAmber`
 
+- Thanks
+
+- Kenta Murata
 
 ## [0.2.0] - 2022-08-15
 
@@ -294,6 +354,13 @@
 - Documentation
 - Fix typo in DataFrame.md
 
+- Github site
+- Add gem and status badges in README. (#42) [Patch by kojix2]
+
+- Thanks
+
+- kojix2
+
 ## [0.1.5] - 2022-06-12 (experimental)
 
 - Bug fixes
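Two of the 0.2.2 changelog entries (#113) concern endless Ranges in `DataFrame#pick`/`#drop` and raising an error on duplicated keys. The plain-Ruby sketch below only illustrates those two ideas on an array of keys. It is not RedAmber's code, and `pick_keys` is a hypothetical stand-in for the real method.

```ruby
# Plain-Ruby illustration only; `pick_keys` is a hypothetical helper,
# not RedAmber's DataFrame#pick.
def pick_keys(keys, *pickers)
  picked = pickers.flat_map do |picker|
    case picker
    when Range   then keys[picker] || []    # Array#[] resolves endless ranges
    when Integer then [keys.fetch(picker)]  # positional index
    else [picker.to_sym]                    # key given by name
    end
  end
  duplicated = picked.tally.select { |_, count| count > 1 }.keys
  raise ArgumentError, "duplicated keys: #{duplicated}" unless duplicated.empty?

  picked
end

keys = %i[species island body_mass_g sex year]
p pick_keys(keys, (2..))         # [:body_mass_g, :sex, :year]
p pick_keys(keys, (0..1), :year) # [:species, :island, :year]
```

Passing overlapping selectors, such as `(0..)` together with `:sex`, raises `ArgumentError` in this sketch because `:sex` would be selected twice.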
data/README.md
CHANGED
@@ -2,12 +2,15 @@
 
 [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
 [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
+[![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
 
 A simple dataframe library for Ruby.
 
 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
 - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 
+![screenshot from jupyterlab](doc/image/screenshot.png)
+
 ## Requirements
 
 Supported Ruby version is >= 2.7.
@@ -57,338 +60,132 @@ gem install red_amber
 
 [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
 
-Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/
-[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/
-
+Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=README.ipynb).
+[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)
 
 
-## `RedAmber
+## Data frame in `RedAmber`
 
-
+Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
+The entity is a Red Arrow's Table object.
 
 ![dataframe model of RedAmber](doc/image/dataframe_model.png)
 
-
-require 'red_amber' # require 'red-amber' is also OK.
-require 'datasets-arrow'
-
-arrow = Datasets::Penguins.new.to_arrow
-penguins = RedAmber::DataFrame.new(arrow)
-
-# =>
-#<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
-species island bill_length_mm bill_depth_mm flipper_length_mm ... year
-<string> <string> <double> <double> <uint8> ... <uint16>
-1 Adelie Torgersen 39.1 18.7 181 ... 2007
-2 Adelie Torgersen 39.5 17.4 186 ... 2007
-3 Adelie Torgersen 40.3 18.0 195 ... 2007
-4 Adelie Torgersen (nil) (nil) (nil) ... 2007
-5 Adelie Torgersen 36.7 19.3 193 ... 2007
-: : : : : : ... :
-342 Gentoo Biscoe 50.4 15.7 222 ... 2009
-343 Gentoo Biscoe 45.2 14.8 212 ... 2009
-344 Gentoo Biscoe 49.9 16.1 213 ... 2009
-```
-
-For example, `DataFrame#pick` accepts keys as arguments and returns a sub DataFrame.
-
-![pick method image](doc/image/dataframe/pick.png)
-
-```ruby
-penguins.keys
-# =>
-[:species,
-:island,
-:bill_length_mm,
-:bill_depth_mm,
-:flipper_length_mm,
-:body_mass_g,
-:sex,
-:year]
-
-df = penguins.pick(:species, :island, :body_mass_g)
-df
-
-# =>
-#<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
-species island body_mass_g
-<string> <string> <uint16>
-1 Adelie Torgersen 3750
-2 Adelie Torgersen 3800
-3 Adelie Torgersen 3250
-4 Adelie Torgersen (nil)
-5 Adelie Torgersen 3450
-: : : :
-342 Gentoo Biscoe 5750
-343 Gentoo Biscoe 5200
-344 Gentoo Biscoe 5400
-```
-
-`DataFrame#drop` drops some columns to create a remainer DataFrame.
-
-![drop method image](doc/image/dataframe/drop.png)
-
-You can specify by keys or a boolean array of same size as n_keys.
+Load the library.
 
 ```ruby
-#
-
-
-# =>
-#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
-body_mass_g
-<uint16>
-1 3750
-2 3800
-3 3250
-4 (nil)
-5 3450
-: :
-342 5750
-343 5200
-344 5400
+require 'red_amber' # require 'red-amber' is also OK.
+include RedAmber
 ```
 
-
-
-`DataFrame#assign` creates new columns or update existing columns.
-
-![assign method image](doc/image/dataframe/assign.png)
+### Example: diamonds dataset
 
 ```ruby
-
-df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
-
-# =>
-#<RedAmber::DataFrame : 344 x 2 Vectors, 0x00000000000212f0>
-body_mass_g body_mass_kg
-<uint16> <double>
-1 3750 3.8
-2 3800 3.8
-3 3250 3.3
-4 (nil) (nil)
-5 3450 3.5
-: : :
-342 5750 5.8
-343 5200 5.2
-344 5400 5.4
-```
-
-`DataFrame#slice` selects rows (observations) to create a sub DataFrame.
+require 'datasets-arrow' # to load sample data
 
-
-
-```ruby
-# returns 5 rows at the start and 5 rows from the end
-penguins.slice(0...5, -5..-1)
+dataset = Datasets::Diamonds.new
+diamonds = DataFrame.new(dataset) # from v0.2.2, should be `dataset.to_arrow` if older.
 
 # =>
-#<RedAmber::DataFrame :
-
-
-
-
-
-
-
-
-
-
-
+#<RedAmber::DataFrame : 53940 x 10 Vectors, 0x000000000000f668>
+carat cut color clarity depth table price x ... z
+<double> <string> <string> <string> <double> <double> <uint16> <double> ... <double>
+0 0.23 Ideal E SI2 61.5 55.0 326 3.95 ... 2.43
+1 0.21 Premium E SI1 59.8 61.0 326 3.89 ... 2.31
+2 0.23 Good E VS1 56.9 65.0 327 4.05 ... 2.31
+3 0.29 Premium I VS2 62.4 58.0 334 4.2 ... 2.63
+4 0.31 Good J SI2 63.3 58.0 335 4.34 ... 2.75
+: : : : : : : : : ... :
+53937 0.7 Very Good D SI1 62.8 60.0 2757 5.66 ... 3.56
+53938 0.86 Premium H SI2 61.0 58.0 2757 6.15 ... 3.74
+53939 0.75 Ideal D SI2 62.2 55.0 2757 5.83 ... 3.64
 ```
 
-
-
-![remove method image](doc/image/dataframe/remove.png)
+For example, we can compute mean prices per 'cut' for the data larger than 1 carat.
 
 ```ruby
-
-
+df = diamonds
+.slice { carat > 1 }
+.group(:cut)
+.mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
+.sort('-mean(price)')
 
 # =>
-#<RedAmber::DataFrame :
-
-
-
-
-
-
-
-: : : : : : ... :
-242 Gentoo Biscoe 50.4 15.7 222 ... 2009
-243 Gentoo Biscoe 45.2 14.8 212 ... 2009
-244 Gentoo Biscoe 49.9 16.1 213 ... 2009
+#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f67c>
+cut mean(price)
+<string> <double>
+0 Ideal 8674.23
+1 Premium 8487.25
+2 Very Good 8340.55
+3 Good 7753.6
+4 Fair 7177.86
 ```
 
-
-
-Previous example is also OK with a block.
+Arrow data is immutable, so these methods always return new objects.
+Next example will rename a column and create a new column by simple calcuration.
 
 ```ruby
-
-```
-
-Next example is an usage of block to update a column.
-
-```ruby
-df = RedAmber::DataFrame.new(
-integer: [0, 1, 2, 3, nil],
-float: [0.0, 1.1, 2.2, Float::NAN, nil],
-string: ['A', 'B', 'C', 'D', nil],
-boolean: [true, false, true, false, nil])
-df
-
-# =>
-#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
-integer float string boolean
-<uint8> <double> <string> <boolean>
-1 0 0.0 A true
-2 1 1.1 B false
-3 2 2.2 C true
-4 3 NaN D false
-5 (nil) (nil) (nil) (nil)
-
-df.assign do
-vectors.select(&:float?).map { |v| [v.key, -v] }
-# => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
-end
-
-# =>
-#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
-index float string
-<uint8> <double> <string>
-1 0 -0.0 A
-2 1 -1.1 B
-3 2 -2.2 C
-4 3 NaN D
-5 (nil) (nil) (nil)
-```
-
-Next example is to eliminate rows containing nil.
+usdjpy = 110.0
 
-
-
-nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
-nil_removed.tdr
+df.rename('mean(price)': :mean_price_USD)
+.assign(:mean_price_JPY) { mean_price_USD * usdjpy }
 
 # =>
-RedAmber::DataFrame :
-
-
-
-
-
-
-
-6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
-7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
-8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
+#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f71c>
+cut mean_price_USD mean_price_JPY
+<string> <double> <double>
+0 Ideal 8674.23 954164.93
+1 Premium 8487.25 933597.34
+2 Very Good 8340.55 917460.37
+3 Good 7753.6 852896.11
+4 Fair 7177.86 789564.12
 ```
 
-
+### Example: starwars dataset
 
-
-penguins.remove_nil # => same result as above
-```
-
-`DataFrame#summary` shows summary statistics in a DataFrame.
+Next example is `starwars` dataset reading from the downloaded CSV file. Followed by minimum data cleansing.
 
 ```ruby
-
+uri = URI('https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv')
 
-
-variables count mean std min 25% median 75% max
-<dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
-1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
-2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
-3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
-4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
-5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
-```
-
-`DataFrame#group` method can be used for the grouping tasks.
+starwars = DataFrame.load(uri)
 
-```ruby
-starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
 starwars
+.drop(0) # delete unnecessary index column
+.remove { species == "NA" } # delete unnecessary rows
+.group(:species) { [count(:species), mean(:height, :mass)] }
+.slice { count > 1 }
 
 # =>
-#<RedAmber::DataFrame :
-unnamed1 name height mass hair_color skin_color eye_color ... species
-<int64> <string> <int64> <double> <string> <string> <string> ... <string>
-1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
-2 2 C-3PO 167 75.0 NA gold yellow ... Droid
-3 3 R2-D2 96 32.0 NA white, blue red ... Droid
-4 4 Darth Vader 202 136.0 none white yellow ... Human
-5 5 Leia Organa 150 49.0 brown light brown ... Human
-: : : : : : : : ... :
-85 85 BB8 (nil) (nil) none none black ... Droid
-86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
-87 87 Padmé Amidala 165 45.0 brown light brown ... Human
-
-starwars.group(:species) { [count(:species), mean(:height, :mass)] }
-.slice { count > 1 }
-
-# =>
-#<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
+#<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
 species count mean(height) mean(mass)
 <string> <int64> <double> <double>
-
-
-
-
-
-
-
-
-9 Kaminoan 2 221.0 88.0
+0 Human 35 176.65 82.78
+1 Droid 6 131.2 69.75
+2 Wookiee 2 231.0 124.0
+3 Gungan 3 208.67 74.0
+4 Zabrak 2 173.0 80.0
+5 Twi'lek 2 179.0 55.0
+6 Mirialan 2 168.0 53.1
+7 Kaminoan 2 221.0 88.0
 ```
 
 See [DataFrame.md](doc/DataFrame.md) for other examples and details.
 
 
-
+### `Vector` for 1D data object in column
 
 Class `RedAmber::Vector` represents a series of data in the DataFrame.
-Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
-
-```ruby
-penguins[:bill_length_mm]
-# =>
-#<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
-[39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
-```
-
-Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
-
-This is an element-wise comparison and returns a boolean Vector of same size.
-
-![unary element-wise](doc/image/vector/unary_element_wise.png)
-
-```ruby
-penguins[:bill_length_mm] < 40
-
-# =>
-#<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
-[true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
-```
-
-Next example returns aggregated result.
-
-![unary aggregation](doc/image/vector/unary_aggregation.png)
-
-```ruby
-penguins[:bill_length_mm].mean
-43.92192982456141
-# =>
-
-```
 
 See [Vector.md](doc/Vector.md) for details.
 
 ## Jupyter notebook
 
-[
+[73 Examples of Red Amber](binder/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
+
+You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
+[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb)
+
 
 ## Development
 
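The diamonds example added to the README in the hunk above chains `slice`, `group`, `mean` and `sort`. To make each step's role concrete without loading RedAmber, here is a rough plain-Ruby equivalent over an array of hashes. The sample rows are made up for illustration, not taken from the diamonds dataset, and the code does not reflect how RedAmber or Arrow compute the result.

```ruby
# Made-up sample rows (not taken from the diamonds dataset).
rows = [
  { carat: 1.2, cut: 'Ideal',   price: 9000 },
  { carat: 0.5, cut: 'Ideal',   price: 1500 },
  { carat: 1.5, cut: 'Premium', price: 8000 },
  { carat: 1.1, cut: 'Ideal',   price: 8000 },
  { carat: 2.0, cut: 'Premium', price: 9500 },
  { carat: 1.3, cut: 'Fair',    price: 7000 }
]

mean_price_by_cut = rows
  .select { |row| row[:carat] > 1 }   # like slice { carat > 1 }
  .group_by { |row| row[:cut] }       # like group(:cut)
  .map { |cut, group| [cut, group.sum { |r| r[:price] }.fdiv(group.size)] } # like mean(:price)
  .sort_by { |_, mean| -mean }        # like sort('-mean(price)')

mean_price_by_cut.each { |cut, mean| puts format('%-8s %.2f', cut, mean) }
```

Each step returns a new collection and leaves `rows` untouched, which mirrors the README's note that these methods always return new objects.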
@@ -399,8 +196,14 @@ bundle install
 bundle exec rake test
 ```
 
+## Community
+
 I will appreciate if you could help to improve this project. Here are a few ways you can help:
 
+- Let's talk in the [discussions](https://github.com/heronshoes/red_amber/discussions). [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
+- Browse Q and A, how to use, tips, etc.
+- Ask questions you’re wondering about.
+- Share ideas. The idea may be promoted to issues or pull requests.
 - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
 - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
 - Write, clarify, or fix documentation