red_amber 0.2.0 → 0.2.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +5 -0
- data/CHANGELOG.md +125 -0
- data/README.md +86 -269
- data/doc/DataFrame.md +427 -281
- data/doc/Vector.md +35 -54
- data/doc/image/basic_verbs.png +0 -0
- data/doc/image/dataframe/assign.png +0 -0
- data/doc/image/dataframe/assign_operation.png +0 -0
- data/doc/image/dataframe/drop.png +0 -0
- data/doc/image/dataframe/pick.png +0 -0
- data/doc/image/dataframe/pick_operation.png +0 -0
- data/doc/image/dataframe/remove.png +0 -0
- data/doc/image/dataframe/rename.png +0 -0
- data/doc/image/dataframe/rename_operation.png +0 -0
- data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
- data/doc/image/dataframe/slice.png +0 -0
- data/doc/image/dataframe/slice_operation.png +0 -0
- data/doc/image/dataframe_model.png +0 -0
- data/doc/image/group_operation.png +0 -0
- data/doc/image/replace-if_then.png +0 -0
- data/doc/image/reshaping_dataframe.png +0 -0
- data/doc/image/screenshot.png +0 -0
- data/doc/image/vector/binary_element_wise.png +0 -0
- data/doc/image/vector/unary_aggregation.png +0 -0
- data/doc/image/vector/unary_aggregation_w_option.png +0 -0
- data/doc/image/vector/unary_element_wise.png +0 -0
- data/lib/red_amber/data_frame.rb +33 -41
- data/lib/red_amber/data_frame_displayable.rb +59 -6
- data/lib/red_amber/data_frame_loadsave.rb +36 -0
- data/lib/red_amber/data_frame_reshaping.rb +12 -10
- data/lib/red_amber/data_frame_selectable.rb +53 -9
- data/lib/red_amber/data_frame_variable_operation.rb +57 -20
- data/lib/red_amber/group.rb +5 -3
- data/lib/red_amber/helper.rb +20 -18
- data/lib/red_amber/vector.rb +50 -31
- data/lib/red_amber/vector_functions.rb +21 -24
- data/lib/red_amber/vector_selectable.rb +18 -9
- data/lib/red_amber/vector_updatable.rb +6 -3
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +1 -0
- metadata +13 -3
- data/doc/examples_of_red_amber.ipynb +0 -6783
checksums.yaml
CHANGED
````diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: a16699a945f41bf98790f698998126cc6b4a5e916eccb805e78448ec029f9310
+data.tar.gz: 5e7fa732f64567fd85e5a74b046e80861824f13d15dc910278b6c62359db9a22
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 6ae7a6e3a8015b6b9736fb934526d9dc96b43830f0890ccbc16e175e539a8df1053432a63dde84a31dbd3a170aa6256b681127c510117723427bce815568c981
+data.tar.gz: a0e7d86a7bdc6be7ec493ef5331ced5ecf4e6b89458f4252f208435905a7e4e80a088a718098073fb0c65c86d76297c70c978cd4dec28b1eb1a0d915bb7e3608
````
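A `.gem` file is a plain tar archive. Its members include `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`, and `checksums.yaml` records the SHA256 and SHA512 digests of the first two. The sketch below shows one way to check a downloaded gem against those digests using Ruby's standard `digest`, `rubygems/package`, `yaml` and `zlib` libraries. The helper names `read_gem_members`, `load_checksums` and `checksum_mismatches` are hypothetical and are not part of the RubyGems API.

```ruby
require 'digest'
require 'rubygems/package'
require 'yaml'
require 'zlib'

# Hypothetical helper: read every member of a .gem (a plain tar archive)
# into a Hash of entry name => raw bytes.
def read_gem_members(path)
  members = {}
  File.open(path, 'rb') do |io|
    Gem::Package::TarReader.new(io).each do |entry|
      members[entry.full_name] = entry.read.to_s
    end
  end
  members
end

# Hypothetical helper: gunzip and parse the checksums.yaml.gz member.
def load_checksums(members)
  YAML.safe_load(Zlib.gunzip(members.fetch('checksums.yaml.gz')))
end

# Hypothetical helper: compare each digest listed in checksums.yaml with the
# digest of the corresponding member; returns mismatching [algorithm, name] pairs.
def checksum_mismatches(members, checksums)
  digests = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }
  checksums.flat_map do |algorithm, entries|
    digest = digests.fetch(algorithm)
    entries.reject { |name, expected| digest.hexdigest(members.fetch(name)) == expected }
           .map { |name, _expected| [algorithm, name] }
  end
end
```

For a downloaded file, you could chain the helpers as `members = read_gem_members(path)`, then call `checksum_mismatches(members, load_checksums(members))`. An empty result means every listed digest matched.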
data/.rubocop.yml
CHANGED
````diff
@@ -63,6 +63,7 @@ Metrics/AbcSize:
 - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
 - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
 - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
+- 'lib/red_amber/data_frame_variable_operation.rb' # Max: 30.15
 - 'lib/red_amber/vector_updatable.rb' # Max: 36
 - 'lib/red_amber/vector_selectable.rb' # Max: 33

````
````diff
@@ -86,6 +87,7 @@ Metrics/CyclomaticComplexity:
 Exclude:
 - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
 - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+- 'lib/red_amber/helper.rb' # Max: 15
 - 'lib/red_amber/vector_selectable.rb' # Max: 13
 - 'lib/red_amber/vector_updatable.rb' # Max: 14

````
````diff
@@ -94,6 +96,8 @@ Metrics/MethodLength:
 Max: 30
 Exclude:
 - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
+- 'lib/red_amber/data_frame_selectable.rb' # Max: 38
+- 'lib/red_amber/data_frame_variable_operation.rb' # Max: 35

 # Max: 100
 Metrics/ModuleLength:
````
````diff
@@ -109,6 +113,7 @@ Metrics/PerceivedComplexity:
 Max: 13
 Exclude:
 - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+- 'lib/red_amber/helper.rb' # Max: 15
 - 'lib/red_amber/vector_updatable.rb' # Max: 15
 - 'lib/red_amber/data_frame_displayable.rb' # Max: 19

````
data/CHANGELOG.md
CHANGED
````diff
@@ -1,3 +1,121 @@
+## [0.2.2] - 2022-10-04
+
+- Bug fixes
+
+- Return self when no replacement happen in Vector#replace. (#92)
+
+- Limit n-digits in to_iruby. (#111)
+
+- Fix displaying space in to_iruby. (#111)
+
+- Raise error if key is duplicated. (#113)
+
+- Fix DataFrame#pick/#drop with endless Range. (#113)
+
+- Change type from dictionary to string in DataFrame reshaping methods. (#113)
+
+- Fix arguments parser to accept Enumerator. (#114)
+
+- New features and improvements
+
+- Support to make a data frame from a to_arrow-responsible object. (#106) [Patch by Kenta Murata]
+
+- Introduce DataFrame#auto_cast (experimental feature) (#105)
+
+- Change default name in DataFrame#transpose, #to_long, #to_wide. (#110)
+
+- Add Vector#dictionary? method. (#113)
+
+- Add display mode 'Plain' and 'Minimum'. (#113)
+
+- Refactor code
+
+- Refine test_vector_selectable. (#92)
+- Refine test_vector_updatable. (#92)
+- Refine Vector.new. (#113)
+- Refine DataFrame#pick, #drop. (#113)
+
+- Documents
+
+- Update images. (#90, #105, #113)
+
+- Update README to use simpler examples. (#112)
+- Update README with a new screenshot example. (#113)
+
+- GitHub site
+
+- Update Jupyter notebooks in Binder (#88, #115)
+- Move binder support to heronshoes/docker-stacks repository.
+- Update README notebook on binder.
+- Add examples_of_RedAmber notebook on binder.
+
+- Start to use discussions.
+
+- Thanks
+
+- Kenta Murata
+
+## [0.2.1] - 2022-09-07
+
+- Bug fixes
+
+- Fix `Vector#each` with block (#66)
+`Vector#each` will return value of each element with block.
+
+- Fix table format at size == 9 (#67)
+
+- Fix to support Vector in `DataFrame#assign` (#77)
+
+- Add `assert_delta` functionality for `assert_with_NaN` (#78)
+
+- Fix Vector#is_in when self is chunked (#79)
+
+- Fix Array type error (uint/int) (#79)
+
+- New features and improvements
+
+- Refine `DataFrame#indices` method (#67)
+
+- Update DataFrame reshaping methods (#73)
+
+- Change default option value of DataFrame reshaping
+
+- Change the order of import_cars example
+
+- Add `DataFrame#method_missing` to get column vector by method (#75)
+
+- Add `DataFrame#method_missing` to get column (#75)
+
+- Accept both args and block in `DataFrame#assign` (#75)
+
+- Accept indices in `DataFrame#pick` and `DataFrame#drop` (#76)
+
+- Add `DataFrame#slice_by` method (#77)
+
+- Add new Vector functions (#78)
+
+- Add inverse trigonometric function for Vector
+- `acos`
+- `asin`
+
+- Add logarithmic function for Vector
+- `ln`
+- `log10`
+- `log1p`
+- `log2`
+
+- Add binary function `Vector#logb`
+
+- Docker image and Jupyter Notebook [Thanks to Kenta Murata]
+- Add link to RubyData in README
+- Add link to interactive README by Binder
+
+- Update Jupyter Notebook `71 examples of RedAmber`
+
+- Thanks
+
+- Kenta Murata
+
 ## [0.2.0] - 2022-08-15

 - Bump version up to 0.2.0
````
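Two 0.2.2 bug fixes concern positional arguments: `DataFrame#pick`/`#drop` with an endless Range (#113), and accepting an Enumerator in the arguments parser (#114). The following is a hypothetical plain-Ruby sketch of how such arguments can be normalized into column indices. `expand_indices` and `flatten_index_args` are illustrative names, and this is not RedAmber's implementation.

```ruby
# Hypothetical sketch, not RedAmber's implementation: expand positional
# arguments into a flat list of column indices for a frame with `size` columns.
def expand_indices(args, size)
  indices = flatten_index_args(args, size)
  out_of_range = indices.reject { |i| i.between?(0, size - 1) }
  raise IndexError, "index out of range: #{out_of_range.inspect}" unless out_of_range.empty?

  indices
end

def flatten_index_args(args, size)
  args.flat_map do |arg|
    case arg
    when Integer
      # Negative integers count from the end, as in Array#[].
      [arg.negative? ? arg + size : arg]
    when Range
      # A beginless range starts at 0; an endless range runs to the last column.
      first = arg.begin || 0
      first += size if first.negative?
      if arg.end.nil?
        last = size - 1
      else
        last = arg.end
        last += size if last.negative?
        last -= 1 if arg.exclude_end?
      end
      (first..last).to_a
    when Array, Enumerator
      # Enumerators (e.g. `[0, 2].each` or `(0..4).step(2)`) are materialized first.
      flatten_index_args(arg.to_a, size)
    else
      raise ArgumentError, "unsupported argument: #{arg.inspect}"
    end
  end
end
```

The endless case is the subtle one. Because `arg.end` is `nil`, it has to be resolved against the frame width before the range can be enumerated.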
````diff
@@ -236,6 +354,13 @@
 - Documentation
 - Fix typo in DataFrame.md

+- Github site
+- Add gem and status badges in README. (#42) [Patch by kojix2]
+
+- Thanks
+
+- kojix2
+
 ## [0.1.5] - 2022-06-12 (experimental)

 - Bug fixes
````
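Two 0.2.2 changelog entries affect what the DataFrame constructor accepts. It now builds a data frame from any to_arrow-responsible object (#106), and it raises an error if a key is duplicated (#113). The hypothetical sketch below shows both ideas in plain Ruby: duck typing on `#to_arrow`, then a duplicate-key check. `prepare_columns` is an illustrative name, not RedAmber's API. The sketch assumes `#to_arrow` returns `[key, values]` pairs, whereas a real red-datasets `#to_arrow` returns an Arrow table.

```ruby
# Hypothetical sketch, not RedAmber's API: normalize a data source into a
# Hash of column key => values.
def prepare_columns(source)
  # Duck typing: any object that responds to #to_arrow is converted first.
  # Assumption for this sketch: #to_arrow returns [key, values] pairs.
  pairs = source.respond_to?(:to_arrow) ? source.to_arrow : source.to_a
  keys = pairs.map { |key, _values| key.to_sym }

  # Keys that collide after conversion to Symbols (e.g. 'a' and :a) are rejected.
  duplicated = keys.select { |key| keys.count(key) > 1 }.uniq
  raise ArgumentError, "duplicate keys: #{duplicated.inspect}" unless duplicated.empty?

  pairs.to_h { |key, values| [key.to_sym, values] }
end
```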
data/README.md
CHANGED
````diff
@@ -2,12 +2,15 @@

 [](https://badge.fury.io/rb/red_amber)
 [](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
+[](https://github.com/heronshoes/red_amber/discussions)

 A simple dataframe library for Ruby.

 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [](https://gitter.im/red-data-tools/en)
 - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)

+
+
 ## Requirements

 Supported Ruby version is >= 2.7.
````
````diff
@@ -53,328 +56,136 @@ Or install it yourself as:
 gem install red_amber
 ```

-##
+## Docker image and Jupyter Notebook

-
+[RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).

-
-
-require 'datasets-arrow'
+Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=README.ipynb).
+[](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)

-arrow = Datasets::Penguins.new.to_arrow
-penguins = RedAmber::DataFrame.new(arrow)

-
-#<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
-species island bill_length_mm bill_depth_mm flipper_length_mm ... year
-<string> <string> <double> <double> <uint8> ... <uint16>
-1 Adelie Torgersen 39.1 18.7 181 ... 2007
-2 Adelie Torgersen 39.5 17.4 186 ... 2007
-3 Adelie Torgersen 40.3 18.0 195 ... 2007
-4 Adelie Torgersen (nil) (nil) (nil) ... 2007
-5 Adelie Torgersen 36.7 19.3 193 ... 2007
-: : : : : : ... :
-342 Gentoo Biscoe 50.4 15.7 222 ... 2009
-343 Gentoo Biscoe 45.2 14.8 212 ... 2009
-344 Gentoo Biscoe 49.9 16.1 213 ... 2009
-```
+## Data frame in `RedAmber`

-
-
-
-For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
+Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
+The entity is a Red Arrow's Table object.

-![
-
-```ruby
-penguins.keys
-# =>
-[:species,
-:island,
-:bill_length_mm,
-:bill_depth_mm,
-:flipper_length_mm,
-:body_mass_g,
-:sex,
-:year]
-
-df = penguins.pick(:species, :island, :body_mass_g)
-df
-
-# =>
-#<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
-species island body_mass_g
-<string> <string> <uint16>
-1 Adelie Torgersen 3750
-2 Adelie Torgersen 3800
-3 Adelie Torgersen 3250
-4 Adelie Torgersen (nil)
-5 Adelie Torgersen 3450
-: : : :
-342 Gentoo Biscoe 5750
-343 Gentoo Biscoe 5200
-344 Gentoo Biscoe 5400
-```
-
-`DataFrame#drop` drops some columns to create a remainer DataFrame.
-
-
+

-
+Load the library.

 ```ruby
-#
-
-
-# =>
-#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
-body_mass_g
-<uint16>
-1 3750
-2 3800
-3 3250
-4 (nil)
-5 3450
-: :
-342 5750
-343 5200
-344 5400
+require 'red_amber' # require 'red-amber' is also OK.
+include RedAmber
 ```

-
-
-`DataFrame#assign` creates new columns or update existing columns.
-
-
+### Example: diamonds dataset

 ```ruby
-
-df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
+require 'datasets-arrow' # to load sample data

-
-
-body_mass_g body_mass_kg
-<uint16> <double>
-1 3750 3.8
-2 3800 3.8
-3 3250 3.3
-4 (nil) (nil)
-5 3450 3.5
-: : :
-342 5750 5.8
-343 5200 5.2
-344 5400 5.4
-```
-
-`DataFrame#slice` selects rows (observations) to create a sub DataFrame.
-
-
-
-```ruby
-# returns 5 rows at the start and 5 rows from the end
-penguins.slice(0...5, -5..-1)
+dataset = Datasets::Diamonds.new
+diamonds = DataFrame.new(dataset) # from v0.2.2, should be `dataset.to_arrow` if older.

 # =>
-#<RedAmber::DataFrame :
-
-
-
-
-
-
-
-
-
-
-
+#<RedAmber::DataFrame : 53940 x 10 Vectors, 0x000000000000f668>
+carat cut color clarity depth table price x ... z
+<double> <string> <string> <string> <double> <double> <uint16> <double> ... <double>
+0 0.23 Ideal E SI2 61.5 55.0 326 3.95 ... 2.43
+1 0.21 Premium E SI1 59.8 61.0 326 3.89 ... 2.31
+2 0.23 Good E VS1 56.9 65.0 327 4.05 ... 2.31
+3 0.29 Premium I VS2 62.4 58.0 334 4.2 ... 2.63
+4 0.31 Good J SI2 63.3 58.0 335 4.34 ... 2.75
+: : : : : : : : : ... :
+53937 0.7 Very Good D SI1 62.8 60.0 2757 5.66 ... 3.56
+53938 0.86 Premium H SI2 61.0 58.0 2757 6.15 ... 3.74
+53939 0.75 Ideal D SI2 62.2 55.0 2757 5.83 ... 3.64
 ```

-
-
-
+For example, we can compute mean prices per 'cut' for the data larger than 1 carat.

 ```ruby
-
-
+df = diamonds
+.slice { carat > 1 }
+.group(:cut)
+.mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
+.sort('-mean(price)')

 # =>
-#<RedAmber::DataFrame :
-
-
-
-
-
-
-
-: : : : : : ... :
-242 Gentoo Biscoe 50.4 15.7 222 ... 2009
-243 Gentoo Biscoe 45.2 14.8 212 ... 2009
-244 Gentoo Biscoe 49.9 16.1 213 ... 2009
+#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f67c>
+cut mean(price)
+<string> <double>
+0 Ideal 8674.23
+1 Premium 8487.25
+2 Very Good 8340.55
+3 Good 7753.6
+4 Fair 7177.86
 ```

-
-
-This example is usage of block to update a column.
+Arrow data is immutable, so these methods always return new objects.
+Next example will rename a column and create a new column by simple calcuration.

 ```ruby
-
-integer: [0, 1, 2, 3, nil],
-float: [0.0, 1.1, 2.2, Float::NAN, nil],
-string: ['A', 'B', 'C', 'D', nil],
-boolean: [true, false, true, false, nil])
-df
+usdjpy = 110.0

-
-
-integer float string boolean
-<uint8> <double> <string> <boolean>
-1 0 0.0 A true
-2 1 1.1 B false
-3 2 2.2 C true
-4 3 NaN D false
-5 (nil) (nil) (nil) (nil)
-
-df.assign do
-vectors.select(&:float?).map { |v| [v.key, -v] }
-# => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
-end
+df.rename('mean(price)': :mean_price_USD)
+.assign(:mean_price_JPY) { mean_price_USD * usdjpy }

 # =>
-#<RedAmber::DataFrame : 5 x 3 Vectors,
-
-<
-
-
-
-
-
+#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f71c>
+cut mean_price_USD mean_price_JPY
+<string> <double> <double>
+0 Ideal 8674.23 954164.93
+1 Premium 8487.25 933597.34
+2 Very Good 8340.55 917460.37
+3 Good 7753.6 852896.11
+4 Fair 7177.86 789564.12
 ```

-
+### Example: starwars dataset

-
-# remove all observations containing nil
-nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
-nil_removed.tdr
-
-# =>
-RedAmber::DataFrame : 342 x 8 Vectors
-Vectors : 5 numeric, 3 strings
-# key type level data_preview
-1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
-2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
-3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
-4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
-5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ]
-6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
-7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
-8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
-```
-
-For this frequently needed task, we can do it much simpler.
+Next example is `starwars` dataset reading from the downloaded CSV file. Followed by minimum data cleansing.

 ```ruby
-
-```
+uri = URI('https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv')

-
+starwars = DataFrame.load(uri)

-```ruby
-puts penguins.summary.to_s(width: 82)
-
-# =>
-variables count mean std min 25% median 75% max
-<dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
-1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
-2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
-3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
-4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
-5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
-```
-
-`DataFrame#group` method can be used for the grouping tasks.
-
-```ruby
-starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
 starwars
+.drop(0) # delete unnecessary index column
+.remove { species == "NA" } # delete unnecessary rows
+.group(:species) { [count(:species), mean(:height, :mass)] }
+.slice { count > 1 }

 # =>
-#<RedAmber::DataFrame :
-unnamed1 name height mass hair_color skin_color eye_color ... species
-<int64> <string> <int64> <double> <string> <string> <string> ... <string>
-1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
-2 2 C-3PO 167 75.0 NA gold yellow ... Droid
-3 3 R2-D2 96 32.0 NA white, blue red ... Droid
-4 4 Darth Vader 202 136.0 none white yellow ... Human
-5 5 Leia Organa 150 49.0 brown light brown ... Human
-: : : : : : : : ... :
-85 85 BB8 (nil) (nil) none none black ... Droid
-86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
-87 87 Padmé Amidala 165 45.0 brown light brown ... Human
-
-grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
-grouped.slice { v(:count) > 1 }
-
-# =>
-#<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
+#<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
 species count mean(height) mean(mass)
 <string> <int64> <double> <double>
-
-
-
-
-
-
-
-
-9 Kaminoan 2 221.0 88.0
+0 Human 35 176.65 82.78
+1 Droid 6 131.2 69.75
+2 Wookiee 2 231.0 124.0
+3 Gungan 3 208.67 74.0
+4 Zabrak 2 173.0 80.0
+5 Twi'lek 2 179.0 55.0
+6 Mirialan 2 168.0 53.1
+7 Kaminoan 2 221.0 88.0
 ```

 See [DataFrame.md](doc/DataFrame.md) for other examples and details.


-
+### `Vector` for 1D data object in column

 Class `RedAmber::Vector` represents a series of data in the DataFrame.
-Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
-
-```ruby
-penguins[:bill_length_mm]
-# =>
-#<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
-[39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
-```
-
-Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
-
-This is an element-wise comparison and returns a boolean Vector of same size.
-
-
-
-```ruby
-penguins[:bill_length_mm] < 40
-
-# =>
-#<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
-[true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
-```
-
-Next example returns aggregated result.
-
-
-
-```ruby
-penguins[:bill_length_mm].mean
-43.92192982456141
-# =>
-
-```

 See [Vector.md](doc/Vector.md) for details.

 ## Jupyter notebook

-[
+[73 Examples of Red Amber](binder/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
+
+You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
+[](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb)
+

 ## Development

````
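The new README examples filter rows, group them by one column, count the rows per group and take per-group means. To make those semantics concrete, here is a hypothetical plain-Ruby sketch over an Array of Hashes. It does not use RedAmber or Arrow, and `group_stats` is an illustrative name. In this sketch `count` is the number of rows in each group. Skipping `nil` in the mean is an assumption modeled on Arrow's default of skipping nulls in `mean`.

```ruby
# Hypothetical plain-Ruby sketch of grouped aggregation; not RedAmber code.
# rows: Array of Hashes keyed by column name.
def group_stats(rows, by:, mean_of:)
  rows.group_by { |row| row[by] }.map do |key, members|
    stats = { by => key, count: members.size } # count = rows in the group
    mean_of.each do |column|
      # Ignore nil values, mirroring Arrow's default of skipping nulls in mean.
      values = members.map { |row| row[column] }.compact
      stats[:"mean(#{column})"] = values.empty? ? nil : values.sum.fdiv(values.size)
    end
    stats
  end
end
```

For the diamonds pipeline, filter the input first with `select { |r| r[:carat] > 1 }`, then sort the result with `sort_by { |r| -r[:'mean(price)'] }`. For the starwars pipeline, drop the `"NA"` species with `reject` before grouping, then keep the groups whose `:count` exceeds 1.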
````diff
@@ -385,8 +196,14 @@ bundle install
 bundle exec rake test
 ```

+## Community
+
 I will appreciate if you could help to improve this project. Here are a few ways you can help:

+- Let's talk in the [discussions](https://github.com/heronshoes/red_amber/discussions). [](https://github.com/heronshoes/red_amber/discussions)
+- Browse Q and A, how to use, tips, etc.
+- Ask questions you’re wondering about.
+- Share ideas. The idea may be promoted to issues or pull requests.
 - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
 - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
 - Write, clarify, or fix documentation
````