red_amber 0.2.1 → 0.2.2

Files changed (41)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +3 -0
  3. data/CHANGELOG.md +69 -2
  4. data/README.md +83 -280
  5. data/doc/DataFrame.md +279 -265
  6. data/doc/Vector.md +28 -36
  7. data/doc/image/basic_verbs.png +0 -0
  8. data/doc/image/dataframe/assign.png +0 -0
  9. data/doc/image/dataframe/assign_operation.png +0 -0
  10. data/doc/image/dataframe/drop.png +0 -0
  11. data/doc/image/dataframe/pick.png +0 -0
  12. data/doc/image/dataframe/pick_operation.png +0 -0
  13. data/doc/image/dataframe/remove.png +0 -0
  14. data/doc/image/dataframe/rename.png +0 -0
  15. data/doc/image/dataframe/rename_operation.png +0 -0
  16. data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
  17. data/doc/image/dataframe/slice.png +0 -0
  18. data/doc/image/dataframe/slice_operation.png +0 -0
  19. data/doc/image/dataframe_model.png +0 -0
  20. data/doc/image/group_operation.png +0 -0
  21. data/doc/image/replace-if_then.png +0 -0
  22. data/doc/image/reshaping_dataframe.png +0 -0
  23. data/doc/image/screenshot.png +0 -0
  24. data/doc/image/vector/binary_element_wise.png +0 -0
  25. data/doc/image/vector/unary_aggregation.png +0 -0
  26. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  27. data/doc/image/vector/unary_element_wise.png +0 -0
  28. data/lib/red_amber/data_frame.rb +10 -37
  29. data/lib/red_amber/data_frame_displayable.rb +56 -3
  30. data/lib/red_amber/data_frame_loadsave.rb +36 -0
  31. data/lib/red_amber/data_frame_reshaping.rb +8 -6
  32. data/lib/red_amber/data_frame_variable_operation.rb +25 -19
  33. data/lib/red_amber/group.rb +5 -3
  34. data/lib/red_amber/helper.rb +20 -18
  35. data/lib/red_amber/vector.rb +49 -30
  36. data/lib/red_amber/vector_selectable.rb +9 -1
  37. data/lib/red_amber/vector_updatable.rb +6 -3
  38. data/lib/red_amber/version.rb +1 -1
  39. data/lib/red_amber.rb +1 -0
  40. metadata +13 -3
  41. data/doc/examples_of_red_amber.ipynb +0 -8979
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d239a3fa90e5796fb695f8d3c4995d0a2178ea7c8c2789bed157e688902585cb
- data.tar.gz: 968c02294d24a3dabaa6e5128be0bcfad713e131df15850ac0ceb64c2883dcd0
+ metadata.gz: a16699a945f41bf98790f698998126cc6b4a5e916eccb805e78448ec029f9310
+ data.tar.gz: 5e7fa732f64567fd85e5a74b046e80861824f13d15dc910278b6c62359db9a22
  SHA512:
- metadata.gz: d1c5ffd9650dd8c9e825514cd7e2ff4914690bd731ac262fca6cc17e56c1e312679689351a05fb741dccfb59377214706a8bf6ca6fe3237ca46fb623ae1b9f10
- data.tar.gz: f37c4aff9170cd5105737a9d2b3d827051254dcca6968b697f5ed3a70e1b2c3cb14303e88a9c342870d1447450a538e445d6f3d37de53591d3f6d13b87aebc16
+ metadata.gz: 6ae7a6e3a8015b6b9736fb934526d9dc96b43830f0890ccbc16e175e539a8df1053432a63dde84a31dbd3a170aa6256b681127c510117723427bce815568c981
+ data.tar.gz: a0e7d86a7bdc6be7ec493ef5331ced5ecf4e6b89458f4252f208435905a7e4e80a088a718098073fb0c65c86d76297c70c978cd4dec28b1eb1a0d915bb7e3608
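The checksums.yaml hunk above swaps the SHA256 and SHA512 digests recorded for `metadata.gz` and `data.tar.gz`. As a minimal sketch (not part of the gem), digests of this kind could be recomputed with Ruby's standard `digest` library and compared against the recorded values. The helper names `digests_for` and `checksums_match?` and the sample payload are hypothetical; a real check would read the bytes of the unpacked archive members instead.

```ruby
require 'digest'

# Hypothetical helper: returns the two digests that checksums.yaml records
# for one archive member, given that member's raw bytes.
def digests_for(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Hypothetical helper: true when every recomputed digest equals the recorded one.
def checksums_match?(bytes, recorded)
  digests_for(bytes).all? { |algo, hex| recorded[algo] == hex }
end

# Stand-in payload (assumption); in practice this might be
# File.binread('data.tar.gz') from an unpacked .gem file.
payload = 'example bytes'
recorded = digests_for(payload)

puts checksums_match?(payload, recorded)        # unchanged bytes match
puts checksums_match?(payload + '!', recorded)  # any change breaks the match
```

Because both digests change whenever a single byte of the archive changes, every release is expected to rewrite all four lines, as this hunk does.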
data/.rubocop.yml CHANGED
@@ -63,6 +63,7 @@ Metrics/AbcSize:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
  - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 30.15
  - 'lib/red_amber/vector_updatable.rb' # Max: 36
  - 'lib/red_amber/vector_selectable.rb' # Max: 33

@@ -86,6 +87,7 @@ Metrics/CyclomaticComplexity:
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/helper.rb' # Max: 15
  - 'lib/red_amber/vector_selectable.rb' # Max: 13
  - 'lib/red_amber/vector_updatable.rb' # Max: 14

@@ -111,6 +113,7 @@ Metrics/PerceivedComplexity:
  Max: 13
  Exclude:
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/helper.rb' # Max: 15
  - 'lib/red_amber/vector_updatable.rb' # Max: 15
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 19

data/CHANGELOG.md CHANGED
@@ -1,6 +1,63 @@
+ ## [0.2.2] - 2022-10-04
+
+ - Bug fixes
+
+ - Return self when no replacement happen in Vector#replace. (#92)
+
+ - Limit n-digits in to_iruby. (#111)
+
+ - Fix displaying space in to_iruby. (#111)
+
+ - Raise error if key is duplicated. (#113)
+
+ - Fix DataFrame#pick/#drop with endless Range. (#113)
+
+ - Change type from dictionary to string in DataFrame reshaping methods. (#113)
+
+ - Fix arguments parser to accept Enumerator. (#114)
+
+ - New features and improvements
+
+ - Support to make a data frame from a to_arrow-responsible object. (#106) [Patch by Kenta Murata]
+
+ - Introduce DataFrame#auto_cast (experimental feature) (#105)
+
+ - Change default name in DataFrame#transpose, #to_long, #to_wide. (#110)
+
+ - Add Vector#dictionary? method. (#113)
+
+ - Add display mode 'Plain' and 'Minimum'. (#113)
+
+ - Refactor code
+
+ - Refine test_vector_selectable. (#92)
+ - Refine test_vector_updatable. (#92)
+ - Refine Vector.new. (#113)
+ - Refine DataFrame#pick, #drop. (#113)
+
+ - Documents
+
+ - Update images. (#90, #105, #113)
+
+ - Update README to use simpler examples. (#112)
+ - Update README with a new screenshot example. (#113)
+
+ - GitHub site
+
+ - Update Jupyter notebooks in Binder (#88, #115)
+ - Move binder support to heronshoes/docker-stacks repository.
+ - Update README notebook on binder.
+ - Add examples_of_RedAmber notebook on binder.
+
+ - Start to use discussions.
+
+ - Thanks
+
+ - Kenta Murata
+
  ## [0.2.1] - 2022-09-07

- -Bug fixes
+ - Bug fixes

  - Fix `Vector#each` with block (#66)
  `Vector#each` will return value of each element with block.
@@ -49,12 +106,15 @@

  - Add binary function `Vector#logb`

- - Docker image and Jupyter Notebook (Thanks to @mrkn)
+ - Docker image and Jupyter Notebook [Thanks to Kenta Murata]
  - Add link to RubyData in README
  - Add link to interactive README by Binder

  - Update Jupyter Notebook `71 examples of RedAmber`

+ - Thanks
+
+ - Kenta Murata

  ## [0.2.0] - 2022-08-15

@@ -294,6 +354,13 @@
  - Documentation
  - Fix typo in DataFrame.md

+ - Github site
+ - Add gem and status badges in README. (#42) [Patch by kojix2]
+
+ - Thanks
+
+ - kojix2
+
  ## [0.1.5] - 2022-06-12 (experimental)

  - Bug fixes
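Two of the 0.2.2 entries in the CHANGELOG diff above, "Raise error if key is duplicated" and "Fix DataFrame#pick/#drop with endless Range", concern how column keys are resolved. The CHANGELOG does not show that code, so the following is only an illustrative plain-Ruby sketch of the two behaviors. `KEYS`, `resolve_keys`, and `ensure_unique!` are hypothetical names and are not RedAmber's API.

```ruby
# Hypothetical stand-in for a data frame's column keys.
KEYS = %i[species island body_mass_g sex year].freeze

# Hypothetical helper: expand symbols, integer indices, and index ranges
# (including endless ranges such as 2..) into a flat list of keys.
def resolve_keys(keys, *args)
  args.flat_map do |arg|
    case arg
    when Range then keys[arg] || []       # Array#[] accepts endless ranges
    when Integer then [keys.fetch(arg)]   # raises IndexError when out of bounds
    else [arg.to_sym]
    end
  end
end

# Hypothetical helper: reject a key list that names the same column twice.
def ensure_unique!(keys)
  repeated = keys.tally.find { |_key, count| count > 1 } # tally needs Ruby >= 2.7
  raise ArgumentError, "duplicated key: #{repeated.first.inspect}" if repeated

  keys
end

p resolve_keys(KEYS, (2..))
p ensure_unique!(resolve_keys(KEYS, :species, (3..)))
```

Raising early on a duplicated key is one way to keep every column addressable by a single name; whether the gem does it at this point in the call chain is an assumption.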
data/README.md CHANGED
@@ -2,12 +2,15 @@

  [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
  [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
+ [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)

  A simple dataframe library for Ruby.

  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)

+ ![screenshot from jupyterlab](doc/image/screenshot.png)
+
  ## Requirements

  Supported Ruby version is >= 2.7.
@@ -57,338 +60,132 @@ gem install red_amber

  [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).

- Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb).
- [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb)
-
+ Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=README.ipynb).
+ [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)


- ## `RedAmber::DataFrame`
+ ## Data frame in `RedAmber`

- It represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
+ Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
+ The entity is a Red Arrow's Table object.

  ![dataframe model of RedAmber](doc/image/dataframe_model.png)

- ```ruby
- require 'red_amber' # require 'red-amber' is also OK.
- require 'datasets-arrow'
-
- arrow = Datasets::Penguins.new.to_arrow
- penguins = RedAmber::DataFrame.new(arrow)
-
- # =>
- #<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
- <string> <string> <double> <double> <uint8> ... <uint16>
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
- 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
- 5 Adelie Torgersen 36.7 19.3 193 ... 2007
- : : : : : : ... :
- 342 Gentoo Biscoe 50.4 15.7 222 ... 2009
- 343 Gentoo Biscoe 45.2 14.8 212 ... 2009
- 344 Gentoo Biscoe 49.9 16.1 213 ... 2009
- ```
-
- For example, `DataFrame#pick` accepts keys as arguments and returns a sub DataFrame.
-
- ![pick method image](doc/image/dataframe/pick.png)
-
- ```ruby
- penguins.keys
- # =>
- [:species,
- :island,
- :bill_length_mm,
- :bill_depth_mm,
- :flipper_length_mm,
- :body_mass_g,
- :sex,
- :year]
-
- df = penguins.pick(:species, :island, :body_mass_g)
- df
-
- # =>
- #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
- species island body_mass_g
- <string> <string> <uint16>
- 1 Adelie Torgersen 3750
- 2 Adelie Torgersen 3800
- 3 Adelie Torgersen 3250
- 4 Adelie Torgersen (nil)
- 5 Adelie Torgersen 3450
- : : : :
- 342 Gentoo Biscoe 5750
- 343 Gentoo Biscoe 5200
- 344 Gentoo Biscoe 5400
- ```
-
- `DataFrame#drop` drops some columns to create a remainer DataFrame.
-
- ![drop method image](doc/image/dataframe/drop.png)
-
- You can specify by keys or a boolean array of same size as n_keys.
+ Load the library.

  ```ruby
- # Same as df.drop(:species, :island)
- df = df.drop(true, true, false)
-
- # =>
- #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
- body_mass_g
- <uint16>
- 1 3750
- 2 3800
- 3 3250
- 4 (nil)
- 5 3450
- : :
- 342 5750
- 343 5200
- 344 5400
+ require 'red_amber' # require 'red-amber' is also OK.
+ include RedAmber
  ```

- Arrow data is immutable, so these methods always return an new object.
-
- `DataFrame#assign` creates new columns or update existing columns.
-
- ![assign method image](doc/image/dataframe/assign.png)
+ ### Example: diamonds dataset

  ```ruby
- # New column is created because ':body_mass_kg' is a new key.
- df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
-
- # =>
- #<RedAmber::DataFrame : 344 x 2 Vectors, 0x00000000000212f0>
- body_mass_g body_mass_kg
- <uint16> <double>
- 1 3750 3.8
- 2 3800 3.8
- 3 3250 3.3
- 4 (nil) (nil)
- 5 3450 3.5
- : : :
- 342 5750 5.8
- 343 5200 5.2
- 344 5400 5.4
- ```
-
- `DataFrame#slice` selects rows (observations) to create a sub DataFrame.
+ require 'datasets-arrow' # to load sample data

- ![slice method image](doc/image/dataframe/slice.png)
-
- ```ruby
- # returns 5 rows at the start and 5 rows from the end
- penguins.slice(0...5, -5..-1)
+ dataset = Datasets::Diamonds.new
+ diamonds = DataFrame.new(dataset) # from v0.2.2, should be `dataset.to_arrow` if older.

  # =>
- #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
- <string> <string> <double> <double> <uint8> ... <uint16>
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
- 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
- 5 Adelie Torgersen 36.7 19.3 193 ... 2007
- : : : : : : ... :
- 8 Gentoo Biscoe 50.4 15.7 222 ... 2009
- 9 Gentoo Biscoe 45.2 14.8 212 ... 2009
- 10 Gentoo Biscoe 49.9 16.1 213 ... 2009
+ #<RedAmber::DataFrame : 53940 x 10 Vectors, 0x000000000000f668>
+ carat cut color clarity depth table price x ... z
+ <double> <string> <string> <string> <double> <double> <uint16> <double> ... <double>
+ 0 0.23 Ideal E SI2 61.5 55.0 326 3.95 ... 2.43
+ 1 0.21 Premium E SI1 59.8 61.0 326 3.89 ... 2.31
+ 2 0.23 Good E VS1 56.9 65.0 327 4.05 ... 2.31
+ 3 0.29 Premium I VS2 62.4 58.0 334 4.2 ... 2.63
+ 4 0.31 Good J SI2 63.3 58.0 335 4.34 ... 2.75
+ : : : : : : : : : ... :
+ 53937 0.7 Very Good D SI1 62.8 60.0 2757 5.66 ... 3.56
+ 53938 0.86 Premium H SI2 61.0 58.0 2757 6.15 ... 3.74
+ 53939 0.75 Ideal D SI2 62.2 55.0 2757 5.83 ... 3.64
  ```

- `DataFrame#remove` rejects rows (observations) to create a remainer DataFrame.
-
- ![remove method image](doc/image/dataframe/remove.png)
+ For example, we can compute mean prices per 'cut' for the data larger than 1 carat.

  ```ruby
- # penguins[:bill_length_mm] < 40 returns a boolean Vector
- penguins.remove(penguins[:bill_length_mm] < 40)
+ df = diamonds
+ .slice { carat > 1 }
+ .group(:cut)
+ .mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
+ .sort('-mean(price)')

  # =>
- #<RedAmber::DataFrame : 244 x 8 Vectors, 0x000000000007d6f4>
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
- <string> <string> <double> <double> <uint8> ... <uint16>
- 1 Adelie Torgersen 40.3 18.0 195 ... 2007
- 2 Adelie Torgersen (nil) (nil) (nil) ... 2007
- 3 Adelie Torgersen 42.0 20.2 190 ... 2007
- 4 Adelie Torgersen 41.1 17.6 182 ... 2007
- 5 Adelie Torgersen 42.5 20.7 197 ... 2007
- : : : : : : ... :
- 242 Gentoo Biscoe 50.4 15.7 222 ... 2009
- 243 Gentoo Biscoe 45.2 14.8 212 ... 2009
- 244 Gentoo Biscoe 49.9 16.1 213 ... 2009
+ #<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f67c>
+ cut mean(price)
+ <string> <double>
+ 0 Ideal 8674.23
+ 1 Premium 8487.25
+ 2 Very Good 8340.55
+ 3 Good 7753.6
+ 4 Fair 7177.86
  ```

- DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
-
- Previous example is also OK with a block.
+ Arrow data is immutable, so these methods always return new objects.
+ Next example will rename a column and create a new column by simple calcuration.

  ```ruby
- penguins.remove { bill_length_mm < 40 }
- ```
-
- Next example is an usage of block to update a column.
-
- ```ruby
- df = RedAmber::DataFrame.new(
- integer: [0, 1, 2, 3, nil],
- float: [0.0, 1.1, 2.2, Float::NAN, nil],
- string: ['A', 'B', 'C', 'D', nil],
- boolean: [true, false, true, false, nil])
- df
-
- # =>
- #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
- integer float string boolean
- <uint8> <double> <string> <boolean>
- 1 0 0.0 A true
- 2 1 1.1 B false
- 3 2 2.2 C true
- 4 3 NaN D false
- 5 (nil) (nil) (nil) (nil)
-
- df.assign do
- vectors.select(&:float?).map { |v| [v.key, -v] }
- # => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
- end
-
- # =>
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
- index float string
- <uint8> <double> <string>
- 1 0 -0.0 A
- 2 1 -1.1 B
- 3 2 -2.2 C
- 4 3 NaN D
- 5 (nil) (nil) (nil)
- ```
-
- Next example is to eliminate rows containing nil.
+ usdjpy = 110.0

- ```ruby
- # remove all observations containing nil
- nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
- nil_removed.tdr
+ df.rename('mean(price)': :mean_price_USD)
+ .assign(:mean_price_JPY) { mean_price_USD * usdjpy }

  # =>
- RedAmber::DataFrame : 342 x 8 Vectors
- Vectors : 5 numeric, 3 strings
- # key type level data_preview
- 1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
- 2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
- 3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
- 4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
- 5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ]
- 6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
- 7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
- 8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f71c>
+ cut mean_price_USD mean_price_JPY
+ <string> <double> <double>
+ 0 Ideal 8674.23 954164.93
+ 1 Premium 8487.25 933597.34
+ 2 Very Good 8340.55 917460.37
+ 3 Good 7753.6 852896.11
+ 4 Fair 7177.86 789564.12
  ```

- For this frequently needed task, we can do it much simpler.
+ ### Example: starwars dataset

- ```ruby
- penguins.remove_nil # => same result as above
- ```
-
- `DataFrame#summary` shows summary statistics in a DataFrame.
+ Next example is `starwars` dataset reading from the downloaded CSV file. Followed by minimum data cleansing.

  ```ruby
- puts penguins.summary.to_s(width: 82)
+ uri = URI('https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv')

- # =>
- variables count mean std min 25% median 75% max
- <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
- 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
- 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
- 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
- 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
- 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
- ```
-
- `DataFrame#group` method can be used for the grouping tasks.
+ starwars = DataFrame.load(uri)

- ```ruby
- starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
  starwars
+ .drop(0) # delete unnecessary index column
+ .remove { species == "NA" } # delete unnecessary rows
+ .group(:species) { [count(:species), mean(:height, :mass)] }
+ .slice { count > 1 }

  # =>
- #<RedAmber::DataFrame : 87 x 12 Vectors, 0x000000000000607c>
- unnamed1 name height mass hair_color skin_color eye_color ... species
- <int64> <string> <int64> <double> <string> <string> <string> ... <string>
- 1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
- 2 2 C-3PO 167 75.0 NA gold yellow ... Droid
- 3 3 R2-D2 96 32.0 NA white, blue red ... Droid
- 4 4 Darth Vader 202 136.0 none white yellow ... Human
- 5 5 Leia Organa 150 49.0 brown light brown ... Human
- : : : : : : : : ... :
- 85 85 BB8 (nil) (nil) none none black ... Droid
- 86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
- 87 87 Padmé Amidala 165 45.0 brown light brown ... Human
-
- starwars.group(:species) { [count(:species), mean(:height, :mass)] }
- .slice { count > 1 }
-
- # =>
- #<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
+ #<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
  species count mean(height) mean(mass)
  <string> <int64> <double> <double>
- 1 Human 35 176.6 82.8
- 2 Droid 6 131.2 69.8
- 3 Wookiee 2 231.0 124.0
- 4 Gungan 3 208.7 74.0
- 5 NA 4 181.3 48.0
- 6 Zabrak 2 173.0 80.0
- 7 Twi'lek 2 179.0 55.0
- 8 Mirialan 2 168.0 53.1
- 9 Kaminoan 2 221.0 88.0
+ 0 Human 35 176.65 82.78
+ 1 Droid 6 131.2 69.75
+ 2 Wookiee 2 231.0 124.0
+ 3 Gungan 3 208.67 74.0
+ 4 Zabrak 2 173.0 80.0
+ 5 Twi'lek 2 179.0 55.0
+ 6 Mirialan 2 168.0 53.1
+ 7 Kaminoan 2 221.0 88.0
  ```

  See [DataFrame.md](doc/DataFrame.md) for other examples and details.


- ## `RedAmber::Vector`
+ ### `Vector` for 1D data object in column

  Class `RedAmber::Vector` represents a series of data in the DataFrame.
- Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
-
- ```ruby
- penguins[:bill_length_mm]
- # =>
- #<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
- [39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
- ```
-
- Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
-
- This is an element-wise comparison and returns a boolean Vector of same size.
-
- ![unary element-wise](doc/image/vector/unary_element_wise.png)
-
- ```ruby
- penguins[:bill_length_mm] < 40
-
- # =>
- #<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
- [true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
- ```
-
- Next example returns aggregated result.
-
- ![unary aggregation](doc/image/vector/unary_aggregation.png)
-
- ```ruby
- penguins[:bill_length_mm].mean
- 43.92192982456141
- # =>
-
- ```

  See [Vector.md](doc/Vector.md) for details.

  ## Jupyter notebook

- [71 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
+ [73 Examples of Red Amber](binder/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
+
+ You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
+ [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb)
+

  ## Development
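The new README text in this hunk chains `slice`, `group`, `mean`, and `sort` over the diamonds dataset. To make the data flow concrete without the gem installed, here is a rough plain-Ruby equivalent using `Enumerable`. The sample rows and the name `mean_price_by_cut` are hypothetical, and this is not how RedAmber or Arrow compute the result internally.

```ruby
# Hypothetical sample rows standing in for the diamonds dataset.
rows = [
  { carat: 1.2, cut: 'Ideal',   price: 9000 },
  { carat: 0.3, cut: 'Ideal',   price: 500 },
  { carat: 1.5, cut: 'Premium', price: 8000 },
  { carat: 1.1, cut: 'Premium', price: 7000 },
  { carat: 2.0, cut: 'Fair',    price: 6000 }
]

# Mirrors the README pipeline step by step: filter rows, group them,
# average one column per group, then sort by that average, descending.
def mean_price_by_cut(rows, min_carat: 1)
  filtered = rows.select { |row| row[:carat] > min_carat }   # slice { carat > 1 }
  grouped = filtered.group_by { |row| row[:cut] }            # group(:cut)
  means = grouped.map do |cut, members|                      # mean(:price)
    [cut, members.sum { |row| row[:price] }.fdiv(members.size)]
  end
  means.sort_by { |_cut, mean| -mean }                       # sort('-mean(price)')
end

mean_price_by_cut(rows).each { |cut, mean| puts "#{cut}: #{mean}" }
```

Each step returns a new collection and leaves `rows` untouched, which parallels the README's remark that Arrow data is immutable and these methods return new objects.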

@@ -399,8 +196,14 @@ bundle install
  bundle exec rake test
  ```

+ ## Community
+
  I will appreciate if you could help to improve this project. Here are a few ways you can help:

+ - Let's talk in the [discussions](https://github.com/heronshoes/red_amber/discussions). [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
+ - Browse Q and A, how to use, tips, etc.
+ - Ask questions you’re wondering about.
+ - Share ideas. The idea may be promoted to issues or pull requests.
  - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
  - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
  - Write, clarify, or fix documentation