red_amber 0.2.0 → 0.2.2

Files changed (43)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +5 -0
  3. data/CHANGELOG.md +125 -0
  4. data/README.md +86 -269
  5. data/doc/DataFrame.md +427 -281
  6. data/doc/Vector.md +35 -54
  7. data/doc/image/basic_verbs.png +0 -0
  8. data/doc/image/dataframe/assign.png +0 -0
  9. data/doc/image/dataframe/assign_operation.png +0 -0
  10. data/doc/image/dataframe/drop.png +0 -0
  11. data/doc/image/dataframe/pick.png +0 -0
  12. data/doc/image/dataframe/pick_operation.png +0 -0
  13. data/doc/image/dataframe/remove.png +0 -0
  14. data/doc/image/dataframe/rename.png +0 -0
  15. data/doc/image/dataframe/rename_operation.png +0 -0
  16. data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
  17. data/doc/image/dataframe/slice.png +0 -0
  18. data/doc/image/dataframe/slice_operation.png +0 -0
  19. data/doc/image/dataframe_model.png +0 -0
  20. data/doc/image/group_operation.png +0 -0
  21. data/doc/image/replace-if_then.png +0 -0
  22. data/doc/image/reshaping_dataframe.png +0 -0
  23. data/doc/image/screenshot.png +0 -0
  24. data/doc/image/vector/binary_element_wise.png +0 -0
  25. data/doc/image/vector/unary_aggregation.png +0 -0
  26. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  27. data/doc/image/vector/unary_element_wise.png +0 -0
  28. data/lib/red_amber/data_frame.rb +33 -41
  29. data/lib/red_amber/data_frame_displayable.rb +59 -6
  30. data/lib/red_amber/data_frame_loadsave.rb +36 -0
  31. data/lib/red_amber/data_frame_reshaping.rb +12 -10
  32. data/lib/red_amber/data_frame_selectable.rb +53 -9
  33. data/lib/red_amber/data_frame_variable_operation.rb +57 -20
  34. data/lib/red_amber/group.rb +5 -3
  35. data/lib/red_amber/helper.rb +20 -18
  36. data/lib/red_amber/vector.rb +50 -31
  37. data/lib/red_amber/vector_functions.rb +21 -24
  38. data/lib/red_amber/vector_selectable.rb +18 -9
  39. data/lib/red_amber/vector_updatable.rb +6 -3
  40. data/lib/red_amber/version.rb +1 -1
  41. data/lib/red_amber.rb +1 -0
  42. metadata +13 -3
  43. data/doc/examples_of_red_amber.ipynb +0 -6783
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 73459d02c921fcb0fcb742760e8c882b5491fa5316a79b9016233a516ada013e
- data.tar.gz: ac25e808c5e5d4c13bb1877659550bba532cb5778371e39dfa1f3b9e5a91a4f8
+ metadata.gz: a16699a945f41bf98790f698998126cc6b4a5e916eccb805e78448ec029f9310
+ data.tar.gz: 5e7fa732f64567fd85e5a74b046e80861824f13d15dc910278b6c62359db9a22
  SHA512:
- metadata.gz: 1bfa4200d440c338f496fe282816634d6a833e30e17edc87a2cf5ec63866e2bbbaf8796916f1b052ea66482c54a038bbf1445258c2526691e42c2b47be2c39c5
- data.tar.gz: e324e480e6086f7017de58201783c857825b79d0b2e2c8fa2636089cd1c5531e22905a3c0d860f26b833eb6add6ed6017497632bd1ea8fcb932c2d2233b11812
+ metadata.gz: 6ae7a6e3a8015b6b9736fb934526d9dc96b43830f0890ccbc16e175e539a8df1053432a63dde84a31dbd3a170aa6256b681127c510117723427bce815568c981
+ data.tar.gz: a0e7d86a7bdc6be7ec493ef5331ced5ecf4e6b89458f4252f208435905a7e4e80a088a718098073fb0c65c86d76297c70c978cd4dec28b1eb1a0d915bb7e3608
data/.rubocop.yml CHANGED
@@ -63,6 +63,7 @@ Metrics/AbcSize:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
  - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 30.15
  - 'lib/red_amber/vector_updatable.rb' # Max: 36
  - 'lib/red_amber/vector_selectable.rb' # Max: 33
 
@@ -86,6 +87,7 @@ Metrics/CyclomaticComplexity:
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/helper.rb' # Max: 15
  - 'lib/red_amber/vector_selectable.rb' # Max: 13
  - 'lib/red_amber/vector_updatable.rb' # Max: 14
 
@@ -94,6 +96,8 @@ Metrics/MethodLength:
  Max: 30
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
+ - 'lib/red_amber/data_frame_selectable.rb' # Max: 38
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 35
 
  # Max: 100
  Metrics/ModuleLength:
@@ -109,6 +113,7 @@ Metrics/PerceivedComplexity:
  Max: 13
  Exclude:
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/helper.rb' # Max: 15
  - 'lib/red_amber/vector_updatable.rb' # Max: 15
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 19
 
data/CHANGELOG.md CHANGED
@@ -1,3 +1,121 @@
+ ## [0.2.2] - 2022-10-04
+
+ - Bug fixes
+
+ - Return self when no replacement happen in Vector#replace. (#92)
+
+ - Limit n-digits in to_iruby. (#111)
+
+ - Fix displaying space in to_iruby. (#111)
+
+ - Raise error if key is duplicated. (#113)
+
+ - Fix DataFrame#pick/#drop with endless Range. (#113)
+
+ - Change type from dictionary to string in DataFrame reshaping methods. (#113)
+
+ - Fix arguments parser to accept Enumerator. (#114)
+
+ - New features and improvements
+
+ - Support to make a data frame from a to_arrow-responsible object. (#106) [Patch by Kenta Murata]
+
+ - Introduce DataFrame#auto_cast (experimental feature) (#105)
+
+ - Change default name in DataFrame#transpose, #to_long, #to_wide. (#110)
+
+ - Add Vector#dictionary? method. (#113)
+
+ - Add display mode 'Plain' and 'Minimum'. (#113)
+
+ - Refactor code
+
+ - Refine test_vector_selectable. (#92)
+ - Refine test_vector_updatable. (#92)
+ - Refine Vector.new. (#113)
+ - Refine DataFrame#pick, #drop. (#113)
+
+ - Documents
+
+ - Update images. (#90, #105, #113)
+
+ - Update README to use simpler examples. (#112)
+ - Update README with a new screenshot example. (#113)
+
+ - GitHub site
+
+ - Update Jupyter notebooks in Binder (#88, #115)
+ - Move binder support to heronshoes/docker-stacks repository.
+ - Update README notebook on binder.
+ - Add examples_of_RedAmber notebook on binder.
+
+ - Start to use discussions.
+
+ - Thanks
+
+ - Kenta Murata
+
+ ## [0.2.1] - 2022-09-07
+
+ - Bug fixes
+
+ - Fix `Vector#each` with block (#66)
+ `Vector#each` will return value of each element with block.
+
+ - Fix table format at size == 9 (#67)
+
+ - Fix to support Vector in `DataFrame#assign` (#77)
+
+ - Add `assert_delta` functionality for `assert_with_NaN` (#78)
+
+ - Fix Vector#is_in when self is chunked (#79)
+
+ - Fix Array type error (uint/int) (#79)
+
+ - New features and improvements
+
+ - Refine `DataFrame#indices` method (#67)
+
+ - Update DataFrame reshaping methods (#73)
+
+ - Change default option value of DataFrame reshaping
+
+ - Change the order of import_cars example
+
+ - Add `DataFrame#method_missing` to get column vector by method (#75)
+
+ - Add `DataFrame#method_missing` to get column (#75)
+
+ - Accept both args and block in `DataFrame#assign` (#75)
+
+ - Accept indices in `DataFrame#pick` and `DataFrame#drop` (#76)
+
+ - Add `DataFrame#slice_by` method (#77)
+
+ - Add new Vector functions (#78)
+
+ - Add inverse trigonometric function for Vector
+ - `acos`
+ - `asin`
+
+ - Add logarithmic function for Vector
+ - `ln`
+ - `log10`
+ - `log1p`
+ - `log2`
+
+ - Add binary function `Vector#logb`
+
+ - Docker image and Jupyter Notebook [Thanks to Kenta Murata]
+ - Add link to RubyData in README
+ - Add link to interactive README by Binder
+
+ - Update Jupyter Notebook `71 examples of RedAmber`
+
+ - Thanks
+
+ - Kenta Murata
+
  ## [0.2.0] - 2022-08-15
 
  - Bump version up to 0.2.0
@@ -236,6 +354,13 @@
  - Documentation
  - Fix typo in DataFrame.md
 
+ - Github site
+ - Add gem and status badges in README. (#42) [Patch by kojix2]
+
+ - Thanks
+
+ - kojix2
+
  ## [0.1.5] - 2022-06-12 (experimental)
 
  - Bug fixes
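
Reviewer note on the 0.2.2 entries "Fix DataFrame#pick/#drop with endless Range" and "Fix arguments parser to accept Enumerator": the sketch below shows, in plain Ruby only, one way mixed selector arguments could be expanded into column indices. `resolve_indices` is a hypothetical helper written for illustration; it is not RedAmber's parser, and the key names are just sample symbols.

```ruby
# Hypothetical helper (not part of RedAmber): expand Integers, Ranges
# (including endless and negative ones) and Enumerators into a flat
# list of non-negative indices for a collection of `size` elements.
def resolve_indices(args, size)
  args.flat_map do |arg|
    case arg
    when Range
      # Array#[] already understands endless, beginless and negative ranges;
      # it returns nil when the range starts beyond the end.
      (0...size).to_a[arg] || raise(IndexError, "range out of bounds: #{arg}")
    when Enumerator
      # Materialize the enumerator, then resolve its elements recursively.
      resolve_indices(arg.to_a, size)
    when Integer
      idx = arg.negative? ? arg + size : arg
      raise IndexError, "index out of bounds: #{arg}" unless (0...size).cover?(idx)

      [idx]
    else
      raise ArgumentError, "unsupported selector: #{arg.inspect}"
    end
  end
end

keys = %i[carat cut color clarity depth]

resolve_indices([(1..)], keys.size)        # => [1, 2, 3, 4]
resolve_indices([0, (-2..)], keys.size)    # => [0, 3, 4]
resolve_indices([[0, 2].each], keys.size)  # => [0, 2]
```

Delegating range handling to `Array#[]` keeps the endless-range case from needing special treatment, which is the kind of edge the changelog entry refers to.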
data/README.md CHANGED
@@ -2,12 +2,15 @@
 
  [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
  [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
+ [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
 
  A simple dataframe library for Ruby.
 
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 
+ ![screenshot from jupyterlab](doc/image/screenshot.png)
+
  ## Requirements
 
  Supported Ruby version is >= 2.7.
@@ -53,328 +56,136 @@ Or install it yourself as:
53
56
  gem install red_amber
54
57
  ```
55
58
 
56
- ## `RedAmber::DataFrame`
59
+ ## Docker image and Jupyter Notebook
57
60
 
58
- Represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
61
+ [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
59
62
 
60
- ```ruby
61
- require 'red_amber' # require 'red-amber' is also OK.
62
- require 'datasets-arrow'
63
+ Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=README.ipynb).
64
+ [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)
63
65
 
64
- arrow = Datasets::Penguins.new.to_arrow
65
- penguins = RedAmber::DataFrame.new(arrow)
66
66
 
67
- # =>
68
- #<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
69
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
70
- <string> <string> <double> <double> <uint8> ... <uint16>
71
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
72
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
73
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
74
- 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
75
- 5 Adelie Torgersen 36.7 19.3 193 ... 2007
76
- : : : : : : ... :
77
- 342 Gentoo Biscoe 50.4 15.7 222 ... 2009
78
- 343 Gentoo Biscoe 45.2 14.8 212 ... 2009
79
- 344 Gentoo Biscoe 49.9 16.1 213 ... 2009
80
- ```
67
+ ## Data frame in `RedAmber`
81
68
 
82
- ### DataFrame model
83
- ![dataframe model of RedAmber](doc/image/dataframe_model.png)
84
-
85
- For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
69
+ Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
70
+ The entity is a Red Arrow's Table object.
86
71
 
87
- ![pick method image](doc/image/dataframe/pick.png)
88
-
89
- ```ruby
90
- penguins.keys
91
- # =>
92
- [:species,
93
- :island,
94
- :bill_length_mm,
95
- :bill_depth_mm,
96
- :flipper_length_mm,
97
- :body_mass_g,
98
- :sex,
99
- :year]
100
-
101
- df = penguins.pick(:species, :island, :body_mass_g)
102
- df
103
-
104
- # =>
105
- #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
106
- species island body_mass_g
107
- <string> <string> <uint16>
108
- 1 Adelie Torgersen 3750
109
- 2 Adelie Torgersen 3800
110
- 3 Adelie Torgersen 3250
111
- 4 Adelie Torgersen (nil)
112
- 5 Adelie Torgersen 3450
113
- : : : :
114
- 342 Gentoo Biscoe 5750
115
- 343 Gentoo Biscoe 5200
116
- 344 Gentoo Biscoe 5400
117
- ```
118
-
119
- `DataFrame#drop` drops some columns to create a remainer DataFrame.
120
-
121
- ![drop method image](doc/image/dataframe/drop.png)
72
+ ![dataframe model of RedAmber](doc/image/dataframe_model.png)
122
73
 
123
- You can specify by keys or a boolean array (same size as n_keys).
74
+ Load the library.
124
75
 
125
76
  ```ruby
126
- # Same as df.drop(:species, :island)
127
- df = df.drop(true, true, false)
128
-
129
- # =>
130
- #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
131
- body_mass_g
132
- <uint16>
133
- 1 3750
134
- 2 3800
135
- 3 3250
136
- 4 (nil)
137
- 5 3450
138
- : :
139
- 342 5750
140
- 343 5200
141
- 344 5400
77
+ require 'red_amber' # require 'red-amber' is also OK.
78
+ include RedAmber
142
79
  ```
143
80
 
144
- Arrow data is immutable, so these methods always return an new object.
145
-
146
- `DataFrame#assign` creates new columns or update existing columns.
147
-
148
- ![assign method image](doc/image/dataframe/assign.png)
81
+ ### Example: diamonds dataset
149
82
 
150
83
  ```ruby
151
- # New column is created because ':body_mass_kg' is a new key.
152
- df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
84
+ require 'datasets-arrow' # to load sample data
153
85
 
154
- # =>
155
- #<RedAmber::DataFrame : 344 x 2 Vectors, 0x00000000000212f0>
156
- body_mass_g body_mass_kg
157
- <uint16> <double>
158
- 1 3750 3.8
159
- 2 3800 3.8
160
- 3 3250 3.3
161
- 4 (nil) (nil)
162
- 5 3450 3.5
163
- : : :
164
- 342 5750 5.8
165
- 343 5200 5.2
166
- 344 5400 5.4
167
- ```
168
-
169
- `DataFrame#slice` selects rows (observations) to create a sub DataFrame.
170
-
171
- ![slice method image](doc/image/dataframe/slice.png)
172
-
173
- ```ruby
174
- # returns 5 rows at the start and 5 rows from the end
175
- penguins.slice(0...5, -5..-1)
86
+ dataset = Datasets::Diamonds.new
87
+ diamonds = DataFrame.new(dataset) # from v0.2.2, should be `dataset.to_arrow` if older.
176
88
 
177
89
  # =>
178
- #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
179
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
180
- <string> <string> <double> <double> <uint8> ... <uint16>
181
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
182
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
183
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
184
- 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
185
- 5 Adelie Torgersen 36.7 19.3 193 ... 2007
186
- : : : : : : ... :
187
- 8 Gentoo Biscoe 50.4 15.7 222 ... 2009
188
- 9 Gentoo Biscoe 45.2 14.8 212 ... 2009
189
- 10 Gentoo Biscoe 49.9 16.1 213 ... 2009
90
+ #<RedAmber::DataFrame : 53940 x 10 Vectors, 0x000000000000f668>
91
+ carat cut color clarity depth table price x ... z
92
+ <double> <string> <string> <string> <double> <double> <uint16> <double> ... <double>
93
+ 0 0.23 Ideal E SI2 61.5 55.0 326 3.95 ... 2.43
94
+ 1 0.21 Premium E SI1 59.8 61.0 326 3.89 ... 2.31
95
+ 2 0.23 Good E VS1 56.9 65.0 327 4.05 ... 2.31
96
+ 3 0.29 Premium I VS2 62.4 58.0 334 4.2 ... 2.63
97
+ 4 0.31 Good J SI2 63.3 58.0 335 4.34 ... 2.75
98
+ : : : : : : : : : ... :
99
+ 53937 0.7 Very Good D SI1 62.8 60.0 2757 5.66 ... 3.56
100
+ 53938 0.86 Premium H SI2 61.0 58.0 2757 6.15 ... 3.74
101
+ 53939 0.75 Ideal D SI2 62.2 55.0 2757 5.83 ... 3.64
190
102
  ```
191
103
 
192
- `DataFrame#remove` rejects rows (observations) to create a remainer DataFrame.
193
-
194
- ![remove method image](doc/image/dataframe/remove.png)
104
+ For example, we can compute mean prices per 'cut' for the data larger than 1 carat.
195
105
 
196
106
  ```ruby
197
- # penguins[:bill_length_mm] < 40 returns a boolean Vector
198
- penguins.remove(penguins[:bill_length_mm] < 40)
107
+ df = diamonds
108
+ .slice { carat > 1 }
109
+ .group(:cut)
110
+ .mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
111
+ .sort('-mean(price)')
199
112
 
200
113
  # =>
201
- #<RedAmber::DataFrame : 244 x 8 Vectors, 0x000000000007d6f4>
202
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
203
- <string> <string> <double> <double> <uint8> ... <uint16>
204
- 1 Adelie Torgersen 40.3 18.0 195 ... 2007
205
- 2 Adelie Torgersen (nil) (nil) (nil) ... 2007
206
- 3 Adelie Torgersen 42.0 20.2 190 ... 2007
207
- 4 Adelie Torgersen 41.1 17.6 182 ... 2007
208
- 5 Adelie Torgersen 42.5 20.7 197 ... 2007
209
- : : : : : : ... :
210
- 242 Gentoo Biscoe 50.4 15.7 222 ... 2009
211
- 243 Gentoo Biscoe 45.2 14.8 212 ... 2009
212
- 244 Gentoo Biscoe 49.9 16.1 213 ... 2009
114
+ #<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f67c>
115
+ cut mean(price)
116
+ <string> <double>
117
+ 0 Ideal 8674.23
118
+ 1 Premium 8487.25
119
+ 2 Very Good 8340.55
120
+ 3 Good 7753.6
121
+ 4 Fair 7177.86
213
122
  ```
214
123
 
215
- DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
216
-
217
- This example is usage of block to update a column.
124
+ Arrow data is immutable, so these methods always return new objects.
125
+ The next example renames a column and creates a new column by a simple calculation.
218
126
 
219
127
  ```ruby
220
- df = RedAmber::DataFrame.new(
221
- integer: [0, 1, 2, 3, nil],
222
- float: [0.0, 1.1, 2.2, Float::NAN, nil],
223
- string: ['A', 'B', 'C', 'D', nil],
224
- boolean: [true, false, true, false, nil])
225
- df
128
+ usdjpy = 110.0
226
129
 
227
- # =>
228
- #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
229
- integer float string boolean
230
- <uint8> <double> <string> <boolean>
231
- 1 0 0.0 A true
232
- 2 1 1.1 B false
233
- 3 2 2.2 C true
234
- 4 3 NaN D false
235
- 5 (nil) (nil) (nil) (nil)
236
-
237
- df.assign do
238
- vectors.select(&:float?).map { |v| [v.key, -v] }
239
- # => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
240
- end
130
+ df.rename('mean(price)': :mean_price_USD)
131
+ .assign(:mean_price_JPY) { mean_price_USD * usdjpy }
241
132
 
242
133
  # =>
243
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
244
- index float string
245
- <uint8> <double> <string>
246
- 1 0 -0.0 A
247
- 2 1 -1.1 B
248
- 3 2 -2.2 C
249
- 4 3 NaN D
250
- 5 (nil) (nil) (nil)
134
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f71c>
135
+ cut mean_price_USD mean_price_JPY
136
+ <string> <double> <double>
137
+ 0 Ideal 8674.23 954164.93
138
+ 1 Premium 8487.25 933597.34
139
+ 2 Very Good 8340.55 917460.37
140
+ 3 Good 7753.6 852896.11
141
+ 4 Fair 7177.86 789564.12
251
142
  ```
252
143
 
253
- Next example is to eliminate rows containing nil.
144
+ ### Example: starwars dataset
254
145
 
255
- ```ruby
256
- # remove all observations containing nil
257
- nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
258
- nil_removed.tdr
259
-
260
- # =>
261
- RedAmber::DataFrame : 342 x 8 Vectors
262
- Vectors : 5 numeric, 3 strings
263
- # key type level data_preview
264
- 1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
265
- 2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
266
- 3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
267
- 4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
268
- 5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ]
269
- 6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
270
- 7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
271
- 8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
272
- ```
273
-
274
- For this frequently needed task, we can do it much simpler.
146
+ Next example is `starwars` dataset reading from the downloaded CSV file. Followed by minimum data cleansing.
275
147
 
276
148
  ```ruby
277
- penguins.remove_nil # => same result as above
278
- ```
149
+ uri = URI('https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv')
279
150
 
280
- `DataFrame#summary` shows summary statistics in a DataFrame.
151
+ starwars = DataFrame.load(uri)
281
152
 
282
- ```ruby
283
- puts penguins.summary.to_s(width: 82)
284
-
285
- # =>
286
- variables count mean std min 25% median 75% max
287
- <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
288
- 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
289
- 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
290
- 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
291
- 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
292
- 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
293
- ```
294
-
295
- `DataFrame#group` method can be used for the grouping tasks.
296
-
297
- ```ruby
298
- starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
299
153
  starwars
154
+ .drop(0) # delete unnecessary index column
155
+ .remove { species == "NA" } # delete unnecessary rows
156
+ .group(:species) { [count(:species), mean(:height, :mass)] }
157
+ .slice { count > 1 }
300
158
 
301
159
  # =>
302
- #<RedAmber::DataFrame : 87 x 12 Vectors, 0x000000000000607c>
303
- unnamed1 name height mass hair_color skin_color eye_color ... species
304
- <int64> <string> <int64> <double> <string> <string> <string> ... <string>
305
- 1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
306
- 2 2 C-3PO 167 75.0 NA gold yellow ... Droid
307
- 3 3 R2-D2 96 32.0 NA white, blue red ... Droid
308
- 4 4 Darth Vader 202 136.0 none white yellow ... Human
309
- 5 5 Leia Organa 150 49.0 brown light brown ... Human
310
- : : : : : : : : ... :
311
- 85 85 BB8 (nil) (nil) none none black ... Droid
312
- 86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
313
- 87 87 Padmé Amidala 165 45.0 brown light brown ... Human
314
-
315
- grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
316
- grouped.slice { v(:count) > 1 }
317
-
318
- # =>
319
- #<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
160
+ #<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
320
161
  species count mean(height) mean(mass)
321
162
  <string> <int64> <double> <double>
322
- 1 Human 35 176.6 82.8
323
- 2 Droid 6 131.2 69.8
324
- 3 Wookiee 2 231.0 124.0
325
- 4 Gungan 3 208.7 74.0
326
- 5 NA 4 181.3 48.0
327
- : : : : :
328
- 7 Twi'lek 2 179.0 55.0
329
- 8 Mirialan 2 168.0 53.1
330
- 9 Kaminoan 2 221.0 88.0
163
+ 0 Human 35 176.65 82.78
164
+ 1 Droid 6 131.2 69.75
165
+ 2 Wookiee 2 231.0 124.0
166
+ 3 Gungan 3 208.67 74.0
167
+ 4 Zabrak 2 173.0 80.0
168
+ 5 Twi'lek 2 179.0 55.0
169
+ 6 Mirialan 2 168.0 53.1
170
+ 7 Kaminoan 2 221.0 88.0
331
171
  ```
332
172
 
333
173
  See [DataFrame.md](doc/DataFrame.md) for other examples and details.
334
174
 
335
175
 
336
- ## `RedAmber::Vector`
176
+ ### `Vector` for 1D data object in column
337
177
 
338
178
  Class `RedAmber::Vector` represents a series of data in the DataFrame.
339
- Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
340
-
341
- ```ruby
342
- penguins[:bill_length_mm]
343
- # =>
344
- #<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
345
- [39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
346
- ```
347
-
348
- Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
349
-
350
- This is an element-wise comparison and returns a boolean Vector of same size.
351
-
352
- ![unary element-wise](doc/image/vector/unary_element_wise.png)
353
-
354
- ```ruby
355
- penguins[:bill_length_mm] < 40
356
-
357
- # =>
358
- #<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
359
- [true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
360
- ```
361
-
362
- Next example returns aggregated result.
363
-
364
- ![unary aggregation](doc/image/vector/unary_aggregation.png)
365
-
366
- ```ruby
367
- penguins[:bill_length_mm].mean
368
- 43.92192982456141
369
- # =>
370
-
371
- ```
372
179
 
373
180
  See [Vector.md](doc/Vector.md) for details.
374
181
 
375
182
  ## Jupyter notebook
376
183
 
377
- [61 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
184
+ [73 Examples of Red Amber](binder/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
185
+
186
+ You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
187
+ [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb)
188
+
378
189
 
379
190
  ## Development
380
191
 
@@ -385,8 +196,14 @@ bundle install
  bundle exec rake test
  ```
 
+ ## Community
+
  I will appreciate if you could help to improve this project. Here are a few ways you can help:
 
+ - Let's talk in the [discussions](https://github.com/heronshoes/red_amber/discussions). [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
+ - Browse Q and A, how to use, tips, etc.
+ - Ask questions you’re wondering about.
+ - Share ideas. The idea may be promoted to issues or pull requests.
  - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
  - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
  - Write, clarify, or fix documentation
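
Reviewer note on the new README example (`slice { carat > 1 }`, `group(:cut)`, `mean(:price)`, `sort('-mean(price)')`): for readers without Red Arrow installed, the same filter, group, aggregate, sort pipeline can be mirrored with the Ruby standard library. This is only an analogue under stated assumptions. The rows are made-up illustration data rather than the diamonds dataset, and nothing here reflects how RedAmber implements these methods.

```ruby
# Made-up sample rows (NOT the diamonds dataset).
rows = [
  { carat: 1.2, cut: 'Ideal',   price: 9000 },
  { carat: 0.5, cut: 'Ideal',   price: 1500 },
  { carat: 1.5, cut: 'Premium', price: 8000 },
  { carat: 1.1, cut: 'Premium', price: 7000 },
  { carat: 2.0, cut: 'Fair',    price: 6000 }
]

mean_price_by_cut =
  rows
    .select { |row| row[:carat] > 1 }          # like slice { carat > 1 }
    .group_by { |row| row[:cut] }              # like group(:cut)
    .map do |cut, group|                       # like mean(:price)
      [cut, group.sum { |row| row[:price] }.fdiv(group.size)]
    end
    .sort_by { |_cut, mean| -mean }            # like sort('-mean(price)')

mean_price_by_cut
# => [["Ideal", 9000.0], ["Premium", 7500.0], ["Fair", 6000.0]]
```

As in the README's remark that Arrow data is immutable, every step here returns a new collection and leaves `rows` untouched.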