red_amber 0.2.0 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +5 -0
  3. data/CHANGELOG.md +125 -0
  4. data/README.md +86 -269
  5. data/doc/DataFrame.md +427 -281
  6. data/doc/Vector.md +35 -54
  7. data/doc/image/basic_verbs.png +0 -0
  8. data/doc/image/dataframe/assign.png +0 -0
  9. data/doc/image/dataframe/assign_operation.png +0 -0
  10. data/doc/image/dataframe/drop.png +0 -0
  11. data/doc/image/dataframe/pick.png +0 -0
  12. data/doc/image/dataframe/pick_operation.png +0 -0
  13. data/doc/image/dataframe/remove.png +0 -0
  14. data/doc/image/dataframe/rename.png +0 -0
  15. data/doc/image/dataframe/rename_operation.png +0 -0
  16. data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
  17. data/doc/image/dataframe/slice.png +0 -0
  18. data/doc/image/dataframe/slice_operation.png +0 -0
  19. data/doc/image/dataframe_model.png +0 -0
  20. data/doc/image/group_operation.png +0 -0
  21. data/doc/image/replace-if_then.png +0 -0
  22. data/doc/image/reshaping_dataframe.png +0 -0
  23. data/doc/image/screenshot.png +0 -0
  24. data/doc/image/vector/binary_element_wise.png +0 -0
  25. data/doc/image/vector/unary_aggregation.png +0 -0
  26. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  27. data/doc/image/vector/unary_element_wise.png +0 -0
  28. data/lib/red_amber/data_frame.rb +33 -41
  29. data/lib/red_amber/data_frame_displayable.rb +59 -6
  30. data/lib/red_amber/data_frame_loadsave.rb +36 -0
  31. data/lib/red_amber/data_frame_reshaping.rb +12 -10
  32. data/lib/red_amber/data_frame_selectable.rb +53 -9
  33. data/lib/red_amber/data_frame_variable_operation.rb +57 -20
  34. data/lib/red_amber/group.rb +5 -3
  35. data/lib/red_amber/helper.rb +20 -18
  36. data/lib/red_amber/vector.rb +50 -31
  37. data/lib/red_amber/vector_functions.rb +21 -24
  38. data/lib/red_amber/vector_selectable.rb +18 -9
  39. data/lib/red_amber/vector_updatable.rb +6 -3
  40. data/lib/red_amber/version.rb +1 -1
  41. data/lib/red_amber.rb +1 -0
  42. metadata +13 -3
  43. data/doc/examples_of_red_amber.ipynb +0 -6783
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 73459d02c921fcb0fcb742760e8c882b5491fa5316a79b9016233a516ada013e
- data.tar.gz: ac25e808c5e5d4c13bb1877659550bba532cb5778371e39dfa1f3b9e5a91a4f8
+ metadata.gz: a16699a945f41bf98790f698998126cc6b4a5e916eccb805e78448ec029f9310
+ data.tar.gz: 5e7fa732f64567fd85e5a74b046e80861824f13d15dc910278b6c62359db9a22
  SHA512:
- metadata.gz: 1bfa4200d440c338f496fe282816634d6a833e30e17edc87a2cf5ec63866e2bbbaf8796916f1b052ea66482c54a038bbf1445258c2526691e42c2b47be2c39c5
- data.tar.gz: e324e480e6086f7017de58201783c857825b79d0b2e2c8fa2636089cd1c5531e22905a3c0d860f26b833eb6add6ed6017497632bd1ea8fcb932c2d2233b11812
+ metadata.gz: 6ae7a6e3a8015b6b9736fb934526d9dc96b43830f0890ccbc16e175e539a8df1053432a63dde84a31dbd3a170aa6256b681127c510117723427bce815568c981
+ data.tar.gz: a0e7d86a7bdc6be7ec493ef5331ced5ecf4e6b89458f4252f208435905a7e4e80a088a718098073fb0c65c86d76297c70c978cd4dec28b1eb1a0d915bb7e3608
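Every digest in this hunk changes because checksums.yaml records the SHA256 and SHA512 digests of the two archives packed inside the .gem file, metadata.gz and data.tar.gz, and both archives differ between 0.2.0 and 0.2.2. A minimal sketch of recomputing those digests with Ruby's standard library follows; `gem_checksums` is a hypothetical helper name, and the .gem path in the usage comment assumes a locally downloaded copy.

```ruby
require 'digest'
require 'rubygems/package'

# Hypothetical helper: recompute the digests that checksums.yaml records.
# A .gem file is a plain tar archive whose members include metadata.gz and
# data.tar.gz; only those two members are hashed here.
def gem_checksums(io)
  sums = {}
  Gem::Package::TarReader.new(io) do |tar|
    tar.each do |entry|
      next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)

      bytes = entry.read || '' # Entry#read returns nil for an empty member
      sums[entry.full_name] = {
        'SHA256' => Digest::SHA256.hexdigest(bytes),
        'SHA512' => Digest::SHA512.hexdigest(bytes)
      }
    end
  end
  sums
end

# Usage (assumed local path):
# File.open('red_amber-0.2.2.gem', 'rb') { |f| pp gem_checksums(f) }
```

Comparing the printed values with the `+` lines above is a quick way to confirm that a downloaded gem matches what the registry published.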
data/.rubocop.yml CHANGED
@@ -63,6 +63,7 @@ Metrics/AbcSize:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
  - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 30.15
  - 'lib/red_amber/vector_updatable.rb' # Max: 36
  - 'lib/red_amber/vector_selectable.rb' # Max: 33
 
@@ -86,6 +87,7 @@ Metrics/CyclomaticComplexity:
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/helper.rb' # Max: 15
  - 'lib/red_amber/vector_selectable.rb' # Max: 13
  - 'lib/red_amber/vector_updatable.rb' # Max: 14
 
@@ -94,6 +96,8 @@ Metrics/MethodLength:
  Max: 30
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
+ - 'lib/red_amber/data_frame_selectable.rb' # Max: 38
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 35
 
  # Max: 100
  Metrics/ModuleLength:
@@ -109,6 +113,7 @@ Metrics/PerceivedComplexity:
  Max: 13
  Exclude:
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/helper.rb' # Max: 15
  - 'lib/red_amber/vector_updatable.rb' # Max: 15
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 19
 
data/CHANGELOG.md CHANGED
@@ -1,3 +1,121 @@
+ ## [0.2.2] - 2022-10-04
+
+ - Bug fixes
+
+ - Return self when no replacement happen in Vector#replace. (#92)
+
+ - Limit n-digits in to_iruby. (#111)
+
+ - Fix displaying space in to_iruby. (#111)
+
+ - Raise error if key is duplicated. (#113)
+
+ - Fix DataFrame#pick/#drop with endless Range. (#113)
+
+ - Change type from dictionary to string in DataFrame reshaping methods. (#113)
+
+ - Fix arguments parser to accept Enumerator. (#114)
+
+ - New features and improvements
+
+ - Support to make a data frame from a to_arrow-responsible object. (#106) [Patch by Kenta Murata]
+
+ - Introduce DataFrame#auto_cast (experimental feature) (#105)
+
+ - Change default name in DataFrame#transpose, #to_long, #to_wide. (#110)
+
+ - Add Vector#dictionary? method. (#113)
+
+ - Add display mode 'Plain' and 'Minimum'. (#113)
+
+ - Refactor code
+
+ - Refine test_vector_selectable. (#92)
+ - Refine test_vector_updatable. (#92)
+ - Refine Vector.new. (#113)
+ - Refine DataFrame#pick, #drop. (#113)
+
+ - Documents
+
+ - Update images. (#90, #105, #113)
+
+ - Update README to use simpler examples. (#112)
+ - Update README with a new screenshot example. (#113)
+
+ - GitHub site
+
+ - Update Jupyter notebooks in Binder (#88, #115)
+ - Move binder support to heronshoes/docker-stacks repository.
+ - Update README notebook on binder.
+ - Add examples_of_RedAmber notebook on binder.
+
+ - Start to use discussions.
+
+ - Thanks
+
+ - Kenta Murata
+
+ ## [0.2.1] - 2022-09-07
+
+ - Bug fixes
+
+ - Fix `Vector#each` with block (#66)
+ `Vector#each` will return value of each element with block.
+
+ - Fix table format at size == 9 (#67)
+
+ - Fix to support Vector in `DataFrame#assign` (#77)
+
+ - Add `assert_delta` functionality for `assert_with_NaN` (#78)
+
+ - Fix Vector#is_in when self is chunked (#79)
+
+ - Fix Array type error (uint/int) (#79)
+
+ - New features and improvements
+
+ - Refine `DataFrame#indices` method (#67)
+
+ - Update DataFrame reshaping methods (#73)
+
+ - Change default option value of DataFrame reshaping
+
+ - Change the order of import_cars example
+
+ - Add `DataFrame#method_missing` to get column vector by method (#75)
+
+ - Add `DataFrame#method_missing` to get column (#75)
+
+ - Accept both args and block in `DataFrame#assign` (#75)
+
+ - Accept indices in `DataFrame#pick` and `DataFrame#drop` (#76)
+
+ - Add `DataFrame#slice_by` method (#77)
+
+ - Add new Vector functions (#78)
+
+ - Add inverse trigonometric function for Vector
+ - `acos`
+ - `asin`
+
+ - Add logarithmic function for Vector
+ - `ln`
+ - `log10`
+ - `log1p`
+ - `log2`
+
+ - Add binary function `Vector#logb`
+
+ - Docker image and Jupyter Notebook [Thanks to Kenta Murata]
+ - Add link to RubyData in README
+ - Add link to interactive README by Binder
+
+ - Update Jupyter Notebook `71 examples of RedAmber`
+
+ - Thanks
+
+ - Kenta Murata
+
  ## [0.2.0] - 2022-08-15
 
  - Bump version up to 0.2.0
@@ -236,6 +354,13 @@
  - Documentation
  - Fix typo in DataFrame.md
 
+ - Github site
+ - Add gem and status badges in README. (#42) [Patch by kojix2]
+
+ - Thanks
+
+ - kojix2
+
  ## [0.1.5] - 2022-06-12 (experimental)
 
  - Bug fixes
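The 0.2.2 entry about building a data frame from a to_arrow-responsible object (#106) matches the README change, where `DataFrame.new(dataset)` is annotated as working from v0.2.2 and `dataset.to_arrow` is needed for older versions. The underlying idea is ordinary Ruby duck typing. A minimal sketch, assuming nothing about RedAmber's internals; `FakeTable`, `FakeDataset` and `coerce_table` are hypothetical names:

```ruby
# Hypothetical sketch (not RedAmber's implementation): accept either a
# table object or any object that can convert itself via #to_arrow.

# Stand-in for an Arrow table; it only holds named columns.
class FakeTable
  attr_reader :columns

  def initialize(columns)
    @columns = columns
  end
end

# Stand-in for a dataset object that knows how to build a table of itself.
class FakeDataset
  def to_arrow
    FakeTable.new({ carat: [0.23, 0.21], price: [326, 326] })
  end
end

def coerce_table(source)
  return source if source.is_a?(FakeTable)                # already a table
  return source.to_arrow if source.respond_to?(:to_arrow) # duck typing

  raise ArgumentError, "cannot build a table from #{source.class}"
end

table = coerce_table(FakeDataset.new)
p table.columns.keys # => [:carat, :price]
```

Checking `respond_to?(:to_arrow)` rather than a concrete class is what lets unrelated libraries plug in without depending on each other.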
data/README.md CHANGED
@@ -2,12 +2,15 @@
 
  [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
  [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
+ [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
 
  A simple dataframe library for Ruby.
 
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 
+ ![screenshot from jupyterlab](doc/image/screenshot.png)
+
  ## Requirements
 
  Supported Ruby version is >= 2.7.
@@ -53,328 +56,136 @@ Or install it yourself as:
  gem install red_amber
  ```
 
- ## `RedAmber::DataFrame`
+ ## Docker image and Jupyter Notebook
 
- Represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
+ [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
 
- ```ruby
- require 'red_amber' # require 'red-amber' is also OK.
- require 'datasets-arrow'
+ Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=README.ipynb).
+ [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)
 
- arrow = Datasets::Penguins.new.to_arrow
- penguins = RedAmber::DataFrame.new(arrow)
 
- # =>
- #<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
- <string> <string> <double> <double> <uint8> ... <uint16>
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
- 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
- 5 Adelie Torgersen 36.7 19.3 193 ... 2007
- : : : : : : ... :
- 342 Gentoo Biscoe 50.4 15.7 222 ... 2009
- 343 Gentoo Biscoe 45.2 14.8 212 ... 2009
- 344 Gentoo Biscoe 49.9 16.1 213 ... 2009
- ```
+ ## Data frame in `RedAmber`
 
- ### DataFrame model
- ![dataframe model of RedAmber](doc/image/dataframe_model.png)
-
- For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
+ Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
+ The entity is a Red Arrow's Table object.
 
- ![pick method image](doc/image/dataframe/pick.png)
-
- ```ruby
- penguins.keys
- # =>
- [:species,
- :island,
- :bill_length_mm,
- :bill_depth_mm,
- :flipper_length_mm,
- :body_mass_g,
- :sex,
- :year]
-
- df = penguins.pick(:species, :island, :body_mass_g)
- df
-
- # =>
- #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
- species island body_mass_g
- <string> <string> <uint16>
- 1 Adelie Torgersen 3750
- 2 Adelie Torgersen 3800
- 3 Adelie Torgersen 3250
- 4 Adelie Torgersen (nil)
- 5 Adelie Torgersen 3450
- : : : :
- 342 Gentoo Biscoe 5750
- 343 Gentoo Biscoe 5200
- 344 Gentoo Biscoe 5400
- ```
-
- `DataFrame#drop` drops some columns to create a remainer DataFrame.
-
- ![drop method image](doc/image/dataframe/drop.png)
+ ![dataframe model of RedAmber](doc/image/dataframe_model.png)
 
- You can specify by keys or a boolean array (same size as n_keys).
+ Load the library.
 
  ```ruby
- # Same as df.drop(:species, :island)
- df = df.drop(true, true, false)
-
- # =>
- #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
- body_mass_g
- <uint16>
- 1 3750
- 2 3800
- 3 3250
- 4 (nil)
- 5 3450
- : :
- 342 5750
- 343 5200
- 344 5400
+ require 'red_amber' # require 'red-amber' is also OK.
+ include RedAmber
  ```
 
- Arrow data is immutable, so these methods always return an new object.
-
- `DataFrame#assign` creates new columns or update existing columns.
-
- ![assign method image](doc/image/dataframe/assign.png)
+ ### Example: diamonds dataset
 
  ```ruby
- # New column is created because ':body_mass_kg' is a new key.
- df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
+ require 'datasets-arrow' # to load sample data
 
- # =>
- #<RedAmber::DataFrame : 344 x 2 Vectors, 0x00000000000212f0>
- body_mass_g body_mass_kg
- <uint16> <double>
- 1 3750 3.8
- 2 3800 3.8
- 3 3250 3.3
- 4 (nil) (nil)
- 5 3450 3.5
- : : :
- 342 5750 5.8
- 343 5200 5.2
- 344 5400 5.4
- ```
-
- `DataFrame#slice` selects rows (observations) to create a sub DataFrame.
-
- ![slice method image](doc/image/dataframe/slice.png)
-
- ```ruby
- # returns 5 rows at the start and 5 rows from the end
- penguins.slice(0...5, -5..-1)
+ dataset = Datasets::Diamonds.new
+ diamonds = DataFrame.new(dataset) # from v0.2.2, should be `dataset.to_arrow` if older.
 
  # =>
- #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
- <string> <string> <double> <double> <uint8> ... <uint16>
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
- 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
- 5 Adelie Torgersen 36.7 19.3 193 ... 2007
- : : : : : : ... :
- 8 Gentoo Biscoe 50.4 15.7 222 ... 2009
- 9 Gentoo Biscoe 45.2 14.8 212 ... 2009
- 10 Gentoo Biscoe 49.9 16.1 213 ... 2009
+ #<RedAmber::DataFrame : 53940 x 10 Vectors, 0x000000000000f668>
+ carat cut color clarity depth table price x ... z
+ <double> <string> <string> <string> <double> <double> <uint16> <double> ... <double>
+ 0 0.23 Ideal E SI2 61.5 55.0 326 3.95 ... 2.43
+ 1 0.21 Premium E SI1 59.8 61.0 326 3.89 ... 2.31
+ 2 0.23 Good E VS1 56.9 65.0 327 4.05 ... 2.31
+ 3 0.29 Premium I VS2 62.4 58.0 334 4.2 ... 2.63
+ 4 0.31 Good J SI2 63.3 58.0 335 4.34 ... 2.75
+ : : : : : : : : : ... :
+ 53937 0.7 Very Good D SI1 62.8 60.0 2757 5.66 ... 3.56
+ 53938 0.86 Premium H SI2 61.0 58.0 2757 6.15 ... 3.74
+ 53939 0.75 Ideal D SI2 62.2 55.0 2757 5.83 ... 3.64
  ```
 
- `DataFrame#remove` rejects rows (observations) to create a remainer DataFrame.
-
- ![remove method image](doc/image/dataframe/remove.png)
+ For example, we can compute mean prices per 'cut' for the data larger than 1 carat.
 
  ```ruby
- # penguins[:bill_length_mm] < 40 returns a boolean Vector
- penguins.remove(penguins[:bill_length_mm] < 40)
+ df = diamonds
+ .slice { carat > 1 }
+ .group(:cut)
+ .mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
+ .sort('-mean(price)')
 
  # =>
- #<RedAmber::DataFrame : 244 x 8 Vectors, 0x000000000007d6f4>
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
- <string> <string> <double> <double> <uint8> ... <uint16>
- 1 Adelie Torgersen 40.3 18.0 195 ... 2007
- 2 Adelie Torgersen (nil) (nil) (nil) ... 2007
- 3 Adelie Torgersen 42.0 20.2 190 ... 2007
- 4 Adelie Torgersen 41.1 17.6 182 ... 2007
- 5 Adelie Torgersen 42.5 20.7 197 ... 2007
- : : : : : : ... :
- 242 Gentoo Biscoe 50.4 15.7 222 ... 2009
- 243 Gentoo Biscoe 45.2 14.8 212 ... 2009
- 244 Gentoo Biscoe 49.9 16.1 213 ... 2009
+ #<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f67c>
+ cut mean(price)
+ <string> <double>
+ 0 Ideal 8674.23
+ 1 Premium 8487.25
+ 2 Very Good 8340.55
+ 3 Good 7753.6
+ 4 Fair 7177.86
  ```
 
- DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
-
- This example is usage of block to update a column.
+ Arrow data is immutable, so these methods always return new objects.
+ Next example will rename a column and create a new column by simple calcuration.
 
  ```ruby
- df = RedAmber::DataFrame.new(
- integer: [0, 1, 2, 3, nil],
- float: [0.0, 1.1, 2.2, Float::NAN, nil],
- string: ['A', 'B', 'C', 'D', nil],
- boolean: [true, false, true, false, nil])
- df
+ usdjpy = 110.0
 
- # =>
- #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
- integer float string boolean
- <uint8> <double> <string> <boolean>
- 1 0 0.0 A true
- 2 1 1.1 B false
- 3 2 2.2 C true
- 4 3 NaN D false
- 5 (nil) (nil) (nil) (nil)
-
- df.assign do
- vectors.select(&:float?).map { |v| [v.key, -v] }
- # => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
- end
+ df.rename('mean(price)': :mean_price_USD)
+ .assign(:mean_price_JPY) { mean_price_USD * usdjpy }
 
  # =>
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
- index float string
- <uint8> <double> <string>
- 1 0 -0.0 A
- 2 1 -1.1 B
- 3 2 -2.2 C
- 4 3 NaN D
- 5 (nil) (nil) (nil)
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f71c>
+ cut mean_price_USD mean_price_JPY
+ <string> <double> <double>
+ 0 Ideal 8674.23 954164.93
+ 1 Premium 8487.25 933597.34
+ 2 Very Good 8340.55 917460.37
+ 3 Good 7753.6 852896.11
+ 4 Fair 7177.86 789564.12
  ```
 
- Next example is to eliminate rows containing nil.
+ ### Example: starwars dataset
 
- ```ruby
- # remove all observations containing nil
- nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
- nil_removed.tdr
-
- # =>
- RedAmber::DataFrame : 342 x 8 Vectors
- Vectors : 5 numeric, 3 strings
- # key type level data_preview
- 1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
- 2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
- 3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
- 4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
- 5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ]
- 6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
- 7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
- 8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
- ```
-
- For this frequently needed task, we can do it much simpler.
+ Next example is `starwars` dataset reading from the downloaded CSV file. Followed by minimum data cleansing.
 
  ```ruby
- penguins.remove_nil # => same result as above
- ```
+ uri = URI('https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv')
 
- `DataFrame#summary` shows summary statistics in a DataFrame.
+ starwars = DataFrame.load(uri)
 
- ```ruby
- puts penguins.summary.to_s(width: 82)
-
- # =>
- variables count mean std min 25% median 75% max
- <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
- 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
- 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
- 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
- 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
- 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
- ```
-
- `DataFrame#group` method can be used for the grouping tasks.
-
- ```ruby
- starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
  starwars
+ .drop(0) # delete unnecessary index column
+ .remove { species == "NA" } # delete unnecessary rows
+ .group(:species) { [count(:species), mean(:height, :mass)] }
+ .slice { count > 1 }
 
  # =>
- #<RedAmber::DataFrame : 87 x 12 Vectors, 0x000000000000607c>
- unnamed1 name height mass hair_color skin_color eye_color ... species
- <int64> <string> <int64> <double> <string> <string> <string> ... <string>
- 1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
- 2 2 C-3PO 167 75.0 NA gold yellow ... Droid
- 3 3 R2-D2 96 32.0 NA white, blue red ... Droid
- 4 4 Darth Vader 202 136.0 none white yellow ... Human
- 5 5 Leia Organa 150 49.0 brown light brown ... Human
- : : : : : : : : ... :
- 85 85 BB8 (nil) (nil) none none black ... Droid
- 86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
- 87 87 Padmé Amidala 165 45.0 brown light brown ... Human
-
- grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
- grouped.slice { v(:count) > 1 }
-
- # =>
- #<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
+ #<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
  species count mean(height) mean(mass)
  <string> <int64> <double> <double>
- 1 Human 35 176.6 82.8
- 2 Droid 6 131.2 69.8
- 3 Wookiee 2 231.0 124.0
- 4 Gungan 3 208.7 74.0
- 5 NA 4 181.3 48.0
- : : : : :
- 7 Twi'lek 2 179.0 55.0
- 8 Mirialan 2 168.0 53.1
- 9 Kaminoan 2 221.0 88.0
+ 0 Human 35 176.65 82.78
+ 1 Droid 6 131.2 69.75
+ 2 Wookiee 2 231.0 124.0
+ 3 Gungan 3 208.67 74.0
+ 4 Zabrak 2 173.0 80.0
+ 5 Twi'lek 2 179.0 55.0
+ 6 Mirialan 2 168.0 53.1
+ 7 Kaminoan 2 221.0 88.0
  ```
 
  See [DataFrame.md](doc/DataFrame.md) for other examples and details.
 
 
- ## `RedAmber::Vector`
+ ### `Vector` for 1D data object in column
 
  Class `RedAmber::Vector` represents a series of data in the DataFrame.
- Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
-
- ```ruby
- penguins[:bill_length_mm]
- # =>
- #<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
- [39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
- ```
-
- Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
-
- This is an element-wise comparison and returns a boolean Vector of same size.
-
- ![unary element-wise](doc/image/vector/unary_element_wise.png)
-
- ```ruby
- penguins[:bill_length_mm] < 40
-
- # =>
- #<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
- [true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
- ```
-
- Next example returns aggregated result.
-
- ![unary aggregation](doc/image/vector/unary_aggregation.png)
-
- ```ruby
- penguins[:bill_length_mm].mean
- 43.92192982456141
- # =>
-
- ```
 
  See [Vector.md](doc/Vector.md) for details.
 
  ## Jupyter notebook
 
- [61 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
+ [73 Examples of Red Amber](binder/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
+
+ You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
+ [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb)
+
 
  ## Development
 
@@ -385,8 +196,14 @@ bundle install
  bundle exec rake test
  ```
 
+ ## Community
+
  I will appreciate if you could help to improve this project. Here are a few ways you can help:
 
+ - Let's talk in the [discussions](https://github.com/heronshoes/red_amber/discussions). [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
+ - Browse Q and A, how to use, tips, etc.
+ - Ask questions you’re wondering about.
+ - Share ideas. The idea may be promoted to issues or pull requests.
  - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
  - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
  - Write, clarify, or fix documentation
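The rewritten README example chains `slice`, `group`, `mean` and `sort` on the diamonds data, then uses `rename` and `assign` to add a yen-converted column. To spell out what each step computes, here is a plain-Ruby sketch over a few made-up rows (not the real dataset) using only core `Enumerable`. It mirrors the semantics step by step and is not how RedAmber, which operates on Arrow tables, implements them.

```ruby
# Made-up rows standing in for the diamonds data (illustrative values only).
rows = [
  { carat: 1.20, cut: 'Ideal',   price: 9000 },
  { carat: 0.50, cut: 'Ideal',   price: 1500 },
  { carat: 1.50, cut: 'Ideal',   price: 11_000 },
  { carat: 1.10, cut: 'Premium', price: 8000 },
  { carat: 2.00, cut: 'Fair',    price: 7000 },
  { carat: 0.90, cut: 'Fair',    price: 2500 }
]

big    = rows.select { |r| r[:carat] > 1 } # like .slice { carat > 1 }
by_cut = big.group_by { |r| r[:cut] }      # like .group(:cut)
means  = by_cut.map do |cut, group|        # like .mean(:price), then the rename
  prices = group.map { |r| r[:price] }
  { cut: cut, mean_price_USD: prices.sum.fdiv(prices.size) }
end
result = means.sort_by { |h| -h[:mean_price_USD] } # like .sort('-mean(price)')

# like .assign(:mean_price_JPY) { mean_price_USD * usdjpy }
usdjpy = 110.0
result = result.map { |h| h.merge(mean_price_JPY: h[:mean_price_USD] * usdjpy) }

result.each { |h| puts h.values.join("\t") }
```

With these rows, the 0.50 and 0.90 carat rows are dropped, and the output lists Ideal (mean 10000.0), Premium (8000.0) and Fair (7000.0) in descending order of mean price, each with its value multiplied by 110.0 in the last column.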