red_amber 0.1.7 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 88bdd603d8daec1a95c0277ef68857f84346ad7cf95d0ba23a306e6b70567c29
- data.tar.gz: 40add80cbaa5183ca0e93eadcdcd1fead37015cac1cb2360660002c0b1878255
+ metadata.gz: d239a3fa90e5796fb695f8d3c4995d0a2178ea7c8c2789bed157e688902585cb
+ data.tar.gz: 968c02294d24a3dabaa6e5128be0bcfad713e131df15850ac0ceb64c2883dcd0
  SHA512:
- metadata.gz: d043eea51117ecc48bdc52fa951e24d2618f273eb289a30f5bbb182e1a891763cdd35f6a7c6764f6e0061bddeaaa86b2374de1dc2b48f25a5b6b05c9af83a0e3
- data.tar.gz: cdbba19750bf71fe99e55bf6c46cb4522018f43563d7a93fdc375987f9388234e4f7e833297fdb6b8dd5a41b5a1bfdbf287ea47663f5f8a90facb56a4c63daef
+ metadata.gz: d1c5ffd9650dd8c9e825514cd7e2ff4914690bd731ac262fca6cc17e56c1e312679689351a05fb741dccfb59377214706a8bf6ca6fe3237ca46fb623ae1b9f10
+ data.tar.gz: f37c4aff9170cd5105737a9d2b3d827051254dcca6968b697f5ed3a70e1b2c3cb14303e88a9c342870d1447450a538e445d6f3d37de53591d3f6d13b87aebc16
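The `checksums.yaml` entries above are hex digests of the `metadata.gz` and `data.tar.gz` members packed inside the `.gem` archive (a plain tar file). A minimal sketch for re-checking them with Ruby's standard library, assuming a locally downloaded `.gem`; the helper names `gem_members` and `checksum_mismatches` are hypothetical:

```ruby
require 'digest'
require 'rubygems/package'
require 'yaml'
require 'zlib'

# Digest classes keyed by the algorithm names used in checksums.yaml.
DIGESTS = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper: read every member of a .gem (a tar archive) into name => bytes.
def gem_members(path)
  members = {}
  File.open(path, 'rb') do |io|
    Gem::Package::TarReader.new(io).each do |entry|
      members[entry.full_name] = entry.read
    end
  end
  members
end

# Hypothetical helper: return [algorithm, member] pairs whose digest differs
# from the expected hex string.
def checksum_mismatches(members, expected)
  expected.flat_map do |algorithm, files|
    digest = DIGESTS.fetch(algorithm)
    files.reject { |name, hex| digest.hexdigest(members.fetch(name)) == hex }
         .map { |name, _hex| [algorithm, name] }
  end
end

if ARGV[0]
  members = gem_members(ARGV[0]) # path to a downloaded .gem file
  expected = YAML.safe_load(Zlib.gunzip(members.fetch('checksums.yaml.gz')))
  mismatches = checksum_mismatches(members, expected)
  puts mismatches.empty? ? 'all checksums match' : "mismatch: #{mismatches.inspect}"
end
```

For example, after `gem fetch red_amber -v 0.2.1`, pass the downloaded `.gem` path as the script's first argument; an empty mismatch list means both members match the recorded SHA256 and SHA512 digests.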
data/.rubocop.yml CHANGED
@@ -43,6 +43,11 @@ Lint/BinaryOperatorWithIdenticalOperands:
  Exclude:
  - 'test/test_vector_function.rb'
 
+ # Needed for a test with an empty block
+ Lint/EmptyBlock:
+ Exclude:
+ - 'test/test_group.rb'
+
  # Max: 120
  Layout/LineLength:
  Max: 118
@@ -56,6 +61,7 @@ Metrics/AbcSize:
  Max: 30
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
+ - 'lib/red_amber/data_frame_reshaping.rb' # Max: 40.91
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
  - 'lib/red_amber/vector_updatable.rb' # Max: 36
  - 'lib/red_amber/vector_selectable.rb' # Max: 33
@@ -78,23 +84,27 @@ Metrics/ClassLength:
  Metrics/CyclomaticComplexity:
  Max: 12
  Exclude:
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/vector_selectable.rb' # Max: 13
  - 'lib/red_amber/vector_updatable.rb' # Max: 14
- - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
 
  # Max: 10
  Metrics/MethodLength:
  Max: 30
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
+ - 'lib/red_amber/data_frame_selectable.rb' # Max: 38
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 35
 
  # Max: 100
  Metrics/ModuleLength:
  Max: 100
  Exclude:
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 141
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
  - 'lib/red_amber/vector_functions.rb' # Max: 114
- - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
 
  # Max: 8
  Metrics/PerceivedComplexity:
data/.rubocop_todo.yml CHANGED
@@ -1,15 +1,2 @@
- # This configuration was generated by
- # `rubocop --auto-gen-config`
- # on 2022-05-08 02:37:36 UTC using RuboCop version 1.27.0.
- # The point is for the user to remove these configuration records
- # one by one as the offenses are removed from the code base.
- # Note that changes in the inspected code, or installation of new
- # versions of RuboCop, may require this file to be generated again.
-
- # Offense count: 1
- # This cop supports unsafe auto-correction (--auto-correct-all).
- # Configuration parameters: EnforcedStyle.
- # SupportedStyles: forbid_for_all_comparison_operators, forbid_for_equality_operators_only, require_for_all_comparison_operators, require_for_equality_operators_only
- Style/YodaCondition:
- Exclude:
- - 'lib/red_amber/data_frame.rb'
+ # We will use cops to detect bugs at an early stage
+ # Feel free to use a .rubocop_todo.yml generated by --auto-gen-config
data/.yardopts ADDED
@@ -0,0 +1 @@
+ --output-dir doc/yard
data/CHANGELOG.md CHANGED
@@ -1,6 +1,168 @@
- ## [0.1.9] - Unreleased
+ ## [0.2.1] - 2022-09-07
 
- - Supports Arrow 9.0.0
+ - Bug fixes
+
+ - Fix `Vector#each` with block (#66)
+ `Vector#each` yields the value of each element to the block.
+
+ - Fix table format at size == 9 (#67)
+
+ - Fix to support Vector in `DataFrame#assign` (#77)
+
+ - Add `assert_delta` functionality for `assert_with_NaN` (#78)
+
+ - Fix Vector#is_in when self is chunked (#79)
+
+ - Fix Array type error (uint/int) (#79)
+
+ - New features and improvements
+
+ - Refine `DataFrame#indices` method (#67)
+
+ - Update DataFrame reshaping methods (#73)
+
+ - Change default option value of DataFrame reshaping
+
+ - Change the order of import_cars example
+
+ - Add `DataFrame#method_missing` to get column vector by method (#75)
+
+ - Add `DataFrame#method_missing` to get column (#75)
+
+ - Accept both args and block in `DataFrame#assign` (#75)
+
+ - Accept indices in `DataFrame#pick` and `DataFrame#drop` (#76)
+
+ - Add `DataFrame#slice_by` method (#77)
+
+ - Add new Vector functions (#78)
+
+ - Add inverse trigonometric functions for Vector
+ - `acos`
+ - `asin`
+
+ - Add logarithmic functions for Vector
+ - `ln`
+ - `log10`
+ - `log1p`
+ - `log2`
+
+ - Add binary function `Vector#logb`
+
+ - Docker image and Jupyter Notebook (Thanks to @mrkn)
+ - Add link to RubyData in README
+ - Add link to interactive README by Binder
+
+ - Update Jupyter Notebook `71 examples of RedAmber`
+
+
+ ## [0.2.0] - 2022-08-15
+
+ - Bump version up to 0.2.0
+
+ - Bug fixes
+
+ - Fix order of multiple group keys (#55)
+
+ Only 1 group key comes to the left. Other keys remain on the right.
+
+ - Remove optional `require` for rover (#55)
+
+ Fix DataFrame.new for an argument with Rover::DataFrame.
+
+ - Fix occasional failure in CI (#59)
+
+ Sometimes the CI test failed. I added the -dev dependency
+ in the Arrow install by apt, instead of doing it in bundler.
+
+ - Fix calling :take in V#[] (#56)
+
+ Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This will prevent the error below
+ when called with Arrow::ChunkedArray.
+
+ - Raise error when renaming a non-existing key (#61)
+
+ Add an error when the specified key does not exist.
+
+ - Fix DataFrame#rename #assign by array (#65)
+
+ - New features and improvements
+
+ - Support Arrow 9.0.0
+ - Upgrade to Arrow 9.0.0 (#59)
+ - Add Vector#quantile method (#59)
+ Arrow::QuantileOptions is supported since Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
+
+ - Add Vector#quantiles (#62)
+
+ - Add DataFrame#each_row (#56)
+ - Returns Enumerator if block is not given.
+ - Change DataFrame#each_row to return a Hash {key => row} (#63)
+
+ - Refactor to use pattern match in overloaded parameter parsing (#61)
+ - Refine DataFrame.new to use pattern match
+ - Use pattern match in DataFrame#assign
+ - Use pattern match in DataFrame#rename
+
+ - Accept Array for renamer/assigner in #rename/#assign (#61)
+ - Accept assigner by Arrays in DataFrame#assign
+ - Accept renamer pairs by Arrays in DataFrame#rename
+ - Add DataFrame#assign_left method
+
+ - Add summary/describe (#62)
+ - Introduce DataFrame#summary(#describe)
+
+ - Introduce reshaping methods for DataFrame (#64)
+ - Introduce DataFrame#transpose method
+ - Introduce DataFrame#to_long method
+ - Introduce DataFrame#to_wide method
+
+ - Others
+
+ - Add alias sort_index for array_sort_indices (#59)
+ - Enable :width option in DataFrame#to_s (#62)
+ - Add options to DataFrame#format_table (#62)
+
+ - Update Documents
+
+ - Add Yard doc for some methods
+
+ - Update Jupyter notebook '61 Examples of Red Amber' (#65)
+
+ ## [0.1.8] - 2022-08-04 (experimental)
+
+ - Bug fixes
+
+ - Fix unnamed column in table formatter (#52)
+ - Fix DataFrame#key?, DataFrame#key_index when @keys.nil? (#52)
+ - Align order of replacer in Vector#replace (#53, resolved #38)
+
+ - New features and improvements
+
+ - Refine DataFrame.new for empty arguments (#50)
+ - Delete .rubocop_todo.yml so as not to use Yoda conditions (#50)
+
+ - Refine Group (#52, resolved #28)
+ - Refine Group methods creation
+ - Make group key first (left)
+ - Show only one group count when same counts
+ - Add block acceptability for group
+ - Rename empty key to :unnamed in DataFrame.new
+ - Rename Group#aggregated_by to #summarize (#54)
+
+ - Add Vector#shift (#51)
+
+ - Vector#[] accepts Range as an argument (#51)
+
+ - Update documents
+
+ - Add support for yard (#54)
+
+ - Renew jupyter notebook '53 examples' (#54)
+
+ - Add more examples and images in README (#52)
+ - Add document of group manipulations in README (#52)
+ - Renew DF#group document in DataFrame.md (#52)
 
  ## [0.1.7] - 2022-07-15 (experimental)
 
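The 0.2.0 entries describe `DataFrame#each_row` as yielding each row as a Hash keyed by column and returning an Enumerator when no block is given. A plain-Ruby sketch of that iteration contract over a column-oriented Hash of Arrays; `ColumnTable` is a hypothetical class for illustration, not RedAmber's implementation (which works on Arrow tables):

```ruby
# Hypothetical column-oriented table used only to illustrate each_row's contract.
class ColumnTable
  def initialize(columns)
    @columns = columns # Hash{Symbol => Array}, all Arrays of equal length
  end

  def size
    @columns.empty? ? 0 : @columns.values.first.size
  end

  # Yield one {key => value} Hash per row; return an Enumerator without a block.
  def each_row
    return enum_for(:each_row) { size } unless block_given?

    size.times do |i|
      yield @columns.transform_values { |column| column[i] }
    end
    self
  end
end

table = ColumnTable.new(name: %w[a b], value: [1, nil])
table.each_row.to_a # => [{ name: 'a', value: 1 }, { name: 'b', value: nil }]
```

Returning `enum_for` with a size block lets callers chain `with_index`, `map` and so on, and ask for the size without iterating.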
data/Gemfile CHANGED
@@ -7,7 +7,7 @@ gemspec
  group :test do
  gem 'rake'
 
- gem 'red-parquet', '>= 8.0.0'
+ gem 'red-parquet', '>= 9.0.0'
  gem 'rover-df', '~> 0.3.0'
 
  gem 'rubocop'
@@ -18,6 +18,7 @@ group :test do
  gem 'iruby'
  gem 'test-unit'
  gem 'webrick'
+ gem 'yard'
 
  gem 'benchmark_driver'
  gem 'red-datasets'
data/README.md CHANGED
@@ -3,17 +3,23 @@
  [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
  [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
 
- A simple dataframe library for Ruby (experimental).
+ A simple dataframe library for Ruby.
 
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 
  ## Requirements
 
+ Supported Ruby versions are >= 2.7.
+
+ Since v0.2.0, this library uses pattern matching, which is an experimental feature in Ruby 2.7. It works on 2.7, but a warning message will be shown.
+ I recommend Ruby 3 for performance.
+
  ```ruby
- gem 'red-arrow', '>= 8.0.0'
+ # Libraries required
+ gem 'red-arrow', '>= 9.0.0'
 
- gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
+ gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
  gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
  ```
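The requirements note says v0.2.0 relies on pattern matching, and the CHANGELOG lists it for overloaded parameter parsing in `#assign` and `#rename`. A minimal sketch of that style with `case`/`in`, assuming three accepted argument shapes (a Hash, an Array of pairs, or a key followed by a value); `normalize_assigner` is a hypothetical helper, not RedAmber's parser. On Ruby 2.7 it runs but prints the experimental-feature warning.

```ruby
# Hypothetical helper: normalize overloaded arguments into [[key, value], ...].
def normalize_assigner(*args)
  case args
  in [Hash => hash]
    # assign(key: value, ...) arrives as a single positional Hash
    hash.to_a
  in [Array => pairs] if pairs.all? { |pair| pair.is_a?(Array) && pair.size == 2 }
    # assign([[key, value], ...])
    pairs
  in [Symbol | String => key, value]
    # assign(key, value)
    [[key.to_sym, value]]
  else
    raise ArgumentError, "invalid assigner: #{args.inspect}"
  end
end

normalize_assigner(body_mass_kg: 3.75)   # => [[:body_mass_kg, 3.75]]
normalize_assigner('body_mass_kg', 3.75) # => [[:body_mass_kg, 3.75]]
```

Each `in` branch documents one accepted call shape, which is harder to read off a chain of `is_a?` checks.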
 
@@ -21,9 +27,9 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 
  Install requirements before you install Red Amber.
 
- - Apache Arrow GLib (>= 8.0.0)
+ - Apache Arrow GLib (>= 9.0.0)
 
- - Apache Parquet GLib (>= 8.0.0) # If you use IO from/to parquet
+ - Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
 
  See [Apache Arrow install document](https://arrow.apache.org/install/).
 
@@ -47,16 +53,27 @@ Or install it yourself as:
  gem install red_amber
  ```
 
+ ## Docker image and Jupyter Notebook
+
+ [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
+
+ You can also try the contents of this README interactively on [Binder](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb).
+ [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb)
+
+
+
  ## `RedAmber::DataFrame`
 
- Represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
+ It represents a set of data in a 2D shape. The entity is a Red Arrow Table object.
+
+ ![dataframe model of RedAmber](doc/image/dataframe_model.png)
 
  ```ruby
  require 'red_amber' # require 'red-amber' is also OK.
  require 'datasets-arrow'
 
  arrow = Datasets::Penguins.new.to_arrow
- RedAmber::DataFrame.new(arrow)
+ penguins = RedAmber::DataFrame.new(arrow)
 
  # =>
  #<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
@@ -73,17 +90,52 @@ RedAmber::DataFrame.new(arrow)
  344 Gentoo Biscoe 49.9 16.1 213 ... 2009
  ```
 
- ### DataFrame model
- ![dataframe model of RedAmber](doc/image/dataframe_model.png)
+ For example, `DataFrame#pick` accepts keys as arguments and returns a sub DataFrame.
 
- For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
+ ![pick method image](doc/image/dataframe/pick.png)
 
  ```ruby
- df = penguins.pick(:body_mass_g)
+ penguins.keys
+ # =>
+ [:species,
+ :island,
+ :bill_length_mm,
+ :bill_depth_mm,
+ :flipper_length_mm,
+ :body_mass_g,
+ :sex,
+ :year]
+
+ df = penguins.pick(:species, :island, :body_mass_g)
  df
 
  # =>
- #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000015cc0>
+ #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
+ species island body_mass_g
+ <string> <string> <uint16>
+ 1 Adelie Torgersen 3750
+ 2 Adelie Torgersen 3800
+ 3 Adelie Torgersen 3250
+ 4 Adelie Torgersen (nil)
+ 5 Adelie Torgersen 3450
+ : : : :
+ 342 Gentoo Biscoe 5750
+ 343 Gentoo Biscoe 5200
+ 344 Gentoo Biscoe 5400
+ ```
+
+ `DataFrame#drop` drops some columns to create a remainder DataFrame.
+
+ ![drop method image](doc/image/dataframe/drop.png)
+
+ You can specify columns by keys or by a boolean array of the same size as n_keys.
+
+ ```ruby
+ # Same as df.drop(:species, :island)
+ df = df.drop(true, true, false)
+
+ # =>
+ #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
  body_mass_g
  <uint16>
  1 3750
@@ -97,9 +149,14 @@ df
  344 5400
  ```
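`drop` (like `pick`) accepts either keys or a boolean array with one entry per key. A plain-Ruby sketch of how such selectors can be resolved against an Array of key Symbols; `selected_keys`, `pick_keys` and `drop_keys` are hypothetical helpers, not RedAmber methods:

```ruby
# Hypothetical helpers: resolve pick/drop-style selectors against a key list.
# Selectors are either keys (Symbols) or booleans, one boolean per key.
def selected_keys(keys, selectors)
  if !selectors.empty? && selectors.all? { |s| s == true || s == false }
    unless selectors.size == keys.size
      raise ArgumentError, "expected #{keys.size} booleans, got #{selectors.size}"
    end

    keys.zip(selectors).select { |_key, flag| flag }.map(&:first)
  else
    unknown = selectors - keys
    raise ArgumentError, "unknown keys: #{unknown.inspect}" unless unknown.empty?

    selectors
  end
end

def pick_keys(keys, *selectors)
  selected_keys(keys, selectors)
end

# Dropping keeps the complement, in the original key order.
def drop_keys(keys, *selectors)
  keys - selected_keys(keys, selectors)
end

keys = [:species, :island, :body_mass_g]
drop_keys(keys, true, true, false) # => [:body_mass_g]
drop_keys(keys, :species, :island) # => [:body_mass_g]
```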
 
- `DataFrame#assign` creates new variables (column in the table).
+ Arrow data is immutable, so these methods always return a new object.
+
+ `DataFrame#assign` creates new columns or updates existing columns.
+
+ ![assign method image](doc/image/dataframe/assign.png)
 
  ```ruby
+ # New column is created because ':body_mass_kg' is a new key.
  df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
 
  # =>
@@ -117,14 +174,103 @@ df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
  344 5400 5.4
  ```
 
+ `DataFrame#slice` selects rows (observations) to create a sub DataFrame.
+
+ ![slice method image](doc/image/dataframe/slice.png)
+
+ ```ruby
+ # returns the first 5 rows and the last 5 rows
+ penguins.slice(0...5, -5..-1)
+
+ # =>
+ #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
+ <string> <string> <double> <double> <uint8> ... <uint16>
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
+ 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
+ 5 Adelie Torgersen 36.7 19.3 193 ... 2007
+ : : : : : : ... :
+ 8 Gentoo Biscoe 50.4 15.7 222 ... 2009
+ 9 Gentoo Biscoe 45.2 14.8 212 ... 2009
+ 10 Gentoo Biscoe 49.9 16.1 213 ... 2009
+ ```
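`penguins.slice(0...5, -5..-1)` mixes a forward Range with a negative one. Ruby's own `Array#[]` gives a convenient model of how such selectors map onto 0-based row positions; a sketch assuming Ruby's negative-index rules (`row_indices` is a hypothetical helper):

```ruby
# Hypothetical helper: expand slice-style selectors (Ranges or Integers, where
# negative values count from the end) into row indices for `size` rows.
def row_indices(size, *selectors)
  all = (0...size).to_a
  selectors.flat_map do |selector|
    case selector
    when Range
      picked = all[selector]
      raise IndexError, "#{selector} is out of range" if picked.nil?

      picked
    when Integer
      index = selector.negative? ? size + selector : selector
      raise IndexError, "#{selector} is out of range" unless index.between?(0, size - 1)

      [index]
    else
      raise ArgumentError, "unsupported selector: #{selector.inspect}"
    end
  end
end

row_indices(344, 0...5, -5..-1)
# => [0, 1, 2, 3, 4, 339, 340, 341, 342, 343]
```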
199
+
200
+ `DataFrame#remove` rejects rows (observations) to create a remainer DataFrame.
201
+
202
+ ![remove method image](doc/image/dataframe/remove.png)
203
+
204
+ ```ruby
205
+ # penguins[:bill_length_mm] < 40 returns a boolean Vector
206
+ penguins.remove(penguins[:bill_length_mm] < 40)
207
+
208
+ # =>
209
+ #<RedAmber::DataFrame : 244 x 8 Vectors, 0x000000000007d6f4>
210
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
211
+ <string> <string> <double> <double> <uint8> ... <uint16>
212
+ 1 Adelie Torgersen 40.3 18.0 195 ... 2007
213
+ 2 Adelie Torgersen (nil) (nil) (nil) ... 2007
214
+ 3 Adelie Torgersen 42.0 20.2 190 ... 2007
215
+ 4 Adelie Torgersen 41.1 17.6 182 ... 2007
216
+ 5 Adelie Torgersen 42.5 20.7 197 ... 2007
217
+ : : : : : : ... :
218
+ 242 Gentoo Biscoe 50.4 15.7 222 ... 2009
219
+ 243 Gentoo Biscoe 45.2 14.8 212 ... 2009
220
+ 244 Gentoo Biscoe 49.9 16.1 213 ... 2009
221
+ ```
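In the output above, the row with a `(nil)` bill length survives `remove`: only rows whose condition is `true` are dropped, while `false` and `nil` rows are kept. A plain-Ruby sketch of that rule, using the first five `bill_length_mm` values and the matching boolean values shown in the README's `Vector` section; `remove_rows` is a hypothetical helper:

```ruby
# Hypothetical helper: drop rows whose mask value is true; rows whose mask is
# false or nil are kept, matching the (nil) row that survives above.
def remove_rows(rows, mask)
  raise ArgumentError, 'mask size must match row count' unless rows.size == mask.size

  rows.each_index.reject { |i| mask[i] == true }.map { |i| rows[i] }
end

bill_length_mm = [39.1, 39.5, 40.3, nil, 36.7]
less_than_40 = [true, true, false, nil, true]
remove_rows(bill_length_mm, less_than_40) # => [40.3, nil]
```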
+
  DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
 
- This is an exaple to eliminate observations (row in the table) containing nil.
+ The previous example can also be written with a block.
+
+ ```ruby
+ penguins.remove { bill_length_mm < 40 }
+ ```
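In the block form, a bare column key such as `bill_length_mm` resolves to its column, which implies the block runs with the DataFrame as `self` (the CHANGELOG lists `DataFrame#method_missing` for column access, #75). A minimal sketch of that mechanism with `instance_eval` and `method_missing`, assuming columns stored as plain Ruby Arrays; `MiniFrame` is hypothetical, and because an Array has no element-wise `<`, its block maps explicitly:

```ruby
# Hypothetical MiniFrame: illustrates block evaluation only, not RedAmber's code.
class MiniFrame
  attr_reader :columns

  def initialize(columns)
    @columns = columns # Hash{Symbol => Array}
  end

  def row_count
    @columns.empty? ? 0 : @columns.values.first.size
  end

  # Accept a boolean mask, or a block that computes one with self as the frame.
  def remove(mask = nil, &block)
    mask = instance_eval(&block) if block
    kept = (0...row_count).reject { |i| mask[i] == true }
    MiniFrame.new(@columns.transform_values { |column| column.values_at(*kept) })
  end

  # Resolve a bare column key used inside the block to that column's values.
  def method_missing(name, *args, &block)
    return @columns[name] if args.empty? && @columns.key?(name)

    super
  end

  def respond_to_missing?(name, include_private = false)
    @columns.key?(name) || super
  end
end

frame = MiniFrame.new(bill_length_mm: [39.1, 40.3, nil, 36.7])
frame.remove { bill_length_mm.map { |v| v && v < 40 } }.columns
# => { bill_length_mm: [40.3, nil] }
```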
+
+ The next example uses a block to update a column.
 
  ```ruby
- # remove all observation contains nil
+ df = RedAmber::DataFrame.new(
+ integer: [0, 1, 2, 3, nil],
+ float: [0.0, 1.1, 2.2, Float::NAN, nil],
+ string: ['A', 'B', 'C', 'D', nil],
+ boolean: [true, false, true, false, nil])
+ df
+
+ # =>
+ #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
+ integer float string boolean
+ <uint8> <double> <string> <boolean>
+ 1 0 0.0 A true
+ 2 1 1.1 B false
+ 3 2 2.2 C true
+ 4 3 NaN D false
+ 5 (nil) (nil) (nil) (nil)
+
+ df.assign do
+ vectors.select(&:float?).map { |v| [v.key, -v] }
+ # => returns [[:float, [-0.0, -1.1, -2.2, NAN, nil]]]
+ end
+
+ # =>
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
+ index float string
+ <uint8> <double> <string>
+ 1 0 -0.0 A
+ 2 1 -1.1 B
+ 3 2 -2.2 C
+ 4 3 NaN D
+ 5 (nil) (nil) (nil)
+ ```
+
+ The next example eliminates rows containing nil.
+
+ ```ruby
+ # remove all observations containing nil
  nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
  nil_removed.tdr
+
  # =>
  RedAmber::DataFrame : 342 x 8 Vectors
  Vectors : 5 numeric, 3 strings
@@ -145,12 +291,66 @@ For this frequently needed task, we can do it much simpler.
  penguins.remove_nil # => same result as above
  ```
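`vectors.map(&:is_nil).reduce(&:|)` builds one boolean mask per column and ORs them element-wise, so a row is flagged when any of its columns is nil. The same reduction in plain Ruby over columns held as Arrays (`nil_row_mask` is a hypothetical helper):

```ruby
# Hypothetical helper: one "is nil" mask per column, OR-ed element-wise.
# Float::NAN is not nil, so a NaN alone does not flag a row in this sketch.
def nil_row_mask(columns)
  masks = columns.values.map { |column| column.map(&:nil?) }
  masks.reduce { |acc, mask| acc.zip(mask).map { |a, b| a || b } }
end

columns = { integer: [0, nil, 2], string: ['a', 'b', nil], float: [0.0, Float::NAN, 1.0] }
mask = nil_row_mask(columns)                       # => [false, true, true]
kept_rows = mask.each_index.reject { |i| mask[i] } # => [0]
```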
 
- See [DataFrame.md](doc/DataFrame.md) for details.
+ `DataFrame#summary` shows summary statistics in a DataFrame.
+
+ ```ruby
+ puts penguins.summary.to_s(width: 82)
+
+ # =>
+ variables count mean std min 25% median 75% max
+ <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
+ 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
+ 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
+ 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
+ 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
+ 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
+ ```
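For reference, the columns of such a summary can be sketched in plain Ruby. This sketch assumes nils are skipped (the `count` of 342 for 344 rows suggests so), uses the sample standard deviation (n - 1 denominator) and linear interpolation between sorted values for the quantiles; these are assumptions, and RedAmber's own statistics (its `median` in particular) may be computed differently. `summary_stats` and `quantile` are hypothetical helpers:

```ruby
# Hypothetical helpers, not RedAmber's implementation.

# Quantile of an already sorted Array using linear interpolation (p in 0.0..1.0).
def quantile(sorted, p)
  position = (sorted.size - 1) * p
  lower = position.floor
  upper = position.ceil
  sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower)
end

# Summary statistics over the non-nil values of one column. Assumes at least
# two non-nil values; std is the sample standard deviation (n - 1 denominator).
def summary_stats(values)
  data = values.compact.sort
  n = data.size
  mean = data.sum.fdiv(n)
  variance = data.sum { |x| (x - mean)**2 } / (n - 1)
  {
    count: n,
    mean: mean,
    std: Math.sqrt(variance),
    min: data.first,
    '25%': quantile(data, 0.25),
    median: quantile(data, 0.5),
    '75%': quantile(data, 0.75),
    max: data.last
  }
end

summary_stats([4, nil, 1, 3, 2])
# count 4, mean 2.5, min 1, 25% 1.75, median 2.5, 75% 3.25, max 4
```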
+
+ The `DataFrame#group` method can be used for grouping tasks.
+
+ ```ruby
+ starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
+ starwars
+
+ # =>
+ #<RedAmber::DataFrame : 87 x 12 Vectors, 0x000000000000607c>
+ unnamed1 name height mass hair_color skin_color eye_color ... species
+ <int64> <string> <int64> <double> <string> <string> <string> ... <string>
+ 1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
+ 2 2 C-3PO 167 75.0 NA gold yellow ... Droid
+ 3 3 R2-D2 96 32.0 NA white, blue red ... Droid
+ 4 4 Darth Vader 202 136.0 none white yellow ... Human
+ 5 5 Leia Organa 150 49.0 brown light brown ... Human
+ : : : : : : : : ... :
+ 85 85 BB8 (nil) (nil) none none black ... Droid
+ 86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
+ 87 87 Padmé Amidala 165 45.0 brown light brown ... Human
+
+ starwars.group(:species) { [count(:species), mean(:height, :mass)] }
+ .slice { count > 1 }
+
+ # =>
+ #<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
+ species count mean(height) mean(mass)
+ <string> <int64> <double> <double>
+ 1 Human 35 176.6 82.8
+ 2 Droid 6 131.2 69.8
+ 3 Wookiee 2 231.0 124.0
+ 4 Gungan 3 208.7 74.0
+ 5 NA 4 181.3 48.0
+ 6 Zabrak 2 173.0 80.0
+ 7 Twi'lek 2 179.0 55.0
+ 8 Mirialan 2 168.0 53.1
+ 9 Kaminoan 2 221.0 88.0
+ ```
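The grouping above counts rows per species, averages `height` and `mass`, and keeps groups with more than one member. A plain-Ruby sketch of the same steps over an Array of row Hashes, assuming means skip nil values (BB8's missing height does not turn the Droid mean into nil above); `group_summary` is a hypothetical helper and counts rows per group:

```ruby
# Hypothetical helper: group row Hashes by `key`, count rows per group and
# average each column in `mean_keys`, skipping nils. Not RedAmber's code.
def group_summary(rows, key, mean_keys)
  rows.group_by { |row| row[key] }.map do |group_key, group|
    summary = { key => group_key, count: group.size }
    mean_keys.each do |column|
      values = group.map { |row| row[column] }.compact
      summary[:"mean(#{column})"] = values.empty? ? nil : values.sum.fdiv(values.size)
    end
    summary
  end
end

rows = [
  { species: 'Human', height: 172, mass: 77.0 },
  { species: 'Droid', height: 167, mass: 75.0 },
  { species: 'Droid', height: 96, mass: 32.0 },
  { species: 'Droid', height: nil, mass: nil }
]
group_summary(rows, :species, %i[height mass]).select { |g| g[:count] > 1 }
# => [{ species: 'Droid', count: 3, 'mean(height)': 131.5, 'mean(mass)': 53.5 }]
```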
+
+ See [DataFrame.md](doc/DataFrame.md) for other examples and details.
 
 
  ## `RedAmber::Vector`
 
  Class `RedAmber::Vector` represents a series of data in the DataFrame.
+ Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
 
  ```ruby
  penguins[:bill_length_mm]
@@ -161,11 +361,34 @@ penguins[:bill_length_mm]
 
  Vectors accept some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
 
+ The following is an element-wise comparison, which returns a boolean Vector of the same size.
+
+ ![unary element-wise](doc/image/vector/unary_element_wise.png)
+
+ ```ruby
+ penguins[:bill_length_mm] < 40
+
+ # =>
+ #<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
+ [true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
+ ```
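The comparison keeps `nil` where the input is `nil` (the fourth element), so the result has the same size as the input. A plain-Ruby sketch of that element-wise behavior (`elementwise_less_than` is a hypothetical helper), using the first five values of the column:

```ruby
# Hypothetical helper: element-wise "value < threshold" that keeps nil as nil,
# so the output has one entry per input element.
def elementwise_less_than(values, threshold)
  values.map { |value| value.nil? ? nil : value < threshold }
end

elementwise_less_than([39.1, 39.5, 40.3, nil, 36.7], 40)
# => [true, true, false, nil, true]
```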
+
+ The next example returns an aggregated result.
+
+ ![unary aggregation](doc/image/vector/unary_aggregation.png)
+
+ ```ruby
+ penguins[:bill_length_mm].mean
+
+ # =>
+ 43.92192982456141
+ ```
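An aggregation reduces a whole Vector to one value. The penguins column contains nils and `mean` still returns a number, which suggests nils are skipped; a plain-Ruby sketch under that assumption (`mean_skipping_nil` is a hypothetical helper):

```ruby
# Hypothetical helper: mean over the non-nil values, or nil when none remain.
def mean_skipping_nil(values)
  present = values.compact
  present.empty? ? nil : present.sum.fdiv(present.size)
end

mean_skipping_nil([1, nil, 2, 3]) # => 2.0
mean_skipping_nil([nil])          # => nil
```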
+
  See [Vector.md](doc/Vector.md) for details.
 
  ## Jupyter notebook
 
- [47 Examples of Red Amber](doc/47_examples_of_red_amber.ipynb)
+ [71 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in a Jupyter notebook.
 
  ## Development
 
@@ -176,6 +399,12 @@ bundle install
  bundle exec rake test
  ```
 
+ I would appreciate it if you could help improve this project. Here are a few ways you can help:
+
+ - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
+ - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
+ - Write, clarify, or fix documentation
+
  ## License
 
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).