red_amber 0.1.7 → 0.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 88bdd603d8daec1a95c0277ef68857f84346ad7cf95d0ba23a306e6b70567c29
- data.tar.gz: 40add80cbaa5183ca0e93eadcdcd1fead37015cac1cb2360660002c0b1878255
+ metadata.gz: d239a3fa90e5796fb695f8d3c4995d0a2178ea7c8c2789bed157e688902585cb
+ data.tar.gz: 968c02294d24a3dabaa6e5128be0bcfad713e131df15850ac0ceb64c2883dcd0
  SHA512:
- metadata.gz: d043eea51117ecc48bdc52fa951e24d2618f273eb289a30f5bbb182e1a891763cdd35f6a7c6764f6e0061bddeaaa86b2374de1dc2b48f25a5b6b05c9af83a0e3
- data.tar.gz: cdbba19750bf71fe99e55bf6c46cb4522018f43563d7a93fdc375987f9388234e4f7e833297fdb6b8dd5a41b5a1bfdbf287ea47663f5f8a90facb56a4c63daef
+ metadata.gz: d1c5ffd9650dd8c9e825514cd7e2ff4914690bd731ac262fca6cc17e56c1e312679689351a05fb741dccfb59377214706a8bf6ca6fe3237ca46fb623ae1b9f10
+ data.tar.gz: f37c4aff9170cd5105737a9d2b3d827051254dcca6968b697f5ed3a70e1b2c3cb14303e88a9c342870d1447450a538e445d6f3d37de53591d3f6d13b87aebc16
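Note: the digests in `checksums.yaml` can be compared against a locally unpacked gem. The following is a minimal sketch using Ruby's standard `digest` library; the helper name `digest_matches?` and the throwaway demo file are assumptions for illustration only. In a real check, the path would point at the `data.tar.gz` extracted from the `.gem` archive and the expected value would be the SHA256 entry listed above.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when the file's SHA256 hex digest equals `expected`.
def digest_matches?(path, expected)
  Digest::SHA256.file(path).hexdigest == expected.downcase
end

# Demo on a temporary file so the sketch runs without the gem archive.
demo = Tempfile.new('demo')
demo.write('red_amber')
demo.close

DEMO_DIGEST = Digest::SHA256.hexdigest('red_amber')
CHECK_OK = digest_matches?(demo.path, DEMO_DIGEST)
```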
data/.rubocop.yml CHANGED
@@ -43,6 +43,11 @@ Lint/BinaryOperatorWithIdenticalOperands:
  Exclude:
  - 'test/test_vector_function.rb'
 
+ # Need for test with empty block
+ Lint/EmptyBlock:
+ Exclude:
+ - 'test/test_group.rb'
+
  # Max: 120
  Layout/LineLength:
  Max: 118
@@ -56,6 +61,7 @@ Metrics/AbcSize:
  Max: 30
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
+ - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
  - 'lib/red_amber/vector_updatable.rb' # Max: 36
  - 'lib/red_amber/vector_selectable.rb' # Max: 33
@@ -78,23 +84,27 @@ Metrics/ClassLength:
  Metrics/CyclomaticComplexity:
  Max: 12
  Exclude:
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+ - 'lib/red_amber/vector_selectable.rb' # Max: 13
  - 'lib/red_amber/vector_updatable.rb' # Max: 14
- - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
 
  # Max: 10
  Metrics/MethodLength:
  Max: 30
  Exclude:
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
+ - 'lib/red_amber/data_frame_selectable.rb' # Max: 38
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 35
 
  # Max: 100
  Metrics/ModuleLength:
  Max: 100
  Exclude:
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 141
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
  - 'lib/red_amber/vector_functions.rb' # Max: 114
- - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
 
  # Max: 8
  Metrics/PerceivedComplexity:
data/.rubocop_todo.yml CHANGED
@@ -1,15 +1,2 @@
- # This configuration was generated by
- # `rubocop --auto-gen-config`
- # on 2022-05-08 02:37:36 UTC using RuboCop version 1.27.0.
- # The point is for the user to remove these configuration records
- # one by one as the offenses are removed from the code base.
- # Note that changes in the inspected code, or installation of new
- # versions of RuboCop, may require this file to be generated again.
-
- # Offense count: 1
- # This cop supports unsafe auto-correction (--auto-correct-all).
- # Configuration parameters: EnforcedStyle.
- # SupportedStyles: forbid_for_all_comparison_operators, forbid_for_equality_operators_only, require_for_all_comparison_operators, require_for_equality_operators_only
- Style/YodaCondition:
- Exclude:
- - 'lib/red_amber/data_frame.rb'
+ # We will use cops to detect bugs in an early stage
+ # Feel free to use .rubocop_todo.yml by --auto-gen-config
data/.yardopts ADDED
@@ -0,0 +1 @@
+ --output-dir doc/yard
data/CHANGELOG.md CHANGED
@@ -1,6 +1,168 @@
- ## [0.1.9] - Unreleased
+ ## [0.2.1] - 2022-09-07
 
- - Supports Arrow 9.0.0
+ -Bug fixes
+
+ - Fix `Vector#each` with block (#66)
+ `Vector#each` will return value of each element with block.
+
+ - Fix table format at size == 9 (#67)
+
+ - Fix to support Vector in `DataFrame#assign` (#77)
+
+ - Add `assert_delta` functionality for `assert_with_NaN` (#78)
+
+ - Fix Vector#is_in when self is chunked (#79)
+
+ - Fix Array type error (uint/int) (#79)
+
+ - New features and improvements
+
+ - Refine `DataFrame#indices` method (#67)
+
+ - Update DataFrame reshaping methods (#73)
+
+ - Change default option value of DataFrame reshaping
+
+ - Change the order of import_cars example
+
+ - Add `DataFrame#method_missing` to get column vector by method (#75)
+
+ - Add `DataFrame#method_missing` to get column (#75)
+
+ - Accept both args and block in `DataFrame#assign` (#75)
+
+ - Accept indices in `DataFrame#pick` and `DataFrame#drop` (#76)
+
+ - Add `DataFrame#slice_by` method (#77)
+
+ - Add new Vector functions (#78)
+
+ - Add inverse trigonometric function for Vector
+ - `acos`
+ - `asin`
+
+ - Add logarithmic function for Vector
+ - `ln`
+ - `log10`
+ - `log1p`
+ - `log2`
+
+ - Add binary function `Vector#logb`
+
+ - Docker image and Jupyter Notebook (Thanks to @mrkn)
+ - Add link to RubyData in README
+ - Add link to interactive README by Binder
+
+ - Update Jupyter Notebook `71 examples of RedAmber`
+
+
+ ## [0.2.0] - 2022-08-15
+
+ - Bump version up to 0.2.0
+
+ - Bug fixes
+
+ - Fix order of multiple group keys (#55)
+
+ Only 1 group key comes to left. Other keys remain in right.
+
+ - Remove optional `require` for rover (#55)
+
+ Fix DataFrame.new for argument with Rover::DataFrame.
+
+ - Fix occasional failure in CI (#59)
+
+ Sometimes the CI test fails. I added -dev dependency
+ in Arrow install by apt, not doing in bundler.
+
+ - Fix calling :take in V#[] (#56)
+
+ Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This will prevent the error below
+ when called with Arrow::ChunkedArray.
+
+ - Raise error renaming non existing key (#61)
+
+ Add error when specified key is not exist.
+
+ - Fix DataFrame#rename #assign by array (#65)
+
+ - New features and improvements
+
+ - Support Arrow 9.0.0
+ - Upgrade to Arrow 9.0.0 (#59)
+ - Add Vector#quantile method (#59)
+ Arrow::QuantileOptions has supported in Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
+
+ - Add Vector#quantiles (#62)
+
+ - Add DataFrame#each_row (#56)
+ - Returns Enumerator if block is not given.
+ - Change DataFrame#each_row to return a Hash {key => row} (#63)
+
+ - Refactor to use pattern match in overloaded parameter parsing (#61)
+ - Refine DataFrame.new to use pattern match
+ - Use pattern match in DataFrame#assign
+ - Use pattern match in DataFrame#rename
+
+ - Accept Array for renamer/assigner in #rename/#assign (#61)
+ - Accept assigner by Arrays in DataFrame#assign
+ - Accept renamer pairs by Arrays in DataFrame#rename
+ - Add DataFrame#assign_left method
+
+ - Add summary/describe (#62)
+ - Introduce DataFrame#summary(#describe)
+
+ - Introduce reshaping methods for DataFrame (#64)
+ - Introduce DataFrame#transpose method
+ - Intorduce DataFrame#to_long method
+ - Intorduce DataFrame#to_wide method
+
+ - Others
+
+ - Add alias sort_index for array_sort_indices (#59)
+ - Enable :width option in DataFrame#to_s (#62)
+ - Add options to DataFrame#format_table (#62)
+
+ - Update Documents
+
+ - Add Yard doc for some methods
+
+ - Update Jupyter notebook '61 Examples of Red Amber' (#65)
+
+ ## [0.1.8] - 2022-08-04 (experimental)
+
+ - Bug fixes
+
+ - Fix unnamed column in table formatter (#52)
+ - Fix DataFrame#key?, DataFrame#key_index when @keys.nil? (#52)
+ - Align order of replacer in Vector#replace (#53, resolved #38)
+
+ - New features and improvements
+
+ - Refine DataFrame.new for empty arguments (#50)
+ - Delete .rubocop_todo.yml for not to use yoda condition (#50)
+
+ - Refine Group (#52, resolved #28)
+ - Refine Group methods creation
+ - Make group key at first(left)
+ - Show only one group count when same counts
+ - Add block acceptability for group
+ - Rename empty key to :unnamed in DataFrame.new
+ - Rename Group#aggregated_by to #summarize (#54)
+
+ - Add Vector#shift (#51)
+
+ - Vector#[] accepts Range as an argument (#51)
+
+ - Update documents
+
+ - Add support for yard (#54)
+
+ - Renew jupyter notebook '53 examples' (#54)
+
+ - Add more examples and images in README (#52)
+ - Add document of group manipulations in README (#52)
+ - Renew DF#group document in DataFrame.md (#52)
 
  ## [0.1.7] - 2022-07-15 (experimental)
 
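Note: the 0.2.0 entries above mention parsing overloaded parameters with pattern matching and accepting renamer pairs as Arrays. The sketch below illustrates that general technique in plain Ruby (`case`/`in`, available since Ruby 2.7). The method name `parse_renamer` and the accepted argument shapes are assumptions made for illustration; this is not RedAmber's actual implementation.

```ruby
# Hypothetical parser: normalizes several renamer shapes into one Hash.
def parse_renamer(*args)
  case args
  in [Hash => pairs]
    # parse_renamer({ old: :new })
    pairs
  in [[Symbol | String, Symbol | String] => pair]
    # parse_renamer([:old, :new])
    [pair].to_h
  in [[[_, _], *] => pairs]
    # parse_renamer([[:old1, :new1], [:old2, :new2]])
    pairs.to_h
  in [Symbol | String => from, Symbol | String => to]
    # parse_renamer(:old, :new)
    { from => to }
  else
    raise ArgumentError, "invalid renamer: #{args.inspect}"
  end
end
```

Each `in` branch both checks the shape of the arguments and binds the parts it needs, which replaces a chain of `is_a?` checks with one declarative dispatch.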
data/Gemfile CHANGED
@@ -7,7 +7,7 @@ gemspec
  group :test do
  gem 'rake'
 
- gem 'red-parquet', '>= 8.0.0'
+ gem 'red-parquet', '>= 9.0.0'
  gem 'rover-df', '~> 0.3.0'
 
  gem 'rubocop'
@@ -18,6 +18,7 @@ group :test do
  gem 'iruby'
  gem 'test-unit'
  gem 'webrick'
+ gem 'yard'
 
  gem 'benchmark_driver'
  gem 'red-datasets'
data/README.md CHANGED
@@ -3,17 +3,23 @@
  [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
  [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
 
- A simple dataframe library for Ruby (experimental).
+ A simple dataframe library for Ruby.
 
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 
  ## Requirements
 
+ Supported Ruby version is >= 2.7.
+
+ Since v0.2.0, this library uses pattern matching which is an experimental feature in 2.7 . It is usable but a warning message will be shown in 2.7 .
+ I recommend Ruby 3 for performance.
+
  ```ruby
- gem 'red-arrow', '>= 8.0.0'
+ # Libraries required
+ gem 'red-arrow', '>= 9.0.0'
 
- gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
+ gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
  gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
  ```
 
@@ -21,9 +27,9 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 
  Install requirements before you install Red Amber.
 
- - Apache Arrow GLib (>= 8.0.0)
+ - Apache Arrow GLib (>= 9.0.0)
 
- - Apache Parquet GLib (>= 8.0.0) # If you use IO from/to parquet
+ - Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
 
  See [Apache Arrow install document](https://arrow.apache.org/install/).
 
@@ -47,16 +53,27 @@ Or install it yourself as:
  gem install red_amber
  ```
 
+ ## Docker image and Jupyter Notebook
+
+ [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
+
+ Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb).
+ [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb)
+
+
+
  ## `RedAmber::DataFrame`
 
- Represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
+ It represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
+
+ ![dataframe model of RedAmber](doc/image/dataframe_model.png)
 
  ```ruby
  require 'red_amber' # require 'red-amber' is also OK.
  require 'datasets-arrow'
 
  arrow = Datasets::Penguins.new.to_arrow
- RedAmber::DataFrame.new(arrow)
+ penguins = RedAmber::DataFrame.new(arrow)
 
  # =>
  #<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
@@ -73,17 +90,52 @@ RedAmber::DataFrame.new(arrow)
  344 Gentoo Biscoe 49.9 16.1 213 ... 2009
  ```
 
- ### DataFrame model
- ![dataframe model of RedAmber](doc/image/dataframe_model.png)
+ For example, `DataFrame#pick` accepts keys as arguments and returns a sub DataFrame.
 
- For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
+ ![pick method image](doc/image/dataframe/pick.png)
 
  ```ruby
- df = penguins.pick(:body_mass_g)
+ penguins.keys
+ # =>
+ [:species,
+ :island,
+ :bill_length_mm,
+ :bill_depth_mm,
+ :flipper_length_mm,
+ :body_mass_g,
+ :sex,
+ :year]
+
+ df = penguins.pick(:species, :island, :body_mass_g)
  df
 
  # =>
- #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000015cc0>
+ #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
+ species island body_mass_g
+ <string> <string> <uint16>
+ 1 Adelie Torgersen 3750
+ 2 Adelie Torgersen 3800
+ 3 Adelie Torgersen 3250
+ 4 Adelie Torgersen (nil)
+ 5 Adelie Torgersen 3450
+ : : : :
+ 342 Gentoo Biscoe 5750
+ 343 Gentoo Biscoe 5200
+ 344 Gentoo Biscoe 5400
+ ```
+
+ `DataFrame#drop` drops some columns to create a remainer DataFrame.
+
+ ![drop method image](doc/image/dataframe/drop.png)
+
+ You can specify by keys or a boolean array of same size as n_keys.
+
+ ```ruby
+ # Same as df.drop(:species, :island)
+ df = df.drop(true, true, false)
+
+ # =>
+ #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
  body_mass_g
  <uint16>
  1 3750
@@ -97,9 +149,14 @@ df
  344 5400
  ```
 
- `DataFrame#assign` creates new variables (column in the table).
+ Arrow data is immutable, so these methods always return an new object.
+
+ `DataFrame#assign` creates new columns or update existing columns.
+
+ ![assign method image](doc/image/dataframe/assign.png)
 
  ```ruby
+ # New column is created because ':body_mass_kg' is a new key.
  df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
 
  # =>
117
174
  344 5400 5.4
118
175
  ```
119
176
 
177
+ `DataFrame#slice` selects rows (observations) to create a sub DataFrame.
178
+
179
+ ![slice method image](doc/image/dataframe/slice.png)
180
+
181
+ ```ruby
182
+ # returns 5 rows at the start and 5 rows from the end
183
+ penguins.slice(0...5, -5..-1)
184
+
185
+ # =>
186
+ #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
187
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
188
+ <string> <string> <double> <double> <uint8> ... <uint16>
189
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
190
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
191
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
192
+ 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
193
+ 5 Adelie Torgersen 36.7 19.3 193 ... 2007
194
+ : : : : : : ... :
195
+ 8 Gentoo Biscoe 50.4 15.7 222 ... 2009
196
+ 9 Gentoo Biscoe 45.2 14.8 212 ... 2009
197
+ 10 Gentoo Biscoe 49.9 16.1 213 ... 2009
198
+ ```
199
+
200
+ `DataFrame#remove` rejects rows (observations) to create a remainer DataFrame.
201
+
202
+ ![remove method image](doc/image/dataframe/remove.png)
203
+
204
+ ```ruby
205
+ # penguins[:bill_length_mm] < 40 returns a boolean Vector
206
+ penguins.remove(penguins[:bill_length_mm] < 40)
207
+
208
+ # =>
209
+ #<RedAmber::DataFrame : 244 x 8 Vectors, 0x000000000007d6f4>
210
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
211
+ <string> <string> <double> <double> <uint8> ... <uint16>
212
+ 1 Adelie Torgersen 40.3 18.0 195 ... 2007
213
+ 2 Adelie Torgersen (nil) (nil) (nil) ... 2007
214
+ 3 Adelie Torgersen 42.0 20.2 190 ... 2007
215
+ 4 Adelie Torgersen 41.1 17.6 182 ... 2007
216
+ 5 Adelie Torgersen 42.5 20.7 197 ... 2007
217
+ : : : : : : ... :
218
+ 242 Gentoo Biscoe 50.4 15.7 222 ... 2009
219
+ 243 Gentoo Biscoe 45.2 14.8 212 ... 2009
220
+ 244 Gentoo Biscoe 49.9 16.1 213 ... 2009
221
+ ```
222
+
120
223
  DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
121
224
 
122
- This is an exaple to eliminate observations (row in the table) containing nil.
225
+ Previous example is also OK with a block.
226
+
227
+ ```ruby
228
+ penguins.remove { bill_length_mm < 40 }
229
+ ```
230
+
231
+ Next example is an usage of block to update a column.
123
232
 
124
233
  ```ruby
125
- # remove all observation contains nil
234
+ df = RedAmber::DataFrame.new(
235
+ integer: [0, 1, 2, 3, nil],
236
+ float: [0.0, 1.1, 2.2, Float::NAN, nil],
237
+ string: ['A', 'B', 'C', 'D', nil],
238
+ boolean: [true, false, true, false, nil])
239
+ df
240
+
241
+ # =>
242
+ #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
243
+ integer float string boolean
244
+ <uint8> <double> <string> <boolean>
245
+ 1 0 0.0 A true
246
+ 2 1 1.1 B false
247
+ 3 2 2.2 C true
248
+ 4 3 NaN D false
249
+ 5 (nil) (nil) (nil) (nil)
250
+
251
+ df.assign do
252
+ vectors.select(&:float?).map { |v| [v.key, -v] }
253
+ # => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
254
+ end
255
+
256
+ # =>
257
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
258
+ index float string
259
+ <uint8> <double> <string>
260
+ 1 0 -0.0 A
261
+ 2 1 -1.1 B
262
+ 3 2 -2.2 C
263
+ 4 3 NaN D
264
+ 5 (nil) (nil) (nil)
265
+ ```
266
+
267
+ Next example is to eliminate rows containing nil.
268
+
269
+ ```ruby
270
+ # remove all observations containing nil
126
271
  nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
127
272
  nil_removed.tdr
273
+
128
274
  # =>
129
275
  RedAmber::DataFrame : 342 x 8 Vectors
130
276
  Vectors : 5 numeric, 3 strings
@@ -145,12 +291,66 @@ For this frequently needed task, we can do it much simpler.
  penguins.remove_nil # => same result as above
  ```
 
- See [DataFrame.md](doc/DataFrame.md) for details.
+ `DataFrame#summary` shows summary statistics in a DataFrame.
+
+ ```ruby
+ puts penguins.summary.to_s(width: 82)
+
+ # =>
+ variables count mean std min 25% median 75% max
+ <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
+ 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
+ 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
+ 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
+ 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
+ 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
+ ```
+
+ `DataFrame#group` method can be used for the grouping tasks.
+
+ ```ruby
+ starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
+ starwars
+
+ # =>
+ #<RedAmber::DataFrame : 87 x 12 Vectors, 0x000000000000607c>
+ unnamed1 name height mass hair_color skin_color eye_color ... species
+ <int64> <string> <int64> <double> <string> <string> <string> ... <string>
+ 1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
+ 2 2 C-3PO 167 75.0 NA gold yellow ... Droid
+ 3 3 R2-D2 96 32.0 NA white, blue red ... Droid
+ 4 4 Darth Vader 202 136.0 none white yellow ... Human
+ 5 5 Leia Organa 150 49.0 brown light brown ... Human
+ : : : : : : : : ... :
+ 85 85 BB8 (nil) (nil) none none black ... Droid
+ 86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
+ 87 87 Padmé Amidala 165 45.0 brown light brown ... Human
+
+ starwars.group(:species) { [count(:species), mean(:height, :mass)] }
+ .slice { count > 1 }
+
+ # =>
+ #<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
+ species count mean(height) mean(mass)
+ <string> <int64> <double> <double>
+ 1 Human 35 176.6 82.8
+ 2 Droid 6 131.2 69.8
+ 3 Wookiee 2 231.0 124.0
+ 4 Gungan 3 208.7 74.0
+ 5 NA 4 181.3 48.0
+ 6 Zabrak 2 173.0 80.0
+ 7 Twi'lek 2 179.0 55.0
+ 8 Mirialan 2 168.0 53.1
+ 9 Kaminoan 2 221.0 88.0
+ ```
+
+ See [DataFrame.md](doc/DataFrame.md) for other examples and details.
 
 
  ## `RedAmber::Vector`
 
  Class `RedAmber::Vector` represents a series of data in the DataFrame.
+ Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
 
  ```ruby
  penguins[:bill_length_mm]
@@ -161,11 +361,34 @@ penguins[:bill_length_mm]
 
  Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
 
+ This is an element-wise comparison and returns a boolean Vector of same size.
+
+ ![unary element-wise](doc/image/vector/unary_element_wise.png)
+
+ ```ruby
+ penguins[:bill_length_mm] < 40
+
+ # =>
+ #<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
+ [true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
+ ```
+
+ Next example returns aggregated result.
+
+ ![unary aggregation](doc/image/vector/unary_aggregation.png)
+
+ ```ruby
+ penguins[:bill_length_mm].mean
+ 43.92192982456141
+ # =>
+
+ ```
+
  See [Vector.md](doc/Vector.md) for details.
 
  ## Jupyter notebook
 
- [47 Examples of Red Amber](doc/47_examples_of_red_amber.ipynb)
+ [71 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
 
  ## Development
 
@@ -176,6 +399,12 @@ bundle install
  bundle exec rake test
  ```
 
+ I will appreciate if you could help to improve this project. Here are a few ways you can help:
+
+ - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
+ - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
+ - Write, clarify, or fix documentation
+
  ## License
 
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
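
Note: the README changes above say that `DataFrame#drop` accepts either keys or a boolean array with one entry per key, and that the methods return a new object rather than mutating the receiver. The toy model below sketches those two ideas in plain Ruby, with a Hash of Arrays standing in for a DataFrame. The name `toy_drop` and the Hash representation are assumptions for illustration; this is not how RedAmber implements `drop`.

```ruby
# Toy column store: a Hash of key => Array stands in for a DataFrame.
# `selector` is either a list of keys or a boolean mask with one entry per key.
def toy_drop(columns, *selector)
  keys = columns.keys
  dropped =
    if selector.all? { |s| s == true || s == false }
      unless selector.size == keys.size
        raise ArgumentError, 'mask size must equal the number of keys'
      end
      # Keep the keys whose mask entry is true; these are the ones to drop.
      keys.select.with_index { |_, i| selector[i] }
    else
      selector
    end
  # Hash#reject returns a new Hash, so the input is left untouched.
  columns.reject { |key, _| dropped.include?(key) }
end

toy = {
  species: %w[Adelie Gentoo],
  island: %w[Torgersen Biscoe],
  body_mass_g: [3750, 5400]
}
by_mask = toy_drop(toy, true, true, false)
by_keys = toy_drop(toy, :species, :island)
```

Both calls leave only `:body_mass_g`, and `toy` itself still holds all three columns afterwards.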