red_amber 0.1.7 → 0.2.1
- checksums.yaml +4 -4
- data/.rubocop.yml +12 -2
- data/.rubocop_todo.yml +2 -15
- data/.yardopts +1 -0
- data/CHANGELOG.md +164 -2
- data/Gemfile +2 -1
- data/README.md +246 -17
- data/doc/DataFrame.md +392 -129
- data/doc/Vector.md +37 -19
- data/doc/examples_of_red_amber.ipynb +8979 -0
- data/lib/red_amber/data_frame.rb +138 -24
- data/lib/red_amber/data_frame_displayable.rb +35 -18
- data/lib/red_amber/data_frame_reshaping.rb +85 -0
- data/lib/red_amber/data_frame_selectable.rb +53 -9
- data/lib/red_amber/data_frame_variable_operation.rb +130 -50
- data/lib/red_amber/group.rb +29 -27
- data/lib/red_amber/vector.rb +1 -1
- data/lib/red_amber/vector_functions.rb +65 -23
- data/lib/red_amber/vector_selectable.rb +12 -9
- data/lib/red_amber/vector_updatable.rb +22 -1
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +1 -1
- data/red_amber.gemspec +1 -1
- metadata +7 -5
- data/doc/47_examples_of_red_amber.ipynb +0 -4872
checksums.yaml (CHANGED)

````diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d239a3fa90e5796fb695f8d3c4995d0a2178ea7c8c2789bed157e688902585cb
+  data.tar.gz: 968c02294d24a3dabaa6e5128be0bcfad713e131df15850ac0ceb64c2883dcd0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d1c5ffd9650dd8c9e825514cd7e2ff4914690bd731ac262fca6cc17e56c1e312679689351a05fb741dccfb59377214706a8bf6ca6fe3237ca46fb623ae1b9f10
+  data.tar.gz: f37c4aff9170cd5105737a9d2b3d827051254dcca6968b697f5ed3a70e1b2c3cb14303e88a9c342870d1447450a538e445d6f3d37de53591d3f6d13b87aebc16
````
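The new values above are the SHA256 and SHA512 digests that `checksums.yaml` records for the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. As a minimal sketch, those digests can be recomputed with Ruby's standard `digest` library and the tar reader that ships with RubyGems. The helper name `gem_entry_digests` is hypothetical, and this is not the verification path RubyGems itself uses.

```ruby
require 'digest'
require 'rubygems/package'

# Hypothetical helper: recompute the SHA256/SHA512 hex digests of the
# metadata.gz and data.tar.gz members of a .gem file (a plain tar archive).
# Returns { 'SHA256' => { name => hex }, 'SHA512' => { name => hex } },
# mirroring the layout of checksums.yaml.
def gem_entry_digests(io)
  wanted = %w[metadata.gz data.tar.gz]
  digests = { 'SHA256' => {}, 'SHA512' => {} }
  Gem::Package::TarReader.new(io) do |tar|
    tar.each do |entry|
      next unless wanted.include?(entry.full_name)

      data = entry.read || '' # read may return nil for an empty member
      digests['SHA256'][entry.full_name] = Digest::SHA256.hexdigest(data)
      digests['SHA512'][entry.full_name] = Digest::SHA512.hexdigest(data)
    end
  end
  digests
end

# Usage (the path is illustrative):
# File.open('red_amber-0.2.1.gem', 'rb') { |f| pp gem_entry_digests(f) }
```

For a downloaded `red_amber-0.2.1.gem`, the returned hex strings should match the `+` lines of this hunk.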
data/.rubocop.yml (CHANGED)

````diff
@@ -43,6 +43,11 @@ Lint/BinaryOperatorWithIdenticalOperands:
   Exclude:
     - 'test/test_vector_function.rb'
 
+# Need for test with empty block
+Lint/EmptyBlock:
+  Exclude:
+    - 'test/test_group.rb'
+
 # Max: 120
 Layout/LineLength:
   Max: 118
@@ -56,6 +61,7 @@ Metrics/AbcSize:
   Max: 30
   Exclude:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
+    - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
     - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
     - 'lib/red_amber/vector_updatable.rb' # Max: 36
     - 'lib/red_amber/vector_selectable.rb' # Max: 33
@@ -78,23 +84,27 @@ Metrics/ClassLength:
 Metrics/CyclomaticComplexity:
   Max: 12
   Exclude:
+    - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
     - 'lib/red_amber/data_frame_selectable.rb' # Max: 14
+    - 'lib/red_amber/vector_selectable.rb' # Max: 13
     - 'lib/red_amber/vector_updatable.rb' # Max: 14
-    - 'lib/red_amber/data_frame_displayable.rb' # Max: 18
 
 # Max: 10
 Metrics/MethodLength:
   Max: 30
   Exclude:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
+    - 'lib/red_amber/data_frame_selectable.rb' # Max: 38
+    - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 35
 
 # Max: 100
 Metrics/ModuleLength:
   Max: 100
   Exclude:
+    - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
     - 'lib/red_amber/data_frame_selectable.rb' # Max: 141
+    - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
     - 'lib/red_amber/vector_functions.rb' # Max: 114
-    - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
 
 # Max: 8
 Metrics/PerceivedComplexity:
````
data/.rubocop_todo.yml (CHANGED)

````diff
@@ -1,15 +1,2 @@
-#
-#
-# on 2022-05-08 02:37:36 UTC using RuboCop version 1.27.0.
-# The point is for the user to remove these configuration records
-# one by one as the offenses are removed from the code base.
-# Note that changes in the inspected code, or installation of new
-# versions of RuboCop, may require this file to be generated again.
-
-# Offense count: 1
-# This cop supports unsafe auto-correction (--auto-correct-all).
-# Configuration parameters: EnforcedStyle.
-# SupportedStyles: forbid_for_all_comparison_operators, forbid_for_equality_operators_only, require_for_all_comparison_operators, require_for_equality_operators_only
-Style/YodaCondition:
-  Exclude:
-    - 'lib/red_amber/data_frame.rb'
+# We will use cops to detect bugs in an early stage
+# Feel free to use .rubocop_todo.yml by --auto-gen-config
````
data/.yardopts (ADDED)

````diff
@@ -0,0 +1 @@
+--output-dir doc/yard
````
data/CHANGELOG.md (CHANGED)

````diff
@@ -1,6 +1,168 @@
-## [0.1
+## [0.2.1] - 2022-09-07
 
--
+- Bug fixes
+
+- Fix `Vector#each` with block (#66)
+`Vector#each` will return value of each element with block.
+
+- Fix table format at size == 9 (#67)
+
+- Fix to support Vector in `DataFrame#assign` (#77)
+
+- Add `assert_delta` functionality for `assert_with_NaN` (#78)
+
+- Fix Vector#is_in when self is chunked (#79)
+
+- Fix Array type error (uint/int) (#79)
+
+- New features and improvements
+
+- Refine `DataFrame#indices` method (#67)
+
+- Update DataFrame reshaping methods (#73)
+
+- Change default option value of DataFrame reshaping
+
+- Change the order of import_cars example
+
+- Add `DataFrame#method_missing` to get column vector by method (#75)
+
+- Add `DataFrame#method_missing` to get column (#75)
+
+- Accept both args and block in `DataFrame#assign` (#75)
+
+- Accept indices in `DataFrame#pick` and `DataFrame#drop` (#76)
+
+- Add `DataFrame#slice_by` method (#77)
+
+- Add new Vector functions (#78)
+
+- Add inverse trigonometric function for Vector
+- `acos`
+- `asin`
+
+- Add logarithmic function for Vector
+- `ln`
+- `log10`
+- `log1p`
+- `log2`
+
+- Add binary function `Vector#logb`
+
+- Docker image and Jupyter Notebook (Thanks to @mrkn)
+- Add link to RubyData in README
+- Add link to interactive README by Binder
+
+- Update Jupyter Notebook `71 examples of RedAmber`
+
+
````
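The 0.2.1 entry adds `Vector#logb` as a binary function next to the unary logarithms. A plain-Ruby sketch of what an element-wise `logb` computes (the logarithm of each element in a given base, with `nil` passed through) is below. `logb_sketch` is a hypothetical name, it works on Arrays rather than Vectors, and it assumes positive inputs, since `Math.log` raises `Math::DomainError` for negative numbers.

```ruby
# Hypothetical, plain-Ruby illustration (not RedAmber's implementation):
# element-wise logarithm of `values` in the given `base`.
# nil elements stay nil, in the way null values pass through Arrow functions.
# Assumes positive numbers; Math.log raises Math::DomainError for negatives.
def logb_sketch(values, base)
  values.map { |x| x.nil? ? nil : Math.log(x, base) }
end

p logb_sketch([1, 8, nil], 2)
p logb_sketch([10, 100, 1000], 10)
```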
````diff
+## [0.2.0] - 2022-08-15
+
+- Bump version up to 0.2.0
+
+- Bug fixes
+
+- Fix order of multiple group keys (#55)
+
+Only 1 group key comes to the left. Other keys remain on the right.
+
+- Remove optional `require` for rover (#55)
+
+Fix DataFrame.new for argument with Rover::DataFrame.
+
+- Fix occasional failure in CI (#59)
+
+Sometimes the CI test fails. I added the -dev dependency
+to the Arrow install by apt, instead of doing it in bundler.
+
+- Fix calling :take in V#[] (#56)
+
+Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This will prevent the error below
+when called with Arrow::ChunkedArray.
+
+- Raise error renaming non existing key (#61)
+
+Add error when the specified key does not exist.
+
+- Fix DataFrame#rename #assign by array (#65)
+
+- New features and improvements
+
+- Support Arrow 9.0.0
+- Upgrade to Arrow 9.0.0 (#59)
+- Add Vector#quantile method (#59)
+Arrow::QuantileOptions is supported in Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
+
+- Add Vector#quantiles (#62)
+
````
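`Vector#quantile` relies on `Arrow::QuantileOptions`. To show what a single quantile means, here is a plain-Ruby sketch that interpolates linearly between the two nearest ranks and skips `nil`. The helper name is hypothetical, and linear interpolation is an assumption chosen for illustration; Arrow's quantile options also offer other interpolation modes.

```ruby
# Hypothetical plain-Ruby sketch, not RedAmber's code.
# Quantile q (0.0..1.0) with linear interpolation between the two
# nearest ranks of the sorted, nil-free values.
def quantile_sketch(values, q = 0.5)
  raise ArgumentError, 'q must be in 0..1' unless (0.0..1.0).cover?(q)

  sorted = values.compact.sort
  return nil if sorted.empty?

  pos = q * (sorted.size - 1) # fractional rank
  lower = pos.floor
  upper = pos.ceil
  frac = pos - lower
  sorted[lower] + (sorted[upper] - sorted[lower]) * frac
end

p quantile_sketch([39.1, 39.5, 40.3, nil, 36.7])       # median
p quantile_sketch([39.1, 39.5, 40.3, nil, 36.7], 0.25) # first quartile
```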
````diff
+- Add DataFrame#each_row (#56)
+- Returns Enumerator if block is not given.
+- Change DataFrame#each_row to return a Hash {key => row} (#63)
+
+- Refactor to use pattern match in overloaded parameter parsing (#61)
+- Refine DataFrame.new to use pattern match
+- Use pattern match in DataFrame#assign
+- Use pattern match in DataFrame#rename
+
+- Accept Array for renamer/assigner in #rename/#assign (#61)
+- Accept assigner by Arrays in DataFrame#assign
+- Accept renamer pairs by Arrays in DataFrame#rename
+- Add DataFrame#assign_left method
+
+- Add summary/describe (#62)
+- Introduce DataFrame#summary(#describe)
+
+- Introduce reshaping methods for DataFrame (#64)
+- Introduce DataFrame#transpose method
+- Introduce DataFrame#to_long method
+- Introduce DataFrame#to_wide method
+
+- Others
+
+- Add alias sort_index for array_sort_indices (#59)
+- Enable :width option in DataFrame#to_s (#62)
+- Add options to DataFrame#format_table (#62)
+
+- Update Documents
+
+- Add Yard doc for some methods
+
+- Update Jupyter notebook '61 Examples of Red Amber' (#65)
+
````
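The 0.2.0 entries describe `DataFrame#each_row` as yielding each row as a Hash and returning an Enumerator when no block is given. A minimal plain-Ruby sketch of that behavior over column-oriented data is below, reading `{key => row}` as one value per key. `each_row_sketch` and its Hash-of-Arrays input are hypothetical stand-ins for a DataFrame, not RedAmber's code, and returning the input when a block is given is an assumption borrowed from `Array#each`.

```ruby
# Hypothetical plain-Ruby sketch, not RedAmber's code.
# `columns` is a Hash of equal-length Arrays standing in for a DataFrame.
# Each row is yielded as a Hash {key => value}; without a block an
# Enumerator is returned, like Array#each.
def each_row_sketch(columns)
  return to_enum(__method__, columns) unless block_given?

  n_rows = columns.empty? ? 0 : columns.values.first.size
  n_rows.times do |i|
    yield columns.transform_values { |column| column[i] }
  end
  columns
end

each_row_sketch({ name: %w[Luke Leia], height: [172, 150] }) do |row|
  puts "#{row[:name]}: #{row[:height]}"
end
```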
````diff
+## [0.1.8] - 2022-08-04 (experimental)
+
+- Bug fixes
+
+- Fix unnamed column in table formatter (#52)
+- Fix DataFrame#key?, DataFrame#key_index when @keys.nil? (#52)
+- Align order of replacer in Vector#replace (#53, resolved #38)
+
+- New features and improvements
+
+- Refine DataFrame.new for empty arguments (#50)
+- Delete .rubocop_todo.yml so as not to use yoda condition (#50)
+
+- Refine Group (#52, resolved #28)
+- Refine Group methods creation
+- Make group key at first(left)
+- Show only one group count when same counts
+- Add block acceptability for group
+- Rename empty key to :unnamed in DataFrame.new
+- Rename Group#aggregated_by to #summarize (#54)
+
+- Add Vector#shift (#51)
+
+- Vector#[] accepts Range as an argument (#51)
+
+- Update documents
+
+- Add support for yard (#54)
+
+- Renew jupyter notebook '53 examples' (#54)
+
+- Add more examples and images in README (#52)
+- Add document of group manipulations in README (#52)
+- Renew DF#group document in DataFrame.md (#52)
 
 ## [0.1.7] - 2022-07-15 (experimental)
 
````
data/Gemfile (CHANGED)

````diff
@@ -7,7 +7,7 @@ gemspec
 group :test do
   gem 'rake'
 
-  gem 'red-parquet', '>=
+  gem 'red-parquet', '>= 9.0.0'
   gem 'rover-df', '~> 0.3.0'
 
   gem 'rubocop'
@@ -18,6 +18,7 @@ group :test do
   gem 'iruby'
   gem 'test-unit'
   gem 'webrick'
+  gem 'yard'
 
   gem 'benchmark_driver'
   gem 'red-datasets'
````
data/README.md (CHANGED)

````diff
@@ -3,17 +3,23 @@
 [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
 [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
 
-A simple dataframe library for Ruby
+A simple dataframe library for Ruby.
 
 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
 - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 
 ## Requirements
 
+Supported Ruby version is >= 2.7.
+
+Since v0.2.0, this library uses pattern matching, which is an experimental feature in Ruby 2.7. It works, but a warning message is shown in 2.7.
+I recommend Ruby 3 for performance.
+
 ```ruby
-
+# Libraries required
+gem 'red-arrow', '>= 9.0.0'
 
-gem 'red-parquet', '>=
+gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
 gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 ```
 
````
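The requirements note says the library uses pattern matching since v0.2.0, and the CHANGELOG mentions using it for overloaded parameter parsing. The sketch below shows the general `case ... in` technique for dispatching on the shape of the arguments. `describe_args` and the shapes it accepts are hypothetical and do not reproduce RedAmber's parser. It runs without warnings on Ruby 3.x; on Ruby 2.7, `case ... in` prints the experimental-feature warning.

```ruby
# Hypothetical example of dispatching on argument shape with `case ... in`.
# It illustrates the technique only; it is not RedAmber's argument parser.
def describe_args(*args)
  case args
  in []
    'no arguments'
  in [Hash => hash]
    "a Hash with keys #{hash.keys}"
  in [Array => pairs] if pairs.all? { |pair| pair.is_a?(Array) && pair.size == 2 }
    "#{pairs.size} key/value pairs"
  in [*] if args.all?(Symbol)
    "#{args.size} keys"
  else
    raise ArgumentError, "unsupported arguments: #{args.inspect}"
  end
end

puts describe_args(:species, :island)
puts describe_args({ body_mass_kg: [3.75] })
puts describe_args([[:old_name, :new_name]])
```

Each `in` branch names one accepted shape, so adding a new overload is one more branch rather than another nested `if`.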
````diff
@@ -21,9 +27,9 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 
 Install requirements before you install Red Amber.
 
-- Apache Arrow GLib (>=
+- Apache Arrow GLib (>= 9.0.0)
 
-- Apache Parquet GLib (>=
+- Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
 
 See [Apache Arrow install document](https://arrow.apache.org/install/).
 
@@ -47,16 +53,27 @@ Or install it yourself as:
 gem install red_amber
 ```
 
+## Docker image and Jupyter Notebook
+
+[RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
+
+You can also try the contents of this README interactively on [Binder](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb).
+[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb)
+
+
+
 ## `RedAmber::DataFrame`
 
-
+It represents a set of data in a 2D shape. The underlying entity is a Red Arrow Table object.
+
+![dataframe model of RedAmber](doc/image/dataframe_model.png)
 
 ```ruby
 require 'red_amber' # require 'red-amber' is also OK.
 require 'datasets-arrow'
 
 arrow = Datasets::Penguins.new.to_arrow
-RedAmber::DataFrame.new(arrow)
+penguins = RedAmber::DataFrame.new(arrow)
 
 # =>
 #<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
````
````diff
@@ -73,17 +90,52 @@ RedAmber::DataFrame.new(arrow)
 344 Gentoo Biscoe 49.9 16.1 213 ... 2009
 ```
 
-
-![dataframe model of RedAmber](doc/image/dataframe_model.png)
+For example, `DataFrame#pick` accepts keys as arguments and returns a sub DataFrame.
 
-
+![pick method image](doc/image/dataframe/pick.png)
 
 ```ruby
-
+penguins.keys
+# =>
+[:species,
+ :island,
+ :bill_length_mm,
+ :bill_depth_mm,
+ :flipper_length_mm,
+ :body_mass_g,
+ :sex,
+ :year]
+
+df = penguins.pick(:species, :island, :body_mass_g)
 df
 
 # =>
-#<RedAmber::DataFrame : 344 x
+#<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
+species island body_mass_g
+<string> <string> <uint16>
+1 Adelie Torgersen 3750
+2 Adelie Torgersen 3800
+3 Adelie Torgersen 3250
+4 Adelie Torgersen (nil)
+5 Adelie Torgersen 3450
+: : : :
+342 Gentoo Biscoe 5750
+343 Gentoo Biscoe 5200
+344 Gentoo Biscoe 5400
+```
+
+`DataFrame#drop` drops some columns to create a remainder DataFrame.
+
+![drop method image](doc/image/dataframe/drop.png)
+
+You can specify the columns by keys or by a boolean array of the same size as n_keys.
+
+```ruby
+# Same as df.drop(:species, :island)
+df = df.drop(true, true, false)
+
+# =>
+#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
 body_mass_g
 <uint16>
 1 3750
````
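The README states that `DataFrame#drop` takes either keys or a boolean array whose size equals `n_keys`, with `drop(true, true, false)` being the same as `drop(:species, :island)`. A plain-Ruby sketch of that selection rule on an Array of keys is below. `drop_keys_sketch` is hypothetical and only computes the remaining keys, not a new DataFrame.

```ruby
# Hypothetical plain-Ruby sketch of the drop rule, not RedAmber's code.
# `keys` is the Array of column keys; `spec` is either keys to drop or one
# boolean per key (true = drop).
def drop_keys_sketch(keys, *spec)
  if !spec.empty? && spec.all? { |s| s == true || s == false }
    unless spec.size == keys.size
      raise ArgumentError, "expected #{keys.size} booleans, got #{spec.size}"
    end

    keys.zip(spec).reject { |_key, drop| drop }.map(&:first)
  else
    unknown = spec - keys
    raise ArgumentError, "unknown keys: #{unknown}" unless unknown.empty?

    keys - spec
  end
end

keys = %i[species island body_mass_g]
p drop_keys_sketch(keys, true, true, false) # prints [:body_mass_g]
p drop_keys_sketch(keys, :species, :island) # prints [:body_mass_g]
```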
````diff
@@ -97,9 +149,14 @@ df
 344 5400
 ```
 
-
+Arrow data is immutable, so these methods always return a new object.
+
+`DataFrame#assign` creates new columns or updates existing columns.
+
+![assign method image](doc/image/dataframe/assign.png)
 
 ```ruby
+# New column is created because ':body_mass_kg' is a new key.
 df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
 
 # =>
````
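The `assign` rule in this hunk (a new key creates a column, an existing key updates it, and a new object is returned because Arrow data is immutable) can be sketched in plain Ruby with a Hash of column Arrays. `assign_sketch` is hypothetical. Placing new columns on the right follows the `body_mass_kg` example; keeping an updated column in its original position is an assumption of this sketch.

```ruby
# Hypothetical plain-Ruby sketch, not RedAmber's code.
# `columns` and `new_columns` are Hashes of equal-length Arrays.
# Hash#merge returns a new Hash: existing keys keep their position with the
# new values, new keys are appended at the end, and `columns` is not mutated.
def assign_sketch(columns, new_columns)
  sizes = (columns.values + new_columns.values).map(&:size).uniq
  raise ArgumentError, 'all columns must have the same length' if sizes.size > 1

  columns.merge(new_columns)
end

df = { body_mass_g: [3750, 3800, 3250] }
p assign_sketch(df, { body_mass_kg: df[:body_mass_g].map { |g| g / 1000.0 } })
p df # unchanged
```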
````diff
@@ -117,14 +174,103 @@ df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
 344 5400 5.4
 ```
 
+`DataFrame#slice` selects rows (observations) to create a sub DataFrame.
+
+![slice method image](doc/image/dataframe/slice.png)
+
+```ruby
+# returns 5 rows at the start and 5 rows from the end
+penguins.slice(0...5, -5..-1)
+
+# =>
+#<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
+species island bill_length_mm bill_depth_mm flipper_length_mm ... year
+<string> <string> <double> <double> <uint8> ... <uint16>
+1 Adelie Torgersen 39.1 18.7 181 ... 2007
+2 Adelie Torgersen 39.5 17.4 186 ... 2007
+3 Adelie Torgersen 40.3 18.0 195 ... 2007
+4 Adelie Torgersen (nil) (nil) (nil) ... 2007
+5 Adelie Torgersen 36.7 19.3 193 ... 2007
+: : : : : : ... :
+8 Gentoo Biscoe 50.4 15.7 222 ... 2009
+9 Gentoo Biscoe 45.2 14.8 212 ... 2009
+10 Gentoo Biscoe 49.9 16.1 213 ... 2009
+```
+
````
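`penguins.slice(0...5, -5..-1)` mixes a forward Range with a negative one counted from the end. The plain-Ruby sketch below shows one way such arguments can be resolved into row indices. `slice_indices_sketch` is hypothetical and borrows `Array#[]`'s handling of negative ranges, which may differ from RedAmber's in edge cases.

```ruby
# Hypothetical plain-Ruby sketch, not RedAmber's code.
# Resolves Integer and Range arguments (negative values count from the end)
# into zero-based row indices for a frame with n_rows rows.
def slice_indices_sketch(n_rows, *specs)
  all = (0...n_rows).to_a
  specs.flat_map do |spec|
    case spec
    when Integer
      index = spec.negative? ? spec + n_rows : spec
      raise IndexError, "index #{spec} out of range" unless (0...n_rows).cover?(index)

      [index]
    when Range
      indices = all[spec]
      raise IndexError, "range #{spec} out of range" if indices.nil?

      indices
    else
      raise ArgumentError, "unsupported argument: #{spec.inspect}"
    end
  end
end

p slice_indices_sketch(344, 0...5, -5..-1) # 10 indices, as in the 10-row result
```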
````diff
+`DataFrame#remove` rejects rows (observations) to create a remainder DataFrame.
+
+![remove method image](doc/image/dataframe/remove.png)
+
+```ruby
+# penguins[:bill_length_mm] < 40 returns a boolean Vector
+penguins.remove(penguins[:bill_length_mm] < 40)
+
+# =>
+#<RedAmber::DataFrame : 244 x 8 Vectors, 0x000000000007d6f4>
+species island bill_length_mm bill_depth_mm flipper_length_mm ... year
+<string> <string> <double> <double> <uint8> ... <uint16>
+1 Adelie Torgersen 40.3 18.0 195 ... 2007
+2 Adelie Torgersen (nil) (nil) (nil) ... 2007
+3 Adelie Torgersen 42.0 20.2 190 ... 2007
+4 Adelie Torgersen 41.1 17.6 182 ... 2007
+5 Adelie Torgersen 42.5 20.7 197 ... 2007
+: : : : : : ... :
+242 Gentoo Biscoe 50.4 15.7 222 ... 2009
+243 Gentoo Biscoe 45.2 14.8 212 ... 2009
+244 Gentoo Biscoe 49.9 16.1 213 ... 2009
+```
+
````
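In the `remove` output, the row whose `bill_length_mm` is `(nil)` survives, so rows are dropped only where the mask is `true`; `false` and `nil` keep the row. A plain-Ruby sketch of that rule, using the first five `bill_length_mm` values of the penguins data shown in the `slice` output, is below. `remove_rows_sketch` is hypothetical and works on Arrays, not DataFrames.

```ruby
# Hypothetical plain-Ruby sketch, not RedAmber's code.
# Drops the rows whose mask value is true; false and nil keep the row.
def remove_rows_sketch(rows, mask)
  raise ArgumentError, 'mask size must equal row count' unless rows.size == mask.size

  rows.zip(mask).reject { |_row, flag| flag == true }.map(&:first)
end

bill_length_mm = [39.1, 39.5, 40.3, nil, 36.7]
# Element-wise comparison; nil stays nil, like the boolean Vector in the README.
mask = bill_length_mm.map { |x| x.nil? ? nil : x < 40 }
p remove_rows_sketch(bill_length_mm, mask) # prints [40.3, nil]
```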
````diff
 DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
 
-
+The previous example can also be written with a block.
+
+```ruby
+penguins.remove { bill_length_mm < 40 }
+```
+
+The next example uses a block to update a column.
 
 ```ruby
-
+df = RedAmber::DataFrame.new(
+  integer: [0, 1, 2, 3, nil],
+  float: [0.0, 1.1, 2.2, Float::NAN, nil],
+  string: ['A', 'B', 'C', 'D', nil],
+  boolean: [true, false, true, false, nil])
+df
+
+# =>
+#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
+integer float string boolean
+<uint8> <double> <string> <boolean>
+1 0 0.0 A true
+2 1 1.1 B false
+3 2 2.2 C true
+4 3 NaN D false
+5 (nil) (nil) (nil) (nil)
+
+df.assign do
+  vectors.select(&:float?).map { |v| [v.key, -v] }
+  # => returns [[:float, [-0.0, -1.1, -2.2, NAN, nil]]]
+end
+
+# =>
+#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
+index float string
+<uint8> <double> <string>
+1 0 -0.0 A
+2 1 -1.1 B
+3 2 -2.2 C
+4 3 NaN D
+5 (nil) (nil) (nil)
+```
+
+The next example removes rows containing nil.
+
+```ruby
+# remove all observations containing nil
 nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
 nil_removed.tdr
+
 # =>
 RedAmber::DataFrame : 342 x 8 Vectors
 Vectors : 5 numeric, 3 strings
````
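The block `vectors.map(&:is_nil).reduce(&:|)` builds one nil mask per column and ORs them element-wise, so a row is removed when any column is nil. The same idea in plain Ruby, over a hypothetical Hash of column Arrays modeled on the small `df` example in this hunk, looks like this. `remove_nil_rows_sketch` is not RedAmber's code. Note that `Float::NAN` is not `nil`, so a NaN row is kept.

```ruby
# Hypothetical plain-Ruby sketch, not RedAmber's code.
# `columns` is a Hash of equal-length Arrays standing in for a DataFrame.
def remove_nil_rows_sketch(columns)
  # One Array of true/false per column: is each element nil?
  masks = columns.values.map { |column| column.map(&:nil?) }
  return columns if masks.empty?

  # Element-wise OR across columns, like reduce(&:|) over boolean Vectors.
  any_nil = masks.reduce { |acc, mask| acc.zip(mask).map { |a, b| a || b } }
  keep = any_nil.each_index.reject { |i| any_nil[i] }
  columns.transform_values { |column| column.values_at(*keep) }
end

df = {
  integer: [0, 1, 2, 3, nil],
  float: [0.0, 1.1, 2.2, Float::NAN, nil],
  string: ['A', 'B', 'C', 'D', nil],
  boolean: [true, false, true, false, nil]
}
p remove_nil_rows_sketch(df) # the last row is dropped; the NaN row stays
```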
````diff
@@ -145,12 +291,66 @@ For this frequently needed task, we can do it much simpler.
 penguins.remove_nil # => same result as above
 ```
 
-
+`DataFrame#summary` shows summary statistics in a DataFrame.
+
+```ruby
+puts penguins.summary.to_s(width: 82)
+
+# =>
+variables count mean std min 25% median 75% max
+<dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
+1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
+2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
+3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
+4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
+5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
+```
+
+The `DataFrame#group` method can be used for grouping tasks.
+
+```ruby
+starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
+starwars
+
+# =>
+#<RedAmber::DataFrame : 87 x 12 Vectors, 0x000000000000607c>
+unnamed1 name height mass hair_color skin_color eye_color ... species
+<int64> <string> <int64> <double> <string> <string> <string> ... <string>
+1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
+2 2 C-3PO 167 75.0 NA gold yellow ... Droid
+3 3 R2-D2 96 32.0 NA white, blue red ... Droid
+4 4 Darth Vader 202 136.0 none white yellow ... Human
+5 5 Leia Organa 150 49.0 brown light brown ... Human
+: : : : : : : : ... :
+85 85 BB8 (nil) (nil) none none black ... Droid
+86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
+87 87 Padmé Amidala 165 45.0 brown light brown ... Human
+
+starwars.group(:species) { [count(:species), mean(:height, :mass)] }
+  .slice { count > 1 }
+
+# =>
+#<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
+species count mean(height) mean(mass)
+<string> <int64> <double> <double>
+1 Human 35 176.6 82.8
+2 Droid 6 131.2 69.8
+3 Wookiee 2 231.0 124.0
+4 Gungan 3 208.7 74.0
+5 NA 4 181.3 48.0
+6 Zabrak 2 173.0 80.0
+7 Twi'lek 2 179.0 55.0
+8 Mirialan 2 168.0 53.1
+9 Kaminoan 2 221.0 88.0
+```
+
````
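The `group` example counts rows per `species`, averages `height` and `mass`, and then keeps the groups with more than one member. A plain-Ruby sketch of the same computation on an Array of row Hashes is below, using a few rows from the `starwars` table. `group_summary_sketch` is hypothetical, its output keys only imitate the `mean(height)` headers, and skipping `nil` in the means and counting non-nil group values are assumptions of this sketch.

```ruby
# Hypothetical plain-Ruby sketch, not RedAmber's code.
# rows: Array of Hashes; key: grouping key; mean_keys: keys to average.
def group_summary_sketch(rows, key, mean_keys)
  rows.group_by { |row| row[key] }.map do |group_value, members|
    # Count members whose grouping value is not nil.
    summary = { key => group_value, count: members.count { |row| !row[key].nil? } }
    mean_keys.each do |k|
      values = members.map { |row| row[k] }.compact # skip nil
      summary[:"mean(#{k})"] = values.empty? ? nil : values.sum.fdiv(values.size)
    end
    summary
  end
end

rows = [
  { species: 'Human', height: 172, mass: 77.0 },
  { species: 'Droid', height: 167, mass: 75.0 },
  { species: 'Droid', height: 96, mass: 32.0 },
  { species: 'Human', height: 202, mass: 136.0 },
  { species: 'Droid', height: nil, mass: nil }
]
summaries = group_summary_sketch(rows, :species, %i[height mass])
p summaries.select { |s| s[:count] > 1 } # like .slice { count > 1 }
```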
````diff
+See [DataFrame.md](doc/DataFrame.md) for other examples and details.
 
 
 ## `RedAmber::Vector`
 
 Class `RedAmber::Vector` represents a series of data in the DataFrame.
+Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
 
 ```ruby
 penguins[:bill_length_mm]
@@ -161,11 +361,34 @@ penguins[:bill_length_mm]
 
 Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
 
+The following element-wise comparison returns a boolean Vector of the same size.
+
+![unary element-wise](doc/image/vector/unary_element_wise.png)
+
+```ruby
+penguins[:bill_length_mm] < 40
+
+# =>
+#<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
+[true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
+```
+
+The next example returns an aggregated result.
+
+![unary aggregation](doc/image/vector/unary_aggregation.png)
+
+```ruby
+penguins[:bill_length_mm].mean
+# =>
+43.92192982456141
+
+```
+
````
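The two Vector examples show the two shapes of Vector functions: element-wise functions return one value per element, and aggregations return a single scalar. A plain-Ruby sketch of both on Arrays is below. The helper names are hypothetical, and skipping `nil` in the mean is an assumption that is consistent with the README's `summary`, which reports a count of 342 non-nil `bill_length_mm` values with a mean of 43.92.

```ruby
# Hypothetical plain-Ruby sketches, not RedAmber's code.

# Element-wise: one output per input; nil propagates as nil.
def less_than_sketch(values, threshold)
  values.map { |x| x.nil? ? nil : x < threshold }
end

# Aggregation: one scalar per input; nil values are skipped.
def mean_sketch(values)
  present = values.compact
  present.empty? ? nil : present.sum.fdiv(present.size)
end

bill_length_mm = [39.1, 39.5, 40.3, nil, 36.7]
p less_than_sketch(bill_length_mm, 40) # prints [true, true, false, nil, true]
p mean_sketch(bill_length_mm)
```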
````diff
 See [Vector.md](doc/Vector.md) for details.
 
 ## Jupyter notebook
 
-[
+[71 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in a Jupyter notebook.
 
 ## Development
 
@@ -176,6 +399,12 @@ bundle install
 bundle exec rake test
 ```
 
+I would appreciate it if you could help improve this project. Here are a few ways you can help:
+
+- [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
+- Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
+- Write, clarify, or fix documentation
+
 ## License
 
 The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
````