red_amber 0.1.7 → 0.2.1
- checksums.yaml +4 -4
- data/.rubocop.yml +12 -2
- data/.rubocop_todo.yml +2 -15
- data/.yardopts +1 -0
- data/CHANGELOG.md +164 -2
- data/Gemfile +2 -1
- data/README.md +246 -17
- data/doc/DataFrame.md +392 -129
- data/doc/Vector.md +37 -19
- data/doc/examples_of_red_amber.ipynb +8979 -0
- data/lib/red_amber/data_frame.rb +138 -24
- data/lib/red_amber/data_frame_displayable.rb +35 -18
- data/lib/red_amber/data_frame_reshaping.rb +85 -0
- data/lib/red_amber/data_frame_selectable.rb +53 -9
- data/lib/red_amber/data_frame_variable_operation.rb +130 -50
- data/lib/red_amber/group.rb +29 -27
- data/lib/red_amber/vector.rb +1 -1
- data/lib/red_amber/vector_functions.rb +65 -23
- data/lib/red_amber/vector_selectable.rb +12 -9
- data/lib/red_amber/vector_updatable.rb +22 -1
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +1 -1
- data/red_amber.gemspec +1 -1
- metadata +7 -5
- data/doc/47_examples_of_red_amber.ipynb +0 -4872
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: d239a3fa90e5796fb695f8d3c4995d0a2178ea7c8c2789bed157e688902585cb
|
4
|
+
data.tar.gz: 968c02294d24a3dabaa6e5128be0bcfad713e131df15850ac0ceb64c2883dcd0
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: d1c5ffd9650dd8c9e825514cd7e2ff4914690bd731ac262fca6cc17e56c1e312679689351a05fb741dccfb59377214706a8bf6ca6fe3237ca46fb623ae1b9f10
|
7
|
+
data.tar.gz: f37c4aff9170cd5105737a9d2b3d827051254dcca6968b697f5ed3a70e1b2c3cb14303e88a9c342870d1447450a538e445d6f3d37de53591d3f6d13b87aebc16
|
data/.rubocop.yml
CHANGED
@@ -43,6 +43,11 @@ Lint/BinaryOperatorWithIdenticalOperands:
|
|
43
43
|
Exclude:
|
44
44
|
- 'test/test_vector_function.rb'
|
45
45
|
|
46
|
+
# Need for test with empty block
|
47
|
+
Lint/EmptyBlock:
|
48
|
+
Exclude:
|
49
|
+
- 'test/test_group.rb'
|
50
|
+
|
46
51
|
# Max: 120
|
47
52
|
Layout/LineLength:
|
48
53
|
Max: 118
|
@@ -56,6 +61,7 @@ Metrics/AbcSize:
|
|
56
61
|
Max: 30
|
57
62
|
Exclude:
|
58
63
|
- 'lib/red_amber/data_frame_displayable.rb' # Max: 55
|
64
|
+
- 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
|
59
65
|
- 'lib/red_amber/data_frame_selectable.rb' # Max: 51
|
60
66
|
- 'lib/red_amber/vector_updatable.rb' # Max: 36
|
61
67
|
- 'lib/red_amber/vector_selectable.rb' # Max: 33
|
@@ -78,23 +84,27 @@ Metrics/ClassLength:
|
|
78
84
|
Metrics/CyclomaticComplexity:
|
79
85
|
Max: 12
|
80
86
|
Exclude:
|
87
|
+
- 'lib/red_amber/data_frame_displayable.rb' # Max: 18
|
81
88
|
- 'lib/red_amber/data_frame_selectable.rb' # Max: 14
|
89
|
+
- 'lib/red_amber/vector_selectable.rb' # Max: 13
|
82
90
|
- 'lib/red_amber/vector_updatable.rb' # Max: 14
|
83
|
-
- 'lib/red_amber/data_frame_displayable.rb' # Max: 18
|
84
91
|
|
85
92
|
# Max: 10
|
86
93
|
Metrics/MethodLength:
|
87
94
|
Max: 30
|
88
95
|
Exclude:
|
89
96
|
- 'lib/red_amber/data_frame_displayable.rb' # Max: 33
|
97
|
+
- 'lib/red_amber/data_frame_selectable.rb' # Max: 38
|
98
|
+
- 'lib/red_amber/data_frame_variable_operation.rb' # Max: 35
|
90
99
|
|
91
100
|
# Max: 100
|
92
101
|
Metrics/ModuleLength:
|
93
102
|
Max: 100
|
94
103
|
Exclude:
|
104
|
+
- 'lib/red_amber/data_frame_displayable.rb' # Max: 132
|
95
105
|
- 'lib/red_amber/data_frame_selectable.rb' # Max: 141
|
106
|
+
- 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
|
96
107
|
- 'lib/red_amber/vector_functions.rb' # Max: 114
|
97
|
-
- 'lib/red_amber/data_frame_displayable.rb' # Max: 132
|
98
108
|
|
99
109
|
# Max: 8
|
100
110
|
Metrics/PerceivedComplexity:
|
data/.rubocop_todo.yml
CHANGED
@@ -1,15 +1,2 @@
|
|
1
|
-
#
|
2
|
-
#
|
3
|
-
# on 2022-05-08 02:37:36 UTC using RuboCop version 1.27.0.
|
4
|
-
# The point is for the user to remove these configuration records
|
5
|
-
# one by one as the offenses are removed from the code base.
|
6
|
-
# Note that changes in the inspected code, or installation of new
|
7
|
-
# versions of RuboCop, may require this file to be generated again.
|
8
|
-
|
9
|
-
# Offense count: 1
|
10
|
-
# This cop supports unsafe auto-correction (--auto-correct-all).
|
11
|
-
# Configuration parameters: EnforcedStyle.
|
12
|
-
# SupportedStyles: forbid_for_all_comparison_operators, forbid_for_equality_operators_only, require_for_all_comparison_operators, require_for_equality_operators_only
|
13
|
-
Style/YodaCondition:
|
14
|
-
Exclude:
|
15
|
-
- 'lib/red_amber/data_frame.rb'
|
1
|
+
# We will use cops to detect bugs in an early stage
|
2
|
+
# Feel free to use .rubocop_todo.yml by --auto-gen-config
|
data/.yardopts
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
--output-dir doc/yard
|
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,168 @@
|
|
1
|
-
## [0.1
|
1
|
+
## [0.2.1] - 2022-09-07
|
2
2
|
|
3
|
-
-
|
3
|
+
- Bug fixes
|
4
|
+
|
5
|
+
- Fix `Vector#each` with block (#66)
|
6
|
+
`Vector#each` will return value of each element with block.
|
7
|
+
|
8
|
+
- Fix table format at size == 9 (#67)
|
9
|
+
|
10
|
+
- Fix to support Vector in `DataFrame#assign` (#77)
|
11
|
+
|
12
|
+
- Add `assert_delta` functionality for `assert_with_NaN` (#78)
|
13
|
+
|
14
|
+
- Fix Vector#is_in when self is chunked (#79)
|
15
|
+
|
16
|
+
- Fix Array type error (uint/int) (#79)
|
17
|
+
|
18
|
+
- New features and improvements
|
19
|
+
|
20
|
+
- Refine `DataFrame#indices` method (#67)
|
21
|
+
|
22
|
+
- Update DataFrame reshaping methods (#73)
|
23
|
+
|
24
|
+
- Change default option value of DataFrame reshaping
|
25
|
+
|
26
|
+
- Change the order of import_cars example
|
27
|
+
|
28
|
+
- Add `DataFrame#method_missing` to get column vector by method (#75)
|
29
|
+
|
30
|
+
- Add `DataFrame#method_missing` to get column (#75)
|
31
|
+
|
32
|
+
- Accept both args and block in `DataFrame#assign` (#75)
|
33
|
+
|
34
|
+
- Accept indices in `DataFrame#pick` and `DataFrame#drop` (#76)
|
35
|
+
|
36
|
+
- Add `DataFrame#slice_by` method (#77)
|
37
|
+
|
38
|
+
- Add new Vector functions (#78)
|
39
|
+
|
40
|
+
- Add inverse trigonometric function for Vector
|
41
|
+
- `acos`
|
42
|
+
- `asin`
|
43
|
+
|
44
|
+
- Add logarithmic function for Vector
|
45
|
+
- `ln`
|
46
|
+
- `log10`
|
47
|
+
- `log1p`
|
48
|
+
- `log2`
|
49
|
+
|
50
|
+
- Add binary function `Vector#logb`
|
51
|
+
|
52
|
+
- Docker image and Jupyter Notebook (Thanks to @mrkn)
|
53
|
+
- Add link to RubyData in README
|
54
|
+
- Add link to interactive README by Binder
|
55
|
+
|
56
|
+
- Update Jupyter Notebook `71 examples of RedAmber`
|
57
|
+
|
58
|
+
|
59
|
+
## [0.2.0] - 2022-08-15
|
60
|
+
|
61
|
+
- Bump version up to 0.2.0
|
62
|
+
|
63
|
+
- Bug fixes
|
64
|
+
|
65
|
+
- Fix order of multiple group keys (#55)
|
66
|
+
|
67
|
+
Only 1 group key comes to left. Other keys remain in right.
|
68
|
+
|
69
|
+
- Remove optional `require` for rover (#55)
|
70
|
+
|
71
|
+
Fix DataFrame.new for argument with Rover::DataFrame.
|
72
|
+
|
73
|
+
- Fix occasional failure in CI (#59)
|
74
|
+
|
75
|
+
Sometimes the CI test fails. I added -dev dependency
|
76
|
+
in Arrow install by apt, not doing in bundler.
|
77
|
+
|
78
|
+
- Fix calling :take in V#[] (#56)
|
79
|
+
|
80
|
+
Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This will prevent the error below
|
81
|
+
when called with Arrow::ChunkedArray.
|
82
|
+
|
83
|
+
- Raise error renaming non existing key (#61)
|
84
|
+
|
85
|
+
Raise an error when the specified key does not exist.
|
86
|
+
|
87
|
+
- Fix DataFrame#rename #assign by array (#65)
|
88
|
+
|
89
|
+
- New features and improvements
|
90
|
+
|
91
|
+
- Support Arrow 9.0.0
|
92
|
+
- Upgrade to Arrow 9.0.0 (#59)
|
93
|
+
- Add Vector#quantile method (#59)
|
94
|
+
Arrow::QuantileOptions has been supported since Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
|
95
|
+
|
96
|
+
- Add Vector#quantiles (#62)
|
97
|
+
|
98
|
+
- Add DataFrame#each_row (#56)
|
99
|
+
- Returns Enumerator if block is not given.
|
100
|
+
- Change DataFrame#each_row to return a Hash {key => row} (#63)
|
101
|
+
|
102
|
+
- Refactor to use pattern match in overloaded parameter parsing (#61)
|
103
|
+
- Refine DataFrame.new to use pattern match
|
104
|
+
- Use pattern match in DataFrame#assign
|
105
|
+
- Use pattern match in DataFrame#rename
|
106
|
+
|
107
|
+
- Accept Array for renamer/assigner in #rename/#assign (#61)
|
108
|
+
- Accept assigner by Arrays in DataFrame#assign
|
109
|
+
- Accept renamer pairs by Arrays in DataFrame#rename
|
110
|
+
- Add DataFrame#assign_left method
|
111
|
+
|
112
|
+
- Add summary/describe (#62)
|
113
|
+
- Introduce DataFrame#summary(#describe)
|
114
|
+
|
115
|
+
- Introduce reshaping methods for DataFrame (#64)
|
116
|
+
- Introduce DataFrame#transpose method
|
117
|
+
- Introduce DataFrame#to_long method
|
118
|
+
- Introduce DataFrame#to_wide method
|
119
|
+
|
120
|
+
- Others
|
121
|
+
|
122
|
+
- Add alias sort_index for array_sort_indices (#59)
|
123
|
+
- Enable :width option in DataFrame#to_s (#62)
|
124
|
+
- Add options to DataFrame#format_table (#62)
|
125
|
+
|
126
|
+
- Update Documents
|
127
|
+
|
128
|
+
- Add Yard doc for some methods
|
129
|
+
|
130
|
+
- Update Jupyter notebook '61 Examples of Red Amber' (#65)
|
131
|
+
|
132
|
+
## [0.1.8] - 2022-08-04 (experimental)
|
133
|
+
|
134
|
+
- Bug fixes
|
135
|
+
|
136
|
+
- Fix unnamed column in table formatter (#52)
|
137
|
+
- Fix DataFrame#key?, DataFrame#key_index when @keys.nil? (#52)
|
138
|
+
- Align order of replacer in Vector#replace (#53, resolved #38)
|
139
|
+
|
140
|
+
- New features and improvements
|
141
|
+
|
142
|
+
- Refine DataFrame.new for empty arguments (#50)
|
143
|
+
- Delete .rubocop_todo.yml for not to use yoda condition (#50)
|
144
|
+
|
145
|
+
- Refine Group (#52, resolved #28)
|
146
|
+
- Refine Group methods creation
|
147
|
+
- Make group key at first(left)
|
148
|
+
- Show only one group count when same counts
|
149
|
+
- Add block acceptability for group
|
150
|
+
- Rename empty key to :unnamed in DataFrame.new
|
151
|
+
- Rename Group#aggregated_by to #summarize (#54)
|
152
|
+
|
153
|
+
- Add Vector#shift (#51)
|
154
|
+
|
155
|
+
- Vector#[] accepts Range as an argument (#51)
|
156
|
+
|
157
|
+
- Update documents
|
158
|
+
|
159
|
+
- Add support for yard (#54)
|
160
|
+
|
161
|
+
- Renew jupyter notebook '53 examples' (#54)
|
162
|
+
|
163
|
+
- Add more examples and images in README (#52)
|
164
|
+
- Add document of group manipulations in README (#52)
|
165
|
+
- Renew DF#group document in DataFrame.md (#52)
|
4
166
|
|
5
167
|
## [0.1.7] - 2022-07-15 (experimental)
|
6
168
|
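Note on the 0.2.1 changelog entry "Add `DataFrame#method_missing` to get column vector by method (#75)": the following is only a minimal plain-Ruby sketch of the general technique, assuming a hypothetical `TinyFrame` class that keeps its columns in a Hash. It is not RedAmber's implementation and uses no Arrow API.

```ruby
# Hypothetical stand-in for a data frame; not RedAmber's implementation.
class TinyFrame
  def initialize(columns)
    @columns = columns # Hash of Symbol => Array
  end

  # Explicit column lookup by key.
  def [](key)
    @columns.fetch(key)
  end

  # Resolve unknown zero-argument method names to columns.
  def method_missing(name, *args, &block)
    return @columns[name] if args.empty? && block.nil? && @columns.key?(name)

    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @columns.key?(name) || super
  end
end

frame = TinyFrame.new(species: %w[Adelie Gentoo], body_mass_g: [3750, 5400])
frame.body_mass_g               # same column as frame[:body_mass_g]
frame.instance_eval { species } # bare column names resolve inside a block
```

Falling back to `super` for unknown names keeps the usual `NoMethodError` for typos, which is one reason to pair `method_missing` with `respond_to_missing?`.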
|
data/Gemfile
CHANGED
@@ -7,7 +7,7 @@ gemspec
|
|
7
7
|
group :test do
|
8
8
|
gem 'rake'
|
9
9
|
|
10
|
-
gem 'red-parquet', '>=
|
10
|
+
gem 'red-parquet', '>= 9.0.0'
|
11
11
|
gem 'rover-df', '~> 0.3.0'
|
12
12
|
|
13
13
|
gem 'rubocop'
|
@@ -18,6 +18,7 @@ group :test do
|
|
18
18
|
gem 'iruby'
|
19
19
|
gem 'test-unit'
|
20
20
|
gem 'webrick'
|
21
|
+
gem 'yard'
|
21
22
|
|
22
23
|
gem 'benchmark_driver'
|
23
24
|
gem 'red-datasets'
|
data/README.md
CHANGED
@@ -3,17 +3,23 @@
|
|
3
3
|
[](https://badge.fury.io/rb/red_amber)
|
4
4
|
[](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
|
5
5
|
|
6
|
-
A simple dataframe library for Ruby
|
6
|
+
A simple dataframe library for Ruby.
|
7
7
|
|
8
8
|
- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [](https://gitter.im/red-data-tools/en)
|
9
9
|
- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
|
10
10
|
|
11
11
|
## Requirements
|
12
12
|
|
13
|
+
Supported Ruby version is >= 2.7.
|
14
|
+
|
15
|
+
Since v0.2.0, this library uses pattern matching, which is an experimental feature in Ruby 2.7. It is usable, but a warning message will be shown in 2.7.
|
16
|
+
I recommend Ruby 3 for performance.
|
17
|
+
|
13
18
|
```ruby
|
14
|
-
|
19
|
+
# Libraries required
|
20
|
+
gem 'red-arrow', '>= 9.0.0'
|
15
21
|
|
16
|
-
gem 'red-parquet', '>=
|
22
|
+
gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
|
17
23
|
gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
18
24
|
```
|
19
25
|
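The requirement note above says that v0.2.0 relies on pattern matching, and the changelog in this diff mentions its use for overloaded parameter parsing in `DataFrame#rename`. As an illustration only, here is a plain-Ruby sketch of that style. `normalize_renamer` is a hypothetical helper and the accepted argument shapes are assumptions, not RedAmber's actual code.

```ruby
# Hypothetical helper: normalize several accepted argument shapes
# into a single Hash of { old_key => new_key }.
def normalize_renamer(spec)
  case spec
  in Hash
    spec                    # already a Hash, use as is
  in [Symbol => from, Symbol => to]
    { from => to }          # one flat pair such as [:old, :new]
  in [[Symbol, Symbol], *]
    spec.to_h               # array of pairs such as [[:a, :b], [:c, :d]]
  else
    raise ArgumentError, "unsupported renamer: #{spec.inspect}"
  end
end

normalize_renamer({ old: :new })          # Hash passes through
normalize_renamer(%i[old new])            # flat pair becomes a one-entry Hash
normalize_renamer([%i[a b], %i[c d]])     # pairs become a two-entry Hash
```

On Ruby 2.7 the `case ... in` form runs but prints the experimental-feature warning mentioned in the README; on Ruby 3.0 and later it runs without that warning.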
|
@@ -21,9 +27,9 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
|
21
27
|
|
22
28
|
Install requirements before you install Red Amber.
|
23
29
|
|
24
|
-
- Apache Arrow GLib (>=
|
30
|
+
- Apache Arrow GLib (>= 9.0.0)
|
25
31
|
|
26
|
-
- Apache Parquet GLib (>=
|
32
|
+
- Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
|
27
33
|
|
28
34
|
See [Apache Arrow install document](https://arrow.apache.org/install/).
|
29
35
|
|
@@ -47,16 +53,27 @@ Or install it yourself as:
|
|
47
53
|
gem install red_amber
|
48
54
|
```
|
49
55
|
|
56
|
+
## Docker image and Jupyter Notebook
|
57
|
+
|
58
|
+
[RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
|
59
|
+
|
60
|
+
Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb).
|
61
|
+
[](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb)
|
62
|
+
|
63
|
+
|
64
|
+
|
50
65
|
## `RedAmber::DataFrame`
|
51
66
|
|
52
|
-
|
67
|
+
It represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
|
68
|
+
|
69
|
+

|
53
70
|
|
54
71
|
```ruby
|
55
72
|
require 'red_amber' # require 'red-amber' is also OK.
|
56
73
|
require 'datasets-arrow'
|
57
74
|
|
58
75
|
arrow = Datasets::Penguins.new.to_arrow
|
59
|
-
RedAmber::DataFrame.new(arrow)
|
76
|
+
penguins = RedAmber::DataFrame.new(arrow)
|
60
77
|
|
61
78
|
# =>
|
62
79
|
#<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
|
@@ -73,17 +90,52 @@ RedAmber::DataFrame.new(arrow)
|
|
73
90
|
344 Gentoo Biscoe 49.9 16.1 213 ... 2009
|
74
91
|
```
|
75
92
|
|
76
|
-
|
77
|
-

|
93
|
+
For example, `DataFrame#pick` accepts keys as arguments and returns a sub DataFrame.
|
78
94
|
|
79
|
-
|
95
|
+

|
80
96
|
|
81
97
|
```ruby
|
82
|
-
|
98
|
+
penguins.keys
|
99
|
+
# =>
|
100
|
+
[:species,
|
101
|
+
:island,
|
102
|
+
:bill_length_mm,
|
103
|
+
:bill_depth_mm,
|
104
|
+
:flipper_length_mm,
|
105
|
+
:body_mass_g,
|
106
|
+
:sex,
|
107
|
+
:year]
|
108
|
+
|
109
|
+
df = penguins.pick(:species, :island, :body_mass_g)
|
83
110
|
df
|
84
111
|
|
85
112
|
# =>
|
86
|
-
#<RedAmber::DataFrame : 344 x
|
113
|
+
#<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
|
114
|
+
species island body_mass_g
|
115
|
+
<string> <string> <uint16>
|
116
|
+
1 Adelie Torgersen 3750
|
117
|
+
2 Adelie Torgersen 3800
|
118
|
+
3 Adelie Torgersen 3250
|
119
|
+
4 Adelie Torgersen (nil)
|
120
|
+
5 Adelie Torgersen 3450
|
121
|
+
: : : :
|
122
|
+
342 Gentoo Biscoe 5750
|
123
|
+
343 Gentoo Biscoe 5200
|
124
|
+
344 Gentoo Biscoe 5400
|
125
|
+
```
|
126
|
+
|
127
|
+
`DataFrame#drop` drops some columns and returns a DataFrame of the remaining columns.
|
128
|
+
|
129
|
+

|
130
|
+
|
131
|
+
You can specify the columns by keys or by a boolean array of the same size as n_keys.
|
132
|
+
|
133
|
+
```ruby
|
134
|
+
# Same as df.drop(:species, :island)
|
135
|
+
df = df.drop(true, true, false)
|
136
|
+
|
137
|
+
# =>
|
138
|
+
#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
|
87
139
|
body_mass_g
|
88
140
|
<uint16>
|
89
141
|
1 3750
|
@@ -97,9 +149,14 @@ df
|
|
97
149
|
344 5400
|
98
150
|
```
|
99
151
|
|
100
|
-
|
152
|
+
Arrow data is immutable, so these methods always return a new object.
|
153
|
+
|
154
|
+
`DataFrame#assign` creates new columns or updates existing columns.
|
155
|
+
|
156
|
+

|
101
157
|
|
102
158
|
```ruby
|
159
|
+
# New column is created because ':body_mass_kg' is a new key.
|
103
160
|
df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
|
104
161
|
|
105
162
|
# =>
|
@@ -117,14 +174,103 @@ df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
|
|
117
174
|
344 5400 5.4
|
118
175
|
```
|
119
176
|
|
177
|
+
`DataFrame#slice` selects rows (observations) to create a sub DataFrame.
|
178
|
+
|
179
|
+

|
180
|
+
|
181
|
+
```ruby
|
182
|
+
# returns 5 rows at the start and 5 rows from the end
|
183
|
+
penguins.slice(0...5, -5..-1)
|
184
|
+
|
185
|
+
# =>
|
186
|
+
#<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
|
187
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
188
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
189
|
+
1 Adelie Torgersen 39.1 18.7 181 ... 2007
|
190
|
+
2 Adelie Torgersen 39.5 17.4 186 ... 2007
|
191
|
+
3 Adelie Torgersen 40.3 18.0 195 ... 2007
|
192
|
+
4 Adelie Torgersen (nil) (nil) (nil) ... 2007
|
193
|
+
5 Adelie Torgersen 36.7 19.3 193 ... 2007
|
194
|
+
: : : : : : ... :
|
195
|
+
8 Gentoo Biscoe 50.4 15.7 222 ... 2009
|
196
|
+
9 Gentoo Biscoe 45.2 14.8 212 ... 2009
|
197
|
+
10 Gentoo Biscoe 49.9 16.1 213 ... 2009
|
198
|
+
```
|
199
|
+
|
200
|
+
`DataFrame#remove` rejects rows (observations) and returns a DataFrame of the remaining rows.
|
201
|
+
|
202
|
+

|
203
|
+
|
204
|
+
```ruby
|
205
|
+
# penguins[:bill_length_mm] < 40 returns a boolean Vector
|
206
|
+
penguins.remove(penguins[:bill_length_mm] < 40)
|
207
|
+
|
208
|
+
# =>
|
209
|
+
#<RedAmber::DataFrame : 244 x 8 Vectors, 0x000000000007d6f4>
|
210
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
211
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
212
|
+
1 Adelie Torgersen 40.3 18.0 195 ... 2007
|
213
|
+
2 Adelie Torgersen (nil) (nil) (nil) ... 2007
|
214
|
+
3 Adelie Torgersen 42.0 20.2 190 ... 2007
|
215
|
+
4 Adelie Torgersen 41.1 17.6 182 ... 2007
|
216
|
+
5 Adelie Torgersen 42.5 20.7 197 ... 2007
|
217
|
+
: : : : : : ... :
|
218
|
+
242 Gentoo Biscoe 50.4 15.7 222 ... 2009
|
219
|
+
243 Gentoo Biscoe 45.2 14.8 212 ... 2009
|
220
|
+
244 Gentoo Biscoe 49.9 16.1 213 ... 2009
|
221
|
+
```
|
222
|
+
|
120
223
|
DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
|
121
224
|
|
122
|
-
|
225
|
+
The previous example can also be written with a block.
|
226
|
+
|
227
|
+
```ruby
|
228
|
+
penguins.remove { bill_length_mm < 40 }
|
229
|
+
```
|
230
|
+
|
231
|
+
The next example uses a block to update a column.
|
123
232
|
|
124
233
|
```ruby
|
125
|
-
|
234
|
+
df = RedAmber::DataFrame.new(
|
235
|
+
integer: [0, 1, 2, 3, nil],
|
236
|
+
float: [0.0, 1.1, 2.2, Float::NAN, nil],
|
237
|
+
string: ['A', 'B', 'C', 'D', nil],
|
238
|
+
boolean: [true, false, true, false, nil])
|
239
|
+
df
|
240
|
+
|
241
|
+
# =>
|
242
|
+
#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
|
243
|
+
integer float string boolean
|
244
|
+
<uint8> <double> <string> <boolean>
|
245
|
+
1 0 0.0 A true
|
246
|
+
2 1 1.1 B false
|
247
|
+
3 2 2.2 C true
|
248
|
+
4 3 NaN D false
|
249
|
+
5 (nil) (nil) (nil) (nil)
|
250
|
+
|
251
|
+
df.assign do
|
252
|
+
vectors.select(&:float?).map { |v| [v.key, -v] }
|
253
|
+
# => returns [[:float, [-0.0, -1.1, -2.2, NAN, nil]]]
|
254
|
+
end
|
255
|
+
|
256
|
+
# =>
|
257
|
+
#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
|
258
|
+
index float string
|
259
|
+
<uint8> <double> <string>
|
260
|
+
1 0 -0.0 A
|
261
|
+
2 1 -1.1 B
|
262
|
+
3 2 -2.2 C
|
263
|
+
4 3 NaN D
|
264
|
+
5 (nil) (nil) (nil)
|
265
|
+
```
|
266
|
+
|
267
|
+
The next example eliminates rows containing nil.
|
268
|
+
|
269
|
+
```ruby
|
270
|
+
# remove all observations containing nil
|
126
271
|
nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
|
127
272
|
nil_removed.tdr
|
273
|
+
|
128
274
|
# =>
|
129
275
|
RedAmber::DataFrame : 342 x 8 Vectors
|
130
276
|
Vectors : 5 numeric, 3 strings
|
@@ -145,12 +291,66 @@ For this frequently needed task, we can do it much simpler.
|
|
145
291
|
penguins.remove_nil # => same result as above
|
146
292
|
```
|
147
293
|
|
148
|
-
|
294
|
+
`DataFrame#summary` shows summary statistics in a DataFrame.
|
295
|
+
|
296
|
+
```ruby
|
297
|
+
puts penguins.summary.to_s(width: 82)
|
298
|
+
|
299
|
+
# =>
|
300
|
+
variables count mean std min 25% median 75% max
|
301
|
+
<dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
|
302
|
+
1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
|
303
|
+
2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
|
304
|
+
3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
|
305
|
+
4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
|
306
|
+
5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
|
307
|
+
```
|
308
|
+
|
309
|
+
`DataFrame#group` method can be used for the grouping tasks.
|
310
|
+
|
311
|
+
```ruby
|
312
|
+
starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
|
313
|
+
starwars
|
314
|
+
|
315
|
+
# =>
|
316
|
+
#<RedAmber::DataFrame : 87 x 12 Vectors, 0x000000000000607c>
|
317
|
+
unnamed1 name height mass hair_color skin_color eye_color ... species
|
318
|
+
<int64> <string> <int64> <double> <string> <string> <string> ... <string>
|
319
|
+
1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
|
320
|
+
2 2 C-3PO 167 75.0 NA gold yellow ... Droid
|
321
|
+
3 3 R2-D2 96 32.0 NA white, blue red ... Droid
|
322
|
+
4 4 Darth Vader 202 136.0 none white yellow ... Human
|
323
|
+
5 5 Leia Organa 150 49.0 brown light brown ... Human
|
324
|
+
: : : : : : : : ... :
|
325
|
+
85 85 BB8 (nil) (nil) none none black ... Droid
|
326
|
+
86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
|
327
|
+
87 87 Padmé Amidala 165 45.0 brown light brown ... Human
|
328
|
+
|
329
|
+
starwars.group(:species) { [count(:species), mean(:height, :mass)] }
|
330
|
+
.slice { count > 1 }
|
331
|
+
|
332
|
+
# =>
|
333
|
+
#<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
|
334
|
+
species count mean(height) mean(mass)
|
335
|
+
<string> <int64> <double> <double>
|
336
|
+
1 Human 35 176.6 82.8
|
337
|
+
2 Droid 6 131.2 69.8
|
338
|
+
3 Wookiee 2 231.0 124.0
|
339
|
+
4 Gungan 3 208.7 74.0
|
340
|
+
5 NA 4 181.3 48.0
|
341
|
+
6 Zabrak 2 173.0 80.0
|
342
|
+
7 Twi'lek 2 179.0 55.0
|
343
|
+
8 Mirialan 2 168.0 53.1
|
344
|
+
9 Kaminoan 2 221.0 88.0
|
345
|
+
```
|
346
|
+
|
347
|
+
See [DataFrame.md](doc/DataFrame.md) for other examples and details.
|
149
348
|
|
150
349
|
|
151
350
|
## `RedAmber::Vector`
|
152
351
|
|
153
352
|
Class `RedAmber::Vector` represents a series of data in the DataFrame.
|
353
|
+
Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
|
154
354
|
|
155
355
|
```ruby
|
156
356
|
penguins[:bill_length_mm]
|
@@ -161,11 +361,34 @@ penguins[:bill_length_mm]
|
|
161
361
|
|
162
362
|
Vectors accept some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
|
163
363
|
|
364
|
+
This is an element-wise comparison and returns a boolean Vector of the same size.
|
365
|
+
|
366
|
+

|
367
|
+
|
368
|
+
```ruby
|
369
|
+
penguins[:bill_length_mm] < 40
|
370
|
+
|
371
|
+
# =>
|
372
|
+
#<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
|
373
|
+
[true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
|
374
|
+
```
|
375
|
+
|
376
|
+
The next example returns an aggregated result.
|
377
|
+
|
378
|
+

|
379
|
+
|
380
|
+
```ruby
|
381
|
+
penguins[:bill_length_mm].mean
|
382
|
+
# =>
|
383
|
+
43.92192982456141
|
384
|
+
|
385
|
+
```
|
386
|
+
|
164
387
|
See [Vector.md](doc/Vector.md) for details.
|
165
388
|
|
166
389
|
## Jupyter notebook
|
167
390
|
|
168
|
-
[
|
391
|
+
[71 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
|
169
392
|
|
170
393
|
## Development
|
171
394
|
|
@@ -176,6 +399,12 @@ bundle install
|
|
176
399
|
bundle exec rake test
|
177
400
|
```
|
178
401
|
|
402
|
+
I would appreciate it if you could help improve this project. Here are a few ways you can help:
|
403
|
+
|
404
|
+
- [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
|
405
|
+
- Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
|
406
|
+
- Write, clarify, or fix documentation
|
407
|
+
|
179
408
|
## License
|
180
409
|
|
181
410
|
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
|
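The README example `penguins.remove { vectors.map(&:is_nil).reduce(&:|) }` builds one boolean mask per column and ORs the masks together to flag rows that contain nil. The same idea can be sketched with plain Ruby arrays. The `columns` Hash and its data are made up for illustration, and no RedAmber or Arrow API is used here.

```ruby
# Hypothetical column store: Symbol => Array, all columns the same length.
columns = {
  integer: [0, 1, 2, 3, nil],
  float: [0.0, 1.1, 2.2, Float::NAN, nil],
  string: ['A', nil, 'C', 'D', nil]
}

# One boolean mask per column: true where the value is nil.
# Note that NaN is a Float, not nil, so it is not flagged.
nil_masks = columns.values.map { |values| values.map(&:nil?) }

# OR the masks element-wise: a row is flagged if any column is nil there.
row_has_nil = nil_masks.reduce do |acc, mask|
  acc.zip(mask).map { |a, b| a | b }
end

# Keep only the rows that were not flagged.
kept = columns.transform_values do |values|
  values.reject.with_index { |_, i| row_has_nil[i] }
end
```

Building a new Hash with `transform_values` rather than mutating `columns` mirrors the README's point that these operations return a new object.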