red_amber 0.2.3 → 0.4.0
- checksums.yaml +4 -4
- data/.rubocop.yml +133 -51
- data/.yardopts +2 -0
- data/CHANGELOG.md +203 -1
- data/Gemfile +2 -1
- data/LICENSE +1 -1
- data/README.md +61 -45
- data/benchmark/basic.yml +11 -4
- data/benchmark/combine.yml +3 -4
- data/benchmark/dataframe.yml +62 -0
- data/benchmark/group.yml +7 -1
- data/benchmark/reshape.yml +6 -2
- data/benchmark/vector.yml +63 -0
- data/doc/DataFrame.md +35 -12
- data/doc/DataFrame_Comparison.md +65 -0
- data/doc/SubFrames.md +11 -0
- data/doc/Vector.md +295 -1
- data/doc/yard-templates/default/fulldoc/html/css/common.css +6 -0
- data/lib/red_amber/data_frame.rb +537 -68
- data/lib/red_amber/data_frame_combinable.rb +776 -123
- data/lib/red_amber/data_frame_displayable.rb +248 -18
- data/lib/red_amber/data_frame_indexable.rb +122 -19
- data/lib/red_amber/data_frame_loadsave.rb +81 -10
- data/lib/red_amber/data_frame_reshaping.rb +216 -21
- data/lib/red_amber/data_frame_selectable.rb +781 -120
- data/lib/red_amber/data_frame_variable_operation.rb +561 -85
- data/lib/red_amber/group.rb +195 -21
- data/lib/red_amber/helper.rb +114 -32
- data/lib/red_amber/refinements.rb +206 -0
- data/lib/red_amber/subframes.rb +1066 -0
- data/lib/red_amber/vector.rb +435 -58
- data/lib/red_amber/vector_aggregation.rb +312 -0
- data/lib/red_amber/vector_binary_element_wise.rb +387 -0
- data/lib/red_amber/vector_selectable.rb +321 -69
- data/lib/red_amber/vector_unary_element_wise.rb +436 -0
- data/lib/red_amber/vector_updatable.rb +397 -24
- data/lib/red_amber/version.rb +2 -1
- data/lib/red_amber.rb +15 -1
- data/red_amber.gemspec +4 -3
- metadata +19 -11
- data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
- data/lib/red_amber/vector_functions.rb +0 -294
data/README.md
CHANGED
@@ -1,28 +1,29 @@
|
|
1
1
|
# RedAmber
|
2
2
|
|
3
|
-
[![Gem Version](https://
|
4
|
-
[![
|
3
|
+
[![Gem Version](https://img.shields.io/gem/v/red_amber?color=brightgreen)](https://rubygems.org/gems/red_amber)
|
4
|
+
[![CI](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml)
|
5
|
+
[![Maintainability](https://api.codeclimate.com/v1/badges/b8a745047045d2f49daa/maintainability)](https://codeclimate.com/github/heronshoes/red_amber/maintainability)
|
6
|
+
[![Test coverage](https://api.codeclimate.com/v1/badges/b8a745047045d2f49daa/test_coverage)](https://codeclimate.com/github/heronshoes/red_amber/test_coverage)
|
7
|
+
[![Doc](https://img.shields.io/badge/docs-latest-blue)](https://heronshoes.github.io/red_amber/)
|
5
8
|
[![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
|
6
9
|
|
7
10
|
A simple dataframe library for Ruby.
|
8
11
|
|
9
|
-
- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
|
12
|
+
- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
|
13
|
+
[![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en) [![Gem Version](https://img.shields.io/gem/v/red-arrow?color=brightgreen)](https://rubygems.org/gems/red-arrow)
|
10
14
|
- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
|
11
15
|
|
12
|
-
![screenshot from jupyterlab](doc/image/screenshot.png)
|
16
|
+
![screenshot from jupyterlab](https://raw.githubusercontent.com/heronshoes/red_amber/main/doc/image/screenshot.png)
|
13
17
|
|
14
18
|
## Requirements
|
19
|
+
### Ruby
|
20
|
+
Supported Ruby version is >= 3.0 (since RedAmber 0.3.0).
|
21
|
+
- I decided to remove support for Ruby 2.7 without waiting for its EOL. See [Release note for v0.3.0](https://github.com/heronshoes/red_amber/discussions/162) for details.
|
15
22
|
|
16
|
-
|
17
|
-
|
18
|
-
Since v0.2.0, this library uses pattern matching which is an experimental feature in 2.7 . It is usable but a warning message will be shown in 2.7 .
|
19
|
-
I recommend Ruby 3 for performance.
|
20
|
-
|
23
|
+
### Libraries
|
21
24
|
```ruby
|
22
|
-
#
|
23
|
-
gem 'red-
|
24
|
-
|
25
|
-
gem 'red-parquet', '~> 10.0.0' # Optional, if you use IO from/to parquet
|
25
|
+
gem 'red-arrow', '~> 11.0.0' # Requires Apache Arrow (see installation below)
|
26
|
+
gem 'red-parquet', '~> 11.0.0' # Optional, if you use IO from/to parquet
|
26
27
|
gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
27
28
|
```
|
28
29
|
|
@@ -30,61 +31,71 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
|
30
31
|
|
31
32
|
Install requirements before you install Red Amber.
|
32
33
|
|
33
|
-
- Apache Arrow (~>
|
34
|
-
- Apache Arrow GLib (~>
|
35
|
-
- Apache Parquet GLib (~>
|
34
|
+
- Apache Arrow (~> 11.0.0)
|
35
|
+
- Apache Arrow GLib (~> 11.0.0)
|
36
|
+
- Apache Parquet GLib (~> 11.0.0) # If you use IO from/to parquet
|
36
37
|
|
37
|
-
|
38
|
+
See [Apache Arrow install document](https://arrow.apache.org/install/).
|
38
39
|
|
39
40
|
- Minimum installation example for the latest Ubuntu:
|
40
|
-
|
41
|
-
|
42
|
-
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
|
56
|
-
|
57
|
-
|
58
|
-
|
59
|
-
|
41
|
+
|
42
|
+
```
|
43
|
+
sudo apt update
|
44
|
+
sudo apt install -y -V ca-certificates lsb-release wget
|
45
|
+
wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
|
46
|
+
sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
|
47
|
+
sudo apt update
|
48
|
+
sudo apt install -y -V libarrow-dev
|
49
|
+
sudo apt install -y -V libarrow-glib-dev
|
50
|
+
```
|
51
|
+
|
52
|
+
- On Fedora 38 (Rawhide):
|
53
|
+
|
54
|
+
```
|
55
|
+
sudo dnf update
|
56
|
+
sudo dnf -y install gcc-c++ libarrow-devel libarrow-glib-devel ruby-devel
|
57
|
+
```
|
58
|
+
|
59
|
+
- On macOS, using Homebrew:
|
60
|
+
|
61
|
+
```
|
62
|
+
brew install apache-arrow
|
63
|
+
brew install apache-arrow-glib
|
64
|
+
```
|
60
65
|
|
61
66
|
If you prepared Apache Arrow, add these lines to your Gemfile:
|
62
67
|
|
63
68
|
```ruby
|
64
|
-
gem 'red-arrow', '~>
|
69
|
+
gem 'red-arrow', '~> 11.0.0'
|
65
70
|
gem 'red_amber'
|
66
|
-
gem 'red-parquet', '~>
|
71
|
+
gem 'red-parquet', '~> 11.0.0' # Optional, if you use IO from/to parquet
|
67
72
|
gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
68
73
|
gem 'red-datasets-arrow' # Optional, recommended if you use Red Datasets
|
69
74
|
gem 'red-arrow-numo-narray' # Optional, recommended if you use inputs from Numo::NArray
|
70
75
|
```
|
71
76
|
|
72
|
-
And then execute `bundle install` or install
|
77
|
+
Then execute `bundle install`, or install the gems yourself, e.g. `gem install red_amber`.
|
73
78
|
|
74
79
|
## Docker image and Jupyter Notebook
|
75
80
|
|
76
|
-
[RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to
|
81
|
+
[RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to Kenta Murata).
|
77
82
|
|
78
83
|
Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb).
|
79
84
|
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)
|
80
85
|
|
86
|
+
## Comparison of DataFrames
|
87
|
+
|
88
|
+
Comparison of basic features of RedAmber with Python
|
89
|
+
[pandas](https://pandas.pydata.org/),
|
90
|
+
R [Tidyverse](https://www.tidyverse.org/) and
|
91
|
+
Julia [Dataframes](https://dataframes.juliadata.org/stable/) is [here](doc/DataFrame_Comparison.md) (Thanks to Benson Muite).
|
81
92
|
|
82
93
|
## Data frame in `RedAmber`
|
83
94
|
|
84
95
|
Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
|
85
96
|
The entity is a Red Arrow's Table object.
|
86
97
|
|
87
|
-
![dataframe model of RedAmber](doc/image/dataframe_model.png)
|
98
|
+
![dataframe model of RedAmber](https://raw.githubusercontent.com/heronshoes/red_amber/main/doc/image/dataframe_model.png)
|
88
99
|
|
89
100
|
Let's load the library and try some examples.
|
90
101
|
|
@@ -95,6 +106,11 @@ include RedAmber
|
|
95
106
|
|
96
107
|
### Example: diamonds dataset
|
97
108
|
|
109
|
+
First, if you have not installed it yet, run `
|
110
|
+
gem install red-datasets-arrow
|
111
|
+
`
|
112
|
+
then
|
113
|
+
|
98
114
|
```ruby
|
99
115
|
require 'datasets-arrow' # to load sample data
|
100
116
|
|
@@ -120,7 +136,7 @@ For example, we can compute mean prices per cut for the data larger than 1 carat
|
|
120
136
|
|
121
137
|
```ruby
|
122
138
|
df = diamonds
|
123
|
-
.slice { carat > 1 }
|
139
|
+
.slice { carat > 1 } # or use #filter instead of #slice
|
124
140
|
.group(:cut)
|
125
141
|
.mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
|
126
142
|
.sort('-mean(price)')
|
@@ -169,7 +185,7 @@ starwars
|
|
169
185
|
.drop(0) # delete unnecessary index column
|
170
186
|
.remove { species == "NA" } # delete unnecessary rows
|
171
187
|
.group(:species) { [count(:species), mean(:height, :mass)] }
|
172
|
-
.slice { count > 1 }
|
188
|
+
.slice { count > 1 } # or use #filter instead of slice
|
173
189
|
|
174
190
|
# =>
|
175
191
|
#<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
|
@@ -196,7 +212,7 @@ See [Vector.md](doc/Vector.md) for details.
|
|
196
212
|
|
197
213
|
## Jupyter notebook
|
198
214
|
|
199
|
-
[
|
215
|
+
[Examples of Red Amber](https://github.com/heronshoes/docker-stacks/blob/RedAmber-binder/binder/examples_of_red_amber.ipynb)
|
200
216
|
([raw file](https://raw.githubusercontent.com/heronshoes/docker-stacks/RedAmber-binder/binder/examples_of_red_amber.ipynb)) shows more examples in jupyter notebook.
|
201
217
|
|
202
218
|
You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
|
data/benchmark/basic.yml
CHANGED
@@ -1,10 +1,17 @@
|
|
1
|
+
loop_count: 3
|
2
|
+
|
1
3
|
contexts:
|
2
4
|
- name: HEAD
|
3
5
|
prelude: |
|
4
6
|
$LOAD_PATH.unshift(File.expand_path('lib'))
|
5
|
-
-
|
7
|
+
- name: 0.3.0
|
8
|
+
gems:
|
9
|
+
red_amber: 0.3.0
|
10
|
+
- name: 0.2.0
|
11
|
+
gems:
|
6
12
|
red_amber: 0.2.0
|
7
|
-
-
|
13
|
+
- name: 0.1.5
|
14
|
+
gems:
|
8
15
|
red_amber: 0.1.5
|
9
16
|
|
10
17
|
prelude: |
|
@@ -21,8 +28,8 @@ benchmark:
|
|
21
28
|
'B01: Pick([]) by a key name': |
|
22
29
|
df[:flight]
|
23
30
|
|
24
|
-
'
|
25
|
-
df[
|
31
|
+
'B02a: Pick([]) by key names': |
|
32
|
+
df[:carrier, :flight]
|
26
33
|
|
27
34
|
'B03: Pick by key names': |
|
28
35
|
df.pick(:carrier, :flight)
|
data/benchmark/combine.yml
CHANGED
data/benchmark/dataframe.yml
ADDED
@@ -0,0 +1,62 @@
|
|
1
|
+
loop_count: 3
|
2
|
+
|
3
|
+
contexts:
|
4
|
+
- name: HEAD
|
5
|
+
prelude: |
|
6
|
+
$LOAD_PATH.unshift(File.expand_path('lib'))
|
7
|
+
- name: 0.3.0
|
8
|
+
gems:
|
9
|
+
red_amber: 0.3.0
|
10
|
+
- name: 0.2.0
|
11
|
+
gems:
|
12
|
+
red_amber: 0.2.0
|
13
|
+
|
14
|
+
prelude: |
|
15
|
+
require 'red_amber'
|
16
|
+
require 'datasets-arrow'
|
17
|
+
|
18
|
+
diamonds = RedAmber::DataFrame.new(Datasets::Diamonds.new.to_arrow)
|
19
|
+
|
20
|
+
starwars = RedAmber::DataFrame.new(Datasets::Rdataset.new('dplyr', 'starwars').to_arrow)
|
21
|
+
|
22
|
+
uri = URI("https://raw.githubusercontent.com/heronshoes/red_amber/master/test/entity/import_cars.tsv")
|
23
|
+
import_cars = RedAmber::DataFrame.load(uri)
|
24
|
+
|
25
|
+
ds = Datasets::Rdataset.new('openintro', 'simpsons_paradox_covid')
|
26
|
+
simpsons_paradox_covid = RedAmber::DataFrame.new(ds.to_arrow)
|
27
|
+
|
28
|
+
benchmark:
|
29
|
+
'D01: Diamonds test': |
|
30
|
+
diamonds
|
31
|
+
.slice { v(:carat) > 1 }
|
32
|
+
.pick(:cut, :price)
|
33
|
+
.group(:cut)
|
34
|
+
.mean
|
35
|
+
.sort('-mean(price)')
|
36
|
+
.rename('mean(price)': :mean_price_USD)
|
37
|
+
.assign { [:mean_price_JPY, v(:mean_price_USD) * 110.0] }
|
38
|
+
|
39
|
+
'D02: Starwars test': |
|
40
|
+
starwars
|
41
|
+
.drop { keys.select { |key| key.end_with?('color') } }
|
42
|
+
.remove { v(:species) == 'NA' }
|
43
|
+
.group(:species) { [count(:species), mean(:height, :mass)] }
|
44
|
+
.slice { v(:count) > 1 }
|
45
|
+
|
46
|
+
'D03: Import cars test': |
|
47
|
+
import_cars
|
48
|
+
.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
|
49
|
+
.to_wide(name: :Manufacturer, value: :Num_of_imported)
|
50
|
+
.transpose
|
51
|
+
|
52
|
+
'D04: Simpsons paradox test': |
|
53
|
+
simpsons_paradox_covid[simpsons_paradox_covid[:age_group] == 'under 50']
|
54
|
+
.group(:vaccine_status, :outcome)
|
55
|
+
.count
|
56
|
+
.then { |df| df.to_wide(name: :vaccine_status, value: df.keys[-1]) }
|
57
|
+
.assign do
|
58
|
+
[
|
59
|
+
[:'vaccinated_%', (100.0 * v(:vaccinated) / v(:vaccinated).sum)],
|
60
|
+
[:'unvaccinated_%', (100.0 * v(:unvaccinated) / v(:unvaccinated).sum)]
|
61
|
+
]
|
62
|
+
end
|
data/benchmark/group.yml
CHANGED
data/benchmark/reshape.yml
CHANGED
data/benchmark/vector.yml
ADDED
@@ -0,0 +1,63 @@
|
|
1
|
+
loop_count: 10
|
2
|
+
|
3
|
+
contexts:
|
4
|
+
- name: HEAD
|
5
|
+
prelude: |
|
6
|
+
$LOAD_PATH.unshift(File.expand_path('lib'))
|
7
|
+
- name: 0.3.0
|
8
|
+
gems:
|
9
|
+
red_amber: 0.3.0
|
10
|
+
- name: 0.2.0
|
11
|
+
gems:
|
12
|
+
red_amber: 0.2.0
|
13
|
+
|
14
|
+
prelude: |
|
15
|
+
require 'red_amber'
|
16
|
+
include RedAmber
|
17
|
+
require 'datasets-arrow'
|
18
|
+
|
19
|
+
ds = Datasets::Rdatasets.new('nycflights13', 'flights')
|
20
|
+
flights = RedAmber::DataFrame.new(ds.to_arrow)
|
21
|
+
df = flights.slice { flights[:month] <= 6 }
|
22
|
+
|
23
|
+
tailnum_vector = df[:tailnum]
|
24
|
+
distance_vector = df[:distance]
|
25
|
+
|
26
|
+
strings = tailnum_vector.to_a
|
27
|
+
arrow_array = tailnum_vector.data
|
28
|
+
integers = df[:dep_delay].to_a
|
29
|
+
boolean_vector = df[:air_time].is_nil
|
30
|
+
index_vector = Vector.new(0...boolean_vector.size).filter(boolean_vector)
|
31
|
+
replacer = index_vector.data.map(&:to_s)
|
32
|
+
booleans = boolean_vector.to_a
|
33
|
+
|
34
|
+
benchmark:
|
35
|
+
'V01: Vector.new from integer Array': |
|
36
|
+
Vector.new(integers)
|
37
|
+
|
38
|
+
'V02: Vector.new from string Array': |
|
39
|
+
Vector.new(strings)
|
40
|
+
|
41
|
+
'V03: Vector.new from boolean Vector': |
|
42
|
+
Vector.new(boolean_vector)
|
43
|
+
|
44
|
+
'V04: Vector#mean': |
|
45
|
+
distance_vector.mean
|
46
|
+
|
47
|
+
'V05: Vector#*': |
|
48
|
+
distance_vector * 1.852
|
49
|
+
|
50
|
+
'V06: Vector#[booleans]': |
|
51
|
+
tailnum_vector[booleans]
|
52
|
+
|
53
|
+
'V07: Vector#[boolean_vector]': |
|
54
|
+
tailnum_vector[boolean_vector]
|
55
|
+
|
56
|
+
'V08: Vector#[index_vector]': |
|
57
|
+
tailnum_vector[index_vector]
|
58
|
+
|
59
|
+
'V09: Vector#replace': |
|
60
|
+
tailnum_vector.replace(booleans, replacer)
|
61
|
+
|
62
|
+
'V10: Vector#replace with broadcasting': |
|
63
|
+
tailnum_vector.replace(booleans, 'x')
|
data/doc/DataFrame.md
CHANGED
@@ -57,6 +57,10 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
57
57
|
```ruby
|
58
58
|
RedAmber::DataFrame.load("test/entity/with_header.csv")
|
59
59
|
```
|
60
|
+
|
61
|
+
```ruby
|
62
|
+
RedAmber::DataFrame.load("test/entity/without_header.csv", headers: [:x, :y, :z])
|
63
|
+
```
|
60
64
|
|
61
65
|
- from a string buffer
|
62
66
|
|
@@ -275,6 +279,7 @@ penguins.to_rover
|
|
275
279
|
|
276
280
|
- Shows some information about self in a transposed style.
|
277
281
|
- `tdr_str` returns same info as a String.
|
282
|
+
- `glimpse` is an alias. It is similar to dplyr's (or Polars's) `glimpse()`.
|
278
283
|
|
279
284
|
```ruby
|
280
285
|
require 'red_amber'
|
@@ -568,7 +573,7 @@ penguins.to_rover
|
|
568
573
|
[1, 2, 3]
|
569
574
|
```
|
570
575
|
|
571
|
-
### `slice ` -
|
576
|
+
### `slice ` - cut into slices of records -
|
572
577
|
|
573
578
|
Slice and select records (rows) to create a sub DataFrame.
|
574
579
|
|
@@ -601,11 +606,14 @@ penguins.to_rover
|
|
601
606
|
|
602
607
|
- Booleans as an argument
|
603
608
|
|
604
|
-
`slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
|
609
|
+
`filter(booleans)` or `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray. The booleans must be the same length as `size`.
|
610
|
+
|
611
|
+
Note: `slice(booleans)` is also accepted, to keep `slice` and `remove` orthogonal.
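As a rough illustration of what a boolean mask does, here is a plain-Ruby sketch that does not use RedAmber at all. `filter_rows` is a hypothetical helper, the sample values are made up, and the length check only mimics the rule that the booleans must match `size`.

```ruby
# Plain-Ruby sketch of boolean filtering; NOT RedAmber's implementation.
# `filter_rows` is a hypothetical helper name used only for illustration.
def filter_rows(rows, booleans)
  # One boolean per row, mirroring the "same length as `size`" rule.
  raise ArgumentError, 'booleans must be the same length as rows' unless rows.size == booleans.size

  # Keep the rows whose paired boolean is truthy.
  rows.zip(booleans).select { |_row, keep| keep }.map(&:first)
end

bill_length_mm = [39.1, 40.3, 36.7, 46.5]    # illustrative values only
mask = bill_length_mm.map { |v| v >= 40 }    # element-wise comparison, like `vector >= 40`
p filter_rows(bill_length_mm, mask)          # prints [40.3, 46.5]
```

The same mask could be inverted to sketch `remove`, which is why accepting booleans in both `slice` and `remove` keeps the two symmetric.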
|
605
612
|
|
606
613
|
```ruby
|
607
614
|
vector = penguins[:bill_length_mm]
|
608
|
-
penguins.
|
615
|
+
penguins.filter(vector >= 40)
|
616
|
+
# penguins.slice(vector >= 40) is also acceptable
|
609
617
|
|
610
618
|
# =>
|
611
619
|
#<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
|
@@ -833,14 +841,14 @@ penguins.to_rover
|
|
833
841
|
|
834
842
|
Assign new or updated variables (columns) and create an updated DataFrame.
|
835
843
|
|
836
|
-
- Variables with new keys will append new columns from
|
844
|
+
- Variables with new keys are appended as new columns on the right.
|
837
845
|
- Variables with existing keys will update corresponding vectors.
|
838
846
|
|
839
847
|
![assign method image](doc/../image/dataframe/assign.png)
|
840
848
|
|
841
849
|
- Variables as arguments
|
842
850
|
|
843
|
-
`assign(
|
851
|
+
`assign(key_value_pairs)` accepts pairs of key and values as parameters. `key_value_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either a `Vector`, an `Array` or an `Arrow::Array`.
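The two accepted shapes of `key_value_pairs` and the update-or-append behaviour can be sketched in plain Ruby. This is not RedAmber's code: columns are modelled as an ordered Hash of key to Array, `assign_columns` is a hypothetical helper, and the sample values are illustrative.

```ruby
# Plain-Ruby sketch, NOT RedAmber's implementation.
# `assign_columns` is a hypothetical helper; columns are a Hash of key => Array.
def assign_columns(columns, key_value_pairs, left: false)
  pairs = key_value_pairs.to_h   # accepts a Hash or [[key, values], ...]
  # Existing keys keep their position and take the new values.
  updated = columns.to_h { |key, values| [key, pairs.fetch(key, values)] }
  # Keys not present yet become new columns.
  added = pairs.reject { |key, _values| columns.key?(key) }
  left ? added.merge(updated) : updated.merge(added)
end

df = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }   # illustrative values
p assign_columns(df, { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }).keys
# prints [:name, :age, :brother]
p assign_columns(df, [[:new_index, [1, 2, 3]]], left: true).keys
# prints [:new_index, :name, :age]
```

The `left:` flag stands in for the difference between `assign` and `assign_left`; in RedAmber these are separate methods.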
|
844
852
|
|
845
853
|
```ruby
|
846
854
|
df = RedAmber::DataFrame.new(
|
@@ -857,12 +865,12 @@ penguins.to_rover
|
|
857
865
|
2 Hinata 28
|
858
866
|
|
859
867
|
# update :age and add :brother
|
860
|
-
df.assign
|
868
|
+
df.assign(
|
861
869
|
{
|
862
870
|
age: age + 29,
|
863
871
|
brother: ['Santa', nil, 'Momotaro']
|
864
872
|
}
|
865
|
-
|
873
|
+
)
|
866
874
|
|
867
875
|
# =>
|
868
876
|
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
|
@@ -932,7 +940,7 @@ penguins.to_rover
|
|
932
940
|
|
933
941
|
- Append from left
|
934
942
|
|
935
|
-
`assign_left` method accepts the same parameters and block as `assign`, but append new columns from
|
943
|
+
`assign_left` method accepts the same parameters and block as `assign`, but appends new columns on the left.
|
936
944
|
|
937
945
|
```ruby
|
938
946
|
df.assign_left(new_index: df.indices(1))
|
@@ -1302,7 +1310,10 @@ When the option `keep_key: true` used, the column `key` will be preserved.
|
|
1302
1310
|
- `join_keys` are keys shared by self and other to match with them.
|
1303
1311
|
- If `join_keys` are empty, common keys in self and other are chosen (natural join).
|
1304
1312
|
- If (common keys) > `join_keys`, duplicated keys are renamed by `suffix`.
|
1313
|
+
- If you want to match the columns with different names,
|
1314
|
+
use Hash for `join_keys` such as `{ left: :KEY1, right: :KEY2 }`.
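To make the idea of matching differently named key columns concrete, here is a minimal plain-Ruby sketch of an inner join over Arrays of row Hashes. It is not how RedAmber implements joins: `inner_join`, the column names and the data are assumptions for illustration, and dropping the right-hand key column from the result is a choice of this sketch only.

```ruby
# Plain-Ruby sketch of an inner join on differently named keys.
# NOT RedAmber's implementation; `inner_join` is a hypothetical helper.
def inner_join(left_rows, right_rows, left:, right:)
  # Index the right rows by their key value for quick lookup.
  index = right_rows.group_by { |row| row[right] }
  left_rows.flat_map do |l|
    # Pair each left row with every right row that has a matching key value.
    index.fetch(l[left], []).map do |r|
      l.merge(r.reject { |key, _value| key == right })
    end
  end
end

df    = [{ KEY1: 'A', X1: 1 }, { KEY1: 'B', X1: 2 }, { KEY1: 'C', X1: 3 }]
other = [{ KEY2: 'A', X2: true }, { KEY2: 'B', X2: false }, { KEY2: 'D', X2: nil }]

joined = inner_join(df, other, left: :KEY1, right: :KEY2)
p joined.map(&:values)   # prints [["A", 1, true], ["B", 2, false]]
```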
|
1305
1315
|
|
1316
|
+
These are dataframes to use in the examples of joins.
|
1306
1317
|
```ruby
|
1307
1318
|
df = DataFrame.new(
|
1308
1319
|
KEY: %w[A B C],
|
@@ -1450,6 +1461,8 @@ When the option `keep_key: true` used, the column `key` will be preserved.
|
|
1450
1461
|
1 B 4
|
1451
1462
|
2 D 5
|
1452
1463
|
```
|
1464
|
+
##### `set_operable?(other)`
|
1465
|
+
Checks whether the `types` of self and other are the same.
|
1453
1466
|
|
1454
1467
|
##### `intersect(other)`
|
1455
1468
|
|
@@ -1495,15 +1508,23 @@ When the option `keep_key: true` used, the column `key` will be preserved.
|
|
1495
1508
|
<string> <uint8>
|
1496
1509
|
1 B 2
|
1497
1510
|
2 C 3
|
1511
|
+
|
1512
|
+
other.difference(df)
|
1513
|
+
#=>
|
1514
|
+
#<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000040e0c>
|
1515
|
+
KEY1 KEY2
|
1516
|
+
<string> <uint8>
|
1517
|
+
0 B 4
|
1518
|
+
1 D 5
|
1498
1519
|
```
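The set operations above behave like set algebra over whole rows. The following plain-Ruby sketch models rows as Arrays and uses Ruby's built-in Array operators; it is not RedAmber's implementation. The helper names are hypothetical, and the sample rows are an assumption chosen to be consistent with the `difference` outputs shown.

```ruby
# Plain-Ruby sketch of row-wise set operations; NOT RedAmber's implementation.
# Helper names are hypothetical. Rows are modelled as [KEY1, KEY2] Arrays.
def set_operable?(rows, other_rows)
  # Crude stand-in for a type check: compare value classes of the first rows.
  rows.first.to_a.map(&:class) == other_rows.first.to_a.map(&:class)
end

def union_rows(rows, other_rows)
  rows | other_rows        # rows that appear in either side, without duplicates
end

def intersect_rows(rows, other_rows)
  rows & other_rows        # rows that appear in both sides
end

def difference_rows(rows, other_rows)
  rows - other_rows        # rows that appear in the left side only
end

df_rows    = [['A', 1], ['B', 2], ['C', 3]]   # assumed sample data
other_rows = [['A', 1], ['B', 4], ['D', 5]]   # assumed sample data

p set_operable?(df_rows, other_rows)      # prints true
p intersect_rows(df_rows, other_rows)     # prints [["A", 1]]
p difference_rows(df_rows, other_rows)    # prints [["B", 2], ["C", 3]]
p difference_rows(other_rows, df_rows)    # prints [["B", 4], ["D", 5]]
```

Note that `difference` is not symmetric: swapping the receiver and the argument gives a different result, as the last two lines show.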
|
1499
1520
|
|
1500
1521
|
## Binding
|
1501
1522
|
|
1502
1523
|
### `concatenate(other)`
|
1503
1524
|
|
1504
|
-
Concatenate another DataFrame or Table onto the bottom of self. The
|
1525
|
+
Concatenate another DataFrame or Table onto the bottom of self. The types of other must be the same as self.
|
1505
1526
|
|
1506
|
-
The alias is `concat`.
|
1527
|
+
The aliases are `concat` and `bind_rows`.
|
1507
1528
|
|
1508
1529
|
An array of DataFrames or Tables is also acceptable as other.
|
1509
1530
|
|
@@ -1535,9 +1556,11 @@ When the option `keep_key: true` used, the column `key` will be preserved.
|
|
1535
1556
|
3 4 D
|
1536
1557
|
```
|
1537
1558
|
|
1538
|
-
### `merge(other)`
|
1559
|
+
### `merge(*other)`
|
1560
|
+
|
1561
|
+
Merge another DataFrame or Table onto the right side of self, binding its columns. The size of other must be the same as self. Self and other must not share the same key.
|
1539
1562
|
|
1540
|
-
|
1563
|
+
The alias is `bind_cols`.
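The two constraints stated for `merge` (equal size, no shared keys) can be sketched in plain Ruby, with columns modelled as a Hash of key to Array. This is an illustration under those assumptions, not RedAmber's implementation; `bind_cols_sketch` is a hypothetical helper and the error messages are invented.

```ruby
# Plain-Ruby sketch of column binding; NOT RedAmber's implementation.
# `bind_cols_sketch` is a hypothetical helper; columns are a Hash of key => Array.
def bind_cols_sketch(columns, *others)
  others.reduce(columns) do |acc, other|
    # Self and other must not share the same key.
    shared = acc.keys & other.keys
    raise ArgumentError, "shared keys: #{shared.inspect}" unless shared.empty?

    # Every column must have the same number of records.
    sizes = (acc.values + other.values).map(&:size).uniq
    raise ArgumentError, 'sizes differ' if sizes.size > 1

    acc.merge(other)   # new columns are placed to the right
  end
end

df    = { x: [1, 2], y: %w[A B] }   # illustrative values
other = { z: [1.0, 2.0] }
p bind_cols_sketch(df, other).keys   # prints [:x, :y, :z]
```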
|
1541
1564
|
|
1542
1565
|
```ruby
|
1543
1566
|
df
|
@@ -0,0 +1,65 @@
|
|
1
|
+
# Comparison of DataFrames
|
2
|
+
|
3
|
+
Compare basic features of RedAmber with Python
|
4
|
+
[pandas](https://pandas.pydata.org/),
|
5
|
+
R [Tidyverse](https://www.tidyverse.org/) and
|
6
|
+
Julia [Dataframes](https://dataframes.juliadata.org/stable/).
|
7
|
+
|
8
|
+
## Select columns (variables)
|
9
|
+
|
10
|
+
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|
11
|
+
|--- |--- |--- |--- |--- |
|
12
|
+
| Select columns as a dataframe | pick, drop, [] | dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select |
|
13
|
+
| Select a column as a vector | [], v | dplyr::pull, [, x] | [], loc[], iloc[] | [!, :x] |
|
14
|
+
| Move columns to a new position | pick, [] | relocate | [], reindex, loc[], iloc[] | select,transform |
|
15
|
+
|
16
|
+
## Select rows (records, observations)
|
17
|
+
|
18
|
+
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|
19
|
+
|--- |--- |--- |--- |--- |
|
20
|
+
| Select rows that meet logical criteria as a dataframe | slice, remove, [] | dplyr::filter | [], filter, query, loc[] | filter |
|
21
|
+
| Select rows by position as a dataframe | slice, remove, [] | dplyr::slice | iloc[], drop | subset |
|
22
|
+
| Move rows to a new position | slice, [] | dplyr::filter, dplyr::slice | reindex, loc[], iloc[] | permute |
|
23
|
+
|
24
|
+
## Update columns / create new columns
|
25
|
+
|
26
|
+
|Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|
27
|
+
|--- |--- |--- |--- |--- |
|
28
|
+
| Update existing columns | assign | dplyr::mutate | assign, []= | mapcols |
|
29
|
+
| Create new columns | assign, assign_left | dplyr::mutate | apply | insertcols,.+ |
|
30
|
+
| Compute new columns, drop others | new | transmute | (dfply:)transmute | transform,insertcols,mapcols |
|
31
|
+
| Rename columns | rename | dplyr::rename, dplyr::rename_with, purrr::set_names | rename, set_axis | rename |
|
32
|
+
| Sort dataframe | sort | dplyr::arrange | sort_values | sort |
|
33
|
+
|
34
|
+
## Reshape dataframe
|
35
|
+
|
36
|
+
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|
37
|
+
|--- |--- |--- |--- |--- |
|
38
|
+
| Gather columns into rows (create a longer dataframe) | to_long | tidyr::pivot_longer | melt | stack |
|
39
|
+
| Spread rows into columns (create a wider dataframe) | to_wide | tidyr::pivot_wider | pivot | unstack |
|
40
|
+
| transpose a wide dataframe | transpose | transpose, t | transpose, T | permutedims |
|
41
|
+
|
42
|
+
## Grouping
|
43
|
+
|
44
|
+
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|
45
|
+
|--- |--- |--- |--- |--- |
|
46
|
+
|Grouping | group, group.summarize | dplyr::group_by %>% dplyr::summarise | groupby.agg | combine,groupby |
|
47
|
+
|
48
|
+
## Combine dataframes or tables
|
49
|
+
|
50
|
+
| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
|
51
|
+
|--- |--- |--- |--- |--- |
|
52
|
+
| Combine additional columns | merge, bind_cols | dplyr::bind_cols | concat | combine |
|
53
|
+
| Combine additional rows | concatenate, concat, bind_rows | dplyr::bind_rows | concat | transform |
|
54
|
+
| Join right to left, leaving only the matching rows| join, inner_join | dplyr::inner_join | merge | innerjoin |
|
55
|
+
| Join right to left, leaving all rows | join, full_join, outer_join | dplyr::full_join | merge | outerjoin |
|
56
|
+
| Join matching values to left from right | join, left_join | dplyr::left_join | merge | leftjoin |
|
57
|
+
| Join matching values from left to right | join, right_join | dplyr::right_join | merge | rightjoin |
|
58
|
+
| Return rows of left that have a match in right | join, semi_join | dplyr::semi_join | [isin] | semijoin |
|
59
|
+
| Return rows of left that do not have a match in right | join, anti_join | dplyr::anti_join | [isin] | antijoin |
|
60
|
+
| Collect rows that appear in left or right | union | dplyr::union | merge | |
|
61
|
+
| Collect rows that appear in both left and right | intersect | dplyr::intersect | merge | |
|
62
|
+
| Collect rows that appear in left but not right | difference, setdiff | dplyr::setdiff | merge | |
|
63
|
+
|
64
|
+
|
65
|
+
|
data/doc/SubFrames.md
ADDED
@@ -0,0 +1,11 @@
|
|
1
|
+
# SubFrames
|
2
|
+
|
3
|
+
`SubFrames` represents a collection of subsets of a DataFrame.
|
4
|
+
It holds an Array of indices, `#subset_indices`, from which an Array of sub DataFrames can be created.
|
5
|
+
The concept covers the `group` operation of a DataFrame and rolling window operations, and has broader capabilities.
|
6
|
+
|
7
|
+
This feature is experimental. It may be removed or changed in the future.
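The idea of describing subsets by an Array of index lists can be sketched in plain Ruby. This is only a conceptual model, not the `SubFrames` API: rows are plain Arrays, `sub_frames_sketch` is a hypothetical helper, and the data is made up. It shows how both grouping and a rolling window reduce to different index lists over the same rows.

```ruby
# Plain-Ruby sketch of the subset-indices idea; NOT the SubFrames API.
# `sub_frames_sketch` is a hypothetical helper; rows are plain Arrays.
def sub_frames_sketch(rows, subset_indices)
  # Each index list selects the rows of one subset.
  subset_indices.map { |indices| rows.values_at(*indices) }
end

rows = [['A', 1], ['B', 2], ['A', 3], ['B', 4]]   # illustrative data

# Grouping: collect the row indices for each distinct value of the first column.
group_indices = rows.each_index.group_by { |i| rows[i][0] }.values
p group_indices                            # prints [[0, 2], [1, 3]]
p sub_frames_sketch(rows, group_indices)
# prints [[["A", 1], ["A", 3]], [["B", 2], ["B", 4]]]

# Rolling window of size 2: consecutive, overlapping index ranges.
window_indices = (0...rows.size).each_cons(2).to_a
p window_indices                           # prints [[0, 1], [1, 2], [2, 3]]
```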
|
8
|
+
|
9
|
+
## Create SubFrames
|
10
|
+
|
11
|
+
## Properties of SubFrames
|