red_amber 0.2.3 → 0.4.0

Files changed (42)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +133 -51
  3. data/.yardopts +2 -0
  4. data/CHANGELOG.md +203 -1
  5. data/Gemfile +2 -1
  6. data/LICENSE +1 -1
  7. data/README.md +61 -45
  8. data/benchmark/basic.yml +11 -4
  9. data/benchmark/combine.yml +3 -4
  10. data/benchmark/dataframe.yml +62 -0
  11. data/benchmark/group.yml +7 -1
  12. data/benchmark/reshape.yml +6 -2
  13. data/benchmark/vector.yml +63 -0
  14. data/doc/DataFrame.md +35 -12
  15. data/doc/DataFrame_Comparison.md +65 -0
  16. data/doc/SubFrames.md +11 -0
  17. data/doc/Vector.md +295 -1
  18. data/doc/yard-templates/default/fulldoc/html/css/common.css +6 -0
  19. data/lib/red_amber/data_frame.rb +537 -68
  20. data/lib/red_amber/data_frame_combinable.rb +776 -123
  21. data/lib/red_amber/data_frame_displayable.rb +248 -18
  22. data/lib/red_amber/data_frame_indexable.rb +122 -19
  23. data/lib/red_amber/data_frame_loadsave.rb +81 -10
  24. data/lib/red_amber/data_frame_reshaping.rb +216 -21
  25. data/lib/red_amber/data_frame_selectable.rb +781 -120
  26. data/lib/red_amber/data_frame_variable_operation.rb +561 -85
  27. data/lib/red_amber/group.rb +195 -21
  28. data/lib/red_amber/helper.rb +114 -32
  29. data/lib/red_amber/refinements.rb +206 -0
  30. data/lib/red_amber/subframes.rb +1066 -0
  31. data/lib/red_amber/vector.rb +435 -58
  32. data/lib/red_amber/vector_aggregation.rb +312 -0
  33. data/lib/red_amber/vector_binary_element_wise.rb +387 -0
  34. data/lib/red_amber/vector_selectable.rb +321 -69
  35. data/lib/red_amber/vector_unary_element_wise.rb +436 -0
  36. data/lib/red_amber/vector_updatable.rb +397 -24
  37. data/lib/red_amber/version.rb +2 -1
  38. data/lib/red_amber.rb +15 -1
  39. data/red_amber.gemspec +4 -3
  40. metadata +19 -11
  41. data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
  42. data/lib/red_amber/vector_functions.rb +0 -294
data/README.md CHANGED
@@ -1,28 +1,29 @@
1
1
  # RedAmber
2
2
 
3
- [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
4
- [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml)
3
+ [![Gem Version](https://img.shields.io/gem/v/red_amber?color=brightgreen)](https://rubygems.org/gems/red_amber)
4
+ [![CI](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml)
5
+ [![Maintainability](https://api.codeclimate.com/v1/badges/b8a745047045d2f49daa/maintainability)](https://codeclimate.com/github/heronshoes/red_amber/maintainability)
6
+ [![Test coverage](https://api.codeclimate.com/v1/badges/b8a745047045d2f49daa/test_coverage)](https://codeclimate.com/github/heronshoes/red_amber/test_coverage)
7
+ [![Doc](https://img.shields.io/badge/docs-latest-blue)](https://heronshoes.github.io/red_amber/)
5
8
  [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
6
9
 
7
10
  A simple dataframe library for Ruby.
8
11
 
9
- - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
12
+ - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
13
+ [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en) [![Gem Version](https://img.shields.io/gem/v/red-arrow?color=brightgreen)](https://rubygems.org/gems/red-arrow)
10
14
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
11
15
 
12
- ![screenshot from jupyterlab](doc/image/screenshot.png)
16
+ ![screenshot from jupyterlab](https://raw.githubusercontent.com/heronshoes/red_amber/main/doc/image/screenshot.png)
13
17
 
14
18
  ## Requirements
19
+ ### Ruby
20
+ Supported Ruby version is >= 3.0 (since RedAmber 0.3.0).
21
+ - I decided to drop support for Ruby 2.7 without waiting for its EOL. See [Release note for v0.3.0](https://github.com/heronshoes/red_amber/discussions/162) for details.
15
22
 
16
- Supported Ruby version is >= 2.7.
17
-
18
- Since v0.2.0, this library uses pattern matching which is an experimental feature in 2.7 . It is usable but a warning message will be shown in 2.7 .
19
- I recommend Ruby 3 for performance.
20
-
23
+ ### Libraries
21
24
  ```ruby
22
- # Libraries required
23
- gem 'red-arrow', '~> 10.0.0' # Requires Apache Arrow (see installation below)
24
-
25
- gem 'red-parquet', '~> 10.0.0' # Optional, if you use IO from/to parquet
25
+ gem 'red-arrow', '~> 11.0.0' # Requires Apache Arrow (see installation below)
26
+ gem 'red-parquet', '~> 11.0.0' # Optional, if you use IO from/to parquet
26
27
  gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
27
28
  ```
28
29
 
@@ -30,61 +31,71 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
30
31
 
31
32
  Install requirements before you install Red Amber.
32
33
 
33
- - Apache Arrow (~> 10.0.0)
34
- - Apache Arrow GLib (~> 10.0.0)
35
- - Apache Parquet GLib (~> 10.0.0) # If you use IO from/to parquet
34
+ - Apache Arrow (~> 11.0.0)
35
+ - Apache Arrow GLib (~> 11.0.0)
36
+ - Apache Parquet GLib (~> 11.0.0) # If you use IO from/to parquet
36
37
 
37
- See [Apache Arrow install document](https://arrow.apache.org/install/).
38
+ See [Apache Arrow install document](https://arrow.apache.org/install/).
38
39
 
39
40
  - Minimum installation example for the latest Ubuntu:
40
- ```
41
- sudo apt update
42
- sudo apt install -y -V ca-certificates lsb-release wget
43
- wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
44
- sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
45
- sudo apt update
46
- sudo apt install -y -V libarrow-dev
47
- sudo apt install -y -V libarrow-glib-dev
48
- ```
49
- - On macOS, you can install Apache Arrow C++ library using Homebrew:
50
-
51
- ```
52
- brew install apache-arrow
53
- ```
54
-
55
- and GLib (C) package with:
56
-
57
- ```
58
- brew install apache-arrow-glib
59
- ```
41
+
42
+ ```
43
+ sudo apt update
44
+ sudo apt install -y -V ca-certificates lsb-release wget
45
+ wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
46
+ sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
47
+ sudo apt update
48
+ sudo apt install -y -V libarrow-dev
49
+ sudo apt install -y -V libarrow-glib-dev
50
+ ```
51
+
52
+ - On Fedora 38 (Rawhide):
53
+
54
+ ```
55
+ sudo dnf update
56
+ sudo dnf -y install gcc-c++ libarrow-devel libarrow-glib-devel ruby-devel
57
+ ```
58
+
59
+ - On macOS, using Homebrew:
60
+
61
+ ```
62
+ brew install apache-arrow
63
+ brew install apache-arrow-glib
64
+ ```
60
65
 
61
66
  If you prepared Apache Arrow, add these lines to your Gemfile:
62
67
 
63
68
  ```ruby
64
- gem 'red-arrow', '~> 10.0.0'
69
+ gem 'red-arrow', '~> 11.0.0'
65
70
  gem 'red_amber'
66
- gem 'red-parquet', '~> 10.0.0' # Optional, if you use IO from/to parquet
71
+ gem 'red-parquet', '~> 11.0.0' # Optional, if you use IO from/to parquet
67
72
  gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
68
73
  gem 'red-datasets-arrow' # Optional, recommended if you use Red Datasets
69
74
  gem 'red-arrow-numo-narray' # Optional, recommended if you use inputs from Numo::NArray
70
75
  ```
71
76
 
72
- And then execute `bundle install` or install it yourself as `gem install red_amber`.
77
+ Then run `bundle install`, or install the gems yourself, for example with `gem install red_amber`.
73
78
 
74
79
  ## Docker image and Jupyter Notebook
75
80
 
76
- [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
81
+ [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to Kenta Murata).
77
82
 
78
83
  Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb).
79
84
  [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)
80
85
 
86
+ ## Comparison of DataFrames
87
+
88
+ Comparison of basic features of RedAmber with Python
89
+ [pandas](https://pandas.pydata.org/),
90
+ R [Tidyverse](https://www.tidyverse.org/) and
91
+ Julia [Dataframes](https://dataframes.juliadata.org/stable/) is [here](doc/DataFrame_Comparison.md) (Thanks to Benson Muite).
81
92
 
82
93
  ## Data frame in `RedAmber`
83
94
 
84
95
  Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
85
96
  The entity is a Red Arrow's Table object.
86
97
 
87
- ![dataframe model of RedAmber](doc/image/dataframe_model.png)
98
+ ![dataframe model of RedAmber](https://raw.githubusercontent.com/heronshoes/red_amber/main/doc/image/dataframe_model.png)
88
99
 
89
100
  Let's load the library and try some examples.
90
101
 
@@ -95,6 +106,11 @@ include RedAmber
95
106
 
96
107
  ### Example: diamonds dataset
97
108
 
109
+ First run (if you have not installed it yet) `
110
+ gem install red-datasets-arrow
111
+ `
112
+ then
113
+
98
114
  ```ruby
99
115
  require 'datasets-arrow' # to load sample data
100
116
 
@@ -120,7 +136,7 @@ For example, we can compute mean prices per cut for the data larger than 1 carat
120
136
 
121
137
  ```ruby
122
138
  df = diamonds
123
- .slice { carat > 1 }
139
+ .slice { carat > 1 } # or use #filter instead of #slice
124
140
  .group(:cut)
125
141
  .mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
126
142
  .sort('-mean(price)')
@@ -169,7 +185,7 @@ starwars
169
185
  .drop(0) # delete unnecessary index column
170
186
  .remove { species == "NA" } # delete unnecessary rows
171
187
  .group(:species) { [count(:species), mean(:height, :mass)] }
172
- .slice { count > 1 }
188
+ .slice { count > 1 } # or use #filter instead of slice
173
189
 
174
190
  # =>
175
191
  #<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
@@ -196,7 +212,7 @@ See [Vector.md](doc/Vector.md) for details.
196
212
 
197
213
  ## Jupyter notebook
198
214
 
199
- [83 Examples of Red Amber](https://github.com/heronshoes/docker-stacks/blob/RedAmber-binder/binder/examples_of_red_amber.ipynb)
215
+ [Examples of Red Amber](https://github.com/heronshoes/docker-stacks/blob/RedAmber-binder/binder/examples_of_red_amber.ipynb)
200
216
  ([raw file](https://raw.githubusercontent.com/heronshoes/docker-stacks/RedAmber-binder/binder/examples_of_red_amber.ipynb)) shows more examples in jupyter notebook.
201
217
 
202
218
  You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
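Reviewer note: the README hunks above move the `red-arrow` and `red-parquet` requirements from `~> 10.0.0` to `~> 11.0.0`. As a minimal sketch of what that pessimistic constraint admits, the following uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The helper name `allowed_by?` and the sample version strings are illustrative and do not come from the gem.

```ruby
require 'rubygems'

# Hypothetical helper: does `version` satisfy the given constraint string?
def allowed_by?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# `~> 11.0.0` accepts >= 11.0.0 and < 11.1.0, i.e. patch releases only.
%w[10.0.1 11.0.0 11.0.5 11.1.0 12.0.0].each do |v|
  puts "#{v}: #{allowed_by?('~> 11.0.0', v)}"
end
```

Only the 11.0.x versions pass, so an installed Apache Arrow 10 or 12 would not match the Gemfile lines shown in the diff.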
data/benchmark/basic.yml CHANGED
@@ -1,10 +1,17 @@
1
+ loop_count: 3
2
+
1
3
  contexts:
2
4
  - name: HEAD
3
5
  prelude: |
4
6
  $LOAD_PATH.unshift(File.expand_path('lib'))
5
- - gems:
7
+ - name: 0.3.0
8
+ gems:
9
+ red_amber: 0.3.0
10
+ - name: 0.2.0
11
+ gems:
6
12
  red_amber: 0.2.0
7
- - gems:
13
+ - name: 0.1.5
14
+ gems:
8
15
  red_amber: 0.1.5
9
16
 
10
17
  prelude: |
@@ -21,8 +28,8 @@ benchmark:
21
28
  'B01: Pick([]) by a key name': |
22
29
  df[:flight]
23
30
 
24
- 'B02: Pick by index': |
25
- df[df.keys[9]]
31
+ 'B02a: Pick([]) by key names': |
32
+ df[:carrier, :flight]
26
33
 
27
34
  'B03: Pick by key names': |
28
35
  df.pick(:carrier, :flight)
@@ -1,13 +1,12 @@
1
- # --repeat-count 3
2
-
3
1
  loop_count: 3
4
2
 
5
3
  contexts:
6
4
  - name: HEAD
7
5
  prelude: |
8
6
  $LOAD_PATH.unshift(File.expand_path('lib'))
9
- # - gems:
10
- # red_amber: 0.2.3
7
+ - name: 0.3.0
8
+ gems:
9
+ red_amber: 0.3.0
11
10
 
12
11
  prelude: |
13
12
  require 'red_amber'
@@ -0,0 +1,62 @@
1
+ loop_count: 3
2
+
3
+ contexts:
4
+ - name: HEAD
5
+ prelude: |
6
+ $LOAD_PATH.unshift(File.expand_path('lib'))
7
+ - name: 0.3.0
8
+ gems:
9
+ red_amber: 0.3.0
10
+ - name: 0.2.0
11
+ gems:
12
+ red_amber: 0.2.0
13
+
14
+ prelude: |
15
+ require 'red_amber'
16
+ require 'datasets-arrow'
17
+
18
+ diamonds = RedAmber::DataFrame.new(Datasets::Diamonds.new.to_arrow)
19
+
20
+ starwars = RedAmber::DataFrame.new(Datasets::Rdataset.new('dplyr', 'starwars').to_arrow)
21
+
22
+ uri = URI("https://raw.githubusercontent.com/heronshoes/red_amber/master/test/entity/import_cars.tsv")
23
+ import_cars = RedAmber::DataFrame.load(uri)
24
+
25
+ ds = Datasets::Rdataset.new('openintro', 'simpsons_paradox_covid')
26
+ simpsons_paradox_covid = RedAmber::DataFrame.new(ds.to_arrow)
27
+
28
+ benchmark:
29
+ 'D01: Diamonds test': |
30
+ diamonds
31
+ .slice { v(:carat) > 1 }
32
+ .pick(:cut, :price)
33
+ .group(:cut)
34
+ .mean
35
+ .sort('-mean(price)')
36
+ .rename('mean(price)': :mean_price_USD)
37
+ .assign { [:mean_price_JPY, v(:mean_price_USD) * 110.0] }
38
+
39
+ 'D02: Starwars test': |
40
+ starwars
41
+ .drop { keys.select { |key| key.end_with?('color') } }
42
+ .remove { v(:species) == 'NA' }
43
+ .group(:species) { [count(:species), mean(:height, :mass)] }
44
+ .slice { v(:count) > 1 }
45
+
46
+ 'D03: Import cars test': |
47
+ import_cars
48
+ .to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
49
+ .to_wide(name: :Manufacturer, value: :Num_of_imported)
50
+ .transpose
51
+
52
+ 'D04: Simpsons paradox test': |
53
+ simpsons_paradox_covid[simpsons_paradox_covid[:age_group] == 'under 50']
54
+ .group(:vaccine_status, :outcome)
55
+ .count
56
+ .then { |df| df.to_wide(name: :vaccine_status, value: df.keys[-1]) }
57
+ .assign do
58
+ [
59
+ [:'vaccinated_%', (100.0 * v(:vaccinated) / v(:vaccinated).sum)],
60
+ [:'unvaccinated_%', (100.0 * v(:unvaccinated) / v(:unvaccinated).sum)]
61
+ ]
62
+ end
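Reviewer note: benchmark `D01` above chains `slice`, `pick`, `group`, `mean` and `sort`. To make explicit what that pipeline computes, here is a plain-Ruby sketch over an Array of Hashes. It does not use RedAmber at all; the method name `mean_price_per_cut` and the four sample rows are made up for illustration.

```ruby
# Plain-Ruby stand-in for:
#   slice { carat > 1 } -> group(:cut) -> mean(:price) -> sort('-mean(price)')
def mean_price_per_cut(rows)
  rows
    .select { |row| row[:carat] > 1 }   # keep rows with carat > 1
    .group_by { |row| row[:cut] }       # one bucket per cut
    .map { |cut, group| [cut, group.sum { |r| r[:price] }.fdiv(group.size)] }
    .sort_by { |_cut, mean| -mean }     # descending mean price
end

# Hypothetical sample data, not taken from the diamonds dataset.
rows = [
  { carat: 1.2, cut: 'Ideal', price: 6000 },
  { carat: 1.5, cut: 'Ideal', price: 8000 },
  { carat: 1.1, cut: 'Fair',  price: 4000 },
  { carat: 0.5, cut: 'Fair',  price: 1000 } # dropped by the carat filter
]

mean_price_per_cut(rows).each { |cut, mean| puts "#{cut}: #{mean}" }
```

The benchmark additionally renames the result column and derives a second column from it, which would correspond to one more `map` step here.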
data/benchmark/group.yml CHANGED
@@ -1,8 +1,14 @@
1
+ loop_count: 3
2
+
1
3
  contexts:
2
4
  - name: HEAD
3
5
  prelude: |
4
6
  $LOAD_PATH.unshift(File.expand_path('lib'))
5
- - gems:
7
+ - name: 0.3.0
8
+ gems:
9
+ red_amber: 0.3.0
10
+ - name: 0.2.2
11
+ gems:
6
12
  red_amber: 0.2.2
7
13
 
8
14
  prelude: |
@@ -1,10 +1,14 @@
1
- # --repeat-count 3
1
+ loop_count: 3
2
2
 
3
3
  contexts:
4
4
  - name: HEAD
5
5
  prelude: |
6
6
  $LOAD_PATH.unshift(File.expand_path('lib'))
7
- - gems:
7
+ - name: 0.3.0
8
+ gems:
9
+ red_amber: 0.3.0
10
+ - name: 0.2.2
11
+ gems:
8
12
  red_amber: 0.2.2
9
13
 
10
14
  prelude: |
@@ -0,0 +1,63 @@
1
+ loop_count: 10
2
+
3
+ contexts:
4
+ - name: HEAD
5
+ prelude: |
6
+ $LOAD_PATH.unshift(File.expand_path('lib'))
7
+ - name: 0.3.0
8
+ gems:
9
+ red_amber: 0.3.0
10
+ - name: 0.2.0
11
+ gems:
12
+ red_amber: 0.2.0
13
+
14
+ prelude: |
15
+ require 'red_amber'
16
+ include RedAmber
17
+ require 'datasets-arrow'
18
+
19
+ ds = Datasets::Rdataset.new('nycflights13', 'flights')
20
+ flights = RedAmber::DataFrame.new(ds.to_arrow)
21
+ df = flights.slice { flights[:month] <= 6 }
22
+
23
+ tailnum_vector = df[:tailnum]
24
+ distance_vector = df[:distance]
25
+
26
+ strings = tailnum_vector.to_a
27
+ arrow_array = tailnum_vector.data
28
+ integers = df[:dep_delay].to_a
29
+ boolean_vector = df[:air_time].is_nil
30
+ index_vector = Vector.new(0...boolean_vector.size).filter(boolean_vector)
31
+ replacer = index_vector.data.map(&:to_s)
32
+ booleans = boolean_vector.to_a
33
+
34
+ benchmark:
35
+ 'V01: Vector.new from integer Array': |
36
+ Vector.new(integers)
37
+
38
+ 'V02: Vector.new from string Array': |
39
+ Vector.new(strings)
40
+
41
+ 'V03: Vector.new from boolean Vector': |
42
+ Vector.new(boolean_vector)
43
+
44
+ 'V04: Vector#mean': |
45
+ distance_vector.mean
46
+
47
+ 'V05: Vector#*': |
48
+ distance_vector * 1.852
49
+
50
+ 'V06: Vector#[booleans]': |
51
+ tailnum_vector[booleans]
52
+
53
+ 'V07: Vector#[boolean_vector]': |
54
+ tailnum_vector[boolean_vector]
55
+
56
+ 'V08: Vector#[index_vector]': |
57
+ tailnum_vector[index_vector]
58
+
59
+ 'V09: Vector#replace': |
60
+ tailnum_vector.replace(booleans, replacer)
61
+
62
+ 'V10: Vector#replace with broadcasting': |
63
+ tailnum_vector.replace(booleans, 'x')
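Reviewer note: benchmarks `V06` to `V10` above compare selecting by booleans with selecting by indices, and replacing with a per-position replacer versus a single broadcast value. The plain-Ruby sketch below models those semantics on Arrays only. It is not RedAmber code: `mask_to_indices`, `select_by_mask` and `replace_where` are hypothetical helper names, and the way the replacer is consumed (one replacement per `true` position, or one scalar for all of them) is an assumption read off the benchmark prelude.

```ruby
# Indices of the true positions in a boolean mask.
def mask_to_indices(mask)
  mask.each_index.select { |i| mask[i] }
end

# Selecting by mask is the same as selecting by the indices derived from it.
def select_by_mask(values, mask)
  raise ArgumentError, 'mask must be the same length as values' unless mask.size == values.size

  values.values_at(*mask_to_indices(mask))
end

# Replace the masked positions; a non-Array replacer is broadcast to every position.
def replace_where(values, mask, replacer)
  indices = mask_to_indices(mask)
  replacements = replacer.is_a?(Array) ? replacer : Array.new(indices.size, replacer)
  unless replacements.size == indices.size
    raise ArgumentError, 'replacer size must match the number of true values'
  end

  result = values.dup
  indices.each_with_index { |pos, k| result[pos] = replacements[k] }
  result
end

# Hypothetical sample values.
tailnums = %w[N101 N202 N303 N404]
mask = [false, true, false, true]

p select_by_mask(tailnums, mask)
p replace_where(tailnums, mask, %w[1 3])
p replace_where(tailnums, mask, 'x')
```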
data/doc/DataFrame.md CHANGED
@@ -57,6 +57,10 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
57
57
  ```ruby
58
58
  RedAmber::DataFrame.load("test/entity/with_header.csv")
59
59
  ```
60
+
61
+ ```ruby
62
+ RedAmber::DataFrame.load("test/entity/without_header.csv", headers: [:x, :y, :z])
63
+ ```
60
64
 
61
65
  - from a string buffer
62
66
 
@@ -275,6 +279,7 @@ penguins.to_rover
275
279
 
276
280
  - Shows some information about self in a transposed style.
277
281
  - `tdr_str` returns same info as a String.
282
+ - `glimpse` is an alias. It is similar to dplyr's (or Polars's) `glimpse()`.
278
283
 
279
284
  ```ruby
280
285
  require 'red_amber'
@@ -568,7 +573,7 @@ penguins.to_rover
568
573
  [1, 2, 3]
569
574
  ```
570
575
 
571
- ### `slice ` - slice and select records -
576
+ ### `slice ` - cut into slices of records -
572
577
 
573
578
  Slice and select records (rows) to create a sub DataFrame.
574
579
 
@@ -601,11 +606,14 @@ penguins.to_rover
601
606
 
602
607
  - Booleans as an argument
603
608
 
604
- `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
609
+ `filter(booleans)` or `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an `Arrow::BooleanArray`. The booleans must be the same length as `size`.
610
+
611
+ Note: `slice(booleans)` is also accepted, to keep `slice` and `remove` orthogonal.
605
612
 
606
613
  ```ruby
607
614
  vector = penguins[:bill_length_mm]
608
- penguins.slice(vector >= 40)
615
+ penguins.filter(vector >= 40)
616
+ # penguins.slice(vector >= 40) is also acceptable
609
617
 
610
618
  # =>
611
619
  #<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
@@ -833,14 +841,14 @@ penguins.to_rover
833
841
 
834
842
  Assign new or updated variables (columns) and create an updated DataFrame.
835
843
 
836
- - Variables with new keys will append new columns from the right.
844
+ - Variables with new keys are appended as new columns on the right.
837
845
  - Variables with existing keys will update corresponding vectors.
838
846
 
839
847
  ![assign method image](doc/../image/dataframe/assign.png)
840
848
 
841
849
  - Variables as arguments
842
850
 
843
- `assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is ether `Vector`, `Array` or `Arrow::Array`.
851
+ `assign(key_value_pairs)` accepts pairs of key and values as parameters. `key_value_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`.
844
852
 
845
853
  ```ruby
846
854
  df = RedAmber::DataFrame.new(
@@ -857,12 +865,12 @@ penguins.to_rover
857
865
  2 Hinata 28
858
866
 
859
867
  # update :age and add :brother
860
- df.assign do
868
+ df.assign(
861
869
  {
862
870
  age: age + 29,
863
871
  brother: ['Santa', nil, 'Momotaro']
864
872
  }
865
- end
873
+ )
866
874
 
867
875
  # =>
868
876
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
@@ -932,7 +940,7 @@ penguins.to_rover
932
940
 
933
941
  - Append from left
934
942
 
935
- `assign_left` method accepts the same parameters and block as `assign`, but append new columns from leftside.
943
+ The `assign_left` method accepts the same parameters and block as `assign`, but appends new columns on the left.
936
944
 
937
945
  ```ruby
938
946
  df.assign_left(new_index: df.indices(1))
@@ -1302,7 +1310,10 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1302
1310
  - `join_keys` are keys shared by self and other to match with them.
1303
1311
  - If `join_keys` are empty, common keys in self and other are chosen (natural join).
1304
1312
  - If (common keys) > `join_keys`, duplicated keys are renamed by `suffix`.
1313
+ - If you want to match the columns with different names,
1314
+ use a Hash for `join_keys` such as `{ left: :KEY1, right: :KEY2 }`.
1305
1315
 
1316
+ These are dataframes to use in the examples of joins.
1306
1317
  ```ruby
1307
1318
  df = DataFrame.new(
1308
1319
  KEY: %w[A B C],
@@ -1450,6 +1461,8 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1450
1461
  1 B 4
1451
1462
  2 D 5
1452
1463
  ```
1464
+ ##### `set_operable?(other)`
1465
+ Check if the `types` of self and other are the same.
1453
1466
 
1454
1467
  ##### `intersect(other)`
1455
1468
 
@@ -1495,15 +1508,23 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1495
1508
  <string> <uint8>
1496
1509
  1 B 2
1497
1510
  2 C 3
1511
+
1512
+ other.difference(df)
1513
+ #=>
1514
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000040e0c>
1515
+ KEY1 KEY2
1516
+ <string> <uint8>
1517
+ 0 B 4
1518
+ 1 D 5
1498
1519
  ```
1499
1520
 
1500
1521
  ## Binding
1501
1522
 
1502
1523
  ### `concatenate(other)`
1503
1524
 
1504
- Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
1525
+ Concatenate another DataFrame or Table onto the bottom of self. The types of other must be the same as self.
1505
1526
 
1506
- The alias is `concat`.
1527
+ The aliases are `concat` and `bind_rows`.
1507
1528
 
1508
1529
  An array of DataFrames or Tables is also acceptable as other.
1509
1530
 
@@ -1535,9 +1556,11 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1535
1556
  3 4 D
1536
1557
  ```
1537
1558
 
1538
- ### `merge(other)`
1559
+ ### `merge(*other)`
1560
+
1561
+ Merge another DataFrame or Table into self column-wise. The size of other must be the same as self. Self and other must not share the same key.
1539
1562
 
1540
- Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
1563
+ The alias is `bind_cols`.
1541
1564
 
1542
1565
  ```ruby
1543
1566
  df
@@ -0,0 +1,65 @@
1
+ # Comparison of DataFrames
2
+
3
+ Compare basic features of RedAmber with Python
4
+ [pandas](https://pandas.pydata.org/),
5
+ R [Tidyverse](https://www.tidyverse.org/) and
6
+ Julia [Dataframes](https://dataframes.juliadata.org/stable/).
7
+
8
+ ## Select columns (variables)
9
+
10
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
11
+ |--- |--- |--- |--- |--- |
12
+ | Select columns as a dataframe | pick, drop, [] | dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select |
13
+ | Select a column as a vector | [], v | dplyr::pull, [, x] | [], loc[], iloc[] | [!, :x] |
14
+ | Move columns to a new position | pick, [] | relocate | [], reindex, loc[], iloc[] | select,transform |
15
+
16
+ ## Select rows (records, observations)
17
+
18
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
19
+ |--- |--- |--- |--- |--- |
20
+ | Select rows that meet logical criteria as a dataframe | slice, remove, [] | dplyr::filter | [], filter, query, loc[] | filter |
21
+ | Select rows by position as a dataframe | slice, remove, [] | dplyr::slice | iloc[], drop | subset |
22
+ | Move rows to a new position | slice, [] | dplyr::filter, dplyr::slice | reindex, loc[], iloc[] | permute |
23
+
24
+ ## Update columns / create new columns
25
+
26
+ |Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
27
+ |--- |--- |--- |--- |--- |
28
+ | Update existing columns | assign | dplyr::mutate | assign, []= | mapcols |
29
+ | Create new columns | assign, assign_left | dplyr::mutate | apply | insertcols,.+ |
30
+ | Compute new columns, drop others | new | transmute | (dfply:)transmute | transform,insertcols,mapcols |
31
+ | Rename columns | rename | dplyr::rename, dplyr::rename_with, purrr::set_names | rename, set_axis | rename |
32
+ | Sort dataframe | sort | dplyr::arrange | sort_values | sort |
33
+
34
+ ## Reshape dataframe
35
+
36
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
37
+ |--- |--- |--- |--- |--- |
38
+ | Gather columns into rows (create a longer dataframe) | to_long | tidyr::pivot_longer | melt | stack |
39
+ | Spread rows into columns (create a wider dataframe) | to_wide | tidyr::pivot_wider | pivot | unstack |
40
+ | Transpose a wide dataframe | transpose | transpose, t | transpose, T | permutedims |
41
+
42
+ ## Grouping
43
+
44
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
45
+ |--- |--- |--- |--- |--- |
46
+ |Grouping | group, group.summarize | dplyr::group_by %>% dplyr::summarise | groupby.agg | combine,groupby |
47
+
48
+ ## Combine dataframes or tables
49
+
50
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
51
+ |--- |--- |--- |--- |--- |
52
+ | Combine additional columns | merge, bind_cols | dplyr::bind_cols | concat | combine |
53
+ | Combine additional rows | concatenate, concat, bind_rows | dplyr::bind_rows | concat | transform |
54
+ | Join right to left, leaving only the matching rows| join, inner_join | dplyr::inner_join | merge | innerjoin |
55
+ | Join right to left, leaving all rows | join, full_join, outer_join | dplyr::full_join | merge | outerjoin |
56
+ | Join matching values to left from right | join, left_join | dplyr::left_join | merge | leftjoin |
57
+ | Join matching values from left to right | join, right_join | dplyr::right_join | merge | rightjoin |
58
+ | Return rows of left that have a match in right | join, semi_join | dplyr::semi_join | [isin] | semijoin |
59
+ | Return rows of left that do not have a match in right | join, anti_join | dplyr::anti_join | [isin] | antijoin |
60
+ | Collect rows that appear in left or right | union | dplyr::union | merge | |
61
+ | Collect rows that appear in both left and right | intersect | dplyr::intersect | merge | |
62
+ | Collect rows that appear in left but not right | difference, setdiff | dplyr::setdiff | merge | |
63
+
64
+
65
+
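Reviewer note: the last three rows of the table above map `union`, `intersect` and `difference` to row-wise set operations. A plain-Ruby sketch with rows modelled as Arrays shows the intended semantics. It uses Ruby's Array operators rather than RedAmber, `row_set_ops` is a hypothetical helper, and the two row sets are an assumption chosen to be consistent with the `difference` outputs shown in the `DataFrame.md` hunk.

```ruby
# Rows are [KEY1, KEY2] pairs; the operators compare whole rows by value.
def row_set_ops(left, right)
  {
    union: left | right,      # rows in left or right, without duplicates
    intersect: left & right,  # rows present in both
    difference: left - right  # rows in left but not in right
  }
end

# Assumed sample rows (see the note above).
df_rows    = [['A', 1], ['B', 2], ['C', 3]]
other_rows = [['A', 1], ['B', 4], ['D', 5]]

row_set_ops(df_rows, other_rows).each { |name, rows| puts "#{name}: #{rows.inspect}" }
puts "reverse difference: #{(other_rows - df_rows).inspect}"
```

Because `difference` is not symmetric, swapping the operands gives a different result, which is what the added `other.difference(df)` example in the documentation demonstrates.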
data/doc/SubFrames.md ADDED
@@ -0,0 +1,11 @@
1
+ # SubFrames
2
+
3
+ `SubFrames` represents a collection of subsets of a DataFrame.
4
+ It holds an Array of indices, `#subset_indices`, which can be used to create an Array of sub DataFrames.
5
+ The concept covers the `group` operation of a DataFrame and rolling window operations, and is intended to have broader capabilities.
6
+
7
+ This feature is experimental. It may be removed or be changed in the future.
8
+
9
+ ## Create SubFrames
10
+
11
+ ## Properties of SubFrames
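Reviewer note: as a rough illustration of the idea described in `SubFrames.md`, that one list of index subsets can express both grouping and rolling windows, here is a plain-Ruby sketch. It is not the `SubFrames` API (which the document marks as experimental); `group_indices`, `window_indices` and `subsets` are hypothetical names that operate on an ordinary Array of rows.

```ruby
# Index subsets for grouping: one Array of row indices per distinct key.
def group_indices(keys)
  keys.each_index.group_by { |i| keys[i] }.values
end

# Index subsets for a rolling window of `size` rows, advancing one row at a time.
def window_indices(row_count, size)
  (0..row_count - size).map { |start| (start...start + size).to_a }
end

# Materialize each index subset as a sub-collection of rows.
def subsets(rows, indices_list)
  indices_list.map { |indices| rows.values_at(*indices) }
end

# Hypothetical sample rows: [key, value].
rows = [%w[a 1], %w[b 2], %w[a 3], %w[b 4]]

p group_indices(rows.map(&:first))
p window_indices(rows.size, 3)
p subsets(rows, group_indices(rows.map(&:first)))
```

Both helpers return the same shape (an Array of index Arrays), so a single `subsets` step can serve grouping, windows, or any other partition scheme.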