RubyGems - red_amber - Versions diffs - 0.2.1 → 0.2.3 - Mend

red_amber 0.2.1 → 0.2.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (58)

checksums.yaml +4 -4
data/.rubocop.yml +15 -0
data/CHANGELOG.md +170 -20
data/Gemfile +4 -2
data/README.md +121 -302
data/benchmark/basic.yml +79 -0
data/benchmark/combine.yml +63 -0
data/benchmark/drop_nil.yml +15 -3
data/benchmark/group.yml +33 -0
data/benchmark/reshape.yml +27 -0
data/benchmark/{csv_load_penguins.yml → rover/csv_load_penguins.yml} +3 -3
data/benchmark/rover/flights.yml +23 -0
data/benchmark/rover/penguins.yml +23 -0
data/benchmark/rover/planes.yml +23 -0
data/benchmark/rover/weather.yml +23 -0
data/doc/DataFrame.md +611 -318
data/doc/Vector.md +31 -36
data/doc/image/basic_verbs.png +0 -0
data/doc/image/dataframe/assign.png +0 -0
data/doc/image/dataframe/assign_operation.png +0 -0
data/doc/image/dataframe/drop.png +0 -0
data/doc/image/dataframe/join.png +0 -0
data/doc/image/dataframe/pick.png +0 -0
data/doc/image/dataframe/pick_operation.png +0 -0
data/doc/image/dataframe/remove.png +0 -0
data/doc/image/dataframe/rename.png +0 -0
data/doc/image/dataframe/rename_operation.png +0 -0
data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
data/doc/image/dataframe/set_and_bind.png +0 -0
data/doc/image/dataframe/slice.png +0 -0
data/doc/image/dataframe/slice_operation.png +0 -0
data/doc/image/dataframe_model.png +0 -0
data/doc/image/group_operation.png +0 -0
data/doc/image/replace-if_then.png +0 -0
data/doc/image/reshaping_dataframe.png +0 -0
data/doc/image/screenshot.png +0 -0
data/doc/image/vector/binary_element_wise.png +0 -0
data/doc/image/vector/unary_aggregation.png +0 -0
data/doc/image/vector/unary_aggregation_w_option.png +0 -0
data/doc/image/vector/unary_element_wise.png +0 -0
data/lib/red_amber/data_frame.rb +16 -42
data/lib/red_amber/data_frame_combinable.rb +283 -0
data/lib/red_amber/data_frame_displayable.rb +58 -3
data/lib/red_amber/data_frame_loadsave.rb +36 -0
data/lib/red_amber/data_frame_reshaping.rb +8 -6
data/lib/red_amber/data_frame_selectable.rb +9 -9
data/lib/red_amber/data_frame_variable_operation.rb +27 -21
data/lib/red_amber/group.rb +100 -17
data/lib/red_amber/helper.rb +20 -30
data/lib/red_amber/vector.rb +56 -30
data/lib/red_amber/vector_functions.rb +0 -8
data/lib/red_amber/vector_selectable.rb +9 -1
data/lib/red_amber/vector_updatable.rb +61 -63
data/lib/red_amber/version.rb +1 -1
data/lib/red_amber.rb +2 -0
data/red_amber.gemspec +1 -1
metadata +32 -11
data/doc/examples_of_red_amber.ipynb +0 -8979

data/README.md CHANGED

@@ -1,13 +1,16 @@
 # RedAmber
 [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
-[![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
+[![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml)
+[![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
 A simple dataframe library for Ruby.
 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
 - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
+![screenshot from jupyterlab](doc/image/screenshot.png)
 ## Requirements
 Supported Ruby version is >= 2.7.
@@ -17,9 +20,9 @@ I recommend Ruby 3 for performance.
 ```ruby
 # Libraries required
-gem 'red-arrow',   '>= 9.0.0'
+gem 'red-arrow',   '~> 10.0.0' # Requires Apache Arrow (see installation below)
-gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
+gem 'red-parquet', '~> 10.0.0' # Optional, if you use IO from/to parquet
 gem 'rover-df',    '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 ```
@@ -27,368 +30,178 @@ gem 'rover-df',    '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 Install requirements before you install Red Amber.
-- Apache Arrow GLib (>= 9.0.0)
-- Apache Parquet GLib (>= 9.0.0)  # If you use IO from/to parquet
+- Apache Arrow (~> 10.0.0)
+- Apache Arrow GLib (~> 10.0.0)
+- Apache Parquet GLib (~> 10.0.0)  # If you use IO from/to parquet
   See [Apache Arrow install document](https://arrow.apache.org/install/).
-  Minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section in ci test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
-Add this line to your Gemfile:
-```ruby
+  - Minimum installation example for the latest Ubuntu:
+    ```
+    sudo apt update
+    sudo apt install -y -V ca-certificates lsb-release wget
+    wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+    sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+    sudo apt update
+    sudo apt install -y -V libarrow-dev
+    sudo apt install -y -V libarrow-glib-dev
+    ```
+  - On macOS, you can install Apache Arrow C++ library using Homebrew:
+    ```
+    brew install apache-arrow
+    ```
+    and GLib (C) package with:
+    ```
+    brew install apache-arrow-glib
+    ```
+If you prepared Apache Arrow, add these lines to your Gemfile:
+```ruby
+gem 'red-arrow',   '~> 10.0.0'
 gem 'red_amber'
+gem 'red-parquet', '~> 10.0.0' # Optional, if you use IO from/to parquet
+gem 'rover-df',    '~> 0.3.0'  # Optional, if you use IO from/to Rover::DataFrame
+gem 'red-datasets-arrow'       # Optional, recommended if you use Red Datasets
+gem 'red-arrow-numo-narray'    # Optional, recommended if you use inputs from Numo::NArray
 ```
-And then execute:
-```shell
-bundle install
-```
-Or install it yourself as:
-```shell
-gem install red_amber
-```
+And then execute `bundle install` or install it yourself as `gem install red_amber`.
 ## Docker image and Jupyter Notebook
 [RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
-Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb).
-[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/RubyData/docker-stacks/master?filepath=red-amber.ipynb)
+Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb).
+[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)
-## `RedAmber::DataFrame`
+## Data frame in `RedAmber`
-It represents a set of data in 2D-shape. The entity is a Red Arrow's Table object.
+Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
+The entity is a Red Arrow's Table object.
 ![dataframe model of RedAmber](doc/image/dataframe_model.png)
-```ruby
-require 'red_amber' # require 'red-amber' is also OK.
-require 'datasets-arrow'
-arrow = Datasets::Penguins.new.to_arrow
-penguins = RedAmber::DataFrame.new(arrow)
-# =>
-#<RedAmber::DataFrame : 344 x 8 Vectors, 0x0000000000013790>
-    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
-    <string> <string>        <double>      <double>           <uint8> ... <uint16>
-  1 Adelie   Torgersen           39.1          18.7               181 ...     2007
-  2 Adelie   Torgersen           39.5          17.4               186 ...     2007
-  3 Adelie   Torgersen           40.3          18.0               195 ...     2007
-  4 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007
-  5 Adelie   Torgersen           36.7          19.3               193 ...     2007
-  : :        :                      :             :                 : ...        :
-342 Gentoo   Biscoe              50.4          15.7               222 ...     2009
-343 Gentoo   Biscoe              45.2          14.8               212 ...     2009
-344 Gentoo   Biscoe              49.9          16.1               213 ...     2009
-```
-For example, `DataFrame#pick` accepts keys as arguments and returns a sub DataFrame.
-![pick method image](doc/image/dataframe/pick.png)
+Let's load the library and try some examples.
 ```ruby
-penguins.keys
-# =>
-[:species,
- :island,
- :bill_length_mm,
- :bill_depth_mm,
- :flipper_length_mm,
- :body_mass_g,
- :sex,
- :year]
-df = penguins.pick(:species, :island, :body_mass_g)
-df
-# =>
-#<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003cc1c>
-    species  island    body_mass_g
-    <string> <string>     <uint16>
-  1 Adelie   Torgersen        3750
-  2 Adelie   Torgersen        3800
-  3 Adelie   Torgersen        3250
-  4 Adelie   Torgersen       (nil)
-  5 Adelie   Torgersen        3450
-  : :        :                   :
-342 Gentoo   Biscoe           5750
-343 Gentoo   Biscoe           5200
-344 Gentoo   Biscoe           5400
-```
-`DataFrame#drop` drops some columns to create a remainer DataFrame.
-![drop method image](doc/image/dataframe/drop.png)
-You can specify by keys or a boolean array of same size as n_keys.
-```ruby
-# Same as df.drop(:species, :island)
-df = df.drop(true, true, false)
-# =>
-#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
-    body_mass_g
-       <uint16>
-  1        3750
-  2        3800
-  3        3250
-  4       (nil)
-  5        3450
-  :           :
-342        5750
-343        5200
-344        5400
+require 'red_amber' # require 'red-amber' is also OK.
+include RedAmber
 ```
-Arrow data is immutable, so these methods always return an new object.
-`DataFrame#assign` creates new columns or update existing columns.
-![assign method image](doc/image/dataframe/assign.png)
+### Example: diamonds dataset
 ```ruby
-# New column is created because ':body_mass_kg' is a new key.
-df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
-# =>
-#<RedAmber::DataFrame : 344 x 2 Vectors, 0x00000000000212f0>
-    body_mass_g body_mass_kg
-       <uint16>     <double>
-  1        3750          3.8
-  2        3800          3.8
-  3        3250          3.3
-  4       (nil)        (nil)
-  5        3450          3.5
-  :           :            :
-342        5750          5.8
-343        5200          5.2
-344        5400          5.4
-```
-`DataFrame#slice` selects rows (observations) to create a sub DataFrame.
+require 'datasets-arrow' # to load sample data
-![slice method image](doc/image/dataframe/slice.png)
-```ruby
-# returns 5 rows at the start and 5 rows from the end
-penguins.slice(0...5, -5..-1)
+dataset = Datasets::Diamonds.new
+diamonds = DataFrame.new(dataset) # from v0.2.2, should be `dataset.to_arrow` if older.
 # =>
-#<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
-   species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
-   <string> <string>        <double>      <double>           <uint8> ... <uint16>
- 1 Adelie   Torgersen           39.1          18.7               181 ...     2007
- 2 Adelie   Torgersen           39.5          17.4               186 ...     2007
- 3 Adelie   Torgersen           40.3          18.0               195 ...     2007
- 4 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007
- 5 Adelie   Torgersen           36.7          19.3               193 ...     2007
- : :        :                      :             :                 : ...        :
- 8 Gentoo   Biscoe              50.4          15.7               222 ...     2009
- 9 Gentoo   Biscoe              45.2          14.8               212 ...     2009
-10 Gentoo   Biscoe              49.9          16.1               213 ...     2009
+#<RedAmber::DataFrame : 53940 x 10 Vectors, 0x000000000000f668>
+         carat cut       color    clarity     depth    table    price        x ...        z
+      <double> <string>  <string> <string> <double> <double> <uint16> <double> ... <double>
+    0     0.23 Ideal     E        SI2          61.5     55.0      326     3.95 ...     2.43
+    1     0.21 Premium   E        SI1          59.8     61.0      326     3.89 ...     2.31
+    2     0.23 Good      E        VS1          56.9     65.0      327     4.05 ...     2.31
+    3     0.29 Premium   I        VS2          62.4     58.0      334      4.2 ...     2.63
+    4     0.31 Good      J        SI2          63.3     58.0      335     4.34 ...     2.75
+    :        : :         :        :               :        :        :        : ...        :
+53937      0.7 Very Good D        SI1          62.8     60.0     2757     5.66 ...     3.56
+53938     0.86 Premium   H        SI2          61.0     58.0     2757     6.15 ...     3.74
+53939     0.75 Ideal     D        SI2          62.2     55.0     2757     5.83 ...     3.64
 ```
-`DataFrame#remove` rejects rows (observations) to create a remainer DataFrame.
-![remove method image](doc/image/dataframe/remove.png)
+For example, we can compute mean prices per cut for the data larger than 1 carat.
 ```ruby
-# penguins[:bill_length_mm] < 40 returns a boolean Vector
-penguins.remove(penguins[:bill_length_mm] < 40)
+df = diamonds
+  .slice { carat > 1 }
+  .group(:cut)
+  .mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
+  .sort('-mean(price)')
 # =>
-#<RedAmber::DataFrame : 244 x 8 Vectors, 0x000000000007d6f4>
-    species  island    bill_length_mm bill_depth_mm flipper_length_mm ...     year
-    <string> <string>        <double>      <double>           <uint8> ... <uint16>
-  1 Adelie   Torgersen           40.3          18.0               195 ...     2007
-  2 Adelie   Torgersen          (nil)         (nil)             (nil) ...     2007
-  3 Adelie   Torgersen           42.0          20.2               190 ...     2007
-  4 Adelie   Torgersen           41.1          17.6               182 ...     2007
-  5 Adelie   Torgersen           42.5          20.7               197 ...     2007
-  : :        :                      :             :                 : ...        :
-242 Gentoo   Biscoe              50.4          15.7               222 ...     2009
-243 Gentoo   Biscoe              45.2          14.8               212 ...     2009
-244 Gentoo   Biscoe              49.9          16.1               213 ...     2009
+#<RedAmber::DataFrame : 5 x 2 Vectors, 0x000000000000f67c>
+  cut       mean(price)
+  <string>     <double>
+0 Ideal         8674.23
+1 Premium       8487.25
+2 Very Good     8340.55
+3 Good           7753.6
+4 Fair          7177.86
 ```
-DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
-Previous example is also OK with a block.
-```ruby
-penguins.remove { bill_length_mm < 40 }
-```
-Next example is an usage of block to update a column.
+Arrow data is immutable, so these methods always return new objects.
+Next example will rename a column and create a new column by simple calcuration.
 ```ruby
-df = RedAmber::DataFrame.new(
-  integer: [0, 1, 2, 3, nil],
-  float:   [0.0, 1.1,  2.2, Float::NAN, nil],
-  string:  ['A', 'B', 'C', 'D', nil],
-  boolean: [true, false, true, false, nil])
-df
-# =>
-#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000003131c>
-  integer    float string   boolean
-  <uint8> <double> <string> <boolean>
-1       0      0.0 A        true
-2       1      1.1 B        false
-3       2      2.2 C        true
-4       3      NaN D        false
-5   (nil)    (nil) (nil)    (nil)
-df.assign do
-  vectors.select(&:float?).map { |v| [v.key, -v] }
-  # => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
-end
-# =>
-#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
-    index    float string
-  <uint8> <double> <string>
-1       0     -0.0 A
-2       1     -1.1 B
-3       2     -2.2 C
-4       3      NaN D
-5   (nil)    (nil) (nil)
-```
-Next example is to eliminate rows containing nil.
+usdjpy = 110.0 # when the yen was stronger
-```ruby
-# remove all observations containing nil
-nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
-nil_removed.tdr
+df.rename('mean(price)': :mean_price_USD)
+  .assign(:mean_price_JPY) { mean_price_USD * usdjpy }
 # =>
-RedAmber::DataFrame : 342 x 8 Vectors
-Vectors : 5 numeric, 3 strings
-# key                type   level data_preview
-1 :species           string     3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
-2 :island            string     3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
-3 :bill_length_mm    double   164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
-4 :bill_depth_mm     double    80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
-5 :flipper_length_mm int64     55 [181, 186, 195, 193, 190, ... ]
-6 :body_mass_g       int64     94 [3750, 3800, 3250, 3450, 3650, ... ]
-7 :sex               string     3 {"male"=>168, "female"=>165, ""=>9}
-8 :year              int64      3 {2007=>109, 2008=>114, 2009=>119}
+#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f71c>
+  cut       mean_price_USD mean_price_JPY
+  <string>        <double>       <double>
+0 Ideal            8674.23      954164.93
+1 Premium          8487.25      933597.34
+2 Very Good        8340.55      917460.37
+3 Good              7753.6      852896.11
+4 Fair             7177.86      789564.12
 ```
-For this frequently needed task, we can do it much simpler.
-```ruby
-penguins.remove_nil # => same result as above
-```
+### Example: starwars dataset
-`DataFrame#summary` shows summary statistics in a DataFrame.
+Next example is `starwars` dataset reading from the downloaded CSV file. Followed by minimum data cleansing.
 ```ruby
-puts penguins.summary.to_s(width: 82)
+uri = URI('https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv')
-# =>
-  variables            count     mean      std      min      25%   median      75%      max
-  <dictionary>      <uint16> <double> <double> <double> <double> <double> <double> <double>
-1 bill_length_mm         342    43.92     5.46     32.1    39.23    44.38     48.5     59.6
-2 bill_depth_mm          342    17.15     1.97     13.1     15.6    17.32     18.7     21.5
-3 flipper_length_mm      342   200.92    14.06    172.0    190.0    197.0    213.0    231.0
-4 body_mass_g            342  4201.75   801.95   2700.0   3550.0   4031.5   4750.0   6300.0
-5 year                   344  2008.03     0.82   2007.0   2007.0   2008.0   2009.0   2009.0
-```
+starwars = DataFrame.load(uri)
-`DataFrame#group` method can be used for the grouping tasks.
-```ruby
-starwars = RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
 starwars
+  .drop(0) # delete unnecessary index column
+  .remove { species == "NA" } # delete unnecessary rows
+  .group(:species) { [count(:species), mean(:height, :mass)] }
+  .slice { count > 1 }
 # =>
-#<RedAmber::DataFrame : 87 x 12 Vectors, 0x000000000000607c>
-   unnamed1 name            height     mass hair_color skin_color  eye_color ... species
-    <int64> <string>       <int64> <double> <string>   <string>    <string>  ... <string>
- 1        1 Luke Skywalker     172     77.0 blond      fair        blue      ... Human
- 2        2 C-3PO              167     75.0 NA         gold        yellow    ... Droid
- 3        3 R2-D2               96     32.0 NA         white, blue red       ... Droid
- 4        4 Darth Vader        202    136.0 none       white       yellow    ... Human
- 5        5 Leia Organa        150     49.0 brown      light       brown     ... Human
- :        : :                    :        : :          :           :         ... :
-85       85 BB8              (nil)    (nil) none       none        black     ... Droid
-86       86 Captain Phasma   (nil)    (nil) unknown    unknown     unknown   ... NA
-87       87 Padmé Amidala      165     45.0 brown      light       brown     ... Human
-starwars.group(:species) { [count(:species), mean(:height, :mass)] }
-        .slice { count > 1 }
-# =>
-#<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000006e848>
+#<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
   species    count mean(height) mean(mass)
   <string> <int64>     <double>   <double>
-1 Human         35        176.6       82.8
-2 Droid          6        131.2       69.8
-3 Wookiee        2        231.0      124.0
-4 Gungan         3        208.7       74.0
-5 NA             4        181.3       48.0
-6 Zabrak         2        173.0       80.0
-7 Twi'lek        2        179.0       55.0
-8 Mirialan       2        168.0       53.1
-9 Kaminoan       2        221.0       88.0
+0 Human         35       176.65      82.78
+1 Droid          6        131.2      69.75
+2 Wookiee        2        231.0      124.0
+3 Gungan         3       208.67       74.0
+4 Zabrak         2        173.0       80.0
+5 Twi'lek        2        179.0       55.0
+6 Mirialan       2        168.0       53.1
+7 Kaminoan       2        221.0       88.0
 ```
 See [DataFrame.md](doc/DataFrame.md) for other examples and details.
-## `RedAmber::Vector`
+### `Vector` for 1D data object in column
 Class `RedAmber::Vector` represents a series of data in the DataFrame.
-Method `RedAmber::DataFrame#[key]` returns a Vector with the key `key`.
-```ruby
-penguins[:bill_length_mm]
-# =>
-#<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
-[39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
-```
-Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
-This is an element-wise comparison and returns a boolean Vector of same size.
-![unary element-wise](doc/image/vector/unary_element_wise.png)
-```ruby
-penguins[:bill_length_mm] < 40
-# =>
-#<RedAmber::Vector(:boolean, size=344):0x000000000007e7ac>
-[true, true, false, nil, true, true, true, true, true, false, true, true, false, ... ]
-```
-Next example returns aggregated result.
-![unary aggregation](doc/image/vector/unary_aggregation.png)
-```ruby
-penguins[:bill_length_mm].mean
-43.92192982456141
-# =>
-```
 See [Vector.md](doc/Vector.md) for details.
 ## Jupyter notebook
-[71 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
+[83 Examples of Red Amber](https://github.com/heronshoes/docker-stacks/blob/RedAmber-binder/binder/examples_of_red_amber.ipynb)
+([raw file](https://raw.githubusercontent.com/heronshoes/docker-stacks/RedAmber-binder/binder/examples_of_red_amber.ipynb)) shows more examples in jupyter notebook.
+You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
+[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb)
 ## Development
@@ -399,8 +212,14 @@ bundle install
 bundle exec rake test
 ```
+## Community
 I will appreciate if you could help to improve this project. Here are a few ways you can help:
+- Let's talk in the [discussions](https://github.com/heronshoes/red_amber/discussions). [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
+  - Browse Q and A, how to use, tips, etc.
+  - Ask questions you’re wondering about.
+  - Share ideas. The idea may be promoted to issues or pull requests.
 - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
 - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
 - Write, clarify, or fix documentation
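
Note on the dependency change in the README hunk above: the constraint on `red-arrow` and `red-parquet` moves from `>= 9.0.0` to `~> 10.0.0`. The following is a minimal sketch, using only RubyGems' own `Gem::Requirement` and `Gem::Version` classes, of which versions each constraint admits. The list of version strings is illustrative and was chosen only to probe the boundaries; it is not taken from the diff.

```ruby
require 'rubygems' # provides Gem::Requirement and Gem::Version

old_req = Gem::Requirement.new('>= 9.0.0')  # constraint in 0.2.1's README
new_req = Gem::Requirement.new('~> 10.0.0') # constraint in 0.2.3's README

# Illustrative versions chosen to probe both constraints.
%w[9.0.0 10.0.0 10.0.1 10.1.0 11.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format('%-7s old: %-5s new: %s',
              v,
              old_req.satisfied_by?(version),
              new_req.satisfied_by?(version))
end
```

The pessimistic operator `~> 10.0.0` is equivalent to `>= 10.0.0, < 10.1.0`, so the new constraint accepts only 10.0.x releases, while the old one accepted any release from 9.0.0 upward.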

data/benchmark/basic.yml ADDED

@@ -0,0 +1,79 @@
+contexts:
+  - name: HEAD
+    prelude: |
+      $LOAD_PATH.unshift(File.expand_path('lib'))
+  - gems:
+      red_amber: 0.2.0
+  - gems:
+      red_amber: 0.1.5
+prelude: |
+  require 'red_amber'
+  require 'datasets-arrow'
+  ds = Datasets::Rdatasets.new('nycflights13', 'flights')
+  df = RedAmber::DataFrame.new(ds.to_arrow)
+  slicer = df[:distance] > 1000
+  distance_km = df[:distance] * 1.852
+benchmark:
+  'B01: Pick([]) by a key name': |
+    df[:flight]
+  'B02: Pick by index': |
+    df[df.keys[9]]
+  'B03: Pick by key names': |
+    df.pick(:carrier, :flight)
+  'B04: Drop by key names': |
+    df.drop(:year, :month, :day)
+  'B05: Pick by booleans': |
+    df.pick(df.vectors.map(&:string?))
+  'B06: Pick by a block': |
+    df.pick { keys.map { |key| key.end_with?('time') } }
+  'B07: Slice([]) by a index': |
+    df[877]
+  'B08: Slice by indeces': |
+    df.slice(0...5, -5..-1)
+  'B09: Slice([]) by booleans': |
+    df[slicer]
+  'B10: Slice by booleans': |
+    df.slice(slicer)
+  'B11: Remove by booleans': |
+    df.remove(slicer)
+  'B12: Slice by a block': |
+    df.slice { slicer }
+  'B13: Rename by Hash': |
+    df.rename(distance: :distance_mile)
+  'B14: Assign an existing variable': |
+    df.assign(distance: distance_km)
+  'B15: Assign a new variable': |
+    df.assign(distance_km: distance_km)
+  'B16: Sort by a key': |
+    df.sort(:distance)
+  'B17: Sort by keys': |
+    df.sort(:origin, '-distance')
+  'B18: Convert to a Hash': |
+    df.to_h
+  'B19: Output in TDR style': |
+    df.tdr
+  'B20: Inspect': |
+    df.inspect

data/benchmark/combine.yml ADDED

@@ -0,0 +1,63 @@
+# --repeat-count 3
+loop_count: 3
+contexts:
+  - name: HEAD
+    prelude: |
+      $LOAD_PATH.unshift(File.expand_path('lib'))
+  # - gems:
+  #     red_amber: 0.2.3
+prelude: |
+  require 'red_amber'
+  include RedAmber
+  require 'datasets-arrow'
+  package = 'nycflights13'
+  airlines = DataFrame.new(Datasets::Rdatasets.new(package, 'airlines'))
+  airports = DataFrame.new(Datasets::Rdatasets.new(package, 'airports'))
+  flights  = DataFrame.new(Datasets::Rdatasets.new(package, 'flights'))
+    .pick(%i[month day carrier flight tailnum origin dest air_time distance])
+  planes   = DataFrame.new(Datasets::Rdatasets.new(package, 'planes'))
+  weather  = DataFrame.new(Datasets::Rdatasets.new(package, 'weather'))
+  flights_Q1 = flights.slice { month <= 3 }
+  flights_Q2 = flights.slice { month > 3 }
+  flights_1_2 = flights_Q1.slice { month.is_in(1, 2) }
+  flights_1_3 = flights_Q1.slice { month.is_in(1, 3) }
+  flights_left = flights_Q1.pick(...5)
+  flights_right = flights_Q1.pick(5..)
+benchmark:
+  'C01: Inner join on flights_Q1 by carrier': |
+    flights_Q1.inner_join(airlines, :carrier)
+  'C02: Full join on flights_Q1 by planes': |
+    flights_Q1.full_join(planes, :tailnum)
+  'C03: Left join on flights_Q1 by planes': |
+    flights_Q1.left_join(planes, :tailnum)
+  'C04: Semi join on flights_Q1 by planes': |
+    flights_Q1.semi_join(planes, :tailnum)
+  'C05: Anti join on flights_Q1 by planes': |
+    flights_Q1.anti_join(planes, :tailnum)
+  'C06: Intersection of flights_1_2 and flights_1_3': |
+    flights_1_2.intersect(flights_1_3)
+  'C07: Union of flights_1_2 and flights_1_3': |
+    flights_1_2.union(flights_1_3)
+  'C08: Difference between flights_1_2 and flights_1_3': |
+    flights_1_2.difference(flights_1_3)
+  'C09: Concatenate flight_Q1 on flight_Q2': |
+    flights_Q1.concatenate(flights_Q2)
+  'C10: Merge flights_Q1_right on flights_Q1_left': |
+    flights_left.merge(flights_right)
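
Note on the join benchmarks above (C01 to C05): the sketch below illustrates the conventional relational meaning of inner, semi, and anti joins in plain Ruby, on arrays of hashes. It assumes the benchmarked methods follow these usual semantics, which this diff does not confirm. It does not call RedAmber and is not its implementation, and the rows are made-up sample data.

```ruby
# Plain-Ruby illustration only; all rows are hypothetical sample data.
flights = [
  { flight: 1, tailnum: 'N1' },
  { flight: 2, tailnum: 'N2' },
  { flight: 3, tailnum: 'N9' } # no matching plane
]
planes = [
  { tailnum: 'N1', seats: 150 },
  { tailnum: 'N2', seats: 180 }
]

known_tailnums = planes.map { |plane| plane[:tailnum] }

# Semi join: keep left rows that have a match on the key; add no columns.
semi = flights.select { |f| known_tailnums.include?(f[:tailnum]) }

# Anti join: keep left rows that have no match on the key.
anti = flights.reject { |f| known_tailnums.include?(f[:tailnum]) }

# Inner join: keep matched rows and merge in the columns of the right side.
inner = flights.flat_map do |f|
  planes.select { |plane| plane[:tailnum] == f[:tailnum] }
        .map { |plane| f.merge(plane) }
end

puts "semi:  #{semi.map { |r| r[:flight] }.inspect}"
puts "anti:  #{anti.map { |r| r[:flight] }.inspect}"
puts "inner: #{inner.inspect}"
```

Semi and anti joins partition the left-hand rows between them, so their row counts always sum to the left-hand row count. That property can serve as a quick sanity check when comparing the C04 and C05 results.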