red_amber 0.2.3 → 0.4.0

Files changed (42)

checksums.yaml +4 -4
data/.rubocop.yml +133 -51
data/.yardopts +2 -0
data/CHANGELOG.md +203 -1
data/Gemfile +2 -1
data/LICENSE +1 -1
data/README.md +61 -45
data/benchmark/basic.yml +11 -4
data/benchmark/combine.yml +3 -4
data/benchmark/dataframe.yml +62 -0
data/benchmark/group.yml +7 -1
data/benchmark/reshape.yml +6 -2
data/benchmark/vector.yml +63 -0
data/doc/DataFrame.md +35 -12
data/doc/DataFrame_Comparison.md +65 -0
data/doc/SubFrames.md +11 -0
data/doc/Vector.md +295 -1
data/doc/yard-templates/default/fulldoc/html/css/common.css +6 -0
data/lib/red_amber/data_frame.rb +537 -68
data/lib/red_amber/data_frame_combinable.rb +776 -123
data/lib/red_amber/data_frame_displayable.rb +248 -18
data/lib/red_amber/data_frame_indexable.rb +122 -19
data/lib/red_amber/data_frame_loadsave.rb +81 -10
data/lib/red_amber/data_frame_reshaping.rb +216 -21
data/lib/red_amber/data_frame_selectable.rb +781 -120
data/lib/red_amber/data_frame_variable_operation.rb +561 -85
data/lib/red_amber/group.rb +195 -21
data/lib/red_amber/helper.rb +114 -32
data/lib/red_amber/refinements.rb +206 -0
data/lib/red_amber/subframes.rb +1066 -0
data/lib/red_amber/vector.rb +435 -58
data/lib/red_amber/vector_aggregation.rb +312 -0
data/lib/red_amber/vector_binary_element_wise.rb +387 -0
data/lib/red_amber/vector_selectable.rb +321 -69
data/lib/red_amber/vector_unary_element_wise.rb +436 -0
data/lib/red_amber/vector_updatable.rb +397 -24
data/lib/red_amber/version.rb +2 -1
data/lib/red_amber.rb +15 -1
data/red_amber.gemspec +4 -3
metadata +19 -11
data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
data/lib/red_amber/vector_functions.rb +0 -294

data/README.md CHANGED

@@ -1,28 +1,29 @@
 # RedAmber
-[![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
-[![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml)
+[![Gem Version](https://img.shields.io/gem/v/red_amber?color=brightgreen)](https://rubygems.org/gems/red_amber)
+[![CI](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/ci.yml)
+[![Maintainability](https://api.codeclimate.com/v1/badges/b8a745047045d2f49daa/maintainability)](https://codeclimate.com/github/heronshoes/red_amber/maintainability)
+[![Test coverage](https://api.codeclimate.com/v1/badges/b8a745047045d2f49daa/test_coverage)](https://codeclimate.com/github/heronshoes/red_amber/test_coverage)
+[![Doc](https://img.shields.io/badge/docs-latest-blue)](https://heronshoes.github.io/red_amber/)
 [![Discussions](https://img.shields.io/github/discussions/heronshoes/red_amber)](https://github.com/heronshoes/red_amber/discussions)
 A simple dataframe library for Ruby.
-- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
+- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
+[![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en) [![Gem Version](https://img.shields.io/gem/v/red-arrow?color=brightgreen)](https://rubygems.org/gems/red-arrow)
 - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
-![screenshot from jupyterlab](doc/image/screenshot.png)
+![screenshot from jupyterlab](https://raw.githubusercontent.com/heronshoes/red_amber/main/doc/image/screenshot.png)
 ## Requirements
+### Ruby
+Supported Ruby version is >= 3.0 (since RedAmber 0.3.0).
+- I decided to remove Ruby 2.7 without waiting for EOL. See [Release note for v0.3.0](https://github.com/heronshoes/red_amber/discussions/162) for details.
-Supported Ruby version is >= 2.7.
-Since v0.2.0, this library uses pattern matching which is an experimental feature in 2.7 . It is usable but a warning message will be shown in 2.7 .
-I recommend Ruby 3 for performance.
+### Libraries
 ```ruby
-# Libraries required
-gem 'red-arrow',   '~> 10.0.0' # Requires Apache Arrow (see installation below)
-gem 'red-parquet', '~> 10.0.0' # Optional, if you use IO from/to parquet
+gem 'red-arrow',   '~> 11.0.0' # Requires Apache Arrow (see installation below)
+gem 'red-parquet', '~> 11.0.0' # Optional, if you use IO from/to parquet
 gem 'rover-df',    '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 ```
@@ -30,61 +31,71 @@ gem 'rover-df',    '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 Install requirements before you install Red Amber.
-- Apache Arrow (~> 10.0.0)
-- Apache Arrow GLib (~> 10.0.0)
-- Apache Parquet GLib (~> 10.0.0)  # If you use IO from/to parquet
+- Apache Arrow (~> 11.0.0)
+- Apache Arrow GLib (~> 11.0.0)
+- Apache Parquet GLib (~> 11.0.0)  # If you use IO from/to parquet
-  See [Apache Arrow install document](https://arrow.apache.org/install/).
+See [Apache Arrow install document](https://arrow.apache.org/install/).
   - Minimum installation example for the latest Ubuntu:
-    ```
-    sudo apt update
-    sudo apt install -y -V ca-certificates lsb-release wget
-    wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
-    sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
-    sudo apt update
-    sudo apt install -y -V libarrow-dev
-    sudo apt install -y -V libarrow-glib-dev
-    ```
-  - On macOS, you can install Apache Arrow C++ library using Homebrew:
-    ```
-    brew install apache-arrow
-    ```
-    and GLib (C) package with:
-    ```
-    brew install apache-arrow-glib
-    ```
+      ```
+      sudo apt update
+      sudo apt install -y -V ca-certificates lsb-release wget
+      wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+      sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+      sudo apt update
+      sudo apt install -y -V libarrow-dev
+      sudo apt install -y -V libarrow-glib-dev
+      ```
+  - On Fedora 38 (Rawhide):
+      ```
+      sudo dnf update
+      sudo dnf -y install gcc-c++ libarrow-devel libarrow-glib-devel ruby-devel
+      ```
+  - On macOS, using Homebrew:
+      ```
+      brew install apache-arrow
+      brew install apache-arrow-glib
+      ```
 If you prepared Apache Arrow, add these lines to your Gemfile:
 ```ruby
-gem 'red-arrow',   '~> 10.0.0'
+gem 'red-arrow',   '~> 11.0.0'
 gem 'red_amber'
-gem 'red-parquet', '~> 10.0.0' # Optional, if you use IO from/to parquet
+gem 'red-parquet', '~> 11.0.0' # Optional, if you use IO from/to parquet
 gem 'rover-df',    '~> 0.3.0'  # Optional, if you use IO from/to Rover::DataFrame
 gem 'red-datasets-arrow'       # Optional, recommended if you use Red Datasets
 gem 'red-arrow-numo-narray'    # Optional, recommended if you use inputs from Numo::NArray
 ```
-And then execute `bundle install` or install it yourself as `gem install red_amber`.
+And then execute `bundle install` or install them yourself such as `gem install red_amber`.
 ## Docker image and Jupyter Notebook
-[RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to @mrkn).
+[RubyData Docker Stacks](https://github.com/RubyData/docker-stacks) is available as a ready-to-run Docker image containing Jupyter and useful data tools as well as RedAmber (Thanks to Kenta Murata).
 Also you can try the contents of this README interactively by [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb).
 [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=red-amber.ipynb)
+## Comparison of DataFrames
+Comparison of  basic features of RedAmber with Python
+[pandas](https://pandas.pydata.org/),
+R [Tidyverse](https://www.tidyverse.org/) and
+Julia [Dataframes](https://dataframes.juliadata.org/stable/) is [here](doc/DataFrame_Comparison.md) (Thanks to Benson Muite).
 ## Data frame in `RedAmber`
 Class `RedAmber::DataFrame` represents a set of data in 2D-shape.
 The entity is a Red Arrow's Table object.
-![dataframe model of RedAmber](doc/image/dataframe_model.png)
+![dataframe model of RedAmber](https://raw.githubusercontent.com/heronshoes/red_amber/main/doc/image/dataframe_model.png)
 Let's load the library and try some examples.
@@ -95,6 +106,11 @@ include RedAmber
 ### Example: diamonds dataset
+First do (if you have not installed it yet) `
+gem install red-datasets-arrow
+`
+then
 ```ruby
 require 'datasets-arrow' # to load sample data
@@ -120,7 +136,7 @@ For example, we can compute mean prices per cut for the data larger than 1 carat
 ```ruby
 df = diamonds
-  .slice { carat > 1 }
+  .slice { carat > 1 } # or use #filter instead of #slice
   .group(:cut)
   .mean(:price) # `pick` prior to `group` is not required if `:price` is specified here.
   .sort('-mean(price)')
@@ -169,7 +185,7 @@ starwars
   .drop(0) # delete unnecessary index column
   .remove { species == "NA" } # delete unnecessary rows
   .group(:species) { [count(:species), mean(:height, :mass)] }
-  .slice { count > 1 }
+  .slice { count > 1 } # or use #filter instead of slice
 # =>
 #<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000000f848>
@@ -196,7 +212,7 @@ See [Vector.md](doc/Vector.md) for details.
 ## Jupyter notebook
-[83 Examples of Red Amber](https://github.com/heronshoes/docker-stacks/blob/RedAmber-binder/binder/examples_of_red_amber.ipynb)
+[Examples of Red Amber](https://github.com/heronshoes/docker-stacks/blob/RedAmber-binder/binder/examples_of_red_amber.ipynb)
 ([raw file](https://raw.githubusercontent.com/heronshoes/docker-stacks/RedAmber-binder/binder/examples_of_red_amber.ipynb)) shows more examples in jupyter notebook.
 You can try this notebook on [Binder](https://mybinder.org/v2/gh/heronshoes/docker-stacks/RedAmber-binder?filepath=examples_of_red_amber.ipynb).
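
Note on the version pins above: the README moves `red-arrow` and `red-parquet` from `'~> 10.0.0'` to `'~> 11.0.0'`. The following is a minimal sketch, using only RubyGems' own `Gem::Requirement` and `Gem::Version` classes, of which versions such a pessimistic constraint accepts. The version strings in the list are arbitrary samples chosen for illustration, not a list of actual red-arrow releases.

```ruby
require 'rubygems'

# '~> 11.0.0' means: at least 11.0.0, but below 11.1.
requirement = Gem::Requirement.new('~> 11.0.0')

# Sample version strings (illustrative only).
samples = %w[10.0.1 11.0.0 11.0.3 11.1.0 12.0.0]

results = samples.to_h do |version|
  [version, requirement.satisfied_by?(Gem::Version.new(version))]
end

results.each do |version, accepted|
  puts "#{version}: #{accepted ? 'accepted' : 'rejected'}"
end
```

With this constraint, a project still locked to Arrow 10.x has to upgrade Apache Arrow before it can move to the newer red_amber.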

data/benchmark/basic.yml CHANGED

@@ -1,10 +1,17 @@
+loop_count: 3
 contexts:
   - name: HEAD
     prelude: |
       $LOAD_PATH.unshift(File.expand_path('lib'))
-  - gems:
+  - name: 0.3.0
+    gems:
+      red_amber: 0.3.0
+  - name: 0.2.0
+    gems:
       red_amber: 0.2.0
-  - gems:
+  - name: 0.1.5
+    gems:
       red_amber: 0.1.5
 prelude: |
@@ -21,8 +28,8 @@ benchmark:
   'B01: Pick([]) by a key name': |
     df[:flight]
-  'B02: Pick by index': |
-    df[df.keys[9]]
+  'B02a: Pick([]) by key names': |
+    df[:carrier, :flight]
   'B03: Pick by key names': |
     df.pick(:carrier, :flight)
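
Note on the context layout above: each entry under `contexts` now carries an explicit `name`, and the released versions are pinned through a `gems` mapping. A minimal sketch, assuming only Ruby's standard `yaml` library, of how that structure reads once parsed. The embedded YAML is a shortened copy of the hunk, not the full file, and the variable names are illustrative.

```ruby
require 'yaml'

# Shortened copy of the benchmark configuration shown in the hunk above.
config = YAML.safe_load(<<~'YAML')
  loop_count: 3
  contexts:
    - name: HEAD
      prelude: |
        $LOAD_PATH.unshift(File.expand_path('lib'))
    - name: 0.3.0
      gems:
        red_amber: 0.3.0
    - name: 0.2.0
      gems:
        red_amber: 0.2.0
  benchmark:
    'B01: Pick([]) by a key name': |
      df[:flight]
YAML

# Names of all contexts, in file order.
context_names = config['contexts'].map { |context| context['name'] }

# Contexts that pin a released gem version (HEAD uses a prelude instead).
pinned = config['contexts']
         .select { |context| context.key?('gems') }
         .to_h { |context| [context['name'], context['gems']['red_amber']] }

puts context_names.inspect
puts pinned.inspect
```

Naming every context makes the result columns distinguishable when several released versions are compared against `HEAD` in one run.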

data/benchmark/combine.yml CHANGED

@@ -1,13 +1,12 @@
-# --repeat-count 3
 loop_count: 3
 contexts:
   - name: HEAD
     prelude: |
       $LOAD_PATH.unshift(File.expand_path('lib'))
-  # - gems:
-  #     red_amber: 0.2.3
+  - name: 0.3.0
+    gems:
+      red_amber: 0.3.0
 prelude: |
   require 'red_amber'

data/benchmark/dataframe.yml ADDED

@@ -0,0 +1,62 @@
+loop_count: 3
+contexts:
+  - name: HEAD
+    prelude: |
+      $LOAD_PATH.unshift(File.expand_path('lib'))
+  - name: 0.3.0
+    gems:
+      red_amber: 0.3.0
+  - name: 0.2.0
+    gems:
+      red_amber: 0.2.0
+prelude: |
+  require 'red_amber'
+  require 'datasets-arrow'
+  diamonds = RedAmber::DataFrame.new(Datasets::Diamonds.new.to_arrow)
+  starwars = RedAmber::DataFrame.new(Datasets::Rdataset.new('dplyr', 'starwars').to_arrow)
+  uri = URI("https://raw.githubusercontent.com/heronshoes/red_amber/master/test/entity/import_cars.tsv")
+  import_cars = RedAmber::DataFrame.load(uri)
+  ds = Datasets::Rdataset.new('openintro', 'simpsons_paradox_covid')
+  simpsons_paradox_covid = RedAmber::DataFrame.new(ds.to_arrow)
+benchmark:
+  'D01: Diamonds test': |
+    diamonds
+      .slice { v(:carat) > 1 }
+      .pick(:cut, :price)
+      .group(:cut)
+      .mean
+      .sort('-mean(price)')
+      .rename('mean(price)': :mean_price_USD)
+      .assign { [:mean_price_JPY, v(:mean_price_USD) * 110.0] }
+  'D02: Starwars test': |
+    starwars
+      .drop { keys.select { |key| key.end_with?('color') } }
+      .remove { v(:species) == 'NA' }
+      .group(:species) { [count(:species), mean(:height, :mass)] }
+      .slice { v(:count) > 1 }
+  'D03: Import cars test': |
+    import_cars
+      .to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
+      .to_wide(name: :Manufacturer, value: :Num_of_imported)
+      .transpose
+  'D04: Simpsons paradox test': |
+    simpsons_paradox_covid[simpsons_paradox_covid[:age_group] == 'under 50']
+      .group(:vaccine_status, :outcome)
+      .count
+      .then { |df| df.to_wide(name: :vaccine_status, value: df.keys[-1]) }
+      .assign do
+        [
+          [:'vaccinated_%', (100.0 * v(:vaccinated) / v(:vaccinated).sum)],
+          [:'unvaccinated_%', (100.0 * v(:unvaccinated) / v(:unvaccinated).sum)]
+        ]
+      end
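
Note on benchmark D03 above: it reshapes a wide table to long form and back. The following plain-Ruby sketch illustrates that round trip on an Array of Hashes. It does not use RedAmber, and the manufacturer names and numbers are made up for illustration; only the column names `Year`, `Manufacturer` and `Num_of_imported` are taken from the benchmark.

```ruby
# Wide form: one row per year, one column per manufacturer (hypothetical data).
wide = [
  { Year: 2019, Audi: 10, BMW: 20 },
  { Year: 2020, Audi: 12, BMW: 18 }
]

# Wide -> long: every non-key column becomes a (name, value) row.
long = wide.flat_map do |row|
  row.reject { |key, _| key == :Year }.map do |maker, count|
    { Year: row[:Year], Manufacturer: maker, Num_of_imported: count }
  end
end

# Long -> wide: rows sharing a Year are folded back into one row.
wide_again = long.group_by { |row| row[:Year] }.map do |year, rows|
  rows.each_with_object({ Year: year }) do |row, acc|
    acc[row[:Manufacturer]] = row[:Num_of_imported]
  end
end

puts long.size
puts wide_again == wide
```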

data/benchmark/group.yml CHANGED

@@ -1,8 +1,14 @@
+loop_count: 3
 contexts:
   - name: HEAD
     prelude: |
       $LOAD_PATH.unshift(File.expand_path('lib'))
-  - gems:
+  - name: 0.3.0
+    gems:
+      red_amber: 0.3.0
+  - name: 0.2.2
+    gems:
       red_amber: 0.2.2
 prelude: |

data/benchmark/reshape.yml CHANGED

@@ -1,10 +1,14 @@
-# --repeat-count 3
+loop_count: 3
 contexts:
   - name: HEAD
     prelude: |
       $LOAD_PATH.unshift(File.expand_path('lib'))
-  - gems:
+  - name: 0.3.0
+    gems:
+      red_amber: 0.3.0
+  - name: 0.2.2
+    gems:
       red_amber: 0.2.2
 prelude: |

data/benchmark/vector.yml ADDED

@@ -0,0 +1,63 @@
+loop_count: 10
+contexts:
+  - name: HEAD
+    prelude: |
+      $LOAD_PATH.unshift(File.expand_path('lib'))
+  - name: 0.3.0
+    gems:
+      red_amber: 0.3.0
+  - name: 0.2.0
+    gems:
+      red_amber: 0.2.0
+prelude: |
+  require 'red_amber'
+  include RedAmber
+  require 'datasets-arrow'
+  ds = Datasets::Rdatasets.new('nycflights13', 'flights')
+  flights = RedAmber::DataFrame.new(ds.to_arrow)
+  df = flights.slice { flights[:month] <= 6 }
+  tailnum_vector = df[:tailnum]
+  distance_vector = df[:distance]
+  strings = tailnum_vector.to_a
+  arrow_array = tailnum_vector.data
+  integers = df[:dep_delay].to_a
+  boolean_vector = df[:air_time].is_nil
+  index_vector = Vector.new(0...boolean_vector.size).filter(boolean_vector)
+  replacer = index_vector.data.map(&:to_s)
+  booleans = boolean_vector.to_a
+benchmark:
+  'V01: Vector.new from integer Array': |
+    Vector.new(integers)
+  'V02: Vector.new from string Array': |
+    Vector.new(strings)
+  'V03: Vector.new from boolean Vector': |
+    Vector.new(boolean_vector)
+  'V04: Vector#sum': |
+    distance_vector.mean
+  'V05: Vector#*': |
+    distance_vector * 1.852
+  'V06: Vector#[booleans]': |
+    tailnum_vector[booleans]
+  'V07: Vector#[boolean_vector]': |
+    tailnum_vector[boolean_vector]
+  'V08: Vector#[index_vector]': |
+    tailnum_vector[index_vector]
+  'V09: Vector#replace': |
+    tailnum_vector.replace(booleans, replacer)
+  'V10: Vector#replace with broad casting': |
+    tailnum_vector.replace(booleans, 'x')

data/doc/DataFrame.md CHANGED

@@ -57,6 +57,10 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
   ```ruby
   RedAmber::DataFrame.load("test/entity/with_header.csv")
   ```
+  ```ruby
+  RedAmber::DataFrame.load("test/entity/without_header.csv", headers: [:x, :y, :z])
+  ```
 - from a string buffer
@@ -275,6 +279,7 @@ penguins.to_rover
   - Shows some information about self in a transposed style.
   - `tdr_str` returns same info as a String.
+  - `glimpse` is an alias. It is similar to dplyr's (or Polars's) `glimpse()`.
   ```ruby
   require 'red_amber'
@@ -568,7 +573,7 @@ penguins.to_rover
   [1, 2, 3]
   ```
-### `slice  `  - slice and select records -
+### `slice  `  - cut into slices of records -
   Slice and select records (rows) to create a sub DataFrame.
@@ -601,11 +606,14 @@ penguins.to_rover
 - Booleans as an argument
-  `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
+  `filter(booleans)` or `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
+  note: `slice(booleans)` is acceptable for orthogonality of `slice`/`remove`.
     ```ruby
     vector = penguins[:bill_length_mm]
-    penguins.slice(vector >= 40)
+    penguins.filter(vector >= 40)
+    # penguins.slice(vector >= 40) is also acceptable
     # =>
     #<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
@@ -833,14 +841,14 @@ penguins.to_rover
   Assign new or updated variables (columns) and create an updated DataFrame.
-  - Variables with new keys will append new columns from the right.
+  - Variables with new keys will append new columns from right.
   - Variables with exisiting keys will update corresponding vectors.
     ![assign method image](doc/../image/dataframe/assign.png)
 - Variables as arguments
-    `assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is ether `Vector`, `Array` or `Arrow::Array`.
+    `assign(key_value_pairs)` accepts pairs of key and values as parameters. `key_value_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`.
     ```ruby
     df = RedAmber::DataFrame.new(
@@ -857,12 +865,12 @@ penguins.to_rover
     2 Hinata        28
     # update :age and add :brother
-    df.assign do
+    df.assign(
       {
         age: age + 29,
         brother: ['Santa', nil, 'Momotaro']
       }
-    end
+    )
     # =>
     #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
@@ -932,7 +940,7 @@ penguins.to_rover
 - Append from left
-  `assign_left` method accepts the same parameters and block as `assign`, but append new columns from leftside.
+  `assign_left` method accepts the same parameters and block as `assign`, but append new columns from left.
   ```ruby
   df.assign_left(new_index: df.indices(1))
@@ -1302,7 +1310,10 @@ When the option `keep_key: true` used, the column `key` will be preserved.
   - `join_keys` are keys shared by self and other to match with them.
   - If `join_keys` are empty, common keys in self and other are chosen (natural join).
   - If (common keys) > `join_keys`, duplicated keys are renamed by `suffix`.
+  - If you want to match the columns with different names,
+    use Hash for `join_keys` such as `{ left: :KEY1, right: :KEY2 }`.
+  These are dataframes to use in the examples of joins.
   ```ruby
   df = DataFrame.new(
     KEY: %w[A B C],
@@ -1450,6 +1461,8 @@ When the option `keep_key: true` used, the column `key` will be preserved.
   1 B              4
   2 D              5
   ```
+##### `set_operable?(other)`
+  Check if `types` of self and other are same.
 ##### `intersect(other)`
@@ -1495,15 +1508,23 @@ When the option `keep_key: true` used, the column `key` will be preserved.
     <string> <uint8>
   1 B              2
   2 C              3
+  other.difference(df)
+  #=>
+  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000040e0c>
+    KEY1        KEY2
+    <string> <uint8>
+  0 B              4
+  1 D              5
   ```
 ## Binding
 ### `concatenate(other)`
-  Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+  Concatenate another DataFrame or Table onto the bottom of self. The types of other must be the same as those of self.
-  The alias is `concat`.
+  The aliases are `concat` and `bind_rows`.
   An array of DataFrames or Tables is also acceptable as other.
@@ -1535,9 +1556,11 @@ When the option `keep_key: true` used, the column `key` will be preserved.
   3       4 D
   ```
-### `merge(other)`
+### `merge(*other)`
+  Merge another DataFrame or Table with self column-wise. The size of other must be the same as self. Self and other must not share the same key.
-  Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+  The alias is `bind_cols`.
   ```ruby
   df

data/doc/DataFrame_Comparison.md ADDED

@@ -0,0 +1,65 @@
+# Comparison of DataFrames
+Compare basic features of RedAmber with Python
+[pandas](https://pandas.pydata.org/),
+R [Tidyverse](https://www.tidyverse.org/) and
+Julia [Dataframes](https://dataframes.juliadata.org/stable/).
+## Select columns (variables)
+| Features                        |	RedAmber        |	Tidyverse 	                     | pandas                                 | DataFrames.jl     |
+|---                              |---              |---                              |---                                     |---                |
+| Select columns as a dataframe   |	pick, drop, [] 	| dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select        |
+| Select a column as a vector     | 	[], v 	        | dplyr::pull, [, x]	             | [], loc[], iloc[]                      | [!, :x]           |
+| Move columns to a new position  |	pick, [] 	      | relocate                        | [], reindex, loc[], iloc[]             | select,transform  |
+## Select rows (records, observations)
+| Features                                              |	RedAmber 	        | Tidyverse                   | pandas                   | DataFrames.jl |
+|---                                                    |---                |---                          |---                       |---            |
+| Select rows that meet logical criteria as a dataframe |	slice, remove, [] | 	dplyr::filter              |	[], filter, query, loc[] | filter        |
+| Select rows by position as a dataframe 	              | slice, remove, [] | dplyr::slice 	              | iloc[], drop             | subset        |
+| Move rows to a new position 	                         | slice, [] 	       | dplyr::filter, dplyr::slice |	reindex, loc[], iloc[]   | permute       |
+## Update columns / create new columns
+|Features 	                         | RedAmber 	          | Tidyverse 	                                        | pandas            | DataFrames.jl |
+|---                                |---                  |---                                                 |---                |---            |
+| Update existing columns           |	assign 	            | dplyr::mutate                                     	| assign, []=       | mapcols       |
+| Create new columns 	              | assign, assign_left |	dplyr::mutate 	                                    | apply             | insertcols,.+ |
+| Compute new columns, drop others 	| new 	               | transmute 	                                        | (dfply:)transmute | transform,insertcols,mapcols |
+| Rename columns 	                  | rename              |	dplyr::rename, dplyr::rename_with, purrr::set_names |	rename, set_axis  | rename        |
+| Sort dataframe 	                  | sort 	              | dplyr::arrange 	                                    | sort_values       | sort          |
+## Reshape dataframe
+| Features 	                                           | RedAmber 	| Tidyverse 	         | pandas       | DataFrames.jl |
+|---                                                   |---        |---                  |---           |---            |
+| Gather columns into rows (create a longer dataframe) |	to_long 	 | tidyr::pivot_longer |	melt         | stack         |
+| Spread rows into columns (create a wider dataframe)  | to_wide 	 | tidyr::pivot_wider 	| pivot        | unstack       |
+| transpose a wide dataframe 	                         | transpose | transpose, t 	      | transpose, T | permutedims   |
+## Grouping
+| Features | RedAmber 	              | Tidyverse 	                          | pandas       | DataFrames.jl   |
+|---       |---                      |---                                   |---           |---              |
+|Grouping 	| group, group.summarize 	| dplyr::group_by %>% dplyr::summarise | groupby.agg  | combine,groupby |
+## Combine dataframes or tables
+| Features 	                               |  RedAmber 	                    | Tidyverse          | pandas  | DataFrames.jl |
+|---                                       |---                             |---                 |---      |---            |
+| Combine additional columns               | merge, bind_cols               | dplyr::bind_cols   | concat  | combine       |
+| Combine additional rows 	                | concatenate, concat, bind_rows |	dplyr::bind_rows 	 | concat  | transform     |
+| Join right to left, leaving only the matching rows| join, inner_join      | dplyr::inner_join  | merge   | innerjoin     |
+| Join right to left, leaving all rows     | join, full_join, outer_join 	  | dplyr::full_join   | merge   | outerjoin     |
+| Join matching values to left from right  | join, left_join                |	dplyr::left_join 	 | merge   | leftjoin      |
+| Join matching values from left to right  | join, right_join               |	dplyr::right_join  | merge   | rightjoin     |
+| Return rows of left that have a match in right | join, semi_join 	        | dplyr::semi_join 	 | [isin]  | semijoin      |
+| Return rows of left that do not have a match in right | join, anti_join   |	dplyr::anti_join 	 | [isin]  | antijoin      |
+| Collect rows that appear in left or right | union 	                       | dplyr::union       | merge   |               |
+| Collect rows that appear in both left and right | intersect 	             | dplyr::intersect 	 | merge   |               |
+| Collect rows that appear in left but not right | difference, setdiff      | dplyr::setdiff 	   | merge   |               |

data/doc/SubFrames.md ADDED

@@ -0,0 +1,11 @@
+# SubFrames
+`SubFrames` represents a collection of subsets of a DataFrame.
+It has an Array of indices `#subset_indices` which is able to create an Array of sub DataFrames.
+The concept includes `group` operation of a Dataframe, rolling window operation and has more broad capabilities.
+This feature is experimental. It may be removed or be changed in the future.
+## Create SubFrames
+## Properties of SubFrames
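
Note on the concept described in SubFrames.md above: the document says a collection of subsets is held as an Array of indices from which sub DataFrames can be created, and that both grouping and rolling windows fit this model. The following plain-Ruby sketch illustrates that idea only. It is not the RedAmber implementation, the local variable `subset_indices` merely borrows the name of the documented property, and the sample rows are hypothetical.

```ruby
rows = [
  { species: 'Human', height: 172 },
  { species: 'Droid', height: 96 },
  { species: 'Human', height: 188 },
  { species: 'Droid', height: 167 }
]

# Grouping expressed as index lists: one list of row positions per group.
subset_indices = rows.each_index.group_by { |i| rows[i][:species] }.values

# Each index list selects one subset of the original rows.
subsets = subset_indices.map { |indices| rows.values_at(*indices) }

# A rolling window of width 2 is just another set of index lists.
window_indices = rows.each_index.each_cons(2).to_a
windows = window_indices.map { |indices| rows.values_at(*indices) }

puts subset_indices.inspect
puts window_indices.inspect
```

Because both cases reduce to lists of row positions, the same machinery can serve groups, windows and other kinds of subsets.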