RubyGems - red_amber - Versions diffs - 0.4.2 → 0.5.1 - Mend

red_amber 0.4.2 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

checksums.yaml +4 -4
data/.devcontainer/Dockerfile +75 -0
data/.devcontainer/devcontainer.json +38 -0
data/.devcontainer/onCreateCommand.sh +22 -0
data/.rubocop.yml +11 -5
data/CHANGELOG.md +141 -17
data/Gemfile +5 -6
data/README.ja.md +271 -0
data/README.md +52 -31
data/Rakefile +55 -0
data/benchmark/group.yml +12 -5
data/doc/Dev_Containers.ja.md +290 -0
data/doc/Dev_Containers.md +292 -0
data/doc/qmd/examples_of_red_amber.qmd +4596 -0
data/doc/qmd/red-amber.qmd +90 -0
data/docker/Dockerfile +2 -2
data/docker/Gemfile +8 -3
data/docker/docker-compose.yml +1 -1
data/docker/readme.md +5 -5
data/lib/red_amber/data_frame.rb +78 -4
data/lib/red_amber/data_frame_combinable.rb +147 -119
data/lib/red_amber/data_frame_displayable.rb +7 -6
data/lib/red_amber/data_frame_loadsave.rb +1 -1
data/lib/red_amber/data_frame_selectable.rb +51 -2
data/lib/red_amber/data_frame_variable_operation.rb +6 -6
data/lib/red_amber/group.rb +476 -127
data/lib/red_amber/helper.rb +26 -0
data/lib/red_amber/subframes.rb +18 -11
data/lib/red_amber/vector.rb +45 -25
data/lib/red_amber/vector_aggregation.rb +26 -0
data/lib/red_amber/vector_selectable.rb +124 -40
data/lib/red_amber/vector_string_function.rb +279 -0
data/lib/red_amber/vector_unary_element_wise.rb +4 -0
data/lib/red_amber/vector_updatable.rb +28 -0
data/lib/red_amber/version.rb +1 -1
data/lib/red_amber.rb +2 -1
data/red_amber.gemspec +3 -3
metadata +19 -14
data/docker/Gemfile.lock +0 -80
data/docker/example +0 -74
data/docker/notebook/examples_of_red_amber.ipynb +0 -8562
data/docker/notebook/red-amber.ipynb +0 -188

data/doc/qmd/examples_of_red_amber.qmd ADDED

@@ -0,0 +1,4596 @@
+---
+title: 127 examples of Red Amber
+author: heronshoes
+date: '2023-08-11'
+format:
+  pdf:
+    code-fold: false
+    toc: true
+    fontfamily: libertinus
+    colorlinks: true
+jupyter: ruby
+---
+For RedAmber version 0.5.1-HEAD and Arrow version 12.0.1.
+## 1. Install
+Install requirements before you install RedAmber.
+- Ruby (>= 3.0)
+- Apache Arrow (~> 12.0.0)
+- Apache Arrow GLib (~> 12.0.0)
+- Apache Parquet GLib (~> 12.0.0)  # if you need IO from/to Parquet resource.
+  See [Apache Arrow install document](https://arrow.apache.org/install/).
+  - Minimum installation example for the latest Ubuntu:
+    ```shell
+    sudo apt update
+    sudo apt install -y -V ca-certificates lsb-release wget
+    wget https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+    sudo apt install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+    sudo apt update
+    sudo apt install -y -V libarrow-dev
+    sudo apt install -y -V libarrow-glib-dev
+    ```
+  - On Fedora 38 (Rawhide):
+    ```shell
+    sudo dnf update
+    sudo dnf -y install gcc-c++ libarrow-devel libarrow-glib-devel ruby-devel
+    ```
+  - On macOS, you can install Apache Arrow C++ library using Homebrew:
+    ```shell
+    brew install apache-arrow
+    ```
+    and GLib (C) package with:
+    ```shell
+    brew install apache-arrow-glib
+    ```
+Once Apache Arrow is installed, add these lines to your Gemfile:
+```ruby
+gem 'red-arrow',   '~> 12.0.0'
+gem 'red_amber'
+gem 'red-arrow-numo-narray'    # Optional, recommended if you use inputs from Numo::NArray
+                               # or use random sampling feature.
+gem 'red-parquet', '~> 12.0.0' # Optional, if you use IO from/to parquet
+gem 'red-datasets-arrow'       # Optional, recommended if you use Red Datasets
+gem 'red-arrow-activerecord'   # Optional, if you use Active Record
+gem 'rover-df'                 # Optional, if you use IO from/to Rover::DataFrame.
+```
+And then execute `bundle install` or install it yourself as `gem install red_amber`.
+## 2. Require
+```{ruby}
+#| tags: []
+require 'red_amber' # require 'red-amber' is also OK
+include RedAmber
+{RedAmber: VERSION, Arrow: Arrow::VERSION}
+```
+## 3. Initialize
+There are several ways to initialize a DataFrame.
+```{ruby}
+#| tags: []
+# From a Hash
+DataFrame.new(x: [1, 2, 3], y: %w[A B C])
+```
+```{ruby}
+#| tags: []
+# From a schema and a row-oriented array
+DataFrame.new({ x: :uint8, y: :string }, [[1, 'A'], [2, 'B'], [3, 'C']])
+```
+```{ruby}
+#| tags: []
+# From an Arrow::Table
+table = Arrow::Table.new(x: [1, 2, 3], y: %w[A B C])
+DataFrame.new(table)
+```
+```{ruby}
+#| tags: []
+# From a Rover::DataFrame
+require 'rover'
+rover = Rover::DataFrame.new(x: [1, 2, 3], y: %w[A B C])
+DataFrame.new(rover)
+```
+```{ruby}
+#| tags: []
+# From a dataset in Red Datasets
+require 'datasets-arrow'
+dataset = Datasets::Penguins.new
+penguins = DataFrame.new(dataset) # Since 0.2.2. For older versions, use `dataset.to_arrow`.
+```
+```{ruby}
+#| tags: []
+dataset = Datasets::Rdatasets.new('datasets', 'mtcars')
+mtcars = DataFrame.new(dataset)
+```
+(New from 0.2.3 with Arrow 10.0.0) A DataFrame can be initialized from any object that responds to `to_arrow`. Arrays in Numo::NArray respond to `to_arrow` when the `red-arrow-numo-narray` gem is loaded. This feature was proposed by the Red Data Tools member @kojix2 and implemented by @kou in Arrow 10.0.0 and Red Arrow Numo::NArray 0.0.6. Thanks!
+```{ruby}
+#| tags: []
+require 'arrow-numo-narray'
+DataFrame.new(numo: Numo::DFloat.new(3).rand)
+```
+Another example using Numo::NArray is [#77. Introduce columns from numo/narray](#77.-Introduce-columns-from-numo/narray).
+## 4. Load
+`RedAmber::DataFrame` delegates `#load` to `Arrow::Table#load`. We can load from `[.arrow, .arrows, .csv, .csv.gz, .tsv]` files.
+`load` accepts the following options:
+`load(input, format: nil, compression: nil, schema: nil, skip_lines: nil)`
+- `format` [:arrow_file, :batch, :arrows, :arrow_stream, :stream, :csv, :tsv]
+- `compression` [:gzip, nil]
+- `schema` [Arrow::Schema]
+- `skip_lines` [Regexp]
+Load from a file 'comecome.csv':
+```{ruby}
+#| tags: []
+require 'tempfile' # Tempfile lives in the standard library but is not loaded by default
+file = Tempfile.open(['comecome', '.csv']) do |f|
+  f.puts(<<~CSV)
+    name,age
+    Yasuko,68
+    Rui,49
+    Hinata,28
+  CSV
+  f
+end
+DataFrame.load(file)
+```
+Load from a Buffer:
+```{ruby}
+#| tags: []
+DataFrame.load(Arrow::Buffer.new(<<~BUFFER), format: :csv)
+  name,age
+  Yasuko,68
+  Rui,49
+  Hinata,28
+BUFFER
+```
+Load from a Buffer, skipping comment lines:
+```{ruby}
+#| tags: []
+DataFrame.load(Arrow::Buffer.new(<<~BUFFER), format: :csv, skip_lines: /^#/)
+  # comment
+  name,age
+  Yasuko,68
+  Rui,49
+  Hinata,28
+BUFFER
+```
+## 5. Load from a URI
+```{ruby}
+#| tags: []
+uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
+DataFrame.load(uri)
+```
+## 6. Save
+`#save` accepts the same options as `#load`. See [#4. Load](#4.-Load).
+```{ruby}
+#| tags: []
+penguins.save("penguins.arrow")
+penguins.save("penguins.arrows")
+penguins.save("penguins.csv")
+penguins.save("penguins.csv.gz")
+penguins.save("penguins.tsv")
+penguins.save("penguins.feather")
+```
+(Since 0.3.0) `DataFrame#save` returns self.
+## 7. to_s/inspect
+`to_s` or `inspect` (which uses `to_s` internally) shows a preview of the DataFrame.
+It shows the first 5 and last 3 rows if the DataFrame has many rows. Columns are also omitted if a line exceeds 80 characters.
+```{ruby}
+#| tags: []
+df = DataFrame.new(
+  x: [1, 2, 3, 4, 5],
+  y: [1, 2, 3, 0/0.0, nil],
+  s: %w[A B C D] << nil,
+  b: [true, false, true, false, nil]
+)
+```
+```{ruby}
+#| tags: []
+p penguins; nil
+```
+## 8. Show table
+`#table` returns the underlying Arrow::Table object. `#to_arrow` is an alias.
+```{ruby}
+#| tags: []
+df.table
+```
+```{ruby}
+#| tags: []
+penguins.to_arrow
+```
+```{ruby}
+#| tags: []
+# This is a Red Arrow feature
+puts df.table.to_s(format: :column)
+```
+```{ruby}
+#| tags: []
+# This is also a Red Arrow feature
+puts df.table.to_s(format: :list)
+```
+## 9. TDR
+TDR means 'Transposed Dataframe Representation'. It shows columns laterally, in the same shape as when initializing from a Hash. TDR includes information that is useful for exploratory data processing:
+- DataFrame shape: n_rows x n_columns
+- Data types
+- Levels: number of unique elements
+- Data preview: identical values are aggregated if the number of levels is small (tally mode)
+- Counts of abnormal elements: NaN and nil
+It is similar to dplyr's (or Polars's) `glimpse()` so we have an alias `#glimpse` (since 0.4.0).
+```{ruby}
+#| tags: []
+df.tdr
+```
+```{ruby}
+#| tags: []
+penguins.tdr
+```
+`#tdr` has some options:
+`limit` : the maximum number of variables to show. Default value is `limit=10`.
+```{ruby}
+#| tags: []
+penguins.tdr(3)
+```
+By default, `#tdr` shows at most 10 variables. `#tdr(:all)` will show all variables.
+```{ruby}
+#| tags: []
+mtcars.tdr(:all)
+```
+(Since 0.4.0) The `#tdra` method is a shortcut for `#tdr(:all)`.
+```{ruby}
+#| tags: []
+mtcars.tdra
+```
+`elements` : max number of elements to show in observations. Default value is `elements: 5`.
+```{ruby}
+#| tags: []
+penguins.tdr(elements: 3) # Show first 3 items in data
+```
+`tally` : max level at which tally mode is used. Level means the size of the `tally`ed hash. Default value is `tally: 5`.
+```{ruby}
+#| tags: []
+penguins.tdr(tally: 0) # Don't use tally mode
+```
+`#tdr_str` returns a String. `#tdr` does the same thing as `puts #tdr_str`.
+```{ruby}
+#| tags: []
+puts penguins.tdr_str
+```
+(Since 0.4.0) `#glimpse` is an alias for `#tdr`.
+```{ruby}
+#| tags: []
+mtcars.glimpse(:all, elements: 10)
+```
+## 10. Size and shape
+```{ruby}
+#| tags: []
+# same as n_rows, n_obs
+df.size
+```
+```{ruby}
+#| tags: []
+# same as n_cols, n_vars
+df.n_keys
+```
+```{ruby}
+#| tags: []
+# [df.size, df.n_keys], [df.n_rows, df.n_cols]
+df.shape
+```
+## 11. Keys
+```{ruby}
+#| tags: []
+df.keys
+```
+```{ruby}
+#| tags: []
+penguins.keys
+```
+## 12. Types
+```{ruby}
+#| tags: []
+df.types
+```
+```{ruby}
+#| tags: []
+penguins.types
+```
+## 13. Data type classes
+```{ruby}
+#| tags: []
+df.type_classes
+```
+```{ruby}
+#| tags: []
+penguins.type_classes
+```
+## 14. Indices
+Another example of `indices` is in [66. Custom index](#66.-Custom-index).
+```{ruby}
+#| tags: []
+df.indexes
+# or
+df.indices
+```
+(Since 0.2.3) `#indices` returns a Vector.
+## 15. To an Array or a Hash
+`DataFrame#to_a` returns an array of row-oriented data without a header.
+```{ruby}
+#| tags: []
+df.to_a
+```
+If you need a column-oriented array with keys, use `.to_h.to_a`
+```{ruby}
+#| tags: []
+df.to_h
+```
+```{ruby}
+#| tags: []
+df.to_h.to_a
+```
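The difference between the row-oriented and column-oriented layouts can be sketched in plain Ruby. This is only an illustration with hypothetical sample values; it does not call RedAmber, and the variable names are assumptions made for the example.

```ruby
# Column-oriented data keyed by column name (hypothetical sample values).
columns = { x: [1, 2, 3], y: %w[A B C] }

# Row-oriented view without a header: transpose the column arrays.
rows = columns.values.transpose

# Column-oriented array with keys: each entry is [key, column_values].
keyed_columns = columns.to_a

p rows
p keyed_columns
```

The row-oriented form loses the keys, which is why the keyed, column-oriented form is the one to use when the header matters.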
+## 16. Schema
+The schema is a Hash of key and value-type pairs.
+```{ruby}
+#| tags: []
+df.schema
+```
+## 17. Vector
+Each variable (column in the table) is represented by a Vector object.
+```{ruby}
+#| tags: []
+df[:x] # This syntax is explained later
+```
+Or create a new Vector with the constructor.
+```{ruby}
+#| tags: []
+Vector.new(1, 2, 3, 4, 5)
+```
+```{ruby}
+#| tags: []
+Vector.new(1..5)
+```
+```{ruby}
+#| tags: []
+Vector.new([1, 2, 3], [4, 5])
+```
+```{ruby}
+#| tags: []
+array = Arrow::Array.new([1, 2, 3, 4, 5])
+Vector.new(array)
+```
+(Since 0.4.2) A new constructor `Vector[*array_like]` was introduced.
+```{ruby}
+#| tags: []
+Vector[1, 2, 3, 4, 5]
+```
+## 18. Vectors
+Returns the Vectors of a DataFrame as an Array.
+```{ruby}
+#| tags: []
+df.vectors
+```
+## 19. Variables
+Returns key and Vector pairs as a Hash.
+```{ruby}
+#| tags: []
+df.variables
+```
+## 20. Select columns by #[ ]
+`DataFrame#[]` is overloaded for column operations and row operations.
+- For columns (variables)
+  - Key in a Symbol: `df[:symbol]`
+  - Key in a String: `df["string"]`
+  - Keys in an Array: `df[:symbol1, "string", :symbol2]`
+  - Keys by indices: `df[df.keys[0]]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
+```{ruby}
+#| tags: []
+# Keys in a Symbol and a String
+df[:x, 'y']
+```
+```{ruby}
+#| tags: []
+# Keys in a Range
+df[:x..:y]
+```
+```{ruby}
+#| tags: []
+# Keys with an index Range, and a symbol
+df[df.keys[2..], :x]
+```
+## 21. Select rows by #[ ]
+`DataFrame#[]` is overloaded for column operations and row operations.
+- For rows (observations)
+  - Select rows by an index: `df[index]`
+  - Select rows by indices: `df[indices]` # Array, Arrow::Array and Vector are acceptable for indices
+  - Select rows by Ranges: `df[range]`
+  - Select rows by booleans: `df[booleans]` # Array, Arrow::Array and Vector are acceptable for booleans
+```{ruby}
+#| tags: []
+# indices
+df[0, 2, 1]
+```
+```{ruby}
+#| tags: []
+# including a Range
+# negative indices are also acceptable
+df[1..2, -1]
+```
+```{ruby}
+#| tags: []
+# booleans
+# length of boolean should be the same as self
+df[false, true, true, false, true]
+```
+```{ruby}
+#| tags: []
+# Arrow::Array
+indices = Arrow::UInt8Array.new([0,2,4])
+df[indices]
+```
+```{ruby}
+#| tags: []
+# By a Vector as indices
+indices = Vector.new(df.indices)
+# indices > 1 returns a boolean Vector
+df[indices > 1]
+```
+```{ruby}
+#| tags: []
+# By a Vector as booleans
+booleans = df[:b]
+```
+```{ruby}
+#| tags: []
+df[booleans]
+```
+## 22. empty?
+```{ruby}
+#| tags: []
+df.empty?
+```
+```{ruby}
+#| tags: []
+DataFrame.new
+```
+```{ruby}
+#| tags: []
+DataFrame.new.empty?
+```
+## 23. Select columns by pick
+`DataFrame#pick` accepts an Array of keys to pick up columns (variables) and creates a new DataFrame. You can change the order of columns at the same time.
+The name `pick` comes from the action of picking variables (columns) according to their label keys.
+```{ruby}
+#| tags: []
+df.pick(:s, :y)
+# or
+df.pick([:s, :y]) # OK too.
+```
+Or use a boolean Array of length `n_keys` to `pick`. This style preserves the order of variables.
+```{ruby}
+#| tags: []
+df.pick(false, true, true, false)
+# or
+df.pick([false, true, true, false])
+# or
+df.pick(Vector.new([false, true, true, false]))
+```
+`#pick` also accepts a block in the context of self.
+Next example is picking up numeric variables.
+```{ruby}
+#| tags: []
+# the receiver is required with the argument style
+df.pick(df.vectors.map(&:numeric?))
+# with a block
+df.pick { vectors.map(&:numeric?) }
+```
+`pick` also accepts numeric indexes.
+(Since 0.2.1)
+```{ruby}
+#| tags: []
+df.pick(0, 3)
+```
+## 24. Reject columns by drop
+`DataFrame#drop` accepts an Array of keys to drop columns (variables) and creates a DataFrame from the remaining columns.
+The name `drop` was chosen as the counterpart of `pick`.
+```{ruby}
+#| tags: []
+df.drop(:x, :b)
+# df.drop([:x, :b]) # is OK too.
+```
+Or use a boolean Array of length `n_keys` to `drop`.
+```{ruby}
+#| tags: []
+df.drop(true, false, false, true)
+# df.drop([true, false, false, true]) # is OK too
+```
+`#drop` also accepts a block in the context of self.
+Next example will drop variables which have nil or NaN values.
+```{ruby}
+#| tags: []
+df.drop { vectors.map { |v| v.is_na.any } }
+```
+The argument style is also acceptable, but it requires the receiver 'df'.
+```{ruby}
+#| tags: []
+df.drop(df.vectors.map { |v| v.is_na.any })
+```
+`drop` also accepts numeric indexes.
+(Since 0.2.1)
+```{ruby}
+#| tags: []
+df.drop(0, 3)
+```
+## 25. Pick/drop and nil
+When `pick` or `drop` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `BasicObject#!`.
+```{ruby}
+#| tags: []
+booleans = [true, true, false, nil]
+booleans_invert = booleans.map(&:!) # => [false, false, true, true] because nil.! is true
+df.pick(booleans) == df.drop(booleans_invert)
+```
+## 26. Vector#invert, #primitive_invert
+For the boolean Vector:
+```{ruby}
+#| tags: []
+vector = Vector.new(booleans)
+```
+nil is converted to nil by `Vector#invert`.
+```{ruby}
+#| tags: []
+vector.invert
+# or
+!vector
+```
+So `df.pick(booleans) != df.drop(booleans.invert)` when booleans have any nils.
+On the other hand, `Vector#primitive_invert` follows Ruby's `BasicObject#!`'s behavior. Then pick and drop keep 'MECE' behavior.
+```{ruby}
+#| tags: []
+vector.primitive_invert
+```
+```{ruby}
+#| tags: []
+df.pick(vector) == df.drop(vector.primitive_invert)
+```
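The two inversion rules can be modeled with plain Ruby arrays. The helper names below are illustrative stand-ins, not RedAmber methods, and the sketch assumes that a nil mask entry is treated as false when selecting.

```ruby
# Plain-Ruby model of the two inversions; these helpers are illustrative
# stand-ins, not RedAmber methods.
def null_propagating_invert(booleans)
  booleans.map { |b| b.nil? ? nil : !b } # nil stays nil
end

def primitive_invert(booleans)
  booleans.map(&:!) # nil.! is true
end

# Keep the items whose mask entry is exactly true (nil counts as false).
def select_by(items, mask)
  items.zip(mask).select { |_, m| m == true }.map(&:first)
end

keys = %i[x y s b]
mask = [true, true, false, nil]

picked         = select_by(keys, mask)
rest_nullable  = select_by(keys, null_propagating_invert(mask))
rest_primitive = select_by(keys, primitive_invert(mask))

p picked          # [:x, :y]
p rest_nullable   # [:s]      (:b is in neither set)
p rest_primitive  # [:s, :b]  (the two sets cover every key exactly once)
```

Only the primitive inversion splits the keys into two sets that are mutually exclusive and collectively exhaustive, which is the 'MECE' behavior mentioned above.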
+## 27. Pick/drop, #[] and #v
+When `pick` or `drop` selects a single column (variable), it returns a `DataFrame` with one column (variable).
+```{ruby}
+#| tags: []
+df.pick(:x) # or
+df.drop(:y, :s, :b)
+```
+In contrast, when `[]` selects a single column (variable), it returns a `Vector`.
+```{ruby}
+#| tags: []
+df[:x]
+```
+This behavior may be useful to use with DataFrame manipulation verbs (like pick, drop, slice, remove, assign, rename).
+```{ruby}
+#| tags: []
+df.pick { keys.select { |key| df[key].numeric? } }
+```
+The `df#v` method is the same as `df#[]` for picking a Vector, but it is a little faster and easier to use in a block.
+```{ruby}
+#| tags: []
+df.v(:x)
+```
+## 28. Slice
+Another example of `slice` is [#70. Row index label by slice_by](#70.-Row-index-label-by-slice_by).
+`slice` selects rows (records) to create a subset of a DataFrame.
+`slice(indices)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers. A negative index counted from the tail, as in Ruby's Array, is also acceptable.
+```{ruby}
+#| tags: []
+# returns 5 rows from the start and 5 rows from the end
+penguins.slice(0...5, -5..-1)
+```
+```{ruby}
+#| tags: []
+# slice accepts Float index
+# 33% of 344 observations in index => 113.52 th data ??
+indexed_penguins = penguins.assign_left { [:index, indexes] } # #assign_left and assigner by Array is 0.2.0 feature
+indexed_penguins.slice(penguins.size * 0.33)
+```
+Indices in Vectors or Arrow::Arrays are also acceptable.
+Another way to select in `slice` is to use booleans. An alias for this feature is `filter`.
+- Booleans can be an Array, an Arrow::Array, a Vector, or an Array of these.
+- Each data type must be boolean.
+- The size of the booleans must be the same as the size of self.
+```{ruby}
+#| tags: []
+# make boolean Vector to check over 40
+booleans = penguins[:bill_length_mm] > 40
+```
+```{ruby}
+#| tags: []
+penguins.slice(booleans)
+```
+`slice` accepts a block.
+- We can't use both arguments and a block at the same time.
+- The block should return indices of any length or a boolean Array with the same length as `size`.
+- The block is called in the context of self, so the receiver 'self' can be omitted in the block.
+```{ruby}
+#| tags: []
+# return a DataFrame whose bill_length_mm is within mean ± std (a range 2*std wide)
+penguins.slice do
+  min = bill_length_mm.mean - bill_length_mm.std
+  max = bill_length_mm.mean + bill_length_mm.std
+  bill_length_mm.to_a.map { |e| (min..max).include? e }
+end
+```
+## 29. Slice and nil option
+`Arrow::Table#slice` uses the `#filter` method with the option `Arrow::FilterOptions.null_selection_behavior = :emit_null`. This propagates nil to the same row.
+```{ruby}
+#| tags: []
+hash = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3] }
+table = Arrow::Table.new(hash)
+table.slice([true, false, nil])
+```
+In RedAmber, by contrast, nil in the booleans passed to `DataFrame#slice` is treated as false. This behavior comes from `Arrow::FilterOptions.null_selection_behavior = :drop`, which is the default value for the `Arrow::Table.filter` method.
+```{ruby}
+#| tags: []
+RedAmber::DataFrame.new(table).slice([true, false, nil]).table
+```
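The two null-selection behaviors can be sketched in plain Ruby. `filter_rows` is a hypothetical helper written for this illustration, not an Arrow or RedAmber API, and a single `nil` stands in for a row whose columns are all null.

```ruby
# Illustrative helper, not an Arrow or RedAmber API.
# behavior: :drop skips rows whose mask entry is nil;
#           :emit_null emits a nil placeholder row instead.
def filter_rows(rows, mask, behavior:)
  rows.zip(mask).each_with_object([]) do |(row, keep), out|
    if keep.nil?
      out << nil if behavior == :emit_null
    elsif keep
      out << row
    end
  end
end

rows = [[1, 'A', 1.0], [2, 'B', 2.0], [3, 'C', 3.0]]
mask = [true, false, nil]

emitted = filter_rows(rows, mask, behavior: :emit_null)
dropped = filter_rows(rows, mask, behavior: :drop)

p emitted # [[1, "A", 1.0], nil]
p dropped # [[1, "A", 1.0]]
```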
+## 30. Remove
+Slice and reject rows (observations) to create a DataFrame from the remaining rows.
+`#remove(indices)` accepts indices as arguments. Indices should be Integers or Ranges of Integers.
+```{ruby}
+#| tags: []
+# returns 6th to 339th obs. Remainder of penguins.slice(0...5, -5..-1)
+penguins.remove(0...5, -5..-1)
+```
+`remove(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray. Booleans must be the same length as `#size`.
+```{ruby}
+#| tags: []
+# remove all observations containing nil
+removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
+```
+`remove {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return indices or a boolean Array with the same length as size. The block is called in the context of self.
+```{ruby}
+#| tags: []
+# Remove data in 2*std range around mean
+penguins.remove do
+  vector = self[:bill_length_mm]
+  min = vector.mean - vector.std
+  max = vector.mean + vector.std
+  vector.to_a.map { |e| (min..max).include? e }
+end
+```
+## 31. Remove and nil
+When `remove` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `nil#!`.
+```{ruby}
+#| tags: []
+df = RedAmber::DataFrame.new(a: [1, 2, nil], b: %w[A B C], c: [1.0, 2, 3])
+```
+```{ruby}
+#| tags: []
+booleans = df[:a] < 2
+```
+```{ruby}
+#| tags: []
+booleans_invert = booleans.to_a.map(&:!)
+```
+```{ruby}
+#| tags: []
+df.slice(booleans) == df.remove(booleans_invert)
+```
+In contrast, `Vector#invert` returns nil for nil elements, which gives a different result. (See #26)
+```{ruby}
+#| tags: []
+booleans.invert
+```
+```{ruby}
+#| tags: []
+df.remove(booleans.invert)
+```
+We have `#primitive_invert` method in Vector. This method returns the same result as `.to_a.map(&:!)` above.
+```{ruby}
+#| tags: []
+booleans.primitive_invert
+```
+```{ruby}
+#| tags: []
+df.remove(booleans.primitive_invert)
+```
+```{ruby}
+#| tags: []
+df.slice(booleans) == df.remove(booleans.primitive_invert)
+```
+## 32. Remove nil
+Remove any observations containing nil.
+```{ruby}
+#| tags: []
+penguins.remove_nil
+```
+The roundabout way for this is to use `#remove`.
+```{ruby}
+#| tags: []
+penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
+```
+## 33. Rename
+Rename keys (column names) to create an updated DataFrame.
+`#rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}` or an Array of Arrays `[[existing_key, new_key], ...]`.
+```{ruby}
+#| tags: []
+h = { name: %w[Yasuko Rui Hinata], age: [68, 49, 28] }
+comecome = RedAmber::DataFrame.new(h)
+```
+```{ruby}
+#| tags: []
+comecome.rename(age: :age_in_1993)
+# comecome.rename(:age, :age_in_1993) # is also OK
+# comecome.rename([:age, :age_in_1993]) # is also OK
+```
+`#rename {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return key_pairs as a Hash of `{existing_key => new_key}` or an Array of Arrays `[[existing_key, new_key], ...]`. The block is called in the context of self.
+Symbol keys and String keys are distinguished.
+## 34. Assign
+Other examples of `assign` are [68. Assign revised](#68.-Assign-revised) and [#69. Variations of assign](#69.-Variations-of-assign).
+Assign new or updated columns (variables) and create an updated DataFrame.
+- Columns with new keys will append new variables at the right (bottom in TDR).
+- Columns with existing keys will update the corresponding vectors.
+`#assign(key_pairs)` accepts pairs of key and array_like values as arguments. The pairs should be a Hash of `{key => array_like}` or an Array of Arrays `[[key, array_like], ... ]`. `array_like` is one of `Vector`, `Array` or `Arrow::Array`.
+```{ruby}
+#| tags: []
+comecome = RedAmber::DataFrame.new( name: %w[Yasuko Rui Hinata], age: [68, 49, 28] )
+```
+```{ruby}
+#| tags: []
+# update :age and add :brother
+assigner = { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }
+comecome.assign(assigner)
+```
+`#assign {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return pairs of key and array_like values as a Hash of `{key => array_like}` or an Array of Arrays `[[key, array_like], ... ]`. `array_like` is one of `Vector`, `Array` or `Arrow::Array`. The block is called in the context of self.
+```{ruby}
+#| tags: []
+df = RedAmber::DataFrame.new(
+  index: [0, 1, 2, 3, nil],
+  float: [0.0, 1.1,  2.2, Float::NAN, nil],
+  string: ['A', 'B', 'C', 'D', nil])
+```
+```{ruby}
+#| tags: []
+# update numeric variables
+df.assign do
+  vectors.select(&:numeric?).map { |v| [v.key, -v] }
+end
+```
+In this example, columns :index and :float are updated. Column :index returns complements for the `#negate` method because :index is of type :uint8.
+```{ruby}
+#| tags: []
+df.types
+```
+## 35. Coerce in Vector
+Vector has a `coerce` method.
+```{ruby}
+#| tags: []
+vector = RedAmber::Vector.new(1,2,3)
+```
+```{ruby}
+#| tags: []
+# Vector's `#*` method
+vector * -1
+```
+```{ruby}
+#| tags: []
+# coerced calculation
+-1 * vector
+```
+```{ruby}
+#| tags: []
+# `-@` (unary minus) operator
+-vector
+```
+## 36. Vector#to_ary
+`Vector#to_ary` enables implicit conversion to an Array.
+```{ruby}
+#| tags: []
+Array(Vector.new([3, 4, 5]))
+```
+```{ruby}
+#| tags: []
+[1, 2] + Vector.new([3, 4, 5])
+```
+```{ruby}
+#| tags: []
+[1, 2, Vector.new([3, 4, 5])].flatten
+```
+## 37. Vector#fill_nil
+`Vector#fill_nil_forward` and `Vector#fill_nil_backward` propagate the last valid observation forward (or backward).
+A nil is preserved when there is no valid value before it (forward) or after it (backward).
+```{ruby}
+#| tags: []
+integer = Vector.new([0, 1, nil, 3, nil])
+integer.fill_nil_forward
+```
+```{ruby}
+#| tags: []
+integer.fill_nil_backward
+```
+(Since 0.4.2) `Vector#fill_nil(value)` replaces each `nil` in self with `value`.
+```{ruby}
+#| tags: []
+integer.fill_nil(-1)
+```
+If `value` has a wider type, self is automatically upcast.
+The integer Vector is cast to double in the next example.
+```{ruby}
+#| tags: []
+integer.fill_nil(0.1)
+```
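The fill rules described in this section can be sketched with plain Ruby arrays. The helper methods are hypothetical and written only for this illustration; they model the behavior on Arrays and are not RedAmber's implementation.

```ruby
# Illustrative helpers on plain Arrays, not RedAmber methods.

# Replace each nil with the most recent non-nil value seen so far.
# Leading nils stay nil because there is nothing to propagate yet.
def fill_forward(values)
  last = nil
  values.map { |v| v.nil? ? last : (last = v) }
end

# Backward fill is forward fill applied to the reversed sequence.
def fill_backward(values)
  fill_forward(values.reverse).reverse
end

values = [0, 1, nil, 3, nil]

forward  = fill_forward(values)
backward = fill_backward(values)
constant = values.map { |v| v.nil? ? -1 : v }

p forward   # [0, 1, 1, 3, 3]
p backward  # [0, 1, 3, 3, nil]
p constant  # [0, 1, -1, 3, -1]
```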
+## 38. Vector#all?/any?
+`Vector#all?` returns true if all elements are true.
+`Vector#any?` returns true if any element is true.
+These are unary aggregation functions.
+```{ruby}
+#| tags: []
+booleans = Vector.new([true, true, nil])
+booleans.all?
+```
+```{ruby}
+#| tags: []
+booleans.any?
+```
+If these methods are used with the option `skip_nulls: false`, nil is taken into account.
+```{ruby}
+#| tags: []
+booleans.all?(skip_nulls: false)
+```
+```{ruby}
+#| tags: []
+booleans.any?(skip_nulls: false)
+```
+## 39. Vector#count/count_uniq
+`Vector#count` counts the elements.
+`Vector#count_uniq` counts the unique elements. `#count_distinct` is an alias (Arrow's name).
+These are unary aggregation functions.
+```{ruby}
+#| tags: []
+string = Vector.new(%w[A B A])
+string.count
+```
+```{ruby}
+#| tags: []
+string.count_uniq # count_distinct is also OK
+```
+## 40. Vector#stddev/variance
+These are unary aggregation functions.
+For biased standard deviation:
+```{ruby}
+#| tags: []
+integers = Vector.new([1, 2, 3, nil])
+integers.stddev
+```
+For unbiased standard deviation:
+```{ruby}
+#| tags: []
+integers.sd
+```
+For biased variance:
+```{ruby}
+#| tags: []
+integers.variance
+```
+For unbiased variance:
+```{ruby}
+#| tags: []
+integers.var
+```
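The difference between the biased and unbiased estimators is only the divisor. The sketch below computes both in plain Ruby; the `variance` helper and its `ddof` (delta degrees of freedom) parameter are assumptions made for this illustration, and it assumes nil values are skipped.

```ruby
# Illustrative helper, not a RedAmber method.
# ddof: 0 gives the biased (population) variance, dividing by n;
# ddof: 1 gives the unbiased (sample) variance, dividing by n - 1.
def variance(values, ddof:)
  xs = values.compact # skip nil
  mean = xs.sum.to_f / xs.size
  xs.sum { |x| (x - mean)**2 } / (xs.size - ddof)
end

values = [1, 2, 3, nil]

biased_variance   = variance(values, ddof: 0)
unbiased_variance = variance(values, ddof: 1)
biased_stddev     = Math.sqrt(biased_variance)
unbiased_stddev   = Math.sqrt(unbiased_variance)

p biased_variance   # 2/3, about 0.667
p unbiased_variance # 1.0
p biased_stddev     # about 0.816
p unbiased_stddev   # 1.0
```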
+## 41. Vector#negate
+This is a unary element-wise function.
+```{ruby}
+#| tags: []
+double = Vector.new([1.0, -2, 3])
+double.negate
+```
+Same as `#negate`:
+```{ruby}
+#| tags: []
+-double
+```
+## 42. Vector#round
+Options for `#round`:
+- `n_digits` The number of digits to show.
+- `mode` Specify the rounding mode.
+This is a unary element-wise function.
+```{ruby}
+#| tags: []
+double = RedAmber::Vector.new([15.15, 2.5, 3.5, -4.5, -5.5])
+```
+```{ruby}
+#| tags: []
+double.round
+```
+```{ruby}
+#| tags: []
+double.round(mode: :half_to_even)
+```
+```{ruby}
+#| tags: []
+double.round(mode: :towards_infinity)
+```
+```{ruby}
+#| tags: []
+double.round(mode: :half_up)
+```
+```{ruby}
+#| tags: []
+double.round(mode: :half_towards_zero)
+```
+```{ruby}
+#| tags: []
+double.round(mode: :half_towards_infinity)
+```
+```{ruby}
+#| tags: []
+double.round(mode: :half_to_odd)
+```
+```{ruby}
+#| tags: []
+double.round(n_digits: 0)
+```
+```{ruby}
+#| tags: []
+double.round(n_digits: 1)
+```
+```{ruby}
+#| tags: []
+double.round(n_digits: -1)
+```
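Tie-breaking rules only matter for values exactly halfway between two integers. As a rough point of comparison, Ruby's built-in `Float#round` offers three tie-breaking rules through its `half:` keyword. The mapping to Arrow's mode names in the comments is an assumption based on the mode names, not something taken from RedAmber.

```ruby
values = [15.15, 2.5, 3.5, -4.5, -5.5]

# Ties go to the nearest even integer (compare :half_to_even).
half_even = values.map { |v| v.round(half: :even) }

# Ties go away from zero (compare :half_towards_infinity).
half_away = values.map { |v| v.round(half: :up) }

# Ties go towards zero (compare :half_towards_zero).
half_zero = values.map { |v| v.round(half: :down) }

p half_even # [15, 2, 4, -4, -6]
p half_away # [15, 3, 4, -5, -6]
p half_zero # [15, 2, 3, -4, -5]
```

Note that 15.15 is not a tie, so it rounds to 15 under every rule.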
+## 43. Vector#and/or
+RedAmber selects `and_kleene`/`or_kleene` as the default `&`/`|` methods.
+These are binary element-wise functions.
+```{ruby}
+#| tags: []
+bool_self  = Vector.new([true, true, true, false, false, false, nil, nil, nil])
+bool_other = Vector.new([true, false, nil, true, false, nil, true, false, nil])
+bool_self & bool_other  # same as bool_self.and_kleene(bool_other)
+```
+```{ruby}
+#| tags: []
+# Ruby's primitive `&&`
+bool_self && bool_other
+```
+```{ruby}
+#| tags: []
+# Arrow's default `and`
+bool_self.and_org(bool_other)
+```
+```{ruby}
+#| tags: []
+bool_self | bool_other  # same as bool_self.or_kleene(bool_other)
+```
+```{ruby}
+#| tags: []
+# Ruby's primitive `||`
+bool_self || bool_other
+```
+```{ruby}
+#| tags: []
+# Arrow's default `or`
+bool_self.or_org(bool_other)
+```
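The difference between Kleene logic and null-propagating logic can be shown with truth tables in plain Ruby. The helper methods are hypothetical stand-ins written for this sketch; they model the rules on single values and are not the Arrow compute kernels.

```ruby
# Kleene AND: false wins over nil, because the result is false
# whatever the unknown value turns out to be.
def and_kleene(a, b)
  return false if a == false || b == false
  return nil if a.nil? || b.nil?

  true
end

# Kleene OR: true wins over nil for the same reason.
def or_kleene(a, b)
  return true if a == true || b == true
  return nil if a.nil? || b.nil?

  false
end

# Null-propagating AND: any nil operand makes the result nil.
def and_null(a, b)
  return nil if a.nil? || b.nil?

  a && b
end

bool_self  = [true, true, true, false, false, false, nil, nil, nil]
bool_other = [true, false, nil, true, false, nil, true, false, nil]
pairs = bool_self.zip(bool_other)

kleene_and = pairs.map { |a, b| and_kleene(a, b) }
kleene_or  = pairs.map { |a, b| or_kleene(a, b) }
null_and   = pairs.map { |a, b| and_null(a, b) }

p kleene_and # [true, false, nil, false, false, false, nil, false, nil]
p kleene_or  # [true, true, true, true, false, nil, true, nil, nil]
p null_and   # [true, false, nil, false, false, nil, nil, nil, nil]
```

The results differ only where one operand is nil and the other already decides the answer, such as `false & nil`.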
+## 44. Vector#is_finite/is_nan/is_nil/is_na
+These are unary element-wise functions.
+```{ruby}
+#| tags: []
+double = Vector.new([Math::PI, Float::INFINITY, -Float::INFINITY, Float::NAN, nil])
+```
+```{ruby}
+#| tags: []
+double.is_finite
+```
+```{ruby}
+#| tags: []
+double.is_inf
+```
+```{ruby}
+#| tags: []
+double.is_na
+```
+```{ruby}
+#| tags: []
+double.is_nil
+```
+```{ruby}
+#| tags: []
+double.is_valid
+```
+## 45. Prime-th rows
+```{ruby}
+#| tags: []
+# prime-th rows ... Don't ask me what it means.
+require 'prime'
+penguins.assign_left(:index, penguins.indices + 1) # since 0.2.0
+        .slice { Vector.new(Prime.each(size).to_a) - 1 }
+```
+## 46. Slice by Enumerator
+`slice` accepts an Enumerator.
+```{ruby}
+#| tags: []
+# Select every 10th sample
+penguins.assign_left(index: penguins.indices) # 0.2.0 feature
+        .slice(0.step(by: 10, to: 340))
+```
+```{ruby}
+#| tags: []
+# Select 2 consecutive samples at every 100th index
+penguins.assign_left(index: penguins.indices) # 0.2.0 feature
+        .slice { 0.step(by: 100, to: 300).map { |i| i..(i+1) } }
+```
+## 47. Output mode
+The output mode of `DataFrame#inspect` and `DataFrame#to_iruby` is Table mode by default. If you prefer another mode, set the environment variable `RED_AMBER_OUTPUT_MODE`.
+```{ruby}
+#| tags: []
+ENV['RED_AMBER_OUTPUT_MODE'] = 'Table' # or nil (default)
+penguins  # Almost same as `puts penguins.to_s` in any mode
+```
+```{ruby}
+#| tags: []
+penguins[:species]
+```
+```{ruby}
+#| tags: []
+ENV['RED_AMBER_OUTPUT_MODE'] = 'Plain' # Since 0.2.2
+penguins
+```
+```{ruby}
+#| tags: []
+penguins[:species]
+```
+```{ruby}
+#| tags: []
+ENV['RED_AMBER_OUTPUT_MODE'] = 'Minimum'  # Since 0.2.2
+penguins
+```
+```{ruby}
+#| tags: []
+penguins[:species]
+```
+```{ruby}
+#| tags: []
+ENV['RED_AMBER_OUTPUT_MODE'] = 'TDR'
+penguins
+```
+```{ruby}
+#| tags: []
+penguins[:species]
+```
+```{ruby}
+#| tags: []
+ENV['RED_AMBER_OUTPUT_MODE'] = nil
+```
+## 48. Empty key
+An empty key `:""` will be automatically renamed to `:unnamed1`.
+If `:unnamed1` is already in use, `:unnamed1.succ` will be used.
+(Since 0.1.8)
+```{ruby}
+#| tags: []
+df = DataFrame.new("": [1, 2], unnamed1: [3, 4])
+```
+## 49. Grouping
+`DataFrame#group` takes group_keys as arguments and creates a `Group` object.
+Inspecting a Group shows the count of each unique element.
+(Since 0.1.7)
+```{ruby}
+#| tags: []
+group = penguins.group(:species)
+```
+An instance of the `Group` class provides summary functions as methods.
+Each returns the summarized columns, named in `function(key)` style.
+```{ruby}
+#| tags: []
+group.count
+```
+If the count is the same across multiple columns, the count columns are merged into a single `:count` column.
+```{ruby}
+#| tags: []
+penguins.pick(:species, :bill_length_mm, :bill_depth_mm).group(:species).count
+```
+The grouping key comes first (leftmost) in the columns.
+## 50. Grouping with a block
+`DataFrame#group` takes a block in which we can specify multiple functions.
+The block is evaluated in the context of the Group instance, so we can use summary functions without the receiver.
+(Since 0.1.8)
+```{ruby}
+#| tags: []
+penguins.group(:species) { [count(:species), mean(:body_mass_g)] }
+```
+`Group#summarize` accepts the same block as `DataFrame#group`.
+```{ruby}
+#| tags: []
+group.summarize { [count(:species), mean] }
+```
+## 51. Group#count family
+`Group#count` counts the number of non-nil values in each group.
+If counts are the same (and do not include NaN or nil), columns for counts are unified.
+```{ruby}
+dataframe = DataFrame.new(
+  x: [*1..6],
+  y: %w[A A B B B C],
+  z: [false, true, false, nil, true, false])
+```
+When grouped by y, the non-nil counts in columns x and z are different.
+```{ruby}
+dataframe.group(:y).count
+```
+When grouped by z, the non-nil counts in columns x and y are the same, so only one column is emitted.
+```{ruby}
+dataframe.group(:z).count
+```
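To see why the two groupings behave differently, the non-nil counting can be reproduced with plain Ruby hashes. This is a sketch only: `count_non_nil` is a hypothetical helper, not RedAmber's implementation, and it works on an array of row hashes holding the same data as `dataframe`.

```ruby
# The same data as `dataframe`, written as an array of row hashes.
rows = [
  { x: 1, y: "A", z: false },
  { x: 2, y: "A", z: true },
  { x: 3, y: "B", z: false },
  { x: 4, y: "B", z: nil },
  { x: 5, y: "B", z: true },
  { x: 6, y: "C", z: false }
]

# Hypothetical helper: for each group, count the non-nil values
# in every column other than the group key.
def count_non_nil(rows, group_key)
  rows.group_by { |row| row[group_key] }.transform_values do |members|
    other_keys = members.first.keys - [group_key]
    other_keys.to_h { |key| [key, members.count { |row| !row[key].nil? }] }
  end
end

by_y = count_non_nil(rows, :y)
by_z = count_non_nil(rows, :z)
```

Grouped by `:y`, group `"B"` has 3 non-nil values in `x` but only 2 in `z`, so the counts cannot be merged. Grouped by `:z`, the counts of `x` and `y` agree in every group. Note that `false` is a non-nil value and is counted.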
+`Group#count_all` returns the size of each group as a DataFrame. `Group#group_count` is an alias.
+```{ruby}
+dataframe.group(:y).count_all
+```
+`Group#count_uniq` counts the unique values in each group and returns them as a DataFrame. `Group#count_distinct` is an alias.
+```{ruby}
+dataframe.group(:y).count_uniq
+```
+## 52. Group#one
+`Group#one` gets one value from each group.
+```{ruby}
+dataframe.group(:y).one
+```
+## 53. Group aggregation functions
+`Group#all` emits aggregated booleans indicating whether all elements in each group evaluate to true.
+```{ruby}
+dataframe.group(:y).all
+```
+`Group#any` emits aggregated booleans indicating whether any element in each group evaluates to true.
+```{ruby}
+dataframe.group(:y).any
+```
+`Group#max` computes maximum of values in each group for numeric columns.
+```{ruby}
+dataframe.group(:y).max
+```
+`Group#mean` computes mean of values in each group for numeric columns.
+```{ruby}
+dataframe.group(:y).mean
+```
+`Group#median` computes median of values in each group for numeric columns.
+```{ruby}
+dataframe.group(:y).median
+```
+`Group#min` computes minimum of values in each group for numeric columns.
+```{ruby}
+dataframe.group(:y).min
+```
+`Group#product` computes product of values in each group for numeric columns.
+```{ruby}
+dataframe.group(:y).product
+```
+`Group#stddev` computes standard deviation of values in each group for numeric columns.
+```{ruby}
+dataframe.group(:y).stddev
+```
+`Group#sum` computes sum of values in each group for numeric columns.
+```{ruby}
+dataframe.group(:y).sum
+```
+`Group#variance` computes variance of values in each group for numeric columns.
+```{ruby}
+dataframe.group(:y).variance
+```
+## 54. Group#grouped_frame
+`Group#grouped_frame` returns a DataFrame containing only the group keys. `#none` is an alias.
+```{ruby}
+dataframe.group(:y).grouped_frame
+```
+## 55. Vector#shift
+`Vector#shift(amount = 1, fill: nil)`
+Shifts the vector's values by the specified `amount`. The vacated positions are filled with the value `fill`.
+(Since 0.1.8)
+```{ruby}
+#| tags: []
+vector = RedAmber::Vector.new([1, 2, 3, 4, 5])
+vector.shift
+```
+```{ruby}
+#| tags: []
+vector.shift(-2)
+```
+```{ruby}
+#| tags: []
+vector.shift(fill: Float::NAN)
+```
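The shifting behavior can be sketched on a plain Ruby Array. The helper `shift_values` is hypothetical and not part of RedAmber. It assumes that a positive `amount` moves values toward the end of the vector and a negative `amount` moves them toward the beginning, as pandas `shift` does.

```ruby
# Hypothetical helper (not RedAmber's API): shift array values by `amount`
# and pad the vacated positions with `fill`.
# Assumption: positive amounts shift toward the end, negative toward the start.
def shift_values(values, amount = 1, fill: nil)
  return values.dup if amount.zero?

  n = [amount.abs, values.size].min
  padding = Array.new(n, fill)
  if amount.positive?
    padding + values[0, values.size - n]
  else
    values[n..] + padding
  end
end

shifted_right = shift_values([1, 2, 3, 4, 5])
shifted_left = shift_values([1, 2, 3, 4, 5], -2)
```

Under that assumption, `shifted_right` is `[nil, 1, 2, 3, 4]` and `shifted_left` is `[3, 4, 5, nil, nil]`.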
+## 56. From the Pandas cookbook - if-then
+https://pandas.pydata.org/docs/user_guide/cookbook.html#if-then
+```python
+# by Python Pandas
+df = pd.DataFrame(
+    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
+)
+df.loc[df.AAA >= 5, "BBB"] = -1
+# returns =>
+   AAA  BBB  CCC
+0    4   10  100
+1    5   -1   50
+2    6   -1  -30
+3    7   -1  -50
+```
+```{ruby}
+#| tags: []
+# RedAmber
+df = DataFrame.new(
+  "AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]  # You can omit {}
+)
+df.assign(BBB: df[:BBB].replace(df[:AAA] >= 5, -1))
+```
+To replace both `:BBB` and `:CCC`:
+```{ruby}
+#| tags: []
+df.assign do
+  replacer = v(:AAA) >= 5  # Boolean Vector
+  {
+    BBB: v(:BBB).replace(replacer, -1),
+    CCC: v(:CCC).replace(replacer, -2)
+  }
+end
+```
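Conceptually, replacing by a boolean mask is an element-wise choice between the original value and the replacement. A minimal plain Ruby sketch follows. `replace_where` is a hypothetical helper, not RedAmber's API, and it ignores details such as `nil` in the mask or type casting.

```ruby
# Hypothetical helper: keep the original value where the mask is false,
# use `replacement` where the mask is true.
def replace_where(values, mask, replacement)
  values.zip(mask).map { |value, flag| flag ? replacement : value }
end

aaa = [4, 5, 6, 7]
bbb = [10, 20, 30, 40]

mask = aaa.map { |a| a >= 5 }          # boolean mask built from AAA
new_bbb = replace_where(bbb, mask, -1) # BBB with masked rows replaced
```

Here `new_bbb` is `[10, -1, -1, -1]`, which matches the `BBB` column of the pandas result above.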
+## 57. From the Pandas cookbook - Splitting
+Split a frame with a boolean criterion
+https://pandas.pydata.org/docs/user_guide/cookbook.html#splitting
+```python
+# by Python Pandas
+df = pd.DataFrame(
+    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
+)
+df[df.AAA <= 5]
+# returns =>
+   AAA  BBB  CCC
+0    4   10  100
+1    5   20   50
+df[df.AAA > 5]
+# returns =>
+   AAA  BBB  CCC
+2    6   30  -30
+3    7   40  -50
+```
+```{ruby}
+#| tags: []
+# RedAmber
+df = DataFrame.new(
+  # You can omit outer {}
+  "AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]
+)
+df.slice(df[:AAA] <= 5)
+# df[df[:AAA] <= 5] # is also OK
+```
+```{ruby}
+#| tags: []
+df.remove(df[:AAA] <= 5)
+# df.slice(df[:AAA] > 5) # does the same thing
+```
+## 58. From the Pandas cookbook - Building criteria
+Select with multi-column criteria
+https://pandas.pydata.org/docs/user_guide/cookbook.html#building-criteria
+```python
+# by Python Pandas
+df = pd.DataFrame(
+    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
+)
+# and
+df.loc[(df["BBB"] < 25) & (df["CCC"] >= -40), "AAA"]
+# returns a series =>
+0    4
+1    5
+Name: AAA, dtype: int64
+# or
+df.loc[(df["BBB"] > 25) | (df["CCC"] >= -40), "AAA"]
+# returns a series =>
+0    4
+1    5
+2    6
+3    7
+Name: AAA, dtype: int64
+```
+```{ruby}
+#| tags: []
+# RedAmber
+df = DataFrame.new(
+  # You can omit {}
+  "AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]
+)
+df.slice( (df[:BBB] < 25) & (df[:CCC] >= -40) ).pick(:AAA)
+```
+```{ruby}
+#| tags: []
+df.slice( (df[:BBB] > 25) | (df[:CCC] >= -40) ).pick(:AAA)
+# df[ (df[:BBB] > 25) | (df[:CCC] >= -40) ][:AAA] # also OK
+```
+```python
+# by Python Pandas
+# or (with assignment)
+df.loc[(df["BBB"] > 25) | (df["CCC"] >= 75), "AAA"] = 0.1
+df
+# returns a dataframe =>
+   AAA  BBB  CCC
+0  0.1   10  100
+1  5.0   20   50
+2  0.1   30  -30
+3  0.1   40  -50
+```
+```{ruby}
+#| tags: []
+# df.assign(AAA: df[:AAA].replace((df[:BBB] > 25) | (df[:CCC] >= 75), 0.1)) # by one liner
+booleans = (df[:BBB] > 25) | (df[:CCC] >= 75)
+replaced = df[:AAA].replace(booleans, 0.1)
+df.assign(AAA: replaced)
+```
+```python
+# by Python Pandas
+# Select rows with data closest to certain value using argsort
+df = pd.DataFrame(
+    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
+)
+aValue = 43.0
+df.loc[(df.CCC - aValue).abs().argsort()]
+# returns a dataframe =>
+   AAA  BBB  CCC
+1    5   20   50
+0    4   10  100
+2    6   30  -30
+3    7   40  -50
+```
+```{ruby}
+#| tags: []
+a_value = 43
+df[(df[:CCC] - a_value).abs.sort_indexes]
+# df.slice (df[:CCC] - a_value).abs.sort_indexes # also OK
+```
+```python
+# by Python Pandas
+# Dynamically reduce a list of criteria using a binary operators
+df = pd.DataFrame(
+    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
+)
+Crit1 = df.AAA <= 5.5
+Crit2 = df.BBB == 10.0
+Crit3 = df.CCC > -40.0
+AllCrit = Crit1 & Crit2 & Crit3
+import functools
+CritList = [Crit1, Crit2, Crit3]
+AllCrit = functools.reduce(lambda x, y: x & y, CritList)
+df[AllCrit]
+# returns a dataframe =>
+   AAA  BBB  CCC
+0    4   10  100
+```
+```{ruby}
+#| tags: []
+crit1 = df[:AAA] <= 5.5
+crit2 = df[:BBB] == 10.0
+crit3 = df[:CCC] > -40.0
+df[crit1 & crit2 & crit3]
+```
+## 59. From the Pandas cookbook - Dataframes
+https://pandas.pydata.org/docs/user_guide/cookbook.html#dataframes
+```python
+# by Python Pandas
+# Using both row labels and value conditionals
+df = pd.DataFrame(
+    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]}
+)
+df[(df.AAA <= 6) & (df.index.isin([0, 2, 4]))]
+# returns =>
+   AAA  BBB  CCC
+0    4   10  100
+2    6   30  -30
+```
+```{ruby}
+#| tags: []
+# RedAmber
+df = DataFrame.new(
+  "AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]
+)
+df[(df[:AAA] <= 6) & df.indices.map { |i| [0, 2, 4].include? i }]
+```
+```python
+# by Python Pandas
+# Use loc for label-oriented slicing and iloc positional slicing GH2904
+df = pd.DataFrame(
+    {"AAA": [4, 5, 6, 7], "BBB": [10, 20, 30, 40], "CCC": [100, 50, -30, -50]},
+    index=["foo", "bar", "boo", "kar"],
+)
+# There are 2 explicit slicing methods, with a third general case
+# 1. Positional-oriented (Python slicing style : exclusive of end)
+# 2. Label-oriented (Non-Python slicing style : inclusive of end)
+# 3. General (Either slicing style : depends on if the slice contains labels or positions)
+df.loc["bar":"kar"]  # Label
+# returns =>
+     AAA  BBB  CCC
+bar    5   20   50
+boo    6   30  -30
+kar    7   40  -50
+# Generic
+df[0:3]
+# returns =>
+     AAA  BBB  CCC
+foo    4   10  100
+bar    5   20   50
+boo    6   30  -30
+df["bar":"kar"]
+# returns =>
+     AAA  BBB  CCC
+bar    5   20   50
+boo    6   30  -30
+kar    7   40  -50
+```
+```{ruby}
+#| tags: []
+# RedAmber does not have a row index. Use a new column as the index.
+labeled = df.assign_left(index: %w[foo bar boo kar])
+# labeled = df.assign(index: %w[foo bar boo kar]).pick { [keys[-1], keys[0...-1]] } # until v0.1.8
+```
+```{ruby}
+#| tags: []
+labeled[1..3]
+```
+```{ruby}
+#| tags: []
+labeled.slice do
+  v = v(:index)
+  v.index("bar")..v.index("kar")
+end
+```
+`slice_by` returns the same result as above.
+(Since 0.2.1)
+```{ruby}
+#| tags: []
+labeled.slice_by(:index, keep_key: true) { "bar".."kar"}
+```
+```python
+# by Python Pandas
+# Ambiguity arises when an index consists of integers with a non-zero start or non-unit increment.
+df2 = pd.DataFrame(data=data, index=[1, 2, 3, 4])  # Note index starts at 1.
+df2.iloc[1:3]  # Position-oriented
+# returns =>
+   AAA  BBB  CCC
+2    5   20   50
+3    6   30  -30
+df2.loc[1:3]  # Label-oriented
+# returns =>
+   AAA  BBB  CCC
+1    4   10  100
+2    5   20   50
+3    6   30  -30
+```
+```{ruby}
+#| tags: []
+# RedAmber only has an implicit integer index 0...size,
+# so no ambiguity arises unless you create a new column and use it as an index :-).
+```
+```python
+# by Python Pandas
+# Using inverse operator (~) to take the complement of a mask
+df[~((df.AAA <= 6) & (df.index.isin([0, 2, 4])))]
+# returns =>
+   AAA  BBB  CCC
+1    5   20   50
+3    7   40  -50
+```
+```{ruby}
+#| tags: []
+# RedAmber offers #! method for boolean Vector.
+df[!((df[:AAA] <= 6) & df.indices.map { |i| [0, 2, 4].include? i })]
+# or
+# df[((df[:AAA] <= 6) & df.indices.map { |i| [0, 2, 4].include? i }).invert]
+```
+If you have `nil` in your data, consider `#primitive_invert` for a consistent result. See example #26.
+## 60. From the Pandas cookbook - New columns
+https://pandas.pydata.org/docs/user_guide/cookbook.html#new-columns
+```python
+# by Python Pandas
+# Efficiently and dynamically creating new columns using applymap
+df = pd.DataFrame({"AAA": [1, 2, 1, 3], "BBB": [1, 1, 2, 2], "CCC": [2, 1, 3, 1]})
+df
+# returns =>
+   AAA  BBB  CCC
+0    1    1    2
+1    2    1    1
+2    1    2    3
+3    3    2    1
+source_cols = df.columns  # Or some subset would work too
+new_cols = [str(x) + "_cat" for x in source_cols]
+categories = {1: "Alpha", 2: "Beta", 3: "Charlie"}
+df[new_cols] = df[source_cols].applymap(categories.get)
+df
+# returns =>
+   AAA  BBB  CCC  AAA_cat BBB_cat  CCC_cat
+0    1    1    2    Alpha   Alpha     Beta
+1    2    1    1     Beta   Alpha    Alpha
+2    1    2    3    Alpha    Beta  Charlie
+3    3    2    1  Charlie    Beta    Alpha
+```
+```{ruby}
+#| tags: []
+# RedAmber
+df = DataFrame.new({"AAA": [1, 2, 1, 3], "BBB": [1, 1, 2, 2], "CCC": [2, 1, 3, 1]})
+```
+```{ruby}
+#| tags: []
+categories = {1 => "Alpha", 2 => "Beta", 3 => "Charlie"}
+# Creating a Hash from keys
+df.assign do
+  keys.each_with_object({}) do |key, h|
+    h["#{key}_cat"] = v(key).to_a.map { |x| categories[x] }
+  end
+end
+# Creating an Array from vectors, from v0.2.0
+df.assign do
+  vectors.map do |v|
+    ["#{v.key}_cat", v.to_a.map { |x| categories[x] } ]
+  end
+end
+```
+```python
+# by Python Pandas
+# Keep other columns when using min() with groupby
+df = pd.DataFrame(
+    {"AAA": [1, 1, 1, 2, 2, 2, 3, 3], "BBB": [2, 1, 3, 4, 5, 1, 2, 3]}
+)
+df
+# returns =>
+   AAA  BBB
+0    1    2
+1    1    1
+2    1    3
+3    2    4
+4    2    5
+5    2    1
+6    3    2
+7    3    3
+# Method 1 : idxmin() to get the index of the minimums
+df.loc[df.groupby("AAA")["BBB"].idxmin()]
+# returns =>
+   AAA  BBB
+1    1    1
+5    2    1
+6    3    2
+# Method 2 : sort then take first of each
+df.sort_values(by="BBB").groupby("AAA", as_index=False).first()
+# returns =>
+   AAA  BBB
+0    1    1
+1    2    1
+2    3    2
+# Notice the same results, with the exception of the index.
+```
+```{ruby}
+#| tags: []
+# RedAmber
+df = DataFrame.new(AAA: [1, 1, 1, 2, 2, 2, 3, 3], BBB: [2, 1, 3, 4, 5, 1, 2, 3])
+```
+```{ruby}
+#| tags: []
+df.group(:AAA).min
+# Add `.rename { [keys[-1], :BBB] }` if you want.
+```
+## 61. Summary/describe
+```{ruby}
+#| tags: []
+penguins.summary
+# or
+penguins.describe
+```
+If you need the variables in rows, use `transpose`. (Since 0.2.0)
+```{ruby}
+#| tags: []
+penguins.summary.transpose(name: :stats)
+```
+## 62. Quantile/Quantiles
+`Vector#quantile(prob)` returns quantile at probability `prob`.
+(Since 0.2.0)
+```{ruby}
+#| tags: []
+penguins[:bill_depth_mm].quantile # default is prob = 0.5
+```
+`Vector#quantiles` accepts an Array for multiple quantiles. Returns a DataFrame.
+```{ruby}
+#| tags: []
+penguins[:bill_depth_mm].quantiles([0.05, 0.95])
+```
+## 63. Transpose
+`DataFrame#transpose` creates a transposed DataFrame from a wide DataFrame.
+(Since 0.2.0)
+```{ruby}
+#| tags: []
+uri = URI("https://raw.githubusercontent.com/heronshoes/red_amber/master/test/entity/import_cars.tsv")
+import_cars = RedAmber::DataFrame.load(uri)
+```
+```{ruby}
+#| tags: []
+import_cars.transpose
+```
+The default name of the created column is `:NAME`.
+We can name the column that holds the original keys with the option `name:`.
+```{ruby}
+#| tags: []
+import_cars.transpose(key: :Year, name: :Manufacturer)
+```
+You can specify the index column with the option `key:` even if it is in the middle of the original DataFrame.
+```{ruby}
+#| tags: []
+# locate `:Year` in the middle
+df = import_cars.pick(1..2, 0, 3..)
+```
+```{ruby}
+#| tags: []
+df.transpose(key: :Year)
+```
+## 64. To_long
+`DataFrame#to_long(*keep_keys)` reshapes a wide DataFrame into a long DataFrame.
+- Parameter `keep_keys` specifies the key names to keep.
+(Since 0.2.0)
+```{ruby}
+#| tags: []
+uri = URI("https://raw.githubusercontent.com/heronshoes/red_amber/master/test/entity/import_cars.tsv")
+import_cars = RedAmber::DataFrame.load(uri)
+```
+```{ruby}
+#| tags: []
+import_cars.to_long(:Year)
+```
+- Option `:name` specifies the key of the column that is created **from key names**. Default is `:NAME`.
+- Option `:value` specifies the key of the column that is created **from values**. Default is `:VALUE`.
+```{ruby}
+#| tags: []
+import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
+```
+## 65. To_wide
+`DataFrame#to_wide(*keep_keys)` reshapes a long DataFrame into a wide DataFrame.
+- Option `:name` specifies the key of the column that will be expanded **to key names**. Default is `:NAME`.
+- Option `:value` specifies the key of the column that will be expanded **to values**. Default is `:VALUE`.
+(Since 0.2.0)
+```{ruby}
+#| tags: []
+import_cars.to_long(:Year).to_wide
+```
+```{ruby}
+#| tags: []
+import_cars.to_long(:Year).to_wide(name: :NAME, value: :VALUE)
+# is also OK
+```
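The relationship between the wide and long shapes can be sketched with plain Ruby row hashes. Everything here is illustrative: `to_long` and `to_wide` below are hypothetical stand-alone functions that only mimic the default `:NAME` / `:VALUE` naming with a single keep key, and the small table is made-up sample data, not `import_cars`.

```ruby
# Hypothetical sketch: melt every column except `keep_key` into
# (name, value) pairs, one output row per original cell.
def to_long(rows, keep_key, name: :NAME, value: :VALUE)
  rows.flat_map do |row|
    row.reject { |key, _| key == keep_key }.map do |key, cell|
      { keep_key => row[keep_key], name => key.to_s, value => cell }
    end
  end
end

# Hypothetical sketch: the inverse operation, spreading the name column
# back into separate keys.
def to_wide(rows, keep_key, name: :NAME, value: :VALUE)
  rows.group_by { |row| row[keep_key] }.map do |kept, members|
    members.each_with_object({ keep_key => kept }) do |row, wide|
      wide[row[name].to_sym] = row[value]
    end
  end
end

# Made-up sample data for illustration only.
wide = [{ Year: 2020, A: 1, B: 2 }, { Year: 2021, A: 3, B: 4 }]
long = to_long(wide, :Year)
round_trip = to_wide(long, :Year)
```

In this sketch `long` has one row per original cell (four rows), and `round_trip` equals `wide`, which mirrors `import_cars.to_long(:Year).to_wide` returning the original shape.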
+## 66. Custom index
+Another example of `indices` is [14. Indices](#14.-Indices).
+We can set the starting value of the indices with an argument.
+(Since 0.2.1)
+```{ruby}
+#| tags: []
+df = DataFrame.new(x: [0, 1, 2, 3, 4])
+df.indices
+```
+```{ruby}
+#| tags: []
+df.indices(1)
+```
+The first value can be any object that responds to `#succ`.
+```{ruby}
+#| tags: []
+df.indices("a")
+```
+## 67. Method missing
+`RedAmber::DataFrame` implements `#method_missing` so that key names can be called as methods.
+This feature is limited to keys that are valid method names (`:key` is OK; keys such as `:Key`, `:"key.1"`, or `:"1key"` are not allowed). Still, it is convenient in many cases.
+(Since 0.2.1)
+```{ruby}
+#| tags: []
+df = DataFrame.new(x: [1, 2, 3])
+df.x.sum
+```
+```{ruby}
+#| tags: []
+# Some ways to pull a Vector
+df[:x] # Formal style
+df.v(:x) # #v method
+df.x # method
+```
+```{ruby}
+#| tags: []
+df.x.sum
+```
+## 68. Assign revised
+Other examples of `assign` are [#34. Assign](#34.-Assign) and [#69. Variations of assign](#69.-Variations-of-assign).
+```{ruby}
+#| tags: []
+df = DataFrame.new(x: [1, 2, 3])
+# Assign by a Hash
+df.assign(y: df.x / 10.0)
+```
+```{ruby}
+#| tags: []
+# Assign by separated key and value
+df.assign(:y) { x / 10.0 }
+```
+```{ruby}
+#| tags: []
+# Separated keys and values
+df.assign(:y, :z) { [x * 10, x / 10.0] }
+```
+## 69. Variations of assign
+Other examples of `assign` are [#34. Assign](#34.-Assign) and [#68. Assign revised](#68.-Assign-revised).
+```{ruby}
+#| tags: []
+df = DataFrame.new(x: [1, 2, 3])
+```
+```{ruby}
+#| tags: []
+# Hash args
+df.assign(y: df[:x] * 10, z: df[:x] / 10.0)
+# Hash
+hash = {y: df[:x] * 10, z: df[:x] / 10.0}
+df.assign(hash)
+# Array
+array = [[:y, df[:x] * 10], [:z, df[:x] / 10.0]]
+df.assign(array)
+# Array
+df.assign [
+  [:y, df[:x] * 10],
+  [:z, df[:x] / 10.0]
+]
+# Hash
+df.assign({
+  y: df[:x] * 10,
+  z: df[:x] / 10.0
+})
+# Block, Hash
+df.assign { {y: df[:x] * 10, z: df[:x] / 10.0} }
+# Block, Array
+df.assign { [[:y, df[:x] * 10], [:z, df[:x] / 10.0]] }
+# Block, Array, method
+#df.assign { [[:y, x * 10], [:z, x / 10.0]] }
+# Separated
+#df.assign(:y, :z) { [x * 10, x / 10.0] }
+```
+## 70. Row index label by slice_by
+Another example of `slice` is [#28. Slice](#28.-Slice).
+(Since 0.2.1)
+```{ruby}
+#| tags: []
+df = DataFrame.new(num: [1.1, 2.2, 3.3, 4.4, 5.5])
+              .assign_left(:label) { indices("a") }
+```
+`slice_by(key) { row_selector }` selects rows in column `key` with `row_selector`.
+```{ruby}
+#| tags: []
+df.slice_by(:label) { "b".."d" }
+```
+```{ruby}
+#| tags: []
+df.slice_by(:label) { ["c", "b", "e"] }
+```
+If the option `keep_key:` is set to `true`, the index label column is preserved.
+```{ruby}
+#| tags: []
+df.slice_by(:label, keep_key: true) { "b".."d" }
+```
+## 71. Simpson's paradox in COVID-19 data
+https://www.rdocumentation.org/packages/openintro/versions/2.3.0/topics/simpsons_paradox_covid
+```{ruby}
+#| tags: []
+require 'datasets-arrow'
+ds = Datasets::Rdatasets.new('openintro', 'simpsons_paradox_covid')
+df = RedAmber::DataFrame.new(ds.to_arrow)
+```
+Create group and count by vaccine status and outcome.
+```{ruby}
+#| tags: []
+count = df.group(:vaccine_status, :outcome).count
+```
+Reshape into a human-readable wide table.
+```{ruby}
+#| tags: []
+all_count = count.to_wide(name: :vaccine_status, value: :count)
+```
+Compute the ratio of each outcome (death or survived) for each vaccine status.
+```{ruby}
+#| tags: []
+all_count.assign do
+  {
+    "vaccinated_%": 100.0 * vaccinated / vaccinated.sum,
+    "unvaccinated_%": 100.0 * unvaccinated / unvaccinated.sum
+  }
+end
+```
+The death ratio for the vaccinated is higher than for the unvaccinated. Is that true?
+Next, do the same computation for each age group. We temporarily define a helper method for that.
+```{ruby}
+#| tags: []
+def make_covid_table(df)
+  df.group(:vaccine_status, :outcome)
+    .count
+    .to_wide(name: :vaccine_status, value: :count)
+    .assign do
+      {
+        "vaccinated_%": (100.0 * vaccinated / vaccinated.sum).round(n_digits: 3),
+        "unvaccinated_%": (100.0 * unvaccinated / unvaccinated.sum).round(n_digits: 3)
+      }
+    end
+end
+```
+```{ruby}
+#| tags: []
+# under 50
+make_covid_table(df[df[:age_group] == "under 50"])
+```
+```{ruby}
+#| tags: []
+# 50 +
+make_covid_table(df[df[:age_group] == "50 +"])
+```
+Within each age group, the death ratio for the vaccinated is lower than for the unvaccinated. This is an example of "Simpson's paradox".
+```{ruby}
+#| tags: []
+# Vaccine status vs age
+# 50+ is highly vaccinated.
+df.group(:vaccine_status, :age_group).count.to_wide(name: :age_group, value: :count)
+```
+```{ruby}
+#| tags: []
+# Outcome vs age
+# 50+ also has higher death rate.
+df.group(:outcome, :age_group).count.to_wide(name: :age_group, value: :count)
+```
+## 72. Clean up dirty data
+```{ruby}
+#| tags: []
+require 'tempfile' # Tempfile is in the standard library but is not loaded by default
+file = Tempfile.open(['dirty_data', '.csv']) do |f|
+  f.puts(<<~CSV)
+    height,weight
+    154.9,52.2
+    156.8cm,51.1kg
+    152,49
+    148.5cm,45.4kg
+    155cm,
+    ,49.9kg
+    1.58m,49.8kg
+    166.8cm,53.6kg
+  CSV
+  f
+end
+df = DataFrame.load(file)
+```
+The columns were loaded as String Vectors.
+```{ruby}
+#| tags: []
+df.schema
+```
+Start with the `:weight` column. Replacing "" with NaN causes a cast to Float.
+```{ruby}
+#| tags: []
+df.assign do
+  {
+    weight: weight.replace(weight == "", Float::NAN)
+  }
+end
+```
+Apply the same conversion to `:height`, followed by a unit conversion using `if_else`.
+```{ruby}
+#| tags: []
+df = df.assign do
+  {
+    weight: weight.replace(weight == '', Float::NAN),
+    height: height.replace(height == '', Float::NAN)
+                  .then { |h| (h < 10).if_else(h * 100, h) }
+  }
+end
+puts df.schema
+df
+```
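The rule behind the `if_else` step (values below 10 are assumed to be meters and are converted to centimeters) can be sketched on plain Ruby strings. `height_in_cm` is a hypothetical helper, not part of RedAmber. It parses the leading number itself instead of relying on a cast, so it is only an approximation of what the cells above do.

```ruby
# Hypothetical helper: parse a raw height cell into centimeters.
# Empty cells become NaN; values below 10 are assumed to be in meters.
def height_in_cm(text)
  return Float::NAN if text.nil? || text.strip.empty?

  number = text[/\d+(?:\.\d+)?/].to_f # first number in the cell, unit dropped
  number < 10 ? number * 100 : number
end

raw_heights = ["154.9", "156.8cm", "152", "", "1.58m"]
heights = raw_heights.map { |cell| height_in_cm(cell) }
```

The threshold of 10 is the same heuristic used in the `if_else` call above: no plausible height in centimeters is below 10, and no plausible height in meters is above it.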
+Now that the data is clean, compute BMI as a new column.
+```{ruby}
+#| tags: []
+df.assign(:BMI) { (weight / height ** 2 * 10000).round(n_digits: 1) }
+```
+## 73. From the Pandas cookbook - Multiindexing
+(Updated on v0.3.0)
+https://pandas.pydata.org/docs/user_guide/cookbook.html#multiindexing
+```python
+# by Python Pandas
+# Creating a MultiIndex from a labeled frame
+df = pd.DataFrame(
+    {
+        "row": [0, 1, 2],
+        "One_X": [1.1, 1.1, 1.1],
+        "One_Y": [1.2, 1.2, 1.2],
+        "Two_X": [1.11, 1.11, 1.11],
+        "Two_Y": [1.22, 1.22, 1.22],
+    }
+)
+df
+# =>
+   row  One_X  One_Y  Two_X  Two_Y
+0    0    1.1    1.2   1.11   1.22
+1    1    1.1    1.2   1.11   1.22
+2    2    1.1    1.2   1.11   1.22
+# As Labelled Index
+df = df.set_index("row")
+df
+# =>
+     One_X  One_Y  Two_X  Two_Y
+row
+0      1.1    1.2   1.11   1.22
+1      1.1    1.2   1.11   1.22
+2      1.1    1.2   1.11   1.22
+# With Hierarchical Columns
+df.columns = pd.MultiIndex.from_tuples([tuple(c.split("_")) for c in df.columns])
+df
+# =>
+     One        Two
+       X    Y     X     Y
+row
+0    1.1  1.2  1.11  1.22
+1    1.1  1.2  1.11  1.22
+2    1.1  1.2  1.11  1.22
+# Now stack & Reset
+df = df.stack(0).reset_index(1)
+df
+# =>
+    level_1     X     Y
+row
+0       One  1.10  1.20
+0       Two  1.11  1.22
+1       One  1.10  1.20
+1       Two  1.11  1.22
+2       One  1.10  1.20
+2       Two  1.11  1.22
+# And fix the labels (Notice the label 'level_1' got added automatically)
+df.columns = ["Sample", "All_X", "All_Y"]
+df
+# =>
+    Sample  All_X  All_Y
+row
+0      One   1.10   1.20
+0      Two   1.11   1.22
+1      One   1.10   1.20
+1      Two   1.11   1.22
+2      One   1.10   1.20
+2      Two   1.11   1.22
+```
+(Until 0.2.3)
+This example predates the introduction of `Vector#split_*`. See [88. Vector#split_to_columns](#88.-Vector#split_to_columns).
+```{ruby}
+#| tags: []
+df = RedAmber::DataFrame.new(
+        "row": [0, 1, 2],
+        "One_X": [1.1, 1.1, 1.1],
+        "One_Y": [1.2, 1.2, 1.2],
+        "Two_X": [1.11, 1.11, 1.11],
+        "Two_Y": [1.22, 1.22, 1.22],
+)
+```
+```{ruby}
+#| tags: []
+df_x = df.pick(:row, :One_X, :Two_X)
+  .to_long(:row, name: :Sample, value: :All_X)
+```
+```{ruby}
+#| tags: []
+df_y = df.pick(:row, :One_Y, :Two_Y)
+  .to_long(:row, name: :Sample, value: :All_Y)
+```
+```{ruby}
+#| tags: []
+df_x.pick(:row)
+ .assign [
+   [:Sample, df_x[:Sample].each.map { |x| x.split("_").first }],
+   [:All_X, df_x[:All_X]],
+   [:All_Y, df_y[:All_Y]]
+ ]
+```
+(Since 0.3.0)
+This example will use `Vector#split_to_columns`.
+```{ruby}
+#| tags: []
+df = RedAmber::DataFrame.new(
+        "row": [0, 1, 2],
+        "One_X": [1.1, 1.1, 1.1],
+        "One_Y": [1.2, 1.2, 1.2],
+        "Two_X": [1.11, 1.11, 1.11],
+        "Two_Y": [1.22, 1.22, 1.22],
+)
+```
+```{ruby}
+#| tags: []
+df.to_long(:row)
+```
+`Vector#split_to_columns` returns two split Vectors.
+```{ruby}
+#| tags: []
+df.to_long(:row, name: :Sample)
+  .assign(:Sample, :xy) { v(:Sample).split_to_columns('_') }
+```
+```{ruby}
+#| tags: []
+df.to_long(:row, name: :Sample)
+  .assign(:Sample, :xy) { v(:Sample).split_to_columns('_') }
+  .to_wide(name: :xy, value: :VALUE)
+```
+## 74. From the Pandas cookbook - Arithmetic
+https://pandas.pydata.org/docs/user_guide/cookbook.html#arithmetic
+```python
+# by Python Pandas
+cols = pd.MultiIndex.from_tuples(
+    [(x, y) for x in ["A", "B", "C"] for y in ["O", "I"]]
+)
+df = pd.DataFrame(np.random.randn(2, 6), index=["n", "m"], columns=cols)
+df
+# =>
+          A                   B                   C
+          O         I         O         I         O         I
+n  0.469112 -0.282863 -1.509059 -1.135632  1.212112 -0.173215
+m  0.119209 -1.044236 -0.861849 -2.104569 -0.494929  1.071804
+df = df.div(df["C"], level=1)
+df
+# =>
+          A                   B              C
+          O         I         O         I    O    I
+n  0.387021  1.633022 -1.244983  6.556214  1.0  1.0
+m -0.240860 -0.974279  1.741358 -1.963577  1.0  1.0
+```
+This is a tentative example. It may be refined by an upcoming feature that makes multiple-key headers easier to handle.
+```{ruby}
+#| tags: []
+require "arrow-numo-narray"
+values = Numo::DFloat.new(6, 2).rand_norm
+```
+For consistency with the pandas result, we will use the same data as pandas.
+```{ruby}
+#| tags: []
+values = [
+  [0.469112, -0.282863, -1.509059, -1.135632,  1.212112, -0.173215],
+  [0.119209, -1.044236, -0.861849, -2.104569, -0.494929,  1.071804]
+].transpose
+```
+```{ruby}
+#| tags: []
+keys = %w[A B C].product(%w[O I]).map(&:join)
+```
+```{ruby}
+#| tags: []
+df = RedAmber::DataFrame.new(index: %w[n m])
+                        .assign(*keys) { values }
+```
+```{ruby}
+#| tags: []
+df.assign do
+  assigner = {}
+  %w[A B C].each do |abc|
+    %w[O I].each do |oi|
+      key = "#{abc}#{oi}".to_sym
+      assigner[key] = v(key) / v("C#{oi}".to_sym)
+    end
+  end
+  assigner
+end
+```
+```{ruby}
+#| tags: []
+coords = [["AA", "one"], ["AA", "six"], ["BB", "one"], ["BB", "two"], ["BB", "six"]].transpose
+df = RedAmber::DataFrame.new(MyData: [11, 22, 33, 44, 55])
+                        .assign_left(:label1, :label2) { coords }
+```
+## 75. From the Pandas cookbook - Slicing
+https://pandas.pydata.org/docs/user_guide/cookbook.html#slicing
+```python
+# by Python Pandas
+coords = [("AA", "one"), ("AA", "six"), ("BB", "one"), ("BB", "two"), ("BB", "six")]
+index = pd.MultiIndex.from_tuples(coords)
+df = pd.DataFrame([11, 22, 33, 44, 55], index, ["MyData"])
+df
+# =>
+        MyData
+AA one      11
+   six      22
+BB one      33
+   two      44
+   six      55
+```
+To take the cross section of the 1st level and 1st axis the index:
+```python
+# by Python Pandas
+# Note : level and axis are optional, and default to zero
+df.xs("BB", level=0, axis=0)
+# =>
+     MyData
+one      33
+two      44
+six      55
+```
+```{ruby}
+#| tags: []
+df.slice { label1 == "BB" }.drop(:label1)
+```
+…and now the 2nd level of the 1st axis.
+```python
+# by Python Pandas
+df.xs("six", level=1, axis=0)
+# =>
+    MyData
+AA      22
+BB      55
+```
+```{ruby}
+#| tags: []
+df.slice { label2 == "six" }.drop(:label2)
+```
+```python
+# by Python Pandas
+import itertools
+index = list(itertools.product(["Ada", "Quinn", "Violet"], ["Comp", "Math", "Sci"]))
+headr = list(itertools.product(["Exams", "Labs"], ["I", "II"]))
+indx = pd.MultiIndex.from_tuples(index, names=["Student", "Course"])
+cols = pd.MultiIndex.from_tuples(headr)  # Notice these are un-named
+data = [[70 + x + y + (x * y) % 3 for x in range(4)] for y in range(9)]
+df = pd.DataFrame(data, indx, cols)
+df
+# =>
+               Exams     Labs
+                   I  II    I  II
+Student Course
+Ada     Comp      70  71   72  73
+        Math      71  73   75  74
+        Sci       72  75   75  75
+Quinn   Comp      73  74   75  76
+        Math      74  76   78  77
+        Sci       75  78   78  78
+Violet  Comp      76  77   78  79
+        Math      77  79   81  80
+        Sci       78  81   81  81
+```
+```{ruby}
+#| tags: []
+indexes = %w[Ada Quinn Violet].product(%w[Comp Math Sci]).transpose
+df = RedAmber::DataFrame.new(%w[Student Course].zip(indexes))
+                        .assign do
+                          assigner = {}
+                          keys = %w[Exams Labs].product(%w[I II]).map { |a| a.join("/") }
+                          keys.each.with_index do |key, x|
+                            assigner[key] = (0...9).map { |y| 70 + x + y + (x * y) % 3 }
+                          end
+                          assigner
+                        end
+```
+```python
+# by Python Pandas
+All = slice(None)
+df.loc["Violet"]
+# =>
+       Exams     Labs
+           I  II    I  II
+Course
+Comp      76  77   78  79
+Math      77  79   81  80
+Sci       78  81   81  81
+```
+```{ruby}
+#| tags: []
+df.slice(df[:Student] == "Violet").drop(:Student)
+```
+```python
+# by Python Pandas
+df.loc[(All, "Math"), All]
+# =>
+               Exams     Labs
+                   I  II    I  II
+Student Course
+Ada     Math      71  73   75  74
+Quinn   Math      74  76   78  77
+Violet  Math      77  79   81  80
+```
+```{ruby}
+#| tags: []
+df.slice(df[:Course] == "Math")
+```
+```python
+# by Python Pandas
+df.loc[(slice("Ada", "Quinn"), "Math"), All]
+# =>
+               Exams     Labs
+                   I  II    I  II
+Student Course
+Ada     Math      71  73   75  74
+Quinn   Math      74  76   78  77
+```
+```{ruby}
+#| tags: []
+df.slice(df[:Course] == "Math")
+  .slice { (v(:Student) == "Ada") | (v(:Student) == "Quinn") }
+```
+```python
+# by Python Pandas
+df.loc[(All, "Math"), ("Exams")]
+# =>
+                 I  II
+Student Course
+Ada     Math    71  73
+Quinn   Math    74  76
+Violet  Math    77  79
+```
+```{ruby}
+#| tags: []
+df.slice(df[:Course] == "Math")
+  .pick {
+    [:Student, :Course].concat keys.select { |key| key.to_s.start_with?("Exams") }
+  }
+```
+```python
+# by Python Pandas
+df.loc[(All, "Math"), (All, "II")]
+# =>
+               Exams Labs
+                  II   II
+Student Course
+Ada     Math      73   74
+Quinn   Math      76   77
+Violet  Math      79   80
+```
+```{ruby}
+#| tags: []
+df.slice(df[:Course] == "Math")
+  .pick {
+    [:Student, :Course].concat keys.select { |key| key.to_s.end_with?("II") }
+  }
+```
+## 76. Vector#map
+`Vector#map` accepts a block and returns the results yielded from the block as a Vector.
+```{ruby}
+#| tags: []
+v = Vector.new(1, 2, 3, 4)
+v.map { |x| x / 100.0 }
+```
+If no block is given, it returns an Enumerator.
+```{ruby}
+#| tags: []
+v.map
+```
+If you need Ruby's `map` (which returns an Array) from a Vector, try `.each.map`.
+```{ruby}
+#| tags: []
+v.each.map { |x| x / 100.0 }
+```
+`#collect` is an alias for `#map`.
+Similar methods are `Vector#filter` and `Vector#select`.
+## 77. Introduce columns from numo/narray
+(Until 0.2.2 with Arrow 9.0.0) We couldn't construct a DataFrame directly from Numo::NArray, but the following trick makes it possible.
+```{ruby}
+#| tags: []
+DataFrame.new(index: Array(1..10))
+  .assign do
+    {
+      x0: Numo::DFloat.new(size).rand_norm(0, 2),
+      x1: Numo::DFloat.new(size).rand_norm(5, 2),
+      x2: Numo::DFloat.new(size).rand_norm(10, 2),
+      y0: Numo::DFloat.new(size).rand_norm(100, 10),
+      y1: Numo::DFloat.new(size).rand_norm(200, 10),
+      y2: Numo::DFloat.new(size).rand_norm(300, 10)
+    }
+  end
+```
+If you do not need the index column, try this.
+```{ruby}
+#| tags: []
+DataFrame.new(_: Array(1..10))
+  .assign do
+    {
+      x0: Numo::DFloat.new(size).rand_norm(0, 2),
+      x1: Numo::DFloat.new(size).rand_norm(5, 2),
+      x2: Numo::DFloat.new(size).rand_norm(10, 2),
+      y0: Numo::DFloat.new(size).rand_norm(100, 10),
+      y1: Numo::DFloat.new(size).rand_norm(200, 10),
+      y2: Numo::DFloat.new(size).rand_norm(300, 10)
+    }
+  end
+  .drop(:_)
+```
+(New from 0.2.3 with Arrow 10.0.0) Since 0.2.3, it is possible to initialize a DataFrame with objects that respond to `to_arrow`. Arrays in Numo::NArray respond to `to_arrow` with the `red-arrow-numo-narray` gem. This feature was proposed by the Red Data Tools member @kojix2 and implemented by @kou in Arrow 10.0.0 and Red Arrow Numo::NArray 0.0.6. Thanks!
+```{ruby}
+#| tags: []
+require 'arrow-numo-narray'
+size = 10
+DataFrame.new(
+  x0: Numo::DFloat.new(size).rand_norm(0, 2),
+  x1: Numo::DFloat.new(size).rand_norm(5, 2),
+  x2: Numo::DFloat.new(size).rand_norm(10, 2),
+  y0: Numo::DFloat.new(size).rand_norm(100, 10),
+  y1: Numo::DFloat.new(size).rand_norm(200, 10),
+  y2: Numo::DFloat.new(size).rand_norm(300, 10)
+)
+```
+## 78. Join (mutating joins)
+(Since 0.2.3)
+```{ruby}
+#| tags: []
+df = DataFrame.new(
+  KEY: %w[A B C],
+  X1: [1, 2, 3]
+)
+```
+```{ruby}
+#| tags: []
+other = DataFrame.new(
+  KEY: %w[A B D],
+  X2: [true, false, nil]
+)
+```
+Inner join will join data leaving only the matching records.
+```{ruby}
+#| tags: []
+df.inner_join(other, :KEY)
+```
+If we omit the join keys, the common keys are chosen automatically (natural keys).
+```{ruby}
+#| tags: []
+df.inner_join(other)
+```
+Full join will join data leaving all records.
+```{ruby}
+#| tags: []
+df.full_join(other)
+```
+Left join will join matching values to self from other (type: left_outer).
+```{ruby}
+#| tags: []
+df.left_join(other)
+```
+Right join will join matching values from self to other (type: right_outer).
+```{ruby}
+#| tags: []
+df.right_join(other)
+```
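To make the join types concrete, the following is a minimal plain-Ruby sketch of what the four mutating joins select. It is not RedAmber code: `sketch_join`, `LEFT_ROWS` and `RIGHT_ROWS` are hypothetical names used only for illustration, rows are modelled as hashes, and join keys are assumed to be unique on each side.

```ruby
# Plain-Ruby model of mutating joins on a single key (illustration only,
# not RedAmber's implementation). Keys are assumed unique within each side.
LEFT_ROWS  = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }].freeze
RIGHT_ROWS = [{ KEY: 'A', X2: true }, { KEY: 'B', X2: false }, { KEY: 'D', X2: nil }].freeze

def sketch_join(left, right, key, type)
  l_index = left.to_h { |row| [row[key], row] }
  r_index = right.to_h { |row| [row[key], row] }
  keys =
    case type
    when :inner       then l_index.keys & r_index.keys
    when :full_outer  then l_index.keys | r_index.keys
    when :left_outer  then l_index.keys
    when :right_outer then r_index.keys
    else raise ArgumentError, "unknown join type: #{type}"
    end
  # Columns that are missing on one side are filled with nil.
  columns = (left.first.keys | right.first.keys) - [key]
  keys.map do |k|
    merged = (l_index[k] || {}).merge(r_index[k] || {})
    { key => k }.merge(columns.to_h { |c| [c, merged[c]] })
  end
end

p sketch_join(LEFT_ROWS, RIGHT_ROWS, :KEY, :inner)
p sketch_join(LEFT_ROWS, RIGHT_ROWS, :KEY, :full_outer)
```

In this model, columns that have no matching record on one side come back as `nil`, which is how unmatched records appear in an outer join.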
+## 79. Join (filtering joins)
+(Since 0.2.3)
+Semi join will return records of self that have a match in other.
+```{ruby}
+#| tags: []
+df.semi_join(other)
+```
+Anti join will return records of self that do not have a match in other.
+```{ruby}
+#| tags: []
+df.anti_join(other)
+```
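A filtering join only keeps or drops records of self; it never adds columns from other. The plain-Ruby sketch below models that idea. `sketch_semi_join`, `sketch_anti_join` and the constants are hypothetical names for illustration, not RedAmber API.

```ruby
# Plain-Ruby model of filtering joins (illustration only).
SELF_ROWS  = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }].freeze
OTHER_KEYS = %w[A B D].freeze

# Keep the rows whose key has a match on the other side.
def sketch_semi_join(rows, other_keys, key)
  rows.select { |row| other_keys.include?(row[key]) }
end

# Keep the rows whose key has no match on the other side.
def sketch_anti_join(rows, other_keys, key)
  rows.reject { |row| other_keys.include?(row[key]) }
end

p sketch_semi_join(SELF_ROWS, OTHER_KEYS, :KEY)
p sketch_anti_join(SELF_ROWS, OTHER_KEYS, :KEY)
```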
+## 80. Partial joins
+(Since 0.2.3)
+```{ruby}
+#| tags: []
+df2 = DataFrame.new(
+  KEY1: %w[A B C],
+  KEY2: %w[s t u],
+  X: [1, 2, 3]
+)
+```
+```{ruby}
+#| tags: []
+other2 = DataFrame.new(
+  KEY1: %w[A B D],
+  KEY2: %w[s u v],
+  Y: [3, 2, 1]
+)
+```
+```{ruby}
+#| tags: []
+# natural join
+df2.inner_join(other2)
+# Same as df2.inner_join(other2, [:KEY1, :KEY2])
+```
+A partial join uses only some of the common keys as join keys.
+Common keys of other that are not used as join keys are renamed with `:suffix`. The default suffix is '.1'.
+```{ruby}
+#| tags: []
+# partial join
+df2.inner_join(other2, :KEY1)
+```
+```{ruby}
+#| tags: []
+df2.inner_join(other2, :KEY1, suffix: '_')
+```
+## 81. Order of record in join
+The order of records is not guaranteed to be preserved by a join. This is similar to an RDB: records behave like a set.
+If you want to preserve the order of records, it is recommended to add an index or to sort.
+(Since 0.2.3)
+```{ruby}
+#| tags: []
+df2
+```
+```{ruby}
+#| tags: []
+other2
+```
+```{ruby}
+#| tags: []
+df2.full_join(other2, :KEY2)
+```
+## 82. Set operations
+Keys in self and other must be the same in set operations.
+(Since 0.2.3)
+```{ruby}
+#| tags: []
+df = DataFrame.new(
+  KEY1: %w[A B C],
+  KEY2: [1, 2, 3]
+)
+```
+```{ruby}
+#| tags: []
+other = DataFrame.new(
+  KEY1: %w[A B D],
+  KEY2: [1, 4, 5]
+)
+```
+Intersect will select records appearing in both self and other.
+```{ruby}
+#| tags: []
+df.intersect(other)
+```
+Union will select records appearing in either self or other.
+```{ruby}
+#| tags: []
+df.union(other)
+```
+Difference will select records appearing in self but not in other.
+It has an alias `#setdiff`.
+```{ruby}
+#| tags: []
+df.difference(other)
+```
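The three set operations behave like Ruby's own array set operators applied to whole records. The sketch below uses plain arrays, with each record written as `[KEY1, KEY2]`; the constant names are hypothetical and no RedAmber call is involved.

```ruby
# Plain-Ruby model of set operations on records (illustration only).
# Both sides have the same keys, so a record is just an Array of values.
SET_SELF  = [['A', 1], ['B', 2], ['C', 3]].freeze
SET_OTHER = [['A', 1], ['B', 4], ['D', 5]].freeze

SET_INTERSECT  = SET_SELF & SET_OTHER  # records in both self and other
SET_UNION      = SET_SELF | SET_OTHER  # records in either, duplicates removed
SET_DIFFERENCE = SET_SELF - SET_OTHER  # records in self but not in other

p SET_INTERSECT
p SET_UNION
p SET_DIFFERENCE
```

Note that `['B', 2]` and `['B', 4]` are different records, because the whole record is compared, not only `KEY1`.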
+## 83. Join (big method)
+Undocumented big method `join` supports all mutating joins, filtering joins and set operations.
+|category|method of RedAmber|:type in join method|requirement|
+|-|-|-|-|
+|mutating joins|#inner_join|:inner||
+|mutating joins|#full_join|:full_outer||
+|mutating joins|#left_join|:left_outer||
+|mutating joins|#right_join|:right_outer||
+|-|-|:right_semi||
+|-|-|:right_anti||
+|filtering joins|#semi_join|:left_semi||
+|filtering joins|#anti_join|:left_anti||
+|set operations|#intersect|:inner|must have same keys with self and other|
+|set operations|#union|:full_outer|must have same keys with self and other|
+|set operations|#difference|:left_anti|must have same keys with self and other|
+(Since 0.2.3)
+```{ruby}
+#| tags: []
+df = DataFrame.new(
+  KEY: %w[A B C],
+  X1: [1, 2, 3]
+)
+```
+```{ruby}
+#| tags: []
+other = DataFrame.new(
+  KEY: %w[A B D],
+  X2: [true, false, nil]
+)
+```
+```{ruby}
+#| tags: []
+df.join(other, :KEY, type: :inner)
+# Same as df.inner_join(other)
+```
+(Since 0.5.0) `#join` will not force ordering of original column by default.
+## 84. Force order for #join
+We can use the `:force_order` option to ensure a unique order for the `join` family.
+This option is true by default in `#inner_join`, `#full_join`, `#left_join`, `#right_join`, `#semi_join` and `#anti_join`.
+It appends an index to the source and sorts by it after joining, which causes some degradation in performance.
+(Since 0.4.0)
+(Since 0.5.0) `#join` will not force ordering of original column by default.
+```{ruby}
+#| tags: []
+df2 = DataFrame.new(
+  KEY1: %w[A B C],
+  KEY2: %w[s t u],
+  X: [1, 2, 3]
+)
+```
+```{ruby}
+#| tags: []
+right2 = DataFrame.new(
+  KEY1: %w[A B D],
+  KEY2: %w[s u v],
+  Y: [3, 2, 1]
+)
+```
+```{ruby}
+#| tags: []
+df2.full_join(right2, :KEY2)
+```
+```{ruby}
+#| tags: []
+df2.full_join(right2, :KEY2, force_order: false)
+```
+```{ruby}
+#| tags: []
+df2.full_join(right2, { left: :KEY2, right: 'KEY2' })
+```
+```{ruby}
+#| tags: []
+df2.full_join(right2, { left: :KEY2, right: 'KEY2' }, force_order: false)
+```
+## 85. Binding DataFrames in vertical (concatenate)
+Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+The alias is `concat`.
+(Since 0.2.3)
+```{ruby}
+#| tags: []
+df = DataFrame.new(x: [1, 2], y: ['A', 'B'])
+```
+```{ruby}
+#| tags: []
+other = DataFrame.new(x: [3, 4], y: ['C', 'D'])
+```
+```{ruby}
+#| tags: []
+df.concatenate(other)
+```
+## 86. Binding DataFrames in lateral (merge)
+Bind another DataFrame or Table to the side of self as additional columns. Other must have the same number of records as self.
+(Since 0.2.3)
+```{ruby}
+#| tags: []
+df = DataFrame.new(x: [1, 2], y: [3, 4])
+```
+```{ruby}
+#| tags: []
+other = DataFrame.new(a: ['A', 'B'], b: ['C', 'D'])
+```
+```{ruby}
+#| tags: []
+df.merge(other)
+```
+## 87. Join - larger example by nycflights13
+(Since 0.2.3)
+'nycflights13' dataset is a large dataset. It will take a while for the first run to fetch and prepare red-datasets cache.
+```{ruby}
+#| tags: []
+require 'datasets-arrow'
+package = 'nycflights13'
+airlines = DataFrame.new(Datasets::Rdatasets.new(package, 'airlines'))
+airlines
+```
+Creating `Datasets::Rdatasets.new('nycflights13', 'flights')` is very slow because Red Datasets uses Ruby's CSV library as its CSV parser. We can parse the CSV with Arrow's faster parser instead.
+```{ruby}
+uri = URI('https://vincentarelbundock.github.io/Rdatasets/csv/nycflights13/flights.csv')
+flights = DataFrame.load(uri)
+  .pick(%i[month day carrier flight tailnum origin dest air_time distance])
+flights
+```
+```{ruby}
+# inner join
+flights.inner_join(airlines, :carrier)
+# flights.inner_join(airlines) # natural join (same result)
+```
+## 88. Vector#split_to_columns
+Another example, used in a DataFrame operation, is in [73. From the Pandas cookbook - Multiindexing](#73.-From-the-Pandas-cookbook---Multiindexing).
+`self` must be a String type Vector.
+(Since 0.3.0)
+```{ruby}
+#| tags: []
+v = Vector.new(['a b', 'c d', 'e f'])
+```
+```{ruby}
+#| tags: []
+v.split_to_columns
+```
+`#split_to_columns` accepts a `sep` argument as the separator. `sep` is passed to `String#split(sep)`.
+```{ruby}
+#| tags: []
+Vector.new('ab', 'cd', 'ef')
+      .split_to_columns('')
+```
+A nil element is split into nil.
+```{ruby}
+#| tags: []
+Vector.new(nil, 'c d', 'e f')
+      .split_to_columns
+```
+## 89. Vector#split_to_rows
+`#split_to_rows` will separate strings and flatten them into rows.
+(Since 0.3.0)
+```{ruby}
+#| tags: []
+v = Vector.new(['a b', 'c d', 'e f'])
+```
+```{ruby}
+#| tags: []
+v.split_to_rows
+```
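The difference between the two split methods can be pictured with plain Ruby arrays. This is only a sketch of the idea, not RedAmber's implementation: `sketch_split_to_columns` and `sketch_split_to_rows` are hypothetical helpers, nil elements are not handled, and the column version assumes every string splits into the same number of pieces.

```ruby
SPLIT_SOURCE = ['a b', 'c d', 'e f'].freeze

# Column-wise: split each string, then transpose so that the n-th pieces
# of all strings end up together in one array.
def sketch_split_to_columns(strings, sep = ' ')
  strings.map { |s| s.split(sep) }.transpose
end

# Row-wise: split each string and flatten all pieces into one sequence.
def sketch_split_to_rows(strings, sep = ' ')
  strings.flat_map { |s| s.split(sep) }
end

p sketch_split_to_columns(SPLIT_SOURCE)
p sketch_split_to_rows(SPLIT_SOURCE)
```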
+## 90. Vector#merge
+(Since 0.3.0)
+`Vector#merge(other)` merges `self` and `other` if they are String Vectors.
+```{ruby}
+#| tags: []
+vector = Vector.new(%w[a c e])
+other = Vector.new(%w[b d f])
+vector.merge(other)
+```
+If `other` is a scalar, it will be appended to each element of `self`.
+```{ruby}
+#| tags: []
+vector.merge('x')
+```
+The `:sep` option is the separator used when concatenating elements. Its default value is ' '.
+```{ruby}
+#| tags: []
+vector.merge('x', sep: '')
+```
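Element-wise merging with a separator can be modelled in plain Ruby with `zip` and `join`. The helper `sketch_merge` below is a hypothetical illustration of the behaviour described above (vector or scalar `other`, optional `sep`), not the library's code.

```ruby
# Plain-Ruby model of merging string elements pairwise (illustration only).
def sketch_merge(left, right, sep: ' ')
  # A scalar on the right is repeated for every element on the left.
  right = Array.new(left.size, right) unless right.is_a?(Array)
  left.zip(right).map { |a, b| [a, b].join(sep) }
end

p sketch_merge(%w[a c e], %w[b d f])
p sketch_merge(%w[a c e], 'x')
p sketch_merge(%w[a c e], 'x', sep: '')
```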
+## 91. Separate a variable (column) in a DataFrame
+(Since 0.3.0)
+R's separate operation.
+https://tidyr.tidyverse.org/reference/separate.html
+```{ruby}
+#| tags: []
+df = DataFrame.new(xyz: [nil, 'x.y', 'x.z', 'y.z'])
+```
+Instead of `separate(:xyz, [:a, :b])` we will do:
+```{ruby}
+#| tags: []
+df.assign(:A, :B) { xyz.split_to_columns('.') }
+  .drop(:xyz)
+```
+If you need :B only, instead of `separate(:xyz, [nil, :B])` we will do:
+```{ruby}
+#| tags: []
+df.assign(:A, :B) { xyz.split_to_columns('.') }
+  .pick(:B)
+```
+When the split lengths are not equal, the result is an Array of Vectors sized to the longest split, with missing elements filled with nil.
+```{ruby}
+#| tags: []
+df = DataFrame.new(xyz: ['x', 'x y', 'x y z', nil])
+df.assign(:x, :y, :z) { xyz.split_to_columns }
+```
+Split with a limit of at most 2 elements.
+```{ruby}
+#| tags: []
+df.assign(:x, :yz) { xyz.split_to_columns(' ', 2) }
+```
+Another example:
+```{ruby}
+#| tags: []
+df = DataFrame.new(id: 1..3, 'month-year': %w[8-2022 9-2022 10-2022])
+  .assign(:month, :year) { v(:'month-year').split_to_columns('-') }
+```
+Split between the letters.
+```{ruby}
+#| tags: []
+df = DataFrame.new(id: 1..3, yearmonth: %w[202209 202210 202211])
+  .assign(:year, :month) { yearmonth.split_to_columns(/(?=..$)/) }
+```
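Since the separator and limit are handed to `String#split`, the two less obvious cases above can be tried on single strings in plain Ruby. The constant names are only for this illustration.

```ruby
# A limit of 2 stops splitting after the first separator.
LIMITED_SPLIT = 'x y z'.split(' ', 2)

# The lookahead (?=..$) matches the empty position that is followed by
# exactly two characters and the end of the string, so the split happens
# between the year and the month without consuming any character.
YEAR_MONTH_SPLIT = '202209'.split(/(?=..$)/)

p LIMITED_SPLIT
p YEAR_MONTH_SPLIT
```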
+## 92. Unite variables (columns) in a DataFrame
+(Since 0.3.0)
+R's unite operation.
+```{ruby}
+#| tags: []
+df = DataFrame.new(id: 1..3, year: %w[2022 2022 2022], month: %w[09 10 11])
+```
+```{ruby}
+#| tags: []
+df.assign(:yearmonth) { year.merge(month, sep: '') }
+  .pick(:id, :yearmonth)
+```
+```{ruby}
+#| tags: []
+# Or directly create:
+DataFrame.new(id: 1..3, yearmonth: df.year.merge(df.month, sep: ''))
+```
+## 93. Separate variable and lengthen into several rows.
+(Since 0.3.0)
+R's separate_rows operation.
+```{ruby}
+#| tags: []
+df = DataFrame.new(id: 1..3, yearmonth: %w[202209 202210 202211])
+  .assign(:year, :month) { yearmonth.split_to_columns(/(?=..$)/) }
+  .drop(:yearmonth)
+  .to_long(:id)
+```
+Another example with different list size.
+```{ruby}
+#| tags: []
+df = DataFrame.new(
+  x: 1..3,
+  y: ['a', 'd,e,f', 'g,h'],
+  z: ['1', '2,3,4', '5,6'],
+)
+```
+```{ruby}
+#| tags: []
+sizes = df.y.split(',').list_sizes
+a = sizes.to_a.map.with_index(1) { |n, i| [i] * n }.flatten
+```
+```{ruby}
+#| tags: []
+DataFrame.new(
+  x: a,
+  y: df.y.split_to_rows(','),
+  z: df.z.split_to_rows(',')
+)
+```
+Another way to use `#split_to_columns`.
+```{ruby}
+#| tags: []
+xy = df.pick(:x, :y)
+  .assign(:y, :y1, :y2) { v(:y).split_to_columns(',') }
+  .to_long(:x, value: :y)
+  .remove_nil
+```
+```{ruby}
+#| tags: []
+xz = df.pick(:x, :z)
+  .assign(:z, :z1, :z2) { v(:z).split_to_columns(',') }
+  .to_long(:x, value: :z)
+  .remove_nil
+```
+```{ruby}
+#| tags: []
+xy.pick(:x, :y).merge(xz.pick(:z))
+```
+Get all combinations of :y and :z.
+```{ruby}
+#| tags: []
+df.assign(:y, :y1, :y2) { v(:y).split_to_columns(',') }
+  .to_long(:x, :z, value: :y)
+  .drop(:NAME)
+  .assign(:z, :z1, :z2) { v(:z).split_to_columns(',') }
+  .to_long(:x, :y, value: :z)
+  .drop(:NAME)
+  .drop_nil
+```
+## 94. Vector#propagate
+Spread the return value of an aggregate function as if it were an element-wise function.
+It has an alias `#expand`.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+vec = Vector.new(1, 2, 3, 4)
+vec.propagate(:mean)
+```
+Block is also available.
+```{ruby}
+#| tags: []
+vec.propagate { |v| v.mean.round }
+```
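The idea of propagation is simply "aggregate once, then repeat the result to the original length". A plain-Ruby sketch of that idea, with the hypothetical helper `sketch_propagate`, could look like this:

```ruby
# Plain-Ruby model of propagating an aggregated value (illustration only).
# The block computes one value from the whole array; the result is an
# array of the same size filled with that value.
def sketch_propagate(values)
  Array.new(values.size, yield(values))
end

p sketch_propagate([1, 2, 3, 4]) { |a| a.sum.fdiv(a.size) }
p sketch_propagate([1, 2, 3, 4]) { |a| a.sum.fdiv(a.size).round }
```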
+## 95. DataFrame#propagate
+Returns a Vector in which all elements have the value `scalar` and which has the same size as self.
+(Since 0.5.0)
+```{ruby}
+#| tags: []
+df
+```
+```{ruby}
+#| tags: []
+df.assign(:sum_x) { propagate(x.sum) }
+```
+With a block.
+```{ruby}
+#| tags: []
+df.assign(:range) { propagate { x.max - x.min } }
+```
+## 96. Vector#sort / #sort_indices
+`#sort` will arrange values in Vector.
+Accepts a sort order option:
+   - `:+`, `:ascending` or without argument will sort in increasing order.
+   - `:-` or `:descending` will sort in decreasing order.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+vector = Vector.new(%w[B D A E C])
+vector.sort
+# same as vector.sort(:+)
+# same as vector.sort(:ascending)
+```
+Sort in decreasing order;
+```{ruby}
+#| tags: []
+vector.sort(:-)
+# same as vector.sort(:descending)
+```
+## 97. Vector#rank
+Returns 1-based numerical rank of self.
+- Nil values are considered greater than any value.
+- NaN values are considered greater than any value but smaller than nil values.
+- Sort order can be controlled by the option `order`.
+  * `:ascending` or `+` will compute rank in ascending order (default).
+  * `:descending` or `-` will compute rank in descending order.
+- Tiebreakers will configure how ties between equal values are handled.
+  * `tie: :first` : Ranks are assigned in order of when ties appear in the input (default).
+  * `tie: :min` : Ties get the smallest possible rank in the sorted order.
+  * `tie: :max` : Ties get the largest possible rank in the sorted order.
+  * `tie: :dense` : The ranks span a dense [1, M] interval where M is the number of distinct values in the input.
+- Placement of nil and NaN is controlled by the option `null_placement`.
+  * `null_placement: :at_end` : place nulls at end (default).
+  * `null_placement: :at_start` : place nulls at the top of Vector.
+(Since 0.4.0, revised in 0.5.1)
+Rank of float Vector;
+```{ruby}
+#| tags: []
+float = Vector[1, 0, nil, Float::NAN, Float::INFINITY, -Float::INFINITY, 3, 2]
+```
+```{ruby}
+#| tags: []
+# Same as float.rank(:ascending, tie: :first, null_placement: :at_end)
+float.rank
+```
+With sort order;
+```{ruby}
+#| tags: []
+float.rank(:descending) # or float.rank('-')
+```
+With null placement;
+```{ruby}
+#| tags: []
+float.rank(null_placement: :at_start)
+```
+Rank of string Vector with tiebreakers;
+```{ruby}
+#| tags: []
+string = Vector['A', 'A', nil, nil, 'C', 'B']
+```
+```{ruby}
+#| tags: []
+string.rank # same as string.rank(tie: :first)
+```
+```{ruby}
+#| tags: []
+string.rank(tie: :min)
+```
+```{ruby}
+#| tags: []
+string.rank(tie: :max)
+```
+```{ruby}
+#| tags: []
+string.rank(tie: :dense)
+```
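The four tiebreakers can be compared side by side with a small plain-Ruby model. This sketch covers ascending order only and leaves out nil and NaN placement; `sketch_rank` is a hypothetical helper written for this illustration, not the Arrow or RedAmber implementation.

```ruby
# Plain-Ruby model of 1-based ranking with tiebreakers (illustration only).
# Handles ascending order and non-nil, mutually comparable values.
def sketch_rank(values, tie: :first)
  # Stable ascending order: ties keep their order of appearance.
  order = values.each_index.sort_by { |i| [values[i], i] }
  sorted = order.map { |i| values[i] }
  ranks = Array.new(values.size)
  order.each_with_index do |original_index, position|
    value = values[original_index]
    ranks[original_index] =
      case tie
      when :first then position + 1               # order of appearance
      when :min   then sorted.index(value) + 1    # smallest rank of the tie
      when :max   then sorted.rindex(value) + 1   # largest rank of the tie
      when :dense then sorted.uniq.index(value) + 1
      else raise ArgumentError, "unknown tiebreaker: #{tie}"
      end
  end
  ranks
end

%i[first min max dense].each do |tie|
  p [tie, sketch_rank(%w[A A C B], tie: tie)]
end
```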
+## 98. Vector#sample
+Pick up elements at random.
+(Since 0.4.0)
+Return a randomly selected element. This is one of the aggregation functions.
+```{ruby}
+#| tags: []
+v = Vector.new('A'..'H')
+```
+Returns scalar without any arguments.
+```{ruby}
+#| tags: []
+v.sample
+```
+`sample(n)` will pick up `n` elements at random. `n` is a positive number of elements to pick.
+If n is less than or equal to size, elements are picked without repetition.
+If n == 1 (in case of `sample(1)`), it returns a Vector of size == 1, not a scalar.
+```{ruby}
+#| tags: []
+v.sample(1)
+```
+Sampling the same size as self: every element is picked, in random order.
+```{ruby}
+#| tags: []
+v.sample(8)
+```
+If n is greater than `size`, some elements are picked repeatedly.
+```{ruby}
+#| tags: []
+v.sample(9)
+```
+`sample(prop)` will pick up elements at random by proportion `prop`. `prop` must be a positive float.
+ - The absolute number of elements to pick, `prop*size`, is truncated.
+ - If prop is less than or equal to 1.0, elements are picked without repetition.
+```{ruby}
+#| tags: []
+v.sample(0.7)
+```
+If only one element is picked, it returns a Vector of size == 1, not a scalar.
+```{ruby}
+#| tags: []
+v.sample(0.1)
+```
+Sampling the same size as self: every element is picked, in random order.
+```{ruby}
+#| tags: []
+v.sample(1.0)
+```
+If prop is greater than 1.0, some elements are picked repeatedly.
+```{ruby}
+#| tags: []
+# 2 times over sampling
+sampled = v.sample(2.0)
+```
+```{ruby}
+#| tags: []
+sampled.tally
+```
+## 99. DataFrame#sample/shuffle
+(Since 0.5.0)
+Select records randomly to create a DataFrame.
+```{ruby}
+#| tags: []
+penguins.sample(0.1)
+```
+Returns a DataFrame with shuffled rows.
+```{ruby}
+#| tags: []
+penguins.shuffle
+```
+## 100. Vector#concatenate
+Concatenate other array-like to self.
+(Since 0.4.0)
+Concatenate to string;
+```{ruby}
+#| tags: []
+string = Vector.new(%w[A B])
+```
+```{ruby}
+#| tags: []
+string.concatenate([1, 2])
+```
+Concatenate to integer;
+```{ruby}
+#| tags: []
+integer = Vector.new(1, 2)
+```
+```{ruby}
+#| tags: []
+integer.concatenate(["A", "B"])
+```
+## 101. Vector#resolve
+Return other as a Vector which is same data type as self.
+(Since 0.4.0)
+Integer to String;
+```{ruby}
+#| tags: []
+Vector.new('A').resolve([1, 2])
+```
+String to Integer;
+```{ruby}
+#| tags: []
+Vector.new(1).resolve(['A'])
+```
+Upcast to uint16;
+```{ruby}
+#| tags: []
+vector = Vector.new(256)
+```
+Not a uint8 Vector;
+```{ruby}
+#| tags: []
+vector.resolve([1, 2])
+```
+## 102. Vector#cast
+Cast self to `type`.
+(since 0.4.2)
+```{ruby}
+#| tags: []
+vector = Vector.new(1, 2, nil)
+vector.cast(:int16)
+```
+```{ruby}
+#| tags: []
+vector.cast(:double)
+```
+```{ruby}
+#| tags: []
+vector.cast(:string)
+```
+## 103. Vector#one
+Get a non-nil element in self. If all elements are nil, return nil.
+(since 0.4.2)
+```{ruby}
+#| tags: []
+vector = Vector.new([nil, 1, 3])
+vector.one
+```
+## 104. SubFrames
+`SubFrames` is a new concept of DataFrame collection. It represents ordered subsets of a DataFrame collected by some rules. It includes both grouping and windowing concepts in a unified manner, and also covers broader cases more flexibly.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+dataframe = DataFrame.new(
+  x: [*1..6],
+  y: %w[A A B B B C],
+  z: [false, true, false, nil, true, false]
+)
+p dataframe; nil
+```
+```{ruby}
+#| tags: []
+sf = SubFrames.new(dataframe, [[0, 1], [2, 3, 4], [5]])
+```
+Source DataFrame (universal set).
+```{ruby}
+#| tags: []
+sf.baseframe
+```
+Size of subsets.
+```{ruby}
+#| tags: []
+sf.size
+```
+Sizes of each subset.
+```{ruby}
+#| tags: []
+sf.sizes
+```
+`#each` returns an Enumerator if no block is given; otherwise it iterates over each subset as a DataFrame.
+```{ruby}
+#| tags: []
+sf.each
+```
+```{ruby}
+#| tags: []
+sf.each.next
+```
+`SubFrames.new` also accepts a block.
+```{ruby}
+#| tags: []
+usf = SubFrames.new(dataframe) { |df| [df.indices] }
+```
+`#universal?` tests if self is a universal set.
+```{ruby}
+#| tags: []
+usf.universal?
+```
+`#empty?` tests if self is an empty set.
+```{ruby}
+#| tags: []
+esf = SubFrames.new(dataframe, [])
+```
+```{ruby}
+#| scrolled: true
+#| tags: []
+esf.empty?
+```
+`#take(n)` takes n sub DataFrames and returns them as a SubFrames. If n >= size, it returns self.
+```{ruby}
+sf.take(2)
+```
+`#offset_indices` returns the index at the top of each sub DataFrame.
+```{ruby}
+sf.offset_indices
+```
+`#frames` returns an Array of sub DataFrames.
+```{ruby}
+sf.frames
+```
+`SubFrames.new` also accepts boolean filters even from the block.
+```{ruby}
+#| tags: []
+small = dataframe.x < 4
+large = !small
+small_large = SubFrames.new(dataframe) { [small, large] }
+```
+## 105. SubFrames#concatenate
+`SubFrames#concatenate` (or alias `#concat`) will concatenate SubFrames to create a DataFrame.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+sf.concatenate
+```
+## 106. SubFrames.by_group
+Create SubFrames by Group object.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+p dataframe; nil
+```
+```{ruby}
+#| tags: []
+group = Group.new(dataframe, [:y])
+sf = SubFrames.by_group(group)
+```
+## 107. SubFrames.by_indices/.by_filters
+`SubFrames.by_indices(dataframe, subset_indices)` creates a new SubFrames object from a DataFrame and an array of indices.
+```{ruby}
+SubFrames.by_indices(dataframe, [[0, 2, 4], [1, 3, 5]])
+```
+`SubFrames.by_filters(dataframe, subset_filters)` creates a new SubFrames object from a DataFrame and an array of filters.
+```{ruby}
+#| scrolled: true
+SubFrames.by_filters(dataframe, [[true, false, true, false, nil, false], [true, true, false, false, nil, false]])
+```
+## 108. SubFrames.by_dataframes
+`SubFrames.by_dataframes(dataframes)` creates a new SubFrames from an Array of DataFrames.
+```{ruby}
+dataframes = [
+    DataFrame.new(x: [1, 2, 3], y: %w[A A B], z: [false, true, false]),
+    DataFrame.new(x: [4, 5, 6], y: %w[B B C], z: [nil, true, false])
+]
+```
+```{ruby}
+SubFrames.by_dataframes(dataframes)
+```
+## 109. DataFrame#sub_by_value
+`sub_by_value(*keys)` makes SubFrames by value. It corresponds to Group processing.
+It creates SubFrames from keys, grouping by the values in the columns specified by the keys.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+dataframe.sub_by_value(:y)
+```
+## 110. DataFrame#sub_by_window
+Create SubFrames by window in `size` rolling `from` by `step`.
+Default values are `from: 0`, `size: nil` and `step: 1`.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+dataframe.sub_by_window(size: 4, step: 2)
+```
+## 111. DataFrame#sub_by_enum
+Create SubFrames by grouping/windowing by position. The position is specified by an `Array` enumerator method such as `each_slice` or `each_cons`.
+(Since 0.4.0)
+Create a SubFrames object sliced by 3 rows. This is MECE (Mutually Exclusive and Collectively Exhaustive) SubFrames.
+```{ruby}
+#| tags: []
+dataframe.sub_by_enum(:each_slice, 3)
+```
+Create a SubFrames object for each consecutive 4 rows.
+```{ruby}
+#| tags: []
+dataframe.sub_by_enum(:each_cons, 4)
+```
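The row positions selected by these two enumerator methods can be previewed in plain Ruby by applying them to the row indices of a 6-row DataFrame. The constant names below are only for this illustration.

```ruby
# Row indices of a 6-row DataFrame.
ROW_INDICES = (0...6).to_a.freeze

# each_slice(3): non-overlapping blocks that together cover every row once.
SLICED_INDICES = ROW_INDICES.each_slice(3).to_a

# each_cons(4): overlapping windows of 4 consecutive rows, moving by 1.
CONSECUTIVE_INDICES = ROW_INDICES.each_cons(4).to_a

p SLICED_INDICES
p CONSECUTIVE_INDICES
```

Flattening `SLICED_INDICES` gives back every row index exactly once, which is the MECE property mentioned above; the `each_cons` windows overlap, so they do not have it.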
+## 112. DataFrame#sub_by_kernel
+Create SubFrames by windowing with a kernel and step.
+Kernel is a boolean Array and it behaves like a masked window.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+kernel = [true, false, false, true]
+dataframe.sub_by_kernel(kernel, step: 2)
+```
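A kernel can be read as "which offsets inside the window are kept". The plain-Ruby sketch below computes the row indices each masked window would select. It assumes that only windows fitting entirely inside the data are produced; `sketch_kernel_windows` is a hypothetical helper, not the library's implementation.

```ruby
# Plain-Ruby model of windowing with a boolean kernel (illustration only).
# Assumption: a window is emitted only if the whole kernel fits in the data.
def sketch_kernel_windows(n_rows, kernel, step: 1)
  # Offsets inside the window where the kernel is true.
  offsets = kernel.each_index.select { |i| kernel[i] }
  (0..n_rows - kernel.size).step(step).map do |start|
    offsets.map { |offset| start + offset }
  end
end

p sketch_kernel_windows(6, [true, false, false, true], step: 2)
```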
+## 113. DataFrame#build_subframes
+Generic builder of sub-dataframe from self.
+(Since 0.4.0)
+```{ruby}
+#| tags: []
+dataframe.build_subframes([[0, 2, 4], [1, 3, 5]])
+```
+`#build_subframes` also accepts a block.
+```{ruby}
+#| tags: []
+dataframe.build_subframes do |df|
+  even = df.indices.map(&:even?)
+  [even, !even]
+end
+```
+## 114. SubFrames#aggregate
+Aggregate SubFrames to create a DataFrame. There are 4 APIs in this method.
+(Since 0.4.0)
+- `#aggregate(keys) { columns }`
+Aggregate SubFrames to create a DataFrame with the labels `keys` and column values from the block.
+```{ruby}
+#| tags: []
+sf = dataframe.sub_by_value(:y)
+```
+```{ruby}
+sf.aggregate(:y, :sum_x) { [y.one, x.sum] }  # sf.aggregate([:y, :sum_x]) { [y.one, x.sum] } is also acceptable
+```
+- `#aggregate { key_and_aggregated_values }`
+Aggregate SubFrames to create a DataFrame from pairs of key and aggregated value returned as a Hash from the block.
+```{ruby}
+sf.aggregate do
+  { y: y.one, sum_x: x.sum }
+end
+```
+- `#aggregate { [keys, values] }`
+Aggregate SubFrames to create a DataFrame from an Array of key and aggregated value pairs returned from the block.
+```{ruby}
+#| tags: []
+sf.aggregate do
+  [[:y, y.one], [:sum_x, x.sum]]
+end
+```
+- `#aggregate(group_keys, aggregations)`
+Aggregate SubFrames for the first values of the columns of `group_keys` and the aggregated results of key-function pairs.
+([Experimental] This API may be changed in the future.)
+```{ruby}
+#| tags: []
+sf.aggregate(:y, { x: :sum, z: :count })
+```
+## 115. SubFrames#map/#assign
+`#map` returns a SubFrames containing the DataFrames returned by the block. It has an alias `collect`.
+```{ruby}
+sf
+```
+This example assigns a new column.
+```{ruby}
+sf.map { |df| df.assign(x_plus1: df[:x] + 1) }
+```
+There is a shortcut of `map { assign }`. We can use `assign(key) { updated_column }`.
+```{ruby}
+sf.assign(:x_plus1) { x + 1 }
+```
+We can use `assign(keys) { updated_columns }` for multiple columns.
+```{ruby}
+sf.assign(:sum_x, :frac_x) do
+  group_sum = x.sum
+  [[group_sum] * x.size, x / group_sum.to_f]
+end
+```
+Also `assign { keys_and_columns }` is possible.
+```{ruby}
+sf.assign do
+  { 'x*z': x * z.if_else(1, 0) }
+end
+```
+(Notice) `SubFrames#assign` has the same syntax as `DataFrame#assign`.
+If you need an Array of DataFrames (not a SubFrames), use `each.map` instead.
+```{ruby}
+sf.each.map { |df| df.assign(x_plus1: df[:x] + 1) }
+```
+## 116. SubFrames#select/#reject
+`#select` returns a SubFrames containing the DataFrames selected by the block.
+```{ruby}
+sf.select { |df| df[:z].any? }
+```
+`#select` has aliases `#filter` and `#find_all`.
+`#reject` returns a SubFrames containing the DataFrames for which the block returns a falsy value.
+```{ruby}
+sf.reject { |df| df[:z].any? }
+```
+## 117. SubFrames#filter_map
+It returns a SubFrames containing truthy DataFrames returned by the block.
+```{ruby}
+sf.filter_map do |df|
+  if df.size > 1
+    df.assign(:y) do
+      y.merge(indices('1'), sep: '')
+    end
+  end
+end
+```
+## 118. Vector#modulo
+(Since 0.4.1)
+`#%` is an alias of `#modulo`.
+```{ruby}
+#| tags: []
+vector = Vector.new(5, -3, 1)
+vector % 3
+```
+`#%` and `#modulo` are equivalent to `self-divisor*(self/divisor).floor`.
+```{ruby}
+#| tags: []
+vector.modulo(-2)
+```
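The floored-division identity above can be checked on plain Ruby numbers, whose `%` operator follows the same rule. `sketch_modulo` is a hypothetical helper that spells the formula out; it is not part of RedAmber.

```ruby
# self - divisor * (self / divisor).floor, written out for scalars.
# fdiv is used so that the division is not truncated before flooring.
def sketch_modulo(value, divisor)
  value - divisor * value.fdiv(divisor).floor
end

MODULO_INPUT = [5, -3, 1].freeze

p MODULO_INPUT.map { |x| sketch_modulo(x, 3) }
p MODULO_INPUT.map { |x| sketch_modulo(x, -2) }
```

With this definition the result always takes the sign of the divisor, which is why every element becomes negative for a divisor of -2.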
+## 119. Vector#mode
+Compute the single most common value and its occurrence count.
+(since 0.5.0) ModeOptions are not supported in 0.5.0. Only one mode value is returned.
+```{ruby}
+#| tags: []
+Vector[true, true, false, nil].mode
+```
+```{ruby}
+#| tags: []
+Vector[0, 1, 1, 2, nil].mode
+```
+```{ruby}
+#| tags: []
+Vector[1, 0/0.0, -1/0.0, 1/0.0, nil].mode
+```
+## 120. Vector#end_with/start_with
+Check if elements in self end/start with a literal pattern.
+(since 0.5.0)
+```{ruby}
+#| tags: []
+v = Vector['array', 'Arrow', 'carrot', nil, 'window']
+```
+Emits true if the element ends (or starts) with `string`, and false if not. Nil inputs emit nil.
+```{ruby}
+#| tags: []
+v.end_with('ow')
+```
+```{ruby}
+#| tags: []
+v.start_with('arr')
+```
+## 121. Vector#match_substring
+For each string in self, emit true if it contains a given pattern.
+(since 0.5.0)
+```{ruby}
+#| tags: []
+v = Vector['array', 'Arrow', 'carrot', nil, 'window']
+```
+Emits true if it contains `string`. Emit false if not found. Nil inputs emit nil.
+```{ruby}
+#| tags: []
+v.match_substring('arr')
+```
+Otherwise use it with a Regexp pattern. It calls the `match_substring_regex` Arrow compute function, which uses the re2 library.
+```{ruby}
+#| tags: []
+v.match_substring(/arr/)
+```
+You can ignore case if you use a regexp with the `i` option, or `ignore_case: true`.
+```{ruby}
+#| tags: []
+v.match_substring(/arr/i)  # same as v.match_substring(/arr/, ignore_case: true)
+```
+## 122. Vector#match_like
+Match elements of self against SQL-style LIKE pattern. The pattern matches a given pattern at any position.
+- '%' will match any number of characters,
+- '_' will match exactly one character, and any other character matches itself.
+- To match a literal '%', '_', or '\', precede the character with a backslash.
+(since 0.5.0)
+```{ruby}
+#| tags: []
+v = Vector['array', 'Arrow', 'carrot', nil, 'window']
+```
+Emits true if the element matches the pattern, and false if not. Nil inputs emit nil.
+```{ruby}
+#| tags: []
+v.match_like('_arr%')
+```
+You can ignore case if you use the option `ignore_case: true`.
+```{ruby}
+#| tags: []
+v.match_like('arr%', ignore_case: true)
+```
+## 123. Vector#find_substring
+Find first occurrence of substring in string Vector.
+(since 0.5.1)
+```{ruby}
+#| tags: []
+v = Vector['array', 'Arrow', 'carrot', nil, 'window']
+```
+You can find indices of a literal string. Emit -1 if not found. Nil inputs emit nil.
+```{ruby}
+#| tags: []
+v.find_substring('arr')
+```
+Otherwise use it with a Regexp pattern. It calls the `find_substring_regex` Arrow compute function, which uses the re2 library.
+```{ruby}
+#| tags: []
+v.find_substring(/arr/)
+```
+You can ignore case if you use a regexp with the `i` option, or `ignore_case: true`.
+```{ruby}
+#| tags: []
+v.find_substring(/arr/i)  # same as v.find_substring(/arr/, ignore_case: true)
+```
+## 124. Vector#count_substring
+For each string in self, count occurrences of the given pattern.
+(since 0.5.0)
+```{ruby}
+#| tags: []
+v = Vector['amber', 'Amazon', 'banana', nil]
+```
+You can count occurrences of a literal string. Emits 0 if not found. Nil inputs emit nil.
+```{ruby}
+#| tags: []
+v.count_substring('an')
+```
+Otherwise use it with a Regexp pattern. It calls the `count_substring_regex` Arrow compute function, which uses the re2 library.
+```{ruby}
+#| tags: []
+v.count_substring(/a[mn]/)
+```
+You can ignore case if you use a regexp with the `i` option, or `ignore_case: true`.
+```{ruby}
+#| tags: []
+v.count_substring(/a[mn]/i)  # same as v.count_substring(/a[mn]/, ignore_case: true)
+```
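Counting non-overlapping occurrences per element can be modelled in plain Ruby with `String#scan`. This is only a sketch of the idea: `sketch_count_substring` is a hypothetical helper, and Ruby's own regexp engine is used here rather than re2, so the two may differ for patterns beyond simple ones like these.

```ruby
COUNT_SOURCE = ['amber', 'Amazon', 'banana', nil].freeze

# Count non-overlapping matches in each string; nil stays nil.
def sketch_count_substring(strings, pattern)
  strings.map { |s| s&.scan(pattern)&.size }
end

p sketch_count_substring(COUNT_SOURCE, 'an')
p sketch_count_substring(COUNT_SOURCE, /a[mn]/)
p sketch_count_substring(COUNT_SOURCE, /a[mn]/i)
```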
+## 125. Grouped DataFrame as a list
+This API was introduced in 0.2.3 and supplies a new DataFrame group (experimental).
+This additional API treats a grouped DataFrame as a list of DataFrames. I think this API has pros such as:
+- API is easy to understand and flexible.
+- It has good compatibility with Ruby's primitive Enumerables.
+- We can only use non hash-ed aggregation functions.
+- Do not need grouped DataFrame state, nor `#ungroup` method.
+- May be useful for concurrent operations.
+This feature is implemented in Ruby, so it is pretty slow and experimental. Use the original Group API for practical purposes.
+(Since 0.2.3, experimental feature => This was upgraded to SubFrames feature)
+```{ruby}
+enum = penguins.group(:island).each
+```
+```{ruby}
+enum.to_a
+```
+```{ruby}
+array = enum.map do |df|
+  DataFrame.new(island: [df.island[0]]).assign do
+    df.variables.each_with_object({}) do |(key, vec), hash|
+      next unless vec.numeric?
+      hash["mean(#{key})"] = [vec.mean]
+    end
+  end
+end
+```
+```{ruby}
+array.reduce { |a, df| a.concat df }
+```
+## 126. ArrowFunction helpers
+The `ArrowFunction` module adds two helper methods.
+`ArrowFunction.find(function_name)` returns Arrow Function object in Arrow C++ Compute Functions.
+```{ruby}
+ArrowFunction.find(:mean)
+```
+To execute this function,
+```{ruby}
+ArrowFunction.find(:mean).execute([[1, 2, 3, 4]]).value.value
+```
+`ArrowFunction.arrow_doc(function_name)` returns a document of Arrow C++ Compute Function in a string.
+```{ruby}
+puts ArrowFunction.arrow_doc(:mean)
+```
+## 127. DataFrame.auto_cast
+A data set for planetary data in https://nssdc.gsfc.nasa.gov/planetary/factsheet/ is used here. Let's manually copy the data in the html table and get the tab separated text values.
+```{ruby}
+tsv = ' 	 MERCURY 	 VENUS 	 EARTH 	 MOON 	 MARS 	 JUPITER 	 SATURN 	 URANUS 	 NEPTUNE 	 PLUTO
+Mass (1024kg)	0.330	4.87	5.97	0.073	0.642	1898	568	86.8	102	0.0130
+Diameter (km)	4879	12,104	12,756	3475	6792	142,984	120,536	51,118	49,528	2376
+Density (kg/m3)	5429	5243	5514	3340	3934	1326	687	1270	1638	1850
+Gravity (m/s2)	3.7	8.9	9.8	1.6	3.7	23.1	9.0	8.7	11.0	0.7
+Escape Velocity (km/s)	4.3	10.4	11.2	2.4	5.0	59.5	35.5	21.3	23.5	1.3
+Rotation Period (hours)	1407.6	-5832.5	23.9	655.7	24.6	9.9	10.7	-17.2	16.1	-153.3
+Length of Day (hours)	4222.6	2802.0	24.0	708.7	24.7	9.9	10.7	17.2	16.1	153.3
+Distance from Sun (106 km)	57.9	108.2	149.6	0.384*	228.0	778.5	1432.0	2867.0	4515.0	5906.4
+Perihelion (106 km)	46.0	107.5	147.1	0.363*	206.7	740.6	1357.6	2732.7	4471.1	4436.8
+Aphelion (106 km)	69.8	108.9	152.1	0.406*	249.3	816.4	1506.5	3001.4	4558.9	7375.9
+Orbital Period (days)	88.0	224.7	365.2	27.3*	687.0	4331	10,747	30,589	59,800	90,560
+Orbital Velocity (km/s)	47.4	35.0	29.8	1.0*	24.1	13.1	9.7	6.8	5.4	4.7
+Orbital Inclination (degrees)	7.0	3.4	0.0	5.1	1.8	1.3	2.5	0.8	1.8	17.2
+Orbital Eccentricity	0.206	0.007	0.017	0.055	0.094	0.049	0.052	0.047	0.010	0.244
+Obliquity to Orbit (degrees)	0.034	177.4	23.4	6.7	25.2	3.1	26.7	97.8	28.3	122.5
+Mean Temperature (C)	167	464	15	-20	-65	-110	-140	-195	-200	-225
+Surface Pressure (bars)	0	92	1	0	0.01	Unknown*	Unknown*	Unknown*	Unknown*	0.00001
+Number of Moons	0	0	1	0	2	79	82	27	14	5
+Ring System?	No	No	No	No	No	Yes	Yes	Yes	Yes	No
+Global Magnetic Field?	Yes	No	Yes	No	No	Yes	Yes	Yes	Yes	Unknown
+'
+raw_dataframe = DataFrame.load(Arrow::Buffer.new(tsv), format: :tsv)
+ENV['RED_AMBER_OUTPUT_MODE'] = 'plain'
+raw_dataframe
+```
+This dataframe has row-oriented values, so we must transpose it.
+```{ruby}
+transposed = raw_dataframe.transpose
+```
+This dataframe has string columns. We could cast each numeric column, but the recommended way is to use `#auto_cast`. `#auto_cast` saves the dataframe to a temporary TSV file and reopens it to get a casted dataframe.
+```{ruby}
+transposed.auto_cast
+```
+There is still some dirt to be cleaned in this dataframe, but we don't touch it here. If you are interested, give it a try!
+- Rename a column 'NAME' to 'Planet_name'.
+- Remove preceding/trailing spaces in 'Planet_name' values.
+- Capitalize 'Planet_name' values.
+- Remove data for 'Moon' and 'Pluto' to create the Table for planets.
+- Convert 'Unknown*' to nil.
+- Change 'Yes' / 'No' values to true / false (change column type to boolean).
+- Remove commas in numeric values. They prevent the columns from being numeric.
+- Correct cell values which have '*'. They prevent the columns from being numeric.
+- Add missing '^' to unit in labels.