RubyGems - red_amber - Versions diffs - 0.1.2 → 0.1.5 - Mend

red_amber 0.1.2 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42) hide show

checksums.yaml +4 -4
data/.rubocop.yml +21 -10
data/CHANGELOG.md +162 -6
data/Gemfile +3 -0
data/README.md +89 -303
data/benchmark/csv_load_penguins.yml +15 -0
data/benchmark/drop_nil.yml +11 -0
data/doc/DataFrame.md +840 -0
data/doc/Vector.md +317 -0
data/doc/image/arrow_table_new.png +0 -0
data/doc/image/dataframe/assign.png +0 -0
data/doc/image/dataframe/drop.png +0 -0
data/doc/image/dataframe/pick.png +0 -0
data/doc/image/dataframe/remove.png +0 -0
data/doc/image/dataframe/rename.png +0 -0
data/doc/image/dataframe/slice.png +0 -0
data/doc/image/dataframe_model.png +0 -0
data/doc/image/example_in_red_arrow.png +0 -0
data/doc/image/tdr.png +0 -0
data/doc/image/tdr_and_table.png +0 -0
data/doc/image/tidy_data_in_TDR.png +0 -0
data/doc/image/vector/binary_element_wise.png +0 -0
data/doc/image/vector/unary_aggregation.png +0 -0
data/doc/image/vector/unary_aggregation_w_option.png +0 -0
data/doc/image/vector/unary_element_wise.png +0 -0
data/doc/tdr.md +56 -0
data/doc/tdr_ja.md +56 -0
data/lib/red_amber/data_frame.rb +68 -35
data/lib/red_amber/data_frame_displayable.rb +132 -0
data/lib/red_amber/data_frame_helper.rb +64 -0
data/lib/red_amber/data_frame_indexable.rb +38 -0
data/lib/red_amber/data_frame_observation_operation.rb +83 -0
data/lib/red_amber/data_frame_selectable.rb +34 -43
data/lib/red_amber/data_frame_variable_operation.rb +133 -0
data/lib/red_amber/vector.rb +58 -6
data/lib/red_amber/vector_compensable.rb +68 -0
data/lib/red_amber/vector_functions.rb +147 -68
data/lib/red_amber/version.rb +1 -1
data/lib/red_amber.rb +9 -1
data/red_amber.gemspec +3 -6
metadata +36 -9
data/lib/red_amber/data_frame_output.rb +0 -116

data/doc/DataFrame.md ADDED Viewed

@@ -0,0 +1,840 @@
+# DataFrame
+Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
+- A collection of data which have same data type within. We call it `Vector`.
+- A label is attached to `Vector`. We call it `key`.
+- A `Vector` and associated `key` is grouped as a `variable`.
+- `variable`s with same vector length are aligned and arranged to be a `DaTaFrame`.
+- Each `Vector` in a `DataFrame` contains a set of relating data at same position. We call it `observation`.
+![dataframe model image](doc/../image/dataframe_model.png)
+## Constructors and saving
+### `new` from a Hash
+  ```ruby
+  RedAmber::DataFrame.new(x: [1, 2, 3])
+  ```
+### `new` from a schema (by Hash) and data (by Array)
+  ```ruby
+  RedAmber::DataFrame.new({:x=>:uint8}, [[1], [2], [3]])
+  ```
+### `new` from an Arrow::Table
+  ```ruby
+  table = Arrow::Table.new(x: [1, 2, 3])
+  RedAmber::DataFrame.new(table)
+  ```
+### `new` from a Rover::DataFrame
+  ```ruby
+  rover = Rover::DataFrame.new(x: [1, 2, 3])
+  RedAmber::DataFrame.new(rover)
+  ```
+### `load` (class method)
+- from a `.arrow`, `.arrows`, `.csv`, `.csv.gz` or `.tsv` file
+  ```ruby
+  RedAmber::DataFrame.load("test/entity/with_header.csv")
+  ```
+- from a string buffer
+- from a URI
+  ```ruby
+  uri = URI("uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
+  RedAmber::DataFrame.load(uri)
+  ```
+- from a Parquet file
+  ```ruby
+  dataframe = RedAmber::DataFrame.load("file.parquet")
+  ```
+### `save` (instance method)
+- to a `.arrow`, `.arrows`, `.csv`, `.csv.gz` or `.tsv` file
+- to a string buffer
+- to a URI
+- to a Parquet file
+  ```ruby
+  dataframe.save("file.parquet")
+  ```
+## Properties
+### `table`, `to_arrow`
+- Reader of Arrow::Table object inside.
+### `size`, `n_obs`, `n_rows`
+- Returns size of Vector (num of observations).
+### `n_keys`, `n_vars`, `n_cols`,
+- Returns num of keys (num of variables).
+### `shape`
+- Returns shape in an Array[n_rows, n_cols].
+### `variables`
+- Returns key names and Vectors pair in a Hash.
+  It is convenient to use in a block when both key and vector required. We will write:
+  ```ruby
+    # update numeric variables
+    df.assign do
+      variables.select.with_object({}) do |(key, vector), assigner|
+        assigner[key] = vector * -1 if vector.numeric?
+      end
+    end
+  ```
+  Instead of:
+  ```ruby
+    df.assign do
+      assigner = {}
+      vectors.each_with_index do |vector, i|
+        assigner[keys[i]] = vector * -1 if vector.numeric?
+      end
+      assigner
+    end
+  ```
+### `keys`, `var_names`, `column_names`
+- Returns key names in an Array.
+  When we use it with vectors, Vector#key is useful to get the key inside of DataFrame.
+  ```ruby
+    # update numeric variables, another solution
+    df.assign do
+      vectors.each_with_object({}) do |vector, assigner|
+        assigner[vector.key] = vector * -1 if vector.numeric?
+      end
+    end
+  ```
+### `types`
+- Returns types of vectors in an Array of Symbols.
+### `type_classes`
+- Returns types of vector in an Array of `Arrow::DataType`.
+### `vectors`
+- Returns an Array of Vectors.
+### `indexes`, `indices`
+- Returns all indexes in a Range.
+### `to_h`
+- Returns column-oriented data in a Hash.
+### `to_a`, `raw_records`
+- Returns an array of row-oriented data without header.
+  If you need a column-oriented full array, use `.to_h.to_a`
+### `schema`
+- Returns column name and data type in a Hash.
+### `==`
+### `empty?`
+## Output
+### `to_s`
+### `summary`, `describe` (not implemented)
+### `to_rover`
+- Returns a `Rover::DataFrame`.
+### `tdr(limit = 10, tally: 5, elements: 5)`
+  - Shows some information about self in a transposed style.
+  - `tdr_str` returns same info as a String.
+  ```ruby
+  require 'red_amber'
+  require 'datasets-arrow'
+  penguins = Datasets::Penguins.new.to_arrow
+  RedAmber::DataFrame.new(penguins).tdr
+  # =>
+  RedAmber::DataFrame : 344 x 8 Vectors
+  Vectors : 5 numeric, 3 strings
+  # key                type   level data_preview
+  1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
+  2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
+  3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
+  4 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
+  5 :flipper_length_mm uint8     56 [181, 186, 195, nil, 193, ... ], 2 nils
+  6 :body_mass_g       uint16    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
+  7 :sex               string     3 {"male"=>168, "female"=>165, nil=>11}
+  8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}
+  ```
+  - limit: limit of variables to show. Default value is 10.
+  - tally: max level to use tally mode.
+  - elements: max num of element to show values in each observations.
+### `inspect`
+- Returns the information of self as `tdr(3)`, and also shows object id.
+  ```ruby
+  puts penguins.inspect
+  # =>
+  #<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f0b4>
+  Vectors : 5 numeric, 3 strings
+  # key                type   level data_preview
+  1 :species           string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
+  2 :island            string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
+  3 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
+   ... 5 more Vectors ...
+  ```
+## Selecting
+### Select variables (columns in a table) by `[]` as `[key]`, `[keys]`, `[keys[index]]`
+- Key in a Symbol: `df[:symbol]`
+- Key in a String: `df["string"]`
+- Keys in an Array: `df[:symbol1, "string", :symbol2]`
+- Keys by indeces: `df[df.keys[0]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
+  Key indeces can be used via `keys[i]` because numbers are used to select observations (rows).
+- Keys by a Range:
+  If keys are able to represent by Range, it can be included in the arguments. See a example below.
+- You can exchange the order of variables (columns).
+  ```ruby
+  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
+  df = RedAmber::DataFrame.new(hash)
+  df[:b..:c, "a"]
+  # =>
+  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000b02c>
+  Vectors : 2 numeric, 1 string
+  # key type   level data_preview
+  1 :b  string     3 ["A", "B", "C"]
+  2 :c  double     3 [1.0, 2.0, 3.0]
+  3 :a  uint8      3 [1, 2, 3]
+  ```
+  If `#[]` represents single variable (column), it returns a Vector object.
+  ```ruby
+  df[:a]
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
+  [1, 2, 3]
+  ```
+  Or `#v` method also returns a Vector for a key.
+  ```ruby
+  df.v(:a)
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
+  [1, 2, 3]
+  ```
+  This may be useful to use in a block of DataFrame manipulation verbs. We can write `v(:a)` rather than `self[:a]` or `df[:a]`
+### Select observations (rows in a table) by `[]` as `[index]`, `[range]`, `[array]`
+- Select a obs. by index: `df[0]`
+- Select obs. by indeces in a Range: `df[1..2]`
+  An end-less or a begin-less Range can be used to represent indeces.
+- Select obs. by indeces in an Array: `df[1, 2]`
+- Mixed case: `df[2, 0..]`
+  ```ruby
+  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
+  df = RedAmber::DataFrame.new(hash)
+  df[:b..:c, "a"].tdr(tally_level: 0)
+  # =>
+  RedAmber::DataFrame : 4 x 3 Vectors
+  Vectors : 2 numeric, 1 string
+  # key type   level data_preview
+  1 :a  uint8      3 [3, 1, 2, 3]
+  2 :b  string     3 ["C", "A", "B", "C"]
+  3 :c  double     3 [3.0, 1.0, 2.0, 3.0]
+  ```
+- Select obs. by a boolean Array or a boolean RedAmber::Vector at same size as self.
+  It returns a sub dataframe with observations at boolean is true.
+    ```ruby
+    # with the same dataframe `df` above
+    df[true, false, nil] # or
+    df[[true, false, nil]] # or
+    df[RedAmber::Vector.new([true, false, nil])]
+    # =>
+    #<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000000f1a4>
+    Vectors : 2 numeric, 1 string
+    # key type   level data_preview
+    1 :a  uint8      1 [1]
+    2 :b  string     1 ["A"]
+    3 :c  double     1 [1.0]
+    ```
+### Select rows from top or from bottom
+  `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`
+## Sub DataFrame manipulations
+### `pick  ` - pick up variables by key label -
+  Pick up some variables (columns) to create a sub DataFrame.
+  ![pick method image](doc/../image/dataframe/pick.png)
+- Keys as arguments
+  `pick(keys)` accepts keys as arguments in an Array.
+    ```ruby
+    penguins.pick(:species, :bill_length_mm)
+    # =>
+    #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000f924>
+    Vectors : 1 numeric, 1 string
+    # key             type   level data_preview
+    1 :species        string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
+    2 :bill_length_mm double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
+    ```
+- Booleans as a argument
+  `pick(booleans)` accepts booleans as a argument in an Array. Booleans must be same length as `n_keys`.
+    ```ruby
+    penguins.pick(penguins.types.map { |type| type == :string })
+    # =>
+    #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000000f938>
+    Vectors : 3 strings
+    # key      type   level data_preview
+    1 :species string     3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
+    2 :island  string     3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
+    3 :sex     string     3 {"male"=>168, "female"=>165, ""=>11}
+    ```
+ - Keys or booleans by a block
+    `pick {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return keys, or a boolean Array with a same length as `n_keys`. Block is called in the context of self.
+    ```ruby
+    # It is ok to write `keys ...` in the block, not `penguins.keys ...`
+    penguins.pick { keys.map { |key| key.end_with?('mm') } }
+    # =>
+    #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000000f1cc>
+    Vectors : 3 numeric
+    # key                type   level data_preview
+    1 :bill_length_mm    double   165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
+    2 :bill_depth_mm     double    81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
+    3 :flipper_length_mm int64     56 [181, 186, 195, nil, 193, ... ], 2 nils
+    ```
+### `drop  ` - pick and drop -
+  Drop some variables (columns) to create a remainer DataFrame.
+  ![drop method image](doc/../image/dataframe/drop.png)
+- Keys as arguments
+  `drop(keys)` accepts keys as arguments in an Array.
+- Booleans as a argument
+  `drop(booleans)` accepts booleans as a argument in an Array. Booleans must be same length as `n_keys`.
+- Keys or booleans by a block
+  `drop {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return keys, or a boolean Array with a same length as `n_keys`. Block is called in the context of self.
+- Notice for nil
+  When used with booleans, nil in booleans is treated as a false. This behavior is aligned with Ruby's `nil#!`.
+  ```ruby
+  booleans = [true, false, nil]
+  booleans_invert = booleans.map(&:!) # => [false, true, true]
+  df.pick(booleans) == df.drop(booleans_invert) # => true
+  ```
+- Difference between `pick`/`drop` and `[]`
+  If `pick` or `drop` will select a single variable (column), it returns a `DataFrame` with one variable. In contrast, `[]` returns a `Vector`. This behavior may be useful to use in a block of DataFrame manipulations.
+  ```ruby
+  df = RedAmber::DataFrame.new(a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3])
+  df.pick(:a) # or
+  df.drop(:b, :c)
+  # =>
+  #<RedAmber::DataFrame : 3 x 1 Vector, 0x000000000000f280>
+  Vector : 1 numeric
+  # key type  level data_preview
+  1 :a  uint8     3 [1, 2, 3]
+  df[:a]
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
+  [1, 2, 3]
+  ```
+### `slice  `  - to cut vertically is slice -
+  Slice and select observations (rows) to create a sub DataFrame.
+  ![slice method image](doc/../image/dataframe/slice.png)
+- Keys as arguments
+    `slice(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
+    ```ruby
+    # returns 5 obs. at start and 5 obs. from end
+    penguins.slice(0...5, -5..-1)
+    # =>
+    #<RedAmber::DataFrame : 10 x 8 Vectors, 0x000000000000f230>
+    Vectors : 5 numeric, 3 strings
+    # key                type   level data_preview
+    1 :species           string     2 {"Adelie"=>5, "Gentoo"=>5}
+    2 :island            string     2 {"Torgersen"=>5, "Biscoe"=>5}
+    3 :bill_length_mm    double     9 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
+     ... 5 more Vectors ...
+    ```
+- Booleans as an argument
+  `slice(booleans)` accepts booleans as a argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
+    ```ruby
+    vector = penguins[:bill_length_mm]
+    penguins.slice(vector >= 40)
+    # =>
+    #<RedAmber::DataFrame : 242 x 8 Vectors, 0x000000000000f2bc>
+    Vectors : 5 numeric, 3 strings
+    # key                type   level data_preview
+    1 :species           string     3 {"Adelie"=>51, "Chinstrap"=>68, "Gentoo"=>123}
+    2 :island            string     3 {"Torgersen"=>18, "Biscoe"=>139, "Dream"=>85}
+    3 :bill_length_mm    double   115 [40.3, 42.0, 41.1, 42.5, 46.0, ... ]
+     ... 5 more Vectors ...
+    ```
+- Keys or booleans by a block
+    `slice {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return indeces or a boolean Array with a same length as `size`. Block is called in the context of self.
+    ```ruby
+    # return a DataFrame with bill_length_mm is in 2*std range around mean
+    penguins.slice do
+      vector = self[:bill_length_mm]
+      min = vector.mean - vector.std
+      max = vector.mean + vector.std
+      vector.to_a.map { |e| (min..max).include? e }
+    end
+    # =>
+    #<RedAmber::DataFrame : 204 x 8 Vectors, 0x000000000000f30c>
+    Vectors : 5 numeric, 3 strings
+    # key                type   level data_preview
+    1 :species           string     3 {"Adelie"=>82, "Chinstrap"=>33, "Gentoo"=>89}
+    2 :island            string     3 {"Torgersen"=>31, "Biscoe"=>112, "Dream"=>61}
+    3 :bill_length_mm    double    90 [39.1, 39.5, 40.3, 39.3, 38.9, ... ]
+     ... 5 more Vectors ...
+    ```
+- Notice: nil option
+  - `Arrow::Table#slice` uses `filter` method with a option `Arrow::FilterOptions.null_selection_behavior = :emit_null`. This will propagate nil at the same row.
+    ```ruby
+    hash = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3] }
+    table = Arrow::Table.new(hash)
+    table.slice([true, false, nil])
+    # =>
+    #<Arrow::Table:0x7fdfe44b9e18 ptr=0x555e9fe744d0>
+	         a	b	            c
+    0	     1  A      1.000000
+    1	(null)	(null)   (null)
+    ```
+  - Whereas in RedAmber, `DataFrame#slice` with booleans containing nil is treated as false. This behavior comes from `Allow::FilterOptions.null_selection_behavior = :drop`. This is  a default value for `Arrow::Table.filter` method.
+    ```ruby
+    RedAmber::DataFrame.new(table).slice([true, false, nil]).table
+    # =>
+    #<Arrow::Table:0x7fdfe44981c8 ptr=0x555e9febc330>
+	    a	b	         c
+    0	1	A	  1.000000
+    ```
+### `remove`
+  Slice and reject observations (rows) to create a remainer DataFrame.
+  ![remove method image](doc/../image/dataframe/remove.png)
+- Keys as arguments
+    `remove(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
+    ```ruby
+    # returns 6th to 339th obs.
+    penguins.remove(0...5, -5..-1)
+    # =>
+    #<RedAmber::DataFrame : 334 x 8 Vectors, 0x000000000000f320>
+    Vectors : 5 numeric, 3 strings
+    # key                type   level data_preview
+    1 :species           string     3 {"Adelie"=>147, "Chinstrap"=>68, "Gentoo"=>119}
+    2 :island            string     3 {"Torgersen"=>47, "Biscoe"=>163, "Dream"=>124}
+    3 :bill_length_mm    double   162 [39.3, 38.9, 39.2, 34.1, 42.0, ... ]
+     ... 5 more Vectors ...
+    ```
+- Booleans as an argument
+  `remove(booleans)` accepts booleans as a argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
+    ```ruby
+    # remove all observation contains nil
+    removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
+    removed.tdr
+    # =>
+    RedAmber::DataFrame : 333 x 8 Vectors
+    Vectors : 5 numeric, 3 strings
+    # key                type   level data_preview
+    1 :species           string     3 {"Adelie"=>146, "Chinstrap"=>68, "Gentoo"=>119}
+    2 :island            string     3 {"Torgersen"=>47, "Biscoe"=>163, "Dream"=>123}
+    3 :bill_length_mm    double   163 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
+    4 :bill_depth_mm     double    79 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
+    5 :flipper_length_mm uint8     54 [181, 186, 195, 193, 190, ... ]
+    6 :body_mass_g       uint16    93 [3750, 3800, 3250, 3450, 3650, ... ]
+    7 :sex               string     2 {"male"=>168, "female"=>165}
+    8 :year              uint16     3 {2007=>103, 2008=>113, 2009=>117}
+    ```
+- Keys or booleans by a block
+    `remove {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return indeces or a boolean Array with a same length as `size`. Block is called in the context of self.
+    ```ruby
+    penguins.remove do
+      vector = self[:bill_length_mm]
+      min = vector.mean - vector.std
+      max = vector.mean + vector.std
+      vector.to_a.map { |e| (min..max).include? e }
+    end
+    # =>
+    #<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000000f370>
+    Vectors : 5 numeric, 3 strings
+    # key                type   level data_preview
+    1 :species           string     3 {"Adelie"=>70, "Chinstrap"=>35, "Gentoo"=>35}
+    2 :island            string     3 {"Torgersen"=>21, "Biscoe"=>56, "Dream"=>63}
+    3 :bill_length_mm    double    75 [nil, 36.7, 34.1, 37.8, 37.8, ... ], 2 nils
+     ... 5 more Vectors ...
+    ```
+- Notice for nil
+  - When `remove` used with booleans, nil in booleans is treated as false. This behavior is aligned with Ruby's `nil#!`.
+    ```ruby
+    df = RedAmber::DataFrame.new(a: [1, 2, nil], b: %w[A B C], c: [1.0, 2, 3])
+    booleans = df[:a] < 2
+    # =>
+    #<RedAmber::Vector(:boolean, size=3):0x000000000000f410>
+    [true, false, nil]
+    booleans_invert = booleans.to_a.map(&:!) # => [false, true, true]
+    df.slice(booleans) == df.remove(booleans_invert) # => true
+    ```
+  - Whereas `Vector#invert` returns nil for elements nil. This will bring different result.
+    ```ruby
+    booleans.invert
+    # =>
+    #<RedAmber::Vector(:boolean, size=3):0x000000000000f488>
+    [false, true, nil]
+    df.remove(booleans.invert)
+    #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000f474>
+    Vectors : 2 numeric, 1 string
+    # key type   level data_preview
+    1 :a  uint8      2 [1, nil], 1 nil
+    2 :b  string     2 ["A", "C"]
+    3 :c  double     2 [1.0, 3.0]
+    ```
+### `rename`
+  Rename keys (column names) to create a updated DataFrame.
+  ![rename method image](doc/../image/dataframe/rename.png)
+- Key pairs as arguments
+    `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
+    ```ruby
+    h = { 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] }
+    df = RedAmber::DataFrame.new(h)
+    df.rename(:age => :age_in_1993)
+    # =>
+    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f8fc>
+    Vectors : 1 numeric, 1 string
+    # key          type   level data_preview
+    1 :name        string     3 ["Yasuko", "Rui", "Hinata"]
+    2 :age_in_1993 uint8      3 [68, 49, 28]
+    ```
+- Key pairs by a block
+    `rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}`. Block is called in the context of self.
+- Key type
+  Symbol key and String key are distinguished.
+### `assign`
+  Assign new or updated variables (columns) and create a updated DataFrame.
+  - Variables with new keys will append new variables at bottom (right in the table).
+  - Variables with exisiting keys will update corresponding vectors.
+    ![assign method image](doc/../image/dataframe/assign.png)
+- Variables as arguments
+    `assign(key_pairs)` accepts pairs of key and values as arguments. key_pairs should be a Hash of `{key => array}` or `{key => Vector}`.
+    ```ruby
+    df = RedAmber::DataFrame.new(
+      'name' => %w[Yasuko Rui Hinata],
+      'age' => [68, 49, 28])
+    # =>
+    #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f8fc>
+    Vectors : 1 numeric, 1 string
+    # key   type   level data_preview
+    1 :name string     3 ["Yasuko", "Rui", "Hinata"]
+    2 :age  uint8      3 [68, 49, 28]
+    # update :age and add :brother
+    assigner = { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }
+    df.assign(assigner)
+    # =>
+    #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000f960>
+    Vectors : 1 numeric, 2 strings
+    # key      type   level data_preview
+    1 :name    string     3 ["Yasuko", "Rui", "Hinata"]
+    2 :age     uint8      3 [97, 78, 57]
+    3 :brother string     3 ["Santa", nil, "Momotaro"], 1 nil
+    ```
+- Key pairs by a block
+    `assign {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return pairs of key and values as a Hash of `{key => array}` or `{key => Vector}`. Block is called in the context of self.
+    ```ruby
+    df = RedAmber::DataFrame.new(
+      index: [0, 1, 2, 3, nil],
+      float: [0.0, 1.1,  2.2, Float::NAN, nil],
+      string: ['A', 'B', 'C', 'D', nil])
+    # =>
+    #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f8c0>
+    Vectors : 2 numeric, 1 string
+    # key     type   level data_preview
+    1 :index  uint8      5 [0, 1, 2, 3, nil], 1 nil
+    2 :float  double     5 [0.0, 1.1, 2.2, NaN, nil], 1 NaN, 1 nil
+    3 :string string     5 ["A", "B", "C", "D", nil], 1 nil
+    # update numeric variables
+    df.assign do
+      assigner = {}
+      vectors.each_with_index do |v, i|
+        assigner[keys[i]] = v * -1 if v.numeric?
+      end
+      assigner
+    end
+    # =>
+    #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f924>
+    Vectors : 2 numeric, 1 string
+    # key     type   level data_preview
+    1 :index  int8       5 [0, -1, -2, -3, nil], 1 nil
+    2 :float  double     5 [-0.0, -1.1, -2.2, NaN, nil], 1 NaN, 1 nil
+    3 :string string     5 ["A", "B", "C", "D", nil], 1 nil
+    # Or it ’s shorter like this:
+    df.assign do
+      variables.select.with_object({}) do |(key, vector), assigner|
+        assigner[key] = vector * -1 if vector.numeric?
+      end
+    end
+    # => same as above
+    ```
+- Key type
+  Symbol key and String key are considered as the same key.
+## Updating
+### `sort`
+  `sort` accepts parameters as sort_keys thanks to the amazing Red Arrow feature。
+    - :key, "key" or "+key" denotes ascending order
+    - "-key" denotes descending order
+  ```ruby
+  df = RedAmber::DataFrame.new({
+        index:  [1, 1, 0, nil, 0],
+        string: ['C', 'B', nil, 'A', 'B'],
+        bool:   [nil, true, false, true, false],
+      })
+  df.sort(:index, '-bool').tdr(tally: 0)
+  # =>
+  RedAmber::DataFrame : 5 x 3 Vectors
+  Vectors : 1 numeric, 1 string, 1 boolean
+  # key     type    level data_preview
+  1 :index  uint8       3 [0, 0, 1, 1, nil], 1 nil
+  2 :string string      4 [nil, "B", "B", "C", "A"], 1 nil
+  3 :bool   boolean     3 [false, false, true, nil, true], 1 nil
+  ```
+- [ ] Clamp
+- [ ] Clear data
+## Treat na data
+### `remove_nil`
+  Remove any observations containing nil.
+## Grouping
+### `group(aggregating_keys, function, target_keys)`
+  Create grouped dataframe by `aggregation_keys` and apply `function` to each group and returns in `target_keys`. Aggregated key name is `function(key)` style.
+  (The current implementation is not intuitive. Needs improvement.)
+  ```ruby
+  ds = Datasets::Rdatasets.new('dplyr', 'starwars')
+  starwars = RedAmber::DataFrame.new(ds.to_table.to_h)
+  starwars.tdr(11)
+  # =>
+  RedAmber::DataFrame : 87 x 11 Vectors
+  Vectors : 3 numeric, 8 strings
+  #  key         type   level data_preview
+  1  :name       string    87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader",   "Leia Organa", ... ]
+  2  :height     uint16    46 [172, 167, 96, 202, 150, ... ], 6 nils
+  3  :mass       double    39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
+  4  :hair_color string    13 ["blond", nil, nil, "none", "brown", ... ], 5 nils
+  5  :skin_color string    31 ["fair", "gold", "white, blue", "white", "light", ..  . ]
+  6  :eye_color  string    15 ["blue", "yellow", "red", "yellow", "brown", ... ]
+  7  :birth_year double    37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
+  8  :sex        string     5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, nil=>4}
+  9  :gender     string     3 {"masculine"=>66, "feminine"=>17, nil=>4}
+  10 :homeworld  string    49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ], 10 nils
+  11 :species    string    38 ["Human", "Droid", "Droid", "Human", "Human", ... ], 4 nils
+  grouped = starwars.group(:species, :mean, [:mass, :height])
+  # =>
+  #<RedAmber::DataFrame : 38 x 3 Vectors, 0x000000000000fbf4>
+  Vectors : 2 numeric, 1 string
+  # key             type   level data_preview
+  1 :"mean(mass)"   double    27 [82.78181818181818, 69.75, 124.0, 74.0, 1358.0, ... ], 6 nils
+  2 :"mean(height)" double    32 [176.6451612903226, 131.2, 231.0, 173.0, 175.0, ... ]
+  3 :species        string    38 ["Human", "Droid", "Wookiee", "Rodian", "Hutt", ... ], 1 nil
+  count = starwars.group(:species, :count, :species)[:"count(species)"]
+  df = grouped.slice(count > 1)
+  # =>
+  #<RedAmber::DataFrame : 8 x 3 Vectors, 0x000000000000fc44>
+  Vectors : 2 numeric, 1 string
+  # key             type   level data_preview
+  1 :"mean(mass)"   double     8 [82.78181818181818, 69.75, 124.0, 74.0, 80.0, ... ]
+  2 :"mean(height)" double     8 [176.6451612903226, 131.2, 231.0, 208.66666666666666, 173.0, ... ]
+  3 :species        string     8 ["Human", "Droid", "Wookiee", "Gungan", "Zabrak", ... ]
+  df.table
+  # =>
+  #<Arrow::Table:0x1165593c8 ptr=0x7fb3db144c70>
+	mean(mass)	mean(height)	species
+  0	 82.781818	  176.645161	Human
+  1	 69.750000	  131.200000	Droid
+  2	124.000000	  231.000000	Wookiee
+  3	 74.000000	  208.666667	Gungan
+  4	 80.000000	  173.000000	Zabrak
+  5	 55.000000	  179.000000	Twi'lek
+  6	 53.100000	  168.000000	Mirialan
+  7	 88.000000	  221.000000	Kaminoan
+  ```
+  Available functions are:
+  - [ ] all
+  - [ ] any
+  - [ ] approximate_median
+  - ✓ count
+  - [ ] count_distinct
+  - [ ] distinct
+  - ✓ max
+  - ✓ mean
+  - ✓ min
+  - [ ] min_max
+  - ✓ product
+  - ✓ stddev
+  - ✓ sum
+  - [ ] tdigest
+  - ✓ variance
+## Combining DataFrames
+- [ ]  obs
+- [ ] Add vars
+- [ ] Inner join
+- [ ] Left join
+## Encoding
+- [ ] One-hot encoding
+## Iteration (not impremented)