RubyGems: red_amber 0.1.4 → 0.1.5

Files changed (26)

checksums.yaml +4 -4
data/.rubocop.yml +8 -8
data/CHANGELOG.md +74 -7
data/Gemfile +3 -0
data/README.md +47 -13
data/benchmark/csv_load_penguins.yml +15 -0
data/benchmark/drop_nil.yml +11 -0
data/doc/DataFrame.md +185 -35
data/doc/Vector.md +132 -10
data/doc/image/dataframe_model.png +0 -0
data/doc/tdr.md +14 -11
data/doc/tdr_ja.md +13 -10
data/lib/red_amber/data_frame.rb +38 -23
data/lib/red_amber/data_frame_displayable.rb +4 -3
data/lib/red_amber/data_frame_helper.rb +8 -8
data/lib/red_amber/data_frame_indexable.rb +38 -0
data/lib/red_amber/data_frame_observation_operation.rb +13 -2
data/lib/red_amber/data_frame_selectable.rb +14 -4
data/lib/red_amber/vector.rb +28 -5
data/lib/red_amber/vector_compensable.rb +68 -0
data/lib/red_amber/vector_functions.rb +16 -13
data/lib/red_amber/version.rb +1 -1
data/lib/red_amber.rb +5 -0
data/red_amber.gemspec +3 -6
metadata +12 -9
data/doc/image/TDR_operations.pdf +0 -0

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6ceace9db54b82c03ccf00fcd1b7bf2af57d94ea4e54183dc6af1da47e21ef00
-  data.tar.gz: f30578dcec45fd5efec9219c6438fd0108a0690b1cd69b1c398dffacd38aeba1
+  metadata.gz: 4d18eedf5de7fd06fe52e8a82ad38fe12d590dc10929c96872e557b9e946f785
+  data.tar.gz: dda93f0af421096410e00ecf2261e8846a236634bd96ae9941d1b5cd49cd5eb2
 SHA512:
-  metadata.gz: ee26fd212d0cb0758bc4611c5b43b302fe5c1b958239b5a9ac81ee09e936bdded733a719507e24e5434c33fc5d7ece43c973dd66d51413f23cc435ea0bd7570c
-  data.tar.gz: 674f56a11ddf906f608ecf7d7c852bec654a749e9052092553d19be967072d5acec95a096fbecc60ffd4b33fad3f4322354d93fade67230078fff15b6b7398dd
+  metadata.gz: 7c1b1edd6c1f6f3f275ea765c4bc8765327c88a36120a4c5a66dd8afa59f5913db4a5b436d80378554e03403bab823edf7467beea0f44e2803e36f3e9677a065
+  data.tar.gz: 949fd15d2076d4e53fb141375bde282228c7f6566e137047344134c54964fe77fd2f9757b0bdc324eb3cfa14091f2ae928e0e844d28f3ebbcfa17fc7d388bbd0
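
The digests above can be checked against a locally downloaded copy of the gem. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive. `digest_matches?` and `DIGESTS` are hypothetical names introduced for illustration, not part of RubyGems or red_amber.

```ruby
require 'digest/sha2'

# Hypothetical lookup table (an assumption for this sketch): maps the
# algorithm sections of checksums.yaml to Ruby's standard digest classes.
DIGESTS = { sha256: Digest::SHA256, sha512: Digest::SHA512 }.freeze

# Hypothetical helper: true when the digest of `bytes` equals the
# hex string recorded in checksums.yaml for the chosen algorithm.
def digest_matches?(bytes, expected_hex, algorithm: :sha256)
  DIGESTS.fetch(algorithm).hexdigest(bytes) == expected_hex.strip.downcase
end

# Usage sketch (paths are assumptions; copy the expected value in full
# from the SHA256 section of checksums.yaml):
#   bytes = File.binread('data.tar.gz')
#   digest_matches?(bytes, expected_sha256_from_checksums_yaml)
```

Comparing both the SHA256 and the SHA512 entries is cheap, so checking each listed digest rather than only one is a reasonable default.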

data/.rubocop.yml CHANGED

@@ -53,12 +53,10 @@ Layout/LineLength:
 # 18..30 unsatisfactory
 # > 30 dangerous
 Metrics/AbcSize:
-  Max: 23
+  Max: 30
   Exclude:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
-    - 'lib/red_amber/data_frame_selectable.rb' # Max: 27
-    - 'lib/red_amber/data_frame_observation_operation.rb' # Max: 29
-    - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 26
+    - 'lib/red_amber/vector_compensable.rb' # Max: 36
 # Max: 25
 Metrics/BlockLength:
@@ -68,21 +66,21 @@ Metrics/BlockLength:
 # Max: 100
 Metrics/ClassLength:
-  Max: 100
+  Max: 120
   Exclude:
     - 'test/**/*'
 # Max: 7
 Metrics/CyclomaticComplexity:
   Max: 12
+  Exclude:
+    - 'lib/red_amber/vector_compensable.rb' # Max: 14
 # Max: 10
 Metrics/MethodLength:
-  Max: 18
+  Max: 30
   Exclude:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
-    - 'lib/red_amber/data_frame_observation_operation.rb' # Max: 21
-    - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 20
 # Max: 100
 Metrics/ModuleLength:
@@ -93,6 +91,8 @@ Metrics/ModuleLength:
 # Max: 8
 Metrics/PerceivedComplexity:
   Max: 13
+  Exclude:
+    - 'lib/red_amber/vector_compensable.rb' # Max: 15
 # Necessary to define is_na
 Naming/PredicateName:

data/CHANGELOG.md CHANGED

@@ -1,19 +1,86 @@
-##  - Unreleased
+## [0.2.0] - unreleased
-- Feedback something to Red Arrow
+- Document
+  - YARD support
+- DataFrame#join features
+## [0.1.6] - Unreleased
+- Feedback something to Red Data Tools
 - `DataFrame`
-  - Introduce `group_by`
-  - Introduce `summarize`
   - Introduce `summary` or ``describe`
+  - Add `Quantile` by own code?
   - Improve dataframe obs. manipuration methods to accept float as a index (#10)
-  - More performant
+  - Improve as more performant by benchmark check.
 - `Vector`
   - Support more functions
+  - Support coerece
-- Document
-  - YARD support
+- More examples of frequently needed tasks
+## [0.1.5] - 2022-06-12 (experimental)
+- Bug fixes
+  - Fix DF#tdr to display timestamp type (#19)
+  - Add TZ setting in CI test to pass temporal tests (#19)
+  - Fix example in document of #load(csv_from_URI) (#23)
+- New features and improvements
+  - Improve usability of DataFrame manipulating block (#19)
+    - Add `DataFrame#v` to select a Vector
+    - Add `DataFrame#variables` method
+    - Add `DataFrame#to_arrow`
+    - Add instance variables in DataFrame with lazy initialization
+    - Add `Vector#key` to get key name
+    - Add `Vector#temporal?` to check if temporal type
+    - Refine around DataFrame#variables
+    - Refine init of instance variables
+    - Refine DataFrame#type_classes, V#ectortype_class
+    - Refine DataFrame#tdr to shorten temporal data
+  - Add supports to make up for missing values (#20)
+    - Add VectorArgumentError
+    - Add `Vector#replace_with`
+    - Add helper function to assert with NaN
+      - To assert NaN == NaN
+    - Add `Vector#fill_nil_backward`, `Vector#forward`
+    - Add `DataFrame#remove_nil` method
+    - Change to accept nil as replacement in Vector#replace_with
+  - Introduce index related methods (#22)
+    - Add `Vector#sort_indexes` method
+    - Add `Vector#uniq` method
+    - Add `Vector#tally` and `Vectorvalue_counts` methods
+    - Add `DataFrame#sort` method
+    - Add `DataFrame#group` method
+    - Change to use DataFrame#map_indices in #[]
+  - Add rounding functions with opts (#21)
+    -  With options :mode and :n_digits
+    -  :n_digits also can be specified with :multiple option in `Vector#round_to_multiple`
+    - `Vector#round`
+    - `Vector#ceil`
+    - `Vector#floor`
+    - `Vector#trunc`
+  - Documentation
+    - Update TDR, TDR_ja documents to latest (#18)
+    - Refinement and small fix in DataFrame.md (#18)
+    - Update README to use more effective example (#18)
+    - Delete expired TDR_operations.pdf (#23)
+    - Update README and dataframe_model image (#23)
+    - Update description about rover-df in README (#23)
+    - Add installation of Arrow in README (#23)
+  - Others
+    - Tried but cannot use bundler cache in ci test (#17)
+    - Bump up requirements to Arrow 8.0.0 (#25)
+      - Arrow 7.0.0 with Ubuntu 21.04 causes an fatal error in replace_with_mask function.
+    - Update the description of gem (#23)
+    - Add benchmark tests (#26)
 ## [0.1.4] - 2022-05-29 (experimental)

data/Gemfile CHANGED

@@ -14,4 +14,7 @@ group :test do
   gem 'test-unit'
   gem 'webrick'
+  gem 'benchmark_driver'
+  gem 'red-datasets-arrow'
 end

data/README.md CHANGED

@@ -3,18 +3,27 @@
 A simple dataframe library for Ruby (experimental)
 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
-- Simple API similar to [Rover-df](https://github.com/ankane/rover)
+- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 ## Requirements
 ```ruby
-gem 'red-arrow',   '>= 7.0.0'
-gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet
+gem 'red-arrow',   '>= 8.0.0'
+gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet
 gem 'rover-df',    '~> 0.3.0' # if you use IO from/to Rover::DataFrame
 ```
 ## Installation
+Install requirements before you install Red Amber.
+- Apache Arrow GLib (>= 8.0.0)
+- Apache Parquet GLib (>= 8.0.0)
+  See [Apache Arrow install document](https://arrow.apache.org/install/).
+  Minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section in ci test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
 Add this line to your Gemfile:
 ```ruby
@@ -41,8 +50,9 @@ Represents a set of data in 2D-shape.
 require 'red_amber'
 require 'datasets-arrow'
-penguins = Datasets::Penguins.new.to_arrow
-puts RedAmber::DataFrame.new(penguins).tdr
+arrow = Datasets::Penguins.new.to_arrow
+penguins = RedAmber::DataFrame.new(arrow)
+penguins.tdr
 # =>
 RedAmber::DataFrame : 344 x 8 Vectors
 Vectors : 5 numeric, 3 strings
@@ -71,12 +81,10 @@ Vector : 1 numeric
 1 :body_mass_g int64    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
 ```
-`DataFrame#assign` can accept a block and create new variables.
+`DataFrame#assign` creates new variables (column in the table).
 ```ruby
-df.assign do
-  {:body_mass_kg => penguins[:body_mass_g] / 1000.0}
-end
+df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
 # =>
 #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
 Vectors : 2 numeric
@@ -85,7 +93,33 @@ Vectors : 2 numeric
 2 :body_mass_kg double    95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils
 ```
-Other DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove` and `rename` also accept a block.
+DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
+This is an exaple to eliminate observations (row in the table) containing nil.
+```ruby
+# remove all observation contains nil
+nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
+nil_removed.tdr
+# =>
+RedAmber::DataFrame : 342 x 8 Vectors
+Vectors : 5 numeric, 3 strings
+# key                type   level data_preview
+1 :species           string     3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
+2 :island            string     3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
+3 :bill_length_mm    double   164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
+4 :bill_depth_mm     double    80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
+5 :flipper_length_mm int64     55 [181, 186, 195, 193, 190, ... ]
+6 :body_mass_g       int64     94 [3750, 3800, 3250, 3450, 3650, ... ]
+7 :sex               string     3 {"male"=>168, "female"=>165, ""=>9}
+8 :year              int64      3 {2007=>109, 2008=>114, 2009=>119}
+```
+For this frequently needed task, we can do it much simpler.
+```ruby
+penguins.remove_nil # => same result as above
+```
 See [DataFrame.md](doc/DataFrame.md) for details.
@@ -95,10 +129,10 @@ See [DataFrame.md](doc/DataFrame.md) for details.
 Class `RedAmber::Vector` represents a series of data in the DataFrame.
 ```ruby
-penguins[:species]
+penguins[:bill_length_mm]
 # =>
-#<RedAmber::Vector(:string, size=344):0x000000000000f8e8>
-["Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", ... ]
+#<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
+[39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
 ```
 Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
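
The README example added in this version removes observations with `vectors.map(&:is_nil).reduce(&:|)`. The sketch below imitates that idea with plain Ruby Arrays instead of RedAmber Vectors, assuming (as the README example suggests) that `is_nil` yields one boolean mask per variable and that `|` combines two masks element-wise. `remove_nil_rows` is a hypothetical name used only here.

```ruby
# Hypothetical stand-in for the README's block; `columns` is a Hash of
# key => Array, a toy model of a DataFrame, not the RedAmber API.
def remove_nil_rows(columns)
  # One boolean mask per column: true where the value is nil.
  nil_masks = columns.values.map { |column| column.map(&:nil?) }

  # Element-wise OR across the masks: true where any column has nil in that row.
  row_has_nil = nil_masks.reduce do |acc, mask|
    acc.zip(mask).map { |a, b| a || b }
  end

  # Keep only the rows whose mask entry is false.
  columns.transform_values do |column|
    column.reject.with_index { |_, i| row_has_nil[i] }
  end
end

toy = {
  species:     ['Adelie', 'Adelie', nil, 'Gentoo'],
  body_mass_g: [3750, nil, 3250, 5000],
  year:        [2007, 2007, 2007, 2009]
}
p remove_nil_rows(toy)
```

Plain Arrays need the explicit `zip`, because `Array#|` is a set union rather than an element-wise OR. In the README version the reduction is written directly as `reduce(&:|)` on the Vectors.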

data/benchmark/csv_load_penguins.yml ADDED

@@ -0,0 +1,15 @@
+prelude: |
+  require 'datasets-arrow'
+  require 'rover'
+  require 'red_amber'
+  penguins_csv = 'benchmark/cache/penguins.csv'
+  unless File.exist?(penguins_csv)
+    arrow = Datasets::Penguins.new.to_arrow
+    RedAmber::DataFrame.new(arrow).save(penguins_csv)
+  end
+benchmark:
+  'penguins by Rover': Rover.read_csv(penguins_csv)
+  'penguins by RedAmber': RedAmber::DataFrame.load(penguins_csv)

data/benchmark/drop_nil.yml ADDED

@@ -0,0 +1,11 @@
+prelude: |
+  require 'datasets-arrow'
+  require 'red_amber'
+  penguins = RedAmber::DataFrame.new(Datasets::Penguins.new.to_arrow)
+  def drop_nil(penguins)
+    penguins.remove { vectors.map { |v| v.is_nil} }
+  end
+benchmark: drop_nil(penguins)

data/doc/DataFrame.md CHANGED

@@ -1,6 +1,6 @@
 # DataFrame
-Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
+Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
 - A collection of data which have same data type within. We call it `Vector`.
 - A label is attached to `Vector`. We call it `key`.
 - A `Vector` and associated `key` is grouped as a `variable`.
@@ -11,13 +11,13 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ## Constructors and saving
-### `new` from a columnar Hash
+### `new` from a Hash
   ```ruby
   RedAmber::DataFrame.new(x: [1, 2, 3])
   ```
-### `new` from a schema (by Hash) and rows (by Array)
+### `new` from a schema (by Hash) and data (by Array)
   ```ruby
   RedAmber::DataFrame.new({:x=>:uint8}, [[1], [2], [3]])
@@ -52,7 +52,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 - from a URI
   ```ruby
-  uri = URI("https://github.com/heronshoes/red_amber/blob/master/test/entity/with_header.csv")
+  uri = URI("uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
   RedAmber::DataFrame.load(uri)
   ```
@@ -78,7 +78,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ## Properties
-### `table`
+### `table`, `to_arrow`
 - Reader of Arrow::Table object inside.
@@ -93,16 +93,53 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ### `shape`
 - Returns shape in an Array[n_rows, n_cols].
+### `variables`
+- Returns key names and Vectors pair in a Hash.
+  It is convenient to use in a block when both key and vector required. We will write:
+  ```ruby
+    # update numeric variables
+    df.assign do
+      variables.select.with_object({}) do |(key, vector), assigner|
+        assigner[key] = vector * -1 if vector.numeric?
+      end
+    end
+  ```
+  Instead of:
+  ```ruby
+    df.assign do
+      assigner = {}
+      vectors.each_with_index do |vector, i|
+        assigner[keys[i]] = vector * -1 if vector.numeric?
+      end
+      assigner
+    end
+  ```
 ### `keys`, `var_names`, `column_names`
 - Returns key names in an Array.
+  When we use it with vectors, Vector#key is useful to get the key inside of DataFrame.
+  ```ruby
+    # update numeric variables, another solution
+    df.assign do
+      vectors.each_with_object({}) do |vector, assigner|
+        assigner[vector.key] = vector * -1 if vector.numeric?
+      end
+    end
+  ```
 ### `types`
 - Returns types of vectors in an Array of Symbols.
-### `data_types`
+### `type_classes`
 - Returns types of vector in an Array of `Arrow::DataType`.
@@ -167,7 +204,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
   8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}
   ```
-  - limit: limits variable number to show. Default value is 10.
+  - limit: limit of variables to show. Default value is 10.
   - tally: max level to use tally mode.
   - elements: max num of element to show values in each observations.
@@ -224,7 +261,16 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
   #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
   [1, 2, 3]
   ```
-  This may be useful to use in a block of DataFrame manipulations.
+  Or `#v` method also returns a Vector for a key.
+  ```ruby
+  df.v(:a)
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
+  [1, 2, 3]
+  ```
+  This may be useful to use in a block of DataFrame manipulation verbs. We can write `v(:a)` rather than `self[:a]` or `df[:a]`
 ### Select observations (rows in a table) by `[]` as `[index]`, `[range]`, `[array]`
@@ -267,13 +313,13 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     3 :c  double     1 [1.0]
     ```
-### Select rows from top or bottom
+### Select rows from top or from bottom
   `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`
 ## Sub DataFrame manipulations
-### `pick`
+### `pick  ` - pick up variables by key label -
   Pick up some variables (columns) to create a sub DataFrame.
@@ -313,6 +359,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     `pick {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return keys, or a boolean Array with a same length as `n_keys`. Block is called in the context of self.
     ```ruby
+    # It is ok to write `keys ...` in the block, not `penguins.keys ...`
     penguins.pick { keys.map { |key| key.end_with?('mm') } }
     # =>
     #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000000f1cc>
@@ -323,7 +370,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     3 :flipper_length_mm int64     56 [181, 186, 195, nil, 193, ... ], 2 nils
     ```
-### `drop`
+### `drop  ` - pick and drop -
   Drop some variables (columns) to create a remainer DataFrame.
@@ -352,15 +399,10 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
   ```
 - Difference between `pick`/`drop` and `[]`
-  If `pick` or `drop` will select single variable (column), it returns a `DataFrame` with one variable. In contrast, `[]` returns a `Vector`.
+  If `pick` or `drop` will select a single variable (column), it returns a `DataFrame` with one variable. In contrast, `[]` returns a `Vector`. This behavior may be useful to use in a block of DataFrame manipulations.
   ```ruby
   df = RedAmber::DataFrame.new(a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3])
-  df[:a]
-  # =>
-  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
-  [1, 2, 3]
   df.pick(:a) # or
   df.drop(:b, :c)
   # =>
@@ -368,9 +410,14 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
   Vector : 1 numeric
   # key type  level data_preview
   1 :a  uint8     3 [1, 2, 3]
+  df[:a]
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
+  [1, 2, 3]
   ```
-### `slice`
+### `slice  `  - to cut vertically is slice -
   Slice and select observations (rows) to create a sub DataFrame.
@@ -488,17 +535,17 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
     removed.tdr
     # =>
-    RedAmber::DataFrame : 342 x 8 Vectors
+    RedAmber::DataFrame : 333 x 8 Vectors
     Vectors : 5 numeric, 3 strings
     # key                type   level data_preview
-    1 :species           string     3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
-    2 :island            string     3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
-    3 :bill_length_mm    double   164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
-    4 :bill_depth_mm     double    80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
-    5 :flipper_length_mm int64     55 [181, 186, 195, 193, 190, ... ]
-    6 :body_mass_g       int64     94 [3750, 3800, 3250, 3450, 3650, ... ]
-    7 :sex               string     3 {"male"=>168, "female"=>165, ""=>9}
-    8 :year              int64      3 {2007=>109, 2008=>114, 2009=>119}
+    1 :species           string     3 {"Adelie"=>146, "Chinstrap"=>68, "Gentoo"=>119}
+    2 :island            string     3 {"Torgersen"=>47, "Biscoe"=>163, "Dream"=>123}
+    3 :bill_length_mm    double   163 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
+    4 :bill_depth_mm     double    79 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
+    5 :flipper_length_mm uint8     54 [181, 186, 195, 193, 190, ... ]
+    6 :body_mass_g       uint16    93 [3750, 3800, 3250, 3450, 3650, ... ]
+    7 :sex               string     2 {"male"=>168, "female"=>165}
+    8 :year              uint16     3 {2007=>103, 2008=>113, 2009=>117}
     ```
 - Keys or booleans by a block
@@ -583,7 +630,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ### `assign`
-  Assign new variables (columns) and create a updated DataFrame.
+  Assign new or updated variables (columns) and create a updated DataFrame.
   - Variables with new keys will append new variables at bottom (right in the table).
   - Variables with exisiting keys will update corresponding vectors.
@@ -649,6 +696,14 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     1 :index  int8       5 [0, -1, -2, -3, nil], 1 nil
     2 :float  double     5 [-0.0, -1.1, -2.2, NaN, nil], 1 NaN, 1 nil
     3 :string string     5 ["A", "B", "C", "D", nil], 1 nil
+    # Or it ’s shorter like this:
+    df.assign do
+      variables.select.with_object({}) do |(key, vector), assigner|
+        assigner[key] = vector * -1 if vector.numeric?
+      end
+    end
+    # => same as above
     ```
 - Key type
@@ -657,21 +712,116 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ## Updating
-- [ ] Update elements matching a condition
+### `sort`
-- [ ] Clamp
+  `sort` accepts parameters as sort_keys thanks to the amazing Red Arrow feature。
+    - :key, "key" or "+key" denotes ascending order
+    - "-key" denotes descending order
+  ```ruby
+  df = RedAmber::DataFrame.new({
+        index:  [1, 1, 0, nil, 0],
+        string: ['C', 'B', nil, 'A', 'B'],
+        bool:   [nil, true, false, true, false],
+      })
+  df.sort(:index, '-bool').tdr(tally: 0)
+  # =>
+  RedAmber::DataFrame : 5 x 3 Vectors
+  Vectors : 1 numeric, 1 string, 1 boolean
+  # key     type    level data_preview
+  1 :index  uint8       3 [0, 0, 1, 1, nil], 1 nil
+  2 :string string      4 [nil, "B", "B", "C", "A"], 1 nil
+  3 :bool   boolean     3 [false, false, true, nil, true], 1 nil
+  ```
-- [ ] Sort rows
+- [ ] Clamp
 - [ ] Clear data
 ## Treat na data
-- [ ] Drop na (NaN, nil)
+### `remove_nil`
+  Remove any observations containing nil.
+## Grouping
+### `group(aggregating_keys, function, target_keys)`
-- [ ] Replace na with value
+  Create grouped dataframe by `aggregation_keys` and apply `function` to each group and returns in `target_keys`. Aggregated key name is `function(key)` style.
+  (The current implementation is not intuitive. Needs improvement.)
+  ```ruby
+  ds = Datasets::Rdatasets.new('dplyr', 'starwars')
+  starwars = RedAmber::DataFrame.new(ds.to_table.to_h)
+  starwars.tdr(11)
+  # =>
+  RedAmber::DataFrame : 87 x 11 Vectors
+  Vectors : 3 numeric, 8 strings
+  #  key         type   level data_preview
+  1  :name       string    87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader",   "Leia Organa", ... ]
+  2  :height     uint16    46 [172, 167, 96, 202, 150, ... ], 6 nils
+  3  :mass       double    39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
+  4  :hair_color string    13 ["blond", nil, nil, "none", "brown", ... ], 5 nils
+  5  :skin_color string    31 ["fair", "gold", "white, blue", "white", "light", ..  . ]
+  6  :eye_color  string    15 ["blue", "yellow", "red", "yellow", "brown", ... ]
+  7  :birth_year double    37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
+  8  :sex        string     5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, nil=>4}
+  9  :gender     string     3 {"masculine"=>66, "feminine"=>17, nil=>4}
+  10 :homeworld  string    49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ], 10 nils
+  11 :species    string    38 ["Human", "Droid", "Droid", "Human", "Human", ... ], 4 nils
+  grouped = starwars.group(:species, :mean, [:mass, :height])
+  # =>
+  #<RedAmber::DataFrame : 38 x 3 Vectors, 0x000000000000fbf4>
+  Vectors : 2 numeric, 1 string
+  # key             type   level data_preview
+  1 :"mean(mass)"   double    27 [82.78181818181818, 69.75, 124.0, 74.0, 1358.0, ... ], 6 nils
+  2 :"mean(height)" double    32 [176.6451612903226, 131.2, 231.0, 173.0, 175.0, ... ]
+  3 :species        string    38 ["Human", "Droid", "Wookiee", "Rodian", "Hutt", ... ], 1 nil
+  count = starwars.group(:species, :count, :species)[:"count(species)"]
+  df = grouped.slice(count > 1)
+  # =>
+  #<RedAmber::DataFrame : 8 x 3 Vectors, 0x000000000000fc44>
+  Vectors : 2 numeric, 1 string
+  # key             type   level data_preview
+  1 :"mean(mass)"   double     8 [82.78181818181818, 69.75, 124.0, 74.0, 80.0, ... ]
+  2 :"mean(height)" double     8 [176.6451612903226, 131.2, 231.0, 208.66666666666666, 173.0, ... ]
+  3 :species        string     8 ["Human", "Droid", "Wookiee", "Gungan", "Zabrak", ... ]
+  df.table
+  # =>
+  #<Arrow::Table:0x1165593c8 ptr=0x7fb3db144c70>
+	mean(mass)	mean(height)	species
+  0	 82.781818	  176.645161	Human
+  1	 69.750000	  131.200000	Droid
+  2	124.000000	  231.000000	Wookiee
+  3	 74.000000	  208.666667	Gungan
+  4	 80.000000	  173.000000	Zabrak
+  5	 55.000000	  179.000000	Twi'lek
+  6	 53.100000	  168.000000	Mirialan
+  7	 88.000000	  221.000000	Kaminoan
+  ```
-- [ ] Interpolate na with convolution array
+  Available functions are:
+  - [ ] all
+  - [ ] any
+  - [ ] approximate_median
+  - ✓ count
+  - [ ] count_distinct
+  - [ ] distinct
+  - ✓ max
+  - ✓ mean
+  - ✓ min
+  - [ ] min_max
+  - ✓ product
+  - ✓ stddev
+  - ✓ sum
+  - [ ] tdigest
+  - ✓ variance
 ## Combining DataFrames