RubyGems - red_amber - Versions diffs - 0.1.4 → 0.1.5 - Mend

red_amber 0.1.4 → 0.1.5

Files changed (26)

checksums.yaml +4 -4
data/.rubocop.yml +8 -8
data/CHANGELOG.md +74 -7
data/Gemfile +3 -0
data/README.md +47 -13
data/benchmark/csv_load_penguins.yml +15 -0
data/benchmark/drop_nil.yml +11 -0
data/doc/DataFrame.md +185 -35
data/doc/Vector.md +132 -10
data/doc/image/dataframe_model.png +0 -0
data/doc/tdr.md +14 -11
data/doc/tdr_ja.md +13 -10
data/lib/red_amber/data_frame.rb +38 -23
data/lib/red_amber/data_frame_displayable.rb +4 -3
data/lib/red_amber/data_frame_helper.rb +8 -8
data/lib/red_amber/data_frame_indexable.rb +38 -0
data/lib/red_amber/data_frame_observation_operation.rb +13 -2
data/lib/red_amber/data_frame_selectable.rb +14 -4
data/lib/red_amber/vector.rb +28 -5
data/lib/red_amber/vector_compensable.rb +68 -0
data/lib/red_amber/vector_functions.rb +16 -13
data/lib/red_amber/version.rb +1 -1
data/lib/red_amber.rb +5 -0
data/red_amber.gemspec +3 -6
metadata +12 -9
data/doc/image/TDR_operations.pdf +0 -0

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6ceace9db54b82c03ccf00fcd1b7bf2af57d94ea4e54183dc6af1da47e21ef00
-  data.tar.gz: f30578dcec45fd5efec9219c6438fd0108a0690b1cd69b1c398dffacd38aeba1
+  metadata.gz: 4d18eedf5de7fd06fe52e8a82ad38fe12d590dc10929c96872e557b9e946f785
+  data.tar.gz: dda93f0af421096410e00ecf2261e8846a236634bd96ae9941d1b5cd49cd5eb2
 SHA512:
-  metadata.gz: ee26fd212d0cb0758bc4611c5b43b302fe5c1b958239b5a9ac81ee09e936bdded733a719507e24e5434c33fc5d7ece43c973dd66d51413f23cc435ea0bd7570c
-  data.tar.gz: 674f56a11ddf906f608ecf7d7c852bec654a749e9052092553d19be967072d5acec95a096fbecc60ffd4b33fad3f4322354d93fade67230078fff15b6b7398dd
+  metadata.gz: 7c1b1edd6c1f6f3f275ea765c4bc8765327c88a36120a4c5a66dd8afa59f5913db4a5b436d80378554e03403bab823edf7467beea0f44e2803e36f3e9677a065
+  data.tar.gz: 949fd15d2076d4e53fb141375bde282228c7f6566e137047344134c54964fe77fd2f9757b0bdc324eb3cfa14091f2ae928e0e844d28f3ebbcfa17fc7d388bbd0

data/.rubocop.yml CHANGED

@@ -53,12 +53,10 @@ Layout/LineLength:
 # 18..30 unsatisfactory
 # > 30 dangerous
 Metrics/AbcSize:
-  Max: 23
+  Max: 30
   Exclude:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
-    - 'lib/red_amber/data_frame_selectable.rb' # Max: 27
-    - 'lib/red_amber/data_frame_observation_operation.rb' # Max: 29
-    - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 26
+    - 'lib/red_amber/vector_compensable.rb' # Max: 36
 # Max: 25
 Metrics/BlockLength:
@@ -68,21 +66,21 @@ Metrics/BlockLength:
 # Max: 100
 Metrics/ClassLength:
-  Max: 100
+  Max: 120
   Exclude:
     - 'test/**/*'
 # Max: 7
 Metrics/CyclomaticComplexity:
   Max: 12
+  Exclude:
+    - 'lib/red_amber/vector_compensable.rb' # Max: 14
 # Max: 10
 Metrics/MethodLength:
-  Max: 18
+  Max: 30
   Exclude:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 33
-    - 'lib/red_amber/data_frame_observation_operation.rb' # Max: 21
-    - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 20
 # Max: 100
 Metrics/ModuleLength:
@@ -93,6 +91,8 @@ Metrics/ModuleLength:
 # Max: 8
 Metrics/PerceivedComplexity:
   Max: 13
+  Exclude:
+    - 'lib/red_amber/vector_compensable.rb' # Max: 15
 # Necessary to define is_na
 Naming/PredicateName:

data/CHANGELOG.md CHANGED

@@ -1,19 +1,86 @@
-##  - Unreleased
+## [0.2.0] - unreleased
-- Feedback something to Red Arrow
+- Document
+  - YARD support
+- DataFrame#join features
+## [0.1.6] - Unreleased
+- Feedback something to Red Data Tools
 - `DataFrame`
-  - Introduce `group_by`
-  - Introduce `summarize`
   - Introduce `summary` or `describe`
+  - Add `Quantile` by own code?
   - Improve dataframe obs. manipulation methods to accept float as an index (#10)
-  - More performant
+  - Improve as more performant by benchmark check.
 - `Vector`
   - Support more functions
+  - Support coerce
-- Document
-  - YARD support
+- More examples of frequently needed tasks
+## [0.1.5] - 2022-06-12 (experimental)
+- Bug fixes
+  - Fix DF#tdr to display timestamp type (#19)
+  - Add TZ setting in CI test to pass temporal tests (#19)
+  - Fix example in document of #load(csv_from_URI) (#23)
+- New features and improvements
+  - Improve usability of DataFrame manipulating block (#19)
+    - Add `DataFrame#v` to select a Vector
+    - Add `DataFrame#variables` method
+    - Add `DataFrame#to_arrow`
+    - Add instance variables in DataFrame with lazy initialization
+    - Add `Vector#key` to get key name
+    - Add `Vector#temporal?` to check if temporal type
+    - Refine around DataFrame#variables
+    - Refine init of instance variables
+    - Refine DataFrame#type_classes, Vector#type_class
+    - Refine DataFrame#tdr to shorten temporal data
+  - Add supports to make up for missing values (#20)
+    - Add VectorArgumentError
+    - Add `Vector#replace_with`
+    - Add helper function to assert with NaN
+      - To assert NaN == NaN
+    - Add `Vector#fill_nil_backward`, `Vector#fill_nil_forward`
+    - Add `DataFrame#remove_nil` method
+    - Change to accept nil as replacement in Vector#replace_with
+  - Introduce index related methods (#22)
+    - Add `Vector#sort_indexes` method
+    - Add `Vector#uniq` method
+    - Add `Vector#tally` and `Vector#value_counts` methods
+    - Add `DataFrame#sort` method
+    - Add `DataFrame#group` method
+    - Change to use DataFrame#map_indices in #[]
+  - Add rounding functions with opts (#21)
+    -  With options :mode and :n_digits
+    -  :n_digits also can be specified with :multiple option in `Vector#round_to_multiple`
+    - `Vector#round`
+    - `Vector#ceil`
+    - `Vector#floor`
+    - `Vector#trunc`
+  - Documentation
+    - Update TDR, TDR_ja documents to latest (#18)
+    - Refinement and small fix in DataFrame.md (#18)
+    - Update README to use more effective example (#18)
+    - Delete expired TDR_operations.pdf (#23)
+    - Update README and dataframe_model image (#23)
+    - Update description about rover-df in README (#23)
+    - Add installation of Arrow in README (#23)
+  - Others
+    - Tried but cannot use bundler cache in ci test (#17)
+    - Bump up requirements to Arrow 8.0.0 (#25)
+      - Arrow 7.0.0 with Ubuntu 21.04 causes a fatal error in replace_with_mask function.
+    - Update the description of gem (#23)
+    - Add benchmark tests (#26)
 ## [0.1.4] - 2022-05-29 (experimental)
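
The changelog above lists `Vector#fill_nil_backward` and a forward counterpart for making up for missing values. As a rough illustration of what forward and backward filling mean, here is a minimal plain-Ruby sketch on Arrays. It is not RedAmber's implementation; `fill_forward` and `fill_backward` are hypothetical helper names, and the behavior for leading or trailing nils (left as nil) is an assumption of this sketch.

```ruby
# Plain-Ruby illustration of forward/backward nil filling on an Array.
# Not RedAmber code: both helper names are hypothetical.

# Replace each nil with the most recent non-nil value seen so far.
# Leading nils have nothing to copy from, so they stay nil.
def fill_forward(values)
  last = nil
  values.map { |v| v.nil? ? last : (last = v) }
end

# Replace each nil with the next non-nil value that follows it.
# Trailing nils stay nil.
def fill_backward(values)
  fill_forward(values.reverse).reverse
end

data = [nil, 1, nil, nil, 4, nil]
p fill_forward(data)   # => [nil, 1, 1, 1, 4, 4]
p fill_backward(data)  # => [1, 1, 4, 4, 4, nil]
```

Backward filling is expressed here as forward filling on the reversed Array, which keeps the two helpers consistent with each other.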

data/Gemfile CHANGED

@@ -14,4 +14,7 @@ group :test do
   gem 'test-unit'
   gem 'webrick'
+  gem 'benchmark_driver'
+  gem 'red-datasets-arrow'
 end

data/README.md CHANGED

@@ -3,18 +3,27 @@
 A simple dataframe library for Ruby (experimental)
 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
-- Simple API similar to [Rover-df](https://github.com/ankane/rover)
+- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 ## Requirements
 ```ruby
-gem 'red-arrow',   '>= 7.0.0'
-gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet
+gem 'red-arrow',   '>= 8.0.0'
+gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet
 gem 'rover-df',    '~> 0.3.0' # if you use IO from/to Rover::DataFrame
 ```
 ## Installation
+Install requirements before you install Red Amber.
+- Apache Arrow GLib (>= 8.0.0)
+- Apache Parquet GLib (>= 8.0.0)
+  See [Apache Arrow install document](https://arrow.apache.org/install/).
+  Minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section in ci test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
 Add this line to your Gemfile:
 ```ruby
@@ -41,8 +50,9 @@ Represents a set of data in 2D-shape.
 require 'red_amber'
 require 'datasets-arrow'
-penguins = Datasets::Penguins.new.to_arrow
-puts RedAmber::DataFrame.new(penguins).tdr
+arrow = Datasets::Penguins.new.to_arrow
+penguins = RedAmber::DataFrame.new(arrow)
+penguins.tdr
 # =>
 RedAmber::DataFrame : 344 x 8 Vectors
 Vectors : 5 numeric, 3 strings
@@ -71,12 +81,10 @@ Vector : 1 numeric
 1 :body_mass_g int64    95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
 ```
-`DataFrame#assign` can accept a block and create new variables.
+`DataFrame#assign` creates new variables (column in the table).
 ```ruby
-df.assign do
-  {:body_mass_kg => penguins[:body_mass_g] / 1000.0}
-end
+df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
 # =>
 #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
 Vectors : 2 numeric
@@ -85,7 +93,33 @@ Vectors : 2 numeric
 2 :body_mass_kg double    95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils
 ```
-Other DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove` and `rename` also accept a block.
+DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
+This is an example to eliminate observations (rows in the table) containing nil.
+```ruby
+# remove all observations containing nil
+nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
+nil_removed.tdr
+# =>
+RedAmber::DataFrame : 342 x 8 Vectors
+Vectors : 5 numeric, 3 strings
+# key                type   level data_preview
+1 :species           string     3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
+2 :island            string     3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
+3 :bill_length_mm    double   164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
+4 :bill_depth_mm     double    80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
+5 :flipper_length_mm int64     55 [181, 186, 195, 193, 190, ... ]
+6 :body_mass_g       int64     94 [3750, 3800, 3250, 3450, 3650, ... ]
+7 :sex               string     3 {"male"=>168, "female"=>165, ""=>9}
+8 :year              int64      3 {2007=>109, 2008=>114, 2009=>119}
+```
+For this frequently needed task, we can do it much simpler.
+```ruby
+penguins.remove_nil # => same result as above
+```
 See [DataFrame.md](doc/DataFrame.md) for details.
@@ -95,10 +129,10 @@ See [DataFrame.md](doc/DataFrame.md) for details.
 Class `RedAmber::Vector` represents a series of data in the DataFrame.
 ```ruby
-penguins[:species]
+penguins[:bill_length_mm]
 # =>
-#<RedAmber::Vector(:string, size=344):0x000000000000f8e8>
-["Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", ... ]
+#<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
+[39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
 ```
 Vectors accept some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
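
The README example above builds a row mask with `vectors.map(&:is_nil).reduce(&:|)`: one boolean mask per column, combined by element-wise OR. The following plain-Ruby sketch mirrors that idea with Arrays standing in for Vectors. It is only an illustration under that assumption; `columns`, `nil_masks`, `row_has_nil` and `kept` are hypothetical names and the data is made up.

```ruby
# Plain-Ruby sketch of the row mask behind
# `vectors.map(&:is_nil).reduce(&:|)`; Arrays stand in for Vectors.
# All names and values here are hypothetical, not RedAmber API.

columns = {
  bill_length_mm: [39.1, nil, 40.3],
  body_mass_g:    [3750, 3800, nil],
  species:        %w[Adelie Adelie Gentoo]
}

# One boolean mask per column: true where the value is nil.
nil_masks = columns.values.map { |col| col.map(&:nil?) }

# Element-wise OR across columns: true if any column is nil in that row.
row_has_nil = nil_masks.reduce { |acc, mask| acc.zip(mask).map { |a, b| a | b } }

# Keep only the rows whose mask entry is false.
kept = columns.transform_values do |col|
  col.reject.with_index { |_, i| row_has_nil[i] }
end

p row_has_nil  # => [false, true, true]
p kept
```

Only the first row survives in this toy data, because the second and third rows each contain a nil in some column.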

data/benchmark/csv_load_penguins.yml ADDED

@@ -0,0 +1,15 @@
+prelude: |
+  require 'datasets-arrow'
+  require 'rover'
+  require 'red_amber'
+  penguins_csv = 'benchmark/cache/penguins.csv'
+  unless File.exist?(penguins_csv)
+    arrow = Datasets::Penguins.new.to_arrow
+    RedAmber::DataFrame.new(arrow).save(penguins_csv)
+  end
+benchmark:
+  'penguins by Rover': Rover.read_csv(penguins_csv)
+  'penguins by RedAmber': RedAmber::DataFrame.load(penguins_csv)

data/benchmark/drop_nil.yml ADDED

@@ -0,0 +1,11 @@
+prelude: |
+  require 'datasets-arrow'
+  require 'red_amber'
+  penguins = RedAmber::DataFrame.new(Datasets::Penguins.new.to_arrow)
+  def drop_nil(penguins)
+    penguins.remove { vectors.map { |v| v.is_nil} }
+  end
+benchmark: drop_nil(penguins)

data/doc/DataFrame.md CHANGED

@@ -1,6 +1,6 @@
 # DataFrame
-Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
+Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists of:
 - A collection of data which have same data type within. We call it `Vector`.
 - A label is attached to `Vector`. We call it `key`.
 - A `Vector` and associated `key` is grouped as a `variable`.
@@ -11,13 +11,13 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ## Constructors and saving
-### `new` from a columnar Hash
+### `new` from a Hash
   ```ruby
   RedAmber::DataFrame.new(x: [1, 2, 3])
   ```
-### `new` from a schema (by Hash) and rows (by Array)
+### `new` from a schema (by Hash) and data (by Array)
   ```ruby
   RedAmber::DataFrame.new({:x=>:uint8}, [[1], [2], [3]])
@@ -52,7 +52,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 - from a URI
   ```ruby
-  uri = URI("https://github.com/heronshoes/red_amber/blob/master/test/entity/with_header.csv")
+  uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
   RedAmber::DataFrame.load(uri)
   ```
@@ -78,7 +78,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ## Properties
-### `table`
+### `table`, `to_arrow`
 - Reader of Arrow::Table object inside.
@@ -93,16 +93,53 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ### `shape`
 - Returns shape in an Array[n_rows, n_cols].
+### `variables`
+- Returns key names and Vectors pair in a Hash.
+  It is convenient to use in a block when both key and vector are required. We can write:
+  ```ruby
+    # update numeric variables
+    df.assign do
+      variables.select.with_object({}) do |(key, vector), assigner|
+        assigner[key] = vector * -1 if vector.numeric?
+      end
+    end
+  ```
+  Instead of:
+  ```ruby
+    df.assign do
+      assigner = {}
+      vectors.each_with_index do |vector, i|
+        assigner[keys[i]] = vector * -1 if vector.numeric?
+      end
+      assigner
+    end
+  ```
 ### `keys`, `var_names`, `column_names`
 - Returns key names in an Array.
+  When we use it with vectors, Vector#key is useful to get the key inside of DataFrame.
+  ```ruby
+    # update numeric variables, another solution
+    df.assign do
+      vectors.each_with_object({}) do |vector, assigner|
+        assigner[vector.key] = vector * -1 if vector.numeric?
+      end
+    end
+  ```
 ### `types`
 - Returns types of vectors in an Array of Symbols.
-### `data_types`
+### `type_classes`
 - Returns types of vector in an Array of `Arrow::DataType`.
@@ -167,7 +204,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
   8 :year              uint16     3 {2007=>110, 2008=>114, 2009=>120}
   ```
-  - limit: limits variable number to show. Default value is 10.
+  - limit: limit of variables to show. Default value is 10.
   - tally: max level to use tally mode.
   - elements: max num of element to show values in each observations.
@@ -224,7 +261,16 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
   #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
   [1, 2, 3]
   ```
-  This may be useful to use in a block of DataFrame manipulations.
+  Or `#v` method also returns a Vector for a key.
+  ```ruby
+  df.v(:a)
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
+  [1, 2, 3]
+  ```
+  This may be useful to use in a block of DataFrame manipulation verbs. We can write `v(:a)` rather than `self[:a]` or `df[:a]`.
 ### Select observations (rows in a table) by `[]` as `[index]`, `[range]`, `[array]`
@@ -267,13 +313,13 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     3 :c  double     1 [1.0]
     ```
-### Select rows from top or bottom
+### Select rows from top or from bottom
   `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`
 ## Sub DataFrame manipulations
-### `pick`
+### `pick  ` - pick up variables by key label -
   Pick up some variables (columns) to create a sub DataFrame.
@@ -313,6 +359,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     `pick {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return keys, or a boolean Array with the same length as `n_keys`. The block is called in the context of self.
     ```ruby
+    # It is ok to write `keys ...` in the block, not `penguins.keys ...`
     penguins.pick { keys.map { |key| key.end_with?('mm') } }
     # =>
     #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000000f1cc>
@@ -323,7 +370,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     3 :flipper_length_mm int64     56 [181, 186, 195, nil, 193, ... ], 2 nils
     ```
-### `drop`
+### `drop  ` - pick and drop -
   Drop some variables (columns) to create a remainder DataFrame.
@@ -352,15 +399,10 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
   ```
 - Difference between `pick`/`drop` and `[]`
-  If `pick` or `drop` will select single variable (column), it returns a `DataFrame` with one variable. In contrast, `[]` returns a `Vector`.
+  If `pick` or `drop` will select a single variable (column), it returns a `DataFrame` with one variable. In contrast, `[]` returns a `Vector`. This behavior may be useful to use in a block of DataFrame manipulations.
   ```ruby
   df = RedAmber::DataFrame.new(a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3])
-  df[:a]
-  # =>
-  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
-  [1, 2, 3]
   df.pick(:a) # or
   df.drop(:b, :c)
   # =>
@@ -368,9 +410,14 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
   Vector : 1 numeric
   # key type  level data_preview
   1 :a  uint8     3 [1, 2, 3]
+  df[:a]
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
+  [1, 2, 3]
   ```
-### `slice`
+### `slice  `  - to cut vertically is slice -
   Slice and select observations (rows) to create a sub DataFrame.
@@ -488,17 +535,17 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
     removed.tdr
     # =>
-    RedAmber::DataFrame : 342 x 8 Vectors
+    RedAmber::DataFrame : 333 x 8 Vectors
     Vectors : 5 numeric, 3 strings
     # key                type   level data_preview
-    1 :species           string     3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
-    2 :island            string     3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
-    3 :bill_length_mm    double   164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
-    4 :bill_depth_mm     double    80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
-    5 :flipper_length_mm int64     55 [181, 186, 195, 193, 190, ... ]
-    6 :body_mass_g       int64     94 [3750, 3800, 3250, 3450, 3650, ... ]
-    7 :sex               string     3 {"male"=>168, "female"=>165, ""=>9}
-    8 :year              int64      3 {2007=>109, 2008=>114, 2009=>119}
+    1 :species           string     3 {"Adelie"=>146, "Chinstrap"=>68, "Gentoo"=>119}
+    2 :island            string     3 {"Torgersen"=>47, "Biscoe"=>163, "Dream"=>123}
+    3 :bill_length_mm    double   163 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
+    4 :bill_depth_mm     double    79 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
+    5 :flipper_length_mm uint8     54 [181, 186, 195, 193, 190, ... ]
+    6 :body_mass_g       uint16    93 [3750, 3800, 3250, 3450, 3650, ... ]
+    7 :sex               string     2 {"male"=>168, "female"=>165}
+    8 :year              uint16     3 {2007=>103, 2008=>113, 2009=>117}
     ```
 - Keys or booleans by a block
@@ -583,7 +630,7 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ### `assign`
-  Assign new variables (columns) and create a updated DataFrame.
+  Assign new or updated variables (columns) and create an updated DataFrame.
   - Variables with new keys will append new variables at bottom (right in the table).
   - Variables with existing keys will update corresponding vectors.
@@ -649,6 +696,14 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
     1 :index  int8       5 [0, -1, -2, -3, nil], 1 nil
     2 :float  double     5 [-0.0, -1.1, -2.2, NaN, nil], 1 NaN, 1 nil
     3 :string string     5 ["A", "B", "C", "D", nil], 1 nil
+    # Or it's shorter like this:
+    df.assign do
+      variables.select.with_object({}) do |(key, vector), assigner|
+        assigner[key] = vector * -1 if vector.numeric?
+      end
+    end
+    # => same as above
     ```
 - Key type
@@ -657,21 +712,116 @@ Class `RedAmber::DataFrame` represents 2D-data. `DataFrame` consists with:
 ## Updating
-- [ ] Update elements matching a condition
+### `sort`
-- [ ] Clamp
+  `sort` accepts parameters as sort_keys thanks to the amazing Red Arrow feature.
+    - :key, "key" or "+key" denotes ascending order
+    - "-key" denotes descending order
+  ```ruby
+  df = RedAmber::DataFrame.new({
+        index:  [1, 1, 0, nil, 0],
+        string: ['C', 'B', nil, 'A', 'B'],
+        bool:   [nil, true, false, true, false],
+      })
+  df.sort(:index, '-bool').tdr(tally: 0)
+  # =>
+  RedAmber::DataFrame : 5 x 3 Vectors
+  Vectors : 1 numeric, 1 string, 1 boolean
+  # key     type    level data_preview
+  1 :index  uint8       3 [0, 0, 1, 1, nil], 1 nil
+  2 :string string      4 [nil, "B", "B", "C", "A"], 1 nil
+  3 :bool   boolean     3 [false, false, true, nil, true], 1 nil
+  ```
-- [ ] Sort rows
+- [ ] Clamp
 - [ ] Clear data
 ## Treat na data
-- [ ] Drop na (NaN, nil)
+### `remove_nil`
+  Remove any observations containing nil.
+## Grouping
+### `group(aggregating_keys, function, target_keys)`
-- [ ] Replace na with value
+  Creates a grouped DataFrame by `aggregating_keys`, applies `function` to each group, and returns the results for `target_keys`. Aggregated key names are in `function(key)` style.
+  (The current implementation is not intuitive. Needs improvement.)
+  ```ruby
+  ds = Datasets::Rdatasets.new('dplyr', 'starwars')
+  starwars = RedAmber::DataFrame.new(ds.to_table.to_h)
+  starwars.tdr(11)
+  # =>
+  RedAmber::DataFrame : 87 x 11 Vectors
+  Vectors : 3 numeric, 8 strings
+  #  key         type   level data_preview
+  1  :name       string    87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
+  2  :height     uint16    46 [172, 167, 96, 202, 150, ... ], 6 nils
+  3  :mass       double    39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
+  4  :hair_color string    13 ["blond", nil, nil, "none", "brown", ... ], 5 nils
+  5  :skin_color string    31 ["fair", "gold", "white, blue", "white", "light", ... ]
+  6  :eye_color  string    15 ["blue", "yellow", "red", "yellow", "brown", ... ]
+  7  :birth_year double    37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
+  8  :sex        string     5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, nil=>4}
+  9  :gender     string     3 {"masculine"=>66, "feminine"=>17, nil=>4}
+  10 :homeworld  string    49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ], 10 nils
+  11 :species    string    38 ["Human", "Droid", "Droid", "Human", "Human", ... ], 4 nils
+  grouped = starwars.group(:species, :mean, [:mass, :height])
+  # =>
+  #<RedAmber::DataFrame : 38 x 3 Vectors, 0x000000000000fbf4>
+  Vectors : 2 numeric, 1 string
+  # key             type   level data_preview
+  1 :"mean(mass)"   double    27 [82.78181818181818, 69.75, 124.0, 74.0, 1358.0, ... ], 6 nils
+  2 :"mean(height)" double    32 [176.6451612903226, 131.2, 231.0, 173.0, 175.0, ... ]
+  3 :species        string    38 ["Human", "Droid", "Wookiee", "Rodian", "Hutt", ... ], 1 nil
+  count = starwars.group(:species, :count, :species)[:"count(species)"]
+  df = grouped.slice(count > 1)
+  # =>
+  #<RedAmber::DataFrame : 8 x 3 Vectors, 0x000000000000fc44>
+  Vectors : 2 numeric, 1 string
+  # key             type   level data_preview
+  1 :"mean(mass)"   double     8 [82.78181818181818, 69.75, 124.0, 74.0, 80.0, ... ]
+  2 :"mean(height)" double     8 [176.6451612903226, 131.2, 231.0, 208.66666666666666, 173.0, ... ]
+  3 :species        string     8 ["Human", "Droid", "Wookiee", "Gungan", "Zabrak", ... ]
+  df.table
+  # =>
+  #<Arrow::Table:0x1165593c8 ptr=0x7fb3db144c70>
+	mean(mass)	mean(height)	species
+  0	 82.781818	  176.645161	Human
+  1	 69.750000	  131.200000	Droid
+  2	124.000000	  231.000000	Wookiee
+  3	 74.000000	  208.666667	Gungan
+  4	 80.000000	  173.000000	Zabrak
+  5	 55.000000	  179.000000	Twi'lek
+  6	 53.100000	  168.000000	Mirialan
+  7	 88.000000	  221.000000	Kaminoan
+  ```
-- [ ] Interpolate na with convolution array
+  Available functions are:
+  - [ ] all
+  - [ ] any
+  - [ ] approximate_median
+  - ✓ count
+  - [ ] count_distinct
+  - [ ] distinct
+  - ✓ max
+  - ✓ mean
+  - ✓ min
+  - [ ] min_max
+  - ✓ product
+  - ✓ stddev
+  - ✓ sum
+  - [ ] tdigest
+  - ✓ variance
 ## Combining DataFrames