RubyGems - red_amber - Versions diffs - 0.1.8 → 0.2.0 - Mend

red_amber 0.1.8 → 0.2.0

Files changed (19)

checksums.yaml +4 -4
data/.rubocop.yml +3 -1
data/CHANGELOG.md +71 -2
data/Gemfile +1 -1
data/README.md +58 -33
data/doc/DataFrame.md +196 -55
data/doc/Vector.md +5 -1
data/doc/examples_of_red_amber.ipynb +1677 -348
data/lib/red_amber/data_frame.rb +92 -15
data/lib/red_amber/data_frame_displayable.rb +25 -10
data/lib/red_amber/data_frame_reshaping.rb +85 -0
data/lib/red_amber/data_frame_variable_operation.rb +89 -40
data/lib/red_amber/group.rb +5 -1
data/lib/red_amber/vector_functions.rb +46 -1
data/lib/red_amber/vector_selectable.rb +1 -1
data/lib/red_amber/version.rb +1 -1
data/lib/red_amber.rb +1 -1
data/red_amber.gemspec +1 -1
metadata +5 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3853e70f378cac65013a3bcfc51a2d55cb70cc494f3f3b70675bed944cc15b49
-  data.tar.gz: 3c65999cf978f1edf8c2c7fcce9a0ccb192d4da051f34fa0bf3f66ddc178eb1c
+  metadata.gz: 73459d02c921fcb0fcb742760e8c882b5491fa5316a79b9016233a516ada013e
+  data.tar.gz: ac25e808c5e5d4c13bb1877659550bba532cb5778371e39dfa1f3b9e5a91a4f8
 SHA512:
-  metadata.gz: fac66ba0bf5955cfe0d21a51b90ec16407182b9053e9b586dfe9f8e2526de4e90efecdd8eba1e8b3c99b12fc44544c82fb2f6af4b666b97876a64a6ee4deedf1
-  data.tar.gz: 1a4cc526ce9f097438f2b7d018552a4cd6aaa2d900012297cd1777c4b9e39063cc2988af91c138e93f291a56175aefb6a6b00c211f9b9c5bd38d75d6bc40acb9
+  metadata.gz: 1bfa4200d440c338f496fe282816634d6a833e30e17edc87a2cf5ec63866e2bbbaf8796916f1b052ea66482c54a038bbf1445258c2526691e42c2b47be2c39c5
+  data.tar.gz: e324e480e6086f7017de58201783c857825b79d0b2e2c8fa2636089cd1c5531e22905a3c0d860f26b833eb6add6ed6017497632bd1ea8fcb932c2d2233b11812
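
The `checksums.yaml` entries above are hex digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. As a minimal sketch of how such a digest can be checked with Ruby's standard `digest` library: the helper name `digest_matches?` and the sample bytes are hypothetical, and a real check would read the archive bytes from an unpacked gem instead.

```ruby
require 'digest'

# Hypothetical helper: compares the digest of `bytes` with an expected hex string.
# `algorithm` uses the same labels as checksums.yaml ('SHA256' or 'SHA512').
def digest_matches?(bytes, expected_hex, algorithm: 'SHA256')
  digest_class = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }.fetch(algorithm)
  digest_class.hexdigest(bytes) == expected_hex.strip.downcase
end

# Sample bytes stand in for the contents of metadata.gz or data.tar.gz.
sample = 'abc'
expected = Digest::SHA256.hexdigest(sample)

puts digest_matches?(sample, expected)       # matching content
puts digest_matches?("#{sample}!", expected) # altered content
```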

data/.rubocop.yml CHANGED

@@ -61,6 +61,7 @@ Metrics/AbcSize:
   Max: 30
   Exclude:
     - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
+    - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
     - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
     - 'lib/red_amber/vector_updatable.rb' # Max: 36
     - 'lib/red_amber/vector_selectable.rb' # Max: 33
@@ -98,9 +99,10 @@ Metrics/MethodLength:
 Metrics/ModuleLength:
   Max: 100
   Exclude:
+    - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
     - 'lib/red_amber/data_frame_selectable.rb' # Max: 141
+    - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
     - 'lib/red_amber/vector_functions.rb' # Max: 114
-    - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
 # Max: 8
 Metrics/PerceivedComplexity:

data/CHANGELOG.md CHANGED

@@ -1,6 +1,75 @@
-## [0.1.9] - Unreleased
+## [0.2.0] - 2022-08-15
-- Supports Arrow 9.0.0
+- Bump version up to 0.2.0
+- Bug fixes
+  - Fix order of multiple group keys (#55)
+    Only 1 group key comes to left. Other keys remain in right.
+  - Remove optional `require` for rover (#55)
+    Fix DataFrame.new for argument with Rover::DataFrame.
+  - Fix occasional failure in CI (#59)
+    Sometimes the CI test fails. I added -dev dependency
+    in Arrow install by apt, not doing in bundler.
+  - Fix calling :take in V#[] (#56)
+    Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This will prevent the error below
+    when called with Arrow::ChunkedArray.
+  - Raise error renaming non existing key (#61)
+    Raise an error when the specified key does not exist.
+  - Fix DataFrame#rename #assign by array (#65)
+- New features and improvements
+  - Support Arrow 9.0.0
+    - Upgrade to Arrow 9.0.0 (#59)
+    - Add Vector#quantile method (#59)
+      Arrow::QuantileOptions has been supported since Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
+    - Add Vector#quantiles (#62)
+    - Add DataFrame#each_row (#56)
+      - Returns Enumerator if block is not given.
+      - Change DataFrame#each_row to return a Hash {key => row} (#63)
+  - Refactor to use pattern match in overloaded parameter parsing (#61)
+    - Refine DataFrame.new to use pattern match
+    - Use pattern match in DataFrame#assign
+    - Use pattern match in DataFrame#rename
+  - Accept Array for renamer/assigner in #rename/#assign (#61)
+    - Accept assigner by Arrays in DataFrame#assign
+    - Accept renamer pairs by Arrays in DataFrame#rename
+    - Add DataFrame#assign_left method
+  - Add summary/describe (#62)
+    - Introduce DataFrame#summary(#describe)
+  - Introduce reshaping methods for DataFrame (#64)
+    - Introduce DataFrame#transpose method
+    - Introduce DataFrame#to_long method
+    - Introduce DataFrame#to_wide method
+  - Others
+    - Add alias sort_index for array_sort_indices (#59)
+    - Enable :width option in DataFrame#to_s (#62)
+    - Add options to DataFrame#format_table (#62)
+  - Update Documents
+    - Add Yard doc for some methods
+    - Update Jupyter notebook '61 Examples of Red Amber' (#65)
 ## [0.1.8] - 2022-08-04 (experimental)
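
The changelog above mentions using pattern matching to parse overloaded parameters, and accepting either a Hash or an Array of pairs in `#rename`/`#assign`. A minimal plain-Ruby sketch of that idea follows. The helper `normalize_pairs` is hypothetical and is not RedAmber's implementation; it only shows how `case ... in` can dispatch on the shape of the arguments.

```ruby
# Hypothetical helper: normalizes "overloaded" arguments into a Hash.
# Illustration only; not RedAmber's actual code.
def normalize_pairs(*args)
  case args
  in [] | [nil] | [[]]
    {} # nothing to do
  in [Hash => pairs]
    pairs # already { key => value }
  in [[[_, _], *] => pairs]
    pairs.to_h # Array of [key, value] pairs; to_h raises if a later element is not a pair
  else
    raise ArgumentError, "unsupported arguments: #{args.inspect}"
  end
end

p normalize_pairs(name: :first_name)
p normalize_pairs([[:name, :first_name], [:age, :years]])
p normalize_pairs(nil)
```

On Ruby 2.7 this syntax works but prints an "experimental" warning, which matches the note added to the README requirements in this release.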

data/Gemfile CHANGED

@@ -7,7 +7,7 @@ gemspec
 group :test do
   gem 'rake'
-  gem 'red-parquet', '>= 8.0.0'
+  gem 'red-parquet', '>= 9.0.0'
   gem 'rover-df', '~> 0.3.0'
   gem 'rubocop'

data/README.md CHANGED

@@ -3,17 +3,23 @@
 [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
 [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
-A simple dataframe library for Ruby (experimental).
+A simple dataframe library for Ruby.
 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
 - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 ## Requirements
+Supported Ruby version is >= 2.7.
+Since v0.2.0, this library uses pattern matching, which is an experimental feature in Ruby 2.7. It is usable, but a warning message is shown on 2.7.
+I recommend Ruby 3 for performance.
 ```ruby
-gem 'red-arrow',   '>= 8.0.0'
+# Libraries required
+gem 'red-arrow',   '>= 9.0.0'
-gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
+gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
 gem 'rover-df',    '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 ```
@@ -21,9 +27,9 @@ gem 'rover-df',    '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
 Install requirements before you install Red Amber.
-- Apache Arrow GLib (>= 8.0.0)
+- Apache Arrow GLib (>= 9.0.0)
-- Apache Parquet GLib (>= 8.0.0)  # If you use IO from/to parquet
+- Apache Parquet GLib (>= 9.0.0)  # If you use IO from/to parquet
   See [Apache Arrow install document](https://arrow.apache.org/install/).
@@ -122,22 +128,22 @@ df = df.drop(true, true, false)
 # =>
 #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
-    body_mass_g
-       <uint16>
-  1        3750
-  2        3800
-  3        3250
-  4       (nil)
-  5        3450
-  :           :
-342        5750
-343        5200
+    body_mass_g
+       <uint16>
+  1        3750
+  2        3800
+  3        3250
+  4       (nil)
+  5        3450
+  :           :
+342        5750
+343        5200
 344        5400
 ```
 Arrow data is immutable, so these methods always return a new object.
-`DataFrame#assign` creates new variables (column in the table).
+`DataFrame#assign` creates new columns or updates existing columns.
 ![assign method image](doc/image/dataframe/assign.png)
@@ -208,7 +214,7 @@ penguins.remove(penguins[:bill_length_mm] < 40)
 DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
-This example is usage of block to update numeric columns.
+This example uses a block to update a column.
 ```ruby
 df = RedAmber::DataFrame.new(
@@ -229,30 +235,28 @@ df
 5   (nil)    (nil) (nil)    (nil)
 df.assign do
-  vectors.each_with_object({}) do |v, h|
-    h[v.key] = -v if v.numeric?
-  end
+  vectors.select(&:float?).map { |v| [v.key, -v] }
+  # => returns [[:float, [-0.0, -1.1, -2.2, NaN, nil]]]
 end
 # =>
-#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000009a1b4>
-  integer    float string   boolean
-  <uint8> <double> <string> <boolean>
-1       0     -0.0 A        true
-2     255     -1.1 B        false
-3     254     -2.2 C        true
-4     253      NaN D        false
-5   (nil)    (nil) (nil)    (nil)
+#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
+    index    float string
+  <uint8> <double> <string>
+1       0     -0.0 A
+2       1     -1.1 B
+3       2     -2.2 C
+4       3      NaN D
+5   (nil)    (nil) (nil)
 ```
-Negate (-@) method of unsigned integer Vector returns complement.
-Next example is to eliminate observations (row in the table) containing nil.
+The next example eliminates rows containing nil.
 ```ruby
 # remove all observations containing nil
 nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
 nil_removed.tdr
 # =>
 RedAmber::DataFrame : 342 x 8 Vectors
 Vectors : 5 numeric, 3 strings
@@ -273,6 +277,21 @@ For this frequently needed task, we can do it much simpler.
 penguins.remove_nil # => same result as above
 ```
+`DataFrame#summary` shows summary statistics in a DataFrame.
+```ruby
+puts penguins.summary.to_s(width: 82)
+# =>
+  variables            count     mean      std      min      25%   median      75%      max
+  <dictionary>      <uint16> <double> <double> <double> <double> <double> <double> <double>
+1 bill_length_mm         342    43.92     5.46     32.1    39.23    44.38     48.5     59.6
+2 bill_depth_mm          342    17.15     1.97     13.1     15.6    17.32     18.7     21.5
+3 flipper_length_mm      342   200.92    14.06    172.0    190.0    197.0    213.0    231.0
+4 body_mass_g            342  4201.75   801.95   2700.0   3550.0   4031.5   4750.0   6300.0
+5 year                   344  2008.03     0.82   2007.0   2007.0   2008.0   2009.0   2009.0
+```
 `DataFrame#group` method can be used for the grouping tasks.
 ```ruby
@@ -311,7 +330,7 @@ grouped.slice { v(:count) > 1 }
 9 Kaminoan       2        221.0       88.0
 ```
-See [DataFrame.md](doc/DataFrame.md) for details.
+See [DataFrame.md](doc/DataFrame.md) for other examples and details.
 ## `RedAmber::Vector`
@@ -355,7 +374,7 @@ See [Vector.md](doc/Vector.md) for details.
 ## Jupyter notebook
-[53 Examples of Red Amber](doc/examples_of_red_amber.ipynb)
+[61 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in a Jupyter notebook.
 ## Development
@@ -366,6 +385,12 @@ bundle install
 bundle exec rake test
 ```
+I would appreciate it if you could help improve this project. Here are a few ways you can help:
+- [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
+- Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
+- Write, clarify, or fix documentation
 ## License
 The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
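
The README hunk above removes rows containing nil with `remove { vectors.map(&:is_nil).reduce(&:|) }`, that is, by OR-ing one boolean nil mask per column into a single per-row mask. The plain-Ruby sketch below mimics that logic on a Hash of Arrays. The helper `remove_nil_rows` and the sample data are hypothetical; RedAmber itself does this on Arrow-backed Vectors.

```ruby
# Hypothetical helper: drops every row in which any column holds nil.
# `columns` is a Hash of { key => Array }, all Arrays of equal length.
def remove_nil_rows(columns)
  # One boolean mask per column: true where the value is nil.
  nil_masks = columns.values.map { |col| col.map(&:nil?) }
  # Element-wise OR across the masks gives one flag per row.
  row_has_nil = nil_masks.reduce { |acc, mask| acc.zip(mask).map { |l, r| l | r } }
  # Keep only the rows whose flag is false.
  columns.transform_values do |col|
    col.reject.with_index { |_, i| row_has_nil[i] }
  end
end

columns = { a: [1, nil, 3, 4], b: ['x', 'y', nil, 'z'] }
p remove_nil_rows(columns)
```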

data/doc/DataFrame.md CHANGED

@@ -167,6 +167,11 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
   If you need a column-oriented full array, use `.to_h.to_a`
+### `each_row`
+  Yields each row as a `{ key => row }` Hash.
+  Returns an Enumerator if no block is given.
 ### `schema`
 - Returns column name and data type in a Hash.
@@ -202,7 +207,22 @@ puts penguins.to_s
 `inspect` uses `to_s` output and also shows shape and object_id.
-### `summary`, `describe` (not implemented)
+### `summary`, `describe`
+`DataFrame#summary` or `DataFrame#describe` shows summary statistics in a DataFrame.
+```ruby
+puts penguins.summary.to_s(width: 82) # needs more width to show all stats in this example
+# =>
+  variables            count     mean      std      min      25%   median      75%      max
+  <dictionary>      <uint16> <double> <double> <double> <double> <double> <double> <double>
+1 bill_length_mm         342    43.92     5.46     32.1    39.23    44.38     48.5     59.6
+2 bill_depth_mm          342    17.15     1.97     13.1     15.6    17.32     18.7     21.5
+3 flipper_length_mm      342   200.92    14.06    172.0    190.0    197.0    213.0    231.0
+4 body_mass_g            342  4201.75   801.95   2700.0   3550.0   4031.5   4750.0   6300.0
+5 year                   344  2008.03     0.82   2007.0   2007.0   2008.0   2009.0   2009.0
+```
 ### `to_rover`
@@ -704,7 +724,7 @@ penguins.to_rover
 - Key pairs as arguments
-    `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
+    `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`.
     ```ruby
     df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
@@ -721,7 +741,11 @@ penguins.to_rover
 - Key pairs by a block
-    `rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}`. Block is called in the context of self.
+    `rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`. Block is called in the context of self.
+- Not existing keys
+    If the specified `existing_key` does not exist, a `DataFrameArgumentError` is raised.
 - Key type
@@ -729,16 +753,16 @@ penguins.to_rover
 ### `assign`
-  Assign new or updated variables (columns) and create a updated DataFrame.
+  Assign new or updated columns (variables) and create an updated DataFrame.
-  - Variables with new keys will append new variables at bottom (right in the table).
+  - Variables with new keys will append new columns from the right.
   - Variables with existing keys will update corresponding vectors.
     ![assign method image](doc/../image/dataframe/assign.png)
 - Variables as arguments
-    `assign(key_pairs)` accepts pairs of key and values as arguments. key_pairs should be a Hash of `{key => array}` or `{key => Vector}`.
+    `assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`.
     ```ruby
     df = RedAmber::DataFrame.new(
@@ -769,7 +793,7 @@ penguins.to_rover
 - Key pairs by a block
-    `assign {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return pairs of key and values as a Hash of `{key => array}` or `{key => Vector}`. Block is called in the context of self.
+    `assign {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return pairs of key and values as a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`. The block is called in the context of self.
     ```ruby
     df = RedAmber::DataFrame.new(
@@ -788,29 +812,27 @@ penguins.to_rover
     4       3      NaN D
     5   (nil)    (nil) (nil)
-    # update numeric variables
+    # update :float
+    # assigner by an Array
     df.assign do
-      assigner = {}
-      vectors.each_with_index do |v, i|
-        assigner[keys[i]] = v * -1 if v.numeric?
-      end
-      assigner
+      vectors.select(&:float?)
+             .map { |v| [v.key, -v] }
     end
     # =>
-    #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000006e000>
-       index    float string
-      <int8> <double> <string>
-    1      0     -0.0 A
-    2     -1     -1.1 B
-    3     -2     -2.2 C
-    4     -3      NaN D
-    5  (nil)    (nil) (nil)
-    # Or it ’s shorter like this:
+    #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000dfffc>
+        index    float string
+      <uint8> <double> <string>
+    1       0     -0.0 A
+    2       1     -1.1 B
+    3       2     -2.2 C
+    4       3      NaN D
+    5   (nil)    (nil) (nil)
+    # Or we can use assigner by a Hash
     df.assign do
-      variables.select.with_object({}) do |(key, vector), assigner|
-        assigner[key] = vector * -1 if vector.numeric?
+      vectors.select.with_object({}) do |v, assigner|
+        assigner[v.key] = -v if v.float?
       end
     end
@@ -821,6 +843,28 @@ penguins.to_rover
   Symbol key and String key are considered as the same key.
+- Empty assignment
+  If assigner is empty or nil, returns self.
+- Append from left
+  The `assign_left` method accepts the same parameters and block as `assign`, but appends new columns on the left side.
+  ```ruby
+  df.assign_left(new_index: [1, 2, 3, 4, 5])
+  # =>
+  #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000001787c>
+    new_index   index    float string
+      <uint8> <uint8> <double> <string>
+  1         1       0      0.0 A
+  2         2       1      1.1 B
+  3         3       2      2.2 C
+  4         4       3      NaN D
+  5         5   (nil)    (nil) (nil)
+  ```
 ## Updating
 ### `sort`
@@ -933,17 +977,17 @@ penguins.to_rover
   starwars.group(:species).count(:species)
   # =>
-  #<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
-     species    count
-     <string> <int64>
-   1 Human         35
-   2 Droid          6
-   3 Wookiee        2
-   4 Rodian         1
-   5 Hutt           1
-   : :              :
-  36 Kaleesh        1
-  37 Pau'an         1
+  #<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
+     species    count
+     <string> <int64>
+   1 Human         35
+   2 Droid          6
+   3 Wookiee        2
+   4 Rodian         1
+   5 Hutt           1
+   : :              :
+  36 Kaleesh        1
+  37 Pau'an         1
   38 Kel Dor        1
   ```
@@ -953,17 +997,17 @@ penguins.to_rover
   grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
   # =>
-  #<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
-     species    count mean(height) mean(mass)
-     <string> <int64>     <double>   <double>
-   1 Human         35        176.6       82.8
-   2 Droid          6        131.2       69.8
-   3 Wookiee        2        231.0      124.0
-   4 Rodian         1        173.0       74.0
-   5 Hutt           1        175.0     1358.0
-   : :              :            :          :
-  36 Kaleesh        1        216.0      159.0
-  37 Pau'an         1        206.0       80.0
+  #<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
+     species    count mean(height) mean(mass)
+     <string> <int64>     <double>   <double>
+   1 Human         35        176.6       82.8
+   2 Droid          6        131.2       69.8
+   3 Wookiee        2        231.0      124.0
+   4 Rodian         1        173.0       74.0
+   5 Hutt           1        175.0     1358.0
+   : :              :            :          :
+  36 Kaleesh        1        216.0      159.0
+  37 Pau'an         1        206.0       80.0
   38 Kel Dor        1        188.0       80.0
   ```
@@ -987,18 +1031,115 @@ penguins.to_rover
   9 Kaminoan       2        221.0       88.0
   ```
-## Combining DataFrames
+## Reshape
-- [ ] Combining rows to a dataframe
+### `transpose`
-- [ ] Inner join
+  Creates a transposed DataFrame from a wide-format DataFrame.
-- [ ] Left join
+  ```ruby
+  import_cars = RedAmber::DataFrame.load('test/entity/import_cars.tsv')
-## Encoding
+  # =>
+  #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000d520>
+       Year    Audi     BMW BMW_MINI Mercedes-Benz      VW
+    <int64> <int64> <int64>  <int64>       <int64> <int64>
+  1    2021   22535   35905    18211         51722   35215
+  2    2020   22304   35712    20196         57041   36576
+  3    2019   24222   46814    23813         66553   46794
+  4    2018   26473   50982    25984         67554   51961
+  5    2017   28336   52527    25427         68221   49040
-- [ ] One-hot encoding
+  import_cars.transpose
-## Iteration
+  # =>
+  #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000ef74>
+    name              2021     2020     2019     2018     2017
+    <dictionary>  <uint16> <uint16> <uint32> <uint32> <uint32>
+  1 Audi             22535    22304    24222    26473    28336
+  2 BMW              35905    35712    46814    50982    52527
+  3 BMW_MINI         18211    20196    23813    25984    25427
+  4 Mercedes-Benz    51722    57041    66553    67554    68221
+  5 VW               35215    36576    46794    51961    49040
+  ```
+  The leftmost column is created from the original keys. The key of that column is
+  'name'.
+### `to_long(*keep_keys)`
+  Creates a 'long' DataFrame.
+  - Parameter `keep_keys` specifies the key names to keep.
+  ```ruby
+  import_cars.to_long(:Year)
+  # =>
+  #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000012750>
+         Year name             value
+     <uint16> <dictionary>  <uint32>
+   1     2021 Audi             22535
+   2     2021 BMW              35905
+   3     2021 BMW_MINI         18211
+   4     2021 Mercedes-Benz    51722
+   5     2021 VW               35215
+   :        : :                    :
+  23     2017 BMW_MINI         25427
+  24     2017 Mercedes-Benz    68221
+  25     2017 VW               49040
+  ```
+  - Option `:name` : key of the column which comes **from key names**.
+  - Option `:value` : key of the column which comes **from values**.
+  ```ruby
+  import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
+  # =>
+  #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000017700>
+         Year Manufacturer  Num_of_imported
+     <uint16> <dictionary>         <uint32>
+   1     2021 Audi                    22535
+   2     2021 BMW                     35905
+   3     2021 BMW_MINI                18211
+   4     2021 Mercedes-Benz           51722
+   5     2021 VW                      35215
+   :        : :                           :
+  23     2017 BMW_MINI                25427
+  24     2017 Mercedes-Benz           68221
+  25     2017 VW                      49040
+  ```
-- [ ] each_rows
+### `to_wide`
+  Creates a 'wide' DataFrame.
+  - Option `:name` : key of the column which will be expanded **to key name**.
+  - Option `:value` : key of the column which will be expanded **to values**.
+  ```ruby
+  import_cars.to_long(:Year).to_wide
+  # import_cars.to_long(:Year).to_wide(name: :name, value: :value)
+  # is also OK
+  # =>
+  #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f0f0>
+        Year     Audi      BMW BMW_MINI Mercedes-Benz       VW
+    <uint16> <uint16> <uint16> <uint16>      <uint32> <uint16>
+  1     2021    22535    35905    18211         51722    35215
+  2     2020    22304    35712    20196         57041    36576
+  3     2019    24222    46814    23813         66553    46794
+  4     2018    26473    50982    25984         67554    51961
+  5     2017    28336    52527    25427         68221    49040
+  ```
+## Combine
+- [ ] Combining dataframes
+- [ ] Join
+## Encoding
+- [ ] One-hot encoding
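
To make the `to_long` / `to_wide` round trip documented in the hunk above concrete, here is a plain-Ruby sketch that works on an Array of row Hashes. The two helper methods are hypothetical stand-ins that borrow the documented names and the `name:` / `value:` options; they are not RedAmber's implementation, and the data is a two-column excerpt of the `import_cars` table shown above.

```ruby
# Hypothetical stand-in for DataFrame#to_long on an Array of row Hashes.
# Every column not listed in keep_keys becomes one (name, value) row.
def to_long(rows, *keep_keys, name: :name, value: :value)
  rows.flat_map do |row|
    kept = row.select { |k, _| keep_keys.include?(k) }
    row.reject { |k, _| keep_keys.include?(k) }
       .map { |k, v| kept.merge(name => k, value => v) }
  end
end

# Hypothetical stand-in for DataFrame#to_wide: the inverse operation.
# Rows sharing the same remaining columns are merged into one wide row.
def to_wide(rows, name: :name, value: :value)
  rows.group_by { |row| row.reject { |k, _| [name, value].include?(k) } }
      .map do |kept, group|
        group.each_with_object(kept.dup) { |row, wide| wide[row[name]] = row[value] }
      end
end

wide = [
  { Year: 2021, Audi: 22_535, BMW: 35_905 },
  { Year: 2020, Audi: 22_304, BMW: 35_712 }
]

long = to_long(wide, :Year)
long.each { |row| p row }
p to_wide(long) == wide
```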

data/doc/Vector.md CHANGED

@@ -145,7 +145,7 @@ array[booleans]
 | ✓ `min_max` |  ✓  |  ✓  |  ✓  | ✓ ScalarAggregate|     |
 |[ ]`mode`    |     | [ ] |     |[ ] Mode    |     |
 | ✓ `product` |  ✓  |  ✓  |     | ✓ ScalarAggregate|     |
-|[ ]`quantile`|     | [ ] |     |[ ] Quantile|     |
+| ✓ `quantile`|     |  ✓  |     | ✓ Quantile|Specify probability in (0..1) by a parameter (default=0.5)|
 | ✓ `sd    `  |     |  ✓  |     |          |ddof: 1 at `stddev`|
 | ✓ `stddev`  |     |  ✓  |     | ✓ Variance|ddof: 0 by default|
 | ✓ `sum`     |  ✓  |  ✓  |     | ✓ ScalarAggregate|     |
@@ -303,6 +303,10 @@ double.round(n_digits: -1)
   Returns index of specified element.
+### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`
+  Returns quantiles for specified probabilities in a DataFrame.
 ### `sort_indexes`, `sort_indices`, `array_sort_indices`
 ### [ ] `sort`, `sort_by`
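
The `quantile` and `quantiles` entries added to Vector.md above take a probability in (0..1) and default to `interpolation: :linear`. As a rough plain-Ruby sketch of what linear interpolation between the two nearest ranks means: the helpers below are hypothetical, skip nils up front (similar in spirit to `skip_nils: true`), and return plain numbers, whereas the documented `Vector#quantiles` returns a DataFrame.

```ruby
# Hypothetical helper: quantile with linear interpolation on a plain Array.
def quantile(values, prob = 0.5)
  raise ArgumentError, 'prob must be within 0..1' unless (0.0..1.0).cover?(prob)

  sorted = values.compact.sort # drop nils, then order the values
  return nil if sorted.empty?

  pos = prob * (sorted.size - 1) # fractional rank of the requested quantile
  lower = pos.floor
  upper = pos.ceil
  # Interpolate between the two neighbouring values.
  sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower)
end

# Hypothetical helper: several quantiles at once, using the default
# probabilities listed in the Vector.md signature above.
def quantiles(values, probs = [1.0, 0.75, 0.5, 0.25, 0.0])
  probs.map { |prob| quantile(values, prob) }
end

p quantile([1, 2, 3, 4])       # median
p quantiles([1, 2, 3, 4, nil]) # nil is skipped
```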