red_amber 0.1.8 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 3853e70f378cac65013a3bcfc51a2d55cb70cc494f3f3b70675bed944cc15b49
4
- data.tar.gz: 3c65999cf978f1edf8c2c7fcce9a0ccb192d4da051f34fa0bf3f66ddc178eb1c
3
+ metadata.gz: 73459d02c921fcb0fcb742760e8c882b5491fa5316a79b9016233a516ada013e
4
+ data.tar.gz: ac25e808c5e5d4c13bb1877659550bba532cb5778371e39dfa1f3b9e5a91a4f8
5
5
  SHA512:
6
- metadata.gz: fac66ba0bf5955cfe0d21a51b90ec16407182b9053e9b586dfe9f8e2526de4e90efecdd8eba1e8b3c99b12fc44544c82fb2f6af4b666b97876a64a6ee4deedf1
7
- data.tar.gz: 1a4cc526ce9f097438f2b7d018552a4cd6aaa2d900012297cd1777c4b9e39063cc2988af91c138e93f291a56175aefb6a6b00c211f9b9c5bd38d75d6bc40acb9
6
+ metadata.gz: 1bfa4200d440c338f496fe282816634d6a833e30e17edc87a2cf5ec63866e2bbbaf8796916f1b052ea66482c54a038bbf1445258c2526691e42c2b47be2c39c5
7
+ data.tar.gz: e324e480e6086f7017de58201783c857825b79d0b2e2c8fa2636089cd1c5531e22905a3c0d860f26b833eb6add6ed6017497632bd1ea8fcb932c2d2233b11812
data/.rubocop.yml CHANGED
@@ -61,6 +61,7 @@ Metrics/AbcSize:
61
61
  Max: 30
62
62
  Exclude:
63
63
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
64
+ - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
64
65
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
65
66
  - 'lib/red_amber/vector_updatable.rb' # Max: 36
66
67
  - 'lib/red_amber/vector_selectable.rb' # Max: 33
@@ -98,9 +99,10 @@ Metrics/MethodLength:
98
99
  Metrics/ModuleLength:
99
100
  Max: 100
100
101
  Exclude:
102
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
101
103
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 141
104
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
102
105
  - 'lib/red_amber/vector_functions.rb' # Max: 114
103
- - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
104
106
 
105
107
  # Max: 8
106
108
  Metrics/PerceivedComplexity:
data/CHANGELOG.md CHANGED
@@ -1,6 +1,75 @@
1
- ## [0.1.9] - Unreleased
1
+ ## [0.2.0] - 2022-08-15
2
2
 
3
- - Supports Arrow 9.0.0
3
+ - Bump version up to 0.2.0
4
+
5
+ - Bug fixes
6
+
7
+ - Fix order of multiple group keys (#55)
8
+
9
+ Only 1 group key comes to left. Other keys remain in right.
10
+
11
+ - Remove optional `require` for rover (#55)
12
+
13
+ Fix DataFrame.new for argument with Rover::DataFrame.
14
+
15
+ - Fix occasional failure in CI (#59)
16
+
17
+ The CI test failed occasionally. The Arrow -dev dependency is now installed by apt instead of by bundler.
19
+
20
+ - Fix calling :take in V#[] (#56)
21
+
22
+ Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This prevents an error when called with Arrow::ChunkedArray.
24
+
25
+ - Raise error renaming non existing key (#61)
26
+
27
+ Raise an error when the specified key does not exist.
28
+
29
+ - Fix DataFrame#rename #assign by array (#65)
30
+
31
+ - New features and improvements
32
+
33
+ - Support Arrow 9.0.0
34
+ - Upgrade to Arrow 9.0.0 (#59)
35
+ - Add Vector#quantile method (#59)
36
+ Arrow::QuantileOptions has been supported since Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
37
+
38
+ - Add Vector#quantiles (#62)
39
+
40
+ - Add DataFrame#each_row (#56)
41
+ - Returns Enumerator if block is not given.
42
+ - Change DataFrame#each_row to return a Hash {key => row} (#63)
43
+
44
+ - Refactor to use pattern match in overloaded parameter parsing (#61)
45
+ - Refine DataFrame.new to use pattern match
46
+ - Use pattern match in DataFrame#assign
47
+ - Use pattern match in DataFrame#rename
48
+
49
+ - Accept Array for renamer/assigner in #rename/#assign (#61)
50
+ - Accept assigner by Arrays in DataFrame#assign
51
+ - Accept renamer pairs by Arrays in DataFrame#rename
52
+ - Add DataFrame#assign_left method
53
+
54
+ - Add summary/describe (#62)
55
+ - Introduce DataFrame#summary(#describe)
56
+
57
+ - Introduce reshaping methods for DataFrame (#64)
58
+ - Introduce DataFrame#transpose method
59
+ - Introduce DataFrame#to_long method
60
+ - Introduce DataFrame#to_wide method
61
+
62
+ - Others
63
+
64
+ - Add alias sort_index for array_sort_indices (#59)
65
+ - Enable :width option in DataFrame#to_s (#62)
66
+ - Add options to DataFrame#format_table (#62)
67
+
68
+ - Update Documents
69
+
70
+ - Add Yard doc for some methods
71
+
72
+ - Update Jupyter notebook '61 Examples of Red Amber' (#65)
4
73
 
5
74
  ## [0.1.8] - 2022-08-04 (experimental)
6
75
 
data/Gemfile CHANGED
@@ -7,7 +7,7 @@ gemspec
7
7
  group :test do
8
8
  gem 'rake'
9
9
 
10
- gem 'red-parquet', '>= 8.0.0'
10
+ gem 'red-parquet', '>= 9.0.0'
11
11
  gem 'rover-df', '~> 0.3.0'
12
12
 
13
13
  gem 'rubocop'
data/README.md CHANGED
@@ -3,17 +3,23 @@
3
3
  [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
4
4
  [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
5
5
 
6
- A simple dataframe library for Ruby (experimental).
6
+ A simple dataframe library for Ruby.
7
7
 
8
8
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
9
9
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
10
10
 
11
11
  ## Requirements
12
12
 
13
+ Supported Ruby version is >= 2.7.
14
+
15
+ Since v0.2.0, this library uses pattern matching, which is an experimental feature in Ruby 2.7. It is usable, but a warning message is shown in 2.7.
16
+ I recommend Ruby 3 for performance.
17
+
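As a rough illustration of why pattern matching suits overloaded arguments, the sketch below normalizes either a Hash or an Array of pairs into one Hash with `case ... in`. The helper name `normalize_pairs` is hypothetical and is not part of Red Amber's API; it only demonstrates the language feature, assuming Ruby 3.

```ruby
# Hypothetical helper (not Red Amber's code): accept either a Hash or an
# Array of [key, value] pairs and return a Hash with Symbol keys.
def normalize_pairs(arg)
  case arg
  in nil | [] | {}
    {} # nothing to do for nil or empty input
  in Hash => hash
    hash.transform_keys(&:to_sym)
  in [[_, _], *] => pairs
    pairs.to_h { |key, value| [key.to_sym, value] }
  else
    raise ArgumentError, "unsupported argument: #{arg.inspect}"
  end
end

from_hash  = normalize_pairs({ 'name' => :full_name })   # Hash form
from_array = normalize_pairs([['name', :full_name]])     # Array-of-pairs form
```

Both calls produce the same Hash, so the code after the `case` only has to handle one shape.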
13
18
  ```ruby
14
- gem 'red-arrow', '>= 8.0.0'
19
+ # Libraries required
20
+ gem 'red-arrow', '>= 9.0.0'
15
21
 
16
- gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
22
+ gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
17
23
  gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
18
24
  ```
19
25
 
@@ -21,9 +27,9 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
21
27
 
22
28
  Install requirements before you install Red Amber.
23
29
 
24
- - Apache Arrow GLib (>= 8.0.0)
30
+ - Apache Arrow GLib (>= 9.0.0)
25
31
 
26
- - Apache Parquet GLib (>= 8.0.0) # If you use IO from/to parquet
32
+ - Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
27
33
 
28
34
  See [Apache Arrow install document](https://arrow.apache.org/install/).
29
35
 
@@ -122,22 +128,22 @@ df = df.drop(true, true, false)
122
128
 
123
129
  # =>
124
130
  #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
125
- body_mass_g
126
- <uint16>
127
- 1 3750
128
- 2 3800
129
- 3 3250
130
- 4 (nil)
131
- 5 3450
132
- : :
133
- 342 5750
134
- 343 5200
131
+ body_mass_g
132
+ <uint16>
133
+ 1 3750
134
+ 2 3800
135
+ 3 3250
136
+ 4 (nil)
137
+ 5 3450
138
+ : :
139
+ 342 5750
140
+ 343 5200
135
141
  344 5400
136
142
  ```
137
143
 
138
144
  Arrow data is immutable, so these methods always return a new object.
139
145
 
140
- `DataFrame#assign` creates new variables (column in the table).
146
+ `DataFrame#assign` creates new columns or updates existing columns.
141
147
 
142
148
  ![assign method image](doc/image/dataframe/assign.png)
143
149
 
@@ -208,7 +214,7 @@ penguins.remove(penguins[:bill_length_mm] < 40)
208
214
 
209
215
  DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
210
216
 
211
- This example is usage of block to update numeric columns.
217
+ This example uses a block to update a column.
212
218
 
213
219
  ```ruby
214
220
  df = RedAmber::DataFrame.new(
@@ -229,30 +235,28 @@ df
229
235
  5 (nil) (nil) (nil) (nil)
230
236
 
231
237
  df.assign do
232
- vectors.each_with_object({}) do |v, h|
233
- h[v.key] = -v if v.numeric?
234
- end
238
+ vectors.select(&:float?).map { |v| [v.key, -v] }
239
+ # => returns [[:float, [-0.0, -1.1, -2.2, NaN, nil]]]
235
240
  end
236
241
 
237
242
  # =>
238
- #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000009a1b4>
239
- integer float string boolean
240
- <uint8> <double> <string> <boolean>
241
- 1 0 -0.0 A true
242
- 2 255 -1.1 B false
243
- 3 254 -2.2 C true
244
- 4 253 NaN D false
245
- 5 (nil) (nil) (nil) (nil)
243
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
244
+ index float string
245
+ <uint8> <double> <string>
246
+ 1 0 -0.0 A
247
+ 2 1 -1.1 B
248
+ 3 2 -2.2 C
249
+ 4 3 NaN D
250
+ 5 (nil) (nil) (nil)
246
251
  ```
247
252
 
248
- Negate (-@) method of unsigned integer Vector returns complement.
249
-
250
- Next example is to eliminate observations (row in the table) containing nil.
253
+ The next example eliminates rows containing nil.
251
254
 
252
255
  ```ruby
253
256
  # remove all observations containing nil
254
257
  nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
255
258
  nil_removed.tdr
259
+
256
260
  # =>
257
261
  RedAmber::DataFrame : 342 x 8 Vectors
258
262
  Vectors : 5 numeric, 3 strings
@@ -273,6 +277,21 @@ For this frequently needed task, we can do it much simpler.
273
277
  penguins.remove_nil # => same result as above
274
278
  ```
275
279
 
280
+ `DataFrame#summary` shows summary statistics in a DataFrame.
281
+
282
+ ```ruby
283
+ puts penguins.summary.to_s(width: 82)
284
+
285
+ # =>
286
+ variables count mean std min 25% median 75% max
287
+ <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
288
+ 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
289
+ 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
290
+ 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
291
+ 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
292
+ 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
293
+ ```
294
+
276
295
  `DataFrame#group` method can be used for the grouping tasks.
277
296
 
278
297
  ```ruby
@@ -311,7 +330,7 @@ grouped.slice { v(:count) > 1 }
311
330
  9 Kaminoan 2 221.0 88.0
312
331
  ```
313
332
 
314
- See [DataFrame.md](doc/DataFrame.md) for details.
333
+ See [DataFrame.md](doc/DataFrame.md) for other examples and details.
315
334
 
316
335
 
317
336
  ## `RedAmber::Vector`
@@ -355,7 +374,7 @@ See [Vector.md](doc/Vector.md) for details.
355
374
 
356
375
  ## Jupyter notebook
357
376
 
358
- [53 Examples of Red Amber](doc/examples_of_red_amber.ipynb)
377
+ [61 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in a Jupyter notebook.
359
378
 
360
379
  ## Development
361
380
 
@@ -366,6 +385,12 @@ bundle install
366
385
  bundle exec rake test
367
386
  ```
368
387
 
388
+ I would appreciate it if you could help improve this project. Here are a few ways you can help:
389
+
390
+ - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
391
+ - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
392
+ - Write, clarify, or fix documentation
393
+
369
394
  ## License
370
395
 
371
396
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
data/doc/DataFrame.md CHANGED
@@ -167,6 +167,11 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
167
167
 
168
168
  If you need a column-oriented full array, use `.to_h.to_a`
169
169
 
170
+ ### `each_row`
171
+
172
+ Yields each row as a `{ key => row }` Hash.
173
+ Returns an Enumerator if no block is given.
174
+
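The behavior described for `each_row` can be sketched in plain Ruby as follows. This is not Red Amber's implementation: the free-standing method and the `columns` Hash of `{ key => Array }` are assumptions made only to show the row-as-Hash shape and the Enumerator fallback.

```ruby
# Plain-Ruby sketch (not Red Amber's implementation): columns are held as
# a Hash of { key => Array }, and each row is yielded as { key => value }.
def each_row(columns)
  return enum_for(:each_row, columns) unless block_given?

  size = columns.values.first&.size || 0
  size.times do |i|
    yield columns.transform_values { |values| values[i] }
  end
end

columns = { x: [1, 2, 3], y: %w[A B C] }

labels = []
each_row(columns) { |row| labels << "#{row[:x]}#{row[:y]}" }

rows = each_row(columns).to_a # without a block, an Enumerator is returned
```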
170
175
  ### `schema`
171
176
 
172
177
  - Returns column name and data type in a Hash.
@@ -202,7 +207,22 @@ puts penguins.to_s
202
207
  `inspect` uses `to_s` output and also shows shape and object_id.
203
208
 
204
209
 
205
- ### `summary`, `describe` (not implemented)
210
+ ### `summary`, `describe`
211
+
212
+ `DataFrame#summary` or `DataFrame#describe` shows summary statistics in a DataFrame.
213
+
214
+ ```ruby
215
+ puts penguins.summary.to_s(width: 82) # needs more width to show all stats in this example
216
+
217
+ # =>
218
+ variables count mean std min 25% median 75% max
219
+ <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
220
+ 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
221
+ 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
222
+ 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
223
+ 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
224
+ 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
225
+ ```
206
226
 
207
227
  ### `to_rover`
208
228
 
@@ -704,7 +724,7 @@ penguins.to_rover
704
724
 
705
725
  - Key pairs as arguments
706
726
 
707
- `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
727
+ `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`.
708
728
 
709
729
  ```ruby
710
730
  df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
@@ -721,7 +741,11 @@ penguins.to_rover
721
741
 
722
742
  - Key pairs by a block
723
743
 
724
- `rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}`. Block is called in the context of self.
744
+ `rename {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return key_pairs as a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`. The block is called in the context of self.
745
+
746
+ - Not existing keys
747
+
748
+ If the specified `existing_key` does not exist, a `DataFrameArgumentError` is raised.
725
749
 
726
750
  - Key type
727
751
 
@@ -729,16 +753,16 @@ penguins.to_rover
729
753
 
730
754
  ### `assign`
731
755
 
732
- Assign new or updated variables (columns) and create a updated DataFrame.
756
+ Assign new or updated columns (variables) and create an updated DataFrame.
733
757
 
734
- - Variables with new keys will append new variables at bottom (right in the table).
758
+ - Variables with new keys are appended as new columns on the right.
735
759
  - Variables with existing keys will update corresponding vectors.
736
760
 
737
761
  ![assign method image](doc/../image/dataframe/assign.png)
738
762
 
739
763
  - Variables as arguments
740
764
 
741
- `assign(key_pairs)` accepts pairs of key and values as arguments. key_pairs should be a Hash of `{key => array}` or `{key => Vector}`.
765
+ `assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`.
742
766
 
743
767
  ```ruby
744
768
  df = RedAmber::DataFrame.new(
@@ -769,7 +793,7 @@ penguins.to_rover
769
793
 
770
794
  - Key pairs by a block
771
795
 
772
- `assign {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return pairs of key and values as a Hash of `{key => array}` or `{key => Vector}`. Block is called in the context of self.
796
+ `assign {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return pairs of key and values as a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`. The block is called in the context of self.
773
797
 
774
798
  ```ruby
775
799
  df = RedAmber::DataFrame.new(
@@ -788,29 +812,27 @@ penguins.to_rover
788
812
  4 3 NaN D
789
813
  5 (nil) (nil) (nil)
790
814
 
791
- # update numeric variables
815
+ # update :float
816
+ # assigner by an Array
792
817
  df.assign do
793
- assigner = {}
794
- vectors.each_with_index do |v, i|
795
- assigner[keys[i]] = v * -1 if v.numeric?
796
- end
797
- assigner
818
+ vectors.select(&:float?)
819
+ .map { |v| [v.key, -v] }
798
820
  end
799
821
 
800
822
  # =>
801
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000006e000>
802
- index float string
803
- <int8> <double> <string>
804
- 1 0 -0.0 A
805
- 2 -1 -1.1 B
806
- 3 -2 -2.2 C
807
- 4 -3 NaN D
808
- 5 (nil) (nil) (nil)
809
-
810
- # Or it ’s shorter like this:
823
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000dfffc>
824
+ index float string
825
+ <uint8> <double> <string>
826
+ 1 0 -0.0 A
827
+ 2 1 -1.1 B
828
+ 3 2 -2.2 C
829
+ 4 3 NaN D
830
+ 5 (nil) (nil) (nil)
831
+
832
+ # Or we can use assigner by a Hash
811
833
  df.assign do
812
- variables.select.with_object({}) do |(key, vector), assigner|
813
- assigner[key] = vector * -1 if vector.numeric?
834
+ vectors.select.with_object({}) do |v, assigner|
835
+ assigner[v.key] = -v if v.float?
814
836
  end
815
837
  end
816
838
 
@@ -821,6 +843,28 @@ penguins.to_rover
821
843
 
822
844
  Symbol key and String key are considered as the same key.
823
845
 
846
+ - Empty assignment
847
+
848
+ If assigner is empty or nil, returns self.
849
+
850
+ - Append from left
851
+
852
+ The `assign_left` method accepts the same parameters and block as `assign`, but appends new columns on the left side.
853
+
854
+ ```ruby
855
+ df.assign_left(new_index: [1, 2, 3, 4, 5])
856
+
857
+ # =>
858
+ #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000001787c>
859
+ new_index index float string
860
+ <uint8> <uint8> <double> <string>
861
+ 1 1 0 0.0 A
862
+ 2 2 1 1.1 B
863
+ 3 3 2 2.2 C
864
+ 4 4 3 NaN D
865
+ 5 5 (nil) (nil) (nil)
866
+ ```
867
+
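The difference in column order between `assign` and `assign_left` can be sketched with plain Hashes. This is not Red Amber's code: the helper names `assign_right` and `assign_left` below are free-standing stand-ins, and the sketch assumes that existing keys are updated in place in both cases.

```ruby
# Plain-Ruby sketch of the column ordering only (not Red Amber's code).
# `columns` and `assigner` are Hashes of { key => Array }.
def assign_right(columns, assigner)
  # Existing keys keep their position; new keys are appended last.
  columns.merge(assigner)
end

def assign_left(columns, assigner)
  new_columns = assigner.reject { |key, _| columns.key?(key) }
  # New keys come first; existing keys are then updated in place.
  new_columns.merge(columns).merge(assigner)
end

columns = { index: [0, 1, 2], float: [0.0, 1.1, 2.2] }

right = assign_right(columns, { new_index: [1, 2, 3] })
left  = assign_left(columns, { new_index: [1, 2, 3] })
```

Here `right.keys` ends with `:new_index` while `left.keys` starts with it; `columns` itself is left unchanged, in line with the immutable style described above.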
824
868
  ## Updating
825
869
 
826
870
  ### `sort`
@@ -933,17 +977,17 @@ penguins.to_rover
933
977
  starwars.group(:species).count(:species)
934
978
 
935
979
  # =>
936
- #<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
937
- species count
938
- <string> <int64>
939
- 1 Human 35
940
- 2 Droid 6
941
- 3 Wookiee 2
942
- 4 Rodian 1
943
- 5 Hutt 1
944
- : : :
945
- 36 Kaleesh 1
946
- 37 Pau'an 1
980
+ #<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
981
+ species count
982
+ <string> <int64>
983
+ 1 Human 35
984
+ 2 Droid 6
985
+ 3 Wookiee 2
986
+ 4 Rodian 1
987
+ 5 Hutt 1
988
+ : : :
989
+ 36 Kaleesh 1
990
+ 37 Pau'an 1
947
991
  38 Kel Dor 1
948
992
  ```
949
993
 
@@ -953,17 +997,17 @@ penguins.to_rover
953
997
  grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
954
998
 
955
999
  # =>
956
- #<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
957
- species count mean(height) mean(mass)
958
- <string> <int64> <double> <double>
959
- 1 Human 35 176.6 82.8
960
- 2 Droid 6 131.2 69.8
961
- 3 Wookiee 2 231.0 124.0
962
- 4 Rodian 1 173.0 74.0
963
- 5 Hutt 1 175.0 1358.0
964
- : : : : :
965
- 36 Kaleesh 1 216.0 159.0
966
- 37 Pau'an 1 206.0 80.0
1000
+ #<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
1001
+ species count mean(height) mean(mass)
1002
+ <string> <int64> <double> <double>
1003
+ 1 Human 35 176.6 82.8
1004
+ 2 Droid 6 131.2 69.8
1005
+ 3 Wookiee 2 231.0 124.0
1006
+ 4 Rodian 1 173.0 74.0
1007
+ 5 Hutt 1 175.0 1358.0
1008
+ : : : : :
1009
+ 36 Kaleesh 1 216.0 159.0
1010
+ 37 Pau'an 1 206.0 80.0
967
1011
  38 Kel Dor 1 188.0 80.0
968
1012
  ```
969
1013
 
@@ -987,18 +1031,115 @@ penguins.to_rover
987
1031
  9 Kaminoan 2 221.0 88.0
988
1032
  ```
989
1033
 
990
- ## Combining DataFrames
1034
+ ## Reshape
991
1035
 
992
- - [ ] Combining rows to a dataframe
1036
+ ### `transpose`
993
1037
 
994
- - [ ] Inner join
1038
+ Creates a transposed DataFrame from a wide-type DataFrame.
995
1039
 
996
- - [ ] Left join
1040
+ ```ruby
1041
+ import_cars = RedAmber::DataFrame.load('test/entity/import_cars.tsv')
997
1042
 
998
- ## Encoding
1043
+ # =>
1044
+ #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000d520>
1045
+ Year Audi BMW BMW_MINI Mercedes-Benz VW
1046
+ <int64> <int64> <int64> <int64> <int64> <int64>
1047
+ 1 2021 22535 35905 18211 51722 35215
1048
+ 2 2020 22304 35712 20196 57041 36576
1049
+ 3 2019 24222 46814 23813 66553 46794
1050
+ 4 2018 26473 50982 25984 67554 51961
1051
+ 5 2017 28336 52527 25427 68221 49040
999
1052
 
1000
- - [ ] One-hot encoding
1053
+ import_cars.transpose
1001
1054
 
1002
- ## Iteration
1055
+ # =>
1056
+ #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000ef74>
1057
+ name 2021 2020 2019 2018 2017
1058
+ <dictionary> <uint16> <uint16> <uint32> <uint32> <uint32>
1059
+ 1 Audi 22535 22304 24222 26473 28336
1060
+ 2 BMW 35905 35712 46814 50982 52527
1061
+ 3 BMW_MINI 18211 20196 23813 25984 25427
1062
+ 4 Mercedes-Benz 51722 57041 66553 67554 68221
1063
+ 5 VW 35215 36576 46794 51961 49040
1064
+ ```
1065
+
1066
+ The leftmost column is created from the original keys. The key of this column is 'name'.
1068
+
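A plain-Ruby sketch of the same reshaping may help to see where each value goes. It is not Red Amber's implementation; it assumes the data is a Hash of `{ key => Array }` and that the first column supplies the new keys.

```ruby
# Plain-Ruby sketch (not Red Amber's implementation). The first column
# supplies the new keys; the remaining original keys fill the :name column.
def transpose(columns)
  header_key, *value_keys = columns.keys
  new_keys = columns[header_key].map { |v| v.to_s.to_sym }
  transposed = { name: value_keys.map(&:to_s) }
  new_keys.each_with_index do |key, i|
    # Row i of every original column becomes one new column.
    transposed[key] = value_keys.map { |k| columns[k][i] }
  end
  transposed
end

import_cars = {
  Year: [2021, 2020],
  Audi: [22535, 22304],
  BMW: [35905, 35712]
}

transposed = transpose(import_cars)
```

With this input, `transposed[:name]` holds the manufacturer names and the column for 2021 holds the 2021 values of Audi and BMW, in that order.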
1069
+ ### `to_long(*keep_keys)`
1070
+
1071
+ Creates a 'long' DataFrame.
1072
+
1073
+ - Parameter `keep_keys` specifies the key names to keep.
1074
+
1075
+ ```ruby
1076
+ import_cars.to_long(:Year)
1077
+
1078
+ # =>
1079
+ #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000012750>
1080
+ Year name value
1081
+ <uint16> <dictionary> <uint32>
1082
+ 1 2021 Audi 22535
1083
+ 2 2021 BMW 35905
1084
+ 3 2021 BMW_MINI 18211
1085
+ 4 2021 Mercedes-Benz 51722
1086
+ 5 2021 VW 35215
1087
+ : : : :
1088
+ 23 2017 BMW_MINI 25427
1089
+ 24 2017 Mercedes-Benz 68221
1090
+ 25 2017 VW 49040
1091
+ ```
1092
+
1093
+ - Option `:name` : key of the column which comes **from key names**.
1094
+ - Option `:value` : key of the column which comes **from values**.
1095
+
1096
+ ```ruby
1097
+ import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
1098
+
1099
+ # =>
1100
+ #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000017700>
1101
+ Year Manufacturer Num_of_imported
1102
+ <uint16> <dictionary> <uint32>
1103
+ 1 2021 Audi 22535
1104
+ 2 2021 BMW 35905
1105
+ 3 2021 BMW_MINI 18211
1106
+ 4 2021 Mercedes-Benz 51722
1107
+ 5 2021 VW 35215
1108
+ : : : :
1109
+ 23 2017 BMW_MINI 25427
1110
+ 24 2017 Mercedes-Benz 68221
1111
+ 25 2017 VW 49040
1112
+ ```
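The folding done by `to_long` can be sketched in plain Ruby. This is not Red Amber's implementation: it works on a Hash of `{ key => Array }` and only mirrors the `keep_keys`, `:name` and `:value` parameters described above.

```ruby
# Plain-Ruby sketch (not Red Amber's implementation).
# keep_keys stay as columns; every other column is folded into
# a pair of name/value columns, one output row per original cell.
def to_long(columns, *keep_keys, name: :name, value: :value)
  fold_keys = columns.keys - keep_keys
  n_rows = columns.values.first.size
  long = (keep_keys + [name, value]).to_h { |key| [key, []] }
  n_rows.times do |i|
    fold_keys.each do |key|
      keep_keys.each { |keep| long[keep] << columns[keep][i] }
      long[name] << key.to_s
      long[value] << columns[key][i]
    end
  end
  long
end

import_cars = { Year: [2021, 2020], Audi: [22535, 22304], BMW: [35905, 35712] }

long = to_long(import_cars, :Year, name: :Manufacturer, value: :Num_of_imported)
```

A 2 x 3 input with one kept key yields 4 rows: each year is repeated once per folded column.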
1003
1113
 
1004
- - [ ] each_rows
1114
+ ### `to_wide`
1115
+
1116
+ Creates a 'wide' DataFrame.
1117
+
1118
+ - Option `:name` : key of the column which will be expanded **to key name**.
1119
+ - Option `:value` : key of the column which will be expanded **to values**.
1120
+
1121
+ ```ruby
1122
+ import_cars.to_long(:Year).to_wide
1123
+ # import_cars.to_long(:Year).to_wide(name: :name, value: :value)
1124
+ # is also OK
1125
+
1126
+ # =>
1127
+ #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f0f0>
1128
+ Year Audi BMW BMW_MINI Mercedes-Benz VW
1129
+ <uint16> <uint16> <uint16> <uint16> <uint32> <uint16>
1130
+ 1 2021 22535 35905 18211 51722 35215
1131
+ 2 2020 22304 35712 20196 57041 36576
1132
+ 3 2019 24222 46814 23813 66553 46794
1133
+ 4 2018 26473 50982 25984 67554 51961
1134
+ 5 2017 28336 52527 25427 68221 49040
1135
+ ```
1136
+
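The inverse operation can be sketched the same way. Again this is plain Ruby on a Hash of `{ key => Array }`, not Red Amber's implementation; filling missing cells with `nil` is an assumption of the sketch.

```ruby
# Plain-Ruby sketch (not Red Amber's implementation).
# Distinct entries of the `name` column become new keys, filled from `value`.
def to_wide(columns, name: :name, value: :value)
  keep_keys = columns.keys - [name, value]
  rows = {} # { kept values => { new key => value } }, insertion-ordered
  columns[name].each_index do |i|
    kept = keep_keys.map { |key| columns[key][i] }
    (rows[kept] ||= {})[columns[name][i].to_sym] = columns[value][i]
  end
  new_keys = columns[name].uniq.map(&:to_sym)
  wide = (keep_keys + new_keys).to_h { |key| [key, []] }
  rows.each do |kept, cells|
    keep_keys.zip(kept) { |key, v| wide[key] << v }
    new_keys.each { |key| wide[key] << cells[key] } # nil when a cell is missing
  end
  wide
end

long = {
  Year: [2021, 2021, 2020, 2020],
  name: %w[Audi BMW Audi BMW],
  value: [22535, 35905, 22304, 35712]
}

wide = to_wide(long)
```

Rows that share the same kept values (here, the same `Year`) are merged back into a single row.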
1137
+ ## Combine
1138
+
1139
+ - [ ] Combining dataframes
1140
+
1141
+ - [ ] Join
1142
+
1143
+ ## Encoding
1144
+
1145
+ - [ ] One-hot encoding
data/doc/Vector.md CHANGED
@@ -145,7 +145,7 @@ array[booleans]
145
145
  | ✓ `min_max` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
146
146
  |[ ]`mode` | | [ ] | |[ ] Mode | |
147
147
  | ✓ `product` | ✓ | ✓ | | ✓ ScalarAggregate| |
148
- |[ ]`quantile`| | [ ] | |[ ] Quantile| |
148
+ | `quantile`| || | Quantile|Specify probability in (0..1) by a parameter (default=0.5)|
149
149
  | ✓ `sd ` | | ✓ | | |ddof: 1 at `stddev`|
150
150
  | ✓ `stddev` | | ✓ | | ✓ Variance|ddof: 0 by default|
151
151
  | ✓ `sum` | ✓ | ✓ | | ✓ ScalarAggregate| |
@@ -303,6 +303,10 @@ double.round(n_digits: -1)
303
303
 
304
304
  Returns index of specified element.
305
305
 
306
+ ### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`
307
+
308
+ Returns quantiles for specified probabilities in a DataFrame.
309
+
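For readers unfamiliar with `interpolation: :linear`, the sketch below computes quantiles with the common linear-interpolation rule (position = probability x (n - 1) in the sorted data). It is plain Ruby, not the Arrow kernel that Red Amber calls, and it returns a Hash with hypothetical keys `:probs` and `:quantiles` instead of a DataFrame.

```ruby
# Plain-Ruby sketch of linear-interpolation quantiles (not Red Amber's or
# Arrow's implementation). nil values are skipped; prob must be in 0..1.
def quantile(values, prob = 0.5)
  raise ArgumentError, 'prob must be in 0..1' unless (0..1).cover?(prob)

  sorted = values.compact.sort
  return nil if sorted.empty?

  position = prob * (sorted.size - 1)
  lower = sorted[position.floor]
  upper = sorted[position.ceil]
  # Interpolate between the two neighbouring sorted values.
  lower + (upper - lower) * (position - position.floor)
end

def quantiles(values, probs = [1.0, 0.75, 0.5, 0.25, 0.0])
  { probs: probs, quantiles: probs.map { |prob| quantile(values, prob) } }
end

result = quantiles([1, 2, 3, 4, nil])
```

For the four non-nil values above, the median falls halfway between 2 and 3, and the extreme probabilities return the maximum and minimum.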
306
310
  ### `sort_indexes`, `sort_indices`, `array_sort_indices`
307
311
 
308
312
  ### [ ] `sort`, `sort_by`