red_amber 0.1.8 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 3853e70f378cac65013a3bcfc51a2d55cb70cc494f3f3b70675bed944cc15b49
4
- data.tar.gz: 3c65999cf978f1edf8c2c7fcce9a0ccb192d4da051f34fa0bf3f66ddc178eb1c
3
+ metadata.gz: 73459d02c921fcb0fcb742760e8c882b5491fa5316a79b9016233a516ada013e
4
+ data.tar.gz: ac25e808c5e5d4c13bb1877659550bba532cb5778371e39dfa1f3b9e5a91a4f8
5
5
  SHA512:
6
- metadata.gz: fac66ba0bf5955cfe0d21a51b90ec16407182b9053e9b586dfe9f8e2526de4e90efecdd8eba1e8b3c99b12fc44544c82fb2f6af4b666b97876a64a6ee4deedf1
7
- data.tar.gz: 1a4cc526ce9f097438f2b7d018552a4cd6aaa2d900012297cd1777c4b9e39063cc2988af91c138e93f291a56175aefb6a6b00c211f9b9c5bd38d75d6bc40acb9
6
+ metadata.gz: 1bfa4200d440c338f496fe282816634d6a833e30e17edc87a2cf5ec63866e2bbbaf8796916f1b052ea66482c54a038bbf1445258c2526691e42c2b47be2c39c5
7
+ data.tar.gz: e324e480e6086f7017de58201783c857825b79d0b2e2c8fa2636089cd1c5531e22905a3c0d860f26b833eb6add6ed6017497632bd1ea8fcb932c2d2233b11812
data/.rubocop.yml CHANGED
@@ -61,6 +61,7 @@ Metrics/AbcSize:
61
61
  Max: 30
62
62
  Exclude:
63
63
  - 'lib/red_amber/data_frame_displayable.rb' # Max: 55
64
+ - 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
64
65
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 51
65
66
  - 'lib/red_amber/vector_updatable.rb' # Max: 36
66
67
  - 'lib/red_amber/vector_selectable.rb' # Max: 33
@@ -98,9 +99,10 @@ Metrics/MethodLength:
98
99
  Metrics/ModuleLength:
99
100
  Max: 100
100
101
  Exclude:
102
+ - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
101
103
  - 'lib/red_amber/data_frame_selectable.rb' # Max: 141
104
+ - 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
102
105
  - 'lib/red_amber/vector_functions.rb' # Max: 114
103
- - 'lib/red_amber/data_frame_displayable.rb' # Max: 132
104
106
 
105
107
  # Max: 8
106
108
  Metrics/PerceivedComplexity:
data/CHANGELOG.md CHANGED
@@ -1,6 +1,75 @@
1
- ## [0.1.9] - Unreleased
1
+ ## [0.2.0] - 2022-08-15
2
2
 
3
- - Supports Arrow 9.0.0
3
+ - Bump version up to 0.2.0
4
+
5
+ - Bug fixes
6
+
7
+ - Fix order of multiple group keys (#55)
8
+
9
+ Only 1 group key comes to left. Other keys remain in right.
10
+
11
+ - Remove optional `require` for rover (#55)
12
+
13
+ Fix DataFrame.new for argument with Rover::DataFrame.
14
+
15
+ - Fix occasional failure in CI (#59)
16
+
17
+ Sometimes the CI test fails. I added -dev dependency
18
+ in Arrow install by apt, not doing in bundler.
19
+
20
+ - Fix calling :take in V#[] (#56)
21
+
22
+ Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This will prevent the error below
23
+ when called with Arrow::ChunkedArray.
24
+
25
+ - Raise error renaming non existing key (#61)
26
+
27
+ Add error when specified key is not exist.
28
+
29
+ - Fix DataFrame#rename #assign by array (#65)
30
+
31
+ - New features and improvements
32
+
33
+ - Support Arrow 9.0.0
34
+ - Upgrade to Arrow 9.0.0 (#59)
35
+ - Add Vector#quantile method (#59)
36
+ Arrow::QuantileOptions has supported in Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
37
+
38
+ - Add Vector#quantiles (#62)
39
+
40
+ - Add DataFrame#each_row (#56)
41
+ - Returns Enumerator if block is not given.
42
+ - Change DataFrame#each_row to return a Hash {key => row} (#63)
43
+
44
+ - Refactor to use pattern match in overloaded parameter parsing (#61)
45
+ - Refine DataFrame.new to use pattern match
46
+ - Use pattern match in DataFrame#assign
47
+ - Use pattern match in DataFrame#rename
48
+
49
+ - Accept Array for renamer/assigner in #rename/#assign (#61)
50
+ - Accept assigner by Arrays in DataFrame#assign
51
+ - Accept renamer pairs by Arrays in DataFrame#rename
52
+ - Add DataFrame#assign_left method
53
+
54
+ - Add summary/describe (#62)
55
+ - Introduce DataFrame#summary(#describe)
56
+
57
+ - Introduce reshaping methods for DataFrame (#64)
58
+ - Introduce DataFrame#transpose method
59
+ - Intorduce DataFrame#to_long method
60
+ - Intorduce DataFrame#to_wide method
61
+
62
+ - Others
63
+
64
+ - Add alias sort_index for array_sort_indices (#59)
65
+ - Enable :width option in DataFrame#to_s (#62)
66
+ - Add options to DataFrame#format_table (#62)
67
+
68
+ - Update Documents
69
+
70
+ - Add Yard doc for some methods
71
+
72
+ - Update Jupyter notebook '61 Examples of Red Amber' (#65)
4
73
 
5
74
  ## [0.1.8] - 2022-08-04 (experimental)
6
75
 
data/Gemfile CHANGED
@@ -7,7 +7,7 @@ gemspec
7
7
  group :test do
8
8
  gem 'rake'
9
9
 
10
- gem 'red-parquet', '>= 8.0.0'
10
+ gem 'red-parquet', '>= 9.0.0'
11
11
  gem 'rover-df', '~> 0.3.0'
12
12
 
13
13
  gem 'rubocop'
data/README.md CHANGED
@@ -3,17 +3,23 @@
3
3
  [![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
4
4
  [![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
5
5
 
6
- A simple dataframe library for Ruby (experimental).
6
+ A simple dataframe library for Ruby.
7
7
 
8
8
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
9
9
  - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
10
10
 
11
11
  ## Requirements
12
12
 
13
+ Supported Ruby version is >= 2.7.
14
+
15
+ Since v0.2.0, this library uses pattern matching which is an experimental feature in 2.7 . It is usable but a warning message will be shown in 2.7 .
16
+ I recommend Ruby 3 for performance.
17
+
13
18
  ```ruby
14
- gem 'red-arrow', '>= 8.0.0'
19
+ # Libraries required
20
+ gem 'red-arrow', '>= 9.0.0'
15
21
 
16
- gem 'red-parquet', '>= 8.0.0' # Optional, if you use IO from/to parquet
22
+ gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
17
23
  gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
18
24
  ```
19
25
 
@@ -21,9 +27,9 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
21
27
 
22
28
  Install requirements before you install Red Amber.
23
29
 
24
- - Apache Arrow GLib (>= 8.0.0)
30
+ - Apache Arrow GLib (>= 9.0.0)
25
31
 
26
- - Apache Parquet GLib (>= 8.0.0) # If you use IO from/to parquet
32
+ - Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
27
33
 
28
34
  See [Apache Arrow install document](https://arrow.apache.org/install/).
29
35
 
@@ -122,22 +128,22 @@ df = df.drop(true, true, false)
122
128
 
123
129
  # =>
124
130
  #<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
125
- body_mass_g
126
- <uint16>
127
- 1 3750
128
- 2 3800
129
- 3 3250
130
- 4 (nil)
131
- 5 3450
132
- : :
133
- 342 5750
134
- 343 5200
131
+ body_mass_g
132
+ <uint16>
133
+ 1 3750
134
+ 2 3800
135
+ 3 3250
136
+ 4 (nil)
137
+ 5 3450
138
+ : :
139
+ 342 5750
140
+ 343 5200
135
141
  344 5400
136
142
  ```
137
143
 
138
144
  Arrow data is immutable, so these methods always return an new object.
139
145
 
140
- `DataFrame#assign` creates new variables (column in the table).
146
+ `DataFrame#assign` creates new columns or update existing columns.
141
147
 
142
148
  ![assign method image](doc/image/dataframe/assign.png)
143
149
 
@@ -208,7 +214,7 @@ penguins.remove(penguins[:bill_length_mm] < 40)
208
214
 
209
215
  DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
210
216
 
211
- This example is usage of block to update numeric columns.
217
+ This example is usage of block to update a column.
212
218
 
213
219
  ```ruby
214
220
  df = RedAmber::DataFrame.new(
@@ -229,30 +235,28 @@ df
229
235
  5 (nil) (nil) (nil) (nil)
230
236
 
231
237
  df.assign do
232
- vectors.each_with_object({}) do |v, h|
233
- h[v.key] = -v if v.numeric?
234
- end
238
+ vectors.select(&:float?).map { |v| [v.key, -v] }
239
+ # => returns [[:float], [-0.0, -1.1, -2.2, NAN, nil]]
235
240
  end
236
241
 
237
242
  # =>
238
- #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000009a1b4>
239
- integer float string boolean
240
- <uint8> <double> <string> <boolean>
241
- 1 0 -0.0 A true
242
- 2 255 -1.1 B false
243
- 3 254 -2.2 C true
244
- 4 253 NaN D false
245
- 5 (nil) (nil) (nil) (nil)
243
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
244
+ index float string
245
+ <uint8> <double> <string>
246
+ 1 0 -0.0 A
247
+ 2 1 -1.1 B
248
+ 3 2 -2.2 C
249
+ 4 3 NaN D
250
+ 5 (nil) (nil) (nil)
246
251
  ```
247
252
 
248
- Negate (-@) method of unsigned integer Vector returns complement.
249
-
250
- Next example is to eliminate observations (row in the table) containing nil.
253
+ Next example is to eliminate rows containing nil.
251
254
 
252
255
  ```ruby
253
256
  # remove all observations containing nil
254
257
  nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
255
258
  nil_removed.tdr
259
+
256
260
  # =>
257
261
  RedAmber::DataFrame : 342 x 8 Vectors
258
262
  Vectors : 5 numeric, 3 strings
@@ -273,6 +277,21 @@ For this frequently needed task, we can do it much simpler.
273
277
  penguins.remove_nil # => same result as above
274
278
  ```
275
279
 
280
+ `DataFrame#summary` shows summary statistics in a DataFrame.
281
+
282
+ ```ruby
283
+ puts penguins.summary.to_s(width: 82)
284
+
285
+ # =>
286
+ variables count mean std min 25% median 75% max
287
+ <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
288
+ 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
289
+ 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
290
+ 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
291
+ 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
292
+ 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
293
+ ```
294
+
276
295
  `DataFrame#group` method can be used for the grouping tasks.
277
296
 
278
297
  ```ruby
@@ -311,7 +330,7 @@ grouped.slice { v(:count) > 1 }
311
330
  9 Kaminoan 2 221.0 88.0
312
331
  ```
313
332
 
314
- See [DataFrame.md](doc/DataFrame.md) for details.
333
+ See [DataFrame.md](doc/DataFrame.md) for other examples and details.
315
334
 
316
335
 
317
336
  ## `RedAmber::Vector`
@@ -355,7 +374,7 @@ See [Vector.md](doc/Vector.md) for details.
355
374
 
356
375
  ## Jupyter notebook
357
376
 
358
- [53 Examples of Red Amber](doc/examples_of_red_amber.ipynb)
377
+ [61 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in jupyter notebook.
359
378
 
360
379
  ## Development
361
380
 
@@ -366,6 +385,12 @@ bundle install
366
385
  bundle exec rake test
367
386
  ```
368
387
 
388
+ I will appreciate if you could help to improve this project. Here are a few ways you can help:
389
+
390
+ - [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
391
+ - Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
392
+ - Write, clarify, or fix documentation
393
+
369
394
  ## License
370
395
 
371
396
  The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
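
The README and CHANGELOG changes above say that 0.2.0 parses overloaded parameters with pattern matching, and that `rename`/`assign` now accept either a Hash or an Array of pairs. The following is a minimal sketch of that idea in plain Ruby. It is not RedAmber's implementation; the helper name `normalize_key_pairs` and the conversion of keys to Symbols are assumptions made for illustration.

```ruby
# Hypothetical helper, not taken from the gem: normalize a renamer given
# either as a Hash or as an Array of [existing_key, new_key] pairs.
def normalize_key_pairs(arg)
  case arg
  in Hash => pairs
    # { existing_key => new_key }
    pairs.to_h { |old, new| [old.to_sym, new.to_sym] }
  in [[_, _], *] => pairs
    # [[existing_key, new_key], ...]; only the first pair is checked here
    pairs.to_h { |old, new| [old.to_sym, new.to_sym] }
  in [] | nil
    # Empty or missing input yields no renaming at all
    {}
  else
    raise ArgumentError, "unsupported key pairs: #{arg.inspect}"
  end
end

from_hash  = normalize_key_pairs({ 'name' => 'n' })
from_array = normalize_key_pairs([[:age, :a], ['x', :y]])
```

On Ruby 2.7 a `case`/`in` expression like this one prints the "experimental" warning mentioned in the README; on Ruby 3 it runs without a warning.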
data/doc/DataFrame.md CHANGED
@@ -167,6 +167,11 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
167
167
 
168
168
  If you need a column-oriented full array, use `.to_h.to_a`
169
169
 
170
+ ### `each_row`
171
+
172
+ Yield each row in a `{ key => row}` Hash.
173
+ Returns Enumerator if block is not given.
174
+
170
175
  ### `schema`
171
176
 
172
177
  - Returns column name and data type in a Hash.
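
The `each_row` entry added in this hunk describes two behaviours: each row is yielded as a `{ key => row }` Hash, and an Enumerator is returned when no block is given. The sketch below shows that contract on a plain Ruby stand-in. `TinyFrame` is a hypothetical class invented for this example and does not use Arrow or RedAmber.

```ruby
# Hypothetical stand-in for a DataFrame: a Hash of column name => Array.
class TinyFrame
  def initialize(columns)
    @columns = columns
  end

  # Yields one Hash per row; returns an Enumerator when no block is given.
  def each_row
    return enum_for(:each_row) unless block_given?

    size = @columns.values.map(&:size).max || 0
    size.times do |i|
      # Pick the i-th value of every column and keep the column keys
      yield @columns.transform_values { |values| values[i] }
    end
  end
end

frame = TinyFrame.new({ name: %w[Yasuko Rui Hinata], age: [68, 49, 28] })
rows = frame.each_row.to_a
```

Returning `enum_for(:each_row)` is the usual Ruby idiom for this contract, so callers can chain `map`, `select` or `with_index` without passing a block.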
@@ -202,7 +207,22 @@ puts penguins.to_s
202
207
  `inspect` uses `to_s` output and also shows shape and object_id.
203
208
 
204
209
 
205
- ### `summary`, `describe` (not implemented)
210
+ ### `summary`, `describe`
211
+
212
+ `DataFrame#summary` or `DataFrame#describe` shows summary statistics in a DataFrame.
213
+
214
+ ```ruby
215
+ puts penguins.summary.to_s(width: 82) # needs more width to show all stats in this example
216
+
217
+ # =>
218
+ variables count mean std min 25% median 75% max
219
+ <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
220
+ 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
221
+ 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
222
+ 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
223
+ 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
224
+ 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
225
+ ```
206
226
 
207
227
  ### `to_rover`
208
228
 
@@ -704,7 +724,7 @@ penguins.to_rover
704
724
 
705
725
  - Key pairs as arguments
706
726
 
707
- `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
727
+ `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`.
708
728
 
709
729
  ```ruby
710
730
  df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
@@ -721,7 +741,11 @@ penguins.to_rover
721
741
 
722
742
  - Key pairs by a block
723
743
 
724
- `rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}`. Block is called in the context of self.
744
+ `rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`. Block is called in the context of self.
745
+
746
+ - Not existing keys
747
+
748
+ If specified `existing_key` is not exist, raise a `DataFrameArgumentError`.
725
749
 
726
750
  - Key type
727
751
 
@@ -729,16 +753,16 @@ penguins.to_rover
729
753
 
730
754
  ### `assign`
731
755
 
732
- Assign new or updated variables (columns) and create a updated DataFrame.
756
+ Assign new or updated columns (variables) and create a updated DataFrame.
733
757
 
734
- - Variables with new keys will append new variables at bottom (right in the table).
758
+ - Variables with new keys will append new columns from the right.
735
759
  - Variables with exisiting keys will update corresponding vectors.
736
760
 
737
761
  ![assign method image](doc/../image/dataframe/assign.png)
738
762
 
739
763
  - Variables as arguments
740
764
 
741
- `assign(key_pairs)` accepts pairs of key and values as arguments. key_pairs should be a Hash of `{key => array}` or `{key => Vector}`.
765
+ `assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is ether `Vector`, `Array` or `Arrow::Array`.
742
766
 
743
767
  ```ruby
744
768
  df = RedAmber::DataFrame.new(
@@ -769,7 +793,7 @@ penguins.to_rover
769
793
 
770
794
  - Key pairs by a block
771
795
 
772
- `assign {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return pairs of key and values as a Hash of `{key => array}` or `{key => Vector}`. Block is called in the context of self.
796
+ `assign {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return pairs of key and values as a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is ether `Vector`, `Array` or `Arrow::Array`. The block is called in the context of self.
773
797
 
774
798
  ```ruby
775
799
  df = RedAmber::DataFrame.new(
@@ -788,29 +812,27 @@ penguins.to_rover
788
812
  4 3 NaN D
789
813
  5 (nil) (nil) (nil)
790
814
 
791
- # update numeric variables
815
+ # update :float
816
+ # assigner by an Array
792
817
  df.assign do
793
- assigner = {}
794
- vectors.each_with_index do |v, i|
795
- assigner[keys[i]] = v * -1 if v.numeric?
796
- end
797
- assigner
818
+ vectors.select(&:float?)
819
+ .map { |v| [v.key, -v] }
798
820
  end
799
821
 
800
822
  # =>
801
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000006e000>
802
- index float string
803
- <int8> <double> <string>
804
- 1 0 -0.0 A
805
- 2 -1 -1.1 B
806
- 3 -2 -2.2 C
807
- 4 -3 NaN D
808
- 5 (nil) (nil) (nil)
809
-
810
- # Or it ’s shorter like this:
823
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000dfffc>
824
+ index float string
825
+ <uint8> <double> <string>
826
+ 1 0 -0.0 A
827
+ 2 1 -1.1 B
828
+ 3 2 -2.2 C
829
+ 4 3 NaN D
830
+ 5 (nil) (nil) (nil)
831
+
832
+ # Or we can use assigner by a Hash
811
833
  df.assign do
812
- variables.select.with_object({}) do |(key, vector), assigner|
813
- assigner[key] = vector * -1 if vector.numeric?
834
+ vectors.select.with_object({}) do |v, assigner|
835
+ assigner[v.key] = -v if v.float?
814
836
  end
815
837
  end
816
838
 
@@ -821,6 +843,28 @@ penguins.to_rover
821
843
 
822
844
  Symbol key and String key are considered as the same key.
823
845
 
846
+ - Empty assignment
847
+
848
+ If assigner is empty or nil, returns self.
849
+
850
+ - Append from left
851
+
852
+ `assign_left` method accepts the same parameters and block as `assign`, but append new columns from leftside.
853
+
854
+ ```ruby
855
+ df.assign_left(new_index: [1, 2, 3, 4, 5])
856
+
857
+ # =>
858
+ #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000001787c>
859
+ new_index index float string
860
+ <uint8> <uint8> <double> <string>
861
+ 1 1 0 0.0 A
862
+ 2 2 1 1.1 B
863
+ 3 3 2 2.2 C
864
+ 4 4 3 NaN D
865
+ 5 5 (nil) (nil) (nil)
866
+ ```
867
+
824
868
  ## Updating
825
869
 
826
870
  ### `sort`
@@ -933,17 +977,17 @@ penguins.to_rover
933
977
  starwars.group(:species).count(:species)
934
978
 
935
979
  # =>
936
- #<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
937
- species count
938
- <string> <int64>
939
- 1 Human 35
940
- 2 Droid 6
941
- 3 Wookiee 2
942
- 4 Rodian 1
943
- 5 Hutt 1
944
- : : :
945
- 36 Kaleesh 1
946
- 37 Pau'an 1
980
+ #<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
981
+ species count
982
+ <string> <int64>
983
+ 1 Human 35
984
+ 2 Droid 6
985
+ 3 Wookiee 2
986
+ 4 Rodian 1
987
+ 5 Hutt 1
988
+ : : :
989
+ 36 Kaleesh 1
990
+ 37 Pau'an 1
947
991
  38 Kel Dor 1
948
992
  ```
949
993
 
@@ -953,17 +997,17 @@ penguins.to_rover
953
997
  grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
954
998
 
955
999
  # =>
956
- #<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
957
- species count mean(height) mean(mass)
958
- <string> <int64> <double> <double>
959
- 1 Human 35 176.6 82.8
960
- 2 Droid 6 131.2 69.8
961
- 3 Wookiee 2 231.0 124.0
962
- 4 Rodian 1 173.0 74.0
963
- 5 Hutt 1 175.0 1358.0
964
- : : : : :
965
- 36 Kaleesh 1 216.0 159.0
966
- 37 Pau'an 1 206.0 80.0
1000
+ #<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
1001
+ species count mean(height) mean(mass)
1002
+ <string> <int64> <double> <double>
1003
+ 1 Human 35 176.6 82.8
1004
+ 2 Droid 6 131.2 69.8
1005
+ 3 Wookiee 2 231.0 124.0
1006
+ 4 Rodian 1 173.0 74.0
1007
+ 5 Hutt 1 175.0 1358.0
1008
+ : : : : :
1009
+ 36 Kaleesh 1 216.0 159.0
1010
+ 37 Pau'an 1 206.0 80.0
967
1011
  38 Kel Dor 1 188.0 80.0
968
1012
  ```
969
1013
 
@@ -987,18 +1031,115 @@ penguins.to_rover
987
1031
  9 Kaminoan 2 221.0 88.0
988
1032
  ```
989
1033
 
990
- ## Combining DataFrames
1034
+ ## Reshape
991
1035
 
992
- - [ ] Combining rows to a dataframe
1036
+ ### `transpose`
993
1037
 
994
- - [ ] Inner join
1038
+ Creates transposed DataFrame for wide type dataframe.
995
1039
 
996
- - [ ] Left join
1040
+ ```ruby
1041
+ import_cars = RedAmber::DataFrame.load('test/entity/import_cars.tsv')
997
1042
 
998
- ## Encoding
1043
+ # =>
1044
+ #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000d520>
1045
+ Year Audi BMW BMW_MINI Mercedes-Benz VW
1046
+ <int64> <int64> <int64> <int64> <int64> <int64>
1047
+ 1 2021 22535 35905 18211 51722 35215
1048
+ 2 2020 22304 35712 20196 57041 36576
1049
+ 3 2019 24222 46814 23813 66553 46794
1050
+ 4 2018 26473 50982 25984 67554 51961
1051
+ 5 2017 28336 52527 25427 68221 49040
999
1052
 
1000
- - [ ] One-hot encoding
1053
+ import_cars.transpose
1001
1054
 
1002
- ## Iteration
1055
+ # =>
1056
+ #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000ef74>
1057
+ name 2021 2020 2019 2018 2017
1058
+ <dictionary> <uint16> <uint16> <uint32> <uint32> <uint32>
1059
+ 1 Audi 22535 22304 24222 26473 28336
1060
+ 2 BMW 35905 35712 46814 50982 52527
1061
+ 3 BMW_MINI 18211 20196 23813 25984 25427
1062
+ 4 Mercedes-Benz 51722 57041 66553 67554 68221
1063
+ 5 VW 35215 36576 46794 51961 49040
1064
+ ```
1065
+
1066
+ The leftmost column is created by original keys. Key name of the column is
1067
+ named by 'name'.
1068
+
1069
+ ### `to_long(*keep_keys)`
1070
+
1071
+ Creates a 'long' DataFrame.
1072
+
1073
+ - Parameter `keep_keys` specifies the key names to keep.
1074
+
1075
+ ```ruby
1076
+ import_cars.to_long(:Year)
1077
+
1078
+ # =>
1079
+ #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000012750>
1080
+ Year name value
1081
+ <uint16> <dictionary> <uint32>
1082
+ 1 2021 Audi 22535
1083
+ 2 2021 BMW 35905
1084
+ 3 2021 BMW_MINI 18211
1085
+ 4 2021 Mercedes-Benz 51722
1086
+ 5 2021 VW 35215
1087
+ : : : :
1088
+ 23 2017 BMW_MINI 25427
1089
+ 24 2017 Mercedes-Benz 68221
1090
+ 25 2017 VW 49040
1091
+ ```
1092
+
1093
+ - Option `:name` : key of the column which is come **from key names**.
1094
+ - Option `:value` : key of the column which is come **from values**.
1095
+
1096
+ ```ruby
1097
+ import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
1098
+
1099
+ # =>
1100
+ #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000017700>
1101
+ Year Manufacturer Num_of_imported
1102
+ <uint16> <dictionary> <uint32>
1103
+ 1 2021 Audi 22535
1104
+ 2 2021 BMW 35905
1105
+ 3 2021 BMW_MINI 18211
1106
+ 4 2021 Mercedes-Benz 51722
1107
+ 5 2021 VW 35215
1108
+ : : : :
1109
+ 23 2017 BMW_MINI 25427
1110
+ 24 2017 Mercedes-Benz 68221
1111
+ 25 2017 VW 49040
1112
+ ```
1003
1113
 
1004
- - [ ] each_rows
1114
+ ### `to_wide`
1115
+
1116
+ Creates a 'wide' DataFrame.
1117
+
1118
+ - Option `:name` : key of the column which will be expanded **to key name**.
1119
+ - Option `:value` : key of the column which will be expanded **to values**.
1120
+
1121
+ ```ruby
1122
+ import_cars.to_long(:Year).to_wide
1123
+ # import_cars.to_long(:Year).to_wide(name: :name, value: :value)
1124
+ # is also OK
1125
+
1126
+ # =>
1127
+ #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f0f0>
1128
+ Year Audi BMW BMW_MINI Mercedes-Benz VW
1129
+ <uint16> <uint16> <uint16> <uint16> <uint32> <uint16>
1130
+ 1 2021 22535 35905 18211 51722 35215
1131
+ 2 2020 22304 35712 20196 57041 36576
1132
+ 3 2019 24222 46814 23813 66553 46794
1133
+ 4 2018 26473 50982 25984 67554 51961
1134
+ 5 2017 28336 52527 25427 68221 49040
1135
+ ```
1136
+
1137
+ ## Combine
1138
+
1139
+ - [ ] Combining dataframes
1140
+
1141
+ - [ ] Join
1142
+
1143
+ ## Encoding
1144
+
1145
+ - [ ] One-hot encoding
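
The `to_long` and `to_wide` sections added above describe inverse reshaping operations. As a rough illustration of what the `keep_keys`, `:name` and `:value` parameters mean, here is a sketch on plain Arrays of row Hashes. The two helper functions are hypothetical and support a single keep key only; they are not the gem's code. The sample rows are the 2021 and 2020 Audi and BMW figures from the `import_cars` table.

```ruby
# Hypothetical helpers on Arrays of row Hashes; not RedAmber's implementation.

# Every column except keep_key becomes one (name, value) row.
def to_long(rows, keep_key, name: :name, value: :value)
  rows.flat_map do |row|
    row.reject { |key, _| key == keep_key }.map do |key, val|
      { keep_key => row[keep_key], name => key, value => val }
    end
  end
end

# Rows sharing the same keep_key value are folded back into one wide row.
def to_wide(rows, keep_key, name: :name, value: :value)
  rows.group_by { |row| row[keep_key] }.map do |kept, group|
    group.each_with_object({ keep_key => kept }) do |row, wide|
      wide[row[name]] = row[value]
    end
  end
end

wide = [
  { Year: 2021, Audi: 22535, BMW: 35905 },
  { Year: 2020, Audi: 22304, BMW: 35712 }
]
long = to_long(wide, :Year)
round_trip = to_wide(long, :Year)
```

In this sketch `to_wide(to_long(rows, key), key)` gives back the original rows, which mirrors the `import_cars.to_long(:Year).to_wide` round trip shown in the documentation.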
data/doc/Vector.md CHANGED
@@ -145,7 +145,7 @@ array[booleans]
145
145
  | ✓ `min_max` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
146
146
  |[ ]`mode` | | [ ] | |[ ] Mode | |
147
147
  | ✓ `product` | ✓ | ✓ | | ✓ ScalarAggregate| |
148
- |[ ]`quantile`| | [ ] | |[ ] Quantile| |
148
+ | ✓ `quantile`| | ✓ | | ✓ Quantile|Specify probability in (0..1) by a parameter (default=0.5)|
149
149
  | ✓ `sd ` | | ✓ | | |ddof: 1 at `stddev`|
150
150
  | ✓ `stddev` | | ✓ | | ✓ Variance|ddof: 0 by default|
151
151
  | ✓ `sum` | ✓ | ✓ | | ✓ ScalarAggregate| |
@@ -303,6 +303,10 @@ double.round(n_digits: -1)
303
303
 
304
304
  Returns index of specified element.
305
305
 
306
+ ### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`
307
+
308
+ Returns quantiles for specified probabilities in a DataFrame.
309
+
306
310
  ### `sort_indexes`, `sort_indices`, `array_sort_indices`
307
311
 
308
312
  ### [ ] `sort`, `sort_by`
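
The `quantiles` signature added above defaults to `interpolation: :linear` and `skip_nils: true`. The sketch below shows one common definition of a linearly interpolated quantile in plain Ruby: take position `prob * (n - 1)` in the sorted values and interpolate between its two neighbours. The helper `quantile_linear` is hypothetical, and whether it matches Arrow's result in every edge case is an assumption that has not been checked against the library.

```ruby
# Hypothetical helper, not RedAmber's or Arrow's code: quantile with linear
# interpolation between the two nearest sorted values, skipping nils.
def quantile_linear(values, prob = 0.5)
  raise ArgumentError, 'prob must be in 0..1' unless (0..1).cover?(prob)

  sorted = values.compact.sort
  return nil if sorted.empty?

  position = prob * (sorted.size - 1) # fractional index into sorted
  lower = position.floor
  upper = position.ceil
  sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower)
end

data = [1, 2, nil, 3, 4]
probs = [1.0, 0.75, 0.5, 0.25, 0.0]
quartiles = probs.map { |prob| quantile_linear(data, prob) }
# => [4.0, 3.25, 2.5, 1.75, 1.0]
```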