red_amber 0.1.8 → 0.2.0
- checksums.yaml +4 -4
- data/.rubocop.yml +3 -1
- data/CHANGELOG.md +71 -2
- data/Gemfile +1 -1
- data/README.md +58 -33
- data/doc/DataFrame.md +196 -55
- data/doc/Vector.md +5 -1
- data/doc/examples_of_red_amber.ipynb +1677 -348
- data/lib/red_amber/data_frame.rb +92 -15
- data/lib/red_amber/data_frame_displayable.rb +25 -10
- data/lib/red_amber/data_frame_reshaping.rb +85 -0
- data/lib/red_amber/data_frame_variable_operation.rb +89 -40
- data/lib/red_amber/group.rb +5 -1
- data/lib/red_amber/vector_functions.rb +46 -1
- data/lib/red_amber/vector_selectable.rb +1 -1
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +1 -1
- data/red_amber.gemspec +1 -1
- metadata +5 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 73459d02c921fcb0fcb742760e8c882b5491fa5316a79b9016233a516ada013e
|
4
|
+
data.tar.gz: ac25e808c5e5d4c13bb1877659550bba532cb5778371e39dfa1f3b9e5a91a4f8
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 1bfa4200d440c338f496fe282816634d6a833e30e17edc87a2cf5ec63866e2bbbaf8796916f1b052ea66482c54a038bbf1445258c2526691e42c2b47be2c39c5
|
7
|
+
data.tar.gz: e324e480e6086f7017de58201783c857825b79d0b2e2c8fa2636089cd1c5531e22905a3c0d860f26b833eb6add6ed6017497632bd1ea8fcb932c2d2233b11812
|
data/.rubocop.yml
CHANGED
@@ -61,6 +61,7 @@ Metrics/AbcSize:
|
|
61
61
|
Max: 30
|
62
62
|
Exclude:
|
63
63
|
- 'lib/red_amber/data_frame_displayable.rb' # Max: 55
|
64
|
+
- 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
|
64
65
|
- 'lib/red_amber/data_frame_selectable.rb' # Max: 51
|
65
66
|
- 'lib/red_amber/vector_updatable.rb' # Max: 36
|
66
67
|
- 'lib/red_amber/vector_selectable.rb' # Max: 33
|
@@ -98,9 +99,10 @@ Metrics/MethodLength:
|
|
98
99
|
Metrics/ModuleLength:
|
99
100
|
Max: 100
|
100
101
|
Exclude:
|
102
|
+
- 'lib/red_amber/data_frame_displayable.rb' # Max: 132
|
101
103
|
- 'lib/red_amber/data_frame_selectable.rb' # Max: 141
|
104
|
+
- 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
|
102
105
|
- 'lib/red_amber/vector_functions.rb' # Max: 114
|
103
|
-
- 'lib/red_amber/data_frame_displayable.rb' # Max: 132
|
104
106
|
|
105
107
|
# Max: 8
|
106
108
|
Metrics/PerceivedComplexity:
|
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,75 @@
|
|
1
|
-
## [0.
|
1
|
+
## [0.2.0] - 2022-08-15
|
2
2
|
|
3
|
-
-
|
3
|
+
- Bump version up to 0.2.0
|
4
|
+
|
5
|
+
- Bug fixes
|
6
|
+
|
7
|
+
- Fix order of multiple group keys (#55)
|
8
|
+
|
9
|
+
Only 1 group key comes to left. Other keys remain in right.
|
10
|
+
|
11
|
+
- Remove optional `require` for rover (#55)
|
12
|
+
|
13
|
+
Fix DataFrame.new for argument with Rover::DataFrame.
|
14
|
+
|
15
|
+
- Fix occasional failure in CI (#59)
|
16
|
+
|
17
|
+
Sometimes the CI test fails. I added -dev dependency
|
18
|
+
in Arrow install by apt, not doing in bundler.
|
19
|
+
|
20
|
+
- Fix calling :take in V#[] (#56)
|
21
|
+
|
22
|
+
Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This will prevent the error below
|
23
|
+
when called with Arrow::ChunkedArray.
|
24
|
+
|
25
|
+
- Raise error renaming non existing key (#61)
|
26
|
+
|
27
|
+
Raise an error when the specified key does not exist.
|
28
|
+
|
29
|
+
- Fix DataFrame#rename #assign by array (#65)
|
30
|
+
|
31
|
+
- New features and improvements
|
32
|
+
|
33
|
+
- Support Arrow 9.0.0
|
34
|
+
- Upgrade to Arrow 9.0.0 (#59)
|
35
|
+
- Add Vector#quantile method (#59)
|
36
|
+
Arrow::QuantileOptions is supported in Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
|
37
|
+
|
38
|
+
- Add Vector#quantiles (#62)
|
39
|
+
|
40
|
+
- Add DataFrame#each_row (#56)
|
41
|
+
- Returns Enumerator if block is not given.
|
42
|
+
- Change DataFrame#each_row to return a Hash {key => row} (#63)
|
43
|
+
|
44
|
+
- Refactor to use pattern match in overloaded parameter parsing (#61)
|
45
|
+
- Refine DataFrame.new to use pattern match
|
46
|
+
- Use pattern match in DataFrame#assign
|
47
|
+
- Use pattern match in DataFrame#rename
|
48
|
+
|
49
|
+
- Accept Array for renamer/assigner in #rename/#assign (#61)
|
50
|
+
- Accept assigner by Arrays in DataFrame#assign
|
51
|
+
- Accept renamer pairs by Arrays in DataFrame#rename
|
52
|
+
- Add DataFrame#assign_left method
|
53
|
+
|
54
|
+
- Add summary/describe (#62)
|
55
|
+
- Introduce DataFrame#summary(#describe)
|
56
|
+
|
57
|
+
- Introduce reshaping methods for DataFrame (#64)
|
58
|
+
- Introduce DataFrame#transpose method
|
59
|
+
- Introduce DataFrame#to_long method
|
60
|
+
- Introduce DataFrame#to_wide method
|
61
|
+
|
62
|
+
- Others
|
63
|
+
|
64
|
+
- Add alias sort_index for array_sort_indices (#59)
|
65
|
+
- Enable :width option in DataFrame#to_s (#62)
|
66
|
+
- Add options to DataFrame#format_table (#62)
|
67
|
+
|
68
|
+
- Update Documents
|
69
|
+
|
70
|
+
- Add Yard doc for some methods
|
71
|
+
|
72
|
+
- Update Jupyter notebook '61 Examples of Red Amber' (#65)
|
4
73
|
|
5
74
|
## [0.1.8] - 2022-08-04 (experimental)
|
6
75
|
|
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -3,17 +3,23 @@
|
|
3
3
|
[![Gem Version](https://badge.fury.io/rb/red_amber.svg)](https://badge.fury.io/rb/red_amber)
|
4
4
|
[![Ruby](https://github.com/heronshoes/red_amber/actions/workflows/test.yml/badge.svg)](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
|
5
5
|
|
6
|
-
A simple dataframe library for Ruby
|
6
|
+
A simple dataframe library for Ruby.
|
7
7
|
|
8
8
|
- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [![Gitter Chat](https://badges.gitter.im/red-data-tools/en.svg)](https://gitter.im/red-data-tools/en)
|
9
9
|
- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
|
10
10
|
|
11
11
|
## Requirements
|
12
12
|
|
13
|
+
Supported Ruby version is >= 2.7.
|
14
|
+
|
15
|
+
Since v0.2.0, this library uses pattern matching, which is an experimental feature in Ruby 2.7. It is usable, but a warning message will be shown in 2.7.
|
16
|
+
I recommend Ruby 3 for performance.
|
17
|
+
|
13
18
|
```ruby
|
14
|
-
|
19
|
+
# Libraries required
|
20
|
+
gem 'red-arrow', '>= 9.0.0'
|
15
21
|
|
16
|
-
gem 'red-parquet', '>=
|
22
|
+
gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
|
17
23
|
gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
18
24
|
```
|
19
25
|
|
@@ -21,9 +27,9 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
|
21
27
|
|
22
28
|
Install requirements before you install Red Amber.
|
23
29
|
|
24
|
-
- Apache Arrow GLib (>=
|
30
|
+
- Apache Arrow GLib (>= 9.0.0)
|
25
31
|
|
26
|
-
- Apache Parquet GLib (>=
|
32
|
+
- Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
|
27
33
|
|
28
34
|
See [Apache Arrow install document](https://arrow.apache.org/install/).
|
29
35
|
|
@@ -122,22 +128,22 @@ df = df.drop(true, true, false)
|
|
122
128
|
|
123
129
|
# =>
|
124
130
|
#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
|
125
|
-
body_mass_g
|
126
|
-
<uint16>
|
127
|
-
1 3750
|
128
|
-
2 3800
|
129
|
-
3 3250
|
130
|
-
4 (nil)
|
131
|
-
5 3450
|
132
|
-
: :
|
133
|
-
342 5750
|
134
|
-
343 5200
|
131
|
+
body_mass_g
|
132
|
+
<uint16>
|
133
|
+
1 3750
|
134
|
+
2 3800
|
135
|
+
3 3250
|
136
|
+
4 (nil)
|
137
|
+
5 3450
|
138
|
+
: :
|
139
|
+
342 5750
|
140
|
+
343 5200
|
135
141
|
344 5400
|
136
142
|
```
|
137
143
|
|
138
144
|
Arrow data is immutable, so these methods always return a new object.
|
139
145
|
|
140
|
-
`DataFrame#assign` creates new
|
146
|
+
`DataFrame#assign` creates new columns or updates existing columns.
|
141
147
|
|
142
148
|
![assign method image](doc/image/dataframe/assign.png)
|
143
149
|
|
@@ -208,7 +214,7 @@ penguins.remove(penguins[:bill_length_mm] < 40)
|
|
208
214
|
|
209
215
|
DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
|
210
216
|
|
211
|
-
This example is usage of block to update
|
217
|
+
This example uses a block to update a column.
|
212
218
|
|
213
219
|
```ruby
|
214
220
|
df = RedAmber::DataFrame.new(
|
@@ -229,30 +235,28 @@ df
|
|
229
235
|
5 (nil) (nil) (nil) (nil)
|
230
236
|
|
231
237
|
df.assign do
|
232
|
-
vectors.
|
233
|
-
|
234
|
-
end
|
238
|
+
vectors.select(&:float?).map { |v| [v.key, -v] }
|
239
|
+
# => returns [[:float, [-0.0, -1.1, -2.2, NaN, nil]]]
|
235
240
|
end
|
236
241
|
|
237
242
|
# =>
|
238
|
-
#<RedAmber::DataFrame : 5 x
|
239
|
-
|
240
|
-
<uint8> <double> <string>
|
241
|
-
1 0 -0.0 A
|
242
|
-
2
|
243
|
-
3
|
244
|
-
4
|
245
|
-
5 (nil) (nil) (nil)
|
243
|
+
#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
|
244
|
+
index float string
|
245
|
+
<uint8> <double> <string>
|
246
|
+
1 0 -0.0 A
|
247
|
+
2 1 -1.1 B
|
248
|
+
3 2 -2.2 C
|
249
|
+
4 3 NaN D
|
250
|
+
5 (nil) (nil) (nil)
|
246
251
|
```
|
247
252
|
|
248
|
-
|
249
|
-
|
250
|
-
Next example is to eliminate observations (row in the table) containing nil.
|
253
|
+
The next example eliminates rows containing nil.
|
251
254
|
|
252
255
|
```ruby
|
253
256
|
# remove all observations containing nil
|
254
257
|
nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
|
255
258
|
nil_removed.tdr
|
259
|
+
|
256
260
|
# =>
|
257
261
|
RedAmber::DataFrame : 342 x 8 Vectors
|
258
262
|
Vectors : 5 numeric, 3 strings
|
@@ -273,6 +277,21 @@ For this frequently needed task, we can do it much simpler.
|
|
273
277
|
penguins.remove_nil # => same result as above
|
274
278
|
```
|
275
279
|
|
280
|
+
`DataFrame#summary` shows summary statistics in a DataFrame.
|
281
|
+
|
282
|
+
```ruby
|
283
|
+
puts penguins.summary.to_s(width: 82)
|
284
|
+
|
285
|
+
# =>
|
286
|
+
variables count mean std min 25% median 75% max
|
287
|
+
<dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
|
288
|
+
1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
|
289
|
+
2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
|
290
|
+
3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
|
291
|
+
4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
|
292
|
+
5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
|
293
|
+
```
|
294
|
+
|
276
295
|
`DataFrame#group` method can be used for the grouping tasks.
|
277
296
|
|
278
297
|
```ruby
|
@@ -311,7 +330,7 @@ grouped.slice { v(:count) > 1 }
|
|
311
330
|
9 Kaminoan 2 221.0 88.0
|
312
331
|
```
|
313
332
|
|
314
|
-
See [DataFrame.md](doc/DataFrame.md) for details.
|
333
|
+
See [DataFrame.md](doc/DataFrame.md) for other examples and details.
|
315
334
|
|
316
335
|
|
317
336
|
## `RedAmber::Vector`
|
@@ -355,7 +374,7 @@ See [Vector.md](doc/Vector.md) for details.
|
|
355
374
|
|
356
375
|
## Jupyter notebook
|
357
376
|
|
358
|
-
[
|
377
|
+
[61 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in a Jupyter notebook.
|
359
378
|
|
360
379
|
## Development
|
361
380
|
|
@@ -366,6 +385,12 @@ bundle install
|
|
366
385
|
bundle exec rake test
|
367
386
|
```
|
368
387
|
|
388
|
+
I would appreciate it if you could help improve this project. Here are a few ways you can help:
|
389
|
+
|
390
|
+
- [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
|
391
|
+
- Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
|
392
|
+
- Write, clarify, or fix documentation
|
393
|
+
|
369
394
|
## License
|
370
395
|
|
371
396
|
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
|
data/doc/DataFrame.md
CHANGED
@@ -167,6 +167,11 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
167
167
|
|
168
168
|
If you need a column-oriented full array, use `.to_h.to_a`
|
169
169
|
|
170
|
+
### `each_row`
|
171
|
+
|
172
|
+
Yields each row as a `{ key => row }` Hash.
|
173
|
+
Returns Enumerator if block is not given.
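The behavior described above can be sketched in plain Ruby. This is only an illustration on a Hash of columns, not Red Amber's implementation; `each_row_of` is a hypothetical helper name.

```ruby
# Hypothetical helper (not part of Red Amber): walk column-oriented data
# row by row, yielding each row as a { key => value } Hash.
def each_row_of(columns)
  # Return an Enumerator when no block is given.
  return enum_for(:each_row_of, columns) unless block_given?

  size = columns.values.first&.size || 0
  size.times do |i|
    # Pick the i-th element of every column.
    yield columns.transform_values { |values| values[i] }
  end
end

columns = { x: [1, 2, 3], y: %w[A B C] }
rows = each_row_of(columns).to_a
```

Here `rows` holds one Hash per row, so the usual `Enumerable` methods such as `map` or `select` can be chained on the Enumerator.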
|
174
|
+
|
170
175
|
### `schema`
|
171
176
|
|
172
177
|
- Returns column name and data type in a Hash.
|
@@ -202,7 +207,22 @@ puts penguins.to_s
|
|
202
207
|
`inspect` uses `to_s` output and also shows shape and object_id.
|
203
208
|
|
204
209
|
|
205
|
-
### `summary`, `describe`
|
210
|
+
### `summary`, `describe`
|
211
|
+
|
212
|
+
`DataFrame#summary` or `DataFrame#describe` shows summary statistics in a DataFrame.
|
213
|
+
|
214
|
+
```ruby
|
215
|
+
puts penguins.summary.to_s(width: 82) # needs more width to show all stats in this example
|
216
|
+
|
217
|
+
# =>
|
218
|
+
variables count mean std min 25% median 75% max
|
219
|
+
<dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
|
220
|
+
1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
|
221
|
+
2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
|
222
|
+
3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
|
223
|
+
4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
|
224
|
+
5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
|
225
|
+
```
|
206
226
|
|
207
227
|
### `to_rover`
|
208
228
|
|
@@ -704,7 +724,7 @@ penguins.to_rover
|
|
704
724
|
|
705
725
|
- Key pairs as arguments
|
706
726
|
|
707
|
-
`rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
|
727
|
+
`rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`.
|
708
728
|
|
709
729
|
```ruby
|
710
730
|
df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
|
@@ -721,7 +741,11 @@ penguins.to_rover
|
|
721
741
|
|
722
742
|
- Key pairs by a block
|
723
743
|
|
724
|
-
`rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}`. Block is called in the context of self.
|
744
|
+
`rename {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return key_pairs as a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`. The block is called in the context of self.
|
745
|
+
|
746
|
+
- Not existing keys
|
747
|
+
|
748
|
+
If the specified `existing_key` does not exist, a `DataFrameArgumentError` is raised.
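As a rough illustration of how a renamer given as a Hash or as an Array of pairs could be handled with pattern matching, here is a plain-Ruby sketch on an Array of keys. `rename_keys` is a hypothetical helper, and `ArgumentError` stands in for `DataFrameArgumentError`; neither is taken from Red Amber's source.

```ruby
# Hypothetical sketch (not Red Amber's code): accept a renamer as a Hash
# or as an Array of [existing_key, new_key] pairs and apply it to keys.
def rename_keys(keys, renamer)
  pairs =
    case renamer
    in Hash then renamer.to_a          # { existing_key => new_key }
    in [[_, _], *] then renamer        # [[existing_key, new_key], ...]
    end

  pairs.each_with_object(keys.dup) do |(from, to), renamed|
    index = renamed.index(from.to_sym)
    # ArgumentError is a stand-in for the library's own error class.
    raise ArgumentError, "key not found: #{from}" if index.nil?

    renamed[index] = to.to_sym
  end
end

by_hash  = rename_keys(%i[name age], { age: :age_in_years })
by_array = rename_keys(%i[name age], [[:name, :full_name]])
```

Both forms produce a new Array of keys and leave the input untouched. On Ruby 2.7 the `case/in` syntax prints an experimental-feature warning.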
|
725
749
|
|
726
750
|
- Key type
|
727
751
|
|
@@ -729,16 +753,16 @@ penguins.to_rover
|
|
729
753
|
|
730
754
|
### `assign`
|
731
755
|
|
732
|
-
Assign new or updated
|
756
|
+
Assign new or updated columns (variables) and create an updated DataFrame.
|
733
757
|
|
734
|
-
- Variables with new keys will append new
|
758
|
+
- Variables with new keys will append new columns from the right.
|
735
759
|
- Variables with existing keys will update corresponding vectors.
|
736
760
|
|
737
761
|
![assign method image](doc/../image/dataframe/assign.png)
|
738
762
|
|
739
763
|
- Variables as arguments
|
740
764
|
|
741
|
-
`assign(key_pairs)` accepts pairs of key and values as
|
765
|
+
`assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`.
|
742
766
|
|
743
767
|
```ruby
|
744
768
|
df = RedAmber::DataFrame.new(
|
@@ -769,7 +793,7 @@ penguins.to_rover
|
|
769
793
|
|
770
794
|
- Key pairs by a block
|
771
795
|
|
772
|
-
`assign {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return pairs of key and values as a Hash of `{key =>
|
796
|
+
`assign {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return pairs of key and values as a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`. The block is called in the context of self.
|
773
797
|
|
774
798
|
```ruby
|
775
799
|
df = RedAmber::DataFrame.new(
|
@@ -788,29 +812,27 @@ penguins.to_rover
|
|
788
812
|
4 3 NaN D
|
789
813
|
5 (nil) (nil) (nil)
|
790
814
|
|
791
|
-
# update
|
815
|
+
# update :float
|
816
|
+
# assigner by an Array
|
792
817
|
df.assign do
|
793
|
-
|
794
|
-
|
795
|
-
assigner[keys[i]] = v * -1 if v.numeric?
|
796
|
-
end
|
797
|
-
assigner
|
818
|
+
vectors.select(&:float?)
|
819
|
+
.map { |v| [v.key, -v] }
|
798
820
|
end
|
799
821
|
|
800
822
|
# =>
|
801
|
-
#<RedAmber::DataFrame : 5 x 3 Vectors,
|
802
|
-
|
803
|
-
<
|
804
|
-
1
|
805
|
-
2
|
806
|
-
3
|
807
|
-
4
|
808
|
-
5
|
809
|
-
|
810
|
-
# Or
|
823
|
+
#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000dfffc>
|
824
|
+
index float string
|
825
|
+
<uint8> <double> <string>
|
826
|
+
1 0 -0.0 A
|
827
|
+
2 1 -1.1 B
|
828
|
+
3 2 -2.2 C
|
829
|
+
4 3 NaN D
|
830
|
+
5 (nil) (nil) (nil)
|
831
|
+
|
832
|
+
# Or we can use assigner by a Hash
|
811
833
|
df.assign do
|
812
|
-
|
813
|
-
assigner[key] =
|
834
|
+
vectors.select.with_object({}) do |v, assigner|
|
835
|
+
assigner[v.key] = -v if v.float?
|
814
836
|
end
|
815
837
|
end
|
816
838
|
|
@@ -821,6 +843,28 @@ penguins.to_rover
|
|
821
843
|
|
822
844
|
Symbol key and String key are considered as the same key.
|
823
845
|
|
846
|
+
- Empty assignment
|
847
|
+
|
848
|
+
If assigner is empty or nil, returns self.
|
849
|
+
|
850
|
+
- Append from left
|
851
|
+
|
852
|
+
The `assign_left` method accepts the same parameters and block as `assign`, but appends new columns from the left side.
|
853
|
+
|
854
|
+
```ruby
|
855
|
+
df.assign_left(new_index: [1, 2, 3, 4, 5])
|
856
|
+
|
857
|
+
# =>
|
858
|
+
#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000001787c>
|
859
|
+
new_index index float string
|
860
|
+
<uint8> <uint8> <double> <string>
|
861
|
+
1 1 0 0.0 A
|
862
|
+
2 2 1 1.1 B
|
863
|
+
3 3 2 2.2 C
|
864
|
+
4 4 3 NaN D
|
865
|
+
5 5 (nil) (nil) (nil)
|
866
|
+
```
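The column ordering shown above can be mimicked on a plain Hash, since Ruby Hashes keep insertion order. This is a sketch under the assumption that existing keys are updated in place while only new keys go to the left; `assign_left_columns` is a hypothetical name, not the library's method.

```ruby
# Hypothetical sketch (not Red Amber's code): place newly assigned columns
# on the left and update existing columns without moving them.
def assign_left_columns(columns, assigner)
  added   = assigner.reject { |key, _| columns.key?(key) }
  updated = columns.merge(assigner.select { |key, _| columns.key?(key) })
  # `added` comes first, so its keys appear leftmost in the result.
  added.merge(updated)
end

columns  = { index: [0, 1], float: [0.0, 1.1] }
assigned = assign_left_columns(columns, { new_index: [1, 2], float: [9.9, 8.8] })
```

The original `columns` Hash is not modified, which mirrors the immutable style of the DataFrame methods.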
|
867
|
+
|
824
868
|
## Updating
|
825
869
|
|
826
870
|
### `sort`
|
@@ -933,17 +977,17 @@ penguins.to_rover
|
|
933
977
|
starwars.group(:species).count(:species)
|
934
978
|
|
935
979
|
# =>
|
936
|
-
#<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
|
937
|
-
species count
|
938
|
-
<string> <int64>
|
939
|
-
1 Human 35
|
940
|
-
2 Droid 6
|
941
|
-
3 Wookiee 2
|
942
|
-
4 Rodian 1
|
943
|
-
5 Hutt 1
|
944
|
-
: : :
|
945
|
-
36 Kaleesh 1
|
946
|
-
37 Pau'an 1
|
980
|
+
#<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
|
981
|
+
species count
|
982
|
+
<string> <int64>
|
983
|
+
1 Human 35
|
984
|
+
2 Droid 6
|
985
|
+
3 Wookiee 2
|
986
|
+
4 Rodian 1
|
987
|
+
5 Hutt 1
|
988
|
+
: : :
|
989
|
+
36 Kaleesh 1
|
990
|
+
37 Pau'an 1
|
947
991
|
38 Kel Dor 1
|
948
992
|
```
|
949
993
|
|
@@ -953,17 +997,17 @@ penguins.to_rover
|
|
953
997
|
grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
|
954
998
|
|
955
999
|
# =>
|
956
|
-
#<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
|
957
|
-
|
958
|
-
<
|
959
|
-
1 Human
|
960
|
-
2 Droid
|
961
|
-
3
|
962
|
-
4 Rodian
|
963
|
-
5 Hutt
|
964
|
-
: :
|
965
|
-
36
|
966
|
-
37 Pau'an
|
1000
|
+
#<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
|
1001
|
+
species count mean(height) mean(mass)
|
1002
|
+
<string> <int64> <double> <double>
|
1003
|
+
1 Human 35 176.6 82.8
|
1004
|
+
2 Droid 6 131.2 69.8
|
1005
|
+
3 Wookiee 2 231.0 124.0
|
1006
|
+
4 Rodian 1 173.0 74.0
|
1007
|
+
5 Hutt 1 175.0 1358.0
|
1008
|
+
: : : : :
|
1009
|
+
36 Kaleesh 1 216.0 159.0
|
1010
|
+
37 Pau'an 1 206.0 80.0
|
967
1011
|
38 Kel Dor 1 188.0 80.0
|
968
1012
|
```
|
969
1013
|
|
@@ -987,18 +1031,115 @@ penguins.to_rover
|
|
987
1031
|
9 Kaminoan 2 221.0 88.0
|
988
1032
|
```
|
989
1033
|
|
990
|
-
##
|
1034
|
+
## Reshape
|
991
1035
|
|
992
|
-
|
1036
|
+
### `transpose`
|
993
1037
|
|
994
|
-
|
1038
|
+
Creates a transposed DataFrame from a wide-type DataFrame.
|
995
1039
|
|
996
|
-
|
1040
|
+
```ruby
|
1041
|
+
import_cars = RedAmber::DataFrame.load('test/entity/import_cars.tsv')
|
997
1042
|
|
998
|
-
|
1043
|
+
# =>
|
1044
|
+
#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000d520>
|
1045
|
+
Year Audi BMW BMW_MINI Mercedes-Benz VW
|
1046
|
+
<int64> <int64> <int64> <int64> <int64> <int64>
|
1047
|
+
1 2021 22535 35905 18211 51722 35215
|
1048
|
+
2 2020 22304 35712 20196 57041 36576
|
1049
|
+
3 2019 24222 46814 23813 66553 46794
|
1050
|
+
4 2018 26473 50982 25984 67554 51961
|
1051
|
+
5 2017 28336 52527 25427 68221 49040
|
999
1052
|
|
1000
|
-
|
1053
|
+
import_cars.transpose
|
1001
1054
|
|
1002
|
-
|
1055
|
+
# =>
|
1056
|
+
#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000ef74>
|
1057
|
+
name 2021 2020 2019 2018 2017
|
1058
|
+
<dictionary> <uint16> <uint16> <uint32> <uint32> <uint32>
|
1059
|
+
1 Audi 22535 22304 24222 26473 28336
|
1060
|
+
2 BMW 35905 35712 46814 50982 52527
|
1061
|
+
3 BMW_MINI 18211 20196 23813 25984 25427
|
1062
|
+
4 Mercedes-Benz 51722 57041 66553 67554 68221
|
1063
|
+
5 VW 35215 36576 46794 51961 49040
|
1064
|
+
```
|
1065
|
+
|
1066
|
+
The leftmost column is created from the original keys. The key of this column is
|
1067
|
+
'name'.
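To make the transposition concrete, here is a plain-Ruby sketch on a Hash of columns using a two-row subset of the data above. `transpose_columns` and its `key:` and `name:` parameters are hypothetical and only illustrate the idea; they are not Red Amber's API.

```ruby
# Hypothetical sketch (not Red Amber's code): transpose column-oriented data.
# The values of column `key` become the new keys, and the remaining original
# keys are collected into a leftmost column called `name`.
def transpose_columns(columns, key:, name: :name)
  rest = columns.reject { |k, _| k == key }
  transposed = { name => rest.keys }
  columns[key].each_with_index do |new_key, i|
    transposed[new_key] = rest.values.map { |values| values[i] }
  end
  transposed
end

import_cars = {
  Year: [2021, 2020],
  Audi: [22_535, 22_304],
  BMW:  [35_905, 35_712]
}
transposed = transpose_columns(import_cars, key: :Year)
```

In this sketch the year values become the keys of the new columns, and the manufacturer names end up in the `:name` column.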
|
1068
|
+
|
1069
|
+
### `to_long(*keep_keys)`
|
1070
|
+
|
1071
|
+
Creates a 'long' DataFrame.
|
1072
|
+
|
1073
|
+
- Parameter `keep_keys` specifies the key names to keep.
|
1074
|
+
|
1075
|
+
```ruby
|
1076
|
+
import_cars.to_long(:Year)
|
1077
|
+
|
1078
|
+
# =>
|
1079
|
+
#<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000012750>
|
1080
|
+
Year name value
|
1081
|
+
<uint16> <dictionary> <uint32>
|
1082
|
+
1 2021 Audi 22535
|
1083
|
+
2 2021 BMW 35905
|
1084
|
+
3 2021 BMW_MINI 18211
|
1085
|
+
4 2021 Mercedes-Benz 51722
|
1086
|
+
5 2021 VW 35215
|
1087
|
+
: : : :
|
1088
|
+
23 2017 BMW_MINI 25427
|
1089
|
+
24 2017 Mercedes-Benz 68221
|
1090
|
+
25 2017 VW 49040
|
1091
|
+
```
|
1092
|
+
|
1093
|
+
- Option `:name` : key of the column that comes **from key names**.
|
1094
|
+
- Option `:value` : key of the column that comes **from values**.
|
1095
|
+
|
1096
|
+
```ruby
|
1097
|
+
import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
|
1098
|
+
|
1099
|
+
# =>
|
1100
|
+
#<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000017700>
|
1101
|
+
Year Manufacturer Num_of_imported
|
1102
|
+
<uint16> <dictionary> <uint32>
|
1103
|
+
1 2021 Audi 22535
|
1104
|
+
2 2021 BMW 35905
|
1105
|
+
3 2021 BMW_MINI 18211
|
1106
|
+
4 2021 Mercedes-Benz 51722
|
1107
|
+
5 2021 VW 35215
|
1108
|
+
: : : :
|
1109
|
+
23 2017 BMW_MINI 25427
|
1110
|
+
24 2017 Mercedes-Benz 68221
|
1111
|
+
25 2017 VW 49040
|
1112
|
+
```
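The wide-to-long reshaping can be sketched in plain Ruby as follows. `to_long_rows` is a hypothetical helper that works on a Hash of columns and returns an Array of row Hashes. It only mirrors the roles of `keep_keys`, `:name` and `:value` described above and is not the library's implementation.

```ruby
# Hypothetical sketch (not Red Amber's code): melt a wide table into long rows.
# Columns listed in keep_keys are repeated on every row; every other column
# contributes one row per value, with its key stored under `name`.
def to_long_rows(columns, *keep_keys, name: :name, value: :value)
  size = columns.values.first.size
  melt_keys = columns.keys - keep_keys
  (0...size).flat_map do |i|
    kept = keep_keys.to_h { |k| [k, columns[k][i]] }
    melt_keys.map { |k| kept.merge({ name => k, value => columns[k][i] }) }
  end
end

import_cars = {
  Year: [2021, 2020],
  Audi: [22_535, 22_304],
  BMW:  [35_905, 35_712]
}
long = to_long_rows(import_cars, :Year, name: :Manufacturer, value: :Num_of_imported)
```

Each original row expands into one long row per melted column, so a table with 2 rows and 2 melted columns yields 4 long rows.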
|
1003
1113
|
|
1004
|
-
|
1114
|
+
### `to_wide`
|
1115
|
+
|
1116
|
+
Creates a 'wide' DataFrame.
|
1117
|
+
|
1118
|
+
- Option `:name` : key of the column which will be expanded **to key name**.
|
1119
|
+
- Option `:value` : key of the column which will be expanded **to values**.
|
1120
|
+
|
1121
|
+
```ruby
|
1122
|
+
import_cars.to_long(:Year).to_wide
|
1123
|
+
# import_cars.to_long(:Year).to_wide(name: :name, value: :value)
|
1124
|
+
# is also OK
|
1125
|
+
|
1126
|
+
# =>
|
1127
|
+
#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f0f0>
|
1128
|
+
Year Audi BMW BMW_MINI Mercedes-Benz VW
|
1129
|
+
<uint16> <uint16> <uint16> <uint16> <uint32> <uint16>
|
1130
|
+
1 2021 22535 35905 18211 51722 35215
|
1131
|
+
2 2020 22304 35712 20196 57041 36576
|
1132
|
+
3 2019 24222 46814 23813 66553 46794
|
1133
|
+
4 2018 26473 50982 25984 67554 51961
|
1134
|
+
5 2017 28336 52527 25427 68221 49040
|
1135
|
+
```
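The reverse direction can be sketched the same way. `to_wide_rows` is a hypothetical helper working on an Array of row Hashes, assuming that all columns other than `name` and `value` identify a wide row. It is an illustration only, not Red Amber's code.

```ruby
# Hypothetical sketch (not Red Amber's code): spread long rows into wide rows.
# Rows sharing the same remaining columns are merged into one wide row, with
# each `name` entry turned into a key holding the matching `value`.
def to_wide_rows(rows, name: :name, value: :value)
  wide = Hash.new { |hash, key| hash[key] = {} }
  rows.each do |row|
    kept = row.reject { |k, _| [name, value].include?(k) }
    wide[kept][row[name]] = row[value]
  end
  wide.map { |kept, expanded| kept.merge(expanded) }
end

long = [
  { Year: 2021, name: :Audi, value: 22_535 },
  { Year: 2021, name: :BMW,  value: 35_905 },
  { Year: 2020, name: :Audi, value: 22_304 },
  { Year: 2020, name: :BMW,  value: 35_712 }
]
wide = to_wide_rows(long)
```

Applied to these four long rows, the sketch gives back one row per year with a column per manufacturer.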
|
1136
|
+
|
1137
|
+
## Combine
|
1138
|
+
|
1139
|
+
- [ ] Combining dataframes
|
1140
|
+
|
1141
|
+
- [ ] Join
|
1142
|
+
|
1143
|
+
## Encoding
|
1144
|
+
|
1145
|
+
- [ ] One-hot encoding
|
data/doc/Vector.md
CHANGED
@@ -145,7 +145,7 @@ array[booleans]
|
|
145
145
|
| ✓ `min_max` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
|
146
146
|
|[ ]`mode` | | [ ] | |[ ] Mode | |
|
147
147
|
| ✓ `product` | ✓ | ✓ | | ✓ ScalarAggregate| |
|
148
|
-
|
|
148
|
+
| ✓ `quantile`| | ✓ | | ✓ Quantile|Specify probability in (0..1) by a parameter (default=0.5)|
|
149
149
|
| ✓ `sd ` | | ✓ | | |ddof: 1 at `stddev`|
|
150
150
|
| ✓ `stddev` | | ✓ | | ✓ Variance|ddof: 0 by default|
|
151
151
|
| ✓ `sum` | ✓ | ✓ | | ✓ ScalarAggregate| |
|
@@ -303,6 +303,10 @@ double.round(n_digits: -1)
|
|
303
303
|
|
304
304
|
Returns index of specified element.
|
305
305
|
|
306
|
+
### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`
|
307
|
+
|
308
|
+
Returns quantiles for specified probabilities in a DataFrame.
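For readers unfamiliar with the `:linear` interpolation option, the following plain-Ruby sketch shows one common definition of it. The helper names are hypothetical, and the result is a Hash rather than a DataFrame; it is meant as an illustration of the arithmetic, not as the Arrow implementation.

```ruby
# Hypothetical sketch (not the Arrow implementation): quantile with linear
# interpolation between the two nearest ranks, skipping nils.
def quantile_linear(values, prob)
  sorted = values.compact.sort
  return nil if sorted.empty?

  position = prob * (sorted.size - 1)   # fractional rank in 0..(n - 1)
  lower = position.floor
  upper = position.ceil
  fraction = position - lower
  sorted[lower] + (sorted[upper] - sorted[lower]) * fraction
end

# Compute several quantiles at once, keyed by probability.
def quantiles_linear(values, probs = [1.0, 0.75, 0.5, 0.25, 0.0])
  probs.to_h { |prob| [prob, quantile_linear(values, prob)] }
end

data = [1, 2, 3, 4, 5, nil]
result = quantiles_linear(data)
median_of_four = quantile_linear([1, 2, 3, 4], 0.5)
```

With an even number of elements the median falls halfway between the two middle values, which is where the interpolation matters.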
|
309
|
+
|
306
310
|
### `sort_indexes`, `sort_indices`, `array_sort_indices`
|
307
311
|
|
308
312
|
### [ ] `sort`, `sort_by`
|