red_amber 0.1.8 → 0.2.0
- checksums.yaml +4 -4
- data/.rubocop.yml +3 -1
- data/CHANGELOG.md +71 -2
- data/Gemfile +1 -1
- data/README.md +58 -33
- data/doc/DataFrame.md +196 -55
- data/doc/Vector.md +5 -1
- data/doc/examples_of_red_amber.ipynb +1677 -348
- data/lib/red_amber/data_frame.rb +92 -15
- data/lib/red_amber/data_frame_displayable.rb +25 -10
- data/lib/red_amber/data_frame_reshaping.rb +85 -0
- data/lib/red_amber/data_frame_variable_operation.rb +89 -40
- data/lib/red_amber/group.rb +5 -1
- data/lib/red_amber/vector_functions.rb +46 -1
- data/lib/red_amber/vector_selectable.rb +1 -1
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +1 -1
- data/red_amber.gemspec +1 -1
- metadata +5 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 73459d02c921fcb0fcb742760e8c882b5491fa5316a79b9016233a516ada013e
|
4
|
+
data.tar.gz: ac25e808c5e5d4c13bb1877659550bba532cb5778371e39dfa1f3b9e5a91a4f8
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 1bfa4200d440c338f496fe282816634d6a833e30e17edc87a2cf5ec63866e2bbbaf8796916f1b052ea66482c54a038bbf1445258c2526691e42c2b47be2c39c5
|
7
|
+
data.tar.gz: e324e480e6086f7017de58201783c857825b79d0b2e2c8fa2636089cd1c5531e22905a3c0d860f26b833eb6add6ed6017497632bd1ea8fcb932c2d2233b11812
|
data/.rubocop.yml
CHANGED
@@ -61,6 +61,7 @@ Metrics/AbcSize:
|
|
61
61
|
Max: 30
|
62
62
|
Exclude:
|
63
63
|
- 'lib/red_amber/data_frame_displayable.rb' # Max: 55
|
64
|
+
- 'lib/red_amber/data_frame_reshaping.rb' # Max 40.91
|
64
65
|
- 'lib/red_amber/data_frame_selectable.rb' # Max: 51
|
65
66
|
- 'lib/red_amber/vector_updatable.rb' # Max: 36
|
66
67
|
- 'lib/red_amber/vector_selectable.rb' # Max: 33
|
@@ -98,9 +99,10 @@ Metrics/MethodLength:
|
|
98
99
|
Metrics/ModuleLength:
|
99
100
|
Max: 100
|
100
101
|
Exclude:
|
102
|
+
- 'lib/red_amber/data_frame_displayable.rb' # Max: 132
|
101
103
|
- 'lib/red_amber/data_frame_selectable.rb' # Max: 141
|
104
|
+
- 'lib/red_amber/data_frame_variable_operation.rb' # Max: 110
|
102
105
|
- 'lib/red_amber/vector_functions.rb' # Max: 114
|
103
|
-
- 'lib/red_amber/data_frame_displayable.rb' # Max: 132
|
104
106
|
|
105
107
|
# Max: 8
|
106
108
|
Metrics/PerceivedComplexity:
|
data/CHANGELOG.md
CHANGED
@@ -1,6 +1,75 @@
|
|
1
|
-
## [0.
|
1
|
+
## [0.2.0] - 2022-08-15
|
2
2
|
|
3
|
-
-
|
3
|
+
- Bump version up to 0.2.0
|
4
|
+
|
5
|
+
- Bug fixes
|
6
|
+
|
7
|
+
- Fix order of multiple group keys (#55)
|
8
|
+
|
9
|
+
Only 1 group key comes to left. Other keys remain in right.
|
10
|
+
|
11
|
+
- Remove optional `require` for rover (#55)
|
12
|
+
|
13
|
+
Fix DataFrame.new for argument with Rover::DataFrame.
|
14
|
+
|
15
|
+
- Fix occasional failure in CI (#59)
|
16
|
+
|
17
|
+
Sometimes the CI test fails. I added -dev dependency
|
18
|
+
in Arrow install by apt, not doing in bundler.
|
19
|
+
|
20
|
+
- Fix calling :take in V#[] (#56)
|
21
|
+
|
22
|
+
Fixed to call Arrow function :take instead of :array_take in Vector#take_by_vector. This will prevent the error below
|
23
|
+
when called with Arrow::ChunkedArray.
|
24
|
+
|
25
|
+
- Raise error renaming non existing key (#61)
|
26
|
+
|
27
|
+
Raise an error when the specified key does not exist.
|
28
|
+
|
29
|
+
- Fix DataFrame#rename #assign by array (#65)
|
30
|
+
|
31
|
+
- New features and improvements
|
32
|
+
|
33
|
+
- Support Arrow 9.0.0
|
34
|
+
- Upgrade to Arrow 9.0.0 (#59)
|
35
|
+
- Add Vector#quantile method (#59)
|
36
|
+
Arrow::QuantileOptions is supported in Arrow GLib 9.0.0 (ARROW-16623, Thanks!)
|
37
|
+
|
38
|
+
- Add Vector#quantiles (#62)
|
39
|
+
|
40
|
+
- Add DataFrame#each_row (#56)
|
41
|
+
- Returns Enumerator if block is not given.
|
42
|
+
- Change DataFrame#each_row to return a Hash {key => row} (#63)
|
43
|
+
|
44
|
+
- Refactor to use pattern match in overloaded parameter parsing (#61)
|
45
|
+
- Refine DataFrame.new to use pattern match
|
46
|
+
- Use pattern match in DataFrame#assign
|
47
|
+
- Use pattern match in DataFrame#rename
|
48
|
+
|
49
|
+
- Accept Array for renamer/assigner in #rename/#assign (#61)
|
50
|
+
- Accept assigner by Arrays in DataFrame#assign
|
51
|
+
- Accept renamer pairs by Arrays in DataFrame#rename
|
52
|
+
- Add DataFrame#assign_left method
|
53
|
+
|
54
|
+
- Add summary/describe (#62)
|
55
|
+
- Introduce DataFrame#summary(#describe)
|
56
|
+
|
57
|
+
- Introduce reshaping methods for DataFrame (#64)
|
58
|
+
- Introduce DataFrame#transpose method
|
59
|
+
- Introduce DataFrame#to_long method
|
60
|
+
- Introduce DataFrame#to_wide method
|
61
|
+
|
62
|
+
- Others
|
63
|
+
|
64
|
+
- Add alias sort_index for array_sort_indices (#59)
|
65
|
+
- Enable :width option in DataFrame#to_s (#62)
|
66
|
+
- Add options to DataFrame#format_table (#62)
|
67
|
+
|
68
|
+
- Update Documents
|
69
|
+
|
70
|
+
- Add Yard doc for some methods
|
71
|
+
|
72
|
+
- Update Jupyter notebook '61 Examples of Red Amber' (#65)
|
4
73
|
|
5
74
|
## [0.1.8] - 2022-08-04 (experimental)
|
6
75
|
|
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -3,17 +3,23 @@
|
|
3
3
|
[](https://badge.fury.io/rb/red_amber)
|
4
4
|
[](https://github.com/heronshoes/red_amber/actions/workflows/test.yml)
|
5
5
|
|
6
|
-
A simple dataframe library for Ruby
|
6
|
+
A simple dataframe library for Ruby.
|
7
7
|
|
8
8
|
- Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow) [](https://gitter.im/red-data-tools/en)
|
9
9
|
- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
|
10
10
|
|
11
11
|
## Requirements
|
12
12
|
|
13
|
+
Supported Ruby version is >= 2.7.
|
14
|
+
|
15
|
+
Since v0.2.0, this library uses pattern matching, which is an experimental feature in Ruby 2.7. It is usable, but a warning message is shown in 2.7.
|
16
|
+
I recommend Ruby 3 for performance.
|
17
|
+
|
13
18
|
```ruby
|
14
|
-
|
19
|
+
# Libraries required
|
20
|
+
gem 'red-arrow', '>= 9.0.0'
|
15
21
|
|
16
|
-
gem 'red-parquet', '>=
|
22
|
+
gem 'red-parquet', '>= 9.0.0' # Optional, if you use IO from/to parquet
|
17
23
|
gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
18
24
|
```
|
19
25
|
|
@@ -21,9 +27,9 @@ gem 'rover-df', '~> 0.3.0' # Optional, if you use IO from/to Rover::DataFrame
|
|
21
27
|
|
22
28
|
Install requirements before you install Red Amber.
|
23
29
|
|
24
|
-
- Apache Arrow GLib (>=
|
30
|
+
- Apache Arrow GLib (>= 9.0.0)
|
25
31
|
|
26
|
-
- Apache Parquet GLib (>=
|
32
|
+
- Apache Parquet GLib (>= 9.0.0) # If you use IO from/to parquet
|
27
33
|
|
28
34
|
See [Apache Arrow install document](https://arrow.apache.org/install/).
|
29
35
|
|
@@ -122,22 +128,22 @@ df = df.drop(true, true, false)
|
|
122
128
|
|
123
129
|
# =>
|
124
130
|
#<RedAmber::DataFrame : 344 x 1 Vector, 0x0000000000048760>
|
125
|
-
body_mass_g
|
126
|
-
<uint16>
|
127
|
-
1 3750
|
128
|
-
2 3800
|
129
|
-
3 3250
|
130
|
-
4 (nil)
|
131
|
-
5 3450
|
132
|
-
: :
|
133
|
-
342 5750
|
134
|
-
343 5200
|
131
|
+
body_mass_g
|
132
|
+
<uint16>
|
133
|
+
1 3750
|
134
|
+
2 3800
|
135
|
+
3 3250
|
136
|
+
4 (nil)
|
137
|
+
5 3450
|
138
|
+
: :
|
139
|
+
342 5750
|
140
|
+
343 5200
|
135
141
|
344 5400
|
136
142
|
```
|
137
143
|
|
138
144
|
Arrow data is immutable, so these methods always return a new object.
|
139
145
|
|
140
|
-
`DataFrame#assign` creates new
|
146
|
+
`DataFrame#assign` creates new columns or updates existing columns.
|
141
147
|
|
142
148
|

|
143
149
|
|
@@ -208,7 +214,7 @@ penguins.remove(penguins[:bill_length_mm] < 40)
|
|
208
214
|
|
209
215
|
DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
|
210
216
|
|
211
|
-
This example is usage of block to update
|
217
|
+
This example uses a block to update a column.
|
212
218
|
|
213
219
|
```ruby
|
214
220
|
df = RedAmber::DataFrame.new(
|
@@ -229,30 +235,28 @@ df
|
|
229
235
|
5 (nil) (nil) (nil) (nil)
|
230
236
|
|
231
237
|
df.assign do
|
232
|
-
vectors.
|
233
|
-
|
234
|
-
end
|
238
|
+
vectors.select(&:float?).map { |v| [v.key, -v] }
|
239
|
+
# => returns [[:float, [-0.0, -1.1, -2.2, NaN, nil]]]
|
235
240
|
end
|
236
241
|
|
237
242
|
# =>
|
238
|
-
#<RedAmber::DataFrame : 5 x
|
239
|
-
|
240
|
-
<uint8> <double> <string>
|
241
|
-
1 0 -0.0 A
|
242
|
-
2
|
243
|
-
3
|
244
|
-
4
|
245
|
-
5 (nil) (nil) (nil)
|
243
|
+
#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000e270c>
|
244
|
+
index float string
|
245
|
+
<uint8> <double> <string>
|
246
|
+
1 0 -0.0 A
|
247
|
+
2 1 -1.1 B
|
248
|
+
3 2 -2.2 C
|
249
|
+
4 3 NaN D
|
250
|
+
5 (nil) (nil) (nil)
|
246
251
|
```
|
247
252
|
|
248
|
-
|
249
|
-
|
250
|
-
Next example is to eliminate observations (row in the table) containing nil.
|
253
|
+
The next example eliminates rows containing nil.
|
251
254
|
|
252
255
|
```ruby
|
253
256
|
# remove all observations containing nil
|
254
257
|
nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
|
255
258
|
nil_removed.tdr
|
259
|
+
|
256
260
|
# =>
|
257
261
|
RedAmber::DataFrame : 342 x 8 Vectors
|
258
262
|
Vectors : 5 numeric, 3 strings
|
@@ -273,6 +277,21 @@ For this frequently needed task, we can do it much simpler.
|
|
273
277
|
penguins.remove_nil # => same result as above
|
274
278
|
```
|
275
279
|
|
280
|
+
`DataFrame#summary` shows summary statistics in a DataFrame.
|
281
|
+
|
282
|
+
```ruby
|
283
|
+
puts penguins.summary.to_s(width: 82)
|
284
|
+
|
285
|
+
# =>
|
286
|
+
variables count mean std min 25% median 75% max
|
287
|
+
<dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
|
288
|
+
1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
|
289
|
+
2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
|
290
|
+
3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
|
291
|
+
4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
|
292
|
+
5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
|
293
|
+
```
|
294
|
+
|
276
295
|
`DataFrame#group` method can be used for the grouping tasks.
|
277
296
|
|
278
297
|
```ruby
|
@@ -311,7 +330,7 @@ grouped.slice { v(:count) > 1 }
|
|
311
330
|
9 Kaminoan 2 221.0 88.0
|
312
331
|
```
|
313
332
|
|
314
|
-
See [DataFrame.md](doc/DataFrame.md) for details.
|
333
|
+
See [DataFrame.md](doc/DataFrame.md) for other examples and details.
|
315
334
|
|
316
335
|
|
317
336
|
## `RedAmber::Vector`
|
@@ -355,7 +374,7 @@ See [Vector.md](doc/Vector.md) for details.
|
|
355
374
|
|
356
375
|
## Jupyter notebook
|
357
376
|
|
358
|
-
[
|
377
|
+
[61 Examples of Red Amber](doc/examples_of_red_amber.ipynb) shows more examples in a Jupyter notebook.
|
359
378
|
|
360
379
|
## Development
|
361
380
|
|
@@ -366,6 +385,12 @@ bundle install
|
|
366
385
|
bundle exec rake test
|
367
386
|
```
|
368
387
|
|
388
|
+
I would appreciate it if you could help improve this project. Here are a few ways you can help:
|
389
|
+
|
390
|
+
- [Report bugs or suggest new features](https://github.com/heronshoes/red_amber/issues)
|
391
|
+
- Fix bugs and [submit pull requests](https://github.com/heronshoes/red_amber/pulls)
|
392
|
+
- Write, clarify, or fix documentation
|
393
|
+
|
369
394
|
## License
|
370
395
|
|
371
396
|
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
|
data/doc/DataFrame.md
CHANGED
@@ -167,6 +167,11 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
167
167
|
|
168
168
|
If you need a column-oriented full array, use `.to_h.to_a`
|
169
169
|
|
170
|
+
### `each_row`
|
171
|
+
|
172
|
+
Yields each row as a `{ key => row }` Hash.
|
173
|
+
Returns Enumerator if block is not given.
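
The row-wise iteration described above can be pictured with a plain-Ruby sketch. This is not RedAmber's implementation: `ColumnTable` is a hypothetical class that stores columns in a Hash, and it only illustrates the "yield a `{ key => value }` Hash per row, or return an Enumerator without a block" behaviour.

```ruby
# Hypothetical stand-in for a column-oriented table (not RedAmber::DataFrame).
class ColumnTable
  def initialize(columns)
    @columns = columns # { key => Array of values }
  end

  # Yields one Hash per row; returns an Enumerator when no block is given.
  def each_row
    return enum_for(:each_row) unless block_given?

    n_rows = @columns.values.map(&:size).max.to_i
    n_rows.times do |i|
      yield @columns.transform_values { |values| values[i] }
    end
  end
end

table = ColumnTable.new({ x: [1, 2, 3], y: %w[A B C] })
rows = table.each_row.to_a
# rows.first is { x: 1, y: "A" }
```

Returning an Enumerator keeps the method chainable with `map`, `select`, and the other `Enumerable` methods.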
|
174
|
+
|
170
175
|
### `schema`
|
171
176
|
|
172
177
|
- Returns column name and data type in a Hash.
|
@@ -202,7 +207,22 @@ puts penguins.to_s
|
|
202
207
|
`inspect` uses `to_s` output and also shows shape and object_id.
|
203
208
|
|
204
209
|
|
205
|
-
### `summary`, `describe`
|
210
|
+
### `summary`, `describe`
|
211
|
+
|
212
|
+
`DataFrame#summary` or `DataFrame#describe` shows summary statistics in a DataFrame.
|
213
|
+
|
214
|
+
```ruby
|
215
|
+
puts penguins.summary.to_s(width: 82) # needs more width to show all stats in this example
|
216
|
+
|
217
|
+
# =>
|
218
|
+
variables count mean std min 25% median 75% max
|
219
|
+
<dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
|
220
|
+
1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
|
221
|
+
2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
|
222
|
+
3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
|
223
|
+
4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
|
224
|
+
5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
|
225
|
+
```
|
206
226
|
|
207
227
|
### `to_rover`
|
208
228
|
|
@@ -704,7 +724,7 @@ penguins.to_rover
|
|
704
724
|
|
705
725
|
- Key pairs as arguments
|
706
726
|
|
707
|
-
`rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
|
727
|
+
`rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`.
|
708
728
|
|
709
729
|
```ruby
|
710
730
|
df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
|
@@ -721,7 +741,11 @@ penguins.to_rover
|
|
721
741
|
|
722
742
|
- Key pairs by a block
|
723
743
|
|
724
|
-
`rename {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return key_pairs as a Hash of `{existing_key => new_key}`. Block is called in the context of self.
|
744
|
+
`rename {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return key_pairs as a Hash of `{existing_key => new_key}` or an Array of Arrays like `[[existing_key, new_key], ... ]`. The block is called in the context of self.
|
745
|
+
|
746
|
+
- Not existing keys
|
747
|
+
|
748
|
+
If the specified `existing_key` does not exist, a `DataFrameArgumentError` is raised.
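
As a rough illustration of the renamer rules above (Hash or Array of Arrays, error on an unknown key), here is a plain-Ruby sketch using pattern matching. `rename_keys` is a hypothetical helper that works on an Array of keys, and it raises a plain `ArgumentError` rather than RedAmber's `DataFrameArgumentError`; it is an assumption about the shape of the logic, not the library's code.

```ruby
# Hypothetical helper: returns the renamed key list for the given renamer.
def rename_keys(keys, renamer)
  pairs =
    case renamer
    in Hash => hash then hash
    in [[_, _], *] => array then array.to_h
    end
  # Symbol and String keys are treated as the same key.
  pairs = pairs.to_h { |from, to| [from.to_sym, to.to_sym] }

  missing = pairs.keys - keys
  raise ArgumentError, "key not found: #{missing.inspect}" unless missing.empty?

  keys.map { |key| pairs.fetch(key, key) }
end

keys = %i[name age]
by_hash  = rename_keys(keys, { age: :age_in_years })
by_array = rename_keys(keys, [['age', 'age_in_years']])
```

Both calls give the same key list, and a renamer such as `{ height: :h }` raises because `:height` is not among the keys.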
|
725
749
|
|
726
750
|
- Key type
|
727
751
|
|
@@ -729,16 +753,16 @@ penguins.to_rover
|
|
729
753
|
|
730
754
|
### `assign`
|
731
755
|
|
732
|
-
Assign new or updated
|
756
|
+
Assign new or updated columns (variables) and create an updated DataFrame.
|
733
757
|
|
734
|
-
- Variables with new keys will append new
|
758
|
+
- Variables with new keys will append new columns from the right.
|
735
759
|
- Variables with existing keys will update corresponding vectors.
|
736
760
|
|
737
761
|

|
738
762
|
|
739
763
|
- Variables as arguments
|
740
764
|
|
741
|
-
`assign(key_pairs)` accepts pairs of key and values as
|
765
|
+
`assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`.
|
742
766
|
|
743
767
|
```ruby
|
744
768
|
df = RedAmber::DataFrame.new(
|
@@ -769,7 +793,7 @@ penguins.to_rover
|
|
769
793
|
|
770
794
|
- Key pairs by a block
|
771
795
|
|
772
|
-
`assign {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return pairs of key and values as a Hash of `{key =>
|
796
|
+
`assign {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return pairs of key and values as a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`. The block is called in the context of self.
|
773
797
|
|
774
798
|
```ruby
|
775
799
|
df = RedAmber::DataFrame.new(
|
@@ -788,29 +812,27 @@ penguins.to_rover
|
|
788
812
|
4 3 NaN D
|
789
813
|
5 (nil) (nil) (nil)
|
790
814
|
|
791
|
-
# update
|
815
|
+
# update :float
|
816
|
+
# assigner by an Array
|
792
817
|
df.assign do
|
793
|
-
|
794
|
-
|
795
|
-
assigner[keys[i]] = v * -1 if v.numeric?
|
796
|
-
end
|
797
|
-
assigner
|
818
|
+
vectors.select(&:float?)
|
819
|
+
.map { |v| [v.key, -v] }
|
798
820
|
end
|
799
821
|
|
800
822
|
# =>
|
801
|
-
#<RedAmber::DataFrame : 5 x 3 Vectors,
|
802
|
-
|
803
|
-
<
|
804
|
-
1
|
805
|
-
2
|
806
|
-
3
|
807
|
-
4
|
808
|
-
5
|
809
|
-
|
810
|
-
# Or
|
823
|
+
#<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000dfffc>
|
824
|
+
index float string
|
825
|
+
<uint8> <double> <string>
|
826
|
+
1 0 -0.0 A
|
827
|
+
2 1 -1.1 B
|
828
|
+
3 2 -2.2 C
|
829
|
+
4 3 NaN D
|
830
|
+
5 (nil) (nil) (nil)
|
831
|
+
|
832
|
+
# Or we can use assigner by a Hash
|
811
833
|
df.assign do
|
812
|
-
|
813
|
-
assigner[key] =
|
834
|
+
vectors.select.with_object({}) do |v, assigner|
|
835
|
+
assigner[v.key] = -v if v.float?
|
814
836
|
end
|
815
837
|
end
|
816
838
|
|
@@ -821,6 +843,28 @@ penguins.to_rover
|
|
821
843
|
|
822
844
|
Symbol key and String key are considered as the same key.
|
823
845
|
|
846
|
+
- Empty assignment
|
847
|
+
|
848
|
+
If assigner is empty or nil, returns self.
|
849
|
+
|
850
|
+
- Append from left
|
851
|
+
|
852
|
+
The `assign_left` method accepts the same parameters and block as `assign`, but appends new columns on the left side.
|
853
|
+
|
854
|
+
```ruby
|
855
|
+
df.assign_left(new_index: [1, 2, 3, 4, 5])
|
856
|
+
|
857
|
+
# =>
|
858
|
+
#<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000001787c>
|
859
|
+
new_index index float string
|
860
|
+
<uint8> <uint8> <double> <string>
|
861
|
+
1 1 0 0.0 A
|
862
|
+
2 2 1 1.1 B
|
863
|
+
3 3 2 2.2 C
|
864
|
+
4 4 3 NaN D
|
865
|
+
5 5 (nil) (nil) (nil)
|
866
|
+
```
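
The difference in column order between `assign` and `assign_left` can be mimicked with ordered Ruby Hashes. This is only a sketch: `assign_right` and `assign_left` below are hypothetical helpers on a Hash of columns, and the assumption that existing keys keep their position when updated is mine, not confirmed by the text above.

```ruby
# New keys are appended on the right; existing keys are updated in place.
def assign_right(columns, new_columns)
  columns.merge(new_columns)
end

# New keys are placed on the left; existing keys are updated in place.
def assign_left(columns, new_columns)
  added = new_columns.reject { |key, _| columns.key?(key) }
  added.merge(columns).merge(new_columns)
end

columns = { index: [0, 1, 2], string: %w[A B C] }
left  = assign_left(columns, { new_index: [1, 2, 3] })
right = assign_right(columns, { new_index: [1, 2, 3] })
# left.keys  is [:new_index, :index, :string]
# right.keys is [:index, :string, :new_index]
```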
|
867
|
+
|
824
868
|
## Updating
|
825
869
|
|
826
870
|
### `sort`
|
@@ -933,17 +977,17 @@ penguins.to_rover
|
|
933
977
|
starwars.group(:species).count(:species)
|
934
978
|
|
935
979
|
# =>
|
936
|
-
#<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
|
937
|
-
species count
|
938
|
-
<string> <int64>
|
939
|
-
1 Human 35
|
940
|
-
2 Droid 6
|
941
|
-
3 Wookiee 2
|
942
|
-
4 Rodian 1
|
943
|
-
5 Hutt 1
|
944
|
-
: : :
|
945
|
-
36 Kaleesh 1
|
946
|
-
37 Pau'an 1
|
980
|
+
#<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
|
981
|
+
species count
|
982
|
+
<string> <int64>
|
983
|
+
1 Human 35
|
984
|
+
2 Droid 6
|
985
|
+
3 Wookiee 2
|
986
|
+
4 Rodian 1
|
987
|
+
5 Hutt 1
|
988
|
+
: : :
|
989
|
+
36 Kaleesh 1
|
990
|
+
37 Pau'an 1
|
947
991
|
38 Kel Dor 1
|
948
992
|
```
|
949
993
|
|
@@ -953,17 +997,17 @@ penguins.to_rover
|
|
953
997
|
grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
|
954
998
|
|
955
999
|
# =>
|
956
|
-
#<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
|
957
|
-
|
958
|
-
<
|
959
|
-
1 Human
|
960
|
-
2 Droid
|
961
|
-
3
|
962
|
-
4 Rodian
|
963
|
-
5 Hutt
|
964
|
-
: :
|
965
|
-
36
|
966
|
-
37 Pau'an
|
1000
|
+
#<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
|
1001
|
+
specie s count mean(height) mean(mass)
|
1002
|
+
<strin g> <int64> <double> <double>
|
1003
|
+
1 Human 35 176.6 82.8
|
1004
|
+
2 Droid 6 131.2 69.8
|
1005
|
+
3 Wookie e 2 231.0 124.0
|
1006
|
+
4 Rodian 1 173.0 74.0
|
1007
|
+
5 Hutt 1 175.0 1358.0
|
1008
|
+
: : : : :
|
1009
|
+
36 Kalees h 1 216.0 159.0
|
1010
|
+
37 Pau'an 1 206.0 80.0
|
967
1011
|
38 Kel Dor 1 188.0 80.0
|
968
1012
|
```
|
969
1013
|
|
@@ -987,18 +1031,115 @@ penguins.to_rover
|
|
987
1031
|
9 Kaminoan 2 221.0 88.0
|
988
1032
|
```
|
989
1033
|
|
990
|
-
##
|
1034
|
+
## Reshape
|
991
1035
|
|
992
|
-
|
1036
|
+
### `transpose`
|
993
1037
|
|
994
|
-
|
1038
|
+
Creates a transposed DataFrame from a wide-type DataFrame.
|
995
1039
|
|
996
|
-
|
1040
|
+
```ruby
|
1041
|
+
import_cars = RedAmber::DataFrame.load('test/entity/import_cars.tsv')
|
997
1042
|
|
998
|
-
|
1043
|
+
# =>
|
1044
|
+
#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000d520>
|
1045
|
+
Year Audi BMW BMW_MINI Mercedes-Benz VW
|
1046
|
+
<int64> <int64> <int64> <int64> <int64> <int64>
|
1047
|
+
1 2021 22535 35905 18211 51722 35215
|
1048
|
+
2 2020 22304 35712 20196 57041 36576
|
1049
|
+
3 2019 24222 46814 23813 66553 46794
|
1050
|
+
4 2018 26473 50982 25984 67554 51961
|
1051
|
+
5 2017 28336 52527 25427 68221 49040
|
999
1052
|
|
1000
|
-
|
1053
|
+
import_cars.transpose
|
1001
1054
|
|
1002
|
-
|
1055
|
+
# =>
|
1056
|
+
#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000ef74>
|
1057
|
+
name 2021 2020 2019 2018 2017
|
1058
|
+
<dictionary> <uint16> <uint16> <uint32> <uint32> <uint32>
|
1059
|
+
1 Audi 22535 22304 24222 26473 28336
|
1060
|
+
2 BMW 35905 35712 46814 50982 52527
|
1061
|
+
3 BMW_MINI 18211 20196 23813 25984 25427
|
1062
|
+
4 Mercedes-Benz 51722 57041 66553 67554 68221
|
1063
|
+
5 VW 35215 36576 46794 51961 49040
|
1064
|
+
```
|
1065
|
+
|
1066
|
+
The leftmost column is created by original keys. Key name of the column is
|
1067
|
+
named by 'name'.
|
1068
|
+
|
1069
|
+
### `to_long(*keep_keys)`
|
1070
|
+
|
1071
|
+
Creates a 'long' DataFrame.
|
1072
|
+
|
1073
|
+
- Parameter `keep_keys` specifies the key names to keep.
|
1074
|
+
|
1075
|
+
```ruby
|
1076
|
+
import_cars.to_long(:Year)
|
1077
|
+
|
1078
|
+
# =>
|
1079
|
+
#<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000012750>
|
1080
|
+
Year name value
|
1081
|
+
<uint16> <dictionary> <uint32>
|
1082
|
+
1 2021 Audi 22535
|
1083
|
+
2 2021 BMW 35905
|
1084
|
+
3 2021 BMW_MINI 18211
|
1085
|
+
4 2021 Mercedes-Benz 51722
|
1086
|
+
5 2021 VW 35215
|
1087
|
+
: : : :
|
1088
|
+
23 2017 BMW_MINI 25427
|
1089
|
+
24 2017 Mercedes-Benz 68221
|
1090
|
+
25 2017 VW 49040
|
1091
|
+
```
|
1092
|
+
|
1093
|
+
- Option `:name` : key of the column that comes **from key names**.
|
1094
|
+
- Option `:value` : key of the column that comes **from values**.
|
1095
|
+
|
1096
|
+
```ruby
|
1097
|
+
import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
|
1098
|
+
|
1099
|
+
# =>
|
1100
|
+
#<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000017700>
|
1101
|
+
Year Manufacturer Num_of_imported
|
1102
|
+
<uint16> <dictionary> <uint32>
|
1103
|
+
1 2021 Audi 22535
|
1104
|
+
2 2021 BMW 35905
|
1105
|
+
3 2021 BMW_MINI 18211
|
1106
|
+
4 2021 Mercedes-Benz 51722
|
1107
|
+
5 2021 VW 35215
|
1108
|
+
: : : :
|
1109
|
+
23 2017 BMW_MINI 25427
|
1110
|
+
24 2017 Mercedes-Benz 68221
|
1111
|
+
25 2017 VW 49040
|
1112
|
+
```
|
1003
1113
|
|
1004
|
-
|
1114
|
+
### `to_wide`
|
1115
|
+
|
1116
|
+
Creates a 'wide' DataFrame.
|
1117
|
+
|
1118
|
+
- Option `:name` : key of the column which will be expanded **to key name**.
|
1119
|
+
- Option `:value` : key of the column which will be expanded **to values**.
|
1120
|
+
|
1121
|
+
```ruby
|
1122
|
+
import_cars.to_long(:Year).to_wide
|
1123
|
+
# import_cars.to_long(:Year).to_wide(name: :name, value: :value)
|
1124
|
+
# is also OK
|
1125
|
+
|
1126
|
+
# =>
|
1127
|
+
#<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f0f0>
|
1128
|
+
Year Audi BMW BMW_MINI Mercedes-Benz VW
|
1129
|
+
<uint16> <uint16> <uint16> <uint16> <uint32> <uint16>
|
1130
|
+
1 2021 22535 35905 18211 51722 35215
|
1131
|
+
2 2020 22304 35712 20196 57041 36576
|
1132
|
+
3 2019 24222 46814 23813 66553 46794
|
1133
|
+
4 2018 26473 50982 25984 67554 51961
|
1134
|
+
5 2017 28336 52527 25427 68221 49040
|
1135
|
+
```
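
To make the long/wide round trip concrete, here is a plain-Ruby sketch over an Array of row Hashes. The helpers `to_long` and `to_wide` below are hypothetical top-level functions that only borrow the method names and the `:name`/`:value` options from the text; they are not RedAmber's implementation, and the sample rows are a subset of the `import_cars` data shown above.

```ruby
# Melt every key not listed in keep_keys into (name, value) pairs.
def to_long(rows, *keep_keys, name: :name, value: :value)
  rows.flat_map do |row|
    kept = row.slice(*keep_keys)
    (row.keys - keep_keys).map do |key|
      kept.merge({ name => key, value => row[key] })
    end
  end
end

# Spread (name, value) pairs back into one column per distinct name.
def to_wide(rows, name: :name, value: :value)
  groups = rows.group_by { |row| row.reject { |key, _| key == name || key == value } }
  groups.map do |kept, group|
    group.each_with_object(kept.dup) { |row, wide| wide[row[name]] = row[value] }
  end
end

wide = [
  { Year: 2021, Audi: 22_535, BMW: 35_905 },
  { Year: 2020, Audi: 22_304, BMW: 35_712 }
]
long = to_long(wide, :Year)
round_trip = to_wide(long)
# long.first is { Year: 2021, name: :Audi, value: 22535 }
```

In this sketch `to_wide` groups rows by the kept keys, so every combination of kept values becomes one wide row.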
|
1136
|
+
|
1137
|
+
## Combine
|
1138
|
+
|
1139
|
+
- [ ] Combining dataframes
|
1140
|
+
|
1141
|
+
- [ ] Join
|
1142
|
+
|
1143
|
+
## Encoding
|
1144
|
+
|
1145
|
+
- [ ] One-hot encoding
|
data/doc/Vector.md
CHANGED
@@ -145,7 +145,7 @@ array[booleans]
|
|
145
145
|
| ✓ `min_max` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
|
146
146
|
|[ ]`mode` | | [ ] | |[ ] Mode | |
|
147
147
|
| ✓ `product` | ✓ | ✓ | | ✓ ScalarAggregate| |
|
148
|
-
|
|
148
|
+
| ✓ `quantile`| | ✓ | | ✓ Quantile|Specify probability in (0..1) by a parameter (default=0.5)|
|
149
149
|
| ✓ `sd ` | | ✓ | | |ddof: 1 at `stddev`|
|
150
150
|
| ✓ `stddev` | | ✓ | | ✓ Variance|ddof: 0 by default|
|
151
151
|
| ✓ `sum` | ✓ | ✓ | | ✓ ScalarAggregate| |
|
@@ -303,6 +303,10 @@ double.round(n_digits: -1)
|
|
303
303
|
|
304
304
|
Returns index of specified element.
|
305
305
|
|
306
|
+
### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`
|
307
|
+
|
308
|
+
Returns quantiles for specified probabilities in a DataFrame.
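
For readers unfamiliar with linear interpolation, the following plain-Ruby sketch shows one common definition of a quantile. It is an assumption that this matches the `:linear` option named above; the helpers `quantile` and `quantiles` here work on Ruby Arrays, skip nils, and return pairs rather than a DataFrame.

```ruby
# Quantile with linear interpolation between the two nearest sorted values.
def quantile(values, prob = 0.5)
  raise ArgumentError, 'prob must be in 0..1' unless (0.0..1.0).cover?(prob)

  sorted = values.compact.sort # nils are skipped
  return nil if sorted.empty?

  position = (sorted.size - 1) * prob
  lower = position.floor
  upper = position.ceil
  sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower)
end

# Pairs of [probability, quantile] for each requested probability.
def quantiles(values, probs = [1.0, 0.75, 0.5, 0.25, 0.0])
  probs.map { |prob| [prob, quantile(values, prob)] }
end

median = quantile([1, 2, 3, 4])            # 2.5
first_quartile = quantile([4, nil, 1, 3, 2], 0.25) # 1.75
table = quantiles([1, 2, 3, 4, 5])
```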
|
309
|
+
|
306
310
|
### `sort_indexes`, `sort_indices`, `array_sort_indices`
|
307
311
|
|
308
312
|
### [ ] `sort`, `sort_by`
|