red_amber 0.1.5 → 0.1.8
- checksums.yaml +4 -4
- data/.rubocop.yml +33 -5
- data/.rubocop_todo.yml +2 -15
- data/.yardopts +1 -0
- data/CHANGELOG.md +164 -18
- data/Gemfile +6 -1
- data/README.md +247 -33
- data/Rakefile +1 -0
- data/benchmark/csv_load_penguins.yml +1 -1
- data/doc/DataFrame.md +383 -219
- data/doc/Vector.md +247 -37
- data/doc/examples_of_red_amber.ipynb +5454 -0
- data/doc/image/dataframe/assign.png +0 -0
- data/doc/image/dataframe/drop.png +0 -0
- data/doc/image/dataframe/pick.png +0 -0
- data/doc/image/dataframe/remove.png +0 -0
- data/doc/image/dataframe/rename.png +0 -0
- data/doc/image/dataframe/slice.png +0 -0
- data/doc/image/dataframe_model.png +0 -0
- data/doc/image/vector/binary_element_wise.png +0 -0
- data/doc/image/vector/unary_aggregation.png +0 -0
- data/doc/image/vector/unary_aggregation_w_option.png +0 -0
- data/doc/image/vector/unary_element_wise.png +0 -0
- data/lib/red-amber.rb +3 -0
- data/lib/red_amber/data_frame.rb +62 -10
- data/lib/red_amber/data_frame_displayable.rb +86 -9
- data/lib/red_amber/data_frame_selectable.rb +151 -32
- data/lib/red_amber/data_frame_variable_operation.rb +4 -0
- data/lib/red_amber/group.rb +59 -0
- data/lib/red_amber/helper.rb +61 -0
- data/lib/red_amber/vector.rb +59 -15
- data/lib/red_amber/vector_functions.rb +47 -38
- data/lib/red_amber/vector_selectable.rb +126 -0
- data/lib/red_amber/vector_updatable.rb +125 -0
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +6 -3
- data/red_amber.gemspec +0 -2
- metadata +9 -33
- data/lib/red_amber/data_frame_helper.rb +0 -64
- data/lib/red_amber/data_frame_observation_operation.rb +0 -83
- data/lib/red_amber/vector_compensable.rb +0 -68
data/doc/DataFrame.md
CHANGED
@@ -4,7 +4,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
4
4
|
- A collection of data that all share the same data type. We call it a `Vector`.
|
5
5
|
- A label is attached to `Vector`. We call it `key`.
|
6
6
|
- A `Vector` and its associated `key` are grouped as a `variable`.
|
7
|
-
- `variable`s with same vector length are aligned and arranged to be a `
|
7
|
+
- `variable`s with the same vector length are aligned and arranged to form a `DataFrame`.
|
8
8
|
- Each `Vector` in a `DataFrame` contains a set of related data at the same position. We call it an `observation`.
|
9
9
|
|
10
10
|

|
@@ -35,6 +35,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
35
35
|
|
36
36
|
|
37
37
|
```ruby
|
38
|
+
require 'rover'
|
39
|
+
|
38
40
|
rover = Rover::DataFrame.new(x: [1, 2, 3])
|
39
41
|
RedAmber::DataFrame.new(rover)
|
40
42
|
```
|
@@ -52,13 +54,15 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
52
54
|
- from a URI
|
53
55
|
|
54
56
|
```ruby
|
55
|
-
uri = URI("
|
57
|
+
uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
|
56
58
|
RedAmber::DataFrame.load(uri)
|
57
59
|
```
|
58
60
|
|
59
61
|
- from a Parquet file
|
60
62
|
|
61
63
|
```ruby
|
64
|
+
require 'parquet'
|
65
|
+
|
62
66
|
dataframe = RedAmber::DataFrame.load("file.parquet")
|
63
67
|
```
|
64
68
|
|
@@ -73,6 +77,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
73
77
|
- to a Parquet file
|
74
78
|
|
75
79
|
```ruby
|
80
|
+
require 'parquet'
|
81
|
+
|
76
82
|
dataframe.save("file.parquet")
|
77
83
|
```
|
78
84
|
|
@@ -147,9 +153,9 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
147
153
|
|
148
154
|
- Returns an Array of Vectors.
|
149
155
|
|
150
|
-
### `
|
156
|
+
### `indices`, `indexes`
|
151
157
|
|
152
|
-
- Returns all indexes in
|
158
|
+
- Returns all indexes in an Array.
|
153
159
|
|
154
160
|
### `to_h`
|
155
161
|
|
@@ -173,12 +179,45 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
173
179
|
|
174
180
|
### `to_s`
|
175
181
|
|
182
|
+
`to_s` returns a preview of the Table.
|
183
|
+
|
184
|
+
```ruby
|
185
|
+
puts penguins.to_s
|
186
|
+
|
187
|
+
# =>
|
188
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
189
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
190
|
+
1 Adelie Torgersen 39.1 18.7 181 ... 2007
|
191
|
+
2 Adelie Torgersen 39.5 17.4 186 ... 2007
|
192
|
+
3 Adelie Torgersen 40.3 18.0 195 ... 2007
|
193
|
+
4 Adelie Torgersen (nil) (nil) (nil) ... 2007
|
194
|
+
5 Adelie Torgersen 36.7 19.3 193 ... 2007
|
195
|
+
: : : : : : ... :
|
196
|
+
342 Gentoo Biscoe 50.4 15.7 222 ... 2009
|
197
|
+
343 Gentoo Biscoe 45.2 14.8 212 ... 2009
|
198
|
+
344 Gentoo Biscoe 49.9 16.1 213 ... 2009
|
199
|
+
```
|
200
|
+
### `inspect`
|
201
|
+
|
202
|
+
`inspect` uses `to_s` output and also shows shape and object_id.
|
203
|
+
|
204
|
+
|
176
205
|
### `summary`, `describe` (not implemented)
|
177
206
|
|
178
207
|
### `to_rover`
|
179
208
|
|
180
209
|
- Returns a `Rover::DataFrame`.
|
181
210
|
|
211
|
+
```ruby
|
212
|
+
require 'rover'
|
213
|
+
|
214
|
+
penguins.to_rover
|
215
|
+
```
|
216
|
+
|
217
|
+
### `to_iruby`
|
218
|
+
|
219
|
+
- Shows the DataFrame as a Table in Jupyter Notebook or Jupyter Lab with IRuby.
|
220
|
+
|
182
221
|
### `tdr(limit = 10, tally: 5, elements: 5)`
|
183
222
|
|
184
223
|
- Shows some information about self in a transposed style.
|
@@ -190,6 +229,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
190
229
|
|
191
230
|
penguins = Datasets::Penguins.new.to_arrow
|
192
231
|
RedAmber::DataFrame.new(penguins).tdr
|
232
|
+
|
193
233
|
# =>
|
194
234
|
RedAmber::DataFrame : 344 x 8 Vectors
|
195
235
|
Vectors : 5 numeric, 3 strings
|
@@ -208,22 +248,6 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
208
248
|
- tally: max level to use tally mode.
|
209
249
|
- elements: max num of element to show values in each observations.
|
210
250
|
|
211
|
-
### `inspect`
|
212
|
-
|
213
|
-
- Returns the information of self as `tdr(3)`, and also shows object id.
|
214
|
-
|
215
|
-
```ruby
|
216
|
-
puts penguins.inspect
|
217
|
-
# =>
|
218
|
-
#<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f0b4>
|
219
|
-
Vectors : 5 numeric, 3 strings
|
220
|
-
# key type level data_preview
|
221
|
-
1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
|
222
|
-
2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
|
223
|
-
3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
|
224
|
-
... 5 more Vectors ...
|
225
|
-
```
|
226
|
-
|
227
251
|
## Selecting
|
228
252
|
|
229
253
|
### Select variables (columns in a table) by `[]` as `[key]`, `[keys]`, `[keys[index]]`
|
@@ -244,19 +268,21 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
244
268
|
hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
|
245
269
|
df = RedAmber::DataFrame.new(hash)
|
246
270
|
df[:b..:c, "a"]
|
271
|
+
|
247
272
|
# =>
|
248
|
-
#<RedAmber::DataFrame : 3 x 3 Vectors,
|
249
|
-
|
250
|
-
|
251
|
-
1
|
252
|
-
2
|
253
|
-
3
|
273
|
+
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000328fc>
|
274
|
+
b c a
|
275
|
+
<string> <double> <uint8>
|
276
|
+
1 A 1.0 1
|
277
|
+
2 B 2.0 2
|
278
|
+
3 C 3.0 3
|
254
279
|
```
|
255
280
|
|
256
281
|
If `#[]` selects a single variable (column), it returns a Vector object.
|
257
282
|
|
258
283
|
```ruby
|
259
284
|
df[:a]
|
285
|
+
|
260
286
|
# =>
|
261
287
|
#<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
|
262
288
|
[1, 2, 3]
|
@@ -265,6 +291,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
265
291
|
|
266
292
|
```ruby
|
267
293
|
df.v(:a)
|
294
|
+
|
268
295
|
# =>
|
269
296
|
#<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
|
270
297
|
[1, 2, 3]
|
@@ -280,19 +307,24 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
280
307
|
An endless or a beginless Range can be used to represent indices.
|
281
308
|
|
282
309
|
- Select obs. by indices in an Array: `df[1, 2]`
|
310
|
+
|
311
|
+
- You can use float indices.
|
312
|
+
|
283
313
|
- Mixed case: `df[2, 0..]`
|
284
314
|
|
285
315
|
```ruby
|
286
316
|
hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
|
287
317
|
df = RedAmber::DataFrame.new(hash)
|
288
|
-
df[
|
318
|
+
df[2, 0..]
|
319
|
+
|
289
320
|
# =>
|
290
|
-
RedAmber::DataFrame : 4 x 3 Vectors
|
291
|
-
|
292
|
-
|
293
|
-
1
|
294
|
-
2
|
295
|
-
3
|
321
|
+
#<RedAmber::DataFrame : 4 x 3 Vectors, 0x0000000000033270>
|
322
|
+
a b c
|
323
|
+
<uint8> <string> <double>
|
324
|
+
1 3 C 3.0
|
325
|
+
2 1 A 1.0
|
326
|
+
3 2 B 2.0
|
327
|
+
4 3 C 3.0
|
296
328
|
```
|
297
329
|
|
298
330
|
- Select obs. by a boolean Array or a boolean RedAmber::Vector of the same size as self.
|
@@ -304,13 +336,12 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
304
336
|
df[true, false, nil] # or
|
305
337
|
df[[true, false, nil]] # or
|
306
338
|
df[RedAmber::Vector.new([true, false, nil])]
|
339
|
+
|
307
340
|
# =>
|
308
|
-
#<RedAmber::DataFrame : 1 x 3 Vectors,
|
309
|
-
|
310
|
-
|
311
|
-
1
|
312
|
-
2 :b string 1 ["A"]
|
313
|
-
3 :c double 1 [1.0]
|
341
|
+
#<RedAmber::DataFrame : 1 x 3 Vectors, 0x00000000000353e0>
|
342
|
+
a b c
|
343
|
+
<uint8> <string> <double>
|
344
|
+
1 1 A 1.0
|
314
345
|
```
|
315
346
|
|
316
347
|
### Select rows from top or from bottom
|
@@ -331,12 +362,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
331
362
|
|
332
363
|
```ruby
|
333
364
|
penguins.pick(:species, :bill_length_mm)
|
365
|
+
|
334
366
|
# =>
|
335
|
-
#<RedAmber::DataFrame : 344 x 2 Vectors,
|
336
|
-
|
337
|
-
|
338
|
-
|
339
|
-
|
367
|
+
#<RedAmber::DataFrame : 344 x 2 Vectors, 0x0000000000035ebc>
|
368
|
+
species bill_length_mm
|
369
|
+
<string> <double>
|
370
|
+
1 Adelie 39.1
|
371
|
+
2 Adelie 39.5
|
372
|
+
3 Adelie 40.3
|
373
|
+
4 Adelie (nil)
|
374
|
+
5 Adelie 36.7
|
375
|
+
: : :
|
376
|
+
342 Gentoo 50.4
|
377
|
+
343 Gentoo 45.2
|
378
|
+
344 Gentoo 49.9
|
340
379
|
```
|
341
380
|
|
342
381
|
- Booleans as an argument
|
@@ -345,13 +384,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
345
384
|
|
346
385
|
```ruby
|
347
386
|
penguins.pick(penguins.types.map { |type| type == :string })
|
387
|
+
|
348
388
|
# =>
|
349
|
-
#<RedAmber::DataFrame : 344 x 3 Vectors,
|
350
|
-
|
351
|
-
|
352
|
-
|
353
|
-
|
354
|
-
|
389
|
+
#<RedAmber::DataFrame : 344 x 3 Vectors, 0x00000000000387ac>
|
390
|
+
species island sex
|
391
|
+
<string> <string> <string>
|
392
|
+
1 Adelie Torgersen male
|
393
|
+
2 Adelie Torgersen female
|
394
|
+
3 Adelie Torgersen female
|
395
|
+
4 Adelie Torgersen (nil)
|
396
|
+
5 Adelie Torgersen female
|
397
|
+
: : : :
|
398
|
+
342 Gentoo Biscoe male
|
399
|
+
343 Gentoo Biscoe female
|
400
|
+
344 Gentoo Biscoe male
|
355
401
|
```
|
356
402
|
|
357
403
|
- Keys or booleans by a block
|
@@ -359,15 +405,21 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
359
405
|
`pick {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return keys, or a boolean Array with the same length as `n_keys`. The block is called in the context of self.
|
360
406
|
|
361
407
|
```ruby
|
362
|
-
# It is ok to write `keys ...` in the block, not `penguins.keys ...`
|
363
408
|
penguins.pick { keys.map { |key| key.end_with?('mm') } }
|
409
|
+
|
364
410
|
# =>
|
365
|
-
#<RedAmber::DataFrame : 344 x 3 Vectors,
|
366
|
-
|
367
|
-
|
368
|
-
|
369
|
-
|
370
|
-
|
411
|
+
#<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003dd4c>
|
412
|
+
bill_length_mm bill_depth_mm flipper_length_mm
|
413
|
+
<double> <double> <uint8>
|
414
|
+
1 39.1 18.7 181
|
415
|
+
2 39.5 17.4 186
|
416
|
+
3 40.3 18.0 195
|
417
|
+
4 (nil) (nil) (nil)
|
418
|
+
5 36.7 19.3 193
|
419
|
+
: : : :
|
420
|
+
342 50.4 15.7 222
|
421
|
+
343 45.2 14.8 212
|
422
|
+
344 49.9 16.1 213
|
371
423
|
```
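
The phrase "called in the context of self" can be illustrated with plain Ruby. The sketch below is only an illustration under stated assumptions: `TinyFrame` is a hypothetical stand-in class, not `RedAmber::DataFrame`, and it uses `instance_eval` so that a bare `keys` inside the block resolves to the frame's own method.

```ruby
# TinyFrame is a hypothetical stand-in, not RedAmber::DataFrame.
class TinyFrame
  attr_reader :columns

  def initialize(columns)
    @columns = columns
  end

  def keys
    @columns.keys
  end

  # Accepts keys or booleans, either as arguments or from a block (not both).
  def pick(*args, &block)
    raise ArgumentError, 'pass arguments or a block, not both' if block && !args.empty?

    # instance_eval runs the block with self set to this frame,
    # so `keys` inside the block means `self.keys`.
    selector = Array(block ? instance_eval(&block) : args).flatten
    picked =
      if selector.all? { |s| [true, false, nil].include?(s) }
        raise ArgumentError, 'booleans must match the number of keys' unless selector.size == keys.size

        keys.select.with_index { |_key, i| selector[i] }
      else
        selector.map(&:to_sym)
      end
    TinyFrame.new(@columns.slice(*picked))
  end
end

frame = TinyFrame.new(species: %w[Adelie], bill_length_mm: [39.1], bill_depth_mm: [18.7])
p frame.pick { keys.map { |key| key.to_s.end_with?('mm') } }.keys # => [:bill_length_mm, :bill_depth_mm]
p frame.pick(:species).keys                                       # => [:species]
```

Because the block is evaluated against the frame itself, it does not need to name the receiver, which is what allows `keys ...` rather than `penguins.keys ...` in the example above.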
|
372
424
|
|
373
425
|
### `drop ` - pick and drop -
|
@@ -405,13 +457,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
405
457
|
df = RedAmber::DataFrame.new(a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3])
|
406
458
|
df.pick(:a) # or
|
407
459
|
df.drop(:b, :c)
|
460
|
+
|
408
461
|
# =>
|
409
|
-
#<RedAmber::DataFrame : 3 x 1 Vector,
|
410
|
-
|
411
|
-
|
412
|
-
1
|
462
|
+
#<RedAmber::DataFrame : 3 x 1 Vector, 0x000000000003f4bc>
|
463
|
+
a
|
464
|
+
<uint8>
|
465
|
+
1 1
|
466
|
+
2 2
|
467
|
+
3 3
|
413
468
|
|
414
469
|
df[:a]
|
470
|
+
|
415
471
|
# =>
|
416
472
|
#<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
|
417
473
|
[1, 2, 3]
|
@@ -423,21 +479,29 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
423
479
|
|
424
480
|

|
425
481
|
|
426
|
-
-
|
482
|
+
- Indices as arguments
|
483
|
+
|
484
|
+
`slice(indices)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers.
|
427
485
|
|
428
|
-
|
486
|
+
Negative indices counting from the tail, as with Ruby's Array, are also acceptable.
|
429
487
|
|
430
488
|
```ruby
|
431
489
|
# returns 5 obs. at start and 5 obs. from end
|
432
490
|
penguins.slice(0...5, -5..-1)
|
491
|
+
|
433
492
|
# =>
|
434
|
-
#<RedAmber::DataFrame : 10 x 8 Vectors,
|
435
|
-
|
436
|
-
|
437
|
-
|
438
|
-
|
439
|
-
|
440
|
-
|
493
|
+
#<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
|
494
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
495
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
496
|
+
1 Adelie Torgersen 39.1 18.7 181 ... 2007
|
497
|
+
2 Adelie Torgersen 39.5 17.4 186 ... 2007
|
498
|
+
3 Adelie Torgersen 40.3 18.0 195 ... 2007
|
499
|
+
4 Adelie Torgersen (nil) (nil) (nil) ... 2007
|
500
|
+
5 Adelie Torgersen 36.7 19.3 193 ... 2007
|
501
|
+
: : : : : : ... :
|
502
|
+
8 Gentoo Biscoe 50.4 15.7 222 ... 2009
|
503
|
+
9 Gentoo Biscoe 45.2 14.8 212 ... 2009
|
504
|
+
10 Gentoo Biscoe 49.9 16.1 213 ... 2009
|
441
505
|
```
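
How a mix of Integers, Ranges and negative indices maps to row positions can be sketched in plain Ruby. This is a minimal illustration, not RedAmber's implementation: `expand_indices` is a hypothetical helper, and Float indices are left out because the rounding rule is not stated here.

```ruby
# Hypothetical helper: expand Integers and Ranges into non-negative row positions.
def expand_indices(indices, size)
  positions = (0...size).to_a
  indices.flat_map do |index|
    case index
    when Range   then positions[index] || [] # Array#[] resolves negative, endless and beginless ranges
    when Integer then [index.negative? ? index + size : index]
    else raise ArgumentError, "unsupported index: #{index.inspect}"
    end
  end
end

rows   = %w[r0 r1 r2 r3 r4 r5 r6 r7]
picked = expand_indices([0...2, -2..-1], rows.size)
p picked                  # => [0, 1, 6, 7]
p rows.values_at(*picked) # => ["r0", "r1", "r6", "r7"]
```

With three rows, `expand_indices([2, (0..)], 3)` gives `[2, 0, 1, 2]`, the same four-row shape as the `df[2, 0..]` example earlier in this document.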
|
442
506
|
|
443
507
|
- Booleans as an argument
|
@@ -447,17 +511,23 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
447
511
|
```ruby
|
448
512
|
vector = penguins[:bill_length_mm]
|
449
513
|
penguins.slice(vector >= 40)
|
514
|
+
|
450
515
|
# =>
|
451
|
-
#<RedAmber::DataFrame : 242 x 8 Vectors,
|
452
|
-
|
453
|
-
|
454
|
-
|
455
|
-
|
456
|
-
|
457
|
-
|
516
|
+
#<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
|
517
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
518
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
519
|
+
1 Adelie Torgersen 40.3 18.0 195 ... 2007
|
520
|
+
2 Adelie Torgersen 42.0 20.2 190 ... 2007
|
521
|
+
3 Adelie Torgersen 41.1 17.6 182 ... 2007
|
522
|
+
4 Adelie Torgersen 42.5 20.7 197 ... 2007
|
523
|
+
5 Adelie Torgersen 46.0 21.5 194 ... 2007
|
524
|
+
: : : : : : ... :
|
525
|
+
240 Gentoo Biscoe 50.4 15.7 222 ... 2009
|
526
|
+
241 Gentoo Biscoe 45.2 14.8 212 ... 2009
|
527
|
+
242 Gentoo Biscoe 49.9 16.1 213 ... 2009
|
458
528
|
```
|
459
529
|
|
460
|
-
-
|
530
|
+
- Indices or booleans by a block
|
461
531
|
|
462
532
|
`slice {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return indices or a boolean Array with the same length as `size`. The block is called in the context of self.
|
463
533
|
|
@@ -469,14 +539,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
469
539
|
max = vector.mean + vector.std
|
470
540
|
vector.to_a.map { |e| (min..max).include? e }
|
471
541
|
end
|
542
|
+
|
472
543
|
# =>
|
473
|
-
#<RedAmber::DataFrame : 204 x 8 Vectors,
|
474
|
-
|
475
|
-
|
476
|
-
|
477
|
-
|
478
|
-
|
479
|
-
|
544
|
+
#<RedAmber::DataFrame : 204 x 8 Vectors, 0x0000000000047a40>
|
545
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
546
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
547
|
+
1 Adelie Torgersen 39.1 18.7 181 ... 2007
|
548
|
+
2 Adelie Torgersen 39.5 17.4 186 ... 2007
|
549
|
+
3 Adelie Torgersen 40.3 18.0 195 ... 2007
|
550
|
+
4 Adelie Torgersen 39.3 20.6 190 ... 2007
|
551
|
+
5 Adelie Torgersen 38.9 17.8 181 ... 2007
|
552
|
+
: : : : : : ... :
|
553
|
+
202 Gentoo Biscoe 47.2 13.7 214 ... 2009
|
554
|
+
203 Gentoo Biscoe 46.8 14.3 215 ... 2009
|
555
|
+
204 Gentoo Biscoe 45.2 14.8 212 ... 2009
|
480
556
|
```
|
481
557
|
|
482
558
|
- Notice: nil option
|
@@ -486,6 +562,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
486
562
|
hash = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3] }
|
487
563
|
table = Arrow::Table.new(hash)
|
488
564
|
table.slice([true, false, nil])
|
565
|
+
|
489
566
|
# =>
|
490
567
|
#<Arrow::Table:0x7fdfe44b9e18 ptr=0x555e9fe744d0>
|
491
568
|
a b c
|
@@ -497,6 +574,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
497
574
|
|
498
575
|
```ruby
|
499
576
|
RedAmber::DataFrame.new(table).slice([true, false, nil]).table
|
577
|
+
|
500
578
|
# =>
|
501
579
|
#<Arrow::Table:0x7fdfe44981c8 ptr=0x555e9febc330>
|
502
580
|
a b c
|
@@ -509,21 +587,27 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
509
587
|
|
510
588
|

|
511
589
|
|
512
|
-
-
|
590
|
+
- Indices as arguments
|
513
591
|
|
514
592
|
`remove(indices)` accepts indices as arguments. Each index should be an Integer or a Range of Integers.
|
515
593
|
|
516
594
|
```ruby
|
517
595
|
# returns 6th to 339th obs.
|
518
596
|
penguins.remove(0...5, -5..-1)
|
597
|
+
|
519
598
|
# =>
|
520
|
-
#<RedAmber::DataFrame : 334 x 8 Vectors,
|
521
|
-
|
522
|
-
|
523
|
-
|
524
|
-
|
525
|
-
|
526
|
-
|
599
|
+
#<RedAmber::DataFrame : 334 x 8 Vectors, 0x00000000000487c4>
|
600
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
601
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
602
|
+
1 Adelie Torgersen 39.3 20.6 190 ... 2007
|
603
|
+
2 Adelie Torgersen 38.9 17.8 181 ... 2007
|
604
|
+
3 Adelie Torgersen 39.2 19.6 195 ... 2007
|
605
|
+
4 Adelie Torgersen 34.1 18.1 193 ... 2007
|
606
|
+
5 Adelie Torgersen 42.0 20.2 190 ... 2007
|
607
|
+
: : : : : : ... :
|
608
|
+
332 Gentoo Biscoe 44.5 15.7 217 ... 2009
|
609
|
+
333 Gentoo Biscoe 48.8 16.2 222 ... 2009
|
610
|
+
334 Gentoo Biscoe 47.2 13.7 214 ... 2009
|
527
611
|
```
|
528
612
|
|
529
613
|
- Booleans as an argument
|
@@ -533,22 +617,24 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
533
617
|
```ruby
|
534
618
|
# remove all observations that contain nil
|
535
619
|
removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
|
536
|
-
removed
|
620
|
+
removed
|
621
|
+
|
537
622
|
# =>
|
538
|
-
RedAmber::DataFrame : 333 x 8 Vectors
|
539
|
-
|
540
|
-
|
541
|
-
|
542
|
-
|
543
|
-
|
544
|
-
|
545
|
-
|
546
|
-
|
547
|
-
7
|
548
|
-
8
|
623
|
+
#<RedAmber::DataFrame : 333 x 8 Vectors, 0x0000000000049fac>
|
624
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
625
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
626
|
+
1 Adelie Torgersen 39.1 18.7 181 ... 2007
|
627
|
+
2 Adelie Torgersen 39.5 17.4 186 ... 2007
|
628
|
+
3 Adelie Torgersen 40.3 18.0 195 ... 2007
|
629
|
+
4 Adelie Torgersen 36.7 19.3 193 ... 2007
|
630
|
+
5 Adelie Torgersen 39.3 20.6 190 ... 2007
|
631
|
+
: : : : : : ... :
|
632
|
+
331 Gentoo Biscoe 50.4 15.7 222 ... 2009
|
633
|
+
332 Gentoo Biscoe 45.2 14.8 212 ... 2009
|
634
|
+
333 Gentoo Biscoe 49.9 16.1 213 ... 2009
|
549
635
|
```
|
550
636
|
|
551
|
-
-
|
637
|
+
- Indices or booleans by a block
|
552
638
|
|
553
639
|
`remove {block}` is also acceptable. We can't use both arguments and a block at the same time. The block should return indices or a boolean Array with the same length as `size`. The block is called in the context of self.
|
554
640
|
|
@@ -559,14 +645,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
559
645
|
max = vector.mean + vector.std
|
560
646
|
vector.to_a.map { |e| (min..max).include? e }
|
561
647
|
end
|
648
|
+
|
562
649
|
# =>
|
563
|
-
#<RedAmber::DataFrame : 140 x 8 Vectors,
|
564
|
-
|
565
|
-
|
566
|
-
|
567
|
-
|
568
|
-
|
569
|
-
|
650
|
+
#<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000004de40>
|
651
|
+
species island bill_length_mm bill_depth_mm flipper_length_mm ... year
|
652
|
+
<string> <string> <double> <double> <uint8> ... <uint16>
|
653
|
+
1 Adelie Torgersen (nil) (nil) (nil) ... 2007
|
654
|
+
2 Adelie Torgersen 36.7 19.3 193 ... 2007
|
655
|
+
3 Adelie Torgersen 34.1 18.1 193 ... 2007
|
656
|
+
4 Adelie Torgersen 37.8 17.1 186 ... 2007
|
657
|
+
5 Adelie Torgersen 37.8 17.3 180 ... 2007
|
658
|
+
: : : : : : ... :
|
659
|
+
138 Gentoo Biscoe (nil) (nil) (nil) ... 2009
|
660
|
+
139 Gentoo Biscoe 50.4 15.7 222 ... 2009
|
661
|
+
140 Gentoo Biscoe 49.9 16.1 213 ... 2009
|
570
662
|
```
|
571
663
|
- Notice for nil
|
572
664
|
- When `remove` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `nil#!`.
|
@@ -574,28 +666,34 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
574
666
|
```ruby
|
575
667
|
df = RedAmber::DataFrame.new(a: [1, 2, nil], b: %w[A B C], c: [1.0, 2, 3])
|
576
668
|
booleans = df[:a] < 2
|
669
|
+
booleans
|
670
|
+
|
577
671
|
# =>
|
578
672
|
#<RedAmber::Vector(:boolean, size=3):0x000000000000f410>
|
579
673
|
[true, false, nil]
|
580
674
|
|
581
675
|
booleans_invert = booleans.to_a.map(&:!) # => [false, true, true]
|
676
|
+
|
582
677
|
df.slice(booleans) == df.remove(booleans_invert) # => true
|
583
678
|
```
|
679
|
+
|
584
680
|
- By contrast, `Vector#invert` returns nil for nil elements, which produces a different result.
|
585
681
|
|
586
682
|
```ruby
|
587
683
|
booleans.invert
|
684
|
+
|
588
685
|
# =>
|
589
686
|
#<RedAmber::Vector(:boolean, size=3):0x000000000000f488>
|
590
687
|
[false, true, nil]
|
591
688
|
|
592
689
|
df.remove(booleans.invert)
|
593
|
-
|
594
|
-
|
595
|
-
|
596
|
-
|
597
|
-
|
598
|
-
|
690
|
+
|
691
|
+
# =>
|
692
|
+
#<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000005df98>
|
693
|
+
a b c
|
694
|
+
<uint8> <string> <double>
|
695
|
+
1 1 A 1.0
|
696
|
+
2 (nil) C 3.0
|
599
697
|
```
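
The difference between the two kinds of inversion can be reproduced with plain Ruby Arrays. The sketch below is an illustration only: `keep_rows` and `remove_rows` are hypothetical helpers that mimic the nil-as-false rule described above, not RedAmber methods.

```ruby
# Hypothetical helpers on plain Arrays; not RedAmber's implementation.

# Keep rows whose mask entry is truthy (nil counts as false).
def keep_rows(rows, mask)
  rows.select.with_index { |_row, i| mask[i] }
end

# Drop rows whose mask entry is truthy (nil counts as false, so the row stays).
def remove_rows(rows, mask)
  rows.reject.with_index { |_row, i| mask[i] }
end

rows     = [[1, 'A'], [2, 'B'], [nil, 'C']]
booleans = [true, false, nil]

ruby_invert     = booleans.map(&:!)                      # => [false, true, true]
nil_keep_invert = booleans.map { |b| b.nil? ? nil : !b } # => [false, true, nil]

p keep_rows(rows, booleans) == remove_rows(rows, ruby_invert) # => true
p remove_rows(rows, nil_keep_invert)                          # => [[1, "A"], [nil, "C"]]
```

With Ruby's `!`, nil becomes true, so keeping by the mask and removing by its inverse agree. When nil is preserved through inversion, the nil row is neither kept by the mask nor removed by its inverse, which is why the row with nil remains in the result.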
|
600
698
|
|
601
699
|
### `rename`
|
@@ -609,15 +707,16 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
609
707
|
`rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
|
610
708
|
|
611
709
|
```ruby
|
612
|
-
|
613
|
-
df = RedAmber::DataFrame.new(h)
|
710
|
+
df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
|
614
711
|
df.rename(:age => :age_in_1993)
|
712
|
+
|
615
713
|
# =>
|
616
|
-
#<RedAmber::DataFrame : 3 x 2 Vectors,
|
617
|
-
|
618
|
-
|
619
|
-
1
|
620
|
-
2
|
714
|
+
#<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000060838>
|
715
|
+
name age_in_1993
|
716
|
+
<string> <uint8>
|
717
|
+
1 Yasuko 68
|
718
|
+
2 Rui 49
|
719
|
+
3 Hinata 28
|
621
720
|
```
|
622
721
|
|
623
722
|
- Key pairs by a block
|
@@ -643,25 +742,29 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
643
742
|
|
644
743
|
```ruby
|
645
744
|
df = RedAmber::DataFrame.new(
|
646
|
-
|
647
|
-
|
745
|
+
name: %w[Yasuko Rui Hinata],
|
746
|
+
age: [68, 49, 28])
|
747
|
+
df
|
748
|
+
|
648
749
|
# =>
|
649
|
-
#<RedAmber::DataFrame : 3 x 2 Vectors,
|
650
|
-
|
651
|
-
|
652
|
-
1
|
653
|
-
2
|
750
|
+
#<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000062804>
|
751
|
+
name age
|
752
|
+
<string> <uint8>
|
753
|
+
1 Yasuko 68
|
754
|
+
2 Rui 49
|
755
|
+
3 Hinata 28
|
654
756
|
|
655
757
|
# update :age and add :brother
|
656
758
|
assigner = { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }
|
657
759
|
df.assign(assigner)
|
760
|
+
|
658
761
|
# =>
|
659
|
-
#<RedAmber::DataFrame : 3 x 3 Vectors,
|
660
|
-
|
661
|
-
|
662
|
-
1
|
663
|
-
2
|
664
|
-
3
|
762
|
+
#<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
|
763
|
+
name age brother
|
764
|
+
<string> <uint8> <string>
|
765
|
+
1 Yasuko 97 Santa
|
766
|
+
2 Rui 78 (nil)
|
767
|
+
3 Hinata 57 Momotaro
|
665
768
|
```
|
666
769
|
|
667
770
|
- Key pairs by a block
|
@@ -673,13 +776,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
673
776
|
index: [0, 1, 2, 3, nil],
|
674
777
|
float: [0.0, 1.1, 2.2, Float::NAN, nil],
|
675
778
|
string: ['A', 'B', 'C', 'D', nil])
|
779
|
+
df
|
780
|
+
|
676
781
|
# =>
|
677
|
-
#<RedAmber::DataFrame : 5 x 3 Vectors,
|
678
|
-
|
679
|
-
|
680
|
-
1
|
681
|
-
2
|
682
|
-
3
|
782
|
+
#<RedAmber::DataFrame : 5 x 3 Vectors, 0x0000000000069e60>
|
783
|
+
index float string
|
784
|
+
<uint8> <double> <string>
|
785
|
+
1 0 0.0 A
|
786
|
+
2 1 1.1 B
|
787
|
+
3 2 2.2 C
|
788
|
+
4 3 NaN D
|
789
|
+
5 (nil) (nil) (nil)
|
683
790
|
|
684
791
|
# update numeric variables
|
685
792
|
df.assign do
|
@@ -689,13 +796,16 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
689
796
|
end
|
690
797
|
assigner
|
691
798
|
end
|
799
|
+
|
692
800
|
# =>
|
693
|
-
#<RedAmber::DataFrame : 5 x 3 Vectors,
|
694
|
-
|
695
|
-
|
696
|
-
1
|
697
|
-
2
|
698
|
-
3
|
801
|
+
#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000006e000>
|
802
|
+
index float string
|
803
|
+
<int8> <double> <string>
|
804
|
+
1 0 -0.0 A
|
805
|
+
2 -1 -1.1 B
|
806
|
+
3 -2 -2.2 C
|
807
|
+
4 -3 NaN D
|
808
|
+
5 (nil) (nil) (nil)
|
699
809
|
|
700
810
|
# Or, more concisely:
|
701
811
|
df.assign do
|
@@ -703,6 +813,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
703
813
|
assigner[key] = vector * -1 if vector.numeric?
|
704
814
|
end
|
705
815
|
end
|
816
|
+
|
706
817
|
# => same as above
|
707
818
|
```
|
708
819
|
|
@@ -724,14 +835,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
724
835
|
string: ['C', 'B', nil, 'A', 'B'],
|
725
836
|
bool: [nil, true, false, true, false],
|
726
837
|
})
|
727
|
-
df.sort(:index, '-bool')
|
838
|
+
df.sort(:index, '-bool')
|
839
|
+
|
728
840
|
# =>
|
729
|
-
RedAmber::DataFrame : 5 x 3 Vectors
|
730
|
-
|
731
|
-
|
732
|
-
1
|
733
|
-
2
|
734
|
-
3
|
841
|
+
#<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000009b03c>
|
842
|
+
index string bool
|
843
|
+
<uint8> <string> <boolean>
|
844
|
+
1 0 (nil) false
|
845
|
+
2 0 B false
|
846
|
+
3 1 B true
|
847
|
+
4 1 C (nil)
|
848
|
+
5 (nil) A true
|
735
849
|
```
|
736
850
|
|
737
851
|
- [ ] Clamp
|
@@ -746,64 +860,10 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
746
860
|
|
747
861
|
## Grouping
|
748
862
|
|
749
|
-
### `group(
|
750
|
-
|
751
|
-
Create grouped dataframe by `aggregation_keys` and apply `function` to each group and returns in `target_keys`. Aggregated key name is `function(key)` style.
|
863
|
+
### `group(group_keys)`
|
752
864
|
|
753
|
-
|
754
|
-
|
755
|
-
```ruby
|
756
|
-
ds = Datasets::Rdatasets.new('dplyr', 'starwars')
|
757
|
-
starwars = RedAmber::DataFrame.new(ds.to_table.to_h)
|
758
|
-
starwars.tdr(11)
|
759
|
-
# =>
|
760
|
-
RedAmber::DataFrame : 87 x 11 Vectors
|
761
|
-
Vectors : 3 numeric, 8 strings
|
762
|
-
# key type level data_preview
|
763
|
-
1 :name string 87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
|
764
|
-
2 :height uint16 46 [172, 167, 96, 202, 150, ... ], 6 nils
|
765
|
-
3 :mass double 39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
|
766
|
-
4 :hair_color string 13 ["blond", nil, nil, "none", "brown", ... ], 5 nils
|
767
|
-
5 :skin_color string 31 ["fair", "gold", "white, blue", "white", "light", .. . ]
|
768
|
-
6 :eye_color string 15 ["blue", "yellow", "red", "yellow", "brown", ... ]
|
769
|
-
7 :birth_year double 37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
|
770
|
-
8 :sex string 5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, nil=>4}
|
771
|
-
9 :gender string 3 {"masculine"=>66, "feminine"=>17, nil=>4}
|
772
|
-
10 :homeworld string 49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ], 10 nils
|
773
|
-
11 :species string 38 ["Human", "Droid", "Droid", "Human", "Human", ... ], 4 nils
|
774
|
-
|
775
|
-
grouped = starwars.group(:species, :mean, [:mass, :height])
|
776
|
-
# =>
|
777
|
-
#<RedAmber::DataFrame : 38 x 3 Vectors, 0x000000000000fbf4>
|
778
|
-
Vectors : 2 numeric, 1 string
|
779
|
-
# key type level data_preview
|
780
|
-
1 :"mean(mass)" double 27 [82.78181818181818, 69.75, 124.0, 74.0, 1358.0, ... ], 6 nils
|
781
|
-
2 :"mean(height)" double 32 [176.6451612903226, 131.2, 231.0, 173.0, 175.0, ... ]
|
782
|
-
3 :species string 38 ["Human", "Droid", "Wookiee", "Rodian", "Hutt", ... ], 1 nil
|
783
|
-
|
784
|
-
count = starwars.group(:species, :count, :species)[:"count(species)"]
|
785
|
-
df = grouped.slice(count > 1)
|
786
|
-
# =>
|
787
|
-
#<RedAmber::DataFrame : 8 x 3 Vectors, 0x000000000000fc44>
|
788
|
-
Vectors : 2 numeric, 1 string
|
789
|
-
# key type level data_preview
|
790
|
-
1 :"mean(mass)" double 8 [82.78181818181818, 69.75, 124.0, 74.0, 80.0, ... ]
|
791
|
-
2 :"mean(height)" double 8 [176.6451612903226, 131.2, 231.0, 208.66666666666666, 173.0, ... ]
|
792
|
-
3 :species string 8 ["Human", "Droid", "Wookiee", "Gungan", "Zabrak", ... ]
|
793
|
-
|
794
|
-
df.table
|
795
|
-
# =>
|
796
|
-
#<Arrow::Table:0x1165593c8 ptr=0x7fb3db144c70>
|
797
|
-
mean(mass) mean(height) species
|
798
|
-
0 82.781818 176.645161 Human
|
799
|
-
1 69.750000 131.200000 Droid
|
800
|
-
2 124.000000 231.000000 Wookiee
|
801
|
-
3 74.000000 208.666667 Gungan
|
802
|
-
4 80.000000 173.000000 Zabrak
|
803
|
-
5 55.000000 179.000000 Twi'lek
|
804
|
-
6 53.100000 168.000000 Mirialan
|
805
|
-
7 88.000000 221.000000 Kaminoan
|
806
|
-
```
|
865
|
+
`group` creates an instance of class `Group`. `Group` accepts the functions below as methods.
|
866
|
+
Method accepts options as `group_keys`.
|
807
867
|
|
808
868
|
Available functions are:
|
809
869
|
|
@@ -823,11 +883,113 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
823
883
|
- [ ] tdigest
|
824
884
|
- ✓ variance
|
825
885
|
|
826
|
-
|
886
|
+
For each group of `group_keys`, the aggregation `function` is applied, returning a new dataframe with aggregated keys according to `summary_keys`.
|
887
|
+
Summary key names are given in `function(summary_keys)` style.
|
888
|
+
|
889
|
+
This is an example of grouping the famous STARWARS dataset.
|
890
|
+
|
891
|
+
```ruby
|
892
|
+
starwars =
|
893
|
+
RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
|
894
|
+
starwars
|
895
|
+
|
896
|
+
# =>
|
897
|
+
#<RedAmber::DataFrame : 87 x 12 Vectors, 0x0000000000005a50>
|
898
|
+
unnamed1 name height mass hair_color skin_color eye_color ... species
|
899
|
+
<int64> <string> <int64> <double> <string> <string> <string> ... <string>
|
900
|
+
1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
|
901
|
+
2 2 C-3PO 167 75.0 NA gold yellow ... Droid
|
902
|
+
3 3 R2-D2 96 32.0 NA white, blue red ... Droid
|
903
|
+
4 4 Darth Vader 202 136.0 none white yellow ... Human
|
904
|
+
5 5 Leia Organa 150 49.0 brown light brown ... Human
|
905
|
+
: : : : : : : : ... :
|
906
|
+
85 85 BB8 (nil) (nil) none none black ... Droid
|
907
|
+
86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
|
908
|
+
87 87 Padmé Amidala 165 45.0 brown light brown ... Human
|
909
|
+
|
910
|
+
starwars.tdr(12)
|
827
911
|
|
828
|
-
|
912
|
+
# =>
|
913
|
+
RedAmber::DataFrame : 87 x 12 Vectors
|
914
|
+
Vectors : 4 numeric, 8 strings
|
915
|
+
# key type level data_preview
|
916
|
+
1 :unnamed1 int64 87 [1, 2, 3, 4, 5, ... ]
|
917
|
+
2 :name string 87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
|
918
|
+
3 :height int64 46 [172, 167, 96, 202, 150, ... ], 6 nils
|
919
|
+
4 :mass double 39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
|
920
|
+
5 :hair_color string 13 ["blond", "NA", "NA", "none", "brown", ... ]
|
921
|
+
6 :skin_color string 31 ["fair", "gold", "white, blue", "white", "light", ... ]
|
922
|
+
7 :eye_color string 15 ["blue", "yellow", "red", "yellow", "brown", ... ]
|
923
|
+
8 :birth_year double 37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
|
924
|
+
9 :sex string 5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, "NA"=>4}
|
925
|
+
10 :gender string 3 {"masculine"=>66, "feminine"=>17, "NA"=>4}
|
926
|
+
11 :homeworld string 49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ]
|
927
|
+
12 :species string 38 ["Human", "Droid", "Droid", "Human", "Human", ... ]
|
928
|
+
```
|
829
929
|
|
830
|
-
|
930
|
+
We can group by `:species` and calculate the count.
|
931
|
+
|
932
|
+
```ruby
|
933
|
+
starwars.group(:species).count(:species)
|
934
|
+
|
935
|
+
# =>
|
936
|
+
#<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
|
937
|
+
species count
|
938
|
+
<string> <int64>
|
939
|
+
1 Human 35
|
940
|
+
2 Droid 6
|
941
|
+
3 Wookiee 2
|
942
|
+
4 Rodian 1
|
943
|
+
5 Hutt 1
|
944
|
+
: : :
|
945
|
+
36 Kaleesh 1
|
946
|
+
37 Pau'an 1
|
947
|
+
38 Kel Dor 1
|
948
|
+
```
|
949
|
+
|
950
|
+
We can also calculate the mean of `:mass` and `:height` together.
|
951
|
+
|
952
|
+
```ruby
|
953
|
+
grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
|
954
|
+
|
955
|
+
# =>
|
956
|
+
#<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
|
957
|
+
species count mean(height) mean(mass)
|
958
|
+
<string> <int64> <double> <double>
|
959
|
+
1 Human 35 176.6 82.8
|
960
|
+
2 Droid 6 131.2 69.8
|
961
|
+
3 Wookiee 2 231.0 124.0
|
962
|
+
4 Rodian 1 173.0 74.0
|
963
|
+
5 Hutt 1 175.0 1358.0
|
964
|
+
: : : : :
|
965
|
+
36 Kaleesh 1 216.0 159.0
|
966
|
+
37 Pau'an 1 206.0 80.0
|
967
|
+
38 Kel Dor 1 188.0 80.0
|
968
|
+
```
|
969
|
+
|
970
|
+
Select rows for count > 1.
|
971
|
+
|
972
|
+
```ruby
|
973
|
+
grouped.slice(grouped[:count] > 1)
|
974
|
+
|
975
|
+
# =>
|
976
|
+
#<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000004c270>
|
977
|
+
species count mean(height) mean(mass)
|
978
|
+
<string> <int64> <double> <double>
|
979
|
+
1 Human 35 176.6 82.8
|
980
|
+
2 Droid 6 131.2 69.8
|
981
|
+
3 Wookiee 2 231.0 124.0
|
982
|
+
4 Gungan 3 208.7 74.0
|
983
|
+
5 NA 4 181.3 48.0
|
984
|
+
: : : : :
|
985
|
+
7 Twi'lek 2 179.0 55.0
|
986
|
+
8 Mirialan 2 168.0 53.1
|
987
|
+
9 Kaminoan 2 221.0 88.0
|
988
|
+
```
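
The group, aggregate and filter steps above can be sketched with Ruby's `Enumerable#group_by`. This is a rough illustration under stated assumptions, not the `RedAmber::Group` API: `group_summary` is a hypothetical helper, rows are plain Hashes built from the first five observations in the preview, and `count` here is the number of rows per group.

```ruby
# Hypothetical helper: group rows by one key, then count and average per group.
def group_summary(rows, group_key, mean_keys)
  rows.group_by { |row| row[group_key] }.map do |group, members|
    summary = { group_key => group, count: members.size }
    mean_keys.each do |key|
      values = members.map { |row| row[key] }.compact # nil values are skipped
      # Summary keys follow the `function(key)` naming style.
      summary[:"mean(#{key})"] = values.empty? ? nil : values.sum.to_f / values.size
    end
    summary
  end
end

rows = [
  { species: 'Human', height: 172, mass: 77.0 },
  { species: 'Droid', height: 167, mass: 75.0 },
  { species: 'Droid', height: 96,  mass: 32.0 },
  { species: 'Human', height: 202, mass: 136.0 },
  { species: 'Human', height: 150, mass: 49.0 }
]

grouped = group_summary(rows, :species, %i[height mass])
grouped.each { |summary| p summary }

# Keep only groups with more than two members, like `slice(grouped[:count] > 1)` above.
p grouped.select { |summary| summary[:count] > 2 }.map { |summary| summary[:species] } # => ["Human"]
```

Groups come out in order of first appearance, which matches the Human, Droid, ... ordering in the output above.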
|
989
|
+
|
990
|
+
## Combining DataFrames
|
991
|
+
|
992
|
+
- [ ] Combining rows to a dataframe
|
831
993
|
|
832
994
|
- [ ] Inner join
|
833
995
|
|
@@ -837,4 +999,6 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
|
|
837
999
|
|
838
1000
|
- [ ] One-hot encoding
|
839
1001
|
|
840
|
-
## Iteration
|
1002
|
+
## Iteration
|
1003
|
+
|
1004
|
+
- [ ] each_rows
|