red_amber 0.1.6 → 0.1.7

data/doc/DataFrame.md CHANGED
@@ -9,8 +9,6 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
9
9
 
10
10
  ![dataframe model image](doc/../image/dataframe_model.png)
11
11
 
12
- (No change in this model in v0.1.6 .)
13
-
14
12
  ## Constructors and saving
15
13
 
16
14
  ### `new` from a Hash
@@ -37,6 +35,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
37
35
 
38
36
 
39
37
  ```ruby
38
+ require 'rover'
39
+
40
40
  rover = Rover::DataFrame.new(x: [1, 2, 3])
41
41
  RedAmber::DataFrame.new(rover)
42
42
  ```
@@ -61,6 +61,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
61
61
  - from a Parquet file
62
62
 
63
63
  ```ruby
64
+ require 'parquet'
65
+
64
66
  dataframe = RedAmber::DataFrame.load("file.parquet")
65
67
  ```
66
68
 
@@ -75,6 +77,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
75
77
  - to a Parquet file
76
78
 
77
79
  ```ruby
80
+ require 'parquet'
81
+
78
82
  dataframe.save("file.parquet")
79
83
  ```
80
84
 
@@ -175,12 +179,41 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
175
179
 
176
180
  ### `to_s`
177
181
 
182
+ `to_s` returns a preview of the Table.
183
+
184
+ ```ruby
185
+ puts penguins.to_s
186
+
187
+ # =>
188
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
189
+ <string> <string> <double> <double> <uint8> ... <uint16>
190
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
191
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
192
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
193
+ 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
194
+ 5 Adelie Torgersen 36.7 19.3 193 ... 2007
195
+ : : : : : : ... :
196
+ 342 Gentoo Biscoe 50.4 15.7 222 ... 2009
197
+ 343 Gentoo Biscoe 45.2 14.8 212 ... 2009
198
+ 344 Gentoo Biscoe 49.9 16.1 213 ... 2009
199
+ ```
200
+ ### `inspect`
201
+
202
+ `inspect` uses `to_s` output and also shows shape and object_id.
203
+
204
+
178
205
  ### `summary`, `describe` (not implemented)
179
206
 
180
207
  ### `to_rover`
181
208
 
182
209
  - Returns a `Rover::DataFrame`.
183
210
 
211
+ ```ruby
212
+ require 'rover'
213
+
214
+ penguins.to_rover
215
+ ```
216
+
184
217
  ### `to_iruby`
185
218
 
186
219
  - Show the DataFrame as a Table in Jupyter Notebook or Jupyter Lab with IRuby.
@@ -196,6 +229,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
196
229
 
197
230
  penguins = Datasets::Penguins.new.to_arrow
198
231
  RedAmber::DataFrame.new(penguins).tdr
232
+
199
233
  # =>
200
234
  RedAmber::DataFrame : 344 x 8 Vectors
201
235
  Vectors : 5 numeric, 3 strings
@@ -214,22 +248,6 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
214
248
  - tally: max level to use tally mode.
215
249
  - elements: max number of elements to show as values for each observation.
216
250
 
217
- ### `inspect`
218
-
219
- - Returns the information of self as `tdr(3)`, and also shows object id.
220
-
221
- ```ruby
222
- puts penguins.inspect
223
- # =>
224
- #<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f0b4>
225
- Vectors : 5 numeric, 3 strings
226
- # key type level data_preview
227
- 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
228
- 2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
229
- 3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
230
- ... 5 more Vectors ...
231
- ```
232
-
233
251
  ## Selecting
234
252
 
235
253
  ### Select variables (columns in a table) by `[]` as `[key]`, `[keys]`, `[keys[index]]`
@@ -250,19 +268,21 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
250
268
  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
251
269
  df = RedAmber::DataFrame.new(hash)
252
270
  df[:b..:c, "a"]
271
+
253
272
  # =>
254
- #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000b02c>
255
- Vectors : 2 numeric, 1 string
256
- # key type level data_preview
257
- 1 :b string 3 ["A", "B", "C"]
258
- 2 :c double 3 [1.0, 2.0, 3.0]
259
- 3 :a uint8 3 [1, 2, 3]
273
+ #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000328fc>
274
+ b c a
275
+ <string> <double> <uint8>
276
+ 1 A 1.0 1
277
+ 2 B 2.0 2
278
+ 3 C 3.0 3
260
279
  ```
261
280
 
262
281
  If `#[]` selects a single variable (column), it returns a Vector object.
263
282
 
264
283
  ```ruby
265
284
  df[:a]
285
+
266
286
  # =>
267
287
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
268
288
  [1, 2, 3]
@@ -271,6 +291,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
271
291
 
272
292
  ```ruby
273
293
  df.v(:a)
294
+
274
295
  # =>
275
296
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
276
297
  [1, 2, 3]
@@ -294,14 +315,16 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
294
315
  ```ruby
295
316
  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
296
317
  df = RedAmber::DataFrame.new(hash)
297
- df[:b..:c, "a"].tdr(tally_level: 0)
318
+ df[2, 0..]
319
+
298
320
  # =>
299
- RedAmber::DataFrame : 4 x 3 Vectors
300
- Vectors : 2 numeric, 1 string
301
- # key type level data_preview
302
- 1 :a uint8 3 [3, 1, 2, 3]
303
- 2 :b string 3 ["C", "A", "B", "C"]
304
- 3 :c double 3 [3.0, 1.0, 2.0, 3.0]
321
+ #<RedAmber::DataFrame : 4 x 3 Vectors, 0x0000000000033270>
322
+ a b c
323
+ <uint8> <string> <double>
324
+ 1 3 C 3.0
325
+ 2 1 A 1.0
326
+ 3 2 B 2.0
327
+ 4 3 C 3.0
305
328
  ```
306
329
 
307
330
  - Select observations with a boolean Array or a boolean RedAmber::Vector of the same size as self.
@@ -313,13 +336,12 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
313
336
  df[true, false, nil] # or
314
337
  df[[true, false, nil]] # or
315
338
  df[RedAmber::Vector.new([true, false, nil])]
339
+
316
340
  # =>
317
- #<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000000f1a4>
318
- Vectors : 2 numeric, 1 string
319
- # key type level data_preview
320
- 1 :a uint8 1 [1]
321
- 2 :b string 1 ["A"]
322
- 3 :c double 1 [1.0]
341
+ #<RedAmber::DataFrame : 1 x 3 Vectors, 0x00000000000353e0>
342
+ a b c
343
+ <uint8> <string> <double>
344
+ 1 1 A 1.0
323
345
  ```
324
346
 
325
347
  ### Select rows from top or from bottom
@@ -340,12 +362,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
340
362
 
341
363
  ```ruby
342
364
  penguins.pick(:species, :bill_length_mm)
365
+
343
366
  # =>
344
- #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000f924>
345
- Vectors : 1 numeric, 1 string
346
- # key type level data_preview
347
- 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
348
- 2 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
367
+ #<RedAmber::DataFrame : 344 x 2 Vectors, 0x0000000000035ebc>
368
+ species bill_length_mm
369
+ <string> <double>
370
+ 1 Adelie 39.1
371
+ 2 Adelie 39.5
372
+ 3 Adelie 40.3
373
+ 4 Adelie (nil)
374
+ 5 Adelie 36.7
375
+ : : :
376
+ 342 Gentoo 50.4
377
+ 343 Gentoo 45.2
378
+ 344 Gentoo 49.9
349
379
  ```
350
380
 
351
381
  - Booleans as an argument
@@ -354,13 +384,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
354
384
 
355
385
  ```ruby
356
386
  penguins.pick(penguins.types.map { |type| type == :string })
387
+
357
388
  # =>
358
- #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000000f938>
359
- Vectors : 3 strings
360
- # key type level data_preview
361
- 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
362
- 2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
363
- 3 :sex string 3 {"male"=>168, "female"=>165, ""=>11}
389
+ #<RedAmber::DataFrame : 344 x 3 Vectors, 0x00000000000387ac>
390
+ species island sex
391
+ <string> <string> <string>
392
+ 1 Adelie Torgersen male
393
+ 2 Adelie Torgersen female
394
+ 3 Adelie Torgersen female
395
+ 4 Adelie Torgersen (nil)
396
+ 5 Adelie Torgersen female
397
+ : : : :
398
+ 342 Gentoo Biscoe male
399
+ 343 Gentoo Biscoe female
400
+ 344 Gentoo Biscoe male
364
401
  ```
365
402
 
366
403
  - Keys or booleans by a block
@@ -368,15 +405,21 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
368
405
  `pick {block}` is also acceptable. Arguments and a block cannot be used at the same time. The block should return keys, or a boolean Array with the same length as `n_keys`. The block is called in the context of self.
369
406
 
370
407
  ```ruby
371
- # It is ok to write `keys ...` in the block, not `penguins.keys ...`
372
408
  penguins.pick { keys.map { |key| key.end_with?('mm') } }
409
+
373
410
  # =>
374
- #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000000f1cc>
375
- Vectors : 3 numeric
376
- # key type level data_preview
377
- 1 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
378
- 2 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
379
- 3 :flipper_length_mm int64 56 [181, 186, 195, nil, 193, ... ], 2 nils
411
+ #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003dd4c>
412
+ bill_length_mm bill_depth_mm flipper_length_mm
413
+ <double> <double> <uint8>
414
+ 1 39.1 18.7 181
415
+ 2 39.5 17.4 186
416
+ 3 40.3 18.0 195
417
+ 4 (nil) (nil) (nil)
418
+ 5 36.7 19.3 193
419
+ : : : :
420
+ 342 50.4 15.7 222
421
+ 343 45.2 14.8 212
422
+ 344 49.9 16.1 213
380
423
  ```
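
The boolean-mask form of `pick` can be illustrated without RedAmber. The following is a minimal plain-Ruby sketch, assuming a Hash of columns stands in for a DataFrame; `columns`, `mask`, `picked_keys`, and `picked` are hypothetical names used only for illustration and are not part of the RedAmber API.

```ruby
# Plain-Ruby stand-in for a DataFrame: column key => values (hypothetical layout).
columns = {
  species: %w[Adelie Adelie],
  bill_length_mm: [39.1, 39.5],
  bill_depth_mm: [18.7, 17.4],
  year: [2007, 2007]
}

# One boolean per key, so the mask has the same length as the number of keys.
mask = columns.keys.map { |key| key.end_with?('mm') }

# Keep only the keys whose mask entry is true, preserving the original order.
picked_keys = columns.keys.zip(mask).select { |_key, keep| keep }.map(&:first)
picked = columns.slice(*picked_keys)

p mask         # => [false, true, true, false]
p picked.keys  # => [:bill_length_mm, :bill_depth_mm]
```

The sketch only mirrors the idea that a mask must line up one-to-one with the keys; how RedAmber implements `pick` internally is not shown in this document.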
381
424
 
382
425
  ### `drop ` - pick and drop -
@@ -414,13 +457,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
414
457
  df = RedAmber::DataFrame.new(a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3])
415
458
  df.pick(:a) # or
416
459
  df.drop(:b, :c)
460
+
417
461
  # =>
418
- #<RedAmber::DataFrame : 3 x 1 Vector, 0x000000000000f280>
419
- Vector : 1 numeric
420
- # key type level data_preview
421
- 1 :a uint8 3 [1, 2, 3]
462
+ #<RedAmber::DataFrame : 3 x 1 Vector, 0x000000000003f4bc>
463
+ a
464
+ <uint8>
465
+ 1 1
466
+ 2 2
467
+ 3 3
422
468
 
423
469
  df[:a]
470
+
424
471
  # =>
425
472
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
426
473
  [1, 2, 3]
@@ -441,14 +488,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
441
488
  ```ruby
442
489
  # returns 5 obs. at start and 5 obs. from end
443
490
  penguins.slice(0...5, -5..-1)
491
+
444
492
  # =>
445
- #<RedAmber::DataFrame : 10 x 8 Vectors, 0x000000000000f230>
446
- Vectors : 5 numeric, 3 strings
447
- # key type level data_preview
448
- 1 :species string 2 {"Adelie"=>5, "Gentoo"=>5}
449
- 2 :island string 2 {"Torgersen"=>5, "Biscoe"=>5}
450
- 3 :bill_length_mm double 9 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
451
- ... 5 more Vectors ...
493
+ #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
494
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
495
+ <string> <string> <double> <double> <uint8> ... <uint16>
496
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
497
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
498
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
499
+ 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
500
+ 5 Adelie Torgersen 36.7 19.3 193 ... 2007
501
+ : : : : : : ... :
502
+ 8 Gentoo Biscoe 50.4 15.7 222 ... 2009
503
+ 9 Gentoo Biscoe 45.2 14.8 212 ... 2009
504
+ 10 Gentoo Biscoe 49.9 16.1 213 ... 2009
452
505
  ```
453
506
 
454
507
  - Booleans as an argument
@@ -458,14 +511,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
458
511
  ```ruby
459
512
  vector = penguins[:bill_length_mm]
460
513
  penguins.slice(vector >= 40)
514
+
461
515
  # =>
462
- #<RedAmber::DataFrame : 242 x 8 Vectors, 0x000000000000f2bc>
463
- Vectors : 5 numeric, 3 strings
464
- # key type level data_preview
465
- 1 :species string 3 {"Adelie"=>51, "Chinstrap"=>68, "Gentoo"=>123}
466
- 2 :island string 3 {"Torgersen"=>18, "Biscoe"=>139, "Dream"=>85}
467
- 3 :bill_length_mm double 115 [40.3, 42.0, 41.1, 42.5, 46.0, ... ]
468
- ... 5 more Vectors ...
516
+ #<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
517
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
518
+ <string> <string> <double> <double> <uint8> ... <uint16>
519
+ 1 Adelie Torgersen 40.3 18.0 195 ... 2007
520
+ 2 Adelie Torgersen 42.0 20.2 190 ... 2007
521
+ 3 Adelie Torgersen 41.1 17.6 182 ... 2007
522
+ 4 Adelie Torgersen 42.5 20.7 197 ... 2007
523
+ 5 Adelie Torgersen 46.0 21.5 194 ... 2007
524
+ : : : : : : ... :
525
+ 240 Gentoo Biscoe 50.4 15.7 222 ... 2009
526
+ 241 Gentoo Biscoe 45.2 14.8 212 ... 2009
527
+ 242 Gentoo Biscoe 49.9 16.1 213 ... 2009
469
528
  ```
470
529
 
471
530
  - Indices or booleans by a block
@@ -482,13 +541,18 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
482
541
  end
483
542
 
484
543
  # =>
485
- #<RedAmber::DataFrame : 204 x 8 Vectors, 0x000000000000f30c>
486
- Vectors : 5 numeric, 3 strings
487
- # key type level data_preview
488
- 1 :species string 3 {"Adelie"=>82, "Chinstrap"=>33, "Gentoo"=>89}
489
- 2 :island string 3 {"Torgersen"=>31, "Biscoe"=>112, "Dream"=>61}
490
- 3 :bill_length_mm double 90 [39.1, 39.5, 40.3, 39.3, 38.9, ... ]
491
- ... 5 more Vectors ...
544
+ #<RedAmber::DataFrame : 204 x 8 Vectors, 0x0000000000047a40>
545
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
546
+ <string> <string> <double> <double> <uint8> ... <uint16>
547
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
548
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
549
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
550
+ 4 Adelie Torgersen 39.3 20.6 190 ... 2007
551
+ 5 Adelie Torgersen 38.9 17.8 181 ... 2007
552
+ : : : : : : ... :
553
+ 202 Gentoo Biscoe 47.2 13.7 214 ... 2009
554
+ 203 Gentoo Biscoe 46.8 14.3 215 ... 2009
555
+ 204 Gentoo Biscoe 45.2 14.8 212 ... 2009
492
556
  ```
493
557
 
494
558
  - Notice: nil option
@@ -498,6 +562,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
498
562
  hash = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3] }
499
563
  table = Arrow::Table.new(hash)
500
564
  table.slice([true, false, nil])
565
+
501
566
  # =>
502
567
  #<Arrow::Table:0x7fdfe44b9e18 ptr=0x555e9fe744d0>
503
568
  a b c
@@ -509,6 +574,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
509
574
 
510
575
  ```ruby
511
576
  RedAmber::DataFrame.new(table).slice([true, false, nil]).table
577
+
512
578
  # =>
513
579
  #<Arrow::Table:0x7fdfe44981c8 ptr=0x555e9febc330>
514
580
  a b c
@@ -528,14 +594,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
528
594
  ```ruby
529
595
  # returns 6th to 339th obs.
530
596
  penguins.remove(0...5, -5..-1)
597
+
531
598
  # =>
532
- #<RedAmber::DataFrame : 334 x 8 Vectors, 0x000000000000f320>
533
- Vectors : 5 numeric, 3 strings
534
- # key type level data_preview
535
- 1 :species string 3 {"Adelie"=>147, "Chinstrap"=>68, "Gentoo"=>119}
536
- 2 :island string 3 {"Torgersen"=>47, "Biscoe"=>163, "Dream"=>124}
537
- 3 :bill_length_mm double 162 [39.3, 38.9, 39.2, 34.1, 42.0, ... ]
538
- ... 5 more Vectors ...
599
+ #<RedAmber::DataFrame : 334 x 8 Vectors, 0x00000000000487c4>
600
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
601
+ <string> <string> <double> <double> <uint8> ... <uint16>
602
+ 1 Adelie Torgersen 39.3 20.6 190 ... 2007
603
+ 2 Adelie Torgersen 38.9 17.8 181 ... 2007
604
+ 3 Adelie Torgersen 39.2 19.6 195 ... 2007
605
+ 4 Adelie Torgersen 34.1 18.1 193 ... 2007
606
+ 5 Adelie Torgersen 42.0 20.2 190 ... 2007
607
+ : : : : : : ... :
608
+ 332 Gentoo Biscoe 44.5 15.7 217 ... 2009
609
+ 333 Gentoo Biscoe 48.8 16.2 222 ... 2009
610
+ 334 Gentoo Biscoe 47.2 13.7 214 ... 2009
539
611
  ```
540
612
 
541
613
  - Booleans as an argument
@@ -545,19 +617,21 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
545
617
  ```ruby
546
618
  # remove all observations that contain nil
547
619
  removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
548
- removed.tdr
620
+ removed
621
+
549
622
  # =>
550
- RedAmber::DataFrame : 333 x 8 Vectors
551
- Vectors : 5 numeric, 3 strings
552
- # key type level data_preview
553
- 1 :species string 3 {"Adelie"=>146, "Chinstrap"=>68, "Gentoo"=>119}
554
- 2 :island string 3 {"Torgersen"=>47, "Biscoe"=>163, "Dream"=>123}
555
- 3 :bill_length_mm double 163 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
556
- 4 :bill_depth_mm double 79 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
557
- 5 :flipper_length_mm uint8 54 [181, 186, 195, 193, 190, ... ]
558
- 6 :body_mass_g uint16 93 [3750, 3800, 3250, 3450, 3650, ... ]
559
- 7 :sex string 2 {"male"=>168, "female"=>165}
560
- 8 :year uint16 3 {2007=>103, 2008=>113, 2009=>117}
623
+ #<RedAmber::DataFrame : 333 x 8 Vectors, 0x0000000000049fac>
624
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
625
+ <string> <string> <double> <double> <uint8> ... <uint16>
626
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
627
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
628
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
629
+ 4 Adelie Torgersen 36.7 19.3 193 ... 2007
630
+ 5 Adelie Torgersen 39.3 20.6 190 ... 2007
631
+ : : : : : : ... :
632
+ 331 Gentoo Biscoe 50.4 15.7 222 ... 2009
633
+ 332 Gentoo Biscoe 45.2 14.8 212 ... 2009
634
+ 333 Gentoo Biscoe 49.9 16.1 213 ... 2009
561
635
  ```
562
636
 
563
637
  - Indices or booleans by a block
@@ -571,14 +645,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
571
645
  max = vector.mean + vector.std
572
646
  vector.to_a.map { |e| (min..max).include? e }
573
647
  end
648
+
574
649
  # =>
575
- #<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000000f370>
576
- Vectors : 5 numeric, 3 strings
577
- # key type level data_preview
578
- 1 :species string 3 {"Adelie"=>70, "Chinstrap"=>35, "Gentoo"=>35}
579
- 2 :island string 3 {"Torgersen"=>21, "Biscoe"=>56, "Dream"=>63}
580
- 3 :bill_length_mm double 75 [nil, 36.7, 34.1, 37.8, 37.8, ... ], 2 nils
581
- ... 5 more Vectors ...
650
+ #<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000004de40>
651
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
652
+ <string> <string> <double> <double> <uint8> ... <uint16>
653
+ 1 Adelie Torgersen (nil) (nil) (nil) ... 2007
654
+ 2 Adelie Torgersen 36.7 19.3 193 ... 2007
655
+ 3 Adelie Torgersen 34.1 18.1 193 ... 2007
656
+ 4 Adelie Torgersen 37.8 17.1 186 ... 2007
657
+ 5 Adelie Torgersen 37.8 17.3 180 ... 2007
658
+ : : : : : : ... :
659
+ 138 Gentoo Biscoe (nil) (nil) (nil) ... 2009
660
+ 139 Gentoo Biscoe 50.4 15.7 222 ... 2009
661
+ 140 Gentoo Biscoe 49.9 16.1 213 ... 2009
582
662
  ```
583
663
  - Notice for nil
584
664
  - When `remove` is used with booleans, nil in the booleans is treated as false. This behavior is aligned with Ruby's `nil#!`.
@@ -586,28 +666,34 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
586
666
  ```ruby
587
667
  df = RedAmber::DataFrame.new(a: [1, 2, nil], b: %w[A B C], c: [1.0, 2, 3])
588
668
  booleans = df[:a] < 2
669
+ booleans
670
+
589
671
  # =>
590
672
  #<RedAmber::Vector(:boolean, size=3):0x000000000000f410>
591
673
  [true, false, nil]
592
674
 
593
675
  booleans_invert = booleans.to_a.map(&:!) # => [false, true, true]
676
+
594
677
  df.slice(booleans) == df.remove(booleans_invert) # => true
595
678
  ```
679
+
596
680
  - In contrast, `Vector#invert` returns nil for nil elements, which leads to a different result.
597
681
 
598
682
  ```ruby
599
683
  booleans.invert
684
+
600
685
  # =>
601
686
  #<RedAmber::Vector(:boolean, size=3):0x000000000000f488>
602
687
  [false, true, nil]
603
688
 
604
689
  df.remove(booleans.invert)
605
- #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000f474>
606
- Vectors : 2 numeric, 1 string
607
- # key type level data_preview
608
- 1 :a uint8 2 [1, nil], 1 nil
609
- 2 :b string 2 ["A", "C"]
610
- 3 :c double 2 [1.0, 3.0]
690
+
691
+ # =>
692
+ #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000005df98>
693
+ a b c
694
+ <uint8> <string> <double>
695
+ 1 1 A 1.0
696
+ 2 (nil) C 3.0
611
697
  ```
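
The difference between the two kinds of inversion can be reproduced in plain Ruby. This is a minimal sketch, assuming rows are modeled as Arrays and that `slice` keeps rows flagged `true` while `remove` drops them; the lambdas `slice` and `remove` here are hypothetical stand-ins, not the RedAmber methods themselves.

```ruby
# Rows of the example DataFrame: a: [1, 2, nil], b: %w[A B C], c: [1.0, 2.0, 3.0]
rows = [[1, 'A', 1.0], [2, 'B', 2.0], [nil, 'C', 3.0]]
booleans = [true, false, nil]

# Hypothetical stand-ins: only an exact `true` selects a row (nil counts as false).
slice  = ->(flags) { rows.select.with_index { |_row, i| flags[i] == true } }
remove = ->(flags) { rows.reject.with_index { |_row, i| flags[i] == true } }

# Ruby-style inversion: `!nil` is true, so nil becomes true.
ruby_invert = booleans.map(&:!)
# Null-preserving inversion: nil stays nil.
null_invert = booleans.map { |b| b.nil? ? nil : !b }

p ruby_invert                                    # => [false, true, true]
p null_invert                                    # => [false, true, nil]
p slice.call(booleans) == remove.call(ruby_invert)  # => true
p remove.call(null_invert)                       # => [[1, "A", 1.0], [nil, "C", 3.0]]
```

With the null-preserving inversion the row whose flag is nil survives `remove`, which is why the result has two rows rather than one.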
612
698
 
613
699
  ### `rename`
@@ -621,15 +707,16 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
621
707
  `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
622
708
 
623
709
  ```ruby
624
- h = { 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] }
625
- df = RedAmber::DataFrame.new(h)
710
+ df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
626
711
  df.rename(:age => :age_in_1993)
712
+
627
713
  # =>
628
- #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f8fc>
629
- Vectors : 1 numeric, 1 string
630
- # key type level data_preview
631
- 1 :name string 3 ["Yasuko", "Rui", "Hinata"]
632
- 2 :age_in_1993 uint8 3 [68, 49, 28]
714
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000060838>
715
+ name age_in_1993
716
+ <string> <uint8>
717
+ 1 Yasuko 68
718
+ 2 Rui 49
719
+ 3 Hinata 28
633
720
  ```
634
721
 
635
722
  - Key pairs by a block
@@ -655,25 +742,29 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
655
742
 
656
743
  ```ruby
657
744
  df = RedAmber::DataFrame.new(
658
- 'name' => %w[Yasuko Rui Hinata],
659
- 'age' => [68, 49, 28])
745
+ name: %w[Yasuko Rui Hinata],
746
+ age: [68, 49, 28])
747
+ df
748
+
660
749
  # =>
661
- #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f8fc>
662
- Vectors : 1 numeric, 1 string
663
- # key type level data_preview
664
- 1 :name string 3 ["Yasuko", "Rui", "Hinata"]
665
- 2 :age uint8 3 [68, 49, 28]
750
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000062804>
751
+ name age
752
+ <string> <uint8>
753
+ 1 Yasuko 68
754
+ 2 Rui 49
755
+ 3 Hinata 28
666
756
 
667
757
  # update :age and add :brother
668
758
  assigner = { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }
669
759
  df.assign(assigner)
760
+
670
761
  # =>
671
- #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000f960>
672
- Vectors : 1 numeric, 2 strings
673
- # key type level data_preview
674
- 1 :name string 3 ["Yasuko", "Rui", "Hinata"]
675
- 2 :age uint8 3 [97, 78, 57]
676
- 3 :brother string 3 ["Santa", nil, "Momotaro"], 1 nil
762
+ #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
763
+ name age brother
764
+ <string> <uint8> <string>
765
+ 1 Yasuko 97 Santa
766
+ 2 Rui 78 (nil)
767
+ 3 Hinata 57 Momotaro
677
768
  ```
678
769
 
679
770
  - Key pairs by a block
@@ -685,13 +776,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
685
776
  index: [0, 1, 2, 3, nil],
686
777
  float: [0.0, 1.1, 2.2, Float::NAN, nil],
687
778
  string: ['A', 'B', 'C', 'D', nil])
779
+ df
780
+
688
781
  # =>
689
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f8c0>
690
- Vectors : 2 numeric, 1 string
691
- # key type level data_preview
692
- 1 :index uint8 5 [0, 1, 2, 3, nil], 1 nil
693
- 2 :float double 5 [0.0, 1.1, 2.2, NaN, nil], 1 NaN, 1 nil
694
- 3 :string string 5 ["A", "B", "C", "D", nil], 1 nil
782
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x0000000000069e60>
783
+ index float string
784
+ <uint8> <double> <string>
785
+ 1 0 0.0 A
786
+ 2 1 1.1 B
787
+ 3 2 2.2 C
788
+ 4 3 NaN D
789
+ 5 (nil) (nil) (nil)
695
790
 
696
791
  # update numeric variables
697
792
  df.assign do
@@ -701,13 +796,16 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
701
796
  end
702
797
  assigner
703
798
  end
799
+
704
800
  # =>
705
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f924>
706
- Vectors : 2 numeric, 1 string
707
- # key type level data_preview
708
- 1 :index int8 5 [0, -1, -2, -3, nil], 1 nil
709
- 2 :float double 5 [-0.0, -1.1, -2.2, NaN, nil], 1 NaN, 1 nil
710
- 3 :string string 5 ["A", "B", "C", "D", nil], 1 nil
801
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000006e000>
802
+ index float string
803
+ <int8> <double> <string>
804
+ 1 0 -0.0 A
805
+ 2 -1 -1.1 B
806
+ 3 -2 -2.2 C
807
+ 4 -3 NaN D
808
+ 5 (nil) (nil) (nil)
711
809
 
712
810
  # Or, more concisely:
713
811
  df.assign do
@@ -715,6 +813,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
715
813
  assigner[key] = vector * -1 if vector.numeric?
716
814
  end
717
815
  end
816
+
718
817
  # => same as above
719
818
  ```
720
819
 
@@ -736,14 +835,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
736
835
  string: ['C', 'B', nil, 'A', 'B'],
737
836
  bool: [nil, true, false, true, false],
738
837
  })
739
- df.sort(:index, '-bool').tdr(tally: 0)
838
+ df.sort(:index, '-bool')
839
+
740
840
  # =>
741
- RedAmber::DataFrame : 5 x 3 Vectors
742
- Vectors : 1 numeric, 1 string, 1 boolean
743
- # key type level data_preview
744
- 1 :index uint8 3 [0, 0, 1, 1, nil], 1 nil
745
- 2 :string string 4 [nil, "B", "B", "C", "A"], 1 nil
746
- 3 :bool boolean 3 [false, false, true, nil, true], 1 nil
841
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000009b03c>
842
+ index string bool
843
+ <uint8> <string> <boolean>
844
+ 1 0 (nil) false
845
+ 2 0 B false
846
+ 3 1 B true
847
+ 4 1 C (nil)
848
+ 5 (nil) A true
747
849
  ```
748
850
 
749
851
  - [ ] Clamp
@@ -758,66 +860,16 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
758
860
 
759
861
  ## Grouping
760
862
 
761
- ### `group(aggregating_keys, function, target_keys)`
762
-
763
- (This is a temporary API and may change in the future version.)
863
+ ### `group(aggregating_keys)`
764
864
 
765
- Create grouped dataframe by `aggregation_keys` and apply `function` to each group and returns in `target_keys`. Aggregated key name is `function(key)` style.
865
+ (
866
+ This API will change in a future version. In particular, I want to change:
867
+ - The order of columns in the result (aggregation_keys should come first)
868
+ - DataFrame#group will accept a block (heronshoes/red_amber #28)
869
+ )
766
870
 
767
- (The current implementation is not intuitive. Needs improvement.)
768
-
769
- ```ruby
770
- ds = Datasets::Rdatasets.new('dplyr', 'starwars')
771
- starwars = RedAmber::DataFrame.new(ds.to_table.to_h)
772
- starwars.tdr(11)
773
- # =>
774
- RedAmber::DataFrame : 87 x 11 Vectors
775
- Vectors : 3 numeric, 8 strings
776
- # key type level data_preview
777
- 1 :name string 87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
778
- 2 :height uint16 46 [172, 167, 96, 202, 150, ... ], 6 nils
779
- 3 :mass double 39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
780
- 4 :hair_color string 13 ["blond", nil, nil, "none", "brown", ... ], 5 nils
781
- 5 :skin_color string 31 ["fair", "gold", "white, blue", "white", "light", .. . ]
782
- 6 :eye_color string 15 ["blue", "yellow", "red", "yellow", "brown", ... ]
783
- 7 :birth_year double 37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
784
- 8 :sex string 5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, nil=>4}
785
- 9 :gender string 3 {"masculine"=>66, "feminine"=>17, nil=>4}
786
- 10 :homeworld string 49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ], 10 nils
787
- 11 :species string 38 ["Human", "Droid", "Droid", "Human", "Human", ... ], 4 nils
788
-
789
- grouped = starwars.group(:species, :mean, [:mass, :height])
790
- # =>
791
- #<RedAmber::DataFrame : 38 x 3 Vectors, 0x000000000000fbf4>
792
- Vectors : 2 numeric, 1 string
793
- # key type level data_preview
794
- 1 :"mean(mass)" double 27 [82.78181818181818, 69.75, 124.0, 74.0, 1358.0, ... ], 6 nils
795
- 2 :"mean(height)" double 32 [176.6451612903226, 131.2, 231.0, 173.0, 175.0, ... ]
796
- 3 :species string 38 ["Human", "Droid", "Wookiee", "Rodian", "Hutt", ... ], 1 nil
797
-
798
- count = starwars.group(:species, :count, :species)[:"count(species)"]
799
- df = grouped.slice(count > 1)
800
- # =>
801
- #<RedAmber::DataFrame : 8 x 3 Vectors, 0x000000000000fc44>
802
- Vectors : 2 numeric, 1 string
803
- # key type level data_preview
804
- 1 :"mean(mass)" double 8 [82.78181818181818, 69.75, 124.0, 74.0, 80.0, ... ]
805
- 2 :"mean(height)" double 8 [176.6451612903226, 131.2, 231.0, 208.66666666666666, 173.0, ... ]
806
- 3 :species string 8 ["Human", "Droid", "Wookiee", "Gungan", "Zabrak", ... ]
807
-
808
- df.table
809
- # =>
810
- #<Arrow::Table:0x1165593c8 ptr=0x7fb3db144c70>
811
- mean(mass) mean(height) species
812
- 0 82.781818 176.645161 Human
813
- 1 69.750000 131.200000 Droid
814
- 2 124.000000 231.000000 Wookiee
815
- 3 74.000000 208.666667 Gungan
816
- 4 80.000000 173.000000 Zabrak
817
- 5 55.000000 179.000000 Twi'lek
818
- 6 53.100000 168.000000 Mirialan
819
- 7 88.000000 221.000000 Kaminoan
820
- ```
871
+ `group` creates an instance of class `Group`. `Group` accepts the functions below as methods.
872
+ Each method accepts `summary_keys` as arguments.
821
873
 
822
874
  Available functions are:
823
875
 
@@ -837,9 +889,115 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
837
889
  - [ ] tdigest
838
890
  - ✓ variance
839
891
 
892
+ For each group defined by `aggregation_keys`, the aggregation `function` is applied, and a new dataframe is returned with aggregated keys according to `summary_keys`.
893
+ Aggregated keys are named in the `function(summary_key)` style.
894
+
895
+ This is an example of grouping with the well-known `starwars` dataset.
896
+
897
+ ```ruby
898
+ starwars =
899
+ RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
900
+ starwars
901
+
902
+ # =>
903
+ #<RedAmber::DataFrame : 87 x 12 Vectors, 0x00000000000773bc>
904
+ species name height mass hair_color skin_color eye_color ... homeworld
905
+ <string> <string> <int64> <double> <string> <string> <string> ... <string>
906
+ Human 1 Luke Skywalker 172 77.0 blond fair blue ... Tatooine
907
+ Droid 2 C-3PO 167 75.0 NA gold yellow ... Tatooine
908
+ Droid 3 R2-D2 96 32.0 NA white, blue red ... Naboo
909
+ Human 4 Darth Vader 202 136.0 none white yellow ... Tatooine
910
+ Human 5 Leia Organa 150 49.0 brown light brown ... Alderaan
911
+ : : : : : : : : ... :
912
+ Droid 85 BB8 (nil) (nil) none none black ... NA
913
+ NA 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
914
+ Human 87 Padmé Amidala 165 45.0 brown light brown ... Naboo
915
+
916
+ starwars.tdr(12)
917
+
918
+ # =>
919
+ RedAmber::DataFrame : 87 x 12 Vectors
920
+ Vectors : 4 numeric, 8 strings
921
+ # key type level data_preview
922
+ 1 :"" int64 87 [1, 2, 3, 4, 5, ... ]
923
+ 2 :name string 87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
924
+ 3 :height int64 46 [172, 167, 96, 202, 150, ... ], 6 nils
925
+ 4 :mass double 39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
926
+ 5 :hair_color string 13 ["blond", "NA", "NA", "none", "brown", ... ]
927
+ 6 :skin_color string 31 ["fair", "gold", "white, blue", "white", "light", ... ]
928
+ 7 :eye_color string 15 ["blue", "yellow", "red", "yellow", "brown", ... ]
929
+ 8 :birth_year double 37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
930
+ 9 :sex string 5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, "NA"=>4}
931
+ 10 :gender string 3 {"masculine"=>66, "feminine"=>17, "NA"=>4}
932
+ 11 :homeworld string 49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ]
933
+ 12 :species string 38 ["Human", "Droid", "Droid", "Human", "Human", ... ]
934
+ ```
935
+
936
+ We can aggregate for `:species` and calculate the mean of `:mass` and `:height`.
937
+
938
+ ```ruby
939
+ grouped = starwars.group(:species).mean(:mass, :height)
940
+ grouped
941
+
942
+ # =>
943
+ #<RedAmber::DataFrame : 38 x 3 Vectors, 0x000000000008e620>
944
+ mean(mass) mean(height) species
945
+ <double> <double> <string>
946
+ 1 82.8 176.6 Human
947
+ 2 69.8 131.2 Droid
948
+ 3 124.0 231.0 Wookiee
949
+ 4 74.0 173.0 Rodian
950
+ 5 1358.0 175.0 Hutt
951
+ : : : :
952
+ 36 159.0 216.0 Kaleesh
953
+ 37 80.0 206.0 Pau'an
954
+ 38 80.0 188.0 Kel Dor
955
+ ```
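
The `function(summary_key)` naming and the column order of the result can be mimicked in plain Ruby. This is a rough sketch under stated assumptions: rows are modeled as Hashes, nil values are skipped when averaging, and `group_mean` is a hypothetical helper written for illustration, not a RedAmber method.

```ruby
# First four rows of the starwars example, reduced to three columns.
rows = [
  { species: 'Human', mass: 77.0,  height: 172 },
  { species: 'Droid', mass: 75.0,  height: 167 },
  { species: 'Droid', mass: 32.0,  height: 96 },
  { species: 'Human', mass: 136.0, height: 202 }
]

# Hypothetical helper: group rows by `group_key` and average each summary key.
def group_mean(rows, group_key, *summary_keys)
  rows.group_by { |row| row[group_key] }.map do |group_value, members|
    result = {}
    summary_keys.each do |key|
      values = members.map { |member| member[key] }.compact # assumption: nils are skipped
      result[:"mean(#{key})"] = values.empty? ? nil : values.sum.to_f / values.size
    end
    result[group_key] = group_value # the grouping key comes last, as in the output above
    result
  end
end

grouped = group_mean(rows, :species, :mass, :height)
p grouped.first
# => {:"mean(mass)"=>106.5, :"mean(height)"=>187.0, :species=>"Human"}
```

Only four rows are used here, so the means differ from the full-dataset values shown in the example output.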
956
+
957
+ Select rows for count > 1.
958
+
959
+ ```ruby
960
+ count = starwars.group(:species).count(:species)[:'count(species)'] # => Vector
961
+ grouped = grouped.slice(count > 1)
962
+
963
+ # =>
964
+ #<RedAmber::DataFrame : 9 x 3 Vectors, 0x0000000000098260>
965
+ mean(mass) mean(height) species
966
+ <double> <double> <string>
967
+ 1 82.8 176.6 Human
968
+ 2 69.8 131.2 Droid
969
+ 3 124.0 231.0 Wookiee
970
+ 4 74.0 208.7 Gungan
971
+ 5 48.0 181.3 NA
972
+ : : : :
973
+ 7 55.0 179.0 Twi'lek
974
+ 8 53.1 168.0 Mirialan
975
+ 9 88.0 221.0 Kaminoan
976
+ ```
977
+
978
+ Assemble the result and change the order of columns.
979
+
980
+ ```ruby
981
+ grouped.assign(count: count[count > 1]).pick { [2,3,0,1].map{ |i| keys[i] } }
982
+
983
+ # =>
984
+ #<RedAmber::DataFrame : 9 x 4 Vectors, 0x0000000000141838>
985
+ species count mean(mass) mean(height)
986
+ <string> <uint8> <double> <double>
987
+ 1 Human 35 82.8 176.6
988
+ 2 Droid 6 69.8 131.2
989
+ 3 Wookiee 2 124.0 231.0
990
+ 4 Gungan 3 74.0 208.7
991
+ 5 NA 4 48.0 181.3
992
+ : : : : :
993
+ 7 Twi'lek 2 55.0 179.0
994
+ 8 Mirialan 2 53.1 168.0
995
+ 9 Kaminoan 2 88.0 221.0
996
+ ```
997
+
840
998
  ## Combining DataFrames
841
999
 
842
- - [ ] obs
1000
+ - [ ] Combining rows to a dataframe
843
1001
 
844
1002
  - [ ] Add vars
845
1003
 
@@ -852,3 +1010,5 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
852
1010
  - [ ] One-hot encoding
853
1011
 
854
1012
  ## Iteration (not implemented)
1013
+
1014
+ - [ ] each_rows