red_amber 0.1.5 → 0.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +33 -5
  3. data/.rubocop_todo.yml +2 -15
  4. data/.yardopts +1 -0
  5. data/CHANGELOG.md +164 -18
  6. data/Gemfile +6 -1
  7. data/README.md +247 -33
  8. data/Rakefile +1 -0
  9. data/benchmark/csv_load_penguins.yml +1 -1
  10. data/doc/DataFrame.md +383 -219
  11. data/doc/Vector.md +247 -37
  12. data/doc/examples_of_red_amber.ipynb +5454 -0
  13. data/doc/image/dataframe/assign.png +0 -0
  14. data/doc/image/dataframe/drop.png +0 -0
  15. data/doc/image/dataframe/pick.png +0 -0
  16. data/doc/image/dataframe/remove.png +0 -0
  17. data/doc/image/dataframe/rename.png +0 -0
  18. data/doc/image/dataframe/slice.png +0 -0
  19. data/doc/image/dataframe_model.png +0 -0
  20. data/doc/image/vector/binary_element_wise.png +0 -0
  21. data/doc/image/vector/unary_aggregation.png +0 -0
  22. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  23. data/doc/image/vector/unary_element_wise.png +0 -0
  24. data/lib/red-amber.rb +3 -0
  25. data/lib/red_amber/data_frame.rb +62 -10
  26. data/lib/red_amber/data_frame_displayable.rb +86 -9
  27. data/lib/red_amber/data_frame_selectable.rb +151 -32
  28. data/lib/red_amber/data_frame_variable_operation.rb +4 -0
  29. data/lib/red_amber/group.rb +59 -0
  30. data/lib/red_amber/helper.rb +61 -0
  31. data/lib/red_amber/vector.rb +59 -15
  32. data/lib/red_amber/vector_functions.rb +47 -38
  33. data/lib/red_amber/vector_selectable.rb +126 -0
  34. data/lib/red_amber/vector_updatable.rb +125 -0
  35. data/lib/red_amber/version.rb +1 -1
  36. data/lib/red_amber.rb +6 -3
  37. data/red_amber.gemspec +0 -2
  38. metadata +9 -33
  39. data/lib/red_amber/data_frame_helper.rb +0 -64
  40. data/lib/red_amber/data_frame_observation_operation.rb +0 -83
  41. data/lib/red_amber/vector_compensable.rb +0 -68
data/doc/DataFrame.md CHANGED
@@ -4,7 +4,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
4
4
  - A collection of data which have same data type within. We call it `Vector`.
5
5
  - A label is attached to `Vector`. We call it `key`.
6
6
  - A `Vector` and associated `key` is grouped as a `variable`.
7
- - `variable`s with same vector length are aligned and arranged to be a `DaTaFrame`.
7
+ - `variable`s with same vector length are aligned and arranged to be a `DataFrame`.
8
8
  - Each `Vector` in a `DataFrame` contains a set of relating data at same position. We call it `observation`.
9
9
 
10
10
  ![dataframe model image](doc/../image/dataframe_model.png)
@@ -35,6 +35,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
35
35
 
36
36
 
37
37
  ```ruby
38
+ require 'rover'
39
+
38
40
  rover = Rover::DataFrame.new(x: [1, 2, 3])
39
41
  RedAmber::DataFrame.new(rover)
40
42
  ```
@@ -52,13 +54,15 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
52
54
  - from a URI
53
55
 
54
56
  ```ruby
55
- uri = URI("uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
57
+ uri = URI("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv")
56
58
  RedAmber::DataFrame.load(uri)
57
59
  ```
58
60
 
59
61
  - from a Parquet file
60
62
 
61
63
  ```ruby
64
+ require 'parquet'
65
+
62
66
  dataframe = RedAmber::DataFrame.load("file.parquet")
63
67
  ```
64
68
 
@@ -73,6 +77,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
73
77
  - to a Parquet file
74
78
 
75
79
  ```ruby
80
+ require 'parquet'
81
+
76
82
  dataframe.save("file.parquet")
77
83
  ```
78
84
 
@@ -147,9 +153,9 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
147
153
 
148
154
  - Returns an Array of Vectors.
149
155
 
150
- ### `indexes`, `indices`
156
+ ### `indices`, `indexes`
151
157
 
152
- - Returns all indexes in a Range.
158
+ - Returns all indexes in an Array.
153
159
 
154
160
  ### `to_h`
155
161
 
@@ -173,12 +179,45 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
173
179
 
174
180
  ### `to_s`
175
181
 
182
+ `to_s` returns a preview of the Table.
183
+
184
+ ```ruby
185
+ puts penguins.to_s
186
+
187
+ # =>
188
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
189
+ <string> <string> <double> <double> <uint8> ... <uint16>
190
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
191
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
192
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
193
+ 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
194
+ 5 Adelie Torgersen 36.7 19.3 193 ... 2007
195
+ : : : : : : ... :
196
+ 342 Gentoo Biscoe 50.4 15.7 222 ... 2009
197
+ 343 Gentoo Biscoe 45.2 14.8 212 ... 2009
198
+ 344 Gentoo Biscoe 49.9 16.1 213 ... 2009
199
+ ```
200
+ ### `inspect`
201
+
202
+ `inspect` uses `to_s` output and also shows shape and object_id.
203
+
204
+
176
205
  ### `summary`, `describe` (not implemented)
177
206
 
178
207
  ### `to_rover`
179
208
 
180
209
  - Returns a `Rover::DataFrame`.
181
210
 
211
+ ```ruby
212
+ require 'rover'
213
+
214
+ penguins.to_rover
215
+ ```
216
+
217
+ ### `to_iruby`
218
+
219
+ - Show the DataFrame as a Table in Jupyter Notebook or Jupyter Lab with IRuby.
220
+
182
221
  ### `tdr(limit = 10, tally: 5, elements: 5)`
183
222
 
184
223
  - Shows some information about self in a transposed style.
@@ -190,6 +229,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
190
229
 
191
230
  penguins = Datasets::Penguins.new.to_arrow
192
231
  RedAmber::DataFrame.new(penguins).tdr
232
+
193
233
  # =>
194
234
  RedAmber::DataFrame : 344 x 8 Vectors
195
235
  Vectors : 5 numeric, 3 strings
@@ -208,22 +248,6 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
208
248
  - tally: max level to use tally mode.
209
249
  - elements: max num of element to show values in each observations.
210
250
 
211
- ### `inspect`
212
-
213
- - Returns the information of self as `tdr(3)`, and also shows object id.
214
-
215
- ```ruby
216
- puts penguins.inspect
217
- # =>
218
- #<RedAmber::DataFrame : 344 x 8 Vectors, 0x000000000000f0b4>
219
- Vectors : 5 numeric, 3 strings
220
- # key type level data_preview
221
- 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
222
- 2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
223
- 3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
224
- ... 5 more Vectors ...
225
- ```
226
-
227
251
  ## Selecting
228
252
 
229
253
  ### Select variables (columns in a table) by `[]` as `[key]`, `[keys]`, `[keys[index]]`
@@ -244,19 +268,21 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
244
268
  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
245
269
  df = RedAmber::DataFrame.new(hash)
246
270
  df[:b..:c, "a"]
271
+
247
272
  # =>
248
- #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000b02c>
249
- Vectors : 2 numeric, 1 string
250
- # key type level data_preview
251
- 1 :b string 3 ["A", "B", "C"]
252
- 2 :c double 3 [1.0, 2.0, 3.0]
253
- 3 :a uint8 3 [1, 2, 3]
273
+ #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000328fc>
274
+ b c a
275
+ <string> <double> <uint8>
276
+ 1 A 1.0 1
277
+ 2 B 2.0 2
278
+ 3 C 3.0 3
254
279
  ```
255
280
 
256
281
  If `#[]` represents single variable (column), it returns a Vector object.
257
282
 
258
283
  ```ruby
259
284
  df[:a]
285
+
260
286
  # =>
261
287
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
262
288
  [1, 2, 3]
@@ -265,6 +291,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
265
291
 
266
292
  ```ruby
267
293
  df.v(:a)
294
+
268
295
  # =>
269
296
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
270
297
  [1, 2, 3]
@@ -280,19 +307,24 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
280
307
  An end-less or a begin-less Range can be used to represent indeces.
281
308
 
282
309
  - Select obs. by indeces in an Array: `df[1, 2]`
310
+
311
+ - You can use float indices.
312
+
283
313
  - Mixed case: `df[2, 0..]`
284
314
 
285
315
  ```ruby
286
316
  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
287
317
  df = RedAmber::DataFrame.new(hash)
288
- df[:b..:c, "a"].tdr(tally_level: 0)
318
+ df[2, 0..]
319
+
289
320
  # =>
290
- RedAmber::DataFrame : 4 x 3 Vectors
291
- Vectors : 2 numeric, 1 string
292
- # key type level data_preview
293
- 1 :a uint8 3 [3, 1, 2, 3]
294
- 2 :b string 3 ["C", "A", "B", "C"]
295
- 3 :c double 3 [3.0, 1.0, 2.0, 3.0]
321
+ #<RedAmber::DataFrame : 4 x 3 Vectors, 0x0000000000033270>
322
+ a b c
323
+ <uint8> <string> <double>
324
+ 1 3 C 3.0
325
+ 2 1 A 1.0
326
+ 3 2 B 2.0
327
+ 4 3 C 3.0
296
328
  ```
297
329
 
298
330
  - Select obs. by a boolean Array or a boolean RedAmber::Vector at same size as self.
@@ -304,13 +336,12 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
304
336
  df[true, false, nil] # or
305
337
  df[[true, false, nil]] # or
306
338
  df[RedAmber::Vector.new([true, false, nil])]
339
+
307
340
  # =>
308
- #<RedAmber::DataFrame : 1 x 3 Vectors, 0x000000000000f1a4>
309
- Vectors : 2 numeric, 1 string
310
- # key type level data_preview
311
- 1 :a uint8 1 [1]
312
- 2 :b string 1 ["A"]
313
- 3 :c double 1 [1.0]
341
+ #<RedAmber::DataFrame : 1 x 3 Vectors, 0x00000000000353e0>
342
+ a b c
343
+ <uint8> <string> <double>
344
+ 1 1 A 1.0
314
345
  ```
315
346
 
316
347
  ### Select rows from top or from bottom
@@ -331,12 +362,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
331
362
 
332
363
  ```ruby
333
364
  penguins.pick(:species, :bill_length_mm)
365
+
334
366
  # =>
335
- #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000f924>
336
- Vectors : 1 numeric, 1 string
337
- # key type level data_preview
338
- 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
339
- 2 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
367
+ #<RedAmber::DataFrame : 344 x 2 Vectors, 0x0000000000035ebc>
368
+ species bill_length_mm
369
+ <string> <double>
370
+ 1 Adelie 39.1
371
+ 2 Adelie 39.5
372
+ 3 Adelie 40.3
373
+ 4 Adelie (nil)
374
+ 5 Adelie 36.7
375
+ : : :
376
+ 342 Gentoo 50.4
377
+ 343 Gentoo 45.2
378
+ 344 Gentoo 49.9
340
379
  ```
341
380
 
342
381
  - Booleans as a argument
@@ -345,13 +384,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
345
384
 
346
385
  ```ruby
347
386
  penguins.pick(penguins.types.map { |type| type == :string })
387
+
348
388
  # =>
349
- #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000000f938>
350
- Vectors : 3 strings
351
- # key type level data_preview
352
- 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
353
- 2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
354
- 3 :sex string 3 {"male"=>168, "female"=>165, ""=>11}
389
+ #<RedAmber::DataFrame : 344 x 3 Vectors, 0x00000000000387ac>
390
+ species island sex
391
+ <string> <string> <string>
392
+ 1 Adelie Torgersen male
393
+ 2 Adelie Torgersen female
394
+ 3 Adelie Torgersen female
395
+ 4 Adelie Torgersen (nil)
396
+ 5 Adelie Torgersen female
397
+ : : : :
398
+ 342 Gentoo Biscoe male
399
+ 343 Gentoo Biscoe female
400
+ 344 Gentoo Biscoe male
355
401
  ```
356
402
 
357
403
  - Keys or booleans by a block
@@ -359,15 +405,21 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
359
405
  `pick {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return keys, or a boolean Array with a same length as `n_keys`. Block is called in the context of self.
360
406
 
361
407
  ```ruby
362
- # It is ok to write `keys ...` in the block, not `penguins.keys ...`
363
408
  penguins.pick { keys.map { |key| key.end_with?('mm') } }
409
+
364
410
  # =>
365
- #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000000f1cc>
366
- Vectors : 3 numeric
367
- # key type level data_preview
368
- 1 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
369
- 2 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
370
- 3 :flipper_length_mm int64 56 [181, 186, 195, nil, 193, ... ], 2 nils
411
+ #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003dd4c>
412
+ bill_length_mm bill_depth_mm flipper_length_mm
413
+ <double> <double> <uint8>
414
+ 1 39.1 18.7 181
415
+ 2 39.5 17.4 186
416
+ 3 40.3 18.0 195
417
+ 4 (nil) (nil) (nil)
418
+ 5 36.7 19.3 193
419
+ : : : :
420
+ 342 50.4 15.7 222
421
+ 343 45.2 14.8 212
422
+ 344 49.9 16.1 213
371
423
  ```
372
424
 
373
425
  ### `drop ` - pick and drop -
@@ -405,13 +457,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
405
457
  df = RedAmber::DataFrame.new(a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3])
406
458
  df.pick(:a) # or
407
459
  df.drop(:b, :c)
460
+
408
461
  # =>
409
- #<RedAmber::DataFrame : 3 x 1 Vector, 0x000000000000f280>
410
- Vector : 1 numeric
411
- # key type level data_preview
412
- 1 :a uint8 3 [1, 2, 3]
462
+ #<RedAmber::DataFrame : 3 x 1 Vector, 0x000000000003f4bc>
463
+ a
464
+ <uint8>
465
+ 1 1
466
+ 2 2
467
+ 3 3
413
468
 
414
469
  df[:a]
470
+
415
471
  # =>
416
472
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f258>
417
473
  [1, 2, 3]
@@ -423,21 +479,29 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
423
479
 
424
480
  ![slice method image](doc/../image/dataframe/slice.png)
425
481
 
426
- - Keys as arguments
482
+ - Indices as arguments
483
+
484
+ `slice(indeces)` accepts indices as arguments. Indices should be Integers, Floats or Ranges of Integers.
427
485
 
428
- `slice(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
486
+ Negative index from the tail like Ruby's Array is also acceptable.
429
487
 
430
488
  ```ruby
431
489
  # returns 5 obs. at start and 5 obs. from end
432
490
  penguins.slice(0...5, -5..-1)
491
+
433
492
  # =>
434
- #<RedAmber::DataFrame : 10 x 8 Vectors, 0x000000000000f230>
435
- Vectors : 5 numeric, 3 strings
436
- # key type level data_preview
437
- 1 :species string 2 {"Adelie"=>5, "Gentoo"=>5}
438
- 2 :island string 2 {"Torgersen"=>5, "Biscoe"=>5}
439
- 3 :bill_length_mm double 9 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
440
- ... 5 more Vectors ...
493
+ #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
494
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
495
+ <string> <string> <double> <double> <uint8> ... <uint16>
496
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
497
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
498
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
499
+ 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
500
+ 5 Adelie Torgersen 36.7 19.3 193 ... 2007
501
+ : : : : : : ... :
502
+ 8 Gentoo Biscoe 50.4 15.7 222 ... 2009
503
+ 9 Gentoo Biscoe 45.2 14.8 212 ... 2009
504
+ 10 Gentoo Biscoe 49.9 16.1 213 ... 2009
441
505
  ```
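Note on the hunk above: the new text says `slice` accepts Integers, Ranges of Integers, and negative indices counted from the tail. The following is a minimal plain-Ruby sketch of how such arguments could resolve to row positions. The helper `normalize_indices` is hypothetical and is not part of red_amber; it only mirrors Ruby's own Array indexing rules, and it deliberately leaves out Float indices because the diff does not say how they are resolved.

```ruby
# Hypothetical helper (not red_amber's implementation): resolve Integers,
# Ranges, and negative indices into zero-based row positions.
def normalize_indices(size, *args)
  positions = (0...size).to_a
  args.flat_map do |arg|
    case arg
    when Range
      # Array#[] with a Range already understands negative and endless bounds.
      positions[arg]
    when Integer
      # A negative Integer counts from the tail, as in Ruby's Array.
      arg.negative? ? arg + size : arg
    else
      raise ArgumentError, "unsupported index: #{arg.inspect}"
    end
  end
end

# 344 is the penguins row count shown in the diff.
picked = normalize_indices(344, 0...5, -5..-1)
puts picked.size
```

Under these assumptions, `0...5, -5..-1` resolves to ten positions, which matches the `10 x 8 Vectors` shape in the documented output.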
442
506
 
443
507
  - Booleans as an argument
@@ -447,17 +511,23 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
447
511
  ```ruby
448
512
  vector = penguins[:bill_length_mm]
449
513
  penguins.slice(vector >= 40)
514
+
450
515
  # =>
451
- #<RedAmber::DataFrame : 242 x 8 Vectors, 0x000000000000f2bc>
452
- Vectors : 5 numeric, 3 strings
453
- # key type level data_preview
454
- 1 :species string 3 {"Adelie"=>51, "Chinstrap"=>68, "Gentoo"=>123}
455
- 2 :island string 3 {"Torgersen"=>18, "Biscoe"=>139, "Dream"=>85}
456
- 3 :bill_length_mm double 115 [40.3, 42.0, 41.1, 42.5, 46.0, ... ]
457
- ... 5 more Vectors ...
516
+ #<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
517
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
518
+ <string> <string> <double> <double> <uint8> ... <uint16>
519
+ 1 Adelie Torgersen 40.3 18.0 195 ... 2007
520
+ 2 Adelie Torgersen 42.0 20.2 190 ... 2007
521
+ 3 Adelie Torgersen 41.1 17.6 182 ... 2007
522
+ 4 Adelie Torgersen 42.5 20.7 197 ... 2007
523
+ 5 Adelie Torgersen 46.0 21.5 194 ... 2007
524
+ : : : : : : ... :
525
+ 240 Gentoo Biscoe 50.4 15.7 222 ... 2009
526
+ 241 Gentoo Biscoe 45.2 14.8 212 ... 2009
527
+ 242 Gentoo Biscoe 49.9 16.1 213 ... 2009
458
528
  ```
459
529
 
460
- - Keys or booleans by a block
530
+ - Indices or booleans by a block
461
531
 
462
532
  `slice {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return indeces or a boolean Array with a same length as `size`. Block is called in the context of self.
463
533
 
@@ -469,14 +539,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
469
539
  max = vector.mean + vector.std
470
540
  vector.to_a.map { |e| (min..max).include? e }
471
541
  end
542
+
472
543
  # =>
473
- #<RedAmber::DataFrame : 204 x 8 Vectors, 0x000000000000f30c>
474
- Vectors : 5 numeric, 3 strings
475
- # key type level data_preview
476
- 1 :species string 3 {"Adelie"=>82, "Chinstrap"=>33, "Gentoo"=>89}
477
- 2 :island string 3 {"Torgersen"=>31, "Biscoe"=>112, "Dream"=>61}
478
- 3 :bill_length_mm double 90 [39.1, 39.5, 40.3, 39.3, 38.9, ... ]
479
- ... 5 more Vectors ...
544
+ #<RedAmber::DataFrame : 204 x 8 Vectors, 0x0000000000047a40>
545
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
546
+ <string> <string> <double> <double> <uint8> ... <uint16>
547
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
548
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
549
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
550
+ 4 Adelie Torgersen 39.3 20.6 190 ... 2007
551
+ 5 Adelie Torgersen 38.9 17.8 181 ... 2007
552
+ : : : : : : ... :
553
+ 202 Gentoo Biscoe 47.2 13.7 214 ... 2009
554
+ 203 Gentoo Biscoe 46.8 14.3 215 ... 2009
555
+ 204 Gentoo Biscoe 45.2 14.8 212 ... 2009
480
556
  ```
481
557
 
482
558
  - Notice: nil option
@@ -486,6 +562,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
486
562
  hash = { a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3] }
487
563
  table = Arrow::Table.new(hash)
488
564
  table.slice([true, false, nil])
565
+
489
566
  # =>
490
567
  #<Arrow::Table:0x7fdfe44b9e18 ptr=0x555e9fe744d0>
491
568
  a b c
@@ -497,6 +574,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
497
574
 
498
575
  ```ruby
499
576
  RedAmber::DataFrame.new(table).slice([true, false, nil]).table
577
+
500
578
  # =>
501
579
  #<Arrow::Table:0x7fdfe44981c8 ptr=0x555e9febc330>
502
580
  a b c
@@ -509,21 +587,27 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
509
587
 
510
588
  ![remove method image](doc/../image/dataframe/remove.png)
511
589
 
512
- - Keys as arguments
590
+ - Indices as arguments
513
591
 
514
592
  `remove(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
515
593
 
516
594
  ```ruby
517
595
  # returns 6th to 339th obs.
518
596
  penguins.remove(0...5, -5..-1)
597
+
519
598
  # =>
520
- #<RedAmber::DataFrame : 334 x 8 Vectors, 0x000000000000f320>
521
- Vectors : 5 numeric, 3 strings
522
- # key type level data_preview
523
- 1 :species string 3 {"Adelie"=>147, "Chinstrap"=>68, "Gentoo"=>119}
524
- 2 :island string 3 {"Torgersen"=>47, "Biscoe"=>163, "Dream"=>124}
525
- 3 :bill_length_mm double 162 [39.3, 38.9, 39.2, 34.1, 42.0, ... ]
526
- ... 5 more Vectors ...
599
+ #<RedAmber::DataFrame : 334 x 8 Vectors, 0x00000000000487c4>
600
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
601
+ <string> <string> <double> <double> <uint8> ... <uint16>
602
+ 1 Adelie Torgersen 39.3 20.6 190 ... 2007
603
+ 2 Adelie Torgersen 38.9 17.8 181 ... 2007
604
+ 3 Adelie Torgersen 39.2 19.6 195 ... 2007
605
+ 4 Adelie Torgersen 34.1 18.1 193 ... 2007
606
+ 5 Adelie Torgersen 42.0 20.2 190 ... 2007
607
+ : : : : : : ... :
608
+ 332 Gentoo Biscoe 44.5 15.7 217 ... 2009
609
+ 333 Gentoo Biscoe 48.8 16.2 222 ... 2009
610
+ 334 Gentoo Biscoe 47.2 13.7 214 ... 2009
527
611
  ```
528
612
 
529
613
  - Booleans as an argument
@@ -533,22 +617,24 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
533
617
  ```ruby
534
618
  # remove all observation contains nil
535
619
  removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
536
- removed.tdr
620
+ removed
621
+
537
622
  # =>
538
- RedAmber::DataFrame : 333 x 8 Vectors
539
- Vectors : 5 numeric, 3 strings
540
- # key type level data_preview
541
- 1 :species string 3 {"Adelie"=>146, "Chinstrap"=>68, "Gentoo"=>119}
542
- 2 :island string 3 {"Torgersen"=>47, "Biscoe"=>163, "Dream"=>123}
543
- 3 :bill_length_mm double 163 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
544
- 4 :bill_depth_mm double 79 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
545
- 5 :flipper_length_mm uint8 54 [181, 186, 195, 193, 190, ... ]
546
- 6 :body_mass_g uint16 93 [3750, 3800, 3250, 3450, 3650, ... ]
547
- 7 :sex string 2 {"male"=>168, "female"=>165}
548
- 8 :year uint16 3 {2007=>103, 2008=>113, 2009=>117}
623
+ #<RedAmber::DataFrame : 333 x 8 Vectors, 0x0000000000049fac>
624
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
625
+ <string> <string> <double> <double> <uint8> ... <uint16>
626
+ 1 Adelie Torgersen 39.1 18.7 181 ... 2007
627
+ 2 Adelie Torgersen 39.5 17.4 186 ... 2007
628
+ 3 Adelie Torgersen 40.3 18.0 195 ... 2007
629
+ 4 Adelie Torgersen 36.7 19.3 193 ... 2007
630
+ 5 Adelie Torgersen 39.3 20.6 190 ... 2007
631
+ : : : : : : ... :
632
+ 331 Gentoo Biscoe 50.4 15.7 222 ... 2009
633
+ 332 Gentoo Biscoe 45.2 14.8 212 ... 2009
634
+ 333 Gentoo Biscoe 49.9 16.1 213 ... 2009
549
635
  ```
550
636
 
551
- - Keys or booleans by a block
637
+ - Indices or booleans by a block
552
638
 
553
639
  `remove {block}` is also acceptable. We can't use both arguments and a block at a same time. The block should return indeces or a boolean Array with a same length as `size`. Block is called in the context of self.
554
640
 
@@ -559,14 +645,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
559
645
  max = vector.mean + vector.std
560
646
  vector.to_a.map { |e| (min..max).include? e }
561
647
  end
648
+
562
649
  # =>
563
- #<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000000f370>
564
- Vectors : 5 numeric, 3 strings
565
- # key type level data_preview
566
- 1 :species string 3 {"Adelie"=>70, "Chinstrap"=>35, "Gentoo"=>35}
567
- 2 :island string 3 {"Torgersen"=>21, "Biscoe"=>56, "Dream"=>63}
568
- 3 :bill_length_mm double 75 [nil, 36.7, 34.1, 37.8, 37.8, ... ], 2 nils
569
- ... 5 more Vectors ...
650
+ #<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000004de40>
651
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
652
+ <string> <string> <double> <double> <uint8> ... <uint16>
653
+ 1 Adelie Torgersen (nil) (nil) (nil) ... 2007
654
+ 2 Adelie Torgersen 36.7 19.3 193 ... 2007
655
+ 3 Adelie Torgersen 34.1 18.1 193 ... 2007
656
+ 4 Adelie Torgersen 37.8 17.1 186 ... 2007
657
+ 5 Adelie Torgersen 37.8 17.3 180 ... 2007
658
+ : : : : : : ... :
659
+ 138 Gentoo Biscoe (nil) (nil) (nil) ... 2009
660
+ 139 Gentoo Biscoe 50.4 15.7 222 ... 2009
661
+ 140 Gentoo Biscoe 49.9 16.1 213 ... 2009
570
662
  ```
571
663
  - Notice for nil
572
664
  - When `remove` used with booleans, nil in booleans is treated as false. This behavior is aligned with Ruby's `nil#!`.
@@ -574,28 +666,34 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
574
666
  ```ruby
575
667
  df = RedAmber::DataFrame.new(a: [1, 2, nil], b: %w[A B C], c: [1.0, 2, 3])
576
668
  booleans = df[:a] < 2
669
+ booleans
670
+
577
671
  # =>
578
672
  #<RedAmber::Vector(:boolean, size=3):0x000000000000f410>
579
673
  [true, false, nil]
580
674
 
581
675
  booleans_invert = booleans.to_a.map(&:!) # => [false, true, true]
676
+
582
677
  df.slice(booleans) == df.remove(booleans_invert) # => true
583
678
  ```
679
+
584
680
  - Whereas `Vector#invert` returns nil for elements nil. This will bring different result.
585
681
 
586
682
  ```ruby
587
683
  booleans.invert
684
+
588
685
  # =>
589
686
  #<RedAmber::Vector(:boolean, size=3):0x000000000000f488>
590
687
  [false, true, nil]
591
688
 
592
689
  df.remove(booleans.invert)
593
- #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000000f474>
594
- Vectors : 2 numeric, 1 string
595
- # key type level data_preview
596
- 1 :a uint8 2 [1, nil], 1 nil
597
- 2 :b string 2 ["A", "C"]
598
- 3 :c double 2 [1.0, 3.0]
690
+
691
+ # =>
692
+ #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000005df98>
693
+ a b c
694
+ <uint8> <string> <double>
695
+ 1 1 A 1.0
696
+ 2 (nil) C 3.0
599
697
  ```
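Note on the hunk above: the docs contrast Ruby's `nil#!` (which yields `true`) with `Vector#invert` (which keeps `nil`). The difference can be reproduced with plain Arrays, as in this sketch. `null_preserving_invert` is a hypothetical stand-in for the behavior the docs attribute to `Vector#invert`; it does not call red_amber.

```ruby
booleans = [true, false, nil]

# Ruby's own negation: nil is falsy, so !nil evaluates to true.
ruby_inverted = booleans.map(&:!)

# Hypothetical stand-in for the documented Vector#invert behavior:
# nil stays nil, every other element is negated.
def null_preserving_invert(values)
  values.map { |value| value.nil? ? nil : !value }
end

kept_nil = null_preserving_invert(booleans)

p ruby_inverted
p kept_nil
```

With the first form the third element becomes `true`, so a mask-based `remove` would drop that row; with the second it stays `nil`, which the docs say `remove` treats as false, so the row is kept. That is consistent with the two-row result shown for `df.remove(booleans.invert)`.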
600
698
 
601
699
  ### `rename`
@@ -609,15 +707,16 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
609
707
  `rename(key_pairs)` accepts key_pairs as arguments. key_pairs should be a Hash of `{existing_key => new_key}`.
610
708
 
611
709
  ```ruby
612
- h = { 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] }
613
- df = RedAmber::DataFrame.new(h)
710
+ df = RedAmber::DataFrame.new( 'name' => %w[Yasuko Rui Hinata], 'age' => [68, 49, 28] )
614
711
  df.rename(:age => :age_in_1993)
712
+
615
713
  # =>
616
- #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f8fc>
617
- Vectors : 1 numeric, 1 string
618
- # key type level data_preview
619
- 1 :name string 3 ["Yasuko", "Rui", "Hinata"]
620
- 2 :age_in_1993 uint8 3 [68, 49, 28]
714
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000060838>
715
+ name age_in_1993
716
+ <string> <uint8>
717
+ 1 Yasuko 68
718
+ 2 Rui 49
719
+ 3 Hinata 28
621
720
  ```
622
721
 
623
722
  - Key pairs by a block
@@ -643,25 +742,29 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
643
742
 
644
743
  ```ruby
645
744
  df = RedAmber::DataFrame.new(
646
- 'name' => %w[Yasuko Rui Hinata],
647
- 'age' => [68, 49, 28])
745
+ name: %w[Yasuko Rui Hinata],
746
+ age: [68, 49, 28])
747
+ df
748
+
648
749
  # =>
649
- #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f8fc>
650
- Vectors : 1 numeric, 1 string
651
- # key type level data_preview
652
- 1 :name string 3 ["Yasuko", "Rui", "Hinata"]
653
- 2 :age uint8 3 [68, 49, 28]
750
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000062804>
751
+ name age
752
+ <string> <uint8>
753
+ 1 Yasuko 68
754
+ 2 Rui 49
755
+ 3 Hinata 28
654
756
 
655
757
  # update :age and add :brother
656
758
  assigner = { age: [97, 78, 57], brother: ['Santa', nil, 'Momotaro'] }
657
759
  df.assign(assigner)
760
+
658
761
  # =>
659
- #<RedAmber::DataFrame : 3 x 3 Vectors, 0x000000000000f960>
660
- Vectors : 1 numeric, 2 strings
661
- # key type level data_preview
662
- 1 :name string 3 ["Yasuko", "Rui", "Hinata"]
663
- 2 :age uint8 3 [97, 78, 57]
664
- 3 :brother string 3 ["Santa", nil, "Momotaro"], 1 nil
762
+ #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
763
+ name age brother
764
+ <string> <uint8> <string>
765
+ 1 Yasuko 97 Santa
766
+ 2 Rui 78 (nil)
767
+ 3 Hinata 57 Momotaro
665
768
  ```
666
769
 
667
770
  - Key pairs by a block
@@ -673,13 +776,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
673
776
  index: [0, 1, 2, 3, nil],
674
777
  float: [0.0, 1.1, 2.2, Float::NAN, nil],
675
778
  string: ['A', 'B', 'C', 'D', nil])
779
+ df
780
+
676
781
  # =>
677
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f8c0>
678
- Vectors : 2 numeric, 1 string
679
- # key type level data_preview
680
- 1 :index uint8 5 [0, 1, 2, 3, nil], 1 nil
681
- 2 :float double 5 [0.0, 1.1, 2.2, NaN, nil], 1 NaN, 1 nil
682
- 3 :string string 5 ["A", "B", "C", "D", nil], 1 nil
782
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x0000000000069e60>
783
+ index float string
784
+ <uint8> <double> <string>
785
+ 1 0 0.0 A
786
+ 2 1 1.1 B
787
+ 3 2 2.2 C
788
+ 4 3 NaN D
789
+ 5 (nil) (nil) (nil)
683
790
 
684
791
  # update numeric variables
685
792
  df.assign do
@@ -689,13 +796,16 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
689
796
  end
690
797
  assigner
691
798
  end
799
+
692
800
  # =>
693
- #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000000f924>
694
- Vectors : 2 numeric, 1 string
695
- # key type level data_preview
696
- 1 :index int8 5 [0, -1, -2, -3, nil], 1 nil
697
- 2 :float double 5 [-0.0, -1.1, -2.2, NaN, nil], 1 NaN, 1 nil
698
- 3 :string string 5 ["A", "B", "C", "D", nil], 1 nil
801
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000006e000>
802
+ index float string
803
+ <int8> <double> <string>
804
+ 1 0 -0.0 A
805
+ 2 -1 -1.1 B
806
+ 3 -2 -2.2 C
807
+ 4 -3 NaN D
808
+ 5 (nil) (nil) (nil)
699
809
 
700
810
  # Or it ’s shorter like this:
701
811
  df.assign do
@@ -703,6 +813,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
703
813
  assigner[key] = vector * -1 if vector.numeric?
704
814
  end
705
815
  end
816
+
706
817
  # => same as above
707
818
  ```
708
819
 
@@ -724,14 +835,17 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
724
835
  string: ['C', 'B', nil, 'A', 'B'],
725
836
  bool: [nil, true, false, true, false],
726
837
  })
727
- df.sort(:index, '-bool').tdr(tally: 0)
838
+ df.sort(:index, '-bool')
839
+
728
840
  # =>
729
- RedAmber::DataFrame : 5 x 3 Vectors
730
- Vectors : 1 numeric, 1 string, 1 boolean
731
- # key type level data_preview
732
- 1 :index uint8 3 [0, 0, 1, 1, nil], 1 nil
733
- 2 :string string 4 [nil, "B", "B", "C", "A"], 1 nil
734
- 3 :bool boolean 3 [false, false, true, nil, true], 1 nil
841
+ #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000009b03c>
842
+ index string bool
843
+ <uint8> <string> <boolean>
844
+ 1 0 (nil) false
845
+ 2 0 B false
846
+ 3 1 B true
847
+ 4 1 C (nil)
848
+ 5 (nil) A true
735
849
  ```
736
850
 
737
851
  - [ ] Clamp
@@ -746,64 +860,10 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
746
860
 
747
861
  ## Grouping
748
862
 
749
- ### `group(aggregating_keys, function, target_keys)`
750
-
751
- Create grouped dataframe by `aggregation_keys` and apply `function` to each group and returns in `target_keys`. Aggregated key name is `function(key)` style.
863
+ ### `group(group_keys)`
752
864
 
753
- (The current implementation is not intuitive. Needs improvement.)
754
-
755
- ```ruby
756
- ds = Datasets::Rdatasets.new('dplyr', 'starwars')
757
- starwars = RedAmber::DataFrame.new(ds.to_table.to_h)
758
- starwars.tdr(11)
759
- # =>
760
- RedAmber::DataFrame : 87 x 11 Vectors
761
- Vectors : 3 numeric, 8 strings
762
- # key type level data_preview
763
- 1 :name string 87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
764
- 2 :height uint16 46 [172, 167, 96, 202, 150, ... ], 6 nils
765
- 3 :mass double 39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
766
- 4 :hair_color string 13 ["blond", nil, nil, "none", "brown", ... ], 5 nils
767
- 5 :skin_color string 31 ["fair", "gold", "white, blue", "white", "light", .. . ]
768
- 6 :eye_color string 15 ["blue", "yellow", "red", "yellow", "brown", ... ]
769
- 7 :birth_year double 37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
770
- 8 :sex string 5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, nil=>4}
771
- 9 :gender string 3 {"masculine"=>66, "feminine"=>17, nil=>4}
772
- 10 :homeworld string 49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ], 10 nils
773
- 11 :species string 38 ["Human", "Droid", "Droid", "Human", "Human", ... ], 4 nils
774
-
775
- grouped = starwars.group(:species, :mean, [:mass, :height])
776
- # =>
777
- #<RedAmber::DataFrame : 38 x 3 Vectors, 0x000000000000fbf4>
778
- Vectors : 2 numeric, 1 string
779
- # key type level data_preview
780
- 1 :"mean(mass)" double 27 [82.78181818181818, 69.75, 124.0, 74.0, 1358.0, ... ], 6 nils
781
- 2 :"mean(height)" double 32 [176.6451612903226, 131.2, 231.0, 173.0, 175.0, ... ]
782
- 3 :species string 38 ["Human", "Droid", "Wookiee", "Rodian", "Hutt", ... ], 1 nil
783
-
784
- count = starwars.group(:species, :count, :species)[:"count(species)"]
785
- df = grouped.slice(count > 1)
786
- # =>
787
- #<RedAmber::DataFrame : 8 x 3 Vectors, 0x000000000000fc44>
788
- Vectors : 2 numeric, 1 string
789
- # key type level data_preview
790
- 1 :"mean(mass)" double 8 [82.78181818181818, 69.75, 124.0, 74.0, 80.0, ... ]
791
- 2 :"mean(height)" double 8 [176.6451612903226, 131.2, 231.0, 208.66666666666666, 173.0, ... ]
792
- 3 :species string 8 ["Human", "Droid", "Wookiee", "Gungan", "Zabrak", ... ]
793
-
794
- df.table
795
- # =>
796
- #<Arrow::Table:0x1165593c8 ptr=0x7fb3db144c70>
797
- mean(mass) mean(height) species
798
- 0 82.781818 176.645161 Human
799
- 1 69.750000 131.200000 Droid
800
- 2 124.000000 231.000000 Wookiee
801
- 3 74.000000 208.666667 Gungan
802
- 4 80.000000 173.000000 Zabrak
803
- 5 55.000000 179.000000 Twi'lek
804
- 6 53.100000 168.000000 Mirialan
805
- 7 88.000000 221.000000 Kaminoan
806
- ```
865
+ `group` creates an instance of class `Group`. A `Group` object accepts the functions below as methods.
866
+ Each method accepts `summary_keys` as its arguments.
807
867
 
808
868
  Available functions are:
809
869
 
@@ -823,11 +883,113 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
823
883
  - [ ] tdigest
824
884
  - ✓ variance
825
885
 
826
- ## Combining DataFrames
886
+ For each group of `group_keys`, the aggregation `function` is applied, and a new dataframe is returned with aggregated keys according to `summary_keys`.
887
+ Summary key names follow the `function(summary_key)` style, for example `mean(height)`.
888
+
889
+ This is an example of grouping the well-known `starwars` dataset.
890
+
891
+ ```ruby
892
+ starwars =
893
+ RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
894
+ starwars
895
+
896
+ # =>
897
+ #<RedAmber::DataFrame : 87 x 12 Vectors, 0x0000000000005a50>
898
+ unnamed1 name height mass hair_color skin_color eye_color ... species
899
+ <int64> <string> <int64> <double> <string> <string> <string> ... <string>
900
+ 1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
901
+ 2 2 C-3PO 167 75.0 NA gold yellow ... Droid
902
+ 3 3 R2-D2 96 32.0 NA white, blue red ... Droid
903
+ 4 4 Darth Vader 202 136.0 none white yellow ... Human
904
+ 5 5 Leia Organa 150 49.0 brown light brown ... Human
905
+ : : : : : : : : ... :
906
+ 85 85 BB8 (nil) (nil) none none black ... Droid
907
+ 86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
908
+ 87 87 Padmé Amidala 165 45.0 brown light brown ... Human
909
+
910
+ starwars.tdr(12)
827
911
 
828
- - [ ] obs
912
+ # =>
913
+ RedAmber::DataFrame : 87 x 12 Vectors
914
+ Vectors : 4 numeric, 8 strings
915
+ # key type level data_preview
916
+ 1 :unnamed1 int64 87 [1, 2, 3, 4, 5, ... ]
917
+ 2 :name string 87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
918
+ 3 :height int64 46 [172, 167, 96, 202, 150, ... ], 6 nils
919
+ 4 :mass double 39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
920
+ 5 :hair_color string 13 ["blond", "NA", "NA", "none", "brown", ... ]
921
+ 6 :skin_color string 31 ["fair", "gold", "white, blue", "white", "light", ... ]
922
+ 7 :eye_color string 15 ["blue", "yellow", "red", "yellow", "brown", ... ]
923
+ 8 :birth_year double 37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
924
+ 9 :sex string 5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, "NA"=>4}
925
+ 10 :gender string 3 {"masculine"=>66, "feminine"=>17, "NA"=>4}
926
+ 11 :homeworld string 49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ]
927
+ 12 :species string 38 ["Human", "Droid", "Droid", "Human", "Human", ... ]
928
+ ```
829
929
 
830
- - [ ] Add vars
930
+ We can group by `:species` and calculate the count.
931
+
932
+ ```ruby
933
+ starwars.group(:species).count(:species)
934
+
935
+ # =>
936
+ #<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
937
+ species count
938
+ <string> <int64>
939
+ 1 Human 35
940
+ 2 Droid 6
941
+ 3 Wookiee 2
942
+ 4 Rodian 1
943
+ 5 Hutt 1
944
+ : : :
945
+ 36 Kaleesh 1
946
+ 37 Pau'an 1
947
+ 38 Kel Dor 1
948
+ ```
949
+
950
+ We can also calculate the mean of `:mass` and `:height` together.
951
+
952
+ ```ruby
953
+ grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
954
+
955
+ # =>
956
+ #<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
957
+ species count mean(height) mean(mass)
958
+ <string> <int64> <double> <double>
959
+ 1 Human 35 176.6 82.8
960
+ 2 Droid 6 131.2 69.8
961
+ 3 Wookiee 2 231.0 124.0
962
+ 4 Rodian 1 173.0 74.0
963
+ 5 Hutt 1 175.0 1358.0
964
+ : : : : :
965
+ 36 Kaleesh 1 216.0 159.0
966
+ 37 Pau'an 1 206.0 80.0
967
+ 38 Kel Dor 1 188.0 80.0
968
+ ```
969
+
970
+ Select rows where `count > 1`.
971
+
972
+ ```ruby
973
+ grouped.slice(grouped[:count] > 1)
974
+
975
+ # =>
976
+ #<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000004c270>
977
+ species count mean(height) mean(mass)
978
+ <string> <int64> <double> <double>
979
+ 1 Human 35 176.6 82.8
980
+ 2 Droid 6 131.2 69.8
981
+ 3 Wookiee 2 231.0 124.0
982
+ 4 Gungan 3 208.7 74.0
983
+ 5 NA 4 181.3 48.0
984
+ : : : : :
985
+ 7 Twi'lek 2 179.0 55.0
986
+ 8 Mirialan 2 168.0 53.1
987
+ 9 Kaminoan 2 221.0 88.0
988
+ ```
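The group, aggregate, and filter steps shown above can be sketched in plain Ruby to make the semantics concrete. This is only an illustration under stated assumptions, not RedAmber's implementation: the rows are hypothetical values (not taken from the starwars dataset), `summarize` and `mean` are hypothetical helpers, `count` is taken here as the number of rows per group, and the mean is assumed to skip `nil` values.

```ruby
# Hypothetical rows; values are illustrative only.
rows = [
  { species: 'Human', height: 170, mass: 70.0 },
  { species: 'Human', height: 180, mass: nil },
  { species: 'Droid', height: 100, mass: 30.0 },
]

# Mean of the non-nil values; nil when no values are present (assumed behavior).
def mean(values)
  present = values.compact
  return nil if present.empty?

  present.sum.to_f / present.size
end

# Hypothetical helper: group rows by group_key, then emit one summary row
# per group with a count and a "mean(key)" entry for each summary key.
def summarize(rows, group_key, summary_keys)
  rows.group_by { |row| row[group_key] }.map do |key, members|
    summary = { group_key => key, count: members.size }
    summary_keys.each do |k|
      summary[:"mean(#{k})"] = mean(members.map { |m| m[k] })
    end
    summary
  end
end

summary = summarize(rows, :species, %i[height mass])

# Keep only groups with more than one member, like slicing on count > 1.
multi = summary.select { |row| row[:count] > 1 }
```

Each summary row is keyed in the same `function(summary_key)` style as the dataframe output, so `multi.first[:"mean(height)"]` reads the mean height of the only group with more than one row.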
989
+
990
+ ## Combining DataFrames
991
+
992
+ - [ ] Combining rows to a dataframe
831
993
 
832
994
  - [ ] Inner join
833
995
 
@@ -837,4 +999,6 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
837
999
 
838
1000
  - [ ] One-hot encoding
839
1001
 
840
- ## Iteration (not implemented)
1002
+ ## Iteration
1003
+
1004
+ - [ ] each_rows