red_amber 0.1.5 → 0.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +33 -5
  3. data/.rubocop_todo.yml +2 -15
  4. data/.yardopts +1 -0
  5. data/CHANGELOG.md +164 -18
  6. data/Gemfile +6 -1
  7. data/README.md +247 -33
  8. data/Rakefile +1 -0
  9. data/benchmark/csv_load_penguins.yml +1 -1
  10. data/doc/DataFrame.md +383 -219
  11. data/doc/Vector.md +247 -37
  12. data/doc/examples_of_red_amber.ipynb +5454 -0
  13. data/doc/image/dataframe/assign.png +0 -0
  14. data/doc/image/dataframe/drop.png +0 -0
  15. data/doc/image/dataframe/pick.png +0 -0
  16. data/doc/image/dataframe/remove.png +0 -0
  17. data/doc/image/dataframe/rename.png +0 -0
  18. data/doc/image/dataframe/slice.png +0 -0
  19. data/doc/image/dataframe_model.png +0 -0
  20. data/doc/image/vector/binary_element_wise.png +0 -0
  21. data/doc/image/vector/unary_aggregation.png +0 -0
  22. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  23. data/doc/image/vector/unary_element_wise.png +0 -0
  24. data/lib/red-amber.rb +3 -0
  25. data/lib/red_amber/data_frame.rb +62 -10
  26. data/lib/red_amber/data_frame_displayable.rb +86 -9
  27. data/lib/red_amber/data_frame_selectable.rb +151 -32
  28. data/lib/red_amber/data_frame_variable_operation.rb +4 -0
  29. data/lib/red_amber/group.rb +59 -0
  30. data/lib/red_amber/helper.rb +61 -0
  31. data/lib/red_amber/vector.rb +59 -15
  32. data/lib/red_amber/vector_functions.rb +47 -38
  33. data/lib/red_amber/vector_selectable.rb +126 -0
  34. data/lib/red_amber/vector_updatable.rb +125 -0
  35. data/lib/red_amber/version.rb +1 -1
  36. data/lib/red_amber.rb +6 -3
  37. data/red_amber.gemspec +0 -2
  38. metadata +9 -33
  39. data/lib/red_amber/data_frame_helper.rb +0 -64
  40. data/lib/red_amber/data_frame_observation_operation.rb +0 -83
  41. data/lib/red_amber/vector_compensable.rb +0 -68
data/doc/Vector.md CHANGED
@@ -18,6 +18,13 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
18
18
 
19
19
  ```ruby
20
20
  vector = RedAmber::Vector.new([1, 2, 3])
21
+ # or
22
+ vector = RedAmber::Vector.new(1, 2, 3)
23
+ # or
24
+ vector = RedAmber::Vector.new(1..3)
25
+ # or
26
+ vector = RedAmber::Vector.new(Arrow::Array.new([1, 2, 3]))
27
+
21
28
  # =>
22
29
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f514>
23
30
  [1, 2, 3]
@@ -29,29 +36,46 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
29
36
 
30
37
  ### `values`, `to_a`, `entries`
31
38
 
39
+ ### `indices`, `indexes`, `indeces`
40
+
41
+ Returns the indices as an Array.
42
+
43
+ ### `to_ary`
44
+
45
+ It implicitly converts a Vector to an Array when required.
46
+
47
+ ```ruby
48
+ [1, 2] + RedAmber::Vector.new([3, 4])
49
+
50
+ # =>
51
+ [1, 2, 3, 4]
52
+ ```
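The implicit conversion rests on Ruby's `to_ary` protocol: `Array#+` calls `to_ary` on an argument that is not already an Array. The following plain-Ruby sketch shows only that protocol. `ArrayLike` is a hypothetical stand-in for a Vector, not RedAmber's implementation.

```ruby
# Hypothetical stand-in for a Vector; only the conversion protocol is shown.
class ArrayLike
  def initialize(values)
    @values = values
  end

  # Called implicitly by Array#+ (and other Array methods) when an
  # Array is expected but a different object is given.
  def to_ary
    @values.dup
  end
end

result = [1, 2] + ArrayLike.new([3, 4])
p result # => [1, 2, 3, 4]
```

Defining `to_ary` has another effect: the object is also destructured in multiple assignment, so `a, b = obj` works.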
53
+
32
54
  ### `size`, `length`, `n_rows`, `nrow`
33
55
 
56
+ ### `empty?`
57
+
34
58
  ### `type`
35
59
 
36
60
  ### `boolean?`, `numeric?`, `string?`, `temporal?`
37
61
 
38
62
  ### `type_class`
39
63
 
40
- ### [ ] `each` (not impremented yet)
41
-
42
- ### [ ] `chunked?` (not impremented yet)
64
+ ### `each`
43
65
 
44
- ### [ ] `n_chunks` (not impremented yet)
45
-
46
- ### [ ] `each_chunk` (not impremented yet)
66
+ If a block is not given, returns an Enumerator.
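Returning an Enumerator when no block is given is the usual Ruby idiom. Here is a minimal sketch of it with a hypothetical `Series` class. This is not RedAmber's code.

```ruby
class Series
  include Enumerable # provides map, select, to_a, ... on top of each

  def initialize(values)
    @values = values
  end

  # Without a block, return an Enumerator so calls can be chained.
  def each(&block)
    return enum_for(:each) unless block

    @values.each(&block)
    self
  end
end

series = Series.new([1, 2, 3])
enum = series.each
p enum.class              # => Enumerator
p enum.map { |x| x * 10 } # => [10, 20, 30]
```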
47
67
 
48
68
  ### `n_nils`, `n_nans`
49
69
 
50
70
  - `n_nulls` is an alias of `n_nils`
51
71
 
72
+ ### `has_nil?`
73
+
74
+ Returns `true` if self has any `nil`. Otherwise returns `false`.
75
+
52
76
  ### `inspect(limit: 80)`
53
77
 
54
- - `limit` sets size limit to display long array.
78
+ - `limit` sets size limit to display a long array.
55
79
 
56
80
  ```ruby
57
81
  vector = RedAmber::Vector.new((1..50).to_a)
@@ -60,6 +84,47 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
60
84
  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ... ]
61
85
  ```
62
86
 
87
+ ## Selecting Values
88
+
89
+ ### `take(indices)`, `[](indices)`
90
+
91
+ - Acceptable classes for indices:
92
+   - Integer, Float
93
+   - Vector of integers or floats
94
+   - Arrow::Array of integers or floats
95
+ - Negative indices are also accepted, as with Ruby's built-in Array.
96
+
97
+ ```ruby
98
+ array = RedAmber::Vector.new(%w[A B C D E])
99
+ indices = RedAmber::Vector.new([0.1, -0.5, -5.1])
100
+ array.take(indices)
101
+ # or
102
+ array[indices]
103
+
104
+ # =>
105
+ #<RedAmber::Vector(:string, size=3):0x000000000000f820>
106
+ ["A", "E", "A"]
107
+ ```
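For integer indices, Ruby's `Array#values_at` follows the same negative-index convention. This plain-Ruby comparison uses integers only. How RedAmber converts the float indices in the example above is not described here, so it is left out.

```ruby
array = %w[A B C D E]
# Negative indices count from the end of the array.
picked = array.values_at(0, -1, -5)
p picked # => ["A", "E", "A"]
```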
108
+
109
+ ### `filter(booleans)`, `[](booleans)`
110
+
111
+ - Acceptable classes for booleans:
112
+   - An Array of true, false, or nil
113
+   - Boolean Vector
114
+   - Arrow::BooleanArray
115
+
116
+ ```ruby
117
+ array = RedAmber::Vector.new(%w[A B C D E])
118
+ booleans = [true, false, nil, false, true]
119
+ array.filter(booleans)
120
+ # or
121
+ array[booleans]
122
+
123
+ # =>
124
+ #<RedAmber::Vector(:string, size=2):0x000000000000f21c>
125
+ ["A", "E"]
126
+ ```
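In this example, `nil` in the booleans drops the element in the same way `false` does. The selection rule is to keep only positions whose flag is exactly `true`. Here it is as a plain-Ruby sketch, using a hypothetical `filter_by` helper.

```ruby
# Hypothetical helper: keep elements whose flag is exactly true;
# false and nil both drop the element, as in the example above.
def filter_by(values, flags)
  raise ArgumentError, "size mismatch" unless values.size == flags.size

  values.zip(flags).select { |_, flag| flag == true }.map(&:first)
end

p filter_by(%w[A B C D E], [true, false, nil, false, true]) # => ["A", "E"]
```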
127
+
63
128
  ## Functions
64
129
 
65
130
  ### Unary aggregations: `vector.func => scalar`
@@ -68,8 +133,8 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
68
133
 
69
134
  | Method |Boolean|Numeric|String|Options|Remarks|
70
135
  | ----------- | --- | --- | --- | --- | --- |
71
- | ✓ `all` | ✓ | | | ✓ ScalarAggregate| |
72
- | ✓ `any` | ✓ | | | ✓ ScalarAggregate| |
136
+ | ✓ `all?` | ✓ | | | ✓ ScalarAggregate| alias `all` |
137
+ | ✓ `any?` | ✓ | | | ✓ ScalarAggregate| alias `any` |
73
138
  | ✓ `approximate_median`| |✓| | ✓ ScalarAggregate| alias `median`|
74
139
  | ✓ `count` | ✓ | ✓ | ✓ | ✓ Count | |
75
140
  | ✓ `count_distinct`| ✓ | ✓ | ✓ | ✓ Count |alias `count_uniq`|
@@ -99,9 +164,9 @@ double = RedAmber::Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
99
164
  [1.0, NaN, -Infinity, Infinity, nil, 0.0]
100
165
 
101
166
  double.count #=> 5
102
- double.count(opts: {mode: :only_valid}) #=> 5, default
103
- double.count(opts: {mode: :only_null}) #=> 1
104
- double.count(opts: {mode: :all}) #=> 6
167
+ double.count(mode: :only_valid) #=> 5, default
168
+ double.count(mode: :only_null) #=> 1
169
+ double.count(mode: :all) #=> 6
105
170
 
106
171
  boolean = RedAmber::Vector.new([true, true, nil])
107
172
  #=>
@@ -109,8 +174,8 @@ boolean = RedAmber::Vector.new([true, true, nil])
109
174
  [true, true, nil]
110
175
 
111
176
  boolean.all #=> true
112
- boolean.all(opts: {skip_nulls: true}) #=> true
113
- boolean.all(opts: {skip_nulls: false}) #=> false
177
+ boolean.all(skip_nulls: true) #=> true
178
+ boolean.all(skip_nulls: false) #=> false
114
179
  ```
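The three `mode` values of `count` can be mimicked over a plain Ruby Array. This sketch treats `nil` as the only null, and NaN and Infinity count as ordinary valid values, as in the `double` example above. `count_with_mode` is a hypothetical helper.

```ruby
# Hypothetical helper mirroring count(mode: ...) over a plain Array.
def count_with_mode(values, mode: :only_valid)
  case mode
  when :only_valid then values.count { |v| !v.nil? }
  when :only_null  then values.count(&:nil?)
  when :all        then values.size
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

data = [1.0, Float::NAN, -Float::INFINITY, Float::INFINITY, nil, 0.0]
p count_with_mode(data)                   # => 5
p count_with_mode(data, mode: :only_null) # => 1
p count_with_mode(data, mode: :all)       # => 6
```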
115
180
 
116
181
  ### Unary element-wise: `vector.func => vector`
@@ -144,6 +209,37 @@ boolean.all(opts: {skip_nulls: false}) #=> false
144
209
  | ✓ `tan` | | ✓ | | | |
145
210
  | ✓ `trunc` | | ✓ | | | |
146
211
 
212
+ Examples of options for `#round`:
213
+
214
+ - `n_digits`: The number of digits to round to. A negative value rounds to the left of the decimal point.
215
+ - `mode`: Specifies the rounding mode.
216
+
217
+ ```ruby
218
+ double = RedAmber::Vector.new([15.15, 2.5, 3.5, -4.5, -5.5])
219
+ # => [15.15, 2.5, 3.5, -4.5, -5.5]
220
+ double.round
221
+ # => [15.0, 2.0, 4.0, -4.0, -6.0]
222
+ double.round(mode: :half_to_even)
223
+ # => Default. Same as double.round
224
+ double.round(mode: :towards_infinity)
225
+ # => [16.0, 3.0, 4.0, -5.0, -6.0]
226
+ double.round(mode: :half_up)
227
+ # => [15.0, 3.0, 4.0, -4.0, -5.0]
228
+ double.round(mode: :half_towards_zero)
229
+ # => [15.0, 2.0, 3.0, -4.0, -5.0]
230
+ double.round(mode: :half_towards_infinity)
231
+ # => [15.0, 3.0, 4.0, -5.0, -6.0]
232
+ double.round(mode: :half_to_odd)
233
+ # => [15.0, 3.0, 3.0, -5.0, -5.0]
234
+
235
+ double.round(n_digits: 0)
236
+ # => Default. Same as double.round
237
+ double.round(n_digits: 1)
238
+ # => [15.2, 2.5, 3.5, -4.5, -5.5]
239
+ double.round(n_digits: -1)
240
+ # => [20.0, 0.0, 0.0, -0.0, -10.0]
241
+ ```
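Ruby's own `Float#round` accepts a `half:` keyword that covers three of these tie-breaking rules. The mapping below is inferred by comparing outputs with the list above. It is not taken from RedAmber.

```ruby
data = [15.15, 2.5, 3.5, -4.5, -5.5]

# Float#round(half:) only changes how exact ties (x.5) are broken.
half_to_even          = data.map { |x| x.round(half: :even).to_f }
half_towards_infinity = data.map { |x| x.round(half: :up).to_f }   # ties away from zero
half_towards_zero     = data.map { |x| x.round(half: :down).to_f } # ties toward zero

p half_to_even          # => [15.0, 2.0, 4.0, -4.0, -6.0]
p half_towards_infinity # => [15.0, 3.0, 4.0, -5.0, -6.0]
p half_towards_zero     # => [15.0, 2.0, 3.0, -4.0, -5.0]
```

Plain Ruby has no direct equivalent of `:half_up`, `:half_to_odd`, or `:towards_infinity`.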
242
+
147
243
  ### Binary element-wise: `vector.func(vector) => vector`
148
244
 
149
245
  ![binary element-wise](doc/image/../../image/vector/binary_element_wise.png)
@@ -203,6 +299,9 @@ boolean.all(opts: {skip_nulls: false}) #=> false
203
299
  vector.tally #=> {NaN=>2}
204
300
  vector.value_counts #=> {NaN=>2}
205
301
  ```
302
+ ### `index(element)`
303
+
304
+ Returns the index of the specified element.
206
305
 
207
306
  ### `sort_indexes`, `sort_indices`, `array_sort_indices`
208
307
 
@@ -215,42 +314,68 @@ boolean.all(opts: {skip_nulls: false}) #=> false
215
314
  ### [ ] (index functions)
216
315
  ### [ ] (other functions)
217
316
 
218
- ## Coerce (not impremented)
317
+ ## Coerce
318
+
319
+ ```ruby
320
+ vector = RedAmber::Vector.new(1, 2, 3)
321
+ # =>
322
+ #<RedAmber::Vector(:uint8, size=3):0x00000000000decc4>
323
+ [1, 2, 3]
324
+
325
+ # Vector's `#*` method
326
+ vector * -1
327
+ # =>
328
+ #<RedAmber::Vector(:int16, size=3):0x00000000000e3698>
329
+ [-1, -2, -3]
330
+
331
+ # coerced calculation
332
+ -1 * vector
333
+ # =>
334
+ #<RedAmber::Vector(:int16, size=3):0x00000000000ea4ac>
335
+ [-1, -2, -3]
336
+
337
+ # `@-` operator
338
+ -vector
339
+ # =>
340
+ #<RedAmber::Vector(:uint8, size=3):0x00000000000ee7b4>
341
+ [255, 254, 253]
342
+ ```
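`-1 * vector` works through Ruby's `coerce` protocol. When `Integer#*` receives an operand it does not recognize, it calls `operand.coerce(-1)` and evaluates the returned pair. The sketch below is minimal and uses a hypothetical `MiniVector` class. It is not RedAmber's implementation.

```ruby
# Hypothetical minimal vector; shows Ruby's coerce protocol only.
class MiniVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Element-wise multiplication by a scalar or another MiniVector.
  def *(other)
    rhs = other.is_a?(MiniVector) ? other.values : Array.new(values.size, other)
    MiniVector.new(values.zip(rhs).map { |a, b| a * b })
  end

  # Integer#* calls this with itself as `scalar`, then evaluates
  # result[0] * result[1], i.e. MiniVector * MiniVector.
  def coerce(scalar)
    [MiniVector.new(Array.new(values.size, scalar)), self]
  end

  # Unary minus, the `@-` operator.
  def -@
    MiniVector.new(values.map(&:-@))
  end
end

v = MiniVector.new([1, 2, 3])
p((v * -1).values) # => [-1, -2, -3]
p((-1 * v).values) # => [-1, -2, -3]
p((-v).values)     # => [-1, -2, -3]
```

The `:uint8` Vector above wraps around to `[255, 254, 253]`, but plain Ruby Integers do not wrap. As a result, `-v` here gives negative values.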
219
343
 
220
344
  ## Update vector's value
221
- ### `replace_with(booleans, replacements)` => vector
345
+ ### `replace(specifier, replacer)` => vector
222
346
 
223
- - Accepts Vector, Array, Arrow::Array for booleans and replacements.
224
- - Replacements can accept scalar
225
- - Booleans specifies the position of replacement in true.
226
- - Replacements specifies the vaues to be replaced.
227
- - The number of true in booleans must be equal to the length of replacement
347
+ - Accepts a Scalar, a Range of Integers, a Vector, an Array, or an Arrow::Array as a specifier.
348
+ - Accepts a Scalar, a Vector, an Array, or an Arrow::Array as a replacer.
349
+ - A boolean specifier marks the positions to replace with `true`.
350
+ - An index specifier lists the positions to replace as indices.
351
+ - `replacer` specifies the new values for those positions.
352
+ - The number of `true` values in a boolean specifier must equal the length of `replacer`.
228
353
 
229
354
  ```ruby
230
355
  vector = RedAmber::Vector.new([1, 2, 3])
231
356
  booleans = [true, false, true]
232
- replacemants = [4, 5]
233
- vector.replace_with(booleans, replacemants)
357
+ replacer = [4, 5]
358
+ vector.replace(booleans, replacer)
234
359
  # =>
235
360
  #<RedAmber::Vector(:uint8, size=3):0x000000000001ee10>
236
361
  [4, 2, 5]
237
362
  ```
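The boolean-specifier rules can be sketched in plain Ruby with a hypothetical `replace_by_mask` helper. Each `true` position takes the next replacer value in order, and a non-Array replacer is broadcast. A `false` position keeps its original value, and `nil` in the mask yields `nil`, matching the `[true, false, nil]` example in this section. Raising `ArgumentError` on a size mismatch is an assumption, and a one-element Array such as `[nil]` is not broadcast by this sketch.

```ruby
# Hypothetical helper mirroring the boolean-specifier rules above.
def replace_by_mask(values, mask, replacer = nil)
  n_true = mask.count(true)
  # A non-Array replacer (a scalar, or nil) is broadcast to every true position.
  queue = replacer.is_a?(Array) ? replacer.dup : Array.new(n_true, replacer)
  unless queue.size == n_true
    raise ArgumentError, "replacer has #{queue.size} values, mask has #{n_true} true"
  end

  values.zip(mask).map do |value, flag|
    case flag
    when true then queue.shift # next replacement, in order
    when false then value      # keep the original value
    else nil                   # nil in the mask yields nil
    end
  end
end

vector = [1, 2, 3]
p replace_by_mask(vector, [true, false, true], [4, 5]) # => [4, 2, 5]
p replace_by_mask(vector, [true, false, true], 0)      # => [0, 2, 0]
p replace_by_mask(vector, [true, false, nil], -1)      # => [-1, 2, nil]
p replace_by_mask(vector, [true, false, true])         # => [nil, 2, nil]
```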
238
363
 
239
- - Scalar value in replacements can be broadcasted.
364
+ - A scalar replacer is broadcast to all specified positions.
240
365
 
241
366
  ```ruby
242
- replacemant = 0
243
- vector.replace_with(booleans, replacement)
367
+ replacer = 0
368
+ vector.replace(booleans, replacer)
244
369
  # =>
245
370
  #<RedAmber::Vector(:uint8, size=3):0x000000000001ee10>
246
371
  [0, 2, 0]
247
372
  ```
248
373
 
249
- - Returned data type is automatically up-casted by replacement.
374
+ - The returned data type is automatically upcast to fit the replacer.
250
375
 
251
376
  ```ruby
252
- replacement = 1.0
253
- vector.replace_with(booleans, replacement)
377
+ replacer = 1.0
378
+ vector.replace(booleans, replacer)
254
379
  # =>
255
380
  #<RedAmber::Vector(:double, size=3):0x0000000000025d78>
256
381
  [1.0, 2.0, 1.0]
@@ -260,29 +385,29 @@ vector.replace_with(booleans, replacement)
260
385
 
261
386
  ```ruby
262
387
  booleans = [true, false, nil]
263
- replacemant = -1
264
- vec.replace_with(booleans, replacement)
388
+ replacer = -1
389
+ vec.replace(booleans, replacer)
265
390
  # =>
266
391
  #<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
267
392
  [-1, 2, nil]
268
393
  ```
269
394
 
270
- - Replacemants can have nil in it.
395
+ - `replacer` may contain `nil`.
271
396
 
272
397
  ```ruby
273
398
  booleans = [true, false, true]
274
- replacemants = [nil]
275
- vec.replace_with(booleans, replacemants)
399
+ replacer = [nil]
400
+ vec.replace(booleans, replacer)
276
401
  # =>
277
402
  #<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
278
403
  [nil, 2, nil]
279
404
  ```
280
405
 
281
- - If no replacemants specified, it is same as to specify nil.
406
+ - If no replacer is specified, it is the same as specifying `nil`.
282
407
 
283
408
  ```ruby
284
409
  booleans = [true, false, true]
285
- vec.replace_with(booleans)
410
+ vec.replace(booleans)
286
411
  # =>
287
412
  #<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
288
413
  [nil, 2, nil]
@@ -292,12 +417,27 @@ vec.replace_with(booleans)
292
417
 
293
418
  ```ruby
294
419
  vector = RedAmber::Vector.new(['A', 'B', 'NA'])
295
- vector.replace_with(vector == 'NA', nil)
420
+ vector.replace(vector == 'NA', nil)
296
421
  # =>
297
422
  #<RedAmber::Vector(:string, size=3):0x000000000000f8ac>
298
423
  ["A", "B", nil]
299
424
  ```
300
425
 
426
+ - Specifying positions by indices.
427
+
428
+ Specified indices are applied in sorted order. The n-th element of the replacer goes to the n-th smallest index, not to the n-th index as written.
429
+
430
+ ```ruby
431
+ vector = RedAmber::Vector.new([1, 2, 3])
432
+ indices = [2, 1]
433
+ replacer = [4, 5]
434
+ vector.replace(indices, replacer)
435
+ # =>
436
+ #<RedAmber::Vector(:uint8, size=3):0x000000000000f244>
437
+ [1, 4, 5] # not [1, 5, 4]
438
+ ```
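The "sorted order" rule can be sketched in plain Ruby with a hypothetical `replace_at` helper. The indices are sorted first, and then the replacer values are paired with them one by one.

```ruby
# Hypothetical helper: the i-th replacer value goes to the i-th smallest index.
def replace_at(values, indices, replacer)
  sorted = indices.sort
  raise ArgumentError, "size mismatch" unless sorted.size == replacer.size

  result = values.dup
  sorted.zip(replacer).each { |i, r| result[i] = r }
  result
end

p replace_at([1, 2, 3], [2, 1], [4, 5]) # => [1, 4, 5], not [1, 5, 4]
```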
439
+
440
+
301
441
  ### `fill_nil_forward`, `fill_nil_backward` => vector
302
442
 
303
443
  Propagate the last valid observation forward (or backward).
@@ -315,3 +455,73 @@ integer.fill_nil_backward
315
455
  #<RedAmber::Vector(:uint8, size=5):0x000000000000f974>
316
456
  [0, 1, 3, 3, nil]
317
457
  ```
458
+
459
+ ### `boolean_vector.if_else(true_choice, false_choice)` => vector
460
+
461
+ Chooses values based on self, which must be a boolean Vector: `true_choice` where self is true and `false_choice` where it is false.
462
+
463
+ `true_choice` and `false_choice` must be of the same type (scalar, Array, or Vector).
464
+ `nil` values in self are propagated to the output.
465
+
466
+ This example normalizes negative indices to positive ones.
467
+
468
+ ```ruby
469
+ indices = RedAmber::Vector.new([1, -1, 3, -4])
470
+ array_size = 10
471
+ normalized_indices = (indices < 0).if_else(indices + array_size, indices)
472
+
473
+ # =>
474
+ #<RedAmber::Vector(:int16, size=4):0x000000000000f85c>
475
+ [1, 9, 3, 6]
476
+ ```
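The element-wise choice and its `nil` propagation can be sketched in plain Ruby, using a hypothetical `if_else` helper that takes the condition as an explicit Array. In this sketch, each choice may be an Array (taken position by position) or a scalar (broadcast).

```ruby
# Hypothetical helper mirroring boolean_vector.if_else(true_choice, false_choice).
def if_else(cond, true_choice, false_choice)
  cond.each_with_index.map do |c, i|
    t = true_choice.is_a?(Array) ? true_choice[i] : true_choice
    f = false_choice.is_a?(Array) ? false_choice[i] : false_choice
    c.nil? ? nil : (c ? t : f) # nil in the condition stays nil
  end
end

indices = [1, -1, 3, -4]
array_size = 10
cond = indices.map { |i| i < 0 }
p if_else(cond, indices.map { |i| i + array_size }, indices) # => [1, 9, 3, 6]
p if_else([true, nil, false], :a, :b)                        # => [:a, nil, :b]
```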
477
+
478
+ ### `is_in(values)` => boolean vector
479
+
480
+ For each element in self, returns true if it is found in `values`, and false otherwise.
481
+ By default, nulls are matched against the value set. (Changing this through SetLookupOptions is not implemented yet.)
482
+
483
+ ```ruby
484
+ vector = RedAmber::Vector.new %W[A B C D]
485
+ values = ['A', 'C', 'X']
486
+ vector.is_in(values)
487
+
488
+ # =>
489
+ #<RedAmber::Vector(:boolean, size=4):0x000000000000f2a8>
490
+ [true, false, true, false]
491
+ ```
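A plain-Ruby sketch of the membership test, using a hypothetical `is_in` helper backed by a Set. Because `nil` is compared like any other value, it matches only when the lookup values contain `nil`. This is in line with nulls being matched against the value set. Type casting of `values` is not modeled.

```ruby
require "set"

# Hypothetical helper: true where the element appears in lookup.
def is_in(values, lookup)
  set = lookup.to_set
  values.map { |v| set.include?(v) }
end

p is_in(%w[A B C D], ["A", "C", "X"]) # => [true, false, true, false]
p is_in(["A", nil], [nil])            # => [false, true]
```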
492
+
493
+ `values` are cast to the data type of the Vector.
494
+
495
+ ```ruby
496
+ vector = RedAmber::Vector.new([1, 2, 255])
497
+ vector.is_in(1, -1)
498
+
499
+ # =>
500
+ #<RedAmber::Vector(:boolean, size=3):0x000000000000f320>
501
+ [true, false, true]
502
+ ```
503
+
504
+ ### `shift(amount = 1, fill: nil)`
505
+
506
+ Shifts the vector's values by `amount` positions and fills the vacated positions with `fill`. A negative `amount` shifts toward the front.
507
+
508
+ ```ruby
509
+ vector = RedAmber::Vector.new([1, 2, 3, 4, 5])
510
+ vector.shift
511
+
512
+ # =>
513
+ #<RedAmber::Vector(:uint8, size=5):0x00000000000072d8>
514
+ [nil, 1, 2, 3, 4]
515
+
516
+ vector.shift(-2)
517
+
518
+ # =>
519
+ #<RedAmber::Vector(:uint8, size=5):0x0000000000009970>
520
+ [3, 4, 5, nil, nil]
521
+
522
+ vector.shift(fill: Float::NAN)
523
+
524
+ # =>
525
+ #<RedAmber::Vector(:double, size=5):0x0000000000011d3c>
526
+ [NaN, 1.0, 2.0, 3.0, 4.0]
527
+ ```
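The shifting rule can be sketched in plain Ruby with a hypothetical `shift_values` helper. The name avoids clashing with `Array#shift`, which does something different. Clamping `amount` to the array size is an assumption about out-of-range shifts.

```ruby
# Hypothetical helper mirroring shift(amount = 1, fill: nil) on a plain Array.
def shift_values(values, amount = 1, fill: nil)
  n = values.size
  k = amount.abs.clamp(0, n) # never shift by more than the size
  if amount >= 0
    Array.new(k, fill) + values.first(n - k) # move right, pad the front
  else
    values.last(n - k) + Array.new(k, fill)  # move left, pad the back
  end
end

v = [1, 2, 3, 4, 5]
p shift_values(v)          # => [nil, 1, 2, 3, 4]
p shift_values(v, -2)      # => [3, 4, 5, nil, nil]
p shift_values(v, fill: 0) # => [0, 1, 2, 3, 4]
```

Unlike the RedAmber example, where `fill: Float::NAN` upcasts the whole Vector to `:double`, a plain Array simply holds mixed types.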