red_amber 0.1.5 → 0.1.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +33 -5
- data/.rubocop_todo.yml +2 -15
- data/.yardopts +1 -0
- data/CHANGELOG.md +164 -18
- data/Gemfile +6 -1
- data/README.md +247 -33
- data/Rakefile +1 -0
- data/benchmark/csv_load_penguins.yml +1 -1
- data/doc/DataFrame.md +383 -219
- data/doc/Vector.md +247 -37
- data/doc/examples_of_red_amber.ipynb +5454 -0
- data/doc/image/dataframe/assign.png +0 -0
- data/doc/image/dataframe/drop.png +0 -0
- data/doc/image/dataframe/pick.png +0 -0
- data/doc/image/dataframe/remove.png +0 -0
- data/doc/image/dataframe/rename.png +0 -0
- data/doc/image/dataframe/slice.png +0 -0
- data/doc/image/dataframe_model.png +0 -0
- data/doc/image/vector/binary_element_wise.png +0 -0
- data/doc/image/vector/unary_aggregation.png +0 -0
- data/doc/image/vector/unary_aggregation_w_option.png +0 -0
- data/doc/image/vector/unary_element_wise.png +0 -0
- data/lib/red-amber.rb +3 -0
- data/lib/red_amber/data_frame.rb +62 -10
- data/lib/red_amber/data_frame_displayable.rb +86 -9
- data/lib/red_amber/data_frame_selectable.rb +151 -32
- data/lib/red_amber/data_frame_variable_operation.rb +4 -0
- data/lib/red_amber/group.rb +59 -0
- data/lib/red_amber/helper.rb +61 -0
- data/lib/red_amber/vector.rb +59 -15
- data/lib/red_amber/vector_functions.rb +47 -38
- data/lib/red_amber/vector_selectable.rb +126 -0
- data/lib/red_amber/vector_updatable.rb +125 -0
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +6 -3
- data/red_amber.gemspec +0 -2
- metadata +9 -33
- data/lib/red_amber/data_frame_helper.rb +0 -64
- data/lib/red_amber/data_frame_observation_operation.rb +0 -83
- data/lib/red_amber/vector_compensable.rb +0 -68
data/doc/Vector.md
CHANGED
````diff
@@ -18,6 +18,13 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
 
 ```ruby
 vector = RedAmber::Vector.new([1, 2, 3])
+# or
+vector = RedAmber::Vector.new(1, 2, 3)
+# or
+vector = RedAmber::Vector.new(1..3)
+# or
+vector = RedAmber::Vector.new(Arrow::Array([1, 2, 3])
+
 # =>
 #<RedAmber::Vector(:uint8, size=3):0x000000000000f514>
 [1, 2, 3]
````
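The hunk above adds three more ways to call `RedAmber::Vector.new`. As published, the last form is not valid Ruby: the closing parenthesis is missing and `Arrow::Array(...)` is written like a method call. It was presumably meant as `RedAmber::Vector.new(Arrow::Array.new([1, 2, 3]))`. The plain-Ruby sketch below shows one way a constructor can accept the three pure-Ruby argument shapes by normalizing them to one flat list. `normalize_vector_args` is a hypothetical helper for illustration, not RedAmber's code, and it does not handle Arrow arrays.

```ruby
# Hypothetical helper: normalize the argument shapes shown above
# (an Array, bare values, or a Range) into one flat Array of values.
def normalize_vector_args(*args)
  return args unless args.size == 1 # Vector.new(1, 2, 3)

  arg = args.first
  case arg
  when Array then arg.dup  # Vector.new([1, 2, 3])
  when Range then arg.to_a # Vector.new(1..3)
  else [arg]               # a single scalar becomes a one-element list
  end
end

normalize_vector_args([1, 2, 3]) # => [1, 2, 3]
normalize_vector_args(1, 2, 3)   # => [1, 2, 3]
normalize_vector_args(1..3)      # => [1, 2, 3]
```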
````diff
@@ -29,29 +36,46 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
 
 ### `values`, `to_a`, `entries`
 
+### `indices`, `indexes`, `indeces`
+
+Return indices in an Array.
+
+### `to_ary`
+
+It implicitly converts a Vector to an Array when required.
+
+```ruby
+[1, 2] + Vector.new([3, 4])
+
+# =>
+[1, 2, 3, 4]
+```
+
 ### `size`, `length`, `n_rows`, `nrow`
 
+### `empty?`
+
 ### `type`
 
 ### `boolean?`, `numeric?`, `string?`, `temporal?`
 
 ### `type_class`
 
-###
-
-### [ ] `chunked?` (not impremented yet)
+### `each`
 
-
-
-### [ ] `each_chunk` (not impremented yet)
+If block is not given, returns Enumerator.
 
 ### `n_nils`, `n_nans`
 
 - `n_nulls` is an alias of `n_nils`
 
+### `has_nil?`
+
+Returns `true` if self has any `nil`. Otherwise returns `false`.
+
 ### `inspect(limit: 80)`
 
-- `limit` sets size limit to display long array.
+- `limit` sets size limit to display a long array.
 
 ```ruby
 vector = RedAmber::Vector.new((1..50).to_a)
````
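The `to_ary` entry relies on Ruby's implicit-conversion protocol: when `Array#+` receives a non-Array argument, it calls `to_ary` on it. The toy class below is a hypothetical `MiniVector`, not RedAmber's code; it sketches that protocol together with the other small additions in this hunk: `indices`, `empty?`, `has_nil?`, and an `each` that returns an Enumerator when no block is given.

```ruby
# MiniVector is a hypothetical toy class, not RedAmber::Vector; it only
# illustrates the plain-Ruby protocols behind the methods listed above.
class MiniVector
  include Enumerable

  def initialize(values)
    @values = values.to_a
  end

  def to_a
    @values.dup
  end

  # Implicit conversion: Array#+ calls #to_ary on a non-Array argument,
  # so `[1, 2] + MiniVector.new([3, 4])` returns a plain Array.
  def to_ary
    to_a
  end

  def size
    @values.size
  end

  def empty?
    @values.empty?
  end

  # Positions 0...size, analogous to `indices`.
  def indices
    (0...size).to_a
  end

  def has_nil?
    @values.include?(nil)
  end

  # Without a block, return an Enumerator instead of iterating.
  def each(&block)
    return enum_for(:each) { size } unless block

    @values.each(&block)
    self
  end
end

[1, 2] + MiniVector.new([3, 4]) # => [1, 2, 3, 4]
```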
````diff
@@ -60,6 +84,47 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ... ]
 ```
 
+## Selecting Values
+
+### `take(indices)`, `[](indices)`
+
+- Acceptable class for indices:
+- Integer, Float
+- Vector of integer or float
+- Arrow::Arry of integer or float
+- Negative index is also OK like the Ruby's primitive Array.
+
+```ruby
+array = RedAmber::Vector.new(%w[A B C D E])
+indices = RedAmber::Vector.new([0.1, -0.5, -5.1])
+array.take(indices)
+# or
+array[indices]
+
+# =>
+#<RedAmber::Vector(:string, size=3):0x000000000000f820>
+["A", "E", "A"]
+```
+
+### `filter(booleans)`, `[](booleans)`
+
+- Acceptable class for booleans:
+- An array of true, false, or nil
+- Boolean Vector
+- Arrow::BooleanArray
+
+```ruby
+array = RedAmber::Vector.new(%w[A B C D E])
+booleans = [true, false, nil, false, true]
+array.filter(booleans)
+# or
+array[booleans]
+
+# =>
+#<RedAmber::Vector(:string, size=2):0x000000000000f21c>
+["A", "E"]
+```
+
 ## Functions
 
 ### Unary aggregations: `vector.func => scalar`
````
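The indexing rule for non-integer positions is not spelled out above. The sketch below uses one rule that reproduces the documented `["A", "E", "A"]` result: count negative positions from the end (`index + size`), then truncate toward zero with `Float#to_i`. That rule is an assumption inferred from the example, not a statement of how RedAmber or Arrow implements `take`. The filter helper keeps an element only where its flag is exactly `true`, so `false` and `nil` both drop it, as in the `["A", "E"]` example. Both helpers are hypothetical and work on plain Ruby Arrays.

```ruby
# Hypothetical sketch of `take`: negative positions count from the end,
# then fractional positions are truncated toward zero (assumed rule).
def take_sketch(values, indices)
  size = values.size
  indices.map do |i|
    pos = (i.negative? ? i + size : i).to_i
    raise IndexError, "index #{i} out of range" unless (0...size).cover?(pos)

    values[pos]
  end
end

# Hypothetical sketch of `filter`: keep a value only where the flag is
# exactly true; false and nil both drop the value.
def filter_sketch(values, booleans)
  raise ArgumentError, "size mismatch" unless values.size == booleans.size

  values.zip(booleans).select { |_, flag| flag == true }.map(&:first)
end

take_sketch(%w[A B C D E], [0.1, -0.5, -5.1])                # => ["A", "E", "A"]
filter_sketch(%w[A B C D E], [true, false, nil, false, true]) # => ["A", "E"]
```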
````diff
@@ -68,8 +133,8 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
 
 | Method |Boolean|Numeric|String|Options|Remarks|
 | ----------- | --- | --- | --- | --- | --- |
-| ✓ `all
-| ✓ `any
+| ✓ `all?` | ✓ | | | ✓ ScalarAggregate| alias `all` |
+| ✓ `any?` | ✓ | | | ✓ ScalarAggregate| alias `any` |
 | ✓ `approximate_median`| |✓| | ✓ ScalarAggregate| alias `median`|
 | ✓ `count` | ✓ | ✓ | ✓ | ✓ Count | |
 | ✓ `count_distinct`| ✓ | ✓ | ✓ | ✓ Count |alias `count_uniq`|
````
````diff
@@ -99,9 +164,9 @@ double = RedAmber::Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
 [1.0, NaN, -Infinity, Infinity, nil, 0.0]
 
 double.count #=> 5
-double.count(
-double.count(
-double.count(
+double.count(mode: :only_valid) #=> 5, default
+double.count(mode: :only_null) #=> 1
+double.count(mode: :all) #=> 6
 
 boolean = RedAmber::Vector.new([true, true, nil])
 #=>
````
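The three `count` modes differ only in how `nil` is treated: `NaN` and the infinities are ordinary valid values in Arrow, not nulls. A hypothetical plain-Ruby `count_sketch` (not RedAmber's code) reproduces the documented numbers for the same six values:

```ruby
# Hypothetical sketch of the `count` modes; only nil counts as null.
def count_sketch(values, mode: :only_valid)
  case mode
  when :only_valid then values.count { |v| !v.nil? }
  when :only_null  then values.count(&:nil?)
  when :all        then values.size
  else raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

double = [1.0, Float::NAN, -Float::INFINITY, Float::INFINITY, nil, 0.0]
count_sketch(double)                   # => 5
count_sketch(double, mode: :only_null) # => 1
count_sketch(double, mode: :all)       # => 6
```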
````diff
@@ -109,8 +174,8 @@ boolean = RedAmber::Vector.new([true, true, nil])
 [true, true, nil]
 
 boolean.all #=> true
-boolean.all(
-boolean.all(
+boolean.all(skip_nulls: true) #=> true
+boolean.all(skip_nulls: false) #=> false
 ```
 
 ### Unary element-wise: `vector.func => vector`
````
````diff
@@ -144,6 +209,37 @@ boolean.all(opts: {skip_nulls: false}) #=> false
 | ✓ `tan` | | ✓ | | | |
 | ✓ `trunc` | | ✓ | | | |
 
+Examples of options for `#round`;
+
+- `:n-digits` The number of digits to show.
+- `round_mode` Specify rounding mode.
+
+```ruby
+double = RedAmber::Vector.new([15.15, 2.5, 3.5, -4.5, -5.5])
+# => [15.15, 2.5, 3.5, -4.5, -5.5]
+double.round
+# => [15.0, 2.0, 4.0, -4.0, -6.0]
+double.round(mode: :half_to_even)
+# => Default. Same as double.round
+double.round(mode: :towards_infinity)
+# => [16.0, 3.0, 4.0, -5.0, -6.0]
+double.round(mode: :half_up)
+# => [15.0, 3.0, 4.0, -4.0, -5.0]
+double.round(mode: :half_towards_zero)
+# => [15.0, 2.0, 3.0, -4.0, -5.0]
+double.round(mode: :half_towards_infinity)
+# => [15.0, 3.0, 4.0, -5.0, -6.0]
+double.round(mode: :half_to_odd)
+# => [15.0, 3.0, 3.0, -5.0, -5.0]
+
+double.round(n_digits: 0)
+# => Default. Same as double.round
+double.round(n_digits: 1)
+# => [15.2, 2.5, 3.5, -4.5, -5.5]
+double.round(n_digits: -1)
+# => [20.0, 0.0, 0.0, -0.0, -10.0]
+```
+
 ### Binary element-wise: `vector.func(vector) => vector`
 
 
````
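The mode names above appear to be Arrow's rounding modes. As a cross-check of the documented outputs, the plain-Ruby sketch below reproduces each mode for `n_digits: 0`. Three modes map onto Ruby's built-in `Float#round(half: :even | :down | :up)`; the others are written out with `floor` and `ceil`. `round_with_mode` is a hypothetical helper, not RedAmber's implementation, and it ignores `n_digits`.

```ruby
# Hypothetical sketch of the rounding modes for n_digits = 0.
def round_with_mode(x, mode = :half_to_even)
  result =
    case mode
    when :half_to_even          then x.round(half: :even) # ties to even
    when :half_towards_zero     then x.round(half: :down) # ties toward 0
    when :half_towards_infinity then x.round(half: :up)   # ties away from 0
    when :towards_infinity      then x.negative? ? x.floor : x.ceil
    when :half_up, :half_to_odd
      lower = x.floor
      frac = x - lower # fractional part, in [0, 1)
      if frac > 0.5
        lower + 1
      elsif frac < 0.5
        lower
      elsif mode == :half_up
        lower + 1 # ties go toward +infinity
      else
        lower.odd? ? lower : lower + 1 # ties go to the odd neighbour
      end
    else
      raise ArgumentError, "unsupported mode: #{mode.inspect}"
    end
  result.to_f
end

doubles = [15.15, 2.5, 3.5, -4.5, -5.5]
doubles.map { |x| round_with_mode(x, :half_to_odd) } # => [15.0, 3.0, 3.0, -5.0, -5.0]
```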
````diff
@@ -203,6 +299,9 @@ boolean.all(opts: {skip_nulls: false}) #=> false
 vector.tally #=> {NaN=>2}
 vector.value_counts #=> {NaN=>2}
 ```
+### `index(element)`
+
+Returns index of specified element.
 
 ### `sort_indexes`, `sort_indices`, `array_sort_indices`
 
````
````diff
@@ -215,42 +314,68 @@ boolean.all(opts: {skip_nulls: false}) #=> false
 ### [ ] (index functions)
 ### [ ] (other functions)
 
-## Coerce
+## Coerce
+
+```ruby
+vector = RedAmber::Vector.new(1,2,3)
+# =>
+#<RedAmber::Vector(:uint8, size=3):0x00000000000decc4>
+[1, 2, 3]
+
+# Vector's `#*` method
+vector * -1
+# =>
+#<RedAmber::Vector(:int16, size=3):0x00000000000e3698>
+[-1, -2, -3]
+
+# coerced calculation
+-1 * vector
+# =>
+#<RedAmber::Vector(:int16, size=3):0x00000000000ea4ac>
+[-1, -2, -3]
+
+# `@-` operator
+-vector
+# =>
+#<RedAmber::Vector(:uint8, size=3):0x00000000000ee7b4>
+[255, 254, 253]
+```
 
 ## Update vector's value
-### `
+### `replace(specifier, replacer)` => vector
 
-- Accepts Vector, Array, Arrow::Array
-
--
--
-
+- Accepts Scalar, Range of Integer, Vector, Array, Arrow::Array as a specifier
+- Accepts Scalar, Vector, Array and Arrow::Array as a replacer.
+- Boolean specifiers specify the position of replacer in true.
+- Index specifiers specify the position of replacer in indices.
+- replacer specifies the values to be replaced.
+- The number of true in booleans must be equal to the length of replacer
 
 ```ruby
 vector = RedAmber::Vector.new([1, 2, 3])
 booleans = [true, false, true]
-
-vector.
+replacer = [4, 5]
+vector.replace(booleans, replacer)
 # =>
 #<RedAmber::Vector(:uint8, size=3):0x000000000001ee10>
 [4, 2, 5]
 ```
 
-- Scalar value in
+- Scalar value in replacer can be broadcasted.
 
 ```ruby
-
-vector.
+replacer = 0
+vector.replace(booleans, replacer)
 # =>
 #<RedAmber::Vector(:uint8, size=3):0x000000000001ee10>
 [0, 2, 0]
 ```
 
-- Returned data type is automatically up-casted by
+- Returned data type is automatically up-casted by replacer.
 
 ```ruby
-
-vector.
+replacer = 1.0
+vector.replace(booleans, replacer)
 # =>
 #<RedAmber::Vector(:double, size=3):0x0000000000025d78>
 [1.0, 2.0, 1.0]
````
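`-1 * vector` works because of Ruby's `coerce` protocol: `Integer#*` does not know how to multiply by the right operand, so it calls `vector.coerce(-1)` and retries the operation on the returned pair. The toy `NumVec` class below is hypothetical and only illustrates that protocol plus the unary `-@` operator. It stores plain Ruby Integers, so `-vector` yields `[-1, -2, -3]`; the `[255, 254, 253]` shown above comes from negating values of Arrow's unsigned `uint8` type, which wraps around modulo 256.

```ruby
# Hypothetical toy class illustrating Ruby's coerce protocol.
class NumVec
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # vector * scalar or vector * vector, element-wise.
  def *(other)
    others = other.is_a?(NumVec) ? other.values : Array.new(values.size, other)
    NumVec.new(values.zip(others).map { |a, b| a * b })
  end

  # Unary minus (`-vector`) is the `-@` method.
  def -@
    NumVec.new(values.map(&:-@))
  end

  # Called by Integer#* when the right operand is a NumVec: wrap the
  # scalar in a NumVec so that `scalar * vec` becomes `NumVec * vec`.
  def coerce(other)
    [NumVec.new(Array.new(values.size, other)), self]
  end
end

v = NumVec.new([1, 2, 3])
(v * -1).values # => [-1, -2, -3]
(-1 * v).values # => [-1, -2, -3]
(-v).values     # => [-1, -2, -3]
```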
````diff
@@ -260,29 +385,29 @@ vector.replace_with(booleans, replacement)
 
 ```ruby
 booleans = [true, false, nil]
-
-vec.
+replacer = -1
+vec.replace(booleans, replacer)
 =>
 #<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
 [-1, 2, nil]
 ```
 
--
+- replacer can have nil in it.
 
 ```ruby
 booleans = [true, false, true]
-
-vec.
+replacer = [nil]
+vec.replace(booleans, replacer)
 =>
 #<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
 [nil, 2, nil]
 ```
 
-- If no
+- If no replacer specified, it is same as to specify nil.
 
 ```ruby
 booleans = [true, false, true]
-vec.
+vec.replace(booleans)
 =>
 #<RedAmber::Vector(:int8, size=3):0x00000000000304d0>
 [nil, 2, nil]
````
````diff
@@ -292,12 +417,27 @@ vec.replace_with(booleans)
 
 ```ruby
 vector = RedAmber::Vector.new(['A', 'B', 'NA'])
-vector.
+vector.replace(vector == 'NA', nil)
 # =>
 #<RedAmber::Vector(:string, size=3):0x000000000000f8ac>
 ["A", "B", nil]
 ```
 
+- Specifier in indices.
+
+Specified indices are used 'as sorted'. Position in indices and replacer may not have correspondence.
+
+```ruby
+vector = RedAmber::Vector.new([1, 2, 3])
+indices = [2, 1]
+replacer = [4, 5]
+vector.replace(indices, replacer)
+# =>
+#<RedAmber::Vector(:uint8, size=3):0x000000000000f244>
+[1, 4, 5] # not [1, 5, 4]
+```
+
+
 ### `fill_nil_forward`, `fill_nil_backward` => vector
 
 Propagate the last valid observation forward (or backward).
````
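The `replace` rules documented in the hunks above can be condensed into a plain-Ruby sketch. `replace_sketch` is hypothetical and ignores Arrow types and up-casting. Two behaviours are assumptions chosen to match the documented outputs: a one-element replacer array is broadcast (as in the `[nil]` example), and a `nil` in a boolean specifier produces `nil` at that position (as in the `[-1, 2, nil]` example, assuming the input there was `[1, 2, 3]`).

```ruby
# Hypothetical sketch of `replace(specifier, replacer)` on Ruby Arrays.
def replace_sketch(values, specifier, replacer = nil)
  spec = Array(specifier) # an Integer or a Range of Integer also works
  boolean_spec = !spec.empty? && spec.all? { |s| [true, false, nil].include?(s) }

  positions =
    if boolean_spec
      raise ArgumentError, "mask size must equal vector size" unless spec.size == values.size

      spec.each_index.select { |i| spec[i] == true }
    else
      spec.sort # index specifiers are applied "as sorted"
    end

  replacements =
    if !replacer.is_a?(Array)
      Array.new(positions.size, replacer) # scalar or nil is broadcast
    elsif replacer.size == 1
      Array.new(positions.size, replacer.first) # assumed broadcast
    else
      replacer
    end
  raise ArgumentError, "replacer size mismatch" unless replacements.size == positions.size

  result = values.dup
  positions.zip(replacements) { |pos, rep| result[pos] = rep }
  # Assumption: a nil in a boolean mask yields nil at that position.
  spec.each_with_index { |s, i| result[i] = nil if s.nil? } if boolean_spec
  result
end

replace_sketch([1, 2, 3], [true, false, true], [4, 5]) # => [4, 2, 5]
replace_sketch([1, 2, 3], [2, 1], [4, 5])              # => [1, 4, 5]
```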
````diff
@@ -315,3 +455,73 @@ integer.fill_nil_backward
 #<RedAmber::Vector(:uint8, size=5):0x000000000000f974>
 [0, 1, 3, 3, nil]
 ```
+
+### `boolean_vector.if_else(true_choice, false_choice)` => vector
+
+Choose values based on self. Self must be a boolean Vector.
+
+`true_choice`, `false_choice` must be of the same type scalar / array / Vector.
+`nil` values in `cond` will be promoted to the output.
+
+This example will normalize negative indices to positive ones.
+
+```ruby
+indices = RedAmber::Vector.new([1, -1, 3, -4])
+array_size = 10
+normalized_indices = (indices < 0).if_else(indices + array_size, indices)
+
+# =>
+#<RedAmber::Vector(:int16, size=4):0x000000000000f85c>
+[1, 9, 3, 6]
+```
+
+### `is_in(values)` => boolean vector
+
+For each element in self, return true if it is found in given `values`, false otherwise.
+By default, nulls are matched against the value set. (This will be changed in SetLookupOptions: not impremented.)
+
+```ruby
+vector = RedAmber::Vector.new %W[A B C D]
+values = ['A', 'C', 'X']
+vector.is_in(values)
+
+# =>
+#<RedAmber::Vector(:boolean, size=4):0x000000000000f2a8>
+[true, false, true, false]
+```
+
+`values` are casted to the same Class of Vector.
+
+```ruby
+vector = RedAmber::Vector.new([1, 2, 255])
+vector.is_in(1, -1)
+
+# =>
+#<RedAmber::Vector(:boolean, size=3):0x000000000000f320>
+[true, false, true]
+```
+
+### `shift(amount = 1, fill: nil)`
+
+Shift vector's values by specified `amount`. Shifted space is filled by value `fill`.
+
+```ruby
+vector = RedAmber::Vector.new([1, 2, 3, 4, 5])
+vector.shift
+
+# =>
+#<RedAmber::Vector(:uint8, size=5):0x00000000000072d8>
+[nil, 1, 2, 3, 4]
+
+vector.shift(-2)
+
+# =>
+#<RedAmber::Vector(:uint8, size=5):0x0000000000009970>
+[3, 4, 5, nil, nil]
+
+vector.shift(fill: Float::NAN)
+
+# =>
+#<RedAmber::Vector(:double, size=5):0x0000000000011d3c>
+[NaN, 1.0, 2.0, 3.0, 4.0]
+```
````
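Plain-Ruby sketches of the three methods added in this last hunk follow. `if_else_sketch`, `is_in_sketch` and `shift_sketch` are hypothetical helpers on Ruby Arrays, not RedAmber's code. They do not model Arrow types: the second `is_in` example above relies on `-1` being cast to the vector's `uint8` type, where it becomes 255, and `shift(fill: Float::NAN)` turns the vector into `double`; the sketches skip both conversions.

```ruby
# if_else: pick from true_choice or false_choice by position (a scalar
# choice is broadcast); a nil condition gives nil, as documented.
def if_else_sketch(conditions, true_choice, false_choice)
  pick = ->(choice, i) { choice.is_a?(Array) ? choice[i] : choice }
  conditions.each_with_index.map do |cond, i|
    case cond
    when true  then pick.call(true_choice, i)
    when false then pick.call(false_choice, i)
    end # any other value (nil) falls through to nil
  end
end

# is_in: membership test against a value set; nil matches a nil in the
# set, which is the documented default.
def is_in_sketch(values, set)
  lookup = Array(set)
  values.map { |v| lookup.include?(v) }
end

# shift: move values by `amount` positions (negative shifts left) and
# fill the vacated slots with `fill`.
def shift_sketch(values, amount = 1, fill: nil)
  n = values.size
  k = [amount.abs, n].min
  if amount.positive?
    Array.new(k, fill) + values[0, n - k]
  else
    values[k, n - k] + Array.new(k, fill) # amount == 0 gives a copy
  end
end

indices = [1, -1, 3, -4]
if_else_sketch(indices.map { |i| i < 0 }, indices.map { |i| i + 10 }, indices)
# => [1, 9, 3, 6]
```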