red_amber 0.3.0 → 0.4.0
- checksums.yaml +4 -4
- data/.rubocop.yml +39 -20
- data/.yardopts +2 -0
- data/CHANGELOG.md +113 -0
- data/Gemfile +1 -1
- data/LICENSE +1 -1
- data/README.md +25 -26
- data/benchmark/basic.yml +2 -2
- data/benchmark/combine.yml +2 -2
- data/benchmark/dataframe.yml +2 -2
- data/benchmark/group.yml +2 -2
- data/benchmark/reshape.yml +2 -2
- data/benchmark/vector.yml +3 -0
- data/doc/DataFrame.md +32 -12
- data/doc/DataFrame_Comparison.md +65 -0
- data/doc/SubFrames.md +11 -0
- data/doc/Vector.md +207 -1
- data/doc/yard-templates/default/fulldoc/html/css/common.css +6 -0
- data/lib/red_amber/data_frame.rb +429 -75
- data/lib/red_amber/data_frame_combinable.rb +516 -66
- data/lib/red_amber/data_frame_displayable.rb +244 -14
- data/lib/red_amber/data_frame_indexable.rb +121 -18
- data/lib/red_amber/data_frame_loadsave.rb +78 -10
- data/lib/red_amber/data_frame_reshaping.rb +184 -14
- data/lib/red_amber/data_frame_selectable.rb +622 -66
- data/lib/red_amber/data_frame_variable_operation.rb +446 -34
- data/lib/red_amber/group.rb +187 -22
- data/lib/red_amber/helper.rb +70 -10
- data/lib/red_amber/refinements.rb +12 -5
- data/lib/red_amber/subframes.rb +1066 -0
- data/lib/red_amber/vector.rb +385 -11
- data/lib/red_amber/vector_aggregation.rb +312 -0
- data/lib/red_amber/vector_binary_element_wise.rb +387 -0
- data/lib/red_amber/vector_selectable.rb +217 -12
- data/lib/red_amber/vector_unary_element_wise.rb +436 -0
- data/lib/red_amber/vector_updatable.rb +278 -34
- data/lib/red_amber/version.rb +2 -1
- data/lib/red_amber.rb +13 -1
- data/red_amber.gemspec +2 -2
- metadata +13 -8
- data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
- data/lib/red_amber/vector_functions.rb +0 -242
@@ -0,0 +1,65 @@
+# Comparison of DataFrames
+
+Compare basic features of RedAmber with Python
+[pandas](https://pandas.pydata.org/),
+R [Tidyverse](https://www.tidyverse.org/) and
+Julia [DataFrames.jl](https://dataframes.juliadata.org/stable/).
+
+## Select columns (variables)
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Select columns as a dataframe | pick, drop, [] | dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select |
+| Select a column as a vector | [], v | dplyr::pull, [, x] | [], loc[], iloc[] | [!, :x] |
+| Move columns to a new position | pick, [] | relocate | [], reindex, loc[], iloc[] | select, transform |
+
+## Select rows (records, observations)
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Select rows that meet logical criteria as a dataframe | slice, remove, [] | dplyr::filter | [], filter, query, loc[] | filter |
+| Select rows by position as a dataframe | slice, remove, [] | dplyr::slice | iloc[], drop | subset |
+| Move rows to a new position | slice, [] | dplyr::filter, dplyr::slice | reindex, loc[], iloc[] | permute |
+
+## Update columns / create new columns
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Update existing columns | assign | dplyr::mutate | assign, []= | mapcols |
+| Create new columns | assign, assign_left | dplyr::mutate | apply | insertcols, .+ |
+| Compute new columns, drop others | new | transmute | (dfply:)transmute | transform, insertcols, mapcols |
+| Rename columns | rename | dplyr::rename, dplyr::rename_with, purrr::set_names | rename, set_axis | rename |
+| Sort dataframe | sort | dplyr::arrange | sort_values | sort |
+
+## Reshape dataframe
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Gather columns into rows (create a longer dataframe) | to_long | tidyr::pivot_longer | melt | stack |
+| Spread rows into columns (create a wider dataframe) | to_wide | tidyr::pivot_wider | pivot | unstack |
+| Transpose a wide dataframe | transpose | transpose, t | transpose, T | permutedims |
+
+## Grouping
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Grouping | group, group.summarize | dplyr::group_by %>% dplyr::summarise | groupby.agg | combine, groupby |
+
+## Combine dataframes or tables
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Combine additional columns | merge, bind_cols | dplyr::bind_cols | concat | combine |
+| Combine additional rows | concatenate, concat, bind_rows | dplyr::bind_rows | concat | transform |
+| Join right to left, leaving only the matching rows | join, inner_join | dplyr::inner_join | merge | innerjoin |
+| Join right to left, leaving all rows | join, full_join, outer_join | dplyr::full_join | merge | outerjoin |
+| Join matching values to left from right | join, left_join | dplyr::left_join | merge | leftjoin |
+| Join matching values from left to right | join, right_join | dplyr::right_join | merge | rightjoin |
+| Return rows of left that have a match in right | join, semi_join | dplyr::semi_join | [isin] | semijoin |
+| Return rows of left that do not have a match in right | join, anti_join | dplyr::anti_join | [isin] | antijoin |
+| Collect rows that appear in left or right | union | dplyr::union | merge | |
+| Collect rows that appear in both left and right | intersect | dplyr::intersect | merge | |
+| Collect rows that appear in left but not right | difference, setdiff | dplyr::setdiff | merge | |
+
+
+
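The join rows of the "Combine dataframes or tables" table differ only in which rows survive. As a rough illustration, the sketch below models three of them (inner, semi and anti join) in plain Ruby on arrays of hashes. The helper names, the sample data and the `:id` key are assumptions made for this example; none of this is RedAmber's implementation or API.

```ruby
# Plain-Ruby model of three join flavours on arrays of hashes.
# Helper names and data are hypothetical, chosen only for illustration.
left  = [{ id: 1, x: 'a' }, { id: 2, x: 'b' }, { id: 3, x: 'c' }]
right = [{ id: 2, y: 'B' }, { id: 3, y: 'C' }, { id: 4, y: 'D' }]

# Inner join: keep only matching rows, with the columns of both sides.
def inner_join_sketch(left, right, key)
  left.flat_map do |l|
    right.select { |r| r[key] == l[key] }.map { |r| l.merge(r) }
  end
end

# Semi join: rows of left that have a match in right (left columns only).
def semi_join_sketch(left, right, key)
  keys = right.map { |r| r[key] }
  left.select { |l| keys.include?(l[key]) }
end

# Anti join: rows of left that do not have a match in right.
def anti_join_sketch(left, right, key)
  keys = right.map { |r| r[key] }
  left.reject { |l| keys.include?(l[key]) }
end

inner = inner_join_sketch(left, right, :id)
# => [{ id: 2, x: 'b', y: 'B' }, { id: 3, x: 'c', y: 'C' }]
semi  = semi_join_sketch(left, right, :id)  # => rows with id 2 and 3
anti  = anti_join_sketch(left, right, :id)  # => row with id 1
```

The semi and anti joins never add columns from the right side, which is why the table maps them to `isin`-style filtering in pandas.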
data/doc/SubFrames.md
ADDED
@@ -0,0 +1,11 @@
+# SubFrames
+
+`SubFrames` represents a collection of subsets of a DataFrame.
+It holds an Array of indices, `#subset_indices`, from which an Array of sub-DataFrames can be created.
+The concept covers the `group` operation of a DataFrame and rolling-window operations, and has broader capabilities.
+
+This feature is experimental. It may be removed or changed in the future.
+
+## Create SubFrames
+
+## Properties of SubFrames
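The SubFrames hunk describes subsets as lists of row indices that are turned into sub-DataFrames on demand. A minimal plain-Ruby sketch of that idea follows. It uses an Array in place of a DataFrame, and every name in it (`keys`, `group_indices`, `window_indices`, `subsets`) is an assumption for illustration, not RedAmber code.

```ruby
# Plain-Ruby model of the SubFrames idea: keep index lists, build subsets on demand.
# All names here are hypothetical; this is not RedAmber's implementation.
keys = %w[a b a c b a] # one key per row of an imaginary table

# "group"-style subsets: one index list per distinct key, in order of first appearance.
group_indices = keys.each_index.group_by { |i| keys[i] }.values
# => [[0, 2, 5], [1, 4], [3]]

# Rolling-window subsets: every run of 3 consecutive rows.
window_indices = (0..keys.size - 3).map { |start| (start...start + 3).to_a }
# => [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]

# Materialize the subsets from the rows and any list of index lists.
def subsets(rows, subset_indices)
  subset_indices.map { |indices| rows.values_at(*indices) }
end

grouped = subsets(keys, group_indices) # => [["a", "a", "a"], ["b", "b"], ["c"]]
```

Grouping and rolling windows differ only in how the index lists are produced, which is the sense in which the concept is broader than `group` alone.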
data/doc/Vector.md
CHANGED
@@ -182,6 +182,31 @@ boolean.all(skip_nulls: true) #=> true
 boolean.all(skip_nulls: false) #=> false
 ```
 
+### Check if `function` is an aggregation function: `Vector.aggregate?(function)`
+
+Return true if `function` is a unary aggregation function. Otherwise return false.
+
+### Treat aggregation function as an element-wise function: `propagate(function)`
+
+Spread the return value of an aggregate function as if it is an element-wise function.
+
+```ruby
+vec = Vector.new(1, 2, 3, 4)
+vec.propagate(:mean)
+# =>
+#<RedAmber::Vector(:double, size=4):0x000000000001985c>
+[2.5, 2.5, 2.5, 2.5]
+```
+
+`#propagate` also accepts a block to compute with a customized aggregation function yielding a scalar.
+
+```ruby
+vec.propagate { |v| v.mean.round }
+# =>
+#<RedAmber::Vector(:uint8, size=4):0x000000000000cb98>
+[3, 3, 3, 3]
+```
+
 ### Unary element-wise: `vector.func => vector`
 
 [image: unary element-wise]
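The `propagate` behavior added in this hunk can be modeled without RedAmber: compute one aggregate, then repeat it so the result is as long as the input. The sketch below does that on a plain Array. `propagate_sketch` is a hypothetical helper, and it uses `Array#sum` because a plain Array has no `mean` method.

```ruby
# Plain-Ruby model of "propagate": compute one aggregate, then repeat it
# so the result has the same length as the input.
# `propagate_sketch` is a hypothetical helper, not part of RedAmber.
def propagate_sketch(values, function = nil, &block)
  raise ArgumentError, 'give a method name or a block' unless function || block

  scalar = block ? block.call(values) : values.public_send(function)
  Array.new(values.size, scalar)
end

data = [1, 2, 3, 4]
by_name  = propagate_sketch(data, :sum)                        # => [10, 10, 10, 10]
by_block = propagate_sketch(data) { |a| a.sum.to_f / a.size }  # => [2.5, 2.5, 2.5, 2.5]
```

Because the result has one value per element, it can be combined element-wise with the original data, for example to subtract a mean from every element.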
@@ -305,7 +330,7 @@ double.round(n_digits: -1)
 
 Returns index of specified element.
 
-### `quantiles(probs = [
+### `quantiles(probs = [0.0, 0.25, 0.5, 0.75, 1.0], interpolation: :linear, skip_nils: true, min_count: 0)`
 
 Returns quantiles for specified probabilities in a DataFrame.
 
@@ -601,3 +626,184 @@ vector.merge(other, sep: '')
 #<RedAmber::Vector(:string, size=3):0x0000000000038b80>
 ["ab", "cd", "ef"]
 ```
+
+### `concatenate(other)` or `concat(other)`
+
+Concatenate another array-like object to self and return the concatenated Vector.
+- `other` is one of `Vector`, `Array`, `Arrow::Array` or `Arrow::ChunkedArray`
+- A different type will be 'resolved' to the type of self, as the examples show.
+
+Concatenate to string
+```ruby
+string_vector
+
+# =>
+#<RedAmber::Vector(:string, size=2):0x00000000000037b4>
+["A", "B"]
+
+string_vector.concatenate([1, 2])
+
+# =>
+#<RedAmber::Vector(:string, size=4):0x0000000000003818>
+["A", "B", "1", "2"]
+```
+
+Concatenate to integer
+
+```ruby
+integer_vector
+
+# =>
+#<RedAmber::Vector(:uint8, size=2):0x000000000000382c>
+[1, 2]
+
+integer_vector.concatenate(["A", "B"])
+# =>
+#<RedAmber::Vector(:uint8, size=4):0x0000000000003840>
+[1, 2, 65, 66]
+```
+
+### `rank`
+
+Returns numerical rank of self.
+- Nil values are considered greater than any value.
+- NaN values are considered greater than any value but smaller than nil values.
+- Tiebreakers are ranked in order of appearance.
+- `RankOptions` in C++ function is not implemented in C GLib yet.
+  This method is currently fixed to the default behavior.
+
+Returns 0-based rank of self (0...size in range) as a Vector.
+
+Rank of float Vector
+```ruby
+fv = Vector.new(0.1, nil, Float::NAN, 0.2, 0.1); fv
+# =>
+#<RedAmber::Vector(:double, size=5):0x000000000000c65c>
+[0.1, nil, NaN, 0.2, 0.1]
+
+fv.rank
+# =>
+#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+[0, 4, 3, 2, 1]
+```
+
+Rank of string Vector
+```ruby
+sv = Vector.new("A", "B", nil, "A", "C"); sv
+# =>
+#<RedAmber::Vector(:string, size=5):0x0000000000003854>
+["A", "B", nil, "A", "C"]
+
+sv.rank
+# =>
+#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+[0, 2, 4, 1, 3]
+```
+
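The ranking rules in the `rank` section (nil greatest, NaN next, ties broken by order of appearance, 0-based result) can be checked against the two documented outputs with a small plain-Ruby model. `rank_sketch` is a hypothetical helper that works on an Array; it is not how RedAmber or Arrow computes ranks.

```ruby
# Plain-Ruby model of the ranking rules listed above (not RedAmber's implementation).
# `rank_sketch` is a hypothetical helper operating on a plain Array.
def rank_sketch(values)
  # Sort key is [class, value, position]:
  #   class 0 = ordinary value, 1 = NaN, 2 = nil,
  # so NaN sorts after every value and nil sorts after NaN.
  # The position breaks ties in order of appearance.
  keyed = values.each_with_index.map do |v, i|
    if v.nil?
      [2, 0, i]
    elsif v.is_a?(Float) && v.nan?
      [1, 0, i]
    else
      [0, v, i]
    end
  end

  order = keyed.sort.map(&:last) # original positions, in ranked order
  ranks = Array.new(values.size)
  order.each_with_index { |position, rank| ranks[position] = rank }
  ranks
end

float_ranks  = rank_sketch([0.1, nil, Float::NAN, 0.2, 0.1]) # => [0, 4, 3, 2, 1]
string_ranks = rank_sketch(['A', 'B', nil, 'A', 'C'])        # => [0, 2, 4, 1, 3]
```

Both results match the outputs shown for `fv.rank` and `sv.rank` in the hunk.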
+### `sample(integer_or_proportion)`
+
+Pick up elements at random.
+
+#### `sample` : without argument
+
+Return a randomly selected element.
+This is one of the aggregation functions.
+
+```ruby
+v = Vector.new('A'..'H'); v
+# =>
+#<RedAmber::Vector(:string, size=8):0x0000000000011b20>
+["A", "B", "C", "D", "E", "F", "G", "H"]
+
+v.sample
+# =>
+"C"
+```
+
+#### `sample(n)` : n as an Integer
+
+Pick up n elements at random.
+
+- Param `n` is the number of elements to pick.
+- `n` is a positive Integer.
+- If `n` is smaller than or equal to `size`, elements are picked without repetition.
+- If `n` is greater than `size`, elements are picked repeatedly.
+- Returns sampled elements as a Vector.
+- If `n == 1` (in case of `sample(1)`), it returns a Vector of `size == 1`, not a scalar.
+
+```ruby
+v.sample(1)
+# =>
+#<RedAmber::Vector(:string, size=1):0x000000000001a3b0>
+["H"]
+```
+
+Sample same size of self: every element is picked in random order.
+
+```ruby
+v.sample(8)
+# =>
+#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+["H", "D", "B", "F", "E", "A", "G", "C"]
+```
+
+Over sampling: "E", "A" and "H" are sampled repeatedly.
+
+```ruby
+v.sample(9)
+# =>
+#<RedAmber::Vector(:string, size=9):0x000000000001d790>
+["E", "E", "A", "D", "H", "C", "A", "F", "H"]
+```
+
+#### `sample(prop)` : prop as a Float
+
+Pick up elements by proportion `prop` at random.
+
+- `prop` is the proportion of elements to pick.
+- `prop` is a positive Float.
+- Absolute number of elements to pick: `prop*size` is rounded (by `half: :up`).
+- If `prop` is smaller than or equal to 1.0, elements are picked without repetition.
+- If `prop` is greater than 1.0, some elements are picked repeatedly.
+- Returns sampled elements as a Vector.
+- If only one element is picked, it returns a Vector of `size == 1`, not a scalar.
+
+Sample same size of self: every element is picked in random order.
+
+```ruby
+v.sample(1.0)
+# =>
+#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+["D", "H", "F", "C", "A", "B", "E", "G"]
+```
+
+2 times over sampling.
+
+```ruby
+v.sample(2.0)
+# =>
+#<RedAmber::Vector(:string, size=16):0x00000000000233e8>
+["H", "B", "C", "B", "C", "A", "F", "A", "E", "C", "H", "F", "F", "A", ... ]
+```
+
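The three calling conventions of `sample` (no argument, Integer count, Float proportion) can be summarized in a plain-Ruby model. `sample_sketch` is a hypothetical helper on an Array. Drawing with replacement when more elements are requested than exist is an assumption of this sketch; the documentation only says elements are "picked repeatedly".

```ruby
# Plain-Ruby model of the sampling rules above (hypothetical helper, not RedAmber code).
def sample_sketch(values, n_or_prop = nil, random: Random.new)
  # Without an argument: return a single element (a scalar), like an aggregation.
  return values.sample(random: random) if n_or_prop.nil?
  raise ArgumentError, 'argument must be positive' unless n_or_prop.positive?

  n = case n_or_prop
      when Integer then n_or_prop
      # Float#round uses `half: :up` by default.
      when Float then (n_or_prop * values.size).round
      else raise ArgumentError, 'expected an Integer or a Float'
      end

  if n <= values.size
    values.sample(n, random: random)               # no element is repeated
  else
    Array.new(n) { values.sample(random: random) } # assumption: draw with replacement
  end
end

letters = ('A'..'H').to_a
one      = sample_sketch(letters)      # a single String
shuffled = sample_sketch(letters, 8)   # all 8 elements in random order
doubled  = sample_sketch(letters, 2.0) # 16 elements, some repeated
```

As in the documented behavior, `sample_sketch(letters, 1)` returns a one-element Array rather than a scalar.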
+### `sort(order = :ascending)`
+
+Arrange values in Vector.
+
+- `:+`, `:ascending` or without argument will sort in increasing order.
+- `:-` or `:descending` will sort in decreasing order.
+
+```ruby
+Vector.new(%w[B D A E C]).sort
+# same as #sort(:+)
+# same as #sort(:ascending)
+# =>
+#<RedAmber::Vector(:string, size=5):0x000000000000c134>
+["A", "B", "C", "D", "E"]
+
+Vector.new(%w[B D A E C]).sort(:-)
+# same as #sort(:descending)
+# =>
+#<RedAmber::Vector(:string, size=5):0x000000000000c148>
+["E", "D", "C", "B", "A"]
+```
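The mapping from the order symbols to a sort direction can be sketched in plain Ruby as follows. `sort_sketch` is a hypothetical helper on an Array. Rejecting unknown symbols with an `ArgumentError` is a choice made for this sketch, not documented RedAmber behavior.

```ruby
# Plain-Ruby model of the order symbols accepted by `sort` (hypothetical helper).
def sort_sketch(values, order = :ascending)
  sorted = values.sort
  case order
  when :+, :ascending then sorted
  when :-, :descending then sorted.reverse
  else raise ArgumentError, "unknown order: #{order.inspect}" # assumption of this sketch
  end
end

ascending  = sort_sketch(%w[B D A E C])     # => ["A", "B", "C", "D", "E"]
descending = sort_sketch(%w[B D A E C], :-) # => ["E", "D", "C", "B", "A"]
```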