red_amber 0.3.0 → 0.4.0
- checksums.yaml +4 -4
- data/.rubocop.yml +39 -20
- data/.yardopts +2 -0
- data/CHANGELOG.md +113 -0
- data/Gemfile +1 -1
- data/LICENSE +1 -1
- data/README.md +25 -26
- data/benchmark/basic.yml +2 -2
- data/benchmark/combine.yml +2 -2
- data/benchmark/dataframe.yml +2 -2
- data/benchmark/group.yml +2 -2
- data/benchmark/reshape.yml +2 -2
- data/benchmark/vector.yml +3 -0
- data/doc/DataFrame.md +32 -12
- data/doc/DataFrame_Comparison.md +65 -0
- data/doc/SubFrames.md +11 -0
- data/doc/Vector.md +207 -1
- data/doc/yard-templates/default/fulldoc/html/css/common.css +6 -0
- data/lib/red_amber/data_frame.rb +429 -75
- data/lib/red_amber/data_frame_combinable.rb +516 -66
- data/lib/red_amber/data_frame_displayable.rb +244 -14
- data/lib/red_amber/data_frame_indexable.rb +121 -18
- data/lib/red_amber/data_frame_loadsave.rb +78 -10
- data/lib/red_amber/data_frame_reshaping.rb +184 -14
- data/lib/red_amber/data_frame_selectable.rb +622 -66
- data/lib/red_amber/data_frame_variable_operation.rb +446 -34
- data/lib/red_amber/group.rb +187 -22
- data/lib/red_amber/helper.rb +70 -10
- data/lib/red_amber/refinements.rb +12 -5
- data/lib/red_amber/subframes.rb +1066 -0
- data/lib/red_amber/vector.rb +385 -11
- data/lib/red_amber/vector_aggregation.rb +312 -0
- data/lib/red_amber/vector_binary_element_wise.rb +387 -0
- data/lib/red_amber/vector_selectable.rb +217 -12
- data/lib/red_amber/vector_unary_element_wise.rb +436 -0
- data/lib/red_amber/vector_updatable.rb +278 -34
- data/lib/red_amber/version.rb +2 -1
- data/lib/red_amber.rb +13 -1
- data/red_amber.gemspec +2 -2
- metadata +13 -8
- data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
- data/lib/red_amber/vector_functions.rb +0 -242
@@ -0,0 +1,65 @@
+# Comparison of DataFrames
+
+Compare basic features of RedAmber with Python
+[pandas](https://pandas.pydata.org/),
+R [Tidyverse](https://www.tidyverse.org/) and
+Julia [Dataframes](https://dataframes.juliadata.org/stable/).
+
+## Select columns (variables)
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Select columns as a dataframe | pick, drop, [] | dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select |
+| Select a column as a vector | [], v | dplyr::pull, [, x] | [], loc[], iloc[] | [!, :x] |
+| Move columns to a new position | pick, [] | relocate | [], reindex, loc[], iloc[] | select,transform |
+
+## Select rows (records, observations)
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Select rows that meet logical criteria as a dataframe | slice, remove, [] | dplyr::filter | [], filter, query, loc[] | filter |
+| Select rows by position as a dataframe | slice, remove, [] | dplyr::slice | iloc[], drop | subset |
+| Move rows to a new position | slice, [] | dplyr::filter, dplyr::slice | reindex, loc[], iloc[] | permute |
+
+## Update columns / create new columns
+
+|Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Update existing columns | assign | dplyr::mutate | assign, []= | mapcols |
+| Create new columns | assign, assign_left | dplyr::mutate | apply | insertcols,.+ |
+| Compute new columns, drop others | new | transmute | (dfply:)transmute | transform,insertcols,mapcols |
+| Rename columns | rename | dplyr::rename, dplyr::rename_with, purrr::set_names | rename, set_axis | rename |
+| Sort dataframe | sort | dplyr::arrange | sort_values | sort |
+
+## Reshape dataframe
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Gather columns into rows (create a longer dataframe) | to_long | tidyr::pivot_longer | melt | stack |
+| Spread rows into columns (create a wider dataframe) | to_wide | tidyr::pivot_wider | pivot | unstack |
+| transpose a wide dataframe | transpose | transpose, t | transpose, T | permutedims |
+
+## Grouping
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+|Grouping | group, group.summarize | dplyr::group_by %>% dplyr::summarise | groupby.agg | combine,groupby |
+
+## Combine dataframes or tables
+
+| Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+|--- |--- |--- |--- |--- |
+| Combine additional columns | merge, bind_cols | dplyr::bind_cols | concat | combine |
+| Combine additional rows | concatenate, concat, bind_rows | dplyr::bind_rows | concat | transform |
+| Join right to left, leaving only the matching rows| join, inner_join | dplyr::inner_join | merge | innerjoin |
+| Join right to left, leaving all rows | join, full_join, outer_join | dplyr::full_join | merge | outerjoin |
+| Join matching values to left from right | join, left_join | dplyr::left_join | merge | leftjoin |
+| Join matching values from left to right | join, right_join | dplyr::right_join | merge | rightjoin |
+| Return rows of left that have a match in right | join, semi_join | dplyr::semi_join | [isin] | semijoin |
+| Return rows of left that do not have a match in right | join, anti_join | dplyr::anti_join | [isin] | antijoin |
+| Collect rows that appear in left or right | union | dplyr::union | merge | |
+| Collect rows that appear in both left and right | intersect | dplyr::intersect | merge | |
+| Collect rows that appear in left but not right | difference, setdiff | dplyr::setdiff | merge | |
+
+
+
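As a rough illustration of the row-level semantics named in the "Combine dataframes or tables" table, the sketch below uses plain Ruby arrays of hashes. It does not call RedAmber, dplyr, pandas or DataFrames.jl; the sample rows, the `:id` key column and all variable names are assumptions made up for this example.

```ruby
# Plain-Ruby sketch (no RedAmber calls): each Hash stands in for one row.
left  = [{ id: 1, x: "a" }, { id: 2, x: "b" }, { id: 3, x: "c" }]
right = [{ id: 2, x: "b" }, { id: 3, x: "z" }, { id: 4, x: "d" }]

# Set operations compare whole rows.
union        = left | right   # rows that appear in left or right
intersection = left & right   # rows that appear in both
difference   = left - right   # rows in left but not in right

# Semi/anti joins compare only the key column (:id is an assumed key).
right_ids = right.map { |row| row[:id] }
semi = left.select { |row| right_ids.include?(row[:id]) } # left rows with a match
anti = left.reject { |row| right_ids.include?(row[:id]) } # left rows without a match

p union.size, intersection, difference, semi, anti
```

The distinction worth noticing is that the set operations match on entire rows, while the semi and anti joins match on key values only, so the row with `id: 3` counts as matched in the joins but not in the intersection.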
data/doc/SubFrames.md
ADDED
@@ -0,0 +1,11 @@
+# SubFrames
+
+`SubFrames` represents a collection of subsets of a DataFrame.
+It holds an Array of indices, `#subset_indices`, from which an Array of sub-DataFrames can be created.
+The concept covers the `group` operation of a DataFrame and rolling window operations, and has broader capabilities.
+
+This feature is experimental. It may be removed or changed in the future.
+
+## Create SubFrames
+
+## Properties of SubFrames
data/doc/Vector.md
CHANGED
@@ -182,6 +182,31 @@ boolean.all(skip_nulls: true) #=> true
 boolean.all(skip_nulls: false) #=> false
 ```
 
+### Check if `function` is an aggregation function: `Vector.aggregate?(function)`
+
+Return true if `function` is a unary aggregation function. Otherwise return false.
+
+### Treat aggregation function as an element-wise function: `propagate(function)`
+
+Spread the return value of an aggregate function as if it were an element-wise function.
+
+```ruby
+vec = Vector.new(1, 2, 3, 4)
+vec.propagate(:mean)
+# =>
+#<RedAmber::Vector(:double, size=4):0x000000000001985c>
+[2.5, 2.5, 2.5, 2.5]
+```
+
+`#propagate` also accepts a block to compute with a customized aggregation function yielding a scalar.
+
+```ruby
+vec.propagate { |v| v.mean.round }
+# =>
+#<RedAmber::Vector(:uint8, size=4):0x000000000000cb98>
+[3, 3, 3, 3]
+```
+
 ### Unary element-wise: `vector.func => vector`
 
 
@@ -305,7 +330,7 @@ double.round(n_digits: -1)
 
 Returns index of specified element.
 
-### `quantiles(probs = [
+### `quantiles(probs = [0.0, 0.25, 0.5, 0.75, 1.0], interpolation: :linear, skip_nils: true, min_count: 0)`
 
 Returns quantiles for specified probabilities in a DataFrame.
 
@@ -601,3 +626,184 @@ vector.merge(other, sep: '')
 #<RedAmber::Vector(:string, size=3):0x0000000000038b80>
 ["ab", "cd", "ef"]
 ```
+
+### `concatenate(other)` or `concat(other)`
+
+Concatenate other array-like to self and return a concatenated Vector.
+- `other` is one of `Vector`, `Array`, `Arrow::Array` or `Arrow::ChunkedArray`
+- Different type will be 'resolved'.
+
+Concatenate to string
+```ruby
+string_vector
+
+# =>
+#<RedAmber::Vector(:string, size=2):0x00000000000037b4>
+["A", "B"]
+
+string_vector.concatenate([1, 2])
+
+# =>
+#<RedAmber::Vector(:string, size=4):0x0000000000003818>
+["A", "B", "1", "2"]
+```
+
+Concatenate to integer
+
+```ruby
+integer_vector
+
+# =>
+#<RedAmber::Vector(:uint8, size=2):0x000000000000382c>
+[1, 2]
+
+integer_vector.concatenate(["A", "B"])
+# =>
+#<RedAmber::Vector(:uint8, size=4):0x0000000000003840>
+[1, 2, 65, 66]
+```
+
+### `rank`
+
+Returns numerical rank of self.
+- Nil values are considered greater than any value.
+- NaN values are considered greater than any value but smaller than nil values.
+- Tiebreakers are ranked in order of appearance.
+- `RankOptions` in C++ function is not implemented in C GLib yet.
+This method is currently fixed to the default behavior.
+
+Returns 0-based rank of self (0...size in range) as a Vector.
+
+Rank of float Vector
+```ruby
+fv = Vector.new(0.1, nil, Float::NAN, 0.2, 0.1); fv
+# =>
+#<RedAmber::Vector(:double, size=5):0x000000000000c65c>
+[0.1, nil, NaN, 0.2, 0.1]
+
+fv.rank
+# =>
+#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+[0, 4, 3, 2, 1]
+```
+
+Rank of string Vector
+```ruby
+sv = Vector.new("A", "B", nil, "A", "C"); sv
+# =>
+#<RedAmber::Vector(:string, size=5):0x0000000000003854>
+["A", "B", nil, "A", "C"]
+
+sv.rank
+# =>
+#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+[0, 2, 4, 1, 3]
+```
+
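The ordering rules listed for `rank` (nil greatest, NaN next, ties broken by order of appearance, 0-based result) can be sketched in plain Ruby on an Array. This is only an illustration under those stated rules; `rank_sketch` is a hypothetical helper, not RedAmber's or Arrow's implementation.

```ruby
# Plain-Ruby sketch of the documented ranking rules (hypothetical helper).
def rank_sketch(values)
  # Sort indices by [category, value, position]:
  #   category 0 = ordinary value, 1 = NaN, 2 = nil
  #   position breaks ties in order of appearance.
  order = values.each_index.sort_by do |i|
    v = values[i]
    category =
      if v.nil? then 2
      elsif v.respond_to?(:nan?) && v.nan? then 1
      else 0
      end
    [category, category.zero? ? v : 0, i]
  end

  # Invert the ordering: the element at index idx receives its 0-based rank.
  ranks = Array.new(values.size)
  order.each_with_index { |idx, rank| ranks[idx] = rank }
  ranks
end

p rank_sketch([0.1, nil, Float::NAN, 0.2, 0.1]) # => [0, 4, 3, 2, 1]
p rank_sketch(["A", "B", nil, "A", "C"])        # => [0, 2, 4, 1, 3]
```

Both calls reproduce the ranks shown in the documented examples, which suggests the rules as written are sufficient to determine the result.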
+### `sample(integer_or_proportion)`
+
+Pick up elements at random.
+
+#### `sample` : without argument
+
+Return a randomly selected element.
+This is one of the aggregation functions.
+
+```ruby
+v = Vector.new('A'..'H'); v
+# =>
+#<RedAmber::Vector(:string, size=8):0x0000000000011b20>
+["A", "B", "C", "D", "E", "F", "G", "H"]
+
+v.sample
+# =>
+"C"
+```
+
+#### `sample(n)` : n as an Integer
+
+Pick up n elements at random.
+
+- Param `n` is number of elements to pick.
+- `n` is a positive Integer
+- If `n` is smaller than or equal to `size`, elements are picked without repetition.
+- If `n` is greater than `size`, elements are picked repeatedly.
+- Returns sampled elements as a Vector.
+- If `n == 1` (in case of `sample(1)`), it returns a Vector of `size == 1` not a scalar.
+
+```ruby
+v.sample(1)
+# =>
+#<RedAmber::Vector(:string, size=1):0x000000000001a3b0>
+["H"]
+```
+
+Sample same size of self: every element is picked in random order.
+
+```ruby
+v.sample(8)
+# =>
+#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+["H", "D", "B", "F", "E", "A", "G", "C"]
+```
+
+Over sampling: "E" and "A" are sampled repeatedly.
+
+```ruby
+v.sample(9)
+# =>
+#<RedAmber::Vector(:string, size=9):0x000000000001d790>
+["E", "E", "A", "D", "H", "C", "A", "F", "H"]
+```
+
+#### `sample(prop)` : prop as a Float
+
+Pick up elements by proportion `prop` at random.
+
+- `prop` is proportion of elements to pick.
+- `prop` is a positive Float.
+- Absolute number of elements to pick: `prop*size` is rounded (by `half: :up`).
+- If `prop` is smaller than or equal to 1.0, elements are picked without repetition.
+- If `prop` is greater than 1.0, some elements are picked repeatedly.
+- Returns sampled elements as a Vector.
+- If picked element is only one, it returns a Vector of `size == 1` not a scalar.
+
+Sample same size of self: every element is picked in random order.
+
+```ruby
+v.sample(1.0)
+# =>
+#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+["D", "H", "F", "C", "A", "B", "E", "G"]
+```
+
+2 times over sampling.
+
+```ruby
+v.sample(2.0)
+# =>
+#<RedAmber::Vector(:string, size=16):0x00000000000233e8>
+["H", "B", "C", "B", "C", "A", "F", "A", "E", "C", "H", "F", "F", "A", ... ]
+```
+
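The counting rules described for `sample(n)` and `sample(prop)` can be sketched with plain Ruby Arrays. This is a minimal sketch, not RedAmber's implementation: `sample_sketch` is a hypothetical helper, and drawing every element independently when oversampling is an assumption suggested by the `v.sample(9)` output, not something the documentation states.

```ruby
# Plain-Ruby sketch of the documented sampling rules (hypothetical helper).
def sample_sketch(values, n_or_prop, random: Random.new)
  # An Integer is an absolute count; a Float is a proportion of the size,
  # rounded half up as the documentation describes.
  count =
    case n_or_prop
    when Integer then n_or_prop
    when Float   then (n_or_prop * values.size).round(half: :up)
    else raise ArgumentError, "expected an Integer or a Float"
    end
  raise ArgumentError, "argument must be positive" unless n_or_prop.positive?

  if count <= values.size
    # Up to `size` elements: pick without repetition.
    values.sample(count, random: random)
  else
    # More than `size` elements: picks may repeat (assumed to be independent draws).
    Array.new(count) { values.sample(random: random) }
  end
end

letters = ("A".."H").to_a
p sample_sketch(letters, 3)    # 3 distinct letters in random order
p sample_sketch(letters, 1.0)  # all 8 letters, shuffled
p sample_sketch(letters, 2.0)  # 16 letters, with repeats
```

Passing a seeded `Random` through the `random:` keyword makes the sketch reproducible, which is convenient when testing code that depends on sampled values.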
+### `sort(integer_or_proportion)`
+
+Arrange values in Vector.
+
+- `:+`, `:ascending` or without argument will sort in increasing order.
+- `:-` or `:descending` will sort in decreasing order.
+
+```ruby
+Vector.new(%w[B D A E C]).sort
+# same as #sort(:+)
+# same as #sort(:ascending)
+# =>
+#<RedAmber::Vector(:string, size=5):0x000000000000c134>
+["A", "B", "C", "D", "E"]
+
+Vector.new(%w[B D A E C]).sort(:-)
+# same as #sort(:descending)
+# =>
+#<RedAmber::Vector(:string, size=5):0x000000000000c148>
+["E", "D", "C", "B", "A"]
+```
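The order symbols accepted by `sort` (`:+` / `:ascending` and `:-` / `:descending`) can be mimicked on a plain Ruby Array as follows. `sort_sketch` is a hypothetical helper written for illustration; it makes no claim about how RedAmber dispatches on these symbols internally.

```ruby
# Plain-Ruby sketch of the documented order symbols (hypothetical helper).
def sort_sketch(values, order = :ascending)
  sorted = values.sort
  case order
  when :+, :ascending  then sorted          # increasing order (the default)
  when :-, :descending then sorted.reverse  # decreasing order
  else raise ArgumentError, "unknown order: #{order.inspect}"
  end
end

p sort_sketch(%w[B D A E C])      # => ["A", "B", "C", "D", "E"]
p sort_sketch(%w[B D A E C], :-)  # => ["E", "D", "C", "B", "A"]
```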