red_amber 0.3.0 → 0.4.0

Files changed (42)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +39 -20
  3. data/.yardopts +2 -0
  4. data/CHANGELOG.md +113 -0
  5. data/Gemfile +1 -1
  6. data/LICENSE +1 -1
  7. data/README.md +25 -26
  8. data/benchmark/basic.yml +2 -2
  9. data/benchmark/combine.yml +2 -2
  10. data/benchmark/dataframe.yml +2 -2
  11. data/benchmark/group.yml +2 -2
  12. data/benchmark/reshape.yml +2 -2
  13. data/benchmark/vector.yml +3 -0
  14. data/doc/DataFrame.md +32 -12
  15. data/doc/DataFrame_Comparison.md +65 -0
  16. data/doc/SubFrames.md +11 -0
  17. data/doc/Vector.md +207 -1
  18. data/doc/yard-templates/default/fulldoc/html/css/common.css +6 -0
  19. data/lib/red_amber/data_frame.rb +429 -75
  20. data/lib/red_amber/data_frame_combinable.rb +516 -66
  21. data/lib/red_amber/data_frame_displayable.rb +244 -14
  22. data/lib/red_amber/data_frame_indexable.rb +121 -18
  23. data/lib/red_amber/data_frame_loadsave.rb +78 -10
  24. data/lib/red_amber/data_frame_reshaping.rb +184 -14
  25. data/lib/red_amber/data_frame_selectable.rb +622 -66
  26. data/lib/red_amber/data_frame_variable_operation.rb +446 -34
  27. data/lib/red_amber/group.rb +187 -22
  28. data/lib/red_amber/helper.rb +70 -10
  29. data/lib/red_amber/refinements.rb +12 -5
  30. data/lib/red_amber/subframes.rb +1066 -0
  31. data/lib/red_amber/vector.rb +385 -11
  32. data/lib/red_amber/vector_aggregation.rb +312 -0
  33. data/lib/red_amber/vector_binary_element_wise.rb +387 -0
  34. data/lib/red_amber/vector_selectable.rb +217 -12
  35. data/lib/red_amber/vector_unary_element_wise.rb +436 -0
  36. data/lib/red_amber/vector_updatable.rb +278 -34
  37. data/lib/red_amber/version.rb +2 -1
  38. data/lib/red_amber.rb +13 -1
  39. data/red_amber.gemspec +2 -2
  40. metadata +13 -8
  41. data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
  42. data/lib/red_amber/vector_functions.rb +0 -242
data/doc/DataFrame_Comparison.md ADDED
@@ -0,0 +1,65 @@
+ # Comparison of DataFrames
+
+ Compare basic features of RedAmber with Python
+ [pandas](https://pandas.pydata.org/),
+ R [Tidyverse](https://www.tidyverse.org/) and
+ Julia [DataFrames.jl](https://dataframes.juliadata.org/stable/).
+
+ ## Select columns (variables)
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Select columns as a dataframe | pick, drop, [] | dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select |
+ | Select a column as a vector | [], v | dplyr::pull, [, x] | [], loc[], iloc[] | [!, :x] |
+ | Move columns to a new position | pick, [] | relocate | [], reindex, loc[], iloc[] | select, transform |
+
+ ## Select rows (records, observations)
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Select rows that meet logical criteria as a dataframe | slice, remove, [] | dplyr::filter | [], filter, query, loc[] | filter |
+ | Select rows by position as a dataframe | slice, remove, [] | dplyr::slice | iloc[], drop | subset |
+ | Move rows to a new position | slice, [] | dplyr::filter, dplyr::slice | reindex, loc[], iloc[] | permute |
+
+ ## Update columns / create new columns
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Update existing columns | assign | dplyr::mutate | assign, []= | mapcols |
+ | Create new columns | assign, assign_left | dplyr::mutate | apply | insertcols, .+ |
+ | Compute new columns, drop others | new | transmute | (dfply:)transmute | transform, insertcols, mapcols |
+ | Rename columns | rename | dplyr::rename, dplyr::rename_with, purrr::set_names | rename, set_axis | rename |
+ | Sort dataframe | sort | dplyr::arrange | sort_values | sort |
+
+ ## Reshape dataframe
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Gather columns into rows (create a longer dataframe) | to_long | tidyr::pivot_longer | melt | stack |
+ | Spread rows into columns (create a wider dataframe) | to_wide | tidyr::pivot_wider | pivot | unstack |
+ | Transpose a wide dataframe | transpose | transpose, t | transpose, T | permutedims |
+
+ ## Grouping
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Grouping | group, group.summarize | dplyr::group_by %>% dplyr::summarise | groupby.agg | combine, groupby |
+
+ ## Combine dataframes or tables
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Combine additional columns | merge, bind_cols | dplyr::bind_cols | concat | combine |
+ | Combine additional rows | concatenate, concat, bind_rows | dplyr::bind_rows | concat | transform |
+ | Join right to left, leaving only the matching rows | join, inner_join | dplyr::inner_join | merge | innerjoin |
+ | Join right to left, leaving all rows | join, full_join, outer_join | dplyr::full_join | merge | outerjoin |
+ | Join matching values to left from right | join, left_join | dplyr::left_join | merge | leftjoin |
+ | Join matching values from left to right | join, right_join | dplyr::right_join | merge | rightjoin |
+ | Return rows of left that have a match in right | join, semi_join | dplyr::semi_join | [isin] | semijoin |
+ | Return rows of left that do not have a match in right | join, anti_join | dplyr::anti_join | [isin] | antijoin |
+ | Collect rows that appear in left or right | union | dplyr::union | merge | |
+ | Collect rows that appear in both left and right | intersect | dplyr::intersect | merge | |
+ | Collect rows that appear in left but not right | difference, setdiff | dplyr::setdiff | merge | |
+
+
+
data/doc/SubFrames.md ADDED
@@ -0,0 +1,11 @@
+ # SubFrames
+
+ `SubFrames` represents a collection of subsets of a DataFrame.
+ It holds an Array of indices, `#subset_indices`, from which it can create an Array of sub-DataFrames.
+ The concept covers the `group` operation of a DataFrame and rolling window operations, and has broader capabilities.
+
+ This feature is experimental. It may be removed or changed in the future.
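The idea of describing subsets by index lists can be sketched in plain Ruby, without RedAmber. This is only an illustration under stated assumptions: `rows`, `group_indices`, `window_indices` and the `subsets` helper are hypothetical names, and plain Arrays of Hashes stand in for a DataFrame. It shows how both a grouping and a rolling window reduce to "an Array of index Arrays".

```ruby
# Plain-Ruby sketch; `rows` is an illustrative stand-in for a DataFrame,
# not a RedAmber object.
rows = [
  { x: 1, y: 'A' },
  { x: 2, y: 'B' },
  { x: 3, y: 'A' },
  { x: 4, y: 'B' }
]

# A grouping by :y, expressed as lists of row indices.
group_indices = rows.each_index.group_by { |i| rows[i][:y] }.values

# A rolling window of width 2, expressed the same way.
window_indices = (0...rows.size).each_cons(2).to_a

# Hypothetical helper: turn a list of index lists into a list of subsets.
def subsets(rows, subset_indices)
  subset_indices.map { |indices| rows.values_at(*indices) }
end

groups = subsets(rows, group_indices)
windows = subsets(rows, window_indices)
```

Here `groups` holds two subsets (the "A" rows and the "B" rows) and `windows` holds three overlapping pairs of rows; only the index lists differ between the two operations.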
+
+ ## Create SubFrames
+
+ ## Properties of SubFrames
data/doc/Vector.md CHANGED
@@ -182,6 +182,31 @@ boolean.all(skip_nulls: true) #=> true
  boolean.all(skip_nulls: false) #=> false
  ```

+ ### Check if `function` is an aggregation function: `Vector.aggregate?(function)`
+
+ Return true if `function` is a unary aggregation function. Otherwise return false.
+
+ ### Treat aggregation function as an element-wise function: `propagate(function)`
+
+ Spread the return value of an aggregation function as if it were an element-wise function.
+
+ ```ruby
+ vec = Vector.new(1, 2, 3, 4)
+ vec.propagate(:mean)
+ # =>
+ #<RedAmber::Vector(:double, size=4):0x000000000001985c>
+ [2.5, 2.5, 2.5, 2.5]
+ ```
+
+ `#propagate` also accepts a block to compute with a customized aggregation function yielding a scalar.
+
+ ```ruby
+ vec.propagate { |v| v.mean.round }
+ # =>
+ #<RedAmber::Vector(:uint8, size=4):0x000000000000cb98>
+ [3, 3, 3, 3]
+ ```
+
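The behavior described for `propagate` can be approximated on a plain Ruby Array. This is a minimal sketch, not RedAmber's implementation: `propagate_sketch` is a hypothetical helper, and it assumes the aggregation is either a method name the Array responds to or a block returning a scalar.

```ruby
# Plain-Ruby sketch of the "propagate" idea.
# `propagate_sketch` is a hypothetical helper, not part of RedAmber.
def propagate_sketch(values, function = nil)
  # Compute one scalar from the whole collection...
  scalar = block_given? ? yield(values) : values.public_send(function)
  # ...then repeat it so the result has the same size as the input.
  Array.new(values.size, scalar)
end

values = [1, 2, 3, 4]
mean = ->(v) { v.sum.to_f / v.size }

by_mean    = propagate_sketch(values) { |v| mean.call(v) }       # [2.5, 2.5, 2.5, 2.5]
by_rounded = propagate_sketch(values) { |v| mean.call(v).round } # [3, 3, 3, 3]
by_max     = propagate_sketch(values, :max)                      # [4, 4, 4, 4]
```

The result always has the same length as the input, which is what lets an aggregated value sit alongside element-wise columns.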
  ### Unary element-wise: `vector.func => vector`

  ![unary element-wise](doc/image/../../image/vector/unary_element_wise.png)
@@ -305,7 +330,7 @@ double.round(n_digits: -1)

  Returns index of specified element.

- ### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`
+ ### `quantiles(probs = [0.0, 0.25, 0.5, 0.75, 1.0], interpolation: :linear, skip_nils: true, min_count: 0)`

  Returns quantiles for specified probabilities in a DataFrame.

@@ -601,3 +626,184 @@ vector.merge(other, sep: '')
  #<RedAmber::Vector(:string, size=3):0x0000000000038b80>
  ["ab", "cd", "ef"]
  ```
+
+ ### `concatenate(other)` or `concat(other)`
+
+ Concatenate another array-like object to self and return the concatenated Vector.
+ - `other` is one of `Vector`, `Array`, `Arrow::Array` or `Arrow::ChunkedArray`.
+ - If the type of `other` differs from that of self, it is 'resolved' to the type of self.
+
+ Concatenate to string
+ ```ruby
+ string_vector
+
+ # =>
+ #<RedAmber::Vector(:string, size=2):0x00000000000037b4>
+ ["A", "B"]
+
+ string_vector.concatenate([1, 2])
+
+ # =>
+ #<RedAmber::Vector(:string, size=4):0x0000000000003818>
+ ["A", "B", "1", "2"]
+ ```
+
+ Concatenate to integer
+
+ ```ruby
+ integer_vector
+
+ # =>
+ #<RedAmber::Vector(:uint8, size=2):0x000000000000382c>
+ [1, 2]
+
+ integer_vector.concatenate(["A", "B"])
+ # =>
+ #<RedAmber::Vector(:uint8, size=4):0x0000000000003840>
+ [1, 2, 65, 66]
+ ```
+
+ ### `rank`
+
+ Returns numerical rank of self.
+ - Nil values are considered greater than any value.
+ - NaN values are considered greater than any value but smaller than nil values.
+ - Ties are ranked in order of appearance.
+ - `RankOptions` in the C++ function is not implemented in C GLib yet.
+ This method is currently fixed to the default behavior.
+
+ Returns 0-based rank of self (0...size in range) as a Vector.
+
+ Rank of float Vector
+ ```ruby
+ fv = Vector.new(0.1, nil, Float::NAN, 0.2, 0.1); fv
+ # =>
+ #<RedAmber::Vector(:double, size=5):0x000000000000c65c>
+ [0.1, nil, NaN, 0.2, 0.1]
+
+ fv.rank
+ # =>
+ #<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+ [0, 4, 3, 2, 1]
+ ```
+
+ Rank of string Vector
+ ```ruby
+ sv = Vector.new("A", "B", nil, "A", "C"); sv
+ # =>
+ #<RedAmber::Vector(:string, size=5):0x0000000000003854>
+ ["A", "B", nil, "A", "C"]
+
+ sv.rank
+ # =>
+ #<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+ [0, 2, 4, 1, 3]
+ ```
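The ordering rules listed for `rank` can be reproduced on plain Ruby Arrays. This is a sketch under stated assumptions, not the Arrow implementation: `rank_sketch` is a hypothetical helper that sorts indices by a three-part key (nil/NaN tier, value, position) and then inverts that order into ranks.

```ruby
# Plain-Ruby sketch of the ranking rules; `rank_sketch` is a hypothetical helper.
def rank_sketch(values)
  order = values.each_index.sort_by do |i|
    v = values[i]
    # Tier 0: ordinary values, tier 1: NaN, tier 2: nil (greatest).
    tier = if v.nil?
             2
           elsif v.is_a?(Float) && v.nan?
             1
           else
             0
           end
    # Compare by tier, then by value (only meaningful in tier 0),
    # then by position so that ties keep their order of appearance.
    [tier, tier.zero? ? v : 0, i]
  end

  # Invert the sorted order: the element at sorted position `rank`
  # receives that rank at its original index.
  ranks = Array.new(values.size)
  order.each_with_index { |original_index, rank| ranks[original_index] = rank }
  ranks
end

float_ranks  = rank_sketch([0.1, nil, Float::NAN, 0.2, 0.1]) # [0, 4, 3, 2, 1]
string_ranks = rank_sketch(['A', 'B', nil, 'A', 'C'])        # [0, 2, 4, 1, 3]
```

Both results agree with the documented outputs of `fv.rank` and `sv.rank`.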
+
+ ### `sample(integer_or_proportion)`
+
+ Pick elements at random.
+
+ #### `sample` : without argument
+
+ Return a randomly selected element.
+ This is an aggregation function.
+
+ ```ruby
+ v = Vector.new('A'..'H'); v
+ # =>
+ #<RedAmber::Vector(:string, size=8):0x0000000000011b20>
+ ["A", "B", "C", "D", "E", "F", "G", "H"]
+
+ v.sample
+ # =>
+ "C"
+ ```
+
+ #### `sample(n)` : n as an Integer
+
+ Pick n elements at random.
+
+ - Param `n` is the number of elements to pick.
+ - `n` is a positive Integer.
+ - If `n` is smaller than or equal to `size`, elements are picked without repetition.
+ - If `n` is greater than `size`, elements are picked repeatedly.
+ - Returns sampled elements as a Vector.
+ - If `n == 1` (in case of `sample(1)`), it returns a Vector of `size == 1`, not a scalar.
+
+ ```ruby
+ v.sample(1)
+ # =>
+ #<RedAmber::Vector(:string, size=1):0x000000000001a3b0>
+ ["H"]
+ ```
+
+ Sample the same size as self: every element is picked in random order.
+
+ ```ruby
+ v.sample(8)
+ # =>
+ #<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+ ["H", "D", "B", "F", "E", "A", "G", "C"]
+ ```
+
+ Oversampling: "E", "A" and "H" are sampled repeatedly.
+
+ ```ruby
+ v.sample(9)
+ # =>
+ #<RedAmber::Vector(:string, size=9):0x000000000001d790>
+ ["E", "E", "A", "D", "H", "C", "A", "F", "H"]
+ ```
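The two regimes of `sample(n)` can be sketched with Ruby's built-in `Array#sample`. This is an illustration only: `sample_sketch` is a hypothetical helper, and it assumes that oversampling draws each element independently with replacement, which is consistent with the `v.sample(9)` output but is not confirmed as RedAmber's exact method.

```ruby
# Plain-Ruby sketch; `sample_sketch` is a hypothetical helper on Arrays.
# Assumption: when n > size, elements are drawn with replacement.
def sample_sketch(values, n, random: Random.new)
  raise ArgumentError, 'n must be a positive Integer' unless n.is_a?(Integer) && n.positive?

  if n <= values.size
    # Distinct positions in random order (no repetition).
    values.sample(n, random: random)
  else
    # More picks than elements: positions may repeat.
    Array.new(n) { values.sample(random: random) }
  end
end

values = ('A'..'H').to_a
rng = Random.new(42) # fixed seed only to make runs repeatable

one      = sample_sketch(values, 1, random: rng) # Array of size 1, not a scalar
shuffled = sample_sketch(values, 8, random: rng) # every element once
over     = sample_sketch(values, 9, random: rng) # at least one repeat
```

With nine picks from eight elements, at least one element must repeat, so the oversampled result never contains nine distinct values.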
+
+ #### `sample(prop)` : prop as a Float
+
+ Pick elements at random by proportion `prop`.
+
+ - `prop` is the proportion of elements to pick.
+ - `prop` is a positive Float.
+ - The absolute number of elements to pick, `prop*size`, is rounded (by `half: :up`).
+ - If `prop` is smaller than or equal to 1.0, elements are picked without repetition.
+ - If `prop` is greater than 1.0, some elements are picked repeatedly.
+ - Returns sampled elements as a Vector.
+ - If only one element is picked, it returns a Vector of `size == 1`, not a scalar.
+
+ Sample the same size as self: every element is picked in random order.
+
+ ```ruby
+ v.sample(1.0)
+ # =>
+ #<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+ ["D", "H", "F", "C", "A", "B", "E", "G"]
+ ```
+
+ 2 times oversampling.
+
+ ```ruby
+ v.sample(2.0)
+ # =>
+ #<RedAmber::Vector(:string, size=16):0x00000000000233e8>
+ ["H", "B", "C", "B", "C", "A", "F", "A", "E", "C", "H", "F", "F", "A", ... ]
+ ```
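The conversion from a proportion to an absolute count can be checked with Ruby's own `Float#round(half: :up)`. The helper name `sample_count` is hypothetical; the sketch only covers the rounding rule stated in the list, not the sampling itself.

```ruby
# Plain-Ruby sketch; `sample_count` is a hypothetical helper.
# It converts a proportion into the number of elements to pick.
def sample_count(size, prop)
  raise ArgumentError, 'prop must be a positive Float' unless prop.is_a?(Float) && prop.positive?

  (prop * size).round(half: :up)
end

same_size = sample_count(8, 1.0)    # 8
doubled   = sample_count(8, 2.0)    # 16
half_up   = sample_count(8, 0.3125) # 8 * 0.3125 = 2.5, rounded up to 3
small     = sample_count(8, 0.1)    # 0.8 rounds to 1
```

A count larger than `size` (here 16 for `prop = 2.0`) corresponds to the oversampling case, where some elements repeat.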
+
+ ### `sort(order)`
+
+ Arrange values in Vector.
+
+ - `:+`, `:ascending` or no argument sorts in increasing order.
+ - `:-` or `:descending` sorts in decreasing order.
+
+ ```ruby
+ Vector.new(%w[B D A E C]).sort
+ # same as #sort(:+)
+ # same as #sort(:ascending)
+ # =>
+ #<RedAmber::Vector(:string, size=5):0x000000000000c134>
+ ["A", "B", "C", "D", "E"]
+
+ Vector.new(%w[B D A E C]).sort(:-)
+ # same as #sort(:descending)
+ # =>
+ #<RedAmber::Vector(:string, size=5):0x000000000000c148>
+ ["E", "D", "C", "B", "A"]
+ ```
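The mapping from order symbols to sort direction can be sketched on a plain Array. `sort_sketch` is a hypothetical helper; rejecting unknown symbols with an `ArgumentError` is an assumption of this sketch rather than documented behavior.

```ruby
# Plain-Ruby sketch; `sort_sketch` is a hypothetical helper on Arrays.
def sort_sketch(values, order = :ascending)
  case order
  when :+, :ascending
    values.sort
  when :-, :descending
    values.sort.reverse
  else
    # Assumption: unknown order symbols are rejected.
    raise ArgumentError, "unknown order: #{order.inspect}"
  end
end

letters = %w[B D A E C]
increasing = sort_sketch(letters)     # ["A", "B", "C", "D", "E"]
decreasing = sort_sketch(letters, :-) # ["E", "D", "C", "B", "A"]
```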
data/doc/yard-templates/default/fulldoc/html/css/common.css ADDED
@@ -0,0 +1,6 @@
+ /* Override this file with custom rules */
+
+ /* Use monospace font for code */
+ code {
+   font-family: "Courier New", Consolas, monospace;
+ }