red_amber 0.3.0 → 0.4.1

Files changed (42)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +56 -22
  3. data/.yardopts +2 -0
  4. data/CHANGELOG.md +178 -0
  5. data/Gemfile +1 -1
  6. data/LICENSE +1 -1
  7. data/README.md +29 -30
  8. data/benchmark/basic.yml +7 -7
  9. data/benchmark/combine.yml +3 -3
  10. data/benchmark/dataframe.yml +15 -9
  11. data/benchmark/group.yml +6 -6
  12. data/benchmark/reshape.yml +6 -6
  13. data/benchmark/vector.yml +6 -3
  14. data/doc/DataFrame.md +32 -12
  15. data/doc/DataFrame_Comparison.md +65 -0
  16. data/doc/SubFrames.md +11 -0
  17. data/doc/Vector.md +207 -1
  18. data/doc/yard-templates/default/fulldoc/html/css/common.css +6 -0
  19. data/lib/red_amber/data_frame.rb +454 -85
  20. data/lib/red_amber/data_frame_combinable.rb +609 -115
  21. data/lib/red_amber/data_frame_displayable.rb +313 -34
  22. data/lib/red_amber/data_frame_indexable.rb +122 -19
  23. data/lib/red_amber/data_frame_loadsave.rb +78 -10
  24. data/lib/red_amber/data_frame_reshaping.rb +184 -14
  25. data/lib/red_amber/data_frame_selectable.rb +623 -70
  26. data/lib/red_amber/data_frame_variable_operation.rb +452 -35
  27. data/lib/red_amber/group.rb +186 -22
  28. data/lib/red_amber/helper.rb +74 -14
  29. data/lib/red_amber/refinements.rb +26 -6
  30. data/lib/red_amber/subframes.rb +1101 -0
  31. data/lib/red_amber/vector.rb +362 -11
  32. data/lib/red_amber/vector_aggregation.rb +312 -0
  33. data/lib/red_amber/vector_binary_element_wise.rb +506 -0
  34. data/lib/red_amber/vector_selectable.rb +265 -23
  35. data/lib/red_amber/vector_unary_element_wise.rb +529 -0
  36. data/lib/red_amber/vector_updatable.rb +278 -34
  37. data/lib/red_amber/version.rb +2 -1
  38. data/lib/red_amber.rb +13 -1
  39. data/red_amber.gemspec +2 -2
  40. metadata +13 -8
  41. data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
  42. data/lib/red_amber/vector_functions.rb +0 -242
data/doc/DataFrame.md CHANGED
@@ -57,6 +57,10 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
  ```ruby
  RedAmber::DataFrame.load("test/entity/with_header.csv")
  ```
+
+ ```ruby
+ RedAmber::DataFrame.load("test/entity/without_header.csv", headers: [:x, :y, :z])
+ ```
 
  - from a string buffer
 
@@ -275,6 +279,7 @@ penguins.to_rover
 
  - Shows some information about self in a transposed style.
  - `tdr_str` returns the same information as a String.
+ - `glimpse` is an alias. It is similar to dplyr's (or Polars') `glimpse()`.
 
  ```ruby
  require 'red_amber'
@@ -568,7 +573,7 @@ penguins.to_rover
  [1, 2, 3]
  ```
 
- ### `slice ` - slice and select records -
+ ### `slice` - cut into slices of records -
 
  Slice and select records (rows) to create a sub DataFrame.
 
@@ -601,11 +606,14 @@ penguins.to_rover
 
  - Booleans as an argument
 
- `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
+ `filter(booleans)` or `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray. The booleans must have the same length as `size`.
+
+ Note: `slice(booleans)` is also accepted, to keep `slice` and `remove` orthogonal.
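The boolean-mask rule can be illustrated without RedAmber. The following plain-Ruby sketch is not the library's implementation: it models records as an Array of Hashes, and `filter_rows` is a hypothetical helper that keeps each record whose mask element is `true` and rejects a mask whose length differs from the number of records. The bill lengths are made-up sample values.

```ruby
# Hypothetical helper (not part of RedAmber): keeps each record whose
# corresponding mask element is true. Records are plain Hashes here.
def filter_rows(rows, mask)
  unless mask.size == rows.size
    raise ArgumentError, "mask size #{mask.size} does not match #{rows.size} records"
  end

  # Pair every record with its flag and keep the record only when the flag is true.
  rows.zip(mask).filter_map { |row, keep| row if keep == true }
end

penguins = [
  { species: "Adelie", bill_length_mm: 39.1 },
  { species: "Gentoo", bill_length_mm: 46.1 },
  { species: "Chinstrap", bill_length_mm: 49.5 }
]
mask = penguins.map { |row| row[:bill_length_mm] >= 40 }
p filter_rows(penguins, mask).map { |row| row[:species] } # prints ["Gentoo", "Chinstrap"]
```

`filter_map` needs Ruby 2.7 or later; in this sketch a `nil` in the mask simply drops the record.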
 
  ```ruby
  vector = penguins[:bill_length_mm]
- penguins.slice(vector >= 40)
+ penguins.filter(vector >= 40)
+ # penguins.slice(vector >= 40) is also acceptable
 
  # =>
  #<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
@@ -833,14 +841,14 @@ penguins.to_rover
 
  Assign new or updated variables (columns) and create an updated DataFrame.
 
- - Variables with new keys will append new columns from the right.
+ - Variables with new keys will append new columns on the right.
  - Variables with existing keys will update corresponding vectors.
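Both rules map neatly onto an insertion-ordered Hash. As a plain-Ruby sketch (not RedAmber's implementation), model a DataFrame as a Hash of `key => Array`; the helpers `assign_columns` and `assign_columns_left` are hypothetical. `Hash#merge` updates existing keys in place and appends new keys at the end, which matches the right-append rule; the left variant mirrors `assign_left` by putting only the new keys in front. The order among several new left-appended keys is an assumption of this sketch.

```ruby
# Hypothetical helpers; a DataFrame is modeled as an ordered Hash of key => Array.

# New keys are appended on the right; existing keys keep their position
# and get the new values (Hash#merge preserves insertion order).
def assign_columns(columns, new_columns)
  columns.merge(new_columns)
end

# New keys are prepended on the left; existing keys are still updated in place.
def assign_columns_left(columns, new_columns)
  added = new_columns.reject { |key, _| columns.key?(key) }
  updated = columns.merge(new_columns.select { |key, _| columns.key?(key) })
  added.merge(updated)
end

df = { name: %w[a b c], age: [1, 2, 3] }
p assign_columns(df, { age: [30, 31, 32], id: [7, 8, 9] }).keys      # prints [:name, :age, :id]
p assign_columns_left(df, { age: [30, 31, 32], id: [7, 8, 9] }).keys # prints [:id, :name, :age]
```

Both helpers return a new Hash and leave `df` unchanged, in the same spirit as `assign` creating an updated DataFrame.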
 
  ![assign method image](doc/../image/dataframe/assign.png)
 
  - Variables as arguments
 
- `assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is ether `Vector`, `Array` or `Arrow::Array`.
+ `assign(key_value_pairs)` accepts key-value pairs as parameters. `key_value_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either a `Vector`, an `Array` or an `Arrow::Array`.
 
  ```ruby
  df = RedAmber::DataFrame.new(
@@ -857,12 +865,12 @@ penguins.to_rover
  2 Hinata 28
 
  # update :age and add :brother
- df.assign do
+ df.assign(
    {
      age: age + 29,
      brother: ['Santa', nil, 'Momotaro']
    }
- end
+ )
 
  # =>
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
@@ -932,7 +940,7 @@ penguins.to_rover
 
  - Append from left
 
- `assign_left` method accepts the same parameters and block as `assign`, but append new columns from leftside.
+ The `assign_left` method accepts the same parameters and block as `assign`, but appends new columns on the left.
 
  ```ruby
  df.assign_left(new_index: df.indices(1))
@@ -1453,6 +1461,8 @@ When the option `keep_key: true` used, the column `key` will be preserved.
  1 B 4
  2 D 5
  ```
+ ##### `set_operable?(other)`
+ Checks whether the `types` of self and other are the same.
 
  ##### `intersect(other)`
 
@@ -1498,15 +1508,23 @@ When the option `keep_key: true` used, the column `key` will be preserved.
  <string> <uint8>
  1 B 2
  2 C 3
+
+ other.difference(df)
+ # =>
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000040e0c>
+ KEY1 KEY2
+ <string> <uint8>
+ 0 B 4
+ 1 D 5
  ```
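Row-wise set semantics can be pictured with plain Ruby Arrays of rows. In this sketch (not RedAmber's implementation) each row is a `[KEY1, KEY2]` pair; the rows are hypothetical, chosen only so that the two differences agree with the outputs above. Ruby's `Array#&` and `Array#|` also drop duplicate rows; whether RedAmber deduplicates in the same way is not shown here.

```ruby
# Hypothetical rows: [KEY1, KEY2]
df_rows    = [["A", 1], ["B", 2], ["C", 3]]
other_rows = [["A", 1], ["B", 4], ["D", 5]]

# A row matches only when every column value is equal.
p df_rows & other_rows # intersect:  [["A", 1]]
p df_rows | other_rows # union:      [["A", 1], ["B", 2], ["C", 3], ["B", 4], ["D", 5]]
p df_rows - other_rows # difference: [["B", 2], ["C", 3]]
p other_rows - df_rows # other.difference(df): [["B", 4], ["D", 5]]
```

Note that `difference` is not symmetric, which is why `df.difference(other)` and `other.difference(df)` return different rows.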
 
  ## Binding
 
  ### `concatenate(other)`
 
- Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+ Concatenate another DataFrame or Table onto the bottom of self. The types of other must be the same as those of self.
 
- The alias is `concat`.
+ The aliases are `concat` and `bind_rows`.
 
  An array of DataFrames or Tables is also acceptable as other.
 
@@ -1538,9 +1556,11 @@ When the option `keep_key: true` used, the column `key` will be preserved.
  3 4 D
  ```
 
- ### `merge(other)`
+ ### `merge(*other)`
+
+ Concatenate another DataFrame or Table onto the right side of self. The size of other must be the same as that of self, and self and other must not share any keys.
 
- Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+ The alias is `bind_cols`.
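The difference between the two bindings can be sketched in plain Ruby, again modeling a DataFrame as an ordered Hash of `key => Array`. `bind_rows` and `bind_cols` below are hypothetical helpers, not RedAmber's code, and checking that the keys match is only a stand-in for the type check described for `concatenate`.

```ruby
# Hypothetical helpers; a DataFrame is modeled as an ordered Hash of key => Array.

# Rows of `other` go onto the bottom: both sides need the same keys.
def bind_rows(df, other)
  raise ArgumentError, "keys differ" unless df.keys == other.keys

  df.to_h { |key, values| [key, values + other[key]] }
end

# Columns of `other` go onto the right: same size, no shared keys.
def bind_cols(df, other)
  sizes = (df.values + other.values).map(&:size).uniq
  raise ArgumentError, "sizes differ" unless sizes.size == 1
  raise ArgumentError, "shared keys" unless (df.keys & other.keys).empty?

  df.merge(other)
end

df = { x: [1, 2] }
p bind_rows(df, { x: [3] })[:x]          # prints [1, 2, 3]
p bind_cols(df, { y: ["a", "b"] }).keys  # prints [:x, :y]
```

`Hash#to_h` with a block requires Ruby 2.6 or later.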
 
  ```ruby
  df
data/doc/DataFrame_Comparison.md ADDED
@@ -0,0 +1,65 @@
+ # Comparison of DataFrames
+
+ Compares the basic features of RedAmber with Python
+ [pandas](https://pandas.pydata.org/),
+ R [Tidyverse](https://www.tidyverse.org/) and
+ Julia [DataFrames.jl](https://dataframes.juliadata.org/stable/).
+
+ ## Select columns (variables)
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Select columns as a dataframe | pick, drop, [] | dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select |
+ | Select a column as a vector | [], v | dplyr::pull, [, x] | [], loc[], iloc[] | [!, :x] |
+ | Move columns to a new position | pick, [] | dplyr::relocate | [], reindex, loc[], iloc[] | select, transform |
+
+ ## Select rows (records, observations)
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Select rows that meet logical criteria as a dataframe | filter, slice, remove, [] | dplyr::filter | [], filter, query, loc[] | filter |
+ | Select rows by position as a dataframe | slice, remove, [] | dplyr::slice | iloc[], drop | subset |
+ | Move rows to a new position | slice, [] | dplyr::filter, dplyr::slice | reindex, loc[], iloc[] | permute |
+
+ ## Update columns / create new columns
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Update existing columns | assign | dplyr::mutate | assign, []= | mapcols |
+ | Create new columns | assign, assign_left | dplyr::mutate | apply | insertcols, .+ |
+ | Compute new columns, drop others | new | dplyr::transmute | (dfply:)transmute | transform, insertcols, mapcols |
+ | Rename columns | rename | dplyr::rename, dplyr::rename_with, purrr::set_names | rename, set_axis | rename |
+ | Sort dataframe | sort | dplyr::arrange | sort_values | sort |
+
+ ## Reshape dataframe
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Gather columns into rows (create a longer dataframe) | to_long | tidyr::pivot_longer | melt | stack |
+ | Spread rows into columns (create a wider dataframe) | to_wide | tidyr::pivot_wider | pivot | unstack |
+ | Transpose a wide dataframe | transpose | transpose, t | transpose, T | permutedims |
+
+ ## Grouping
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Grouping | group, group.summarize | dplyr::group_by %>% dplyr::summarise | groupby.agg | combine, groupby |
+
+ ## Combine dataframes or tables
+
+ | Features | RedAmber | Tidyverse | pandas | DataFrames.jl |
+ |--- |--- |--- |--- |--- |
+ | Combine additional columns | merge, bind_cols | dplyr::bind_cols | concat | hcat |
+ | Combine additional rows | concatenate, concat, bind_rows | dplyr::bind_rows | concat | vcat, append! |
+ | Join right to left, leaving only the matching rows | join, inner_join | dplyr::inner_join | merge | innerjoin |
+ | Join right to left, leaving all rows | join, full_join, outer_join | dplyr::full_join | merge | outerjoin |
+ | Join matching values to left from right | join, left_join | dplyr::left_join | merge | leftjoin |
+ | Join matching values from left to right | join, right_join | dplyr::right_join | merge | rightjoin |
+ | Return rows of left that have a match in right | join, semi_join | dplyr::semi_join | [isin] | semijoin |
+ | Return rows of left that do not have a match in right | join, anti_join | dplyr::anti_join | [isin] | antijoin |
+ | Collect rows that appear in left or right | union | dplyr::union | merge | |
+ | Collect rows that appear in both left and right | intersect | dplyr::intersect | merge | |
+ | Collect rows that appear in left but not right | difference, setdiff | dplyr::setdiff | merge | |
+
+
+
data/doc/SubFrames.md ADDED
@@ -0,0 +1,11 @@
+ # SubFrames
+
+ `SubFrames` represents a collection of subsets of a DataFrame.
+ It holds an Array of indices, `#subset_indices`, from which an Array of sub DataFrames can be created.
+ The concept covers the `group` operation of a DataFrame and rolling-window operations, and has broader capabilities.
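The idea behind `#subset_indices` can be sketched in plain Ruby: a grouping and a rolling window both reduce to a list of index Arrays, and each index Array selects one sub-table. This is an illustration on assumed toy data, not the SubFrames API; the names `group_indices`, `window_indices` and `to_subframes` are hypothetical.

```ruby
# Rows of a toy table (hypothetical data).
rows = [
  { key: "A", value: 1 },
  { key: "B", value: 2 },
  { key: "A", value: 3 },
  { key: "B", value: 4 }
]

# Grouping: one index Array per distinct key, in order of first appearance.
group_indices = rows.each_index.group_by { |i| rows[i][:key] }.values
# => [[0, 2], [1, 3]]

# Rolling window of width 3: consecutive runs of indices.
window_indices = rows.each_index.each_cons(3).to_a
# => [[0, 1, 2], [1, 2, 3]]

# Either list of indices can be turned into "sub frames" by picking rows.
to_subframes = ->(indices_list) { indices_list.map { |idx| rows.values_at(*idx) } }
p to_subframes.call(group_indices).map { |sub| sub.sum { |r| r[:value] } } # prints [4, 6]
```

Keeping only indices, rather than copies of the rows, is what lets one structure describe both groups and windows.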
+
+ This feature is experimental. It may be changed or removed in the future.
+
+ ## Create SubFrames
+
+ ## Properties of SubFrames
data/doc/Vector.md CHANGED
@@ -182,6 +182,31 @@ boolean.all(skip_nulls: true) #=> true
  boolean.all(skip_nulls: false) #=> false
  ```
 
+ ### Check if `function` is an aggregation function: `Vector.aggregate?(function)`
+
+ Returns true if `function` is a unary aggregation function; otherwise returns false.
+
+ ### Treat an aggregation function as an element-wise function: `propagate(function)`
+
+ Spreads the return value of an aggregation function over every element, as if it were an element-wise function.
+
+ ```ruby
+ vec = Vector.new(1, 2, 3, 4)
+ vec.propagate(:mean)
+ # =>
+ #<RedAmber::Vector(:double, size=4):0x000000000001985c>
+ [2.5, 2.5, 2.5, 2.5]
+ ```
+
+ `#propagate` also accepts a block; the block receives self and should return a scalar computed by a customized aggregation.
+
+ ```ruby
+ vec.propagate { |v| v.mean.round }
+ # =>
+ #<RedAmber::Vector(:uint8, size=4):0x000000000000cb98>
+ [3, 3, 3, 3]
+ ```
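As a plain-Ruby analogue on an Array (not RedAmber's implementation), `propagate` computes one scalar and repeats it once per element. The helpers `aggregate?` and `propagate` and the two-entry `AGGREGATIONS` table below are hypothetical; RedAmber's own set of aggregation functions is larger.

```ruby
# Hypothetical table of unary aggregation functions: Array -> scalar.
AGGREGATIONS = {
  mean: ->(values) { values.sum.fdiv(values.size) },
  max: ->(values) { values.max }
}.freeze

def aggregate?(function)
  AGGREGATIONS.key?(function)
end

# Compute one scalar (from a named aggregation or from the block),
# then spread it to every position.
def propagate(values, function = nil)
  scalar = block_given? ? yield(values) : AGGREGATIONS.fetch(function).call(values)
  Array.new(values.size, scalar)
end

vec = [1, 2, 3, 4]
p propagate(vec, :mean)                            # prints [2.5, 2.5, 2.5, 2.5]
p propagate(vec) { |v| v.sum.fdiv(v.size).round }  # prints [3, 3, 3, 3]
p aggregate?(:mean)                                # prints true
```

Ruby's `Float#round` rounds halves away from zero, so `2.5.round` is `3`, matching the block example above.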
+
  ### Unary element-wise: `vector.func => vector`
 
  ![unary element-wise](doc/image/../../image/vector/unary_element_wise.png)
@@ -305,7 +330,7 @@ double.round(n_digits: -1)
 
  Returns index of specified element.
 
- ### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`
+ ### `quantiles(probs = [0.0, 0.25, 0.5, 0.75, 1.0], interpolation: :linear, skip_nils: true, min_count: 0)`
 
  Returns quantiles for specified probabilities in a DataFrame.
 
@@ -601,3 +626,184 @@ vector.merge(other, sep: '')
  #<RedAmber::Vector(:string, size=3):0x0000000000038b80>
  ["ab", "cd", "ef"]
  ```
+
+ ### `concatenate(other)` or `concat(other)`
+
+ Concatenates an array-like `other` to self and returns the concatenated Vector.
+ - `other` is one of `Vector`, `Array`, `Arrow::Array` or `Arrow::ChunkedArray`.
+ - If the types differ, `other` is 'resolved' to the type of self.
+
+ Concatenate to a string Vector:
+ ```ruby
+ string_vector
+
+ # =>
+ #<RedAmber::Vector(:string, size=2):0x00000000000037b4>
+ ["A", "B"]
+
+ string_vector.concatenate([1, 2])
+
+ # =>
+ #<RedAmber::Vector(:string, size=4):0x0000000000003818>
+ ["A", "B", "1", "2"]
+ ```
+
+ Concatenate to an integer Vector:
+
+ ```ruby
+ integer_vector
+
+ # =>
+ #<RedAmber::Vector(:uint8, size=2):0x000000000000382c>
+ [1, 2]
+
+ integer_vector.concatenate(["A", "B"])
+ # =>
+ #<RedAmber::Vector(:uint8, size=4):0x0000000000003840>
+ [1, 2, 65, 66]
+ ```
+
+ ### `rank`
+
+ Returns the numerical rank of each element of self.
+ - Nil values are considered greater than any value.
+ - NaN values are considered greater than any other value but smaller than nil values.
+ - Ties are broken by order of appearance.
+ - `RankOptions` of the C++ function is not yet available in Arrow GLib,
+   so this method is currently fixed to the default behavior.
+
+ Returns the 0-based ranks (in the range 0...size) as a Vector.
+
+ Rank of a float Vector:
+ ```ruby
+ fv = Vector.new(0.1, nil, Float::NAN, 0.2, 0.1); fv
+ # =>
+ #<RedAmber::Vector(:double, size=5):0x000000000000c65c>
+ [0.1, nil, NaN, 0.2, 0.1]
+
+ fv.rank
+ # =>
+ #<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+ [0, 4, 3, 2, 1]
+ ```
+
+ Rank of a string Vector:
+ ```ruby
+ sv = Vector.new("A", "B", nil, "A", "C"); sv
+ # =>
+ #<RedAmber::Vector(:string, size=5):0x0000000000003854>
+ ["A", "B", nil, "A", "C"]
+
+ sv.rank
+ # =>
+ #<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+ [0, 2, 4, 1, 3]
+ ```
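The ordering rules above can be reproduced in plain Ruby. This sketch is not RedAmber's implementation (which relies on Arrow); `rank_sketch` is a hypothetical helper that sorts positions by a key placing ordinary values first, then NaN, then nil, with the original position as the final tiebreaker, and then inverts that order into 0-based ranks.

```ruby
# Hypothetical helper: 0-based ranks where values < NaN < nil,
# and ties are broken by order of appearance.
def rank_sketch(values)
  sort_key = lambda do |value, index|
    if value.nil?
      [2, 0, index]                      # nil sorts last
    elsif value.is_a?(Float) && value.nan?
      [1, 0, index]                      # NaN sorts just before nil
    else
      [0, value, index]                  # ordinary values, then position
    end
  end

  # Positions listed in ranked order.
  order = values.each_with_index.sort_by { |value, index| sort_key.call(value, index) }
                .map { |_, index| index }

  # Invert: ranks[position] = place of that position in the ranked order.
  ranks = Array.new(values.size)
  order.each_with_index { |position, rank| ranks[position] = rank }
  ranks
end

p rank_sketch([0.1, nil, Float::NAN, 0.2, 0.1]) # prints [0, 4, 3, 2, 1]
p rank_sketch(["A", "B", nil, "A", "C"])        # prints [0, 2, 4, 1, 3]
```

The leading category number in each key means NaN and nil never have to be compared with ordinary values, which would otherwise make `sort_by` fail.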
+
+ ### `sample(integer_or_proportion)`
+
+ Picks elements at random.
+
+ #### `sample` : without argument
+
+ Returns a randomly selected element.
+ This is an aggregation function.
+
+ ```ruby
+ v = Vector.new('A'..'H'); v
+ # =>
+ #<RedAmber::Vector(:string, size=8):0x0000000000011b20>
+ ["A", "B", "C", "D", "E", "F", "G", "H"]
+
+ v.sample
+ # =>
+ "C"
+ ```
+
+ #### `sample(n)` : n as an Integer
+
+ Picks `n` elements at random.
+
+ - `n` is the number of elements to pick.
+ - `n` is a positive Integer.
+ - If `n` is smaller than or equal to `size`, elements are picked without repetition.
+ - If `n` is greater than `size`, elements are picked with repetition.
+ - Returns sampled elements as a Vector.
+ - If `n == 1` (as in `sample(1)`), it returns a Vector of `size == 1`, not a scalar.
+
+ ```ruby
+ v.sample(1)
+ # =>
+ #<RedAmber::Vector(:string, size=1):0x000000000001a3b0>
+ ["H"]
+ ```
+
+ Sample the same size as self: every element is picked once, in random order.
+
+ ```ruby
+ v.sample(8)
+ # =>
+ #<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+ ["H", "D", "B", "F", "E", "A", "G", "C"]
+ ```
+
+ Oversampling: in this run, "E", "A" and "H" are each picked twice.
+
+ ```ruby
+ v.sample(9)
+ # =>
+ #<RedAmber::Vector(:string, size=9):0x000000000001d790>
+ ["E", "E", "A", "D", "H", "C", "A", "F", "H"]
+ ```
+
+ #### `sample(prop)` : prop as a Float
+
+ Picks elements at random by the proportion `prop`.
+
+ - `prop` is the proportion of elements to pick.
+ - `prop` is a positive Float.
+ - The absolute number of elements to pick, `prop * size`, is rounded with `half: :up`.
+ - If `prop` is smaller than or equal to 1.0, elements are picked without repetition.
+ - If `prop` is greater than 1.0, some elements are picked repeatedly.
+ - Returns sampled elements as a Vector.
+ - If only one element is picked, it returns a Vector of `size == 1`, not a scalar.
+
+ Sample the same size as self: every element is picked once, in random order.
+
+ ```ruby
+ v.sample(1.0)
+ # =>
+ #<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+ ["D", "H", "F", "C", "A", "B", "E", "G"]
+ ```
+
+ Oversampling by a factor of 2.
+
+ ```ruby
+ v.sample(2.0)
+ # =>
+ #<RedAmber::Vector(:string, size=16):0x00000000000233e8>
+ ["H", "B", "C", "B", "C", "A", "F", "A", "E", "C", "H", "F", "F", "A", ... ]
+ ```
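The size rules above can be restated as a plain-Ruby sketch over an Array. `sample_sketch` is hypothetical, not RedAmber's implementation; it uses `Array#sample`, which never repeats an element, when the requested count fits, and otherwise draws each element independently with replacement, which is an assumption about how the repeated picks are made.

```ruby
# Hypothetical plain-Ruby analogue of Vector#sample on an Array.
def sample_sketch(values, n_or_prop = nil, random: Random.new)
  # No argument: return one element (a scalar), like an aggregation.
  return values.sample(random: random) if n_or_prop.nil?

  n =
    case n_or_prop
    when Integer then n_or_prop
    when Float then (n_or_prop * values.size).round(half: :up) # proportion -> count
    else raise ArgumentError, "expected Integer or Float, got #{n_or_prop.class}"
    end
  raise ArgumentError, "number of elements to pick must be positive" unless n.positive?

  if n <= values.size
    values.sample(n, random: random)               # without repetition
  else
    Array.new(n) { values.sample(random: random) } # with repetition
  end
end

letters = ("A".."H").to_a
p sample_sketch(letters)           # one random letter
p sample_sketch(letters, 9).size   # prints 9
p sample_sketch(letters, 0.5).size # prints 4
```

Passing a seeded `Random` through the `random:` keyword makes the draws reproducible, which is handy in tests.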
+
+ ### `sort(order = :ascending)`
+
+ Arranges the values in the Vector.
+
+ - `:+`, `:ascending` or no argument sorts in increasing order.
+ - `:-` or `:descending` sorts in decreasing order.
+
+ ```ruby
+ Vector.new(%w[B D A E C]).sort
+ # same as #sort(:+)
+ # same as #sort(:ascending)
+ # =>
+ #<RedAmber::Vector(:string, size=5):0x000000000000c134>
+ ["A", "B", "C", "D", "E"]
+
+ Vector.new(%w[B D A E C]).sort(:-)
+ # same as #sort(:descending)
+ # =>
+ #<RedAmber::Vector(:string, size=5):0x000000000000c148>
+ ["E", "D", "C", "B", "A"]
+ ```
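The mapping from order symbols to sort directions can be sketched in plain Ruby. `sort_sketch` is a hypothetical helper over an Array of mutually comparable values, not RedAmber's implementation; it does not model how RedAmber orders nil or NaN.

```ruby
# Hypothetical plain-Ruby analogue of the order argument of Vector#sort.
def sort_sketch(values, order = :ascending)
  case order
  when :+, :ascending then values.sort
  when :-, :descending then values.sort.reverse
  else raise ArgumentError, "unknown order: #{order.inspect}"
  end
end

p sort_sketch(%w[B D A E C])     # prints ["A", "B", "C", "D", "E"]
p sort_sketch(%w[B D A E C], :-) # prints ["E", "D", "C", "B", "A"]
```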
data/doc/yard-templates/default/fulldoc/html/css/common.css ADDED
@@ -0,0 +1,6 @@
+ /* Override this file with custom rules */
+
+ /* Use monospace font for code */
+ code {
+ font-family: "Courier New", Consolas, monospace;
+ }