RubyGems - red_amber - Versions diffs - 0.3.0 → 0.4.1 - Mend

red_amber 0.3.0 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (42)

checksums.yaml +4 -4
data/.rubocop.yml +56 -22
data/.yardopts +2 -0
data/CHANGELOG.md +178 -0
data/Gemfile +1 -1
data/LICENSE +1 -1
data/README.md +29 -30
data/benchmark/basic.yml +7 -7
data/benchmark/combine.yml +3 -3
data/benchmark/dataframe.yml +15 -9
data/benchmark/group.yml +6 -6
data/benchmark/reshape.yml +6 -6
data/benchmark/vector.yml +6 -3
data/doc/DataFrame.md +32 -12
data/doc/DataFrame_Comparison.md +65 -0
data/doc/SubFrames.md +11 -0
data/doc/Vector.md +207 -1
data/doc/yard-templates/default/fulldoc/html/css/common.css +6 -0
data/lib/red_amber/data_frame.rb +454 -85
data/lib/red_amber/data_frame_combinable.rb +609 -115
data/lib/red_amber/data_frame_displayable.rb +313 -34
data/lib/red_amber/data_frame_indexable.rb +122 -19
data/lib/red_amber/data_frame_loadsave.rb +78 -10
data/lib/red_amber/data_frame_reshaping.rb +184 -14
data/lib/red_amber/data_frame_selectable.rb +623 -70
data/lib/red_amber/data_frame_variable_operation.rb +452 -35
data/lib/red_amber/group.rb +186 -22
data/lib/red_amber/helper.rb +74 -14
data/lib/red_amber/refinements.rb +26 -6
data/lib/red_amber/subframes.rb +1101 -0
data/lib/red_amber/vector.rb +362 -11
data/lib/red_amber/vector_aggregation.rb +312 -0
data/lib/red_amber/vector_binary_element_wise.rb +506 -0
data/lib/red_amber/vector_selectable.rb +265 -23
data/lib/red_amber/vector_unary_element_wise.rb +529 -0
data/lib/red_amber/vector_updatable.rb +278 -34
data/lib/red_amber/version.rb +2 -1
data/lib/red_amber.rb +13 -1
data/red_amber.gemspec +2 -2
metadata +13 -8
data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
data/lib/red_amber/vector_functions.rb +0 -242

data/doc/DataFrame.md CHANGED

@@ -57,6 +57,10 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
   ```ruby
   RedAmber::DataFrame.load("test/entity/with_header.csv")
   ```
+  ```ruby
+  RedAmber::DataFrame.load("test/entity/without_header.csv", headers: [:x, :y, :z])
+  ```
 - from a string buffer
@@ -275,6 +279,7 @@ penguins.to_rover
   - Shows some information about self in a transposed style.
   - `tdr_str` returns same info as a String.
+  - `glimpse` is an alias. It is similar to dplyr's (or Polars's) `glimpse()`.
   ```ruby
   require 'red_amber'
@@ -568,7 +573,7 @@ penguins.to_rover
   [1, 2, 3]
   ```
-### `slice  `  - slice and select records -
+### `slice  `  - cut into slices of records -
   Slice and select records (rows) to create a sub DataFrame.
@@ -601,11 +606,14 @@ penguins.to_rover
 - Booleans as an argument
-  `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
+  `filter(booleans)` or `slice(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
+  note: `slice(booleans)` is acceptable for orthogonality of `slice`/`remove`.
     ```ruby
     vector = penguins[:bill_length_mm]
-    penguins.slice(vector >= 40)
+    penguins.filter(vector >= 40)
+    # penguins.slice(vector >= 40) is also acceptable
     # =>
     #<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
@@ -833,14 +841,14 @@ penguins.to_rover
   Assign new or updated variables (columns) and create an updated DataFrame.
-  - Variables with new keys will append new columns from the right.
+  - Variables with new keys will append new columns from right.
   - Variables with exisiting keys will update corresponding vectors.
     ![assign method image](doc/../image/dataframe/assign.png)
 - Variables as arguments
-    `assign(key_pairs)` accepts pairs of key and values as parameters. `key_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is ether `Vector`, `Array` or `Arrow::Array`.
+    `assign(key_value_pairs)` accepts pairs of key and values as parameters. `key_value_pairs` should be a Hash of `{key => array_like}` or an Array of Arrays like `[[key, array_like], ... ]`. `array_like` is either `Vector`, `Array` or `Arrow::Array`.
     ```ruby
     df = RedAmber::DataFrame.new(
@@ -857,12 +865,12 @@ penguins.to_rover
     2 Hinata        28
     # update :age and add :brother
-    df.assign do
+    df.assign(
       {
         age: age + 29,
         brother: ['Santa', nil, 'Momotaro']
       }
-    end
+    )
     # =>
     #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
@@ -932,7 +940,7 @@ penguins.to_rover
 - Append from left
-  `assign_left` method accepts the same parameters and block as `assign`, but append new columns from leftside.
+  `assign_left` method accepts the same parameters and block as `assign`, but append new columns from left.
   ```ruby
   df.assign_left(new_index: df.indices(1))
@@ -1453,6 +1461,8 @@ When the option `keep_key: true` used, the column `key` will be preserved.
   1 B              4
   2 D              5
   ```
+##### `set_operable?(other)`
+  Check if `types` of self and other are same.
 ##### `intersect(other)`
@@ -1498,15 +1508,23 @@ When the option `keep_key: true` used, the column `key` will be preserved.
     <string> <uint8>
   1 B              2
   2 C              3
+  other.difference(df)
+  #=>
+  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000040e0c>
+    KEY1        KEY2
+    <string> <uint8>
+  0 B              4
+  1 D              5
   ```
 ## Binding
 ### `concatenate(other)`
-  Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+  Concatenate another DataFrame or Table onto the bottom of self. The types of other must be the same as self.
-  The alias is `concat`.
+  The aliases are `concat` and `bind_rows`.
   An array of DataFrames or Tables is also acceptable as other.
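
The row-binding rule described in this hunk (other must have the same types as self) can be pictured with a plain-Ruby sketch. It does not call RedAmber: a "frame" here is just a Hash of column name to Array, "type" is approximated by the Ruby classes found in each column, and `column_types` and `bind_rows` are hypothetical helpers written for this illustration.

```ruby
# Plain-Ruby sketch only; it does not call RedAmber.
# A "frame" is a Hash of column name => Array (stand-in for a DataFrame).

# Hypothetical helper: the Ruby classes found in each column.
def column_types(frame)
  frame.transform_values { |column| column.map(&:class).uniq }
end

# Hypothetical helper: append the rows of `bottom` under `top`,
# refusing to bind when keys or per-column classes differ.
def bind_rows(top, bottom)
  unless column_types(top) == column_types(bottom)
    raise ArgumentError, 'keys or column types differ'
  end

  top.merge(bottom) { |_key, upper, lower| upper + lower }
end

df    = { x: [1, 2], y: %w[A B] }
other = { x: [3, 4], y: %w[C D] }

bound = bind_rows(df, other)
# bound[:x] is [1, 2, 3, 4] and bound[:y] is ["A", "B", "C", "D"]
```

Checking before binding keeps a mismatched frame from silently producing a mixed-type column.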
@@ -1538,9 +1556,11 @@ When the option `keep_key: true` used, the column `key` will be preserved.
   3       4 D
   ```
-### `merge(other)`
+### `merge(*other)`
+  Merge another DataFrame or Table onto the right of self as additional columns. The size of other must be the same as self. Self and other must not share the same key.
-  Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
+  The alias is `bind_cols`.
   ```ruby
   df

data/doc/DataFrame_Comparison.md ADDED

@@ -0,0 +1,65 @@
+# Comparison of DataFrames
+Compare basic features of RedAmber with Python
+[pandas](https://pandas.pydata.org/),
+R [Tidyverse](https://www.tidyverse.org/) and
+Julia [DataFrames.jl](https://dataframes.juliadata.org/stable/).
+## Select columns (variables)
+| Features                        |	RedAmber        |	Tidyverse 	                     | pandas                                 | DataFrames.jl     |
+|---                              |---              |---                              |---                                     |---                |
+| Select columns as a dataframe   |	pick, drop, [] 	| dplyr::select, dplyr::select_if | [], loc[], iloc[], drop, select_dtypes | [], select        |
+| Select a column as a vector     | 	[], v 	        | dplyr::pull, [, x]	             | [], loc[], iloc[]                      | [!, :x]           |
+| Move columns to a new position  |	pick, [] 	      | relocate                        | [], reindex, loc[], iloc[]             | select,transform  |
+## Select rows (records, observations)
+| Features                                              |	RedAmber 	        | Tidyverse                   | pandas                   | DataFrames.jl |
+|---                                                    |---                |---                          |---                       |---            |
+| Select rows that meet logical criteria as a dataframe |	slice, remove, [] | 	dplyr::filter              |	[], filter, query, loc[] | filter        |
+| Select rows by position as a dataframe 	              | slice, remove, [] | dplyr::slice 	              | iloc[], drop             | subset        |
+| Move rows to a new position 	                         | slice, [] 	       | dplyr::filter, dplyr::slice |	reindex, loc[], iloc[]   | permute       |
+## Update columns / create new columns
+|Features 	                         | RedAmber 	          | Tidyverse 	                                        | pandas            | DataFrames.jl |
+|---                                |---                  |---                                                 |---                |---            |
+| Update existing columns           |	assign 	            | dplyr::mutate                                     	| assign, []=       | mapcols       |
+| Create new columns 	              | assign, assign_left |	dplyr::mutate 	                                    | apply             | insertcols,.+ |
+| Compute new columns, drop others 	| new 	               | transmute 	                                        | (dfply:)transmute | transform,insertcols,mapcols |
+| Rename columns 	                  | rename              |	dplyr::rename, dplyr::rename_with, purrr::set_names |	rename, set_axis  | rename        |
+| Sort dataframe 	                  | sort 	              | dplyr::arrange 	                                    | sort_values       | sort          |
+## Reshape dataframe
+| Features 	                                           | RedAmber 	| Tidyverse 	         | pandas       | DataFrames.jl |
+|---                                                   |---        |---                  |---           |---            |
+| Gather columns into rows (create a longer dataframe) |	to_long 	 | tidyr::pivot_longer |	melt         | stack         |
+| Spread rows into columns (create a wider dataframe)  | to_wide 	 | tidyr::pivot_wider 	| pivot        | unstack       |
+| transpose a wide dataframe 	                         | transpose | transpose, t 	      | transpose, T | permutedims   |
+## Grouping
+| Features | RedAmber 	              | Tidyverse 	                          | pandas       | DataFrames.jl   |
+|---       |---                      |---                                   |---           |---              |
+|Grouping 	| group, group.summarize 	| dplyr::group_by %>% dplyr::summarise | groupby.agg  | combine,groupby |
+## Combine dataframes or tables
+| Features 	                               |  RedAmber 	                    | Tidyverse          | pandas  | DataFrames.jl |
+|---                                       |---                             |---                 |---      |---            |
+| Combine additional columns               | merge, bind_cols               | dplyr::bind_cols   | concat  | combine       |
+| Combine additional rows 	                | concatenate, concat, bind_rows |	dplyr::bind_rows 	 | concat  | transform     |
+| Join right to left, leaving only the matching rows| join, inner_join      | dplyr::inner_join  | merge   | innerjoin     |
+| Join right to left, leaving all rows     | join, full_join, outer_join 	  | dplyr::full_join   | merge   | outerjoin     |
+| Join matching values to left from right  | join, left_join                |	dplyr::left_join 	 | merge   | leftjoin      |
+| Join matching values from left to right  | join, right_join               |	dplyr::right_join  | merge   | rightjoin     |
+| Return rows of left that have a match in right | join, semi_join 	        | dplyr::semi_join 	 | [isin]  | semijoin      |
+| Return rows of left that do not have a match in right | join, anti_join   |	dplyr::anti_join 	 | [isin]  | antijoin      |
+| Collect rows that appear in left or right | union 	                       | dplyr::union       | merge   |               |
+| Collect rows that appear in both left and right | intersect 	             | dplyr::intersect 	 | merge   |               |
+| Collect rows that appear in left but not right | difference, setdiff      | dplyr::setdiff 	   | merge   |               |
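
The last three rows of the table (union, intersect, difference) treat whole rows as set members. A minimal plain-Ruby sketch of that idea follows. It does not use RedAmber: each row is a Hash, and the sample rows are chosen here for illustration so that the two differences agree with the outputs shown in the DataFrame.md hunk. The row order RedAmber returns is not claimed.

```ruby
# Plain-Ruby sketch only; rows are Hashes, frames are Arrays of rows.
df = [
  { KEY1: 'A', KEY2: 1 },
  { KEY1: 'B', KEY2: 2 },
  { KEY1: 'C', KEY2: 3 }
]
other = [
  { KEY1: 'A', KEY2: 1 },
  { KEY1: 'B', KEY2: 4 },
  { KEY1: 'D', KEY2: 5 }
]

# Rows that appear in left or right (duplicates removed).
union = df | other

# Rows that appear in both left and right.
intersect = df & other
# => [{ KEY1: 'A', KEY2: 1 }]

# Rows that appear in left but not in right, and the reverse.
difference         = df - other
# => [{ KEY1: 'B', KEY2: 2 }, { KEY1: 'C', KEY2: 3 }]
reverse_difference = other - df
# => [{ KEY1: 'B', KEY2: 4 }, { KEY1: 'D', KEY2: 5 }]
```

Ruby compares Hashes by content, so `|`, `&` and `-` on Arrays of Hashes behave as row-wise set operations.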

data/doc/SubFrames.md ADDED

@@ -0,0 +1,11 @@
+# SubFrames
+`SubFrames` represents a collection of subsets of a DataFrame.
+It has an Array of indices `#subset_indices` which is able to create an Array of sub DataFrames.
+The concept covers the `group` operation of a DataFrame and rolling window operations, and has broader capabilities.
+This feature is experimental. It may be removed or be changed in the future.
+## Create SubFrames
+## Properties of SubFrames

data/doc/Vector.md CHANGED

@@ -182,6 +182,31 @@ boolean.all(skip_nulls: true) #=> true
 boolean.all(skip_nulls: false) #=> false
 ```
+### Check if `function` is an aggregation function: `Vector.aggregate?(function)`
+Return true if `function` is a unary aggregation function. Otherwise return false.
+### Treat aggregation function as an element-wise function: `propagate(function)`
+Spread the return value of an aggregate function as if it is an element-wise function.
+```ruby
+vec = Vector.new(1, 2, 3, 4)
+vec.propagate(:mean)
+# =>
+#<RedAmber::Vector(:double, size=4):0x000000000001985c>
+[2.5, 2.5, 2.5, 2.5]
+```
+`#propagate` also accepts a block to compute with a customized aggregation function yielding a scalar.
+```ruby
+vec.propagate { |v| v.mean.round }
+# =>
+#<RedAmber::Vector(:uint8, size=4):0x000000000000cb98>
+[3, 3, 3, 3]
+```
 ### Unary element-wise: `vector.func => vector`
   ![unary element-wise](doc/image/../../image/vector/unary_element_wise.png)
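
The `propagate` behaviour added in this hunk (compute one scalar, then repeat it for every element) can be sketched in plain Ruby. This is not RedAmber's implementation: `propagate` below is a hypothetical stand-alone method that works on an Array and takes the aggregation as a block.

```ruby
# Plain-Ruby sketch only; it does not call RedAmber::Vector#propagate.

# Hypothetical helper: aggregate `values` to a scalar with the block,
# then spread that scalar over an Array of the same size.
def propagate(values)
  scalar = yield(values)
  Array.new(values.size, scalar)
end

values = [1, 2, 3, 4]

means = propagate(values) { |v| v.sum.fdiv(v.size) }
# => [2.5, 2.5, 2.5, 2.5]

rounded = propagate(values) { |v| v.sum.fdiv(v.size).round }
# => [3, 3, 3, 3]
```

The result keeps the length of the input, which is what lets an aggregate sit next to the original column in an element-wise expression.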
@@ -305,7 +330,7 @@ double.round(n_digits: -1)
   Returns index of specified element.
-### `quantiles(probs = [1.0, 0.75, 0.5, 0.25, 0.0], interpolation: :linear, skip_nils: true, min_count: 0)`
+### `quantiles(probs = [0.0, 0.25, 0.5, 0.75, 1.0], interpolation: :linear, skip_nils: true, min_count: 0)`
   Returns quantiles for specified probabilities in a DataFrame.
@@ -601,3 +626,184 @@ vector.merge(other, sep: '')
 #<RedAmber::Vector(:string, size=3):0x0000000000038b80>
 ["ab", "cd", "ef"]
 ```
+### `concatenate(other)` or `concat(other)`
+Concatenate other array-like to self and return a concatenated Vector.
+- `other` is one of `Vector`, `Array`, `Arrow::Array` or `Arrow::ChunkedArray`
+- Different type will be 'resolved'.
+Concatenate to string
+```ruby
+string_vector
+# =>
+#<RedAmber::Vector(:string, size=2):0x00000000000037b4>
+["A", "B"]
+string_vector.concatenate([1, 2])
+# =>
+#<RedAmber::Vector(:string, size=4):0x0000000000003818>
+["A", "B", "1", "2"]
+```
+Concatenate to integer
+```ruby
+integer_vector
+# =>
+#<RedAmber::Vector(:uint8, size=2):0x000000000000382c>
+[1, 2]
+integer_vector.concatenate(["A", "B"])
+# =>
+#<RedAmber::Vector(:uint8, size=4):0x0000000000003840>
+[1, 2, 65, 66]
+```
+### `rank`
+Returns numerical rank of self.
+- Nil values are considered greater than any value.
+- NaN values are considered greater than any value but smaller than nil values.
+- Tiebreakers are ranked in order of appearance.
+- `RankOptions` in C++ function is not implemented in C GLib yet.
+  This method is currently fixed to the default behavior.
+Returns 0-based rank of self (0...size in range) as a Vector.
+Rank of float Vector
+```ruby
+fv = Vector.new(0.1, nil, Float::NAN, 0.2, 0.1); fv
+# =>
+#<RedAmber::Vector(:double, size=5):0x000000000000c65c>
+[0.1, nil, NaN, 0.2, 0.1]
+fv.rank
+# =>
+#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+[0, 4, 3, 2, 1]
+```
+Rank of string Vector
+```ruby
+sv = Vector.new("A", "B", nil, "A", "C"); sv
+# =>
+#<RedAmber::Vector(:string, size=5):0x0000000000003854>
+["A", "B", nil, "A", "C"]
+sv.rank
+# =>
+#<RedAmber::Vector(:uint64, size=5):0x0000000000003868>
+[0, 2, 4, 1, 3]
+```
+### `sample(integer_or_proportion)`
+Pick up elements at random.
+#### `sample` : without argument
+Return a randomly selected element.
+This is one of the aggregation functions.
+```ruby
+v = Vector.new('A'..'H'); v
+# =>
+#<RedAmber::Vector(:string, size=8):0x0000000000011b20>
+["A", "B", "C", "D", "E", "F", "G", "H"]
+v.sample
+# =>
+"C"
+```
+#### `sample(n)` : n as an Integer
+Pick up n elements at random.
+- Param `n` is number of elements to pick.
+- `n` is a positive Integer
+- If `n` is smaller than or equal to `size`, elements are picked without repetition.
+- If `n` is greater than `size`, elements are picked repeatedly.
+- Returns the sampled elements as a Vector.
+- If `n == 1` (in case of `sample(1)`), it returns a Vector of `size == 1`, not a scalar.
+```ruby
+v.sample(1)
+# =>
+#<RedAmber::Vector(:string, size=1):0x000000000001a3b0>
+["H"]
+```
+Sample same size of self: every element is picked in random order.
+```ruby
+v.sample(8)
+# =>
+#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+["H", "D", "B", "F", "E", "A", "G", "C"]
+```
+Over sampling: "E" and "A" are sampled repeatedly.
+```ruby
+v.sample(9)
+# =>
+#<RedAmber::Vector(:string, size=9):0x000000000001d790>
+["E", "E", "A", "D", "H", "C", "A", "F", "H"]
+```
+#### `sample(prop)` : prop as a Float
+Pick up elements by proportion `prop` at random.
+- `prop` is proportion of elements to pick.
+- `prop` is a positive Float.
+- The absolute number of elements to pick, `prop*size`, is rounded (by `half: :up`).
+- If `prop` is smaller than or equal to 1.0, elements are picked without repetition.
+- If `prop` is greater than 1.0, some elements are picked repeatedly.
+- Returns the sampled elements as a Vector.
+- If only one element is picked, it returns a Vector of `size == 1`, not a scalar.
+Sample same size of self: every element is picked in random order.
+```ruby
+v.sample(1.0)
+# =>
+#<RedAmber::Vector(:string, size=8):0x000000000001bda0>
+["D", "H", "F", "C", "A", "B", "E", "G"]
+```
+2 times over sampling.
+```ruby
+v.sample(2.0)
+# =>
+#<RedAmber::Vector(:string, size=16):0x00000000000233e8>
+["H", "B", "C", "B", "C", "A", "F", "A", "E", "C", "H", "F", "F", "A", ... ]
+```
+### `sort(integer_or_proportion)`
+Arrange values in Vector.
+- `:+`, `:ascending` or without argument will sort in increasing order.
+- `:-` or `:descending` will sort in decreasing order.
+```ruby
+Vector.new(%w[B D A E C]).sort
+# same as #sort(:+)
+# same as #sort(:ascending)
+# =>
+#<RedAmber::Vector(:string, size=5):0x000000000000c134>
+["A", "B", "C", "D", "E"]
+Vector.new(%w[B D A E C]).sort(:-)
+# same as #sort(:descending)
+# =>
+#<RedAmber::Vector(:string, size=5):0x000000000000c148>
+["E", "D", "C", "B", "A"]
+```
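
The ordering rules listed for `rank` and the size rules listed for `sample(n)` in this hunk can be checked with a plain-Ruby sketch. It does not call RedAmber: `rank` and `sample_n` below are hypothetical helpers over Arrays, and drawing with replacement when `n` exceeds the size is an assumption read off the over-sampling example, not a statement about the library's internals.

```ruby
# Plain-Ruby sketch only; it does not call RedAmber.

# Hypothetical helper: 0-based rank where ties keep order of appearance,
# NaN sorts after every ordinary value, and nil sorts after NaN.
def rank(values)
  order = values.each_index.sort_by do |i|
    v = values[i]
    bucket = if v.nil? then 2
             elsif v.is_a?(Float) && v.nan? then 1
             else 0
             end
    [bucket, bucket.zero? ? v : 0, i]
  end
  ranks = Array.new(values.size)
  order.each_with_index { |index, position| ranks[index] = position }
  ranks
end

rank([0.1, nil, Float::NAN, 0.2, 0.1]) # => [0, 4, 3, 2, 1]
rank(['A', 'B', nil, 'A', 'C'])        # => [0, 2, 4, 1, 3]

# Hypothetical helper: no repeats while n fits in the Array,
# independent draws (assumed: with replacement) once n exceeds it.
def sample_n(values, n, random: Random.new)
  if n <= values.size
    values.sample(n, random: random)
  else
    Array.new(n) { values.sample(random: random) }
  end
end

letters = ('A'..'H').to_a
shuffled    = sample_n(letters, 8) # every letter once, in random order
oversampled = sample_n(letters, 9) # 9 letters, so at least one repeats
```

The two `rank` calls reproduce the outputs shown in the documentation examples above; the sampled values differ on every run.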

data/doc/yard-templates/default/fulldoc/html/css/common.css ADDED

@@ -0,0 +1,6 @@
+/* Override this file with custom rules */
+/* Use monospace font for code */
+code {
+  font-family: "Courier New", Consolas, monospace;
+}