RubyGems - red_amber - Versions diffs - 0.1.3 → 0.1.4 - Mend

red_amber 0.1.3 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

checksums.yaml +4 -4
data/.rubocop.yml +9 -4
data/CHANGELOG.md +60 -8
data/README.md +41 -349
data/doc/DataFrame.md +690 -0
data/doc/Vector.md +195 -0
data/doc/image/TDR_operations.pdf +0 -0
data/doc/image/arrow_table_new.png +0 -0
data/doc/image/dataframe/assign.png +0 -0
data/doc/image/dataframe/drop.png +0 -0
data/doc/image/dataframe/pick.png +0 -0
data/doc/image/dataframe/remove.png +0 -0
data/doc/image/dataframe/rename.png +0 -0
data/doc/image/dataframe/slice.png +0 -0
data/doc/image/dataframe_model.png +0 -0
data/doc/image/example_in_red_arrow.png +0 -0
data/doc/image/tdr.png +0 -0
data/doc/image/tdr_and_table.png +0 -0
data/doc/image/tidy_data_in_TDR.png +0 -0
data/doc/image/vector/binary_element_wise.png +0 -0
data/doc/image/vector/unary_aggregation.png +0 -0
data/doc/image/vector/unary_aggregation_w_option.png +0 -0
data/doc/image/vector/unary_element_wise.png +0 -0
data/doc/tdr.md +53 -0
data/doc/tdr_ja.md +53 -0
data/lib/red_amber/data_frame.rb +22 -15
data/lib/red_amber/{data_frame_output.rb → data_frame_displayable.rb} +44 -37
data/lib/red_amber/data_frame_helper.rb +64 -0
data/lib/red_amber/data_frame_observation_operation.rb +72 -0
data/lib/red_amber/data_frame_selectable.rb +21 -43
data/lib/red_amber/data_frame_variable_operation.rb +133 -0
data/lib/red_amber/vector_functions.rb +54 -29
data/lib/red_amber/version.rb +1 -1
data/lib/red_amber.rb +4 -1
metadata +27 -3

data/doc/Vector.md ADDED

@@ -0,0 +1,195 @@
+# Vector
+Class `RedAmber::Vector` represents a series of data in the DataFrame.
+## Constructor
+### Create from a column in a DataFrame
+  ```ruby
+  df = RedAmber::DataFrame.new(x: [1, 2, 3])
+  df[:x]
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f4ec>
+  [1, 2, 3]
+  ```
+### New from an Array
+  ```ruby
+  vector = RedAmber::Vector.new([1, 2, 3])
+  # =>
+  #<RedAmber::Vector(:uint8, size=3):0x000000000000f514>
+  [1, 2, 3]
+  ```
+## Properties
+### `to_s`
+### `values`, `to_a`, `entries`
+### `size`, `length`, `n_rows`, `nrow`
+### `type`
+### `data_type`
+### [ ] `each` (not implemented yet)
+### [ ] `chunked?` (not implemented yet)
+### [ ] `n_chunks` (not implemented yet)
+### [ ] `each_chunk` (not implemented yet)
+### `tally`
+### `n_nils`, `n_nans`
+  - `n_nulls` is an alias of `n_nils`
+### `inspect(limit: 80)`
+  - `limit` sets the size limit for displaying a long array.
+    ```ruby
+    vector = RedAmber::Vector.new((1..50).to_a)
+    # =>
+    #<RedAmber::Vector(:uint8, size=50):0x000000000000f528>
+    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, ... ]
+    ```
+## Functions
+### Unary aggregations: `vector.func => scalar`
+  ![unary aggregation](doc/image/../../image/vector/unary_aggregation_w_option.png)
+| Method    |Boolean|Numeric|String|Options|Remarks|
+| ----------- | --- | --- | --- | --- | --- |
+| ✓ `all`     |  ✓  |     |     | ✓ ScalarAggregate|     |
+| ✓ `any`     |  ✓  |     |     | ✓ ScalarAggregate|     |
+| ✓ `approximate_median`|  |✓|  | ✓ ScalarAggregate| alias `median`|
+| ✓ `count`   |  ✓  |  ✓  |  ✓  | ✓  Count  |     |
+| ✓ `count_distinct`| ✓ | ✓ | ✓ | ✓  Count  |alias `count_uniq`|
+|[ ]`index`   | [ ] | [ ] | [ ] |[ ] Index  |     |
+| ✓ `max`     |  ✓  |  ✓  |  ✓  | ✓ ScalarAggregate|     |
+| ✓ `mean`    |  ✓  |  ✓  |     | ✓ ScalarAggregate|     |
+| ✓ `min`     |  ✓  |  ✓  |  ✓  | ✓ ScalarAggregate|     |
+| ✓ `min_max` |  ✓  |  ✓  |  ✓  | ✓ ScalarAggregate|     |
+|[ ]`mode`    |     | [ ] |     |[ ] Mode    |     |
+| ✓ `product` |  ✓  |  ✓  |     | ✓ ScalarAggregate|     |
+|[ ]`quantile`|     | [ ] |     |[ ] Quantile|     |
+| ✓ `sd    `  |     |  ✓  |     |          |ddof: 1 at `stddev`|
+| ✓ `stddev`  |     |  ✓  |     | ✓ Variance|ddof: 0 by default|
+| ✓ `sum`     |  ✓  |  ✓  |     | ✓ ScalarAggregate|     |
+|[ ]`tdigest` |     | [ ] |     |[ ] TDigest |     |
+| ✓ `var    `|     |  ✓  |     |   |ddof: 1 at `variance`<br>alias `unbiased_variance`|
+| ✓ `variance`|     |  ✓  |     | ✓ Variance|ddof: 0 by default|
+Options can be passed as follows.
+See the [documentation of the C++ compute functions](https://arrow.apache.org/docs/cpp/compute.html) for details.
+```ruby
+double = RedAmber::Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
+#=>
+#<RedAmber::Vector(:double, size=6):0x000000000000f910>
+[1.0, NaN, -Infinity, Infinity, nil, 0.0]
+double.count #=> 5
+double.count(opts: {mode: :only_valid}) #=> 5, default
+double.count(opts: {mode: :only_null}) #=> 1
+double.count(opts: {mode: :all}) #=> 6
+boolean = RedAmber::Vector.new([true, true, nil])
+#=>
+#<RedAmber::Vector(:boolean, size=3):0x000000000000f924>
+[true, true, nil]
+boolean.all #=> true
+boolean.all(opts: {skip_nulls: true}) #=> true
+boolean.all(opts: {skip_nulls: false}) #=> false
+```
+### Unary element-wise: `vector.func => vector`
+  ![unary element-wise](doc/image/../../image/vector/unary_element_wise.png)
+| Method    |Boolean|Numeric|String|Options|Remarks|
+| ------------ | --- | --- | --- | --- | ----- |
+| ✓ `-@`       |     |  ✓  |     |     |as `-vector`|
+| ✓ `negate`   |     |  ✓  |     |     |`-@`   |
+| ✓ `abs`      |     |  ✓  |     |     |       |
+|[ ]`acos`     |     | [ ] |     |     |       |
+|[ ]`asin`     |     | [ ] |     |     |       |
+| ✓ `atan`     |     |  ✓  |     |     |       |
+| ✓ `bit_wise_not`|  | (✓) |     |     |integer only|
+|[ ]`ceil`     |     |  ✓  |     |     |       |
+| ✓ `cos`      |     |  ✓  |     |     |       |
+|[ ]`floor`    |     |  ✓  |     |     |       |
+| ✓ `invert`   |  ✓  |     |     |     |`!`, alias `not`|
+|[ ]`ln`       |     | [ ] |     |     |       |
+|[ ]`log10`    |     | [ ] |     |     |       |
+|[ ]`log1p`    |     | [ ] |     |     |       |
+|[ ]`log2`     |     | [ ] |     |     |       |
+|[ ]`round`    |     | [ ] |     |[ ] Round|       |
+|[ ]`round_to_multiple`| | [ ] | |[ ] RoundToMultiple|       |
+| ✓ `sign`     |     |  ✓  |     |     |       |
+| ✓ `sin`      |     |  ✓  |     |     |       |
+| ✓ `tan`      |     |  ✓  |     |     |       |
+|[ ]`trunc`    |     |  ✓  |     |     |       |
+### Binary element-wise: `vector.func(vector) => vector`
+  ![binary element-wise](doc/image/../../image/vector/binary_element_wise.png)
+| Method       |Boolean|Numeric|String|Options|Remarks|
+| ----------------- | --- | --- | --- | --- | ----- |
+| ✓ `add`           |     |  ✓  |     |     | `+`   |
+| ✓ `atan2`         |     |  ✓  |     |     |       |
+| ✓ `and_kleene`    |  ✓  |     |     |     | `&`   |
+| ✓ `and_org   `    |  ✓  |     |     |     |`and` in Red Arrow|
+| ✓ `and_not`       |  ✓  |     |     |     |       |
+| ✓ `and_not_kleene`|  ✓  |     |     |     |       |
+| ✓ `bit_wise_and`  |     | (✓) |     |     |integer only|
+| ✓ `bit_wise_or`   |     | (✓) |     |     |integer only|
+| ✓ `bit_wise_xor`  |     | (✓) |     |     |integer only|
+| ✓ `divide`        |     |  ✓  |     |     | `/`   |
+| ✓ `equal`         |  ✓  |  ✓  |  ✓  |     |`==`, alias `eq`|
+| ✓ `greater`       |  ✓  |  ✓  |  ✓  |     |`>`, alias `gt`|
+| ✓ `greater_equal` |  ✓  |  ✓  |  ✓  |     |`>=`, alias `ge`|
+| ✓ `is_finite`     |     |  ✓  |     |     |       |
+| ✓ `is_inf`        |     |  ✓  |     |     |       |
+| ✓ `is_na`         |  ✓  |  ✓  |  ✓  |     |       |
+| ✓ `is_nan`        |     |  ✓  |     |     |       |
+|[ ]`is_nil`        |  ✓  |  ✓  |  ✓  |[ ] Null|alias `is_null`|
+| ✓ `is_valid`      |  ✓  |  ✓  |  ✓  |     |       |
+| ✓ `less`          |  ✓  |  ✓  |  ✓  |     |`<`, alias `lt`|
+| ✓ `less_equal`    |  ✓  |  ✓  |  ✓  |     |`<=`, alias `le`|
+|[ ]`logb`          |     | [ ] |     |     |       |
+|[ ]`mod`           |     | [ ] |     |     | `%`   |
+| ✓ `multiply`      |     |  ✓  |     |     | `*`   |
+| ✓ `not_equal`     |  ✓  |  ✓  |  ✓  |     |`!=`, alias `ne`|
+| ✓ `or_kleene`     |  ✓  |     |     |     | `\|`  |
+| ✓ `or_org`        |  ✓  |     |     |     |`or` in Red Arrow|
+| ✓ `power`         |     |  ✓  |     |     | `**`  |
+| ✓ `subtract`      |     |  ✓  |     |     | `-`   |
+| ✓ `shift_left`    |     | (✓) |     |     |`<<`, integer only|
+| ✓ `shift_right`   |     | (✓) |     |     |`>>`, integer only|
+| ✓ `xor`           |  ✓  |     |     |     | `^`   |
+(Functions not implemented yet)
+### [ ] sort, sort_index
+### [ ] argmin, argmax
+### [ ] (array functions)
+### [ ] (strings functions)
+### [ ] (temporal functions)
+### [ ] (conditional functions)
+### [ ] (index functions)
+### [ ] (other functions)
+## Coerce (not implemented)
+## Updating (not implemented)
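
Note on the `ddof` remarks in the aggregation table above: `stddev` and `variance` default to `ddof: 0`, while `sd` and `var` are described as the `ddof: 1` variants. The difference can be sketched in plain Ruby as follows. This is a minimal illustration of the arithmetic only, not the RedAmber or Arrow implementation; the method names `variance_with_ddof` and `stddev_with_ddof` are hypothetical.

```ruby
# Variance with a configurable "delta degrees of freedom" (ddof).
# ddof: 0 divides by n (population variance),
# ddof: 1 divides by n - 1 (unbiased sample variance).
# Hypothetical helper for illustration; not part of RedAmber.
def variance_with_ddof(values, ddof: 0)
  n = values.size
  mean = values.sum.to_f / n
  squared_deviations = values.sum { |v| (v - mean)**2 }
  squared_deviations / (n - ddof)
end

# Standard deviation is the square root of the variance above.
def stddev_with_ddof(values, ddof: 0)
  Math.sqrt(variance_with_ddof(values, ddof: ddof))
end

data = [1, 2, 3, 4]
population_variance = variance_with_ddof(data)          # like `variance` (ddof: 0)
sample_variance = variance_with_ddof(data, ddof: 1)     # like `var` (ddof: 1)
sample_sd = stddev_with_ddof(data, ddof: 1)             # like `sd` (ddof: 1)
```

For small vectors the two conventions differ noticeably, which is presumably why the docs call out the default on each row.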

data/doc/image/TDR_operations.pdf ADDED

Binary file

data/doc/image/arrow_table_new.png ADDED

Binary file

data/doc/image/dataframe/assign.png ADDED

Binary file

data/doc/image/dataframe/drop.png ADDED

Binary file

data/doc/image/dataframe/pick.png ADDED

Binary file

data/doc/image/dataframe/remove.png ADDED

Binary file

data/doc/image/dataframe/rename.png ADDED

Binary file

data/doc/image/dataframe/slice.png ADDED

Binary file

data/doc/image/dataframe_model.png ADDED

Binary file

data/doc/image/example_in_red_arrow.png ADDED

Binary file

data/doc/image/tdr.png ADDED

Binary file

data/doc/image/tdr_and_table.png ADDED

Binary file

data/doc/image/tidy_data_in_TDR.png ADDED

Binary file

data/doc/image/vector/binary_element_wise.png ADDED

Binary file

data/doc/image/vector/unary_aggregation.png ADDED

Binary file

data/doc/image/vector/unary_aggregation_w_option.png ADDED

Binary file

data/doc/image/vector/unary_element_wise.png ADDED

Binary file

data/doc/tdr.md ADDED

@@ -0,0 +1,53 @@
+# TDR (Transposed DataFrame Representation)
+(A [Japanese version](tdr_ja.md) of this document is available.)
+TDR is a presentation style for 2D data. It shows each columnar vector laterally, as a *row Vector*, and each observation vertically, as a *column*, just like a **transposed** table.
+![TDR Image](image/tdr.png)
+In the context of the Arrow Columnar Format, a row-oriented data table (1) and a columnar data table (2) are laid out differently in memory, but we picture both with the same placement of data in rows and columns.
+TDR (3) is a logical view of data placement that transposes the rows and columns of a columnar table (2).
+![TDR and Table Image](image/tdr_and_table.png)
+TDR is not a software implementation but a logical image in our minds.
+TDR is consistent with the 'transposed' tidy data concept. The only thing to keep in mind is to avoid the positional words 'row' and 'column'.
+![tidy data in TDR](image/tidy_data_in_TDR.png)
+TDR is one of the simplest ways to create a DataFrame object in many libraries. For example, we can initialize Arrow::Table in Red Arrow as shown on the right below and get the table shown on the left.
+![Arrow Table New](image/arrow_table_new.png)
+We already use TDR-style code naturally. Other examples:
+  - Ruby: Daru::DataFrame and Rover::DataFrame accept the same arguments.
+  - Python: similar style in Pandas for pd.DataFrame(data_in_dict)
+  - R: similar style in tidyr for tibble(x = 1:3, y = c("A", "B", "C"))
+There are other ways to initialize a data frame, but they are less intuitive.
+## Table and TDR API
+The API based on TDR is a draft, and RedAmber is a small experiment to test the TDR concept. The following table compares Table and TDR (draft).
+|     |Basic Table|Transposed DataFrame|Comment for TDR|
+|-----------|---------|------------|---|
+|name in TDR|`Table`|`TDR`|**T**ransposed **D**ataFrame **R**epresentation|
+|variable   |located in a column|a key and a `Vector` laid out laterally|select by key|
+|observation|located in a row|a slice along the vertical axis|select by index|
+|number of rows|n_rows etc. |`size` |`n_rows` is available as an alias|
+|number of columns|n_columns etc. |`n_keys`  |`n_cols` is available as an alias|
+|shape      |[n_rows, n_columns]  |`[size, n_keys]` |same order as Table|
+|merge/join left| left_join(a,b)<br>merge(a, b, how='left')|`a.join(b)` |naturally join from bottom|
+|merge/join right| right_join(a,b)<br>merge(a, b, how='right')|`b.join(a)` |naturally join from bottom|
+## Operation example with TDR API
+[Operation example with TDR API](image/TDR_operations.pdf) (draft)
+## Q and A for TDR
+(Not prepared yet)
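
To make the TDR idea in this document concrete: variables are held as key and values pairs laid out laterally, and an observation is a vertical slice across them. The following is a minimal plain-Ruby sketch using a Hash of Arrays. It is not RedAmber code; `tdr_shape` and `tdr_observations` are hypothetical helper names chosen for illustration.

```ruby
# A TDR-style layout: each key labels one variable (a lateral vector).
# Plain Hash of Arrays; this is an illustration, not RedAmber's DataFrame.
def tdr_shape(tdr)
  size = tdr.empty? ? 0 : tdr.values.first.size # number of observations
  [size, tdr.keys.size]                         # [size, n_keys]
end

# Observations are vertical slices: the i-th element of every variable.
def tdr_observations(tdr)
  keys = tdr.keys
  tdr.values.transpose.map { |values| keys.zip(values).to_h }
end

tdr = { x: [1, 2, 3], y: %w[A B C] }
shape = tdr_shape(tdr)
observations = tdr_observations(tdr)
```

Here `shape` keeps the same order as a conventional table, `[size, n_keys]`, while `observations` recovers the row-oriented view by transposing.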

data/doc/tdr_ja.md ADDED

@@ -0,0 +1,53 @@
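+# TDR (Transposed DataFrame Representation)
+(An [English version](tdr.md) is also available)
+TDR is the name we gave to a way of representing two-dimensional data. In TDR, as shown in the figure below, data of the same type is labeled with a key and laid out horizontally, and these are stacked vertically to represent the data.
+![TDR Image](image/tdr.png)
+The Arrow Columnar Format handles data that is contiguous in the column direction (2), in contrast to conventional row-oriented data such as csv (1). The words "row" and "column" shape our mental image, so when we think of the structure of a data frame we probably picture something like (1) or (2). However, since the essence lies in the contiguous placement of data, we should be free to swap rows and columns in our minds, as in (3).
+![TDR and Table Image](image/tdr_and_table.png)
+The important point is that TDR is a logical image in our minds, not an implementation architecture.
+TDR is also consistent with the concept of tidy data. Tidy data in TDR represents exactly the same data with rows and columns swapped. The one thing to be careful about is that, to avoid confusion, we should avoid the positional and directional words "row" and "column".
+![tidy data in TDR](image/tidy_data_in_TDR.png)
+TDR is a notation that already makes it easy to initialize 2D data, and it is used quite naturally today. For example, in Red Arrow, Arrow::Table can be initialized as shown on the right of the figure below.
+![Arrow Table New](image/arrow_table_new.png)
+This is a very natural way to write it, and its form matches that of TDR. Other examples:
+  - Ruby: Daru::DataFrame and Rover::DataFrame can be written in the same way as above.
+  - Python: using a dict in Pandas, as in pd.DataFrame(data_in_dict), is the same.
+  - R: in tidyr, it can be written as tibble(x = 1:3, y = c("A", "B", "C")).
+This is not the only way to initialize a data frame in each of these libraries, but the other ways feel somewhat roundabout.
+That thinking in TDR works a little better is merely a hypothesis, but I believe the reason may be that "on this planet, we write code horizontally".
+## Table and TDR API
+The TDR-based API is still at a draft stage, and we regard RedAmber as a place to experiment with TDR. The table below compares the APIs of TDR and the conventional row x column Table (draft).
+|     |Conventional Table|Transposed DataFrame|Comment for TDR|
+|-----------|---------|------------|---|
+|name in TDR|`Table`|`TDR`|Short for **T**ransposed **D**ataFrame **R**epresentation|
+|variable |located in a column|`variables`<br>a key and a `Vector`, laid out laterally|select by key|
+|observation |located in a row|`observations`<br>each vertical cut is a `slice`|select by index or the `slice` method|
+|number of rows|nrow, n_rows, etc. |`size` |`n_rows` is set as an alias|
+|number of columns|ncol, n_columns, etc. |`n_keys`  |`n_cols` is set as an alias|
+|shape      |[nrow, ncol]  |`[size, n_keys]` |same order of rows and columns|
+|merge/join left| left_join(a,b)<br>merge(a, b, how='left')|`a.join(b)` |naturally attach at the bottom|
+|merge/join right| right_join(a,b)<br>merge(a, b, how='right')|`b.join(a)` |naturally attach at the bottom|
+## Operation example with TDR API
+[TDR operation example](image/TDR_operations.pdf) (draft)
+## Q and A for TDR
+(In preparation)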

data/lib/red_amber/data_frame.rb CHANGED

@@ -5,8 +5,11 @@ module RedAmber
   #   @table   : holds Arrow::Table object
   class DataFrame
     # mix-in
+    include DataFrameDisplayable
+    include DataFrameHelper
     include DataFrameSelectable
-    include DataFrameOutput
+    include DataFrameObservationOperation
+    include DataFrameVariableOperation
     def initialize(*args)
       # DataFrame.new, DataFrame.new([]), DataFrame.new({}), DataFrame.new(nil)
@@ -44,43 +47,42 @@ module RedAmber
     end
     # Properties ===
-    def n_rows
+    def size
       @table.n_rows
     end
-    alias_method :nrow, :n_rows
-    alias_method :size, :n_rows
-    alias_method :length, :n_rows
+    alias_method :n_rows, :size
+    alias_method :n_obs, :size
-    def n_columns
+    def n_keys
       @table.n_columns
     end
-    alias_method :ncol, :n_columns
-    alias_method :width, :n_columns
+    alias_method :n_cols, :n_keys
+    alias_method :n_vars, :n_keys
     def shape
-      [n_rows, n_columns]
+      [size, n_keys]
     end
-    def column_names
+    def keys
       @table.columns.map { |column| column.name.to_sym }
     end
-    alias_method :keys, :column_names
-    alias_method :header, :column_names
+    alias_method :column_names, :keys
+    alias_method :var_names, :keys
     def key?(key)
-      column_names.include?(key.to_sym)
+      keys.include?(key.to_sym)
     end
     alias_method :has_key?, :key?
     def key_index(key)
-      column_names.find_index(key.to_sym)
+      keys.find_index(key.to_sym)
     end
     alias_method :find_index, :key_index
     alias_method :index, :key_index
     def types
       @table.columns.map do |column|
-        column.data_type.to_s.to_sym
+        column.data.value_type.nick.to_sym
       end
     end
@@ -96,6 +98,11 @@ module RedAmber
       end
     end
+    def indexes
+      0...size
+    end
+    alias_method :indices, :indexes
     def to_h
       @table.columns.each_with_object({}) do |column, result|
         result[column.name.to_sym] = column.entries
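
The hunks above make `size`, `n_keys` and `keys` the primary methods and turn the table-style names into aliases. The naming scheme can be sketched with a small stand-in class. `MiniFrame` is hypothetical and backed by a plain Hash rather than an `Arrow::Table`; it only mirrors the method and alias names visible in this diff.

```ruby
# Hypothetical stand-in that mirrors the renamed properties in this diff.
# It wraps a Hash of Arrays instead of an Arrow::Table.
class MiniFrame
  def initialize(hash)
    @data = hash
  end

  # Number of observations; primary name after this change.
  def size
    @data.empty? ? 0 : @data.values.first.size
  end
  alias_method :n_rows, :size
  alias_method :n_obs, :size

  # Number of variables; primary name after this change.
  def n_keys
    @data.size
  end
  alias_method :n_cols, :n_keys
  alias_method :n_vars, :n_keys

  def shape
    [size, n_keys]
  end

  def keys
    @data.keys.map(&:to_sym)
  end
  alias_method :column_names, :keys
  alias_method :var_names, :keys

  # Range of valid observation indexes, as added in this diff.
  def indexes
    0...size
  end
  alias_method :indices, :indexes
end

frame = MiniFrame.new({ x: [1, 2, 3], y: %w[A B C] })
frame.shape # [size, n_keys]
```

Defining the TDR-flavoured name first and aliasing the table-flavoured one keeps older call sites such as `n_rows` and `column_names` working.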

data/lib/red_amber/{data_frame_output.rb → data_frame_displayable.rb} RENAMED

@@ -4,7 +4,7 @@ require 'stringio'
 module RedAmber
   # mix-ins for the class DataFrame
-  module DataFrameOutput
+  module DataFrameDisplayable
     def to_s
       @table.to_s
     end
@@ -13,19 +13,37 @@ module RedAmber
     # def summary() end
-    def inspect_raw
-      format "#<#{self.class}:0x%016x>\n#{self}", object_id
+    def inspect
+      "#<#{shape_str(with_id: true)}>\n#{dataframe_info(3)}"
     end
-    # - tally_level: max level to use tally mode
-    # - max_element: max element to show values in each row
-    # - TODO: Is it better to change name other than `inspect` ?
-    # - TODO: Fall back to inspect_raw when treating large dataset
-    # - TODO: Refactor code to smaller methods
-    def inspect(tally_level: 5, max_element: 5)
-      return '#<RedAmber::DataFrame (empty)>' if empty?
+    # - limit: max num of Vectors to show
+    # - tally: max level to use tally mode
+    # - elements: max element to show values in each vector
+    def tdr(limit = 10, tally: 5, elements: 5)
+      puts tdr_str(limit, tally: tally, elements: elements)
+    end
+    def tdr_str(limit = 10, tally: 5, elements: 5)
+      "#{shape_str}\n#{dataframe_info(limit, tally_level: tally, max_element: elements)}"
+    end
+    private # =====
+    def pl(num)
+      num > 1 ? 's' : ''
+    end
+    def shape_str(with_id: false)
+      shape_info = empty? ? '(empty)' : "#{size} x #{n_keys} Vector#{pl(n_keys)}"
+      id = with_id ? format(', 0x%016x', object_id) : ''
+      "#{self.class} : #{shape_info}#{id}"
+    end
-      stringio = StringIO.new # output string buffer
+    def dataframe_info(limit, tally_level: 5, max_element: 5)
+      return '' if empty?
+      limit = n_keys if [:all, -1].include? limit
       tallys = vectors.map(&:tally)
       levels = tallys.map(&:size)
@@ -34,48 +52,37 @@ module RedAmber
       headers = { idx: '#', key: 'key', type: 'type', levels: 'level', data: 'data_preview' }
       header_format = make_header_format(levels, headers, quoted_keys)
-      # 1st row: show shape of the dataframe
-      vs = "Vector#{pl(ncol)}"
-      stringio.puts \
-        "#{self.class} : #{nrow} x #{ncol} #{vs}"
-      # 2nd row: show var counts by type
-      stringio.puts "#{vs} : #{var_type_count(type_groups).join(', ')}"
-      # 3rd row: print header of rows
-      stringio.printf header_format, *headers.values
+      sio = StringIO.new # output string buffer
+      sio.puts "Vector#{pl(n_keys)} : #{var_type_count(type_groups).join(', ')}"
+      sio.printf header_format, *headers.values
-      # 4th row ~: show details for each column (vector)
       vectors.each.with_index do |vector, i|
+        if i >= limit
+          sio << " ... #{n_keys - i} more Vector#{pl(n_keys - i)} ...\n"
+          break
+        end
         key = quoted_keys[i]
         type = types[i]
         type_group = type_groups[i]
         data_tally = tallys[i]
         a = case type_group
             when :numeric, :string, :boolean
-              if data_tally.size <= tally_level && data_tally.size != nrow
+              if data_tally.size <= tally_level && data_tally.size != size
                 [data_tally.to_s]
               else
-                [shorthand(vector, nrow, max_element)].concat na_string(vector)
+                [shorthand(vector, size, max_element)].concat na_string(vector)
               end
             else
-              shorthand(vector, nrow, max_element)
+              shorthand(vector, size, max_element)
             end
-        stringio.printf header_format, i + 1, key, type, data_tally.size, a.join(', ')
+        sio.printf header_format, i + 1, key, type, data_tally.size, a.join(', ')
       end
-      stringio.string
-    end
-    private # =====
-    def pl(num)
-      num > 1 ? 's' : ''
+      sio.string
     end
     def make_header_format(levels, headers, quoted_keys)
       # find longest word to adjust column width
-      w_idx = ncol.to_s.size
+      w_idx = n_keys.to_s.size
       w_key = [quoted_keys.map(&:size).max, headers[:key].size].max
       w_type = [types.map(&:size).max, headers[:type].size].max
       w_row = [levels.map { |l| l.to_s.size }.max, headers[:levels].size].max
@@ -103,10 +110,10 @@ module RedAmber
       a
     end
-    def shorthand(vector, nrow, max_element)
+    def shorthand(vector, size, max_element)
       a = vector.to_a.take(max_element)
       a.map! { |e| e.nil? ? 'nil' : e.inspect }
-      a << '... ' if nrow > max_element
+      a << '... ' if size > max_element
       "[#{a.join(', ')}]"
     end

data/lib/red_amber/data_frame_helper.rb ADDED

@@ -0,0 +1,64 @@
+# frozen_string_literal: true
+module RedAmber
+  # mix-in for the class DataFrame
+  module DataFrameHelper
+    private
+    def expand_range(args)
+      args.each_with_object([]) do |e, a|
+        e.is_a?(Range) ? a.concat(normalized_array(e)) : a.append(e)
+      end
+    end
+    def normalized_array(range)
+      both_end = [range.begin, range.end]
+      both_end[1] -= 1 if range.exclude_end? && range.end.is_a?(Integer)
+      if both_end.any?(Integer) || both_end.all?(&:nil?)
+        if both_end.any? { |e| e&.>=(size) || e&.<(-size) }
+          raise DataFrameArgumentError, "Index out of range: #{range} for 0..#{size - 1}"
+        end
+        (0...size).to_a[range]
+      else
+        range.to_a
+      end
+    end
+    def out_of_range?(indeces)
+      indeces.max >= size || indeces.min < -size
+    end
+    def integers?(enum)
+      enum.all?(Integer)
+    end
+    def sym_or_str?(enum)
+      enum.all? { |e| e.is_a?(Symbol) || e.is_a?(String) }
+    end
+    def booleans?(enum)
+      enum.all? { |e| e.is_a?(TrueClass) || e.is_a?(FalseClass) || e.is_a?(NilClass) }
+    end
+    def create_dataframe_from_vector(key, vector)
+      DataFrame.new(key => vector.data)
+    end
+    def select_obs_by_boolean(array)
+      DataFrame.new(@table.filter(array))
+    end
+    def select_obs_by_indeces(indeces)
+      out_of_range?(indeces) && raise(DataFrameArgumentError, "Invalid index: #{indeces} for 0..#{size - 1}")
+      a = indeces.map { |i| @table.slice(i).to_a }
+      DataFrame.new(@table.schema, a)
+    end
+    def keys_by_booleans(booleans)
+      keys.select.with_index { |_, i| booleans[i] }
+    end
+  end
+end
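
The range handling in `expand_range` and `normalized_array` is the least obvious part of this helper, so here is a standalone sketch of it. The logic follows the added lines above, with two assumptions made so it runs outside the mix-in: `size` is passed as an explicit argument instead of being read from the DataFrame, and the built-in `IndexError` stands in for `DataFrameArgumentError`.

```ruby
# Expand a Range into concrete values. Integer (or open-ended) ranges are
# resolved against 0...size, so negative and endless ranges work;
# other ranges (e.g. of Symbols) are expanded with Range#to_a.
def normalized_array(range, size)
  both_end = [range.begin, range.end]
  both_end[1] -= 1 if range.exclude_end? && range.end.is_a?(Integer)

  if both_end.any?(Integer) || both_end.all?(&:nil?)
    if both_end.any? { |e| e&.>=(size) || e&.<(-size) }
      # IndexError is a stand-in for DataFrameArgumentError in the gem.
      raise IndexError, "Index out of range: #{range} for 0..#{size - 1}"
    end

    (0...size).to_a[range]
  else
    range.to_a
  end
end

# Flatten a mixed argument list, expanding any Range elements in place.
def expand_range(args, size)
  args.each_with_object([]) do |e, a|
    e.is_a?(Range) ? a.concat(normalized_array(e, size)) : a.append(e)
  end
end

expanded = expand_range([0, 2..3, -1..], 5)
```

Resolving ranges through `(0...size).to_a[range]` lets Ruby's own Array slicing handle negative and endless bounds, so only the out-of-range check needs to be written by hand.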