red_amber 0.1.3 → 0.1.6

Files changed (43)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +31 -7
  3. data/CHANGELOG.md +214 -10
  4. data/Gemfile +4 -0
  5. data/README.md +117 -342
  6. data/benchmark/csv_load_penguins.yml +15 -0
  7. data/benchmark/drop_nil.yml +11 -0
  8. data/doc/DataFrame.md +854 -0
  9. data/doc/Vector.md +449 -0
  10. data/doc/image/arrow_table_new.png +0 -0
  11. data/doc/image/dataframe/assign.png +0 -0
  12. data/doc/image/dataframe/drop.png +0 -0
  13. data/doc/image/dataframe/pick.png +0 -0
  14. data/doc/image/dataframe/remove.png +0 -0
  15. data/doc/image/dataframe/rename.png +0 -0
  16. data/doc/image/dataframe/slice.png +0 -0
  17. data/doc/image/dataframe_model.png +0 -0
  18. data/doc/image/example_in_red_arrow.png +0 -0
  19. data/doc/image/tdr.png +0 -0
  20. data/doc/image/tdr_and_table.png +0 -0
  21. data/doc/image/tidy_data_in_TDR.png +0 -0
  22. data/doc/image/vector/binary_element_wise.png +0 -0
  23. data/doc/image/vector/unary_aggregation.png +0 -0
  24. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  25. data/doc/image/vector/unary_element_wise.png +0 -0
  26. data/doc/tdr.md +56 -0
  27. data/doc/tdr_ja.md +56 -0
  28. data/lib/red-amber.rb +27 -0
  29. data/lib/red_amber/data_frame.rb +91 -37
  30. data/lib/red_amber/{data_frame_output.rb → data_frame_displayable.rb} +49 -41
  31. data/lib/red_amber/data_frame_indexable.rb +38 -0
  32. data/lib/red_amber/data_frame_observation_operation.rb +11 -0
  33. data/lib/red_amber/data_frame_selectable.rb +155 -48
  34. data/lib/red_amber/data_frame_variable_operation.rb +137 -0
  35. data/lib/red_amber/helper.rb +61 -0
  36. data/lib/red_amber/vector.rb +69 -16
  37. data/lib/red_amber/vector_functions.rb +80 -45
  38. data/lib/red_amber/vector_selectable.rb +124 -0
  39. data/lib/red_amber/vector_updatable.rb +104 -0
  40. data/lib/red_amber/version.rb +1 -1
  41. data/lib/red_amber.rb +1 -16
  42. data/red_amber.gemspec +3 -6
  43. metadata +38 -9
data/README.md CHANGED
@@ -1,20 +1,29 @@
1
1
  # RedAmber
2
2
 
3
- A simple dataframe library for Ruby (experimental)
3
+ A simple dataframe library for Ruby (experimental).
4
4
 
5
5
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
6
- - Simple API similar to [Rover-df](https://github.com/ankane/rover)
6
+ - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
7
7
 
8
8
  ## Requirements
9
9
 
10
10
  ```ruby
11
- gem 'red-arrow', '>= 7.0.0'
12
- gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet
11
+ gem 'red-arrow', '>= 8.0.0'
12
+ gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet
13
13
  gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
14
14
  ```
15
15
 
16
16
  ## Installation
17
17
 
18
+ Install the requirements before you install Red Amber.
19
+
20
+ - Apache Arrow GLib (>= 8.0.0)
21
+ - Apache Parquet GLib (>= 8.0.0)
22
+
23
+ See the [Apache Arrow install document](https://arrow.apache.org/install/).
24
+
25
+ A minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section of the CI test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
26
+
18
27
  Add this line to your Gemfile:
19
28
 
20
29
  ```ruby
@@ -23,134 +32,66 @@ gem 'red_amber'
23
32
 
24
33
  And then execute:
25
34
 
26
- $ bundle install
35
+ ```shell
36
+ bundle install
37
+ ```
27
38
 
28
39
  Or install it yourself as:
29
40
 
30
- $ gem install red_amber
31
-
32
- ## `RedAmber::DataFrame`
33
-
34
- ### Constructors and saving
35
-
36
- - [x] `new` from a columnar Hash
37
- - `RedAmber::DataFrame.new(x: [1, 2, 3])`
38
-
39
- - [x] `new` from a schema (by Hash) and rows (by Array)
40
- - `RedAmber::DataFrame.new({:x=>:uint8}, [[1], [2], [3]])`
41
-
42
- - [x] `new` from an Arrow::Table
43
- - `RedAmber::DataFrame.new(Arrow::Table.new(x: [1, 2, 3]))`
44
-
45
- - [x] `new` from a Rover::DataFrame
46
- - `RedAmber::DataFrame.new(Rover::DataFrame.new(x: [1, 2, 3]))`
47
-
48
- - [x] `load` (class method)
49
-
50
- - [x] from a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
51
- - `RedAmber::DataFrame.load("test/entity/with_header.csv")`
52
-
53
- - [x] from a string buffer
54
-
55
- - [x] from a URI
56
- - `RedAmber::DataFrame.load(URI("https://github.com/heronshoes/red_amber/blob/master/test/entity/with_header.csv"))`
57
-
58
- - [x] from a Parquet file
59
-
60
- `red-parquet` gem is required.
61
-
62
- ```ruby
63
- require 'parquet'
64
- dataframe = RedAmber::DataFrame.load("file.parquet")
65
- ```
66
-
67
- - [x] `save` (instance method)
68
-
69
- - [x] to a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
70
-
71
- - [x] to a string buffer
72
-
73
- - [x] to a URI
74
-
75
- - [x] to a Parquet file
76
-
77
- `red-parquet` gem is required.
78
-
79
- ```ruby
80
- require 'parquet'
81
- dataframe.save("file.parquet")
82
- ```
83
-
84
- ### Properties
85
-
86
- - [x] `table`
87
-
88
- Reader of Arrow::Table object inside.
89
-
90
- - [x] `n_rows`, `nrow`, `size`, `length`
91
-
92
- Returns num of rows (data size).
93
-
94
- - [x] `n_columns`, `ncol`, `width`
95
-
96
- Returns num of columns (num of vectors).
97
-
98
- - [x] `shape`
99
-
100
- Returns shape in an Array[n_rows, n_cols].
101
-
102
- - [x] `column_names`, `keys`
103
-
104
- Returns num of column names by an Array.
105
-
106
- - [x] `types`
107
-
108
- Returns types of columns by an Array of Symbols.
109
-
110
- - [x] `data_types`
111
-
112
- Returns types of columns by an Array of `Arrow::DataType`.
113
-
114
- - [x] `vectors`
115
-
116
- Returns an Array of Vectors.
117
-
118
- - [x] `to_h`
119
-
120
- Returns column-oriented data in a Hash.
121
-
122
- - [x] `to_a`, `raw_records`
123
-
124
- Returns an array of row-oriented data without header. If you need a column-oriented full array, use `.to_h.to_a`
125
-
126
- - [x] `schema`
127
-
128
- Returns column name and data type in a Hash.
41
+ ```shell
42
+ gem install red_amber
43
+ ```
129
44
 
130
- - [x] `==`
131
-
132
- - [x] `empty?`
45
+ (From v0.1.6)
133
46
 
134
- ### Output
47
+ RedAmber uses TDR mode for `#inspect` and `#to_iruby` by default. If you prefer Table mode, set the environment variable
48
+ `RED_AMBER_OUTPUT_MODE` to `"table"`. See the [TDR section](#TDR) for details.
135
49
 
136
- - [x] `to_s`
50
+ ## `RedAmber::DataFrame`
137
51
 
138
- - [ ] summary, describe
52
+ Represents a two-dimensional set of data. The underlying entity is a Red Arrow Table object.
139
53
 
140
- - [x] `to_rover`
54
+ ```ruby
55
+ require 'red_amber' # require 'red-amber' is also OK.
56
+ require 'datasets-arrow'
141
57
 
142
- Returns a `Rover::DataFrame`.
58
+ arrow = Datasets::Penguins.new.to_arrow
59
+ penguins = RedAmber::DataFrame.new(arrow)
60
+ penguins.table
143
61
 
144
- - [x] `inspect(tally_level: 5, max_element: 5)`
62
+ # =>
63
+ #<Arrow::Table:0x111271098 ptr=0x7f9118b3e0b0>
64
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
65
+ 0 Adelie Torgersen 39.100000 18.700000 181 3750 male 2007
66
+ 1 Adelie Torgersen 39.500000 17.400000 186 3800 female 2007
67
+ 2 Adelie Torgersen 40.300000 18.000000 195 3250 female 2007
68
+ 3 Adelie Torgersen (null) (null) (null) (null) (null) 2007
69
+ 4 Adelie Torgersen 36.700000 19.300000 193 3450 female 2007
70
+ 5 Adelie Torgersen 39.300000 20.600000 190 3650 male 2007
71
+ 6 Adelie Torgersen 38.900000 17.800000 181 3625 female 2007
72
+ 7 Adelie Torgersen 39.200000 19.600000 195 4675 male 2007
73
+ 8 Adelie Torgersen 34.100000 18.100000 193 3475 (null) 2007
74
+ 9 Adelie Torgersen 42.000000 20.200000 190 4250 (null) 2007
75
+ ...
76
+ 334 Gentoo Biscoe 46.200000 14.100000 217 4375 female 2009
77
+ 335 Gentoo Biscoe 55.100000 16.000000 230 5850 male 2009
78
+ 336 Gentoo Biscoe 44.500000 15.700000 217 4875 (null) 2009
79
+ 337 Gentoo Biscoe 48.800000 16.200000 222 6000 male 2009
80
+ 338 Gentoo Biscoe 47.200000 13.700000 214 4925 female 2009
81
+ 339 Gentoo Biscoe (null) (null) (null) (null) (null) 2009
82
+ 340 Gentoo Biscoe 46.800000 14.300000 215 4850 female 2009
83
+ 341 Gentoo Biscoe 50.400000 15.700000 222 5750 male 2009
84
+ 342 Gentoo Biscoe 45.200000 14.800000 212 5200 female 2009
85
+ 343 Gentoo Biscoe 49.900000 16.100000 213 5400 male 2009
86
+ ```
145
87
 
146
- Shows some information about self in a transposed style.
88
+ By default, RedAmber displays a DataFrame in a compact, transposed style. This unfamiliar style (TDR) is designed for
89
+ exploratory data processing. It keeps Vectors as row vectors, shows keys and types at a glance, shows levels
90
+ for 'factor-like' variables, and shows the number of abnormal values such as NaN and nil.
147
91
 
148
92
  ```ruby
149
- require 'red_amber'
150
- require 'datasets-arrow'
93
+ penguins
151
94
 
152
- penguins = Datasets::Penguins.new.to_arrow
153
- RedAmber::DataFrame.new(penguins)
154
95
  # =>
155
96
  RedAmber::DataFrame : 344 x 8 Vectors
156
97
  Vectors : 5 numeric, 3 strings
@@ -165,257 +106,91 @@ Vectors : 5 numeric, 3 strings
165
106
  8 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
166
107
  ```
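The transposed layout described above can be sketched in plain Ruby. This is only an illustration, not RedAmber's implementation: `tdr_lines`, its output format, and the use of a plain Hash of Arrays are assumptions made for this example, while the `tally_level` and `max_element` names are borrowed from the old `inspect` signature. Counting nil as a level follows what the sample output in this README suggests (`:body_mass_g` has 95 levels with nils and 94 after they are removed).

```ruby
# Plain-Ruby sketch of a TDR-like summary; not RedAmber's implementation.
# `tdr_lines` is a hypothetical helper working on a Hash of Arrays.
def tdr_lines(columns, tally_level: 5, max_element: 5)
  n_rows = columns.values.first&.size || 0
  lines = ["#{n_rows} x #{columns.size} Vectors"]

  columns.each_with_index do |(key, values), i|
    levels = values.uniq.size # nil counts as a level (assumption, see above)
    preview =
      if levels <= tally_level
        values.tally.inspect # few distinct values: show the counts
      else
        shown = values.first(max_element).map(&:inspect).join(', ')
        "[#{shown}, ... ]"   # many distinct values: show only the head
      end

    n_nils = values.count(&:nil?)
    preview += ", #{n_nils} nils" if n_nils.positive?
    lines << "#{i + 1} :#{key} level=#{levels} #{preview}"
  end

  lines
end

data = {
  species: %w[Adelie Adelie Gentoo Gentoo],
  body_mass_g: [3750, 3800, nil, 5850]
}
puts tdr_lines(data, tally_level: 2)
```

Each variable becomes one row, so a wide table stays readable: low-cardinality columns are summarized as counts, and the rest show their first few values.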
167
108
 
168
- - tally_level: max level to use tally mode
169
- - max_element: max num of element to show values in each row
170
-
171
- ### Selecting
109
+ ### DataFrame model
110
+ ![dataframe model of RedAmber](doc/image/dataframe_model.png)
172
111
 
173
- - [x] Select columns by `[]` as `[key]`, `[keys]`, `[keys[index]]`
174
- - Key in a Symbol: `df[:symbol]`
175
- - Key in a String: `df["string"]`
176
- - Keys in an Array: `df[:symbol1, "string", :symbol2]`
177
- - Keys in indeces: `df[df.keys[0]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
178
- - Keys in a Range:
179
- A end-less Range can be used to represent keys.
112
+ For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
180
113
 
181
114
  ```ruby
182
- hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
183
- df = RedAmber::DataFrame.new(hash)
184
- df[:b..:c, "a"]
115
+ df = penguins.pick(:body_mass_g)
185
116
  # =>
186
- RedAmber::DataFrame : 3 x 3 Vectors
187
- Vectors : 2 numeric, 1 string
188
- # key type level data_preview
189
- 1 :b string 3 ["A", "B", "C"]
190
- 2 :c double 3 [1.0, 2.0, 3.0]
191
- 3 :a uint8 3 [1, 2, 3]
117
+ #<RedAmber::DataFrame : 344 x 1 Vector, 0x000000000000fa14>
118
+ Vector : 1 numeric
119
+ # key type level data_preview
120
+ 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
192
121
  ```
193
122
 
194
- - [x] Select rows by `[]` as `[index]`, `[range]`, `[array]`
195
- - Select a row by index: `df[0]`
196
- - Select rows by indeces in a Range: `df[1..2]`
197
- - Select rows by indeces in an Array: `df[1, 2]`
198
- - Mixed case: `df[2, 0..]`
199
-
200
- - [x] Select rows from top or bottom
201
-
202
- `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`
203
-
204
- - [ ] slice
205
-
206
- ### Updating
207
-
208
- - [ ] Add a new column
209
-
210
- - [ ] Update a single element
211
-
212
- - [ ] Update multiple elements
213
-
214
- - [ ] Update all elements
215
-
216
- - [ ] Update elements matching a condition
123
+ `DataFrame#assign` creates new variables (columns in the table).
217
124
 
218
- - [ ] Clamp
219
-
220
- - [ ] Delete columns
221
-
222
- - [ ] Rename a column
223
-
224
- - [ ] Sort rows
225
-
226
- - [ ] Clear data
227
-
228
- ### Treat na data
229
-
230
- - [ ] Drop na (NaN, nil)
231
-
232
- - [ ] Replace na with value
233
-
234
- - [ ] Interpolate na with convolution array
235
-
236
- ### Combining DataFrames
237
-
238
- - [ ] Add rows
239
-
240
- - [ ] Add columns
125
+ ```ruby
126
+ df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
127
+ # =>
128
+ #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
129
+ Vectors : 2 numeric
130
+ # key type level data_preview
131
+ 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
132
+ 2 :body_mass_kg double 95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils
133
+ ```
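What `assign` does to the data can be mimicked with plain Ruby collections. The sketch below is not RedAmber code: it uses a Hash of Arrays in place of a DataFrame, and it assumes that `assign` returns a new DataFrame rather than modifying the receiver, as the differing object IDs in the outputs above suggest.

```ruby
# Plain-Ruby sketch of a non-destructive `assign`; not RedAmber's implementation.
df = { body_mass_g: [3750, 3800, 3250, nil, 3450] }

# Element-wise division that passes nil through unchanged.
body_mass_kg = df[:body_mass_g].map { |g| g && g / 1000.0 }

# Hash#merge returns a new Hash, so `df` itself keeps a single column.
assigned = df.merge(body_mass_kg: body_mass_kg)

p assigned[:body_mass_kg] # prints [3.75, 3.8, 3.25, nil, 3.45]
p df.keys                 # prints [:body_mass_g]
```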
241
134
 
242
- - [ ] Inner join
135
+ DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
243
136
 
244
- - [ ] Left join
137
+ This is an example that eliminates observations (rows in the table) containing nil.
245
138
 
246
- ### Encoding
139
+ ```ruby
140
+ # remove all observations that contain nil
141
+ nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
142
+ nil_removed.tdr
143
+ # =>
144
+ RedAmber::DataFrame : 342 x 8 Vectors
145
+ Vectors : 5 numeric, 3 strings
146
+ # key type level data_preview
147
+ 1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
148
+ 2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
149
+ 3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
150
+ 4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
151
+ 5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ]
152
+ 6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
153
+ 7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
154
+ 8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
155
+ ```
247
156
 
248
- - [ ] One-hot encoding
157
+ This frequently needed task can be done much more simply.
249
158
 
250
- ### Iteration (not impremented)
159
+ ```ruby
160
+ penguins.remove_nil # => same result as above
161
+ ```
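The block passed to `remove` above builds one boolean mask per variable and ORs them together, so a row is flagged when any of its values is nil. The same idea can be sketched with plain Ruby arrays. This is an illustration of the mask logic only, with made-up sample data; it is not how RedAmber or Arrow implement it.

```ruby
# Plain-Ruby sketch of the mask-and-remove idea; not RedAmber's implementation.
columns = {
  species: ['Adelie', 'Adelie', 'Gentoo'],
  body_mass_g: [3750, nil, 5850],
  sex: ['male', nil, nil]
}

# One boolean mask per column: true where the value is nil (like Vector#is_nil).
masks = columns.values.map { |values| values.map(&:nil?) }

# OR the masks element-wise, so a row is flagged if any column is nil there.
row_has_nil = masks.reduce { |a, b| a.zip(b).map { |x, y| x | y } }

# Keep only the rows that were not flagged.
cleaned = columns.transform_values do |values|
  values.reject.with_index { |_, i| row_has_nil[i] }
end

p row_has_nil         # prints [false, true, true]
p cleaned[:species]   # prints ["Adelie"]
```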
251
162
 
252
- ### Filtering (not impremented)
163
+ See [DataFrame.md](doc/DataFrame.md) for details.
253
164
 
254
165
 
255
166
  ## `RedAmber::Vector`
256
- ### Constructor
257
-
258
- - [x] Create from a column in a DataFrame
259
-
260
- - [x] New from an Array
261
-
262
- ### Properties
263
-
264
- - [x] `to_s`
265
-
266
- - [x] `values`, `to_a`, `entries`
267
167
 
268
- - [x] `size`, `length`, `n_rows`, `nrow`
168
+ Class `RedAmber::Vector` represents a series of data in the DataFrame.
269
169
 
270
- - [x] `type`
271
-
272
- - [x] `data_type`
273
-
274
- - [ ] `each`
275
-
276
- - [ ] `chunked?`
277
-
278
- - [ ] `n_chunks`
279
-
280
- - [ ] `each_chunk`
281
-
282
- - [x] `tally`
283
-
284
- - [x] `n_nils`, `n_nans`
285
-
286
- - `n_nulls` is an alias of `n_nils`
287
-
288
- - [x] `inspect(limit: 80)`
170
+ ```ruby
171
+ penguins[:bill_length_mm]
172
+ # =>
173
+ #<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
174
+ [39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
175
+ ```
289
176
 
290
- - `limit` sets size limit to display long array.
177
+ Vectors accept some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
291
178
 
292
- ### Functions
293
- #### Unary aggregations: vector.func => scalar
179
+ See [Vector.md](doc/Vector.md) for details.
294
180
 
295
- | Method |Boolean|Numeric|String|Options|Remarks|
296
- | ----------- | --- | --- | --- | --- | --- |
297
- | ✓ `all` | ✓ | | | ✓ ScalarAggregate| |
298
- | ✓ `any` | ✓ | | | ✓ ScalarAggregate| |
299
- | ✓ `approximate_median`| |✓| | ✓ ScalarAggregate| alias `median`|
300
- | ✓ `count` | ✓ | ✓ | ✓ | ✓ Count | |
301
- | ✓ `count_distinct`| ✓ | ✓ | ✓ | ✓ Count |alias `count_uniq`|
302
- |[ ]`index` | [ ] | [ ] | [ ] |[ ] Index | |
303
- | ✓ `max` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
304
- | ✓ `mean` | ✓ | ✓ | | ✓ ScalarAggregate| |
305
- | ✓ `min` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
306
- |[ ]`min_max` | [ ] | [ ] | [ ] |[ ] ScalarAggregate| |
307
- |[ ]`mode` | | [ ] | |[ ] Mode | |
308
- | ✓ `product` | ✓ | ✓ | | ✓ ScalarAggregate| |
309
- |[ ]`quantile`| | [ ] | |[ ] Quantile| |
310
- |[ ]`stddev` | | ✓ | |[ ] Variance| |
311
- | ✓ `sum` | ✓ | ✓ | | ✓ ScalarAggregate| |
312
- |[ ]`tdigest` | | [ ] | |[ ] TDigest | |
313
- |[ ]`variance`| | ✓ | |[ ] Variance| |
181
+ ## TDR
314
182
 
183
+ I named the data frame representation style in the model above TDR (Transposed DataFrame Representation).
315
184
 
316
- Options can be used as follows.
317
- See the [document of C++ function](https://arrow.apache.org/docs/cpp/compute.html) for detail.
185
+ This library can be used in both TDR mode and the usual Table mode.
186
+ If you set the environment variable `RED_AMBER_OUTPUT_MODE` to `"table"`, `inspect` and `to_iruby` use Table mode. Any other value, including nil, produces TDR style.
318
187
 
188
+ You can switch the mode in Ruby like this.
319
189
  ```ruby
320
- double = RedAmber::Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
321
- #=>
322
- #<RedAmber::Vector(:double, size=6):0x000000000000f910>
323
- [1.0, NaN, -Infinity, Infinity, nil, 0.0]
324
-
325
- double.count #=> 5
326
- double.count(opts: {mode: :only_valid}) #=> 5, default
327
- double.count(opts: {mode: :only_null}) #=> 1
328
- double.count(opts: {mode: :all}) #=> 6
329
-
330
- boolean = RedAmber::Vector.new([true, true, nil])
331
- #=>
332
- #<RedAmber::Vector(:boolean, size=3):0x000000000000f924>
333
- [true, true, nil]
334
-
335
- boolean.all #=> true
336
- boolean.all(opts: {skip_nulls: true}) #=> true
337
- boolean.all(opts: {skip_nulls: false}) #=> false
190
+ ENV['RED_AMBER_OUTPUT_MODE'] = 'table' # => Table mode
338
191
  ```
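The rule stated in the prose above (`"table"` selects Table mode, anything else including an unset variable selects TDR) can be written as a small helper. `output_mode` is a hypothetical name used only for this sketch, and the exact, case-sensitive comparison is an assumption; RedAmber's own check may differ.

```ruby
# Hypothetical helper showing how an output mode could be derived from the
# environment variable named in the README; not RedAmber's actual code.
# Accepts any Hash-like object so it can be tried without touching ENV.
def output_mode(env = ENV)
  env['RED_AMBER_OUTPUT_MODE'] == 'table' ? :table : :tdr
end

output_mode({})                                   # => :tdr (variable unset)
output_mode('RED_AMBER_OUTPUT_MODE' => 'table')   # => :table
output_mode('RED_AMBER_OUTPUT_MODE' => 'other')   # => :tdr
```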
339
192
 
340
- #### Unary element-wise: vector.func => vector
341
-
342
- | Method |Boolean|Numeric|String|Options|Remarks|
343
- | ------------ | --- | --- | --- | --- | ----- |
344
- | ✓ `-@` | | ✓ | | |as `-vector`|
345
- | ✓ `negate` | | ✓ | | |`-@` |
346
- | ✓ `abs` | | ✓ | | | |
347
- |[ ]`acos` | | [ ] | | | |
348
- |[ ]`asin` | | [ ] | | | |
349
- | ✓ `atan` | | ✓ | | | |
350
- | ✓ `bit_wise_not`| | (✓) | | |integer only|
351
- |[ ]`ceil` | | ✓ | | | |
352
- | ✓ `cos` | | ✓ | | | |
353
- |[ ]`floor` | | ✓ | | | |
354
- | ✓ `invert` | ✓ | | | |`!`, alias `not`|
355
- |[ ]`ln` | | [ ] | | | |
356
- |[ ]`log10` | | [ ] | | | |
357
- |[ ]`log1p` | | [ ] | | | |
358
- |[ ]`log2` | | [ ] | | | |
359
- |[ ]`round` | | [ ] | |[ ] Round| |
360
- |[ ]`round_to_multiple`| | [ ] | |[ ] RoundToMultiple| |
361
- | ✓ `sign` | | ✓ | | | |
362
- | ✓ `sin` | | ✓ | | | |
363
- | ✓ `tan` | | ✓ | | | |
364
- |[ ]`trunc` | | ✓ | | | |
365
-
366
- #### Binary element-wise: vector.func(vector) => vector
367
-
368
- | Method |Boolean|Numeric|String|Options|Remarks|
369
- | ----------------- | --- | --- | --- | --- | ----- |
370
- | ✓ `add` | | ✓ | | | `+` |
371
- | ✓ `atan2` | | ✓ | | | |
372
- | ✓ `and_kleene` | ✓ | | | | `&` |
373
- | ✓ `and_org ` | ✓ | | | |`and` in Red Arrow|
374
- | ✓ `and_not` | ✓ | | | | |
375
- | ✓ `and_not_kleene`| ✓ | | | | |
376
- | ✓ `bit_wise_and` | | (✓) | | |integer only|
377
- | ✓ `bit_wise_or` | | (✓) | | |integer only|
378
- | ✓ `bit_wise_xor` | | (✓) | | |integer only|
379
- | ✓ `divide` | | ✓ | | | `/` |
380
- | ✓ `equal` | ✓ | ✓ | ✓ | |`==`, alias `eq`|
381
- | ✓ `greater` | ✓ | ✓ | ✓ | |`>`, alias `gt`|
382
- | ✓ `greater_equal` | ✓ | ✓ | ✓ | |`>=`, alias `ge`|
383
- | ✓ `is_finite` | | ✓ | | | |
384
- | ✓ `is_inf` | | ✓ | | | |
385
- | ✓ `is_na` | ✓ | ✓ | ✓ | | |
386
- | ✓ `is_nan` | | ✓ | | | |
387
- |[ ]`is_nil` | ✓ | ✓ | ✓ |[ ] Null|alias `is_null`|
388
- | ✓ `is_valid` | ✓ | ✓ | ✓ | | |
389
- | ✓ `less` | ✓ | ✓ | ✓ | |`<`, alias `lt`|
390
- | ✓ `less_equal` | ✓ | ✓ | ✓ | |`<=`, alias `le`|
391
- |[ ]`logb` | | [ ] | | | |
392
- |[ ]`mod` | | [ ] | | | `%` |
393
- | ✓ `multiply` | | ✓ | | | `*` |
394
- | ✓ `not_equal` | ✓ | ✓ | ✓ | |`!=`, alias `ne`|
395
- | ✓ `or_kleene` | ✓ | | | | `\|` |
396
- | ✓ `or_org` | ✓ | | | |`or` in Red Arrow|
397
- | ✓ `power` | | ✓ | | | `**` |
398
- | ✓ `subtract` | | ✓ | | | `-` |
399
- | ✓ `shift_left` | | (✓) | | |`<<`, integer only|
400
- | ✓ `shift_right` | | (✓) | | |`>>`, integer only|
401
- | ✓ `xor` | ✓ | | | | `^` |
402
-
403
- ##### (Not impremented)
404
- - [ ] sort, sort_index
405
- - [ ] argmin, argmax
406
- - [ ] (array functions)
407
- - [ ] (strings functions)
408
- - [ ] (temporal functions)
409
- - [ ] (conditional functions)
410
- - [ ] (index functions)
411
- - [ ] (other functions)
412
-
413
- ### Coerce (not impremented)
414
-
415
- ### Updating (not impremented)
416
-
417
- ### DSL in a block for faster calculation ?
418
-
193
+ For more detailed information about TDR, see [TDR.md](doc/tdr.md).
419
194
 
420
195
  ## Development
421
196
 
@@ -0,0 +1,15 @@
1
+ prelude: |
2
+ require 'datasets-arrow'
3
+ require 'rover'
4
+ require 'red_amber'
5
+
6
+ penguins_csv = 'benchmark/cache/penguins.csv'
7
+
8
+ unless File.exist?(penguins_csv)
9
+ arrow = Datasets::Penguins.new.to_arrow
10
+ RedAmber::DataFrame.new(arrow).save(penguins_csv)
11
+ end
12
+
13
+ benchmark:
14
+ 'penguins by Rover': Rover.read_csv(penguins_csv)
15
+ 'penguins by RedAmber': RedAmber::DataFrame.load(penguins_csv)
@@ -0,0 +1,11 @@
1
+ prelude: |
2
+ require 'datasets-arrow'
3
+ require 'red_amber'
4
+
5
+ penguins = RedAmber::DataFrame.new(Datasets::Penguins.new.to_arrow)
6
+
7
+ def drop_nil(penguins)
8
+ penguins.remove { vectors.map { |v| v.is_nil} }
9
+ end
10
+
11
+ benchmark: drop_nil(penguins)