red_amber 0.1.3 → 0.1.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (43)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +31 -7
  3. data/CHANGELOG.md +214 -10
  4. data/Gemfile +4 -0
  5. data/README.md +117 -342
  6. data/benchmark/csv_load_penguins.yml +15 -0
  7. data/benchmark/drop_nil.yml +11 -0
  8. data/doc/DataFrame.md +854 -0
  9. data/doc/Vector.md +449 -0
  10. data/doc/image/arrow_table_new.png +0 -0
  11. data/doc/image/dataframe/assign.png +0 -0
  12. data/doc/image/dataframe/drop.png +0 -0
  13. data/doc/image/dataframe/pick.png +0 -0
  14. data/doc/image/dataframe/remove.png +0 -0
  15. data/doc/image/dataframe/rename.png +0 -0
  16. data/doc/image/dataframe/slice.png +0 -0
  17. data/doc/image/dataframe_model.png +0 -0
  18. data/doc/image/example_in_red_arrow.png +0 -0
  19. data/doc/image/tdr.png +0 -0
  20. data/doc/image/tdr_and_table.png +0 -0
  21. data/doc/image/tidy_data_in_TDR.png +0 -0
  22. data/doc/image/vector/binary_element_wise.png +0 -0
  23. data/doc/image/vector/unary_aggregation.png +0 -0
  24. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  25. data/doc/image/vector/unary_element_wise.png +0 -0
  26. data/doc/tdr.md +56 -0
  27. data/doc/tdr_ja.md +56 -0
  28. data/lib/red-amber.rb +27 -0
  29. data/lib/red_amber/data_frame.rb +91 -37
  30. data/lib/red_amber/{data_frame_output.rb → data_frame_displayable.rb} +49 -41
  31. data/lib/red_amber/data_frame_indexable.rb +38 -0
  32. data/lib/red_amber/data_frame_observation_operation.rb +11 -0
  33. data/lib/red_amber/data_frame_selectable.rb +155 -48
  34. data/lib/red_amber/data_frame_variable_operation.rb +137 -0
  35. data/lib/red_amber/helper.rb +61 -0
  36. data/lib/red_amber/vector.rb +69 -16
  37. data/lib/red_amber/vector_functions.rb +80 -45
  38. data/lib/red_amber/vector_selectable.rb +124 -0
  39. data/lib/red_amber/vector_updatable.rb +104 -0
  40. data/lib/red_amber/version.rb +1 -1
  41. data/lib/red_amber.rb +1 -16
  42. data/red_amber.gemspec +3 -6
  43. metadata +38 -9
data/README.md CHANGED
@@ -1,20 +1,29 @@
1
1
  # RedAmber
2
2
 
3
- A simple dataframe library for Ruby (experimental)
3
+ A simple dataframe library for Ruby (experimental).
4
4
 
5
5
  - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
6
- - Simple API similar to [Rover-df](https://github.com/ankane/rover)
6
+ - Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
7
7
 
8
8
  ## Requirements
9
9
 
10
10
  ```ruby
11
- gem 'red-arrow', '>= 7.0.0'
12
- gem 'red-parquet', '>= 7.0.0' # if you use IO from/to parquet
11
+ gem 'red-arrow', '>= 8.0.0'
12
+ gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet
13
13
  gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
14
14
  ```
15
15
 
16
16
  ## Installation
17
17
 
18
+ Install requirements before you install Red Amber.
19
+
20
+ - Apache Arrow GLib (>= 8.0.0)
21
+ - Apache Parquet GLib (>= 8.0.0)
22
+
23
+ See [Apache Arrow install document](https://arrow.apache.org/install/).
24
+
25
+ A minimal installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section of the CI test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
26
+
18
27
  Add this line to your Gemfile:
19
28
 
20
29
  ```ruby
@@ -23,134 +32,66 @@ gem 'red_amber'
23
32
 
24
33
  And then execute:
25
34
 
26
- $ bundle install
35
+ ```shell
36
+ bundle install
37
+ ```
27
38
 
28
39
  Or install it yourself as:
29
40
 
30
- $ gem install red_amber
31
-
32
- ## `RedAmber::DataFrame`
33
-
34
- ### Constructors and saving
35
-
36
- - [x] `new` from a columnar Hash
37
- - `RedAmber::DataFrame.new(x: [1, 2, 3])`
38
-
39
- - [x] `new` from a schema (by Hash) and rows (by Array)
40
- - `RedAmber::DataFrame.new({:x=>:uint8}, [[1], [2], [3]])`
41
-
42
- - [x] `new` from an Arrow::Table
43
- - `RedAmber::DataFrame.new(Arrow::Table.new(x: [1, 2, 3]))`
44
-
45
- - [x] `new` from a Rover::DataFrame
46
- - `RedAmber::DataFrame.new(Rover::DataFrame.new(x: [1, 2, 3]))`
47
-
48
- - [x] `load` (class method)
49
-
50
- - [x] from a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
51
- - `RedAmber::DataFrame.load("test/entity/with_header.csv")`
52
-
53
- - [x] from a string buffer
54
-
55
- - [x] from a URI
56
- - `RedAmber::DataFrame.load(URI("https://github.com/heronshoes/red_amber/blob/master/test/entity/with_header.csv"))`
57
-
58
- - [x] from a Parquet file
59
-
60
- `red-parquet` gem is required.
61
-
62
- ```ruby
63
- require 'parquet'
64
- dataframe = RedAmber::DataFrame.load("file.parquet")
65
- ```
66
-
67
- - [x] `save` (instance method)
68
-
69
- - [x] to a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
70
-
71
- - [x] to a string buffer
72
-
73
- - [x] to a URI
74
-
75
- - [x] to a Parquet file
76
-
77
- `red-parquet` gem is required.
78
-
79
- ```ruby
80
- require 'parquet'
81
- dataframe.save("file.parquet")
82
- ```
83
-
84
- ### Properties
85
-
86
- - [x] `table`
87
-
88
- Reader of Arrow::Table object inside.
89
-
90
- - [x] `n_rows`, `nrow`, `size`, `length`
91
-
92
- Returns num of rows (data size).
93
-
94
- - [x] `n_columns`, `ncol`, `width`
95
-
96
- Returns num of columns (num of vectors).
97
-
98
- - [x] `shape`
99
-
100
- Returns shape in an Array[n_rows, n_cols].
101
-
102
- - [x] `column_names`, `keys`
103
-
104
- Returns num of column names by an Array.
105
-
106
- - [x] `types`
107
-
108
- Returns types of columns by an Array of Symbols.
109
-
110
- - [x] `data_types`
111
-
112
- Returns types of columns by an Array of `Arrow::DataType`.
113
-
114
- - [x] `vectors`
115
-
116
- Returns an Array of Vectors.
117
-
118
- - [x] `to_h`
119
-
120
- Returns column-oriented data in a Hash.
121
-
122
- - [x] `to_a`, `raw_records`
123
-
124
- Returns an array of row-oriented data without header. If you need a column-oriented full array, use `.to_h.to_a`
125
-
126
- - [x] `schema`
127
-
128
- Returns column name and data type in a Hash.
41
+ ```shell
42
+ gem install red_amber
43
+ ```
129
44
 
130
- - [x] `==`
131
-
132
- - [x] `empty?`
45
+ (From v0.1.6)
133
46
 
134
- ### Output
47
+ RedAmber uses TDR mode for `#inspect` and `#to_iruby` by default. If you prefer Table mode, please set the environment variable
48
+ `RED_AMBER_OUTPUT_MODE` to `"table"`. See the [TDR section](#TDR) for details.
135
49
 
136
- - [x] `to_s`
50
+ ## `RedAmber::DataFrame`
137
51
 
138
- - [ ] summary, describe
52
+ Represents a set of data in a 2D shape. The underlying entity is a Red Arrow Table object.
139
53
 
140
- - [x] `to_rover`
54
+ ```ruby
55
+ require 'red_amber' # require 'red-amber' is also OK.
56
+ require 'datasets-arrow'
141
57
 
142
- Returns a `Rover::DataFrame`.
58
+ arrow = Datasets::Penguins.new.to_arrow
59
+ penguins = RedAmber::DataFrame.new(arrow)
60
+ penguins.table
143
61
 
144
- - [x] `inspect(tally_level: 5, max_element: 5)`
62
+ # =>
63
+ #<Arrow::Table:0x111271098 ptr=0x7f9118b3e0b0>
64
+ species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
65
+ 0 Adelie Torgersen 39.100000 18.700000 181 3750 male 2007
66
+ 1 Adelie Torgersen 39.500000 17.400000 186 3800 female 2007
67
+ 2 Adelie Torgersen 40.300000 18.000000 195 3250 female 2007
68
+ 3 Adelie Torgersen (null) (null) (null) (null) (null) 2007
69
+ 4 Adelie Torgersen 36.700000 19.300000 193 3450 female 2007
70
+ 5 Adelie Torgersen 39.300000 20.600000 190 3650 male 2007
71
+ 6 Adelie Torgersen 38.900000 17.800000 181 3625 female 2007
72
+ 7 Adelie Torgersen 39.200000 19.600000 195 4675 male 2007
73
+ 8 Adelie Torgersen 34.100000 18.100000 193 3475 (null) 2007
74
+ 9 Adelie Torgersen 42.000000 20.200000 190 4250 (null) 2007
75
+ ...
76
+ 334 Gentoo Biscoe 46.200000 14.100000 217 4375 female 2009
77
+ 335 Gentoo Biscoe 55.100000 16.000000 230 5850 male 2009
78
+ 336 Gentoo Biscoe 44.500000 15.700000 217 4875 (null) 2009
79
+ 337 Gentoo Biscoe 48.800000 16.200000 222 6000 male 2009
80
+ 338 Gentoo Biscoe 47.200000 13.700000 214 4925 female 2009
81
+ 339 Gentoo Biscoe (null) (null) (null) (null) (null) 2009
82
+ 340 Gentoo Biscoe 46.800000 14.300000 215 4850 female 2009
83
+ 341 Gentoo Biscoe 50.400000 15.700000 222 5750 male 2009
84
+ 342 Gentoo Biscoe 45.200000 14.800000 212 5200 female 2009
85
+ 343 Gentoo Biscoe 49.900000 16.100000 213 5400 male 2009
86
+ ```
145
87
 
146
- Shows some information about self in a transposed style.
88
+ By default, RedAmber shows itself in a compact transposed style. This unfamiliar style (TDR) is designed for
89
+ exploratory data processing. It keeps Vectors as row vectors, shows keys and types at a glance, shows levels
90
+ for the 'factor-like' variables, and shows the number of abnormal values such as NaN and nil.
147
91
 
148
92
  ```ruby
149
- require 'red_amber'
150
- require 'datasets-arrow'
93
+ penguins
151
94
 
152
- penguins = Datasets::Penguins.new.to_arrow
153
- RedAmber::DataFrame.new(penguins)
154
95
  # =>
155
96
  RedAmber::DataFrame : 344 x 8 Vectors
156
97
  Vectors : 5 numeric, 3 strings
@@ -165,257 +106,91 @@ Vectors : 5 numeric, 3 strings
165
106
  8 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
166
107
  ```
167
108
 
168
- - tally_level: max level to use tally mode
169
- - max_element: max num of element to show values in each row
170
-
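The TDR listing above shows either a tally or a short preview for each Vector. The sketch below illustrates that choice in plain Ruby. The option names `tally_level` and `max_element` come from the 0.1.3 README lines removed in this hunk; the method `tdr_summary` and its returned Hash are hypothetical and are not RedAmber's implementation.

```ruby
# Plain-Ruby sketch of one TDR row; an Array stands in for a Vector.
# tdr_summary is a hypothetical name, not part of RedAmber.
def tdr_summary(key, values, tally_level: 5, max_element: 5)
  # "level" is the number of distinct values, counting nil as one of them.
  level = values.uniq.size
  # Few distinct values: tally them. Otherwise preview the leading elements.
  preview = level <= tally_level ? values.tally : values.first(max_element)
  { key: key, level: level, preview: preview, n_nils: values.count(&:nil?) }
end

p tdr_summary(:year, [2007, 2007, 2008, 2009])
p tdr_summary(:body_mass_g, [3750, 3800, 3250, nil, 3450, 3650, 3625])
```

Returning a Hash rather than a formatted string keeps the sketch independent of how a given Ruby version prints Hashes.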
171
- ### Selecting
109
+ ### DataFrame model
110
+ ![dataframe model of RedAmber](doc/image/dataframe_model.png)
172
111
 
173
- - [x] Select columns by `[]` as `[key]`, `[keys]`, `[keys[index]]`
174
- - Key in a Symbol: `df[:symbol]`
175
- - Key in a String: `df["string"]`
176
- - Keys in an Array: `df[:symbol1, "string", :symbol2]`
177
- - Keys in indeces: `df[df.keys[0]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
178
- - Keys in a Range:
179
- A end-less Range can be used to represent keys.
112
+ For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
180
113
 
181
114
  ```ruby
182
- hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
183
- df = RedAmber::DataFrame.new(hash)
184
- df[:b..:c, "a"]
115
+ df = penguins.pick(:body_mass_g)
185
116
  # =>
186
- RedAmber::DataFrame : 3 x 3 Vectors
187
- Vectors : 2 numeric, 1 string
188
- # key type level data_preview
189
- 1 :b string 3 ["A", "B", "C"]
190
- 2 :c double 3 [1.0, 2.0, 3.0]
191
- 3 :a uint8 3 [1, 2, 3]
117
+ #<RedAmber::DataFrame : 344 x 1 Vector, 0x000000000000fa14>
118
+ Vector : 1 numeric
119
+ # key type level data_preview
120
+ 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
192
121
  ```
193
122
 
194
- - [x] Select rows by `[]` as `[index]`, `[range]`, `[array]`
195
- - Select a row by index: `df[0]`
196
- - Select rows by indeces in a Range: `df[1..2]`
197
- - Select rows by indeces in an Array: `df[1, 2]`
198
- - Mixed case: `df[2, 0..]`
199
-
200
- - [x] Select rows from top or bottom
201
-
202
- `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`
203
-
204
- - [ ] slice
205
-
206
- ### Updating
207
-
208
- - [ ] Add a new column
209
-
210
- - [ ] Update a single element
211
-
212
- - [ ] Update multiple elements
213
-
214
- - [ ] Update all elements
215
-
216
- - [ ] Update elements matching a condition
123
+ `DataFrame#assign` creates new variables (columns in the table).
217
124
 
218
- - [ ] Clamp
219
-
220
- - [ ] Delete columns
221
-
222
- - [ ] Rename a column
223
-
224
- - [ ] Sort rows
225
-
226
- - [ ] Clear data
227
-
228
- ### Treat na data
229
-
230
- - [ ] Drop na (NaN, nil)
231
-
232
- - [ ] Replace na with value
233
-
234
- - [ ] Interpolate na with convolution array
235
-
236
- ### Combining DataFrames
237
-
238
- - [ ] Add rows
239
-
240
- - [ ] Add columns
125
+ ```ruby
126
+ df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
127
+ # =>
128
+ #<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
129
+ Vectors : 2 numeric
130
+ # key type level data_preview
131
+ 1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
132
+ 2 :body_mass_kg double 95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils
133
+ ```
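As a rough model of what the `assign` example does, the following plain-Ruby sketch derives a new column from an existing one while leaving nil entries as nil. A Hash of Arrays stands in for the DataFrame, and `assign_column` is a hypothetical helper, not a RedAmber method. That the original is left unchanged is an assumption, suggested by the different object ids in the two outputs above.

```ruby
# Plain-Ruby sketch; a Hash of Arrays stands in for a DataFrame.
# assign_column is a hypothetical helper, not RedAmber's API.
def assign_column(columns, new_key, source_key)
  # Apply the block element-wise, passing nil through untouched.
  derived = columns.fetch(source_key).map { |v| v.nil? ? nil : yield(v) }
  # merge returns a new Hash, so the input is not modified.
  columns.merge(new_key => derived)
end

df = { body_mass_g: [3750, 3800, nil] }
p assign_column(df, :body_mass_kg, :body_mass_g) { |g| g / 1000.0 }
p df
```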
241
134
 
242
- - [ ] Inner join
135
+ DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
243
136
 
244
- - [ ] Left join
137
+ This is an example to eliminate observations (rows in the table) containing nil.
245
138
 
246
- ### Encoding
139
+ ```ruby
140
+ # remove all observations containing nil
141
+ nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
142
+ nil_removed.tdr
143
+ # =>
144
+ RedAmber::DataFrame : 342 x 8 Vectors
145
+ Vectors : 5 numeric, 3 strings
146
+ # key type level data_preview
147
+ 1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
148
+ 2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
149
+ 3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
150
+ 4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
151
+ 5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ]
152
+ 6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
153
+ 7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
154
+ 8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
155
+ ```
247
156
 
248
- - [ ] One-hot encoding
157
+ This frequently needed task can be done much more simply.
249
158
 
250
- ### Iteration (not impremented)
159
+ ```ruby
160
+ penguins.remove_nil # => same result as above
161
+ ```
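The block form `vectors.map(&:is_nil).reduce(&:|)` builds one boolean mask per Vector and ORs them together, so a row is removed if any column is nil there. The sketch below models that logic in plain Ruby. The helper names `rows_with_nil` and `remove_nil_rows` and the Hash-of-Arrays stand-in are assumptions for illustration only.

```ruby
# Plain-Ruby sketch; a Hash of Arrays stands in for a DataFrame.
# Both helpers are hypothetical and not part of RedAmber.

# Returns one boolean per row: true if any column holds nil in that row.
def rows_with_nil(columns)
  masks = columns.values.map { |values| values.map(&:nil?) }
  # Element-wise OR of the per-column masks, like reduce(&:|) on boolean Vectors.
  masks.reduce { |acc, mask| acc.zip(mask).map { |a, b| a | b } }
end

# Returns a new Hash of Arrays without the flagged rows.
def remove_nil_rows(columns)
  flagged = rows_with_nil(columns)
  columns.transform_values do |values|
    values.reject.with_index { |_, i| flagged[i] }
  end
end

sample = {
  species: %w[Adelie Adelie Gentoo Gentoo],
  body_mass_g: [3750, nil, 4375, 5850],
  sex: ['male', nil, 'female', nil]
}
p rows_with_nil(sample)
p remove_nil_rows(sample)
```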
251
162
 
252
- ### Filtering (not impremented)
163
+ See [DataFrame.md](doc/DataFrame.md) for details.
253
164
 
254
165
 
255
166
  ## `RedAmber::Vector`
256
- ### Constructor
257
-
258
- - [x] Create from a column in a DataFrame
259
-
260
- - [x] New from an Array
261
-
262
- ### Properties
263
-
264
- - [x] `to_s`
265
-
266
- - [x] `values`, `to_a`, `entries`
267
167
 
268
- - [x] `size`, `length`, `n_rows`, `nrow`
168
+ Class `RedAmber::Vector` represents a series of data in the DataFrame.
269
169
 
270
- - [x] `type`
271
-
272
- - [x] `data_type`
273
-
274
- - [ ] `each`
275
-
276
- - [ ] `chunked?`
277
-
278
- - [ ] `n_chunks`
279
-
280
- - [ ] `each_chunk`
281
-
282
- - [x] `tally`
283
-
284
- - [x] `n_nils`, `n_nans`
285
-
286
- - `n_nulls` is an alias of `n_nils`
287
-
288
- - [x] `inspect(limit: 80)`
170
+ ```ruby
171
+ penguins[:bill_length_mm]
172
+ # =>
173
+ #<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
174
+ [39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
175
+ ```
289
176
 
290
- - `limit` sets size limit to display long array.
177
+ Vectors accept some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
291
178
 
292
- ### Functions
293
- #### Unary aggregations: vector.func => scalar
179
+ See [Vector.md](doc/Vector.md) for details.
294
180
 
295
- | Method |Boolean|Numeric|String|Options|Remarks|
296
- | ----------- | --- | --- | --- | --- | --- |
297
- | ✓ `all` | ✓ | | | ✓ ScalarAggregate| |
298
- | ✓ `any` | ✓ | | | ✓ ScalarAggregate| |
299
- | ✓ `approximate_median`| |✓| | ✓ ScalarAggregate| alias `median`|
300
- | ✓ `count` | ✓ | ✓ | ✓ | ✓ Count | |
301
- | ✓ `count_distinct`| ✓ | ✓ | ✓ | ✓ Count |alias `count_uniq`|
302
- |[ ]`index` | [ ] | [ ] | [ ] |[ ] Index | |
303
- | ✓ `max` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
304
- | ✓ `mean` | ✓ | ✓ | | ✓ ScalarAggregate| |
305
- | ✓ `min` | ✓ | ✓ | ✓ | ✓ ScalarAggregate| |
306
- |[ ]`min_max` | [ ] | [ ] | [ ] |[ ] ScalarAggregate| |
307
- |[ ]`mode` | | [ ] | |[ ] Mode | |
308
- | ✓ `product` | ✓ | ✓ | | ✓ ScalarAggregate| |
309
- |[ ]`quantile`| | [ ] | |[ ] Quantile| |
310
- |[ ]`stddev` | | ✓ | |[ ] Variance| |
311
- | ✓ `sum` | ✓ | ✓ | | ✓ ScalarAggregate| |
312
- |[ ]`tdigest` | | [ ] | |[ ] TDigest | |
313
- |[ ]`variance`| | ✓ | |[ ] Variance| |
181
+ ## TDR
314
182
 
183
+ I named the data frame representation style in the model above as TDR (Transposed DataFrame Representation).
315
184
 
316
- Options can be used as follows.
317
- See the [document of C++ function](https://arrow.apache.org/docs/cpp/compute.html) for detail.
185
+ This library can be used with both TDR mode and usual Table mode.
186
+ If you set the environment variable `RED_AMBER_OUTPUT_MODE` to `"table"`, `inspect` and `to_iruby` output in Table mode. Any other value, including nil, produces TDR style.
318
187
 
188
+ You can switch the mode in Ruby like this.
319
189
  ```ruby
320
- double = RedAmber::Vector.new([1, 0/0.0, -1/0.0, 1/0.0, nil, ""])
321
- #=>
322
- #<RedAmber::Vector(:double, size=6):0x000000000000f910>
323
- [1.0, NaN, -Infinity, Infinity, nil, 0.0]
324
-
325
- double.count #=> 5
326
- double.count(opts: {mode: :only_valid}) #=> 5, default
327
- double.count(opts: {mode: :only_null}) #=> 1
328
- double.count(opts: {mode: :all}) #=> 6
329
-
330
- boolean = RedAmber::Vector.new([true, true, nil])
331
- #=>
332
- #<RedAmber::Vector(:boolean, size=3):0x000000000000f924>
333
- [true, true, nil]
334
-
335
- boolean.all #=> true
336
- boolean.all(opts: {skip_nulls: true}) #=> true
337
- boolean.all(opts: {skip_nulls: false}) #=> false
190
+ ENV['RED_AMBER_OUTPUT_MODE'] = 'table' # => Table mode
338
191
  ```
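The rule stated above (the value `"table"` selects Table mode, anything else including nil selects TDR) can be sketched as a small lookup. Only the variable name `RED_AMBER_OUTPUT_MODE` comes from the README. The method `output_mode` is hypothetical, and the exact, case-sensitive comparison is an assumption.

```ruby
# Hypothetical helper modelling the rule described in the README.
# env defaults to the process environment but accepts any Hash-like object.
def output_mode(env = ENV)
  # Exact, case-sensitive match is an assumption of this sketch.
  env['RED_AMBER_OUTPUT_MODE'] == 'table' ? :table : :tdr
end

p output_mode('RED_AMBER_OUTPUT_MODE' => 'table')
p output_mode({})
```

Passing the environment as an argument keeps the sketch testable without touching the real `ENV`.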
339
192
 
340
- #### Unary element-wise: vector.func => vector
341
-
342
- | Method |Boolean|Numeric|String|Options|Remarks|
343
- | ------------ | --- | --- | --- | --- | ----- |
344
- | ✓ `-@` | | ✓ | | |as `-vector`|
345
- | ✓ `negate` | | ✓ | | |`-@` |
346
- | ✓ `abs` | | ✓ | | | |
347
- |[ ]`acos` | | [ ] | | | |
348
- |[ ]`asin` | | [ ] | | | |
349
- | ✓ `atan` | | ✓ | | | |
350
- | ✓ `bit_wise_not`| | (✓) | | |integer only|
351
- |[ ]`ceil` | | ✓ | | | |
352
- | ✓ `cos` | | ✓ | | | |
353
- |[ ]`floor` | | ✓ | | | |
354
- | ✓ `invert` | ✓ | | | |`!`, alias `not`|
355
- |[ ]`ln` | | [ ] | | | |
356
- |[ ]`log10` | | [ ] | | | |
357
- |[ ]`log1p` | | [ ] | | | |
358
- |[ ]`log2` | | [ ] | | | |
359
- |[ ]`round` | | [ ] | |[ ] Round| |
360
- |[ ]`round_to_multiple`| | [ ] | |[ ] RoundToMultiple| |
361
- | ✓ `sign` | | ✓ | | | |
362
- | ✓ `sin` | | ✓ | | | |
363
- | ✓ `tan` | | ✓ | | | |
364
- |[ ]`trunc` | | ✓ | | | |
365
-
366
- #### Binary element-wise: vector.func(vector) => vector
367
-
368
- | Method |Boolean|Numeric|String|Options|Remarks|
369
- | ----------------- | --- | --- | --- | --- | ----- |
370
- | ✓ `add` | | ✓ | | | `+` |
371
- | ✓ `atan2` | | ✓ | | | |
372
- | ✓ `and_kleene` | ✓ | | | | `&` |
373
- | ✓ `and_org ` | ✓ | | | |`and` in Red Arrow|
374
- | ✓ `and_not` | ✓ | | | | |
375
- | ✓ `and_not_kleene`| ✓ | | | | |
376
- | ✓ `bit_wise_and` | | (✓) | | |integer only|
377
- | ✓ `bit_wise_or` | | (✓) | | |integer only|
378
- | ✓ `bit_wise_xor` | | (✓) | | |integer only|
379
- | ✓ `divide` | | ✓ | | | `/` |
380
- | ✓ `equal` | ✓ | ✓ | ✓ | |`==`, alias `eq`|
381
- | ✓ `greater` | ✓ | ✓ | ✓ | |`>`, alias `gt`|
382
- | ✓ `greater_equal` | ✓ | ✓ | ✓ | |`>=`, alias `ge`|
383
- | ✓ `is_finite` | | ✓ | | | |
384
- | ✓ `is_inf` | | ✓ | | | |
385
- | ✓ `is_na` | ✓ | ✓ | ✓ | | |
386
- | ✓ `is_nan` | | ✓ | | | |
387
- |[ ]`is_nil` | ✓ | ✓ | ✓ |[ ] Null|alias `is_null`|
388
- | ✓ `is_valid` | ✓ | ✓ | ✓ | | |
389
- | ✓ `less` | ✓ | ✓ | ✓ | |`<`, alias `lt`|
390
- | ✓ `less_equal` | ✓ | ✓ | ✓ | |`<=`, alias `le`|
391
- |[ ]`logb` | | [ ] | | | |
392
- |[ ]`mod` | | [ ] | | | `%` |
393
- | ✓ `multiply` | | ✓ | | | `*` |
394
- | ✓ `not_equal` | ✓ | ✓ | ✓ | |`!=`, alias `ne`|
395
- | ✓ `or_kleene` | ✓ | | | | `\|` |
396
- | ✓ `or_org` | ✓ | | | |`or` in Red Arrow|
397
- | ✓ `power` | | ✓ | | | `**` |
398
- | ✓ `subtract` | | ✓ | | | `-` |
399
- | ✓ `shift_left` | | (✓) | | |`<<`, integer only|
400
- | ✓ `shift_right` | | (✓) | | |`>>`, integer only|
401
- | ✓ `xor` | ✓ | | | | `^` |
402
-
403
- ##### (Not impremented)
404
- - [ ] sort, sort_index
405
- - [ ] argmin, argmax
406
- - [ ] (array functions)
407
- - [ ] (strings functions)
408
- - [ ] (temporal functions)
409
- - [ ] (conditional functions)
410
- - [ ] (index functions)
411
- - [ ] (other functions)
412
-
413
- ### Coerce (not impremented)
414
-
415
- ### Updating (not impremented)
416
-
417
- ### DSL in a block for faster calculation ?
418
-
193
+ For more detailed information about TDR, see [TDR.md](doc/tdr.md).
419
194
 
420
195
  ## Development
421
196
 
data/benchmark/csv_load_penguins.yml ADDED
@@ -0,0 +1,15 @@
1
+ prelude: |
2
+   require 'datasets-arrow'
3
+   require 'rover'
4
+   require 'red_amber'
5
+
6
+   penguins_csv = 'benchmark/cache/penguins.csv'
7
+
8
+   unless File.exist?(penguins_csv)
9
+     arrow = Datasets::Penguins.new.to_arrow
10
+     RedAmber::DataFrame.new(arrow).save(penguins_csv)
11
+   end
12
+
13
+ benchmark:
14
+   'penguins by Rover': Rover.read_csv(penguins_csv)
15
+   'penguins by RedAmber': RedAmber::DataFrame.load(penguins_csv)
data/benchmark/drop_nil.yml ADDED
@@ -0,0 +1,11 @@
1
+ prelude: |
2
+   require 'datasets-arrow'
3
+   require 'red_amber'
4
+
5
+   penguins = RedAmber::DataFrame.new(Datasets::Penguins.new.to_arrow)
6
+
7
+   def drop_nil(penguins)
8
+     penguins.remove { vectors.map { |v| v.is_nil} }
9
+   end
10
+
11
+ benchmark: drop_nil(penguins)