red_amber 0.1.2 → 0.1.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +21 -10
- data/CHANGELOG.md +162 -6
- data/Gemfile +3 -0
- data/README.md +89 -303
- data/benchmark/csv_load_penguins.yml +15 -0
- data/benchmark/drop_nil.yml +11 -0
- data/doc/DataFrame.md +840 -0
- data/doc/Vector.md +317 -0
- data/doc/image/arrow_table_new.png +0 -0
- data/doc/image/dataframe/assign.png +0 -0
- data/doc/image/dataframe/drop.png +0 -0
- data/doc/image/dataframe/pick.png +0 -0
- data/doc/image/dataframe/remove.png +0 -0
- data/doc/image/dataframe/rename.png +0 -0
- data/doc/image/dataframe/slice.png +0 -0
- data/doc/image/dataframe_model.png +0 -0
- data/doc/image/example_in_red_arrow.png +0 -0
- data/doc/image/tdr.png +0 -0
- data/doc/image/tdr_and_table.png +0 -0
- data/doc/image/tidy_data_in_TDR.png +0 -0
- data/doc/image/vector/binary_element_wise.png +0 -0
- data/doc/image/vector/unary_aggregation.png +0 -0
- data/doc/image/vector/unary_aggregation_w_option.png +0 -0
- data/doc/image/vector/unary_element_wise.png +0 -0
- data/doc/tdr.md +56 -0
- data/doc/tdr_ja.md +56 -0
- data/lib/red_amber/data_frame.rb +68 -35
- data/lib/red_amber/data_frame_displayable.rb +132 -0
- data/lib/red_amber/data_frame_helper.rb +64 -0
- data/lib/red_amber/data_frame_indexable.rb +38 -0
- data/lib/red_amber/data_frame_observation_operation.rb +83 -0
- data/lib/red_amber/data_frame_selectable.rb +34 -43
- data/lib/red_amber/data_frame_variable_operation.rb +133 -0
- data/lib/red_amber/vector.rb +58 -6
- data/lib/red_amber/vector_compensable.rb +68 -0
- data/lib/red_amber/vector_functions.rb +147 -68
- data/lib/red_amber/version.rb +1 -1
- data/lib/red_amber.rb +9 -1
- data/red_amber.gemspec +3 -6
- metadata +36 -9
- data/lib/red_amber/data_frame_output.rb +0 -116
data/README.md
CHANGED
````diff
@@ -3,18 +3,27 @@
 A simple dataframe library for Ruby (experimental)
 
 - Powered by [Red Arrow](https://github.com/apache/arrow/tree/master/ruby/red-arrow)
--
+- Inspired by the dataframe library [Rover-df](https://github.com/ankane/rover)
 
 ## Requirements
 
 ```ruby
-gem 'red-arrow', '>=
-gem 'red-parquet', '>=
+gem 'red-arrow', '>= 8.0.0'
+gem 'red-parquet', '>= 8.0.0' # if you use IO from/to parquet
 gem 'rover-df', '~> 0.3.0' # if you use IO from/to Rover::DataFrame
 ```
 
 ## Installation
 
+Install requirements before you install Red Amber.
+
+- Apache Arrow GLib (>= 8.0.0)
+- Apache Parquet GLib (>= 8.0.0)
+
+See [Apache Arrow install document](https://arrow.apache.org/install/).
+
+Minimum installation example for the latest Ubuntu is in the ['Prepare the Apache Arrow' section in ci test](https://github.com/heronshoes/red_amber/blob/master/.github/workflows/test.yml) of Red Amber.
+
 Add this line to your Gemfile:
 
 ```ruby
````
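The Gemfile snippet in this hunk raises the Red Arrow and Red Parquet floor to `>= 8.0.0` and keeps `rover-df` at `~> 0.3.0`. As a small illustration that is not part of red_amber, the standard RubyGems `Gem::Requirement` API can show which versions each constraint admits; the candidate version numbers below are arbitrary examples, not actual releases being recommended.

```ruby
require 'rubygems' # already loaded in a normal Ruby process; explicit for clarity

# Constraints copied from the Gemfile snippet in the diff above.
CONSTRAINTS = {
  'red-arrow'   => '>= 8.0.0',
  'red-parquet' => '>= 8.0.0',
  'rover-df'    => '~> 0.3.0'
}.freeze

# True when +version+ satisfies the constraint listed for +gem_name+.
def satisfies?(gem_name, version)
  Gem::Requirement.new(CONSTRAINTS.fetch(gem_name))
                  .satisfied_by?(Gem::Version.new(version))
end

# Arbitrary sample versions chosen to show where each boundary falls.
{
  'red-arrow' => %w[7.0.0 8.0.0 9.0.0],
  'rover-df'  => %w[0.2.9 0.3.0 0.3.4 0.4.0]
}.each do |gem_name, versions|
  versions.each do |v|
    puts format('%-10s %-6s %s', gem_name, v, satisfies?(gem_name, v))
  end
end
```

Under RubyGems rules, `~> 0.3.0` means `>= 0.3.0` and `< 0.4`, so 0.3.4 passes and 0.4.0 does not, while `>= 8.0.0` rejects 7.0.0.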
````diff
@@ -23,339 +32,116 @@ gem 'red_amber'
 
 And then execute:
 
-
+```shell
+bundle install
+```
 
 Or install it yourself as:
 
-
+```shell
+gem install red_amber
+```
 
 ## `RedAmber::DataFrame`
 
-
-
-- [x] `new` from a columnar Hash
-- `RedAmber::DataFrame.new(x: [1, 2, 3])`
-
-- [x] `new` from a schema (by Hash) and rows (by Array)
-- `RedAmber::DataFrame.new({:x=>:uint8}, [[1], [2], [3]])`
-
-- [x] `new` from an Arrow::Table
-- `RedAmber::DataFrame.new(Arrow::Table.new(x: [1, 2, 3]))`
-
-- [x] `new` from a Rover::DataFrame
-- `RedAmber::DataFrame.new(Rover::DataFrame.new(x: [1, 2, 3]))`
-
-- [ ] `load` (class method)
-
-- [x] from a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
-- `RedAmber::DataFrame.load("test/entity/with_header.csv")`
-
-- [x] from a string buffer
-
-- [x] from a URI
-- `RedAmber::DataFrame.load(URI("https://github.com/heronshoes/red_amber/blob/master/test/entity/with_header.csv"))`
-
-- [ ] from a parquet file
-
-- [ ] `save` (instance method)
-
-- [x] to a [`.arrow`, `.arrows`, `.csv`, `.csv.gz`, `.tsv`] file
-
-- [x] to a string buffer
-
-- [x] to a URI
-
-- [ ] to a parquet file
-
-### Properties
-
-- [x] `table`
-
-Reader of Arrow::Table object inside.
-
-- [x] `n_rows`, `nrow`, `size`, `length`
-
-Returns num of rows (data size).
-
-- [x] `n_columns`, `ncol`, `width`
-
-Returns num of columns (num of vectors).
-
-- [x] `shape`
-
-Returns shape in an Array[n_rows, n_cols].
-
-- [x] `column_names`, `keys`
-
-Returns num of column names by an Array.
-
-- [x] `types`
-
-Returns types of columns by an Array of Symbols.
-
-- [x] `data_types`
-
-Returns types of columns by an Array of `Arrow::DataType`.
-
-- [x] `vectors`
-
-Returns an Array of Vectors.
-
-- [x] `to_h`
-
-Returns column-oriented data in a Hash.
-
-- [x] `to_a`, `raw_records`
+Represents a set of data in 2D-shape.
 
-
-
--
-
-Returns column name and data type in a Hash.
-
-- [x] `==`
-
-- [x] `empty?`
-
-### Output
-
-- [x] `to_s`
-
-- [ ] summary, describe
-
-- [x] `to_rover`
+```ruby
+require 'red_amber'
+require 'datasets-arrow'
 
-
+arrow = Datasets::Penguins.new.to_arrow
+penguins = RedAmber::DataFrame.new(arrow)
+penguins.tdr
+# =>
+RedAmber::DataFrame : 344 x 8 Vectors
+Vectors : 5 numeric, 3 strings
+# key type level data_preview
+1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
+2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
+3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
+4 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
+5 :flipper_length_mm uint8 56 [181, 186, 195, nil, 193, ... ], 2 nils
+6 :body_mass_g uint16 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
+7 :sex string 3 {"male"=>168, "female"=>165, nil=>11}
+8 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
+```
 
-
+### DataFrame model
+
 
-
+For example, `DataFrame#pick` accepts keys as an argument and returns a sub DataFrame.
 
 ```ruby
-
-RedAmber::DataFrame.new(hash)
+df = penguins.pick(:body_mass_g)
 # =>
-RedAmber::DataFrame :
-
-# key
-1 :
-2 :b string 3 [A, B, C]
-3 :c double 3 [1.0, 2.0, 3.0]
+#<RedAmber::DataFrame : 344 x 1 Vector, 0x000000000000fa14>
+Vector : 1 numeric
+# key type level data_preview
+1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
 ```
 
-
-- max_element: max num of element to show values in each row
+`DataFrame#assign` creates new variables (column in the table).
 
-### Selecting
-
-- [x] Select columns by `[]` as `[key]`, `[keys]`, `[keys[index]]`
-- Key in a Symbol: `df[:symbol]`
-- Key in a String: `df["string"]`
-- Keys in an Array: `df[:symbol1`, `"string"`, `:symbol2`
-- Keys in indeces: `df[df.keys[0]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
-- Keys in a Range:
-A end-less Range can be used to represent keys.
 ```ruby
-
-df = RedAmber::DataFrame.new(hash)
-df[:b..:c, "a"]
+df.assign(:body_mass_kg => df[:body_mass_g] / 1000.0)
 # =>
-RedAmber::DataFrame :
-
-# key
-1 :
-2 :
-3 :a uint8 3 [1, 2, 3]
+#<RedAmber::DataFrame : 344 x 2 Vectors, 0x000000000000fa28>
+Vectors : 2 numeric
+# key type level data_preview
+1 :body_mass_g int64 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
+2 :body_mass_kg double 95 [3.75, 3.8, 3.25, nil, 3.45, ... ], 2 nils
 ```
 
-
-- Select a row by index: `df[0]`
-- Select rows by indeces in a Range: `df[1..2]`
-- Select rows by indeces in an Array: `df[1, 2]`
-- Mixed case: `df[2, 0..]`
-
-- [x] Select rows from top or bottom
-
-`head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`
-
-- [ ] slice
-
-### Updating
-
-- [ ] Add a new column
-
-- [ ] Update a single element
-
-- [ ] Update multiple elements
-
-- [ ] Update all elements
+DataFrame manipulating methods like `pick`, `drop`, `slice`, `remove`, `rename` and `assign` accept a block.
 
-
+This is an exaple to eliminate observations (row in the table) containing nil.
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-### Combining DataFrames
+```ruby
+# remove all observation contains nil
+nil_removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
+nil_removed.tdr
+# =>
+RedAmber::DataFrame : 342 x 8 Vectors
+Vectors : 5 numeric, 3 strings
+# key type level data_preview
+1 :species string 3 {"Adelie"=>151, "Chinstrap"=>68, "Gentoo"=>123}
+2 :island string 3 {"Torgersen"=>51, "Biscoe"=>167, "Dream"=>124}
+3 :bill_length_mm double 164 [39.1, 39.5, 40.3, 36.7, 39.3, ... ]
+4 :bill_depth_mm double 80 [18.7, 17.4, 18.0, 19.3, 20.6, ... ]
+5 :flipper_length_mm int64 55 [181, 186, 195, 193, 190, ... ]
+6 :body_mass_g int64 94 [3750, 3800, 3250, 3450, 3650, ... ]
+7 :sex string 3 {"male"=>168, "female"=>165, ""=>9}
+8 :year int64 3 {2007=>109, 2008=>114, 2009=>119}
+```
 
-
+For this frequently needed task, we can do it much simpler.
 
-
+```ruby
+penguins.remove_nil # => same result as above
+```
 
-
+See [DataFrame.md](doc/DataFrame.md) for details.
 
-- [ ] Left join
 
-
+## `RedAmber::Vector`
 
-
+Class `RedAmber::Vector` represents a series of data in the DataFrame.
 
-
+```ruby
+penguins[:bill_length_mm]
+# =>
+#<RedAmber::Vector(:double, size=344):0x000000000000f8fc>
+[39.1, 39.5, 40.3, nil, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, ... ]
+```
 
-
+Vectors accepts some [functional methods from Arrow](https://arrow.apache.org/docs/cpp/compute.html).
 
+See [Vector.md](doc/Vector.md) for details.
 
-##
-### Constructor
-
-- [x] Create from a column in a DataFrame
-
-- [x] New from an Array
-
-### Properties
-
-- [x] `to_s`
-
-- [x] `values`, `to_a`, `entries`
-
-- [x] `size`, `length`, `n_rows`, `nrow`
-
-- [x] `type`
-
-- [x] `data_type`
-
-- [ ] `each`
-
-- [ ] `chunked?`
-
-- [ ] `n_chunks`
-
-- [ ] `each_chunk`
-
-- [x] `tally`
-
-- [ ] `n_nulls`
-
-### Functions
-#### Unary aggregations: vector.func => Scalar
-
-| Method |Boolean|Numeric|String|Remarks|
-| ------------ | --- | --- | --- | ----- |
-|[x] `all` | [x] | | | |
-|[x] `any` | [x] | | | |
-|[x] `approximate_median`| | [x] | | |
-|[x] `count` | [x] | [x] | [x] | |
-|[x] `count_distinct`| [x] | [x] | [x] | |
-|[x] `count_uniq` | [x] | [x] | [x] |an alias of `count_distinct`|
-|[ ] `index` | | | | |
-|[x] `max` | [x] | [x] | [x] | |
-|[x] `mean` | [x] | [x] | | |
-|[x] `min` | [x] | [x] | [x] | |
-|[ ] `min_max` | | | | |
-|[ ] `mode` | | | | |
-|[x] `product` | [x] | [x] | | |
-|[ ] `quantile`| | | | |
-|[x] `stddev` | | [x] | | |
-|[x] `sum` | [x] | [x] | | |
-|[ ] `tdigest` | | | | |
-|[x] `variance`| | [x] | | |
-
-#### Unary element-wise: vector.func => Vector
-
-| Method |Boolean|Numeric|String|Remarks|
-| ------------ | --- | --- | --- | ----- |
-|[x] `-@` | | [x] | |as `-vector`|
-|[x] `negate` | | [x] | |`-@` |
-|[x] `abs` | | [x] | | |
-|[ ] `acos` | | [ ] | | |
-|[ ] `asin` | | [ ] | | |
-|[x] `atan` | | [x] | | |
-|[ ] `ceil` | | [x] | | |
-|[x] `cos` | | [x] | | |
-|[ ] `floor` | | [x] | | |
-|[ ] `ln` | | [ ] | | |
-|[ ] `log10` | | [ ] | | |
-|[ ] `log1p` | | [ ] | | |
-|[ ] `log2` | | [ ] | | |
-|[x] `sign` | | [x] | | |
-|[x] `sin` | | [x] | | |
-|[x] `tan` | | [x] | | |
-|[ ] `trunc` | | [x] | | |
-
-#### Binary element-wise: vector.func(vector) => Vector
-
-| Method |Boolean|Numeric|String|Remarks|
-| ------------------ | --- | --- | --- | ----- |
-|[x] `add` | | [x] | | `+` |
-|[x] `atan2` | | [x] | | |
-|[x] `and` | [x] | | | |
-|[x] `and_kleene` | [x] | | | |
-|[x] `and_not` | [x] | | | |
-|[x] `and_not_kleene`| [x] | | | |
-|[x] `bit_wise_and` | |([x])| |`&`, integer only|
-|[ ] `bit_wise_not` | |([x])| |`!`, integer only|
-|[x] `bit_wise_or` | |([x])| |`|`, integer only|
-|[x] `bit_wise_xor` | |([x])| |`^`, integer only|
-|[x] `divide` | | [x] | | `/` |
-|[x] `equal` | [x] | [x] | [x] |`==`, alias `eq`|
-|[x] `greater` | [x] | [x] | [x] |`>`, alias `gt`|
-|[x] `greater_equal` | [x] | [x] | [x] |`>=`, alias `ge`|
-|[x] `less` | [x] | [x] | [x] |`<`, alias `lt`|
-|[x] `less_equal` | [x] | [x] | [x] |`<=`, alias `le`|
-|[ ] `logb` | | [ ] | | |
-|[ ] `mod` | | [ ] | | |
-|[x] `multiply` | | [x] | | `*` |
-|[x] `not_equal` | [x] | [x] | [x] |`!=`, alias `ne`|
-|[x] `or` | [x] | | | |
-|[x] `or_kleene` | [x] | | | |
-|[x] `power` | | [x] | | `**` |
-|[x] `subtract` | | [x] | | `-` |
-|[x] `shift_left` | |([x])| |`<<`, integer only|
-|[x] `shift_right` | |([x])| |`>>`, integer only|
-|[x] `xor` | [x] | | | |
-
-##### (Not impremented)
-- [ ] invert, round, round_to_multiple
-- [ ] sort, sort_index
-- [ ] minmax, var, median, quantile
-- [ ] argmin, argmax
-- [ ] (array functions)
-- [ ] (strings functions)
-- [ ] (temporal functions)
-- [ ] (conditional functions)
-- [ ] (index functions)
-- [ ] (other functions)
-
-### Coerce (not impremented)
-
-### Updating (not impremented)
-
-### DSL in a block for faster calculation ?
+## TDR concept
 
+I named the data frame representation style in the model above as TDR (Transposed DataFrame Representation). See [TDR.md](doc/tdr.md) for details.
 
 ## Development
 
````
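The new README removes nil-containing observations with `penguins.remove { vectors.map(&:is_nil).reduce(&:|) }`: one Boolean nil-mask per vector, OR-ed element-wise into a single row mask. The following plain-Ruby sketch reproduces only that masking logic on made-up data, with Arrays standing in for red_amber Vectors; it is not how red_amber implements `remove` or `remove_nil`, which work on Arrow arrays.

```ruby
# Toy column-oriented data (made-up values, not the penguins dataset).
columns = {
  x: [1, nil, 3, 4],
  y: ['a', 'b', nil, 'd'],
  z: [1.0, 2.0, 3.0, 4.0]
}

# One Boolean mask per column, true where that column holds nil
# (the role played by `vectors.map(&:is_nil)` in the README example).
masks = columns.values.map { |col| col.map(&:nil?) }

# Element-wise OR across all masks: true where any column is nil
# (the role played by `.reduce(&:|)`).
any_nil = masks.reduce { |acc, m| acc.zip(m).map { |a, b| a | b } }

# Drop the flagged rows from every column.
kept = columns.transform_values { |col| col.reject.with_index { |_, i| any_nil[i] } }

p any_nil   # => [false, true, true, false]
p kept[:x]  # => [1, 4]
p kept[:y]  # => ["a", "d"]
```

Because the mask is computed once and then applied to every column, all columns stay the same length, which is the property a DataFrame row filter has to preserve.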
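The README's TDR concept lays out one line per variable (key, type, number of distinct values, and a short preview), as in the `tdr` outputs shown in the diff. The helper below, `tdr_lines`, is a hypothetical plain-Ruby sketch of that transposed layout for a column-oriented Hash. It is not red_amber's code: it reports Ruby class names rather than Arrow types, and the `tally_limit` threshold for switching to value counts is an assumption.

```ruby
# Hypothetical helper sketching a transposed, one-line-per-variable summary.
# Not red_amber's implementation; tally_limit and preview are assumed defaults.
def tdr_lines(columns, tally_limit: 5, preview: 5)
  columns.each_with_index.map do |(key, values), i|
    sample = values.compact.first
    type = sample ? sample.class.name.downcase : 'nil'
    level = values.uniq.size # distinct values, nil counted as one level

    data =
      if level <= tally_limit
        values.tally.inspect # few levels: show counts per value
      else
        head = values.first(preview).map { |v| v.nil? ? 'nil' : v.inspect }
        head << '...' if values.size > preview
        nils = values.count(nil)
        "[#{head.join(', ')}]" + (nils.positive? ? ", #{nils} nils" : '')
      end

    format('%d %-8s %-8s %5d %s', i + 1, key.inspect, type, level, data)
  end
end

# Toy data (made up, not the penguins dataset); columns have equal length.
columns = {
  island: %w[Biscoe Dream Biscoe Torgersen Dream Biscoe Dream],
  mass:   [3750, 3800, nil, 3450, 3650, nil, 4675]
}

puts tdr_lines(columns)
```

Each variable becomes a row of the summary, so wide tables stay readable, which is the motivation the README gives for naming the style TDR.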
data/benchmark/csv_load_penguins.yml
````diff
@@ -0,0 +1,15 @@
+prelude: |
+  require 'datasets-arrow'
+  require 'rover'
+  require 'red_amber'
+
+  penguins_csv = 'benchmark/cache/penguins.csv'
+
+  unless File.exist?(penguins_csv)
+    arrow = Datasets::Penguins.new.to_arrow
+    RedAmber::DataFrame.new(arrow).save(penguins_csv)
+  end
+
+benchmark:
+  'penguins by Rover': Rover.read_csv(penguins_csv)
+  'penguins by RedAmber': RedAmber::DataFrame.load(penguins_csv)
````
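The new benchmark file appears to follow the YAML format of the benchmark_driver gem: `prelude` holds Ruby setup code that runs once, and each entry under `benchmark` maps a label to a Ruby expression to be timed. The toy runner below is a hypothetical illustration of that split, not benchmark_driver itself; it substitutes trivial Array expressions for the CSV loads so it runs with only the standard library, and it uses a fixed loop count with no warm-up.

```ruby
require 'yaml'

# Same shape as benchmark/csv_load_penguins.yml, but with stand-in
# expressions so this runs without red_amber, rover or datasets-arrow.
CONFIG = <<~'YAML'
  prelude: |
    data = Array.new(10_000) { |i| 10_000 - i }

  benchmark:
    'sum': data.sum
    'sort': data.sort
YAML

config = YAML.safe_load(CONFIG)

# Evaluate the prelude once; locals it defines stay visible to later
# evals through the same Binding object.
context = binding
context.eval(config['prelude'])

# Time each benchmark expression (toy harness: fixed loop count, no warm-up).
config['benchmark'].each do |label, expr|
  loops = 100
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  loops.times { context.eval(expr) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  puts format('%-6s %10.1f i/s', label, loops / elapsed)
end
```

With benchmark_driver installed, the real file would presumably be run as `benchmark-driver benchmark/csv_load_penguins.yml`; the prelude's `unless File.exist?` guard means the penguins CSV is generated only on the first run.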