red_amber 0.2.1 → 0.2.3

Sign up to get free protection for your applications and to get access to all the features.
Files changed (58) hide show
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +15 -0
  3. data/CHANGELOG.md +170 -20
  4. data/Gemfile +4 -2
  5. data/README.md +121 -302
  6. data/benchmark/basic.yml +79 -0
  7. data/benchmark/combine.yml +63 -0
  8. data/benchmark/drop_nil.yml +15 -3
  9. data/benchmark/group.yml +33 -0
  10. data/benchmark/reshape.yml +27 -0
  11. data/benchmark/{csv_load_penguins.yml → rover/csv_load_penguins.yml} +3 -3
  12. data/benchmark/rover/flights.yml +23 -0
  13. data/benchmark/rover/penguins.yml +23 -0
  14. data/benchmark/rover/planes.yml +23 -0
  15. data/benchmark/rover/weather.yml +23 -0
  16. data/doc/DataFrame.md +611 -318
  17. data/doc/Vector.md +31 -36
  18. data/doc/image/basic_verbs.png +0 -0
  19. data/doc/image/dataframe/assign.png +0 -0
  20. data/doc/image/dataframe/assign_operation.png +0 -0
  21. data/doc/image/dataframe/drop.png +0 -0
  22. data/doc/image/dataframe/join.png +0 -0
  23. data/doc/image/dataframe/pick.png +0 -0
  24. data/doc/image/dataframe/pick_operation.png +0 -0
  25. data/doc/image/dataframe/remove.png +0 -0
  26. data/doc/image/dataframe/rename.png +0 -0
  27. data/doc/image/dataframe/rename_operation.png +0 -0
  28. data/doc/image/dataframe/reshaping_DataFrames.png +0 -0
  29. data/doc/image/dataframe/set_and_bind.png +0 -0
  30. data/doc/image/dataframe/slice.png +0 -0
  31. data/doc/image/dataframe/slice_operation.png +0 -0
  32. data/doc/image/dataframe_model.png +0 -0
  33. data/doc/image/group_operation.png +0 -0
  34. data/doc/image/replace-if_then.png +0 -0
  35. data/doc/image/reshaping_dataframe.png +0 -0
  36. data/doc/image/screenshot.png +0 -0
  37. data/doc/image/vector/binary_element_wise.png +0 -0
  38. data/doc/image/vector/unary_aggregation.png +0 -0
  39. data/doc/image/vector/unary_aggregation_w_option.png +0 -0
  40. data/doc/image/vector/unary_element_wise.png +0 -0
  41. data/lib/red_amber/data_frame.rb +16 -42
  42. data/lib/red_amber/data_frame_combinable.rb +283 -0
  43. data/lib/red_amber/data_frame_displayable.rb +58 -3
  44. data/lib/red_amber/data_frame_loadsave.rb +36 -0
  45. data/lib/red_amber/data_frame_reshaping.rb +8 -6
  46. data/lib/red_amber/data_frame_selectable.rb +9 -9
  47. data/lib/red_amber/data_frame_variable_operation.rb +27 -21
  48. data/lib/red_amber/group.rb +100 -17
  49. data/lib/red_amber/helper.rb +20 -30
  50. data/lib/red_amber/vector.rb +56 -30
  51. data/lib/red_amber/vector_functions.rb +0 -8
  52. data/lib/red_amber/vector_selectable.rb +9 -1
  53. data/lib/red_amber/vector_updatable.rb +61 -63
  54. data/lib/red_amber/version.rb +1 -1
  55. data/lib/red_amber.rb +2 -0
  56. data/red_amber.gemspec +1 -1
  57. metadata +32 -11
  58. data/doc/examples_of_red_amber.ipynb +0 -8979
data/doc/DataFrame.md CHANGED
@@ -5,7 +5,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
5
5
  - A label is attached to `Vector`. We call it `key`.
6
6
  - A `Vector` and associated `key` is grouped as a `variable`.
7
7
  - `variable`s with same vector length are aligned and arranged to be a `DataFrame`.
8
- - Each `Vector` in a `DataFrame` contains a set of relating data at same position. We call it `observation`.
8
+ - Each `key` in a `DataFrame` must be unique.
9
+ - Each `Vector` in a `DataFrame` contains a set of relating data at same position. We call it `record` or `observation`.
9
10
 
10
11
  ![dataframe model image](doc/../image/dataframe_model.png)
11
12
 
@@ -14,30 +15,38 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
14
15
  ### `new` from a Hash
15
16
 
16
17
  ```ruby
17
- RedAmber::DataFrame.new(x: [1, 2, 3])
18
+ df = RedAmber::DataFrame.new(x: [1, 2, 3], y: %w[A B C])
18
19
  ```
19
20
 
20
21
  ### `new` from a schema (by Hash) and data (by Array)
21
22
 
22
23
  ```ruby
23
- RedAmber::DataFrame.new({:x=>:uint8}, [[1], [2], [3]])
24
+ RedAmber::DataFrame.new({x: :uint8, y: :string}, [[1, "A"], [2, "B"], [3, "C"]])
24
25
  ```
25
26
 
26
27
  ### `new` from an Arrow::Table
27
28
 
28
29
 
29
30
  ```ruby
30
- table = Arrow::Table.new(x: [1, 2, 3])
31
+ table = Arrow::Table.new(x: [1, 2, 3], y: %w[A B C])
31
32
  RedAmber::DataFrame.new(table)
32
33
  ```
33
34
 
35
+ ### `new` from an Object which responds to `to_arrow`
36
+
37
+ ```ruby
38
+ require "datasets-arrow"
39
+ dataset = Datasets::Penguins.new
40
+ RedAmber::DataFrame.new(dataset)
41
+ ```
42
+
34
43
  ### `new` from a Rover::DataFrame
35
44
 
36
45
 
37
46
  ```ruby
38
47
  require 'rover'
39
48
 
40
- rover = Rover::DataFrame.new(x: [1, 2, 3])
49
+ rover = Rover::DataFrame.new(x: [1, 2, 3], y: %w[A B C])
41
50
  RedAmber::DataFrame.new(rover)
42
51
  ```
43
52
 
@@ -63,7 +72,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
63
72
  ```ruby
64
73
  require 'parquet'
65
74
 
66
- dataframe = RedAmber::DataFrame.load("file.parquet")
75
+ df = RedAmber::DataFrame.load("file.parquet")
67
76
  ```
68
77
 
69
78
  ### `save` (instance method)
@@ -79,20 +88,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
79
88
  ```ruby
80
89
  require 'parquet'
81
90
 
82
- dataframe.save("file.parquet")
91
+ df.save("file.parquet")
83
92
  ```
84
93
 
85
94
  ## Properties
86
95
 
87
96
  ### `table`, `to_arrow`
88
97
 
89
- - Reader of Arrow::Table object inside.
98
+ - Returns Arrow::Table object in the DataFrame.
90
99
 
91
- ### `size`, `n_obs`, `n_rows`
100
+ ### `size`, `n_records`, `n_obs`, `n_rows`
92
101
 
93
- - Returns size of Vector (num of observations).
94
-
95
- ### `n_keys`, `n_vars`, `n_cols`,
102
+ - Returns size of Vector (num of records).
103
+
104
+ ### `n_keys`, `n_variables`, `n_vars`, `n_cols`,
96
105
 
97
106
  - Returns num of keys (num of variables).
98
107
 
@@ -130,16 +139,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
130
139
 
131
140
  - Returns key names in an Array.
132
141
 
133
- When we use it with vectors, Vector#key is useful to get the key inside of DataFrame.
134
-
135
- ```ruby
136
- # update numeric variables, another solution
137
- df.assign do
138
- vectors.each_with_object({}) do |vector, assigner|
139
- assigner[vector.key] = vector * -1 if vector.numeric?
140
- end
141
- end
142
- ```
142
+ Each key must be unique in the DataFrame.
143
143
 
144
144
  ### `types`
145
145
 
@@ -153,9 +153,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
153
153
 
154
154
  - Returns an Array of Vectors.
155
155
 
156
+ When we use it, Vector#key is useful to get the key in the DataFrame.
157
+
158
+ ```ruby
159
+ # update numeric variables, another solution
160
+ df.assign do
161
+ vectors.each_with_object({}) do |vector, assigner|
162
+ assigner[vector.key] = vector * -1 if vector.numeric?
163
+ end
164
+ end
165
+ ```
166
+
156
167
  ### `indices`, `indexes`
157
168
 
158
- - Returns indexes in an Array.
169
+ - Returns indexes in a Vector.
159
170
  Accepts an option `start` as the first of indexes.
160
171
 
161
172
  ```ruby
@@ -163,15 +174,19 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
163
174
  df.indices
164
175
 
165
176
  # =>
177
+ #<RedAmber::Vector(:uint8, size=5):0x0000000000013ed4>
166
178
  [0, 1, 2, 3, 4]
167
179
 
168
180
  df.indices(1)
169
181
 
170
182
  # =>
183
+ #<RedAmber::Vector(:uint8, size=5):0x0000000000018fd8>
171
184
  [1, 2, 3, 4, 5]
172
185
 
173
186
  df.indices(:a)
187
+
174
188
  # =>
189
+ #<RedAmber::Vector(:dictionary, size=5):0x000000000001bd50>
175
190
  [:a, :b, :c, :d, :e]
176
191
  ```
177
192
 
@@ -210,15 +225,15 @@ puts penguins.to_s
210
225
  # =>
211
226
  species island bill_length_mm bill_depth_mm flipper_length_mm ... year
212
227
  <string> <string> <double> <double> <uint8> ... <uint16>
213
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
214
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
215
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
216
- 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
217
- 5 Adelie Torgersen 36.7 19.3 193 ... 2007
228
+ 0 Adelie Torgersen 39.1 18.7 181 ... 2007
229
+ 1 Adelie Torgersen 39.5 17.4 186 ... 2007
230
+ 2 Adelie Torgersen 40.3 18.0 195 ... 2007
231
+ 3 Adelie Torgersen (nil) (nil) (nil) ... 2007
232
+ 4 Adelie Torgersen 36.7 19.3 193 ... 2007
218
233
  : : : : : : ... :
219
- 342 Gentoo Biscoe 50.4 15.7 222 ... 2009
220
- 343 Gentoo Biscoe 45.2 14.8 212 ... 2009
221
- 344 Gentoo Biscoe 49.9 16.1 213 ... 2009
234
+ 341 Gentoo Biscoe 50.4 15.7 222 ... 2009
235
+ 342 Gentoo Biscoe 45.2 14.8 212 ... 2009
236
+ 343 Gentoo Biscoe 49.9 16.1 213 ... 2009
222
237
  ```
223
238
  ### `inspect`
224
239
 
@@ -235,11 +250,11 @@ puts penguins.summary.to_s(width: 82) # needs more width to show all stats in th
235
250
  # =>
236
251
  variables count mean std min 25% median 75% max
237
252
  <dictionary> <uint16> <double> <double> <double> <double> <double> <double> <double>
238
- 1 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
239
- 2 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
240
- 3 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
241
- 4 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
242
- 5 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
253
+ 0 bill_length_mm 342 43.92 5.46 32.1 39.23 44.38 48.5 59.6
254
+ 1 bill_depth_mm 342 17.15 1.97 13.1 15.6 17.32 18.7 21.5
255
+ 2 flipper_length_mm 342 200.92 14.06 172.0 190.0 197.0 213.0 231.0
256
+ 3 body_mass_g 342 4201.75 801.95 2700.0 3550.0 4031.5 4750.0 6300.0
257
+ 4 year 344 2008.03 0.82 2007.0 2007.0 2008.0 2009.0 2009.0
243
258
  ```
244
259
 
245
260
  ### `to_rover`
@@ -265,26 +280,29 @@ penguins.to_rover
265
280
  require 'red_amber'
266
281
  require 'datasets-arrow'
267
282
 
268
- penguins = Datasets::Penguins.new.to_arrow
269
- RedAmber::DataFrame.new(penguins).tdr
283
+ dataset = Datasets::Penguins.new
284
+ # (From 0.2.2) responsible to the object which has `to_arrow` method.
285
+ # If older, it should be `dataset.to_arrow` in the parentheses.
286
+ RedAmber::DataFrame.new(dataset).tdr
270
287
 
271
288
  # =>
272
289
  RedAmber::DataFrame : 344 x 8 Vectors
273
290
  Vectors : 5 numeric, 3 strings
274
291
  # key type level data_preview
275
- 1 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
276
- 2 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
277
- 3 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
278
- 4 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
279
- 5 :flipper_length_mm uint8 56 [181, 186, 195, nil, 193, ... ], 2 nils
280
- 6 :body_mass_g uint16 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
281
- 7 :sex string 3 {"male"=>168, "female"=>165, nil=>11}
282
- 8 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
292
+ 0 :species string 3 {"Adelie"=>152, "Chinstrap"=>68, "Gentoo"=>124}
293
+ 1 :island string 3 {"Torgersen"=>52, "Biscoe"=>168, "Dream"=>124}
294
+ 2 :bill_length_mm double 165 [39.1, 39.5, 40.3, nil, 36.7, ... ], 2 nils
295
+ 3 :bill_depth_mm double 81 [18.7, 17.4, 18.0, nil, 19.3, ... ], 2 nils
296
+ 4 :flipper_length_mm uint8 56 [181, 186, 195, nil, 193, ... ], 2 nils
297
+ 5 :body_mass_g uint16 95 [3750, 3800, 3250, nil, 3450, ... ], 2 nils
298
+ 6 :sex string 3 {"male"=>168, "female"=>165, nil=>11}
299
+ 7 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
283
300
  ```
284
-
301
+
302
+ Options:
285
303
  - limit: limit of variables to show. Default value is 10.
286
- - tally: max level to use tally mode.
287
- - elements: max num of element to show values in each observations.
304
+ - tally: max level to use tally mode. Default value is 5.
305
+ - elements: max num of element to show values in each records. Default value is 5.
288
306
 
289
307
  ## Selecting
290
308
 
@@ -294,13 +312,13 @@ penguins.to_rover
294
312
  - Keys in an Array: `df[:symbol1, "string", :symbol2]`
295
313
  - Keys by indeces: `df[df.keys[0]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
296
314
 
297
- Key indeces can be used via `keys[i]` because numbers are used to select observations (rows).
315
+ Key indeces should be used via `keys[i]` because numbers are used to select records (rows). See next section.
298
316
 
299
317
  - Keys by a Range:
300
318
 
301
- If keys are able to represent by Range, it can be included in the arguments. See a example below.
319
+ If keys are able to represent by a Range, it can be included in the arguments. See a example below.
302
320
 
303
- - You can exchange the order of variables (columns).
321
+ - You can also exchange the order of variables (columns).
304
322
 
305
323
  ```ruby
306
324
  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
@@ -311,12 +329,12 @@ penguins.to_rover
311
329
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000328fc>
312
330
  b c a
313
331
  <string> <double> <uint8>
314
- 1 A 1.0 1
315
- 2 B 2.0 2
316
- 3 C 3.0 3
332
+ 0 A 1.0 1
333
+ 1 B 2.0 2
334
+ 2 C 3.0 3
317
335
  ```
318
336
 
319
- If `#[]` represents single variable (column), it returns a Vector object.
337
+ If `#[]` represents a single variable (column), it returns a Vector object.
320
338
 
321
339
  ```ruby
322
340
  df[:a]
@@ -325,6 +343,7 @@ penguins.to_rover
325
343
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
326
344
  [1, 2, 3]
327
345
  ```
346
+
328
347
  Or `#v` method also returns a Vector for a key.
329
348
 
330
349
  ```ruby
@@ -335,18 +354,19 @@ penguins.to_rover
335
354
  [1, 2, 3]
336
355
  ```
337
356
 
338
- This may be useful to use in a block of DataFrame manipulation verbs. We can write `v(:a)` rather than `self[:a]` or `df[:a]`
357
+ This method may be useful to use in a block of DataFrame manipulation verbs. We can write `v(:a)` rather than `self[:a]` or `df[:a]`
339
358
 
340
- ### Select observations (rows in a table) by `[]` as `[index]`, `[range]`, `[array]`
359
+ ### Select records (rows in a table) by `[]` as `[index]`, `[range]`, `[array]`
341
360
 
342
- - Select a obs. by index: `df[0]`
343
- - Select obs. by indeces in a Range: `df[1..2]`
361
+ - Select a record by index: `df[0]`
344
362
 
345
- An end-less or a begin-less Range can be used to represent indeces.
363
+ - Select records by indeces in an Array: `df[1, 2]`
346
364
 
347
- - Select obs. by indeces in an Array: `df[1, 2]`
365
+ - Select records by indeces in a Range: `df[1..2]`
348
366
 
349
- - You can use float indices.
367
+ An end-less or a begin-less Range can be used to represent indeces.
368
+
369
+ - You can use indices in Float.
350
370
 
351
371
  - Mixed case: `df[2, 0..]`
352
372
 
@@ -359,15 +379,15 @@ penguins.to_rover
359
379
  #<RedAmber::DataFrame : 4 x 3 Vectors, 0x0000000000033270>
360
380
  a b c
361
381
  <uint8> <string> <double>
362
- 1 3 C 3.0
363
- 2 1 A 1.0
364
- 3 2 B 2.0
365
- 4 3 C 3.0
382
+ 0 3 C 3.0
383
+ 1 1 A 1.0
384
+ 2 2 B 2.0
385
+ 3 3 C 3.0
366
386
  ```
367
387
 
368
- - Select obs. by a boolean Array or a boolean RedAmber::Vector at same size as self.
388
+ - Select records by a boolean Array or a boolean RedAmber::Vector at same size as self.
369
389
 
370
- It returns a sub dataframe with observations at boolean is true.
390
+ It returns a sub dataframe with records at boolean is true.
371
391
 
372
392
  ```ruby
373
393
  # with the same dataframe `df` above
@@ -382,15 +402,15 @@ penguins.to_rover
382
402
  1 1 A 1.0
383
403
  ```
384
404
 
385
- ### Select rows from top or from bottom
405
+ ### Select records (rows) from top or from bottom
386
406
 
387
407
  `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`
388
408
 
389
409
  ## Sub DataFrame manipulations
390
410
 
391
- ### `pick ` - pick up variables by key label -
411
+ ### `pick ` - pick up variables -
392
412
 
393
- Pick up some columns (variables) to create a sub DataFrame.
413
+ Pick up some variables (columns) to create a sub DataFrame.
394
414
 
395
415
  ![pick method image](doc/../image/dataframe/pick.png)
396
416
 
@@ -405,15 +425,15 @@ penguins.to_rover
405
425
  #<RedAmber::DataFrame : 344 x 2 Vectors, 0x0000000000035ebc>
406
426
  species bill_length_mm
407
427
  <string> <double>
408
- 1 Adelie 39.1
409
- 2 Adelie 39.5
410
- 3 Adelie 40.3
411
- 4 Adelie (nil)
412
- 5 Adelie 36.7
428
+ 0 Adelie 39.1
429
+ 1 Adelie 39.5
430
+ 2 Adelie 40.3
431
+ 3 Adelie (nil)
432
+ 4 Adelie 36.7
413
433
  : : :
414
- 342 Gentoo 50.4
415
- 343 Gentoo 45.2
416
- 344 Gentoo 49.9
434
+ 341 Gentoo 50.4
435
+ 342 Gentoo 45.2
436
+ 343 Gentoo 49.9
417
437
  ```
418
438
 
419
439
  - Indices as arguments
@@ -427,15 +447,15 @@ penguins.to_rover
427
447
  #<RedAmber::DataFrame : 344 x 4 Vectors, 0x0000000000055ce4>
428
448
  species island bill_length_mm year
429
449
  <string> <string> <double> <uint16>
430
- 1 Adelie Torgersen 39.1 2007
431
- 2 Adelie Torgersen 39.5 2007
432
- 3 Adelie Torgersen 40.3 2007
433
- 4 Adelie Torgersen (nil) 2007
434
- 5 Adelie Torgersen 36.7 2007
450
+ 0 Adelie Torgersen 39.1 2007
451
+ 1 Adelie Torgersen 39.5 2007
452
+ 2 Adelie Torgersen 40.3 2007
453
+ 3 Adelie Torgersen (nil) 2007
454
+ 4 Adelie Torgersen 36.7 2007
435
455
  : : : : :
436
- 342 Gentoo Biscoe 50.4 2009
437
- 343 Gentoo Biscoe 45.2 2009
438
- 344 Gentoo Biscoe 49.9 2009
456
+ 341 Gentoo Biscoe 50.4 2009
457
+ 342 Gentoo Biscoe 45.2 2009
458
+ 343 Gentoo Biscoe 49.9 2009
439
459
  ```
440
460
 
441
461
  - Booleans as arguments
@@ -443,21 +463,21 @@ penguins.to_rover
443
463
  `pick(booleans)` accepts booleans as arguments in an Array. Booleans must be same length as `n_keys`.
444
464
 
445
465
  ```ruby
446
- penguins.pick(penguins.types.map { |type| type == :string })
466
+ penguins.pick(penguins.vectors.map(&:string?))
447
467
 
448
468
  # =>
449
469
  #<RedAmber::DataFrame : 344 x 3 Vectors, 0x00000000000387ac>
450
470
  species island sex
451
471
  <string> <string> <string>
452
- 1 Adelie Torgersen male
472
+ 0 Adelie Torgersen male
473
+ 1 Adelie Torgersen female
453
474
  2 Adelie Torgersen female
454
- 3 Adelie Torgersen female
455
- 4 Adelie Torgersen (nil)
456
- 5 Adelie Torgersen female
475
+ 3 Adelie Torgersen (nil)
476
+ 4 Adelie Torgersen female
457
477
  : : : :
458
- 342 Gentoo Biscoe male
459
- 343 Gentoo Biscoe female
460
- 344 Gentoo Biscoe male
478
+ 341 Gentoo Biscoe male
479
+ 342 Gentoo Biscoe female
480
+ 343 Gentoo Biscoe male
461
481
  ```
462
482
 
463
483
  - Keys or booleans by a block
@@ -471,20 +491,20 @@ penguins.to_rover
471
491
  #<RedAmber::DataFrame : 344 x 3 Vectors, 0x000000000003dd4c>
472
492
  bill_length_mm bill_depth_mm flipper_length_mm
473
493
  <double> <double> <uint8>
474
- 1 39.1 18.7 181
475
- 2 39.5 17.4 186
476
- 3 40.3 18.0 195
477
- 4 (nil) (nil) (nil)
478
- 5 36.7 19.3 193
494
+ 0 39.1 18.7 181
495
+ 1 39.5 17.4 186
496
+ 2 40.3 18.0 195
497
+ 3 (nil) (nil) (nil)
498
+ 4 36.7 19.3 193
479
499
  : : : :
480
- 342 50.4 15.7 222
481
- 343 45.2 14.8 212
482
- 344 49.9 16.1 213
500
+ 341 50.4 15.7 222
501
+ 342 45.2 14.8 212
502
+ 343 49.9 16.1 213
483
503
  ```
484
504
 
485
- ### `drop ` - pick and drop -
505
+ ### `drop ` - counterpart of pick -
486
506
 
487
- Drop some columns (variables) to create a remainer DataFrame.
507
+ Drop some variables (columns) to create a remainer DataFrame.
488
508
 
489
509
  ![drop method image](doc/../image/dataframe/drop.png)
490
510
 
@@ -526,9 +546,9 @@ penguins.to_rover
526
546
  #<RedAmber::DataFrame : 3 x 1 Vector, 0x000000000003f4bc>
527
547
  a
528
548
  <uint8>
529
- 1 1
530
- 2 2
531
- 3 3
549
+ 0 1
550
+ 1 2
551
+ 2 3
532
552
 
533
553
  df[:a]
534
554
 
@@ -548,9 +568,9 @@ penguins.to_rover
548
568
  [1, 2, 3]
549
569
  ```
550
570
 
551
- ### `slice ` - to cut vertically is slice -
571
+ ### `slice ` - slice and select records -
552
572
 
553
- Slice and select rows (observations) to create a sub DataFrame.
573
+ Slice and select records (rows) to create a sub DataFrame.
554
574
 
555
575
  ![slice method image](doc/../image/dataframe/slice.png)
556
576
 
@@ -561,22 +581,22 @@ penguins.to_rover
561
581
  Negative index from the tail like Ruby's Array is also acceptable.
562
582
 
563
583
  ```ruby
564
- # returns 5 obs. at start and 5 obs. from end
584
+ # returns 5 records at start and 5 records from end
565
585
  penguins.slice(0...5, -5..-1)
566
586
 
567
587
  # =>
568
588
  #<RedAmber::DataFrame : 10 x 8 Vectors, 0x0000000000042be4>
569
- species island bill_length_mm bill_depth_mm flipper_length_mm ... year
570
- <string> <string> <double> <double> <uint8> ... <uint16>
571
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
572
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
573
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
574
- 4 Adelie Torgersen (nil) (nil) (nil) ... 2007
575
- 5 Adelie Torgersen 36.7 19.3 193 ... 2007
576
- : : : : : : ... :
577
- 8 Gentoo Biscoe 50.4 15.7 222 ... 2009
578
- 9 Gentoo Biscoe 45.2 14.8 212 ... 2009
579
- 10 Gentoo Biscoe 49.9 16.1 213 ... 2009
589
+ species island bill_length_mm bill_depth_mm flipper_length_mm ... year
590
+ <string> <string> <double> <double> <uint8> ... <uint16>
591
+ 0 Adelie Torgersen 39.1 18.7 181 ... 2007
592
+ 1 Adelie Torgersen 39.5 17.4 186 ... 2007
593
+ 2 Adelie Torgersen 40.3 18.0 195 ... 2007
594
+ 3 Adelie Torgersen (nil) (nil) (nil) ... 2007
595
+ 4 Adelie Torgersen 36.7 19.3 193 ... 2007
596
+ : : : : : : ... :
597
+ 7 Gentoo Biscoe 50.4 15.7 222 ... 2009
598
+ 8 Gentoo Biscoe 45.2 14.8 212 ... 2009
599
+ 9 Gentoo Biscoe 49.9 16.1 213 ... 2009
580
600
  ```
581
601
 
582
602
  - Booleans as an argument
@@ -591,15 +611,15 @@ penguins.to_rover
591
611
  #<RedAmber::DataFrame : 242 x 8 Vectors, 0x0000000000043d3c>
592
612
  species island bill_length_mm bill_depth_mm flipper_length_mm ... year
593
613
  <string> <string> <double> <double> <uint8> ... <uint16>
594
- 1 Adelie Torgersen 40.3 18.0 195 ... 2007
595
- 2 Adelie Torgersen 42.0 20.2 190 ... 2007
596
- 3 Adelie Torgersen 41.1 17.6 182 ... 2007
597
- 4 Adelie Torgersen 42.5 20.7 197 ... 2007
598
- 5 Adelie Torgersen 46.0 21.5 194 ... 2007
614
+ 0 Adelie Torgersen 40.3 18.0 195 ... 2007
615
+ 1 Adelie Torgersen 42.0 20.2 190 ... 2007
616
+ 2 Adelie Torgersen 41.1 17.6 182 ... 2007
617
+ 3 Adelie Torgersen 42.5 20.7 197 ... 2007
618
+ 4 Adelie Torgersen 46.0 21.5 194 ... 2007
599
619
  : : : : : : ... :
600
- 240 Gentoo Biscoe 50.4 15.7 222 ... 2009
601
- 241 Gentoo Biscoe 45.2 14.8 212 ... 2009
602
- 242 Gentoo Biscoe 49.9 16.1 213 ... 2009
620
+ 239 Gentoo Biscoe 50.4 15.7 222 ... 2009
621
+ 240 Gentoo Biscoe 45.2 14.8 212 ... 2009
622
+ 241 Gentoo Biscoe 49.9 16.1 213 ... 2009
603
623
  ```
604
624
 
605
625
  - Indices or booleans by a block
@@ -619,15 +639,15 @@ penguins.to_rover
619
639
  #<RedAmber::DataFrame : 204 x 8 Vectors, 0x0000000000047a40>
620
640
  species island bill_length_mm bill_depth_mm flipper_length_mm ... year
621
641
  <string> <string> <double> <double> <uint8> ... <uint16>
622
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
623
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
624
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
625
- 4 Adelie Torgersen 39.3 20.6 190 ... 2007
626
- 5 Adelie Torgersen 38.9 17.8 181 ... 2007
642
+ 0 Adelie Torgersen 39.1 18.7 181 ... 2007
643
+ 1 Adelie Torgersen 39.5 17.4 186 ... 2007
644
+ 2 Adelie Torgersen 40.3 18.0 195 ... 2007
645
+ 3 Adelie Torgersen 39.3 20.6 190 ... 2007
646
+ 4 Adelie Torgersen 38.9 17.8 181 ... 2007
627
647
  : : : : : : ... :
628
- 202 Gentoo Biscoe 47.2 13.7 214 ... 2009
629
- 203 Gentoo Biscoe 46.8 14.3 215 ... 2009
630
- 204 Gentoo Biscoe 45.2 14.8 212 ... 2009
648
+ 201 Gentoo Biscoe 47.2 13.7 214 ... 2009
649
+ 202 Gentoo Biscoe 46.8 14.3 215 ... 2009
650
+ 203 Gentoo Biscoe 45.2 14.8 212 ... 2009
631
651
  ```
632
652
 
633
653
  - Notice: nil option
@@ -656,9 +676,9 @@ penguins.to_rover
656
676
  0 1 A 1.000000
657
677
  ```
658
678
 
659
- ### `remove`
679
+ ### `remove` - counterpart of slice -
660
680
 
661
- Slice and reject rows (observations) to create a remainer DataFrame.
681
+ Slice and reject records (rows) to create a remainer DataFrame.
662
682
 
663
683
  ![remove method image](doc/../image/dataframe/remove.png)
664
684
 
@@ -667,22 +687,22 @@ penguins.to_rover
667
687
  `remove(indeces)` accepts indeces as arguments. Indeces should be an Integer or a Range of Integer.
668
688
 
669
689
  ```ruby
670
- # returns 6th to 339th obs.
690
+ # returns 6th to 339th records
671
691
  penguins.remove(0...5, -5..-1)
672
692
 
673
693
  # =>
674
694
  #<RedAmber::DataFrame : 334 x 8 Vectors, 0x00000000000487c4>
675
695
  species island bill_length_mm bill_depth_mm flipper_length_mm ... year
676
696
  <string> <string> <double> <double> <uint8> ... <uint16>
677
- 1 Adelie Torgersen 39.3 20.6 190 ... 2007
678
- 2 Adelie Torgersen 38.9 17.8 181 ... 2007
679
- 3 Adelie Torgersen 39.2 19.6 195 ... 2007
680
- 4 Adelie Torgersen 34.1 18.1 193 ... 2007
681
- 5 Adelie Torgersen 42.0 20.2 190 ... 2007
697
+ 0 Adelie Torgersen 39.3 20.6 190 ... 2007
698
+ 1 Adelie Torgersen 38.9 17.8 181 ... 2007
699
+ 2 Adelie Torgersen 39.2 19.6 195 ... 2007
700
+ 3 Adelie Torgersen 34.1 18.1 193 ... 2007
701
+ 4 Adelie Torgersen 42.0 20.2 190 ... 2007
682
702
  : : : : : : ... :
683
- 332 Gentoo Biscoe 44.5 15.7 217 ... 2009
684
- 333 Gentoo Biscoe 48.8 16.2 222 ... 2009
685
- 334 Gentoo Biscoe 47.2 13.7 214 ... 2009
703
+ 331 Gentoo Biscoe 44.5 15.7 217 ... 2009
704
+ 332 Gentoo Biscoe 48.8 16.2 222 ... 2009
705
+ 333 Gentoo Biscoe 47.2 13.7 214 ... 2009
686
706
  ```
687
707
 
688
708
  - Booleans as an argument
@@ -690,7 +710,7 @@ penguins.to_rover
690
710
  `remove(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
691
711
 
692
712
  ```ruby
693
- # remove all observation contains nil
713
+ # remove all records contains nil
694
714
  removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
695
715
  removed
696
716
 
@@ -698,15 +718,15 @@ penguins.to_rover
698
718
  #<RedAmber::DataFrame : 333 x 8 Vectors, 0x0000000000049fac>
699
719
  species island bill_length_mm bill_depth_mm flipper_length_mm ... year
700
720
  <string> <string> <double> <double> <uint8> ... <uint16>
701
- 1 Adelie Torgersen 39.1 18.7 181 ... 2007
702
- 2 Adelie Torgersen 39.5 17.4 186 ... 2007
703
- 3 Adelie Torgersen 40.3 18.0 195 ... 2007
704
- 4 Adelie Torgersen 36.7 19.3 193 ... 2007
705
- 5 Adelie Torgersen 39.3 20.6 190 ... 2007
721
+ 0 Adelie Torgersen 39.1 18.7 181 ... 2007
722
+ 1 Adelie Torgersen 39.5 17.4 186 ... 2007
723
+ 2 Adelie Torgersen 40.3 18.0 195 ... 2007
724
+ 3 Adelie Torgersen 36.7 19.3 193 ... 2007
725
+ 4 Adelie Torgersen 39.3 20.6 190 ... 2007
706
726
  : : : : : : ... :
707
- 331 Gentoo Biscoe 50.4 15.7 222 ... 2009
708
- 332 Gentoo Biscoe 45.2 14.8 212 ... 2009
709
- 333 Gentoo Biscoe 49.9 16.1 213 ... 2009
727
+ 330 Gentoo Biscoe 50.4 15.7 222 ... 2009
728
+ 331 Gentoo Biscoe 45.2 14.8 212 ... 2009
729
+ 332 Gentoo Biscoe 49.9 16.1 213 ... 2009
710
730
  ```
711
731
 
712
732
  - Indices or booleans by a block
@@ -727,15 +747,15 @@ penguins.to_rover
727
747
  #<RedAmber::DataFrame : 140 x 8 Vectors, 0x000000000004de40>
728
748
  species island bill_length_mm bill_depth_mm flipper_length_mm ... year
729
749
  <string> <string> <double> <double> <uint8> ... <uint16>
730
- 1 Adelie Torgersen (nil) (nil) (nil) ... 2007
731
- 2 Adelie Torgersen 36.7 19.3 193 ... 2007
732
- 3 Adelie Torgersen 34.1 18.1 193 ... 2007
733
- 4 Adelie Torgersen 37.8 17.1 186 ... 2007
734
- 5 Adelie Torgersen 37.8 17.3 180 ... 2007
750
+ 0 Adelie Torgersen (nil) (nil) (nil) ... 2007
751
+ 1 Adelie Torgersen 36.7 19.3 193 ... 2007
752
+ 2 Adelie Torgersen 34.1 18.1 193 ... 2007
753
+ 3 Adelie Torgersen 37.8 17.1 186 ... 2007
754
+ 4 Adelie Torgersen 37.8 17.3 180 ... 2007
735
755
  : : : : : : ... :
736
- 138 Gentoo Biscoe (nil) (nil) (nil) ... 2009
737
- 139 Gentoo Biscoe 50.4 15.7 222 ... 2009
738
- 140 Gentoo Biscoe 49.9 16.1 213 ... 2009
756
+ 137 Gentoo Biscoe (nil) (nil) (nil) ... 2009
757
+ 138 Gentoo Biscoe 50.4 15.7 222 ... 2009
758
+ 139 Gentoo Biscoe 49.9 16.1 213 ... 2009
739
759
  ```
740
760
 
741
761
  - Notice for nil
@@ -770,13 +790,13 @@ penguins.to_rover
770
790
  #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000005df98>
771
791
  a b c
772
792
  <uint8> <string> <double>
773
- 1 1 A 1.0
774
- 2 (nil) C 3.0
793
+ 0 1 A 1.0
794
+ 1 (nil) C 3.0
775
795
  ```
776
796
 
777
797
  ### `rename`
778
798
 
779
- Rename keys (column names) to create a updated DataFrame.
799
+ Rename keys (variable/column names) to create a updated DataFrame.
780
800
 
781
801
  ![rename method image](doc/../image/dataframe/rename.png)
782
802
 
@@ -792,9 +812,9 @@ penguins.to_rover
792
812
  #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000060838>
793
813
  name age_in_1993
794
814
  <string> <uint8>
795
- 1 Yasuko 68
796
- 2 Rui 49
797
- 3 Hinata 28
815
+ 0 Yasuko 68
816
+ 1 Rui 49
817
+ 2 Hinata 28
798
818
  ```
799
819
 
800
820
  - Key pairs by a block
@@ -811,7 +831,7 @@ penguins.to_rover
811
831
 
812
832
  ### `assign`
813
833
 
814
- Assign new or updated columns (variables) and create a updated DataFrame.
834
+ Assign new or updated variables (columns) and create an updated DataFrame.
815
835
 
816
836
  - Variables with new keys will append new columns from the right.
817
837
  - Variables with exisiting keys will update corresponding vectors.
@@ -832,9 +852,9 @@ penguins.to_rover
832
852
  #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000062804>
833
853
  name age
834
854
  <string> <uint8>
835
- 1 Yasuko 68
836
- 2 Rui 49
837
- 3 Hinata 28
855
+ 0 Yasuko 68
856
+ 1 Rui 49
857
+ 2 Hinata 28
838
858
 
839
859
  # update :age and add :brother
840
860
  df.assign do
@@ -848,9 +868,9 @@ penguins.to_rover
848
868
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x00000000000658b0>
849
869
  name age brother
850
870
  <string> <uint8> <string>
851
- 1 Yasuko 97 Santa
852
- 2 Rui 78 (nil)
853
- 3 Hinata 57 Momotaro
871
+ 0 Yasuko 97 Santa
872
+ 1 Rui 78 (nil)
873
+ 2 Hinata 57 Momotaro
854
874
  ```
855
875
 
856
876
  - Key pairs by a block
@@ -869,11 +889,11 @@ penguins.to_rover
869
889
  #<RedAmber::DataFrame : 5 x 3 Vectors, 0x0000000000069e60>
870
890
  index float string
871
891
  <uint8> <double> <string>
872
- 1 0 0.0 A
873
- 2 1 1.1 B
874
- 3 2 2.2 C
875
- 4 3 NaN D
876
- 5 (nil) (nil) (nil)
892
+ 0 0 0.0 A
893
+ 1 1 1.1 B
894
+ 2 2 2.2 C
895
+ 3 3 NaN D
896
+ 4 (nil) (nil) (nil)
877
897
 
878
898
  # update :float
879
899
  # assigner by an Array
@@ -886,11 +906,11 @@ penguins.to_rover
886
906
  #<RedAmber::DataFrame : 5 x 3 Vectors, 0x00000000000dfffc>
887
907
  index float string
888
908
  <uint8> <double> <string>
889
- 1 0 -0.0 A
890
- 2 1 -1.1 B
891
- 3 2 -2.2 C
892
- 4 3 NaN D
893
- 5 (nil) (nil) (nil)
909
+ 0 0 -0.0 A
910
+ 1 1 -1.1 B
911
+ 2 2 -2.2 C
912
+ 3 3 NaN D
913
+ 4 (nil) (nil) (nil)
894
914
 
895
915
  # Or we can use assigner by a Hash
896
916
  df.assign do
@@ -921,11 +941,11 @@ penguins.to_rover
921
941
  #<RedAmber::DataFrame : 5 x 4 Vectors, 0x000000000001787c>
922
942
  new_index index float string
923
943
  <uint8> <uint8> <double> <string>
924
- 1 1 0 0.0 A
925
- 2 2 1 1.1 B
926
- 3 3 2 2.2 C
927
- 4 4 3 NaN D
928
- 5 5 (nil) (nil) (nil)
944
+ 0 1 0 0.0 A
945
+ 1 2 1 1.1 B
946
+ 2 3 2 2.2 C
947
+ 3 4 3 NaN D
948
+ 4 5 (nil) (nil) (nil)
929
949
  ```
930
950
 
931
951
  ### `slice_by(key, keep_key: false) { block }`
@@ -946,11 +966,11 @@ penguins.to_rover
946
966
  #<RedAmber::DataFrame : 5 x 3 Vectors, 0x0000000000069e60>
947
967
  index float string
948
968
  <uint8> <double> <string>
949
- 1 0 0.0 A
950
- 2 1 1.1 B
951
- 3 2 2.2 C
952
- 4 3 NaN D
953
- 5 (nil) (nil) (nil)
969
+ 0 0 0.0 A
970
+ 1 1 1.1 B
971
+ 2 2 2.2 C
972
+ 3 3 NaN D
973
+ 4 (nil) (nil) (nil)
954
974
 
955
975
  df.slice_by(:string) { ["A", "C"] }
956
976
 
@@ -958,8 +978,8 @@ penguins.to_rover
958
978
  #<RedAmber::DataFrame : 2 x 2 Vectors, 0x000000000001b1ac>
959
979
  index float
960
980
  <uint8> <double>
961
- 1 0 0.0
962
- 2 2 2.2
981
+ 0 0 0.0
982
+ 1 2 2.2
963
983
  ```
964
984
 
965
985
  It is the same behavior as;
@@ -977,9 +997,9 @@ It is the same behavior as;
977
997
  #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000069668>
978
998
  index float
979
999
  <uint8> <double>
980
- 1 0 0.0
981
- 2 1 1.1
982
- 3 2 2.2
1000
+ 0 0 0.0
1001
+ 1 1 1.1
1002
+ 2 2 2.2
983
1003
  ```
984
1004
 
985
1005
  When the option `keep_key: true` used, the column `key` will be preserved.
@@ -991,16 +1011,16 @@ When the option `keep_key: true` used, the column `key` will be preserved.
991
1011
  #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000073c44>
992
1012
  index float string
993
1013
  <uint8> <double> <string>
994
- 1 0 0.0 A
995
- 2 1 1.1 B
996
- 3 2 2.2 C
1014
+ 0 0 0.0 A
1015
+ 1 1 1.1 B
1016
+ 2 2 2.2 C
997
1017
  ```
998
1018
 
999
1019
  ## Updating
1000
1020
 
1001
1021
  ### `sort`
1002
1022
 
1003
- `sort` accepts parameters as sort_keys thanks to the amazing Red Arrow feature。
1023
+ `sort` accepts parameters as sort_keys thanks to the Red Arrow's feature。
1004
1024
  - :key, "key" or "+key" denotes ascending order
1005
1025
  - "-key" denotes descending order
1006
1026
 
@@ -1016,11 +1036,11 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1016
1036
  #<RedAmber::DataFrame : 5 x 3 Vectors, 0x000000000009b03c>
1017
1037
  index string bool
1018
1038
  <uint8> <string> <boolean>
1019
- 1 0 (nil) false
1020
- 2 0 B false
1021
- 3 1 B true
1022
- 4 1 C (nil)
1023
- 5 (nil) A true
1039
+ 0 0 (nil) false
1040
+ 1 0 B false
1041
+ 2 1 B true
1042
+ 3 1 C (nil)
1043
+ 4 (nil) A true
1024
1044
  ```
1025
1045
 
1026
1046
  - [ ] Clamp
@@ -1031,13 +1051,13 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1031
1051
 
1032
1052
  ### `remove_nil`
1033
1053
 
1034
- Remove any observations containing nil.
1054
+ Remove any records containing nil.
1035
1055
 
1036
1056
  ## Grouping
1037
1057
 
1038
1058
  ### `group(group_keys)`
1039
1059
 
1040
- `group` creates a class `Group` object. `Group` accepts functions below as a method.
1060
+ `group` creates a instance of class `Group`. `Group` accepts functions below as a method.
1041
1061
  Method accepts options as `group_keys`.
1042
1062
 
1043
1063
  Available functions are:
@@ -1064,23 +1084,22 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1064
1084
  This is an example of grouping of famous STARWARS dataset.
1065
1085
 
1066
1086
  ```ruby
1067
- starwars =
1068
- RedAmber::DataFrame.load(URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv"))
1069
- starwars
1087
+ uri = URI("https://vincentarelbundock.github.io/Rdatasets/csv/dplyr/starwars.csv")
1088
+ starwars = RedAmber::DataFrame.load(uri)
1070
1089
 
1071
1090
  # =>
1072
1091
  #<RedAmber::DataFrame : 87 x 12 Vectors, 0x0000000000005a50>
1073
1092
  unnamed1 name height mass hair_color skin_color eye_color ... species
1074
1093
  <int64> <string> <int64> <double> <string> <string> <string> ... <string>
1075
- 1 1 Luke Skywalker 172 77.0 blond fair blue ... Human
1076
- 2 2 C-3PO 167 75.0 NA gold yellow ... Droid
1077
- 3 3 R2-D2 96 32.0 NA white, blue red ... Droid
1078
- 4 4 Darth Vader 202 136.0 none white yellow ... Human
1079
- 5 5 Leia Organa 150 49.0 brown light brown ... Human
1094
+ 0 1 Luke Skywalker 172 77.0 blond fair blue ... Human
1095
+ 1 2 C-3PO 167 75.0 NA gold yellow ... Droid
1096
+ 2 3 R2-D2 96 32.0 NA white, blue red ... Droid
1097
+ 3 4 Darth Vader 202 136.0 none white yellow ... Human
1098
+ 4 5 Leia Organa 150 49.0 brown light brown ... Human
1080
1099
  : : : : : : : : ... :
1081
- 85 85 BB8 (nil) (nil) none none black ... Droid
1082
- 86 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
1083
- 87 87 Padmé Amidala 165 45.0 brown light brown ... Human
1100
+ 84 85 BB8 (nil) (nil) none none black ... Droid
1101
+ 85 86 Captain Phasma (nil) (nil) unknown unknown unknown ... NA
1102
+ 86 87 Padmé Amidala 165 45.0 brown light brown ... Human
1084
1103
 
1085
1104
  starwars.tdr(12)
1086
1105
 
@@ -1088,58 +1107,60 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1088
1107
  RedAmber::DataFrame : 87 x 12 Vectors
1089
1108
  Vectors : 4 numeric, 8 strings
1090
1109
  # key type level data_preview
1091
- 1 :unnamed1 int64 87 [1, 2, 3, 4, 5, ... ]
1092
- 2 :name string 87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
1093
- 3 :height int64 46 [172, 167, 96, 202, 150, ... ], 6 nils
1094
- 4 :mass double 39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
1095
- 5 :hair_color string 13 ["blond", "NA", "NA", "none", "brown", ... ]
1096
- 6 :skin_color string 31 ["fair", "gold", "white, blue", "white", "light", ... ]
1097
- 7 :eye_color string 15 ["blue", "yellow", "red", "yellow", "brown", ... ]
1098
- 8 :birth_year double 37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
1099
- 9 :sex string 5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, "NA"=>4}
1100
- 10 :gender string 3 {"masculine"=>66, "feminine"=>17, "NA"=>4}
1101
- 11 :homeworld string 49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ]
1102
- 12 :species string 38 ["Human", "Droid", "Droid", "Human", "Human", ... ]
1110
+ 0 :unnamed1 int64 87 [1, 2, 3, 4, 5, ... ]
1111
+ 1 :name string 87 ["Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa", ... ]
1112
+ 2 :height int64 46 [172, 167, 96, 202, 150, ... ], 6 nils
1113
+ 3 :mass double 39 [77.0, 75.0, 32.0, 136.0, 49.0, ... ], 28 nils
1114
+ 4 :hair_color string 13 ["blond", "NA", "NA", "none", "brown", ... ]
1115
+ 5 :skin_color string 31 ["fair", "gold", "white, blue", "white", "light", ... ]
1116
+ 6 :eye_color string 15 ["blue", "yellow", "red", "yellow", "brown", ... ]
1117
+ 7 :birth_year double 37 [19.0, 112.0, 33.0, 41.9, 19.0, ... ], 44 nils
1118
+ 8 :sex string 5 {"male"=>60, "none"=>6, "female"=>16, "hermaphroditic"=>1, "NA"=>4}
1119
+ 9 :gender string 3 {"masculine"=>66, "feminine"=>17, "NA"=>4}
1120
+ 10 :homeworld string 49 ["Tatooine", "Tatooine", "Naboo", "Tatooine", "Alderaan", ... ]
1121
+ 11 :species string 38 ["Human", "Droid", "Droid", "Human", "Human", ... ]
1103
1122
  ```
1104
1123
 
1105
1124
  We can group by `:species` and calculate the count.
1106
1125
 
1107
1126
  ```ruby
1108
- starwars.group(:species).count(:species)
1127
+ starwars.remove { species == "NA" }
1128
+ .group(:species).count(:species)
1109
1129
 
1110
1130
  # =>
1111
- #<RedAmber::DataFrame : 38 x 2 Vectors, 0x000000000001d6f0>
1131
+ #<RedAmber::DataFrame : 37 x 2 Vectors, 0x000000000000ffa0>
1112
1132
  species count
1113
1133
  <string> <int64>
1114
- 1 Human 35
1115
- 2 Droid 6
1116
- 3 Wookiee 2
1117
- 4 Rodian 1
1118
- 5 Hutt 1
1134
+ 0 Human 35
1135
+ 1 Droid 6
1136
+ 2 Wookiee 2
1137
+ 3 Rodian 1
1138
+ 4 Hutt 1
1119
1139
  : : :
1120
- 36 Kaleesh 1
1121
- 37 Pau'an 1
1122
- 38 Kel Dor 1
1140
+ 34 Kaleesh 1
1141
+ 35 Pau'an 1
1142
+ 36 Kel Dor 1
1123
1143
  ```
1124
1144
 
1125
1145
  We can also calculate the mean of `:mass` and `:height` together.
1126
1146
 
1127
1147
  ```ruby
1128
- grouped = starwars.group(:species) { [count(:species), mean(:height, :mass)] }
1148
+ grouped = starwars.remove { species == "NA" }
1149
+ .group(:species) { [count(:species), mean(:height, :mass)] }
1129
1150
 
1130
1151
  # =>
1131
- #<RedAmber::DataFrame : 38 x 4 Vectors, 0x00000000000407cc>
1132
- specie s count mean(height) mean(mass)
1133
- <strin g> <int64> <double> <double>
1134
- 1 Human 35 176.6 82.8
1135
- 2 Droid 6 131.2 69.8
1136
- 3 Wookie e 2 231.0 124.0
1137
- 4 Rodian 1 173.0 74.0
1138
- 5 Hutt 1 175.0 1358.0
1139
- : : : : :
1140
- 36 Kalees h 1 216.0 159.0
1141
- 37 Pau'an 1 206.0 80.0
1142
- 38 Kel Dor 1 188.0 80.0
1152
+ #<RedAmber::DataFrame : 37 x 4 Vectors, 0x000000000000fff0>
1153
+ species count mean(height) mean(mass)
1154
+ <string> <int64> <double> <double>
1155
+ 0 Human 35 176.65 82.78
1156
+ 1 Droid 6 131.2 69.75
1157
+ 2 Wookiee 2 231.0 124.0
1158
+ 3 Rodian 1 173.0 74.0
1159
+ 4 Hutt 1 175.0 1358.0
1160
+ : : : : :
1161
+ 34 Kaleesh 1 216.0 159.0
1162
+ 35 Pau'an 1 206.0 80.0
1163
+ 36 Kel Dor 1 188.0 80.0
1143
1164
  ```
1144
1165
 
1145
1166
  Select rows for count > 1.
@@ -1148,22 +1169,23 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1148
1169
  grouped.slice(grouped[:count] > 1)
1149
1170
 
1150
1171
  # =>
1151
- #<RedAmber::DataFrame : 9 x 4 Vectors, 0x000000000004c270>
1172
+ #<RedAmber::DataFrame : 8 x 4 Vectors, 0x000000000001002c>
1152
1173
  species count mean(height) mean(mass)
1153
1174
  <string> <int64> <double> <double>
1154
- 1 Human 35 176.6 82.8
1155
- 2 Droid 6 131.2 69.8
1156
- 3 Wookiee 2 231.0 124.0
1157
- 4 Gungan 3 208.7 74.0
1158
- 5 NA 4 181.3 48.0
1159
- : : : : :
1160
- 7 Twi'lek 2 179.0 55.0
1161
- 8 Mirialan 2 168.0 53.1
1162
- 9 Kaminoan 2 221.0 88.0
1175
+ 0 Human 35 176.65 82.78
1176
+ 1 Droid 6 131.2 69.75
1177
+ 2 Wookiee 2 231.0 124.0
1178
+ 3 Gungan 3 208.67 74.0
1179
+ 4 Zabrak 2 173.0 80.0
1180
+ 5 Twi'lek 2 179.0 55.0
1181
+ 6 Mirialan 2 168.0 53.1
1182
+ 7 Kaminoan 2 221.0 88.0
1163
1183
  ```
1164
1184
 
1165
1185
  ## Reshape
1166
1186
 
1187
+ ![dataframe reshapeing image](doc/../image/reshaping_dataframe.png)
1188
+
1167
1189
  ### `transpose`
1168
1190
 
1169
1191
  Creates transposed DataFrame for the wide (messy) dataframe.
@@ -1175,30 +1197,31 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1175
1197
  #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000d520>
1176
1198
  Year Audi BMW BMW_MINI Mercedes-Benz VW
1177
1199
  <int64> <int64> <int64> <int64> <int64> <int64>
1178
- 1 2017 28336 52527 25427 68221 49040
1179
- 2 2018 26473 50982 25984 67554 51961
1180
- 3 2019 24222 46814 23813 66553 46794
1181
- 4 2020 22304 35712 20196 57041 36576
1182
- 5 2021 22535 35905 18211 51722 35215
1183
- import_cars.transpose(:Manufacturer)
1200
+ 0 2017 28336 52527 25427 68221 49040
1201
+ 1 2018 26473 50982 25984 67554 51961
1202
+ 2 2019 24222 46814 23813 66553 46794
1203
+ 3 2020 22304 35712 20196 57041 36576
1204
+ 4 2021 22535 35905 18211 51722 35215
1205
+
1206
+ import_cars.transpose(name: :Manufacturer)
1184
1207
 
1185
1208
  # =>
1186
- #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000ef74>
1209
+ #<RedAmber::DataFrame : 5 x 6 Vectors, 0x0000000000010a2c>
1187
1210
  Manufacturer 2017 2018 2019 2020 2021
1188
- <dictionary> <uint32> <uint32> <uint32> <uint16> <uint16>
1189
- 1 Audi 28336 26473 24222 22304 22535
1190
- 2 BMW 52527 50982 46814 35712 35905
1191
- 3 BMW_MINI 25427 25984 23813 20196 18211
1192
- 4 Mercedes-Benz 68221 67554 66553 57041 51722
1193
- 5 VW 49040 51961 46794 36576 35215
1211
+ <string> <uint32> <uint32> <uint32> <uint16> <uint16>
1212
+ 0 Audi 28336 26473 24222 22304 22535
1213
+ 1 BMW 52527 50982 46814 35712 35905
1214
+ 2 BMW_MINI 25427 25984 23813 20196 18211
1215
+ 3 Mercedes-Benz 68221 67554 66553 57041 51722
1216
+ 4 VW 49040 51961 46794 36576 35215
1194
1217
  ```
1195
1218
 
1196
1219
  The leftmost column is created by original keys. Key name of the column is
1197
- named by parameter `:name`. If `:name` is not specified, `:N` is used for the key.
1220
+ named by parameter `:name`. If `:name` is not specified, `:NAME` is used for the key.
1198
1221
 
1199
1222
  ### `to_long(*keep_keys)`
1200
1223
 
1201
- Creates a 'long' (tidy) DataFrame from a 'wide' DataFrame.
1224
+ Creates a 'long' (may be tidy) DataFrame from a 'wide' DataFrame.
1202
1225
 
1203
1226
  - Parameter `keep_keys` specifies the key names to keep.
1204
1227
 
@@ -1206,47 +1229,51 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1206
1229
  import_cars.to_long(:Year)
1207
1230
 
1208
1231
  # =>
1209
- #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000012750>
1210
- Year N V
1211
- <uint16> <dictionary> <uint32>
1212
- 1 2017 Audi 28336
1213
- 2 2017 BMW 52527
1214
- 3 2017 BMW_MINI 25427
1215
- 4 2017 Mercedes-Benz 68221
1216
- 5 2017 VW 49040
1232
+ #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000011864>
1233
+ Year NAME VALUE
1234
+ <uint16> <string> <uint32>
1235
+ 0 2017 Audi 28336
1236
+ 1 2017 BMW 52527
1237
+ 2 2017 BMW_MINI 25427
1238
+ 3 2017 Mercedes-Benz 68221
1239
+ 4 2017 VW 49040
1217
1240
  : : : :
1218
- 23 2021 BMW_MINI 18211
1219
- 24 2021 Mercedes-Benz 51722
1220
- 25 2021 VW 35215
1241
+ 22 2021 BMW_MINI 18211
1242
+ 23 2021 Mercedes-Benz 51722
1243
+ 24 2021 VW 35215
1221
1244
  ```
1222
1245
 
1223
1246
  - Option `:name` is the key of the column which came **from key names**.
1247
+ The default value is `:NAME` if it is not specified.
1224
1248
  - Option `:value` is the key of the column which came **from values**.
1249
+ The default value is `:VALUE` if it is not specified.
1225
1250
 
1226
1251
  ```ruby
1227
1252
  import_cars.to_long(:Year, name: :Manufacturer, value: :Num_of_imported)
1228
1253
 
1229
1254
  # =>
1230
- #<RedAmber::DataFrame : 25 x 3 Vectors, 0x0000000000017700>
1255
+ #<RedAmber::DataFrame : 25 x 3 Vectors, 0x000000000001359c>
1231
1256
  Year Manufacturer Num_of_imported
1232
- <uint16> <dictionary> <uint32>
1233
- 1 2017 Audi 28336
1234
- 2 2017 BMW 52527
1235
- 3 2017 BMW_MINI 25427
1236
- 4 2017 Mercedes-Benz 68221
1237
- 5 2017 VW 49040
1257
+ <uint16> <string> <uint32>
1258
+ 0 2017 Audi 28336
1259
+ 1 2017 BMW 52527
1260
+ 2 2017 BMW_MINI 25427
1261
+ 3 2017 Mercedes-Benz 68221
1262
+ 4 2017 VW 49040
1238
1263
  : : : :
1239
- 23 2021 BMW_MINI 18211
1240
- 24 2021 Mercedes-Benz 51722
1241
- 25 2021 VW 35215
1264
+ 22 2021 BMW_MINI 18211
1265
+ 23 2021 Mercedes-Benz 51722
1266
+ 24 2021 VW 35215
1242
1267
  ```
1243
1268
 
1244
1269
  ### `to_wide`
1245
1270
 
1246
- Creates a 'wide' (messy) DataFrame from a 'long' DataFrame.
1271
+ Creates a 'wide' (may be messy) DataFrame from a 'long' DataFrame.
1247
1272
 
1248
1273
  - Option `:name` is the key of the column which will be expanded **to key names**.
1274
+ The default value is `:NAME` if it is not specified.
1249
1275
  - Option `:value` is the key of the column which will be expanded **to values**.
1276
+ The default value is `:VALUE` if it is not specified.
1250
1277
 
1251
1278
  ```ruby
1252
1279
  import_cars.to_long(:Year).to_wide
@@ -1257,20 +1284,286 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1257
1284
  #<RedAmber::DataFrame : 5 x 6 Vectors, 0x000000000000f0f0>
1258
1285
  Year Audi BMW BMW_MINI Mercedes-Benz VW
1259
1286
  <uint16> <uint16> <uint16> <uint16> <uint32> <uint16>
1260
- 1 2017 28336 52527 25427 68221 49040
1261
- 2 2018 26473 50982 25984 67554 51961
1262
- 3 2019 24222 46814 23813 66553 46794
1263
- 4 2020 22304 35712 20196 57041 36576
1264
- 5 2021 22535 35905 18211 51722 35215
1265
-
1266
- # == import_cars
1287
+ 0 2017 28336 52527 25427 68221 49040
1288
+ 1 2018 26473 50982 25984 67554 51961
1289
+ 2 2019 24222 46814 23813 66553 46794
1290
+ 3 2020 22304 35712 20196 57041 36576
1291
+ 4 2021 22535 35905 18211 51722 35215
1267
1292
  ```
1268
1293
 
1269
1294
  ## Combine
1270
1295
 
1271
- - [ ] Combining dataframes
1296
+ ### `join`
1297
+ ![dataframe joining image](doc/../image/dataframe/join.png)
1298
+
1299
+ You should use specific `*_join` methods below.
1300
+
1301
+ - `other` is a DataFrame or a Arrow::Table.
1302
+ - `join_keys` are keys shared by self and other to match with them.
1303
+ - If `join_keys` are empty, common keys in self and other are chosen (natural join).
1304
+ - If (common keys) > `join_keys`, duplicated keys are renamed by `suffix`.
1305
+
1306
+ ```ruby
1307
+ df = DataFrame.new(
1308
+ KEY: %w[A B C],
1309
+ X1: [1, 2, 3]
1310
+ )
1311
+ #=>
1312
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000012a70>
1313
+ KEY X1
1314
+ <string> <uint8>
1315
+ 0 A 1
1316
+ 1 B 2
1317
+ 2 C 3
1318
+
1319
+ other = DataFrame.new(
1320
+ KEY: %w[A B D],
1321
+ X2: [true, false, nil]
1322
+ )
1323
+ #=>
1324
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000017034>
1325
+ KEY X2
1326
+ <string> <boolean>
1327
+ 0 A true
1328
+ 1 B false
1329
+ 2 D (nil)
1330
+ ```
1331
+
1332
+ #### Mutating joins
1333
+
1334
+ ##### `inner_join(other, join_keys = nil, suffix: '.1')`
1335
+
1336
+ Join data, leaving only the matching records.
1337
+
1338
+ ```ruby
1339
+ df.inner_join(other, :KEY)
1340
+ #=>
1341
+ #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000001e2bc>
1342
+ KEY X1 X2
1343
+ <string> <uint8> <boolean>
1344
+ 0 A 1 true
1345
+ 1 B 2 false
1346
+ ```
1347
+
1348
+ ##### `full_join(other, join_keys = nil, suffix: '.1')`
1349
+
1350
+ Join data, leaving all records.
1351
+
1352
+ ```ruby
1353
+ df.full_join(other, :KEY)
1354
+ #=>
1355
+ #<RedAmber::DataFrame : 4 x 3 Vectors, 0x0000000000029fcc>
1356
+ KEY X1 X2
1357
+ <string> <uint8> <boolean>
1358
+ 0 A 1 true
1359
+ 1 B 2 false
1360
+ 2 C 3 (nil)
1361
+ 3 D (nil) (nil)
1362
+ ```
1272
1363
 
1273
- - [ ] Join
1364
+ ##### `left_join(other, join_keys = nil, suffix: '.1')`
1365
+
1366
+ Join matching values to self from other.
1367
+
1368
+ ```ruby
1369
+ df.left_join(other, :KEY)
1370
+ #=>
1371
+ #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000029fcc>
1372
+ KEY X1 X2
1373
+ <string> <uint8> <boolean>
1374
+ 0 A 1 true
1375
+ 1 B 2 false
1376
+ 2 C 3 (nil)
1377
+ ```
1378
+
1379
+ ##### `right_join(other, join_keys = nil, suffix: '.1')`
1380
+
1381
+ Join matching values from self to other.
1382
+
1383
+ ```ruby
1384
+ df.right_join(other, :KEY)
1385
+ #=>
1386
+ #<RedAmber::DataFrame : 2 x 3 Vectors, 0x0000000000029fcc>
1387
+ KEY X1 X2
1388
+ <string> <uint8> <boolean>
1389
+ 0 A 1 true
1390
+ 1 B 2 false
1391
+ 2 D (nil) (nil)
1392
+ ```
1393
+
1394
+ #### Filtering join
1395
+
1396
+ ##### `semi_join(other, join_keys = nil, suffix: '.1')`
1397
+
1398
+ Return records of self that have a match in other.
1399
+
1400
+ ```ruby
1401
+ df.semi_join(other, :KEY)
1402
+ #=>
1403
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000029fcc>
1404
+ KEY X1
1405
+ <string> <uint8>
1406
+ 0 A 1
1407
+ 1 B 2
1408
+ ```
1409
+
1410
+ ##### `anti_join(other, join_keys = nil, suffix: '.1')`
1411
+
1412
+ Return records of self that do not have a match in other.
1413
+
1414
+ ```ruby
1415
+ df.anti_join(other, :KEY)
1416
+ #=>
1417
+ #<RedAmber::DataFrame : 1 x 2 Vectors, 0x0000000000029fcc>
1418
+ KEY X1
1419
+ <string> <uint8>
1420
+ 0 C 3
1421
+ ```
1422
+
1423
+ ## Set operations
1424
+ ![dataframe set and binding image](doc/../image/dataframe/set_and_bind.png)
1425
+
1426
+ Keys in self and other must be same in set operations.
1427
+
1428
+ ```ruby
1429
+ df = DataFrame.new(
1430
+ KEY1: %w[A B C],
1431
+ KEY2: [1, 2, 3]
1432
+ )
1433
+ #=>
1434
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000012a70>
1435
+ KEY1 KEY2
1436
+ <string> <uint8>
1437
+ 0 A 1
1438
+ 1 B 2
1439
+ 2 C 3
1440
+
1441
+ other = DataFrame.new(
1442
+ KEY1: %w[A B D],
1443
+ KEY2: [1, 4, 5]
1444
+ )
1445
+ #=>
1446
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000017034>
1447
+ KEY1 KEY2
1448
+ <string> <uint8>
1449
+ 0 A 1
1450
+ 1 B 4
1451
+ 2 D 5
1452
+ ```
1453
+
1454
+ ##### `intersect(other)`
1455
+
1456
+ Select records appearing in both self and other.
1457
+
1458
+ ```ruby
1459
+ df.intersect(other)
1460
+ #=>
1461
+ #<RedAmber::DataFrame : 1 x 2 Vectors, 0x0000000000029fcc>
1462
+ KEY1 KEY2
1463
+ <string> <uint8>
1464
+ 0 A 1
1465
+ ```
1466
+
1467
+ ##### `union(other)`
1468
+
1469
+ Select records appearing in self or other.
1470
+
1471
+ ```ruby
1472
+ df.union(other)
1473
+ #=>
1474
+ #<RedAmber::DataFrame : 5 x 2 Vectors, 0x0000000000029fcc>
1475
+ KEY1 KEY2
1476
+ <string> <uint8>
1477
+ 0 A 1
1478
+ 1 B 2
1479
+ 2 C 3
1480
+ 3 B 4
1481
+ 4 D 5
1482
+ ```
1483
+
1484
+ ##### `difference(other)`
1485
+
1486
+ Select records appearing in self but not in other.
1487
+
1488
+ It has an alias `setdiff`.
1489
+
1490
+ ```ruby
1491
+ df.difference(other)
1492
+ #=>
1493
+ #<RedAmber::DataFrame : 1 x 2 Vectors, 0x0000000000029fcc>
1494
+ KEY1 KEY2
1495
+ <string> <uint8>
1496
+ 1 B 2
1497
+ 2 C 3
1498
+ ```
1499
+
1500
+ ## Binding
1501
+
1502
+ ### `concatenate(other)`
1503
+
1504
+ Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
1505
+
1506
+ The alias is `concat`.
1507
+
1508
+ An array of DataFrames or Tables is also acceptable as other.
1509
+
1510
+ ```ruby
1511
+ df
1512
+ #=>
1513
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000022cb8>
1514
+ x y
1515
+ <uint8> <string>
1516
+ 0 1 A
1517
+ 1 2 B
1518
+
1519
+ other
1520
+ #=>
1521
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x000000000001f6d0>
1522
+ x y
1523
+ <uint8> <string>
1524
+ 0 3 C
1525
+ 1 4 D
1526
+
1527
+ df.concatenate(other)
1528
+ #=>
1529
+ #<RedAmber::DataFrame : 4 x 2 Vectors, 0x0000000000022574>
1530
+ x y
1531
+ <uint8> <string>
1532
+ 0 1 A
1533
+ 1 2 B
1534
+ 2 3 C
1535
+ 3 4 D
1536
+ ```
1537
+
1538
+ ### `merge(other)`
1539
+
1540
+ Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
1541
+
1542
+ ```ruby
1543
+ df
1544
+ #=>
1545
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000009150>
1546
+ x y
1547
+ <uint8> <uint8>
1548
+ 0 1 3
1549
+ 1 2 4
1550
+
1551
+ other
1552
+ #=>
1553
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000008a0c>
1554
+ a b
1555
+ <string> <string>
1556
+ 0 A C
1557
+ 1 B D
1558
+
1559
+ df.merge(other)
1560
+ #=>
1561
+ #<RedAmber::DataFrame : 2 x 4 Vectors, 0x000000000000cb70>
1562
+ x y a b
1563
+ <uint8> <uint8> <string> <string>
1564
+ 0 1 3 A C
1565
+ 1 2 4 B D
1566
+ ```
1274
1567
 
1275
1568
  ## Encoding
1276
1569