red_amber 0.2.2 → 0.3.0

Files changed (41)
  1. checksums.yaml +4 -4
  2. data/.rubocop.yml +114 -39
  3. data/CHANGELOG.md +203 -31
  4. data/Gemfile +5 -2
  5. data/README.md +62 -29
  6. data/benchmark/basic.yml +86 -0
  7. data/benchmark/combine.yml +62 -0
  8. data/benchmark/dataframe.yml +62 -0
  9. data/benchmark/drop_nil.yml +15 -3
  10. data/benchmark/group.yml +39 -0
  11. data/benchmark/reshape.yml +31 -0
  12. data/benchmark/{csv_load_penguins.yml → rover/csv_load_penguins.yml} +3 -3
  13. data/benchmark/rover/flights.yml +23 -0
  14. data/benchmark/rover/penguins.yml +23 -0
  15. data/benchmark/rover/planes.yml +23 -0
  16. data/benchmark/rover/weather.yml +23 -0
  17. data/benchmark/vector.yml +60 -0
  18. data/doc/DataFrame.md +335 -53
  19. data/doc/Vector.md +91 -0
  20. data/doc/image/dataframe/join.png +0 -0
  21. data/doc/image/dataframe/set_and_bind.png +0 -0
  22. data/doc/image/dataframe_model.png +0 -0
  23. data/lib/red_amber/data_frame.rb +167 -51
  24. data/lib/red_amber/data_frame_combinable.rb +486 -0
  25. data/lib/red_amber/data_frame_displayable.rb +6 -4
  26. data/lib/red_amber/data_frame_indexable.rb +2 -2
  27. data/lib/red_amber/data_frame_loadsave.rb +4 -1
  28. data/lib/red_amber/data_frame_reshaping.rb +35 -10
  29. data/lib/red_amber/data_frame_selectable.rb +221 -116
  30. data/lib/red_amber/data_frame_variable_operation.rb +146 -82
  31. data/lib/red_amber/group.rb +108 -18
  32. data/lib/red_amber/helper.rb +53 -43
  33. data/lib/red_amber/refinements.rb +199 -0
  34. data/lib/red_amber/vector.rb +56 -46
  35. data/lib/red_amber/vector_functions.rb +23 -83
  36. data/lib/red_amber/vector_selectable.rb +116 -69
  37. data/lib/red_amber/vector_updatable.rb +189 -65
  38. data/lib/red_amber/version.rb +1 -1
  39. data/lib/red_amber.rb +3 -0
  40. data/red_amber.gemspec +4 -3
  41. metadata +24 -10
data/doc/DataFrame.md CHANGED
@@ -5,7 +5,8 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
5
5
  - A label is attached to `Vector`. We call it `key`.
6
6
  - A `Vector` and associated `key` is grouped as a `variable`.
7
7
  - `variable`s with same vector length are aligned and arranged to be a `DataFrame`.
8
- - Each `Vector` in a `DataFrame` contains a set of relating data at same position. We call it `observation`.
8
+ - Each `key` in a `DataFrame` must be unique.
9
+ - The `Vector`s in a `DataFrame` hold related data at the same position. We call such a set a `record` or `observation`.
9
10
 
10
11
  ![dataframe model image](doc/../image/dataframe_model.png)
11
12
 
@@ -94,13 +95,13 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
94
95
 
95
96
  ### `table`, `to_arrow`
96
97
 
97
- - Reader of Arrow::Table object inside.
98
+ - Returns Arrow::Table object in the DataFrame.
98
99
 
99
- ### `size`, `n_obs`, `n_rows`
100
+ ### `size`, `n_records`, `n_obs`, `n_rows`
100
101
 
101
- - Returns size of Vector (num of observations).
102
-
103
- ### `n_keys`, `n_vars`, `n_cols`,
102
+ - Returns size of Vector (num of records).
103
+
104
+ ### `n_keys`, `n_variables`, `n_vars`, `n_cols`,
104
105
 
105
106
  - Returns num of keys (num of variables).
106
107
 
@@ -138,16 +139,7 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
138
139
 
139
140
  - Returns key names in an Array.
140
141
 
141
- When we use it with vectors, Vector#key is useful to get the key inside of DataFrame.
142
-
143
- ```ruby
144
- # update numeric variables, another solution
145
- df.assign do
146
- vectors.each_with_object({}) do |vector, assigner|
147
- assigner[vector.key] = vector * -1 if vector.numeric?
148
- end
149
- end
150
- ```
142
+ Each key must be unique in the DataFrame.
151
143
 
152
144
  ### `types`
153
145
 
@@ -161,9 +153,20 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
161
153
 
162
154
  - Returns an Array of Vectors.
163
155
 
156
+ When we use it, Vector#key is useful to get the key in the DataFrame.
157
+
158
+ ```ruby
159
+ # update numeric variables, another solution
160
+ df.assign do
161
+ vectors.each_with_object({}) do |vector, assigner|
162
+ assigner[vector.key] = vector * -1 if vector.numeric?
163
+ end
164
+ end
165
+ ```
166
+
164
167
  ### `indices`, `indexes`
165
168
 
166
- - Returns indexes in an Array.
169
+ - Returns indexes in a Vector.
167
170
  Accepts an option `start` as the first of indexes.
168
171
 
169
172
  ```ruby
@@ -171,15 +174,19 @@ Class `RedAmber::DataFrame` represents 2D-data. A `DataFrame` consists with:
171
174
  df.indices
172
175
 
173
176
  # =>
177
+ #<RedAmber::Vector(:uint8, size=5):0x0000000000013ed4>
174
178
  [0, 1, 2, 3, 4]
175
179
 
176
180
  df.indices(1)
177
181
 
178
182
  # =>
183
+ #<RedAmber::Vector(:uint8, size=5):0x0000000000018fd8>
179
184
  [1, 2, 3, 4, 5]
180
185
 
181
186
  df.indices(:a)
187
+
182
188
  # =>
189
+ #<RedAmber::Vector(:dictionary, size=5):0x000000000001bd50>
183
190
  [:a, :b, :c, :d, :e]
184
191
  ```
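The three calls above follow one pattern: the optional `start` is the first index, and each later index is the successor of the previous one. A minimal plain-Ruby sketch of that pattern follows. It assumes nothing about RedAmber's internals; `indices_from` is a hypothetical helper, and it returns an Array rather than a Vector.

```ruby
# Hypothetical helper, not part of RedAmber: build `size` consecutive
# indexes beginning at `start` by calling #succ repeatedly.
def indices_from(start, size)
  result = []
  current = start
  size.times do
    result << current
    current = current.succ
  end
  result
end

p indices_from(0, 5)   # => [0, 1, 2, 3, 4]
p indices_from(1, 5)   # => [1, 2, 3, 4, 5]
p indices_from(:a, 5)  # => [:a, :b, :c, :d, :e]
```

Anything that responds to `succ` (Integer, String, Symbol) can serve as `start` in this sketch.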
185
192
 
@@ -275,6 +282,7 @@ penguins.to_rover
275
282
 
276
283
  dataset = Datasets::Penguins.new
277
284
  # (From 0.2.2) responds to any object that has a `to_arrow` method.
285
+ # With older versions, pass `dataset.to_arrow` in the parentheses.
278
286
  RedAmber::DataFrame.new(dataset).tdr
279
287
 
280
288
  # =>
@@ -290,10 +298,11 @@ penguins.to_rover
290
298
  6 :sex string 3 {"male"=>168, "female"=>165, nil=>11}
291
299
  7 :year uint16 3 {2007=>110, 2008=>114, 2009=>120}
292
300
  ```
293
-
301
+
302
+ Options:
294
303
  - limit: limit of variables to show. Default value is 10.
295
- - tally: max level to use tally mode.
296
- - elements: max num of element to show values in each observations.
304
+ - tally: max level to use tally mode. Default value is 5.
305
+ - elements: max num of element to show values in each records. Default value is 5.
297
306
 
298
307
  ## Selecting
299
308
 
@@ -303,13 +312,13 @@ penguins.to_rover
303
312
  - Keys in an Array: `df[:symbol1, "string", :symbol2]`
304
313
  - Keys by indices: `df[df.keys[0]]`, `df[df.keys[1,2]]`, `df[df.keys[1..]]`
305
314
 
306
- Key indeces can be used via `keys[i]` because numbers are used to select observations (rows).
315
+ Key indices should be used via `keys[i]` because numbers are used to select records (rows). See next section.
307
316
 
308
317
  - Keys by a Range:
309
318
 
310
- If keys are able to represent by Range, it can be included in the arguments. See a example below.
319
+ If keys can be represented by a Range, it can be included in the arguments. See an example below.
311
320
 
312
- - You can exchange the order of variables (columns).
321
+ - You can also exchange the order of variables (columns).
313
322
 
314
323
  ```ruby
315
324
  hash = {a: [1, 2, 3], b: %w[A B C], c: [1.0, 2, 3]}
@@ -325,7 +334,7 @@ penguins.to_rover
325
334
  2 C 3.0 3
326
335
  ```
327
336
 
328
- If `#[]` represents single variable (column), it returns a Vector object.
337
+ If `#[]` represents a single variable (column), it returns a Vector object.
329
338
 
330
339
  ```ruby
331
340
  df[:a]
@@ -334,6 +343,7 @@ penguins.to_rover
334
343
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f140>
335
344
  [1, 2, 3]
336
345
  ```
346
+
337
347
  Or `#v` method also returns a Vector for a key.
338
348
 
339
349
  ```ruby
@@ -344,18 +354,19 @@ penguins.to_rover
344
354
  [1, 2, 3]
345
355
  ```
346
356
 
347
- This may be useful to use in a block of DataFrame manipulation verbs. We can write `v(:a)` rather than `self[:a]` or `df[:a]`
357
+ This method may be useful in a block of DataFrame manipulation verbs. We can write `v(:a)` rather than `self[:a]` or `df[:a]`.
348
358
 
349
- ### Select observations (rows in a table) by `[]` as `[index]`, `[range]`, `[array]`
359
+ ### Select records (rows in a table) by `[]` as `[index]`, `[range]`, `[array]`
350
360
 
351
- - Select a obs. by index: `df[0]`
352
- - Select obs. by indeces in a Range: `df[1..2]`
361
+ - Select a record by index: `df[0]`
353
362
 
354
- An end-less or a begin-less Range can be used to represent indeces.
363
+ - Select records by indices in an Array: `df[1, 2]`
355
364
 
356
- - Select obs. by indeces in an Array: `df[1, 2]`
365
+ - Select records by indices in a Range: `df[1..2]`
357
366
 
358
- - You can use float indices.
367
+ An endless or a beginless Range can be used to represent indices.
368
+
369
+ - You can use indices in Float.
359
370
 
360
371
  - Mixed case: `df[2, 0..]`
361
372
 
@@ -374,9 +385,9 @@ penguins.to_rover
374
385
  3 3 C 3.0
375
386
  ```
376
387
 
377
- - Select obs. by a boolean Array or a boolean RedAmber::Vector at same size as self.
388
+ - Select records by a boolean Array or a boolean RedAmber::Vector of the same size as self.
378
389
 
379
- It returns a sub dataframe with observations at boolean is true.
390
+ It returns a sub dataframe with the records where the boolean is true.
380
391
 
381
392
  ```ruby
382
393
  # with the same dataframe `df` above
@@ -391,15 +402,15 @@ penguins.to_rover
391
402
  1 1 A 1.0
392
403
  ```
393
404
 
394
- ### Select rows from top or from bottom
405
+ ### Select records (rows) from top or from bottom
395
406
 
396
407
  `head(n=5)`, `tail(n=5)`, `first(n=1)`, `last(n=1)`
397
408
 
398
409
  ## Sub DataFrame manipulations
399
410
 
400
- ### `pick ` - pick up variables by key label -
411
+ ### `pick ` - pick up variables -
401
412
 
402
- Pick up some columns (variables) to create a sub DataFrame.
413
+ Pick up some variables (columns) to create a sub DataFrame.
403
414
 
404
415
  ![pick method image](doc/../image/dataframe/pick.png)
405
416
 
@@ -491,9 +502,9 @@ penguins.to_rover
491
502
  343 49.9 16.1 213
492
503
  ```
493
504
 
494
- ### `drop ` - pick and drop -
505
+ ### `drop ` - counterpart of pick -
495
506
 
496
- Drop some columns (variables) to create a remainer DataFrame.
507
+ Drop some variables (columns) to create a remainder DataFrame.
497
508
 
498
509
  ![drop method image](doc/../image/dataframe/drop.png)
499
510
 
@@ -557,9 +568,9 @@ penguins.to_rover
557
568
  [1, 2, 3]
558
569
  ```
559
570
 
560
- ### `slice ` - to cut vertically is slice -
571
+ ### `slice ` - slice and select records -
561
572
 
562
- Slice and select rows (observations) to create a sub DataFrame.
573
+ Slice and select records (rows) to create a sub DataFrame.
563
574
 
564
575
  ![slice method image](doc/../image/dataframe/slice.png)
565
576
 
@@ -570,7 +581,7 @@ penguins.to_rover
570
581
  Negative index from the tail like Ruby's Array is also acceptable.
571
582
 
572
583
  ```ruby
573
- # returns 5 obs. at start and 5 obs. from end
584
+ # returns 5 records at start and 5 records from end
574
585
  penguins.slice(0...5, -5..-1)
575
586
 
576
587
  # =>
@@ -665,9 +676,9 @@ penguins.to_rover
665
676
  0 1 A 1.000000
666
677
  ```
667
678
 
668
- ### `remove`
679
+ ### `remove` - counterpart of slice -
669
680
 
670
- Slice and reject rows (observations) to create a remainer DataFrame.
681
+ Slice and reject records (rows) to create a remainder DataFrame.
671
682
 
672
683
  ![remove method image](doc/../image/dataframe/remove.png)
673
684
 
@@ -676,7 +687,7 @@ penguins.to_rover
676
687
  `remove(indices)` accepts indices as arguments. Indices should be an Integer or a Range of Integer.
677
688
 
678
689
  ```ruby
679
- # returns 6th to 339th obs.
690
+ # returns 6th to 339th records
680
691
  penguins.remove(0...5, -5..-1)
681
692
 
682
693
  # =>
@@ -699,7 +710,7 @@ penguins.to_rover
699
710
  `remove(booleans)` accepts booleans as an argument in an Array, a Vector or an Arrow::BooleanArray . Booleans must be same length as `size`.
700
711
 
701
712
  ```ruby
702
- # remove all observation contains nil
713
+ # remove all records containing nil
703
714
  removed = penguins.remove { vectors.map(&:is_nil).reduce(&:|) }
704
715
  removed
705
716
 
@@ -785,7 +796,7 @@ penguins.to_rover
785
796
 
786
797
  ### `rename`
787
798
 
788
- Rename keys (column names) to create a updated DataFrame.
799
+ Rename keys (variable/column names) to create an updated DataFrame.
789
800
 
790
801
  ![rename method image](doc/../image/dataframe/rename.png)
791
802
 
@@ -820,7 +831,7 @@ penguins.to_rover
820
831
 
821
832
  ### `assign`
822
833
 
823
- Assign new or updated columns (variables) and create a updated DataFrame.
834
+ Assign new or updated variables (columns) and create an updated DataFrame.
824
835
 
825
836
  - Variables with new keys will append new columns from the right.
826
837
  - Variables with existing keys will update corresponding vectors.
@@ -1009,7 +1020,7 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1009
1020
 
1010
1021
  ### `sort`
1011
1022
 
1012
- `sort` accepts parameters as sort_keys thanks to the amazing Red Arrow feature。
1023
+ `sort` accepts parameters as sort_keys thanks to Red Arrow's feature.
1013
1024
  - :key, "key" or "+key" denotes ascending order
1014
1025
  - "-key" denotes descending order
1015
1026
 
@@ -1040,7 +1051,7 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1040
1051
 
1041
1052
  ### `remove_nil`
1042
1053
 
1043
- Remove any observations containing nil.
1054
+ Remove any records containing nil.
1044
1055
 
1045
1056
  ## Grouping
1046
1057
 
@@ -1210,7 +1221,7 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1210
1221
 
1211
1222
  ### `to_long(*keep_keys)`
1212
1223
 
1213
- Creates a 'long' (tidy) DataFrame from a 'wide' DataFrame.
1224
+ Creates a 'long' (may be tidy) DataFrame from a 'wide' DataFrame.
1214
1225
 
1215
1226
  - Parameter `keep_keys` specifies the key names to keep.
1216
1227
 
@@ -1257,7 +1268,7 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1257
1268
 
1258
1269
  ### `to_wide`
1259
1270
 
1260
- Creates a 'wide' (messy) DataFrame from a 'long' DataFrame.
1271
+ Creates a 'wide' (may be messy) DataFrame from a 'long' DataFrame.
1261
1272
 
1262
1273
  - Option `:name` is the key of the column which will be expanded **to key names**.
1263
1274
  The default value is `:NAME` if it is not specified.
@@ -1282,9 +1293,280 @@ When the option `keep_key: true` used, the column `key` will be preserved.
1282
1293
 
1283
1294
  ## Combine
1284
1295
 
1285
- - [ ] Combining dataframes
1296
+ ### `join`
1297
+ ![dataframe joining image](doc/../image/dataframe/join.png)
1298
+
1299
+ You should use specific `*_join` methods below.
1300
+
1301
+ - `other` is a DataFrame or an Arrow::Table.
1302
+ - `join_keys` are the keys shared by self and other on which records are matched.
1303
+ - If `join_keys` are empty, common keys in self and other are chosen (natural join).
1304
+ - If there are more common keys than `join_keys`, the duplicated keys are renamed with `suffix`.
1305
+ - If you want to match the columns with different names,
1306
+ use Hash for `join_keys` such as `{ left: :KEY1, right: :KEY2 }`.
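The key-selection rules in the list above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not RedAmber's code: `resolve_join_keys` is a hypothetical helper, and the choice to put the suffix on the keys of `other` (rather than on those of self) is an assumption the list does not confirm.

```ruby
# Hypothetical helper, not part of RedAmber: decide which keys to join on
# and which shared keys would need renaming.
def resolve_join_keys(left_keys, right_keys, join_keys = nil, suffix: '.1')
  common = left_keys & right_keys
  join_keys = Array(join_keys)
  join_keys = common if join_keys.empty? # natural join
  duplicated = common - join_keys        # shared, but not joined on
  # Assumption: the colliding keys of `other` receive the suffix.
  renamed = duplicated.to_h { |key| [key, :"#{key}#{suffix}"] }
  { join_keys: join_keys, renamed: renamed }
end

p resolve_join_keys(%i[KEY X1], %i[KEY X2])
# => {:join_keys=>[:KEY], :renamed=>{}}
p resolve_join_keys(%i[KEY X], %i[KEY X], :KEY)
# => {:join_keys=>[:KEY], :renamed=>{:X=>:"X.1"}}
```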
1307
+
1308
+ These are dataframes to use in the examples of joins.
1309
+ ```ruby
1310
+ df = DataFrame.new(
1311
+ KEY: %w[A B C],
1312
+ X1: [1, 2, 3]
1313
+ )
1314
+ #=>
1315
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000012a70>
1316
+ KEY X1
1317
+ <string> <uint8>
1318
+ 0 A 1
1319
+ 1 B 2
1320
+ 2 C 3
1321
+
1322
+ other = DataFrame.new(
1323
+ KEY: %w[A B D],
1324
+ X2: [true, false, nil]
1325
+ )
1326
+ #=>
1327
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000017034>
1328
+ KEY X2
1329
+ <string> <boolean>
1330
+ 0 A true
1331
+ 1 B false
1332
+ 2 D (nil)
1333
+ ```
1334
+
1335
+ #### Mutating joins
1336
+
1337
+ ##### `inner_join(other, join_keys = nil, suffix: '.1')`
1338
+
1339
+ Join data, leaving only the matching records.
1340
+
1341
+ ```ruby
1342
+ df.inner_join(other, :KEY)
1343
+ #=>
1344
+ #<RedAmber::DataFrame : 2 x 3 Vectors, 0x000000000001e2bc>
1345
+ KEY X1 X2
1346
+ <string> <uint8> <boolean>
1347
+ 0 A 1 true
1348
+ 1 B 2 false
1349
+ ```
1350
+
1351
+ ##### `full_join(other, join_keys = nil, suffix: '.1')`
1352
+
1353
+ Join data, leaving all records.
1354
+
1355
+ ```ruby
1356
+ df.full_join(other, :KEY)
1357
+ #=>
1358
+ #<RedAmber::DataFrame : 4 x 3 Vectors, 0x0000000000029fcc>
1359
+ KEY X1 X2
1360
+ <string> <uint8> <boolean>
1361
+ 0 A 1 true
1362
+ 1 B 2 false
1363
+ 2 C 3 (nil)
1364
+ 3 D (nil) (nil)
1365
+ ```
1366
+
1367
+ ##### `left_join(other, join_keys = nil, suffix: '.1')`
1368
+
1369
+ Join matching values to self from other.
1370
+
1371
+ ```ruby
1372
+ df.left_join(other, :KEY)
1373
+ #=>
1374
+ #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000029fcc>
1375
+ KEY X1 X2
1376
+ <string> <uint8> <boolean>
1377
+ 0 A 1 true
1378
+ 1 B 2 false
1379
+ 2 C 3 (nil)
1380
+ ```
1381
+
1382
+ ##### `right_join(other, join_keys = nil, suffix: '.1')`
1383
+
1384
+ Join matching values from self to other.
1286
1385
 
1287
- - [ ] Join
1386
+ ```ruby
1387
+ df.right_join(other, :KEY)
1388
+ #=>
1389
+ #<RedAmber::DataFrame : 3 x 3 Vectors, 0x0000000000029fcc>
1390
+ KEY X1 X2
1391
+ <string> <uint8> <boolean>
1392
+ 0 A 1 true
1393
+ 1 B 2 false
1394
+ 2 D (nil) (nil)
1395
+ ```
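To make the difference between the mutating joins concrete, here is a plain-Ruby sketch that reproduces the inner, left and full results above on Arrays of Hashes. The helper names are hypothetical and the sketch says nothing about how RedAmber implements joins. A right join corresponds to a left join with the two arguments swapped.

```ruby
# Plain-Ruby illustration only; rows are Hashes keyed by column name.
def inner_join_rows(left, right, key)
  left.flat_map do |l|
    right.select { |r| r[key] == l[key] }.map { |r| l.merge(r) }
  end
end

def left_join_rows(left, right, key)
  extra = right.flat_map(&:keys).uniq - [key]
  left.flat_map do |l|
    matches = right.select { |r| r[key] == l[key] }
    if matches.empty?
      [l.merge(extra.to_h { |k| [k, nil] })] # pad missing columns with nil
    else
      matches.map { |r| l.merge(r) }
    end
  end
end

def full_join_rows(left, right, key)
  left_extra = left.flat_map(&:keys).uniq - [key]
  unmatched = right.reject { |r| left.any? { |l| l[key] == r[key] } }
  padded = unmatched.map do |r|
    { key => r[key] }.merge(left_extra.to_h { |k| [k, nil] }).merge(r)
  end
  left_join_rows(left, right, key) + padded
end

left  = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }]
right = [{ KEY: 'A', X2: true }, { KEY: 'B', X2: false }, { KEY: 'D', X2: nil }]

p inner_join_rows(left, right, :KEY).size # => 2
p left_join_rows(left, right, :KEY).size  # => 3
p full_join_rows(left, right, :KEY).size  # => 4
```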
1396
+
1397
+ #### Filtering join
1398
+
1399
+ ##### `semi_join(other, join_keys = nil, suffix: '.1')`
1400
+
1401
+ Return records of self that have a match in other.
1402
+
1403
+ ```ruby
1404
+ df.semi_join(other, :KEY)
1405
+ #=>
1406
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000029fcc>
1407
+ KEY X1
1408
+ <string> <uint8>
1409
+ 0 A 1
1410
+ 1 B 2
1411
+ ```
1412
+
1413
+ ##### `anti_join(other, join_keys = nil, suffix: '.1')`
1414
+
1415
+ Return records of self that do not have a match in other.
1416
+
1417
+ ```ruby
1418
+ df.anti_join(other, :KEY)
1419
+ #=>
1420
+ #<RedAmber::DataFrame : 1 x 2 Vectors, 0x0000000000029fcc>
1421
+ KEY X1
1422
+ <string> <uint8>
1423
+ 0 C 3
1424
+ ```
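Filtering joins never add columns; they only keep or drop records of self. A minimal plain-Ruby sketch of that idea follows, with hypothetical helper names and rows modelled as Hashes. It is not RedAmber's implementation.

```ruby
# Plain-Ruby illustration only: keep or reject rows of `left`
# depending on whether their key value appears in `right`.
def semi_join_rows(left, right, key)
  right_values = right.map { |r| r[key] }
  left.select { |l| right_values.include?(l[key]) }
end

def anti_join_rows(left, right, key)
  right_values = right.map { |r| r[key] }
  left.reject { |l| right_values.include?(l[key]) }
end

left  = [{ KEY: 'A', X1: 1 }, { KEY: 'B', X1: 2 }, { KEY: 'C', X1: 3 }]
right = [{ KEY: 'A', X2: true }, { KEY: 'B', X2: false }, { KEY: 'D', X2: nil }]

p semi_join_rows(left, right, :KEY)
# => [{:KEY=>"A", :X1=>1}, {:KEY=>"B", :X1=>2}]
p anti_join_rows(left, right, :KEY)
# => [{:KEY=>"C", :X1=>3}]
```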
1425
+
1426
+ ## Set operations
1427
+ ![dataframe set and binding image](doc/../image/dataframe/set_and_bind.png)
1428
+
1429
+ Keys in self and other must be same in set operations.
1430
+
1431
+ ```ruby
1432
+ df = DataFrame.new(
1433
+ KEY1: %w[A B C],
1434
+ KEY2: [1, 2, 3]
1435
+ )
1436
+ #=>
1437
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000012a70>
1438
+ KEY1 KEY2
1439
+ <string> <uint8>
1440
+ 0 A 1
1441
+ 1 B 2
1442
+ 2 C 3
1443
+
1444
+ other = DataFrame.new(
1445
+ KEY1: %w[A B D],
1446
+ KEY2: [1, 4, 5]
1447
+ )
1448
+ #=>
1449
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x0000000000017034>
1450
+ KEY1 KEY2
1451
+ <string> <uint8>
1452
+ 0 A 1
1453
+ 1 B 4
1454
+ 2 D 5
1455
+ ```
1456
+
1457
+ ##### `intersect(other)`
1458
+
1459
+ Select records appearing in both self and other.
1460
+
1461
+ ```ruby
1462
+ df.intersect(other)
1463
+ #=>
1464
+ #<RedAmber::DataFrame : 1 x 2 Vectors, 0x0000000000029fcc>
1465
+ KEY1 KEY2
1466
+ <string> <uint8>
1467
+ 0 A 1
1468
+ ```
1469
+
1470
+ ##### `union(other)`
1471
+
1472
+ Select records appearing in self or other.
1473
+
1474
+ ```ruby
1475
+ df.union(other)
1476
+ #=>
1477
+ #<RedAmber::DataFrame : 5 x 2 Vectors, 0x0000000000029fcc>
1478
+ KEY1 KEY2
1479
+ <string> <uint8>
1480
+ 0 A 1
1481
+ 1 B 2
1482
+ 2 C 3
1483
+ 3 B 4
1484
+ 4 D 5
1485
+ ```
1486
+
1487
+ ##### `difference(other)`
1488
+
1489
+ Select records appearing in self but not in other.
1490
+
1491
+ It has an alias `setdiff`.
1492
+
1493
+ ```ruby
1494
+ df.difference(other)
1495
+ #=>
1496
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000029fcc>
1497
+ KEY1 KEY2
1498
+ <string> <uint8>
1499
+ 1 B 2
1500
+ 2 C 3
1501
+ ```
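The three set operations treat each record as a whole, in the same way Ruby's Array operators `&`, `|` and `-` treat elements. The sketch below models each record as a small Array and reproduces the results shown above. The helper names are hypothetical and this is an analogy, not RedAmber's implementation.

```ruby
# Plain-Ruby analogy: each record is an Array of values, each table an
# Array of records. Array#&, #| and #- compare whole records.
def intersect_rows(rows, other)
  rows & other
end

def union_rows(rows, other)
  rows | other
end

def difference_rows(rows, other)
  rows - other
end

rows  = [['A', 1], ['B', 2], ['C', 3]]
other = [['A', 1], ['B', 4], ['D', 5]]

p intersect_rows(rows, other)  # => [["A", 1]]
p union_rows(rows, other)      # => [["A", 1], ["B", 2], ["C", 3], ["B", 4], ["D", 5]]
p difference_rows(rows, other) # => [["B", 2], ["C", 3]]
```

Note that `["B", 2]` and `["B", 4]` count as different records because the comparison covers every column, not only the first one.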
1502
+
1503
+ ## Binding
1504
+
1505
+ ### `concatenate(other)`
1506
+
1507
+ Concatenate another DataFrame or Table onto the bottom of self. The shape and data type of other must be the same as self.
1508
+
1509
+ The alias is `concat`.
1510
+
1511
+ An array of DataFrames or Tables is also acceptable as other.
1512
+
1513
+ ```ruby
1514
+ df
1515
+ #=>
1516
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000022cb8>
1517
+ x y
1518
+ <uint8> <string>
1519
+ 0 1 A
1520
+ 1 2 B
1521
+
1522
+ other
1523
+ #=>
1524
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x000000000001f6d0>
1525
+ x y
1526
+ <uint8> <string>
1527
+ 0 3 C
1528
+ 1 4 D
1529
+
1530
+ df.concatenate(other)
1531
+ #=>
1532
+ #<RedAmber::DataFrame : 4 x 2 Vectors, 0x0000000000022574>
1533
+ x y
1534
+ <uint8> <string>
1535
+ 0 1 A
1536
+ 1 2 B
1537
+ 2 3 C
1538
+ 3 4 D
1539
+ ```
1540
+
1541
+ ### `merge(other)`
1542
+
1543
+ Concatenate another DataFrame or Table onto the right side of self. The number of records in other must be the same as in self.
1544
+
1545
+ ```ruby
1546
+ df
1547
+ #=>
1548
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000009150>
1549
+ x y
1550
+ <uint8> <uint8>
1551
+ 0 1 3
1552
+ 1 2 4
1553
+
1554
+ other
1555
+ #=>
1556
+ #<RedAmber::DataFrame : 2 x 2 Vectors, 0x0000000000008a0c>
1557
+ a b
1558
+ <string> <string>
1559
+ 0 A C
1560
+ 1 B D
1561
+
1562
+ df.merge(other)
1563
+ #=>
1564
+ #<RedAmber::DataFrame : 2 x 4 Vectors, 0x000000000000cb70>
1565
+ x y a b
1566
+ <uint8> <uint8> <string> <string>
1567
+ 0 1 3 A C
1568
+ 1 2 4 B D
1569
+ ```
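The two binding operations differ in direction: `concatenate` stacks records, `merge` places variables side by side. A plain-Ruby sketch with DataFrames modelled as Hashes of column Arrays follows. The helper names are hypothetical, and the checks in `merge_columns` (no shared keys, equal sizes) are assumptions made for the illustration rather than documented RedAmber behaviour.

```ruby
# Plain-Ruby illustration only: a "frame" is a Hash of key => Array.
def concatenate_columns(frame, other)
  raise ArgumentError, 'keys must match' unless frame.keys == other.keys

  # Append other's values below self's values, column by column.
  frame.to_h { |key, values| [key, values + other[key]] }
end

def merge_columns(frame, other)
  shared = frame.keys & other.keys
  # Assumption: shared keys are rejected instead of being overwritten.
  raise ArgumentError, "shared keys: #{shared}" unless shared.empty?

  sizes = (frame.values + other.values).map(&:size).uniq
  raise ArgumentError, 'sizes differ' if sizes.size > 1

  frame.merge(other)
end

p concatenate_columns({ x: [1, 2], y: %w[A B] }, { x: [3, 4], y: %w[C D] })
# => {:x=>[1, 2, 3, 4], :y=>["A", "B", "C", "D"]}
p merge_columns({ x: [1, 2], y: [3, 4] }, { a: %w[A B], b: %w[C D] })
# => {:x=>[1, 2], :y=>[3, 4], :a=>["A", "B"], :b=>["C", "D"]}
```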
1288
1570
 
1289
1571
  ## Encoding
1290
1572
 
data/doc/Vector.md CHANGED
@@ -24,6 +24,9 @@ Class `RedAmber::Vector` represents a series of data in the DataFrame.
24
24
  vector = Vector.new(1..3)
25
25
  # or
26
26
  vector = Vector.new(Arrow::Array.new([1, 2, 3]))
27
+ # or
28
+ require 'arrow-numo-narray'
29
+ vector = Vector.new(Numo::Int8[1, 2, 3])
27
30
 
28
31
  # =>
29
32
  #<RedAmber::Vector(:uint8, size=3):0x000000000000f514>
@@ -510,3 +513,91 @@ vector.shift(fill: Float::NAN)
510
513
  #<RedAmber::Vector(:double, size=5):0x0000000000011d3c>
511
514
  [NaN, 1.0, 2.0, 3.0, 4.0]
512
515
  ```
516
+
517
+ ### `split_to_columns(sep = ' ', limit = 0)`
518
+
519
+ Split string type Vector with any ASCII whitespace as separator.
520
+ Returns an Array of Vectors.
521
+
522
+ ```ruby
523
+ vector = Vector.new(['a b', 'c d', 'e f'])
524
+ vector.split_to_columns
525
+
526
+ #=>
527
+ [#<RedAmber::Vector(:string, size=3):0x00000000000363a8>
528
+ ["a", "c", "e"]
529
+ ,
530
+ #<RedAmber::Vector(:string, size=3):0x00000000000363bc>
531
+ ["b", "d", "f"]
532
+ ]
533
+ ```
534
+ It can be used for column splitting in a DataFrame.
535
+
536
+ ```ruby
537
+ df = DataFrame.new(year_month: %w[2022-01 2022-02 2022-03])
538
+ .assign(:year, :month) { year_month.split_to_columns('-') }
539
+ .drop(:year_month)
540
+
541
+ #=>
542
+ #<RedAmber::DataFrame : 3 x 2 Vectors, 0x000000000000f974>
543
+ year month
544
+ <string> <string>
545
+ 0 2022 01
546
+ 1 2022 02
547
+ 2 2022 03
548
+ ```
549
+
550
+ ### `split_to_rows(sep = ' ', limit = 0)`
551
+
552
+ Split string type Vector with any ASCII whitespace as separator.
553
+ Returns a Vector flattened into rows.
554
+
555
+ ```ruby
556
+ vector = Vector.new(['a b', 'c d', 'e f'])
557
+ vector.split_to_rows
558
+
559
+ #=>
560
+ #<RedAmber::Vector(:string, size=6):0x000000000002ccf4>
561
+ ["a", "b", "c", "d", "e", "f"]
562
+ ```
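The two split methods differ only in how the pieces are arranged afterwards. The plain-Ruby sketch below shows that difference on ordinary Arrays of Strings. The helper names are hypothetical, and the sketch assumes every string splits into the same number of parts (otherwise `transpose` raises).

```ruby
# Plain-Ruby illustration only, operating on Arrays of Strings.
def split_columns(strings, sep = ' ')
  # One inner Array per resulting column.
  strings.map { |s| s.split(sep) }.transpose
end

def split_rows(strings, sep = ' ')
  # One flat Array with all the pieces in order.
  strings.flat_map { |s| s.split(sep) }
end

p split_columns(['a b', 'c d', 'e f'])
# => [["a", "c", "e"], ["b", "d", "f"]]
p split_columns(%w[2022-01 2022-02 2022-03], '-')
# => [["2022", "2022", "2022"], ["01", "02", "03"]]
p split_rows(['a b', 'c d', 'e f'])
# => ["a", "b", "c", "d", "e", "f"]
```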
563
+
564
+ ### `merge(other, sep: ' ')`
565
+
566
+ Merge a String or another string Vector to self using a separator.
567
+ Self must be a string Vector.
568
+ Returns merged string Vector.
569
+
570
+ ```ruby
571
+ # with vector
572
+ vector = Vector.new(%w[a c e])
573
+ other = Vector.new(%w[b d f])
574
+ vector.merge(other)
575
+
576
+ #=>
577
+ #<RedAmber::Vector(:string, size=3):0x0000000000038b80>
578
+ ["a b", "c d", "e f"]
579
+ ```
580
+
581
+ If other is a String it will be broadcasted.
582
+
583
+ ```ruby
584
+ # with string
585
+ vector = Vector.new(%w[a c e])
586
+ vector.merge('x')
+
587
+ #=>
588
+ #<RedAmber::Vector(:string, size=3):0x00000000000446b0>
589
+ ["a x", "c x", "e x"]
590
+ ```
591
+
592
+ You can specify separator string by :sep.
593
+
594
+ ```ruby
595
+ # with vector
596
+ vector = Vector.new(%w[a c e])
597
+ other = Vector.new(%w[b d f])
598
+ vector.merge(other, sep: '')
599
+
600
+ #=>
601
+ #<RedAmber::Vector(:string, size=3):0x0000000000038b80>
602
+ ["ab", "cd", "ef"]
603
+ ```
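The three cases above (Vector argument, String argument, custom separator) can be mirrored on plain Arrays. This is a sketch under the assumption that broadcasting simply repeats the String for every element; `merge_strings` is a hypothetical helper, not RedAmber's method.

```ruby
# Plain-Ruby illustration only: join elements pairwise with a separator.
def merge_strings(strings, other, sep: ' ')
  # Broadcast a single String to the length of `strings`.
  others = other.is_a?(String) ? [other] * strings.size : other
  strings.zip(others).map { |a, b| "#{a}#{sep}#{b}" }
end

p merge_strings(%w[a c e], %w[b d f])          # => ["a b", "c d", "e f"]
p merge_strings(%w[a c e], 'x')                # => ["a x", "c x", "e x"]
p merge_strings(%w[a c e], %w[b d f], sep: '') # => ["ab", "cd", "ef"]
```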