galaaz 0.4.7 → 0.4.8

Files changed (68)
  1. checksums.yaml +4 -4
  2. data/README.md +1179 -39
  3. data/Rakefile +4 -2
  4. data/bin/grun +1 -1
  5. data/bin/gstudio +1 -1
  6. data/blogs/dev/dev.Rmd +2 -56
  7. data/blogs/dev/dev.md +32 -61
  8. data/blogs/dev/dev2.Rmd +65 -0
  9. data/blogs/dplyr/dplyr.Rmd +29 -0
  10. data/blogs/{dev/dev.html → dplyr/dplyr.html} +88 -57
  11. data/blogs/dplyr/dplyr.md +58 -0
  12. data/blogs/gknit/gknit.html +1262 -25
  13. data/blogs/gknit/gknit.md +471 -27
  14. data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
  15. data/blogs/manual/graph.rb +29 -0
  16. data/blogs/manual/manual.Rmd +567 -29
  17. data/blogs/manual/manual.html +743 -46
  18. data/blogs/manual/manual.md +1179 -39
  19. data/blogs/nse_dplyr/nse_dplyr.Rmd +466 -11
  20. data/blogs/nse_dplyr/nse_dplyr.html +472 -37
  21. data/blogs/nse_dplyr/nse_dplyr.md +645 -32
  22. data/blogs/ruby_plot/ruby_plot.Rmd +4 -4
  23. data/blogs/ruby_plot/ruby_plot.html +217 -2
  24. data/blogs/ruby_plot/ruby_plot.md +226 -1
  25. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  26. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +2 -2
  27. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  28. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +70 -70
  29. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  30. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +72 -72
  31. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +116 -116
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +176 -176
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +236 -236
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +176 -176
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +160 -160
  42. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +105 -105
  44. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +121 -121
  46. data/examples/islr/ch2.spec.rb +1 -1
  47. data/examples/islr/ch3_boston.rb +4 -4
  48. data/examples/islr/x_y_rnorm.jpg +0 -0
  49. data/lib/R_interface/r.rb +1 -1
  50. data/lib/R_interface/r_methods.rb +2 -2
  51. data/lib/R_interface/rdata_frame.rb +8 -5
  52. data/lib/R_interface/rindexed_object.rb +1 -2
  53. data/lib/R_interface/rlist.rb +1 -0
  54. data/lib/R_interface/robject.rb +0 -1
  55. data/lib/R_interface/rpkg.rb +14 -6
  56. data/lib/R_interface/rsupport.rb +7 -9
  57. data/lib/R_interface/ruby_extensions.rb +17 -5
  58. data/lib/gknit/knitr_engine.rb +9 -2
  59. data/lib/util/exec_ruby.rb +2 -2
  60. data/specs/r_dataframe.spec.rb +173 -0
  61. data/specs/r_list.spec.rb +4 -4
  62. data/specs/ruby_expression.spec.rb +2 -11
  63. data/specs/tmp.rb +76 -34
  64. data/version.rb +1 -1
  65. metadata +17 -6
  66. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  67. data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
  68. data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c79823d334f1b44a16be1d98c2c075b04ac734c3b80003334b4cc6ad6697a975
4
- data.tar.gz: 818de59d34378899f8b442e301624748d5447137768a9b9131721b86a8d25bdd
3
+ metadata.gz: 58c62af9441e2a11082559e619498f518d1060d64e4078dea85a89cb795305d5
4
+ data.tar.gz: 72d415df8641497ed7ca1a442a8e06e5eaab21aa7f9162dfa132cf2728808948
5
5
  SHA512:
6
- metadata.gz: '090b147d622051c62aeda052e3d77b23aa9db5478bebf869d99d73bdc377db01b49ead2b09db084c4acf34f8712b839ca1b1a7f45ddf5c22125cf49f5b2b52ab'
7
- data.tar.gz: d256afe6d9fbabcd44491762e6b3a6c618b29c7302b08621c89a92e0b94681d0519e45f0271db944ad96d0dd2e54133cba01f57a6602edfd1a34b23a1b14e1bc
6
+ metadata.gz: 7a6559d5aa01378730bdd1e9b47ec157eaea0dddf70f2d8974a6de36a86c746e429d71293adb511ce746cdd5ffba913eb3db7e9c39b8b0b370a8162ee0f511e3
7
+ data.tar.gz: e247f80a65ed5a338b553ce0e2226d46ab4527e8460beb3a1871646f6cbf83be653d8fa0d57fbd32c727377fa826e7b84463858ac0c9047baf7fe9c21ff82ad0
data/README.md CHANGED
@@ -22,7 +22,6 @@ fontsize: 11pt
22
22
 
23
23
 
24
24
 
25
-
26
25
  # Introduction
27
26
 
28
27
  Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with a large
@@ -96,9 +95,20 @@ Panda, SciPy, SciKit-Learn and a couple more.
96
95
 
97
96
  > galaaz master_list:scatter_plot
98
97
 
99
- # Basic Types
98
+ # gKnitting a Document
99
+
100
+ This manual has been formatted using gKnit. gKnit uses Knitr and R markdown to knit
101
+ a document in Ruby or R and output it in any of the available formats for R markdown.
102
+ gKnit runs atop GraalVM and Galaaz. In gKnit, Ruby variables are persisted between
103
+ chunks, making it an ideal solution for literate programming.
104
+ Also, since it is based on Galaaz, Ruby chunks can have access to R variables and Polyglot
105
+ Programming with Ruby and R is quite natural.
106
+
107
+ gKnit was described in more depth in:
108
+
109
+ * xxx.xxxx.xxx
100
110
 
101
- ## Vectors
111
+ # Vector
102
112
 
103
113
  Vectors can be thought of as contiguous cells containing data. Cells are accessed through
104
114
  indexing operations such as x[5]. Galaaz has six basic (‘atomic’) vector types: logical,
@@ -134,11 +144,13 @@ puts @vec
134
144
 
135
145
  Let's take a look at the type, mode and storage.mode of our vector @vec. In order to print
136
146
  this out, we are creating a data frame 'df' and printing it out. A data frame, for those
137
- not familiar with it, it basically a table. Here we create the data frame and add the
147
+ not familiar with it, is basically a table. Here we create the data frame and add the
138
148
  column name by passing named parameters for each column, such as 'typeof:', 'mode:' and
139
- 'storage__mode'. You should also note here that the double underscore is converted to a '.'.
149
+ 'storage__mode?'. You should also note here that the double underscore is converted to a '.'.
150
+ So, when printed 'storage\_\_mode' will actually print as 'storage.mode'.
140
151
 
141
- In R, the method used to create a data frame is 'data.frame', in Galaaz we use 'data__frame'.
152
+ Data frames will later be more carefully described. In R, the method used to create a
153
+ data frame is 'data.frame', in Galaaz we use 'data\_\_frame'.
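
The naming rule described above (a double underscore in a Ruby name stands for a '.' in the R name) can be sketched in plain Ruby. The helper `r_name` is hypothetical and written only for illustration; it is not Galaaz's actual implementation.

```ruby
# Hypothetical helper (not part of Galaaz): map a Ruby-friendly name
# to the R name it stands for by turning each "__" into ".".
def r_name(ruby_name)
  ruby_name.to_s.gsub("__", ".")
end

r_name(:data__frame)      # "data.frame"
r_name(:storage__mode)    # "storage.mode"
r_name(:as__data__frame)  # "as.data.frame"
```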
142
154
 
143
155
 
144
156
  ```ruby
@@ -207,25 +219,235 @@ vec = R.c(1, hello, 5)
207
219
  ```
208
220
  ## Message:
209
221
  ## (eval):1:in `exec_ruby'
210
- ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:137:in `instance_eval'
211
- ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:137:in `exec_ruby'
212
- ## /home/rbotafogo/desenv/galaaz/lib/gknit/ruby_engine.rb:55:in `block in initialize'
222
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `instance_eval'
223
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `exec_ruby'
224
+ ## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:657:in `block in initialize'
213
225
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `call'
214
226
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `callback'
215
227
  ## (eval):3:in `function(...) {\n rb_method(...)'
216
228
  ## unknown.r:1:in `in_dir'
217
- ## unknown.r:1:in `block_exec'
218
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:91:in `call_block'
219
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
220
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:3:in `<no source>'
229
+ ## unknown.r:1:in `block_exec:BLOCK0'
230
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:102:in `block_exec'
231
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:92:in `call_block'
232
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
233
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:3:in `<no source>'
221
234
  ## unknown.r:1:in `withCallingHandlers'
222
235
  ## unknown.r:1:in `process_file'
223
- ## unknown.r:1:in `<no source>'
224
- ## unknown.r:1:in `<no source>'
225
- ## <REPL>:4:in `<repl wrapper>'
236
+ ## unknown.r:1:in `<no source>:BLOCK1'
237
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/output.R:129:in `<no source>'
238
+ ## unknown.r:1:in `<no source>:BLOCK1'
239
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/rmarkdown/R/render.R:162:in `<no source>'
240
+ ## <REPL>:5:in `<repl wrapper>'
226
241
  ## <REPL>:1
227
242
  ```
228
243
 
244
+ Here is a vector with logical values
245
+
246
+
247
+ ```ruby
248
+ @vec = R.c(true, true, false, false, true)
249
+ puts @vec
250
+ ```
251
+
252
+ ```
253
+ ## [1] TRUE TRUE FALSE FALSE TRUE
254
+ ```
255
+
256
+ ## Combining Vectors
257
+
258
+ The 'c' function used to create vectors can also be used to combine two vectors:
259
+
260
+
261
+ ```ruby
262
+ @vec1 = R.c(10.0, 20.0, 30.0)
263
+ @vec2 = R.c(4.0, 5.0, 6.0)
264
+ @vec = R.c(@vec1, @vec2)
265
+ puts @vec
266
+ ```
267
+
268
+ ```
269
+ ## [1] 10 20 30 4 5 6
270
+ ```
271
+ In Galaaz, methods can be chained (somewhat like R's pipe operator %>%, but more generic).
272
+ In this next example, method 'c' is chained after '@vec1'. This also looks like 'c' is a
273
+ method of the vector, but in reality, this is actually closer to the pipe operator. When
274
+ Galaaz identifies that 'c' is not a method of '@vec1' it actually tries to call 'R.c' with
275
+ '@vec1' as the first argument concatenated with all the other available arguments. The code
276
+ below is automatically converted to the code above.
277
+
278
+
279
+ ```ruby
280
+ @vec = @vec1.c(@vec2)
281
+ puts @vec
282
+ ```
283
+
284
+ ```
285
+ ## [1] 10 20 30 4 5 6
286
+ ```
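
The forwarding behaviour described above can be sketched in plain Ruby with `method_missing`. This is only an illustration under stated assumptions: `FakeR` and `FakeVector` are hypothetical stand-ins, and Galaaz's real dispatch code is not shown in this diff.

```ruby
# Hypothetical stand-in for the R namespace; only `c` is defined here.
module FakeR
  def self.c(*args)
    # Flatten wrapped vectors and plain values into one new vector.
    FakeVector.new(args.flat_map { |a| a.is_a?(FakeVector) ? a.values : [a] })
  end
end

# Hypothetical vector wrapper: a method it does not define is forwarded
# to FakeR with the receiver prepended as the first argument.
class FakeVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def method_missing(name, *args)
    return super unless FakeR.respond_to?(name)
    FakeR.public_send(name, self, *args)
  end

  def respond_to_missing?(name, include_private = false)
    FakeR.respond_to?(name) || super
  end
end

vec1 = FakeR.c(10.0, 20.0, 30.0)
vec2 = FakeR.c(4.0, 5.0, 6.0)
chained = vec1.c(vec2)        # dispatched as FakeR.c(vec1, vec2)
direct  = FakeR.c(vec1, vec2)
```

Both `chained.values` and `direct.values` hold the same six elements, which mirrors the equivalence between '@vec1.c(@vec2)' and 'R.c(@vec1, @vec2)' stated in the text.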
287
+
288
+ ## Vector Arithmetic
289
+
290
+ Arithmetic operations on vectors are performed element by element:
291
+
292
+
293
+ ```ruby
294
+ puts @vec1 + @vec2
295
+ ```
296
+
297
+ ```
298
+ ## [1] 14 25 36
299
+ ```
300
+
301
+
302
+ ```ruby
303
+ puts @vec1 * 5
304
+ ```
305
+
306
+ ```
307
+ ## [1] 50 100 150
308
+ ```
309
+
310
+ When vectors have different lengths, a recycling rule is applied to the shorter vector:
311
+
312
+
313
+ ```ruby
314
+ @vec3 = R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
315
+ puts @vec4 = @vec1 + @vec3
316
+ ```
317
+
318
+ ```
319
+ ## [1] 11 22 33 14 25 36 17 28 39
320
+ ```
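
The recycling rule can be sketched on plain Ruby arrays: the shorter operand is reused cyclically until it covers the longer one. The function `recycled_add` is a hypothetical helper written for this illustration, and it assumes both arrays are non-empty.

```ruby
# Element-wise addition with R-style recycling of the shorter operand.
# Assumes both arrays are non-empty.
def recycled_add(a, b)
  longer, shorter = a.length >= b.length ? [a, b] : [b, a]
  longer.each_with_index.map { |x, i| x + shorter[i % shorter.length] }
end

vec1 = [10.0, 20.0, 30.0]
vec3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
vec4 = recycled_add(vec1, vec3)
# vec4 is [11.0, 22.0, 33.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]
```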
321
+
322
+ ## Vector Indexing
323
+
324
+ Vectors can be indexed by using the '[]' operator:
325
+
326
+
327
+ ```ruby
328
+ puts @vec4[3]
329
+ ```
330
+
331
+ ```
332
+ ## [1] 33
333
+ ```
334
+
335
+ We can also index a vector with another vector. For example, in the code below, we take elements
336
+ 1, 3, 5, and 7 from @vec4:
337
+
338
+
339
+ ```ruby
340
+ puts @vec4[R.c(1, 3, 5, 7)]
341
+ ```
342
+
343
+ ```
344
+ ## [1] 11 33 25 17
345
+ ```
346
+
347
+ Repeating an index and having indices out of order is valid code:
348
+
349
+
350
+ ```ruby
351
+ puts @vec4[R.c(1, 3, 3, 1)]
352
+ ```
353
+
354
+ ```
355
+ ## [1] 11 33 33 11
356
+ ```
357
+
358
+ It is also possible to index a vector with a negative number or negative vector. In these cases
359
+ the indexed values are not returned:
360
+
361
+
362
+ ```ruby
363
+ puts @vec4[-3]
364
+ puts @vec4[-R.c(1, 3, 5, 7)]
365
+ ```
366
+
367
+ ```
368
+ ## [1] 11 22 14 25 36 17 28 39
369
+ ## [1] 22 14 36 28 39
370
+ ```
371
+
372
+ If an index is out of range, a missing value (NA) will be reported.
373
+
374
+
375
+ ```ruby
376
+ puts @vec4[30]
377
+ ```
378
+
379
+ ```
380
+ ## [1] NA
381
+ ```
382
+
383
+ It is also possible to index a vector by range:
384
+
385
+
386
+ ```ruby
387
+ puts @vec4[(2..5)]
388
+ ```
389
+
390
+ ```
391
+ ## [1] 22 33 14 25
392
+ ```
393
+
394
+ Elements in a vector can be named using the 'names' attribute of a vector:
395
+
396
+
397
+ ```ruby
398
+ full_name = R.c("Rodrigo", "A", "Botafogo")
399
+ full_name.names = R.c("First", "Middle", "Last")
400
+ puts full_name
401
+ ```
402
+
403
+ ```
404
+ ## First Middle Last
405
+ ## "Rodrigo" "A" "Botafogo"
406
+ ```
407
+
408
+ Or it can also be named by using the 'c' function with named parameters:
409
+
410
+
411
+ ```ruby
412
+ full_name = R.c(First: "Rodrigo", Middle: "A", Last: "Botafogo")
413
+ puts full_name
414
+ ```
415
+
416
+ ```
417
+ ## First Middle Last
418
+ ## "Rodrigo" "A" "Botafogo"
419
+ ```
420
+
421
+ ## Extracting Native Ruby Types from a Vector
422
+
423
+ Vectors created with 'R.c' are of class R::Vector. You might have noticed that when indexing a
424
+ vector, a new vector is returned, even if this vector has one single element. In order to use
425
+ R::Vector with other ruby classes it might be necessary to extract the actual Ruby native type
426
+ from the vector. In order to do this extraction the '>>' operator is used.
427
+
428
+
429
+ ```ruby
430
+ puts @vec4
431
+ puts @vec4 >> 0
432
+ puts @vec4 >> 4
433
+ ```
434
+
435
+ ```
436
+ ## [1] 11 22 33 14 25 36 17 28 39
437
+ ## 11.0
438
+ ## 25.0
439
+ ```
440
+
441
+ Note that indexing with '>>' starts at 0, not at 1, and that negative indexing is not supported.
442
+
443
+ # Accessing R variables
444
+
445
+ Galaaz allows Ruby to access variables created in R. For example, the 'mtcars' data set is
446
+ available in R and can be accessed from Ruby by using the 'tilde' operator followed by the
447
+ symbol for the variable, in this case ':mtcars'. In the code below method 'outputs' is
448
+ used to output the 'mtcars' data set nicely formatted in HTML by use of the 'kable' and
449
+ 'kable_styling' functions. Method 'outputs' is only available when used with 'gknit'.
450
+
229
451
 
230
452
  ```ruby
231
453
  outputs (~:mtcars).kable.kable_styling
@@ -700,40 +922,958 @@ outputs (~:mtcars).kable.kable_styling
700
922
  </tbody>
701
923
  </table>
702
924
 
925
+ # Matrix
926
+
927
+ A matrix is a collection of elements organized as a two dimensional table. A matrix can be
928
+ created by the 'matrix' function:
929
+
930
+
931
+ ```ruby
932
+ @mat = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
933
+ nrow: 3,
934
+ ncol: 3)
935
+
936
+ puts @mat
937
+ ```
938
+
939
+ ```
940
+ ## [,1] [,2] [,3]
941
+ ## [1,] 1 4 7
942
+ ## [2,] 2 5 8
943
+ ## [3,] 3 6 9
944
+ ```
945
+ Note that matrix data is organized by column first. It is possible to organize the matrix
946
+ memory by row first passing an extra argument to the 'matrix' function:
947
+
948
+
949
+ ```ruby
950
+ @mat_row = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
951
+ nrow: 3,
952
+ ncol: 3,
953
+ byrow: true)
954
+
955
+ puts @mat_row
956
+ ```
957
+
958
+ ```
959
+ ## [,1] [,2] [,3]
960
+ ## [1,] 1 2 3
961
+ ## [2,] 4 5 6
962
+ ## [3,] 7 8 9
963
+ ```
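
The difference between the default column-first fill and 'byrow: true' can be sketched with plain Ruby arrays. The function `fill_matrix` is a hypothetical helper, not a Galaaz or R function; it returns an array of rows and assumes the data array has exactly nrow * ncol elements.

```ruby
# Build an nrow x ncol table (an array of rows) from a flat array.
# With byrow: false the data fills column by column, as R's matrix does
# by default; with byrow: true it fills row by row.
def fill_matrix(data, nrow:, ncol:, byrow: false)
  Array.new(nrow) do |r|
    Array.new(ncol) do |c|
      byrow ? data[r * ncol + c] : data[c * nrow + r]
    end
  end
end

data = (1..9).to_a
by_col = fill_matrix(data, nrow: 3, ncol: 3)
# by_col is [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
by_row = fill_matrix(data, nrow: 3, ncol: 3, byrow: true)
# by_row is [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```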
964
+
965
+ ## Indexing a Matrix
966
+
967
+ A matrix can be indexed by [row, column]:
968
+
969
+
970
+ ```ruby
971
+ puts @mat_row[1, 1]
972
+ puts @mat_row[2, 3]
973
+ ```
974
+
975
+ ```
976
+ ## [1] 1
977
+ ## [1] 6
978
+ ```
979
+ It is possible to index an entire row or column with the ':all' keyword
980
+
981
+
982
+ ```ruby
983
+ puts @mat_row[1, :all]
984
+ puts @mat_row[:all, 2]
985
+ ```
986
+
987
+ ```
988
+ ## [1] 1 2 3
989
+ ## [1] 2 5 8
990
+ ```
991
+
992
+ Indexing with a vector is also possible for matrices. In the following example we want
993
+ rows 1 and 3 and columns 2 and 3 building a 2 x 2 matrix.
994
+
995
+
996
+ ```ruby
997
+ puts @mat_row[R.c(1, 3), R.c(2, 3)]
998
+ ```
999
+
1000
+ ```
1001
+ ## [,1] [,2]
1002
+ ## [1,] 2 3
1003
+ ## [2,] 8 9
1004
+ ```
1005
+
1006
+ Matrices can be combined with functions 'rbind' and 'cbind'
1007
+
1008
+
1009
+ ```ruby
1010
+ puts @mat_row.rbind(@mat)
1011
+ puts @mat_row.cbind(@mat)
1012
+ ```
1013
+
1014
+ ```
1015
+ ## [,1] [,2] [,3]
1016
+ ## [1,] 1 2 3
1017
+ ## [2,] 4 5 6
1018
+ ## [3,] 7 8 9
1019
+ ## [4,] 1 4 7
1020
+ ## [5,] 2 5 8
1021
+ ## [6,] 3 6 9
1022
+ ## [,1] [,2] [,3] [,4] [,5] [,6]
1023
+ ## [1,] 1 2 3 1 4 7
1024
+ ## [2,] 4 5 6 2 5 8
1025
+ ## [3,] 7 8 9 3 6 9
1026
+ ```
1027
+
1028
+ # List
1029
+
1030
+ A list is a data structure that can contain sublists of different types, while vector and matrix
1031
+ can only hold one type of element.
1032
+
1033
+
1034
+ ```ruby
1035
+ nums = R.c(1.0, 2.0, 3.0)
1036
+ strs = R.c("a", "b", "c", "d")
1037
+ bool = R.c(true, true, false)
1038
+ @lst = R.list(nums: nums, strs: strs, bool: bool)
1039
+ puts @lst
1040
+ ```
1041
+
1042
+ ```
1043
+ ## $nums
1044
+ ## [1] 1 2 3
1045
+ ##
1046
+ ## $strs
1047
+ ## [1] "a" "b" "c" "d"
1048
+ ##
1049
+ ## $bool
1050
+ ## [1] TRUE TRUE FALSE
1051
+ ```
1052
+
1053
+ Note that '@lst' elements are named elements.
1054
+
1055
+
1056
+ ## List Indexing
1057
+
1058
+ List indexing, also called slicing, is done using the '[]' operator and the '[[]]' operator. Let's
1059
+ first start with the '[]' operator. The list above has three sublists; indexing with '[]' will
1060
+ return one of the sublists.
1061
+
1062
+
1063
+ ```ruby
1064
+ puts @lst[1]
1065
+ ```
1066
+
1067
+ ```
1068
+ ## $nums
1069
+ ## [1] 1 2 3
1070
+ ```
1071
+
1072
+ Note that when using '[]' a new list is returned. When using the double square bracket operator
1073
+ the value returned is the actual element of the list in the given position and not a slice of
1074
+ the original list
1075
+
1076
+
1077
+
1078
+ ```ruby
1079
+ puts @lst[[1]]
1080
+ ```
1081
+
1082
+ ```
1083
+ ## [1] 1 2 3
1084
+ ```
1085
+
1086
+ When elements are named, as done with @lst, indexing can be done by name:
1087
+
1088
+
1089
+ ```ruby
1090
+ puts @lst[['bool']][[1]] >> 0
1091
+ ```
1092
+
1093
+ ```
1094
+ ## true
1095
+ ```
1096
+
1097
+ In this example, first the 'bool' element of the list was extracted, not as a list, but as a vector,
1098
+ then the first element of the vector was extracted (note that vectors also accept the '[[]]'
1099
+ operator) and then the vector was indexed by its first element, extracting the native Ruby type.
1100
+
1101
+
1102
+ # Data Frame
1103
+
1104
+ A data frame is a table like structure in which each column has the same number of
1105
+ rows. Data frames are the basic structure for storing data for data analysis. We have already
1106
+ seen a data frame previously when we accessed variable '~:mtcars'. In order to create a
1107
+ data frame, function 'data__frame' is used:
1108
+
1109
+
1110
+ ```ruby
1111
+ df = R.data__frame(
1112
+ year: R.c(2010, 2011, 2012),
1113
+ income: R.c(1000.0, 1500.0, 2000.0))
1114
+
1115
+ puts df
1116
+ ```
1117
+
1118
+ ```
1119
+ ## year income
1120
+ ## 1 2010 1000
1121
+ ## 2 2011 1500
1122
+ ## 3 2012 2000
1123
+ ```
1124
+
1125
+ ## Data Frame Indexing
1126
+
1127
+ A data frame can be indexed the same way as a matrix, by using '[row, column]', where row and
1128
+ column can either be a numeric or the name of the row or column
1129
+
1130
+
1131
+ ```ruby
1132
+ puts (~:mtcars).head
1133
+ puts (~:mtcars)[1, 2]
1134
+ puts (~:mtcars)['Datsun 710', 'mpg']
1135
+ ```
1136
+
1137
+ ```
1138
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
1139
+ ## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
1140
+ ## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
1141
+ ## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
1142
+ ## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
1143
+ ## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
1144
+ ## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
1145
+ ## [1] 6
1146
+ ## [1] 22.8
1147
+ ```
1148
+
1149
+ Extracting a column from a data frame as a vector can be done by using the double square bracket
1150
+ operator:
1151
+
703
1152
 
704
- ## Graphics with ggplot
1153
+ ```ruby
1154
+ puts (~:mtcars)[['mpg']]
1155
+ ```
1156
+
1157
+ ```
1158
+ ## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
1159
+ ## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
1160
+ ## [29] 15.8 19.7 15.0 21.4
1161
+ ```
1162
+
1163
+ A data frame column can also be accessed as if it were an instance variable of the data frame:
1164
+
1165
+
1166
+ ```ruby
1167
+ puts (~:mtcars).mpg
1168
+ ```
1169
+
1170
+ ```
1171
+ ## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
1172
+ ## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
1173
+ ## [29] 15.8 19.7 15.0 21.4
1174
+ ```
1175
+
1176
+ Slicing a data frame can be done by indexing it with a vector (we use 'head' to reduce the
1177
+ output):
1178
+
1179
+
1180
+ ```ruby
1181
+ puts (~:mtcars)[R.c('mpg', 'hp')].head
1182
+ ```
1183
+
1184
+ ```
1185
+ ## mpg hp
1186
+ ## Mazda RX4 21.0 110
1187
+ ## Mazda RX4 Wag 21.0 110
1188
+ ## Datsun 710 22.8 93
1189
+ ## Hornet 4 Drive 21.4 110
1190
+ ## Hornet Sportabout 18.7 175
1191
+ ## Valiant 18.1 105
1192
+ ```
1193
+
1194
+ A row slice can be obtained by indexing by row and using the ':all' keyword for the column:
1195
+
1196
+
1197
+ ```ruby
1198
+ puts (~:mtcars)[R.c('Datsun 710', 'Camaro Z28'), :all]
1199
+ ```
1200
+
1201
+ ```
1202
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
1203
+ ## Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
1204
+ ## Camaro Z28 13.3 8 350 245 3.73 3.84 15.41 0 0 3 4
1205
+ ```
1206
+
1207
+ Finally, a data frame can also be indexed with a logical vector. In this next example, the
1208
+ 'am' column of :mtcars is compared with 0 (with method 'eq'). When 'am' is equal to 0 the
1209
+ car is automatic. So, by doing '(~:mtcars).am.eq 0' a logical vector is created with
1210
+ 'true' whenever 'am' is 0 and 'false' otherwise. Using this logical vector, the data frame
1211
+ is indexed, returning a new data frame in which all cars have automatic transmission.
1212
+
1213
+
1214
+ ```ruby
1215
+ # obtain a vector with 'true' for cars with automatic transmission
1216
+ automatic = (~:mtcars).am.eq 0
1217
+ puts automatic
1218
+
1219
+ # slice the data frame by using this vector
1220
+ puts (~:mtcars)[automatic, :all]
1221
+ ```
1222
+
1223
+ ```
1224
+ ## [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
1225
+ ## [12] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
1226
+ ## [23] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1227
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
1228
+ ## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
1229
+ ## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
1230
+ ## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
1231
+ ## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
1232
+ ## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
1233
+ ## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
1234
+ ## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
1235
+ ## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
1236
+ ## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
1237
+ ## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
1238
+ ## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
1239
+ ## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
1240
+ ## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
1241
+ ## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
1242
+ ## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
1243
+ ## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
1244
+ ## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
1245
+ ## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
1246
+ ## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
1247
+ ```
1248
+
1249
+ # Writing Expressions in Galaaz
1250
+
1251
+ Galaaz extends Ruby to work with complex expressions, similar to R's expressions built with 'quote'
1252
+ (base R) or 'quo' (tidyverse). Let's take a look at some of those expressions.
1253
+
1254
+ ## Expressions from operators
1255
+
1256
+ The code below
1257
+ creates an expression summing two symbols
1258
+
1259
+
1260
+ ```ruby
1261
+ exp1 = :a + :b
1262
+ puts exp1
1263
+ ```
1264
+
1265
+ ```
1266
+ ## a + b
1267
+ ```
1268
+ We can build any complex mathematical expression
1269
+
1270
+
1271
+ ```ruby
1272
+ exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
1273
+ puts exp2
1274
+ ```
1275
+
1276
+ ```
1277
+ ## (a + b) * 2 + c^2L/z
1278
+ ```
1279
+
1280
+ It is also possible to use inequality operators in building expressions
1281
+
1282
+
1283
+ ```ruby
1284
+ exp3 = (:a + :b) >= :z
1285
+ puts exp3
1286
+ ```
1287
+
1288
+ ```
1289
+ ## a + b >= z
1290
+ ```
1291
+
1292
+ Galaaz provides both symbolic representations for operators, such as (>, <, !=), and functional
1293
+ notation for those operators, such as (.gt, .ge, etc.). So the same expression written
1294
+ above can also be written as
1295
+
1296
+
1297
+ ```ruby
1298
+ exp4 = (:a + :b).ge :z
1299
+ puts exp4
1300
+ ```
1301
+
1302
+ ```
1303
+ ## a + b >= z
1304
+ ```
1305
+
1306
+ Two types of expression can only be created with the functional representation of the operators:
1307
+ those are expressions involving '==' and '='. In order to write an expression involving '==' we
1308
+ need to use the method '.eq' and for '=' we need the function '.assign'
1309
+
1310
+
1311
+ ```ruby
1312
+ exp5 = (:a + :b).eq :z
1313
+ puts exp5
1314
+ ```
1315
+
1316
+ ```
1317
+ ## a + b == z
1318
+ ```
1319
+
1320
+
1321
+ ```ruby
1322
+ exp6 = :y.assign :a + :b
1323
+ puts exp6
1324
+ ```
1325
+
1326
+ ```
1327
+ ## y <- a + b
1328
+ ```
1329
+ In general we think that using the functional notation is preferable to using the
1330
+ symbolic notation as otherwise, we end up writing invalid expressions such as
1331
+
1332
+
1333
+ ```ruby
1334
+ exp_wrong = (:a + :b) == :z
1335
+ puts exp_wrong
1336
+ ```
1337
+
1338
+ ```
1339
+ ## Message:
1340
+ ## Error in function (x, y, num.eq = TRUE, single.NA = TRUE, attrib.as.set = TRUE, :
1341
+ ## object 'a' not found (RError)
1342
+ ## Translated to internal error
1343
+ ```
1344
+ and it might be difficult to understand what is going on here. The problem lies with the fact that
1345
+ when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When the
1346
+ comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols at
1347
+ this time are not bound to anything and we get a "object 'a' not found" message.
1348
+ If we only use functional notation, this type of error will not occur.
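
The distinction between building an expression and evaluating a comparison can be sketched in plain Ruby. The `Expr` class below is hypothetical and far simpler than whatever Galaaz does internally. In this sketch '==' falls back to Ruby's ordinary equality and returns false, whereas Galaaz raises the R error shown above; in both cases no expression is built.

```ruby
# Hypothetical expression node, for illustration only.
class Expr
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # '+' and 'ge' build a larger expression instead of computing a value.
  def +(other)
    Expr.new("#{text} + #{Expr.wrap(other).text}")
  end

  def ge(other)
    Expr.new("#{text} >= #{Expr.wrap(other).text}")
  end

  # Functional form of '==': builds an expression instead of comparing.
  def eq(other)
    Expr.new("#{text} == #{Expr.wrap(other).text}")
  end

  def to_s
    text
  end

  # Turn symbols (or anything else) into an Expr so operands are uniform.
  def self.wrap(obj)
    obj.is_a?(Expr) ? obj : Expr.new(obj.to_s)
  end
end

a = Expr.new("a")
built    = (a + :b).eq(:z)   # an Expr whose text is "a + b == z"
compared = (a + :b) == :z    # ordinary Ruby equality: false, not an Expr
```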
1349
+
1350
+ ## Expressions with R methods
1351
+
1352
+ It is often necessary to create an expression that uses a method or function. For instance, in
1353
+ mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
1354
+ 'sin' function is part of the expression and should not be immediately executed. Now, let's say
1355
+ that 'x' is an angle of 45$^\circ$ and we actually want our expression to be $y = 0.707...$.
1356
+ When we want the function to be part of the expression, we call the function preceding it
1357
+ by the letter E, such as 'E.sin(x)'
1358
+
1359
+
1360
+ ```ruby
1361
+ exp7 = :y.assign E.sin(:x)
1362
+ puts exp7
1363
+ ```
1364
+
1365
+ ```
1366
+ ## y <- sin(x)
1367
+ ```
1368
+
1369
+ # Manipulating Data
1370
+
1371
+ One of the major benefits of Galaaz is to bring strong data manipulation to Ruby. The following
1372
+ examples were extracted from Hadley Wickham's "R for Data Science" (https://r4ds.had.co.nz/). This
1373
+ is a highly recommended book for those not already familiar with the 'tidyverse' style of
1374
+ programming in R. In the sections to follow, we will limit ourselves to converting the R code to
1375
+ Galaaz.
1376
+
1377
+ For these
1378
+ examples, we will investigate the nycflights13 data set available on the package by the
1379
+ same name. We use function 'R.install_and_loads' that checks if the library is available
1380
+ locally, and if not, installs it. This data frame contains all 336,776 flights that
1381
+ departed from New York City in 2013. The data comes from the US Bureau of
1382
+ Transportation Statistics.
1383
+
1384
+
1385
+ ```ruby
1386
+ R.install_and_loads('nycflights13')
1387
+ R.library('dplyr')
1388
+ ```
1389
+
1390
+
1391
+ ```ruby
1392
+ @flights = ~:flights
1393
+ puts @flights.head.as__data__frame
1394
+ ```
1395
+
1396
+ ```
1397
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1398
+ ## 1 2013 1 1 517 515 2 830 819
1399
+ ## 2 2013 1 1 533 529 4 850 830
1400
+ ## 3 2013 1 1 542 540 2 923 850
1401
+ ## 4 2013 1 1 544 545 -1 1004 1022
1402
+ ## 5 2013 1 1 554 600 -6 812 837
1403
+ ## 6 2013 1 1 554 558 -4 740 728
1404
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1405
+ ## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
1406
+ ## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
1407
+ ## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
1408
+ ## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
1409
+ ## 5 -25 DL 461 N668DN LGA ATL 116 762 6
1410
+ ## 6 12 UA 1696 N39463 EWR ORD 150 719 5
1411
+ ## minute time_hour
1412
+ ## 1 15 2013-01-01 05:00:00
1413
+ ## 2 29 2013-01-01 05:00:00
1414
+ ## 3 40 2013-01-01 05:00:00
1415
+ ## 4 45 2013-01-01 05:00:00
1416
+ ## 5 0 2013-01-01 06:00:00
1417
+ ## 6 58 2013-01-01 05:00:00
1418
+ ```
1419
+
1420
+ ## Filtering rows with Filter
1421
+
1422
+ In this example we filter the flights data set by giving the filter function two expressions:
1423
+ the first is ':month.eq 1' and the second is ':day.eq 1'
1424
+
1425
+
1426
+ ```ruby
1427
+ puts @flights.filter((:month.eq 1), (:day.eq 1)).head.as__data__frame
1428
+ ```
1429
+
1430
+ ```
1431
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1432
+ ## 1 2013 1 1 517 515 2 830 819
1433
+ ## 2 2013 1 1 533 529 4 850 830
1434
+ ## 3 2013 1 1 542 540 2 923 850
1435
+ ## 4 2013 1 1 544 545 -1 1004 1022
1436
+ ## 5 2013 1 1 554 600 -6 812 837
1437
+ ## 6 2013 1 1 554 558 -4 740 728
1438
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1439
+ ## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
1440
+ ## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
1441
+ ## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
1442
+ ## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
1443
+ ## 5 -25 DL 461 N668DN LGA ATL 116 762 6
1444
+ ## 6 12 UA 1696 N39463 EWR ORD 150 719 5
1445
+ ## minute time_hour
1446
+ ## 1 15 2013-01-01 05:00:00
1447
+ ## 2 29 2013-01-01 05:00:00
1448
+ ## 3 40 2013-01-01 05:00:00
1449
+ ## 4 45 2013-01-01 05:00:00
1450
+ ## 5 0 2013-01-01 06:00:00
1451
+ ## 6 58 2013-01-01 05:00:00
1452
+ ```
1453
+
1454
+ ## Logical Operators
1455
+
1456
+ All flights that departed in November or December
1457
+
1458
+
1459
+ ```ruby
1460
+ puts @flights.filter((:month.eq 11) | (:month.eq 12)).head.as__data__frame
1461
+ ```
1462
+
1463
+ ```
1464
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1465
+ ## 1 2013 11 1 5 2359 6 352 345
1466
+ ## 2 2013 11 1 35 2250 105 123 2356
1467
+ ## 3 2013 11 1 455 500 -5 641 651
1468
+ ## 4 2013 11 1 539 545 -6 856 827
1469
+ ## 5 2013 11 1 542 545 -3 831 855
1470
+ ## 6 2013 11 1 549 600 -11 912 923
1471
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1472
+ ## 1 7 B6 745 N568JB JFK PSE 205 1617 23
1473
+ ## 2 87 B6 1816 N353JB JFK SYR 36 209 22
1474
+ ## 3 -10 US 1895 N192UW EWR CLT 88 529 5
1475
+ ## 4 29 UA 1714 N38727 LGA IAH 229 1416 5
1476
+ ## 5 -24 AA 2243 N5CLAA JFK MIA 147 1089 5
1477
+ ## 6 -11 UA 303 N595UA JFK SFO 359 2586 6
1478
+ ## minute time_hour
1479
+ ## 1 59 2013-11-01 23:00:00
1480
+ ## 2 50 2013-11-01 22:00:00
1481
+ ## 3 0 2013-11-01 05:00:00
1482
+ ## 4 45 2013-11-01 05:00:00
1483
+ ## 5 45 2013-11-01 05:00:00
1484
+ ## 6 0 2013-11-01 06:00:00
1485
+ ```
1486
+
1487
+ The same as above, but using the 'in' operator. In R, it is possible to define custom infix
1488
+ operators of the form %<op>%. The %in% operator checks whether a value is in a vector. To use
1489
+ such operators from Galaaz, the '._' method is used: the first argument is the operator's
1490
+ symbol, in this case ':in', and the second argument is the vector:
1491
+
1492
+
1493
+ ```ruby
1494
+ puts @flights.filter(:month._ :in, R.c(11, 12)).head.as__data__frame
1495
+ ```
1496
+
1497
+ ```
1498
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1499
+ ## 1 2013 11 1 5 2359 6 352 345
1500
+ ## 2 2013 11 1 35 2250 105 123 2356
1501
+ ## 3 2013 11 1 455 500 -5 641 651
1502
+ ## 4 2013 11 1 539 545 -6 856 827
1503
+ ## 5 2013 11 1 542 545 -3 831 855
1504
+ ## 6 2013 11 1 549 600 -11 912 923
1505
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1506
+ ## 1 7 B6 745 N568JB JFK PSE 205 1617 23
1507
+ ## 2 87 B6 1816 N353JB JFK SYR 36 209 22
1508
+ ## 3 -10 US 1895 N192UW EWR CLT 88 529 5
1509
+ ## 4 29 UA 1714 N38727 LGA IAH 229 1416 5
1510
+ ## 5 -24 AA 2243 N5CLAA JFK MIA 147 1089 5
1511
+ ## 6 -11 UA 303 N595UA JFK SFO 359 2586 6
1512
+ ## minute time_hour
1513
+ ## 1 59 2013-11-01 23:00:00
1514
+ ## 2 50 2013-11-01 22:00:00
1515
+ ## 3 0 2013-11-01 05:00:00
1516
+ ## 4 45 2013-11-01 05:00:00
1517
+ ## 5 45 2013-11-01 05:00:00
1518
+ ## 6 0 2013-11-01 06:00:00
1519
+ ```
1520
+
1521
+ ## Filtering with NA (Not Available)
1522
+
1523
+ Let's first create a 'tibble' with a Not Available value (R::NA). A tibble is a modern
1524
+ version of a data frame and operates very similarly to one. It differs in how it prints
1525
+ its values and in the results of some subsetting operations, which are more consistent than
1526
+ those obtained from a data frame.
1527
+
1528
+
1529
+ ```ruby
1530
+ @df = R.tibble(x: R.c(1, R::NA, 3))
1531
+ puts @df.as__data__frame
1532
+ ```
1533
+
1534
+ ```
1535
+ ## x
1536
+ ## 1 1
1537
+ ## 2 NA
1538
+ ## 3 3
1539
+ ```
1540
+
1541
+ Filtering by :x > 1 now returns only the rows that satisfy this condition; the row with R::NA
1542
+ does not.
705
1543
 
706
1544
 
707
1545
  ```ruby
708
- require 'ggplot'
1546
+ puts @df.filter(:x > 1).as__data__frame
1547
+ ```
1548
+
1549
+ ```
1550
+ ## x
1551
+ ## 1 3
1552
+ ```
1553
+
1554
+ To also keep rows with NA, test for them explicitly with the method 'is__na':
1555
+
1556
+
1557
+ ```ruby
1558
+ puts @df.filter((:x.is__na) | (:x > 1)).as__data__frame
1559
+ ```
1560
+
1561
+ ```
1562
+ ## x
1563
+ ## 1 NA
1564
+ ## 2 3
1565
+ ```
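The NA behaviour above can be sketched in plain Ruby, without Galaaz. This is only an illustration of the three-valued logic, assuming `nil` as a stand-in for R's NA; the lambda name `greater_than_one` is hypothetical and not part of any Galaaz API.

```ruby
# Plain-Ruby sketch; nil stands in for R's NA (an assumption for illustration).
rows = [1, nil, 3]

# Three-valued comparison: an unknown value compared with anything is unknown.
greater_than_one = ->(x) { x.nil? ? nil : x > 1 }

# A filter keeps a row only when the condition is exactly true,
# so rows whose condition is unknown (nil) are dropped.
kept = rows.select { |x| greater_than_one.call(x) == true }

# Asking for missing values explicitly brings them back.
kept_with_na = rows.select { |x| x.nil? || greater_than_one.call(x) == true }

p kept          # [3]
p kept_with_na  # [nil, 3]
```

The second selection mirrors the `(:x.is__na) | (:x > 1)` expression: the missing value is kept because it is tested for explicitly, not because the comparison succeeded.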
709
1566
 
710
- R.theme_set R.theme_bw
1567
+ ## Arrange Rows with arrange
711
1568
 
712
- # Data Prep
713
- mtcars = ~:mtcars
714
- mtcars.car_name = R.rownames(:mtcars)
715
- # compute normalized mpg
716
- mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
717
- mtcars.mpg_type = mtcars.mpg_z < 0 ? "below" : "above"
718
- mtcars = mtcars[mtcars.mpg_z.order, :all]
719
- # convert to factor to retain sorted order in plot
720
- mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
1569
+ Arrange reorders the rows of a data frame by the given arguments.
721
1570
 
722
- # Diverging Barcharts
723
- gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
724
- R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
725
- R.scale_fill_manual(name: "Mileage",
726
- labels: R.c("Above Average", "Below Average"),
727
- values: R.c("above": "#00ba38", "below": "#f8766d")) +
728
- R.labs(subtitle: "Normalised mileage from 'mtcars'",
729
- title: "Diverging Bars") +
730
- R.coord_flip()
731
1571
 
732
- puts gg
1572
+ ```ruby
1573
+ puts @flights.arrange(:year, :month, :day).head.as__data__frame
1574
+ ```
1575
+
1576
+ ```
1577
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1578
+ ## 1 2013 1 1 517 515 2 830 819
1579
+ ## 2 2013 1 1 533 529 4 850 830
1580
+ ## 3 2013 1 1 542 540 2 923 850
1581
+ ## 4 2013 1 1 544 545 -1 1004 1022
1582
+ ## 5 2013 1 1 554 600 -6 812 837
1583
+ ## 6 2013 1 1 554 558 -4 740 728
1584
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1585
+ ## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
1586
+ ## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
1587
+ ## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
1588
+ ## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
1589
+ ## 5 -25 DL 461 N668DN LGA ATL 116 762 6
1590
+ ## 6 12 UA 1696 N39463 EWR ORD 150 719 5
1591
+ ## minute time_hour
1592
+ ## 1 15 2013-01-01 05:00:00
1593
+ ## 2 29 2013-01-01 05:00:00
1594
+ ## 3 40 2013-01-01 05:00:00
1595
+ ## 4 45 2013-01-01 05:00:00
1596
+ ## 5 0 2013-01-01 06:00:00
1597
+ ## 6 58 2013-01-01 05:00:00
733
1598
  ```
734
1599
 
1600
+ To arrange in descending order, use the function 'desc':
1601
+
1602
+
1603
+ ```ruby
1604
+ puts @flights.arrange(:dep_delay.desc).head.as__data__frame
1605
+ ```
1606
+
1607
+ ```
1608
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1609
+ ## 1 2013 1 9 641 900 1301 1242 1530
1610
+ ## 2 2013 6 15 1432 1935 1137 1607 2120
1611
+ ## 3 2013 1 10 1121 1635 1126 1239 1810
1612
+ ## 4 2013 9 20 1139 1845 1014 1457 2210
1613
+ ## 5 2013 7 22 845 1600 1005 1044 1815
1614
+ ## 6 2013 4 10 1100 1900 960 1342 2211
1615
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1616
+ ## 1 1272 HA 51 N384HA JFK HNL 640 4983 9
1617
+ ## 2 1127 MQ 3535 N504MQ JFK CMH 74 483 19
1618
+ ## 3 1109 MQ 3695 N517MQ EWR ORD 111 719 16
1619
+ ## 4 1007 AA 177 N338AA JFK SFO 354 2586 18
1620
+ ## 5 989 MQ 3075 N665MQ JFK CVG 96 589 16
1621
+ ## 6 931 DL 2391 N959DL JFK TPA 139 1005 19
1622
+ ## minute time_hour
1623
+ ## 1 0 2013-01-09 09:00:00
1624
+ ## 2 35 2013-06-15 19:00:00
1625
+ ## 3 35 2013-01-10 16:00:00
1626
+ ## 4 45 2013-09-20 18:00:00
1627
+ ## 5 0 2013-07-22 16:00:00
1628
+ ## 6 0 2013-04-10 19:00:00
1629
+ ```
1630
+
1631
+ ## Selecting columns
1632
+
1633
+ To select specific columns from a dataset we use function 'select':
1634
+
1635
+
1636
+ ```ruby
1637
+ puts @flights.select(:year, :month, :day).head.as__data__frame
1638
+ ```
1639
+
1640
+ ```
1641
+ ## year month day
1642
+ ## 1 2013 1 1
1643
+ ## 2 2013 1 1
1644
+ ## 3 2013 1 1
1645
+ ## 4 2013 1 1
1646
+ ## 5 2013 1 1
1647
+ ## 6 2013 1 1
1648
+ ```
1649
+
1650
+ It is also possible to select columns in a given range:
1651
+
1652
+
1653
+ ```ruby
1654
+ puts @flights.select(:year.up_to :day).head.as__data__frame
1655
+ ```
1656
+
1657
+ ```
1658
+ ## year month day
1659
+ ## 1 2013 1 1
1660
+ ## 2 2013 1 1
1661
+ ## 3 2013 1 1
1662
+ ## 4 2013 1 1
1663
+ ## 5 2013 1 1
1664
+ ## 6 2013 1 1
1665
+ ```
1666
+
1667
+ Select all columns that start with a given name sequence
1668
+
1669
+
1670
+ ```ruby
1671
+ puts @flights.select(E.starts_with('arr')).head.as__data__frame
1672
+ ```
1673
+
1674
+ ```
1675
+ ## arr_time arr_delay
1676
+ ## 1 830 11
1677
+ ## 2 850 20
1678
+ ## 3 923 33
1679
+ ## 4 1004 -18
1680
+ ## 5 812 -25
1681
+ ## 6 740 12
1682
+ ```
1683
+
1684
+ Other functions that can be used:
1685
+
1686
+ * ends_with("xyz"): matches names that end with “xyz”.
1687
+
1688
+ * contains("ijk"): matches names that contain “ijk”.
1689
+
1690
+ * matches("(.)\\1"): selects variables that match a regular expression. This one matches
1691
+ any variables that contain repeated characters.
1692
+
1693
+ * num_range("x", (1..3)): matches x1, x2 and x3
1694
+
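To get a feel for what these name-matching helpers select, the sketch below applies the equivalent plain-Ruby tests to the flights column names shown in the outputs above. It does not call Galaaz or dplyr; it only assumes that the helpers behave like ordinary prefix, suffix, and regular-expression matches on column names. Note that the R string "(.)\\1" corresponds to the Ruby regex literal /(.)\1/.

```ruby
# Column names of the flights data set, copied from the printed output.
columns = %w[year month day dep_time sched_dep_time dep_delay arr_time
             sched_arr_time arr_delay carrier flight tailnum origin dest
             air_time distance hour minute time_hour]

# Analogue of starts_with('arr')
starts_arr = columns.select { |name| name.start_with?('arr') }

# Analogue of ends_with('delay')
ends_delay = columns.select { |name| name.end_with?('delay') }

# Analogue of matches("(.)\\1"): any name with a character repeated twice in a row
repeated = columns.grep(/(.)\1/)

p starts_arr  # ["arr_time", "arr_delay"]
p ends_delay  # ["dep_delay", "arr_delay"]
p repeated    # ["arr_time", "sched_arr_time", "arr_delay", "carrier"]
```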
1695
+ A helper function that comes in handy when we just want to rearrange column order is 'everything':
1696
+
1697
+
1698
+ ```ruby
1699
+ puts @flights.select(:year, :month, :day, E.everything).head.as__data__frame
1700
+ ```
1701
+
1702
+ ```
1703
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1704
+ ## 1 2013 1 1 517 515 2 830 819
1705
+ ## 2 2013 1 1 533 529 4 850 830
1706
+ ## 3 2013 1 1 542 540 2 923 850
1707
+ ## 4 2013 1 1 544 545 -1 1004 1022
1708
+ ## 5 2013 1 1 554 600 -6 812 837
1709
+ ## 6 2013 1 1 554 558 -4 740 728
1710
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1711
+ ## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
1712
+ ## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
1713
+ ## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
1714
+ ## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
1715
+ ## 5 -25 DL 461 N668DN LGA ATL 116 762 6
1716
+ ## 6 12 UA 1696 N39463 EWR ORD 150 719 5
1717
+ ## minute time_hour
1718
+ ## 1 15 2013-01-01 05:00:00
1719
+ ## 2 29 2013-01-01 05:00:00
1720
+ ## 3 40 2013-01-01 05:00:00
1721
+ ## 4 45 2013-01-01 05:00:00
1722
+ ## 5 0 2013-01-01 06:00:00
1723
+ ## 6 58 2013-01-01 05:00:00
1724
+ ```
1725
+
1726
+ ## Add variables to a dataframe with 'mutate'
1727
+
1728
+
1729
+ ```ruby
1730
+ @flights_sm = @flights.
1731
+ select((:year.up_to :day),
1732
+ E.ends_with('delay'),
1733
+ :distance,
1734
+ :air_time)
1735
+
1736
+ puts @flights_sm.head.as__data__frame
1737
+ ```
1738
+
1739
+ ```
1740
+ ## year month day dep_delay arr_delay distance air_time
1741
+ ## 1 2013 1 1 2 11 1400 227
1742
+ ## 2 2013 1 1 4 20 1416 227
1743
+ ## 3 2013 1 1 2 33 1089 160
1744
+ ## 4 2013 1 1 -1 -18 1576 183
1745
+ ## 5 2013 1 1 -6 -25 762 116
1746
+ ## 6 2013 1 1 -4 12 719 150
1747
+ ```
1748
+
1749
+
1750
+ ```ruby
1751
+ @flights_sm = @flights_sm.
1752
+ mutate(gain: :dep_delay - :arr_delay,
1753
+ speed: :distance / :air_time * 60)
1754
+ puts @flights_sm.head.as__data__frame
1755
+ ```
1756
+
1757
+ ```
1758
+ ## year month day dep_delay arr_delay distance air_time gain speed
1759
+ ## 1 2013 1 1 2 11 1400 227 -9 370.0441
1760
+ ## 2 2013 1 1 4 20 1416 227 -16 374.2731
1761
+ ## 3 2013 1 1 2 33 1089 160 -31 408.3750
1762
+ ## 4 2013 1 1 -1 -18 1576 183 17 516.7213
1763
+ ## 5 2013 1 1 -6 -25 762 116 19 394.1379
1764
+ ## 6 2013 1 1 -4 12 719 150 -16 287.6000
1765
+ ```
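As a quick check of the two new columns, the same arithmetic can be reproduced in plain Ruby for the first row of the output. This is a sketch outside Galaaz; the hash `row` is a hypothetical container for the printed values. Unlike R, Ruby divides two integers with integer division, so one operand is converted to a float here.

```ruby
# Values of the first row, copied from the printed output above.
row = { dep_delay: 2, arr_delay: 11, distance: 1400, air_time: 227 }

# gain: minutes made up in the air (negative means more delay was added)
gain = row[:dep_delay] - row[:arr_delay]

# speed: distance per minute of air time, scaled by 60 to get a per-hour figure.
# to_f avoids Ruby's integer division (1400 / 227 would give 6).
speed = row[:distance].to_f / row[:air_time] * 60

puts gain            # -9
puts speed.round(4)  # 370.0441
```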
1766
+
1767
+ # Graphics in Galaaz
1768
+
1769
+ Creating graphics in Galaaz is quite easy, as it can use all the power of ggplot2. There are
1770
+ many resources on the web that teach ggplot, so here we give a quick example of ggplot
1771
+ integration with Ruby. We continue to use the :mtcars dataset and we will plot a diverging
1772
+ bar plot, showing cars that have 'above' or 'below' average gas mileage. Let's first prepare
1773
+ the data frame with the necessary data:
1774
+
1775
+
1776
+ ```ruby
1777
+ # copy the R variable :mtcars to the Ruby mtcars variable
1778
+ @mtcars = ~:mtcars
1779
+
1780
+ # create a new column 'car_name' to store the car names so that it can be
1781
+ # used for plotting. The 'rownames' of the data frame cannot be used as
1782
+ # data for plotting
1783
+ @mtcars.car_name = R.rownames(:mtcars)
1784
+
1785
+ # compute normalized mpg and add it to a new column called mpg_z
1786
+ # Note that the mean value for mpg can be obtained by calling the 'mean'
1787
+ # function on the vector 'mtcars.mpg'. The same with the standard
1788
+ # deviation 'sd'. The vector is then rounded to two digits with 'round 2'
1789
+ @mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
1790
+
1791
+ # create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
1792
+ # that looks at every element of the mpg_z vector and if the value is below
1793
+ # 0, returns 'below', otherwise returns 'above'
1794
+ @mtcars.mpg_type = (@mtcars.mpg_z < 0).ifelse("below", "above")
1795
+
1796
+ # order the mtcars data set by the mpg_z vector from smaller to larger values
1797
+ @mtcars = @mtcars[@mtcars.mpg_z.order, :all]
1798
+
1799
+ # convert the car_name column to a factor to retain sorted order in plot
1800
+ @mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
1801
+
1802
+ # let's look at the final data frame
1803
+ puts @mtcars
1804
+ ```
1805
+
1806
+ ```
1807
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
1808
+ ## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
1809
+ ## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
1810
+ ## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
1811
+ ## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
1812
+ ## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
1813
+ ## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
1814
+ ## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
1815
+ ## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
1816
+ ## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
1817
+ ## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
1818
+ ## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
1819
+ ## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
1820
+ ## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
1821
+ ## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
1822
+ ## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
1823
+ ## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
1824
+ ## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
1825
+ ## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
1826
+ ## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
1827
+ ## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
1828
+ ## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
1829
+ ## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
1830
+ ## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
1831
+ ## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
1832
+ ## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
1833
+ ## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
1834
+ ## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
1835
+ ## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
1836
+ ## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
1837
+ ## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
1838
+ ## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
1839
+ ## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
1840
+ ## car_name mpg_z mpg_type
1841
+ ## Cadillac Fleetwood Cadillac Fleetwood -1.61 below
1842
+ ## Lincoln Continental Lincoln Continental -1.61 below
1843
+ ## Camaro Z28 Camaro Z28 -1.13 below
1844
+ ## Duster 360 Duster 360 -0.96 below
1845
+ ## Chrysler Imperial Chrysler Imperial -0.89 below
1846
+ ## Maserati Bora Maserati Bora -0.84 below
1847
+ ## Merc 450SLC Merc 450SLC -0.81 below
1848
+ ## AMC Javelin AMC Javelin -0.81 below
1849
+ ## Dodge Challenger Dodge Challenger -0.76 below
1850
+ ## Ford Pantera L Ford Pantera L -0.71 below
1851
+ ## Merc 450SE Merc 450SE -0.61 below
1852
+ ## Merc 450SL Merc 450SL -0.46 below
1853
+ ## Merc 280C Merc 280C -0.38 below
1854
+ ## Valiant Valiant -0.33 below
1855
+ ## Hornet Sportabout Hornet Sportabout -0.23 below
1856
+ ## Merc 280 Merc 280 -0.15 below
1857
+ ## Pontiac Firebird Pontiac Firebird -0.15 below
1858
+ ## Ferrari Dino Ferrari Dino -0.06 below
1859
+ ## Mazda RX4 Mazda RX4 0.15 above
1860
+ ## Mazda RX4 Wag Mazda RX4 Wag 0.15 above
1861
+ ## Hornet 4 Drive Hornet 4 Drive 0.22 above
1862
+ ## Volvo 142E Volvo 142E 0.22 above
1863
+ ## Toyota Corona Toyota Corona 0.23 above
1864
+ ## Datsun 710 Datsun 710 0.45 above
1865
+ ## Merc 230 Merc 230 0.45 above
1866
+ ## Merc 240D Merc 240D 0.72 above
1867
+ ## Porsche 914-2 Porsche 914-2 0.98 above
1868
+ ## Fiat X1-9 Fiat X1-9 1.20 above
1869
+ ## Honda Civic Honda Civic 1.71 above
1870
+ ## Lotus Europa Lotus Europa 1.71 above
1871
+ ## Fiat 128 Fiat 128 2.04 above
1872
+ ## Toyota Corolla Toyota Corolla 2.29 above
1873
+ ```
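The mpg_z and mpg_type columns can be cross-checked with a plain-Ruby sketch that does not use Galaaz. It assumes that 'sd' is the sample standard deviation (dividing by n - 1, as R does) and uses the 32 mpg values exactly as printed in the table above; the variable names are illustrative only.

```ruby
# mpg values copied from the printed table, already sorted from lowest to highest.
mpg = [10.4, 10.4, 13.3, 14.3, 14.7, 15.0, 15.2, 15.2, 15.5, 15.8, 16.4,
       17.3, 17.8, 18.1, 18.7, 19.2, 19.2, 19.7, 21.0, 21.0, 21.4, 21.4,
       21.5, 22.8, 22.8, 24.4, 26.0, 27.3, 30.4, 30.4, 32.4, 33.9]

mean = mpg.sum / mpg.size

# Sample standard deviation: squared deviations divided by (n - 1).
sd = Math.sqrt(mpg.sum { |x| (x - mean)**2 } / (mpg.size - 1))

# Normalized mileage, rounded to two digits as in the manual.
mpg_z = mpg.map { |x| ((x - mean) / sd).round(2) }

# Element-wise analogue of ifelse: label each car by the sign of its z-score.
mpg_type = mpg_z.map { |z| z < 0 ? 'below' : 'above' }

puts mpg_z.first              # -1.61
puts mpg_z.last               # 2.29
puts mpg_type.count('below')  # 18
```

The first and last z-scores agree with the Cadillac Fleetwood and Toyota Corolla rows, and 18 of the 32 cars fall below the mean.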
1874
+ Now, let's plot the diverging bar plot. When using gKnit, there is no need to call
1875
+ 'R.awt' to create a plotting device, since gKnit takes care of it:
735
1876
 
736
- ![](/home/rbotafogo/desenv/galaaz/blogs/manual/manual_files/figure-html/diverging_bar.png)<!-- -->
737
1877
 
738
1878
 
739
1879
  [TO BE CONTINUED...]