galaaz 0.4.7 → 0.4.8

Files changed (68)
  1. checksums.yaml +4 -4
  2. data/README.md +1179 -39
  3. data/Rakefile +4 -2
  4. data/bin/grun +1 -1
  5. data/bin/gstudio +1 -1
  6. data/blogs/dev/dev.Rmd +2 -56
  7. data/blogs/dev/dev.md +32 -61
  8. data/blogs/dev/dev2.Rmd +65 -0
  9. data/blogs/dplyr/dplyr.Rmd +29 -0
  10. data/blogs/{dev/dev.html → dplyr/dplyr.html} +88 -57
  11. data/blogs/dplyr/dplyr.md +58 -0
  12. data/blogs/gknit/gknit.html +1262 -25
  13. data/blogs/gknit/gknit.md +471 -27
  14. data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
  15. data/blogs/manual/graph.rb +29 -0
  16. data/blogs/manual/manual.Rmd +567 -29
  17. data/blogs/manual/manual.html +743 -46
  18. data/blogs/manual/manual.md +1179 -39
  19. data/blogs/nse_dplyr/nse_dplyr.Rmd +466 -11
  20. data/blogs/nse_dplyr/nse_dplyr.html +472 -37
  21. data/blogs/nse_dplyr/nse_dplyr.md +645 -32
  22. data/blogs/ruby_plot/ruby_plot.Rmd +4 -4
  23. data/blogs/ruby_plot/ruby_plot.html +217 -2
  24. data/blogs/ruby_plot/ruby_plot.md +226 -1
  25. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  26. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +2 -2
  27. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  28. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +70 -70
  29. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  30. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +72 -72
  31. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +116 -116
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +176 -176
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +236 -236
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +176 -176
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +160 -160
  42. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +105 -105
  44. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +121 -121
  46. data/examples/islr/ch2.spec.rb +1 -1
  47. data/examples/islr/ch3_boston.rb +4 -4
  48. data/examples/islr/x_y_rnorm.jpg +0 -0
  49. data/lib/R_interface/r.rb +1 -1
  50. data/lib/R_interface/r_methods.rb +2 -2
  51. data/lib/R_interface/rdata_frame.rb +8 -5
  52. data/lib/R_interface/rindexed_object.rb +1 -2
  53. data/lib/R_interface/rlist.rb +1 -0
  54. data/lib/R_interface/robject.rb +0 -1
  55. data/lib/R_interface/rpkg.rb +14 -6
  56. data/lib/R_interface/rsupport.rb +7 -9
  57. data/lib/R_interface/ruby_extensions.rb +17 -5
  58. data/lib/gknit/knitr_engine.rb +9 -2
  59. data/lib/util/exec_ruby.rb +2 -2
  60. data/specs/r_dataframe.spec.rb +173 -0
  61. data/specs/r_list.spec.rb +4 -4
  62. data/specs/ruby_expression.spec.rb +2 -11
  63. data/specs/tmp.rb +76 -34
  64. data/version.rb +1 -1
  65. metadata +17 -6
  66. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  67. data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
  68. data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c79823d334f1b44a16be1d98c2c075b04ac734c3b80003334b4cc6ad6697a975
4
- data.tar.gz: 818de59d34378899f8b442e301624748d5447137768a9b9131721b86a8d25bdd
3
+ metadata.gz: 58c62af9441e2a11082559e619498f518d1060d64e4078dea85a89cb795305d5
4
+ data.tar.gz: 72d415df8641497ed7ca1a442a8e06e5eaab21aa7f9162dfa132cf2728808948
5
5
  SHA512:
6
- metadata.gz: '090b147d622051c62aeda052e3d77b23aa9db5478bebf869d99d73bdc377db01b49ead2b09db084c4acf34f8712b839ca1b1a7f45ddf5c22125cf49f5b2b52ab'
7
- data.tar.gz: d256afe6d9fbabcd44491762e6b3a6c618b29c7302b08621c89a92e0b94681d0519e45f0271db944ad96d0dd2e54133cba01f57a6602edfd1a34b23a1b14e1bc
6
+ metadata.gz: 7a6559d5aa01378730bdd1e9b47ec157eaea0dddf70f2d8974a6de36a86c746e429d71293adb511ce746cdd5ffba913eb3db7e9c39b8b0b370a8162ee0f511e3
7
+ data.tar.gz: e247f80a65ed5a338b553ce0e2226d46ab4527e8460beb3a1871646f6cbf83be653d8fa0d57fbd32c727377fa826e7b84463858ac0c9047baf7fe9c21ff82ad0
data/README.md CHANGED
@@ -22,7 +22,6 @@ fontsize: 11pt
22
22
 
23
23
 
24
24
 
25
-
26
25
  # Introduction
27
26
 
28
27
  Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with a large
@@ -96,9 +95,20 @@ Panda, SciPy, SciKit-Learn and a couple more.
96
95
 
97
96
  > galaaz master_list:scatter_plot
98
97
 
99
- # Basic Types
98
+ # gKnitting a Document
99
+
100
+ This manual has been formatted using gKnit. gKnit uses Knitr and R markdown to knit
101
+ a document in Ruby or R and output it in any of the available formats for R markdown.
102
+ gKnit runs atop GraalVM and Galaaz. In gKnit, Ruby variables are persisted between
103
+ chunks, making it an ideal solution for literate programming.
104
+ Also, since it is based on Galaaz, Ruby chunks can have access to R variables and Polyglot
105
+ Programming with Ruby and R is quite natural.
106
+
107
+ gKnit is described in more depth in:
108
+
109
+ * xxx.xxxx.xxx
100
110
 
101
- ## Vectors
111
+ # Vector
102
112
 
103
113
  Vectors can be thought of as contiguous cells containing data. Cells are accessed through
104
114
  indexing operations such as x[5]. Galaaz has six basic (‘atomic’) vector types: logical,
@@ -134,11 +144,13 @@ puts @vec
134
144
 
135
145
  Lets take a look at the type, mode and storage.mode of our vector @vec. In order to print
136
146
  this out, we are creating a data frame 'df' and printing it out. A data frame, for those
137
- not familiar with it, it basically a table. Here we create the data frame and add the
147
+ not familiar with it, is basically a table. Here we create the data frame and add the
138
148
  column name by passing named parameters for each column, such as 'typeof:', 'mode:' and
139
- 'storage__mode'. You should also note here that the double underscore is converted to a '.'.
149
+ 'storage__mode?'. You should also note here that the double underscore is converted to a '.'.
150
+ So, when printed 'storage\_\_mode' will actually print as 'storage.mode'.
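The naming convention described here can be sketched in plain Ruby. The helper `r_name` is hypothetical and is not part of Galaaz; it only assumes that every double underscore in a Ruby-side name stands for a '.' in the R-side name, as the text states.

```ruby
# Hypothetical helper (not Galaaz's API): maps a Ruby-side name to the
# R-side name by turning each double underscore into a dot.
def r_name(ruby_name)
  ruby_name.to_s.gsub("__", ".")
end

p r_name(:storage__mode)   # R-side name for the 'storage__mode' column
p r_name(:data__frame)     # R-side name for the 'data__frame' method
```

Single underscores are left untouched in this sketch, so a name such as 'kable_styling' would map to itself.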
140
151
 
141
- In R, the method used to create a data frame is 'data.frame', in Galaaz we use 'data__frame'.
152
+ Data frames will later be more carefully described. In R, the method used to create a
153
+ data frame is 'data.frame', in Galaaz we use 'data\_\_frame'.
142
154
 
143
155
 
144
156
  ```ruby
@@ -207,25 +219,235 @@ vec = R.c(1, hello, 5)
207
219
  ```
208
220
  ## Message:
209
221
  ## (eval):1:in `exec_ruby'
210
- ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:137:in `instance_eval'
211
- ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:137:in `exec_ruby'
212
- ## /home/rbotafogo/desenv/galaaz/lib/gknit/ruby_engine.rb:55:in `block in initialize'
222
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `instance_eval'
223
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `exec_ruby'
224
+ ## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:657:in `block in initialize'
213
225
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `call'
214
226
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `callback'
215
227
  ## (eval):3:in `function(...) {\n rb_method(...)'
216
228
  ## unknown.r:1:in `in_dir'
217
- ## unknown.r:1:in `block_exec'
218
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:91:in `call_block'
219
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
220
- ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:3:in `<no source>'
229
+ ## unknown.r:1:in `block_exec:BLOCK0'
230
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:102:in `block_exec'
231
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:92:in `call_block'
232
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
233
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/block.R:3:in `<no source>'
221
234
  ## unknown.r:1:in `withCallingHandlers'
222
235
  ## unknown.r:1:in `process_file'
223
- ## unknown.r:1:in `<no source>'
224
- ## unknown.r:1:in `<no source>'
225
- ## <REPL>:4:in `<repl wrapper>'
236
+ ## unknown.r:1:in `<no source>:BLOCK1'
237
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/knitr/R/output.R:129:in `<no source>'
238
+ ## unknown.r:1:in `<no source>:BLOCK1'
239
+ ## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc15/jre/languages/R/library/rmarkdown/R/render.R:162:in `<no source>'
240
+ ## <REPL>:5:in `<repl wrapper>'
226
241
  ## <REPL>:1
227
242
  ```
228
243
 
244
+ Here is a vector with logical values
245
+
246
+
247
+ ```ruby
248
+ @vec = R.c(true, true, false, false, true)
249
+ puts @vec
250
+ ```
251
+
252
+ ```
253
+ ## [1] TRUE TRUE FALSE FALSE TRUE
254
+ ```
255
+
256
+ ## Combining Vectors
257
+
258
+ The 'c' function used to create vectors can also be used to combine two vectors:
259
+
260
+
261
+ ```ruby
262
+ @vec1 = R.c(10.0, 20.0, 30.0)
263
+ @vec2 = R.c(4.0, 5.0, 6.0)
264
+ @vec = R.c(@vec1, @vec2)
265
+ puts @vec
266
+ ```
267
+
268
+ ```
269
+ ## [1] 10 20 30 4 5 6
270
+ ```
271
+ In Galaaz, methods can be chained (somewhat like the pipe operator in R %>%, but more generic).
272
+ In this next example, method 'c' is chained after '@vec1'. This also looks like 'c' is a
273
+ method of the vector, but in reality, this is actually closer to the pipe operator. When
274
+ Galaaz identifies that 'c' is not a method of '@vec1' it actually tries to call 'R.c' with
275
+ '@vec1' as the first argument concatenated with all the other available arguments. The code
276
+ below is automatically converted to the code above.
277
+
278
+
279
+ ```ruby
280
+ @vec = @vec1.c(@vec2)
281
+ puts @vec
282
+ ```
283
+
284
+ ```
285
+ ## [1] 10 20 30 4 5 6
286
+ ```
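The forwarding behaviour described above (an unknown method on a vector becomes a call to the R function with the receiver as first argument) can be sketched in plain Ruby with `method_missing`. This is only an illustration under stated assumptions: `FakeR` and `Vec` are hypothetical stand-ins, not Galaaz classes, and Galaaz's real dispatch may differ.

```ruby
# Hypothetical stand-in for the R namespace; NOT Galaaz's implementation.
module FakeR
  # Combines any number of vectors into one, like R's c().
  def self.c(*vectors)
    Vec.new(vectors.flat_map(&:to_a))
  end
end

# Hypothetical vector wrapper used only for this sketch.
class Vec
  def initialize(values)
    @values = values
  end

  def to_a
    @values
  end

  # Methods the vector does not define are forwarded to FakeR,
  # with the receiver prepended to the argument list.
  def method_missing(name, *args, &block)
    if FakeR.respond_to?(name)
      FakeR.public_send(name, self, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    FakeR.respond_to?(name) || super
  end
end

vec1 = Vec.new([10.0, 20.0, 30.0])
vec2 = Vec.new([4.0, 5.0, 6.0])

# vec1.c(vec2) is rewritten to FakeR.c(vec1, vec2)
p vec1.c(vec2).to_a
```

Methods that neither `Vec` nor `FakeR` define still raise `NoMethodError`, because the sketch falls back to `super`.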
287
+
288
+ ## Vector Arithmetic
289
+
290
+ Arithmetic operations on vectors are performed element by element:
291
+
292
+
293
+ ```ruby
294
+ puts @vec1 + @vec2
295
+ ```
296
+
297
+ ```
298
+ ## [1] 14 25 36
299
+ ```
300
+
301
+
302
+ ```ruby
303
+ puts @vec1 * 5
304
+ ```
305
+
306
+ ```
307
+ ## [1] 50 100 150
308
+ ```
309
+
310
+ When vectors have different lengths, a recycling rule is applied to the shorter vector:
311
+
312
+
313
+ ```ruby
314
+ @vec3 = R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
315
+ puts @vec4 = @vec1 + @vec3
316
+ ```
317
+
318
+ ```
319
+ ## [1] 11 22 33 14 25 36 17 28 39
320
+ ```
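For readers unfamiliar with R's recycling rule, the following plain-Ruby sketch reproduces the addition above without Galaaz. The helper `recycle_add` is hypothetical and operates on ordinary Ruby arrays; it assumes the shorter operand is simply reused from its start whenever it runs out.

```ruby
# Hypothetical helper: element-wise addition with R-style recycling.
# The shorter array is repeated cyclically to match the longer one.
def recycle_add(a, b)
  long, short = a.length >= b.length ? [a, b] : [b, a]
  long.each_with_index.map { |x, i| x + short[i % short.length] }
end

vec1 = [10.0, 20.0, 30.0]
vec3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

p recycle_add(vec1, vec3)
# => [11.0, 22.0, 33.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]
```

R additionally warns when the longer length is not a multiple of the shorter one; the sketch omits that check.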
321
+
322
+ ## Vector Indexing
323
+
324
+ Vectors can be indexed by using the '[]' operator:
325
+
326
+
327
+ ```ruby
328
+ puts @vec4[3]
329
+ ```
330
+
331
+ ```
332
+ ## [1] 33
333
+ ```
334
+
335
+ We can also index a vector with another vector. For example, in the code below, we take elements
336
+ 1, 3, 5, and 7 from @vec4:
337
+
338
+
339
+ ```ruby
340
+ puts @vec4[R.c(1, 3, 5, 7)]
341
+ ```
342
+
343
+ ```
344
+ ## [1] 11 33 25 17
345
+ ```
346
+
347
+ Repeating an index and having indices out of order is valid code:
348
+
349
+
350
+ ```ruby
351
+ puts @vec4[R.c(1, 3, 3, 1)]
352
+ ```
353
+
354
+ ```
355
+ ## [1] 11 33 33 11
356
+ ```
357
+
358
+ It is also possible to index a vector with a negative number or negative vector. In these cases
359
+ the indexed values are not returned:
360
+
361
+
362
+ ```ruby
363
+ puts @vec4[-3]
364
+ puts @vec4[-R.c(1, 3, 5, 7)]
365
+ ```
366
+
367
+ ```
368
+ ## [1] 11 22 14 25 36 17 28 39
369
+ ## [1] 22 14 36 28 39
370
+ ```
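The indexing rules shown so far (1-based positions, repeated and out-of-order indices, and negative indices that exclude elements) can be summarised with a plain-Ruby sketch. The helper `r_index` is hypothetical, works on ordinary arrays, and assumes the indices are either all positive or all negative; `nil` stands in for R's NA.

```ruby
# Hypothetical helper mimicking R-style vector indexing on a Ruby array.
def r_index(values, indices)
  indices = Array(indices)
  return [] if indices.empty?

  if indices.all?(&:negative?)
    # Negative indices name the (1-based) positions to leave out.
    drop = indices.map { |i| -i - 1 }
    values.reject.with_index { |_, pos| drop.include?(pos) }
  elsif indices.all?(&:positive?)
    # Positive indices are 1-based; out-of-range positions yield nil (NA).
    indices.map { |i| values[i - 1] }
  else
    raise ArgumentError, "indices must be all positive or all negative"
  end
end

vec4 = [11, 22, 33, 14, 25, 36, 17, 28, 39]

p r_index(vec4, [1, 3, 3, 1])      # repeated, out-of-order indices
p r_index(vec4, -3)                # everything except the third element
p r_index(vec4, [-1, -3, -5, -7])  # everything except elements 1, 3, 5, 7
```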
371
+
372
+ If an index is out of range, a missing value (NA) will be reported.
373
+
374
+
375
+ ```ruby
376
+ puts @vec4[30]
377
+ ```
378
+
379
+ ```
380
+ ## [1] NA
381
+ ```
382
+
383
+ It is also possible to index a vector by range:
384
+
385
+
386
+ ```ruby
387
+ puts @vec4[(2..5)]
388
+ ```
389
+
390
+ ```
391
+ ## [1] 22 33 14 25
392
+ ```
393
+
394
+ Elements in a vector can be named using the 'names' attribute of a vector:
395
+
396
+
397
+ ```ruby
398
+ full_name = R.c("Rodrigo", "A", "Botafogo")
399
+ full_name.names = R.c("First", "Middle", "Last")
400
+ puts full_name
401
+ ```
402
+
403
+ ```
404
+ ## First Middle Last
405
+ ## "Rodrigo" "A" "Botafogo"
406
+ ```
407
+
408
+ Or it can also be named by using the 'c' function with named parameters:
409
+
410
+
411
+ ```ruby
412
+ full_name = R.c(First: "Rodrigo", Middle: "A", Last: "Botafogo")
413
+ puts full_name
414
+ ```
415
+
416
+ ```
417
+ ## First Middle Last
418
+ ## "Rodrigo" "A" "Botafogo"
419
+ ```
420
+
421
+ ## Extracting Native Ruby Types from a Vector
422
+
423
+ Vectors created with 'R.c' are of class R::Vector. You might have noticed that when indexing a
424
+ vector, a new vector is returned, even if this vector has one single element. In order to use
425
+ R::Vector with other ruby classes it might be necessary to extract the actual Ruby native type
426
+ from the vector. In order to do this extraction the '>>' operator is used.
427
+
428
+
429
+ ```ruby
430
+ puts @vec4
431
+ puts @vec4 >> 0
432
+ puts @vec4 >> 4
433
+ ```
434
+
435
+ ```
436
+ ## [1] 11 22 33 14 25 36 17 28 39
437
+ ## 11.0
438
+ ## 25.0
439
+ ```
440
+
441
+ Note that indexing with '>>' starts at 0 and not at 1. Also, negative indexing is not supported.
442
+
443
+ # Accessing R variables
444
+
445
+ Galaaz allows Ruby to access variables created in R. For example, the 'mtcars' data set is
446
+ available in R and can be accessed from Ruby by using the 'tilde' operator followed by the
447
+ symbol for the variable, in this case ':mtcars'. In the code below method 'outputs' is
448
+ used to output the 'mtcars' data set nicely formatted in HTML by use of the 'kable' and
449
+ 'kable_styling' functions. Method 'outputs' is only available when used with 'gknit'.
450
+
229
451
 
230
452
  ```ruby
231
453
  outputs (~:mtcars).kable.kable_styling
@@ -700,40 +922,958 @@ outputs (~:mtcars).kable.kable_styling
700
922
  </tbody>
701
923
  </table>
702
924
 
925
+ # Matrix
926
+
927
+ A matrix is a collection of elements organized as a two dimensional table. A matrix can be
928
+ created by the 'matrix' function:
929
+
930
+
931
+ ```ruby
932
+ @mat = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
933
+ nrow: 3,
934
+ ncol: 3)
935
+
936
+ puts @mat
937
+ ```
938
+
939
+ ```
940
+ ## [,1] [,2] [,3]
941
+ ## [1,] 1 4 7
942
+ ## [2,] 2 5 8
943
+ ## [3,] 3 6 9
944
+ ```
945
+ Note that matrix data is organized by column first. It is possible to organize the matrix
946
+ memory by row first by passing an extra argument to the 'matrix' function:
947
+
948
+
949
+ ```ruby
950
+ @mat_row = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
951
+ nrow: 3,
952
+ ncol: 3,
953
+ byrow: true)
954
+
955
+ puts @mat_row
956
+ ```
957
+
958
+ ```
959
+ ## [,1] [,2] [,3]
960
+ ## [1,] 1 2 3
961
+ ## [2,] 4 5 6
962
+ ## [3,] 7 8 9
963
+ ```
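The difference between the default column-first fill and 'byrow: true' can be illustrated in plain Ruby on nested arrays. The helper `fill_matrix` is hypothetical and only mirrors the argument names used above; it is not how Galaaz or R store matrices internally.

```ruby
# Hypothetical helper: lays a flat list of values out as rows of a matrix,
# filling column-first by default and row-first when byrow is true.
def fill_matrix(values, nrow:, ncol:, byrow: false)
  unless values.length == nrow * ncol
    raise ArgumentError, "expected #{nrow * ncol} values, got #{values.length}"
  end

  if byrow
    values.each_slice(ncol).to_a            # each slice is one row
  else
    values.each_slice(nrow).to_a.transpose  # each slice is one column
  end
end

values = (1..9).to_a

fill_matrix(values, nrow: 3, ncol: 3).each { |row| p row }
fill_matrix(values, nrow: 3, ncol: 3, byrow: true).each { |row| p row }
```

With the default fill the first row is 1, 4, 7; with `byrow: true` it is 1, 2, 3, matching the two printed matrices above.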
964
+
965
+ ## Indexing a Matrix
966
+
967
+ A matrix can be indexed by [row, column]:
968
+
969
+
970
+ ```ruby
971
+ puts @mat_row[1, 1]
972
+ puts @mat_row[2, 3]
973
+ ```
974
+
975
+ ```
976
+ ## [1] 1
977
+ ## [1] 6
978
+ ```
979
+ It is possible to index an entire row or column with the ':all' keyword
980
+
981
+
982
+ ```ruby
983
+ puts @mat_row[1, :all]
984
+ puts @mat_row[:all, 2]
985
+ ```
986
+
987
+ ```
988
+ ## [1] 1 2 3
989
+ ## [1] 2 5 8
990
+ ```
991
+
992
+ Indexing with a vector is also possible for matrices. In the following example we want
993
+ rows 1 and 3 and columns 2 and 3 building a 2 x 2 matrix.
994
+
995
+
996
+ ```ruby
997
+ puts @mat_row[R.c(1, 3), R.c(2, 3)]
998
+ ```
999
+
1000
+ ```
1001
+ ## [,1] [,2]
1002
+ ## [1,] 2 3
1003
+ ## [2,] 8 9
1004
+ ```
1005
+
1006
+ Matrices can be combined with functions 'rbind' and 'cbind'
1007
+
1008
+
1009
+ ```ruby
1010
+ puts @mat_row.rbind(@mat)
1011
+ puts @mat_row.cbind(@mat)
1012
+ ```
1013
+
1014
+ ```
1015
+ ## [,1] [,2] [,3]
1016
+ ## [1,] 1 2 3
1017
+ ## [2,] 4 5 6
1018
+ ## [3,] 7 8 9
1019
+ ## [4,] 1 4 7
1020
+ ## [5,] 2 5 8
1021
+ ## [6,] 3 6 9
1022
+ ## [,1] [,2] [,3] [,4] [,5] [,6]
1023
+ ## [1,] 1 2 3 1 4 7
1024
+ ## [2,] 4 5 6 2 5 8
1025
+ ## [3,] 7 8 9 3 6 9
1026
+ ```
1027
+
1028
+ # List
1029
+
1030
+ A list is a data structure that can contain sublists of different types, while vector and matrix
1031
+ can only hold one type of element.
1032
+
1033
+
1034
+ ```ruby
1035
+ nums = R.c(1.0, 2.0, 3.0)
1036
+ strs = R.c("a", "b", "c", "d")
1037
+ bool = R.c(true, true, false)
1038
+ @lst = R.list(nums: nums, strs: strs, bool: bool)
1039
+ puts @lst
1040
+ ```
1041
+
1042
+ ```
1043
+ ## $nums
1044
+ ## [1] 1 2 3
1045
+ ##
1046
+ ## $strs
1047
+ ## [1] "a" "b" "c" "d"
1048
+ ##
1049
+ ## $bool
1050
+ ## [1] TRUE TRUE FALSE
1051
+ ```
1052
+
1053
+ Note that '@lst' elements are named elements.
1054
+
1055
+
1056
+ ## List Indexing
1057
+
1058
+ List indexing, also called slicing, is done using the '[]' operator and the '[[]]' operator. Let's
1059
+ first start with the '[]' operator. The list above has three sublists; indexing with '[]' will
1060
+ return one of the sublists.
1061
+
1062
+
1063
+ ```ruby
1064
+ puts @lst[1]
1065
+ ```
1066
+
1067
+ ```
1068
+ ## $nums
1069
+ ## [1] 1 2 3
1070
+ ```
1071
+
1072
+ Note that when using '[]' a new list is returned. When using the double square bracket operator
1073
+ the value returned is the actual element of the list in the given position and not a slice of
1074
+ the original list:
1075
+
1076
+
1077
+
1078
+ ```ruby
1079
+ puts @lst[[1]]
1080
+ ```
1081
+
1082
+ ```
1083
+ ## [1] 1 2 3
1084
+ ```
1085
+
1086
+ When elements are named, as done with @lst, indexing can be done by name:
1087
+
1088
+
1089
+ ```ruby
1090
+ puts @lst[['bool']][[1]] >> 0
1091
+ ```
1092
+
1093
+ ```
1094
+ ## true
1095
+ ```
1096
+
1097
+ In this example, first the 'bool' element of the list was extracted, not as a list, but as a vector,
1098
+ then the first element of the vector was extracted (note that vectors also accept the '[[]]'
1099
+ operator) and then the vector was indexed by its first element, extracting the native Ruby type.
1100
+
1101
+
1102
+ # Data Frame
1103
+
1104
+ A data frame is a table-like structure in which each column has the same number of
1105
+ rows. Data frames are the basic structure for storing data for data analysis. We have already
1106
+ seen a data frame previously when we accessed variable '~:mtcars'. In order to create a
1107
+ data frame, function 'data__frame' is used:
1108
+
1109
+
1110
+ ```ruby
1111
+ df = R.data__frame(
1112
+ year: R.c(2010, 2011, 2012),
1113
+ income: R.c(1000.0, 1500.0, 2000.0))
1114
+
1115
+ puts df
1116
+ ```
1117
+
1118
+ ```
1119
+ ## year income
1120
+ ## 1 2010 1000
1121
+ ## 2 2011 1500
1122
+ ## 3 2012 2000
1123
+ ```
1124
+
1125
+ ## Data Frame Indexing
1126
+
1127
+ A data frame can be indexed the same way as a matrix, by using '[row, column]', where row and
1128
+ column can be either a number or the name of the row or column:
1129
+
1130
+
1131
+ ```ruby
1132
+ puts (~:mtcars).head
1133
+ puts (~:mtcars)[1, 2]
1134
+ puts (~:mtcars)['Datsun 710', 'mpg']
1135
+ ```
1136
+
1137
+ ```
1138
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
1139
+ ## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
1140
+ ## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
1141
+ ## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
1142
+ ## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
1143
+ ## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
1144
+ ## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
1145
+ ## [1] 6
1146
+ ## [1] 22.8
1147
+ ```
1148
+
1149
+ Extracting a column from a data frame as a vector can be done by using the double square bracket
1150
+ operator:
1151
+
703
1152
 
704
- ## Graphics with ggplot
1153
+ ```ruby
1154
+ puts (~:mtcars)[['mpg']]
1155
+ ```
1156
+
1157
+ ```
1158
+ ## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
1159
+ ## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
1160
+ ## [29] 15.8 19.7 15.0 21.4
1161
+ ```
1162
+
1163
+ A data frame column can also be accessed as if it were an instance variable of the data frame:
1164
+
1165
+
1166
+ ```ruby
1167
+ puts (~:mtcars).mpg
1168
+ ```
1169
+
1170
+ ```
1171
+ ## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
1172
+ ## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
1173
+ ## [29] 15.8 19.7 15.0 21.4
1174
+ ```
1175
+
1176
+ Slicing a data frame can be done by indexing it with a vector (we use 'head' to reduce the
1177
+ output):
1178
+
1179
+
1180
+ ```ruby
1181
+ puts (~:mtcars)[R.c('mpg', 'hp')].head
1182
+ ```
1183
+
1184
+ ```
1185
+ ## mpg hp
1186
+ ## Mazda RX4 21.0 110
1187
+ ## Mazda RX4 Wag 21.0 110
1188
+ ## Datsun 710 22.8 93
1189
+ ## Hornet 4 Drive 21.4 110
1190
+ ## Hornet Sportabout 18.7 175
1191
+ ## Valiant 18.1 105
1192
+ ```
1193
+
1194
+ A row slice can be obtained by indexing by row and using the ':all' keyword for the column:
1195
+
1196
+
1197
+ ```ruby
1198
+ puts (~:mtcars)[R.c('Datsun 710', 'Camaro Z28'), :all]
1199
+ ```
1200
+
1201
+ ```
1202
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
1203
+ ## Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
1204
+ ## Camaro Z28 13.3 8 350 245 3.73 3.84 15.41 0 0 3 4
1205
+ ```
1206
+
1207
+ Finally, a data frame can also be indexed with a logical vector. In this next example, the
1208
+ 'am' column of :mtcars is compared with 0 (with method 'eq'). When 'am' is equal to 0 the
1209
+ car is automatic. So, by doing '(~:mtcars).am.eq 0' a logical vector is created with
1210
+ 'true' whenever 'am' is 0 and 'false' otherwise. Using this logical vector, the data frame
1211
+ is indexed, returning a new data frame in which all cars have automatic transmission.
1212
+
1213
+
1214
+ ```ruby
1215
+ # obtain a vector with 'true' for cars with automatic transmission
1216
+ automatic = (~:mtcars).am.eq 0
1217
+ puts automatic
1218
+
1219
+ # slice the data frame by using this vector
1220
+ puts (~:mtcars)[automatic, :all]
1221
+ ```
1222
+
1223
+ ```
1224
+ ## [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
1225
+ ## [12] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
1226
+ ## [23] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
1227
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
1228
+ ## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
1229
+ ## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
1230
+ ## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
1231
+ ## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
1232
+ ## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
1233
+ ## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
1234
+ ## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
1235
+ ## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
1236
+ ## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
1237
+ ## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
1238
+ ## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
1239
+ ## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
1240
+ ## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
1241
+ ## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
1242
+ ## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
1243
+ ## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
1244
+ ## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
1245
+ ## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
1246
+ ## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
1247
+ ```
1248
+
1249
+ # Writing Expressions in Galaaz
1250
+
1251
+ Galaaz extends Ruby to work with complex expressions, similar to R's expressions built with 'quote'
1252
+ (base R) or 'quo' (tidyverse). Let's take a look at some of those expressions.
1253
+
1254
+ ## Expressions from operators
1255
+
1256
+ The code below
1257
+ creates an expression summing two symbols
1258
+
1259
+
1260
+ ```ruby
1261
+ exp1 = :a + :b
1262
+ puts exp1
1263
+ ```
1264
+
1265
+ ```
1266
+ ## a + b
1267
+ ```
1268
+ We can build any complex mathematical expression
1269
+
1270
+
1271
+ ```ruby
1272
+ exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
1273
+ puts exp2
1274
+ ```
1275
+
1276
+ ```
1277
+ ## (a + b) * 2 + c^2L/z
1278
+ ```
1279
+
1280
+ It is also possible to use inequality operators in building expressions
1281
+
1282
+
1283
+ ```ruby
1284
+ exp3 = (:a + :b) >= :z
1285
+ puts exp3
1286
+ ```
1287
+
1288
+ ```
1289
+ ## a + b >= z
1290
+ ```
1291
+
1292
+ Galaaz provides both symbolic representations for operators, such as (>, <, !=) as well as functional
1293
+ notation for those operators such as (.gt, .ge, etc.). So the same expression written
1294
+ above can also be written as
1295
+
1296
+
1297
+ ```ruby
1298
+ exp4 = (:a + :b).ge :z
1299
+ puts exp4
1300
+ ```
1301
+
1302
+ ```
1303
+ ## a + b >= z
1304
+ ```
1305
+
1306
+ Two types of expression can only be created with the functional representation of the operators,
1307
+ those are expressions involving '==', and '='. In order to write an expression involving '==' we
1308
+ need to use the method '.eq' and for '=' we need the function '.assign'
1309
+
1310
+
1311
+ ```ruby
1312
+ exp5 = (:a + :b).eq :z
1313
+ puts exp5
1314
+ ```
1315
+
1316
+ ```
1317
+ ## a + b == z
1318
+ ```
1319
+
1320
+
1321
+ ```ruby
1322
+ exp6 = :y.assign :a + :b
1323
+ puts exp6
1324
+ ```
1325
+
1326
+ ```
1327
+ ## y <- a + b
1328
+ ```
1329
+ In general we think that using the functional notation is preferable to using the
1330
+ symbolic notation; otherwise, we can end up writing invalid expressions such as
1331
+
1332
+
1333
+ ```ruby
1334
+ exp_wrong = (:a + :b) == :z
1335
+ puts exp_wrong
1336
+ ```
1337
+
1338
+ ```
1339
+ ## Message:
1340
+ ## Error in function (x, y, num.eq = TRUE, single.NA = TRUE, attrib.as.set = TRUE, :
1341
+ ## object 'a' not found (RError)
1342
+ ## Translated to internal error
1343
+ ```
1344
+ and it might be difficult to understand what is going on here. The problem lies with the fact that
1345
+ when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When the
1346
+ comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols at
1347
+ this time are not bound to anything and we get a "object 'a' not found" message.
1348
+ If we only use functional notation, this type of error will not occur.
1349
+
1350
+ ## Expressions with R methods
1351
+
1352
+ It is often necessary to create an expression that uses a method or function. For instance, in
1353
+ mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
1354
+ 'sin' function is part of the expression and should not be immediately executed. Now, let's say
1355
+ that 'x' is an angle of 45$^\circ$ and we actually want our expression to be $y = 0.707...$.
1356
+ When we want the function to be part of the expression, we call the function preceding it
1357
+ by the letter E, such as 'E.sin(x)'
1358
+
1359
+
1360
+ ```ruby
1361
+ exp7 = :y.assign E.sin(:x)
1362
+ puts exp7
1363
+ ```
1364
+
1365
+ ```
1366
+ ## y <- sin(x)
1367
+ ```
1368
+
1369
+ # Manipulating Data
1370
+
1371
+ One of the major benefits of Galaaz is to bring strong data manipulation to Ruby. The following
1372
+ examples were extracted from Hadley Wickham's "R for Data Science" (https://r4ds.had.co.nz/). This
1373
+ is a highly recommended book for those not already familiar with the 'tidyverse' style of
1374
+ programming in R. In the sections to follow, we will limit ourselves to converting the R code to
1375
+ Galaaz.
1376
+
1377
+ For these
1378
+ examples, we will investigate the nycflights13 data set available on the package by the
1379
+ same name. We use function 'R.install_and_loads', which checks if the library is available
1380
+ locally, and if not, installs it. This data frame contains all 336,776 flights that
1381
+ departed from New York City in 2013. The data comes from the US Bureau of
1382
+ Transportation Statistics.
1383
+
1384
+
1385
+ ```ruby
1386
+ R.install_and_loads('nycflights13')
1387
+ R.library('dplyr')
1388
+ ```
1389
+
1390
+
1391
+ ```ruby
1392
+ @flights = ~:flights
1393
+ puts @flights.head.as__data__frame
1394
+ ```
1395
+
1396
+ ```
1397
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1398
+ ## 1 2013 1 1 517 515 2 830 819
1399
+ ## 2 2013 1 1 533 529 4 850 830
1400
+ ## 3 2013 1 1 542 540 2 923 850
1401
+ ## 4 2013 1 1 544 545 -1 1004 1022
1402
+ ## 5 2013 1 1 554 600 -6 812 837
1403
+ ## 6 2013 1 1 554 558 -4 740 728
1404
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1405
+ ## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
1406
+ ## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
1407
+ ## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
1408
+ ## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
1409
+ ## 5 -25 DL 461 N668DN LGA ATL 116 762 6
1410
+ ## 6 12 UA 1696 N39463 EWR ORD 150 719 5
1411
+ ## minute time_hour
1412
+ ## 1 15 2013-01-01 05:00:00
1413
+ ## 2 29 2013-01-01 05:00:00
1414
+ ## 3 40 2013-01-01 05:00:00
1415
+ ## 4 45 2013-01-01 05:00:00
1416
+ ## 5 0 2013-01-01 06:00:00
1417
+ ## 6 58 2013-01-01 05:00:00
1418
+ ```
1419
+
1420
+ ## Filtering rows with Filter
1421
+
1422
+ In this example we filter the flights data set by giving to the filter function two expressions:
1423
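+ the first, ':month.eq 1', keeps flights from January; the second, ':day.eq 1', keeps flights from the first day of the month.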
1424
+
1425
+
1426
+ ```ruby
1427
+ puts @flights.filter((:month.eq 1), (:day.eq 1)).head.as__data__frame
1428
+ ```
1429
+
1430
+ ```
1431
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1432
+ ## 1 2013 1 1 517 515 2 830 819
1433
+ ## 2 2013 1 1 533 529 4 850 830
1434
+ ## 3 2013 1 1 542 540 2 923 850
1435
+ ## 4 2013 1 1 544 545 -1 1004 1022
1436
+ ## 5 2013 1 1 554 600 -6 812 837
1437
+ ## 6 2013 1 1 554 558 -4 740 728
1438
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1439
+ ## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
1440
+ ## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
1441
+ ## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
1442
+ ## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
1443
+ ## 5 -25 DL 461 N668DN LGA ATL 116 762 6
1444
+ ## 6 12 UA 1696 N39463 EWR ORD 150 719 5
1445
+ ## minute time_hour
1446
+ ## 1 15 2013-01-01 05:00:00
1447
+ ## 2 29 2013-01-01 05:00:00
1448
+ ## 3 40 2013-01-01 05:00:00
1449
+ ## 4 45 2013-01-01 05:00:00
1450
+ ## 5 0 2013-01-01 06:00:00
1451
+ ## 6 58 2013-01-01 05:00:00
1452
+ ```
1453
+
1454
+ ## Logical Operators
1455
+
1456
+ All flights that departed in November or December
1457
+
1458
+
1459
+ ```ruby
1460
+ puts @flights.filter((:month.eq 11) | (:month.eq 12)).head.as__data__frame
1461
+ ```
1462
+
1463
+ ```
1464
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1465
+ ## 1 2013 11 1 5 2359 6 352 345
1466
+ ## 2 2013 11 1 35 2250 105 123 2356
1467
+ ## 3 2013 11 1 455 500 -5 641 651
1468
+ ## 4 2013 11 1 539 545 -6 856 827
1469
+ ## 5 2013 11 1 542 545 -3 831 855
1470
+ ## 6 2013 11 1 549 600 -11 912 923
1471
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1472
+ ## 1 7 B6 745 N568JB JFK PSE 205 1617 23
1473
+ ## 2 87 B6 1816 N353JB JFK SYR 36 209 22
1474
+ ## 3 -10 US 1895 N192UW EWR CLT 88 529 5
1475
+ ## 4 29 UA 1714 N38727 LGA IAH 229 1416 5
1476
+ ## 5 -24 AA 2243 N5CLAA JFK MIA 147 1089 5
1477
+ ## 6 -11 UA 303 N595UA JFK SFO 359 2586 6
1478
+ ## minute time_hour
1479
+ ## 1 59 2013-11-01 23:00:00
1480
+ ## 2 50 2013-11-01 22:00:00
1481
+ ## 3 0 2013-11-01 05:00:00
1482
+ ## 4 45 2013-11-01 05:00:00
1483
+ ## 5 45 2013-11-01 05:00:00
1484
+ ## 6 0 2013-11-01 06:00:00
1485
+ ```
1486
+
1487
+ The same as above, but using the 'in' operator. In R, infix operators can be defined with
1488
+ the form %<op>%. The %in% operator checks whether a value is in a vector. To use such
1489
+ operators from Galaaz, call the '._' method, where the first argument is the operator's
1490
+ symbol, in this case ':in', and the second argument is the vector:
1491
+
1492
+
1493
+ ```ruby
1494
+ puts @flights.filter(:month._ :in, R.c(11, 12)).head.as__data__frame
1495
+ ```
1496
+
1497
+ ```
1498
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1499
+ ## 1 2013 11 1 5 2359 6 352 345
1500
+ ## 2 2013 11 1 35 2250 105 123 2356
1501
+ ## 3 2013 11 1 455 500 -5 641 651
1502
+ ## 4 2013 11 1 539 545 -6 856 827
1503
+ ## 5 2013 11 1 542 545 -3 831 855
1504
+ ## 6 2013 11 1 549 600 -11 912 923
1505
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1506
+ ## 1 7 B6 745 N568JB JFK PSE 205 1617 23
1507
+ ## 2 87 B6 1816 N353JB JFK SYR 36 209 22
1508
+ ## 3 -10 US 1895 N192UW EWR CLT 88 529 5
1509
+ ## 4 29 UA 1714 N38727 LGA IAH 229 1416 5
1510
+ ## 5 -24 AA 2243 N5CLAA JFK MIA 147 1089 5
1511
+ ## 6 -11 UA 303 N595UA JFK SFO 359 2586 6
1512
+ ## minute time_hour
1513
+ ## 1 59 2013-11-01 23:00:00
1514
+ ## 2 50 2013-11-01 22:00:00
1515
+ ## 3 0 2013-11-01 05:00:00
1516
+ ## 4 45 2013-11-01 05:00:00
1517
+ ## 5 45 2013-11-01 05:00:00
1518
+ ## 6 0 2013-11-01 06:00:00
1519
+ ```
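The two filters above return the same rows because a membership test against a set of values is equivalent to OR-ing one equality test per value. The following is a minimal plain-Ruby analogy of that equivalence, not Galaaz code: the `MONTHS` array and the helper names `by_or` and `by_membership` are hypothetical and exist only for illustration.

```ruby
# Plain-Ruby analogy only; none of these names are part of Galaaz.
# MONTHS is made-up sample data standing in for the 'month' column.
MONTHS = [1, 11, 5, 12, 11, 7].freeze

# Analogous to (:month.eq 11) | (:month.eq 12)
def by_or(months)
  months.select { |m| m == 11 || m == 12 }
end

# Analogous to :month._ :in, R.c(11, 12)
def by_membership(months, wanted = [11, 12])
  months.select { |m| wanted.include?(m) }
end

p by_or(MONTHS)
p by_membership(MONTHS)
```

The membership form tends to scale better once more than two or three values have to be matched.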
1520
+
1521
+ ## Filtering with NA (Not Available)
1522
+
1523
+ Let's first create a 'tibble' with a Not Available value (R::NA). Tibbles are a modern
1524
+ version of a data frame and operate very similarly to one. They differ in how they output
1525
+ values and in the results of some subsetting operations, which are more consistent than
1526
+ what is obtained from a data frame.
1527
+
1528
+
1529
+ ```ruby
1530
+ @df = R.tibble(x: R.c(1, R::NA, 3))
1531
+ puts @df.as__data__frame
1532
+ ```
1533
+
1534
+ ```
1535
+ ## x
1536
+ ## 1 1
1537
+ ## 2 NA
1538
+ ## 3 3
1539
+ ```
1540
+
1541
+ Filtering by :x > 1 returns only the rows that satisfy this condition; the row with R::NA
1542
+ does not satisfy it and is dropped.
705
1543
 
706
1544
 
707
1545
  ```ruby
708
- require 'ggplot'
1546
+ puts @df.filter(:x > 1).as__data__frame
1547
+ ```
1548
+
1549
+ ```
1550
+ ## x
1551
+ ## 1 3
1552
+ ```
1553
+
1554
+ To also match NA values, use the 'is__na' method:
1555
+
1556
+
1557
+ ```ruby
1558
+ puts @df.filter((:x.is__na) | (:x > 1)).as__data__frame
1559
+ ```
1560
+
1561
+ ```
1562
+ ## x
1563
+ ## 1 NA
1564
+ ## 2 3
1565
+ ```
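The behaviour above follows from R's three-valued logic: a comparison with NA yields NA, 'filter' keeps only rows whose condition is TRUE, and TRUE | NA is TRUE. The sketch below mimics this in plain Ruby, using `nil` as a stand-in for NA. It is an analogy under that assumption, not Galaaz code, and the helpers `gt`, `or3` and `keep` are hypothetical names.

```ruby
# Plain-Ruby analogy; nil stands in for R's NA. Not Galaaz API.
X = [1, nil, 3].freeze

# Three-valued comparison: comparing NA with anything yields NA (nil here).
def gt(value, threshold)
  value.nil? ? nil : value > threshold
end

# Three-valued OR: true wins over NA; false OR NA stays NA.
def or3(a, b)
  return true if a == true || b == true
  return nil if a.nil? || b.nil?
  false
end

# Like filter: keep only the values whose condition is exactly true.
def keep(values)
  values.select { |v| yield(v) == true }
end

p keep(X) { |v| gt(v, 1) }                 # only the row with 3 survives
p keep(X) { |v| or3(v.nil?, gt(v, 1)) }    # the NA row is kept as well
```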
709
1566
 
710
- R.theme_set R.theme_bw
1567
+ ## Arrange Rows with arrange
711
1568
 
712
- # Data Prep
713
- mtcars = ~:mtcars
714
- mtcars.car_name = R.rownames(:mtcars)
715
- # compute normalized mpg
716
- mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
717
- mtcars.mpg_type = mtcars.mpg_z < 0 ? "below" : "above"
718
- mtcars = mtcars[mtcars.mpg_z.order, :all]
719
- # convert to factor to retain sorted order in plot
720
- mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
1569
+ Arrange reorders the rows of a data frame by the given arguments.
721
1570
 
722
- # Diverging Barcharts
723
- gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
724
- R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
725
- R.scale_fill_manual(name: "Mileage",
726
- labels: R.c("Above Average", "Below Average"),
727
- values: R.c("above": "#00ba38", "below": "#f8766d")) +
728
- R.labs(subtitle: "Normalised mileage from 'mtcars'",
729
- title: "Diverging Bars") +
730
- R.coord_flip()
731
1571
 
732
- puts gg
1572
+ ```ruby
1573
+ puts @flights.arrange(:year, :month, :day).head.as__data__frame
1574
+ ```
1575
+
1576
+ ```
1577
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1578
+ ## 1 2013 1 1 517 515 2 830 819
1579
+ ## 2 2013 1 1 533 529 4 850 830
1580
+ ## 3 2013 1 1 542 540 2 923 850
1581
+ ## 4 2013 1 1 544 545 -1 1004 1022
1582
+ ## 5 2013 1 1 554 600 -6 812 837
1583
+ ## 6 2013 1 1 554 558 -4 740 728
1584
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1585
+ ## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
1586
+ ## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
1587
+ ## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
1588
+ ## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
1589
+ ## 5 -25 DL 461 N668DN LGA ATL 116 762 6
1590
+ ## 6 12 UA 1696 N39463 EWR ORD 150 719 5
1591
+ ## minute time_hour
1592
+ ## 1 15 2013-01-01 05:00:00
1593
+ ## 2 29 2013-01-01 05:00:00
1594
+ ## 3 40 2013-01-01 05:00:00
1595
+ ## 4 45 2013-01-01 05:00:00
1596
+ ## 5 0 2013-01-01 06:00:00
1597
+ ## 6 58 2013-01-01 05:00:00
733
1598
  ```
734
1599
 
1600
+ To arrange in descending order, use function 'desc'
1601
+
1602
+
1603
+ ```ruby
1604
+ puts @flights.arrange(:dep_delay.desc).head.as__data__frame
1605
+ ```
1606
+
1607
+ ```
1608
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1609
+ ## 1 2013 1 9 641 900 1301 1242 1530
1610
+ ## 2 2013 6 15 1432 1935 1137 1607 2120
1611
+ ## 3 2013 1 10 1121 1635 1126 1239 1810
1612
+ ## 4 2013 9 20 1139 1845 1014 1457 2210
1613
+ ## 5 2013 7 22 845 1600 1005 1044 1815
1614
+ ## 6 2013 4 10 1100 1900 960 1342 2211
1615
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1616
+ ## 1 1272 HA 51 N384HA JFK HNL 640 4983 9
1617
+ ## 2 1127 MQ 3535 N504MQ JFK CMH 74 483 19
1618
+ ## 3 1109 MQ 3695 N517MQ EWR ORD 111 719 16
1619
+ ## 4 1007 AA 177 N338AA JFK SFO 354 2586 18
1620
+ ## 5 989 MQ 3075 N665MQ JFK CVG 96 589 16
1621
+ ## 6 931 DL 2391 N959DL JFK TPA 139 1005 19
1622
+ ## minute time_hour
1623
+ ## 1 0 2013-01-09 09:00:00
1624
+ ## 2 35 2013-06-15 19:00:00
1625
+ ## 3 35 2013-01-10 16:00:00
1626
+ ## 4 45 2013-09-20 18:00:00
1627
+ ## 5 0 2013-07-22 16:00:00
1628
+ ## 6 0 2013-04-10 19:00:00
1629
+ ```
1630
+
1631
+ ## Selecting columns
1632
+
1633
+ To select specific columns from a dataset we use function 'select':
1634
+
1635
+
1636
+ ```ruby
1637
+ puts @flights.select(:year, :month, :day).head.as__data__frame
1638
+ ```
1639
+
1640
+ ```
1641
+ ## year month day
1642
+ ## 1 2013 1 1
1643
+ ## 2 2013 1 1
1644
+ ## 3 2013 1 1
1645
+ ## 4 2013 1 1
1646
+ ## 5 2013 1 1
1647
+ ## 6 2013 1 1
1648
+ ```
1649
+
1650
+ It is also possible to select columns in a given range:
1651
+
1652
+
1653
+ ```ruby
1654
+ puts @flights.select(:year.up_to :day).head.as__data__frame
1655
+ ```
1656
+
1657
+ ```
1658
+ ## year month day
1659
+ ## 1 2013 1 1
1660
+ ## 2 2013 1 1
1661
+ ## 3 2013 1 1
1662
+ ## 4 2013 1 1
1663
+ ## 5 2013 1 1
1664
+ ## 6 2013 1 1
1665
+ ```
1666
+
1667
+ Select all columns that start with a given name sequence
1668
+
1669
+
1670
+ ```ruby
1671
+ puts @flights.select(E.starts_with('arr')).head.as__data__frame
1672
+ ```
1673
+
1674
+ ```
1675
+ ## arr_time arr_delay
1676
+ ## 1 830 11
1677
+ ## 2 850 20
1678
+ ## 3 923 33
1679
+ ## 4 1004 -18
1680
+ ## 5 812 -25
1681
+ ## 6 740 12
1682
+ ```
1683
+
1684
+ Other functions that can be used:
1685
+
1686
+ * ends_with("xyz"): matches names that end with “xyz”.
1687
+
1688
+ * contains("ijk"): matches names that contain “ijk”.
1689
+
1690
+ * matches("(.)\\1"): selects variables that match a regular expression. This one matches
1691
+ any variables that contain repeated characters.
1692
+
1693
+ * num_range("x", (1..3)): matches x1, x2 and x3
1694
+
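To make the matching rules of these helpers concrete, here is a plain-Ruby analogy that applies equivalent string tests to the flights column names shown in the outputs above. It is only a sketch: the methods below are hypothetical stand-ins written with Ruby's String API, not the Galaaz 'E.' helpers themselves.

```ruby
# Plain-Ruby analogy on column-name strings; not Galaaz API.
COLUMNS = %w[year month day dep_time sched_dep_time dep_delay arr_time
             sched_arr_time arr_delay carrier flight tailnum origin dest
             air_time distance hour minute time_hour].freeze

def starts_with(prefix)
  COLUMNS.select { |c| c.start_with?(prefix) }
end

def ends_with(suffix)
  COLUMNS.select { |c| c.end_with?(suffix) }
end

def contains(text)
  COLUMNS.select { |c| c.include?(text) }
end

# Regular-expression match; /(.)\1/ finds a character repeated twice in a row.
def matches(regex)
  COLUMNS.select { |c| c.match?(regex) }
end

# Builds the names a numeric range would match, e.g. x1, x2, x3.
def num_range(prefix, range)
  range.map { |i| "#{prefix}#{i}" }
end

p starts_with('arr')
p ends_with('delay')
p matches(/(.)\1/)
p num_range('x', 1..3)
```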
1695
+ A helper function that comes in handy when we just want to rearrange column order is 'everything':
1696
+
1697
+
1698
+ ```ruby
1699
+ puts @flights.select(:year, :month, :day, E.everything).head.as__data__frame
1700
+ ```
1701
+
1702
+ ```
1703
+ ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
1704
+ ## 1 2013 1 1 517 515 2 830 819
1705
+ ## 2 2013 1 1 533 529 4 850 830
1706
+ ## 3 2013 1 1 542 540 2 923 850
1707
+ ## 4 2013 1 1 544 545 -1 1004 1022
1708
+ ## 5 2013 1 1 554 600 -6 812 837
1709
+ ## 6 2013 1 1 554 558 -4 740 728
1710
+ ## arr_delay carrier flight tailnum origin dest air_time distance hour
1711
+ ## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
1712
+ ## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
1713
+ ## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
1714
+ ## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
1715
+ ## 5 -25 DL 461 N668DN LGA ATL 116 762 6
1716
+ ## 6 12 UA 1696 N39463 EWR ORD 150 719 5
1717
+ ## minute time_hour
1718
+ ## 1 15 2013-01-01 05:00:00
1719
+ ## 2 29 2013-01-01 05:00:00
1720
+ ## 3 40 2013-01-01 05:00:00
1721
+ ## 4 45 2013-01-01 05:00:00
1722
+ ## 5 0 2013-01-01 06:00:00
1723
+ ## 6 58 2013-01-01 05:00:00
1724
+ ```
1725
+
1726
+ ## Add variables to a dataframe with 'mutate'
1727
+
1728
+
1729
+ ```ruby
1730
+ @flights_sm = @flights.
1731
+ select((:year.up_to :day),
1732
+ E.ends_with('delay'),
1733
+ :distance,
1734
+ :air_time)
1735
+
1736
+ puts @flights_sm.head.as__data__frame
1737
+ ```
1738
+
1739
+ ```
1740
+ ## year month day dep_delay arr_delay distance air_time
1741
+ ## 1 2013 1 1 2 11 1400 227
1742
+ ## 2 2013 1 1 4 20 1416 227
1743
+ ## 3 2013 1 1 2 33 1089 160
1744
+ ## 4 2013 1 1 -1 -18 1576 183
1745
+ ## 5 2013 1 1 -6 -25 762 116
1746
+ ## 6 2013 1 1 -4 12 719 150
1747
+ ```
1748
+
1749
+
1750
+ ```ruby
1751
+ @flights_sm = @flights_sm.
1752
+ mutate(gain: :dep_delay - :arr_delay,
1753
+ speed: :distance / :air_time * 60)
1754
+ puts @flights_sm.head.as__data__frame
1755
+ ```
1756
+
1757
+ ```
1758
+ ## year month day dep_delay arr_delay distance air_time gain speed
1759
+ ## 1 2013 1 1 2 11 1400 227 -9 370.0441
1760
+ ## 2 2013 1 1 4 20 1416 227 -16 374.2731
1761
+ ## 3 2013 1 1 2 33 1089 160 -31 408.3750
1762
+ ## 4 2013 1 1 -1 -18 1576 183 17 516.7213
1763
+ ## 5 2013 1 1 -6 -25 762 116 19 394.1379
1764
+ ## 6 2013 1 1 -4 12 719 150 -16 287.6000
1765
+ ```
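The new columns can be checked by hand. The plain-Ruby sketch below recomputes 'gain' and 'speed' for the first three rows of the output above. It is not Galaaz code; `ROWS` and `add_gain_and_speed` are hypothetical names, and the factor of 60 is read as a per-minute to per-hour conversion on the assumption that 'air_time' is measured in minutes.

```ruby
# Plain-Ruby check of the mutate arithmetic; not Galaaz API.
# Rows copied from the first three lines of the output above.
ROWS = [
  { dep_delay: 2, arr_delay: 11, distance: 1400, air_time: 227 },
  { dep_delay: 4, arr_delay: 20, distance: 1416, air_time: 227 },
  { dep_delay: 2, arr_delay: 33, distance: 1089, air_time: 160 }
].freeze

# Returns a copy of the row with the two derived columns added.
def add_gain_and_speed(row)
  row.merge(
    gain:  row[:dep_delay] - row[:arr_delay],
    # to_f avoids integer division; rounding mirrors the printed precision
    speed: (row[:distance].to_f / row[:air_time] * 60).round(4)
  )
end

ROWS.each { |row| p add_gain_and_speed(row) }
```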
1766
+
1767
+ # Graphics in Galaaz
1768
+
1769
+ Creating graphics in Galaaz is quite easy, as it can use all the power of ggplot2. There are
1770
+ many resources on the web that teach ggplot, so here we give a quick example of ggplot
1771
+ integration with Ruby. We continue to use the :mtcars dataset and we will plot a diverging
1772
+ bar plot, showing cars that have 'above' or 'below' gas consumption. Let's first prepare
1773
+ the data frame with the necessary data:
1774
+
1775
+
1776
+ ```ruby
1777
+ # copy the R variable :mtcars to the Ruby mtcars variable
1778
+ @mtcars = ~:mtcars
1779
+
1780
+ # create a new column 'car_name' to store the car names so that it can be
1781
+ # used for plotting. The 'rownames' of the data frame cannot be used as
1782
+ # data for plotting
1783
+ @mtcars.car_name = R.rownames(:mtcars)
1784
+
1785
+ # compute normalized mpg and add it to a new column called mpg_z
1786
+ # Note that the mean value for mpg can be obtained by calling the 'mean'
1787
+ # function on the vector 'mtcars.mpg'. The same with the standard
1788
+ # deviation 'sd'. The vector is then rounded to two digits with 'round 2'
1789
+ @mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
1790
+
1791
+ # create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
1792
+ # that looks at every element of the mpg_z vector and if the value is below
1793
+ # 0, returns 'below', otherwise returns 'above'
1794
+ @mtcars.mpg_type = (@mtcars.mpg_z < 0).ifelse("below", "above")
1795
+
1796
+ # order the mtcars data set by the mpg_z vector from smaller to larger values
1797
+ @mtcars = @mtcars[@mtcars.mpg_z.order, :all]
1798
+
1799
+ # convert the car_name column to a factor to retain sorted order in plot
1800
+ @mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
1801
+
1802
+ # let's look at the final data frame
1803
+ puts @mtcars
1804
+ ```
1805
+
1806
+ ```
1807
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
1808
+ ## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
1809
+ ## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
1810
+ ## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
1811
+ ## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
1812
+ ## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
1813
+ ## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
1814
+ ## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
1815
+ ## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
1816
+ ## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
1817
+ ## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
1818
+ ## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
1819
+ ## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
1820
+ ## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
1821
+ ## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
1822
+ ## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
1823
+ ## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
1824
+ ## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
1825
+ ## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
1826
+ ## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
1827
+ ## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
1828
+ ## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
1829
+ ## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
1830
+ ## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
1831
+ ## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
1832
+ ## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
1833
+ ## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
1834
+ ## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
1835
+ ## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
1836
+ ## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
1837
+ ## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
1838
+ ## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
1839
+ ## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
1840
+ ## car_name mpg_z mpg_type
1841
+ ## Cadillac Fleetwood Cadillac Fleetwood -1.61 below
1842
+ ## Lincoln Continental Lincoln Continental -1.61 below
1843
+ ## Camaro Z28 Camaro Z28 -1.13 below
1844
+ ## Duster 360 Duster 360 -0.96 below
1845
+ ## Chrysler Imperial Chrysler Imperial -0.89 below
1846
+ ## Maserati Bora Maserati Bora -0.84 below
1847
+ ## Merc 450SLC Merc 450SLC -0.81 below
1848
+ ## AMC Javelin AMC Javelin -0.81 below
1849
+ ## Dodge Challenger Dodge Challenger -0.76 below
1850
+ ## Ford Pantera L Ford Pantera L -0.71 below
1851
+ ## Merc 450SE Merc 450SE -0.61 below
1852
+ ## Merc 450SL Merc 450SL -0.46 below
1853
+ ## Merc 280C Merc 280C -0.38 below
1854
+ ## Valiant Valiant -0.33 below
1855
+ ## Hornet Sportabout Hornet Sportabout -0.23 below
1856
+ ## Merc 280 Merc 280 -0.15 below
1857
+ ## Pontiac Firebird Pontiac Firebird -0.15 below
1858
+ ## Ferrari Dino Ferrari Dino -0.06 below
1859
+ ## Mazda RX4 Mazda RX4 0.15 above
1860
+ ## Mazda RX4 Wag Mazda RX4 Wag 0.15 above
1861
+ ## Hornet 4 Drive Hornet 4 Drive 0.22 above
1862
+ ## Volvo 142E Volvo 142E 0.22 above
1863
+ ## Toyota Corona Toyota Corona 0.23 above
1864
+ ## Datsun 710 Datsun 710 0.45 above
1865
+ ## Merc 230 Merc 230 0.45 above
1866
+ ## Merc 240D Merc 240D 0.72 above
1867
+ ## Porsche 914-2 Porsche 914-2 0.98 above
1868
+ ## Fiat X1-9 Fiat X1-9 1.20 above
1869
+ ## Honda Civic Honda Civic 1.71 above
1870
+ ## Lotus Europa Lotus Europa 1.71 above
1871
+ ## Fiat 128 Fiat 128 2.04 above
1872
+ ## Toyota Corolla Toyota Corolla 2.29 above
1873
+ ```
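The 'mpg_z' and 'mpg_type' columns can be reproduced without R. The sketch below is a plain-Ruby analogy, not Galaaz code: it uses the 32 'mpg' values from the table above, the sample standard deviation (dividing by n - 1, as R's 'sd' does), and an element-wise map in place of the vectorized 'ifelse'. All method names here are hypothetical.

```ruby
# Plain-Ruby analogy of the data preparation; not Galaaz API.
# mpg values copied from the table above, already sorted.
MPG = [10.4, 10.4, 13.3, 14.3, 14.7, 15.0, 15.2, 15.2, 15.5, 15.8, 16.4,
       17.3, 17.8, 18.1, 18.7, 19.2, 19.2, 19.7, 21.0, 21.0, 21.4, 21.4,
       21.5, 22.8, 22.8, 24.4, 26.0, 27.3, 30.4, 30.4, 32.4, 33.9].freeze

def mean(xs)
  xs.sum / xs.size.to_f
end

# Sample standard deviation (n - 1 in the denominator).
def sample_sd(xs)
  m = mean(xs)
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
end

# Normalized values rounded to two digits, like mpg_z.
def z_scores(xs)
  m = mean(xs)
  s = sample_sd(xs)
  xs.map { |x| ((x - m) / s).round(2) }
end

# Element-wise classification, like the vectorized ifelse.
def mpg_types(zs)
  zs.map { |z| z < 0 ? 'below' : 'above' }
end

zs = z_scores(MPG)
p zs.first, zs.last
p mpg_types(zs).tally
```

A single Ruby ternary applied to the whole vector would yield one value rather than one value per element, which is why an element-wise construct is needed here.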
1874
+ Now, let's plot the diverging bar plot. When using gKnit, there is no need to call
1875
+ 'R.awt' to create a plotting device, since gKnit takes care of it:
735
1876
 
736
- ![](/home/rbotafogo/desenv/galaaz/blogs/manual/manual_files/figure-html/diverging_bar.png)<!-- -->
737
1877
 
738
1878
 
739
1879
  [TO BE CONTINUED...]