galaaz 0.4.7 → 0.4.8

Files changed (68)
  1. checksums.yaml +4 -4
  2. data/README.md +1179 -39
  3. data/Rakefile +4 -2
  4. data/bin/grun +1 -1
  5. data/bin/gstudio +1 -1
  6. data/blogs/dev/dev.Rmd +2 -56
  7. data/blogs/dev/dev.md +32 -61
  8. data/blogs/dev/dev2.Rmd +65 -0
  9. data/blogs/dplyr/dplyr.Rmd +29 -0
  10. data/blogs/{dev/dev.html → dplyr/dplyr.html} +88 -57
  11. data/blogs/dplyr/dplyr.md +58 -0
  12. data/blogs/gknit/gknit.html +1262 -25
  13. data/blogs/gknit/gknit.md +471 -27
  14. data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
  15. data/blogs/manual/graph.rb +29 -0
  16. data/blogs/manual/manual.Rmd +567 -29
  17. data/blogs/manual/manual.html +743 -46
  18. data/blogs/manual/manual.md +1179 -39
  19. data/blogs/nse_dplyr/nse_dplyr.Rmd +466 -11
  20. data/blogs/nse_dplyr/nse_dplyr.html +472 -37
  21. data/blogs/nse_dplyr/nse_dplyr.md +645 -32
  22. data/blogs/ruby_plot/ruby_plot.Rmd +4 -4
  23. data/blogs/ruby_plot/ruby_plot.html +217 -2
  24. data/blogs/ruby_plot/ruby_plot.md +226 -1
  25. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  26. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +2 -2
  27. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  28. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +70 -70
  29. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  30. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +72 -72
  31. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +116 -116
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +176 -176
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +236 -236
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +176 -176
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +160 -160
  42. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +105 -105
  44. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +121 -121
  46. data/examples/islr/ch2.spec.rb +1 -1
  47. data/examples/islr/ch3_boston.rb +4 -4
  48. data/examples/islr/x_y_rnorm.jpg +0 -0
  49. data/lib/R_interface/r.rb +1 -1
  50. data/lib/R_interface/r_methods.rb +2 -2
  51. data/lib/R_interface/rdata_frame.rb +8 -5
  52. data/lib/R_interface/rindexed_object.rb +1 -2
  53. data/lib/R_interface/rlist.rb +1 -0
  54. data/lib/R_interface/robject.rb +0 -1
  55. data/lib/R_interface/rpkg.rb +14 -6
  56. data/lib/R_interface/rsupport.rb +7 -9
  57. data/lib/R_interface/ruby_extensions.rb +17 -5
  58. data/lib/gknit/knitr_engine.rb +9 -2
  59. data/lib/util/exec_ruby.rb +2 -2
  60. data/specs/r_dataframe.spec.rb +173 -0
  61. data/specs/r_list.spec.rb +4 -4
  62. data/specs/ruby_expression.spec.rb +2 -11
  63. data/specs/tmp.rb +76 -34
  64. data/version.rb +1 -1
  65. metadata +17 -6
  66. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  67. data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
  68. data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
@@ -0,0 +1,29 @@
1
+ # Graphics with ggplot
2
+
3
+ ```{ruby diverging_bar}
4
+ require 'ggplot'
5
+
6
+ R.theme_set R.theme_bw
7
+
8
+ # Data Prep
9
+ mtcars = ~:mtcars
10
+ mtcars.car_name = R.rownames(:mtcars)
11
+ # compute normalized mpg
12
+ mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
13
+ mtcars.mpg_type = (mtcars.mpg_z < 0).ifelse("below", "above")
14
+ mtcars = mtcars[mtcars.mpg_z.order, :all]
15
+ # convert to factor to retain sorted order in plot
16
+ mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
17
+
18
+ # Diverging Barcharts
19
+ gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
20
+ R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
21
+ R.scale_fill_manual(name: "Mileage",
22
+ labels: R.c("Above Average", "Below Average"),
23
+ values: R.c("above": "#00ba38", "below": "#f8766d")) +
24
+ R.labs(subtitle: "Normalised mileage from 'mtcars'",
25
+ title: "Diverging Bars") +
26
+ R.coord_flip()
27
+
28
+ puts gg
29
+ ```
@@ -24,7 +24,6 @@ fontsize: 11pt
24
24
  R.install_and_loads('kableExtra')
25
25
  ```
26
26
 
27
-
28
27
  # Introduction
29
28
 
30
29
  Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with a large
@@ -93,9 +92,20 @@ Panda, SciPy, SciKit-Learn and a couple more.
93
92
 
94
93
  > galaaz master_list:scatter_plot
95
94
 
96
- # Basic Types
95
+ # gKnitting a Document
96
+
97
+ This manual has been formatted using gKnit. gKnit uses Knitr and R markdown to knit
98
+ a document in Ruby or R and output it in any of the available formats for R markdown.
99
+ gKnit runs atop GraalVM and Galaaz. In gKnit, Ruby variables are persisted between
100
+ chunks, making it an ideal solution for literate programming.
101
+ Also, since it is based on Galaaz, Ruby chunks can have access to R variables and Polyglot
102
+ Programming with Ruby and R is quite natural.
103
+
104
+ gKnit is described in more depth in:
105
+
106
+ * xxx.xxxx.xxx
97
107
 
98
- ## Vectors
108
+ # Vector
99
109
 
100
110
  Vectors can be thought of as contiguous cells containing data. Cells are accessed through
101
111
  indexing operations such as x[5]. Galaaz has six basic (‘atomic’) vector types: logical,
@@ -126,11 +136,13 @@ puts @vec
126
136
 
127
137
  Lets take a look at the type, mode and storage.mode of our vector @vec. In order to print
128
138
  this out, we are creating a data frame 'df' and printing it out. A data frame, for those
129
- not familiar with it, it basically a table. Here we create the data frame and add the
139
+ not familiar with it, is basically a table. Here we create the data frame and add the
130
140
  column name by passing named parameters for each column, such as 'typeof:', 'mode:' and
131
- 'storage__mode'. You should also note here that the double underscore is converted to a '.'.
141
+ 'storage__mode:'. You should also note here that the double underscore is converted to a '.'.
142
+ So, when printed 'storage\_\_mode' will actually print as 'storage.mode'.
132
143
 
133
- In R, the method used to create a data frame is 'data.frame', in Galaaz we use 'data__frame'.
144
+ Data frames are described in more detail later. In R, the method used to create a
145
+ data frame is 'data.frame'; in Galaaz we use 'data\_\_frame'.
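The naming rule described above can be sketched in plain Ruby. The helper `r_name` is hypothetical and is not part of Galaaz; it only models the documented convention that a double underscore on the Ruby side stands for a '.' on the R side.

```ruby
# Hypothetical helper, not a Galaaz API: models the rule that '__' in a
# Ruby-side name corresponds to '.' in the R-side name.
def r_name(ruby_name)
  ruby_name.to_s.gsub('__', '.')
end

puts r_name(:data__frame)      # data.frame
puts r_name(:storage__mode)    # storage.mode
puts r_name(:as__data__frame)  # as.data.frame
puts r_name(:typeof)           # typeof (no double underscore, unchanged)
```

Single underscores are left alone in this sketch, which matches names such as 'kable_styling' used elsewhere in the manual.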
134
146
 
135
147
  ```{ruby typeof_integer}
136
148
  df = R.data__frame(typeof: @vec.typeof, mode: @vec.mode, storage__mode: @vec.storage__mode)
@@ -161,41 +173,567 @@ of the error.
161
173
  vec = R.c(1, hello, 5)
162
174
  ```
163
175
 
176
+ Here is a vector with logical values:
177
+
178
+ ```{ruby logical_vector}
179
+ @vec = R.c(true, true, false, false, true)
180
+ puts @vec
181
+ ```
182
+
183
+ ## Combining Vectors
184
+
185
+ The 'c' function used to create vectors can also be used to combine two vectors:
186
+
187
+ ```{ruby combining_vectors}
188
+ @vec1 = R.c(10.0, 20.0, 30.0)
189
+ @vec2 = R.c(4.0, 5.0, 6.0)
190
+ @vec = R.c(@vec1, @vec2)
191
+ puts @vec
192
+ ```
193
+ In Galaaz, methods can be chained (somewhat like the pipe operator in R %>%, but more generic).
194
+ In this next example, method 'c' is chained after '@vec1'. This makes 'c' look like a
195
+ method of the vector, but in reality it is closer to the pipe operator. When
196
+ Galaaz identifies that 'c' is not a method of '@vec1', it tries to call 'R.c' with
197
+ '@vec1' as the first argument, followed by all the other available arguments. The code
198
+ below is automatically converted to the code above.
199
+
200
+ ```{ruby chainning_methods}
201
+ @vec = @vec1.c(@vec2)
202
+ puts @vec
203
+ ```
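The forwarding behaviour described above can be modelled in plain Ruby with `method_missing`. This is only a sketch of the idea, not Galaaz's actual implementation: `FakeR` and `FakeVector` are hypothetical stand-ins that run without GraalVM.

```ruby
# FakeR and FakeVector are hypothetical stand-ins, not Galaaz classes.
module FakeR
  # Concatenates plain values and FakeVectors, loosely like R's c().
  def self.c(*args)
    FakeVector.new(args.flat_map { |a| a.is_a?(FakeVector) ? a.values : [a] })
  end
end

class FakeVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # A method the vector does not define is forwarded to FakeR,
  # with the vector itself as the first argument.
  def method_missing(name, *args)
    return FakeR.public_send(name, self, *args) if FakeR.singleton_methods.include?(name)

    super
  end

  def respond_to_missing?(name, include_private = false)
    FakeR.singleton_methods.include?(name) || super
  end
end

vec1 = FakeR.c(10.0, 20.0, 30.0)
vec2 = FakeR.c(4.0, 5.0, 6.0)

# The chained call and the explicit call give the same values.
p vec1.c(vec2).values
p vec1.c(vec2).values == FakeR.c(vec1, vec2).values
```

Forwarding only names that `FakeR` actually defines keeps typos from being silently swallowed: anything else still raises `NoMethodError`.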
204
+
205
+ ## Vector Arithmetic
206
+
207
+ Arithmetic operations on vectors are performed element by element:
208
+
209
+ ```{ruby vec_arith1}
210
+ puts @vec1 + @vec2
211
+ ```
212
+
213
+ ```{ruby mult}
214
+ puts @vec1 * 5
215
+ ```
216
+
217
+ When vectors have different lengths, a recycling rule is applied to the shorter vector:
218
+
219
+ ```{ruby recycle}
220
+ @vec3 = R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
221
+ puts @vec4 = @vec1 + @vec3
222
+ ```
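As an illustration of the recycling rule, here is a plain-Ruby model using ordinary arrays instead of R vectors. The helper `recycle_add` is hypothetical; it reuses the elements of the shorter array cyclically, which is what the rule amounts to when the longer length is a multiple of the shorter one, as it is here.

```ruby
# Hypothetical helper: element-wise addition with the shorter array recycled.
def recycle_add(a, b)
  longer, shorter = a.size >= b.size ? [a, b] : [b, a]
  longer.each_with_index.map { |value, i| value + shorter[i % shorter.size] }
end

vec1 = [10.0, 20.0, 30.0]
vec3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

p recycle_add(vec1, vec3)
# [11.0, 22.0, 33.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]
```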
223
+
224
+ ## Vector Indexing
225
+
226
+ Vectors can be indexed by using the '[]' operator:
227
+
228
+ ```{ruby index}
229
+ puts @vec4[3]
230
+ ```
231
+
232
+ We can also index a vector with another vector. For example, in the code below, we take elements
233
+ 1, 3, 5, and 7 from @vec4:
234
+
235
+ ```{ruby index_by_vector}
236
+ puts @vec4[R.c(1, 3, 5, 7)]
237
+ ```
238
+
239
+ Repeating an index and having indices out of order is valid code:
240
+
241
+ ```{ruby repeated_index}
242
+ puts @vec4[R.c(1, 3, 3, 1)]
243
+ ```
244
+
245
+ It is also possible to index a vector with a negative number or negative vector. In these cases
246
+ all elements except the indexed ones are returned:
247
+
248
+ ```{ruby neg_index}
249
+ puts @vec4[-3]
250
+ puts @vec4[-R.c(1, 3, 5, 7)]
251
+ ```
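The indexing rules in this section (1-based positions, negative indices that drop elements, and a missing value for out-of-range positions) can be modelled on plain Ruby arrays. The helper `r_index` is hypothetical, uses `nil` where R would report NA, and does not cover mixed positive and negative indices.

```ruby
# Hypothetical model of R-style vector indexing on a Ruby array.
def r_index(values, indices)
  idx = Array(indices)
  if idx.all?(&:negative?)
    # Negative indices: return everything except the listed positions.
    drop = idx.map(&:abs)
    values.each_with_index.reject { |_, i| drop.include?(i + 1) }.map(&:first)
  else
    # Positive indices are 1-based; out-of-range positions give nil (NA).
    idx.map { |i| i.between?(1, values.size) ? values[i - 1] : nil }
  end
end

vec4 = [11.0, 22.0, 33.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]

p r_index(vec4, 3)                 # [33.0]
p r_index(vec4, [1, 3, 3, 1])      # [11.0, 33.0, 33.0, 11.0]
p r_index(vec4, -3)                # [11.0, 22.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]
p r_index(vec4, [-1, -3, -5, -7])  # [22.0, 14.0, 36.0, 28.0, 39.0]
p r_index(vec4, 30)                # [nil]
p r_index(vec4, 2..5)              # [22.0, 33.0, 14.0, 25.0]
```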
252
+
253
+ If an index is out of range, a missing value (NA) will be reported.
254
+
255
+ ```{ruby out_of_range}
256
+ puts @vec4[30]
257
+ ```
258
+
259
+ It is also possible to index a vector by range:
260
+
261
+ ```{ruby range}
262
+ puts @vec4[(2..5)]
263
+ ```
264
+
265
+ Elements in a vector can be named using the 'names' attribute of a vector:
266
+
267
+ ```{ruby naming}
268
+ full_name = R.c("Rodrigo", "A", "Botafogo")
269
+ full_name.names = R.c("First", "Middle", "Last")
270
+ puts full_name
271
+ ```
272
+
273
+ Or it can also be named by using the 'c' function with named parameters:
274
+
275
+ ```{ruby named_param}
276
+ full_name = R.c(First: "Rodrigo", Middle: "A", Last: "Botafogo")
277
+ puts full_name
278
+ ```
279
+
280
+ ## Extracting Native Ruby Types from a Vector
281
+
282
+ Vectors created with 'R.c' are of class R::Vector. You might have noticed that when indexing a
283
+ vector, a new vector is returned, even if this vector has one single element. In order to use
284
+ R::Vector with other ruby classes it might be necessary to extract the actual Ruby native type
285
+ from the vector. In order to do this extraction the '>>' operator is used.
286
+
287
+ ```{ruby ruby_native}
288
+ puts @vec4
289
+ puts @vec4 >> 0
290
+ puts @vec4 >> 4
291
+ ```
292
+
293
+ Note that indexing with '>>' starts at 0 and not at 1. Also, negative indexing is not supported.
294
+
295
+ # Accessing R variables
296
+
297
+ Galaaz allows Ruby to access variables created in R. For example, the 'mtcars' data set is
298
+ available in R and can be accessed from Ruby by using the 'tilde' operator followed by the
299
+ symbol for the variable, in this case ':mtcars'. In the code below method 'outputs' is
300
+ used to output the 'mtcars' data set nicely formatted in HTML by use of the 'kable' and
301
+ 'kable_styling' functions. Method 'outputs' is only available when used with 'gknit'.
302
+
164
303
  ```{ruby view_kable}
165
304
  outputs (~:mtcars).kable.kable_styling
166
305
  ```
167
306
 
307
+ # Matrix
308
+
309
+ A matrix is a collection of elements organized as a two dimensional table. A matrix can be
310
+ created by the 'matrix' function:
311
+
312
+ ```{ruby matrix}
313
+ @mat = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
314
+ nrow: 3,
315
+ ncol: 3)
316
+
317
+ puts @mat
318
+ ```
319
+ Note that matrix data is organized by column first. It is possible to organize the matrix
320
+ by row first by passing an extra argument to the 'matrix' function:
321
+
322
+ ```{ruby matrix_rowfirst}
323
+ @mat_row = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
324
+ nrow: 3,
325
+ ncol: 3,
326
+ byrow: true)
327
+
328
+ puts @mat_row
329
+ ```
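The difference between the default column-first fill and 'byrow: true' can be sketched with nested Ruby arrays. The helper `fill_matrix` is hypothetical and only models the fill order; it is not how Galaaz or R store matrices.

```ruby
# Hypothetical helper: builds an nrow x ncol array of rows from a flat list,
# filling by column (the default) or by row.
def fill_matrix(values, nrow, ncol, byrow: false)
  Array.new(nrow) do |r|
    Array.new(ncol) { |c| byrow ? values[r * ncol + c] : values[c * nrow + r] }
  end
end

values = (1..9).to_a

p fill_matrix(values, 3, 3)               # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
p fill_matrix(values, 3, 3, byrow: true)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each inner array is one row, so in the column-first case the first column reads 1, 2, 3 from top to bottom.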
330
+
331
+ ## Indexing a Matrix
332
+
333
+ A matrix can be indexed by [row, column]:
334
+
335
+ ```{ruby matrix_index}
336
+ puts @mat_row[1, 1]
337
+ puts @mat_row[2, 3]
338
+ ```
339
+ It is possible to index an entire row or column with the ':all' keyword:
340
+
341
+ ```{ruby matrix_index_all}
342
+ puts @mat_row[1, :all]
343
+ puts @mat_row[:all, 2]
344
+ ```
345
+
346
+ Indexing with a vector is also possible for matrices. In the following example we want
347
+ rows 1 and 3 and columns 2 and 3 building a 2 x 2 matrix.
348
+
349
+ ```{ruby matrix_index_vector}
350
+ puts @mat_row[R.c(1, 3), R.c(2, 3)]
351
+ ```
168
352
 
169
- ## Graphics with ggplot
353
+ Matrices can be combined with functions 'rbind' and 'cbind'
354
+
355
+ ```{ruby matrix_combine}
356
+ puts @mat_row.rbind(@mat)
357
+ puts @mat_row.cbind(@mat)
358
+ ```
359
+
360
+ # List
361
+
362
+ A list is a data structure that can contain sublists of different types, while vector and matrix
363
+ can only hold one type of element.
364
+
365
+ ```{ruby list}
366
+ nums = R.c(1.0, 2.0, 3.0)
367
+ strs = R.c("a", "b", "c", "d")
368
+ bool = R.c(true, true, false)
369
+ @lst = R.list(nums: nums, strs: strs, bool: bool)
370
+ puts @lst
371
+ ```
170
372
 
171
- ```{ruby diverging_bar}
172
- require 'ggplot'
373
+ Note that '@lst' elements are named elements.
173
374
 
174
- R.theme_set R.theme_bw
175
375
 
176
- # Data Prep
177
- mtcars = ~:mtcars
178
- mtcars.car_name = R.rownames(:mtcars)
179
- # compute normalized mpg
180
- mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
181
- mtcars.mpg_type = mtcars.mpg_z < 0 ? "below" : "above"
182
- mtcars = mtcars[mtcars.mpg_z.order, :all]
183
- # convert to factor to retain sorted order in plot
184
- mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
376
+ ## List Indexing
377
+
378
+ List indexing, also called slicing, is done using the '[]' operator and the '[[]]' operator. Let's
379
+ first start with the '[]' operator. The list above has three sublists; indexing with '[]'
380
+ returns one of the sublists.
381
+
382
+ ```{ruby list_indexing}
383
+ puts @lst[1]
384
+ ```
385
+
386
+ Note that when using '[]' a new list is returned. When using the double square bracket operator
387
+ the value returned is the actual element of the list in the given position and not a slice of
388
+ the original list
389
+
390
+
391
+ ```{ruby list_indexing_single}
392
+ puts @lst[[1]]
393
+ ```
394
+
395
+ When elements are named, as done with @lst, indexing can be done by name:
396
+
397
+ ```{ruby list_indexing_by_name}
398
+ puts @lst[['bool']][[1]] >> 0
399
+ ```
400
+
401
+ In this example, first the 'bool' element of the list was extracted, not as a list, but as a vector,
402
+ then the first element of the vector was extracted (note that vectors also accept the '[[]]'
403
+ operator) and then the vector was indexed by its first element, extracting the native Ruby type.
404
+
405
+
406
+ # Data Frame
407
+
408
+ A data frame is a table like structure in which each column has the same number of
409
+ rows. Data frames are the basic structure for storing data for data analysis. We have already
410
+ seen a data frame previously when we accessed variable '~:mtcars'. In order to create a
411
+ data frame, function 'data__frame' is used:
412
+
413
+ ```{ruby dataframe}
414
+ df = R.data__frame(
415
+ year: R.c(2010, 2011, 2012),
416
+ income: R.c(1000.0, 1500.0, 2000.0))
417
+
418
+ puts df
419
+ ```
420
+
421
+ ## Data Frame Indexing
422
+
423
+ A data frame can be indexed the same way as a matrix, by using '[row, column]', where row and
424
+ column can each be either a number or the name of the row or column:
425
+
426
+ ```{ruby dataframe_index}
427
+ puts (~:mtcars).head
428
+ puts (~:mtcars)[1, 2]
429
+ puts (~:mtcars)['Datsun 710', 'mpg']
430
+ ```
431
+
432
+ Extracting a column from a data frame as a vector can be done by using the double square bracket
433
+ operator:
434
+
435
+ ```{ruby dataframe_column}
436
+ puts (~:mtcars)[['mpg']]
437
+ ```
185
438
 
186
- # Diverging Barcharts
187
- gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
188
- R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
189
- R.scale_fill_manual(name: "Mileage",
190
- labels: R.c("Above Average", "Below Average"),
191
- values: R.c("above": "#00ba38", "below": "#f8766d")) +
192
- R.labs(subtitle: "Normalised mileage from 'mtcars'",
193
- title: "Diverging Bars") +
194
- R.coord_flip()
439
+ A data frame column can also be accessed as if it were an instance variable of the data frame:
195
440
 
196
- puts gg
441
+ ```{ruby dataframe_instance_variable}
442
+ puts (~:mtcars).mpg
197
443
  ```
198
444
 
445
+ Slicing a data frame can be done by indexing it with a vector (we use 'head' to reduce the
446
+ output):
447
+
448
+ ```{ruby dataframe_column_slice}
449
+ puts (~:mtcars)[R.c('mpg', 'hp')].head
450
+ ```
451
+
452
+ A row slice can be obtained by indexing by row and using the ':all' keyword for the column:
453
+
454
+ ```{ruby dataframe_row_slice}
455
+ puts (~:mtcars)[R.c('Datsun 710', 'Camaro Z28'), :all]
456
+ ```
457
+
458
+ Finally, a data frame can also be indexed with a logical vector. In this next example, the
459
+ 'am' column of :mtcars is compared with 0 (with method 'eq'). When 'am' is equal to 0 the
460
+ car is automatic. So, by doing '(~:mtcars).am.eq 0' a logical vector is created with
461
+ 'true' whenever 'am' is 0 and 'false' otherwise. Using this logical vector, the data frame
462
+ is indexed, returning a new data frame in which all cars have automatic transmission.
463
+
464
+ ```{ruby dataframe_logical}
465
+ # obtain a vector with 'true' for cars with automatic transmission
466
+ automatic = (~:mtcars).am.eq 0
467
+ puts automatic
468
+
469
+ # slice the data frame by using this vector
470
+ puts (~:mtcars)[automatic, :all]
471
+ ```
472
+
473
+ # Writing Expressions in Galaaz
474
+
475
+ Galaaz extends Ruby to work with complex expressions, similar to R's expressions built with 'quote'
476
+ (base R) or 'quo' (tidyverse). Let's take a look at some of those expressions.
477
+
478
+ ## Expressions from operators
479
+
480
+ The code below
481
+ creates an expression summing two symbols:
482
+
483
+ ```{ruby expressions}
484
+ exp1 = :a + :b
485
+ puts exp1
486
+ ```
487
+ We can build any complex mathematical expression
488
+
489
+ ```{ruby expr2}
490
+ exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
491
+ puts exp2
492
+ ```
493
+
494
+ It is also possible to use inequality operators in building expressions
495
+
496
+ ```{ruby expr3}
497
+ exp3 = (:a + :b) >= :z
498
+ puts exp3
499
+ ```
500
+
501
+ Galaaz provides both symbolic representations for operators, such as (>, <, !=), and functional
502
+ notation for those operators, such as (.gt, .ge, etc.). So the same expression written
503
+ above can also be written as:
504
+
505
+ ```{ruby expr4}
506
+ exp4 = (:a + :b).ge :z
507
+ puts exp4
508
+ ```
509
+
510
+ Two types of expressions can only be created with the functional representation of the operators:
511
+ those involving '==' and '='. In order to write an expression involving '==' we
512
+ need to use the method '.eq', and for '=' we need the method '.assign':
513
+
514
+ ```{ruby expr5}
515
+ exp5 = (:a + :b).eq :z
516
+ puts exp5
517
+ ```
518
+
519
+ ```{ruby expr6}
520
+ exp6 = :y.assign :a + :b
521
+ puts exp6
522
+ ```
523
+ In general we think that using the functional notation is preferable to using the
524
+ symbolic notation as otherwise, we end up writing invalid expressions such as
525
+
526
+ ```{ruby exp_wrong, warning=FALSE}
527
+ exp_wrong = (:a + :b) == :z
528
+ puts exp_wrong
529
+ ```
530
+ and it might be difficult to understand what is going on here. The problem lies with the fact that
531
+ when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When the
532
+ comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols at
533
+ this time are not bound to anything and we get an "object 'a' not found" message.
534
+ If we only use functional notation, this type of error will not occur.
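To make the distinction between building an expression and evaluating a comparison concrete, here is a plain-Ruby sketch. The `Expr` class is hypothetical and is not how Galaaz implements expressions; in particular it wraps names explicitly instead of extending Ruby symbols, and its '==' simply returns false rather than raising the R error described above.

```ruby
# Expr is a hypothetical class, not part of Galaaz.
class Expr
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Wraps plain values so they can appear on the right-hand side.
  def self.wrap(value)
    value.is_a?(Expr) ? value : Expr.new(value.to_s)
  end

  # Arithmetic and ordering operators return a new, unevaluated Expr.
  %i[+ - * / >=].each do |op|
    define_method(op) { |other| Expr.new("(#{text} #{op} #{Expr.wrap(other).text})") }
  end

  # Functional notation for equality: builds 'a == b' instead of comparing.
  def eq(other)
    Expr.new("(#{text} == #{Expr.wrap(other).text})")
  end

  def to_s
    text
  end
end

a = Expr.new('a')
b = Expr.new('b')
z = Expr.new('z')

puts((a + b) >= z)    # ((a + b) >= z)
puts((a + b).eq(z))   # ((a + b) == z)
puts((a + b) == z)    # false: '==' is evaluated immediately, no expression is built
```

The last line shows why the functional form is the safer habit in this model: '==' is left as an ordinary comparison, so only '.eq' produces an expression.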
535
+
536
+ ## Expressions with R methods
537
+
538
+ It is often necessary to create an expression that uses a method or function. For instance, in
539
+ mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
540
+ 'sin' function is part of the expression and should not be executed immediately. Now, let's say
541
+ that 'x' is an angle of 45$^\circ$ and we actually want our expression to be $y = 0.707...$.
542
+ When we want the function to be part of the expression, we call the function prefixed
543
+ by the letter E, such as 'E.sin(x)':
544
+
545
+ ```{ruby method_expression}
546
+ exp7 = :y.assign E.sin(:x)
547
+ puts exp7
548
+ ```
549
+
550
+ # Manipulating Data
551
+
552
+ One of the major benefits of Galaaz is to bring strong data manipulation to Ruby. The following
553
+ examples were extracted from Hadley Wickham's "R for Data Science" (https://r4ds.had.co.nz/). This
554
+ is a highly recommended book for those not already familiar with the 'tidyverse' style of
555
+ programming in R. In the sections to follow, we will limit ourselves to converting the R code to
556
+ Galaaz.
557
+
558
+ For these
559
+ examples, we will investigate the nycflights13 data set available in the package of the
560
+ same name. We use function 'R.install_and_loads' that checks if the library is available
561
+ locally, and if not, installs it. This data frame contains all 336,776 flights that
562
+ departed from New York City in 2013. The data comes from the US Bureau of
563
+ Transportation Statistics.
564
+
565
+ ```{ruby nycflights13}
566
+ R.install_and_loads('nycflights13')
567
+ R.library('dplyr')
568
+ ```
569
+
570
+ ```{ruby flights}
571
+ @flights = ~:flights
572
+ puts @flights.head.as__data__frame
573
+ ```
574
+
575
+ ## Filtering rows with Filter
576
+
577
+ In this example we filter the flights data set by giving the 'filter' function two expressions:
578
+ ':month.eq 1' and ':day.eq 1':
579
+
580
+ ```{ruby filter_rows}
581
+ puts @flights.filter((:month.eq 1), (:day.eq 1)).head.as__data__frame
582
+ ```
583
+
584
+ ## Logical Operators
585
+
586
+ All flights that departed in November or December:
587
+
588
+ ```{ruby nov_dec}
589
+ puts @flights.filter((:month.eq 11) | (:month.eq 12)).head.as__data__frame
590
+ ```
591
+
592
+ The same as above, but using the 'in' operator. In R, it is possible to define many operators
593
+ by doing %<op>%. The %in% operator checks if a value is in a vector. In order to use those
594
+ operators from Galaaz the '._' method is used, where the first argument is the operator's
595
+ symbol, in this case ':in' and the second argument is the vector:
596
+
597
+ ```{ruby in_op}
598
+ puts @flights.filter(:month._ :in, R.c(11, 12)).head.as__data__frame
599
+ ```
600
+
601
+ ## Filtering with NA (Not Available)
602
+
603
+ Let's first create a 'tibble' with a Not Available value (R::NA). Tibbles are a modern
604
+ version of a data frame and operate very similarly to one. They differ in how they output
605
+ values and in the results of some subsetting operations, which are more consistent than
606
+ what is obtained from a data frame.
607
+
608
+ ```{ruby na_tibble}
609
+ @df = R.tibble(x: R.c(1, R::NA, 3))
610
+ puts @df.as__data__frame
611
+ ```
612
+
613
+ Filtering by ':x > 1' keeps only the rows that satisfy this condition; the row with R::NA is
614
+ dropped.
615
+
616
+ ```{ruby filter_na}
617
+ puts @df.filter(:x > 1).as__data__frame
618
+ ```
619
+
620
+ To match an NA use method 'is__na'
621
+
622
+ ```{ruby with_na}
623
+ puts @df.filter((:x.is__na) | (:x > 1)).as__data__frame
624
+ ```
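The NA behaviour shown in the last two chunks can be modelled in plain Ruby, using `nil` as a stand-in for R::NA. The helpers `gt` and `na_or` are hypothetical; they mimic R's rule that a comparison with NA is NA, and that a filter keeps a row only when its condition is exactly true.

```ruby
NA = nil # hypothetical stand-in for R::NA

# Comparison that propagates NA: NA > 1 is NA.
def gt(value, threshold)
  value.nil? ? NA : value > threshold
end

# Logical OR with NA: true | NA is true, false | NA is NA.
def na_or(left, right)
  return true if left == true || right == true
  return NA if left.nil? || right.nil?

  false
end

xs = [1, NA, 3]

# Only rows whose condition is exactly true are kept.
p xs.select { |x| gt(x, 1) == true }                 # [3]
p xs.select { |x| na_or(x.nil?, gt(x, 1)) == true }  # [nil, 3]
```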
625
+
626
+ ## Arrange Rows with arrange
627
+
628
+ Arrange reorders the rows of a data frame by the given arguments.
629
+
630
+ ```{ruby arrange}
631
+ puts @flights.arrange(:year, :month, :day).head.as__data__frame
632
+ ```
633
+
634
+ To arrange in descending order, use function 'desc'
635
+
636
+ ```{ruby desc_arrange}
637
+ puts @flights.arrange(:dep_delay.desc).head.as__data__frame
638
+ ```
639
+
640
+ ## Selecting columns
641
+
642
+ To select specific columns from a dataset we use function 'select':
643
+
644
+ ```{ruby select}
645
+ puts @flights.select(:year, :month, :day).head.as__data__frame
646
+ ```
647
+
648
+ It is also possible to select columns in a given range:
649
+
650
+ ```{ruby select_range}
651
+ puts @flights.select(:year.up_to :day).head.as__data__frame
652
+ ```
653
+
654
+ Select all columns that start with a given name sequence
655
+
656
+ ```{ruby select_starts_with}
657
+ puts @flights.select(E.starts_with('arr')).head.as__data__frame
658
+ ```
659
+
660
+ Other functions that can be used:
661
+
662
+ * ends_with("xyz"): matches names that end with “xyz”.
663
+
664
+ * contains("ijk"): matches names that contain “ijk”.
665
+
666
+ * matches("(.)\\1"): selects variables that match a regular expression. This one matches
667
+ any variables that contain repeated characters.
668
+
669
+ * num_range("x", (1..3)): matches x1, x2 and x3
670
+
671
+ A helper function that comes in handy when we just want to rearrange column order is 'everything':
672
+
673
+ ```{ruby everything}
674
+ puts @flights.select(:year, :month, :day, E.everything).head.as__data__frame
675
+ ```
676
+
677
+ ## Add variables to a dataframe with 'mutate'
678
+
679
+ ```{ruby small_flights}
680
+ @flights_sm = @flights.
681
+ select((:year.up_to :day),
682
+ E.ends_with('delay'),
683
+ :distance,
684
+ :air_time)
685
+
686
+ puts @flights_sm.head.as__data__frame
687
+ ```
688
+
689
+ ```{ruby mutate}
690
+ @flights_sm = @flights_sm.
691
+ mutate(gain: :dep_delay - :arr_delay,
692
+ speed: :distance / :air_time * 60)
693
+ puts @flights_sm.head.as__data__frame
694
+ ```
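As a worked example of the two 'mutate' formulas, the following plain-Ruby sketch applies them to a single row. The values in `flight` are made up for illustration and are not taken from the flights data set; in nycflights13 'air_time' is in minutes and 'distance' in miles, so 'speed' comes out in miles per hour.

```ruby
# Made-up row, not taken from the flights data set.
flight = { dep_delay: 5.0, arr_delay: -10.0, distance: 1000.0, air_time: 125.0 }

# gain: minutes recovered in the air (departure delay minus arrival delay)
gain = flight[:dep_delay] - flight[:arr_delay]

# speed: miles per minute, converted to miles per hour
speed = flight[:distance] / flight[:air_time] * 60

puts gain   # 15.0
puts speed  # 480.0
```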
695
+
696
+ # Graphics in Galaaz
697
+
698
+ Creating graphics in Galaaz is quite easy, as it can use all the power of ggplot2. There are
699
+ many resources on the web that teach ggplot, so here we give a quick example of ggplot
700
+ integration with Ruby. We continue to use the :mtcars dataset and we will plot a diverging
701
+ bar plot, showing cars that have 'above' or 'below' average gas consumption. Let's first prepare
702
+ the data frame with the necessary data:
703
+
704
+ ```{ruby diverging_plot_pre}
705
+ # copy the R variable :mtcars to the Ruby mtcars variable
706
+ @mtcars = ~:mtcars
707
+
708
+ # create a new column 'car_name' to store the car names so that it can be
709
+ # used for plotting. The 'rownames' of the data frame cannot be used as
710
+ # data for plotting
711
+ @mtcars.car_name = R.rownames(:mtcars)
712
+
713
+ # compute normalized mpg and add it to a new column called mpg_z
714
+ # Note that the mean value for mpg can be obtained by calling the 'mean'
715
+ # function on the vector 'mtcars.mpg'. The same with the standard
716
+ # deviation 'sd'. The vector is then rounded to two digits with 'round 2'
717
+ @mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
718
+
719
+ # create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
720
+ # that looks at every element of the mpg_z vector and if the value is below
721
+ # 0, returns 'below', otherwise returns 'above'
722
+ @mtcars.mpg_type = (@mtcars.mpg_z < 0).ifelse("below", "above")
723
+
724
+ # order the mtcars data set by the mpg_z vector from smaller to larger values
725
+ @mtcars = @mtcars[@mtcars.mpg_z.order, :all]
726
+
727
+ # convert the car_name column to a factor to retain sorted order in plot
728
+ @mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
729
+
730
+ # let's look at the final data frame
731
+ puts @mtcars
732
+ ```
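The normalisation and labelling steps in the chunk above can be checked by hand with a plain-Ruby sketch. The three values are made up and are not taken from mtcars; the standard deviation uses the n - 1 denominator, which is what R's 'sd' computes.

```ruby
mpg = [10.0, 20.0, 30.0] # made-up values, not taken from mtcars

mean = mpg.sum / mpg.size
# Sample standard deviation (n - 1 denominator), as R's sd() uses.
sd = Math.sqrt(mpg.sum { |v| (v - mean)**2 } / (mpg.size - 1))

# z-score of each value, rounded to two digits
mpg_z = mpg.map { |v| ((v - mean) / sd).round(2) }

# Element-wise labelling, the plain-Ruby counterpart of 'ifelse'
mpg_type = mpg_z.map { |z| z < 0 ? 'below' : 'above' }

p mpg_z     # [-1.0, 0.0, 1.0]
p mpg_type  # ["below", "above", "above"]
```

A value exactly at the mean has a z-score of 0 and is therefore labelled 'above', since the test is 'z < 0'.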
733
+ Now, let's plot the diverging bar plot. When using gKnit, there is no need to call
734
+ 'R.awt' to create a plotting device, since gKnit takes care of it:
735
+
736
+
199
737
 
200
738
  [TO BE CONTINUED...]
201
739