galaaz 0.4.9 → 0.4.10

Files changed (76)
  1. checksums.yaml +4 -4
  2. data/README.md +798 -285
  3. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +3 -12
  4. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +5 -7
  5. data/blogs/galaaz_ggplot/galaaz_ggplot.html +69 -29
  6. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  7. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
  8. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
  9. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/midwest_rb.pdf +0 -0
  10. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/scatter_plot_rb.pdf +0 -0
  11. data/blogs/galaaz_ggplot/midwest.Rmd +1 -9
  12. data/blogs/gknit/gknit.Rmd +37 -40
  13. data/blogs/gknit/gknit.html +32 -30
  14. data/blogs/gknit/gknit.md +36 -37
  15. data/blogs/gknit/gknit.pdf +0 -0
  16. data/blogs/gknit/gknit.tex +35 -37
  17. data/blogs/manual/manual.Rmd +548 -125
  18. data/blogs/manual/manual.html +509 -286
  19. data/blogs/manual/manual.md +798 -285
  20. data/blogs/manual/manual.pdf +0 -0
  21. data/blogs/manual/manual.tex +2816 -0
  22. data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
  23. data/blogs/nse_dplyr/nse_dplyr.Rmd +240 -74
  24. data/blogs/nse_dplyr/nse_dplyr.html +191 -87
  25. data/blogs/nse_dplyr/nse_dplyr.md +361 -107
  26. data/blogs/nse_dplyr/nse_dplyr.pdf +0 -0
  27. data/blogs/nse_dplyr/nse_dplyr.tex +1373 -0
  28. data/blogs/ruby_plot/ruby_plot.Rmd +61 -81
  29. data/blogs/ruby_plot/ruby_plot.html +54 -57
  30. data/blogs/ruby_plot/ruby_plot.md +48 -67
  31. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  42. data/blogs/ruby_plot/ruby_plot_files/figure-latex/dose_len.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_delivery.png +0 -0
  44. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_dose.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color.png +0 -0
  46. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color2.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_decorations.png +0 -0
  48. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_jitter.png +0 -0
  49. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_points.png +0 -0
  50. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_box_plot.png +0 -0
  51. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_violin_plot.png +0 -0
  52. data/blogs/ruby_plot/ruby_plot_files/figure-latex/violin_with_jitter.png +0 -0
  53. data/lib/R_interface/rdata_frame.rb +0 -12
  54. data/lib/R_interface/robject.rb +14 -14
  55. data/lib/R_interface/ruby_extensions.rb +3 -31
  56. data/lib/R_interface/rvector.rb +0 -12
  57. data/lib/gknit/knitr_engine.rb +5 -3
  58. data/lib/util/exec_ruby.rb +22 -61
  59. data/specs/tmp.rb +26 -12
  60. data/version.rb +1 -1
  61. metadata +22 -17
  62. data/bin/gknit_old_r +0 -236
  63. data/blogs/dev/dev.Rmd +0 -23
  64. data/blogs/dev/dev.md +0 -58
  65. data/blogs/dev/dev2.Rmd +0 -65
  66. data/blogs/dev/model.rb +0 -41
  67. data/blogs/dplyr/dplyr.Rmd +0 -29
  68. data/blogs/dplyr/dplyr.html +0 -433
  69. data/blogs/dplyr/dplyr.md +0 -58
  70. data/blogs/dplyr/dplyr.rb +0 -63
  71. data/blogs/galaaz_ggplot/galaaz_ggplot.log +0 -640
  72. data/blogs/galaaz_ggplot/galaaz_ggplot.md +0 -431
  73. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +0 -481
  74. data/blogs/galaaz_ggplot/midwest.png +0 -0
  75. data/blogs/galaaz_ggplot/scatter_plot.png +0 -0
  76. data/blogs/ruby_plot/ruby_plot.tex +0 -1077
@@ -4,7 +4,7 @@ author:
  - "Rodrigo Botafogo"
  - "Daniel Mossé - University of Pittsburgh"
  tags: [Tech, Data Science, Ruby, R, GraalVM]
- date: "20/02/2019"
+ date: "10/05/2019"
  output:
  html_document:
  self_contained: true
@@ -13,6 +13,11 @@ output:
  includes:
  in_header: ["../../sty/galaaz.sty"]
  number_sections: yes
+ toc: true
+ toc_depth: 2
+ md_document:
+ variant: markdown_github
+ fontsize: 11pt
  ---

  ```{r setup, echo=FALSE, message = FALSE}
@@ -22,21 +27,30 @@ library('tibble')

  # Introduction

- In this post we will see how to program with dplyr in Galaaz.
+ In this post we will see how to program with _dplyr_ in Galaaz.

- ### But first, what is Galaaz??
+ ## But first, what is Galaaz??

  Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with
  a large community, a very large set of libraries and great for web development. However,
  it lacks libraries for data science, statistics, scientific plotting and machine learning.
  On the other hand, R is considered one of the most powerful languages for solving all of the
  above problems. Maybe the strongest competitor to R is Python with libraries such as NumPy,
- Panda, SciPy, SciKit-Learn and a couple more.
+ Pandas, SciPy, SciKit-Learn and many more.

  With Galaaz we do not intend to re-implement any of the scientific libraries in R. However, we
  allow for very tight coupling between the two languages to the point that the Ruby
- developer does not need to know that there is an R engine running. For this to happen we
- use new technologies provided by Oracle: GraalVM, TruffleRuby and FastR:
+ developer does not need to know that there is an R engine running. Also, from the point of
+ view of the R user/developer, Galaaz looks a lot like R, with just minor syntactic differences,
+ so there is almost no learning curve for the R developer. And, as we will see in this
+ post, programming with _dplyr_ is easier in Galaaz than in R.
+
+ R users are probably quite knowledgeable about _dplyr_. For the Ruby developer: _dplyr_ and
+ the _tidyverse_ libraries are a set of libraries for data manipulation in R, developed by
+ Hadley Wickham, Chief Scientist at RStudio and a prolific R coder and writer.
+
+ For the coupling of Ruby and R we use new technologies provided by Oracle: GraalVM,
+ TruffleRuby and FastR:

  GraalVM is a universal virtual machine for running applications
  written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java,
@@ -71,10 +85,16 @@ Interested readers should also check out the following sites:
  * [TruffleRuby](https://github.com/oracle/truffleruby)
  * [FastR](https://github.com/oracle/fastr)
  * [Faster R with FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
+ * [How to make Beautiful Ruby Plots with Galaaz](https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857)
+ * [Ruby Plotting with Galaaz: An example of tightly coupling Ruby and R in GraalVM](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021)
+ * [How to do reproducible research in Ruby with gKnit](https://towardsdatascience.com/how-to-do-reproducible-research-in-ruby-with-gknit-c26d2684d64e)
+ * [R for Data Science](https://r4ds.had.co.nz/)
+ * [Advanced R](https://adv-r.hadley.nz/)

- ### Programming with dplyr
+ ## Programming with dplyr

- According to Hardley (https://dplyr.tidyverse.org/articles/programming.html)
+ This post closely follows the article https://dplyr.tidyverse.org/articles/programming.html,
+ by Hadley Wickham. In it, Hadley states:

  > Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that
  > means they don’t follow the usual R rules of evaluation. Instead, they capture the
@@ -83,7 +103,7 @@ According to Hardley (https://dplyr.tidyverse.org/articles/programming.html)

  > Operations on data frames can be expressed succinctly because you don’t need to repeat
  > the name of the data frame. For example, you can write filter(df, x == 1, y == 2, z == 3)
- > instead of df[df$x == 1 & df$y ==2 & df$z == 3, ].
+ > instead of df[df\$x == 1 & df\$y ==2 & df\$z == 3, ].

  > dplyr can choose to compute results in a different way to base R. This is important for
  > database backends because dplyr itself doesn’t do any work, but instead generates the SQL
@@ -112,36 +132,49 @@ filter(df, my_var == 1)
  ```
  > This makes it hard to create functions with arguments that change how dplyr verbs are computed.

- In this post we will see that programming with dplyr in Galaaz does not require knowledge of
+ In this post we will see that programming with _dplyr_ in Galaaz does not require knowledge of
  non-standard evaluation in R and can be accomplished by utilizing normal Ruby constructs.

  # Writing Expressions in Galaaz

- Galaaz extends Ruby to work with complex expressions, similar to R's expressions build with 'quote'
- (base R) or 'quo' (tidyverse). Let's take a look at some of those expressions.
+ Galaaz extends Ruby to work with expressions, similar to R's expressions built with 'quote'
+ (base R) or 'quo' (tidyverse). Expressions in this context are like mathematical expressions or
+ formulae. For instance, in mathematics, the expression $y = sin(x)$ describes a function but cannot
+ be computed unless $x$ is bound to some value.
+
+ Let's take a look at some of those expressions in Ruby:

  ## Expressions from operators

- The code bellow
- creates an expression summing two symbols
+ The code below creates an expression summing two symbols. Note that :a and :b are Ruby symbols and
+ are not bound to any value at the time the expression is defined:

  ```{ruby expressions}
  exp1 = :a + :b
  puts exp1
  ```
- We can build any complex mathematical expression
+ We can build any complex mathematical expression, such as:

  ```{ruby expr2}
  exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
  puts exp2
  ```
+ The 'L' after the 2 in the output indicates that 2 is an integer in R.

- It is also possible to use inequality operators in building expressions
+ It is also possible to use inequality operators in building expressions:

  ```{ruby expr3}
  exp3 = (:a + :b) >= :z
  puts exp3
  ```
+ Expression definitions can also make use of normal Ruby variables without any problem:
+
+ ```{ruby expr_with_var}
+ x = 20
+ y = 30
+ exp_var = (:a + :b) * x <= :z - y
+ puts exp_var
+ ```

  Galaaz provides both symbolic representations for operators, such as (>, <, !=), and functional
  notation for those operators, such as (.gt, .ge, etc.). So the same expression written
@@ -152,8 +185,9 @@ exp4 = (:a + :b).ge :z
  puts exp4
  ```

- Two type of expression can only be created with the functional representation of the operators,
- those are expressions involving '==', and '='. In order to write an expression involving '==' we
+ Two types of expressions, however, can only be created with the functional representation
+ of the operators: expressions involving '==' and '='. In order to write an
+ expression involving '==' we
  need to use the method '.eq', and for '=' we need the method '.assign':

  ```{ruby expr5}
@@ -182,9 +216,8 @@ If we only use functional notation, this type of error will not occur.

  It is often necessary to create an expression that uses a method or function. For instance, in
  mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
- 'sin' function is part of the expression and should not immediately executed. Now, let's say
- that 'x' is an angle of 45$^\circ$ and we acttually want our expression to be $y = 0.850...$.
- When we want the function to be part of the expression, we call the function preceeding it
+ 'sin' function is part of the expression and should not be executed immediately. When we want
+ the function to be part of the expression, we call the function preceding it
  with the letter E, as in 'E.sin(x)':

  ```{ruby method_expression}
@@ -192,11 +225,48 @@ exp7 = :y.assign E.sin(:x)
  puts exp7
  ```

+ Expressions can also be written using '.' notation:
+
+ ```{ruby expression_with_dot}
+ exp8 = :y.assign :x.sin
+ puts exp8
+ ```
+
+ When a function has multiple arguments, the first one can be placed before the '.':
+
+ ```{ruby expression_multiple_args}
+ exp9 = :x.c(:y)
+ puts exp9
+ ```
+
+ ## Evaluating an Expression
+
+ Expressions can be evaluated by calling the method 'eval' with a binding. A binding can be provided
+ as a list:
+
+ ```{ruby eval_expression_list}
+ exp = (:a + :b) * 2.0 + :c ** 2 / :z
+ puts exp.eval(R.list(a: 10, b: 20, c: 30, z: 40))
+ ```
+
+ ... or as a data frame:
+
+ ```{ruby eval_expression_df}
+ df = R.data__frame(
+ a: R.c(1, 2, 3),
+ b: R.c(10, 20, 30),
+ c: R.c(100, 200, 300),
+ z: R.c(1000, 2000, 3000))
+
+ puts exp.eval(df)
+ ```
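The idea behind these unevaluated expressions can be illustrated without GraalVM. The following is a toy, plain-Ruby sketch and not Galaaz's implementation: the `Expr` struct, the `OPERATORS` list and the `evaluate` method are hypothetical names chosen for illustration, and a Ruby Hash stands in for the R list used as a binding. Arithmetic on symbols builds a tree instead of a value, and the tree is only computed once a binding is supplied.

```ruby
# Toy sketch, NOT Galaaz's implementation: arithmetic on Ruby Symbols builds
# an unevaluated expression tree that is only computed when given a binding.
OPERATORS = [:+, :-, :*, :/, :**].freeze

Expr = Struct.new(:op, :left, :right) do
  # Arithmetic on an existing node builds a larger node, not a number.
  OPERATORS.each do |operator|
    define_method(operator) { |other| Expr.new(operator, self, other) }
  end

  def to_s
    "(#{left} #{op} #{right})"
  end

  # Evaluate the tree against a binding: a Hash from Symbols to numbers.
  def evaluate(binding_hash)
    l = Expr.resolve(left, binding_hash)
    r = Expr.resolve(right, binding_hash)
    # Use real division, as R's '/' does, instead of Ruby's integer division.
    op == :/ ? l.fdiv(r) : l.public_send(op, r)
  end

  def self.resolve(node, binding_hash)
    case node
    when Expr then node.evaluate(binding_hash)
    when Symbol then binding_hash.fetch(node) # unbound names raise KeyError
    else node                                 # plain Ruby values are literals
    end
  end
end

# For this throwaway sketch we reopen Symbol globally so :a + :b builds an Expr.
class Symbol
  OPERATORS.each do |operator|
    define_method(operator) { |other| Expr.new(operator, self, other) }
  end
end

exp = (:a + :b) * 2.0 + :c ** 2 / :z
puts exp                                          # (((a + b) * 2.0) + ((c ** 2) / z))
puts exp.evaluate({ a: 10, b: 20, c: 30, z: 40 }) # 82.5
```

Keeping symbols unbound until `evaluate` is called is what lets the same tree be reused with different bindings, while an ordinary Ruby variable used in the expression contributes its current value at build time.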
+
  # Using Galaaz to call R functions

  Galaaz tries to emulate as closely as possible the way R functions are called, and migrating from
  R to Galaaz should be quite easy, requiring only minor syntactic changes to an R script. In
- this post, we do not have enough space to write a complety manual to Galaaz, so we will
+ this post, we do not have enough space to write a complete manual on Galaaz
+ (a short manual can be found at: https://www.rubydoc.info/gems/galaaz/0.4.9), so we will
  present only a few example scripts using Galaaz.

  Basically, to call an R function from Ruby with Galaaz, one only needs to precede the function
@@ -204,14 +274,14 @@ with 'R.'. For instance, to create a vector in R, the 'c' function is used. Fr
  vector can be created by using 'R.c':

  ```{ruby vector}
- @vec = R.c(1.0, 2, 3)
- puts @vec
+ vec = R.c(1.0, 2, 3)
+ puts vec
  ```
  A list is created in R with the 'list' function, so in Galaaz we do:

  ```{ruby list}
- @list = R.list(a: 1.0, b: 2, c: 3)
- puts @list
+ list = R.list(a: 1.0, b: 2, c: 3)
+ puts list
  ```
  Note that we can use named arguments in our list. The same code in R would be:

@@ -219,36 +289,39 @@ Note that we can use named arguments in our list. The same code in R would be:
  lst = list(a = 1, b = 2L, c = 3L)
  print(lst)
  ```
+ Now, let's say that we actually want the sine to be computed right away, for instance
+ $y = sin(45)$ with the angle given in radians (R's 'sin' expects radians), which is
+ $y = 0.850...$. In this case, we will use 'R.sin':

-
- However, if we want the function to be evaluated, then
- we use the normal call to function with R as 'R.sin(x)'.
-
- ```{ruby method_exc}
- x = 45
- exp8 = :y.assign R.sin(x)
- puts exp8
+ ```{ruby eval_sin}
+ exp10 = :y.assign R.sin(45)
+ puts exp10
  ```

-
  # Filtering using expressions

- Now that we now how to write expression, we can use then to filter a data frame by expressions.
- Let's first start by creating a simple data frame with two columns named 'x' and 'y'
+ Now that we know how to write expressions and call R functions, let's do some data manipulation in
+ Galaaz. Let's start by creating the same data frame that we created previously in section
+ "Programming with dplyr":

  ```{ruby df}
- @df = R.data__frame(x: (1..3), y: (3..1))
- puts @df
+ df = R.data__frame(x: (1..3), y: (3..1))
+ puts df
  ```
- In the code bellow we want to filter the data frame by rows in which the value of 'x' is
- equal to 1.
+ The 'filter' function can be called on this data frame either by using 'R.filter(df, ...)' or
+ by using dot notation. We prefer dot notation, as shown below. The argument to 'filter'
+ in Galaaz should be an expression. Note that if we passed filter a Ruby expression such as
+ 'x == 1', we would get an error, since there is no variable 'x' defined; and if 'x' were a variable,
+ then 'x == 1' would evaluate to either 'true' or 'false'. Our goal is to filter our data frame, returning
+ all rows in which the 'x' value is equal to 1. To express this we write ':x.eq 1', where :x will
+ be interpreted by filter as the 'x' column.

  ```{ruby filter_exp}
- puts @df.filter(:x.eq 1)
+ puts df.filter(:x.eq 1)
  ```

  In R, and when coding with 'tidyverse', arguments to a function are usually not
- *referencially transparent*. That is, ou can’t replace a value with a seemingly equivalent
+ *referentially transparent*. That is, you can’t replace a value with a seemingly equivalent
  object that you’ve defined elsewhere. In other words, this code

  ```{r not_transp, eval=FALSE}
@@ -257,15 +330,15 @@ filter(df, my_var == 1)
  ```
  Generates the following error: "object 'x' not found".

- However, in Ruby and Galaaz, arguments are referencially transparent as can be seen by the
- code bellow. Note, initally that 'my_var = :x' will not give the error "object 'x' not found"
+ However, in Galaaz, arguments are referentially transparent, as can be seen in the
+ code below. Note initially that 'my_var = :x' will not give the error "object 'x' not found"
  since ':x' is treated as an expression and assigned to my\_var. Then when doing (my\_var.eq 1),
  my\_var is a variable that resolves to ':x' and it becomes equivalent to (:x.eq 1), which is
  what we want.

  ```{ruby my_var}
  my_var = :x
- puts @df.filter(my_var.eq 1)
+ puts df.filter(my_var.eq 1)
  ```
  As stated by Hadley:

@@ -280,26 +353,30 @@ df[x == y, ]
  ```
  In Galaaz this ambiguity does not exist: filter(df, x.eq y) is not a valid expression, as
  expressions are built with symbols. In doing filter(df, :x.eq y) we are looking for elements
- of the 'x' column that are equal to a previously defined y variable. Finally,
+ of the 'x' column that are equal to a previously defined y variable. Finally, in
  filter(df, :x.eq :y) we are looking for elements in which the 'x' column value is equal to
  the 'y' column value. This can be seen in the following two chunks of code:

  ```{ruby disamb1}
- @y = 1
- @x = 2
+ y = 1
+ x = 2

  # looking for values where the 'x' column is equal to the 'y' column
- puts @df.filter(:x.eq :y)
+ puts df.filter(:x.eq :y)
  ```

  ```{ruby disamb2}
  # looking for values where the 'x' column is equal to the 'y' variable
  # in this case, the number 1
- puts @df.filter(:x.eq @y)
+ puts df.filter(:x.eq y)
  ```
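The disambiguation rule above (a symbol names a column, a Ruby variable supplies a value) can be mimicked in plain Ruby. In this toy sketch, which is not Galaaz's code, a data frame is an Array of Hashes, and the `Eq` struct, `Symbol#eq` and `filter_rows` are hypothetical helpers that build and then apply a predicate:

```ruby
# Toy sketch, NOT Galaaz's implementation. A "data frame" is an Array of
# Hashes keyed by column Symbols; Symbol#eq returns a predicate, not a boolean.
Eq = Struct.new(:column, :operand)

class Symbol
  # :x.eq(1) or :x.eq(:y) -> an unevaluated "column == operand" predicate.
  def eq(operand)
    Eq.new(self, operand)
  end
end

def filter_rows(rows, predicate)
  rows.select do |row|
    lhs = row.fetch(predicate.column) # a missing column raises KeyError
    # A Symbol operand names another column; anything else is a plain value.
    rhs = predicate.operand.is_a?(Symbol) ? row.fetch(predicate.operand) : predicate.operand
    lhs == rhs
  end
end

df = [{ x: 1, y: 3 }, { x: 2, y: 2 }, { x: 3, y: 1 }]

my_var = :x
y = 1
p filter_rows(df, my_var.eq(1)) # same rows as filter_rows(df, :x.eq(1))
p filter_rows(df, :x.eq(:y))    # 'x' column equal to 'y' column
p filter_rows(df, :x.eq(y))     # 'x' column equal to the Ruby variable y
```

Because `my_var` simply holds the symbol `:x`, substituting it for `:x` changes nothing, which is the referential transparency discussed above.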
  # Writing a function that applies to different data sets

+ Let's suppose that we want to write a function that receives a data frame as its argument
+ and adds to it a column 'y' that is equal to the
+ sum of the elements in column 'a' plus those in column 'x'.

+ Here is the intended behaviour using the 'mutate' function of 'dplyr':

  ```
  mutate(df1, y = a + x)
@@ -307,28 +384,49 @@ mutate(df2, y = a + x)
  mutate(df3, y = a + x)
  mutate(df4, y = a + x)
  ```
+ The naive approach to writing an R function to solve this problem is:

- Here we create a mutate_y Ruby method.
+ ```
+ mutate_y <- function(df) {
+ mutate(df, y = a + x)
+ }
+ ```
+ Unfortunately, in R, this function can fail silently if one of the variables isn’t present
+ in the data frame but is present in the global environment. We will not go through how
+ to solve this problem in R here.
+
+ In Galaaz, the method mutate_y below works fine and will never fail silently.

  ```{ruby mutate_y, warning=FALSE}
  def mutate_y(df)
  df.mutate(:y.assign :a + :x)
  end
  ```
+ Here we create a data frame that has only one column, named 'x':

- Note that contrary to what happens in R, method mutate_y will fail independetly from the fact
- that variable 'a' is defined or not.
-
- ```{ruby call_mutate, warning=FALSE}
+ ```{ruby data_frame_no_a_column, warning=FALSE}
  df1 = R.data__frame(x: (1..3))
  puts df1
+ ```
+
+ Note that calling mutate_y on this data frame fails even though a variable 'a' is defined
+ in the calling scope. Variable 'a' has no relationship with the symbol ':a' used in the
+ definition of 'mutate\_y' above:
+
+ ```{ruby call_mutate_y, warning = FALSE}
  a = 10
  mutate_y(df1)
  ```
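Why a missing column is never silently replaced by a local variable can be seen in a plain-Ruby analogue. The `mutate_y_rows` helper below is hypothetical and is not Galaaz's `mutate`: rows are Hashes keyed by column symbols, and columns are looked up only in each row, so the local `a` never leaks in.

```ruby
# Toy analogue, NOT Galaaz's mutate: each row is a Hash keyed by column Symbol,
# and columns are looked up only in the row itself, never in local variables.
def mutate_y_rows(rows)
  rows.map { |row| row.merge(y: row.fetch(:a) + row.fetch(:x)) }
end

a = 10 # a local variable named 'a'; it is invisible inside mutate_y_rows

p mutate_y_rows([{ a: 1, x: 2 }]) # works: both columns exist

begin
  mutate_y_rows([{ x: 1 }, { x: 2 }, { x: 3 }])
rescue KeyError => e
  puts "error, as expected: #{e.message}" # column :a is missing
end
```

Ruby methods do not see the caller's local variables, so the failure is loud (a `KeyError`) rather than a silent fallback to `a = 10`.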
-
  # Different expressions

+ Let's move on to the next problem presented by Hadley: writing an R function that receives two
+ arguments, the first a variable and the second an expression, is not trivial.
+ Below we create a data frame; we want to write a function that groups the data by a variable and
+ summarises it by an expression:
+
  ```{r diff_expr}
+ set.seed(123)
+
  df <- data.frame(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
@@ -336,6 +434,8 @@ df <- data.frame(
  b = sample(5)
  )

+ as.data.frame(df)
+
  d2 <- df %>%
  group_by(g1) %>%
  summarise(a = mean(a))
@@ -349,9 +449,7 @@ d2 <- df %>%
  as.data.frame(d2)
  ```

- Trying to write a function in R that will receive two argumens, the first a variable and
- the second an expression is not trivia. As shown by Hardley, one might expect this function
- to do the trick:
+ As shown by Hadley, one might expect this function to do the trick:

  ```{r diff_exp_fnc}
  my_summarise <- function(df, group_var) {
@@ -365,11 +463,13 @@ my_summarise <- function(df, group_var) {
  ```

  In order to solve this problem, coding with dplyr requires the introduction of many new concepts
- and functions such as 'quo', 'quos', 'enquo', 'enquos', '!!' (bang bang), '!!!' (triple bang).
+ and functions such as 'quo', 'quos', 'enquo', 'enquos', '!!' (bang bang), '!!!' (triple bang).
+ Again, we'll leave the explanation of how to use all those functions to Hadley.

  Now, let's try to implement the same function in Galaaz. The next code block first prints the
- 'df' data frame define previously in R, then creates the my_summarize function and calls it
- passing the R data frame and the group by variable ':g1'
+ 'df' data frame defined previously in R (to access an R variable from Galaaz, we apply the tilde
+ operator '~' to the R variable name given as a symbol, i.e., '~:df'). We then create the
+ 'my_summarize' method and call it, passing the R data frame and the group-by variable ':g1':

  ```{ruby diff_exp_ruby_func}
  puts ~:df
@@ -377,15 +477,15 @@ print "\n"

  def my_summarize(df, group_var)
  df.group_by(group_var).
- summarize(a: E.mean(:a))
+ summarize(a: :a.mean)
  end

- puts my_summarize((~:df), :g1).as__data__frame
+ puts my_summarize(:df, :g1).as__data__frame
  ```
- It works!!! Well let's make sure this was not just some coincidence
+ It works!!! Well, let's make sure this was not just a coincidence:

  ```{ruby group_g2}
- puts my_summarize((~:df), :g2).as__data__frame
+ puts my_summarize(:df, :g2).as__data__frame
  ```

  Great, everything is fine! No magic, no new functions, no complexities, just normal, standard Ruby
@@ -394,7 +494,7 @@ code. If you've ever done NSE in R, this certainly feels much safer and easy to
  # Different input variables

  In the previous section we've managed to get rid of all NSE formulation for a simple example, but
- does this remain true for more complex examples, or will the Ruby way prove inpractical for
+ does this remain true for more complex examples, or will the Galaaz way prove impractical for
  more complex code?

  In the next example Hadley proposes that we write a function that given an expression such as 'a'
@@ -412,7 +512,7 @@ summarise(df, mean = mean(a * b), sum = sum(a * b), n = n())
  #> # A tibble: 1 x 3
  #> mean sum n
  #> <dbl> <int> <int>
- #> 1 9.6 48 5
+ #> 1 9 45 5
  ```

  Let's try it in Galaaz:
@@ -431,7 +531,7 @@ puts my_summarise2((~:df), :a * :b)
  ```

  Once again, there is no need to use any special theory or functions. The only point to be
- careful about is the use of 'E' to build an expression that uses the mean, sum and n.
+ careful about is the use of 'E' to build expressions from the functions 'mean', 'sum' and 'n'.

  # Different input and output variable

@@ -461,8 +561,10 @@ mutate(df, mean_b = mean(b), sum_b = sum(b))
  #> 4 2 2 5 4 3 15
  #> # … with 1 more row
  ```
+ To solve this problem in R, Hadley needs to introduce even more new functions and notation:
+ 'quo_name' and the ':=' operator from package 'rlang'.

- Here is our Ruby code
+ Here is our Ruby code:

  ```{ruby name_change}
  def my_mutate(df, expr)
@@ -486,6 +588,12 @@ and variable mean\_name is not followed by ':' but by '=>'. This is standard Ru
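The distinction noted above, where mean\_name is followed by '=>' rather than ':', is plain Ruby hash syntax, and it is what allows the output column name to be computed at run time. A minimal illustration, with a hypothetical column name:

```ruby
mean_name = "mean_a" # a hypothetical, computed column name

# With '=>', the key is the *value* of the variable mean_name ...
dynamic_key = { mean_name => 1.5 }
# ... while 'mean_name:' always creates the literal Symbol key :mean_name.
literal_key = { mean_name: 1.5 }

p dynamic_key.keys # the String "mean_a"
p literal_key.keys # the Symbol :mean_name
```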

  # Capturing multiple variables

+ Moving on to new complexities, Hadley proposes the problem in which the
+ summarise function receives any number of grouping variables.
+
+ This again is quite standard Ruby. To receive a variable number of parameters,
+ the parameter is preceded by '*' (the splat operator):
+
  ```{ruby multiple_vars}
  def my_summarise3(df, *group_vars)
  df.group_by(*group_vars).
@@ -495,12 +603,58 @@ end
  puts my_summarise3((~:df), :g1, :g2).as__data__frame
  ```
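The splat mechanism used by my\_summarise3 can be seen in isolation below. Both methods are hypothetical: `collect_group_vars` is a stand-in for `df.group_by` that simply returns what it received, so the forwarding is visible.

```ruby
# Hypothetical stand-in for df.group_by: it just returns what it received.
def collect_group_vars(*group_vars)
  group_vars
end

# In a definition, *group_vars gathers any number of trailing arguments
# into an Array; at a call site, *group_vars spreads that Array back out.
def my_summarise3_sketch(df_name, *group_vars)
  [df_name, collect_group_vars(*group_vars)]
end

p my_summarise3_sketch(:df, :g1, :g2)
p my_summarise3_sketch(:df)
```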

+ # Why does R require NSE and Galaaz does not?
+
+ NSE introduces a number of new concepts, such as 'quoting', 'quasiquotation', 'unquoting' and
+ 'unquote-splicing', while in Galaaz none of those concepts are needed. What gives?
+
+ R is an extremely flexible language and it evaluates function arguments lazily. When an R
+ function is called as 'summarise(df, a = b)', the summarise function can capture the literal
+ 'a = b' argument and work with it as unevaluated code, much as if it were a string. In R, it is
+ not clear what a and b are: they can be expressions or they can be variables, and it is up to the
+ function to decide what 'a = b' means.
+
+ In Ruby, there is no lazy evaluation of parameters: 'a' is always a variable and so is 'b'.
+ Variables yield their value as soon as they are used, so 'x = a' is evaluated immediately and
+ variable 'x' receives the value of variable 'a' as soon as the Ruby statement is executed.
+ Ruby also provides the notion of a symbol; ':a' is a symbol and does not evaluate to anything else.
+ Galaaz uses Ruby symbols to build expressions that are not bound to anything: ':a.eq :b' is
+ clearly an expression and has no relationship whatsoever with the statement 'a = b'. By clearly
+ separating symbols, variables and expressions, Galaaz eliminates all the ambiguities that are
+ found in R.
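The eager evaluation described above is easy to observe in plain Ruby. The `show_argument` method below is a hypothetical helper that simply reports what it received:

```ruby
# Ruby evaluates every argument before the method body runs, so a method can
# only ever see values, never the code that produced them.
def show_argument(arg)
  arg.inspect
end

a = 1
b = 2
x = a # x receives the value 1 right now
a = 5 # changing a afterwards does not affect x

puts show_argument(a == b) # the method sees only false, not "a == b"
puts show_argument(x)      # 1
puts show_argument(:a)     # a Symbol is just a name: :a
```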
+
+ The main problem that remains is that R functions do not always document clearly what type
+ of input they expect: they might expect regular values or they might expect expressions,
+ and the R function will know how to deal with an input of the form 'a = b'. For the Ruby
+ developer, however, it might not be immediately clear whether to call the function passing
+ the value 'true' (if variable 'a' is equal to variable 'b') or passing the expression
+ ':a.eq :b'.
+
+
  # Advanced dplyr features
- https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/
+
+ In the blog post "Programming with dplyr by using dplyr"
+ (https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/), Iñaki Úcar expresses surprise
+ that some R users are trying to code in dplyr while avoiding the use of NSE. For instance, he says:
+
+ > Take the example of seplyr. It stands for standard evaluation dplyr, and enables us to
+ > program over dplyr without having “to bring in (or study) any deep-theory or
+ > heavy-weight tools such as rlang/tidyeval”.
+
+ For me, it is not really surprising that users try to avoid dplyr's deep theory. R
+ users are frequently not programmers, and learning to code is already hard business; on top
+ of that, having to learn how to 'quote' or 'enquo' or 'quos' or 'enquos' is not necessarily
+ a 'piece of cake'. So much so that dplyr provides some more advanced functions that, instead
+ of quoted expressions, take strings as arguments.
+
+ In the following examples, we show the use of the functions 'group\_by\_at', 'summarise\_at' and
+ 'rename\_at', which receive strings as arguments. The data frame used is 'starwars', which describes
+ features of characters in the Star Wars movies:

  ```{ruby starwars}
  puts (~:starwars).head.as__data__frame
  ```
+ The grouped_mean function below receives grouping variables and calculates summaries for
+ the given value\_variables:

  ```{r grouped_mean}
  grouped_mean <- function(data, grouping_variables, value_variables) {
@@ -517,14 +671,26 @@ gm = starwars %>%
  as.data.frame(gm)
  ```

+ The same code in Galaaz becomes:
+
  ```{ruby advanced_starwars}
  def grouped_mean(data, grouping_variables, value_variables)
  data.
  group_by_at(grouping_variables).
  mutate(count: E.n).
  summarise_at(E.c(value_variables, "count"), ~:mean, na__rm: true).
- rename_at(value_variables, R.funs(E.paste0("mean_", value_variables)))
+ rename_at(value_variables, E.funs(E.paste0("mean_", value_variables)))
  end

- puts grouped_mean((~:starwars), "eye_color", R.c("mass", "birth_year")).as__data__frame
+ puts grouped_mean((~:starwars), "eye_color", E.c("mass", "birth_year")).as__data__frame
  ```
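For readers without an R installation, the computation performed by grouped_mean can be approximated in plain Ruby. This is a hypothetical analogue, not a Galaaz or dplyr API: rows are Hashes keyed by column-name strings (mirroring the string arguments of the '\_at' functions), `nil` stands in for NA, and the sample rows are made up rather than taken from the real 'starwars' data.

```ruby
# Plain-Ruby analogue of grouped_mean (no R, no Galaaz). Rows are Hashes keyed
# by column-name Strings; nil plays the role of R's NA.
def grouped_mean(rows, grouping_variable, value_variables)
  rows.group_by { |row| row[grouping_variable] }.map do |key, group|
    # Like mutate(count = n()) followed by the mean of that constant column.
    summary = { grouping_variable => key, "count" => group.size }
    value_variables.each do |var|
      values = group.map { |row| row[var] }.compact # like na.rm = TRUE
      # Renamed with a "mean_" prefix, like rename_at(..., paste0("mean_", ...)).
      summary["mean_#{var}"] = values.empty? ? nil : values.sum.fdiv(values.size)
    end
    summary
  end
end

# Made-up sample rows, not the real starwars data.
rows = [
  { "eye_color" => "blue",  "mass" => 10, "birth_year" => nil },
  { "eye_color" => "blue",  "mass" => 20, "birth_year" => 30 },
  { "eye_color" => "brown", "mass" => 5,  "birth_year" => 40 }
]

grouped_mean(rows, "eye_color", ["mass", "birth_year"]).each { |summary| p summary }
```

Unlike R, which returns NaN for the mean of an all-NA group when `na.rm = TRUE`, this sketch returns `nil` in that case.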
+
+ # Conclusion
+
+ Ruby and Galaaz provide a nice framework for developing code that uses R functions. Although R is
+ a very powerful and flexible language, sometimes too much flexibility makes life harder for
+ the casual user. We believe, however, that even for the advanced user, Ruby integrated
+ with R through Galaaz makes a powerful environment for data analysis. In this blog post we
+ showed how Galaaz's consistent syntax eliminates the need for complex constructs such as quoting,
+ enquoting, quasiquotation, etc. This simplification comes from the fact that expressions and
+ variables are clearly separated objects, which is not the case in the R language.