galaaz 0.4.6 → 0.5.0

Files changed (181)
  1. checksums.yaml +5 -5
  2. data/README.md +3575 -118
  3. data/Rakefile +21 -4
  4. data/bin/gknit +152 -6
  5. data/bin/gknit-draft +105 -0
  6. data/bin/gknit-draft.rb +28 -0
  7. data/bin/gknit_Rscript +127 -0
  8. data/bin/grun +27 -1
  9. data/bin/gstudio +47 -4
  10. data/bin/{gstudio.rb → gstudio_irb.rb} +0 -0
  11. data/bin/gstudio_pry.rb +7 -0
  12. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +3 -12
  13. data/blogs/galaaz_ggplot/galaaz_ggplot.html +77 -222
  14. data/blogs/galaaz_ggplot/galaaz_ggplot.md +4 -31
  15. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  16. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
  17. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
  18. data/blogs/galaaz_ggplot/midwest.Rmd +1 -9
  19. data/blogs/gknit/gknit.Rmd +232 -123
  20. data/blogs/{dev/dev.html → gknit/gknit.html} +1897 -33
  21. data/blogs/gknit/gknit.pdf +0 -0
  22. data/blogs/gknit/lst.rds +0 -0
  23. data/blogs/gknit/stats.bib +27 -0
  24. data/blogs/manual/lst.rds +0 -0
  25. data/blogs/manual/manual.Rmd +1893 -47
  26. data/blogs/manual/manual.html +3153 -347
  27. data/blogs/manual/manual.md +3575 -118
  28. data/blogs/manual/manual.pdf +0 -0
  29. data/blogs/manual/manual.tex +4026 -0
  30. data/blogs/manual/manual_files/figure-html/bubble-1.png +0 -0
  31. data/blogs/manual/manual_files/figure-html/diverging_bar.png +0 -0
  32. data/blogs/manual/manual_files/figure-latex/bubble-1.png +0 -0
  33. data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
  34. data/blogs/{dev → manual}/model.rb +0 -0
  35. data/blogs/nse_dplyr/nse_dplyr.Rmd +849 -0
  36. data/blogs/nse_dplyr/nse_dplyr.html +878 -0
  37. data/blogs/nse_dplyr/nse_dplyr.md +1198 -0
  38. data/blogs/nse_dplyr/nse_dplyr.pdf +0 -0
  39. data/blogs/oh_my/oh_my.html +274 -386
  40. data/blogs/oh_my/oh_my.md +208 -205
  41. data/blogs/ruby_plot/ruby_plot.Rmd +64 -84
  42. data/blogs/ruby_plot/ruby_plot.html +235 -208
  43. data/blogs/ruby_plot/ruby_plot.md +239 -34
  44. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  46. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  48. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  49. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  50. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  51. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  52. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  53. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  54. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  55. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  56. data/examples/Bibliography/master.bib +50 -0
  57. data/examples/Bibliography/stats.bib +72 -0
  58. data/examples/islr/ch2.spec.rb +1 -1
  59. data/examples/islr/ch3_boston.rb +4 -4
  60. data/examples/islr/x_y_rnorm.jpg +0 -0
  61. data/examples/latex_templates/Test-acm_article/Makefile +16 -0
  62. data/examples/latex_templates/Test-acm_article/Test-acm_article.Rmd +65 -0
  63. data/examples/latex_templates/Test-acm_article/acm_proc_article-sp.cls +1670 -0
  64. data/examples/latex_templates/Test-acm_article/sensys-abstract.cls +703 -0
  65. data/examples/latex_templates/Test-acm_article/sigproc.bib +59 -0
  66. data/examples/latex_templates/Test-acs_article/Test-acs_article.Rmd +260 -0
  67. data/examples/latex_templates/Test-acs_article/Test-acs_article.pdf +0 -0
  68. data/examples/latex_templates/Test-acs_article/acs-Test-acs_article.bib +11 -0
  69. data/examples/latex_templates/Test-acs_article/acs-my_output.bib +11 -0
  70. data/examples/latex_templates/Test-acs_article/acstest.bib +17 -0
  71. data/examples/latex_templates/Test-aea_article/AEA.cls +1414 -0
  72. data/examples/latex_templates/Test-aea_article/BibFile.bib +0 -0
  73. data/examples/latex_templates/Test-aea_article/Test-aea_article.Rmd +108 -0
  74. data/examples/latex_templates/Test-aea_article/Test-aea_article.pdf +0 -0
  75. data/examples/latex_templates/Test-aea_article/aea.bst +1269 -0
  76. data/examples/latex_templates/Test-aea_article/multicol.sty +853 -0
  77. data/examples/latex_templates/Test-aea_article/references.bib +0 -0
  78. data/examples/latex_templates/Test-aea_article/setspace.sty +546 -0
  79. data/examples/latex_templates/Test-amq_article/Test-amq_article.Rmd +256 -0
  80. data/examples/latex_templates/Test-amq_article/Test-amq_article.pdf +0 -0
  81. data/examples/latex_templates/Test-amq_article/Test-amq_article.pdfsync +3397 -0
  82. data/examples/latex_templates/Test-amq_article/pics/Figure2.pdf +0 -0
  83. data/examples/latex_templates/Test-ams_article/Test-ams_article.Rmd +215 -0
  84. data/examples/latex_templates/Test-ams_article/amstest.bib +436 -0
  85. data/examples/latex_templates/Test-asa_article/Test-asa_article.Rmd +153 -0
  86. data/examples/latex_templates/Test-asa_article/Test-asa_article.pdf +0 -0
  87. data/examples/latex_templates/Test-asa_article/agsm.bst +1353 -0
  88. data/examples/latex_templates/Test-asa_article/bibliography.bib +233 -0
  89. data/examples/latex_templates/Test-ieee_article/IEEEtran.bst +2409 -0
  90. data/examples/latex_templates/Test-ieee_article/IEEEtran.cls +6346 -0
  91. data/examples/latex_templates/Test-ieee_article/Test-ieee_article.Rmd +175 -0
  92. data/examples/latex_templates/Test-ieee_article/Test-ieee_article.pdf +0 -0
  93. data/examples/latex_templates/Test-ieee_article/mybibfile.bib +20 -0
  94. data/examples/latex_templates/Test-rjournal_article/RJournal.sty +335 -0
  95. data/examples/latex_templates/Test-rjournal_article/RJreferences.bib +18 -0
  96. data/examples/latex_templates/Test-rjournal_article/RJwrapper.pdf +0 -0
  97. data/examples/latex_templates/Test-rjournal_article/Test-rjournal_article.Rmd +52 -0
  98. data/examples/latex_templates/Test-springer_article/Test-springer_article.Rmd +65 -0
  99. data/examples/latex_templates/Test-springer_article/Test-springer_article.pdf +0 -0
  100. data/examples/latex_templates/Test-springer_article/bibliography.bib +26 -0
  101. data/examples/latex_templates/Test-springer_article/spbasic.bst +1658 -0
  102. data/examples/latex_templates/Test-springer_article/spmpsci.bst +1512 -0
  103. data/examples/latex_templates/Test-springer_article/spphys.bst +1443 -0
  104. data/examples/latex_templates/Test-springer_article/svglov3.clo +113 -0
  105. data/examples/latex_templates/Test-springer_article/svjour3.cls +1431 -0
  106. data/examples/misc/moneyball.rb +1 -1
  107. data/examples/misc/subsetting.rb +37 -37
  108. data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.Rmd +73 -0
  109. data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.pdf +0 -0
  110. data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.Rmd +382 -0
  111. data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.pdf +0 -0
  112. data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.Rmd +164 -0
  113. data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.pdf +0 -0
  114. data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.Rmd +92 -0
  115. data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.pdf +0 -0
  116. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/attend-grade-relationships.csv +482 -0
  117. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.Rmd +280 -0
  118. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.pdf +0 -0
  119. data/examples/rmarkdown/svm-xaringan-example/svm-xaringan-example.Rmd +386 -0
  120. data/lib/R_interface/r.rb +2 -2
  121. data/lib/R_interface/r_libs.R +6 -1
  122. data/lib/R_interface/r_methods.rb +12 -2
  123. data/lib/R_interface/rdata_frame.rb +8 -17
  124. data/lib/R_interface/rindexed_object.rb +1 -2
  125. data/lib/R_interface/rlist.rb +1 -0
  126. data/lib/R_interface/robject.rb +20 -23
  127. data/lib/R_interface/rpkg.rb +15 -6
  128. data/lib/R_interface/rsupport.rb +13 -19
  129. data/lib/R_interface/ruby_extensions.rb +14 -18
  130. data/lib/R_interface/rvector.rb +0 -12
  131. data/lib/gknit.rb +2 -0
  132. data/lib/gknit/draft.rb +105 -0
  133. data/lib/gknit/knitr_engine.rb +6 -37
  134. data/lib/util/exec_ruby.rb +22 -84
  135. data/lib/util/inline_file.rb +7 -3
  136. data/specs/figures/bg.jpeg +0 -0
  137. data/specs/figures/bg.png +0 -0
  138. data/specs/figures/bg.svg +2 -2
  139. data/specs/figures/dose_len.png +0 -0
  140. data/specs/figures/no_args.jpeg +0 -0
  141. data/specs/figures/no_args.png +0 -0
  142. data/specs/figures/no_args.svg +2 -2
  143. data/specs/figures/width_height.jpeg +0 -0
  144. data/specs/figures/width_height.png +0 -0
  145. data/specs/figures/width_height_units1.jpeg +0 -0
  146. data/specs/figures/width_height_units1.png +0 -0
  147. data/specs/figures/width_height_units2.jpeg +0 -0
  148. data/specs/figures/width_height_units2.png +0 -0
  149. data/specs/r_dataframe.spec.rb +184 -11
  150. data/specs/r_list.spec.rb +4 -4
  151. data/specs/r_list_apply.spec.rb +11 -10
  152. data/specs/ruby_expression.spec.rb +3 -11
  153. data/specs/tmp.rb +106 -34
  154. data/version.rb +1 -1
  155. metadata +96 -33
  156. data/bin/gknit_old_r +0 -236
  157. data/blogs/dev/dev.Rmd +0 -77
  158. data/blogs/dev/dev.md +0 -87
  159. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  160. data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
  161. data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
  162. data/blogs/dplyr/dplyr.rb +0 -63
  163. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +0 -43
  164. data/blogs/galaaz_ggplot/galaaz_ggplot.log +0 -640
  165. data/blogs/galaaz_ggplot/galaaz_ggplot.out +0 -10
  166. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +0 -481
  167. data/blogs/galaaz_ggplot/midwest.png +0 -0
  168. data/blogs/galaaz_ggplot/scatter_plot.png +0 -0
  169. data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +0 -662
  170. data/blogs/ruby_plot/ruby_plot.tex +0 -1077
  171. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +0 -57
  172. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +0 -106
  173. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +0 -110
  174. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +0 -174
  175. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +0 -236
  176. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +0 -296
  177. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +0 -236
  178. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +0 -218
  179. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +0 -128
  180. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +0 -150
  181. data/examples/paper/paper.rb +0 -36
File without changes
@@ -0,0 +1,849 @@
1
+ ---
2
+ title: "Non Standard Evaluation in dplyr with Galaaz"
3
+ author:
4
+ - "Rodrigo Botafogo"
5
+ - "Daniel Mossé - University of Pittsburgh"
6
+ tags: [Tech, Data Science, Ruby, R, GraalVM]
7
+ date: "10/05/2019"
8
+ output:
9
+ html_document:
10
+ self_contained: true
11
+ keep_md: true
12
+ pdf_document:
13
+ includes:
14
+ in_header: ["../../sty/galaaz.sty"]
15
+ number_sections: yes
16
+ toc: true
17
+ toc_depth: 2
18
+ md_document:
19
+ variant: markdown_github
20
+ fontsize: 11pt
21
+ ---
22
+
23
+ ```{r setup, echo=FALSE, message = FALSE}
24
+ #R.options(crayon__enabled: false)
25
+ options(crayon.enabled = FALSE)
26
+ library('dplyr')
27
+ library('tibble')
28
+ ```
29
+
30
+ # Introduction
31
+
32
+ According to Steven Sagaert's answer on Quora to "Is programming language R overrated?":
33
+
34
+ > R is a sophisticated language with an unusual (i.e. non-mainstream) set of features. It‘s
35
+ > an impure functional programming language with sophisticated metaprogramming and 3
36
+ > different OO systems.
37
+
38
+ > Just like common lisp you can completely customise how things work via metaprogramming.
39
+ > The biggest example is the tidyverse: by creating it’s own evaluation system (tidyeval)
40
+ > was able to create a custom syntax for dplyr.
41
+
42
+ > Mastering R (the language) and its ecosystem is not a matter of weeks or months but
43
+ > takes years. The rabbit hole goes pretty deep…
44
+
45
+ Although having a highly configurable language might give extreme power to the programmer,
46
+ it can also, as stated above, take years to master. Programming with _dplyr_,
47
+ for instance, requires learning a set of complex concepts and rules that are not easily
48
+ accessible to casual users or _unsophisticated_ programmers, as many users of R are.
49
+ _Unsophisticated_ is NOT used here in a negative sense: R was built for statisticians, not
50
+ programmers, who need to solve real problems, often in a short time span, and who are not
51
+ concerned with creating complex computer systems.
52
+
53
+ Unfortunately, if this _unsophisticated_ programmer decides to move on to more sophisticated
54
+ coding, the learning curve might become a serious impediment.
55
+
56
+ In this post we will see how to program with _dplyr_ in Galaaz and how Ruby can simplify
57
+ the learning curve of mastering _dplyr_ coding.
58
+
59
+ # But first, what is Galaaz?
60
+
61
+ Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with
62
+ a large community, a very large set of libraries and great for web development. It is also
63
+ easy to learn. However,
64
+ it lacks libraries for data science, statistics, scientific plotting and machine learning.
65
+ On the other hand, R is considered one of the most powerful languages for solving all of the
66
+ above problems. Maybe the strongest competitor to R is Python with libraries such as NumPy,
67
+ Pandas, SciPy, SciKit-Learn and many more. We will not get into the R versus Python
68
+ discussion here; both are excellent languages with powerful features, benefits and drawbacks.
69
+ Our interest is to bring to yet another excellent language, Ruby, the data science libraries
70
+ that it lacks.
71
+
72
+ With Galaaz we do not intend to re-implement any of the scientific libraries in R. However, we
73
+ allow for very tight coupling between the two languages to the point that the Ruby
74
+ developer does not need to know that there is an R engine running. Also, from the point of
75
+ view of the R user/developer, Galaaz looks a lot like R, with just minor syntactic differences,
76
+ so there is almost no learning curve for the R developer. And, as we will see in this
77
+ post, programming with _dplyr_ is easier in Galaaz than in R.
78
+
79
+ R users are probably quite knowledgeable about _dplyr_. For the Ruby developer, _dplyr_ and
80
+ the _tidyverse_ libraries are a set of libraries for data manipulation in R, developed by
81
+ Hadley Wickham, chief scientist at RStudio and a prolific R coder and writer.
82
+
83
+ For the coupling of Ruby and R, we use new technologies provided by Oracle: GraalVM,
84
+ TruffleRuby and FastR. The GraalVM home page had the following definition:
85
+
86
+ GraalVM is a universal virtual machine for running applications
87
+ written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java,
88
+ Scala, Kotlin, and LLVM-based languages such as C and C++.
89
+
90
+ GraalVM removes the isolation between programming languages and enables
91
+ interoperability in a shared runtime. It can run either standalone or in
92
+ the context of OpenJDK, Node.js, Oracle Database, or MySQL.
93
+
94
+ GraalVM allows you to write polyglot applications with a seamless way to
95
+ pass values from one language to another. With GraalVM there is no copying
96
+ or marshaling necessary as it is with other polyglot systems. This lets
97
+ you achieve high performance when language boundaries are crossed. Most
98
+ of the time there is no additional cost for crossing a language boundary
99
+ at all.
100
+
101
+ Often developers have to make uncomfortable compromises that require them
102
+ to rewrite their software in other languages. For example:
103
+
104
+ * “That library is not available in my language. I need to rewrite it.”
105
+ * “That language would be the perfect fit for my problem, but we cannot
106
+ run it in our environment.”
107
+ * “That problem is already solved in my language, but the language is
108
+ too slow.”
109
+
110
+ With GraalVM we aim to allow developers to freely choose the right language
111
+ for the task at hand without making compromises.
112
+
113
+
114
+ # Tidyverse and dplyr
115
+
116
+ In [What is the tidyverse?](https://rviews.rstudio.com/2017/06/08/what-is-the-tidyverse/) the
117
+ tidyverse is explained as follows:
118
+
119
+ > The tidyverse is a coherent system of packages for data manipulation, exploration and
120
+ > visualization that share a common design philosophy. These were mostly developed by
121
+ > Hadley Wickham himself, but they are now being expanded by several contributors. Tidyverse
122
+ > packages are intended to make statisticians and data scientists more productive by
123
+ > guiding them through workflows that facilitate communication, and result in reproducible
124
+ > work products. Fundamentally, the tidyverse is about the connections between the tools
125
+ > that make the workflow possible.
126
+
127
+ _dplyr_ is one of the many packages that are part of the tidyverse. It is:
128
+
129
+ > a grammar of data manipulation, providing a consistent set of verbs that help you solve
130
+ > the most common data manipulation challenges:
131
+
132
+ > 1. mutate() adds new variables that are functions of existing variables
133
+ > 2. select() picks variables based on their names.
134
+ > 3. filter() picks cases based on their values.
135
+ > 4. summarise() reduces multiple values down to a single summary.
136
+ > 5. arrange() changes the ordering of the rows.
137
+
138
+ Very often R is used interactively and users use _dplyr_ to manipulate a single dataset
139
+ without programming. When users want to replicate their work for
140
+ multiple datasets, programming becomes necessary.
141
+
142
+ # Programming with dplyr
143
+
144
+ In the vignette ["Programming with dplyr"](https://dplyr.tidyverse.org/articles/programming.html),
145
+ Hadley Wickham states:
146
+
147
+ > Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that
148
+ > means they don’t follow the usual R rules of evaluation. Instead, they capture the
149
+ > expression that you typed and evaluate it in a custom way. This has two main
150
+ > benefits for dplyr code:
151
+
152
+ > Operations on data frames can be expressed succinctly because you don’t need to repeat
153
+ > the name of the data frame. For example, you can write filter(df, x == 1, y == 2, z == 3)
154
+ > instead of df[df\$x == 1 & df\$y ==2 & df\$z == 3, ].
155
+
156
+ > dplyr can choose to compute results in a different way to base R. This is important for
157
+ > database backends because dplyr itself doesn’t do any work, but instead generates the SQL
158
+ > that tells the database what to do.
159
+
160
+ But then he goes on:
161
+
162
+ > Unfortunately these benefits do not come for free. There are two main drawbacks:
163
+
164
+ > Most dplyr arguments are not referentially transparent. That means you can’t replace a value
165
+ > with a seemingly equivalent object that you’ve defined elsewhere. In other words, this code:
166
+
167
+ ```{r tibble, eval=FALSE}
168
+ df <- data.frame(x = 1:3, y = 3:1)
169
+ print(filter(df, x == 1))
170
+ #> # A tibble: 1 x 2
171
+ #> x y
172
+ #> <int> <int>
173
+ #> 1 1 3
174
+ ```
175
+ > Is not equivalent to this code:
176
+
177
+ ```{r my_var_err, eval = FALSE}
178
+ my_var <- x
179
+ #> Error in eval(expr, envir, enclos): object 'x' not found
180
+ filter(df, my_var == 1)
181
+ #> Error: object 'my_var' not found
182
+ ```
183
+ > This makes it hard to create functions with arguments that change how dplyr verbs are computed.
184
+
185
+ As a result of this, programming with _dplyr_ requires learning a set of new ideas and concepts.
186
+ In this vignette Hadley goes on to show how to program ever more difficult problems with _dplyr_,
187
+ showing the problems it faces and the new concepts needed to solve them.
188
+
189
+ In this blog, we will look at all the problems presented by Hadley in the vignette and show how
190
+ those same problems can be solved using Galaaz and the Ruby language.
191
+
192
+ This blog is organized as follows: first we show how to write expressions using Galaaz.
193
+ Expressions are a fundamental concept in _dplyr_ and are not part of basic Ruby. We extend
194
+ the Ruby language to create and manipulate expressions that will be used by _dplyr_ functions.
195
+
196
+ Then we show very succinctly how Ruby and R can be integrated and how R functions are
197
+ transparently called from Ruby. The Galaaz [user manual](https://github.com/rbotafogo/galaaz/wiki)
198
+ (still in development) goes into much more detail about this integration.
199
+
200
+ Next, in section "Data manipulation with _dplyr_", we go through all the problems in the
201
+ _dplyr_ vignette and look at how they are solved in Galaaz. We then discuss why programming
202
+ with Galaaz and _dplyr_ is easier than programming with _dplyr_ in plain R.
203
+
204
+ The following section looks at another more advanced problem and shows that Galaaz can still
205
+ handle it without any difficulty. We then provide further reading and concluding remarks.
206
+
207
+ # Writing Expressions in Galaaz
208
+
209
+ Galaaz extends Ruby to work with expressions, similar to R's expressions built with 'quote'
210
+ (base R) or 'quo' (tidyverse). Expressions in this context are like mathematical expressions or
211
+ formulae. For instance, in mathematics, the expression $y = sin(x)$ describes a function but cannot
212
+ be computed unless the value of $x$ is bound to some value.
213
+
214
+ Expressions are fundamental in _dplyr_ programming as they are the input to _dplyr_ functions.
215
+ For instance, as we will see shortly, if a data frame has a column named 'x' and we want
216
+ to add another column, 'y', to this data frame that holds the values of 'x' times 2, then we would
217
+ call a _dplyr_ function with the expression 'y = x * 2'.
218
+
219
+ ## A note on notation
220
+
221
+ This blog was written in Rmarkdown and automatically converted to HTML or PDF (depending on
222
+ where you are reading this blog) with gKnit (a tool provided by Galaaz). In Rmarkdown, it is
223
+ possible to write text and code blocks that are executed to generate the final report. Code
224
+ blocks appear inside a 'box' and the result of their execution appears either in another type
225
+ of 'box' with a different background (HTML) or as normal text (PDF). Every output line from
226
+ the code execution is preceded by '##'.
227
+
228
+ ## Expressions from operators
229
+
230
+ The code below creates an expression summing two symbols. Note that :a and :b are Ruby symbols and
231
+ are not bound to any values at the time of expression definition:
232
+
233
+ ```{ruby expressions}
234
+ exp1 = :a + :b
235
+ puts exp1
236
+ ```
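As a rough illustration of how operators can build an expression instead of computing a value, the sketch below uses plain Ruby only. The `Expr` class and its `sym` and `render` helpers are hypothetical names invented for this example; this is not Galaaz's implementation, and Galaaz itself works directly on Ruby symbols as shown above.

```ruby
# Hypothetical Expr class, for illustration only (not Galaaz's code).
# An Expr stores the text of an expression; operators return a new Expr
# rather than a computed value.
class Expr
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Wrap an unbound name, given as a Ruby symbol, as an expression.
  def self.sym(name)
    new(name.to_s)
  end

  # Render an operand: nested expressions keep their text, numbers are
  # shown as written.
  def self.render(value)
    value.is_a?(Expr) ? value.text : value.to_s
  end

  # Each arithmetic operator builds a larger, parenthesized expression.
  %i[+ - * /].each do |op|
    define_method(op) do |other|
      Expr.new("(#{text} #{op} #{Expr.render(other)})")
    end
  end

  def to_s
    text
  end
end

exp1 = Expr.sym(:a) + Expr.sym(:b)
puts exp1          # prints (a + b); neither a nor b has a value yet

exp2 = exp1 * 2.0
puts exp2          # prints ((a + b) * 2.0)
```

The point of the sketch is only that `+` and `*` return a description of the computation, which some later step can evaluate once the names are bound.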
237
+ In Galaaz, we can build any complex mathematical expression such as:
238
+
239
+ ```{ruby expr2}
240
+ exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
241
+ puts exp2
242
+ ```
243
+ Expressions are printed with the same format as the equivalent R expressions. The 'L' after
244
+ 2 indicates that 2 is an integer.
245
+
246
+ The R developer should note that in R, if she writes the
247
+ number '2', the R interpreter will treat it as a float. In order to get an integer she
248
+ should write '2L'. Galaaz follows Ruby notation and '2' is an integer, while '2.0' is a
249
+ float.
250
+
251
+ It is also possible to use inequality operators in building expressions:
252
+
253
+ ```{ruby expr3}
254
+ exp3 = (:a + :b) >= :z
255
+ puts exp3
256
+ ```
257
+ Expressions' definition can also make use of normal Ruby variables without any problem:
258
+
259
+ ```{ruby expr_with_var}
260
+ x = 20
261
+ y = 30.0
262
+ exp_var = (:a + :b) * x <= :z - y
263
+ puts exp_var
264
+ ```
265
+
266
+ Galaaz provides both symbolic representations for operators, such as (>, <, !=), and functional
267
+ notation for those operators, such as (.gt, .ge, etc.). So the expression 'exp3' written
268
+ above can also be written as:
269
+
270
+ ```{ruby expr4}
271
+ exp4 = (:a + :b).ge :z
272
+ puts exp4
273
+ ```
274
+
275
+ Two types of expressions, however, can only be created with the functional representation
276
+ of the operators. Those are expressions involving '==', and '='. This is the case since
277
+ those symbols have special meaning in Ruby and should not be redefined.
278
+
279
+ In order to write an expression involving '==' we
280
+ need to use the method '.eq' and for '=' we need the function '.assign':
281
+
282
+ ```{ruby expr5}
283
+ exp5 = (:a + :b).eq :z
284
+ puts exp5
285
+ ```
286
+
287
+ ```{ruby expr6}
288
+ exp6 = :y.assign :a + :b
289
+ puts exp6
290
+ ```
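To make the role of '.eq' and '.assign' more concrete, here is a small plain-Ruby sketch. The `Term` type and its `show` helper are hypothetical and exist only for this illustration; they are not part of Galaaz. The sketch shows named methods building the text of an equality or an assignment, while Ruby's own '==' keeps its usual true/false meaning.

```ruby
# Hypothetical Term type, for illustration only (not Galaaz's code).
Term = Struct.new(:text) do
  # Text of an operand: another Term keeps its text, anything else
  # (symbol, number) is converted with to_s.
  def self.show(value)
    value.is_a?(Term) ? value.text : value.to_s
  end

  # Builds the expression "lhs == rhs" without comparing anything.
  def eq(other)
    Term.new("#{text} == #{Term.show(other)}")
  end

  # Builds the expression "lhs = rhs" without assigning anything.
  def assign(other)
    Term.new("#{text} = #{Term.show(other)}")
  end

  def to_s
    text
  end
end

sum = Term.new('a + b')
puts sum.eq(:z)                  # prints a + b == z
puts Term.new('y').assign(sum)   # prints y = a + b

# Ruby's own '==' is untouched and still answers true or false.
puts Term.new('x') == Term.new('x')   # prints true
```

Note that '=' cannot be overloaded in Ruby at all, so some method name is needed for assignment expressions in any design of this kind.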
291
+ Users should be careful when writing expressions not to inadvertently use '==' or '=' as
292
+ this will generate an error that might be a bit cryptic (in future releases of Galaaz, we
293
+ plan to improve the error message).
294
+
295
+ ```{ruby exp_wrong, warning=FALSE}
296
+ exp_wrong = (:a + :b) == :z
297
+ puts exp_wrong
298
+ ```
299
+ The problem lies with the fact that
300
+ when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When this
301
+ comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols, at
302
+ this time, are not bound to anything, giving the "object 'a' not found" message.
303
+
304
+ ## Expressions with R methods
305
+
306
+ It is often necessary to create an expression that uses a method or function. For instance, in
307
+ mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
308
+ 'sin' function is part of the expression and should not be immediately executed. When we want
309
+ the function to be part of the expression, we call the function preceding it
310
+ by the letter E, such as 'E.sin(x)'.
311
+
312
+ ```{ruby method_expression}
313
+ exp7 = :y.assign E.sin(:x)
314
+ puts exp7
315
+ ```
316
+ Function expressions can also be written using '.' notation:
317
+
318
+ ```{ruby expression_with_dot}
319
+ exp8 = :y.assign :x.sin
320
+ puts exp8
321
+ ```
322
+ When a function has multiple arguments, the first one can be used before the '.'. For instance,
323
+ the R concatenate function 'c', which concatenates two or more arguments, can be part of
324
+ an expression as:
325
+
326
+ ```{ruby expression_multiple_args}
327
+ exp9 = :x.c(:y)
328
+ puts exp9
329
+ ```
330
+ Note that this gives an OO feeling to the code, as if we were saying 'x' concatenates 'y'. As a
331
+ side note, '.' notation can be used as the R pipe operator '%>%', but is more general than the
332
+ pipe.
333
+
334
+ ## Evaluating an Expression
335
+
336
+ Although we are mainly focusing on expressions to pass them to _dplyr_ functions, expressions
337
+ can be evaluated by calling function 'eval' with a binding.
338
+
339
+ A binding can be provided with a list or a data frame as shown below:
340
+
341
+ ```{ruby eval_expression_list}
342
+ exp = (:a + :b) * 2.0 + :c ** 2 / :z
343
+ puts exp.eval(R.list(a: 10, b: 20, c: 30, z: 40))
344
+ ```
345
+
346
+ with a data frame:
347
+
348
+ ```{ruby eval_expression_df}
349
+ df = R.data__frame(
350
+ a: R.c(1, 2, 3),
351
+ b: R.c(10, 20, 30),
352
+ c: R.c(100, 200, 300),
353
+ z: R.c(1000, 2000, 3000))
354
+
355
+ puts exp.eval(df)
356
+ ```
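The idea of evaluating an expression against a binding can be sketched in plain Ruby. The `evaluate` function and the nested-array representation below are assumptions made for this illustration only; in Galaaz the evaluation is done by 'eval' as shown above. Operands are converted to floats so that division does not truncate the way Ruby integer division would.

```ruby
# Hypothetical mini-evaluator, for illustration only (not Galaaz's code).
# An expression is a nested array [operator, left, right]; leaves are
# symbols (looked up in env) or plain numbers.
def evaluate(node, env)
  case node
  when Symbol  then env.fetch(node)
  when Numeric then node
  when Array
    op, left, right = node
    # Convert to Float so that '/' does not truncate like Integer#/.
    evaluate(left, env).to_f.public_send(op, evaluate(right, env).to_f)
  else
    raise ArgumentError, "unsupported node: #{node.inspect}"
  end
end

# Tree for (:a + :b) * 2.0 + :c ** 2 / :z
tree = [:+, [:*, [:+, :a, :b], 2.0], [:/, [:**, :c, 2], :z]]

puts evaluate(tree, { a: 10, b: 20, c: 30, z: 40 })   # prints 82.5
```

The same tree can be evaluated again with a different hash, which is the property that makes unevaluated expressions useful as arguments to data manipulation functions.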
357
+
358
+ # Using Galaaz to call R functions
359
+
360
+ Galaaz tries to emulate as closely as possible the way R functions are called and migrating from
361
+ R to Galaaz should be quite easy requiring only minor syntactic changes to an R script. In
362
+ this post, we do not have enough space to write a complete manual on Galaaz
363
+ (a short manual can be found at: https://www.rubydoc.info/gems/galaaz/0.4.9), so we will
364
+ present only a few example scripts using Galaaz.
365
+
366
+ Basically, to call an R function from Ruby with Galaaz, one only needs to precede the function
367
+ with 'R.'. For instance, to create a vector in R, the 'c' function is used. In Galaaz, a
368
+ vector can be created by using 'R.c':
369
+
370
+ ```{ruby vector}
371
+ vec = R.c(1.0, 2, 3)
372
+ puts vec
373
+ ```
374
+ A list is created in R with the 'list' function, so in Galaaz we do:
375
+
376
+ ```{ruby list}
377
+ list = R.list(a: 1.0, b: 2, c: 3)
378
+ puts list
379
+ ```
380
+ Note that we can use named arguments in our list. The same code in R would be:
381
+
382
+ ```{r list2}
383
+ lst = list(a = 1, b = 2L, c = 3L)
384
+ print(lst)
385
+ ```
386
+ Now, let's say that 'x' is the number 45 and we actually want to create
387
+ the expression $y = sin(45)$, which is $y = 0.850...$ (R's 'sin' takes its argument in radians,
388
+ so this is the sine of 45 radians, not of 45$^\circ$). In this case, we will use 'R.sin':
389
+
390
+ ```{ruby eval_sin}
391
+ exp10 = :y.assign R.sin(45)
392
+ puts exp10
393
+ ```
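A quick check in plain Ruby, whose `Math.sin` also works in radians, shows where the value 0.850... comes from and what the sine of a 45 degree angle would be instead. This sketch does not use Galaaz or R; the variable name `radians` is only illustrative.

```ruby
# Math.sin takes radians, so 45 is read as 45 radians.
puts Math.sin(45).round(4)        # prints 0.8509

# To get the sine of a 45 degree angle, convert degrees to radians first.
radians = 45 * Math::PI / 180
puts Math.sin(radians).round(4)   # prints 0.7071
```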
394
+
395
+ # Data manipulation with _dplyr_
396
+
397
+ In this section we will give a brief tour of _dplyr_'s usage in Galaaz and how to manipulate
398
+ data in Ruby with it. This section will follow [_dplyr_'s vignette](https://dplyr.tidyverse.org/articles/dplyr.html) that explores the nycflights13 data set. This dataset contains all 336776
399
+ flights that departed from New York City in 2013. The data comes from the US Bureau of
400
+ Transportation Statistics.
401
+
402
+ Let's start by taking a look at this dataset:
403
+
404
+ ```{ruby nycflights13}
405
+ R.library('nycflights13')
406
+ # check its dimensions
407
+ puts ~:flights.dim
408
+ # and the structure
409
+ ~:flights.str
410
+ ```
411
+
412
+ Now, let's use a first verb of _dplyr_: 'filter'. This verb, obviously, will filter the data
413
+ by the given expression. In the next block, we filter by columns 'month' and 'day'. The
414
+ first argument to the filter function is the symbol ':flights'. A Ruby symbol, when given to
415
+ an R function, will convert to the R variable of the same name, in this case 'flights', which
416
+ holds the nycflights13 data frame.
417
+
418
+ The second and third arguments are expressions that will be used by the filter function to
419
+ filter by columns, looking for entries in which the month and day are equal to 1.
420
+
421
+ ```{ruby filter}
422
+ puts R.filter(:flights, (:month.eq 1), (:day.eq 1))
423
+ ```
424
+
425
+
426
+ ## Programming with _dplyr_: problems and how to solve them in Galaaz
427
+
428
+ In this section we look at the list of problems that Hadley describes in the "Programming with dplyr"
429
+ vignette and show how those problems are solved and coded with Galaaz. Readers interested in
430
+ how those problems are treated in _dplyr_ should read the vignette and use it as a comparison with
431
+ this blog.
432
+
433
+ ## Filtering using expressions
434
+
435
+ Now that we know how to write expressions and call R functions, let's do some data manipulation in
436
+ Galaaz. Let's first start by creating a data frame. In R, the 'data.frame' function creates a
437
+ data frame. In Ruby, writing 'data.frame' will not parse as a single object. To call R
438
+ functions that have a '.' in them, we need to substitute the '.' with '__'. So, method
439
+ 'data.frame' in R, is called in Galaaz as 'R.data\_\_frame':
440
+
441
+ ```{ruby df}
442
+ df = R.data__frame(x: (1..3), y: (3..1))
443
+ puts df
444
+ ```
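The naming rule described above ('.' in an R name is written as '__' in Ruby) can be sketched in plain Ruby. The helper `r_function_name` below is hypothetical and only illustrates the substitution; it is an assumption about the rule, not Galaaz's actual implementation:

```ruby
# Hypothetical helper, for illustration only: map a Ruby-safe method name
# to the R function name it stands for by turning each '__' into '.'.
def r_function_name(ruby_name)
  ruby_name.to_s.gsub('__', '.')
end

puts r_function_name(:data__frame)  # data.frame
puts r_function_name(:na__rm)       # na.rm
puts r_function_name(:filter)       # filter
```

Names without '__' pass through unchanged, which is why ordinary R functions such as 'filter' need no special treatment.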
445
+
446
+ _dplyr_ provides the 'filter' function, which filters data in a data frame. The 'filter'
447
+ function can be called on this data frame either by using 'R.filter(df, ...)' or
448
+ by using dot notation.
449
+
452
+ We prefer to use dot notation as shown below. The argument to 'filter' should be an
453
+ expression. Note that if we gave to filter a Ruby expression such as
454
+ 'x == 1', we would get an error, since there is no variable 'x' defined and if 'x' was a variable
455
+ then 'x == 1' would either be 'true' or 'false'. Our goal is to filter our data frame returning
456
+ all rows in which the 'x' value is equal to 1. To express this we want: ':x.eq 1', where :x will
457
+ be interpreted by filter as the 'x' column.
458
+
459
+ ```{ruby filter_exp}
460
+ puts df.filter(:x.eq 1)
461
+ ```
462
+ In R, and when coding with 'tidyverse', arguments to a function are usually not
463
+ *referentially transparent*. That is, you can’t replace a value with a seemingly equivalent
464
+ object that you’ve defined elsewhere. In other words, this code
465
+
466
+ ```{r not_transp, eval=FALSE}
467
+ my_var <- x
468
+ filter(df, my_var == 1)
469
+ ```
470
+ generates the following error: "object 'x' not found".
471
+
472
+ However, in Galaaz, arguments are referentially transparent, as can be seen in the
473
+ code below. Note initially that 'my_var = :x' will not give the error "object 'x' not found"
474
+ since ':x' is treated as an expression and assigned to my\_var. Then when doing (my\_var.eq 1),
475
+ my\_var is a variable that resolves to ':x' and it becomes equivalent to (:x.eq 1) which is
476
+ what we want.
477
+
478
+ ```{ruby my_var}
479
+ my_var = :x
480
+ puts df.filter(my_var.eq 1)
481
+ ```
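The reason this substitution is safe can be checked in plain Ruby, without Galaaz: a symbol stored in a variable is the very same object as the symbol literal, so any method that receives `my_var` receives exactly `:x`. A minimal sketch:

```ruby
# A symbol held in a variable is identical to the literal symbol.
my_var = :x

puts my_var == :x        # true
puts my_var.equal?(:x)   # true (same object, not merely an equal one)
```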
482
+ As stated by Hadley:
483
+
484
+ > dplyr code is ambiguous. Depending on what variables are defined where,
485
+ > filter(df, x == y) could be equivalent to any of:
486
+
487
+ ```
488
+ df[df$x == df$y, ]
489
+ df[df$x == y, ]
490
+ df[x == df$y, ]
491
+ df[x == y, ]
492
+ ```
493
+ In Galaaz this ambiguity does not exist: filter(df, x.eq y) is not a valid expression, as
494
+ expressions are built with symbols. In doing filter(df, :x.eq y) we are looking for elements
495
+ of the 'x' column that are equal to a previously defined y variable. Finally in
496
+ filter(df, :x.eq :y) we are looking for elements in which the 'x' column value is equal to
497
+ the 'y' column value. This can be seen in the following two chunks of code:
498
+
499
+ ```{ruby disamb1}
500
+ y = 1
501
+ x = 2
502
+
503
+ # looking for values where the 'x' column is equal to the 'y' column
504
+ puts df.filter(:x.eq :y)
505
+ ```
506
+
507
+ ```{ruby disamb2}
508
+ # looking for values where the 'x' column is equal to the 'y' variable
509
+ # in this case, the number 1
510
+ puts df.filter(:x.eq y)
511
+ ```
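The disambiguation rule can be mimicked in plain Ruby with a toy model, shown here only as an analogy. The data frame is replaced by an array of row hashes, and the helpers `resolve` and `filter_eq` are hypothetical names, not part of Galaaz: a symbol operand is looked up as a column, anything else is used as a value.

```ruby
# Toy analogy (not Galaaz): symbols name columns, other objects are values.
def resolve(row, operand)
  operand.is_a?(Symbol) ? row.fetch(operand) : operand
end

# Keep the rows in which both operands resolve to equal values.
def filter_eq(rows, left, right)
  rows.select { |row| resolve(row, left) == resolve(row, right) }
end

rows = [{ x: 1, y: 3 }, { x: 2, y: 2 }, { x: 3, y: 1 }]
y = 1

by_column   = filter_eq(rows, :x, :y)  # 'x' column equal to 'y' column
by_variable = filter_eq(rows, :x, y)   # 'x' column equal to the variable y (1)

puts by_column.map { |row| row[:x] }.inspect    # [2]
puts by_variable.map { |row| row[:x] }.inspect  # [1]
```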
512
+ ## Writing a function that applies to different data sets
513
+
514
+ Let's suppose that we want to write a function that receives as the first argument a data frame
515
+ and as second argument an expression that adds a column to the data frame that is equal to the
516
+ sum of elements in column 'a' plus 'x'.
517
+
518
+ Here is the intended behaviour using the 'mutate' function of 'dplyr':
519
+
520
+ ```
521
+ mutate(df1, y = a + x)
522
+ mutate(df2, y = a + x)
523
+ mutate(df3, y = a + x)
524
+ mutate(df4, y = a + x)
525
+ ```
526
+ The naive approach to writing an R function to solve this problem is:
527
+
528
+ ```
529
+ mutate_y <- function(df) {
530
+ mutate(df, y = a + x)
531
+ }
532
+ ```
533
+ Unfortunately, in R, this function can fail silently if one of the variables isn’t present
534
+ in the data frame, but is present in the global environment. We will not go through here how
535
+ to solve this problem in R.
536
+
537
+ In Galaaz the method mutate_y below will work fine and will never fail silently.
538
+
539
+ ```{ruby mutate_y, warning=FALSE}
540
+ def mutate_y(df)
541
+ df.mutate(:y.assign :a + :x)
542
+ end
543
+ ```
544
+ Here we create a data frame that has only one column named 'x':
545
+
546
+ ```{ruby data_frame_no_a_column, warning=FALSE}
547
+ df1 = R.data__frame(x: (1..3))
548
+ puts df1
549
+ ```
550
+
551
+ Note that method mutate_y will fail regardless of whether variable 'a' is defined and
552
+ in the scope of the method. Variable 'a' has no relationship with the symbol ':a' used in the
553
+ definition of 'mutate\_y' above:
554
+
555
+ ```{ruby call_mutate_y, warning = FALSE}
556
+ a = 10
557
+ mutate_y(df1)
558
+ ```
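The same "fail loudly" behaviour can be illustrated in plain Ruby with a toy analogy in which a row is a hash and the column lookup uses `Hash#fetch`. This is only a sketch of the idea, not how Galaaz evaluates ':a + :x'; the local variable `a` plays no part in the lookup of the key `:a`:

```ruby
# Toy analogy (not Galaaz): a row is a hash, columns are symbol keys.
def mutate_y(row)
  # fetch raises KeyError when the column is missing
  row.merge(y: row.fetch(:a) + row.fetch(:x))
end

a = 10          # unrelated local variable; it never stands in for column :a
row = { x: 1 }  # a row without an 'a' column

begin
  mutate_y(row)
  outcome = :ok
rescue KeyError
  outcome = :missing_column
end

puts outcome  # missing_column
```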
559
+ ## Different expressions
560
+
561
+ Let's move to the next problem as presented by Hadley, where trying to write a function in R
562
+ that will receive two arguments, the first a variable and the second an expression, is not trivial.
563
+ Below we create a data frame and we want to write a function that groups data by a variable and
564
+ summarises it by an expression:
565
+
566
+ ```{r diff_expr}
567
+ set.seed(123)
568
+
569
+ df <- data.frame(
570
+ g1 = c(1, 1, 2, 2, 2),
571
+ g2 = c(1, 2, 1, 2, 1),
572
+ a = sample(5),
573
+ b = sample(5)
574
+ )
575
+
576
+ as.data.frame(df)
577
+
578
+ d2 <- df %>%
579
+ group_by(g1) %>%
580
+ summarise(a = mean(a))
581
+
582
+ as.data.frame(d2)
583
+
584
+ d2 <- df %>%
585
+ group_by(g2) %>%
586
+ summarise(a = mean(a))
587
+
588
+ as.data.frame(d2)
589
+ ```
590
+
591
+ As shown by Hadley, one might expect this function to do the trick:
592
+
593
+ ```{r diff_exp_fnc}
594
+ my_summarise <- function(df, group_var) {
595
+ df %>%
596
+ group_by(group_var) %>%
597
+ summarise(a = mean(a))
598
+ }
599
+
600
+ # my_summarise(df, g1)
601
+ #> Error: Column `group_var` is unknown
602
+ ```
603
+
604
+ In order to solve this problem, coding with dplyr requires the introduction of many new concepts
605
+ and functions such as 'quo', 'quos', 'enquo', 'enquos', '!!' (bang bang), '!!!' (triple bang).
606
+ Again, we'll leave to Hadley the explanation of how to use all those functions.
607
+
608
+ Now, let's try to implement the same function in Galaaz. The next code block first prints the
609
+ 'df' data frame defined previously in R (to access an R variable from Galaaz, we use the tilde
610
+ operator '~' applied to the R variable name as symbol, i.e., ':df'). We then create the
611
+ 'my_summarize' method and call it passing the R data frame and the group by variable ':g1':
612
+
613
+ ```{ruby diff_exp_ruby_func}
614
+ puts ~:df
615
+ print "\n"
616
+
617
+ def my_summarize(df, group_var)
618
+ df.group_by(group_var).
619
+ summarize(a: :a.mean)
620
+ end
621
+
622
+ puts my_summarize(:df, :g1)
623
+ ```
624
+ It works!!! Well, let's make sure this was not just some coincidence:
625
+
626
+ ```{ruby group_g2}
627
+ puts my_summarize(:df, :g2)
628
+ ```
629
+
630
+ Great, everything is fine! No magic, no new functions, no complexities, just normal, standard Ruby
631
+ code. If you've ever done NSE in R, this certainly feels much safer and easier to implement.
632
+
633
+ ## Different input variables
634
+
635
+ In the previous section we've managed to get rid of all NSE formulation for a simple example, but
636
+ does this remain true for more complex examples, or will the Galaaz way prove impractical for
637
+ more complex code?
638
+
639
+ In the next example Hadley asks us to write a function that, given an expression such as 'a'
640
+ or 'a * b', calculates three summaries. What we want is a function that does the same as these R
641
+ statements:
642
+
643
+ ```
644
+ summarise(df, mean = mean(a), sum = sum(a), n = n())
645
+ #> # A tibble: 1 x 3
646
+ #> mean sum n
647
+ #> <dbl> <int> <int>
648
+ #> 1 3 15 5
649
+
650
+ summarise(df, mean = mean(a * b), sum = sum(a * b), n = n())
651
+ #> # A tibble: 1 x 3
652
+ #> mean sum n
653
+ #> <dbl> <int> <int>
654
+ #> 1 9 45 5
655
+ ```
656
+
657
+ Let's try it in Galaaz:
658
+
659
+ ```{ruby summarize_method}
660
+ def my_summarise2(df, expr)
661
+ df.summarize(
662
+ mean: E.mean(expr),
663
+ sum: E.sum(expr),
664
+ n: E.n
665
+ )
666
+ end
667
+
668
+ puts my_summarise2((~:df), :a)
669
+ puts my_summarise2((~:df), :a * :b)
670
+ ```
671
+
672
+ Once again, there is no need to use any special theory or functions. The only point to be
673
+ careful about is the use of 'E' to build expressions from functions 'mean', 'sum' and 'n'.
674
+
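To give an intuition for why 'E' is needed, here is a plain Ruby sketch of an object that records a call instead of executing it. The class `CallRecorder` is hypothetical and only loosely modeled on the role 'E' plays in the text; Galaaz's real 'E' may work quite differently:

```ruby
# Hypothetical recorder: any method called on it is turned into a textual
# description of the call rather than being executed.
class CallRecorder
  def method_missing(name, *args)
    "#{name}(#{args.join(', ')})"
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

e = CallRecorder.new

puts e.mean(:a)  # mean(a)
puts e.sum(:a)   # sum(a)
puts e.n         # n()
```

The point of the sketch is that nothing is computed when `e.mean(:a)` runs; the call is only described, so it can be evaluated later against the columns of a data frame.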
675
+ ## Different input and output variable
676
+
677
+ Now the next challenge presented by Hadley is to vary the name of the output variables based on
678
+ the received expression. So, if the input expression is 'a', we want our data frame columns to
679
+ be named 'mean\_a' and 'sum\_a'. Now, if the input expression is 'b', columns
680
+ should be named 'mean\_b' and 'sum\_b'.
681
+
682
+ ```
683
+ mutate(df, mean_a = mean(a), sum_a = sum(a))
684
+ #> # A tibble: 5 x 6
685
+ #> g1 g2 a b mean_a sum_a
686
+ #> <dbl> <dbl> <int> <int> <dbl> <int>
687
+ #> 1 1 1 1 3 3 15
688
+ #> 2 1 2 4 2 3 15
689
+ #> 3 2 1 2 1 3 15
690
+ #> 4 2 2 5 4 3 15
691
+ #> # … with 1 more row
692
+
693
+ mutate(df, mean_b = mean(b), sum_b = sum(b))
694
+ #> # A tibble: 5 x 6
695
+ #> g1 g2 a b mean_b sum_b
696
+ #> <dbl> <dbl> <int> <int> <dbl> <int>
697
+ #> 1 1 1 1 3 3 15
698
+ #> 2 1 2 4 2 3 15
699
+ #> 3 2 1 2 1 3 15
700
+ #> 4 2 2 5 4 3 15
701
+ #> # … with 1 more row
702
+ ```
703
+ In order to solve this problem in R, Hadley needs to introduce some more new functions and notations:
704
+ 'quo_name' and the ':=' operator from package 'rlang'.
705
+
706
+ Here is our Ruby code:
707
+
708
+ ```{ruby name_change}
709
+ def my_mutate(df, expr)
710
+ mean_name = "mean_#{expr.to_s}"
711
+ sum_name = "sum_#{expr.to_s}"
712
+
713
+ df.mutate(mean_name => E.mean(expr),
714
+ sum_name => E.sum(expr))
715
+ end
716
+
717
+ puts my_mutate((~:df), :a)
718
+ puts my_mutate((~:df), :b)
719
+ ```
720
+ It really seems that "Non Standard Evaluation" is actually quite standard in Galaaz! But, you
721
+ might have noticed a small change in the way the arguments to the mutate method were called.
722
+ In a previous example we used df.summarise(mean: E.mean(:a), ...) where the column name was
723
+ followed by a colon (':'). In this example, we have df.mutate(mean_name => E.mean(expr), ...)
724
+ and variable mean\_name is not followed by ':' but by '=>'. This is standard Ruby notation.
725
+
726
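+ In Ruby, writing 'mean: value' always creates the literal symbol key ':mean', so this shorthand cannot be used when the column name is computed at run time. Writing 'mean_name => value' instead uses the value held by variable mean\_name (a string such as "mean_a") as the key.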
727
+
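The difference between the two hash notations mentioned above can be checked in plain Ruby, with no Galaaz involved. The variable name `mean_name` is reused from the example; the values are arbitrary:

```ruby
mean_name = "mean_a"

# 'name: value' shorthand: the key is the literal symbol :mean_name,
# regardless of what the variable mean_name holds.
literal = { mean_name: 1 }

# 'name => value' notation: the key is the current value of the variable.
computed = { mean_name => 1 }

puts literal.keys.inspect   # [:mean_name]
puts computed.keys.inspect  # ["mean_a"]
```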
728
+ ## Capturing multiple variables
729
+
730
+ Moving on with new complexities, Hadley asks us to solve the problem in which the
731
+ summarise function will receive any number of grouping variables.
732
+
733
+ This again is quite standard Ruby. In order to receive an undefined number of parameters
734
+ the parameter is preceded by '*':
735
+
736
+ ```{ruby multiple_vars}
737
+ def my_summarise3(df, *group_vars)
738
+ df.group_by(*group_vars).
739
+ summarise(a: E.mean(:a))
740
+ end
741
+
742
+ puts my_summarise3((~:df), :g1, :g2)
743
+ ```
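The splat operator used above is plain Ruby and can be tried on its own. In this small sketch, `group_keys` is a hypothetical method that simply returns what it collected:

```ruby
# '*group_vars' in the parameter list collects any number of arguments
# into an array.
def group_keys(*group_vars)
  group_vars
end

puts group_keys(:g1).inspect       # [:g1]
puts group_keys(:g1, :g2).inspect  # [:g1, :g2]

# '*' at the call site does the opposite: it spreads an array into
# separate arguments.
vars = [:g1, :g2]
puts group_keys(*vars).inspect     # [:g1, :g2]
```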
744
+
745
+ # Why does R require NSE and Galaaz does not?
746
+
747
+ NSE introduces a number of new concepts, such as 'quoting', 'quasiquotation', 'unquoting' and
748
+ 'unquote-splicing', while in Galaaz none of those concepts are needed. What gives?
749
+
750
+ R is an extremely flexible language and it has lazy evaluation of parameters. When in R a
751
+ function is called as 'summarise(df, a = b)', the summarise function receives the literal
752
+ 'a = b' parameter and can work with this as if it were a string. In R, it is not clear what
753
+ a and b are, they can be expressions or they can be variables, it is up to the function to
754
+ decide what 'a = b' means.
755
+
756
+ In Ruby, there is no lazy evaluation of parameters and 'a' is always a variable and so is 'b'.
757
+ Variables assume their value as soon as they are used, so 'x = a' is immediately evaluated and
758
+ variable 'x' will receive the value of variable 'a' as soon as the Ruby statement is executed.
759
+ Ruby also provides the notion of a symbol; ':a' is a symbol and does not evaluate to anything.
760
+ Galaaz uses Ruby symbols to build expressions that are not bound to anything: ':a.eq :b' is
761
+ clearly an expression and has no relationship whatsoever with the statement 'a = b'. By using
762
+ symbols, variables and expressions all the possible ambiguities that are found in R are
763
+ eliminated in Galaaz.
764
+
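The eager evaluation described above is easy to observe in plain Ruby. In this sketch, `describe` is a hypothetical method that reports what it actually received:

```ruby
# Report the class and value of whatever argument arrives.
def describe(arg)
  "#{arg.class}: #{arg.inspect}"
end

a = 1
b = 1

# 'a == b' is evaluated before the call, so the method only sees 'true'.
puts describe(a == b)  # TrueClass: true

# A symbol is not evaluated to anything else; the method sees the symbol.
puts describe(:a)      # Symbol: :a
```

The method has no way to recover the text 'a == b' from its argument, which is why Galaaz builds expressions from symbols instead.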
765
+ The main problem that remains is that R functions do not clearly document what type
766
+ of input they are expecting, they might be expecting regular variables or they might be
767
+ expecting expressions and the R function will know how to deal with an input of the form
768
+ 'a = b'. For the Ruby developer, it might not be immediately clear if it should call the
769
+ function passing the value 'true' if variable 'a' is equal to variable 'b' or if it should
770
+ call the function passing the expression ':a.eq :b'.
771
+
772
+
773
+ # Advanced dplyr features
774
+
775
+ In the blog: [Programming with dplyr by using dplyr](https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/) Iñaki Úcar shows surprise that some R users are trying to code in dplyr avoiding
776
+ the use of NSE. For instance he says:
777
+
778
+ > Take the example of seplyr. It stands for standard evaluation dplyr, and enables us to
779
+ > program over dplyr without having “to bring in (or study) any deep-theory or
780
+ > heavy-weight tools such as rlang/tidyeval”.
781
+
782
+ For me, there isn't really any surprise that users are trying to avoid dplyr deep-theory. R
783
+ users frequently are not programmers and learning to code is already hard business, on top
784
+ of that, having to learn how to 'quote' or 'enquo' or 'quos' or 'enquos' is not necessarily
785
+ a 'piece of cake'. So much so, that 'tidyeval' has some more advanced functions that instead
786
+ of using quoted expressions, use strings as arguments.
787
+
788
+ In the following examples, we show the use of functions 'group\_by\_at', 'summarise\_at' and
789
+ 'rename\_at' that receive strings as arguments. The data frame used is 'starwars', which describes
790
+ features of characters in the Star Wars movies:
791
+
792
+ ```{ruby starwars}
793
+ puts (~:starwars).head
794
+ ```
795
+ The grouped_mean function below will receive a grouping variable and calculate summaries for
796
+ the value\_variables given:
797
+
798
+ ```{r grouped_mean}
799
+ grouped_mean <- function(data, grouping_variables, value_variables) {
800
+ data %>%
801
+ group_by_at(grouping_variables) %>%
802
+ mutate(count = n()) %>%
803
+ summarise_at(c(value_variables, "count"), mean, na.rm = TRUE) %>%
804
+ rename_at(value_variables, funs(paste0("mean_", .)))
805
+ }
806
+
807
+ gm = starwars %>%
808
+ grouped_mean("eye_color", c("mass", "birth_year"))
809
+
810
+ as.data.frame(gm)
811
+ ```
812
+
813
+ The same code with Galaaz, becomes:
814
+
815
+ ```{ruby advanced_starwars}
816
+ def grouped_mean(data, grouping_variables, value_variables)
817
+ data.
818
+ group_by_at(grouping_variables).
819
+ mutate(count: E.n).
820
+ summarise_at(E.c(value_variables, "count"), ~:mean, na__rm: true).
821
+ rename_at(value_variables, E.funs(E.paste0("mean_", value_variables)))
822
+ end
823
+
824
+ puts grouped_mean((~:starwars), "eye_color", E.c("mass", "birth_year"))
825
+ ```
826
+
827
+ # Further reading
828
+
829
+ For more information on GraalVM, TruffleRuby, fastR, R and Galaaz check out the following sites/posts:
830
+
831
+ * [GraalVM Home](https://www.graalvm.org/)
832
+ * [TruffleRuby](https://github.com/oracle/truffleruby)
833
+ * [FastR](https://github.com/oracle/fastr)
834
+ * [Faster R with FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
835
+ * [How to make Beautiful Ruby Plots with Galaaz](https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857)
836
+ * [Ruby Plotting with Galaaz: An example of tightly coupling Ruby and R in GraalVM](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021)
837
+ * [How to do reproducible research in Ruby with gKnit](https://towardsdatascience.com/how-to-do-reproducible-research-in-ruby-with-gknit-c26d2684d64e)
838
+ * [R for Data Science](https://r4ds.had.co.nz/)
839
+ * [Advanced R](https://adv-r.hadley.nz/)
840
+
841
+ # Conclusion
842
+
843
+ Ruby and Galaaz provide a nice framework for developing code that uses R functions. Although R is
844
+ a very powerful and flexible language, sometimes, too much flexibility makes life harder for
845
+ the casual user. We believe however, that even for the advanced user, Ruby integrated
846
+ with R through Galaaz, makes a powerful environment for data analysis. In this blog post we
847
+ showed how Galaaz's consistent syntax eliminates the need for complex constructs such as quoting,
848
+ enquoting, quasiquotation, etc. This simplification comes from the fact that expressions and
849
+ variables are clearly separated objects, which is not the case in the R language.