galaaz 0.4.10 → 0.5.0

Files changed (163)
  1. checksums.yaml +4 -4
  2. data/README.md +2048 -531
  3. data/Rakefile +3 -2
  4. data/bin/gknit +152 -6
  5. data/bin/gknit-draft +105 -0
  6. data/bin/gknit-draft.rb +28 -0
  7. data/bin/gknit_Rscript +127 -0
  8. data/bin/grun +27 -1
  9. data/bin/gstudio +47 -4
  10. data/bin/{gstudio.rb → gstudio_irb.rb} +0 -0
  11. data/bin/gstudio_pry.rb +7 -0
  12. data/blogs/galaaz_ggplot/galaaz_ggplot.html +10 -195
  13. data/blogs/galaaz_ggplot/galaaz_ggplot.md +404 -0
  14. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
  15. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
  16. data/blogs/gknit/gknit.Rmd +5 -3
  17. data/blogs/gknit/gknit.pdf +0 -0
  18. data/blogs/gknit/lst.rds +0 -0
  19. data/blogs/manual/lst.rds +0 -0
  20. data/blogs/manual/manual.Rmd +826 -53
  21. data/blogs/manual/manual.html +2338 -695
  22. data/blogs/manual/manual.md +2032 -539
  23. data/blogs/manual/manual.pdf +0 -0
  24. data/blogs/manual/manual.tex +1804 -594
  25. data/blogs/manual/manual_files/figure-html/bubble-1.png +0 -0
  26. data/blogs/manual/manual_files/figure-html/diverging_bar.png +0 -0
  27. data/blogs/manual/manual_files/figure-latex/bubble-1.png +0 -0
  28. data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
  29. data/blogs/manual/model.rb +41 -0
  30. data/blogs/nse_dplyr/nse_dplyr.Rmd +226 -73
  31. data/blogs/nse_dplyr/nse_dplyr.html +254 -336
  32. data/blogs/nse_dplyr/nse_dplyr.md +353 -158
  33. data/blogs/oh_my/oh_my.html +274 -386
  34. data/blogs/oh_my/oh_my.md +208 -205
  35. data/blogs/ruby_plot/ruby_plot.html +20 -205
  36. data/blogs/ruby_plot/ruby_plot.md +14 -15
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  42. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  44. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  46. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  48. data/examples/Bibliography/master.bib +50 -0
  49. data/examples/Bibliography/stats.bib +72 -0
  50. data/examples/islr/x_y_rnorm.jpg +0 -0
  51. data/examples/latex_templates/Test-acm_article/Makefile +16 -0
  52. data/examples/latex_templates/Test-acm_article/Test-acm_article.Rmd +65 -0
  53. data/examples/latex_templates/Test-acm_article/acm_proc_article-sp.cls +1670 -0
  54. data/examples/latex_templates/Test-acm_article/sensys-abstract.cls +703 -0
  55. data/examples/latex_templates/Test-acm_article/sigproc.bib +59 -0
  56. data/examples/latex_templates/Test-acs_article/Test-acs_article.Rmd +260 -0
  57. data/examples/latex_templates/Test-acs_article/Test-acs_article.pdf +0 -0
  58. data/examples/latex_templates/Test-acs_article/acs-Test-acs_article.bib +11 -0
  59. data/examples/latex_templates/Test-acs_article/acs-my_output.bib +11 -0
  60. data/examples/latex_templates/Test-acs_article/acstest.bib +17 -0
  61. data/examples/latex_templates/Test-aea_article/AEA.cls +1414 -0
  62. data/{blogs/gknit/marshal.dump → examples/latex_templates/Test-aea_article/BibFile.bib} +0 -0
  63. data/examples/latex_templates/Test-aea_article/Test-aea_article.Rmd +108 -0
  64. data/examples/latex_templates/Test-aea_article/Test-aea_article.pdf +0 -0
  65. data/examples/latex_templates/Test-aea_article/aea.bst +1269 -0
  66. data/examples/latex_templates/Test-aea_article/multicol.sty +853 -0
  67. data/examples/latex_templates/Test-aea_article/references.bib +0 -0
  68. data/examples/latex_templates/Test-aea_article/setspace.sty +546 -0
  69. data/examples/latex_templates/Test-amq_article/Test-amq_article.Rmd +256 -0
  70. data/examples/latex_templates/Test-amq_article/Test-amq_article.pdf +0 -0
  71. data/examples/latex_templates/Test-amq_article/Test-amq_article.pdfsync +3397 -0
  72. data/examples/latex_templates/Test-amq_article/pics/Figure2.pdf +0 -0
  73. data/examples/latex_templates/Test-ams_article/Test-ams_article.Rmd +215 -0
  74. data/examples/latex_templates/Test-ams_article/amstest.bib +436 -0
  75. data/examples/latex_templates/Test-asa_article/Test-asa_article.Rmd +153 -0
  76. data/examples/latex_templates/Test-asa_article/Test-asa_article.pdf +0 -0
  77. data/examples/latex_templates/Test-asa_article/agsm.bst +1353 -0
  78. data/examples/latex_templates/Test-asa_article/bibliography.bib +233 -0
  79. data/examples/latex_templates/Test-ieee_article/IEEEtran.bst +2409 -0
  80. data/examples/latex_templates/Test-ieee_article/IEEEtran.cls +6346 -0
  81. data/examples/latex_templates/Test-ieee_article/Test-ieee_article.Rmd +175 -0
  82. data/examples/latex_templates/Test-ieee_article/Test-ieee_article.pdf +0 -0
  83. data/examples/latex_templates/Test-ieee_article/mybibfile.bib +20 -0
  84. data/examples/latex_templates/Test-rjournal_article/RJournal.sty +335 -0
  85. data/examples/latex_templates/Test-rjournal_article/RJreferences.bib +18 -0
  86. data/examples/latex_templates/Test-rjournal_article/RJwrapper.pdf +0 -0
  87. data/examples/latex_templates/Test-rjournal_article/Test-rjournal_article.Rmd +52 -0
  88. data/examples/latex_templates/Test-springer_article/Test-springer_article.Rmd +65 -0
  89. data/examples/latex_templates/Test-springer_article/Test-springer_article.pdf +0 -0
  90. data/examples/latex_templates/Test-springer_article/bibliography.bib +26 -0
  91. data/examples/latex_templates/Test-springer_article/spbasic.bst +1658 -0
  92. data/examples/latex_templates/Test-springer_article/spmpsci.bst +1512 -0
  93. data/examples/latex_templates/Test-springer_article/spphys.bst +1443 -0
  94. data/examples/latex_templates/Test-springer_article/svglov3.clo +113 -0
  95. data/examples/latex_templates/Test-springer_article/svjour3.cls +1431 -0
  96. data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.Rmd +73 -0
  97. data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.pdf +0 -0
  98. data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.Rmd +382 -0
  99. data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.pdf +0 -0
  100. data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.Rmd +164 -0
  101. data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.pdf +0 -0
  102. data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.Rmd +92 -0
  103. data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.pdf +0 -0
  104. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/attend-grade-relationships.csv +482 -0
  105. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.Rmd +280 -0
  106. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.pdf +0 -0
  107. data/examples/rmarkdown/svm-xaringan-example/svm-xaringan-example.Rmd +386 -0
  108. data/lib/R_interface/r.rb +1 -1
  109. data/lib/R_interface/r_libs.R +1 -1
  110. data/lib/R_interface/r_methods.rb +10 -0
  111. data/lib/R_interface/rpkg.rb +1 -0
  112. data/lib/R_interface/rsupport.rb +4 -6
  113. data/lib/gknit.rb +2 -0
  114. data/lib/gknit/draft.rb +105 -0
  115. data/lib/gknit/knitr_engine.rb +0 -33
  116. data/lib/util/exec_ruby.rb +1 -27
  117. data/specs/figures/bg.jpeg +0 -0
  118. data/specs/figures/bg.png +0 -0
  119. data/specs/figures/dose_len.png +0 -0
  120. data/specs/figures/no_args.jpeg +0 -0
  121. data/specs/figures/no_args.png +0 -0
  122. data/specs/figures/width_height.jpeg +0 -0
  123. data/specs/figures/width_height.png +0 -0
  124. data/specs/figures/width_height_units1.jpeg +0 -0
  125. data/specs/figures/width_height_units1.png +0 -0
  126. data/specs/figures/width_height_units2.jpeg +0 -0
  127. data/specs/figures/width_height_units2.png +0 -0
  128. data/specs/r_dataframe.spec.rb +11 -11
  129. data/specs/ruby_expression.spec.rb +1 -0
  130. data/specs/tmp.rb +41 -20
  131. data/version.rb +1 -1
  132. metadata +73 -35
  133. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +0 -41
  134. data/blogs/galaaz_ggplot/galaaz_ggplot.out +0 -10
  135. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/midwest_rb.pdf +0 -0
  136. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/scatter_plot_rb.pdf +0 -0
  137. data/blogs/gknit/gknit.md +0 -1430
  138. data/blogs/gknit/gknit.tex +0 -1358
  139. data/blogs/manual/graph.rb +0 -29
  140. data/blogs/nse_dplyr/nse_dplyr.tex +0 -1373
  141. data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +0 -662
  142. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +0 -57
  143. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +0 -106
  144. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +0 -110
  145. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +0 -174
  146. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +0 -236
  147. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +0 -296
  148. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +0 -236
  149. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +0 -218
  150. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +0 -128
  151. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +0 -150
  152. data/blogs/ruby_plot/ruby_plot_files/figure-latex/dose_len.png +0 -0
  153. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_delivery.png +0 -0
  154. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_dose.png +0 -0
  155. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color.png +0 -0
  156. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color2.png +0 -0
  157. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_decorations.png +0 -0
  158. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_jitter.png +0 -0
  159. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_points.png +0 -0
  160. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_box_plot.png +0 -0
  161. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_violin_plot.png +0 -0
  162. data/blogs/ruby_plot/ruby_plot_files/figure-latex/violin_with_jitter.png +0 -0
  163. data/examples/paper/paper.rb +0 -36
@@ -24,30 +24,59 @@ fontsize: 11pt
24
24
 
25
25
  # Introduction
26
26
 
27
- In this post we will see how to program with _dplyr_ in Galaaz.
27
+ According to Steven Sagaert's answer on Quora to the question "Is programming language R overrated?":
28
28
 
29
- ## But first, what is Galaaz??
29
+ > R is a sophisticated language with an unusual (i.e. non-mainstream) set of features. It’s
30
+ > an impure functional programming language with sophisticated metaprogramming and 3
31
+ > different OO systems.
32
+
33
+ > Just like common lisp you can completely customise how things work via metaprogramming.
34
+ > The biggest example is the tidyverse: by creating it’s own evaluation system (tidyeval)
35
+ > was able to create a custom syntax for dplyr.
36
+
37
+ > Mastering R (the language) and its ecosystem is not a matter of weeks or months but
38
+ > takes years. The rabbit hole goes pretty deep…
39
+
40
+ Although having a highly configurable language might give extreme power to the programmer,
41
+ it can also be, as stated above, a question of years to master it. Programming with _dplyr_
42
+ for instance, requires learning a set of complex concepts and rules that are not easily
43
+ accessible for casual users or _unsophisticated_ programmers, as many users of R are. Being
44
+ _unsophisticated_ is NOT used here in a negative sense, as R was built for statisticians and
45
+ not programmers, who need to solve real problems, often in a short time span, and are not
46
+ concerned about creating complex computer systems.
47
+
48
+ Unfortunately, if this _unsophisticated_ programmer decides to move on to more sophisticated
49
+ coding, the learning curve might become a serious impediment.
50
+
51
+ In this post we will see how to program with _dplyr_ in Galaaz and how Ruby can simplify
52
+ the learning curve of mastering _dplyr_ coding.
53
+
54
+ # But first, what is Galaaz??
30
55
 
31
56
  Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with
32
- a large community, a very large set of libraries and great for web development. However,
57
+ a large community, a very large set of libraries and great for web development. It is also
58
+ easy to learn. However,
33
59
  it lacks libraries for data science, statistics, scientific plotting and machine learning.
34
60
  On the other hand, R is considered one of the most powerful languages for solving all of the
35
61
  above problems. Maybe the strongest competitor to R is Python with libraries such as NumPy,
36
- Pandas, SciPy, SciKit-Learn and many more.
62
+ Pandas, SciPy, SciKit-Learn and many more. We will not get into the discussion of R
63
+ versus Python here; both are excellent languages with powerful features, benefits and drawbacks.
64
+ Our interest is to bring to yet another excellent language, Ruby, the data science libraries
65
+ that it lacks.
37
66
 
38
67
  With Galaaz we do not intend to re-implement any of the scientific libraries in R. However, we
39
68
  allow for very tight coupling between the two languages to the point that the Ruby
40
69
  developer does not need to know that there is an R engine running. Also, from the point of
41
- view of the R user/developer Galaaz looks a lot like R, with just minor syntactic difference,
42
- so there is almost no learning courve for the R developer. And as we will see in this
43
- post, programming with _dplyr_ is easier in Galaaz than in R.
70
+ view of the R user/developer, Galaaz looks a lot like R, with just minor syntactic differences,
71
+ so there is almost no learning curve for the R developer. And as we will see in this
72
+ post, programming with _dplyr_ is easier in Galaaz than in R.
44
73
 
45
- R users are probably quite knowledgeable about _dplyr_, for the Ruby developer, _dplyr_ and
74
+ R users are probably quite knowledgeable about _dplyr_. For the Ruby developer, _dplyr_ and
46
75
  the _tidyverse_ libraries are a set of libraries for data manipulation in R, developed by
47
76
  Hadley Wickham, chief scientist at RStudio and a prolific R coder and writer.
48
77
 
49
- For the coupling of Ruby and R we use new technologies provided by Oracle: GraalVM,
50
- TruffleRuby and FastR:
78
+ For the coupling of Ruby and R, we use new technologies provided by Oracle: GraalVM,
79
+ TruffleRuby and FastR. The GraalVM home page had the following definition:
51
80
 
52
81
  GraalVM is a universal virtual machine for running applications
53
82
  written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java,
@@ -76,22 +105,39 @@ TruffleRuby and FastR:
76
105
  With GraalVM we aim to allow developers to freely choose the right language
77
106
  for the task at hand without making compromises.
78
107
 
79
- Interested readers should also check out the following sites:
80
108
 
81
- * [GraalVM Home](https://www.graalvm.org/)
82
- * [TruffleRuby](https://github.com/oracle/truffleruby)
83
- * [FastR](https://github.com/oracle/fastr)
84
- * [Faster R with FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
85
- * [How to make Beautiful Ruby Plots with Galaaz](https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857)
86
- * [Ruby Plotting with Galaaz: An example of tightly coupling Ruby and R in GraalVM](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021)
87
- * [How to do reproducible research in Ruby with gKnit](https://towardsdatascience.com/how-to-do-reproducible-research-in-ruby-with-gknit-c26d2684d64e)
88
- * [R for Data Science](https://r4ds.had.co.nz/)
89
- * [Advanced R](https://adv-r.hadley.nz/)
109
+ # Tidyverse and dplyr
110
+
111
+ In [What is the tidyverse?](https://rviews.rstudio.com/2017/06/08/what-is-the-tidyverse/) the
112
+ tidyverse is explained as follows:
113
+
114
+ > The tidyverse is a coherent system of packages for data manipulation, exploration and
115
+ > visualization that share a common design philosophy. These were mostly developed by
116
+ > Hadley Wickham himself, but they are now being expanded by several contributors. Tidyverse
117
+ > packages are intended to make statisticians and data scientists more productive by
118
+ > guiding them through workflows that facilitate communication, and result in reproducible
119
+ > work products. Fundamentally, the tidyverse is about the connections between the tools
120
+ > that make the workflow possible.
121
+
122
+ _dplyr_ is one of the many packages that are part of the tidyverse. It is:
123
+
124
+ > a grammar of data manipulation, providing a consistent set of verbs that help you solve
125
+ > the most common data manipulation challenges:
90
126
 
91
- ## Programming with dplyr
127
+ > 1. mutate() adds new variables that are functions of existing variables
128
+ > 2. select() picks variables based on their names.
129
+ > 3. filter() picks cases based on their values.
130
+ > 4. summarise() reduces multiple values down to a single summary.
131
+ > 5. arrange() changes the ordering of the rows.
92
132
 
93
- This post will follow closely the work done in https://dplyr.tidyverse.org/articles/programming.html,
94
- by Hardley Wickham. In it, Hardley states:
133
+ Very often R is used interactively and users use _dplyr_ to manipulate a single dataset
134
+ without programming. When users want to replicate their work for
135
+ multiple datasets, programming becomes necessary.
136
+
137
+ # Programming with dplyr
138
+
139
+ In the vignette ["Programming with dplyr"](https://dplyr.tidyverse.org/articles/programming.html),
140
+ Hadley Wickham states:
95
141
 
96
142
  > Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that
97
143
  > means they don’t follow the usual R rules of evaluation. Instead, they capture the
@@ -106,6 +152,8 @@ by Hardley Wickham. In it, Hardley states:
106
152
  > database backends because dplyr itself doesn’t do any work, but instead generates the SQL
107
153
  > that tells the database what to do.
108
154
 
155
+ But then he goes on:
156
+
109
157
  > Unfortunately these benefits do not come for free. There are two main drawbacks:
110
158
 
111
159
  > Most dplyr arguments are not referentially transparent. That means you can’t replace a value
@@ -131,8 +179,27 @@ filter(df, my_var == 1)
131
179
  ```
132
180
  > This makes it hard to create functions with arguments that change how dplyr verbs are computed.
133
181
 
134
- In this post we will see that programming with _dplyr_ in Galaaz does not require knowledge of
135
- non-standard evaluation in R and can be accomplished by utilizing normal Ruby constructs.
182
+ As a result of this, programming with _dplyr_ requires learning a set of new ideas and concepts.
183
+ In the vignette, Hadley goes on to show how to program ever more difficult problems with _dplyr_,
184
+ showing the problems it faces and the new concepts needed to solve them.
185
+
186
+ In this blog, we will look at all the problems presented by Hadley in the vignette and show how
187
+ those same problems can be solved using Galaaz and the Ruby language.
188
+
189
+ This blog is organized as follows: first we show how to write expressions using Galaaz.
190
+ Expressions are a fundamental concept in _dplyr_ and are not part of basic Ruby. We extend
191
+ the Ruby language to create and manipulate expressions that will be used by _dplyr_ functions.
192
+
193
+ Then we show very succinctly how Ruby and R can be integrated and how R functions are
194
+ transparently called from Ruby. The Galaaz [user manual](https://github.com/rbotafogo/galaaz/wiki)
195
+ (still in development) goes into much greater detail about this integration.
196
+
197
+ Next, in section "Data manipulation with _dplyr_", we go through all the problems in the
198
+ _dplyr_ vignette and look at how they are solved in Galaaz. We then discuss why programming
199
+ with Galaaz and _dplyr_ is easier than programming with _dplyr_ in plain R.
200
+
201
+ The following section looks at another more advanced problem and shows that Galaaz can still
202
+ handle it without any difficulty. We then provide further reading and concluding remarks.
136
203
 
137
204
  # Writing Expressions in Galaaz
138
205
 
@@ -141,12 +208,24 @@ Galaaz extends Ruby to work with expressions, similar to R's expressions build w
141
208
  formulae. For instance, in mathematics, the expression $y = sin(x)$ describes a function but cannot
142
209
  be computed unless the value of $x$ is bound to some value.
143
210
 
144
- Let's take a look at some of those expressions in Ruby:
211
+ Expressions are fundamental in _dplyr_ programming as they are the input to _dplyr_ functions.
212
+ For instance, as we will see shortly, if a data frame has a column named 'x' and we want
213
+ to add another column, 'y', to this data frame holding the values of 'x' times 2, then we would
214
+ call a _dplyr_ function with the expression 'y = x * 2'.
215
+
216
+ ## A note on notation
217
+
218
+ This blog was written in Rmarkdown and automatically converted to HTML or PDF (depending on
219
+ where you are reading this blog) with gKnit (a tool provided by Galaaz). In Rmarkdown, it is
220
+ possible to write text and code blocks that are executed to generate the final report. Code
221
+ blocks appear inside a 'box' and the result of their execution appears either in another type
222
+ of 'box' with a different background (HTML) or as normal text (PDF). Every output line from
223
+ the code execution is preceded by '##'.
145
224
 
146
225
  ## Expressions from operators
147
226
 
148
- The code bellow creates an expression summing two symbols. Note that :a and :b are Ruby symbols and
149
- are not bound to any value at the time of expression definition:
227
+ The code below creates an expression summing two symbols. Note that :a and :b are Ruby symbols and
228
+ are not bound to any values at the time of expression definition:
150
229
 
151
230
 
152
231
  ```ruby
@@ -157,7 +236,7 @@ puts exp1
157
236
  ```
158
237
  ## a + b
159
238
  ```
160
- We can build any complex mathematical expression such as:
239
+ In Galaaz, we can build any complex mathematical expression such as:
161
240
 
162
241
 
163
242
  ```ruby
@@ -168,7 +247,13 @@ puts exp2
168
247
  ```
169
248
  ## (a + b) * 2 + c^2L/z
170
249
  ```
171
- The 'L' after two indicates that 2 is an integer.
250
+ Expressions are printed with the same format as the equivalent R expressions. The 'L' after
251
+ 2 indicates that 2 is an integer.
252
+
253
+ The R developer should note that in R, if she writes the
254
+ number '2', the R interpreter will treat it as a float. In order to get an integer she
255
+ should write '2L'. Galaaz follows Ruby notation and '2' is an integer, while '2.0' is a
256
+ float.
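
The Ruby side of this convention can be checked in plain Ruby, without Galaaz. The snippet below is a minimal sketch; the variable names are illustrative only and do not come from the post.

```ruby
# Plain Ruby, no Galaaz required: how Ruby itself types numeric literals.
int_literal   = 2    # no decimal point: an Integer
float_literal = 2.0  # decimal point: a Float

puts int_literal.class    # Integer
puts float_literal.class  # Float

# Mixing the two promotes the result to Float; conversion back is explicit.
product = int_literal * float_literal
puts product.class        # Float
puts product.to_i.class   # Integer
```

This is why, in the printed expressions, a Ruby `2` shows up as R's `2L` while a Ruby `2.0` would show up without the `L` suffix.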
172
257
 
173
258
  It is also possible to use inequality operators in building expressions:
174
259
 
@@ -186,13 +271,13 @@ Expressions' definition can also make use of normal Ruby variables without any p
186
271
 
187
272
  ```ruby
188
273
  x = 20
189
- y = 30
274
+ y = 30.0
190
275
  exp_var = (:a + :b) * x <= :z - y
191
276
  puts exp_var
192
277
  ```
193
278
 
194
279
  ```
195
- ## (a + b) * 20L <= z - 30L
280
+ ## (a + b) * 20L <= z - 30
196
281
  ```
197
282
 
198
283
  Galaaz provides both symbolic representations for operators, such as (>, <, !=) as functional
@@ -209,10 +294,12 @@ puts exp4
209
294
  ## a + b >= z
210
295
  ```
211
296
 
212
- Two type of expression, however, can only be created with the functional representation
213
- of the operators, those are expressions involving '==', and '='. In order to write an
214
- expression involving '==' we
215
- need to use the method '.eq' and for '=' we need the function '.assign'
297
+ Two types of expressions, however, can only be created with the functional representation
298
+ of the operators. Those are expressions involving '==', and '='. This is the case since
299
+ those symbols have special meaning in Ruby and should not be redefined.
300
+
301
+ In order to write an expression involving '==' we
302
+ need to use the method '.eq' and for '=' we need the function '.assign':
216
303
 
217
304
 
218
305
  ```ruby
@@ -233,8 +320,9 @@ puts exp6
233
320
  ```
234
321
  ## y <- a + b
235
322
  ```
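
To make the distinction between overloaded operators and the named methods '.eq' and '.assign' concrete, here is a plain-Ruby sketch. It is an assumption-laden illustration, not Galaaz's implementation: the class `SketchExpr` and its string-based representation are hypothetical and exist only to show the idea of building an unevaluated expression.

```ruby
# Hypothetical illustration only; this is NOT how Galaaz is implemented.
# Expressions are modelled as plain strings that are never evaluated.
class SketchExpr
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Accept either another SketchExpr or a plain Ruby value such as 2.
  def self.wrap(value)
    value.is_a?(SketchExpr) ? value : SketchExpr.new(value.to_s)
  end

  # Arithmetic operators can be overloaded to build a larger expression.
  def +(other)
    SketchExpr.new("#{text} + #{SketchExpr.wrap(other).text}")
  end

  # Named methods stand in for '==' and '=', which keep their Ruby meaning.
  def eq(other)
    SketchExpr.new("#{text} == #{SketchExpr.wrap(other).text}")
  end

  def assign(other)
    SketchExpr.new("#{text} <- #{SketchExpr.wrap(other).text}")
  end

  def to_s
    text
  end
end

a = SketchExpr.new("a")
b = SketchExpr.new("b")
y = SketchExpr.new("y")
z = SketchExpr.new("z")

puts (a + b).eq(z)     # a + b == z
puts y.assign(a + b)   # y <- a + b
puts a == b            # false: '==' is still Ruby's ordinary object comparison
```

In this sketch `==` is deliberately left alone, so it compares the two Ruby objects right away instead of producing an expression, which mirrors the reason given in the text for offering '.eq' and '.assign'.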
236
- In general we think that using the functional notation is preferable to using the
237
- symbolic notation as otherwise, we end up writing invalid expressions such as
323
+ Users should be careful when writing expressions not to inadvertently use '==' or '=' as
324
+ this will generate an error that might be a bit cryptic (in future releases of Galaaz, we
325
+ plan to improve the error message).
238
326
 
239
327
 
240
328
  ```ruby
@@ -248,17 +336,16 @@ puts exp_wrong
248
336
  ## object 'a' not found (RError)
249
337
  ## Translated to internal error
250
338
  ```
251
- and it might be difficult to understand what is going on here. The problem lies with the fact that
252
- when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When the
253
- comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols at
254
- this time are not bound to anything and we get a "object 'a' not found" message.
255
- If we only use functional notation, this type of error will not occur.
339
+ The problem lies with the fact that
340
+ when using '==' we are comparing expression (:a + :b) to expression :z. When this
341
+ comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols, at
342
+ this time, are not bound to anything, giving the "object 'a' not found" message.
256
343
 
257
344
  ## Expressions with R methods
258
345
 
259
346
  It is often necessary to create an expression that uses a method or function. For instance, in
260
347
  mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
261
- 'sin' function is part of the expression and should not immediately be executed. When we want
348
+ 'sin' function is part of the expression and should not be immediately executed. When we want
262
349
  the function to be part of the expression, we call the function preceding it
263
350
  by the letter E, such as 'E.sin(x)'
264
351
 
@@ -271,8 +358,7 @@ puts exp7
271
358
  ```
272
359
  ## y <- sin(x)
273
360
  ```
274
-
275
- Expressions can also be written using '.' notation:
361
+ Function expressions can also be written using '.' notation:
276
362
 
277
363
 
278
364
  ```ruby
@@ -283,8 +369,9 @@ puts exp8
283
369
  ```
284
370
  ## y <- sin(x)
285
371
  ```
286
-
287
- When a function has multiple arguments, the first one can be used before the '.':
372
+ When a function has multiple arguments, the first one can be used before the '.'. For instance,
373
+ the R concatenate function 'c', which concatenates two or more arguments, can be part of
374
+ an expression as:
288
375
 
289
376
 
290
377
  ```ruby
@@ -295,11 +382,16 @@ puts exp9
295
382
  ```
296
383
  ## c(x, y)
297
384
  ```
385
+ Note that this gives an OO feeling to the code, as if we were saying 'x' concatenates 'y'. As a
386
+ side note, '.' notation can be used as the R pipe operator '%>%', but is more general than the
387
+ pipe.
298
388
 
299
389
  ## Evaluating an Expression
300
390
 
301
- Expressions can be evaluated by calling function 'eval' with a binding. A binding can be provided
302
- with a list:
391
+ Although we are mainly focusing on expressions to pass them to _dplyr_ functions, expressions
392
+ can be evaluated by calling function 'eval' with a binding.
393
+
394
+ A binding can be provided with a list or a data frame as shown below:
303
395
 
304
396
 
305
397
  ```ruby
@@ -311,7 +403,7 @@ puts exp.eval(R.list(a: 10, b: 20, c: 30, z: 40))
311
403
  ## [1] 82.5
312
404
  ```
313
405
 
314
- ... with a data frame:
406
+ with a data frame:
315
407
 
316
408
 
317
409
  ```ruby
@@ -336,8 +428,8 @@ this post, we do not have enough space to write a complete manual on Galaaz
336
428
  (a short manual can be found at: https://www.rubydoc.info/gems/galaaz/0.4.9), so we will
337
429
  present only a few example scripts using Galaaz.
338
430
 
339
- Basically, to call an R function from Ruby with Galaaz, one only needs to preceed the function
340
- with 'R.'. For instance, to create a vector in R, the 'c' function is used. From Galaaz, a
431
+ Basically, to call an R function from Ruby with Galaaz, one only needs to precede the function
432
+ with 'R.'. For instance, to create a vector in R, the 'c' function is used. In Galaaz, a
341
433
  vector can be created by using 'R.c':
342
434
 
343
435
 
@@ -399,11 +491,103 @@ puts exp10
399
491
  ## y <- 0.850903524534118
400
492
  ```
401
493
 
402
- # Filtering using expressions
494
+ # Data manipulation with _dplyr_
495
+
496
+ In this section we will give a brief tour of _dplyr_'s usage in Galaaz and how to manipulate
497
+ data in Ruby with it. This section will follow [_dplyr_'s vignette](https://dplyr.tidyverse.org/articles/dplyr.html) that explores the nycflights13 data set. This dataset contains all 336776
498
+ flights that departed from New York City in 2013. The data comes from the US Bureau of
499
+ Transportation Statistics.
500
+
501
+ Let's start by taking a look at this dataset:
502
+
503
+
504
+ ```ruby
505
+ R.library('nycflights13')
506
+ # check its dimensions
507
+ puts ~:flights.dim
508
+ # and the structure
509
+ ~:flights.str
510
+ ```
511
+
512
+ ```
513
+ ## Message:
514
+ ## Method ~ not found in R environment
515
+ ```
516
+
517
+ ```
518
+ ## Message:
519
+ ## /home/rbotafogo/desenv/galaaz/lib/R_interface/rsupport.rb:90:in `eval'
520
+ ## /home/rbotafogo/desenv/galaaz/lib/R_interface/rsupport.rb:270:in `exec_function_name'
521
+ ## /home/rbotafogo/desenv/galaaz/lib/R_interface/robject.rb:166:in `method_missing'
522
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:105:in `get_binding'
523
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:102:in `eval'
524
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:102:in `exec_ruby'
525
+ ## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:650:in `block in initialize'
526
+ ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `call'
527
+ ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `callback'
528
+ ## (eval):3:in `function(...) {\n rb_method(...)'
529
+ ## unknown.r:1:in `in_dir'
530
+ ## unknown.r:1:in `block_exec'
531
+ ## /usr/local/lib/graalvm-ce-java11-20.0.0/languages/R/library/knitr/R/block.R:92:in `call_block'
532
+ ## /usr/local/lib/graalvm-ce-java11-20.0.0/languages/R/library/knitr/R/block.R:6:in `process_group.block'
533
+ ## /usr/local/lib/graalvm-ce-java11-20.0.0/languages/R/library/knitr/R/block.R:3:in `<no source>'
534
+ ## unknown.r:1:in `withCallingHandlers'
535
+ ## unknown.r:1:in `process_file'
536
+ ## unknown.r:1:in `<no source>'
537
+ ## unknown.r:1:in `<no source>'
538
+ ## <REPL>:4:in `<repl wrapper>'
539
+ ## <REPL>:1
540
+ ```
541
+
542
+ Now, let's use a first verb of _dplyr_: 'filter'. This verb, obviously, will filter the data
543
+ by the given expression. In the next block, we filter by columns 'month' and 'day'. The
544
+ first argument to the filter function is symbol ':flights'. A Ruby symbol, when given to
545
+ an R function, is converted to the R variable of the same name, in this case 'flights', which
546
+ holds the nycflights13 data frame.
547
+
548
+ The second and third arguments are expressions that will be used by the filter function to
549
+ filter by columns, looking for entries in which the month and day are equal to 1.
550
+
551
+
552
+ ```ruby
553
+ puts R.filter(:flights, (:month.eq 1), (:day.eq 1))
554
+ ```
555
+
556
+ ```
557
+ ## # A tibble: 842 x 19
558
+ ## year month day dep_time sched_dep_time dep_delay arr_time
559
+ ## <int> <int> <int> <int> <int> <dbl> <int>
560
+ ## 1 2013 1 1 517 515 2 830
561
+ ## 2 2013 1 1 533 529 4 850
562
+ ## 3 2013 1 1 542 540 2 923
563
+ ## 4 2013 1 1 544 545 -1 1004
564
+ ## 5 2013 1 1 554 600 -6 812
565
+ ## 6 2013 1 1 554 558 -4 740
566
+ ## 7 2013 1 1 555 600 -5 913
567
+ ## 8 2013 1 1 557 600 -3 709
568
+ ## 9 2013 1 1 557 600 -3 838
569
+ ## 10 2013 1 1 558 600 -2 753
570
+ ## # … with 832 more rows, and 12 more variables: sched_arr_time <int>,
571
+ ## # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
572
+ ## # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
573
+ ## # minute <dbl>, time_hour <dttm>
574
+ ```
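The idea that ':month.eq 1' builds an expression instead of computing a Ruby boolean can be illustrated in plain Ruby. The following is only a minimal sketch under stated assumptions: the class `SketchExpr` and the `eq` method added to `Symbol` are hypothetical names invented for this illustration, and this is not how Galaaz is implemented internally.

```ruby
# Hypothetical sketch, NOT Galaaz's implementation: it only illustrates how
# calling a method on a symbol can build an expression for later use,
# instead of evaluating a comparison right away.
class SketchExpr
  def initialize(text)
    @text = text
  end

  # Render the expression as R-like source text.
  def to_s
    @text
  end
end

# Reopening Symbol is done here only to keep the illustration short.
class Symbol
  def eq(other)
    # Symbols on the right-hand side are treated as column names too.
    rhs = other.is_a?(Symbol) ? other.to_s : other.inspect
    SketchExpr.new("#{self} == #{rhs}")
  end
end

month_is_one = :month.eq 1
puts month_is_one   # prints: month == 1
```

Because the result is an object and not 'true' or 'false', a function such as 'filter' can receive it and decide later how to evaluate it against the columns of a data frame.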
575
+
576
+
577
+ ## Programming with _dplyr_: problems and how to solve them in Galaaz
578
+
579
+ In this section we look at the list of problems that Hadley describes in the "Programming with dplyr"
580
+ vignette and show how those problems are solved and coded with Galaaz. Readers interested in
581
+ how those problems are treated in _dplyr_ should read the vignette and use it as a comparison with
582
+ this blog.
403
583
 
404
- Now that we know how to write expression and call R functions let's do some data manipulation in
405
- Galaaz. Let's first start by creating the same data frame that we created previously in section
406
- "Programming with dplyr":
584
+ ## Filtering using expressions
585
+
586
+ Now that we know how to write expressions and call R functions, let's do some data manipulation in
587
+ Galaaz. Let's first start by creating a data frame. In R, the 'data.frame' function creates a
588
+ data frame. In Ruby, writing 'data.frame' will not parse as a single object. To call R
589
+ functions that have a '.' in them, we need to substitute the '.' with '__'. So, method
590
+ 'data.frame' in R is called in Galaaz as 'R.data\_\_frame':
407
591
 
408
592
 
409
593
  ```ruby
@@ -417,9 +601,15 @@ puts df
417
601
  ## 2 2 2
418
602
  ## 3 3 1
419
603
  ```
420
- The 'filter' function can be called on this data frame either by using 'R.filter(df, ...)' or
421
- by using dot notation. We prefer to use dot notation as shown bellow. The argument to 'filter'
422
- in Galaaz should be an expression. Note that if we gave to filter a Ruby expression such as
604
+
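The '.' to '__' naming rule can be pictured with a small plain Ruby sketch. This is an illustration only, under the assumption that a dispatcher rewrites the method name before looking up the R function; the module `SketchR` is a hypothetical name and does not reproduce Galaaz's actual dispatch code.

```ruby
# Hypothetical sketch, NOT Galaaz's code. It only shows the naming idea:
# a Ruby method name containing '__' maps to an R function name with '.'.
module SketchR
  # Any unknown method is translated into the text of an R call.
  def self.method_missing(name, *args)
    r_name = name.to_s.gsub('__', '.')
    "#{r_name}(#{args.map(&:inspect).join(', ')})"
  end

  # Keep respond_to? consistent with method_missing.
  def self.respond_to_missing?(_name, _include_private = false)
    true
  end
end

puts SketchR.data__frame(1, 2)       # prints: data.frame(1, 2)
puts SketchR.as__data__frame("df")   # prints: as.data.frame("df")
```

The double underscore is a convenient choice because it is valid inside a Ruby method name, while a '.' would be parsed as a method call on the result of 'data'.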
605
+ _dplyr_ provides the 'filter' function, which filters data in a data frame. The 'filter'
606
+ function can be called on this data frame either by using 'R.filter(df, ...)' or
607
+ by using dot notation.
608
+
609
+ -------FIX---------
610
+
611
+ We prefer to use dot notation as shown below. The argument to 'filter' should be an
612
+ expression. Note that if we gave 'filter' a Ruby expression such as
423
613
  'x == 1', we would get an error, since there is no variable 'x' defined and if 'x' was a variable
424
614
  then 'x == 1' would either be 'true' or 'false'. Our goal is to filter our data frame returning
425
615
  all rows in which the 'x' value is equal to 1. To express this we want: ':x.eq 1', where :x will
@@ -434,7 +624,6 @@ puts df.filter(:x.eq 1)
434
624
  ## x y
435
625
  ## 1 1 3
436
626
  ```
437
-
438
627
  In R, and when coding with 'tidyverse', arguments to a function are usually not
439
628
  *referentially transparent*. That is, you can’t replace a value with a seemingly equivalent
440
629
  object that you’ve defined elsewhere. In other words, this code
@@ -504,7 +693,7 @@ puts df.filter(:x.eq y)
504
693
  ## x y
505
694
  ## 1 1 3
506
695
  ```
507
- # Writing a function that applies to different data sets
696
+ ## Writing a function that applies to different data sets
508
697
 
509
698
  Let's suppose that we want to write a function that receives as the first argument a data frame
510
699
  and as second argument an expression that adds a column to the data frame that is equal to the
@@ -571,7 +760,7 @@ mutate_y(df1)
571
760
  ## mismatched protect/unprotect (unprotect with empty protect stack) (RError)
572
761
  ## Translated to internal error
573
762
  ```
574
- # Different expressions
763
+ ## Different expressions
575
764
 
576
765
  Let's move to the next problem presented by Hadley: writing a function in R
577
766
  that receives two arguments, the first a variable and the second an expression, is not trivial.
@@ -589,16 +778,16 @@ df <- data.frame(
589
778
  b = sample(5)
590
779
  )
591
780
 
592
- as.data.frame(df)
781
+ as.data.frame(df)
593
782
  ```
594
783
 
595
784
  ```
596
785
  ## g1 g2 a b
597
- ## 1 1 1 2 1
598
- ## 2 1 2 4 3
599
- ## 3 2 1 5 4
600
- ## 4 2 2 3 2
601
- ## 5 2 1 1 5
786
+ ## 1 1 1 3 3
787
+ ## 2 1 2 2 1
788
+ ## 3 2 1 5 2
789
+ ## 4 2 2 4 5
790
+ ## 5 2 1 1 4
602
791
  ```
603
792
 
604
793
  ```r
@@ -606,13 +795,13 @@ d2 <- df %>%
606
795
  group_by(g1) %>%
607
796
  summarise(a = mean(a))
608
797
 
609
- as.data.frame(d2)
798
+ as.data.frame(d2)
610
799
  ```
611
800
 
612
801
  ```
613
- ## g1 a
614
- ## 1 1 3
615
- ## 2 2 3
802
+ ## g1 a
803
+ ## 1 1 2.500000
804
+ ## 2 2 3.333333
616
805
  ```
617
806
 
618
807
  ```r
@@ -624,9 +813,9 @@ as.data.frame(d2)
624
813
  ```
625
814
 
626
815
  ```
627
- ## g2 a
628
- ## 1 1 2.666667
629
- ## 2 2 3.500000
816
+ ## g2 a
817
+ ## 1 1 3
818
+ ## 2 2 3
630
819
  ```
631
820
 
632
821
  As shown by Hadley, one might expect this function to do the trick:
@@ -662,38 +851,42 @@ def my_summarize(df, group_var)
662
851
  summarize(a: :a.mean)
663
852
  end
664
853
 
665
- puts my_summarize(:df, :g1).as__data__frame
854
+ puts my_summarize(:df, :g1)
666
855
  ```
667
856
 
668
857
  ```
669
858
  ## g1 g2 a b
670
- ## 1 1 1 2 1
671
- ## 2 1 2 4 3
672
- ## 3 2 1 5 4
673
- ## 4 2 2 3 2
674
- ## 5 2 1 1 5
859
+ ## 1 1 1 3 3
860
+ ## 2 1 2 2 1
861
+ ## 3 2 1 5 2
862
+ ## 4 2 2 4 5
863
+ ## 5 2 1 1 4
675
864
  ##
676
- ## g1 a
677
- ## 1 1 3
678
- ## 2 2 3
865
+ ## # A tibble: 2 x 2
866
+ ## g1 a
867
+ ## <dbl> <dbl>
868
+ ## 1 1 2.5
869
+ ## 2 2 3.33
679
870
  ```
680
871
  It works!!! Well, let's make sure this was not just some coincidence
681
872
 
682
873
 
683
874
  ```ruby
684
- puts my_summarize(:df, :g2).as__data__frame
875
+ puts my_summarize(:df, :g2)
685
876
  ```
686
877
 
687
878
  ```
688
- ## g2 a
689
- ## 1 1 2.666667
690
- ## 2 2 3.500000
879
+ ## # A tibble: 2 x 2
880
+ ## g2 a
881
+ ## <dbl> <dbl>
882
+ ## 1 1 3
883
+ ## 2 2 3
691
884
  ```
692
885
 
693
886
  Great, everything is fine! No magic, no new functions, no complexities, just normal, standard Ruby
694
887
  code. If you've ever done NSE in R, this certainly feels much safer and easier to implement.
695
888
 
696
- # Different input variables
889
+ ## Different input variables
697
890
 
698
891
  In the previous section we've managed to get rid of all NSE formulation for a simple example, but
699
892
  does this remain true for more complex examples, or will the Galaaz way prove impractical for
@@ -743,7 +936,7 @@ puts my_summarise2((~:df), :a * :b)
743
936
  Once again, there is no need to use any special theory or functions. The only point to be
744
937
  careful about is the use of 'E' to build expressions from functions 'mean', 'sum' and 'n'.
745
938
 
746
- # Different input and output variable
939
+ ## Different input and output variable
747
940
 
748
941
  Now the next challenge presented by Hadley is to vary the name of the output variables based on
749
942
  the received expression. So, if the input expression is 'a', we want our data frame columns to
@@ -792,17 +985,17 @@ puts my_mutate((~:df), :b)
792
985
 
793
986
  ```
794
987
  ## g1 g2 a b mean_a sum_a
795
- ## 1 1 1 2 1 3 15
796
- ## 2 1 2 4 3 3 15
797
- ## 3 2 1 5 4 3 15
798
- ## 4 2 2 3 2 3 15
799
- ## 5 2 1 1 5 3 15
988
+ ## 1 1 1 3 3 3 15
989
+ ## 2 1 2 2 1 3 15
990
+ ## 3 2 1 5 2 3 15
991
+ ## 4 2 2 4 5 3 15
992
+ ## 5 2 1 1 4 3 15
800
993
  ## g1 g2 a b mean_b sum_b
801
- ## 1 1 1 2 1 3 15
802
- ## 2 1 2 4 3 3 15
803
- ## 3 2 1 5 4 3 15
804
- ## 4 2 2 3 2 3 15
805
- ## 5 2 1 1 5 3 15
994
+ ## 1 1 1 3 3 3 15
995
+ ## 2 1 2 2 1 3 15
996
+ ## 3 2 1 5 2 3 15
997
+ ## 4 2 2 4 5 3 15
998
+ ## 5 2 1 1 4 3 15
806
999
  ```
807
1000
  It really seems that "Non Standard Evaluation" is actually quite standard in Galaaz! But, you
808
1001
  might have noticed a small change in the way the arguments to the mutate method were called.
@@ -812,7 +1005,7 @@ and variable mean\_name is not followed by ':' but by '=>'. This is standard Ru
812
1005
 
813
1006
  [explain....]
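The difference between ':' and '=>' mentioned above is standard Ruby hash syntax and can be shown without any R code. The sketch below is an illustration only: the helper `column_names` is a hypothetical name, and the values 3 and 15 are placeholders standing in for the mean and sum expressions.

```ruby
# Plain Ruby illustration of fixed versus computed hash keys.
# 'column_names' is a hypothetical helper used only for this sketch.
def column_names(expr_name)
  mean_name = "mean_#{expr_name}"
  sum_name = "sum_#{expr_name}"
  # With '=>' the key is the VALUE of the variable, so it can be computed.
  { mean_name => 3, sum_name => 15 }
end

# With the 'key:' shorthand the key is always the literal symbol written.
fixed = { mean_name: 3 }

p fixed.keys               # prints: [:mean_name]
p column_names("a").keys   # prints: ["mean_a", "sum_a"]
p column_names("b").keys   # prints: ["mean_b", "sum_b"]
```

This is why a variable holding the output column name has to be followed by '=>': writing 'mean_name: ...' would create a column literally called 'mean_name'.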
814
1007
 
815
- # Capturing multiple variables
1008
+ ## Capturing multiple variables
816
1009
 
817
1010
  Moving on with new complexities, Hadley asks us to solve the problem in which the
818
1011
  summarise function will receive any number of grouping variables.
@@ -827,15 +1020,18 @@ def my_summarise3(df, *group_vars)
827
1020
  summarise(a: E.mean(:a))
828
1021
  end
829
1022
 
830
- puts my_summarise3((~:df), :g1, :g2).as__data__frame
1023
+ puts my_summarise3((~:df), :g1, :g2)
831
1024
  ```
832
1025
 
833
1026
  ```
834
- ## g1 g2 a
835
- ## 1 1 1 2
836
- ## 2 1 2 4
837
- ## 3 2 1 3
838
- ## 4 2 2 3
1027
+ ## # A tibble: 4 x 3
1028
+ ## # Groups: g1 [?]
1029
+ ## g1 g2 a
1030
+ ## <dbl> <dbl> <dbl>
1031
+ ## 1 1 1 3
1032
+ ## 2 1 2 2
1033
+ ## 3 2 1 3
1034
+ ## 4 2 2 4
839
1035
  ```
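The '*group_vars' parameter used in 'my_summarise3' is Ruby's standard splat operator, which collects any number of extra arguments into an array. A minimal plain Ruby sketch of that mechanism follows; the function `describe_groups` is a hypothetical name and only builds a string, it does not call _dplyr_.

```ruby
# Plain Ruby illustration of the splat operator; 'describe_groups' is a
# hypothetical helper that only reports which arguments it received.
def describe_groups(df_name, *group_vars)
  # group_vars is an ordinary Array holding every argument after the first.
  "#{df_name} grouped by #{group_vars.join(', ')}"
end

puts describe_groups(:df, :g1)        # prints: df grouped by g1
puts describe_groups(:df, :g1, :g2)   # prints: df grouped by g1, g2

# An existing array can be expanded back into separate arguments with '*'.
vars = [:g1, :g2]
puts describe_groups(:df, *vars)      # prints: df grouped by g1, g2
```

Since the collected arguments form a normal array, they can be forwarded to another method with the same '*' syntax, which is all that is needed to accept a variable number of grouping variables.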
840
1036
 
841
1037
  # Why does R require NSE and Galaaz does not?
@@ -868,7 +1064,7 @@ call the function passing the expression ':a.eq :b'.
868
1064
 
869
1065
  # Advanced dplyr features
870
1066
 
871
- In the blog: Programming with dplyr by using dplyr (https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/) Iñaki Úcar shows surprise that some R users are trying to code in dplyr avoiding
1067
+ In the blog: [Programming with dplyr by using dplyr](https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/) Iñaki Úcar shows surprise that some R users are trying to code in dplyr avoiding
872
1068
  the use of NSE. For instance he says:
873
1069
 
874
1070
  > Take the example of seplyr. It stands for standard evaluation dplyr, and enables us to
@@ -887,38 +1083,21 @@ features of characters in the Starwars movies:
887
1083
 
888
1084
 
889
1085
  ```ruby
890
- puts (~:starwars).head.as__data__frame
891
- ```
892
-
893
- ```
894
- ## name height mass hair_color skin_color eye_color birth_year
895
- ## 1 Luke Skywalker 172 77 blond fair blue 19.0
896
- ## 2 C-3PO 167 75 <NA> gold yellow 112.0
897
- ## 3 R2-D2 96 32 <NA> white, blue red 33.0
898
- ## 4 Darth Vader 202 136 none white yellow 41.9
899
- ## 5 Leia Organa 150 49 brown light brown 19.0
900
- ## 6 Owen Lars 178 120 brown, grey light blue 52.0
901
- ## gender homeworld species
902
- ## 1 male Tatooine Human
903
- ## 2 <NA> Tatooine Droid
904
- ## 3 <NA> Naboo Droid
905
- ## 4 male Tatooine Human
906
- ## 5 female Alderaan Human
907
- ## 6 male Tatooine Human
908
- ## films
909
- ## 1 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
910
- ## 2 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
911
- ## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
912
- ## 4 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
913
- ## 5 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
914
- ## 6 Attack of the Clones, Revenge of the Sith, A New Hope
915
- ## vehicles starships
916
- ## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
917
- ## 2
918
- ## 3
919
- ## 4 TIE Advanced x1
920
- ## 5 Imperial Speeder Bike
921
- ## 6
1086
+ puts (~:starwars).head
1087
+ ```
1088
+
1089
+ ```
1090
+ ## # A tibble: 6 x 13
1091
+ ## name height mass hair_color skin_color eye_color birth_year gender
1092
+ ## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
1093
+ ## 1 Luke… 172 77 blond fair blue 19 male
1094
+ ## 2 C-3PO 167 75 <NA> gold yellow 112 <NA>
1095
+ ## 3 R2-D2 96 32 <NA> white, bl… red 33 <NA>
1096
+ ## 4 Dart… 202 136 none white yellow 41.9 male
1097
+ ## 5 Leia… 150 49 brown light brown 19 female
1098
+ ## 6 Owen… 178 120 brown, gr… light blue 52 male
1099
+ ## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
1100
+ ## # vehicles <list>, starships <list>
922
1101
  ```
923
1102
  The grouped_mean function below will receive a grouping variable and calculate summaries for
924
1103
  the value\_variables given:
@@ -970,28 +1149,44 @@ def grouped_mean(data, grouping_variables, value_variables)
970
1149
  rename_at(value_variables, E.funs(E.paste0("mean_", value_variables)))
971
1150
  end
972
1151
 
973
- puts grouped_mean((~:starwars), "eye_color", E.c("mass", "birth_year")).as__data__frame
1152
+ puts grouped_mean((~:starwars), "eye_color", E.c("mass", "birth_year"))
974
1153
  ```
975
1154
 
976
1155
  ```
977
- ## eye_color mean_mass mean_birth_year count
978
- ## 1 black 76.28571 33.00000 10
979
- ## 2 blue 86.51667 67.06923 19
980
- ## 3 blue-gray 77.00000 57.00000 1
981
- ## 4 brown 66.09231 108.96429 21
982
- ## 5 dark NaN NaN 1
983
- ## 6 gold NaN NaN 1
984
- ## 7 green, yellow 159.00000 NaN 1
985
- ## 8 hazel 66.00000 34.50000 3
986
- ## 9 orange 282.33333 231.00000 8
987
- ## 10 pink NaN NaN 1
988
- ## 11 red 81.40000 33.66667 5
989
- ## 12 red, blue NaN NaN 1
990
- ## 13 unknown 31.50000 NaN 3
991
- ## 14 white 48.00000 NaN 1
992
- ## 15 yellow 81.11111 76.38000 11
1156
+ ## # A tibble: 15 x 4
1157
+ ## eye_color mean_mass mean_birth_year count
1158
+ ## <chr> <dbl> <dbl> <dbl>
1159
+ ## 1 black 76.3 33 10
1160
+ ## 2 blue 86.5 67.1 19
1161
+ ## 3 blue-gray 77 57 1
1162
+ ## 4 brown 66.1 109. 21
1163
+ ## 5 dark NaN NaN 1
1164
+ ## 6 gold NaN NaN 1
1165
+ ## 7 green, yellow 159 NaN 1
1166
+ ## 8 hazel 66 34.5 3
1167
+ ## 9 orange 282. 231 8
1168
+ ## 10 pink NaN NaN 1
1169
+ ## 11 red 81.4 33.7 5
1170
+ ## 12 red, blue NaN NaN 1
1171
+ ## 13 unknown 31.5 NaN 3
1172
+ ## 14 white 48 NaN 1
1173
+ ## 15 yellow 81.1 76.4 11
993
1174
  ```
994
1175
 
1176
+ # Further reading
1177
+
1178
+ For more information on GraalVM, TruffleRuby, fastR, R and Galaaz check out the following sites/posts:
1179
+
1180
+ * [GraalVM Home](https://www.graalvm.org/)
1181
+ * [TruffleRuby](https://github.com/oracle/truffleruby)
1182
+ * [FastR](https://github.com/oracle/fastr)
1183
+ * [Faster R with FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
1184
+ * [How to make Beautiful Ruby Plots with Galaaz](https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857)
1185
+ * [Ruby Plotting with Galaaz: An example of tightly coupling Ruby and R in GraalVM](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021)
1186
+ * [How to do reproducible research in Ruby with gKnit](https://towardsdatascience.com/how-to-do-reproducible-research-in-ruby-with-gknit-c26d2684d64e)
1187
+ * [R for Data Science](https://r4ds.had.co.nz/)
1188
+ * [Advanced R](https://adv-r.hadley.nz/)
1189
+
995
1190
  # Conclusion
996
1191
 
997
1192
  Ruby and Galaaz provide a nice framework for developing code that uses R functions. Although R is