galaaz 0.4.10 → 0.5.0

Files changed (163)
  1. checksums.yaml +4 -4
  2. data/README.md +2048 -531
  3. data/Rakefile +3 -2
  4. data/bin/gknit +152 -6
  5. data/bin/gknit-draft +105 -0
  6. data/bin/gknit-draft.rb +28 -0
  7. data/bin/gknit_Rscript +127 -0
  8. data/bin/grun +27 -1
  9. data/bin/gstudio +47 -4
  10. data/bin/{gstudio.rb → gstudio_irb.rb} +0 -0
  11. data/bin/gstudio_pry.rb +7 -0
  12. data/blogs/galaaz_ggplot/galaaz_ggplot.html +10 -195
  13. data/blogs/galaaz_ggplot/galaaz_ggplot.md +404 -0
  14. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
  15. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
  16. data/blogs/gknit/gknit.Rmd +5 -3
  17. data/blogs/gknit/gknit.pdf +0 -0
  18. data/blogs/gknit/lst.rds +0 -0
  19. data/blogs/manual/lst.rds +0 -0
  20. data/blogs/manual/manual.Rmd +826 -53
  21. data/blogs/manual/manual.html +2338 -695
  22. data/blogs/manual/manual.md +2032 -539
  23. data/blogs/manual/manual.pdf +0 -0
  24. data/blogs/manual/manual.tex +1804 -594
  25. data/blogs/manual/manual_files/figure-html/bubble-1.png +0 -0
  26. data/blogs/manual/manual_files/figure-html/diverging_bar.png +0 -0
  27. data/blogs/manual/manual_files/figure-latex/bubble-1.png +0 -0
  28. data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
  29. data/blogs/manual/model.rb +41 -0
  30. data/blogs/nse_dplyr/nse_dplyr.Rmd +226 -73
  31. data/blogs/nse_dplyr/nse_dplyr.html +254 -336
  32. data/blogs/nse_dplyr/nse_dplyr.md +353 -158
  33. data/blogs/oh_my/oh_my.html +274 -386
  34. data/blogs/oh_my/oh_my.md +208 -205
  35. data/blogs/ruby_plot/ruby_plot.html +20 -205
  36. data/blogs/ruby_plot/ruby_plot.md +14 -15
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  42. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  44. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  46. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  48. data/examples/Bibliography/master.bib +50 -0
  49. data/examples/Bibliography/stats.bib +72 -0
  50. data/examples/islr/x_y_rnorm.jpg +0 -0
  51. data/examples/latex_templates/Test-acm_article/Makefile +16 -0
  52. data/examples/latex_templates/Test-acm_article/Test-acm_article.Rmd +65 -0
  53. data/examples/latex_templates/Test-acm_article/acm_proc_article-sp.cls +1670 -0
  54. data/examples/latex_templates/Test-acm_article/sensys-abstract.cls +703 -0
  55. data/examples/latex_templates/Test-acm_article/sigproc.bib +59 -0
  56. data/examples/latex_templates/Test-acs_article/Test-acs_article.Rmd +260 -0
  57. data/examples/latex_templates/Test-acs_article/Test-acs_article.pdf +0 -0
  58. data/examples/latex_templates/Test-acs_article/acs-Test-acs_article.bib +11 -0
  59. data/examples/latex_templates/Test-acs_article/acs-my_output.bib +11 -0
  60. data/examples/latex_templates/Test-acs_article/acstest.bib +17 -0
  61. data/examples/latex_templates/Test-aea_article/AEA.cls +1414 -0
  62. data/{blogs/gknit/marshal.dump → examples/latex_templates/Test-aea_article/BibFile.bib} +0 -0
  63. data/examples/latex_templates/Test-aea_article/Test-aea_article.Rmd +108 -0
  64. data/examples/latex_templates/Test-aea_article/Test-aea_article.pdf +0 -0
  65. data/examples/latex_templates/Test-aea_article/aea.bst +1269 -0
  66. data/examples/latex_templates/Test-aea_article/multicol.sty +853 -0
  67. data/examples/latex_templates/Test-aea_article/references.bib +0 -0
  68. data/examples/latex_templates/Test-aea_article/setspace.sty +546 -0
  69. data/examples/latex_templates/Test-amq_article/Test-amq_article.Rmd +256 -0
  70. data/examples/latex_templates/Test-amq_article/Test-amq_article.pdf +0 -0
  71. data/examples/latex_templates/Test-amq_article/Test-amq_article.pdfsync +3397 -0
  72. data/examples/latex_templates/Test-amq_article/pics/Figure2.pdf +0 -0
  73. data/examples/latex_templates/Test-ams_article/Test-ams_article.Rmd +215 -0
  74. data/examples/latex_templates/Test-ams_article/amstest.bib +436 -0
  75. data/examples/latex_templates/Test-asa_article/Test-asa_article.Rmd +153 -0
  76. data/examples/latex_templates/Test-asa_article/Test-asa_article.pdf +0 -0
  77. data/examples/latex_templates/Test-asa_article/agsm.bst +1353 -0
  78. data/examples/latex_templates/Test-asa_article/bibliography.bib +233 -0
  79. data/examples/latex_templates/Test-ieee_article/IEEEtran.bst +2409 -0
  80. data/examples/latex_templates/Test-ieee_article/IEEEtran.cls +6346 -0
  81. data/examples/latex_templates/Test-ieee_article/Test-ieee_article.Rmd +175 -0
  82. data/examples/latex_templates/Test-ieee_article/Test-ieee_article.pdf +0 -0
  83. data/examples/latex_templates/Test-ieee_article/mybibfile.bib +20 -0
  84. data/examples/latex_templates/Test-rjournal_article/RJournal.sty +335 -0
  85. data/examples/latex_templates/Test-rjournal_article/RJreferences.bib +18 -0
  86. data/examples/latex_templates/Test-rjournal_article/RJwrapper.pdf +0 -0
  87. data/examples/latex_templates/Test-rjournal_article/Test-rjournal_article.Rmd +52 -0
  88. data/examples/latex_templates/Test-springer_article/Test-springer_article.Rmd +65 -0
  89. data/examples/latex_templates/Test-springer_article/Test-springer_article.pdf +0 -0
  90. data/examples/latex_templates/Test-springer_article/bibliography.bib +26 -0
  91. data/examples/latex_templates/Test-springer_article/spbasic.bst +1658 -0
  92. data/examples/latex_templates/Test-springer_article/spmpsci.bst +1512 -0
  93. data/examples/latex_templates/Test-springer_article/spphys.bst +1443 -0
  94. data/examples/latex_templates/Test-springer_article/svglov3.clo +113 -0
  95. data/examples/latex_templates/Test-springer_article/svjour3.cls +1431 -0
  96. data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.Rmd +73 -0
  97. data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.pdf +0 -0
  98. data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.Rmd +382 -0
  99. data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.pdf +0 -0
  100. data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.Rmd +164 -0
  101. data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.pdf +0 -0
  102. data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.Rmd +92 -0
  103. data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.pdf +0 -0
  104. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/attend-grade-relationships.csv +482 -0
  105. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.Rmd +280 -0
  106. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.pdf +0 -0
  107. data/examples/rmarkdown/svm-xaringan-example/svm-xaringan-example.Rmd +386 -0
  108. data/lib/R_interface/r.rb +1 -1
  109. data/lib/R_interface/r_libs.R +1 -1
  110. data/lib/R_interface/r_methods.rb +10 -0
  111. data/lib/R_interface/rpkg.rb +1 -0
  112. data/lib/R_interface/rsupport.rb +4 -6
  113. data/lib/gknit.rb +2 -0
  114. data/lib/gknit/draft.rb +105 -0
  115. data/lib/gknit/knitr_engine.rb +0 -33
  116. data/lib/util/exec_ruby.rb +1 -27
  117. data/specs/figures/bg.jpeg +0 -0
  118. data/specs/figures/bg.png +0 -0
  119. data/specs/figures/dose_len.png +0 -0
  120. data/specs/figures/no_args.jpeg +0 -0
  121. data/specs/figures/no_args.png +0 -0
  122. data/specs/figures/width_height.jpeg +0 -0
  123. data/specs/figures/width_height.png +0 -0
  124. data/specs/figures/width_height_units1.jpeg +0 -0
  125. data/specs/figures/width_height_units1.png +0 -0
  126. data/specs/figures/width_height_units2.jpeg +0 -0
  127. data/specs/figures/width_height_units2.png +0 -0
  128. data/specs/r_dataframe.spec.rb +11 -11
  129. data/specs/ruby_expression.spec.rb +1 -0
  130. data/specs/tmp.rb +41 -20
  131. data/version.rb +1 -1
  132. metadata +73 -35
  133. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +0 -41
  134. data/blogs/galaaz_ggplot/galaaz_ggplot.out +0 -10
  135. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/midwest_rb.pdf +0 -0
  136. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/scatter_plot_rb.pdf +0 -0
  137. data/blogs/gknit/gknit.md +0 -1430
  138. data/blogs/gknit/gknit.tex +0 -1358
  139. data/blogs/manual/graph.rb +0 -29
  140. data/blogs/nse_dplyr/nse_dplyr.tex +0 -1373
  141. data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +0 -662
  142. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +0 -57
  143. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +0 -106
  144. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +0 -110
  145. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +0 -174
  146. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +0 -236
  147. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +0 -296
  148. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +0 -236
  149. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +0 -218
  150. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +0 -128
  151. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +0 -150
  152. data/blogs/ruby_plot/ruby_plot_files/figure-latex/dose_len.png +0 -0
  153. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_delivery.png +0 -0
  154. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_dose.png +0 -0
  155. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color.png +0 -0
  156. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color2.png +0 -0
  157. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_decorations.png +0 -0
  158. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_jitter.png +0 -0
  159. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_points.png +0 -0
  160. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_box_plot.png +0 -0
  161. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_violin_plot.png +0 -0
  162. data/blogs/ruby_plot/ruby_plot_files/figure-latex/violin_with_jitter.png +0 -0
  163. data/examples/paper/paper.rb +0 -36
@@ -0,0 +1,41 @@
1
+ require 'galaaz'
2
+
3
+ # Loads the R 'caret' package. If not present, installs it
4
+ R.install_and_loads 'caret'
5
+
6
+ class Model
7
+
8
+ attr_reader :data
9
+ attr_reader :test
10
+ attr_reader :train
11
+
12
+ #==========================================================
13
+ #
14
+ #==========================================================
15
+
16
+ def initialize(data, percent_train:, seed: 123)
17
+
18
+ R.set__seed(seed)
19
+ @data = data
20
+ @percent_train = percent_train
21
+ @seed = seed
22
+
23
+ end
24
+
25
+ #==========================================================
26
+ #
27
+ #==========================================================
28
+
29
+ def partition(field)
30
+
31
+ train_index =
32
+ R.createDataPartition(@data.send(field), p: @percent_train,
33
+ list: false, times: 1)
34
+ @train = @data[train_index, :all]
35
+ @test = @data[-train_index, :all]
36
+
37
+ end
38
+
39
+ end
40
+
41
+
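The `partition` method above hands the split to caret's `createDataPartition`. As a rough illustration of the idea only, the plain-Ruby sketch below does a seeded random split of row indices. `split_indices` is a hypothetical helper; it does not stratify by outcome the way caret does, and it uses neither Galaaz nor R.

```ruby
# Hypothetical helper (not part of Galaaz or caret): seeded random
# train/test split of row indices. No stratification is performed.
def split_indices(n_rows, percent_train:, seed: 123)
  rng = Random.new(seed)                       # fixed seed gives a reproducible split
  shuffled = (0...n_rows).to_a.shuffle(random: rng)
  n_train = (n_rows * percent_train).round     # number of rows kept for training
  train = shuffled.first(n_train).sort
  test = shuffled.drop(n_train).sort
  [train, test]
end

train, test = split_indices(10, percent_train: 0.8)
puts "train rows: #{train.inspect}"
puts "test rows:  #{test.inspect}"
```

Calling it twice with the same seed returns the same two index lists, which is the role `R.set__seed(seed)` plays in the class above.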
@@ -21,36 +21,67 @@ fontsize: 11pt
21
21
  ---
22
22
 
23
23
  ```{r setup, echo=FALSE, message = FALSE}
24
+ #R.options(crayon__enabled: false)
25
+ options(crayon.enabled = FALSE)
24
26
  library('dplyr')
25
27
  library('tibble')
26
28
  ```
27
29
 
28
30
  # Introduction
29
31
 
30
- In this post we will see how to program with _dplyr_ in Galaaz.
32
+ According to Steven Sagaert's answer on Quora to "Is programming language R overrated?":
31
33
 
32
- ## But first, what is Galaaz??
34
+ > R is a sophisticated language with an unusual (i.e. non-mainstream) set of features. It‘s
35
+ > an impure functional programming language with sophisticated metaprogramming and 3
36
+ > different OO systems.
37
+
38
+ > Just like common lisp you can completely customise how things work via metaprogramming.
39
+ > The biggest example is the tidyverse: by creating it’s own evaluation system (tidyeval)
40
+ > was able to create a custom syntax for dplyr.
41
+
42
+ > Mastering R (the language) and its ecosystem is not a matter of weeks or months but
43
+ > takes years. The rabbit hole goes pretty deep…
44
+
45
+ Although having a highly configurable language might give extreme power to the programmer,
46
+ it can also be, as stated above, a question of years to master it. Programming with _dplyr_
47
+ for instance, requires learning a set of complex concepts and rules that are not easily
48
+ accessible to casual users or _unsophisticated_ programmers, as many users of R are. Being
49
+ _unsophisticated_ is NOT used here in a negative sense, as R was built for statisticians and
50
+ not programmers, who need to solve real problems, often in a short time span, and are not
51
+ concerned about creating complex computer systems.
52
+
53
+ Unfortunately, if this _unsophisticated_ programmer decides to move on to more sophisticated
54
+ coding, the learning curve might become a serious impediment.
55
+
56
+ In this post we will see how to program with _dplyr_ in Galaaz and how Ruby can simplify
57
+ the learning curve of mastering _dplyr_ coding.
58
+
59
+ # But first, what is Galaaz??
33
60
 
34
61
  Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with
35
- a large community, a very large set of libraries and great for web development. However,
62
+ a large community, a very large set of libraries and great for web development. It is also
63
+ easy to learn. However,
36
64
  it lacks libraries for data science, statistics, scientific plotting and machine learning.
37
65
  On the other hand, R is considered one of the most powerful languages for solving all of the
38
66
  above problems. Maybe the strongest competitor to R is Python with libraries such as NumPy,
39
- Pandas, SciPy, SciKit-Learn and many more.
67
+ Pandas, SciPy, SciKit-Learn and many more. We will not enter here into the discussion of R
68
+ versus Python; both are excellent languages with powerful features, benefits and drawbacks.
69
+ Our interest is to bring to yet another excellent language, Ruby, the data science libraries
70
+ that it lacks.
40
71
 
41
72
  With Galaaz we do not intend to re-implement any of the scientific libraries in R. However, we
42
73
  allow for very tight coupling between the two languages to the point that the Ruby
43
74
  developer does not need to know that there is an R engine running. Also, from the point of
44
- view of the R user/developer Galaaz looks a lot like R, with just minor syntactic difference,
45
- so there is almost no learning courve for the R developer. And as we will see in this
46
- post, programming with _dplyr_ is easier in Galaaz than in R.
75
+ view of the R user/developer, Galaaz looks a lot like R, with just minor syntactic difference,
76
+ so there is almost no learning curve for the R developer. And as we will see in this
77
+ post, programming with _dplyr_ is easier in Galaaz than in R.
47
78
 
48
- R users are probably quite knowledgeable about _dplyr_, for the Ruby developer, _dplyr_ and
79
+ R users are probably quite knowledgeable about _dplyr_. For the Ruby developer, _dplyr_ and
49
80
  the _tidyverse_ libraries are a set of libraries for data manipulation in R, developed by
50
81
  Hadley Wickham, chief scientist at RStudio and a prolific R coder and writer.
51
82
 
52
- For the coupling of Ruby and R we use new technologies provided by Oracle: GraalVM,
53
- TruffleRuby and FastR:
83
+ For the coupling of Ruby and R, we use new technologies provided by Oracle: GraalVM,
84
+ TruffleRuby and FastR. The GraalVM home page gave the following definition:
54
85
 
55
86
  GraalVM is a universal virtual machine for running applications
56
87
  written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java,
@@ -79,22 +110,39 @@ TruffleRuby and FastR:
79
110
  With GraalVM we aim to allow developers to freely choose the right language
80
111
  for the task at hand without making compromises.
81
112
 
82
- Interested readers should also check out the following sites:
83
113
 
84
- * [GraalVM Home](https://www.graalvm.org/)
85
- * [TruffleRuby](https://github.com/oracle/truffleruby)
86
- * [FastR](https://github.com/oracle/fastr)
87
- * [Faster R with FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
88
- * [How to make Beautiful Ruby Plots with Galaaz](https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857)
89
- * [Ruby Plotting with Galaaz: An example of tightly coupling Ruby and R in GraalVM](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021)
90
- * [How to do reproducible research in Ruby with gKnit](https://towardsdatascience.com/how-to-do-reproducible-research-in-ruby-with-gknit-c26d2684d64e)
91
- * [R for Data Science](https://r4ds.had.co.nz/)
92
- * [Advanced R](https://adv-r.hadley.nz/)
114
+ # Tidyverse and dplyr
115
+
116
+ In [What is the tidyverse?](https://rviews.rstudio.com/2017/06/08/what-is-the-tidyverse/) the
117
+ tidyverse is explained as follows:
93
118
 
94
- ## Programming with dplyr
119
+ > The tidyverse is a coherent system of packages for data manipulation, exploration and
120
+ > visualization that share a common design philosophy. These were mostly developed by
121
+ > Hadley Wickham himself, but they are now being expanded by several contributors. Tidyverse
122
+ > packages are intended to make statisticians and data scientists more productive by
123
+ > guiding them through workflows that facilitate communication, and result in reproducible
124
+ > work products. Fundamentally, the tidyverse is about the connections between the tools
125
+ > that make the workflow possible.
95
126
 
96
- This post will follow closely the work done in https://dplyr.tidyverse.org/articles/programming.html,
97
- by Hardley Wickham. In it, Hardley states:
127
+ _dplyr_ is one of the many packages that are part of the tidyverse. It is:
128
+
129
+ > a grammar of data manipulation, providing a consistent set of verbs that help you solve
130
+ > the most common data manipulation challenges:
131
+
132
+ > 1. mutate() adds new variables that are functions of existing variables
133
+ > 2. select() picks variables based on their names.
134
+ > 3. filter() picks cases based on their values.
135
+ > 4. summarise() reduces multiple values down to a single summary.
136
+ > 5. arrange() changes the ordering of the rows.
137
+
138
+ Very often R is used interactively and users use _dplyr_ to manipulate a single dataset
139
+ without programming. When users want to replicate their work for
140
+ multiple datasets, programming becomes necessary.
141
+
142
+ # Programming with dplyr
143
+
144
+ In the vignette ["Programming with dplyr"](https://dplyr.tidyverse.org/articles/programming.html),
145
+ Hadley Wickham states:
98
146
 
99
147
  > Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that
100
148
  > means they don’t follow the usual R rules of evaluation. Instead, they capture the
@@ -109,6 +157,8 @@ by Hardley Wickham. In it, Hardley states:
109
157
  > database backends because dplyr itself doesn’t do any work, but instead generates the SQL
110
158
  > that tells the database what to do.
111
159
 
160
+ But then he goes on:
161
+
112
162
  > Unfortunately these benefits do not come for free. There are two main drawbacks:
113
163
 
114
164
  > Most dplyr arguments are not referentially transparent. That means you can’t replace a value
@@ -132,8 +182,27 @@ filter(df, my_var == 1)
132
182
  ```
133
183
  > This makes it hard to create functions with arguments that change how dplyr verbs are computed.
134
184
 
135
- In this post we will see that programming with _dplyr_ in Galaaz does not require knowledge of
136
- non-standard evaluation in R and can be accomplished by utilizing normal Ruby constructs.
185
+ As a result of this, programming with _dplyr_ requires learning a set of new ideas and concepts.
186
+ In this vignette Hadley goes on to show how to program ever more difficult problems with _dplyr_,
187
+ showing the problems it faces and the new concepts needed to solve them.
188
+
189
+ In this blog, we will look at all the problems presented by Hadley in the vignette and show how
190
+ those same problems can be solved using Galaaz and the Ruby language.
191
+
192
+ This blog is organized as follows: first we show how to write expressions using Galaaz.
193
+ Expressions are a fundamental concept in _dplyr_ and are not part of basic Ruby. We extend
194
+ the Ruby language to create and manipulate expressions that will be used by _dplyr_ functions.
195
+
196
+ Then we show very succinctly how Ruby and R can be integrated and how R functions are
197
+ transparently called from Ruby. The Galaaz [user manual](https://github.com/rbotafogo/galaaz/wiki)
198
+ (still in development) goes into much deeper detail about this integration.
199
+
200
+ Next, in section "Data manipulation with _dplyr_", we go through all the problems on the
201
+ _dplyr_ vignette and look at how they are solved in Galaaz. We then discuss why programming
202
+ with Galaaz and _dplyr_ is easier than programming with _dplyr_ in plain R.
203
+
204
+ The following section looks at another more advanced problem and shows that Galaaz can still
205
+ handle it without any difficulty. We then provide further reading and concluding remarks.
137
206
 
138
207
  # Writing Expressions in Galaaz
139
208
 
@@ -142,24 +211,42 @@ Galaaz extends Ruby to work with expressions, similar to R's expressions build w
142
211
  formulae. For instance, in mathematics, the expression $y = sin(x)$ describes a function but cannot
143
212
  be computed unless the value of $x$ is bound to some value.
144
213
 
145
- Let's take a look at some of those expressions in Ruby:
214
+ Expressions are fundamental in _dplyr_ programming as they are the input to _dplyr_ functions.
215
+ For instance, as we will see shortly, if a data frame has a column named 'x' and we want
216
+ to add another column, y, to this dataframe that has the values of 'x' times 2, then we would
217
+ call a _dplyr_ function with the expression 'y = x * 2'.
218
+
219
+ ## A note on notation
220
+
221
+ This blog was written in Rmarkdown and automatically converted to HTML or PDF (depending on
222
+ where you are reading this blog) with gKnit (a tool provided by Galaaz). In Rmarkdown, it is
223
+ possible to write text and code blocks that are executed to generate the final report. Code
224
+ blocks appear inside a 'box' and the result of their execution appears either in another type
225
+ of 'box' with a different background (HTML) or as normal text (PDF). Every output line from
226
+ the code execution is preceded by '##'.
146
227
 
147
228
  ## Expressions from operators
148
229
 
149
- The code bellow creates an expression summing two symbols. Note that :a and :b are Ruby symbols and
150
- are not bound to any value at the time of expression definition:
230
+ The code below creates an expression summing two symbols. Note that :a and :b are Ruby symbols and
231
+ are not bound to any values at the time of expression definition:
151
232
 
152
233
  ```{ruby expressions}
153
234
  exp1 = :a + :b
154
235
  puts exp1
155
236
  ```
156
- We can build any complex mathematical expression such as:
237
+ In Galaaz, we can build any complex mathematical expression such as:
157
238
 
158
239
  ```{ruby expr2}
159
240
  exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
160
241
  puts exp2
161
242
  ```
162
- The 'L' after two indicates that 2 is an integer.
243
+ Expressions are printed with the same format as the equivalent R expressions. The 'L' after
244
+ 2 indicates that 2 is an integer.
245
+
246
+ The R developer should note that in R, if she writes the
247
+ number '2', the R interpreter will convert it to float. In order to get an integer she
248
+ should write '2L'. Galaaz follows Ruby notation and '2' is an integer, while '2.0' is a
249
+ float.
163
250
 
164
251
  It is also possible to use inequality operators in building expressions:
165
252
 
@@ -171,7 +258,7 @@ Expressions' definition can also make use of normal Ruby variables without any p
171
258
 
172
259
  ```{ruby expr_with_var}
173
260
  x = 20
174
- y = 30
261
+ y = 30.0
175
262
  exp_var = (:a + :b) * x <= :z - y
176
263
  puts exp_var
177
264
  ```
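One way to picture how an expression such as `(:a + :b) * x <= :z - y` can stay unevaluated is operator overloading: each operator returns a new expression object instead of a value, and ordinary Ruby variables are simply embedded by value. The toy `Expr` class and `sym` helper below are hypothetical and written only for illustration; they are not Galaaz's implementation, which builds real R expressions from Ruby symbols.

```ruby
# Toy expression builder (illustration only, not Galaaz code).
# Every operator returns a new Expr that records text and precedence.
class Expr
  # Operator precedence, low to high; atoms use ATOM.
  PRECEDENCE = { "<=" => 1, "+" => 2, "-" => 2, "*" => 3, "/" => 3 }.freeze
  ATOM = 4

  attr_reader :text, :prec

  def initialize(text, prec = ATOM)
    @text = text.to_s
    @prec = prec
  end

  # Wrap plain Ruby values (numbers, variables' values) as atoms.
  def self.lift(value)
    value.is_a?(Expr) ? value : Expr.new(value)
  end

  PRECEDENCE.each do |op, prec|
    define_method(op) do |other|
      left = paren(self, prec)
      right = paren(Expr.lift(other), prec + 1)
      Expr.new("#{left} #{op} #{right}", prec)
    end
  end

  def to_s
    text
  end

  private

  # Add parentheses when the operand binds less tightly than required.
  def paren(expr, min_prec)
    expr.prec < min_prec ? "(#{expr.text})" : expr.text
  end
end

# Hypothetical stand-in for a Ruby symbol used as an unbound name.
def sym(name)
  Expr.new(name)
end

x = 20
y = 30.0
exp_var = (sym(:a) + sym(:b)) * x <= sym(:z) - y
puts exp_var  # prints: (a + b) * 20 <= z - 30.0
```

The names `a`, `b` and `z` remain unbound text, while `x` and `y` are replaced by their current values when the expression is built.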
@@ -185,10 +272,12 @@ exp4 = (:a + :b).ge :z
185
272
  puts exp4
186
273
  ```
187
274
 
188
- Two type of expression, however, can only be created with the functional representation
189
- of the operators, those are expressions involving '==', and '='. In order to write an
190
- expression involving '==' we
191
- need to use the method '.eq' and for '=' we need the function '.assign'
275
+ Two types of expressions, however, can only be created with the functional representation
276
+ of the operators. Those are expressions involving '==', and '='. This is the case since
277
+ those symbols have special meaning in Ruby and should not be redefined.
278
+
279
+ In order to write an expression involving '==' we
280
+ need to use the method '.eq' and for '=' we need the function '.assign':
192
281
 
193
282
  ```{ruby expr5}
194
283
  exp5 = (:a + :b).eq :z
@@ -199,24 +288,24 @@ puts exp5
199
288
  exp6 = :y.assign :a + :b
200
289
  puts exp6
201
290
  ```
202
- In general we think that using the functional notation is preferable to using the
203
- symbolic notation as otherwise, we end up writing invalid expressions such as
291
+ Users should be careful when writing expressions not to inadvertently use '==' or '=' as
292
+ this will generate an error that might be a bit cryptic (in future releases of Galaaz, we
293
+ plan to improve the error message).
204
294
 
205
295
  ```{ruby exp_wrong, warning=FALSE}
206
296
  exp_wrong = (:a + :b) == :z
207
297
  puts exp_wrong
208
298
  ```
209
- and it might be difficult to understand what is going on here. The problem lies with the fact that
210
- when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When the
211
- comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols at
212
- this time are not bound to anything and we get a "object 'a' not found" message.
213
- If we only use functional notation, this type of error will not occur.
299
+ The problem lies with the fact that
300
+ when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When this
301
+ comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols, at
302
+ this time, are not bound to anything, giving the "object 'a' not found" message.
214
303
 
215
304
  ## Expressions with R methods
216
305
 
217
306
  It is often necessary to create an expression that uses a method or function. For instance, in
218
307
  mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
219
- 'sin' function is part of the expression and should not immediately be executed. When we want
308
+ 'sin' function is part of the expression and should not be immediately executed. When we want
220
309
  the function to be part of the expression, we call the function preceding it
221
310
  by the letter E, such as 'E.sin(x)'
222
311
 
@@ -224,32 +313,37 @@ by the letter E, such as 'E.sin(x)'
224
313
  exp7 = :y.assign E.sin(:x)
225
314
  puts exp7
226
315
  ```
227
-
228
- Expressions can also be written using '.' notation:
316
+ Function expressions can also be written using '.' notation:
229
317
 
230
318
  ```{ruby expression_with_dot}
231
319
  exp8 = :y.assign :x.sin
232
320
  puts exp8
233
321
  ```
234
-
235
- When a function has multiple arguments, the first one can be used before the '.':
322
+ When a function has multiple arguments, the first one can be used before the '.'. For instance,
323
+ the R concatenate function 'c', which concatenates two or more arguments, can be part of
324
+ an expression as:
236
325
 
237
326
  ```{ruby expression_multiple_args}
238
327
  exp9 = :x.c(:y)
239
328
  puts exp9
240
329
  ```
330
+ Note that this gives an OO feeling to the code, as if we were saying 'x' concatenates 'y'. As a
331
+ side note, '.' notation can be used as the R pipe operator '%>%', but is more general than the
332
+ pipe.
241
333
 
242
334
  ## Evaluating an Expression
243
335
 
244
- Expressions can be evaluated by calling function 'eval' with a binding. A binding can be provided
245
- with a list:
336
+ Although we are mainly focusing on expressions to pass them to _dplyr_ functions, expressions
337
+ can be evaluated by calling function 'eval' with a binding.
338
+
339
+ A binding can be provided with a list or a data frame as shown below:
246
340
 
247
341
  ```{ruby eval_expression_list}
248
342
  exp = (:a + :b) * 2.0 + :c ** 2 / :z
249
343
  puts exp.eval(R.list(a: 10, b: 20, c: 30, z: 40))
250
344
  ```
251
345
 
252
- ... with a data frame:
346
+ with a data frame:
253
347
 
254
348
  ```{ruby eval_expression_df}
255
349
  df = R.data__frame(
@@ -269,8 +363,8 @@ this post, we do not have enough space to write a complete manual on Galaaz
269
363
  (a short manual can be found at: https://www.rubydoc.info/gems/galaaz/0.4.9), so we will
270
364
  present only a few example scripts using Galaaz.
271
365
 
272
- Basically, to call an R function from Ruby with Galaaz, one only needs to preceed the function
273
- with 'R.'. For instance, to create a vector in R, the 'c' function is used. From Galaaz, a
366
+ Basically, to call an R function from Ruby with Galaaz, one only needs to precede the function
367
+ with 'R.'. For instance, to create a vector in R, the 'c' function is used. In Galaaz, a
274
368
  vector can be created by using 'R.c':
275
369
 
276
370
  ```{ruby vector}
@@ -298,19 +392,65 @@ exp10 = :y.assign R.sin(45)
298
392
  puts exp10
299
393
  ```
300
394
 
301
- # Filtering using expressions
395
+ # Data manipulation with _dplyr_
396
+
397
+ In this section we will give a brief tour of _dplyr_'s usage in Galaaz and how to manipulate
398
+ data in Ruby with it. This section will follow [_dplyr_'s vignette](https://dplyr.tidyverse.org/articles/dplyr.html) that explores the nycflights13 data set. This dataset contains all 336776
399
+ flights that departed from New York City in 2013. The data comes from the US Bureau of
400
+ Transportation Statistics.
401
+
402
+ Let's start by taking a look at this dataset:
403
+
404
+ ```{ruby nycflights13}
405
+ R.library('nycflights13')
406
+ # check its dimensions
407
+ puts ~:flights.dim
408
+ # and the structure
409
+ ~:flights.str
410
+ ```
411
+
412
+ Now, let's use a first verb of _dplyr_: 'filter'. This verb, obviously, will filter the data
413
+ by the given expression. In the next block, we filter by columns 'month' and 'day'. The
414
+ first argument to the filter function is symbol ':flights'. A Ruby symbol, when given to
415
+ an R function will convert to the R variable of the same name, in this case 'flights', that
416
+ holds the nycflights13 data frame.
417
+
418
+ The second and third arguments are expressions that will be used by the filter function to
419
+ filter by columns, looking for entries in which the month and day are equal to 1.
420
+
421
+ ```{ruby filter}
422
+ puts R.filter(:flights, (:month.eq 1), (:day.eq 1))
423
+ ```
424
+
425
+
426
+ ## Programming with _dplyr_: problems and how to solve them in Galaaz
302
427
 
303
- Now that we know how to write expression and call R functions let's do some data manipulation in
304
- Galaaz. Let's first start by creating the same data frame that we created previously in section
305
- "Programming with dplyr":
428
+ In this section we look at the list of problems that Hadley describes in the "Programming with dplyr"
429
+ vignette and show how those problems are solved and coded with Galaaz. Readers interested in
430
+ how those problems are treated in _dplyr_ should read the vignette and use it as a comparison with
431
+ this blog.
432
+
433
+ ## Filtering using expressions
434
+
435
+ Now that we know how to write expressions and call R functions, let's do some data manipulation in
436
+ Galaaz. Let's first start by creating a data frame. In R, the 'data.frame' function creates a
437
+ data frame. In Ruby, writing 'data.frame' will not parse as a single object. To call R
438
+ functions that have a '.' in them, we need to substitute the '.' with '__'. So, the function
439
+ 'data.frame' in R is called in Galaaz as 'R.data\_\_frame':
306
440
 
307
441
  ```{ruby df}
308
442
  df = R.data__frame(x: (1..3), y: (3..1))
309
443
  puts df
310
444
  ```
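
The '.' to '__' substitution described above can be pictured with a few lines of plain Ruby. This is only a sketch: `FakeR` and `r_name` are hypothetical names invented for illustration, nothing here calls R, and it is not how Galaaz is implemented internally. It only shows which R function name a Ruby method name would correspond to.

```ruby
# Hypothetical stand-in for the 'R' object; it only reports which R function
# name a Ruby method name would map to. This is not Galaaz's implementation.
module FakeR
  # Translate a Ruby method name into an R function name: '__' becomes '.'.
  def self.r_name(ruby_name)
    ruby_name.to_s.gsub('__', '.')
  end

  # Any unknown method is rendered as the text of the corresponding R call.
  def self.method_missing(name, *args)
    "#{r_name(name)}(#{args.map(&:inspect).join(', ')})"
  end

  def self.respond_to_missing?(_name, _include_private = false)
    true
  end
end

puts FakeR.data__frame(1, 2)       # data.frame(1, 2)
puts FakeR.as__data__frame('df')   # as.data.frame("df")
```

A double underscore is a convenient choice because it is legal in a Ruby method name, while a '.' would be parsed as a method call.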
311
- The 'filter' function can be called on this data frame either by using 'R.filter(df, ...)' or
312
- by using dot notation. We prefer to use dot notation as shown bellow. The argument to 'filter'
313
- in Galaaz should be an expression. Note that if we gave to filter a Ruby expression such as
445
+
446
+ _dplyr_ provides the 'filter' function, which filters data in a data frame. The 'filter'
447
+ function can be called on this data frame either by using 'R.filter(df, ...)' or
448
+ by using dot notation.
449
+
450
+ -------FIX---------
451
+
452
+ We prefer to use dot notation as shown below. The argument to 'filter' should be an
453
+ expression. Note that if we gave filter a Ruby expression such as
314
454
  'x == 1', we would get an error, since there is no variable 'x' defined and if 'x' was a variable
315
455
  then 'x == 1' would either be 'true' or 'false'. Our goal is to filter our data frame returning
316
456
  all rows in which the 'x' value is equal to 1. To express this we want: ':x.eq 1', where :x will
@@ -319,7 +459,6 @@ be interpreted by filter as the 'x' column.
319
459
  ```{ruby filter_exp}
320
460
  puts df.filter(:x.eq 1)
321
461
  ```
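
The difference between a Ruby comparison that is evaluated immediately and an expression that is kept as data can be sketched in plain Ruby. The classes `Col` and `Comparison` and the helper `filter_rows` are hypothetical names used only for this illustration; Galaaz lets you write ':x.eq 1' directly on symbols and hands the expression to R, which this sketch does not attempt to reproduce. An array of hashes stands in for the data frame.

```ruby
# Hypothetical classes for illustration only; Galaaz's internals differ.
Comparison = Struct.new(:column, :value) do
  # An operand that is a Symbol names a column; anything else is a plain value.
  def matches?(row)
    expected = value.is_a?(Symbol) ? row[value] : value
    row[column] == expected
  end
end

Col = Struct.new(:name) do
  # Build a Comparison object instead of evaluating to true or false.
  def eq(value)
    Comparison.new(name, value)
  end
end

# Keep only the rows for which the stored comparison holds.
def filter_rows(rows, comparison)
  rows.select { |row| comparison.matches?(row) }
end

rows = [{ x: 1, y: 3 }, { x: 2, y: 2 }, { x: 3, y: 1 }]

x = 1
puts x == 1                               # true: Ruby evaluates this immediately
p filter_rows(rows, Col.new(:x).eq(1))    # rows where column x equals 1
p filter_rows(rows, Col.new(:x).eq(:y))   # rows where column x equals column y
```

Because the comparison is an object, it can be stored in a variable, passed to a method, and evaluated later against each row, which is what a filter needs.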
322
-
323
462
  In R, and when coding with 'tidyverse', arguments to a function are usually not
324
463
  *referentially transparent*. That is, you can’t replace a value with a seemingly equivalent
325
464
  object that you’ve defined elsewhere. In other words, this code
@@ -370,7 +509,7 @@ puts df.filter(:x.eq :y)
370
509
  # in this case, the number 1
371
510
  puts df.filter(:x.eq y)
372
511
  ```
373
- # Writing a function that applies to different data sets
512
+ ## Writing a function that applies to different data sets
374
513
 
375
514
  Let's suppose that we want to write a function that receives as the first argument a data frame
376
515
  and as second argument an expression that adds a column to the data frame that is equal to the
@@ -417,7 +556,7 @@ definition of 'mutate\_y' above:
417
556
  a = 10
418
557
  mutate_y(df1)
419
558
  ```
420
- # Different expressions
559
+ ## Different expressions
421
560
 
422
561
  Let's move to the next problem as presented by Hadley where trying to write a function in R
423
562
  that will receive two arguments, the first a variable and the second an expression is not trivial.
@@ -434,13 +573,13 @@ df <- data.frame(
434
573
  b = sample(5)
435
574
  )
436
575
 
437
- as.data.frame(df)
576
+ as.data.frame(df)
438
577
 
439
578
  d2 <- df %>%
440
579
  group_by(g1) %>%
441
580
  summarise(a = mean(a))
442
581
 
443
- as.data.frame(d2)
582
+ as.data.frame(d2)
444
583
 
445
584
  d2 <- df %>%
446
585
  group_by(g2) %>%
@@ -480,18 +619,18 @@ def my_summarize(df, group_var)
480
619
  summarize(a: :a.mean)
481
620
  end
482
621
 
483
- puts my_summarize(:df, :g1).as__data__frame
622
+ puts my_summarize(:df, :g1)
484
623
  ```
485
624
  It works!!! Well, let's make sure this was not just some coincidence:
486
625
 
487
626
  ```{ruby group_g2}
488
- puts my_summarize(:df, :g2).as__data__frame
627
+ puts my_summarize(:df, :g2)
489
628
  ```
490
629
 
491
630
  Great, everything is fine! No magic, no new functions, no complexities, just normal, standard Ruby
492
631
  code. If you've ever done NSE in R, this certainly feels much safer and easier to implement.
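
To see why passing ':g1' or ':g2' needs no special machinery on the Ruby side, here is a plain-Ruby analogue of grouping and averaging. It is only a sketch: the data in `ROWS` is made up, `summarize_rows` is a hypothetical helper, and an array of hashes stands in for the data frame, so no R or _dplyr_ code is involved.

```ruby
# Hypothetical data: an array of hashes stands in for the data frame.
ROWS = [
  { g1: 1, g2: 1, a: 4.0 },
  { g1: 1, g2: 2, a: 2.0 },
  { g1: 2, g2: 1, a: 6.0 },
  { g1: 2, g2: 2, a: 8.0 }
].freeze

# group_var is an ordinary Symbol argument; no quoting or unquoting is needed.
def summarize_rows(rows, group_var)
  rows.group_by { |row| row[group_var] }.map do |key, group|
    { group_var => key, a: group.sum { |row| row[:a] } / group.size }
  end
end

p summarize_rows(ROWS, :g1)   # mean of 'a' for each value of g1
p summarize_rows(ROWS, :g2)   # mean of 'a' for each value of g2
```

The grouping column is just a value that travels through the method call, so swapping ':g1' for ':g2' changes the result without touching the method body.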
493
632
 
494
- # Different input variables
633
+ ## Different input variables
495
634
 
496
635
  In the previous section we've managed to get rid of all NSE formulation for a simple example, but
497
636
  does this remain true for more complex examples, or will the Galaaz way prove impractical for
@@ -533,7 +672,7 @@ puts my_summarise2((~:df), :a * :b)
533
672
  Once again, there is no need to use any special theory or functions. The only point to be
534
673
  careful about is the use of 'E' to build expressions from functions 'mean', 'sum' and 'n'.
535
674
 
536
- # Different input and output variable
675
+ ## Different input and output variable
537
676
 
538
677
  Now the next challenge presented by Hadley is to vary the name of the output variables based on
539
678
  the received expression. So, if the input expression is 'a', we want our data frame columns to
@@ -586,7 +725,7 @@ and variable mean\_name is not followed by ':' but by '=>'. This is standard Ru
586
725
 
587
726
  [explain....]
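
The difference between ':' and '=>' in a Ruby hash literal can be checked in plain Ruby, independently of Galaaz. In this small sketch the value of `expr_name` is a made-up input; with ':' the key is the literal symbol, while with '=>' the key is whatever the variable holds.

```ruby
expr_name = 'a'                       # hypothetical input name
mean_name = "mean_#{expr_name}"       # computed column name

literal_key  = { mean_name: 1.5 }     # key is the Symbol :mean_name
computed_key = { mean_name => 1.5 }   # key is the String "mean_a"

p literal_key.keys    # [:mean_name]
p computed_key.keys   # ["mean_a"]
```

This is why a column name that is computed at run time has to be written with '=>'.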
588
727
 
589
- # Capturing multiple variables
728
+ ## Capturing multiple variables
590
729
 
591
730
  Moving on with new complexities, Hadley asks us to solve the problem in which the
592
731
  summarise function will receive any number of grouping variables.
@@ -600,7 +739,7 @@ def my_summarise3(df, *group_vars)
600
739
  summarise(a: E.mean(:a))
601
740
  end
602
741
 
603
- puts my_summarise3((~:df), :g1, :g2).as__data__frame
742
+ puts my_summarise3((~:df), :g1, :g2)
604
743
  ```
605
744
 
606
745
  # Why does R require NSE and Galaaz does not?
@@ -633,7 +772,7 @@ call the function passing the expression ':a.eq :b'.
633
772
 
634
773
  # Advanced dplyr features
635
774
 
636
- In the blog: Programming with dplyr by using dplyr (https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/) Iñaki Úcar shows surprise that some R users are trying to code in dplyr avoiding
775
+ In the blog: [Programming with dplyr by using dplyr](https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/) Iñaki Úcar shows surprise that some R users are trying to code in dplyr avoiding
637
776
  the use of NSE. For instance he says:
638
777
 
639
778
  > Take the example of seplyr. It stands for standard evaluation dplyr, and enables us to
@@ -651,7 +790,7 @@ In the following examples, we show the use of functions 'group\_by\_at', 'summar
651
790
  features of characters in the Starwars movies:
652
791
 
653
792
  ```{ruby starwars}
654
- puts (~:starwars).head.as__data__frame
793
+ puts (~:starwars).head
655
794
  ```
656
795
  The grouped_mean function below will receive a grouping variable and calculate summaries for
657
796
  the value\_variables given:
@@ -682,9 +821,23 @@ def grouped_mean(data, grouping_variables, value_variables)
682
821
  rename_at(value_variables, E.funs(E.paste0("mean_", value_variables)))
683
822
  end
684
823
 
685
- puts grouped_mean((~:starwars), "eye_color", E.c("mass", "birth_year")).as__data__frame
824
+ puts grouped_mean((~:starwars), "eye_color", E.c("mass", "birth_year"))
686
825
  ```
687
826
 
827
+ # Further reading
828
+
829
+ For more information on GraalVM, TruffleRuby, fastR, R and Galaaz check out the following sites/posts:
830
+
831
+ * [GraalVM Home](https://www.graalvm.org/)
832
+ * [TruffleRuby](https://github.com/oracle/truffleruby)
833
+ * [FastR](https://github.com/oracle/fastr)
834
+ * [Faster R with FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
835
+ * [How to make Beautiful Ruby Plots with Galaaz](https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857)
836
+ * [Ruby Plotting with Galaaz: An example of tightly coupling Ruby and R in GraalVM](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021)
837
+ * [How to do reproducible research in Ruby with gKnit](https://towardsdatascience.com/how-to-do-reproducible-research-in-ruby-with-gknit-c26d2684d64e)
838
+ * [R for Data Science](https://r4ds.had.co.nz/)
839
+ * [Advanced R](https://adv-r.hadley.nz/)
840
+
688
841
  # Conclusion
689
842
 
690
843
  Ruby and Galaaz provide a nice framework for developing code that uses R functions. Although R is