galaaz 0.4.10 → 0.5.0
- checksums.yaml +4 -4
- data/README.md +2048 -531
- data/Rakefile +3 -2
- data/bin/gknit +152 -6
- data/bin/gknit-draft +105 -0
- data/bin/gknit-draft.rb +28 -0
- data/bin/gknit_Rscript +127 -0
- data/bin/grun +27 -1
- data/bin/gstudio +47 -4
- data/bin/{gstudio.rb → gstudio_irb.rb} +0 -0
- data/bin/gstudio_pry.rb +7 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot.html +10 -195
- data/blogs/galaaz_ggplot/galaaz_ggplot.md +404 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
- data/blogs/gknit/gknit.Rmd +5 -3
- data/blogs/gknit/gknit.pdf +0 -0
- data/blogs/gknit/lst.rds +0 -0
- data/blogs/manual/lst.rds +0 -0
- data/blogs/manual/manual.Rmd +826 -53
- data/blogs/manual/manual.html +2338 -695
- data/blogs/manual/manual.md +2032 -539
- data/blogs/manual/manual.pdf +0 -0
- data/blogs/manual/manual.tex +1804 -594
- data/blogs/manual/manual_files/figure-html/bubble-1.png +0 -0
- data/blogs/manual/manual_files/figure-html/diverging_bar.png +0 -0
- data/blogs/manual/manual_files/figure-latex/bubble-1.png +0 -0
- data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
- data/blogs/manual/model.rb +41 -0
- data/blogs/nse_dplyr/nse_dplyr.Rmd +226 -73
- data/blogs/nse_dplyr/nse_dplyr.html +254 -336
- data/blogs/nse_dplyr/nse_dplyr.md +353 -158
- data/blogs/oh_my/oh_my.html +274 -386
- data/blogs/oh_my/oh_my.md +208 -205
- data/blogs/ruby_plot/ruby_plot.html +20 -205
- data/blogs/ruby_plot/ruby_plot.md +14 -15
- data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
- data/examples/Bibliography/master.bib +50 -0
- data/examples/Bibliography/stats.bib +72 -0
- data/examples/islr/x_y_rnorm.jpg +0 -0
- data/examples/latex_templates/Test-acm_article/Makefile +16 -0
- data/examples/latex_templates/Test-acm_article/Test-acm_article.Rmd +65 -0
- data/examples/latex_templates/Test-acm_article/acm_proc_article-sp.cls +1670 -0
- data/examples/latex_templates/Test-acm_article/sensys-abstract.cls +703 -0
- data/examples/latex_templates/Test-acm_article/sigproc.bib +59 -0
- data/examples/latex_templates/Test-acs_article/Test-acs_article.Rmd +260 -0
- data/examples/latex_templates/Test-acs_article/Test-acs_article.pdf +0 -0
- data/examples/latex_templates/Test-acs_article/acs-Test-acs_article.bib +11 -0
- data/examples/latex_templates/Test-acs_article/acs-my_output.bib +11 -0
- data/examples/latex_templates/Test-acs_article/acstest.bib +17 -0
- data/examples/latex_templates/Test-aea_article/AEA.cls +1414 -0
- data/{blogs/gknit/marshal.dump → examples/latex_templates/Test-aea_article/BibFile.bib} +0 -0
- data/examples/latex_templates/Test-aea_article/Test-aea_article.Rmd +108 -0
- data/examples/latex_templates/Test-aea_article/Test-aea_article.pdf +0 -0
- data/examples/latex_templates/Test-aea_article/aea.bst +1269 -0
- data/examples/latex_templates/Test-aea_article/multicol.sty +853 -0
- data/examples/latex_templates/Test-aea_article/references.bib +0 -0
- data/examples/latex_templates/Test-aea_article/setspace.sty +546 -0
- data/examples/latex_templates/Test-amq_article/Test-amq_article.Rmd +256 -0
- data/examples/latex_templates/Test-amq_article/Test-amq_article.pdf +0 -0
- data/examples/latex_templates/Test-amq_article/Test-amq_article.pdfsync +3397 -0
- data/examples/latex_templates/Test-amq_article/pics/Figure2.pdf +0 -0
- data/examples/latex_templates/Test-ams_article/Test-ams_article.Rmd +215 -0
- data/examples/latex_templates/Test-ams_article/amstest.bib +436 -0
- data/examples/latex_templates/Test-asa_article/Test-asa_article.Rmd +153 -0
- data/examples/latex_templates/Test-asa_article/Test-asa_article.pdf +0 -0
- data/examples/latex_templates/Test-asa_article/agsm.bst +1353 -0
- data/examples/latex_templates/Test-asa_article/bibliography.bib +233 -0
- data/examples/latex_templates/Test-ieee_article/IEEEtran.bst +2409 -0
- data/examples/latex_templates/Test-ieee_article/IEEEtran.cls +6346 -0
- data/examples/latex_templates/Test-ieee_article/Test-ieee_article.Rmd +175 -0
- data/examples/latex_templates/Test-ieee_article/Test-ieee_article.pdf +0 -0
- data/examples/latex_templates/Test-ieee_article/mybibfile.bib +20 -0
- data/examples/latex_templates/Test-rjournal_article/RJournal.sty +335 -0
- data/examples/latex_templates/Test-rjournal_article/RJreferences.bib +18 -0
- data/examples/latex_templates/Test-rjournal_article/RJwrapper.pdf +0 -0
- data/examples/latex_templates/Test-rjournal_article/Test-rjournal_article.Rmd +52 -0
- data/examples/latex_templates/Test-springer_article/Test-springer_article.Rmd +65 -0
- data/examples/latex_templates/Test-springer_article/Test-springer_article.pdf +0 -0
- data/examples/latex_templates/Test-springer_article/bibliography.bib +26 -0
- data/examples/latex_templates/Test-springer_article/spbasic.bst +1658 -0
- data/examples/latex_templates/Test-springer_article/spmpsci.bst +1512 -0
- data/examples/latex_templates/Test-springer_article/spphys.bst +1443 -0
- data/examples/latex_templates/Test-springer_article/svglov3.clo +113 -0
- data/examples/latex_templates/Test-springer_article/svjour3.cls +1431 -0
- data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.Rmd +73 -0
- data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.pdf +0 -0
- data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.Rmd +382 -0
- data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.pdf +0 -0
- data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.Rmd +164 -0
- data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.pdf +0 -0
- data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.Rmd +92 -0
- data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.pdf +0 -0
- data/examples/rmarkdown/svm-rmarkdown-syllabus-example/attend-grade-relationships.csv +482 -0
- data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.Rmd +280 -0
- data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.pdf +0 -0
- data/examples/rmarkdown/svm-xaringan-example/svm-xaringan-example.Rmd +386 -0
- data/lib/R_interface/r.rb +1 -1
- data/lib/R_interface/r_libs.R +1 -1
- data/lib/R_interface/r_methods.rb +10 -0
- data/lib/R_interface/rpkg.rb +1 -0
- data/lib/R_interface/rsupport.rb +4 -6
- data/lib/gknit.rb +2 -0
- data/lib/gknit/draft.rb +105 -0
- data/lib/gknit/knitr_engine.rb +0 -33
- data/lib/util/exec_ruby.rb +1 -27
- data/specs/figures/bg.jpeg +0 -0
- data/specs/figures/bg.png +0 -0
- data/specs/figures/dose_len.png +0 -0
- data/specs/figures/no_args.jpeg +0 -0
- data/specs/figures/no_args.png +0 -0
- data/specs/figures/width_height.jpeg +0 -0
- data/specs/figures/width_height.png +0 -0
- data/specs/figures/width_height_units1.jpeg +0 -0
- data/specs/figures/width_height_units1.png +0 -0
- data/specs/figures/width_height_units2.jpeg +0 -0
- data/specs/figures/width_height_units2.png +0 -0
- data/specs/r_dataframe.spec.rb +11 -11
- data/specs/ruby_expression.spec.rb +1 -0
- data/specs/tmp.rb +41 -20
- data/version.rb +1 -1
- metadata +73 -35
- data/blogs/galaaz_ggplot/galaaz_ggplot.aux +0 -41
- data/blogs/galaaz_ggplot/galaaz_ggplot.out +0 -10
- data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/midwest_rb.pdf +0 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/scatter_plot_rb.pdf +0 -0
- data/blogs/gknit/gknit.md +0 -1430
- data/blogs/gknit/gknit.tex +0 -1358
- data/blogs/manual/graph.rb +0 -29
- data/blogs/nse_dplyr/nse_dplyr.tex +0 -1373
- data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +0 -662
- data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +0 -57
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +0 -106
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +0 -110
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +0 -174
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +0 -236
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +0 -296
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +0 -236
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +0 -218
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +0 -128
- data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +0 -150
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/dose_len.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_delivery.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_dose.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color2.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_decorations.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_jitter.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_points.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_box_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_violin_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-latex/violin_with_jitter.png +0 -0
- data/examples/paper/paper.rb +0 -36
@@ -24,30 +24,59 @@ fontsize: 11pt
# Introduction

According to Steven Sagaert's answer on Quora to the question "Is programming language R overrated?":

> R is a sophisticated language with an unusual (i.e. non-mainstream) set of features. It‘s
> an impure functional programming language with sophisticated metaprogramming and 3
> different OO systems.
>
> Just like common lisp you can completely customise how things work via metaprogramming.
> The biggest example is the tidyverse: by creating it’s own evaluation system (tidyeval)
> was able to create a custom syntax for dplyr.
>
> Mastering R (the language) and its ecosystem is not a matter of weeks or months but
> takes years. The rabbit hole goes pretty deep…

Although a highly configurable language can give the programmer extreme power, it can also,
as stated above, take years to master. Programming with _dplyr_, for instance, requires
learning a set of complex concepts and rules that are not easily accessible to casual users
or to _unsophisticated_ programmers, as many R users are. _Unsophisticated_ is NOT used here
in a negative sense: R was built for statisticians, not programmers. Statisticians need to
solve real problems, often in a short time span, and are not concerned with creating complex
computer systems.

Unfortunately, if this _unsophisticated_ programmer decides to move on to more sophisticated
coding, the learning curve might become a serious impediment.

In this post we will see how to program with _dplyr_ in Galaaz and how Ruby can ease the
learning curve of mastering _dplyr_ coding.

# But first, what is Galaaz?

Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language with a large
community and a very large set of libraries. It is great for web development and easy to
learn. However, it lacks libraries for data science, statistics, scientific plotting and
machine learning. On the other hand, R is considered one of the most powerful languages for
solving all of the above problems. Perhaps R's strongest competitor is Python, with libraries
such as NumPy, Pandas, SciPy, SciKit-Learn and many more. We will not enter the R versus
Python debate here: both are excellent languages with powerful features, benefits and
drawbacks. Our interest is to bring to yet another excellent language, Ruby, the data science
libraries that it lacks.

With Galaaz we do not intend to re-implement any of R's scientific libraries. Instead, we
allow for very tight coupling between the two languages, to the point that the Ruby
developer does not need to know that there is an R engine running. Also, from the point of
view of the R user/developer, Galaaz looks a lot like R, with just minor syntactic differences,
so there is almost no learning curve for the R developer. And, as we will see in this post,
programming with _dplyr_ is easier in Galaaz than in R.

R users are probably quite knowledgeable about _dplyr_. For the Ruby developer: _dplyr_ and
the other _tidyverse_ libraries are a set of libraries for data manipulation in R, developed by
Hadley Wickham, chief scientist at RStudio and a prolific R coder and writer.

For the coupling of Ruby and R, we use new technologies provided by Oracle: GraalVM,
TruffleRuby and FastR. The GraalVM home page gives the following definition:

GraalVM is a universal virtual machine for running applications
written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java,
@@ -76,22 +105,39 @@ TruffleRuby and FastR:
With GraalVM we aim to allow developers to freely choose the right language
for the task at hand without making compromises.

# Tidyverse and dplyr

In [What is the tidyverse?](https://rviews.rstudio.com/2017/06/08/what-is-the-tidyverse/), the
tidyverse is explained as follows:

> The tidyverse is a coherent system of packages for data manipulation, exploration and
> visualization that share a common design philosophy. These were mostly developed by
> Hadley Wickham himself, but they are now being expanded by several contributors. Tidyverse
> packages are intended to make statisticians and data scientists more productive by
> guiding them through workflows that facilitate communication, and result in reproducible
> work products. Fundamentally, the tidyverse is about the connections between the tools
> that make the workflow possible.

_dplyr_ is one of the many packages that are part of the tidyverse. It is:

> a grammar of data manipulation, providing a consistent set of verbs that help you solve
> the most common data manipulation challenges:
>
> 1. mutate() adds new variables that are functions of existing variables
> 2. select() picks variables based on their names.
> 3. filter() picks cases based on their values.
> 4. summarise() reduces multiple values down to a single summary.
> 5. arrange() changes the ordering of the rows.
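
For Ruby developers new to these verbs, a rough plain-Ruby analogue may help build intuition. The sketch below models a data frame as an Array of Hashes (one Hash per row) and uses only core Ruby methods. The data and column names are made up, and this is a toy for illustration, not how _dplyr_ or Galaaz implement the verbs.

```ruby
# Toy data frame: an Array of Hashes, one Hash per row (made-up values).
rows = [
  { name: 'a', x: 3 },
  { name: 'b', x: 1 },
  { name: 'c', x: 2 }
]

mutated    = rows.map { |r| r.merge({ y: r[:x] * 2 }) } # like mutate(y = x * 2)
selected   = mutated.map { |r| r.slice(:name, :y) }     # like select(name, y)
filtered   = selected.select { |r| r[:y] > 2 }          # like filter(y > 2)
arranged   = filtered.sort_by { |r| r[:y] }             # like arrange(y)
summarised = { total_y: arranged.sum { |r| r[:y] } }    # like summarise(total_y = sum(y))

p arranged.map { |r| r[:name] } # prints ["c", "a"]
p summarised[:total_y]          # prints 10
```

In _dplyr_ itself the verbs work on whole columns of a data frame, but the input and output of each step are the same as in this row-by-row analogue.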

Very often R is used interactively, and users apply _dplyr_ to manipulate a single dataset
without programming. When they want to replicate their work for multiple datasets,
programming becomes necessary.

# Programming with dplyr

In the vignette ["Programming with dplyr"](https://dplyr.tidyverse.org/articles/programming.html),
Hadley Wickham states:

> Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that
> means they don’t follow the usual R rules of evaluation. Instead, they capture the
@@ -106,6 +152,8 @@ by Hardley Wickham. In it, Hardley states:
> database backends because dplyr itself doesn’t do any work, but instead generates the SQL
> that tells the database what to do.

But then he goes on:

> Unfortunately these benefits do not come for free. There are two main drawbacks:

> Most dplyr arguments are not referentially transparent. That means you can’t replace a value
@@ -131,8 +179,27 @@ filter(df, my_var == 1)
```

> This makes it hard to create functions with arguments that change how dplyr verbs are computed.

As a result, programming with _dplyr_ requires learning a set of new ideas and concepts.
In this vignette, Hadley goes on to show how to program ever more difficult problems with
_dplyr_, presenting the difficulties that arise and the new concepts needed to solve them.

In this blog, we will look at all the problems Hadley presents in the vignette and show how
those same problems can be solved using Galaaz and the Ruby language.

This blog is organized as follows: first we show how to write expressions using Galaaz.
Expressions are a fundamental concept in _dplyr_ and are not part of basic Ruby. We extend
the Ruby language to create and manipulate the expressions that will be used by _dplyr_ functions.

Then we show, very succinctly, how Ruby and R can be integrated and how R functions are
transparently called from Ruby. The Galaaz [user manual](https://github.com/rbotafogo/galaaz/wiki)
(still in development) covers this integration in much more detail.

Next, in section "Data manipulation with _dplyr_", we go through all the problems in the
_dplyr_ vignette and look at how they are solved in Galaaz. We then discuss why programming
with Galaaz and _dplyr_ is easier than programming with _dplyr_ in plain R.

The following section looks at another, more advanced problem and shows that Galaaz can still
handle it without any difficulty. We then provide further reading and concluding remarks.

# Writing Expressions in Galaaz

@@ -141,12 +208,24 @@ Galaaz extends Ruby to work with expressions, similar to R's expressions build w
formulae. For instance, in mathematics, the expression $y = \sin(x)$ describes a function but cannot
be computed unless the value of $x$ is bound to some value.

Expressions are fundamental in _dplyr_ programming, as they are the input to _dplyr_ functions.
For instance, as we will see shortly, if a data frame has a column named 'x' and we want
to add another column, 'y', holding the values of 'x' times 2, then we would
call a _dplyr_ function with the expression 'y = x * 2'.
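
To make the idea of an expression concrete for Ruby readers, here is a toy sketch in plain Ruby (no Galaaz, no R). It stores 'y = x * 2' as a nested Array, so nothing is computed when the expression is written. The hypothetical helpers `toy_eval` and `toy_apply` evaluate it later against each row of a small data frame, modelled here as an Array of Hashes. Galaaz builds real R expressions instead, so treat this only as an analogy.

```ruby
# Toy only: the expression 'y = x * 2' stored as plain data (a nested Array).
# Nothing is computed yet, and 'x' does not need to exist at this point.
expr = [:assign, :y, [:*, :x, 2]]

# Evaluate a node for one row: symbols are looked up among the row's
# columns, arrays are operator calls, anything else is a literal.
def toy_eval(node, row)
  case node
  when Symbol then row.fetch(node)
  when Array
    op, lhs, rhs = node
    toy_eval(lhs, row).public_send(op, toy_eval(rhs, row))
  else
    node
  end
end

# Apply an assignment expression to every row, adding the new column.
def toy_apply(expr, rows)
  _assign, name, rhs = expr
  rows.map { |row| row.merge({ name => toy_eval(rhs, row) }) }
end

rows = [{ x: 1 }, { x: 2 }, { x: 3 }]
p toy_apply(expr, rows).map { |row| row[:y] } # prints [2, 4, 6]
```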

## A note on notation

This blog was written in Rmarkdown and automatically converted to HTML or PDF (depending on
where you are reading it) with gKnit (a tool provided by Galaaz). In Rmarkdown, it is
possible to write text and code blocks that are executed to generate the final report. Code
blocks appear inside a 'box', and the result of their execution appears either in another type
of 'box' with a different background (HTML) or as normal text (PDF). Every output line from
the code execution is preceded by '##'.

## Expressions from operators

The code below creates an expression summing two symbols. Note that :a and :b are Ruby symbols and
are not bound to any values at the time of expression definition:

```ruby
@@ -157,7 +236,7 @@ puts exp1
```
## a + b
```

In Galaaz, we can build arbitrarily complex mathematical expressions, such as:

```ruby
@@ -168,7 +247,13 @@ puts exp2
```
## (a + b) * 2 + c^2L/z
```

Expressions are printed in the same format as the equivalent R expressions. The 'L' after
2 indicates that 2 is an integer.

R developers should note that in R the literal '2' is a double (a floating-point number);
to get an integer, one must write '2L'. Galaaz follows Ruby notation: '2' is an integer,
while '2.0' is a float.
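
The integer/float distinction is plain Ruby behaviour and can be checked in any Ruby interpreter, without Galaaz. The snippet below also shows why the difference matters in ordinary Ruby arithmetic outside of expressions: Ruby's '/' truncates when both operands are integers, whereas R's '/' always returns a double.

```ruby
# Literal types in plain Ruby (no Galaaz needed).
puts 2.class   # prints Integer
puts 2.0.class # prints Float

# Integer division truncates in Ruby...
puts 7 / 2     # prints 3
# ...so use a float operand or Integer#fdiv to get R-like division.
puts 7 / 2.0   # prints 3.5
puts 7.fdiv(2) # prints 3.5
```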

It is also possible to use inequality operators in building expressions:

@@ -186,13 +271,13 @@ Expressions' definition can also make use of normal Ruby variables without any p
```ruby
x = 20
y = 30.0
exp_var = (:a + :b) * x <= :z - y
puts exp_var
```

```
## (a + b) * 20L <= z - 30
```

Galaaz provides both symbolic representations for operators, such as (>, <, !=) as functional
@@ -209,10 +294,12 @@ puts exp4
## a + b >= z
```

Two types of expressions, however, can only be created with the functional representation
of the operators: those involving '==' and '='. This is because both symbols have special
meaning in Ruby: '=' is assignment syntax and cannot be redefined at all, while '==' is used
throughout Ruby to test equality and should not be redefined.

To write an expression involving '==', we need to use the method '.eq', and for '=' we
need the method '.assign':

```ruby
@@ -233,8 +320,9 @@ puts exp6
```
## y <- a + b
```

Users should be careful not to inadvertently use '==' or '=' when writing expressions, as
this will generate an error that might be a bit cryptic (in future releases of Galaaz, we
plan to improve the error message).

```ruby
@@ -248,17 +336,16 @@ puts exp_wrong
## object 'a' not found (RError)
## Translated to internal error
```

The problem is that, when using '==', we are comparing expression (:a + :b) to expression :z
with Ruby's own '=='. When this comparison is executed, the system tries to evaluate :a, :b
and :z, and those symbols are, at this time, not bound to anything, which gives the "object
'a' not found" message.
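
Why can't '==' simply be redefined to build an expression? The toy class below (hypothetical, unrelated to Galaaz's own classes) tries exactly that: its '==' returns a String describing the comparison instead of true or false. Ruby's core methods rely on '==' to answer yes/no questions, and any String is truthy, so `Array#include?` then reports a match that does not exist. A separately named method such as 'eq' avoids the clash. '=' cannot be redefined at all, since assignment is syntax rather than a method call.

```ruby
# Hypothetical toy class, NOT part of Galaaz.
class NaiveExpr
  def initialize(name)
    @name = name
  end

  # Redefining '==' to build an "expression" (here just a String)
  # instead of returning true or false.
  def ==(other)
    "#{@name} == #{other}"
  end

  # The safe alternative: a separately named method, in the spirit of '.eq'.
  def eq(other)
    "#{@name} == #{other}"
  end
end

a = NaiveExpr.new(:a)

# Array#include? compares elements with '==', so the truthy String
# makes it claim that the array contains :z.
puts([a].include?(:z)) # prints true
puts(a.eq(:z))         # prints a == z
```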

## Expressions with R methods

It is often necessary to create an expression that uses a method or function. For instance, in
mathematics, it's quite natural to write an expression such as $y = \sin(x)$. In this case, the
'sin' function is part of the expression and should not be executed immediately. When we want
the function to be part of the expression, we call the function preceded
by the letter E, as in 'E.sin(x)'.
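
The idea behind 'E.' can be pictured with Ruby's `method_missing`: a proxy object records a method call as data instead of running it. The sketch below is a hypothetical stand-in named `ToyE`, not Galaaz's E. For illustration only, it also assumes that the recorded call is later evaluated with Ruby's `Math` module and a Hash binding.

```ruby
# Hypothetical toy, NOT Galaaz's E: calling any method on ToyE records the
# call (function name and arguments) instead of executing it.
ToyCall = Struct.new(:fn, :args) do
  # Print the recorded call in R-like syntax, e.g. "sin(x)".
  def to_s
    "#{fn}(#{args.join(', ')})"
  end

  # Run the call later: symbols are looked up in the env Hash, and the
  # function is taken from Ruby's Math module (an assumption of this toy).
  def evaluate(env)
    values = args.map { |arg| arg.is_a?(Symbol) ? env.fetch(arg) : arg }
    Math.public_send(fn, *values)
  end
end

module ToyE
  def self.method_missing(name, *args)
    ToyCall.new(name, args)
  end

  def self.respond_to_missing?(_name, _include_private = false)
    true
  end
end

exp = ToyE.sin(:x)            # nothing is computed here
puts exp                      # prints sin(x)
puts exp.evaluate({ x: 0.0 }) # prints 0.0
```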
@@ -271,8 +358,7 @@ puts exp7
```
## y <- sin(x)
```

Function expressions can also be written using '.' notation:

```ruby
@@ -283,8 +369,9 @@ puts exp8
```
## y <- sin(x)
```

When a function has multiple arguments, the first one can be placed before the '.'. For instance,
the R concatenate function 'c', which concatenates two or more arguments, can be part of
an expression as:

```ruby
@@ -295,11 +382,16 @@ puts exp9
```
## c(x, y)
```

Note that this gives an OO feel to the code, as if we were saying 'x' concatenates 'y'. As a
side note, '.' notation can be used like the R pipe operator '%>%', but it is more general than
the pipe.
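
Plain Ruby has its own pipe-like idiom, which may help Ruby readers relate '.' notation to R's '%>%'. `Object#then` (Ruby 2.6 and later) passes its receiver to a block, so each step receives the previous result as its first argument. The two lambdas below are made up for illustration. This is core Ruby, not Galaaz.

```ruby
# Two small steps, written as lambdas (hypothetical examples).
double = ->(v) { v * 2 }
inc    = ->(v) { v + 1 }

# In R this would read like: 5 %>% double() %>% inc()
# Object#then feeds the receiver to each step in turn.
result = 5.then(&double).then(&inc)
puts result # prints 11
```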

## Evaluating an Expression

Although we are mainly focusing on expressions as input to _dplyr_ functions, expressions
can also be evaluated by calling the 'eval' method with a binding.
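
Conceptually, evaluating an expression with a binding means walking the expression and replacing each symbol with its bound value. The toy tree below sketches that idea in plain Ruby. The class `ToyExpr`, its `eval_with` method and the `sym` helper are hypothetical, since Galaaz hands its expressions to R for evaluation. The example builds (a + b) * 2 + c^2 / z with ordinary Ruby operators and evaluates it with a = 10, b = 20, c = 30 and z = 40, which gives (10 + 20) * 2 + 900 / 40 = 82.5.

```ruby
# Hypothetical toy expression tree; Galaaz itself evaluates expressions in R.
class ToyExpr
  def initialize(op, *args)
    @op = op
    @args = args
  end

  # Arithmetic operators build new nodes instead of computing a value.
  %i[+ - * / **].each do |op|
    define_method(op) { |other| ToyExpr.new(op, self, other) }
  end

  # Fully parenthesised text, so the tree structure is visible.
  def to_s
    return @args.first.to_s if @op == :sym

    "(#{@args[0]} #{@op} #{@args[1]})"
  end

  # Evaluate against a binding: a Hash from symbol names to values.
  def eval_with(env)
    return env.fetch(@args.first) if @op == :sym

    left, right = @args.map { |arg| arg.is_a?(ToyExpr) ? arg.eval_with(env) : arg }
    # Like R, '/' always produces a floating-point result.
    @op == :/ ? left.fdiv(right) : left.public_send(@op, right)
  end
end

def sym(name)
  ToyExpr.new(:sym, name)
end

exp = (sym(:a) + sym(:b)) * 2 + sym(:c)**2 / sym(:z)
puts exp                                           # prints (((a + b) * 2) + ((c ** 2) / z))
puts exp.eval_with({ a: 10, b: 20, c: 30, z: 40 }) # prints 82.5
```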

A binding can be provided with a list or a data frame, as shown below:

```ruby
@@ -311,7 +403,7 @@ puts exp.eval(R.list(a: 10, b: 20, c: 30, z: 40))
## [1] 82.5
```

Or, with a data frame:

```ruby
@@ -336,8 +428,8 @@ this post, we do not have enough space to write a complete manual on Galaaz
(a short manual can be found at: https://www.rubydoc.info/gems/galaaz/0.4.9), so we will
present only a few example scripts using Galaaz.

Basically, to call an R function from Ruby with Galaaz, one only needs to precede the function
name with 'R.'. For instance, to create a vector in R, the 'c' function is used. In Galaaz, a
vector can be created by using 'R.c':

@@ -399,11 +491,103 @@ puts exp10
## y <- 0.850903524534118
```

# Data manipulation with _dplyr_

In this section we give a brief tour of _dplyr_'s usage in Galaaz and of how to manipulate
data in Ruby with it. This section follows [_dplyr_'s vignette](https://dplyr.tidyverse.org/articles/dplyr.html), which explores the nycflights13 data set. This dataset contains all 336776
flights that departed from New York City in 2013. The data comes from the US Bureau of
Transportation Statistics.

Let's start by taking a look at this dataset:

```ruby
R.library('nycflights13')
# check its dimensions
puts R.dim(:flights)
# and its structure
R.str(:flights)
```

Now, let's use our first _dplyr_ verb: 'filter'. As its name suggests, this verb filters the
data according to the given expressions. In the next block, we filter by columns 'month' and
'day'. The first argument to the filter function is the symbol ':flights'. A Ruby symbol, when
given to an R function, is converted to the R variable of the same name, in this case
'flights', which holds the nycflights13 data frame.

The second and third arguments are expressions that the filter function uses to select rows,
keeping the entries in which both the month and the day are equal to 1.

```ruby
puts R.filter(:flights, (:month.eq 1), (:day.eq 1))
```

```
## # A tibble: 842 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>
##  1  2013     1     1      517            515         2      830
##  2  2013     1     1      533            529         4      850
##  3  2013     1     1      542            540         2      923
##  4  2013     1     1      544            545        -1     1004
##  5  2013     1     1      554            600        -6      812
##  6  2013     1     1      554            558        -4      740
##  7  2013     1     1      555            600        -5      913
##  8  2013     1     1      557            600        -3      709
##  9  2013     1     1      557            600        -3      838
## 10  2013     1     1      558            600        -2      753
## # … with 832 more rows, and 12 more variables: sched_arr_time <int>,
## #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>
```
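
What 'filter' does with these expressions can be pictured in plain Ruby. Each condition is kept as data (a column name and a value) and is only checked when applied to a row, and all conditions are combined with AND, as _dplyr_ does with multiple filter arguments. The rows and the `ToyEq` and `toy_filter` names below are made up for illustration. Galaaz passes real R expressions to _dplyr_ instead.

```ruby
# Made-up rows, loosely shaped like nycflights13 (not real data).
flights = [
  { month: 1, day: 1, dep_delay: 2 },
  { month: 1, day: 2, dep_delay: -3 },
  { month: 2, day: 1, dep_delay: 10 }
]

# A deferred equality test: nothing is compared until it meets a row.
ToyEq = Struct.new(:column, :value) do
  def matches?(row)
    row.fetch(column) == value
  end
end

# Keep only the rows that satisfy every condition (logical AND).
def toy_filter(rows, *conditions)
  rows.select { |row| conditions.all? { |cond| cond.matches?(row) } }
end

jan1 = toy_filter(flights, ToyEq.new(:month, 1), ToyEq.new(:day, 1))
p jan1.size # prints 1
```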

## Programming with _dplyr_: problems and how to solve them in Galaaz

In this section we look at the list of problems that Hadley describes in the "Programming with dplyr"
vignette and show how those problems are solved and coded with Galaaz. Readers interested in
how those problems are treated in _dplyr_ should read the vignette and use it as a comparison with
this blog.

## Filtering using expressions

Now that we know how to write expressions and call R functions, let's do some data manipulation in
Galaaz. Let's start by creating a data frame. In R, the 'data.frame' function creates a
data frame. In Ruby, 'data.frame' will not parse as a single identifier. To call R
functions that have a '.' in their names, we need to substitute the '.' with '__'. So, the R
function 'data.frame' is called in Galaaz as 'R.data\_\_frame':

```ruby
@@ -417,9 +601,15 @@ puts df
## 2 2 2
## 3 3 1
```
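
The '.' to '\_\_' naming rule can be pictured with Ruby's 'method_missing' hook. A proxy object receives 'data\_\_frame', turns the method name back into the R name 'data.frame', and forwards the call. This sketch assumes nothing about Galaaz internals: 'RNameProxy' is a hypothetical class that does not call R. It only returns a description of the call it would make.

```ruby
# Hypothetical proxy illustrating the naming rule: Ruby method names
# cannot contain '.', so '__' in the Ruby name stands for '.' in R.
class RNameProxy
  # Translate a Ruby method name into the corresponding R function name.
  def self.r_name(ruby_name)
    ruby_name.to_s.gsub("__", ".")
  end

  # Treat unknown methods as R calls. Instead of invoking R, return a
  # description of the call that would be made.
  def method_missing(name, *args, **kwargs, &block)
    return super if name.to_s.start_with?("to_")

    { function: self.class.r_name(name), args: args, kwargs: kwargs }
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end
end

r = RNameProxy.new
call = r.data__frame(x: [1, 2, 3], y: [3, 2, 1])
puts call[:function] # prints data.frame
```

By the same rule, an R name with several dots, such as 'as.data.frame', would be written 'as\_\_data\_\_frame' on the Ruby side.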

_dplyr_ provides the 'filter' function, which filters the rows of a data frame. The 'filter'
function can be called on this data frame either as 'R.filter(df, ...)' or
with dot notation.

We prefer dot notation, as shown below. The argument to 'filter' should be an
expression. Note that if we gave 'filter' a Ruby expression such as
'x == 1', we would get an error, since there is no variable 'x' defined. Even if 'x' were a variable,
'x == 1' would simply evaluate to either 'true' or 'false'. Our goal is to filter our data frame, returning
all rows in which the 'x' value is equal to 1. To express this we want: ':x.eq 1', where :x will
@@ -434,7 +624,6 @@ puts df.filter(:x.eq 1)
##   x y
## 1 1 3
```
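
The difference between 'x == 1' and ':x.eq 1' is that the second returns an object that describes the comparison, not a boolean. The sketch below is a hypothetical, minimal model of that idea in plain Ruby. The 'Expr' and 'MiniFrame' classes and the 'Symbol#eq' patch are ours, not Galaaz's. The expression is built first and evaluated later, once per row, by a 'filter' method called with dot notation.

```ruby
# A deferred comparison "column == value", evaluated only when a row
# (a Hash of column name => value) is supplied.
class Expr
  def initialize(column, value)
    @column = column
    @value = value
  end

  def call(row)
    row.fetch(@column) == @value
  end
end

# Hypothetical patch: ':x.eq 1' builds an Expr instead of comparing now.
class Symbol
  def eq(value)
    Expr.new(self, value)
  end
end

# Minimal data frame stand-in whose 'filter' is called with dot notation.
class MiniFrame
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  # Keep only the rows for which the expression evaluates to true.
  def filter(expr)
    MiniFrame.new(@rows.select { |row| expr.call(row) })
  end
end

df = MiniFrame.new([{ x: 1, y: 3 }, { x: 2, y: 2 }, { x: 3, y: 1 }])
expr = :x.eq 1              # builds an expression; nothing is compared yet
p expr.class                # prints Expr
p df.filter(expr).rows.size # prints 1
```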

In R, when coding with the 'tidyverse', arguments to a function are usually not
*referentially transparent*. That is, you can’t replace a value with a seemingly equivalent
object that you’ve defined elsewhere. In other words, this code
@@ -504,7 +693,7 @@ puts df.filter(:x.eq y)
##   x y
## 1 1 3
```
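
Column references are symbols and everything else is an ordinary Ruby value, so an expression such as ':x.eq y' is unambiguous. 'y' is a Ruby variable whose value is captured when the expression is built, and it can never be confused with a column named 'y'. The plain-Ruby sketch below illustrates this. The 'Eq' struct, the 'Symbol#eq' patch and 'filter_rows' are hypothetical stand-ins, not Galaaz code.

```ruby
# Deferred equality test between a column and a plain Ruby value.
Eq = Struct.new(:column, :value) do
  def call(row)
    row.fetch(column) == value
  end
end

class Symbol
  # Hypothetical: build an Eq expression instead of comparing now.
  def eq(value)
    Eq.new(self, value)
  end
end

# Keep the rows that satisfy the condition.
def filter_rows(rows, condition)
  rows.select { |row| condition.call(row) }
end

rows = [{ x: 1, y: 3 }, { x: 2, y: 2 }, { x: 3, y: 1 }]

# 'y' is a Ruby local variable, not the column :y.
y = 1
condition = :x.eq y # captures the value 1 right now

# The expression is an ordinary object: it can be stored, passed to a
# method and reused, and it always means "column x equals 1".
p filter_rows(rows, condition).size # prints 1
p condition.value                   # prints 1
```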

## Writing a function that applies to different data sets

Let's suppose that we want to write a function that receives as its first argument a data frame
and as its second argument an expression that adds a column to the data frame that is equal to the
@@ -571,7 +760,7 @@ mutate_y(df1)
## mismatched protect/unprotect (unprotect with empty protect stack) (RError)
## Translated to internal error
```
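
The core concern here is that a function should work on any data set it receives, and fail clearly when a column is missing. That can be sketched in plain Ruby. For illustration only, this sketch assumes the new column 'y' is computed from the columns 'a' and 'x'. Because columns are looked up with 'Hash#fetch', a data set that lacks a column raises 'KeyError' instead of silently picking up some unrelated variable. 'mutate_y' here is our own sketch, not the Galaaz function whose error is printed above.

```ruby
# Add a column :y = a + x to every row (rows are Hashes).
# Hash#fetch raises KeyError when a column is missing, so the function
# never falls back to some unrelated variable named 'a' or 'x'.
def mutate_y(rows)
  rows.map { |row| row.merge(y: row.fetch(:a) + row.fetch(:x)) }
end

df1 = [{ a: 10, x: 1 }, { a: 10, x: 2 }]
p mutate_y(df1).map { |row| row[:y] } # prints [11, 12]

df2 = [{ x: 1 }, { x: 2 }] # no column :a
begin
  mutate_y(df2)
rescue KeyError => e
  puts "error: #{e.message}" # prints error: key not found: :a
end
```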

## Different expressions

Let's move on to the next problem presented by Hadley: in R, writing a function
that receives two arguments, the first a variable and the second an expression, is not trivial.
@@ -589,16 +778,16 @@ df <- data.frame(
  b = sample(5)
)

as.data.frame(df)
```

```
##   g1 g2 a b
## 1  1  1 3 3
## 2  1  2 2 1
## 3  2  1 5 2
## 4  2  2 4 5
## 5  2  1 1 4
```

```r
@@ -606,13 +795,13 @@ d2 <- df %>%
  group_by(g1) %>%
  summarise(a = mean(a))

as.data.frame(d2)
```

```
##   g1        a
## 1  1 2.500000
## 2  2 3.333333
```

```r
@@ -624,9 +813,9 @@ as.data.frame(d2)
```

```
##   g2 a
## 1  1 3
## 2  2 3
```

As shown by Hadley, one might expect this function to do the trick:
@@ -662,38 +851,42 @@ def my_summarize(df, group_var)
  summarize(a: :a.mean)
end

puts my_summarize(:df, :g1)
```

```
##   g1 g2 a b
## 1  1  1 3 3
## 2  1  2 2 1
## 3  2  1 5 2
## 4  2  2 4 5
## 5  2  1 1 4
##
## # A tibble: 2 x 2
## g1 a
## <dbl> <dbl>
## 1 1 2.5
## 2 2 3.33
```

It works!!! Well, let's make sure this was not just a coincidence:

```ruby
puts my_summarize(:df, :g2)
```

```
## # A tibble: 2 x 2
## g2 a
## <dbl> <dbl>
## 1 1 3
## 2 2 3
```

Great, everything is fine! No magic, no new functions, no complexities, just normal, standard Ruby
code. If you've ever done NSE in R, this certainly feels much safer and easier to implement.
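
The same idea can be written in plain Ruby without any R at all: the grouping column is passed around as an ordinary Symbol. The sketch below uses the 'df' values printed earlier and reproduces the group means shown above. 'my_summarize_rb' is a hypothetical helper that groups rows by the given column and averages column 'a'.

```ruby
# The 'df' data frame printed earlier, stored as an array of row hashes.
DF_ROWS = [
  { g1: 1, g2: 1, a: 3, b: 3 },
  { g1: 1, g2: 2, a: 2, b: 1 },
  { g1: 2, g2: 1, a: 5, b: 2 },
  { g1: 2, g2: 2, a: 4, b: 5 },
  { g1: 2, g2: 1, a: 1, b: 4 }
].freeze

# Group by the column named by group_var and average column :a.
# group_var is an ordinary Symbol: no quoting or unquoting is needed.
def my_summarize_rb(rows, group_var)
  groups = rows.group_by { |row| row.fetch(group_var) }
  groups.transform_values { |group| group.sum { |row| row[:a] }.fdiv(group.size) }
end

p my_summarize_rb(DF_ROWS, :g1) # group 1 => 2.5, group 2 => 3.333...
p my_summarize_rb(DF_ROWS, :g2) # group 1 => 3.0, group 2 => 3.0
```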

## Different input variables

In the previous section we managed to get rid of all NSE formulation for a simple example, but
does this remain true for more complex examples, or will the Galaaz way prove impractical for
@@ -743,7 +936,7 @@ puts my_summarise2((~:df), :a * :b)
Once again, there is no need to use any special theory or functions. The only point to be
careful about is the use of 'E' to build expressions from the functions 'mean', 'sum' and 'n'.
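
To see what an argument like ':a \* :b' amounts to, here is a small, hypothetical model in plain Ruby. Galaaz lets you write ':a \* :b' directly on symbols. To avoid patching Symbol, this sketch wraps column names in a 'Col' object whose '\*' builds a product expression instead of computing it. The summaries stand in for 'E.mean', 'E.sum' and 'E.n' and are computed only when the expression is evaluated on the data. 'Col', 'Product' and 'summarise_expr' are our names, not Galaaz's. With the 'a' and 'b' columns of the 'df' data frame printed earlier, the row products are 9, 2, 10, 20 and 4.

```ruby
# Column reference: evaluates to the value of that column in a row.
Col = Struct.new(:name) do
  def call(row)
    row.fetch(name)
  end

  # Build (not compute) a product expression.
  def *(other)
    Product.new(self, other)
  end
end

# Product of two sub-expressions, evaluated row by row.
Product = Struct.new(:left, :right) do
  def call(row)
    left.call(row) * right.call(row)
  end
end

# Evaluate expr on every row, then summarise like mean(), sum() and n().
def summarise_expr(rows, expr)
  values = rows.map { |row| expr.call(row) }
  { mean: values.sum.fdiv(values.size), sum: values.sum, n: values.size }
end

rows = [
  { a: 3, b: 3 }, { a: 2, b: 1 }, { a: 5, b: 2 }, { a: 4, b: 5 }, { a: 1, b: 4 }
]
result = summarise_expr(rows, Col.new(:a) * Col.new(:b))
puts result[:sum]  # prints 45
puts result[:mean] # prints 9.0
puts result[:n]    # prints 5
```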

## Different input and output variable

Now the next challenge presented by Hadley is to vary the names of the output variables based on
the received expression. So, if the input expression is 'a', we want our data frame columns to
@@ -792,17 +985,17 @@ puts my_mutate((~:df), :b)

```
##   g1 g2 a b mean_a sum_a
## 1  1  1 3 3      3    15
## 2  1  2 2 1      3    15
## 3  2  1 5 2      3    15
## 4  2  2 4 5      3    15
## 5  2  1 1 4      3    15
##   g1 g2 a b mean_b sum_b
## 1  1  1 3 3      3    15
## 2  1  2 2 1      3    15
## 3  2  1 5 2      3    15
## 4  2  2 4 5      3    15
## 5  2  1 1 4      3    15
```

It really seems that "Non Standard Evaluation" is actually quite standard in Galaaz! But you
might have noticed a small change in the way the arguments to the mutate method were written.
@@ -812,7 +1005,7 @@ and variable mean\_name is not followed by ':' but by '=>'. This is standard Ru
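
In Ruby's hash syntax, 'mean_name: value' always creates a key that is literally the symbol ':mean_name',
whereas 'mean_name => value' uses the current value of the variable 'mean_name' as the key. This is what
allows the names of the output columns to be computed from the input expression.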
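
The difference between 'mean_name: value' and 'mean_name => value' can be checked in plain Ruby. The second half of the sketch is a hypothetical, R-free version of the 'my_mutate' idea. The output column names are built from the input column, so they must be used as keys with '=>'. 'my_mutate_rb' and the sample rows, taken from the 'df' values printed earlier, are illustrative assumptions.

```ruby
mean_name = "mean_a"

# 'key: value' always creates the literal symbol key :mean_name ...
literal = { mean_name: 3 }
# ... while 'key => value' uses the variable's current value as the key.
computed = { mean_name => 3 }

p literal.keys  # prints [:mean_name]
p computed.keys # prints ["mean_a"]

# Hypothetical R-free version of my_mutate: add mean_<col> and sum_<col>
# columns, with names computed from the column that was passed in.
def my_mutate_rb(rows, col)
  values = rows.map { |row| row.fetch(col) }
  mean_name = "mean_#{col}"
  sum_name = "sum_#{col}"
  rows.map do |row|
    row.merge(mean_name => values.sum.fdiv(values.size), sum_name => values.sum)
  end
end

rows = [{ a: 3, b: 3 }, { a: 2, b: 1 }, { a: 5, b: 2 }, { a: 4, b: 5 }, { a: 1, b: 4 }]
first = my_mutate_rb(rows, :b).first
p first["mean_b"] # prints 3.0
p first["sum_b"]  # prints 15
```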

## Capturing multiple variables

Moving on to new complexities, Hadley proposes the problem in which the
summarise function receives any number of grouping variables.
@@ -827,15 +1020,18 @@ def my_summarise3(df, *group_vars)
  summarise(a: E.mean(:a))
end

puts my_summarise3((~:df), :g1, :g2)
```

```
## # A tibble: 4 x 3
## # Groups: g1 [?]
## g1 g2 a
## <dbl> <dbl> <dbl>
## 1 1 1 3
## 2 1 2 2
## 3 2 1 3
## 4 2 2 4
```
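
The splat in 'def my_summarise3(df, \*group_vars)' is what lets the function accept any number of grouping columns, because Ruby collects them into an array. The plain-Ruby sketch below groups the 'df' values printed earlier in the same way and reproduces the four group means above. 'my_summarise3_rb' is a hypothetical helper, not part of Galaaz.

```ruby
DF_ROWS = [
  { g1: 1, g2: 1, a: 3 },
  { g1: 1, g2: 2, a: 2 },
  { g1: 2, g2: 1, a: 5 },
  { g1: 2, g2: 2, a: 4 },
  { g1: 2, g2: 1, a: 1 }
].freeze

# *group_vars collects any number of grouping Symbols into an Array.
# Rows are grouped by the combination of those columns, and :a is averaged.
def my_summarise3_rb(rows, *group_vars)
  groups = rows.group_by { |row| row.values_at(*group_vars) }
  groups.sort_by { |key, _group| key }.map do |key, group|
    mean_a = group.sum { |row| row[:a] }.fdiv(group.size)
    group_vars.zip(key).to_h.merge(a: mean_a)
  end
end

my_summarise3_rb(DF_ROWS, :g1, :g2).each { |row| p row }
```

Calling it with a single column, for example 'my_summarise3_rb(DF_ROWS, :g1)', works unchanged, since the splat simply collects a one-element array.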

# Why does R require NSE and Galaaz does not?
@@ -868,7 +1064,7 @@ call the function passing the expression ':a.eq :b'.

# Advanced dplyr features

In the blog post [Programming with dplyr by using dplyr](https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/), Iñaki Úcar expresses surprise that some R users try to code in dplyr while avoiding
the use of NSE. For instance, he says:

> Take the example of seplyr. It stands for standard evaluation dplyr, and enables us to
@@ -887,38 +1083,21 @@ features of characters in the Starwars movies:

```ruby
puts (~:starwars).head
```

```
## # A tibble: 6 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke… 172 77 blond fair blue 19 male
## 2 C-3PO 167 75 <NA> gold yellow 112 <NA>
## 3 R2-D2 96 32 <NA> white, bl… red 33 <NA>
## 4 Dart… 202 136 none white yellow 41.9 male
## 5 Leia… 150 49 brown light brown 19 female
## 6 Owen… 178 120 brown, gr… light blue 52 male
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
```

The grouped_mean function below receives a grouping variable and calculates summaries for
the value\_variables given:
@@ -970,28 +1149,44 @@ def grouped_mean(data, grouping_variables, value_variables)
  rename_at(value_variables, E.funs(E.paste0("mean_", value_variables)))
end

puts grouped_mean((~:starwars), "eye_color", E.c("mass", "birth_year"))
```

```
## # A tibble: 15 x 4
## eye_color mean_mass mean_birth_year count
## <chr> <dbl> <dbl> <dbl>
## 1 black 76.3 33 10
## 2 blue 86.5 67.1 19
## 3 blue-gray 77 57 1
## 4 brown 66.1 109. 21
## 5 dark NaN NaN 1
## 6 gold NaN NaN 1
## 7 green, yellow 159 NaN 1
## 8 hazel 66 34.5 3
## 9 orange 282. 231 8
## 10 pink NaN NaN 1
## 11 red 81.4 33.7 5
## 12 red, blue NaN NaN 1
## 13 unknown 31.5 NaN 3
## 14 white 48 NaN 1
## 15 yellow 81.1 76.4 11
```
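
For comparison, the behaviour visible in this output can be mimicked in plain Ruby. The means skip missing values, so a group with no non-missing values gives NaN. The columns carry a 'mean_' prefix, and each group has a row count. 'mean_skip_nil', 'grouped_mean_rb' and the tiny data set are hypothetical, and 'nil' plays the role of R's NA. This is a sketch of the result, not of how dplyr computes it.

```ruby
# Mean of the non-nil values. Like R's mean(x, na.rm = TRUE), a group with
# no non-missing values yields NaN (0.0 / 0).
def mean_skip_nil(values)
  present = values.compact
  present.sum(0.0) / present.size
end

# Group rows by one column; for each value column compute "mean_<col>",
# and add the number of rows in the group as "count".
def grouped_mean_rb(rows, grouping, value_vars)
  groups = rows.group_by { |row| row[grouping] }
  groups.sort_by { |key, _group| key.to_s }.map do |key, group|
    means = value_vars.map { |var| ["mean_#{var}", mean_skip_nil(group.map { |row| row[var] })] }.to_h
    { grouping => key }.merge(means).merge("count" => group.size)
  end
end

# Tiny hypothetical data set; nil plays the role of R's NA.
people = [
  { eye_color: "blue", mass: 77, birth_year: 19.0 },
  { eye_color: "blue", mass: nil, birth_year: 41.0 },
  { eye_color: "red", mass: 32, birth_year: nil }
]
grouped_mean_rb(people, :eye_color, %i[mass birth_year]).each { |row| p row }
```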

# Further reading

For more information on GraalVM, TruffleRuby, FastR, R and Galaaz, check out the following sites and posts:

* [GraalVM Home](https://www.graalvm.org/)
* [TruffleRuby](https://github.com/oracle/truffleruby)
* [FastR](https://github.com/oracle/fastr)
* [Faster R with FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
* [How to make Beautiful Ruby Plots with Galaaz](https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857)
* [Ruby Plotting with Galaaz: An example of tightly coupling Ruby and R in GraalVM](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021)
* [How to do reproducible research in Ruby with gKnit](https://towardsdatascience.com/how-to-do-reproducible-research-in-ruby-with-gknit-c26d2684d64e)
* [R for Data Science](https://r4ds.had.co.nz/)
* [Advanced R](https://adv-r.hadley.nz/)

# Conclusion

Ruby and Galaaz provide a nice framework for developing code that uses R functions. Although R is
|