galaaz 0.4.6 → 0.5.0
- checksums.yaml +5 -5
- data/README.md +3575 -118
- data/Rakefile +21 -4
- data/bin/gknit +152 -6
- data/bin/gknit-draft +105 -0
- data/bin/gknit-draft.rb +28 -0
- data/bin/gknit_Rscript +127 -0
- data/bin/grun +27 -1
- data/bin/gstudio +47 -4
- data/bin/{gstudio.rb → gstudio_irb.rb} +0 -0
- data/bin/gstudio_pry.rb +7 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +3 -12
- data/blogs/galaaz_ggplot/galaaz_ggplot.html +77 -222
- data/blogs/galaaz_ggplot/galaaz_ggplot.md +4 -31
- data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
- data/blogs/galaaz_ggplot/midwest.Rmd +1 -9
- data/blogs/gknit/gknit.Rmd +232 -123
- data/blogs/{dev/dev.html → gknit/gknit.html} +1897 -33
- data/blogs/gknit/gknit.pdf +0 -0
- data/blogs/gknit/lst.rds +0 -0
- data/blogs/gknit/stats.bib +27 -0
- data/blogs/manual/lst.rds +0 -0
- data/blogs/manual/manual.Rmd +1893 -47
- data/blogs/manual/manual.html +3153 -347
- data/blogs/manual/manual.md +3575 -118
- data/blogs/manual/manual.pdf +0 -0
- data/blogs/manual/manual.tex +4026 -0
- data/blogs/manual/manual_files/figure-html/bubble-1.png +0 -0
- data/blogs/manual/manual_files/figure-html/diverging_bar.png +0 -0
- data/blogs/manual/manual_files/figure-latex/bubble-1.png +0 -0
- data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
- data/blogs/{dev → manual}/model.rb +0 -0
- data/blogs/nse_dplyr/nse_dplyr.Rmd +849 -0
- data/blogs/nse_dplyr/nse_dplyr.html +878 -0
- data/blogs/nse_dplyr/nse_dplyr.md +1198 -0
- data/blogs/nse_dplyr/nse_dplyr.pdf +0 -0
- data/blogs/oh_my/oh_my.html +274 -386
- data/blogs/oh_my/oh_my.md +208 -205
- data/blogs/ruby_plot/ruby_plot.Rmd +64 -84
- data/blogs/ruby_plot/ruby_plot.html +235 -208
- data/blogs/ruby_plot/ruby_plot.md +239 -34
- data/blogs/ruby_plot/ruby_plot.pdf +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
- data/examples/Bibliography/master.bib +50 -0
- data/examples/Bibliography/stats.bib +72 -0
- data/examples/islr/ch2.spec.rb +1 -1
- data/examples/islr/ch3_boston.rb +4 -4
- data/examples/islr/x_y_rnorm.jpg +0 -0
- data/examples/latex_templates/Test-acm_article/Makefile +16 -0
- data/examples/latex_templates/Test-acm_article/Test-acm_article.Rmd +65 -0
- data/examples/latex_templates/Test-acm_article/acm_proc_article-sp.cls +1670 -0
- data/examples/latex_templates/Test-acm_article/sensys-abstract.cls +703 -0
- data/examples/latex_templates/Test-acm_article/sigproc.bib +59 -0
- data/examples/latex_templates/Test-acs_article/Test-acs_article.Rmd +260 -0
- data/examples/latex_templates/Test-acs_article/Test-acs_article.pdf +0 -0
- data/examples/latex_templates/Test-acs_article/acs-Test-acs_article.bib +11 -0
- data/examples/latex_templates/Test-acs_article/acs-my_output.bib +11 -0
- data/examples/latex_templates/Test-acs_article/acstest.bib +17 -0
- data/examples/latex_templates/Test-aea_article/AEA.cls +1414 -0
- data/examples/latex_templates/Test-aea_article/BibFile.bib +0 -0
- data/examples/latex_templates/Test-aea_article/Test-aea_article.Rmd +108 -0
- data/examples/latex_templates/Test-aea_article/Test-aea_article.pdf +0 -0
- data/examples/latex_templates/Test-aea_article/aea.bst +1269 -0
- data/examples/latex_templates/Test-aea_article/multicol.sty +853 -0
- data/examples/latex_templates/Test-aea_article/references.bib +0 -0
- data/examples/latex_templates/Test-aea_article/setspace.sty +546 -0
- data/examples/latex_templates/Test-amq_article/Test-amq_article.Rmd +256 -0
- data/examples/latex_templates/Test-amq_article/Test-amq_article.pdf +0 -0
- data/examples/latex_templates/Test-amq_article/Test-amq_article.pdfsync +3397 -0
- data/examples/latex_templates/Test-amq_article/pics/Figure2.pdf +0 -0
- data/examples/latex_templates/Test-ams_article/Test-ams_article.Rmd +215 -0
- data/examples/latex_templates/Test-ams_article/amstest.bib +436 -0
- data/examples/latex_templates/Test-asa_article/Test-asa_article.Rmd +153 -0
- data/examples/latex_templates/Test-asa_article/Test-asa_article.pdf +0 -0
- data/examples/latex_templates/Test-asa_article/agsm.bst +1353 -0
- data/examples/latex_templates/Test-asa_article/bibliography.bib +233 -0
- data/examples/latex_templates/Test-ieee_article/IEEEtran.bst +2409 -0
- data/examples/latex_templates/Test-ieee_article/IEEEtran.cls +6346 -0
- data/examples/latex_templates/Test-ieee_article/Test-ieee_article.Rmd +175 -0
- data/examples/latex_templates/Test-ieee_article/Test-ieee_article.pdf +0 -0
- data/examples/latex_templates/Test-ieee_article/mybibfile.bib +20 -0
- data/examples/latex_templates/Test-rjournal_article/RJournal.sty +335 -0
- data/examples/latex_templates/Test-rjournal_article/RJreferences.bib +18 -0
- data/examples/latex_templates/Test-rjournal_article/RJwrapper.pdf +0 -0
- data/examples/latex_templates/Test-rjournal_article/Test-rjournal_article.Rmd +52 -0
- data/examples/latex_templates/Test-springer_article/Test-springer_article.Rmd +65 -0
- data/examples/latex_templates/Test-springer_article/Test-springer_article.pdf +0 -0
- data/examples/latex_templates/Test-springer_article/bibliography.bib +26 -0
- data/examples/latex_templates/Test-springer_article/spbasic.bst +1658 -0
- data/examples/latex_templates/Test-springer_article/spmpsci.bst +1512 -0
- data/examples/latex_templates/Test-springer_article/spphys.bst +1443 -0
- data/examples/latex_templates/Test-springer_article/svglov3.clo +113 -0
- data/examples/latex_templates/Test-springer_article/svjour3.cls +1431 -0
- data/examples/misc/moneyball.rb +1 -1
- data/examples/misc/subsetting.rb +37 -37
- data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.Rmd +73 -0
- data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.pdf +0 -0
- data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.Rmd +382 -0
- data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.pdf +0 -0
- data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.Rmd +164 -0
- data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.pdf +0 -0
- data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.Rmd +92 -0
- data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.pdf +0 -0
- data/examples/rmarkdown/svm-rmarkdown-syllabus-example/attend-grade-relationships.csv +482 -0
- data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.Rmd +280 -0
- data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.pdf +0 -0
- data/examples/rmarkdown/svm-xaringan-example/svm-xaringan-example.Rmd +386 -0
- data/lib/R_interface/r.rb +2 -2
- data/lib/R_interface/r_libs.R +6 -1
- data/lib/R_interface/r_methods.rb +12 -2
- data/lib/R_interface/rdata_frame.rb +8 -17
- data/lib/R_interface/rindexed_object.rb +1 -2
- data/lib/R_interface/rlist.rb +1 -0
- data/lib/R_interface/robject.rb +20 -23
- data/lib/R_interface/rpkg.rb +15 -6
- data/lib/R_interface/rsupport.rb +13 -19
- data/lib/R_interface/ruby_extensions.rb +14 -18
- data/lib/R_interface/rvector.rb +0 -12
- data/lib/gknit.rb +2 -0
- data/lib/gknit/draft.rb +105 -0
- data/lib/gknit/knitr_engine.rb +6 -37
- data/lib/util/exec_ruby.rb +22 -84
- data/lib/util/inline_file.rb +7 -3
- data/specs/figures/bg.jpeg +0 -0
- data/specs/figures/bg.png +0 -0
- data/specs/figures/bg.svg +2 -2
- data/specs/figures/dose_len.png +0 -0
- data/specs/figures/no_args.jpeg +0 -0
- data/specs/figures/no_args.png +0 -0
- data/specs/figures/no_args.svg +2 -2
- data/specs/figures/width_height.jpeg +0 -0
- data/specs/figures/width_height.png +0 -0
- data/specs/figures/width_height_units1.jpeg +0 -0
- data/specs/figures/width_height_units1.png +0 -0
- data/specs/figures/width_height_units2.jpeg +0 -0
- data/specs/figures/width_height_units2.png +0 -0
- data/specs/r_dataframe.spec.rb +184 -11
- data/specs/r_list.spec.rb +4 -4
- data/specs/r_list_apply.spec.rb +11 -10
- data/specs/ruby_expression.spec.rb +3 -11
- data/specs/tmp.rb +106 -34
- data/version.rb +1 -1
- metadata +96 -33
- data/bin/gknit_old_r +0 -236
- data/blogs/dev/dev.Rmd +0 -77
- data/blogs/dev/dev.md +0 -87
- data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
- data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
- data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
- data/blogs/dplyr/dplyr.rb +0 -63
- data/blogs/galaaz_ggplot/galaaz_ggplot.aux +0 -43
- data/blogs/galaaz_ggplot/galaaz_ggplot.log +0 -640
- data/blogs/galaaz_ggplot/galaaz_ggplot.out +0 -10
- data/blogs/galaaz_ggplot/galaaz_ggplot.tex +0 -481
- data/blogs/galaaz_ggplot/midwest.png +0 -0
- data/blogs/galaaz_ggplot/scatter_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +0 -662
- data/blogs/ruby_plot/ruby_plot.tex +0 -1077
- data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +0 -57
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +0 -106
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +0 -110
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +0 -174
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +0 -236
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +0 -296
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +0 -236
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +0 -218
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +0 -128
- data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +0 -150
- data/examples/paper/paper.rb +0 -36
@@ -1,662 +0,0 @@
---
title: "How to make Beautiful Ruby Plots with Galaaz"
author:
  - "Rodrigo Botafogo"
  - "Daniel Mossé - University of Pittsburgh"
tags: [Tech, Data Science, Ruby, R, GraalVM]
date: "November 19th, 2018"
output:
  html_document:
    self_contained: true
    keep_md: true
  pdf_document:
    includes:
      in_header: "../../sty/galaaz.sty"
    keep_tex: yes
    number_sections: yes
    toc: true
    toc_depth: 2
fontsize: 11pt
---

```{r setup, echo=FALSE}
# set global chunk options. We want all figures to be 'svg'
knitr::opts_chunk$set(fig.width=7, fig.height=7, dev="svg")
```

According to Wikipedia, "Ruby is a dynamic, interpreted, reflective, object-oriented,
general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro
"Matz" Matsumoto in Japan." It became highly popular with the development of Ruby on Rails
(RoR) by David Heinemeier Hansson. RoR is a web application framework first released
around 2005 that makes extensive use of Ruby's metaprogramming features. According to
[Ruby's Tiobe index](https://www.tiobe.com/tiobe-index/ruby/), Ruby's popularity peaked
around 2008, then declined until 2015, when it started picking up again.
At the time of this writing (November 2018), the Tiobe index ranks Ruby as the 16th most
popular language.

Python, a language similar to Ruby, ranks 4th in the index. Java, C and C++ take the
first three positions. Ruby is often criticized for its focus on web applications,
but Ruby can do [much more](https://github.com/markets/awesome-ruby) than web applications.
Yet, for scientific computing, Ruby lags far behind Python and R. Python has the
Django framework for the web, NumPy for numerical arrays and Pandas for data analysis.
R is a free software environment for statistical computing and graphics, with thousands
of libraries for data analysis.

Until recently, there was no realistic prospect for Ruby to bridge this gap:
implementing a complete scientific computing infrastructure would take too long.
Enter [Oracle's GraalVM](https://www.graalvm.org/):

> GraalVM is a universal virtual machine for running applications written in
> JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin,
> and LLVM-based languages such as C and C++.
>
> GraalVM removes the isolation between programming languages and enables
> interoperability in a shared runtime. It can run either standalone or in the
> context of OpenJDK, Node.js, Oracle Database, or MySQL.
>
> GraalVM allows you to write polyglot applications with a seamless way to pass
> values from one language to another. With GraalVM there is no copying or
> marshaling necessary as it is with other polyglot systems. This lets you
> achieve high performance when language boundaries are crossed. Most of the time
> there is no additional cost for crossing a language boundary at all.
>
> Often developers have to make uncomfortable compromises that require them
> to rewrite their software in other languages. For example:
>
> * That library is not available in my language. I need to rewrite it.
> * That language would be the perfect fit for my problem, but we cannot
>   run it in our environment.
> * That problem is already solved in my language, but the language is
>   too slow.
>
> With GraalVM we aim to allow developers to freely choose the right language for
> the task at hand without making compromises.

As stated above, GraalVM is a _universal_ virtual machine that allows Ruby and R (and other
languages) to run in the same environment. GraalVM allows polyglot applications to
_seamlessly_ interact with one another and pass values from one language to the other.
Although a great idea, GraalVM still requires application writers to know several languages.
To eliminate that requirement, we built Galaaz, a Ruby gem that couples Ruby and R
so tightly that the user is unaware of their interaction. In other words, a Ruby
programmer can use all the capabilities of R without knowing R's syntax.

Library wrapping is a common way of bringing features from one language into another.
To improve performance, Python often wraps more efficient C libraries, whose existence
is hidden from the Python developer. The problem with library wrapping is that every
new library requires a new, handcrafted wrapper.

Galaaz, instead of wrapping a single C or R library, wraps the whole R language
in Ruby. By doing so, all of R's thousands of libraries become immediately available
to Ruby developers without any new wrapping effort.

To demonstrate the power of Galaaz, this article shows how Ruby can use R's ggplot2
library transparently, bringing the power of high-quality scientific plotting to Ruby.
We also show that migrating from R to Ruby with Galaaz is a matter of small
syntactic changes. By using Ruby, the R developer can use all of Ruby's powerful
object-oriented features. Also, with Ruby, it becomes much easier to move code
from the analysis phase to the production phase.

In this article we will explore the R ToothGrowth dataset. To illustrate, we will
create some boxplots. A primer on boxplots is available in
[this article](https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51).

We will also create a Corporate Template to ensure that plots have a consistent
look. This template is built using a Ruby module. It is also possible to build a
ggplot theme that works the same way as the Ruby module. Yet, writing a new theme
requires specific knowledge of theme writing, whereas Ruby modules are standard to the
language and need no special knowledge.

[Here](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021) we show a scatter plot in Ruby, also built with Galaaz.

# gKnit

_Knitr_ is an application that converts text written in rmarkdown to many
different output formats. For instance, a writer can convert an rmarkdown document
to HTML, LaTeX, docx and many other formats. Rmarkdown documents can contain
text and _code chunks_. Knitr formats code chunks in a gray box in the output document.
It also executes the code chunks and formats their output in a white box. Every line of
output from the executed code is preceded by '##'.

Knitr allows code chunks to be in R, Python, Ruby and dozens of other languages.
Yet, while R and Python chunks can share data, chunks in other languages are
independent: a variable defined in one chunk cannot be used in another chunk.

With _gKnit_, Ruby code chunks can share data. In gKnit, each
Ruby chunk executes in its own scope, so local variables defined in one chunk are
not accessible from other chunks. However, all chunks execute in the scope of a 'chunk'
class, so instance variables (prefixed with '@') are available in all chunks.
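
This scoping rule can be modeled in plain Ruby. The sketch below is a hypothetical model, not gKnit's actual implementation: a `ChunkContext` object (an assumed name) runs each "chunk", here a block, through `instance_eval`, so instance variables are stored on the shared object while local variables stay inside each block.

```ruby
# Hypothetical model of gKnit-style chunk scoping (not gKnit's actual code).
class ChunkContext
  # Run one "chunk" (here, a block) with self bound to this shared object.
  def run_chunk(&chunk)
    instance_eval(&chunk)
  end
end

ctx = ChunkContext.new

# First chunk: assigns a local variable and an instance variable.
ctx.run_chunk do
  local_total = 10    # local: exists only inside this block
  @shared_total = 10  # instance variable: stored on ctx
end

# Second chunk: the instance variable is still visible...
second = ctx.run_chunk { @shared_total + 5 }
puts second  # prints 15

# ...but the first chunk's local variable is not.
leaked = ctx.run_chunk { defined?(local_total) }
p leaked     # prints nil
```

This is why the chunks in this article store the dataset and plots in variables such as `@tooth_growth` and `@bp` rather than in locals.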

# Exploring the Dataset

Let's start by exploring our selected dataset. ToothGrowth is an R dataset. A dataset
is like a simple Excel spreadsheet in which each column holds only one type of data:
for instance, one column can hold floats, another integers, and a third strings.
This dataset records the length of odontoblasts (cells responsible for tooth growth)
in 60 guinea pigs. Each animal received one of three dose levels of vitamin C
(0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or
ascorbic acid (a form of vitamin C, coded as VC).

The ToothGrowth dataset contains three columns: 'len', 'supp' and 'dose'. Let's
take a look at a few rows of this dataset. In Galaaz, R variables are accessed
by using the corresponding Ruby symbol preceded by the tilde ('~') operator. Note in the
following chunk that 'ToothGrowth' is the R variable and that Ruby's '@tooth_growth' is
assigned the value of '~:ToothGrowth'.

```{ruby head}
# Read the R ToothGrowth variable and assign it to the
# Ruby instance variable @tooth_growth that will be
# available to all Ruby chunks in this document.
@tooth_growth = ~:ToothGrowth
# print the first few elements of the dataset
puts @tooth_growth.head
```

Great! We've managed to read the ToothGrowth dataset and take a look at its elements.
We see here the first 6 rows of the dataset. To access a column, follow the dataset name
with a dot ('.') and the name of the column. Dot notation is also used to chain methods,
in the usual Ruby style.

```{ruby dataset_columns}
# Access the tooth_growth 'len' column and print the first few
# elements of this column with the 'head' method.
puts @tooth_growth.len.head
```
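
Column access through a dot, and chaining such as `head` followed by a column name, can be modeled in plain Ruby with `method_missing`. The `TinyFrame` class below is a hypothetical toy and not how Galaaz implements data frames: it stores each column as an array holding a single type, returns a column when a message matches a column name, and offers a `head` that, like R's, defaults to six rows. The sample values are made up for illustration and are not taken from ToothGrowth.

```ruby
# Hypothetical toy data frame: each column is an Array holding a single type.
class TinyFrame
  def initialize(columns)
    @columns = columns  # Hash: column name (Symbol) => Array of values
  end

  # Number of rows and columns, like R's dim().
  def dim
    [@columns.values.first.length, @columns.length]
  end

  # Return the first n rows as a new TinyFrame (R's head() defaults to 6).
  def head(n = 6)
    TinyFrame.new(@columns.transform_values { |col| col.first(n) })
  end

  # frame.len returns the :len column, so columns read like methods.
  def method_missing(name, *args)
    return super unless args.empty? && @columns.key?(name)
    @columns[name]
  end

  def respond_to_missing?(name, include_private = false)
    @columns.key?(name) || super
  end
end

# Made-up values, for illustration only.
frame = TinyFrame.new(
  len:  [4.0, 9.5, 7.0, 15.0],
  supp: %w[VC VC OJ OJ],
  dose: [0.5, 0.5, 1.0, 2.0]
)

p frame.dim          # prints [4, 3]
p frame.head(2).len  # prints [4.0, 9.5]
```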

The 'dose' column contains numeric values, either 0.5, 1 or 2, although the
first 6 rows shown above contain only the value 0.5. Even though these are
numbers, they are better interpreted as a [factor or category](https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/). So, let's convert our 'dose' column from numeric to 'factor'.
In R, the function 'as.factor' is used to convert the data in a vector to factors. To use this
function from Galaaz, the dot ('.') in the function name is replaced by '__' (double underscore).
The function 'as.factor' becomes 'R.as__factor', or just 'as__factor' when chaining.

```{ruby tooth_growth}
# convert the dose to a factor
@tooth_growth.dose = @tooth_growth.dose.as__factor
```
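
Both ideas in this step can be made concrete in plain Ruby. `galaaz_name` applies the naming rule stated above ('.' becomes '__'), and `to_factor` is a simplified model of what R's `as.factor` produces, assuming its default behavior: the sorted distinct values become the levels, and each element is stored as the 1-based index of its level. Both helpers are hypothetical and are not part of Galaaz.

```ruby
# Hypothetical helper: map an R function name to the Galaaz spelling
# described in the text ('.' is replaced by '__').
def galaaz_name(r_name)
  r_name.gsub(".", "__")
end

# Simplified model of an R factor: levels plus 1-based integer codes.
Factor = Struct.new(:levels, :codes) do
  # Decode the codes back to the original values.
  def values
    codes.map { |code| levels[code - 1] }
  end
end

def to_factor(values)
  levels = values.uniq.sort
  codes  = values.map { |v| levels.index(v) + 1 }
  Factor.new(levels, codes)
end

puts galaaz_name("as.factor")  # prints as__factor
puts galaaz_name("dev.off")    # prints dev__off

dose = to_factor([1, 0.5, 2, 0.5, 1])
p dose.levels  # prints [0.5, 1, 2]
p dose.codes   # prints [2, 1, 3, 1, 2]
```

Once 'dose' is a factor, ggplot treats it as three discrete groups instead of a continuous axis, which is what we want for the boxplots below.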

Let's explore some more details of this dataset. In particular, let's look at its dimensions,
structure and summary statistics.

```{ruby dim}
puts @tooth_growth.dim
```

This dataset has 60 rows, one for each subject, and 3 columns, as we have already seen.

Note that we do not need to call 'puts' when using the 'str' function. This
function does not return a useful value; it prints the structure of the dataset
as a side effect.

```{ruby str}
@tooth_growth.str
```

Observe that both variables 'supp' and 'dose' are factors. The variable 'supp' was
already a factor in the dataset, since it holds only the two labels OJ and VC.

Finally, using the summary method, we get the statistical summary of the dataset:

```{ruby summary}
puts @tooth_growth.summary
```
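
The six numbers that `summary` reports for a numeric column (minimum, first quartile, median, mean, third quartile and maximum) can be reproduced in plain Ruby. This sketch assumes R's default quantile definition (type 7, linear interpolation between order statistics); `quantile7` and `six_number_summary` are hypothetical names, and the input values are made up.

```ruby
# R's default (type 7) quantile: linear interpolation on sorted data.
def quantile7(sorted, p)
  h  = (sorted.length - 1) * p        # 0-based fractional position
  lo = h.floor
  hi = [lo + 1, sorted.length - 1].min
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

# The six statistics printed by R's summary() for a numeric vector.
def six_number_summary(values)
  x = values.sort
  {
    min:    x.first,
    q1:     quantile7(x, 0.25),
    median: quantile7(x, 0.5),
    mean:   x.sum.to_f / x.length,
    q3:     quantile7(x, 0.75),
    max:    x.last
  }
end

# For 1..8 this gives q1 = 2.75, median = 4.5, mean = 4.5, q3 = 6.25,
# the same values R's summary(1:8) reports.
p six_number_summary([1, 2, 3, 4, 5, 6, 7, 8])
```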

# Doing the Data Analysis

## Quick plot for seeing the data

Let's now create our first plot with the given data by accessing ggplot2 from Ruby.
For Rubyists who have never seen or used ggplot2, here is the description of ggplot
found on its home page:

> "ggplot2 is a system for declaratively creating graphics, based on _The Grammar of Graphics_.
> You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical
> primitives to use, and it takes care of the details."

This description might be a bit cryptic, and it is best to see ggplot2 at work to understand it.
Basically, in the _grammar of graphics_, developers add layers of components such as grid,
axis, data, title and subtitle, and also graphical primitives such as _bar plot_ and _box plot_,
to form the final graphics.
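
The layering idea, and the `+` operator used in the Ruby chunks that follow, can be sketched in plain Ruby. The `ToyPlot` class is hypothetical and draws nothing: each `+` returns a new plot with one more component appended and leaves the original untouched, which loosely mirrors how a ggplot object is extended layer by layer.

```ruby
# Hypothetical toy: a plot is data plus an immutable list of components.
class ToyPlot
  attr_reader :layers

  def initialize(data, layers = [])
    @data = data
    @layers = layers.freeze
  end

  # Adding a component returns a new plot; the receiver is not modified.
  def +(layer)
    ToyPlot.new(@data, @layers + [layer])
  end

  def to_s
    "plot of #{@data} with layers: #{@layers.join(' + ')}"
  end
end

base  = ToyPlot.new("ToothGrowth", ["aes(x = dose, y = len)"])
boxed = base + "geom_boxplot()" + "facet_grid(. ~ supp)"

puts base   # prints plot of ToothGrowth with layers: aes(x = dose, y = len)
puts boxed  # prints plot of ToothGrowth with layers: aes(x = dose, y = len) + geom_boxplot() + facet_grid(. ~ supp)
```

Because `base` is unchanged, it can be reused as the starting point for several different plots, a pattern the article later uses with `@base_tooth`.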

In order to make a plot, we apply the 'ggplot' function to the dataset. In R, this would be
written as ```ggplot(<dataset>, ...)```. Galaaz gives you the flexibility to use
either ```R.ggplot(<dataset>, ...)``` or ```<dataset>.ggplot(...)```. In the graph
specification below, we use the second notation, which looks more like Ruby.
ggplot uses the 'aes' method to specify the x and y axes; in this case, the 'dose'
on the $x$ axis and the length ('len') on the $y$ axis: 'E.aes(x: :dose, y: :len)'.
To specify the type of plot, add a geom to the plot. For a boxplot, the geom is
'R.geom_boxplot'.

Note also that we have a call to 'R.png' before plotting and 'R.dev__off' after the print
statement. 'R.png' opens a 'png device' for outputting the plot. If we do not pass a
name to the 'png' function, the image gets a default name of 'Rplot\<nnn\>', where
\<nnn\> is the number of the plot. 'R.dev__off' closes the device and creates the 'png'
file. We can then include the generated 'png' file in the document by adding an
rmarkdown image directive.

```{ruby dose_len}
require 'ggplot'

# open a 'png' device; the plot will be written to 'dose_len.png'
R.png("dose_len.png")
e = @tooth_growth.ggplot(E.aes(x: :dose, y: :len))
print e + R.geom_boxplot
# close the device, creating the 'png' file
R.dev__off
```

[//]: # (Including the 'png' file generated above. In future releases)
[//]: # (of gKnit, the figures should be automatically saved and the name)
[//]: # (taken from the chunk 'label' and possibly chunk parameters)

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/dose_len.png)

Great! We've just managed to create and save our first plot in Ruby with only
four lines of code. With this plot we can easily see a clear trend: as the
dose of the supplement increases, so does the length of the teeth.
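
To see what each box in this plot summarizes, the sketch below computes the box and whisker positions for one group of values. It assumes the convention documented for ggplot2's `geom_boxplot`: the box spans the first to the third quartile with a line at the median, the whiskers reach the most extreme values within 1.5 times the interquartile range (IQR) of the box, and anything beyond is drawn as an outlier. The helper names are hypothetical, quartiles use R's default type 7 definition, and the input values are made up.

```ruby
# R's default (type 7) quantile on sorted data.
def quantile7(sorted, p)
  h  = (sorted.length - 1) * p
  lo = h.floor
  hi = [lo + 1, sorted.length - 1].min
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

# Box, whiskers and outliers for one group, following the 1.5 * IQR rule.
def box_stats(values)
  x = values.sort
  q1, median, q3 = [0.25, 0.5, 0.75].map { |p| quantile7(x, p) }
  iqr = q3 - q1
  lower_fence = q1 - 1.5 * iqr
  upper_fence = q3 + 1.5 * iqr
  inside = x.select { |v| v >= lower_fence && v <= upper_fence }
  {
    lower_whisker: inside.min,
    q1: q1, median: median, q3: q3,
    upper_whisker: inside.max,
    outliers: x.reject { |v| v >= lower_fence && v <= upper_fence }
  }
end

stats = box_stats([2, 3, 4, 5, 6, 7, 8, 30])  # made-up values
p stats.values_at(:q1, :median, :q3)  # prints [3.75, 5.5, 7.25]
p stats[:outliers]                    # prints [30]
```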

## Facetting the plot

This first plot shows a trend, but our data has information about two different
delivery methods: orange juice (OJ) or ascorbic acid (VC).
Let's then try to create a plot that helps us discern the effect of each
delivery method. The next plot is a _facetted_ plot where each delivery method
gets its own plot. On the left side, the plot shows the OJ delivery method. On the
right side, we see the VC delivery method. To obtain this plot, we use the
'R.facet_grid' function, which automatically creates the facets based on the
delivery method factors. The parameter to the 'facet_grid' method is a
[_formula_](https://thomasleeper.com/Rcourse/Tutorials/formulae.html).

Galaaz gives programmers the flexibility to write formulas in two different ways.
In the first way, the following changes to the R formula syntax (for example 'x ~ y')
are necessary:

* R symbols are represented by the same Ruby symbol prefixed with the '+' method. The
  symbol ```x``` in R becomes ```+:x``` in Ruby;
* The '~' operator in R becomes '=~' in Ruby. The formula ```x ~ y``` in R is written as
  ```+:x =~ +:y``` in Ruby;
* The '.' symbol in R becomes '+:all'.

Another way of writing a formula is to use the 'formula' function with the actual formula as
a string. The formula ```x ~ y``` in R can be written as ```R.formula("x ~ y")```. For more
complex formulas, the use of the 'formula' function is preferred.
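
Ruby accepts the `+:x =~ +:y` spelling because unary `+` and `=~` are ordinary methods that a class can define. The sketch below is a hypothetical illustration of that mechanism only, not Galaaz's implementation: it adds a unary `+` to `Symbol` that wraps the symbol in a `FormulaTerm` (an assumed name), and defines `=~` on terms to build the text of the R formula, mapping `+:all` to R's `.`. The result is the same string one would pass to `R.formula`.

```ruby
# Hypothetical sketch of a formula DSL (not Galaaz's real code).
class FormulaTerm
  def initialize(name)
    @name = name
  end

  # +:all stands for R's '.' symbol.
  def to_s
    @name == :all ? "." : @name.to_s
  end

  # lhs =~ rhs builds the text of the R formula "lhs ~ rhs".
  def =~(other)
    "#{self} ~ #{other}"
  end
end

class Symbol
  # Unary plus: +:x wraps the symbol as a formula term.
  def +@
    FormulaTerm.new(self)
  end
end

puts(+:x =~ +:y)       # prints x ~ y
puts(+:all =~ +:supp)  # prints . ~ supp
```

Reopening `Symbol` globally keeps the sketch short; a library could instead scope such a method with a Ruby refinement.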

The formula ```+:all =~ +:supp``` indicates to the 'facet_grid' function that it needs to
facet the plot based on the ```supp``` variable and split the plot vertically. Changing
the formula to ```+:supp =~ +:all``` would split the plot horizontally.

```{ruby facet_by_delivery}
@base_tooth = @tooth_growth.ggplot(E.aes(x: :dose, y: :len, group: :dose))

@bp = @base_tooth + R.geom_boxplot +
      # Split in vertical direction
      R.facet_grid(+:all =~ +:supp)

puts @bp
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_delivery.png)

It now becomes clear that, although both delivery methods have a direct
impact on tooth growth, the OJ method is non-linear: it has a higher impact at smaller
doses of vitamin C, and its impact diminishes as the dose increases. With the
VC approach, the impact seems to be more linear.
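
Conceptually, faceting splits the rows by the levels of a factor and draws one panel per level. The plain-Ruby sketch below shows the split that `R.facet_grid(+:all =~ +:supp)` performs, then computes the median length per dose inside each panel, which is the line drawn in the middle of each box. It draws nothing; the helper and the rows are hypothetical, and the values are made up rather than taken from ToothGrowth.

```ruby
# Made-up rows with the same columns as ToothGrowth (values are illustrative).
rows = [
  { len: 10.0, supp: "OJ", dose: 0.5 }, { len: 14.0, supp: "OJ", dose: 0.5 },
  { len: 22.0, supp: "OJ", dose: 1.0 }, { len: 24.0, supp: "OJ", dose: 1.0 },
  { len:  6.0, supp: "VC", dose: 0.5 }, { len:  8.0, supp: "VC", dose: 0.5 },
  { len: 15.0, supp: "VC", dose: 1.0 }, { len: 17.0, supp: "VC", dose: 1.0 }
]

def median(values)
  x = values.sort
  mid = x.length / 2
  x.length.odd? ? x[mid] : (x[mid - 1] + x[mid]) / 2.0
end

# One "panel" per delivery method, as facet_grid does with supp.
panels = rows.group_by { |row| row[:supp] }

# Inside each panel, the median length for every dose (one box per dose).
medians = panels.transform_values do |panel_rows|
  panel_rows.group_by { |row| row[:dose] }
            .transform_values { |group| median(group.map { |row| row[:len] }) }
end

p medians
```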

## Adding Color

If this were an article about data analysis, we would analyze the trends more carefully
and improve the statistical analysis. But here we are interested in working with ggplot
in Ruby. So, let's add some color to this plot to make the trend and comparison more
visible. In the following plot, the boxes are color coded by dose. To add color, it is
enough to add ```fill: :dose``` to the boxplot's aesthetic. With this change, each 'dose'
factor level gets its own color.

```{ruby facets_by_delivery_color}
@bp = @bp + R.geom_boxplot(E.aes(fill: :dose))
puts @bp
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color.png)

Facetting helps us compare the general trends for each delivery method.
Adding color allows us to compare specifically how each dosage impacts tooth growth.
At the smallest dose, 0.5mg (red color), OJ performs better than VC. For 1mg (green color),
OJ is noticeably better than VC. For 2mg (blue color), OJ and VC have about the same median,
but OJ is less dispersed. From this very quick visual analysis, it seems that OJ is a
better delivery method than VC.
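
Mapping `fill` to a factor means each level is assigned one color from a palette, in level order. A minimal sketch: the three hex colors below are assumed to be ggplot2's default hue palette for three levels (reddish, green and blue, matching the colors described above), and `fill_colors` is a hypothetical helper, not a ggplot2 or Galaaz function.

```ruby
# Assumed: ggplot2's default hue palette for three levels.
DEFAULT_HUES = ["#F8766D", "#00BA38", "#619CFF"].freeze

# Give every distinct level a color, in sorted level order, and return
# the color used for each observation.
def fill_colors(values, palette = DEFAULT_HUES)
  levels = values.uniq.sort
  raise ArgumentError, "palette too small" if levels.length > palette.length
  color_of = levels.zip(palette).to_h
  values.map { |v| color_of[v] }
end

p fill_colors([2, 0.5, 1, 0.5])  # prints ["#619CFF", "#F8766D", "#00BA38", "#F8766D"]
```

Because colors follow level order rather than meaning, the default palette carries no interpretation of its own, which becomes relevant when preparing the final plot.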

## Clarifying the data

Boxplots give us a nice idea of the distribution of data, but looking at those plots with
large colored boxes leaves us wondering what else is going on. According to
Edward Tufte in _Envisioning Information_:

> Thin data rightly prompts suspicions: "What are they leaving out? Is that really everything
> they know? What are they hiding? Is that all they did?" Now and then it is claimed
> that vacant space is "friendly" (anthropomorphizing an inherently murky idea) but
> _it is not how much empty space there is, but rather how it is used. It is not how much
> information there is, but rather how effectively it is arranged._

And he states:

> A most unconventional design strategy is revealed: _to clarify, add detail._

Let's use this wisdom and add yet another layer of data to our plot, so that we clarify
it with detail and do not leave large empty boxes. In this next plot, we add data points for
each of the 60 guinea pigs in the experiment. For that, add the function 'R.geom_point' to the
plot.

```{ruby facets_with_points}
# Add a layer with one point per subject
@bp = @bp + R.geom_point

puts @bp
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_points.png)

Now we can see the actual distribution of all 60 subjects. Actually, this is not
entirely true: it is hard to make out all 60 subjects, because some points seem
to be placed on top of one another, hiding useful information.

But no sweat! Another layer might solve the problem. In the following plot, a new layer
called 'geom_jitter' is added to the plot. Jitter adds a small amount of random variation
to the location of each point, and is a useful way of handling overplotting caused by
discreteness in smaller datasets. This makes it easier to see all of the points and
prevents data from being hidden. We also add color and change the shape of the points,
making them even easier to see.
|
369
|
-
|
370
|
-
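To make the idea of jitter concrete, here is a minimal plain-Ruby sketch (no galaaz, no R) of what jittering does to the x positions of points that share a discrete category. The method name `jitter` and its parameters are hypothetical. The only thing borrowed from ggplot2 is the default amount: its documentation states that, when no width is given, the jitter defaults to 40% of the data resolution, which is 0.4 for categories placed at 1, 2, 3.

```ruby
# Plain-Ruby sketch of jittering (not galaaz or ggplot2 code).
# Each x value is moved by a uniform random amount in [-width, width],
# so points that share the same x no longer sit exactly on top of each other.
def jitter(values, width: 0.4, rng: Random.new(42))
  values.map { |x| x + rng.rand(-width..width) }
end

# Three guinea pigs in category 1 and three in category 2, all drawn at the same x
x = [1, 1, 1, 2, 2, 2]
jittered = jitter(x)

jittered.each_with_index do |jx, i|
  puts format("original x = %d, jittered x = %.3f", x[i], jx)
end
```

Passing a seeded `Random` keeps the picture reproducible between runs; ggplot2's `position_jitter` offers a `seed` argument for the same purpose.
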
```{ruby facets_with_jitter}
# Add a jitter layer: points are slightly displaced to reduce overplotting
puts @bp + R.geom_jitter(shape: 23, color: "cyan3", size: 1)
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_jitter.png)

Now we can see all 60 points in the graph. The information density is much higher,
and we can see outliers and how the subjects are distributed.

# Preparing the Plot for Presentation

We have come a long way since our first plot. As we already said, this is not
an article about data analysis; the focus is on the
integration of Ruby and ggplot. So, let's assume that the analysis is now done. Yet,
finishing the analysis does not mean that the work is done. On the contrary, the hardest
part is yet to come!

After the analysis, we need to communicate its results by making a final plot for
presentation. The last plot has all the information we want to share, but it is not very
pleasing to the eye.

## Improving Colors

Let's start by improving the colors. For now, we will not use the jitter layer.
The previous plot has three bright colors that have no relationship between them. Is
there any obvious (or, for that matter, non-obvious) interpretation for the colors?
Clearly, they are just default colors selected automatically by our software. Although
those colors helped us understand the data, in a final presentation arbitrary colors
can distract the viewer.

In the following plot, we use the function 'scale_fill_manual' to change
the colors of the boxes and the order of the legend labels. For colors, we use shades of blue for
each dosage, with light blue ('cyan')
representing the lowest dose and deep blue ('deepskyblue4') the highest dose.
The legend can also be improved: we use the 'breaks' parameter to put
the smallest value (0.5) at the bottom of the legend and the largest (2) at the top.
This ordering seems more natural and
matches the actual order of the colors in the plot.

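The interplay between `values` and `breaks` can be modeled in a few lines of plain Ruby. This is a sketch of the behavior, not galaaz or ggplot2 code, and it rests on one assumption taken from the ggplot2 documentation: unnamed `values` are matched, in order, to the limits of the scale (here the factor levels '0.5', '1', '2'), while `breaks` only decides the order of the legend entries.

```ruby
# Plain-Ruby model of scale_fill_manual (illustration only).
levels = ["0.5", "1", "2"]                        # factor levels of dose
values = ["cyan", "deepskyblue", "deepskyblue4"]  # colors, matched to levels in order
breaks = ["2", "1", "0.5"]                        # legend order, top to bottom

# Fill color used for each dose in the plot
fill = levels.zip(values).to_h

# Legend entries, listed in the order given by breaks
legend = breaks.map { |b| [b, fill[b]] }

legend.each { |dose, color| puts "#{dose.rjust(3)} -> #{color}" }
```

In this model, changing `breaks` reorders the legend without changing which color each dose receives.
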
```{ruby facets_by_delivery_color2}
@bp = @bp +
      R.scale_fill_manual(values: R.c("cyan", "deepskyblue", "deepskyblue4"),
                          breaks: R.c("2", "1", "0.5"))

puts @bp
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color2.png)

## Violin Plot and Jitter

The boxplot with jitter looked a bit overwhelming. The next plot uses a variation of
the boxplot known as a _violin plot_, combined with jittered data.

From [Wikipedia](https://en.wikipedia.org/wiki/Violin_plot):

> A violin plot is a method of plotting numeric data. It is similar to a box plot with
> a rotated kernel density plot on each side.
>
> A violin plot has four layers. The outer shape represents all possible results, with
> thickness indicating how common. (Thus the thickest section represents the mode average.)
> The next layer inside represents the values that occur 95% of the time.
> The next layer (if it exists) inside represents the values that occur 50% of the time.
> The central dot represents the median average value.

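The "rotated kernel density plot" in the quote is a kernel density estimate. As a rough illustration, here is a plain-Ruby sketch of a Gaussian kernel density estimate, the kind of curve a violin outline is drawn from. The method name `gaussian_kde`, the sample values and the fixed bandwidth of 2.0 are assumptions made for illustration; ggplot2 chooses its own bandwidth (by default with R's `nrd0` rule).

```ruby
# Plain-Ruby sketch of a Gaussian kernel density estimate (illustration only).
# density(y) = 1 / (n * h) * sum_i K((y - y_i) / h), with K the standard normal pdf.
def gaussian_kde(samples, bandwidth)
  n = samples.size.to_f
  norm = 1.0 / Math.sqrt(2 * Math::PI)
  lambda do |y|
    samples.sum { |yi| norm * Math.exp(-0.5 * ((y - yi) / bandwidth)**2) } / (n * bandwidth)
  end
end

# Hypothetical tooth lengths for one group
lengths = [15.2, 17.3, 21.5, 22.4, 23.6, 25.8, 26.4]
density = gaussian_kde(lengths, 2.0)

# A violin is widest where the density is highest
(10..30).step(5) { |y| puts format("y = %2d  density = %.4f", y, density.call(y)) }
```

Mirroring this curve on both sides of a vertical axis gives the violin shape; a wider bandwidth smooths the outline, a narrower one follows the individual points more closely.
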
```{ruby violin_with_jitter}
@violin = @base_tooth + R.geom_violin(E.aes(fill: :dose)) +
          R.facet_grid(+:all =~ +:supp) +
          R.geom_jitter(shape: 23, color: "cyan3", size: 1) +
          R.scale_fill_manual(values: R.c("cyan", "deepskyblue", "deepskyblue4"),
                              breaks: R.c("2", "1", "0.5"))

puts @violin
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/violin_with_jitter.png)

This plot is an alternative to the original boxplot. For the final presentation, it is
important to think about which graphics will be best understood by our audience. A violin plot
is a less well-known plot and could add mental overhead; yet, in my opinion, it looks a little
bit better than the boxplot and provides even more information than the boxplot with jitter.

## Adding Decoration

Our final plot is starting to take shape, but a presentation plot should have at least a
title, labels on the axes and maybe some other decorations. Let's start adding those.
Since decoration requires more graph area, this new plot has a 'width' and 'height'
specification. When there is no specification, R's default width and
height are 480 pixels.

The 'labs' function adds the required decorations. In this example we use 'title',
'subtitle', 'x' for the $x$ axis label, 'y' for the $y$ axis label, and 'caption'
for information about the plot (for clarity, we define a caption variable using Ruby's
heredoc syntax).

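Ruby offers more than one heredoc flavor, and the choice affects the caption text. The chunks in this article use `<<-EOT`, which allows the closing `EOT` to be indented but keeps any leading spaces on the text lines; the "squiggly" heredoc `<<~EOT` (available since Ruby 2.3) also strips the common indentation. A small plain-Ruby comparison, with variable names chosen only for illustration:

```ruby
# '<<-' allows an indented terminator but keeps the indentation of the text
dash = <<-EOT
  Length of odontoblasts in 60 guinea pigs.
  EOT

# '<<~' also removes the common leading indentation from every line
squiggly = <<~EOT
  Length of odontoblasts in 60 guinea pigs.
EOT

p dash
p squiggly
```

With `<<-`, any indentation in the heredoc body becomes part of the caption string, so `<<~` can be the safer choice when the body is indented for readability.
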
```{ruby facets_with_decorations, dev = "png", fig.width = 540, fig.height = 560, units = "px"}
caption = <<-EOT
Length of odontoblasts in 60 guinea pigs.
Each animal received one of three dose levels of vitamin C.
EOT

@decorations =
  R.labs(title: "Tooth Growth: Length vs Vitamin C Dose",
         subtitle: "Faceted by delivery method, OJ or VC",
         x: "Dose (mg)", y: "Teeth length",
         caption: caption)

puts @bp + @decorations
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/facets_with_decorations.png)

## The Corp Theme

We are almost done, but the default plot configuration still does not look
nice to the eye. We are still distracted
by many aspects of the graph. First, the black font color does not look good. Then the
plot background, borders and grids all add clutter to the plot.

We will now define our corporate theme in a module that can be loaded and used by all
plots, similar to CSS or any other style definition.

In this theme, we remove borders and grids. The
background is left for faceted plots but removed for non-faceted plots. Font colors are
a shade of blue (color: '#000080'). Axis titles are moved near the end of the axis and
written in bold.

```{ruby coorp_theme}
module CorpTheme

  R.install_and_loads 'RColorBrewer'

  #---------------------------------------------------------------------------------
  # Builds a text element in the corporate color (navy, '#000080').
  # face can be "plain", "bold", "italic" or "bold.italic"
  #---------------------------------------------------------------------------------

  def self.text_element(size, face: "plain", hjust: nil)
    E.element_text(color: "#000080",
                   face: face,
                   size: size,
                   hjust: hjust)
  end

  #---------------------------------------------------------------------------------
  # Defines the plot theme (visualization). In this theme we remove major and minor
  # grids, borders and background. We also turn off scientific notation.
  # 'faceted' is a keyword argument, so calls such as global_theme(faceted: true)
  # set the flag as intended.
  #---------------------------------------------------------------------------------

  def self.global_theme(faceted: false)

    R.options(scipen: 999)  # turn off scientific notation like 1e+48

    # remove major grids
    gb = R.theme(panel__grid__major: E.element_blank)
    # remove minor grids
    gb = gb + R.theme(panel__grid__minor: E.element_blank)
    # remove border
    gb = gb + R.theme(panel__border: E.element_blank)
    # remove background. When working with faceted graphs, the background makes
    # it easier to see each facet, so leave it
    gb = gb + R.theme(panel__background: E.element_blank) unless faceted
    # change axis font
    gb = gb + R.theme(axis__text: text_element(8))
    # change axis title font
    gb = gb + R.theme(axis__title: text_element(10, face: "bold", hjust: 1))
    # change font of title
    gb = gb + R.theme(title: text_element(12, face: "bold"))
    # change font of subtitle
    gb = gb + R.theme(plot__subtitle: text_element(9))
    # change font of captions
    gb = gb + R.theme(plot__caption: text_element(8))

    gb  # return the combined theme
  end

end
```

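`CorpTheme.global_theme` is called as `global_theme(faceted: true)`. In Ruby, that call only behaves as a boolean flag if `faceted` is declared as a keyword argument. With a positional default (`def global_theme(faceted = false)`), Ruby passes the whole Hash `{faceted: true}` as the positional argument, and any Hash is truthy, so even `faceted: false` would keep the background. A plain-Ruby sketch of the difference, using hypothetical method names:

```ruby
# Positional optional argument: a keyword-style call arrives as a Hash
def flag_positional(faceted = false)
  !!faceted  # any Hash is truthy, so this is true even for faceted: false
end

# Keyword argument: the call sets the flag as intended
def flag_keyword(faceted: false)
  !!faceted
end

p flag_positional(faceted: false)  # prints true
p flag_keyword(faceted: false)     # prints false
```
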
## Final Box Plot

We can now easily make our final boxplot and violin plot. All the layers were added
step by step, to expose our understanding of the data and to meet the need to present
the results to our audience.

The final specification is just the sum of all the layers built up to this point (@bp), plus
the decorations (@decorations), plus the corporate theme.

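Because each step added one more layer with `+`, the final plot is the sum of everything built so far. The toy class below is a hypothetical, plain-Ruby model of that idea (it is not how galaaz or ggplot2 represent plots): `+` returns a new plot with the extra layers appended, so intermediate specifications stay reusable.

```ruby
# Toy model of a layered plot specification (illustration only).
class ToyPlot
  attr_reader :layers

  def initialize(layers = [])
    @layers = layers.freeze
  end

  # Adding a layer (or another plot's layers) returns a new plot;
  # the receiver is left unchanged
  def +(other)
    extra = other.is_a?(ToyPlot) ? other.layers : [other]
    ToyPlot.new(layers + extra)
  end

  def to_s
    layers.join(" + ")
  end
end

bp          = ToyPlot.new(["ggplot", "geom_boxplot", "facet_grid", "scale_fill_manual"])
decorations = ToyPlot.new(["labs"])
theme       = "corp_theme"

final = bp + decorations + theme
puts final
```

A non-destructive `+` mirrors how the article can print `@bp + R.geom_jitter(...)` without changing `@bp`, which is why the final boxplot comes out without jitter.
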
Here is our final boxplot, without jitter.

```{ruby final_box_plot}
puts @bp + @decorations + CorpTheme.global_theme(faceted: true)
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_box_plot.png)

And here is the final violin plot, with jitter and the same look and feel as the corporate
boxplot.

```{ruby final_violin_plot}
puts @violin + @decorations + CorpTheme.global_theme(faceted: true)
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_violin_plot.png)

## Another View

We now make another plot, with the same look and feel as before but faceted by
dose instead of by supplement. This shows how easy it is to create new plots by just
changing a few small statements in the _grammar of graphics_.

```{ruby facet_by_dose}
caption = <<-EOT
Length of odontoblasts in 60 guinea pigs.
Each animal received one of three dose levels of vitamin C.
EOT

@bp = @tooth_growth.ggplot(E.aes(x: :supp, y: :len, group: :supp)) +
      R.geom_boxplot(E.aes(fill: :supp)) + R.facet_grid(+:all =~ +:dose) +
      R.scale_fill_manual(values: R.c("cyan", "deepskyblue4")) +
      R.labs(title: "Tooth Growth: Length by Dose",
             subtitle: "Faceted by dose",
             x: "Delivery method", y: "Teeth length",
             caption: caption) +
      CorpTheme.global_theme(faceted: true)

puts @bp
```

![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_dose.png)

# Conclusion

In this article, we introduced Galaaz and showed how to tightly couple Ruby and R
in a way that Ruby developers do not need to be aware
of the executing R engine. For the Ruby developer, the existence of R
is of no consequence: she is just coding in Ruby. On the other hand, for the R
developer, migration to Ruby is a matter of small syntactic changes with a very gentle
learning curve. As the R developer becomes more proficient in Ruby, he can start using
classes, modules, procs and lambdas.

Trying to bring the power of R to Ruby by starting from scratch would be an enormous endeavor
that would probably never be accomplished. Today's data scientists would certainly
stick with either Python or R. Now, both the Ruby and R communities can benefit
from this marriage, provided by Galaaz on top of GraalVM and Truffle's
polyglot environment. We presented
the process to couple Ruby and R, but the same process can be used to couple Ruby
and JavaScript or Ruby and Python. In a polyglot world, a *uniglot* language might
be extremely relevant.

From the perspective of performance, GraalVM and Truffle promise speedups that could
exceed 10 times, both for [FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
and for [TruffleRuby](https://rubykaigi.org/2018/presentations/eregontp.html).

This article has shown how to improve a plot step by step. Starting from a very simple
boxplot with all default configurations, we moved slowly to our final plot. The important
point here is not whether the final plot is actually beautiful (as beauty is in the eye of
the beholder), but that there is a process of small, incremental improvements that can be
followed to get a final plot ready for presentation.

Finally, this whole article was written in R Markdown and compiled to HTML by _gknit_, an
application that wraps _knitr_ and allows documenting Ruby code. This application can
be of great help for any Rubyist trying to write articles, blogs or documentation for Ruby.

# Installing Galaaz

## Prerequisites

* GraalVM (>= rc8): https://github.com/oracle/graal/releases
* TruffleRuby
* FastR

The following R packages will be automatically installed when necessary, but could be installed prior
to using gKnit if desired:

* ggplot2
* gridExtra
* knitr

Installation of R packages requires a development environment and can be time-consuming. In Linux,
the GNU compiler and tools should be enough. I am not sure what is needed on the Mac.

## Preparation

* `gem install galaaz`

## Usage

* `gknit <filename>`
* In a script, add: `require 'galaaz'`