galaaz 0.4.2 → 0.4.5
- checksums.yaml +4 -4
- data/LICENSE +25 -0
- data/Rakefile +8 -0
- data/bin/gknit +9 -5
- data/bin/gstudio +4 -2
- data/bin/gstudio.rb +32 -2
- data/blogs/dev/dev.html +219 -34
- data/blogs/dev/dev.md +26 -26
- data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
- data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
- data/blogs/dplyr/dplyr.rb +63 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +38 -26
- data/blogs/galaaz_ggplot/galaaz_ggplot.aux +16 -17
- data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
- data/blogs/galaaz_ggplot/galaaz_ggplot.tex +65 -31
- data/blogs/oh_my/not_so.rb +2342 -0
- data/blogs/oh_my/oh_my.Rmd +493 -0
- data/blogs/oh_my/oh_my.html +680 -0
- data/blogs/oh_my/oh_my.md +597 -0
- data/blogs/oh_my/old.Rmd +2100 -0
- data/blogs/ruby_plot/figures/facets_with_decorations.png +0 -0
- data/blogs/ruby_plot/figures/facets_with_jitter.png +0 -0
- data/blogs/ruby_plot/figures/final_box_plot.png +0 -0
- data/blogs/ruby_plot/figures/final_violin_plot.png +0 -0
- data/blogs/ruby_plot/figures/violin_with_jitter.png +0 -0
- data/blogs/ruby_plot/ruby_plot.Rmd +147 -122
- data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +662 -0
- data/blogs/ruby_plot/ruby_plot.html +49 -54
- data/blogs/ruby_plot/ruby_plot.md +147 -122
- data/blogs/ruby_plot/ruby_plot.pdf +0 -0
- data/blogs/ruby_plot/ruby_plot.tex +776 -157
- data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +57 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +106 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +110 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +174 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +236 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +296 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +236 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +218 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +128 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +150 -0
- data/examples/islr/ch2.spec.rb +21 -18
- data/examples/islr/ch3_boston.rb +14 -5
- data/examples/islr/ch3_multiple_regression.rb +2 -3
- data/examples/islr/ch6.spec.rb +1 -1
- data/examples/islr/x_y_rnorm.jpg +0 -0
- data/lib/R_interface/r.rb +14 -10
- data/lib/R_interface/r_libs.R +9 -0
- data/lib/R_interface/r_methods.rb +77 -6
- data/lib/R_interface/{expression.rb → r_module_s.rb} +13 -14
- data/lib/R_interface/rbinary_operators.rb +58 -71
- data/lib/R_interface/rdata_frame.rb +2 -1
- data/lib/R_interface/rdevices.R +4 -0
- data/lib/R_interface/rdevices.rb +1 -1
- data/lib/R_interface/renvironment.rb +34 -1
- data/lib/R_interface/rexpression.rb +108 -2
- data/lib/R_interface/rindexed_object.rb +3 -1
- data/lib/R_interface/rlanguage.rb +18 -2
- data/lib/R_interface/rmatrix.rb +14 -0
- data/lib/R_interface/rmd_indexed_object.rb +5 -1
- data/lib/R_interface/robject.rb +61 -23
- data/lib/R_interface/rsupport.rb +111 -53
- data/lib/R_interface/rsymbol.rb +6 -5
- data/lib/R_interface/ruby_extensions.rb +130 -4
- data/lib/R_interface/runary_operators.rb +35 -3
- data/lib/R_interface/rvector.rb +1 -0
- data/lib/galaaz.rb +0 -2
- data/lib/gknit/knitr_engine.rb +58 -4
- data/lib/gknit/ruby_engine.rb +5 -6
- data/lib/util/exec_ruby.rb +55 -9
- data/specs/all.rb +13 -3
- data/specs/figures/dose_len.png +0 -0
- data/specs/r_dataframe.spec.rb +49 -26
- data/specs/r_environment.spec.rb +140 -0
- data/specs/r_eval.spec.rb +0 -15
- data/specs/r_formula.spec.rb +232 -0
- data/specs/r_function.spec.rb +7 -8
- data/specs/r_list.spec.rb +4 -0
- data/specs/r_list_apply.spec.rb +11 -11
- data/specs/r_matrix.spec.rb +3 -3
- data/specs/{r_plots.spec.rb~ → r_nse.spec.rb} +29 -6
- data/specs/r_vector_creation.spec.rb +6 -0
- data/specs/r_vector_object.spec.rb +2 -2
- data/specs/r_vector_operators.spec.rb +3 -3
- data/specs/r_vector_subsetting.spec.rb +4 -4
- data/specs/ruby_expression.spec.rb +324 -0
- data/specs/tmp.rb +12 -524
- data/sty/galaaz.sty +71 -0
- data/version.rb +1 -1
- metadata +31 -41
- data/bin/gknit2~ +0 -6
- data/bin/ogk~ +0 -4
- data/bin/prepareR.rb~ +0 -1
- data/blogs/dev/dev.Rmd~ +0 -104
- data/blogs/galaaz_ggplot/galaaz_ggplot.dvi +0 -0
- data/blogs/galaaz_ggplot/midwest_external_png~ +0 -1
- data/blogs/gknit/gknit.Rmd~ +0 -184
- data/blogs/gknit/gknit.Rnd~ +0 -17
- data/blogs/gknit/model.rb~ +0 -46
- data/blogs/ruby_plot/ruby_plot.Rmd~ +0 -215
- data/examples/islr/Figure.jpg +0 -0
- data/examples/misc/moneyball.rb~ +0 -16
- data/examples/misc/subsetting.rb~ +0 -372
- data/lib/R/eng_ruby.R~ +0 -63
- data/lib/R_interface/capture_plot.rb~ +0 -23
- data/lib/R_interface/r.rb~ +0 -121
- data/lib/R_interface/rdevices.rb~ +0 -27
- data/lib/gknit.rb~ +0 -26
- data/lib/gknit/knitr_engine.rb~ +0 -102
- data/lib/gknit/ruby_engine.rb~ +0 -72
- data/lib/util/inline_file.rb~ +0 -23
- data/r_requires/knitr.rb~ +0 -4
- data/specs/r_language.spec.rb +0 -157
Binary file
|
Binary file
|
Binary file
|
Binary file
|
Binary file
|
@@ -1,6 +1,8 @@
|
|
1
1
|
---
|
2
2
|
title: "How to make Beautiful Ruby Plots with Galaaz"
|
3
|
-
author:
|
3
|
+
author:
|
4
|
+
- "Rodrigo Botafogo"
|
5
|
+
- "Daniel Mossé - University of Pittsburgh"
|
4
6
|
tags: [Tech, Data Science, Ruby, R, GraalVM]
|
5
7
|
date: "November 19th, 2018"
|
6
8
|
output:
|
@@ -9,27 +11,29 @@ output:
|
|
9
11
|
keep_md: true
|
10
12
|
pdf_document:
|
11
13
|
includes:
|
12
|
-
in_header:
|
14
|
+
in_header: "../../sty/galaaz.sty"
|
15
|
+
keep_tex: yes
|
13
16
|
number_sections: yes
|
17
|
+
toc: true
|
18
|
+
toc_depth: 2
|
19
|
+
fontsize: 11pt
|
14
20
|
---
|
15
21
|
|
16
22
|
```{r setup, echo=FALSE}
|
17
23
|
|
18
24
|
```
|
19
25
|
|
20
|
-
# Introduction
|
21
|
-
|
22
26
|
According to Wikipedia "Ruby is a dynamic, interpreted, reflective, object-oriented,
|
23
27
|
general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro
|
24
28
|
"Matz" Matsumoto in Japan." It reached high popularity with the development of Ruby on Rails
|
25
29
|
(RoR) by David Heinemeier Hansson. RoR is a web application framework first released
|
26
30
|
around 2005. It makes extensive use of Ruby's metaprogramming features. With RoR,
|
27
31
|
Ruby became very popular. According to [Ruby's Tiobe index](https://www.tiobe.com/tiobe-index/ruby/)
|
28
|
-
it peeked in popularity around 2008
|
29
|
-
|
30
|
-
|
32
|
+
it peaked in popularity around 2008, then declined until 2015 when it started picking up again.
|
33
|
+
At the time of this writing (November 2018), the Tiobe index ranks Ruby as the 16th
|
34
|
+
most popular language.
|
31
35
|
|
32
|
-
Python, a similar
|
36
|
+
Python, a language similar to Ruby, ranks 4th in the index. Java, C and C++ take the
|
33
37
|
first three positions. Ruby is often criticized for its focus on web applications.
|
34
38
|
But Ruby can do [much more](https://github.com/markets/awesome-ruby) than just web applications.
|
35
39
|
Yet, for scientific computing, Ruby lags way behind Python and R. Python has
|
@@ -39,7 +43,7 @@ of libraries for data analysis.
|
|
39
43
|
|
40
44
|
Until recently, there was no real perspective for Ruby to bridge this gap.
|
41
45
|
Implementing a complete scientific computing infrastructure would take too long.
|
42
|
-
|
46
|
+
Enter [Oracle's GraalVM](https://www.graalvm.org/):
|
43
47
|
|
44
48
|
> GraalVM is a universal virtual machine for running applications written in
|
45
49
|
> JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin,
|
@@ -70,38 +74,40 @@ Comes GraalVM into the picture:
|
|
70
74
|
As stated above, GraalVM is a _universal_ virtual machine that allows Ruby and R (and other
|
71
75
|
languages) to run on the same environment. GraalVM allows polyglot applications to
|
72
76
|
_seamlessly_ interact with one another and pass values from one language to the other.
|
73
|
-
|
74
|
-
|
75
|
-
|
77
|
+
Although a great idea, GraalVM still requires application writers to know several languages.
|
78
|
+
To eliminate that requirement, we built Galaaz, a gem for Ruby, to tightly couple
|
79
|
+
Ruby and R and allow those languages to interact in a way that the user will be unaware
|
80
|
+
of such interaction. In other words, a Ruby programmer will be able to use all
|
81
|
+
the capabilities of R without knowing the R syntax.
|
76
82
|
|
77
|
-
Library wrapping is
|
83
|
+
Library wrapping is a usual way of bringing features from one language into another.
|
78
84
|
To improve performance, Python often wraps more efficient C libraries. For the
|
79
|
-
Python developer, the existence of such C libraries is
|
85
|
+
Python developer, the existence of such C libraries is hidden. The problem with
|
80
86
|
library wrapping is that for any new library, there is the need to handcraft a new
|
81
87
|
wrapper.
|
82
88
|
|
83
|
-
Galaaz, instead of wrapping a single C or R library, wraps the whole
|
84
|
-
|
85
|
-
Ruby developers
|
86
|
-
new wrapping effort.
|
89
|
+
Galaaz, instead of wrapping a single C or R library, wraps the whole R language
|
90
|
+
in Ruby. By doing so, thousands of R libraries are available immediately
|
91
|
+
to Ruby developers without any new wrapping effort.
|
87
92
|
|
88
|
-
|
89
|
-
|
90
|
-
migrating from R to Ruby with Galaaz is a matter of small
|
91
|
-
|
92
|
-
|
93
|
+
To show the power of Galaaz, we show in this article how Ruby can use R's ggplot2
|
94
|
+
library transparently, bringing to Ruby the power of high quality scientific plotting.
|
95
|
+
We also show that migrating from R to Ruby with Galaaz is a matter of small
|
96
|
+
syntactic changes. By using Ruby, the R developer can use all of Ruby's powerful
|
97
|
+
object-oriented features. Also, with Ruby, it becomes much easier to move code
|
98
|
+
from the analysis phase to the production phase.
|
93
99
|
|
94
|
-
In this article we will explore the R ToothGrowth dataset.
|
100
|
+
In this article we will explore the R ToothGrowth dataset. To illustrate, we will
|
95
101
|
create some boxplots. A primer on boxplot is available in
|
96
102
|
[this article](https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51).
|
97
103
|
|
98
104
|
We will also create a Corporate Template ensuring that plots will have a consistent
|
99
|
-
visualization. This template is
|
105
|
+
visualization. This template is built using a Ruby module. There is a way of building
|
100
106
|
ggplot themes that will work the same as the Ruby module. Yet, writing a new theme
|
101
|
-
requires specific knowledge. Ruby modules are standard to the
|
102
|
-
need special knowledge.
|
107
|
+
requires specific knowledge on theme writing. Ruby modules are standard to the
|
108
|
+
language and don't need special knowledge.
|
103
109
|
|
104
|
-
|
110
|
+
[Here](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021) we show a scatter plot in Ruby also with Galaaz.
|
105
111
|
|
106
112
|
# gKnit
|
107
113
|
|
@@ -125,18 +131,18 @@ class and instance variables ('@'), are available in all chunks.
|
|
125
131
|
# Exploring the Dataset
|
126
132
|
|
127
133
|
Let's start by exploring our selected dataset. ToothGrowth is an R dataset. A dataset
|
128
|
-
is like
|
134
|
+
is like a simple Excel spreadsheet, in which each column has only one type of data.
|
129
135
|
For instance one column can have float, the other integer, and a third strings.
|
130
|
-
This dataset
|
136
|
+
This dataset analyzes the length of odontoblasts (cells responsible for tooth growth)
|
131
137
|
in 60 guinea pigs, where each animal received one of three dose levels of Vitamin C
|
132
|
-
(0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice
|
138
|
+
(0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid
|
133
139
|
(a form of vitamin C and coded as VC).
|
134
140
|
|
135
141
|
The ToothGrowth dataset contains three columns: 'len', 'supp' and 'dose'. Let's
|
136
|
-
take a look at a few rows of this dataset. In Galaaz,
|
137
|
-
|
138
|
-
following chunk that
|
139
|
-
|
142
|
+
take a look at a few rows of this dataset. In Galaaz, R variables are accessed
|
143
|
+
by using the corresponding Ruby symbol preceded by the tilde ('~') function. Note in the
|
144
|
+
following chunk that 'ToothGrowth' is the R variable and Ruby's '@tooth_growth' is
|
145
|
+
assigned the value of '~:ToothGrowth'.
|
140
146
|
|
141
147
|
```{ruby head}
|
142
148
|
# Read the R ToothGrowth variable and assign it to the
|
@@ -158,7 +164,8 @@ in usual Ruby style.
|
|
158
164
|
puts @tooth_growth.len.head
|
159
165
|
```
|
160
166
|
|
161
|
-
The 'dose' column contains a numeric value
|
167
|
+
The 'dose' column contains a numeric value of either 0.5, 1 or 2, although the
|
168
|
+
first 6 rows as seen above only contain the 0.5 values. Even though those are
|
162
169
|
numbers, they are better interpreted as a [factor or category](https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/). So, let's convert our 'dose' column from numeric to 'factor'.
|
163
170
|
In R, the function 'as.factor' is used to convert data in a vector to factors. To use this
|
164
171
|
function from Galaaz the dot ('.') in the function name is substituted by '__' (double underline).
|
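The naming rule described above can be sketched in plain Ruby. This is only an illustration of the convention, not Galaaz's implementation, and the helper name `ruby_name_for` is hypothetical.

```ruby
# Hypothetical helper (not part of Galaaz): shows how an R function name
# maps to the method name used from Ruby, with each dot replaced by '__'.
def ruby_name_for(r_function_name)
  r_function_name.gsub(".", "__")
end

puts ruby_name_for("as.factor")
puts ruby_name_for("dev.off")
```

With Galaaz loaded, names of this shape are the ones called on `R`, as in the `R.dev__off` call used in the plotting chunks of this article.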
@@ -178,8 +185,9 @@ puts @tooth_growth.dim
|
|
178
185
|
|
179
186
|
This dataset has 60 rows, one for each subject and 3 columns, as we have already seen.
|
180
187
|
|
181
|
-
Note that we do not call 'puts' when using the 'str' function. This
|
182
|
-
return anything and prints the structure of the dataset
|
188
|
+
Note that we do not need to call 'puts' when using the 'str' function. This
|
189
|
+
function does not return anything and prints the structure of the dataset
|
190
|
+
as a side effect.
|
183
191
|
|
184
192
|
```{ruby str}
|
185
193
|
@tooth_growth.str
|
@@ -197,8 +205,9 @@ puts @tooth_growth.summary
|
|
197
205
|
|
198
206
|
## Quick plot for seeing the data
|
199
207
|
|
200
|
-
Let's now create our first plot with the given data by accessing ggplot2 from Ruby.
|
201
|
-
that have never seen or used ggplot2, here is the description of ggplot
|
208
|
+
Let's now create our first plot with the given data by accessing ggplot2 from Ruby.
|
209
|
+
For Rubyists that have never seen or used ggplot2, here is the description of ggplot
|
210
|
+
found in its home page:
|
202
211
|
|
203
212
|
> "ggplot2 is a system for declaratively creating graphics, based on _The Grammar of Graphics_.
|
204
213
|
> You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical
|
@@ -210,16 +219,20 @@ axis, data, title, subtitle and also graphical primitives such as _bar plot_, _b
|
|
210
219
|
to form the final graphics.
|
211
220
|
|
212
221
|
In order to make a plot, we apply the 'ggplot' function to the dataset. In R, this would be
|
213
|
-
written as ```ggplot(<dataset>, ...)```.
|
214
|
-
or ```<dataset>.ggplot(...)```. In the graph
|
215
|
-
|
216
|
-
|
217
|
-
|
222
|
+
written as ```ggplot(<dataset>, ...)```. Galaaz gives you the flexibility to use
|
223
|
+
either ```R.ggplot(<dataset>, ...)``` or ```<dataset>.ggplot(...)```. In the graph
|
224
|
+
specification below, we use the second notation
|
225
|
+
that looks more like Ruby. ggplot uses the ‘aes’ method to specify
|
226
|
+
x and y axes; in this case, the 'dose' on the $x$ axis and the 'length' on
|
227
|
+
the $y$ axis: 'E.aes(x: :dose, y: :len)'. To specify the type of plot add a geom to
|
228
|
+
the plot. For a boxplot, the geom is R.geom_boxplot.
|
218
229
|
|
219
230
|
Note also that we have a call to 'R.png' before plotting and 'R.dev__off' after the print
|
220
|
-
statement. 'R.png' opens a 'png
|
221
|
-
|
222
|
-
image gets a default name of 'Rplot\<nnn\>' where \<nnn\> is the number of the plot.
|
231
|
+
statement. 'R.png' opens a 'png device' for outputting the plot. If we do not pass a
|
232
|
+
name to the 'png' function, the
|
233
|
+
image gets a default name of 'Rplot\<nnn\>' where \<nnn\> is the number of the plot.
|
234
|
+
'R.dev__off'
|
235
|
+
closes the device and creates the 'png' file. We can
|
223
236
|
then include the generated 'png' file in the document by adding an rmarkdown directive.
|
224
237
|
|
225
238
|
```{ruby dose_len}
|
@@ -237,25 +250,29 @@ R.dev__off
|
|
237
250
|
[//]: # (of gKnit, the figures should be automatically saved and the name)
|
238
251
|
[//]: # (taken from the chunk 'label' and possibly chunk parameters)
|
239
252
|
|
240
|
-
![](
|
253
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/dose_len.png)
|
241
254
|
|
242
255
|
Great! We've just managed to create and save our first plot in Ruby with only
|
243
|
-
four lines of code. We can see with this plot a clear trend: as the
|
256
|
+
four lines of code. We can now easily see with this plot a clear trend: as the
|
257
|
+
dose of the supplement
|
244
258
|
is increased, so is the length of teeth.
|
245
259
|
|
246
260
|
## Facetting the plot
|
247
261
|
|
248
262
|
This first plot shows a trend, but our data has information about two different forms
|
249
|
-
of delivery method, either by Orange Juice
|
250
|
-
Let's then try to create a plot that
|
263
|
+
of delivery method, either by Orange Juice (OJ) or by Vitamin C (VC).
|
264
|
+
Let's then try to create a plot that helps us discern the effect of each
|
265
|
+
delivery method. This next
|
251
266
|
plot is a _facetted_ plot where each delivery method gets its own plot.
|
252
|
-
On the left side, the plot shows the OJ delivery method. On the right side,
|
253
|
-
VC delivery method. To obtain this plot, we use the
|
267
|
+
On the left side, the plot shows the OJ delivery method. On the right side,
|
268
|
+
we see the VC delivery method. To obtain this plot, we use the
|
269
|
+
'R.facet_grid' function, which
|
254
270
|
automatically creates the facets based on the delivery method factors. The parameter to
|
255
271
|
the 'facet_grid' method is a [_formula_](https://thomasleeper.com/Rcourse/Tutorials/formulae.html).
|
256
272
|
|
257
|
-
In Galaaz
|
258
|
-
|
273
|
+
In Galaaz we give programmers the flexibility to use two different ways to write formulas.
|
274
|
+
In the first way, the following changes from writing formulas (for example 'x ~ y')
|
275
|
+
in R are necessary:
|
259
276
|
|
260
277
|
* R symbols are represented by the same Ruby symbol prefixed with the '+' method. The
|
261
278
|
symbol ```x``` in R becomes ```+:x``` in Ruby;
|
@@ -285,7 +302,7 @@ puts @bp
|
|
285
302
|
R.dev__off
|
286
303
|
```
|
287
304
|
|
288
|
-
![](
|
305
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_delivery.png)
|
289
306
|
|
290
307
|
It now becomes clear that although both methods of delivery have a direct
|
291
308
|
impact on tooth growth, method (OJ) is non-linear having a higher impact with smaller
|
@@ -294,9 +311,9 @@ doses of ascorbic acid and reducing it's impact as the dose increases. With the
|
|
294
311
|
|
295
312
|
## Adding Color
|
296
313
|
|
297
|
-
If
|
298
|
-
|
299
|
-
in Ruby. So,
|
314
|
+
If we were writing about data analysis, we would make a better analysis of the trends and
|
315
|
+
improve the statistical analysis. But here we are interested in working with ggplot
|
316
|
+
in Ruby. So, let's add some color to this plot to make the trend and comparison more
|
300
317
|
visible. In the following plot, the boxes are color coded by dose. To add color, it is
|
301
318
|
enough to add ```fill: :dose``` to the aesthetic of boxplot. With this command each 'dose'
|
302
319
|
factor gets its own color.
|
@@ -310,20 +327,20 @@ puts @bp
|
|
310
327
|
R.dev__off
|
311
328
|
```
|
312
329
|
|
313
|
-
![](
|
330
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color.png)
|
314
331
|
|
315
|
-
Facetting helps us compare the general trends
|
316
|
-
Adding color allow us to compare specifically how each dosage impacts the
|
317
|
-
It is possible to observe that with smaller doses, up to 1mg,
|
318
|
-
than
|
332
|
+
Facetting helps us compare the general trends for each delivery method.
|
333
|
+
Adding color allows us to compare specifically how each dosage impacts the tooth growth.
|
334
|
+
It is possible to observe that with smaller doses, up to 1mg, OJ performs better
|
335
|
+
than VC (red color). For 2mg, both OJ and VC have the same median, but OJ is
|
319
336
|
less dispersed (blue color).
|
320
|
-
For 1mg (green color),
|
321
|
-
it seems that
|
337
|
+
For 1mg (green color), OJ is significantly better than VC. By this very quick
|
338
|
+
visual analysis, it seems that OJ is a better delivery method than VC.
|
322
339
|
|
323
340
|
## Clarifying the data
|
324
341
|
|
325
342
|
Boxplots give us a nice idea of the distribution of data, but looking at those plots with
|
326
|
-
large colored boxes leaves us wondering what is going on
|
343
|
+
large colored boxes leaves us wondering what else is going on. According to
|
327
344
|
Edward Tufte in Envisioning Information:
|
328
345
|
|
329
346
|
> Thin data rightly prompts suspicions: "What are they leaving out? Is that really everything
|
@@ -336,7 +353,7 @@ And he states:
|
|
336
353
|
|
337
354
|
> A most unconventional design strategy is revealed: _to clarify, add detail._
|
338
355
|
|
339
|
-
Let's
|
356
|
+
Let's use this wisdom and add yet another layer of data to our plot, so that we clarify
|
340
357
|
it with detail and do not leave large empty boxes. In this next plot, we add data points for
|
341
358
|
each of the 60 pigs in the experiment. For that, add the function 'R.geom_point' to the
|
342
359
|
plot.
|
@@ -352,15 +369,17 @@ puts @bp
|
|
352
369
|
R.dev__off
|
353
370
|
```
|
354
371
|
|
355
|
-
![](
|
372
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_points.png)
|
356
373
|
|
357
|
-
Now we can see the actual distribution of all the 60
|
374
|
+
Now we can see the actual distribution of all the 60 subjects. Actually, this is not
|
358
375
|
totally true. We have a hard time seeing all 60 subjects. It seems that some points
|
359
376
|
might be placed one over the other hiding useful information.
|
360
377
|
|
361
378
|
But no sweat! Another layer might solve the problem. In the following plot a new layer
|
362
|
-
called 'geom_jitter' is added to the plot.
|
363
|
-
the
|
379
|
+
called 'geom_jitter' is added to the plot. Jitter adds a small amount of random variation
|
380
|
+
to the location of each point, and is a useful way of handling overplotting caused by
|
381
|
+
discreteness in smaller datasets. This makes it easier to see all of the points and
|
382
|
+
prevents data hiding. We also add
|
364
383
|
color and change the shape of the points, making them even easier to see.
|
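To make the idea of jitter concrete, here is a minimal plain Ruby sketch of adding a small random offset to each value. It only illustrates the concept; in the plot, 'geom_jitter' does this work inside ggplot2. The method name `jitter`, the `width` default and the fixed seed are assumptions made for this sketch.

```ruby
# Illustration only: the method, its width and the seed are assumptions,
# not part of Galaaz or ggplot2.
def jitter(values, width: 0.1, rng: Random.new(42))
  # Shift every value by a random amount in the range -width..width
  values.map { |v| v + rng.rand(-width..width) }
end

doses = [0.5, 0.5, 0.5, 1.0, 1.0, 2.0]
jittered = jitter(doses)
jittered.each { |v| puts v.round(3) }
```

Points that shared the same dose no longer sit exactly on top of each other, which is why overplotted subjects become visible.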
365
384
|
|
366
385
|
```{ruby facets_with_jitter}
|
@@ -372,14 +391,14 @@ puts @bp + R.geom_jitter(shape: 23, color: "cyan3", size: 1)
|
|
372
391
|
R.dev__off
|
373
392
|
```
|
374
393
|
|
375
|
-
![](
|
394
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_jitter.png)
|
376
395
|
|
377
396
|
Now we can see all 60 points in the graph. We have here a much higher information density
|
378
397
|
and we can see outliers and the distribution of subjects.
|
379
398
|
|
380
399
|
# Preparing the Plot for Presentation
|
381
400
|
|
382
|
-
We have come a long way since our first plot. As
|
401
|
+
We have come a long way since our first plot. As we already said, this is not
|
383
402
|
an article about data analysis and the focus is on the
|
384
403
|
integration of Ruby and ggplot. So, let's assume that the analysis is now done. Yet,
|
385
404
|
ending the analysis does not mean that the work is done. On the contrary, the hardest
|
@@ -399,11 +418,12 @@ those colors helped us understand the data, for a final presentation random colo
|
|
399
418
|
can distract the viewer.
|
400
419
|
|
401
420
|
In the following plot we use the function 'scale_fill_manual' to change
|
402
|
-
the colors of the boxes and order of labels. For colors we use shades of blue for
|
421
|
+
the colors of the boxes and order of labels. For colors, we use shades of blue for
|
403
422
|
each dosage, with light blue ('cyan')
|
404
|
-
representing the lower dose and deep blue ('deepskyblue4') the higher dose.
|
405
|
-
the
|
406
|
-
the botton of the labels and (2) at the top.
|
423
|
+
representing the lower dose and deep blue ('deepskyblue4') the higher dose.
|
424
|
+
Also, the legend could be improved: we use the ‘breaks’ parameter to put
|
425
|
+
the smaller value (0.5) at the bottom of the labels and the largest (2) at the top.
|
426
|
+
This ordering seems more natural and
|
407
427
|
matches with the actual order of the colors in the plot.
|
408
428
|
|
409
429
|
```{ruby facets_by_delivery_color2}
|
@@ -418,7 +438,7 @@ puts @bp
|
|
418
438
|
R.dev__off
|
419
439
|
```
|
420
440
|
|
421
|
-
![](
|
441
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color2.png)
|
422
442
|
|
423
443
|
|
424
444
|
## Violin Plot and Jitter
|
@@ -452,7 +472,7 @@ puts @violin
|
|
452
472
|
R.dev__off
|
453
473
|
```
|
454
474
|
|
455
|
-
![](
|
475
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/violin_with_jitter.png)
|
456
476
|
|
457
477
|
This plot is an alternative to the original boxplot. For the final presentation, it is
|
458
478
|
important to think which graphics will be best understood by our audience. A violin plot
|
@@ -462,14 +482,15 @@ bit better than the boxplot and provides even more information than the boxplot
|
|
462
482
|
## Adding Decoration
|
463
483
|
|
464
484
|
Our final plot is starting to take shape, but a presentation plot should have at least a
|
465
|
-
title, labels on the
|
485
|
+
title, labels on the axes and maybe some other decorations. Let's start adding those.
|
466
486
|
Since decoration requires more graph area, this new plot has a 'width' and 'height'
|
467
|
-
specification. When there is no specification, the default values for width and
|
468
|
-
480.
|
487
|
+
specification. When there is no specification, the default values from R for width and
|
488
|
+
height are 480.
|
469
489
|
|
470
|
-
The 'labs' function adds
|
471
|
-
'x' for the $x$ axis label and 'y', for the $y$ axis label, and 'caption'
|
472
|
-
about the plot
|
490
|
+
The 'labs' function adds the required decoration. In this example we use 'title',
|
491
|
+
'subtitle', 'x' for the $x$ axis label and 'y', for the $y$ axis label, and 'caption'
|
492
|
+
for information about the plot (for clarity, we defined a caption variable using Ruby's
|
493
|
+
Here Doc style).
|
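As a plain Ruby sketch of the Here Doc approach mentioned above, the caption can be built first and then placed next to the other labels. The first caption line is a placeholder written for this sketch, the squiggly `<<~` form is an assumption about style, and the Hash stands in for the keyword arguments given to 'R.labs' so that the example runs without Galaaz.

```ruby
# A heredoc keeps a multi-line caption readable. The first caption line is
# a placeholder for this sketch; only the second appears in the article.
caption = <<~EOT
  Placeholder first line describing the experiment.
  Each animal received one of three dose levels of vitamin C.
EOT

# Plain Hash holding the same keys the article passes to 'R.labs'.
labels = {
  title: "Tooth Growth: Length vs Vitamin C Dose",
  subtitle: "Faceted by delivery method, OJ or VC",
  x: "Dose (mg)", y: "Teeth length",
  caption: caption
}

puts labels[:caption]
```

Defining the caption separately keeps the call that builds the decorations short and easy to scan.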
473
494
|
|
474
495
|
```{ruby facets_with_decorations}
|
475
496
|
R.png("figures/facets_with_decorations.png", width: 540, height: 560)
|
@@ -480,8 +501,8 @@ Each animal received one of three dose levels of vitamin C.
|
|
480
501
|
EOT
|
481
502
|
|
482
503
|
@decorations =
|
483
|
-
R.labs(title: "Tooth Growth: Length
|
484
|
-
subtitle: "Faceted by delivery method,
|
504
|
+
R.labs(title: "Tooth Growth: Length vs Vitamin C Dose",
|
505
|
+
subtitle: "Faceted by delivery method, OJ or VC",
|
485
506
|
x: "Dose (mg)", y: "Teeth length",
|
486
507
|
caption: caption)
|
487
508
|
|
@@ -490,16 +511,19 @@ puts @bp + @decorations
|
|
490
511
|
R.dev__off
|
491
512
|
```
|
492
513
|
|
493
|
-
![](
|
494
|
-
|
514
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/facets_with_decorations.png)
|
495
515
|
|
496
516
|
## The Corp Theme
|
497
517
|
|
498
|
-
We are almost done. But the plot does not yet look
|
518
|
+
We are almost done. But the default plot configuration does not yet look
|
519
|
+
nice to the eye. We are still distracted
|
499
520
|
by many aspects of the graph. First, the black font color does not look good. Then
|
500
521
|
plot background, borders, grids all add clutter to the plot.
|
501
522
|
|
502
|
-
We will now define our corporate theme.
|
523
|
+
We will now define our corporate theme in a module that can be used/loaded for all
|
524
|
+
plots, similar to CSS or any other style definition.
|
525
|
+
|
526
|
+
In this theme, we remove borders and grids. The
|
503
527
|
background is left for faceted plots but removed for non-faceted plots. Font colors are
|
504
528
|
a shade of blue (color: '#00080'). Axis labels are moved near the end of the axis and
|
505
529
|
written in 'bold'.
|
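The idea of keeping all style decisions in one Ruby module can be sketched without Galaaz. The module below is a hypothetical stand-in, not the article's 'CorpTheme': it returns a plain Hash instead of a ggplot theme, and the name `SketchTheme`, its keys and the font color value are assumptions made for illustration.

```ruby
# Hypothetical stand-in for a corporate style module. It returns a plain
# Hash instead of a ggplot theme, so it runs without Galaaz.
module SketchTheme
  FONT_COLOR = "navy" # placeholder value chosen for this sketch

  def self.settings(faceted: false)
    {
      borders: false,
      grids: false,
      background: faceted, # kept only for faceted plots
      font_color: FONT_COLOR,
      axis_title_face: "bold"
    }
  end
end

p SketchTheme.settings(faceted: true)
```

Because every plot asks the same module for its settings, a change made in one place reaches all plots, much like a shared style sheet.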
@@ -558,6 +582,13 @@ end
|
|
558
582
|
|
559
583
|
## Final Box Plot
|
560
584
|
|
585
|
+
We can now easily make our final boxplot and violin plot. All the layers for the plot were
|
586
|
+
added in order to expose our understanding of the data and the need to present the result
|
587
|
+
to our audience.
|
588
|
+
|
589
|
+
The final specification is just the addition of all layers built up to this point (@bp), plus
|
590
|
+
the decorations (@decorations), plus the corporate theme.
|
591
|
+
|
561
592
|
Here is our final boxplot, without jitter.
|
562
593
|
|
563
594
|
```{ruby final_box_plot}
|
@@ -568,11 +599,9 @@ puts @bp + @decorations + CorpTheme.global_theme(faceted: true)
|
|
568
599
|
R.dev__off
|
569
600
|
```
|
570
601
|
|
571
|
-
![](
|
602
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_box_plot.png)
|
572
603
|
|
573
|
-
|
574
|
-
|
575
|
-
Here is the final violin plot, with jitter and the same look and feel of the corporate
|
604
|
+
And here is the final violin plot, with jitter and the same look and feel of the corporate
|
576
605
|
boxplot.
|
577
606
|
|
578
607
|
```{ruby final_violin_plot}
|
@@ -583,12 +612,14 @@ puts @violin + @decorations + CorpTheme.global_theme(faceted: true)
|
|
583
612
|
R.dev__off
|
584
613
|
```
|
585
614
|
|
586
|
-
|
615
|
+
|
616
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_violin_plot.png)
|
587
617
|
|
588
618
|
## Another View
|
589
619
|
|
590
|
-
|
591
|
-
dose and not by supplement.
|
620
|
+
We now make another plot, with the same look and feel as before but facetted by
|
621
|
+
dose and not by supplement. This shows how easy it is to create new plots by just
|
622
|
+
changing a small statement in the _grammar of graphics_.
|
592
623
|
|
593
624
|
```{ruby facet_by_dose}
|
594
625
|
R.png("figures/facet_by_dose.png", width: 540, height: 560)
|
@@ -611,24 +642,26 @@ puts @bp
|
|
611
642
|
R.dev__off
|
612
643
|
```
|
613
644
|
|
614
|
-
![](
|
645
|
+
![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_dose.png)
|
615
646
|
|
616
647
|
# Conclusion
|
617
648
|
|
618
|
-
|
649
|
+
In this article, we introduce Galaaz and show how to tightly couple Ruby and R
|
650
|
+
in a way that Ruby developers do not need to be aware
|
619
651
|
of the executing R engine. For the Ruby developer the existence of R
|
620
|
-
is of no consequence
|
621
|
-
developer, migration to Ruby is a matter of small syntactic changes
|
652
|
+
is of no consequence: she is just coding in Ruby. On the other hand, for the R
|
653
|
+
developer, migration to Ruby is a matter of small syntactic changes with a very gentle
|
622
654
|
learning curve. As the R developer becomes more proficient in Ruby, he can start using
|
623
655
|
'classes', 'modules', 'procs', 'lambdas'.
|
624
656
|
|
625
|
-
|
626
|
-
|
627
|
-
|
628
|
-
|
629
|
-
|
630
|
-
|
631
|
-
|
657
|
+
Trying to bring to Ruby the power of R starting from scratch is an enormous endeavour
|
658
|
+
and would probably never be accomplished. Today's data scientists would certainly
|
659
|
+
stick with either Python or R. Now, both the Ruby and R communities can benefit
|
660
|
+
from this marriage, provided by Galaaz on top of GraalVM and Truffle's
|
661
|
+
polyglot environment. We presented
|
662
|
+
the process to couple Ruby and R, but this process can also be done to couple Ruby
|
663
|
+
and JavaScript or Ruby and Python. In a polyglot world a *uniglot* language might
|
664
|
+
be extremely relevant.
|
632
665
|
|
633
666
|
From the perspective of performance, GraalVM and Truffle promises improvements that could
|
634
667
|
reach over 10 times, both for [FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
|
@@ -636,15 +669,14 @@ and for [TruffleRuby](https://rubykaigi.org/2018/presentations/eregontp.html).
|
|
636
669
|
|
637
670
|
This article has shown how to improve a plot step-by-step. Starting from a very simple
|
638
671
|
boxplot with all default configurations, we moved slowly to our final plot. The important
|
639
|
-
point here is not if the final plot is actually beautiful
|
640
|
-
of small steps improvements that can be followed
|
641
|
-
presentation.
|
672
|
+
point here is not if the final plot is actually beautiful (as beauty is in the eye of
|
673
|
+
the beholder), but that there is a process of small, stepwise improvements that can be followed
|
674
|
+
to get a final plot ready for presentation.
|
642
675
|
|
643
676
|
Finally, this whole article was written in rmarkdown and compiled to HTML by _gknit_, an
|
644
677
|
application that wraps _knitr_ and allows documenting Ruby code. This application can
|
645
678
|
be of great help for any Rubyist trying to write articles, blogs or documentation for Ruby.
|
646
679
|
|
647
|
-
|
648
680
|
# Installing Galaaz
|
649
681
|
|
650
682
|
## Prerequisites
|
@@ -671,10 +703,3 @@ the gnu compiler and tools should be enough. I am not sure what is needed on th
|
|
671
703
|
|
672
704
|
* gknit <filename>
|
673
705
|
* In a script add: require 'galaaz'
|
674
|
-
|
675
|
-
|
676
|
-
And now that you’ve read this far, here’s how to submit your story to the freeCodeCamp
|
677
|
-
publication: send an email to submit at freecodecamp org. Include the URL for your story on
|
678
|
-
Medium (preferably an unpublished draft) and the word “bananas” so that we’ll know that you
|
679
|
-
have read all this. Only send one story URL per email. There’s no need to add anything
|
680
|
-
further to your email — we just read the stories and judge them based on their own merits.
|