galaaz 0.4.1 → 0.4.2

Files changed (105)
  1. checksums.yaml +4 -4
  2. data/Rakefile +29 -0
  3. data/bin/gknit +208 -10
  4. data/bin/gknit2 +14 -0
  5. data/bin/gknit2~ +6 -0
  6. data/bin/prepareR.rb +3 -0
  7. data/bin/prepareR.rb~ +1 -0
  8. data/bin/tmp.py +51 -0
  9. data/blogs/dev/dev.Rmd +70 -0
  10. data/blogs/dev/dev.Rmd~ +104 -0
  11. data/blogs/dev/dev.html +209 -0
  12. data/blogs/dev/dev.md +72 -0
  13. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  14. data/blogs/dev/model.rb +41 -0
  15. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +55 -27
  16. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +44 -0
  17. data/blogs/galaaz_ggplot/galaaz_ggplot.dvi +0 -0
  18. data/blogs/galaaz_ggplot/galaaz_ggplot.html +17 -4
  19. data/blogs/galaaz_ggplot/galaaz_ggplot.out +10 -0
  20. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  21. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +630 -0
  22. data/blogs/galaaz_ggplot/midwest.Rmd +1 -1
  23. data/blogs/galaaz_ggplot/midwest_external_png +13 -0
  24. data/blogs/galaaz_ggplot/midwest_external_png~ +1 -0
  25. data/blogs/gknit/gknit.Rmd +500 -0
  26. data/blogs/gknit/gknit.Rmd~ +184 -0
  27. data/blogs/gknit/gknit.Rnd~ +17 -0
  28. data/blogs/gknit/gknit.html +528 -0
  29. data/blogs/gknit/gknit.md +628 -0
  30. data/blogs/gknit/gknit.pdf +0 -0
  31. data/blogs/gknit/gknit.tex +745 -0
  32. data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
  33. data/blogs/gknit/gknit_files/figure-html/diverging_bar.png +0 -0
  34. data/blogs/gknit/model.rb +41 -0
  35. data/blogs/gknit/model.rb~ +46 -0
  36. data/blogs/ruby_plot/figures/dose_len.png +0 -0
  37. data/blogs/ruby_plot/figures/facet_by_delivery.png +0 -0
  38. data/blogs/ruby_plot/figures/facet_by_dose.png +0 -0
  39. data/blogs/ruby_plot/figures/facets_by_delivery_color.png +0 -0
  40. data/blogs/ruby_plot/figures/facets_by_delivery_color2.png +0 -0
  41. data/blogs/ruby_plot/figures/facets_with_decorations.png +0 -0
  42. data/blogs/ruby_plot/figures/facets_with_jitter.png +0 -0
  43. data/blogs/ruby_plot/figures/facets_with_points.png +0 -0
  44. data/blogs/ruby_plot/figures/final_box_plot.png +0 -0
  45. data/blogs/ruby_plot/figures/final_violin_plot.png +0 -0
  46. data/blogs/ruby_plot/figures/violin_with_jitter.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot.Rmd +680 -0
  48. data/blogs/ruby_plot/ruby_plot.Rmd~ +215 -0
  49. data/blogs/ruby_plot/ruby_plot.html +563 -0
  50. data/blogs/ruby_plot/ruby_plot.md +731 -0
  51. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  52. data/blogs/ruby_plot/ruby_plot.tex +458 -0
  53. data/examples/sthda_ggplot/all.rb +0 -6
  54. data/examples/sthda_ggplot/two_variables_cont_bivariate/geom_hex.rb +1 -1
  55. data/examples/sthda_ggplot/two_variables_cont_cont/misc.rb +1 -1
  56. data/examples/sthda_ggplot/two_variables_disc_cont/geom_bar.rb +2 -2
  57. data/examples/sthda_ggplot/two_variables_disc_disc/geom_jitter.rb +0 -1
  58. data/lib/R/eng_ruby.R +62 -0
  59. data/lib/R/eng_ruby.R~ +63 -0
  60. data/lib/R_interface/capture_plot.rb~ +23 -0
  61. data/lib/{R → R_interface}/expression.rb +0 -0
  62. data/lib/{R → R_interface}/r.rb +10 -1
  63. data/lib/{R → R_interface}/r.rb~ +0 -0
  64. data/lib/{R → R_interface}/r_methods.rb +21 -5
  65. data/lib/{R → R_interface}/rbinary_operators.rb +6 -1
  66. data/lib/R_interface/rclosure.rb +38 -0
  67. data/lib/{R → R_interface}/rdata_frame.rb +0 -0
  68. data/lib/R_interface/rdevices.R +31 -0
  69. data/lib/R_interface/rdevices.rb +225 -0
  70. data/lib/{R/rclosure.rb → R_interface/rdevices.rb~} +3 -10
  71. data/lib/{R → R_interface}/renvironment.rb +0 -0
  72. data/lib/{R → R_interface}/rexpression.rb +0 -0
  73. data/lib/{R → R_interface}/rindexed_object.rb +0 -0
  74. data/lib/{R → R_interface}/rlanguage.rb +0 -0
  75. data/lib/{R → R_interface}/rlist.rb +0 -0
  76. data/lib/{R → R_interface}/rmatrix.rb +0 -0
  77. data/lib/{R → R_interface}/rmd_indexed_object.rb +0 -0
  78. data/lib/{R → R_interface}/robject.rb +5 -0
  79. data/lib/{R → R_interface}/rpkg.rb +0 -0
  80. data/lib/{R → R_interface}/rsupport.rb +49 -13
  81. data/lib/{R → R_interface}/rsupport_scope.rb +0 -0
  82. data/lib/{R → R_interface}/rsymbol.rb +1 -0
  83. data/lib/{R → R_interface}/ruby_callback.rb +0 -0
  84. data/lib/{R → R_interface}/ruby_extensions.rb +2 -1
  85. data/lib/{R → R_interface}/runary_operators.rb +0 -0
  86. data/lib/{R → R_interface}/rvector.rb +0 -0
  87. data/lib/galaaz.rb +4 -2
  88. data/lib/gknit.rb +27 -0
  89. data/lib/gknit.rb~ +26 -0
  90. data/lib/gknit/knitr_engine.rb +120 -0
  91. data/lib/gknit/knitr_engine.rb~ +102 -0
  92. data/lib/gknit/ruby_engine.rb +70 -0
  93. data/lib/gknit/ruby_engine.rb~ +72 -0
  94. data/lib/util/exec_ruby.rb +8 -7
  95. data/lib/util/inline_file.rb +70 -0
  96. data/lib/util/inline_file.rb~ +23 -0
  97. data/r_requires/ggplot.rb +1 -8
  98. data/r_requires/knitr.rb +27 -0
  99. data/r_requires/knitr.rb~ +4 -0
  100. data/specs/r_language.spec.rb +22 -0
  101. data/specs/r_plots.spec.rb +72 -0
  102. data/specs/r_plots.spec.rb~ +37 -0
  103. data/specs/tmp.rb +255 -1
  104. data/version.rb +1 -1
  105. metadata +89 -39
@@ -0,0 +1,41 @@
1
+ require 'galaaz'
2
+
3
+ # Loads the R 'caret' package. If not present, installs it
4
+ R.install_and_loads 'caret'
5
+
6
+ class Model
7
+
8
+ attr_reader :data
9
+ attr_reader :test
10
+ attr_reader :train
11
+
12
+ #==========================================================
13
+ #
14
+ #==========================================================
15
+
16
+ def initialize(data, percent_train:, seed: 123)
17
+
18
+ R.set__seed(seed)
19
+ @data = data
20
+ @percent_train = percent_train
21
+ @seed = seed
22
+
23
+ end
24
+
25
+ #==========================================================
26
+ #
27
+ #==========================================================
28
+
29
+ def partition(field)
30
+
31
+ train_index =
32
+ R.createDataPartition(@data.send(field), p: @percent_train,
33
+ list: false, times: 1)
34
+ @train = @data[train_index, :all]
35
+ @test = @data[-train_index, :all]
36
+
37
+ end
38
+
39
+ end
40
+
41
+
@@ -0,0 +1,46 @@
1
+ require 'galaaz'
2
+
3
+ # Loads the R 'caret' package. If not present, installs it
4
+ R.install_and_loads 'caret'
5
+
6
+ class Model
7
+
8
+ attr_reader :data
9
+ attr_reader :test
10
+ attr_reader :train
11
+
12
+ #==========================================================
13
+ #
14
+ #==========================================================
15
+
16
+ def initialize(data, percent_train:, seed: 123)
17
+
18
+ R.set__seed(seed)
19
+ @data = data
20
+ @percent_train = percent_train
21
+ @seed = seed
22
+
23
+ end
24
+
25
+ #==========================================================
26
+ #
27
+ #==========================================================
28
+
29
+ def partition
30
+
31
+ train_index =
32
+ R.createDataPartition(@data.mpg, p: @percent_train,
33
+ list: false, times: 1)
34
+ @train = @data[train_index, :all]
35
+ @test = @data[-train_index, :all]
36
+
37
+ end
38
+
39
+ end
40
+
41
+ mtcars = ~:mtcars
42
+ model = Model.new(mtcars, percent_train: 0.8)
43
+ model.partition
44
+ puts model.train.head
45
+ puts model.test.head
46
+
@@ -0,0 +1,680 @@
1
+ ---
2
+ title: "How to make Beautiful Ruby Plots with Galaaz"
3
+ author: "Rodrigo Botafogo"
4
+ tags: [Tech, Data Science, Ruby, R, GraalVM]
5
+ date: "November 19th, 2018"
6
+ output:
7
+ html_document:
8
+ self_contained: true
9
+ keep_md: true
10
+ pdf_document:
11
+ includes:
12
+ in_header: ["../../sty/galaaz.sty"]
13
+ number_sections: yes
14
+ ---
15
+
16
+ ```{r setup, echo=FALSE}
17
+
18
+ ```
19
+
20
+ # Introduction
21
+
22
+ According to Wikipedia "Ruby is a dynamic, interpreted, reflective, object-oriented,
23
+ general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro
24
+ "Matz" Matsumoto in Japan." It reached high popularity with the development of Ruby on Rails
25
+ (RoR) by David Heinemeier Hansson. RoR is a web application framework first released
26
+ around 2005. It makes extensive use of Ruby's metaprogramming features. With RoR,
27
+ Ruby became very popular. According to [Ruby's Tiobe index](https://www.tiobe.com/tiobe-index/ruby/)
28
+ it peaked in popularity around 2008. Then its popularity
29
+ declined until 2015 when it started picking up again. At the time of
30
+ this writing (November 2018), the Tiobe index puts Ruby in 16th position.
31
+
32
+ Python, a similar language to Ruby, ranks 4th in the index. Java, C and C++ take the
33
+ first three positions. Ruby is often criticized for its focus on web applications.
34
+ But Ruby can do [much more](https://github.com/markets/awesome-ruby) than just web applications.
35
+ Yet, for scientific computing, Ruby lags way behind Python and R. Python has
36
+ the Django framework for the web, NumPy for numerical arrays, and Pandas for data analysis.
37
+ R is a free software environment for statistical computing and graphics with thousands
38
+ of libraries for data analysis.
39
+
40
+ Until recently, there was no real prospect for Ruby to bridge this gap.
41
+ Implementing a complete scientific computing infrastructure would take too long.
42
+ This is where GraalVM comes into the picture:
43
+
44
+ > GraalVM is a universal virtual machine for running applications written in
45
+ > JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin,
46
+ > and LLVM-based languages such as C and C++.
47
+ >
48
+ > GraalVM removes the isolation between programming languages and enables
49
+ > interoperability in a shared runtime. It can run either standalone or in the
50
+ > context of OpenJDK, Node.js, Oracle Database, or MySQL.
51
+ >
52
+ > GraalVM allows you to write polyglot applications with a seamless way to pass
53
+ > values from one language to another. With GraalVM there is no copying or
54
+ > marshaling necessary as it is with other polyglot systems. This lets you
55
+ > achieve high performance when language boundaries are crossed. Most of the time
56
+ > there is no additional cost for crossing a language boundary at all.
57
+ >
58
+ > Often developers have to make uncomfortable compromises that require them
59
+ > to rewrite their software in other languages. For example:
60
+ >
61
+ > * That library is not available in my language. I need to rewrite it.
62
+ > * That language would be the perfect fit for my problem, but we cannot
63
+ > run it in our environment.
64
+ > * That problem is already solved in my language, but the language is
65
+ > too slow.
66
+ >
67
+ > With GraalVM we aim to allow developers to freely choose the right language for
68
+ > the task at hand without making compromises.
69
+
70
+ As stated above, GraalVM is a _universal_ virtual machine that allows Ruby and R (and other
71
+ languages) to run on the same environment. GraalVM allows polyglot applications to
72
+ _seamlessly_ interact with one another and pass values from one language to the other.
73
+ Galaaz, a gem for Ruby, intends to tightly couple Ruby and R
74
+ and allow those languages to interact in a way that the user will be unaware
75
+ of such interaction.
76
+
77
+ Library wrapping is a usual way of bringing features from one language into another.
78
+ To improve performance, Python often wraps more efficient C libraries. For the
79
+ Python developer, the existence of such C libraries is of no concern. The problem with
80
+ library wrapping is that for any new library, there is the need to handcraft a new
81
+ wrapper.
82
+
83
+ Galaaz, instead of wrapping a single C or R library, wraps the whole of
84
+ the R language in Ruby. By doing so, all of the thousands of R libraries are available to
85
+ Ruby developers. Also any new library developed in R will be available without a
86
+ new wrapping effort.
87
+
88
+ This article shows how Ruby can use R's ggplot2 library transparently, and
89
+ bring to Ruby the power of high quality scientific plotting. It also shows that
90
+ migrating from R to Ruby with Galaaz is a matter of small syntactic changes.
91
+ Using Ruby, the R developer can use all of Ruby's powerful OO features. It also
92
+ becomes much easier to move code from the analysis phase to the production phase.
93
+
94
+ In this article we will explore the R ToothGrowth dataset. In doing so, we will
95
+ create some boxplots. A primer on boxplot is available in
96
+ [this article](https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51).
97
+
98
+ We will also create a Corporate Template ensuring that plots will have a consistent
99
+ visualization. This template is built using a Ruby module. There is a way of building
100
+ ggplot themes that will work the same as the Ruby module. Yet, writing a new theme
101
+ requires specific knowledge. Ruby modules are standard to the language and don't
102
+ need special knowledge.
103
+
104
+ In [this blog](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021) we show a scatter plot in Ruby also with Galaaz.
105
+
106
+ # gKnit
107
+
108
+ _Knitr_ is an application that converts text written in rmarkdown to many
109
+ different output formats. For instance, a writer can convert an rmarkdown document
110
+ to HTML, $LaTex$, docx and many other formats. Rmarkdown documents can contain
111
+ text and _code chunks_. Knitr formats code chunks in a grayed box in the output document.
112
+ It also executes the code chunks and formats the output in a white box. Every line of
113
+ output from the executed code is preceded by '##'.
114
+
115
+ Knitr allows code chunks to be in R, Python,
116
+ Ruby and dozens of other languages. Yet, while R and Python chunks can share data, in other
117
+ languages, chunks are independent. This means that a variable defined in one chunk
118
+ cannot be used in another chunk.
119
+
120
+ With _gKnit_ Ruby code chunks can share data. In gKnit each
121
+ Ruby chunk executes in its own scope and thus, local variables defined in a chunk are
122
+ not accessible by other chunks. Yet, all chunks execute in the scope of a 'chunk'
123
+ class and instance variables ('@') are available in all chunks.
124
+
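The scoping rule described above can be sketched in plain Ruby. This is only an illustration, not gKnit's implementation: the `ChunkScope` class and its `run` method are hypothetical names, and the sketch assumes that each chunk is evaluated as a string in the scope of one shared object.

```ruby
# Hypothetical stand-in for the 'chunk' class; not gKnit's actual code.
class ChunkScope
  # Evaluate one chunk of Ruby source in the scope of this object.
  # Each call gets a fresh set of local variables.
  def run(source)
    instance_eval(source)
  end
end

scope = ChunkScope.new

# First chunk: defines a local variable and an instance variable.
scope.run("local_total = 41; @shared_total = local_total + 1")

# Second chunk: the instance variable is still visible...
second = scope.run("@shared_total")

# ...but the local variable from the first chunk is not.
local_visible = scope.run("defined?(local_total) ? true : false")

puts "shared: #{second}, local visible: #{local_visible}"
```

Because every chunk runs against the same object, anything stored in an instance variable survives from one chunk to the next, which is why the chunks in this article keep the dataset in `@tooth_growth`.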
125
+ # Exploring the Dataset
126
+
127
+ Let's start by exploring our selected dataset. ToothGrowth is an R dataset. A dataset
128
+ is like an Excel spreadsheet, but in which each column has only one type of data.
129
+ For instance, one column can hold floats, another integers, and a third strings.
130
+ This dataset analyses the length of odontoblasts (cells responsible for tooth growth)
131
+ in 60 guinea pigs, where each animal received one of three dose levels of Vitamin C
132
+ (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid
133
+ (a form of vitamin C and coded as VC).
134
+
135
+ The ToothGrowth dataset contains three columns: 'len', 'supp' and 'dose'. Let's
136
+ take a look at a few rows of this dataset. In Galaaz, to have access to an R variable
137
+ we use the corresponding Ruby symbol preceded by the tilde ('~') function. Note in the
138
+ following chunk that Ruby's '@tooth_growth' is assigned the value of '~:ToothGrowth'.
139
+ 'ToothGrowth' is the R variable containing the dataset of interest.
140
+
141
+ ```{ruby head}
142
+ # Read the R ToothGrowth variable and assign it to the
143
+ # Ruby instance variable @tooth_growth that will be
144
+ # available to all Ruby chunks in this document.
145
+ @tooth_growth = ~:ToothGrowth
146
+ # print the first few elements of the dataset
147
+ puts @tooth_growth.head
148
+ ```
149
+
150
+ Great! We've managed to read the ToothGrowth dataset and take a look at its elements.
151
+ We see here the first 6 rows of the dataset. To access a column, follow the dataset name
152
+ with a dot ('.') and the name of the column. Also use dot notation to chain methods
153
+ in usual Ruby style.
154
+
155
+ ```{ruby dataset_columns}
156
+ # Access the tooth_growth 'len' column and print the first few
157
+ # elements of this column with the 'head' method.
158
+ puts @tooth_growth.len.head
159
+ ```
160
+
161
+ The 'dose' column contains a numeric value with either 0.5, 1 or 2. Although those are
162
+ numbers, they are better interpreted as a [factor or category](https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/). So, let's convert our 'dose' column from numeric to 'factor'.
163
+ In R, the function 'as.factor' is used to convert data in a vector to factors. To use this
164
+ function from Galaaz, the dot ('.') in the function name is substituted by '__' (double underscore).
165
+ The function 'as.factor' becomes 'R.as__factor' or just 'as__factor' when chaining.
166
+
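The naming rule above amounts to a plain string substitution. A minimal sketch in plain Ruby follows; `galaaz_method_name` is a hypothetical helper written for illustration and is not part of Galaaz.

```ruby
# Hypothetical helper, not part of Galaaz: maps an R function name to the
# method name used when calling it from Ruby, by replacing every dot
# with a double underscore.
def galaaz_method_name(r_name)
  r_name.gsub(".", "__")
end

names = %w[as.factor dev.off set.seed].map { |n| galaaz_method_name(n) }
puts names.inspect
```

The three names used here match calls that appear in this article and its model code: `as__factor`, `R.dev__off` and `R.set__seed`.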
167
+ ```{ruby tooth_growth}
168
+ # convert the dose to a factor
169
+ @tooth_growth.dose = @tooth_growth.dose.as__factor
170
+ ```
171
+
172
+ Let's explore some more details of this dataset. In particular, let's look at its dimensions,
173
+ structure and summary statistics.
174
+
175
+ ```{ruby dim}
176
+ puts @tooth_growth.dim
177
+ ```
178
+
179
+ This dataset has 60 rows, one for each subject and 3 columns, as we have already seen.
180
+
181
+ Note that we do not call 'puts' when using the 'str' function. This function does not
182
+ return anything and prints the structure of the dataset as a side effect.
183
+
184
+ ```{ruby str}
185
+ @tooth_growth.str
186
+ ```
187
+ Observe that both variables 'supp' and 'dose' are factors. The system made variable 'supp'
188
+ a factor automatically, since it contains the two strings OJ and VC.
189
+
190
+ Finally, using the summary method, we get the statistical summary for the dataset.
191
+
192
+ ```{ruby summary}
193
+ puts @tooth_growth.summary
194
+ ```
195
+
196
+ # Doing the Data Analysis
197
+
198
+ ## Quick plot for seeing the data
199
+
200
+ Let's now create our first plot with the given data by accessing ggplot2 from Ruby. For Rubyists
201
+ that have never seen or used ggplot2, here is the description of ggplot found on its home page:
202
+
203
+ > "ggplot2 is a system for declaratively creating graphics, based on _The Grammar of Graphics_.
204
+ > You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical
205
+ > primitives to use, and it takes care of the details."
206
+
207
+ This description might be a bit cryptic and it is best to see it at work to understand it.
208
+ Basically, in the _grammar of graphics_ developers add layers of components such as grid,
209
+ axis, data, title, subtitle and also graphical primitives such as _bar plot_, _box plot_,
210
+ to form the final graphics.
211
+
212
+ In order to make a plot, we apply the 'ggplot' function to the dataset. In R, this would be
213
+ written as ```ggplot(<dataset>, ...)```. In Galaaz, use either ```R.ggplot(<dataset>, ...)```,
214
+ or ```<dataset>.ggplot(...)```. In the graph specification below, we use the second notation
215
+ that looks more Ruby-like. The plot specifies the 'dose' on the $x$ axis and the 'len' on
216
+ the $y$ axis with the 'aes' method: 'E.aes(x: :dose, y: :len)'. To specify the type of plot to
217
+ create, add a geom to the plot. For a boxplot, the geom is R.geom_boxplot.
218
+
219
+ Note also that we have a call to 'R.png' before plotting and 'R.dev__off' after the print
220
+ statement. 'R.png' opens a 'png' device for outputting the plot. 'R.dev__off'
221
+ closes the device and creates the 'png' file. If we do not pass a name to the 'png' function, the
222
+ image gets a default name of 'Rplot\<nnn\>' where \<nnn\> is the number of the plot. We can
223
+ then include the generated 'png' file in the document by adding an rmarkdown directive.
224
+
225
+ ```{ruby dose_len}
226
+ require 'ggplot'
227
+
228
+ R.png("figures/dose_len.png")
229
+
230
+ e = @tooth_growth.ggplot(E.aes(x: :dose, y: :len))
231
+ print e + R.geom_boxplot
232
+
233
+ R.dev__off
234
+ ```
235
+
236
+ [//]: # (Including the 'png' file generated above. In future releases)
237
+ [//]: # (of gKnit, the figures should be automatically saved and the name)
238
+ [//]: # (taken from the chunk 'label' and possibly chunk parameters)
239
+
240
+ ![](figures/dose_len.png)
241
+
242
+ Great! We've just managed to create and save our first plot in Ruby with only
243
+ four lines of code. We can see with this plot a clear trend: as the dose of the supplement
244
+ is increased, so is the length of teeth.
245
+
246
+ ## Facetting the plot
247
+
248
+ This first plot shows a trend, but our data has information about two different forms
249
+ of delivery method, either by Orange Juice (OJ) or by Vitamin C (VC).
250
+ Let's then try to create a plot that makes explicit the effect of each delivery method. This next
251
+ plot is a _facetted_ plot where each delivery method gets its own plot.
252
+ On the left side, the plot shows the OJ delivery method. On the right side, we see the
253
+ VC delivery method. To obtain this plot, we use the 'R.facet_grid' function, that
254
+ automatically creates the facets based on the delivery method factors. The parameter to
255
+ the 'facet_grid' method is a [_formula_](https://thomasleeper.com/Rcourse/Tutorials/formulae.html).
256
+
257
+ In Galaaz, formulas are written a bit differently than in R. The following changes are
258
+ necessary:
259
+
260
+ * R symbols are represented by the same Ruby symbol prefixed with the '+' method. The
261
+ symbol ```x``` in R becomes ```+:x``` in Ruby;
262
+ * The '~' operator in R becomes '=~' in Ruby. The formula ```x ~ y``` in R is written as
263
+ ```+:x =~ +:y``` in Ruby;
264
+ * The '.' symbol in R becomes '+:all'
265
+
266
+ Another way of writing a formula is to use the 'formula' function with the actual formula as
267
+ a string. The formula ```x ~ y``` in R can be written as ```R.formula("x ~ y")```. For more
268
+ complex formulas, the use of the 'formula' function is preferred.
269
+
270
+ The formula ```+:all =~ +:supp``` indicates to the 'facet_grid' function that it needs to
271
+ facet the plot based on the ```supp``` variable and split the plot vertically. Changing
272
+ the formula to ```+:supp =~ +:all``` would split the plot horizontally.
273
+
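The three rewriting rules for formulas can be sketched as a small string transformation in plain Ruby. `galaaz_formula` is a hypothetical helper for illustration, not part of Galaaz, and it only handles the simplest case of one symbol on each side of the '~'; for anything more complex, the text above recommends `R.formula` with a string.

```ruby
# Hypothetical helper, not part of Galaaz: rewrites a simple two-sided
# R formula string into the operator notation described above.
#   - each R symbol `x` becomes `+:x`
#   - the `~` operator becomes `=~`
#   - the `.` symbol becomes `+:all`
def galaaz_formula(r_formula)
  lhs, rhs = r_formula.split("~").map(&:strip)
  [lhs, rhs].map { |term| term == "." ? "+:all" : "+:#{term}" }.join(" =~ ")
end

vertical   = galaaz_formula(". ~ supp")
horizontal = galaaz_formula("supp ~ .")

puts vertical
puts horizontal
```

The helper only produces the notation as text, so it runs without GraalVM; in a Galaaz chunk the resulting expression would be written directly in the code, as in `R.facet_grid(+:all =~ +:supp)`.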
274
+ ```{ruby facet_by_delivery}
275
+ R.png("figures/facet_by_delivery.png")
276
+
277
+ @base_tooth = @tooth_growth.ggplot(E.aes(x: :dose, y: :len, group: :dose))
278
+
279
+ @bp = @base_tooth + R.geom_boxplot +
280
+ # Split in vertical direction
281
+ R.facet_grid(+:all =~ +:supp)
282
+
283
+ puts @bp
284
+
285
+ R.dev__off
286
+ ```
287
+
288
+ ![](figures/facet_by_delivery.png)
289
+
290
+ It now becomes clear that although both methods of delivery have a direct
291
+ impact on tooth growth, method (OJ) is non-linear having a higher impact with smaller
292
+ doses of ascorbic acid and reducing its impact as the dose increases. With the
293
+ (VC) approach, the impact seems to be more linear.
294
+
295
+ ## Adding Color
296
+
297
+ If this paper were about data analysis, we should make a better analysis of the trends and
298
+ should improve the statistical analysis. But we are interested in working with ggplot
299
+ in Ruby. So, let's add some color to this plot to make the trend and comparison more
300
+ visible. In the following plot, the boxes are color coded by dose. To add color, it is
301
+ enough to add ```fill: :dose``` to the aesthetic of the boxplot. With this command each 'dose'
302
+ factor gets its own color.
303
+
304
+ ```{ruby facets_by_delivery_color}
305
+ R.png("figures/facets_by_delivery_color.png")
306
+
307
+ @bp = @bp + R.geom_boxplot(E.aes(fill: :dose))
308
+ puts @bp
309
+
310
+ R.dev__off
311
+ ```
312
+
313
+ ![](figures/facets_by_delivery_color.png)
314
+
315
+ Facetting helps us compare the general trends in the (OJ) and (VC) delivery methods.
316
+ Adding color allows us to compare specifically how each dosage impacts the teeth growth.
317
+ It is possible to observe that with smaller doses, up to 1mg, (OJ) performs better
318
+ than (VC) (red color). For 2mg, both (OJ) and (VC) have the same median, but (OJ) is
319
+ less dispersed (blue color).
320
+ For 1mg (green color), (OJ) is significantly better than (VC). By this very quick analysis,
321
+ it seems that (OJ) is a better delivery method than (VC).
322
+
323
+ ## Clarifying the data
324
+
325
+ Boxplots give us a nice idea of the distribution of data, but looking at those plots with
326
+ large colored boxes leaves us wondering what is going on in those boxes. According to
327
+ Edward Tufte in Envisioning Information:
328
+
329
+ > Thin data rightly prompts suspicions: "What are they leaving out? Is that really everything
330
+ > they know? What are they hiding? Is that all they did?" Now and then it is claimed
331
+ > that vacant space is "friendly" (anthropomorphizing an inherently murky idea) but
332
+ > _it is not how much empty space there is, but rather how it is used. It is not how much
333
+ > information there is, but rather how effectively it is arranged._
334
+
335
+ And he states:
336
+
337
+ > A most unconventional design strategy is revealed: _to clarify, add detail._
338
+
339
+ Let's then use this wisdom and add yet another layer of data to our plot, so that we clarify
340
+ it with detail and do not leave large empty boxes. In this next plot, we add data points for
341
+ each of the 60 pigs in the experiment. For that, add the function 'R.geom_point' to the
342
+ plot.
343
+
344
+ ```{ruby facets_with_points}
345
+ R.png("figures/facets_with_points.png")
346
+
347
+ # Add the individual data points as a new layer
348
+ @bp = @bp + R.geom_point
349
+
350
+ puts @bp
351
+
352
+ R.dev__off
353
+ ```
354
+
355
+ ![](figures/facets_with_points.png)
356
+
357
+ Now we can see the actual distribution of all 60 subjects. Actually, this is not
358
+ totally true. We have a hard time seeing all 60 subjects. It seems that some points
359
+ might be placed one over the other hiding useful information.
360
+
361
+ But no sweat! Another layer might solve the problem. In the following plot a new layer
362
+ called 'geom_jitter' is added to the plot. This adds randomness to the position of
363
+ the points, making it easier to see all of them and preventing data hiding. We also add
364
+ color and change the shape of the points, making them even easier to see.
365
+
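To make the idea of jitter concrete, here is a minimal plain-Ruby sketch of the technique: each position is shifted by a small uniform random amount so that overlapping points separate. This is not ggplot2's code; the `jitter` method, the `width` of 0.2 and the seed 123 are assumptions chosen for illustration.

```ruby
# Sketch of jittering, not ggplot2's implementation: shift each value by a
# uniform random offset in the range -width..width.
# A seeded generator keeps the result reproducible between runs.
def jitter(values, width: 0.2, rng: Random.new(123))
  values.map { |v| v + rng.rand(-width..width) }
end

# Several subjects share the same dose, so their points would overlap.
doses    = [0.5, 0.5, 0.5, 1.0, 1.0, 2.0]
jittered = jitter(doses)

jittered.each_with_index do |x, i|
  puts format("%.1f -> %.3f", doses[i], x)
end
```

The offsets only change where a point is drawn, not the underlying data, so a small width keeps each point visually attached to its dose group.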
366
+ ```{ruby facets_with_jitter}
367
+ R.png("figures/facets_with_jitter.png")
368
+
369
+ # Add jittered points so that overlapping subjects become visible
370
+ puts @bp + R.geom_jitter(shape: 23, color: "cyan3", size: 1)
371
+
372
+ R.dev__off
373
+ ```
374
+
375
+ ![](figures/facets_with_jitter.png)
376
+
377
+ Now we can see all 60 points in the graph. We have here a much higher information density
378
+ and we can see outliers and the distribution of subjects.
379
+
380
+ # Preparing the Plot for Presentation
381
+
382
+ We have come a long way since our first plot. As was already said, this is not
383
+ an article about data analysis and the focus is on the
384
+ integration of Ruby and ggplot. So, let's assume that the analysis is now done. Yet,
385
+ ending the analysis does not mean that the work is done. On the contrary, the hardest
386
+ part is yet to come!
387
+
388
+ After the analysis it is necessary to communicate it by making a final plot for
389
+ presentation. The last plot has all the information we want to share, but it is not very
390
+ pleasing to the eye.
391
+
392
+ ## Improving Colors
393
+
394
+ Let's start by trying to improve colors. For now, we will not use the jitter layer.
395
+ The previous plot has three bright colors that have no relationship between them. Is
396
+ there any obvious, or non-obvious for that matter, interpretation for the colors?
397
+ Clearly, they are just random colors selected automatically by our software. Although
398
+ those colors helped us understand the data, for a final presentation random colors
399
+ can distract the viewer.
400
+
401
+ In the following plot we use the 'scale_fill_manual' function to change
402
+ the colors of the boxes and order of labels. For colors we use shades of blue for
403
+ each dosage, with light blue ('cyan')
404
+ representing the lower dose and deep blue ('deepskyblue4') the higher dose. Also
405
+ the smaller value (0.5) is on
406
+ the bottom of the labels and (2) at the top. This ordering seems more natural and
407
+ matches with the actual order of the colors in the plot.
408
+
409
+ ```{ruby facets_by_delivery_color2}
410
+ R.png("figures/facets_by_delivery_color2.png")
411
+
412
+ @bp = @bp +
413
+ R.scale_fill_manual(values: R.c("cyan", "deepskyblue", "deepskyblue4"),
414
+ breaks: R.c("2","1","0.5"))
415
+
416
+ puts @bp
417
+
418
+ R.dev__off
419
+ ```
420
+
421
+ ![](figures/facets_by_delivery_color2.png)
422
+
423
+
424
+ ## Violin Plot and Jitter
425
+
426
+ The boxplot with jitter did look a bit overwhelming. The next plot uses a variation of
427
+ a boxplot known as a _violin plot_ with jittered data.
428
+
429
+ [From Wikipedia](https://en.wikipedia.org/wiki/Violin_plot)
430
+
431
+
432
+ > A violin plot is a method of plotting numeric data. It is similar to a box plot with
433
+ > a rotated kernel density plot on each side.
434
+ >
435
+ > A violin plot has four layers. The outer shape represents all possible results, with
436
+ > thickness indicating how common. (Thus the thickest section represents the mode average.)
437
+ > The next layer inside represents the values that occur 95% of the time.
438
+ > The next layer (if it exists) inside represents the values that occur 50% of the time.
439
+ > The central dot represents the median average value.
440
+
441
+ ```{ruby violin_with_jitter}
442
+ R.png("figures/violin_with_jitter.png")
443
+
444
+ @violin = @base_tooth + R.geom_violin(E.aes(fill: :dose)) +
445
+ R.facet_grid(+:all =~ +:supp) +
446
+ R.geom_jitter(shape: 23, color: "cyan3", size: 1) +
447
+ R.scale_fill_manual(values: R.c("cyan", "deepskyblue", "deepskyblue4"),
448
+ breaks: R.c("2","1","0.5"))
449
+
450
+ puts @violin
451
+
452
+ R.dev__off
453
+ ```
454
+
455
+ ![](figures/violin_with_jitter.png)
456
+
457
+ This plot is an alternative to the original boxplot. For the final presentation, it is
458
+ important to think which graphics will be best understood by our audience. A violin plot
459
+ is a less known plot and could add mental overhead, yet, in my opinion, it does look a little
460
+ bit better than the boxplot and provides even more information than the boxplot with jitter.
461
+
462
+ ## Adding Decoration
463
+
464
+ Our final plot is starting to take shape, but a presentation plot should have at least a
465
+ title, labels on the axis and maybe some other decorations. Let's start adding those.
466
+ Since decoration requires more graph area, this new plot has a 'width' and 'height'
467
+ specification. When there is no specification, the default values for width and height are
468
+ 480.
469
+
470
+ The 'labs' function adds the required decorations. In this example we use 'title', 'subtitle',
471
+ 'x' for the $x$ axis label and 'y', for the $y$ axis label, and 'caption' for information
472
+ about the plot.
473
+
474
+ ```{ruby facets_with_decorations}
475
+ R.png("figures/facets_with_decorations.png", width: 540, height: 560)
476
+
477
+ caption = <<-EOT
478
+ Length of odontoblasts in 60 guinea pigs.
479
+ Each animal received one of three dose levels of vitamin C.
480
+ EOT
481
+
482
+ @decorations =
483
+ R.labs(title: "Tooth Growth: Length by Dose",
484
+ subtitle: "Faceted by delivery method, (OJ) or (VC)",
485
+ x: "Dose (mg)", y: "Teeth length",
486
+ caption: caption)
487
+
488
+ puts @bp + @decorations
489
+
490
+ R.dev__off
491
+ ```
492
+
493
+ ![](figures/facets_with_decorations.png)
494
+
495
+
496
+ ## The Corp Theme
497
+
498
+ We are almost done. But the plot does not yet look nice to the eye. We are still distracted
499
+ by many aspects of the graph. First, the black font color does not look good. Then
500
+ plot background, borders, grids all add clutter to the plot.
501
+
502
+ We will now define our corporate theme. In this theme, we remove borders and grids. The
503
+ background is left for faceted plots but removed for non-faceted plots. Font colors are
504
+ a shade of blue (color: '#000080'). Axis labels are moved near the end of the axis and
505
+ written in 'bold'.
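The theme code uses the color string "#000080". As a quick check of what that means, the sketch below decodes a `#RRGGBB` string into its red, green and blue components. `hex_to_rgb` is a hypothetical helper written for this note; it is not part of Galaaz or ggplot2.

```ruby
# Hypothetical helper, not part of Galaaz or ggplot2.
# Converts a "#RRGGBB" string into [red, green, blue] integers (0..255).
def hex_to_rgb(hex)
  unless hex.match?(/\A#\h{6}\z/)
    raise ArgumentError, "expected #RRGGBB, got #{hex.inspect}"
  end
  # Drop the leading '#', then read each pair of hex digits as a number.
  hex[1..-1].scan(/\h{2}/).map { |pair| pair.to_i(16) }
end

p hex_to_rgb("#000080")  # => [0, 0, 128]
```

Red and green are zero and blue is at half intensity, which gives a dark navy shade. A string with only five hex digits is rejected by the format check.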
506
+
507
+ ```{ruby coorp_theme}
508
+ module CorpTheme
509
+
510
+ R.install_and_loads 'RColorBrewer'
511
+
512
+ #---------------------------------------------------------------------------------
513
+ # face can be "plain", "bold", "italic" or "bold.italic"
514
+ #---------------------------------------------------------------------------------
515
+
516
+ def self.text_element(size, face: "plain", hjust: nil)
517
+ E.element_text(color: "#000080",
518
+ face: face,
519
+ size: size,
520
+ hjust: hjust)
521
+ end
522
+
523
+ #---------------------------------------------------------------------------------
524
+ # Defines the plot theme (visualization). In this theme we remove major and minor
525
+ # grids, borders and background. We also turn-off scientific notation.
526
+ #---------------------------------------------------------------------------------
527
+
528
+ def self.global_theme(faceted: false)
529
+
530
+ R.options(scipen: 999) # turn-off scientific notation like 1e+48
531
+ # R.theme_set(R.theme_bw)
532
+
533
+ # remove major grids
534
+ gb = R.theme(panel__grid__major: E.element_blank())
535
+ # remove minor grids
536
+ gb = gb + R.theme(panel__grid__minor: E.element_blank)
537
+ # gb = R.theme(panel__grid__minor: E.element_blank)
538
+ # remove border
539
+ gb = gb + R.theme(panel__border: E.element_blank)
540
+ # remove background. When working with faceted graphs, the background makes
541
+ # it easier to see each facet, so leave it
542
+ gb = gb + R.theme(panel__background: E.element_blank) if !faceted
543
+ # Change axis font
544
+ gb = gb + R.theme(axis__text: text_element(8))
545
+ # change axis title font
546
+ gb = gb + R.theme(axis__title: text_element(10, face: "bold", hjust: 1))
547
+ # change font of title
548
+ gb = gb + R.theme(title: text_element(12, face: "bold"))
549
+ # change font of subtitle
550
+ gb = gb + R.theme(plot__subtitle: text_element(9))
551
+ # change font of captions
552
+ gb = gb + R.theme(plot__caption: text_element(8))
553
+
554
+ end
555
+
556
+ end
557
+ ```
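The theme is assembled by repeatedly adding small pieces to an accumulator (`gb = gb + ...`). The plain-Ruby model below shows that pattern in isolation. `ThemeSketch` is a hypothetical class invented for this note, and it assumes a simple "right-hand side wins" merge; it is not how Galaaz or ggplot2 implement themes, where merging is more fine-grained.

```ruby
# A plain-Ruby model of the accumulation pattern used in global_theme.
# ThemeSketch is hypothetical; it is NOT how Galaaz or ggplot2 implement themes.
class ThemeSketch
  attr_reader :settings

  def initialize(**settings)
    @settings = settings
  end

  # Combining two themes keeps every setting; on a clash the right-hand side wins.
  def +(other)
    ThemeSketch.new(**settings.merge(other.settings))
  end
end

parts = [
  ThemeSketch.new(panel_grid_major: :blank),
  ThemeSketch.new(panel_grid_minor: :blank),
  ThemeSketch.new(axis_text_size: 8),
  ThemeSketch.new(axis_text_size: 10)  # overrides the earlier size
]

# Same effect as repeating `gb = gb + ...` once per part.
theme = parts.reduce(:+)
p theme.settings
```

Keeping each setting as its own small piece makes it easy to drop one conditionally, as the theme does for the panel background.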
558
+
559
+ ## Final Box Plot
560
+
561
+ Here is our final boxplot, without jitter.
562
+
563
+ ```{ruby final_box_plot}
564
+ R.png("figures/final_box_plot.png", width: 540, height: 560)
565
+
566
+ puts @bp + @decorations + CorpTheme.global_theme(faceted: true)
567
+
568
+ R.dev__off
569
+ ```
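The call above passes `faceted: true` in keyword style. In plain Ruby, whether the method receives `true` or a Hash depends on how its parameter is declared, and that matters because any Hash is truthy, even `{faceted: false}`. The two methods below are hypothetical stand-ins that only show the argument binding; they are not Galaaz code.

```ruby
# Hypothetical stand-ins, not part of Galaaz: they only show how Ruby binds arguments.

# Positional optional parameter: a keyword-style call arrives as a Hash.
def positional_flag(faceted = false)
  faceted
end

# Keyword parameter: a keyword-style call arrives as the boolean itself.
def keyword_flag(faceted: false)
  faceted
end

received = positional_flag(faceted: false)
puts received.is_a?(Hash)         # the flag was wrapped in a Hash
puts received ? "truthy" : "falsy"

puts keyword_flag(faceted: false) ? "truthy" : "falsy"
```

With a positional parameter, a condition such as `if !faceted` can never fire for a keyword-style call, so declaring the parameter as a keyword is the safer match for calls written as `faceted: true`.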
570
+
571
+ ![](figures/final_box_plot.png)
572
+
573
+ ## Final Violin Plot
574
+
575
+ Here is the final violin plot, with jitter and the same look and feel as the corporate
576
+ boxplot.
577
+
578
+ ```{ruby final_violin_plot}
579
+ R.png("figures/final_violin_plot.png", width: 540, height: 560)
580
+
581
+ puts @violin + @decorations + CorpTheme.global_theme(faceted: true)
582
+
583
+ R.dev__off
584
+ ```
585
+
586
+ ![](figures/final_violin_plot.png)
587
+
588
+ ## Another View
589
+
590
+ Finally, here is a last plot, with the same look and feel as before but faceted by
591
+ dose and not by supplement.
592
+
593
+ ```{ruby facet_by_dose}
594
+ R.png("figures/facet_by_dose.png", width: 540, height: 560)
595
+
596
+ caption = <<-EOT
597
+ Length of odontoblasts in 60 guinea pigs.
598
+ Each animal received one of three dose levels of vitamin C.
599
+ EOT
600
+
601
+ @bp = @tooth_growth.ggplot(E.aes(x: :supp, y: :len, group: :supp)) +
602
+ R.geom_boxplot(E.aes(fill: :supp)) + R.facet_grid(+:all =~ +:dose) +
603
+ R.scale_fill_manual(values: R.c("cyan", "deepskyblue4")) +
604
+ R.labs(title: "Tooth Growth: Length by Dose",
605
+ subtitle: "Faceted by dose",
606
+ x: "Delivery method", y: "Teeth length",
607
+ caption: caption) +
608
+ CorpTheme.global_theme(faceted: true)
609
+ puts @bp
610
+
611
+ R.dev__off
612
+ ```
613
+
614
+ ![](figures/facet_by_dose.png)
615
+
616
+ # Conclusion
617
+
618
+ Galaaz tightly couples Ruby and R in a way that Ruby developers do not need to be aware
619
+ of the executing R engine. For the Ruby developer the existence of R
620
+ is of no consequence: she is simply coding in Ruby. On the other hand, for the R
621
+ developer, migration to Ruby is a matter of small syntactic changes and a very gentle
622
+ learning curve. As the R developer becomes more proficient in Ruby, he can start using
623
+ 'classes', 'modules', 'procs', 'lambdas'.
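For readers coming from R, here is a small plain-Ruby sketch of those four constructs working together. It makes no Galaaz calls, and every name in it (`Captions`, `PlotSpec`, `as_text`) is made up for illustration.

```ruby
# Plain Ruby, no Galaaz calls; all names here are made up for illustration.
module Captions
  # A lambda: an anonymous function stored in a constant.
  JOIN = ->(lines) { lines.map(&:strip).join(" ") }

  def self.build(*lines)
    JOIN.call(lines)
  end
end

class PlotSpec
  attr_reader :title, :caption

  def initialize(title:, caption:)
    @title = title
    @caption = caption
  end

  # A block lets the caller decide how the spec is rendered.
  def render
    yield title, caption
  end
end

spec = PlotSpec.new(
  title: "Tooth Growth: Length by Dose",
  caption: Captions.build("Length of odontoblasts in 60 guinea pigs. ",
                          " Each animal received one of three dose levels of vitamin C.")
)

# A proc created from a block, passed to the method with `&`.
as_text = proc { |title, caption| "#{title} (#{caption})" }
puts spec.render(&as_text)
```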
624
+
625
+ This coupling shows the power of the GraalVM and Truffle polyglot environment. Trying to
626
+ bring to Ruby the power of R starting from scratch is an enormous endeavour and would
627
+ probably never be accomplished. Today's data scientists would certainly stick with either
628
+ Python or R. Now, both the Ruby and R communities might benefit from this marriage. Also,
629
+ the process used to couple Ruby and R can also be used to couple Ruby and JavaScript and
630
+ maybe also Ruby and Python. In a polyglot world a *uniglot* language might be extremely
631
+ relevant.
632
+
633
+ From the perspective of performance, GraalVM and Truffle promise improvements that could
634
+ reach over 10 times, both for [FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
635
+ and for [TruffleRuby](https://rubykaigi.org/2018/presentations/eregontp.html).
636
+
637
+ This article has shown how to improve a plot step-by-step. Starting from a very simple
638
+ boxplot with all default configurations, we moved slowly to our final plot. The important
639
+ point here is not whether the final plot is actually beautiful, but that there is a process
640
+ of small, incremental improvements that can be followed until getting a final plot ready for
641
+ presentation.
642
+
643
+ Finally, this whole article was written in rmarkdown and compiled to HTML by _gknit_, an
644
+ application that wraps _knitr_ and allows documenting Ruby code. This application can
645
+ be of great help for any Rubyist trying to write articles, blogs or documentation for Ruby.
646
+
647
+
648
+ # Installing Galaaz
649
+
650
+ ## Prerequisites
651
+
652
+ * GraalVM (>= rc8): https://github.com/oracle/graal/releases
653
+ * TruffleRuby
654
+ * FastR
655
+
656
+ The following R packages will be automatically installed when necessary, but could be installed prior
657
+ to using gKnit if desired:
658
+
659
+ * ggplot2
660
+ * gridExtra
661
+ * knitr
662
+
663
+ Installation of R packages requires a development environment and can be time consuming. In Linux,
664
+ the GNU compiler and tools should be enough. I am not sure what is needed on the Mac.
665
+
666
+ ## Preparation
667
+
668
+ * gem install galaaz
669
+
670
+ ## Usage
671
+
672
+ * gknit <filename>
673
+ * In a script add: require 'galaaz'
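Since Galaaz depends on TruffleRuby running on GraalVM, a script can check which Ruby implementation is executing before requiring the gem. The sketch below relies only on the standard `RUBY_ENGINE` constant; `truffleruby?` is a hypothetical helper name, not a Galaaz method.

```ruby
# Hypothetical pre-flight check; `truffleruby?` is not a Galaaz method.
# RUBY_ENGINE is a standard Ruby constant naming the running implementation.
def truffleruby?(engine = RUBY_ENGINE)
  engine == "truffleruby"
end

if truffleruby?
  puts "Running on TruffleRuby; `require 'galaaz'` should work once the gem is installed."
else
  puts "Running on #{RUBY_ENGINE}; Galaaz needs TruffleRuby on GraalVM."
end
```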
674
+
675
+
676
+ And now that you’ve read this far, here’s how to submit your story to the freeCodeCamp
677
+ publication: send an email to submit at freecodecamp org. Include the URL for your story on
678
+ Medium (preferably an unpublished draft) and the word “bananas” so that we’ll know that you
679
+ have read all this. Only send one story URL per email. There’s no need to add anything
680
+ further to your email — we just read the stories and judge them based on their own merits.