galaaz 0.4.2 → 0.4.5

Files changed (114)
  1. checksums.yaml +4 -4
  2. data/LICENSE +25 -0
  3. data/Rakefile +8 -0
  4. data/bin/gknit +9 -5
  5. data/bin/gstudio +4 -2
  6. data/bin/gstudio.rb +32 -2
  7. data/blogs/dev/dev.html +219 -34
  8. data/blogs/dev/dev.md +26 -26
  9. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  10. data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
  11. data/blogs/dplyr/dplyr.rb +63 -0
  12. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +38 -26
  13. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +16 -17
  14. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  15. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +65 -31
  16. data/blogs/oh_my/not_so.rb +2342 -0
  17. data/blogs/oh_my/oh_my.Rmd +493 -0
  18. data/blogs/oh_my/oh_my.html +680 -0
  19. data/blogs/oh_my/oh_my.md +597 -0
  20. data/blogs/oh_my/old.Rmd +2100 -0
  21. data/blogs/ruby_plot/figures/facets_with_decorations.png +0 -0
  22. data/blogs/ruby_plot/figures/facets_with_jitter.png +0 -0
  23. data/blogs/ruby_plot/figures/final_box_plot.png +0 -0
  24. data/blogs/ruby_plot/figures/final_violin_plot.png +0 -0
  25. data/blogs/ruby_plot/figures/violin_with_jitter.png +0 -0
  26. data/blogs/ruby_plot/ruby_plot.Rmd +147 -122
  27. data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +662 -0
  28. data/blogs/ruby_plot/ruby_plot.html +49 -54
  29. data/blogs/ruby_plot/ruby_plot.md +147 -122
  30. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  31. data/blogs/ruby_plot/ruby_plot.tex +776 -157
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +57 -0
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +106 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +110 -0
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +174 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +236 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +296 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +236 -0
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +218 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +128 -0
  42. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +150 -0
  43. data/examples/islr/ch2.spec.rb +21 -18
  44. data/examples/islr/ch3_boston.rb +14 -5
  45. data/examples/islr/ch3_multiple_regression.rb +2 -3
  46. data/examples/islr/ch6.spec.rb +1 -1
  47. data/examples/islr/x_y_rnorm.jpg +0 -0
  48. data/lib/R_interface/r.rb +14 -10
  49. data/lib/R_interface/r_libs.R +9 -0
  50. data/lib/R_interface/r_methods.rb +77 -6
  51. data/lib/R_interface/{expression.rb → r_module_s.rb} +13 -14
  52. data/lib/R_interface/rbinary_operators.rb +58 -71
  53. data/lib/R_interface/rdata_frame.rb +2 -1
  54. data/lib/R_interface/rdevices.R +4 -0
  55. data/lib/R_interface/rdevices.rb +1 -1
  56. data/lib/R_interface/renvironment.rb +34 -1
  57. data/lib/R_interface/rexpression.rb +108 -2
  58. data/lib/R_interface/rindexed_object.rb +3 -1
  59. data/lib/R_interface/rlanguage.rb +18 -2
  60. data/lib/R_interface/rmatrix.rb +14 -0
  61. data/lib/R_interface/rmd_indexed_object.rb +5 -1
  62. data/lib/R_interface/robject.rb +61 -23
  63. data/lib/R_interface/rsupport.rb +111 -53
  64. data/lib/R_interface/rsymbol.rb +6 -5
  65. data/lib/R_interface/ruby_extensions.rb +130 -4
  66. data/lib/R_interface/runary_operators.rb +35 -3
  67. data/lib/R_interface/rvector.rb +1 -0
  68. data/lib/galaaz.rb +0 -2
  69. data/lib/gknit/knitr_engine.rb +58 -4
  70. data/lib/gknit/ruby_engine.rb +5 -6
  71. data/lib/util/exec_ruby.rb +55 -9
  72. data/specs/all.rb +13 -3
  73. data/specs/figures/dose_len.png +0 -0
  74. data/specs/r_dataframe.spec.rb +49 -26
  75. data/specs/r_environment.spec.rb +140 -0
  76. data/specs/r_eval.spec.rb +0 -15
  77. data/specs/r_formula.spec.rb +232 -0
  78. data/specs/r_function.spec.rb +7 -8
  79. data/specs/r_list.spec.rb +4 -0
  80. data/specs/r_list_apply.spec.rb +11 -11
  81. data/specs/r_matrix.spec.rb +3 -3
  82. data/specs/{r_plots.spec.rb~ → r_nse.spec.rb} +29 -6
  83. data/specs/r_vector_creation.spec.rb +6 -0
  84. data/specs/r_vector_object.spec.rb +2 -2
  85. data/specs/r_vector_operators.spec.rb +3 -3
  86. data/specs/r_vector_subsetting.spec.rb +4 -4
  87. data/specs/ruby_expression.spec.rb +324 -0
  88. data/specs/tmp.rb +12 -524
  89. data/sty/galaaz.sty +71 -0
  90. data/version.rb +1 -1
  91. metadata +31 -41
  92. data/bin/gknit2~ +0 -6
  93. data/bin/ogk~ +0 -4
  94. data/bin/prepareR.rb~ +0 -1
  95. data/blogs/dev/dev.Rmd~ +0 -104
  96. data/blogs/galaaz_ggplot/galaaz_ggplot.dvi +0 -0
  97. data/blogs/galaaz_ggplot/midwest_external_png~ +0 -1
  98. data/blogs/gknit/gknit.Rmd~ +0 -184
  99. data/blogs/gknit/gknit.Rnd~ +0 -17
  100. data/blogs/gknit/model.rb~ +0 -46
  101. data/blogs/ruby_plot/ruby_plot.Rmd~ +0 -215
  102. data/examples/islr/Figure.jpg +0 -0
  103. data/examples/misc/moneyball.rb~ +0 -16
  104. data/examples/misc/subsetting.rb~ +0 -372
  105. data/lib/R/eng_ruby.R~ +0 -63
  106. data/lib/R_interface/capture_plot.rb~ +0 -23
  107. data/lib/R_interface/r.rb~ +0 -121
  108. data/lib/R_interface/rdevices.rb~ +0 -27
  109. data/lib/gknit.rb~ +0 -26
  110. data/lib/gknit/knitr_engine.rb~ +0 -102
  111. data/lib/gknit/ruby_engine.rb~ +0 -72
  112. data/lib/util/inline_file.rb~ +0 -23
  113. data/r_requires/knitr.rb~ +0 -4
  114. data/specs/r_language.spec.rb +0 -157
@@ -10,6 +10,7 @@
 
 
  <meta name="author" content="Rodrigo Botafogo" />
+ <meta name="author" content="Daniel Mossé - University of Pittsburgh" />
 
 
  <title>How to make Beautiful Ruby Plots with Galaaz</title>
@@ -123,16 +124,15 @@ $(document).ready(function () {
 
  <h1 class="title toc-ignore">How to make Beautiful Ruby Plots with Galaaz</h1>
  <h4 class="author"><em>Rodrigo Botafogo</em></h4>
+ <h4 class="author"><em>Daniel Mossé - University of Pittsburgh</em></h4>
  <h4 class="date"><em>November 19th, 2018</em></h4>
 
  </div>
 
 
- <div id="introduction" class="section level1">
- <h1>Introduction</h1>
- <p>According to Wikipedia “Ruby is a dynamic, interpreted, reflective, object-oriented, general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro”Matz&quot; Matsumoto in Japan.&quot; It reached high popularity with the development of Ruby on Rails (RoR) by David Heinemeier Hansson. RoR is a web application framework first released around 2005. It makes extensive use of Ruby’s metaprogramming features. With RoR, Ruby became very popular. According to <a href="https://www.tiobe.com/tiobe-index/ruby/">Ruby’s Tiobe index</a> it peeked in popularity around 2008. Then it’s popularity declined until 2015 when it started picking up again. At the time of this writing (November 2018), the Tiobe index puts ruby in 16th position.</p>
- <p>Python, a similar language to Ruby, ranks 4th in the index. Java, C and C++ take the first three positions. Ruby is often criticized for its focus on web applications. But Ruby can do <a href="https://github.com/markets/awesome-ruby">much more</a> than just web applications. Yet, for scientific computing, Ruby lags way behind Python and R. Python has Django framework for web, NumPy for numerical arrays, Pandas for data analysis. R is a free software environment for statistical computing and graphics with thousands of libraries for data analysis.</p>
- <p>Until recently, there was no real perspective for Ruby to bridge this gap. Implementing a complete scientific computing infrastructure would take too long. Comes GraalVM into the picture:</p>
+ <p>According to Wikipedia “Ruby is a dynamic, interpreted, reflective, object-oriented, general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro”Matz&quot; Matsumoto in Japan.&quot; It reached high popularity with the development of Ruby on Rails (RoR) by David Heinemeier Hansson. RoR is a web application framework first released around 2005. It makes extensive use of Ruby’s metaprogramming features. With RoR, Ruby became very popular. According to <a href="https://www.tiobe.com/tiobe-index/ruby/">Ruby’s Tiobe index</a> it peeked in popularity around 2008, then declined until 2015 when it started picking up again. At the time of this writing (November 2018), the Tiobe index puts Ruby in 16th position as most popular language.</p>
+ <p>Python, a language similar to Ruby, ranks 4th in the index. Java, C and C++ take the first three positions. Ruby is often criticized for its focus on web applications. But Ruby can do <a href="https://github.com/markets/awesome-ruby">much more</a> than just web applications. Yet, for scientific computing, Ruby lags way behind Python and R. Python has Django framework for web, NumPy for numerical arrays, Pandas for data analysis. R is a free software environment for statistical computing and graphics with thousands of libraries for data analysis.</p>
+ <p>Until recently, there was no real perspective for Ruby to bridge this gap. Implementing a complete scientific computing infrastructure would take too long. Enters <a href="https://www.graalvm.org/">Oracle’s GraalVM</a>:</p>
  <blockquote>
  <p>GraalVM is a universal virtual machine for running applications written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin, and LLVM-based languages such as C and C++.</p>
  <p>GraalVM removes the isolation between programming languages and enables interoperability in a shared runtime. It can run either standalone or in the context of OpenJDK, Node.js, Oracle Database, or MySQL.</p>
@@ -145,14 +145,13 @@ $(document).ready(function () {
  </ul>
  <p>With GraalVM we aim to allow developers to freely choose the right language for the task at hand without making compromises.</p>
  </blockquote>
- <p>As stated above, GraalVM is a <em>universal</em> virtual machine that allows Ruby and R (and other languages) to run on the same environment. GraalVM allows polyglot applications to <em>seamlessly</em> interact with one another and pass values from one language to the other. Galaaz, a gem for Ruby, intends to tightly couple Ruby and R and allow those languages to interact in a way that the user will be unaware of such interaction.</p>
- <p>Library wrapping is an usual way of bringing features from one language into another. To improve performance, Python often wraps more efficient C libraries. For the Python developer, the existence of such C libraries is of no concern. The problem with library wrapping is that for any new library, there is the need to handcraft a new wrapper.</p>
- <p>Galaaz, instead of wrapping a single C or R library, wraps the whole of the R language in Ruby. Doing so, all thousands of R libraries are available to Ruby developers. Also any new library developed in R will be available without a new wrapping effort.</p>
- <p>This article shows how Ruby can use R’s ggplot2 library tranparantly, and bring to Ruby the power of high quality scientific plotting. it also shows that migrating from R to Ruby with Galaaz is a matter of small syntactic changes. Using Ruby, the R developer can use all of Ruby’s powerful OO features. It also becomes much easier to move code from the analysis phase to the production phase.</p>
- <p>In this article we will explore the R ToothGrowth dataset. In doing so, we will create some boxplots. A primer on boxplot is available in <a href="https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51">this article</a>.</p>
- <p>We will also create a Corporate Template ensuring that plots will have a consistent visualization. This template is build using a Ruby module. There is a way of building ggplot themes that will work the same as the Ruby module. Yet, writing a new theme requires specific knowledge. Ruby modules are standard to the language and don’t need special knowledge.</p>
- <p>In <a href="https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021">this blog</a> we show a scatter plot in Ruby also with Galaaz.</p>
- </div>
+ <p>As stated above, GraalVM is a <em>universal</em> virtual machine that allows Ruby and R (and other languages) to run on the same environment. GraalVM allows polyglot applications to <em>seamlessly</em> interact with one another and pass values from one language to the other. Although a great idea, GraalVM still requires application writers to know several languages. To eliminate that requirement, we built Galaaz, a gem for Ruby, to tightly couple Ruby and R and allow those languages to interact in a way that the user will be unaware of such interaction. In other words, a Ruby programmer will be able to use all the capabilities of R without knowing the R syntax.</p>
+ <p>Library wrapping is a usual way of bringing features from one language into another. To improve performance, Python often wraps more efficient C libraries. For the Python developer, the existence of such C libraries is hidden. The problem with library wrapping is that for any new library, there is the need to handcraft a new wrapper.</p>
+ <p>Galaaz, instead of wrapping a single C or R library, wraps the whole R language in Ruby. Doing so, all thousands of R libraries are available immediately to Ruby developers without any new wrapping effort.</p>
+ <p>To show the power of Galaaz, we show in this article how Ruby can use R’s ggplot2 library tranparantly bringing to Ruby the power of high quality scientific plotting. We also show that migrating from R to Ruby with Galaaz is a matter of small syntactic changes. By using Ruby, the R developer can use all of Ruby’s powerful object-oriented features. Also, with Ruby, it becomes much easier to move code from the analysis phase to the production phase.</p>
+ <p>In this article we will explore the R ToothGrowth dataset. To illustrate, we will create some boxplots. A primer on boxplot is available in <a href="https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51">this article</a>.</p>
+ <p>We will also create a Corporate Template ensuring that plots will have a consistent visualization. This template is built using a Ruby module. There is a way of building ggplot themes that will work the same as the Ruby module. Yet, writing a new theme requires specific knowledge on theme writing. Ruby modules are standard to the language and don’t need special knowledge.</p>
+ <p><a href="https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021">Here</a> we show a scatter plot in Ruby also with Galaaz.</p>
  <div id="gknit" class="section level1">
  <h1>gKnit</h1>
  <p><em>Knitr</em> is an application that converts text written in rmarkdown to many different output formats. For instance, a writer can convert an rmarkdown document to HTML, <span class="math inline">\(LaTex\)</span>, docx and many other formats. Rmarkdown documents can contain text and <em>code chunks</em>. Knitr formats code chunks in a grayed box in the output document. It also executes the code chunks and formats the output in a white box. Every line of output from the execution code is preceded by ‘##’.</p>
@@ -161,8 +160,8 @@ $(document).ready(function () {
  </div>
  <div id="exploring-the-dataset" class="section level1">
  <h1>Exploring the Dataset</h1>
- <p>Let’s start by exploring our selected dataset. ToothGrowth is an R dataset. A dataset is like an excel spreadsheet, but in which each column has only one type of data. For instance one column can have float, the other integer, and a third strings. This dataset analyses the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs, where each animal received one of three dose levels of Vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid (a form of vitamin C and coded as VC).</p>
- <p>The ToothGrowth dataset contains three columns: ‘len’, ‘supp’ and ‘dose’. Let’s take a look at a few rows of this dataset. In Galaaz, to have access to an R variable we use the corresponding Ruby symbol preceeded by the tilda (‘~’) function. Note in the following chunk that Ruby’s ‘<span class="citation">@tooth_growth</span>’ is assigned the value of ‘~:ToothGrowth’. ‘ToothGrowth’ is the R variable containing the dataset of interest.</p>
+ <p>Let’s start by exploring our selected dataset. ToothGrowth is an R dataset. A dataset is like a simple excel spreadsheet, in which each column has only one type of data. For instance one column can have float, the other integer, and a third strings. This dataset analyzes the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs, where each animal received one of three dose levels of Vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice OJ or ascorbic acid (a form of vitamin C and coded as VC).</p>
+ <p>The ToothGrowth dataset contains three columns: ‘len’, ‘supp’ and ‘dose’. Let’s take a look at a few rows of this dataset. In Galaaz, R variables are accessed by using the corresponding Ruby symbol preceeded by the tilda (‘~’) function. Note in the following chunk that ‘ToothGrowth’ is the R variable and Ruby’s ‘<span class="citation">@tooth_growth</span>’ is assigned the value of ‘~:ToothGrowth’.</p>
  <pre class="ruby"><code># Read the R ToothGrowth variable and assign it to the
  # Ruby instance variable @tooth_growth that will be
  # available to all Ruby chunks in this document.
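The paragraph in this hunk says Galaaz reads an R variable by applying the `~` function to a Ruby symbol (`~:ToothGrowth`). A minimal pure-Ruby sketch of how that syntax can work, assuming a hash (`R_GLOBALS`, hypothetical) stands in for R's global environment. Galaaz's real lookup happens in R and is not reproduced here; the sketch only shows that `~:name` is an ordinary unary-operator call on a `Symbol`.

```ruby
# Hypothetical stand-in for R's global environment; in Galaaz the lookup
# happens in R itself, which this sketch does not attempt to reproduce.
R_GLOBALS = { ToothGrowth: "ToothGrowth data frame (stand-in)" }.freeze

class Symbol
  # Unary tilde: ~:name looks the name up in the stand-in environment.
  # This reopens Symbol globally, which is acceptable only in a sketch.
  def ~
    R_GLOBALS.fetch(self) { raise NameError, "object '#{self}' not found" }
  end
end

tooth_growth = ~:ToothGrowth
puts tooth_growth   # prints: ToothGrowth data frame (stand-in)
```

Raising `NameError` for an unknown name mirrors the idea that a missing R variable is an error rather than a silent `nil`.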
@@ -181,14 +180,14 @@ puts @tooth_growth.head</code></pre>
  # elements of this column with the 'head' method.
  puts @tooth_growth.len.head</code></pre>
  <pre><code>## [1] 4.2 11.5 7.3 5.8 6.4 10.0</code></pre>
- <p>The ‘dose’ column contains a numeric value wiht either, 0.5, 1 or 2. Although those are number, they are better interpreted as a <a href="https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/">factor or cathegory</a>. So, let’s convert our ‘dose’ column from numeric to ‘factor’. In R, the function ‘as.factor’ is used to convert data in a vector to factors. To use this function from Galaaz the dot (‘.’) in the function name is substituted by ’__‘(double underline). The function ’as.factor’ becomes ’R.as__factor’ or just ’as__factor’ when chaining.</p>
+ <p>The ‘dose’ column contains a numeric value with either, 0.5, 1 or 2, although the first 6 rows as seen above only contain the 0.5 values. Even though those are number, they are better interpreted as a <a href="https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/">factor or cathegory</a>. So, let’s convert our ‘dose’ column from numeric to ‘factor’. In R, the function ‘as.factor’ is used to convert data in a vector to factors. To use this function from Galaaz the dot (‘.’) in the function name is substituted by ’__‘(double underline). The function ’as.factor’ becomes ’R.as__factor’ or just ’as__factor’ when chaining.</p>
  <pre class="ruby"><code># convert the dose to a factor
  @tooth_growth.dose = @tooth_growth.dose.as__factor</code></pre>
  <p>Let’s explore some more details of this dataset. In particular, let’s look at its dimensions, structure and summary statistics.</p>
  <pre class="ruby"><code>puts @tooth_growth.dim</code></pre>
  <pre><code>## [1] 60 3</code></pre>
  <p>This dataset has 60 rows, one for each subject and 3 columns, as we have already seen.</p>
- <p>Note that we do not call ‘puts’ when using the ‘str’ function. This functions does not return anything and prints the structure of the dataset as a side effect.</p>
+ <p>Note that we do not need to call ‘puts’ when using the ‘str’ function. This functions does not return anything and prints the structure of the dataset as a side effect.</p>
  <pre class="ruby"><code>@tooth_growth.str</code></pre>
  <pre><code>## 'data.frame': 60 obs. of 3 variables:
  ## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
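The `as.factor` paragraph in this hunk states the naming rule: a `.` in an R function name is written as `__` on the Ruby side, so `as.factor` becomes `as__factor`. A minimal sketch of that translation, assuming `method_missing` as the dispatch mechanism; `r_function_name` and `RCallRecorder` are hypothetical names, and the recorder only returns what would be sent to R instead of calling it.

```ruby
# Hypothetical helper: map a Ruby method name to the R function name by
# turning every "__" into ".", e.g. as__factor -> as.factor.
def r_function_name(ruby_name)
  ruby_name.to_s.gsub("__", ".")
end

# Hypothetical stand-in for Galaaz's R object: instead of calling R it
# returns the translated function name together with the arguments.
class RCallRecorder
  def method_missing(name, *args)
    [r_function_name(name), args]
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

r = RCallRecorder.new
p r_function_name(:as__factor)   # prints "as.factor"
p r.as__factor([0.5, 1, 2])      # prints ["as.factor", [[0.5, 1, 2]]]
p r.dev__off                     # prints ["dev.off", []]
```

Single underscores are left untouched, which is why a name such as `geom_boxplot` keeps its spelling on both sides.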
@@ -209,13 +208,13 @@ puts @tooth_growth.len.head</code></pre>
  <h1>Doing the Data Analysis</h1>
  <div id="quick-plot-for-seing-the-data" class="section level2">
  <h2>Quick plot for seing the data</h2>
- <p>Let’s now create our first plot with the given data by accessing ggplot2 from Ruby. For Rubyist that have never seen or used ggplot2, here is the description of ggplot found on its home page:</p>
+ <p>Let’s now create our first plot with the given data by accessing ggplot2 from Ruby. For Rubyists that have never seen or used ggplot2, here is the description of ggplot found in its home page:</p>
  <blockquote>
  <p>“ggplot2 is a system for declaratively creating graphics, based on <em>The Grammar of Graphics</em>. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.”</p>
  </blockquote>
  <p>This description might be a bit cryptic and it is best to see it at work to understand it. Basically, in the <em>grammar of graphics</em> developers add layers of components such as grid, axis, data, title, subtitle and also graphical primitives such as <em>bar plot</em>, <em>box plot</em>, to form the final graphics.</p>
- <p>In order to make a plot, we use the ‘ggplot’ function to the dataset. In R, this would be written as <code>ggplot(&lt;dataset&gt;, ...)</code>. In Galaaz, use either <code>R.ggplot(&lt;dataset&gt;, ...)</code>, or <code>&lt;dataset&gt;.ggplot(...)</code>. In the graph specification bellow, we use the second notation that looks more Ruby like. The plot specifies the ‘dose’ on the <span class="math inline">\(x\)</span> axis and the ‘length’ on the <span class="math inline">\(y\)</span> axis with the aes’ method. ‘E.aes(x: :dose, y: :len)’. To specify the type of plot to create add a geom to the plot. For a boxplot, the geom is R.geom_boxplot.</p>
- <p>Note also that we have a call to ‘R.png’ before plotting and ’R.dev__off’ after the print statement. ‘R.png’ opens a ‘png device for outputting the plot. ’R.dev__off’ closes the device and creates the ‘png’ file. If we do no pass a name to the ‘png’ function, the image gets a default name of ‘Rplot&lt;nnn&gt;’ where &lt;nnn&gt; is the number of the plot. We can then include the generated ‘png’ file in the document by adding an rmarkdown directive.</p>
+ <p>In order to make a plot, we use the ‘ggplot’ function to the dataset. In R, this would be written as <code>ggplot(&lt;dataset&gt;, ...)</code>. Galaaz gives you the flexibility to use either <code>R.ggplot(&lt;dataset&gt;, ...)</code> or <code>&lt;dataset&gt;.ggplot(...)</code>. In the graph s pecification bellow, we use the second notation that looks more like Ruby. ggplot uses the ‘aes’ method to specify x and y axes; in this case, the ‘dose’ on the <span class="math inline">\(x\)</span> axis and the ‘length’ on the <span class="math inline">\(y\)</span> axis: ‘E.aes(x: :dose, y: :len)’. To specify the type of plot add a geom to the plot. For a boxplot, the geom is R.geom_boxplot.</p>
+ <p>Note also that we have a call to ‘R.png’ before plotting and ’R.dev__off’ after the print statement. ‘R.png’ opens a ‘png device for outputting the plot. If we do no pass a name to the ‘png’ function, the image gets a default name of ‘Rplot&lt;nnn&gt;’ where &lt;nnn&gt; is the number of the plot. ’R.dev__off’ closes the device and creates the ‘png’ file. We can then include the generated ‘png’ file in the document by adding an rmarkdown directive.</p>
  <pre class="ruby"><code>require 'ggplot'
 
  R.png(&quot;figures/dose_len.png&quot;)
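The paragraphs in this hunk describe a fixed protocol: open a png device with `R.png`, print the plot, then close the device with `R.dev__off` so the file is written. In Ruby that open/close pair is naturally expressed with a block and `ensure`, so the device is closed even if plotting raises. A sketch under stated assumptions: `with_png` is a hypothetical helper, and `FakeRDevices` is a stub standing in for Galaaz's `R` object so the example runs without GraalVM.

```ruby
# Hypothetical stand-in for Galaaz's R object: it only records the calls
# that a real session would make to R.png and R.dev__off.
class FakeRDevices
  attr_reader :log

  def initialize
    @log = []
  end

  def png(path)
    @log << [:png, path]
  end

  def dev__off
    @log << [:dev_off]
  end
end

# Open a png device, run the plotting block, and always close the device,
# even if the block raises.
def with_png(r, path)
  r.png(path)
  begin
    yield
  ensure
    r.dev__off
  end
end

r = FakeRDevices.new
with_png(r, "figures/dose_len.png") { r.log << [:plot] }
p r.log   # prints [[:png, "figures/dose_len.png"], [:plot], [:dev_off]]
```

With the real `R` object, the block body would be the `puts` of the ggplot object.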
@@ -228,12 +227,12 @@ R.dev__off</code></pre>
  <img src="" />
 
  </div>
- <p>Great! We’ve just managed to create and save our first plot in Ruby with only four lines of code. We can see with this plot a clear trend: as the dose of the supplement is increased, so is the length of teeth.</p>
+ <p>Great! We’ve just managed to create and save our first plot in Ruby with only four lines of code. We can now easily see with this plot a clear trend: as the dose of the supplement is increased, so is the length of teeth.</p>
  </div>
  <div id="facetting-the-plot" class="section level2">
  <h2>Facetting the plot</h2>
- <p>This first plot shows a trend, but our data has information about two different forms of delivery method, either by Orange Juice (OJ) or by Vitamin C (VC). Let’s then try to create a plot that explicits the effect of each delivery method. This next plot is a <em>facetted</em> plot where each delivery method gets is own plot. On the left side, the plot shows the OJ delivery method. On the right side, we see the VC delivery method. To obtain this plot, we use the ‘R.facet_grid’ function, that automatically creates the facets based on the delivery method factors. The parameter to the ‘facet_grid’ method is a <a href="https://thomasleeper.com/Rcourse/Tutorials/formulae.html"><em>formula</em></a>.</p>
- <p>In Galaaz, formulas are written a bit differently than in R. The following changes are necessary:</p>
+ <p>This first plot shows a trend, but our data has information about two different forms of delivery method, either by Orange Juice OJ or by Vitamin C VC. Let’s then try to create a plot that helps us discern the effect of each delivery method. This next plot is a <em>facetted</em> plot where each delivery method gets is own plot. On the left side, the plot shows the OJ delivery method. On the right side, we see the VC delivery method. To obtain this plot, we use the ‘R.facet_grid’ function, that automatically creates the facets based on the delivery method factors. The parameter to the ‘facet_grid’ method is a <a href="https://thomasleeper.com/Rcourse/Tutorials/formulae.html"><em>formula</em></a>.</p>
+ <p>In Galaaz we give programmers the flexibility to use two different ways to write formulas. In the first way, the following changes from writing formulas (for example ‘x ~ y’) in R are necessary:</p>
  <ul>
  <li>R symbols are represented by the same Ruby symbol prefixed with the ‘+’ method. The symbol <code>x</code> in R becomes <code>+:x</code> in Ruby;</li>
  <li>The ‘~’ operator in R becomes ‘=~’ in Ruby. The formula <code>x ~ y</code> in R is written as <code>+:x =~ +:y</code> in Ruby;</li>
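The list in this hunk gives the first formula notation: `+:x` for an R symbol and `=~` for R's `~`, so `x ~ y` is written `+:x =~ +:y`. A minimal sketch of how plain Ruby operator methods can support that notation, assuming the hypothetical `FormulaTerm` and `Formula` structs below; this is not Galaaz's implementation, and it only builds a value that prints in R syntax.

```ruby
# Hypothetical classes, not Galaaz's implementation.
FormulaTerm = Struct.new(:name) do
  # term =~ term builds a two-sided formula (R's `lhs ~ rhs`).
  def =~(other)
    Formula.new(self, other)
  end

  def to_s
    name.to_s
  end
end

Formula = Struct.new(:lhs, :rhs) do
  def to_s
    "#{lhs} ~ #{rhs}"
  end
end

class Symbol
  # Unary plus wraps a symbol as a formula term: +:x
  # (reopens Symbol globally; acceptable only in a sketch).
  def +@
    FormulaTerm.new(self)
  end
end

formula = +:x =~ +:y
puts formula   # prints: x ~ y
```

Ruby's precedence makes this work without parentheses: unary `+` binds tighter than `=~`, so the expression parses as `(+:x) =~ (+:y)`.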
@@ -260,7 +259,7 @@ R.dev__off</code></pre>
  </div>
  <div id="adding-color" class="section level2">
  <h2>Adding Color</h2>
- <p>If this paper was about data analysis, we should make a better analysis of the trends and should improve the statistical analysis. But we are interested in working with ggplot in Ruby. So, Let’s add some color to this plot to make the trend and comparison more visible. In the following plot, the boxes are color coded by dose. To add color, it is enough to add <code>fill: :dose</code> to the aesthetic of boxplot. With this command each ‘dose’ factor gets its own color.</p>
+ <p>If we were writing about data analysis, we would make a better analysis of the trends and improve the statistical analysis. But here we are interested in working with ggplot in Ruby. So, let’s add some color to this plot to make the trend and comparison more visible. In the following plot, the boxes are color coded by dose. To add color, it is enough to add <code>fill: :dose</code> to the aesthetic of boxplot. With this command each ‘dose’ factor gets its own color.</p>
  <pre class="ruby"><code>R.png(&quot;figures/facets_by_delivery_color.png&quot;)
 
  @bp = @bp + R.geom_boxplot(E.aes(fill: :dose))
@@ -271,11 +270,11 @@ R.dev__off</code></pre>
  <img src="" />
 
  </div>
- <p>Facetting helps us compare the general trends in the (OJ) and (VC) delivery methods. Adding color allow us to compare specifically how each dosage impacts the teeth growth. It is possible to observe that with smaller doses, up to 1mg, (OJ) performs better than (VC) (red color). For 2mg, both (OJ) and (VC) have the same median, but (OJ) is less disperse (blue color). For 1mg (green color), (OJ) is significantly bettern than (VC). By this very quick analysis, it seems that (OJ) is a better delivery method than (VC).</p>
+ <p>Facetting helps us compare the general trends for each delivery method. Adding color allow us to compare specifically how each dosage impacts the tooth growth. It is possible to observe that with smaller doses, up to 1mg, OJ performs better than VC (red color). For 2mg, both OJ and VC have the same median, but OJ is less disperse (blue color). For 1mg (green color), OJ is significantly bettern than VC. By this very quick visual analysis, it seems that OJ is a better delivery method than VC.</p>
  </div>
  <div id="clarifying-the-data" class="section level2">
  <h2>Clarifying the data</h2>
- <p>Boxplots give us a nice idea of the distribution of data, but looking at those plots with large colored boxes leaves us wondering what is going on on those boxes. According to Edward Tufte in Envisioning Information:</p>
+ <p>Boxplots give us a nice idea of the distribution of data, but looking at those plots with large colored boxes leaves us wondering what else is going on. According to Edward Tufte in Envisioning Information:</p>
  <blockquote>
  <p>Thin data rightly prompts suspicions: “What are they leaving out? Is that really everything they know? What are they hiding? Is that all they did?” Now and then it is claimed that vacant space is “friendly” (anthropomorphizing an inherently murky idea) but <em>it is not how much empty space there is, but rather how it is used. It is not how much information there is, but rather how effectively it is arranged.</em></p>
  </blockquote>
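The color paragraph in this hunk compares the medians and the dispersion of the boxes. In a standard boxplot the box runs from the first to the third quartile and the inner line marks the median. A small sketch that computes those three numbers with R's default `quantile` definition (`type = 7`, linear interpolation between order statistics at position `(n - 1) * p`), applied to the six `len` values from the `head` output shown earlier in this diff. `quantile7` is a hypothetical helper name.

```ruby
# Type-7 quantile (R's default for quantile()): linear interpolation
# between order statistics at the 0-based position h = (n - 1) * p.
def quantile7(values, p)
  x = values.sort
  h = (x.length - 1) * p
  lo = h.floor
  hi = [lo + 1, x.length - 1].min
  x[lo] + (h - lo) * (x[hi] - x[lo])
end

# The six 'len' values printed by @tooth_growth.len.head.
len_head = [4.2, 11.5, 7.3, 5.8, 6.4, 10.0]

q1     = quantile7(len_head, 0.25)
median = quantile7(len_head, 0.5)
q3     = quantile7(len_head, 0.75)
puts format("Q1=%.3f median=%.3f Q3=%.3f IQR=%.3f", q1, median, q3, q3 - q1)
# prints: Q1=5.950 median=6.850 Q3=9.325 IQR=3.375
```

A smaller interquartile range (`Q3 - Q1`) is what "less disperse" means when reading one box against another.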
@@ -283,7 +282,7 @@ R.dev__off</code></pre>
  <blockquote>
  <p>A most unconventional design strategy is revealed: <em>to clarify, add detail.</em></p>
  </blockquote>
- <p>Let’s then use this wisdom and add yet another layer of data to our plot, so that we clarify it with detail and do not leave large empty boxes. In this next plot, we add data points for each of the 60 pigs in the experiment. For that, add the function ‘R.geom_point’ to the plot.</p>
+ <p>Let’s use this wisdom and add yet another layer of data to our plot, so that we clarify it with detail and do not leave large empty boxes. In this next plot, we add data points for each of the 60 pigs in the experiment. For that, add the function ‘R.geom_point’ to the plot.</p>
  <pre class="ruby"><code>R.png(&quot;figures/facets_with_points.png&quot;)
 
  # Split in vertical direction
@@ -296,8 +295,8 @@ R.dev__off</code></pre>
  <img src="" />
 
  </div>
- <p>Now we can see the actual distribution of all the 60 subject. Actually, this is not totally true. We have a hard time seing all 60 subjects. It seems that some points might be placed one over the other hiding useful information.</p>
- <p>But no sweat! Another layer might solve the problem. In the following plot a new layer called ‘geom_jitter’ is added to the plot. This adds randomness to the position of the points, making it easier to see all of then and preventing data hiding. We also add color and change the shape of the points, making them even easier to see.</p>
298
+ <p>Now we can see the actual distribution of all the 60 subjects. Actually, this is not entirely true: we have a hard time seeing all 60 subjects. Some points seem to be placed on top of others, hiding useful information.</p>
299
+ <p>But no sweat! Another layer might solve the problem. In the following plot a new layer called ‘geom_jitter’ is added to the plot. Jitter adds a small amount of random variation to the location of each point, and is a useful way of handling overplotting caused by discreteness in smaller datasets. This makes it easier to see all of the points and prevents data hiding. We also add color and change the shape of the points, making them even easier to see.</p>
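The idea behind jitter can be sketched in plain Ruby. This is not ggplot2's implementation, only an illustration under simple assumptions. Each x position gets a uniform random offset in [-width, width], so points that share the same discrete x value stop landing on top of each other. The `jitter` helper and its `width` default of 0.2 are hypothetical choices made for this sketch. ggplot2's `geom_jitter` also has `width` and `height` parameters, so it can move points vertically as well.

```ruby
# Hypothetical helper, not ggplot2 code: shifts every x position by a
# uniform random offset in [-width, width]. Only x is jittered here;
# the y values (the measurements) are left untouched.
def jitter(xs, width: 0.2, rng: Random.new)
  xs.map { |x| x + rng.rand(-width..width) }
end

rng = Random.new(42)        # fixed seed so the example is reproducible
doses = [1, 1, 1, 2, 2, 2]  # discrete x positions, e.g. factor levels
jittered = jitter(doses, width: 0.2, rng: rng)
jittered.each { |x| puts x.round(3) }
```

Passing the random generator in as a parameter keeps the function deterministic under a fixed seed, which makes jittered figures reproducible.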
301
300
  <pre class="ruby"><code>R.png(&quot;figures/facets_with_jitter.png&quot;)
302
301
 
303
302
  # Split in vertical direction
@@ -305,7 +304,7 @@ puts @bp + R.geom_jitter(shape: 23, color: &quot;cyan3&quot;, size: 1)
305
304
 
306
305
  R.dev__off</code></pre>
307
306
  <div class="figure">
308
- <img src="" />
307
+ <img src="" />
309
308
 
310
309
  </div>
311
310
  <p>Now we can see all 60 points in the graph. The plot has a much higher information density, and we can see the outliers and the distribution of the subjects.</p>
@@ -313,12 +312,12 @@ R.dev__off</code></pre>
313
312
  </div>
314
313
  <div id="preparing-the-plot-for-presentation" class="section level1">
315
314
  <h1>Preparing the Plot for Presentation</h1>
316
- <p>We have come a long way since our first plot. As was already said, this is not an article about data analysis and the focus is on the integration of Ruby and ggplot. So, let’s assume that the analysis is now done. Yet, ending the analysis does not mean that the work is done. On the contrary, the hardest part is yet to come!</p>
315
+ <p>We have come a long way since our first plot. As we already said, this is not an article about data analysis and the focus is on the integration of Ruby and ggplot. So, let’s assume that the analysis is now done. Yet, ending the analysis does not mean that the work is done. On the contrary, the hardest part is yet to come!</p>
317
316
  <p>After the analysis it is necessary to communicate it by making a final plot for presentation. The last plot has all the information we want to share, but it is not very pleasing to the eye.</p>
318
317
  <div id="improving-colors" class="section level2">
319
318
  <h2>Improving Colors</h2>
320
319
  <p>Let’s start by trying to improve colors. For now, we will not use the jitter layer. The previous plot has three bright colors that have no relationship between them. Is there any interpretation, obvious or not, for these colors? Clearly, they are just arbitrary colors selected automatically by our software. Although those colors helped us understand the data, in a final presentation arbitrary colors can distract the viewer.</p>
321
- <p>In the following plot we use shades function ‘scale_fill_manual’ to change the colors of the boxes and order of labels. For colors we use shades of blue for each dosage, with light blue (‘cyan’) representing the lower dose and deep blue (‘deepskyblue4’) the higher dose. Also the smaller value (0.5) is on the botton of the labels and (2) at the top. This ordering seems more natural and matches with the actual order of the colors in the plot.</p>
320
+ <p>In the following plot we use the function ‘scale_fill_manual’ to change the colors of the boxes and the order of the labels. For colors, we use a shade of blue for each dosage, with light blue (‘cyan’) representing the lowest dose and deep blue (‘deepskyblue4’) the highest dose. The legend can also be improved: we use the ‘breaks’ parameter to put the smallest value (0.5) at the bottom of the labels and the largest (2) at the top. This ordering seems more natural and matches the actual order of the colors in the plot.</p>
322
321
  <pre class="ruby"><code>R.png(&quot;figures/facets_by_delivery_color2.png&quot;)
323
322
 
324
323
  @bp = @bp +
@@ -353,15 +352,15 @@ puts @violin
353
352
 
354
353
  R.dev__off</code></pre>
355
354
  <div class="figure">
356
- <img src="" />
355
+ <img src="" />
357
356
 
358
357
  </div>
359
358
  <p>This plot is an alternative to the original boxplot. For the final presentation, it is important to think about which graphics will be best understood by our audience. A violin plot is less well known and could add mental overhead; yet, in my opinion, it looks a little better than the boxplot and provides even more information than the boxplot with jitter.</p>
360
359
  </div>
361
360
  <div id="adding-decoration" class="section level2">
362
361
  <h2>Adding Decoration</h2>
363
- <p>Our final plot is starting to take shape, but a presentation plot should have at least a title, labels on the axis and maybe some other decorations. Let’s start adding those. Since decoration requires more graph area, this new plot has a ‘width’ and ‘height’ specification. When there is no specification, the default values for width and height are 480.</p>
364
- <p>The ‘labs’ function adds require decoration. In this example we use ‘title’, ‘subtitle’, ‘x’ for the <span class="math inline">\(x\)</span> axis label and ‘y’, for the <span class="math inline">\(y\)</span> axis label, and ‘caption’ for information about the plot.</p>
362
+ <p>Our final plot is starting to take shape, but a presentation plot should have at least a title, labels on the axes and maybe some other decorations. Let’s start adding those. Since decoration requires more graph area, this new plot specifies its ‘width’ and ‘height’. When they are not specified, R uses a default width and height of 480 pixels.</p>
363
+ <p>The ‘labs’ function adds the required decoration. In this example we use ‘title’, ‘subtitle’, ‘x’ for the <span class="math inline">\(x\)</span> axis label, ‘y’ for the <span class="math inline">\(y\)</span> axis label, and ‘caption’ for information about the plot (for clarity, we defined a caption variable using Ruby’s heredoc syntax).</p>
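The caption in the next chunk is built with `<<-EOT`. The small plain-Ruby sketch below shows what that heredoc form keeps and how the squiggly form `<<~EOT` (Ruby 2.3 and later) differs. The method names are hypothetical. The caption text reuses a sentence from this article's caption.

```ruby
# <<-EOT lets the closing marker be indented, but the body keeps its
# leading spaces exactly as written.
def dash_caption
  <<-EOT
    Each animal received one of three
    dose levels of vitamin C.
  EOT
end

# <<~EOT removes the common leading indentation from every body line.
def squiggly_caption
  <<~EOT
    Each animal received one of three
    dose levels of vitamin C.
  EOT
end

p dash_caption     # => "    Each animal received one of three\n    dose levels of vitamin C.\n"
p squiggly_caption # => "Each animal received one of three\ndose levels of vitamin C.\n"
```

When leading spaces would show up in a rendered caption, the squiggly form is usually the more convenient choice.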
365
364
  <pre class="ruby"><code>R.png(&quot;figures/facets_with_decorations.png&quot;, width: 540, height: 560)
366
365
 
367
366
  caption = &lt;&lt;-EOT
@@ -370,8 +369,8 @@ Each animal received one of three dose levels of vitamin C.
370
369
  EOT
371
370
 
372
371
  @decorations =
373
- R.labs(title: &quot;Tooth Growth: Length by Dose&quot;,
374
- subtitle: &quot;Faceted by delivery method, (OJ) or (VC)&quot;,
372
+ R.labs(title: &quot;Tooth Growth: Length vs Vitamin C Dose&quot;,
373
+ subtitle: &quot;Faceted by delivery method, OJ or VC&quot;,
375
374
  x: &quot;Dose (mg)&quot;, y: &quot;Teeth length&quot;,
376
375
  caption: caption)
377
376
 
@@ -379,14 +378,15 @@ puts @bp + @decorations
379
378
 
380
379
  R.dev__off</code></pre>
381
380
  <div class="figure">
382
- <img src="" />
381
+ <img src="" />
383
382
 
384
383
  </div>
385
384
  </div>
386
385
  <div id="the-corp-theme" class="section level2">
387
386
  <h2>The Corp Theme</h2>
388
- <p>We are almost done. But the plot does not yet look nice to the eye. We are still distracted by many aspects of the graph. First, the back font color does not look good. Then plot background, borders, grids all add clutter to the plot.</p>
389
- <p>We will now define our corporate theme. In this theme, we remove borders and grids. The background if left for faceted plots but removed for non-faceted plots. Font colors are a shade o blue (color: ‘#00080’). Axis labels are moved near the end of the axis and written in ‘bold’.</p>
387
+ <p>We are almost done. But the default plot configuration is not yet pleasing to the eye, and many aspects of the graph are still distracting. First, the black font color does not look good. Then, the plot background, borders and grids all add clutter to the plot.</p>
388
+ <p>We will now define our corporate theme in a module that can be loaded and used by all plots, similar to CSS or any other style definition.</p>
389
+ <p>In this theme, we remove borders and grids. The background is kept for faceted plots but removed for non-faceted plots. Font colors are a shade of blue (color: ‘#00080’). Axis labels are moved near the end of the axes and written in bold.</p>
390
390
  <pre class="ruby"><code>module CorpTheme
391
391
 
392
392
  R.install_and_loads 'RColorBrewer'
@@ -439,6 +439,8 @@ end</code></pre>
439
439
  </div>
440
440
  <div id="final-box-plot" class="section level2">
441
441
  <h2>Final Box Plot</h2>
442
+ <p>We can now easily make our final boxplot and violin plot. All the layers of the plot were added to reflect both our understanding of the data and the need to present the results to our audience.</p>
443
+ <p>The final specification is just the sum of all the layers built up to this point (<span class="citation">@bp</span>), plus the decorations (<span class="citation">@decorations</span>), plus the corporate theme.</p>
442
444
  <p>Here is our final boxplot, without jitter.</p>
443
445
  <pre class="ruby"><code>R.png(&quot;figures/final_box_plot.png&quot;, width: 540, height: 560)
444
446
 
@@ -446,26 +448,20 @@ puts @bp + @decorations + CorpTheme.global_theme(faceted: true)
446
448
 
447
449
  R.dev__off</code></pre>
448
450
  <div class="figure">
449
- <img src="" />
451
+ <img src="" />
450
452
 
451
453
  </div>
452
- </div>
453
- <div id="final-violin-plot" class="section level2">
454
- <h2>Final Violin Plot</h2>
455
- <p>Here is the final violin plot, with jitter and the same look and feel of the corporate boxplot.</p>
454
+ <p>And here is the final violin plot, with jitter and the same look and feel of the corporate boxplot.</p>
456
455
  <pre class="ruby"><code>R.png(&quot;figures/final_violin_plot.png&quot;, width: 540, height: 560)
457
456
 
458
457
  puts @violin + @decorations + CorpTheme.global_theme(faceted: true)
459
458
 
460
459
  R.dev__off</code></pre>
461
- <div class="figure">
462
- <img src="" />
463
-
464
- </div>
460
+ <p><img src="https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_violin_plot.png" /></p>
465
461
  </div>
466
462
  <div id="another-view" class="section level2">
467
463
  <h2>Another View</h2>
468
- <p>Finally, here is a last plot, with the same look and feel as before but facetted by dose and not by supplement.</p>
464
+ <p>We now make another plot, with the same look and feel as before but faceted by dose instead of by supplement. This shows how easy it is to create new plots by changing just a small statement in the <em>grammar of graphics</em>.</p>
469
465
  <pre class="ruby"><code>R.png(&quot;figures/facet_by_dose.png&quot;, width: 540, height: 560)
470
466
 
471
467
  caption = &lt;&lt;-EOT
@@ -492,10 +488,10 @@ R.dev__off</code></pre>
492
488
  </div>
493
489
  <div id="conclusion" class="section level1">
494
490
  <h1>Conclusion</h1>
495
- <p>Galaaz tightly couples Ruby and R in a way that Ruby developers do not need to be aware of the executing R engine. For the Ruby developer the existence of R is of no consequence. For her, she is just coding in Ruby. On the other hand, for the R developer, migration to Ruby is a matter of small syntactic changes and very gentle learning curve. As the R developer becomes more proficient in Ruby, he can start using ‘classes’, ‘modules’, ‘procs’, ‘lambdas’.</p>
496
- <p>This coupling shows the power of GraalVM and Truffle polyglot environment. Trying to bring to Ruby the power of R starting from scratch is an enourmous endeavour and would probably never be accomplished. Today’s data scientists would certainly stick with either Python or R. Now, both the Ruby and R communities might benefit from this marriage. Also, the process to couple Ruby and R can be also be done to couple Ruby and JavaScript and maybe also Ruby and Python. In a polyglot world a <em>uniglot</em> language might be extremely relevant.</p>
491
+ <p>In this article, we introduce Galaaz and show how to tightly couple Ruby and R in such a way that Ruby developers need not be aware of the executing R engine. For the Ruby developer, the existence of R is of no consequence: she is just coding in Ruby. On the other hand, for the R developer, migration to Ruby is a matter of small syntactic changes with a very gentle learning curve. As the R developer becomes more proficient in Ruby, he can start using ‘classes’, ‘modules’, ‘procs’ and ‘lambdas’.</p>
492
+ <p>Trying to bring the power of R to Ruby by starting from scratch would be an enormous endeavour that would probably never be accomplished. Today’s data scientists would certainly stick with either Python or R. Now, both the Ruby and R communities can benefit from this marriage, provided by Galaaz on top of GraalVM and Truffle’s polyglot environment. We presented the process to couple Ruby and R, but the same process could also be used to couple Ruby and JavaScript or Ruby and Python. In a polyglot world a <em>uniglot</em> language might be extremely relevant.</p>
497
493
  <p>From the perspective of performance, GraalVM and Truffle promise speedups that could exceed 10 times, both for <a href="https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb">FastR</a> and for <a href="https://rubykaigi.org/2018/presentations/eregontp.html">TruffleRuby</a>.</p>
498
- <p>This article has shown how to improve a plot step-by-step. Starting from a very simple boxplot with all default configurations, we moved slowly to our final plot. The important point here is not if the final plot is actually beautiful, but that there is a process of small steps improvements that can be followed to getting a final plot ready for presentation.</p>
494
+ <p>This article has shown how to improve a plot step-by-step. Starting from a very simple boxplot with all default configurations, we moved slowly to our final plot. The important point here is not whether the final plot is actually beautiful (as beauty is in the eye of the beholder), but that there is a process of small, incremental improvements that can be followed to get a final plot ready for presentation.</p>
499
495
  <p>Finally, this whole article was written in rmarkdown and compiled to HTML by <em>gknit</em>, an application that wraps <em>knitr</em> and allows documenting Ruby code. This application can be of great help for any Rubyist trying to write articles, blogs or documentation for Ruby.</p>
500
496
  </div>
501
497
  <div id="installing-galaaz" class="section level1">
@@ -527,7 +523,6 @@ R.dev__off</code></pre>
527
523
  <li>gknit &lt;filename&gt;</li>
528
524
  <li>In a script, add: require ‘galaaz’</li>
529
525
  </ul>
530
- <p>And now that you’ve read this far, here’s how to submit your story to the freeCodeCamp publication: send an email to submit at freecodecamp org. Include the URL for your story on Medium (preferably an unpublished draft) and the word “bananas” so that we’ll know that you have read all this. Only send one story URL per email. There’s no need to add anything further to your email — we just read the stories and judge them based on their own merits.</p>
531
526
  </div>
532
527
  </div>
533
528
 
@@ -1,6 +1,8 @@
1
1
  ---
2
2
  title: "How to make Beautiful Ruby Plots with Galaaz"
3
- author: "Rodrigo Botafogo"
3
+ author:
4
+ - "Rodrigo Botafogo"
5
+ - "Daniel Mossé - University of Pittsburgh"
4
6
  tags: [Tech, Data Science, Ruby, R, GraalVM]
5
7
  date: "November 19th, 2018"
6
8
  output:
@@ -9,25 +11,27 @@ output:
9
11
  keep_md: true
10
12
  pdf_document:
11
13
  includes:
12
- in_header: ["../../sty/galaaz.sty"]
14
+ in_header: "../../sty/galaaz.sty"
15
+ keep_tex: yes
13
16
  number_sections: yes
17
+ toc: true
18
+ toc_depth: 2
19
+ fontsize: 11pt
14
20
  ---
15
21
 
16
22
 
17
23
 
18
- # Introduction
19
-
20
24
  According to Wikipedia "Ruby is a dynamic, interpreted, reflective, object-oriented,
21
25
  general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro
22
26
  "Matz" Matsumoto in Japan." It reached high popularity with the development of Ruby on Rails
23
27
  (RoR) by David Heinemeier Hansson. RoR is a web application framework first released
24
28
  around 2005. It makes extensive use of Ruby's metaprogramming features. With RoR,
25
29
  Ruby became very popular. According to [Ruby's Tiobe index](https://www.tiobe.com/tiobe-index/ruby/)
26
- it peeked in popularity around 2008. Then it's popularity
27
- declined until 2015 when it started picking up again. At the time of
28
- this writing (November 2018), the Tiobe index puts ruby in 16th position.
30
+ it peaked in popularity around 2008, then declined until 2015 when it started picking up again.
31
+ At the time of this writing (November 2018), the Tiobe index puts Ruby in 16th position among
32
+ the most popular languages.
29
33
 
30
- Python, a similar language to Ruby, ranks 4th in the index. Java, C and C++ take the
34
+ Python, a language similar to Ruby, ranks 4th in the index. Java, C and C++ take the
31
35
  first three positions. Ruby is often criticized for its focus on web applications.
32
36
  But Ruby can do [much more](https://github.com/markets/awesome-ruby) than just web applications.
33
37
  Yet, for scientific computing, Ruby lags way behind Python and R. Python has
@@ -37,7 +41,7 @@ of libraries for data analysis.
37
41
 
38
42
  Until recently, there was no real prospect for Ruby to bridge this gap.
39
43
  Implementing a complete scientific computing infrastructure would take too long.
40
- Comes GraalVM into the picture:
44
+ Enter [Oracle's GraalVM](https://www.graalvm.org/):
41
45
 
42
46
  > GraalVM is a universal virtual machine for running applications written in
43
47
  > JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin,
@@ -68,38 +72,40 @@ Comes GraalVM into the picture:
68
72
  As stated above, GraalVM is a _universal_ virtual machine that allows Ruby and R (and other
69
73
  languages) to run in the same environment. GraalVM allows polyglot applications to
70
74
  _seamlessly_ interact with one another and pass values from one language to the other.
71
- Galaaz, a gem for Ruby, intends to tightly couple Ruby and R
72
- and allow those languages to interact in a way that the user will be unaware
73
- of such interaction.
75
+ Although this is a great idea, GraalVM still requires application writers to know several languages.
76
+ To eliminate that requirement, we built Galaaz, a gem for Ruby, to tightly couple
77
+ Ruby and R and allow those languages to interact in a way that the user will be unaware
78
+ of such interaction. In other words, a Ruby programmer will be able to use all
79
+ the capabilities of R without knowing the R syntax.
74
80
 
75
- Library wrapping is an usual way of bringing features from one language into another.
81
+ Library wrapping is a common way of bringing features from one language into another.
76
82
  To improve performance, Python often wraps more efficient C libraries. For the
77
- Python developer, the existence of such C libraries is of no concern. The problem with
83
+ Python developer, the existence of such C libraries is hidden. The problem with
78
84
  library wrapping is that for any new library, there is the need to handcraft a new
79
85
  wrapper.
80
86
 
81
- Galaaz, instead of wrapping a single C or R library, wraps the whole of
82
- the R language in Ruby. Doing so, all thousands of R libraries are available to
83
- Ruby developers. Also any new library developed in R will be available without a
84
- new wrapping effort.
87
+ Galaaz, instead of wrapping a single C or R library, wraps the whole R language
88
+ in Ruby. By doing so, all of the thousands of R libraries become immediately available
89
+ to Ruby developers without any new wrapping effort.
85
90
 
86
- This article shows how Ruby can use R's ggplot2 library tranparantly, and
87
- bring to Ruby the power of high quality scientific plotting. it also shows that
88
- migrating from R to Ruby with Galaaz is a matter of small syntactic changes.
89
- Using Ruby, the R developer can use all of Ruby's powerful OO features. It also
90
- becomes much easier to move code from the analysis phase to the production phase.
91
+ To demonstrate the power of Galaaz, in this article we show how Ruby can use R's ggplot2
92
+ library transparently, bringing to Ruby the power of high-quality scientific plotting.
93
+ We also show that migrating from R to Ruby with Galaaz is a matter of small
94
+ syntactic changes. By using Ruby, the R developer can use all of Ruby's powerful
95
+ object-oriented features. Also, with Ruby, it becomes much easier to move code
96
+ from the analysis phase to the production phase.
91
97
 
92
- In this article we will explore the R ToothGrowth dataset. In doing so, we will
98
+ In this article we will explore the R ToothGrowth dataset. To illustrate, we will
93
99
  create some boxplots. A primer on boxplots is available in
94
100
  [this article](https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51).
95
101
 
96
102
  We will also create a Corporate Template ensuring that plots will have a consistent
97
- visualization. This template is build using a Ruby module. There is a way of building
103
+ visualization. This template is built using a Ruby module. There is a way of building
98
104
  ggplot themes that will work the same as the Ruby module. Yet, writing a new theme
99
- requires specific knowledge. Ruby modules are standard to the language and don't
100
- need special knowledge.
105
+ requires specific knowledge of theme writing. Ruby modules are standard to the
106
+ language and don't need special knowledge.
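The "theme as a Ruby module" idea can be sketched without R. The module below is hypothetical. The article's real CorpTheme builds ggplot2 theme elements through Galaaz. This sketch only returns a plain Hash, and every key and value in it is illustrative. It borrows two rules from the theme described later in the article: faceted plots keep a panel background, and axis titles are bold.

```ruby
# Hypothetical sketch, not the article's CorpTheme: shared plot settings
# live in one module, much like a CSS stylesheet.
module CorpThemeSketch
  # Settings shared by every plot (placeholder values).
  BASE = {
    font_color: "navy",       # placeholder for "a shade of blue"
    axis_title_face: "bold",
    panel_border: false,
    panel_grid: false
  }.freeze

  module_function

  # Faceted plots keep a panel background; non-faceted plots drop it.
  def global_theme(faceted: false)
    BASE.merge(panel_background: faceted)
  end
end

puts CorpThemeSketch.global_theme(faceted: true).inspect
```

Because `merge` returns a new Hash, the frozen shared settings are never modified, and each plot gets its own copy.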
101
107
 
102
- In [this blog](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021) we show a scatter plot in Ruby also with Galaaz.
108
+ [Here](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021) we show a scatter plot in Ruby also with Galaaz.
103
109
 
104
110
  # gKnit
105
111
 
@@ -123,18 +129,18 @@ class and instance variables ('@'), are available in all chunks.
123
129
  # Exploring the Dataset
124
130
 
125
131
  Let's start by exploring our selected dataset. ToothGrowth is an R dataset. A dataset
126
- is like an excel spreadsheet, but in which each column has only one type of data.
132
+ is like a simple Excel spreadsheet, in which each column has only one type of data.
127
133
  For instance, one column can hold floats, another integers, and a third strings.
128
- This dataset analyses the length of odontoblasts (cells responsible for tooth growth)
134
+ This dataset analyzes the length of odontoblasts (cells responsible for tooth growth)
129
135
  in 60 guinea pigs, where each animal received one of three dose levels of Vitamin C
130
- (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid
136
+ (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (coded as OJ) or ascorbic acid
131
137
  (a form of vitamin C, coded as VC).
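The "one type per column" property can be illustrated in plain Ruby with a column-oriented Hash of Arrays. This is a sketch only and does not reflect how R stores data frames. The 'len' values are the first six rows printed below. The 'supp' and 'dose' values are those same rows of ToothGrowth, which all have dose 0.5.

```ruby
# Column-oriented table sketch: each key is a column, each Array its values.
tooth_growth = {
  len:  [4.2, 11.5, 7.3, 5.8, 6.4, 10.0],
  supp: %w[VC VC VC VC VC VC],
  dose: [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
}

# Collect the distinct Ruby classes found in each column; a homogeneous
# column yields exactly one class.
column_types = tooth_growth.transform_values { |values| values.map(&:class).uniq }
column_types.each { |name, types| puts "#{name}: #{types.join(', ')}" }
```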
132
138
 
133
139
  The ToothGrowth dataset contains three columns: 'len', 'supp' and 'dose'. Let's
134
- take a look at a few rows of this dataset. In Galaaz, to have access to an R variable
135
- we use the corresponding Ruby symbol preceeded by the tilda ('~') function. Note in the
136
- following chunk that Ruby's '@tooth_growth' is assigned the value of '~:ToothGrowth'.
137
- 'ToothGrowth' is the R variable containing the dataset of interest.
140
+ take a look at a few rows of this dataset. In Galaaz, R variables are accessed
141
+ by using the corresponding Ruby symbol preceded by the tilde ('~') function. Note in the
142
+ following chunk that 'ToothGrowth' is the R variable and Ruby's '@tooth_growth' is
143
+ assigned the value of '~:ToothGrowth'.
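Ruby lets a class define unary operators as ordinary methods, which makes the `~:ToothGrowth` notation possible. The sketch below shows the mechanism only. `RVarRef` is a hypothetical stand-in, whereas in Galaaz the real operator gives access to the R variable itself. For brevity the sketch reopens `Symbol` directly. A refinement would limit the change to files that opt in.

```ruby
# Hypothetical stand-in for a reference to an R variable.
RVarRef = Struct.new(:name)

class Symbol
  # Ruby parses '~:ToothGrowth' as a call to the method '~' on the Symbol.
  def ~
    RVarRef.new(to_s)
  end
end

tooth_growth = ~:ToothGrowth
puts tooth_growth.name # prints ToothGrowth
```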
138
144
 
139
145
 
140
146
  ```ruby
@@ -172,7 +178,8 @@ puts @tooth_growth.len.head
172
178
  ## [1] 4.2 11.5 7.3 5.8 6.4 10.0
173
179
  ```
174
180
 
175
- The 'dose' column contains a numeric value wiht either, 0.5, 1 or 2. Although those are
181
+ The 'dose' column contains a numeric value, either 0.5, 1 or 2, although the
182
+ first 6 rows, as seen above, only contain the value 0.5. Even though those are
176
183
  numbers, they are better interpreted as a [factor or category](https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/). So, let's convert our 'dose' column from numeric to 'factor'.
177
184
  In R, the function 'as.factor' is used to convert data in a vector to factors. To use this
178
185
  function from Galaaz, the dot ('.') in the function name is replaced by '__' (double underscore).
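The naming rule can be sketched with Ruby's `method_missing`. This is a hypothetical illustration, not Galaaz's implementation. `RCallSketch` never calls R; it only builds the text of the R call it would make. Forwarding every unknown method this way is one way a wrapper can cover a whole language without hand-writing a wrapper per function.

```ruby
# Hypothetical sketch: translates Ruby method names into R function names
# and returns the R call as a string instead of executing it.
class RCallSketch
  # 'as__factor' becomes 'as.factor', 'dev__off' becomes 'dev.off'.
  def self.r_name(ruby_name)
    ruby_name.to_s.gsub("__", ".")
  end

  # Any method not defined here is turned into the text of an R call.
  def method_missing(name, *args)
    "#{self.class.r_name(name)}(#{args.map(&:inspect).join(', ')})"
  end
end

r = RCallSketch.new
puts r.as__factor("dose") # prints as.factor("dose")
puts r.dev__off           # prints dev.off()
```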
@@ -198,8 +205,9 @@ puts @tooth_growth.dim
198
205
 
199
206
  This dataset has 60 rows, one for each subject, and 3 columns, as we have already seen.
200
207
 
201
- Note that we do not call 'puts' when using the 'str' function. This functions does not
202
- return anything and prints the structure of the dataset as a side effect.
208
+ Note that we do not need to call 'puts' when using the 'str' function. This
209
+ function does not return anything and prints the structure of the dataset
210
+ as a side effect.
203
211
 
204
212
 
205
213
  ```ruby
@@ -236,8 +244,9 @@ puts @tooth_growth.summary
236
244
 
237
245
  ## Quick plot for seeing the data
238
246
 
239
- Let's now create our first plot with the given data by accessing ggplot2 from Ruby. For Rubyist
240
- that have never seen or used ggplot2, here is the description of ggplot found on its home page:
247
+ Let's now create our first plot with the given data by accessing ggplot2 from Ruby.
248
+ For Rubyists who have never seen or used ggplot2, here is the description of ggplot
249
+ found on its home page:
241
250
 
242
251
  > "ggplot2 is a system for declaratively creating graphics, based on _The Grammar of Graphics_.
243
252
  > You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical
@@ -249,16 +258,20 @@ axis, data, title, subtitle and also graphical primitives such as _bar plot_, _b
249
258
  to form the final graphics.
250
259
 
251
260
  In order to make a plot, we apply the 'ggplot' function to the dataset. In R, this would be
252
- written as ```ggplot(<dataset>, ...)```. In Galaaz, use either ```R.ggplot(<dataset>, ...)```,
253
- or ```<dataset>.ggplot(...)```. In the graph specification bellow, we use the second notation
254
- that looks more Ruby like. The plot specifies the 'dose' on the $x$ axis and the 'length' on
255
- the $y$ axis with the 'aes' method. 'E.aes(x: :dose, y: :len)'. To specify the type of plot to
256
- create add a geom to the plot. For a boxplot, the geom is R.geom_boxplot.
261
+ written as ```ggplot(<dataset>, ...)```. Galaaz gives you the flexibility to use
262
+ either ```R.ggplot(<dataset>, ...)``` or ```<dataset>.ggplot(...)```. In the graph
263
+ specification below, we use the second notation
264
+ that looks more like Ruby. ggplot uses the ‘aes’ method to specify
265
+ x and y axes; in this case, the 'dose' on the $x$ axis and the 'length' on
266
+ the $y$ axis: 'E.aes(x: :dose, y: :len)'. To specify the type of plot, add a geom to
267
+ the plot. For a boxplot, the geom is R.geom_boxplot.
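To see what a boxplot actually draws, the sketch below computes the box and whiskers in plain Ruby. It assumes R's default (type 7) quantile definition and whiskers that reach the most extreme values within 1.5 × IQR of the box. To our understanding these are ggplot2's `stat_boxplot` defaults, but this is not Galaaz or ggplot2 code. The input is the six 'len' values printed earlier.

```ruby
# Type-7 quantile of an already sorted array (linear interpolation).
def quantile7(sorted, p)
  h = (sorted.size - 1) * p
  lo = h.floor
  hi = [lo + 1, sorted.size - 1].min
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

# Box (quartiles), whiskers and outliers using the 1.5 * IQR rule.
def box_stats(values, coef: 1.5)
  xs = values.sort
  q1, median, q3 = [0.25, 0.5, 0.75].map { |p| quantile7(xs, p) }
  iqr = q3 - q1
  inside = xs.select { |x| x.between?(q1 - coef * iqr, q3 + coef * iqr) }
  { lower_whisker: inside.min, q1: q1, median: median, q3: q3,
    upper_whisker: inside.max, outliers: xs - inside }
end

stats = box_stats([4.2, 11.5, 7.3, 5.8, 6.4, 10.0])
stats.each { |k, v| puts "#{k}: #{v.is_a?(Float) ? v.round(3) : v.inspect}" }
```

With only six values no point falls outside the whiskers, so `outliers` is empty. The full 60-row dataset is what makes the boxplot informative.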
257
268
 
258
269
  Note also that we have a call to 'R.png' before plotting and 'R.dev__off' after the print
259
- statement. 'R.png' opens a 'png' device for outputting the plot. 'R.dev__off'
260
- closes the device and creates the 'png' file. If we do no pass a name to the 'png' function, the
261
- image gets a default name of 'Rplot\<nnn\>' where \<nnn\> is the number of the plot. We can
270
+ statement. 'R.png' opens a 'png device' for outputting the plot. If we do not pass a
271
+ name to the 'png' function, the
272
+ image gets a default name of 'Rplot\<nnn\>' where \<nnn\> is the number of the plot.
273
+ 'R.dev__off'
274
+ closes the device and creates the 'png' file. We can
262
275
  then include the generated 'png' file in the document by adding an rmarkdown directive.
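Because the device must always be closed after plotting, a Ruby block with `ensure` is a natural way to package the open/plot/close sequence. The sketch below is hypothetical. `FakeDevice` only records what happens, and with Galaaz the open and close steps would be the `R.png(...)` and `R.dev__off` calls described above.

```ruby
# Hypothetical device that records the calls it receives.
class FakeDevice
  attr_reader :log

  def initialize
    @log = []
  end

  def open(file)
    @log << "open #{file}"
  end

  def close
    @log << "close"
  end
end

# Opens the device, runs the plotting block, and closes the device even
# if the block raises an exception.
def with_png(device, file)
  device.open(file)
  begin
    yield
  ensure
    device.close
  end
end

device = FakeDevice.new
with_png(device, "figures/dose_len.png") { device.log << "plot" }
p device.log # => ["open figures/dose_len.png", "plot", "close"]
```

Only the block is protected: if opening the device fails, there is nothing to close.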
263
276
 
264
277
 
@@ -277,25 +290,29 @@ R.dev__off
277
290
  [//]: # (of gKnit, the figures should be automatically saved and the name)
278
291
  [//]: # (taken from the chunk 'label' and possibly chunk parameters)
279
292
 
280
- ![](figures/dose_len.png)
293
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/dose_len.png)
281
294
 
282
295
  Great! We've just managed to create and save our first plot in Ruby with only
283
- four lines of code. We can see with this plot a clear trend: as the dose of the supplement
296
+ four lines of code. With this plot we can now easily see a clear trend: as the
297
+ dose of the supplement
284
298
  is increased, so is the length of teeth.
285
299
 
286
300
  ## Facetting the plot
287
301
 
288
302
  This first plot shows a trend, but our data has information about two different forms
289
- of delivery method, either by Orange Juice (OJ) or by Vitamin C (VC).
290
- Let's then try to create a plot that explicits the effect of each delivery method. This next
303
+ of delivery method, either by Orange Juice (OJ) or by Vitamin C (VC).
304
+ Let's then try to create a plot that helps us discern the effect of each
305
+ delivery method. This next
291
306
  plot is a _facetted_ plot where each delivery method gets its own plot.
292
- On the left side, the plot shows the OJ delivery method. On the right side, we see the
293
- VC delivery method. To obtain this plot, we use the 'R.facet_grid' function, that
307
+ On the left side, the plot shows the OJ delivery method. On the right side,
308
+ we see the VC delivery method. To obtain this plot, we use the
309
+ 'R.facet_grid' function, which
294
310
  automatically creates the facets based on the delivery method factors. The parameter to
295
311
  the 'facet_grid' method is a [_formula_](https://thomasleeper.com/Rcourse/Tutorials/formulae.html).
296
312
 
297
- In Galaaz, formulas are written a bit differently than in R. The following changes are
298
- necessary:
313
+ Galaaz gives programmers the flexibility to write formulas in two different ways.
314
+ In the first way, writing an R formula (for example 'x ~ y') in Galaaz requires
315
+ the following changes:
299
316
 
300
317
  * R symbols are represented by the same Ruby symbol prefixed with the '+' method. The
301
318
  symbol ```x``` in R becomes ```+:x``` in Ruby;
@@ -326,7 +343,7 @@ puts @bp
326
343
  R.dev__off
327
344
  ```
328
345
 
329
- ![](figures/facet_by_delivery.png)
346
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_delivery.png)
330
347
 
331
348
  It now becomes clear that although both methods of delivery have a direct
332
349
  impact on tooth growth, method (OJ) is non-linear, having a higher impact with smaller
@@ -335,9 +352,9 @@ doses of ascorbic acid and reducing it's impact as the dose increases. With the
335
352
 
336
353
  ## Adding Color
337
354
 
338
- If this paper was about data analysis, we should make a better analysis of the trends and
339
- should improve the statistical analysis. But we are interested in working with ggplot
340
- in Ruby. So, Let's add some color to this plot to make the trend and comparison more
355
+ If we were writing about data analysis, we would make a better analysis of the trends and
356
+ improve the statistical analysis. But here we are interested in working with ggplot
357
+ in Ruby. So, let's add some color to this plot to make the trend and comparison more
341
358
  visible. In the following plot, the boxes are color coded by dose. To add color, it is
342
359
  enough to add ```fill: :dose``` to the boxplot aesthetic. With this command, each 'dose'
343
360
  factor gets its own color.
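Conceptually, mapping `fill: :dose` pairs each factor level, in order, with one color from a discrete palette. The plain-Ruby sketch below illustrates that idea; `discrete_fill_scale` is a hypothetical helper. The three hex codes are, to our understanding, ggplot2's default hue colors for three levels (red, green, blue), which matches the colors mentioned below. Treat them as illustrative.

```ruby
# Hypothetical helper: pairs each level with a palette color, in order.
def discrete_fill_scale(levels, palette)
  raise ArgumentError, "not enough colors" if palette.size < levels.size

  levels.zip(palette).to_h
end

fill = discrete_fill_scale(%w[0.5 1 2], %w[#F8766D #00BA38 #619CFF])
puts fill["1"] # prints #00BA38
```

Replacing the palette, for example with shades of blue, changes every plot that uses the scale without touching the data mapping.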
@@ -352,20 +369,20 @@ puts @bp
352
369
  R.dev__off
353
370
  ```
354
371
 
355
- ![](figures/facets_by_delivery_color.png)
372
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color.png)
356
373
 
357
- Facetting helps us compare the general trends in the (OJ) and (VC) delivery methods.
358
- Adding color allow us to compare specifically how each dosage impacts the teeth growth.
359
- It is possible to observe that with smaller doses, up to 1mg, (OJ) performs better
360
- than (VC) (red color). For 2mg, both (OJ) and (VC) have the same median, but (OJ) is
374
+ Facetting helps us compare the general trends for each delivery method.
375
+ Adding color allows us to compare specifically how each dosage impacts the tooth growth.
376
+ It is possible to observe that with smaller doses, up to 1mg, OJ performs better
377
+ than VC (red color). For 2mg, both OJ and VC have the same median, but OJ is
361
378
  less dispersed (blue color).
362
- For 1mg (green color), (OJ) is significantly bettern than (VC). By this very quick analysis,
363
- it seems that (OJ) is a better delivery method than (VC).
379
+ For 1mg (green color), OJ is significantly better than VC. By this very quick
380
+ visual analysis, it seems that OJ is a better delivery method than VC.
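To make "same median, but less dispersed" concrete: the middle line of each box is the median, and the box spans the lower and upper quartiles, so its height is the interquartile range (IQR). A minimal plain-Ruby sketch of those numbers follows, assuming R's default quantile definition (type 7, linear interpolation between order statistics). The helper names and the two sample groups are made up for illustration; they are not the ToothGrowth data.

```ruby
# Sketch: the numbers a boxplot draws, using R's default (type 7) quantiles.
# The sample values below are made up; they are not ToothGrowth data.

# Type-7 quantile: linear interpolation between sorted order statistics.
def quantile7(values, p)
  sorted = values.sort
  h = (sorted.size - 1) * p
  lo = h.floor
  hi = [lo + 1, sorted.size - 1].min
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

# Hypothetical helper: minimum, lower hinge, median, upper hinge, maximum.
# (ggplot's whiskers stop at the furthest point within 1.5 * IQR of the box,
# so using the plain min and max here is a simplification.)
def five_number_summary(values)
  [0.0, 0.25, 0.5, 0.75, 1.0].map { |p| quantile7(values, p) }
end

group_a = [20.0, 22.0, 24.0, 26.0, 28.0]
group_b = [14.0, 19.0, 24.0, 29.0, 34.0]

summary_a = five_number_summary(group_a)
summary_b = five_number_summary(group_b)

# The box height is the interquartile range: upper hinge minus lower hinge.
iqr = ->(s) { s[3] - s[1] }
puts "medians: #{summary_a[2]} vs #{summary_b[2]}"
puts "IQRs:    #{iqr.(summary_a)} vs #{iqr.(summary_b)}"
```

With these made-up values both groups have a median of 24.0, while the second group's IQR (10.0) is much wider than the first group's (4.0), which is exactly the pattern a taller box shows.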
364
381
 
365
382
  ## Clarifying the data
366
383
 
367
384
  Boxplots give us a nice idea of the distribution of data, but looking at those plots with
368
- large colored boxes leaves us wondering what is going on on those boxes. According to
385
+ large colored boxes leaves us wondering what else is going on. According to
369
386
  Edward Tufte in Envisioning Information:
370
387
 
371
388
  > Thin data rightly prompts suspicions: "What are they leaving out? Is that really everything
@@ -378,7 +395,7 @@ And he states:
378
395
 
379
396
  > A most unconventional design strategy is revealed: _to clarify, add detail._
380
397
 
381
- Let's then use this wisdom and add yet another layer of data to our plot, so that we clarify
398
+ Let's use this wisdom and add yet another layer of data to our plot, so that we clarify
382
399
  it with detail and do not leave large empty boxes. In this next plot, we add data points for
383
400
  each of the 60 guinea pigs in the experiment. For that, we add the function 'R.geom_point' to the
384
401
  plot.
@@ -395,15 +412,17 @@ puts @bp
395
412
  R.dev__off
396
413
  ```
397
414
 
398
- ![](figures/facets_with_points.png)
415
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_points.png)
399
416
 
400
- Now we can see the actual distribution of all the 60 subject. Actually, this is not
417
+ Now we can see the actual distribution of all the 60 subjects. Actually, this is not
401
418
  totally true. We have a hard time seeing all 60 subjects. It seems that some points
402
419
  might be placed one over the other hiding useful information.
403
420
 
404
421
  But no sweat! Another layer might solve the problem. In the following plot a new layer
405
- called 'geom_jitter' is added to the plot. This adds randomness to the position of
406
- the points, making it easier to see all of then and preventing data hiding. We also add
422
+ called 'geom_jitter' is added to the plot. Jitter adds a small amount of random variation
423
+ to the location of each point, and is a useful way of handling overplotting caused by
424
+ discreteness in smaller datasets. This makes it easier to see all of the points and
425
+ prevents data hiding. We also add
407
426
  color and change the shape of the points, making them even easier to see.
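The idea behind jitter fits in a few lines: each point is moved by a small uniform random offset, so observations that share the same discrete value (here, the same dose) stop landing on exactly the same spot. This plain-Ruby sketch only illustrates the technique; the `jitter` helper, its `width` of 0.2 and the sample doses are assumptions, not how ggplot or Galaaz implement 'geom_jitter'.

```ruby
# Sketch of jittering: add uniform noise in [-width, width] to each value.
# `jitter` is a hypothetical helper for illustration only.
def jitter(values, width: 0.2, rng: Random.new(42))
  values.map { |v| v + rng.rand(-width..width) }
end

# Many observations share the same discrete dose, so they would overplot.
doses = [0.5, 0.5, 0.5, 1.0, 1.0, 2.0, 2.0, 2.0]
jittered = jitter(doses)

doses.zip(jittered).each do |original, moved|
  puts format("%.1f -> %.3f", original, moved)
end
```

Seeding the random generator makes the jittered positions reproducible between runs, which is handy when a figure has to be regenerated for a document.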
408
427
 
409
428
 
@@ -416,14 +435,14 @@ puts @bp + R.geom_jitter(shape: 23, color: "cyan3", size: 1)
416
435
  R.dev__off
417
436
  ```
418
437
 
419
- ![](figures/facets_with_jitter.png)
438
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_jitter.png)
420
439
 
421
440
  Now we can see all 60 points in the graph. We have here a much higher information density
422
441
  and we can see outliers and the distribution of the subjects.
423
442
 
424
443
  # Preparing the Plot for Presentation
425
444
 
426
- We have come a long way since our first plot. As was already said, this is not
445
+ We have come a long way since our first plot. As we already said, this is not
427
446
  an article about data analysis and the focus is on the
428
447
  integration of Ruby and ggplot. So, let's assume that the analysis is now done. Yet,
429
448
  ending the analysis does not mean that the work is done. On the contrary, the hardest
@@ -443,11 +462,12 @@ those colors helped us understand the data, for a final presentation random colo
443
462
  can distract the viewer.
444
463
 
445
464
  In the following plot, we use the function 'scale_fill_manual' to change
446
- the colors of the boxes and order of labels. For colors we use shades of blue for
465
+ the colors of the boxes and order of labels. For colors, we use shades of blue for
447
466
  each dosage, with light blue ('cyan')
448
- representing the lower dose and deep blue ('deepskyblue4') the higher dose. Also
449
- the smaller value (0.5) is on
450
- the botton of the labels and (2) at the top. This ordering seems more natural and
467
+ representing the lower dose and deep blue ('deepskyblue4') the higher dose.
468
+ Also, the legend could be improved: we use the 'breaks' parameter to put
469
+ the smallest value (0.5) at the bottom of the labels and the largest (2) at the top.
470
+ This ordering seems more natural and
451
471
  matches with the actual order of the colors in the plot.
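The two ideas behind 'scale_fill_manual' here can be sketched in plain Ruby: a table pairs each dose level with a color, and the list of breaks decides the order in which the legend shows those pairs, from top to bottom. In this sketch, 'cyan' and 'deepskyblue4' come from the text, while the middle shade and the `legend_entries` helper are assumptions for illustration.

```ruby
# Sketch: a manual fill scale pairs levels with colors, and 'breaks'
# controls the legend order (listed top to bottom).
# 'cyan' and 'deepskyblue4' come from the text; the middle color is assumed.
FILL_VALUES = {
  "0.5" => "cyan",         # lowest dose, lightest blue
  "1"   => "deepskyblue",  # assumed middle shade
  "2"   => "deepskyblue4"  # highest dose, deepest blue
}.freeze

# Hypothetical helper: legend rows in the order given by `breaks`.
def legend_entries(values, breaks)
  breaks.map { |level| [level, values.fetch(level)] }
end

# Largest dose on top, smallest at the bottom, matching the box colors.
breaks = FILL_VALUES.keys.sort_by(&:to_f).reverse
legend_entries(FILL_VALUES, breaks).each do |level, color|
  puts "#{level.rjust(3)} mg  #{color}"
end
```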
452
472
 
453
473
 
@@ -463,7 +483,7 @@ puts @bp
463
483
  R.dev__off
464
484
  ```
465
485
 
466
- ![](figures/facets_by_delivery_color2.png)
486
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color2.png)
467
487
 
468
488
 
469
489
  ## Violin Plot and Jitter
@@ -498,7 +518,7 @@ puts @violin
498
518
  R.dev__off
499
519
  ```
500
520
 
501
- ![](figures/violin_with_jitter.png)
521
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/violin_with_jitter.png)
502
522
 
503
523
  This plot is an alternative to the original boxplot. For the final presentation, it is
504
524
  important to think about which graphics will be best understood by our audience. A violin plot
@@ -508,14 +528,15 @@ bit better than the boxplot and provides even more information than the boxplot
508
528
  ## Adding Decoration
509
529
 
510
530
  Our final plot is starting to take shape, but a presentation plot should have at least a
511
- title, labels on the axis and maybe some other decorations. Let's start adding those.
531
+ title, labels on the axes and maybe some other decorations. Let's start adding those.
512
532
  Since decoration requires more graph area, this new plot has a 'width' and 'height'
513
- specification. When there is no specification, the default values for width and height are
514
- 480.
533
+ specification. When there is no specification, the default values from R for width and
534
+ height are 480 pixels.
515
535
 
516
- The 'labs' function adds require decoration. In this example we use 'title', 'subtitle',
517
- 'x' for the $x$ axis label and 'y', for the $y$ axis label, and 'caption' for information
518
- about the plot.
536
+ The 'labs' function adds the required decoration. In this example we use 'title',
537
+ 'subtitle', 'x' for the $x$ axis label, 'y' for the $y$ axis label, and 'caption'
538
+ for information about the plot (for clarity, we defined a caption variable using Ruby's
539
+ heredoc syntax).
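For readers new to Ruby, a heredoc is just a multi-line string literal, which keeps a long caption readable in the source. A minimal sketch, assuming the squiggly form `<<~` (Ruby 2.3 and later), which strips the common leading indentation; the caption text is a placeholder and not the article's.

```ruby
# Sketch: building a multi-line plot caption with a Ruby heredoc.
# `<<~` removes the common leading indentation; the text is a placeholder.
caption = <<~EOT
  Data: a hypothetical experiment.
  Each line of the heredoc becomes a line of the caption.
EOT

# Heredocs keep the trailing newline; chomp it if the label should not end with one.
puts caption.chomp
puts caption.lines.size
```

Any string works for the 'caption' argument; the heredoc only changes how the text is written in the source file.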
519
540
 
520
541
 
521
542
  ```ruby
@@ -527,8 +548,8 @@ Each animal received one of three dose levels of vitamin C.
527
548
  EOT
528
549
 
529
550
  @decorations =
530
- R.labs(title: "Tooth Growth: Length by Dose",
531
- subtitle: "Faceted by delivery method, (OJ) or (VC)",
551
+ R.labs(title: "Tooth Growth: Length vs Vitamin C Dose",
552
+ subtitle: "Faceted by delivery method, OJ or VC",
532
553
  x: "Dose (mg)", y: "Teeth length",
533
554
  caption: caption)
534
555
 
@@ -537,16 +558,19 @@ puts @bp + @decorations
537
558
  R.dev__off
538
559
  ```
539
560
 
540
- ![](figures/facets_with_decorations.png)
541
-
561
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/facets_with_decorations.png)
542
562
 
543
563
  ## The Corp Theme
544
564
 
545
- We are almost done. But the plot does not yet look nice to the eye. We are still distracted
565
+ We are almost done. But the default plot configuration does not yet look
566
+ nice to the eye. We are still distracted
546
567
  by many aspects of the graph. First, the black font color does not look good. Then, the
547
568
  plot background, borders and grids all add clutter to the plot.
548
569
 
549
- We will now define our corporate theme. In this theme, we remove borders and grids. The
570
+ We will now define our corporate theme in a module that can be loaded and used by all
571
+ plots, similar to CSS or any other style definition.
572
+
573
+ In this theme, we remove borders and grids. The
550
574
  background is left for faceted plots but removed for non-faceted plots. Font colors are
551
575
  a shade of blue (color: '#000080'). Axis labels are moved near the end of the axis and
552
576
  written in 'bold'.
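A module that plays the role of a stylesheet can be sketched without any R at all: one method returns the shared style settings, and a `faceted:` keyword decides whether the panel background is kept. The setting keys below are hypothetical plain-Ruby names rather than ggplot `theme()` arguments, 'grey92' is only an assumed background color, and '#000080' (navy) is assumed to be the intended shade of blue. The article's own CorpTheme returns something that is added to a plot with '+', which this sketch does not attempt.

```ruby
# Sketch of a reusable style module, in the spirit of a CSS stylesheet.
# Keys are hypothetical; they are not ggplot theme() arguments.
module SketchTheme
  FONT_COLOR = "#000080" # navy, assumed to be the intended shade of blue

  # Shared settings, plus a panel background only when the plot is faceted.
  def self.global_theme(faceted: false)
    settings = {
      border: nil,                       # borders removed
      grid: nil,                         # grid lines removed
      font_color: FONT_COLOR,
      axis_title_face: "bold",
      axis_title_position: :end_of_axis  # labels near the end of the axis
    }
    settings[:panel_background] = faceted ? "grey92" : nil # assumed color
    settings
  end
end

puts SketchTheme.global_theme(faceted: true)[:panel_background]
puts SketchTheme.global_theme[:panel_background].inspect
```

Keeping every style decision in one module means a change to the corporate look is made once and picked up by every plot that uses it.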
@@ -606,6 +630,13 @@ end
606
630
 
607
631
  ## Final Box Plot
608
632
 
633
+ We can now easily make our final boxplot and violin plot. All the layers for the plot were
634
+ added in order to expose our understanding of the data and the need to present the result
635
+ to our audience.
636
+
637
+ The final specification is just the addition of all layers built up to this point (@bp), plus
638
+ the decorations (@decorations), plus the corporate theme.
639
+
609
640
  Here is our final boxplot, without jitter.
610
641
 
611
642
 
@@ -617,11 +648,9 @@ puts @bp + @decorations + CorpTheme.global_theme(faceted: true)
617
648
  R.dev__off
618
649
  ```
619
650
 
620
- ![](figures/final_box_plot.png)
651
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_box_plot.png)
621
652
 
622
- ## Final Violin Plot
623
-
624
- Here is the final violin plot, with jitter and the same look and feel of the corporate
653
+ And here is the final violin plot, with jitter and the same look and feel of the corporate
625
654
  boxplot.
626
655
 
627
656
 
@@ -633,12 +662,14 @@ puts @violin + @decorations + CorpTheme.global_theme(faceted: true)
633
662
  R.dev__off
634
663
  ```
635
664
 
636
- ![](figures/final_violin_plot.png)
665
+
666
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_violin_plot.png)
637
667
 
638
668
  ## Another View
639
669
 
640
- Finally, here is a last plot, with the same look and feel as before but facetted by
641
- dose and not by supplement.
670
+ We now make another plot, with the same look and feel as before but facetted by
671
+ dose and not by supplement. This shows how easy it is to create new plots by just
672
+ changing a small statement in the _grammar of graphics_.
642
673
 
643
674
 
644
675
  ```ruby
@@ -662,24 +693,26 @@ puts @bp
662
693
  R.dev__off
663
694
  ```
664
695
 
665
- ![](figures/facet_by_dose.png)
696
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_dose.png)
666
697
 
667
698
  # Conclusion
668
699
 
669
- Galaaz tightly couples Ruby and R in a way that Ruby developers do not need to be aware
700
+ In this article, we introduce Galaaz and show how to tightly couple Ruby and R
701
+ in a way that Ruby developers do not need to be aware
670
702
  of the executing R engine. For the Ruby developer the existence of R
671
- is of no consequence. For her, she is just coding in Ruby. On the other hand, for the R
672
- developer, migration to Ruby is a matter of small syntactic changes and very gentle
703
+ is of no consequence; she is just coding in Ruby. On the other hand, for the R
704
+ developer, migration to Ruby is a matter of small syntactic changes with a very gentle
673
705
  learning curve. As the R developer becomes more proficient in Ruby, he can start using
674
706
  'classes', 'modules', 'procs' and 'lambdas'.
675
707
 
676
- This coupling shows the power of GraalVM and Truffle polyglot environment. Trying to
677
- bring to Ruby the power of R starting from scratch is an enourmous endeavour and would
678
- probably never be accomplished. Today's data scientists would certainly stick with either
679
- Python or R. Now, both the Ruby and R communities might benefit from this marriage. Also,
680
- the process to couple Ruby and R can be also be done to couple Ruby and JavaScript and
681
- maybe also Ruby and Python. In a polyglot world a *uniglot* language might be extremely
682
- relevant.
708
+ Trying to bring to Ruby the power of R starting from scratch is an enormous endeavour
709
+ and would probably never be accomplished. Today's data scientists would certainly
710
+ stick with either Python or R. Now, both the Ruby and R communities can benefit
711
+ from this marriage, provided by Galaaz on top of GraalVM and Truffle's
712
+ polyglot environment. We presented
713
+ the process to couple Ruby and R, but this process can also be done to couple Ruby
714
+ and JavaScript or Ruby and Python. In a polyglot world a *uniglot* language might
715
+ be extremely relevant.
683
716
 
684
717
  From the perspective of performance, GraalVM and Truffle promise improvements that could
685
718
  reach over 10 times, both for [FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
@@ -687,15 +720,14 @@ and for [TruffleRuby](https://rubykaigi.org/2018/presentations/eregontp.html).
687
720
 
688
721
  This article has shown how to improve a plot step-by-step. Starting from a very simple
689
722
  boxplot with all default configurations, we moved slowly to our final plot. The important
690
- point here is not if the final plot is actually beautiful, but that there is a process
691
- of small steps improvements that can be followed to getting a final plot ready for
692
- presentation.
723
+ point here is not whether the final plot is actually beautiful (as beauty is in the eye of
724
+ the beholder), but that there is a process of small, step-by-step improvements that can be followed
725
+ to get a final plot ready for presentation.
693
726
 
694
727
  Finally, this whole article was written in rmarkdown and compiled to HTML by _gknit_, an
695
728
  application that wraps _knitr_ and allows documenting Ruby code. This application can
696
729
  be of great help for any Rubyist trying to write articles, blogs or documentation for Ruby.
697
730
 
698
-
699
731
  # Installing Galaaz
700
732
 
701
733
  ## Prerequisites
@@ -722,10 +754,3 @@ the gnu compiler and tools should be enough. I am not sure what is needed on th
722
754
 
723
755
  * gknit <filename>
724
756
  * In a script, add: require 'galaaz'
725
-
726
-
727
- And now that you’ve read this far, here’s how to submit your story to the freeCodeCamp
728
- publication: send an email to submit at freecodecamp org. Include the URL for your story on
729
- Medium (preferably an unpublished draft) and the word “bananas” so that we’ll know that you
730
- have read all this. Only send one story URL per email. There’s no need to add anything
731
- further to your email — we just read the stories and judge them based on their own merits.