galaaz 0.4.2 → 0.4.5

Files changed (114)
  1. checksums.yaml +4 -4
  2. data/LICENSE +25 -0
  3. data/Rakefile +8 -0
  4. data/bin/gknit +9 -5
  5. data/bin/gstudio +4 -2
  6. data/bin/gstudio.rb +32 -2
  7. data/blogs/dev/dev.html +219 -34
  8. data/blogs/dev/dev.md +26 -26
  9. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  10. data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
  11. data/blogs/dplyr/dplyr.rb +63 -0
  12. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +38 -26
  13. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +16 -17
  14. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  15. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +65 -31
  16. data/blogs/oh_my/not_so.rb +2342 -0
  17. data/blogs/oh_my/oh_my.Rmd +493 -0
  18. data/blogs/oh_my/oh_my.html +680 -0
  19. data/blogs/oh_my/oh_my.md +597 -0
  20. data/blogs/oh_my/old.Rmd +2100 -0
  21. data/blogs/ruby_plot/figures/facets_with_decorations.png +0 -0
  22. data/blogs/ruby_plot/figures/facets_with_jitter.png +0 -0
  23. data/blogs/ruby_plot/figures/final_box_plot.png +0 -0
  24. data/blogs/ruby_plot/figures/final_violin_plot.png +0 -0
  25. data/blogs/ruby_plot/figures/violin_with_jitter.png +0 -0
  26. data/blogs/ruby_plot/ruby_plot.Rmd +147 -122
  27. data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +662 -0
  28. data/blogs/ruby_plot/ruby_plot.html +49 -54
  29. data/blogs/ruby_plot/ruby_plot.md +147 -122
  30. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  31. data/blogs/ruby_plot/ruby_plot.tex +776 -157
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +57 -0
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +106 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +110 -0
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +174 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +236 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +296 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +236 -0
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +218 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +128 -0
  42. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +150 -0
  43. data/examples/islr/ch2.spec.rb +21 -18
  44. data/examples/islr/ch3_boston.rb +14 -5
  45. data/examples/islr/ch3_multiple_regression.rb +2 -3
  46. data/examples/islr/ch6.spec.rb +1 -1
  47. data/examples/islr/x_y_rnorm.jpg +0 -0
  48. data/lib/R_interface/r.rb +14 -10
  49. data/lib/R_interface/r_libs.R +9 -0
  50. data/lib/R_interface/r_methods.rb +77 -6
  51. data/lib/R_interface/{expression.rb → r_module_s.rb} +13 -14
  52. data/lib/R_interface/rbinary_operators.rb +58 -71
  53. data/lib/R_interface/rdata_frame.rb +2 -1
  54. data/lib/R_interface/rdevices.R +4 -0
  55. data/lib/R_interface/rdevices.rb +1 -1
  56. data/lib/R_interface/renvironment.rb +34 -1
  57. data/lib/R_interface/rexpression.rb +108 -2
  58. data/lib/R_interface/rindexed_object.rb +3 -1
  59. data/lib/R_interface/rlanguage.rb +18 -2
  60. data/lib/R_interface/rmatrix.rb +14 -0
  61. data/lib/R_interface/rmd_indexed_object.rb +5 -1
  62. data/lib/R_interface/robject.rb +61 -23
  63. data/lib/R_interface/rsupport.rb +111 -53
  64. data/lib/R_interface/rsymbol.rb +6 -5
  65. data/lib/R_interface/ruby_extensions.rb +130 -4
  66. data/lib/R_interface/runary_operators.rb +35 -3
  67. data/lib/R_interface/rvector.rb +1 -0
  68. data/lib/galaaz.rb +0 -2
  69. data/lib/gknit/knitr_engine.rb +58 -4
  70. data/lib/gknit/ruby_engine.rb +5 -6
  71. data/lib/util/exec_ruby.rb +55 -9
  72. data/specs/all.rb +13 -3
  73. data/specs/figures/dose_len.png +0 -0
  74. data/specs/r_dataframe.spec.rb +49 -26
  75. data/specs/r_environment.spec.rb +140 -0
  76. data/specs/r_eval.spec.rb +0 -15
  77. data/specs/r_formula.spec.rb +232 -0
  78. data/specs/r_function.spec.rb +7 -8
  79. data/specs/r_list.spec.rb +4 -0
  80. data/specs/r_list_apply.spec.rb +11 -11
  81. data/specs/r_matrix.spec.rb +3 -3
  82. data/specs/{r_plots.spec.rb~ → r_nse.spec.rb} +29 -6
  83. data/specs/r_vector_creation.spec.rb +6 -0
  84. data/specs/r_vector_object.spec.rb +2 -2
  85. data/specs/r_vector_operators.spec.rb +3 -3
  86. data/specs/r_vector_subsetting.spec.rb +4 -4
  87. data/specs/ruby_expression.spec.rb +324 -0
  88. data/specs/tmp.rb +12 -524
  89. data/sty/galaaz.sty +71 -0
  90. data/version.rb +1 -1
  91. metadata +31 -41
  92. data/bin/gknit2~ +0 -6
  93. data/bin/ogk~ +0 -4
  94. data/bin/prepareR.rb~ +0 -1
  95. data/blogs/dev/dev.Rmd~ +0 -104
  96. data/blogs/galaaz_ggplot/galaaz_ggplot.dvi +0 -0
  97. data/blogs/galaaz_ggplot/midwest_external_png~ +0 -1
  98. data/blogs/gknit/gknit.Rmd~ +0 -184
  99. data/blogs/gknit/gknit.Rnd~ +0 -17
  100. data/blogs/gknit/model.rb~ +0 -46
  101. data/blogs/ruby_plot/ruby_plot.Rmd~ +0 -215
  102. data/examples/islr/Figure.jpg +0 -0
  103. data/examples/misc/moneyball.rb~ +0 -16
  104. data/examples/misc/subsetting.rb~ +0 -372
  105. data/lib/R/eng_ruby.R~ +0 -63
  106. data/lib/R_interface/capture_plot.rb~ +0 -23
  107. data/lib/R_interface/r.rb~ +0 -121
  108. data/lib/R_interface/rdevices.rb~ +0 -27
  109. data/lib/gknit.rb~ +0 -26
  110. data/lib/gknit/knitr_engine.rb~ +0 -102
  111. data/lib/gknit/ruby_engine.rb~ +0 -72
  112. data/lib/util/inline_file.rb~ +0 -23
  113. data/r_requires/knitr.rb~ +0 -4
  114. data/specs/r_language.spec.rb +0 -157
@@ -1,6 +1,8 @@
1
1
  ---
2
2
  title: "How to make Beautiful Ruby Plots with Galaaz"
3
- author: "Rodrigo Botafogo"
3
+ author:
4
+ - "Rodrigo Botafogo"
5
+ - "Daniel Mossé - University of Pittsburgh"
4
6
  tags: [Tech, Data Science, Ruby, R, GraalVM]
5
7
  date: "November 19th, 2018"
6
8
  output:
@@ -9,27 +11,29 @@ output:
9
11
  keep_md: true
10
12
  pdf_document:
11
13
  includes:
12
- in_header: ["../../sty/galaaz.sty"]
14
+ in_header: "../../sty/galaaz.sty"
15
+ keep_tex: yes
13
16
  number_sections: yes
17
+ toc: true
18
+ toc_depth: 2
19
+ fontsize: 11pt
14
20
  ---
15
21
 
16
22
  ```{r setup, echo=FALSE}
17
23
 
18
24
  ```
19
25
 
20
- # Introduction
21
-
22
26
  According to Wikipedia "Ruby is a dynamic, interpreted, reflective, object-oriented,
23
27
  general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro
24
28
  "Matz" Matsumoto in Japan." It reached high popularity with the development of Ruby on Rails
25
29
  (RoR) by David Heinemeier Hansson. RoR is a web application framework first released
26
30
  around 2005. It makes extensive use of Ruby's metaprogramming features. With RoR,
27
31
  Ruby became very popular. According to [Ruby's Tiobe index](https://www.tiobe.com/tiobe-index/ruby/)
28
- it peeked in popularity around 2008. Then it's popularity
29
- declined until 2015 when it started picking up again. At the time of
30
- this writing (November 2018), the Tiobe index puts ruby in 16th position.
32
+ it peaked in popularity around 2008, then declined until 2015 when it started picking up again.
33
+ At the time of this writing (November 2018), the Tiobe index puts Ruby in 16th position among
34
+ the most popular languages.
31
35
 
32
- Python, a similar language to Ruby, ranks 4th in the index. Java, C and C++ take the
36
+ Python, a language similar to Ruby, ranks 4th in the index. Java, C and C++ take the
33
37
  first three positions. Ruby is often criticized for its focus on web applications.
34
38
  But Ruby can do [much more](https://github.com/markets/awesome-ruby) than just web applications.
35
39
  Yet, for scientific computing, Ruby lags way behind Python and R. Python has
@@ -39,7 +43,7 @@ of libraries for data analysis.
39
43
 
40
44
  Until recently, there was no real perspective for Ruby to bridge this gap.
41
45
  Implementing a complete scientific computing infrastructure would take too long.
42
- Comes GraalVM into the picture:
46
+ Enter [Oracle's GraalVM](https://www.graalvm.org/):
43
47
 
44
48
  > GraalVM is a universal virtual machine for running applications written in
45
49
  > JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin,
@@ -70,38 +74,40 @@ Comes GraalVM into the picture:
70
74
  As stated above, GraalVM is a _universal_ virtual machine that allows Ruby and R (and other
71
75
  languages) to run on the same environment. GraalVM allows polyglot applications to
72
76
  _seamlessly_ interact with one another and pass values from one language to the other.
73
- Galaaz, a gem for Ruby, intends to tightly couple Ruby and R
74
- and allow those languages to interact in a way that the user will be unaware
75
- of such interaction.
77
+ Although a great idea, GraalVM still requires application writers to know several languages.
78
+ To eliminate that requirement, we built Galaaz, a gem for Ruby, to tightly couple
79
+ Ruby and R and allow those languages to interact in a way that the user will be unaware
80
+ of such interaction. In other words, a Ruby programmer will be able to use all
81
+ the capabilities of R without knowing the R syntax.
76
82
 
77
- Library wrapping is an usual way of bringing features from one language into another.
83
+ Library wrapping is a usual way of bringing features from one language into another.
78
84
  To improve performance, Python often wraps more efficient C libraries. For the
79
- Python developer, the existence of such C libraries is of no concern. The problem with
85
+ Python developer, the existence of such C libraries is hidden. The problem with
80
86
  library wrapping is that for any new library, there is the need to handcraft a new
81
87
  wrapper.
82
88
 
83
- Galaaz, instead of wrapping a single C or R library, wraps the whole of
84
- the R language in Ruby. Doing so, all thousands of R libraries are available to
85
- Ruby developers. Also any new library developed in R will be available without a
86
- new wrapping effort.
89
+ Galaaz, instead of wrapping a single C or R library, wraps the whole R language
90
+ in Ruby. By doing so, all of the thousands of R libraries become available immediately
91
+ to Ruby developers without any new wrapping effort.
87
92
 
88
- This article shows how Ruby can use R's ggplot2 library tranparantly, and
89
- bring to Ruby the power of high quality scientific plotting. it also shows that
90
- migrating from R to Ruby with Galaaz is a matter of small syntactic changes.
91
- Using Ruby, the R developer can use all of Ruby's powerful OO features. It also
92
- becomes much easier to move code from the analysis phase to the production phase.
93
+ To show the power of Galaaz, we show in this article how Ruby can use R's ggplot2
94
+ library transparently, bringing to Ruby the power of high quality scientific plotting.
95
+ We also show that migrating from R to Ruby with Galaaz is a matter of small
96
+ syntactic changes. By using Ruby, the R developer can use all of Ruby's powerful
97
+ object-oriented features. Also, with Ruby, it becomes much easier to move code
98
+ from the analysis phase to the production phase.
93
99
 
94
- In this article we will explore the R ToothGrowth dataset. In doing so, we will
100
+ In this article we will explore the R ToothGrowth dataset. To illustrate, we will
95
101
  create some boxplots. A primer on boxplot is available in
96
102
  [this article](https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51).
97
103
 
98
104
  We will also create a Corporate Template ensuring that plots will have a consistent
99
- visualization. This template is build using a Ruby module. There is a way of building
105
+ visualization. This template is built using a Ruby module. There is a way of building
100
106
  ggplot themes that will work the same as the Ruby module. Yet, writing a new theme
101
- requires specific knowledge. Ruby modules are standard to the language and don't
102
- need special knowledge.
107
+ requires specific knowledge on theme writing. Ruby modules are standard to the
108
+ language and don't need special knowledge.
103
109
 
104
- In [this blog](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021) we show a scatter plot in Ruby also with Galaaz.
110
+ [Here](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021) we show a scatter plot in Ruby also with Galaaz.
105
111
 
106
112
  # gKnit
107
113
 
@@ -125,18 +131,18 @@ class and instance variables ('@'), are available in all chunks.
125
131
  # Exploring the Dataset
126
132
 
127
133
  Let's start by exploring our selected dataset. ToothGrowth is an R dataset. A dataset
128
- is like an excel spreadsheet, but in which each column has only one type of data.
134
+ is like a simple Excel spreadsheet, in which each column has only one type of data.
129
135
  For instance one column can have float, the other integer, and a third strings.
130
- This dataset analyses the length of odontoblasts (cells responsible for tooth growth)
136
+ This dataset analyzes the length of odontoblasts (cells responsible for tooth growth)
131
137
  in 60 guinea pigs, where each animal received one of three dose levels of Vitamin C
132
- (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (OJ) or ascorbic acid
138
+ (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice OJ or ascorbic acid
133
139
  (a form of vitamin C and coded as VC).
134
140
 
135
141
  The ToothGrowth dataset contains three columns: 'len', 'supp' and 'dose'. Let's
136
- take a look at a few rows of this dataset. In Galaaz, to have access to an R variable
137
- we use the corresponding Ruby symbol preceeded by the tilda ('~') function. Note in the
138
- following chunk that Ruby's '@tooth_growth' is assigned the value of '~:ToothGrowth'.
139
- 'ToothGrowth' is the R variable containing the dataset of interest.
142
+ take a look at a few rows of this dataset. In Galaaz, R variables are accessed
143
+ by using the corresponding Ruby symbol preceded by the tilde ('~') function. Note in the
144
+ following chunk that 'ToothGrowth' is the R variable and Ruby's '@tooth_growth' is
145
+ assigned the value of '~:ToothGrowth'.
140
146
 
141
147
  ```{ruby head}
142
148
  # Read the R ToothGrowth variable and assign it to the
@@ -158,7 +164,8 @@ in usual Ruby style.
158
164
  puts @tooth_growth.len.head
159
165
  ```
160
166
 
161
- The 'dose' column contains a numeric value wiht either, 0.5, 1 or 2. Although those are
167
+ The 'dose' column contains a numeric value of either 0.5, 1 or 2, although the
168
+ first 6 rows as seen above only contain the 0.5 values. Even though those are
162
169
  numbers, they are better interpreted as a [factor or category](https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/). So, let's convert our 'dose' column from numeric to 'factor'.
163
170
  In R, the function 'as.factor' is used to convert data in a vector to factors. To use this
164
171
  function from Galaaz the dot ('.') in the function name is substituted by '__' (double underline).
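
The naming rule described in the lines above (R's `as.factor` is called as `as__factor` from Galaaz) can be sketched in plain Ruby. The two helper methods below are hypothetical and are not part of Galaaz; the sketch only illustrates the string mapping, not how Galaaz dispatches calls to R.

```ruby
# Hypothetical helpers, not part of the Galaaz API: they only illustrate
# the naming convention described above.

# Map an R function name to the Ruby method name used from Galaaz:
# every dot becomes a double underscore.
def r_to_ruby_name(r_name)
  r_name.to_s.gsub(".", "__")
end

# Inverse mapping: recover the R name from the Ruby method name.
def ruby_to_r_name(ruby_name)
  ruby_name.to_s.gsub("__", ".")
end

puts r_to_ruby_name("as.factor")   # as__factor
puts ruby_to_r_name(:dev__off)     # dev.off
```

The inverse mapping is only safe under the assumption that the R name itself contains no double underscore.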
@@ -178,8 +185,9 @@ puts @tooth_growth.dim
178
185
 
179
186
  This dataset has 60 rows, one for each subject and 3 columns, as we have already seen.
180
187
 
181
- Note that we do not call 'puts' when using the 'str' function. This functions does not
182
- return anything and prints the structure of the dataset as a side effect.
188
+ Note that we do not need to call 'puts' when using the 'str' function. This
189
+ function does not return anything and prints the structure of the dataset
190
+ as a side effect.
183
191
 
184
192
  ```{ruby str}
185
193
  @tooth_growth.str
@@ -197,8 +205,9 @@ puts @tooth_growth.summary
197
205
 
198
206
  ## Quick plot for seeing the data
199
207
 
200
- Let's now create our first plot with the given data by accessing ggplot2 from Ruby. For Rubyist
201
- that have never seen or used ggplot2, here is the description of ggplot found on its home page:
208
+ Let's now create our first plot with the given data by accessing ggplot2 from Ruby.
209
+ For Rubyists that have never seen or used ggplot2, here is the description of ggplot
210
+ found on its home page:
202
211
 
203
212
  > "ggplot2 is a system for declaratively creating graphics, based on _The Grammar of Graphics_.
204
213
  > You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical
@@ -210,16 +219,20 @@ axis, data, title, subtitle and also graphical primitives such as _bar plot_, _b
210
219
  to form the final graphics.
211
220
 
212
221
  In order to make a plot, we apply the 'ggplot' function to the dataset. In R, this would be
213
- written as ```ggplot(<dataset>, ...)```. In Galaaz, use either ```R.ggplot(<dataset>, ...)```,
214
- or ```<dataset>.ggplot(...)```. In the graph specification bellow, we use the second notation
215
- that looks more Ruby like. The plot specifies the 'dose' on the $x$ axis and the 'length' on
216
- the $y$ axis with the 'aes' method. 'E.aes(x: :dose, y: :len)'. To specify the type of plot to
217
- create add a geom to the plot. For a boxplot, the geom is R.geom_boxplot.
222
+ written as ```ggplot(<dataset>, ...)```. Galaaz gives you the flexibility to use
223
+ either ```R.ggplot(<dataset>, ...)``` or ```<dataset>.ggplot(...)```. In the graph
224
+ specification below, we use the second notation
225
+ that looks more like Ruby. ggplot uses the ‘aes’ method to specify
226
+ x and y axes; in this case, the 'dose' on the $x$ axis and the 'length' on
227
+ the $y$ axis: 'E.aes(x: :dose, y: :len)'. To specify the type of plot add a geom to
228
+ the plot. For a boxplot, the geom is R.geom_boxplot.
218
229
 
219
230
  Note also that we have a call to 'R.png' before plotting and 'R.dev__off' after the print
220
- statement. 'R.png' opens a 'png' device for outputting the plot. 'R.dev__off'
221
- closes the device and creates the 'png' file. If we do no pass a name to the 'png' function, the
222
- image gets a default name of 'Rplot\<nnn\>' where \<nnn\> is the number of the plot. We can
231
+ statement. 'R.png' opens a 'png device' for outputting the plot. If we do not pass a
232
+ name to the 'png' function, the
233
+ image gets a default name of 'Rplot\<nnn\>' where \<nnn\> is the number of the plot.
234
+ 'R.dev__off'
235
+ closes the device and creates the 'png' file. We can
223
236
  then include the generated 'png' file in the document by adding an rmarkdown directive.
224
237
 
225
238
  ```{ruby dose_len}
@@ -237,25 +250,29 @@ R.dev__off
237
250
  [//]: # (of gKnit, the figures should be automatically saved and the name)
238
251
  [//]: # (taken from the chunk 'label' and possibly chunk parameters)
239
252
 
240
- ![](figures/dose_len.png)
253
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/dose_len.png)
241
254
 
242
255
  Great! We've just managed to create and save our first plot in Ruby with only
243
- four lines of code. We can see with this plot a clear trend: as the dose of the supplement
256
+ four lines of code. We can now easily see with this plot a clear trend: as the
257
+ dose of the supplement
244
258
  is increased, so is the length of teeth.
245
259
 
246
260
  ## Facetting the plot
247
261
 
248
262
  This first plot shows a trend, but our data has information about two different forms
249
- of delivery method, either by Orange Juice (OJ) or by Vitamin C (VC).
250
- Let's then try to create a plot that explicits the effect of each delivery method. This next
263
+ of delivery method, either by Orange Juice OJ or by Vitamin C VC.
264
+ Let's then try to create a plot that helps us discern the effect of each
265
+ delivery method. This next
251
266
  plot is a _facetted_ plot where each delivery method gets its own plot.
252
- On the left side, the plot shows the OJ delivery method. On the right side, we see the
253
- VC delivery method. To obtain this plot, we use the 'R.facet_grid' function, that
267
+ On the left side, the plot shows the OJ delivery method. On the right side,
268
+ we see the VC delivery method. To obtain this plot, we use the
269
+ 'R.facet_grid' function, that
254
270
  automatically creates the facets based on the delivery method factors. The parameter to
255
271
  the 'facet_grid' method is a [_formula_](https://thomasleeper.com/Rcourse/Tutorials/formulae.html).
256
272
 
257
- In Galaaz, formulas are written a bit differently than in R. The following changes are
258
- necessary:
273
+ In Galaaz we give programmers the flexibility to use two different ways to write formulas.
274
+ In the first way, the following changes from writing formulas (for example 'x ~ y')
275
+ in R are necessary:
259
276
 
260
277
  * R symbols are represented by the same Ruby symbol prefixed with the '+' method. The
261
278
  symbol ```x``` in R becomes ```+:x``` in Ruby;
@@ -285,7 +302,7 @@ puts @bp
285
302
  R.dev__off
286
303
  ```
287
304
 
288
- ![](figures/facet_by_delivery.png)
305
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_delivery.png)
289
306
 
290
307
  It now becomes clear that although both methods of delivery have a direct
291
308
  impact on tooth growth, method (OJ) is non-linear having a higher impact with smaller
@@ -294,9 +311,9 @@ doses of ascorbic acid and reducing it's impact as the dose increases. With the
294
311
 
295
312
  ## Adding Color
296
313
 
297
- If this paper was about data analysis, we should make a better analysis of the trends and
298
- should improve the statistical analysis. But we are interested in working with ggplot
299
- in Ruby. So, Let's add some color to this plot to make the trend and comparison more
314
+ If we were writing about data analysis, we would make a better analysis of the trends and
315
+ improve the statistical analysis. But here we are interested in working with ggplot
316
+ in Ruby. So, let's add some color to this plot to make the trend and comparison more
300
317
  visible. In the following plot, the boxes are color coded by dose. To add color, it is
301
318
  enough to add ```fill: :dose``` to the aesthetic of boxplot. With this command each 'dose'
302
319
  factor gets its own color.
@@ -310,20 +327,20 @@ puts @bp
310
327
  R.dev__off
311
328
  ```
312
329
 
313
- ![](figures/facets_by_delivery_color.png)
330
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color.png)
314
331
 
315
- Facetting helps us compare the general trends in the (OJ) and (VC) delivery methods.
316
- Adding color allow us to compare specifically how each dosage impacts the teeth growth.
317
- It is possible to observe that with smaller doses, up to 1mg, (OJ) performs better
318
- than (VC) (red color). For 2mg, both (OJ) and (VC) have the same median, but (OJ) is
332
+ Facetting helps us compare the general trends for each delivery method.
333
+ Adding color allows us to compare specifically how each dosage impacts tooth growth.
334
+ It is possible to observe that with smaller doses, up to 1mg, OJ performs better
335
+ than VC (red color). For 2mg, both OJ and VC have the same median, but OJ is
319
336
  less dispersed (blue color).
320
- For 1mg (green color), (OJ) is significantly bettern than (VC). By this very quick analysis,
321
- it seems that (OJ) is a better delivery method than (VC).
337
+ For 1mg (green color), OJ is significantly better than VC. By this very quick
338
+ visual analysis, it seems that OJ is a better delivery method than VC.
322
339
 
323
340
  ## Clarifying the data
324
341
 
325
342
  Boxplots give us a nice idea of the distribution of data, but looking at those plots with
326
- large colored boxes leaves us wondering what is going on on those boxes. According to
343
+ large colored boxes leaves us wondering what else is going on. According to
327
344
  Edward Tufte in Envisioning Information:
328
345
 
329
346
  > Thin data rightly prompts suspicions: "What are they leaving out? Is that really everything
@@ -336,7 +353,7 @@ And he states:
336
353
 
337
354
  > A most unconventional design strategy is revealed: _to clarify, add detail._
338
355
 
339
- Let's then use this wisdom and add yet another layer of data to our plot, so that we clarify
356
+ Let's use this wisdom and add yet another layer of data to our plot, so that we clarify
340
357
  it with detail and do not leave large empty boxes. In this next plot, we add data points for
341
358
  each of the 60 pigs in the experiment. For that, add the function 'R.geom_point' to the
342
359
  plot.
@@ -352,15 +369,17 @@ puts @bp
352
369
  R.dev__off
353
370
  ```
354
371
 
355
- ![](figures/facets_with_points.png)
372
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_points.png)
356
373
 
357
- Now we can see the actual distribution of all the 60 subject. Actually, this is not
374
+ Now we can see the actual distribution of all the 60 subjects. Actually, this is not
358
375
  totally true. We have a hard time seeing all 60 subjects. It seems that some points
359
376
  might be placed one over the other hiding useful information.
360
377
 
361
378
  But no sweat! Another layer might solve the problem. In the following plot a new layer
362
- called 'geom_jitter' is added to the plot. This adds randomness to the position of
363
- the points, making it easier to see all of then and preventing data hiding. We also add
379
+ called 'geom_jitter' is added to the plot. Jitter adds a small amount of random variation
380
+ to the location of each point, and is a useful way of handling overplotting caused by
381
+ discreteness in smaller datasets. This makes it easier to see all of the points and
382
+ prevents data hiding. We also add
364
383
  color and change the shape of the points, making them even easier to see.
365
384
 
366
385
  ```{ruby facets_with_jitter}
@@ -372,14 +391,14 @@ puts @bp + R.geom_jitter(shape: 23, color: "cyan3", size: 1)
372
391
  R.dev__off
373
392
  ```
374
393
 
375
- ![](figures/facets_with_jitter.png)
394
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_jitter.png)
376
395
 
377
396
  Now we can see all 60 points in the graph. We have here a much higher information density
378
397
  and we can see outliers and subjects distribution.
379
398
 
380
399
  # Preparing the Plot for Presentation
381
400
 
382
- We have come a long way since our first plot. As was already said, this is not
401
+ We have come a long way since our first plot. As we already said, this is not
383
402
  an article about data analysis and the focus is on the
384
403
  integration of Ruby and ggplot. So, let's assume that the analysis is now done. Yet,
385
404
  ending the analysis does not mean that the work is done. On the contrary, the hardest
@@ -399,11 +418,12 @@ those colors helped us understand the data, for a final presentation random colo
399
418
  can distract the viewer.
400
419
 
401
420
  In the following plot we use the function 'scale_fill_manual' to change
402
- the colors of the boxes and order of labels. For colors we use shades of blue for
421
+ the colors of the boxes and order of labels. For colors, we use shades of blue for
403
422
  each dosage, with light blue ('cyan')
404
- representing the lower dose and deep blue ('deepskyblue4') the higher dose. Also
405
- the smaller value (0.5) is on
406
- the botton of the labels and (2) at the top. This ordering seems more natural and
423
+ representing the lower dose and deep blue ('deepskyblue4') the higher dose.
424
+ Also, the legend could be improved: we use the ‘breaks’ parameter to put
425
+ the smallest value (0.5) at the bottom of the labels and the largest (2) at the top.
426
+ This ordering seems more natural and
407
427
  matches with the actual order of the colors in the plot.
408
428
 
409
429
  ```{ruby facets_by_delivery_color2}
@@ -418,7 +438,7 @@ puts @bp
418
438
  R.dev__off
419
439
  ```
420
440
 
421
- ![](figures/facets_by_delivery_color2.png)
441
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color2.png)
422
442
 
423
443
 
424
444
  ## Violin Plot and Jitter
@@ -452,7 +472,7 @@ puts @violin
452
472
  R.dev__off
453
473
  ```
454
474
 
455
- ![](figures/violin_with_jitter.png)
475
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/violin_with_jitter.png)
456
476
 
457
477
  This plot is an alternative to the original boxplot. For the final presentation, it is
458
478
  important to think which graphics will be best understood by our audience. A violin plot
@@ -462,14 +482,15 @@ bit better than the boxplot and provides even more information than the boxplot
462
482
  ## Adding Decoration
463
483
 
464
484
  Our final plot is starting to take shape, but a presentation plot should have at least a
465
- title, labels on the axis and maybe some other decorations. Let's start adding those.
485
+ title, labels on the axes and maybe some other decorations. Let's start adding those.
466
486
  Since decoration requires more graph area, this new plot has a 'width' and 'height'
467
- specification. When there is no specification, the default values for width and height are
468
- 480.
487
+ specification. When there is no specification, the default values from R for width and
488
+ height are 480.
469
489
 
470
- The 'labs' function adds require decoration. In this example we use 'title', 'subtitle',
471
- 'x' for the $x$ axis label and 'y', for the $y$ axis label, and 'caption' for information
472
- about the plot.
490
+ The 'labs' function adds the required decoration. In this example we use 'title',
491
+ 'subtitle', 'x' for the $x$ axis label and 'y', for the $y$ axis label, and 'caption'
492
+ for information about the plot (for clarity, we defined a caption variable using Ruby's
493
+ Here Doc style).
473
494
 
474
495
  ```{ruby facets_with_decorations}
475
496
  R.png("figures/facets_with_decorations.png", width: 540, height: 560)
@@ -480,8 +501,8 @@ Each animal received one of three dose levels of vitamin C.
480
501
  EOT
481
502
 
482
503
  @decorations =
483
- R.labs(title: "Tooth Growth: Length by Dose",
484
- subtitle: "Faceted by delivery method, (OJ) or (VC)",
504
+ R.labs(title: "Tooth Growth: Length vs Vitamin C Dose",
505
+ subtitle: "Faceted by delivery method, OJ or VC",
485
506
  x: "Dose (mg)", y: "Teeth length",
486
507
  caption: caption)
487
508
 
@@ -490,16 +511,19 @@ puts @bp + @decorations
490
511
  R.dev__off
491
512
  ```
492
513
 
493
- ![](figures/facets_with_decorations.png)
494
-
514
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/facets_with_decorations.png)
495
515
 
496
516
  ## The Corp Theme
497
517
 
498
- We are almost done. But the plot does not yet look nice to the eye. We are still distracted
518
+ We are almost done. But the default plot configuration does not yet look
519
+ nice to the eye. We are still distracted
499
520
  by many aspects of the graph. First, the black font color does not look good. Then
500
521
  plot background, borders, grids all add clutter to the plot.
501
522
 
502
- We will now define our corporate theme. In this theme, we remove borders and grids. The
523
+ We will now define our corporate theme in a module that can be used/loaded for all
524
+ plots, similar to CSS or any other style definition.
525
+
526
+ In this theme, we remove borders and grids. The
503
527
  background if left for faceted plots but removed for non-faceted plots. Font colors are
504
528
  a shade o blue (color: '#00080'). Axis labels are moved near the end of the axis and
505
529
  written in 'bold'.
@@ -558,6 +582,13 @@ end
 
  ## Final Box Plot
 
+ We can now easily make our final boxplot and violin plot. All the layers for the plot were
+ added in order to expose our understanding of the data and the need to present the result
+ to our audience.
+
+ The final specification is just the addition of all layers build up to this point (@bp), plus
+ the decorations (@decorations), plus the corporate theme.
+
  Here is our final boxplot, without jitter.
 
  ```{ruby final_box_plot}
@@ -568,11 +599,9 @@ puts @bp + @decorations + CorpTheme.global_theme(faceted: true)
  R.dev__off
  ```
 
- ![](figures/final_box_plot.png)
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_box_plot.png)
 
- ## Final Violin Plot
-
- Here is the final violin plot, with jitter and the same look and feel of the corporate
+ And here is the final violin plot, with jitter and the same look and feel of the corporate
  boxplot.
 
  ```{ruby final_violin_plot}
@@ -583,12 +612,14 @@ puts @violin + @decorations + CorpTheme.global_theme(faceted: true)
  R.dev__off
  ```
 
- ![](figures/final_violin_plot.png)
+
+ ![]https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_violin_plot.png
 
  ## Another View
 
- Finally, here is a last plot, with the same look and feel as before but facetted by
- dose and not by supplement.
+ We now make another plot, with the same look and feel as before but facetted by
+ dose and not by supplement. This shows how easy it is to create new plots by just
+ changing small statement on the _grammar of graphics_.
 
  ```{ruby facet_by_dose}
  R.png("figures/facet_by_dose.png", width: 540, height: 560)
@@ -611,24 +642,26 @@ puts @bp
  R.dev__off
  ```
 
- ![](figures/facet_by_dose.png)
+ ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_dose.png)
 
  # Conclusion
 
- Galaaz tightly couples Ruby and R in a way that Ruby developers do not need to be aware
+ In this article, we introduce Galaaz and show how to tightly couple Ruby and R
+ in a way that Ruby developers do not need to be aware
  of the executing R engine. For the Ruby developer the existence of R
- is of no consequence. For her, she is just coding in Ruby. On the other hand, for the R
- developer, migration to Ruby is a matter of small syntactic changes and very gentle
+ is of no consequence, she is just coding in Ruby. On the other hand, for the R
+ developer, migration to Ruby is a matter of small syntactic changes with a very gentle
  learning curve. As the R developer becomes more proficient in Ruby, he can start using
  'classes', 'modules', 'procs', 'lambdas'.
 
- This coupling shows the power of GraalVM and Truffle polyglot environment. Trying to
- bring to Ruby the power of R starting from scratch is an enourmous endeavour and would
- probably never be accomplished. Today's data scientists would certainly stick with either
- Python or R. Now, both the Ruby and R communities might benefit from this marriage. Also,
- the process to couple Ruby and R can be also be done to couple Ruby and JavaScript and
- maybe also Ruby and Python. In a polyglot world a *uniglot* language might be extremely
- relevant.
+ Trying to bring to Ruby the power of R starting from scratch is an enourmous endeavour
+ and would probably never be accomplished. Today's data scientists would certainly
+ stick with either Python or R. Now, both the Ruby and R communities can benefit
+ from this marriage, provided by Galaaz on top of GraalVM and Truffle's
+ polyglot environment. We presented
+ the process to couple Ruby and R, but this process can also be done to couple Ruby
+ and JavaScript or Ruby and Python. In a polyglot world a *uniglot* language might
+ be extremely relevant.
 
  From the perspective of performance, GraalVM and Truffle promises improvements that could
  reach over 10 times, both for [FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
@@ -636,15 +669,14 @@ and for [TruffleRuby](https://rubykaigi.org/2018/presentations/eregontp.html).
 
  This article has shown how to improve a plot step-by-step. Starting from a very simple
  boxplot with all default configurations, we moved slowly to our final plot. The important
- point here is not if the final plot is actually beautiful, but that there is a process
- of small steps improvements that can be followed until getting a final plot ready for
- presentation.
+ point here is not if the final plot is actually beautiful (as beauty is in the eye of
+ the beholder), but that there is a process of small steps improvements that can be followed
+ to getting a final plot ready for presentation.
 
  Finally, this whole article was written in rmarkdown and compiled to HTML by _gknit_, an
  application that wraps _knitr_ and allows documenting Ruby code. This application can
  be of great help for any Rubyist trying to write articles, blogs or documentation for Ruby.
 
-
  # Installing Galaaz
 
  ## Prerequisites
@@ -671,10 +703,3 @@ the gnu compiler and tools should be enough. I am not sure what is needed on th
 
  * gknit <filename>
  * In a scrip add: require 'galaaz'
-
-
- And now that you’ve read this far, here’s how to submit your story to the freeCodeCamp
- publication: send an email to submit at freecodecamp org. Include the URL for your story on
- Medium (preferably an unpublished draft) and the word “bananas” so that we’ll know that you
- have read all this. Only send one story URL per email. There’s no need to add anything
- further to your email — we just read the stories and judge them based on their own merits.