galaaz 0.4.6 → 0.5.0

Files changed (181)
  1. checksums.yaml +5 -5
  2. data/README.md +3575 -118
  3. data/Rakefile +21 -4
  4. data/bin/gknit +152 -6
  5. data/bin/gknit-draft +105 -0
  6. data/bin/gknit-draft.rb +28 -0
  7. data/bin/gknit_Rscript +127 -0
  8. data/bin/grun +27 -1
  9. data/bin/gstudio +47 -4
  10. data/bin/{gstudio.rb → gstudio_irb.rb} +0 -0
  11. data/bin/gstudio_pry.rb +7 -0
  12. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +3 -12
  13. data/blogs/galaaz_ggplot/galaaz_ggplot.html +77 -222
  14. data/blogs/galaaz_ggplot/galaaz_ggplot.md +4 -31
  15. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  16. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
  17. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
  18. data/blogs/galaaz_ggplot/midwest.Rmd +1 -9
  19. data/blogs/gknit/gknit.Rmd +232 -123
  20. data/blogs/{dev/dev.html → gknit/gknit.html} +1897 -33
  21. data/blogs/gknit/gknit.pdf +0 -0
  22. data/blogs/gknit/lst.rds +0 -0
  23. data/blogs/gknit/stats.bib +27 -0
  24. data/blogs/manual/lst.rds +0 -0
  25. data/blogs/manual/manual.Rmd +1893 -47
  26. data/blogs/manual/manual.html +3153 -347
  27. data/blogs/manual/manual.md +3575 -118
  28. data/blogs/manual/manual.pdf +0 -0
  29. data/blogs/manual/manual.tex +4026 -0
  30. data/blogs/manual/manual_files/figure-html/bubble-1.png +0 -0
  31. data/blogs/manual/manual_files/figure-html/diverging_bar.png +0 -0
  32. data/blogs/manual/manual_files/figure-latex/bubble-1.png +0 -0
  33. data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
  34. data/blogs/{dev → manual}/model.rb +0 -0
  35. data/blogs/nse_dplyr/nse_dplyr.Rmd +849 -0
  36. data/blogs/nse_dplyr/nse_dplyr.html +878 -0
  37. data/blogs/nse_dplyr/nse_dplyr.md +1198 -0
  38. data/blogs/nse_dplyr/nse_dplyr.pdf +0 -0
  39. data/blogs/oh_my/oh_my.html +274 -386
  40. data/blogs/oh_my/oh_my.md +208 -205
  41. data/blogs/ruby_plot/ruby_plot.Rmd +64 -84
  42. data/blogs/ruby_plot/ruby_plot.html +235 -208
  43. data/blogs/ruby_plot/ruby_plot.md +239 -34
  44. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  46. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  48. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  49. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  50. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  51. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  52. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  53. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  54. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  55. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  56. data/examples/Bibliography/master.bib +50 -0
  57. data/examples/Bibliography/stats.bib +72 -0
  58. data/examples/islr/ch2.spec.rb +1 -1
  59. data/examples/islr/ch3_boston.rb +4 -4
  60. data/examples/islr/x_y_rnorm.jpg +0 -0
  61. data/examples/latex_templates/Test-acm_article/Makefile +16 -0
  62. data/examples/latex_templates/Test-acm_article/Test-acm_article.Rmd +65 -0
  63. data/examples/latex_templates/Test-acm_article/acm_proc_article-sp.cls +1670 -0
  64. data/examples/latex_templates/Test-acm_article/sensys-abstract.cls +703 -0
  65. data/examples/latex_templates/Test-acm_article/sigproc.bib +59 -0
  66. data/examples/latex_templates/Test-acs_article/Test-acs_article.Rmd +260 -0
  67. data/examples/latex_templates/Test-acs_article/Test-acs_article.pdf +0 -0
  68. data/examples/latex_templates/Test-acs_article/acs-Test-acs_article.bib +11 -0
  69. data/examples/latex_templates/Test-acs_article/acs-my_output.bib +11 -0
  70. data/examples/latex_templates/Test-acs_article/acstest.bib +17 -0
  71. data/examples/latex_templates/Test-aea_article/AEA.cls +1414 -0
  72. data/examples/latex_templates/Test-aea_article/BibFile.bib +0 -0
  73. data/examples/latex_templates/Test-aea_article/Test-aea_article.Rmd +108 -0
  74. data/examples/latex_templates/Test-aea_article/Test-aea_article.pdf +0 -0
  75. data/examples/latex_templates/Test-aea_article/aea.bst +1269 -0
  76. data/examples/latex_templates/Test-aea_article/multicol.sty +853 -0
  77. data/examples/latex_templates/Test-aea_article/references.bib +0 -0
  78. data/examples/latex_templates/Test-aea_article/setspace.sty +546 -0
  79. data/examples/latex_templates/Test-amq_article/Test-amq_article.Rmd +256 -0
  80. data/examples/latex_templates/Test-amq_article/Test-amq_article.pdf +0 -0
  81. data/examples/latex_templates/Test-amq_article/Test-amq_article.pdfsync +3397 -0
  82. data/examples/latex_templates/Test-amq_article/pics/Figure2.pdf +0 -0
  83. data/examples/latex_templates/Test-ams_article/Test-ams_article.Rmd +215 -0
  84. data/examples/latex_templates/Test-ams_article/amstest.bib +436 -0
  85. data/examples/latex_templates/Test-asa_article/Test-asa_article.Rmd +153 -0
  86. data/examples/latex_templates/Test-asa_article/Test-asa_article.pdf +0 -0
  87. data/examples/latex_templates/Test-asa_article/agsm.bst +1353 -0
  88. data/examples/latex_templates/Test-asa_article/bibliography.bib +233 -0
  89. data/examples/latex_templates/Test-ieee_article/IEEEtran.bst +2409 -0
  90. data/examples/latex_templates/Test-ieee_article/IEEEtran.cls +6346 -0
  91. data/examples/latex_templates/Test-ieee_article/Test-ieee_article.Rmd +175 -0
  92. data/examples/latex_templates/Test-ieee_article/Test-ieee_article.pdf +0 -0
  93. data/examples/latex_templates/Test-ieee_article/mybibfile.bib +20 -0
  94. data/examples/latex_templates/Test-rjournal_article/RJournal.sty +335 -0
  95. data/examples/latex_templates/Test-rjournal_article/RJreferences.bib +18 -0
  96. data/examples/latex_templates/Test-rjournal_article/RJwrapper.pdf +0 -0
  97. data/examples/latex_templates/Test-rjournal_article/Test-rjournal_article.Rmd +52 -0
  98. data/examples/latex_templates/Test-springer_article/Test-springer_article.Rmd +65 -0
  99. data/examples/latex_templates/Test-springer_article/Test-springer_article.pdf +0 -0
  100. data/examples/latex_templates/Test-springer_article/bibliography.bib +26 -0
  101. data/examples/latex_templates/Test-springer_article/spbasic.bst +1658 -0
  102. data/examples/latex_templates/Test-springer_article/spmpsci.bst +1512 -0
  103. data/examples/latex_templates/Test-springer_article/spphys.bst +1443 -0
  104. data/examples/latex_templates/Test-springer_article/svglov3.clo +113 -0
  105. data/examples/latex_templates/Test-springer_article/svjour3.cls +1431 -0
  106. data/examples/misc/moneyball.rb +1 -1
  107. data/examples/misc/subsetting.rb +37 -37
  108. data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.Rmd +73 -0
  109. data/examples/rmarkdown/svm-rmarkdown-anon-ms-example/svm-rmarkdown-anon-ms-example.pdf +0 -0
  110. data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.Rmd +382 -0
  111. data/examples/rmarkdown/svm-rmarkdown-article-example/svm-rmarkdown-article-example.pdf +0 -0
  112. data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.Rmd +164 -0
  113. data/examples/rmarkdown/svm-rmarkdown-beamer-example/svm-rmarkdown-beamer-example.pdf +0 -0
  114. data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.Rmd +92 -0
  115. data/examples/rmarkdown/svm-rmarkdown-cv/svm-rmarkdown-cv.pdf +0 -0
  116. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/attend-grade-relationships.csv +482 -0
  117. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.Rmd +280 -0
  118. data/examples/rmarkdown/svm-rmarkdown-syllabus-example/svm-rmarkdown-syllabus-example.pdf +0 -0
  119. data/examples/rmarkdown/svm-xaringan-example/svm-xaringan-example.Rmd +386 -0
  120. data/lib/R_interface/r.rb +2 -2
  121. data/lib/R_interface/r_libs.R +6 -1
  122. data/lib/R_interface/r_methods.rb +12 -2
  123. data/lib/R_interface/rdata_frame.rb +8 -17
  124. data/lib/R_interface/rindexed_object.rb +1 -2
  125. data/lib/R_interface/rlist.rb +1 -0
  126. data/lib/R_interface/robject.rb +20 -23
  127. data/lib/R_interface/rpkg.rb +15 -6
  128. data/lib/R_interface/rsupport.rb +13 -19
  129. data/lib/R_interface/ruby_extensions.rb +14 -18
  130. data/lib/R_interface/rvector.rb +0 -12
  131. data/lib/gknit.rb +2 -0
  132. data/lib/gknit/draft.rb +105 -0
  133. data/lib/gknit/knitr_engine.rb +6 -37
  134. data/lib/util/exec_ruby.rb +22 -84
  135. data/lib/util/inline_file.rb +7 -3
  136. data/specs/figures/bg.jpeg +0 -0
  137. data/specs/figures/bg.png +0 -0
  138. data/specs/figures/bg.svg +2 -2
  139. data/specs/figures/dose_len.png +0 -0
  140. data/specs/figures/no_args.jpeg +0 -0
  141. data/specs/figures/no_args.png +0 -0
  142. data/specs/figures/no_args.svg +2 -2
  143. data/specs/figures/width_height.jpeg +0 -0
  144. data/specs/figures/width_height.png +0 -0
  145. data/specs/figures/width_height_units1.jpeg +0 -0
  146. data/specs/figures/width_height_units1.png +0 -0
  147. data/specs/figures/width_height_units2.jpeg +0 -0
  148. data/specs/figures/width_height_units2.png +0 -0
  149. data/specs/r_dataframe.spec.rb +184 -11
  150. data/specs/r_list.spec.rb +4 -4
  151. data/specs/r_list_apply.spec.rb +11 -10
  152. data/specs/ruby_expression.spec.rb +3 -11
  153. data/specs/tmp.rb +106 -34
  154. data/version.rb +1 -1
  155. metadata +96 -33
  156. data/bin/gknit_old_r +0 -236
  157. data/blogs/dev/dev.Rmd +0 -77
  158. data/blogs/dev/dev.md +0 -87
  159. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  160. data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
  161. data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
  162. data/blogs/dplyr/dplyr.rb +0 -63
  163. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +0 -43
  164. data/blogs/galaaz_ggplot/galaaz_ggplot.log +0 -640
  165. data/blogs/galaaz_ggplot/galaaz_ggplot.out +0 -10
  166. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +0 -481
  167. data/blogs/galaaz_ggplot/midwest.png +0 -0
  168. data/blogs/galaaz_ggplot/scatter_plot.png +0 -0
  169. data/blogs/ruby_plot/ruby_plot.Rmd_external_figs +0 -662
  170. data/blogs/ruby_plot/ruby_plot.tex +0 -1077
  171. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +0 -57
  172. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +0 -106
  173. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +0 -110
  174. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +0 -174
  175. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +0 -236
  176. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +0 -296
  177. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +0 -236
  178. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +0 -218
  179. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +0 -128
  180. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +0 -150
  181. data/examples/paper/paper.rb +0 -36
@@ -1,662 +0,0 @@
1
- ---
2
- title: "How to make Beautiful Ruby Plots with Galaaz"
3
- author:
4
- - "Rodrigo Botafogo"
5
- - "Daniel Mossé - University of Pittsburgh"
6
- tags: [Tech, Data Science, Ruby, R, GraalVM]
7
- date: "November 19th, 2018"
8
- output:
9
- html_document:
10
- self_contained: true
11
- keep_md: true
12
- pdf_document:
13
- includes:
14
- in_header: "../../sty/galaaz.sty"
15
- keep_tex: yes
16
- number_sections: yes
17
- toc: true
18
- toc_depth: 2
19
- fontsize: 11pt
20
- ---
21
-
22
- ```{r setup, echo=FALSE}
23
- # set global chunk options. We want all figures to be 'svg'
24
- knitr::opts_chunk$set(fig.width=7, fig.height=7, dev="svg")
25
- ```
26
-
27
- According to Wikipedia "Ruby is a dynamic, interpreted, reflective, object-oriented,
28
- general-purpose programming language. It was designed and developed in the mid-1990s by Yukihiro
29
- "Matz" Matsumoto in Japan." It reached high popularity with the development of Ruby on Rails
30
- (RoR) by David Heinemeier Hansson. RoR is a web application framework first released
31
- around 2005. It makes extensive use of Ruby's metaprogramming features. With RoR,
32
- Ruby became very popular. According to [Ruby's Tiobe index](https://www.tiobe.com/tiobe-index/ruby/)
33
- it peaked in popularity around 2008, then declined until 2015 when it started picking up again.
34
- At the time of this writing (November 2018), the Tiobe index ranks Ruby as the 16th
35
- most popular language.
36
-
37
- Python, a language similar to Ruby, ranks 4th in the index. Java, C and C++ take the
38
- first three positions. Ruby is often criticized for its focus on web applications.
39
- But Ruby can do [much more](https://github.com/markets/awesome-ruby) than just web applications.
40
- Yet, for scientific computing, Ruby lags way behind Python and R. Python has the
41
- Django framework for the web, NumPy for numerical arrays, and Pandas for data analysis.
42
- R is a free software environment for statistical computing and graphics with thousands
43
- of libraries for data analysis.
44
-
45
- Until recently, there was no real prospect for Ruby to bridge this gap.
46
- Implementing a complete scientific computing infrastructure would take too long.
47
- Enter [Oracle's GraalVM](https://www.graalvm.org/):
48
-
49
- > GraalVM is a universal virtual machine for running applications written in
50
- > JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala, Kotlin,
51
- > and LLVM-based languages such as C and C++.
52
- >
53
- > GraalVM removes the isolation between programming languages and enables
54
- > interoperability in a shared runtime. It can run either standalone or in the
55
- > context of OpenJDK, Node.js, Oracle Database, or MySQL.
56
- >
57
- > GraalVM allows you to write polyglot applications with a seamless way to pass
58
- > values from one language to another. With GraalVM there is no copying or
59
- > marshaling necessary as it is with other polyglot systems. This lets you
60
- > achieve high performance when language boundaries are crossed. Most of the time
61
- > there is no additional cost for crossing a language boundary at all.
62
- >
63
- > Often developers have to make uncomfortable compromises that require them
64
- > to rewrite their software in other languages. For example:
65
- >
66
- > * That library is not available in my language. I need to rewrite it.
67
- > * That language would be the perfect fit for my problem, but we cannot
68
- > run it in our environment.
69
- > * That problem is already solved in my language, but the language is
70
- > too slow.
71
- >
72
- > With GraalVM we aim to allow developers to freely choose the right language for
73
- > the task at hand without making compromises.
74
-
75
- As stated above, GraalVM is a _universal_ virtual machine that allows Ruby and R (and other
76
- languages) to run on the same environment. GraalVM allows polyglot applications to
77
- _seamlessly_ interact with one another and pass values from one language to the other.
78
- Although a great idea, GraalVM still requires application writers to know several languages.
79
- To eliminate that requirement, we built Galaaz, a gem for Ruby, to tightly couple
80
- Ruby and R and allow those languages to interact in a way that the user will be unaware
81
- of such interaction. In other words, a Ruby programmer will be able to use all
82
- the capabilities of R without knowing the R syntax.
83
-
84
- Library wrapping is a usual way of bringing features from one language into another.
85
- To improve performance, Python often wraps more efficient C libraries. For the
86
- Python developer, the existence of such C libraries is hidden. The problem with
87
- library wrapping is that for any new library, there is the need to handcraft a new
88
- wrapper.
89
-
90
- Galaaz, instead of wrapping a single C or R library, wraps the whole R language
91
- in Ruby. By doing so, all of the thousands of R libraries are available immediately
92
- to Ruby developers without any new wrapping effort.
93
-
94
- To show the power of Galaaz, we show in this article how Ruby can use R's ggplot2
95
- library transparently, bringing to Ruby the power of high-quality scientific plotting.
96
- We also show that migrating from R to Ruby with Galaaz is a matter of small
97
- syntactic changes. By using Ruby, the R developer can use all of Ruby's powerful
98
- object-oriented features. Also, with Ruby, it becomes much easier to move code
99
- from the analysis phase to the production phase.
100
-
101
- In this article we will explore the R ToothGrowth dataset. To illustrate, we will
102
- create some boxplots. A primer on boxplots is available in
103
- [this article](https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51).
104
-
105
- We will also create a Corporate Template ensuring that plots will have a consistent
106
- visualization. This template is built using a Ruby module. There is a way of building
107
- ggplot themes that will work the same as the Ruby module. Yet, writing a new theme
108
- requires specific knowledge on theme writing. Ruby modules are standard to the
109
- language and don't need special knowledge.
110
-
111
- [Here](https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021) we show a scatter plot in Ruby also with Galaaz.
112
-
113
- # gKnit
114
-
115
- _Knitr_ is an application that converts text written in rmarkdown to many
116
- different output formats. For instance, a writer can convert an rmarkdown document
117
- to HTML, LaTeX, docx and many other formats. Rmarkdown documents can contain
118
- text and _code chunks_. Knitr formats code chunks in a grayed box in the output document.
119
- It also executes the code chunks and formats the output in a white box. Every line of
120
- output from the executed code is preceded by '##'.
121
-
122
- Knitr allows code chunks to be in R, Python,
123
- Ruby and dozens of other languages. Yet, while R and Python chunks can share data, in other
124
- languages, chunks are independent. This means that a variable defined in one chunk
125
- cannot be used in another chunk.
126
-
127
- With _gKnit_ Ruby code chunks can share data. In gKnit each
128
- Ruby chunk executes in its own scope, and thus local variables defined in a chunk are
129
- not accessible by other chunks. Yet all chunks execute in the scope of a 'chunk'
130
- class, so instance variables ('@') are available in all chunks.
131
-
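The scoping rule described for gKnit can be imitated in plain Ruby. The sketch below is only a toy model and makes no claim about how gKnit is implemented: `ChunkRunner` and `run` are hypothetical names, and each "chunk" is a string evaluated with `instance_eval` on one shared object, so local variables disappear between chunks while instance variables persist.

```ruby
# Toy model of chunk scoping. ChunkRunner is a hypothetical name used only
# for this illustration; it is not part of gKnit or Galaaz.
class ChunkRunner
  # Evaluate a chunk's source with self set to this runner. Every call gets
  # a fresh local scope, but instance variables live on the runner itself.
  def run(source)
    instance_eval(source)
  end
end

runner = ChunkRunner.new

# First chunk: defines a local variable and an instance variable.
runner.run("local_value = 1; @shared = local_value + 41")

# Second chunk: the instance variable is still there...
shared = runner.run("@shared")

# ...but the local variable from the first chunk is not.
local_seen = runner.run("defined?(local_value)")

puts "shared = #{shared.inspect}, local visible = #{!local_seen.nil?}"
```

This is why the chunks in this document store the dataset in '@tooth_growth' rather than in a plain local variable.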
132
- # Exploring the Dataset
133
-
134
- Let's start by exploring our selected dataset. ToothGrowth is an R dataset. A dataset
135
- is like a simple Excel spreadsheet, in which each column has only one type of data.
135
- For instance, one column can hold floats, another integers, and a third strings.
137
- This dataset analyzes the length of odontoblasts (cells responsible for tooth growth)
138
- in 60 guinea pigs, where each animal received one of three dose levels of Vitamin C
139
- (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid
140
- (a form of vitamin C and coded as VC).
141
-
142
- The ToothGrowth dataset contains three columns: 'len', 'supp' and 'dose'. Let's
143
- take a look at a few rows of this dataset. In Galaaz, R variables are accessed
144
- by using the corresponding Ruby symbol preceded by the tilde ('~') function. Note in the
145
- following chunk that 'ToothGrowth' is the R variable and Ruby's '@tooth_growth' is
146
- assigned the value of '~:ToothGrowth'.
147
-
148
- ```{ruby head}
149
- # Read the R ToothGrowth variable and assign it to the
150
- # Ruby instance variable @tooth_growth that will be
151
- # available to all Ruby chunks in this document.
152
- @tooth_growth = ~:ToothGrowth
153
- # print the first few elements of the dataset
154
- puts @tooth_growth.head
155
- ```
156
-
157
- Great! We've managed to read the ToothGrowth dataset and take a look at its elements.
158
- We see here the first 6 rows of the dataset. To access a column, follow the dataset name
159
- with a dot ('.') and the name of the column. Also use dot notation to chain methods
160
- in usual Ruby style.
161
-
162
- ```{ruby dataset_columns}
163
- # Access the tooth_growth 'len' column and print the first few
164
- # elements of this column with the 'head' method.
165
- puts @tooth_growth.len.head
166
- ```
167
-
168
- The 'dose' column contains numeric values of either 0.5, 1 or 2, although the
169
- first 6 rows as seen above only contain the 0.5 values. Even though those are
170
- numbers, they are better interpreted as a [factor or category](https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/). So, let's convert our 'dose' column from numeric to 'factor'.
171
- In R, the function 'as.factor' is used to convert data in a vector to factors. To use this
172
- function from Galaaz the dot ('.') in the function name is substituted by '__' (double underline).
173
- The function 'as.factor' becomes 'R.as__factor' or just 'as__factor' when chaining.
174
-
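The naming rule is mechanical, and a one-line helper is enough to illustrate it. The function `r_function_name` below is a hypothetical name introduced for this sketch; it only shows the string mapping and is not how Galaaz dispatches calls internally.

```ruby
# Hypothetical helper illustrating the naming convention only:
# a double underscore in the Ruby method name stands for a dot in R.
def r_function_name(ruby_name)
  ruby_name.to_s.gsub("__", ".")
end

as_factor = r_function_name(:as__factor)    # "as.factor"
dev_off   = r_function_name(:dev__off)      # "dev.off"
boxplot   = r_function_name(:geom_boxplot)  # single underscores are untouched

puts as_factor, dev_off, boxplot
```

Single underscores are left alone, which is why R functions such as 'geom_boxplot' keep their names unchanged in Ruby.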
175
- ```{ruby tooth_growth}
176
- # convert the dose to a factor
177
- @tooth_growth.dose = @tooth_growth.dose.as__factor
178
- ```
179
-
180
- Let's explore some more details of this dataset. In particular, let's look at its dimensions,
181
- structure and summary statistics.
182
-
183
- ```{ruby dim}
184
- puts @tooth_growth.dim
185
- ```
186
-
187
- This dataset has 60 rows, one for each subject, and 3 columns, as we have already seen.
188
-
189
- Note that we do not need to call 'puts' when using the 'str' function. This
190
- function does not return anything; it prints the structure of the dataset
191
- as a side effect.
192
-
193
- ```{ruby str}
194
- @tooth_growth.str
195
- ```
196
- Observe that both variables 'supp' and 'dose' are factors. The system made variable 'supp'
197
- a factor automatically, since it contains two strings, OJ and VC.
198
-
199
- Finally, using the summary method, we get the statistical summary for the dataset:
200
-
201
- ```{ruby summary}
202
- puts @tooth_growth.summary
203
- ```
204
-
205
- # Doing the Data Analysis
206
-
207
- ## Quick plot for seeing the data
208
-
209
- Let's now create our first plot with the given data by accessing ggplot2 from Ruby.
210
- For Rubyists that have never seen or used ggplot2, here is the description of ggplot
211
- found in its home page:
212
-
213
- > "ggplot2 is a system for declaratively creating graphics, based on _The Grammar of Graphics_.
214
- > You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical
215
- > primitives to use, and it takes care of the details."
216
-
217
- This description might be a bit cryptic and it is best to see it at work to understand it.
218
- Basically, in the _grammar of graphics_ developers add layers of components such as grid,
219
- axis, data, title, subtitle and also graphical primitives such as _bar plot_, _box plot_,
220
- to form the final graphics.
221
-
222
- In order to make a plot, we apply the 'ggplot' function to the dataset. In R, this would be
223
- written as ```ggplot(<dataset>, ...)```. Galaaz gives you the flexibility to use
224
- either ```R.ggplot(<dataset>, ...)``` or ```<dataset>.ggplot(...)```. In the graph
225
- specification below, we use the second notation
226
- that looks more like Ruby. ggplot uses the ‘aes’ method to specify
227
- x and y axes; in this case, the 'dose' on the $x$ axis and the 'len' (length) on
228
- the $y$ axis: 'E.aes(x: :dose, y: :len)'. To specify the type of plot add a geom to
229
- the plot. For a boxplot, the geom is R.geom_boxplot.
230
-
231
- Note also that we have a call to 'R.png' before plotting and 'R.dev__off' after the print
232
- statement. 'R.png' opens a 'png device' for outputting the plot. If we do not pass a
233
- name to the 'png' function, the
234
- image gets a default name of 'Rplot\<nnn\>' where \<nnn\> is the number of the plot.
235
- 'R.dev__off'
236
- closes the device and creates the 'png' file. We can
237
- then include the generated 'png' file in the document by adding an rmarkdown directive.
238
-
239
- ```{ruby dose_len}
240
- require 'ggplot'
241
-
242
- e = @tooth_growth.ggplot(E.aes(x: :dose, y: :len))
243
- print e + R.geom_boxplot
244
- ```
245
-
246
- [//]: # (Including the 'png' file generated above. In future releases)
247
- [//]: # (of gKnit, the figures should be automatically saved and the name)
248
- [//]: # (taken from the chunk 'label' and possibly chunk parameters)
249
-
250
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/dose_len.png)
251
-
252
- Great! We've just managed to create and save our first plot in Ruby with only
253
- four lines of code. We can now easily see with this plot a clear trend: as the
254
- dose of the supplement
255
- is increased, so is the length of teeth.
256
-
257
- ## Facetting the plot
258
-
259
- This first plot shows a trend, but our data has information about two different forms
260
- of delivery method: orange juice (OJ) or ascorbic acid (VC).
261
- Let's then try to create a plot that helps us discern the effect of each
262
- delivery method. This next
263
- plot is a _facetted_ plot where each delivery method gets its own plot.
264
- On the left side, the plot shows the OJ delivery method. On the right side,
265
- we see the VC delivery method. To obtain this plot, we use the
266
- 'R.facet_grid' function, that
267
- automatically creates the facets based on the delivery method factors. The parameter to
268
- the 'facet_grid' method is a [_formula_](https://thomasleeper.com/Rcourse/Tutorials/formulae.html).
269
-
270
- In Galaaz we give programmers the flexibility to use two different ways to write formulas.
271
- In the first way, the following changes from writing formulas (for example 'x ~ y')
272
- in R are necessary:
273
-
274
- * R symbols are represented by the same Ruby symbol prefixed with the '+' method. The
275
- symbol ```x``` in R becomes ```+:x``` in Ruby;
276
- * The '~' operator in R becomes '=~' in Ruby. The formula ```x ~ y``` in R is written as
277
- ```+:x =~ +:y``` in Ruby;
278
- * The '.' symbol in R becomes '+:all'
279
-
280
- Another way of writing a formula is to use the 'formula' function with the actual formula as
281
- a string. The formula ```x ~ y``` in R can be written as ```R.formula("x ~ y")```. For more
282
- complex formulas, the use of the 'formula' function is preferred.
283
-
284
- The formula ```+:all =~ +:supp``` indicates to the 'facet_grid' function that it needs to
285
- facet the plot based on the ```supp``` variable and split the plot vertically. Changing
286
- the formula to ```+:supp =~ +:all``` would split the plot horizontally.
287
-
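To make the operator notation less mysterious, here is a toy version of the idea in plain Ruby. It is a sketch under stated assumptions and not Galaaz's actual classes: `FormulaTerm` is a hypothetical name, and the unary `+` added to `Symbol` here simply builds a string, whereas Galaaz builds real R objects.

```ruby
# Toy formula builder. FormulaTerm is a hypothetical class for this
# illustration; Galaaz's real implementation creates R language objects.
class FormulaTerm
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # lhs =~ rhs builds the text of the R formula "lhs ~ rhs".
  def =~(other)
    FormulaTerm.new("#{text} ~ #{other.text}")
  end

  def to_s
    text
  end
end

# Unary plus on a symbol turns it into a formula term.
# The special symbol :all stands for R's '.'.
class Symbol
  def +@
    FormulaTerm.new(self == :all ? "." : to_s)
  end
end

vertical   = (+:all =~ +:supp).to_s   # ". ~ supp"
horizontal = (+:supp =~ +:all).to_s   # "supp ~ ."

puts vertical, horizontal
```

The same two formulas could be written as plain strings, which is what the 'R.formula("...")' form described earlier does.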
288
- ```{ruby facet_by_delivery}
289
- @base_tooth = @tooth_growth.ggplot(E.aes(x: :dose, y: :len, group: :dose))
290
-
291
- @bp = @base_tooth + R.geom_boxplot +
292
- # Split in vertical direction
293
- R.facet_grid(+:all =~ +:supp)
294
-
295
- puts @bp
296
- ```
297
-
298
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_delivery.png)
299
-
300
- It now becomes clear that although both methods of delivery have a direct
301
- impact on tooth growth, method (OJ) is non-linear having a higher impact with smaller
302
- doses of ascorbic acid and reducing its impact as the dose increases. With the
303
- (VC) approach, the impact seems to be more linear.
304
-
305
- ## Adding Color
306
-
307
- If we were writing about data analysis, we would make a better analysis of the trends and
308
- improve the statistical analysis. But here we are interested in working with ggplot
309
- in Ruby. So, let's add some color to this plot to make the trend and comparison more
310
- visible. In the following plot, the boxes are color coded by dose. To add color, it is
311
- enough to add ```fill: :dose``` to the aesthetic of boxplot. With this command each 'dose'
312
- factor gets its own color.
313
-
314
- ```{ruby facets_by_delivery_color}
315
- @bp = @bp + R.geom_boxplot(E.aes(fill: :dose))
316
- puts @bp
317
- ```
318
-
319
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color.png)
320
-
321
- Facetting helps us compare the general trends for each delivery method.
322
- Adding color allows us to compare specifically how each dosage impacts tooth growth.
323
- It is possible to observe that with smaller doses, up to 1mg, OJ performs better
324
- than VC (red color). For 2mg, both OJ and VC have the same median, but OJ is
325
- less dispersed (blue color).
326
- For 1mg (green color), OJ is significantly better than VC. By this very quick
327
- visual analysis, it seems that OJ is a better delivery method than VC.
328
-
329
- ## Clarifying the data
330
-
331
- Boxplots give us a nice idea of the distribution of data, but looking at those plots with
332
- large colored boxes leaves us wondering what else is going on. According to
333
- Edward Tufte in Envisioning Information:
334
-
335
- > Thin data rightly prompts suspicions: "What are they leaving out? Is that really everything
336
- > they know? What are they hiding? Is that all they did?" Now and then it is claimed
337
- > that vacant space is "friendly" (anthropomorphizing an inherently murky idea) but
338
- > _it is not how much empty space there is, but rather how it is used. It is not how much
339
- > information there is, but rather how effectively it is arranged._
340
-
341
- And he states:
342
-
343
- > A most unconventional design strategy is revealed: _to clarify, add detail._
344
-
345
- Let's use this wisdom and add yet another layer of data to our plot, so that we clarify
346
- it with detail and do not leave large empty boxes. In this next plot, we add data points for
347
- each of the 60 guinea pigs in the experiment. For that, add the function 'R.geom_point' to the
348
- plot.
349
-
350
- ```{ruby facets_with_points}
351
- # Add a layer with the individual data points
352
- @bp = @bp + R.geom_point
353
-
354
- puts @bp
355
- ```
356
-
357
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_points.png)
358
-
359
- Now we can see the actual distribution of all the 60 subjects. Actually, this is not
360
- totally true. We have a hard time seeing all 60 subjects. It seems that some points
361
- might be placed one over the other hiding useful information.
362
-
363
- But no sweat! Another layer might solve the problem. In the following plot a new layer
364
- called 'geom_jitter' is added to the plot. Jitter adds a small amount of random variation
365
- to the location of each point, and is a useful way of handling overplotting caused by
366
- discreteness in smaller datasets. This makes it easier to see all of the points and
367
- prevents data hiding. We also add
368
- color and change the shape of the points, making them even easier to see.
369
-
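The idea behind jittering can be shown with a few lines of plain Ruby. This is a conceptual sketch only, not ggplot2's code: the `jitter` helper, its `width` parameter, the fixed seed and the sample values are all assumptions made for the illustration.

```ruby
# Conceptual sketch of jittering; the helper name, the width and the
# sample values are made up for this illustration.
def jitter(values, width:, rng: Random.new(42))
  # Shift every value by a uniform random amount in [-width, width].
  values.map { |v| v + rng.rand(-width..width) }
end

# Several subjects share the same dose, so their points overlap on the x axis.
doses    = [0.5, 0.5, 0.5, 1.0, 1.0, 2.0]
jittered = jitter(doses, width: 0.2)

doses.zip(jittered).each do |original, moved|
  puts format("%.1f -> %.3f", original, moved)
end
```

Each point stays close to its true dose, but points that shared the same position are now spread apart and become individually visible.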
370
- ```{ruby facets_with_jitter}
371
- # Add jittered points on top of the boxplots
372
- puts @bp + R.geom_jitter(shape: 23, color: "cyan3", size: 1)
373
- ```
374
-
375
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_with_jitter.png)
376
-
377
- Now we can see all 60 points in the graph. We have here a much higher information density
378
- and we can see outliers and the distribution of subjects.
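
The idea behind jitter can be sketched in a few lines of plain Ruby. This is only an illustration of the technique, not how ggplot2 implements 'geom_jitter'; the method name `jitter`, the `width` value and the sample doses are assumptions made for the example.

```ruby
# Jitter sketch: nudge each x position by a small random offset so that
# points sharing the same discrete value no longer sit on top of each other.
# A seeded generator keeps the result reproducible between runs.
def jitter(values, width: 0.2, rng: Random.new(42))
  values.map { |v| v + rng.rand(-width..width) }
end

# Hypothetical dose values: several subjects share each dose level,
# so an unjittered plot would draw them at the same x position.
doses = [0.5, 0.5, 0.5, 1.0, 1.0, 2.0]
jittered = jitter(doses)

puts "distinct x positions before: #{doses.uniq.size}"
puts "distinct x positions after:  #{jittered.uniq.size}"
```

Each point stays within `width` of its true value, so the dose groups remain recognizable while individual subjects become visible.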
379
-
380
- # Preparing the Plot for Presentation
381
-
382
- We have come a long way since our first plot. As we already said, this is not
383
- an article about data analysis and the focus is on the
384
- integration of Ruby and ggplot. So, let's assume that the analysis is now done. Yet,
385
- ending the analysis does not mean that the work is done. On the contrary, the hardest
386
- part is yet to come!
387
-
388
- After the analysis it is necessary to communicate it by making a final plot for
389
- presentation. The last plot has all the information we want to share, but it is not very
390
- pleasing to the eye.
391
-
392
- ## Improving Colors
393
-
394
- Let's start by trying to improve colors. For now, we will not use the jitter layer.
395
- The previous plot has three bright colors that have no relationship between them. Is
396
- there any obvious, or non-obvious for that matter, interpretation for the colors?
397
- Clearly, they are just random colors selected automatically by our software. Although
398
- those colors helped us understand the data, for a final presentation random colors
399
- can distract the viewer.
400
-
401
- In the following plot we use the function 'scale_fill_manual' to change
402
- the colors of the boxes and order of labels. For colors, we use shades of blue for
403
- each dosage, with light blue ('cyan')
404
- representing the lower dose and deep blue ('deepskyblue4') the higher dose.
405
- Also, the legend could be improved: we use the ‘breaks’ parameter to put
406
- the smallest value (0.5) at the bottom of the labels and the largest (2) at the top.
407
- This ordering seems more natural and
408
- matches the actual order of the colors in the plot.
409
-
410
- ```{ruby facets_by_delivery_color2}
411
- @bp = @bp +
412
- R.scale_fill_manual(values: R.c("cyan", "deepskyblue", "deepskyblue4"),
413
- breaks: R.c("2","1","0.5"))
414
-
415
- puts @bp
416
- ```
417
-
418
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facets_by_delivery_color2.png)
419
-
420
-
421
- ## Violin Plot and Jitter
422
-
423
- The boxplot with jitter did look a bit overwhelming. The next plot uses a variation of
424
- a boxplot known as a _violin plot_ with jittered data.
425
-
426
- [From Wikipedia](https://en.wikipedia.org/wiki/Violin_plot)
427
-
428
-
429
- > A violin plot is a method of plotting numeric data. It is similar to a box plot with
430
- > a rotated kernel density plot on each side.
431
- >
432
- > A violin plot has four layers. The outer shape represents all possible results, with
433
- > thickness indicating how common. (Thus the thickest section represents the mode average.)
434
- > The next layer inside represents the values that occur 95% of the time.
435
- > The next layer (if it exists) inside represents the values that occur 50% of the time.
436
- > The central dot represents the median average value.
437
-
438
- ```{ruby violin_with_jitter}
439
- @violin = @base_tooth + R.geom_violin(E.aes(fill: :dose)) +
440
- R.facet_grid(+:all =~ +:supp) +
441
- R.geom_jitter(shape: 23, color: "cyan3", size: 1) +
442
- R.scale_fill_manual(values: R.c("cyan", "deepskyblue", "deepskyblue4"),
443
- breaks: R.c("2","1","0.5"))
444
-
445
- puts @violin
446
- ```
447
-
448
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/violin_with_jitter.png)
449
-
450
- This plot is an alternative to the original boxplot. For the final presentation, it is
451
- important to think which graphics will be best understood by our audience. A violin plot
452
- is a less known plot and could add mental overhead, yet, in my opinion, it does look a little
453
- bit better than the boxplot and provides even more information than the boxplot with jitter.
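
The outline of a violin is a kernel density estimate mirrored on both sides. The following plain Ruby sketch computes a Gaussian kernel density estimate and prints it as a row of stars per value, to give a feeling for what the shape encodes. The sample values and the hand-picked bandwidth are assumptions for illustration; ggplot2 chooses a bandwidth automatically.

```ruby
# Standard normal density, used as the smoothing kernel.
def gaussian_kernel(u)
  Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math::PI)
end

# Kernel density estimate of `sample` evaluated at point x.
def kde(sample, x, bandwidth)
  total = sample.sum { |xi| gaussian_kernel((x - xi) / bandwidth) }
  total / (sample.size * bandwidth)
end

# Hypothetical tooth lengths, not taken from the real dataset.
lengths = [4.0, 5.5, 6.0, 6.5, 7.0, 7.5, 10.0, 11.0, 11.5, 12.0]
bandwidth = 1.5 # chosen by hand for this sketch

# One row per length value; a longer row means a thicker violin at that height.
(2..14).step(2) do |x|
  density = kde(lengths, x.to_f, bandwidth)
  puts format("%4.1f | %s", x, "*" * (density * 200).round)
end
```

Rows with more stars correspond to the thicker sections of the violin, where observations are more common.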
454
-
455
- ## Adding Decoration
456
-
457
- Our final plot is starting to take shape, but a presentation plot should have at least a
458
- title, labels on the axes and maybe some other decorations. Let's start adding those.
459
- Since decoration requires more graph area, this new plot has a 'width' and 'height'
460
- specification. When there is no specification, the default values from R for width and
461
- height are 480.
462
-
463
- The 'labs' function adds the required decoration. In this example we use 'title',
464
- 'subtitle', 'x' for the $x$ axis label and 'y', for the $y$ axis label, and 'caption'
465
- for information about the plot (for clarity, we defined a caption variable using Ruby's
466
- Here Doc style).
467
-
468
- ```{ruby facets_with_decorations, dev = "png", fig.width = 540, fig.height = 560, units = "px"}
469
- caption = <<-EOT
470
- Length of odontoblasts in 60 guinea pigs.
471
- Each animal received one of three dose levels of vitamin C.
472
- EOT
473
-
474
- @decorations =
475
- R.labs(title: "Tooth Growth: Length vs Vitamin C Dose",
476
- subtitle: "Faceted by delivery method, OJ or VC",
477
- x: "Dose (mg)", y: "Teeth length",
478
- caption: caption)
479
-
480
- puts @bp + @decorations
481
- ```
482
-
483
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/facets_with_decorations.png)
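
A note on the Here Doc used for the caption: the `<<-` form allows the closing marker to be indented but keeps any leading whitespace in the text, while the squiggly form `<<~` (Ruby 2.3 and later) removes the common indentation. A minimal sketch of the difference, using a shortened caption as sample text:

```ruby
# `<<-` keeps the indentation of the body as part of the string.
dash = <<-EOT
  Length of odontoblasts in 60 guinea pigs.
  EOT

# `<<~` strips the common leading whitespace from every line.
squiggly = <<~EOT
  Length of odontoblasts in 60 guinea pigs.
EOT

p dash      # the two leading spaces are part of the string
p squiggly  # the text starts at the first letter
```

For captions this mostly matters when the code is indented inside a method or module, since stray leading spaces end up in the rendered text.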
484
-
485
- ## The Corp Theme
486
-
487
- We are almost done. But the default plot configuration does not yet look
488
- nice to the eye. We are still distracted
489
- by many aspects of the graph. First, the black font color does not look good. Then
490
- plot background, borders, grids all add clutter to the plot.
491
-
492
- We will now define our corporate theme in a module that can be used/loaded for all
493
- plots, similar to CSS or any other style definition.
494
-
495
- In this theme, we remove borders and grids. The
496
- background is left for faceted plots but removed for non-faceted plots. Font colors are
497
- a shade of blue (color: '#000080'). Axis labels are moved near the end of the axis and
498
- written in 'bold'.
499
-
500
- ```{ruby coorp_theme}
501
- module CorpTheme
502
-
503
- R.install_and_loads 'RColorBrewer'
504
-
505
- #---------------------------------------------------------------------------------
506
- # face can be (1=plain, 2=bold, 3=italic, 4=bold-italic)
507
- #---------------------------------------------------------------------------------
508
-
509
- def self.text_element(size, face: "plain", hjust: nil)
510
- E.element_text(color: "#000080",
511
- face: face,
512
- size: size,
513
- hjust: hjust)
514
- end
515
-
516
- #---------------------------------------------------------------------------------
517
- # Defines the plot theme (visualization). In this theme we remove major and minor
518
- # grids, borders and background. We also turn-off scientific notation.
519
- #---------------------------------------------------------------------------------
520
-
521
- def self.global_theme(faceted: false)
522
-
523
- R.options(scipen: 999) # turn-off scientific notation like 1e+48
524
- # R.theme_set(R.theme_bw)
525
-
526
- # remove major grids
527
- gb = R.theme(panel__grid__major: E.element_blank())
528
- # remove minor grids
529
- gb = gb + R.theme(panel__grid__minor: E.element_blank)
530
- # gb = R.theme(panel__grid__minor: E.element_blank)
531
- # remove border
532
- gb = gb + R.theme(panel__border: E.element_blank)
533
- # remove background. When working with faceted graphs, the background makes
534
- # it easier to see each facet, so leave it
535
- gb = gb + R.theme(panel__background: E.element_blank) if !faceted
536
- # Change axis font
537
- gb = gb + R.theme(axis__text: text_element(8))
538
- # change axis title font
539
- gb = gb + R.theme(axis__title: text_element(10, face: "bold", hjust: 1))
540
- # change font of title
541
- gb = gb + R.theme(title: text_element(12, face: "bold"))
542
- # change font of subtitle
543
- gb = gb + R.theme(plot__subtitle: text_element(9))
544
- # change font of captions
545
- gb = gb + R.theme(plot__caption: text_element(8))
546
-
547
- end
548
-
549
- end
550
- ```
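
The theme is invoked later with the keyword form `global_theme(faceted: true)`. In Ruby, such a call only sets the flag itself when the method declares a keyword parameter; with a positional parameter the whole hash is received, and a hash is always truthy. A small standalone sketch of the difference (the method names `positional` and `keyword` are hypothetical and not part of Galaaz):

```ruby
# Positional parameter with a default value.
def positional(faceted = false)
  faceted
end

# Keyword parameter with a default value.
def keyword(faceted: false)
  faceted
end

# Called with keyword syntax, the positional version receives a Hash,
# which is truthy even when the value inside it is false.
p positional(faceted: false)
# The keyword version receives the boolean itself.
p keyword(faceted: false)
```

With a keyword parameter, a test such as `if !faceted` behaves as expected for both `true` and `false`.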
551
-
552
- ## Final Box Plot
553
-
554
- We can now easily make our final boxplot and violin plot. All the layers for the plot were
555
- added in order to expose our understanding of the data and the need to present the result
556
- to our audience.
557
-
558
- The final specification is just the addition of all layers built up to this point (@bp), plus
559
- the decorations (@decorations), plus the corporate theme.
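
The way a specification grows by addition can be modeled with a toy class in plain Ruby. This is a sketch of the idea only, not Galaaz's or ggplot2's implementation; the class `PlotSpec` and the layer names are made up for the example.

```ruby
# Toy model of a plot specification as an ordered list of layer names.
class PlotSpec
  attr_reader :layers

  def initialize(layers = [])
    @layers = layers.freeze
  end

  # Adding two specs returns a new spec; the operands are left untouched.
  def +(other)
    PlotSpec.new(layers + other.layers)
  end

  def to_s
    layers.join(" + ")
  end
end

bp          = PlotSpec.new(["boxplot", "facet_grid"])
decorations = PlotSpec.new(["labs"])
theme       = PlotSpec.new(["corp_theme"])

final = bp + decorations + theme
puts final # boxplot + facet_grid + labs + corp_theme
puts bp    # boxplot + facet_grid
```

Because `+` returns a new object, a base specification can be reused for several variations, which is how the same base is combined with different layers throughout this article.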
560
-
561
- Here is our final boxplot, without jitter.
562
-
563
- ```{ruby final_box_plot}
564
- puts @bp + @decorations + CorpTheme.global_theme(faceted: true)
565
- ```
566
-
567
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_box_plot.png)
568
-
569
- And here is the final violin plot, with jitter and the same look and feel of the corporate
570
- boxplot.
571
-
572
- ```{ruby final_violin_plot}
573
- puts @violin + @decorations + CorpTheme.global_theme(faceted: true)
574
- ```
575
-
576
-
577
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/225058450f4e69e5e82a01e22f69725554746893/final_violin_plot.png)
578
-
579
- ## Another View
580
-
581
- We now make another plot, with the same look and feel as before but faceted by
582
- dose and not by supplement. This shows how easy it is to create new plots by just
583
- changing a small statement in the _grammar of graphics_.
584
-
585
- ```{ruby facet_by_dose}
586
- caption = <<-EOT
587
- Length of odontoblasts in 60 guinea pigs.
588
- Each animal received one of three dose levels of vitamin C.
589
- EOT
590
-
591
- @bp = @tooth_growth.ggplot(E.aes(x: :supp, y: :len, group: :supp)) +
592
- R.geom_boxplot(E.aes(fill: :supp)) + R.facet_grid(+:all =~ +:dose) +
593
- R.scale_fill_manual(values: R.c("cyan", "deepskyblue4")) +
594
- R.labs(title: "Tooth Growth: Length by Dose",
595
- subtitle: "Faceted by dose",
596
- x: "Delivery method", y: "Teeth length",
597
- caption: caption) +
598
- CorpTheme.global_theme(faceted: true)
599
- puts @bp
600
- ```
601
-
602
- ![](https://gist.githubusercontent.com/rbotafogo/5538d6c679a59f4d56179b2c030e8d28/raw/96db2729e02ced0f9336216d87d14af141c1e81b/facet_by_dose.png)
603
-
604
- # Conclusion
605
-
606
- In this article, we introduce Galaaz and show how to tightly couple Ruby and R
607
- in a way that Ruby developers do not need to be aware
608
- of the executing R engine. For the Ruby developer the existence of R
609
- is of no consequence: she is just coding in Ruby. On the other hand, for the R
610
- developer, migration to Ruby is a matter of small syntactic changes with a very gentle
611
- learning curve. As the R developer becomes more proficient in Ruby, he can start using
612
- 'classes', 'modules', 'procs', 'lambdas'.
613
-
614
- Trying to bring to Ruby the power of R starting from scratch is an enormous endeavour
615
- and would probably never be accomplished. Today's data scientists would certainly
616
- stick with either Python or R. Now, both the Ruby and R communities can benefit
617
- from this marriage, provided by Galaaz on top of GraalVM and Truffle's
618
- polyglot environment. We presented
619
- the process to couple Ruby and R, but this process can also be done to couple Ruby
620
- and JavaScript or Ruby and Python. In a polyglot world a *uniglot* language might
621
- be extremely relevant.
622
-
623
- From the perspective of performance, GraalVM and Truffle promise improvements that could
624
- reach over 10 times, both for [FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
625
- and for [TruffleRuby](https://rubykaigi.org/2018/presentations/eregontp.html).
626
-
627
- This article has shown how to improve a plot step-by-step. Starting from a very simple
628
- boxplot with all default configurations, we moved slowly to our final plot. The important
629
- point here is not if the final plot is actually beautiful (as beauty is in the eye of
630
- the beholder), but that there is a process of small-step improvements that can be followed
631
- to get a final plot ready for presentation.
632
-
633
- Finally, this whole article was written in rmarkdown and compiled to HTML by _gknit_, an
634
- application that wraps _knitr_ and allows documenting Ruby code. This application can
635
- be of great help for any Rubyist trying to write articles, blogs or documentation for Ruby.
636
-
637
- # Installing Galaaz
638
-
639
- ## Prerequisites
640
-
641
- * GraalVM (>= rc8): https://github.com/oracle/graal/releases
642
- * TruffleRuby
643
- * FastR
644
-
645
- The following R packages will be automatically installed when necessary, but could be installed prior
646
- to using gKnit if desired:
647
-
648
- * ggplot2
649
- * gridExtra
650
- * knitr
651
-
652
- Installation of R packages requires a development environment and can be time consuming. In Linux,
653
- the gnu compiler and tools should be enough. I am not sure what is needed on the Mac.
654
-
655
- ## Preparation
656
-
657
- * gem install galaaz
658
-
659
- ## Usage
660
-
661
- * gknit <filename>
662
- * In a script add: require 'galaaz'