galaaz 0.4.9 → 0.4.10

Files changed (76)
  1. checksums.yaml +4 -4
  2. data/README.md +798 -285
  3. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +3 -12
  4. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +5 -7
  5. data/blogs/galaaz_ggplot/galaaz_ggplot.html +69 -29
  6. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  7. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
  8. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
  9. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/midwest_rb.pdf +0 -0
  10. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/scatter_plot_rb.pdf +0 -0
  11. data/blogs/galaaz_ggplot/midwest.Rmd +1 -9
  12. data/blogs/gknit/gknit.Rmd +37 -40
  13. data/blogs/gknit/gknit.html +32 -30
  14. data/blogs/gknit/gknit.md +36 -37
  15. data/blogs/gknit/gknit.pdf +0 -0
  16. data/blogs/gknit/gknit.tex +35 -37
  17. data/blogs/manual/manual.Rmd +548 -125
  18. data/blogs/manual/manual.html +509 -286
  19. data/blogs/manual/manual.md +798 -285
  20. data/blogs/manual/manual.pdf +0 -0
  21. data/blogs/manual/manual.tex +2816 -0
  22. data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
  23. data/blogs/nse_dplyr/nse_dplyr.Rmd +240 -74
  24. data/blogs/nse_dplyr/nse_dplyr.html +191 -87
  25. data/blogs/nse_dplyr/nse_dplyr.md +361 -107
  26. data/blogs/nse_dplyr/nse_dplyr.pdf +0 -0
  27. data/blogs/nse_dplyr/nse_dplyr.tex +1373 -0
  28. data/blogs/ruby_plot/ruby_plot.Rmd +61 -81
  29. data/blogs/ruby_plot/ruby_plot.html +54 -57
  30. data/blogs/ruby_plot/ruby_plot.md +48 -67
  31. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  42. data/blogs/ruby_plot/ruby_plot_files/figure-latex/dose_len.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_delivery.png +0 -0
  44. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_dose.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color.png +0 -0
  46. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color2.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_decorations.png +0 -0
  48. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_jitter.png +0 -0
  49. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_points.png +0 -0
  50. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_box_plot.png +0 -0
  51. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_violin_plot.png +0 -0
  52. data/blogs/ruby_plot/ruby_plot_files/figure-latex/violin_with_jitter.png +0 -0
  53. data/lib/R_interface/rdata_frame.rb +0 -12
  54. data/lib/R_interface/robject.rb +14 -14
  55. data/lib/R_interface/ruby_extensions.rb +3 -31
  56. data/lib/R_interface/rvector.rb +0 -12
  57. data/lib/gknit/knitr_engine.rb +5 -3
  58. data/lib/util/exec_ruby.rb +22 -61
  59. data/specs/tmp.rb +26 -12
  60. data/version.rb +1 -1
  61. metadata +22 -17
  62. data/bin/gknit_old_r +0 -236
  63. data/blogs/dev/dev.Rmd +0 -23
  64. data/blogs/dev/dev.md +0 -58
  65. data/blogs/dev/dev2.Rmd +0 -65
  66. data/blogs/dev/model.rb +0 -41
  67. data/blogs/dplyr/dplyr.Rmd +0 -29
  68. data/blogs/dplyr/dplyr.html +0 -433
  69. data/blogs/dplyr/dplyr.md +0 -58
  70. data/blogs/dplyr/dplyr.rb +0 -63
  71. data/blogs/galaaz_ggplot/galaaz_ggplot.log +0 -640
  72. data/blogs/galaaz_ggplot/galaaz_ggplot.md +0 -431
  73. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +0 -481
  74. data/blogs/galaaz_ggplot/midwest.png +0 -0
  75. data/blogs/galaaz_ggplot/scatter_plot.png +0 -0
  76. data/blogs/ruby_plot/ruby_plot.tex +0 -1077
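
The last entry, `ruby_plot.tex`, is the deleted LaTeX build of the "How to make Beautiful Ruby Plots with Galaaz" post shown in the diff that follows. Its gKnit section states a scoping rule:

- each Ruby chunk runs in its own local scope;
- all chunks share the instance variables of a common `chunk' object.

The snippet below is a minimal plain-Ruby sketch of that rule. It assumes nothing about gKnit's real implementation. `ChunkContext` and `run` are hypothetical names, and each chunk's source is evaluated with `Kernel#eval` in a fresh `binding` whose `self` is the shared context object.

```ruby
# Hypothetical sketch of the chunk-scoping rule described for gKnit.
# ChunkContext is an illustrative name, not part of gKnit or Galaaz.
class ChunkContext
  # Evaluate one chunk's source. Every call gets a fresh binding, so
  # local variables die with the chunk, while self stays the same object,
  # so instance variables persist from one chunk to the next.
  def run(source)
    eval(source, fresh_scope)
  end

  private

  # A binding created inside this method has no local variables of its
  # own, and its self is this ChunkContext instance.
  def fresh_scope
    binding
  end
end

context = ChunkContext.new

# "Chunk 1": defines a local variable and an instance variable.
context.run("n = 3; @len = [4.2, 11.5, 7.3]")

# "Chunk 2": the instance variable is still visible...
p context.run("@len.size")     # prints 3
# ...but the local variable from chunk 1 is not.
p context.run("defined?(n)")   # prints nil
```

Creating a fresh binding per chunk is what keeps locals from leaking. Reusing a single binding object would let `n` survive into later chunks, because locals defined by `eval` persist in the binding they were evaluated in.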
@@ -1,1077 +0,0 @@
1
- \documentclass[11pt,]{article}
2
- \usepackage{lmodern}
3
- \usepackage{amssymb,amsmath}
4
- \usepackage{ifxetex,ifluatex}
5
- \usepackage{fixltx2e} % provides \textsubscript
6
- \ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
7
- \usepackage[T1]{fontenc}
8
- \usepackage[utf8]{inputenc}
9
- \else % if luatex or xelatex
10
- \ifxetex
11
- \usepackage{mathspec}
12
- \else
13
- \usepackage{fontspec}
14
- \fi
15
- \defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
16
- \fi
17
- % use upquote if available, for straight quotes in verbatim environments
18
- \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
19
- % use microtype if available
20
- \IfFileExists{microtype.sty}{%
21
- \usepackage{microtype}
22
- \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
23
- }{}
24
- \usepackage[margin=1in]{geometry}
25
- \usepackage{hyperref}
26
- \hypersetup{unicode=true,
27
- pdftitle={How to make Beautiful Ruby Plots with Galaaz},
28
- pdfauthor={Rodrigo Botafogo; Daniel Mossé; University of Pittsburgh},
29
- pdfborder={0 0 0},
30
- breaklinks=true}
31
- \urlstyle{same} % don't use monospace font for urls
32
- \usepackage{color}
33
- \usepackage{fancyvrb}
34
- \newcommand{\VerbBar}{|}
35
- \newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
36
- \DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
37
- % Add ',fontsize=\small' for more characters per line
38
- \usepackage{framed}
39
- \definecolor{shadecolor}{RGB}{248,248,248}
40
- \newenvironment{Shaded}{\begin{snugshade}}{\end{snugshade}}
41
- \newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
42
- \newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{#1}}
43
- \newcommand{\DecValTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
44
- \newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
45
- \newcommand{\FloatTok}[1]{\textcolor[rgb]{0.00,0.00,0.81}{#1}}
46
- \newcommand{\ConstantTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
47
- \newcommand{\CharTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
48
- \newcommand{\SpecialCharTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
49
- \newcommand{\StringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
50
- \newcommand{\VerbatimStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
51
- \newcommand{\SpecialStringTok}[1]{\textcolor[rgb]{0.31,0.60,0.02}{#1}}
52
- \newcommand{\ImportTok}[1]{#1}
53
- \newcommand{\CommentTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
54
- \newcommand{\DocumentationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
55
- \newcommand{\AnnotationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
56
- \newcommand{\CommentVarTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
57
- \newcommand{\OtherTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{#1}}
58
- \newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
59
- \newcommand{\VariableTok}[1]{\textcolor[rgb]{0.00,0.00,0.00}{#1}}
60
- \newcommand{\ControlFlowTok}[1]{\textcolor[rgb]{0.13,0.29,0.53}{\textbf{#1}}}
61
- \newcommand{\OperatorTok}[1]{\textcolor[rgb]{0.81,0.36,0.00}{\textbf{#1}}}
62
- \newcommand{\BuiltInTok}[1]{#1}
63
- \newcommand{\ExtensionTok}[1]{#1}
64
- \newcommand{\PreprocessorTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textit{#1}}}
65
- \newcommand{\AttributeTok}[1]{\textcolor[rgb]{0.77,0.63,0.00}{#1}}
66
- \newcommand{\RegionMarkerTok}[1]{#1}
67
- \newcommand{\InformationTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
68
- \newcommand{\WarningTok}[1]{\textcolor[rgb]{0.56,0.35,0.01}{\textbf{\textit{#1}}}}
69
- \newcommand{\AlertTok}[1]{\textcolor[rgb]{0.94,0.16,0.16}{#1}}
70
- \newcommand{\ErrorTok}[1]{\textcolor[rgb]{0.64,0.00,0.00}{\textbf{#1}}}
71
- \newcommand{\NormalTok}[1]{#1}
72
- \usepackage{graphicx,grffile}
73
- \makeatletter
74
- \def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
75
- \def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
76
- \makeatother
77
- % Scale images if necessary, so that they will not overflow the page
78
- % margins by default, and it is still possible to overwrite the defaults
79
- % using explicit options in \includegraphics[width, height, ...]{}
80
- \setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
81
- \IfFileExists{parskip.sty}{%
82
- \usepackage{parskip}
83
- }{% else
84
- \setlength{\parindent}{0pt}
85
- \setlength{\parskip}{6pt plus 2pt minus 1pt}
86
- }
87
- \setlength{\emergencystretch}{3em} % prevent overfull lines
88
- \providecommand{\tightlist}{%
89
- \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
90
- \setcounter{secnumdepth}{5}
91
- % Redefines (sub)paragraphs to behave more like sections
92
- \ifx\paragraph\undefined\else
93
- \let\oldparagraph\paragraph
94
- \renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
95
- \fi
96
- \ifx\subparagraph\undefined\else
97
- \let\oldsubparagraph\subparagraph
98
- \renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
99
- \fi
100
-
101
- %%% Use protect on footnotes to avoid problems with footnotes in titles
102
- \let\rmarkdownfootnote\footnote%
103
- \def\footnote{\protect\rmarkdownfootnote}
104
-
105
- %%% Change title format to be more compact
106
- \usepackage{titling}
107
-
108
- % Create subtitle command for use in maketitle
109
- \newcommand{\subtitle}[1]{
110
- \posttitle{
111
- \begin{center}\large#1\end{center}
112
- }
113
- }
114
-
115
- \setlength{\droptitle}{-2em}
116
-
117
- \title{How to make Beautiful Ruby Plots with Galaaz}
118
- \pretitle{\vspace{\droptitle}\centering\huge}
119
- \posttitle{\par}
120
- \author{Rodrigo Botafogo \\ Daniel Mossé \\ University of Pittsburgh}
121
- \preauthor{\centering\large\emph}
122
- \postauthor{\par}
123
- \predate{\centering\large\emph}
124
- \postdate{\par}
125
- \date{November 19th, 2018}
126
-
127
- % use Brazilian Portuguese
128
- % \usepackage[brazilian]{babel}
129
- \usepackage[utf8]{inputenc}
130
-
131
- \usepackage{geometry}
132
- \geometry{a4paper, top=1in}
133
-
134
- % needed for kableExtra
135
- \usepackage{longtable}
136
- \usepackage{multirow}
137
- \usepackage[table]{xcolor}
138
- \usepackage{wrapfig}
139
- \usepackage{float}
140
- \usepackage{colortbl}
141
- \usepackage{pdflscape}
142
- \usepackage{tabu}
143
- \usepackage{threeparttable}
144
- \usepackage[normalem]{ulem}
145
-
146
- \usepackage{bbm}
147
- \usepackage{booktabs}
148
- \usepackage{expex}
149
-
150
- \usepackage{graphicx}
151
-
152
- \usepackage{fancyhdr}
153
- % set the header and foot style
154
- % style 'fancy' adds the section name on the header
155
- % and the page number on the footer
156
- \pagestyle{fancy}
157
-
158
- % style 'fancyhf' leaves header and footer empty
159
- %\fancyhf{}
160
-
161
- % sets the left head element to \rightmark, which contains the
162
- % current section (\leftmark is the current chapter)
163
- %\fancyhead[L]{\rightmark} .
164
-
165
- % sets the right head element to the page number.
166
- % \fancyhead[R]{\thepage}
167
-
168
- % lets the head rule disappear.
169
- % \renewcommand{\headrulewidth}{0pt}
170
- % Possible selectors for the optional argument of \fancyhead/\fancyfoot
171
- % are L (left), C (center) or R (right) for the position of the element
172
- % and E (even) or O (odd) to distinguish even and odd pages. If you omit
173
- % E/O the element is set for all pages.
174
-
175
- % \usepackage{lipsum}
176
-
177
- % make available command lastpage
178
- \usepackage{lastpage}
179
-
180
- % default fontsize 11pt better to add
181
- % fontsize on the yaml header
182
- % \usepackage[fontsize=11pt]{scrextend}
183
-
184
- % commands to format a table
185
- \usepackage{array}
186
- \newcolumntype{L}[1]{>{\raggedright\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
187
- \newcolumntype{C}[1]{>{\centering\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
188
- \newcolumntype{R}[1]{>{\raggedleft\let\newline\\\arraybackslash\hspace{0pt}}m{#1}}
189
-
190
- % needed if we import other LaTeX documents
191
- \usepackage{import}
192
-
193
- % Command to import an R variable to latex
194
- \newcommand{\RtoLatex}[2]{\newcommand{#1}{#2}}
195
-
196
- %
197
- %\newcommand{\atraso}[1]{\color{red} \textbf {Time Since Contract Signing: #1 days}}
198
-
199
- \begin{document}
200
- \maketitle
201
-
202
- {
203
- \setcounter{tocdepth}{2}
204
- \tableofcontents
205
- }
206
- \section{Introduction}\label{introduction}
207
-
208
- According to Wikipedia ``Ruby is a dynamic, interpreted, reflective,
209
- object-oriented, general-purpose programming language. It was designed
210
- and developed in the mid-1990s by Yukihiro `Matz' Matsumoto in Japan.''
211
- It reached high popularity with the development of Ruby on Rails (RoR)
212
- by David Heinemeier Hansson. RoR is a web application framework first
213
- released around 2005. It makes extensive use of Ruby's metaprogramming
214
- features. With RoR, Ruby became very popular. According to
215
- \href{https://www.tiobe.com/tiobe-index/ruby/}{Ruby's Tiobe index} it
216
- peaked in popularity around 2008. Then its popularity declined until
217
- 2015, when it started picking up again. At the time of this writing
218
- (November 2018), the Tiobe index puts Ruby in 16th position.
219
-
220
- Python, a similar language to Ruby, ranks 4th in the index. Java, C and
221
- C++ take the first three positions. Ruby is often criticized for its
222
- focus on web applications. But Ruby can do
223
- \href{https://github.com/markets/awesome-ruby}{much more} than just web
224
- applications. Yet, for scientific computing, Ruby lags way behind Python
225
- and R. Python has the Django framework for the web, NumPy for numerical
226
- arrays and Pandas for data analysis. R is a free software environment for
227
- statistical computing and graphics with thousands of libraries for data
228
- analysis.
229
-
230
- Until recently, there was no real prospect of Ruby bridging this
231
- gap. Implementing a complete scientific computing infrastructure would
232
- take too long. Enter GraalVM:
233
-
234
- \begin{quote}
235
- GraalVM is a universal virtual machine for running applications written
236
- in JavaScript, Python 3, Ruby, R, JVM-based languages like Java, Scala,
237
- Kotlin, and LLVM-based languages such as C and C++.
238
-
239
- GraalVM removes the isolation between programming languages and enables
240
- interoperability in a shared runtime. It can run either standalone or in
241
- the context of OpenJDK, Node.js, Oracle Database, or MySQL.
242
-
243
- GraalVM allows you to write polyglot applications with a seamless way to
244
- pass values from one language to another. With GraalVM there is no
245
- copying or marshaling necessary as it is with other polyglot systems.
246
- This lets you achieve high performance when language boundaries are
247
- crossed. Most of the time there is no additional cost for crossing a
248
- language boundary at all.
249
-
250
- Often developers have to make uncomfortable compromises that require
251
- them to rewrite their software in other languages. For example:
252
-
253
- \begin{itemize}
254
- \tightlist
255
- \item
256
- That library is not available in my language. I need to rewrite it.
257
- \item
258
- That language would be the perfect fit for my problem, but we cannot
259
- run it in our environment.
260
- \item
261
- That problem is already solved in my language, but the language is too
262
- slow.
263
- \end{itemize}
264
-
265
- With GraalVM we aim to allow developers to freely choose the right
266
- language for the task at hand without making compromises.
267
- \end{quote}
268
-
269
- As stated above, GraalVM is a \emph{universal} virtual machine that
270
- allows Ruby and R (and other languages) to run in the same environment.
271
- GraalVM allows polyglot applications to \emph{seamlessly} interact with
272
- one another and pass values from one language to the other. Galaaz, a
273
- gem for Ruby, intends to tightly couple Ruby and R and allow those
274
- languages to interact in a way that the user will be unaware of such
275
- interaction.
276
-
277
- Library wrapping is a common way of bringing features from one language
278
- into another. To improve performance, Python often wraps more efficient
279
- C libraries. For the Python developer, the existence of such C libraries
280
- is of no concern. The problem with library wrapping is that for any new
281
- library, there is the need to handcraft a new wrapper.
282
-
283
- Galaaz, instead of wrapping a single C or R library, wraps the whole of
284
- the R language in Ruby. In doing so, all of R's thousands of libraries become
285
- available to Ruby developers. Also, any new library developed in R will
286
- be available without a new wrapping effort.
287
-
288
- This article shows how Ruby can use R's ggplot2 library transparently
289
- and bring to Ruby the power of high-quality scientific plotting. It also
290
- shows that migrating from R to Ruby with Galaaz is a matter of small
291
- syntactic changes. Using Ruby, the R developer can use all of Ruby's
292
- powerful OO features. It also becomes much easier to move code from the
293
- analysis phase to the production phase.
294
-
295
- In this article we will explore the R ToothGrowth dataset. In doing so,
296
- we will create some boxplots. A primer on boxplots is available in
297
- \href{https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51}{this
298
- article}.
299
-
300
- We will also create a Corporate Template ensuring that plots will have a
301
- consistent visualization. This template is built using a Ruby module.
302
- A ggplot theme could achieve the same result as the
303
- Ruby module. Yet, writing a new theme requires specific knowledge. Ruby
304
- modules are standard to the language and don't need special knowledge.
305
-
306
- In
307
- \href{https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021}{this
308
- blog post} we also show a scatter plot made in Ruby with Galaaz.
309
-
310
- \section{gKnit}\label{gknit}
311
-
312
- \emph{Knitr} is an application that converts text written in rmarkdown
313
- to many different output formats. For instance, a writer can convert an
314
- rmarkdown document to HTML, \LaTeX{}, docx and many other formats.
315
- Rmarkdown documents can contain text and \emph{code chunks}. Knitr
316
- formats code chunks in a grayed box in the output document. It also
317
- executes the code chunks and formats the output in a white box. Every
318
- line of output from the executed code is preceded by `\#\#'.
319
-
320
- Knitr allows code chunks to be in R, Python, Ruby and dozens of other
321
- languages. Yet, while R and Python chunks can share data, in other
322
- languages, chunks are independent. This means that a variable defined in
323
- one chunk cannot be used in another chunk.
324
-
325
- With \emph{gKnit}, Ruby code chunks can share data. In gKnit, each Ruby
326
- chunk executes in its own scope and thus local variables defined in a
327
- chunk are not accessible by other chunks. Yet, all chunks execute in the
328
- scope of a `chunk' class, so instance variables (`@') are available in
329
- all chunks.
330
-
331
- \section{Exploring the Dataset}\label{exploring-the-dataset}
332
-
333
- Let's start by exploring our selected dataset. ToothGrowth is an R
334
- dataset. A dataset is like an Excel spreadsheet, but each
335
- column holds only one type of data. For instance, one column can hold
336
- floats, another integers, and a third strings. This dataset records the
337
- length of odontoblasts (cells responsible for tooth growth) in 60 guinea
338
- pigs, where each animal received one of three dose levels of Vitamin C
339
- (0.5, 1, and 2 mg/day) by one of two delivery methods, orange juice (OJ)
340
- or ascorbic acid (a form of vitamin C and coded as VC).
341
-
342
- The ToothGrowth dataset contains three columns: `len', `supp' and
343
- `dose'. Let's take a look at a few rows of this dataset. In Galaaz, to
344
- access an R variable we use the corresponding Ruby symbol
345
- preceded by the tilde (`\textasciitilde{}') operator. Note in the
346
- following chunk that Ruby's `@tooth\_growth' is assigned the value of
347
- `\textasciitilde{}:ToothGrowth'. `ToothGrowth' is the R variable
348
- containing the dataset of interest.
349
-
350
- \begin{Shaded}
351
- \begin{Highlighting}[]
352
- \CommentTok{# Read the R ToothGrowth variable and assign it to the}
353
- \CommentTok{# Ruby instance variable @tooth_growth that will be }
354
- \CommentTok{# available to all Ruby chunks in this document.}
355
- \OtherTok{@tooth_growth}\NormalTok{ = ~}\StringTok{:ToothGrowth}
356
- \CommentTok{# print the first few elements of the dataset}
357
- \NormalTok{puts }\OtherTok{@tooth_growth}\NormalTok{.head}
358
- \end{Highlighting}
359
- \end{Shaded}
360
-
361
- \begin{verbatim}
362
- ## len supp dose
363
- ## 1 4.2 VC 0.5
364
- ## 2 11.5 VC 0.5
365
- ## 3 7.3 VC 0.5
366
- ## 4 5.8 VC 0.5
367
- ## 5 6.4 VC 0.5
368
- ## 6 10.0 VC 0.5
369
- \end{verbatim}
370
-
371
- Great! We've managed to read the ToothGrowth dataset and take a look at
372
- its elements. We see here the first 6 rows of the dataset. To access a
373
- column, follow the dataset name with a dot (`.') and the name of the
374
- column. Dot notation is also used to chain methods in the usual Ruby style.
375
-
376
- \begin{Shaded}
377
- \begin{Highlighting}[]
378
- \CommentTok{# Access the tooth_growth 'len' column and print the first few}
379
- \CommentTok{# elements of this column with the 'head' method.}
380
- \NormalTok{puts }\OtherTok{@tooth_growth}\NormalTok{.len.head}
381
- \end{Highlighting}
382
- \end{Shaded}
383
-
384
- \begin{verbatim}
385
- ## [1] 4.2 11.5 7.3 5.8 6.4 10.0
386
- \end{verbatim}
387
-
388
- The `dose' column contains numeric values: 0.5, 1 or 2.
389
- Although those are numbers, they are better interpreted as a
390
- \href{https://swcarpentry.github.io/r-novice-inflammation/12-supp-factors/}{factor
391
- or category}. So, let's convert our `dose' column from numeric to
392
- `factor'. In R, the function `as.factor' is used to convert data in a
393
- vector to factors. To use this function from Galaaz, the dot (`.') in the
394
- function name is replaced by `\_\_' (double underscore). The function
395
- `as.factor' becomes `R.as\_\_factor', or just `as\_\_factor' when
396
- chaining.
397
-
398
- \begin{Shaded}
399
- \begin{Highlighting}[]
400
- \CommentTok{# convert the dose to a factor}
401
- \OtherTok{@tooth_growth}\NormalTok{.dose = }\OtherTok{@tooth_growth}\NormalTok{.dose.as__factor}
402
- \end{Highlighting}
403
- \end{Shaded}
404
-
405
- Let's explore some more details of this dataset. In particular, let's
406
- look at its dimensions, structure and summary statistics.
407
-
408
- \begin{Shaded}
409
- \begin{Highlighting}[]
410
- \NormalTok{puts }\OtherTok{@tooth_growth}\NormalTok{.dim}
411
- \end{Highlighting}
412
- \end{Shaded}
413
-
414
- \begin{verbatim}
415
- ## [1] 60 3
416
- \end{verbatim}
417
-
418
- This dataset has 60 rows, one for each subject, and 3 columns, as we have
419
- already seen.
420
-
421
- Note that we do not call `puts' when using the `str' function. This
422
- function does not return anything; it prints the structure of the
423
- dataset as a side effect.
424
-
425
- \begin{Shaded}
426
- \begin{Highlighting}[]
427
- \OtherTok{@tooth_growth}\NormalTok{.str}
428
- \end{Highlighting}
429
- \end{Shaded}
430
-
431
- \begin{verbatim}
432
- ## 'data.frame': 60 obs. of 3 variables:
433
- ## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
434
- ## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
435
- ## $ dose: Factor w/ 3 levels "0.5","1","2": 1 1 1 1 1 1 1 1 1 1 ...
436
- \end{verbatim}
437
-
438
- Observe that both variables `supp' and `dose' are now factors. Variable
439
- `supp' was already a factor in the original dataset; it has two levels,
440
- OJ and VC.
441
-
442
- Finally, using the summary method, we get the statistical summary for
443
- the dataset:
444
-
445
- \begin{Shaded}
446
- \begin{Highlighting}[]
447
- \NormalTok{puts }\OtherTok{@tooth_growth}\NormalTok{.summary}
448
- \end{Highlighting}
449
- \end{Shaded}
450
-
451
- \begin{verbatim}
452
- ## len supp dose
453
- ## Min. : 4.20 OJ:30 0.5:20
454
- ## 1st Qu.:13.07 VC:30 1 :20
455
- ## Median :19.25 2 :20
456
- ## Mean :18.81
457
- ## 3rd Qu.:25.27
458
- ## Max. :33.90
459
- \end{verbatim}
460
-
461
- \section{Doing the Data Analysis}\label{doing-the-data-analysis}
462
-
463
- \subsection{Quick plot for seeing the
464
- data}\label{quick-plot-for-seing-the-data}
465
-
466
- Let's now create our first plot with the given data by accessing ggplot2
467
- from Ruby. For Rubyists who have never seen or used ggplot2, here is the
468
- description of ggplot found on its home page:
469
-
470
- \begin{quote}
471
- ``ggplot2 is a system for declaratively creating graphics, based on
472
- \emph{The Grammar of Graphics}. You provide the data, tell ggplot2 how
473
- to map variables to aesthetics, what graphical primitives to use, and it
474
- takes care of the details.''
475
- \end{quote}
476
-
477
- This description might be a bit cryptic and it is best to see it at work
478
- to understand it. Basically, in the \emph{grammar of graphics}
479
- developers add layers of components such as grid, axis, data, title,
480
- subtitle and also graphical primitives such as \emph{bar plot},
481
- \emph{box plot}, to form the final graphics.
482
-
483
- To make a plot, we apply the `ggplot' function to the dataset. In
484
- R, this would be written as
485
- \texttt{ggplot(\textless{}dataset\textgreater{},\ ...)}. In Galaaz, use
486
- either \texttt{R.ggplot(\textless{}dataset\textgreater{},\ ...)}, or
487
- \texttt{\textless{}dataset\textgreater{}.ggplot(...)}. In the graph
488
- specification below, we use the second notation, which looks more
489
- Ruby-like. The plot maps `dose' to the \(x\) axis and `len' to the
490
- \(y\) axis with the `aes' method: `E.aes(x: :dose, y: :len)'. To
491
- specify the type of plot to create, add a geom to the plot. For a
492
- boxplot, the geom is `R.geom\_boxplot'.
493
-
494
- Note also that we have a call to `R.png' before plotting and
495
- `R.dev\_\_off' after the print statement. `R.png' opens a `png' device
496
- for outputting the plot. `R.dev\_\_off' closes the device and creates
497
- the `png' file. If we do not pass a name to the `png' function, the image
498
- gets a default name of `Rplot\textless{}nnn\textgreater{}' where
499
- \textless{}nnn\textgreater{} is the number of the plot. We can then
500
- include the generated `png' file in the document by adding an rmarkdown
501
- directive.
502
-
503
- \begin{Shaded}
504
- \begin{Highlighting}[]
505
- \NormalTok{require }\StringTok{'ggplot'}
506
-
507
- \NormalTok{R.png(}\StringTok{"figures/dose_len.png"}\NormalTok{)}
508
-
509
- \NormalTok{e = }\OtherTok{@tooth_growth}\NormalTok{.ggplot(E.aes(}\StringTok{x: :dose}\NormalTok{, }\StringTok{y: :len}\NormalTok{))}
510
- \NormalTok{print e + R.geom_boxplot}
511
-
512
- \NormalTok{R.dev__off}
513
- \end{Highlighting}
514
- \end{Shaded}
515
-
516
- \begin{figure}
517
- \centering
518
- \includegraphics[width=0.70000\textwidth]{figures/dose_len.png}
519
- \caption{}
520
- \end{figure}
521
-
522
- Great! We've just managed to create and save our first plot in Ruby with
523
- only four lines of code. This plot shows a clear trend: as the
524
- dose of the supplement increases, so does the length of teeth.
525
-
526
- \subsection{Facetting the plot}\label{facetting-the-plot}
527
-
528
- This first plot shows a trend, but our data has information about two
529
- different delivery methods: orange juice (OJ) or ascorbic acid (VC). Let's
530
- then create a plot that makes the effect of each delivery method
531
- explicit. This next plot is a \emph{facetted} plot
532
- where each delivery method gets its own panel. On the left side, the plot
533
- shows the OJ delivery method. On the right side, we see the VC delivery
534
- method. To obtain this plot, we use the `R.facet\_grid' function, which
535
- automatically creates the facets based on the delivery method factors.
536
- The parameter to the `facet\_grid' method is a
537
- \href{https://thomasleeper.com/Rcourse/Tutorials/formulae.html}{\emph{formula}}.
538
-
539
- In Galaaz, formulas are written a bit differently than in R. The
540
- following changes are necessary:
541
-
542
- \begin{itemize}
543
- \tightlist
544
- \item
545
- R symbols are represented by the same Ruby symbol prefixed with the
546
- `+' method. The symbol \texttt{x} in R becomes \texttt{+:x} in Ruby;
547
- \item
548
- The `\textasciitilde{}' operator in R becomes `=\textasciitilde{}' in
549
- Ruby. The formula \texttt{x\ \textasciitilde{}\ y} in R is written as
550
- \texttt{+:x\ =\textasciitilde{}\ +:y} in Ruby;
551
- \item
552
- The `.' symbol in R becomes `+:all'.
553
- \end{itemize}
554
-
555
- Another way of writing a formula is to use the `formula' function with
556
- the actual formula as a string. The formula
557
- \texttt{x\ \textasciitilde{}\ y} in R can be written as
558
- \texttt{R.formula("x\ \textasciitilde{}\ y")}. For more complex
559
- formulas, the use of the `formula' function is preferred.
560
-
561
- The formula \texttt{+:all\ =\textasciitilde{}\ +:supp} indicates to the
562
- `facet\_grid' function that it needs to facet the plot based on the
563
- \texttt{supp} variable and split the plot vertically. Changing the
564
- formula to \texttt{+:supp\ =\textasciitilde{}\ +:all} would split the
565
- plot horizontally.
566
-
567
- \begin{Shaded}
568
- \begin{Highlighting}[]
569
- \NormalTok{R.png(}\StringTok{"figures/facet_by_delivery.png"}\NormalTok{)}
570
-
571
- \OtherTok{@base_tooth}\NormalTok{ = }\OtherTok{@tooth_growth}\NormalTok{.ggplot(E.aes(}\StringTok{x: :dose}\NormalTok{, }\StringTok{y: :len}\NormalTok{, }\StringTok{group: :dose}\NormalTok{))}
572
-
573
- \OtherTok{@bp}\NormalTok{ = }\OtherTok{@base_tooth}\NormalTok{ + R.geom_boxplot +}
574
- \CommentTok{# Split in vertical direction}
575
- \NormalTok{ R.facet_grid(+}\StringTok{:all}\NormalTok{ =~ +}\StringTok{:supp}\NormalTok{)}
576
-
577
- \NormalTok{puts }\OtherTok{@bp}
578
-
579
- \NormalTok{R.dev__off}
580
- \end{Highlighting}
581
- \end{Shaded}
582
-
583
- \begin{figure}
584
- \centering
585
- \includegraphics[width=0.70000\textwidth]{figures/facet_by_delivery.png}
586
- \caption{}
587
- \end{figure}
588
-
589
- It now becomes clear that although both methods of delivery have a
590
- direct impact on tooth growth, method (OJ) is non-linear, having a higher
591
- impact with smaller doses of vitamin C and reducing its impact as
592
- the dose increases. With the (VC) approach, the impact seems to be more
593
- linear.
594
-
595
- \subsection{Adding Color}\label{adding-color}
596
-
597
- If this paper were about data analysis, we would make a more thorough analysis
598
- of the trends and improve the statistical analysis. But we are
599
- interested in working with ggplot in Ruby, so let's add some color to
600
- this plot to make the trend and comparison more visible. In the
601
- following plot, the boxes are color-coded by dose. To add color, it is
602
- enough to add \texttt{fill:\ :dose} to the aesthetic of the boxplot. With
603
- this change, each `dose' level gets its own color.
604
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/facets_by_delivery_color.png"}\NormalTok{)}
-
- \OtherTok{@bp}\NormalTok{ = }\OtherTok{@bp}\NormalTok{ + R.geom_boxplot(E.aes(}\StringTok{fill: :dose}\NormalTok{))}
- \NormalTok{puts }\OtherTok{@bp}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/facets_by_delivery_color.png}
- \caption{}
- \end{figure}
-
- Faceting helps us compare the general trends in the (OJ) and (VC)
- delivery methods. Adding color allows us to compare specifically how
- each dosage impacts tooth growth. With the smallest dose, 0.5mg (red),
- (OJ) performs better than (VC). For 1mg (green), (OJ) is clearly
- better than (VC). For 2mg (blue), both (OJ) and (VC) have the same
- median, but (OJ) is less dispersed. From this very quick analysis, it
- seems that (OJ) is a better delivery method than (VC).
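A box spans the first to the third quartile of its group and the line inside it marks the median, so the comparison above comes down to medians and interquartile ranges. The plain-Ruby sketch below computes both using R's default quantile definition (type 7, linear interpolation between order statistics); that ggplot2's \texttt{stat\_boxplot} uses the same definition is our assumption. The numbers are made up, not taken from ToothGrowth, and the helper names are hypothetical.

```ruby
# Type-7 quantile (R's default): linear interpolation between order statistics.
def quantile(values, p)
  sorted = values.sort
  h = (sorted.size - 1) * p
  lo = h.floor
  hi = h.ceil
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

# Median (the line in the box) and interquartile range (the box height).
def box_summary(values)
  q1 = quantile(values, 0.25)
  q3 = quantile(values, 0.75)
  { median: quantile(values, 0.5), iqr: q3 - q1 }
end

# Two made-up groups with the same median but a different spread.
narrow = [24.0, 25.0, 26.0, 27.0, 28.0]
wide   = [18.0, 22.0, 26.0, 30.0, 34.0]

p box_summary(narrow)
p box_summary(wide)
```

Equal medians with a smaller interquartile range is exactly the "same median, but less dispersed" situation described for the 2mg dose.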
-
- \subsection{Clarifying the data}\label{clarifying-the-data}
-
- Boxplots give us a nice idea of the distribution of the data, but
- looking at those plots with large colored boxes leaves us wondering what
- is going on inside those boxes. According to Edward Tufte in
- \emph{Envisioning Information}:
-
- \begin{quote}
- Thin data rightly prompts suspicions: ``What are they leaving out? Is
- that really everything they know? What are they hiding? Is that all they
- did?'' Now and then it is claimed that vacant space is ``friendly''
- (anthropomorphizing an inherently murky idea) but \emph{it is not how
- much empty space there is, but rather how it is used. It is not how much
- information there is, but rather how effectively it is arranged.}
- \end{quote}
-
- And he states:
-
- \begin{quote}
- A most unconventional design strategy is revealed: \emph{to clarify, add
- detail.}
- \end{quote}
-
- Let's then use this wisdom and add yet another layer of data to our
- plot, so that we clarify it with detail and do not leave large empty
- boxes. In this next plot, we add a data point for each of the 60 guinea
- pigs in the experiment. To do that, add the function `R.geom\_point' to
- the plot.
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/facets_with_points.png"}\NormalTok{)}
-
- \CommentTok{# Add one point per observation on top of the boxes}
- \OtherTok{@bp}\NormalTok{ = }\OtherTok{@bp}\NormalTok{ + R.geom_point}
-
- \NormalTok{puts }\OtherTok{@bp}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/facets_with_points.png}
- \caption{}
- \end{figure}
-
- Now we can see the actual distribution of all 60 subjects. Actually,
- this is not entirely true: we have a hard time seeing all 60 subjects.
- Some points seem to be placed one over the other, hiding useful
- information.
-
- But no sweat! Another layer might solve the problem. In the following
- plot, a new layer called `geom\_jitter' is added to the plot. This adds
- a small amount of random noise to the position of the points, making it
- easier to see all of them and preventing data hiding. We also add color
- and change the shape of the points, making them even easier to see.
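The idea behind \texttt{geom\_jitter} can be sketched in plain Ruby: each point is shifted by uniform random noise so that coincident points separate. This is an illustration, not ggplot2's implementation, and the \texttt{jitter} helper is hypothetical. According to the ggplot2 documentation, when no width is given the noise spans 40\% of the resolution of the data in each direction; on a discrete axis, where the categories sit at positions 1, 2, 3, that is roughly \(\pm 0.4\). ggplot2 also jitters vertically by default, which this sketch leaves out.

```ruby
# Illustration of jittering (not ggplot2's code): shift every x position by
# uniform noise drawn from [-width, width] so overlapping points separate.
def jitter(xs, width:, rng: Random.new)
  xs.map { |x| x + rng.rand(-width..width) }
end

# Ten observations that all sit on the first category (position 1).
xs = Array.new(10, 1)

# A seeded generator keeps the sketch reproducible between runs.
jittered = jitter(xs, width: 0.4, rng: Random.new(42))

puts jittered.map { |x| x.round(3) }.inspect
```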
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/facets_with_jitter.png"}\NormalTok{)}
-
- \CommentTok{# Add jittered points on top of the boxes}
- \NormalTok{puts }\OtherTok{@bp}\NormalTok{ + R.geom_jitter(}\StringTok{shape: }\DecValTok{23}\NormalTok{, }\StringTok{color: "cyan3"}\NormalTok{, }\StringTok{size: }\DecValTok{1}\NormalTok{)}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/facets_with_jitter.png}
- \caption{}
- \end{figure}
-
- Now we can see all 60 points in the graph. The plot now has a much
- higher information density, and we can see the outliers and the
- distribution of the subjects.
-
- \section{Preparing the Plot for
- Presentation}\label{preparing-the-plot-for-presentation}
-
- We have come a long way since our first plot. As already mentioned, this
- is not an article about data analysis; the focus is on the integration
- of Ruby and ggplot. So, let's assume that the analysis is now done.
- Yet, finishing the analysis does not mean that the work is done. On the
- contrary, the hardest part is yet to come!
-
- After the analysis, it is necessary to communicate it by making a final
- plot for presentation. The last plot has all the information we want to
- share, but it is not very pleasing to the eye.
-
- \subsection{Improving Colors}\label{improving-colors}
-
- Let's start by trying to improve the colors. For now, we will not use
- the jitter layer. The previous plot has three bright colors that have no
- relationship between them. Is there any obvious, or non-obvious for
- that matter, interpretation for the colors? Clearly not: they are just
- the default colors selected automatically by ggplot. Although those
- colors helped us understand the data, in a final presentation arbitrary
- colors can distract the viewer.
-
- In the following plot, we use the function `scale\_fill\_manual' to
- change the colors of the boxes and the order of the labels. For colors,
- we use shades of blue for each dosage, with light blue (`cyan')
- representing the lowest dose and deep blue (`deepskyblue4') the highest
- dose. Also, the smallest value (0.5) is at the bottom of the labels and
- (2) at the top. This ordering seems more natural and matches the actual
- order of the colors in the plot.
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/facets_by_delivery_color2.png"}\NormalTok{)}
-
- \OtherTok{@bp}\NormalTok{ = }\OtherTok{@bp}\NormalTok{ +}
- \NormalTok{ R.scale_fill_manual(}\StringTok{values: }\NormalTok{R.c(}\StringTok{"cyan"}\NormalTok{, }\StringTok{"deepskyblue"}\NormalTok{, }\StringTok{"deepskyblue4"}\NormalTok{),}
- \StringTok{breaks: }\NormalTok{R.c(}\StringTok{"2"}\NormalTok{,}\StringTok{"1"}\NormalTok{,}\StringTok{"0.5"}\NormalTok{))}
-
- \NormalTok{puts }\OtherTok{@bp}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/facets_by_delivery_color2.png}
- \caption{}
- \end{figure}
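As we understand the ggplot2 documentation, unnamed \texttt{values} are assigned to the levels of the factor in level order, while \texttt{breaks} only selects which levels appear in the legend and in which order. The plain-Ruby model below illustrates that reading with a hypothetical \texttt{manual\_scale} helper; it does not call ggplot2 or Galaaz, and it assumes \texttt{dose} is a factor with levels 0.5, 1 and 2.

```ruby
# Plain-Ruby model (not galaaz or ggplot2 code) of scale_fill_manual:
# unnamed colors are matched to the factor levels in level order, and
# `breaks` only decides the order of the legend entries.
def manual_scale(levels, values, breaks)
  mapping = levels.zip(values).to_h                  # level => color
  legend  = breaks.map { |b| [b, mapping.fetch(b)] } # top-to-bottom entries
  [mapping, legend]
end

levels = ["0.5", "1", "2"]                        # dose levels, lowest first
values = ["cyan", "deepskyblue", "deepskyblue4"]  # light to deep blue
breaks = ["2", "1", "0.5"]                        # legend order, highest first

mapping, legend = manual_scale(levels, values, breaks)
p mapping
p legend
```

Keeping the color assignment (level order) separate from the legend order (breaks) is what lets the plot use light-to-deep blue for increasing doses while listing the highest dose first.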
-
- \subsection{Violin Plot and Jitter}\label{violin-plot-and-jitter}
-
- The boxplot with jitter looks a bit overwhelming. The next plot uses a
- variation of the boxplot, known as a \emph{violin plot}, together with
- jittered data.
-
- \href{https://en.wikipedia.org/wiki/Violin_plot}{From Wikipedia}:
-
- \begin{quote}
- A violin plot is a method of plotting numeric data. It is similar to a
- box plot with a rotated kernel density plot on each side.
-
- A violin plot has four layers. The outer shape represents all possible
- results, with thickness indicating how common. (Thus the thickest
- section represents the mode average.) The next layer inside represents
- the values that occur 95\% of the time. The next layer (if it exists)
- inside represents the values that occur 50\% of the time. The central
- dot represents the median average value.
- \end{quote}
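In ggplot2, the outer shape drawn by \texttt{geom\_violin} is a kernel density estimate mirrored around each group's axis. The plain-Ruby sketch below computes a Gaussian kernel density estimate \(\hat f(y) = \frac{1}{nh}\sum_{i} \phi\left(\frac{y - x_i}{h}\right)\) with Silverman's rule of thumb \(h = 0.9\,\min(\hat\sigma, \mathrm{IQR}/1.34)\,n^{-1/5}\) (R's \texttt{bw.nrd0}). That this is ggplot2's default bandwidth for violins is our assumption. The helper names and the data are hypothetical, and this is an illustration, not ggplot2's implementation.

```ruby
# Sketch of the kernel density estimate behind a violin's outline.

# Sample standard deviation.
def std_dev(xs)
  m = xs.sum.to_f / xs.size
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
end

# Type-7 quantile (R's default), used here for the interquartile range.
def quantile7(xs, p)
  s = xs.sort
  h = (s.size - 1) * p
  s[h.floor] + (h - h.floor) * (s[h.ceil] - s[h.floor])
end

# Silverman's rule of thumb, as in R's bw.nrd0 (without its zero-spread fallbacks).
def bw_nrd0(xs)
  iqr = quantile7(xs, 0.75) - quantile7(xs, 0.25)
  0.9 * [std_dev(xs), iqr / 1.34].min * xs.size**-0.2
end

# Gaussian kernel density at y: an average of normal bumps of width bw,
# one centered on each observation.
def density_at(xs, bw, y)
  scale = 1.0 / (xs.size * bw * Math.sqrt(2 * Math::PI))
  scale * xs.sum { |x| Math.exp(-0.5 * ((y - x) / bw)**2) }
end

values = [1.0, 2.0, 3.0, 4.0, 5.0] # made-up observations
bw = bw_nrd0(values)

# Half-width of the violin at a few heights: thickest where data is densest.
[1.0, 3.0, 5.0, 7.0].each do |y|
  puts format("y=%.1f  density=%.4f", y, density_at(values, bw, y))
end
```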
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/violin_with_jitter.png"}\NormalTok{)}
-
- \OtherTok{@violin}\NormalTok{ = }\OtherTok{@base_tooth}\NormalTok{ + R.geom_violin(E.aes(}\StringTok{fill: :dose}\NormalTok{)) + }
- \NormalTok{ R.facet_grid(+}\StringTok{:all}\NormalTok{ =~ +}\StringTok{:supp}\NormalTok{) +}
- \NormalTok{ R.geom_jitter(}\StringTok{shape: }\DecValTok{23}\NormalTok{, }\StringTok{color: "cyan3"}\NormalTok{, }\StringTok{size: }\DecValTok{1}\NormalTok{) +}
- \NormalTok{ R.scale_fill_manual(}\StringTok{values: }\NormalTok{R.c(}\StringTok{"cyan"}\NormalTok{, }\StringTok{"deepskyblue"}\NormalTok{, }\StringTok{"deepskyblue4"}\NormalTok{),}
- \StringTok{breaks: }\NormalTok{R.c(}\StringTok{"2"}\NormalTok{,}\StringTok{"1"}\NormalTok{,}\StringTok{"0.5"}\NormalTok{))}
-
- \NormalTok{puts }\OtherTok{@violin}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/violin_with_jitter.png}
- \caption{}
- \end{figure}
-
- This plot is an alternative to the original boxplot. For the final
- presentation, it is important to think about which graphics will be best
- understood by our audience. A violin plot is a less known plot and could
- add mental overhead; yet, in my opinion, it does look a little better
- than the boxplot and provides even more information than the boxplot
- with jitter.
-
- \subsection{Adding Decoration}\label{adding-decoration}
-
- Our final plot is starting to take shape, but a presentation plot should
- have at least a title, labels on the axes and maybe some other
- decorations. Let's start adding those. Since decorations require more
- graph area, this new plot has a `width' and `height' specification. When
- there is no specification, the default width and height are 480
- pixels.
-
- The `labs' function adds the required decorations. In this example we
- use `title', `subtitle', `x' for the \(x\) axis label, `y' for the
- \(y\) axis label, and `caption' for information about the plot.
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/facets_with_decorations.png"}\NormalTok{, }\StringTok{width: }\DecValTok{540}\NormalTok{, }\StringTok{height: }\DecValTok{560}\NormalTok{)}
-
- \NormalTok{caption = <<-}\KeywordTok{EOT}
- \OtherTok{Length of odontoblasts in 60 guinea pigs. }
- \OtherTok{Each animal received one of three dose levels of vitamin C.}
- \KeywordTok{EOT}
-
- \OtherTok{@decorations}\NormalTok{ =}
- \NormalTok{ R.labs(}\StringTok{title: "Tooth Growth: Length by Dose"}\NormalTok{,}
- \StringTok{subtitle: "Faceted by delivery method, (OJ) or (VC)"}\NormalTok{,}
- \StringTok{x: "Dose (mg)"}\NormalTok{, }\StringTok{y: "Teeth length"}\NormalTok{,}
- \StringTok{caption: }\NormalTok{caption)}
-
- \NormalTok{puts }\OtherTok{@bp}\NormalTok{ + }\OtherTok{@decorations}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/facets_with_decorations.png}
- \caption{}
- \end{figure}
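The caption above is built with a \texttt{\textless{}\textless{}-EOT} heredoc. In Ruby, the \texttt{\textless{}\textless{}-} form only allows the closing \texttt{EOT} to be indented: each body line keeps its leading whitespace, and the string ends with a newline. The squiggly heredoc \texttt{\textless{}\textless{}\textasciitilde{}} (Ruby 2.3 and later) removes the common indentation. The short comparison below is plain Ruby; whether stray indentation or the final newline shows up in a ggplot2 caption is something we have not verified.

```ruby
# Compare the two heredoc forms for a multi-line caption string.
dash_caption = <<-EOT
    Length of odontoblasts in 60 guinea pigs.
    Each animal received one of three dose levels of vitamin C.
    EOT

squiggly_caption = <<~EOT
    Length of odontoblasts in 60 guinea pigs.
    Each animal received one of three dose levels of vitamin C.
EOT

# <<- keeps the indentation; <<~ strips it. Both end with a newline,
# which chomp removes if it is not wanted in the caption.
puts dash_caption.inspect
puts squiggly_caption.chomp.inspect
```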
-
- \subsection{The Corp Theme}\label{the-corp-theme}
-
- We are almost done, but the plot does not yet look nice to the eye. We
- are still distracted by many aspects of the graph. First, the black font
- color does not look good. Then, the plot background, borders and grids
- all add clutter to the plot.
-
- We will now define our corporate theme. In this theme, we remove borders
- and grids. The background is kept for faceted plots but removed for
- non-faceted plots. Font colors are a shade of blue (navy, color
- `\#000080'). Axis titles are moved near the end of the axis and written
- in `bold'.
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \KeywordTok{module} \DataTypeTok{CorpTheme}
-
- \NormalTok{ R.install_and_loads }\StringTok{'RColorBrewer'}
-
- \CommentTok{#---------------------------------------------------------------------------------}
- \CommentTok{# face can be (1=plain, 2=bold, 3=italic, 4=bold-italic)}
- \CommentTok{#---------------------------------------------------------------------------------}
-
- \KeywordTok{def} \DecValTok{self}\NormalTok{.text_element(size, }\StringTok{face: "plain"}\NormalTok{, }\StringTok{hjust: }\DecValTok{nil}\NormalTok{)}
- \NormalTok{ E.element_text(}\StringTok{color: "#000080"}\NormalTok{, }
- \StringTok{face: }\NormalTok{face,}
- \StringTok{size: }\NormalTok{size,}
- \StringTok{hjust: }\NormalTok{hjust)}
- \KeywordTok{end}
-
- \CommentTok{#---------------------------------------------------------------------------------}
- \CommentTok{# Defines the plot theme (visualization). In this theme we remove major and minor}
- \CommentTok{# grids, borders and background. We also turn off scientific notation.}
- \CommentTok{#---------------------------------------------------------------------------------}
-
- \KeywordTok{def} \DecValTok{self}\NormalTok{.global_theme(}\StringTok{faceted: }\DecValTok{false}\NormalTok{)}
-
- \NormalTok{ R.options(}\StringTok{scipen: }\DecValTok{999}\NormalTok{) }\CommentTok{# turn off scientific notation like 1e+48}
-
- \CommentTok{# remove major grids}
- \NormalTok{ gb = R.theme(}\StringTok{panel__grid__major: }\NormalTok{E.element_blank)}
- \CommentTok{# remove minor grids}
- \NormalTok{ gb = gb + R.theme(}\StringTok{panel__grid__minor: }\NormalTok{E.element_blank)}
- \CommentTok{# remove border}
- \NormalTok{ gb = gb + R.theme(}\StringTok{panel__border: }\NormalTok{E.element_blank)}
- \CommentTok{# remove background. When working with faceted graphs, the background makes}
- \CommentTok{# it easier to see each facet, so keep it}
- \NormalTok{ gb = gb + R.theme(}\StringTok{panel__background: }\NormalTok{E.element_blank) }\KeywordTok{if}\NormalTok{ !faceted}
- \CommentTok{# change axis font}
- \NormalTok{ gb = gb + R.theme(}\StringTok{axis__text: }\NormalTok{text_element(}\DecValTok{8}\NormalTok{))}
- \CommentTok{# change axis title font}
- \NormalTok{ gb = gb + R.theme(}\StringTok{axis__title: }\NormalTok{text_element(}\DecValTok{10}\NormalTok{, }\StringTok{face: "bold"}\NormalTok{, }\StringTok{hjust: }\DecValTok{1}\NormalTok{))}
- \CommentTok{# change font of title}
- \NormalTok{ gb = gb + R.theme(}\StringTok{title: }\NormalTok{text_element(}\DecValTok{12}\NormalTok{, }\StringTok{face: "bold"}\NormalTok{))}
- \CommentTok{# change font of subtitle}
- \NormalTok{ gb = gb + R.theme(}\StringTok{plot__subtitle: }\NormalTok{text_element(}\DecValTok{9}\NormalTok{))}
- \CommentTok{# change font of captions}
- \NormalTok{ gb = gb + R.theme(}\StringTok{plot__caption: }\NormalTok{text_element(}\DecValTok{8}\NormalTok{))}
- \CommentTok{# return the accumulated theme}
- \NormalTok{ gb}
-
- \KeywordTok{end}
-
- \KeywordTok{end}
- \end{Highlighting}
- \end{Shaded}
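The final plots call \texttt{CorpTheme.global\_theme(faceted:\ true)} with a keyword argument. If the method declares a positional parameter (\texttt{faceted\ =\ false}), Ruby passes the whole hash \texttt{\{faceted:\ true\}} as that parameter instead. A hash is always truthy, so even \texttt{faceted:\ false} would keep the background. Declaring \texttt{faceted:\ false} as a keyword parameter avoids the problem. The plain-Ruby model below uses hypothetical methods that return hashes standing in for ggplot2 theme settings; no Galaaz code is involved.

```ruby
# Plain-Ruby model of the two ways to declare the `faceted` flag.
# The returned hashes only stand in for ggplot2 theme settings.

# Positional parameter: a keyword-style call arrives as a Hash.
def positional_theme(faceted = false)
  faceted ? { background: :kept } : { background: :blank }
end

# Keyword parameter: the flag arrives as a real boolean.
def keyword_theme(faceted: false)
  faceted ? { background: :kept } : { background: :blank }
end

# The caller writes `faceted: false` in both cases.
p positional_theme(faceted: false) # faceted is the Hash {faceted: false}, so truthy
p keyword_theme(faceted: false)    # faceted is false, as intended
```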
-
- \subsection{Final Box Plot}\label{final-box-plot}
-
- Here is our final boxplot, without jitter.
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/final_box_plot.png"}\NormalTok{, }\StringTok{width: }\DecValTok{540}\NormalTok{, }\StringTok{height: }\DecValTok{560}\NormalTok{)}
-
- \NormalTok{puts }\OtherTok{@bp}\NormalTok{ + }\OtherTok{@decorations}\NormalTok{ + }\DataTypeTok{CorpTheme}\NormalTok{.global_theme(}\StringTok{faceted: }\DecValTok{true}\NormalTok{)}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/final_box_plot.png}
- \caption{}
- \end{figure}
-
- \subsection{Final Violin Plot}\label{final-violin-plot}
-
- Here is the final violin plot, with jitter and the same look and feel as
- the corporate boxplot.
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/final_violin_plot.png"}\NormalTok{, }\StringTok{width: }\DecValTok{540}\NormalTok{, }\StringTok{height: }\DecValTok{560}\NormalTok{)}
-
- \NormalTok{puts }\OtherTok{@violin}\NormalTok{ + }\OtherTok{@decorations}\NormalTok{ + }\DataTypeTok{CorpTheme}\NormalTok{.global_theme(}\StringTok{faceted: }\DecValTok{true}\NormalTok{)}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/final_violin_plot.png}
- \caption{}
- \end{figure}
-
- \subsection{Another View}\label{another-view}
-
- Finally, here is one last plot, with the same look and feel as before
- but faceted by dose rather than by delivery method.
-
- \begin{Shaded}
- \begin{Highlighting}[]
- \NormalTok{R.png(}\StringTok{"figures/facet_by_dose.png"}\NormalTok{, }\StringTok{width: }\DecValTok{540}\NormalTok{, }\StringTok{height: }\DecValTok{560}\NormalTok{)}
-
- \NormalTok{caption = <<-}\KeywordTok{EOT}
- \OtherTok{Length of odontoblasts in 60 guinea pigs. }
- \OtherTok{Each animal received one of three dose levels of vitamin C.}
- \KeywordTok{EOT}
-
- \OtherTok{@bp}\NormalTok{ = }\OtherTok{@tooth_growth}\NormalTok{.ggplot(E.aes(}\StringTok{x: :supp}\NormalTok{, }\StringTok{y: :len}\NormalTok{, }\StringTok{group: :supp}\NormalTok{)) + }
- \NormalTok{ R.geom_boxplot(E.aes(}\StringTok{fill: :supp}\NormalTok{)) + R.facet_grid(+}\StringTok{:all}\NormalTok{ =~ +}\StringTok{:dose}\NormalTok{) +}
- \NormalTok{ R.scale_fill_manual(}\StringTok{values: }\NormalTok{R.c(}\StringTok{"cyan"}\NormalTok{, }\StringTok{"deepskyblue4"}\NormalTok{)) +}
- \NormalTok{ R.labs(}\StringTok{title: "Tooth Growth: Length by Dose"}\NormalTok{,}
- \StringTok{subtitle: "Faceted by dose"}\NormalTok{,}
- \StringTok{x: "Delivery method"}\NormalTok{, }\StringTok{y: "Teeth length"}\NormalTok{,}
- \StringTok{caption: }\NormalTok{caption) +}
- \DataTypeTok{CorpTheme}\NormalTok{.global_theme(}\StringTok{faceted: }\DecValTok{true}\NormalTok{)}
- \NormalTok{puts }\OtherTok{@bp}
-
- \NormalTok{R.dev__off}
- \end{Highlighting}
- \end{Shaded}
-
- \begin{figure}
- \centering
- \includegraphics[width=0.70000\textwidth]{figures/facet_by_dose.png}
- \caption{}
- \end{figure}
-
- \section{Conclusion}\label{conclusion}
-
- Galaaz tightly couples Ruby and R in such a way that Ruby developers do
- not need to be aware of the underlying R engine. For the Ruby developer,
- the existence of R is of no consequence: she is just coding in Ruby. On
- the other hand, for the R developer, migrating to Ruby is a matter of
- small syntactic changes and a very gentle learning curve. As the R
- developer becomes more proficient in Ruby, he can start using
- `classes', `modules', `procs' and `lambdas'.
-
- This coupling shows the power of the GraalVM and Truffle polyglot
- environment. Trying to bring the power of R to Ruby starting from
- scratch would be an enormous endeavour and would probably never be
- accomplished. Today's data scientists would most likely stick with
- either Python or R. Now, both the Ruby and R communities might benefit
- from this marriage. Moreover, the process used to couple Ruby and R
- could also be applied to couple Ruby and JavaScript, and maybe Ruby and
- Python. In a polyglot world, a \emph{uniglot} language might be
- extremely relevant.
-
- From the perspective of performance, GraalVM and Truffle promise
- improvements that could reach over 10 times, both for
- \href{https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb}{FastR}
- and for
- \href{https://rubykaigi.org/2018/presentations/eregontp.html}{TruffleRuby}.
-
- This article has shown how to improve a plot step by step. Starting from
- a very simple boxplot with all default configurations, we moved slowly
- to our final plot. The important point here is not whether the final
- plot is actually beautiful, but that there is a process of small,
- incremental improvements that can be followed until getting a final
- plot ready for presentation.
-
- Finally, this whole article was written in R Markdown and compiled to
- HTML by \emph{gknit}, an application that wraps \emph{knitr} and allows
- documenting Ruby code. This application can be of great help for any
- Rubyist trying to write articles, blogs or documentation for Ruby.
-
- \section{Installing Galaaz}\label{installing-galaaz}
-
- \subsection{Prerequisites}\label{prerequisites}
-
- \begin{itemize}
- \tightlist
- \item
- GraalVM (\textgreater{}= rc8):
- \url{https://github.com/oracle/graal/releases}
- \item
- TruffleRuby
- \item
- FastR
- \end{itemize}
-
- The following R packages will be automatically installed when necessary,
- but could be installed prior to using gknit if desired:
-
- \begin{itemize}
- \tightlist
- \item
- ggplot2
- \item
- gridExtra
- \item
- knitr
- \end{itemize}
-
- Installation of R packages requires a development environment and can be
- time-consuming. On Linux, the GNU compiler and tools should be enough. I
- am not sure what is needed on the Mac.
-
- \subsection{Preparation}\label{preparation}
-
- \begin{itemize}
- \tightlist
- \item
- \texttt{gem\ install\ galaaz}
- \end{itemize}
-
- \subsection{Usage}\label{usage}
-
- \begin{itemize}
- \tightlist
- \item
- gknit
- \item
- In a script, add: require `galaaz'
- \end{itemize}
-
-
- \end{document}