galaaz 0.4.8 → 0.4.9

@@ -4,15 +4,16 @@ author:
4
4
  - "Rodrigo Botafogo"
5
5
  - "Daniel Mossé - University of Pittsburgh"
6
6
  tags: [Tech, Data Science, Ruby, R, GraalVM]
7
- date: "20/02/2019"
7
+ date: "29/04/2019"
8
+ bibliography: stats.bib
8
9
  output:
9
- html_document:
10
- self_contained: true
11
- keep_md: true
12
10
  pdf_document:
13
11
  includes:
14
12
  in_header: ["../../sty/galaaz.sty"]
15
13
  number_sections: yes
14
+ html_document:
15
+ self_contained: true
16
+ keep_md: true
16
17
  ---
17
18
 
18
19
  ```{r setup, echo=FALSE}
@@ -21,7 +22,8 @@ output:
21
22
 
22
23
  # Introduction
23
24
 
24
- The idea of "literate programming" was first introduced by Donald Knuth in the 1980's.
25
+ The idea of "literate programming" was first introduced by Donald Knuth in the
26
+ 1980's [@Knuth:literate_programming].
25
27
  The main intention of this approach was to develop software interspersing macro snippets,
26
28
  traditional source code, and a natural language such as English in a document
27
29
  that could be compiled into
@@ -37,19 +39,22 @@ single document or set of documents that when distributed to peers could be reru
37
39
  the same output and reports.
38
40
 
39
41
  The R community has put a great deal of effort in reproducible research. In 2002, Sweave was
40
- introduced and it allowed mixing R code with Latex generating high quality PDF documents. Those
41
- documents could include the code, the result of executing the code, graphics and text. This
42
- contained the whole narrative to reproduce the research. But Sweave had many problems and in
43
- 2012, Knitr, developed by Yihui Xie from RStudio was released, solving many of the long lasting
44
- problems from Sweave and including in one single package many extensions and add-on packages that
42
+ introduced and it allowed mixing R code with Latex generating high quality PDF documents. A
43
+ Sweave
44
+ document could include code, the results of executing the code, graphics and text such that it
45
+ contained the whole narrative to reproduce the research. In
46
+ 2012, Knitr, developed by Yihui Xie from RStudio, was released to replace Sweave and to
47
+ consolidate in one single package the many extensions and add-on packages that
45
48
  were necessary for Sweave.
46
49
 
47
- With Knitr, R markdown was also developed, an extension to the
48
- Markdown format. With R markdown and Knitr it is possible to generate reports in a multitude
49
- of formats such as HTML, markdown, Latex, PDF, dvi, etc. R markdown also allows the use of
50
- multiple programming languages in the same document. In R markdown text is interspersed with
50
+ With Knitr, __R markdown__ was also developed, an extension to the
51
+ Markdown format. With __R markdown__ and Knitr it is possible to generate reports in a multitude
52
+ of formats such as HTML, markdown, Latex, PDF, dvi, etc. __R markdown__ also allows the use of
53
+ multiple programming languages such as R, Ruby, Python, etc. in the same document.
54
+
55
+ In __R markdown__, text is interspersed with
51
56
  code chunks that can be executed and both the code and its results can become
52
- part of the final report. Although R markdown allows multiple programming languages in the
57
+ part of the final report. Although __R markdown__ allows multiple programming languages in the
53
58
  same document, only R and Python (with
54
59
  the reticulate package) can persist variables between chunks. For other languages, such as
55
60
  Ruby, every chunk will start a new process and thus all data is lost between chunks, unless it
@@ -57,46 +62,76 @@ is somehow stored in a data file that is read by the next chunk.
57
62
 
58
63
  Being able to persist data
59
64
  between chunks is critical for literate programming otherwise the flow of the narrative is lost
60
- by all the effort of having to save data and then reload it. Probably, because of
61
- this impossibility,
62
- it is very rare to see any R markdown document in the Ruby community. Also, the use of
63
- R markdown for the Ruby community would also require the Ruby developer to download R and
64
- have some minimal knowledge of Knitr.
65
+ by all the effort of having to save data and then reload it. Although this might, at first, seem like
66
+ a small nuisance, not being able to persist data between chunks is a major issue. For example, let's
67
+ take a look at the following simple example in which we want to show how to create a list and then
68
+ use it. Let's first assume that data cannot be persisted between chunks. In the next chunk we
69
+ create a list, then we would need to save it to file, but to save it, we need somehow to marshal the
70
+ data into a binary format:
71
+
72
+ ```{ruby no_persistence}
73
+ lst = R.list(a: 1, b: 2, c: 3)
74
+ lst.saveRDS("lst.rds")
75
+ ```
76
+ then, in the next chunk, where variable 'lst' is used, we need to read back its value:
77
+
78
+ ```{ruby load_persisted_data}
79
+ lst = R.readRDS("lst.rds")
80
+ puts lst
81
+ ```
82
+
83
+ Now, any real piece of code has dozens of variables that we might want to use and reuse between chunks.
84
+ Clearly, such an approach quickly becomes unmanageable. Probably, because of
85
+ this problem, it is very rare to see any __R markdown__ document in the Ruby community.
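
To make the overhead concrete, here is a minimal sketch in plain Ruby (no Galaaz needed) of what every pair of chunks has to do when nothing is shared between them. The methods 'chunk_one' and 'chunk_two' and the file name 'lst.bin' are hypothetical stand-ins for two separate chunk processes; Ruby's standard 'Marshal' module is used here in place of R's 'saveRDS'/'readRDS'.

```ruby
require 'tmpdir'

# Hypothetical stand-in for "chunk 1": build some data and write it to disk,
# because nothing survives once the process for this chunk exits.
def chunk_one(path)
  lst = { a: 1, b: 2, c: 3 }
  File.binwrite(path, Marshal.dump(lst))
end

# Hypothetical stand-in for "chunk 2": it starts with no variables at all
# and has to read the data back before it can use it.
def chunk_two(path)
  Marshal.load(File.binread(path))
end

# Run both "chunks" against a temporary file; the block value is returned.
restored = Dir.mktmpdir do |dir|
  path = File.join(dir, 'lst.bin')
  chunk_one(path)
  chunk_two(path)
end

puts restored.inspect
```

The save and load steps have to be repeated for every variable that crosses a chunk boundary, which is the bookkeeping that persistent variables remove.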
86
+
87
+ When variables can be used across chunks, then no overhead is needed:
88
+
89
+ ```{ruby persistence}
90
+ @lst = R.list(a: 1, b: 2, c: 3)
91
+ # any other code can be added here
92
+ ```
93
+
94
+ ```{ruby use_var}
95
+ puts @lst
96
+ ```
65
97
 
66
98
  In the Python community, the same effort to have code and text in an integrated environment
67
99
  started around the first decade of 2000. In 2006 iPython 0.7.2 was released. In 2014,
68
100
  Fernando Pérez, spun off project Jupyter from iPython creating a web-based interactive
69
101
  computation environment. Jupyter can now be used with many languages, including Ruby with the
70
- iruby gem (https://github.com/SciRuby/iruby). I am not sure if multiple languages can be used
71
- in a Jupyter notebook and if variables can persist between chunks.
102
+ iruby gem (https://github.com/SciRuby/iruby). In order to have multiple languages in a Jupyter
103
+ notebook the SoS kernel was developed (https://vatlab.github.io/sos-docs/).
72
104
 
73
105
  # gKnitting a Document
74
106
 
75
- This document describes gKnit. gKnit uses Knitr and R markdown to knit a document in Ruby or R
76
- and output it in any of the available formats for R markdown.
107
+ This document describes gKnit. gKnit is based on knitr and __R markdown__ and can knit a document
108
+ written in Ruby and/or R and output it in any of the available formats of __R markdown__. gKnit
109
+ allows Ruby developers to do literate programming and reproducible research by allowing them to
110
+ have text and code in a single document.
111
+
77
112
  gKnit runs atop GraalVM and Galaaz (an integration
78
- library between Ruby and R). In gKnit, Ruby variables are persisted between chunks, making
79
- it an ideal solution for literate programming in this language. Also, since it is based on
80
- Galaaz, Ruby chunks can have access to R variables and Polyglot Programming with Ruby and R
81
- is quite natural.
113
+ library between Ruby and R - see below). In gKnit, Ruby variables are persisted between
114
+ chunks, making it an ideal solution for literate programming in this language. Also,
115
+ since it is based on Galaaz, Ruby chunks can have access to R variables and Polyglot Programming
116
+ with Ruby and R is quite natural.
82
117
 
83
- Galaaz has been describe already in the following posts:
118
+ Galaaz has already been described in the following posts:
84
119
 
85
120
  * https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021.
86
121
  * https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857
87
122
 
88
- This is not a blog post on R markdown, and the interested user is directed to the following links
123
+ This is not a blog post on __R markdown__, and the interested user is directed to the following links
89
124
  for detailed information on its capabilities and use.
90
125
 
91
126
  * https://rmarkdown.rstudio.com/ or
92
127
  * https://bookdown.org/yihui/rmarkdown/
93
128
 
94
- Here, we will describe quickly the main aspects of R markdown, so the user can start gKnitting
95
- Ruby and R documents quickly.
129
+ In this post, we will describe just the main aspects of __R markdown__, so the user can start
130
+ gKnitting Ruby and R documents quickly.
96
131
 
97
132
  ## The Yaml header
98
133
 
99
- An R markdown document should start with a Yaml header and be stored in a file with
134
+ An __R markdown__ document should start with a Yaml header and be stored in a file with
100
135
  '.Rmd' extension. This document has the following header for gKnitting an HTML document.
101
136
 
102
137
  ```
@@ -120,7 +155,7 @@ output:
120
155
 
121
156
  For more information on the options in the Yaml header, check https://bookdown.org/yihui/rmarkdown/html-document.html.
122
157
 
123
- ## R Markdown formatting
158
+ ## __R Markdown__ formatting
124
159
 
125
160
  Document formatting can be done with simple markups such as:
126
161
 
@@ -156,7 +191,7 @@ Ordered Lists
156
191
  + Item 3b
157
192
  ```
158
193
 
159
- Please, go to https://rmarkdown.rstudio.com/authoring_basics.html, for more R markdown formatting.
194
+ For more R markdown formatting go to https://rmarkdown.rstudio.com/authoring_basics.html.
160
195
 
161
196
  ### R chunks
162
197
 
@@ -172,8 +207,7 @@ any optional chunk_label and options, as shown bellow:
172
207
  ````
173
208
 
174
209
  for instance, let's add an R chunk to the document labeled 'first_r_chunk'. This is
175
- a very simple code just to create a variable and print it out. The code block should
176
- be defined as follows:
210
+ a very simple code just to create a variable and print it out, as follows:
177
211
 
178
212
  ````
179
213
  ```{r first_r_chunk}`r ''`
@@ -182,7 +216,7 @@ print(vec)
182
216
  ```
183
217
  ````
184
218
 
185
- If this block is added to an R markdown document and gKnitted the result will be:
219
+ If this block is added to an __R markdown__ document and gKnitted the result will be:
186
220
 
187
221
  ```{r first_r_chunk}
188
222
  vec <- c(1, 2, 3)
@@ -208,9 +242,9 @@ vec3 <- vec * vec2
208
242
  print(vec3)
209
243
  ```
210
244
 
211
- A description of the available chunk options can be found in the documentation cited above.
245
+ A description of the available chunk options can be found in https://yihui.name/knitr/.
212
246
 
213
- Let's add another R chunkd with a function definition. In this example, a vector
247
+ Let's add another R chunk with a function definition. In this example, a vector
214
248
  'r_vec' is created and
215
249
  a new function 'reduce_sum' is defined. The chunk specification is
216
250
 
@@ -224,8 +258,8 @@ reduce_sum <- function(...) {
224
258
  ```
225
259
  ````
226
260
 
227
- and this is how it will look like once executed. From now on, we will not
228
- show the chunk definition any longer.
261
+ and this is how it will look once executed. From now on, to be concise in the
262
+ presentation we will not show chunk definitions any longer.
229
263
 
230
264
 
231
265
  ```{r data_creation}
@@ -249,11 +283,28 @@ this document. Note that there is no directive in the code to include the image
249
283
  occurs automatically. The 'mpg' dataframe is natively available to R and to Galaaz as
250
284
  well.
251
285
 
286
+ For the reader not knowledgeable of ggplot, ggplot is a graphics library based on "the
287
+ grammar of graphics" [@Wilkinson:grammar_of_graphics]. The idea of the grammar of graphics
288
+ is to build a graphic by adding layers to the plot. More information can be found in
289
+ https://towardsdatascience.com/a-comprehensive-guide-to-the-grammar-of-graphics-for-effective-visualization-of-multi-dimensional-1f92b4ed4149.
290
+
291
+ In the plot below, the 'mpg' dataset from the ggplot2 package is used. "The data concerns city-cycle fuel
292
+ consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5
293
+ continuous attributes." (Quinlan, 1993)
294
+
295
+ First, the 'mpg' dataset is filtered to extract only cars from the following manufacturers: Audi, Ford,
296
+ Honda, and Hyundai and stored in the 'mpg_select' variable. Then, the selected dataframe is passed
297
+ to the ggplot function specifying in the aesthetic method (aes) that 'displacement' (disp) should
298
+ be plotted in the 'x' axis and 'city mileage' should be on the 'y' axis. In the 'labs' layer we
299
+ pass the 'title' and 'subtitle' for the plot. To the basic plot 'g', geom\_jitter is added, that
300
+ plots cars from the same manufacturer with the same color (col = manufacturer) and the size of the
301
+ car point equal to its highway consumption (size = hwy). Finally, a last layer is plotted containing
302
+ a linear regression line (method = "lm") for every manufacturer.
303
+
252
304
  ```{r bubble, dev='png'}
253
305
  # load package and data
254
306
  library(ggplot2)
255
307
  data(mpg, package="ggplot2")
256
- # mpg <- read.csv("http://goo.gl/uEeRGu")
257
308
 
258
309
  mpg_select <- mpg[mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]
259
310
 
@@ -282,20 +333,17 @@ available to R chunks. Future versions will add those options.
282
333
 
283
334
  In this example, the ruby chunk is called 'first_ruby_chunk'. One important
284
335
  aspect of chunk labels is that they cannot be duplicated. If a chunk label is
285
- duplicated, gKnitting will stop with an error.
336
+ duplicated, gKnit will stop with an error.
286
337
 
287
- Another relevant point with Ruby chunks is that they are evaluated in the scope
338
+ Another important point with Ruby chunks is that they are evaluated in the scope
288
339
  of a class called RubyChunk. To make sure that variables are
289
340
  available between chunks, they should be made as instance variables of the
290
341
  RubyChunk class. In the following chunk, variable '\@a', '\@b' and '\@c'
291
342
  are standard Ruby variables and '\@vec' and '\@vec2' are two vectors created
292
343
  by calling the 'c' method on the R module.
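
The difference between local and instance variables can be sketched in plain Ruby, without Galaaz. The 'ChunkScope' class below is a hypothetical stand-in for the scope in which chunks are evaluated; it is an assumption made for illustration and the real RubyChunk class may be implemented differently.

```ruby
# Hypothetical stand-in for the scope in which chunks run; the real
# RubyChunk class in gKnit may be implemented differently.
class ChunkScope
  # Evaluate one chunk of source code with this object as 'self'.
  def run(source)
    instance_eval(source)
  end
end

scope = ChunkScope.new

# First chunk: defines one local variable and one instance variable.
scope.run('local_a = [1, 2, 3]; @a = [1, 2, 3]')

# Second chunk: the instance variable is still there, the local is not.
instance_result = scope.run('@a.sum')
local_result = begin
  scope.run('local_a')
rescue NameError
  :undefined
end

puts instance_result
puts local_result
```

Instance variables live on the object that evaluates the chunks, so they outlive a single evaluation, while a local variable disappears as soon as its chunk finishes.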
293
344
 
294
- In Galaaz, the R module allows us to access R functions transparently. The 'c'
345
+ In Galaaz, the R module allows us to access R functions transparently. The 'c'
295
346
  function in R, is a function that concatenates its arguments making a vector.
296
- Calling the 'c' method in the R module is automatically converted to calling the
297
- 'c' function in R, that, through Galaaz and the Truffle interface creates the
298
- vector.
299
347
 
300
348
  It
301
349
  should be clear that there is no requirement in gknit to call or use any R
@@ -318,7 +366,7 @@ puts @a
318
366
  puts @vec * @vec2
319
367
  ```
320
368
 
321
- Note that @a is a standard Ruby Array and @vec and @vec2 are vectors that behave accordingly,
369
+ Note that \@a is a standard Ruby Array and \@vec and \@vec2 are vectors that behave accordingly,
322
370
  where multiplication works as expected.
323
371
 
324
372
 
@@ -344,7 +392,7 @@ puts R.reduce_sum(~:r_vec)
344
392
  ### Ruby Plotting
345
393
 
346
394
  We have seen an example of plotting with R. Plotting with Ruby does not require
347
- anything different from plotting with R. In the following example we plot a
395
+ anything different from plotting with R. In the following example, we plot a
348
396
  diverging bar graph using the 'mtcars' dataframe from R. This data was extracted
349
397
  from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects
350
398
  of automobile design and performance for 32 automobiles (1973–74 models). The
@@ -363,18 +411,40 @@ ten aspects are:
363
411
  * carb: Number of carburetors
364
412
 
365
413
 
366
- ```{ruby diverging_bar, fig.width = 9.1, fig.height = 6.5}
367
- require 'ggplot'
414
+ ```{ruby diverging_plot_pre}
415
+ # copy the R variable :mtcars to the Ruby mtcars variable
416
+ @mtcars = ~:mtcars
368
417
 
369
- mtcars = ~:mtcars
418
+ # create a new column 'car_name' to store the car names so that it can be
419
+ # used for plotting. The 'rownames' of the data frame cannot be used as
420
+ # data for plotting
421
+ @mtcars.car_name = R.rownames(:mtcars)
422
+
423
+ # compute normalized mpg and add it to a new column called mpg_z
424
+ # Note that the mean value for mpg can be obtained by calling the 'mean'
425
+ # function on the vector 'mtcars.mpg'. The same with the standard
426
+ # deviation 'sd'. The vector is then rounded to two digits with 'round 2'
427
+ @mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
428
+
429
+ # create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
430
+ # that looks at every element of the mpg_z vector and if the value is below
431
+ # 0, returns 'below', otherwise returns 'above'
432
+ @mtcars.mpg_type = (@mtcars.mpg_z < 0).ifelse("below", "above")
433
+
434
+ # order the mtcars data set by the mpg_z vector from smaller to larger values
435
+ @mtcars = @mtcars[@mtcars.mpg_z.order, :all]
370
436
 
371
- mtcars.car_name = mtcars.rownames # create new column for car names
372
- mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean) / mtcars.mpg.sd).round 2
373
- mtcars.mpg_type = (mtcars.mpg_z < 0).ifelse('below', 'above')
374
- mtcars = mtcars[mtcars.mpg_z.order, :all]
375
- mtcars.car_name = R.factor(mtcars.car_name, levels: mtcars.car_name)
437
+ # convert the car_name column to a factor to retain sorted order in plot
438
+ @mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
439
+
440
+ # let's look at the first records of the final data frame
441
+ puts @mtcars.head
442
+ ```
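
For readers who want to see the arithmetic behind 'mpg_z' and 'mpg_type' without a data frame, here is a sketch in plain Ruby. The car names and values are illustrative only and are not taken from 'mtcars'; the standard deviation divides by n - 1, which is the convention used by R's 'sd'.

```ruby
# Illustrative values only; not taken from the 'mtcars' data frame.
mpg = { 'car_a' => 21.0, 'car_b' => 22.8, 'car_c' => 18.7, 'car_d' => 14.3 }

values = mpg.values
mean = values.sum / values.size

# Sample standard deviation (divides by n - 1), the convention used by R's 'sd'.
sd = Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.size - 1))

# One row per car: normalized value rounded to two digits, plus its label.
rows = mpg.map do |name, value|
  z = ((value - mean) / sd).round(2)
  { car_name: name, mpg_z: z, mpg_type: z < 0 ? 'below' : 'above' }
end

# Sort from the smallest to the largest normalized value.
rows = rows.sort_by { |row| row[:mpg_z] }

rows.each { |row| puts row.values.join("\t") }
```

With Galaaz the same steps are vectorized calls on the data frame columns, so no explicit loop is needed.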
443
+
444
+ ```{ruby diverging_bar, fig.width = 9.1, fig.height = 6.5}
445
+ require 'ggplot'
376
446
 
377
- puts mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
447
+ puts @mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
378
448
  R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
379
449
  R.scale_fill_manual(name: 'Mileage',
380
450
  labels: R.c('Above Average', 'Below Average'),
@@ -399,10 +469,6 @@ This is some text with inline Ruby accessing variable \@b which has value:
399
469
  and is followed by some other text!
400
470
  ````
401
471
 
402
- Note that it is important not to add any new line before of after the code
403
- block if we want everything to be in only one line, resulting in the following sentence
404
- with inline Ruby code
405
-
406
472
  <div style="margin-bottom:30px;">
407
473
  </div>
408
474
 
@@ -414,23 +480,28 @@ and is followed by some other text!
414
480
  <div style="margin-bottom:30px;">
415
481
  </div>
416
482
 
483
+ Note that it is important not to add any new line before or after the code
484
+ block if we want everything to be in only one line, resulting in the following sentence
485
+ with inline Ruby code.
486
+
417
487
 
418
488
  ```{ruby heading, echo = FALSE}
419
489
  outputs "### #{@c}"
420
490
  ```
421
491
 
422
- He have previously used the standard 'puts' method in Ruby chunks in order to get some
423
- output. As can be seen, the result of a 'puts' is formatted inside a white box that
492
+ We have previously used the standard 'puts' method in Ruby chunks in order to produce
493
+ output. The result of a 'puts', as seen in all previous chunks that use it, is formatted
494
+ inside a white box that
424
495
  follows the code block. Many times however, we would like to do some processing in the
425
496
  Ruby chunk and have the result of this processing generate an output that is
426
- 'included' in the document as if we had typed it in R markdown.
497
+ "included" in the document as if we had typed it in the __R markdown__ document.
427
498
 
428
- For example, suppose we want to create a new 'heading' in our document, but the heading
499
+ For example, suppose we want to create a new heading in our document, but the heading
429
500
  phrase is the result of some code processing: maybe it's the first line of a file we are
430
- going to read. Method 'outputs' adds its output as if typed in the R markdown document.
501
+ going to read. Method 'outputs' adds its output as if typed in the __R markdown__ document.
431
502
 
432
- Take now a look at variable '@c' (it was defined in a previous block above) as
433
- '@c = "The 'outputs' function". "The 'outputs' function" is actually the name of this
503
+ Now take a look at variable '\@c' (it was defined in a previous block above) as
504
+ '\@c = "The 'outputs' function". "The 'outputs' function" is actually the name of this
434
505
  section and it was created using the 'outputs' function inside a Ruby chunk.
435
506
 
436
507
  The ruby chunk to generate this heading is:
@@ -441,14 +512,14 @@ outputs "### #{@c}"
441
512
  ```
442
513
  ````
443
514
 
444
- The three '###' are the way we add a Heading 3 in R markdown.
515
+ The three '###' are the way we add a Heading 3 in __R markdown__.
445
516
 
446
517
 
447
518
  ### HTML Output from Ruby Chunks
448
519
 
449
- We've just seen the use of method 'outputs' to add text to the the R markdown
450
- document. This technique can also be used to add HTML code to the document. In R
451
- markdown any html code typed directly in the document will be properly rendered.
520
+ We've just seen the use of method 'outputs' to add text to the __R markdown__
521
+ document. This technique can also be used to add HTML code to the document. In
522
+ __R markdown__, any html code typed directly in the document will be properly rendered.
452
523
  Here, for instance, is a table definition in HTML and its output in the document:
453
524
 
454
525
  ```
@@ -494,8 +565,10 @@ Here, for instance, is a table definition in HTML and its output in the document
494
565
  <div style="margin-bottom:30px;">
495
566
  </div>
496
567
 
497
- But manually creating HTML output is not always easy or desirable. The above
498
- table certainly looks ugly. The 'kableExtra' library is a great library for
568
+ But manually creating HTML output is not always easy or desirable, especially
569
+ if we intend the document to be rendered in other formats, for example, as Latex.
570
+ Also, the above
571
+ table looks ugly. The 'kableExtra' library is a great library for
499
572
  creating beautiful tables. Take a look at https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
500
573
 
501
574
  In the next chunk, we output the 'mtcars' dataframe from R in a nicely formatted
@@ -506,39 +579,38 @@ R.install_and_loads('kableExtra')
506
579
  outputs (~:mtcars).kable.kable_styling
507
580
  ```
508
581
 
509
- ### Including Ruby files
582
+ ### Including Ruby files in a chunk
510
583
 
511
584
  R is a language that was created to be easy and fast for statisticians to use. As far
512
- as I know (and please correct me if you think otherwise), tt was not a
585
+ as I know, it was not a
513
586
  language to be used for developing large systems. Of course, there are large systems and
514
587
  libraries in R, but the focus of the language is for developing statistical models and
515
588
  distributing them to peers.
516
589
 
517
590
  Ruby on the other hand, is a language for large software development. Systems written in
518
- Ruby will have dozens, hundreds or even thousands of files. In order to document a
519
- large system with
520
- literate programming we cannot expect the developer to add all the files in a single '.Rmd'
521
- file. gKnit provides the 'include' chunk engine to include a Ruby file as if it had being
522
- typed in the '.Rmd' file.
591
+ Ruby will have dozens, hundreds or even thousands of files. To document a
592
+ large system with literate programming, we cannot expect the developer to add all the
593
+ files in a single '.Rmd' file. gKnit provides the 'include' chunk engine to include
594
+ a Ruby file as if it had been typed in the '.Rmd' file.
523
595
 
524
596
  To include a file, the following chunk should be created, where <filename> is the name of
525
- the file to be include and where the extension, if it is '.rb', does not need to be added.
597
+ the file to be included and where the extension, if it is '.rb', does not need to be added.
526
598
  If the 'relative' option is not included, then it is treated as TRUE. When 'relative' is
527
- true, 'require_relative' semantics is used to load the file, when false, Ruby's \$LOAD_PATH
528
- is searched to find the file and it is 'require'd.
599
+ true, Ruby's 'require\_relative' semantics is used to load the file; when false, Ruby's
600
+ \$LOAD_PATH is searched to find the file and it is 'require'd.
529
601
 
530
602
  ````
531
603
  ```{include <filename>, relative = <TRUE/FALSE>}`r ''`
532
604
  ```
533
605
  ````
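
The 'relative = FALSE' case can be sketched in plain Ruby as a lookup through \$LOAD_PATH. The file name 'gknit_sketch_model.rb' and the constant 'SKETCH_MODEL_LOADED' are hypothetical and exist only for this illustration; the sketch shows Ruby's own 'require' behaviour, not how gKnit's 'include' engine is implemented.

```ruby
require 'tmpdir'

loaded = Dir.mktmpdir do |dir|
  # Hypothetical file name and constant, created only for this sketch.
  File.write(File.join(dir, 'gknit_sketch_model.rb'), "SKETCH_MODEL_LOADED = true\n")

  # 'require' only finds the file once its directory is on $LOAD_PATH;
  # the '.rb' extension can be left out.
  $LOAD_PATH.unshift(dir)
  begin
    require 'gknit_sketch_model'
  ensure
    # Leave $LOAD_PATH as we found it.
    $LOAD_PATH.delete(dir)
  end
end

puts loaded
puts defined?(SKETCH_MODEL_LOADED)
```

With 'require_relative', by contrast, Ruby resolves the name against the directory of the file that contains the call, so \$LOAD_PATH plays no part.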
534
606
 
535
- Here we include file 'model.rb' which is in the same directory of this blog.
607
+ Below we include file 'model.rb', which is in the same directory as this blog.
536
608
  This code uses R's 'caret' package to split a dataset into train and test sets.
537
609
  The 'caret' package is a very important and useful package for doing Data Analysis,
538
610
  it has hundreds of functions for all steps of the Data Analysis workflow. To
539
- just split a dataset it is using the proverbial cannon to kill the fly. We use
540
- it here only to show that integrating Ruby and R and using even a very comples
541
- package as 'caret' is trivial with Galaaz.
611
+ use 'caret' just to split a dataset is like using the proverbial cannon to
612
+ kill the fly. We use it here only to show that integrating Ruby and R and
613
+ using even a very complex package such as 'caret' is trivial with Galaaz.
542
614
 
543
615
  A word of advice: the 'caret' package has lots of dependencies and installing
544
616
  it in a Linux system is a time consuming operation. Method 'R.install_and_loads'
@@ -563,10 +635,11 @@ puts model.test.head
563
635
  ### Documenting Gems
564
636
 
565
637
  gKnit also allows developers to document and load files that are not in the same directory
566
- of the '.Rmd' file. When using 'relative = FALSE' in a chunk header, gKnit will look for the
567
- file in Ruby's \$LOAD_PATH and load it if found.
638
+ of the '.Rmd' file.
568
639
 
569
- Here is an example of loading the 'find.rb' file from TruffleRuby.
640
+ Here is an example of loading the 'find.rb' file from TruffleRuby. In this example, relative
641
+ is set to FALSE, so Ruby will look for the file in its $LOAD\_PATH, and the user does not
642
+ need to know its directory.
570
643
 
571
644
  ````
572
645
  ```{include find, relative = FALSE}`r ''`
@@ -579,7 +652,7 @@ Here is an example of loading the 'find.rb' file from TruffleRuby.
579
652
  ## Converting to PDF
580
653
 
581
654
  One of the beauties of knitr is that the same input can be converted to many different outputs.
582
- One very useful format, is, of course, PDF. In order to converted an R markdown file to PDF
655
+ One very useful format is, of course, PDF. In order to convert an __R markdown__ file to PDF
583
656
  it is necessary to have LaTeX installed on the system. We will not explain here how to
584
657
  install LaTeX as there are plenty of documents on the web showing how to proceed.
585
658
 
@@ -595,25 +668,36 @@ date: "29 October 2018"
595
668
  output:
596
669
  pdf_document:
597
670
  includes:
598
- in_header: ["../../sty/galaaz.sty"]
599
- number_sections: yes
671
+ in\_header: ["../../sty/galaaz.sty"]
672
+ number\_sections: yes
600
673
  ---
601
674
  ```
602
675
 
603
676
  # Conclusion
604
677
 
605
- One of the promises of GraalVM is that users/developers will be able to use the best tool
606
- for their task at hand, independently of the programming language the tool was written. Galaaz
607
- and gKnit are not trivial implementations atop the GraalVM and Truffle interop messages;
608
- however, the time and effort it took to wrap Ruby over R - Galaaz - (not finished yet) or to
609
- wrap Knitr with gKnit is a fraction of a fraction of a fraction of the time require to
678
+ In order to do reproducible research, one of the main basic tools needed is a system that
679
+ allows "literate programming" where text, code and possibly a set of files can be compiled
680
+ into a report that can be easily distributed to peers. Peers should be able to use this
681
+ same set of files to rerun the compilation on their own, obtaining the exact same original
682
+ report. gKnit is such a system for Ruby and R. It uses __R Markdown__ to integrate
683
+ text and code chunks, where code chunks can either be part of the __R Markdown__ file or
684
+ be imported from files in the system. Ideally, in reproducible research, all the files
685
+ needed to rebuild a report should be easily packed together (in the same zipped directory)
686
+ and distributed to peers for reexecution.
687
+
688
+ One of the promises of Oracle's GraalVM is that users/developers will be able to use the best tool
689
+ for their task at hand, independently of the programming language the tool was written in.
690
+ We developed and implemented Galaaz atop the GraalVM and Truffle interop messages and
691
+ the time and effort to wrap Ruby over R - Galaaz - or to
692
+ wrap Knitr with gKnit was a fraction of a fraction of a fraction (one man effort for a couple
693
+ of hours a day, for approximately six months) of the time required to
610
694
  implement the original tools. Trying to reimplement all R packages in Ruby would require the
611
- same effort it is taking Python to implement NumPy, Panda and all supporting libraries and it
695
+ same effort it is taking Python to implement NumPy, Pandas and all supporting libraries and it
612
696
  is unlikely that this effort would ever be done. GraalVM has allowed Ruby to profit "almost
613
697
  for free" from this huge set of libraries and tools that make R one of the most used
614
698
  languages for data analysis and machine learning.
615
699
 
616
- More interesting though than being able to wrap the R libraries with Ruby, is that Ruby adds
700
+ More interesting than wrapping the R libraries with Ruby, is that Ruby adds
617
701
  value to R, by allowing developers to use powerful and modern constructs for code reuse that
618
702
  are not the strong points of R. As shown in this blog, R and Ruby can easily communicate
619
703
  and R can be structured in classes and modules in a way that greatly expands its power and
@@ -644,3 +728,6 @@ the gnu compiler and tools should be enough. I am not sure what is needed on th
644
728
  ## Usage
645
729
 
646
730
  * gknit \<filename\>
731
+
732
+ # References
733
+