galaaz 0.4.8 → 0.4.9

@@ -380,32 +380,59 @@ $(document).ready(function () {
380
380
 
381
381
  <div id="introduction" class="section level1">
382
382
  <h1>Introduction</h1>
383
- <p>The idea of “literate programming” was first introduced by Donald Knuth in the 1980’s. The main intention of this approach was to develop software interspersing macro snippets, traditional source code, and a natural language such as English in a document that could be compiled into executable code and at the same time easily read by a human developer. According to Knuth “The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style.”</p>
383
+ <p>The idea of “literate programming” was first introduced by Donald Knuth in the 1980s <span class="citation">(Knuth 1984)</span>. The main intention of this approach was to develop software interspersing macro snippets, traditional source code, and a natural language such as English in a document that could be compiled into executable code and at the same time easily read by a human developer. According to Knuth, “The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style.”</p>
384
384
  <p>The idea of literate programming evolved into the idea of reproducible research, in which all the data, software code, documentation, graphics etc. needed to reproduce the research and its reports could be included in a single document or set of documents that when distributed to peers could be rerun generating the same output and reports.</p>
385
- <p>The R community has put a great deal of effort in reproducible research. In 2002, Sweave was introduced and it allowed mixing R code with Latex generating high quality PDF documents. Those documents could include the code, the result of executing the code, graphics and text. This contained the whole narrative to reproduce the research. But Sweave had many problems and in 2012, Knitr, developed by Yihui Xie from RStudio was released, solving many of the long lasting problems from Sweave and including in one single package many extensions and add-on packages that were necessary for Sweave.</p>
386
- <p>With Knitr, R markdown was also developed, an extension to the Markdown format. With R markdown and Knitr it is possible to generate reports in a multitude of formats such as HTML, markdown, Latex, PDF, dvi, etc. R markdown also allows the use of multiple programming languages in the same document. In R markdown text is interspersed with code chunks that can be executed and both the code and its results can become part of the final report. Although R markdown allows multiple programming languages in the same document, only R and Python (with the reticulate package) can persist variables between chunks. For other languages, such as Ruby, every chunk will start a new process and thus all data is lost between chunks, unless it is somehow stored in a data file that is read by the next chunk.</p>
387
- <p>Being able to persist data between chunks is critical for literate programming otherwise the flow of the narrative is lost by all the effort of having to save data and then reload it. Probably, because of this impossibility, it is very rare to see any R markdown document in the Ruby community. Also, the use of R markdown for the Ruby community would also require the Ruby developer to download R and have some minimal knowledge of Knitr.</p>
388
- <p>In the Python community, the same effort to have code and text in an integrated environment started around the first decade of 2000. In 2006 iPython 0.7.2 was released. In 2014, Fernando Pérez, spun off project Jupyter from iPython creating a web-based interactive computation environment. Jupyter can now be used with many languages, including Ruby with the iruby gem (<a href="https://github.com/SciRuby/iruby" class="uri">https://github.com/SciRuby/iruby</a>). I am not sure if multiple languages can be used in a Jupyter notebook and if variables can persist between chunks.</p>
385
+ <p>The R community has put a great deal of effort into reproducible research. In 2002, Sweave was introduced; it allowed mixing R code with LaTeX to generate high-quality PDF documents. A Sweave document could include code, the results of executing the code, graphics and text, such that it contained the whole narrative needed to reproduce the research. In 2012, Knitr, developed by Yihui Xie from RStudio, was released to replace Sweave and to consolidate in one single package the many extensions and add-on packages that were necessary for Sweave.</p>
386
+ <p>With Knitr, <strong>R markdown</strong> was also developed, an extension to the Markdown format. With <strong>R markdown</strong> and Knitr it is possible to generate reports in a multitude of formats such as HTML, markdown, Latex, PDF, dvi, etc. <strong>R markdown</strong> also allows the use of multiple programming languages such as R, Ruby, Python, etc. in the same document.</p>
387
+ <p>In <strong>R markdown</strong>, text is interspersed with code chunks that can be executed and both the code and its results can become part of the final report. Although <strong>R markdown</strong> allows multiple programming languages in the same document, only R and Python (with the reticulate package) can persist variables between chunks. For other languages, such as Ruby, every chunk will start a new process and thus all data is lost between chunks, unless it is somehow stored in a data file that is read by the next chunk.</p>
388
+ <p>Being able to persist data between chunks is critical for literate programming; otherwise, the flow of the narrative is lost in the effort of saving data and then reloading it. Although this might at first seem like a small nuisance, not being able to persist data between chunks is a major issue. For example, consider the following simple case, in which we want to show how to create a list and then use it. Let’s first assume that data cannot be persisted between chunks. In the next chunk we create a list, and then we need to save it to a file; to save it, we must somehow marshal the data into a binary format:</p>
389
+ <pre class="ruby"><code>lst = R.list(a: 1, b: 2, c: 3)
390
+ lst.saveRDS(&quot;lst.rds&quot;)</code></pre>
391
+ <p>then, in the next chunk, where variable ‘lst’ is used, we need to read back its value:</p>
392
+ <pre class="ruby"><code>lst = R.readRDS(&quot;lst.rds&quot;)
393
+ puts lst</code></pre>
394
+ <pre><code>## $a
395
+ ## [1] 1
396
+ ##
397
+ ## $b
398
+ ## [1] 2
399
+ ##
400
+ ## $c
401
+ ## [1] 3</code></pre>
402
+ <p>Now, any realistic piece of code has dozens of variables that we might want to use and reuse between chunks. Clearly, such an approach quickly becomes unmanageable. Probably because of this problem, it is very rare to see any <strong>R markdown</strong> document in the Ruby community.</p>
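<p>To make the overhead concrete, here is a minimal plain-Ruby sketch of the save-and-reload pattern, assuming each chunk runs in its own process. The helpers ‘save_chunk_state’ and ‘load_chunk_state’ are hypothetical names introduced only for this illustration; they are not part of gKnit or Galaaz, and a plain Ruby Hash stands in for the R list.</p>

```ruby
require 'tmpdir'

# Hypothetical helpers (not part of gKnit or Galaaz): they only illustrate
# the bookkeeping needed when chunks cannot share variables.
def save_chunk_state(path, vars)
  # Marshal turns the Ruby objects into a binary string
  File.binwrite(path, Marshal.dump(vars))
end

def load_chunk_state(path)
  Marshal.load(File.binread(path))
end

state_file = File.join(Dir.mktmpdir, 'chunk_state.bin')

# "Chunk 1": create the data, then explicitly save every variable to keep
lst = { a: 1, b: 2, c: 3 }
title = 'my report'
save_chunk_state(state_file, { lst: lst, title: title })

# "Chunk 2": nothing is available until it is read back from the file
restored = load_chunk_state(state_file)
puts restored[:lst].inspect
puts restored[:title]
```

<p>Every variable that crosses a chunk boundary has to be listed in the save call and unpacked again after loading, which is the bookkeeping that persistent variables make unnecessary.</p>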
403
+ <p>When variables can be used across chunks, no such overhead is needed:</p>
404
+ <pre class="ruby"><code>@lst = R.list(a: 1, b: 2, c: 3)
405
+ # any other code can be added here</code></pre>
406
+ <pre class="ruby"><code>puts @lst</code></pre>
407
+ <pre><code>## $a
408
+ ## [1] 1
409
+ ##
410
+ ## $b
411
+ ## [1] 2
412
+ ##
413
+ ## $c
414
+ ## [1] 3</code></pre>
415
+ <p>In the Python community, the same effort to have code and text in an integrated environment started in the first decade of the 2000s. In 2006, IPython 0.7.2 was released. In 2014, Fernando Pérez spun off Project Jupyter from IPython, creating a web-based interactive computation environment. Jupyter can now be used with many languages, including Ruby with the iruby gem (<a href="https://github.com/SciRuby/iruby" class="uri">https://github.com/SciRuby/iruby</a>). In order to have multiple languages in a Jupyter notebook, the SoS kernel was developed (<a href="https://vatlab.github.io/sos-docs/" class="uri">https://vatlab.github.io/sos-docs/</a>).</p>
389
416
  </div>
390
417
  <div id="gknitting-a-document" class="section level1">
391
418
  <h1>gKnitting a Document</h1>
392
- <p>This document describes gKnit. gKnit uses Knitr and R markdown to knit a document in Ruby or R and output it in any of the available formats for R markdown.<br />
393
- gKnit runs atop of GraalVM, and Galaaz (an integration library between Ruby and R). In gKnit, Ruby variables are persisted between chunks, making it an ideal solution for literate programming in this language. Also, since it is based on Galaaz, Ruby chunks can have access to R variables and Polyglot Programming with Ruby and R is quite natural.</p>
394
- <p>Galaaz has been describe already in the following posts:</p>
419
+ <p>This document describes gKnit. gKnit is based on knitr and <strong>R markdown</strong> and can knit a document written in Ruby and/or R and output it in any of the available formats of <strong>R markdown</strong>. gKnit allows Ruby developers to do literate programming and reproducible research by letting them combine text and code in a single document.</p>
420
+ <p>gKnit runs atop GraalVM and Galaaz (an integration library between Ruby and R - see below). In gKnit, Ruby variables are persisted between chunks, making it an ideal solution for literate programming in this language. Also, since it is based on Galaaz, Ruby chunks can have access to R variables, and Polyglot Programming with Ruby and R is quite natural.</p>
421
+ <p>Galaaz has already been described in the following posts:</p>
395
422
  <ul>
396
423
  <li><a href="https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021" class="uri">https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021</a>.<br />
397
424
  </li>
398
425
  <li><a href="https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857" class="uri">https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857</a></li>
399
426
  </ul>
400
- <p>This is not a blog post on R markdown, and the interested user is directed to the following links for detailed information on its capabilities and use.</p>
427
+ <p>This is not a blog post on <strong>R markdown</strong>, and the interested user is directed to the following links for detailed information on its capabilities and use.</p>
401
428
  <ul>
402
429
  <li><a href="https://rmarkdown.rstudio.com/" class="uri">https://rmarkdown.rstudio.com/</a> or</li>
403
430
  <li><a href="https://bookdown.org/yihui/rmarkdown/" class="uri">https://bookdown.org/yihui/rmarkdown/</a></li>
404
431
  </ul>
405
- <p>Here, we will describe quickly the main aspects of R markdown, so the user can start gKnitting Ruby and R documents quickly.</p>
432
+ <p>In this post, we will describe just the main aspects of <strong>R markdown</strong>, so the user can start gKnitting Ruby and R documents quickly.</p>
406
433
  <div id="the-yaml-header" class="section level2">
407
434
  <h2>The Yaml header</h2>
408
- <p>An R markdown document should start with a Yaml header and be stored in a file with ‘.Rmd’ extension. This document has the following header for gKitting an HTML document.</p>
435
+ <p>An <strong>R markdown</strong> document should start with a Yaml header and be stored in a file with the ‘.Rmd’ extension. This document has the following header for gKnitting an HTML document.</p>
409
436
  <pre><code>---
410
437
  title: &quot;How to do reproducible research in Ruby with gKnit&quot;
411
438
  author:
@@ -425,7 +452,7 @@ output:
425
452
  <p>For more information on the options in the Yaml header, check <a href="https://bookdown.org/yihui/rmarkdown/html-document.html" class="uri">https://bookdown.org/yihui/rmarkdown/html-document.html</a>.</p>
426
453
  </div>
427
454
  <div id="r-markdown-formatting" class="section level2">
428
- <h2>R Markdown formatting</h2>
455
+ <h2><strong>R Markdown</strong> formatting</h2>
429
456
  <p>Document formatting can be done with simple markups such as:</p>
430
457
  <div id="headers" class="section level3">
431
458
  <h3>Headers</h3>
@@ -451,7 +478,7 @@ output:
451
478
  3. Item 3
452
479
  + Item 3a
453
480
  + Item 3b</code></pre>
454
- <p>Please, go to <a href="https://rmarkdown.rstudio.com/authoring_basics.html" class="uri">https://rmarkdown.rstudio.com/authoring_basics.html</a>, for more R markdown formatting.</p>
481
+ <p>For more R markdown formatting go to <a href="https://rmarkdown.rstudio.com/authoring_basics.html" class="uri">https://rmarkdown.rstudio.com/authoring_basics.html</a>.</p>
455
482
  </div>
456
483
  <div id="r-chunks" class="section level3">
457
484
  <h3>R chunks</h3>
@@ -459,12 +486,12 @@ output:
459
486
  Inserting a code chunk is done by adding code in a block delimited by three back ticks followed by an open curly brace (‘{’) followed by the engine name (r, ruby, rb, include, …), and an optional chunk_label and options, as shown below:</p>
460
487
  <pre><code>```{engine_name [chunk_label], [chunk_options]}
461
488
  ```</code></pre>
462
- <p>for instance, let’s add an R chunk to the document labeled ‘first_r_chunk’. This is a very simple code just to create a variable and print it out. The code block should be defined as follows:</p>
489
+ <p>For instance, let’s add an R chunk labeled ‘first_r_chunk’ to the document. This is very simple code that just creates a variable and prints it out, as follows:</p>
463
490
  <pre><code>```{r first_r_chunk}
464
491
  vec &lt;- c(1, 2, 3)
465
492
  print(vec)
466
493
  ```</code></pre>
467
- <p>If this block is added to an R markdown document and gKnitted the result will be:</p>
494
+ <p>If this block is added to an <strong>R markdown</strong> document and gKnitted the result will be:</p>
468
495
  <pre class="r"><code>vec &lt;- c(1, 2, 3)
469
496
  print(vec)</code></pre>
470
497
  <pre><code>## [1] 1 2 3</code></pre>
@@ -476,8 +503,8 @@ print(vec3)
476
503
  ```</code></pre>
477
504
  <p>Here is how this block will show up in the document. Observe that the code is not shown and we only see the execution result in a white box</p>
478
505
  <pre><code>## [1] 10 40 90</code></pre>
479
- <p>A description of the available chunk options can be found in the documentation cited above.</p>
480
- <p>Let’s add another R chunkd with a function definition. In this example, a vector ‘r_vec’ is created and a new function ‘reduce_sum’ is defined. The chunk specification is</p>
506
+ <p>A description of the available chunk options can be found in <a href="https://yihui.name/knitr/" class="uri">https://yihui.name/knitr/</a>.</p>
507
+ <p>Let’s add another R chunk with a function definition. In this example, a vector ‘r_vec’ is created and a new function ‘reduce_sum’ is defined. The chunk specification is</p>
481
508
  <pre><code>```{r data_creation}
482
509
  r_vec &lt;- c(1, 2, 3, 4, 5)
483
510
 
@@ -485,7 +512,7 @@ reduce_sum &lt;- function(...) {
485
512
  Reduce(sum, as.list(...))
486
513
  }
487
514
  ```</code></pre>
488
- <p>and this is how it will look like once executed. From now on, we will not show the chunk definition any longer.</p>
515
+ <p>and this is how it will look once executed. From now on, to keep the presentation concise, we will no longer show chunk definitions.</p>
489
516
  <pre class="r"><code>r_vec &lt;- c(1, 2, 3, 4, 5)
490
517
 
491
518
  reduce_sum &lt;- function(...) {
@@ -500,10 +527,12 @@ reduce_sum &lt;- function(...) {
500
527
  <div id="r-graphics-with-ggplot" class="section level3">
501
528
  <h3>R Graphics with ggplot</h3>
502
529
  <p>In the following chunk, we create a bubble chart in R using ggplot and include it in this document. Note that there is no directive in the code to include the image, this occurs automatically. The ‘mpg’ dataframe is natively available to R and to Galaaz as well.</p>
530
+ <p>For readers unfamiliar with ggplot: it is a graphics library based on “the grammar of graphics” <span class="citation">(Wilkinson 2005)</span>. The idea of the grammar of graphics is to build a graphic by adding layers to the plot. More information can be found in <a href="https://towardsdatascience.com/a-comprehensive-guide-to-the-grammar-of-graphics-for-effective-visualization-of-multi-dimensional-1f92b4ed4149" class="uri">https://towardsdatascience.com/a-comprehensive-guide-to-the-grammar-of-graphics-for-effective-visualization-of-multi-dimensional-1f92b4ed4149</a>.</p>
531
+ <p>In the plot below, the ‘mpg’ dataset from the ggplot2 package is used. “The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes.” (Quinlan, 1993)</p>
532
+ <p>First, the ‘mpg’ dataset is filtered to extract only cars from the following manufacturers: Audi, Ford, Honda, and Hyundai, and the result is stored in the ‘mpg_select’ variable. Then, the selected dataframe is passed to the ggplot function, specifying in the aesthetic method (aes) that ‘displacement’ (displ) should be plotted on the ‘x’ axis and ‘city mileage’ (cty) on the ‘y’ axis. In the ‘labs’ layer we pass the ‘title’ and ‘subtitle’ for the plot. To the basic plot ‘g’, geom_jitter is added, which plots cars from the same manufacturer with the same color (col=manufacturer) and sizes each point according to the car’s highway mileage (size=hwy). Finally, a last layer is plotted containing a linear regression line (method = “lm”) for every manufacturer.</p>
503
533
  <pre class="r"><code># load package and data
504
534
  library(ggplot2)
505
535
  data(mpg, package=&quot;ggplot2&quot;)
506
- # mpg &lt;- read.csv(&quot;http://goo.gl/uEeRGu&quot;)
507
536
 
508
537
  mpg_select &lt;- mpg[mpg$manufacturer %in% c(&quot;audi&quot;, &quot;ford&quot;, &quot;honda&quot;, &quot;hyundai&quot;), ]
509
538
 
@@ -515,16 +544,16 @@ g &lt;- ggplot(mpg_select, aes(displ, cty)) +
515
544
 
516
545
  g + geom_jitter(aes(col=manufacturer, size=hwy)) +
517
546
  geom_smooth(aes(col=manufacturer), method=&quot;lm&quot;, se=F)</code></pre>
518
- <p><img src="" /><!-- --></p>
547
+ <p><img src="" /><!-- --></p>
519
548
  </div>
520
549
  <div id="ruby-chunks" class="section level3">
521
550
  <h3>Ruby chunks</h3>
522
551
  <p>Including a Ruby chunk is just as easy as including an R chunk in the document: just change the name of the engine to ‘ruby’. It is also possible to pass chunk options to the Ruby engine; however, this version does not accept all the options that are available to R chunks. Future versions will add those options.</p>
523
552
  <pre><code>```{ruby first_ruby_chunk}
524
553
  ```</code></pre>
525
- <p>In this example, the ruby chunk is called ‘first_ruby_chunk’. One important aspect of chunk labels is that they cannot be duplicated. If a chunk label is duplicated, gKnitting will stop with an error.</p>
526
- <p>Another relevant point with Ruby chunks is that they are evaluated in the scope of a class called RubyChunk. To make sure that variables are available between chunks, they should be made as instance variables of the RubyChunk class. In the following chunk, variable ‘@a’, ‘@b’ and ‘@c’ are standard Ruby variables and ‘@vec’ and ‘@vec2’ are two vectors created by calling the ‘c’ method on the R module.</p>
527
- <p>In Galaaz, the R module allows us to access R functions transparently. The ‘c’ function in R, is a function that concatenates its arguments making a vector. Calling the ‘c’ method in the R module is automatically converted to calling the ‘c’ function in R, that, through Galaaz and the Truffle interface creates the vector.</p>
554
+ <p>In this example, the ruby chunk is called ‘first_ruby_chunk’. One important aspect of chunk labels is that they cannot be duplicated. If a chunk label is duplicated, gKnit will stop with an error.</p>
555
+ <p>Another important point with Ruby chunks is that they are evaluated in the scope of a class called RubyChunk. To make sure that variables are available between chunks, they should be made instance variables of the RubyChunk class. In the following chunk, variables ‘@a’, ‘@b’ and ‘@c’ are standard Ruby variables and ‘@vec’ and ‘@vec2’ are two vectors created by calling the ‘c’ method on the R module.</p>
556
+ <p>In Galaaz, the R module allows us to access R functions transparently. The ‘c’ function in R is a function that concatenates its arguments, making a vector.</p>
528
557
  <p>It should be clear that there is no requirement in gknit to call or use any R functions. gKnit will knit standard Ruby code, or even general text without any code.</p>
529
558
  <pre class="ruby"><code>@a = [1, 2, 3]
530
559
  @b = &quot;US$ 250.000&quot;
@@ -537,7 +566,7 @@ g + geom_jitter(aes(col=manufacturer, size=hwy)) +
537
566
  puts @vec * @vec2</code></pre>
538
567
  <pre><code>## [1, 2, 3]
539
568
  ## [1] 10 40 90</code></pre>
540
- <p>Note that <span class="citation">@a</span> is a standard Ruby Array and <span class="citation">@vec</span> and <span class="citation">@vec2</span> are vectors that behave accordingly, where multiplication works as expected.</p>
569
+ <p>Note that @a is a standard Ruby Array and @vec and @vec2 are vectors that behave accordingly, where multiplication works as expected.</p>
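<p>The scoping rule for Ruby chunks can be illustrated with a small plain-Ruby sketch. The class ‘ChunkScope’ below is a hypothetical stand-in written for this illustration; the actual RubyChunk class in gKnit may be implemented differently. The sketch assumes only that every chunk is evaluated against the same object, which is why instance variables survive from one chunk to the next while local variables do not.</p>

```ruby
# Hypothetical stand-in for the class in which chunks are evaluated; the
# real RubyChunk class in gKnit may be implemented differently.
class ChunkScope
  # Evaluate one chunk of source code with self set to this object
  def run(source)
    instance_eval(source)
  end
end

scope = ChunkScope.new

# First "chunk": defines one instance variable and one local variable
scope.run('@kept = [1, 2, 3]; dropped = 42')

# Second "chunk": the instance variable is still there, the local is gone
kept_value = scope.run('@kept')
local_survived = scope.run('defined?(dropped) ? true : false')

puts kept_value.inspect   # prints [1, 2, 3]
puts local_survived       # prints false
```

<p>This is why the examples in this document write ‘@vec’ rather than ‘vec’ whenever a value is needed in a later chunk.</p>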
541
570
  </div>
542
571
  <div id="accessing-r-from-ruby" class="section level3">
543
572
  <h3>Accessing R from Ruby</h3>
@@ -550,7 +579,7 @@ puts @vec * @vec2</code></pre>
550
579
  </div>
551
580
  <div id="ruby-plotting" class="section level3">
552
581
  <h3>Ruby Plotting</h3>
553
- <p>We have seen an example of plotting with R. Plotting with Ruby does not require anything different from plotting with R. In the following example we plot a diverging bar graph using the ‘mtcars’ dataframe from R. This data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). The ten aspects are:</p>
582
+ <p>We have seen an example of plotting with R. Plotting with Ruby does not require anything different from plotting with R. In the following example, we plot a diverging bar graph using the ‘mtcars’ dataframe from R. This data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). The ten aspects are:</p>
554
583
  <ul>
555
584
  <li>mpg: Miles/(US) gallon</li>
556
585
  <li>cyl: Number of cylinders</li>
@@ -564,17 +593,50 @@ puts @vec * @vec2</code></pre>
564
593
  <li>gear: Number of forward gears</li>
565
594
  <li>carb: Number of carburetors</li>
566
595
  </ul>
596
+ <pre class="ruby"><code># copy the R variable :mtcars to the Ruby mtcars variable
597
+ @mtcars = ~:mtcars
598
+
599
+ # create a new column 'car_name' to store the car names so that it can be
600
+ # used for plotting. The 'rownames' of the data frame cannot be used as
601
+ # data for plotting
602
+ @mtcars.car_name = R.rownames(:mtcars)
603
+
604
+ # compute normalized mpg and add it to a new column called mpg_z
605
+ # Note that the mean value for mpg can be obtained by calling the 'mean'
606
+ # function on the vector 'mtcars.mpg'. The same with the standard
607
+ # deviation 'sd'. The vector is then rounded to two digits with 'round 2'
608
+ @mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
609
+
610
+ # create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
611
+ # that looks at every element of the mpg_z vector and if the value is below
612
+ # 0, returns 'below', otherwise returns 'above'
613
+ @mtcars.mpg_type = (@mtcars.mpg_z &lt; 0).ifelse(&quot;below&quot;, &quot;above&quot;)
614
+
615
+ # order the mtcars data set by the mpg_z vector from smaller to larger values
616
+ @mtcars = @mtcars[@mtcars.mpg_z.order, :all]
617
+
618
+ # convert the car_name column to a factor to retain sorted order in plot
619
+ @mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
620
+
621
+ # let's look at the first records of the final data frame
622
+ puts @mtcars.head</code></pre>
623
+ <pre><code>## mpg cyl disp hp drat wt qsec vs am gear carb
624
+ ## Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
625
+ ## Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
626
+ ## Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
627
+ ## Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
628
+ ## Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
629
+ ## Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
630
+ ## car_name mpg_z mpg_type
631
+ ## Cadillac Fleetwood Cadillac Fleetwood -1.61 below
632
+ ## Lincoln Continental Lincoln Continental -1.61 below
633
+ ## Camaro Z28 Camaro Z28 -1.13 below
634
+ ## Duster 360 Duster 360 -0.96 below
635
+ ## Chrysler Imperial Chrysler Imperial -0.89 below
636
+ ## Maserati Bora Maserati Bora -0.84 below</code></pre>
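<p>For readers who want to check the normalization step without Galaaz, the following plain-Ruby sketch reproduces the same arithmetic on a small, made-up vector. The helper names ‘sample_sd’ and ‘z_scores’ and the input values are assumptions introduced for this illustration; the sketch divides by n - 1, as R’s ‘sd’ function does.</p>

```ruby
# Sample standard deviation (divides by n - 1, as R's sd() does)
def sample_sd(values)
  mean = values.sum.to_f / values.size
  Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.size - 1))
end

# Normalized values: (x - mean) / sd, rounded to two digits
def z_scores(values)
  mean = values.sum.to_f / values.size
  sd = sample_sd(values)
  values.map { |v| ((v - mean) / sd).round(2) }
end

# Made-up mileage values, used only for illustration
mpg = [30.0, 10.0, 20.0, 25.0, 15.0]

mpg_z = z_scores(mpg)

# Equivalent of the vectorized 'ifelse': below 0 is 'below', else 'above'
mpg_type = mpg_z.map { |z| z < 0 ? 'below' : 'above' }

# Equivalent of ordering the rows by mpg_z, from smaller to larger values
rows = mpg.zip(mpg_z, mpg_type).sort_by { |_, z, _| z }

rows.each { |m, z, type| puts "#{m} #{z} #{type}" }
```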
567
637
  <pre class="ruby"><code>require 'ggplot'
568
638
 
569
- mtcars = ~:mtcars
570
-
571
- mtcars.car_name = mtcars.rownames # create new column for car names
572
- mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean) / mtcars.mpg.sd).round 2
573
- mtcars.mpg_type = (mtcars.mpg_z &lt; 0).ifelse('below', 'above')
574
- mtcars = mtcars[mtcars.mpg_z.order, :all]
575
- mtcars.car_name = R.factor(mtcars.car_name, levels: mtcars.car_name)
576
-
577
- puts mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
639
+ puts @mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
578
640
  R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
579
641
  R.scale_fill_manual(name: 'Mileage',
580
642
  labels: R.c('Above Average', 'Below Average'),
@@ -591,7 +653,6 @@ puts mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
591
653
  ```{rb puts @b}
592
654
  ```
593
655
  and is followed by some other text!</code></pre>
594
- <p>Note that it is important not to add any new line before of after the code block if we want everything to be in only one line, resulting in the following sentence with inline Ruby code</p>
595
656
  <div style="margin-bottom:30px;">
596
657
 
597
658
  </div>
@@ -599,21 +660,22 @@ and is followed by some other text!</code></pre>
599
660
  <div style="margin-bottom:30px;">
600
661
 
601
662
  </div>
663
+ <p>Note that it is important not to add any new line before or after the code block if we want everything to be on a single line, resulting in the following sentence with inline Ruby code.</p>
602
664
  </div>
603
665
  <div id="the-outputs-function" class="section level3">
604
666
  <h3>The ‘outputs’ function</h3>
605
- <p>He have previously used the standard ‘puts’ method in Ruby chunks in order to get some output. As can be seen, the result of a ‘puts is formatted inside a white box that follows the code block. Many times however, we would like to do some processing in the Ruby chunk and have the result of this processing generate and output that is included in the document as if we had typed it in R markdown.</p>
606
- <p>For example, suppose we want to create a new heading in our document, but the heading phrase is the result of some code processing: maybe it’s the first line of a file we are going to read. Method ‘outputs’ adds its output as if typed in the R markdown document.</p>
607
- <p>Take now a look at variable ‘<span class="citation">@c</span>’ (it was defined in a previous block above) as ‘<span class="citation">@c</span> = “The ‘outputs’ function”. “The ‘outputs’ function” is actually the name of this section and it was created using the ’outputs’ function inside a Ruby chunk.</p>
667
+ <p>We have previously used the standard ‘puts’ method in Ruby chunks in order to produce output. The result of a ‘puts’, as seen in all previous chunks that use it, is formatted inside a white box that follows the code block. Many times, however, we would like to do some processing in the Ruby chunk and have the result of this processing generate an output that is included in the document as if we had typed it in the <strong>R markdown</strong> document.</p>
668
+ <p>For example, suppose we want to create a new heading in our document, but the heading phrase is the result of some code processing: maybe it’s the first line of a file we are going to read. Method ‘outputs’ adds its output as if typed in the <strong>R markdown</strong> document.</p>
669
+ <p>Now take a look at variable ‘@c’ (it was defined in a previous block above as ‘@c = “The ‘outputs’ function”’). “The ‘outputs’ function” is actually the name of this section, and it was created using the ‘outputs’ function inside a Ruby chunk.</p>
608
670
  <p>The ruby chunk to generate this heading is:</p>
609
671
  <pre><code>```{ruby heading}
610
672
  outputs &quot;### #{@c}&quot;
611
673
  ```</code></pre>
612
- <p>The three ‘###’ are the way we add a Heading 3 in R markdown.</p>
674
+ <p>The ‘###’ prefix is the way we add a Heading 3 in <strong>R markdown</strong>.</p>
613
675
  </div>
614
676
  <div id="html-output-from-ruby-chunks" class="section level3">
615
677
  <h3>HTML Output from Ruby Chunks</h3>
616
- <p>We’ve just seen the use of method ‘outputs’ to add text to the the R markdown document. This technique can also be used to add HTML code to the document. In R markdown any html code typed directly in the document will be properly rendered.<br />
678
+ <p>We’ve just seen the use of method ‘outputs’ to add text to the <strong>R markdown</strong> document. This technique can also be used to add HTML code to the document. In <strong>R markdown</strong>, any HTML code typed directly in the document will be properly rendered.<br />
617
679
  Here, for instance, is a table definition in HTML and its output in the document:</p>
618
680
  <pre><code>&lt;table style=&quot;width:100%&quot;&gt;
619
681
  &lt;tr&gt;
@@ -673,12 +735,10 @@ Jackson
673
735
  <div style="margin-bottom:30px;">
674
736
 
675
737
  </div>
676
- <p>But manually creating HTML output is not always easy or desirable. The above table certainly looks ugly. The ‘kableExtra’ library is a great library for creating beautiful tables. Take a look at <a href="https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html" class="uri">https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html</a></p>
738
+ <p>But manually creating HTML output is not always easy or desirable, especially if we intend the document to be rendered in other formats, for example, as LaTeX. Also, the above table looks ugly. The ‘kableExtra’ library is a great library for creating beautiful tables. Take a look at <a href="https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html" class="uri">https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html</a></p>
677
739
  <p>In the next chunk, we output the ‘mtcars’ dataframe from R in a nicely formatted table. Note that we retrieve the mtcars dataframe by using ‘~:mtcars’.</p>
678
740
  <pre class="ruby"><code>R.install_and_loads('kableExtra')
679
741
  outputs (~:mtcars).kable.kable_styling</code></pre>
680
- <pre><code>## The following packages are missing and will be installed:
681
- ## [1] &quot;kableExtra&quot;</code></pre>
682
742
  <table class="table" style="margin-left: auto; margin-right: auto;">
683
743
  <thead>
684
744
  <tr>
@@ -1939,15 +1999,15 @@ Volvo 142E
1939
1999
  </tbody>
1940
2000
  </table>
1941
2001
  </div>
1942
- <div id="including-ruby-files" class="section level3">
1943
- <h3>Including Ruby files</h3>
1944
- <p>R is a language that was created to be easy and fast for statisticians to use. As far as I know (and please correct me if you think otherwise), tt was not a language to be used for developing large systems. Of course, there are large systems and libraries in R, but the focus of the language is for developing statistical models and distribute that to peers.</p>
1945
- <p>Ruby on the other hand, is a language for large software development. Systems written in Ruby will have dozens, hundreds or even thousands of files. In order to document a large system with literate programming we cannot expect the developer to add all the files in a single ‘.Rmd’ file. gKnit provides the ‘include’ chunk engine to include a Ruby file as if it had being typed in the ‘.Rmd’ file.</p>
1946
- <p>To include a file, the following chunk should be created, where <filename> is the name of the file to be include and where the extension, if it is ‘.rb’, does not need to be added. If the ‘relative’ option is not included, then it is treated as TRUE. When ‘relative’ is true, ‘require_relative’ semantics is used to load the file, when false, Ruby’s $LOAD_PATH is searched to find the file and it is ’require’d.</p>
2002
+ <div id="including-ruby-files-in-a-chunk" class="section level3">
2003
+ <h3>Including Ruby files in a chunk</h3>
2004
+ <p>R is a language that was created to be easy and fast for statisticians to use. As far as I know, it was not designed for developing large systems. Of course, there are large systems and libraries in R, but the focus of the language is on developing statistical models and distributing them to peers.</p>
2005
+ <p>Ruby, on the other hand, is a language for large software development. Systems written in Ruby will have dozens, hundreds or even thousands of files. To document a large system with literate programming, we cannot expect the developer to add all the files in a single ‘.Rmd’ file. gKnit provides the ‘include’ chunk engine to include a Ruby file as if it had been typed in the ‘.Rmd’ file.</p>
2006
+ <p>To include a file, the following chunk should be created, where <filename> is the name of the file to be included; the ‘.rb’ extension does not need to be added. If the ‘relative’ option is not given, it is treated as TRUE. When ‘relative’ is TRUE, Ruby’s ‘require_relative’ semantics is used to load the file; when it is FALSE, Ruby’s $LOAD_PATH is searched to find the file and it is ‘require’d.</p>
1947
2007
  <pre><code>```{include &lt;filename&gt;, relative = &lt;TRUE/FALSE&gt;}
1948
2008
  ```</code></pre>
1949
- <p>Here we include file ‘model.rb which is in the same directory of this blog.<br />
1950
- This code uses R ‘caret’ package to split a dataset in a train and test sets. The ‘caret’ package is a very important a useful package for doing Data Analysis, it has hundreds of functions for all steps of the Data Analysis workflow. To just split a dataset it is using the proverbial cannon to kill the fly. We use it here only to show that integrating Ruby and R and using even a very comples package as ‘caret’ is trivial with Galaaz.</p>
2009
+ <p>Below we include the file ‘model.rb’, which is in the same directory as this blog.<br />
2010
+ This code uses R’s ‘caret’ package to split a dataset into training and test sets. The ‘caret’ package is a very important and useful package for data analysis; it has hundreds of functions for all steps of the data analysis workflow. To use ‘caret’ just to split a dataset is like using the proverbial cannon to kill a fly. We use it here only to show that integrating Ruby and R, even with a package as complex as ‘caret’, is trivial with Galaaz.</p>
1951
2011
  <p>A word of advice: the ‘caret’ package has lots of dependencies and installing it in a Linux system is a time consuming operation. Method ‘R.install_and_loads’ will install the package if it is not already installed and can take a while.</p>
1952
2012
  <pre><code>```{include model}
1953
2013
  ```</code></pre>
@@ -2013,8 +2073,8 @@ puts model.test.head</code></pre>
2013
2073
  </div>
2014
2074
  <div id="documenting-gems" class="section level3">
2015
2075
  <h3>Documenting Gems</h3>
2016
- <p>gKnit also allows developers to document and load files that are not in the same directory of the ‘.Rmd’ file. When using ‘relative = FALSE’ in a chunk header, gKnit will look for the file in Ruby’s $LOAD_PATH and load it if found.</p>
2017
- <p>Here is an example of loading the ‘find.rb’ file from TruffleRuby.</p>
2076
+ <p>gKnit also allows developers to document and load files that are not in the same directory of the ‘.Rmd’ file.</p>
2077
+ <p>Here is an example of loading the ‘find.rb’ file from TruffleRuby. In this example, ‘relative’ is set to FALSE, so Ruby will look for the file in its $LOAD_PATH, and the user does not need to know its directory.</p>
2018
2078
  <pre><code>```{include find, relative = FALSE}
2019
2079
  ```</code></pre>
2020
2080
  <pre class="include"><code># frozen_string_literal: true
@@ -2109,7 +2169,7 @@ end</code></pre>
2109
2169
  </div>
2110
2170
  <div id="converting-to-pdf" class="section level2">
2111
2171
  <h2>Converting to PDF</h2>
2112
- <p>One of the beauties of knitr is that the same input can be converted to many different outputs. One very useful format, is, of course, PDF. In order to converted an R markdown file to PDF it is necessary to have LaTeX installed on the system. We will not explain here how to install LaTeX as there are plenty of documents on the web showing how to proceed.</p>
2172
+ <p>One of the beauties of knitr is that the same input can be converted to many different outputs. One very useful format is, of course, PDF. To convert an <strong>R markdown</strong> file to PDF, LaTeX must be installed on the system. We will not explain here how to install LaTeX, as there are plenty of documents on the web showing how to proceed.</p>
2113
2173
  <p>gKnit comes with a simple LaTeX style file for gknitting this blog as a PDF document. Here is the Yaml header to generate this blog in PDF format instead of HTML:</p>
2114
2174
  <pre><code>---
2115
2175
  title: &quot;gKnit - Ruby and R Knitting with Galaaz in GraalVM&quot;
@@ -2119,15 +2179,16 @@ date: &quot;29 October 2018&quot;
2119
2179
  output:
2120
2180
  pdf_document:
2121
2181
  includes:
2122
- in_header: [&quot;../../sty/galaaz.sty&quot;]
2123
- number_sections: yes
2182
+ in\_header: [&quot;../../sty/galaaz.sty&quot;]
2183
+ number\_sections: yes
2124
2184
  ---</code></pre>
2125
2185
  </div>
2126
2186
  </div>
2127
2187
  <div id="conclusion" class="section level1">
2128
2188
  <h1>Conclusion</h1>
2129
- <p>One of the promises of GraalVM is that users/developers will be able to use the best tool for their task at hand, independently of the programming language the tool was written. Galaaz and gKnit are not trivial implementations atop the GraalVM and Truffle interop messages; however, the time and effort it took to wrap Ruby over R - Galaaz - (not finished yet) or to wrap Knitr with gKnit is a fraction of a fraction of a fraction of the time require to implement the original tools. Trying to reimplement all R packages in Ruby would require the same effort it is taking Python to implement NumPy, Panda and all supporting libraries and it is unlikely that this effort would ever be done. GraalVM has allowed Ruby to profit “almost for free” from this huge set of libraries and tools that make R one of the most used languages for data analysis and machine learning.</p>
2130
- <p>More interesting though than being able to wrap the R libraries with Ruby, is that Ruby adds value to R, by allowing developers to use powerful and modern constructs for code reuse that are not the strong points of R. As shown in this blog, R and Ruby can easily communicate and R can be structured in classes and modules in a way that greatly expands its power and readability.</p>
2189
+ <p>To do reproducible research, one of the basic tools needed is a system that allows “literate programming”, where text, code and possibly a set of files can be compiled into a report that can be easily distributed to peers. Peers should be able to use this same set of files to rerun the compilation on their own, obtaining exactly the same report. gKnit is such a system for Ruby and R. It uses <strong>R Markdown</strong> to integrate text and code chunks, where code chunks can either be part of the <strong>R Markdown</strong> file or be imported from files in the system. Ideally, in reproducible research, all the files needed to rebuild a report should be easily packed together (in the same zipped directory) and distributed to peers for re-execution.</p>
2190
+ <p>One of the promises of Oracle’s GraalVM is that users/developers will be able to use the best tool for their task at hand, independently of the programming language the tool was written in. We developed and implemented Galaaz atop GraalVM and the Truffle interop messages, and the time and effort to wrap Ruby over R (Galaaz) or to wrap Knitr with gKnit was a fraction of a fraction of a fraction (one person working a couple of hours a day for approximately six months) of the time required to implement the original tools. Trying to reimplement all R packages in Ruby would require the same effort it is taking Python to implement NumPy, Pandas and all supporting libraries, and it is unlikely that this effort would ever be made. GraalVM has allowed Ruby to profit “almost for free” from this huge set of libraries and tools that make R one of the most used languages for data analysis and machine learning.</p>
2191
+ <p>More interesting than wrapping the R libraries with Ruby is that Ruby adds value to R by allowing developers to use powerful and modern constructs for code reuse that are not the strong points of R. As shown in this blog, R and Ruby can easily communicate, and R can be structured in classes and modules in a way that greatly expands its power and readability.</p>
2131
2192
  </div>
2132
2193
  <div id="installing-gknit" class="section level1">
2133
2194
  <h1>Installing gKnit</h1>
@@ -2159,6 +2220,17 @@ output:
2159
2220
  </ul>
2160
2221
  </div>
2161
2222
  </div>
2223
+ <div id="references" class="section level1 unnumbered">
2224
+ <h1>References</h1>
2225
+ <div id="refs" class="references">
2226
+ <div id="ref-Knuth:literate_programming">
2227
+ <p>Knuth, Donald E. 1984. “Literate Programming.” <em>Comput. J.</em> 27 (2). Oxford, UK: Oxford University Press: 97–111. <a href="https://doi.org/10.1093/comjnl/27.2.97" class="uri">https://doi.org/10.1093/comjnl/27.2.97</a>.</p>
2228
+ </div>
2229
+ <div id="ref-Wilkinson:grammar_of_graphics">
2230
+ <p>Wilkinson, Leland. 2005. <em>The Grammar of Graphics (Statistics and Computing)</em>. Berlin, Heidelberg: Springer-Verlag.</p>
2231
+ </div>
2232
+ </div>
2233
+ </div>
2162
2234
 
2163
2235
 
2164
2236
 
@@ -5,6 +5,7 @@ author:
5
5
  - "Daniel Mossé - University of Pittsburgh"
6
6
  tags: [Tech, Data Science, Ruby, R, GraalVM]
7
7
  date: "20/02/2019"
8
+ bibliography: stats.bib
8
9
  output:
9
10
  html_document:
10
11
  self_contained: true
@@ -19,7 +20,8 @@ output:
19
20
 
20
21
  # Introduction
21
22
 
22
- The idea of "literate programming" was first introduced by Donald Knuth in the 1980's.
23
+ The idea of "literate programming" was first introduced by Donald Knuth in the
24
+ 1980's [@Knuth:literate_programming].
23
25
  The main intention of this approach was to develop software interspersing macro snippets,
24
26
  traditional source code, and a natural language such as English in a document
25
27
  that could be compiled into
@@ -35,19 +37,22 @@ single document or set of documents that when distributed to peers could be reru
35
37
  the same output and reports.
36
38
 
37
39
  The R community has put a great deal of effort in reproducible research. In 2002, Sweave was
38
- introduced and it allowed mixing R code with Latex generating high quality PDF documents. Those
39
- documents could include the code, the result of executing the code, graphics and text. This
40
- contained the whole narrative to reproduce the research. But Sweave had many problems and in
41
- 2012, Knitr, developed by Yihui Xie from RStudio was released, solving many of the long lasting
42
- problems from Sweave and including in one single package many extensions and add-on packages that
40
+ introduced and it allowed mixing R code with Latex generating high quality PDF documents. A
41
+ Sweave
42
+ document could include code, the results of executing the code, graphics and text such that it
43
+ contained the whole narrative to reproduce the research. In
44
+ 2012, Knitr, developed by Yihui Xie from RStudio was released to replace Sweave and to
45
+ consolidate in one single package the many extensions and add-on packages that
43
46
  were necessary for Sweave.
44
47
 
45
- With Knitr, R markdown was also developed, an extension to the
46
- Markdown format. With R markdown and Knitr it is possible to generate reports in a multitude
47
- of formats such as HTML, markdown, Latex, PDF, dvi, etc. R markdown also allows the use of
48
- multiple programming languages in the same document. In R markdown text is interspersed with
48
+ With Knitr, __R markdown__ was also developed, an extension to the
49
+ Markdown format. With __R markdown__ and Knitr it is possible to generate reports in a multitude
50
+ of formats such as HTML, markdown, Latex, PDF, dvi, etc. __R markdown__ also allows the use of
51
+ multiple programming languages such as R, Ruby, Python, etc. in the same document.
52
+
53
+ In __R markdown__, text is interspersed with
49
54
  code chunks that can be executed and both the code and its results can become
50
- part of the final report. Although R markdown allows multiple programming languages in the
55
+ part of the final report. Although __R markdown__ allows multiple programming languages in the
51
56
  same document, only R and Python (with
52
57
  the reticulate package) can persist variables between chunks. For other languages, such as
53
58
  Ruby, every chunk will start a new process and thus all data is lost between chunks, unless it
@@ -55,46 +60,102 @@ is somehow stored in a data file that is read by the next chunk.
55
60
 
56
61
  Being able to persist data
57
62
  between chunks is critical for literate programming otherwise the flow of the narrative is lost
58
- by all the effort of having to save data and then reload it. Probably, because of
59
- this impossibility,
60
- it is very rare to see any R markdown document in the Ruby community. Also, the use of
61
- R markdown for the Ruby community would also require the Ruby developer to download R and
62
- have some minimal knowledge of Knitr.
63
+ by all the effort of having to save data and then reload it. Although this might, at first, seem like
64
+ a small nuisance, not being able to persist data between chunks is a major issue. For example, let's
65
+ take a look at the following simple example in which we want to show how to create a list and then
66
+ use it. Let's first assume that data cannot be persisted between chunks. In the next chunk we
67
+ create a list, then we would need to save it to file, but to save it, we need somehow to marshal the
68
+ data into a binary format:
69
+
70
+
71
+ ```ruby
72
+ lst = R.list(a: 1, b: 2, c: 3)
73
+ lst.saveRDS("lst.rds")
74
+ ```
75
+ then, in the next chunk, where variable 'lst' is used, we need to read its value back:
76
+
77
+
78
+ ```ruby
79
+ lst = R.readRDS("lst.rds")
80
+ puts lst
81
+ ```
82
+
83
+ ```
84
+ ## $a
85
+ ## [1] 1
86
+ ##
87
+ ## $b
88
+ ## [1] 2
89
+ ##
90
+ ## $c
91
+ ## [1] 3
92
+ ```
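
For comparison, the same save-and-reload overhead can be sketched in plain Ruby, without Galaaz. This is only an illustration: the helper names `save_value` and `load_value` and the file name are hypothetical, and Ruby's standard `Marshal` module stands in for R's `saveRDS`/`readRDS`.

```ruby
require 'tmpdir'

# Hypothetical helper: marshal a value into a binary format and write it to a file.
def save_value(path, value)
  File.binwrite(path, Marshal.dump(value))
end

# Hypothetical helper: read the file back and rebuild the value.
def load_value(path)
  Marshal.load(File.binread(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'lst.bin')

  # "Chunk 1": create the data and persist it by hand.
  save_value(path, { a: 1, b: 2, c: 3 })

  # "Chunk 2": a fresh process would first have to reload the data.
  lst = load_value(path)
  puts lst.inspect
end
```

Every variable shared between two chunks would need its own pair of calls like these.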
93
+
94
+ Now, any piece of code may have dozens of variables that we might want to use and reuse between chunks.
95
+ Clearly, such an approach becomes quickly unmanageable. Probably, because of
96
+ this problem, it is very rare to see any __R markdown__ document in the Ruby community.
97
+
98
+ When variables can be used across chunks, then no overhead is needed:
99
+
100
+
101
+ ```ruby
102
+ @lst = R.list(a: 1, b: 2, c: 3)
103
+ # any other code can be added here
104
+ ```
105
+
106
+
107
+ ```ruby
108
+ puts @lst
109
+ ```
110
+
111
+ ```
112
+ ## $a
113
+ ## [1] 1
114
+ ##
115
+ ## $b
116
+ ## [1] 2
117
+ ##
118
+ ## $c
119
+ ## [1] 3
120
+ ```
63
121
 
64
122
  In the Python community, the same effort to have code and text in an integrated environment
65
123
  started around the first decade of 2000. In 2006 iPython 0.7.2 was released. In 2014,
66
124
  Fernando Pérez, spun off project Jupyter from iPython creating a web-based interactive
67
125
  computation environment. Jupyter can now be used with many languages, including Ruby with the
68
- iruby gem (https://github.com/SciRuby/iruby). I am not sure if multiple languages can be used
69
- in a Jupyter notebook and if variables can persist between chunks.
126
+ iruby gem (https://github.com/SciRuby/iruby). In order to have multiple languages in a Jupyter
127
+ notebook the SoS kernel was developed (https://vatlab.github.io/sos-docs/).
70
128
 
71
129
  # gKnitting a Document
72
130
 
73
- This document describes gKnit. gKnit uses Knitr and R markdown to knit a document in Ruby or R
74
- and output it in any of the available formats for R markdown.
131
+ This document describes gKnit. gKnit is based on knitr and __R markdown__ and can knit a document
132
+ written both in Ruby and/or R and output it in any of the available formats of __R markdown__. gKnit
133
+ allows Ruby developers to do literate programming and reproducible research by allowing them to
134
+ have in a single document, text and code.
135
+
75
136
  gKnit runs atop GraalVM and Galaaz (an integration
76
- library between Ruby and R). In gKnit, Ruby variables are persisted between chunks, making
77
- it an ideal solution for literate programming in this language. Also, since it is based on
78
- Galaaz, Ruby chunks can have access to R variables and Polyglot Programming with Ruby and R
79
- is quite natural.
137
+ library between Ruby and R - see below). In gKnit, Ruby variables are persisted between
138
+ chunks, making it an ideal solution for literate programming in this language. Also,
139
+ since it is based on Galaaz, Ruby chunks can have access to R variables and Polyglot Programming
140
+ with Ruby and R is quite natural.
80
141
 
81
- Galaaz has been describe already in the following posts:
142
+ Galaaz has already been described in the following posts:
82
143
 
83
144
  * https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021.
84
145
  * https://medium.freecodecamp.org/how-to-make-beautiful-ruby-plots-with-galaaz-320848058857
85
146
 
86
- This is not a blog post on R markdown, and the interested user is directed to the following links
147
+ This is not a blog post on __R markdown__, and the interested user is directed to the following links
87
148
  for detailed information on its capabilities and use.
88
149
 
89
150
  * https://rmarkdown.rstudio.com/ or
90
151
  * https://bookdown.org/yihui/rmarkdown/
91
152
 
92
- Here, we will describe quickly the main aspects of R markdown, so the user can start gKnitting
93
- Ruby and R documents quickly.
153
+ In this post, we will describe just the main aspects of __R markdown__, so the user can start
154
+ gKnitting Ruby and R documents quickly.
94
155
 
95
156
  ## The Yaml header
96
157
 
97
- An R markdown document should start with a Yaml header and be stored in a file with
158
+ An __R markdown__ document should start with a Yaml header and be stored in a file with
98
159
  '.Rmd' extension. This document has the following header for gKnitting an HTML document.
99
160
 
100
161
  ```
@@ -118,7 +179,7 @@ output:
118
179
 
119
180
  For more information on the options in the Yaml header, check https://bookdown.org/yihui/rmarkdown/html-document.html.
120
181
 
121
- ## R Markdown formatting
182
+ ## __R Markdown__ formatting
122
183
 
123
184
  Document formatting can be done with simple markups such as:
124
185
 
@@ -154,7 +215,7 @@ Ordered Lists
154
215
  + Item 3b
155
216
  ```
156
217
 
157
- Please, go to https://rmarkdown.rstudio.com/authoring_basics.html, for more R markdown formatting.
218
+ For more R markdown formatting go to https://rmarkdown.rstudio.com/authoring_basics.html.
158
219
 
159
220
  ### R chunks
160
221
 
@@ -170,8 +231,7 @@ any optional chunk_label and options, as shown bellow:
170
231
  ````
171
232
 
172
233
  for instance, let's add an R chunk to the document labeled 'first_r_chunk'. This is
173
- a very simple code just to create a variable and print it out. The code block should
174
- be defined as follows:
234
+ a very simple code just to create a variable and print it out, as follows:
175
235
 
176
236
  ````
177
237
  ```{r first_r_chunk}
@@ -180,7 +240,7 @@ print(vec)
180
240
  ```
181
241
  ````
182
242
 
183
- If this block is added to an R markdown document and gKnitted the result will be:
243
+ If this block is added to an __R markdown__ document and gKnitted the result will be:
184
244
 
185
245
 
186
246
  ```r
@@ -210,9 +270,9 @@ and we only see the execution result in a white box
210
270
  ## [1] 10 40 90
211
271
  ```
212
272
 
213
- A description of the available chunk options can be found in the documentation cited above.
273
+ A description of the available chunk options can be found in https://yihui.name/knitr/.
214
274
 
215
- Let's add another R chunkd with a function definition. In this example, a vector
275
+ Let's add another R chunk with a function definition. In this example, a vector
216
276
  'r_vec' is created and
217
277
  a new function 'reduce_sum' is defined. The chunk specification is
218
278
 
@@ -226,8 +286,8 @@ reduce_sum <- function(...) {
226
286
  ```
227
287
  ````
228
288
 
229
- and this is how it will look like once executed. From now on, we will not
230
- show the chunk definition any longer.
289
+ and this is what it will look like once executed. From now on, to be concise in the
290
+ presentation we will not show chunk definitions any longer.
231
291
 
232
292
 
233
293
 
@@ -264,12 +324,29 @@ this document. Note that there is no directive in the code to include the image
264
324
  occurs automatically. The 'mpg' dataframe is natively available to R and to Galaaz as
265
325
  well.
266
326
 
327
+ For the reader not knowledgeable of ggplot, ggplot is a graphics library based on "the
328
+ grammar of graphics" [@Wilkinson:grammar_of_graphics]. The idea of the grammar of graphics
329
+ is to build a graphic by adding layers to the plot. More information can be found in
330
+ https://towardsdatascience.com/a-comprehensive-guide-to-the-grammar-of-graphics-for-effective-visualization-of-multi-dimensional-1f92b4ed4149.
331
+
332
+ In the plot below the 'mpg' dataset from the ggplot2 package is used. "The data concerns city-cycle fuel
333
+ consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5
334
+ continuous attributes." (Quinlan, 1993)
335
+
336
+ First, the 'mpg' dataset is filtered to extract only cars from the following manufacturers: Audi, Ford,
337
+ Honda, and Hyundai and stored in the 'mpg_select' variable. Then, the selected dataframe is passed
338
+ to the ggplot function specifying in the aesthetic method (aes) that 'displacement' (disp) should
339
+ be plotted in the 'x' axis and 'city mileage' should be on the 'y' axis. In the 'labs' layer we
340
+ pass the 'title' and 'subtitle' for the plot. To the basic plot 'g', geom\_jitter is added, which
341
+ plots cars from the same manufacturer with the same color (col = manufacturer) and the size of the
342
+ car point equal to its highway consumption (size = hwy). Finally, a last layer is plotted containing
343
+ a linear regression line (method = "lm") for every manufacturer.
344
+
267
345
 
268
346
  ```r
269
347
  # load package and data
270
348
  library(ggplot2)
271
349
  data(mpg, package="ggplot2")
272
- # mpg <- read.csv("http://goo.gl/uEeRGu")
273
350
 
274
351
  mpg_select <- mpg[mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]
275
352
 
@@ -300,20 +377,17 @@ available to R chunks. Future versions will add those options.
300
377
 
301
378
  In this example, the ruby chunk is called 'first_ruby_chunk'. One important
302
379
  aspect of chunk labels is that they cannot be duplicated. If a chunk label is
303
- duplicated, gKnitting will stop with an error.
380
+ duplicated, gKnit will stop with an error.
304
381
 
305
- Another relevant point with Ruby chunks is that they are evaluated in the scope
382
+ Another important point with Ruby chunks is that they are evaluated in the scope
306
383
  of a class called RubyChunk. To make sure that variables are
307
384
  available between chunks, they should be made as instance variables of the
308
385
  RubyChunk class. In the following chunk, variable '\@a', '\@b' and '\@c'
309
386
  are standard Ruby variables and '\@vec' and '\@vec2' are two vectors created
310
387
  by calling the 'c' method on the R module.
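
To see why instance variables survive between chunks while local variables do not, here is a minimal sketch in plain Ruby. The class `ChunkScope` is a hypothetical stand-in for RubyChunk, and the use of `instance_eval` is an assumption made for illustration; gKnit's actual implementation may differ.

```ruby
# ChunkScope is a hypothetical stand-in for gKnit's RubyChunk class.
class ChunkScope
  # Evaluate a chunk of Ruby source in the scope of this object.
  def run(source)
    instance_eval(source)
  end
end

scope = ChunkScope.new

# First chunk: defines one instance variable and one local variable.
scope.run('@a = [1, 2, 3]; b = 10')

# Second chunk: the instance variable is still available...
puts scope.run('@a').inspect           # prints [1, 2, 3]

# ...but the local variable from the first chunk is gone.
puts scope.run('defined?(b)').inspect  # prints nil
```

Because every chunk is evaluated against the same object, anything stored in an instance variable stays reachable, while locals end with the chunk that created them.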
311
388
 
312
- In Galaaz, the R module allows us to access R functions transparently. The 'c'
389
+ In Galaaz, the R module allows us to access R functions transparently. The 'c'
313
390
  function in R, is a function that concatenates its arguments making a vector.
314
- Calling the 'c' method in the R module is automatically converted to calling the
315
- 'c' function in R, that, through Galaaz and the Truffle interface creates the
316
- vector.
317
391
 
318
392
  It
319
393
  should be clear that there is no requirement in gknit to call or use any R
@@ -343,7 +417,7 @@ puts @vec * @vec2
343
417
  ## [1] 10 40 90
344
418
  ```
345
419
 
346
- Note that @a is a standard Ruby Array and @vec and @vec2 are vectors that behave accordingly,
420
+ Note that \@a is a standard Ruby Array and \@vec and \@vec2 are vectors that behave accordingly,
347
421
  where multiplication works as expected.
348
422
 
349
423
 
@@ -379,7 +453,7 @@ puts R.reduce_sum(~:r_vec)
379
453
  ### Ruby Plotting
380
454
 
381
455
  We have seen an example of plotting with R. Plotting with Ruby does not require
382
- anything different from plotting with R. In the following example we plot a
456
+ anything different from plotting with R. In the following example, we plot a
383
457
  diverging bar graph using the 'mtcars' dataframe from R. This data was extracted
384
458
  from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects
385
459
  of automobile design and performance for 32 automobiles (1973–74 models). The
@@ -400,17 +474,57 @@ ten aspects are:
400
474
 
401
475
 
402
476
  ```ruby
403
- require 'ggplot'
477
+ # copy the R variable :mtcars to the Ruby mtcars variable
478
+ @mtcars = ~:mtcars
404
479
 
405
- mtcars = ~:mtcars
480
+ # create a new column 'car_name' to store the car names so that it can be
481
+ # used for plotting. The 'rownames' of the data frame cannot be used as
482
+ # data for plotting
483
+ @mtcars.car_name = R.rownames(:mtcars)
484
+
485
+ # compute normalized mpg and add it to a new column called mpg_z
486
+ # Note that the mean value for mpg can be obtained by calling the 'mean'
487
+ # function on the vector 'mtcars.mpg'. The same with the standard
488
+ # deviation 'sd'. The vector is then rounded to two digits with 'round 2'
489
+ @mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
490
+
491
+ # create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
492
+ # that looks at every element of the mpg_z vector and if the value is below
493
+ # 0, returns 'below', otherwise returns 'above'
494
+ @mtcars.mpg_type = (@mtcars.mpg_z < 0).ifelse("below", "above")
495
+
496
+ # order the mtcars data set by the mpg_z vector from smaller to larger values
497
+ @mtcars = @mtcars[@mtcars.mpg_z.order, :all]
498
+
499
+ # convert the car_name column to a factor to retain sorted order in plot
500
+ @mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
406
501
 
407
- mtcars.car_name = mtcars.rownames # create new column for car names
408
- mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean) / mtcars.mpg.sd).round 2
409
- mtcars.mpg_type = (mtcars.mpg_z < 0).ifelse('below', 'above')
410
- mtcars = mtcars[mtcars.mpg_z.order, :all]
411
- mtcars.car_name = R.factor(mtcars.car_name, levels: mtcars.car_name)
502
+ # let's look at the first records of the final data frame
503
+ puts @mtcars.head
504
+ ```
505
+
506
+ ```
507
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
508
+ ## Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
509
+ ## Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
510
+ ## Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
511
+ ## Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
512
+ ## Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
513
+ ## Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
514
+ ## car_name mpg_z mpg_type
515
+ ## Cadillac Fleetwood Cadillac Fleetwood -1.61 below
516
+ ## Lincoln Continental Lincoln Continental -1.61 below
517
+ ## Camaro Z28 Camaro Z28 -1.13 below
518
+ ## Duster 360 Duster 360 -0.96 below
519
+ ## Chrysler Imperial Chrysler Imperial -0.89 below
520
+ ## Maserati Bora Maserati Bora -0.84 below
521
+ ```
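
For readers who want to check the arithmetic behind 'mpg_z' and 'mpg_type', here is a sketch of the same steps in plain Ruby, without Galaaz. The helper names and the three mileage values are hypothetical; the standard deviation divides by n - 1, as R's 'sd' does.

```ruby
# Sample standard deviation (divides by n - 1), as R's 'sd' does.
def sample_sd(values)
  mean = values.sum.to_f / values.size
  Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.size - 1))
end

# Z-score of every value, rounded to two digits.
def z_scores(values)
  mean = values.sum.to_f / values.size
  sd = sample_sd(values)
  values.map { |v| ((v - mean) / sd).round(2) }
end

# Hypothetical mileage figures, not taken from 'mtcars'.
mpg = [30.0, 10.0, 20.0]

# Pair every value with its z-score and classify it, like 'ifelse' above.
rows = mpg.zip(z_scores(mpg)).map do |value, z|
  { mpg: value, mpg_z: z, mpg_type: z < 0 ? 'below' : 'above' }
end

# Order the rows from smaller to larger z-score.
rows.sort_by { |row| row[:mpg_z] }.each { |row| puts row.inspect }
```

With Galaaz the same computation is vectorized by R, so none of these helpers are needed in the actual chunk.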
522
+
523
+
524
+ ```ruby
525
+ require 'ggplot'
412
526
 
413
- puts mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
527
+ puts @mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
414
528
  R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
415
529
  R.scale_fill_manual(name: 'Mileage',
416
530
  labels: R.c('Above Average', 'Below Average'),
@@ -438,10 +552,6 @@ This is some text with inline Ruby accessing variable \@b which has value:
438
552
  and is followed by some other text!
439
553
  ````
440
554
 
441
- Note that it is important not to add any new line before of after the code
442
- block if we want everything to be in only one line, resulting in the following sentence
443
- with inline Ruby code
444
-
445
555
  <div style="margin-bottom:30px;">
446
556
  </div>
447
557
 
@@ -452,21 +562,26 @@ and is followed by some other text!
452
562
  <div style="margin-bottom:30px;">
453
563
  </div>
454
564
 
565
+ Note that it is important not to add any new line before or after the code
566
+ block if we want everything to be in only one line, resulting in the following sentence
567
+ with inline Ruby code.
568
+
455
569
 
456
570
  ### The 'outputs' function
457
571
 
458
- He have previously used the standard 'puts' method in Ruby chunks in order to get some
459
- output. As can be seen, the result of a 'puts' is formatted inside a white box that
572
+ We have previously used the standard 'puts' method in Ruby chunks in order to produce
573
+ output. The result of a 'puts', as seen in all previous chunks that use it, is formatted
574
+ inside a white box that
460
575
  follows the code block. Many times however, we would like to do some processing in the
461
576
  Ruby chunk and have the result of this processing generate an output that is
462
- 'included' in the document as if we had typed it in R markdown.
577
+ "included" in the document as if we had typed it in the __R markdown__ document.
463
578
 
464
- For example, suppose we want to create a new 'heading' in our document, but the heading
579
+ For example, suppose we want to create a new heading in our document, but the heading
465
580
  phrase is the result of some code processing: maybe it's the first line of a file we are
466
- going to read. Method 'outputs' adds its output as if typed in the R markdown document.
581
+ going to read. Method 'outputs' adds its output as if typed in the __R markdown__ document.
467
582
 
468
- Take now a look at variable '@c' (it was defined in a previous block above) as
469
- '@c = "The 'outputs' function". "The 'outputs' function" is actually the name of this
583
+ Now take a look at variable '\@c' (it was defined in a previous block above) as
584
+ '\@c = "The 'outputs' function". "The 'outputs' function" is actually the name of this
470
585
  section and it was created using the 'outputs' function inside a Ruby chunk.
471
586
 
472
587
  The Ruby chunk to generate this heading is:
@@ -477,14 +592,14 @@ outputs "### #{@c}"
477
592
  ```
478
593
  ````
479
594
 
480
- The three '###' are the way we add a Heading 3 in R markdown.
595
+ The three '###' are the way we add a Heading 3 in __R markdown__.
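
The heading idea above can be sketched in plain Ruby. This is only an illustration: 'outputs_stub' and 'heading_from_file' are hypothetical names, and 'outputs_stub' merely stands in for gKnit's 'outputs' by returning the text that would be added to the document.

```ruby
require 'tempfile'

# Hypothetical stand-in for gKnit's 'outputs' (an assumption): it only
# returns the text that would be added to the document.
def outputs_stub(text)
  text.to_s
end

# Build a Markdown heading of the given level from the first line of a file.
def heading_from_file(path, level: 3)
  first_line = File.open(path, &:readline).chomp
  "#{'#' * level} #{first_line}"
end

# Create a small sample file whose first line is the heading phrase.
file = Tempfile.new('title')
file.write("The 'outputs' function\nrest of the file\n")
file.close

heading = outputs_stub(heading_from_file(file.path))
puts heading
file.unlink
```

In a real Ruby chunk, the string built here would be passed to 'outputs' instead of the stub.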
481
596
 
482
597
 
483
598
  ### HTML Output from Ruby Chunks
484
599
 
485
- We've just seen the use of method 'outputs' to add text to the the R markdown
486
- document. This technique can also be used to add HTML code to the document. In R
487
- markdown any html code typed directly in the document will be properly rendered.
600
+ We've just seen the use of method 'outputs' to add text to the __R markdown__
601
+ document. This technique can also be used to add HTML code to the document. In
602
+ __R markdown__, any HTML code typed directly in the document will be properly rendered.
488
603
  Here, for instance, is a table definition in HTML and its output in the document:
489
604
 
490
605
  ```
@@ -530,8 +645,10 @@ Here, for instance, is a table definition in HTML and its output in the document
530
645
  <div style="margin-bottom:30px;">
531
646
  </div>
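
Since HTML is just text, a table like the one above could also be built programmatically in a Ruby chunk. The following is a minimal sketch, not part of Galaaz: 'escape_html' and 'html_table' are hypothetical helpers and the rows are sample values; in gKnit the resulting string would be handed to 'outputs'.

```ruby
# Characters that must be escaped inside HTML text.
ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Hypothetical helper: escape text so that it is safe inside a table cell.
def escape_html(text)
  text.to_s.gsub(/[&<>"]/, ESCAPES)
end

# Hypothetical helper: build an HTML table from an array of hashes that all
# share the same keys. The keys become the column headers.
def html_table(rows)
  headers = rows.first.keys
  head = headers.map { |h| "<th>#{escape_html(h)}</th>" }.join
  body = rows.map do |row|
    cells = headers.map { |h| "<td>#{escape_html(row[h])}</td>" }.join
    "<tr>#{cells}</tr>"
  end.join("\n")
  "<table>\n<thead><tr>#{head}</tr></thead>\n<tbody>\n#{body}\n</tbody>\n</table>"
end

# Sample rows, used only for illustration.
rows = [
  { model: 'Mazda RX4', mpg: 21.0 },
  { model: 'Datsun 710', mpg: 22.8 }
]
table = html_table(rows)
puts table
```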
532
647
 
533
- But manually creating HTML output is not always easy or desirable. The above
534
- table certainly looks ugly. The 'kableExtra' library is a great library for
648
+ But manually creating HTML output is not always easy or desirable, especially
649
+ if we intend the document to be rendered in other formats, for example, as LaTeX.
650
+ Also, the above
651
+ table looks ugly. The 'kableExtra' library is a great library for
535
652
  creating beautiful tables. Take a look at https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
536
653
 
537
654
  In the next chunk, we output the 'mtcars' dataframe from R in a nicely formatted
@@ -543,11 +660,6 @@ R.install_and_loads('kableExtra')
543
660
  outputs (~:mtcars).kable.kable_styling
544
661
  ```
545
662
 
546
- ```
547
- ## The following packages are missing and will be installed:
548
- ## [1] "kableExtra"
549
- ```
550
-
551
663
  <table class="table" style="margin-left: auto; margin-right: auto;">
552
664
  <thead>
553
665
  <tr>
@@ -1017,39 +1129,38 @@ outputs (~:mtcars).kable.kable_styling
1017
1129
  </tbody>
1018
1130
  </table>
1019
1131
 
1020
- ### Including Ruby files
1132
+ ### Including Ruby files in a chunk
1021
1133
 
1022
1134
  R is a language that was created to be easy and fast for statisticians to use. As far
1023
- as I know (and please correct me if you think otherwise), tt was not a
1135
+ as I know, it was not a
1024
1136
  language to be used for developing large systems. Of course, there are large systems and
1025
1137
  libraries in R, but the focus of the language is on developing statistical models and
1026
1138
  distributing them to peers.
1027
1139
 
1028
1140
  Ruby, on the other hand, is a language for large software development. Systems written in
1029
- Ruby will have dozens, hundreds or even thousands of files. In order to document a
1030
- large system with
1031
- literate programming we cannot expect the developer to add all the files in a single '.Rmd'
1032
- file. gKnit provides the 'include' chunk engine to include a Ruby file as if it had being
1033
- typed in the '.Rmd' file.
1141
+ Ruby will have dozens, hundreds or even thousands of files. To document a
1142
+ large system with literate programming, we cannot expect the developer to add all the
1143
+ files in a single '.Rmd' file. gKnit provides the 'include' chunk engine to include
1144
+ a Ruby file as if it had been typed in the '.Rmd' file.
1034
1145
 
1035
1146
  To include a file, the following chunk should be created, where <filename> is the name of
1036
- the file to be include and where the extension, if it is '.rb', does not need to be added.
1147
+ the file to be included and where the extension, if it is '.rb', does not need to be added.
1037
1148
  If the 'relative' option is not included, then it is treated as TRUE. When 'relative' is
1038
- true, 'require_relative' semantics is used to load the file, when false, Ruby's \$LOAD_PATH
1039
- is searched to find the file and it is 'require'd.
1149
+ true, Ruby's 'require\_relative' semantics are used to load the file; when false, Ruby's
1150
+ \$LOAD_PATH is searched to find the file and it is 'require'd.
1040
1151
 
1041
1152
  ````
1042
1153
  ```{include <filename>, relative = <TRUE/FALSE>}
1043
1154
  ```
1044
1155
  ````
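
The effect of the 'relative' option can be illustrated with a small sketch. This is not gKnit's implementation: 'resolve_include' is a hypothetical helper, and resolving relative files against 'base_dir' (the directory of the '.Rmd' file) is an assumption made to mimic 'require\_relative'.

```ruby
require 'tmpdir'

# Hypothetical helper (not gKnit's code): find the file an 'include' chunk
# would load. With relative: true the file is looked up next to the document
# (base_dir); otherwise each directory in load_path is searched in order.
def resolve_include(name, relative: true, base_dir: Dir.pwd, load_path: $LOAD_PATH)
  # The '.rb' extension may be omitted, as in the chunk header.
  file = File.extname(name).empty? ? "#{name}.rb" : name
  if relative
    path = File.expand_path(file, base_dir)
    return File.file?(path) ? path : nil
  end
  load_path.map { |dir| File.join(dir.to_s, file) }.find { |path| File.file?(path) }
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'model.rb'), "# empty model\n")
  puts resolve_include('model', base_dir: dir)                    # found next to the document
  puts resolve_include('model', relative: false, load_path: [dir]) # found through the load path
  p resolve_include('missing', base_dir: dir)                     # no such file
end
```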
1045
1156
 
1046
- Here we include file 'model.rb' which is in the same directory of this blog.
1157
+ Below we include file 'model.rb', which is in the same directory as this blog.
1047
1158
  This code uses R's 'caret' package to split a dataset into train and test sets.
1048
1159
  The 'caret' package is a very important and useful package for doing Data Analysis;
1049
1160
  it has hundreds of functions for all steps of the Data Analysis workflow. To
1050
- just split a dataset it is using the proverbial cannon to kill the fly. We use
1051
- it here only to show that integrating Ruby and R and using even a very comples
1052
- package as 'caret' is trivial with Galaaz.
1161
+ use 'caret' just to split a dataset is like using the proverbial cannon to
1162
+ kill the fly. We use it here only to show that integrating Ruby and R and
1163
+ using even a very complex package such as 'caret' is trivial with Galaaz.
1053
1164
 
1054
1165
  A word of advice: the 'caret' package has lots of dependencies and installing
1055
1166
  it in a Linux system is a time-consuming operation. Method 'R.install_and_loads'
@@ -1133,10 +1244,11 @@ puts model.test.head
1133
1244
  ### Documenting Gems
1134
1245
 
1135
1246
  gKnit also allows developers to document and load files that are not in the same directory
1136
- of the '.Rmd' file. When using 'relative = FALSE' in a chunk header, gKnit will look for the
1137
- file in Ruby's \$LOAD_PATH and load it if found.
1247
+ as the '.Rmd' file.
1138
1248
 
1139
- Here is an example of loading the 'find.rb' file from TruffleRuby.
1249
+ Here is an example of loading the 'find.rb' file from TruffleRuby. In this example, relative
1250
+ is set to FALSE, so Ruby will look for the file in its $LOAD\_PATH, and the user does not
1251
+ need to know its directory.
1140
1252
 
1141
1253
  ````
1142
1254
  ```{include find, relative = FALSE}
@@ -1238,7 +1350,7 @@ end
1238
1350
  ## Converting to PDF
1239
1351
 
1240
1352
  One of the beauties of knitr is that the same input can be converted to many different outputs.
1241
- One very useful format, is, of course, PDF. In order to converted an R markdown file to PDF
1353
+ One very useful format is, of course, PDF. In order to convert an __R markdown__ file to PDF
1242
1354
  it is necessary to have LaTeX installed on the system. We will not explain here how to
1243
1355
  install LaTeX as there are plenty of documents on the web showing how to proceed.
1244
1356
 
@@ -1254,25 +1366,36 @@ date: "29 October 2018"
1254
1366
  output:
1255
1367
  pdf_document:
1256
1368
  includes:
1257
- in_header: ["../../sty/galaaz.sty"]
1258
- number_sections: yes
1369
+ in\_header: ["../../sty/galaaz.sty"]
1370
+ number\_sections: yes
1259
1371
  ---
1260
1372
  ```
1261
1373
 
1262
1374
  # Conclusion
1263
1375
 
1264
- One of the promises of GraalVM is that users/developers will be able to use the best tool
1265
- for their task at hand, independently of the programming language the tool was written. Galaaz
1266
- and gKnit are not trivial implementations atop the GraalVM and Truffle interop messages;
1267
- however, the time and effort it took to wrap Ruby over R - Galaaz - (not finished yet) or to
1268
- wrap Knitr with gKnit is a fraction of a fraction of a fraction of the time require to
1376
+ In order to do reproducible research, one of the main basic tools needed is a system that
1377
+ allows "literate programming" where text, code and possibly a set of files can be compiled
1378
+ into a report that can be easily distributed to peers. Peers should be able to use this
1379
+ same set of files to rerun the compilation on their own, obtaining the exact same original
1380
+ report. gKnit is such a system for Ruby and R. It uses __R Markdown__ to integrate
1381
+ text and code chunks, where code chunks can either be part of the __R Markdown__ file or
1382
+ be imported from files in the system. Ideally, in reproducible research, all the files
1383
+ needed to rebuild a report should be easily packed together (in the same zipped directory)
1384
+ and distributed to peers for reexecution.
1385
+
1386
+ One of the promises of Oracle's GraalVM is that users/developers will be able to use the best tool
1387
+ for their task at hand, independently of the programming language the tool was written in.
1388
+ We developed and implemented Galaaz atop the GraalVM and Truffle interop messages, and
1389
+ the time and effort to wrap Ruby over R - Galaaz - or to
1390
+ wrap Knitr with gKnit was a fraction of a fraction of a fraction (a one-man effort for a couple
1391
+ of hours a day, for approximately six months) of the time required to
1269
1392
  implement the original tools. Trying to reimplement all R packages in Ruby would require the
1270
- same effort it is taking Python to implement NumPy, Panda and all supporting libraries and it
1393
+ same effort it is taking Python to implement NumPy, Pandas and all supporting libraries and it
1271
1394
  is unlikely that this effort would ever be done. GraalVM has allowed Ruby to profit "almost
1272
1395
  for free" from this huge set of libraries and tools that make R one of the most used
1273
1396
  languages for data analysis and machine learning.
1274
1397
 
1275
- More interesting though than being able to wrap the R libraries with Ruby, is that Ruby adds
1398
+ More interesting than wrapping the R libraries with Ruby, is that Ruby adds
1276
1399
  value to R, by allowing developers to use powerful and modern constructs for code reuse that
1277
1400
  are not the strong points of R. As shown in this blog, R and Ruby can easily communicate
1278
1401
  and R can be structured in classes and modules in a way that greatly expands its power and
@@ -1303,3 +1426,6 @@ the gnu compiler and tools should be enough. I am not sure what is needed on th
1303
1426
  ## Usage
1304
1427
 
1305
1428
  * gknit \<filename\>
1429
+
1430
+ # References
1431
+