galaaz 0.4.9 → 0.4.10

Files changed (76)
  1. checksums.yaml +4 -4
  2. data/README.md +798 -285
  3. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +3 -12
  4. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +5 -7
  5. data/blogs/galaaz_ggplot/galaaz_ggplot.html +69 -29
  6. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  7. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/midwest_rb.png +0 -0
  8. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-html/scatter_plot_rb.png +0 -0
  9. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/midwest_rb.pdf +0 -0
  10. data/blogs/galaaz_ggplot/galaaz_ggplot_files/figure-latex/scatter_plot_rb.pdf +0 -0
  11. data/blogs/galaaz_ggplot/midwest.Rmd +1 -9
  12. data/blogs/gknit/gknit.Rmd +37 -40
  13. data/blogs/gknit/gknit.html +32 -30
  14. data/blogs/gknit/gknit.md +36 -37
  15. data/blogs/gknit/gknit.pdf +0 -0
  16. data/blogs/gknit/gknit.tex +35 -37
  17. data/blogs/manual/manual.Rmd +548 -125
  18. data/blogs/manual/manual.html +509 -286
  19. data/blogs/manual/manual.md +798 -285
  20. data/blogs/manual/manual.pdf +0 -0
  21. data/blogs/manual/manual.tex +2816 -0
  22. data/blogs/manual/manual_files/figure-latex/diverging_bar.pdf +0 -0
  23. data/blogs/nse_dplyr/nse_dplyr.Rmd +240 -74
  24. data/blogs/nse_dplyr/nse_dplyr.html +191 -87
  25. data/blogs/nse_dplyr/nse_dplyr.md +361 -107
  26. data/blogs/nse_dplyr/nse_dplyr.pdf +0 -0
  27. data/blogs/nse_dplyr/nse_dplyr.tex +1373 -0
  28. data/blogs/ruby_plot/ruby_plot.Rmd +61 -81
  29. data/blogs/ruby_plot/ruby_plot.html +54 -57
  30. data/blogs/ruby_plot/ruby_plot.md +48 -67
  31. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  42. data/blogs/ruby_plot/ruby_plot_files/figure-latex/dose_len.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_delivery.png +0 -0
  44. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facet_by_dose.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color.png +0 -0
  46. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_by_delivery_color2.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_decorations.png +0 -0
  48. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_jitter.png +0 -0
  49. data/blogs/ruby_plot/ruby_plot_files/figure-latex/facets_with_points.png +0 -0
  50. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_box_plot.png +0 -0
  51. data/blogs/ruby_plot/ruby_plot_files/figure-latex/final_violin_plot.png +0 -0
  52. data/blogs/ruby_plot/ruby_plot_files/figure-latex/violin_with_jitter.png +0 -0
  53. data/lib/R_interface/rdata_frame.rb +0 -12
  54. data/lib/R_interface/robject.rb +14 -14
  55. data/lib/R_interface/ruby_extensions.rb +3 -31
  56. data/lib/R_interface/rvector.rb +0 -12
  57. data/lib/gknit/knitr_engine.rb +5 -3
  58. data/lib/util/exec_ruby.rb +22 -61
  59. data/specs/tmp.rb +26 -12
  60. data/version.rb +1 -1
  61. metadata +22 -17
  62. data/bin/gknit_old_r +0 -236
  63. data/blogs/dev/dev.Rmd +0 -23
  64. data/blogs/dev/dev.md +0 -58
  65. data/blogs/dev/dev2.Rmd +0 -65
  66. data/blogs/dev/model.rb +0 -41
  67. data/blogs/dplyr/dplyr.Rmd +0 -29
  68. data/blogs/dplyr/dplyr.html +0 -433
  69. data/blogs/dplyr/dplyr.md +0 -58
  70. data/blogs/dplyr/dplyr.rb +0 -63
  71. data/blogs/galaaz_ggplot/galaaz_ggplot.log +0 -640
  72. data/blogs/galaaz_ggplot/galaaz_ggplot.md +0 -431
  73. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +0 -481
  74. data/blogs/galaaz_ggplot/midwest.png +0 -0
  75. data/blogs/galaaz_ggplot/scatter_plot.png +0 -0
  76. data/blogs/ruby_plot/ruby_plot.tex +0 -1077
@@ -447,13 +447,8 @@ $(document).ready(function () {
447
447
  </div>
448
448
  <div id="gknitting-a-document" class="section level1">
449
449
  <h1>gKnitting a Document</h1>
450
- <p>This manual has been formatted usign gKnit. gKnit uses Knitr and R markdown to knit a document in Ruby or R and output it in any of the available formats for R markdown.<br />
451
- gKnit runs atop of GraalVM, and Galaaz. In gKnit, Ruby variables are persisted between chunks, making it an ideal solution for literate programming.<br />
452
- Also, since it is based on Galaaz, Ruby chunks can have access to R variables and Polyglot Programming with Ruby and R is quite natural.</p>
453
- <p>gknit was describe in more depth in:</p>
454
- <ul>
455
- <li>xxx.xxxx.xxx</li>
456
- </ul>
450
+ <p>This manual has been formatted using gKnit. gKnit uses Knitr and R markdown to knit a document in Ruby or R and output it in any of the available formats for R markdown. gKnit runs atop GraalVM and Galaaz. In gKnit, Ruby variables are persisted between chunks, making it an ideal solution for literate programming. Also, since it is based on Galaaz, Ruby chunks can access R variables, so polyglot programming with Ruby and R is quite natural.</p>
451
+ <p><a href="https://towardsdatascience.com/how-to-do-reproducible-research-in-ruby-with-gknit-c26d2684d64e">gKnit is described in more detail here</a></p>
457
452
  </div>
458
453
  <div id="vector" class="section level1">
459
454
  <h1>Vector</h1>
@@ -501,20 +496,20 @@ Also, since it is based on Galaaz, Ruby chunks can have access to R variables an
501
496
  </table>
502
497
  <p>Single numbers, such as 4.2, and strings, such as “four point two” are still vectors, of length 1; there are no more basic types. Vectors with length zero are possible (and useful). String vectors have mode and storage mode “character”. A single element of a character vector is often referred to as a character string.</p>
503
498
  <p>To create a vector the ‘c’ (concatenate) method from the ‘R’ module should be used:</p>
504
- <pre class="ruby"><code>@vec = R.c(1, 2, 3)
505
- puts @vec</code></pre>
499
+ <pre class="ruby"><code>vec = R.c(1, 2, 3)
500
+ puts vec</code></pre>
506
501
  <pre><code>## [1] 1 2 3</code></pre>
507
- <p>Lets take a look at the type, mode and storage.mode of our vector <span class="citation">@vec</span>. In order to print this out, we are creating a data frame ‘df’ and printing it out. A data frame, for those not familiar with it, is basically a table. Here we create the data frame and add the column name by passing named parameters for each column, such as ‘typeof:’, ‘mode:’ and ’storage__mode?‘. You should also note here that the double underscore is converted to a’.’. So, when printed ‘storage__mode’ will actually print as ‘storage.mode’.</p>
502
+ <p>Let’s take a look at the type, mode and storage.mode of our vector vec. To print these, we create a data frame ‘df’ and print it. A data frame, for those not familiar with it, is basically a table. Here we create the data frame and name its columns by passing a named parameter for each column: ‘typeof:’, ‘mode:’ and ‘storage__mode:’. Note that the double underscore is converted to a ‘.’, so ‘storage__mode’ prints as ‘storage.mode’.</p>
508
503
  <p>Data frames will later be more carefully described. In R, the method used to create a data frame is ‘data.frame’, in Galaaz we use ‘data__frame’.</p>
509
- <pre class="ruby"><code>df = R.data__frame(typeof: @vec.typeof, mode: @vec.mode, storage__mode: @vec.storage__mode)
504
+ <pre class="ruby"><code>df = R.data__frame(typeof: vec.typeof, mode: vec.mode, storage__mode: vec.storage__mode)
510
505
  puts df</code></pre>
511
506
  <pre><code>## typeof mode storage.mode
512
507
  ## 1 integer numeric integer</code></pre>
513
508
  <p>To create a vector of floating point numbers, at least one of the vector’s elements must be a float, such as 1.0. R users should be careful: in R a number like ‘1’ is a float (double), and an integer must be written as ‘1L’. Galaaz follows normal Ruby rules, so the number 1 is an integer and 1.0 is a float.</p>
514
- <pre class="ruby"><code>@vec = R.c(1.0, 2, 3)
515
- puts @vec</code></pre>
509
+ <pre class="ruby"><code>vec = R.c(1.0, 2, 3)
510
+ puts vec</code></pre>
516
511
  <pre><code>## [1] 1 2 3</code></pre>
517
- <pre class="ruby"><code>df = R.data__frame(typeof: @vec.typeof, mode: @vec.mode, storage__mode: @vec.storage__mode)
512
+ <pre class="ruby"><code>df = R.data__frame(typeof: vec.typeof, mode: vec.mode, storage__mode: vec.storage__mode)
518
513
  outputs df.kable.kable_styling</code></pre>
519
514
  <table class="table" style="margin-left: auto; margin-right: auto;">
520
515
  <thead>
@@ -547,11 +542,11 @@ double
547
542
  <p>In this next example we try to create a vector with a variable ‘hello’ that has not yet been defined. This will raise an exception that is printed out. We get two return blocks, the first with a message explaining what went wrong and the second with the full backtrace of the error.</p>
548
543
  <pre class="ruby"><code>vec = R.c(1, hello, 5)</code></pre>
549
544
  <pre><code>## Message:
550
- ## undefined local variable or method `hello' for RubyChunk:Class</code></pre>
545
+ ## undefined local variable or method `hello' for #&lt;RC:0x2e0 @out_list=nil&gt;:RC</code></pre>
551
546
  <pre><code>## Message:
552
- ## (eval):1:in `exec_ruby'
553
- ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `instance_eval'
554
- ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:141:in `exec_ruby'
547
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:103:in `get_binding'
548
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:102:in `eval'
549
+ ## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:102:in `exec_ruby'
555
550
  ## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:650:in `block in initialize'
556
551
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `call'
557
552
  ## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `callback'
@@ -571,55 +566,55 @@ double
571
566
  ## &lt;REPL&gt;:5:in `&lt;repl wrapper&gt;'
572
567
  ## &lt;REPL&gt;:1</code></pre>
573
568
  <p>Here is a vector with logical values</p>
574
- <pre class="ruby"><code>@vec = R.c(true, true, false, false, true)
575
- puts @vec</code></pre>
569
+ <pre class="ruby"><code>vec = R.c(true, true, false, false, true)
570
+ puts vec</code></pre>
576
571
  <pre><code>## [1] TRUE TRUE FALSE FALSE TRUE</code></pre>
577
572
  <div id="combining-vectors" class="section level2">
578
573
  <h2>Combining Vectors</h2>
579
574
  <p>The ‘c’ function used to create vectors can also be used to combine two vectors:</p>
580
- <pre class="ruby"><code>@vec1 = R.c(10.0, 20.0, 30.0)
581
- @vec2 = R.c(4.0, 5.0, 6.0)
582
- @vec = R.c(@vec1, @vec2)
583
- puts @vec</code></pre>
575
+ <pre class="ruby"><code>vec1 = R.c(10.0, 20.0, 30.0)
576
+ vec2 = R.c(4.0, 5.0, 6.0)
577
+ vec = R.c(vec1, vec2)
578
+ puts vec</code></pre>
584
579
  <pre><code>## [1] 10 20 30 4 5 6</code></pre>
585
- <p>In galaaz, methods can be chainned (somewhat like the pipe operator in R %&gt;%, but more generic). In this next example, method ‘c’ is chainned after ‘<span class="citation">@vec1</span>’. This also looks like ‘c’ is a method of the vector, but in reallity, this is actually closer to the pipe operator. When Galaaz identifies that ‘c’ is not a method of ‘vec’ it actually tries to call ‘R.c’ with ‘<span class="citation">@vec1</span>’ as the first argument concatenated with all the other available arguments. The code bellow is automatically converted to the code above.</p>
586
- <pre class="ruby"><code>@vec = @vec1.c(@vec2)
587
- puts @vec</code></pre>
580
+ <p>In Galaaz, methods can be chained (somewhat like the pipe operator %&gt;% in R, but more generic). In this next example, method ‘c’ is chained after vec1. It looks as if ‘c’ were a method of the vector, but in reality this is closer to the pipe operator. When Galaaz identifies that ‘c’ is not a method of vec1, it calls ‘R.c’ with vec1 as the first argument, followed by all the other arguments. The code below is automatically converted to the code above.</p>
581
+ <pre class="ruby"><code>vec = vec1.c(vec2)
582
+ puts vec</code></pre>
588
583
  <pre><code>## [1] 10 20 30 4 5 6</code></pre>
589
584
  </div>
590
585
  <div id="vector-arithmetic" class="section level2">
591
586
  <h2>Vector Arithmetic</h2>
592
587
  <p>Arithmetic operations on vectors are performed element by element:</p>
593
- <pre class="ruby"><code>puts @vec1 + @vec2</code></pre>
588
+ <pre class="ruby"><code>puts vec1 + vec2</code></pre>
594
589
  <pre><code>## [1] 14 25 36</code></pre>
595
- <pre class="ruby"><code>puts @vec1 * 5</code></pre>
590
+ <pre class="ruby"><code>puts vec1 * 5</code></pre>
596
591
  <pre><code>## [1] 50 100 150</code></pre>
597
592
  <p>When vectors have different length, a recycling rule is applied to the shorter vector:</p>
598
- <pre class="ruby"><code>@vec3 = R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
599
- puts @vec4 = @vec1 + @vec3</code></pre>
593
+ <pre class="ruby"><code>vec3 = R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
594
+ puts vec4 = vec1 + vec3</code></pre>
600
595
  <pre><code>## [1] 11 22 33 14 25 36 17 28 39</code></pre>
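The recycling rule shown above can be sketched in plain Ruby, without Galaaz or R. This is only an illustration of the semantics: `recycle_add` is a hypothetical helper name, not part of Galaaz, and unlike R it does not warn when the longer length is not a multiple of the shorter one.

```ruby
# Hypothetical helper (not part of Galaaz): element-wise addition that
# reuses the shorter array cyclically, mimicking R's recycling rule.
def recycle_add(a, b)
  longer, shorter = a.length >= b.length ? [a, b] : [b, a]
  longer.each_with_index.map { |x, i| x + shorter[i % shorter.length] }
end

vec1 = [10.0, 20.0, 30.0]
vec3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# vec1 is reused three times to cover the nine elements of vec3.
p recycle_add(vec1, vec3)
```

The result has the same values as the Galaaz output above, only as a Ruby array of floats.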
601
596
  </div>
602
597
  <div id="vector-indexing" class="section level2">
603
598
  <h2>Vector Indexing</h2>
604
599
  <p>Vectors can be indexed by using the ‘[]’ operator:</p>
605
- <pre class="ruby"><code>puts @vec4[3]</code></pre>
600
+ <pre class="ruby"><code>puts vec4[3]</code></pre>
606
601
  <pre><code>## [1] 33</code></pre>
607
- <p>We can also index a vector with another vector. For example, in the code bellow, we take elements 1, 3, 5, and 7 from <span class="citation">@vec3</span>:</p>
608
- <pre class="ruby"><code>puts @vec4[R.c(1, 3, 5, 7)]</code></pre>
602
+ <p>We can also index a vector with another vector. For example, in the code below, we take elements 1, 3, 5, and 7 from vec4:</p>
603
+ <pre class="ruby"><code>puts vec4[R.c(1, 3, 5, 7)]</code></pre>
609
604
  <pre><code>## [1] 11 33 25 17</code></pre>
610
605
  <p>Repeating an index and having indices out of order is valid code:</p>
611
- <pre class="ruby"><code>puts @vec4[R.c(1, 3, 3, 1)]</code></pre>
606
+ <pre class="ruby"><code>puts vec4[R.c(1, 3, 3, 1)]</code></pre>
612
607
  <pre><code>## [1] 11 33 33 11</code></pre>
613
608
  <p>It is also possible to index a vector with a negative number or negative vector. In these cases the indexed values are not returned:</p>
614
- <pre class="ruby"><code>puts @vec4[-3]
615
- puts @vec4[-R.c(1, 3, 5, 7)]</code></pre>
609
+ <pre class="ruby"><code>puts vec4[-3]
610
+ puts vec4[-R.c(1, 3, 5, 7)]</code></pre>
616
611
  <pre><code>## [1] 11 22 14 25 36 17 28 39
617
612
  ## [1] 22 14 36 28 39</code></pre>
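As a rough illustration of the indexing rules above, the following plain-Ruby sketch emulates R-style positions on an ordinary array. The helpers `r_pick` and `r_drop` are hypothetical names introduced here, not Galaaz methods, and the sketch assumes all positions are positive integers.

```ruby
# Hypothetical helpers (not part of Galaaz) emulating R-style indexing
# on a plain Ruby array. Positions are 1-based, as in R.
def r_pick(arr, positions)
  # Repeated and out-of-order positions are allowed.
  positions.map { |pos| arr[pos - 1] }
end

def r_drop(arr, positions)
  # Equivalent to indexing with negative positions: listed elements are excluded.
  arr.reject.with_index(1) { |_, pos| positions.include?(pos) }
end

vec4 = [11.0, 22.0, 33.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]

p r_pick(vec4, [1, 3, 5, 7])
p r_pick(vec4, [1, 3, 3, 1])
p r_drop(vec4, [3])
p r_drop(vec4, [1, 3, 5, 7])
```

Native Ruby arrays are 0-based and treat negative indices as offsets from the end, which is why the helpers translate positions explicitly.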
618
613
  <p>If an index is out of range, a missing value (NA) will be reported.</p>
619
- <pre class="ruby"><code>puts @vec4[30]</code></pre>
614
+ <pre class="ruby"><code>puts vec4[30]</code></pre>
620
615
  <pre><code>## [1] NA</code></pre>
621
616
  <p>It is also possible to index a vector by range:</p>
622
- <pre class="ruby"><code>puts @vec4[(2..5)]</code></pre>
617
+ <pre class="ruby"><code>puts vec4[(2..5)]</code></pre>
623
618
  <pre><code>## [1] 22 33 14 25</code></pre>
624
619
  <p>Elements in a vector can be named using the ‘names’ attribute of a vector:</p>
625
620
  <pre class="ruby"><code>full_name = R.c(&quot;Rodrigo&quot;, &quot;A&quot;, &quot;Botafogo&quot;)
@@ -636,9 +631,9 @@ puts full_name</code></pre>
636
631
  <div id="extracting-native-ruby-types-from-a-vector" class="section level2">
637
632
  <h2>Extracting Native Ruby Types from a Vector</h2>
638
633
  <p>Vectors created with ‘R.c’ are of class R::Vector. You might have noticed that when indexing a vector, a new vector is returned, even if this vector has one single element. In order to use R::Vector with other ruby classes it might be necessary to extract the actual Ruby native type from the vector. In order to do this extraction the ‘&gt;&gt;’ operator is used.</p>
639
- <pre class="ruby"><code>puts @vec4
640
- puts @vec4 &gt;&gt; 0
641
- puts @vec4 &gt;&gt; 4</code></pre>
634
+ <pre class="ruby"><code>puts vec4
635
+ puts vec4 &gt;&gt; 0
636
+ puts vec4 &gt;&gt; 4</code></pre>
642
637
  <pre><code>## [1] 11 22 33 14 25 36 17 28 39
643
638
  ## 11.0
644
639
  ## 25.0</code></pre>
@@ -1912,22 +1907,22 @@ Volvo 142E
1912
1907
  <div id="matrix" class="section level1">
1913
1908
  <h1>Matrix</h1>
1914
1909
  <p>A matrix is a collection of elements organized as a two dimensional table. A matrix can be created by the ‘matrix’ function:</p>
1915
- <pre class="ruby"><code>@mat = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
1916
- nrow: 3,
1917
- ncol: 3)
1910
+ <pre class="ruby"><code>mat = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
1911
+ nrow: 3,
1912
+ ncol: 3)
1918
1913
 
1919
- puts @mat</code></pre>
1914
+ puts mat</code></pre>
1920
1915
  <pre><code>## [,1] [,2] [,3]
1921
1916
  ## [1,] 1 4 7
1922
1917
  ## [2,] 2 5 8
1923
1918
  ## [3,] 3 6 9</code></pre>
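The layout printed above, where consecutive values run down each column, can be reproduced in plain Ruby. This is a sketch of the fill order only, using nested arrays rather than an R matrix; none of it is Galaaz API.

```ruby
# Plain Ruby, no Galaaz: fill a 3 x 3 layout column by column.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
nrow = 3

# Each slice of nrow values is one column; transposing yields the rows.
columns = values.each_slice(nrow).to_a
rows = columns.transpose

rows.each { |row| puts row.inspect }
```

The first row holds 1, 4 and 7, matching the first printed row of the matrix.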
1924
1919
  <p>Note that matrix data is organized by column first. It is possible to organize the matrix by row first by passing an extra argument to the ‘matrix’ function:</p>
1925
- <pre class="ruby"><code>@mat_row = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
1926
- nrow: 3,
1927
- ncol: 3,
1928
- byrow: true)
1920
+ <pre class="ruby"><code>mat_row = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
1921
+ nrow: 3,
1922
+ ncol: 3,
1923
+ byrow: true)
1929
1924
 
1930
- puts @mat_row</code></pre>
1925
+ puts mat_row</code></pre>
1931
1926
  <pre><code>## [,1] [,2] [,3]
1932
1927
  ## [1,] 1 2 3
1933
1928
  ## [2,] 4 5 6
@@ -1935,31 +1930,32 @@ puts @mat_row</code></pre>
1935
1930
  <div id="indexing-a-matrix" class="section level2">
1936
1931
  <h2>Indexing a Matrix</h2>
1937
1932
  <p>A matrix can be indexed by [row, column]:</p>
1938
- <pre class="ruby"><code>puts @mat_row[1, 1]
1939
- puts @mat_row[2, 3]</code></pre>
1933
+ <pre class="ruby"><code>puts mat_row[1, 1]
1934
+ puts mat_row[2, 3]</code></pre>
1940
1935
  <pre><code>## [1] 1
1941
1936
  ## [1] 6</code></pre>
1942
1937
  <p>It is possible to index an entire row or column with the ‘:all’ keyword</p>
1943
- <pre class="ruby"><code>puts @mat_row[1, :all]
1944
- puts @mat_row[:all, 2]</code></pre>
1938
+ <pre class="ruby"><code>puts mat_row[1, :all]
1939
+ puts mat_row[:all, 2]</code></pre>
1945
1940
  <pre><code>## [1] 1 2 3
1946
1941
  ## [1] 2 5 8</code></pre>
1947
1942
  <p>Indexing with a vector is also possible for matrices. In the following example we want rows 1 and 3 and columns 2 and 3 building a 2 x 2 matrix.</p>
1948
- <pre class="ruby"><code>puts @mat_row[R.c(1, 3), R.c(2, 3)]</code></pre>
1943
+ <pre class="ruby"><code>puts mat_row[R.c(1, 3), R.c(2, 3)]</code></pre>
1949
1944
  <pre><code>## [,1] [,2]
1950
1945
  ## [1,] 2 3
1951
1946
  ## [2,] 8 9</code></pre>
1952
- <p>Matrices can be combined with functions ‘rbind’ and ‘cbind’</p>
1953
- <pre class="ruby"><code>puts @mat_row.rbind(@mat)
1954
- puts @mat_row.cbind(@mat)</code></pre>
1947
+ <p>Matrices can be combined with the function ‘rbind’:</p>
1948
+ <pre class="ruby"><code>puts mat_row.rbind(mat)</code></pre>
1955
1949
  <pre><code>## [,1] [,2] [,3]
1956
1950
  ## [1,] 1 2 3
1957
1951
  ## [2,] 4 5 6
1958
1952
  ## [3,] 7 8 9
1959
1953
  ## [4,] 1 4 7
1960
1954
  ## [5,] 2 5 8
1961
- ## [6,] 3 6 9
1962
- ## [,1] [,2] [,3] [,4] [,5] [,6]
1955
+ ## [6,] 3 6 9</code></pre>
1956
+ <p>and ‘cbind’:</p>
1957
+ <pre class="ruby"><code>puts mat_row.cbind(mat)</code></pre>
1958
+ <pre><code>## [,1] [,2] [,3] [,4] [,5] [,6]
1963
1959
  ## [1,] 1 2 3 1 4 7
1964
1960
  ## [2,] 4 5 6 2 5 8
1965
1961
  ## [3,] 7 8 9 3 6 9</code></pre>
@@ -1971,8 +1967,8 @@ puts @mat_row.cbind(@mat)</code></pre>
1971
1967
  <pre class="ruby"><code>nums = R.c(1.0, 2.0, 3.0)
1972
1968
  strs = R.c(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;d&quot;)
1973
1969
  bool = R.c(true, true, false)
1974
- @lst = R.list(nums: nums, strs: strs, bool: bool)
1975
- puts @lst</code></pre>
1970
+ lst = R.list(nums: nums, strs: strs, bool: bool)
1971
+ puts lst</code></pre>
1976
1972
  <pre><code>## $nums
1977
1973
  ## [1] 1 2 3
1978
1974
  ##
@@ -1981,18 +1977,18 @@ puts @lst</code></pre>
1981
1977
  ##
1982
1978
  ## $bool
1983
1979
  ## [1] TRUE TRUE FALSE</code></pre>
1984
- <p>Note that ‘<span class="citation">@lst</span>’ elements are named elements.</p>
1980
+ <p>Note that lst elements are named elements.</p>
1985
1981
  <div id="list-indexing" class="section level2">
1986
1982
  <h2>List Indexing</h2>
1987
1983
  <p>List indexing, also called slicing, is done using the ‘[]’ operator and the ‘[[]]’ operator. Let’s start with the ‘[]’ operator. The list above has three sublists; indexing with ‘[]’ returns one of them.</p>
1988
- <pre class="ruby"><code>puts @lst[1]</code></pre>
1984
+ <pre class="ruby"><code>puts lst[1]</code></pre>
1989
1985
  <pre><code>## $nums
1990
1986
  ## [1] 1 2 3</code></pre>
1991
1987
  <p>Note that when using ‘[]’ a new list is returned. When using the double square bracket operator the value returned is the actual element of the list in the given position and not a slice of the original list</p>
1992
- <pre class="ruby"><code>puts @lst[[1]]</code></pre>
1988
+ <pre class="ruby"><code>puts lst[[1]]</code></pre>
1993
1989
  <pre><code>## [1] 1 2 3</code></pre>
1994
- <p>When elements are named, as dones with <span class="citation">@lst</span>, indexing can be done by name:</p>
1995
- <pre class="ruby"><code>puts @lst[['bool']][[1]] &gt;&gt; 0</code></pre>
1990
+ <p>When elements are named, as done with lst, indexing can be done by name:</p>
1991
+ <pre class="ruby"><code>puts lst[['bool']][[1]] &gt;&gt; 0</code></pre>
1996
1992
  <pre><code>## true</code></pre>
1997
1993
  <p>In this example, first the ‘bool’ element of the list was extracted, not as a list, but as a vector, then the first element of the vector was extracted (note that vectors also accept the ‘[[]]’ operator) and then the vector was indexed by its first element, extracting the native Ruby type.</p>
1998
1994
  </div>
@@ -2048,17 +2044,17 @@ puts (~:mtcars)['Datsun 710', 'mpg']</code></pre>
2048
2044
  <pre><code>## mpg cyl disp hp drat wt qsec vs am gear carb
2049
2045
  ## Datsun 710 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
2050
2046
  ## Camaro Z28 13.3 8 350 245 3.73 3.84 15.41 0 0 3 4</code></pre>
2051
- <p>Finally, a data frame can also be indexed with a logical vector. In this next example, the ‘am’ column of :mtcars is compared with 0 (with method ‘eq’). When ‘am’ is equal to 0 the car is automatic. So, by doing ‘(~:mtcars).am.eq 0’ a logical vector is created with ‘true’ whenever ‘am’ is 0 and ‘false’ otherwise. Using this logical vector, the data frame is indexed, returning a new data frame in which all cars have automatic transmission.</p>
2047
+ <p>Finally, a data frame can also be indexed with a logical vector. In this next example, the ‘am’ column of :mtcars is compared with 0 (with method ‘eq’). When ‘am’ is equal to 0 the car is automatic. So, by doing ‘(~:mtcars).am.eq 0’ a logical vector is created with ‘true’ whenever ‘am’ is 0 and ‘false’ otherwise.</p>
2052
2048
  <pre class="ruby"><code># obtain a vector with 'true' for cars with automatic transmission
2053
2049
  automatic = (~:mtcars).am.eq 0
2054
- puts automatic
2055
-
2056
- # slice the data frame by using this vector
2057
- puts (~:mtcars)[automatic, :all]</code></pre>
2050
+ puts automatic</code></pre>
2058
2051
  <pre><code>## [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
2059
2052
  ## [12] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
2060
- ## [23] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
2061
- ## mpg cyl disp hp drat wt qsec vs am gear carb
2053
+ ## [23] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE</code></pre>
2054
+ <p>Using this logical vector, the data frame is indexed, returning a new data frame in which all cars have automatic transmission.</p>
2055
+ <pre class="ruby"><code># slice the data frame by using this vector
2056
+ puts (~:mtcars)[automatic, :all]</code></pre>
2057
+ <pre><code>## mpg cyl disp hp drat wt qsec vs am gear carb
2062
2058
  ## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
2063
2059
  ## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
2064
2060
  ## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
@@ -2123,6 +2119,30 @@ puts exp_wrong</code></pre>
2123
2119
  <pre class="ruby"><code>exp7 = :y.assign E.sin(:x)
2124
2120
  puts exp7</code></pre>
2125
2121
  <pre><code>## y &lt;- sin(x)</code></pre>
2122
+ <p>Expressions can also be written using ‘.’ notation:</p>
2123
+ <pre class="ruby"><code>exp8 = :y.assign :x.sin
2124
+ puts exp8</code></pre>
2125
+ <pre><code>## y &lt;- sin(x)</code></pre>
2126
+ <p>When a function has multiple arguments, the first one can be used before the ‘.’:</p>
2127
+ <pre class="ruby"><code>exp9 = :x.c(:y)
2128
+ puts exp9</code></pre>
2129
+ <pre><code>## c(x, y)</code></pre>
2130
+ </div>
2131
+ <div id="evaluating-an-expression" class="section level2">
2132
+ <h2>Evaluating an Expression</h2>
2133
+ <p>Expressions can be evaluated by calling function ‘eval’ with a binding. A binding can be provided with a list:</p>
2134
+ <pre class="ruby"><code>exp = (:a + :b) * 2.0 + :c ** 2 / :z
2135
+ puts exp.eval(R.list(a: 10, b: 20, c: 30, z: 40))</code></pre>
2136
+ <pre><code>## [1] 82.5</code></pre>
2137
+ <p>… with a data frame:</p>
2138
+ <pre class="ruby"><code>df = R.data__frame(
2139
+ a: R.c(1, 2, 3),
2140
+ b: R.c(10, 20, 30),
2141
+ c: R.c(100, 200, 300),
2142
+ z: R.c(1000, 2000, 3000))
2143
+
2144
+ puts exp.eval(df)</code></pre>
2145
+ <pre><code>## [1] 32 64 96</code></pre>
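The two results above can be checked by hand with a plain-Ruby stand-in for the expression. This is only an arithmetic cross-check, not how Galaaz evaluates expressions: the lambda `formula` is a hypothetical name, and `to_f` is needed because Ruby integer division truncates whereas R's `/` always yields a double.

```ruby
# Plain-Ruby stand-in for the expression (a + b) * 2.0 + c ** 2 / z.
# Keyword arguments play the role of the binding.
formula = ->(a:, b:, c:, z:) { (a + b) * 2.0 + c**2 / z.to_f }

# Binding given as single values, as with the list example.
scalar = formula.call(a: 10, b: 20, c: 30, z: 40)

# Binding given row by row, as with the data frame example.
rows = [[1, 10, 100, 1000], [2, 20, 200, 2000], [3, 30, 300, 3000]]
per_row = rows.map { |a, b, c, z| formula.call(a: a, b: b, c: c, z: z) }

p scalar
p per_row
```

For the first case this is (10 + 20) * 2.0 + 900 / 40 = 60 + 22.5 = 82.5, and for the first data frame row it is (1 + 10) * 2.0 + 10000 / 1000 = 22 + 10 = 32.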
2126
2146
  </div>
2127
2147
  </div>
2128
2148
  <div id="manipulating-data" class="section level1">
@@ -2131,8 +2151,8 @@ puts exp7</code></pre>
2131
2151
  <p>For these examples, we will investigate the nycflights13 data set, available in the package of the same name. We use function ‘R.install_and_loads’, which checks if the library is available locally and, if not, installs it. This data frame contains all 336,776 flights that departed from New York City in 2013. The data comes from the US Bureau of Transportation Statistics.</p>
2132
2152
  <pre class="ruby"><code>R.install_and_loads('nycflights13')
2133
2153
  R.library('dplyr')</code></pre>
2134
- <pre class="ruby"><code>@flights = ~:flights
2135
- puts @flights.head.as__data__frame</code></pre>
2154
+ <pre class="ruby"><code>flights = ~:flights
2155
+ puts flights.head.as__data__frame</code></pre>
2136
2156
  <pre><code>## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
2137
2157
  ## 1 2013 1 1 517 515 2 830 819
2138
2158
  ## 2 2013 1 1 533 529 4 850 830
@@ -2157,7 +2177,7 @@ puts @flights.head.as__data__frame</code></pre>
2157
2177
  <div id="filtering-rows-with-filter" class="section level2">
2158
2178
  <h2>Filtering rows with Filter</h2>
2159
2179
  <p>In this example we filter the flights data set by giving the filter function two expressions: the first is ‘:month.eq 1’ and the second is ‘:day.eq 1’.</p>
2160
- <pre class="ruby"><code>puts @flights.filter((:month.eq 1), (:day.eq 1)).head.as__data__frame</code></pre>
2180
+ <pre class="ruby"><code>puts flights.filter((:month.eq 1), (:day.eq 1)).head.as__data__frame</code></pre>
2161
2181
  <pre><code>## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
2162
2182
  ## 1 2013 1 1 517 515 2 830 819
2163
2183
  ## 2 2013 1 1 533 529 4 850 830
@@ -2183,7 +2203,7 @@ puts @flights.head.as__data__frame</code></pre>
2183
2203
  <div id="logical-operators" class="section level2">
2184
2204
  <h2>Logical Operators</h2>
2185
2205
  <p>All flights that departed in November or December</p>
2186
- <pre class="ruby"><code>puts @flights.filter((:month.eq 11) | (:month.eq 12)).head.as__data__frame</code></pre>
2206
+ <pre class="ruby"><code>puts flights.filter((:month.eq 11) | (:month.eq 12)).head.as__data__frame</code></pre>
2187
2207
  <pre><code>## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
2188
2208
  ## 1 2013 11 1 5 2359 6 352 345
2189
2209
  ## 2 2013 11 1 35 2250 105 123 2356
@@ -2206,7 +2226,7 @@ puts @flights.head.as__data__frame</code></pre>
2206
2226
  ## 5 45 2013-11-01 05:00:00
2207
2227
  ## 6 0 2013-11-01 06:00:00</code></pre>
2208
2228
  <p>The same as above, but using the ‘in’ operator. In R, it is possible to define many operators by doing %&lt;op&gt;%. The %in% operator checks if a value is in a vector. In order to use those operators from Galaaz the ‘._’ method is used, where the first argument is the operator’s symbol, in this case ‘:in’, and the second argument is the vector:</p>
2209
- <pre class="ruby"><code>puts @flights.filter(:month._ :in, R.c(11, 12)).head.as__data__frame</code></pre>
2229
+ <pre class="ruby"><code>puts flights.filter(:month._ :in, R.c(11, 12)).head.as__data__frame</code></pre>
2210
2230
  <pre><code>## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
2211
2231
  ## 1 2013 11 1 5 2359 6 352 345
2212
2232
  ## 2 2013 11 1 35 2250 105 123 2356
@@ -2232,18 +2252,18 @@ puts @flights.head.as__data__frame</code></pre>
2232
2252
  <div id="filtering-with-na-not-available" class="section level2">
2233
2253
  <h2>Filtering with NA (Not Available)</h2>
2234
2254
  <p>Let’s first create a ‘tibble’ with a Not Available value (R::NA). Tibbles are a modern version of data frames and operate very similarly to them. They differ in how they print values and in the results of some subsetting operations, which are more consistent than those of a data frame.</p>
2235
- <pre class="ruby"><code>@df = R.tibble(x: R.c(1, R::NA, 3))
2236
- puts @df.as__data__frame</code></pre>
2255
+ <pre class="ruby"><code>df = R.tibble(x: R.c(1, R::NA, 3))
2256
+ puts df.as__data__frame</code></pre>
2237
2257
  <pre><code>## x
2238
2258
  ## 1 1
2239
2259
  ## 2 NA
2240
2260
  ## 3 3</code></pre>
2241
2261
  <p>Now filtering by :x &gt; 1 returns only the rows that satisfy this condition; the row with R::NA does not.</p>
2242
- <pre class="ruby"><code>puts @df.filter(:x &gt; 1).as__data__frame</code></pre>
2262
+ <pre class="ruby"><code>puts df.filter(:x &gt; 1).as__data__frame</code></pre>
2243
2263
  <pre><code>## x
2244
2264
  ## 1 3</code></pre>
2245
2265
  <p>To match an NA use method ’is__na’</p>
2246
- <pre class="ruby"><code>puts @df.filter((:x.is__na) | (:x &gt; 1)).as__data__frame</code></pre>
2266
+ <pre class="ruby"><code>puts df.filter((:x.is__na) | (:x &gt; 1)).as__data__frame</code></pre>
2247
2267
  <pre><code>## x
2248
2268
  ## 1 NA
2249
2269
  ## 2 3</code></pre>
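The behaviour of NA under filtering can be mimicked in plain Ruby, using `nil` as a stand-in for R::NA. This is a sketch of the logic only, under the assumption that a comparison against a missing value is itself missing and that a filter keeps only rows whose condition is true; it does not use Galaaz or dplyr.

```ruby
# nil stands in for R's NA in this plain-Ruby illustration.
xs = [1, nil, 3]

# A comparison against a missing value is itself missing (nil here).
greater = xs.map { |x| x.nil? ? nil : x > 1 }

# A filter keeps only the rows whose condition is exactly true,
# so both false and missing rows are dropped.
kept = xs.zip(greater).select { |_, flag| flag == true }.map(&:first)

# Testing for missing values explicitly and OR-ing keeps those rows too.
kept_with_na = xs.select { |x| x.nil? || x > 1 }

p greater
p kept
p kept_with_na
```

The last two arrays correspond to the two filtered tibbles above: one row with 3, then one row with NA followed by 3.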
@@ -2251,7 +2271,7 @@ puts @df.as__data__frame</code></pre>
2251
2271
  <div id="arrange-rows-with-arrange" class="section level2">
2252
2272
  <h2>Arrange Rows with arrange</h2>
2253
2273
  <p>Arrange reorders the rows of a data frame by the given arguments.</p>
2254
- <pre class="ruby"><code>puts @flights.arrange(:year, :month, :day).head.as__data__frame</code></pre>
2274
+ <pre class="ruby"><code>puts flights.arrange(:year, :month, :day).head.as__data__frame</code></pre>
2255
2275
  <pre><code>## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
2256
2276
  ## 1 2013 1 1 517 515 2 830 819
2257
2277
  ## 2 2013 1 1 533 529 4 850 830
@@ -2274,7 +2294,7 @@ puts @df.as__data__frame</code></pre>
2274
2294
  ## 5 0 2013-01-01 06:00:00
2275
2295
  ## 6 58 2013-01-01 05:00:00</code></pre>
2276
2296
  <p>To arrange in descending order, use function ‘desc’</p>
2277
- <pre class="ruby"><code>puts @flights.arrange(:dep_delay.desc).head.as__data__frame</code></pre>
2297
+ <pre class="ruby"><code>puts flights.arrange(:dep_delay.desc).head.as__data__frame</code></pre>
2278
2298
  <pre><code>## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
2279
2299
  ## 1 2013 1 9 641 900 1301 1242 1530
2280
2300
  ## 2 2013 6 15 1432 1935 1137 1607 2120
@@ -2300,7 +2320,7 @@ puts @df.as__data__frame</code></pre>
2300
2320
  <div id="selecting-columns" class="section level2">
2301
2321
  <h2>Selecting columns</h2>
2302
2322
  <p>To select specific columns from a dataset we use function ‘select’:</p>
2303
- <pre class="ruby"><code>puts @flights.select(:year, :month, :day).head.as__data__frame</code></pre>
2323
+ <pre class="ruby"><code>puts flights.select(:year, :month, :day).head.as__data__frame</code></pre>
2304
2324
  <pre><code>## year month day
2305
2325
  ## 1 2013 1 1
2306
2326
  ## 2 2013 1 1
@@ -2309,7 +2329,7 @@ puts @df.as__data__frame</code></pre>
2309
2329
  ## 5 2013 1 1
2310
2330
  ## 6 2013 1 1</code></pre>
2311
2331
  <p>It is also possible to select columns in a given range:</p>
2312
- <pre class="ruby"><code>puts @flights.select(:year.up_to :day).head.as__data__frame</code></pre>
2332
+ <pre class="ruby"><code>puts flights.select(:year.up_to :day).head.as__data__frame</code></pre>
2313
2333
  <pre><code>## year month day
2314
2334
  ## 1 2013 1 1
2315
2335
  ## 2 2013 1 1
@@ -2318,7 +2338,7 @@ puts @df.as__data__frame</code></pre>
2318
2338
  ## 5 2013 1 1
2319
2339
  ## 6 2013 1 1</code></pre>
2320
2340
  <p>Select all columns that start with a given name sequence</p>
2321
- <pre class="ruby"><code>puts @flights.select(E.starts_with('arr')).head.as__data__frame</code></pre>
2341
+ <pre class="ruby"><code>puts flights.select(E.starts_with('arr')).head.as__data__frame</code></pre>
2322
2342
  <pre><code>## arr_time arr_delay
2323
2343
  ## 1 830 11
2324
2344
  ## 2 850 20
@@ -2334,7 +2354,7 @@ puts @df.as__data__frame</code></pre>
2334
2354
  <li><p>num_range(“x”, (1..3)): matches x1, x2 and x3</p></li>
2335
2355
  </ul>
2336
2356
  <p>A helper function that comes in handy when we just want to rearrange column order is ‘everything’:</p>
2337
- <pre class="ruby"><code>puts @flights.select(:year, :month, :day, E.everything).head.as__data__frame</code></pre>
2357
+ <pre class="ruby"><code>puts flights.select(:year, :month, :day, E.everything).head.as__data__frame</code></pre>
2338
2358
  <pre><code>## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
2339
2359
  ## 1 2013 1 1 517 515 2 830 819
2340
2360
  ## 2 2013 1 1 533 529 4 850 830
@@ -2359,13 +2379,13 @@ puts @df.as__data__frame</code></pre>
2359
2379
  </div>
2360
2380
  <div id="add-variables-to-a-dataframe-with-mutate" class="section level2">
2361
2381
  <h2>Add variables to a dataframe with ‘mutate’</h2>
2362
- <pre class="ruby"><code>@flights_sm = @flights.
2363
- select((:year.up_to :day),
2364
- E.ends_with('delay'),
2365
- :distance,
2366
- :air_time)
2382
+ <pre class="ruby"><code>flights_sm = flights.
2383
+ select((:year.up_to :day),
2384
+ E.ends_with('delay'),
2385
+ :distance,
2386
+ :air_time)
2367
2387
 
2368
- puts @flights_sm.head.as__data__frame</code></pre>
2388
+ puts flights_sm.head.as__data__frame</code></pre>
2369
2389
  <pre><code>## year month day dep_delay arr_delay distance air_time
2370
2390
  ## 1 2013 1 1 2 11 1400 227
2371
2391
  ## 2 2013 1 1 4 20 1416 227
@@ -2373,10 +2393,10 @@ puts @flights_sm.head.as__data__frame</code></pre>
2373
2393
  ## 4 2013 1 1 -1 -18 1576 183
2374
2394
  ## 5 2013 1 1 -6 -25 762 116
2375
2395
  ## 6 2013 1 1 -4 12 719 150</code></pre>
2376
- <pre class="ruby"><code>@flights_sm = @flights_sm.
2377
- mutate(gain: :dep_delay - :arr_delay,
2378
- speed: :distance / :air_time * 60)
2379
- puts @flights_sm.head.as__data__frame</code></pre>
2396
+ <pre class="ruby"><code>flights_sm = flights_sm.
2397
+ mutate(gain: :dep_delay - :arr_delay,
2398
+ speed: :distance / :air_time * 60)
2399
+ puts flights_sm.head.as__data__frame</code></pre>
2380
2400
  <pre><code>## year month day dep_delay arr_delay distance air_time gain speed
2381
2401
  ## 1 2013 1 1 2 11 1400 227 -9 370.0441
2382
2402
  ## 2 2013 1 1 4 20 1416 227 -16 374.2731
@@ -2388,11 +2408,11 @@ puts @flights_sm.head.as__data__frame</code></pre>
2388
2408
  <div id="summarising-data" class="section level2">
2389
2409
  <h2>Summarising data</h2>
2390
2410
  <p>Function ‘summarise’ calculates summaries for the data frame. When no ‘group_by’ is used a single value is obtained from the data frame:</p>
2391
- <pre class="ruby"><code>puts @flights.summarise(delay: E.mean(:dep_delay, na__rm: true)).as__data__frame</code></pre>
2411
+ <pre class="ruby"><code>puts flights.summarise(delay: E.mean(:dep_delay, na__rm: true)).as__data__frame</code></pre>
2392
2412
  <pre><code>## delay
2393
2413
  ## 1 12.63907</code></pre>
2394
2414
  <p>When a data frame is grouped with ‘group_by’, summaries apply to the given group:</p>
2395
- <pre class="ruby"><code>by_day = @flights.group_by(:year, :month, :day)
2415
+ <pre class="ruby"><code>by_day = flights.group_by(:year, :month, :day)
2396
2416
  puts by_day.summarise(delay: :dep_delay.mean(na__rm: true)).head.as__data__frame</code></pre>
2397
2417
  <pre><code>## year month day delay
2398
2418
  ## 1 2013 1 1 11.548926
@@ -2402,7 +2422,7 @@ puts by_day.summarise(delay: :dep_delay.mean(na__rm: true)).head.as__data__frame
2402
2422
  ## 5 2013 1 5 5.732218
2403
2423
  ## 6 2013 1 6 7.148014</code></pre>
2404
2424
  <p>Next we put many operations together by piping them one after the other:</p>
2405
- <pre class="ruby"><code>delays = @flights.
2425
+ <pre class="ruby"><code>delays = flights.
2406
2426
  group_by(:dest).
2407
2427
  summarise(
2408
2428
  count: E.n,
@@ -2410,105 +2430,14 @@ puts by_day.summarise(delay: :dep_delay.mean(na__rm: true)).head.as__data__frame
2410
2430
  delay: :arr_delay.mean(na__rm: true)).
2411
2431
  filter(:count &gt; 20, :dest != &quot;NHL&quot;)
2412
2432
 
2413
- puts delays.as__data__frame</code></pre>
2414
- <pre><code>## dest count dist delay
2415
- ## 1 ABQ 254 1826.00000 4.38188976
2416
- ## 2 ACK 265 199.00000 4.85227273
2417
- ## 3 ALB 439 143.00000 14.39712919
2418
- ## 4 ATL 17215 757.10822 11.30011285
2419
- ## 5 AUS 2439 1514.25297 6.01990875
2420
- ## 6 AVL 275 583.58182 8.00383142
2421
- ## 7 BDL 443 116.00000 7.04854369
2422
- ## 8 BGR 375 378.00000 8.02793296
2423
- ## 9 BHM 297 865.99663 16.87732342
2424
- ## 10 BNA 6333 758.21348 11.81245891
2425
- ## 11 BOS 15508 190.63696 2.91439222
2426
- ## 12 BQN 896 1578.98326 8.24549550
2427
- ## 13 BTV 2589 265.09154 8.95099602
2428
- ## 14 BUF 4681 296.80837 8.94595186
2429
- ## 15 BUR 371 2465.00000 8.17567568
2430
- ## 16 BWI 1781 179.41830 10.72673385
2431
- ## 17 BZN 36 1882.00000 7.60000000
2432
- ## 18 CAE 116 603.55172 41.76415094
2433
- ## 19 CAK 864 397.00000 19.69833729
2434
- ## 20 CHO 52 305.00000 9.50000000
2435
- ## 21 CHS 2884 632.91678 10.59296847
2436
- ## 22 CLE 4573 414.17428 9.18161129
2437
- ## 23 CLT 14064 538.02730 7.36031885
2438
- ## 24 CMH 3524 476.55505 10.60132291
2439
- ## 25 CRW 138 444.00000 14.67164179
2440
- ## 26 CVG 3941 575.15986 15.36456376
2441
- ## 27 DAY 1525 537.10230 12.68048606
2442
- ## 28 DCA 9705 211.00618 9.06695204
2443
- ## 29 DEN 7266 1614.67836 8.60650021
2444
- ## 30 DFW 8738 1383.04303 0.32212685
2445
- ## 31 DSM 569 1020.88752 19.00573614
2446
- ## 32 DTW 9384 498.12852 5.42996346
2447
- ## 33 EGE 213 1735.70892 6.30434783
2448
- ## 34 FLL 12055 1070.06877 8.08212154
2449
- ## 35 GRR 765 605.78170 18.18956044
2450
- ## 36 GSO 1606 449.84184 14.11260054
2451
- ## 37 GSP 849 595.95995 15.93544304
2452
- ## 38 HNL 707 4972.67468 -1.36519258
2453
- ## 39 HOU 2115 1420.15508 7.17618819
2454
- ## 40 IAD 5700 224.84684 13.86420212
2455
- ## 41 IAH 7198 1407.20672 4.24079040
2456
- ## 42 ILM 110 500.00000 4.63551402
2457
- ## 43 IND 2077 652.26288 9.94043412
2458
- ## 44 JAC 25 1875.60000 28.09523810
2459
- ## 45 JAX 2720 824.67610 11.84483416
2460
- ## 46 LAS 5997 2240.96148 0.25772849
2461
- ## 47 LAX 16174 2468.62236 0.54711094
2462
- ## 48 LGB 668 2465.00000 -0.06202723
2463
- ## 49 MCI 2008 1097.69522 14.51405836
2464
- ## 50 MCO 14082 943.11057 5.45464309
2465
- ## 51 MDW 4113 718.04595 12.36422360
2466
- ## 52 MEM 1789 954.20123 10.64531435
2467
- ## 53 MHT 1009 207.02973 14.78755365
2468
- ## 54 MIA 11728 1091.55244 0.29905978
2469
- ## 55 MKE 2802 733.38151 14.16722038
2470
- ## 56 MSN 572 803.95455 20.19604317
2471
- ## 57 MSP 7185 1017.40167 7.27016886
2472
- ## 58 MSY 3799 1177.70571 6.49017497
2473
- ## 59 MVY 221 173.00000 -0.28571429
2474
- ## 60 MYR 59 550.66102 4.60344828
2475
- ## 61 OAK 312 2576.00000 3.07766990
2476
- ## 62 OKC 346 1325.00000 30.61904762
2477
- ## 63 OMA 849 1135.56655 14.69889841
2478
- ## 64 ORD 17283 729.00081 5.87661475
2479
- ## 65 ORF 1536 288.52344 10.94909344
2480
- ## 66 PBI 6554 1028.83811 8.56297210
2481
- ## 67 PDX 1354 2445.56573 5.14157973
2482
- ## 68 PHL 1632 94.32353 10.12719014
2483
- ## 69 PHX 4656 2141.30326 2.09704733
2484
- ## 70 PIT 2875 334.06122 7.68099053
2485
- ## 71 PSE 365 1617.00000 7.87150838
2486
- ## 72 PVD 376 160.00000 16.23463687
2487
- ## 73 PWM 2352 276.12840 11.66040210
2488
- ## 74 RDU 8163 426.75769 10.05238095
2489
- ## 75 RIC 2454 281.40465 20.11125320
2490
- ## 76 ROC 2416 259.25083 11.56064461
2491
- ## 77 RSW 3537 1072.85327 3.23814963
2492
- ## 78 SAN 2737 2437.29923 3.13916574
2493
- ## 79 SAT 686 1578.34111 6.94537178
2494
- ## 80 SAV 804 709.18408 15.12950601
2495
- ## 81 SDF 1157 645.98358 12.66938406
2496
- ## 82 SEA 3923 2412.66531 -1.09909910
2497
- ## 83 SFO 13331 2577.92356 2.67289152
2498
- ## 84 SJC 329 2569.00000 3.44817073
2499
- ## 85 SJU 5819 1599.83365 2.52052659
2500
- ## 86 SLC 2467 1986.98662 0.17625459
2501
- ## 87 SMF 284 2521.00000 12.10992908
2502
- ## 88 SNA 825 2434.00000 -7.86822660
2503
- ## 89 SRQ 1211 1044.65153 3.08243131
2504
- ## 90 STL 4339 878.72321 11.07846451
2505
- ## 91 STT 522 1626.98276 -3.83590734
2506
- ## 92 SYR 1761 205.92164 8.90392501
2507
- ## 93 TPA 7466 1003.93557 7.40852503
2508
- ## 94 TUL 315 1215.00000 33.65986395
2509
- ## 95 TVC 101 652.38614 12.96842105
2510
- ## 96 TYS 631 638.80983 24.06920415
2511
- ## 97 XNA 1036 1142.50579 7.46572581</code></pre>
2433
+ puts delays.as__data__frame.head</code></pre>
2434
+ <pre><code>## dest count dist delay
2435
+ ## 1 ABQ 254 1826.0000 4.381890
2436
+ ## 2 ACK 265 199.0000 4.852273
2437
+ ## 3 ALB 439 143.0000 14.397129
2438
+ ## 4 ATL 17215 757.1082 11.300113
2439
+ ## 5 AUS 2439 1514.2530 6.019909
2440
+ ## 6 AVL 275 583.5818 8.003831</code></pre>
2512
2441
  </div>
2513
2442
  </div>
2514
2443
  <div id="using-data-table" class="section level1">
@@ -2517,9 +2446,9 @@ puts delays.as__data__frame</code></pre>
2517
2446
  R.install_and_loads('curl')
2518
2447
 
2519
2448
  input = &quot;https://raw.githubusercontent.com/Rdatatable/data.table/master/vignettes/flights14.csv&quot;
2520
- @flights = R.fread(input)
2521
- puts @flights
2522
- puts @flights.dim</code></pre>
2449
+ flights = R.fread(input)
2450
+ puts flights
2451
+ puts flights.dim</code></pre>
2523
2452
  <pre><code>## year month day dep_delay arr_delay carrier origin dest air_time
2524
2453
  ## 1: 2014 1 1 14 13 AA JFK LAX 359
2525
2454
  ## 2: 2014 1 1 -3 13 AA JFK LAX 363
@@ -2564,17 +2493,17 @@ puts data_table.ID</code></pre>
2564
2493
  ## 6: c 6 12 18
2565
2494
  ## [1] &quot;b&quot; &quot;b&quot; &quot;b&quot; &quot;a&quot; &quot;a&quot; &quot;c&quot;</code></pre>
2566
2495
  <pre class="ruby"><code># subset rows in i
2567
- ans = @flights[(:origin.eq &quot;JFK&quot;) &amp; (:month.eq 6)]
2496
+ ans = flights[(:origin.eq &quot;JFK&quot;) &amp; (:month.eq 6)]
2568
2497
  puts ans.head
2569
2498
 
2570
2499
  # Get the first two rows from flights.
2571
2500
 
2572
- ans = @flights[(1..2)]
2501
+ ans = flights[(1..2)]
2573
2502
  puts ans
2574
2503
 
2575
2504
  # Sort flights first by column origin in ascending order, and then by dest in descending order:
2576
2505
 
2577
- # ans = @flights[E.order(:origin, -(:dest))]
2506
+ # ans = flights[E.order(:origin, -(:dest))]
2578
2507
  # puts ans.head</code></pre>
2579
2508
  <pre><code>## year month day dep_delay arr_delay carrier origin dest air_time
2580
2509
  ## 1: 2014 6 1 -9 -5 AA JFK LAX 324
@@ -2599,15 +2528,15 @@ puts ans
2599
2528
  <pre class="ruby"><code># Select column(s) in j
2600
2529
  # select arr_delay column, but return it as a vector.
2601
2530
 
2602
- ans = @flights[:all, :arr_delay]
2531
+ ans = flights[:all, :arr_delay]
2603
2532
  puts ans.head
2604
2533
 
2605
2534
  # Select arr_delay column, but return as a data.table instead.
2606
2535
 
2607
- ans = @flights[:all, :arr_delay.list]
2536
+ ans = flights[:all, :arr_delay.list]
2608
2537
  puts ans.head
2609
2538
 
2610
- ans = @flights[:all, E.list(:arr_delay, :dep_delay)]</code></pre>
2539
+ ans = flights[:all, E.list(:arr_delay, :dep_delay)]</code></pre>
2611
2540
  <pre><code>## [1] 13 13 9 -26 1 0
2612
2541
  ## arr_delay
2613
2542
  ## 1: 13
@@ -2621,113 +2550,407 @@ ans = @flights[:all, E.list(:arr_delay, :dep_delay)]</code></pre>
2621
2550
  <h1>Graphics in Galaaz</h1>
2622
2551
  <p>Creating graphics in Galaaz is quite easy, as it can use all the power of ggplot2. There are many resources on the web that teach ggplot, so here we give a quick example of ggplot integration with Ruby. We continue to use the :mtcars dataset and plot a diverging bar plot, showing cars that have ‘above’ or ‘below’ average gas consumption. Let’s first prepare the data frame with the necessary data:</p>
2623
2552
  <pre class="ruby"><code># copy the R variable :mtcars to the Ruby mtcars variable
2624
- @mtcars = ~:mtcars
2553
+ mtcars = ~:mtcars
2625
2554
 
2626
2555
  # create a new column 'car_name' to store the car names so that it can be
2627
2556
  # used for plotting. The 'rownames' of the data frame cannot be used as
2628
2557
  # data for plotting
2629
- @mtcars.car_name = R.rownames(:mtcars)
2558
+ mtcars.car_name = R.rownames(:mtcars)
2630
2559
 
2631
2560
  # compute normalized mpg and add it to a new column called mpg_z
2632
2561
  # Note that the mean value for mpg can be obtained by calling the 'mean'
2633
2562
  # function on the vector 'mtcars.mpg'. The same with the standard
2634
2563
  # deviation 'sd'. The vector is then rounded to two digits with 'round 2'
2635
- @mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
2564
+ mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
2636
2565
 
2637
2566
  # create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
2638
2567
  # that looks at every element of the mpg_z vector and if the value is below
2639
2568
  # 0, returns 'below', otherwise returns 'above'
2640
- @mtcars.mpg_type = (@mtcars.mpg_z &lt; 0).ifelse(&quot;below&quot;, &quot;above&quot;)
2569
+ mtcars.mpg_type = (mtcars.mpg_z &lt; 0).ifelse(&quot;below&quot;, &quot;above&quot;)
2641
2570
 
2642
2571
  # order the mtcars data set by the mpg_z vector from smaller to larger values
2643
- @mtcars = @mtcars[@mtcars.mpg_z.order, :all]
2572
+ mtcars = mtcars[mtcars.mpg_z.order, :all]
2644
2573
 
2645
2574
  # convert the car_name column to a factor to retain sorted order in plot
2646
- @mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
2575
+ mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
2647
2576
 
2648
2577
  # let's look at the final data frame
2649
- puts @mtcars</code></pre>
2650
- <pre><code>## mpg cyl disp hp drat wt qsec vs am gear carb
2651
- ## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
2652
- ## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
2653
- ## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
2654
- ## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
2655
- ## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
2656
- ## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
2657
- ## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
2658
- ## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
2659
- ## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
2660
- ## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
2661
- ## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
2662
- ## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
2663
- ## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
2664
- ## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
2665
- ## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
2666
- ## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
2667
- ## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
2668
- ## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
2669
- ## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
2670
- ## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
2671
- ## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
2672
- ## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
2673
- ## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
2674
- ## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
2675
- ## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
2676
- ## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
2677
- ## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
2678
- ## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
2679
- ## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
2680
- ## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
2681
- ## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
2682
- ## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
2578
+ puts mtcars.head</code></pre>
2579
+ <pre><code>## mpg cyl disp hp drat wt qsec vs am gear carb
2580
+ ## Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
2581
+ ## Lincoln Continental 10.4 8 460 215 3.00 5.424 17.82 0 0 3 4
2582
+ ## Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4
2583
+ ## Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4
2584
+ ## Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4
2585
+ ## Maserati Bora 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8
2683
2586
  ## car_name mpg_z mpg_type
2684
2587
  ## Cadillac Fleetwood Cadillac Fleetwood -1.61 below
2685
2588
  ## Lincoln Continental Lincoln Continental -1.61 below
2686
2589
  ## Camaro Z28 Camaro Z28 -1.13 below
2687
2590
  ## Duster 360 Duster 360 -0.96 below
2688
2591
  ## Chrysler Imperial Chrysler Imperial -0.89 below
2689
- ## Maserati Bora Maserati Bora -0.84 below
2690
- ## Merc 450SLC Merc 450SLC -0.81 below
2691
- ## AMC Javelin AMC Javelin -0.81 below
2692
- ## Dodge Challenger Dodge Challenger -0.76 below
2693
- ## Ford Pantera L Ford Pantera L -0.71 below
2694
- ## Merc 450SE Merc 450SE -0.61 below
2695
- ## Merc 450SL Merc 450SL -0.46 below
2696
- ## Merc 280C Merc 280C -0.38 below
2697
- ## Valiant Valiant -0.33 below
2698
- ## Hornet Sportabout Hornet Sportabout -0.23 below
2699
- ## Merc 280 Merc 280 -0.15 below
2700
- ## Pontiac Firebird Pontiac Firebird -0.15 below
2701
- ## Ferrari Dino Ferrari Dino -0.06 below
2702
- ## Mazda RX4 Mazda RX4 0.15 above
2703
- ## Mazda RX4 Wag Mazda RX4 Wag 0.15 above
2704
- ## Hornet 4 Drive Hornet 4 Drive 0.22 above
2705
- ## Volvo 142E Volvo 142E 0.22 above
2706
- ## Toyota Corona Toyota Corona 0.23 above
2707
- ## Datsun 710 Datsun 710 0.45 above
2708
- ## Merc 230 Merc 230 0.45 above
2709
- ## Merc 240D Merc 240D 0.72 above
2710
- ## Porsche 914-2 Porsche 914-2 0.98 above
2711
- ## Fiat X1-9 Fiat X1-9 1.20 above
2712
- ## Honda Civic Honda Civic 1.71 above
2713
- ## Lotus Europa Lotus Europa 1.71 above
2714
- ## Fiat 128 Fiat 128 2.04 above
2715
- ## Toyota Corolla Toyota Corolla 2.29 above</code></pre>
2592
+ ## Maserati Bora Maserati Bora -0.84 below</code></pre>
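The normalisation step can be checked by hand outside of R. The sketch below is plain Ruby rather than Galaaz: it assumes the 32 `mpg` values as printed for mtcars in this manual, uses the sample standard deviation (dividing by n - 1, as R's `sd` does), and reproduces the `mpg_z` and `mpg_type` columns.

```ruby
# mpg values of mtcars, already sorted from smaller to larger.
mpg = [10.4, 10.4, 13.3, 14.3, 14.7, 15.0, 15.2, 15.2, 15.5, 15.8, 16.4,
       17.3, 17.8, 18.1, 18.7, 19.2, 19.2, 19.7, 21.0, 21.0, 21.4, 21.4,
       21.5, 22.8, 22.8, 24.4, 26.0, 27.3, 30.4, 30.4, 32.4, 33.9]

mean = mpg.sum / mpg.size

# Sample variance: squared deviations divided by (n - 1).
variance = mpg.sum { |v| (v - mean)**2 } / (mpg.size - 1)
sd = Math.sqrt(variance)

# z-score of every value, rounded to two digits.
mpg_z = mpg.map { |v| ((v - mean) / sd).round(2) }

# Vectorised 'ifelse': negative scores are 'below', the rest 'above'.
mpg_type = mpg_z.map { |z| z < 0 ? "below" : "above" }

puts mpg_z.first(6).inspect
puts mpg_type.tally.inspect
```

The first six scores correspond to the six rows shown by `mtcars.head` above.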
2716
2593
  <p>Now, let’s plot the diverging bar plot. When using gKnit, there is no need to call ‘R.awt’ to create a plotting device, since gKnit takes care of it. Galaaz provides integration with ggplot. The interested reader should check online for more information on ggplot, since describing how ggplot works is outside the scope of this manual. We give here only a brief description of how this plot is generated.</p>
2717
2594
  <p>ggplot implements the ‘grammar of graphics’. In this approach, plots are built by adding layers to the plot. On the first layer we describe what we want on the ‘x’ and ‘y’ axes of the plot. In this case, we have ‘car_name’ on the ‘x’ axis and ‘mpg_z’ on the ‘y’ axis. Then the type of graph is specified by adding ‘geom_bar’ (for a bar graph). We specify that our bars should be filled using ‘mpg_type’, which is either ‘above’ or ‘below’, giving two colours for filling. On the next layer we specify the labels for the graph, then we add the title and subtitle. Finally, bars in a bar chart usually run vertically, but in this graph we want them laid out horizontally, so we add ‘coord_flip’.</p>
2718
2595
  <pre class="ruby"><code>require 'ggplot'
2719
2596
 
2720
- puts @mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
2721
- R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
2722
- R.scale_fill_manual(name: 'Mileage',
2723
- labels: R.c('Above Average', 'Below Average'),
2724
- values: R.c('above': '#00ba38', 'below': '#f8766d')) +
2725
- R.labs(subtitle: &quot;Normalised mileage from 'mtcars'&quot;,
2726
- title: &quot;Diverging Bars&quot;) +
2727
- R.coord_flip</code></pre>
2597
+ puts mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
2598
+ R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
2599
+ R.scale_fill_manual(name: 'Mileage',
2600
+ labels: R.c('Above Average', 'Below Average'),
2601
+ values: R.c('above': '#00ba38', 'below': '#f8766d')) +
2602
+ R.labs(subtitle: &quot;Normalised mileage from 'mtcars'&quot;,
2603
+ title: &quot;Diverging Bars&quot;) +
2604
+ R.coord_flip</code></pre>
2728
2605
  <p><img src="" /><!-- --></p>
2606
+ </div>
2607
+ <div id="coding-with-tidyverse" class="section level1">
2608
+ <h1>Coding with Tidyverse</h1>
2609
+ <p>In R, and when coding with ‘tidyverse’, arguments to a function are usually not <em>referentially transparent</em>. That is, you can’t replace a value with a seemingly equivalent object that you’ve defined elsewhere. To see the problem, let’s first define a data frame:</p>
2610
+ <pre class="ruby"><code>df = R.data__frame(x: (1..3), y: (3..1))
2611
+ puts df</code></pre>
2612
+ <pre><code>## x y
2613
+ ## 1 1 3
2614
+ ## 2 2 2
2615
+ ## 3 3 1</code></pre>
2616
+ <p>and now, let’s look at this code:</p>
2617
+ <pre class="r"><code>my_var &lt;- x
2618
+ filter(df, my_var == 1)</code></pre>
2619
+ <p>It generates the following error: “object ‘x’ not found”.</p>
2620
+ <p>However, in Galaaz, arguments are referentially transparent, as can be seen in the code below. Note first that ‘my_var = :x’ does not give the error “object ‘x’ not found”, since ‘:x’ is treated as an expression and assigned to my_var. Then, in (my_var.eq 1), my_var is a variable that resolves to ‘:x’, so the call becomes equivalent to (:x.eq 1), which is what we want.</p>
2621
+ <pre class="ruby"><code>my_var = :x
2622
+ puts df.filter(my_var.eq 1)</code></pre>
2623
+ <pre><code>## x y
2624
+ ## 1 1 3</code></pre>
2625
+ <p>As stated by Hadley:</p>
2626
+ <blockquote>
2627
+ <p>dplyr code is ambiguous. Depending on what variables are defined where, filter(df, x == y) could be equivalent to any of:</p>
2628
+ </blockquote>
2629
+ <pre><code>df[df$x == df$y, ]
2630
+ df[df$x == y, ]
2631
+ df[x == df$y, ]
2632
+ df[x == y, ]</code></pre>
2633
+ <p>In galaaz this ambiguity does not exist: filter(df, x.eq y) is not a valid expression, as expressions are built with symbols. With filter(df, :x.eq y) we look for elements of the ‘x’ column that are equal to a previously defined y variable. Finally, with filter(df, :x.eq :y) we look for rows in which the ‘x’ column value is equal to the ‘y’ column value. This can be seen in the following two chunks of code:</p>
2634
+ <pre class="ruby"><code>y = 1
2635
+ x = 2
2636
+
2637
+ # looking for values where the 'x' column is equal to the 'y' column
2638
+ puts df.filter(:x.eq :y)</code></pre>
2639
+ <pre><code>## x y
2640
+ ## 1 2 2</code></pre>
2641
+ <pre class="ruby"><code># looking for values where the 'x' column is equal to the 'y' variable
2642
+ # in this case, the number 1
2643
+ puts df.filter(:x.eq y)</code></pre>
2644
+ <pre><code>## x y
2645
+ ## 1 1 3</code></pre>
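To make the distinction between a symbol and a variable concrete, here is a minimal plain-Ruby sketch. It is not how Galaaz is implemented: the `Expr` struct and the `eq` helper are hypothetical, and rows are modelled as hashes. The point it illustrates is that a symbol always names a column, while anything else is an ordinary Ruby value, so there is nothing left to guess.

```ruby
# Hypothetical mini expression type; an illustration only, not Galaaz internals.
Expr = Struct.new(:op, :lhs, :rhs) do
  # Evaluate against one row (a Hash keyed by column name).
  # Symbols are looked up in the row; fetch raises KeyError for a
  # missing column instead of silently using some other value.
  def call(row)
    l = lhs.is_a?(Symbol) ? row.fetch(lhs) : lhs
    r = rhs.is_a?(Symbol) ? row.fetch(rhs) : rhs
    l.public_send(op, r)
  end
end

# Build "left == right" without touching any data yet.
def eq(left, right)
  Expr.new(:==, left, right)
end

rows = [{ x: 1, y: 3 }, { x: 2, y: 2 }, { x: 3, y: 1 }]

my_var = :x   # an ordinary Ruby value; nothing is looked up here
y = 1         # a Ruby variable, unrelated to the column :y

via_variable     = rows.select { |row| eq(my_var, 1).call(row) }
column_vs_column = rows.select { |row| eq(:x, :y).call(row) }
column_vs_value  = rows.select { |row| eq(:x, y).call(row) }

p via_variable
p column_vs_column
p column_vs_value
```

Because `my_var` holds the symbol `:x`, `eq(my_var, 1)` and `eq(:x, 1)` build equal expressions, which is the referential transparency described above.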
2646
+ <div id="writing-a-function-that-applies-to-different-data-sets" class="section level2">
2647
+ <h2>Writing a function that applies to different data sets</h2>
2648
+ <p>Let’s suppose that we want to write a function that receives a data frame and adds to it a column ‘y’ that is equal to the sum of the elements in column ‘a’ plus ‘x’.</p>
2649
+ <p>Here is the intended behaviour using the ‘mutate’ function of ‘dplyr’:</p>
2650
+ <pre><code>mutate(df1, y = a + x)
2651
+ mutate(df2, y = a + x)
2652
+ mutate(df3, y = a + x)
2653
+ mutate(df4, y = a + x)</code></pre>
2654
+ <p>The naive approach to writing an R function to solve this problem is:</p>
2655
+ <pre><code>mutate_y &lt;- function(df) {
2656
+ mutate(df, y = a + x)
2657
+ }</code></pre>
2658
+ <p>Unfortunately, in R, this function can fail silently if one of the variables isn’t present in the data frame, but is present in the global environment. We will not go through here how to solve this problem in R.</p>
2659
+ <p>In Galaaz, the method mutate_y below works as intended and never fails silently.</p>
2660
+ <pre class="ruby"><code>def mutate_y(df)
2661
+ df.mutate(:y.assign :a + :x)
2662
+ end</code></pre>
2663
+ <p>Here we create a data frame that has only one column named ‘x’:</p>
2664
+ <pre class="ruby"><code>df1 = R.data__frame(x: (1..3))
2665
+ puts df1</code></pre>
2666
+ <pre><code>## x
2667
+ ## 1 1
2668
+ ## 2 2
2669
+ ## 3 3</code></pre>
2670
+ <p>Note that calling mutate_y on this data frame fails even though a variable ‘a’ is defined and in scope. Variable ‘a’ has no relationship with the symbol ‘:a’ used in the definition of ‘mutate_y’ above:</p>
2671
+ <pre class="ruby"><code>a = 10
2672
+ mutate_y(df1)</code></pre>
2673
+ <pre><code>## Message:
2674
+ ## Error in mutate_impl(.data, dots) :
2675
+ ## Evaluation error: object 'a' not found.
2676
+ ## In addition: Warning message:
2677
+ ## In mutate_impl(.data, dots) :
2678
+ ## mismatched protect/unprotect (unprotect with empty protect stack) (RError)
2679
+ ## Translated to internal error</code></pre>
2680
+ </div>
2681
+ <div id="different-expressions" class="section level2">
2682
+ <h2>Different expressions</h2>
2683
+ <p>Let’s move to the next problem presented by Hadley: writing a function in R that receives two arguments, the first a variable and the second an expression, is not trivial. Below we create a data frame, and we want to write a function that groups data by a variable and summarises it by an expression:</p>
2684
+ <pre class="r"><code>set.seed(123)
2685
+
2686
+ df &lt;- data.frame(
2687
+ g1 = c(1, 1, 2, 2, 2),
2688
+ g2 = c(1, 2, 1, 2, 1),
2689
+ a = sample(5),
2690
+ b = sample(5)
2691
+ )
2692
+
2693
+ as.data.frame(df) </code></pre>
2694
+ <pre><code>## g1 g2 a b
2695
+ ## 1 1 1 2 1
2696
+ ## 2 1 2 4 3
2697
+ ## 3 2 1 5 4
2698
+ ## 4 2 2 3 2
2699
+ ## 5 2 1 1 5</code></pre>
2700
+ <pre class="r"><code>d2 &lt;- df %&gt;%
2701
+ group_by(g1) %&gt;%
2702
+ summarise(a = mean(a))
2703
+
2704
+ as.data.frame(d2) </code></pre>
2705
+ <pre><code>## g1 a
2706
+ ## 1 1 3
2707
+ ## 2 2 3</code></pre>
2708
+ <pre class="r"><code>d2 &lt;- df %&gt;%
2709
+ group_by(g2) %&gt;%
2710
+ summarise(a = mean(a))
2711
+
2712
+ as.data.frame(d2) </code></pre>
2713
+ <pre><code>## g2 a
2714
+ ## 1 1 2.666667
2715
+ ## 2 2 3.500000</code></pre>
2716
+ <p>As shown by Hadley, one might expect this function to do the trick:</p>
2717
+ <pre class="r"><code>my_summarise &lt;- function(df, group_var) {
2718
+ df %&gt;%
2719
+ group_by(group_var) %&gt;%
2720
+ summarise(a = mean(a))
2721
+ }
2722
+
2723
+ # my_summarise(df, g1)
2724
+ #&gt; Error: Column `group_var` is unknown</code></pre>
2725
+ <p>In order to solve this problem, coding with dplyr requires the introduction of many new concepts and functions such as ‘quo’, ‘quos’, ‘enquo’, ‘enquos’, ‘!!’ (bang bang), ‘!!!’ (triple bang). Again, we’ll leave to Hadley the explanation of how to use all those functions.</p>
2726
+ <p>Now, let’s try to implement the same function in galaaz. The next code block first prints the ‘df’ data frame defined previously in R (to access an R variable from Galaaz, we use the tilde operator ‘~’ applied to the R variable name as a symbol, i.e., ‘:df’).</p>
2727
+ <pre class="ruby"><code>puts ~:df</code></pre>
2728
+ <pre><code>## g1 g2 a b
2729
+ ## 1 1 1 2 1
2730
+ ## 2 1 2 4 3
2731
+ ## 3 2 1 5 4
2732
+ ## 4 2 2 3 2
2733
+ ## 5 2 1 1 5</code></pre>
2734
+ <p>We then create the ‘my_summarize’ method and call it passing the R data frame and the group by variable ‘:g1’:</p>
2735
+ <pre class="ruby"><code>def my_summarize(df, group_var)
2736
+ df.group_by(group_var).
2737
+ summarize(a: :a.mean)
2738
+ end
2739
+
2740
+ puts my_summarize(:df, :g1).as__data__frame</code></pre>
2741
+ <pre><code>## g1 a
2742
+ ## 1 1 3
2743
+ ## 2 2 3</code></pre>
2744
+ <p>It works!!! Well, let’s make sure this was not just some coincidence</p>
2745
+ <pre class="ruby"><code>puts my_summarize(:df, :g2).as__data__frame</code></pre>
2746
+ <pre><code>## g2 a
2747
+ ## 1 1 2.666667
2748
+ ## 2 2 3.500000</code></pre>
2749
+ <p>Great, everything is fine! No magic, no new functions, no complexities, just normal, standard Ruby code. If you’ve ever done NSE in R, this certainly feels much safer and easier to implement.</p>
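The reason no special machinery is needed is that the grouping column is passed as an ordinary value. The following plain-Ruby sketch shows the same pattern without R at all. It assumes the rows of ‘df’ printed above, modelled as hashes, and `summarize_mean_a` is a hypothetical helper name, not a Galaaz method.

```ruby
# Rows transcribed from the 'df' data frame printed above.
ROWS = [
  { g1: 1, g2: 1, a: 2, b: 1 },
  { g1: 1, g2: 2, a: 4, b: 3 },
  { g1: 2, g2: 1, a: 5, b: 4 },
  { g1: 2, g2: 2, a: 3, b: 2 },
  { g1: 2, g2: 1, a: 1, b: 5 }
].freeze

# group_var is a plain argument holding a column name as a symbol.
def summarize_mean_a(rows, group_var)
  rows.group_by { |row| row.fetch(group_var) }.map do |key, group|
    { group_var => key, a: group.sum { |r| r[:a] }.fdiv(group.size) }
  end
end

by_g1 = summarize_mean_a(ROWS, :g1)
by_g2 = summarize_mean_a(ROWS, :g2)

p by_g1
p by_g2
```

The group means agree with the R output above: 3 for both groups of ‘g1’, and roughly 2.67 and 3.5 for the groups of ‘g2’.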
2750
+ </div>
2751
+ <div id="different-input-variables" class="section level2">
2752
+ <h2>Different input variables</h2>
2753
+ <p>In the previous section we’ve managed to get rid of all NSE formulation for a simple example, but does this remain true for more complex examples, or will the Galaaz way prove impractical for more complex code?</p>
2754
+ <p>In the next example, Hadley asks us to write a function that, given an expression such as ‘a’ or ‘a * b’, calculates three summaries. We want a function that does the same as these R statements:</p>
2755
+ <pre><code>summarise(df, mean = mean(a), sum = sum(a), n = n())
2756
+ #&gt; # A tibble: 1 x 3
2757
+ #&gt; mean sum n
2758
+ #&gt; &lt;dbl&gt; &lt;int&gt; &lt;int&gt;
2759
+ #&gt; 1 3 15 5
2760
+
2761
+ summarise(df, mean = mean(a * b), sum = sum(a * b), n = n())
2762
+ #&gt; # A tibble: 1 x 3
2763
+ #&gt; mean sum n
2764
+ #&gt; &lt;dbl&gt; &lt;int&gt; &lt;int&gt;
2765
+ #&gt; 1 9 45 5</code></pre>
2766
+ <p>Let’s try it in galaaz:</p>
2767
+ <pre class="ruby"><code>def my_summarise2(df, expr)
2768
+ df.summarize(
2769
+ mean: E.mean(expr),
2770
+ sum: E.sum(expr),
2771
+ n: E.n
2772
+ )
2773
+ end
2774
+
2775
+ puts my_summarise2((~:df), :a)
2776
+ puts &quot;\n&quot;
2777
+ puts my_summarise2((~:df), :a * :b)</code></pre>
2778
+ <pre><code>## mean sum n
2779
+ ## 1 3 15 5
2780
+ ##
2781
+ ## mean sum n
2782
+ ## 1 9 45 5</code></pre>
2783
+ <p>Once again, there is no need to use any special theory or functions. The only point to be careful about is the use of ‘E’ to build expressions from functions ‘mean’, ‘sum’ and ‘n’.</p>
2784
+ </div>
2785
+ <div id="different-input-and-output-variable" class="section level2">
2786
+ <h2>Different input and output variable</h2>
2787
+ <p>The next challenge presented by Hadley is to vary the names of the output variables based on the received expression. So, if the input expression is ‘a’, we want our data frame columns to be named ‘mean_a’ and ‘sum_a’. If the input expression is ‘b’, the columns should be named ‘mean_b’ and ‘sum_b’.</p>
2788
+ <pre><code>mutate(df, mean_a = mean(a), sum_a = sum(a))
2789
+ #&gt; # A tibble: 5 x 6
2790
+ #&gt; g1 g2 a b mean_a sum_a
2791
+ #&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt;
2792
+ #&gt; 1 1 1 1 3 3 15
2793
+ #&gt; 2 1 2 4 2 3 15
2794
+ #&gt; 3 2 1 2 1 3 15
2795
+ #&gt; 4 2 2 5 4 3 15
2796
+ #&gt; # … with 1 more row
2797
+
2798
+ mutate(df, mean_b = mean(b), sum_b = sum(b))
2799
+ #&gt; # A tibble: 5 x 6
2800
+ #&gt; g1 g2 a b mean_b sum_b
2801
+ #&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; &lt;int&gt; &lt;dbl&gt; &lt;int&gt;
2802
+ #&gt; 1 1 1 1 3 3 15
2803
+ #&gt; 2 1 2 4 2 3 15
2804
+ #&gt; 3 2 1 2 1 3 15
2805
+ #&gt; 4 2 2 5 4 3 15
2806
+ #&gt; # … with 1 more row</code></pre>
2807
+ <p>In order to solve this problem in R, Hadley needs to introduce some more new functions and notations: ‘quo_name’ and the ‘:=’ operator from package ‘rlang’.</p>
2808
+ <p>Here is our Ruby code:</p>
2809
+ <pre class="ruby"><code>def my_mutate(df, expr)
2810
+ mean_name = &quot;mean_#{expr.to_s}&quot;
2811
+ sum_name = &quot;sum_#{expr.to_s}&quot;
2812
+
2813
+ df.mutate(mean_name =&gt; E.mean(expr),
2814
+ sum_name =&gt; E.sum(expr))
2815
+ end
2816
+
2817
+ puts my_mutate((~:df), :a)
2818
+ puts &quot;\n&quot;
2819
+ puts my_mutate((~:df), :b)</code></pre>
2820
+ <pre><code>## g1 g2 a b mean_a sum_a
2821
+ ## 1 1 1 2 1 3 15
2822
+ ## 2 1 2 4 3 3 15
2823
+ ## 3 2 1 5 4 3 15
2824
+ ## 4 2 2 3 2 3 15
2825
+ ## 5 2 1 1 5 3 15
2826
+ ##
2827
+ ## g1 g2 a b mean_b sum_b
2828
+ ## 1 1 1 2 1 3 15
2829
+ ## 2 1 2 4 3 3 15
2830
+ ## 3 2 1 5 4 3 15
2831
+ ## 4 2 2 3 2 3 15
2832
+ ## 5 2 1 1 5 3 15</code></pre>
2833
+ <p>It really seems that “Non Standard Evaluation” is actually quite standard in Galaaz! But you might have noticed a small change in the way the arguments to the mutate method were passed. In a previous example we used df.summarise(mean: E.mean(:a), …), where the column name was followed by a ‘:’ (colon). In this example, we have df.mutate(mean_name =&gt; E.mean(expr), …), where the variable mean_name is not followed by ‘:’ but by ‘=&gt;’. This is standard Ruby notation.</p>
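The difference between the two notations can be seen in plain Ruby, without Galaaz. The sketch below is only an illustration: `column_names` is a hypothetical helper standing in for a method such as mutate that receives a hash of column names, and the numeric values are placeholders.

```ruby
# Plain Ruby, no Galaaz required.
mean_name = "mean_a"

# `key: value` always creates the literal symbol key :mean_name;
# the local variable with the same name is not consulted.
literal = { mean_name: 3 }

# `key => value` evaluates the left-hand side first, so the key is
# the string stored in the variable.
dynamic = { mean_name => 3 }

p literal.keys
p dynamic.keys

# Hypothetical stand-in for a method that accepts a hash of column
# names; it only reports which names it received.
def column_names(columns)
  columns.keys.map(&:to_s)
end

p column_names(mean_name => 3, "sum_a" => 15)
```

With the ‘:’ form the column would always be called ‘mean_name’, which is why the ‘=&gt;’ form is needed when the name is computed at run time.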
2834
+ <p>[explain….]</p>
2835
+ </div>
2836
+ <div id="capturing-multiple-variables" class="section level2">
2837
+ <h2>Capturing multiple variables</h2>
2838
+ <p>Moving on to new complexities, Hadley asks us to solve the problem in which the summarise function receives any number of grouping variables.</p>
2839
+ <p>This again is quite standard Ruby. In order to receive an undefined number of parameters, the parameter is preceded by ‘*’:</p>
2840
+ <pre class="ruby"><code>def my_summarise3(df, *group_vars)
2841
+ df.group_by(*group_vars).
2842
+ summarise(a: E.mean(:a))
2843
+ end
2844
+
2845
+ puts my_summarise3((~:df), :g1, :g2).as__data__frame</code></pre>
2846
+ <pre><code>## g1 g2 a
2847
+ ## 1 1 1 2
2848
+ ## 2 1 2 4
2849
+ ## 3 2 1 3
2850
+ ## 4 2 2 3</code></pre>
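The ‘*’ notation used in my_summarise3 can be tried in plain Ruby, without Galaaz. This is a minimal sketch: `join_names` and `describe_groups` are hypothetical helpers introduced only to show how arguments are collected into an array and then spread again in a call.

```ruby
# Plain Ruby sketch; both method names are hypothetical.
def join_names(*names)
  names.join(", ")
end

def describe_groups(label, *group_vars)
  # Inside the method, group_vars is an ordinary Array holding every
  # argument passed after the first one.
  # Writing *group_vars in a call spreads the Array back into
  # separate arguments.
  "#{label} grouped by #{group_vars.size} variable(s): #{join_names(*group_vars)}"
end

puts describe_groups("df", :g1)
puts describe_groups("df", :g1, :g2)
```

The same pair of steps appears in my_summarise3: the method definition collects the grouping variables, and the call to group_by spreads them again.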
2851
+ </div>
2852
+ <div id="why-does-r-require-nse-and-galaaz-does-not" class="section level2">
2853
+ <h2>Why does R require NSE and Galaaz does not?</h2>
2854
+ <p>NSE introduces a number of new concepts, such as ‘quoting’, ‘quasiquotation’, ‘unquoting’ and ‘unquote-splicing’, while in Galaaz none of those concepts are needed. What gives?</p>
2855
+ <p>R is an extremely flexible language and it has lazy evaluation of parameters. When a function is called in R as ‘summarise(df, a = b)’, the summarise function receives the literal ‘a = b’ parameter and can work with it as if it were a string. In R, it is not clear what a and b are: they can be expressions or they can be variables, and it is up to the function to decide what ‘a = b’ means.</p>
2856
+ <p>In Ruby, there is no lazy evaluation of parameters and ‘a’ is always a variable and so is ‘b’. Variables assume their value as soon as they are used, so ‘x = a’ is immediately evaluated and variable ‘x’ will receive the value of variable ‘a’ as soon as the Ruby statement is executed. Ruby also provides the notion of a symbol; ‘:a’ is a symbol and does not evaluate to anything. Galaaz uses Ruby symbols to build expressions that are not bound to anything: ‘:a.eq :b’ is clearly an expression and has no relationship whatsoever with the statement ‘a = b’. By using symbols, variables and expressions, all the possible ambiguities that are found in R are eliminated in Galaaz.</p>
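The distinction between eagerly evaluated variables and unbound symbols can be sketched in plain Ruby. The `Expr` struct below is a hypothetical toy written for this illustration; it is not how Galaaz builds its expressions, it only shows that symbols can describe a computation that is evaluated later against whatever data is supplied.

```ruby
# Hypothetical toy expression type; NOT the Galaaz implementation.
Expr = Struct.new(:op, :lhs, :rhs) do
  def to_s
    "(#{lhs} #{op} #{rhs})"
  end

  # Looks each symbol up in env only when evaluation is requested.
  def evaluate(env)
    l = lhs.is_a?(Expr) ? lhs.evaluate(env) : env.fetch(lhs)
    r = rhs.is_a?(Expr) ? rhs.evaluate(env) : env.fetch(rhs)
    l.public_send(op, r)
  end
end

a = 2
x = a    # evaluated immediately: x now holds the value of a
a = 10   # rebinding a afterwards does not affect x

# Symbols are only names, so this describes "a times b" without
# touching the local variable a defined above.
unevaluated = Expr.new(:*, :a, :b)
result = unevaluated.evaluate({ a: 3, b: 4 })

puts x
puts unevaluated
puts result
```

Here the local variable a and the symbol :a never interact, which is the separation the paragraph above describes.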
2857
+ <p>The main problem that remains is that R functions do not clearly document what type of input they expect: they might expect regular variables or they might expect expressions. The R function will know how to deal with an input of the form ‘a = b’, but for the Ruby developer it might not be immediately clear whether to call the function passing the value ‘true’ if variable ‘a’ is equal to variable ‘b’, or passing the expression ‘:a.eq :b’.</p>
2858
+ </div>
2859
+ <div id="advanced-dplyr-features" class="section level2">
2860
+ <h2>Advanced dplyr features</h2>
2861
+ <p>In the blog: Programming with dplyr by using dplyr (<a href="https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/" class="uri">https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/</a>) Iñaki Úcar shows surprise that some R users are trying to code in dplyr avoiding the use of NSE. For instance he says:</p>
2862
+ <blockquote>
2863
+ <p>Take the example of seplyr. It stands for standard evaluation dplyr, and enables us to program over dplyr without having “to bring in (or study) any deep-theory or heavy-weight tools such as rlang/tidyeval”.</p>
2864
+ </blockquote>
2865
+ <p>For me, there isn’t really any surprise that users are trying to avoid dplyr deep-theory. R users frequently are not programmers, and learning to code is already hard business; on top of that, having to learn how to ‘quote’ or ‘enquo’ or ‘quos’ or ‘enquos’ is not necessarily a ‘piece of cake’. So much so that ‘tidyeval’ has some more advanced functions that, instead of using quoted expressions, use strings as arguments.</p>
2866
+ <p>In the following examples, we show the use of functions ‘group_by_at’, ‘summarise_at’ and ‘rename_at’, which receive strings as arguments. The data frame used is ‘starwars’, which describes features of characters in the Star Wars movies:</p>
2867
+ <pre class="ruby"><code>puts (~:starwars).head.as__data__frame</code></pre>
2868
+ <pre><code>## name height mass hair_color skin_color eye_color birth_year
2869
+ ## 1 Luke Skywalker 172 77 blond fair blue 19.0
2870
+ ## 2 C-3PO 167 75 &lt;NA&gt; gold yellow 112.0
2871
+ ## 3 R2-D2 96 32 &lt;NA&gt; white, blue red 33.0
2872
+ ## 4 Darth Vader 202 136 none white yellow 41.9
2873
+ ## 5 Leia Organa 150 49 brown light brown 19.0
2874
+ ## 6 Owen Lars 178 120 brown, grey light blue 52.0
2875
+ ## gender homeworld species
2876
+ ## 1 male Tatooine Human
2877
+ ## 2 &lt;NA&gt; Tatooine Droid
2878
+ ## 3 &lt;NA&gt; Naboo Droid
2879
+ ## 4 male Tatooine Human
2880
+ ## 5 female Alderaan Human
2881
+ ## 6 male Tatooine Human
2882
+ ## films
2883
+ ## 1 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
2884
+ ## 2 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
2885
+ ## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
2886
+ ## 4 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
2887
+ ## 5 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
2888
+ ## 6 Attack of the Clones, Revenge of the Sith, A New Hope
2889
+ ## vehicles starships
2890
+ ## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
2891
+ ## 2
2892
+ ## 3
2893
+ ## 4 TIE Advanced x1
2894
+ ## 5 Imperial Speeder Bike
2895
+ ## 6</code></pre>
2896
+ <p>The grouped_mean function below receives the grouping variables and calculates summaries for the given value_variables:</p>
2897
+ <pre class="r"><code>grouped_mean &lt;- function(data, grouping_variables, value_variables) {
2898
+ data %&gt;%
2899
+ group_by_at(grouping_variables) %&gt;%
2900
+ mutate(count = n()) %&gt;%
2901
+ summarise_at(c(value_variables, &quot;count&quot;), mean, na.rm = TRUE) %&gt;%
2902
+ rename_at(value_variables, funs(paste0(&quot;mean_&quot;, .)))
2903
+ }
2904
+
2905
+ gm = starwars %&gt;%
2906
+ grouped_mean(&quot;eye_color&quot;, c(&quot;mass&quot;, &quot;birth_year&quot;))
2907
+
2908
+ as.data.frame(gm) </code></pre>
2909
+ <pre><code>## eye_color mean_mass mean_birth_year count
2910
+ ## 1 black 76.28571 33.00000 10
2911
+ ## 2 blue 86.51667 67.06923 19
2912
+ ## 3 blue-gray 77.00000 57.00000 1
2913
+ ## 4 brown 66.09231 108.96429 21
2914
+ ## 5 dark NaN NaN 1
2915
+ ## 6 gold NaN NaN 1
2916
+ ## 7 green, yellow 159.00000 NaN 1
2917
+ ## 8 hazel 66.00000 34.50000 3
2918
+ ## 9 orange 282.33333 231.00000 8
2919
+ ## 10 pink NaN NaN 1
2920
+ ## 11 red 81.40000 33.66667 5
2921
+ ## 12 red, blue NaN NaN 1
2922
+ ## 13 unknown 31.50000 NaN 3
2923
+ ## 14 white 48.00000 NaN 1
2924
+ ## 15 yellow 81.11111 76.38000 11</code></pre>
2925
+ <p>The same code with Galaaz becomes:</p>
2926
+ <pre class="ruby"><code>def grouped_mean(data, grouping_variables, value_variables)
2927
+ data.
2928
+ group_by_at(grouping_variables).
2929
+ mutate(count: E.n).
2930
+ summarise_at(E.c(value_variables, &quot;count&quot;), ~:mean, na__rm: true).
2931
+ rename_at(value_variables, E.funs(E.paste0(&quot;mean_&quot;, value_variables)))
2932
+ end
2933
+
2934
+ puts grouped_mean((~:starwars), &quot;eye_color&quot;, E.c(&quot;mass&quot;, &quot;birth_year&quot;)).as__data__frame</code></pre>
2935
+ <pre><code>## eye_color mean_mass mean_birth_year count
2936
+ ## 1 black 76.28571 33.00000 10
2937
+ ## 2 blue 86.51667 67.06923 19
2938
+ ## 3 blue-gray 77.00000 57.00000 1
2939
+ ## 4 brown 66.09231 108.96429 21
2940
+ ## 5 dark NaN NaN 1
2941
+ ## 6 gold NaN NaN 1
2942
+ ## 7 green, yellow 159.00000 NaN 1
2943
+ ## 8 hazel 66.00000 34.50000 3
2944
+ ## 9 orange 282.33333 231.00000 8
2945
+ ## 10 pink NaN NaN 1
2946
+ ## 11 red 81.40000 33.66667 5
2947
+ ## 12 red, blue NaN NaN 1
2948
+ ## 13 unknown 31.50000 NaN 3
2949
+ ## 14 white 48.00000 NaN 1
2950
+ ## 15 yellow 81.11111 76.38000 11</code></pre>
2729
2951
  <p>[TO BE CONTINUED…]</p>
2730
2952
  </div>
2953
+ </div>
2731
2954
  <div id="contributing" class="section level1">
2732
2955
  <h1>Contributing</h1>
2733
2956
  <ul>