RubyGems - galaaz - Versions diffs - 0.4.7 → 0.4.8 - Mend

galaaz 0.4.7 → 0.4.8

Files changed (68)

checksums.yaml +4 -4
data/README.md +1179 -39
data/Rakefile +4 -2
data/bin/grun +1 -1
data/bin/gstudio +1 -1
data/blogs/dev/dev.Rmd +2 -56
data/blogs/dev/dev.md +32 -61
data/blogs/dev/dev2.Rmd +65 -0
data/blogs/dplyr/dplyr.Rmd +29 -0
data/blogs/{dev/dev.html → dplyr/dplyr.html} +88 -57
data/blogs/dplyr/dplyr.md +58 -0
data/blogs/gknit/gknit.html +1262 -25
data/blogs/gknit/gknit.md +471 -27
data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
data/blogs/manual/graph.rb +29 -0
data/blogs/manual/manual.Rmd +567 -29
data/blogs/manual/manual.html +743 -46
data/blogs/manual/manual.md +1179 -39
data/blogs/nse_dplyr/nse_dplyr.Rmd +466 -11
data/blogs/nse_dplyr/nse_dplyr.html +472 -37
data/blogs/nse_dplyr/nse_dplyr.md +645 -32
data/blogs/ruby_plot/ruby_plot.Rmd +4 -4
data/blogs/ruby_plot/ruby_plot.html +217 -2
data/blogs/ruby_plot/ruby_plot.md +226 -1
data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +2 -2
data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +70 -70
data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +72 -72
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +116 -116
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +176 -176
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +236 -236
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +176 -176
data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +160 -160
data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +105 -105
data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +121 -121
data/examples/islr/ch2.spec.rb +1 -1
data/examples/islr/ch3_boston.rb +4 -4
data/examples/islr/x_y_rnorm.jpg +0 -0
data/lib/R_interface/r.rb +1 -1
data/lib/R_interface/r_methods.rb +2 -2
data/lib/R_interface/rdata_frame.rb +8 -5
data/lib/R_interface/rindexed_object.rb +1 -2
data/lib/R_interface/rlist.rb +1 -0
data/lib/R_interface/robject.rb +0 -1
data/lib/R_interface/rpkg.rb +14 -6
data/lib/R_interface/rsupport.rb +7 -9
data/lib/R_interface/ruby_extensions.rb +17 -5
data/lib/gknit/knitr_engine.rb +9 -2
data/lib/util/exec_ruby.rb +2 -2
data/specs/r_dataframe.spec.rb +173 -0
data/specs/r_list.spec.rb +4 -4
data/specs/ruby_expression.spec.rb +2 -11
data/specs/tmp.rb +76 -34
data/version.rb +1 -1
metadata +17 -6
data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0

data/blogs/nse_dplyr/nse_dplyr.html CHANGED

@@ -283,6 +283,9 @@ img {
 button.code-folding-btn:focus {
   outline: none;
 }
+summary {
+  display: list-item;
+}
 </style>
@@ -290,10 +293,71 @@ button.code-folding-btn:focus {
 <div class="container-fluid main-container">
 <!-- tabsets -->
+<style type="text/css">
+.tabset-dropdown > .nav-tabs {
+  display: inline-table;
+  max-height: 500px;
+  min-height: 44px;
+  overflow-y: auto;
+  background: white;
+  border: 1px solid #ddd;
+  border-radius: 4px;
+}
+.tabset-dropdown > .nav-tabs > li.active:before {
+  content: "";
+  font-family: 'Glyphicons Halflings';
+  display: inline-block;
+  padding: 10px;
+  border-right: 1px solid #ddd;
+}
+.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
+  content: "";
+  border: none;
+}
+.tabset-dropdown > .nav-tabs.nav-tabs-open:before {
+  content: "";
+  font-family: 'Glyphicons Halflings';
+  display: inline-block;
+  padding: 10px;
+  border-right: 1px solid #ddd;
+}
+.tabset-dropdown > .nav-tabs > li.active {
+  display: block;
+}
+.tabset-dropdown > .nav-tabs > li > a,
+.tabset-dropdown > .nav-tabs > li > a:focus,
+.tabset-dropdown > .nav-tabs > li > a:hover {
+  border: none;
+  display: inline-block;
+  border-radius: 4px;
+}
+.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
+  display: block;
+  float: none;
+}
+.tabset-dropdown > .nav-tabs > li {
+  display: none;
+}
+</style>
 <script>
 $(document).ready(function () {
   window.buildTabsets("TOC");
 });
+$(document).ready(function () {
+  $('.tabset-dropdown > .nav-tabs > li').click(function () {
+    $(this).parent().toggleClass('nav-tabs-open')
+  });
+});
 </script>
 <!-- code folding -->
@@ -302,7 +366,6 @@ $(document).ready(function () {
 <div class="fluid-row" id="header">
@@ -317,6 +380,63 @@ $(document).ready(function () {
 <div id="introduction" class="section level1">
 <h1>Introduction</h1>
+<p>In this post we will see how to program with dplyr in Galaaz.</p>
+<div id="but-first-what-is-galaaz" class="section level3">
+<h3>But first, what is Galaaz?</h3>
+<p>Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language with a large community and a very large set of libraries, and it is great for web development. However, it lacks libraries for data science, statistics, scientific plotting and machine learning. R, on the other hand, is considered one of the most powerful languages for solving all of these problems. Perhaps the strongest competitor to R is Python, with libraries such as NumPy, pandas, SciPy, scikit-learn and a few more.</p>
+<p>With Galaaz we do not intend to re-implement any of the scientific libraries in R. However, we allow for very tight coupling between the two languages to the point that the Ruby developer does not need to know that there is an R engine running. For this to happen we use new technologies provided by Oracle: GraalVM, TruffleRuby and FastR:</p>
+<pre><code> GraalVM is a universal virtual machine for running applications
+ written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java,
+ Scala, Kotlin, and LLVM-based languages such as C and C++.
+ GraalVM removes the isolation between programming languages and enables
+ interoperability in a shared runtime. It can run either standalone or in
+ the context of OpenJDK, Node.js, Oracle Database, or MySQL.
+ GraalVM allows you to write polyglot applications with a seamless way to
+ pass values from one language to another. With GraalVM there is no copying
+ or marshaling necessary as it is with other polyglot systems. This lets
+ you achieve high performance when language boundaries are crossed. Most
+ of the time there is no additional cost for crossing a language boundary
+ at all.
+ Often developers have to make uncomfortable compromises that require them
+ to rewrite their software in other languages. For example:
+  * “That library is not available in my language. I need to rewrite it.”
+  * “That language would be the perfect fit for my problem, but we cannot
+    run it in our environment.”
+  * “That problem is already solved in my language, but the language is
+    too slow.”
+With GraalVM we aim to allow developers to freely choose the right language
+for the task at hand without making compromises.</code></pre>
+<p>Interested readers should also check out the following sites:</p>
+<ul>
+<li><a href="https://www.graalvm.org/">GraalVM Home</a></li>
+<li><a href="https://github.com/oracle/truffleruby">TruffleRuby</a></li>
+<li><a href="https://github.com/oracle/fastr">FastR</a></li>
+<li><a href="https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb">Faster R with FastR</a></li>
+</ul>
+</div>
+<div id="now-to-programming-with-dplyr" class="section level3">
+<h3>Now to programming with dplyr</h3>
+<p>According to Hadley (<a href="https://dplyr.tidyverse.org/articles/programming.html" class="uri">https://dplyr.tidyverse.org/articles/programming.html</a>):</p>
+<blockquote>
+<p>Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that means they don’t follow the usual R rules of evaluation. Instead, they capture the expression that you typed and evaluate it in a custom way. This has two main benefits for dplyr code:</p>
+</blockquote>
+<blockquote>
+<p>Operations on data frames can be expressed succinctly because you don’t need to repeat the name of the data frame. For example, you can write filter(df, x == 1, y == 2, z == 3) instead of df[df$x == 1 &amp; df$y ==2 &amp; df$z == 3, ].</p>
+</blockquote>
+<blockquote>
+<p>dplyr can choose to compute results in a different way to base R. This is important for database backends because dplyr itself doesn’t do any work, but instead generates the SQL that tells the database what to do.</p>
+</blockquote>
+<blockquote>
+<p>Unfortunately these benefits do not come for free. There are two main drawbacks:</p>
+</blockquote>
+<blockquote>
+<p>Most dplyr arguments are not referentially transparent. That means you can’t replace a value with a seemingly equivalent object that you’ve defined elsewhere. In other words, this code:</p>
+</blockquote>
 <pre class="r"><code>df &lt;- data.frame(x = 1:3, y = 3:1)
 print(df)</code></pre>
 <pre><code>##   x y
@@ -330,63 +450,378 @@ print(df)</code></pre>
 #&gt;       x     y
 #&gt;   &lt;int&gt; &lt;int&gt;
 #&gt; 1     1     3</code></pre>
+<blockquote>
+<p>Is not equivalent to this code:</p>
+</blockquote>
 <pre class="r"><code>my_var &lt;- x
 #&gt; Error in eval(expr, envir, enclos): object 'x' not found
 filter(df, my_var == 1)
 #&gt; Error: object 'my_var' not found</code></pre>
+<blockquote>
+<p>This makes it hard to create functions with arguments that change how dplyr verbs are computed.</p>
+</blockquote>
+</div>
+</div>
+<div id="writing-expressions-in-galaaz" class="section level1">
+<h1>Writing Expressions in Galaaz</h1>
+<p>Galaaz extends Ruby to work with complex expressions, similar to R’s expressions built with ‘quote’ (base R) or ‘quo’ (tidyverse). Let’s take a look at some of those expressions.</p>
+<div id="expressions-from-operators" class="section level2">
+<h2>Expressions from operators</h2>
+<p>The code below creates an expression that sums two symbols:</p>
+<pre class="ruby"><code>exp1 = :a + :b
+puts exp1</code></pre>
+<pre><code>## a + b</code></pre>
+<p>We can build any complex mathematical expression</p>
+<pre class="ruby"><code>exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
+puts exp2</code></pre>
+<pre><code>## (a + b) * 2 + c^2L/z</code></pre>
+<p>It is also possible to use inequality operators in building expressions</p>
+<pre class="ruby"><code>exp3 = (:a + :b) &gt;= :z
+puts exp3</code></pre>
+<pre><code>## a + b &gt;= z</code></pre>
+<p>Galaaz provides both symbolic operators, such as (&gt;, &lt;, !=), and a functional notation for those operators, such as (.gt, .ge, etc.). So the same expression written above can also be written as</p>
+<pre class="ruby"><code>exp4 = (:a + :b).ge :z
+puts exp4</code></pre>
+<pre><code>## a + b &gt;= z</code></pre>
+<p>Two types of expression can only be created with the functional representation of the operators: expressions involving ‘==’ and ‘=’. To write an expression involving ‘==’ we need the method ‘.eq’, and for ‘=’ we need the method ‘.assign’.</p>
+<pre class="ruby"><code>exp5 = (:a + :b).eq :z
+puts exp5</code></pre>
+<pre><code>## a + b == z</code></pre>
+<pre class="ruby"><code>exp6 = :y.assign :a + :b
+puts exp6</code></pre>
+<pre><code>## y &lt;- a + b</code></pre>
+<p>In general we think that the functional notation is preferable to the symbolic notation, since otherwise we can end up writing invalid expressions such as</p>
+<pre class="ruby"><code>exp_wrong = (:a + :b) == :z
+puts exp_wrong</code></pre>
+<pre><code>## Message:
+##  Error in function (x, y, num.eq = TRUE, single.NA = TRUE, attrib.as.set = TRUE,  :
+##   object 'a' not found (RError)
+## Translated to internal error</code></pre>
+<p>and it might be difficult to understand what is going on here. The problem is that ‘==’ compares the expression (:a + :b) to the expression :z. When the comparison is executed, the system tries to evaluate :a, :b and :z; at that point those symbols are not bound to anything, so we get an “object ‘a’ not found” message.<br />
+If we only use functional notation, this type of error will never occur.</p>
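The difference between building an expression and comparing two objects can be sketched in plain Ruby, without GraalVM or R. This is a toy model only: the `Expr` class and the `var` helper are hypothetical names invented for illustration and are not how Galaaz is implemented. Here `+` and `.eq` return a new expression object, while the untouched `==` performs an ordinary Ruby comparison and returns a boolean.

```ruby
# Toy model only: Expr and var are hypothetical, not part of Galaaz.
class Expr
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # '+' builds a larger expression instead of computing a sum.
  def +(other)
    Expr.new("#{text} + #{Expr.wrap(other)}")
  end

  # Functional comparison operators also build expressions.
  { eq: "==", ge: ">=", gt: ">" }.each do |name, op|
    define_method(name) { |other| Expr.new("#{text} #{op} #{Expr.wrap(other)}") }
  end

  # Render either a nested Expr or a plain Ruby value as text.
  def self.wrap(value)
    value.is_a?(Expr) ? value.text : value.to_s
  end

  def to_s
    text
  end
end

# Wrap a symbol in an Expr (stands in for Galaaz's extended symbols).
def var(name)
  Expr.new(name.to_s)
end

built = (var(:a) + var(:b)).eq(var(:z))
puts built       # prints: a + b == z

# '==' is not overridden, so this is a plain object comparison.
compared = (var(:a) + var(:b)) == var(:z)
puts compared    # prints: false
```

In this sketch `==` quietly returns `false`; in Galaaz the comparison is handed to R, which is why the unbound symbols produce the error shown above.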
+</div>
+<div id="expressions-with-r-methods" class="section level2">
+<h2>Expressions with R methods</h2>
+<p>It is often necessary to create an expression that uses a method or function. For instance, in mathematics, it’s quite natural to write an expression such as <span class="math inline">\(y = sin(x)\)</span>. In this case, the ‘sin’ function is part of the expression and should not be executed immediately. Now, let’s say that ‘x’ is 45 (in radians, as R’s ‘sin’ expects) and we actually want our expression to be <span class="math inline">\(y = 0.850...\)</span>. When we want the function to be part of the expression, we call the function preceded by the letter E, such as ‘E.sin(x)’</p>
+<pre class="ruby"><code>exp7 = :y.assign E.sin(:x)
+puts exp7</code></pre>
+<pre><code>## y &lt;- sin(x)</code></pre>
+<p>However, if we want the function to be evaluated, then we make a normal function call through R, as in ‘R.sin(x)’.</p>
+<pre class="ruby"><code>x = 45
+exp8 = :y.assign R.sin(x)
+puts exp8</code></pre>
+<pre><code>## y &lt;- 0.850903524534118</code></pre>
+</div>
+</div>
+<div id="filtering-using-expressions" class="section level1">
+<h1>Filtering using expressions</h1>
+<p>Now that we know how to write expressions, we can use them to filter a data frame.<br />
+Let’s start by creating a simple data frame with two columns named ‘x’ and ‘y’</p>
 <pre class="ruby"><code>@df = R.data__frame(x: (1..3), y: (3..1))
-puts @df
-puts @df.filter(:x.eq 1)</code></pre>
+puts @df</code></pre>
 <pre><code>##   x y
 ## 1 1 3
 ## 2 2 2
-## 3 3 1
-##   x y
+## 3 3 1</code></pre>
+<p>In the code below we want to filter the data frame, keeping the rows in which the value of ‘x’ is equal to 1.</p>
+<pre class="ruby"><code>puts @df.filter(:x.eq 1)</code></pre>
+<pre><code>##   x y
 ## 1 1 3</code></pre>
+<p>In R, and when coding with ‘tidyverse’, arguments to a function are usually not <em>referentially transparent</em>. That is, you can’t replace a value with a seemingly equivalent object that you’ve defined elsewhere. In other words, this code</p>
+<pre class="r"><code>my_var &lt;- x
+filter(df, my_var == 1)</code></pre>
+<p>generates the following error: “object ‘x’ not found”.</p>
+<p>However, in Ruby and Galaaz, arguments are referentially transparent, as can be seen in the code below. Note, first, that ‘my_var = :x’ will not give the error “object ‘x’ not found”, since ‘:x’ is treated as an expression and assigned to my_var. Then, in (my_var.eq 1), my_var is a variable that resolves to ‘:x’, so the call becomes equivalent to (:x.eq 1), which is what we want.</p>
 <pre class="ruby"><code>my_var = :x
 puts @df.filter(my_var.eq 1)</code></pre>
 <pre><code>##   x y
 ## 1 1 3</code></pre>
-<blockquote>
+<p>As stated by Hadley</p>
 <blockquote>
 <p>dplyr code is ambiguous. Depending on what variables are defined where, filter(df, x == y) could be equivalent to any of:</p>
 </blockquote>
-</blockquote>
 <pre><code>df[df$x == df$y, ]
 df[df$x == y, ]
 df[x == df$y, ]
 df[x == y, ]</code></pre>
-<p>In galaaz this ambiguity does not exist</p>
-<pre class="ruby"><code>y = 2
-x = 2
-@df[:x.eq :y, :all]
-@df[:x.eq y, :all]
-# @df[x.eq :y, :all]
-# @df[x == y, :all]</code></pre>
-<pre><code>## Message:
-##  wrong number of arguments (given 2, expected 1)</code></pre>
+<p>In Galaaz this ambiguity does not exist: filter(df, x.eq y) is not a valid expression, as expressions are built with symbols. With filter(df, :x.eq y) we are looking for elements of the ‘x’ column that are equal to a previously defined y variable. Finally, with filter(df, :x.eq :y) we are looking for rows in which the ‘x’ column value is equal to the ‘y’ column value. This can be seen in the following two chunks of code:</p>
+<pre class="ruby"><code>@y = 1
+@x = 2
+# looking for values where the 'x' column is equal to the 'y' column
+puts @df.filter(:x.eq :y)</code></pre>
+<pre><code>##   x y
+## 1 2 2</code></pre>
+<pre class="ruby"><code># looking for values where the 'x' column is equal to the 'y' variable
+# in this case, the number 1
+puts @df.filter(:x.eq @y)</code></pre>
+<pre><code>##   x y
+## 1 1 3</code></pre>
+</div>
+<div id="writing-a-function-that-applies-to-different-data-sets" class="section level1">
+<h1>Writing a function that applies to different data sets</h1>
+<pre><code>mutate(df1, y = a + x)
+mutate(df2, y = a + x)
+mutate(df3, y = a + x)
+mutate(df4, y = a + x)</code></pre>
+<p>Here we create a mutate_y Ruby method.</p>
+<pre class="ruby"><code>def mutate_y(df)
+  df.mutate(:y.assign :a + :x)
+end</code></pre>
+<p>Note that, contrary to what happens in R, the method mutate_y will fail regardless of whether the variable ‘a’ is defined.</p>
+<pre class="ruby"><code>df1 = R.data__frame(x: (1..3))
+puts df1
+a = 10
+mutate_y(df1)</code></pre>
 <pre><code>## Message:
-##  /home/rbotafogo/desenv/galaaz/lib/R_interface/rbinary_operators.rb:134:in `eq'
-## (eval):4:in `exec_ruby'
-## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:138:in `instance_eval'
-## /home/rbotafogo/desenv/galaaz/lib/util/exec_ruby.rb:138:in `exec_ruby'
-## /home/rbotafogo/desenv/galaaz/lib/gknit/knitr_engine.rb:650:in `block in initialize'
-## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `call'
-## /home/rbotafogo/desenv/galaaz/lib/R_interface/ruby_callback.rb:77:in `callback'
-## (eval):3:in `function(...) {\n          rb_method(...)'
-## unknown.r:1:in `in_dir'
-## unknown.r:1:in `block_exec'
-## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:91:in `call_block'
-## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:6:in `process_group.block'
-## /home/rbotafogo/lib/graalvm-ce-1.0.0-rc12/jre/languages/R/library/knitr/R/block.R:3:in `&lt;no source&gt;'
-## unknown.r:1:in `withCallingHandlers'
-## unknown.r:1:in `process_file'
-## unknown.r:1:in `&lt;no source&gt;'
-## unknown.r:1:in `&lt;no source&gt;'
-## &lt;REPL&gt;:5:in `&lt;repl wrapper&gt;'
-## &lt;REPL&gt;:1</code></pre>
+##  Error in mutate_impl(.data, dots) :
+##   Evaluation error: object 'a' not found.
+## In addition: Warning message:
+## In mutate_impl(.data, dots) :
+##   mismatched protect/unprotect (unprotect with empty protect stack) (RError)
+## Translated to internal error</code></pre>
+</div>
+<div id="different-expressions" class="section level1">
+<h1>Different expressions</h1>
+<pre class="r"><code>df &lt;- data.frame(
+  g1 = c(1, 1, 2, 2, 2),
+  g2 = c(1, 2, 1, 2, 1),
+  a = sample(5),
+  b = sample(5)
+)
+d2 &lt;- df %&gt;%
+  group_by(g1) %&gt;%
+  summarise(a = mean(a))
+as.data.frame(d2)          </code></pre>
+<pre><code>##   g1 a
+## 1  1 3
+## 2  2 3</code></pre>
+<pre class="r"><code>d2 &lt;- df %&gt;%
+  group_by(g2) %&gt;%
+  summarise(a = mean(a))
+as.data.frame(d2)          </code></pre>
+<pre><code>##   g2        a
+## 1  1 3.666667
+## 2  2 2.000000</code></pre>
+<p>Writing a function in R that receives two arguments, the first a variable and the second an expression, is not trivial. As shown by Hadley, one might expect this function to do the trick:</p>
+<pre class="r"><code>my_summarise &lt;- function(df, group_var) {
+  df %&gt;%
+    group_by(group_var) %&gt;%
+    summarise(a = mean(a))
+}
+# my_summarise(df, g1)
+#&gt; Error: Column `group_var` is unknown</code></pre>
+<p>In order to solve this problem, coding with dplyr requires the introduction of many new concepts and functions such as ‘quo’, ‘quos’, ‘enquo’, ‘enquos’, ‘!!’ (bang bang), ‘!!!’ (triple bang).</p>
+<p>Now, let’s try to implement the same function in Galaaz. The next code block first prints the ‘df’ data frame defined previously in R, then creates the my_summarize method and calls it, passing the R data frame and the group-by variable ‘:g1’</p>
+<pre class="ruby"><code>puts ~:df
+print &quot;\n&quot;
+def my_summarize(df, group_var)
+  df.group_by(group_var).
+    summarize(a: E.mean(:a))
+end
+puts my_summarize((~:df), :g1).as__data__frame</code></pre>
+<pre><code>##   g1 g2 a b
+## 1  1  1 5 2
+## 2  1  2 1 5
+## 3  2  1 2 4
+## 4  2  2 3 1
+## 5  2  1 4 3
+##
+##   g1 a
+## 1  1 3
+## 2  2 3</code></pre>
+<p>It works! Well, let’s make sure this was not just a coincidence</p>
+<pre class="ruby"><code>puts my_summarize((~:df), :g2).as__data__frame</code></pre>
+<pre><code>##   g2        a
+## 1  1 3.666667
+## 2  2 2.000000</code></pre>
+<p>Great, everything is fine! No magic, no new functions, no complexities, just normal, standard Ruby code. If you’ve ever done NSE in R, this certainly feels much safer and easier to implement.</p>
+</div>
+<div id="different-input-variables" class="section level1">
+<h1>Different input variables</h1>
+<p>In the previous section we managed to get rid of all NSE formulation for a simple example, but does this remain true for more complex examples, or will the Ruby way prove impractical for more complex code?</p>
+<p>In the next example Hadley proposes that we write a function that, given an expression such as ‘a’ or ‘a * b’, calculates three summaries. We want a function that does the same as these R statements:</p>
+<pre><code>summarise(df, mean = mean(a), sum = sum(a), n = n())
+#&gt; # A tibble: 1 x 3
+#&gt;    mean   sum     n
+#&gt;   &lt;dbl&gt; &lt;int&gt; &lt;int&gt;
+#&gt; 1     3    15     5
+summarise(df, mean = mean(a * b), sum = sum(a * b), n = n())
+#&gt; # A tibble: 1 x 3
+#&gt;    mean   sum     n
+#&gt;   &lt;dbl&gt; &lt;int&gt; &lt;int&gt;
+#&gt; 1   9.6    48     5</code></pre>
+<p>Let’s try it in galaaz:</p>
+<pre class="ruby"><code>def my_summarise2(df, expr)
+  df.summarize(
+    mean: E.mean(expr),
+    sum: E.sum(expr),
+    n: E.n
+  )
+end
+puts my_summarise2((~:df), :a)
+puts my_summarise2((~:df), :a * :b)</code></pre>
+<pre><code>##   mean sum n
+## 1    3  15 5
+##   mean sum n
+## 1  7.6  38 5</code></pre>
+<p>Once again, there is no need to use any special theory or functions. The only point to be careful about is the use of ‘E’ to build an expression that uses the mean, sum and n.</p>
+</div>
+<div id="different-input-and-output-variable" class="section level1">
+<h1>Different input and output variable</h1>
+<p>The next challenge presented by Hadley is to vary the names of the output variables based on the received expression. So, if the input expression is ‘a’, we want our data frame columns to be named ‘mean_a’ and ‘sum_a’. If the input expression is ‘b’, the columns should be named ‘mean_b’ and ‘sum_b’.</p>
+<pre><code>mutate(df, mean_a = mean(a), sum_a = sum(a))
+#&gt; # A tibble: 5 x 6
+#&gt;      g1    g2     a     b mean_a sum_a
+#&gt;   &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; &lt;int&gt;  &lt;dbl&gt; &lt;int&gt;
+#&gt; 1     1     1     1     3      3    15
+#&gt; 2     1     2     4     2      3    15
+#&gt; 3     2     1     2     1      3    15
+#&gt; 4     2     2     5     4      3    15
+#&gt; # … with 1 more row
+mutate(df, mean_b = mean(b), sum_b = sum(b))
+#&gt; # A tibble: 5 x 6
+#&gt;      g1    g2     a     b mean_b sum_b
+#&gt;   &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt; &lt;int&gt;  &lt;dbl&gt; &lt;int&gt;
+#&gt; 1     1     1     1     3      3    15
+#&gt; 2     1     2     4     2      3    15
+#&gt; 3     2     1     2     1      3    15
+#&gt; 4     2     2     5     4      3    15
+#&gt; # … with 1 more row</code></pre>
+<p>Here is our Ruby code</p>
+<pre class="ruby"><code>def my_mutate(df, expr)
+  mean_name = &quot;mean_#{expr.to_s}&quot;
+  sum_name = &quot;sum_#{expr.to_s}&quot;
+  df.mutate(mean_name =&gt; E.mean(expr),
+            sum_name =&gt; E.sum(expr))
+end
+puts my_mutate((~:df), :a)
+puts my_mutate((~:df), :b)</code></pre>
+<pre><code>##   g1 g2 a b mean_a sum_a
+## 1  1  1 5 2      3    15
+## 2  1  2 1 5      3    15
+## 3  2  1 2 4      3    15
+## 4  2  2 3 1      3    15
+## 5  2  1 4 3      3    15
+##   g1 g2 a b mean_b sum_b
+## 1  1  1 5 2      3    15
+## 2  1  2 1 5      3    15
+## 3  2  1 2 4      3    15
+## 4  2  2 3 1      3    15
+## 5  2  1 4 3      3    15</code></pre>
+<p>It really seems that “Non Standard Evaluation” is actually quite standard in Galaaz! But you might have noticed a small change in the way the arguments to the mutate method were passed. In a previous example we used df.summarise(mean: E.mean(:a), …), where the column name was followed by a ‘:’ (colon). In this example we have df.mutate(mean_name =&gt; E.mean(expr), …), where the variable mean_name is followed not by ‘:’ but by ‘=&gt;’. This is standard Ruby notation.</p>
+<p>[explain….]</p>
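The two hash notations mentioned above behave differently in plain Ruby, which can be checked without Galaaz. In the sketch below, `column_names` is a hypothetical helper that just returns the keys it receives; it stands in for a method such as mutate that reads column names from its hash argument. With `name: value` the key is always the literal symbol, whereas with `name => value` the key is whatever the variable currently holds.

```ruby
# Hypothetical helper: returns the keys of the hash it is given.
def column_names(columns)
  columns.keys
end

expr = :a
mean_name = "mean_#{expr}"   # the string "mean_a"

# 'mean_name:' is a literal symbol key; the local variable is ignored.
literal = column_names(mean_name: 1)

# 'mean_name =>' uses the value stored in the variable as the key.
computed = column_names(mean_name => 1)

p literal    # prints: [:mean_name]
p computed   # prints: ["mean_a"]
```

This is why a column name that is computed at run time has to be passed with `=>`.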
+</div>
+<div id="capturing-multiple-variables" class="section level1">
+<h1>Capturing multiple variables</h1>
+<pre class="ruby"><code>def my_summarise3(df, *group_vars)
+  df.group_by(*group_vars).
+    summarise(a: E.mean(:a))
+end
+puts my_summarise3((~:df), :g1, :g2).as__data__frame</code></pre>
+<pre><code>##   g1 g2 a
+## 1  1  1 5
+## 2  1  2 1
+## 3  2  1 3
+## 4  2  2 3</code></pre>
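The `*group_vars` parameter in my_summarise3 relies on Ruby's splat operator, shown here in plain Ruby. This is a minimal sketch: `group_by_columns` and `summarise_by` are hypothetical stand-ins for a grouping call and its wrapper, not Galaaz methods. In a parameter list the splat collects any number of arguments into an array, and in a call it spreads that array back into separate arguments.

```ruby
# Hypothetical stand-in for a grouping call that accepts any number of columns.
def group_by_columns(*columns)
  columns
end

def summarise_by(*group_vars)
  # group_vars is an Array here; the splat passes its elements one by one.
  group_by_columns(*group_vars)
end

one = summarise_by(:g1)
two = summarise_by(:g1, :g2)

p one   # prints: [:g1]
p two   # prints: [:g1, :g2]
```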
+</div>
+<div id="advanced-dplyr-features" class="section level1">
+<h1>Advanced dplyr features</h1>
+<p><a href="https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/" class="uri">https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/</a></p>
+<pre class="ruby"><code>puts (~:starwars).head.as__data__frame</code></pre>
+<pre><code>##             name height mass  hair_color  skin_color eye_color birth_year
+## 1 Luke Skywalker    172   77       blond        fair      blue       19.0
+## 2          C-3PO    167   75        &lt;NA&gt;        gold    yellow      112.0
+## 3          R2-D2     96   32        &lt;NA&gt; white, blue       red       33.0
+## 4    Darth Vader    202  136        none       white    yellow       41.9
+## 5    Leia Organa    150   49       brown       light     brown       19.0
+## 6      Owen Lars    178  120 brown, grey       light      blue       52.0
+##   gender homeworld species
+## 1   male  Tatooine   Human
+## 2   &lt;NA&gt;  Tatooine   Droid
+## 3   &lt;NA&gt;     Naboo   Droid
+## 4   male  Tatooine   Human
+## 5 female  Alderaan   Human
+## 6   male  Tatooine   Human
+##                                                                                                                                       films
+## 1                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
+## 2                    Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
+## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
+## 4                                                              Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
+## 5                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
+## 6                                                                                     Attack of the Clones, Revenge of the Sith, A New Hope
+##                             vehicles                starships
+## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
+## 2
+## 3
+## 4                                             TIE Advanced x1
+## 5              Imperial Speeder Bike
+## 6</code></pre>
+<pre class="r"><code>grouped_mean &lt;- function(data, grouping_variables, value_variables) {
+  data %&gt;%
+    group_by_at(grouping_variables) %&gt;%
+    mutate(count = n()) %&gt;%
+    summarise_at(c(value_variables, &quot;count&quot;), mean, na.rm = TRUE) %&gt;%
+    rename_at(value_variables, funs(paste0(&quot;mean_&quot;, .)))
+    }
+gm = starwars %&gt;%
+   grouped_mean(&quot;eye_color&quot;, c(&quot;mass&quot;, &quot;birth_year&quot;))
+as.data.frame(gm)   </code></pre>
+<pre><code>##        eye_color mean_mass mean_birth_year count
+## 1          black  76.28571        33.00000    10
+## 2           blue  86.51667        67.06923    19
+## 3      blue-gray  77.00000        57.00000     1
+## 4          brown  66.09231       108.96429    21
+## 5           dark       NaN             NaN     1
+## 6           gold       NaN             NaN     1
+## 7  green, yellow 159.00000             NaN     1
+## 8          hazel  66.00000        34.50000     3
+## 9         orange 282.33333       231.00000     8
+## 10          pink       NaN             NaN     1
+## 11           red  81.40000        33.66667     5
+## 12     red, blue       NaN             NaN     1
+## 13       unknown  31.50000             NaN     3
+## 14         white  48.00000             NaN     1
+## 15        yellow  81.11111        76.38000    11</code></pre>
+<pre class="ruby"><code>def grouped_mean(data, grouping_variables, value_variables)
+  data.
+    group_by_at(grouping_variables).
+    mutate(count: E.n).
+    summarise_at(E.c(value_variables, &quot;count&quot;), ~:mean, na__rm: true).
+    rename_at(value_variables, R.funs(E.paste0(&quot;mean_&quot;, value_variables)))
+end
+puts grouped_mean((~:starwars), &quot;eye_color&quot;, R.c(&quot;mass&quot;, &quot;birth_year&quot;)).as__data__frame</code></pre>
+<pre><code>##        eye_color mean_mass mean_birth_year count
+## 1          black  76.28571        33.00000    10
+## 2           blue  86.51667        67.06923    19
+## 3      blue-gray  77.00000        57.00000     1
+## 4          brown  66.09231       108.96429    21
+## 5           dark       NaN             NaN     1
+## 6           gold       NaN             NaN     1
+## 7  green, yellow 159.00000             NaN     1
+## 8          hazel  66.00000        34.50000     3
+## 9         orange 282.33333       231.00000     8
+## 10          pink       NaN             NaN     1
+## 11           red  81.40000        33.66667     5
+## 12     red, blue       NaN             NaN     1
+## 13       unknown  31.50000             NaN     3
+## 14         white  48.00000             NaN     1
+## 15        yellow  81.11111        76.38000    11</code></pre>
 </div>