galaaz 0.4.7 → 0.4.8

Files changed (68)
  1. checksums.yaml +4 -4
  2. data/README.md +1179 -39
  3. data/Rakefile +4 -2
  4. data/bin/grun +1 -1
  5. data/bin/gstudio +1 -1
  6. data/blogs/dev/dev.Rmd +2 -56
  7. data/blogs/dev/dev.md +32 -61
  8. data/blogs/dev/dev2.Rmd +65 -0
  9. data/blogs/dplyr/dplyr.Rmd +29 -0
  10. data/blogs/{dev/dev.html → dplyr/dplyr.html} +88 -57
  11. data/blogs/dplyr/dplyr.md +58 -0
  12. data/blogs/gknit/gknit.html +1262 -25
  13. data/blogs/gknit/gknit.md +471 -27
  14. data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
  15. data/blogs/manual/graph.rb +29 -0
  16. data/blogs/manual/manual.Rmd +567 -29
  17. data/blogs/manual/manual.html +743 -46
  18. data/blogs/manual/manual.md +1179 -39
  19. data/blogs/nse_dplyr/nse_dplyr.Rmd +466 -11
  20. data/blogs/nse_dplyr/nse_dplyr.html +472 -37
  21. data/blogs/nse_dplyr/nse_dplyr.md +645 -32
  22. data/blogs/ruby_plot/ruby_plot.Rmd +4 -4
  23. data/blogs/ruby_plot/ruby_plot.html +217 -2
  24. data/blogs/ruby_plot/ruby_plot.md +226 -1
  25. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
  26. data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +2 -2
  27. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
  28. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +70 -70
  29. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
  30. data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +72 -72
  31. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
  32. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +116 -116
  33. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
  34. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +176 -176
  35. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
  36. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
  37. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +236 -236
  38. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
  39. data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +176 -176
  40. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
  41. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +160 -160
  42. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
  43. data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +105 -105
  44. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
  45. data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +121 -121
  46. data/examples/islr/ch2.spec.rb +1 -1
  47. data/examples/islr/ch3_boston.rb +4 -4
  48. data/examples/islr/x_y_rnorm.jpg +0 -0
  49. data/lib/R_interface/r.rb +1 -1
  50. data/lib/R_interface/r_methods.rb +2 -2
  51. data/lib/R_interface/rdata_frame.rb +8 -5
  52. data/lib/R_interface/rindexed_object.rb +1 -2
  53. data/lib/R_interface/rlist.rb +1 -0
  54. data/lib/R_interface/robject.rb +0 -1
  55. data/lib/R_interface/rpkg.rb +14 -6
  56. data/lib/R_interface/rsupport.rb +7 -9
  57. data/lib/R_interface/ruby_extensions.rb +17 -5
  58. data/lib/gknit/knitr_engine.rb +9 -2
  59. data/lib/util/exec_ruby.rb +2 -2
  60. data/specs/r_dataframe.spec.rb +173 -0
  61. data/specs/r_list.spec.rb +4 -4
  62. data/specs/ruby_expression.spec.rb +2 -11
  63. data/specs/tmp.rb +76 -34
  64. data/version.rb +1 -1
  65. metadata +17 -6
  66. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  67. data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
  68. data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
@@ -22,16 +22,87 @@ library('tibble')
22
22
 
23
23
  # Introduction
24
24
 
25
- ```{r tibble}
26
- df <- data.frame(x = 1:3, y = 3:1)
27
- print(df)
25
+ In this post we will see how to program with dplyr in Galaaz.
26
+
27
+ ### But first, what is Galaaz?
28
+
29
+ Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with
30
+ a large community, a very large set of libraries and great for web development. However,
31
+ it lacks libraries for data science, statistics, scientific plotting and machine learning.
32
+ On the other hand, R is considered one of the most powerful languages for solving all of the
33
+ above problems. Maybe the strongest competitor to R is Python with libraries such as NumPy,
34
+ pandas, SciPy, scikit-learn and a few more.
35
+
36
+ With Galaaz we do not intend to re-implement any of the scientific libraries in R. However, we
37
+ allow for very tight coupling between the two languages to the point that the Ruby
38
+ developer does not need to know that there is an R engine running. For this to happen we
39
+ use new technologies provided by Oracle: GraalVM, TruffleRuby and FastR:
40
+
41
+ GraalVM is a universal virtual machine for running applications
42
+ written in JavaScript, Python 3, Ruby, R, JVM-based languages like Java,
43
+ Scala, Kotlin, and LLVM-based languages such as C and C++.
44
+
45
+ GraalVM removes the isolation between programming languages and enables
46
+ interoperability in a shared runtime. It can run either standalone or in
47
+ the context of OpenJDK, Node.js, Oracle Database, or MySQL.
48
+
49
+ GraalVM allows you to write polyglot applications with a seamless way to
50
+ pass values from one language to another. With GraalVM there is no copying
51
+ or marshaling necessary as it is with other polyglot systems. This lets
52
+ you achieve high performance when language boundaries are crossed. Most
53
+ of the time there is no additional cost for crossing a language boundary
54
+ at all.
55
+
56
+ Often developers have to make uncomfortable compromises that require them
57
+ to rewrite their software in other languages. For example:
58
+
59
+ * “That library is not available in my language. I need to rewrite it.”
60
+ * “That language would be the perfect fit for my problem, but we cannot
61
+ run it in our environment.”
62
+ * “That problem is already solved in my language, but the language is
63
+ too slow.”
64
+
65
+ With GraalVM we aim to allow developers to freely choose the right language
66
+ for the task at hand without making compromises.
67
+
68
+ Interested readers should also check out the following sites:
69
+
70
+ * [GraalVM Home](https://www.graalvm.org/)
71
+ * [TruffleRuby](https://github.com/oracle/truffleruby)
72
+ * [FastR](https://github.com/oracle/fastr)
73
+ * [Faster R with FastR](https://medium.com/graalvm/faster-r-with-fastr-4b8db0e0dceb)
74
+
75
+ ### Programming with dplyr
76
+
77
+ According to Hadley Wickham (https://dplyr.tidyverse.org/articles/programming.html):
78
+
79
+ > Most dplyr functions use non-standard evaluation (NSE). This is a catch-all term that
80
+ > means they don’t follow the usual R rules of evaluation. Instead, they capture the
81
+ > expression that you typed and evaluate it in a custom way. This has two main
82
+ > benefits for dplyr code:
83
+
84
+ > Operations on data frames can be expressed succinctly because you don’t need to repeat
85
+ > the name of the data frame. For example, you can write filter(df, x == 1, y == 2, z == 3)
86
+ > instead of df[df$x == 1 & df$y ==2 & df$z == 3, ].
87
+
88
+ > dplyr can choose to compute results in a different way to base R. This is important for
89
+ > database backends because dplyr itself doesn’t do any work, but instead generates the SQL
90
+ > that tells the database what to do.
91
+
92
+ > Unfortunately these benefits do not come for free. There are two main drawbacks:
28
93
 
94
+ > Most dplyr arguments are not referentially transparent. That means you can’t replace a value
95
+ > with a seemingly equivalent object that you’ve defined elsewhere. In other words, this code:
96
+
97
+ ```{r tibble, eval=FALSE}
98
+ df <- data.frame(x = 1:3, y = 3:1)
29
99
  print(filter(df, x == 1))
30
100
  #> # A tibble: 1 x 2
31
101
  #> x y
32
102
  #> <int> <int>
33
103
  #> 1 1 3
34
104
  ```
105
+ > Is not equivalent to this code:
35
106
 
36
107
  ```{r my_var_err, eval = FALSE}
37
108
  my_var <- x
@@ -39,18 +110,164 @@ my_var <- x
39
110
  filter(df, my_var == 1)
40
111
  #> Error: object 'my_var' not found
41
112
  ```
113
+ > This makes it hard to create functions with arguments that change how dplyr verbs are computed.
114
+
115
+ In this post we will see that programming with dplyr in Galaaz does not require knowledge of
116
+ non-standard evaluation in R and can be accomplished by utilizing normal Ruby constructs.
117
+
118
+ # Writing Expressions in Galaaz
119
+
120
+ Galaaz extends Ruby to work with complex expressions, similar to R's expressions built with 'quote'
121
+ (base R) or 'quo' (tidyverse). Let's take a look at some of those expressions.
122
+
123
+ ## Expressions from operators
124
+
125
+ The code below
126
+ creates an expression summing two symbols
127
+
128
+ ```{ruby expressions}
129
+ exp1 = :a + :b
130
+ puts exp1
131
+ ```
132
+ We can build any complex mathematical expression
133
+
134
+ ```{ruby expr2}
135
+ exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
136
+ puts exp2
137
+ ```
138
+
139
+ It is also possible to use inequality operators in building expressions
140
+
141
+ ```{ruby expr3}
142
+ exp3 = (:a + :b) >= :z
143
+ puts exp3
144
+ ```
145
+
146
+ Galaaz provides both symbolic representations of operators, such as (>, <, !=), and a functional
147
+ notation for those operators, such as (.gt, .ge, etc.). So the same expression written
148
+ above can also be written as
149
+
150
+ ```{ruby expr4}
151
+ exp4 = (:a + :b).ge :z
152
+ puts exp4
153
+ ```
154
+
155
+ Two types of expression can only be created with the functional representation of the operators:
156
+ those involving '==' and '='. In order to write an expression involving '==' we
157
+ need to use the method '.eq' and for '=' we need the method '.assign':
158
+
159
+ ```{ruby expr5}
160
+ exp5 = (:a + :b).eq :z
161
+ puts exp5
162
+ ```
163
+
164
+ ```{ruby expr6}
165
+ exp6 = :y.assign :a + :b
166
+ puts exp6
167
+ ```
168
+ In general we think that using the functional notation is preferable to using the
169
+ symbolic notation as otherwise, we end up writing invalid expressions such as
170
+
171
+ ```{ruby exp_wrong, warning=FALSE}
172
+ exp_wrong = (:a + :b) == :z
173
+ puts exp_wrong
174
+ ```
175
+ and it might be difficult to understand what is going on here. The problem lies with the fact that
176
+ when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When the
177
+ comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols at
178
+ this time are not bound to anything, so we get an "object 'a' not found" message.
179
+ If we only use functional notation, this type of error will not occur.
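
The difference between the symbolic and the functional notation can be illustrated in plain Ruby, without Galaaz. The sketch below is only an illustration under stated assumptions: `Node` is a hypothetical class invented here, not Galaaz's implementation. It shows that operators a class defines itself can return an unevaluated expression, while Ruby's built-in `==` is executed immediately and yields a boolean, which is one plausible reason a named method such as `eq` is needed.

```ruby
# Illustrative sketch only: Node is a hypothetical class, not part of Galaaz.
class Node
  def initialize(text)
    @text = text
  end

  # Wrap a variable name so that it can take part in expressions.
  def self.var(name)
    new(name.to_s)
  end

  # Leave existing nodes alone; turn numbers and other values into nodes.
  def self.wrap(value)
    value.is_a?(Node) ? value : new(value.to_s)
  end

  # Each operator returns a new, unevaluated node instead of a result.
  %i[+ - * / ** > >= < <=].each do |op|
    define_method(op) do |other|
      Node.new("(#{self} #{op} #{Node.wrap(other)})")
    end
  end

  # Functional notation for '==', analogous in spirit to '.eq'.
  def eq(other)
    Node.new("(#{self} == #{Node.wrap(other)})")
  end

  def to_s
    @text
  end
end

a = Node.var(:a)
b = Node.var(:b)
z = Node.var(:z)

deferred_ge = (a + b) >= z     # builds an expression
deferred_eq = (a + b).eq(z)    # builds an expression
eager_eq    = (a + b) == z     # Ruby's built-in '==' runs immediately

puts deferred_ge       # ((a + b) >= z)
puts deferred_eq       # ((a + b) == z)
puts eager_eq.inspect  # false
```

In this sketch the last line prints a plain boolean rather than an expression, which mirrors the kind of surprise described above.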
180
+
181
+ ## Expressions with R methods
182
+
183
+ It is often necessary to create an expression that uses a method or function. For instance, in
184
+ mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
185
+ 'sin' function is part of the expression and should not be immediately executed. Now, let's say
186
+ that 'x' is 45 (in radians, as R's 'sin' expects) and we actually want our expression to be $y = 0.850...$.
187
+ When we want the function to be part of the expression, we call the function preceding it
188
+ by the letter E, such as 'E.sin(x)'
189
+
190
+ ```{ruby method_expression}
191
+ exp7 = :y.assign E.sin(:x)
192
+ puts exp7
193
+ ```
194
+
195
+ # Using Galaaz to call R functions
196
+
197
+ Galaaz tries to emulate as closely as possible the way R functions are called and migrating from
198
+ R to Galaaz should be quite easy requiring only minor syntactic changes to an R script. In
199
+ this post, we do not have enough space to write a complete manual for Galaaz, so we will
200
+ present only a few example scripts using Galaaz.
201
+
202
+ Basically, to call an R function from Ruby with Galaaz, one only needs to precede the function
203
+ with 'R.'. For instance, to create a vector in R, the 'c' function is used. From Galaaz, a
204
+ vector can be created by using 'R.c':
205
+
206
+ ```{ruby vector}
207
+ @vec = R.c(1.0, 2, 3)
208
+ puts @vec
209
+ ```
210
+ A list is created in R with the 'list' function, so in Galaaz we do:
211
+
212
+ ```{ruby list}
213
+ @list = R.list(a: 1.0, b: 2, c: 3)
214
+ puts @list
215
+ ```
216
+ Note that we can use named arguments in our list. The same code in R would be:
217
+
218
+ ```{r list2}
219
+ lst = list(a = 1, b = 2L, c = 3L)
220
+ print(lst)
221
+ ```
222
+
223
+
224
+ However, if we want the function to be evaluated, then
225
+ we use the normal call to function with R as 'R.sin(x)'.
226
+
227
+ ```{ruby method_exc}
228
+ x = 45
229
+ exp8 = :y.assign R.sin(x)
230
+ puts exp8
231
+ ```
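
As a quick sanity check on the value 0.850..., the arithmetic can be reproduced in plain Ruby, whose `Math.sin` takes its argument in radians just as R's `sin` does. This is only a worked example; it does not call Galaaz or R.

```ruby
# Plain Ruby; Math.sin, like R's sin, expects its argument in radians.
x = 45

as_radians = Math.sin(x)                     # 45 interpreted as radians
as_degrees = Math.sin(x * Math::PI / 180.0)  # 45 degrees converted to radians first

puts as_radians.round(4)  # 0.8509
puts as_degrees.round(4)  # 0.7071
```

So the 0.850... in the text corresponds to passing 45 directly to 'sin'; an angle of 45 degrees would first have to be converted to radians.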
232
+
233
+
234
+ # Filtering using expressions
235
+
236
+ Now that we know how to write expressions, we can use them to filter a data frame.
237
+ Let's first start by creating a simple data frame with two columns named 'x' and 'y'
42
238
 
43
239
  ```{ruby df}
44
240
  @df = R.data__frame(x: (1..3), y: (3..1))
45
241
  puts @df
242
+ ```
243
+ In the code below we want to filter the data frame by rows in which the value of 'x' is
244
+ equal to 1.
46
245
 
246
+ ```{ruby filter_exp}
47
247
  puts @df.filter(:x.eq 1)
48
248
  ```
49
249
 
250
+ In R, and when coding with 'tidyverse', arguments to a function are usually not
251
+ *referentially transparent*. That is, you can’t replace a value with a seemingly equivalent
252
+ object that you’ve defined elsewhere. In other words, this code
253
+
254
+ ```{r not_transp, eval=FALSE}
255
+ my_var <- x
256
+ filter(df, my_var == 1)
257
+ ```
258
+ generates the following error: "object 'x' not found".
259
+
260
+ However, in Ruby and Galaaz, arguments are referentially transparent as can be seen by the
261
+ code below. Note, initially, that 'my_var = :x' will not give the error "object 'x' not found"
262
+ since ':x' is treated as an expression and assigned to my\_var. Then when doing (my\_var.eq 1),
263
+ my\_var is a variable that resolves to ':x' and it becomes equivalent to (:x.eq 1) which is
264
+ what we want.
265
+
50
266
  ```{ruby my_var}
51
267
  my_var = :x
52
268
  puts @df.filter(my_var.eq 1)
53
269
  ```
270
+ As stated by Hadley
54
271
 
55
272
  > dplyr code is ambiguous. Depending on what variables are defined where,
56
273
  > filter(df, x == y) could be equivalent to any of:
@@ -61,15 +278,253 @@ df[df$x == y, ]
61
278
  df[x == df$y, ]
62
279
  df[x == y, ]
63
280
  ```
281
+ In Galaaz this ambiguity does not exist: filter(df, x.eq y) is not a valid expression as
282
+ expressions are built with symbols. In doing filter(df, :x.eq y) we are looking for elements
283
+ of the 'x' column that are equal to a previously defined y variable. Finally, with
284
+ filter(df, :x.eq :y) we are looking for elements in which the 'x' column value is equal to
285
+ the 'y' column value. This can be seen in the following two chunks of code:
64
286
 
65
- In galaaz this ambiguity does not exist
287
+ ```{ruby disamb1}
288
+ @y = 1
289
+ @x = 2
290
+
291
+ # looking for values where the 'x' column is equal to the 'y' column
292
+ puts @df.filter(:x.eq :y)
293
+ ```
294
+
295
+ ```{ruby disamb2}
296
+ # looking for values where the 'x' column is equal to the 'y' variable
297
+ # in this case, the number 1
298
+ puts @df.filter(:x.eq @y)
299
+ ```
300
+ # Writing a function that applies to different data sets
301
+
302
+
303
+
304
+ ```
305
+ mutate(df1, y = a + x)
306
+ mutate(df2, y = a + x)
307
+ mutate(df3, y = a + x)
308
+ mutate(df4, y = a + x)
309
+ ```
310
+
311
+ Here we create a mutate_y Ruby method.
312
+
313
+ ```{ruby mutate_y, warning=FALSE}
314
+ def mutate_y(df)
315
+ df.mutate(:y.assign :a + :x)
316
+ end
317
+ ```
318
+
319
+ Note that, contrary to what happens in R, method mutate_y will fail regardless of whether
320
+ variable 'a' is defined or not.
321
+
322
+ ```{ruby call_mutate, warning=FALSE}
323
+ df1 = R.data__frame(x: (1..3))
324
+ puts df1
325
+ a = 10
326
+ mutate_y(df1)
327
+ ```
328
+
329
+ # Different expressions
330
+
331
+ ```{r diff_expr}
332
+ df <- data.frame(
333
+ g1 = c(1, 1, 2, 2, 2),
334
+ g2 = c(1, 2, 1, 2, 1),
335
+ a = sample(5),
336
+ b = sample(5)
337
+ )
338
+
339
+ d2 <- df %>%
340
+ group_by(g1) %>%
341
+ summarise(a = mean(a))
342
+
343
+ as.data.frame(d2)
344
+
345
+ d2 <- df %>%
346
+ group_by(g2) %>%
347
+ summarise(a = mean(a))
348
+
349
+ as.data.frame(d2)
350
+ ```
351
+
352
+ Trying to write a function in R that will receive two arguments, the first a variable and
353
+ the second an expression, is not trivial. As shown by Hadley, one might expect this function
354
+ to do the trick:
355
+
356
+ ```{r diff_exp_fnc}
357
+ my_summarise <- function(df, group_var) {
358
+ df %>%
359
+ group_by(group_var) %>%
360
+ summarise(a = mean(a))
361
+ }
362
+
363
+ # my_summarise(df, g1)
364
+ #> Error: Column `group_var` is unknown
365
+ ```
366
+
367
+ In order to solve this problem, coding with dplyr requires the introduction of many new concepts
368
+ and functions such as 'quo', 'quos', 'enquo', 'enquos', '!!' (bang bang), '!!!' (triple bang).
369
+
370
+ Now, let's try to implement the same function in galaaz. The next code block first prints the
371
+ 'df' data frame defined previously in R, then creates the my_summarize function and calls it
372
+ passing the R data frame and the group by variable ':g1'
373
+
374
+ ```{ruby diff_exp_ruby_func}
375
+ puts ~:df
376
+ print "\n"
377
+
378
+ def my_summarize(df, group_var)
379
+ df.group_by(group_var).
380
+ summarize(a: E.mean(:a))
381
+ end
382
+
383
+ puts my_summarize((~:df), :g1).as__data__frame
384
+ ```
385
+ It works!!! Well let's make sure this was not just some coincidence
386
+
387
+ ```{ruby group_g2}
388
+ puts my_summarize((~:df), :g2).as__data__frame
389
+ ```
390
+
391
+ Great, everything is fine! No magic, no new functions, no complexities, just normal, standard Ruby
392
+ code. If you've ever done NSE in R, this certainly feels much safer and easier to implement.
393
+
394
+ # Different input variables
395
+
396
+ In the previous section we've managed to get rid of all NSE formulation for a simple example, but
397
+ does this remain true for more complex examples, or will the Ruby way prove impractical for
398
+ more complex code?
399
+
400
+ In the next example Hadley proposes that we write a function that, given an expression such as 'a'
401
+ or 'a * b', calculates three summaries. We want a function that does the same as these R
402
+ statements:
403
+
404
+ ```
405
+ summarise(df, mean = mean(a), sum = sum(a), n = n())
406
+ #> # A tibble: 1 x 3
407
+ #> mean sum n
408
+ #> <dbl> <int> <int>
409
+ #> 1 3 15 5
410
+
411
+ summarise(df, mean = mean(a * b), sum = sum(a * b), n = n())
412
+ #> # A tibble: 1 x 3
413
+ #> mean sum n
414
+ #> <dbl> <int> <int>
415
+ #> 1 9.6 48 5
416
+ ```
417
+
418
+ Let's try it in galaaz:
419
+
420
+ ```{ruby summarize_method}
421
+ def my_summarise2(df, expr)
422
+ df.summarize(
423
+ mean: E.mean(expr),
424
+ sum: E.sum(expr),
425
+ n: E.n
426
+ )
427
+ end
428
+
429
+ puts my_summarise2((~:df), :a)
430
+ puts my_summarise2((~:df), :a * :b)
431
+ ```
432
+
433
+ Once again, there is no need to use any special theory or functions. The only point to be
434
+ careful about is the use of 'E' to build an expression that uses the mean, sum and n.
435
+
436
+ # Different input and output variable
437
+
438
+ Now the next challenge presented by Hadley is to vary the name of the output variables based on
439
+ the received expression. So, if the input expression is 'a', we want our data frame columns to
440
+ be named 'mean\_a' and 'sum\_a'. Now, if the input expression is 'b', columns
441
+ should be named 'mean\_b' and 'sum\_b'.
442
+
443
+ ```
444
+ mutate(df, mean_a = mean(a), sum_a = sum(a))
445
+ #> # A tibble: 5 x 6
446
+ #> g1 g2 a b mean_a sum_a
447
+ #> <dbl> <dbl> <int> <int> <dbl> <int>
448
+ #> 1 1 1 1 3 3 15
449
+ #> 2 1 2 4 2 3 15
450
+ #> 3 2 1 2 1 3 15
451
+ #> 4 2 2 5 4 3 15
452
+ #> # … with 1 more row
453
+
454
+ mutate(df, mean_b = mean(b), sum_b = sum(b))
455
+ #> # A tibble: 5 x 6
456
+ #> g1 g2 a b mean_b sum_b
457
+ #> <dbl> <dbl> <int> <int> <dbl> <int>
458
+ #> 1 1 1 1 3 3 15
459
+ #> 2 1 2 4 2 3 15
460
+ #> 3 2 1 2 1 3 15
461
+ #> 4 2 2 5 4 3 15
462
+ #> # … with 1 more row
463
+ ```
464
+
465
+ Here is our Ruby code
466
+
467
+ ```{ruby name_change}
468
+ def my_mutate(df, expr)
469
+ mean_name = "mean_#{expr.to_s}"
470
+ sum_name = "sum_#{expr.to_s}"
471
+
472
+ df.mutate(mean_name => E.mean(expr),
473
+ sum_name => E.sum(expr))
474
+ end
475
+
476
+ puts my_mutate((~:df), :a)
477
+ puts my_mutate((~:df), :b)
478
+ ```
479
+ It really seems that "Non Standard Evaluation" is actually quite standard in Galaaz! But, you
480
+ might have noticed a small change in the way the arguments to the mutate method were called.
481
+ In a previous example we used df.summarise(mean: E.mean(:a), ...) where the column name was
482
+ followed by a ':' (colon). In this example, we have df.mutate(mean_name => E.mean(expr), ...)
483
+ and variable mean\_name is not followed by ':' but by '=>'. This is standard Ruby notation.
484
+
485
+ [explain....]
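
The difference between the two notations can be seen in plain Ruby with an ordinary hash, independent of Galaaz. In this minimal sketch (the variable names are chosen here only for illustration), `name:` always produces the literal symbol as the key, whereas `name =>` uses the current value of the variable as the key.

```ruby
expr = :a
mean_name = "mean_#{expr}"

literal_key  = { mean_name: 1.0 }    # key is the literal symbol :mean_name
computed_key = { mean_name => 1.0 }  # key is the variable's value, "mean_a"

p literal_key.keys   # [:mean_name]
p computed_key.keys  # ["mean_a"]
```

This is why a column name that is computed at run time has to be passed with '=>': with the ':' form the column would be called 'mean_name' regardless of the expression received.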
486
+
487
+ # Capturing multiple variables
488
+
489
+ ```{ruby multiple_vars}
490
+ def my_summarise3(df, *group_vars)
491
+ df.group_by(*group_vars).
492
+ summarise(a: E.mean(:a))
493
+ end
494
+
495
+ puts my_summarise3((~:df), :g1, :g2).as__data__frame
496
+ ```
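
The `*group_vars` parameter above relies on Ruby's splat operator. The plain-Ruby sketch below uses a hypothetical helper, `describe_grouping`, that is not part of Galaaz; it only shows how a splat parameter collects any number of arguments into an array and how a splat at the call site expands an array back into separate arguments.

```ruby
# Hypothetical helper, used only to show how the splat operator behaves.
def describe_grouping(label, *group_vars)
  # group_vars is an Array holding every argument after 'label'
  "#{label}: #{group_vars.map(&:to_s).join(', ')}"
end

one  = describe_grouping("grouped by", :g1)
two  = describe_grouping("grouped by", :g1, :g2)

cols = [:g1, :g2]
same = describe_grouping("grouped by", *cols)  # array expanded into arguments

puts one   # grouped by: g1
puts two   # grouped by: g1, g2
puts same  # grouped by: g1, g2
```

The same mechanism lets a method accept one, two or more grouping variables without any change to its body.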
497
+
498
+ # Advanced dplyr features
499
+ https://www.r-bloggers.com/programming-with-dplyr-by-using-dplyr/
500
+
501
+ ```{ruby starwars}
502
+ puts (~:starwars).head.as__data__frame
503
+ ```
504
+
505
+ ```{r grouped_mean}
506
+ grouped_mean <- function(data, grouping_variables, value_variables) {
507
+ data %>%
508
+ group_by_at(grouping_variables) %>%
509
+ mutate(count = n()) %>%
510
+ summarise_at(c(value_variables, "count"), mean, na.rm = TRUE) %>%
511
+ rename_at(value_variables, funs(paste0("mean_", .)))
512
+ }
513
+
514
+ gm = starwars %>%
515
+ grouped_mean("eye_color", c("mass", "birth_year"))
516
+
517
+ as.data.frame(gm)
518
+ ```
66
519
 
67
- ```{ruby disamb}
68
- y = 2
69
- x = 2
520
+ ```{ruby advanced_starwars}
521
+ def grouped_mean(data, grouping_variables, value_variables)
522
+ data.
523
+ group_by_at(grouping_variables).
524
+ mutate(count: E.n).
525
+ summarise_at(E.c(value_variables, "count"), ~:mean, na__rm: true).
526
+ rename_at(value_variables, R.funs(E.paste0("mean_", value_variables)))
527
+ end
70
528
 
71
- puts @df[:x.eq :y, :all]
72
- puts @df[:x.eq y, :all]
73
- # @df[x.eq :y, :all]
74
- # @df[x == y, :all]
529
+ puts grouped_mean((~:starwars), "eye_color", R.c("mass", "birth_year")).as__data__frame
75
530
  ```