galaaz 0.4.1 → 0.4.2

Files changed (105)
  1. checksums.yaml +4 -4
  2. data/Rakefile +29 -0
  3. data/bin/gknit +208 -10
  4. data/bin/gknit2 +14 -0
  5. data/bin/gknit2~ +6 -0
  6. data/bin/prepareR.rb +3 -0
  7. data/bin/prepareR.rb~ +1 -0
  8. data/bin/tmp.py +51 -0
  9. data/blogs/dev/dev.Rmd +70 -0
  10. data/blogs/dev/dev.Rmd~ +104 -0
  11. data/blogs/dev/dev.html +209 -0
  12. data/blogs/dev/dev.md +72 -0
  13. data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
  14. data/blogs/dev/model.rb +41 -0
  15. data/blogs/galaaz_ggplot/galaaz_ggplot.Rmd +55 -27
  16. data/blogs/galaaz_ggplot/galaaz_ggplot.aux +44 -0
  17. data/blogs/galaaz_ggplot/galaaz_ggplot.dvi +0 -0
  18. data/blogs/galaaz_ggplot/galaaz_ggplot.html +17 -4
  19. data/blogs/galaaz_ggplot/galaaz_ggplot.out +10 -0
  20. data/blogs/galaaz_ggplot/galaaz_ggplot.pdf +0 -0
  21. data/blogs/galaaz_ggplot/galaaz_ggplot.tex +630 -0
  22. data/blogs/galaaz_ggplot/midwest.Rmd +1 -1
  23. data/blogs/galaaz_ggplot/midwest_external_png +13 -0
  24. data/blogs/galaaz_ggplot/midwest_external_png~ +1 -0
  25. data/blogs/gknit/gknit.Rmd +500 -0
  26. data/blogs/gknit/gknit.Rmd~ +184 -0
  27. data/blogs/gknit/gknit.Rnd~ +17 -0
  28. data/blogs/gknit/gknit.html +528 -0
  29. data/blogs/gknit/gknit.md +628 -0
  30. data/blogs/gknit/gknit.pdf +0 -0
  31. data/blogs/gknit/gknit.tex +745 -0
  32. data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
  33. data/blogs/gknit/gknit_files/figure-html/diverging_bar.png +0 -0
  34. data/blogs/gknit/model.rb +41 -0
  35. data/blogs/gknit/model.rb~ +46 -0
  36. data/blogs/ruby_plot/figures/dose_len.png +0 -0
  37. data/blogs/ruby_plot/figures/facet_by_delivery.png +0 -0
  38. data/blogs/ruby_plot/figures/facet_by_dose.png +0 -0
  39. data/blogs/ruby_plot/figures/facets_by_delivery_color.png +0 -0
  40. data/blogs/ruby_plot/figures/facets_by_delivery_color2.png +0 -0
  41. data/blogs/ruby_plot/figures/facets_with_decorations.png +0 -0
  42. data/blogs/ruby_plot/figures/facets_with_jitter.png +0 -0
  43. data/blogs/ruby_plot/figures/facets_with_points.png +0 -0
  44. data/blogs/ruby_plot/figures/final_box_plot.png +0 -0
  45. data/blogs/ruby_plot/figures/final_violin_plot.png +0 -0
  46. data/blogs/ruby_plot/figures/violin_with_jitter.png +0 -0
  47. data/blogs/ruby_plot/ruby_plot.Rmd +680 -0
  48. data/blogs/ruby_plot/ruby_plot.Rmd~ +215 -0
  49. data/blogs/ruby_plot/ruby_plot.html +563 -0
  50. data/blogs/ruby_plot/ruby_plot.md +731 -0
  51. data/blogs/ruby_plot/ruby_plot.pdf +0 -0
  52. data/blogs/ruby_plot/ruby_plot.tex +458 -0
  53. data/examples/sthda_ggplot/all.rb +0 -6
  54. data/examples/sthda_ggplot/two_variables_cont_bivariate/geom_hex.rb +1 -1
  55. data/examples/sthda_ggplot/two_variables_cont_cont/misc.rb +1 -1
  56. data/examples/sthda_ggplot/two_variables_disc_cont/geom_bar.rb +2 -2
  57. data/examples/sthda_ggplot/two_variables_disc_disc/geom_jitter.rb +0 -1
  58. data/lib/R/eng_ruby.R +62 -0
  59. data/lib/R/eng_ruby.R~ +63 -0
  60. data/lib/R_interface/capture_plot.rb~ +23 -0
  61. data/lib/{R → R_interface}/expression.rb +0 -0
  62. data/lib/{R → R_interface}/r.rb +10 -1
  63. data/lib/{R → R_interface}/r.rb~ +0 -0
  64. data/lib/{R → R_interface}/r_methods.rb +21 -5
  65. data/lib/{R → R_interface}/rbinary_operators.rb +6 -1
  66. data/lib/R_interface/rclosure.rb +38 -0
  67. data/lib/{R → R_interface}/rdata_frame.rb +0 -0
  68. data/lib/R_interface/rdevices.R +31 -0
  69. data/lib/R_interface/rdevices.rb +225 -0
  70. data/lib/{R/rclosure.rb → R_interface/rdevices.rb~} +3 -10
  71. data/lib/{R → R_interface}/renvironment.rb +0 -0
  72. data/lib/{R → R_interface}/rexpression.rb +0 -0
  73. data/lib/{R → R_interface}/rindexed_object.rb +0 -0
  74. data/lib/{R → R_interface}/rlanguage.rb +0 -0
  75. data/lib/{R → R_interface}/rlist.rb +0 -0
  76. data/lib/{R → R_interface}/rmatrix.rb +0 -0
  77. data/lib/{R → R_interface}/rmd_indexed_object.rb +0 -0
  78. data/lib/{R → R_interface}/robject.rb +5 -0
  79. data/lib/{R → R_interface}/rpkg.rb +0 -0
  80. data/lib/{R → R_interface}/rsupport.rb +49 -13
  81. data/lib/{R → R_interface}/rsupport_scope.rb +0 -0
  82. data/lib/{R → R_interface}/rsymbol.rb +1 -0
  83. data/lib/{R → R_interface}/ruby_callback.rb +0 -0
  84. data/lib/{R → R_interface}/ruby_extensions.rb +2 -1
  85. data/lib/{R → R_interface}/runary_operators.rb +0 -0
  86. data/lib/{R → R_interface}/rvector.rb +0 -0
  87. data/lib/galaaz.rb +4 -2
  88. data/lib/gknit.rb +27 -0
  89. data/lib/gknit.rb~ +26 -0
  90. data/lib/gknit/knitr_engine.rb +120 -0
  91. data/lib/gknit/knitr_engine.rb~ +102 -0
  92. data/lib/gknit/ruby_engine.rb +70 -0
  93. data/lib/gknit/ruby_engine.rb~ +72 -0
  94. data/lib/util/exec_ruby.rb +8 -7
  95. data/lib/util/inline_file.rb +70 -0
  96. data/lib/util/inline_file.rb~ +23 -0
  97. data/r_requires/ggplot.rb +1 -8
  98. data/r_requires/knitr.rb +27 -0
  99. data/r_requires/knitr.rb~ +4 -0
  100. data/specs/r_language.spec.rb +22 -0
  101. data/specs/r_plots.spec.rb +72 -0
  102. data/specs/r_plots.spec.rb~ +37 -0
  103. data/specs/tmp.rb +255 -1
  104. data/version.rb +1 -1
  105. metadata +89 -39
@@ -0,0 +1,628 @@
1
+ ---
2
+ title: "gKnit - Ruby and R Knitting with Galaaz in GraalVM"
3
+ author: "Rodrigo Botafogo"
4
+ tags: [Galaaz, Ruby, R, TruffleRuby, FastR, GraalVM, knitr]
5
+ date: "19 October 2018"
6
+ output:
7
+ html_document:
8
+ self_contained: true
9
+ keep_md: true
10
+ pdf_document:
11
+ includes:
12
+ in_header: ["../../sty/galaaz.sty"]
13
+ number_sections: yes
14
+ ---
15
+
16
+
17
+
18
+ # Introduction
19
+
20
+ The idea of "literate programming" was first introduced by Donald Knuth in the 1980s.
21
+ The main intention of this approach was to develop software interspersing macro snippets,
22
+ traditional source code, and a natural language such as English that could be compiled into
23
+ executable code and at the same time easily read by a human developer. According to Knuth
24
+ "The practitioner of
25
+ literate programming can be regarded as an essayist, whose main concern is with exposition
26
+ and excellence of style."
27
+
28
+ The idea of literate programming evolved into the idea of reproducible research, in which
29
+ all the data, software code, documentation, graphics etc. needed to reproduce the research
30
+ and its reports could be included in a
31
+ single document or set of documents that when distributed to peers could be rerun generating
32
+ the same output and reports.
33
+
34
+ The R community has put a great deal of effort into reproducible research. In 2002, Sweave was
35
+ introduced and it allowed mixing R code with LaTeX, generating high quality PDF documents. Those
36
+ documents could include the code, the result of executing the code, graphics and text. This
37
+ contained the whole narrative to reproduce the research. But Sweave had many problems and in
38
+ 2012, Knitr, developed by Yihui Xie from RStudio, was released, solving many of the long-standing
39
+ problems from Sweave and including in one single package many extensions and add-on packages that
40
+ were necessary for Sweave.
41
+
42
+ With Knitr, R markdown was also developed, an extension to the
43
+ Markdown format. With R markdown and Knitr it is possible to generate reports in a multitude
44
+ of formats such as HTML, markdown, LaTeX, PDF, dvi, etc. R markdown also allows the use of
45
+ multiple programming languages in the same document. In R markdown text is interspersed with
46
+ code chunks that can be executed and both the code and the result of executing the code can become
47
+ part of the final report. Although R markdown allows multiple programming languages in the
48
+ same document, only R and Python (with
49
+ the reticulate package) can persist variables between chunks. For other languages, such as
50
+ Ruby, every chunk will start a new process and thus all data is lost between chunks, unless it
51
+ is somehow stored in a data file that is read by the next chunk.
52
+
53
+ Being able to persist data
54
+ between chunks is critical for literate programming; otherwise the flow of the narrative is lost
55
+ by all the effort of having to save data and then reload it. Probably, because of this impossibility,
56
+ it is very rare to see any R markdown document in the Ruby community.
57
+
58
+ In the Python community, the same effort to have code and text in an integrated environment
59
+ also started in the first decade of the 2000s. In 2006, IPython 0.7.2 was released. In 2014,
60
+ Fernando Pérez spun off Project Jupyter from IPython, creating a web-based interactive
61
+ computation environment. Jupyter can now be used with many languages, including Ruby with the
62
+ iruby gem (https://github.com/SciRuby/iruby). I am not sure if multiple languages can be used
63
+ in a Jupyter notebook.
64
+
65
+ # gKnitting a Document
66
+
67
+ This document describes gKnit. gKnit uses Knitr and R markdown to knit a document in Ruby or R
68
+ and output it in any of the
69
+ available formats for R markdown. The only difference between gKnit and normal Knitr documents
70
+ is that gKnit runs atop GraalVM and Galaaz (an integration library between Ruby and R).
71
+ Another blog post on Galaaz and its integration with ggplot2 can be found at:
72
+ https://towardsdatascience.com/ruby-plotting-with-galaaz-an-example-of-tightly-coupling-ruby-and-r-in-graalvm-520b69e21021. With Galaaz, gKnit can knit documents in Ruby and R. Because both
73
+ Ruby and R execute in the same process and memory, variables, classes, etc.
74
+ are preserved between chunks of code.
75
+
76
+ This is not a blog post on rmarkdown, and the interested user is directed to
77
+
78
+ * https://rmarkdown.rstudio.com/ or
79
+ * https://bookdown.org/yihui/rmarkdown/ for detailed information on its capabilities and use.
80
+
81
+ Here, we will describe quickly the main aspects of R markdown, so the user can start gKnitting
82
+ Ruby and R documents quickly.
83
+
84
+ ## The Yaml header
85
+
86
+ An R markdown document should start with a Yaml header and be stored in a file with '.Rmd' extension.
87
+ This document has the following header for gKnitting an HTML document.
88
+
89
+ ```
90
+ ---
91
+ title: "gKnit - Ruby and R Knitting with Galaaz in GraalVM"
92
+ author: "Rodrigo Botafogo"
93
+ tags: [Galaaz, Ruby, R, TruffleRuby, FastR, GraalVM, knitr, gknit]
94
+ date: "29 October 2018"
95
+ output:
96
+ html_document:
97
+ keep_md: true
98
+ ---
99
+ ```
100
+
101
+ For more information on the options in the Yaml header, check https://bookdown.org/yihui/rmarkdown/html-document.html.
102
+
103
+ ## R Markdown formatting
104
+
105
+ Document formatting can be done with simple markups such as:
106
+
107
+ ### Headers
108
+
109
+ ```
110
+ # Header 1
111
+
112
+ ## Header 2
113
+
114
+ ### Header 3
115
+
116
+ ```
117
+
118
+ ### Lists
119
+
120
+ ```
121
+ Unordered lists:
122
+
123
+ * Item 1
124
+ * Item 2
125
+ + Item 2a
126
+ + Item 2b
127
+ ```
128
+
129
+ ```
130
+ Ordered Lists
131
+
132
+ 1. Item 1
133
+ 2. Item 2
134
+ 3. Item 3
135
+ + Item 3a
136
+ + Item 3b
137
+ ```
138
+
139
+ See https://rmarkdown.rstudio.com/authoring_basics.html for more R markdown formatting.
140
+
141
+ ## Code Chunks
142
+
143
+ Executing Ruby and R code is what really interests us in this blog. Inserting
144
+ a code chunk is done by adding code in a block delimited by three back ticks followed by a
145
+ block with the engine name (r, ruby, rb, include, others), an optional chunk_label and optional
146
+ options, as shown below:
147
+
148
+ ````
149
+ ```{engine_name [chunk_label], [chunk_options]}
150
+ ```
151
+ ````
152
+
153
+ For instance, let's add an R chunk to the document labeled 'first_r_chunk'. In this case, the
154
+ code should not be shown in the document, so the option 'echo=FALSE' was added.
155
+
156
+ ````
157
+ ```{r first_r_chunk, echo = FALSE}
158
+ ```
159
+ ````
160
+
161
+ A description of the available chunk options can be found in the documentation cited above.
162
+
163
+ For including a Ruby chunk, just change the name of the engine to ruby as follows:
164
+
165
+ ````
166
+ ```{ruby first_ruby_chunk}
167
+ ```
168
+ ````
169
+
170
+ In this example, the ruby chunk is called 'first_ruby_chunk'. One important aspect of chunk
171
+ labels is that they cannot be duplicated. If a chunk label is duplicated, the knitting will
172
+ stop with an error.
173
+
174
+ ### R chunks
175
+
176
+ Let's now add an R chunk to this document. In this example, a vector 'r_vec' is created and
177
+ a new function 'redef_sum' is defined. The chunk specification is
178
+
179
+ ````
180
+ ```{r data_creation}
181
+ r_vec <- c(1, 2, 3, 4, 5)
182
+
183
+ redef_sum <- function(...) {
184
+ Reduce(sum, as.list(...))
185
+ }
186
+ ```
187
+ ````
188
+
189
+ and this is how it will look once executed. From now on, we will not show the chunk
190
+ definition any longer.
191
+
192
+
193
+
194
+ ```r
195
+ r_vec <- c(1, 2, 3, 4, 5)
196
+
197
+ redef_sum <- function(...) {
198
+ Reduce(sum, as.list(...))
199
+ }
200
+ ```
201
+
202
+ We can, possibly in another chunk, access the vector and call the function as follows:
203
+
204
+
205
+ ```r
206
+ print(r_vec)
207
+ ```
208
+
209
+ ```
210
+ ## [1] 1 2 3 4 5
211
+ ```
212
+
213
+ ```r
214
+ print(redef_sum(r_vec))
215
+ ```
216
+
217
+ ```
218
+ ## [1] 15
219
+ ```
220
+
221
+
222
+ ```r
223
+ # load package and data
224
+ library(ggplot2)
225
+ data(mpg, package="ggplot2")
226
+ # mpg <- read.csv("http://goo.gl/uEeRGu")
227
+
228
+ mpg_select <- mpg[mpg$manufacturer %in% c("audi", "ford", "honda", "hyundai"), ]
229
+
230
+ # Scatterplot
231
+ theme_set(theme_bw()) # pre-set the bw theme.
232
+ g <- ggplot(mpg_select, aes(displ, cty)) +
233
+ labs(subtitle="mpg: Displacement vs City Mileage",
234
+ title="Bubble chart")
235
+
236
+ g + geom_jitter(aes(col=manufacturer, size=hwy)) +
237
+ geom_smooth(aes(col=manufacturer), method="lm", se=F)
238
+ ```
239
+
240
+ ![](/home/rbotafogo/desenv/galaaz/blogs/gknit/gknit_files/figure-html/bubble-1.png)<!-- -->
241
+
242
+ ### Ruby chunks
243
+
244
+ In the same way that an R chunk was created, let's now create a Ruby chunk. One important aspect
245
+ of Ruby chunks is that every evaluation of a chunk occurs in its own local scope, so a variable
246
+ created in one chunk will be out of scope in the next chunk. To make sure that variables are
247
+ available between chunks, they should be made global.
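
A minimal sketch in plain Ruby (no Galaaz required) of why globals are needed. It assumes, purely for illustration, that each chunk is evaluated with `eval` in a fresh binding; `fresh_binding` and `run_chunk` are hypothetical helpers and are not part of gKnit, whose real evaluator may work differently:

```ruby
# Hypothetical helpers, for illustration only; gKnit's real evaluator may differ.

# Every call returns a binding with its own, empty set of local variables.
def fresh_binding
  binding
end

# Evaluate one "chunk" of code in its own local scope.
def run_chunk(code)
  fresh_binding.eval(code)
end

# First chunk: define a local and a global variable.
run_chunk('local_total = 10; $global_total = 10')

# Second chunk: check which of the two is still visible.
local_visible  = run_chunk('defined?(local_total) ? true : false')
global_visible = run_chunk('defined?($global_total) ? true : false')

puts "local visible in next chunk:  #{local_visible}"
puts "global visible in next chunk: #{global_visible}"
```

Under this assumption, the local variable is gone by the second chunk while the global one survives, which is the behaviour described above.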
248
+
249
+ In this chunk, variables '\$a', '\$b' and '\$c' are standard Ruby variables and '\$vec' and '\$vec2'
250
+ are two vectors created by a call to FastR. It should be clear that there is no requirement
251
+ in gKnit to call or use R functions. gKnit will knit standard Ruby code, or even general
252
+ text without code.
253
+
254
+
255
+ ```ruby
256
+ $a = [1, 2, 3]
257
+ $b = "US$ 250.000"
258
+ $c = "Inline text in a Heading"
259
+
260
+ $vec = R.c(1, 2, 3)
261
+ $vec2 = R.c(10, 20, 30)
262
+ ```
263
+
264
+ In this next block, variables '\$a', '\$vec' and '\$vec2' are used and printed.
265
+
266
+
267
+ ```ruby
268
+ puts $a
269
+ puts $vec * $vec2
270
+ ```
271
+
272
+ ```
273
+ ## 1
274
+ ## 2
275
+ ## 3
276
+ ## [1] 10 40 90
277
+ ```
278
+
279
+ ### Accessing R from Ruby
280
+
281
+ One of the nice aspects of Galaaz on GraalVM is that variables and functions defined in R can
282
+ be easily accessed from Ruby. This next chunk reads data from R and uses the 'redef_sum'
283
+ function defined previously. To access an R variable from Ruby the '~' function should be
284
+ applied to the Ruby symbol representing the R variable. Since the R variable is called 'r_vec',
285
+ in Ruby, the symbol to access it is ':r_vec' and thus '~:r_vec' retrieves the value of the
286
+ variable.
287
+
288
+
289
+ ```ruby
290
+ puts ~:r_vec
291
+ ```
292
+
293
+ ```
294
+ ## [1] 1 2 3 4 5
295
+ ```
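
To give an idea of how a '~' applied to a symbol can work at all, here is a small sketch in plain Ruby. It only illustrates the mechanism (a unary operator defined on Symbol); it is not Galaaz's code. The `R_VARIABLES` hash is a hypothetical stand-in for R's environment, which in Galaaz lives in FastR:

```ruby
# Hypothetical stand-in for the R global environment; in Galaaz the lookup
# goes to FastR instead of a Ruby Hash.
R_VARIABLES = { r_vec: [1, 2, 3, 4, 5] }.freeze

class Symbol
  # Unary '~': treat the symbol as the name of an "R variable" and fetch it.
  def ~
    R_VARIABLES.fetch(self)
  end
end

r_vec = ~:r_vec
puts r_vec.inspect
```

Reopening Symbol keeps the sketch short; do not run it in a process where Galaaz is loaded, since it would replace Galaaz's own '~'.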
296
+
297
+ In order to call an R function, the 'R.' module is used as follows:
298
+
299
+
300
+ ```ruby
301
+ puts R.redef_sum($vec)
302
+ ```
303
+
304
+ ```
305
+ ## [1] 6
306
+ ```
307
+
308
+ ### Inline Ruby code
309
+
310
+ Knitr allows inserting R inline by adding
311
+ &#96;r code&#96;
312
+ . Unfortunately, this is not possible with Ruby code as there is no provision in knitr for
313
+ adding this kind of inline engine. However, gKnit allows adding inline Ruby code with the
314
+ 'rb' engine. The following text will create an inline Ruby text:
315
+
316
+ ````
317
+ This is some text with inline Ruby accessing variable \$b which has value:
318
+ ```{rb puts $b}
319
+ ```
320
+ and is followed by some other text!
321
+ ````
322
+
323
+ The result of executing the above chunk is the following sentence with inline Ruby code
324
+
325
+ <div style="margin-bottom:50px;">
326
+ </div>
327
+
328
+ This is some text with inline Ruby accessing variable \$b which has value:
329
+ US$ 250.000
330
+ and is followed by some other text!
331
+
332
+ <div style="margin-bottom:50px;">
333
+ </div>
334
+
335
+ In an inline block, it is possible to execute multiple Ruby statements by adding a semicolon
336
+ between them:
337
+
338
+ ````
339
+ Multiple statements in the 'rb' engine use semicolon:
340
+ ```{rb puts $a; puts $b}
341
+ ```
342
+ ````
343
+
344
+ <div style="margin-bottom:50px;">
345
+ </div>
346
+
347
+
348
+ Multiple statements in the 'rb' engine use semicolon:
349
+ 1
350
+ 2
351
+ 3
352
+ US$ 250.000
353
+
354
+ <div style="margin-bottom:50px;">
355
+ </div>
356
+
357
+
358
+ ### Inline text in a Heading
359
+
360
+ Sometimes one wants to add an inline text in a heading. To do that in Ruby the whole heading
361
+ needs to be returned by the inline Ruby engine. For example, the heading above was created by
362
+ the following chunk:
363
+
364
+ ````
365
+ ```{rb puts "### #{$c}"}
366
+ ```
367
+ ````
368
+
369
+ Remember that variable '\$c' was defined in a previous Ruby chunk and is now being used to
370
+ create the section heading for this section.
371
+
372
+
373
+ ### Plotting
374
+
375
+
376
+ ```ruby
377
+ require 'ggplot'
378
+
379
+ R.theme_set R.theme_bw
380
+
381
+ # Data Prep
382
+ mtcars = ~:mtcars
383
+ mtcars.car_name = R.rownames(:mtcars)
384
+ # compute normalized mpg
385
+ mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
386
+ mtcars.mpg_type = mtcars.mpg_z < 0 ? "below" : "above"
387
+ mtcars = mtcars[mtcars.mpg_z.order, :all]
388
+ # convert to factor to retain sorted order in plot
389
+ mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
390
+
391
+ # Diverging Barcharts
392
+ # R.png
393
+ gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
394
+ R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
395
+ R.scale_fill_manual(name: "Mileage",
396
+ labels: R.c("Above Average", "Below Average"),
397
+ values: R.c("above": "#00ba38", "below": "#f8766d")) +
398
+ R.labs(subtitle: "Normalised mileage from 'mtcars'",
399
+ title: "Diverging Bars") +
400
+ R.coord_flip()
401
+ print gg
402
+ # R.dev__off
403
+ # R.include_graphics("Rplot001.png")
404
+ ```
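
The data preparation above boils down to a z-score followed by a below/above label. Here is a sketch of that arithmetic in plain Ruby, without Galaaz, on five 'mpg' values taken from the 'mtcars' rows printed later in this post. It uses the sample standard deviation (dividing by n - 1), which is what R's 'sd' computes; all variable names are just for this illustration:

```ruby
# Five mpg values from the mtcars rows printed later in this post.
mpg = {
  'Mazda RX4'         => 21.0,
  'Datsun 710'        => 22.8,
  'Hornet Sportabout' => 18.7,
  'Duster 360'        => 14.3,
  'Merc 240D'         => 24.4
}

values = mpg.values
mean   = values.sum / values.size
# Sample standard deviation (n - 1 in the denominator), as in R's sd().
sd     = Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.size - 1))

# Normalized mpg, rounded to 2 decimals, and its below/above label.
mpg_z    = mpg.transform_values { |v| ((v - mean) / sd).round(2) }
mpg_type = mpg_z.transform_values { |z| z < 0 ? 'below' : 'above' }

# Print sorted by z-score, the order used for the bars.
mpg_z.sort_by { |_car, z| z }.each do |car, z|
  puts format('%-18s %6.2f %s', car, z, mpg_type[car])
end
```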
405
+
406
+ ### Including Ruby files
407
+
408
+ R is a language that was created to be easy and fast for statisticians to use. It was not a
409
+ language to be used for developing large systems. Of course, there are large systems and
410
+ libraries in R, but the focus of the language is on developing statistical models and
411
+ distributing them to peers.
412
+
413
+ Ruby on the other hand, is a language for large software development. Systems written in
414
+ Ruby will have dozens or hundreds of files. In order to document a large system with
415
+ literate programming we cannot expect the developer to add all the files in a single '.Rmd'
416
+ file. gKnit provides the 'include' chunk engine to include a Ruby file as if it had been
417
+ typed in the '.Rmd' file.
418
+
419
+ To include a file the following chunk should be created, where <filename> is the name of
420
+ the file to be included, and where the extension, if it is '.rb', does not need to be added.
421
+ If the 'relative' option is not included, then it is treated as TRUE. When 'relative' is
422
+ true, 'require_relative' semantics are used to load the file; when false, Ruby's $LOAD_PATH
423
+ is searched to find the file and it is 'require'd.
424
+
425
+ ````
426
+ ```{include <filename>, relative = <TRUE/FALSE>}
427
+ ```
428
+ ````
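
The lookup rules just described can be sketched in plain Ruby as follows. This is only an illustration under the assumptions stated in the text, not gKnit's implementation: `resolve_include` is a hypothetical helper, and the two temporary directories stand in for the directory of the '.Rmd' file and for a directory on Ruby's $LOAD_PATH:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper, for illustration only; not gKnit's implementation.
# Returns the path of the file an 'include' chunk would load, or nil.
def resolve_include(name, base_dir:, relative: true)
  file = name.end_with?('.rb') ? name : "#{name}.rb" # '.rb' is optional
  if relative
    # 'require_relative' style: look next to the including document.
    path = File.join(base_dir, file)
    File.exist?(path) ? path : nil
  else
    # 'require' style: the first match in Ruby's $LOAD_PATH wins.
    dir = $LOAD_PATH.map(&:to_s).find { |d| File.exist?(File.join(d, file)) }
    dir && File.join(dir, file)
  end
end

doc_dir = Dir.mktmpdir # stands in for the directory of the '.Rmd' file
lib_dir = Dir.mktmpdir # stands in for a gem's lib directory
File.write(File.join(doc_dir, 'model.rb'), "# demo file\n")
File.write(File.join(lib_dir, 'demo_gem_file.rb'), "# demo file\n")
$LOAD_PATH.unshift(lib_dir)

relative_hit  = resolve_include('model', base_dir: doc_dir)
load_path_hit = resolve_include('demo_gem_file', base_dir: doc_dir, relative: false)
missing       = resolve_include('demo_gem_file', base_dir: doc_dir)

puts relative_hit, load_path_hit, missing.inspect

# Remove the temporary directories created for the demonstration.
FileUtils.remove_entry(doc_dir)
FileUtils.remove_entry(lib_dir)
```

The last call returns nil because 'demo_gem_file.rb' is only on the $LOAD_PATH and not next to the document, which is the case that 'relative = FALSE' is meant for.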
429
+
430
+ Here we include file 'model.rb', which is in the same directory as this blog. This code
431
+ uses R's 'caret' package to split a dataset into train and test sets.
432
+
433
+ ````
434
+ ```{include model}
435
+ ```
436
+ ````
437
+
438
+
439
+ ```include
440
+ require 'galaaz'
441
+
442
+ # Loads the R 'caret' package. If not present, installs it
443
+ R.install_and_loads 'caret'
444
+
445
+ class Model
446
+
447
+ attr_reader :data
448
+ attr_reader :test
449
+ attr_reader :train
450
+
451
+ #==========================================================
452
+ #
453
+ #==========================================================
454
+
455
+ def initialize(data, percent_train:, seed: 123)
456
+
457
+ R.set__seed(seed)
458
+ @data = data
459
+ @percent_train = percent_train
460
+ @seed = seed
461
+
462
+ end
463
+
464
+ #==========================================================
465
+ #
466
+ #==========================================================
467
+
468
+ def partition(field)
469
+
470
+ train_index =
471
+ R.createDataPartition(@data.send(field), p: @percent_train,
472
+ list: false, times: 1)
473
+ @train = @data[train_index, :all]
474
+ @test = @data[-train_index, :all]
475
+
476
+ end
477
+
478
+ end
479
+
480
+ ```
481
+
482
+ ```
483
+ ## The following packages are missing and will be installed:
484
+ ## [1] "caret"
485
+ ```
486
+
487
+
488
+ ```ruby
489
+ mtcars = ~:mtcars
490
+ model = Model.new(mtcars, percent_train: 0.8)
491
+ model.partition(:mpg)
492
+ puts model.train.head
493
+ puts model.test.head
494
+ ```
495
+
496
+ ```
497
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
498
+ ## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
499
+ ## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
500
+ ## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
501
+ ## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
502
+ ## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
503
+ ## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
504
+ ## mpg cyl disp hp drat wt qsec vs am gear carb
505
+ ## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
506
+ ## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
507
+ ## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
508
+ ## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
509
+ ## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
510
+ ## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
511
+ ```
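
For readers without GraalVM at hand, the idea behind the partition can be sketched in plain Ruby. Note the simplification: this is a seeded shuffle followed by a cut, not caret's 'createDataPartition', which samples within groups of the outcome variable. `split_rows` is a hypothetical helper written for this illustration:

```ruby
# Hypothetical helper: a seeded shuffle-and-cut. This is a simplification and
# NOT what caret's createDataPartition does (caret stratifies on the outcome).
def split_rows(rows, percent_train:, seed: 123)
  shuffled = rows.shuffle(random: Random.new(seed))
  cut      = (rows.size * percent_train).round
  [shuffled.first(cut), shuffled.drop(cut)]
end

rows = (1..10).to_a # stands in for the row indexes of a data frame

train_rows, test_rows = split_rows(rows, percent_train: 0.8)
again_train, _again_test = split_rows(rows, percent_train: 0.8)

puts "train: #{train_rows.size} rows, test: #{test_rows.size} rows"
puts "same seed gives same split: #{train_rows == again_train}"
```

As in the 'Model' class, fixing the seed makes the split reproducible, so knitting the document twice gives the same train and test sets.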
512
+
513
+ ### Documenting Gems
514
+
515
+ gKnit also allows developers to document and load files that are not in the same directory
516
+ as the '.Rmd' file. When using 'relative = FALSE' in a chunk header, gKnit will look for the
517
+ file in Ruby's \$LOAD_PATH and load it if found.
518
+
519
+ Here is an example of loading the 'continuation.rb' file from TruffleRuby.
520
+
521
+ ````
522
+ ```{include continuation, relative = FALSE}
523
+ ```
524
+ ````
525
+
526
+
527
+
528
+ ```include
529
+ # Copyright (c) 2017 Oracle and/or its affiliates. All rights reserved. This
530
+ # code is released under a tri EPL/GPL/LGPL license. You can use it,
531
+ # redistribute it and/or modify it under the terms of the:
532
+ #
533
+ # Eclipse Public License version 1.0, or
534
+ # GNU General Public License version 2, or
535
+ # GNU Lesser General Public License version 2.1.
536
+
537
+ warn "#{File.basename(__FILE__)}: warning: callcc is obsolete; use Fiber instead"
538
+
539
+ class Continuation
540
+ def initialize
541
+ @fiber = Fiber.current
542
+ end
543
+
544
+ def call
545
+ if Fiber.current != @fiber
546
+ raise 'continuation called across fiber'
547
+ end
548
+ raise 'Continuations are unsupported on TruffleRuby'
549
+ end
550
+ end
551
+
552
+ module Kernel
553
+ def callcc
554
+ yield Continuation.new
555
+ end
556
+ module_function :callcc
557
+ end
558
+
559
+ ```
560
+
561
+ ## Converting to PDF
562
+
563
+ One of the beauties of knitr is that the same input can be converted to many different outputs.
564
+ One very useful format is, of course, PDF. In order to convert an R markdown file to PDF
565
+ it is necessary to have LaTeX installed on the system. We will not explain here how to
566
+ install LaTeX as there are plenty of documents on the web showing how to proceed.
567
+
568
+ gKnit comes with a simple LaTeX style file for gknitting this blog as a PDF document. Here is
569
+ the Yaml header to generate this blog in PDF format instead of HTML:
570
+
571
+ ```
572
+ ---
573
+ title: "gKnit - Ruby and R Knitting with Galaaz in GraalVM"
574
+ author: "Rodrigo Botafogo"
575
+ tags: [Galaaz, Ruby, R, TruffleRuby, FastR, GraalVM, knitr, gknit]
576
+ date: "29 October 2018"
577
+ output:
578
+ pdf_document:
579
+ includes:
580
+ in_header: ["../../sty/galaaz.sty"]
581
+ number_sections: yes
582
+ ---
583
+ ```
584
+
585
+ # Conclusion
586
+
587
+ One of the promises of GraalVM is that users/developers will be able to use the best tool
588
+ for their task at hand, independently of the programming language the tool was written in. Galaaz
589
+ and gKnit are not trivial implementations atop the GraalVM and Truffle interop messages;
590
+ however, the time and effort it took to wrap Ruby over R - Galaaz - (not finished yet) or to
591
+ wrap Knitr with gKnit is a fraction of a fraction of a fraction of the time required to
592
+ implement the original tools. Trying to reimplement all R packages in Ruby would require the
593
+ same effort it is taking Python to implement NumPy, pandas and all supporting libraries and it
594
+ is unlikely that this effort would ever be done. GraalVM has allowed Ruby to profit "almost
595
+ for free" from this huge set of libraries and tools that make R one of the most used
596
+ languages for data analysis and machine learning.
597
+
598
+ More interesting though than being able to wrap the R libraries with Ruby, is that Ruby adds
599
+ value to R, by allowing developers to use powerful and modern constructs for code reuse that
600
+ are not the strong points of R. As shown in this blog, R and Ruby can easily communicate
601
+ and R can be structured in classes and modules in a way that greatly expands its power and
602
+ readability.
603
+
604
+ # Installing gKnit
605
+
606
+ ## Prerequisites
607
+
608
+ * GraalVM (>= rc8)
609
+ * TruffleRuby
610
+ * FastR
611
+
612
+ The following R packages will be automatically installed when necessary, but could be installed prior
613
+ to using gKnit if desired:
614
+
615
+ * ggplot2
616
+ * gridExtra
617
+ * knitr
618
+
619
+ Installation of R packages requires a development environment and can be time consuming. In Linux,
620
+ the GNU compiler and tools should be enough. I am not sure what is needed on the Mac.
621
+
622
+ ## Preparation
623
+
624
+ * gem install galaaz
625
+
626
+ ## Usage
627
+
628
+ * gknit <filename>