galaaz 0.4.7 → 0.4.8
- checksums.yaml +4 -4
- data/README.md +1179 -39
- data/Rakefile +4 -2
- data/bin/grun +1 -1
- data/bin/gstudio +1 -1
- data/blogs/dev/dev.Rmd +2 -56
- data/blogs/dev/dev.md +32 -61
- data/blogs/dev/dev2.Rmd +65 -0
- data/blogs/dplyr/dplyr.Rmd +29 -0
- data/blogs/{dev/dev.html → dplyr/dplyr.html} +88 -57
- data/blogs/dplyr/dplyr.md +58 -0
- data/blogs/gknit/gknit.html +1262 -25
- data/blogs/gknit/gknit.md +471 -27
- data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
- data/blogs/manual/graph.rb +29 -0
- data/blogs/manual/manual.Rmd +567 -29
- data/blogs/manual/manual.html +743 -46
- data/blogs/manual/manual.md +1179 -39
- data/blogs/nse_dplyr/nse_dplyr.Rmd +466 -11
- data/blogs/nse_dplyr/nse_dplyr.html +472 -37
- data/blogs/nse_dplyr/nse_dplyr.md +645 -32
- data/blogs/ruby_plot/ruby_plot.Rmd +4 -4
- data/blogs/ruby_plot/ruby_plot.html +217 -2
- data/blogs/ruby_plot/ruby_plot.md +226 -1
- data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +2 -2
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +70 -70
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +72 -72
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +116 -116
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +176 -176
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +236 -236
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +176 -176
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +160 -160
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +105 -105
- data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
- data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +121 -121
- data/examples/islr/ch2.spec.rb +1 -1
- data/examples/islr/ch3_boston.rb +4 -4
- data/examples/islr/x_y_rnorm.jpg +0 -0
- data/lib/R_interface/r.rb +1 -1
- data/lib/R_interface/r_methods.rb +2 -2
- data/lib/R_interface/rdata_frame.rb +8 -5
- data/lib/R_interface/rindexed_object.rb +1 -2
- data/lib/R_interface/rlist.rb +1 -0
- data/lib/R_interface/robject.rb +0 -1
- data/lib/R_interface/rpkg.rb +14 -6
- data/lib/R_interface/rsupport.rb +7 -9
- data/lib/R_interface/ruby_extensions.rb +17 -5
- data/lib/gknit/knitr_engine.rb +9 -2
- data/lib/util/exec_ruby.rb +2 -2
- data/specs/r_dataframe.spec.rb +173 -0
- data/specs/r_list.spec.rb +4 -4
- data/specs/ruby_expression.spec.rb +2 -11
- data/specs/tmp.rb +76 -34
- data/version.rb +1 -1
- metadata +17 -6
- data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
- data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
- data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0
Binary file
|
@@ -0,0 +1,29 @@
|
|
1
|
+
# Graphics with ggplot
|
2
|
+
|
3
|
+
```{ruby diverging_bar}
|
4
|
+
require 'ggplot'
|
5
|
+
|
6
|
+
R.theme_set R.theme_bw
|
7
|
+
|
8
|
+
# Data Prep
|
9
|
+
mtcars = ~:mtcars
|
10
|
+
mtcars.car_name = R.rownames(:mtcars)
|
11
|
+
# compute normalized mpg
|
12
|
+
mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
|
13
|
+
mtcars.mpg_type = (mtcars.mpg_z < 0).ifelse("below", "above")
|
14
|
+
mtcars = mtcars[mtcars.mpg_z.order, :all]
|
15
|
+
# convert to factor to retain sorted order in plot
|
16
|
+
mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
|
17
|
+
|
18
|
+
# Diverging Barcharts
|
19
|
+
gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
|
20
|
+
R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
|
21
|
+
R.scale_fill_manual(name: "Mileage",
|
22
|
+
labels: R.c("Above Average", "Below Average"),
|
23
|
+
values: R.c("above": "#00ba38", "below": "#f8766d")) +
|
24
|
+
R.labs(subtitle: "Normalised mileage from 'mtcars'",
|
25
|
+
title: "Diverging Bars") +
|
26
|
+
R.coord_flip()
|
27
|
+
|
28
|
+
puts gg
|
29
|
+
```
|
data/blogs/manual/manual.Rmd
CHANGED
@@ -24,7 +24,6 @@ fontsize: 11pt
|
|
24
24
|
R.install_and_loads('kableExtra')
|
25
25
|
```
|
26
26
|
|
27
|
-
|
28
27
|
# Introduction
|
29
28
|
|
30
29
|
Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with a large
|
@@ -93,9 +92,20 @@ Panda, SciPy, SciKit-Learn and a couple more.
|
|
93
92
|
|
94
93
|
> galaaz master_list:scatter_plot
|
95
94
|
|
96
|
-
#
|
95
|
+
# gKnitting a Document
|
96
|
+
|
97
|
+
This manual has been formatted using gKnit. gKnit uses Knitr and R markdown to knit
|
98
|
+
a document in Ruby or R and output it in any of the available formats for R markdown.
|
99
|
+
gKnit runs atop GraalVM and Galaaz. In gKnit, Ruby variables persist between
|
100
|
+
chunks, making it an ideal solution for literate programming.
|
101
|
+
Also, since it is based on Galaaz, Ruby chunks can have access to R variables and Polyglot
|
102
|
+
Programming with Ruby and R is quite natural.
|
103
|
+
|
104
|
+
gKnit is described in more depth in:
|
105
|
+
|
106
|
+
* xxx.xxxx.xxx
|
97
107
|
|
98
|
-
|
108
|
+
# Vector
|
99
109
|
|
100
110
|
Vectors can be thought of as contiguous cells containing data. Cells are accessed through
|
101
111
|
indexing operations such as x[5]. Galaaz has six basic (‘atomic’) vector types: logical,
|
@@ -126,11 +136,13 @@ puts @vec
|
|
126
136
|
|
127
137
|
Let's take a look at the type, mode and storage.mode of our vector @vec. In order to print
|
128
138
|
this out, we are creating a data frame 'df' and printing it out. A data frame, for those
|
129
|
-
not familiar with it,
|
139
|
+
not familiar with it, is basically a table. Here we create the data frame and add the
|
130
140
|
column name by passing named parameters for each column, such as 'typeof:', 'mode:' and
|
131
|
-
'storage__mode'. You should also note here that the double underscore is converted to a '.'.
|
141
|
+
'storage__mode:'. You should also note here that the double underscore is converted to a '.'.
|
142
|
+
So, when printed, 'storage\_\_mode' appears as 'storage.mode'.
|
132
143
|
|
133
|
-
In R, the method used to create a
|
144
|
+
Data frames are described in more detail later. In R, the function used to create a
|
145
|
+
data frame is 'data.frame'; in Galaaz we use 'data\_\_frame'.
|
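The double-underscore convention described above can be pictured with a few lines of plain Ruby. This is only a sketch of the naming rule: the helper 'to_r_name' is hypothetical and is not part of Galaaz, whose actual conversion code is not shown here.

```ruby
# Hypothetical helper (not a Galaaz method): map a Ruby-friendly name to
# the corresponding R name by turning each double underscore into a dot.
def to_r_name(ruby_name)
  ruby_name.to_s.gsub('__', '.')
end

puts to_r_name(:storage__mode)    # storage.mode
puts to_r_name(:data__frame)      # data.frame
puts to_r_name(:as__data__frame)  # as.data.frame
```

Single underscores are left untouched, so a name such as 'kable_styling' keeps its spelling.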
134
146
|
|
135
147
|
```{ruby typeof_integer}
|
136
148
|
df = R.data__frame(typeof: @vec.typeof, mode: @vec.mode, storage__mode: @vec.storage__mode)
|
@@ -161,41 +173,567 @@ of the error.
|
|
161
173
|
vec = R.c(1, hello, 5)
|
162
174
|
```
|
163
175
|
|
176
|
+
Here is a vector with logical values:
|
177
|
+
|
178
|
+
```{ruby logical_vector}
|
179
|
+
@vec = R.c(true, true, false, false, true)
|
180
|
+
puts @vec
|
181
|
+
```
|
182
|
+
|
183
|
+
## Combining Vectors
|
184
|
+
|
185
|
+
The 'c' function used to create vectors can also be used to combine two vectors:
|
186
|
+
|
187
|
+
```{ruby combining_vectors}
|
188
|
+
@vec1 = R.c(10.0, 20.0, 30.0)
|
189
|
+
@vec2 = R.c(4.0, 5.0, 6.0)
|
190
|
+
@vec = R.c(@vec1, @vec2)
|
191
|
+
puts @vec
|
192
|
+
```
|
193
|
+
In Galaaz, methods can be chained (somewhat like the pipe operator in R, %>%, but more generic).
|
194
|
+
In this next example, method 'c' is chained after '@vec1'. This makes it look as if 'c' is a
|
195
|
+
method of the vector, but in reality it is closer to the pipe operator. When
|
196
|
+
Galaaz identifies that 'c' is not a method of '@vec1', it tries to call 'R.c' with
|
197
|
+
'@vec1' as the first argument, followed by all the other arguments. The code
|
198
|
+
below is automatically converted to the code above.
|
199
|
+
|
200
|
+
```{ruby chainning_methods}
|
201
|
+
@vec = @vec1.c(@vec2)
|
202
|
+
puts @vec
|
203
|
+
```
|
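The dispatch rule described for chained methods can be sketched in plain Ruby with 'method_missing'. This is an illustration under stated assumptions, not Galaaz's implementation: 'FakeR' and 'Vec' are hypothetical stand-ins for the 'R' object and for an R vector.

```ruby
# Hypothetical stand-in for the 'R' object; only 'c' is defined.
module FakeR
  def self.c(*args)
    Vec.new(args.flat_map { |a| a.is_a?(Vec) ? a.values : [a] })
  end
end

# Hypothetical stand-in for an R vector.
class Vec
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # A method that Vec does not define is forwarded to FakeR, with the
  # receiver passed as the first argument: vec.c(x) becomes FakeR.c(vec, x).
  def method_missing(name, *args)
    return super unless FakeR.singleton_methods.include?(name)

    FakeR.public_send(name, self, *args)
  end

  def respond_to_missing?(name, include_private = false)
    FakeR.singleton_methods.include?(name) || super
  end
end

vec1 = FakeR.c(10.0, 20.0, 30.0)
vec2 = FakeR.c(4.0, 5.0, 6.0)
p vec1.c(vec2).values  # same result as FakeR.c(vec1, vec2).values
```

Because the receiver simply becomes the first argument, any function reachable through the stand-in module can be chained this way without being defined on the vector class.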
204
|
+
|
205
|
+
## Vector Arithmetic
|
206
|
+
|
207
|
+
Arithmetic operations on vectors are performed element by element:
|
208
|
+
|
209
|
+
```{ruby vec_arith1}
|
210
|
+
puts @vec1 + @vec2
|
211
|
+
```
|
212
|
+
|
213
|
+
```{ruby mult}
|
214
|
+
puts @vec1 * 5
|
215
|
+
```
|
216
|
+
|
217
|
+
When vectors have different lengths, a recycling rule is applied to the shorter vector:
|
218
|
+
|
219
|
+
```{ruby recycle}
|
220
|
+
@vec3 = R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
|
221
|
+
puts @vec4 = @vec1 + @vec3
|
222
|
+
```
|
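The recycling rule can be made concrete with plain Ruby arrays. The sketch below only mimics the rule and does not use Galaaz; the helper name 'recycle_add' is hypothetical, and it assumes neither array is empty.

```ruby
# Element-wise addition in which the shorter array is reused from its
# start whenever it runs out (the recycling rule).
def recycle_add(a, b)
  longer, shorter = a.length >= b.length ? [a, b] : [b, a]
  longer.each_with_index.map { |x, i| x + shorter[i % shorter.length] }
end

vec1 = [10.0, 20.0, 30.0]
vec3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
p recycle_add(vec1, vec3)
# [11.0, 22.0, 33.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]
```

Here the three elements of the shorter array are used three times in a row to cover the nine elements of the longer one.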
223
|
+
|
224
|
+
## Vector Indexing
|
225
|
+
|
226
|
+
Vectors can be indexed by using the '[]' operator:
|
227
|
+
|
228
|
+
```{ruby index}
|
229
|
+
puts @vec4[3]
|
230
|
+
```
|
231
|
+
|
232
|
+
We can also index a vector with another vector. For example, in the code below, we take elements
|
233
|
+
1, 3, 5, and 7 from @vec4:
|
234
|
+
|
235
|
+
```{ruby index_by_vector}
|
236
|
+
puts @vec4[R.c(1, 3, 5, 7)]
|
237
|
+
```
|
238
|
+
|
239
|
+
Repeating an index and having indices out of order is valid code:
|
240
|
+
|
241
|
+
```{ruby repeated_index}
|
242
|
+
puts @vec4[R.c(1, 3, 3, 1)]
|
243
|
+
```
|
244
|
+
|
245
|
+
It is also possible to index a vector with a negative number or negative vector. In these cases
|
246
|
+
the elements at those positions are excluded from the result:
|
247
|
+
|
248
|
+
```{ruby neg_index}
|
249
|
+
puts @vec4[-3]
|
250
|
+
puts @vec4[-R.c(1, 3, 5, 7)]
|
251
|
+
```
|
252
|
+
|
253
|
+
If an index is out of range, a missing value (NA) will be reported.
|
254
|
+
|
255
|
+
```{ruby out_of_range}
|
256
|
+
puts @vec4[30]
|
257
|
+
```
|
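The indexing rules seen so far (positions start at 1, indices may repeat, negative indices exclude, out-of-range positions give a missing value) can be summarized in a small plain-Ruby sketch. The helper 'r_index' is hypothetical, uses 'nil' in place of NA, and ignores cases such as a zero index or mixed signs.

```ruby
# Mimic R-style vector indexing on a plain Ruby array.
def r_index(vec, idx)
  indices = Array(idx)
  if indices.all? { |i| i < 0 }
    # Negative indices: drop the listed 1-based positions.
    drop = indices.map { |i| -i }
    vec.each_with_index.reject { |_, pos| drop.include?(pos + 1) }.map(&:first)
  else
    # Positive indices: 1-based lookup, nil (NA) when out of range.
    indices.map { |i| i.between?(1, vec.length) ? vec[i - 1] : nil }
  end
end

vec4 = [11.0, 22.0, 33.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]
p r_index(vec4, 3)                 # [33.0]
p r_index(vec4, [1, 3, 3, 1])      # [11.0, 33.0, 33.0, 11.0]
p r_index(vec4, [-1, -3, -5, -7])  # [22.0, 14.0, 36.0, 28.0, 39.0]
p r_index(vec4, 30)                # [nil]
```

As with the real vectors, the result is always a collection, even when a single position is requested.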
258
|
+
|
259
|
+
It is also possible to index a vector by range:
|
260
|
+
|
261
|
+
```{ruby range}
|
262
|
+
puts @vec4[(2..5)]
|
263
|
+
```
|
264
|
+
|
265
|
+
Elements in a vector can be named using the 'names' attribute of a vector:
|
266
|
+
|
267
|
+
```{ruby naming}
|
268
|
+
full_name = R.c("Rodrigo", "A", "Botafogo")
|
269
|
+
full_name.names = R.c("First", "Middle", "Last")
|
270
|
+
puts full_name
|
271
|
+
```
|
272
|
+
|
273
|
+
Elements can also be named by using the 'c' function with named parameters:
|
274
|
+
|
275
|
+
```{ruby named_param}
|
276
|
+
full_name = R.c(First: "Rodrigo", Middle: "A", Last: "Botafogo")
|
277
|
+
puts full_name
|
278
|
+
```
|
279
|
+
|
280
|
+
## Extracting Native Ruby Types from a Vector
|
281
|
+
|
282
|
+
Vectors created with 'R.c' are of class R::Vector. You might have noticed that when indexing a
|
283
|
+
vector, a new vector is returned, even if this vector has one single element. In order to use
|
284
|
+
R::Vector with other ruby classes it might be necessary to extract the actual Ruby native type
|
285
|
+
from the vector. In order to do this extraction the '>>' operator is used.
|
286
|
+
|
287
|
+
```{ruby ruby_native}
|
288
|
+
puts @vec4
|
289
|
+
puts @vec4 >> 0
|
290
|
+
puts @vec4 >> 4
|
291
|
+
```
|
292
|
+
|
293
|
+
Note that indexing with '>>' starts at 0, not at 1, and that negative indexing is not supported.
|
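The difference between '[]' and '>>' can be illustrated with a toy class in plain Ruby. 'OneBased' is a hypothetical stand-in and not the real R::Vector; it only reproduces the two behaviours described above.

```ruby
# Toy stand-in for a vector: '[]' is 1-based and returns another vector,
# '>>' is 0-based and returns the bare Ruby object.
class OneBased
  def initialize(values)
    @values = values
  end

  def [](position)
    raise IndexError, 'positions start at 1' if position < 1

    OneBased.new([@values.fetch(position - 1)])
  end

  def >>(position)
    raise IndexError, 'negative index not supported' if position < 0

    @values.fetch(position)
  end

  def to_a
    @values.dup
  end
end

vec = OneBased.new([11.0, 22.0, 33.0])
p vec[1].class  # OneBased
p vec[1].to_a   # [11.0]
p vec >> 0      # 11.0
```

Both 'vec[1]' and 'vec >> 0' refer to the first element; only the second hands back a Float that other Ruby code can use directly.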
294
|
+
|
295
|
+
# Accessing R variables
|
296
|
+
|
297
|
+
Galaaz allows Ruby to access variables created in R. For example, the 'mtcars' data set is
|
298
|
+
available in R and can be accessed from Ruby by using the 'tilde' operator followed by the
|
299
|
+
symbol for the variable, in this case ':mtcars'. In the code below, method 'outputs' is
|
300
|
+
used to output the 'mtcars' data set nicely formatted in HTML by use of the 'kable' and
|
301
|
+
'kable_styling' functions. Method 'outputs' is only available when used with 'gknit'.
|
302
|
+
|
164
303
|
```{ruby view_kable}
|
165
304
|
outputs (~:mtcars).kable.kable_styling
|
166
305
|
```
|
167
306
|
|
307
|
+
# Matrix
|
308
|
+
|
309
|
+
A matrix is a collection of elements organized as a two dimensional table. A matrix can be
|
310
|
+
created by the 'matrix' function:
|
311
|
+
|
312
|
+
```{ruby matrix}
|
313
|
+
@mat = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
|
314
|
+
nrow: 3,
|
315
|
+
ncol: 3)
|
316
|
+
|
317
|
+
puts @mat
|
318
|
+
```
|
319
|
+
Note that matrix data is organized by column first. It is possible to organize the matrix
|
320
|
+
memory by row first by passing an extra argument ('byrow: true') to the 'matrix' function:
|
321
|
+
|
322
|
+
```{ruby matrix_rowfirst}
|
323
|
+
@mat_row = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
|
324
|
+
nrow: 3,
|
325
|
+
ncol: 3,
|
326
|
+
byrow: true)
|
327
|
+
|
328
|
+
puts @mat_row
|
329
|
+
```
|
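The difference between column-first and row-first filling can be reproduced with nested Ruby arrays. This is a plain-Ruby sketch with a hypothetical helper 'fill_matrix'; it does not call R's 'matrix' function.

```ruby
# Build an nrow x ncol array of rows from a flat data array, filling
# either column by column (the default) or row by row.
def fill_matrix(data, nrow, ncol, byrow: false)
  Array.new(nrow) do |r|
    Array.new(ncol) do |c|
      byrow ? data[r * ncol + c] : data[c * nrow + r]
    end
  end
end

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
p fill_matrix(data, 3, 3)
# [[1.0, 4.0, 7.0], [2.0, 5.0, 8.0], [3.0, 6.0, 9.0]]
p fill_matrix(data, 3, 3, byrow: true)
# [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
```

With column-first filling the first three values run down the first column; with 'byrow: true' they run across the first row.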
330
|
+
|
331
|
+
## Indexing a Matrix
|
332
|
+
|
333
|
+
A matrix can be indexed by [row, column]:
|
334
|
+
|
335
|
+
```{ruby matrix_index}
|
336
|
+
puts @mat_row[1, 1]
|
337
|
+
puts @mat_row[2, 3]
|
338
|
+
```
|
339
|
+
It is possible to index an entire row or column with the ':all' keyword:
|
340
|
+
|
341
|
+
```{ruby matrix_index_all}
|
342
|
+
puts @mat_row[1, :all]
|
343
|
+
puts @mat_row[:all, 2]
|
344
|
+
```
|
345
|
+
|
346
|
+
Indexing with a vector is also possible for matrices. In the following example we want
|
347
|
+
rows 1 and 3 and columns 2 and 3, building a 2 x 2 matrix:
|
348
|
+
|
349
|
+
```{ruby matrix_index_vector}
|
350
|
+
puts @mat_row[R.c(1, 3), R.c(2, 3)]
|
351
|
+
```
|
168
352
|
|
169
|
-
|
353
|
+
Matrices can be combined with functions 'rbind' and 'cbind':
|
354
|
+
|
355
|
+
```{ruby matrix_combine}
|
356
|
+
puts @mat_row.rbind(@mat)
|
357
|
+
puts @mat_row.cbind(@mat)
|
358
|
+
```
|
359
|
+
|
360
|
+
# List
|
361
|
+
|
362
|
+
A list is a data structure that can contain sublists of different types, while vector and matrix
|
363
|
+
can only hold one type of element.
|
364
|
+
|
365
|
+
```{ruby list}
|
366
|
+
nums = R.c(1.0, 2.0, 3.0)
|
367
|
+
strs = R.c("a", "b", "c", "d")
|
368
|
+
bool = R.c(true, true, false)
|
369
|
+
@lst = R.list(nums: nums, strs: strs, bool: bool)
|
370
|
+
puts @lst
|
371
|
+
```
|
170
372
|
|
171
|
-
|
172
|
-
require 'ggplot'
|
373
|
+
Note that '@lst' elements are named elements.
|
173
374
|
|
174
|
-
R.theme_set R.theme_bw
|
175
375
|
|
176
|
-
|
177
|
-
|
178
|
-
|
179
|
-
|
180
|
-
|
181
|
-
|
182
|
-
|
183
|
-
|
184
|
-
|
376
|
+
## List Indexing
|
377
|
+
|
378
|
+
List indexing, also called slicing, is done using the '[]' operator and the '[[]]' operator. Let's
|
379
|
+
first start with the '[]' operator. The list above has three sublists; indexing with '[]' will
|
380
|
+
return one of the sublists.
|
381
|
+
|
382
|
+
```{ruby list_indexing}
|
383
|
+
puts @lst[1]
|
384
|
+
```
|
385
|
+
|
386
|
+
Note that when using '[]' a new list is returned. When using the double square bracket operator
|
387
|
+
the value returned is the actual element of the list in the given position and not a slice of
|
388
|
+
the original list:
|
389
|
+
|
390
|
+
|
391
|
+
```{ruby list_indexing_single}
|
392
|
+
puts @lst[[1]]
|
393
|
+
```
|
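The distinction between a slice and an element can be mirrored with a Ruby Hash. This sketch is an analogy in plain Ruby, not Galaaz code; the helpers 'slice_list' and 'element_of' are hypothetical names for what '[]' and '[[]]' do.

```ruby
# '[]' analogue: returns a smaller list (a Hash with one entry).
def slice_list(list, position)
  key = list.keys.fetch(position - 1)
  { key => list[key] }
end

# '[[]]' analogue: returns the element itself, by 1-based position or by name.
def element_of(list, position_or_name)
  key = if position_or_name.is_a?(Integer)
          list.keys.fetch(position_or_name - 1)
        else
          position_or_name.to_sym
        end
  list.fetch(key)
end

lst = { nums: [1.0, 2.0, 3.0], strs: %w[a b c d], bool: [true, true, false] }
p slice_list(lst, 1)   # {:nums=>[1.0, 2.0, 3.0]} (printed form varies with the Ruby version)
p element_of(lst, 1)   # [1.0, 2.0, 3.0]
p element_of(lst, 'bool')
```

The slice is still a container with a name attached, whereas the element is the stored value itself.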
394
|
+
|
395
|
+
When elements are named, as done with @lst, indexing can be done by name:
|
396
|
+
|
397
|
+
```{ruby list_indexing_by_name}
|
398
|
+
puts @lst[['bool']][[1]] >> 0
|
399
|
+
```
|
400
|
+
|
401
|
+
In this example, first the 'bool' element of the list was extracted, not as a list, but as a vector,
|
402
|
+
then the first element of the vector was extracted (note that vectors also accept the '[[]]'
|
403
|
+
operator), and finally '>> 0' extracted the native Ruby value from that one-element vector.
|
404
|
+
|
405
|
+
|
406
|
+
# Data Frame
|
407
|
+
|
408
|
+
A data frame is a table-like structure in which each column has the same number of
|
409
|
+
rows. Data frames are the basic structure for storing data for data analysis. We have already
|
410
|
+
seen a data frame previously when we accessed variable '~:mtcars'. In order to create a
|
411
|
+
data frame, function 'data__frame' is used:
|
412
|
+
|
413
|
+
```{ruby dataframe}
|
414
|
+
df = R.data__frame(
|
415
|
+
year: R.c(2010, 2011, 2012),
|
416
|
+
income: R.c(1000.0, 1500.0, 2000.0))
|
417
|
+
|
418
|
+
puts df
|
419
|
+
```
|
420
|
+
|
421
|
+
## Data Frame Indexing
|
422
|
+
|
423
|
+
A data frame can be indexed the same way as a matrix, by using '[row, column]', where row and
|
424
|
+
column can each be either a number or the name of the row or column:
|
425
|
+
|
426
|
+
```{ruby dataframe_index}
|
427
|
+
puts (~:mtcars).head
|
428
|
+
puts (~:mtcars)[1, 2]
|
429
|
+
puts (~:mtcars)['Datsun 710', 'mpg']
|
430
|
+
```
|
431
|
+
|
432
|
+
Extracting a column from a data frame as a vector can be done by using the double square bracket
|
433
|
+
operator:
|
434
|
+
|
435
|
+
```{ruby dataframe_column}
|
436
|
+
puts (~:mtcars)[['mpg']]
|
437
|
+
```
|
185
438
|
|
186
|
-
|
187
|
-
gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
|
188
|
-
R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity', width: 0.5) +
|
189
|
-
R.scale_fill_manual(name: "Mileage",
|
190
|
-
labels: R.c("Above Average", "Below Average"),
|
191
|
-
values: R.c("above": "#00ba38", "below": "#f8766d")) +
|
192
|
-
R.labs(subtitle: "Normalised mileage from 'mtcars'",
|
193
|
-
title: "Diverging Bars") +
|
194
|
-
R.coord_flip()
|
439
|
+
A data frame column can also be accessed as if it were an instance variable of the data frame:
|
195
440
|
|
196
|
-
|
441
|
+
```{ruby dataframe_instance_variable}
|
442
|
+
puts (~:mtcars).mpg
|
197
443
|
```
|
198
444
|
|
445
|
+
Slicing a data frame can be done by indexing it with a vector (we use 'head' to reduce the
|
446
|
+
output):
|
447
|
+
|
448
|
+
```{ruby dataframe_column_slice}
|
449
|
+
puts (~:mtcars)[R.c('mpg', 'hp')].head
|
450
|
+
```
|
451
|
+
|
452
|
+
A row slice can be obtained by indexing by row and using the ':all' keyword for the column:
|
453
|
+
|
454
|
+
```{ruby dataframe_row_slice}
|
455
|
+
puts (~:mtcars)[R.c('Datsun 710', 'Camaro Z28'), :all]
|
456
|
+
```
|
457
|
+
|
458
|
+
Finally, a data frame can also be indexed with a logical vector. In this next example, the
|
459
|
+
'am' column of :mtcars is compared with 0 (with method 'eq'). When 'am' is equal to 0 the
|
460
|
+
car is automatic. So, by doing '(~:mtcars).am.eq 0' a logical vector is created with
|
461
|
+
'true' whenever 'am' is 0 and 'false' otherwise. Using this logical vector, the data frame
|
462
|
+
is indexed, returning a new data frame in which all cars have automatic transmission.
|
463
|
+
|
464
|
+
```{ruby dataframe_logical}
|
465
|
+
# obtain a vector with 'true' for cars with automatic transmission
|
466
|
+
automatic = (~:mtcars).am.eq 0
|
467
|
+
puts automatic
|
468
|
+
|
469
|
+
# slice the data frame by using this vector
|
470
|
+
puts (~:mtcars)[automatic, :all]
|
471
|
+
```
|
472
|
+
|
473
|
+
# Writing Expressions in Galaaz
|
474
|
+
|
475
|
+
Galaaz extends Ruby to work with complex expressions, similar to R's expressions built with 'quote'
|
476
|
+
(base R) or 'quo' (tidyverse). Let's take a look at some of those expressions.
|
477
|
+
|
478
|
+
## Expressions from operators
|
479
|
+
|
480
|
+
The code below
|
481
|
+
creates an expression summing two symbols:
|
482
|
+
|
483
|
+
```{ruby expressions}
|
484
|
+
exp1 = :a + :b
|
485
|
+
puts exp1
|
486
|
+
```
|
487
|
+
We can build any complex mathematical expression
|
488
|
+
|
489
|
+
```{ruby expr2}
|
490
|
+
exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
|
491
|
+
puts exp2
|
492
|
+
```
|
493
|
+
|
494
|
+
It is also possible to use inequality operators in building expressions
|
495
|
+
|
496
|
+
```{ruby expr3}
|
497
|
+
exp3 = (:a + :b) >= :z
|
498
|
+
puts exp3
|
499
|
+
```
|
500
|
+
|
501
|
+
Galaaz provides both symbolic representations for operators, such as (>, <, !=), and functional
|
502
|
+
notation for those operators such as (.gt, .ge, etc.). So the same expression written
|
503
|
+
above can also be written as
|
504
|
+
|
505
|
+
```{ruby expr4}
|
506
|
+
exp4 = (:a + :b).ge :z
|
507
|
+
puts exp4
|
508
|
+
```
|
509
|
+
|
510
|
+
Two types of expression can only be created with the functional representation of the operators;
|
511
|
+
those are expressions involving '==', and '='. In order to write an expression involving '==' we
|
512
|
+
need to use the method '.eq' and for '=' we need the function '.assign':
|
513
|
+
|
514
|
+
```{ruby expr5}
|
515
|
+
exp5 = (:a + :b).eq :z
|
516
|
+
puts exp5
|
517
|
+
```
|
518
|
+
|
519
|
+
```{ruby expr6}
|
520
|
+
exp6 = :y.assign :a + :b
|
521
|
+
puts exp6
|
522
|
+
```
|
523
|
+
In general we think that using the functional notation is preferable to using the
|
524
|
+
symbolic notation, as otherwise we can end up writing invalid expressions such as
|
525
|
+
|
526
|
+
```{ruby exp_wrong, warning=FALSE}
|
527
|
+
exp_wrong = (:a + :b) == :z
|
528
|
+
puts exp_wrong
|
529
|
+
```
|
530
|
+
and it might be difficult to understand what is going on here. The problem lies with the fact that
|
531
|
+
when using '==' we are comparing expression (:a + :b) to expression :z with '=='. When the
|
532
|
+
comparison is executed, the system tries to evaluate :a, :b and :z, and those symbols at
|
533
|
+
this time are not bound to anything and we get an "object 'a' not found" message.
|
534
|
+
If we only use functional notation, this type of error will not occur.
|
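Why '==' needs a functional form can be seen in a small plain-Ruby sketch of expression building. The 'Expr' class below is hypothetical and much simpler than Galaaz's expressions: operators return new expression objects, while '==' is left as Ruby's ordinary comparison, which yields a boolean instead of an expression (in Galaaz the symptom is the evaluation error described above rather than a plain 'false').

```ruby
# Hypothetical expression builder: operators produce text, not values.
class Expr
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def self.wrap(obj)
    obj.is_a?(Expr) ? obj : Expr.new(obj.to_s)
  end

  %i[+ - * / ** > >= < <=].each do |op|
    define_method(op) do |other|
      Expr.new("(#{text} #{op} #{Expr.wrap(other).text})")
    end
  end

  # Functional form for equality, since '==' is expected to return a boolean.
  def eq(other)
    Expr.new("(#{text} == #{Expr.wrap(other).text})")
  end

  def to_s
    text
  end
end

a, b, z = %i[a b z].map { |name| Expr.wrap(name) }
puts((a + b) >= z)    # ((a + b) >= z)
puts((a + b).eq(z))   # ((a + b) == z)
p((a + b) == z)       # false: a comparison was executed, no expression built
```

The last line shows the trap: the symbolic '==' runs immediately, so only the functional '.eq' keeps the comparison inside the expression.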
535
|
+
|
536
|
+
## Expressions with R methods
|
537
|
+
|
538
|
+
It is often necessary to create an expression that uses a method or function. For instance, in
|
539
|
+
mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
|
540
|
+
'sin' function is part of the expression and should not be immediately executed. Now, let's say
|
541
|
+
that 'x' is an angle of 45$^\circ$ and we actually want our expression to be $y = 0.707...$.
|
542
|
+
When we want the function to be part of the expression, we call the function preceding it
|
543
|
+
by the letter E, such as 'E.sin(x)'
|
544
|
+
|
545
|
+
```{ruby method_expression}
|
546
|
+
exp7 = :y.assign E.sin(:x)
|
547
|
+
puts exp7
|
548
|
+
```
|
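The contrast between keeping a function call inside an expression and executing it right away can be sketched in plain Ruby with a lambda. This is only an analogy for 'E.sin' versus an immediate call; the helper names are hypothetical. Note that 'Math.sin' works in radians, so 45 degrees has to be converted first (passing 45 directly would be read as 45 radians and give roughly 0.851).

```ruby
def degrees_to_radians(degrees)
  degrees * Math::PI / 180
end

# Deferred: returns a callable; nothing is computed until it is called.
def deferred_sin
  ->(degrees) { Math.sin(degrees_to_radians(degrees)) }
end

expr = deferred_sin                               # still unevaluated
puts expr.call(45).round(4)                       # 0.7071
puts Math.sin(degrees_to_radians(45)).round(4)    # 0.7071, computed immediately
```

The deferred form can be stored, passed around and evaluated later with whatever value of 'x' is available at that time.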
549
|
+
|
550
|
+
# Manipulating Data
|
551
|
+
|
552
|
+
One of the major benefits of Galaaz is to bring strong data manipulation to Ruby. The following
|
553
|
+
examples were extracted from Hadley Wickham's "R for Data Science" (https://r4ds.had.co.nz/). This
|
554
|
+
is a highly recommended book for those not already familiar with the 'tidyverse' style of
|
555
|
+
programming in R. In the sections to follow, we will limit ourselves to converting the R code to
|
556
|
+
Galaaz.
|
557
|
+
|
558
|
+
For these
|
559
|
+
examples, we will investigate the nycflights13 data set available in the package of the
|
560
|
+
same name. We use function 'R.install_and_loads', which checks whether the library is available
|
561
|
+
locally, and if not, installs it. This data frame contains all 336,776 flights that
|
562
|
+
departed from New York City in 2013. The data comes from the US Bureau of
|
563
|
+
Transportation Statistics.
|
564
|
+
|
565
|
+
```{ruby nycflights13}
|
566
|
+
R.install_and_loads('nycflights13')
|
567
|
+
R.library('dplyr')
|
568
|
+
```
|
569
|
+
|
570
|
+
```{ruby flights}
|
571
|
+
@flights = ~:flights
|
572
|
+
puts @flights.head.as__data__frame
|
573
|
+
```
|
574
|
+
|
575
|
+
## Filtering rows with Filter
|
576
|
+
|
577
|
+
In this example we filter the flights data set by passing two expressions to the 'filter' function:
|
578
|
+
the first is ':month.eq 1' and the second is ':day.eq 1'.
|
579
|
+
|
580
|
+
```{ruby filter_rows}
|
581
|
+
puts @flights.filter((:month.eq 1), (:day.eq 1)).head.as__data__frame
|
582
|
+
```
|
583
|
+
|
584
|
+
## Logical Operators
|
585
|
+
|
586
|
+
All flights that departed in November or December:
|
587
|
+
|
588
|
+
```{ruby nov_dec}
|
589
|
+
puts @flights.filter((:month.eq 11) | (:month.eq 12)).head.as__data__frame
|
590
|
+
```
|
591
|
+
|
592
|
+
The same as above, but using the 'in' operator. In R, it is possible to define many operators
|
593
|
+
by doing %<op>%. The %in% operator checks if a value is in a vector. In order to use those
|
594
|
+
operators from Galaaz the '._' method is used, where the first argument is the operator's
|
595
|
+
symbol, in this case ':in' and the second argument is the vector:
|
596
|
+
|
597
|
+
```{ruby in_op}
|
598
|
+
puts @flights.filter(:month._ :in, R.c(11, 12)).head.as__data__frame
|
599
|
+
```
|
600
|
+
|
601
|
+
## Filtering with NA (Not Available)
|
602
|
+
|
603
|
+
Let's first create a 'tibble' with a Not Available value (R::NA). Tibbles are a modern
|
604
|
+
version of a data frame and operate very similarly to one. They differ in how they output
|
605
|
+
the values and in the results of some subsetting operations, which are more consistent than
|
606
|
+
what is obtained from a data frame.
|
607
|
+
|
608
|
+
```{ruby na_tibble}
|
609
|
+
@df = R.tibble(x: R.c(1, R::NA, 3))
|
610
|
+
puts @df.as__data__frame
|
611
|
+
```
|
612
|
+
|
613
|
+
Now filtering by :x > 1 shows all rows that satisfy this condition; the row with R::NA does
|
614
|
+
not.
|
615
|
+
|
616
|
+
```{ruby filter_na}
|
617
|
+
puts @df.filter(:x > 1).as__data__frame
|
618
|
+
```
|
619
|
+
|
620
|
+
To match an NA, use method 'is__na':
|
621
|
+
|
622
|
+
```{ruby with_na}
|
623
|
+
puts @df.filter((:x.is__na) | (:x > 1)).as__data__frame
|
624
|
+
```
|
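The way a missing value behaves in a filter can be imitated in plain Ruby by letting 'nil' play the role of NA. This is a sketch of the rule only; the helper names are hypothetical and no tibble or dplyr call is involved.

```ruby
NA = nil  # stand-in for R::NA in this sketch

# A comparison with NA is not true, so the row is dropped.
def filter_gt(values, threshold)
  values.select { |x| !x.nil? && x > threshold }
end

# Keeping NA rows requires asking for them explicitly.
def filter_na_or_gt(values, threshold)
  values.select { |x| x.nil? || x > threshold }
end

xs = [1, NA, 3]
p filter_gt(xs, 1)        # [3]
p filter_na_or_gt(xs, 1)  # [nil, 3]
```

The second helper corresponds to combining the 'is NA' test with the comparison using a logical or.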
625
|
+
|
626
|
+
## Arrange Rows with arrange
|
627
|
+
|
628
|
+
Arrange reorders the rows of a data frame by the given arguments.
|
629
|
+
|
630
|
+
```{ruby arrange}
|
631
|
+
puts @flights.arrange(:year, :month, :day).head.as__data__frame
|
632
|
+
```
|
633
|
+
|
634
|
+
To arrange in descending order, use function 'desc'
|
635
|
+
|
636
|
+
```{ruby desc_arrange}
|
637
|
+
puts @flights.arrange(:dep_delay.desc).head.as__data__frame
|
638
|
+
```
|
639
|
+
|
640
|
+
## Selecting columns
|
641
|
+
|
642
|
+
To select specific columns from a dataset we use function 'select':
|
643
|
+
|
644
|
+
```{ruby select}
|
645
|
+
puts @flights.select(:year, :month, :day).head.as__data__frame
|
646
|
+
```
|
647
|
+
|
648
|
+
It is also possible to select columns in a given range:
|
649
|
+
|
650
|
+
```{ruby select_range}
|
651
|
+
puts @flights.select(:year.up_to :day).head.as__data__frame
|
652
|
+
```
|
653
|
+
|
654
|
+
Select all columns that start with a given name sequence
|
655
|
+
|
656
|
+
```{ruby select_starts_with}
|
657
|
+
puts @flights.select(E.starts_with('arr')).head.as__data__frame
|
658
|
+
```
|
659
|
+
|
660
|
+
Other functions that can be used:
|
661
|
+
|
662
|
+
* ends_with("xyz"): matches names that end with “xyz”.
|
663
|
+
|
664
|
+
* contains("ijk"): matches names that contain “ijk”.
|
665
|
+
|
666
|
+
* matches("(.)\\1"): selects variables that match a regular expression. This one matches
|
667
|
+
any variables that contain repeated characters.
|
668
|
+
|
669
|
+
* num_range("x", (1..3)): matches x1, x2 and x3
|
670
|
+
|
671
|
+
A helper function that comes in handy when we just want to rearrange column order is 'everything':
|
672
|
+
|
673
|
+
```{ruby everything}
|
674
|
+
puts @flights.select(:year, :month, :day, E.everything).head.as__data__frame
|
675
|
+
```
|
676
|
+
|
677
|
+
## Add variables to a dataframe with 'mutate'
|
678
|
+
|
679
|
+
```{ruby small_flights}
|
680
|
+
@flights_sm = @flights.
|
681
|
+
select((:year.up_to :day),
|
682
|
+
E.ends_with('delay'),
|
683
|
+
:distance,
|
684
|
+
:air_time)
|
685
|
+
|
686
|
+
puts @flights_sm.head.as__data__frame
|
687
|
+
```
|
688
|
+
|
689
|
+
```{ruby mutate}
|
690
|
+
@flights_sm = @flights_sm.
|
691
|
+
mutate(gain: :dep_delay - :arr_delay,
|
692
|
+
speed: :distance / :air_time * 60)
|
693
|
+
puts @flights_sm.head.as__data__frame
|
694
|
+
```
|
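What the two new columns contain can be checked by hand with plain Ruby hashes. The sketch below only reproduces the arithmetic of 'gain' and 'speed'; the helper 'add_gain_and_speed' is hypothetical and the row values are made up for illustration, not taken from the flights data.

```ruby
# Add 'gain' (minutes made up in the air) and 'speed' (distance per hour,
# given air_time in minutes) to each row.
def add_gain_and_speed(rows)
  rows.map do |row|
    row.merge(
      gain: row[:dep_delay] - row[:arr_delay],
      speed: row[:distance].to_f / row[:air_time] * 60
    )
  end
end

rows = [{ dep_delay: 10, arr_delay: 4, distance: 600, air_time: 120 }]
p add_gain_and_speed(rows).first[:gain]   # 6
p add_gain_and_speed(rows).first[:speed]  # 300.0
```

As with 'mutate', the original rows are left unchanged and a new collection with the extra columns is returned.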
695
|
+
|
696
|
+
# Graphics in Galaaz
|
697
|
+
|
698
|
+
Creating graphics in Galaaz is quite easy, as it can use all the power of ggplot2. There are
|
699
|
+
many resources on the web that teach ggplot, so here we give a quick example of ggplot
|
700
|
+
integration with Ruby. We continue to use the :mtcars dataset and we will plot a diverging
|
701
|
+
bar plot, showing cars that have 'above' or 'below' average gas mileage. Let's first prepare
|
702
|
+
the data frame with the necessary data:
|
703
|
+
|
704
|
+
```{ruby diverging_plot_pre}
|
705
|
+
# copy the R variable :mtcars to the Ruby mtcars variable
|
706
|
+
@mtcars = ~:mtcars
|
707
|
+
|
708
|
+
# create a new column 'car_name' to store the car names so that it can be
|
709
|
+
# used for plotting. The 'rownames' of the data frame cannot be used as
|
710
|
+
# data for plotting
|
711
|
+
@mtcars.car_name = R.rownames(:mtcars)
|
712
|
+
|
713
|
+
# compute normalized mpg and add it to a new column called mpg_z
|
714
|
+
# Note that the mean value for mpg can be obtained by calling the 'mean'
|
715
|
+
# function on the vector 'mtcars.mpg'. The same with the standard
|
716
|
+
# deviation 'sd'. The vector is then rounded to two digits with 'round 2'
|
717
|
+
@mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
|
718
|
+
|
719
|
+
# create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
|
720
|
+
# that looks at every element of the mpg_z vector and if the value is below
|
721
|
+
# 0, returns 'below', otherwise returns 'above'
|
722
|
+
@mtcars.mpg_type = (@mtcars.mpg_z < 0).ifelse("below", "above")
|
723
|
+
|
724
|
+
# order the mtcars data set by the mpg_z vector from smaller to larger values
|
725
|
+
@mtcars = @mtcars[@mtcars.mpg_z.order, :all]
|
726
|
+
|
727
|
+
# convert the car_name column to a factor to retain sorted order in plot
|
728
|
+
@mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
|
729
|
+
|
730
|
+
# let's look at the final data frame
|
731
|
+
puts @mtcars
|
732
|
+
```
|
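The data preparation above boils down to three steps: normalize, classify, sort. The plain-Ruby sketch below reproduces them on a tiny made-up data set so the arithmetic can be verified by hand; the helper names are hypothetical, and the standard deviation uses the n - 1 denominator, as R's 'sd' does.

```ruby
# z-score of each value, rounded to two digits.
def z_scores(values)
  mean = values.sum / values.length.to_f
  sd = Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.length - 1))
  values.map { |v| ((v - mean) / sd).round(2) }
end

# Build rows with the normalized value and its type, ordered by z-score.
def diverging_rows(names, mpg)
  rows = names.zip(z_scores(mpg)).map do |name, score|
    { car_name: name, mpg_z: score, mpg_type: score < 0 ? 'below' : 'above' }
  end
  rows.sort_by { |row| row[:mpg_z] }
end

# Made-up values: mean is 4.0 and the sample standard deviation is 2.0.
diverging_rows(%w[B A C], [4.0, 2.0, 6.0]).each { |row| p row }
```

With these values the z-scores are 0.0, -1.0 and 1.0, so after sorting the rows appear in the order A, B, C, with only A classified as 'below'.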
733
|
+
Now, let's plot the diverging bar plot. When using gKnit, there is no need to call
|
734
|
+
'R.awt' to create a plotting device, since gKnit takes care of it:
|
735
|
+
|
736
|
+
|
199
737
|
|
200
738
|
[TO BE CONTINUED...]
|
201
739
|
|