RubyGems - galaaz - Versions diffs - 0.4.7 → 0.4.8 - Mend

galaaz 0.4.7 → 0.4.8

Files changed (68)

checksums.yaml +4 -4
data/README.md +1179 -39
data/Rakefile +4 -2
data/bin/grun +1 -1
data/bin/gstudio +1 -1
data/blogs/dev/dev.Rmd +2 -56
data/blogs/dev/dev.md +32 -61
data/blogs/dev/dev2.Rmd +65 -0
data/blogs/dplyr/dplyr.Rmd +29 -0
data/blogs/{dev/dev.html → dplyr/dplyr.html} +88 -57
data/blogs/dplyr/dplyr.md +58 -0
data/blogs/gknit/gknit.html +1262 -25
data/blogs/gknit/gknit.md +471 -27
data/blogs/gknit/gknit_files/figure-html/bubble-1.png +0 -0
data/blogs/manual/graph.rb +29 -0
data/blogs/manual/manual.Rmd +567 -29
data/blogs/manual/manual.html +743 -46
data/blogs/manual/manual.md +1179 -39
data/blogs/nse_dplyr/nse_dplyr.Rmd +466 -11
data/blogs/nse_dplyr/nse_dplyr.html +472 -37
data/blogs/nse_dplyr/nse_dplyr.md +645 -32
data/blogs/ruby_plot/ruby_plot.Rmd +4 -4
data/blogs/ruby_plot/ruby_plot.html +217 -2
data/blogs/ruby_plot/ruby_plot.md +226 -1
data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/dose_len.svg +2 -2
data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_delivery.svg +70 -70
data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facet_by_dose.svg +72 -72
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color.svg +116 -116
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_by_delivery_color2.svg +176 -176
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_decorations.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_jitter.svg +236 -236
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/facets_with_points.svg +176 -176
data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/final_box_plot.svg +160 -160
data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/final_violin_plot.svg +105 -105
data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.png +0 -0
data/blogs/ruby_plot/ruby_plot_files/figure-html/violin_with_jitter.svg +121 -121
data/examples/islr/ch2.spec.rb +1 -1
data/examples/islr/ch3_boston.rb +4 -4
data/examples/islr/x_y_rnorm.jpg +0 -0
data/lib/R_interface/r.rb +1 -1
data/lib/R_interface/r_methods.rb +2 -2
data/lib/R_interface/rdata_frame.rb +8 -5
data/lib/R_interface/rindexed_object.rb +1 -2
data/lib/R_interface/rlist.rb +1 -0
data/lib/R_interface/robject.rb +0 -1
data/lib/R_interface/rpkg.rb +14 -6
data/lib/R_interface/rsupport.rb +7 -9
data/lib/R_interface/ruby_extensions.rb +17 -5
data/lib/gknit/knitr_engine.rb +9 -2
data/lib/util/exec_ruby.rb +2 -2
data/specs/r_dataframe.spec.rb +173 -0
data/specs/r_list.spec.rb +4 -4
data/specs/ruby_expression.spec.rb +2 -11
data/specs/tmp.rb +76 -34
data/version.rb +1 -1
metadata +17 -6
data/blogs/dev/dev_files/figure-html/bubble-1.png +0 -0
data/blogs/dev/dev_files/figure-html/diverging_bar. +0 -0
data/blogs/dev/dev_files/figure-html/diverging_bar.png +0 -0

data/blogs/gknit/gknit_files/figure-html/bubble-1.png CHANGED

Binary file

data/blogs/manual/graph.rb ADDED

@@ -0,0 +1,29 @@
+# Graphics with ggplot
+```{ruby diverging_bar}
+require 'ggplot'
+R.theme_set R.theme_bw
+# Data Prep
+mtcars = ~:mtcars
+mtcars.car_name = R.rownames(:mtcars)
+# compute normalized mpg
+mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
+mtcars.mpg_type = (mtcars.mpg_z < 0).ifelse("below", "above")
+mtcars = mtcars[mtcars.mpg_z.order, :all]
+# convert to factor to retain sorted order in plot
+mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
+# Diverging Barcharts
+gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
+     R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity',  width: 0.5) +
+     R.scale_fill_manual(name: "Mileage",
+                         labels: R.c("Above Average", "Below Average"),
+                         values: R.c("above": "#00ba38", "below": "#f8766d")) +
+     R.labs(subtitle: "Normalised mileage from 'mtcars'",
+            title: "Diverging Bars") +
+     R.coord_flip()
+puts gg
+```

data/blogs/manual/manual.Rmd CHANGED

@@ -24,7 +24,6 @@ fontsize: 11pt
 R.install_and_loads('kableExtra')
 ```
 # Introduction
 Galaaz is a system for tightly coupling Ruby and R. Ruby is a powerful language, with a large
@@ -93,9 +92,20 @@ Panda, SciPy, SciKit-Learn and a couple more.
   > galaaz master_list:scatter_plot
-# Basic Types
+# gKnitting a Document
+This manual has been formatted using gKnit.  gKnit uses Knitr and R markdown to knit
+a document in Ruby or R and output it in any of the available formats for R markdown.
+gKnit runs atop GraalVM and Galaaz.  In gKnit, Ruby variables are persisted between
+chunks, making it an ideal solution for literate programming.
+Also, since it is based on Galaaz, Ruby chunks can access R variables, and polyglot
+programming with Ruby and R is quite natural.
+gKnit was described in more depth in:
+* xxx.xxxx.xxx
-## Vectors
+# Vector
 Vectors can be thought of as contiguous cells containing data. Cells are accessed through
 indexing operations such as x[5]. Galaaz has six basic (‘atomic’) vector types: logical,
@@ -126,11 +136,13 @@ puts @vec
 Let's take a look at the type, mode and storage.mode of our vector @vec.  In order to print
 this out, we are creating a data frame 'df' and printing it out.  A data frame, for those
-not familiar with it, it basically a table.  Here we create the data frame and add the
+not familiar with it, is basically a table.  Here we create the data frame and add the
 column name by passing named parameters for each column, such as 'typeof:', 'mode:' and
-'storage__mode'.  You should also note here that the double underscore is converted to a '.'.
+'storage\_\_mode'.  You should also note here that the double underscore is converted to a '.'.
+So, when printed 'storage\_\_mode' will actually print as 'storage.mode'.
-In R, the method used to create a data frame is 'data.frame', in Galaaz we use 'data__frame'.
+Data frames are described in more detail later.  In R, the method used to create a
+data frame is 'data.frame'; in Galaaz we use 'data\_\_frame'.
 ```{ruby typeof_integer}
 df = R.data__frame(typeof: @vec.typeof, mode: @vec.mode, storage__mode: @vec.storage__mode)
@@ -161,41 +173,567 @@ of the error.
 vec = R.c(1, hello, 5)
 ```
+Here is a vector with logical values
+```{ruby logical_vector}
+@vec = R.c(true, true, false, false, true)
+puts @vec
+```
+## Combining Vectors
+The 'c' function used to create vectors can also be used to combine two vectors:
+```{ruby combining_vectors}
+@vec1 = R.c(10.0, 20.0, 30.0)
+@vec2 = R.c(4.0, 5.0, 6.0)
+@vec = R.c(@vec1, @vec2)
+puts @vec
+```
+In Galaaz, methods can be chained (somewhat like the pipe operator in R %>%, but more generic).
+In this next example, method 'c' is chained after '@vec1'.  This looks as if 'c' were a
+method of the vector, but in reality it is closer to the pipe operator.  When
+Galaaz identifies that 'c' is not a method of '@vec1', it tries to call 'R.c' with
+'@vec1' as the first argument, followed by all the other arguments.  The code
+below is automatically converted to the code above.
+```{ruby chainning_methods}
+@vec = @vec1.c(@vec2)
+puts @vec
+```
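The forwarding described above can be imitated in plain Ruby with 'method_missing'.  This is only a sketch of the general idea, not Galaaz's actual implementation: 'FakeR' and 'FakeVector' are hypothetical names invented for this illustration.

```ruby
# Hypothetical stand-in for the R interface: a module with a 'c' function
# that concatenates its arguments, loosely mimicking R's c().
module FakeR
  def self.c(*args)
    FakeVector.new(args.flat_map { |a| a.is_a?(FakeVector) ? a.values : [a] })
  end
end

# Hypothetical vector wrapper. A method it does not define itself is
# forwarded to FakeR with the receiver prepended to the argument list.
class FakeVector
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def method_missing(name, *args, &block)
    if FakeR.singleton_methods.include?(name)
      FakeR.public_send(name, self, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    FakeR.singleton_methods.include?(name) || super
  end
end

vec1 = FakeR.c(10.0, 20.0, 30.0)
vec2 = FakeR.c(4.0, 5.0, 6.0)

chained  = vec1.c(vec2)        # forwarded to FakeR.c(vec1, vec2)
explicit = FakeR.c(vec1, vec2) # the call written out by hand
p chained.values
```

Both calls build the same six-element vector, which is the equivalence the paragraph above describes.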
+## Vector Arithmetic
+Arithmetic operations on vectors are performed element by element:
+```{ruby vec_arith1}
+puts @vec1 + @vec2
+```
+```{ruby mult}
+puts @vec1 * 5
+```
+When vectors have different lengths, a recycling rule is applied to the shorter vector:
+```{ruby recycle}
+@vec3 = R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0)
+puts @vec4 = @vec1 + @vec3
+```
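To make the recycling rule concrete, here is a minimal plain-Ruby sketch that works on ordinary Arrays rather than Galaaz vectors.  The helper 'recycle_add' is a hypothetical name used only for this illustration.

```ruby
# Element-wise addition with R-style recycling: the shorter array is
# repeated until it matches the length of the longer one.
def recycle_add(a, b)
  length = [a.size, b.size].max
  Array.new(length) { |i| a[i % a.size] + b[i % b.size] }
end

vec1 = [10.0, 20.0, 30.0]
vec3 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
vec4 = recycle_add(vec1, vec3)
p vec4
```

The three elements of 'vec1' are reused three times, so positions 4 to 6 and 7 to 9 of the result again add 10.0, 20.0 and 30.0.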
+## Vector Indexing
+Vectors can be indexed by using the '[]' operator:
+```{ruby index}
+puts @vec4[3]
+```
+We can also index a vector with another vector.  For example, in the code below, we take elements
+1, 3, 5, and 7 from @vec4:
+```{ruby index_by_vector}
+puts @vec4[R.c(1, 3, 5, 7)]
+```
+Repeating an index and having indices out of order is valid code:
+```{ruby repeated_index}
+puts @vec4[R.c(1, 3, 3, 1)]
+```
+It is also possible to index a vector with a negative number or negative vector.  In these cases
+the elements at the given positions are excluded from the result:
+```{ruby neg_index}
+puts @vec4[-3]
+puts @vec4[-R.c(1, 3, 5, 7)]
+```
+If an index is out of range, a missing value (NA) will be reported.
+```{ruby out_of_range}
+puts @vec4[30]
+```
+It is also possible to index a vector by range:
+```{ruby range}
+puts @vec4[(2..5)]
+```
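The indexing rules shown so far (1-based positions, negative positions excluding elements, out-of-range positions giving a missing value, and ranges) can be summarized in a small plain-Ruby sketch.  The helper 'r_index' is hypothetical, and 'nil' stands in for NA; this is not how Galaaz implements indexing.

```ruby
# Hypothetical helper mimicking R-style indexing on a plain Ruby Array:
# positions are 1-based, negative positions exclude those elements,
# and an out-of-range position yields nil (standing in for NA).
def r_index(values, positions)
  positions = Array(positions)
  if positions.all?(&:negative?)
    excluded = positions.map(&:abs)
    values.each_with_index.reject { |_, i| excluded.include?(i + 1) }.map(&:first)
  else
    positions.map { |pos| pos.between?(1, values.size) ? values[pos - 1] : nil }
  end
end

vec4 = [11.0, 22.0, 33.0, 14.0, 25.0, 36.0, 17.0, 28.0, 39.0]

p r_index(vec4, 3)                 # single position
p r_index(vec4, [1, 3, 3, 1])      # repeated and unordered positions
p r_index(vec4, [-1, -3, -5, -7])  # exclusion
p r_index(vec4, 30)                # out of range
p r_index(vec4, 2..5)              # range
```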
+Elements in a vector can be named using the 'names' attribute of a vector:
+```{ruby naming}
+full_name = R.c("Rodrigo", "A", "Botafogo")
+full_name.names = R.c("First", "Middle", "Last")
+puts full_name
+```
+Or it can also be named by using the 'c' function with named parameters:
+```{ruby named_param}
+full_name = R.c(First: "Rodrigo", Middle: "A", Last: "Botafogo")
+puts full_name
+```
+## Extracting Native Ruby Types from a Vector
+Vectors created with 'R.c' are of class R::Vector.  You might have noticed that when indexing a
+vector, a new vector is returned, even if this vector has one single element. In order to use
+R::Vector with other ruby classes it might be necessary to extract the actual Ruby native type
+from the vector. In order to do this extraction the '>>' operator is used.
+```{ruby ruby_native}
+puts @vec4
+puts @vec4 >> 0
+puts @vec4 >> 4
+```
+Note that indexing with '>>' starts at 0, not at 1.  Also, negative indexing is not possible with '>>'.
+# Accessing R variables
+Galaaz allows Ruby to access variables created in R.  For example, the 'mtcars' data set is
+available in R and can be accessed from Ruby by using the 'tilde' operator followed by the
+symbol for the variable, in this case ':mtcars'.  In the code below, method 'outputs' is
+used to output the 'mtcars' data set nicely formatted in HTML by use of the 'kable' and
+'kable_styling' functions. Method 'outputs' is only available when used with 'gknit'.
 ```{ruby view_kable}
 outputs (~:mtcars).kable.kable_styling
 ```
+# Matrix
+A matrix is a collection of elements organized as a two dimensional table.  A matrix can be
+created by the 'matrix' function:
+```{ruby matrix}
+@mat = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
+                nrow: 3,
+                ncol: 3)
+puts @mat
+```
+Note that matrix data is organized by column first. It is possible to organize the matrix
+memory by row first by passing an extra argument to the 'matrix' function:
+```{ruby matrix_rowfirst}
+@mat_row = R.matrix(R.c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0),
+                nrow: 3,
+                ncol: 3,
+                byrow: true)
+puts @mat_row
+```
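The difference between column-first and row-first filling can be sketched in plain Ruby, using nested Arrays in place of an R matrix.  'build_matrix' is a hypothetical helper written for this illustration only.

```ruby
# Build an nrow x ncol matrix (an Array of rows) from a flat list of values.
# With byrow: false the values fill one column after another (R's default);
# with byrow: true they fill one row after another.
def build_matrix(values, nrow:, ncol:, byrow: false)
  Array.new(nrow) do |r|
    Array.new(ncol) do |c|
      byrow ? values[r * ncol + c] : values[c * nrow + r]
    end
  end
end

values = (1..9).map(&:to_f)
by_col = build_matrix(values, nrow: 3, ncol: 3)
by_row = build_matrix(values, nrow: 3, ncol: 3, byrow: true)

by_col.each { |row| p row }
by_row.each { |row| p row }
```

For a square matrix the two layouts are transposes of each other, which is a quick way to check which one we have.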
+## Indexing a Matrix
+A matrix can be indexed by [row, column]:
+```{ruby matrix_index}
+puts @mat_row[1, 1]
+puts @mat_row[2, 3]
+```
+It is possible to index an entire row or column with the ':all' keyword:
+```{ruby matrix_index_all}
+puts @mat_row[1, :all]
+puts @mat_row[:all, 2]
+```
+Indexing with a vector is also possible for matrices. In the following example we want
+rows 1 and 3 and columns 2 and 3, building a 2 x 2 matrix:
+```{ruby matrix_index_vector}
+puts @mat_row[R.c(1, 3), R.c(2, 3)]
+```
-## Graphics with ggplot
+Matrices can be combined with functions 'rbind' and 'cbind':
+```{ruby matrix_combine}
+puts @mat_row.rbind(@mat)
+puts @mat_row.cbind(@mat)
+```
+# List
+A list is a data structure that can contain sublists of different types, while a vector or a matrix
+can only hold elements of a single type:
+```{ruby list}
+nums = R.c(1.0, 2.0, 3.0)
+strs = R.c("a", "b", "c", "d")
+bool = R.c(true, true, false)
+@lst = R.list(nums: nums, strs: strs, bool: bool)
+puts @lst
+```
-```{ruby diverging_bar}
-require 'ggplot'
+Note that '@lst' elements are named elements.
-R.theme_set R.theme_bw
-# Data Prep
-mtcars = ~:mtcars
-mtcars.car_name = R.rownames(:mtcars)
-# compute normalized mpg
-mtcars.mpg_z = ((mtcars.mpg - mtcars.mpg.mean)/mtcars.mpg.sd).round 2
-mtcars.mpg_type = mtcars.mpg_z < 0 ? "below" : "above"
-mtcars = mtcars[mtcars.mpg_z.order, :all]
-# convert to factor to retain sorted order in plot
-mtcars.car_name = mtcars.car_name.factor levels: mtcars.car_name
+## List Indexing
+List indexing, also called slicing, is done using the '[]' operator and the '[[]]' operator. Let's
+start with the '[]' operator. The list above has three sublists; indexing with '[]'
+returns one of the sublists.
+```{ruby list_indexing}
+puts @lst[1]
+```
+Note that when using '[]' a new list is returned.  When using the double square bracket operator,
+the value returned is the actual element of the list in the given position and not a slice of
+the original list:
+```{ruby list_indexing_single}
+puts @lst[[1]]
+```
+When elements are named, as done with @lst, indexing can be done by name:
+```{ruby list_indexing_by_name}
+puts @lst[['bool']][[1]] >> 0
+```
+In this example, first the 'bool' element of the list was extracted, not as a list, but as a vector.
+Then the first element of that vector was extracted (note that vectors also accept the '[[]]'
+operator), and finally '>> 0' extracted the native Ruby value from the resulting one-element vector.
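A rough plain-Ruby analogy for the difference between '[]' and '[[]]' uses a Hash: 'slice' returns a smaller Hash (still a container), while 'fetch' returns the stored element itself.  This is only an analogy, assuming a Hash is an acceptable stand-in for a named R list.

```ruby
# A Hash standing in for the named list @lst
lst = {
  nums: [1.0, 2.0, 3.0],
  strs: %w[a b c d],
  bool: [true, true, false]
}

sub_list = lst.slice(:bool)   # still a Hash with one entry, like '[]'
element  = lst.fetch(:bool)   # the Array itself, like '[[]]'
first    = element.first      # a plain Ruby value, like '>> 0'

p sub_list
p element
p first
```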
+# Data Frame
+A data frame is a table-like structure in which each column has the same number of
+rows. Data frames are the basic structure for storing data for data analysis.  We have already
+seen a data frame previously when we accessed variable '~:mtcars'.  In order to create a
+data frame, function 'data__frame' is used:
+```{ruby dataframe}
+df = R.data__frame(
+  year: R.c(2010, 2011, 2012),
+  income: R.c(1000.0, 1500.0, 2000.0))
+puts df
+```
+## Data Frame Indexing
+A data frame can be indexed the same way as a matrix, by using '[row, column]', where row and
+column can each be either a number or the name of the row or column:
+```{ruby dataframe_index}
+puts (~:mtcars).head
+puts (~:mtcars)[1, 2]
+puts (~:mtcars)['Datsun 710', 'mpg']
+```
+Extracting a column from a data frame as a vector can be done by using the double square bracket
+operator:
+```{ruby dataframe_column}
+puts (~:mtcars)[['mpg']]
+```
-# Diverging Barcharts
-gg = mtcars.ggplot(E.aes(x: :car_name, y: :mpg_z, label: :mpg_z)) +
-     R.geom_bar(E.aes(fill: :mpg_type), stat: 'identity',  width: 0.5) +
-     R.scale_fill_manual(name: "Mileage",
-                         labels: R.c("Above Average", "Below Average"),
-                         values: R.c("above": "#00ba38", "below": "#f8766d")) +
-     R.labs(subtitle: "Normalised mileage from 'mtcars'",
-            title: "Diverging Bars") +
-     R.coord_flip()
+A data frame column can also be accessed as if it were an instance variable of the data frame:
-puts gg
+```{ruby dataframe_instance_variable}
+puts (~:mtcars).mpg
 ```
+Slicing a data frame can be done by indexing it with a vector (we use 'head' to reduce the
+output):
+```{ruby dataframe_column_slice}
+puts (~:mtcars)[R.c('mpg', 'hp')].head
+```
+A row slice can be obtained by indexing by row and using the ':all' keyword for the column:
+```{ruby dataframe_row_slice}
+puts (~:mtcars)[R.c('Datsun 710', 'Camaro Z28'), :all]
+```
+Finally, a data frame can also be indexed with a logical vector.  In this next example, the
+'am' column of :mtcars is compared with 0 (with method 'eq').  When 'am' is equal to 0 the
+car is automatic.  So, by doing '(~:mtcars).am.eq 0' a logical vector is created with
+'true' whenever 'am' is 0 and 'false' otherwise.  Using this logical vector, the data frame
+is indexed, returning a new data frame in which all cars have automatic transmission.
+```{ruby dataframe_logical}
+# obtain a vector with 'true' for cars with automatic transmission
+automatic = (~:mtcars).am.eq 0
+puts automatic
+# slice the data frame by using this vector
+puts (~:mtcars)[automatic, :all]
+```
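The same idea, building a logical vector and then keeping only the rows where it is true, can be sketched in plain Ruby.  The four rows below are a small hand-written sample used only for illustration; they are not read from R.

```ruby
# Small illustrative sample: each row is a Hash
cars = [
  { name: 'Mazda RX4',      mpg: 21.0, am: 1 },
  { name: 'Datsun 710',     mpg: 22.8, am: 1 },
  { name: 'Hornet 4 Drive', mpg: 21.4, am: 0 },
  { name: 'Valiant',        mpg: 18.1, am: 0 }
]

# logical vector: true where 'am' equals 0
mask = cars.map { |car| car[:am] == 0 }

# keep only the rows whose mask entry is true
automatic = cars.zip(mask).select { |_, keep| keep }.map(&:first)

p mask
automatic.each { |car| puts car[:name] }
```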
+# Writing Expressions in Galaaz
+Galaaz extends Ruby to work with complex expressions, similar to R's expressions built with 'quote'
+(base R) or 'quo' (tidyverse).  Let's take a look at some of those expressions.
+## Expressions from operators
+The code below
+creates an expression summing two symbols:
+```{ruby expressions}
+exp1 = :a + :b
+puts exp1
+```
+We can build any complex mathematical expression
+```{ruby expr2}
+exp2 = (:a + :b) * 2.0 + :c ** 2 / :z
+puts exp2
+```
+It is also possible to use inequality operators in building expressions
+```{ruby expr3}
+exp3 = (:a + :b) >= :z
+puts exp3
+```
+Galaaz provides both symbolic representations for operators, such as (>, <, !=), and functional
+notation for those operators, such as (.gt, .ge, etc.).  So the same expression written
+above can also be written as
+```{ruby expr4}
+exp4 = (:a + :b).ge :z
+puts exp4
+```
+Two types of expressions can only be created with the functional representation of the operators:
+those involving '==' and '='.  In order to write an expression involving '==' we
+need to use the method '.eq' and for '=' we need the function '.assign'
+```{ruby expr5}
+exp5 = (:a + :b).eq :z
+puts exp5
+```
+```{ruby expr6}
+exp6 = :y.assign :a + :b
+puts exp6
+```
+In general we think that using the functional notation is preferable to using the
+symbolic notation, as otherwise we may end up writing invalid expressions such as
+```{ruby exp_wrong, warning=FALSE}
+exp_wrong = (:a + :b) == :z
+puts exp_wrong
+```
+and it might be difficult to understand what is going on here.  The problem is that
+'==' compares expression (:a + :b) to expression :z immediately.  When the
+comparison is executed, the system tries to evaluate :a, :b and :z; those symbols are
+not bound to anything at this time, so we get an "object 'a' not found" message.
+If we only use functional notation, this type of error will not occur.
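The distinction between building an expression and evaluating a comparison right away can be illustrated with a small plain-Ruby class.  'Expr' is a hypothetical class invented for this sketch and is not how Galaaz represents expressions; in this sketch the immediate comparison simply returns false, whereas in Galaaz it produces the error described above.

```ruby
# Hypothetical expression node: operators build a new node instead of
# computing a value, so the expression can be evaluated later.
class Expr
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def +(other)
    Expr.new("#{text} + #{Expr.text_of(other)}")
  end

  def >=(other)
    Expr.new("#{text} >= #{Expr.text_of(other)}")
  end

  # Functional notation: '==' is left alone, so comparing two Expr
  # objects keeps its usual Ruby meaning.
  def eq(other)
    Expr.new("#{text} == #{Expr.text_of(other)}")
  end

  def self.text_of(value)
    value.is_a?(Expr) ? value.text : value.to_s
  end
end

a = Expr.new('a')
b = Expr.new('b')
z = Expr.new('z')

deferred  = (a + b).eq(z)   # an Expr describing "a + b == z"
immediate = (a + b) == z    # ordinary Ruby comparison, evaluated now

puts deferred.text
p immediate
```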
+## Expressions with R methods
+It is often necessary to create an expression that uses a method or function.  For instance, in
+mathematics, it's quite natural to write an expression such as $y = sin(x)$. In this case, the
+'sin' function is part of the expression and should not be immediately executed. Now, let's say
+that 'x' is an angle of 45$^\circ$ and we actually want our expression to be $y = 0.707...$.
+When we want the function to be part of the expression, we call the function preceding it
+by the letter E, such as 'E.sin(x)'
+```{ruby method_expression}
+exp7 = :y.assign E.sin(:x)
+puts exp7
+```
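For reference, the numeric value of the sine depends on the unit of the angle.  The following plain-Ruby lines, which use only the standard 'Math' module, show that the sine of 45 degrees is about 0.7071, while passing 45 directly to a sine function that works in radians gives about 0.8509.

```ruby
degrees = 45.0
radians = degrees * Math::PI / 180.0

sin_of_45_degrees = Math.sin(radians)  # angle converted to radians first
sin_of_45_radians = Math.sin(45.0)     # 45 interpreted as radians

puts sin_of_45_degrees.round(4)
puts sin_of_45_radians.round(4)
```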
+# Manipulating Data
+One of the major benefits of Galaaz is to bring strong data manipulation to Ruby. The following
+examples were extracted from Hadley Wickham's "R for Data Science" (https://r4ds.had.co.nz/). This
+is a highly recommended book for those not already familiar with the 'tidyverse' style of
+programming in R. In the sections to follow, we will limit ourselves to converting the R code to
+Galaaz.
+For these
+examples, we will investigate the nycflights13 data set, available in the package of the
+same name.  We use function 'R.install_and_loads', which checks if the library is available
+locally and, if not, installs it. This data frame contains all 336,776 flights that
+departed from New York City in 2013. The data comes from the US Bureau of
+Transportation Statistics.
+```{ruby nycflights13}
+R.install_and_loads('nycflights13')
+R.library('dplyr')
+```
+```{ruby flights}
+@flights = ~:flights
+puts @flights.head.as__data__frame
+```
+## Filtering rows with Filter
+In this example we filter the flights data set by giving the filter function two expressions:
+the first is ':month.eq 1' and the second is ':day.eq 1'.
+```{ruby filter_rows}
+puts @flights.filter((:month.eq 1), (:day.eq 1)).head.as__data__frame
+```
+## Logical Operators
+All flights that departed in November or December:
+```{ruby nov_dec}
+puts @flights.filter((:month.eq 11) | (:month.eq 12)).head.as__data__frame
+```
+The same as above, but using the 'in' operator. In R, it is possible to define many operators
+by doing %<op>%. The %in% operator checks if a value is in a vector.  In order to use those
+operators from Galaaz the '._' method is used, where the first argument is the operator's
+symbol, in this case ':in' and the second argument is the vector:
+```{ruby in_op}
+puts @flights.filter(:month._ :in, R.c(11, 12)).head.as__data__frame
+```
+## Filtering with NA (Not Available)
+Let's first create a 'tibble' with a Not Available value (R::NA).  Tibbles are a modern
+version of a data frame and operate very similarly to one.  They differ in how they print
+values and in the results of some subsetting operations, which are more consistent than
+what is obtained from a data frame.
+```{ruby na_tibble}
+@df = R.tibble(x: R.c(1, R::NA, 3))
+puts @df.as__data__frame
+```
+Now filtering by :x > 1 returns only the rows that satisfy this condition; the row with R::NA does
+not.
+```{ruby filter_na}
+puts @df.filter(:x > 1).as__data__frame
+```
+To match an NA use method 'is__na'
+```{ruby with_na}
+puts @df.filter((:x.is__na) | (:x > 1)).as__data__frame
+```
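The behaviour of missing values under filtering can be mimicked in plain Ruby, with 'nil' standing in for NA.  This is a sketch of the logic only, under the assumption that a filter keeps a row only when its condition is exactly true; it does not call dplyr.

```ruby
rows = [{ x: 1 }, { x: nil }, { x: 3 }]   # nil stands in for NA

# A comparison against a missing value is itself missing (nil here).
greater_than_one = ->(row) { row[:x].nil? ? nil : row[:x] > 1 }

# filter-like selection: keep a row only when the condition is exactly true
kept = rows.select { |row| greater_than_one.call(row) == true }

# explicitly admit missing values, as done above with 'is__na'
kept_with_na = rows.select { |row| row[:x].nil? || greater_than_one.call(row) == true }

p kept
p kept_with_na
```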
+## Arrange Rows with arrange
+Arrange reorders the rows of a data frame by the given arguments.
+```{ruby arrange}
+puts @flights.arrange(:year, :month, :day).head.as__data__frame
+```
+To arrange in descending order, use function 'desc'
+```{ruby desc_arrange}
+puts @flights.arrange(:dep_delay.desc).head.as__data__frame
+```
+## Selecting columns
+To select specific columns from a dataset we use function 'select':
+```{ruby select}
+puts @flights.select(:year, :month, :day).head.as__data__frame
+```
+It is also possible to select columns in a given range:
+```{ruby select_range}
+puts @flights.select(:year.up_to :day).head.as__data__frame
+```
+Select all columns that start with a given name sequence
+```{ruby select_starts_with}
+puts @flights.select(E.starts_with('arr')).head.as__data__frame
+```
+Other functions that can be used:
+* ends_with("xyz"): matches names that end with “xyz”.
+* contains("ijk"): matches names that contain “ijk”.
+* matches("(.)\\1"): selects variables that match a regular expression. This one matches
+  any variables that contain repeated characters.
+* num_range("x", (1..3)): matches x1, x2 and x3
+A helper function that comes in handy when we just want to rearrange column order is 'everything':
+```{ruby everything}
+puts @flights.select(:year, :month, :day, E.everything).head.as__data__frame
+```
+## Add variables to a dataframe with 'mutate'
+```{ruby small_flights}
+@flights_sm = @flights.
+                select((:year.up_to :day),
+                       E.ends_with('delay'),
+                       :distance,
+                       :air_time)
+puts @flights_sm.head.as__data__frame
+```
+```{ruby mutate}
+@flights_sm = @flights_sm.
+                mutate(gain: :dep_delay - :arr_delay,
+                       speed: :distance / :air_time * 60)
+puts @flights_sm.head.as__data__frame
+```
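The two new columns are simple arithmetic on existing ones.  As a worked example in plain Ruby, take a single made-up flight (the numbers are invented for illustration), with delays and air time in minutes and distance in miles:

```ruby
# Hypothetical flight record; the numbers are made up for illustration.
flight = { dep_delay: 10.0, arr_delay: -5.0, distance: 1400.0, air_time: 200.0 }

# time made up in the air, in minutes
gain = flight[:dep_delay] - flight[:arr_delay]

# miles per minute times 60 gives miles per hour
speed = flight[:distance] / flight[:air_time] * 60

puts gain
puts speed
```

A flight that departs 10 minutes late and arrives 5 minutes early has gained 15 minutes, and covering 1400 miles in 200 minutes corresponds to 420 miles per hour.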
+# Graphics in Galaaz
+Creating graphics in Galaaz is quite easy, as it can use all the power of ggplot2.  There are
+many resources on the web that teach ggplot, so here we give a quick example of ggplot
+integration with Ruby.  We continue to use the :mtcars dataset and we will plot a diverging
+bar plot, showing cars that have 'above' or 'below' average gas consumption. Let's first prepare
+the data frame with the necessary data:
+```{ruby diverging_plot_pre}
+# copy the R variable :mtcars to the Ruby mtcars variable
+@mtcars = ~:mtcars
+# create a new column 'car_name' to store the car names so that it can be
+# used for plotting. The 'rownames' of the data frame cannot be used as
+# data for plotting
+@mtcars.car_name = R.rownames(:mtcars)
+# compute normalized mpg and add it to a new column called mpg_z
+# Note that the mean value for mpg can be obtained by calling the 'mean'
+# function on the vector 'mtcars.mpg'.  The same with the standard
+# deviation 'sd'.  The vector is then rounded to two digits with 'round 2'
+@mtcars.mpg_z = ((@mtcars.mpg - @mtcars.mpg.mean)/@mtcars.mpg.sd).round 2
+# create a new column 'mpg_type'. Function 'ifelse' is a vectorized function
+# that looks at every element of the mpg_z vector and if the value is below
+# 0, returns 'below', otherwise returns 'above'
+@mtcars.mpg_type = (@mtcars.mpg_z < 0).ifelse("below", "above")
+# order the mtcars data set by the mpg_z vector from smaller to larger values
+@mtcars = @mtcars[@mtcars.mpg_z.order, :all]
+# convert the car_name column to a factor to retain sorted order in plot
+@mtcars.car_name = @mtcars.car_name.factor levels: @mtcars.car_name
+# let's look at the final data frame
+puts @mtcars
+```
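The normalization step used above is the usual z-score, $z = (x - \bar{x}) / s$.  A plain-Ruby sketch on a small made-up sample shows the same computation without R; the sample standard deviation divides by n - 1, as R's 'sd' does.

```ruby
mpg = [21.0, 22.8, 21.4, 18.7, 18.1]   # made-up sample for illustration

mean = mpg.sum / mpg.size

# sample standard deviation (divides by n - 1, as R's 'sd' does)
sd = Math.sqrt(mpg.sum { |v| (v - mean)**2 } / (mpg.size - 1))

# z-score rounded to two digits, then classified like 'ifelse'
mpg_z    = mpg.map { |v| ((v - mean) / sd).round(2) }
mpg_type = mpg_z.map { |z| z < 0 ? 'below' : 'above' }

p mpg_z
p mpg_type
```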
+Now, let's plot the diverging bar plot.  When using gKnit, there is no need to call
+'R.awt' to create a plotting device, since gKnit takes care of it:
 [TO BE CONTINUED...]